Friday, October 20, 2006

Richard Miniter Versus the Lancet

Pajamas Washington DC editor Richard Miniter interviews Lancet study author Professor Gilbert Burnham at Pajamas Media. I am somewhat disappointed in Burnham, who seemed to have no answer to Miniter's specific questions other than to assert, time and again, 'but this is what our research methodology told us'. A sample of the interview is given below:


PajamasMedia: The Lancet study uses a baseline mortality rate (the rate during Saddam years) of 5.5/1000 – almost half of the mortality rate of Europe. The mortality rate in the EU is 10.10/1000. Given Europe’s excellent health care, public health infrastructure, and lack of war in the past 60 years, how is it possible that Iraq’s baseline is half that of the EU? Are you simply relying on pre-war publications or was the baseline itself generated by interviews with random clusters?

Burnham: This was a ‘cohort’ study, which means we compared household deaths after the invasion with deaths before the invasion in the same households. The death rates for these comparison households were 5.5/1000/yr.

What we did find for the households as a pre-invasion death rate was essentially the same number as we found in 2004, the same number as the CIA gives and the estimate for Iraq by the US Census Bureau.

Death rates are a function of many things—not just health of the population. One of the most important factors in the death rate is the number of elderly in the population. Iraq has few, and a death rate of 5.5/1000/yr by our calculation (5.3 for the CIA); the USA's is 8 and Sweden's is 11. This is an indication of how important the population structures are in determining death rates. (You might Google ‘population pyramid’ and look at the census bureau site—fascinating stuff.)
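Burnham's point about population structure can be made concrete with a toy calculation. All rates and population shares below are invented for illustration (they are not the study's figures); the point is only that the crude death rate depends heavily on the age pyramid:

```python
# A toy illustration: the crude death rate is the population-share-
# weighted average of age-specific death rates, so a young population
# can have a much lower crude rate than an old one even with identical
# age-specific mortality. All numbers below are hypothetical.

# Hypothetical age-specific death rates, per 1,000 per year.
rates = {"0-14": 1.0, "15-64": 2.0, "65+": 50.0}

# Hypothetical population shares: a young pyramid vs. an old one.
young = {"0-14": 0.40, "15-64": 0.55, "65+": 0.05}
old = {"0-14": 0.15, "15-64": 0.60, "65+": 0.25}

def crude_death_rate(shares):
    """Crude death rate per 1,000: sum of share * age-specific rate."""
    return sum(shares[age] * rates[age] for age in rates)

print(round(crude_death_rate(young), 2))  # ~4.0 per 1,000
print(round(crude_death_rate(old), 2))    # ~13.85 per 1,000
```

With the same age-specific mortality, the "old" population's crude rate is more than triple the "young" one's, which is the effect Burnham is describing.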

PajamasMedia: During the same period, Iraq is at war with Iran and itself. Public-health infrastructure was poor, although perhaps not as poor as today. Does it seem plausible to you that the baseline (or pre-war) mortality rate is accurate?

Burnham: Yes as above. Yes as being the right number, and Yes as what we need it for—comparisons in the same households before the war.

A moment's thought would have convinced Burnham that, except from a very narrow definitional point of view, a person contributes to the death rate whether he happens to die young or old. The death rate per 100,000 over the course of a person's life will be reflected in life expectancy. The higher the death rate per 100,000, the lower the average life expectancy; the lower the death rate, the higher the life expectancy. Europe's population is old because it has a low death rate per 100,000. In fact, if the death rate were zero, they would live forever.

But one might say that, however nonsensical his definition, Burnham is using comparable pre-OIF statistics against the post-OIF statistics, and whatever problems the underlying concept of his definition may have, the comparison is apples to apples. But how does he measure his apples?

PajamasMedia: You conducted interviews in 47 clusters, 12 of them were in Baghdad, 2 were in Basra, and 3 were in Anbar. Approximately 25% of the estimate comes from Baghdad (while only 21% of the population is in Baghdad). This seems disproportionate. Is it possible that you over-sampled “hot zones” relative to population?

Burnham: We used the 2004 UNDP/MoP estimates for governorates. We divided the total population by the number of clusters, and then moved in a systematic way through the population, assigning clusters proportionate to the population numbers we were using. ... There are always chances that sampling was done in more hot spots, but there is an equal chance that a natural human tendency to self-preservation might cause sampling to go the other way, to unconsciously sample in cool spots where one might be safer.

Burnham is apparently using a kind of stratified sampling methodology. He uses the UNDP/MoP figures to identify different population groups within Iraq, then goes out and measures groups that he thinks correspond to these populations, obtains his statistic, and extrapolates it according to the known weights. This means that if he matches his sample to the wrong population groups, both his weights and his results will be wrong. But as you can see, he doesn't know whether he is sampling hot spots -- the unrepresentative spots -- at all. In fact, he argues that they may have been "cool spots". But he has to know: unless he knows, given the methodology he employs, his survey will be useless. Let's take an example.

Trees grow well in a valley, where they have abundant water and soil, and poorly on hilltops, where the soil is thin and dry. Your job is to estimate the timber in the forest, and you have an estimate of the proportion of hilltops to valleys. Now suppose a man is led blindfolded into the forest, and only once he is in the midst of it is the blindfold removed so he can measure the trees -- but he still can't see whether he is on a hill or in a valley. Can he extrapolate the tree volumes without knowing which he was on? What value would a timber inventory have absent that knowledge? If Burnham didn't know whether he was in a cool or hot spot, he would lack the most critical piece of information needed to make his study work.
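The forest analogy can be simulated. Everything below is invented (plot volumes, strata shares, the degree of bias); the sketch shows how an unknowing tilt toward one stratum biases the extrapolation no matter how many plots are measured:

```python
import random

random.seed(0)

# Hypothetical forest: 30% of plots are valleys (lots of timber),
# 70% are hilltops (thin soil, little timber).
def plot_volume(is_valley):
    """Timber volume of one plot, in arbitrary units."""
    return random.gauss(100, 10) if is_valley else random.gauss(20, 5)

def survey(n_plots, p_valley_sampled):
    """Estimate mean volume per plot while sampling valleys with the
    given probability instead of their true 30% share."""
    vols = [plot_volume(random.random() < p_valley_sampled)
            for _ in range(n_plots)]
    return sum(vols) / len(vols)

true_mean = 0.3 * 100 + 0.7 * 20       # 44.0 under the true hill/valley mix
unbiased = survey(10_000, 0.3)         # close to 44: correct mix of plots
biased = survey(10_000, 0.6)           # close to 68: valleys oversampled
print(true_mean, round(unbiased, 1), round(biased, 1))
```

The biased surveyor's error does not shrink as he measures more plots; only knowing which stratum he is standing in (and weighting accordingly) can fix it.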

As I said, I'm disappointed.

13 Comments:

Blogger Tom Paine said...

You are disappointed with his apparent ignorance of basic methodology.

I suspect he is simply a fraud pretending to be a fool. It’s a common tactic among frauds – because the penalties for “mistakes” are much lower and less likely to be applied.

This was political “science”.

10/20/2006 04:07:00 PM  
Blogger RWE said...

Would it not seem reasonable that pre-OIF numbers were based on the most accessible and politically correct population, i.e., Sunnis - and especially Ba'ath party members? And they enjoyed unusual privilege in Saddam's Iraq.

If you had asked Saddam's government to conduct studies, that is where they would have sent you. Or sent his friends in the U.S.

In contrast, the Marsh Arabs would have been officially nonexistent.

And today the Sunnis are in a world of hurt that is largely of their own making - and the Marsh Arabs probably think they are entering a new golden age.

10/20/2006 05:21:00 PM  
Blogger pidgas said...

Hey Wretchard,

Burnham is apparently using a kind of stratified sampling methodology. He uses the UNDP/MoP to identify different population groups within Iraq and then goes out and measures groups that he thinks correspond to these populations, obtains his statistic and then extrapolates it according to the known weights.

I think maybe I can elaborate on the study for you a little. It was a cluster sample survey. Each cluster was considered to be a contiguous block of 40 households. Basically, what they did is this:

1. Weight the 18 governorates by population (higher population = higher probability of being selected). I think the technical term for it is probability proportionate to size.

2. Using those probability proportionate to size weights, they randomly selected how many (of 50) clusters to sample from within each governorate.

3. Within each governorate, it was second verse same as the first. They established probability proportionate to size weights for areas within the governorates (it's not clear if they used the directorates/districts or some other boundary to identify sub-areas within the governorates).

4. They then randomly chose the apportioned number of clusters from each governorate, weighted by the probabilities obtained in step three.

5. What they chose in step 4 were actually street intersections. From those locations they went on to interview 40 homes deterministically from that spot (they just went up the street until they interviewed 40 households).

That was their methodology. They didn't compensate for the difficulties you cited by weighting their statistics. They overcame issues of correlation by increasing the size of the sample.
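Steps 1 and 2 above can be sketched in a few lines. The governorate populations below are invented placeholders, not the actual UNDP/MoP figures; the point is just what probability-proportionate-to-size allocation looks like:

```python
import random

random.seed(1)

# Invented governorate populations (not the actual UNDP/MoP figures).
populations = {"A": 6_000_000, "B": 2_000_000, "C": 1_000_000, "D": 1_000_000}
n_clusters = 50

# Probability proportionate to size: each of the 50 clusters lands in a
# governorate with probability equal to its share of the population.
total = sum(populations.values())
names = list(populations)
weights = [populations[g] / total for g in names]

allocation = {g: 0 for g in names}
for _ in range(n_clusters):
    allocation[random.choices(names, weights=weights)[0]] += 1

# Allocation comes out roughly proportional to population: "A" (60% of
# the population) should receive on the order of 30 of the 50 clusters.
print(allocation)
```

Steps 3 and 4 then repeat the same weighted draw one level down, within each governorate.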

Burnham is using comparable pre-OIF statistics against the post-OIF statistics and whatever problems the underlying concept of his definition may have the comparison is apples to apples. But how does he measure his apples?

They interviewed the households identified as above and asked them if anyone in the household had died since January 2002. According to the paper, households produced death certificates for 92% of all reported deaths. I think that's how they measured their apples for the sample. They then calculated death rates per year pre- and post-OIF.

A moment's thought would have convinced Burnham that, except from a very narrow definitional point of view, a person contributes to the death rate whether he happens to die young or old. The death rate per 100,000 over the course of a person's life will be reflected in life expectancy.

With respect, I think you might be a bit off on the death rate thing. They aren't talking about death rates averaged over the average life expectancy. They are talking about annual death rates; snapshots in time. Life expectancy has little to do with it unless your population is old relative to that expectation.


Pid

10/20/2006 05:24:00 PM  
Blogger wretchard said...

pidgas,

Yes. Burnham is definitionally correct about the death rate when he regards it as a snapshot. He is making a statement, I think, about the difference between two snapshots and inferring from the differences a rate of change. But then in comparing snapshots it is important that they be comparable. The nice thing about looking at a situation over time is that it is not dependent on the vagaries of when the snapshot is taken. Viewed as a movie, that is over time, it is clearly safer to be in Europe than in Iraq.

Now let's look at his snapshots. He took 47 sample points nationwide. For administrative reasons, Burnham's team stayed near the cities. "We sampled from all administrative units selected. Using the standard nearest-front-door approach, the teams would not likely select isolated homesteads which were many miles away from the area being sampled." Nor did they visit the same neighborhoods. "But in both surveys we started at 1 January 2002, thus giving us a chance this time to confirm the findings from 2004, though we visited none of the same neighborhoods in 2006." So it's not as if he had a patch of growth he revisited each year to check on the delta. He inspected what he thought were comparable sets and measured them against the sets surveyed previously.

Now let's consider Mr. Burnham's claim that "Overall, 13% of deaths were attributed to airstrikes" because this gives us a window into his data. A sample of his sample, so to speak. That's 85,000 deaths from airstrikes. The problem with this is that airstrikes have virtually ceased since 2004. Now maybe "airstrikes" is really understood to mean all kinds of fire. Could this be true?

Where would these airstrikes have happened? Where Burnham sampled, of course, because he claims these deaths are backed by death certificates. Now recall that of his 47 clusters, 12 were in Baghdad, 2 were in Basra, and 3 were in Anbar. But wait! Basra is in the British sector and there have been no air missions in Basra to speak of. As for Baghdad, which has 1/4 of the clusters, at least 20,000 people would have died in airstrikes within full view of the media since 2004. Anbar is the place where most airstrikes would probably have occurred. But it's a sparsely populated area and contains 3/47 clusters. Could it be that most of the airstrikes were there?

So we come to the question of what his snapshots really look like. And as I said, I am disappointed.

10/20/2006 06:28:00 PM  
Blogger Kinuachdrach said...

Let's face it -- statistics is one of the most difficult subjects to truly comprehend. Philosophers have been arguing about Frequentist versus Epistemic interpretations of probability for hundreds of years, without solving the problem.

Let's also remember that the great English statistician Ronald Aylmer Fisher went to his grave arguing that activists were misusing statistics when they fingered smoking as the cause of lung cancer. There are things that ordinary people have been told to believe based on statistics that do not pass muster with real statisticians.

Since most of us don't have the background to comprehend the issues involved in the statistical analyses, we have to look at the apparent reasonableness (or unreasonableness) of the conclusions. And if all of this sounds vaguely reminiscent of the "global warming" debate, there is a reason for it!

10/20/2006 06:59:00 PM  
Blogger pidgas said...

Wretchard,

It's just a cohort study. He uses the sample population as its own control. As you say, he is comparing the snapshots pre and post OIF. The "treatment" is the war. It's basically like taking a random group of people and then watching some characteristic (say Blood Pressure) before and after the administration of a drug. The treatment and control groups are the same, just separated by time.

So it's not as if he had a patch of growth he revisited each year to check on the delta. He inspected what he thought were comparable sets and measured them against the sets surveyed previously.

I think we're on the same page here. They had two samples, one obtained in 2004 and one in 2006. Similarly constructed but random samples, unlikely to contain the same data point twice. He comments that the pre-war death rate in both study samples was similar. That makes it less likely that one sample or the other wildly misrepresents the pre-war death rate.

Now let's consider Mr. Burnham's claim that "Overall, 13% of deaths were attributed to airstrikes" because this gives us a window into his data. A sample of his sample, so to speak. That's 85,000 deaths from airstrikes.

Now here's where we're definitely on the same page. I may have come off as an apologist for the study so far, but I'm no fanboy. This "cherry-picking" of the data is something with which I have a big problem. Especially given the nature of air strikes, cluster sampling seems like a poor choice to determine the rate of death by air strike. The intracluster correlation would be much more likely to be high with air strikes - and thus require a much larger sample to overcome the "design effect."

This study was designed to pick up a change in the death rate from ~5 to ~10 per thousand with 95% certainty 80% of the time. In other words, if the ACTUAL death rate goes from 5 to 10 per thousand in REALITY, your sample will show that difference 80% of the time and have 95% confidence that the difference did not occur by chance. The key is that they designed their study to identify that particular outcome.
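That design target can be sanity-checked with a generic normal-approximation sample-size formula for comparing two rates (a textbook formula, not necessarily the calculation the study's authors actually used):

```python
from statistics import NormalDist

def person_years_needed(rate0, rate1, alpha=0.05, power=0.80):
    """Normal-approximation person-years per period needed to detect a
    change between two yearly death rates at the given alpha and power."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # ~1.96 for a two-sided 5% test
    z_beta = z(power)            # ~0.84 for 80% power
    return (z_alpha + z_beta) ** 2 * (rate0 + rate1) / (rate1 - rate0) ** 2

# Detecting a rise from 5 to 10 deaths per 1,000 person-years:
print(round(person_years_needed(0.005, 0.010)))  # ~4,700 person-years
```

The result, a few thousand person-years per observation period, is well within reach of a multi-thousand-person household survey observed over several years, which fits the design pidgas describes.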

Insofar as they limit their conclusions to the fact that the death rate appears to have doubled, their conclusion is difficult to refute. Commenting on WHY people have died is a completely different subject. They can say, "13% of this sample population had the cause of death reported as 'airstrike.'" But they are stretching what they can reasonably conclude by extrapolating that feature of the study population to the whole population.

I read another post on another blog today suggesting that this number of deaths from airstrikes is criminally negligent. This is where it would be nice to know what their rate of death by air strike was in the first study. Although they assert that the air strikes caused the majority of the increased deaths, they do not put a number on it...at least not that I can find.

Wife impatient. Must. Go.

Pid

10/20/2006 08:09:00 PM  
Blogger wretchard said...

Pid,

The effects of war can be directly observed on the population of a country, or by enumeration. For example, the Great War left a demographic dip in the population of Western Europe, as did the Great Patriotic War on Russia. More recently the effects of war can be directly observed in Africa, where large numbers of orphans and abandoned older people are in evidence. Where the war is still in progress the cumulative effects may not yet be appreciable, so you take a series of snapshots and then infer the result.

I'd like to compare the problem to that of estimating a person's asset base. You can either take a snapshot of earnings in two different periods and from there infer the change in assets, or you can measure the change in assets directly and from there calculate the change in income. In Burnham's case he compares two snapshots through his cohort methodology. But bear in mind the alternative, which is to examine the cumulative effect. And that, I argue, is by now possible.

He has conducted his study at a time of regime transition. The baseline is the Saddam era. The presumption is that deaths were reported accurately at that time, so that presumably there were death certificates which said "tortured to death" as there are now evidently death certificates which say "died by airstrike". Here is where the cohort system becomes immediately vulnerable to the framing of the snapshot. If changes have been made in how an effect is measured, then the data afforded by a cohort comparison begins to acquire a large error term. We read in the papers that the incidence of breast cancer has doubled or tripled, but much of that effect is due to the fact that breast cancers are being diagnosed at much earlier stages today than decades ago. So the first thing to observe is that the death certificates in Dr. Burnham's control period, if he had any death certificates then, were issued under the ancien regime, and the ones in the next data point were issued by a new regime.

This brings us to an immediate problem. The Iraqi government was not constituted until 2005. So the "certificates" issued today are not obviously comparable to anything that came before. Moreover, there has been a policy of compensating civilians for collateral deaths, and for pecuniary reasons there is an incentive to die by "airstrike" as opposed to dying of, say, cholera. I will submit that for those reasons, about the only reliable part of Burnham's methodology will be his interviews.

Now I've done quite a bit of field interviewing myself, and one thing you look for is collateral effects: 655,000 deaths are the equivalent of about 7 million deaths in America, and at that level there will be a lot of collateral effects. When you ask a farmer whether he did well last year, you also ask questions like when he bought his farm animals or a new suit of clothes, to provide the collateral. Where is Burnham's collateral?

Let's consider the airstrikes. The Second Battle of Fallujah produced on the order of 3,000 insurgent deaths in a campaign that visibly damaged, some would say leveled, the town. Any campaign of airstrikes that would kill nearly 20 times as many would probably be obvious. Among other things, bombs leave craters. Where are the craters?

In Lebanon, for example, at the conclusion of a month's intensive "genocidal" bombing, how many deaths does Lebanon attribute to airstrikes? Nothing like 55,000. We know that bomb craters can be counted because the UN mine clearance people claim to be marking the impact holes and even the "cluster bomb" submunitions. In Iraq, Dr. Burnham claims 50K+ fatalities from airstrikes. Where are the craters?

Returning to the methodology, measuring the cumulative effect of a rate is always better than estimating and comparing the instantaneous rate. Hence I am always better off with aerial photography of a forest than doing cohort studies on the rate of growth of stands and extrapolating the timber volume from that study.

If you finished a cohort study which concluded that the forest, by extrapolation, was denuded, yet aerial photography did not show any logged-over areas, then you would be exactly in the position of accepting Burnham's proposition that 655,000 people died without observing the masses of widows, orphans, mass graves and bomb craters that one would expect to accompany such an enormous loss. My parents were unfortunate enough to go through a major urban battle. It only killed 100K people, but what it did was produce a flood of refugees and displaced persons whose existence was palpable. One quarter of Burnham's sample is from Baghdad. Where are the refugees? It is not widely appreciated, but Iraq is host to millions of Shi'ite pilgrims every year. There is not a single historical instance I can think of where tourists continued to visit a country beset on the scale claimed.

Deaths on Burnham's scale show up in numerous ways, none of which he can readily point to. To return to the analogy, Burnham says he has observed the cash register 78 times on several occasions and claims you are broke. You look at your account and see that you are not broke. But his cohort study proves you are broke. So are you broke? I don't think it is enough to say "my methodology shows you are broke so you must be broke".

But there is a further problem with considering War a "treatment" whose effects can be studied on a control group of patients. There have been the actions of Syria, Iran and al-Qaeda for example. And there are more subtle changes too. For example, many ethnic Kurds have actually gone to Kurdistan because of the prosperity and peace there. Would that not have changed the mix of Burnham's sample?

Notice that not once have I impugned Dr. Burnham's integrity and accused him of malice. However one of the hallmarks of a good researcher is that he subjects his calculations to a sanity check. If your instrument indicates that a person is brain dead but that person is observed to be playing basketball you might want to check your instrument. I am not persuaded by Dr. Burnham's 655,000 figure.

I think the surplus-deaths figure is probably in the 100K range and mostly concentrated in the male population, and here's why. Nearly every death incident which is observed directly involves people with this profile. These are people killed in attacks on police stations, recruiting offices, checkpoints, etc. These instances are often verified by the US military, Iraqi government agents and the press. So when we say 75 men were killed in Habaniyah, it is a better sample than one of Dr. Burnham's interviews. Unless there is a parallel universe out there that nobody but Dr. Burnham sees, the results of extrapolating the one sample should coincide pretty fairly with the other.

As I said, I'm disappointed in Dr. Burnham.

10/20/2006 11:00:00 PM  
Blogger The Wobbly Guy said...

"Never attribute to malice what can be adequately explained by ignorance or stupidity."

For the left, of which Burnham is probably a member, I'd revise the quote: Never attribute to stupidity what can be adequately explained by malice.

As a supposedly learned man of letters and education, he can hardly be considered stupid in his field of study. Apply Occam's Razor, and we can conclude he's only out to cast the War in the harshest light possible.

10/21/2006 12:59:00 AM  
Blogger Jeff M said...

One day, a woman approached Buddha after a teaching begging that he do something to restore her dead child to her. He listened patiently to her plea and saw how great was her despair. He said to her, "Mother, if you bring me just one mustard seed from any household in which no person has died, then I shall revive your child."

The woman was greatly encouraged by the Teacher's words. She traveled from door to door throughout her own village, but could not find even a single residence in which no one had died. She went out of town, wandering to this hamlet and that in search of the tiny seed that the Buddha had requested. Days later, muddy and footsore, she returned to the place where the Buddha and his followers were passing the rainy season.

She was ushered into the Teacher's presence worn out, but not discouraged. "Master, try as I might, I could not locate the token you requested as an offering...

10/21/2006 08:15:00 AM  
Blogger 3Case said...

Wait a second...wait a second....

What's that old saying?..."Figures never lie...."

10/21/2006 09:11:00 PM  
Blogger friend said...

1. Weight the 18 governorates by population (higher population = higher probability of being selected). I think the technical term for it is probability proportionate to size.

Why would he stratify his sample based on a weighted population? If you are seeking to extrapolate a death rate from a WAR, you go to where the war is (and is not), not where the people are. If the sample were chosen based on military activity, then you would get a more representative sample of the fighting going on. So he may have, for example, several strata based on air sorties, IED attacks, and Allied missions of some kind or other. The strata would be high, medium and low activity locations at a provincial (or the Iraqi equivalent of county) level. Based on those strata, he can generate a random sample and weight the final data to come up with a more accurate estimate. Weighting by population (density) doesn't accomplish this.

That's like drawing a survey sample to predict voting behavior based on population. Sure, population is a part of it (like making sure you don't have 10% of your sample coming from Vermont), but you need to also stratify across the variables that make voting and non-voting more and less probable (the "social" part of "social science").
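A minimal sketch of the stratified estimator described above, with hypothetical strata shares and observed rates (nothing below is real data):

```python
# Hypothetical strata: the share of the population living in each
# military-activity level, and a death rate (per 1,000 per year)
# sampled within each stratum.
strata = {
    "high activity": {"share": 0.20, "rate": 15.0},
    "medium activity": {"share": 0.30, "rate": 6.0},
    "low activity": {"share": 0.50, "rate": 2.0},
}

# Stratified estimate: weight each stratum's sampled rate by the known
# share of the population in that stratum.
overall = sum(s["share"] * s["rate"] for s in strata.values())
print(round(overall, 1))  # 5.8 deaths per 1,000 per year
```

The final number is only as good as the weights: if the shares assigned to "high" and "low" activity areas are wrong, the estimate is wrong in proportion, which is the crux of the sampling criticism in this thread.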

10/22/2006 12:31:00 AM  
Blogger genwolf said...

The collateral effect argument is one that the IBC (see http://www.iraqbodycount.org/press/pr14.php) has mounted as a criticism of the study - and they have gone to some lengths in showing just how serious this shortcoming is.

Some of the other key criticisms that I have yet to see convincingly answered by either the study's authors or its defenders include, in no particular order:

1) The claim of internal self-consistency is a false one. The study's authors claim that the 2006 results confirm the results from the 2004 survey in that roughly similar mortality rates occur in both. In fact, when the results are examined, there is no such self-consistency at all:
see http://hurryupharry.bloghouse.net/archives/2006/10/15/lancet_redux.php
In essence, whilst the figures in Lancet 1 and 2 for total mortality are relatively close, the composition of the mortality in the 2 surveys is completely different. This would be like doing 2 surveys of fruit production which both produce roughly the same figure for total tonnage over a particular period, but which on examination show that one survey says mostly oranges were produced while the other says mostly apples. Such a study is not consistent, and it is difficult to believe that the surveyors failed to notice this.

2) Verification: When doing this kind of surveying, one of the most critical aspects of ensuring that your raw data is sound involves back-checking. In this case it is very difficult to see how this back-checking could have been accomplished, as the survey also assures us that no identifying information on respondents was collected.

3) A few people (including myself) who have worked in market research are somewhat incredulous at the claimed survey response levels and claimed compliance.

4) The question of the baseline used is one of the most serious criticisms leveled at the survey. The snapshot year chosen as the baseline is 2002-03. I don't think a single serious historian in the world could claim with a straight face that this year could be called representative of Saddam's reign in terms of mortality, and that is aside from the criticism that the baseline figure is itself determined using the methodology that is under question in determining post-invasion mortality.

5) Sample size:
There is some confusion amongst both critics and defenders of Lancet about what the sample size criticism really is. Defenders are right to point out that in this type of survey it is more important that the sample be representative than large (see next point), and that there is precedent for extrapolating from small sample sizes. The problem is that you need to know a great deal about what a representative sample would look like, from census and similar data, and then ask lots of control questions to ensure that your sample is representative. It does not appear that either was done in this case. The other problem is that when you are sampling for something that is known to be very lumpy and relatively infrequent in its distribution (such as wartime violence and premature mortality), then even assuming you have avoided a systemic sample bias or design effect, a small sample size is much more susceptible to being thrown out by the presence or absence of just a few lumps. Take for example an attempt to use cluster sampling with a small sample to determine the varying probabilities of being paid by various poker machines in a large hall full of them. You randomly select 47 machines from, say, 1,000 of them, and then pull the lever 40 times on each - the only certainty is that your extrapolations of the percentages accorded to various winning combinations are going to be - taken all together - wildly inaccurate. For the very rare major jackpots you are going to be wrong even if you win - and you are going to be quite massively wrong. For the more frequent minor payouts, even assuming an entirely random lumpiness, you will require a sample size that is at least a multiple of the inverse of the frequency to avoid the near certain effect of pretty gigantic swings in extrapolated numbers.
Even with the very high Lancet claims about mortality we are talking about frequencies below 5%, so this sample size could only come close to being accurate if mortality were nearly perfectly evenly distributed, which it very clearly is not.
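The jackpot argument can be simulated. All parameters below are invented; the sketch shows that when the thing being measured is concentrated in a few "hot" clusters, repeated 47-cluster surveys of the same population swing wildly:

```python
import random

random.seed(2)

# Invented population: 1,000 clusters of 40 households each. About 5%
# of clusters are "hot" (a mean of 8 deaths per household); the rest
# are quiet (a mean of 0.1).
cluster_means = [8.0 if random.random() < 0.05 else 0.1
                 for _ in range(1000)]
true_total = sum(m * 40 for m in cluster_means)

def one_survey():
    """Draw 47 clusters, average, extrapolate to all 40,000 households."""
    sample = random.sample(cluster_means, 47)
    return sum(sample) / len(sample) * 40_000

estimates = [one_survey() for _ in range(20)]
print(round(true_total), [round(e) for e in estimates[:5]])
# The estimates swing widely around the true total: whether a given
# draw happens to catch 0, 2, or 5 hot clusters dominates the result.
```

Repeating the survey a few dozen times produces extrapolated totals that differ from one another by amounts comparable to the true total itself, which is the instability genwolf describes for lumpy, infrequent events.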

6) Sample Bias:
I think this is the most serious criticism of all. I pointed out in this thread at Harry's Place
http://hurryupharry.bloghouse.net/archives/2006/10/11/the_lancet_report.php
(where I post as Johan W) that the sampling bias occurs on a number of levels, including at the most micro level (a criticism that has been dubbed main street bias).
The sample has been weighted towards the most violent provinces and away from the least violent, towards medium and larger cities and away from smaller towns and villages, and within those towns towards the main streets and their immediate environs and away from the sort of secluded neighborhoods common in many Arab cities that are almost like large compounds that sit off and apart from thoroughfares and markets. Defenders claim that this is simply a function of weighting the samples by population, but the trouble is that the methodology does not so much assign a smaller proportion of samples to rural Iraqis, those living in smaller towns and villages, or those living in compound-like neighborhoods as entirely exclude them from being sampled at all. In Iraq this means that collectively the population thus excluded (rural, in smaller towns, or living in the less populated provinces) is more than 1/2 the population (the rural population alone is 1/3rd of all Iraqis). And sources like the IBC, which contain some geographic information, as well as the news, show that the violence, criminality, and suspension of basic services are massively concentrated in the areas the survey is designed to oversample. In addition we have few other ways of determining whether, even for the areas actually surveyed, we have a representative sample in terms of religious affiliation, income, ethnicity, or marital status and household size. We have age and gender, and that is it.

Together I think these are enough strikes to say that this survey should be struck out, and we really should be questioning whether incompetence or something worse is responsible for the whole sorry exercise.

10/22/2006 06:07:00 AM  
Blogger Larry said...

Three points I haven't seen above.

o The pre-war death rate in Iraq was allegedly quite high because of the sanctions. The allegation I read was approx. 500k "excess deaths" from higher infant mortality.

o The surveyors gathered no demographic data - highly irregular, and removes the ability to crosscheck the sample's validity.

o The use of only 43 clusters is highly unusual. The UN study last year used 2,000+.

10/22/2006 06:09:00 AM  
