Friday, October 20, 2006

Richard Miniter Versus the Lancet

Pajamas Washington DC editor Richard Miniter interviews Lancet study author Professor Gilbert Burnham at Pajamas Media. I am somewhat disappointed in Burnham, who seemed to have no answer to Miniter's specific questions other than to assert, time and again, 'but this is what our research methodology told us'. A sample of the interview is given below:


PajamasMedia: The Lancet study uses a baseline mortality rate (the rate during Saddam years) of 5.5/1000 – almost half of the mortality rate of Europe. The mortality rate in the EU is 10.10/1000. Given Europe’s excellent health care, public health infrastructure, and lack of war in the past 60 years, how is it possible that Iraq’s baseline is half that of the EU? Are you simply relying on pre-war publications or was the baseline itself generated by interviews with random clusters?

Burnham: This was a ‘cohort’ study, which means we compared household deaths after the invasion with deaths before the invasion in the same households. The death rate for these comparison households was 5.5/1000/yr.

What we did find for the households as a pre-invasion death rate was essentially the same number as we found in 2004, the same number as the CIA gives and the estimate for Iraq by the US Census Bureau.

Death rates are a function of many things—not just health of the population. One of the most important factors in the death rate is the number of elderly in the population. Iraq has few, and a death rate of 5.5/1000/yr in our calculation (5.3 for the CIA), the USA is 8 and Sweden is 11. This is an indication of how important the population structures are in determining death rates. (You might Google ‘population pyramid’ and look at the census bureau site—fascinating stuff.)

PajamasMedia: During the same period, Iraq is at war with Iran and itself. Public-health infrastructure was poor, although perhaps not as poor as today. Does it seem plausible to you that the baseline (or pre-war) mortality rate is accurate?

Burnham: Yes as above. Yes as being the right number, and Yes as what we need it for—comparisons in the same households before the war.

A moment's thought would have convinced Burnham that, except from a very narrow definitional point of view, a person contributes to the death rate whether he happens to die young or old. The death rate per 100,000 over the course of a person's life will be reflected in life expectancy. The higher the death rate per 100,000 the lower the average life expectancy. The lower the death rate the higher the life expectancy. Europe's population is old because it has a low death rate per 100,000. In fact, if the death rate were zero they would live forever.

But one might say that, however nonsensical his definition, Burnham is comparing like-for-like pre-OIF statistics against post-OIF statistics, and that whatever problems the underlying concept of his definition may have, the comparison is apples to apples. But how does he measure his apples?

PajamasMedia: You conducted interviews in 47 clusters, 12 of them were in Baghdad, 2 were in Basra, and 3 were in Anbar. Approximately 25% of the estimate comes from Baghdad (only 21% of the population in Baghdad). This seems disproportionate. Is it possible that you over-sampled “hot zones” relative to population?

Burnham: We used the 2004 UNDP/MoP estimates for governorates. We divided the total population by the number of clusters, and then moved in a systematic way through the population assigning clusters proportionate to the population numbers we were using. ... There are always chances that sampling was done in more hot spots, but there is an equal chance that a natural human tendency to self-preservation might cause sampling to be the other way, to unconsciously sample in cool spots where one might be safer.

Burnham is apparently using a kind of stratified sampling methodology. He uses the UNDP/MoP to identify different population groups within Iraq and then goes out and measures groups that he thinks correspond to these populations, obtains his statistic and then extrapolates it according to the known weights. This means that if he identifies his sample with the wrong population groups, both his weights and his results will be wrong as well. But as you can see he doesn't know whether he is sampling hot spots -- the unrepresentative spots -- at all. In fact, he argues that they may have been "cool spots". But he has to know: because unless he knows, given the methodology he employs, his survey will be useless. Let's take an example.

Trees grow better in a valley where they have abundant water and soil and grow more poorly on hilltops where the soil is thin and dry. Your job is to estimate the timber in the forest. You have an estimate of the proportion of hilltops to valleys. A man is put blindfolded into a forest to inventory it. When he is already in the midst of it the blindfold is removed so he can measure the trees, but he can't see whether he is on a hill or in a valley. Can he extrapolate the tree volumes without knowing whether he was on a hill or in a valley? What value would a timber inventory have absent that knowledge? If Burnham didn't know whether he was in a cool or hot spot, he would lack the most critical piece of information to make his study work.
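The timber example can be put into numbers. The following is only a sketch under invented assumptions: the valley/hill proportions and per-plot volumes are hypothetical, and `stratified_mean` is an illustrative helper, not anything taken from the Lancet study. It shows how the same measurements give a very different forest-wide estimate when plots are booked against the wrong stratum.

```python
# Hypothetical numbers throughout: none of these come from the Lancet study.

def stratified_mean(stratum_means, stratum_weights):
    """Forest-wide mean as the weight-averaged stratum means (weights sum to 1)."""
    assert abs(sum(stratum_weights.values()) - 1.0) < 1e-9
    return sum(stratum_means[s] * stratum_weights[s] for s in stratum_weights)

# Assumed known proportions of terrain: 25% valley, 75% hilltop.
weights = {"valley": 0.25, "hill": 0.75}

# Assumed true timber volume per plot (cubic metres) in each kind of terrain.
true_means = {"valley": 100.0, "hill": 20.0}

# Surveyor who knows where he stood: 0.25*100 + 0.75*20
true_volume = stratified_mean(true_means, weights)

# Blindfolded surveyor who books valley plots as hill plots and vice versa:
# 0.25*20 + 0.75*100
mislabelled_means = {"valley": true_means["hill"], "hill": true_means["valley"]}
wrong_volume = stratified_mean(mislabelled_means, weights)

print(f"correctly labelled estimate: {true_volume} per plot")
print(f"mislabelled estimate       : {wrong_volume} per plot")
```

With these made-up figures the mislabelled inventory comes out at double the correctly labelled one, even though every tree was measured accurately.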

As I said, I'm disappointed.

8 Comments:

Blogger Tom Paine said...

You are disappointed with his apparent ignorance of basic methodology.

I suspect he is simply a fraud pretending to be a fool. It’s a common tactic among frauds – because the penalties for “mistakes” are much lower and less likely to be applied.

This was political “science”.

10/20/2006 04:07:00 PM  
Blogger RWE said...

Would it not seem reasonable that pre-OIF numbers were based on the most accessible and politically correct population, i.e., Sunnis - and especially Ba'ath party members? And they enjoyed unusual privilege in Saddam's Iraq.

If you had asked Saddam's government to conduct studies, that is where they would have sent you. Or sent his friends in the U.S.

In contrast, the Marsh Arabs would have been officially nonexistent.

And today the Sunnis are in a world of hurt that is largely their own making - and the Marsh Arabs probably think they are entering a new golden age.

10/20/2006 05:21:00 PM  
Blogger wretchardthecat said...

pidgas,

Yes. Burnham is definitionally correct about the death rate when he regards it as a snapshot. He is making a statement, I think, about the difference between two snapshots and inferring from the differences a rate of change. But then in comparing snapshots it is important that they be comparable. The nice thing about looking at a situation over time is that it is not dependent on the vagaries of when the snapshot is taken. Viewed as a movie, that is over time, it is clearly safer to be in Europe than in Iraq.
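The snapshot-versus-movie distinction can be illustrated with a small calculation. This is a sketch with made-up numbers, not real demographic data for Iraq or Europe: the age groups, population shares and age-specific rates are all hypothetical. The "snapshot" is the crude death rate, which is the age-specific rates weighted by each age group's share of the population.

```python
def crude_death_rate(age_shares, age_specific_rates):
    """Deaths per 1,000 per year: age-specific rates weighted by population share."""
    return sum(age_shares[g] * age_specific_rates[g] for g in age_shares)

# Hypothetical age-specific death rates, per 1,000 people per year.
young_country_rates = {"0-14": 4.0, "15-64": 4.0, "65+": 60.0}
old_country_rates = {"0-14": 1.0, "15-64": 2.0, "65+": 45.0}

# Hypothetical population shares by age group (each set sums to 1).
young_country_shares = {"0-14": 0.40, "15-64": 0.57, "65+": 0.03}
old_country_shares = {"0-14": 0.15, "15-64": 0.65, "65+": 0.20}

young_crude = crude_death_rate(young_country_shares, young_country_rates)
old_crude = crude_death_rate(old_country_shares, old_country_rates)

print(f"young country crude rate: {young_crude:.2f} per 1,000")
print(f"old country crude rate  : {old_crude:.2f} per 1,000")
```

In this invented case the older country is safer at every age, yet its snapshot rate is higher because so many more of its people are elderly. Which of the two measures is the relevant one for a before-and-after comparison is exactly what is being argued in this thread.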

Now let's look at his snapshots. He took 47 sample points nationwide. For administrative reasons, Burnham's team stayed near the cities. "We sampled from all administrative units selected. Using the standard nearest-front-door approach, the teams would not likely select isolated homesteads which were many miles away from the area being sampled." Nor did he visit the same neighborhoods. "But in both surveys we started at 1 January 2002, thus giving us a chance this time to confirm the findings from 2004, though we visited none of the same neighborhoods in 2006." So it's not as if he had a patch of growth he revisited each year to check on the delta. He inspected what he thought were comparable sets and measured them against sets surveyed previously.

Now let's consider Mr. Burnham's claim that "Overall, 13% of deaths were attributed to airstrikes" because this gives us a window into his data. A sample of his sample, so to speak. That's 85,000 deaths from airstrikes. The problem with this is that airstrikes have virtually ceased since 2004. Now maybe "airstrikes" is really understood to be all kinds of fire. Could this be true?

Where would these airstrikes have happened? Where Burnham sampled, of course, because he claims these deaths are backed by death certificates. Now recall that of his 47 clusters, 12 were in Baghdad, 2 were in Basra, and 3 were in Anbar. But wait! Basra is in the British sector and there have been no air missions in Basra to speak of. As for Baghdad, which has 1/4 of the clusters, at least 20,000 people would have died in airstrikes within full view of the media since 2004. Anbar is the place where most airstrikes would probably have occurred. But it's a sparsely populated area and contains 3/47 clusters. Could it be that most of the airstrikes were there?
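The arithmetic behind those figures can be laid out explicitly. This is a rough sketch, not the study's own calculation: it takes the 655,000 total discussed elsewhere in this thread and the 13% airstrike share quoted above, and it assumes (as the argument here implicitly does) that airstrike deaths are spread across clusters in proportion to the cluster count.

```python
TOTAL_DEATHS = 655_000        # the study's headline figure as discussed in this thread
AIRSTRIKE_SHARE_PCT = 13      # "13% of deaths were attributed to airstrikes"
TOTAL_CLUSTERS = 47
CLUSTERS = {"Baghdad": 12, "Basra": 2, "Anbar": 3}

# 13% of the total, kept in integers to avoid rounding noise.
airstrike_deaths = TOTAL_DEATHS * AIRSTRIKE_SHARE_PCT // 100

# Assumption: deaths are apportioned by each area's share of the 47 clusters.
implied_by_area = {
    area: airstrike_deaths * n / TOTAL_CLUSTERS for area, n in CLUSTERS.items()
}

print(f"airstrike deaths implied overall: {airstrike_deaths:,}")
for area, deaths in implied_by_area.items():
    print(f"  {area}: {CLUSTERS[area]}/{TOTAL_CLUSTERS} clusters -> about {deaths:,.0f}")
```

On that proportional assumption Baghdad's 12 clusters imply a little over 20,000 airstrike deaths, which is where the "at least 20,000" figure comes from.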

So we come to the question of what his snapshots really look like. And as I said, I am disappointed.

10/20/2006 06:28:00 PM  
Blogger Kinuachdrach said...

Let's face it -- statistics is one of the most difficult subjects to truly comprehend. Philosophers have been arguing about Frequentist versus Epistemic interpretations of probability for hundreds of years, without solving the problem.

Let's also remember that the great English statistician, Ronald Aylmer Fisher, went to his grave arguing that activists were misusing statistics when they fingered smoking as the cause of lung cancer. There are things that ordinary people have been told to believe based on statistics that do not pass muster with real statisticians.

Since most of us don't have the background to comprehend the issues involved in the statistical analyses, we have to look at the apparent reasonableness (or unreasonableness) of the conclusions. And if all of this sounds vaguely reminiscent of the "global warming" debate, there is a reason for it!

10/20/2006 06:59:00 PM  
Blogger wretchardthecat said...

Pid,

The effects of war can be directly observed on the population of a country or by enumeration. For example, the Great War left a demographic dip in the population of Western Europe as did the Great Patriotic War on Russia. More recently the effects of war can be directly observed in Africa where large numbers of orphans and abandoned older people are in evidence. Where the war is still in progress the cumulative effects may still be unappreciable, so you take a series of snapshots and then infer the result.

I'd like to compare the problem to that of estimating a person's asset base. You can either take a snapshot of earnings in two different periods and from there infer the change in assets or you can measure the change in assets directly and from there calculate the change in income. In Burnham's case he compares two snapshots through his cohort methodology. But bear in mind the alternative, which is to examine the cumulative effect. And that I argue is by now possible.

He has conducted his study at a time of regime transition. The baseline is the Saddam era. The presumption is that deaths were reported accurately at that time, so that presumably there were death certificates which said "tortured to death" as there are now evidently death certificates which say "died by airstrike". Here is where the cohort system becomes immediately vulnerable to the framing of the snapshot. If changes have been made in measuring an effect then the data afforded by a cohort comparison begins to acquire a large error term. We read in the papers that the incidence of breast cancer has doubled or tripled, but much of that effect is due to the fact that breast cancers are being diagnosed at much earlier stages today than decades ago. So the first thing to observe is that the death certificates in Dr. Burnham's control period, if he had any death certificates then, were issued under the ancien regime and the ones in the next data point were issued by a new regime.

This brings us to an immediate problem. The Iraqi government was not constituted until 2005. So the "certificates" issued today are not obviously comparable to anything that came before. Moreover, there has been a policy of compensating civilians for collateral deaths, and for pecuniary reasons there is an incentive to die by "airstrike" as opposed to dying of, say, cholera. I will submit that for those reasons, about the only reliable part of Burnham's methodology will be his interviews.

Now I've done quite a bit of field interviewing myself and one thing you look for is collateral effects: 655,000 deaths are the equivalent of about 7 million deaths in America and at that level there will be a lot of collateral effects. When you ask a farmer whether he did well last year you also ask questions like when did you buy your farm animal or when did you buy a new suit of clothes to provide the collateral. Where is the collateral of Burnham?
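The "about 7 million" equivalence is a simple per-capita scaling. The sketch below uses round population figures that are assumptions for illustration (roughly 27 million for Iraq and 300 million for the United States in 2006), not numbers taken from the study or from this thread.

```python
DEATHS_IN_IRAQ = 655_000

# Assumed round 2006 populations, for illustration only.
IRAQ_POPULATION = 27_000_000
US_POPULATION = 300_000_000

# Scale the death toll by the ratio of the two populations.
deaths_per_capita = DEATHS_IN_IRAQ / IRAQ_POPULATION
us_equivalent = deaths_per_capita * US_POPULATION

print(f"share of Iraq's population: {deaths_per_capita:.1%}")
print(f"equivalent toll at US scale: about {us_equivalent / 1e6:.1f} million")
```

Under those assumed populations the toll is a bit over 2% of all Iraqis, which scales to roughly 7 million at American size.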

Let's consider the airstrikes. The Second Falluja produced on the order of 3,000 insurgent deaths in a campaign that visibly damaged, some would say leveled, the town. Any campaign of airstrikes that would kill nearly 20 times as many would probably be obvious. Among other things, bombs leave craters. Where are the craters?

In Lebanon, for example, at the conclusion of a month's intensive "genocidal" bombing how many deaths does Lebanon claim to airstrikes? Not nearly anything like 55,000. We know that bomb craters can be counted because the UN mine clearance people claim to be marking the impact holes and even the "cluster bomb" submunitions. In Iraq, Dr. Burnham claims 50K+ fatalities from airstrikes. Where are the craters?

Returning to the methodology, measuring the cumulative effect of a rate is always better than estimating and comparing the instantaneous rate. Hence I am always better off with aerial photography of a forest than doing cohort studies on the rate of growth of stands and extrapolating the timber volume from that study.

If you finished a cohort study which concluded that the forest, by extrapolation, was denuded, yet aerial photography did not show any logged-over areas, then you would be exactly in the position of accepting Burnham's proposition that 655,000 people died without observing the masses of widows, orphans, mass graves and bomb craters that one would expect to accompany such an enormous loss. My parents were unfortunate enough to go through a major urban battle. It only killed 100K people but what it did was produce a flood of refugees and displaced persons whose existence was palpable. One quarter of Burnham's sample is from Baghdad. Where are the refugees? It is not widely appreciated but Iraq is host to millions of Shi'ite pilgrims every year. There is not a single historical instance I can think of where tourists continued to visit a country beset on the scale claimed.

Deaths on Burnham's scale show up in numerous ways, none of which he can readily point to. To return to the analogy, Burnham says he has observed the cash register 78 times on several occasions and claims you are broke. You look at your account and see that you are not broke. But his cohort study proves you are broke. So are you broke? I don't think it is enough to say "my methodology shows you are broke so you must be broke".

But there is a further problem with considering War a "treatment" whose effects can be studied on a control group of patients. There have been the actions of Syria, Iran and al-Qaeda for example. And there are more subtle changes too. For example, many ethnic Kurds have actually gone to Kurdistan because of the prosperity and peace there. Would that not have changed the mix of Burnham's sample?

Notice that not once have I impugned Dr. Burnham's integrity and accused him of malice. However one of the hallmarks of a good researcher is that he subjects his calculations to a sanity check. If your instrument indicates that a person is brain dead but that person is observed to be playing basketball you might want to check your instrument. I am not persuaded by Dr. Burnham's 655,000 figure.

I think the surplus deaths figure is probably in the 100K range, mostly concentrated in the male population, and here's why. Nearly every death incident which is observed directly involves people with this profile. These are people killed in attacks on police stations, recruiting offices, checkpoints, etc. These instances are often verified by US military, Iraqi government agents and the press. So when we say 75 men killed in Habaniyah it is a better sample than one of Dr. Burnham's interviews. Unless there is a parallel universe out there that nobody but Dr. Burnham sees, the results of extrapolating the one sample should coincide pretty fairly with the other.

As I said, I'm disappointed in Dr. Burnham.

10/20/2006 11:00:00 PM  
Blogger The Wobbly Guy said...

"Never attribute to malice what can be adequately explained by ignorance or stupidity."

For the left, of which Burnham is probably a member, I'd revise the quote: Never attribute to stupidity what can be adequately explained by malice.

As a supposedly learned man of letters and education, he can hardly be considered stupid in his field of study. Apply Occam's Razor, and we can conclude he's only out to cast the War in the harshest light possible.

10/21/2006 12:59:00 AM  
Blogger 3Case said...

Wait a second...wait a second....

What's that old saying?..."Figures never lie...."

10/21/2006 09:11:00 PM  
Blogger genwolf said...

The collateral effect argument is one that the IBC (see http://www.iraqbodycount.org/press/pr14.php) has mounted as a criticism of the study - and they have gone to some lengths in showing just how serious this shortcoming is.

Some of the other key criticisms that I have yet to see convincingly answered by either the study's authors or defenders include, in no particular order:

1) The claim of internal self consistency is a false one. The study's authors make the claim that the 2006 results confirm the results from the 2004 survey in that roughly similar mortality rates occur in both. In fact when the results are examined there is no such self consistency at all:
see http://hurryupharry.bloghouse.net/archives/2006/10/15/lancet_redux.php
In essence, whilst the figures in Lancet 1 and 2 for total mortality are relatively close, the composition of the mortality in the 2 surveys is completely different. This would be like doing 2 surveys of fruit production that both produce roughly the same figure for a particular period in terms of total tonnage but, when examined, show that one survey says that for the same period mostly oranges were produced and the other says that mostly apples were. Such a study is not consistent, and it is difficult to believe that the surveyors failed to notice this.

2) Verification: When doing this kind of surveying one of the most critical aspects of ensuring that your raw data is sound involves back checking. In this case it is very difficult to see how this back checking could have been accomplished - as the survey also assures us that no identifying information on respondents was collected.

3) A few people (including myself) who have worked in market research are somewhat incredulous at the claimed survey response levels and claimed compliance.

4) The question of the baseline used is one of the most serious criticisms leveled at the survey. The snapshot year chosen as the baseline is 2002-03. I don't think that a single serious historian in the world could claim with a straight face that this year could be called representative of Saddam's reign in terms of mortality, and that is aside from the criticism that the baseline figure is itself determined using the methodology that is under question in determining post invasion mortality.

5) Sample size:
There is some confusion amongst both critics and defenders of Lancet about what the sample size criticism really is. Defenders are right to point out that in this type of survey it is more important that the sample be representative than large (see next point) and that there is precedent for extrapolating from small sample sizes. The problem is that you need to know a great deal about what a representative sample would look like, from census and similar data, and then ask lots of control questions to ensure that your sample is representative. It does not appear that either was done in this case. The other problem is that when you are sampling for something that is known to be very lumpy and relatively infrequent in its distribution (such as wartime violence and premature mortality), even assuming you have avoided a systemic sample bias or design effect, a small sample size is much more susceptible to being thrown out by the presence or absence of just a few lumps. Take for example an attempt to use cluster sampling with a small sample to determine the varying probabilities of being paid by various poker machines in a large hall full of them. You randomly select 47 machines from, say, 1,000 of them, and then pull the lever 40 times on each. The only certainty is that your extrapolations of the percentages accorded various winning combinations are going to be, taken all together, wildly inaccurate. For the very rare major jackpots you are going to be wrong even if you win, and you are going to be quite massively wrong. For the more frequent minor payouts, even assuming an entirely random lumpiness, you will require a sample size that is at least a multiple of the inverse of the frequency to avoid the near certain effect of pretty gigantic swings in extrapolated numbers.
When, even with the very high Lancet claims about mortality, we are talking about frequencies below 5%, this sample size could only come close to being accurate if mortality was nearly perfectly evenly distributed, which it very clearly is not.

6) Sample Bias:
I think this is the most serious criticism of all. I pointed out in this thread at Harry's Place
http://hurryupharry.bloghouse.net/archives/2006/10/11/the_lancet_report.php
(where I post as Johan W) that the sampling bias occurs on a number of levels, including at the most micro level (a criticism that has been dubbed main street bias).
The sample has been weighted towards the most violent provinces and away from the least violent, towards medium and larger cities and away from smaller towns and villages, and within those towns towards the main streets and their immediate environs and away from the sort of secluded neighborhoods common in many Arab cities that are almost like large compounds that sit off and apart from thoroughfares and markets. Defenders claim that this is simply a function of weighting the samples by population, but the trouble is that it would appear that the methodology does not so much assign a smaller proportion of samples to rural Iraqis or those living in smaller towns and villages or those living in compound-like neighborhoods as it entirely excludes them from being sampled at all. In Iraq this means that collectively the population thus excluded (rural, smaller towns or living in the less populated provinces) is more than 1/2 the population (the rural population alone is 1/3rd of all Iraqis). And sources like IBC, which contain some geographic information, as well as the news, show that the violence, criminality, and suspension of basic services are massively concentrated in those areas the survey is designed to oversample. In addition we have few other ways of determining whether, even for the areas actually surveyed, we have a representative sample in terms of religious affiliation or income or ethnicity or marital status and household size. We have age and gender and that is it.
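The poker machine thought experiment from point 5 is easy to simulate. What follows is a sketch under invented assumptions, not a model of the Lancet survey: the 47 machines out of 1,000 and 40 pulls each come from the example, while the share of "hot" machines and both payout probabilities are hypothetical values chosen to make payouts lumpy. It repeats the same small survey 200 times and reports how far the extrapolated payout rate swings.

```python
import random

# All machine counts and payout rates below are hypothetical illustration values.
N_MACHINES = 1000      # machines in the hall
N_SAMPLED = 47         # machines visited per survey
PULLS = 40             # lever pulls per sampled machine
HOT_FRACTION = 0.05    # assumed share of "hot" machines where payouts cluster
HOT_RATE = 0.50        # assumed payout probability on a hot machine
COLD_RATE = 0.01       # assumed payout probability everywhere else

def true_payout_rate():
    """Hall-wide payout probability per pull, known here only because we built the hall."""
    return HOT_FRACTION * HOT_RATE + (1 - HOT_FRACTION) * COLD_RATE

def survey_once(rng):
    """Sample machines at random, pull each lever, return the estimated payout rate."""
    n_hot = round(N_MACHINES * HOT_FRACTION)
    hall = [HOT_RATE] * n_hot + [COLD_RATE] * (N_MACHINES - n_hot)
    sampled = rng.sample(hall, N_SAMPLED)
    wins = sum(1 for rate in sampled for _ in range(PULLS) if rng.random() < rate)
    return wins / (N_SAMPLED * PULLS)

rng = random.Random(2006)  # fixed seed so repeated runs give the same numbers
estimates = [survey_once(rng) for _ in range(200)]

print(f"true rate       : {true_payout_rate():.4f}")
print(f"lowest estimate : {min(estimates):.4f}")
print(f"highest estimate: {max(estimates):.4f}")
```

The spread between the lowest and highest estimate depends mostly on how many hot machines happen to land in the sample of 47, which is the "few lumps" problem described in point 5. Making the payouts less concentrated (for example, setting `HOT_RATE` closer to `COLD_RATE`) narrows the swings considerably.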

Together I think these are enough strikes to say that this survey should be struck out, and we really should be questioning whether incompetence or something worse is responsible for the whole sorry exercise.

10/22/2006 06:07:00 AM  
