The Royal Statistical Society Schools Lecture 2004: 'Lies and Statistics', Part 1


Original Article. Teaching Statistics (ISSN 0141-982X), Volume 28, Number 2, Summer 2006. Blackwell Publishing Ltd, Oxford, UK.

KEYWORDS: Teaching; Misinterpretations of statistics; Averages; Coin tossing; DNA profiles.

Frank Duckworth
Stinchcombe, Gloucestershire, England
e-mail: f.duckworth@rss.org.uk

Summary
This article is the first of a two-part printed version of the Royal Statistical Society's Schools Lecture for 2004, on 'Lies and Statistics'.

INTRODUCTION BY THE EDITOR

The Royal Statistical Society provides a schools lecture service for schools and colleges (and universities as part of outreach activities) throughout the UK. The lecturer each year is appointed as the Guy lecturer in honour of William Augustus Guy (1810–1885), a celebrated early medical statistician. The Guy lecturer for 2004 was Frank Duckworth, who has kindly transcribed his lecture into a print version for publication in the journal. We are very pleased to publish it, as we believe it will be interesting and useful for all our readers. Frank is very well known in sporting circles as the co-inventor of the Duckworth/Lewis method for setting revised targets in rain-interrupted one-day cricket matches. He provided an article about this in the journal (Duckworth 2001). However, the lecture is not about this method. We gratefully acknowledge the permission of the Royal Statistical Society to publish the lecture. It is a talk that is about an hour long, so the printed version is being serialized in this issue of the journal and the next. Readers should keep in mind that the lecture is directed at an audience of pre-university students in the UK. The text is adjusted for presentation as a written paper and customized for an international readership. Some of the author's comments on what happens during the lecture are given in square brackets, as are a few other asides.

STATISTICS AND STATISTICS

I've chosen 'Lies and Statistics' as the title of my talk, and my whole object will be to try and dispel any myths about statistics being associated with lies. [But before proceeding with the talk I carry out an experiment (see 'Think of a number' in Part 2) and ask the audience to keep their answers hidden until near the end of the talk.] Sometime during the 1950s, The (London) Times reported that Sir Winston Churchill had quoted from Mark Twain, who had attributed the following remark to Disraeli:

'There are three kinds of lies; there are lies, damned lies and statistics.'

We poor statisticians have been trying to live down this notorious quote ever since. [Actually, there's considerable uncertainty over the real origin of this quote. I am now told that it is most likely due to Leonard Henry Courtney (1832–1918).] Of course, to equate statistics with lies is totally unjust, but then I would say that, wouldn't I? So what I'm going to try and do is to offer a logical and indisputable defence of statistics and statisticians; and I'm going to do this by giving you examples to illustrate where statistics and lies have been wrongly equated.

The truth is that there are two quite different meanings of the word statistics. First, there is the word which is the plural of statistic and which is synonymous with data. Statistics are numbers: figures, pieces of numerical information. And that is all: no messages, no conclusions, no statement about society or whatever else they may relate to; just numbers. Then there is the subject called statistics, which is a singular word like mathematics or physics. It's a subject you can study and, if you're successful in the appropriate examinations, practise.

What you do when you practise statistics/singular is to analyse statistics/plural and try and learn something from them. And to do this you use specialist mathematical techniques that have to be learnt and applied properly; techniques that you won't understand unless you've studied statistics/singular, for instance regression, correlation, analysis of variance or significance testing. The techniques of statistics/singular first came to prominence during the 1920s and 1930s, and it was the British who were the pioneers; people like Sir Ronald Fisher, who worked at the agricultural research station at Rothamsted. Indeed, it was in the field of agriculture that modern statistics first took off. But the point I am making is that the methods of statistics/singular were not invented until long after Disraeli [or Courtney] was dead and buried, so there's no way that he could have wished to imply that the conclusions of statistical analysis were lies.

MISLEADING STATISTICS

I don't know the context of the quotation, but I suspect that the point it was making is that statistics/plural can be misleading. Indeed, that is still very true today. But it's not the statisticians' fault; rather, it is the fault of the unqualified commentators who are reporting them or drawing implications from them. Here are a couple of examples.

The first of these is school league tables based on A-level results. When I first prepared this talk about 4 years ago, St Paul's School, Westminster, was top of the league, and this school has been top, or very nearly so, since league tables were first produced several years ago. I'm quite sure that this is an excellent school with a very high standard of teaching, but because it is top of the league it does not mean that it is necessarily the best. To get good results, you need two things: you need good teaching and you also need good students. Now, entrance to many of the country's top independent schools is by examination, so a perfectly valid alternative interpretation of the league tables is that they indicate the relative difficulty of the entrance examinations. League tables are not a fair measure of the teaching quality of the schools, and hence they should not be used as a valid basis for parents to decide to which school they should send their child. If that is the intended purpose of league tables, then you should report a measure not of the examination results themselves but of these results adjusted for the ability of the students on admission. You would thus obtain a measure of the 'added value' of the school. In fact, statisticians working at the Institute of Education have recently produced such an alternative measure.

The second example, which is a favourite of mine as a sporting statistician, is football league attendance figures. In the years following the end of the Second World War, attendances were higher than they had ever been before and indeed have ever been since. During the 1950s and early 1960s, however, they suffered a steady decline as other interests, such as television and the motor car, competed for the working man's attention on a Saturday afternoon. Then, after England won the World Cup in 1966, attendances rose quite substantially and they maintained their high level for a few years, until after England's ignominious exit from the 1970 competition, when they started to drop off again. During the early 1970s, the press gave great publicity to this decline in attendances by regularly publishing the weekly figures, which always seemed to be lower than those for the corresponding week of the previous season. I started to get interested in these figures as I suspected that there was no proper statistical analysis carried out on them before reporting the verdicts. Indeed, I found some rather amazing inconsistencies. For example, figure 1 is a copy of a piece in the Daily Telegraph. The heading was 'Gates down' (gates being the shorthand expression for the number of people passing through the entrance turnstiles) and it gave the comparative figures shown. Note the comment: 'Compared with last year when three more matches were played.'

GATES DOWN

Football league attendances on Saturday were 48,514 lower than on the corresponding Saturday last season, when three more matches were played. Figures, with the previous year's in brackets, are given below:

Division 1:  175,902  (234,523)
Division 2:  143,222  (136,008)
Division 3:   82,155   (78,991)
Division 4:   53,091   (53,362)
Totals:      454,370  (502,884)

Fig. 1. Football league attendances



When I looked into the figures I found that those three more matches were all in the first division, and first division attendances averaged about 25,000, so if you look closely at the figures you can see that the actual trend was upwards. Nevertheless, the total figure was indeed less than that for the previous year, so it produced the heading 'Gates down'. The statistics were apparently giving completely the wrong message. Of course, there are many more examples of statistics presented so that the wrong message is conveyed; and it is this phenomenon that tarnishes the name of statistics/singular and its practitioners. But I repeat that it is not the statistics that lie, and it is not the statisticians who lie; it is the people who present them in a way that conveys the wrong message who are to blame.
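A rough arithmetic check makes the point. In the sketch below (illustrative only, in Python), the totals are taken from figure 1, and the 25,000 average first-division gate is the approximate figure quoted above, so the adjusted comparison is necessarily approximate.

    # Rough like-for-like check of the 'Gates down' figures.
    this_week = 454_370          # total attendance this season (figure 1)
    last_week = 502_884          # corresponding Saturday last season (figure 1)
    extra_matches = 3            # extra first-division matches played last season
    avg_div1_gate = 25_000       # approximate average first-division attendance

    adjusted_last_week = last_week - extra_matches * avg_div1_gate
    print(adjusted_last_week)                # about 427,884 on a like-for-like basis
    print(this_week - adjusted_last_week)    # about +26,486: the trend is upwards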

WHAT CAN AND WHAT CANNOT STATISTICS DO?

What are statistics/plural and what is the objective of statistics/singular? This is the way I see the situation. The world exists as it does and the world, and the people in it, behave in a certain way. Because the world behaves as it does, things happen. We can observe the things that happen and describe these things by making measurements, and these measurements yield statistics/plural. The statistics may be school examination results or football attendances; or they may be the monthly ice cream sales in Kensington Gardens. It is the object of statistics/singular to examine them and try to learn from them something useful about the processes that created them, and this is usually a very difficult thing to do. The main problem is that there are many alternative states of the world that can produce a given set of observations. For any given state of the world, one can attempt to work out the statistics that would be produced, but it is quite a different matter to start with the statistics and say something about the world that produced them. Because of this it is often the case that what may appear to be conclusions to an unqualified reporter or analyst are just not true, and that is why many people still associate statistics with lies.

For example, suppose I were to select a random sample of 20 Welshmen and another of 20 Englishmen, obtain their heights and find that the average height of my 20 Welshmen is 1.80 metres and that for my 20 Englishmen is only 1.73 metres. Would I be right in drawing the conclusion that Welshmen are taller on average than Englishmen? [The question is directed at some specific students and usually elicits the response 'no'.] Of course I wouldn't. There is considerable variation in the heights of both Welshmen and Englishmen and it may be that the sample of Welshmen I selected just happened to have included more tall men. But suppose I then randomly select 200 men from each country and find that the Welshmen average 1.78 metres and the Englishmen 1.75 metres. Even though the difference is less, because the sample was much larger it may well suggest that Welshmen really are taller. But in either case, statistics can never prove that Welshmen are taller on average than Englishmen. All statistics can do is to make an assumption about the truth and invoke the concept of probability to say whether or not the data are consistent with that supposed truth.

Statistical analysis is usually carried out on samples because it is often just not possible to measure the entire population. So we take samples and use statistical analysis to try to draw conclusions about the populations from which these samples were drawn. But here is the problem. Statistics from a sample can never prove anything about the population. Whatever the difference in the average heights of my samples of Welshmen and Englishmen, the measurements can never prove that the average Welshman is taller than the average Englishman. All one can do with statistics is to write down a hypothesis and then look at the data and say whether or not they are consistent with that hypothesis being true. For instance, let us make the following hypothesis:

There is no difference in the average height of Welshmen and Englishmen.

Then we might look at the measurements we have taken, do some mathematical juggling with the figures, and say: 'If there really is no difference, the chances of getting such a difference in average height from samples of 200 are only 20%, i.e. 1 in 5.' Now, whether that means that Welshmen are taller than Englishmen or whether it just happens that the sample has been biased towards tall Welshmen is entirely one's own judgement. Statistical analysis can do nothing more than provide the probabilities, and the probability in this case is 1 in 5. It has come to be regarded over the years that if the probability of your observation is as low as 5%, that's 1 in 20, then you should sit up and take notice; and when we have this low level of probability, we normally say that the observation is statistically significant. But that does not mean we have established truth. If we carry out sufficient studies, then every so often we'll conclude that an observation is statistically significant when in truth there is no effect. Don't blame statistics; blame life, the universe and everything.
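The 'mathematical juggling' can be pictured with a small simulation: assume the hypothesis of no difference is true, generate many pairs of samples of 200, and count how often a gap in the averages at least as large as the one observed turns up by chance. The sketch below is only illustrative: the 0.07 m standard deviation of heights and the 0.03 m observed gap are assumptions introduced here, and the resulting proportion will not in general reproduce the 1-in-5 figure used in the talk, which is itself purely illustrative.

    # A minimal sketch of a significance test by simulation, under the
    # hypothesis that Welshmen and Englishmen have the same average height.
    # The standard deviation (0.07 m) and the observed difference (0.03 m)
    # are illustrative assumptions, not figures from the lecture.
    import random

    def simulated_p_value(n=200, sd=0.07, observed_diff=0.03, trials=5_000):
        count = 0
        for _ in range(trials):
            welsh = [random.gauss(1.75, sd) for _ in range(n)]
            english = [random.gauss(1.75, sd) for _ in range(n)]
            gap = abs(sum(welsh) / n - sum(english) / n)
            if gap >= observed_diff:
                count += 1
        return count / trials    # share of 'no difference' worlds showing such a gap

    print(simulated_p_value())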

CAUSE AND EFFECT

Another common area where statistics might easily be misinterpreted is the case of mistaking correlation with causation. For instance, I remember seeing this headline (the dialect being taken from the Cockney song upon which it was based):

'It's the rich wot live the longer.'

This referred to a report on a study which showed that there was a statistically significant correlation between how long you live and how much you earn. I am not for a moment suggesting that the conclusion was incorrect, and that this was one of those cases where statistical significance did not mean truth. What I am saying is that one cannot look at associations of this sort and draw conclusions about the cause. The implication of the way the report was presented was that money bought long life; presumably if you had money then you could afford the best treatments or jump the queue for a hospital bed. There may well be some truth in this, but there are many other ways of interpreting the data presented.

The pitfalls of correlation studies were well illustrated many years ago by a famous study on storks and babies. It used to be the tradition to tell children who were too young to be told the real facts of life that babies were brought by storks. Some years ago, a statistician randomly selected about 50 Swedish towns and counted the numbers of storks' nesting places in each. Then he went to the local registrars and obtained statistics on the numbers of babies born that year in each town, and he showed a most remarkable correlation: those towns with the most nests had the highest numbers of births per year and vice versa. Of course, the truth was that both these quantities independently correlated with the size of the town. The bigger the towns, the greater were the numbers of chimney stacks, and hence the more storks' nests; also the greater were the numbers of houses and hence people and hence babies. [I cannot recall the specific reference for the story, which I read over 30 years ago, but a good article which uses the same example to warn of the dangers of confusing correlation with causation is Matthews (2000).]

THE LAW OF AVERAGES

Another concept that can lead to confusion is that of 'average'. We are all familiar with statements made about how disgusting it is that so many people earn less than the average wage. A few years ago The Times ran a succession of correspondence on this topic. Some politician had made the outrageous statement that he regarded anyone earning less than the average wage as living in poverty. In fact, the strict definition of poverty is earning less than half the average wage, and that too presents problems; but that is another subject. First of all, we had letters pointing out that the only way to avoid anyone earning less than the average was for everybody to earn the same, even Richard Branson or Bill Gates. Then someone else wrote an equally silly letter, saying that however fair the distribution of income, there will always be 50% of workers who earn less than the average. This person was, of course, confusing the word 'average' with another quantity, the median, but this confusion was illustrated with a most excellent letter which had the effect of bringing the correspondence to an abrupt end. And we're back now with Welshmen. This is what it said:

'Sir, Bearing in mind the few Welshmen who, for one reason or another, have the misfortune to have only one leg and the even fewer who have no legs at all, then each Welshman on average has only 1.9995 legs. It can therefore be stated in all honesty that 99.95% of Welshmen have more than the average number of legs.'
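The arithmetic behind the letter can be reconstructed under a hypothetical incidence of leg loss; the one-in-2000 figure below is invented purely to reproduce the 1.9995 quoted, but it shows how the mean can sit just below what almost everybody has (the median, of course, stays at 2).

    # Hypothetical population reproducing the letter's arithmetic; the
    # one-in-2000 incidence of one-legged men is an invented illustration.
    two_legged = 1999
    one_legged = 1                      # per 2000 Welshmen (hypothetical)
    total = two_legged + one_legged

    mean_legs = (2 * two_legged + 1 * one_legged) / total
    share_above_mean = two_legged / total

    print(mean_legs)                    # 1.9995 legs on average
    print(share_above_mean)             # 0.9995, i.e. 99.95% have more than the average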

ON THE TOSS OF A COIN

Most talks about statistics and probability mention coin tossing at some stage, so let us think about coin tossing. England's cricket captains seem to have that vital knack of calling incorrectly when it is very important to win the toss. Occasionally a captain will win every toss of a match series. I am old enough to remember the late Australian captain Lindsay Hassett in 1953 winning every toss of the five-match series, and I know that such streaks have occurred several times since. So what are the chances of the same team's captain winning every toss in a five-match series? Any offers? [At this point I wait for members of the audience to volunteer an answer, and inevitably someone will call out '1 in 32'.] Actually, the correct answer is 1 in 16, as I didn't specify which captain. But if I'd asked for the probability of England winning the toss in all five matches of a series, the answer would indeed be 1 in 32. The probability of winning it on the first occasion is 1 in 2; that of winning it on the second occasion as well as on the first is 1 in 2 × 2, which is 1 in 4; and so for five consecutive wins the probability is 1 in 32.

Now, I said earlier that probabilities which are lower than 1 in 20 are regarded as indicating something of statistical significance. And I have just referred to an event which has happened several times, which has a probability of only 1 in 32. So if we are playing a six-match series and England's captain has lost the toss on all five previous calls, what chance would you give him of winning the sixth toss? Remember that he has lost it five times running, and the chances of this having happened are not consistent with the assumption that the coin-tossing process was completely fair. Would you want better odds than evens for him winning the sixth toss, or would you stick with your assumption that the tossing process was completely fair and be content with evens? [Await murmurs from the audience that one should still accept an even-money wager.] You'd do the latter, of course, because you believe that the chances are really 50/50 on each toss and that losing five times in a row was just bad luck; it was what you would expect to happen every 32 times you go through this process.

So if someone came to you and produced a coin and then tossed it five times and it came down heads on all five occasions, and he asked you to bet on it coming down tails next time, would you take an even bet? [chorus of 'yes'] Yes, I think you would. But suppose he carried on and tossed a further five heads, making ten in a row, which had a probability of 1 in over 1000; would you still take an even bet on tails for the eleventh toss? [hesitation . . .] Mmm. Now you start to wonder if he is playing fair. And you would probably want better odds than 50/50. But what odds would you accept as fair?
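The figures quoted in this section follow from multiplying a probability of 1/2 for each independent toss of a fair coin; a quick check, not part of the original lecture, is sketched below.

    # Probabilities for runs of tosses of a fair coin, checked with exact fractions.
    from fractions import Fraction

    p_named_side_five_in_a_row = Fraction(1, 2) ** 5       # 1/32: a specified captain wins all five
    p_either_captain_all_five = 2 * Fraction(1, 2) ** 5    # 1/16: one or other captain wins all five
    p_ten_heads_in_a_row = Fraction(1, 2) ** 10            # 1/1024: 'over 1 in 1000'

    print(p_named_side_five_in_a_row, p_either_captain_all_five, p_ten_heads_in_a_row)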

BAYES AND THE LAW COURTS


Classical statistical methods can only tell you that tossing ten heads in a row is inconsistent with the assumption of a 50/50 chance on each toss. What they cannot do is tell you what you should really regard as the odds on the next toss giving tails, given that you have just tossed ten consecutive heads. During the last 40 years or so, however, statisticians have started to adopt a completely different approach to the analysis of data, whereby the object is not to say how likely are the data given the hypothesis but to answer the more important question of how likely is the hypothesis given the data. And because the mathematics of this is based on the use of a very old theorem first formulated by the Rev. Thomas Bayes in the 18th century, this branch of statistics has become known as Bayesian statistics.

A good example of the way Bayesian statistics is finding its way into our lives is to be found in recent court rulings involving evidence from DNA fingerprinting. If you find someone's fingerprints at the scene of the crime, it is 100% certain that that person was there, because fingerprints are completely unique. No two people in the world have the same fingerprints. A recently introduced and equally valuable forensic tool is DNA fingerprinting, where DNA profiles can be extracted from samples of blood or other bodily fluids, and these can be compared with those from suspects. The science of DNA fingerprinting has improved over the years and I am told that it is now almost as reliable as normal fingerprinting, but this has not always been the case. About 10 years ago, when DNA fingerprinting was in its infancy, it was possible that an individual's DNA profile could be shared by several others, perhaps up to ten in a country the size of the UK. So, although there was a low probability of two individuals having the same DNA, a matching DNA profile was not in itself 100% proof of guilt.

The statistical problem that arose is best illustrated with reference to an actual case. A woman was raped and forensic analysis provided the DNA profile of her attacker. A man, whom I shall not name but will refer to as Jones, was found to have a matching DNA profile, and he was charged with the crime despite the woman failing to identify him as her attacker. Now, it seems that the chances of an innocent man having a matching DNA profile were only about 1 in 3 million. But in answer to a leading question from the prosecution, a forensic scientist in the witness box stated that the probability of Jones being innocent was 1 in 3 million. Based on this, Jones was convicted. Can you see the fallacy? Incidentally, the fallacy has become known as the 'prosecutor's fallacy'.

The chances of an innocent man having a matching DNA profile were 1 in 3 million, but the prosecution interpreted this as the probability of Jones being innocent being only 1 in 3 million, i.e. that he must be guilty. But this is a complete misinterpretation of the statistics. The probability of a man having a matching DNA profile, given that he is innocent, is not the same as the probability of a man being innocent given that he has a matching DNA profile. The probability of the observation given the hypothesis is not the same as the probability of the hypothesis given the observation. For the latter you need to use Bayes' theorem, and this means that you cannot use the DNA evidence alone but must take account of additional evidence.
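To put rough numbers on the difference, the sketch below applies Bayes' theorem to the figures above. The 1-in-3-million match probability is taken from the case as described; the pool of 20 million potential male suspects is a hypothetical prior introduced here purely for illustration, since the lecture does not specify one.

    # Bayes' theorem applied to the DNA example: P(innocent | match) versus
    # P(match | innocent). The suspect pool size is a hypothetical assumption.
    p_match_given_innocent = 1 / 3_000_000
    suspect_pool = 20_000_000            # hypothetical number of men who could have done it

    p_guilty_prior = 1 / suspect_pool    # before the DNA evidence, all equally likely
    p_innocent_prior = 1 - p_guilty_prior

    # P(match) = P(match | guilty) * P(guilty) + P(match | innocent) * P(innocent)
    p_match = 1.0 * p_guilty_prior + p_match_given_innocent * p_innocent_prior

    p_innocent_given_match = p_match_given_innocent * p_innocent_prior / p_match
    print(p_innocent_given_match)        # roughly 0.87, nothing like 1 in 3 million

With a smaller pool of plausible suspects the probability of innocence falls, which is exactly why the additional, non-DNA evidence matters.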

It is the same argument as tossing the coin. The probability of it coming down ten heads in a row given that it is unbiased is 1 in 1024. But if it comes down ten heads in a row, the probability of it being unbiased is not 1 in 1024; it is much nearer 1 in 1, because you have prior knowledge that it is probably unbiased. And the use of Bayes' theorem demands that you make use of this prior knowledge in drawing conclusions from the observations.

In the Jones case, what happened was that it went to the Court of Appeal. They understood the difference and ordered a retrial. In fact, Jones changed his plea to guilty before the retrial, but the important thing as far as the statistical profession was concerned was that the prosecutor's fallacy had been exposed. In several subsequent court cases, statisticians have been called in as expert witnesses, and other cases of conviction based on DNA evidence alone have gone to appeal as a result.

The text of the lecture will be concluded in the next edition of the journal.

References

Duckworth, F. (2001). A role for statistics in international cricket. Teaching Statistics, 23(2), 38–44.
Matthews, R. (2000). Storks deliver babies (p = 0.008). Teaching Statistics, 22(2), 36–38.
