Wagner 1982 Amer Stat - Simpson - S Paradox in Real Life


[Figure 2. Plot of Table 1: Conditional Probability Coordinates (axes Q1 and Q2, each running from 0.0 to 1.0)]

relevant condition is marginal independence of A and C, that is, independence of A and C when we combine over B. The distinction reflects that the conditions for confounding differ depending upon the measure of association used.

[Received August 1979. Revised September 1981.]

REFERENCES
BISHOP, YVONNE M. M.; FIENBERG, STEPHEN E.; and HOLLAND, PAUL W. (1975), Discrete Multivariate Analysis: Theory and Practice, Cambridge: MIT Press.
BLYTH, COLIN R. (1972), "On Simpson's Paradox and the Sure-Thing Principle," Journal of the American Statistical Association, 67, 364-366.
GARDNER, MARTIN (1976), "On the Fabric of Inductive Logic and Some Probability Paradoxes," Scientific American, 234, 119-124.
ROTHMAN, KENNETH J. (1975), "A Pictorial Representation of Confounding in Epidemiologic Studies," Journal of Chronic Diseases, 28, 101-108.
SIMPSON, E. H. (1951), "The Interpretation of Interaction in Contingency Tables," Journal of the Royal Statistical Society, Ser. B, 13, 238-241.
WHITTEMORE, ALICE S. (1978), "Collapsibility of Multidimensional Contingency Tables," Journal of the Royal Statistical Society, Ser. B, 40, 328-340.
YULE, G. U. (1903), "Notes on the Theory of Association of Attributes in Statistics," Biometrika, 2, 121-134.

Simpson's Paradox in Real Life

CLIFFORD H. WAGNER*

*Clifford H. Wagner is Assistant Professor, Department of Mathematical Sciences, The Capitol Campus, The Pennsylvania State University, Middletown, PA 17057.

© The American Statistician, February 1982, Vol. 36, No. 1

This content downloaded from 128.61.129.207 on Wed, 8 Jan 2014 08:42:54 AM
All use subject to JSTOR Terms and Conditions

1. INTRODUCTION

Simpson's paradox (Blyth 1972) is the designation for a surprising situation that may occur when two populations are compared with respect to the incidence of some attribute: If the populations are separated in parallel into a set of descriptive categories, the population with higher overall incidence may yet exhibit a lower incidence within each such category.

An actual occurrence of this paradox was observed (Cohen and Nagel 1934, p. 449) in a comparison of tuberculosis deaths in New York City and Richmond, Virginia, during the year 1910. Although the overall tuberculosis mortality rate was lower in New York, the opposite was observed when the data were separated into two racial categories; in both the white and non-white categories, Richmond had a lower mortality rate. A similar situation involving populations divided into a large number of categories occurred in a well-known study of sex bias in graduate admissions at the University of California, Berkeley (Bickel, Hammel, and O'Connell 1975). However, this was not a complete instance of Simpson's paradox because, when the data were disaggregated, the overall tendency toward a higher acceptance rate for male applicants was not reversed in each academic department.

Two real-life examples of Simpson's paradox are presented below. They illustrate the paradox in the context of populations composed of several categories and demonstrate how easily the paradox can occur.

2. RENEWAL RATES

Magazine publishers carefully monitor rates of renewal of expiring subscriptions. For example, at American History Illustrated in early 1979, the publishers were pleased to note an increase in the overall renewal rate from 51.2 percent in January to 64.1 percent in February. Because renewal rates are highly correlated with established subscription categories, and because one might wish to identify the kinds of subscriptions that account for the increased renewal rate, the data for expiring subscriptions and renewals are tabulated as in Table 1. The paradox of this example is that from January to February the renewal rates actually declined in every category. Or, in the terminology of mutually favorable events (Blyth 1973, Chung 1942): overall February is favorable to renewal, while in each category January is favorable to renewal.

Table 1. Expiring Subscriptions, Renewals, and Renewal Rates, by Month and Subscription Category

                         Source of Current Subscription
                     Previous  Direct  Subscription  Catalog
Month          Gift   Renewal    Mail       Service    Agent  Overall
January
  Total       3,594    18,364   2,986        20,862      149   45,955
  Renewals    2,918    14,488   1,783         4,343       13   23,545
  Rate         .812      .789    .597          .208     .087     .512
February
  Total         884     5,140   2,224           864       45    9,157
  Renewals      704     3,907   1,134           122        2    5,869
  Rate         .796      .760    .510          .141     .044     .641

The primary cause of the misleading increase in the overall renewal rate is the sharp decrease in the relative importance of the subscription service category (and of its low renewal rate). Letting $C_1$, $C_2$, $C_3$, $C_4$, and $C_5$ designate the five subscriber categories, the overall renewal probability $P(R)$ is a weighted average of the renewal probabilities for the separate categories, in fact

\[ P(R) = \sum_i P(R \cap C_i) = \sum_i P(C_i)\,P(R \mid C_i). \]

For January, the renewal probability is given by

.08(.81) + .40(.79) + .06(.60) + .45(.21) + .00(.09),

and the corresponding weighted average for February is

.10(.80) + .56(.76) + .24(.51) + .09(.14) + .00(.04).

Notice the decrease from .45 to .09 in the weight assigned to $C_4$, the subscription service category. Changes in the weights as well as changes in the renewal rates of the separate categories determine the change in the overall rate.

3. INCOME TAX RATES

The second example of Simpson's paradox involves a comparison of federal personal income tax rates for different years (see Table 2). Between 1974 and 1978, the tax rate decreased in each income category, yet the overall tax rate increased from 14.1 percent to 15.2 percent. Again, the overall rates are weighted averages, with the tax rate for each category weighted by that category's proportion of total income. Because of inflation, in 1978 there were relatively more persons and consequently relatively more taxable dollars assigned to the higher income (i.e., higher tax rate) brackets. The reader may wish to speculate about the number of legislators who fully understand the effect of Simpson's paradox even though unaware of its official name.

Table 2. Total Income and Total Tax (in thousands of dollars), and Tax Rate for Taxable Income Tax Returns, by Income Category and Year

                                        1974                                   1978
Adjusted Gross Income          Income           Tax  Tax Rate         Income           Tax  Tax Rate
under $5,000               41,651,643     2,244,467      .054     19,879,622       689,318      .035
$5,000 to $9,999          146,400,740    13,646,348      .093    122,853,315     8,819,461      .072
$10,000 to $14,999        192,688,922    21,449,597      .111    171,858,024    17,155,758      .100
$15,000 to $99,999        470,010,790    75,038,230      .160    865,037,814   137,860,951      .159
$100,000 or more           29,427,152    11,311,672      .384     62,806,159    24,051,698      .383
Total                     880,179,247   123,690,314             1,242,434,934   188,577,186
Overall Tax Rate                                         .141                                   .152

4. CONCLUSION

Simpson's paradox is not a contrived pedagogical example. Because this situation occurs at the level of a purely descriptive data analysis, it can easily bewilder the statistically naive observer. Classroom discussions of descriptive statistics should include examples of anomalies such as Simpson's paradox.

For additional discussion of Simpson's paradox and the more general topic of interactions and collapsibility in three-dimensional contingency tables, see Yule (1903), Simpson (1951), Bishop, Fienberg, and Holland (1975), Fienberg (1977), and Lindley and Novick (1981).

5. ACKNOWLEDGMENT

The author is grateful to James Rietmulder, Director of Planning at Historical Times Incorporated, publisher of American History Illustrated, for his cooperation and assistance in obtaining the subscription renewal data presented in this article.

[Received April 1981. Revised July 1981.]

REFERENCES
BICKEL, P. J.; HAMMEL, E. A.; and O'CONNELL, J. W. (1975), "Sex Bias in Graduate Admissions: Data from Berkeley," Science, 187, 398-404.
BISHOP, Y. M. M.; FIENBERG, S. E.; and HOLLAND, P. W. (1975), Discrete Multivariate Analysis: Theory and Practice, Cambridge, Mass.: The MIT Press.
BLYTH, COLIN R. (1972), "On Simpson's Paradox and the Sure-Thing Principle," Journal of the American Statistical Association, 67, 364-366.
BLYTH, COLIN R. (1973), "Simpson's Paradox and Mutually Favorable Events," Journal of the American Statistical Association, 68, 746.
CHUNG, KAI-LAI (1942), "On Mutually Favorable Events," Annals of Mathematical Statistics, 13, 338-349.
COHEN, MORRIS R., and NAGEL, ERNEST (1934), An Introduction to Logic and Scientific Method, New York: Harcourt, Brace and World, Inc.
FIENBERG, STEPHEN E. (1977), The Analysis of Cross-Classified Categorical Data, Cambridge, Mass.: The MIT Press.
LINDLEY, D. V., and NOVICK, MELVIN R. (1981), "The Role of Exchangeability in Inference," The Annals of Statistics, 9, 45-58.
SIMPSON, E. H. (1951), "The Interpretation of Interaction in Contingency Tables," Journal of the Royal Statistical Society, Ser. B, 13, 238-241.
The World Almanac and Book of Facts (1977 and 1981 ed.), New York: Newspaper Enterprise Association, Inc.
YULE, GEORGE U. (1903), "Notes on the Theory of Association of Attributes in Statistics," Biometrika, 2, 121-134.
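
The weighted-average argument of Sections 2 and 3 of Wagner's article can be checked numerically. The following Python sketch is not part of the article; it copies the counts from Tables 1 and 2 and recomputes the overall rates as weighted averages of the category rates. All names in it (`overall_rate`, `weights_and_rates`, `weighted_average`, `is_simpson_reversal`, and the dictionaries) are hypothetical and chosen only for illustration.

```python
# Table 1: category -> (expiring subscriptions, renewals), copied from the article.
JANUARY = {
    "Gift": (3594, 2918),
    "Previous Renewal": (18364, 14488),
    "Direct Mail": (2986, 1783),
    "Subscription Service": (20862, 4343),
    "Catalog Agent": (149, 13),
}
FEBRUARY = {
    "Gift": (884, 704),
    "Previous Renewal": (5140, 3907),
    "Direct Mail": (2224, 1134),
    "Subscription Service": (864, 122),
    "Catalog Agent": (45, 2),
}

# Table 2: income bracket -> (total income, total tax), in thousands of dollars.
TAX_1974 = {
    "under $5,000": (41651643, 2244467),
    "$5,000 to $9,999": (146400740, 13646348),
    "$10,000 to $14,999": (192688922, 21449597),
    "$15,000 to $99,999": (470010790, 75038230),
    "$100,000 or more": (29427152, 11311672),
}
TAX_1978 = {
    "under $5,000": (19879622, 689318),
    "$5,000 to $9,999": (122853315, 8819461),
    "$10,000 to $14,999": (171858024, 17155758),
    "$15,000 to $99,999": (865037814, 137860951),
    "$100,000 or more": (62806159, 24051698),
}


def overall_rate(groups):
    """Pooled rate: total of the numerators divided by total of the bases."""
    base = sum(b for b, _ in groups.values())
    hits = sum(h for _, h in groups.values())
    return hits / base


def weights_and_rates(groups):
    """Map each category to (weight P(C_i), conditional rate P(R | C_i))."""
    base = sum(b for b, _ in groups.values())
    return {name: (b / base, h / b) for name, (b, h) in groups.items()}


def weighted_average(groups):
    """Sum of P(C_i) * P(R | C_i); should agree with overall_rate."""
    return sum(w * r for w, r in weights_and_rates(groups).values())


def is_simpson_reversal(first, second):
    """True when `second` has the higher overall rate although
    `first` has the higher rate in every single category."""
    wr1, wr2 = weights_and_rates(first), weights_and_rates(second)
    every_category = all(wr1[c][1] > wr2[c][1] for c in wr1)
    return every_category and overall_rate(second) > overall_rate(first)


if __name__ == "__main__":
    for label, groups in [("January", JANUARY), ("February", FEBRUARY),
                          ("1974", TAX_1974), ("1978", TAX_1978)]:
        print(f"{label}: overall rate {overall_rate(groups):.3f}")
    print("Renewals reversed:", is_simpson_reversal(JANUARY, FEBRUARY))
    print("Tax rates reversed:", is_simpson_reversal(TAX_1974, TAX_1978))
```

Keeping the weights and the conditional rates separate, as `weights_and_rates` does, makes it easy to see which of the two moved between periods; in Table 1 it is mainly the weight of the subscription service category.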

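Another Approach to Incomplete Integrals

ANNE CHAO*

*Anne Chao is Associate Professor, Institute of Applied Mathematics, National Tsing Hua University, Taiwan, Republic of China.

One usually writes the incomplete integrals as the sums of discrete probabilities by repeating the procedure of integration by parts. This work provides another approach by employing the binomial expansion.

KEY WORDS: Incomplete gamma; Incomplete beta; Binomial expansion.

In an introductory statistics course, one usually repeats the procedure of integration by parts to establish the relations between the incomplete integrals and discrete probabilities, such as the incomplete gamma and the Poisson

\[ \frac{\alpha^r}{\Gamma(r)} \int_0^c x^{r-1} e^{-\alpha x}\,dx = 1 - \sum_{j=0}^{r-1} e^{-\alpha c} (\alpha c)^j / j! \tag{1} \]

where $r$ is a positive integer, $\alpha > 0$, $c > 0$; or the incomplete beta and the binomial

\[ \frac{\Gamma(n+1)}{\Gamma(k)\,\Gamma(n-k+1)} \int_0^c (1-x)^{n-k} x^{k-1}\,dx = \sum_{j=k}^{n} \binom{n}{j} c^j (1-c)^{n-j} \tag{2} \]

where $k$ and $n$ are positive integers, $0 < c < 1$. I have found that most students are more interested in the following direct approach, which employs only a binomial expansion. Note that

\begin{align*}
\frac{\alpha^r}{\Gamma(r)} \int_c^\infty x^{r-1} e^{-\alpha x}\,dx
&= \frac{\alpha^r}{\Gamma(r)} \int_0^\infty (y+c)^{r-1} e^{-\alpha (y+c)}\,dy \qquad (y = x - c) \\
&= e^{-\alpha c}\, \frac{\alpha^r}{\Gamma(r)} \sum_{j=0}^{r-1} \binom{r-1}{j} c^j \int_0^\infty y^{r-1-j} e^{-\alpha y}\,dy \\
&= e^{-\alpha c}\, \frac{\alpha^r}{\Gamma(r)} \sum_{j=0}^{r-1} \frac{(r-1)!}{j!\,(r-1-j)!}\, c^j\, \frac{\Gamma(r-j)}{\alpha^{r-j}} \\
&= \sum_{j=0}^{r-1} e^{-\alpha c} (\alpha c)^j / j!
\end{align*}

which proves (1). Similarly, we can write

\begin{align*}
\frac{\Gamma(n+1)}{\Gamma(k)\,\Gamma(n-k+1)} \int_0^c (1-x)^{n-k} x^{k-1}\,dx
&= \frac{\Gamma(n+1)}{\Gamma(k)\,\Gamma(n-k+1)} \int_0^1 [(1-c) + cz]^{n-k} c^k (1-z)^{k-1}\,dz \qquad (x = c - cz) \\
&= \frac{\Gamma(n+1)}{\Gamma(k)\,\Gamma(n-k+1)} \sum_{i=0}^{n-k} \binom{n-k}{i} c^{k+i} (1-c)^{n-k-i} \int_0^1 z^i (1-z)^{k-1}\,dz \\
&= \sum_{i=0}^{n-k} \frac{n!}{(k-1)!\,i!\,(n-k-i)!}\, c^{k+i} (1-c)^{n-k-i} \times \frac{(k-1)!\,i!}{(k+i)!} \\
&= \sum_{i=0}^{n-k} \binom{n}{k+i} c^{k+i} (1-c)^{n-k-i}
\end{align*}

which is exactly the right side of (2).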
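
Identities (1) and (2) can also be spot-checked numerically. The Python sketch below is not part of Chao's note: it approximates the two incomplete integrals with a composite Simpson's rule (a numerical stand-in, not the binomial-expansion argument itself) and compares them with the Poisson and binomial sums. The function names and the test values of r, alpha, n, k, and c are assumptions made for illustration.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] using an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def incomplete_gamma_lhs(r, alpha, c):
    """Left side of (1): normalized integral of x^(r-1) e^(-alpha x) over [0, c]."""
    const = alpha ** r / math.gamma(r)
    return const * simpson(lambda x: x ** (r - 1) * math.exp(-alpha * x), 0.0, c)


def poisson_rhs(r, alpha, c):
    """Right side of (1): one minus the first r Poisson(alpha * c) probabilities."""
    mean = alpha * c
    return 1.0 - sum(math.exp(-mean) * mean ** j / math.factorial(j) for j in range(r))


def incomplete_beta_lhs(n, k, c):
    """Left side of (2): normalized integral of (1-x)^(n-k) x^(k-1) over [0, c]."""
    const = math.gamma(n + 1) / (math.gamma(k) * math.gamma(n - k + 1))
    return const * simpson(lambda x: (1 - x) ** (n - k) * x ** (k - 1), 0.0, c)


def binomial_rhs(n, k, c):
    """Right side of (2): upper binomial tail from j = k to n."""
    return sum(math.comb(n, j) * c ** j * (1 - c) ** (n - j) for j in range(k, n + 1))


if __name__ == "__main__":
    # Arbitrary illustrative parameter choices.
    print(incomplete_gamma_lhs(3, 2.0, 1.5), poisson_rhs(3, 2.0, 1.5))
    print(incomplete_beta_lhs(10, 4, 0.3), binomial_rhs(10, 4, 0.3))
```

Agreement of the printed pairs to many decimal places is only a sanity check for particular parameter values; the derivations above are what establish the identities in general.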
