
3/9/2010

Sampling Variability and Sampling Distributions
A parameter is a numerical characteristic describing a population. Typically, the value of the parameter is unknown because we cannot examine the entire population.
A statistic is a numerical characteristic describing a specific sample. Its value can be calculated from the sample data and will vary from sample to sample.
Sampling Variability: the observed value of a statistic will depend on the particular sample selected from the population and is expected to vary in repeated random sampling.
Through inference we often use sample statistics to estimate the value of the corresponding population parameter.
The distinction between population and sample is of less importance in exploratory analysis but critical in statistical inference. Let's start with notation:

Population Mean: $\mu$
Fixed value but probably unknown
Other parameters:
Standard Deviation: $\sigma$
Proportion: $\pi$

Sample Mean: $\bar{x}$
Value is known for a specific sample but will change from sample to sample
Other statistics:
Standard Deviation: s
Proportion: p
Although unpredictable in the short run, chance or random behavior has a regular and predictable pattern in the long run!
In mathematics & statistics, random does not mean haphazard; it describes the order that emerges when a random phenomenon is repeated many times.
Although we encounter random phenomena every day, we tend to see only the unpredictable short-run side. We rarely observe enough repetitions to see the long-term regularity.
A random variable is a variable whose value is a numerical outcome of a random phenomenon.
The sampling distribution of a statistic is the distribution of the values the statistic takes across all possible samples of a given size from the same population.
The sampling distribution of a sample statistic can be examined before the sample data are actually collected.
Sampling distributions are the building blocks for inference.
It is rare to generate the exact sampling distribution because getting all possible samples is rarely possible (except for relatively small populations of fixed size).
Typically, we approximate the sampling distribution through a simulation process, but it is possible for simulation to miss some of the possible values of the sample statistic.
I. Take a random sample of size n from a population.
II. Compute the appropriate summary statistic.
III. Repeat steps I & II many times.
IV. Display the distribution of the summary statistic.
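The four steps above can be sketched in Python. This is only an illustration: the population (the integers 1 to 100), the sample size, the number of repetitions, and the function name `simulate_sampling_distribution` are all assumptions made for the example, not part of the original activity.

```python
import random
import statistics
from collections import Counter

def simulate_sampling_distribution(population, n, statistic, reps, seed=0):
    """Approximate the sampling distribution of `statistic` for samples of size n."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):                    # III. repeat steps I & II many times
        sample = rng.sample(population, n)   # I. random sample of size n (without replacement)
        values.append(statistic(sample))     # II. compute the summary statistic
    return values

# Hypothetical population, invented for illustration only.
population = list(range(1, 101))
means = simulate_sampling_distribution(population, n=5, statistic=statistics.mean, reps=2000)

# IV. display the distribution as a rough text histogram (bins of width 10)
bins = Counter(int(m // 10) * 10 for m in means)
for left in sorted(bins):
    print(f"{left:3d}-{left + 9:3d} {'#' * (bins[left] // 20)}")
```

Passing a different function as `statistic` (for example `statistics.median`) approximates the sampling distribution of that statistic instead.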
An Actual Age Discrimination Lawsuit
In 1991, Westvaco Corporation went through 5 rounds of layoffs.
By the end of the layoffs, only 22 of 50 workers in the engineering department of the envelope division still had jobs.
The average age of workers in the department fell from 48 to 46.
Robert Martin, who turned 55 in 1991, was one of the workers laid off.
Martin sued Westvaco, claiming he had lost his job because of his age.
The Question: Were older workers discriminated against during the company's layoffs?
Martin's lawyers used statistics to help answer the question.
The statistical analysis in the lawsuit looked at all 50 employees in the engineering department of the envelope division, with separate analyses for salaried and hourly workers.
Prior to the round 2 layoffs there were 10 hourly workers, ages: 25, 33, 35, 38, 48, 55, 55, 55, 56, 64
This is the distribution of the population of ages.
[Figure: Westvaco Hourly Workers' Ages Prior to Round 2 Layoffs. Horizontal axis: Worker Ages (30 to 60); vertical axis: Number of Workers (0 to 4).]
[Figure, left: Westvaco Actual Sampling Distribution of the Sample Mean, n = 3. Horizontal axis: Sample Means; vertical axis: Number of Samples.]
[Figure, right: Westvaco Simulated Sampling Distribution of 1000 Sample Means, n = 3. Horizontal axis: Sample Means; vertical axis: Number of Samples.]
The graph on the left shows the sampling distribution of sample means for all 120 samples of size 3 that can be randomly chosen from the population.
On the right is the simulated sampling distribution of sample means from activity 1.1.
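Because this population has only 10 workers, the exact sampling distribution can be enumerated directly. The sketch below uses the ten ages listed above; the random seed and the use of Python's standard library are choices made for this illustration and are not how activity 1.1 was necessarily carried out.

```python
from itertools import combinations
from math import comb
import random
import statistics

ages = [25, 33, 35, 38, 48, 55, 55, 55, 56, 64]   # the 10 hourly workers' ages from the text

# Exact sampling distribution: the mean age of every possible sample of 3 workers.
exact_means = [statistics.mean(s) for s in combinations(ages, 3)]
print(len(exact_means), comb(10, 3))              # both are 120
print(statistics.mean(ages))                      # population mean
print(statistics.mean(exact_means))               # mean of all 120 sample means
print(statistics.pstdev(ages), statistics.pstdev(exact_means))  # spread of ages vs. spread of means

# Simulated version: 1000 random samples of size 3 (seed chosen arbitrarily).
rng = random.Random(1)
simulated_means = [statistics.mean(rng.sample(ages, 3)) for _ in range(1000)]
missed = set(exact_means) - set(simulated_means)
print(len(missed), "possible values of the sample mean never appeared in the simulation")
```

The last line counts how many attainable sample means the simulation failed to produce, which is the sense in which a simulated sampling distribution can miss possible values.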
Averages are always less variable than individual observations.
The sampling distribution of sample means will always have less variability (smaller standard deviation) than the original distribution.
The sample means tend to stay closer to $\mu$ than individual observations.
As n gets larger, the sample statistic tends to get closer to the population parameter.
The impact of increasing sample size happens slowly! To cut the standard deviation of a sampling distribution in half requires a sample size four times larger!
The mean of the sampling distribution of sample means is the same as the population mean. Since $\mu_{\bar{x}} = \mu$, the sample mean is an unbiased estimator of the population mean.
The standard deviation of the sampling distribution of sample means is smaller than the population standard deviation and is equal to the population standard deviation divided by the square root of the sample size (for large populations):
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
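A small simulation can check the $\sigma/\sqrt{n}$ rule and the claim that halving the standard deviation takes four times the sample size. The population here (normal with mean 100 and standard deviation 15), the seed, and the helper name `sd_of_sample_means` are hypothetical choices for illustration; independent draws stand in for sampling from a very large population.

```python
import math
import random
import statistics

rng = random.Random(42)
mu, sigma = 100.0, 15.0   # hypothetical population mean and standard deviation

def sd_of_sample_means(n, reps=4000):
    """Simulated standard deviation of the sample mean for samples of size n."""
    means = [statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
             for _ in range(reps)]
    return statistics.pstdev(means)

# Each row: n, theoretical sigma/sqrt(n), simulated value.
for n in (4, 16, 64):
    print(n, round(sigma / math.sqrt(n), 3), round(sd_of_sample_means(n), 3))
```

Each step from 4 to 16 to 64 quadruples the sample size, and the theoretical column halves each time (7.5, 3.75, 1.875); the simulated column should track it closely.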
Draw an SRS of size n from any population with mean $\mu$ and finite standard deviation $\sigma$. When n is large, the sampling distribution of the sample mean $\bar{x}$ is approximately normal:
$\bar{x} \sim N\left(\mu, \dfrac{\sigma}{\sqrt{n}}\right)$
How large a sample size is needed for the sampling distribution of the sample mean to be close to normal depends on how far the original population is from normal. Typically the Central Limit Theorem can be applied if the sample size exceeds 30.
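To see the Central Limit Theorem at work, one can draw sample means from a strongly skewed population and watch the skewness shrink as n grows. The sketch below assumes an exponential population with mean 1 (and therefore standard deviation 1); that population, the seed, and the helper names are illustrative assumptions only.

```python
import random
import statistics

rng = random.Random(7)

def skewness(values):
    """Sample skewness: roughly 0 for a symmetric, normal-looking distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def sample_means(n, reps=5000):
    # Assumed population: exponential with mean 1, strongly right-skewed.
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]

# Each row: n, mean of the sample means, their standard deviation, their skewness.
for n in (1, 5, 30, 100):
    means = sample_means(n)
    print(n, round(statistics.fmean(means), 3),
          round(statistics.pstdev(means), 3), round(skewness(means), 3))
```

The mean column should stay near 1, the standard deviation column should shrink roughly like $1/\sqrt{n}$, and the skewness column should move toward 0 as n increases.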
Since the sampling distribution is approximately normal, we can use normal distribution techniques to find probabilities!
You typically purchase Dogspal dog food that comes in bags marked "Net Weight 40 Pounds." Recently, the bags feel lighter than you recall from the past. When you purchase the next bag of Dogspal, you weigh it and find it to weigh 39 pounds.
As a long-term Dogspal customer you check out customer service information on the company web page. There you find that the company proudly claims that they always give their customers more than what they pay for! They state that the weights of their 40 pound bags are approximately normally distributed with a mean of 40.5 pounds and a standard deviation of 1.5 pounds.
Does the weight of your single bag indicate a problem, either with the manufacturing process or with the Dogspal information? Do you have grounds for complaint?
We need to answer with fact, not emotion! So, how unusual is it to see a bag with a weight that is as extreme or more extreme than ours if the company information is accurate? (CAUTION: More extreme can be either less than or greater than, depending on the situation.)
Calculate the probability associated with the individual observation as we did in chapter 2.
X = the actual weight of a single bag of Dogspal in the 40 pound bag
X ~ N(40.5 lbs, 1.5 lbs)
$P(x < 39) = P\left(z < \dfrac{39 - 40.5}{1.5}\right) = P(z < -1) = 0.1587$
Interpret these results.
So, is it unusual to see a single bag of Dogspal with a weight as low as ours?
NO! As consumers, we may not be happy, but the company accepts this outcome!
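The single-bag probability can be checked with Python's standard library. This sketch takes the company's stated N(40.5, 1.5) model at face value; the variable names are chosen for the example.

```python
from statistics import NormalDist

bag_weight = NormalDist(mu=40.5, sigma=1.5)     # company's stated distribution of bag weights
z = (39 - bag_weight.mean) / bag_weight.stdev   # z-score of the observed 39 lb bag
p = bag_weight.cdf(39)                          # P(X < 39)
print(round(z, 2), round(p, 4))                 # -1.0 0.1587
```

About 16% of individual bags would weigh 39 pounds or less under the company's claim, so one such bag is not unusual.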
Now consider the situation where you purchase 10 Dogspal bags at random and weigh them. Surprisingly, the mean weight is 39 pounds.
Does this indicate a problem, either with the manufacturing process or with the Dogspal information?
How unusual is it to see a mean weight from 10 bags as extreme or more extreme than what was observed if the company information is accurate?
Calculate the probability associated with the sample mean.
$\bar{x}$ = the mean weight of a randomly selected sample of 10 bags of Dogspal in 40 pound bags.
Requires a new z-score formula:
$z = \dfrac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{1.5}{\sqrt{10}} = 0.4743$
$\bar{x} \sim N(40.5\ \text{lbs},\ 0.4743\ \text{lbs})$
$P(\bar{x} < 39) = P\left(z < \dfrac{39 - 40.5}{0.4743}\right) = P(z < -3.16) = 0.0008$
Interpret these results. Is it unusual to see a sample mean weight from 10 random bags of Dogspal as low as ours?
YES! So what does that tell us?
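The same check for the mean of 10 bags only changes the standard deviation, which becomes $\sigma/\sqrt{n}$. As before, this is a sketch that assumes the company's stated model is correct; the variable names are chosen for the example.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 40.5, 1.5, 10
se = sigma / sqrt(n)                # standard deviation of the sample mean
z = (39 - mu) / se                  # z-score of the observed mean of 39 lbs
p = NormalDist(mu, se).cdf(39)      # P(sample mean < 39)
print(round(se, 4), round(z, 2), round(p, 4))   # 0.4743 -3.16 0.0008
```

A mean this low would occur in fewer than 1 in 1000 samples of 10 bags if the company's information were accurate, which is why the sample result is far more surprising than the single bag.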
$\pi$ is the proportion of individuals in the population with a characteristic of interest.
p is the proportion of individuals in the sample with a characteristic of interest.
S: success, the individual has the characteristic of interest
F: failure, the individual does not have the characteristic of interest
$p = \dfrac{\text{number of S's in the sample}}{\text{sample size } (n)}$
Sample is an SRS of size n from a large population with population proportion of successes $\pi$.
As the sample size increases, the sampling distribution of p is approximately normal with mean $\mu_p = \pi$ and standard deviation
$\sigma_p = \sqrt{\dfrac{\pi(1 - \pi)}{n}}$
p is an unbiased estimator of $\pi$.
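A short simulation can illustrate the mean and standard deviation of the sample proportion. The population proportion 0.30, the sample size 50, the number of repetitions, and the seed are all hypothetical values picked for this sketch; independent draws stand in for an SRS from a large population.

```python
import math
import random
import statistics

rng = random.Random(3)
pi_pop, n, reps = 0.30, 50, 4000   # hypothetical population proportion, sample size, repetitions

def sample_proportion():
    """Proportion of successes (S) in one simulated sample of size n."""
    successes = sum(rng.random() < pi_pop for _ in range(n))
    return successes / n

p_values = [sample_proportion() for _ in range(reps)]
theory_sd = math.sqrt(pi_pop * (1 - pi_pop) / n)

print(round(statistics.fmean(p_values), 3), pi_pop)                  # simulated mean vs. population proportion
print(round(statistics.pstdev(p_values), 3), round(theory_sd, 3))    # simulated vs. theoretical standard deviation
```

The simulated mean of p should sit close to the population proportion (unbiasedness), and the simulated standard deviation should be close to the value given by the square-root formula.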
