Introductory Statistics: Class 2
Satoshi Miyata, Ph.D.
Announcements
10/17/2014
AU14 Statistics: class 2
Flowchart of data analysis
1. Objective
2. Sampling
3. Summary: descriptive statistics (numerical & graphical summary)
4. Selection of model (regression, ANOVA, etc.)
5. Model building: inferential statistics
6. Model diagnostics (diagnosis of model assumptions)
   Modification of the model where needed
7. Decision & Report
Basic definitions
Population: Collection of objects of interest.
Parameter: Unknown constant that characterizes the population.
Sample: Part of the population.
Sampling frame: Listing of the individuals to be sampled.
Variable: Random quantity in the population.
Statistic: Quantity calculated from the sample.
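Basic Paradigm
Population → Sample: sampling (sampling frame, sampling method)
Sample → Statistic: calculation
Statistic → Parameter: inference
Parameter: object of interest in the population
Parameter and statistic are both based on the variable.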
Procedure of Data Analysis
Objective
  Population, Sample, Sampling frame, Variable, Parameter, Statistic
Sampling
  Census, Simple Random, Convenience, Voluntary Response
  Randomize to avoid bias.
Summary
  Numerical summary (mean, median, variance, s.d., etc.)
  Graphical summary (histogram, boxplot, etc.)
Procedure of Data Analysis (cont.)
Model Building
  Estimation, Hypothesis Testing
  ANOVA, Regression, etc.
Diagnostics
  Check the model assumptions
Report
Types of Sampling Methods
Census: All individuals in the population are sampled.
Simple Random Sampling (SRS): All individuals in the population have an equal chance to be sampled.
Convenience Sampling: Individuals are sampled from a specific part of the population.
Voluntary Response Sampling: Samples are obtained only from voluntary responses.
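The difference between simple random sampling and convenience sampling can be sketched in R with `sample()`. The population of 1000 ID numbers and the assumption that only the first 100 individuals are easy to reach are made up for illustration; they are not from the slides.

```r
# Hypothetical population: ID numbers 1..1000 (assumed for illustration)
set.seed(1)                       # make the draws reproducible
population <- 1:1000
n <- 50                           # sample size

# Simple random sample: every individual has the same chance to be chosen
srs <- sample(population, size = n, replace = FALSE)

# Convenience sample: only a specific part of the population (the first
# 100 individuals, by assumption) can be reached
convenience <- sample(population[1:100], size = n, replace = FALSE)

length(srs)          # number of sampled individuals
range(srs)           # can reach anywhere in 1..1000
range(convenience)   # always stays within 1..100
```

A census would correspond to using `population` itself, with no sampling step.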
Bias due to sampling methods
All individuals in the population should have an equal chance to be sampled.
Census: unbiased but difficult
SRS: unbiased (desirable)
Convenience: may be biased
Voluntary: may be biased
[Diagram: Population and Sampling frame]
In the previous example, the population and the sampling frame are different. The result may be biased.
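A small simulation can make the bias visible. This is only a sketch under made-up assumptions: a hypothetical population of 10,000 values whose level rises with the index, and a sampling frame that covers only the first 2,000 individuals.

```r
set.seed(2)

# Hypothetical population: values drift upward with the index, so the
# first part of the population is not representative of the whole
N <- 10000
population <- seq(0, 10, length.out = N) + rnorm(N)

# Sampling frame that lists only a specific part of the population
frame <- population[1:2000]

# Repeat each sampling method many times and record the sample mean
reps <- 2000
srs_means  <- replicate(reps, mean(sample(population, 50)))
conv_means <- replicate(reps, mean(sample(frame, 50)))

mean(population)   # parameter: the population mean
mean(srs_means)    # SRS: sample means center on the parameter
mean(conv_means)   # restricted frame: sample means are systematically too low
```

The convenience sample is not merely noisier: its average over many repetitions stays away from the parameter, which is what "biased" means here.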
Branches of Statistics
Descriptive statistics: to summarize and describe important features of the data.
Inferential statistics: to generalize the sample information and draw a conclusion about the population.
Descriptive statistics summarize the shape of the distribution of the data:
  Location
  Variation
  Skewness, outliers, shape of distribution, etc.
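Numerical summary: Location
Let $x_1, x_2, \ldots, x_n$ be observations.
$n$: number of observations.
Mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Median: Sort $x_1, x_2, \ldots, x_n$, and let $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$.
$$\tilde{x} = \begin{cases} x_{((n+1)/2)} & \text{if } n \text{ is odd,} \\ \dfrac{x_{(n/2)} + x_{(n/2+1)}}{2} & \text{if } n \text{ is even.} \end{cases}$$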
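The two definitions can be checked directly against R's built-in `mean()` and `median()`. The six observations below are made up for illustration, and the names `x_bar`, `xs`, and `x_tilde` are chosen here, not taken from the slides.

```r
x <- c(2.1, 3.4, 1.9, 5.0, 4.2, 3.3)   # made-up observations (n is even)
n <- length(x)

# Mean: (1/n) * sum of the observations
x_bar <- sum(x) / n

# Median: sort first, then take the middle value (n odd) or the
# average of the two middle values (n even)
xs <- sort(x)
x_tilde <- if (n %% 2 == 1) {
  xs[(n + 1) / 2]
} else {
  (xs[n / 2] + xs[n / 2 + 1]) / 2
}

c(x_bar, mean(x))       # formula and built-in agree
c(x_tilde, median(x))   # formula and built-in agree
```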
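Properties of mean and median
Mean $\bar{x}$ is more sensitive to outliers than median $\tilde{x}$.
Distribution of $x$ and the typical position of $\bar{x}$ relative to $\tilde{x}$:
  Symmetric: $\bar{x} \approx \tilde{x}$
  Skew to left: $\bar{x} < \tilde{x}$
  Skew to right: $\tilde{x} < \bar{x}$
Let $c$ be a constant.
If $y_i = x_i + c$ and $z_i = c x_i$, then
$\bar{y} = \bar{x} + c, \quad \bar{z} = c\bar{x}, \quad \tilde{y} = \tilde{x} + c, \quad \tilde{z} = c\tilde{x}.$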
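A short check of these properties in R, using made-up data. The constant is called `k` in the code because `c` is already the name of R's vector-building function.

```r
x     <- c(1, 2, 3, 4, 5)     # made-up data
x_out <- c(1, 2, 3, 4, 50)    # same data with the largest value replaced by an outlier

c(mean(x), median(x))           # 3 3
c(mean(x_out), median(x_out))   # 12 3: the mean moves, the median does not

k <- 10                         # plays the role of the constant c on the slide

# Adding a constant shifts both mean and median by that constant
all.equal(mean(x + k),   mean(x) + k)
all.equal(median(x + k), median(x) + k)

# Multiplying by a constant scales both mean and median by that constant
all.equal(mean(k * x),   k * mean(x))
all.equal(median(k * x), k * median(x))
```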
Numerical summary: Location (cont.)
Percentile: the k% percentile is the point such that k% of the observations are below it and (100 − k)% of the observations are above it.
Quartile: 1st quartile = 25% percentile, 3rd quartile = 75% percentile.
Trimmed mean: the k% trimmed mean is calculated after discarding the lowest k% and the highest k% of the data.
Five-number summary: minimum, 1st quartile, median, 3rd quartile, maximum.
Sample R code:
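The code for this slide is not reproduced in the text. As a stand-in, here is a minimal sketch of the location summaries using base R functions; the data `x` are simulated and are an assumption, not the data used in class.

```r
set.seed(3)
x <- rnorm(100)                 # hypothetical sample of 100 observations

mean(x)                         # mean
median(x)                       # median
quantile(x, c(0.25, 0.75))      # 1st and 3rd quartiles
quantile(x, 0.90)               # 90% percentile

# 10% trimmed mean: drop the lowest 10% and the highest 10%, then average
tm <- mean(x, trim = 0.10)
tm_manual <- mean(sort(x)[11:90])   # same thing by hand for n = 100
c(tm, tm_manual)

fivenum(x)                      # minimum, lower hinge, median, upper hinge, maximum
summary(x)                      # minimum, quartiles, median, mean, maximum
```

Note that `fivenum()` reports Tukey's hinges, which can differ slightly from the quartiles returned by `quantile()`.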
Numerical summary: Variation
Variance: average of squared distance between individual observations and the sample mean.
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{S_{xx}}{n - 1}$$
Standard deviation: $s = \sqrt{s^2}$
Fourth Spread (Interquartile Range): $f_s$ = (3rd quartile − 1st quartile)
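The formulas can be evaluated step by step and compared with `var()`, `sd()`, and `IQR()`. The data are made up, and the names `Sxx`, `s2`, `s`, and `fs` simply mirror the symbols above.

```r
x <- c(2.1, 3.4, 1.9, 5.0, 4.2, 3.3)   # made-up observations
n <- length(x)

Sxx <- sum((x - mean(x))^2)    # sum of squared deviations from the mean
s2  <- Sxx / (n - 1)           # sample variance (divisor n - 1)
s   <- sqrt(s2)                # standard deviation

q  <- quantile(x, c(0.25, 0.75))
fs <- unname(q[2] - q[1])      # fourth spread: 3rd quartile - 1st quartile

c(s2, var(x))    # formula and built-in agree
c(s,  sd(x))     # formula and built-in agree
c(fs, IQR(x))    # formula and built-in agree
```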
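Properties of variance and s.d.
$s^2 \ge 0, \quad s \ge 0.$
Let $c$ be a constant.
If $y_i = x_i + c$ and $z_i = c x_i$, then
$s_y^2 = s_x^2, \quad s_z^2 = c^2 s_x^2, \quad s_y = s_x, \quad s_z = |c| \, s_x.$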
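Adding a constant leaves the variance and the standard deviation unchanged, while multiplying by a constant scales the variance by its square and the standard deviation by its absolute value. A quick numerical check with made-up data (the constant is named `k` because `c` is an R function, and it is negative on purpose so that the absolute value matters):

```r
x <- c(2.1, 3.4, 1.9, 5.0, 4.2, 3.3)   # made-up observations
k <- -3                                # plays the role of the constant c

# Shifting the data does not change the spread
all.equal(var(x + k), var(x))
all.equal(sd(x + k),  sd(x))

# Scaling the data scales the variance by k^2 and the s.d. by |k|
all.equal(var(k * x), k^2 * var(x))
all.equal(sd(k * x),  abs(k) * sd(x))
```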
Sample R code:
> var(x)        ## variance ##
[1] 0.9881656
> sd(x)         ## standard deviation ##
[1] 0.9940652
> sqrt(var(x))  ## same value as sd(x) ##
[1] 0.9940652
> IQR(x)        ## interquartile range ##
[1] 1.312823
Graphical Summary: Histogram
Classes/Bins: Subintervals of the sample range.
Frequency: number of observations in each bin.
Relative frequency: frequency / total number of observations.
Histogram: Bar chart of the frequency or the relative frequency.
[Figure: example histogram; vertical axis: Frequency]
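Bins, frequencies, and relative frequencies can be computed without drawing anything by calling `hist()` with `plot = FALSE`. The sample `x` below is simulated and is only an assumption for illustration.

```r
set.seed(4)
x <- rnorm(200)                 # hypothetical sample

h <- hist(x, plot = FALSE)      # compute the bins without drawing the plot

h$breaks                        # bin boundaries (subintervals of the sample range)
h$counts                        # frequency: number of observations in each bin

rel_freq <- h$counts / length(x)   # relative frequency
rel_freq
sum(rel_freq)                   # relative frequencies add up to 1
```

Calling `hist(x)` draws the frequency version. `hist(x, freq = FALSE)` draws densities, that is, relative frequency divided by bin width, so the bar areas (not the heights) add up to 1.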
Types of Histogram Shapes
[Figure: example histograms of different shapes; vertical axis: Frequency; panel titles include bimodal, left skewed, right skewed]
Histogram by R
> hist(x)
[Figure: R output titled "Histogram of x"; vertical axis: Frequency]
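Graphical Summary: Boxplot
The lower and the upper edges of the "box" are at the 1st and 3rd quartiles. The center line is at the median. The lower and the upper "whiskers" are basically at the minimum and the maximum; with the default settings of boxplot() in R, observations more than 1.5 times the box length away from the box are drawn as separate points.
[Figure: boxplot labeled, from top to bottom: maximum, 3rd quartile, median, 1st quartile, minimum]
> boxplot(x)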
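The numbers behind a boxplot can be inspected with `boxplot.stats()`. The sketch below uses a simulated sample with one extreme value appended (an assumption for illustration) to show when a whisker stops short of the maximum.

```r
set.seed(5)
x <- c(rnorm(50), 6)        # hypothetical sample plus one extreme value

b <- boxplot.stats(x)       # the statistics that boxplot(x) draws

b$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
b$out     # observations drawn as separate points beyond the whiskers

max(x)        # the maximum is the extreme value 6 ...
b$stats[5]    # ... but the upper whisker ends at the largest non-extreme value
```

Without extreme values, `b$out` is empty and the whiskers sit exactly at the minimum and the maximum.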
[Figure: boxplots and matching histograms (vertical axis: Frequency) for four distribution shapes: symmetric, skew to left, skew to right, heavy tails]
Tips on R
To copy graphics onto PowerPoint, Word, etc.:
1. Right-click on the graphics window.
2. Select "Copy as metafile" from the pull-down menu.
3. Paste it into PowerPoint or a word processor.
To save R code or a program:
1. You may write a program first in a text editor. Then copy and paste it into R.
2. In R, select the File menu and go to "New script". Then write, copy, and paste a program in the R Editor.