
Statistics is a Science of Inference

• Statistical Inference: on the basis of sample statistics derived from limited and incomplete sample information:
   • Predict and forecast values of population parameters...
   • Test hypotheses about values of population parameters...
   • Make decisions...
The Election Poll
• Unbiased sample: an unbiased, representative sample drawn at random from the entire population (North and South).
• Biased sample: a biased, unrepresentative sample drawn from people who have cars and/or telephones and/or read newspapers.
Sample Statistics as Estimators of
Population Parameters
• A sample statistic is a numerical measure of a summary characteristic of a sample.
• A population parameter is a numerical measure of a summary characteristic of a population.
• An estimator of a population parameter is a sample statistic used to estimate or predict the population parameter.
• An estimate of a parameter is a particular numerical value of
a sample statistic obtained through sampling.
• A point estimate is a single value used as an estimate of a
population parameter.
Estimators

• The sample mean, \(\bar{X}\), is the most common estimator of the population mean, \(\mu\).
• The sample variance, \(s^2\), is the most common estimator of the population variance, \(\sigma^2\).
• The sample standard deviation, \(s\), is the most common estimator of the population standard deviation, \(\sigma\).
• The sample proportion, \(\hat{p}\), is the most common estimator of the population proportion, \(p\).
Population and Sample Proportions

• The population proportion is equal to the number of elements in the population belonging to the category of interest, divided by the total number of elements in the population:
\[ p = \frac{X}{N} \]
• The sample proportion is the number of elements in the sample belonging to the category of interest, divided by the sample size:
\[ \hat{p} = \frac{x}{n} \]
A Population Distribution, a Sample from a
Population, and the Population and Sample Means

[Figure: frequency distribution of the population with the population mean (\(\mu\)) marked; the sample points (X) are shown beneath it with the sample mean (\(\bar{X}\)) marked.]
Sampling Distributions

• The sampling distribution of a statistic is the probability distribution of all possible values the statistic may assume, when computed from random samples of the same size, drawn from a specified population.
• The sampling distribution of \(\bar{X}\) is the probability distribution of all possible values the random variable \(\bar{X}\) may assume when a sample of size n is taken from a specified population.
Sampling Distributions

Let's look at the sales of the following six salespersons as a population:

Salesperson   Sales
Tom             60
Dick            80
Harry          100
Amar           120
Akbar          140
Anthony        160

Population Mean = 660 / 6 = 110

Let us take a random sample of size 2 without replacement from this population of 6 salespersons.
The total number of samples of size 2 that can be drawn without replacement from a population of size 6 is
\[ \binom{6}{2} = 15 \text{ samples} \]
The complete set of all possible samples is as follows.
(Enumerating all possible combinations is feasible in this case, but not in general!)

No.   Sample   Values      Sample Mean
 1.   T,D       60,  80       70
 2.   T,H       60, 100       80
 3.   T,Am      60, 120       90
 4.   T,Ak      60, 140      100
 5.   T,An      60, 160      110
 6.   D,H       80, 100       90
 7.   D,Am      80, 120      100
 8.   D,Ak      80, 140      110
 9.   D,An      80, 160      120
10.   H,Am     100, 120      110
11.   H,Ak     100, 140      120
12.   H,An     100, 160      130
13.   Am,Ak    120, 140      130
14.   Am,An    120, 160      140
15.   Ak,An    140, 160      150
\(\bar{X}\)    Frequency   Relative Frequency   \(\mu\)   \(\bar{X} - \mu\)
 70       1          0.0667          110      -40
 80       1          0.0667          110      -30
 90       2          0.1333          110      -20
100       2          0.1333          110      -10
110       3          0.2000          110        0
120       2          0.1333          110       10
130       2          0.1333          110       20
140       1          0.0667          110       30
150       1          0.0667          110       40
Total   N = 15       1.0000          110
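
The enumeration above can be reproduced mechanically. The following is a minimal Python sketch, assuming only the six sales figures given on the slide; the variable names are illustrative and not part of the original material. It lists all 15 samples of size 2, tabulates the sample means, and compares their average with the population mean.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Sales figures of the six salespersons (the whole population).
sales = {"Tom": 60, "Dick": 80, "Harry": 100,
         "Amar": 120, "Akbar": 140, "Anthony": 160}

population_mean = Fraction(sum(sales.values()), len(sales))

# Every sample of size 2 drawn without replacement: C(6, 2) = 15 of them.
samples = list(combinations(sales.values(), 2))
sample_means = [Fraction(a + b, 2) for a, b in samples]

# Frequency and relative frequency of each distinct sample mean.
frequency = Counter(sample_means)
for mean in sorted(frequency):
    relative = frequency[mean] / len(samples)
    print(f"{float(mean):6.0f} {frequency[mean]:3d} {relative:.4f}")

# The mean of all sample means coincides with the population mean.
mean_of_sample_means = sum(sample_means) / len(sample_means)
print(float(population_mean), float(mean_of_sample_means))
```

Exact fractions are used so that the comparison of the two means does not depend on floating-point rounding.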
Sampling Distribution of the Sample Mean
[Figure: bar chart of the sampling distribution. Horizontal axis: sample means of samples (70 to 150). Vertical axes: frequency (0 to 4) and relative frequency (0 to 0.25).]
Sampling Distributions (Continued)

Uniform population of integers from 1 to 8:

X       P(X)    X·P(X)   (X - \(\mu\))   (X - \(\mu\))^2   P(X)(X - \(\mu\))^2
1       0.125   0.125     -3.5       12.25        1.53125
2       0.125   0.250     -2.5        6.25        0.78125
3       0.125   0.375     -1.5        2.25        0.28125
4       0.125   0.500     -0.5        0.25        0.03125
5       0.125   0.625      0.5        0.25        0.03125
6       0.125   0.750      1.5        2.25        0.28125
7       0.125   0.875      2.5        6.25        0.78125
8       0.125   1.000      3.5       12.25        1.53125
Total   1.000   4.500                             5.25000

[Figure: Uniform Distribution (1,8). P(X) against X = 1 to 8.]

E(X) = \(\mu\) = 4.5
V(X) = \(\sigma^2\) = 5.25
SD(X) = \(\sigma\) = 2.2913
Sampling Distributions (Continued)
• There are 8*8 = 64 different but equally-likely samples of size 2 that can be drawn (with replacement) from a uniform population of the integers from 1 to 8.
• Each of these samples has a sample mean. For example, the mean of the sample (1,4) is 2.5, and the mean of the sample (8,4) is 6.

Samples of Size 2 from Uniform (1,8)
      1     2     3     4     5     6     7     8
1    1,1   1,2   1,3   1,4   1,5   1,6   1,7   1,8
2    2,1   2,2   2,3   2,4   2,5   2,6   2,7   2,8
3    3,1   3,2   3,3   3,4   3,5   3,6   3,7   3,8
4    4,1   4,2   4,3   4,4   4,5   4,6   4,7   4,8
5    5,1   5,2   5,3   5,4   5,5   5,6   5,7   5,8
6    6,1   6,2   6,3   6,4   6,5   6,6   6,7   6,8
7    7,1   7,2   7,3   7,4   7,5   7,6   7,7   7,8
8    8,1   8,2   8,3   8,4   8,5   8,6   8,7   8,8

Sample Means from Uniform (1,8), n = 2
      1     2     3     4     5     6     7     8
1    1.0   1.5   2.0   2.5   3.0   3.5   4.0   4.5
2    1.5   2.0   2.5   3.0   3.5   4.0   4.5   5.0
3    2.0   2.5   3.0   3.5   4.0   4.5   5.0   5.5
4    2.5   3.0   3.5   4.0   4.5   5.0   5.5   6.0
5    3.0   3.5   4.0   4.5   5.0   5.5   6.0   6.5
6    3.5   4.0   4.5   5.0   5.5   6.0   6.5   7.0
7    4.0   4.5   5.0   5.5   6.0   6.5   7.0   7.5
8    4.5   5.0   5.5   6.0   6.5   7.0   7.5   8.0
Sampling Distributions (Continued)
The probability distribution of the sample mean is called the sampling distribution of the sample mean.

Sampling Distribution of the Mean

\(\bar{X}\)     P(\(\bar{X}\))     \(\bar{X}\)·P(\(\bar{X}\))   \(\bar{X} - \mu_{\bar{X}}\)   (\(\bar{X} - \mu_{\bar{X}}\))^2   P(\(\bar{X}\))(\(\bar{X} - \mu_{\bar{X}}\))^2
1.0     0.015625   0.015625   -3.5    12.25    0.191406
1.5     0.031250   0.046875   -3.0     9.00    0.281250
2.0     0.046875   0.093750   -2.5     6.25    0.292969
2.5     0.062500   0.156250   -2.0     4.00    0.250000
3.0     0.078125   0.234375   -1.5     2.25    0.175781
3.5     0.093750   0.328125   -1.0     1.00    0.093750
4.0     0.109375   0.437500   -0.5     0.25    0.027344
4.5     0.125000   0.562500    0.0     0.00    0.000000
5.0     0.109375   0.546875    0.5     0.25    0.027344
5.5     0.093750   0.515625    1.0     1.00    0.093750
6.0     0.078125   0.468750    1.5     2.25    0.175781
6.5     0.062500   0.406250    2.0     4.00    0.250000
7.0     0.046875   0.328125    2.5     6.25    0.292969
7.5     0.031250   0.234375    3.0     9.00    0.281250
8.0     0.015625   0.125000    3.5    12.25    0.191406
Total   1.000000   4.500000                    2.625000

[Figure: Sampling Distribution of the Mean. P(\(\bar{X}\)) against \(\bar{X}\) = 1.0 to 8.0.]

\(E(\bar{X}) = \mu_{\bar{X}} = 4.5\)
\(V(\bar{X}) = \sigma_{\bar{X}}^2 = 2.625\)
\(SD(\bar{X}) = \sigma_{\bar{X}} = 1.6202\)
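
The table for the uniform population can be checked by brute force. Below is a small Python sketch, assuming the 64 equally likely ordered samples described on the slide; the names `dist`, `mean` and `variance` are illustrative. It builds the sampling distribution of the mean for n = 2 and computes its expected value and variance exactly.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

values = range(1, 9)                      # uniform population 1..8
pairs = list(product(values, repeat=2))   # 64 ordered samples, with replacement
prob = Fraction(1, len(pairs))            # each sample is equally likely

# Sampling distribution: probability of each possible sample mean.
dist = Counter()
for a, b in pairs:
    dist[Fraction(a + b, 2)] += prob

mean = sum(x * p for x, p in dist.items())
variance = sum(p * (x - mean) ** 2 for x, p in dist.items())

for x in sorted(dist):
    print(f"{float(x):4.1f} {float(dist[x]):.6f}")
print(float(mean), float(variance), float(variance) ** 0.5)
```

The variance obtained this way is half the population variance of 5.25, which is the sigma-squared-over-n relationship for n = 2.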
How are the Mean of the Sample Means and the Population Mean related?

While any individual sample mean need not be equal to the population mean, the mean of all the sample means always equals the population mean!

Something great and very useful in estimation!

\[ \mu_{\bar{X}} = \frac{\sum \bar{X}}{N} = \sum \bar{X} \cdot P(\bar{X} = \bar{x}) = 110 = \mu \]

or

\[ E(\bar{X}) = \mu_{\bar{X}} = \mu \]

(Here N is the number of possible samples, 15 in the salespersons example.)
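How is the Variance of the Sample Means related to the Population Variance?

\[ \sigma_{\bar{X}}^2 = \frac{\sum (\bar{X} - \mu)^2}{N} = \sum P(\bar{X}) (\bar{X} - \mu)^2 \quad \text{or} \quad \sigma_{\bar{X}} = \sqrt{\sigma_{\bar{X}}^2} \approx 21.60 \]

(In this formula N is the number of possible samples, 15.)

Recall that \(\sigma\) of the population is 34.16.

With N now denoting the population size:

\[ \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1} \quad \text{if } \frac{N}{n} < 20 \]

\[ \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} \quad \text{if } \frac{N}{n} \ge 20 \]

Check for the salespersons example (N = 6, n = 2): (34.16^2 / 2)(4 / 5) = 466.67, whose square root is 21.60.

\(\sigma_{\bar{X}} = \sigma / \sqrt{n}\) is often referred to as the Standard Error of the sample statistic \(\bar{X}\).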
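
As a rough illustration of the finite population correction, the sketch below recomputes the standard error for the salespersons example. The function name `standard_error` and its `population_size` parameter are hypothetical; the formula is the one stated above.

```python
import math

def standard_error(sigma, n, population_size=None):
    """Standard error of the sample mean.

    If population_size is given, the finite population correction
    sqrt((N - n) / (N - 1)) is applied (sampling without replacement).
    """
    se = sigma / math.sqrt(n)
    if population_size is not None:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return se

sales = [60, 80, 100, 120, 140, 160]
mu = sum(sales) / len(sales)
# Population standard deviation (divide by N, not N - 1).
sigma = math.sqrt(sum((x - mu) ** 2 for x in sales) / len(sales))

se_fpc = standard_error(sigma, n=2, population_size=len(sales))
se_plain = standard_error(sigma, n=2)
print(round(sigma, 2), round(se_fpc, 2), round(se_plain, 2))
```

Only the corrected value matches the standard deviation of the 15 enumerated sample means, because the sample of 2 is a large fraction of the population of 6.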
Relationships between Population Parameters and
the Sampling Distribution of the Sample Mean
The expected value of the sample mean is equal to the population mean:

\[ E(\bar{X}) = \mu_{\bar{X}} = \mu_X \]

The variance of the sample mean is equal to the population variance divided by the sample size:

\[ V(\bar{X}) = \sigma_{\bar{X}}^2 = \frac{\sigma_X^2}{n} \]

The standard deviation of the sample mean, known as the standard error of the mean, is equal to the population standard deviation divided by the square root of the sample size:

\[ SD(\bar{X}) = \sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}} \]
Sampling from a Normal Population
When sampling from a normal population with mean \(\mu\) and standard deviation \(\sigma\), the sample mean, \(\bar{X}\), has a normal sampling distribution:

\[ \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \]

This means that, as the sample size increases, the sampling distribution of the sample mean remains centered on the population mean, but becomes more compactly distributed around that population mean.

[Figure: Sampling Distribution of the Sample Mean. Density curves f(X) for the normal population and for the sampling distributions with n = 2, n = 4, and n = 16.]
Properties of the Sampling Distribution
of the Sample Mean
• Comparing the population distribution and the sampling distribution of the mean:
   • The sampling distribution is more bell-shaped and symmetric.
   • Both have the same center.
   • The sampling distribution of the mean is more compact, with a smaller variance.

[Figure: two panels. "Uniform Distribution (1,8)": P(X) against X = 1 to 8. "Sampling Distribution of the Mean": P(\(\bar{X}\)) against \(\bar{X}\) = 1.0 to 8.0.]
[Figure: three panels. "Distribution of the Population Values": frequency against sample units 60 to 160. "Distribution of a Sample (Tom, Dick)": frequency against 60 and 80. "Sampling Distribution of the Sample Mean": frequency and relative frequency against sample means 70 to 150.]

The Population Mean is the same as the Mean of the Sampling Distribution of the Sample Mean.
(Example 1)

A company has 5000 accounts averaging Rs.3500 each, with a standard deviation of Rs.2000. Take a random sample of size 100. What is the probability that the sample mean \(\bar{X}\) will be between Rs.3000 and Rs.4000?

Given \(\mu = 3500\), \(\sigma = 2000\), \(n = 100\) and \(N = 5000\).

To find:

\[ P(3000 < \bar{X} < 4000) = ? \]

Can you really find an answer to this unless you know the shape of the distribution of \(\bar{X}\)?
What can you say about the shape of the Sampling Distribution of the Sample Mean?

Central Limit Theorem

Regardless of the shape of the population, the sampling distribution of \(\bar{X}\) is approximately normal if the random sample size is sufficiently large.

(Rule of thumb: if the sample size is at least 30, the distribution of sample means is approximately normal. So what more do you want?)
The Central Limit Theorem
When sampling from a population with mean \(\mu\) and finite standard deviation \(\sigma\), the sampling distribution of the sample mean will tend to a normal distribution with mean \(\mu\) and standard deviation \(\sigma / \sqrt{n}\) as the sample size becomes large (n > 30).

For “large enough” n:

\[ \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \]

[Figure: three panels showing the sampling distribution of \(\bar{X}\) for n = 5, n = 20, and large n.]
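
The theorem can be illustrated with a quick simulation. The sketch below is only an illustration under stated assumptions: it uses an exponential population with mean 1 and standard deviation 1 (a strongly skewed population chosen here for convenience, not taken from the slides) and a fixed random seed. It compares the spread of simulated sample means with the value \(\sigma / \sqrt{n}\).

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_means(n, trials=5000):
    """Means of `trials` random samples of size n from an
    exponential population (mean 1, standard deviation 1)."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(n))
        for _ in range(trials)
    ]

for n in (2, 30):
    means = sample_means(n)
    print(
        n,
        round(statistics.fmean(means), 3),   # close to the population mean, 1
        round(statistics.stdev(means), 3),   # close to sigma / sqrt(n)
        round(1 / n ** 0.5, 3),              # theoretical standard error
    )
```

Plotting a histogram of `sample_means(30)` would show the roughly bell-shaped curve that the theorem describes, even though the population itself is far from normal.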
The Central Limit Theorem Applies to
Sampling Distributions from Any Population
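[Figure: grid of density curves. Columns: Normal, Uniform, Skewed, and General populations. Rows: the population itself, then the sampling distribution of \(\bar{X}\) for n = 2 and for n = 30, each centered at \(\mu\).]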
Accounts Receivables Example Continued

\[ P(3000 < \bar{X} < 4000) = ? \]

\[ = P\left( \frac{3000 - \mu}{\sigma / \sqrt{n}} < \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} < \frac{4000 - \mu}{\sigma / \sqrt{n}} \right) \]

\[ = P\left( \frac{3000 - 3500}{2000 / \sqrt{100}} < Z < \frac{4000 - 3500}{2000 / \sqrt{100}} \right) \]

\[ = P(-2.5 < Z < 2.5) = 0.4938 + 0.4938 \]

= 0.9876
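
The same probability can be obtained without a z table. This is a minimal sketch using Python's standard-library `statistics.NormalDist`, assuming the standard deviation of Rs.2000 given in the problem statement (the value consistent with z = ±2.5) and ignoring the finite population correction, as the slide does.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 3500, 2000, 100

# By the Central Limit Theorem, X-bar is approximately N(mu, sigma / sqrt(n)).
x_bar_dist = NormalDist(mu, sigma / sqrt(n))

prob = x_bar_dist.cdf(4000) - x_bar_dist.cdf(3000)
print(round(prob, 4))
```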
The Central Limit Theorem
(Example 2)
Mercury makes a 2.4 liter V-6 engine, the Laser XRi, used in speedboats. The company’s engineers believe the engine delivers an average power of 220 horsepower and that the standard deviation of power delivered is 15 HP. A potential buyer intends to sample 100 engines (each engine is to be run a single time). What is the probability that the sample mean will be less than 217 HP?

\[ P(\bar{X} < 217) = P\left( \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} < \frac{217 - \mu}{\sigma / \sqrt{n}} \right) \]

\[ = P\left( Z < \frac{217 - 220}{15 / \sqrt{100}} \right) = P\left( Z < \frac{217 - 220}{15 / 10} \right) \]

\[ = P(Z < -2) = 0.0228 \]
Confidence Interval for \(\mu\)
When \(\sigma\) Is Known

• If the population distribution is normal, the sampling distribution of the mean is normal.
• If the sample is sufficiently large, regardless of the shape of the population distribution, the sampling distribution is approximately normal (Central Limit Theorem).

In either case:

\[ P\left( \mu - 1.96 \frac{\sigma}{\sqrt{n}} < \bar{x} < \mu + 1.96 \frac{\sigma}{\sqrt{n}} \right) = 0.95 \]

or

\[ P\left( \bar{x} - 1.96 \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96 \frac{\sigma}{\sqrt{n}} \right) = 0.95 \]

[Figure: Standard Normal Distribution: 95% Interval. f(z) against z from -4 to 4.]
Recall the examples of the Central Limit Theorem: Accounts Receivables!
But who is interested in knowing the chance of a sample mean lying between 3000 and 4000 when the population mean and standard deviation are known?
What we usually face is a situation where we do not know the population mean (allowing a concession, for the present, that the standard deviation is known!).

But we have a sample of a given size in hand, whose mean and standard deviation can be obtained from the sample units, isn’t it?
So we want to estimate the unknown population mean from this sample mean!

This is the purpose of inferential statistics and estimation.

So how do we generalize the results from a given sample? This is where the Central Limit Theorem comes to our rescue! We use the properties of the sampling distribution of the sample mean and the Central Limit Theorem to do the job.
Look at the following statement of truth:

\[ P(-1.96 < Z < 1.96) = 0.95 \]

(Any objection?)

[Figure: Standard Normal Distribution: 95% Interval. f(z) against z from -4 to 4.]

The CLT tells me that \(\bar{X}\) is approximately normal for a sufficiently large sample, isn’t it?

So \(\bar{X}\) will have a bell-shaped curve with mean \(\mu\) and standard deviation \(\sigma / \sqrt{n}\).

\[ P(-1.96 < Z < 1.96) = 0.95 \]

So

\[ P\left( -1.96 < \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} < 1.96 \right) = 0.95 \quad \text{as} \quad Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \]

Therefore, rearranging this we get

\[ P\left( \bar{X} - 1.96 \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96 \frac{\sigma}{\sqrt{n}} \right) = 0.95 \]

This explains the process of constructing a 95% Confidence Interval for \(\mu\).
Confidence Interval for \(\mu\) when
\(\sigma\) is Known (Continued)
Before sampling, there is a 0.95 probability that the interval

\[ \mu \pm 1.96 \frac{\sigma}{\sqrt{n}} \]

will include the sample mean (and 5% that it will not).

Conversely, after sampling, approximately 95% of such intervals

\[ \bar{x} \pm 1.96 \frac{\sigma}{\sqrt{n}} \]

will include the population mean (and 5% of them will not).

That is, \(\bar{x} \pm 1.96 \frac{\sigma}{\sqrt{n}}\) is a 95% confidence interval for \(\mu\).
A 95% Interval around the Population
Mean
Approximately 95% of sample means can be expected to fall within the interval \(\left[ \mu - 1.96 \frac{\sigma}{\sqrt{n}},\ \mu + 1.96 \frac{\sigma}{\sqrt{n}} \right]\).

Conversely, about 2.5% can be expected to be above \(\mu + 1.96 \frac{\sigma}{\sqrt{n}}\) and 2.5% can be expected to be below \(\mu - 1.96 \frac{\sigma}{\sqrt{n}}\).

So 5% can be expected to fall outside the interval \(\left[ \mu - 1.96 \frac{\sigma}{\sqrt{n}},\ \mu + 1.96 \frac{\sigma}{\sqrt{n}} \right]\).

[Figure: Sampling Distribution of the Mean. f(x) with the central 95% between \(\mu - 1.96 \sigma/\sqrt{n}\) and \(\mu + 1.96 \sigma/\sqrt{n}\) and 2.5% in each tail. Sample means \(\bar{x}\) are plotted below the curve: 95% fall within the interval, 2.5% fall below it, and 2.5% fall above it.]
95% Intervals around the Sample Mean
Approximately 95% of the intervals \(\bar{x} \pm 1.96 \frac{\sigma}{\sqrt{n}}\) around the sample mean can be expected to include the actual value of the population mean, \(\mu\). (When the sample mean falls within the 95% interval around the population mean.)

*5% of such intervals around the sample mean can be expected not to include the actual value of the population mean. (When the sample mean falls outside the 95% interval around the population mean.)

[Figure: Sampling Distribution of the Mean. f(x) with the central 95% between \(\mu - 1.96 \sigma/\sqrt{n}\) and \(\mu + 1.96 \sigma/\sqrt{n}\) and 2.5% in each tail. Intervals around individual sample means \(\bar{x}\) are drawn below the curve; those marked * do not include \(\mu\).]
The 95% Confidence Interval for \(\mu\)
A 95% confidence interval for \(\mu\) when \(\sigma\) is known and sampling is done from a normal population, or a large sample is used:

\[ \bar{x} \pm 1.96 \frac{\sigma}{\sqrt{n}} \]

The quantity \(1.96 \frac{\sigma}{\sqrt{n}}\) is often called the margin of error or the sampling error.

For example, if n = 25, \(\sigma\) = 20, and \(\bar{x}\) = 122, a 95% confidence interval is:

\[ \bar{x} \pm 1.96 \frac{\sigma}{\sqrt{n}} = 122 \pm 1.96 \frac{20}{\sqrt{25}} \]
\[ = 122 \pm (1.96)(4) \]
\[ = 122 \pm 7.84 \]
\[ = [114.16, 129.84] \]
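A (1-\(\alpha\))100% Confidence Interval for \(\mu\)

We define \(z_{\alpha/2}\) as the z value that cuts off a right-tail area of \(\alpha/2\) under the standard normal curve. (1-\(\alpha\)) is called the confidence coefficient. \(\alpha\) is called the error probability, and (1-\(\alpha\))100% is called the confidence level.

\[ P\left( z > z_{\alpha/2} \right) = \frac{\alpha}{2} \]
\[ P\left( z < -z_{\alpha/2} \right) = \frac{\alpha}{2} \]
\[ P\left( -z_{\alpha/2} < z < z_{\alpha/2} \right) = (1 - \alpha) \]

(1-\(\alpha\))100% Confidence Interval:

\[ \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]

[Figure: Standard Normal Distribution. f(z) against z from -5 to 5, with central area (1-\(\alpha\)) between \(-z_{\alpha/2}\) and \(z_{\alpha/2}\) and area \(\alpha/2\) in each tail.]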
Critical Values of z and Levels of
Confidence

(1-\(\alpha\))   \(\alpha/2\)    \(z_{\alpha/2}\)
0.99      0.005    2.576
0.98      0.010    2.326
0.95      0.025    1.960
0.90      0.050    1.645
0.80      0.100    1.282

[Figure: Standard Normal Distribution. f(z) against z from -5 to 5, with central area (1-\(\alpha\)) between \(-z_{\alpha/2}\) and \(z_{\alpha/2}\) and area \(\alpha/2\) in each tail.]
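
The critical values in this table, and the interval built from them, can be computed directly. The following is a small sketch using the standard library's `NormalDist.inv_cdf`; the helper names `z_critical` and `z_interval` are illustrative. The final call reuses the earlier worked example (n = 25, \(\sigma\) = 20, \(\bar{x}\) = 122).

```python
from math import sqrt
from statistics import NormalDist

def z_critical(confidence):
    """z value that cuts off a right-tail area of alpha/2."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known."""
    margin = z_critical(confidence) * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

for level in (0.99, 0.98, 0.95, 0.90, 0.80):
    print(level, round(z_critical(level), 3))

low, high = z_interval(x_bar=122, sigma=20, n=25)
print(round(low, 2), round(high, 2))
```

Because the critical value is computed to full precision rather than rounded to 1.96, results may differ from hand calculations in the last decimal place.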


The Level of Confidence and the Width
of the Confidence Interval
When sampling from the same population, using a fixed sample size, the higher the confidence level, the wider the confidence interval.

80% Confidence Interval: \(\bar{x} \pm 1.28 \frac{\sigma}{\sqrt{n}}\)
95% Confidence Interval: \(\bar{x} \pm 1.96 \frac{\sigma}{\sqrt{n}}\)

[Figure: two Standard Normal Distribution panels, f(z) against z from -5 to 5, showing the central 80% area and the central 95% area.]
The Sample Size and the Width of the
Confidence Interval
When sampling from the same population, using a fixed confidence level, the larger the sample size, n, the narrower the confidence interval.

[Figure: two Sampling Distribution of the Mean panels, f(x) against x, showing the 95% confidence interval for n = 20 and for n = 40.]

Example 1

• Population consists of the Fortune 500 Companies (Fortune Web Site), as ranked by Revenues. You are trying to find out the average Revenues for the companies on the list. The population standard deviation is $15,056.37. A random sample of 30 companies obtains a sample mean of $10,672.87. Give a 95% and 90% confidence interval for the average Revenues.
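
One way to work this example by hand or in code is sketched below. It assumes the critical values 1.960 and 1.645 from the table of critical values, and the variable names are illustrative; the deck itself solves the example with a spreadsheet template whose output is not reproduced here.

```python
from math import sqrt

sigma = 15056.37   # population standard deviation (given)
n = 30             # sample size (given)
x_bar = 10672.87   # sample mean (given)

# Critical values z_{alpha/2} taken from the table of critical values.
z_table = {0.95: 1.960, 0.90: 1.645}

standard_error = sigma / sqrt(n)
intervals = {}
for level, z in z_table.items():
    margin = z * standard_error
    intervals[level] = (x_bar - margin, x_bar + margin)
    print(f"{level:.0%}: {x_bar:.2f} +/- {margin:.2f} -> "
          f"[{x_bar - margin:.2f}, {x_bar + margin:.2f}]")
```

The 90% interval comes out narrower than the 95% interval, which is the trade-off between confidence level and width described earlier.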
Example 1 (continued) - Using the
Template
[Template display not reproduced; the template takes \(\sigma\) (Sigma) as an input.]
Example 1 (continued) - Using the
Template when the Sample Data is Known
Confidence Interval or Interval Estimate for \(\mu\)
When \(\sigma\) Is Unknown - The t Distribution
If the population standard deviation, \(\sigma\), is not known, replace \(\sigma\) with the sample standard deviation, s. If the population is normal, the resulting statistic:

\[ t = \frac{\bar{X} - \mu}{s / \sqrt{n}} \]

has a t distribution with (n - 1) degrees of freedom.
• The t is a family of bell-shaped and symmetric distributions, one for each number of degrees of freedom.
• The expected value of t is 0.
• For df > 2, the variance of t is df/(df-2). This is greater than 1, but approaches 1 as the number of degrees of freedom increases. The t is flatter and has fatter tails than does the standard normal.
• The t distribution approaches a standard normal as the number of degrees of freedom increases.

[Figure: density curves of the standard normal and of the t distribution with df = 20 and df = 10.]
The t Distribution Template
Confidence Intervals for \(\mu\) when \(\sigma\) is
Unknown - The t Distribution

A (1-\(\alpha\))100% confidence interval for \(\mu\) when \(\sigma\) is not known (assuming a normally distributed population):

\[ \bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}} \]

where \(t_{\alpha/2}\) is the value of the t distribution with n-1 degrees of freedom that cuts off a tail area of \(\alpha/2\) to its right.
The t Distribution

 df    t0.100   t0.050   t0.025   t0.010   t0.005
---    ------   ------   ------   ------   ------
  1    3.078    6.314    12.706   31.821   63.657
  2    1.886    2.920    4.303    6.965    9.925
  3    1.638    2.353    3.182    4.541    5.841
  4    1.533    2.132    2.776    3.747    4.604
  5    1.476    2.015    2.571    3.365    4.032
  6    1.440    1.943    2.447    3.143    3.707
  7    1.415    1.895    2.365    2.998    3.499
  8    1.397    1.860    2.306    2.896    3.355
  9    1.383    1.833    2.262    2.821    3.250
 10    1.372    1.812    2.228    2.764    3.169
 11    1.363    1.796    2.201    2.718    3.106
 12    1.356    1.782    2.179    2.681    3.055
 13    1.350    1.771    2.160    2.650    3.012
 14    1.345    1.761    2.145    2.624    2.977
 15    1.341    1.753    2.131    2.602    2.947
 16    1.337    1.746    2.120    2.583    2.921
 17    1.333    1.740    2.110    2.567    2.898
 18    1.330    1.734    2.101    2.552    2.878
 19    1.328    1.729    2.093    2.539    2.861
 20    1.325    1.725    2.086    2.528    2.845
 21    1.323    1.721    2.080    2.518    2.831
 22    1.321    1.717    2.074    2.508    2.819
 23    1.319    1.714    2.069    2.500    2.807
 24    1.318    1.711    2.064    2.492    2.797
 25    1.316    1.708    2.060    2.485    2.787
 26    1.315    1.706    2.056    2.479    2.779
 27    1.314    1.703    2.052    2.473    2.771
 28    1.313    1.701    2.048    2.467    2.763
 29    1.311    1.699    2.045    2.462    2.756
 30    1.310    1.697    2.042    2.457    2.750
 40    1.303    1.684    2.021    2.423    2.704
 60    1.296    1.671    2.000    2.390    2.660
120    1.289    1.658    1.980    2.358    2.617
inf    1.282    1.645    1.960    2.326    2.576

[Figure: t Distribution: df = 10. f(t) against t, with area = 0.10 in each tail beyond -1.372 and 1.372, and area = 0.025 in each tail beyond -2.228 and 2.228.]

Whenever \(\sigma\) is not known (and the population is assumed normal), the correct distribution to use is the t distribution with n-1 degrees of freedom. Note, however, that for large degrees of freedom, the t distribution is approximated well by the Z distribution.
Example 2

A stock market analyst wants to estimate the average return on a certain stock. A random sample of 15 days yields an average (annualized) return of \(\bar{x} = 10.37\%\) and a standard deviation of s = 3.5%. Assuming a normal population of returns, give a 95% confidence interval for the average return on this stock.

 df    t0.100   t0.050   t0.025   t0.010   t0.005
---    ------   ------   ------   ------   ------
  1    3.078    6.314    12.706   31.821   63.657
  .    .        .        .        .        .
 13    1.350    1.771    2.160    2.650    3.012
 14    1.345    1.761    2.145    2.624    2.977
 15    1.341    1.753    2.131    2.602    2.947
  .    .        .        .        .        .

The critical value of t for df = (n - 1) = (15 - 1) = 14 and a right-tail area of 0.025 is:

\[ t_{0.025} = 2.145 \]

The corresponding confidence interval or interval estimate is:

\[ \bar{x} \pm t_{0.025} \frac{s}{\sqrt{n}} = 10.37 \pm 2.145 \frac{3.5}{\sqrt{15}} \]
\[ = 10.37 \pm 1.94 \]
\[ = [8.43, 12.31] \]
Large Sample Confidence Intervals for
the Population Mean

 df    t0.100   t0.050   t0.025   t0.010   t0.005
---    ------   ------   ------   ------   ------
  1    3.078    6.314    12.706   31.821   63.657
  .    .        .        .        .        .
120    1.289    1.658    1.980    2.358    2.617
inf    1.282    1.645    1.960    2.326    2.576

Whenever \(\sigma\) is not known (and the population is assumed normal), the correct distribution to use is the t distribution with n-1 degrees of freedom.

Note, however, that for large degrees of freedom, the t distribution is approximated well by the Z distribution.
Large Sample Confidence Intervals for
the Population Mean
A large-sample (1-\(\alpha\))100% confidence interval for \(\mu\):

\[ \bar{x} \pm z_{\alpha/2} \frac{s}{\sqrt{n}} \]

Example 3: An economist wants to estimate the average amount in checking accounts at banks in a given region. A random sample of 100 accounts gives x-bar = $357.60 and s = $140.00. Give a 95% confidence interval for \(\mu\), the average amount in any checking account at a bank in the given region.

\[ \bar{x} \pm z_{0.025} \frac{s}{\sqrt{n}} = 357.60 \pm 1.96 \frac{140.00}{\sqrt{100}} = 357.60 \pm 27.44 = [330.16, 385.04] \]
Large-Sample Confidence Intervals for
the Population Proportion, p

The estimator of the population proportion, \(p\), is the sample proportion, \(\hat{p}\). If the sample size is large, \(\hat{p}\) has an approximately normal distribution, with \(E(\hat{p}) = p\) and \(V(\hat{p}) = \frac{pq}{n}\), where \(q = (1 - p)\). When the population proportion is unknown, use the estimated value, \(\hat{p}\), to estimate the standard deviation of \(\hat{p}\).

For estimating \(p\), a sample is considered large enough when both \(n \cdot p\) and \(n \cdot q\) are greater than 5.
Large-Sample Confidence Intervals for
the Population Proportion, p

A large-sample (1-\(\alpha\))100% confidence interval for the population proportion, \(p\):

\[ \hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}} \]

where the sample proportion, \(\hat{p}\), is equal to the number of successes in the sample, x, divided by the number of trials (the sample size), n, and \(\hat{q} = 1 - \hat{p}\).
Large-Sample Confidence Interval for the
Population Proportion, p (Example 4)
A marketing research firm wants to estimate the share that foreign companies have in the American market for certain products. A random sample of 100 consumers is obtained, and it is found that 34 people in the sample are users of foreign-made products; the rest are users of domestic products. Give a 95% confidence interval for the share of foreign products in this market.

\[ \hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.34 \pm 1.96 \sqrt{\frac{(0.34)(0.66)}{100}} \]
\[ = 0.34 \pm (1.96)(0.04737) \]
\[ = 0.34 \pm 0.0928 \]
\[ = [0.2472, 0.4328] \]

Thus, the firm may be 95% confident that foreign manufacturers control anywhere from 24.72% to 43.28% of the market.
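
The calculation in this example can be sketched in a few lines of Python. The function name `proportion_interval` is illustrative; the large-sample check follows the rule stated earlier (both \(n\hat{p}\) and \(n\hat{q}\) greater than 5), applied here to the estimated proportions.

```python
from math import sqrt

def proportion_interval(successes, n, z=1.96):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    # Rule of thumb: the normal approximation needs n*p and n*q above 5.
    if n * p_hat <= 5 or n * q_hat <= 5:
        raise ValueError("sample too small for the normal approximation")
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

# Example 4: 34 users of foreign-made products among 100 consumers.
low, high = proportion_interval(34, 100)
print(round(low, 4), round(high, 4))
```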
Large-Sample Confidence Interval for the
Population Proportion, p (Example 4) –
Using the Template
Reducing the Width of Confidence
Intervals - The Value of Information
The width of a confidence interval can be reduced only at the price of:
• a lower level of confidence, or
• a larger sample.

Lower Level of Confidence: 90% Confidence Interval

\[ \hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.34 \pm 1.645 \sqrt{\frac{(0.34)(0.66)}{100}} \]
\[ = 0.34 \pm (1.645)(0.04737) \]
\[ = 0.34 \pm 0.07792 \]
\[ = [0.2621, 0.4179] \]

Larger Sample Size: Sample Size, n = 200

\[ \hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.34 \pm 1.96 \sqrt{\frac{(0.34)(0.66)}{200}} \]
\[ = 0.34 \pm (1.96)(0.03350) \]
\[ = 0.34 \pm 0.0657 \]
\[ = [0.2743, 0.4057] \]
Confidence Intervals for the Population
Variance: The Chi-Square (\(\chi^2\)) Distribution
• The sample variance, \(s^2\), is an unbiased estimator of the population variance, \(\sigma^2\).
• Confidence intervals for the population variance are based on the chi-square (\(\chi^2\)) distribution.
   • The chi-square distribution is the probability distribution of the sum of several independent, squared standard normal random variables.
   • The mean of the chi-square distribution is equal to the degrees of freedom parameter, (\(E[\chi^2] = df\)). The variance of a chi-square is equal to twice the number of degrees of freedom, (\(V[\chi^2] = 2df\)).
The Chi-Square (\(\chi^2\)) Distribution
• The chi-square random variable cannot be negative, so it is bounded by zero on the left.
• The chi-square distribution is skewed to the right.
• The chi-square distribution approaches a normal as the degrees of freedom increase.

[Figure: Chi-Square Distribution: df = 10, df = 30, df = 50. f(\(\chi^2\)) against \(\chi^2\) from 0 to 100.]

In sampling from a normal population, the random variable:

\[ \chi^2 = \frac{(n - 1) s^2}{\sigma^2} \]

has a chi-square distribution with (n - 1) degrees of freedom.
Values and Probabilities of Chi-Square
Distributions
Area in Right Tail

.995 .990 .975 .950 .900 .100 .050 .025 .010 .005

Area in Left Tail

df .005 .010 .025 .050 .100 .900 .950 .975 .990 .995

1 0.0000393 0.000157 0.000982 0.00393 0.0158 2.71 3.84 5.02 6.63 7.88
2 0.0100 0.0201 0.0506 0.103 0.211 4.61 5.99 7.38 9.21 10.60
3 0.0717 0.115 0.216 0.352 0.584 6.25 7.81 9.35 11.34 12.84
4 0.207 0.297 0.484 0.711 1.06 7.78 9.49 11.14 13.28 14.86
5 0.412 0.554 0.831 1.15 1.61 9.24 11.07 12.83 15.09 16.75
6 0.676 0.872 1.24 1.64 2.20 10.64 12.59 14.45 16.81 18.55
7 0.989 1.24 1.69 2.17 2.83 12.02 14.07 16.01 18.48 20.28
8 1.34 1.65 2.18 2.73 3.49 13.36 15.51 17.53 20.09 21.95
9 1.73 2.09 2.70 3.33 4.17 14.68 16.92 19.02 21.67 23.59
10 2.16 2.56 3.25 3.94 4.87 15.99 18.31 20.48 23.21 25.19
11 2.60 3.05 3.82 4.57 5.58 17.28 19.68 21.92 24.72 26.76
12 3.07 3.57 4.40 5.23 6.30 18.55 21.03 23.34 26.22 28.30
13 3.57 4.11 5.01 5.89 7.04 19.81 22.36 24.74 27.69 29.82
14 4.07 4.66 5.63 6.57 7.79 21.06 23.68 26.12 29.14 31.32
15 4.60 5.23 6.26 7.26 8.55 22.31 25.00 27.49 30.58 32.80
16 5.14 5.81 6.91 7.96 9.31 23.54 26.30 28.85 32.00 34.27
17 5.70 6.41 7.56 8.67 10.09 24.77 27.59 30.19 33.41 35.72
18 6.26 7.01 8.23 9.39 10.86 25.99 28.87 31.53 34.81 37.16
19 6.84 7.63 8.91 10.12 11.65 27.20 30.14 32.85 36.19 38.58
20 7.43 8.26 9.59 10.85 12.44 28.41 31.41 34.17 37.57 40.00
21 8.03 8.90 10.28 11.59 13.24 29.62 32.67 35.48 38.93 41.40
22 8.64 9.54 10.98 12.34 14.04 30.81 33.92 36.78 40.29 42.80
23 9.26 10.20 11.69 13.09 14.85 32.01 35.17 38.08 41.64 44.18
24 9.89 10.86 12.40 13.85 15.66 33.20 36.42 39.36 42.98 45.56
25 10.52 11.52 13.12 14.61 16.47 34.38 37.65 40.65 44.31 46.93
26 11.16 12.20 13.84 15.38 17.29 35.56 38.89 41.92 45.64 48.29
27 11.81 12.88 14.57 16.15 18.11 36.74 40.11 43.19 46.96 49.65
28 12.46 13.56 15.31 16.93 18.94 37.92 41.34 44.46 48.28 50.99
29 13.12 14.26 16.05 17.71 19.77 39.09 42.56 45.72 49.59 52.34
30 13.79 14.95 16.79 18.49 20.60 40.26 43.77 46.98 50.89 53.67
Confidence Interval for the Population
Variance
A (1-α)100% confidence interval for the population variance, $\sigma^2$* (where the
population is assumed normal), is:

$$\left[ \frac{(n-1)s^2}{\chi^2_{\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}} \right]$$

where $\chi^2_{\alpha/2}$ is the value of the chi-square distribution with n - 1 degrees of freedom
that cuts off an area $\alpha/2$ to its right and $\chi^2_{1-\alpha/2}$ is the value of the distribution that
cuts off an area of $\alpha/2$ to its left (equivalently, an area of $1-\alpha/2$ to its right).

* Note: Because the chi-square distribution is skewed, the confidence interval for the
population variance is not symmetric.
Confidence Interval for the Population
Variance - Example 5
In an automated process, a machine fills cans of coffee. If the average amount
filled is different from what it should be, the machine may be adjusted to
correct the mean. If the variance of the filling process is too high, however,
the machine is out of control and needs to be repaired. Therefore, from time to
time regular checks of the variance of the filling process are made. This is
done by randomly sampling filled cans, measuring their amounts, and
computing the sample variance. A random sample of 30 cans gives an estimate
$s^2$ = 18,540. Give a 95% confidence interval for the population variance, $\sigma^2$.

$$\left[ \frac{(n-1)s^2}{\chi^2_{\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}} \right] = \left[ \frac{(30-1)18540}{45.7},\ \frac{(30-1)18540}{16.0} \right] = [11765,\ 33604]$$

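The arithmetic of Example 5 can be checked with a short Python sketch. The function name `variance_confidence_interval` and its parameter names are illustrative choices, not part of the slides; the critical values are read from the df = 29 row of the chi-square table rather than computed. With the two-decimal table values 45.72 and 16.05 the limits come out slightly lower than the slide's 11765 and 33604, which correspond to the critical values rounded to 45.7 and 16.0.

```python
def variance_confidence_interval(n, s2, chi2_upper, chi2_lower):
    """Return (low, high) limits for the population variance.

    chi2_upper: chi-square value cutting off area alpha/2 to its right
    chi2_lower: chi-square value cutting off area alpha/2 to its left
    Both are table values for n - 1 degrees of freedom.
    """
    scaled = (n - 1) * s2
    # Dividing by the larger critical value gives the lower limit.
    return scaled / chi2_upper, scaled / chi2_lower


# Example 5: n = 30 cans, s^2 = 18,540, 95% confidence, df = 29.
low, high = variance_confidence_interval(30, 18540, chi2_upper=45.72, chi2_lower=16.05)
print(round(low), round(high))  # 11760 33499

# Same calculation with the critical values rounded as on the slide.
low_slide, high_slide = variance_confidence_interval(30, 18540, chi2_upper=45.7, chi2_lower=16.0)
print(round(low_slide), round(high_slide))  # 11765 33604
```

The interval is not centered on $s^2$ = 18,540, which illustrates the note about the skewness of the chi-square distribution.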
Example 5 (continued)
Area in Right Tail

df .995 .990 .975 .950 .900 .100 .050 .025 .010 .005
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
28 12.46 13.56 15.31 16.93 18.94 37.92 41.34 44.46 48.28 50.99
29 13.12 14.26 16.05 17.71 19.77 39.09 42.56 45.72 49.59 52.34
30 13.79 14.95 16.79 18.49 20.60 40.26 43.77 46.98 50.89 53.67

Chi-Square Distribution: df = 29
[Figure: density f(χ²) plotted against χ² from 0 to 70, with central area 0.95 and area 0.025 in each tail; $\chi^2_{0.975} = 16.05$ and $\chi^2_{0.025} = 45.72$.]
Example 5 Using the Template

Using Data
Sample-Size Determination
Before determining the necessary sample size, three questions must be answered:

• How close do you want your sample estimate to be to the unknown parameter? (What is the desired bound, B?)
• What do you want the desired confidence level (1-α) to be so that the distance between your estimate and the parameter is less than or equal to B?
• What is your estimate of the variance (or standard deviation) of the population in question?



For example: A (1-α) confidence interval for μ: $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$

Bound, B: $B = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
Sample Size and Standard Error

The sample size determines the bound of a statistic, since the standard
error of a statistic shrinks as the sample size increases:

[Figure: sampling distributions of a statistic for sample size = n and sample size = 2n; the standard error of the statistic is smaller for sample size = 2n.]
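As a rough numerical illustration of this point, the sketch below takes the sample mean as the statistic, whose standard error is σ/√n. The values of `sigma` and `n` are hypothetical and chosen only for illustration.

```python
import math


def standard_error_of_mean(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


sigma = 400.0  # hypothetical population standard deviation
n = 50         # hypothetical sample size

se_n = standard_error_of_mean(sigma, n)
se_2n = standard_error_of_mean(sigma, 2 * n)

# Doubling the sample size divides the standard error by sqrt(2),
# so the ratio is about 0.707 regardless of sigma and n.
ratio = se_2n / se_n
print(round(se_n, 2), round(se_2n, 2), round(ratio, 3))
```

Because the standard error falls with √n rather than n, halving the bound B requires roughly four times the sample size.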


Minimum Sample Size:
Mean and Proportion
Minimum required sample size in estimating the population
mean, μ:

$$n = \frac{z_{\alpha/2}^2 \sigma^2}{B^2}$$

Bound of estimate:

$$B = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$

Minimum required sample size in estimating the population
proportion, p:

$$n = \frac{z_{\alpha/2}^2\, p q}{B^2}$$
Sample-Size Determination:
Example 6
A marketing research firm wants to conduct a survey to estimate the average
amount spent on entertainment by each person visiting a popular resort. The
people who plan the survey would like to determine the average amount spent by
all people visiting the resort to within $120, with 95% confidence. From past
operation of the resort, an estimate of the population standard deviation is
s = $400. What is the minimum required sample size?

$$n = \frac{z_{\alpha/2}^2 \sigma^2}{B^2} = \frac{(1.96)^2 (400)^2}{120^2} = 42.684 \Rightarrow 43$$
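The calculation in Example 6 can be reproduced with a minimal Python sketch. The function name `min_sample_size_mean` is an illustrative choice, and the z value 1.96 is supplied as an input rather than computed. The result is rounded up because a sample size must be a whole number that keeps the bound at or below B.

```python
import math


def min_sample_size_mean(z, sigma, bound):
    """Minimum n for estimating a mean: n = z^2 * sigma^2 / B^2, rounded up."""
    exact = (z * sigma / bound) ** 2
    return exact, math.ceil(exact)


# Example 6: 95% confidence (z = 1.96), sigma estimated as $400, bound B = $120.
exact_n, required_n = min_sample_size_mean(z=1.96, sigma=400, bound=120)
print(round(exact_n, 3), required_n)  # 42.684 43
```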
Sample-Size for Proportion:
Example 7
The manufacturers of a sports car want to estimate the proportion of people in a
given income bracket who are interested in the model. The company wants to
know the population proportion, p, to within 0.10 with 99% confidence. Current
company records indicate that the proportion p may be around 0.25. What is the
minimum required sample size for this survey?

$$n = \frac{z_{\alpha/2}^2\, p q}{B^2} = \frac{2.576^2 (0.25)(0.75)}{0.10^2} = 124.42 \Rightarrow 125$$
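A similar sketch reproduces the Example 7 calculation. The function name `min_sample_size_proportion` is hypothetical, q is taken as 1 - p, and the z value 2.576 is supplied as an input. The calculation above uses a bound of B = 0.10; the second call shows, for comparison, how sharply the requirement grows if the bound is tightened to 0.01.

```python
import math


def min_sample_size_proportion(z, p, bound):
    """Minimum n for estimating a proportion: n = z^2 * p * q / B^2, rounded up."""
    q = 1.0 - p
    exact = z ** 2 * p * q / bound ** 2
    return exact, math.ceil(exact)


# Example 7: 99% confidence (z = 2.576), p around 0.25, bound B = 0.10.
exact_n, required_n = min_sample_size_proportion(z=2.576, p=0.25, bound=0.10)
print(round(exact_n, 2), required_n)  # 124.42 125

# A bound ten times tighter needs about one hundred times the sample.
_, required_tight = min_sample_size_proportion(z=2.576, p=0.25, bound=0.01)
print(required_tight)  # 12443
```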
The Templates – Optimizing Population
Mean Estimates
The Templates – Optimizing Population
Proportion Estimates
