
Confidence Interval Estimation

• Point estimation and confidence interval estimation
• Confidence interval estimates for the mean
• Confidence interval estimates for the proportion
• Sample size decision in estimating population mean
• Sample size decision in estimating population proportion
Statistical Estimation
• We take data from a sample and use it to draw conclusions about the population from which the sample was drawn.
• A sample statistic is used to estimate an unknown population parameter.
• There are two types of estimation:
• Point Estimation:
 Calculation of a single value of a sample statistic
• Confidence Interval Estimation:
 Calculation of an interval using a sample statistic
 This interval is calculated at a desired level of confidence
• E.g., 95% confidence or 99% confidence; the level cannot be 100%
 Sample-to-sample variation (standard error) is also taken into consideration.
Confidence Interval Estimates
• Let θ be an unknown parameter.
• Suppose T is the point estimate of θ.
• Also, E(T) = θ, i.e., T is unbiased.
• Fix the confidence level at (1 − α) × 100%.
• Suppose that the confidence interval estimate of θ is obtained as [T − h, T + h].
• It means that P(T − h ≤ θ ≤ T + h) = 1 − α.
• α is the probability of “error”, i.e., that the interval does not contain θ.
• (1 − α) is called the confidence coefficient.
• Thus, for a 95% confidence level, α = 0.05.
• The general formula for all confidence intervals is:
[UE ± CV × SE]
• where
 UE = Unbiased (Point) Estimate of the unknown parameter
 CV = Critical Value (discussed later)
 SE = Standard Error of the estimator
• i.e., Lower Confidence Limit = UE − CV × SE
• Upper Confidence Limit = UE + CV × SE
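The general formula UE ± CV × SE can be sketched in a few lines of Python. This is only an illustration: the function name `confidence_interval` and the numbers passed to it are hypothetical and do not come from the slides.

```python
def confidence_interval(estimate, critical_value, standard_error):
    """Return (lower, upper) confidence limits: UE -/+ CV * SE."""
    margin = critical_value * standard_error
    return estimate - margin, estimate + margin


# Hypothetical inputs, chosen only to illustrate the formula.
lower, upper = confidence_interval(estimate=50.0, critical_value=1.96,
                                   standard_error=2.0)
width = upper - lower  # the width of the interval is 2 * CV * SE
print(f"CI = ({lower:.2f}, {upper:.2f}), width = {width:.2f}")
```

Every interval in the rest of these slides is an instance of this pattern; only the estimate, the critical value and the standard error change.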
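[Figure: the point estimate lies at the centre of the interval between the lower confidence limit and the upper confidence limit; the distance between the two limits is the width of the confidence interval.]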
• Using the Central Limit Theorem, for large sample size,
$$Z = \frac{\text{Unbiased Estimator} - \text{Parameter}}{\text{Standard Error}} \sim N(0,1)$$
• Fix the confidence level at (1 − α) × 100%.
• The critical value is given by $z_{\alpha/2}$ as below:
• For Z ~ N(0,1), $P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$.
• Since
$$Z = \frac{UE - \text{Parameter}}{SE} \sim N(0,1)$$
• and $P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$, where Z ~ N(0,1),
• this implies
$$P\left(-z_{\alpha/2} < \frac{UE - \text{Parameter}}{SE} < z_{\alpha/2}\right) = 1 - \alpha$$
or
$$P\left(-z_{\alpha/2} \times SE < UE - \text{Parameter} < z_{\alpha/2} \times SE\right) = 1 - \alpha$$
or
$$P\left(UE - z_{\alpha/2} \times SE < \text{Parameter} < UE + z_{\alpha/2} \times SE\right) = 1 - \alpha$$
• Thus the (1 − α) × 100% confidence interval estimate of the unknown parameter is given by
• $[UE - z_{\alpha/2} \times SE,\; UE + z_{\alpha/2} \times SE]$
Confidence Interval for Population Mean μ
(σ Known)
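• When
 Population standard deviation σ is known, and
 Population is normally distributed, or
 If population is not normal, sample size is large
• The (1 − α) × 100% confidence interval estimate of μ is given by
$$\left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$$
• where $P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$, Z ~ N(0,1).
[Figure: N(0,1) density for 1 − α = 0.95, with central area 0.95 between $-z_{\alpha/2} = -1.96$ and $z_{\alpha/2} = 1.96$, and area α/2 = 0.025 in each tail.]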
Commonly used confidence levels and corresponding critical values (N(0,1) distribution)

| Confidence Level | Confidence Coefficient (1 − α) | α | Critical Value $z_{\alpha/2}$ |
|---|---|---|---|
| 80% | 0.80 | 0.20 | 1.28 |
| 90% | 0.90 | 0.10 | 1.645 |
| 95% | 0.95 | 0.05 | 1.96 |
| 98% | 0.98 | 0.02 | 2.33 |
| 99% | 0.99 | 0.01 | 2.58 |
| 99.8% | 0.998 | 0.002 | 3.09 |
| 99.9% | 0.999 | 0.001 | 3.29 |
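Critical values like those tabulated above can also be computed rather than looked up. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is an assumption made for this illustration.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical value z_{alpha/2} for a confidence coefficient in (0, 1)."""
    alpha = 1.0 - confidence
    # z_{alpha/2} leaves area alpha/2 in the upper tail of N(0,1).
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for conf in (0.80, 0.90, 0.95, 0.98, 0.99, 0.998, 0.999):
    print(f"{conf:.3f}  z = {z_critical(conf):.3f}")
```

Printed tables usually round these values to two or three decimals, so small differences in the last digit are expected.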
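[Figure: sampling distribution of the sample mean $\bar{x}$, which is normal with mean $\mu_{\bar{x}} = \mu$ and standard error $\sigma/\sqrt{n}$; central area 1 − α and area α/2 in each tail. The horizontal axis shows the value of the sample mean $\bar{x}$ for different samples.]
(1 − α) × 100% of intervals will contain μ.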
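The statement that (1 − α) × 100% of the intervals contain μ can be checked by simulation. The sketch below is an illustration under stated assumptions: the function name `coverage`, the fixed random seed and the number of trials are arbitrary choices, and the population values are borrowed from the resistance example only for concreteness.

```python
import random
from statistics import NormalDist, fmean


def coverage(mu, sigma, n, confidence, trials, seed=0):
    """Fraction of simulated z-intervals (sigma known) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n from N(mu, sigma) and form its interval.
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials


rate = coverage(mu=2.2, sigma=0.35, n=11, confidence=0.95, trials=2000)
print(f"observed coverage: {rate:.3f}")
```

The observed fraction should be close to 0.95, but not exactly equal to it, because the simulation itself is subject to sampling variation.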

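[Figure: confidence intervals computed from different samples, each of the form $\left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$.]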
• Example:
• A sample of 11 circuits from a large normal population
has a mean resistance of 2.20 ohms.
• We know from past testing that the population standard
deviation is 0.35 ohms.
• Determine a 95% confidence interval for the true mean
resistance of the population.
• Ans.
$$\bar{x} \pm z_{0.025}\frac{\sigma}{\sqrt{n}} = 2.20 \pm 1.96\,(0.35/\sqrt{11}) = 2.20 \pm 0.2068$$
The 95% confidence interval is (1.9932, 2.4068).
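The resistance example can be reproduced with a short Python sketch. The function name `z_interval` is an assumption for this illustration; the critical value is computed from the standard normal distribution instead of being read from the table.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence=0.95):
    """CI for the mean when the population standard deviation is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin


# Values from the example: n = 11, mean 2.20 ohms, sigma = 0.35 ohms.
lo, hi = z_interval(xbar=2.20, sigma=0.35, n=11, confidence=0.95)
print(f"95% CI: ({lo:.4f}, {hi:.4f})")
```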
Confidence Interval for Population Mean μ
(σ Unknown)
• Estimate σ by the sample standard deviation $s_1$ (its square, $s_1^2$, is an unbiased estimate of $\sigma^2$), given by
$$s_1 = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
• Case 1: n is small
 The value of $s_1$ varies from sample to sample
 This introduces extra variability
 The normal distribution cannot be used
 We use the t distribution
• Case 2: n is large
 When n is large, the t distribution approaches the normal distribution
 We use the N(0,1) distribution
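As a small illustration of the estimate of σ with divisor n − 1, the sketch below computes it by hand and compares it with the standard library. The data values are hypothetical, and the function name `s1` simply mirrors the notation of the slides.

```python
from math import sqrt
from statistics import pstdev, stdev


def s1(data):
    """Sample standard deviation with divisor n - 1."""
    n = len(data)
    xbar = sum(data) / n
    return sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))


sample = [2.1, 2.4, 1.9, 2.3, 2.2]  # hypothetical measurements
n = len(sample)

print(s1(sample))      # manual formula, divisor n - 1
print(stdev(sample))   # statistics.stdev also uses divisor n - 1
print(pstdev(sample))  # statistics.pstdev uses divisor n
```

If only the divisor-n variance is available, multiplying it by n/(n − 1) gives the square of the divisor-(n − 1) estimate.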
Case 1: σ is unknown and n is small
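• Assumption: Population has a normal distribution
• The (1 − α) × 100% confidence interval estimate of μ is given by
$$\left[\bar{x} - t_{\alpha/2}\frac{s_1}{\sqrt{n}},\; \bar{x} + t_{\alpha/2}\frac{s_1}{\sqrt{n}}\right]$$
• where $t_{\alpha/2}$ is such that
• for T ~ t(n−1), $P(-t_{\alpha/2} < T < t_{\alpha/2}) = 1 - \alpha$.
[Figure: t(n−1) density with central area 1 − α between $-t_{\alpha/2}$ and $t_{\alpha/2}$, and area α/2 in each tail.]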
Some critical values $t_{\alpha/2}$ of the t(n−1) distribution for given α and d.f. (n−1)

| d.f. (n−1) | Critical value at α = 0.05 | Critical value at α = 0.10 |
|---|---|---|
| 1 | 12.706 | 6.314 |
| 2 | 4.303 | 2.920 |
| 3 | 3.182 | 2.353 |
| 4 | 2.776 | 2.132 |
| 5 | 2.571 | 2.015 |
| 6 | 2.447 | 1.943 |
| 7 | 2.365 | 1.895 |
• Consider the same example:
• A sample of 11 circuits from a large normal population has a mean resistance of 2.20 ohms.
• The population standard deviation is not known.
• The sample standard deviation is 0.35 ohms.
• Determine a 95% confidence interval for the true mean resistance of the population.
• Ans. With n = 11 there are n − 1 = 10 degrees of freedom, for which $t_{0.025} = 2.228$:
$$\bar{x} \pm t_{0.025}\frac{s_1}{\sqrt{n}} = 2.20 \pm 2.228\,(0.35/\sqrt{11}) = 2.20 \pm 0.2351$$
The 95% confidence interval is (1.9649, 2.4351).
• If we are given $s^2$ (the sample variance computed with divisor n), we can use the following formula:
$$s_1^2 = \frac{n}{n-1}\,s^2$$
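A sketch of the t-based interval follows. The Python standard library has no t quantile function, so the critical value is passed in as an argument taken from a t table; the function name `t_interval` is an assumption for this illustration. For a sample of n = 11 the degrees of freedom are n − 1 = 10, for which standard t tables give $t_{0.025} \approx 2.228$.

```python
from math import sqrt


def t_interval(xbar, s1, n, t_crit):
    """CI for the mean when sigma is unknown and n is small.

    t_crit is t_{alpha/2} for n - 1 degrees of freedom, read from a t table.
    """
    margin = t_crit * s1 / sqrt(n)
    return xbar - margin, xbar + margin


# n = 11 gives 10 d.f.; 2.228 is the tabulated t_{0.025} for 10 d.f.
lo, hi = t_interval(xbar=2.20, s1=0.35, n=11, t_crit=2.228)
print(f"95% CI: ({lo:.4f}, {hi:.4f})")
```

The interval is wider than the one obtained with z = 1.96 for the same data, which reflects the extra variability from estimating σ.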
Case 2: σ is unknown and n is large
• Assumption: Population has a normal distribution
• This assumption is not critical when n is large.
• The (1 − α) × 100% confidence interval estimate of μ is given by
$$\left[\bar{x} - z_{\alpha/2}\frac{s_1}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\frac{s_1}{\sqrt{n}}\right]$$
• where $z_{\alpha/2}$ is such that
• for Z ~ N(0,1), $P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$.
Confidence Interval Estimate of μ

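| σ | Sample size | Population | Confidence interval |
|---|---|---|---|
| Known | n small | Normal distribution | $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ |
| Known | n large | Any distribution | $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ |
| Unknown | n small | Normal distribution | $\bar{x} \pm t_{\alpha/2}\,s_1/\sqrt{n}$ |
| Unknown | n large | Any distribution | $\bar{x} \pm z_{\alpha/2}\,s_1/\sqrt{n}$ |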
Confidence Intervals for Population Proportion P

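• We know that, for large n,
$$Z = \frac{p - P}{\sqrt{PQ/n}} \sim N(0,1)$$
where p is the sample proportion and Q = 1 − P.
• For Z ~ N(0,1), we have
$$P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$$
or
$$P\left(-z_{\alpha/2} < \frac{p - P}{\sqrt{PQ/n}} < z_{\alpha/2}\right) = 1 - \alpha$$
or
$$P\left(p - z_{\alpha/2}\sqrt{PQ/n} < P < p + z_{\alpha/2}\sqrt{PQ/n}\right) = 1 - \alpha$$
• Thus the (1 − α) × 100% CI estimate of P is given by
$$\left[p - z_{\alpha/2}\sqrt{PQ/n},\; p + z_{\alpha/2}\sqrt{PQ/n}\right]$$
• This expression itself contains P, which is unknown, so this CI estimate cannot be computed as written.
• We replace P by its unbiased estimate, the sample proportion p.
• Then the (1 − α) × 100% CI estimate of P is given by
$$\left[p - z_{\alpha/2}\sqrt{pq/n},\; p + z_{\alpha/2}\sqrt{pq/n}\right]$$
• where q = 1 − p.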
• Example:
• A random sample of 100 people shows that 25 have opened an IRA (individual retirement arrangement) this year.
• Construct a 95% confidence interval for the true proportion of the population who have opened an IRA.
• Ans.
$$p \pm z_{0.025}\sqrt{p(1-p)/n} = 25/100 \pm 1.96\sqrt{0.25\,(0.75)/100} = 0.25 \pm 1.96\,(0.0433)$$
The 95% confidence interval is (0.1651, 0.3349).
Sample Size Decision
(when Estimating μ)
• We have seen (for sufficiently large n) that $\bar{x}$ is approximately normal with mean μ and standard error $\sigma/\sqrt{n}$, or
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$$
• Error of estimation: $e = |\bar{x} - \mu|$
• Fix the confidence level at (1 − α) × 100%.
• Obtain the critical value $z_{\alpha/2}$ using N(0,1) such that $P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$.
• Then, we have
$$z_{\alpha/2} = \frac{e}{\sigma/\sqrt{n}} \quad\text{or}\quad n = \left(\frac{\sigma\,z_{\alpha/2}}{e}\right)^2$$
• Thus the sample size for estimating population mean μ is
$$n = \left(\frac{\sigma\,z_{\alpha/2}}{e}\right)^2$$
• The critical value $z_{\alpha/2}$ can be taken from the table.
• The estimation error (e) should be fixed by the researcher in advance.
• Clearly, e ≠ 0.
• The population standard deviation σ can be estimated from some other small sample or pilot survey as Range/6 or by the sample standard deviation.
• Example:
• In a pilot survey, it is observed that the smallest
observation is 6 and the largest observation is 276.
• What should be the sample size needed to estimate the
population mean within ± 5 with 90% confidence level?
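• Ans.
Estimate of population standard deviation: $\hat{\sigma} = \dfrac{276 - 6}{6} = 45$
Estimation error: e = 5
For 90% confidence level, critical value $z_{0.05} = 1.645$
So,
$$n = \left(\frac{\hat{\sigma}\,z_{0.05}}{e}\right)^2 = \left(\frac{45 \times 1.645}{5}\right)^2 = 219.19$$
which is rounded up to n = 220 so that the error does not exceed ±5.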
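The sample size calculation for the mean can be sketched as follows. The function names are assumptions for this illustration. The result is rounded up with `math.ceil`, which is the usual convention because rounding down would leave the margin of error slightly larger than the target; for the pilot-survey example this gives 220.

```python
from math import ceil


def sigma_from_range(smallest, largest):
    """Rough estimate of sigma from a pilot survey: Range / 6."""
    return (largest - smallest) / 6


def sample_size_mean(sigma, e, z):
    """Smallest n with z * sigma / sqrt(n) <= e, i.e. n >= (sigma * z / e)^2."""
    if e <= 0:
        raise ValueError("estimation error e must be positive")
    return ceil((sigma * z / e) ** 2)


# Values from the example: range 6 to 276, e = 5, z_{0.05} = 1.645.
sigma_hat = sigma_from_range(6, 276)
n_required = sample_size_mean(sigma_hat, e=5, z=1.645)
print(sigma_hat, n_required)
```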
Sample Size Decision
(when Estimating P)
• Similarly, the sample size for estimating population proportion P is given by
$$n = \frac{PQ\,(z_{\alpha/2})^2}{e^2}$$
• For a fixed confidence coefficient (1 − α), the critical value $z_{\alpha/2}$ can be taken from the normal table.
• The estimation error (e = |p − P|) should be fixed by the researcher in advance. Clearly, e ≠ 0.
• The population proportion P can be estimated from some other small sample or pilot survey.
• If no information is available, it can be decided by the researcher using past experience or can be taken as 0.5.
• Example:
• How large a sample would be necessary to
estimate the true proportion defective in a large
population within ±3%, with 95% confidence?
• (Assume a pilot sample yields p = 0.12)
• Ans.
Estimate of population proportion: p = 0.12
Estimation error: e = 3/100 = 0.03
For 95% confidence level, critical value $z_{0.025} = 1.96$
So,
$$n = \frac{pq\,(z_{0.025})^2}{e^2} = \frac{0.12 \times 0.88 \times 1.96 \times 1.96}{0.03 \times 0.03} = 450.75$$
which is rounded up to n = 451.
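A sketch of the sample size formula for a proportion follows. The function name `sample_size_proportion` is an assumption for this illustration. The second call shows the conservative choice P = 0.5, which the slides suggest when no prior information is available; it gives the largest sample size because PQ is largest at P = 0.5.

```python
from math import ceil


def sample_size_proportion(p, e, z=1.96):
    """Smallest n with z * sqrt(p*q/n) <= e, i.e. n >= p*q*z^2 / e^2."""
    if e <= 0:
        raise ValueError("estimation error e must be positive")
    return ceil(p * (1 - p) * z ** 2 / e ** 2)


n_pilot = sample_size_proportion(p=0.12, e=0.03, z=1.96)  # pilot estimate of P
n_worst = sample_size_proportion(p=0.50, e=0.03, z=1.96)  # no prior information
print(n_pilot, n_worst)
```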
• Estimating Total:
• In auditing, one is often more interested in estimating the population total amount.
• The point estimate of the total is $N\bar{x}$.
• The CI estimate at (1 − α) × 100% confidence level is given by
$$N\bar{x} \pm N\,t_{\alpha/2}\frac{s_1}{\sqrt{n}} \quad \text{(small sample size, normal distribution)}$$
$$N\bar{x} \pm N\,z_{\alpha/2}\frac{s_1}{\sqrt{n}} \quad \text{(large sample size)}$$
• The finite population correction (fpc) should be used when n/N > 0.05:
$$N\bar{x} \pm N\,t_{\alpha/2}\frac{s_1}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} \quad \text{(small sample size, normal distribution)}$$
$$N\bar{x} \pm N\,z_{\alpha/2}\frac{s_1}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} \quad \text{(large sample size)}$$
• Example: A firm has a population of 1000 accounts and wishes to estimate the total population value.
• A sample of 80 accounts is selected with an average balance of $87.6 and a standard deviation of $22.3.
• Find the 95% confidence interval estimate of the total balance.
• Ans: N = 1000, n = 80, $\bar{x}$ = 87.6, $s_1$ = 22.3. Since n/N = 0.08 > 0.05, the fpc is used:
$$N\bar{x} \pm N\,z_{0.025}\frac{s_1}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = (1000)(87.6) \pm (1000)(1.96)\frac{22.3}{\sqrt{80}}\sqrt{\frac{1000-80}{1000-1}} = 87{,}600 \pm 4{,}689.51$$
The 95% confidence interval is (82,910.49, 92,289.51).
• Estimating Total Difference:
• An auditor may wish to estimate the magnitude of errors.
• An error is the difference between the value reached during the audit and the original recorded value.
• A sample of n items is collected.
• Let $D_i$ denote the error in the ith item (i = 1, 2, …, n).
 $D_i$ = 0, if the auditor finds that the original value is correct
 $D_i$ > 0, if the audited value is larger than the original value
 $D_i$ < 0, if the audited value is smaller than the original value
• Define:
$$\bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i \quad\text{and}\quad s_D = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(D_i - \bar{D})^2}$$
• The point estimate of the total difference is $N\bar{D}$.
• CI estimate of the total difference:
$$N\bar{D} \pm N\,t_{\alpha/2}\frac{s_D}{\sqrt{n}} \quad \text{(for small samples, normal distribution)}$$
$$N\bar{D} \pm N\,z_{\alpha/2}\frac{s_D}{\sqrt{n}} \quad \text{(for large samples)}$$
• The fpc should be used when n/N > 0.05:
$$N\bar{D} \pm N\,t_{\alpha/2}\frac{s_D}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} \quad \text{(for small samples, normal distribution)}$$
$$N\bar{D} \pm N\,z_{\alpha/2}\frac{s_D}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} \quad \text{(for large samples)}$$
• Example:
• Econe Dresses has 1200 inventory items.
• In the past, 15% of the items were incorrectly priced.
• A sample of 120 items was selected.
• The historical cost of each item was compared with the audited value.
• For 15 items, the historical cost and the audited value differ.
• The resulting summary values are as follows:
n = 120, N = 1200, $\bar{D}$ = 0.95833, $s_D$ = 25.24482
• Since n/N = 120/1200 = 0.1 > 0.05, we use the fpc.
• The 95% CI is
$$N\bar{D} \pm N\,z_{0.025}\frac{s_D}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = 1200 \times (0.95833) \pm 1200 \times 1.96 \times \frac{25.24482}{\sqrt{120}}\sqrt{\frac{1200-120}{1200-1}} = 1{,}150.00 \pm 5{,}144.24$$
i.e., approximately (−3,994.24, 6,294.24).
Summary
• Point estimation and confidence interval estimation
• CI estimates for the population mean (σ known)
• CI estimates for the population mean (σ unknown)
• CI estimates for the population proportion
• Sample size decision in estimating population mean
• Sample size decision in estimating population proportion
• CI estimates for Population Total
• CI estimates for Total Difference
