
Chapter Five

Statistical Inference: Estimation for Single Populations


Estimating the Population Mean with Large Sample Sizes
On many occasions, estimating the population mean is useful in business research.
A point estimate is a statistic taken from a sample and used to estimate a population
parameter. However, a point estimate is only as good as the representativeness of its
sample. If other random samples are taken from the population, the point estimates
derived from those samples are likely to vary. Because of this variation in sample statistics,
estimating a population parameter with an interval estimate is often preferable to using
a point estimate. An interval estimate (confidence interval) is a range of values within
which the analyst can declare, with some confidence, that the population parameter lies.
Confidence intervals can be two-sided or one-sided.

As a result of the central limit theorem, the following Z formula for means can be used
when sample sizes are large, regardless of the shape of the population distribution, or for
smaller sizes if the population is normally distributed.

\[ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]
Rearranging this formula algebraically to solve for \(\mu\) gives

\[ \mu = \bar{x} - Z \frac{\sigma}{\sqrt{n}} \]
Because a sample mean can be greater than or less than the population mean, Z can be
positive or negative. Thus,

\[ \mu = \bar{x} \pm Z \frac{\sigma}{\sqrt{n}} \]
Rewriting this expression yields the confidence interval formula for estimating \(\mu\) with
large sample sizes.

\(100(1 - \alpha)\%\) confidence interval to estimate \(\mu\)

\[ \bar{x} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \qquad (5.1) \]
or
\[ \bar{x} - Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]
where
\(\alpha\) = the area under the normal curve outside the confidence interval area
\(\alpha/2\) = the area in one end (tail) of the distribution outside the confidence interval

Here we use \(\alpha\) to locate the Z value in constructing the confidence interval. Because the
standard normal table is based on areas between a Z of 0 and \(Z_{\alpha/2}\), the table Z value is
found by locating the area of \(0.5 - \alpha/2\), which is the part of the normal curve between the
middle of the curve and one of the tails. Another way to locate this Z value is to change
the confidence level from a percentage to a proportion, divide it in half, and go to the table
with this value. The results are the same.
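
The table lookup described here can also be done numerically. The following is a minimal sketch, assuming Python 3.8 or later, whose standard library provides `statistics.NormalDist`; the function name `z_critical` is introduced only for illustration. It prints the table area \(0.5 - \alpha/2\) and the corresponding Z value for a few common confidence levels.

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Two-sided critical value Z_{alpha/2} for a confidence level given as a proportion."""
    alpha = 1.0 - confidence
    # inv_cdf expects the cumulative area to the LEFT of the point: 1 - alpha/2.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.98, 0.99):
    alpha = 1.0 - level
    table_area = 0.5 - alpha / 2.0  # area between Z = 0 and Z_{alpha/2}, as in the table
    print(f"{level:.0%}: table area = {table_area:.3f}, Z = {z_critical(level):.3f}")
```

The computed values for 98% and 99% are approximately 2.326 and 2.576. The rounded table values 2.33 and 2.575 used in this chapter differ from them only slightly.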
The confidence interval formula (5.1) yields a range (interval) within which we feel
with some confidence that the population mean is located. It is not certain that the population
mean is in the interval unless we have a 100% confidence interval, which is infinitely wide.
However, we can assign a probability that the parameter (\(\mu\)) is located within the
interval.
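Formula 5.1 can be presented as a probability statement:

\[ P\left( \bar{x} - Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right) = 1 - \alpha \]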
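[Figure: Z scores for confidence intervals in relation to \(\alpha\). The central area \(1 - \alpha\) between \(-Z_{\alpha/2}\) and \(Z_{\alpha/2}\) is the confidence; each shaded tail has area \(\alpha/2\).]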

Example
A real estate broker estimates the mean family income in an area as an indicator of
expected sales. A sample of 100 families yields a mean of \(\bar{x}\) = 35,500. Presume the
population standard deviation is \(\sigma\) = 7,200. A 95% confidence interval for \(\mu\) is
estimated as

\[ \bar{x} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]

The Z value for a 95% confidence interval is 1.96 (0.95/2 = 0.475, for which the table Z value
is 1.96; equivalently, \(\alpha/2\) = 0.025 and 0.5 – 0.025 = 0.475, so \(Z_{0.025}\) = 1.96).

\[ 35{,}500 \pm (1.96) \frac{7{,}200}{\sqrt{100}} = 35{,}500 \pm 1{,}411.20 \]
\[ 34{,}088.80 \le \mu \le 36{,}911.20 \]

Interpretation
- The broker is 95% confident that the true unknown population mean is between
$34,088.80 and $36,911.20.
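
The arithmetic in this example can be checked with a short script. This is only a sketch: the function name `z_interval` is hypothetical, and the Z value is passed in as the rounded table value (1.96) rather than computed, so that the result lines up with the table-based calculation.

```python
from math import sqrt

def z_interval(x_bar, sigma, n, z):
    """Return (lower, upper) for x_bar ± z * sigma / sqrt(n), with sigma known."""
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# Inputs from the family income example: n = 100, mean 35,500, sigma 7,200.
lower, upper = z_interval(x_bar=35_500, sigma=7_200, n=100, z=1.96)
print(f"{lower:,.2f} <= mu <= {upper:,.2f}")
```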

Finite correction Factor


In case of interval estimation, the finite correction factor is used to reduce the width of
the interval.

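Confidence interval to estimate \(\mu\) using the finite correction factor

\[ \bar{x} - Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} \le \mu \le \bar{x} + Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} \qquad (5.2) \]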
Example
A random sample of 50 from 800 engineers reveals that the average sample age is 34.3
years. Historically, the population standard deviation of the ages of the company's
engineers is approximately 8 years. Construct a 98% confidence interval to estimate the
average age of all the engineers in this company.
Solution
N = 800
n = 50, which is more than 5% of N, so the finite correction factor is applied
\(\bar{x}\) = 34.3
\(\sigma\) = 8

The Z value for a 98% confidence interval is 2.33.

Using the Z formula yields
\[ 34.3 - 2.33 \frac{8}{\sqrt{50}} \sqrt{\frac{750}{799}} \le \mu \le 34.3 + 2.33 \frac{8}{\sqrt{50}} \sqrt{\frac{750}{799}} \]
\[ 34.3 - 2.55 \le \mu \le 34.3 + 2.55 \]
\[ 31.75 \le \mu \le 36.85 \]
Without the finite correction factor, the interval would be the wider
\(31.66 \le \mu \le 36.94\).
The finite correction factor takes into account the fact that the population is only 800
instead of being infinitely large. The sample, n = 50, is a greater proportion of the 800 than
it would be of a larger population, and thus the width of the confidence interval is
reduced.
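
To see how much the correction factor narrows the interval, the following sketch applies formula 5.2 to the inputs of the engineer example. The function name `z_interval_fpc` is an illustrative assumption, and the Z value 2.33 is the rounded table value from the text. With the factor applied, the margin of error comes out near 2.55; without it, the margin is about 2.64.

```python
from math import sqrt

def z_interval_fpc(x_bar, sigma, n, N, z):
    """Z interval whose margin is scaled by the finite correction factor sqrt((N - n) / (N - 1))."""
    fpc = sqrt((N - n) / (N - 1))
    margin = z * sigma / sqrt(n) * fpc
    return x_bar - margin, x_bar + margin

# Inputs from the engineer example: N = 800, n = 50, mean 34.3, sigma 8, 98% confidence.
lower, upper = z_interval_fpc(x_bar=34.3, sigma=8, n=50, N=800, z=2.33)
uncorrected_margin = 2.33 * 8 / sqrt(50)
print(f"with correction:    {lower:.2f} <= mu <= {upper:.2f}")
print(f"without correction: margin = {uncorrected_margin:.2f}")
```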

Confidence interval to estimate \(\mu\) when \(\sigma\) is unknown and n is large
When sample sizes are large (\(n \ge 30\)), the sample standard deviation is a good estimate
of the population standard deviation and can be used as an acceptable approximation of
the population standard deviation in the Z formula for a mean. Because formulas based on
the central limit theorem require large samples for non-normal populations, it makes sense to
modify formula (5.1) to use the sample standard deviation, S. Beware, however, not to
use this modified formula for small samples when the population standard deviation is
unknown, even when the population is normally distributed.

\[ \bar{x} \pm Z_{\alpha/2} \frac{S}{\sqrt{n}} \]
or
\[ \bar{x} - Z_{\alpha/2} \frac{S}{\sqrt{n}} \le \mu \le \bar{x} + Z_{\alpha/2} \frac{S}{\sqrt{n}} \qquad (5.3) \]
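Example
Given n = 110, \(\bar{x}\) = 85.5, and S = 19.3, compute a 99% confidence interval to estimate \(\mu\).

Solution
The confidence interval is
\[ \bar{x} - Z_{\alpha/2} \frac{S}{\sqrt{n}} \le \mu \le \bar{x} + Z_{\alpha/2} \frac{S}{\sqrt{n}} \]
For 99% confidence, \(\alpha/2 = 0.005\) and \(Z_{0.005} = 2.575\). Equivalently, 0.99/2 = 0.495,
and the table area 0.495 (0.5 – 0.495 = 0.005 in the tail) gives \(Z_{0.005} = 2.575\).
\[ 85.5 - 2.575 \frac{19.3}{\sqrt{110}} \le \mu \le 85.5 + 2.575 \frac{19.3}{\sqrt{110}} \]
\[ 80.8 \le \mu \le 90.2 \]
With 99% confidence, we estimate that the population mean is somewhere between 80.8
and 90.2.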

Table
- Values of Z for some of the more common levels of confidence

Confidence level    Z value
90%                 1.645
95%                 1.96
98%                 2.33
99%                 2.575

Estimating the population mean: small sample sizes, \(\sigma\) unknown
In many real-life situations, sample sizes of less than 30 are the norm.
The t Distribution

William S. Gosset, a British statistician, developed the t distribution, which describes the
sample data in small samples when the population standard deviation is unknown and the
population is normally distributed. The formula for the t value is

\[ t = \frac{\bar{x} - \mu}{S / \sqrt{n}} \]

The formula is essentially the same as the Z formula, but the distribution table values are
different.
The assumption underlying the use of the techniques discussed in this chapter for small
sample sizes is that the population is normally distributed. If the population distribution
is not normal or is unknown, nonparametric techniques should be used.

Robustness
Most statistical techniques have one or more underlying assumptions. If a statistical
technique is relatively insensitive to minor violations in one or more of its underlying
assumptions, the technique is said to be robust to that assumption. The t statistic for
estimating a population mean is relatively robust to the assumption that the population is
normally distributed.
Some statistical techniques are not robust, and statisticians should exercise extreme
caution to be certain that the assumptions underlying a technique are being met before
using it or interpreting statistical output resulting from its use.
A researcher should always be aware of statistical assumptions and the robustness of
the techniques being used in an analysis.

Characteristics of the t Distribution


▪ Symmetric
▪ Unimodal
▪ A family of curves
▪ Flatter in the middle, with more area in the tails, than the standard normal
distribution
An examination of t distribution values reveals that the t distribution approaches the
standard normal curve as n becomes large.

The t distribution is the appropriate distribution to use any time the population variance
or standard deviation is unknown, regardless of sample size. However, because the
difference between the table values for Z and t becomes negligible for large samples, many
researchers use the Z distribution for large-sample analysis even when the standard
deviation or variance is unknown.
In this chapter, the t distribution is reserved for use with small-sample problems (n < 30)
because, as n nears 30, the t table values approach the Z table values.

Finding a value in the t distribution table requires knowing the sample size. The t
distribution table is a compilation of many t distributions, with each line of the table
representing a different sample size. However, the sample size must be converted to
degrees of freedom (df) before a table value can be determined.

The t formulas are used because the population variance or standard deviation, which is part
of the Z formula, is unknown and must be estimated by a sample standard deviation or
variance.

The t distribution table does not use the area between the statistic and the mean, as does
the Z distribution table. Instead, the t table uses the area in the tail of the distribution. The emphasis
in the t table is on \(\alpha\), and each tail of the distribution contains \(\alpha/2\) of the area under the
curve when confidence intervals are constructed.
Degrees of freedom (df = n – 1)
- The number of observations that can be freely chosen

Variance of the t distribution (defined for n > 3)

\[ \sigma^2 = \frac{n - 1}{n - 3} \]

Example
Given n = 4 observations that must produce a mean of 10, the mean of 10 serves as a
constraint, and there are n – 1 = 3 degrees of freedom.
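
The idea of a constraint removing one degree of freedom can be illustrated with a small sketch. The three "free" values below are hypothetical numbers chosen only for illustration, as is the helper name `last_value`. Once three of the four observations are chosen, the required mean of 10 fixes the fourth.

```python
def last_value(free_values, required_mean):
    """Given n - 1 freely chosen values, return the value the n-th observation must take."""
    n = len(free_values) + 1
    return required_mean * n - sum(free_values)

free = [8, 13, 7]                      # three values chosen freely (hypothetical)
forced = last_value(free, required_mean=10)
sample = free + [forced]
print("forced fourth value:", forced)
print("sample mean:", sum(sample) / len(sample))
```

Whatever three values are chosen, the fourth is determined by the constraint, which is why only n – 1 = 3 observations are free to vary.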

Confidence interval to estimate \(\mu\) when \(\sigma\) is unknown and the sample size is small
The t formula
\[ t = \frac{\bar{x} - \mu}{S / \sqrt{n}} \]
can be manipulated algebraically to produce a formula for estimating the population
mean using small samples when \(\sigma\) is unknown and the population is normally
distributed. The result is the formula given next.

\[ \bar{x} \pm t_{\alpha/2,\, n-1} \frac{S}{\sqrt{n}} \]
or
\[ \bar{x} - t_{\alpha/2,\, n-1} \frac{S}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\, n-1} \frac{S}{\sqrt{n}} \qquad (5.4) \]
Example
The owner of a large equipment rental company wants to make a rather quick estimate of the
average number of days a piece of equipment is rented out per person per time. She
decides to take a random sample of rental invoices. Fourteen different rentals of
the equipment are selected randomly from the files, yielding the following data. She uses
these data to construct a 99% confidence interval to estimate the average number of days
that equipment is rented, assuming that the number of days per rental is normally
distributed in the population.

3 1 3 2 5 1 2 1 4 2 1 3 1 1

Because n = 14, df = 13. The 99% level of confidence results in \(\alpha/2 = 0.005\) of the area in each
tail of the distribution. The table t value is
\(t_{0.005,\, 13} = 3.012\)
The sample mean is 2.14 and the sample standard deviation is 1.29. The confidence
interval is

\[ \bar{x} \pm t_{\alpha/2,\, n-1} \frac{S}{\sqrt{n}} \]

\[ 2.14 \pm 3.012 \frac{1.29}{\sqrt{14}} = 2.14 \pm 1.04 \]
\[ 1.10 \le \mu \le 3.18 \]

\[ \text{Prob}(1.10 \le \mu \le 3.18) = 0.99 \]

The point estimate of the average length of a rental is 2.14 days, with an error
of \(\pm 1.04\) days.
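
The sample statistics and the interval in this example can be reproduced from the raw data. This is a sketch using only the Python standard library; because the standard library has no t distribution, the critical value 3.012 is simply the table value quoted in the example, entered by hand.

```python
from math import sqrt
from statistics import mean, stdev

days = [3, 1, 3, 2, 5, 1, 2, 1, 4, 2, 1, 3, 1, 1]
t_table = 3.012                 # t_{0.005, 13}, taken from the t table as in the text

n = len(days)
x_bar = mean(days)
s = stdev(days)                 # sample standard deviation (divides by n - 1)
margin = t_table * s / sqrt(n)

print(f"n = {n}, df = {n - 1}")
print(f"mean = {x_bar:.2f}, S = {s:.2f}, margin = {margin:.2f}")
print(f"{x_bar - margin:.2f} <= mu <= {x_bar + margin:.2f}")
```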

Estimating the Population Proportion


Business decision makers and researchers often need to be able to estimate the population
proportion.
The central limit theorem for sample proportions leads to the following formula.

\[ Z = \frac{\hat{p} - P}{\sqrt{\dfrac{P \cdot Q}{n}}} \]

where Q = 1 – P. Recall that this formula can be applied only when n·P and n·Q are greater
than 5.
Algebraically manipulating this formula to estimate P involves solving for P. However, P
is in both the numerator and the denominator, which complicates the resulting formula.
For this reason, for confidence interval purposes only and for large sample sizes, \(\hat{p}\) is
substituted for P in the denominator, yielding

\[ Z = \frac{\hat{p} - P}{\sqrt{\dfrac{\hat{p} \cdot \hat{q}}{n}}} \]

where \(\hat{q} = 1 - \hat{p}\). Solving for P results in the confidence interval in formula 5.5.

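Confidence interval to estimate P

\[ \hat{p} - Z_{\alpha/2} \sqrt{\frac{\hat{p} \cdot \hat{q}}{n}} \le P \le \hat{p} + Z_{\alpha/2} \sqrt{\frac{\hat{p} \cdot \hat{q}}{n}} \qquad (5.5) \]
where
\(\hat{p}\) = sample proportion
\(\hat{q}\) = \(1 - \hat{p}\)
P = population proportion
n = sample size
In this formula, \(\hat{p}\) is the point estimate and \(\pm Z_{\alpha/2} \sqrt{\hat{p} \cdot \hat{q} / n}\) is the error of the estimation.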

Example
A study of 87 randomly selected companies with a telemarketing operation revealed that
39% of the sampled companies had used telemarketing to assist them in order processing.
Using this information, how could a researcher estimate the population proportion of
telemarketing companies that use their telemarketing operation to assist them in order
processing?

Solution
- \(\hat{p}\) = 0.39 is the point estimate of the population proportion, P.
- For n = 87 and \(\hat{p}\) = 0.39, a 95% confidence interval can be computed to determine the
interval estimate of P.
- The Z value for 95% confidence is 1.96.
- The value of \(\hat{q} = 1 - \hat{p}\) is 1 – 0.39 = 0.61.
- The confidence interval estimate is

\[ 0.39 - 1.96 \sqrt{\frac{(0.39)(0.61)}{87}} \le P \le 0.39 + 1.96 \sqrt{\frac{(0.39)(0.61)}{87}} \]
\[ 0.29 \le P \le 0.49 \]
\[ \text{Prob}(0.29 \le P \le 0.49) = 0.95 \]
- There is a point estimate of 0.39 with an error of \(\pm 0.10\). This result has a 95% level of
confidence.
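
Formula 5.5 can be sketched in a few lines. The function name `proportion_interval` is hypothetical. The size check is also an assumption of this sketch: the text states the "greater than 5" rule in terms of n·P and n·Q, and here it is applied to the sample values, since P itself is unknown.

```python
from math import sqrt

def proportion_interval(p_hat, n, z):
    """Return (lower, upper) for p_hat ± z * sqrt(p_hat * q_hat / n)."""
    q_hat = 1 - p_hat
    # Assumption of this sketch: check the "greater than 5" rule on the sample values.
    if n * p_hat <= 5 or n * q_hat <= 5:
        raise ValueError("sample too small for the normal approximation")
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

# Inputs from the telemarketing example: n = 87, sample proportion 0.39, 95% confidence.
lower, upper = proportion_interval(p_hat=0.39, n=87, z=1.96)
print(f"{lower:.2f} <= P <= {upper:.2f}")
```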
Estimating the Population Variance
Suppose a researcher wants to estimate the population variance. The relationship of the
sample variance to the population variance is captured by the chi-square distribution (\(\chi^2\)).
- Chi-square lacks robustness.
- The number of degrees of freedom for the chi-square formula is n – 1.
\(\chi^2\) formula for single variance

\[ \chi^2 = \frac{(n - 1) s^2}{\sigma^2} \qquad (5.6) \]

df = n – 1
- The chi-square distribution is not symmetrical, and its shape varies according to the
degrees of freedom.

Formula 5.6 can be rearranged algebraically to produce a formula that can be used to construct
confidence intervals for population variances.

Confidence interval to estimate the population variance

\[ \frac{(n - 1) s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n - 1) s^2}{\chi^2_{1 - \alpha/2}} \]

df = n – 1
The value of \(\alpha\) is equal to 1 – (level of confidence expressed as a proportion). Thus, if we
are constructing a 90% confidence interval, alpha is 10% of the area and is expressed in
proportion form as \(\alpha\) = 0.10.
Example
Given S = 1.12 and n = 25, develop a 95% confidence interval to estimate the population
variance. Assume the population is normally distributed.

Solution
\(S^2 = (1.12)^2 = 1.2544\)
df = n – 1 = 25 – 1 = 24
A 95% confidence level means that alpha (\(\alpha\)) is 1 – 0.95 = 0.05. This value is split to determine
the area in each tail of the chi-square distribution: \(\alpha/2\) = 0.025. The chi-square values
obtained from the table are

\(\chi^2_{0.025,\, 24} = 39.3641\)
\(\chi^2_{0.975,\, 24} = 12.4011\)
From this information, the confidence interval can be determined.

\[ \frac{(n - 1) s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n - 1) s^2}{\chi^2_{1 - \alpha/2}} \]

\[ \frac{(24)(1.2544)}{39.3641} \le \sigma^2 \le \frac{(24)(1.2544)}{12.4011} \]

\[ 0.7648 \le \sigma^2 \le 2.4277 \]

\[ \text{Prob}(0.7648 \le \sigma^2 \le 2.4277) = 0.95 \]
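
The variance interval can be checked numerically as follows. This is a sketch with a hypothetical function name, `variance_interval`; the two chi-square values are the table values quoted in the example, entered by hand because the Python standard library has no chi-square distribution.

```python
def variance_interval(s, n, chi2_upper_tail, chi2_lower_tail):
    """Return (lower, upper) for sigma^2.

    chi2_upper_tail is the table value with alpha/2 to its right (the larger one);
    chi2_lower_tail is the table value with 1 - alpha/2 to its right (the smaller one).
    """
    numerator = (n - 1) * s ** 2
    return numerator / chi2_upper_tail, numerator / chi2_lower_tail

# Inputs from the example: S = 1.12, n = 25, table values for df = 24.
lower, upper = variance_interval(s=1.12, n=25,
                                 chi2_upper_tail=39.3641,
                                 chi2_lower_tail=12.4011)
print(f"{lower:.4f} <= sigma^2 <= {upper:.4f}")
```

Note that the larger chi-square value goes in the denominator of the lower limit, because dividing by a larger number gives a smaller bound.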

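Graphically

[Figure: chi-square distribution showing the two table values of \(\chi^2\), with an area of 0.025 in each tail; the label 0.975 marks the area to the right of the lower table value.]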

Estimating sample size


In most business research that uses sample statistics to infer about the population, being
able to estimate the size of sample necessary to accomplish the purpose of the study is
important.
Sample size when estimating \(\mu\)

- When \(\mu\) is being estimated, the size of the sample can be determined by using the Z
formula for sample means to solve for n.

\[ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]
\((\bar{x} - \mu)\) is the error of estimation resulting from the sampling process. Let \(E = (\bar{x} - \mu)\),
the error of estimation.
\[ Z = \frac{E}{\sigma / \sqrt{n}} \]
Solving for n produces the sample size.
Sample size when estimating \(\mu\)

\[ n = \frac{Z_{\alpha/2}^2 \, \sigma^2}{E^2} = \left( \frac{Z_{\alpha/2} \, \sigma}{E} \right)^2 \]
- If \(\sigma\) is unknown, use the following estimate to represent \(\sigma\).

\[ \sigma \approx \tfrac{1}{4} (\text{range}) \]
This estimate is derived from the empirical rule (Chapter 1), which states that approximately
95% of the values in a normal distribution are within
\(\pm 2\sigma\) of the mean, giving a range of about \(4\sigma\) within which most of the values are located.
Example
Suppose you want to estimate the average age of all Boeing 727 airplanes now in
active domestic U.S. service. You want to be 95% confident, and you want your estimate
to be within 2 years of the actual figure. The 727 was first placed in service about 30
years ago, but you believe that no active 727s in the U.S. domestic fleet are more than 25
years old. How large a sample should you take?

Solution
E = 2 years, the Z value for 95% is 1.96, and \(\sigma\) is unknown, so it is estimated from the range:
\(\sigma \approx (1/4)(\text{range})\)
\(\sigma \approx (1/4)(25) = 6.25\)

\[ n = \frac{Z_{\alpha/2}^2 \, \sigma^2}{E^2} = \frac{(1.96)^2 (6.25)^2}{2^2} = 37.52 \approx 38 \]
Because the sample size must be a whole number, the result is rounded up to 38.
If you randomly sample 38 units, you have an opportunity to estimate the average age of
active 727s within two years and be 95% confident of the results.
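
The sample size calculation, including the range-based estimate of the standard deviation and the rounding up to a whole number, can be sketched as follows. The function name `sample_size_for_mean` is illustrative only.

```python
from math import ceil

def sample_size_for_mean(z, sigma, error):
    """Smallest whole n satisfying n >= (z * sigma / error) ** 2."""
    return ceil((z * sigma / error) ** 2)

# Inputs from the Boeing 727 example: range of 25 years, E = 2 years, 95% confidence.
sigma_estimate = 25 / 4          # one quarter of the range, per the empirical rule
raw_n = (1.96 * sigma_estimate / 2) ** 2
n = sample_size_for_mean(z=1.96, sigma=sigma_estimate, error=2)
print(f"sigma estimate = {sigma_estimate}, raw n = {raw_n:.2f}, sample size = {n}")
```

Rounding up rather than to the nearest integer keeps the error of estimation within the stated bound.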
