
CHAPTER TWO

STATISTICAL ESTIMATIONS
2.1 Basic Concepts
Introduction
In many cases the values of population parameters are unknown. When they are, it is generally not sufficient to make some convenient assumption about their values; rather, those unknown parameters should be estimated.
In business many decisions are made without complete information. A firm does not know exactly what its sales volume will be next year or next month. A college does not know exactly how many students will enroll next year. Both must estimate in order to make decisions about the future.
Estimation is one of the two main branches of inferential statistics. Inferential statistics is the procedure by which inferences about a population are made on the basis of the results obtained from a sample drawn from that population. This can be achieved by:
- Hypothesis testing, e.g. using sample evidence to test hypotheses about the population mean.
- Estimation, e.g. estimating the population mean using the information derived from a sample.
There are two types of estimation for a population parameter:
- Point estimation
- Interval estimation
Definition of Terms:
- Estimation: the process of estimating an unknown population parameter through sampling, i.e. using a sample statistic to estimate the unknown population parameter. Put simply, it is the act of making an informed guess at the value of a population parameter. The objective of estimation is to determine the approximate value of a population parameter on the basis of a sample statistic. E.g. the sample mean $\bar{x}$ is used to estimate the population mean $\mu$.
- Estimator: a sample statistic that is used to estimate an unknown population parameter. E.g. the sample mean, the sample proportion, etc.
- Estimate: the specific numerical value that an estimator takes for a particular sample. For instance, the sample mean is an estimator of the population mean, and the value it takes for one sample (say $\bar{x} = 9.5$) is an estimate.
- Parameter: a characteristic of a population. E.g. the population mean, population standard deviation, population proportion, etc.
- Statistic: a characteristic of a sample. E.g. the sample mean, sample proportion, sample standard deviation, etc.
2.2 Types of Estimates
2.2.1 Point Estimation
A point estimate is a single numerical value, obtained from a random sample, that is used to estimate the corresponding population parameter. A random sample of observations is taken from the population of interest, and the observed values are used to obtain a point estimate of the relevant parameter.
- The sample mean $\bar{x}$ is the usual (unbiased) estimator of the population mean $\mu$. Different samples from a population yield different point estimates of $\mu$.
- The sample proportion $p$ is a good estimator of the population proportion $P$.
The population proportion $P$ is the number of elements in the population belonging to the category of interest divided by the total number of elements in the population:
$$P = \frac{X}{N}$$
where $X$ is the number of successes in the population and $N$ is the population size.
The sample proportion is
$$p = \frac{x}{n} = \frac{\text{number of successes in the sample}}{\text{number sampled}}$$
where $x$ is the number of elements in the sample found to belong to the category of interest and $n$ is the sample size.
Example: Of 2000 persons sampled, 1600 favored stricter environmental protection measures. What is the estimated population proportion?
$$p = \frac{1600}{2000} = 0.80$$
Thus 80% is an estimate of the proportion of the population that favors stricter measures.
In general:
- $\bar{x}$ estimates $\mu$
- $s$ estimates $\sigma$
- $s^2$ estimates $\sigma^2$
- $p$ estimates $P$
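As a small illustration, the point estimates listed above can be computed with Python's standard `statistics` module. The data are the ten daily accident counts from the t-distribution example later in this chapter. The proportion reuses the 1600-of-2000 survey figures from the example above. The variable names are illustrative only.

```python
import statistics

# Daily accident counts (the sample used in the t-distribution example).
accidents = [8, 7, 10, 15, 11, 6, 8, 5, 13, 12]

x_bar = statistics.mean(accidents)          # point estimate of mu
s = statistics.stdev(accidents)             # point estimate of sigma (divisor n - 1)
s_squared = statistics.variance(accidents)  # point estimate of sigma^2 (divisor n - 1)

# The sample proportion p = x / n is the point estimate of P.
favoured, sampled = 1600, 2000
p = favoured / sampled

print(x_bar, round(s, 2), s_squared, p)  # 9.5 3.24 10.5 0.8
```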
2.2.1.1 Estimators and Their Properties (Goodness of an Estimator)
A good estimator should satisfy the following properties.
- Unbiasedness: An estimator is said to be unbiased if its expected value equals the population parameter it estimates. Since $E(\bar{x}) = \mu$, the sample mean $\bar{x}$ is an unbiased estimator of the population mean. Any systematic deviation of the estimator from the parameter of interest is called bias.
- Efficiency: An estimator is efficient if it has a relatively small variance (equivalently, a small standard deviation). If there are two unbiased estimators of a parameter, the one with the smaller variance is said to be relatively more efficient.
- Consistency: An estimator is said to be consistent if the probability that it is close to the parameter it estimates increases as the sample size increases. The sample mean is a consistent estimator of $\mu$, because the standard deviation of $\bar{x}$ is
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
As the sample size $n$ increases, $\sigma_{\bar{x}}$ decreases, and hence the probability that $\bar{x}$ will be close to its expected value $\mu$ increases.
- Sufficiency: An estimator is said to be sufficient if it uses all the information in the data about the parameter it estimates. The sample mean is a sufficient estimator of $\mu$ (for instance, when sampling from a normal population). Other estimators, such as the median and the mode, do not use all the values, whereas the mean uses every value (they are added and divided by the sample size).
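The unbiasedness, efficiency and consistency properties can be checked informally by simulation. The sketch below assumes a normal population with $\mu = 50$ and $\sigma = 10$; these values were chosen only for illustration. It draws repeated samples with Python's `random` module. The helper name `sampling_spread` is hypothetical.

```python
import random
import statistics

def sampling_spread(estimator, n, reps=2000, mu=50.0, sigma=10.0, seed=1):
    """Simulate `reps` normal samples of size n; return the average and the
    standard deviation of the estimator's values across those samples."""
    rng = random.Random(seed)
    values = [estimator([rng.gauss(mu, sigma) for _ in range(n)])
              for _ in range(reps)]
    return statistics.fmean(values), statistics.stdev(values)

# Unbiasedness: the sample means average out close to mu = 50.
mean_of_means, se_mean = sampling_spread(statistics.fmean, n=25)
# Efficiency: on the same samples (same seed), the median varies more than the mean.
_, se_median = sampling_spread(statistics.median, n=25)
# Consistency: the spread of x_bar shrinks as n grows (roughly sigma / sqrt(n)).
_, se_small = sampling_spread(statistics.fmean, n=10)
_, se_large = sampling_spread(statistics.fmean, n=100)

print(f"average of x_bar: {mean_of_means:.2f}")
print(f"spread of mean vs median (n=25): {se_mean:.2f} vs {se_median:.2f}")
print(f"spread of mean, n=10 vs n=100: {se_small:.2f} vs {se_large:.2f}")
```

For normal data the spread of the mean for n = 25 should come out near $10/\sqrt{25} = 2$, while the median's spread is noticeably larger. This is what "relatively more efficient" means in practice.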
2.2.2 Interval Estimation
Any point estimate is subject to sampling error, which can be measured by the standard error of the mean; the standard error describes the precision of the estimated mean. Because of sampling variation we cannot say that the exact parameter value is some specific number, but we can determine a range of values within which we are confident the unknown parameter lies.
An interval estimate states the range within which a population parameter probably lies. The interval within which a population parameter is expected to lie is usually referred to as the confidence interval.
The confidence interval for the population mean is an interval that has a high probability of containing the population mean $\mu$.
Three confidence intervals are used extensively:
- 90% confidence interval
- 95% confidence interval
- 99% confidence interval
A 95% confidence interval means that, if samples of the same size were drawn repeatedly and an interval constructed from each in the same way, about 95% of these intervals would contain the parameter being estimated. With a 99% confidence interval we expect about 99% of the intervals to contain the parameter being estimated.
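The repeated-sampling interpretation can be illustrated with a short simulation. It assumes a normal population with known $\mu = 100$ and $\sigma = 15$ and samples of size 36; all of these values are hypothetical. The sketch builds the interval $\bar{x} \pm z\sigma/\sqrt{n}$ from each simulated sample and counts how often it contains $\mu$.

```python
import random
import statistics

def coverage(mu=100.0, sigma=15.0, n=36, z=1.96, reps=2000, seed=7):
    """Fraction of intervals x_bar ± z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        x_bar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / reps

print(coverage())          # should be close to 0.95
print(coverage(z=2.58))    # should be close to 0.99
```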
Another interpretation of the 95% confidence interval is that 95% of the sample means for a specified sample size will lie within 1.96 standard errors (standard deviations of the sampling distribution of the mean) of the hypothesized population mean. For 99%, the sample means will lie within 2.58 standard errors of the hypothesized population mean.
Where do the values 1.96 and 2.58 come from?
The middle 95% of the sample means lie symmetrically on either side of the mean. So 0.95/2 = 0.4750, or 47.5%, of the area lies between the mean and the upper cut-off, and another 0.4750 lies between the mean and the lower cut-off. From the standard normal table, the Z value with an area of 0.4750 between 0 and Z is 1.96. The cut-off to the right of the mean is therefore Z = +1.96, and the cut-off to the left is Z = −1.96. Similarly, for 99% the area on each side is 0.99/2 = 0.4950, which gives Z = ±2.58.
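The table lookup described above can be reproduced with `statistics.NormalDist` from Python's standard library (Python 3.8+). The area to the left of $+Z$ is $0.5 + 0.4750 = 0.975$ for 95% confidence. The helper name `z_critical` is illustrative.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided Z value: the middle `confidence` area lies between -Z and +Z."""
    # Area to the left of +Z is 0.5 + confidence / 2 (e.g. 0.5 + 0.4750 = 0.975).
    return NormalDist().inv_cdf(0.5 + confidence / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))
```

The printed values (about 1.645, 1.96 and 2.576) match the standard normal table; 2.58 in the text is 2.576 rounded.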
2.2.2.1 Interval Estimation of the Mean
a) Compute the standard error of the mean.
The standard error of the mean is the standard deviation of the sampling distribution of the sample means:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
where $\sigma$ is the population standard deviation and $n$ is the sample size. If the population standard deviation is not known, the sample standard deviation $s$ is used to approximate it:
$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$
This shows that the error in estimating the population mean decreases as the sample size increases.
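To make "decreases as the sample size increases" concrete, the sketch below evaluates $\sigma/\sqrt{n}$ for an illustrative $\sigma = 10$. Because $n$ sits under a square root, quadrupling the sample size only halves the standard error. The function name `standard_error` is hypothetical.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

# Illustrative sigma = 10: each quadrupling of n halves the standard error.
for n in (25, 100, 400):
    print(n, standard_error(10, n))
```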
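b) When n > 30, the 95% and 99% confidence intervals are constructed as follows (with $s$ in place of $\sigma$ when the population standard deviation is unknown):
$$\text{95\% confidence interval: } \bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$$
$$\text{99\% confidence interval: } \bar{x} \pm 2.58\frac{\sigma}{\sqrt{n}}$$
Here 1.96 and 2.58 are the Z values corresponding to the middle 95% and 99% of the observations, respectively.
In general, a confidence interval for the mean is computed by
$$\bar{x} \pm Z\frac{\sigma}{\sqrt{n}} \quad\text{or}\quad \bar{x} \pm Z\frac{s}{\sqrt{n}}$$
where Z reflects the selected level of confidence.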

Example
An experiment involves selecting a random sample of 256 middle managers for studying their
annual income. The sample mean is computed to be Br. 35,420 and the sample standard
deviation is Br. 2,050.
a) What is the estimated mean income of all middle managers (the population)?
b) What is the 95% confidence interval (rounded to the nearest 10)?
c) What are the 95% confidence limits?
d) Interpret the finding.
Solution
a) The sample mean, Br. 35,420, is the point estimate of the population mean, so $\hat{\mu} = 35{,}420$.
b) The 95% confidence interval is found by
$$\bar{x} \pm 1.96\frac{s}{\sqrt{n}} = 35{,}420 \pm 1.96\left(\frac{2{,}050}{\sqrt{256}}\right) = 35{,}420 \pm 1.96(128.125) = 35{,}420 \pm 251.125$$
The interval therefore runs from 35,168.875 to 35,671.125. Rounded to the nearest 10, that is about Br. 35,170 to Br. 35,670.
c) The end points of the confidence interval are called the confidence limits. Here, rounded to the nearest 10, the lower limit is Br. 35,170 and the upper limit is Br. 35,670.
d) Interpretation: Suppose many samples of 256 middle managers were drawn and a 95% confidence interval constructed from each in this way. About 95 out of 100 such intervals would contain the population mean annual income, and about 5 out of 100 would not. We are therefore 95% confident that the population mean annual income lies between about Br. 35,170 and Br. 35,670.
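The arithmetic of parts (b) and (c) can be checked with a minimal large-sample sketch of $\bar{x} \pm Z\,s/\sqrt{n}$. It assumes $Z = 1.96$ as in the text. The function name `mean_confidence_interval` is hypothetical.

```python
import math

def mean_confidence_interval(x_bar, sd, n, z=1.96):
    """Large-sample interval x_bar ± z * sd / sqrt(n)."""
    margin = z * sd / math.sqrt(n)
    return x_bar - margin, x_bar + margin

lower, upper = mean_confidence_interval(35420, 2050, 256)
print(lower, upper)                          # exact limits
print(round(lower, -1), round(upper, -1))    # rounded to the nearest 10
```

The first line should show 35168.875 and 35671.125, and the second 35170.0 and 35670.0.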
Exercise
A research firm conducted a survey to determine the mean amount smokers spend on cigarettes during a week. A sample of 49 smokers revealed a sample mean of Br. 20 with a standard deviation of Br. 5. Construct a 95% confidence interval for the mean amount spent.
2.2.2.2 Interval Estimation of the difference between two independent means
If all possible samples of large sizes $n_1$ and $n_2$ are drawn from two different populations, then the sampling distribution of the difference between the two sample means, $\bar{x}_1 - \bar{x}_2$, is approximately normal with mean $\mu_1 - \mu_2$ and standard deviation
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
For a desired confidence level, the confidence limits for the difference between the population means, $\mu_1 - \mu_2$, are given by
$$(\bar{x}_1 - \bar{x}_2) \pm Z\,\sigma_{\bar{x}_1 - \bar{x}_2}$$
Example:
The strength of the wire produced by company A has a mean of 4,500 kg and a standard deviation of 200 kg. Company B has a mean of 4,000 kg and a standard deviation of 300 kg. A sample of 50 wires from company A and 100 wires from company B are selected at random for testing the strength. Find 99% confidence limits for the difference in the average strength of the populations of wires produced by the two companies.
Solution: The following information is given:
Company A: $\bar{x}_1 = 4500$, $\sigma_1 = 200$, $n_1 = 50$
Company B: $\bar{x}_2 = 4000$, $\sigma_2 = 300$, $n_2 = 100$
Therefore $\bar{x}_1 - \bar{x}_2 = 4500 - 4000 = 500$, and $Z = 2.576$ for 99% confidence.
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{200^2}{50} + \frac{300^2}{100}} = \sqrt{800 + 900} = \sqrt{1700} \approx 41.23$$
The required 99% confidence limits are
$$(\bar{x}_1 - \bar{x}_2) \pm Z\,\sigma_{\bar{x}_1 - \bar{x}_2} = 500 \pm 2.576(41.23) = 500 \pm 106.21$$
Hence the 99% confidence interval for the difference in the average strength of wires produced by the two companies is $393.79 \le \mu_1 - \mu_2 \le 606.21$ kg.
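The calculation above can be sketched as a small function. It takes the summary figures of two independent large samples and the Z value (2.576 here, as in the text). The function name `diff_means_interval` is hypothetical.

```python
import math

def diff_means_interval(x1, sd1, n1, x2, sd2, n2, z):
    """(x1 - x2) ± z * sqrt(sd1^2/n1 + sd2^2/n2) for two independent samples."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    diff = x1 - x2
    return diff - z * se, diff + z * se, se

lo, hi, se = diff_means_interval(4500, 200, 50, 4000, 300, 100, z=2.576)
print(round(se, 2), round(lo, 2), round(hi, 2))  # 41.23 393.79 606.21
```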
Exercise 1: A large chain store wishes to compare credit card holders living in area I with those living in area II in terms of the length of time the customers have held their credit cards. A random sample of 81 card holders is selected from each area. The sample means are 120 months and 90 months for area I and area II, respectively. Given population variances of 49 (months²) for area I and 36 (months²) for area II, construct a 99% confidence interval for the difference between the two population means.
Exercise 2: A sample of 150 bulbs of brand A showed an average life of 1800 hrs with a standard deviation of 15 hrs. Another sample of 100 bulbs of brand B showed an average life of 1500 hrs with a standard deviation of 11 hrs. Obtain a 95% confidence interval for the difference in the mean life of the populations of brand A and brand B bulbs.
2.2.2.3 Interval estimation for a population proportion
The confidence interval for a population proportion is estimated by
$$p \pm Z\,\sigma_p$$
where $\sigma_p$ is the standard error of the proportion:
$$\sigma_p = \sqrt{\frac{p(1-p)}{n}}$$
Therefore the confidence interval for the population proportion is constructed by
$$p \pm Z\sqrt{\frac{p(1-p)}{n}}$$

Example: Suppose 1600 of 2000 union members sampled said they plan to vote for the proposal to merge with a national union. Union bylaws state that at least 75% of all members must approve for the merger to be enacted. Using the 0.95 degree of confidence, what is the interval estimate for the population proportion? Based on the confidence interval, what conclusion can be drawn?
$$p = \frac{1600}{2000} = 0.80$$
The sample proportion is 80%. The interval is computed as follows:
$$p \pm Z\sqrt{\frac{p(1-p)}{n}} = 0.80 \pm 1.96\sqrt{\frac{0.80(1-0.80)}{2000}} = 0.80 \pm 1.96\sqrt{0.00008} = 0.80 \pm 0.01753$$
This gives 0.78247 and 0.81753, rounded to 0.782 and 0.818.
Based on the sample results, the proposal will probably pass when all union members vote, because the required 0.75 lies below the whole interval from 0.782 to 0.818.
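A minimal sketch of the large-sample proportion interval $p \pm Z\sqrt{p(1-p)/n}$, applied to the union example. It assumes $Z = 1.96$. The function name `proportion_interval` is hypothetical.

```python
import math

def proportion_interval(successes, n, z=1.96):
    """Large-sample interval p ± z * sqrt(p(1 - p) / n) for a population proportion."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

lo, hi = proportion_interval(1600, 2000)
print(round(lo, 5), round(hi, 5))   # 0.78247 0.81753
print(lo > 0.75)                    # True: the 75% threshold lies below the interval
```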
Exercise
A sample of 200 people was asked to identify their major source of news information; 110 stated that their major source was television news coverage. Construct a 90% confidence interval for the proportion of people in the population who consider television their major source of news information.

2.2.3 Student’s t-Distribution


When the variable is normally distributed and the population standard deviation is known, or when the standard deviation is unknown but the sample size is greater than 30, the standard normal distribution is used to construct confidence intervals for the mean and proportion. In many situations, however, the sample size is less than 30 and the population standard deviation is unknown. The standard normal distribution Z is then not appropriate, and Student’s t distribution (the t distribution) is used instead, provided the population is approximately normal.
Characteristics of the Student’s t Distribution
The t distribution shares some characteristics of the normal distribution and differs from it in others. Both are continuous, bell-shaped and symmetrical distributions. The t distribution differs from the standard normal distribution in the following ways:
- There is not one t distribution but rather a ‘family’ of t distributions. All have the same mean of zero, but their standard deviations differ according to the sample size n.
- The t distribution is more spread out and flatter at the center than the Z distribution.
- As the sample size increases, the t distribution approaches the standard normal distribution.

[Figure: the Z distribution compared with t distributions for sample sizes of 10, 20 and 28.]
As the sample size decreases, the curve representing the t distribution has wider tails and is flatter at the center.

The t distribution depends on a parameter known as the degrees of freedom. The degrees of freedom are the number of values that are free to vary, i.e. that can be assigned arbitrarily. Once the sample mean $\bar{x}$ has been computed, the $n$ deviations $x - \bar{x}$ must sum to zero, so only $n - 1$ of them are free. The symbol df will be used for degrees of freedom. The degrees of freedom for a confidence interval for the mean are found by subtracting 1 from the sample size.
That is, df = n − 1, where n is the sample size.
As the number of degrees of freedom increases, the t distribution gradually approaches the normal distribution, and the sample standard deviation $s$ becomes a better estimate of the population standard deviation. When the sample size is small (less than 30) and the population standard deviation is unknown, the interval estimate of a population mean is
$$\bar{x} \pm t\frac{s}{\sqrt{n}}$$
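To see the t critical value shrink toward $Z = 1.96$ as df grows, the sketch below computes two-sided t values without a printed table. Python's standard library has no t distribution, so this is a stand-in for the t table and not the way the chapter obtains its values. It integrates the t density numerically (Simpson's rule) and inverts the CDF by bisection. In practice a statistics package, for example SciPy's `scipy.stats.t.ppf`, would normally be used instead. The names `t_cdf` and `t_critical` are hypothetical.

```python
import math
from statistics import NormalDist

def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: by symmetry, 0.5 plus the area under the
    t density on [0, x], computed with Simpson's rule (steps must be even)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c) * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Two-sided critical value: the middle `confidence` area lies in [-t, t]."""
    target = 0.5 + confidence / 2        # e.g. 0.975 for 95% confidence
    lo, hi = 0.0, 100.0
    for _ in range(50):                  # bisection on the increasing CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Sample sizes 10, 20 and 28 (as in the figure) give df = 9, 19 and 27.
for df in (9, 19, 27, 100):
    print(df, round(t_critical(0.95, df), 3))
print("z", round(NormalDist().inv_cdf(0.975), 3))
```

The output should be roughly 2.262, 2.093, 2.052 and 1.984, against z = 1.96. This matches the t table and shows the wider tails of t for small samples.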

Computing the t value
The t variable representing Student’s t distribution is defined as
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
where $\bar{x}$ is the sample mean of $n$ measurements, $\mu$ is the population mean and $s$ is the sample standard deviation.
Note that t is just like
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
except that $\sigma$ is replaced by $s$. Unlike in our large-sample methods, $\sigma$ cannot be reliably approximated by $s$ when the sample size is less than 30, so we cannot use the normal distribution. The t table is constructed for selected levels of confidence and for degrees of freedom up to 30. To use the table we need to know two numbers: the tail area (1 minus the selected confidence level) and the degrees of freedom.
The quantity (1 − selected confidence level) is denoted $\alpha$, the Greek letter alpha; it is the probability of error we accept in estimating. For a two-sided interval it is split equally between the two tails, $\alpha/2$ in each.
The confidence interval for the population mean is
$$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$
Example: A traffic department in a town is planning to determine the mean number of accidents at a high-risk intersection. Measurements were obtained for a random sample of only 10 days. The numbers of accidents per day were:
8 7 10 15 11 6 8 5 13 12
Construct a 95% confidence interval for the mean number of accidents per day.
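a) Compute $\bar{x}$ and $s$:
$$\bar{x} = \frac{\sum x}{n} = \frac{95}{10} = 9.5 \text{ accidents per day}$$
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{94.5}{9}} \approx 3.24 \text{ accidents per day}$$
b) Find the t value. The confidence level is 95%, so $\alpha = 1 - 0.95 = 0.05$ is the total area in the two tails, and each tail has $\alpha/2 = 0.025$. The degrees of freedom are $df = n - 1 = 10 - 1 = 9$, and from the t table $t_{0.025,\,9} = 2.262$.
c) Construct the interval:
$$\bar{x} \pm t_{0.025,\,9}\frac{s}{\sqrt{n}} = 9.5 \pm 2.262\left(\frac{3.24}{\sqrt{10}}\right) \approx 9.5 \pm 2.3$$
which gives 7.2 to 11.8.
With 95% confidence, the mean number of accidents per day at this particular intersection is between 7.2 and 11.8.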
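The accident example can be reproduced with a short sketch of $\bar{x} \pm t\,s/\sqrt{n}$. The t value (2.262 for df = 9 at 95%) is passed in as read from the t table rather than computed. The function name `t_interval` is hypothetical.

```python
import math
import statistics

def t_interval(data, t_value):
    """Small-sample interval x_bar ± t * s / sqrt(n); t_value comes from a t table."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)            # sample standard deviation (divisor n - 1)
    margin = t_value * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

accidents = [8, 7, 10, 15, 11, 6, 8, 5, 13, 12]
# t for 95% confidence and df = 9, read from the t table.
lo, hi = t_interval(accidents, t_value=2.262)
print(round(lo, 1), round(hi, 1))   # 7.2 11.8
```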

Exercise
A quality controller of a company plans to inspect the average diameter of the small bolts it makes. A random sample of 6 bolts was selected. The sample mean is computed to be 2.0016 mm and the sample standard deviation 0.0012 mm. Construct the 99% confidence interval for the mean diameter of all bolts made.

2.2.4 Sample Size Determination

The size of a sample must be determined scientifically. Care must be taken not to select a sample that is too large, which wastes time and money, or too small, which gives imprecise estimates. The sample size should therefore be determined mathematically.
2.2.4.1 Sample size for estimating Population mean
When the distribution of the sample mean $\bar{x}$ is normal, the standard normal variable Z is given by
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \quad\text{or}\quad \bar{x} - \mu = Z\frac{\sigma}{\sqrt{n}}$$
The value of Z in this equation will be positive or negative depending on whether the sample mean $\bar{x}$ is larger or smaller than the population mean $\mu$. The difference between $\bar{x}$ and $\mu$ is called the sampling error or margin of error, E. The acceptable margin of error is the maximum tolerable difference between the unknown population mean and the sample estimate at a particular level of confidence. It can be written as
$$E = Z\frac{\sigma}{\sqrt{n}}$$
Solving for n:
$$\sqrt{n} = \frac{Z\sigma}{E} \quad\Longrightarrow\quad n = \frac{Z^2\sigma^2}{E^2}$$
If the population standard deviation $\sigma$ is not known, the sample standard deviation $s$ can be used to determine the sample size n.

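Example: A population has a standard deviation of 8.6. What sample size is needed to estimate the population mean within ±0.5 with 99% confidence?
Solution: We have $E = 0.5$, $Z = 2.576$ for 99% confidence and $\sigma = 8.6$, so
$$n = \frac{Z^2\sigma^2}{E^2} = \frac{(2.576)^2(8.6)^2}{(0.5)^2} \approx 1963.13$$
A sample size must be a whole number and must achieve at least the required precision, so it is rounded up to n = 1964.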
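The formula $n = Z^2\sigma^2/E^2$, together with the rule of always rounding up, can be sketched as follows. It assumes $Z = 2.576$ as in the example. The function name `sample_size_mean` is hypothetical.

```python
import math

def sample_size_mean(sigma, margin, z):
    """Smallest n with z * sigma / sqrt(n) <= margin: n = (z * sigma / E)^2, rounded up."""
    return math.ceil((z * sigma / margin) ** 2)

print(sample_size_mean(sigma=8.6, margin=0.5, z=2.576))   # 1964
```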
2.2.4.2 Sample size for estimating population proportion
The method for determining the sample size for estimating a population proportion is similar to that used in the previous section. We require that the sample proportion $p$ fall within $\pm E$ of the population proportion $P$.
The formula for determining the sample size $n$ for a proportion is
$$n = p(1 - p)\left(\frac{Z}{E}\right)^2$$
where:
p = estimated proportion
Z = Z value for the selected confidence level
E = the maximum tolerable error

Example: A member of parliament wants to determine her popularity in her region. She indicates that the proportion of voters who will vote for her must be estimated within ±2 percentage points of the population proportion, using the 95% degree of confidence. In past elections she received 40% of the popular vote in that area, and she doubts whether this has changed much. How many registered voters should be sampled?
Solution: p = 0.40, E = 0.02 and Z = 1.96, so
$$n = p(1 - p)\left(\frac{Z}{E}\right)^2 = 0.40(1 - 0.40)\left(\frac{1.96}{0.02}\right)^2 = 0.24 \times 9604 = 2{,}304.96 \approx 2{,}305$$
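The proportion version of the sample-size formula can be sketched the same way. The estimated proportion comes from past elections (0.40), and the result is rounded up to a whole number of voters. The function name `sample_size_proportion` is hypothetical.

```python
import math

def sample_size_proportion(p, margin, z):
    """n = p(1 - p)(z / E)^2, rounded up to the next whole respondent."""
    return math.ceil(p * (1 - p) * (z / margin) ** 2)

print(sample_size_proportion(p=0.40, margin=0.02, z=1.96))   # 2305
```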

2.2.4.3 Sample size determination for finite population


When samples are drawn from a finite population of size $N$, a finite population correction factor is used to reduce the size of the sample that must be taken, with little effect on the amount of sampling error.
Let $n_0$ be the sample size for estimating the population mean without using the correction factor:
$$n_0 = \frac{Z^2\sigma^2}{E^2}$$
The revised sample size, taking into consideration the size of the population, is given by
$$n = \frac{n_0 N}{n_0 + (N - 1)}$$
Example: For a population of 1,000, what sample size is needed to estimate the population mean at 95% confidence with a sampling error of 5, if the standard deviation is 20?
Solution: We have $E = 5$, $\sigma = 20$, $Z = 1.96$ at 95% confidence and $N = 1000$.
$$n_0 = \frac{(1.96)^2(20)^2}{(5)^2} = 61.4656$$
Since the population size is finite, the revised sample size obtained by using the correction factor is
$$n = \frac{n_0 N}{n_0 + (N - 1)} = \frac{(61.4656)(1000)}{61.4656 + (1000 - 1)} \approx 57.96$$
Thus a slightly smaller sample size of n = 58 should be taken, compared with 62 without the correction.
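A minimal sketch of the two-step calculation follows. It first computes the uncorrected $n_0$, then applies the finite population correction $n = n_0 N / (n_0 + (N - 1))$, and finally rounds up. The function name `finite_population_sample_size` is hypothetical.

```python
import math

def finite_population_sample_size(sigma, margin, z, population):
    """Return (n0, corrected n, n rounded up) using n = n0*N / (n0 + (N - 1))."""
    n0 = (z * sigma / margin) ** 2            # size ignoring the population size
    n = n0 * population / (n0 + (population - 1))
    return n0, n, math.ceil(n)

n0, n, n_rounded = finite_population_sample_size(sigma=20, margin=5, z=1.96, population=1000)
print(round(n0, 4), round(n, 2), n_rounded)   # 61.4656 57.96 58
```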
