Statistics For Management II
STATISTICAL ESTIMATIONS
2.1 Basic Concepts
Introduction
In many cases values for a population parameter are unknown. If parameters are unknown it is
generally not sufficient to make some convenient assumption about their values, rather those unknown
parameters should be estimated.
Hypothesis testing, e.g. using sample evidence to test hypotheses about the population mean.
Estimation, e.g. estimating the population mean using the information derived from a sample. There
are two types of estimation for a population parameter:
- Point estimation
- Interval estimation
Definition of Terms:
Estimation: is the process of estimating an unknown population parameter through sampling, i.e. it
is the process of using a sample statistic to estimate the unknown population parameter. Put simply,
it is an informed guess at the value of a population parameter. The objective of estimation is to
determine the approximate value of a population parameter on the basis of a sample statistic. E.g.
the sample mean ($\bar{x}$) is employed to estimate the population mean ($\mu$).
Estimator: is a sample statistic that is used to estimate an unknown population parameter. E.g.
sample mean, sample proportion, etc.
Estimate: is a single numerical value obtained from an estimator, e.g. 1, 2, 3, ... For instance, the
sample mean is an estimator of the population mean, and the value it takes in a particular sample is
an estimate.
Parameter: is a characteristic of a population. E.g. population mean, population standard
deviation, population proportion, etc.
Statistic: is a characteristic of a sample. E.g. sample mean, sample proportion, sample standard
deviation, etc.
2.2 Types of Estimates
2.2.1 Point Estimation
It is a single numerical value obtained from a random sample used to estimate the corresponding
population parameter. A random sample of observations is taken from the population of interest and
the observed values are used to obtain a point estimate of the relevant parameter.
The sample mean, $\bar{x}$, is the best estimator of the population mean ($\mu$). Different samples from a
population yield different point estimates of $\mu$.
The sample proportion, $\hat{p}$, is a good estimator of the population proportion, $p$.
The population proportion ($p$) is equal to the number of elements in the population belonging to the
category of interest divided by the total number of elements in the population:
$p = \frac{X}{N}$
where $X$ is the number of elements in the category of interest and $N$ is the total number of elements
in the population. For a sample of size $n$, the sample proportion is
$\hat{p} = \frac{\text{number of successes in the sample}}{\text{number sampled}}$
Example: Of 2000 persons sampled, 1600 favored stricter environmental protection measures. What
is the estimated population proportion?
$\hat{p} = \frac{1600}{2000} = 0.80$
So 80% is an estimate of the proportion of the population that favors stricter measures.
In general:
The statistic $\bar{x}$ estimates $\mu$
$S$ estimates $\sigma$
$S^2$ estimates $\sigma^2$
$\hat{p}$ estimates $p$
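As a minimal sketch of these point estimates, the Python snippet below computes the sample mean, sample variance and sample standard deviation for a small data set, and the sample proportion from the survey example. The ten data values are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical sample of ten measurements (illustrative values only).
sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]

x_bar = statistics.mean(sample)    # point estimate of the population mean
s2 = statistics.variance(sample)   # sample variance, uses the n - 1 divisor
s = statistics.stdev(sample)       # sample standard deviation

# Sample proportion from the survey example: 1600 of 2000 in favour.
p_hat = 1600 / 2000

print(x_bar, round(s2, 3), round(s, 3), p_hat)
```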
2.2.1.1 Estimators and their properties / Goodness of an estimator/
A good estimator should satisfy the following properties.
Unbiasedness - An estimator is said to be unbiased if its expected value is equal to the population
parameter it estimates ($E(\bar{x}) = \mu$). The sample mean, $\bar{x}$, is therefore an unbiased estimator of the
population mean. Any systematic deviation of the estimator away from the parameter of interest is
called bias.
Efficiency - An estimator is efficient if it has a relatively small variance (and hence a small standard
deviation). If there are two unbiased estimators of a parameter, the one whose variance is smaller is
said to be relatively more efficient.
Consistency - An estimator is said to be consistent if its probability of being close to the parameter it
estimates increases as the sample size increases. The sample mean $\bar{x}$ is a consistent estimator of
$\mu$. This is so because the standard deviation of $\bar{x}$ is $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. As the sample
size $n$ increases, the standard deviation of $\bar{x}$ decreases and hence the probability that $\bar{x}$ will be
close to its expected value, $\mu$, increases.
Sufficiency - An estimator is said to be sufficient if it contains all the information in the data about the
parameter it estimates. The sample mean is a sufficient estimator of $\mu$. Other estimators, like the
median and mode, do not use all the values, whereas the mean does (all values are added and divided
by the sample size).
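Unbiasedness and consistency can be illustrated with a small simulation. The sketch below assumes a hypothetical normal population with mean 50 and standard deviation 10 (values chosen only for illustration), draws many samples of several sizes, and compares the average and spread of the sample means with $\mu$ and $\sigma/\sqrt{n}$.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

MU, SIGMA = 50.0, 10.0  # hypothetical population mean and standard deviation

def sample_means(n, repeats=2000):
    """Draw `repeats` samples of size n and return the list of sample means."""
    return [
        statistics.fmean(random.gauss(MU, SIGMA) for _ in range(n))
        for _ in range(repeats)
    ]

results = {}
for n in (4, 25, 100):
    means = sample_means(n)
    # Average of the sample means (should be near MU) and their spread
    # (should be near SIGMA / sqrt(n), shrinking as n grows).
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(n, round(results[n][0], 2), round(results[n][1], 2),
          round(SIGMA / n ** 0.5, 2))
```

The average of the sample means should stay near 50 for every sample size, while their spread should shrink roughly in line with $\sigma/\sqrt{n}$.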
2.2.2 Interval Estimation
There is always some sampling error, which can be measured by the standard error of the mean; this
relates to the precision of the estimated mean. Because of sampling variation we cannot say that the
exact parameter value is some specific number, but we can determine a range of values within which
we are confident the unknown parameter lies.
An interval estimate states the range within which a population parameter probably lies. The interval
within which a population parameter is expected to lie is usually referred to as the confidence interval.
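95% confidence interval: $\bar{x} \pm 1.96 \frac{s}{\sqrt{n}}$
99% confidence interval: $\bar{x} \pm 2.58 \frac{s}{\sqrt{n}}$
The values 1.96 and 2.58 are the Z values corresponding to the middle 95% and 99% of the observations,
respectively.
In general, a confidence interval for the mean is computed by $\bar{x} \pm Z \frac{\sigma}{\sqrt{n}}$, or by
$\bar{x} \pm Z \frac{s}{\sqrt{n}}$ when $\sigma$ is estimated by $s$, where $Z$ reflects the selected level of confidence.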
Example
An experiment involves selecting a random sample of 256 middle managers for studying their annual
income. The sample mean is computed to be Br. 35,420 and the sample standard deviation is Br. 2,050.
a) What is the estimated mean income of all middle managers (the population)?
b) What is the 95% confidence interval (rounded to the nearest 10)?
c) What are the 95% confidence limits?
d) Interpret the finding.
Solution
a) The sample mean is 35,420, so this is the point estimate of the population mean: $\mu$ is estimated
to be Br. 35,420.
b) The confidence interval is between 35168.87 and 35671.13, found by
$\bar{x} \pm 1.96 \frac{s}{\sqrt{n}} = 35420 \pm 1.96 \frac{2050}{\sqrt{256}} = 35420 \pm 251.13$, i.e. 35168.87 and 35671.13
c) The end points of the confidence interval are called the confidence limits. In this case they are
35168.87 and 35671.13: 35168.87 is the lower limit and 35671.13 is the upper limit.
d) Interpretation - If 100 samples of this size were selected and a 95% confidence interval computed
from each, about 95 of the 100 intervals would contain the population mean annual income and about
5 would not. We are therefore 95% confident that the population mean annual income lies between
35168.87 and 35671.13.
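The interval in this example can be checked with a few lines of Python. This is a minimal sketch: the function name `z_confidence_interval` is illustrative, and it uses the exact normal quantile rather than the rounded 1.96, so the limits may differ from the hand calculation in the second decimal place.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, s, n, confidence=0.95):
    """Large-sample confidence interval for a mean: x_bar +/- Z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Middle-manager income example: n = 256, mean 35,420, s = 2,050.
lower, upper = z_confidence_interval(35420, 2050, 256, confidence=0.95)
print(round(lower, 2), round(upper, 2))
```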
Exercise
(1) A research firm conducted a survey to determine the mean amount smokers spend on cigarettes
during a week. A sample of 49 smokers revealed that the sample mean is Br. 20 with a standard
deviation of Br. 5. Construct a 95% confidence interval for the mean amount spent.
2.2.2.2 Interval Estimation of the difference between two independent means
If all possible samples of large sizes $n_1$ and $n_2$ are drawn from two different populations, then the
sampling distribution of the difference between the two sample means, $\bar{x}_1 - \bar{x}_2$, is approximately
normal with mean $(\mu_1 - \mu_2)$ and standard deviation
$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
For a desired confidence level, the confidence interval limits for the difference between the population
means $(\mu_1 - \mu_2)$ are given by
$(\bar{x}_1 - \bar{x}_2) \pm Z \sigma_{\bar{x}_1 - \bar{x}_2}$
Example:
The strength of the wire produced by company A has a mean of 4,500kg and a standard deviation of
200 kg. Company B has a mean of 4,000 kg and a standard deviation of 300 kg. A sample of 50 wires
of company A and 100 wires of company B are selected at random for testing the strength. Find 99%
confidence limits on the difference in the average strength of the populations of wires produced by the
two companies.
Solution: the following information is given:
Company A: $\bar{x}_1 = 4500$, $\sigma_1 = 200$, $n_1 = 50$
Company B: $\bar{x}_2 = 4000$, $\sigma_2 = 300$, $n_2 = 100$
Therefore, $\bar{x}_1 - \bar{x}_2 = 4500 - 4000 = 500$ and $Z = 2.576$
$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{40{,}000}{50} + \frac{90{,}000}{100}} = 41.23$
The required 99% confidence interval limits are given by
$(\bar{x}_1 - \bar{x}_2) \pm Z \sigma_{\bar{x}_1 - \bar{x}_2} = 500 \pm 2.576(41.23) = 500 \pm 106.2$
Hence, with 99% confidence, the difference in the average strength of wires produced by the two
companies lies in the interval $393.80 \le \mu_1 - \mu_2 \le 606.20$.
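The wire-strength calculation can be reproduced with the short Python sketch below. The function name `diff_means_interval` is illustrative, and the sample sizes are taken from the problem statement (50 wires from company A, 100 from company B).

```python
from math import sqrt
from statistics import NormalDist

def diff_means_interval(x1, sigma1, n1, x2, sigma2, n2, confidence=0.99):
    """Confidence interval for mu1 - mu2 from two large independent samples."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 2.576 for 99%
    se = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)       # standard error of x1 - x2
    diff = x1 - x2
    return diff - z * se, diff + z * se, se

lower, upper, se = diff_means_interval(4500, 200, 50, 4000, 300, 100)
print(round(se, 2), round(lower, 1), round(upper, 1))
```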
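The standard error of the sample proportion is
$\sigma_p = \sqrt{\frac{p(1-p)}{n}}$
and the confidence interval for the population proportion is
$p \pm Z \sqrt{\frac{p(1-p)}{n}}$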
Example. Suppose 1600 of 2000 union members sampled said they plan to vote for the proposal to
merge with a national union. Union bylaws state that at least 75% of all members must approve for the
merger to be enacted. Using the 0.95 degree of confidence, what is the interval estimate for the
population proportion? Based on the confidence interval, what conclusion can be drawn?
$p = \frac{1600}{2000} = 0.8$. The sample proportion is 80%.
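One way to carry this example through is sketched below in Python, using the large-sample interval $p \pm Z\sqrt{p(1-p)/n}$. The function name `proportion_interval` is illustrative, and the exact normal quantile is used in place of the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p * (1 - p) / n)  # standard error of the sample proportion
    return p - z * se, p + z * se

lower, upper = proportion_interval(1600, 2000, confidence=0.95)
print(round(lower, 3), round(upper, 3))

# The bylaws require at least 75% approval; compare with the lower limit.
print(lower > 0.75)
```

Under these assumptions the whole interval lies above 0.75, which suggests the proposal would probably meet the 75% requirement.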
As the sample size decreases, the curve representing the t distribution will have wider tails and will be
flatter at the center.
[Figure: the Z distribution compared with the t distribution]
The t distribution depends on a parameter known as the degrees of freedom. Degrees of freedom refers
to the number of values that are free to vary, that is, that can be assigned arbitrarily.
The symbol df will be used for degrees of freedom. The degrees of freedom for a confidence interval
for the mean are found by subtracting 1 from the sample size.
That is, df = n – 1, where n is the sample size.
As the number of degrees of freedom increases, the t distribution gradually approaches the normal
distribution and the sample standard deviation s becomes a better estimate of the population standard
deviation. When the sample size is small (less than 30) and the standard deviation of the population is
unknown, the interval estimate of a population mean will be:
$\bar{x} \pm t \frac{s}{\sqrt{n}}$
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}} = \sqrt{\frac{94.5}{9}} = 3.24$ per day
The confidence level is 95%, so $\alpha = 1 - 0.95 = 0.05$, and for a two-tailed interval the area in each
tail is $\frac{\alpha}{2} = \frac{0.05}{2} = 0.025$.
The degrees of freedom are df = n – 1 = 10 – 1 = 9, and from the t table $t_{0.025,\,9} = 2.26$.
The confidence interval is $\bar{x} \pm t_{0.025,\,9} \frac{s}{\sqrt{n}}$:
$9.5 \pm 2.26 \frac{3.24}{\sqrt{10}} = 9.5 \pm 2.3$, i.e. 7.2 to 11.8
With 95% confidence the mean number of accidents at this particular intersection is between 7.2 and
11.8.
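The same small-sample interval can be sketched in Python. The critical value 2.26 is read from a t table, as in the text, rather than computed; the function name `t_confidence_interval` is illustrative.

```python
from math import sqrt

def t_confidence_interval(x_bar, s, n, t_crit):
    """Small-sample interval x_bar +/- t * s / sqrt(n); t_crit comes from a t table."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin

n = 10
s = sqrt(94.5 / (n - 1))  # sample standard deviation from the sum of squares
t_crit = 2.26             # table value for alpha/2 = 0.025 and df = 9

lower, upper = t_confidence_interval(9.5, s, n, t_crit)
print(round(s, 2), round(lower, 1), round(upper, 1))
```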
Exercise
(2) A quality controller of a company plans to inspect the average diameter of small bolts made. A
random sample of 6 bolts was selected. The sample mean is computed to be 2.0016 mm and the sample
standard deviation 0.0012 mm. Construct the 99% confidence interval for the mean diameter of all
bolts made.
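For example, in hypothesis testing of a population mean, the test statistic Z is computed as
$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ (when the sample size is large and $\sigma$ is known), where $\mu_0$ is the
hypothesized value of the population mean.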
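[Figure: rejection region for a one-tailed (right-tail) test on the Z scale. The non-rejection region (do not reject H0, probability 0.95) lies to the left of the critical value; the rejection region (probability 0.05) lies to the right.]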
The above chart portrays the rejection region for a test of significance. The level of significance
selected is 0.05.
1. The area where the null hypothesis is not rejected includes the area to the left of 1.645
2. The area of rejection is to the right of 1.645
3. A one-tailed test is being applied
4. The 0.05 level of significance was chosen
5. The sampling distribution is for the test statistic Z, the standard normal deviate.
6. The value 1.645 separates the regions where the null hypothesis is rejected and where it is not
rejected
7. The value 1.645 is called the critical value. It is the value of the test statistic corresponding to
the selected level of significance, i.e. the Z value for a one-tailed test at the 0.05 level of significance is 1.645.
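[Figure: rejection region for a one-tailed (left-tail) test on the Z scale. The rejection region (probability 0.05) lies to the left of the negative critical value; the non-rejection region (do not reject H0, probability 0.95) lies to the right.]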
Consider companies that purchase large quantities of tires. Suppose they want the tires to average
40,000 km of wear under normal usage. They will therefore reject a shipment of tires if
accelerated-life tests reveal that the life of the tires is significantly below 40,000 km on average.
The purchasers gladly accept a shipment if the mean life is greater than 40,000 km; they are not
concerned with this possibility.
They are only concerned if they have sample evidence to conclude that the tires will average less than
40,000 km of useful life.
Thus the test is set up to address the companies' concern that the mean life of the tires is less than
40,000 km.
The null and alternate hypotheses are written:
$H_0: \mu = 40{,}000$ vs $H_1: \mu < 40{,}000$
[Figure: rejection regions for a two-tailed test on the Z scale. The non-rejection region (do not reject H0, probability 0.95) lies between -1.96 and +1.96; each tail beyond these values is a rejection region with probability 0.025.]
2. $H_0: \mu = \mu_0$ vs $H_1: \mu > \mu_0$
3. $H_0: \mu = \mu_0$ vs $H_1: \mu < \mu_0$
The decision rule is therefore: Reject the null hypothesis and accept the alternate hypothesis if the
computed value of Z does not fall in the region between +2.58 and -2.58. Otherwise do not reject the
null hypothesis.
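Critical values such as the ±2.58 in this decision rule, which corresponds to a two-tailed test at the 0.01 level, can be obtained from the standard normal distribution. The Python sketch below is illustrative only; the helper names `z_critical` and `reject_two_tailed` are not from the text.

```python
from statistics import NormalDist

def z_critical(alpha, tails=2):
    """Return the positive critical Z value for a test at significance level alpha."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return NormalDist().inv_cdf(1 - alpha / tails)

def reject_two_tailed(z_cal, alpha):
    """Two-tailed rule: reject H0 when Z falls outside (-z_crit, +z_crit)."""
    return abs(z_cal) > z_critical(alpha, tails=2)

print(round(z_critical(0.01, tails=2), 2))   # two-tailed, 0.01 level
print(round(z_critical(0.05, tails=2), 2))   # two-tailed, 0.05 level
print(round(z_critical(0.05, tails=1), 3))   # one-tailed, 0.05 level
```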
Step 2: Select the level of significance, $\alpha = 0.05$ (given).
Step 3: Select an appropriate test statistic.
The Z statistic is appropriate because the population variance is known.
Step 4: Identify the critical region.
The critical region is $Z_{cal} < -Z_{0.05} = -1.645$
$\Rightarrow (-1.645, \infty)$ is the acceptance region.
Step 5: Computations, Decision and Conclusion
$Z_{cal} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{1570 - 1600}{120/\sqrt{16}} = -1.0$
Decision: Do not reject $H_0$, since $Z_{cal}$ is in the acceptance region.
Conclusion: At the 5% level of significance, we have no evidence to say that the lifetime of light
bulbs is decreasing, based on the given sample data.
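The computation and decision in these steps can be sketched in Python as a left-tailed Z test, using the figures from the light-bulb example ($\bar{X} = 1570$, $\mu_0 = 1600$, $\sigma = 120$, $n = 16$). The function name `left_tailed_z_test` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def left_tailed_z_test(x_bar, mu0, sigma, n, alpha=0.05):
    """Left-tailed Z test: reject H0 when Z_cal falls below -Z_alpha."""
    z_cal = (x_bar - mu0) / (sigma / sqrt(n))
    z_crit = NormalDist().inv_cdf(alpha)  # negative value, about -1.645 at 0.05
    return z_cal, z_crit, z_cal < z_crit

z_cal, z_crit, reject = left_tailed_z_test(1570, 1600, 120, 16, alpha=0.05)
print(round(z_cal, 2), round(z_crit, 3), reject)
```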
Example 3.3: A random sample of 200 senior school students produces a mean weight of 58 kg with a
standard deviation of 4 kg. Test the hypothesis that the mean weight of the population is greater than
60 kg. Use the 0.05 level of significance.
Solution:
i)
Implies
v) Since is greater than H0 is rejected in favour of H1, this implies that the
mean weight of the senior school students is greater than 60.
Example:
A department store issues its own credit card. The credit manager wants to find out if the mean
monthly unpaid balance is more than Br. 400. The level of significance is set at 0.05. A random check
of 172 unpaid balances revealed the sample mean to be 407 and the sample standard deviation to be
38. Should the credit manager conclude that the population mean is greater than 400, or is it reasonable
to assume that the difference of 407 - 400 = 7 is due to chance?
Solution
$H_0: \mu = 400$
$H_1: \mu > 400$
Because $H_1$ states a direction, a one-tailed test is applied. The critical value of Z is 1.645 for the 0.05 level.
$Z = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{407 - 400}{38/\sqrt{172}} = 2.42$
If the null hypothesis were true, a value this large (2.42) would occur less than 5% of the time. So the
credit manager would reject the null hypothesis, $H_0$, that the mean unpaid balance is 400, in favor of
$H_1$, which states that the mean is greater than 400.
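The statement that a value as large as 2.42 occurs less than 5% of the time can be checked by computing the upper-tail area of the standard normal distribution beyond the test statistic. The Python sketch below uses the figures from this example; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu0, s, n = 407, 400, 38, 172

z = (x_bar - mu0) / (s / sqrt(n))   # test statistic, about 2.42
p_upper = 1 - NormalDist().cdf(z)   # area to the right of z under H0

print(round(z, 2), round(p_upper, 4))
print(p_upper < 0.05)               # True means H0 is rejected at the 0.05 level
```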
$t_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$
Where:
Conclusion: At the 1% level of significance, we have no evidence to say that the average content
of containers of the given lubricant is different from 10 litres, based on the given sample data.
Example 3.5: The mean length of a small counterbalance bar is 43 mm. There is a concern that an
adjustment of the machine has changed the length of the bars. The null hypothesis is that there is no
change in the mean length ($\mu = 43$). Test at the 0.02 level of significance. Twelve bars are randomly
selected and their lengths in mm are 42, 39, 42, 45, 43, 40, 39, 41, 40, 42, 43, 42.
Solution:
1)
2)
3)
4) Critical value:
1)
2)
3) Test statistics:
Critical region
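The test in Example 3.5 can be sketched numerically in Python. This sketch assumes a two-tailed t test with df = n – 1 = 11 and a table critical value of about 2.718 for the 0.02 level; both the two-tailed set-up and the critical value are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, stdev

lengths = [42, 39, 42, 45, 43, 40, 39, 41, 40, 42, 43, 42]  # bar lengths in mm
mu0 = 43        # hypothesized mean length
t_crit = 2.718  # assumed t-table value: two-tailed, alpha = 0.02, df = 11

n = len(lengths)
x_bar = mean(lengths)   # sample mean
s = stdev(lengths)      # sample standard deviation (n - 1 divisor)
t_cal = (x_bar - mu0) / (s / sqrt(n))

reject = abs(t_cal) > t_crit  # two-tailed decision rule
print(x_bar, round(s, 3), round(t_cal, 3), reject)
```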
Case 3: When sampling is from a non-normally distributed population or a population whose
functional form is unknown.
- If the sample size is large, one can perform a test of hypothesis about the mean by using:
$Z_{cal} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$, if $\sigma^2$ is known,
$Z_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$, if $\sigma^2$ is unknown.
CHAPTER 4
CHI-SQUARE ($\chi^2$) DISTRIBUTIONS
The $\chi^2$ distribution is an asymmetric distribution that has a minimum value of 0 but no maximum
value. The curve reaches a peak to the right of 0 and then gradually declines in height as the
$\chi^2$ value increases. The curve is asymptotic to the horizontal axis, always approaching it but never
quite touching it.
For each number of degrees of freedom there is a different $\chi^2$ distribution. The mean of the chi-square
distribution is the number of degrees of freedom and the variance is twice the degrees of freedom (so
the standard deviation is $\sqrt{2\,df}$). This
Problem 2: Suppose that 60 children were asked which ice-cream flavor they liked out of the
three flavors of vanilla, strawberry and chocolate. The answers are recorded as follows:
Observed frequency
Flavours Number
Vanilla 17
Strawberry 24
Blue Eyes 23 7 30
Brown Eyes 4 16 20
Column Total 27 23 50