Unit 8
1 Introduction
Everyone makes estimates. When you are ready to cross a street, you estimate the speed of any car that is approaching, the distance between you and that car, and your own speed. Having made these quick estimates, you decide whether to wait, walk, or run.
A point estimate is a single number used to estimate an unknown population parameter. A point estimate alone is often insufficient: it is either right or wrong, and we do not know how wrong it might be. A point estimate is therefore much more useful if it is accompanied by an estimate of the error that might be involved. An interval estimate is a range of values used to estimate a population parameter. It indicates the error in two ways: by the extent of its range and by the probability that the true population parameter lies within that range.
Consider the table above: we have taken a sample of 35 boxes of bolts from a manufacturing line and counted the bolts in each box. We can estimate the population mean, i.e. the mean number of bolts per box, by taking the mean of the 35 sampled boxes, that is, by adding all the bolt counts and dividing by the number of boxes.
Thus, using the sample mean $\bar{x}$ as the estimator, we obtain a point estimate of the population mean $\mu$. Similarly, we can use the sample variance $s^2$ to estimate the population variance $\sigma^2$, where the sample variance is given by the formula
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
and n is the sample size.
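As a minimal sketch, both point estimates can be computed with Python's standard `statistics` module; `statistics.variance` divides by $n - 1$, matching the sample-variance formula. The 35 box counts from the table are not reproduced here, so the five counts below are hypothetical values chosen only to keep the arithmetic easy to check, and the helper name `point_estimates` is illustrative.

```python
from statistics import mean, variance


def point_estimates(counts):
    """Return (x_bar, s2), the point estimates of the population mean and variance.

    statistics.variance uses the n - 1 denominator, i.e. sum((x - x_bar)**2) / (n - 1).
    """
    x_bar = mean(counts)
    s2 = variance(counts)
    return x_bar, s2


# Hypothetical bolt counts for 5 boxes (NOT the 35-box table from the text).
sample = [98, 100, 102, 104, 106]
x_bar, s2 = point_estimates(sample)
print(x_bar, s2)  # x_bar = 102, s2 = 10
```

With the real table, the same call on the 35 counts would give the point estimates described above.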
The marketing research director needs an estimate of the average life, in months, of the car batteries his company manufactures. We select a random sample of 200 batteries and find a mean life of 36 months. If we use the sample mean $\bar{x}$ as the best estimator of the population mean $\mu$, we would report that the mean life of the company's batteries is 36 months. The director also asks for a statement about the uncertainty likely to accompany this estimate, that is, a statement about the range within which the unknown population mean is likely to lie. To provide such a statement, we need the standard error of the mean. If we select and plot a large number of sample means from a population, the distribution of these means approximates a normal curve, and the mean of the sample means equals the population mean. Our sample size of 200 is large enough for us to apply the central limit theorem. Suppose we have already estimated the standard deviation of the population of batteries to be 10 months. Using this standard deviation, we calculate the standard error of the mean with the formula
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{200}} \approx 0.707$ months
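The standard-error formula is a one-line calculation; this small sketch (the function name is illustrative) reproduces the 0.707 figure from $\sigma = 10$ and $n = 200$.

```python
import math


def standard_error_of_mean(sigma, n):
    """Standard error of the mean: population standard deviation / sqrt(sample size)."""
    return sigma / math.sqrt(n)


se = standard_error_of_mean(sigma=10, n=200)
print(round(se, 3))  # 0.707
```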
Making the interval estimate: We can tell the director that our estimate of the life of the company's batteries is 36 months and that the standard error accompanying this estimate is 0.707. In other words, the actual mean life of all the batteries may lie somewhere in the interval from 35.293 to 36.707 months. This is helpful but insufficient information for the director. Next, we need to calculate the probability that the actual mean life lies in this interval, or in other intervals of different widths that we might choose, such as ±2 standard errors (2 × 0.707) or ±3 standard errors (3 × 0.707).

The probability is 0.955 that the mean of a sample of size 200 will be within 2 standard errors of the population mean. Stated differently, 95.5 percent of all sample means lie within 2 standard errors of $\mu$, so the population mean will be located within 2 standard errors of the sample mean 95.5 percent of the time. Hence we can now report to the director that the best estimate of the life of the company's batteries is 36 months, and that we are 68.3 percent confident that the mean life lies in the interval from 35.293 to 36.707 months ($36 \pm 1\sigma_{\bar{x}}$). Similarly, we are 95.5 percent confident that it falls within the interval from 34.586 to 37.414 months ($36 \pm 2\sigma_{\bar{x}}$), and 99.7 percent confident that it falls within the interval from 33.879 to 38.121 months ($36 \pm 3\sigma_{\bar{x}}$).
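The three battery intervals and their confidence levels can be reproduced with a short sketch. It assumes the sampling distribution of the mean is normal (as the central limit theorem justifies here) and uses the identity $P(|Z| < k) = \operatorname{erf}(k/\sqrt{2})$ for a standard normal $Z$; the helper names are illustrative.

```python
import math


def interval(x_bar, se, k):
    """Interval of +/- k standard errors around the sample mean."""
    return x_bar - k * se, x_bar + k * se


def normal_coverage(k):
    """P(|Z| < k) for a standard normal Z."""
    return math.erf(k / math.sqrt(2))


x_bar, se = 36, 0.707  # sample mean and (rounded) standard error from the text
for k in (1, 2, 3):
    lo, hi = interval(x_bar, se, k)
    print(f"{normal_coverage(k):.2%} confident: {lo:.3f} to {hi:.3f} months")
```

The exact normal coverages are about 68.27, 95.45 and 99.73 percent, which the text rounds to 68.3, 95.5 and 99.7 percent.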
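For a 90 percent confidence level, the z value is 1.64 (about 45 percent of the area under the normal curve lies between the mean and 1.64 standard errors on each side), so
$\bar{x} + 1.64\sigma_{\bar{x}}$ = upper limit of the confidence interval
$\bar{x} - 1.64\sigma_{\bar{x}}$ = lower limit of the confidence interval
Thus, confidence limits are the upper and lower limits of the confidence interval. In this case, $\bar{x} + 1.64\sigma_{\bar{x}}$ is called the upper confidence limit (UCL) and $\bar{x} - 1.64\sigma_{\bar{x}}$ is the lower confidence limit (LCL).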
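A minimal sketch of the UCL/LCL calculation, assuming a normal sampling distribution. Instead of reading 1.64 from a table, it derives the z value from `statistics.NormalDist().inv_cdf` (about 1.645 for 90 percent). Applying it to the battery figures ($\bar{x} = 36$, $\sigma_{\bar{x}} = 0.707$) is only an illustration; the text does not compute a 90 percent interval for that example, and `confidence_limits` is a hypothetical helper.

```python
from statistics import NormalDist


def confidence_limits(x_bar, se, confidence):
    """Return (LCL, UCL) for a two-sided, normal-based confidence interval."""
    # The z value leaves (1 - confidence) / 2 of the area in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.645 for 0.90
    return x_bar - z * se, x_bar + z * se


lcl, ucl = confidence_limits(x_bar=36, se=0.707, confidence=0.90)
print(f"LCL = {lcl:.3f}, UCL = {ucl:.3f}")  # LCL = 34.837, UCL = 37.163
```

Using the rounded table value 1.64 instead would shift the limits only in the third decimal place.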
Example: In a very large organization, the director wanted to find out what proportion of employees prefer to provide their own retirement benefits in lieu of a company-sponsored plan. A simple random sample of 75 employees found that 40 percent, i.e. 0.4, of them are interested in providing their own retirement plans. Management asks us to use this sample to find an interval that they can be 99 percent confident contains the true population proportion.
Here n = 75, p = 0.4 and q = 1 - p = 1 - 0.4 = 0.6. Therefore the standard error of the proportion is
$\sigma_p = \sqrt{\frac{pq}{n}} = \sqrt{\frac{0.4 \times 0.6}{75}} \approx 0.057$
The interval estimate at the 99 percent confidence level is therefore $0.4 \pm 2.58(0.057)$, i.e. from 0.253 to 0.547. We can thus be 99 percent confident that the proportion of all employees who wish to establish their own retirement plans lies between 0.253 and 0.547.
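The proportion example can be checked with a short sketch (the helper name is illustrative). It takes the z value 2.58 from the text as a parameter rather than computing it.

```python
import math


def proportion_interval(p, n, z):
    """Return (standard error, lower limit, upper limit) for a sample proportion p."""
    q = 1 - p
    se = math.sqrt(p * q / n)  # standard error of the proportion
    return se, p - z * se, p + z * se


se, lcl, ucl = proportion_interval(p=0.4, n=75, z=2.58)
print(f"SE = {se:.3f}, interval = {lcl:.3f} to {ucl:.3f}")
# SE = 0.057, interval = 0.254 to 0.546
```

The text's limits of 0.253 and 0.547 come from rounding the standard error to 0.057 before multiplying by 2.58; carrying the unrounded value (about 0.0566) moves each limit by 0.001.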
Conditions for usage: Because the t distribution is used when the sample size is 30 or less, statisticians often associate it with small-sample statistics. This is misleading, because sample size is only one of the conditions that lead us to use the t distribution. The second condition is that the population standard deviation must be unknown. The t distribution is required for estimation whenever the sample size is 30 or less and the population standard deviation is not known. Furthermore, in using the t distribution, we assume that the population is normal or approximately normal.
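The two conditions can be written as a tiny decision rule. This is a sketch of the text's rule of thumb only; it does not check the separate assumption that the population is approximately normal, and `choose_distribution` is a hypothetical name.

```python
def choose_distribution(n, sigma_known):
    """Pick 'normal' or 't' following the rule of thumb in the text.

    The t distribution is used only when BOTH conditions hold:
    the sample size is 30 or less AND the population standard
    deviation is unknown. (Approximate normality of the population
    is assumed, not checked.)
    """
    if n <= 30 and not sigma_known:
        return "t"
    return "normal"


print(choose_distribution(20, sigma_known=False))  # t
print(choose_distribution(20, sigma_known=True))   # normal
```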
Degrees of freedom
There is a different t distribution for each possible number of degrees of freedom. What are degrees of freedom? We can define them as the number of values we can choose freely. Once the sample mean is fixed, only $n - 1$ of the sample values can vary; the last one is determined by the others. When we select a t distribution to estimate a population mean, we therefore use $n - 1$ degrees of freedom, where n is the sample size. For example, if we use a sample of 20 to estimate a population mean, we use 19 degrees of freedom to select the appropriate t distribution. With two sample values we have one degree of freedom (2 - 1 = 1), and with seven sample values we have six degrees of freedom (7 - 1 = 6). In each case we have $n - 1$ degrees of freedom, where n is the sample size; similarly, a sample of 23 would give us 22 degrees of freedom.
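A small sketch of the "choose freely" idea behind $n - 1$ degrees of freedom: if the sample mean is fixed, any $n - 1$ values may be picked, but the last value is forced because all $n$ values must sum to $n \times \bar{x}$. The function name and the sample numbers are purely illustrative.

```python
def forced_last_value(free_values, n, required_mean):
    """Given n - 1 freely chosen values and a fixed sample mean,
    return the one remaining value, which is no longer free to vary."""
    if len(free_values) != n - 1:
        raise ValueError("exactly n - 1 values can be chosen freely")
    # All n values must sum to n * mean, so the last one is determined.
    return n * required_mean - sum(free_values)


# Two values that must average 18: choosing 10 forces the other to be 26.
print(forced_last_value([10], n=2, required_mean=18))  # 26
```

With seven values and a fixed mean, six can be chosen freely and the seventh is forced in the same way, matching the six degrees of freedom in the text.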