Summary of Relationships Between Central Limit Theorem, Confidence Intervals, Regression Analysis


Document 28


1. Problem: Let Y be a normal random variable with mean $\mu_y$ and variance $\sigma_y^2$.

Find the probability that Y lies in a given interval.

Example: Find the probability that a light bulb lasts between 850 and 1000
hours, given that the lifetime of a light bulb is normally distributed with a
mean of 800 hours and a standard deviation of 100 hours.

Problem formulation:

Given $Y \sim N(800, 100^2)$, find $P(850 < Y < 1000)$

Solution: Transform the problem into a standard normal random variable.

Remember,

$Z = (Y - \mu_y)/\sigma_y$

= (Random variable – Mean of random variable)/(Standard deviation of the random variable).

In the example, Z = (Y – 800)/100.

Therefore,

$P(850 < Y < 1000)$

$= P\left((850 - 800)/100 < (Y - \mu_y)/\sigma_y < (1000 - 800)/100\right)$

$= P(0.5 < Z < 2) = 0.4772 - 0.1915 = 0.2857$
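
The table lookup above can be cross-checked numerically. The following is a minimal sketch, assuming only Python's standard library; the helper name `standard_normal_cdf` is illustrative and not part of the original notes. It standardizes the two endpoints and takes the difference of the standard normal CDF values.

```python
import math

def standard_normal_cdf(z):
    """Phi(z), the standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Values from the light bulb example.
mean, std_dev = 800.0, 100.0
lower, upper = 850.0, 1000.0

# Standardize the endpoints: Z = (Y - mean) / std_dev.
z_lower = (lower - mean) / std_dev
z_upper = (upper - mean) / std_dev

# P(lower < Y < upper) = Phi(z_upper) - Phi(z_lower).
probability = standard_normal_cdf(z_upper) - standard_normal_cdf(z_lower)
print(z_lower, z_upper, probability)
```

The result should agree with the table-based value 0.2857 to about three decimal places; small differences come from rounding in the printed table.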

2. Problem (Central Limit Theorem): Let $y_1, \ldots, y_n$ be a random sample with sample mean (average) $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$. Assume that the variance $\sigma_y^2$ is known.

Find a 95% confidence interval for $\mu_y$.

Example: The average lifetime of 25 XYZ light bulbs was 750 hours. Find a
95% confidence interval for the mean of the population of all XYZ light bulbs
when the standard deviation is known to be 100.

Problem formulation:

Given: $\bar{y} \sim N\left(\mu_y, \sigma_y^2/n\right)$ or, specifically, $\bar{y} \sim N\left(\mu_y, 100^2/25\right)$

The general form for a 95% confidence interval is:

P(Estimator – 1.96 × Standard deviation of estimator < $\mu_y$ < Estimator + 1.96 × Standard deviation of estimator) = 0.95

Solution to example:

$P\left(\bar{y} - 1.96\,\sigma_y/\sqrt{n} < \mu_y < \bar{y} + 1.96\,\sigma_y/\sqrt{n}\right) = 0.95$

$P(750 - 1.96\,(100/5) < \mu_y < 750 + 1.96\,(100/5)) = 0.95$

$P(710.8 < \mu_y < 789.2) = 0.95$. Therefore, the interval [710.8, 789.2]
covers the unknown mean, $\mu_y$, with probability 0.95.
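
The arithmetic of this interval can be reproduced in a few lines. This is a sketch using the numbers from the example; the variable names are chosen here for illustration.

```python
import math

sample_mean = 750.0   # average lifetime of the 25 bulbs
sigma = 100.0         # known population standard deviation
n = 25                # sample size
z_critical = 1.96     # multiplier for a 95% interval

# Standard deviation of the estimator (the sample mean): sigma / sqrt(n).
standard_error = sigma / math.sqrt(n)

margin = z_critical * standard_error
ci_lower = sample_mean - margin
ci_upper = sample_mean + margin
print(round(ci_lower, 1), round(ci_upper, 1))  # 710.8 789.2
```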

3. Regression (the example is from the discussion in Document 23):

a. State the variables

General:
Dependent variable: Y
Explanatory variables: $X_1$, $X_2$

Example:
Dependent variable: Y is sales data
Explanatory variables: $X_1$ = National advertising, $X_2$ = Promotions

b. State the model

Model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$

Predictor equation: $\hat{y} = b_0 + b_1 X_1 + b_2 X_2$

Note: $b_0$, $b_1$, $b_2$ are computed by least squares.
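
As an illustration of what "computed by least squares" involves, here is a minimal sketch that solves the normal equations $(X'X)b = X'y$ for a model with an intercept and two explanatory variables. The data below are made up for the demonstration (generated exactly from $y = 1 + 2X_1 + 3X_2$); they are not the sales data from Document 23. The function names are hypothetical, and statistical packages such as Minitab may use different numerical methods.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from the rows below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

def least_squares(x1, x2, y):
    """Return [b0, b1, b2] minimizing the sum of squared residuals."""
    rows = [[1.0, u, v] for u, v in zip(x1, x2)]  # leading 1 is the intercept
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve_linear_system(xtx, xty)

# Made-up illustration data (NOT from the original example).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [1.0 + 2.0 * u + 3.0 * v for u, v in zip(x1, x2)]

b0, b1, b2 = least_squares(x1, x2, y)
print(b0, b1, b2)
```

Because the toy data contain no noise, the fitted coefficients should recover the generating values (1, 2, 3) up to floating-point error.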

c. Test the statistical significance of the model (see footnotes):

General:
From the analysis of variance table, check the signal-to-noise ratio.

Average signal/Average noise = (SSR/k)/(SSE/(n-k-1)),
where k = number of explanatory variables
and n = number of observations.

Compare the signal-to-noise ratio to the F statistic with k and (n-k-1)
degrees of freedom. If the ratio is greater than $F_{\alpha}(k, n-k-1)$, the
model is statistically significant.

Example:
(442.315/2)/(96.67/7) = 16.014

16.014 > $F_{\alpha}(2, 7)$ = 4.74. This model is statistically significant.
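
The signal-to-noise comparison can be written out as a short calculation. This sketch uses the sums of squares from the example and takes the threshold 4.74 directly from the notes rather than computing it from the F-distribution.

```python
ssr = 442.315      # signal: regression sum of squares
sse = 96.67        # noise: error sum of squares
k = 2              # number of explanatory variables
df_noise = 7       # n - k - 1, as used in the example
f_critical = 4.74  # F threshold with (2, 7) degrees of freedom, from the notes

average_signal = ssr / k
average_noise = sse / df_noise
f_ratio = average_signal / average_noise

is_significant = f_ratio > f_critical
print(round(f_ratio, 3), is_significant)
```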

d. How useful is the model?

General:
$R^2$ = SSR/SST, the proportion of variation explained by the model.

Example:
$R^2$ = 442.315/538.985 = 0.82. 82% of the variation in sales is explained
by the model (the two explanatory variables).
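
A quick check of this ratio, assuming the usual decomposition SST = SSR + SSE (which the example's numbers satisfy):

```python
ssr = 442.315  # variation explained by the model
sse = 96.67    # unexplained variation

# Total variation is the sum of the explained and unexplained parts.
sst = ssr + sse

r_squared = ssr / sst
print(round(sst, 3), round(r_squared, 2))  # 538.985 0.82
```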

e. Pruning the model:

General:
To test the importance of an explanatory variable, test to see if its
coefficient is significantly different from zero. The test is based on the
Z-statistic (t-statistic, since the coefficient's variance is estimated).

$Z = (b - \beta)/(\text{std dev of } b)$

Keep the variable if the Z (or t) value is greater than 2 in absolute value.

Example:
Test the importance of advertising:
t = (1.3479 - 0)/$S_{b_1}$ = 3.583, which is greater than 2 in absolute
value. Therefore, keep advertising in the model.

Test the importance of promotions:
t = (1.8247 - 0)/$S_{b_2}$ = 4.008, which is greater than 2 in absolute
value. Therefore, keep promotions in the model.
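
The keep-or-drop rule can be sketched as two small functions. The notes do not report the standard errors $S_{b_1}$ and $S_{b_2}$, so the values used below are backed out from the reported coefficients and t-values (standard error = coefficient / t); treat them as implied illustrations, not as figures from the original output. The function names are hypothetical.

```python
def t_statistic(estimate, std_error, hypothesized=0.0):
    """(b - beta) / (std dev of b), with beta = 0 by default."""
    return (estimate - hypothesized) / std_error

def keep_variable(t_value, threshold=2.0):
    """Rule of thumb from the notes: keep if |t| exceeds 2."""
    return abs(t_value) > threshold

# Coefficients and t-values reported in the example.
b1, t1_reported = 1.3479, 3.583   # advertising
b2, t2_reported = 1.8247, 4.008   # promotions

# ASSUMPTION: standard errors implied by the reported values, not given in the notes.
s_b1 = b1 / t1_reported
s_b2 = b2 / t2_reported

t1 = t_statistic(b1, s_b1)
t2 = t_statistic(b2, s_b2)
print(keep_variable(t1), keep_variable(t2))  # True True
```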
f. Practical significance:

General:
Best forecast is $\hat{y}$. Approximate 95% prediction interval is:
$\hat{y} \pm 2\sqrt{SSE/(n-k-1)}$.
The acceptable width is determined by the owner of the problem.

Example:
Predict Y when $X_1$ = 10 and $X_2$ = 3. $\hat{y}$ is 43.54.
The approximate 95% prediction interval is $43.54 \pm 2\sqrt{13.81}$.
The 95% interval is: [36.12, 50.96]
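
The interval can be recomputed from the quantities in the example. In this sketch the square root is carried at full precision, which gives endpoints near 36.11 and 50.97; the reported [36.12, 50.96] is consistent with rounding $\sqrt{13.81}$ to 3.71 before doubling.

```python
import math

y_hat = 43.54   # forecast at X1 = 10, X2 = 3
sse = 96.67     # error sum of squares
df_noise = 7    # n - k - 1

# Estimated standard deviation of the noise: sqrt(SSE / (n - k - 1)).
noise_std = math.sqrt(sse / df_noise)

half_width = 2.0 * noise_std
pi_lower = y_hat - half_width
pi_upper = y_hat + half_width
print(round(pi_lower, 2), round(pi_upper, 2))
```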

Comments about degrees of freedom

1. The degrees of freedom represent the amount of (linear)
information within the accompanying statistic.
2. The degrees of freedom for the signal (SSR) represent the number
of explanatory variables.
3. The degrees of freedom for the total variability (SST) are always
(n – 1), where n is the sample size.
4. The sample variance of Y is SST/(n – 1).
5. The degrees of freedom for noise (SSE) are (n – k – 1), where k is the
number of explanatory variables.
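
These bookkeeping rules can be checked against the regression example. The sample size n = 10 is not stated in the notes; it is inferred here from k = 2 and n – k – 1 = 7.

```python
k = 2    # explanatory variables in the example
n = 10   # ASSUMPTION: inferred from n - k - 1 = 7, not stated in the notes

df_signal = k         # degrees of freedom for SSR
df_noise = n - k - 1  # degrees of freedom for SSE
df_total = n - 1      # degrees of freedom for SST

# Signal and noise degrees of freedom add up to the total.
degrees_add_up = (df_signal + df_noise == df_total)

sst = 538.985
sample_variance_y = sst / df_total  # comment 4: SST / (n - 1)
print(df_signal, df_noise, df_total, degrees_add_up, round(sample_variance_y, 2))
```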
Comments about the signal-to-noise ratio
1. The model is called statistically significant (useful in a statistical
sense) if the signal-to-noise ratio is greater than the F-distribution
threshold.
2. The F-distribution is a table of signal-to-noise ratios indexed by 2
parameters: the degrees of freedom in the signal (k), and the
degrees of freedom in the noise (n-k-1).

Comment about practical significance

Practical significance is based on the width of the approximate
95% prediction interval. (A prediction interval is a probability
interval that covers a random variable, e.g., y. A confidence
interval is a probability interval that covers a constant, e.g., $\mu_y$.)
Practical significance is determined by the subject matter person,
not by the statistician.

An annotated example of Minitab output is in the notes, course information 12.
