Summary of Relationships Between Central Limit Theorem, Confidence Intervals, Regression Analysis

Document 28
1. Problem: Let Y be a normal random variable with mean $\mu_y$ and variance $\sigma_y^2$. Find the probability that Y lies in a given interval.
Example: Find the probability that a light bulb’s lifetime will be between 850
and 1000 hours, given that the lifetime of a light bulb is normally distributed with
a mean of 800 hours and a standard deviation of 100 hours.
Problem formulation:
Given $Y \sim N(800, 100^2)$, find $P(850 < Y < 1000)$.
Solution: Transform the problem into a standard normal random variable.

Remember,
$Z = (Y - \mu_y)/\sigma_y$
= (Random variable – Mean of random variable)/(Standard deviation of the random variable).
In the example, Z = (Y – 800)/100.
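Therefore,
$P(850 < Y < 1000)$
$= P\left(\frac{850 - 800}{100} < \frac{Y - \mu_y}{\sigma_y} < \frac{1000 - 800}{100}\right)$
$= P(0.5 < Z < 2) = 0.4772 - 0.1915 = 0.2857$,
where 0.4772 and 0.1915 are the standard normal table areas between 0 and 2 and between 0 and 0.5, respectively.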
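The same probability can be computed without a table. This is a minimal sketch that assumes the standard normal CDF is written in terms of the error function, $\Phi(z) = \tfrac{1}{2}(1 + \mathrm{erf}(z/\sqrt{2}))$. The helper names `normal_cdf` and `normal_interval_prob` are illustrative only.

```python
import math

def normal_cdf(y, mean, sd):
    """P(Y <= y) for Y ~ N(mean, sd^2), computed via the error function."""
    z = (y - mean) / sd  # standardize: Z = (Y - mu_y) / sigma_y
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_interval_prob(a, b, mean, sd):
    """P(a < Y < b) = Phi((b - mu_y)/sigma_y) - Phi((a - mu_y)/sigma_y)."""
    return normal_cdf(b, mean, sd) - normal_cdf(a, mean, sd)

p = normal_interval_prob(850, 1000, mean=800, sd=100)
print(round(p, 4))  # 0.2858
```

The result, 0.28579, agrees with the table answer 0.2857 up to the rounding of the four-digit table entries.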
2. Problem (Central Limit Theorem): Let $y_1, \ldots, y_n$ be a random sample with sample mean (average) $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$. Assume that the variance $\sigma_y^2$ is known.
Find a 95% confidence interval for $\mu_y$.
Example: The average lifetime of 25 XYZ light bulbs was 750 hours. Find a
95% confidence interval for the mean of the population of all XYZ light bulbs
when the standard deviation is known to be 100.
Problem formulation:
Given: $\bar{y} \sim N(\mu_y, \sigma_y^2/n)$ (approximately, by the Central Limit Theorem; exactly if Y is normal) or, specifically,
$\bar{y} \sim N(\mu_y, 100^2/25)$
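A short simulation can show what the Central Limit Theorem claims here: the standard deviation of $\bar{y}$ is about $\sigma_y/\sqrt{n}$ even when the population is not normal. This is only a sketch. The population is a hypothetical, strongly skewed exponential distribution with mean 100 and standard deviation 100. It is not the light-bulb data. The sample size is n = 25, as in the example.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible
n = 25          # sample size, as in the light-bulb example
reps = 20000    # number of simulated samples
sigma = 100.0   # hypothetical population: exponential with mean 100, so sd is also 100

means = []
for _ in range(reps):
    # expovariate takes the rate parameter, which is 1 / mean
    sample = [random.expovariate(1.0 / sigma) for _ in range(n)]
    means.append(statistics.fmean(sample))

print(statistics.fmean(means))  # close to the population mean, 100
print(statistics.stdev(means))  # close to sigma / sqrt(n) = 100 / 5 = 20
```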
The general form for a 95% confidence interval is:
P(Estimator – 1.96 × Standard deviation of estimator < $\mu_y$ < Estimator + 1.96 × Standard deviation of estimator) = 0.95
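Solution to example:
$P\left(\bar{y} - 1.96\,\frac{\sigma_y}{\sqrt{n}} < \mu_y < \bar{y} + 1.96\,\frac{\sigma_y}{\sqrt{n}}\right) = 0.95$
$P(750 - 1.96\,(100/5) < \mu_y < 750 + 1.96\,(100/5)) = 0.95$
$P(710.8 < \mu_y < 789.2) = 0.95$. Therefore, the 95% confidence interval for the unknown mean $\mu_y$ is [710.8, 789.2]. Before the sample is drawn, the random interval $\bar{y} \pm 1.96\,\sigma_y/\sqrt{n}$ covers the constant $\mu_y$ with probability 0.95.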
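The known-variance interval $\bar{y} \pm 1.96\,\sigma_y/\sqrt{n}$ is simple to code. This sketch uses the example's numbers ($\bar{y} = 750$, $\sigma_y = 100$, $n = 25$). The function name `z_confidence_interval` is illustrative.

```python
import math

def z_confidence_interval(ybar, sigma, n, z=1.96):
    """Known-variance CI for mu_y: ybar +/- z * sigma / sqrt(n)."""
    half_width = z * sigma / math.sqrt(n)  # z times the standard deviation of ybar
    return ybar - half_width, ybar + half_width

low, high = z_confidence_interval(ybar=750, sigma=100, n=25)
print(f"[{low:.1f}, {high:.1f}]")  # [710.8, 789.2]
```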
3. Regression (the example is from the discussion in Document 23):
a. State the variables
General: Dependent variable: $Y$. Explanatory variables: $X_1$, $X_2$.
Example: $Y$ is sales data. Explanatory variables: $X_1$ = national advertising, $X_2$ = promotions.
b. State the model
General (model): $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
Example (predictor equation): $\hat{y} = b_0 + b_1 X_1 + b_2 X_2$
Note: $b_0$, $b_1$, $b_2$ are computed by least squares.
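Here is a minimal sketch of the least-squares step. It uses NumPy's `numpy.linalg.lstsq` on a design matrix that includes a column of ones for the intercept $b_0$. The Document 23 sales data are not reproduced in these notes, so the inputs are hypothetical. They are built exactly from $b_0 = 5$, $b_1 = 1.5$, $b_2 = 2$ with no noise, which lets you check that least squares recovers those coefficients.

```python
import numpy as np

def least_squares(X1, X2, y):
    """Return (b0, b1, b2) minimizing the sum of squared residuals."""
    # Design matrix: intercept column of ones, then the two explanatory variables.
    X = np.column_stack([np.ones(len(y)), X1, X2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical inputs (NOT the Document 23 sales data).
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 5 + 1.5 * X1 + 2 * X2  # noise-free, so the fit should be exact

b0, b1, b2 = least_squares(X1, X2, y)
print(b0, b1, b2)
```

With real data, such as the sales example, the residuals are not zero, and their sum of squares is the SSE used in the next steps.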
c. Test the statistical significance of the model (see the comments on degrees of freedom and on the signal-to-noise ratio below):
General: From the analysis of variance (ANOVA) table, check the signal-to-noise ratio:
Average signal/Average noise = $\frac{SSR/k}{SSE/(n-k-1)}$
where k = number of explanatory variables and n = number of observations.
Compare the signal-to-noise ratio to the critical value of the F distribution with k and (n – k – 1) degrees of freedom. If the ratio is greater than $F(k, n-k-1)$, the model is statistically significant.
Example: $\frac{442.315/2}{96.67/7} = 16.014 > F(2, 7) = 4.74$ (5% critical value). This model is statistically significant.
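The signal-to-noise computation can be written as a small function. This sketch uses the example's ANOVA values. It assumes n = 10, which is implied by the 7 noise degrees of freedom with k = 2. The critical value 4.74 is the 5% table value of F(2, 7) quoted above and is hard-coded, not computed.

```python
def signal_to_noise(ssr, sse, k, n):
    """Average signal / average noise = (SSR/k) / (SSE/(n - k - 1))."""
    return (ssr / k) / (sse / (n - k - 1))

# Example ANOVA values: k = 2, and n - k - 1 = 7 implies n = 10.
ratio = signal_to_noise(ssr=442.315, sse=96.67, k=2, n=10)
F_CRIT_2_7 = 4.74  # 5% critical value of F(2, 7), taken from an F table
print(round(ratio, 3), ratio > F_CRIT_2_7)  # 16.014 True
```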
d. How useful is the model?
General: $R^2 = SSR/SST$ = proportion of variation explained by the model.
Example: $R^2 = 442.315/538.985 = 0.82$, so 82% of the variation in sales is explained by the model (the two explanatory variables).
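For a least-squares fit with an intercept, SST = SSR + SSE. The example's numbers satisfy this identity: 442.315 + 96.67 = 538.985. The sketch below uses it to get SST and then $R^2$.

```python
ssr, sse = 442.315, 96.67  # from the example ANOVA table
sst = ssr + sse            # SST = SSR + SSE = 538.985
r_squared = ssr / sst      # proportion of variation explained by the model
print(round(sst, 3), round(r_squared, 2))  # 538.985 0.82
```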
e. Pruning the model:
General: To test the importance of an explanatory variable, test whether its coefficient is significantly different from zero. The test is based on the Z-statistic (in practice a t-statistic, since the coefficient's variance is estimated):
$Z = (b - \beta)/(\text{std dev of } b)$, with $\beta = 0$ when testing whether the variable matters.
Keep the variable if the Z (or t) value is greater than 2 in absolute value.
Example: Test the importance of advertising: $t = (1.3479 - 0)/s_{b_1} = 3.583$, which is greater than 2 in absolute value. Therefore, keep advertising in the model.
Test the importance of promotions: $t = (1.8247 - 0)/s_{b_2} = 4.008$, which is greater than 2 in absolute value. Therefore, keep promotions in the model.
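The pruning rule comes down to a t-statistic and a threshold of 2. In this sketch the function names are illustrative. The notes report the t values (3.583 and 4.008) but not the standard errors $s_{b_1}$ and $s_{b_2}$, so the usage applies the rule to the reported t values directly.

```python
def t_statistic(b, se_b, beta0=0.0):
    """t = (b - beta0) / (estimated standard deviation of b)."""
    return (b - beta0) / se_b

def keep_variable(t, threshold=2.0):
    """Rule of thumb from the notes: keep the variable if |t| > 2."""
    return abs(t) > threshold

# t values as reported in the example (standard errors not reproduced here).
for name, t in [("advertising", 3.583), ("promotions", 4.008)]:
    print(name, "keep" if keep_variable(t) else "drop")
```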
f. Practical significance:
General: The best forecast is $\hat{y}$. The approximate 95% prediction interval is $\hat{y} \pm 2\sqrt{SSE/(n-k-1)}$. The acceptable width is determined by the owner of the problem.
Example: Predict Y when $X_1 = 10$ and $X_2 = 3$: $\hat{y} = 43.54$. The approximate 95% prediction interval is $43.54 \pm 2\sqrt{13.81} = 43.54 \pm 7.43$, that is, [36.11, 50.97].
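This sketch implements the approximate interval $\hat{y} \pm 2\sqrt{SSE/(n-k-1)}$ with the example's values ($\hat{y} = 43.54$, SSE = 96.67, n – k – 1 = 7). Like the formula in the notes, it ignores the extra uncertainty from estimating the coefficients. That is why the interval is only approximate.

```python
import math

def approx_prediction_interval(y_hat, sse, n, k):
    """Approximate 95% PI: y_hat +/- 2 * sqrt(SSE / (n - k - 1))."""
    s = math.sqrt(sse / (n - k - 1))  # estimated standard deviation of the noise
    return y_hat - 2 * s, y_hat + 2 * s

# Example: y_hat = 43.54, SSE = 96.67, n = 10 and k = 2 (so n - k - 1 = 7).
low, high = approx_prediction_interval(43.54, sse=96.67, n=10, k=2)
print(f"[{low:.2f}, {high:.2f}]")  # [36.11, 50.97]
```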
Comments about degrees of freedom
1. The degrees of freedom represent the amount of (linear) information within the accompanying statistic.
2. The degrees of freedom for the signal (SSR) equal the number of explanatory variables, k.
3. The degrees of freedom for the total variability (SST) are always (n – 1), where n is the sample size.
4. The sample variance of Y is SST/(n – 1).
5. The degrees of freedom for the noise (SSE) are (n – k – 1), where k is the number of explanatory variables.
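The bookkeeping in these comments can be checked in a few lines: the signal and noise degrees of freedom add up to the total, just as SSR + SSE = SST. This sketch uses the regression example. It assumes n = 10, which is implied by the 7 noise degrees of freedom with k = 2, and SST = 538.985 from the ANOVA table.

```python
def anova_degrees_of_freedom(n, k):
    """Degrees of freedom for signal (SSR), noise (SSE), and total (SST)."""
    df_ssr = k          # one per explanatory variable
    df_sse = n - k - 1  # observations minus estimated coefficients (k slopes + intercept)
    df_sst = n - 1
    assert df_ssr + df_sse == df_sst  # the degrees of freedom add up
    return df_ssr, df_sse, df_sst

df_ssr, df_sse, df_sst = anova_degrees_of_freedom(n=10, k=2)
sample_var_y = 538.985 / df_sst  # sample variance of Y = SST / (n - 1)
print(df_ssr, df_sse, df_sst, round(sample_var_y, 2))  # 2 7 9 59.89
```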
Comments about the signal-to-noise ratio
1. The model is called statistically significant (useful in a statistical sense) if the signal-to-noise ratio is greater than the F-distribution threshold.
2. The F-distribution table gives these thresholds (critical values of the signal-to-noise ratio) indexed by two parameters: the degrees of freedom in the signal (k) and the degrees of freedom in the noise (n – k – 1).
Comment about practical significance
Practical significance is based on the width of the approximate 95% prediction interval. (A prediction interval is a probability interval that covers a random variable, e.g., a future value of Y. A confidence interval is a probability interval that covers a constant, e.g., $\mu_y$.)
Practical significance is determined by the subject-matter expert, not by the statistician.
An annotated example of Minitab output is in the notes, course information 12.