Regression
Data
Sales  Advertising
27     20
23     20
31     25
45     28
47     29
42     28
39     31
45     34
57     35
59     36
73     41
84     45

Regression Statistics
Multiple R         0.964212
R Square           0.929705
Adjusted R Square  0.922675
Standard Error     5.039375
Observations       12

ANOVA
            df  SS        MS        F         Significance F
Regression  1   3358.714  3358.714  132.2573  4.35E-07
Residual    10  253.953   25.3953
Total       11  3612.667

             Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
Intercept    -23.0191      6.316228        -3.64444  0.004504  -37.0925   -8.94566
Advertising  2.280186      0.198272        11.50032  4.35E-07  1.838409   2.721962
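The coefficients in the output above can be reproduced with ordinary least squares. A minimal sketch, assuming the 12 Sales/Advertising pairs shown in the output:

```python
# Reproduce the Excel regression output with ordinary least squares.
# Data: the 12 (Advertising, Sales) observations from the notes.
advertising = [20, 20, 25, 28, 29, 28, 31, 34, 35, 36, 41, 45]  # X
sales       = [27, 23, 31, 45, 47, 42, 39, 45, 57, 59, 73, 84]  # Y

n = len(sales)
x_bar = sum(advertising) / n
y_bar = sum(sales) / n

# slope = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(advertising, sales))
sxx = sum((x - x_bar) ** 2 for x in advertising)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

print(f"Sales = {intercept:.4f} + {slope:.6f} * Advertising")
# Matches the Excel output: intercept -23.0191, slope 2.280186
```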
[Figure: comparison of a forecasted value (Ŷ) to the actual value (Y_i) and
the average (Ȳ) on a plot of Sales; the squared deviations (Y_i − Ȳ)² relate
to the variance S_y².]
Adjusted R2 - adjusted for complexity by the degrees of freedom.
Unadjusted R2 becomes larger as more variables are added to the
equation (each added variable decreases the sum of squared errors).
Relying on an unadjusted R2 may lead you to believe that additional
variables are useful when they are not.
More on R2
If R2 = 1, there is a perfect linear relationship. All the variance in Y
is explained by X, and all of the data points lie on the regression line.
If R2 = 0, there is no linear relationship between X and Y (if this is the
case, we should not have run a linear model - and we should have
realized this from the correlation coefficient and by graphing -
BEFORE running the model!).
Several ways to calculate. From the ANOVA table: R2 = SSR/SST (this is
an UNADJUSTED R2).
Adjusted R2 from ANOVA = 1 − MSE/(SST/(n−1))
The square root of R2 is R, the correlation coefficient (it takes the sign
of the slope). This identifies positive and negative relationships.
R2 is useful to make model comparisons
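Both versions of R2 can be checked directly against the ANOVA numbers in the regression output earlier. A sketch, where SSR, SST, MSE, and n all come from that table:

```python
# Unadjusted and adjusted R^2 from the ANOVA table earlier in the notes.
SSR, SST, MSE, n = 3358.714, 3612.667, 25.3953, 12

r2 = SSR / SST                      # unadjusted: SSR/SST
adj_r2 = 1 - MSE / (SST / (n - 1))  # adjusted: 1 - MSE/(SST/(n-1))

print(round(r2, 6), round(adj_r2, 6))
# ~0.929705 and ~0.922675, matching "R Square" and "Adjusted R Square"
```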
Data Analysis
Syx or Standard Error - a measure of goodness of fit. It measures the
actual values (Y) against the regression line (Ŷ). A lower Syx means a
better fit.

    Syx = √( Σ(Y_i − Ŷ_i)² / (n − k) ) = √( Σe² / (n − k) )

The standard error can also be calculated by taking the square root of
the MSE in the ANOVA table: Syx = √MSE.
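That square-root-of-MSE shortcut is easy to verify against the Standard Error reported in the summary output. A sketch using the first ANOVA table's numbers:

```python
import math

MSE = 25.3953          # Residual MS from the ANOVA table
syx = math.sqrt(MSE)   # standard error of the estimate
print(round(syx, 6))   # ~5.039375, the "Standard Error" in the summary output
```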
Residuals

Observation  Predicted Weekly Sales  Residuals  Residuals Squared
1            13.23544                -3.23544   10.46805
2            3.058252                2.941748   8.653879
3            7.419903                -2.4199    5.85593
4            10.32767                1.67233    2.796688
5            8.873786                1.126214   1.268357
6            14.68932                0.31068    0.096522
7            8.873786                -3.87379   15.00622
8            11.78155                0.218447   0.047719
9            17.59709                -0.59709   0.356513
10           16.1432                 3.856796   14.87488

Excel will provide the residuals in the output. This table also includes
another column that I added - the residuals squared - which is used to
determine the standard error of the estimate (Syx).
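Summing the residuals-squared column gives SSE, and dividing by n − k gives the MSE whose square root is Syx. A sketch assuming the 10 residuals listed above, with n = 10 observations and k = 2 coefficients:

```python
import math

# Residuals from the table above (n = 10 observations, k = 2 coefficients)
residuals = [-3.23544, 2.941748, -2.4199, 1.67233, 1.126214,
             0.31068, -3.87379, 0.218447, -0.59709, 3.856796]

n, k = len(residuals), 2
sse = sum(e ** 2 for e in residuals)   # sum of squared residuals
mse = sse / (n - k)                    # mean squared error
syx = math.sqrt(mse)                   # standard error of the estimate

print(round(sse, 5), round(mse, 6), round(syx, 4))
# SSE ~59.42476 and MSE ~7.428095 for this model
```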
Confidence Intervals
Prior to relating Y to X, confidence intervals about the
future values are based on the standard error of Y.
However, in the regression equation, the standard
error of forecast (Sf) gives tighter confidence intervals
and greater accuracy.
Confidence Interval for Y:

    Ȳ ± z_{α/2} · S_y / √n

Confidence Interval for Ŷ:

    Ŷ ± z_{α/2} · S_yx · √( 1 + 1/n + (X_i − X̄)² / Σ(X_i − X̄)² )

Use t_{α/2} for small sample sizes!
Making Predictions
Identifying a forecasted point from the regression equation
does not give us an idea of the accuracy of the prediction.
We use the prediction interval to determine accuracy. For
example, a prediction of 8.44 appears to be precise - but
not if the 95% confidence level allows the forecast to be
anywhere between 1.75 and 15.15!
Be careful about making a prediction based on a prediction.
For example, if the X values range between 5 and 15, you
should be cautious about using an X value of 20 - it is
outside the range of the data and possibly outside of the
linear relationship.
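The prediction interval can be sketched in code using the Advertising/Sales model from the regression output earlier. The intercept, slope, and Syx come from that output; t_{0.025, df=10} ≈ 2.228 is the standard critical value for a 95% interval with 10 residual degrees of freedom:

```python
import math

# Prediction interval for a new X, using the Advertising/Sales model.
advertising = [20, 20, 25, 28, 29, 28, 31, 34, 35, 36, 41, 45]
intercept, slope, syx = -23.0191, 2.280186, 5.039375  # from the Excel output

n = len(advertising)
x_bar = sum(advertising) / n
sxx = sum((x - x_bar) ** 2 for x in advertising)

x_new = 30                             # inside the data's range (20..45)
y_hat = intercept + slope * x_new      # point forecast
t_crit = 2.228                         # t_{0.025, df=10} for a 95% interval
margin = t_crit * syx * math.sqrt(1 + 1/n + (x_new - x_bar) ** 2 / sxx)

print(f"{y_hat:.2f} +/- {margin:.2f}")
# The point forecast alone hides how wide this interval is.
```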
Is the Independent Variable Significant?
Ho : B 0
H A: B 0
Where B is the true slope of the regression line
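For the Advertising model shown earlier, this test statistic is just the coefficient divided by its standard error. A sketch using the numbers from that coefficient table:

```python
# t test for the slope: H0: B = 0 vs HA: B != 0
coef, se = 2.280186, 0.198272   # Advertising row of the coefficient table
t_stat = coef / se
print(round(t_stat, 4))         # ~11.5003, matching the Excel t Stat
# With one predictor, t^2 equals the ANOVA F statistic (~132.26).
```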
ANOVA
            df  SS        MS        F         Significance F
Regression  1   174.1752  174.1752  23.44817  0.001284315
Residual    8   59.42476  7.428095
Total       9   233.6
ANOVA       df   SS          MS         F
Regression  k-1  Σ(Ŷ − Ȳ)²   SSR/(k-1)  MSR/MSE
Error       n-k  Σ(Y − Ŷ)²   SSE/(n-k)
Total       n-1  Σ(Y − Ȳ)²
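These formulas reproduce the numeric ANOVA table above. A sketch using its SS and df values:

```python
# Rebuild MS and F from the SS and df columns of the ANOVA table above.
ss_regression, ss_residual = 174.1752, 59.42476
df_regression, df_residual = 1, 8

msr = ss_regression / df_regression   # SSR/(k-1)
mse = ss_residual / df_residual       # SSE/(n-k)
f_stat = msr / mse                    # MSR/MSE

print(round(f_stat, 5))               # ~23.44817, the F in the table
```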
F-Test
Ho: The model is NOT valid and there is NOT a
statistical relationship between the dependent and
independent variables
HA: The model is valid. There is a statistical
relationship between the dependent and
independent variables.
DW - the Durbin-Watson statistic, computed from the residuals:

    DW = Σ_{t=2..n} (e_t − e_{t−1})² / Σ_{t=1..n} e_t²
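A sketch computing DW from the residuals table earlier; by the standard rule of thumb, values near 2 suggest no autocorrelation:

```python
# Durbin-Watson statistic from a list of residuals (ordered in time).
residuals = [-3.23544, 2.941748, -2.4199, 1.67233, 1.126214,
             0.31068, -3.87379, 0.218447, -0.59709, 3.856796]

numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                for t in range(1, len(residuals)))
denominator = sum(e ** 2 for e in residuals)
dw = numerator / denominator
print(round(dw, 3))   # DW always falls between 0 and 4; ~2 means no autocorrelation
```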
Data Transformations
Curvilinear relationships - fit the data with a curved line
Transform the X variable (independent) so the
resulting relationship with Y is linear.
Log of X, Square Root of X, X squared, and reciprocal
of X (or 1/X) are common. The hope is that one of
these transformations will result in a linear relationship.
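One way to pick a transformation is to try each candidate and compare its correlation with Y. A minimal sketch with made-up curved data (the data and the `corr` helper are hypothetical, not from the notes):

```python
import math

# Hypothetical curved data: Y grows roughly with the square root of X.
xs = [1, 4, 9, 16, 25, 36, 49, 64]
ys = [2.1, 4.0, 6.2, 7.9, 10.1, 11.9, 14.2, 15.8]

def corr(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

# The common candidates from the notes: log, square root, square, reciprocal.
transforms = {
    "x":       lambda x: x,
    "log(x)":  math.log,
    "sqrt(x)": math.sqrt,
    "x^2":     lambda x: x * x,
    "1/x":     lambda x: 1 / x,
}
for name, f in transforms.items():
    print(name, round(corr([f(x) for x in xs], ys), 4))
# For this data, sqrt(x) should give the correlation closest to 1.
```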
Ok, 18 pages of notes, so where do we start?