Multi-Regression - Cont Student Ver - Updated
11/18/20 1
Selecting the best Regression Equation.
n After a lengthy list of potentially useful
independent variables has been compiled,
some of the independent variables can be
screened out. An independent variable
n May not be fundamental to the problem
n May be subject to large measurement error
n May effectively duplicate another independent
variable in the list.
Selecting the best Regression Equation.
n Once the investigator has tentatively
decided upon the functional forms of the
regression relations (linear, quadratic, etc.),
the next step is to obtain a subset of the
independent variables (x) that “best”
explain the variability in the dependent
variable y.
Selecting the best Regression Equation.
n An automatic search procedure that
develops sequentially the subset of x
variables to be included in the regression
model is called the stepwise procedure.
n It was developed to economize on
computational efforts.
n It will end with the identification of a single
regression model as “best”.
Example: Sales Forecasting
n Multiple regression is a popular technique for predicting
product sales with the help of other variables that are likely to
have a bearing on sales.
n Example
n The growth of cable television has created vast new potential
Example: Sales Forecasting
n Y = Number of cable subscribers (SUBSCRIB)
n X1 = Advertising rate which the station charges local advertisers for one minute of prime-time space (ADRATE)
n X2 = Kilowatt power of the station’s non-cable signal (SIGNAL)
n X3 = Number of families living in the station’s area of dominant influence (ADI), a geographical division of radio and TV audiences (APIPOP)
n X4 = Number of competing stations in the ADI (COMPETE)
Example: Sales Forecasting
n The sample data are fitted by a multiple regression
model using Excel.
n The marginal t-test provides a way of choosing the
variables for inclusion in the equation.
n The fitted Model is
SUBSCRIB = b0 + b1(ADRATE) + b2(APIPOP) + b3(COMPETE) + b4(SIGNAL)
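The slides fit this model in Excel; the same least-squares fit can be sketched in Python with NumPy. The numbers below are made-up stand-ins for the station sample, not the actual data:

```python
import numpy as np

# Hypothetical stand-in data (NOT the actual 20-station sample).
adrate   = np.array([200., 150., 300., 250., 180., 220., 270., 160.])
apipop   = np.array([50.,  30.,  80.,  60.,  40.,  55.,  70.,  35.])
compete  = np.array([3.,   2.,   5.,   4.,   2.,   3.,   4.,   2.])
signal   = np.array([10.,  8.,   15.,  12.,  9.,   11.,  14.,  8.])
subscrib = np.array([105., 62.,  170., 128., 80.,  115., 150., 70.])

# Design matrix with an intercept column:
# SUBSCRIB = b0 + b1*ADRATE + b2*APIPOP + b3*COMPETE + b4*SIGNAL
X = np.column_stack([np.ones_like(adrate), adrate, apipop, compete, signal])
coef, *_ = np.linalg.lstsq(X, subscrib, rcond=None)
fitted = X @ coef
print(np.round(coef, 3))
```

The coefficient vector plays the role of the b0..b4 estimates that Excel reports in its SUMMARY OUTPUT.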
Example: Sales Forecasting
n Excel Summary output
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.884267744
R Square 0.781929444
Adjusted R Square 0.723777295
Standard Error 142.9354188
Observations 20
ANOVA
df SS MS F Significance F
Regression 4 1098857.84 274714.4601 13.44626923 7.52E-05
Residual 15 306458.0092 20430.53395
Total 19 1405315.85
Example: Sales Forecasting
n Do we need all the four variables in the
model?
n Based on the partial t-test, the variables
signal and compete are the least significant
variables in our model.
n Let’s drop the least significant variables one
at a time.
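The drop-one-at-a-time idea can be sketched as backward elimination on the marginal t statistics. A minimal sketch with simulated data (the |t| < 2 cutoff is an illustrative stand-in for the partial t-test, not the exact rule used in the slides):

```python
import numpy as np

def t_stats(X, y):
    """Marginal t statistics for the OLS coefficients of X (with intercept)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    mse = resid @ resid / (n - p)
    se = np.sqrt(mse * np.diag(np.linalg.inv(X.T @ X)))
    return beta / se

# Simulated data: y depends on X1 and X2; X3 is pure noise.
rng = np.random.default_rng(0)
n = 20
X1, X2, X3 = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
y = 2 + 3 * X1 + 1.5 * X2 + rng.normal(scale=0.5, size=n)

names, cols = ["X1", "X2", "X3"], [X1, X2, X3]
# Backward elimination: repeatedly drop the least significant variable
# while its |t| falls below the cutoff.
while cols:
    X = np.column_stack([np.ones(n)] + cols)
    t = t_stats(X, y)[1:]            # skip the intercept
    j = int(np.argmin(np.abs(t)))
    if abs(t[j]) >= 2:
        break
    print("dropping", names.pop(j)); cols.pop(j)
print("kept:", names)
```

The genuinely related predictors survive the elimination, mirroring how SIGNAL and COMPETE are dropped from the cable model.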
Example: Sales Forecasting
n Excel Summary Output
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.882638739
R Square 0.779051144
Adjusted R Square 0.737623233
Standard Error 139.3069743
Observations 20
ANOVA
df SS MS F Significance F
Regression 3 1094812.92 364937.64 18.80498277 1.69966E-05
Residual 16 310502.9296 19406.4331
Total 19 1405315.85
Example: Sales Forecasting
n Excel Summary Output
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.8802681
R Square 0.774871928
Adjusted R Square 0.748386273
Standard Error 136.4197776
Observations 20
ANOVA
df SS MS F Significance F
Regression 2 1088939.802 544469.901 29.2562866 3.13078E-06
Residual 17 316376.0474 18610.35573
Total 19 1405315.85
Interpreting the Final Model
n What is the interpretation of the estimated parameters?
n Is the association positive or negative?
n Does this make sense intuitively, based on what the data
represents?
n What other variables could be confounders?
n Are there other analyses that you might consider doing?
Are new questions raised?
Multicollinearity
n In multiple regression analysis, one is often
concerned with the nature and significance of the
relations between the independent variables and
the dependent variable.
n Questions that are frequently asked are:
n What is the relative importance of the effects of the
different independent variables?
n What is the magnitude of the effect of a given
independent variable on the dependent variable?
Multicollinearity
n Can any independent variable be dropped from the
model because it has little or no effect on the dependent
variable?
n Should any independent variables not yet included in
the model be considered for possible inclusion?
n Simple answers can be given to these questions if
n The independent variables in the model are
uncorrelated among themselves.
n They are uncorrelated with any other independent
variables that are related to the dependent variable but
omitted from the model.
Multicollinearity
n When the independent variables are correlated among
themselves, multicollinearity among them is said to exist.
n In many non-experimental situations in business,
economics, and the social and biological sciences, the
independent variables tend to be correlated among
themselves.
n For example, in a regression of family food expenditures
on the variables family income, family savings, and the age
of head of household, the independent variables will be
correlated among themselves.
Multicollinearity
n Further, the independent variables will also
be correlated with other socioeconomic
variables not included in the model that do
affect family food expenditures, such as
family size.
Multicollinearity
n Some key problems that typically arise when the
independent variables being considered for the
regression model are highly correlated among
themselves are:
1. Adding or deleting an independent variable changes the
regression coefficients.
2. The estimated standard deviations of the regression coefficients
become large when the independent variables in the regression
model are highly correlated with each other.
3. The estimated regression coefficients individually may not be
statistically significant even though a definite statistical relation
exists between the dependent variable and the set of
independent variables.
Multicollinearity Diagnostics
n A widely used formal method of detecting the presence
of multicollinearity is the Variance Inflation
Factor (VIF).
n It measures how much the variances of the estimated
regression coefficients are inflated as compared to
when the independent variables are not linearly related.
VIFj = 1 / (1 - Rj^2),   j = 1, 2, ..., k
where Rj^2 is the coefficient of determination from regressing Xj on the other independent variables.
Multicollinearity Diagnostics
n A VIF near 1 suggests that multicollinearity is not a
problem for the independent variables.
n Its estimated coefficient and associated t value will not change
much as the other independent variables are added or deleted from
the regression equation.
n A VIF much greater than 1 indicates the presence of
multicollinearity. A maximum VIF value in excess of 10 is
often taken as an indication that multicollinearity may
be unduly influencing the least squares estimates.
n the estimated coefficient attached to the variable is unstable and
its associated t statistic may change considerably as the other
independent variables are added or deleted.
Multicollinearity Diagnostics
n For example, a VIF of 1.9 tells you that the variance of a
particular coefficient is 90% larger than it would be if there
were no multicollinearity, i.e., if that variable were
uncorrelated with the other predictors.
Multicollinearity Diagnostics
n A rule of thumb for interpreting the variance inflation
factor:
n 1 = not correlated.
n Between 1 and 5 = moderately correlated.
n Greater than 5 = highly correlated.
Multicollinearity Diagnostics
n The simple correlation coefficient between all
pairs of explanatory variables (i.e., X1, X2, …, Xk
) is helpful in selecting appropriate explanatory
variables for a regression model and is also critical
for examining multicollinearity.
n While it is true that a correlation very close to +1
or –1 does suggest multicollinearity, the absence of
high correlations between pairs of explanatory
variables does not let us infer that multicollinearity
is absent (unless there are only two explanatory
variables).
Example: Sales Forecasting
[Correlation matrix output: Pearson Correlation Coefficients, N = 20; Prob > |r| under H0: Rho = 0]
Example: Sales Forecasting
n VIF calculation:
n Fit the model
APIPOP = b0 + b1(SIGNAL) + b2(ADRATE) + b3(COMPETE)
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.878054
R Square 0.7709781
Adjusted R Square 0.728036
Standard Error 264.3027
Observations 20
ANOVA
df SS MS F Significance F
Regression 3 3762601 1254200 17.9541 2.25472E-05
Residual 16 1117695 69855.92
Total 19 4880295
          Coefficients  Standard Error  t Stat    P-value   Lower 95%     Upper 95%
Intercept -472.685      139.7492        -3.38238  0.003799  -768.9402258  -176.43
Compete   159.8413      28.29157        5.649786  3.62E-05  99.86587622   219.8168
ADRATE    0.048173      0.149395        0.322455  0.751283  -0.268529713  0.364876
Signal    0.037937      0.083011        0.457012  0.653806  -0.138038952  0.213913
Example: Sales Forecasting
n Fit the model
COMPETE = b0 + b1(ADRATE) + b2(APIPOP) + b3(SIGNAL)
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.882936
R Square 0.7795754
Adjusted R Square 0.738246
Standard Error 1.34954
Observations 20
ANOVA
df SS MS F Significance F
Regression 3 103.0599 34.35329 18.86239 1.66815E-05
Residual 16 29.14013 1.821258
Total 19 132.2
          Coefficients  Standard Error  t Stat  P-value  Lower 95%  Upper 95%
Intercept 3.10416 0.520589 5.96278 1.99E-05 2.000559786 4.20776
ADRATE 0.000491 0.000755 0.649331 0.525337 -0.001110874 0.002092
Signal 0.000334 0.000418 0.799258 0.435846 -0.000552489 0.001221
APIPOP 0.004167 0.000738 5.649786 3.62E-05 0.002603667 0.005731
Example: Sales Forecasting
n Fit the model
SIGNAL = b0 + b1(ADRATE) + b2(APIPOP) + b3(COMPETE)
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.512244
R Square 0.262394
Adjusted R Square 0.124092
Standard Error 790.8387
Observations 20
ANOVA
df SS MS F Significance F
Regression 3 3559789 1186596 1.897261 0.170774675
Residual 16 10006813 625425.8
Total 19 13566602
          Coefficients  Standard Error  t Stat  P-value  Lower 95%  Upper 95%
Intercept 5.171093 547.6089 0.009443 0.992582 -1155.707711 1166.05
APIPOP 0.339655 0.743207 0.457012 0.653806 -1.235874129 1.915184
Compete 114.8227 143.6617 0.799258 0.435846 -189.7263711 419.3718
ADRATE -0.38091 0.438238 -0.86919 0.397593 -1.309935875 0.548109
Example: Sales Forecasting
n Fit the model
ADRATE = b0 + b1(SIGNAL) + b2(APIPOP) + b3(COMPETE)
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.399084
R Square 0.1592681
Adjusted R Square 0.001631
Standard Error 440.8588
Observations 20
ANOVA
df SS MS F Significance F
Regression 3 589101.7 196367.2 1.010346 0.413876018
Residual 16 3109703 194356.5
Total 19 3698805
          Coefficients  Standard Error  t Stat  P-value  Lower 95%  Upper 95%
Intercept 253.7304 298.6063 0.849716 0.408018 -379.2865355 886.7474
Signal -0.11837 0.136186 -0.86919 0.397593 -0.407073832 0.170329
APIPOP 0.134029 0.415653 0.322455 0.751283 -0.747116077 1.015175
Compete 52.3446 80.61309 0.649331 0.525337 -118.5474784 223.2367
Example: Sales Forecasting
n VIF calculation Results:
Variable   R-Squared   VIF
ADRATE 0.159268 1.19
COMPETE 0.779575 4.54
SIGNAL 0.262394 1.36
APIPOP 0.770978 4.36
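These auxiliary-regression VIFs can be reproduced directly from the definition VIFj = 1/(1 - Rj^2). A minimal NumPy sketch with simulated predictors (not the station data):

```python
import numpy as np

def vif(X):
    """VIFj = 1/(1 - Rj^2), with Rj^2 from regressing column j on the others."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Simulated predictors: x2 is mostly x1 plus noise, x3 is independent.
rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
x2 = x1 + 0.3 * rng.normal(size=50)
x3 = rng.normal(size=50)
v = vif(np.column_stack([x1, x2, x3]))
print(np.round(v, 2))   # x1 and x2 inflated, x3 near 1
```

Applied to the four station predictors, this routine reproduces the table above (e.g., 1/(1 - 0.770978) ≈ 4.36 for APIPOP).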
Time Series Data and the Problem of
Serial Correlation
n In regression models we assume that the
errors ei are independent.
n In business and economics, many regression
applications involve time series data.
n For such data, the assumption of
uncorrelated or independent error terms is
often not appropriate.
Time Series Data and the
Problem of Serial Correlation
n Example: Consider the annual base price for a particular
model of a new car. Can you imagine the chaos that would
exist if the new car prices from one year to the next were
indeed unrelated to (independent of) one another?
Time Series Data and the
Problem of Serial Correlation
n In the real world, price in the current year is related to
(correlated with) the price in the previous year, and maybe
the price two years ago, and so forth.
What is serial correlation?
n Autocorrelation (serial correlation) exists when successive
observations over time are related to one another.
Problems of Serial Correlation
n If the error terms in the regression model are
autocorrelated, the use of ordinary least squares
procedures has a number of important
consequences
n MSE underestimates the variance of the error terms.
n The confidence intervals and tests using the t and F
distribution are no longer strictly applicable.
n The standard errors of the regression coefficients
underestimate the variability of the estimated regression
coefficients. Spurious regression can result.
First order serial correlation
n The error term in current period is directly related
to the error term in the previous time period.
n Let the subscript t represent time, then the simple
linear regression model is:
yt = b0 + b1 xt + εt
n Where
εt = ρ εt-1 + νt
n εt = error at time t
n ρ = the parameter that measures the correlation between
adjacent error terms
n νt = normally distributed error terms with mean zero and
variance σν2
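A quick simulation shows what first-order autocorrelated errors look like; ρ = 0.8 is an assumed value, purely for illustration:

```python
import numpy as np

# Simulate first-order autocorrelated errors: e_t = rho * e_{t-1} + v_t,
# with rho = 0.8 assumed for illustration.
rng = np.random.default_rng(42)
rho, n = 0.8, 500
v = rng.normal(size=n)            # independent disturbances v_t
e = np.zeros(n)
for t in range(1, n):
    e[t] = rho * e[t - 1] + v[t]

# The lag-1 sample autocorrelation of e should sit near rho.
r1 = (e[1:] @ e[:-1]) / (e @ e)
print(round(r1, 2))
```

With independent errors (ρ = 0), the same lag-1 autocorrelation would hover near zero.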
Example
n The effect of positive serial correlation in a
simple linear regression model.
n Misleading forecasts of future y values.
n Standard error of the estimate, Sy.x, will
underestimate the variability of the y’s about
the true regression line.
n Strong autocorrelation can make two unrelated
variables appear to be related.
Durbin-Watson Test for Serial
Correlation
n One approach that is used frequently to determine
if serial correlation is present is the Durbin-
Watson Test.
n The test involves determining whether the
autocorrelation parameter ρ is zero.
Durbin-Watson Test for Serial
Correlation
n Recall the first-order serial correlation model
yt = b0 + b1 xt + εt
εt = ρ εt-1 + νt
n The hypotheses to be tested are:
H0: ρ = 0
Ha: ρ > 0
n The alternative hypothesis is ρ > 0 because
business and economic time series tend to show
positive autocorrelation.
Durbin-Watson Test for Serial
Correlation
n The Durbin-Watson statistic is defined as
DW = Σt=2..n (et − et−1)^2 / Σt=1..n et^2
n Where
et = yt - yˆt = the residual for time period t
et -1 = yt -1 - yˆt -1 = the residual for time period t -1
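The statistic can be computed directly from the residuals; a minimal sketch, using white noise and a drifting series to show the two extremes:

```python
import numpy as np

def durbin_watson(e):
    """DW = sum_{t=2..n}(e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(7)
dw_white = durbin_watson(rng.normal(size=1000))             # independent: near 2
dw_drift = durbin_watson(np.cumsum(rng.normal(size=1000)))  # strong positive correlation: near 0
print(round(dw_white, 2), round(dw_drift, 2))
```

This matches the interpretation on the next slide: DW near 2 indicates no correlation, values well below 2 indicate positive correlation.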
Durbin-Watson Test for Serial
Correlation
n The autocorrelation parameter ρ can be
estimated by the lag-1 residual
autocorrelation r1(e):
r1(e) = Σt=2..n et et−1 / Σt=1..n et^2
Durbin-Watson Test for Serial
Correlation
n Since –1 < r1(e) < 1, it follows that 0 < DW < 4 (DW ≈ 2(1 − r1(e)))
n If r1(e) = 0, then DW = 2 (there is no
correlation.)
n If r1(e) > 0, then DW < 2 (positive
correlation)
n If r1(e) < 0, Then DW > 2 (negative
correlation)
Durbin-Watson Test for Serial
Correlation
n Decision rule:
n If DW > dU, Do not reject H0.
n If DW < dL, Reject H0
n If dL £ DW £ dU, the test is inconclusive.
n The critical upper (dU) and lower (dL) bounds can be
found in the Durbin-Watson table in the appendix.
n To use this table you need to know the
significance level (α), the number of independent
variables in the model (k), and the sample size
(n).
Durbin-Watson Test for Serial
Correlation
H0 : No autocorrelation (error terms are independent)
H1 : There is autocorrelation (error terms are not
independent)
[Number line for the DW statistic: 0 ... dL ... dU ... 2 ... 4−dU ... 4−dL ... 4]
Obtaining the Critical Values of
Durbin-Watson Statistic
Finding Critical Values of Durbin-Watson Statistic
α = .05
k=1 k=2
n dL dU dL dU
15 1.08 1.36 .95 1.54
16 1.10 1.37 .98 1.54
Example: Blaisdell Company
n The Blaisdell Company wished to predict
its sales by using industry sales as a
predictor variable. The following table gives
seasonally adjusted quarterly data on
company sales and industry sales for the
period 1983-1987.
Example
Year Quarter t CompSale InduSale
1983 1 1 20.96 127.3
2 2 21.4 130
3 3 21.96 132.7
4 4 21.52 129.4
1984 1 5 22.39 135
2 6 22.76 137.1
3 7 23.48 141.2
4 8 23.66 142.8
1985 1 9 24.1 145.5
2 10 24.01 145.3
3 11 24.54 148.3
4 12 24.3 146.4
1986 1 13 25 150.2
2 14 25.64 153.1
3 15 26.36 157.3
4 16 26.98 160.7
1987 1 17 27.52 164.2
2 18 27.78 165.6
3 19 28.24 168.7
4 20 28.78 171.7
Example
[Scatter plot, Blaisdell Company example: Company Sales ($ millions) versus Industry Sales ($ millions)]
Example
n The scatter plot suggests that a linear regression
model is appropriate.
n Least squares method was used to fit a regression
line to the data.
n The residuals were plotted against the fitted
values.
n The plot shows that the residuals are consistently
above or below the fitted value for extended
periods.
Example
n To confirm this graphical diagnosis we will use the
Durbin-Watson test for:
H0: ρ = 0
Ha: ρ > 0
DW = Σt=2..n (et − et−1)^2 / Σt=1..n et^2
Example
Year Quarter t Company sales(y) Industry sales(x) et et-et-1 (et-et-1 )^2 et^2
1983 1 1 20.96 127.3 -0.02605 0.000679
2 2 21.4 130 -0.06202 -0.03596 0.001293 0.003846
3 3 21.96 132.7 0.022021 0.084036 0.007062 0.000485
4 4 21.52 129.4 0.163754 0.141733 0.020088 0.026815
1984 1 5 22.39 135 0.04657 -0.11718 0.013732 0.002169
2 6 22.76 137.1 0.046377 -0.00019 3.76E-08 0.002151
3 7 23.48 141.2 0.043617 -0.00276 7.61E-06 0.001902
4 8 23.66 142.8 -0.05844 -0.10205 0.010415 0.003415
1985 1 9 24.1 145.5 -0.0944 -0.03596 0.001293 0.008911
2 10 24.01 145.3 -0.14914 -0.05474 0.002997 0.022243
3 11 24.54 148.3 -0.14799 0.001152 1.33E-06 0.021901
4 12 24.3 146.4 -0.05305 0.094937 0.009013 0.002815
1986 1 13 25 150.2 -0.02293 0.030125 0.000908 0.000526
2 14 25.64 153.1 0.105852 0.12878 0.016584 0.011205
3 15 26.36 157.3 0.085464 -0.02039 0.000416 0.007304
4 16 26.98 160.7 0.106102 0.020638 0.000426 0.011258
1987 1 17 27.52 164.2 0.029112 -0.07699 0.005927 0.000848
2 18 27.78 165.6 0.042316 0.013204 0.000174 0.001791
3 19 28.24 168.7 -0.04416 -0.08648 0.007478 0.00195
4 20 28.78 171.7 -0.03301 0.011152 0.000124 0.00109
Sums: Σ(et−et−1)^2 = 0.097941   Σet^2 = 0.133302
Blaisdell Company Example
Example
DW = .09794 / .13330 = .735
n Using the Durbin-Watson table in your textbook,
for k = 1, n = 20, and α = .01, we find
dU = 1.15 and dL = .95.
n Since DW = .735 falls below dL = .95, we
reject the null hypothesis and conclude that the
error terms are positively autocorrelated.
Remedial Measures for Serial
Correlation
n Addition of one or more independent
variables to the regression model.
n One major cause of autocorrelated error terms
is the omission from the model of one or more
key variables that have time-ordered effects on
the dependent variable.
n Use transformed variables.
n The regression model is specified in terms of
changes rather than levels.
Extensions of the Multiple Regression
Model
n In some situations, nonlinear terms may be needed
as independent variables in a regression analysis.
n Business or economic logic may suggest that non-
linearity is expected.
n A graphic display of the data may be helpful in
determining whether non-linearity is present.
n One common economic cause for non-linearity is
diminishing returns.
n For example, the effect of advertising on sales may
diminish as increased advertising is used.
Extensions of the Multiple Regression
Model
n Some common forms of nonlinear functions
are :
Y = b0 + b1(X) + b2(X^2)
Y = b0 + b1(X) + b2(X^2) + b3(X^3)
Y = b0 + b1(1/X)
Y = e^b0 X^b1
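Each of these is still fit by ordinary least squares once the transformed term is included as an extra column. A sketch of the quadratic case with made-up data:

```python
import numpy as np

# Made-up data following Y = 5 + 4X - 0.3X^2 plus noise
# (a diminishing-returns shape).
rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 40)
y = 5 + 4 * x - 0.3 * x**2 + rng.normal(scale=0.5, size=x.size)

# Add an X^2 column and fit by least squares.
X = np.column_stack([np.ones_like(x), x, x**2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(b, 2))   # estimates near (5, 4, -0.3)
```

The reciprocal and exponential forms work the same way: transform the variable (1/X, or take logs of Y = e^b0 X^b1) and fit a linear model to the transformed columns.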
Extensions of the Multiple Regression
Model
n To illustrate the use and interpretation of a
non-linear term, we return to the problem of
developing a forecasting model for private
housing starts (PHS).
n So far we have looked at the following
model
PHS = b0 + b1(MR) + b2(Q2) + b3(Q3) + b4(Q4)
n Where MR is the mortgage rate and Q2, Q3, and Q4 are
indicator variables for quarters 2, 3, and 4.
Example: Private Housing Starts (PHS)
PERIOD PHS MR Q2 Q3 Q4
31-Mar-90 217 10.1202 0 0 0
30-Jun-90 271.3 10.3372 1 0 0
30-Sep-90 233 10.1033 0 1 0
31-Dec-90 173.6 9.9547 0 0 1
31-Mar-91 146.7 9.5008 0 0 0
30-Jun-91 254.1 9.5265 1 0 0
30-Sep-91 239.8 9.2755 0 1 0
31-Dec-91 199.8 8.6882 0 0 1
31-Mar-92 218.5 8.7098 0 0 0
30-Jun-92 296.4 8.6782 1 0 0
30-Sep-92 276.4 8.0085 0 1 0
31-Dec-92 238.8 8.2052 0 0 1
31-Mar-93 213.2 7.7332 0 0 0
30-Jun-93 323.7 7.4515 1 0 0
30-Sep-93 309.3 7.0778 0 1 0
31-Dec-93 279.4 7.0537 0 0 1
31-Mar-94 252.6 7.2958 0 0 0
30-Jun-94 354.2 8.4370 1 0 0
30-Sep-94 325.7 8.5882 0 1 0
31-Dec-94 265.9 9.0977 0 0 1
31-Mar-95 214.2 8.8123 0 0 0
30-Jun-95 296.7 7.9470 1 0 0
30-Sep-95 308.2 7.7012 0 1 0
31-Dec-95 257.2 7.3508 0 0 1
31-Mar-96 240 7.2430 0 0 0
30-Jun-96 344.5 8.1050 1 0 0
30-Sep-96 324 8.1590 0 1 0
31-Dec-96 252.4 7.7102 0 0 1
31-Mar-97 237.8 7.7905 0 0 0
30-Jun-97 324.5 7.9255 1 0 0
30-Sep-97 314.6 7.4692 0 1 0
31-Dec-97 256.8 7.1980 0 0 1
31-Mar-98 258.4 7.0547 0 0 0
30-Jun-98 360.4 7.0938 1 0 0
30-Sep-98 348 6.8657 0 1 0
31-Dec-98 304.6 6.7633 0 0 1
31-Mar-99 294.1 6.8805 0 0 0
30-Jun-99 377.1 7.2037 1 0 0
30-Sep-99 355.6 7.7990 0 1 0
31-Dec-99 308.1 7.8338 0 0 1
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.885398221
R Square 0.78393001
Adjusted R Square 0.759236296
Standard Error 26.4498851
Observations 40
ANOVA
df SS MS F Significance F
Regression 4 88837.93624 22209.48406 31.74613731 3.33637E-11
Residual 35 24485.87476 699.5964217
Total 39 113323.811
Example: Private Housing Starts (PHS)
n To account for and measure this seasonality in a
regression model, we will use three dummy
variables: Q2 for the second quarter, Q3 for the
third quarter, and Q4 for the fourth quarter. These
will be coded as follows:
n Q2 = 1 for all second quarters and zero otherwise.
n Q3 = 1 for all third quarters and zero otherwise
n Q4 = 1 for all fourth quarters and zero otherwise.
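The coding above can be sketched as:

```python
import numpy as np

# Quarter numbers for two years of quarterly data; Q1 is the base quarter.
quarter = np.array([1, 2, 3, 4, 1, 2, 3, 4])
Q2 = (quarter == 2).astype(int)   # 1 for second quarters, 0 otherwise
Q3 = (quarter == 3).astype(int)   # 1 for third quarters, 0 otherwise
Q4 = (quarter == 4).astype(int)   # 1 for fourth quarters, 0 otherwise
print(np.column_stack([quarter, Q2, Q3, Q4]))
```

Every first quarter gets (0, 0, 0), which is why Q1 serves as the base level.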
Example: Private Housing Starts (PHS)
n Data for private housing starts (PHS), the
mortgage rate (MR), and these seasonal indicator
variables are shown in the following slide.
n Examine the data carefully to verify your
understanding of the coding for Q2, Q3, Q4.
n Since we have assigned dummy variables for the
second, third, and fourth quarters, the first quarter
is the base quarter for our regression model.
n Note that any quarter could be used as the base,
with indicator variables to adjust for differences in
other quarters.
Example: Private Housing Start
n First we add real disposable personal
income per capita (DPI) as an independent
variable. Our new model for this data set is:
PHS = b0 + b1(MR) + b2(Q2) + b3(Q3) + b4(Q4) + b5(DPI)
Example: Private Housing Start
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.943791346
R Square 0.890742104
Adjusted R Square 0.874187878
Standard Error 19.05542121
Observations 39
ANOVA
df SS MS F Significance F
Regression 5 97690.01942 19538 53.80753 6.51194E-15
Residual 33 11982.59955 363.1091
Total 38 109672.619
Example: Private Housing Start
n The prediction model is
PHS^ = -31.06 - 20.19(MR) + 97.03(Q2) + 75.40(Q3) + 20.35(Q4) + 0.02(DPI)
Example: Private Housing Start
n The value of the DW test has changed from 0.88
for the previous model to 0.78 for the new model.
n At the 5% level the critical values for the DW test, from
the Durbin-Watson table, for k = 5 and n = 39, are dL =
1.22 and dU = 1.79.
n Since the value of the DW test is smaller than
dL = 1.22, we reject the null hypothesis H0: ρ = 0.
n This implies that there is serial correlation in both
models; the assumption of independent
error terms is not valid.
Example: Private Housing Start
n The plot of PHS against DPI shows a curvilinear relation.
[Scatter plot: Private Housing Start (PHS) versus Disposable Personal Income (DPI)]
n To capture this curvature, a squared term (DPI2) is added to the regression model.
Example: Private Housing Start
n We also add the dependent variable, lagged
one quarter, as an independent variable in
order to help reduce serial correlation.
n The third model that we fit to our data set
is:
PHS = b0 + b1(MR) + b2(Q2) + b3(Q3) + b4(Q4) + b5(DPI) + b6(DPI^2) + b7(LPHS)
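The lagged variable LPHS is just PHS shifted back one quarter; the first observation has no lag, so the estimation sample starts one quarter later (exactly what the data table shows, beginning at 30-Jun-90):

```python
import numpy as np

# LPHS_t = PHS_{t-1}: shift the series back one quarter.
phs = np.array([217.0, 271.3, 233.0, 173.6, 146.7])  # first five quarters of PHS
lphs = phs[:-1]    # lagged values aligned with...
y = phs[1:]        # ...the usable dependent values, t = 2 onward
print(np.column_stack([y, lphs]))
```

Pairing y with lphs this way reproduces the (PHS, LPHS) columns of the data table on the final slide.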
Example: Private Housing Start
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.97778626
R Square 0.956065971
Adjusted R Square 0.946145384
Standard Error 12.46719572
Observations 39
ANOVA
df SS MS F Significance F
Regression 7 104854.2589 14979.17985 96.37191 3.07085E-19
Residual 31 4818.360042 155.4309691
Total 38 109672.619
Example: Private Housing Start
n The inclusion of DPI2 and Lagged PHS has
increased the R-squared to 96%
n The standard error of the estimate has decreased to
12.45
n The value of the DW test has increased to 2.32,
which is greater than dU = 1.79; this rules out
positive serial correlation.
n You see that the third model worked best for this
data set.
n The following slide gives the data set.
Example: Private Housing Start
PERIOD PHS MR LPHS Q2 Q3 Q4 DPI DPI SQUARED
30-Jun-90 271.3 10.3372 217 1 0 0 18063 1,631,359.85
30-Sep-90 233 10.1033 271.3 0 1 0 18031 1,625,584.81
31-Dec-90 173.6 9.9547 233 0 0 1 17856 1,594,183.68
31-Mar-91 146.7 9.5008 173.6 0 0 0 17748 1,574,957.52
30-Jun-91 254.1 9.5265 146.7 1 0 0 17861 1,595,076.61
30-Sep-91 239.8 9.2755 254.1 0 1 0 17816 1,587,049.28
31-Dec-91 199.8 8.6882 239.8 0 0 1 17811 1,586,158.61
31-Mar-92 218.5 8.7098 199.8 0 0 0 18000 1,620,000.00
30-Jun-92 296.4 8.6782 218.5 1 0 0 18085 1,635,336.13
30-Sep-92 276.4 8.0085 296.4 0 1 0 18036 1,626,486.48
31-Dec-92 238.8 8.2052 276.4 0 0 1 18330 1,679,944.50
31-Mar-93 213.2 7.7332 238.8 0 0 0 17975 1,615,503.13
30-Jun-93 323.7 7.4515 213.2 1 0 0 18247 1,664,765.05
30-Sep-93 309.3 7.0778 323.7 0 1 0 18246 1,664,582.58
31-Dec-93 279.4 7.0537 309.3 0 0 1 18413 1,695,192.85
31-Mar-94 252.6 7.2958 279.4 0 0 0 18154 1,647,838.58
30-Jun-94 354.2 8.4370 252.6 1 0 0 18409 1,694,456.41
30-Sep-94 325.7 8.5882 354.2 0 1 0 18493 1,709,955.25
31-Dec-94 265.9 9.0977 325.7 0 0 1 18667 1,742,284.45
31-Mar-95 214.2 8.8123 265.9 0 0 0 18834 1,773,597.78
30-Jun-95 296.7 7.9470 214.2 1 0 0 18798 1,766,824.02
30-Sep-95 308.2 7.7012 296.7 0 1 0 18871 1,780,573.21
31-Dec-95 257.2 7.3508 308.2 0 0 1 18942 1,793,996.82
31-Mar-96 240 7.2430 257.2 0 0 0 19071 1,818,515.21
30-Jun-96 344.5 8.1050 240 1 0 0 19081 1,820,422.81
30-Sep-96 324 8.1590 344.5 0 1 0 19161 1,835,719.61
31-Dec-96 252.4 7.7102 324 0 0 1 19152 1,833,995.52
31-Mar-97 237.8 7.7905 252.4 0 0 0 19331 1,868,437.81
30-Jun-97 324.5 7.9255 237.8 1 0 0 19315 1,865,346.13
30-Sep-97 314.6 7.4692 324.5 0 1 0 19385 1,878,891.13
31-Dec-97 256.8 7.1980 314.6 0 0 1 19478 1,896,962.42
31-Mar-98 258.4 7.0547 256.8 0 0 0 19632 1,927,077.12
30-Jun-98 360.4 7.0938 258.4 1 0 0 19719 1,944,194.81
30-Sep-98 348 6.8657 360.4 0 1 0 19905 1,980,963.41
31-Dec-98 304.6 6.7633 348 0 0 1 20194 2,038,980.00
31-Mar-99 294.1 6.8805 304.6 0 0 0 20377 2,076,010.87
30-Jun-99 377.1 7.2037 294.1 1 0 0 20472 2,095,440.74
30-Sep-99 355.6 7.7990 377.1 0 1 0 20756 2,153,982.23
31-Dec-99 308.1 7.8338 355.6 0 0 1 21124 2,231,020.37