Assumptions of multiple linear regression
 The same as those made for simple linear regression, plus
 the assumption of no exact collinearity among the explanatory variables
 High collinearity among the explanatory variables is called multicollinearity
 Example: The table below has the OLS estimated results for three models of deaths due to heart disease (n = 34; coefficients with t-statistics in parentheses):

Variable          Model A (R2 = .731)   Model B (R2 = .723)   Model C (R2 = .712)
Constant           226.00  (1.54)        247.00  (1.94)        139.68  (1.79)
Calcium (b1)       -69.98  (-0.89)       -77.76  (-1.06)
Cigarettes (b2)     10.12  (2.00)         10.64  (2.32)         10.71  (2.33)
Unemployed (b3)     -0.613 (-0.39)
Edible fats (b4)     2.81  (1.88)          2.73  (2.40)          3.38  (3.50)
Meat (b5)            0.11  (0.46)
Spirits (b6)        21.72  (2.57)         23.65  (3.11)         26.75  (3.80)
Beer (b7)           -3.47  (-2.67)        -3.85  (-4.27)        -4.13  (-4.79)
Wine (b8)           -4.56  (-0.28)
Explain why each of the independent variables might affect the death rate due to Coronary Heart Disease (CHD)
 Calcium: improves bone structure; it is not clear whether, or in what direction, it affects CHD
 Cigarettes: expect a positive sign for the coefficient of CIG
 Unemployment: expect a positive sign
 Fats: expect a positive sign
 Meat: expect a positive sign
 Alcohol: because heavy drinkers have a much higher incidence of heart disease, we expect positive signs for SPIRITS, BEER, and WINE
 Some of the observed signs differ from these expectations. One reason may be multicollinearity.
Estimate the fit of each model. Would you say that they are good? Why?
 adj-R2 = 1 – (1 – R2)×[(n-1)/(n-(k+1))]
where k is the number of explanatory variables in the model
 Model A: adj-R2 = 1 – (1 – R2)×[(n-1)/(n-(k+1))]
= 1 – (1 – .731)×[(34-1)/(34-9)] = 0.645
 Model B: adj-R2 = … = 0.674
 Model C: adj-R2 = … = 0.672
 Considering that the data are time series, we would expect a higher R2.
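A minimal Python sketch (added here, not part of the original slides) that reproduces the adjusted R2 figures above; the R2 values and the numbers of explanatory variables (k = 8, 5, and 4 for Models A, B, and C) are taken from the table.

```python
def adj_r2(r2, n, k):
    """adj-R2 = 1 - (1 - R2) * (n - 1) / (n - (k + 1))"""
    return 1 - (1 - r2) * (n - 1) / (n - (k + 1))

n = 34
for name, r2, k in [("A", 0.731, 8), ("B", 0.723, 5), ("C", 0.712, 4)]:
    print(name, round(adj_r2(r2, n, k), 3))
# prints 0.645, 0.674, 0.672, matching the values above
```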
In recent years, several criteria for choosing among models have been proposed.
• All of these take the form of the RSS = (SER)2×[n-(k+1)] multiplied by a penalty factor that depends on the number of parameters.
• A model with a lower value of a criterion statistic is judged to be preferable.
The most popular among the Model Selection Criteria are:
• Akaike's Information Criterion: AIC = (RSS/n)×exp[2(k+1)/n]
• Hannan-Quinn (HQ) criterion: HQ = (RSS/n)×[ln(n)]^[2(k+1)/n]
• Schwarz Bayesian Information Criterion (BIC or SC or SBC): SC = (RSS/n)×n^[(k+1)/n]
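A short Python sketch (added for illustration) of the three criteria in the multiplicative forms quoted above; rss, n, and k are assumed to be supplied by the user.

```python
import math

def selection_criteria(rss, n, k):
    """rss = residual sum of squares, n = sample size, k = # of explanatory variables."""
    p = k + 1                                     # parameters including the intercept
    aic = (rss / n) * math.exp(2 * p / n)         # Akaike
    hq = (rss / n) * math.log(n) ** (2 * p / n)   # Hannan-Quinn
    sc = (rss / n) * n ** (p / n)                 # Schwarz (BIC/SC/SBC)
    return aic, hq, sc
```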
• Ideally, we would like a model to have lower values for all these statistics than an alternative model.
• Unfortunately, it is possible for a model to be superior under one criterion and inferior under another.
• The AIC is commonly used in time series analysis.
Test the overall significance of the model
Test statistic: F = (R2/k)/[(1-R2)/(n-(k+1))]
• Model A: [F has k and n-(k+1) d.f.]
F = (.731/8)/[(1-.731)/(34-9)] = 8.49
Fcr,8,25,5% = 2.34, hence reject at the 5% level H0: β1 = β2 = … = β8 = 0 (note: H0 does not include β0) and accept H1: at least one of the regression coefficients is nonzero.
• Model B: F = … = 14.63. Perform the F test.
• Model C: F = … = 17.93. Perform the F test.
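As a check, a short Python sketch (not part of the original slides) computing the overall F statistic and its 5% critical value for each model; the R2 and k values come from the table above, and small differences from the quoted 8.49, 14.63, and 17.93 are due to rounding.

```python
from scipy.stats import f

n = 34
for name, r2, k in [("A", 0.731, 8), ("B", 0.723, 5), ("C", 0.712, 4)]:
    F = (r2 / k) / ((1 - r2) / (n - (k + 1)))   # F = (R2/k)/[(1-R2)/(n-(k+1))]
    F_crit = f.ppf(0.95, k, n - (k + 1))        # 5% critical value
    print(name, round(F, 2), round(F_crit, 2),
          "reject H0" if F > F_crit else "do not reject H0")
```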

Tests of significance of parameter estimates (use the 2-sided test and α = 5%)
 Model A: We use the t-test. The d.f. are n-(k+1) = 34 – 9 = 25. The critical value is tcr,25,2.5% = 2.06.
 |t| > 2.06 only for SPIRITS and BEER.
 Repeat the same analysis for Models B and C.
 Model A (continued)
The 95% Confidence Interval for SPIRITS (β6):
Since t6 = b6/SE(b6), SE(b6) = b6/t6 = 21.72/2.57 = 8.45.
Lower limit = 21.72 – 2.06×SE(b6) = 4.31; upper limit = 21.72 + 2.06×SE(b6) = 39.13.
Repeat the same calculations for BEER.
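A minimal Python sketch (added for illustration) of this calculation, recovering SE(b) from the reported coefficient and t-statistic and forming the 95% interval; the BEER values are read from the Model A column of the table.

```python
from scipy.stats import t

n, k = 34, 8
t_crit = t.ppf(0.975, n - (k + 1))          # two-sided 5% critical value, 25 d.f. (about 2.06)
for name, b, t_stat in [("SPIRITS", 21.72, 2.57), ("BEER", -3.47, -2.67)]:
    se = b / t_stat                         # SE(b) = b / t-statistic
    print(name, round(b - t_crit * se, 2), round(b + t_crit * se, 2))
# SPIRITS: about (4.31, 39.13), matching the interval above
```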
• Model A: since only SPIRITS and BEER are significant, the other variables are candidates to be dropped. However, do not drop them all at once.
 You may want to keep the variables with, for example, |t| > 0.5. Thus, only UNEMP (β3), MEAT (β5), and WINE (β8) will be excluded.
• This results in Model B.
Perform the relevant Wald F-test for excluding the variables from Model A to obtain Model B. State the relevant hypotheses H0 and H1, and the d.f. of the test statistic.
• H0: β3 = β5 = β8 = 0
• H1: H0 is not true
The Wald F-test:
Model A: unrestricted, with RU2 = .731
Model B: restricted, with RR2 = .723
Then F = N/D has an F[r, n-(k+1)] distribution, where
• Numerator N = (RU2 – RR2)/r, where r = the number of restricted parameters in H0
• Denominator D = (1 – RU2)/[n-(k+1)], where n = the sample size and k = the number of explanatory variables (X's) in the unrestricted model
• N = (.731 – .723)/3 = 0.0027
• D = (1 – .731)/(34 – 9) = 0.01076
• F = 0.2509; Fcr,3,25,5% = 2.99
Interpretation:
• Because F < Fcr, these three regression coefficients (for UNEMP, MEAT, and WINE) are not jointly significant at the 5% level. This suggests that all three of them can be dropped.
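The same Wald F computation as a short Python sketch (added for illustration; the small difference from the 0.2509 above comes from rounding N before dividing).

```python
from scipy.stats import f

r2_u, r2_r = 0.731, 0.723      # unrestricted (Model A) and restricted (Model B) R2
n, k, r = 34, 8, 3             # k = X's in Model A, r = # of restrictions in H0
F = ((r2_u - r2_r) / r) / ((1 - r2_u) / (n - (k + 1)))
F_crit = f.ppf(0.95, r, n - (k + 1))
print(round(F, 3), round(F_crit, 2), F < F_crit)   # about 0.248, 2.99, True -> cannot reject H0
```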
• In Model B, the only insignificant coefficient is the one for CALCIUM (check it out); therefore, this variable will also be removed from the model (giving Model C).
• Which model (A, B, or C) is the best?
• According to adj-R2, Model B is "the best"
 However, Model B has one insignificant coefficient (CALCIUM)
 Model C has an adj-R2 very close to that of B and no insignificant coefficients (except the constant term, i.e., the intercept)
 Because the intercept captures the average effect of omitted variables, it should always be retained
MULTICOLLINEARITY
• The data for explanatory variables often move together
• For example, population and GDP are highly correlated with each other
• Under exact (perfect) multicollinearity, it is not possible to estimate the regression coefficients
 Regression programs give an error message, e.g., "Matrix singular" or "Exact collinearity encountered"
 When this happens, one or more variables should be dropped from the model
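A tiny Python illustration (hypothetical data, not from the slides) of why exact collinearity breaks estimation: with x2 = 2·x1 the design matrix loses rank, so X'X is singular and the normal equations have no unique solution.

```python
import numpy as np

x1 = np.arange(10, dtype=float)
X = np.column_stack([np.ones(10), x1, 2 * x1])   # constant, x1, and x2 = 2*x1
print(np.linalg.matrix_rank(X))                  # 2, not 3: exact collinearity
# np.linalg.inv(X.T @ X) would raise LinAlgError: Singular matrix
```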
 When the explanatory variables have an approximate (not exact) linear relationship, the problem of multicollinearity arises
Consequences of ignoring multicollinearity
 The OLS estimators are still BLUE and consistent
 Forecasts are still unbiased and confidence intervals are valid
 Although the standard errors and t-statistics of the regression coefficients are numerically affected, tests based on them are still valid
Then, why do we care about multicollinearity?
 The standard errors SE(b) of the regression parameters b are usually higher when multicollinearity is present, making the t-statistics b/SE(b) lower and often insignificant
 Interpretation of the estimates is more difficult
 When two explanatory variables move closely together, we cannot simply hold one constant and interpret the other, because when the latter is changed, the former changes too.
 Identifying multicollinearity
1. High R2 with low t-statistics
2. High pairwise correlations among explanatory variables
 However, multicollinearity may be present even though the pairwise correlations are not high. This is because three or more variables may be nearly linearly related.
3. From auxiliary regressions Xi = f(other X's): if Ri2 > R2, then multicollinearity is a problem (see the sketch after this list)
4. Formal tests for multicollinearity: since these tests are quite controversial, we do not discuss them here
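A minimal Python sketch (added here, not in the original slides) of the auxiliary-regression check in item 3, assuming statsmodels is available and X is an n×k array of the explanatory variables; each returned Ri2 would be compared with the R2 of the main regression.

```python
import numpy as np
import statsmodels.api as sm

def auxiliary_r2(X):
    """Regress each explanatory variable on all the others and return the Ri2 values."""
    r2s = []
    for i in range(X.shape[1]):
        others = sm.add_constant(np.delete(X, i, axis=1))   # remaining X's plus a constant
        r2s.append(sm.OLS(X[:, i], others).fit().rsquared)
    return r2s
```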
 Dealing with multicollinearity
1. Ignore it (if you are more interested in forecasting than in interpreting coefficients)
2. Collect more data
3. Transform the functional relation, e.g., express the variables in per capita terms rather than keeping population size as an explanatory variable
4. Drop one or more of the highly correlated variables, starting with the one with the lowest t-statistic (Caution: do not drop too many, or the model may become theoretically unfounded)
5. Try to keep those insignificant variables that have a |t|-statistic of at least 1 or a p-value < 0.25