Lecture - 8 MLR
Regression
MMT
Multiple Linear Regression
• Multiple linear regression means that the model is linear in the regression
parameters (the beta values). The following are examples of multiple linear
regression models:
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$$
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_2^2 + \cdots + \beta_k x_k$$
In matrix form,
$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} =
\begin{pmatrix}
1 & x_{11} & x_{21} & \cdots & x_{k1} \\
1 & x_{12} & x_{22} & \cdots & x_{k2} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{1n} & x_{2n} & \cdots & x_{kn}
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix} +
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$
That is, $Y = X\beta + \varepsilon$.
Ordinary Least Squares Estimation for Multiple Linear Regression
The variance of the residuals, Var(εᵢ|Xᵢ), is constant for all values of Xᵢ.
When the variance of the residuals is constant for different values of Xᵢ, it
is called homoscedasticity. A non-constant variance of residuals is called
heteroscedasticity.
The hat matrix plays a crucial role in identifying outliers and influential
observations in the sample.
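Below is a minimal numpy sketch (with made-up numbers) of OLS estimation via the normal equation and of the hat matrix H = X(XᵀX)⁻¹Xᵀ, whose diagonal elements are the leverage values used to flag influential observations:

```python
import numpy as np

# Hypothetical design matrix: column of 1s for the intercept plus two variables
X = np.array([[1, 2.0, 3.0],
              [1, 1.0, 5.0],
              [1, 4.0, 2.0],
              [1, 3.0, 6.0],
              [1, 5.0, 4.0]])
Y = np.array([10.0, 12.0, 9.0, 15.0, 14.0])

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y      # OLS estimate: beta_hat = (X'X)^{-1} X'Y

H = X @ XtX_inv @ X.T             # hat matrix: Y_hat = H Y
leverage = np.diag(H)             # large h_ii values flag influential observations
Y_hat = H @ Y
residuals = Y - Y_hat
```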
Multiple Linear Regression Model Building
A few examples of MLR are as follows:
Sample records with the dummy-variable coding and the corresponding salary:

S. No.  Category  HS  UG  PG  Salary
1       1         1   0   0   9800
11      2         0   1   0   17200
19      3         0   0   1   18500
27      4         0   0   0   7700
The corresponding regression model is as follows:
Y = β₀ + β₁ × HS + β₂ × UG + β₃ × PG
Note that in Table 10.4, all the dummy variables are statistically significant
at α = 0.01, since the p-values are less than 0.01.
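A short statsmodels sketch of a dummy-variable regression of this form; the data and the column names (Salary, HS, UG, PG) are hypothetical and only mimic the coding illustrated above:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical records: a categorical variable coded through dummies HS, UG, PG
df = pd.DataFrame({
    "Salary": [9800, 17200, 18500, 7700, 16900, 19100, 10400, 8100],
    "HS":     [1, 0, 0, 0, 0, 0, 1, 0],
    "UG":     [0, 1, 0, 0, 1, 0, 0, 0],
    "PG":     [0, 0, 1, 0, 0, 1, 0, 0],
})

# Y = beta0 + beta1*HS + beta2*UG + beta3*PG (the all-zero rows are the base category)
model = smf.ols("Salary ~ HS + UG + PG", data=df).fit()
print(model.summary())   # coefficient table with t-values and p-values
```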
Interaction Variables in Regression Models
That is, the change in salary for female workers when WE increases by one year
is 609.639, while for male workers it is 3523.547; the salary of male workers
increases at a higher rate than that of female workers. Interaction variables
are an important class of derived variables in regression model building.
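A sketch of fitting such an interaction between work experience and gender in statsmodels; the column names (Salary, WE, Female) and the data are assumptions for illustration only:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: WE = work experience in years, Female = 1 for female workers
df = pd.DataFrame({
    "Salary": [12000, 15500, 21000, 14000, 30000, 38000, 17500, 45000],
    "WE":     [1, 3, 6, 2, 8, 10, 4, 12],
    "Female": [1, 1, 1, 1, 0, 0, 0, 0],
})

# "WE * Female" expands to WE + Female + WE:Female, so the WE slope differs by group
model = smf.ols("Salary ~ WE * Female", data=df).fit()
print(model.params)   # slope for males = WE; slope for females = WE + WE:Female
```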
Validation of Multiple Regression Model
$$R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2}{\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2}$$
• SSE is the sum of squares of errors and SST is the sum of squares of total
deviation. In the case of MLR, SSE decreases as the number of explanatory
variables increases, while SST remains constant.
$$\text{Adjusted } R^2 = 1 - \frac{SSE/(n-k-1)}{SST/(n-1)}$$
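A small helper (a sketch, independent of any particular dataset) that computes R-square and adjusted R-square directly from these formulas:

```python
import numpy as np

def r2_and_adjusted_r2(y, y_hat, k):
    """R-square and adjusted R-square for an MLR model with k explanatory variables."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    n = len(y)
    sse = np.sum((y - y_hat) ** 2)        # sum of squares of errors
    sst = np.sum((y - y.mean()) ** 2)     # sum of squares of total deviation
    r2 = 1 - sse / sst
    adj_r2 = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
    return r2, adj_r2
```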
Statistical Significance of Individual Variables in MLR – t-test
The OLS estimate of the regression parameters is given by
$$\hat{\beta} = (X^T X)^{-1} X^T Y$$
The statistical significance of an individual regression coefficient βᵢ is tested using the following null and alternative hypotheses:
• H0: βᵢ = 0
• HA: βᵢ ≠ 0
The corresponding test statistic is given by
$$t = \frac{\hat{\beta}_i - 0}{S_e(\hat{\beta}_i)} = \frac{\hat{\beta}_i}{S_e(\hat{\beta}_i)}$$
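A brief sketch with simulated data showing that the reported t-values are simply the coefficient estimates divided by their standard errors:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(30, 2)))          # intercept + 2 explanatory variables
y = X @ np.array([5.0, 2.0, 0.0]) + rng.normal(size=30)

fit = sm.OLS(y, X).fit()
print(fit.params / fit.bse)   # t_i = beta_hat_i / Se(beta_hat_i), same as fit.tvalues
print(fit.pvalues)            # p-values for H0: beta_i = 0 vs HA: beta_i != 0
```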
Validation of Overall Regression Model – F-test
Analysis of Variance (ANOVA) is used to validate the overall
regression model. If there are k independent variables in the
model, then the null and the alternative hypotheses are,
respectively, given by
H0: β₁ = β₂ = β₃ = … = βₖ = 0
H1: Not all βs are zero, i.e., at least one βᵢ is not zero.
F = MSR/MSE
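A sketch, again with simulated data, that computes F = MSR/MSE explicitly; the result matches the F-value reported by statsmodels:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, k = 40, 3
X = sm.add_constant(rng.normal(size=(n, k)))
y = X @ np.array([4.0, 1.5, -2.0, 0.0]) + rng.normal(size=n)
fit = sm.OLS(y, X).fit()

msr = fit.ess / k              # regression (explained) mean square
mse = fit.ssr / (n - k - 1)    # error mean square (fit.ssr = sum of squared residuals)
F = msr / mse                  # equals fit.fvalue; p-value available as fit.f_pvalue
```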
Validation of Portions of a MLR Model – Partial F-test
The objective of the partial F-test is to check whether the additional
variables (Xr+1, Xr+2, …, Xk) in the full model are statistically significant.
The corresponding partial F-test has the following null and
alternative hypotheses:
• H0: βr+1 = βr+2 = … = βk = 0
• H1: Not all of βr+1, βr+2, …, βk are zero
• The partial F-test statistic is given by
$$\text{Partial } F = \frac{(SSE_R - SSE_F)/(k - r)}{MSE_F}$$
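A sketch of the partial F-test using statsmodels' anova_lm to compare a reduced (restricted) model with the full model; the variable names x1–x4 and the data are made up:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(2)
df = pd.DataFrame(rng.normal(size=(50, 4)), columns=["x1", "x2", "x3", "x4"])
df["y"] = 3 + 2 * df["x1"] - df["x2"] + rng.normal(size=50)

reduced = smf.ols("y ~ x1 + x2", data=df).fit()           # X_1 ... X_r only
full = smf.ols("y ~ x1 + x2 + x3 + x4", data=df).fit()    # all k variables

# The comparison row reports [(SSE_R - SSE_F)/(k - r)] / MSE_F and its p-value
print(anova_lm(reduced, full))
```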
Residual Analysis in Multiple Linear Regression
Residual analysis is important for checking assumptions about
normal distribution of residuals, homoscedasticity, and the
functional form of a regression model.
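A sketch (simulated data) of two common residual checks: residuals plotted against fitted values for homoscedasticity and functional form, and a normal Q-Q plot for normality of residuals:

```python
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = sm.add_constant(rng.normal(size=(60, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=60)
fit = sm.OLS(y, X).fit()

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(fit.fittedvalues, fit.resid)            # random scatter supports homoscedasticity
ax1.set_xlabel("Fitted values"); ax1.set_ylabel("Residuals")
stats.probplot(fit.resid, dist="norm", plot=ax2)    # points near the line support normality
plt.show()
```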
Multi-Collinearity and Variance Inflation Factor
Multi-collinearity can have the following impact on the model:
• The standard error of the estimate of a regression coefficient may be
inflated, which can result in retaining the null hypothesis in the t-test and
thus in rejecting a statistically significant explanatory variable from the
model.
• The t-statistic value is $t = \hat{\beta}_i / S_e(\hat{\beta}_i)$. If
$S_e(\hat{\beta}_i)$ is inflated, then the t-value will be underestimated,
resulting in a high p-value that may lead to failing to reject the null
hypothesis.
• Thus, it is possible that a statistically significant explanatory
variable may be labelled as statistically insignificant due to the
presence of multi-collinearity.
Impact of Multicollinearity
$$VIF = \frac{1}{1 - R_1^2}$$
where R₁² is the R-square value obtained by regressing X₁ on the remaining explanatory variables. Since the standard error is inflated by a factor of √VIF, the actual t-statistic is
$$t_{actual} = \frac{\hat{\beta}_1}{S_e(\hat{\beta}_1)} \sqrt{VIF}$$
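A sketch computing VIF with statsmodels' variance_inflation_factor on two deliberately near-collinear variables (hypothetical data):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(4)
x1 = rng.normal(size=100)
x2 = x1 + 0.1 * rng.normal(size=100)              # nearly collinear with x1
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2}))

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing X_j on the other variables
vif = {col: variance_inflation_factor(X.values, j) for j, col in enumerate(X.columns)}
print(vif)                                        # large VIFs flag multi-collinearity
```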
Remedies for Handling Multi-Collinearity
Mahalanobis Distance
Cook’s Distance
Leverage Values
$$D_M(X_i) = \sqrt{(X_i - \mu)^T S^{-1} (X_i - \mu)}$$
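A sketch computing the Mahalanobis distance of each observation from the sample mean using scipy; the data are made up and S is taken to be the sample covariance matrix:

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

X = np.array([[2.0, 3.0], [1.0, 5.0], [4.0, 2.0],
              [3.0, 6.0], [5.0, 4.0], [10.0, 12.0]])   # last row is a likely outlier
mu = X.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X, rowvar=False))         # inverse sample covariance matrix

# D_M(X_i) = sqrt((X_i - mu)' S^{-1} (X_i - mu)); large values flag outliers
d = np.array([mahalanobis(x, mu, S_inv) for x in X])
```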
Cook’s Distance
• Cook's distance (Cook, 1977) measures the change in the regression
parameters, and thus how much the predicted values of the dependent variable
(for all observations in the sample) change, when a particular observation is
excluded from the sample used for estimating the regression parameters.
$$D_i = \frac{\sum_{j=1}^{n}\left(\hat{Y}_j - \hat{Y}_{j(i)}\right)^2}{(k+1)\, MSE}$$
DFFIT and SDFFIT
DFFIT measures the difference in the fitted value of an
observation when that particular observation is removed from
the model building. DFFIT is given by
$$DFFIT = \hat{y}_i - \hat{y}_{i(i)}$$
where ŷᵢ is the predicted value of the i-th observation when all observations
(including the i-th) are used for model building, and ŷᵢ₍ᵢ₎ is the predicted
value of the i-th observation when the i-th observation is excluded from model
building.
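A sketch (simulated data) that pulls Cook's distance, DFFITS, and leverage values from statsmodels' influence diagnostics:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
X = sm.add_constant(rng.normal(size=(30, 2)))
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(size=30)
fit = sm.OLS(y, X).fit()

influence = fit.get_influence()
cooks_d, _ = influence.cooks_distance      # Cook's distance D_i for every observation
dffits, _ = influence.dffits               # studentized DFFITS values
leverage = influence.hat_matrix_diag       # h_ii, the diagonal of the hat matrix
```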
Mallows's Cp, used for comparing subset models, is given by
$$C_p = \frac{SSE_p}{MSE_{full}} - (n - 2p)$$
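A small helper for Mallows's Cp, assuming SSE_p (for a subset model with p parameters) and MSE_full (for the full model) have already been computed:

```python
def mallows_cp(sse_p, mse_full, n, p):
    """Mallows's Cp = SSE_p / MSE_full - (n - 2p); values close to p suggest a good subset."""
    return sse_p / mse_full - (n - 2 * p)
```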
[Model Summary and Coefficients tables: unstandardized coefficients, standardized coefficients, t, Sig.]