
Statistical Inference in the Multiple Regression Model

 Statistical inference is the estimation of the characteristics or properties of a population, derived from the analysis of a sample drawn from it.
 Statistical inference means drawing conclusions based on data.
 There are many contexts in which inference is desirable, and there are many approaches to performing inference.

 It is a series of procedures in which the data obtained from samples are used to make statements about some broader set of circumstances.
 Statistical inferences are used to project the data from the sample to the entire population.
 Random errors typical of sampling may mean that the sample is not an accurate enough model of the population.
 In fact, the sample is never a 100% accurate model of the population, but only a more or less distorted variant of it.
 In order to correct these distortions and, therefore, to draw more accurate conclusions about the population, certain assumptions should be satisfied.
Classical Linear Regression Model Assumptions
and Diagnostic Tests
 In carrying out any statistical analysis it is always important to consider the assumptions under which that analysis was carried out, and to assess which of those assumptions are satisfied for the data at hand.
 Analysts need to be able to diagnose violations of the assumptions, understand the consequences of the violations and know the remedial actions to take.
 These assumptions should be studied, looking in particular at the following questions:
 How can violations of the assumptions be detected?
 What are the most likely causes of the violations in practice?
 What are the consequences for the model if an assumption is violated?
 The most critical violations concern:
Normality
Heteroskedasticity
Serial Correlation
Multicollinearity
Testing for departures from Normality
 There are few consequences associated with a violation of the normality assumption, as it does not contribute to bias or inefficiency in regression models.
 It is only important for the calculation of p-values for significance testing, but this is only a consideration when the sample size is very small.
 When the sample size is sufficiently large (>30), the normality assumption is not needed at all, as the Central Limit Theorem ensures that the sampling distribution of the estimated coefficients (and hence of the test statistics) will approximate normality.
 When dealing with very small samples, it is important to check for a possible violation of the normality assumption.
 This can be accomplished through an inspection of the residuals from the regression model (some programs will perform this automatically, while others require that you save the residuals as a new variable and examine them using summary statistics and histograms).
 There are several statistics available to examine the normality of variables, such as skewness and kurtosis, as well as numerous graphical depictions such as the normal probability plot.
 The results of the Bera-Jarque normality test can be used to test for normality.
 One way to improve the chances of error normality is to use dummy variables or some other method to effectively remove the outlying observations.
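As an illustration (not part of the original slides), here is a minimal Python sketch of the Bera-Jarque test applied to regression residuals, assuming the statsmodels and scipy libraries; the data and variable names are simulated and purely hypothetical.

```python
# Minimal sketch (assumed setup): Bera-Jarque normality check on OLS residuals.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(size=25)                      # deliberately small sample
y = 1.0 + 2.0 * x + rng.normal(size=25)      # hypothetical linear relationship

model = sm.OLS(y, sm.add_constant(x)).fit()
jb_stat, jb_pvalue = stats.jarque_bera(model.resid)
print(f"Jarque-Bera = {jb_stat:.3f}, p-value = {jb_pvalue:.3f}")
# A p-value below 0.05 would suggest the residuals depart from normality.
```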
Heteroskedasticity

 It is where the standard deviations of a variable (here, the regression error), monitored over a specific amount of time, are non-constant.
 Hetero (different or unequal) is the opposite of homo (same or equal).
Homoskedasticity = equal spread
Heteroskedasticity = unequal spread

 Errors in financial data are often heteroskedastic – the variance of the errors differs across observations.
 Diagrammatically, in the two-variable regression model, homoskedasticity can be shown as follows:
 Fig A (homoskedastic errors)
 Fig B (heteroskedastic errors)
 Figure A shows that the conditional variance of Yi (which is equal to that of ui), conditional upon the given Xi, remains the same regardless of the values taken by the variable X.
 In contrast, consider Figure B, which shows that the conditional variance of Yi increases as X increases. Here, the variances of Yi are not the same. Hence, there is heteroskedasticity.
 Assumption 5 of the LRM states that the disturbances should have a constant (equal) variance, independent of t:
Var(u_t) = σ²
 Note: in Figure B, as X increases, the variance of the error term increases (the “goodness of fit” gets worse).
Homoskedasticity: the error has constant variance.
Heteroskedasticity: the spread of the error depends on X.
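As an illustration (not part of the original slides), a minimal Python sketch that simulates homoskedastic and heteroskedastic errors, so the idea that the error spread can depend on X is visible numerically; the data are simulated and purely hypothetical.

```python
# Minimal sketch (assumed setup): error spread constant vs growing with X.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(1, 10, 200)

u_homo = rng.normal(scale=1.0, size=x.size)        # constant variance
u_hetero = rng.normal(scale=0.5 * x, size=x.size)  # spread grows with X

print(f"homoskedastic:   std(low X) = {u_homo[:100].std():.2f}, "
      f"std(high X) = {u_homo[100:].std():.2f}")
print(f"heteroskedastic: std(low X) = {u_hetero[:100].std():.2f}, "
      f"std(high X) = {u_hetero[100:].std():.2f}")
```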
Another form of Heteroskedasticity
Types of Heteroskedasticity
 Unconditional Heteroskedasticity – occurs when the heteroskedasticity of the error variance is not correlated with the independent variables in the multiple regression.
 Conditional Heteroskedasticity – occurs when the error variance is correlated with (conditional on) the values of the independent variables in the regression.
Consequences of Heteroskedasticity
 Heteroskedasticity can lead to mistakes in inference.
 When errors are heteroskedastic, the F-test for the overall significance of the regression is unreliable.
 Furthermore, t-tests for the significance of individual regression coefficients are unreliable, because heteroskedasticity introduces bias into the estimators of the standard errors of the regression coefficients.
 If a regression shows significant heteroskedasticity, the standard errors and test statistics computed by regression programs will be incorrect unless they are adjusted for heteroskedasticity.
 In regressions with financial data, the most
likely result of heteroskedasticity is that the
estimated standard errors will be
underestimated and the t-statistics will be
inflated.

Hypothesis Testing:
H0: Error variances are equal or constant
H1: Error variances are not equal
Detecting Heteroskedasticity
 There are two ways in general.
 The first is the informal way which is done
through graphs and therefore we call it the
graphical method.
 The second is through formal tests for
heteroskedasticity, like the following ones:

1. The Breusch-Pagan LM Test
2. The Glejser LM Test
3. The Harvey-Godfrey LM Test
4. The Park LM Test
5. The Goldfeld-Quandt Test
6. White’s Test
The Breusch-Pagan LM Test

 The Breusch-Pagan / Cook-Weisberg test tests the null hypothesis that the error variances are all equal versus the alternative that the error variances are a multiplicative function of one or more variables.
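As an illustration (not part of the original slides), a minimal Python sketch of the Breusch-Pagan test, assuming the statsmodels library and simulated data with hypothetical variable names.

```python
# Minimal sketch (assumed setup): Breusch-Pagan LM test on a fitted OLS model.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(1)
x = rng.uniform(1, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3 * x)   # error spread grows with x

X = sm.add_constant(x)
model = sm.OLS(y, X).fit()

lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(model.resid, X)
print(f"Breusch-Pagan LM = {lm_stat:.2f}, p-value = {lm_pvalue:.4f}")
# p-value < 0.05 -> reject H0 of equal (constant) error variances.
```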
Interpretation of the Formal tests

 If the p-value of the Breusch-Pagan test above is less than alpha (of, say, 5%), we conclude that there is a substantial amount of heteroskedasticity in the model.
 If p-value < 0.05: we reject H0.
Correcting for Heteroskedasticity
 Financial analysts need to know how to correct for heteroskedasticity because such a correction may reverse the conclusions about a particular hypothesis test, and thus affect a particular investment decision.
 Two methods are usually used to correct for the effects of conditional heteroskedasticity in linear regression models.
Robust Standard Errors
 Robust Standard Errors - corrects the standard errors of the linear regression model’s estimated coefficients to account for conditional heteroskedasticity.
 Generalised Least Squares - modifies the original equations in an attempt to eliminate the heteroskedasticity.
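As an illustration (not part of the original slides), a minimal Python sketch of heteroskedasticity-robust (White-type) standard errors in statsmodels; the data are simulated and the HC1 correction is one common choice among several.

```python
# Minimal sketch (assumed setup): ordinary vs robust standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3 * x)   # heteroskedastic errors

X = sm.add_constant(x)
ols_fit = sm.OLS(y, X).fit()                    # ordinary standard errors
robust_fit = sm.OLS(y, X).fit(cov_type="HC1")   # robust (White/HC1) standard errors

print("ordinary SEs:", ols_fit.bse.round(4))
print("robust SEs:  ", robust_fit.bse.round(4))
# The coefficient estimates are identical; only the standard errors
# (and hence the t-statistics) change.
```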
Serial Correlation
 One of the assumptions of both simple and
multiple regression analysis is that the error
terms are independent from one another –
they are uncorrelated.
 If this assumption is violated, although the
estimated regression model can still be of
some value for prediction, its usefulness is
greatly compromised.
Consequences of Serial Correlation

 Serial correlation causes OLS to underestimate the variances (and standard errors) of the coefficients.
 Intuitively, serial correlation increases the apparent fit of the model.
 Hence the estimated variances and standard errors are too low. This can lead the researcher to conclude that a relationship exists when in fact the variables in question are unrelated.
 Hence the t-stats and F-stats cannot be relied upon for statistical inference.
Testing for Serial Correlation
 The Durbin-Watson test is a well-known formal method of testing whether serial correlation is a serious problem undermining the model’s inferential suitability.
 The test statistic of the Durbin-Watson procedure is calculated as follows:
dw = Σ(et – et-1)² / Σet²
 Recall that et represents the observed error term (residual), et = Yt – Ŷt = Yt – a – bXt.
 The value of dw will be between zero and four, zero corresponding to perfect positive correlation and four to perfect negative correlation.
 If the error terms, et and et-1, are uncorrelated,
the expected value of dw = 2
 The further dw is below 2, the stronger the evidence for the existence of positive first-order serial correlation, and vice versa.
 The critical values of dw for a given level of
significance, sample size and number of
independent variables are tabulated as pairs
of values: DL and DU
 The formal test of positive first order serial
correlation is as follows:
Ho : r = 0 (no serial correlation)
H1 : r > 0 (positive serial correlation)
 If dw < DL, reject Ho.
 If dw > DU, do not reject Ho.
 If DL ≤ dw ≤ DU, the test is inconclusive.
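As an illustration (not part of the original slides), a minimal Python sketch of computing the Durbin-Watson statistic from OLS residuals, assuming the statsmodels library; the data are simulated and purely hypothetical.

```python
# Minimal sketch (assumed setup): dw = sum((e_t - e_{t-1})^2) / sum(e_t^2).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(3)
x = np.arange(50, dtype=float)
y = 5.0 + 0.8 * x + rng.normal(size=50)          # hypothetical data

resid = sm.OLS(y, sm.add_constant(x)).fit().resid
dw_manual = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
print(f"manual dw = {dw_manual:.3f}, statsmodels dw = {durbin_watson(resid):.3f}")
# Values near 2 suggest no first-order serial correlation.
```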
Example
Let’s look at this model: Y = A + BX + e, where Y is the sales revenue and X is the advertising budget. The successive pairs of Y and X values are observed monthly; therefore, there is concern that first-order serial correlation may exist. To test for positive serial correlation we calculate the error terms, et. The errors (residuals) are given below.
 Notice that the denominator of the Durbin-Watson statistic is SSE = Σet² = 900,722.1.
 The sum of the last column, which gives the numerator Σ(et – et-1)², is 1,982,184.
 We can now compute the Durbin-Watson statistic as dw = 1,982,184/900,722.1 = 2.200661.
 For n = 12, k = 1 and α = .05, DL = .97 and DU = 1.33. Since dw > DU, do not reject the null hypothesis; that means there is no significant positive serial correlation. You may wish to formally test for the existence of negative serial correlation.
Example
Year Quarter t Company sales (Y) Industry sales (X)
1983 1 1 20.96 127.3
2 2 21.4 130
3 3 21.96 132.7
4 4 21.52 129.4
1984 1 5 22.39 135
2 6 22.76 137.1
3 7 23.48 141.2
4 8 23.66 142.8
1985 1 9 24.1 145.5
2 10 24.01 145.3
3 11 24.54 148.3
4 12 24.3 146.4
1986 1 13 25 150.2
2 14 25.64 153.1
3 15 26.36 157.3
4 16 26.98 160.7
1987 1 17 27.52 164.2
2 18 27.78 165.6
3 19 28.24 168.7
4 20 28.78 171.7
Year Quarter t Company sales(y) Industry sales(x) et et-et-1 (et-et-1)^2 et^2
1983 1 1 20.96 127.3 -0.02605 0.000679
2 2 21.4 130 -0.06202 -0.03596 0.001293 0.003846
3 3 21.96 132.7 0.022021 0.084036 0.007062 0.000485
4 4 21.52 129.4 0.163754 0.141733 0.020088 0.026815
1984 1 5 22.39 135 0.04657 -0.11718 0.013732 0.002169
2 6 22.76 137.1 0.046377 -0.00019 3.76E-08 0.002151
3 7 23.48 141.2 0.043617 -0.00276 7.61E-06 0.001902
4 8 23.66 142.8 -0.05844 -0.10205 0.010415 0.003415
1985 1 9 24.1 145.5 -0.0944 -0.03596 0.001293 0.008911
2 10 24.01 145.3 -0.14914 -0.05474 0.002997 0.022243
3 11 24.54 148.3 -0.14799 0.001152 1.33E-06 0.021901
4 12 24.3 146.4 -0.05305 0.094937 0.009013 0.002815
1986 1 13 25 150.2 -0.02293 0.030125 0.000908 0.000526
2 14 25.64 153.1 0.105852 0.12878 0.016584 0.011205
3 15 26.36 157.3 0.085464 -0.02039 0.000416 0.007304
4 16 26.98 160.7 0.106102 0.020638 0.000426 0.011258
1987 1 17 27.52 164.2 0.029112 -0.07699 0.005927 0.000848
2 18 27.78 165.6 0.042316 0.013204 0.000174 0.001791
3 19 28.24 168.7 -0.04416 -0.08648 0.007478 0.00195
4 20 28.78 171.7 -0.03301 0.011152 0.000124 0.00109

Sums: Σ(et-et-1)^2 = 0.097941, Σet^2 = 0.133302
Blaisdell Company Example
DW = 0.09794 / 0.13330 = 0.735
 Using the Durbin-Watson table of your textbook, for k = 1, n = 20 and α = .05, we find DL = 1.20 and DU = 1.41.
 If dw < DL reject Ho, while if dw > DU do not reject Ho.
 Since dw = .735 < DL = 1.20, we reject the null hypothesis of no serial correlation and conclude that the error terms are positively autocorrelated.
Remedial Measures for Serial Correlation
 Addition of one or more independent
variables to the regression model.
One major cause of autocorrelated error terms is the
omission from the model of one or more key
variables that have time-ordered effects on the
dependent variable.
 Use transformed variables.
 The regression model is specified in terms of
changes rather than levels.
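As an illustration (not part of the original slides), a minimal Python sketch of the “changes rather than levels” remedy: the same simulated model is estimated in levels and in first differences, and the Durbin-Watson statistics are compared (statsmodels assumed; data and names are hypothetical).

```python
# Minimal sketch (assumed setup): first-differencing as a remedy for serial correlation.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(5)
t = np.arange(80, dtype=float)
y = 10.0 + 0.5 * t + np.cumsum(rng.normal(size=80))   # trending, autocorrelated series
x = 2.0 + 0.4 * t + np.cumsum(rng.normal(size=80))

levels_fit = sm.OLS(y, sm.add_constant(x)).fit()
diff_fit = sm.OLS(np.diff(y), sm.add_constant(np.diff(x))).fit()

print("DW in levels:     ", round(durbin_watson(levels_fit.resid), 2))
print("DW in differences:", round(durbin_watson(diff_fit.resid), 2))
# The differenced regression typically shows a DW much closer to 2.
```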
Multicollinearity
 Multicollinearity (also collinearity) is a phenomenon
in which two or more predictor variables in a
multiple regression model are highly correlated,
meaning that one can be linearly predicted from the
others with a substantial degree of accuracy.
 Multicollinearity violates the Classical Regression Assumption which specifies that no explanatory variable is a perfect linear function of any other explanatory variables.
 You cannot hold all the other independent variables
in the equation constant if every time one variable
changes, another changes in an identical manner.
 There are certain reasons why Multicollinearity occurs:
 It is caused by an inaccurate use of dummy
variables.
 It is caused by the inclusion of a variable
which is computed from other variables in the
data set.
 It generally occurs when the explanatory variables are highly correlated with each other.
Problems of Multicollinearity
 The computed t-scores will fall.
 Estimates will become very sensitive to changes in
specification.
 Multicollinearity will thus make confidence
intervals for the parameters very wide, and
significance tests might therefore give
inappropriate conclusions, and so make it
difficult to draw sharp inferences.
Multicollinearity Diagnostics
 A widely used formal method of detecting the presence of multicollinearity is the Variance Inflation Factor (VIF).
– It measures how much the variances of the estimated regression coefficients are inflated compared to when the independent variables are not linearly related.
VIFj = 1 / (1 – Rj²),  j = 1, 2, …, k
– Rj² is the coefficient of determination from the regression of the jth independent variable on the remaining k-1 independent variables.
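As an illustration (not part of the original slides), a minimal Python sketch of computing VIFs with statsmodels on simulated data with hypothetical variable names.

```python
# Minimal sketch (assumed setup): VIF_j = 1 / (1 - R_j^2) for each regressor.
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(4)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=200)   # highly correlated with x1
x3 = rng.normal(size=200)

X = pd.DataFrame({"const": 1.0, "x1": x1, "x2": x2, "x3": x3})
for j, name in enumerate(X.columns):
    if name != "const":                            # skip the intercept column
        print(f"VIF({name}) = {variance_inflation_factor(X.values, j):.2f}")
# VIF values well above 10 would point to troublesome multicollinearity.
```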
Multicollinearity Diagnostics

 When a variable is not involved in multicollinearity, its estimated coefficient and associated t value will not change much as the other independent variables are added to or deleted from the regression equation.
 A maximum VIF value in excess of 10 is often taken as an indication that multicollinearity may be unduly influencing the least squares estimates.
 When a variable is involved in multicollinearity, its estimated coefficient is unstable and its associated t statistic may change considerably as the other independent variables are added or deleted.
 The most commonly used rule of thumb is that the VIF should be less than 10.
Example: Sales Forecasting
 VIF calculation results:
Variable   R-Squared   VIF
ADRATE     0.159268    1.19
COMPETE    0.779575    4.54
SIGNAL     0.262394    1.36
APIPOP     0.770978    4.36
 There is no significant multicollinearity because each VIF < 10.
Solutions to the problem of Multicollinearity

 Drop one of the collinear variables, so that the problem disappears.
 Transform the highly correlated variables into a ratio and include only the ratio, not the individual variables, in the regression (this may not be acceptable in financial theory).
 Increase the sample size.
END
