Chapter Three: Violations of Basic

Assumptions of CLRM
• OLS estimates are BLUE if the error term, u, has:
– zero mean: E(ui) = 0 for all i
– common variance: var(ui) = σ2 for all i
– normality: ui are normally distributed for all i
– independence: ui and uj are independent (uncorrelated) for all i ≠ j
– No perfect multicollinearity
– No model misspecification

Cont’d---
• Inferences made based on the results of OLS estimation are valid so long as the assumptions of the classical linear regression model hold
• Unfortunately, one or more of those assumptions are often violated in practice
• The seriousness and implications of a violation depend on which assumption is violated
• In most cases, econometricians can diagnose and remedy such problems.
Problems with OLS
• What the problem means (What is the nature of the
problem?)
• What are the possible causes
• How to detect it (How is the problem diagnosed?)
• What it does to our estimates and inference (What are the
consequences of the problem?)
• How to correct for it (remedies or solutions)
• Key Problems: Misspecification of the model, Non-Normality, Multicollinearity, Heteroscedasticity, Serial Correlation
Specifying an Econometric Model and Specification Error

• OLS estimation is based on the assumption that the model to be estimated is correctly specified.
• However, we can never be sure that a given model is correctly specified. Researchers usually examine more than one possible specification in an attempt to find the one which best describes the process under study.
• Before any equation can be estimated, it must be completely specified
• Specifying an econometric model consists of three parts, namely choosing
the correct: independent variables, functional form, and form of the
stochastic error term
• A specification error results when one of these choices is made incorrectly
• There are four basic types of model misspecification (Specification Error)
– Inclusion of an irrelevant variable
– Exclusion of a relevant variable
– Measurement error
– Erroneous functional form for the relationship
Omitted variables
• Two reasons why an important explanatory variable might have been
left out:
– we forgot…
– it is not available in the dataset we are examining
• Either way, this may lead to omitted variable bias
(or, more generally, specification bias)
• The reason for this is that when a variable is not included, it cannot be
held constant
• Omitting a relevant variable usually means that the entire equation is suspect, because of the likely bias in the coefficients
• The estimates are unreliable, and the t and F statistics cannot be relied on
The Consequences of an Omitted
Variable
• Suppose the true regression model is:
  Y = β0 + β1X1 + β2X2 + ε (3.1)
  where ε is a classical error term
• If X2 is omitted, the equation becomes instead:
  Y = β0 + β1X1 + ε* (3.2)
  where: ε* = ε + β2X2 (3.3)
• Hence, the explanatory variables in the estimated regression (3.2) are not independent of the error term (unless the omitted variable is uncorrelated with all the included variables)
• But this violates the classical assumption that Cov(Xi, εi) = 0
• What happens if we estimate Equation 3.2 when Equation 3.1 is the truth?
  - We get bias! What this means is that: E(β̂1) ≠ β1 (3.4)
• The amount of bias is a function of the impact of the omitted variable on the dependent variable times a function of the correlation between the included and the omitted variable
• More formally: Bias = E(β̂1) − β1 = β2·α1, where α1 is the slope of a regression of X2 on X1 (3.5)

• So, the bias exists unless:


1. the true coefficient equals zero, or
2. the included and omitted variables are uncorrelated
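The bias result in (3.5) can be illustrated with a small simulation sketch. This is not from the slides; the data-generating process, seed and variable names are illustrative assumptions.

```python
# Illustrative Monte Carlo sketch of omitted variable bias (assumed setup).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)                      # X2 is correlated with X1
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)      # true model, as in (3.1)

full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
short = sm.OLS(y, sm.add_constant(x1)).fit()             # X2 omitted, as in (3.2)
aux = sm.OLS(x2, sm.add_constant(x1)).fit()              # regression of X2 on X1

print("beta1, full model      :", round(full.params[1], 3))   # close to 2.0
print("beta1, X2 omitted      :", round(short.params[1], 3))  # close to 2.0 + 1.5*0.8
print("bias predicted by (3.5):", round(1.5 * aux.params[1], 3))
```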
Correcting for an Omitted Variable
• In theory, the solution to a problem of specification bias seems easy:
add the omitted variable to the equation!
• Unfortunately, that’s easier said than done, for a couple of reasons:
1. Omitted variable bias is hard to detect: the amount of bias
introduced can be small and not immediately detectable
2. Even if it has been decided that a given equation is suffering from
omitted variable bias, how to decide exactly which variable to
include?
3. You may have two or more theoretically sound explanatory variables as potential “candidates” for inclusion as the omitted variable in the equation
Irrelevant Variables
• This refers to the case of including a variable in an equation when it does not belong
there
• This is the opposite of the omitted variables case—and so the impact can be
illustrated using the same model
• Assume that the true regression specification is:
  Y = β0 + β1X1 + ε (3.6)
• But the researcher for some reason includes an extra variable:
  Y = β0 + β1X1 + β2X2 + ε** (3.7)
• The misspecified equation’s error term then becomes:
  ε** = ε − β2X2 (3.8)
• So, the inclusion of an irrelevant variable will not cause bias (since the true coefficient of the
irrelevant variable is zero, and so the second term will drop out of Equation 3.8)
• However, the inclusion of an irrelevant variable will:
– Increase the variance of the estimated coefficients, and this increased variance will tend
to decrease the absolute magnitude of their t-scores
– Decrease the adjusted R2 (but not the R2)
– But this is not as serious as omitted variable bias
Measurement error
• Measurement Error in independent variable (s)
✓ Suppose we are interested in estimating a two-variable regression model and we are concerned with the possibility that the explanatory variable might be measured with error.
- The explanatory variable and the error term are then correlated, which violates one of the assumptions of the CLRM
- Under such circumstances the OLS estimators are not only biased but also inconsistent, i.e. they remain biased even asymptotically.
• Measurement Error in dependent variable
✓ The errors of measurement associated with the dependent variable do
not destroy the unbiasedness property of OLS. However, the
parameter estimates are inefficient (high variance and standard errors)

Functional Form
❖ Nonlinearity in variables
• This occurs when a linear regression model, which is linear in parameters, is
estimated when the true regression model is nonlinear.
• Suppose the true model is of the form:
  Y = β0 + β1X² + ε (3.9)
• This equation is linear in parameters. Therefore, we can apply OLS
• While the estimated model is:
  Y = β0 + β1X + ε (3.10)
• The specification of a linear model when the true model is nonlinear can lead to biased and inconsistent parameter estimates. Thus, as a test for non-linearity, polynomial terms need to be estimated
❖ Nonlinearity in parameters
• Consider the model: y = g(x, β) = β1·x1^β2·x2^β3 (3.11)
• Taking logs gives: ln y = ln β1 + β2 ln x1 + β3 ln x2 (3.12)
  which is linear in parameters; thus, we can apply OLS
Methods to detect specification errors
• The Ramsey RESET test can be used to determine if the functional form of a model is acceptable
i.e., H0: the linear model is correct
H0 : There is no omitted variable in the model (No specification
bias)
• Intuition: If the linear model is correct, powers of the predicted values of the dependent variable should not help explain the dependent variable
• This test is based on running the regression and saving the residuals as well as the fitted values.
• Then run a secondary regression of the residuals on powers of these fitted values:
  yt = α + βxt + ut (3.11)
  ût = α0 + α1ŷt² + α2ŷt³ + ... + αp−1ŷt^p + vt (3.12)
Cont’d---
• The R-squared statistic is taken from the secondary regression and the test statistic formed: n*R-squared
• It follows a chi-squared distribution with (p-1) degrees of freedom.
• The null hypothesis is that the functional form is suitable.
• If an n*R-squared statistic of 7.6 is obtained and we had up to the power of 3 in the secondary regression, then the critical value for chi-squared(2) is 5.99; since 7.6 > 5.99 we reject the null, so the functional form is a problem
• In Stata it is implemented as ovtest (a specification error/misspecification test, in particular a test for functional form or omitted variables)
Cont’d---
• Put differently: powers of the predicted value of y, ŷi = xi'b, shouldn't help explain yi
• To test, regress:
  yi = xi'β + α2ŷi² + α3ŷi³ + ... + αpŷi^p + εi (3.13)
• Ramsey’s RESET test:
  – F-test for the joint significance of the alphas
  – the regression in Ramsey’s RESET test is an auxiliary regression
• WARNING: rejecting H0 does not always indicate non-linearity; it might also indicate an omitted variable (a code sketch of the test follows below)
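A minimal hand-rolled sketch of the RESET logic described above, assuming simulated data and illustrative variable names (recent statsmodels releases also ship a built-in linear_reset helper, which could be used instead):

```python
# Ramsey RESET sketch: auxiliary regression on powers of the fitted values.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0, 10, n)
y = 1 + 0.5 * x + 0.3 * x**2 + rng.normal(size=n)    # true relation is quadratic

res = sm.OLS(y, sm.add_constant(x)).fit()            # misspecified linear fit
yhat = res.fittedvalues

# F-test version: regress y on x plus powers of yhat, test the added powers jointly
Z = sm.add_constant(np.column_stack([x, yhat**2, yhat**3]))
aux = sm.OLS(y, Z).fit()
r = np.zeros((2, Z.shape[1])); r[0, 2] = 1; r[1, 3] = 1
print("RESET F-test:", aux.f_test(r))

# n*R^2 chi-square version: regress the residuals on powers of yhat
aux2 = sm.OLS(res.resid, sm.add_constant(np.column_stack([yhat**2, yhat**3]))).fit()
lm = n * aux2.rsquared
print("n*R^2 =", round(lm, 2), " chi2(2) 5% critical value =", round(stats.chi2.ppf(0.95, 2), 2))
```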
Cont’d---
▪ linktest: it creates two new variables, the variable of prediction, _hat, and the variable of squared prediction, _hatsq. The model is then refit using these two variables as predictors. _hat should be significant since it is the predicted value. On the other hand, _hatsq shouldn't be, because if our model is specified correctly, the squared predictions should not have much explanatory power. That is, we wouldn't expect _hatsq to be a significant predictor if our model is specified correctly. So we will be looking at the p-value for _hatsq (a Python analogue is sketched below).
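A rough Python analogue of the linktest idea (not the Stata command itself; data and names are illustrative):

```python
# Refit the model on its own prediction and squared prediction (_hat, _hatsq analogues).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 300
x = rng.normal(size=n)
y = 1 + 2 * x + rng.normal(size=n)

res = sm.OLS(y, sm.add_constant(x)).fit()
hat = res.fittedvalues                       # analogue of _hat
hatsq = hat ** 2                             # analogue of _hatsq

link = sm.OLS(y, sm.add_constant(np.column_stack([hat, hatsq]))).fit()
print("p-value on _hatsq:", round(link.pvalues[2], 3))
# A small p-value on _hatsq would suggest a specification problem.
```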
Non-Normality
• What it means
  – Even if the assumptions about the mean and variance of the regression errors are satisfied, OLS inference also assumes that the errors are normally distributed
• How to detect non-normality
  – There are several methods of assessing whether data are normally distributed or not. They fall into two broad categories: graphical and statistical. Some common techniques are:
1. Graphical
• Q-Q probability plots
• Cumulative frequency (P-P) plots
2. Statistical
• Jarque-Bera test
• Shapiro-Wilk test
• Kolmogorov-Smirnov test
Cont’d---
• Q-Q plots display the observed values against normally distributed
data (represented by the line)

• Normally distributed data fall along the line


Cont’d---
❖ The P-P plot: a histogram of cumulative frequencies for a hypothetical data set.
❖ These data do not ‘look’ normal, but they are not statistically different from normal
Cont’d---
❖ Graphical methods (visual inspections) are necessary but not sufficient, and typically not very useful when the sample size is small. In such cases, statistical methods serve better
❖ Statistical tests for normality are more precise since actual
probabilities are calculated.
❖ Tests for normality calculate the probability that the sample was
drawn from a normal population.
❖ The hypotheses used are:
– Ho: The sample data are not significantly different from a normal population (the data or error terms are normally distributed).
– Ha: The sample data are significantly different from a normal population.
Cont’d---
➢ The Jarque–Bera test is a goodness-of-fit test of whether sample
data have the skewness and kurtosis matching a normal distribution.
– For normal distribution, skew = 0, kurtosis = 3
– Jarque-Bera test
• J-B = n[S²/6 + (K−3)²/24], where n is the sample size, and S and K are the sample skewness and kurtosis (a short computation sketch follows below)
• The J-B statistic can be compared to the χ2 distribution (table) with 2
degrees of freedom to determine the critical value at an alpha level of 0.05.
• distributed χ2(2) so CV is approx 6.0
– J-B > 6.0 => non-normality
• What it does (loosely speaking)
» skewness means coefficient estimates are biased.
» excess kurtosis (EK = K-3) means standard errors are understated.

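A small sketch of the J-B calculation using the formula above, on simulated residuals (illustrative data; scipy's built-in version is shown for comparison):

```python
# Jarque-Bera computed from sample skewness and kurtosis.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
resid = rng.standard_t(df=5, size=500)        # heavy-tailed, non-normal residuals

n = len(resid)
S = stats.skew(resid)
K = stats.kurtosis(resid, fisher=False)       # Pearson kurtosis: normal => 3
jb = n * (S**2 / 6 + (K - 3)**2 / 24)
crit = stats.chi2.ppf(0.95, 2)                # about 5.99

print("J-B =", round(jb, 2), " 5% critical value =", round(crit, 2))
print("Reject normality" if jb > crit else "Do not reject normality")
print("scipy built-in:", stats.jarque_bera(resid))
```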
Cont’d---
• NB: Normality can be a problem when the sample size is small (< 50)
• How to correct for it
– skewness can be reduced by transforming the data
• take natural logs
• look at outliers
– kurtosis can be accounted for by adjusting the degrees of freedom used in
standard tests of coefficient on x
❑ Other tests
H0: residuals are normally distributed
❖ When the probability is less than .05, we must reject the null hypothesis and infer
that the residuals are non-normally distributed for the following two tests.
✓ Kolmogorov-Smirnov test
✓ Shapiro-Wilk W test for Normality. This tests the cumulative distribution of the
residuals against that of the theoretical normal distribution with a chi-square test.
✓ kdensity test
✓ The pnorm command produces a normal probability plot and is another method of checking whether the residuals from the regression are normally distributed (a scipy sketch of these tests follows below)
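A brief scipy sketch of the Shapiro-Wilk and Kolmogorov-Smirnov tests applied to residuals (simulated data; note that the K-S call plugs in estimated parameters, which strictly calls for the Lilliefors correction):

```python
# Shapiro-Wilk and Kolmogorov-Smirnov normality checks on residuals.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
resid = rng.normal(size=80)

w, p_sw = stats.shapiro(resid)
d, p_ks = stats.kstest(resid, 'norm', args=(resid.mean(), resid.std(ddof=1)))

print(f"Shapiro-Wilk: W = {w:.3f}, p = {p_sw:.3f}")
print(f"K-S:          D = {d:.3f}, p = {p_ks:.3f}")
# p < 0.05 => reject H0 of normality for the corresponding test.
```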
Cont’d---
❑ Which normality test should be used?
❖ Jarque-Bera:
• Tests for skewness and kurtosis, very effective.
❖ Kolmogorov-Smirnov:
• Not sensitive to problems in the tails.
• For data sets > 50.
❖ Shapiro-Wilk:
• Doesn't work well if several values in the data set are the same.
• Works best for data sets with < 50 observations, but can be used with larger data sets.
Multicollinearity
• What it means? It is a special kind of correlation where regressors are
highly intercorrelated
• More formally, it is a situation where the independent variables in a multiple
regression are linearly correlated
• Multicollinearity is a question of degree, not of kind
• There could be two types of multicollinearity problems: Perfect and less
than perfect collinearity
• If multicollinearity is perfect, the regression coefficients of the X variables
are indeterminate and their standard errors infinite.
• Perfect multicollinearity (exact relationship) violates CLRM, which
specifies that no explanatory variable is a perfect linear function of any other
explanatory variables
• If multicollinearity is less than perfect (near or high multicollinearity, which does not violate the OLS assumptions), the regression coefficients, although determinate, possess large standard errors, which means the coefficients cannot be estimated with great precision
• It is a feature of the sample, not of the population
Possible Causes of multicollinearity
• The possible sources of multicollinearity are:
➢ The data collection method employed: For instance, sampling over
a limited range
➢ Model specification: For instance adding polynomial terms
➢ The use of lagged values of some explanatory variables as separate
independent factors in the relationship.
➢ An overdetermined model: Too many variables in the model
(when the model has more explanatory variables than the number
of observations)
➢ X’s are causally related to one another
➢ In time series data, the regressors may share the same trend

Consequences of multicollinearity
❖ There are five major consequences of multicollinearity:
1. Estimates will remain unbiased
2. Although the OLS estimators remain unbiased, the variances and standard errors of the estimates will increase; thus, it is harder to distinguish the effect of one variable from the effect of another, so we are much more likely to make large errors in estimating the βs than without multicollinearity
3. As a result, the computed t-scores will fall: the F-test may be significant while individual coefficients are large but insignificant, regression coefficients may have the wrong signs, and the confidence intervals will be wider, leading to acceptance of the null hypothesis. Although individual coefficients are insignificant, the R2 may be large
4. Estimates will become very sensitive to small changes in specification:
➢ The addition or deletion of an explanatory variable or of a few observations will often cause major changes in the values of the estimated coefficients when significant multicollinearity exists. For example, if you drop a variable, even one that appears to be statistically insignificant, the coefficients of the remaining variables in the equation sometimes will change dramatically
5. However, the overall fit of the equation and the estimation of the coefficients of non-multicollinear variables will be largely unaffected
Detection of Multicollinearity
• How to detect it?
✓ A relatively high and significant F-statistics with few significant t- statistics
✓ Wrong signs of the regression coefficients
✓ High partial correlation coefficients among the independent variables
✓ Use subsidiary or auxiliary regressions. This involves regressing each independent variable on the remaining independent variables and using an F-test to determine the significance of R²:
  F = [R² / (k − 1)] / [(1 − R²) / (n − k)]
✓ Using the VIF (variance inflation factor): VIFj = 1 / (1 − Rj²), where Rj² comes from the auxiliary regression of Xj on the other regressors. It is used to indicate the presence of multicollinearity between continuous variables
  - If the VIF of a variable exceeds 10 (which happens if Rj² exceeds 0.9), the variable is said to be highly collinear (see the VIF sketch below)
✓ Tolerance: TOLj = 1 − Rj² = 1 / VIFj
  - As a rule of thumb, if the tolerance is less than 0.1, the multicollinearity problem becomes severe
✓ Contingency Coefficient (CC) test: CC = √(χ² / (n + χ²))
  - This test is based on the chi-square statistic and is used for discrete variables
• As a rule of thumb, if CC is greater than 0.75, the variables are said to be collinear
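A short VIF/tolerance sketch using statsmodels' variance_inflation_factor (the data are simulated so that x1 and x2 are nearly collinear; names are illustrative):

```python
# VIF and tolerance for each regressor in a deliberately collinear design.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(5)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)       # nearly collinear with x1
x3 = rng.normal(size=n)
X = sm.add_constant(np.column_stack([x1, x2, x3]))

for j, name in zip(range(1, X.shape[1]), ["x1", "x2", "x3"]):
    vif = variance_inflation_factor(X, j)
    print(f"{name}: VIF = {vif:8.2f}, tolerance = {1 / vif:.3f}")
# Rule of thumb from the slide: VIF > 10 (tolerance < 0.1) signals severe collinearity.
```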
Remedies of multicollinearity
• Do nothing: multicollinearity is not necessarily bad, nor always avoidable, especially if the t-values are still around 2 or greater
• Drop a redundant variable:
a. Viable strategy when two variables measure essentially the same thing
b. Always use theory as the basis for this decision since the deletion of a
multicollinear variable that belongs in an equation will cause
specification bias
• Increase the sample size:
a. This is frequently impossible but a useful alternative to be considered
if feasible
b. The idea is that the larger sample normally will reduce the variance of
the estimated coefficients, diminishing the impact of the
multicollinearity
• Rethinking of the model: revising incorrect choice of functional form,
specification errors
• Transformation of variables to some forms (into natural logarithms,
forming ratios, first differencing---)
• Change sampling mechanism to allow greater variation in X’s
• Change unit of analysis to allow more cases and more variation in X’s
Heteroskedasticity
• What it means?
– OLS assumes common variance or homoscedasticity (var(ui) = σ2
for all i)
– Which is equivalent to: the variance of error term does not depend
on i
– Heteroscedasticity (var(ui) = σi²): the variance varies across observations, often getting larger for larger values of x. When this occurs in a correctly specified equation it is called pure heteroscedasticity
– Impure heteroskedasticity is heteroskedasticity caused by a specification error, such as an omitted variable or an incorrect functional form
– It is a common problem in cross-sectional data
Causes of Heteroskedasticity
• Poor data collection technique
• Outliers
• Specification error (omitted variable); thus, the residuals obtained
from the regression (the error variances) may not be constant
• Skewness: the distribution of some variables such as income, wealth,
etc… is skewed
• Incorrect data transformation, incorrect functional form, etc

The Consequences of Heteroskedasticity
• Why worry about heteroskedasticity?
- The existence of heteroskedasticity in the error term of an equation
violates CLRM, and the estimation of the equation with OLS has the
consequences:
1. Pure heteroskedasticity does not cause bias in the coefficient estimates (OLS is still linear, unbiased and consistent)
2. Heteroskedasticity typically causes OLS to no longer be the minimum-variance estimator (of all the linear unbiased estimators); thus OLS is no longer BLUE. Impure heteroskedasticity, because it stems from a specification error, also causes bias
3. Heteroskedasticity causes the OLS estimates of the standard errors to be biased, leading to unreliable hypothesis testing. Typically the bias in the SE estimate is negative, meaning that OLS underestimates the standard errors (and thus overestimates the t-scores)
4. The usual formulae for the variances of the coefficients are not appropriate for conducting tests of significance and constructing confidence intervals; such tests are inapplicable. Therefore, the t, F and LM tests cannot be used for drawing inferences
Detecting Heteroskedasticity
• Plot residuals as time series or against x (informal method)
• Statistical tests (formal approach)
1. The Spearman rank-correlation test
This is the simplest test, which may be applied to either small or large samples.
It assumes that the variance of the disturbance term is either increasing or
decreasing as x increases
• Formulate the null hypothesis of homoscedasticity
• Estimate: Yi = β0 + β1Xi + ui and obtain the residuals, e
• Order/rank the e's (ignoring their sign, i.e. taking absolute values) and the X values in ascending or descending order
• Compute the Spearman rank correlation coefficient, rs, using the formula:
  rs = 1 − 6ΣDi² / [n(n² − 1)], where Di is the difference between the ranks of X and e, and n is the sample size
• A high rank correlation coefficient is an indication of the presence of heteroscedasticity
• Compute the value of t = rs√(n − 2) / √(1 − rs²)
• Reject the null hypothesis if t > t tabulated at n − 2 df (a worked sketch follows below)
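A worked sketch of the Spearman rank-correlation test on simulated heteroskedastic data (illustrative assumptions; the absolute residuals are correlated with X):

```python
# Spearman rank-correlation test for heteroskedasticity.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(6)
n = 100
x = rng.uniform(1, 10, n)
y = 2 + 0.5 * x + rng.normal(scale=0.3 * x, size=n)    # error variance grows with x

res = sm.OLS(y, sm.add_constant(x)).fit()
rs, _ = stats.spearmanr(x, np.abs(res.resid))          # rank correlation of X and |e|

t_stat = rs * np.sqrt(n - 2) / np.sqrt(1 - rs**2)
t_crit = stats.t.ppf(0.975, n - 2)
print(f"rs = {rs:.3f}, t = {t_stat:.2f}, critical t = {t_crit:.2f}")
print("Heteroskedasticity suspected" if abs(t_stat) > t_crit else "No evidence against homoscedasticity")
```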
Cont’d---
2. Goldfeld-Quandt test
• This test is mainly applicable when the sample size is large
• The test assumes normality and serially independent u’s
• The test is used to see if the variance increases as the explanatory
variable(s) changes.
• The test involves the following procedures:
– Order or rank the observations according to the values of X, beginning with the lowest X value
– The sample is split into two equally sized sub-samples by omitting some central observations, say c of them, each sub-sample containing ½(n − c) observations.
– Run separate regressions on each sub-sample and obtain the RSS (RSS1 for the first sub-sample and RSS2 for the second.)
– Compute the ratio: F = (RSS2 / df2) / (RSS1 / df1), where df = n1 − K for the first sub-sample and n2 − K for the second
– If the computed F is significant (greater than the critical F), conclude that the null of homoscedasticity is rejected (see the sketch below)
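A statsmodels sketch of the Goldfeld-Quandt procedure (simulated data ordered by X; the 20% drop fraction is an illustrative choice):

```python
# Goldfeld-Quandt test comparing RSS of the low-X and high-X sub-samples.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_goldfeldquandt

rng = np.random.default_rng(7)
n = 120
x = np.sort(rng.uniform(1, 10, n))                     # observations ordered by X
y = 2 + 0.5 * x + rng.normal(scale=0.4 * x, size=n)    # variance rises with x
X = sm.add_constant(x)

f_stat, p_value, _ = het_goldfeldquandt(y, X, drop=0.2)   # drop ~20% central obs
print(f"GQ F = {f_stat:.2f}, p-value = {p_value:.4f}")
# A small p-value rejects the null of homoscedasticity.
```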
Cont’d---
3. Breusch-Pagan/Godfrey
• The success of the Goldfeld-Quandt test depends not only on the number of central observations omitted but also on correctly identifying the explanatory variable with which to order the observations.
• The Breusch-Pagan/Godfrey test takes care of this limitation. It does not require ordering of the observations, but it does require the assumption of normality
• It is also known as the Cook-Weisberg test or Lagrange Multiplier (LM) test and is a popular test procedure in applied econometric research
• Suppose the error variance is σi² = f(δ0 + δ1Zi), i.e. the error variance is a function of a linear combination of an intercept and one or more independent variables, the Zi's, where Zi could be some other variable or group of variables
• Steps
➢ Run the standard/original regression equation and calculate the residuals (e)
➢ Estimate the regression variance: σ̂² = Σei² / n
Cont’d---
➢ Run the auxiliary regression: ei² / σ̂² = δ0 + δ1Zi + vi
➢ Test the hypothesis: H0: δ1 = δ2 = ... = δp = 0
➢ Obtain the test statistic LM = n·R² and compare it with the table value, where n is the number of observations, R² is the coefficient of determination from the auxiliary regression, and p is the number of restrictions (the number of variables included in the auxiliary regression)
➢ If the calculated χ² value is greater than the tabulated value at the chosen significance level and df (p), we reject the null hypothesis of homoskedasticity and conclude that there is a heteroscedasticity problem in the data, or
➢ After obtaining the explained (regression) sum of squares from the auxiliary regression, calculate the statistic Θ = ESS/2 and compare it with the table value χ²(p) (a statsmodels sketch follows below)
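A statsmodels sketch of the Breusch-Pagan LM test, using the regressors themselves as the Z variables (simulated data; names are illustrative):

```python
# Breusch-Pagan / Cook-Weisberg LM test.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(8)
n = 200
x = rng.uniform(1, 10, n)
y = 1 + 0.8 * x + rng.normal(scale=0.5 * x, size=n)    # heteroskedastic errors
X = sm.add_constant(x)

res = sm.OLS(y, X).fit()
lm, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(res.resid, X)
print(f"LM = {lm:.2f} (p = {lm_pvalue:.4f}),  F = {f_stat:.2f} (p = {f_pvalue:.4f})")
# p < 0.05 => reject the null of homoskedasticity.
```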
Cont’d---
4. White test
• This test is designed to detect non-linear forms of heteroscedasticity.
• It is based on an auxiliary regression with squared residuals as
dependent variable and regressors given by: the regressors of the
initial model, their squares and their cross products.
• Therefore, it is also a large sample test but it does not depend on any
normality assumption.
• Hence, this test is more robust than the other test procedures
described above, and is sometimes also called General
heteroscedasticity test
• Estimate the original model, obtain the residuals (e) and square them
• Regress the squared residuals on the x’s, their squares and their cross products:
  e² = α0 + α1x1 + α2x2 + α3x1² + α4x2² + α5x1x2 + v
Cont’d---
❖ The rationale of including these terms is that the variance may be
systematically correlated with either of the independent variables
linearly or non-linearly
• Test the hypothesis: H0 : α1 = α2 = ... = αn = 0
• Using the results from the auxiliary regression you can calculate the test statistic, n·R², and compare it with the chi-square table value
• If the calculated χ² value is greater than the tabulated value at the chosen significance level and df (p), we reject the null hypothesis (see the sketch below)
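A sketch of the White test with statsmodels' het_white, which forms the squares and cross products of the regressors internally (simulated data):

```python
# White general heteroskedasticity test.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(9)
n = 200
x1 = rng.uniform(1, 10, n)
x2 = rng.uniform(1, 10, n)
y = 1 + 0.5 * x1 + 0.5 * x2 + rng.normal(scale=0.3 * x1, size=n)
X = sm.add_constant(np.column_stack([x1, x2]))

res = sm.OLS(y, X).fit()
lm, lm_pvalue, f_stat, f_pvalue = het_white(res.resid, X)
print(f"White LM (n*R^2) = {lm:.2f}, p-value = {lm_pvalue:.4f}")
```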
Remedies for Heteroskedasticity
• While heteroscedasticity is the property of disturbances, the
above tests deal with residuals.
• Hence, they may not report the genuine heteroscedasticity.
• Diagnostic results against homoscedasticity could be due to
misspecification of the model (Impure)
• But, if we are sure that there is a genuine (pure) heteroscedasticity problem, we can deal with it using heteroscedasticity-consistent (robust) standard errors, Weighted Least Squares (WLS) or GLS (redefining the variables)
❖ Weighted Least Squares
• Suppose we want to estimate: yi = β1 + β2x2i + ei
  where ei is heteroscedastic, that is, var(ei) = σi²
• If we transform ei by dividing it by σi, we get ei* = ei / σi, whose variance is unity:
  var(ei*) = var(ei / σi) = (1/σi²)·var(ei) = (1/σi²)·σi² = 1
Cont’d---
yi / σi = β1(1 / σi) + β2(x2i / σi) + ei*
⇒ yi* = β1x1i* + β2x2i* + ei*
• The transformed disturbance term has constant variance
• WLS or GLS is OLS on the transformed variables that
satisfy the standard least squares assumptions
• Hence, we can transform and apply OLS and the estimators
are the Generalized Least Squares (GLS) estimators
• The estimators thus obtained are known as GLS estimators,
and it is these estimators that are BLUE
• In applied research, econometricians usually assume that the variance of ui changes in proportion to the square of an explanatory variable (see the WLS sketch below)
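A WLS sketch under the assumption just mentioned, var(ei) proportional to xi², so the weights are 1/xi² (simulated data; names are illustrative):

```python
# Weighted Least Squares with weights 1 / x_i^2.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
n = 200
x = rng.uniform(1, 10, n)
y = 2 + 0.5 * x + rng.normal(scale=0.4 * x, size=n)    # sigma_i proportional to x_i
X = sm.add_constant(x)

ols = sm.OLS(y, X).fit()
wls = sm.WLS(y, X, weights=1.0 / x**2).fit()           # weights = 1 / sigma_i^2 (up to scale)

print("OLS params:", ols.params.round(3), " SE:", ols.bse.round(3))
print("WLS params:", wls.params.round(3), " SE:", wls.bse.round(3))
# Equivalent to applying OLS to the model divided through by x_i, as in the transformation above.
```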
Serial Correlation
• OLS assumes no serial correlation
  – ui and uj are independent for all i ≠ j
• However, in cross-section analysis, residuals are likely to be
correlated across individuals: e.g. common shocks
• Autocorrelation occurs in time-series studies when the errors
associated with a given time period carry over into future time
periods.
➢ For example, if we are predicting the growth of stock dividends, an
overestimate in one year is likely to lead to overestimates in succeeding
years.
• It is likely that such data exhibit intercorrelation, especially if the time
interval between successive observations is short, such as weeks or days.
➢ In time series analysis, today’s error is likely to be related to (correlated
with) yesterday’s residual
• It may also be due to autocorrelation in omitted variables
Cont’d---
➢ There are different types of serial correlation. With first-order serial
correlation, errors in one time period are correlated directly with
errors in the ensuing time period.
➢ With positive serial correlation, errors in one time period are
positively correlated with errors in the next time period
➢ The most basic form of autocorrelation is referred to as first-order autocorrelation and is specified as: ut = ρut−1 + εt, with −1 < ρ < 1
➢ This scheme is known as an autoregressive (AR(1)) process
➢ If ρ < 0, there is negative serial correlation, meaning that an unusually high value is likely to be followed by an unusually low value of the dependent variable; if ρ > 0, there is positive autocorrelation.
➢ In practice, however, ρ is usually positive. The reasons are economic growth, cyclical movements of the economy, and interdependence among most macroeconomic variables
Cont’d---
➢ It is important to find out whether the autocorrelation is pure autocorrelation (due to the behavioral pattern of the values of the true u) or quasi autocorrelation (the result of mis-specification of the model or omitted variable bias)
➢ Under the AR(1) process, with Yt = β1 + β2Xt + ut and E(ui uj) ≠ 0, the BLUE estimator of β2 is given by the following (GLS) expressions:
  β̂2(GLS) = Σ_{t=2}^{n} (xt − ρxt−1)(yt − ρyt−1) / Σ_{t=2}^{n} (xt − ρxt−1)²
  Var(β̂2(GLS)) = σ² / Σ_{t=2}^{n} (xt − ρxt−1)²
Causes of Autocorrelation
1. Inertia - Macroeconomic data experience cycles/business cycles.
2. Specification Bias- Excluded variable
➢ Appropriate equation: Yt = β1 + β2X2t + β3X3t + β4X4t + ut
➢ Estimated equation: Yt = β1 + β2X2t + β3X3t + vt
➢ Estimating the second equation implies: vt = β4X4t + ut
3. Specification Bias - Incorrect Functional Form
   True model: Yt = β1 + β2X2t + β3X2t² + vt
   Estimated model: Yt = β1 + β2X2t + ut
   so that: ut = β3X2t² + vt
4. Cobweb Phenomenon
➢ In agricultural market, the supply reacts to price with a lag of one time period
because supply decisions take time to implement. This is known as the
cobweb phenomenon.
➢ Supplyt = β0 + β1Pt−1 + ut
Cont’d---
5. Lags: Ct = β1 + β2Ct−1 + ut
➢ The above equation is known as autoregression because one of the
explanatory variables is the lagged value of the dependent variable.
➢ If you neglect the lagged term, the resulting error term will reflect a systematic pattern due to the influence of lagged consumption on current consumption.
6. Data Manipulation
   Level form: Yt = β1 + β2Xt + ut and Yt−1 = β1 + β2Xt−1 + ut−1
   First difference form: ΔYt = β2ΔXt + vt
➢ This equation is known as the first difference form and dynamic regression
model. The previous equation is known as the level form.
➢ Note that the error term in the first equation is not autocorrelated but it can be
shown that the error term in the first difference form is autocorrelated
7. Nonstationarity
➢ When dealing with time series data, we should check whether the given time
series is stationary.
➢ A time series is stationary if its characteristics (e.g. mean, variance and covariance) are time invariant; that is, they do not change over time.
➢ If that is not the case, we have a nonstationary time series.
Consequences of Autocorrelation
❖ In some cases, it can happen that OLS is BLUE despite autocorrelation.
But such cases are very rare
➢ OLS Estimation Allowing for Autocorrelation

▪ The estimator is no longer BLUE (autocorrelation does not affect linearity, consistency and unbiasedness), but it is inefficient
▪ The confidence intervals are likely to be wider than those based on the GLS procedure; thus, we are likely to declare a coefficient statistically insignificant even though in fact it may be significant
▪ One should use GLS and not OLS
➢ OLS Estimation Disregarding Autocorrelation
▪ The estimated variance of the error term is likely to underestimate the true variance
• With positive autocorrelation the standard errors are biased and too small.
• With negative autocorrelation the standard errors are biased and too large.
▪ Overestimate R-square
▪ Therefore, the usual t and F tests of significance are no longer valid, and if applied, are likely to give seriously misleading conclusions about the statistical significance of the estimated regression coefficients.
Detection of Autocorrelation
1. Graphical Method
There are various ways of examining the residuals.
✓ The time sequence plot can be produced.
✓ Alternatively, we can plot the standardized residuals against time. The standardized residuals are simply the residuals divided by the standard error of the regression.
✓ If either plot shows a pattern, then the errors may not be random.
✓ We can also plot the error term with its first lag.
2. The Durbin-Watson Test
   d = Σ_{t=2}^{n} (ût − ût−1)² / Σ_{t=1}^{n} ût²
➢ It is simply the ratio of the sum of squared differences in successive residuals to the RSS.
➢ The number of observations in the numerator is n − 1, as one observation is lost in taking successive differences
Cont’d---
➢ A great advantage of the Durbin-Watson test is that it is based on the estimated residuals.
➢ Durbin and Watson have derived a lower bound dL and an upper bound dU such that if the computed d lies outside these critical values, a decision can be made regarding the presence of positive or negative serial correlation.

d = [Σût² + Σût−1² − 2Σût·ût−1] / Σût² ≈ 2[1 − (Σût·ût−1 / Σût²)]
d ≈ 2(1 − ρ̂), where ρ̂ = Σût·ût−1 / Σût²
➢ But since −1 ≤ ρ ≤ 1, this implies that 0 ≤ d ≤ 4.
➢ When ρ = 1, the error term ut is nonstationary, for the variances and covariances become infinite.
➢ When ρ = 1, the first-differenced ut becomes stationary, as it is equal to a white noise error term
Cont’d---
• Test: Durbin-Watson statistic:
  d = Σ(ei − ei−1)² / Σei², for n and K − 1 d.f.

  |__Positive autocorrelation__|__Zone of indecision__|__No autocorrelation__|__Zone of indecision__|__Negative autocorrelation__|
  0                        d-lower                d-upper        2        4 − d-upper            4 − d-lower                    4

  Autocorrelation is clearly evident near 0 or 4; in the zones of indecision the result is ambiguous (cannot rule out autocorrelation); autocorrelation is not evident around 2
Cont’d---
• d lies between 0 and 4
– d = 2 implies residuals uncorrelated.
• D-W tables provide upper and lower bounds for d
  – if d < dL then reject the null of no (positive) serial correlation
  – if d > dU then do not reject the null of no (positive) serial correlation
  – if dL ≤ d ≤ dU then the test is inconclusive.

  Null hypothesis                              Decision       If
  No positive autocorrelation                  Reject         0 < d < dL
  No positive autocorrelation                  No decision    dL ≤ d ≤ dU
  No negative autocorrelation                  Reject         4 − dL < d < 4
  No negative autocorrelation                  No decision    4 − dU ≤ d ≤ 4 − dL
  No autocorrelation, positive or negative     Accept         dU < d < 4 − dU
Cont’d---
➢ The mechanics of the Durbin-Watson test are as follows:
o Run the OLS regression and obtain the residuals
o Compute d
o For the given sample size and given number of explanatory
variables, find out the critical dL and dU.
o Follow the decision rule
• With no lagged dependent variable among the regressors, the DW d is a valid test, and it tests for 1st-order autocorrelation only
• With lagged dependent variables in the regression (when the DW test is invalid) and autocorrelated errors:
  – OLS coefficient estimates are inconsistent
    • even as the sample size increases, the estimated coefficient does not converge on the true coefficient (i.e. it is biased)
  – So inference is wrong
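A quick computation of the d statistic on simulated AR(1) errors, using statsmodels' durbin_watson helper (illustrative data and seed):

```python
# Durbin-Watson statistic on residuals from a regression with AR(1) errors.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(11)
n = 200
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + rng.normal()               # positive autocorrelation
y = 1 + 2 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()
d = durbin_watson(res.resid)
print(f"DW d = {d:.2f}")                               # well below 2 here
print(f"implied rho-hat = {1 - d / 2:.2f}")            # from d ≈ 2(1 − rho)
```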
Cont’d---
3. The Breusch–Godfrey test
➢ The BG test, also known as the LM test, is a general test for
autocorrelation in the sense that it allows for:
1. nonstochastic regressors such as the lagged values of the
regressand;
2. higher-order autoregressive schemes such as AR(1), AR
(2)etc.; and
3. simple or higher-order moving averages of white noise error
terms.
➢ Consider the following model:
  Yt = β1 + β2Xt + ut
  ut = ρ1ut−1 + ρ2ut−2 + ... + ρput−p + εt
  H0: ρ1 = ρ2 = ... = ρp = 0
➢ Estimate the regression using OLS and obtain the residuals
➢ Run the 2nd (auxiliary) regression of the residuals on the regressors and the lagged residuals, and obtain its R-square
Cont’d---
➢ If the sample size is large, Breusch and Godfrey have shown that (n − p)·R² follows a chi-square distribution with p degrees of freedom, where p is the lag length
➢ If (n − p)·R² exceeds the critical value at the chosen level of significance, we reject the null hypothesis, in which case at least one rho is statistically different from zero
➢ The BG test is applicable even if the disturbances follow a pth-order moving average (MA) process, that is, if ut is generated as:
  ut = εt + λ1εt−1 + λ2εt−2 + ... + λpεt−p
➢ A drawback of the BG test is that the value of p, the length of the lag, cannot be specified a priori (a statsmodels sketch follows below)
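A statsmodels sketch of the BG test on simulated AR(2) errors; the lag length p = 2 is an illustrative choice:

```python
# Breusch-Godfrey LM test for higher-order autocorrelation.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(12)
n = 200
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(2, n):
    u[t] = 0.5 * u[t - 1] + 0.3 * u[t - 2] + rng.normal()   # AR(2) errors
y = 1 + 2 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()
lm, lm_pvalue, f_stat, f_pvalue = acorr_breusch_godfrey(res, nlags=2)
print(f"BG LM = {lm:.2f}, p-value = {lm_pvalue:.4f}")
# p < 0.05 => reject H0 of no autocorrelation up to lag p = 2.
```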
Correcting for Serial Correlation
• There are several approaches to resolving problems of
autocorrelation.

– Lagged dependent variables

– Differencing the Dependent variable

– GLS

– ARIMA

Cont’d---
➢ The Method of GLS
  Yt = β1 + β2Xt + ut, with ut = ρut−1 + εt, −1 < ρ < 1
▪ There are two cases: (1) ρ is known and (2) ρ is not known
1. When ρ is known
➢ If the regression holds at time t, it should hold at time t−1, i.e.
  Yt−1 = β1 + β2Xt−1 + ut−1
➢ Multiplying the second equation by ρ gives
  ρYt−1 = ρβ1 + ρβ2Xt−1 + ρut−1
➢ Subtracting the latter from the original equation gives
  Yt − ρYt−1 = β1(1 − ρ) + β2(Xt − ρXt−1) + εt, where εt = (ut − ρut−1)
Cont’d---
▪ The equation can be written as
  Yt* = β1* + β2*Xt* + εt
➢ The error term now satisfies all the OLS assumptions
➢ Thus we can apply OLS to the transformed variables Y* and X* and obtain estimates with all the optimum properties, namely BLUE
➢ In effect, running this equation is the same as using GLS.
2. When ρ is unknown, there are many ways to estimate it.
➢ If we assume ρ = +1, the generalized difference equation reduces to the first difference equation:
  Yt − Yt−1 = β2(Xt − Xt−1) + (ut − ut−1), i.e. ΔYt = β2ΔXt + εt
➢ The first difference transformation may be appropriate if the coefficient of autocorrelation is very high, say in excess of 0.8, or the Durbin-Watson d is quite low (Maddala’s rough rule of thumb; an iterative estimation sketch follows below)
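One practical way to apply the quasi-differencing idea when ρ is unknown is statsmodels' GLSAR, which alternates between estimating ρ from the residuals and refitting the transformed equation (a Cochrane-Orcutt-style iteration). The sketch below uses simulated data; the settings are illustrative, not part of the slides.

```python
# Iterative GLS (GLSAR) for a regression with AR(1) errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(13)
n = 200
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.8 * u[t - 1] + rng.normal()
y = 1 + 2 * x + u
X = sm.add_constant(x)

ols = sm.OLS(y, X).fit()
model = sm.GLSAR(y, X, rho=1)                  # AR(1) error structure
gls = model.iterative_fit(maxiter=10)          # re-estimates rho, then quasi-differences

print("estimated rho:", np.round(model.rho, 3))
print("OLS   params / SE:", ols.params.round(3), ols.bse.round(3))
print("GLSAR params / SE:", gls.params.round(3), gls.bse.round(3))
```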
Cont’d---
➢ Use the first difference form whenever d< R2.
Rule of thumb:
if d < R2 then estimate model in first difference form
  yt = α + βxt + ut
  yt−1 = α + βxt−1 + ut−1
  yt − yt−1 = β(xt − xt−1) + (ut − ut−1)
  so we can recover the regression coefficients (but not the intercept)
➢ There are many interesting features of the first difference equation
➢ There is no intercept in it. Thus, you have to use the regression-through-the-origin routine
➢ If, however, one includes an intercept term, this implies that the original model has a trend in it.
➢ Thus, by including the intercept, one is in effect testing for the presence of a trend in the equation
