chp2 Econometrics
Assumptions of CLRM
• OLS estimates are BLUE if the error term, u, has:
– zero mean: E(ui) = 0 for all i
– common variance: var(ui) = σ2 for all i
– normality: ui are normally distributed for all i
– independence: ui and uj are independent (uncorrelated)
for all i ≠ j
– no perfect multicollinearity
– no model misspecification
Cont’d---
• Inferences made based on the results of OLS estimations are
valid so long as the assumptions of the classical linear
regression model hold
Specifying an Econometric Model and Specification Error
Irrelevant Variables
• This refers to the case of including a variable in an equation when it does not belong
there
• This is the opposite of the omitted variables case—and so the impact can be
illustrated using the same model
• Assume that the true regression specification is:
Yi = β0 + β1X1i + εi (3.6)
• But the researcher for some reason includes an extra variable:
Yi = β0 + β1X1i + β2X2i + εi* (3.7)
• The misspecified equation’s error term then becomes:
εi* = εi − β2X2i (3.8)
• So, the inclusion of an irrelevant variable will not cause bias (since the true coefficient of the
irrelevant variable is zero, and so the second term will drop out of Equation 3.8)
• However, the inclusion of an irrelevant variable will:
– Increase the variance of the estimated coefficients, and this increased variance will tend
to decrease the absolute magnitude of their t-scores
– Decrease the adjusted R2 (but not the R2)
– But this is not as serious as the omitted variable bias
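The effect of including an irrelevant variable can be illustrated with a small simulation (hypothetical numbers, numpy only): the slope on the relevant regressor stays centred on its true value (no bias), but its sampling variance grows.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 100, 2000
b1_true, b1_extra = [], []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)  # irrelevant but correlated with x1
    y = 2 + 3 * x1 + rng.normal(size=n)            # true model excludes x2
    X_true = np.column_stack([np.ones(n), x1])
    X_extra = np.column_stack([np.ones(n), x1, x2])
    b1_true.append(np.linalg.lstsq(X_true, y, rcond=None)[0][1])
    b1_extra.append(np.linalg.lstsq(X_extra, y, rcond=None)[0][1])
# Both estimators are centred on the true slope of 3 (no bias),
# but the over-specified model estimates it with larger variance
print(np.mean(b1_true), np.mean(b1_extra))
print(np.var(b1_true), np.var(b1_extra))
```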
Measurement error
• Measurement Error in the independent variable(s)
✓ Suppose we are interested in estimating a two-variable regression
model and we are concerned with the possibility that the explanatory
variable might be measured with error.
- The explanatory variable and the error term are then correlated, which
violates one of the assumptions of the CLRM
- Under such circumstances the OLS estimators are not only biased but
also inconsistent, i.e. they remain biased even asymptotically.
• Measurement Error in dependent variable
✓ The errors of measurement associated with the dependent variable do
not destroy the unbiasedness property of OLS. However, the
parameter estimates are inefficient (high variance and standard errors)
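The attenuation bias from measurement error in the regressor can be sketched with a simulation (hypothetical numbers): the OLS slope converges to a value below the truth, and a larger sample does not cure it.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
x_star = rng.normal(size=n)                  # true regressor
y = 1 + 2 * x_star + rng.normal(size=n)      # true slope is 2
x_obs = x_star + rng.normal(size=n)          # measured with error (error variance 1)
X = np.column_stack([np.ones(n), x_obs])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
# Attenuation: plim of the slope is 2 * var(x*)/(var(x*) + var(meas. error)) = 1,
# which stays biased no matter how large n gets (inconsistency)
print(b[1])
```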
Functional Form
❖ Nonlinearity in variables
• This occurs when a linear regression model, which is linear in parameters, is
estimated when the true regression model is nonlinear.
• Suppose the true model is of the form:
Y = β0 + β1X² + ε (3.9)
• This equation is linear in parameters. Therefore, we can apply OLS
• While the estimated model is:
Y = β0 + β1X + ε (3.10)
• Specifying a linear model when the true model is nonlinear can lead to
biased and inconsistent parameter estimates. Thus, as a test for nonlinearity,
polynomial terms should be estimated
❖ Nonlinearity in parameters
• Consider the model: y = g(x, β) = β1 x1^β2 x2^β3 (3.11)
• Taking natural logs gives:
ln y = ln β1 + β2 ln x1 + β3 ln x2 (3.12)
which is linear in parameters; thus, we can apply OLS
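The log transformation above can be sketched numerically (simulated, hypothetical parameter values): OLS on the logged variables recovers the parameters of the model that is nonlinear in its original form.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
x1 = rng.uniform(1, 10, n)
x2 = rng.uniform(1, 10, n)
# y = b1 * x1^b2 * x2^b3 * exp(e): nonlinear in parameters in levels
y = 1.5 * x1**0.6 * x2**0.3 * np.exp(rng.normal(scale=0.1, size=n))
# After taking logs the model is linear in parameters, so OLS applies
X = np.column_stack([np.ones(n), np.log(x1), np.log(x2)])
coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
print(np.exp(coef[0]), coef[1], coef[2])  # estimates of b1, b2, b3
```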
Methods to detect specification errors
• The Ramsey Reset test can be used to determine if the functional form
of a model is acceptable
i.e., H0: the linear model is correct
H0: there is no omitted variable in the model (no specification
bias)
• Intuition: if the linear model is correct, powers of the predicted values
of the dependent variable should not explain the dependent
variable
• This test is based on running the regression and saving the residual as
well as the fitted values.
• Then run a secondary regression of the residual on powers of these
fitted values.
yt = β1 + β2xt + ut (3.11)
ût = α0 + α1ŷt² + α2ŷt³ + ... + αp−1ŷtᵖ + vt (3.12)
Cont’d---
• The R-squared statistic is taken from the secondary regression and the
test statistic formed as n·R-squared, which is compared with a χ2 critical value
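The RESET procedure above can be sketched as follows (simulated, hypothetical data where the true model is quadratic but a linear model is fitted, so the test should flag misspecification):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 200
x = rng.uniform(0, 5, n)
y = 1 + 2 * x + 1.5 * x**2 + rng.normal(size=n)  # true model is quadratic
X = np.column_stack([np.ones(n), x])             # but a linear model is fitted
b, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ b
resid = y - fitted
# Secondary regression: residuals on powers of the fitted values
Z = np.column_stack([np.ones(n), fitted**2, fitted**3])
g, *_ = np.linalg.lstsq(Z, resid, rcond=None)
rss = np.sum((resid - Z @ g) ** 2)
tss = np.sum((resid - resid.mean()) ** 2)
lm = n * (1 - rss / tss)                         # test statistic: n * R-squared
print(lm, stats.chi2.ppf(0.95, df=2))            # lm far above the critical value
```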
Non-Normality
• What it means
– Beyond the assumptions about the mean and variance of the regression
errors, OLS inference also assumes they are normally distributed
• How to detect non-normality
• There are several methods of assessing whether data are normally
distributed or not. They fall into two broad categories: graphical and
statistical. Some common techniques are:
1. Graphical
• Q-Q probability plots
• Cumulative frequency (P-P) plots
2. Statistical
• Jarque-Bera test
• Shapiro-Wilk test
• Kolmogorov-Smirnov test
Cont’d---
• Q-Q plots display the observed values against normally distributed
data (represented by the line)
Cont’d---
➢ The Jarque–Bera test is a goodness-of-fit test of whether sample
data have the skewness and kurtosis matching a normal distribution.
– For normal distribution, skew = 0, kurtosis = 3
– Jarque-Bera test
• J-B = n[S2/6 + (K−3)2/24], where n is the sample size, and S and K are the
sample skewness and kurtosis
• The J-B statistic can be compared to the χ2 distribution (table) with 2
degrees of freedom to determine the critical value at an alpha level of 0.05.
• distributed χ2(2) so CV is approx 6.0
– J-B > 6.0 => non-normality
• What it does (loosely speaking)
» skewness means coefficient estimates are biased.
» excess kurtosis (EK = K-3) means standard errors are understated.
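The J-B formula above can be computed directly (simulated, hypothetical samples: one normal, one strongly skewed):

```python
import numpy as np

def jarque_bera(u):
    """J-B = n * (S^2/6 + (K-3)^2/24), with sample skewness S and kurtosis K."""
    n = len(u)
    z = (u - u.mean()) / u.std()
    S = np.mean(z**3)   # sample skewness
    K = np.mean(z**4)   # sample kurtosis (3 under normality)
    return n * (S**2 / 6 + (K - 3) ** 2 / 24)

rng = np.random.default_rng(4)
jb_normal = jarque_bera(rng.normal(size=1000))
jb_skewed = jarque_bera(rng.exponential(size=1000))
# The skewed sample far exceeds the chi2(2) 5% cutoff of approx 6.0
print(jb_normal, jb_skewed)
```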
Cont’d---
• NB: Non-normality can be a problem when the sample size is small (< 50)
• How to correct for it
– skewness can be reduced by transforming the data
• take natural logs
• look at outliers
– kurtosis can be accounted for by adjusting the degrees of freedom used in
standard tests of coefficient on x
❑ Other tests
H0: residuals are normally distributed
❖ When the probability is less than .05, we must reject the null hypothesis and infer
that the residuals are non-normally distributed for the following two tests.
✓ Kolmogorov-Smirnov test
✓ Shapiro-Wilk W test for normality. This tests the cumulative distribution of the
residuals against that of the theoretical normal distribution with a chi-square test.
✓ kdensity test
✓ The pnorm command produces a normal probability plot and is another
method of testing whether the residuals from the regression are normally distributed.
Cont’d---
❑ Which normality test should be used?
❖ Jarque-Bera:
• Tests for skewness and kurtosis, very effective.
❖ Kolmogorov-Smirnov:
• Not sensitive to problems in the tails.
• For data sets > 50.
❖ Shapiro-Wilk:
• Doesn't work well if several values in the data set are the same.
• Works best for data sets with < 50 observations, but can be used with
larger samples.
Multicollinearity
• What it means? It is a special kind of correlation where regressors are
highly intercorrelated
• More formally, it is a situation where the independent variables in a multiple
regression are linearly correlated
• Multicollinearity is a question of degree, not of kind
• There could be two types of multicollinearity problems: Perfect and less
than perfect collinearity
• If multicollinearity is perfect, the regression coefficients of the X variables
are indeterminate and their standard errors infinite.
• Perfect multicollinearity (exact relationship) violates CLRM, which
specifies that no explanatory variable is a perfect linear function of any other
explanatory variables
• If multicollinearity is less than perfect (near or high multicollinearity, which
does not violate the OLS assumptions), the regression coefficients, although
determinate, possess large standard errors, which means the coefficients cannot
be estimated with great precision
• It is a feature of the sample, not of the population
Possible Causes of multicollinearity
• The possible sources of multicollinearity are:
➢ The data collection method employed: For instance, sampling over
a limited range
➢ Model specification: For instance adding polynomial terms
➢ The use of lagged values of some explanatory variables as separate
independent factors in the relationship.
➢ An overdetermined model: too many variables in the model
(when the model has more explanatory variables than the number
of observations)
➢ X’s are causally related to one another
➢ In time series data, the regressors may share the same trend
Consequences of multicollinearity
❖ There are five major consequences of multicollinearity:
1. Estimates will remain unbiased
2. Although the OLS estimators remain unbiased, the variances and standard errors of
the estimates will increase; thus, it becomes harder to distinguish the effect of one
variable from the effect of another, and much more likely that we make large errors
in estimating the βs than without multicollinearity
3. As a result, the computed t-scores will fall: the F-test may be significant while
individual coefficients are large but insignificant, the regression coefficients may
have wrong signs, and the confidence intervals will be wider, leading to acceptance
of the null hypothesis. Although individual coefficients are insignificant, the R2
may be large
4. Estimates will become very sensitive to small changes in specification:
➢ The addition or deletion of an explanatory variable or of a few observations
will often cause major changes in the estimated coefficients when significant
multicollinearity exists. For example, if you drop a variable, even one that
appears to be statistically insignificant, the coefficients of the remaining
variables in the equation sometimes will change dramatically
5. However, the overall fit of the equation and the estimation of the coefficients of
non-multicollinear variables will be largely unaffected
Detection of Multicollinearity
• How to detect it?
✓ A relatively high and significant F-statistics with few significant t- statistics
✓ Wrong signs of the regression coefficients
✓ High partial correlation coefficients among the independent variables
✓ Use subsidiary or auxiliary regressions. This involves regressing each
independent variable on the remaining independent variables and using an
F-test to determine the significance of R2:
F = [R2 / (k − 1)] / [(1 − R2) / (n − k)]
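The auxiliary-regression idea above can be sketched with simulated (hypothetical) data; the quantity 1/(1 − R2) from each auxiliary regression is the variance inflation factor (VIF), which is large for a nearly collinear regressor and near 1 otherwise.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + rng.normal(scale=0.2, size=n)  # nearly collinear with x1
x3 = rng.normal(size=n)                         # unrelated regressor

def aux_r2(target, others):
    """R-squared from regressing one regressor on the remaining ones."""
    X = np.column_stack([np.ones(n)] + others)
    b, *_ = np.linalg.lstsq(X, target, rcond=None)
    e = target - X @ b
    return 1 - (e @ e) / ((target - target.mean()) @ (target - target.mean()))

vif_x1 = 1 / (1 - aux_r2(x1, [x2, x3]))
vif_x3 = 1 / (1 - aux_r2(x3, [x1, x2]))
print(vif_x1, vif_x3)  # large VIF for x1, near 1 for x3
```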
Causes of Heteroskedasticity
• Poor data collection technique
• Outliers
• Specification error (omitted variable); thus, the residuals obtained
from the regression (the error variances) may not be constant
• Skewness: the distribution of some variables such as income, wealth,
etc… is skewed
• Incorrect data transformation, incorrect functional form, etc
The Consequences of Heteroskedasticity
• Why worry about heteroskedasticity?
- The existence of heteroskedasticity in the error term of an equation
violates CLRM, and the estimation of the equation with OLS has the
consequences:
1. Pure heteroskedasticity does not cause bias in the coefficient estimates (OLS is
still linear, unbiased and consistent)
2. Heteroskedasticity causes OLS to no longer be the minimum-variance estimator
(of all the linear unbiased estimators); thus, OLS is no longer BLUE. (Impure
heteroskedasticity, which arises from specification error, can in addition bias
the estimates)
3. Heteroskedasticity causes the OLS estimates of the standard errors to be
biased, leading to unreliable hypothesis testing. Typically the bias in the SE
estimate is negative, meaning that OLS underestimates the standard errors
(and thus overestimates the t-scores)
4. The usual formulae for the variances of the coefficients are not appropriate for
conducting tests of significance and constructing confidence intervals.
Therefore, the t, F and LM tests cannot be used for drawing inferences
Detecting Heteroskedasticity
• Plot residuals as time series or against x (informal method)
• Statistical tests (formal approach)
1. The Spearman rank-correlation test
This is the simplest test, which may be applied to either small or large samples.
It assumes that the variance of the disturbance term is either increasing or
decreasing as x increases
• Formulate the null hypothesis of homoscedasticity
• Estimate: Yi = β0 + β1Xi + ui and obtain the residuals e
• Order/rank the e's (ignoring their sign or take absolute value) and the X
values in ascending or descending order
• Compute the Spearman rank correlation coefficient, rs, using the formula:
rs = 1 − 6ΣDi2 / [n(n2 − 1)], where Di is the difference between the ranks of
X and e, and n is the sample size
A high rank correlation coefficient is an indication of the presence of
heteroscedasticity
• Compute the value of t = rs√(n − 2) / √(1 − rs2)
• Reject the null hypothesis if the computed t exceeds the tabulated t at n − 2 df
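The steps above can be sketched as follows (simulated, hypothetical data in which the error spread grows with x, so the test should reject homoscedasticity):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n = 200
x = np.sort(rng.uniform(1, 10, n))
y = 1 + 2 * x + rng.normal(scale=x, size=n)    # error spread grows with x
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = np.abs(y - X @ b)                          # absolute residuals
# rs = 1 - 6*sum(Di^2)/(n(n^2-1)), Di = rank difference between x_i and |e_i|
D = stats.rankdata(x) - stats.rankdata(e)
rs = 1 - 6 * np.sum(D**2) / (n * (n**2 - 1))
t = rs * np.sqrt(n - 2) / np.sqrt(1 - rs**2)
print(rs, t)  # compare t with the t(n-2) critical value
```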
Cont’d---
2. Goldfeld-Quandt test
• This test is mainly applicable when the sample size is large
• The test assumes normality and serially independent u’s
• The test is used to see if the variance increases as the explanatory
variable(s) changes.
• The test involves the following procedures:
– Order or rank the observations according to the values of X,
beginning with the lowest X value
– Split the sample into two equally sized sub-samples by omitting
some central observations, say c of them, each sub-sample having ½(n − c).
– Run separate regression on each sub-sample and obtain RSS
(RSS1 for the first sub-sample and RSS2 for the second.)
– Compute the ratio: F = (RSS2 / df2) / (RSS1 / df1),
where df1 = n1 − K for the 1st sub-sample and df2 = n2 − K for the 2nd
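The Goldfeld-Quandt procedure above can be sketched as follows (simulated, hypothetical data; the error variance rises with x, so the F-ratio should be well above its 5% critical value):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, c = 120, 20                               # sample size, central obs to omit
x = np.sort(rng.uniform(1, 10, n))           # observations ordered by X
y = 1 + 2 * x + rng.normal(scale=x, size=n)  # variance rises with x

def rss(xs, ys):
    X = np.column_stack([np.ones(len(xs)), xs])
    b, *_ = np.linalg.lstsq(X, ys, rcond=None)
    e = ys - X @ b
    return e @ e

half = (n - c) // 2                 # 50 observations in each sub-sample
rss1 = rss(x[:half], y[:half])      # low-variance half
rss2 = rss(x[-half:], y[-half:])    # high-variance half
df = half - 2
F = (rss2 / df) / (rss1 / df)
# An F well above the 5% critical value signals heteroskedasticity
print(F, stats.f.ppf(0.95, df, df))
```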
Cont’d---
❖ The rationale of including these terms is that the variance may be
systematically correlated with either of the independent variables
linearly or non-linearly
• Test a hypothesis: H0 : α1 = α2 = ... = αn = 0
• Using the results from the auxiliary regression you can calculate
the test statistics and compare it with chi-square value:
• If the calculated value is greater than the tabulated χ2 value at the chosen
level of significance, reject the null hypothesis of homoscedasticity
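The auxiliary-regression test just described can be sketched in the Breusch-Pagan/White style (simulated, hypothetical data): regress the squared residuals on the regressor and its square, form n·R2, and compare it with a χ2 critical value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n = 300
x = rng.uniform(1, 10, n)
y = 1 + 2 * x + rng.normal(scale=x, size=n)     # heteroscedastic errors
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e2 = (y - X @ b) ** 2                           # squared residuals
# Auxiliary regression: squared residuals on x and x^2 (White-style terms)
Z = np.column_stack([np.ones(n), x, x**2])
g, *_ = np.linalg.lstsq(Z, e2, rcond=None)
r2 = 1 - np.sum((e2 - Z @ g) ** 2) / np.sum((e2 - e2.mean()) ** 2)
lm = n * r2                                     # ~ chi2 with 2 df under H0
print(lm, stats.chi2.ppf(0.95, df=2))
```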
Remedies for Heteroskedasticity
• While heteroscedasticity is the property of disturbances, the
above tests deal with residuals.
• Hence, they may not report the genuine heteroscedasticity.
• Diagnostic results against homoscedasticity could be due to
misspecification of the model (Impure)
• But, if we are sure that there is a genuine (pure)
heteroscedasticity problem, we can deal with it using
heteroscedasticity-consistent estimation or Weighted Least
Squares (WLS)/GLS (redefining the variables)
❖ Weighted Least Squares
• Suppose we want to estimate: yi = β1 + β2x2i + ei
where ei is heteroscedastic, that is, var(ei) = σi2
• If we transform ei by dividing it by σi, we get ei* = ei/σi, whose variance is unity:
var(ei*) = var(ei/σi) = (1/σi2)·var(ei) = σi2/σi2 = 1
Cont’d---
yi/σi = β1(1/σi) + β2(x2i/σi) + ei*
⇒ yi* = β1x1i* + β2x2i* + ei*
• The transformed disturbance term has constant variance
• WLS or GLS is OLS on the transformed variables that
satisfy the standard least squares assumptions
• Hence, we can transform and apply OLS and the estimators
are the Generalized Least Squares (GLS) estimators
• The estimators thus obtained are known as GLS estimators,
and it is these estimators that are BLUE
• In applied research, econometricians usually assume that the
variance of ui changes in proportion to the square of the
explanatory variable
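The WLS transformation can be sketched under that common assumption, var(ei) proportional to xi2 so that σi = xi (simulated, hypothetical values): dividing every observation through by σi yields a homoscedastic regression to which OLS applies.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 400
x = rng.uniform(1, 10, n)
sigma = x                                   # assumed: var(e_i) proportional to x_i^2
y = 1 + 2 * x + rng.normal(scale=sigma, size=n)
# Transform each observation by dividing through by sigma_i
Xw = np.column_stack([1 / sigma, x / sigma])  # columns for b1 and b2
yw = y / sigma
b_wls, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
print(b_wls)  # estimates of (b1, b2)
```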
Serial Correlation
• OLS assumes no serial correlation
– ui and uj are independent for all i ≠ j
• However, in cross-section analysis, residuals are likely to be
correlated across individuals: e.g. common shocks
• Autocorrelation occurs in time-series studies when the errors
associated with a given time period carry over into future time
periods.
➢ For example, if we are predicting the growth of stock dividends, an
overestimate in one year is likely to lead to overestimates in succeeding
years.
• It is likely that such data exhibit intercorrelation, especially if the time
interval between successive observations is short, such as weeks or days.
➢ In time series analysis, today’s error is likely to be related to (correlated
with) yesterday’s residual
• The GLS estimator based on first differences and its variance are:
β̂2,GLS = Σt=2..n (xt − xt−1)(yt − yt−1) / Σt=2..n (xt − xt−1)2
Var(β̂2,GLS) = σ2 / Σt=2..n (xt − xt−1)2
Causes of Autocorrelation
1. Inertia - Macroeconomics data experience cycles/business cycles.
2. Specification Bias- Excluded variable
➢ Appropriate equation: Yt = β1 + β2X2t + β3X3t + β4X4t + ut
➢ Estimated equation: Yt = β1 + β2X2t + β3X3t + vt
➢ Estimating the second equation implies: vt = β4X4t + ut
3. Specification Bias- Incorrect Functional Form
➢ Appropriate equation: Yt = β1 + β2X2t + β3X2t² + vt
➢ Estimated equation: Yt = β1 + β2X2t + ut
➢ Estimating the second equation implies: ut = β3X2t² + vt
4. Cobweb Phenomenon
➢ In agricultural market, the supply reacts to price with a lag of one time period
because supply decisions take time to implement. This is known as the
cobweb phenomenon.
➢ Supplyt = β0 + β1Pt−1 + ut
Cont’d---
5. Lags: Ct = β1 + β2Ct−1 + ut
➢ The above equation is known as autoregression because one of the
explanatory variables is the lagged value of the dependent variable.
➢ If you neglect the lagged term, the resulting error term will reflect a systematic
pattern due to the influence of lagged consumption on current consumption.
6. Data Manipulation
Yt = β1 + β2Xt + ut ; Yt−1 = β1 + β2Xt−1 + ut−1
ΔYt = β2ΔXt + vt
➢ This equation is known as the first difference form and dynamic regression
model. The previous equation is known as the level form.
➢ Note that the error term in the first equation is not autocorrelated but it can be
shown that the error term in the first difference form is autocorrelated
7. Nonstationarity
➢ When dealing with time series data, we should check whether the given time
series is stationary.
➢ A time series is stationary if its characteristics (e.g. mean, variance and
covariance) are time invariant; that is, they do not change over time.
➢ If that is not the case, we have a nonstationary time series.
Consequences of Autocorrelation
❖ In some cases, it can happen that OLS is BLUE despite autocorrelation.
But such cases are very rare
➢ OLS Estimation Allowing for Autocorrelation
▪ The estimator is no longer BLUE (autocorrelation does not affect
linearity, consistency and unbiasedness), but it is inefficient
▪ the confidence intervals are likely to be wider than those based on the
GLS procedure; thus, we are likely to declare a coefficient statistically
insignificant even though in fact it may be significant
▪ One should use GLS and not OLS
➢ OLS Estimation Disregarding Autocorrelation
▪ The estimated variance of the error is likely to underestimate the true variance
• With positive autocorrelation the standard errors are biased and too small.
• With negative autocorrelation the standard errors are biased and too large.
▪ Overestimate R-square
▪ Therefore, the usual t and F tests of significance are no longer valid, and if
applied, are likely to give seriously misleading conclusions about the
statistical significance of the estimated regression coefficients.
Detection of Autocorrelation
1. Graphical Method
There are various ways of examining the residuals.
✓ The time sequence plot can be produced.
✓ Alternatively, we can plot the standardized residuals against time. The
standardized residual is simply the residual divided by the standard
error of the regression.
✓ If the plot shows a pattern, then the errors may not be random.
✓ We can also plot the error term with its first lag.
2. The Durbin-Watson Test
d = Σt=2..n (ût − ût−1)2 / Σt=1..n ût2
✓ Expanding the numerator gives:
d = [Σût2 + Σût−12 − 2Σûtût−1] / Σût2
d ≈ 2(1 − ρ̂), where ρ̂ = Σûtût−1 / Σût2
Remedies for autocorrelation include:
– GLS
– ARIMA
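The Durbin-Watson statistic and its link to ρ̂ can be sketched with simulated AR(1) errors (hypothetical values, ρ = 0.7): positive autocorrelation pushes d well below 2, and d tracks 2(1 − ρ̂) closely.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 300
x = rng.normal(size=n)
# AR(1) errors with rho = 0.7 (positive autocorrelation)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + rng.normal()
y = 1 + 2 * x + u
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
d = np.sum(np.diff(e) ** 2) / np.sum(e**2)       # Durbin-Watson statistic
rho_hat = np.sum(e[1:] * e[:-1]) / np.sum(e**2)  # sample rho
print(d, 2 * (1 - rho_hat))  # d is close to 2(1 - rho_hat), well below 2 here
```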
Cont’d---
➢ The Method of GLS
Yt = β1 + β2Xt + ut, where ut = ρut−1 + εt, −1 < ρ < 1 (1)
▪ There are two cases: (1) when ρ is known and (2) when ρ is not known
1. When ρ is known
➢ If the regression holds at time t, it should hold at time t−1, i.e.
Yt−1 = β1 + β2Xt−1 + ut−1 (2)
➢ Multiplying the second equation by ρ gives
ρYt−1 = ρβ1 + ρβ2Xt−1 + ρut−1 (3)
➢ Subtracting (3) from (1) gives
Yt − ρYt−1 = β1(1 − ρ) + β2(Xt − ρXt−1) + εt, where εt = ut − ρut−1
Cont’d---
▪ The equation can be written as
Yt* = β1* + β2*Xt* + εt
➢ The error term satisfies all the OLS assumptions
➢ Thus we can apply OLS to the transformed variables Y* and X* and
obtain estimation with all the optimum properties, namely BLUE
➢ In effect, running this equation is the same as using the GLS.
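The known-ρ transformation above can be sketched with a simulation (hypothetical values, ρ = 0.8 taken as known): quasi-differencing removes the autocorrelation, and the Durbin-Watson statistic of the transformed residuals returns to around 2.

```python
import numpy as np

rng = np.random.default_rng(11)
n, rho = 400, 0.8                      # rho assumed known here
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = rho * u[t - 1] + rng.normal()
y = 1 + 2 * x + u                      # true b1 = 1, b2 = 2
# Generalized (quasi-) difference transformation, dropping the first observation
y_star = y[1:] - rho * y[:-1]
x_star = x[1:] - rho * x[:-1]
X = np.column_stack([np.full(n - 1, 1 - rho), x_star])  # intercept column is (1 - rho)
b, *_ = np.linalg.lstsq(X, y_star, rcond=None)
e = y_star - X @ b
d = np.sum(np.diff(e) ** 2) / np.sum(e**2)
print(b, d)  # Durbin-Watson of the transformed residuals is near 2
```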
2. When ρ is unknown, there are many ways to estimate it.
➢ If we assume that ρ = +1, the generalized difference equation reduces to
the first difference equation:
Yt − Yt−1 = β2(Xt − Xt−1) + (ut − ut−1)
ΔYt = β2ΔXt + εt
➢ The first difference transformation may be appropriate if the
coefficient of autocorrelation is very high, say in excess of 0.8, or
the Durbin-Watson d is quite low (Maddala’s rough rule of thumb)
Cont’d---
➢ Use the first difference form whenever d< R2.
Rule of thumb:
if d < R2 then estimate model in first difference form
yt = α + βxt + ut
yt−1 = α + βxt−1 + ut−1
yt − yt−1 = β(xt − xt−1) + (ut − ut−1)
so we can recover the regression coefficients (but not the intercept)
➢ There are many interesting features of the first difference equation
➢ There is no intercept in it. Thus, you have to use the
regression-through-the-origin routine
➢ If, however, an intercept term is included, this implies that the
original model has a linear trend in it.
➢ Thus, by including the intercept, one is testing for the presence of a
trend in the equation
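The first-difference transformation can be sketched as follows (simulated, hypothetical data with strongly autocorrelated errors): differencing and regressing through the origin recovers the slope, while the intercept is differenced away.

```python
import numpy as np

rng = np.random.default_rng(12)
n = 400
x = rng.normal(size=n).cumsum()            # trending regressor
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.9 * u[t - 1] + rng.normal()   # strong positive autocorrelation
y = 1 + 2 * x + u                          # true slope is 2
# First-difference form: regression through the origin, no intercept
dy, dx = np.diff(y), np.diff(x)
b2 = (dx @ dy) / (dx @ dx)
print(b2)  # estimate of the slope; the intercept cannot be recovered
```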