2 - Multiple Linear Model and OLS
Session 2
Multiple Regression Analysis: Estimation
Wooldridge Ch. 3
From Session 1
(2.3) avscore = β_0 + β_1 expend + β_2 avinc + u
to see the effect of per student spending (expend) and average family income (avinc) on the average test score. Suppose the primary interest is β_1. By including avinc explicitly, we are able to control for its effect on avscore. This is important because average family income tends to be correlated with per student spending. In the two-variable model with expend only, avinc would be included in the error term, which would likely be correlated with expend, causing the OLS estimator of β_1 in the two-variable model to be biased.
o The zero conditional mean assumption in model (2.3) becomes
(2.4) E(u | educ, exper) = 0, or E(u | x_1, x_2) = 0.
This implies that other factors affecting wage are not related, on average, to educ and exper. Therefore, if there is another factor, such as innate ability, that is part of u, then we need the average ability level to be the same across all combinations of educ and exper in the working population. This is what we mean by imposing a restriction or assumption such as (2.4), and in fact it may or may not be true. This is the question we need to ask in order to determine whether the method of OLS produces unbiased estimators.
o Consider the CEO salary (salary) equation, which is explained by firm sales (sales) and CEO tenure (tenure) with the firm:
(2.6) log(salary) = β_0 + β_1 log(sales) + β_2 ceoten + β_3 ceoten² + u.
This fits into the multiple regression model (with 3 independent variables) by defining y = log(salary), x_1 = log(sales), x_2 = ceoten, and x_3 = ceoten². The parameter β_1 is the (ceteris paribus) elasticity of salary with respect to sales. If β_3 = 0, then 100 β_2 is the (ceteris paribus) percentage increase in salary when ceoten increases by one year. When β_3 ≠ 0, the effect of ceoten on salary is more complicated.
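One way to make this concrete: differentiating (2.6) with respect to ceoten gives the approximate relationship
%Δsalary ≈ 100 (β_2 + 2 β_3 ceoten) Δceoten,
so the percentage effect of an additional year of tenure depends on the level of ceoten at which it is evaluated.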
The general multiple regression model can be written as
y = β_0 + β_1 x_1 + β_2 x_2 + … + β_k x_k + u,
where β_0 is the intercept and β_1, β_2, …, β_k are the parameters associated with x_1, x_2, …, x_k. There are k + 1 (unknown) population parameters.
The key assumption for the general multiple regression model is the conditional expectation
E(u | x_1, x_2, …, x_k) = 0.
The OLS estimates β̂_0, β̂_1, …, β̂_k are chosen to minimize the sum of squared residuals, which leads to the k + 1 first-order conditions
∑_{i=1}^n (y_i − β̂_0 − β̂_1 x_i1 − … − β̂_k x_ik) = 0
∑_{i=1}^n x_i1 (y_i − β̂_0 − β̂_1 x_i1 − … − β̂_k x_ik) = 0
⋮
∑_{i=1}^n x_ik (y_i − β̂_0 − β̂_1 x_i1 − … − β̂_k x_ik) = 0.
The OLS regression line (the fitted equation) is
ŷ = β̂_0 + β̂_1 x_1 + β̂_2 x_2 + … + β̂_k x_k.
In terms of changes, it can be written as
Δŷ = β̂_1 Δx_1 + β̂_2 Δx_2 + … + β̂_k Δx_k.
The coefficient β̂_1 measures the change in ŷ due to a one-unit increase in x_1, holding all other independent variables fixed; that is, Δŷ = β̂_1 Δx_1.
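As a numerical sketch (not from the lecture notes; all variable names and coefficient values are invented), the first-order conditions above are just the normal equations X′X β̂ = X′y. The snippet below solves them for a small simulated dataset with two regressors, checks that the residuals are orthogonal to each column of X, and shows that β̂_1 is the change in ŷ from a one-unit increase in x_1 holding x_2 fixed.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)            # x1 and x2 are correlated
u = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + u             # true coefficients: 1, 2, -1

X = np.column_stack([np.ones(n), x1, x2])     # design matrix with intercept
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # normal equations: X'X b = X'y
y_hat = X @ beta_hat                          # fitted values

# The first-order conditions: residuals are orthogonal to every column of X.
print(X.T @ (y - y_hat))                      # all entries close to zero
# beta_hat[1] is the change in y_hat from a one-unit increase in x1, x2 held fixed.
print(beta_hat)
```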
Statistical properties of the OLS estimator: the expected value of the OLS estimators.
Assumption MLR 1 (linear in parameters): the population model can be written as
y = β_0 + β_1 x_1 + β_2 x_2 + … + β_k x_k + u,
where β_1, β_2, …, β_k are the unknown parameters of interest and u is an unobservable random error or random disturbance term. The key feature is that the model is linear in the parameters β_1, β_2, …, β_k.
Could you tell whether MLR 3 (no perfect collinearity among the independent variables) is violated in the following regression equations?
avscore = β_0 + β_1 expend + β_2 avinc + u
cons = β_0 + β_1 inc + β_2 inc² + u
log(cons) = β_0 + β_1 log(inc) + β_2 log(inc²) + u
voteA = β_0 + β_1 expendA + β_2 expendB + β_3 totexpend + u
To avoid perfect collinearity, be careful in specifying the model. The solution to perfect collinearity is to change the specification of the model (e.g. change the problematic independent variable or drop it); see the short numerical check of the last equation below.
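As an illustrative check (not part of the original notes), consider the last equation: if totexpend is simply expendA + expendB, as the variable names suggest, then one regressor is an exact linear combination of the others, X has less than full column rank, and MLR 3 fails. A small simulation with made-up spending data makes this visible; dropping totexpend restores full rank.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
expendA = rng.uniform(10, 100, size=n)        # candidate A's spending (made up)
expendB = rng.uniform(10, 100, size=n)        # candidate B's spending (made up)
totexpend = expendA + expendB                 # exact linear combination of the two

X = np.column_stack([np.ones(n), expendA, expendB, totexpend])
print(np.linalg.matrix_rank(X))               # 3, not 4: perfect collinearity, MLR 3 fails

X_fixed = np.column_stack([np.ones(n), expendA, expendB])   # drop totexpend
print(np.linalg.matrix_rank(X_fixed))         # 3 = full column rank, MLR 3 holds
```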
(2.11) E(β̂_j) = β_j, j = 0, 1, …, k.
When we say that OLS is unbiased under MLR 1 ~ MLR 4, we mean that the
procedure by which the OLS estimates are obtained is unbiased when the procedure
is applied across all possible random samples.
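A small Monte Carlo sketch (with arbitrary, made-up coefficient values) illustrates what unbiasedness across random samples means: the OLS estimates vary from sample to sample, but their average over many simulated samples is close to the true parameters.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100, 2000
beta = np.array([1.0, 0.7, -0.3])             # true (beta_0, beta_1, beta_2), made up

estimates = np.empty((reps, 3))
for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.6 * x1 + rng.normal(size=n)
    u = rng.normal(size=n)                    # E(u | x1, x2) = 0 by construction
    y = beta[0] + beta[1] * x1 + beta[2] * x2 + u
    X = np.column_stack([np.ones(n), x1, x2])
    estimates[r] = np.linalg.lstsq(X, y, rcond=None)[0]

# Averaging the OLS estimates across all simulated samples recovers the true betas.
print(estimates.mean(axis=0))                 # approximately (1.0, 0.7, -0.3)
```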
Omitted variable bias: the simple case
Suppose we omit a variable that actually belongs in the true (or population)
model. This is the problem of excluding a relevant variable or underspecifying
the model. When important variables are omitted from the model, OLS
estimators are generally biased.
We will derive the direction and size of the bias. This is an example of
misspecification analysis.
Suppose the true population model is y = β_0 + β_1 x_1 + β_2 x_2 + u, but we regress y on x_1 only. The resulting slope estimator is
(2.12) β̃_1 = ∑_{i=1}^n (x_i1 − x̄_1) y_i / ∑_{i=1}^n (x_i1 − x̄_1)² = ∑_{i=1}^n (x_i1 − x̄_1) y_i / SST_1.
Substituting the true model for y_i in the numerator of (2.12) gives
(2.13) ∑_{i=1}^n (x_i1 − x̄_1) y_i = β_1 SST_1 + β_2 ∑_{i=1}^n (x_i1 − x̄_1) x_i2 + ∑_{i=1}^n (x_i1 − x̄_1) u_i.
If we divide (2.13) by SST_1, take the expectation conditional on the values of the independent variables, and use E(u_i) = 0, we obtain
(2.14) E(β̃_1) = β_1 + β_2 ∑_{i=1}^n (x_i1 − x̄_1) x_i2 / ∑_{i=1}^n (x_i1 − x̄_1)².
Thus E(β̃_1) ≠ β_1; β̃_1 is biased for β_1. The bias is
(2.15) E(β̃_1) − β_1 = β_2 ∑_{i=1}^n (x_i1 − x̄_1) x_i2 / ∑_{i=1}^n (x_i1 − x̄_1)² = β_2 δ̃_1,
where δ̃_1 is just the slope coefficient from the regression of x_2 on x_1:
(2.16) x̃_2 = δ̃_0 + δ̃_1 x_1.
Because we are conditioning on the sample values of both independent variables, δ̃_1 is not random, and we can write (2.15) as E(β̃_1) = β_1 + β_2 δ̃_1, i.e. the omitted variable bias is β_2 δ̃_1.
From (2.15) there are two cases where β̃_1 is unbiased.
o First, when β_2 = 0, that is, when x_2 does not appear in the true model, β̃_1 is unbiased.
o Second, when δ̃_1 = 0, β̃_1 is unbiased even if β_2 ≠ 0. Since δ̃_1 is the sample covariance between x_2 and x_1 over the sample variance of x_1, δ̃_1 = 0 if and only if x_2 and x_1 are uncorrelated in the sample. Thus we have an important conclusion: if x_2 and x_1 are uncorrelated in the sample, then β̃_1 is unbiased.
When x_1 and x_2 are correlated, δ̃_1 has the same sign as the correlation between x_1 and x_2.
The sign of the bias in β̃_1 depends on the signs of both β_2 and δ̃_1. The summary is below.
Summary of bias in β̃_1 when x_2 is omitted:
              Corr(x_1, x_2) > 0     Corr(x_1, x_2) < 0
  β_2 > 0     Positive bias          Negative bias
  β_2 < 0     Negative bias          Positive bias
The size of the bias in β̃_1 is determined by the sizes of β_2 and δ̃_1. A small bias need not be a cause for concern.
Example. Consider the wage equation wage = β_0 + β_1 educ + β_2 abil + u, where innate ability (abil) is not observed, so we estimate the simple regression of wage on educ and obtain β̃_1.
o We expect β_2 to be positive: more ability leads to higher productivity and therefore higher wages.
o It is reasonable to believe that educ and abil are positively related. On average, people with more innate ability choose higher levels of education.
o Thus β̃_1 is likely to be biased. On average, β̃_1 is too large compared to β_1 (an overestimation). We say that β̃_1 has an upward bias.
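A simulation sketch (all coefficient values are invented for illustration) reproduces this upward bias: because β_2 > 0 and educ and abil are positively correlated, the short regression's slope exceeds β_1 on average, and the average bias matches β_2 δ̃_1 from (2.15).

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 200, 2000
b0, b1, b2 = 1.0, 2.0, 0.5                   # invented true coefficients on educ and abil

short_slopes = np.empty(reps)
implied_bias = np.empty(reps)
for r in range(reps):
    educ = rng.normal(12, 2, size=n)
    abil = 0.5 * educ + rng.normal(size=n)   # abil positively correlated with educ
    wage = b0 + b1 * educ + b2 * abil + rng.normal(size=n)

    # Short regression: wage on educ only (abil omitted); the slope is beta_1 tilde.
    X1 = np.column_stack([np.ones(n), educ])
    short_slopes[r] = np.linalg.lstsq(X1, wage, rcond=None)[0][1]

    # delta_1 tilde: slope from regressing abil on educ, as in (2.16).
    delta1 = np.linalg.lstsq(X1, abil, rcond=None)[0][1]
    implied_bias[r] = b2 * delta1

print(short_slopes.mean() - b1)              # average upward bias, roughly 0.25
print(implied_bias.mean())                   # beta_2 * delta_1 tilde, also roughly 0.25
```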
The expected value of the OLS estimator gives information about the central tendency of β̂_j, while the variance gives a measure of the spread in its sampling distribution. To formulate the variance of the OLS estimator, we need a homoskedasticity assumption (MLR 5): Var(u | x_1, …, x_k) = σ².
Theorem 2 Sampling variance of the OLS slope estimators (stated without proof).
Under assumptions MLR 1 through MLR 5, the sampling variance of the OLS slope estimator, conditional on the sample values of the independent variables, is
(2.16) Var(β̂_j) = σ² / [SST_j (1 − R_j²)]
for j = 1, 2, …, k, where SST_j = ∑_{i=1}^n (x_ij − x̄_j)² is the total sample variation in x_j and R_j² is the R-squared from regressing x_j on all other independent variables (including an intercept).
Thus the standard deviation of β̂_j is sd(β̂_j) = σ / [SST_j (1 − R_j²)]^(1/2); replacing σ with its estimate σ̂ gives the standard error se(β̂_j) = σ̂ / [SST_j (1 − R_j²)]^(1/2).
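To make formula (2.16) concrete, the sketch below (simulated data, arbitrary values) computes Var(β̂_1) both from σ²(X′X)⁻¹ and from σ² / [SST_1 (1 − R_1²)], where R_1² comes from regressing x_1 on the remaining regressors; the two numbers coincide.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
sigma2 = 2.0                                  # true error variance, known in a simulation
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)            # correlated regressors
X = np.column_stack([np.ones(n), x1, x2])

# Under MLR 1-5, Var(beta_hat | X) = sigma^2 (X'X)^{-1}; take the x1 entry.
print(sigma2 * np.linalg.inv(X.T @ X)[1, 1])

# The same number from (2.16): sigma^2 / [SST_1 (1 - R_1^2)], where R_1^2
# comes from regressing x1 on the other regressors (constant and x2).
sst1 = np.sum((x1 - x1.mean()) ** 2)
Z = np.column_stack([np.ones(n), x2])
x1_hat = Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
r2_1 = 1.0 - np.sum((x1 - x1_hat) ** 2) / sst1
print(sigma2 / (sst1 * (1.0 - r2_1)))
```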
Under assumptions MLR 1 through MLR 5, β̂_1, β̂_2, …, β̂_k are the best linear unbiased estimators (BLUE) of β_1, β_2, …, β_k, respectively (the Gauss-Markov theorem).
An estimator β̃_j of β_j is linear if it can be written as β̃_j = ∑_{i=1}^n w_ij y_i, where w_ij can be a function of the sample values of all the independent variables. The OLS estimator satisfies this.
“Best” means the smallest variance. For any estimator β̃_j that is linear and unbiased, Var(β̂_j) ≤ Var(β̃_j). In other words, in the class of linear unbiased estimators, OLS has the smallest variance (under the Gauss-Markov assumptions).
The message is that, when the standard set of assumptions holds, no other unbiased and
linear estimator is better than OLS. If any of the Gauss-Markov assumptions fail, the
theorem no longer holds.
Failure of MLR 4 (Zero conditional mean) causes OLS to be biased, and thus
theorem 4 fails.
Failure of MLR 5 (Homoskedasticity) does not cause OLS to be biased, but
OLS no longer has the smallest variance among linear unbiased estimators.
Multicollinearity
Equation (2.16), Var(β̂_j) = σ² / [SST_j (1 − R_j²)], shows that the variance of β̂_j depends on three factors: σ², SST_j, and R_j².
The error variance σ². Again, σ² is unknown. The unbiased estimator of σ² is
(2.17) σ̂² = ∑_{i=1}^n û_i² / (n − k − 1) = SSR / (n − k − 1),
where (n − k − 1) is the degrees of freedom for the general OLS problem with n observations and k independent variables.
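A short sketch (simulated data, invented coefficients) computes σ̂² directly from the OLS residuals as in (2.17); with a true error standard deviation of 1.5, the estimate should be close to 2.25.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 150, 2                                  # n observations, k regressors
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 - 0.2 * x2 + rng.normal(scale=1.5, size=n)   # true sigma = 1.5

X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat                       # OLS residuals u_hat
sigma2_hat = np.sum(resid ** 2) / (n - k - 1)  # SSR / (n - k - 1), as in (2.17)
print(sigma2_hat)                              # close to 1.5 ** 2 = 2.25
```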
The larger σ² is, the larger the variance of the OLS estimator will be. σ² is a feature of the population, so it has nothing to do with the sample size; you cannot reduce σ² by increasing the sample size. One way to reduce the error variance is to add more explanatory variables to the equation (take some factors out of the error term).
The linear relationship among the independent variables, R_j².
The term R_j² here is different from R². R_j² is obtained from a regression of x_j on the other independent variables. The value of R_j² reflects the proportion of the total variation in x_j that can be explained by the other independent variables appearing in the equation. A high R_j² means that x_j is linearly related, to a high degree, with the other independent variables; in other words, a high R_j² means x_j is highly correlated with the other independent variables.
High (but not perfect) correlation between two or more independent variables is called multicollinearity.
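A brief sketch (made-up variables) of how R_j² can be computed in practice: regress each x_j on the other regressors and form 1 / (1 − R_j²), sometimes called the variance inflation factor; a value near 1 indicates little collinearity, while a large value flags a nearly collinear regressor.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 400
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=n)      # nearly collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

def r_squared_j(X, j):
    """R_j^2 from regressing column j on the other columns plus an intercept."""
    xj = X[:, j]
    others = np.column_stack([np.ones(len(xj)), np.delete(X, j, axis=1)])
    fitted = others @ np.linalg.lstsq(others, xj, rcond=None)[0]
    return 1.0 - np.sum((xj - fitted) ** 2) / np.sum((xj - xj.mean()) ** 2)

for j in range(X.shape[1]):
    r2 = r_squared_j(X, j)
    print(j, round(r2, 3), round(1.0 / (1.0 - r2), 1))   # R_j^2 and 1/(1 - R_j^2)
```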
If you worry about high degrees of correlation among the independent variables in the sample,
- you can drop the independent variables that you think are creating the multicollinearity. However, you need to be careful not to drop a variable that belongs in the population model, because that can lead to bias, a much more serious problem (since it violates an OLS assumption) than multicollinearity;
- some multicollinearity can be reduced by collecting more data;
- you may wish to rethink the specification of the model.