
Econometrics 1

Session 2
Multiple Regression Analysis: Estimation
Wooldridge Ch. 3

From Session 1

 We have learned how to use simple regression analysis to explain a dependent variable y as a function of a single independent variable x.
 The primary drawback: it is very difficult to draw ceteris paribus conclusions about how x affects y; the key assumption SLR 3 (that all other factors affecting y are uncorrelated with x) is often unrealistic.
 Multiple regression analysis allows us to explicitly control for many other factors that simultaneously affect the dependent variable.
 If we add more factors to our model that are useful for explaining y, then more of the variation in y can be explained. Multiple regression analysis can therefore be used to build a better model for predicting the dependent variable.

Motivation for multiple regression


1. To make the error u uncorrelated with the explanatory variables.
o Consider the wage equations:
(2.1)  wage = β0 + β1 educ + u, and
(2.2)  wage = β0 + β1 educ + β2 exper + v
In (2.1) wage is determined by years of education, and other factors are contained in u. Equation (2.2) takes exper out of the error term and puts it explicitly in the equation. The coefficient β2 measures the ceteris paribus effect of exper on wage.
Note that if exper and educ are to some extent correlated, assuming that u (which still contains exper) is uncorrelated with educ in the simple regression analysis would be tenuous.
o Consider the student average score equation:
(2.3)  avscore = β0 + β1 expend + β2 avinc + u
to see the effect of per-student spending (expend) and average family income (avinc) on the average test score. Suppose the primary interest is β1. By including avinc explicitly, we are able to control for its effect on avscore. This is important because average family income tends to be correlated with per-student spending. In the two-variable model with expend only, avinc would be included in the error term, which would likely be correlated with expend, causing the OLS estimator of β1 in the two-variable model to be biased.
o The zero conditional mean assumption in model (2.2) becomes
(2.4)  E(u | educ, exper) = 0, or E(u | x1, x2) = 0.
This implies that other factors affecting wage are not related on average to educ and exper. Therefore, if there is another factor such as innate ability that is part of u, then we need the average ability level to be the same across all combinations of educ and exper in the working population. This is what we mean by imposing a restriction or assumption such as (2.4), and in fact it may or may not be true. This is the question we need to ask in order to determine whether the method of OLS produces unbiased estimators.

2. To generalize the functional relationship between variables

o Consider family consumption (cons) as a quadratic function of family income (inc):
(2.5)  cons = β0 + β1 inc + β2 inc² + u
Consumption in this model depends on only one observed factor, but the model differs from simple regression because it has three parameters, which leads to a different interpretation of the parameters. With this model, the marginal propensity to consume (i.e., the change in consumption with respect to a change in income) is approximated by
Δcons/Δinc ≈ β1 + 2 β2 inc
In this case the MPC depends on β1, β2, and the level of income (see the sketch after these examples).

o Consider the CEO salary (salary) equation, in which salary is explained by firm sales (sales) and CEO tenure with the firm (ceoten):
(2.6)  log(salary) = β0 + β1 log(sales) + β2 ceoten + β3 ceoten² + u.
This fits into the multiple regression model (with three independent variables) by defining y = log(salary), x1 = log(sales), x2 = ceoten and x3 = ceoten². The parameter β1 is the (ceteris paribus) elasticity of salary with respect to sales. If β3 = 0, then 100·β2 is the (ceteris paribus) percentage increase in salary when ceoten increases by one year. When β3 ≠ 0, the effect of ceoten on salary is more complicated.
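Because the marginal effects in (2.5) and (2.6) depend on the level of a regressor, a small numerical sketch can make this concrete. The coefficient values below are hypothetical, chosen only to illustrate the approximation Δcons/Δinc ≈ β1 + 2β2 inc from (2.5) and the analogous approximation Δlog(salary)/Δceoten ≈ β2 + 2β3 ceoten for (2.6); they are not estimates from any dataset.

```python
# Approximate marginal effects in the two quadratic models above,
# using hypothetical coefficient values chosen only for illustration.

def mpc(inc, b1=0.70, b2=-0.002):
    """MPC in model (2.5): d(cons)/d(inc) ≈ b1 + 2*b2*inc (hypothetical b1, b2)."""
    return b1 + 2 * b2 * inc

def ceoten_effect(ceoten, b2=0.02, b3=-0.0005):
    """Approximate percent change in salary per extra year of tenure in model (2.6):
    100 * d(log(salary))/d(ceoten) ≈ 100 * (b2 + 2*b3*ceoten) (hypothetical b2, b3)."""
    return 100 * (b2 + 2 * b3 * ceoten)

print(mpc(10), mpc(50))                      # MPC falls as income rises when b2 < 0
print(ceoten_effect(0), ceoten_effect(10))   # effect of tenure varies with tenure
```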

 In the population, the multiple linear regression model can be written as

(2.7)  y = β0 + β1 x1 + β2 x2 + … + βk xk + u

where β0 is the intercept and β1, β2, …, βk are the parameters associated with x1, x2, …, xk. There are k + 1 (unknown) population parameters.

 The key assumption for the general multiple regression model concerns the conditional expectation of the error:

(2.8)  E(u | x1, x2, …, xk) = 0

Equation (2.8) requires that all factors in the unobserved error term be uncorrelated with the explanatory variables. It also means that the functional relationships between the explained and explanatory variables are correctly specified. Assumption (2.8) is what makes OLS unbiased. Any problem that causes u to be correlated with any of the independent variables causes (2.8) to fail. Example: when a key variable is left out of the equation (the case of an omitted variable), the OLS estimators are generally biased.

Computational and algebraic features of OLS and interpretation of estimates


 The OLS estimates are the solution to the problem

(2.9)  min over β̂0, β̂1, …, β̂k of Σᵢ₌₁ⁿ (yᵢ − β̂0 − β̂1 xi1 − … − β̂k xik)²

Taking the partial derivatives with respect to each of the β̂j, j = 0, 1, …, k, evaluating them at the solutions, and setting them equal to zero gives k + 1 linear equations in the k + 1 unknowns β̂0, β̂1, …, β̂k:

(2.10)  Σᵢ₌₁ⁿ (yᵢ − β̂0 − β̂1 xi1 − … − β̂k xik) = 0
        Σᵢ₌₁ⁿ xi1 (yᵢ − β̂0 − β̂1 xi1 − … − β̂k xik) = 0
        Σᵢ₌₁ⁿ xi2 (yᵢ − β̂0 − β̂1 xi1 − … − β̂k xik) = 0
        ⋮
        Σᵢ₌₁ⁿ xik (yᵢ − β̂0 − β̂1 xi1 − … − β̂k xik) = 0
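In matrix form (not used explicitly in these notes), the k + 1 equations in (2.10) say that X′(y − Xβ̂) = 0, where X stacks a column of ones and the regressors; hence β̂ solves X′Xβ̂ = X′y. A minimal sketch with simulated data, purely for illustration:

```python
import numpy as np

# Simulated data for illustration only (not from any dataset in the notes).
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
u = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + u            # true parameters chosen arbitrarily

X = np.column_stack([np.ones(n), x1, x2])    # first column of ones -> intercept
beta_hat = np.linalg.solve(X.T @ X, X.T @ y) # solves the normal equations (2.10)
print(beta_hat)                              # approximately [1, 2, -1]
```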

 Interpretation of the OLS sample regression function ŷ = β̂0 + β̂1 x1 + β̂2 x2:
The intercept β̂0 is the predicted value of y when x1 = 0 and x2 = 0.
β̂1 and β̂2 have partial effect, or ceteris paribus, interpretations. We can write Δŷ = β̂1 Δx1 + β̂2 Δx2. When x2 is held fixed, so that Δx2 = 0, then Δŷ = β̂1 Δx1. Similarly, when x1 is held fixed, so that Δx1 = 0, then Δŷ = β̂2 Δx2. Note that the intercept has nothing to do with the changes in y.
 The case with more than two independent variables is similar. When the OLS regression line is
ŷ = β̂0 + β̂1 x1 + β̂2 x2 + … + β̂k xk,
in terms of changes it can be written as
Δŷ = β̂1 Δx1 + β̂2 Δx2 + … + β̂k Δxk.
The coefficient β̂1 measures the change in ŷ due to a one-unit increase in x1, holding all other independent variables fixed; that is, Δŷ = β̂1 Δx1.
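A tiny numerical illustration of this partial-effect reading; the fitted coefficients below are hypothetical, not estimates from any data:

```python
# Hypothetical fitted coefficients for y-hat = b0 + b1*x1 + b2*x2.
b0, b1, b2 = 4.0, 1.5, -0.8

def y_hat(x1, x2):
    return b0 + b1 * x1 + b2 * x2

# Increase x1 by one unit while holding x2 fixed: y-hat changes by exactly b1.
print(y_hat(3, 10) - y_hat(2, 10))   # 1.5 = b1
# Increase x2 by one unit while holding x1 fixed: y-hat changes by exactly b2.
print(y_hat(2, 11) - y_hat(2, 10))   # -0.8 = b2
```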

 The OLS residuals have some important properties:

1. The sample average of the residuals is zero.
2. The sample covariance between each independent variable and the OLS residuals û is zero. Consequently, the sample covariance between the OLS fitted values ŷ and the OLS residuals is zero.
These properties are immediate consequences of the set of equations used to obtain
the OLS estimates.
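These two properties follow from the first-order conditions (2.10) and can be checked numerically. A minimal sketch with simulated, purely illustrative data:

```python
import numpy as np

# Simulated data for illustration only.
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat                 # OLS residuals
y_hat = X @ beta_hat                     # OLS fitted values

print(u_hat.mean())                      # property 1: essentially zero
print(X[:, 1:].T @ u_hat / n)            # property 2: sample cov with each x_j ~ 0
print(np.cov(y_hat, u_hat)[0, 1])        # consequence: cov(fitted values, residuals) ~ 0
```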

Statistical properties of the OLS estimators: The expected value of the OLS estimators

o Assumption MLR 1 (Linear in parameters)

The model in the population can be written as
y = β0 + β1 x1 + β2 x2 + … + βk xk + u
where β0, β1, β2, …, βk are the unknown parameters of interest and u is an unobservable random error or random disturbance term. The key feature is that the model is linear in the parameters β0, β1, β2, …, βk.

o Assumption MLR 2 (Random sampling)

We have a random sample of n observations, {(xi1, xi2, …, xik, yi) : i = 1, 2, …, n}, from the population model in MLR 1.

o Assumption MLR 3 (No perfect collinearity)


In the sample (and therefore in the population), none of the independent variables is constant, and there are no exact linear relationships among the independent variables.
 This assumption concerns only the independent variables; it says
nothing about the relationship between u and the explanatory variables.
 If an independent variable is an exact linear combination of the other
independent variables, then the model suffers from perfect collinearity and it
cannot be estimated by OLS.
 This assumption does allow the independent variables to be correlated;
they just cannot be perfectly correlated.

 Could you tell whether MLR 3 is violated in the following regression equations? (See the sketch after this list.)
   avscore = β0 + β1 expend + β2 avinc + u
   cons = β0 + β1 inc + β2 inc² + u
   log(cons) = β0 + β1 log(inc) + β2 log(inc²) + u
   voteA = β0 + β1 expendA + β2 expendB + β3 totexpend + u
 To avoid perfect collinearity, be careful in specifying the model.
 The solution to perfect collinearity: change the specification of the model (e.g. change the problematic independent variable or drop it).
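One way to see perfect collinearity numerically: if, as in Wooldridge's voting-outcome example, totexpend is defined as expendA + expendB, then one regressor is an exact linear combination of the others, the design matrix does not have full column rank, and OLS cannot be computed. (Similarly, log(inc²) = 2 log(inc) is an exact linear relationship, whereas inc and inc² are not exactly linearly related.) A minimal sketch with simulated, purely illustrative data:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
expendA = rng.uniform(0, 100, n)
expendB = rng.uniform(0, 100, n)
totexpend = expendA + expendB              # exact linear combination of the other two

X = np.column_stack([np.ones(n), expendA, expendB, totexpend])
print(np.linalg.matrix_rank(X))            # 3, not 4: perfect collinearity (MLR 3 fails)

# Dropping the redundant regressor restores full column rank, so OLS can be computed.
X_ok = np.column_stack([np.ones(n), expendA, expendB])
print(np.linalg.matrix_rank(X_ok))         # 3 = number of columns
```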

o Assumption MLR 4 (Zero conditional mean)

The error u has an expected value of zero, given any values of the independent variables, i.e.,
E(u | x1, x2, …, xk) = 0.
This assumption restricts the relationship between the unobservables in u and the explanatory variables.
Assumption MLR 4 can fail if:

 The functional relationship between y and the x_j is misspecified (e.g. using the level of y when log(y) is what actually appears in the population model).
 An important factor that is correlated with any of x1, x2, …, xk is omitted.
 There is measurement error in an explanatory variable.

When assumption MLR 4 holds, we have exogenous explanatory variables. If x_j is correlated with u for any reason, then x_j is said to be an endogenous explanatory variable.
Unfortunately, we can never know for sure whether MLR 4 is violated. We usually have to assume, or argue, that our model does not violate MLR 4, which makes it a critical assumption.

Theorem 1 (Unbiasedness of OLS). Under assumptions MLR 1 through MLR 4, the OLS estimators are unbiased estimators of the population parameters, i.e.

(2.11)  E(β̂j) = βj,   j = 0, 1, …, k

When we say that OLS is unbiased under MLR 1 ~ MLR 4, we mean that the procedure by which the OLS estimates are obtained is unbiased when the procedure is applied across all possible random samples.
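A small Monte Carlo sketch of what "unbiased across all possible random samples" means. The data-generating process below is hypothetical and satisfies MLR 1 through MLR 4 by construction; averaging the OLS estimates over many random samples gives values close to the chosen true parameters.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 100, 5000
beta = np.array([1.0, 0.5, -0.3])            # hypothetical true parameters

estimates = np.empty((reps, 3))
for r in range(reps):
    X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
    y = X @ beta + rng.normal(size=n)        # MLR 1-4 hold by construction
    estimates[r] = np.linalg.lstsq(X, y, rcond=None)[0]

print(estimates.mean(axis=0))                # close to [1.0, 0.5, -0.3]
```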

Omitted variable bias: the simple case
 Suppose we omit a variable that actually belongs in the true (or population)
model. This is the problem of excluding a relevant variable or underspecifying
the model. When important variables are omitted from the model, OLS
estimators are generally biased.
 We will derive the direction and size of the bias. This is an example of
misspecification analysis.

 Assume the true population model is y = β0 + β1 x1 + β2 x2 + u, which satisfies MLR 1 ~ MLR 4. But instead of the true model, we run the simple regression of y on x1 only, ỹ = β̃0 + β̃1 x1. Thus, we estimate β1 by

(2.12)  β̃1 = Σᵢ₌₁ⁿ (xi1 − x̄1) yi / Σᵢ₌₁ⁿ (xi1 − x̄1)² = Σᵢ₌₁ⁿ (xi1 − x̄1) yi / SST1.

When the true model yi = β0 + β1 xi1 + β2 xi2 + ui is plugged into (2.12), the numerator becomes

(2.13)  Σᵢ₌₁ⁿ (xi1 − x̄1)(β0 + β1 xi1 + β2 xi2 + ui)
        = β1 Σᵢ₌₁ⁿ (xi1 − x̄1)² + β2 Σᵢ₌₁ⁿ (xi1 − x̄1) xi2 + Σᵢ₌₁ⁿ (xi1 − x̄1) ui
        = β1 SST1 + β2 Σᵢ₌₁ⁿ (xi1 − x̄1) xi2 + Σᵢ₌₁ⁿ (xi1 − x̄1) ui

If we divide (2.13) by SST1, take the expectation conditional on the values of the independent variables, and use E(ui) = 0, we obtain

(2.14)  E(β̃1) = β1 + β2 · Σᵢ₌₁ⁿ (xi1 − x̄1) xi2 / Σᵢ₌₁ⁿ (xi1 − x̄1)²

Thus E(β̃1) ≠ β1 in general; β̃1 is biased for β1. The bias is

(2.15)  E(β̃1) − β1 = β2 · Σᵢ₌₁ⁿ (xi1 − x̄1) xi2 / Σᵢ₌₁ⁿ (xi1 − x̄1)² = β2 δ̃1

where δ̃1 is just the slope coefficient from the simple regression of x2 on x1:
x̃2 = δ̃0 + δ̃1 x1

Because we are conditioning on the sample values of both independent variables, δ̃1 is not random, and (2.15) gives the omitted variable bias.
 From (2.15) there are two cases in which β̃1 is unbiased.
o First, when β2 = 0, that is, when x2 does not appear in the true model, β̃1 is unbiased.
o Second, when δ̃1 = 0, β̃1 is unbiased even if β2 ≠ 0. Since δ̃1 is the sample covariance between x2 and x1 over the sample variance of x1, δ̃1 = 0 if and only if x2 and x1 are uncorrelated in the sample. Thus, we have the important conclusion that if x2 and x1 are uncorrelated in the sample, then β̃1 is unbiased.

 When x1 and x2 are correlated, δ̃1 has the same sign as the correlation between x1 and x2.
 The sign of the bias in β̃1 depends on the signs of both β2 and δ̃1. The summary is below.

Summary of bias in β̃1 when x2 is omitted:
              Corr(x1, x2) > 0    Corr(x1, x2) < 0
   β2 > 0     positive bias       negative bias
   β2 < 0     negative bias       positive bias

 The size of the bias in β̃1 is determined by the sizes of β2 and δ̃1. A small bias need not be a cause for concern.

Example.

Suppose wage is determined by
wage = β0 + β1 educ + β2 abil + u
Since ability is not observed, we instead estimate the model
wage = β0 + β1 educ + v
where now v = β2 abil + u. Call the estimator of β1 from the simple regression of wage on educ β̃1. Do you think that β̃1 is unbiased? If you think it is biased, can you tell whether β̃1 on average underestimates or overestimates β1?

o We expect β2 to be positive: more ability leads to higher productivity and therefore higher wages.
o It is reasonable to believe that educ and abil are positively related. On average, people with more innate ability choose higher levels of education.
o Thus β̃1 is likely to be biased. On average, β̃1 is too large compared to β1 (an overestimation). We say that β̃1 has an upward bias. (See the simulation sketch below.)
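A small simulation sketch of this upward bias. The data-generating process below is hypothetical: the coefficients and the positive educ-ability relationship are chosen only for illustration, not estimated from any real wage data.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 500, 2000
b0, b1, b2 = 1.0, 0.08, 0.05            # hypothetical true parameters (b2 > 0)

short_reg_slopes = np.empty(reps)
for r in range(reps):
    abil = rng.normal(size=n)
    educ = 12 + 2 * abil + rng.normal(size=n)     # educ positively related to ability
    wage = b0 + b1 * educ + b2 * abil + rng.normal(scale=0.5, size=n)
    # Short regression of wage on educ only (ability omitted):
    short_reg_slopes[r] = np.cov(educ, wage)[0, 1] / np.var(educ, ddof=1)

print(short_reg_slopes.mean())   # noticeably larger than b1 = 0.08: upward bias
```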

Statistical properties of the OLS estimators: Variance of the OLS estimators

The expected value of the OLS estimator gives information about the central tendency of β̂j, while the variance gives a measure of the spread of its sampling distribution. To formulate the variance of the OLS estimator, we need a homoskedasticity assumption.

Assumption MLR 5 (Homoskedasticity).

Var(u | x1, x2, …, xk) = σ²

The variance of the error term u, conditional on the explanatory variables, is the same for all values of the explanatory variables.

Assumptions MLR 1 through MLR 5 are collectively known as the Gauss-Markov assumptions (for cross-sectional regression).

Theorem 2 (Sampling variance of the OLS slope estimators; stated without proof).
Under assumptions MLR 1 through MLR 5, the sampling variance of the OLS slope estimator, conditional on the sample values of the independent variables, is

(2.16)  Var(β̂j) = σ² / [SSTj (1 − Rj²)]

for j = 1, 2, …, k, where SSTj = Σᵢ₌₁ⁿ (xij − x̄j)² is the total sample variation in xj, and Rj² is the R-squared from regressing xj on all other independent variables (including an intercept).

Theorem 3 (Unbiased estimation of σ²; stated without proof).

Under assumptions MLR 1 through MLR 5, E(σ̂²) = σ².

Thus the standard deviation of β̂j is sd(β̂j) = σ / [SSTj (1 − Rj²)]^(1/2); replacing the unknown σ with σ̂ gives the standard error se(β̂j) = σ̂ / [SSTj (1 − Rj²)]^(1/2).
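As a numerical check on (2.16), using simulated and purely illustrative data: the formula σ̂²/[SSTj(1 − Rj²)], with σ̂² = SSR/(n − k − 1) as defined in (2.17) below, matches the corresponding diagonal element of the standard matrix expression σ̂²(X′X)⁻¹, which is not used in these notes but is equivalent.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 300, 2
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)                 # x1 and x2 correlated
X = np.column_stack([np.ones(n), x1, x2])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
sigma2_hat = u_hat @ u_hat / (n - k - 1)           # sigma-hat^2 = SSR/(n - k - 1), eq. (2.17)

# Var(beta_hat_1) via formula (2.16): regress x1 on the other regressors to get R_1^2.
Z = np.column_stack([np.ones(n), x2])
resid1 = x1 - Z @ np.linalg.solve(Z.T @ Z, Z.T @ x1)
r1_sq = 1 - resid1 @ resid1 / np.sum((x1 - x1.mean()) ** 2)
sst1 = np.sum((x1 - x1.mean()) ** 2)
var_b1_formula = sigma2_hat / (sst1 * (1 - r1_sq))

# Same quantity from the matrix expression sigma-hat^2 * (X'X)^{-1}.
var_b1_matrix = (sigma2_hat * np.linalg.inv(X.T @ X))[1, 1]
print(var_b1_formula, var_b1_matrix)               # the two agree
```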

Theorem 4 (Gauss-Markov Theorem).

Under assumptions MLR 1 through MLR 5, β̂1, β̂2, …, β̂k are the best linear unbiased estimators (BLUE) of β1, β2, …, βk respectively.

An estimator is “linear” if it can be expressed as a linear function of the data on the dependent variable:

β̂j = Σᵢ₌₁ⁿ wij yi

where each wij can be a function of the sample values of all the independent variables. The OLS estimator satisfies this.

“Unbiased” means that E(β̂j) = βj.

“Best” means having the smallest variance: for any estimator β̃j that is linear and unbiased, Var(β̂j) ≤ Var(β̃j). In other words, in the class of linear unbiased estimators, OLS has the smallest variance (under the Gauss-Markov assumptions).

The message is that, when the standard set of assumptions holds, no other unbiased and
linear estimator is better than OLS. If any of the Gauss-Markov assumptions fail, the
theorem no longer holds.
 Failure of MLR 4 (zero conditional mean) causes OLS to be biased, and thus Theorem 4 fails.
 Failure of MLR 5 (homoskedasticity) does not cause OLS to be biased, but OLS no longer has the smallest variance among linear unbiased estimators.

Multicollinearity

Equation (2.16), Var(β̂j) = σ² / [SSTj (1 − Rj²)], shows that the variance of β̂j depends on three factors: σ², SSTj, and Rj².

The error variance σ². Again, σ² is unknown. The unbiased estimator of σ² is

(2.17)  σ̂² = Σᵢ₌₁ⁿ ûᵢ² / (n − k − 1) = SSR / (n − k − 1)

where (n − k − 1) is the degrees of freedom for the general OLS problem with n observations and k independent variables.

The larger σ² is, the larger the variance of the OLS estimator will be. σ² is a feature of the population, so it has nothing to do with the sample size; you cannot reduce σ² by increasing the sample size. One way to reduce the error variance is to add more explanatory variables to the equation (take some factors out of the error term).

The total sample variation in xj: SSTj.
The larger the total variation in xj is, the smaller Var(β̂j) is. Thus we prefer to have more sample variation in xj. A way to increase the sample variation in each of the independent variables is to increase the sample size; this is the component of the variance that systematically depends on the sample size. When SSTj is small, Var(β̂j) can get very large, but a small SSTj is not a violation of Assumption MLR 3.

The linear relationship among the independent variables: Rj².
The term Rj² here is different from the usual R². Rj² is obtained from a regression of xj on the other independent variables. The value of Rj² reflects the proportion of the total variation in xj that can be explained by the other independent variables appearing in the equation. A high Rj² means that xj is linearly related to a high degree with the other independent variables; in other words, a high Rj² means xj is highly correlated with the other independent variables. High (but not perfect) correlation between two or more independent variables is called multicollinearity.

It is important to be very clear that multicollinearity violates none of the assumptions. So, what is the consequence of having multicollinearity among the independent variables? It increases Var(β̂j). However, from equation (2.16) there are two things from the sample that contribute to a higher Var(β̂j): SSTj and Rj². It is not Rj² alone. (See the sketch below.)
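A minimal sketch (simulated, purely illustrative data) of how Rj² is obtained and how it inflates Var(β̂j) through the factor 1/(1 − Rj²). That factor is often called the variance inflation factor, a term not used in the notes above.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 400
x2 = rng.normal(size=n)
x1 = 0.95 * x2 + 0.1 * rng.normal(size=n)    # x1 highly (but not perfectly) correlated with x2

# R_1^2: regress x1 on the other independent variables (here just x2, plus an intercept).
Z = np.column_stack([np.ones(n), x2])
resid = x1 - Z @ np.linalg.solve(Z.T @ Z, Z.T @ x1)
r1_sq = 1 - resid @ resid / np.sum((x1 - x1.mean()) ** 2)

print(r1_sq)               # close to 1: strong multicollinearity
print(1 / (1 - r1_sq))     # factor by which Var(beta_hat_1) is inflated, per (2.16)
```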

If you worry about a high degree of correlation among the independent variables in the sample:
- You can drop the independent variable(s) that you think create the multicollinearity. However, you need to be careful not to drop a variable that belongs in the population model, because doing so can lead to omitted variable bias, a much more serious problem than multicollinearity (since it can violate assumption MLR 4).
- Some multicollinearity can be reduced by collecting more data.
- You may wish to rethink the specification of the model.
