Institute for Statistics and Mathematics, Department of Finance, Accounting and Statistics
Introductory Course to Econometric Modeling
The course – in particular this set of slides – is based on and consistent with previous econometrics courses held by Sylvia Frühwirth-Schnatter. It also aims to be consistent with the other courses in this lecture series (“Econometrics II” / “Applied Econometrics”).
Throughout the next few months, we strive for competence in the following. . .
Major Milestones
▶ Part I: Basic Concepts of Econometric Modeling
▶ Part II: OLS Estimation
▶ Part III: Multiple Regression Model
Literature
Introductory and largely non-mathematical:
▶ Gary Koop: Analysis of Economic Data. Wiley, 4th edition, 2013.
The “Classics”:
▶ James H. Stock and Mark W. Watson: Introduction to Econometrics. Prentice
Hall, 3rd international edition, 2011.
▶ Jeffrey M. Wooldridge: Introductory Econometrics: A Modern Approach. Cengage,
5th international edition, 2013.
In German:
▶ Herbert Stocker: Methoden der Empirischen Wirtschaftsforschung.
https://www.hsto.info/econometrics/.
▶ Peter Hackl: Einführung in die Ökonometrie. Pearson, 2. Auflage, 2013.
Part I
Basic Concepts of Econometric Modeling
Outline
▶ First steps in R
Econometrics deals with learning about an economic phenomenon (e.g. status of the
economy, influence of product attributes, volatility on financial markets, wage mobility)
from data.
▶ Econometric model: description of the phenomenon involving quantities that are
observable
▶ Data: collected for the observable variables
▶ Econometric inference: draw conclusions from the data about the phenomenon
of interest
Linear model:
D = β0 + β1 p
Non-linear model:
D = β0 p^β1
Exact quantitative relationship between the variables of interest is NOT known, but
disturbed by a (stochastic) error term.
Linear model:
D = β0 + β1 p + u
Non-linear model:
D = β0 p^β1 u
[Figures: simulated demand plotted against price for the linear model (top row) and the non-linear model (bottom row), each without and with the stochastic error term.]
Part I: Basic Concepts of Econometric Modeling What is econometric modeling? 11 / 288
Where Does the Error Come From?
▶ u aggregates variables that are not included in the model because
▶ their influence is not known a priori, or
▶ they are unobservable or difficult to quantify.
▶ u aggregates measurement errors that arise when quantifying economic variables.
▶ u captures the unpredictable randomness in the left-hand-side variable of the model.
D = β0 + β1 p + u
Econometric inference is, in general, concerned with drawing conclusions from observed
data about quantities that are not directly observable.
Because these quantities of interest cannot be observed, any statement about them is uncertain. Two ways of dealing with this uncertainty:
▶ Classical inference: parameter estimation, hypothesis testing, and prediction as discussed in the PI Statistik.
▶ Bayesian inference: is based on the concept that the state of knowledge about
any unknown quantity is best expressed in terms of a probability distribution which
is updated in the light of new knowledge.
▶ Model formulation
▶ Model estimation
▶ Econometric inference: parameter estimation, hypothesis testing,
forecasting/prediction
▶ Model choice
▶ Model checking
▶ First steps in R
If the data set is not a (simple) random sample, there is a sample-selection problem.
yt , t = 1, . . . , T .
yit , i = 1, . . . , N, t = 1, . . . , T
Have a look at how data are organized. Files and R code are available on learn@wu:
▶ Case Study Marketing, workfile marketing
▶ Case Study Profit, workfile profit
▶ Case Study Vienna Stocks, workfile viennastocks
▶ Case Study Yields, workfile yieldus
▶ Case Study Chicken, workfile chicken
▶ Case Study Labor Force, workfile change
The code file is called code_eco_I.R.
Part I: Basic Concepts of Econometric Modeling The simple regression model 25 / 288
Question and Data
▶ We are interested in a
▶ dependent variable Y (left-hand side, explained, response), which is supposed
▶ to depend on an explanatory variable X (right-hand side, independent, control, predictor).
▶ Examples:
▶ demand is a response variable and price is a predictor variable;
▶ wage is a response and years of education is a predictor.
▶ Data: We observe the pair of variables (Y , X ) for N subjects drawn randomly from
a population (e.g. for various supermarkets, for various individuals): (yi , xi ),
i = 1, . . . , N.
Part I: Basic Concepts of Econometric Modeling The simple regression model 26 / 288
Model Formulation
The simple linear regression model describes the dependence between the variables X
and Y as:
Simple Linear Regression Model
Y = β0 + β1 X + u. (1)
Part I: Basic Concepts of Econometric Modeling The simple regression model 27 / 288
Impact of the Error Term
[Figures: simulated demand against price for error variances σ² = 0.2, σ² = 1, σ² = 0.01, and σ² = 0; the scatter around the regression line widens as σ² grows and collapses onto the line for σ² = 0.]
Part I: Basic Concepts of Econometric Modeling The simple regression model 28 / 288
Basic Assumptions
▶ The average value of the error term u in the population is 0 (not restrictive, we can
always use β0 to normalize E(u) to 0):
E(u) = 0 (2)
▶ A crucial assumption is that the conditional mean of u given X is zero, i.e., knowing X does not give us any information about the mean of u (this in particular implies that u and X are uncorrelated).
Assumption About the Conditional Mean of the Error
E(u|X) = 0 (3)
Part I: Basic Concepts of Econometric Modeling The simple regression model 29 / 288
Main Assumption
The linear model given in (1) and assumptions (2) and (3) imply that E(Y |X ) (i.e., the
conditional mean of Y given X ) is a linear function of X :
E(Y |X ) = β0 + β1 X (4)
Loosely speaking: For a fixed value of X = x , on average over the population, the linear
prediction β0 + β1 x is correct.
Part I: Basic Concepts of Econometric Modeling The simple regression model 30 / 288
Understanding the Regression Model - error term
▶ Simulate data from a simple regression model with β0 = 0.2 and β1 = −1.8:
Part I: Basic Concepts of Econometric Modeling The simple regression model 31 / 288
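A minimal R sketch of this simulation; the sample size, price range, and error variance are illustrative assumptions, not values from the slides:

set.seed(1)
N <- 100
price <- runif(N, 1, 2)                       # predictor X
u <- rnorm(N, mean = 0, sd = sqrt(0.2))       # error term with sigma^2 = 0.2
demand <- 0.2 - 1.8 * price + u               # Y = beta0 + beta1 * X + u
plot(price, demand)                           # scatter plot as on the slides
abline(a = 0.2, b = -1.8, col = "red")        # true regression line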
Understanding the Regression Model
[Figures: the same simulation for error variances σ² = 0.2, σ² = 1, σ² = 0.01, and σ² = 0 (demand against price); smaller σ² pulls the observations onto the regression line.]
Part I: Basic Concepts of Econometric Modeling The simple regression model 32 / 288
Understanding the Parameters - interpretation of β1
E(Y |X = x ) = β0 + β1 x
E(Y |X = x + 1) = β0 + β1 (x + 1)
Thus, β1 is the expected absolute change of the response variable Y , if the predictor X
is increased by 1:
E(Y |X = x + 1) − E(Y |X = x ) = β1
Part I: Basic Concepts of Econometric Modeling The simple regression model 33 / 288
Understanding the Parameters
Part I: Basic Concepts of Econometric Modeling The simple regression model 34 / 288
The Log-Linear Regression Model
▶ Simulate data from a simple log-linear regression model with β̃0 = 0.2 and
β1 = −1.8:
Y = 0.2 · X −1.8 e u
Part I: Basic Concepts of Econometric Modeling The simple regression model 36 / 288
Visualizing the Log-Linear Regression Model
[Figures: simulated data from the log-linear model for σ² = 0.01 (top row) and σ² = 0.1 (bottom row); left panels show demand against price, right panels show log(demand) against log(price), where the relationship is linear.]
Part I: Basic Concepts of Econometric Modeling The simple regression model 37 / 288
Part II
OLS Estimation
Understanding the Parameters I
In economics, elasticity measures how changing one variable affects other variables in
relative terms. If y = f (x ), then the elasticity is the ratio of the percentage change
%∆y in y and the percentage change %∆x in the variable x :
%∆y / %∆x ≈ (∂y/y) / (∂x/x) = ∂ log y / ∂ log x
From equation (8) we obtain the following expected value of log Y , if the predictor X is
equal to x :
E(log Y |X = x ) = β0 + β1 log x .
Therefore:
E(%∆Y / %∆X) ≈ E(∂ log y / ∂ log x) = ∂E(log y) / ∂ log x = β1
yi = β0 + β1 xi + ui . (9)
▶ The parameter estimates are typically denoted by a hat: βˆ0 and βˆ1 .
SSR(γ0, γ1) = Σᵢ₌₁ᴺ ui(γ0, γ1)² = Σᵢ₌₁ᴺ (yi − ŷi(γ0, γ1))² = Σᵢ₌₁ᴺ (yi − γ0 − γ1 xi)² (12)
▶ Intuitively, OLS is fitting a line through the sample points such that the sum of
squared residuals is as small as possible.
▶ The OLS-estimator β̂ = (βˆ0 , βˆ1 ) is the parameter that minimizes the sum of
squared residuals
[Figure sequence (OLS-Estimation): the same scatter plot with successive candidate regression lines; the residual sum of squares decreases step by step (RSS = 26.80, 22.19, 18.03, 14.33, 11.09, 8.30, 5.97, 4.09, 2.67, 1.11) as the line approaches the OLS fit.]
Substituting β̂0 = ȳ − β̂1 x̄ into (16) and solving for β̂1, we obtain:
β̂1 = Σᵢ₌₁ᴺ (xi − x̄)(yi − ȳ) / Σᵢ₌₁ᴺ (xi − x̄)² = r_xy · s_y / s_x (18)
provided that Σᵢ₌₁ᴺ (xi − x̄)² > 0 (or s_x² > 0).
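A small R sketch computing (18) by hand and comparing it with lm(); x and y can be any numeric vectors, e.g. the simulated data from the earlier sketches:

x <- price; y <- demand
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b1_alt <- cor(x, y) * sd(y) / sd(x)           # equivalent form r_xy * s_y / s_x
b0 <- mean(y) - b1 * mean(x)
c(b0, b1)
coef(lm(y ~ x))                               # agrees with the formulas above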
▶ The sample covariance between the regressor and the OLS residuals is zero.
Follows from (16):
(1/N) Σᵢ₌₁ᴺ xi ûi = (1/N) Σᵢ₌₁ᴺ xi (yi − β̂0 − β̂1 xi) = 0.
ŷ = β̂0 + β̂1 x .
Econometric inference: learning from the data about the unknown parameter β = (β0, β1)′ in the regression model.
▶ Use the OLS estimator β̂ to learn about the regression parameter.
▶ Is this estimator equal to the true value?
▶ How large is the difference between the OLS estimator and the true parameter?
▶ Is there a better estimator than the OLS estimator?
1. Simulate a data set of size N from the model with β0 = 0.2 and β1 = −1.8.
2. Run OLS estimation to obtain (β̂0, β̂1) and compare the estimated values with the true values β0 = 0.2 and β1 = −1.8.
3. Repeat this experiment several times, e.g. 100 times (see the sketch below).
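A sketch of this experiment in R (number of replications, sample size, and error variance are illustrative assumptions):

set.seed(3)
R <- 100; N <- 100
est <- t(replicate(R, {
  x <- runif(N, -2, 2)
  y <- 0.2 - 1.8 * x + rnorm(N)
  coef(lm(y ~ x))                             # (beta0-hat, beta1-hat) per data set
}))
colMeans(est)                                 # averages are close to (0.2, -1.8)
plot(est[, 1], est[, 2], xlab = "beta0-hat", ylab = "beta1-hat")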
[Figures: four repetitions of the simulation experiment. Left panels: simulated data sets (y against x); right panels: scatter plots of the OLS estimates (β̂0, β̂1) over 100 simulated data sets, scattered around the true value (0.2, −1.8).]
▶ Although we are estimating the true model (no model misspecification), the OLS
estimates differ from the true value.
▶ Many different data sets of size N may be generated by the same regression
model due to the stochastic error term.
▶ The estimated parameters differ, as the sample mean, sample variance and
correlation coefficient are different for each data set:
Obviously, the estimator is a random variable. Hence it makes sense to study the
statistical properties of OLS estimation.
Questions
▶ Are the OLS estimates unbiased, i.e., is the expected difference between the OLS
estimator and the true parameter equal to 0?
▶ How precise are these parameter estimates, i.e., how large is the variance of the
two estimators?
▶ Are the OLS coefficients correlated?
▶ How are the OLS coefficients distributed?
▶ For fixed values of x1, …, xN the sampling properties of ȳ, s_y² and r_xy determine the estimation error:
Properties of the Estimation Error
The estimation error. . .
▶ decreases with increasing number of observations N
OLS is Unbiased
Under assumption (4), the OLS estimator is unbiased, i.e., on average the estimated
value is equal to the true one:
E(β̂1) = β1,   E(β̂0) = β0

E(β̂1) = β1 + (1/(N s_X²)) Σᵢ₌₁ᴺ (Xi − X̄) E(ui|Xi) = β1.
Hence E(β1 − β̂1) = 0. Furthermore:
β̂0 = Ȳ − β̂1 X̄ = β0 + β1 X̄ + ū − β̂1 X̄ = β0 + (β1 − β̂1) X̄ + (1/N) Σᵢ₌₁ᴺ ui,
which implies
E(β̂0) = β0 + E(β1 − β̂1) X̄ + (1/N) Σᵢ₌₁ᴺ E(ui|Xi) = β0.
▶ How big is the difference between the OLS estimator and the true parameter?
▶ To answer this question, we make an additional assumption on the conditional
variance:
Assumption of Homoskedasticity
V(u|X ) = σ 2 (21)
▶ This means that the variance of the error term u is the same, regardless of the
value of the predictor variable X .
▶ Note: If assumption (21) is violated, e.g. if V(u|X ) = σ 2 h(X ), then we say the
error term is heteroskedastic.
▶ How large is the variation of the OLS estimator around the true parameter?
▶ We know that E β̂1 − β1 = 0
▶ We measure the variation of the OLS estimator around the true parameter through
the expected squared difference, i.e. the variance:
E[(β̂1 − β1)²] = V(β̂1) (22)
▶ Similarly for β̂0: V(β̂0) = E[(β̂0 − β0)²]
▶ The variance of the slope estimator is larger, the smaller the number of observations N. Doubling the sample size N halves the variance of β̂1.
▶ The variance of the slope estimator is larger, the larger the error variance σ². Doubling the error variance σ² doubles the variance of β̂1.
▶ The variance of the slope estimator is larger, the smaller the variation in X. Doubling s_X² halves the variance of β̂1.
▶ The variance is in general different for the two parameters in the simple regression
model. V(β0 ) is given by (without proof):
V(β̂0) = σ²/(N s_X²) · (1/N) Σᵢ₌₁ᴺ Xi² (24)
▶ The standard deviations sd(β̂0) and sd(β̂1) of the OLS estimators are defined as:
sd(β̂0) = √V(β̂0),   sd(β̂1) = √V(β̂1)
E(u|X ) = 0
V(u|X ) = σ 2 .
▶ Since we don’t observe the error terms ui directly, we take the residuals ûi as
proxies.
▶ We usually don’t have enough observations of the regressor X for any possible
value x . That’s why we check E(u|X ) = 0 not for X = x , but for a ≤ X ≤ b.
[Figures: residual checks. “OLS – true model”: residuals against log(x) scatter evenly around 0. “OLS – misspecification”: fitting the wrong functional form leaves a systematic pattern in the residuals. “OLS – heteroskedasticity”: the spread of the residuals changes with x.]
▶ Model Formulation
▶ OLS Estimation
▶ We are interested in a
▶ dependent (left-hand side, explained, response) variable Y , which is supposed to
depend on
▶ K explanatory (right-hand side, independent, control, predictor) variables
X1 , . . . , XK .
▶ Example: wage is a response; education, gender, and experience are predictor
variables.
▶ Sample: We observe these variables for N subjects drawn randomly from a
population (e.g. for various supermarkets, for various individuals):
▶ The multiple regression model describes the relation between the response variable
Y and the predictor variables X1 , . . . , XK as:
Multiple Linear Regression Model
Y = β0 + β1 X1 + . . . + βK XK + u (25)
▶ Key Assumption:
Linearity
▶ The multiple log-linear model (also called the multiple “log-log” model) reads:
▶ The log transformation of all variables yields a model that is linear in the parameters β0, β1, …, βK.
Homework:
Have a look in R how to define a multiple regression model and discuss the meaning of
the estimated parameters:
▶ Case Study Chicken, work file chicken
▶ Case Study Marketing, work file marketing
▶ Case Study Profit, work file profit
⇒ R-code code_eco_I.R
▶ Model Formulation
▶ OLS Estimation
The commonly used method to estimate the parameters in a multiple regression model
is, again, OLS estimation:
▶ Denote the candidate choice by γ = (γ0 , . . . , γK )′ .
▶ For each observation yi , the prediction ŷi (γ) of yi depends on γ.
▶ For each yi, define the regression residual (prediction error) as ui(γ) = yi − ŷi(γ). The sum of squared residuals then reads:
SSR(γ) = Σᵢ₌₁ᴺ ui(γ)² = Σᵢ₌₁ᴺ (yi − γ0 − γ1 x1,i − … − γK xK,i)² (33)
▶ The OLS-estimator β̂ = (βˆ0 , βˆ1 , . . . , βˆK ) is the parameter that minimizes the sum
of squared residuals.
For a multiple regression model, the estimation problem is solved by software packages
like EViews or R.
In matrix notation, the N equations given in (30) for i = 1, . . . , N, may be written as:
y = Xβ + u
where
u = (u1, u2, …, uN)′,   β = (β0, β1, …, βK)′

u′u = (y − Xγ)′(y − Xγ) = y′y − γ′X′y − y′Xγ + γ′X′Xγ = y′y − 2γ′X′y + γ′X′Xγ,
since γ′X′y and y′Xγ are both scalars and hence equal.
Now, find β̂ := arg minγ SSR(γ) which can be done by finding γ such that the FOC is
satisfied:
∂(u′u)/∂γ = −2X′y + 2X′Xγ = 0
In other words:
β̂ = (X′X)⁻¹X′y
Necessary conditions for X′X being invertible:
▶ We have to observe sample variation for each predictor Xk , i.e., the sample
variances of xk,1 , . . . , xk,N are positive for all k = 1, . . . , K
▶ No exact linear relation between any predictors Xk and Xl may be present,
i.e., the empirical correlation coefficients of all pairwise data sets (xk,i , xl,i ),
i = 1, . . . , N are different from 1 and −1 for l ̸= k.
Note: EViews produces an error if X′X is not invertible, whereas R tries to make X′X invertible by removing predictors.
⇒ R-code code_eco_I.R
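A minimal sketch of the matrix formula in R; `demand` and `price` stand in for any case-study data (assumed names from the earlier simulation):

fit <- lm(demand ~ price)                     # placeholder model
X <- model.matrix(fit)                        # design matrix incl. the intercept
y <- model.response(model.frame(fit))
beta_hat <- solve(crossprod(X), crossprod(X, y))  # (X'X)^{-1} X'y
cbind(beta_hat, coef(fit))                    # identical up to rounding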
Any parameter vector β⋆ = (β0, β1⋆, β2⋆, β3⋆), where β3⋆ may be chosen arbitrarily and
β2⋆ = β2 + β3 − β3⋆,
β1⋆ = β1 − β3 + β3⋆,
leads to the same sum of squared residuals as β. The OLS estimator is not unique!
▶ Econometric Inference
▶ OLS Residuals
Part IV: Expected Value and Variance of the OLS Estimator Econometric Inference 112 / 288
Understanding Econometric Inference
Econometric Inference
Learning from data about the unknown parameter β in the regression model:
▶ Use the OLS estimator β̂ to learn about the regression parameter.
▶ Is this estimator equal to the true value?
▶ How large is the difference between the OLS estimator and the true parameter?
▶ Is there a better estimator than the OLS estimator?
Part IV: Expected Value and Variance of the OLS Estimator Econometric Inference 113 / 288
Unbiasedness of the OLS Estimator
OLS is Unbiased
Under the assumptions (26), the OLS estimator (if it exists) is unbiased, i.e. the
estimated values are on average equal to the true values:
E(β̂j) = βj,   j = 0, …, K
In matrix notation:
E(β̂) = β,   E(β̂ − β) = 0 (36)
Part IV: Expected Value and Variance of the OLS Estimator Econometric Inference 114 / 288
Unbiasedness of the OLS Estimator (Proof)
Part IV: Expected Value and Variance of the OLS Estimator Econometric Inference 115 / 288
Covariance Matrix of the OLS Estimator
▶ Due to unbiasedness, the expected value E β̂j of the OLS estimator is equal to βj
for j = 0, . . . , K .
▶ Hence, the variance V β̂j measures the variation of the OLS estimator β̂j around
the true value βj :
V(β̂j) = E[(β̂j − E(β̂j))²] = E[(β̂j − βj)²]
▶ Are the deviation of the estimator from the true value correlated for different
coefficients of the OLS estimators?
Part IV: Expected Value and Variance of the OLS Estimator Econometric Inference 116 / 288
Recap: Effect of “De-Centering” (100 Experiments)
[Figure: scatter plot of the OLS estimates (β̂0, β̂1) over 100 simulated data sets; the deviations of the two estimators from the true values are visibly correlated.]
Part IV: Expected Value and Variance of the OLS Estimator Econometric Inference 117 / 288
Covariance Matrix of the OLS Estimator
▶ The covariance Cov(β̂j, β̂k) of different coefficients of the OLS estimator measures how strongly deviations between the estimator and the true value are correlated:
Cov(β̂j, β̂k) = E[(β̂j − βj)(β̂k − βk)]
Note that
Cov(β̂) = E[(β̂ − β)(β̂ − β)′]
Part IV: Expected Value and Variance of the OLS Estimator Econometric Inference 118 / 288
Covariance Matrix of the OLS Estimator
Cov(β̂) =
⎛ V(β̂0)          Cov(β̂0, β̂1)   ⋯   Cov(β̂0, β̂K) ⎞
⎜ Cov(β̂0, β̂1)   V(β̂1)          ⋯   Cov(β̂1, β̂K) ⎟
⎜      ⋮               ⋮         ⋱        ⋮       ⎟
⎝ Cov(β̂0, β̂K)   ⋯   Cov(β̂K−1, β̂K)   V(β̂K)     ⎠
Part IV: Expected Value and Variance of the OLS Estimator Econometric Inference 119 / 288
Homoskedasticity
To derive Cov(β̂), we make an additional assumption:
Homoskedasticity
V(u | X1, …, XK) = σ² (38)
This means that the variance of the error term u is the same, regardless of the predictor variables X1, …, XK. It follows that
V(Y | X1, …, XK) = σ²
Part IV: Expected Value and Variance of the OLS Estimator Econometric Inference 120 / 288
Covariance Matrix of Error Vector
▶ Because the observations are (by assumption) a random sample from the
population, any two observations yi and yl are uncorrelated. Hence also the errors
ui and ul are uncorrelated.
▶ Together with homoskedasticity (38) we obtain the following covariance matrix
of the error vector u:
Cov(u | X1 , . . . , Xk ) = σ 2 I
Part IV: Expected Value and Variance of the OLS Estimator Econometric Inference 121 / 288
Covariance Matrix of the OLS Estimator
Under assumption (26) and (38), the covariance matrix of the OLS estimator β̂ is given
by:
Cov(β̂) = σ²(X′X)⁻¹ (39)
Part IV: Expected Value and Variance of the OLS Estimator Econometric Inference 122 / 288
Covariance Matrix of the OLS Estimator
Proof of (39)
Using (37), we obtain:
β̂ − β = Au   with   A = (X′X)⁻¹X′
Therefore:
Cov(β̂) = σ²AA′ = σ²(X′X)⁻¹X′X(X′X)⁻¹ = σ²(X′X)⁻¹
Part IV: Expected Value and Variance of the OLS Estimator Econometric Inference 123 / 288
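A sketch of (39) in R, reusing `fit` and `X` from the matrix example above and replacing σ² by its usual estimate:

sigma2_hat <- sum(residuals(fit)^2) / df.residual(fit)
V <- sigma2_hat * solve(crossprod(X))         # sigma^2-hat * (X'X)^{-1}
all.equal(V, vcov(fit), check.attributes = FALSE)  # TRUE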
Covariance Matrix of the OLS Estimator
The diagonal elements of the matrix σ²(X′X)⁻¹ are the variances V(β̂j) of the OLS estimator for each component.
The standard deviation sd(β̂j) of each OLS estimator is defined as:
sd(β̂j) = √V(β̂j) (40)
Evidently, the standard deviation is larger for larger error variances σ 2 . What other
factors influence the standard deviation?
Part IV: Expected Value and Variance of the OLS Estimator Econometric Inference 124 / 288
Multicollinearity
In practical regression analysis very often high (but not perfect) multicollinearity is
present.
Consider Xj as the left-hand variable in an auxiliary regression, with all remaining predictors on the right-hand side:
Xj = δ0 + δ1X1 + … + δj−1Xj−1 + δj+1Xj+1 + … + δKXK + ε
Use OLS estimation to estimate the parameters and let x̂j,i be the values predicted from this (OLS) regression.
Part IV: Expected Value and Variance of the OLS Estimator Econometric Inference 125 / 288
Multicollinearity
▶ Define Rj as the correlation between the observed values xj,i and the predicted
values x̂j,i in this regression.
▶ If Rj2 is close to 0, then Xj cannot be predicted from the other regressors.
⇒ Xj contains additional, “independent” information.
▶ The closer Rj2 is to 1, the better Xj is predicted from the other regressors and
multicollinearity is present.
⇒ Xj does not contain much „independent” information.
Part IV: Expected Value and Variance of the OLS Estimator Econometric Inference 126 / 288
The Variance of the OLS Estimator
Using Rj , the variance V β̂j of the OLS estimators of the coefficient βj corresponding
to Xj may be expressed in the following way for j = 1, . . . , K :
V(β̂j) = σ² / (N s²xj (1 − Rj²))
The variance V β̂j (and consequently the standard deviation) of the estimate β̂j is
large if
⇒ the regressor Xj is highly redundant given the other regressors,
⇒ Rj2 close to 1, almost multicollinearity present.
Part IV: Expected Value and Variance of the OLS Estimator Econometric Inference 127 / 288
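A sketch of the auxiliary regression in R; the data frame `dat` with regressors x1, x2, x3 is a hypothetical example:

aux <- lm(x1 ~ x2 + x3, data = dat)           # X_j regressed on the other regressors
Rj2 <- summary(aux)$r.squared
vif_x1 <- 1 / (1 - Rj2)                       # factor inflating V(beta1-hat)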
The Variance of the OLS Estimator
All other factors are the same as in the simple regression model, i.e.:
The variance V β̂j , j = 1, . . . , K , of the estimator β̂j is large, if
▶ the variance σ 2 of the error term u is large;
▶ the sampling variation in the regressor Xj , i.e. the variance sx2j , is small;
▶ the sample size N is small.
Part IV: Expected Value and Variance of the OLS Estimator Econometric Inference 128 / 288
Outline
▶ Econometric Inference
▶ OLS Residuals
Part IV: Expected Value and Variance of the OLS Estimator OLS Residuals 129 / 288
OLS Residuals
Each observation can be decomposed as yi = ŷi + ûi, where
▶ ŷi = β̂0 + β̂1 x1,i + … + β̂K xK,i is called the fitted value
▶ ûi = yi − ŷi is called the OLS residual
Part IV: Expected Value and Variance of the OLS Estimator OLS Residuals 130 / 288
R / EViews Class Exercise
Homework:
Have a look in R / EViews how to obtain the OLS residuals and the fitted regression:
▶ Case Study profit, workfile profit
▶ Case Study Chicken, workfile chicken
▶ Case Study Marketing, workfile marketing
⇒ R-code code_eco_I.R
Part IV: Expected Value and Variance of the OLS Estimator OLS Residuals 131 / 288
OLS Residuals as Proxies for the Error
Y = β0 + β1 X1 + . . . + βK XK + u (41)
Part IV: Expected Value and Variance of the OLS Estimator OLS Residuals 132 / 288
Algebraic Properties of the OLS residuals
The OLS residuals û1 , . . . , ûN obey K + 1 linear equations and have the following
algebraic properties:
▶ The sum (and thus also the mean) of the OLS residuals ûi is equal to zero:
(1/N) Σᵢ₌₁ᴺ ûi = 0 (42)
▶ The sample covariance between each regressor and the OLS residuals is zero:
(1/N) Σᵢ₌₁ᴺ xk,i ûi = 0,   ∀k = 1, …, K (43)
Part IV: Expected Value and Variance of the OLS Estimator OLS Residuals 133 / 288
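These properties can be verified numerically in R for any lm() fit with an intercept, e.g. `fit` from the sketches above:

mean(residuals(fit))                          # (42): ~ 0 up to machine precision
colSums(model.matrix(fit) * residuals(fit))   # (43): ~ 0 for every regressor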
Estimating σ 2 - naive estimator
A naive estimator of σ 2 would be the sample variance of the OLS residuals û1 , . . . , ûN :
σ̃² = (1/N) Σᵢ₌₁ᴺ (ûi − (1/N) Σⱼ₌₁ᴺ ûj)² = (1/N) Σᵢ₌₁ᴺ ûi² = SSR/N
where we used (42) and SSR = Σᵢ₌₁ᴺ ûi² is the sum of squared residuals.
Part IV: Expected Value and Variance of the OLS Estimator OLS Residuals 134 / 288
Estimating σ 2
σ̂² = SSR / df (44)
where
▶ SSR = Σᵢ₌₁ᴺ ûi² is the sum of squared OLS residuals,
▶ df = (N − K − 1), N is the number of observations, and
▶ K is the number of predictors X1 , . . . , XK .
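In R, σ̂² can be computed directly from the residuals; summary() reports its square root as the “residual standard error” (sketch for any lm() fit `fit`):

SSR <- sum(residuals(fit)^2)
sigma2_hat <- SSR / df.residual(fit)          # df = N - K - 1
c(sqrt(sigma2_hat), summary(fit)$sigma)       # identical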
Part IV: Expected Value and Variance of the OLS Estimator OLS Residuals 135 / 288
Standard Errors of the OLS Estimator
▶ The standard deviation sd(β̂j) of the OLS estimator given in (40) depends on σ = √σ².
▶ To evaluate the estimation error for a given data set in practical regression analysis, σ² is substituted by the estimator σ̂² given in (44); the resulting quantity se(β̂j) is called the standard error of β̂j.
Part IV: Expected Value and Variance of the OLS Estimator OLS Residuals 136 / 288
R / Eviews Class Exercise
R / EViews (and other software packages) report for each predictor the OLS estimator
together with the standard errors:
▶ Case Study profit, work file profit
▶ Case Study Chicken, work file chicken
▶ Case Study Marketing, work file marketing
⇒ R-code code_eco_I.R
Note: the standard errors computed by R / EViews (and other software packages) are
valid only under the assumptions made above, in particular, homoskedasticity.
Part IV: Expected Value and Variance of the OLS Estimator OLS Residuals 137 / 288
Quantifying the model fit - simplest model
▶ How well does the multiple regression model (41) explain the variation in Y ?
▶ Compare it with the following simple model without any predictors:
Y = β0 + ũ (46)
▶ In this model, the OLS estimator β̂0 minimizes the sum of squared residuals
Σᵢ₌₁ᴺ (yi − γ0)²,
which yields β̂0 = ȳ.
Part IV: Expected Value and Variance of the OLS Estimator OLS Residuals 138 / 288
Coefficient of Determination - TSS
▶ In the model without any predictors (46), the sum of squared residuals is called the total sum of squares (TSS):
TSS = Σᵢ₌₁ᴺ (yi − ȳ)²
▶ Is it possible to reduce the sum of squared residuals of the simple model (46), i.e.
TSS, by including the predictor variables X1 , . . . , XK as in (41)?
Part IV: Expected Value and Variance of the OLS Estimator OLS Residuals 139 / 288
Coefficient of Determination R 2
1. The sum of squared residuals SSR of the multiple regression model (41) is never larger than the sum of squared residuals TSS of the simple model (46):
Part IV: Expected Value and Variance of the OLS Estimator OLS Residuals 140 / 288
Coefficient of Determination - Proof
Proof of (47)
The following variance decomposition holds:
TSS = Σᵢ₌₁ᴺ (yi − ŷi + ŷi − ȳ)² = Σᵢ₌₁ᴺ ûi² + 2 Σᵢ₌₁ᴺ ûi(ŷi − ȳ) + Σᵢ₌₁ᴺ (ŷi − ȳ)²
Using the algebraic properties (42) and (43) of the OLS residuals, we obtain:
Σᵢ₌₁ᴺ ûi(ŷi − ȳ) = β̂0 Σᵢ₌₁ᴺ ûi + β̂1 Σᵢ₌₁ᴺ ûi x1,i + … + β̂K Σᵢ₌₁ᴺ ûi xK,i − ȳ Σᵢ₌₁ᴺ ûi = 0
Therefore:
TSS = SSR + Σᵢ₌₁ᴺ (ŷi − ȳ)² ≥ SSR
Part IV: Expected Value and Variance of the OLS Estimator OLS Residuals 141 / 288
Coefficient of Determination - Interpretation
Part IV: Expected Value and Variance of the OLS Estimator OLS Residuals 142 / 288
Coefficient of Determination - examples
[Figures: two data sets, each with the fit using price as predictor and the no-predictor fit. Left: SSR = 9.5299, SST = 120.0481, R² = 0.92062 — the predictor explains most of the variation. Right: SSR = 8.3649, SST = 8.6639, R² = 0.034512 — the predictor explains almost nothing.]
Part IV: Expected Value and Variance of the OLS Estimator OLS Residuals 143 / 288
Part V
Testing Hypotheses (One Coefficient)
Outline
Part V: Testing Hypotheses (One Coefficient) Testing Hypotheses - One Coefficient 145 / 288
Testing Hypothesis
Y = β0 + β1 X1 + . . . + βj Xj + . . . + βK XK + u, (49)
Formally,
βj = 0 ?
Part V: Testing Hypotheses (One Coefficient) Testing Hypotheses - One Coefficient 146 / 288
Understanding the Testing Problem
Part V: Testing Hypotheses (One Coefficient) Testing Hypotheses - One Coefficient 147 / 288
Understanding the Testing Problem
[Figure: OLS estimates β̂2 of the coefficient β2 = 0 of a redundant variable over repeated samples, scattered around 0.]
The OLS estimator β̂2 of β2 = 0 differs from 0 for a single data set, but is 0 on average.
Part V: Testing Hypotheses (One Coefficient) Testing Hypotheses - One Coefficient 148 / 288
Understanding the Testing Problem
▶ OLS estimation for the true model in comparison to estimating a model with a
redundant predictor:
⇒ including the redundant predictor X2 increases the estimation error for the
other parameters!
[Figures: scatter plots of (β̂0, β̂1) over repeated samples. Left: model with one predictor; right: model including the redundant second predictor — the estimates spread more widely around the true value.]
Part V: Testing Hypotheses (One Coefficient) Testing Hypotheses - One Coefficient 149 / 288
Testing of Hypotheses
Part V: Testing Hypotheses (One Coefficient) Testing Hypotheses - One Coefficient 150 / 288
The Classical Regression Model
Normality assumption (50): the error term is normally distributed and independent of the regressors,
u | X1, …, XK ∼ N(0, σ²). (50)
This assumption implies the more general assumptions (26) and (38):
E(u|X1, …, XK) = E(u) = 0
V(u|X1, …, XK) = V(u) = σ²
Part V: Testing Hypotheses (One Coefficient) Testing Hypotheses - One Coefficient 151 / 288
The Classical Regression Model
▶ Furthermore, because the observations are a random sample, the error vector u
has a multivariate normal distribution with independent components:
u ∼ N_N(0, σ²I)
Part V: Testing Hypotheses (One Coefficient) Testing Hypotheses - One Coefficient 152 / 288
Multivariate Normal Distributions with Independent
Components
Density of the bivariate normal distribution N2 (0, σ 2 I) with σ 2 = 0.5:
[Figure: surface and contour plot of the density.]
Part V: Testing Hypotheses (One Coefficient) Testing Hypotheses - One Coefficient 153 / 288
Multivariate Normal Distributions with Independent
Components
1000 observations from N2 (0, 0.5I) in comparison to 100α%-confidence region (from
the left to the right: α = 0.25, α = 0.5, α = 0.95)
[Panels, left to right: α = 0.25 (relative hit frequency 0.242), α = 0.5 (0.48), α = 0.95 (0.954).]
Part V: Testing Hypotheses (One Coefficient) Testing Hypotheses - One Coefficient 154 / 288
Multivariate Normal Distributions with Dependent
Components
Density of the bivariate normal distribution N2 (µ, Σ) with
μ = (2, −3)′,   Σ = ( 4  3.2 ; 3.2  7 )
[Figure: surface and contour plot of the density.]
Part V: Testing Hypotheses (One Coefficient) Testing Hypotheses - One Coefficient 155 / 288
Multivariate Normal Distributions with Dependent
Components
[Figure: draws from N2(μ, Σ) compared to elliptical confidence regions.]
Part V: Testing Hypotheses (One Coefficient) Testing Hypotheses - One Coefficient 156 / 288
Distribution of the OLS Estimator
Using (37), we obtain:
β̂ − β ∼ N_{K+1}(0, Cov(β̂)),   Cov(β̂) = σ²(X′X)⁻¹
All marginal distributions are normal:
β̂j − βj ∼ N(0, sd(β̂j)²),
thus
(β̂j − βj) / sd(β̂j) ∼ N(0, 1) (51)
Note: Deviations between the true value and the OLS estimator are usually correlated:
β̂ − β ∼ N_{K+1}(0, σ²(X′X)⁻¹)
Part V: Testing Hypotheses (One Coefficient) Testing Hypotheses - One Coefficient 157 / 288
Testing a Single Coefficient: t-Test
Do not reject the null hypothesis βj = 0, if
|β̂j| / sd(β̂j) ≤ cα (52)
The corresponding test statistic is the t-statistic:
tj = β̂j / sd(β̂j) (53)
Part V: Testing Hypotheses (One Coefficient) Testing Hypotheses - One Coefficient 158 / 288
Testing a Single Coefficient: t-Test
If (50) holds and σ 2 is known, then tj follows a standard normal distribution under the
null hypothesis:
▶ Choose a significance level α
▶ Determine the corresponding critical value cα
▶ If |tj | > cα : reject the null hypothesis (the risk to reject the null hypothesis
although it is true is at most α)
▶ If |tj | ≤ cα : do not reject the null hypothesis (the risk to “not reject” a wrong null
hypothesis may be arbitrarily large)
Part V: Testing Hypotheses (One Coefficient) Testing Hypotheses - One Coefficient 159 / 288
Choice of cα when σ 2 is unknown
▶ If σ 2 is unknown and estimated as described above, then sd β̂j is substituted by
se(β̂j), yielding the test statistic:
tj = β̂j / se(β̂j) (54)
▶ Choosing the quantiles of the normal distributions would lead to a test which rejects
the true null-hypothesis more often than desired, e.g. for α = 0.05 and K = 3:
N:             10    20    30    40    50    100
P(reject H0):  0.09  0.07  0.06  0.06  0.06  0.05
Part V: Testing Hypotheses (One Coefficient) Testing Hypotheses - One Coefficient 160 / 288
Choice of cα when σ 2 is unknown
▶ The reason for this phenomenon is that tj no longer follows a normal distribution,
but a tdf - distribution where df = (N − K − 1).
▶ The critical values tdf,1−α/2 depend on df and are equal to the quantiles of the tdf
distribution.
E.g., for α = 0.05 and for a regression model with 3 parameters, these values are
approximately:
df = N − 3:   7     17    27    37    47    97    ∞
tdf,0.975:    2.36  2.11  2.05  2.03  2.01  1.98  1.96
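A sketch reproducing the t-statistic and its two-sided p-value for one coefficient of an lm() fit `fit` (assumed to have at least one predictor):

tab <- coef(summary(fit))                     # estimate, std. error, t value, p value
tj <- tab[2, "Estimate"] / tab[2, "Std. Error"]
p  <- 2 * pt(-abs(tj), df = df.residual(fit))
c(tj, p)                                      # matches columns 3 and 4 of tab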
Part V: Testing Hypotheses (One Coefficient) Testing Hypotheses - One Coefficient 161 / 288
The Student t distribution
[Figures: densities of the tdf distribution for increasing degrees of freedom, with the 97.5% quantiles t2,0.975 ≈ 4.30, t3,0.975 ≈ 3.18, t5,0.975 ≈ 2.57, t10,0.975 ≈ 2.23, t30,0.975 ≈ 2.04. The tdf distribution converges to the standard normal for large df: t∞,0.975 ≈ 1.96.]
Part V: Testing Hypotheses (One Coefficient) Testing Hypotheses - One Coefficient 167 / 288
The p-value
The p-value is derived from the distribution of the t-statistic under the null
hypothesis and is easier to interpret than the t-statistic which has to be compared
to the correct quantiles:
▶ Choose a significance level α
▶ If p < α: reject the null hypothesis (risk to reject the null hypothesis although it is
true is at most α)
▶ If p ≥ α: do not reject the null hypothesis (risk to “not reject” a wrong null
hypothesis may be arbitrarily large)
An Old Saying. . .
If the p is low, the null must go
If the p is high, the null will fly
Part V: Testing Hypotheses (One Coefficient) Testing Hypotheses - One Coefficient 168 / 288
R / EViews Class Exercise
Have a look in R how to formulate sensible null hypotheses and how to test them using
the t-statistic and the p-value:
▶ Case Study Profit, workfile profit
▶ Case Study Chicken, workfile chicken
▶ Case Study Marketing, workfile marketing
⇒ R-code code_eco_I.R
Part V: Testing Hypotheses (One Coefficient) Testing Hypotheses - One Coefficient 169 / 288
Case Study Chicken
The t-statistic for the variable income is equal to 1.024, p-value: 0.319 (rounded)
[Figure: density of the t distribution; the observed t-statistic 1.024 is marked, each tail area equal to 0.16.]
Part V: Testing Hypotheses (One Coefficient) Testing Hypotheses - One Coefficient 170 / 288
Case Study Chicken
The t-statistic for the variable ppork is equal to 3.114, p-value: 0.006
[Figure: density of the t distribution; the observed t-statistic 3.114 is marked, each tail area equal to 0.003.]
Part V: Testing Hypotheses (One Coefficient) Testing Hypotheses - One Coefficient 171 / 288
Understanding p-Values
▶ A small p-value shows that the value observed for the t-statistic (or an even more
“extreme” value) is unlikely under the null hypothesis, thus we reject the null
hypothesis for small p-values.
⇒ There is substantial evidence in the data that βj ̸= 0.
▶ A p-value considerably larger than 0 shows that the observed value (or an even more “extreme” value) of the t-statistic is plausible under the null hypothesis, thus we do not reject the null hypothesis for large p-values.
⇒ There is little evidence in the data that βj ̸= 0.
Note that not rejecting the null does not necessarily mean that βj = 0, because the risk
to accept a wrong null hypothesis is not controlled!
Part V: Testing Hypotheses (One Coefficient) Testing Hypotheses - One Coefficient 172 / 288
Confidence Intervals for the Unknown Coefficients
The marginal distribution (51) is also useful for obtaining 100(1 − α)% confidence
regions for the unknown regression coefficients (e.g., α = 0.05 leads to a 95%
confidence region)
Part V: Testing Hypotheses (One Coefficient) Testing Hypotheses - One Coefficient 173 / 288
Confidence Intervals for the Unknown Coefficients
Part V: Testing Hypotheses (One Coefficient) Testing Hypotheses - One Coefficient 174 / 288
Confidence Intervals for the Unknown Coefficients
If σ² is unknown, then sd(β̂j) is substituted by se(β̂j). Instead of (51), we obtain with df = (N − K − 1):
(β̂j − βj) / se(β̂j) ∼ tdf
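A sketch of the resulting 95% confidence intervals in R, compared with the built-in confint():

se <- coef(summary(fit))[, "Std. Error"]
q  <- qt(0.975, df = df.residual(fit))        # t_{df, 1 - alpha/2}
cbind(coef(fit) - q * se, coef(fit) + q * se) # same as confint(fit)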
Part V: Testing Hypotheses (One Coefficient) Testing Hypotheses - One Coefficient 175 / 288
More about the distribution of the OLS estimator
where Coṽ(β̂) is obtained from the rows and columns j1, …, jq of Cov(β̂)
▶ This result may be used to construct 95%-confidence ellipsoids for all pairs of
parameters (βj1 , βj2 )
Part V: Testing Hypotheses (One Coefficient) Testing Hypotheses - One Coefficient 176 / 288
Part VI
Testing Hypotheses (More Coefficients)
Testing More Than One Coefficient
▶ Testing the null hypothesis βj = 0 based on tj is only valid if all other parameters
remain in the model.
▶ Often, we want to test joint hypotheses about our parameters.
▶ E.g., if the tj -statistic is not significant for more than one parameter j1 , . . . , jq , then
one needs to test, if βj1 = 0, βj2 = 0, . . . , βjq = 0 simultaneously.
▶ We cannot simply check each tj -statistic separately, as it is possible for jointly
insignificant regressors to be individually significant (and vice versa).
Reject the null hypothesis if the distance between the OLS estimator β̂̃ = (β̂j1, …, β̂jq)′ and 0 is “large” (one-sided test).
▶ Aggregate tjl = β̂jl / sd(β̂jl) for l = 1, …, q, e.g., by taking the sum of squared t-statistics?
▶ If the deviations of the OLS estimators β̂j1, …, β̂jq from the true values are uncorrelated, then the aggregated test statistic
Σₗ₌₁q (β̂jl / sd(β̂jl))²
follows a χ²q-distribution under the null hypothesis.
Left hand side: density of the χ2q -distribution; right hand side: density of the random
variable X /q, where X ∼ χ2q ; degrees of freedom q = ν ∈ {2, 5, 10, 20}.
Part VI: Testing Hypotheses (More Coefficients) 181 / 288
Testing More Than One Coefficient
Usually, the deviations of the OLS estimators βˆj1 , . . . , β̂jq from the true values are
correlated:
▶ Transform the deviations to a coordinate system with independent standard normal
random variables. In this new coordinate system, the sum of squared deviations
follow a χ2q -distribution with q degrees of freedom. The appropriate transformation
reads:
β̂̃′ Coṽ(β̂)⁻¹ β̂̃ ∼ χ²q
The F-statistic for testing the joint hypothesis βgender = βage = 0 is equal to 2.086, p-value: 0.124 (rounded)
[Figure: density of the F distribution; the observed F-statistic 2.086 is marked, with tail area (p-value) 0.124.]
The F-statistic for testing the joint hypothesis βgender = βage = βprice = 0 is equal to 451.572, p-value: 0.000 (rounded)
[Figure: density of the F distribution on a log scale; the observed F-statistic 451.572 lies far in the right tail.]
Equivalent forms of the F -statistic show that the F-statistic measures the loss of fit
from imposing the q restrictions on the model:
Here,
▶ SSR is the minimum sum of squared residuals and R2 is the coefficient of
determination for the unrestricted regression model.
▶ SSRr is the minimum sum of squared residuals and R2r is the coefficient of
determination for the restricted regression model.
β1 = 0, β2 = 0, . . . , βK = 0
F = (R²/K) / ((1 − R²)/df)
▶ Under the null hypothesis, F follows a FK ,df -distribution. Hopefully, the
corresponding p-value is close to 0. Otherwise, the usefulness of the whole
regression model is somewhat doubtful!
Suppose we want to test the hypothesis that two regression coefficients are equal,
e.g. β1 = β2 . This is equivalent to testing the following linear constraint (null
hypothesis):
β1 − β2 = 0 (57)
Test statistic based on the difference of the OLS estimators β̂1 − β̂2 :
▶ If |β̂1 − β̂2 | is small, then the hypothesis (57) is not rejected.
▶ If |β̂1 − β̂2 | is large, then the hypothesis (57) is rejected.
where
Coṽ(β̂) = L Cov(β̂) L′
The F -statistic may also be used to test more than one linear constraint on the
coefficients, i.e. Lβ = 0, where L is a q × (K + 1)-matrix with q > 1.
The F -statistic is constructed as above and follows an Fq,df -distribution, where q is the
number of linear constraints.
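A sketch of such an F-test in R via nested model fits; the formulas and the data frame `dat` are hypothetical:

unrestricted <- lm(y ~ x1 + x2 + x3, data = dat)
restricted   <- lm(y ~ x1, data = dat)        # imposes beta2 = beta3 = 0 (q = 2)
anova(restricted, unrestricted)               # F-statistic and p-value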
▶ Dummy Variables
Part VII: Further Properties of the OLS Estimator and Dummy Variables Further Properties of the OLS Estimator 195 / 288
Further Properties of the OLS Estimator
Under the normality assumption (50) about the error term, the OLS estimator is not
only BLUE. A stronger optimality result holds:
Efficiency of OLS estimation
Under assumption (50), the OLS estimator β̂ is the minimum variance unbiased
estimator.
Any other unbiased estimator β̃ (which need not be a linear estimator) has larger standard deviations than the OLS estimator:
▶ sd(β̃j) ≥ sd(β̂j)
▶ Cov(β̃) − Cov(β̂) is positive semi-definite
However, if assumption (50) is violated, other (nonlinear) estimation methods may be
more efficient.
Part VII: Further Properties of the OLS Estimator and Dummy Variables Further Properties of the OLS Estimator 197 / 288
Consistency of OLS Estimation
or, equivalently,
P(|β̂N − β| < ϵ) → 1 as N → ∞.
Part VII: Further Properties of the OLS Estimator and Dummy Variables Further Properties of the OLS Estimator 198 / 288
Consistency of OLS Estimation
▶ Consistency means that the OLS estimator converges “in probability” to the true
value with increasing number of observations N.
▶ A sufficient condition for this convergence in probability is that E β̂N → β and
sd β̂N → 0 as N → ∞.
▶ Under the Gauss Markov assumptions, the OLS estimator is a consistent estimator
of β.
▶ Note that consistency also holds if the normality assumption (50) is violated.
Part VII: Further Properties of the OLS Estimator and Dummy Variables Further Properties of the OLS Estimator 199 / 288
Consistency of OLS Estimation
“Proof”
For each j = 1, . . . , K :
▶ The OLS estimator is unbiased, i.e. E β̂j = βj .
▶ The standard deviation sd(β̂j) goes to 0 for N → ∞:
sd(β̂j) = σ / √(N s²xj (1 − Rj²)) → 0 as N → ∞
Part VII: Further Properties of the OLS Estimator and Dummy Variables Further Properties of the OLS Estimator 200 / 288
Outline
▶ Dummy Variables
Part VII: Further Properties of the OLS Estimator and Dummy Variables Dummy Variables 201 / 288
Regression Models with Dummy Variables as
Predictors
▶ A dummy variable (binary variable) D is a variable that assumes two values only: 0
or 1
▶ Examples: EU member (D = 1 if EU member, 0 otherwise), brand (D = 1 if
product has a particular brand, 0 otherwise), gender (D = 1 if male, 0 otherwise)
▶ Note that the labelling is not unique, a dummy variable could be labeled in two
ways, i.e. for variable gender:
▶ D = 1 if male, D = 0 if female
▶ D = 1 if female, D = 0 if male
Part VII: Further Properties of the OLS Estimator and Dummy Variables Dummy Variables 202 / 288
Regression Models with Dummy Variables as
Predictors
Consider a regression model with one continuous variable X and one dummy variable D:
Y = β0 + β1 D + β2 X + u
If D = 0, then:
Y = β0 + β2X + u   (intercept: β0)
If D = 1, then:
Y = (β0 + β1) + β2X + u   (intercept: β0 + β1)
Part VII: Further Properties of the OLS Estimator and Dummy Variables Dummy Variables 203 / 288
Regression Models with Dummy Variables as
Predictors
Example: Y = 20 + 3.2D − 2.5X
[Figure: two parallel lines with slope −2.5; the D = 1 line (intercept 23.2) lies above the D = 0 line (intercept 20).]
Part VII: Further Properties of the OLS Estimator and Dummy Variables Dummy Variables 204 / 288
Regression Models with Dummy Variables as
Predictors
Interpretation:
▶ The observed units are split into 2 groups according to D (e.g. into men and
women).
▶ The group with D = 0 is called the baseline (e.g. men).
▶ The regression coefficient β1 of D quantifies the expected difference in the dependent variable Y between the other group (e.g. women) and the baseline, while holding all other variables (e.g. X) fixed.
▶ The null hypothesis β1 = 0 corresponds to the assumption that the conditional average value of Y given all remaining regressors is the same for both groups.
Part VII: Further Properties of the OLS Estimator and Dummy Variables Dummy Variables 205 / 288
Regression Models with Dummy Variables as
Predictors
Consider model
Y = 20 + 3.2D − 2.5X + u
where D = 1 if female. Assume that X = 4:
▶ expected value of Y for a man: E(Y |X = 4, D = 0) = 20 − 2.5 · 4 = 10
▶ expected value of Y for a woman: E(Y |X = 4, D = 1) = 20 + 3.2 − 2.5 · 4 = 13.2
▶ expected difference between women and men is equal to β1 = 3.2
The expected difference between women and men is equal to β1 = 3.2 for all values of
X!
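A sketch reproducing the slide's example by simulation (sample size and error variance are illustrative assumptions):

set.seed(4)
n <- 100
D <- rbinom(n, 1, 0.5)                        # 1 = female, 0 = male (baseline)
X <- runif(n, 0, 5)
Y <- 20 + 3.2 * D - 2.5 * X + rnorm(n)
coef(lm(Y ~ D + X))                           # close to (20, 3.2, -2.5)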
Part VII: Further Properties of the OLS Estimator and Dummy Variables Dummy Variables 206 / 288
Combining More Dummy Variables
Estimate a model where D1 is the gender (1: female, 0: male), D2 is the brand (1:
specific brand, 0: no-name), and P is the price:
Y = β0 + β1 D1 + β2 D2 + β3 P + u
Part VII: Further Properties of the OLS Estimator and Dummy Variables Dummy Variables 207 / 288
Categorical Variables
We can use dummy variables to control for characteristics with multiple categories (K
categories ⇒ K − 1 dummies)
Suppose one of the predictors is the highest level of education. Such variables are often
coded in the following way:
edu
1 high school dropout
2 high school degree
3 college degree
Part VII: Further Properties of the OLS Estimator and Dummy Variables Dummy Variables 208 / 288
Categorical Variables
Including edu directly into a linear regression model would mean that the effect of a
high school degree compared to a drop out is the same as the effect of a college degree
compared to a high school degree.
Part VII: Further Properties of the OLS Estimator and Dummy Variables Dummy Variables 209 / 288
Categorical Variables
This yields:
▶ Baseline (all dummies 0): high school dropout
▶ D1 = 1, if highest degree from high school, 0 otherwise
▶ D2 = 1, if college degree, 0 otherwise
Y = β0 + β1 D1 + β2 D2 + β3 X + u
Part VII: Further Properties of the OLS Estimator and Dummy Variables Dummy Variables 210 / 288
Categorical Variables
In other words:
▶ β1 is the effect of a high school degree compared to a drop out.
▶ β2 is the effect of a college degree compared to a drop out.
Testing hypothesis:
▶ Is the effect of a high school degree compared to a drop out the same as the effect
of a college degree compared to a high school degree?
▶ Test if 2β1 = β2 , or equivalently, test the linear hypothesis 2β1 − β2 = 0.
Part VII: Further Properties of the OLS Estimator and Dummy Variables Dummy Variables 211 / 288
Case Study Marketing
There are 5 different brands of mineral water (KR, RO, VO, JU, A):
▶ Select one mineral water as baseline, e.g. KR.
▶ Introduce 4 dummy variables D1 , . . . , D4 , and assign each of them to the remaining
brands, e.g. D1 = 1, if brand is equal to RO and D1 = 0, otherwise; D2 = 1, if
brand is equal to VO and D2 = 0, otherwise; etc.
Y = β0 + β1 D1 + . . . + β4 D4 + β5 P + u (59)
Part VII: Further Properties of the OLS Estimator and Dummy Variables Dummy Variables 212 / 288
Case Study Marketing
▶ The difference in the expected average rating between two arbitrary brands Dj and
Dk is equal to βj − βk .
▶ Is the rating different for the brands Dj and Dk ? Test the linear hypothesis
βj − βk = 0!
Part VII: Further Properties of the OLS Estimator and Dummy Variables Dummy Variables 213 / 288
Case Study Marketing
Y = β0 + β1 D1 + . . . + β5 D5 + β6 P + u
D1 + D2 + . . . + D5 = 1
Hence, the set of regressors D1 , . . . , D5 is perfectly correlated with the regressor ’1’
corresponding to the intercept ⇒ EViews produces an error message indicating
difficulties with estimating the model.
Part VII: Further Properties of the OLS Estimator and Dummy Variables Dummy Variables 214 / 288
Case Study Marketing
Y = β1 D1 + . . . + β5 D5 + β6 P + u
▶ βj is a brand specific intercept of the regression model for the brand corresponding
to Dj .
▶ For a given price level P, the expected rating for the brand corresponding to Dj is
given by βj + β6 P.
▶ The difference in the expected average rating between two arbitrary brands Dj and
Dk is (again) equal to βj − βk .
Part VII: Further Properties of the OLS Estimator and Dummy Variables Dummy Variables 215 / 288
Part VIII
Residual Diagnosis
Outline
▶ Residual Diagnostics
Hypothetical model:
Y = β0 + β1 X1 + . . . + βK XK + u
Estimated model: yi = β̂0 + β̂1 x1,i + … + β̂K xK,i + ûi
▶ Roughly 95% of the OLS residuals lie between [−2σ̂, 2σ̂]; only 5% lie outside.
▶ Assumption often violated if outliers are present.
▶ Normality often improved through transformations.
m3 = (1/σ̂³) · (1/N) Σᵢ₌₁ᴺ ûi³,   m4 = (1/σ̂⁴) · (1/N) Σᵢ₌₁ᴺ ûi⁴
Jarque-Bera-Statistic:
J = ((N − K)/6) · (m3² + (1/4)(m4 − 3)²) (60)
Under the null hypothesis of normally distributed errors, J approximately follows a χ²₂-distribution.
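A sketch computing (60) by hand for an lm() fit `fit` (packages such as tseries provide jarque.bera.test(); using the maximum-likelihood version SSR/N of σ̂ is an assumption of this sketch):

u <- residuals(fit)
N <- length(u); K <- length(coef(fit)) - 1
s  <- sqrt(mean(u^2))                         # sigma-hat (ML version)
m3 <- mean(u^3) / s^3                         # standardized skewness
m4 <- mean(u^4) / s^4                         # standardized kurtosis
J  <- (N - K) / 6 * (m3^2 + 0.25 * (m4 - 3)^2)
pchisq(J, df = 2, lower.tail = FALSE)         # approximate p-value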
[Figure: histogram of the OLS residuals with fitted normal density; the test statistic shown is 910.094.]
where
yi . . . profit 1994
x1,i . . . profit 1993
x2,i . . . turnover 1994
[Figure: histogram of the OLS residuals with fitted normal density; the test statistic shown is 2.811, with 0.245 as the associated value.]
V(u|X1 , . . . , XK ) = σ 2
First informal check: residual plot; more about formal tests later.
[Figures: residual plots against the yields with maturity 1 month and 60 months, and the actual/fitted/residual time series plot over 1966–1992.]
Assumption (26)
The model does not contain any systematic error, i.e.
E(u|X1 , . . . , XK ) = 0
Example: Simulate data from a simple log-linear regression model with β̃0 = 0.2 and β1 = −1.8:
yi = 0.2 xi^(−1.8) e^(ui) (63)
[Figures: correctly specified log-linear fit — residuals against log(price) show no systematic pattern; “OLS – misspecification” — fitting a linear model to the nonlinear data leaves a systematic pattern in the residuals against price.]
Part VIII: Residual Diagnosis Residual Diagnostics 231 / 288
Case Study Profit
[Figures: profit 1994 (GEW94) against profit 1993 (GEW93) with the fitted regression line, and the corresponding residuals against GEW93.]
▶ Residual Diagnostics
Part VIII: Residual Diagnosis Model Evaluation and Model Comparison 233 / 288
Model Comparison Using R2 and AIC/BIC
Part VIII: Residual Diagnosis Model Evaluation and Model Comparison 234 / 288
Coefficient of Determination R2
Part VIII: Residual Diagnosis Model Evaluation and Model Comparison 235 / 288
R / EViews Class Exercise
Discuss in R / EViews where to find SSR and R²; discuss how SSR and R² change when the number of predictors is increased:
▶ Case Study Profit, workfile profit
▶ Case Study Chicken, workfile chicken
▶ Case Study Marketing, workfile marketing
⇒ R-code code_eco_I.R
Part VIII: Residual Diagnosis Model Evaluation and Model Comparison 236 / 288
Case Study Chicken (Log-Linear Model)
Predictors SSR R2
pchick 0.273487 0.647001
income 0.041986 0.945807
income, pchick 0.015437 0.980074
income, pchick, ppork 0.014326 0.981509
income, pchick, ppork, pbeef 0.013703 0.982313
Part VIII: Residual Diagnosis Model Evaluation and Model Comparison 237 / 288
Problems with R2
▶ Choosing the model with the smallest SSR (largest R2 ) leads to overfitting: R2
“automatically” increases when the number of variables increases.
▶ R2 is 1 for K = N − 1 because SSR = 0 if we include as many predictors as
observations (even if the predictors are useless!).
▶ However, the increase is small when a useless predictor is added. ⇒ penalize the
ever decreasing SSR by incorporating the number of parameters used for
estimation!
Part VIII: Residual Diagnosis Model Evaluation and Model Comparison 238 / 288
Adjusted R2
A very simple way out is to adjust R2 to cater for the number of parameters:
Adjusted R2
R²adj = 1 − (1 − R²) (N − 1)/(N − K − 1) = 1 − ((N − 1)/(N − K − 1)) · (SSR/TSS) = 1 − s²û / s²y
Alternatively (or better), use so-called “information criteria” AIC and SC (BIC).
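In R, all three quantities are directly available for an lm() fit `fit` (sketch; mind the implementation caveat discussed later):

summary(fit)$adj.r.squared                    # adjusted R^2
AIC(fit)                                      # Akaike information criterion
BIC(fit)                                      # Schwarz criterion (SC)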
Part VIII: Residual Diagnosis Model Evaluation and Model Comparison 239 / 288
Information Criteria
Part VIII: Residual Diagnosis Model Evaluation and Model Comparison 240 / 288
Information Criteria
[Figure: penalty term of the information criteria as a function of the number of parameters k.]
Discuss in EViews where to find R2adj , AIC, and Schwarz criterion; discuss how to choose
predictors based on these model choice criteria.
▶ Case Study Profit, workfile profit
▶ Case Study Chicken, workfile chicken
▶ Case Study Marketing, workfile marketing
⇒ R-code code_eco_I.R
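In R, both criteria are available directly; a sketch with illustrative model names:

    fit_small <- lm(qchick ~ income, data = chicken)
    fit_large <- lm(qchick ~ income + pchick + ppork, data = chicken)
    AIC(fit_small, fit_large)   # Akaike information criterion: smaller is better
    BIC(fit_small, fit_large)   # Schwarz criterion (SC)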
Part VIII: Residual Diagnosis Model Evaluation and Model Comparison 242 / 288
Case Study Chicken (Log-Linear Model)
Caveat: mind the different implementations in R and EViews (the reported values differ,
but the resulting best model remains the same).
Part VIII: Residual Diagnosis Model Evaluation and Model Comparison 243 / 288
Comparing Linear and Log-Linear Models
▶ The residual sum of squares SSR depends on the scale of yi , therefore AIC and SC
are scale dependent.
▶ AIC and SC cannot be used directly to compare a linear and a log-linear model.
▶ AIC and SC of the log-linear model could be matched back to the original scale by
adding 2 times the mean (EViews) or 2 times the sum (R) of the log-values of yi .
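A sketch of this correction in R, following the rule above (adding 2 times the sum of the log-values of yi; the data frame d and its variables are illustrative):

    fit_lin <- lm(y ~ x, data = d)             # linear model for y
    fit_log <- lm(log(y) ~ log(x), data = d)   # log-linear model for log(y)

    # Match the AIC of the log-linear model back to the scale of y
    aic_log_matched <- AIC(fit_log) + 2 * sum(log(d$y))
    c(AIC(fit_lin), aic_log_matched)           # now comparable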
Part VIII: Residual Diagnosis Model Evaluation and Model Comparison 245 / 288
Part IX
Advanced Multiple Regression Models
Outline
▶ Interaction Terms
Part IX: Advanced Multiple Regression Models Quadratic Terms 247 / 288
Models with Quadratic Terms
Y = β0 + β1 X + β2 X² + u (67)
Implications:
▶ OLS estimation of β0 , β1 , and β2 proceeds as discussed above, based on the
predictors X1 = X and X2 = X² (see the R sketch after this list)
▶ Although the relationship between X1 and X2 is deterministic (note that X2 = X1²),
the predictors X1 and X2 are not linearly dependent; hence, OLS estimation is
feasible
▶ Note that the relationship between X and E (Y |X ) is non-linear
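A sketch of such a fit in R on simulated data (all names illustrative); I() ensures that x^2 enters as its own predictor, and the vertex is recovered from the estimated coefficients:

    set.seed(1)
    x <- runif(100, 0, 10)
    y <- 2 + 1.5 * x - 0.2 * x^2 + rnorm(100, sd = 0.5)

    fit <- lm(y ~ x + I(x^2))      # quadratic regression
    b   <- coef(fit)

    # Vertex x0 = -beta1 / (2 * beta2)
    -b["x"] / (2 * b["I(x^2)"])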
Part IX: Advanced Multiple Regression Models Quadratic Terms 248 / 288
Models with Quadratic Terms
[Figure: four panels illustrating quadratic relationships between X and Y, with parabolas opening up and opening down over different ranges of X.]
▶ The parabola corresponding to the quadratic function (67) opens up iff β2 > 0 and
opens down iff β2 < 0
▶ The vertex (Scheitel) is obtained by setting the first derivative of E(Y |X = x) with
respect to x equal to 0:
∂E(Y |X = x)/∂x = β1 + 2β2 x = 0
This yields that the vertex lies at x0 = −β1/(2β2); note that x0 is negative if β1 and β2
have the same sign, and positive otherwise.
Part IX: Advanced Multiple Regression Models Quadratic Terms 251 / 288
Monotonic Behavior
Often, only part of the parabola is used to describe monotonic behavior over a certain
range of X, e.g. between the smallest and the largest observed value of X.
Part IX: Advanced Multiple Regression Models Quadratic Terms 252 / 288
Testing for Non-Linearity
Part IX: Advanced Multiple Regression Models Quadratic Terms 253 / 288
Understanding the Coefficients
The instantaneous change of E(Y |X = x) is equal to the first derivative with respect to
x:
∂E(Y |X = x)/∂x = β1 + 2β2 x (68)
Part IX: Advanced Multiple Regression Models Quadratic Terms 254 / 288
Understanding the Coefficients
Part IX: Advanced Multiple Regression Models Quadratic Terms 255 / 288
Understanding the Coefficients
Suppose that β1 is positive while β2 is negative. Then according to the first term in
(68), increasing x will increase E(Y |X = x ), however, this positive effect becomes
smaller with increasing x . It remains positive as long as x is smaller than the vertex x0 :
x < −β1/(2β2)
If x is larger than the vertex x0 , there is a negative effect of increasing x , which gets
larger with increasing x .
Part IX: Advanced Multiple Regression Models Quadratic Terms 256 / 288
Monotonic Behavior
Part IX: Advanced Multiple Regression Models Quadratic Terms 257 / 288
Monotonic Behavior
Example: E(Y |X = x ) = 20 + 0.005x − 0.2x 2 , 1 ≤ x ≤ 5
▶ Parabola opens down because β2 = −0.2 < 0
▶ Vertex: 0.005 − 0.4x0 = 0 ⇒ x0 = 0.0125
▶ The range of x (1 ≤ x ≤ 5) lies entirely to the right of the vertex ⇒ monotonically decreasing function
[Figure: plot of E(Y |X = x) = 20 + 0.005x − 0.2x² for 1 ≤ x ≤ 5; the function decreases monotonically from about 19.8 to about 15.]
Part IX: Advanced Multiple Regression Models Quadratic Terms 258 / 288
Case Study Chicken
Y = β0 + β1 X1 + β2 P1 + β3 P1² + β4 P2 + β5 P2² + u
with X1 the income, P1 the price of chicken, and P2 the price of pork. This model
outperforms a model without quadratic terms according to AIC and SC.
β2 is negative, but the negative effect decreases as the price increases, since β3 is
positive. The vertex is equal to
−β2/(2β3) = −(−1.69)/(2 × 0.014) = 60.
2β3 2 × 0.014
This value lies within the range of observed prices; hence the chicken price effect
changes sign within the range of observations.
Part IX: Advanced Multiple Regression Models Quadratic Terms 259 / 288
Case Study Chicken
β4 is positive, but the positive effect decreases as the price of pork increases, since β5 is
negative. The vertex is equal to
−β4/(2β5) = −0.542/(2 × (−0.0024)) = 113.
This value lies within the range of observed prices; hence the pork price effect changes
sign within the range of observations.
Part IX: Advanced Multiple Regression Models Quadratic Terms 260 / 288
Outline
▶ Interaction Terms
Part IX: Advanced Multiple Regression Models Interaction Terms 261 / 288
Models with Interaction Terms
Y = β0 + β1 X1 + β2 X2 + β3 X1 X2 + u (69)
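In R, model (69) can be fit with the * formula operator, which expands to the two main effects plus their product (a sketch; names illustrative):

    fit <- lm(y ~ x1 * x2, data = d)   # same as y ~ x1 + x2 + x1:x2
    coef(fit)                          # beta0, beta1, beta2, beta3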
Part IX: Advanced Multiple Regression Models Interaction Terms 262 / 288
Models with Interaction Terms
▶ The first derivative of E(Y |X1 = x1, X2 = x2) with respect to x1 is given by:
∂E(Y |X1 = x1, X2 = x2)/∂x1 = β1 + β3 x2
and depends on the actual value of x2.
▶ The first derivative of E(Y |X1 = x1, X2 = x2) with respect to x2 is given by:
∂E(Y |X1 = x1, X2 = x2)/∂x2 = β2 + β3 x1
and depends on the actual value of x1.
Part IX: Advanced Multiple Regression Models Interaction Terms 263 / 288
Understanding the Coefficients
The average effect δ1 of X1 on E(Y |X1, X2) can be evaluated at the sample mean X̄2 of X2:
δ1 = β1 + β3 X̄2
Similarly, the average effect δ2 of X2 on E(Y |X1, X2) can be evaluated at the sample
mean X̄1 of X1:
δ2 = β2 + β3 X̄1
Part IX: Advanced Multiple Regression Models Interaction Terms 264 / 288
Centering the Predictors
Y = δ0 + δ1 X1 + δ2 X2 + δ3 (X1 − X̄1)(X2 − X̄2) + u
Thus,
▶ δ1 is the average effect of X1 on E(Y |X1 , X2 ) at the mean of X2
▶ δ2 is the average effect of X2 on E(Y |X1 , X2 ) at the mean of X1
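A sketch of the centered specification in R (names illustrative); the main-effect coefficients then estimate the average effects δ1 and δ2 directly:

    d$x1c <- d$x1 - mean(d$x1)
    d$x2c <- d$x2 - mean(d$x2)
    fit_c <- lm(y ~ x1 + x2 + I(x1c * x2c), data = d)
    coef(fit_c)   # coefficients on x1 and x2 are delta1 and delta2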
Part IX: Advanced Multiple Regression Models Interaction Terms 265 / 288
Case Study Chicken
Y = β0 + β1 X1 + β2 P1 + β3 P2 + β4 X1 P2 + u
This model outperforms a model without the interaction term according to AIC and BIC.
β3 is positive, but the positive effect of increasing the price of pork decreases as the
income X1 increases, since β4 is negative:
∂E(Y |X1 = x1, P1 = p1, P2 = p2)/∂p2 = β3 + β4 x1
Part IX: Advanced Multiple Regression Models Interaction Terms 266 / 288
Case Study Chicken
The average income is equal to X̄1 = 1035.065, hence the average effect of the price of
pork is equal to β3 + β4 X̄1.
This value is considerably smaller than the effect obtained from the model without an
interaction term (0.174).
The average effect of the price of pork is obtained immediately from OLS estimation if
following model is fit to the data:
Y = δ0 + δ1 X1 + δ2 P1 + δ3 P2 + δ4 (X1 − X̄1)(P2 − P̄2) + u
Part IX: Advanced Multiple Regression Models Interaction Terms 267 / 288
Outline
▶ Interaction Terms
Part IX: Advanced Multiple Regression Models Dummy Variables with Interaction Terms 268 / 288
Interaction with Dummy Variables
Y = β0 + β1 D + β2 X + β3 XD + u (70)
If D = 0, then:
Y = β0 + β2 X + u
If D = 1, then:
Y = (β0 + β1 ) + (β2 + β3 )X + u
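With a 0/1 dummy d, the * operator fits model (70) directly; a sketch with illustrative names:

    fit <- lm(y ~ d * x, data = dat)
    coef(fit)   # (Intercept) = beta0, d = beta1, x = beta2, d:x = beta3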
Part IX: Advanced Multiple Regression Models Dummy Variables with Interaction Terms 269 / 288
Interaction with Dummy Variables
[Figure: two regression lines, one for D = 0 and one for D = 1, differing in both intercept and slope.]
Part IX: Advanced Multiple Regression Models Dummy Variables with Interaction Terms 270 / 288
Interaction with Dummy Variables
Interpretation:
▶ The observed units are split into 2 groups according to D (e.g. into men and
women)
▶ The coefficient β3 models the difference in the marginal effect of X between the
two groups. A change ∆x leads to an expected change in Y equal to
▶ E(Y |X = x + ∆x, D = 0) − E(Y |X = x, D = 0) = β2 ∆x,
▶ E(Y |X = x + ∆x, D = 1) − E(Y |X = x, D = 1) = (β2 + β3) ∆x.
▶ The difference in the expected value of Y between the two groups for a given value
of X is equal to:
E(Y |X , D = 1) − E(Y |X , D = 0) = β1 + β3 X
Part IX: Advanced Multiple Regression Models Dummy Variables with Interaction Terms 271 / 288
Interaction with Dummy Variables
Testing hypotheses:
▶ The null hypothesis β3 = 0 corresponds to the assumption that the effect of X is
the same for both groups (interaction effect is not significant)
▶ The joint null hypothesis β2 = 0, β2 + β3 = 0 corresponds to the assumption that
the effect of X is zero for both groups
▶ The joint null hypothesis β1 = 0, β3 = 0 corresponds to the assumption that the
regression model is the same for both groups
Part IX: Advanced Multiple Regression Models Dummy Variables with Interaction Terms 272 / 288
Case Study Marketing
Y = β0 + β1 D + β2 P + β3 PD + u
Results:
▶ There is a very significant price effect for the specific brand
▶ Increasing the price for an ordinary brand by one unit leads to an expected decrease
in the rating by β2 , i.e. around 0.31 points
▶ For the KR brand, the price effect is equal to β2 + β3 , i.e. increasing the price for
the specific brand by one unit leads to an expected decrease in the rating by 0.26
points
Part IX: Advanced Multiple Regression Models Dummy Variables with Interaction Terms 273 / 288
Part X
Regression with Heteroscedastic Errors
Outline
Part X: Regression with Heteroscedastic Errors Regression Models with Heteroskedastic Errors 275 / 288
Regression Models with Heteroskedastic Errors
Part X: Regression with Heteroscedastic Errors Regression Models with Heteroskedastic Errors 276 / 288
Case Study Profit
yi = β0 + β1 x1,i + β2 x2,i + ui
yi . . . profit 1994
x1,i . . . profit 1993
x2,i . . . turnover 1994
Part X: Regression with Heteroscedastic Errors Regression Models with Heteroskedastic Errors 277 / 288
OLS Estimation under Heteroskedasticity
Simulate data from a regression model with β0 = 0.2 and β1 = −1.8 and
heteroskedastic errors:
yi = 0.2 − 1.8 xi + ui, ui ∼ N(0, σi²)
σi² = σ² (0.2 + xi)²
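A sketch of this simulation in R (N = 50 and σ² = 0.1 are taken from the simulation study below; the remaining choices are illustrative):

    set.seed(1)
    N     <- 50
    sigma <- sqrt(0.1)
    x     <- runif(N, -0.5, 0.5)
    u     <- rnorm(N, mean = 0, sd = sigma * abs(0.2 + x))  # sd_i = sigma * |0.2 + x_i|
    y     <- 0.2 - 1.8 * x + u
    plot(x, y)   # error variance grows with 0.2 + x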
[Figure: scatterplot of the simulated data, Y against X for −0.5 ≤ x ≤ 0.5.]
Part X: Regression with Heteroscedastic Errors Regression Models with Heteroskedastic Errors 278 / 288
OLS Estimation under Heteroskedasticity
[Figure: two panels (N = 50, σ² = 0.1, Design 2) plotting β2 (price) against β1 (constant). Left hand side: estimation errors obtained from a simulation study with 200 data sets (each N = 50 observations); right hand side: contours show the estimation error according to OLS estimation.]
Part X: Regression with Heteroscedastic Errors Regression Models with Heteroskedastic Errors 279 / 288
Weighted Least Squares Estimation
Part X: Regression with Heteroscedastic Errors Regression Models with Heteroskedastic Errors 280 / 288
Weighted Least Squares Estimation
yi = β0 + β1 x1,i + . . . + βK xK,i + ui
Regression model (72) has the same parameters as the original model, but a transformed
response variable as well as transformed predictors.
Part X: Regression with Heteroscedastic Errors Regression Models with Heteroskedastic Errors 281 / 288
Weighted Least Squares Estimation
where
yi⋆ = yi/√Zi,   x⋆0,i = 1/√Zi,   x⋆j,i = xj,i/√Zi,   ∀j = 1, . . . , K
Note that model (73) fulfills assumption (38), i.e. it is a model with homoskedastic
errors.
Part X: Regression with Heteroscedastic Errors Regression Models with Heteroskedastic Errors 282 / 288
Weighted Least Squares Estimation
Residuals ui for observations with large variances are down-weighted, while residuals for
observations with small variances obtain a higher weight. Hence the name weighted
least squares estimation.
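In R, this corresponds to the weights argument of lm(), which minimizes the weighted sum of squared residuals (a sketch, assuming the error variance is proportional to a known variable Z in the data frame, i.e. Var(ui) = σ² Zi):

    # weight 1/Z_i down-weights observations with large variance
    fit_wls <- lm(y ~ x1 + x2, data = d, weights = 1 / Z)
    coef(fit_wls)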
There is no "intercept" in the model (73), only covariates. Using the matrix formulation
of the multiple regression model (73), the matrix of predictors X⋆ has rows
wi (1, x1,i, . . . , xK,i) and the observation vector is y⋆ = (w1 y1, . . . , wN yN)′, where
wi = 1/√Zi,   i = 1, . . . , N
Part X: Regression with Heteroscedastic Errors Regression Models with Heteroskedastic Errors 284 / 288
Weighted Least Squares Estimation
This is equal to the following WLS estimator, which is expressed entirely in terms of the
original variables:
β̂ = (X′WX)⁻¹ X′Wy (74)
with W = diag(1/Z1, . . . , 1/ZN).
Part X: Regression with Heteroscedastic Errors Regression Models with Heteroskedastic Errors 285 / 288
Testing for Heteroskedasticity
▶ Classical tests for heteroskedasticity are based on the squared OLS residuals ûi2 ,
e.g. the White or the Breusch-Pagan heteroskedasticity test. The idea is to test for
dependence of the squared residuals on any of the predictor variables using a
regression type model:
ûi² = α0 + α1 x1,i + . . . + αK xK,i + ξi
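In R, the Breusch-Pagan test is available in the lmtest package (assuming it is installed); a sketch:

    library(lmtest)
    fit <- lm(y ~ x1 + x2, data = d)   # illustrative model
    bptest(fit)   # regresses (scaled) squared residuals on the predictors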
Part X: Regression with Heteroscedastic Errors Regression Models with Heteroskedastic Errors 286 / 288
Case Study Profit
yi = β0 + β1 x1,i + β2 x2,i + ui
Part X: Regression with Heteroscedastic Errors Regression Models with Heteroskedastic Errors 287 / 288
Some Final Words. . .
http://xkcd.com/552/
Part X: Regression with Heteroscedastic Errors Regression Models with Heteroskedastic Errors 288 / 288