
Lecture 2 & 3: Simple Linear Regression

Gumilang Aryo Sahadewo

Department of Economics
Universitas Gadjah Mada

October 9, 2017

Gumilang Aryo Sahadewo (MEP UGM) Applied ECM: Lecture 2 & 3 October 9, 2017 1 / 52
Logistics

Textbook:
JW: Introductory Econometrics: A Modern Approach by Jeffrey M.
Wooldridge, sixth edition, required.
Lecture notes
Class notes

Review

Econometrics is a useful tool for estimating the effect of changing one
variable on another.
Given two random variables Y and X, we are interested in studying
how Y varies with changes in X, keeping everything else constant.
To address this question, we study the simple linear regression model.

Motivation

Consider the following questions.

A government program increases teachers' salaries: what is the
effect on their students' test scores?
The GoI implements a higher excise tax on cigarettes: what is the effect
on layoffs?
Quantitative answers to these questions are useful for making decisions
and policy recommendations.
The simple regression model is the base model for addressing all these
questions from a simple and general perspective.

Simple Regression Model

In a simple regression model that studies how Y varies with changes in
X:

Y = β₀ + β₁X + u

Y is the dependent variable (also: explained variable, LHS variable).
X is the independent variable (also: explanatory variable, control
variable, RHS variable).
The parameters of the model are β₀ and β₁. The former is the
intercept, while the latter is the slope. Interest usually centers on β₁.

The term u in a regression model

Textbooks define u as the error term or disturbance.
I personally don't like that name! So, what is u?
In a simple regression model that studies how Y varies with changes
in X:

Y = β₀ + β₁X + u

The term u includes variables that we do not control for in this model
but that also affect Y.
Consider the following variables:
Y = monthly wage (in dollars);
X = years of education.

Interpreting Coefficients

If the other factors in u are held fixed, so that the change in u is
zero (Δu = 0), then X has a linear effect on Y:

ΔY = β₁ΔX, if Δu = 0

The change in Y is simply β₁ multiplied by the change in X. This
means that β₁ is the slope parameter in the relationship between Y
and X, holding the other factors in u fixed; it is of primary interest in
applied economics.
The intercept parameter β₀, sometimes called the constant term, also
has its uses, although it is rarely central to an analysis.

An Example (A Simple Wage Equation)

A model relating a person's wage to observed education and other
unobserved factors is

Wage = β₀ + β₁Education + u

If wage is measured in dollars per hour and Education measures years
of education, then β₁ measures the change in hourly wage given
another year of education.
What does β₀ measure?
Let's look at a scatter graph.

Example: Test Score and Student-Teacher Ratio

Now consider the model Y = β₀ + β₁X + u, where

Y = average test scores;
X = student/teacher ratio.
Again, u may represent intelligence or social background.
What is the expected sign of β₁? Why?
β₁ has the following interpretation: if the student/teacher ratio
increases by one, then average test scores change by β₁, keeping
everything else constant.

Two Questions

In what follows, we focus on two questions:

When is it possible to recover (β₀, β₁) from data?
How can we do it?
The answers are:
Key assumptions.
The ordinary least squares (OLS) estimator.

Assumptions in Linear Regression Model

The key assumption involves the conditional expectation of u given X,
E(u | X).
We impose the zero conditional mean assumption:

E[u | X] = 0

This assumption implies that u and X are uncorrelated.
Since the key assumption involves an unobservable variable, it cannot
be tested.

Zero conditional mean assumption: an example

In a wage equation

Wage = β₀ + β₁Education + u

the zero conditional mean assumption implies: E[u | Education] = 0.
This means u and Education are uncorrelated.
To simplify the discussion, assume that u is the same as innate ability
in the wage equation.
The zero conditional mean assumption then requires that the average
level of ability is the same regardless of years of education.
If, for example, we think that average ability increases with years of
education, then the assumption is false. (This would happen if, on
average, people with more ability choose to become more educated.)

Assumptions in Linear Regression Model
E[Y | X] is a linear function of X; for any given value of X, the
distribution of Y is centered about E[Y | X].

Linear regression model

We write the basic model as

Y = β₀ + β₁X + u

β₀ is the intercept; β₁ is the coefficient on X, also called the
slope.
Estimated βs are denoted β̂₀ and β̂₁.
Fitted values of Y are denoted Ŷ: Ŷᵢ = β̂₀ + β̂₁Xᵢ.
Residuals are denoted û: ûᵢ = Yᵢ − Ŷᵢ.

How to estimate the model: data

In order to estimate the model, one needs data:
a random sample of n observations.

Data: Test Score and Student-Teacher Ratio

How to estimate the model: ordinary least squares
The most popular approach to estimating β₀ and β₁ is ordinary least
squares (OLS).
We choose β̂₀ and β̂₁ to minimize the sum of squared residuals, i.e., we
solve

min over β̂₀, β̂₁ of Σᵢ₌₁ⁿ (Yᵢ − β̂₀ − β̂₁Xᵢ)²

This is a calculus/optimization problem.
The proof is beyond the scope of this course.
The OLS estimators are:

β̂₁ = Σᵢ₌₁ⁿ (Xᵢ − X̄)(Yᵢ − Ȳ) / Σᵢ₌₁ⁿ (Xᵢ − X̄)²
β̂₀ = Ȳ − β̂₁X̄
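As a quick illustration (not from the lecture), the two OLS formulas can be computed by hand in a few lines of Python on a small made-up sample:

```python
# Toy illustration of the OLS formulas; the data below are made up.
from statistics import mean

X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 8.1, 9.8]

x_bar, y_bar = mean(X), mean(Y)

# Slope: sample covariance of X and Y over sample variance of X
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sum(
    (x - x_bar) ** 2 for x in X
)
# Intercept: Ybar minus slope times Xbar
b0 = y_bar - b1 * x_bar

print(b1, b0)  # roughly 1.96 and 0.14 for this sample
```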

Some comments on OLS estimators

The slope estimator is the sample covariance of X and Y divided by
the sample variance of X.
If X and Y are positively correlated, then the slope is positive.
If X and Y are negatively correlated, then the slope is negative.

Example: CEO Salary and Return on Equity

Let Y be annual salary (in thousands of dollars) and X be the
average roe (in percent) for the CEO's firm.
The model is salary = β₀ + β₁roe + u.
The OLS estimates are β̂₀ = 963.191 and β̂₁ = 18.501:

salary-hat = 963.191 + 18.501 roe

If roe equals zero, the predicted annual salary is 963.191 thousand
dollars, i.e., $963,191 = 963.191 × 1,000.
If roe increases by one percentage point, then predicted salary
increases by $18,501 = 18.501 × 1,000.
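The arithmetic in these two bullets is easy to verify; here is a small sketch (the function name is mine, the coefficients are the slide's):

```python
# Fitted line from the slide: salary measured in thousands of dollars.
def predicted_salary(roe):
    """Predicted annual salary (thousands of dollars) given roe in percent."""
    return 963.191 + 18.501 * roe

at_zero = predicted_salary(0)                            # the intercept
one_point = predicted_salary(11) - predicted_salary(10)  # +1 point of roe

print(at_zero * 1000)    # about $963,191
print(one_point * 1000)  # about $18,501
```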

Estimated Regression Line
Example: Monthly Wage and Years of Education

Dataset: wage.dta
Command: graph twoway (scatter wage educ) (lfit wage educ)

Estimated Regression Line
Example: Test Score and Student-Teacher Ratio

Dataset: caschool.dta
Command: graph twoway (scatter testscr str) (lfit testscr str)

Some important quantities
Total sum of squares (SST):

SST ≡ Σᵢ₌₁ⁿ (Yᵢ − Ȳ)²

Explained sum of squares (SSE):

SSE ≡ Σᵢ₌₁ⁿ (Ŷᵢ − Ȳ)²

Residual sum of squares (SSR):

SSR ≡ Σᵢ₌₁ⁿ (Yᵢ − Ŷᵢ)² = Σᵢ₌₁ⁿ ûᵢ²

SST = SSE + SSR.

Goodness of fit: R²

One measure of the goodness-of-fit of a regression model is R²:

R² = SSE/SST = 1 − SSR/SST

R² measures the predictive power of our model.
0 ≤ R² ≤ 1. R² = 1 means a perfect fit. If R² is near one, the
regressor X is good at predicting Y; if R² is close to zero, the
regressor is not good at predicting Y.
R² does not depend on the units of measurement.
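To make the decomposition concrete, here is a small Python check on made-up data (the stated intercept and slope are the OLS estimates for this toy sample, obtainable from the OLS formulas):

```python
# Toy check that SST = SSE + SSR and R^2 = SSE/SST = 1 - SSR/SST.
from statistics import mean

X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 8.1, 9.8]
b0, b1 = 0.14, 1.96  # OLS intercept and slope for this made-up sample

y_bar = mean(Y)
Y_hat = [b0 + b1 * x for x in X]             # fitted values
resid = [y - yh for y, yh in zip(Y, Y_hat)]  # residuals

SST = sum((y - y_bar) ** 2 for y in Y)
SSE = sum((yh - y_bar) ** 2 for yh in Y_hat)
SSR = sum(e ** 2 for e in resid)

R2 = SSE / SST
print(SST, SSE + SSR)  # equal (up to float rounding)
print(R2, 1 - SSR / SST)
```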

Example

CEO salary and return on equity:

salary-hat = 963.191 + 18.501 roe
n = 209, R² = 0.0132

Wage and education:

wage-hat = −0.90 + 0.54 educ
n = 526, R² = 0.163

Caution: a high R-squared does not necessarily mean that the
regression has a causal interpretation!

Incorporating Nonlinearities

So far the population regression line was assumed to be linear:
E(Y | X) = β₀ + β₁X.
This assumption is very strong, and in many cases there are reasons
to believe that the relationship between X and Y is nonlinear.
In applied work, for example, you will often encounter regression
equations where the dependent variable appears in logarithmic form.
For instance, the simple Mincer equation is given by
log(wage) = β₀ + β₁educ + u.

Incorporating nonlinearities in simple regression

Suppose, instead, that the percentage increase in wage is the same
given one more year of education. A model that gives (approximately)
a constant percentage effect is

log(wage) = β₀ + β₁educ + u

where log(·) denotes the natural logarithm.
In particular, if Δu = 0, then

%Δwage ≈ (100 · β₁)Δeduc

Log-Linear Model

The log-linear regression model is

log(y) = β₀ + β₁x + u.

Note that the predicted value of y is always positive because
y = exp(β₀ + β₁x + u) > 0. This is a desirable property in many cases,
e.g., the wage-education example.
Suppose that x increases by one unit, i.e., Δx = 1. Then y changes
as follows:

log(y + Δy) = β₀ + β₁(x + Δx) + u = log(y) + β₁Δx = log(y) + β₁,

which implies log(y + Δy) − log(y) = β₁.
Then Δy/y ≈ β₁, so y increases by approximately β₁ · 100%.
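The quality of this approximation can be checked numerically. A small sketch using the 0.083 estimate that appears later in the lecture (the exact percent change implied by the log model uses exp):

```python
# How good is the 100*b1 approximation in a log-linear model?
import math

b1 = 0.083  # slope from the log(wage) education example

approx_pct = 100 * b1                 # approximate % change in y per unit of x
exact_pct = 100 * (math.exp(b1) - 1)  # exact % change implied by the model

print(approx_pct)           # close to 8.3
print(round(exact_pct, 2))  # about 8.65, so the approximation is decent
```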

Example

Suppose we use log(wage) as the dependent variable. We obtain the
following relationship:

log(wage)-hat = 0.584 + 0.083 educ
n = 526, R² = 0.186

The coefficient on educ has a percentage interpretation when it is
multiplied by 100: wage increases by 8.3 percent for every additional
year of education. This is what economists mean when they refer to
the return to another year of education.

Log-Log Model
Interpretation of Coefficients: Elasticity

The second case is a model where both y and x are transformed by
taking their logarithms:

log(y) = β₀ + β₁ log(x) + u.

Now β₁ represents a constant elasticity, i.e., a one percent increase in
x leads to a β₁% increase in y.
E.g., if β₁ = 0.312, a 1% increase in x leads to a 0.312% increase in
y.
In a quantity-price model with y = qᵈ and x = price, if |β₁| < 1, the
demand is said to be inelastic.
The interpretation of β₀ is rarely of interest.

Linear-Log Model

The third case is a model where x is transformed by taking its
logarithm:

y = β₀ + β₁ log(x) + u.

This model is usually called the linear-log model.
The interpretation of β₁ is as follows: a one percent increase in x
leads to an increase of β₁ · 0.01 units in y.
When β₁ = 27, e.g., a one percent increase in x leads to a 0.27 unit
increase in y.
Note that β₀ represents the expected value of y given x = 1, because
log(1) = 0.

Advantages of linear regression

Easy.
Computationally simple/fast.
Speed insensitive to dimension of X .
Relatively easy to interpret.
Recognizable & cross-disciplinary.
Somewhat flexible.
Standardized.

Disadvantages of linear regression

Does not allow one to discover nonlinear structure in the data.
Any nonlinear structure must be known and specified a priori.

How to Run Regressions in Stata?

reg Y X

Standard assumptions for the simple linear regression model

Assumption SLR.1 (Linear in parameters)
The data generating process can be written as

Y = β₀ + β₁X + u

which is linear in parameters.

Assumption SLR.2 (Random sampling)
We have a random sample {(Yᵢ, Xᵢ) : i = 1, 2, . . . , n} with n ≥ 2.

Assumption SLR.3 (Sample variation in the explanatory variable)
The values of the explanatory variable are not all the same:

Σᵢ₌₁ⁿ (Xᵢ − X̄)² > 0

Standard assumptions for the simple linear regression model

Assumption SLR.4 (Zero conditional mean)
Zero conditional mean of the error term: E[uᵢ | Xᵢ] = 0.

Assumption SLR.5 (Homoskedasticity)
Homoskedasticity: Var(uᵢ | Xᵢ) = σ².

OLS estimator is unbiased

Theorem (Unbiasedness)
Under Assumptions SLR.1-SLR.4:

E(β̂₀) = β₀ and E(β̂₁) = β₁

for any values of β₀ and β₁. In other words, β̂₀ is an unbiased estimator
for β₀, and β̂₁ is an unbiased estimator for β₁.

Interpretation of unbiasedness

The estimated coefficients may be smaller or larger, depending on the
sample, which is the result of a random draw.
However, on average, they will be equal to the values that
characterize the true relationship between y and x in the population.
"On average" means: if sampling were repeated, i.e., if drawing the
random sample and doing the estimation were repeated many times.
In a given sample, estimates may differ considerably from the true values.
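This repeated-sampling idea is easy to simulate. A hedged sketch (the data generating process and parameter values are made up for illustration):

```python
# Monte Carlo illustration of unbiasedness: average the OLS slope over many
# random samples drawn from a known DGP and compare it to the true slope.
import random
from statistics import mean

random.seed(42)
beta0, beta1 = 1.0, 2.0   # true (made-up) population parameters
n, reps = 50, 2000

def ols_slope(X, Y):
    xb, yb = mean(X), mean(Y)
    return sum((x - xb) * (y - yb) for x, y in zip(X, Y)) / sum(
        (x - xb) ** 2 for x in X
    )

slopes = []
for _ in range(reps):
    X = [random.gauss(0, 1) for _ in range(n)]
    U = [random.gauss(0, 1) for _ in range(n)]  # E[u | X] = 0 by construction
    Y = [beta0 + beta1 * x + u for x, u in zip(X, U)]
    slopes.append(ols_slope(X, Y))

print(mean(slopes))  # close to the true slope 2.0
```

Individual draws of the slope vary noticeably around 2.0, but their average over many samples does not.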

Some Comments

Remember that unbiasedness is a feature of the sampling distributions
of β̂₀ and β̂₁; it says nothing about the estimate that we obtain
for a given sample.
Unbiasedness generally fails if any of our four assumptions fails.
Think about examples of how SLR.1-SLR.4 could fail.

Variances of the OLS Estimators

In addition to knowing that the sampling distribution of β̂₁ is
centered about β₁ (β̂₁ is unbiased), it is important to know how far
we can expect β̂₁ to be from β₁ on average.
This also allows us to choose the best estimator among all, or at least
a broad class of, unbiased estimators. The estimator that has the
smallest variance is called the efficient estimator.
The derivation is much simpler under SLR.5, the homoskedasticity
assumption:

Var(uᵢ | Xᵢ) = σ², for all Xᵢ

Variances of the OLS Estimators

Under the zero conditional mean assumption, E(u | X) = E(u) = 0,
we have

Var(u | X) = E(u² | X) − [E(u | X)]² = E(u² | X)

Under homoskedasticity, σ² is also the unconditional expectation of u²:
σ² = E(u² | X) = E(u²) = Var(u).
It is useful to write the model in terms of the conditional mean and
conditional variance of Y:

E(Y | X) = β₀ + β₁X
Var(Y | X) = Var(β₀ + β₁X + u | X) = Var(u | X) = σ²

When Var(u | X) depends on X, the error term is said to exhibit
heteroskedasticity (or nonconstant variance). Will this make β̂₁ biased?

Graphical illustration of homoskedasticity
The variability of the unobserved influences does not depend on
the value of the explanatory variable.

Graphical illustration of heteroskedasticity
The variability of the unobserved influences depends on the value of
the explanatory variable (e.g., wage vs. educ).

Homoskedasticity
Discussion

Now we discuss whether the predictive power of the regression line,
E(Yᵢ | Xᵢ) = β₀ + β₁Xᵢ, differs across Xᵢ.
Recall that the model is Y = β₀ + β₁X + u and the population
regression function is E(Y | X) = β₀ + β₁X.
The conditional variance of the error term, i.e., Var(u | X), determines
the predictive power of the latter.
When Var(u | X) is constant, the predictive power of E(Y | X) does not
vary with X.
When Var(u | X) is constant, we say that the error term exhibits
homoskedasticity, or that the error term is homoskedastic.

Variances of the OLS Estimators

Under SLR.1-SLR.5:

Var(β̂₁) = σ² / Σᵢ₌₁ⁿ (Xᵢ − X̄)² = σ²/SSTₓ

Var(β̂₀) = σ² (n⁻¹ Σᵢ₌₁ⁿ Xᵢ²) / Σᵢ₌₁ⁿ (Xᵢ − X̄)²

The standard deviation of β̂₁ is sd(β̂₁) = √Var(β̂₁) = σ/√SSTₓ.
The sampling variability of the estimated regression coefficients is
higher the larger the variability of the unobserved factors, and
lower the higher the variation in the explanatory variable.

Variances of the OLS Estimators

In most cases, however, σ² is unknown; therefore we need to use the
data to estimate σ².

Estimating the Error Variance

The unbiased estimator σ̂² of σ² is

σ̂² = (1/(n − 2)) Σᵢ₌₁ⁿ ûᵢ² = SSR/(n − 2)

(n − 2) is the degrees of freedom.
Under SLR.1-SLR.5, E(σ̂²) = σ².

Calculation of standard errors for regression coefficients


= 2 is called standard error of the regression
Standard error of 1
r s
\ 2
se(1 ) = Var (1 ) = = qP
SSTx n 2
i=1 (Xi X )

Var (1 ) and/or se(1 )measure how precisely the regression


coefficients are estimated
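Putting the two formulas together, a small numeric sketch (the values of n, SSR, and SSTₓ below are toy numbers, not from the lecture):

```python
# Compute sigma-hat^2 = SSR/(n-2) and se(b1) = sigma-hat / sqrt(SST_x).
import math

n = 5          # toy sample size
SSR = 0.092    # toy residual sum of squares
SST_x = 10.0   # toy total variation in X

sigma2_hat = SSR / (n - 2)             # unbiased estimate of the error variance
se_b1 = math.sqrt(sigma2_hat / SST_x)  # standard error of the slope estimate

print(sigma2_hat, se_b1)
```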

Standard errors in Stata regression output
