
Econometrics I: Fundamentals of Regression Analysis

Part 1

Javier Abellán, Màxim Ventura and Carlos Suárez

Universitat Pompeu Fabra


Contents

1. The role of Econometrics: simple regression model
2. Different types of data
3. The Ordinary Least Squares estimator (OLS)
4. First order conditions
5. Interpretation of the coefficients
6. Measures of goodness of fit
7. The OLS estimator assumptions
8. The sampling distribution of the OLS estimator
9. Homoskedasticity and heteroskedasticity
10. Hypothesis test and confidence intervals


The role of Econometrics: simple regression model


Empirical problem: Size of an apartment and its price


• Let’s assume we work in a real estate firm and we are asked to set the price of a property for which we only know its size
• We have information on the price and size of different apartments from 1260 previous transactions
• Using all this information, we want to know how the area (in square meters) of an apartment in Barcelona affects its price (in euros)

• We could then draw a line through the data points:

Pricei = β0 + βarea Areai

• Accordingly, the area of apartment i is linearly related to its price, where β0 is the intercept (the ordinate at the origin) while βarea is the slope of the line


Stata code for the scatterplot

clear all
set more off
use habitatge_BCN_1920_12.dta, clear
twoway (scatter preu superf, msize(tiny) mcolor(edkblue)) ///
(lfit preu superf, lwidth(medium) lcolor(black))
corr superf preu


• Of course, many other factors probably influence the price of an apartment
• For instance, we know that location affects the price of a property
• For that reason, as we can see in the previous graph, the line won’t accurately predict the price of every property
• All those factors that are not explicitly included in the regression are gathered together in the error term:

Pricei = β0 + βarea Areai + other factorsi


• In general
Yi = β0 + β1 Xi + ui (1)
• This is the simple linear regression model with just one regressor, where
Yi is the dependent variable for unit i, Xi is the regressor or independent
variable for unit i and ui is the error term for unit i.
• The first part β0 + β1 Xi is the population regression line, that is, the
average relation between X and Y that we see in the population
• If we knew the values of β0 and β1 , for a given X we could use the corresponding population equation to predict its Y
• Last but not least, ui is the difference between Yi and the corresponding
regression line, and arises from all the other factors affecting Yi that have
not been included in our model


• What does it mean that β0 and β1 are unknown population parameters?


• The problem is similar to others in statistics
• We don’t know the value of population parameters but we can estimate
them using a random sample and the appropriate technique
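• To see what this means in practice, here is a minimal simulation sketch in Stata: we set the population parameters ourselves (the values 2 and 0.5 below are purely illustrative), draw a random sample, and check that the estimation recovers them

clear
set seed 12345
set obs 1000
* population line: beta0 = 2, beta1 = 0.5 (illustrative values)
gen x = rnormal(50, 10)
gen u = rnormal(0, 5)
gen y = 2 + 0.5*x + u
* the estimated intercept and slope should be close to 2 and 0.5
reg y x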


• So, how do we estimate the slope of a line that goes through the scatterplot of size and price?
• Of course, there is no line that will go through all the data points; we can actually draw an infinite number of different lines through them
• So, which criterion should we use to pick one among all the possibilities?


Different types of data


Experimental vs Observational

• Experimental data are observations that come from studies where the researcher knows and controls the assignment mechanism for each treatment value, for every unit
• Observational data are observations that come from studies where the assignment mechanism is not known or not under the control of the researcher
• The effectiveness studies run to approve the different Covid vaccines are an example of experimental data (people were randomly assigned by the researchers to the vaccine or to a placebo). Studies of the effect of Covid on the economy would be an example of observational data


Cross Sectional vs Time Series

• Cross-sectional data are observations that come from different individuals or groups at a single point in time
• On the other hand, time-series data are observations collected at spaced time intervals
• The expenditure on energy of families living in Catalunya during 2020, or the grades obtained in Econometrics I by the 2019/2020 cohort, would be examples of cross-sectional data
• The daily closing price of a certain stock, weekly sales figures of ice cream, or the yearly number of students registered for Econometrics I at UPF are examples of time-series data


The Ordinary Least Squares (OLS) estimator


The OLS estimator


• If we remember from the probability and statistics review, among other properties Ȳ was the least squares estimator of µY : it solved

min_m Σ_{i=1}^{n} (Yi − m)²

• The ordinary least squares (OLS) estimator extends this idea to the linear regression model
• Let’s assume that b0 and b1 are some estimators of the unknown parameters β0 and β1
• Based on those estimators, the regression line is b0 + b1 X
• From that estimation, the predicted value of Yi is Ŷi = b0 + b1 Xi
• Therefore, the residual from the ith prediction will be

ûi = Yi − Ŷi = Yi − (b0 + b1 Xi ) = Yi − b0 − b1 Xi
• Note: The residual ûi can be interpreted as the sample counterpart of ui

• The OLS estimator of β0 and β1 solves the following minimisation problem:

min_{(b0,b1)} Σ_{i=1}^{n} ûi² = min_{(b0,b1)} Σ_{i=1}^{n} (Yi − b0 − b1 Xi )²

• It looks for the pair (b0 , b1 ) that solves the aforementioned problem
• We will refer to the pair that minimises the sum of squared residuals as (β̂0 , β̂1 )


β̂1 = Σ_{i=1}^{n} (Xi − X̄)(Yi − Ȳ) / Σ_{i=1}^{n} (Xi − X̄)² = sXY / s²X
β̂0 = Ȳ − β̂1 X̄
ûi = Yi − Ŷi = Yi − β̂0 − β̂1 Xi

• The first two equations are the OLS estimators of the unknown population parameters β0 and β1 ; the third is the residual from the model prediction (the sample counterpart of the error term, but we should never interpret them as equivalent)
• Different samples will generate different estimates β̂0 and β̂1 (that is, different estimated values)
• That is, the estimators are random variables and their particular value will depend on the sample
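• As a sanity check, a minimal Stata sketch that computes β̂1 = sXY / s²X and β̂0 = Ȳ − β̂1 X̄ directly from the sample moments and compares them with the output of reg, using the dataset and variable names (preu, superf) from the earlier slides:

use habitatge_BCN_1920_12.dta, clear
* sample covariance s_XY and sample variance s_X^2
quietly corr preu superf, covariance
local sxy = r(cov_12)
quietly sum superf
local sx2 = r(Var)
local xbar = r(mean)
quietly sum preu
local ybar = r(mean)
disp "beta1_hat = " `sxy'/`sx2'
disp "beta0_hat = " `ybar' - (`sxy'/`sx2')*`xbar'
* the coefficients reported by reg should match the two displays
reg preu superf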


Notation

• β̂1 is the OLS estimator of β1


• β̂0 is the OLS estimator of β0
• The regression line is the sample regression line or sample regression
function
• The predicted value of Yi for a given Xi is Ŷi = β̂0 + β̂1 Xi
• The residual for the observation i is the difference between Yi and its
predicted value Ŷi : ûi = Yi − Ŷi
• Do not confuse the residual ûi with the error term ui ; they are two
different things!


First order conditions of the OLS estimators


Some interpretations of the first order conditions

From the minimisation problem, the first order conditions are:

∂/∂b0 : −2 Σ_i (Yi − b0 − b1 Xi ) = 0
∂/∂b1 : −2 Σ_i (Yi − b0 − b1 Xi ) Xi = 0

1. The prediction of Ŷi for X̄ is Ȳ


2. If the regression model includes a constant term, the mean of the
residuals (ûi ) is zero
3. The residuals ûi are uncorrelated with Xi
4. The residuals ûi are uncorrelated with Ŷi (the prediction of Yi )


Example of the first order conditions

• The appendix of these notes contains the formal proof of these properties; let’s take a look at them using the data from our example
• In order to do so, we will:
1. Run the OLS regression of price on area
2. Predict the value of Ŷi using X̄: β̂0 + β̂1 X̄
3. Compute the OLS residuals ûi as the difference Yi − Ŷi , where Ŷi = β̂0 + β̂1 Xi
4. Calculate Σ_{i=1}^{n} ûi
5. Calculate Corr(Xi , ûi )
6. Calculate Corr(Ŷi , ûi )


Stata code

* run the regression first, so that _b[_cons] and _b[superf] are defined
reg preu superf
* property 1: the prediction at the sample mean of X equals the mean of Y
sum superf
local msuperf = r(mean)
disp _b[_cons] + _b[superf]*`msuperf'
sum preu
* residuals and fitted values
predict resid, r
predict yhat, xb
* property 2: the residuals have mean zero
sum resid
* properties 3 and 4: residuals uncorrelated with X and with the predictions
corr superf resid
corr yhat resid

Property #1 (Stata output omitted)

Property #2 (Stata output omitted)

Property #3 (Stata output omitted)

Property #4 (Stata output omitted)

Interpretation of the OLS coefficients


Interpretation of the estimated coefficients for β̂0 and β̂1

• Since the regression model is a linear model,

β1 = ΔY/ΔX = [(Y | X + ΔX) − (Y | X)] / ΔX

• Therefore, β1 is the slope of the line and the units of β1 are the units of Y over the units of X
• β0 is the intercept (the ordinate at the origin), that is, the value of Y when X = 0
• The units of β0 are the units of Y


• Similarly, we can interpret β̂1 as the predicted (or adjusted) change in Y when we change X by one unit
• And we can interpret β̂0 as the predicted value of Y when X = 0. Beyond the algebraic interpretation of β̂0 , we have to be careful how to interpret it, particularly when X cannot be equal to 0
• Let’s look at the example of housing price and area of the dwelling


Stata output from the OLS regression


Stata output interpretation

• In Stata, the command reg (or regress) estimates the linear model of Y on X
• Let’s take a look at the result of using that command for the housing example, where Y is the price of the dwelling in euros and X its area in square meters
• According to the results, β̂1 = 1641.242 euros/m²
• That is, according to our estimation, if the size of the dwelling is 1 square meter larger, we expect the price of the dwelling to be 1641.24 euros higher


Stata output interpretation

• β̂0 = −31353.4 euros
• Algebraically, that tells us that an apartment of 0 square meters costs −31353.4 euros. Because no dwelling has 0 square meters, this coefficient does not make economic sense
• We should never interpret the estimated β̂0 when the X variable cannot take the value 0
• Remember that both β̂1 and β̂0 are random variables. With a different sample of dwellings we would have obtained different estimated values
• Other things to notice: the model was estimated using 1267 observations; we will go through the rest of the table later


Interpretation of the estimated coefficient when X is a binary variable (0/1)

• In many situations the regressor will be a binary variable:
▶ Sex: Man (X = 0), Woman (X = 1)
▶ Received some experimental treatment for mental health (X = 1), control (X = 0)
▶ Small apartment (X = 0), Large apartment (X = 1)
• So far, we have interpreted β1 as the slope of a line; however, this interpretation is meaningless if X is binary
• How do we interpret the coefficient for a binary regressor?


• Let’s estimate Yi = β0 + β1 Xi + ui , but now Xi is equal to 1 if the dwelling is 150 square meters or more and 0 otherwise
• When X is a binary 0/1 variable, β̂1 is the difference in the sample mean of the price of the dwelling between the two groups (large dwellings vs small dwellings); see the sketch below
• That is, β̂1 = (Ȳ | X = 1) − (Ȳ | X = 0) = Ȳ1 − Ȳ0
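• A minimal Stata sketch of this result, using the course dataset (the variable large is created here for illustration):

use habitatge_BCN_1920_12.dta, clear
* 1 if the dwelling is 150 square meters or more, 0 otherwise
gen large = (superf >= 150) if !missing(superf)
* the OLS slope equals the difference in mean price between the groups
reg preu large
* the same difference in means, computed directly
ttest preu, by(large)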


• As we can see, the OLS estimator is just a generalisation of the sample mean difference between groups to the case where X is allowed to be continuous


Things to bear in mind regarding the linear regression model and the OLS estimators β̂1 and β̂0

1. By construction, the residuals ûi are not correlated with Xi . This can be interpreted as the OLS estimator extracting all the linear information in X that is useful to predict Y
2. ûi ̸= ui . The first is the difference between the value of Yi and the predicted value Ŷi ; the second is the unknown error from the population regression line
3. The units of β̂1 are the units of Y over the units of X, while the units of β̂0 are Y’s units. Therefore, a different unit of X or a different unit of Y will have consequences on the estimated coefficients, but not on the underlying conclusions of the model; see the sketch below
4. The interpretation of β̂0 is meaningful only if the probability of X being 0 is larger than zero
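A minimal sketch of point 3 in Stata: measuring the price in thousands of euros (the variable preu_mil is created here for illustration) divides both coefficients by 1,000 but leaves the R2 and the substantive conclusions unchanged

use habitatge_BCN_1920_12.dta, clear
gen preu_mil = preu/1000
* beta1_hat in euros per square meter
reg preu superf
* Y rescaled: both coefficients divided by 1,000, same R-squared
reg preu_mil superf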


Measures of goodness of fit


Measures of goodness of fit

• A natural question is how well the regression line "fits" or explains the
data
• How much of the observed variation in Y is explained by our model?
How close is the regression line to the observations?
• There are two regression statistics that provide complementary
measures of the quality of the fit:
▶ The R2 statistic measures the fraction of the variance of Y that is
explained by X; it is unitless and ranges between zero (no fit) and
one (perfect fit)
▶ The standard error of the regression (SER) measures the typical
size of a regression residual in the units of Y.


R2

• The R2 tells us the fraction of the variance of Yi that is explained or predicted by the values of Xi
• Accordingly, R2 = var(Ŷi ) / var(Yi )
• where var(Ŷi ) = (1/n) Σ_{i=1}^{n} (Ŷi − Ȳ)² and var(Yi ) = (1/n) Σ_{i=1}^{n} (Yi − Ȳ)²
• Let’s define the Total Sum of Squares (TSS) as the sum of the squared deviations of Y from its sample mean: Σ_{i=1}^{n} (Yi − Ȳ)²
• Let’s define the Explained Sum of Squares (ESS) as the sum of the squared deviations of Ŷ from its mean (which, recall, equals Ȳ): Σ_{i=1}^{n} (Ŷi − Ȳ)²
• R2 = var(Ŷi ) / var(Yi ) = Σ_{i=1}^{n} (Ŷi − Ȳ)² / Σ_{i=1}^{n} (Yi − Ȳ)² = ESS/TSS
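• As a check, a minimal Stata sketch that computes R2 = ESS/TSS from these definitions and compares it with the e(r2) reported by reg:

use habitatge_BCN_1920_12.dta, clear
quietly reg preu superf
predict yhat, xb
* TSS: sum of squared deviations of Y from its sample mean
quietly sum preu
local tss = r(Var)*(r(N)-1)
* ESS: sum of squared deviations of Yhat from its mean (which equals Ybar)
quietly sum yhat
local ess = r(Var)*(r(N)-1)
disp "R2 = " `ess'/`tss'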


• R2 can also be written in terms of the residuals
• Working out the expression for the TSS we have:

TSS = Σ_{i=1}^{n} (Yi − Ŷi + Ŷi − Ȳ)²
    = Σ_{i=1}^{n} (Yi − Ŷi )² + Σ_{i=1}^{n} (Ŷi − Ȳ)² + 2 Σ_{i=1}^{n} (Yi − Ŷi )(Ŷi − Ȳ)
    = RSS + ESS + 2 Σ_{i=1}^{n} ûi Ŷi = RSS + ESS

• Therefore R2 = 1 − RSS/TSS


Rewriting the equation for the residual, we have that Yi = Ŷi + ûi . Therefore,

var(Yi ) = var(Ŷi ) + var(ûi )

R2 = 1 − Σ_{i=1}^{n} (ûi − ū)² / Σ_{i=1}^{n} (Yi − Ȳ)²
R2 = 1 − RSS/TSS
RSS = Σ_{i=1}^{n} ûi²

(the two expressions for R2 coincide because ū = 0 when the model includes a constant)


• The R2 ranges from 0 to 1
• If it is 0, it means that our regressor does not predict any of the observed variance of Y. This is equivalent to a model where β̂1 = 0
• The "worst" model we can think of is one where we exclude X and we predict the value of Y using Ȳ; this is equivalent to estimating the regression Yi = β0 + ui (check that this model does not explain any fraction of the observed variance of Y)
• Be aware that a high or low R2 does not tell us anything in absolute terms; it is only a relative measure
• The worst model we can think of has an R2 of zero. Therefore, the comparison we are making is relative to this case


R2 = Corr(X, Y)²
• One interesting thing is that the R2 is the same as the squared sample correlation of X and Y
• That is because, in the end, the linear regression model is just scaling the sample correlation between X and Y by the variance of X. So the model is only as good as the correlation between these two variables
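• A quick Stata check of this equivalence, assuming the same price and area variables as before:

use habitatge_BCN_1920_12.dta, clear
quietly corr preu superf
disp "squared correlation = " r(rho)^2
quietly reg preu superf
disp "R2 from regression  = " e(r2)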


R2: warning

• R2 is a measure of the goodness of fit relative to the initial question that motivated the statistical analysis
• That is, R2 does not judge whether the question is right or whether the data are right; it just shows how adequate the answer is to the question
• Comparing statistical models based on the R2 is similar to comparing cars based on their size
• "Every model is wrong, some of them are useful" (George Box)
• There is no way of saying whether the estimated R2 is high or low, unless we make explicit what we expected from the model
• Finally, the R2 is a measure of goodness of fit in-sample; that is, it is not useful for making out-of-sample predictions


Standard Error of the Regression (SER)

The standard error of the regression is (almost) the sample standard deviation of the OLS residuals

• SER = sû = √(s²û), where s²û = (1/(n − 2)) Σ_{i=1}^{n} (ûi − ū)² = RSS/(n − 2)
• SER measures the spread of the distribution of u
• SER measures the average "size" of the OLS residual (the average "mistake" made by the estimated OLS regression line)
• SER is computed using the sample residuals û
• The root mean squared error (RMSE) is closely related to the SER: RMSE = √((1/n) Σ_{i=1}^{n} ûi²)
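• A minimal Stata sketch computing both measures from the residuals; as noted on a later slide, the "Root MSE" Stata reports uses the n − 2 divisor, so it corresponds to the SER defined here:

use habitatge_BCN_1920_12.dta, clear
quietly reg preu superf
predict uhat, resid
gen uhat2 = uhat^2
quietly sum uhat2
disp "SER  = " sqrt(r(sum)/(r(N)-2))
disp "RMSE = " sqrt(r(sum)/r(N))
disp "Root MSE reported by Stata: " e(rmse)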


• Division by n − 2 is an adjustment that corrects for the downward bias due to the two degrees of freedom lost when we estimate β̂0 and β̂1
• This is similar to the case of s²Y ; there we divide by n − 1 to compute the standard deviation because we lose one degree of freedom when computing the mean
• If n is large enough, the difference between dividing by n − 2 or n is negligible


• In our case, R2 = 0.6867, which means that the area of the dwelling explains 68.67% of the variance in the price of the dwelling
• According to the model, since the RSS is 1545445061347.875 and we have 1267 observations, the SER is 34953 euros and the RMSE is 34925 euros (see the check below)
• Word of caution: Stata calculates the RMSE differently. It divides by n − 2 instead of n, so what Stata reports as the RMSE is actually the SER
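• The arithmetic check, using the RSS and number of observations quoted above:

disp sqrt(1545445061347.875/(1267-2))  // SER, approximately 34953 euros
disp sqrt(1545445061347.875/1267)      // RMSE, approximately 34925 euros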


Appendix


Derivation of the OLS estimators for the simple regression model

• Let’s go over the algebra to derive the OLS estimators
• The OLS estimators of the simple model are the (b0 , b1 ) that minimise the following equation:

min_{(b0,b1)} Σ_i ûi² = min_{(b0,b1)} Σ_i (Yi − b0 − b1 Xi )²

• From the first order conditions:

∂/∂b0 : −2 Σ_i (Yi − b0 − b1 Xi ) = 0
∂/∂b1 : −2 Σ_i (Yi − b0 − b1 Xi ) Xi = 0


• Working out the first order condition with respect to b0 :

Σ_i (Yi − b0 − b1 Xi ) = 0
Σ_i Yi − Σ_i b0 − b1 Σ_i Xi = 0
Σ_i b0 = Σ_i Yi − b1 Σ_i Xi
b0 = Ȳ − b1 X̄


• Working out the first order condition with respect to b1 :

Σ_i (Yi − b0 − b1 Xi ) Xi = 0
Σ_i Yi Xi − b0 Σ_i Xi − b1 Σ_i Xi² = 0

• Substituting the previous result b0 = Ȳ − b1 X̄:

Σ_i Yi Xi − (Ȳ − b1 X̄) Σ_i Xi − b1 Σ_i Xi² = 0


• Expanding and rearranging:

Σ_i Yi Xi − Ȳ Σ_i Xi + b1 X̄ Σ_i Xi − b1 Σ_i Xi² = 0
b1 (Σ_i Xi² − X̄ Σ_i Xi ) = Σ_i Yi Xi − Ȳ Σ_i Xi
b1 Σ_i Xi (Xi − X̄) = Σ_i Xi (Yi − Ȳ)

• Let’s show that Σ_i Xi (Xi − X̄) = Σ_i (Xi − X̄)² and Σ_i Xi (Yi − Ȳ) = Σ_i (Xi − X̄)(Yi − Ȳ)


Σ_i (Xi − X̄)² = Σ_i Xi² + N X̄² − 2 X̄ Σ_i Xi
             = Σ_i Xi² + N X̄² − 2N X̄²
             = Σ_i Xi² − N X̄²
             = Σ_i Xi² − X̄ Σ_i Xi
             = Σ_i Xi (Xi − X̄)

• And similarly for Σ_i Xi (Yi − Ȳ) = Σ_i (Xi − X̄)(Yi − Ȳ)


b1 = Σ_i (Xi − X̄)(Yi − Ȳ) / Σ_i (Xi − X̄)²

• Therefore, the parameters that minimise the sum of squared residuals in the simple model (the OLS estimators) are

β̂1 = Σ_i (Xi − X̄)(Yi − Ȳ) / Σ_i (Xi − X̄)²
β̂0 = Ȳ − β̂1 X̄

• And using the definitions of the sample covariance and variance we can rewrite the previous equation as:

β̂1 = sXY / s²X
β̂0 = Ȳ − β̂1 X̄


Algebraic proof of the results from the first order conditions of the OLS estimator

1. The prediction of Ŷi using X̄ is Ȳ
2. If the regression model includes a constant term, the mean of the residuals (ûi ) is zero
3. The residuals ûi are uncorrelated with Xi
4. The residuals ûi are uncorrelated with Ŷi (the prediction of Yi )
5. TSS = RSS + ESS


Proof #1

Ŷi = β̂0 + β̂1 Xi
Ŷi = (Ȳ − β̂1 X̄) + β̂1 Xi
(1/n) Σ_{i=1}^{n} Ŷi = Ȳ − β̂1 X̄ + β̂1 (1/n) Σ_{i=1}^{n} Xi
mean(Ŷi ) = Ȳ − β̂1 X̄ + β̂1 X̄ = Ȳ

That is, the sample mean of the predictions equals Ȳ, so the prediction at X̄ is Ȳ


Proof #2

ûi = Yi − β̂0 − β̂1 Xi
β̂0 = Ȳ − β̂1 X̄
ûi = (Yi − Ȳ) − β̂1 (Xi − X̄)
(1/n) Σ_{i=1}^{n} ûi = (1/n) Σ_{i=1}^{n} (Yi − Ȳ) − β̂1 (1/n) Σ_{i=1}^{n} (Xi − X̄)

Since (1/n) Σ_{i=1}^{n} (Yi − Ȳ) = (1/n) Σ_{i=1}^{n} Yi − Ȳ = Ȳ − Ȳ = 0, and the same holds for β̂1 (1/n) Σ_{i=1}^{n} (Xi − X̄), we have that (1/n) Σ_{i=1}^{n} ûi = 0


Proof #3 (similar for #4)

Σ_{i=1}^{n} ûi Xi = Σ_{i=1}^{n} ûi (Xi − X̄)   (using Σ ûi = 0 from Proof #2)
= Σ_{i=1}^{n} [(Yi − Ȳ) − β̂1 (Xi − X̄)] (Xi − X̄)
= Σ_{i=1}^{n} (Yi − Ȳ)(Xi − X̄) − β̂1 Σ_{i=1}^{n} (Xi − X̄)²

Since β̂1 = Σ_{i=1}^{n} (Yi − Ȳ)(Xi − X̄) / Σ_{i=1}^{n} (Xi − X̄)²,

Σ_{i=1}^{n} ûi Xi = Σ_{i=1}^{n} (Yi − Ȳ)(Xi − X̄) − Σ_{i=1}^{n} (Yi − Ȳ)(Xi − X̄) = 0


Proof #5

TSS = Σ_{i=1}^{n} (Yi − Ȳ)² = Σ_{i=1}^{n} (Yi − Ŷi + Ŷi − Ȳ)²
    = Σ_{i=1}^{n} (Yi − Ŷi )² + Σ_{i=1}^{n} (Ŷi − Ȳ)² + 2 Σ_{i=1}^{n} (Yi − Ŷi )(Ŷi − Ȳ)
    = RSS + ESS + 2 Σ_{i=1}^{n} ûi Ŷi = RSS + ESS

where the last step uses the fact that the residuals are uncorrelated with the predictions (Proof #4)


Formal derivation of the OLS estimator when X is a dichotomous variable

• OLS estimator:

β̂1 = Σ_i (Xi − X̄)(Yi − Ȳ) / Σ_i (Xi − X̄)² = Σ_i Xi (Yi − Ȳ) / Σ_i Xi (Xi − X̄)

• Now X will be a variable that takes only two values:

Xi = 1 if individual i receives the treatment (T)
Xi = 0 otherwise (NT)


• Let’s rewrite the OLS estimator in terms of this binary or dummy variable X:

β̂1 = Σ_i 1(i ∈ T)(Yi − Ȳ) / Σ_i 1(i ∈ T)[1(i ∈ T) − NT/N]

• NT/N is the result of:

X̄ = (1/N) Σ_i 1(i ∈ T) = (1/N)(NT × 1 + NNT × 0)

where NT is the number of treated units and NNT the number of non-treated units
• Thus, X̄ = NT/N


• Let’s see what the denominator and the numerator look like
• The denominator is

Σ_i 1(i ∈ T)[1(i ∈ T) − NT/N] = Σ_i 1(i ∈ T)² − (NT/N) Σ_i 1(i ∈ T)
                              = NT − NT²/N
                              = NT (1 − NT/N)

• The numerator is

Σ_i 1(i ∈ T)(Yi − Ȳ) = Σ_i 1(i ∈ T) Yi − Ȳ Σ_i 1(i ∈ T)


• Rewriting Ȳ as the weighted average in the different groups (here E[Yi | Xi = 1] and E[Yi | Xi = 0] denote the sample means Ȳ T and Ȳ NT ), the numerator becomes

NT [(1/NT) Σ_i 1(i ∈ T) Yi ] − NT [(NT/N) E[Yi | Xi = 1] + (NNT/N) E[Yi | Xi = 0]]
= NT E[Yi | Xi = 1] − (NT/N)[NT E[Yi | Xi = 1] + NNT E[Yi | Xi = 0]]
= E[Yi | Xi = 1](NT − NT²/N) − (NT/N) NNT E[Yi | Xi = 0]

• Since NT − NT²/N = NT (1 − NT/N) and (NT/N) NNT = (NT/N)(N − NT) = NT (1 − NT/N), the numerator equals

NT (1 − NT/N)[E[Yi | Xi = 1] − E[Yi | Xi = 0]]

• Therefore,

β̂1 = NT (1 − NT/N)[E[Yi | Xi = 1] − E[Yi | Xi = 0]] / [NT (1 − NT/N)]

• β̂1 = Ȳ T − Ȳ NT
