
Econometrics I: Fundamentals of Regression Analysis

Part 2

Javier Abellán, Màxim Ventura and Carlos Suárez

Universitat Pompeu Fabra


The OLS estimator assumptions


Why do we use OLS instead of other possible estimators?

• OLS is a generalisation of the sample average: if the "line" were just
an intercept (that is, if the model did not include any regressor), then the
OLS estimator would be the sample average of Y1 , . . . , Yn (Ȳ)
• Like Ȳ, the OLS estimator has some desirable properties:
▶ Under certain assumptions, it is an unbiased
estimator: E(β̂1 ) = β1
▶ Under certain assumptions, it has a tighter sampling distribution than
some other unbiased candidate estimators of β1 (that is, it has
lower variance)


Least-Squares assumptions

Yi = β0 + β1 Xi + ui , i = 1, ..., n

In order for the OLS estimators, βˆ0 and βˆ1 , to be appropriate estimators of the
true parameters β0 and β1 , the following three assumptions need to be true:

• Assumption 1: The conditional distribution of ui given Xi has a mean of
zero:
E(ui |Xi ) = 0
• Assumption 2: Observations are independently and identically
distributed:
(Xi , Yi ), i = 1, . . . , n are i.i.d.
• Assumption 3: Large outliers are unlikely:

0 < E(X⁴) < ∞ and 0 < E(Y⁴) < ∞


Assumption 1: E(ui | Xi ) = 0

If the conditional distribution of ui given Xi has a mean of zero:

E(ui |Xi ) = 0

• All the "other factors" captured in the error term ui (those that
explain Yi but have not been included in the model) are (linearly)
unrelated to Xi : Cov(X, u) = 0 (See Appendix)
• The conditional distribution of Yi is centered in the population
regression line: That is, on average, the prediction of Yi is right (See
Appendix)
• We will frequently come back to this assumption during the course


With experimental data:


• In randomized control trials (RCTs), Xi is randomly assigned to individuals
without taking into account their characteristics
• As a consequence, Xi is unrelated to all characteristics of the individual
that affect Yi , which in our model are captured by ui
• Therefore, in well-designed experimental settings:
▶ ui and Xi are independently distributed
▶ E(ui |Xi ) = 0
With observational data:
• Xi is not randomly assigned across the population
• So we should be careful to check whether this assumption
actually holds in the data


Assumption 2: (Xi , Yi ), i = 1, . . . , n are i.i.d.

(Xi , Yi ), i = 1, . . . , n are independently and identically distributed


• If observations are selected by simple random sampling from a single
large population, this assumption is true.
• Let's continue with our example of housing prices in Barcelona, where X is
the area of a dwelling and Y its sale price.
• If we randomly sample n dwellings from the population of dwellings sold
in Barcelona between 1998 and 2000:
▶ because all observations are drawn from the same population, the
joint distribution of surface and price is the same for each i and
equals the joint distribution of surface and price in the population
(identically distributed).
▶ because the sample is selected at random, knowing the surface
and price of dwelling 1 tells us nothing about the surface and price
of the remaining n-1 dwellings (independently distributed).


Assumption 3: Large outliers are unlikely

• Outlier: an observation with values of Xi , Yi , or both far outside the usual
range of the data
• Extreme values prevent the sample variance s² from converging to the
population variance σ², making the OLS estimates misleading (see the
sketch below)
• Mathematically, "extreme values are unlikely" is stated as:
▶ X and Y have nonzero finite fourth moments: 0 < E(X⁴) < ∞ and
0 < E(Y⁴) < ∞
▶ Another way to put this is that X and Y have finite kurtosis
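A minimal simulation sketch of this point (the numbers, variable names, and seed are illustrative assumptions, not from the slides): one extreme observation is enough to pull the OLS slope far away from the truth.

# Sketch: effect of a single large outlier on the OLS slope (illustrative values)
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.normal(50, 10, n)              # e.g. dwelling area in m²
u = rng.normal(0, 5, n)
y = 10 + 2 * x + u                     # true slope = 2

slope_clean = np.polyfit(x, y, 1)[0]   # OLS slope on well-behaved data

x_out, y_out = x.copy(), y.copy()
x_out[0], y_out[0] = 500, 10           # one point far outside the usual range
slope_outlier = np.polyfit(x_out, y_out, 1)[0]

print(f"slope without outlier: {slope_clean:.2f}")     # close to 2
print(f"slope with one outlier: {slope_outlier:.2f}")  # dragged far from 2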


Assumption 3: Large outliers are unlikely

• The validity of this assumption will depend on the characteristics of the
data
• For instance, the area of a dwelling will probably satisfy the assumption
• The same goes for exam grades, age of a person, etc.
• However, for other variables such as returns of the stock market, we
should check whether this is actually the case


Twin roles of the least square assumptions

1. Mathematical role: if the three assumptions hold...


▶ the OLS estimators will be unbiased estimators of the true
parameters
▶ the OLS estimators will be consistent estimators of the true
parameters
▶ the OLS estimators will have sampling distributions that are
approximately normal in large samples.
2. Circumstances when the OLS assumptions do not hold
▶ Corr(X, u) ̸= 0
▶ Observations not i.i.d
▶ Outliers


The sampling distribution of the OLS estimator


Sampling distribution of the OLS estimators


Because the OLS estimators are computed from randomly drawn samples, β̂0 and β̂1
are themselves random variables with a sampling distribution.

Under the least squares assumptions:

• β̂0 and β̂1 are unbiased estimators of β0 and β1 :

E(β̂0 ) = β0 and E(β̂1 ) = β1

• In large samples, by the central limit theorem, the sampling distribution
of β̂0 and β̂1 can be well approximated by the bivariate normal
distribution.
▶ The marginal distributions of β̂0 and β̂1 are (approximately)
normally distributed in large samples:

β̂0 →d N(β0 , σ²β̂0) and β̂1 →d N(β1 , σ²β̂1)

where →d denotes convergence in distribution.


Unbiasedness of βˆ1

β̂1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²

• If we replace Yi by its population value according to the true model
(Yi = β0 + β1 Xi + ui ) and work out the math, we can show that (see
Appendix):

β̂1 = β1 + Σ(Xi − X̄)ui / Σ(Xi − X̄)²

• This is one of the most important formulas we will see during this course
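As a quick numerical check of this identity, here is a minimal simulation sketch (the data-generating values and the seed are arbitrary assumptions): the OLS slope computed directly coincides with β1 plus the error-driven term.

# Sketch: verify numerically that β̂1 = β1 + Σ(Xi − X̄)ui / Σ(Xi − X̄)²
import numpy as np

rng = np.random.default_rng(1)
n, beta0, beta1 = 200, 1.0, 2.0
x = rng.normal(0, 1, n)
u = rng.normal(0, 1, n)
y = beta0 + beta1 * x + u

xd = x - x.mean()
beta1_hat = (xd * (y - y.mean())).sum() / (xd ** 2).sum()   # OLS formula
decomposition = beta1 + (xd * u).sum() / (xd ** 2).sum()    # β1 + "something else"

print(beta1_hat, decomposition)   # identical up to floating-point error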


Unbiasedness of βˆ1

β̂1 = β1 + Σ(Xi − X̄)ui / Σ(Xi − X̄)²

• The intuitive idea is that our estimator is equal to the true parameter plus
’something’ else
• If the expected value of that ’something’ else is zero, our estimator is
unbiased; otherwise it is biased
• If the error term is uncorrelated with our X (that is, if assumption #1 holds),
then the second term has expectation zero and thus our estimator will be
unbiased (E(β̂1 ) = β1 )
• However, if our model has left something relevant in the error term (a factor
that explains Y and is correlated with X), the second term will not have zero
expectation and our estimator will be biased


Normal approximation of βˆ1 and βˆ0 in large samples

The large-sample approximation of β̂1 is:

N(β1 , σ²β̂1) where σ²β̂1 = (1/n) · var[(Xi − µX )ui ] / [var(Xi )]²

The large-sample approximation of β̂0 is:

N(β0 , σ²β̂0) where σ²β̂0 = (1/n) · var(Hi ui ) / [E(Hi²)]²

and Hi = 1 − [µX / E(Xi²)] · Xi
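A hedged Monte Carlo sketch of this approximation (normal X and u with arbitrary assumed parameters): the standard deviation of β̂1 across many simulated samples should be close to the square root of the σ²β̂1 formula above.

# Sketch: Monte Carlo check of σ²β̂1 = (1/n) · var[(Xi − µX)ui] / [var(Xi)]²
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100, 5000
beta0, beta1 = 1.0, 2.0
mu_x, sigma_x, sigma_u = 5.0, 2.0, 1.0

slopes = np.empty(reps)
for r in range(reps):
    x = rng.normal(mu_x, sigma_x, n)
    u = rng.normal(0, sigma_u, n)
    y = beta0 + beta1 * x + u
    xd = x - x.mean()
    slopes[r] = (xd * (y - y.mean())).sum() / (xd ** 2).sum()

# Evaluate the formula using large simulated draws for the population moments
big_x = rng.normal(mu_x, sigma_x, 10**6)
big_u = rng.normal(0, sigma_u, 10**6)
var_beta1 = (1 / n) * np.var((big_x - mu_x) * big_u) / np.var(big_x) ** 2

print(np.std(slopes), np.sqrt(var_beta1))   # the two numbers should be close (≈ 0.05)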


From the variance formula of the OLS estimators we can see several things:

1. Other things equal, the larger the variance of Xi , the smaller the variance
of βˆ1
▶ Intuitively, the wider the range of X, the 'better' the information to draw
the regression line.
2. Other things equal, the smaller the variance of ui , the smaller the
variance of βˆ1
▶ Intuitively, if we have a very good model (the errors are smaller), the
data will have a tighter scatter around the population regression
line, so its slope will be estimated more precisely.
3. Other things equal, the larger the sample size (n), the smaller the variance
of β̂1
▶ Intuitively, larger n means more dots (information) to draw the
regression line


Consistency of βˆ1 and βˆ0

From the variance formula of the OLS estimators we can see several things:

• βˆ0 and βˆ1 are consistent estimators of β0 and β1


▶ As n gets larger, the variance of βˆ0 and βˆ1 will go to zero
▶ Since n is in the denominator of the variance formulas, if
assumption #3 holds (the other terms are finite), the variance
converges to zero as n → ∞


Estimator of the variance and standard error of βˆ1 and βˆ0


The variances of the OLS estimators, σ²β̂1 and σ²β̂0, are unknown parameters,
so they need to be estimated as well.

The estimators of σ²β̂1 and σ²β̂0 are, respectively (all sums run over i = 1, . . . , n):

σ̂²β̂1 = (1/n) · [ (1/(n−2)) Σ(Xi − X̄)²ûi² ] / [ (1/n) Σ(Xi − X̄)² ]²

σ̂²β̂0 = (1/n) · [ (1/(n−2)) Σ Ĥi²ûi² ] / [ (1/n) Σ Ĥi² ]²

where Ĥi = 1 − [ X̄ / ( (1/n) Σ Xi² ) ] · Xi

And the standard errors of β̂1 and β̂0 are estimators of the standard deviations
of β̂1 and β̂0 , σβ̂1 and σβ̂0 :

se(β̂1 ) = √σ̂²β̂1 and se(β̂0 ) = √σ̂²β̂0
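A minimal sketch of the σ̂²β̂1 estimator above, on simulated data with assumed values (not the dwellings dataset). This is the heteroskedasticity-robust standard error that the slides return to later.

# Sketch: se(β̂1) computed from the variance estimator given above
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.normal(70, 20, n)                  # e.g. dwelling size
u = rng.normal(0, 50, n)
y = 100 + 1.5 * x + u

xd = x - x.mean()
beta1_hat = (xd * (y - y.mean())).sum() / (xd ** 2).sum()
beta0_hat = y.mean() - beta1_hat * x.mean()
resid = y - beta0_hat - beta1_hat * x      # ûi

var_hat = (1 / n) * ((1 / (n - 2)) * (xd ** 2 * resid ** 2).sum()) \
          / ((1 / n) * (xd ** 2).sum()) ** 2
se_beta1 = np.sqrt(var_hat)

print(beta1_hat, se_beta1)                 # slope estimate and its standard error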


Homoskedasticity and heteroskedasticity


Homoskedasticity

Let’s add a fourth assumption:


• Assumption 4: the errors ui are homoskedastic

The error term ui is homoskedastic if:


▶ The variance of the conditional distribution of ui given Xi , Var(ui |Xi ),
is constant for i = 1, . . . , n
▶ In particular, it does not depend on Xi
Otherwise, the error term is said to be heteroskedastic.
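A small simulated illustration (assumed setup, not from the slides): in the first design Var(ui|Xi) is constant, in the second it grows with Xi, so only the first is homoskedastic.

# Sketch: homoskedastic vs heteroskedastic errors, checked by conditioning on X
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
x = rng.uniform(0, 10, n)

u_homo = rng.normal(0, 1, n)            # Var(u|X) = 1 for every value of X
u_hetero = rng.normal(0, 1, n) * x      # Var(u|X) = X², grows with X

for name, u in [("homoskedastic", u_homo), ("heteroskedastic", u_hetero)]:
    low = u[x < 3].var()                # conditional variance for small X
    high = u[x > 7].var()               # conditional variance for large X
    print(f"{name}: Var(u | X<3) = {low:.2f}, Var(u | X>7) = {high:.2f}")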


Graphically:

• In the left-hand figure, the spread of the conditional distribution of ui


given Xi (student-teacher ratio in the example) does not depend on the
value of Xi .
• On the contrary, in the right-hand figure, the spread of the conditional
distribution of ui given Xi is tight for low values of Xi and greater for larger
values of Xi . So it does depend on Xi .


Mathematical implications of homoskedasticity

If the three least square assumptions hold and the errors are homoskedastic:
1. The OLS estimators remain unbiased, consistent and asymptotically
normal
▶ Note that unbiasedness and consistency do not depend on whether
errors are heteroskedastic or homoskedastic
▶ For these properties to be true, we only need the
first 3 least square assumptions to hold

2. The OLS estimators βˆ0 and βˆ1 are efficient among all estimators that are
a linear combination of Y1 , ..., Yn and are unbiased (Gauss-Markov
theorem).
▶ That is, the OLS estimators are the most efficient linear
conditionally unbiased estimators (they are BLUE)


3. Because the conditional variance of ui given Xi is constant,
Var(ui |Xi ) = σ²u , the formulas for the variance of β̂0 and β̂1 simplify to:

σ²β̂1 = σ²u / (n σ²X)   and   σ²β̂0 = E(Xi²) σ²u / (n σ²X)

• Consequently, if the errors are homoskedastic, the formula for the standard
errors of β̂0 and β̂1 is simplified. The homoskedasticity-only standard
errors are:

se(β̂1 ) = √σ̃²β̂1 where σ̃²β̂1 = s²û / Σ(Xi − X̄)²

se(β̂0 ) = √σ̃²β̂0 where σ̃²β̂0 = [ (1/n) Σ Xi² ] · s²û / Σ(Xi − X̄)²
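A sketch (assumed simulated data) comparing the homoskedasticity-only standard error above with the heteroskedasticity-robust one when Var(ui|Xi) actually depends on Xi: the two formulas give noticeably different answers, which is why the next slide warns against the simplified version.

# Sketch: homoskedasticity-only vs heteroskedasticity-robust se(β̂1)
import numpy as np

rng = np.random.default_rng(5)
n = 500
x = rng.uniform(1, 10, n)
u = rng.normal(0, 1, n) * x             # heteroskedastic errors: Var(u|X) = X²
y = 2 + 3 * x + u

xd = x - x.mean()
b1 = (xd * (y - y.mean())).sum() / (xd ** 2).sum()
b0 = y.mean() - b1 * x.mean()
res = y - b0 - b1 * x

s2_u = (res ** 2).sum() / (n - 2)                        # s²û
se_homo = np.sqrt(s2_u / (xd ** 2).sum())                # homoskedasticity-only
se_robust = np.sqrt((1 / n) * ((1 / (n - 2)) * (xd ** 2 * res ** 2).sum())
                    / ((1 / n) * (xd ** 2).sum()) ** 2)  # robust formula

print(se_homo, se_robust)   # under heteroskedasticity these differ noticeably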


Warning

• When the errors are heteroskedastic, the homoskedasticity-only


formulas for the standard errors are inappropriate. Specifically:
▶ The t-statistic computed using the homoskedasticity-only standard
error does not have a standard normal distribution, even in large
samples.
▶ The 95% confidence intervals constructed using 1.96 as a critical
value and the homoskedasticity-only standard error will not contain
the true value of the parameter with 95% probability, even in large
samples.


• In contrast, using heteroskedasticity-robust standard errors (the
formulas initially presented for se(β̂0 ) and se(β̂1 )) leads to valid statistical
inferences whether or not the errors are heteroskedastic.
▶ At a general level, economic theory rarely gives any reason to
believe that the error term is homoskedastic
▶ So we will generally assume that errors are heteroskedastic and we
will use heteroskedasticity-robust standard errors.


Variance of the residuals in the dwellings example

• Does σ²ûi depend on Xi ?
• As the residual plot for the dwellings data shows, larger values of X come with
larger values of û. Therefore, it is quite likely that assumption #4
(homoskedasticity) does not hold


Hypothesis test and confidence intervals


Testing hypotheses about β1

The general approach to testing hypotheses about the unknown parameter β1 is
the same as the one used to test hypotheses about the population mean, µ.

Steps:

1. Set a null hypothesis (H0 ) about β1 and assume it is true


2. Characterise the sampling distribution of βˆ1 under H0
3. Calculate βˆ1 from the randomly selected sample
4. Choose a significance probability level (α)
5. Reject H0 or not accordingly. Three alternative ways:
5.1 Calculate the t-statistic and compare it to the critical value t*
5.2 Calculate the p-value and compare it to α
5.3 Calculate the confidence interval for β1 and check if β1,0 is in it


Two-sided hypothesis concerning β1

Empirical question:
Does the size of an apartment affect its sale price?

Step 1: Convert your empirical question into a hypothesis


Our empirical question concerns the slope of the population regression line
that relates the size of an apartment with its price:

Price = β0 + β1 Size

Concretely, we want to know if this relation exists at all. Therefore, our null
and alternative hypotheses are:
• H0 : β 1 = 0 (NO relation between Size and Price in the population)
• H1 : β1 ̸= 0 (Relation between Size and Price in the population)

More generally: H0 : β1 = β1,0 and H1 : β1 ̸= β1,0


Two-sided hypothesis concerning β1

Empirical question:
Does the size of an apartment affect its sale price?

Step 2: Characterize the sampling distribution of βˆ1 under H0


We have seen in the previous section that in large samples the distribution of
βˆ1 is well approximated by:

N(β1 , σ²β̂1) where σ²β̂1 = (1/n) · var[(Xi − µX )ui ] / [var(Xi )]²

So under our H0 , if the sample is large enough, the distribution of β̂1 is
approximately N(0, σ²β̂1).

Or more generally, approximately N(β1,0 , σ²β̂1).


Two-sided hypothesis concerning β1

Empirical question:
Does the size of an apartment affect its sale price?

Step 3: Calculate βˆ1 from a randomly selected sample.


β̂1act = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² = sX,Y / s²X

Or in our example:

β̂1act = sSize,Price / s²Size
(the sample covariance of Size and Price divided by the sample variance of Size)

Step 4: Choose a significance probability level (α).

• Commonly set at 5% (α = 0.05)


Two-sided hypothesis concerning β1


Empirical question:
Does the size of an apartment affect its sale price?

Step 5: Reject or not the null hypothesis that β1 = 0

Alternative 1: Calculate the t-statistic using β̂1act and compare it to the critical
value t* (for α = 0.05, t* = 1.96)

t = (β̂1 − β1,0 ) / se(β̂1 )  −→  tact = (β̂1act − 0) / se(β̂1 ) = β̂1act / se(β̂1 )

where se(β̂1 ) is the standard error of β̂1 , which is the estimator of the standard
deviation of β̂1 , σβ̂1 :

se(β̂1 ) = √σ̂²β̂1 where σ̂²β̂1 = (1/n) · [ (1/(n−2)) Σ(Xi − X̄)²ûi² ] / [ (1/n) Σ(Xi − X̄)² ]²

Alternative 1: Rejection rule


• If |tact | > 1.96 → Reject H0 at a 5% significance level
• If |tact | ≤ 1.96 → Do not reject H0 at a 5% significance level


Step 5: Reject or not the null hypothesis that β1 = 0

Alternative 2: Calculate the p-value and compare it to α.

p-value: the probability, under the assumption that H0 is true, of observing a
value of β̂1 at least as far from β1,0 as your estimate β̂1act .

p-value = PrH0 [ |β̂1 − β1,0 | > |β̂1act − β1,0 | ]

        = PrH0 [ |β̂1 − β1,0 | / se(β̂1 ) > |β̂1act − β1,0 | / se(β̂1 ) ]

        = PrH0 [ |t| > |tact | ]

Because β̂1 is approximately normally distributed in large samples, under H0 the
t-statistic is approximately distributed as a standard normal, so:

p-value = Pr[ |Z| > |tact | ] = 2Φ(−|tact |)

where Φ(·) is the cumulative distribution function (CDF) of the standard
normal distribution
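A tiny sketch of this calculation, using as illustration the t-statistic of 22.35 that the slides report later for the size–price regression:

# Sketch: two-sided p-value from a t-statistic via the standard normal CDF
from scipy.stats import norm

t_act = 22.35                       # t-statistic reported later in these slides
p_value = 2 * norm.cdf(-abs(t_act))
print(p_value)                      # essentially 0, so H0 : β1 = 0 is rejected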

Alternative 2: Rejection rule


• If p − value ≤ 0.05 → Reject H0 at a 5% significance level
• If p − value > 0.05 → Do not reject H0 at a 5% significance level


Step 5: Reject or not the null hypothesis that β1 = 0

Alternative 3: Calculate the confidence interval for β1 and check if β1,0 is in it.

95% confidence interval (CI) of β1 : an interval that contains the true value
of β1 with 95% probability. Or equivalently, the set of values of β1 that cannot
be rejected by a 5% two-sided hypothesis test.

By rearranging the rejection rule based on the t-statistic:

Do not reject H0 if | (β̂1act − β1,0 ) / se(β̂1 ) | < 1.96

we can establish the set of values of β1 that are not rejected at a 5%
significance level:

95% CI for β1 = { β̂1act ± 1.96 se(β̂1 ) } = [ β̂1act − 1.96 se(β̂1 ) , β̂1act + 1.96 se(β̂1 ) ]
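A minimal sketch of this interval, using the estimate β̂1act = 1641.24 and se(β̂1) = 73.43 that appear elsewhere in these slides; it reproduces the interval [1497.2 , 1785.3] quoted a few slides below, up to rounding.

# Sketch: 95% confidence interval for β1 from the estimate and its standard error
beta1_hat, se_beta1 = 1641.24, 73.43        # values reported in the slides

ci_low = beta1_hat - 1.96 * se_beta1
ci_high = beta1_hat + 1.96 * se_beta1
print(round(ci_low, 1), round(ci_high, 1))  # approximately 1497.3 and 1785.2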


Alternative 3: Rejection rule


• If β1,0 ∉ { β̂1act ± 1.96 se(β̂1 ) } → Reject H0 at a 5% significance level
• If β1,0 ∈ { β̂1act ± 1.96 se(β̂1 ) } → Do not reject H0 at a 5% significance level


Confidence interval for predicted effect of changing Size

The 95% confidence interval for β1 can be used to construct a 95% interval
for the predicted effect of a general change in Size (∆Size) on Price (∆Price).
According to our model, the predicted change in Price will be:

∆Price = β1 ∆Size

β1 is unknown, but because we can construct a confidence interval for β1 , we
can also construct a confidence interval for the predicted effect β1 ∆Size:

95% CI for β1 ∆Size = [ β̂1act ∆Size − 1.96 se(β̂1 ) × ∆Size , β̂1act ∆Size + 1.96 se(β̂1 ) × ∆Size ]

For example, the confidence interval for the predicted change in price for a
15 m² increase in house size will be:

95% CI for β1 ∆Size = [1614.242 × 15 − 1.96 × 73.43 × 15 , 1614.242 × 15 + 1.96 × 73.43 × 15]
                    = [22054.79 , 26372.47]
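A quick arithmetic check of this interval, a sketch built from the estimate and standard error quoted in the slide (the small gap with the slide's endpoints comes from rounding in those reported inputs):

# Sketch: 95% CI for the predicted price effect of a 15 m² increase in size
beta1_hat, se_beta1, d_size = 1614.242, 73.43, 15   # values as reported in the slide

lower = beta1_hat * d_size - 1.96 * se_beta1 * d_size
upper = beta1_hat * d_size + 1.96 * se_beta1 * d_size
print(round(lower, 2), round(upper, 2))   # ≈ 22054.19 and 26373.07, close to the
                                          # slide's [22054.79 , 26372.47]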


Two-sided hypothesis concerning β1

Empirical question:
Does the size of an apartment affect its sale price?

Price = β0 + β1 Size

• H0 : β 1 = 0 (NO relation between Size and Price in the population)


• H1 : β1 ̸= 0 (Relation between Size and Price in the population)

We can use any of the three alternatives to reject H0 at a 5% significance


level:
1. Using the t-statistic: 22.35 > 1.96 −→ Reject H0
2. Using the p-value: 0.000 < 0.05 −→ Reject H0
3. Using the 95% confidence interval: 0 ∉ [1497.2 , 1785.3] −→ Reject H0

So we conclude that the size of an apartment affects its sale price.


One-sided hypothesis concerning β1

Empirical question:
Is the increase in the sale price for an additional square meter greater
than 1600 euros?

Step 1: Convert your empirical question into a hypothesis


Again, our empirical question concerns the slope of the regression line:

Price = β0 + β1 Size

But now we want to know if this slope is greater than 1600. Therefore, now
our null and alternative hypotheses are:
• H0 : β1 = 1600
• H1 : β1 > 1600

More generally: H0 : β1 = β1,0 and H1 : β1 > β1,0


One-sided hypothesis concerning β1


Empirical question:
Is the increase in the sale price for an additional square meter greater
than 1600 euros?

Step 2: Characterize the sampling distribution of βˆ1 under H0

• Under our H0 , the distribution of β̂1 is approximately N(1600, σ²β̂1)

Step 3: Calculate βˆ1 with a randomly selected sample.


β̂1act = sSize,Price / s²Size
(the sample covariance of Size and Price divided by the sample variance of Size)

Step 4: Choose a significance probability level (α).

• Commonly set at 5% (α = 0.05)



One-sided hypothesis concerning β1


Empirical question:
Is the increase in the sale price for an additional square meter greater
than 1600 euros?

Step 5: Reject or not the null hypothesis that β1 = 1600

Alternative 1: Calculate the t-statistic using β̂1act and compare it to the critical
value t*

The t-statistic is constructed as you would for a two-sided hypothesis test:

tact = (β̂1act − 1600) / se(β̂1 )

When using a one-tailed test, we are testing for the possibility of a
relationship in one direction and completely disregarding the possibility of a
relationship in the other direction. Therefore, we concentrate on only one
side of the standard normal distribution, and the critical value t* changes.
Concretely, in a one-sided test with a significance level of 5%, the critical
value is t* = 1.645.

Alternative 1: Rejection rule


• If H1 : β1 > β1,0 : Reject H0 if tact > 1.645
• If H1 : β1 < β1,0 : Reject H0 if tact < -1.645

So in our example:

tact = (β̂1act − 1600) / se(β̂1 ) = (1641.24 − 1600) / 73.43 = 0.56

• 0.56 < 1.645 −→ We cannot reject the null that β1 = 1600

So, with the evidence at hand, we cannot conclude that the price increase for an
additional square meter is greater than 1600 euros.


Step 5: Reject or not the null hypothesis that β1 = 1600

Alternative 2: Calculate the p-value and compare it to α.

The p-value for a one-sided test is:

• For H1 : β1 > β1,0 : p-value = PrH0 [ β̂1 − β1,0 > β̂1act − β1,0 ]
• For H1 : β1 < β1,0 : p-value = PrH0 [ β̂1 − β1,0 < β̂1act − β1,0 ]

And it is obtained from the cumulative standard normal distribution as:


• For H1 : β1 > β1,0 : p − value = Pr(Z > tact ) = 1 − Φ(tact )
• For H1 : β1 < β1,0 : p − value = Pr(Z < tact ) = Φ(tact )

Alternative 2: Rejection rule


In our example: p-value = Pr(Z > 0.56) = 1 − Φ(0.56) = 1 − 0.7123 = 0.2877
• 0.2877 > 0.05 −→ We cannot reject the null that β1 = 1600
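The same one-sided calculation as a minimal sketch, using the estimates reported in the slides:

# Sketch: one-sided test of H0 : β1 = 1600 against H1 : β1 > 1600
from scipy.stats import norm

beta1_hat, se_beta1, beta1_0 = 1641.24, 73.43, 1600

t_act = (beta1_hat - beta1_0) / se_beta1     # ≈ 0.56
p_value = 1 - norm.cdf(t_act)                # ≈ 0.29
print(t_act, p_value, t_act > 1.645)         # False: do not reject H0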


Appendix


Assumption 1: E(ui |Xi ) = 0


If the conditional distribution of ui given Xi has a mean of zero:

E(ui |Xi ) = 0

• There are different ways of showing the implications of this assumption
• The "other factors" contained in ui are unrelated to Xi :

(1) Cov(X, u) = E[(X − E(X))(u − E(u))] = E[(X − E(X))u]

(2) Cov(X, u) = E[Xu − E(X)u] = E(Xu) − E(X)E(u)

By the law of iterated expectations: E(u) = E[E(u|X)]

(3) Cov(X, u) = E(Xu) − E(X)E[E(u|X)] = E(Xu)

By the law of iterated expectations: E(Xu) = E[E(Xu|X)]

(4) Cov(X, u) = E[E(Xu|X)] = E[E(u|X)X] = 0 → Corr(X, u) = 0



If the conditional distribution of ui given Xi has a mean of zero:

E(ui |Xi ) = 0

• The conditional distribution of Yi is centered in the population


regression line (on average, the prediction of Yi is right).

(1) E(Yi | Xi ) = E(β0 + β1 Xi + ui | Xi )

(2) E(Yi | Xi ) = E(β0 + β1 Xi | Xi ) + E(ui | Xi )

(3) E(Yi | Xi ) = β0 + β1 Xi


It is often convenient to discuss the conditional mean assumption in terms of


correlation between ui and Xi .

When doing so, remember that:

• If E(ui |Xi ) = 0, then Corr(Xi , ui ) is always zero

• However, Corr(Xi , ui ) = 0 does not necessarily imply that E(ui |Xi ) = 0

▶ That is, the correlation only captures the linear relationship between
Xi and ui

• But Corr(Xi , ui ) ̸= 0 necessarily implies that E(ui |Xi ) ̸= 0


Unbiasedness of βˆ1

β̂1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²

First, represent β̂1 in terms of X and u

• Hint: Yi − Ȳ = β1 (Xi − X̄) + (ui − ū)

(1) β̂1 = Σ(Xi − X̄)[β1 (Xi − X̄) + (ui − ū)] / Σ(Xi − X̄)²

(2) β̂1 = [ β1 Σ(Xi − X̄)² + Σ(Xi − X̄)(ui − ū) ] / Σ(Xi − X̄)²

(3) β̂1 = β1 Σ(Xi − X̄)²/Σ(Xi − X̄)² + Σ(Xi − X̄)(ui − ū)/Σ(Xi − X̄)²
       = β1 + Σ(Xi − X̄)(ui − ū)/Σ(Xi − X̄)²


(4) β̂1 = β1 + [ Σ(Xi − X̄)ui − Σ(Xi − X̄)ū ] / Σ(Xi − X̄)²

• Hint: X̄ = (1/n) Σ Xi → Σ Xi = nX̄
• Hint: Σ(Xi − X̄)ū = [ Σ Xi − Σ X̄ ] ū = [ nX̄ − nX̄ ] ū = 0

(5) β̂1 = β1 + Σ(Xi − X̄)ui / Σ(Xi − X̄)²

Then, take the expectation of β̂1 :

(6) E(β̂1 ) = E[ β1 + Σ(Xi − X̄)ui / Σ(Xi − X̄)² ]


• By the law of iterated expectations, E(β̂1 ) = E[ E(β̂1 |X1 , ..., Xn ) ], so we can
first condition on X1 , ..., Xn :

(7) E(β̂1 ) = E[ β1 + Σ(Xi − X̄) E(ui |X1 , ..., Xn ) / Σ(Xi − X̄)² ]

• Because observations are independently distributed,
E(ui |X1 , ..., Xn ) = E(ui |Xi ), and by assumption #1, E(ui |Xi ) = 0:

(8) E(β̂1 ) = E[ β1 + Σ(Xi − X̄) E(ui |Xi ) / Σ(Xi − X̄)² ] = β1

• Equivalently, by the law of iterated expectations:

(9) E(β̂1 ) = E[ E(β̂1 |X1 , ..., Xn ) ] = β1


Unbiasedness example

• What does E(β̂1 ) = β1 intuitively mean?
• The concept of unbiasedness should make us reflect on notions such
as probability and expected value
• We will do a little simulated experiment to understand what is behind the
concept of unbiasedness
• And to what extent it is important or relevant to ask an estimator to be
unbiased.


Data simulation

First, we will use Stata to simulate some observations from a true model
(remember: we never know the true model and the whole point is to estimate
its parameters)
• The true model is Wagei = β0 + β1 × Agei + ui , with β0 = 21 and β1 = 2
• Age is in years and wage is in euros per hour
• Let's assume, for the sake of simplicity, that the unknown error term is
u ∼ iid N(0, 3) and satisfies the assumption E(u | X) = 0
• As we said, we will treat the model as known, and we will generate 1000
values for Age and u, and using the true values of the parameters β0 and
β1 we will generate 1000 values for the wage
• Therefore, the 1000 data points for (Yi , Xi , ui ) will be our population
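The slides carry this out in Stata; as an illustrative alternative, here is a minimal Python sketch of the same idea (the age range, the seed, and reading N(0, 3) as a standard deviation of 3 are assumptions):

# Sketch: simulate a "population" of 1000 workers from the true model
# Wage = 21 + 2·Age + u, with u ~ N(0, 3) taken here as having sd = 3
import numpy as np

rng = np.random.default_rng(42)
N = 1000
beta0, beta1 = 21, 2

age = rng.uniform(18, 65, N)      # assumed age range in years
u = rng.normal(0, 3, N)           # satisfies E(u | Age) = 0 by construction
wage = beta0 + beta1 * age + u    # the 1000 (Wage, Age, u) points are the population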


True regression line: Wagei = 21 + 2Agei


Simulation

1. Let's take a random sample of n = 50 observations from the 1000 data points
2. Then estimate the parameters β0 and β1 applying OLS to those data
3. That is, let’s now pretend that we don’t know the true population
parameters and use our random sample to estimate both β̂0 and β̂1

• What do we expect as the result from this estimation?


• What relationship will it have to the true line? (see the sketch below)
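A sketch of one such estimation (same assumed set-up as the population snippet above):

# Sketch: draw n = 50 points from the simulated population and estimate by OLS
import numpy as np

rng = np.random.default_rng(42)
N, beta0, beta1 = 1000, 21, 2
age = rng.uniform(18, 65, N)                  # simulated population, as above
wage = beta0 + beta1 * age + rng.normal(0, 3, N)

idx = rng.choice(N, size=50, replace=False)   # simple random sample of n = 50
x, y = age[idx], wage[idx]

b1_hat, b0_hat = np.polyfit(x, y, 1)          # OLS slope and intercept
print(b0_hat, b1_hat)                         # close to, but not exactly, 21 and 2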


The red dots are the points from the population that were chosen in the random sampling, together with the line estimated from
those 50 points: Wagei = 21.87 + 1.75Agei . In black, we have the true line (Wagei = 21 + 2Agei )


Continuing the experiment

• Let’s repeat the experiment 10 times


• Each of these ten times, we will pick 50 points randomly from the entire
population and estimate again the OLS regression
• What does this suggest regarding the property of unbiasedness of the
OLS estimator and the assumption E(u|X) = 0? (see the sketch below)
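A sketch of the repeated experiment (assumed set-up as above; change reps to 30 or 1000 to mimic the later tables):

# Sketch: repeat the sampling experiment and average the OLS estimates
import numpy as np

rng = np.random.default_rng(42)
N, beta0, beta1 = 1000, 21, 2
age = rng.uniform(18, 65, N)
wage = beta0 + beta1 * age + rng.normal(0, 3, N)  # fixed simulated population

reps, n = 10, 50
estimates = np.empty((reps, 2))
for r in range(reps):
    idx = rng.choice(N, size=n, replace=False)
    estimates[r] = np.polyfit(age[idx], wage[idx], 1)   # [slope, intercept]

print(estimates[:, 1].mean(), estimates[:, 0].mean())   # averages approach 21 and 2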


Again, in red we have the points chosen in the second random sampling and the regression line with 50 data points
Wagei = 20.66 + 2.06Agei . In black, we have the true line (Wagei = 21 + 2Agei )


In red we have the points chosen in the third random sampling and the regression line with 50 data points
Wagei = 22.09 + 1.81Agei . In black, we have the true line (Wagei = 21 + 2Agei )


In red we have the points chosen in the fourth random sampling and the regression line with 50 data points
Wagei = 21.12 + 1.94Agei . In black, we have the true line (Wagei = 21 + 2Agei )


In red we have the points chosen in the fifth random sampling and the regression line with 50 data points
Wagei = 20.96 + 2.04Agei . In black, we have the true line (Wagei = 21 + 2Agei )


In red we have the points chosen in the sixth random sampling and the regression line with 50 data points
Wagei = 19.66 + 2.23Agei . In black, we have the true line (Wagei = 21 + 2Agei )


In red we have the points chosen in the seventh random sampling and the regression line with 50 data points
Wagei = 21.73 + 1.85Agei . In black, we have the true line (Wagei = 21 + 2Agei )


In red we have the points chosen in the eighth random sampling and the regression line with 50 data points
Wagei = 22.71 + 1.86Agei . In black, we have the true line (Wagei = 21 + 2Agei )


In red we have the points chosen in the ninth random sampling and the regression line with 50 data points
Wagei = 20.43 + 2.14Agei . In black, we have the true line (Wagei = 21 + 2Agei )


In red we have the points chosen in the tenth random sampling and the regression line with 50 data points
Wagei = 20.57 + 2.12Agei . In black, we have the true line (Wagei = 21 + 2Agei )


Sample β̂0 β̂1

1 21.87 1.75
2 20.66 2.06
3 22.09 1.81
4 21.12 1.94
5 20.96 2.04
6 19.66 2.23
7 21.73 1.85
8 22.71 1.86
9 20.43 2.14
10 20.55 2.15

Average 21.178 1.983


Simulation: continuation

• How would the table change if we repeated the experiment 30 times?


Sample β̂0 β̂1

1 21.87 1.75
2 20.66 2.06
3 22.09 1.81
4 21.12 1.94
5 20.96 2.04
6 19.66 2.23
7 21.73 1.85
8 22.71 1.86
9 20.43 2.14
10 20.55 2.15
11 20.57 2.12
12 21.86 1.88
13 21.36 2.00
14 22.60 1.75
15 21.50 1.95
16 20.49 2.11
17 20.81 2.02
18 21.20 1.98
19 22.23 1.75
20 20.80 2.12
21 19.69 2.16
22 21.58 1.83
23 21.00 2.05
24 19.99 2.04
25 20.89 2.12
26 20.89 2.06
27 21.73 1.99
28 21.85 2.03
29 21.95 1.87
30 20.31 2.12

Average 21.17 1.99



• How would the result change if we could repeat the process more
times?
• How would the result change if we repeated it from scratch 1000 times, but now
using random samples of n = 100?



Conclusions

• The estimator is a random variable: with different samples we get
different results
• Unbiased means that if we repeat the random sampling enough times,
the average of those estimates will be the true parameter
• But it does not state anything about any particular sample or
estimated coefficient
• Unbiasedness is a concept linked to fairness
• Its interpretation will depend on the concept of probability that we use
• In the frequentist view, the interpretation is the value that would arise if
we repeated the experiment a sufficient number of times (LLN)
• If the concept is more subjective (Bayesian), it would be the value we
expect to see in light of our knowledge of how the true model works

