
Econometrics I: Fundamentals of Regression Analysis

Part 2

Javier Abellán, Màxim Ventura and Carlos Suárez

Universitat Pompeu Fabra


The OLS estimator assumptions


Why do we use OLS instead of other possible estimators?

• OLS is a generalisation of the sample average: if the "line" were just
an intercept (that is, if the model did not include any regressor), then the
OLS estimator would be the sample average of Y1 , . . . , Yn (Ȳ)
• Like Ȳ, the OLS estimator has some desirable properties:
▶ Under certain assumptions, it is an unbiased
estimator: E(β̂1 ) = β1
▶ Under certain assumptions, it has a tighter sampling distribution than
some other unbiased candidate estimators of β1 (that is, it has
lower variance)


Least-Squares assumptions

Yi = β0 + β1 Xi + ui , i = 1, ..., n

In order for the OLS estimators, βˆ0 and βˆ1 , to be appropriate estimators of the
true parameters β0 and β1 , the following three assumptions need to be true:

• Assumption 1: The conditional distribution of ui given Xi has a mean of
zero:
E(ui |Xi ) = 0
• Assumption 2: Observations are independently and identically
distributed:
(Xi , Yi ), i = 1, . . . , n are i.i.d.
• Assumption 3: Large outliers are unlikely:

0 < E(X⁴) < ∞ and 0 < E(Y⁴) < ∞


Assumption 1: E(ui | Xi ) = 0

If the conditional distribution of ui given Xi has a mean of zero:

E(ui |Xi ) = 0

• All the "other factors" captured in the error term ui (those that
explain Yi but have not been included in the model) are (linearly)
unrelated to Xi : Cov(X, u) = 0 (See Appendix)
• The conditional distribution of Yi is centered in the population
regression line: That is, on average, the prediction of Yi is right (See
Appendix)
• We will frequently come back to this assumption during the course


With experimental data:


• In randomized control trials (RCTs), Xi is randomly assigned to individuals
without taking into account their characteristics
• As a consequence, Xi is unrelated to all characteristics of the individual
that affect Yi , which in our model are captured by ui
• Therefore, in well-designed experimental settings:
▶ ui and Xi are independently distributed
▶ E(ui |Xi ) = 0
With observational data:
• Xi is not randomly assigned across the population
• So we should be careful to check whether this assumption
actually holds in the data


Assumption 2: (Xi , Yi ), i = 1, . . . , n are i.i.d.

(Xi , Yi ), i = 1, . . . , n are independently and identically distributed


• If observations are selected by simple random sampling from a single
large population, this assumption is true.
• Let's continue with our example of housing prices in Barcelona, where X is
the area of a dwelling and Y its sale price.
• If we randomly sample n dwellings from the population of dwellings sold
in Barcelona between 1998 and 2000:
▶ because all observations are drawn from the same population, the
joint distribution of surface and price is the same for each i and
equals the joint distribution of surface and price in the population
(identically distributed).
▶ because the sample is selected at random, knowing the surface
and price of dwelling 1 tells us nothing about the surface and price
of the remaining n-1 dwellings (independently distributed).


Assumption 3: Large outliers are unlikely

• Outlier: an observation with values of Xi , Yi , or both far outside the usual
range of the data
• Extreme values prevent the sample variance s² from converging to the
population variance σ², making the OLS estimates misleading (see the
sketch below)
• Mathematically, "extreme values are unlikely" is stated as:
▶ X and Y have nonzero finite fourth moments: 0 < E(X⁴) < ∞ and
0 < E(Y⁴) < ∞
▶ Another way to put this is that X and Y have finite kurtosis
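A minimal simulation sketch of this point (the numbers, variable names, and seed are illustrative assumptions, not from the slides): one extreme observation is enough to pull the OLS slope far away from the truth.

# Sketch: effect of a single large outlier on the OLS slope (illustrative values)
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.normal(50, 10, n)              # e.g. dwelling area in m²
u = rng.normal(0, 5, n)
y = 10 + 2 * x + u                     # true slope = 2

slope_clean = np.polyfit(x, y, 1)[0]   # OLS slope on well-behaved data

x_out, y_out = x.copy(), y.copy()
x_out[0], y_out[0] = 500, 10           # one point far outside the usual range
slope_outlier = np.polyfit(x_out, y_out, 1)[0]

print(f"slope without outlier: {slope_clean:.2f}")     # close to 2
print(f"slope with one outlier: {slope_outlier:.2f}")  # dragged far from 2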


Assumption 3: Large outliers are unlikely

• The validity of this assumption will depend on the characteristics of the
data
• For instance, the area of a dwelling will probably satisfy the assumption
• The same goes for exam grades, age of a person, etc.
• However, for other variables such as returns of the stock market, we
should check whether this is actually the case


Twin roles of the least square assumptions

1. Mathematical role: if the three assumptions hold...


▶ the OLS estimators will be unbiased estimators of the true
parameters
▶ the OLS estimators will be consistent estimators of the true
parameters
▶ the OLS estimators will have sampling distributions that are
approximately normal in large samples.
2. Circumstances when the OLS assumptions do not hold
▶ Corr(X, u) ̸= 0
▶ Observations not i.i.d
▶ Outliers


The sampling distribution of the OLS estimator


Sampling distribution of the OLS estimators


Because the OLS estimators are computed from randomly drawn samples, β̂0 and β̂1
are themselves random variables with a sampling distribution.

Under the least squares assumptions:

• β̂0 and β̂1 are unbiased estimators of β0 and β1 :

E(β̂0 ) = β0 and E(β̂1 ) = β1

• In large samples, by the central limit theorem, the sampling distribution
of β̂0 and β̂1 can be well approximated by the bivariate normal
distribution.
▶ The marginal distributions of β̂0 and β̂1 are (approximately)
normally distributed in large samples:

β̂0 →d N(β0 , σ²β̂0) and β̂1 →d N(β1 , σ²β̂1)

where →d denotes convergence in distribution.


Unbiasedness of βˆ1

β̂1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²

• If we replace Yi by its population value according to the true model
(Yi = β0 + β1 Xi + ui ) and work out the math, we can show that (see
Appendix):

β̂1 = β1 + Σ(Xi − X̄)ui / Σ(Xi − X̄)²

• This is one of the most important formulas we will see during this course
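As a quick numerical check of this identity, here is a minimal simulation sketch (the data-generating values and the seed are arbitrary assumptions): the OLS slope computed directly coincides with β1 plus the error-driven term.

# Sketch: verify numerically that β̂1 = β1 + Σ(Xi − X̄)ui / Σ(Xi − X̄)²
import numpy as np

rng = np.random.default_rng(1)
n, beta0, beta1 = 200, 1.0, 2.0
x = rng.normal(0, 1, n)
u = rng.normal(0, 1, n)
y = beta0 + beta1 * x + u

xd = x - x.mean()
beta1_hat = (xd * (y - y.mean())).sum() / (xd ** 2).sum()   # OLS formula
decomposition = beta1 + (xd * u).sum() / (xd ** 2).sum()    # β1 + "something else"

print(beta1_hat, decomposition)   # identical up to floating-point error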


Unbiasedness of βˆ1

β̂1 = β1 + Σ(Xi − X̄)ui / Σ(Xi − X̄)²

• The intuitive idea is that our estimator is equal to the true parameter plus
’something’ else
• If the expected value of that ’something’ else is zero, our estimator is
unbiased; otherwise it is biased
• If the error term is uncorrelated with our X (that is, if assumption #1 holds),
then the second term has expectation zero and thus our estimator will be
unbiased (E(β̂1 ) = β1 )
• However, if our model has left something relevant in the error term (a factor
that explains Y and is correlated with X), the second term will not have zero
expectation and our estimator will be biased


Normal approximation of βˆ1 and βˆ0 in large samples

The large-sample approximation of β̂1 is:

N(β1 , σ²β̂1) where σ²β̂1 = (1/n) · var[(Xi − µX )ui ] / [var(Xi )]²

The large-sample approximation of β̂0 is:

N(β0 , σ²β̂0) where σ²β̂0 = (1/n) · var(Hi ui ) / [E(Hi²)]²

and Hi = 1 − [µX / E(Xi²)] · Xi
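A hedged Monte Carlo sketch of this approximation (normal X and u with arbitrary assumed parameters): the standard deviation of β̂1 across many simulated samples should be close to the square root of the σ²β̂1 formula above.

# Sketch: Monte Carlo check of σ²β̂1 = (1/n) · var[(Xi − µX)ui] / [var(Xi)]²
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100, 5000
beta0, beta1 = 1.0, 2.0
mu_x, sigma_x, sigma_u = 5.0, 2.0, 1.0

slopes = np.empty(reps)
for r in range(reps):
    x = rng.normal(mu_x, sigma_x, n)
    u = rng.normal(0, sigma_u, n)
    y = beta0 + beta1 * x + u
    xd = x - x.mean()
    slopes[r] = (xd * (y - y.mean())).sum() / (xd ** 2).sum()

# Evaluate the formula using large simulated draws for the population moments
big_x = rng.normal(mu_x, sigma_x, 10**6)
big_u = rng.normal(0, sigma_u, 10**6)
var_beta1 = (1 / n) * np.var((big_x - mu_x) * big_u) / np.var(big_x) ** 2

print(np.std(slopes), np.sqrt(var_beta1))   # the two numbers should be close (≈ 0.05)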


From the variance formula of the OLS estimators we can see several things:

1. Other things equal, the larger the variance of Xi , the smaller the variance
of βˆ1
▶ Intuitively, the wider the range of X, the 'better' the information to draw
the regression line.
2. Other things equal, the smaller the variance of ui , the smaller the
variance of βˆ1
▶ Intuitively, if we have a very good model (the errors are smaller), the
data will have a tighter scatter around the population regression
line, so its slope will be estimated more precisely.
3. Other things equal, the larger the sample size (n), the smaller the variance
of β̂1
▶ Intuitively, larger n means more dots (information) to draw the
regression line


Consistency of βˆ1 and βˆ0

From the variance formula of the OLS estimators we can see several things:

• βˆ0 and βˆ1 are consistent estimators of β0 and β1


▶ As n gets larger, the variance of βˆ0 and βˆ1 will go to zero
▶ Since n is in the denominator of the variance formulas, if
assumption #3 holds (the other terms are finite), the variance
converges to zero as n → ∞


Estimator of the variance and standard error of βˆ1 and βˆ0


The variances of the OLS estimators, σ²β̂1 and σ²β̂0, are unknown parameters,
so they need to be estimated as well.

The estimators of σ²β̂1 and σ²β̂0 are, respectively (all sums run over i = 1, . . . , n):

σ̂²β̂1 = (1/n) · [ (1/(n−2)) Σ(Xi − X̄)²ûi² ] / [ (1/n) Σ(Xi − X̄)² ]²

σ̂²β̂0 = (1/n) · [ (1/(n−2)) Σ Ĥi²ûi² ] / [ (1/n) Σ Ĥi² ]²

where Ĥi = 1 − [ X̄ / ( (1/n) Σ Xi² ) ] · Xi

And the standard errors of β̂1 and β̂0 are estimators of the standard deviations
of β̂1 and β̂0 , σβ̂1 and σβ̂0 :

se(β̂1 ) = √σ̂²β̂1 and se(β̂0 ) = √σ̂²β̂0
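A minimal sketch of the σ̂²β̂1 estimator above, on simulated data with assumed values (not the dwellings dataset). This is the heteroskedasticity-robust standard error that the slides return to later.

# Sketch: se(β̂1) computed from the variance estimator given above
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.normal(70, 20, n)                  # e.g. dwelling size
u = rng.normal(0, 50, n)
y = 100 + 1.5 * x + u

xd = x - x.mean()
beta1_hat = (xd * (y - y.mean())).sum() / (xd ** 2).sum()
beta0_hat = y.mean() - beta1_hat * x.mean()
resid = y - beta0_hat - beta1_hat * x      # ûi

var_hat = (1 / n) * ((1 / (n - 2)) * (xd ** 2 * resid ** 2).sum()) \
          / ((1 / n) * (xd ** 2).sum()) ** 2
se_beta1 = np.sqrt(var_hat)

print(beta1_hat, se_beta1)                 # slope estimate and its standard error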


Homoskedasticity and heteroskedasticity


Homoskedasticity

Let’s add a fourth assumption:


• Assumption 4: the errors ui are homoskedastic

The error term ui is homoskedastic if:


▶ The variance of the conditional distribution of ui given Xi , Var(ui |Xi ),
is constant for i = 1, . . . , n
▶ In particular, it does not depend on Xi
Otherwise, the error term is said to be heteroskedastic.
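A small simulated illustration (assumed setup, not from the slides): in the first design Var(ui|Xi) is constant, in the second it grows with Xi, so only the first is homoskedastic.

# Sketch: homoskedastic vs heteroskedastic errors, checked by conditioning on X
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
x = rng.uniform(0, 10, n)

u_homo = rng.normal(0, 1, n)            # Var(u|X) = 1 for every value of X
u_hetero = rng.normal(0, 1, n) * x      # Var(u|X) = X², grows with X

for name, u in [("homoskedastic", u_homo), ("heteroskedastic", u_hetero)]:
    low = u[x < 3].var()                # conditional variance for small X
    high = u[x > 7].var()               # conditional variance for large X
    print(f"{name}: Var(u | X<3) = {low:.2f}, Var(u | X>7) = {high:.2f}")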


Graphically:

• In the left-hand figure, the spread of the conditional distribution of ui


given Xi (student-teacher ratio in the example) does not depend on the
value of Xi .
• On the contrary, in the right-hand figure, the spread of the conditional
distribution of ui given Xi is tight for low values of Xi and greater for larger
values of Xi . So it does depend on Xi .


Mathematical implications of homoskedasticity

If the three least square assumptions hold and the errors are homoskedastic:
1. The OLS estimators remain unbiased, consistent and asymptotically
normal
▶ Note that unbiasedness and consistency do not depend on whether
errors are heteroskedastic or homoskedastic
▶ For these properties to be true, we only need the
first 3 least square assumptions to hold

2. The OLS estimators βˆ0 and βˆ1 are efficient among all estimators that are
a linear combination of Y1 , ..., Yn and are unbiased (Gauss-Markov
theorem).
▶ That is, the OLS estimators are the most efficient linear
conditionally unbiased estimators (they are BLUE)


3. Because the conditional variance of ui given Xi is constant,
Var(ui |Xi ) = σ²u , the formulas for the variance of β̂0 and β̂1 simplify to:

σ²β̂1 = σ²u / (n σ²X)   and   σ²β̂0 = E(Xi²) σ²u / (n σ²X)

• Consequently, if the errors are homoskedastic, the formula for the standard
errors of β̂0 and β̂1 is simplified. The homoskedasticity-only standard
errors are:

se(β̂1 ) = √σ̃²β̂1 where σ̃²β̂1 = s²û / Σ(Xi − X̄)²

se(β̂0 ) = √σ̃²β̂0 where σ̃²β̂0 = [ (1/n) Σ Xi² ] · s²û / Σ(Xi − X̄)²
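A sketch (assumed simulated data) comparing the homoskedasticity-only standard error above with the heteroskedasticity-robust one when Var(ui|Xi) actually depends on Xi: the two formulas give noticeably different answers, which is why the next slide warns against the simplified version.

# Sketch: homoskedasticity-only vs heteroskedasticity-robust se(β̂1)
import numpy as np

rng = np.random.default_rng(5)
n = 500
x = rng.uniform(1, 10, n)
u = rng.normal(0, 1, n) * x             # heteroskedastic errors: Var(u|X) = X²
y = 2 + 3 * x + u

xd = x - x.mean()
b1 = (xd * (y - y.mean())).sum() / (xd ** 2).sum()
b0 = y.mean() - b1 * x.mean()
res = y - b0 - b1 * x

s2_u = (res ** 2).sum() / (n - 2)                        # s²û
se_homo = np.sqrt(s2_u / (xd ** 2).sum())                # homoskedasticity-only
se_robust = np.sqrt((1 / n) * ((1 / (n - 2)) * (xd ** 2 * res ** 2).sum())
                    / ((1 / n) * (xd ** 2).sum()) ** 2)  # robust formula

print(se_homo, se_robust)   # under heteroskedasticity these differ noticeably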


Warning

• When the errors are heteroskedastic, the homoskedasticity-only


formulas for the standard errors are inappropriate. Specifically:
▶ The t-statistic computed using the homoskedasticity-only standard
error does not have a standard normal distribution, even in large
samples.
▶ The 95% confidence intervals constructed using 1.96 as a critical
value and the homoskedasticity-only standard error will not contain
the true value of the parameter with 95% probability, even in large
samples.


• In contrast, using heteroskedasticity-robust standard errors (the
formulas initially presented for se(β̂0 ) and se(β̂1 )) leads to valid statistical
inferences whether or not the errors are heteroskedastic.
▶ At a general level, economic theory rarely gives any reason to
believe that the error term is homoskedastic
▶ So we will generally assume that errors are heteroskedastic and we
will use heteroskedasticity-robust standard errors.


Variance of the residuals in the dwellings example

• Does σ²ûi depend on Xi ?
• As the residual plot for the dwellings data shows, larger values of X come with
larger values of û. Therefore, it is quite likely that assumption #4
(homoskedasticity) does not hold


Hypothesis test and confidence intervals


Testing hypotheses about β1

The general approach to testing hypotheses about the unknown parameter β1 is
the same as the one used to test hypotheses about the population mean, µ.

Steps:

1. Set a null hypothesis (H0 ) about β1 and assume it is true


2. Characterise the sampling distribution of βˆ1 under H0
3. Calculate βˆ1 from the randomly selected sample
4. Choose a significance probability level (α)
5. Reject H0 or not accordingly. Three alternative ways:
5.1 Calculate the t-statistic and compare it to the critical value t*
5.2 Calculate the p-value and compare it to α
5.3 Calculate the confidence interval for β1 and check if β1,0 is in it


Two-sided hypothesis concerning β1

Empirical question:
Does the size of an apartment affect its sale price?

Step 1: Convert your empirical question into a hypothesis


Our empirical question concerns the slope of the population regression line
that relates the size of an apartment with its price:

Price = β0 + β1 Size

Concretely, we want to know if this relation exists at all. Therefore, our null
and alternative hypotheses are:
• H0 : β 1 = 0 (NO relation between Size and Price in the population)
• H1 : β1 ̸= 0 (Relation between Size and Price in the population)

More generally: H0 : β1 = β1,0 and H1 : β1 ̸= β1,0


Two-sided hypothesis concerning β1

Empirical question:
Does the size of an apartment affect its sale price?

Step 2: Characterize the sampling distribution of βˆ1 under H0


We have seen in the previous section that in large samples the distribution of
βˆ1 is well approximated by:

N(β1 , σ²β̂1) where σ²β̂1 = (1/n) · var[(Xi − µX )ui ] / [var(Xi )]²

So under our H0 , if the sample is large enough, the distribution of β̂1 is
approximately N(0, σ²β̂1).

Or more generally, approximately N(β1,0 , σ²β̂1).


Two-sided hypothesis concerning β1

Empirical question:
Does the size of an apartment affect its sale price?

Step 3: Calculate βˆ1 from a randomly selected sample.


β̂1act = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² = sX,Y / s²X

Or in our example:

β̂1act = sSize,Price / s²Size
(the sample covariance of Size and Price divided by the sample variance of Size)

Step 4: Choose a significance probability level (α).

• Commonly set at 5% (α = 0.05)


Two-sided hypothesis concerning β1


Empirical question:
Does the size of an apartment affect its sale price?

Step 5: Reject or not the null hypothesis that β1 = 0

Alternative 1: Calculate the t-statistic using β̂1act and compare it to the critical
value t* (for α = 0.05, t* = 1.96)

t = (β̂1 − β1,0 ) / se(β̂1 )  −→  tact = (β̂1act − 0) / se(β̂1 ) = β̂1act / se(β̂1 )

where se(β̂1 ) is the standard error of β̂1 , which is the estimator of the standard
deviation of β̂1 , σβ̂1 :

se(β̂1 ) = √σ̂²β̂1 where σ̂²β̂1 = (1/n) · [ (1/(n−2)) Σ(Xi − X̄)²ûi² ] / [ (1/n) Σ(Xi − X̄)² ]²

Alternative 1: Rejection rule


• If |tact | > 1.96 → Reject H0 at a 5% significance level
• If |tact | ≤ 1.96 → Do not reject H0 at a 5% significance level


Step 5: Reject or not the null hypothesis that β1 = 0

Alternative 2: Calculate the p-value and compare it to α.

p-value: the probability, under the assumption that H0 is true, of observing a
value of β̂1 at least as far from β1,0 as your estimate β̂1act .

p-value = PrH0 [ |β̂1 − β1,0 | > |β̂1act − β1,0 | ]

        = PrH0 [ |β̂1 − β1,0 | / se(β̂1 ) > |β̂1act − β1,0 | / se(β̂1 ) ]

        = PrH0 [ |t| > |tact | ]

Because β̂1 is approximately normally distributed in large samples, under H0 the
t-statistic is approximately distributed as a standard normal, so:

p-value = Pr[ |Z| > |tact | ] = 2Φ(−|tact |)

where Φ(·) is the cumulative distribution function (CDF) of the standard
normal distribution
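A tiny sketch of this calculation, using as illustration the t-statistic of 22.35 that the slides report later for the size–price regression:

# Sketch: two-sided p-value from a t-statistic via the standard normal CDF
from scipy.stats import norm

t_act = 22.35                       # t-statistic reported later in these slides
p_value = 2 * norm.cdf(-abs(t_act))
print(p_value)                      # essentially 0, so H0 : β1 = 0 is rejected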

Alternative 2: Rejection rule


• If p − value ≤ 0.05 → Reject H0 at a 5% significance level
• If p − value > 0.05 → Do not reject H0 at a 5% significance level


Step 5: Reject or not the null hypothesis that β1 = 0

Alternative 3: Calculate the confidence interval for β1 and check if β1,0 is in it.

95% confidence interval (CI) of β1 : an interval that contains the true value
of β1 with 95% probability. Or equivalently, the set of values of β1 that cannot
be rejected by a 5% two-sided hypothesis test.

By rearranging the rejection rule based on the t-statistic:

Do not reject H0 if | (β̂1act − β1,0 ) / se(β̂1 ) | < 1.96

we can establish the set of values of β1 that are not rejected at a 5%
significance level:

95% CI for β1 = { β̂1act ± 1.96 se(β̂1 ) } = [ β̂1act − 1.96 se(β̂1 ) , β̂1act + 1.96 se(β̂1 ) ]
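A minimal sketch of this interval, using the estimate β̂1act = 1641.24 and se(β̂1) = 73.43 that appear elsewhere in these slides; it reproduces the interval [1497.2 , 1785.3] quoted a few slides below, up to rounding.

# Sketch: 95% confidence interval for β1 from the estimate and its standard error
beta1_hat, se_beta1 = 1641.24, 73.43        # values reported in the slides

ci_low = beta1_hat - 1.96 * se_beta1
ci_high = beta1_hat + 1.96 * se_beta1
print(round(ci_low, 1), round(ci_high, 1))  # approximately 1497.3 and 1785.2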


Alternative 3: Rejection rule


• If β1,0 ∉ { β̂1act ± 1.96 se(β̂1 ) } → Reject H0 at a 5% significance level
• If β1,0 ∈ { β̂1act ± 1.96 se(β̂1 ) } → Do not reject H0 at a 5% significance level


Confidence interval for predicted effect of changing Size

The 95% confidence interval for β1 can be used to construct a 95% interval
for the predicted effect of a general change in Size (∆Size) on Price (∆Price).
According to our model, the predicted change in Price will be:

∆Price = β1 ∆Size

β1 is unknown, but because we can construct a confidence interval for β1 , we
can also construct a confidence interval for the predicted effect β1 ∆Size:

95% CI for β1 ∆Size = [ β̂1act ∆Size − 1.96 se(β̂1 ) × ∆Size , β̂1act ∆Size + 1.96 se(β̂1 ) × ∆Size ]

For example, the confidence interval for the predicted change in price for a
15 m² increase in house size will be:

95% CI for β1 ∆Size = [1614.242 × 15 − 1.96 × 73.43 × 15 , 1614.242 × 15 + 1.96 × 73.43 × 15]
                    = [22054.79 , 26372.47]
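A quick arithmetic check of this interval, a sketch built from the estimate and standard error quoted in the slide (the small gap with the slide's endpoints comes from rounding in those reported inputs):

# Sketch: 95% CI for the predicted price effect of a 15 m² increase in size
beta1_hat, se_beta1, d_size = 1614.242, 73.43, 15   # values as reported in the slide

lower = beta1_hat * d_size - 1.96 * se_beta1 * d_size
upper = beta1_hat * d_size + 1.96 * se_beta1 * d_size
print(round(lower, 2), round(upper, 2))   # ≈ 22054.19 and 26373.07, close to the
                                          # slide's [22054.79 , 26372.47]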


Two-sided hypothesis concerning β1

Empirical question:
Does the size of an apartment affect its sale price?

Price = β0 + β1 Size

• H0 : β 1 = 0 (NO relation between Size and Price in the population)


• H1 : β1 ̸= 0 (Relation between Size and Price in the population)

We can use any of the three alternatives to reject H0 at a 5% significance


level:
1. Using the t-statistic: 22.35 > 1.96 −→ Reject H0
2. Using the p-value: 0.000 < 0.05 −→ Reject H0
3. Using the 95% confidence interval: 0 ∉ [1497.2 , 1785.3] −→ Reject H0

So we conclude that the size of an apartment affects its sale price.


One-sided hypothesis concerning β1

Empirical question:
Is the increase in the sale price for an additional square meter greater
than 1600 euros?

Step 1: Convert your empirical question into a hypothesis


Again, our empirical question concerns the slope of the regression line:

Price = β0 + β1 Size

But now we want to know if this slope is greater than 1600. Therefore, now
our null and alternative hypotheses are:
• H0 : β1 = 1600
• H1 : β1 > 1600

More generally: H0 : β1 = β1,0 and H1 : β1 > β1,0


One-sided hypothesis concerning β1


Empirical question:
Is the increase in the sale price for an additional square meter greater
than 1600 euros?

Step 2: Characterize the sampling distribution of βˆ1 under H0

• Under our H0 , the distribution of β̂1 is approximately N(1600, σ²β̂1)

Step 3: Calculate βˆ1 with a randomly selected sample.


β̂1act = sSize,Price / s²Size
(the sample covariance of Size and Price divided by the sample variance of Size)

Step 4: Choose a significance probability level (α).

• Commonly set at 5% (α = 0.05)



One-sided hypothesis concerning β1


Empirical question:
Is the increase in the sale price for an additional square meter greater
than 1600 euros?

Step 5: Reject or not the null hypothesis that β1 = 1600

Alternative 1: Calculate the t-statistic using β̂1act and compare it to the critical
value t*

The t-statistic is constructed as you would for a two-sided hypothesis test:

tact = (β̂1act − 1600) / se(β̂1 )

When using a one-tailed test, we are testing for the possibility of a
relationship in one direction and completely disregarding the possibility of a
relationship in the other direction. Therefore, we concentrate on only one
side of the standard normal distribution, and the critical value t* changes.
Concretely, in a one-sided test with a significance level of 5%, the critical
value is t* = 1.645.

Alternative 1: Rejection rule


• If H1 : β1 > β1,0 : Reject H0 if tact > 1.645
• If H1 : β1 < β1,0 : Reject H0 if tact < -1.645

So in our example:

tact = (β̂1act − 1600) / se(β̂1 ) = (1641.24 − 1600) / 73.43 = 0.56

• 0.56 < 1.645 −→ We cannot reject the null that β1 = 1600

So, with the evidence at hand, we cannot conclude that the price increase for an
additional square meter is greater than 1600 euros.


Step 5: Reject or not the null hypothesis that β1 = 1600

Alternative 2: Calculate the p-value and compare it to α.

The p-value for a one-sided test is:

• For H1 : β1 > β1,0 : p-value = PrH0 [ β̂1 − β1,0 > β̂1act − β1,0 ]
• For H1 : β1 < β1,0 : p-value = PrH0 [ β̂1 − β1,0 < β̂1act − β1,0 ]

And it is obtained from the cumulative standard normal distribution as:


• For H1 : β1 > β1,0 : p − value = Pr(Z > tact ) = 1 − Φ(tact )
• For H1 : β1 < β1,0 : p − value = Pr(Z < tact ) = Φ(tact )

Alternative 2: Rejection rule


In our example: p-value = Pr(Z > 0.56) = 1 − Φ(0.56) = 1 − 0.7123 = 0.2877
• 0.2877 > 0.05 −→ We cannot reject the null that β1 = 1600
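The same one-sided calculation as a minimal sketch, using the estimates reported in the slides:

# Sketch: one-sided test of H0 : β1 = 1600 against H1 : β1 > 1600
from scipy.stats import norm

beta1_hat, se_beta1, beta1_0 = 1641.24, 73.43, 1600

t_act = (beta1_hat - beta1_0) / se_beta1     # ≈ 0.56
p_value = 1 - norm.cdf(t_act)                # ≈ 0.29
print(t_act, p_value, t_act > 1.645)         # False: do not reject H0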


Appendix


Assumption 1: E(ui |Xi ) = 0


If the conditional distribution of ui given Xi has a mean of zero:

E(ui |Xi ) = 0

• There are different ways of showing the implications of this assumption
• The "other factors" contained in ui are unrelated to Xi :

(1) Cov(X, u) = E[(X − E(X))(u − E(u))] = E[(X − E(X))u]

(2) Cov(X, u) = E[Xu − E(X)u] = E(Xu) − E(X)E(u)

By the law of iterated expectations: E(u) = E[E(u|X)]

(3) Cov(X, u) = E(Xu) − E(X)E[E(u|X)] = E(Xu)

By the law of iterated expectations: E(Xu) = E[E(Xu|X)]

(4) Cov(X, u) = E[E(Xu|X)] = E[E(u|X)X] = 0 → Corr(X, u) = 0



If the conditional distribution of ui given Xi has a mean of zero:

E(ui |Xi ) = 0

• The conditional distribution of Yi is centered in the population


regression line (on average, the prediction of Yi is right).

(1) E(Yi | Xi ) = E(β0 + β1 Xi + ui | Xi )

(2) E(Yi | Xi ) = E(β0 + β1 Xi | Xi ) + E(ui | Xi )

(3) E(Yi | Xi ) = β0 + β1 Xi


It is often convenient to discuss the conditional mean assumption in terms of


correlation between ui and Xi .

When doing so, remember that:

• If E(ui |Xi ) = 0, then Corr(Xi , ui ) is always zero

• However, Corr(Xi , ui ) = 0 does not necessarily imply that E(ui |Xi ) = 0

▶ That is, the correlation only captures the linear relationship between
Xi and ui

• But Corr(Xi , ui ) ̸= 0 necessarily implies that E(ui |Xi ) ̸= 0


Unbiasedness of βˆ1

β̂1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²

First, represent β̂1 in terms of X and u

• Hint: Yi − Ȳ = β1 (Xi − X̄) + (ui − ū)

(1) β̂1 = Σ(Xi − X̄)[β1 (Xi − X̄) + (ui − ū)] / Σ(Xi − X̄)²

(2) β̂1 = [ β1 Σ(Xi − X̄)² + Σ(Xi − X̄)(ui − ū) ] / Σ(Xi − X̄)²

(3) β̂1 = β1 Σ(Xi − X̄)²/Σ(Xi − X̄)² + Σ(Xi − X̄)(ui − ū)/Σ(Xi − X̄)²
       = β1 + Σ(Xi − X̄)(ui − ū)/Σ(Xi − X̄)²


(4) β̂1 = β1 + [ Σ(Xi − X̄)ui − Σ(Xi − X̄)ū ] / Σ(Xi − X̄)²

• Hint: X̄ = (1/n) Σ Xi → Σ Xi = nX̄
• Hint: Σ(Xi − X̄)ū = [ Σ Xi − Σ X̄ ] ū = [ nX̄ − nX̄ ] ū = 0

(5) β̂1 = β1 + Σ(Xi − X̄)ui / Σ(Xi − X̄)²

Then, take the expectation of β̂1 :

(6) E(β̂1 ) = E[ β1 + Σ(Xi − X̄)ui / Σ(Xi − X̄)² ]


• By the law of iterated expectations, E(β̂1 ) = E[ E(β̂1 |X1 , ..., Xn ) ], so we can
first condition on X1 , ..., Xn :

(7) E(β̂1 ) = E[ β1 + Σ(Xi − X̄) E(ui |X1 , ..., Xn ) / Σ(Xi − X̄)² ]

• Because observations are independently distributed,
E(ui |X1 , ..., Xn ) = E(ui |Xi ), and by assumption #1, E(ui |Xi ) = 0:

(8) E(β̂1 ) = E[ β1 + Σ(Xi − X̄) E(ui |Xi ) / Σ(Xi − X̄)² ] = β1

• Equivalently, by the law of iterated expectations:

(9) E(β̂1 ) = E[ E(β̂1 |X1 , ..., Xn ) ] = β1


Unbiasedness example

• What does E(β̂1 ) = β1 intuitively mean?
• The concept of unbiasedness should make us reflect on notions such
as probability and expected value
• We will do a little simulated experiment to understand what is behind the
concept of unbiasedness
• And to what extent it is important or relevant to ask an estimator to be
unbiased.


Data simulation

First, we will use Stata to simulate some observations from a true model
(remember: we never know the true model and the whole point is to estimate
its parameters)
• The true model is Wagei = β0 + β1 × Agei + ui , with β0 = 21 and β1 = 2
• Age is in years and wage is in euros per hour
• Let's assume, for the sake of simplicity, that the unknown error term is
u ∼ iid N(0, 3) and satisfies the assumption E(u | X) = 0
• As we said, we will treat the model as known, and we will generate 1000
values for Age and u, and using the true values of the parameters β0 and
β1 we will generate 1000 values for the wage
• Therefore, the 1000 data points for (Yi , Xi , ui ) will be our population
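The slides carry this out in Stata; as an illustrative alternative, here is a minimal Python sketch of the same idea (the age range, the seed, and reading N(0, 3) as a standard deviation of 3 are assumptions):

# Sketch: simulate a "population" of 1000 workers from the true model
# Wage = 21 + 2·Age + u, with u ~ N(0, 3) taken here as having sd = 3
import numpy as np

rng = np.random.default_rng(42)
N = 1000
beta0, beta1 = 21, 2

age = rng.uniform(18, 65, N)      # assumed age range in years
u = rng.normal(0, 3, N)           # satisfies E(u | Age) = 0 by construction
wage = beta0 + beta1 * age + u    # the 1000 (Wage, Age, u) points are the population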


True regression line: Wagei = 21 + 2Agei


Simulation

1. Let's take a random sample of n = 50 observations from the 1000 data points
2. Then estimate the parameters β0 and β1 applying OLS to those data
3. That is, let’s now pretend that we don’t know the true population
parameters and use our random sample to estimate both β̂0 and β̂1

• What do we expect as the result from this estimation?


• What relationship will it have to the true line? (see the sketch below)
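A sketch of one such estimation (same assumed set-up as the population snippet above):

# Sketch: draw n = 50 points from the simulated population and estimate by OLS
import numpy as np

rng = np.random.default_rng(42)
N, beta0, beta1 = 1000, 21, 2
age = rng.uniform(18, 65, N)                  # simulated population, as above
wage = beta0 + beta1 * age + rng.normal(0, 3, N)

idx = rng.choice(N, size=50, replace=False)   # simple random sample of n = 50
x, y = age[idx], wage[idx]

b1_hat, b0_hat = np.polyfit(x, y, 1)          # OLS slope and intercept
print(b0_hat, b1_hat)                         # close to, but not exactly, 21 and 2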


The red dots are the points from the population that were chosen in the random sampling, together with the line estimated from
those 50 points: Wagei = 21.87 + 1.75Agei . In black, we have the true line (Wagei = 21 + 2Agei )


Continuing the experiment

• Let’s repeat the experiment 10 times


• Each of these ten times, we will pick 50 points randomly from the entire
population and estimate again the OLS regression
• What does this suggest regarding the property of unbiasedness of the
OLS estimator and the assumption E(u|X) = 0? (see the sketch below)
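A sketch of the repeated experiment (assumed set-up as above; change reps to 30 or 1000 to mimic the later tables):

# Sketch: repeat the sampling experiment and average the OLS estimates
import numpy as np

rng = np.random.default_rng(42)
N, beta0, beta1 = 1000, 21, 2
age = rng.uniform(18, 65, N)
wage = beta0 + beta1 * age + rng.normal(0, 3, N)  # fixed simulated population

reps, n = 10, 50
estimates = np.empty((reps, 2))
for r in range(reps):
    idx = rng.choice(N, size=n, replace=False)
    estimates[r] = np.polyfit(age[idx], wage[idx], 1)   # [slope, intercept]

print(estimates[:, 1].mean(), estimates[:, 0].mean())   # averages approach 21 and 2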


Again, in red we have the points chosen in the second random sampling and the regression line with 50 data points
Wagei = 20.66 + 2.06Agei . In black, we have the true line (Wagei = 21 + 2Agei )


In red we have the points chosen in the third random sampling and the regression line with 50 data points
Wagei = 22.09 + 1.81Agei . In black, we have the true line (Wagei = 21 + 2Agei )


In red we have the points chosen in the fourth random sampling and the regression line with 50 data points
Wagei = 21.12 + 1.94Agei . In black, we have the true line (Wagei = 21 + 2Agei )


In red we have the points chosen in the fifth random sampling and the regression line with 50 data points
Wagei = 20.96 + 2.04Agei . In black, we have the true line (Wagei = 21 + 2Agei )


In red we have the points chosen in the sixth random sampling and the regression line with 50 data points
Wagei = 19.66 + 2.23Agei . In black, we have the true line (Wagei = 21 + 2Agei )


In red we have the points chosen in the seventh random sampling and the regression line with 50 data points
Wagei = 21.73 + 1.85Agei . In black, we have the true line (Wagei = 21 + 2Agei )


In red we have the points chosen in the eighth random sampling and the regression line with 50 data points
Wagei = 22.71 + 1.86Agei . In black, we have the true line (Wagei = 21 + 2Agei )


In red we have the points chosen in the ninth random sampling and the regression line with 50 data points
Wagei = 20.43 + 2.14Agei . In black, we have the true line (Wagei = 21 + 2Agei )


In red we have the points chosen in the tenth random sampling and the regression line with 50 data points
Wagei = 20.57 + 2.12Agei . In black, we have the true line (Wagei = 21 + 2Agei )


Sample β̂0 β̂1

1 21.87 1.75
2 20.66 2.06
3 22.09 1.81
4 21.12 1.94
5 20.96 2.04
6 19.66 2.23
7 21.73 1.85
8 22.71 1.86
9 20.43 2.14
10 20.55 2.15

Average 21.178 1.983


Simulation: continuation

• How would the table change if we repeated the experiment 30 times?


Sample β̂0 β̂1

1 21.87 1.75
2 20.66 2.06
3 22.09 1.81
4 21.12 1.94
5 20.96 2.04
6 19.66 2.23
7 21.73 1.85
8 22.71 1.86
9 20.43 2.14
10 20.55 2.15
11 20.57 2.12
12 21.86 1.88
13 21.36 2.00
14 22.60 1.75
15 21.50 1.95
16 20.49 2.11
17 20.81 2.02
18 21.20 1.98
19 22.23 1.75
20 20.80 2.12
21 19.69 2.16
22 21.58 1.83
23 21.00 2.05
24 19.99 2.04
25 20.89 2.12
26 20.89 2.06
27 21.73 1.99
28 21.85 2.03
29 21.95 1.87
30 20.31 2.12

Average 21.17 1.99



• How would the result change if we could repeat the process more
times?
• How would the result change if we repeated it from scratch 1000 times, but now
using random samples of n = 100?



Conclusions

• The estimator is a random variable: with different samples we get
different results
• Unbiased means that if we repeat the random sampling enough times,
the average of those estimates will be the true parameter
• But it does not state anything about any particular sample or
estimated coefficient
• Unbiasedness is a concept linked to fairness
• Its interpretation will depend on the concept of probability that we use
• In the frequentist view, the interpretation is the value that would arise if
we repeated the experiment a sufficient number of times (LLN)
• If the concept is more subjective (Bayesian), it would be the value we
expect to see in light of our knowledge of how the true model works

