TOPIC 2
ECONOMETRICS I
Yi = β0 + β1Xi + Ui, where i = 1, …, n
VARIABLES:
Y is the dependent variable.
β0 is the intercept parameter (the value of Y when X = 0). Sometimes it has a meaningful interpretation, and in other cases it just acts as the level (height) of the regression line.
β1 is the causal effect of X on Y: the parameter that gives the change in Y for a unit change in X, holding other factors constant. In the linear model the change in Y is the same for all changes in X, no matter what the initial level of X.
Ui are unobserved factors that influence Y other than the variable X. Ui is sometimes also called the “error term” or “residual”.
In the figure, X1, X2, X3, and X4 are four hypothetical values of the explanatory variable. If the relationship between Y and X were exact, the corresponding values of Y would be represented by the points Q1–Q4 on the line. In the diagram, the disturbance term has been assumed to be positive in the first and fourth observations and negative in the other two, with the result that, if one plots the actual values of Y against the values of X, one obtains the points P1–P4.
It must be emphasized that in practice the P points are all one can see. The actual values of β0 and β1, and hence the location of the Q points, are unknown, as are the values of the disturbance term in the observations.
In our example:
How can we find the value of these betas? Using the OLS estimators.
OLS ESTIMATORS = the population regression line can be estimated using sample observations by ordinary least squares (OLS). The OLS estimators of the regression intercept and slope are denoted β̂0 and β̂1.
Laura Aparicio 14
Suppose that you are given the four observations on X and Y represented in the previous figure and you are asked to obtain estimates of the values of β0 and β1. As a rough approximation, you could do this by plotting the four P points and drawing a line to fit them as best you can.
The fitted line will be written as: Ŷi = β̂0 + β̂1Xi
Drawing a regression line by eye is all very well, but it leaves a lot to subjective judgment. The question arises, is there a way of
calculating good estimates of β0 and β1 algebraically?
[1] Define what is known as a residual for each observation: the difference between the actual value of Y in any observation and
the fitted value given by the regression line.
[2] Express the residual in terms of the estimated coefficients:

ûi = Yi − Ŷi

ûi = Yi − β̂0 − β̂1Xi
[3] Hence the residual in each observation depends on our choice of β̂0 and β̂1. Obviously, we wish to fit the regression line, that is, choose β̂0 and β̂1, in such a way as to make the residuals as SMALL AS POSSIBLE. We need to devise a criterion of fit that takes account of the size of all the residuals simultaneously. There are a number of possible criteria, some of which work better than others. The standard solution is to minimize RSS, the sum of the squares of the residuals.
The smaller one can make RSS, the better is the fit, according to this criterion. If one could reduce RSS to 0, one would have
a perfect fit, for this would imply that all the residuals are equal to 0. The line would go through all the points, but of course
in general the disturbance term makes this impossible.
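To make the criterion concrete, here is a minimal sketch with four made-up observations (the numbers are illustrative, not from the text): each candidate line gets an RSS, and the candidate with the smaller RSS fits better.

```python
import numpy as np

# Hypothetical data: four (X, Y) observations, like the P points in the figure.
X = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.array([2.1, 3.9, 6.2, 7.8])

def rss(b0, b1):
    """Residual sum of squares for a candidate line Y-hat = b0 + b1 * X."""
    residuals = Y - (b0 + b1 * X)
    return np.sum(residuals ** 2)

# Two lines drawn "by eye": according to the criterion, the smaller RSS wins.
print(rss(0.0, 2.0))    # candidate line 1
print(rss(0.1, 1.95))   # candidate line 2, a slightly better fit here
```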
WHY USE OLS, RATHER THAN SOME OTHER ESTIMATOR? The OLS estimator has some desirable properties: under certain assumptions, it is unbiased (that is, E(β̂1) = β1), and it has a tighter sampling distribution than some other candidate estimators of β1. Importantly, it is also what everyone uses.
Let’s now consider the GENERAL CASE where there are n observations on two variables X and Y and, supposing Y to depend on X, we will fit the equation:
Ŷi = β̂0 + β̂1Xi

RSS = û1² + ⋯ + ûn² = Σi ûi²
β̂1 = cov(X, Y) / Var(X) = [(1/n) Σi (Xi − X̄)(Yi − Ȳ)] / [(1/n) Σi (Xi − X̄)²] = Σi (Xi − X̄)(Yi − Ȳ) / Σi (Xi − X̄)²

β̂0 = Ȳ − β̂1X̄
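These closed-form formulas can be applied directly. A short sketch with made-up sample values (the data are hypothetical):

```python
import numpy as np

# Hypothetical sample of five observations.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

x_dev = X - X.mean()
y_dev = Y - Y.mean()

beta1_hat = np.sum(x_dev * y_dev) / np.sum(x_dev ** 2)   # cov(X, Y) / Var(X)
beta0_hat = Y.mean() - beta1_hat * X.mean()              # line passes through the means

print(beta1_hat, beta0_hat)

# Sanity check: np.polyfit minimizes the same RSS criterion.
assert np.allclose(np.polyfit(X, Y, 1), [beta1_hat, beta0_hat])
```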
MAIN CONCEPTS
OBJECTIVE: The task of regression analysis is to obtain estimates of β0 and β1, and hence an estimate of the location of the line, given the P points. Moreover, if you were concerned only with measuring the effect of X on Y, it would be much more convenient if the disturbance term did not exist. But in fact, part of each change in Y is due to a change in u, and this makes life more difficult (u is sometimes described as noise).
INTERPRETATION
There are two stages in the interpretation of a regression equation: first, turn the equation into words so that it can be understood by a non-econometrician; second, decide whether this literal interpretation should be taken at face value or whether the relationship should be investigated further.
EXAMPLE:
SLOPE: It indicates that, as S increases by one unit (of S), EARNINGS increases by 1.07 units (of EARNINGS). Since S is measured
in years, and EARNINGS is measured in dollars per hour, the coefficient of S implies that hourly earnings increase by $1.07 for
every extra year of schooling.
INTERCEPT: In this case a literal interpretation of the constant would lead to the nonsensical conclusion that an individual with no
schooling would have hourly earnings of –$1.39. In this data set, no individual had less than six years of schooling and only
three failed to complete elementary school, so it is not surprising that extrapolation to 0 leads to trouble.
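Using the fitted coefficients quoted above (intercept −1.39, slope 1.07), the interpretation can be checked mechanically. The function below is just the fitted line from the example, not part of the original text:

```python
# Fitted line from the example: EARNINGS-hat = -1.39 + 1.07 * S
def predicted_earnings(s_years):
    """Predicted hourly earnings (dollars) for S years of schooling."""
    return -1.39 + 1.07 * s_years

# One extra year of schooling raises predicted hourly earnings by $1.07:
print(predicted_earnings(13) - predicted_earnings(12))

# Extrapolating to S = 0 reproduces the nonsensical -$1.39 discussed above:
print(predicted_earnings(0))
```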
MEASURES OF FIT = A natural question is how well the regression line “fits” or explains the data. There are two regression statistics
that provide complementary measures of the quality of fit:
Regression R2: measures the fraction of the variance of Y that is explained by X. It’s unit-less and ranges between 0 (no fit)
and 1 (perfect fit).
Standard error of the regression (SER): is an estimator of the standard deviation of the regression error.
REGRESSION (R2)
Decompose each observation into its fitted value and residual:

Yi = Ŷi + ûi

Taking variances of both sides, and using the fact that Cov(Ŷ, û) must be equal to 0, we obtain Var(Y) = Var(Ŷ) + Var(û).

In view of this, Var(Ŷ)/Var(Y) is the proportion of the variance explained by the regression line. This proportion is known as the coefficient of determination or, more usually, R²:

R² = Var(Ŷ) / Var(Y)
The maximum value of R² is 1. This occurs when the regression line fits the observations exactly, so that Ŷi = Yi in all observations and all the residuals are 0. Then Var(Ŷ) = Var(Y) and Var(û) = 0, and one has a perfect fit. If there is no apparent relationship between the values of Y and X in the sample, R² will be close to 0.
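A quick numerical check of the definition, on a made-up sample (the data are illustrative): R² computed as Var(Ŷ)/Var(Y) coincides with ESS/TSS.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X

r_squared = np.var(Y_hat) / np.var(Y)   # Var(Y-hat) / Var(Y)
print(r_squared)

# Equivalent computation: R2 = ESS / TSS
ess = np.sum((Y_hat - Y.mean()) ** 2)
tss = np.sum((Y - Y.mean()) ** 2)
assert np.isclose(r_squared, ess / tss)
```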
The standard error of the regression (SER) is (almost) the sample standard deviation of the OLS residuals:

SER = √(RSS / (n − 2)) = √((1/(n − 2)) Σi ûi²)

The root mean squared error (RMSE) is closely related to the SER; it divides by n instead of n − 2:

RMSE = √((1/n) Σi ûi²)
IMPORTANT! A low R2 and large SER do NOT imply that our regression is either “good” or “bad”. What they tell us is that other
important factors influence Y. Moreover, they do NOT tell us what these factors are, but they do indicate that X alone explains only
a small part of the variation in Y in these data.
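A sketch of both fit measures on a made-up sample (hypothetical numbers), showing that the only difference between SER and RMSE is the divisor:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])
n = len(Y)

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
residuals = Y - (b0 + b1 * X)

rss = np.sum(residuals ** 2)
ser = np.sqrt(rss / (n - 2))   # divisor n - 2: two coefficients were estimated
rmse = np.sqrt(rss / n)        # divisor n: SER and RMSE are close when n is large
print(ser, rmse)
```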
INTERCEPT: The intercept (taken literally) means that, according to this estimated line, districts with zero students per teacher would have a (predicted) test score of 698.9. This interpretation makes no sense (it extrapolates the line outside the range of the data); in this application, the intercept is not itself economically meaningful.
So far we have seen how to estimate the slope of the population regression function using the OLS estimator. But under what conditions can we interpret β̂1 in a causal way? And how should we interpret β̂1 if this condition fails? Moreover, the OLS regression line is an estimate, computed using our sample of data; a different sample would have given a different value of β̂1. How can we:

quantify the sampling uncertainty associated with β̂1?

use β̂1 to test hypotheses such as β1 = 0?

construct a confidence interval for β1?
KEY ASSUMPTIONS OF THE MODEL = OLS provides an appropriate estimator of the unknown regression coefficients, β0 and β1, under these three assumptions:

[1] The conditional distribution of Ui given Xi has a mean of zero: E(Ui|Xi) = 0.

[2] (Xi, Yi), i = 1, …, n, are independently and identically distributed (i.i.d.) draws from their joint distribution.

[3] Large outliers are unlikely: Xi and Yi have nonzero finite fourth moments.

ASSUMPTION #1: It means that the “other factors” contained in Ui are unrelated to Xi, in the sense that, given a value of Xi, the mean of the distribution of these other factors is zero. This assumption is illustrated in Figure 4.4:
As shown in Figure 4.4, the assumption that 𝐸(𝑢𝑖 |𝑋𝑖 ) = 0 is equivalent to assuming that the population regression line is the
conditional mean of Yi given Xi.
E(ui|X = 1) = E(ui|X = 2) = ⋯ : changes in X (class size) should never have an impact on Ui.

In a randomized controlled experiment, subjects are randomly assigned to the treatment group (X = 1) or to the control group (X = 0). The random assignment typically is done using a computer program that uses no information about the subject, so that X is as if randomly assigned, in the precise sense that E(ui|Xi) = 0. Whether this assumption holds in a given observational setting must be judged case by case.

Note that if the conditional mean of one random variable given another is zero, then the two random variables have zero covariance and are uncorrelated.
ASSUMPTION #2: This assumption is a statement about how the sample is drawn. It holds automatically if the entity (individual, district) is sampled by simple random sampling: the entity is selected and then, for that entity, X and Y are observed (recorded).
[1] The main place we will encounter non-i.i.d. sampling is when data are recorded over time (“time series data”). This will
introduce some extra complications.
a. Example: data on inventory levels (Y) at a firm and the interest rate at which the firm can borrow (X), where these
data are collected over time from a specific firm (four times per year during 30 years). A key feature of time series
data is that observations falling close to each other in time are not independent but rather tend to be correlated
with each other; if interest rates are low now, they are likely to be low next quarter. This pattern of correlation
violates the “independence” part of the i.i.d. assumption.
[2] Another instance of non-i.i.d. sampling is when observations belonging to a group or cluster have unobservable variables in
common.
Large outliers (that is, observations with values of Xi, Yi or both that are far
outside the usual range of the data) are unlikely. Large outliers can make OLS
regression results misleading. This potential sensitivity of OLS to extreme outliers
is illustrated in the following figure:
Formally, the assumption is that X and Y have nonzero finite fourth moments: 0 < E(Xi⁴) < ∞ and 0 < E(Yi⁴) < ∞. In practice it means that our variables take bounded values (example: class size is capped by the physical capacity of a classroom; the best you can do on a standardized test is to get all the questions right and the worst you can do is to get all the questions wrong).
In conclusion, if the assumption of finite fourth moments holds, then it is unlikely that statistical inferences using OLS will be dominated
by a few observations.
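A small simulation illustrates the sensitivity to outliers that this assumption rules out; the data, the seed, and the outlier value are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.arange(1.0, 21.0)                          # 20 "clean" observations
Y = 1.0 + 0.5 * X + rng.normal(0, 0.5, size=20)   # true slope is 0.5

def ols_slope(x, y):
    return np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

slope_clean = ols_slope(X, Y)

# One data-entry error producing a single huge Y value:
X_bad = np.append(X, 20.0)
Y_bad = np.append(Y, 200.0)
slope_bad = ols_slope(X_bad, Y_bad)

print(slope_clean, slope_bad)   # the single outlier drags the estimate far from 0.5
```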
[1] FIRST ROLE: If these assumptions hold, then, as is shown in the next section, in large samples the OLS estimators have sampling distributions that are normal, which allows us to develop methods for hypothesis testing and to construct confidence intervals.
[2] SECOND ROLE: It allows us to organize the circumstances that pose difficulties for OLS regression:
a. Assumption #1: It is the most important to consider in practice, because in many applications it may not hold.
b. Assumption #2: Although it holds in many datasets, the independence assumption is inappropriate for time series data. Therefore, in these cases we will need to modify the methods used.
c. Assumption #3: If your dataset contains large outliers, you should examine those outliers carefully to make
sure those observations are correctly recorded and belong in the data set (there can be data entry errors like
height in meters or centimetres).
Remember that β̂0 and β̂1 are the OLS estimators of the unknown intercept β0 and slope β1 of the population regression line. Because the OLS estimators are calculated using a random sample, β̂0 and β̂1 are random variables that take on different values from one sample to the next; the probability of these different values is summarized in their sampling distributions. Under the three Least Squares Assumptions, and when the sample is LARGE:

CONSISTENT: β̂1 →p β1; in other words, these estimators are consistent (when the sample is large, our estimators will be near the true population coefficients) (LLN).

NORMALLY DISTRIBUTED: (β̂1 − E(β̂1)) / √Var(β̂1) is approximately distributed as N(0, 1) if the sample is sufficiently large, even if the errors themselves are not normally distributed (CLT).
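These two properties can be illustrated with a Monte Carlo sketch (the population values, sample size, and seed are all invented): drawing many samples and re-estimating β̂1 each time traces out its sampling distribution.

```python
import numpy as np

rng = np.random.default_rng(42)
beta0, beta1, n = 2.0, 0.7, 500   # hypothetical population values

slopes = []
for _ in range(2000):                       # 2000 independent samples
    X = rng.normal(10, 2, size=n)
    U = rng.normal(0, 1, size=n)            # E(U|X) = 0 by construction
    Y = beta0 + beta1 * X + U
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    slopes.append(b1)

slopes = np.array(slopes)
print(slopes.mean())   # centred on the true beta1 = 0.7 (unbiasedness/consistency)
print(slopes.std())    # small at n = 500, and it shrinks further as n grows
```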
PROPERTY: Unbiasedness of β̂1, and Var(β̂1) is inversely proportional to n

BEFORE: the estimator was expressed in terms of X and Y.

NOW: the estimator is expressed in terms of X and U, through Cov(X, U). If assumption #1 holds, then Cov(X, U) = 0, so on average the estimator recovers the truth.
PROPERTY: Consistency of β̂1: β̂1 →p β1

By the law of large numbers, the sample moments converge in probability to their population counterparts:

X̄ →p μX

s²X →p σ²X

sXY →p σXY

The LLN and CLT allow us to connect sample estimators with the population parameters (fixed values) that they estimate.
PROPERTY: Approximation of the sampling distribution of β̂1 by a normal distribution when n is large
CONCLUSION
Until now we have focused on the use of ordinary least squares to estimate the intercept and slope of a population regression line
using a sample of n observations on a dependent variable, Y, and a single regressor, X.
There are many ways to draw a straight line through a scatterplot, but doing so using OLS has several virtues. If the least squares assumptions hold, then the OLS estimators of the slope and the intercept are unbiased and consistent and have sampling distributions with a variance that is inversely proportional to the sample size n. Moreover, if n is large, then the sampling distribution of the OLS estimator is normal.
The results we’ve obtained describe the sampling distribution of the OLS estimator. By themselves, however, these results are not sufficient to test a hypothesis about the value of β1 or to construct a confidence interval for β1. Doing so requires an estimator of the standard deviation of the sampling distribution (that is, the standard error of the OLS estimator), which is what we will develop in the next sections.
(4.32)

(1/n) Σi ûi = 0   The SAMPLE AVERAGE of the OLS residuals is zero.

Start from the fitted equation and isolate the residual:

Yi = β̂0 + β̂1Xi + ûi

ûi = Yi − (β̂0 + β̂1Xi)

Substitute β̂0 = Ȳ − β̂1X̄:

ûi = Yi − (Ȳ − β̂1X̄ + β̂1Xi) = (Yi − Ȳ) − β̂1(Xi − X̄)

Sum over the n observations:

Σi ûi = Σi (Yi − Ȳ) − β̂1 Σi (Xi − X̄) = (Σi Yi − nȲ) − β̂1(Σi Xi − nX̄)

Divide by n:

(1/n) Σi ûi = ((1/n) Σi Yi − Ȳ) − β̂1((1/n) Σi Xi − X̄) = 0 − β̂1 · 0 = 0
(4.33)

(1/n) Σi Ŷi = Ȳ   The SAMPLE AVERAGE of the OLS predicted values equals Ȳ.

Yi = β̂0 + β̂1Xi + ûi = Ŷi + ûi

Sum over the n observations:

Σi Yi = Σi Ŷi + Σi ûi = Σi Ŷi   (since Σi ûi = 0 by (4.32))

Divide by n:

Ȳ = (1/n) Σi Yi = (1/n) Σi Ŷi
(4.34)

Σi ûiXi = 0   The SAMPLE COVARIANCE between the OLS residuals and the regressor is zero.

[1] Since Σi ûi = 0, we can write Σi ûiXi = Σi ûi(Xi − X̄).

[2] Substitute ûi = (Yi − Ȳ) − β̂1(Xi − X̄):

Σi [(Yi − Ȳ) − β̂1(Xi − X̄)](Xi − X̄) = Σi (Yi − Ȳ)(Xi − X̄) − β̂1 Σi (Xi − X̄)²

[3] Substitute β̂1 = Σi (Xi − X̄)(Yi − Ȳ) / Σi (Xi − X̄)²:

Σi (Yi − Ȳ)(Xi − X̄) − [Σi (Xi − X̄)(Yi − Ȳ) / Σi (Xi − X̄)²] · Σi (Xi − X̄)² = Σi (Yi − Ȳ)(Xi − X̄) − Σi (Xi − X̄)(Yi − Ȳ) = 0
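Properties (4.32)-(4.34) can be verified numerically on any sample; here is a sketch with made-up data:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X
u_hat = Y - Y_hat

assert np.isclose(u_hat.mean(), 0)          # (4.32) residuals average to zero
assert np.isclose(Y_hat.mean(), Y.mean())   # (4.33) fitted values average to Y-bar
assert np.isclose(np.sum(u_hat * X), 0)     # (4.34) residuals uncorrelated with X
print("all three OLS residual properties hold")
```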
(4.35)

TSS = ESS + RSS

TSS = Σi (Yi − Ȳ)²

TSS = Σi (Yi − Ŷi + Ŷi − Ȳ)² = Σi (Yi − Ŷi)² + Σi (Ŷi − Ȳ)² + 2 Σi (Yi − Ŷi)(Ŷi − Ȳ)

The first term is RSS and the second is ESS; the cross term equals zero by (4.32) and (4.34). Hence TSS = RSS + ESS.
Hypothesis testing for regression coefficients is analogous to hypothesis testing for the population mean: use the t-statistic to calculate the p-value and either reject or fail to reject the null hypothesis. Like a confidence interval for the population mean, a 95% confidence interval for a regression coefficient is computed as the estimator ±1.96 standard errors.
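A minimal sketch of the computation, with a hypothetical slope estimate and standard error (the numbers are invented):

```python
# Hypothetical regression output: slope estimate and its standard error.
beta1_hat, se_beta1 = 1.07, 0.25

# t-statistic for H0: beta1 = 0
t_stat = (beta1_hat - 0) / se_beta1

# 95% confidence interval: estimator +/- 1.96 standard errors
ci = (beta1_hat - 1.96 * se_beta1, beta1_hat + 1.96 * se_beta1)

print(t_stat)   # |t| > 1.96, so reject H0 at the 5% level
print(ci)       # 0 is not inside the interval: the same conclusion
```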
HYPOTHESIS TESTING
Looking at the confidence interval: if 0 (the value under the null hypothesis) is not included, then we reject. At the same time, the confidence interval can be defined as the set of values that cannot be rejected using a two-sided hypothesis test with a 5% significance level.
RECALL!! When X is binary, the regression model can be used to estimate and test hypotheses about the difference between the
population means of the “X=0” and the “X=1” group.
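A quick check of this equivalence on made-up data: regressing Y on a binary X reproduces the difference between the two group means.

```python
import numpy as np

# Hypothetical outcomes for a binary regressor (X = 0 control, X = 1 treatment)
X = np.array([0, 0, 0, 1, 1, 1], dtype=float)
Y = np.array([4.0, 5.0, 6.0, 8.0, 9.0, 10.0])

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()

diff_in_means = Y[X == 1].mean() - Y[X == 0].mean()
assert np.isclose(b1, diff_in_means)      # slope = difference of group means
assert np.isclose(b0, Y[X == 0].mean())   # intercept = mean of the X = 0 group
print(b1, b0)
```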
Our only assumption about the distribution of Ui conditional on Xi is that it has a mean of zero (the first least squares assumption). If, furthermore, the variance of this conditional distribution does NOT depend on Xi, then the errors are said to be homoskedastic.
We’re going to discuss:
Var(X): you calculate the variance of years of schooling across the whole population.
Var(X|X = 12) = 0: we are fixing X, so the variance of X is equal to 0.
Therefore, Var(Y|X) = Var(β0 + β1X + U|X) = Var(β0|X) + Var(β1X|X) + Var(U|X) = 0 + 0 + Var(U|X) = Var(U|X)
HOMOSKEDASTICITY: If Var(U|X = x) is constant, that is, the variance of the conditional distribution of U given X does NOT depend on X, then U is said to be homoskedastic.
HETEROSKEDASTICITY: If Var(U|X = x) is NOT constant, that is, the variance of the conditional distribution of U given X depends on X, then U is said to be heteroskedastic.
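A simulated example (all numbers invented) makes the distinction concrete: the errors below are built so that Var(U|X) grows with X, and the group-wise sample variances show it.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
X = rng.integers(8, 17, size=n).astype(float)   # years of schooling, 8..16

# Heteroskedastic errors: the spread of U grows with X by construction.
U = rng.normal(0, 1, size=n) * (0.5 * X)
Y = 1.0 + 1.07 * X + U

# Estimated Var(U | X = x) for two values of x clearly depends on x:
print(np.var(U[X == 8]))    # roughly (0.5 * 8)^2  = 16
print(np.var(U[X == 16]))   # roughly (0.5 * 16)^2 = 64
```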
If you attend school for fewer than ten years, your hourly wage will be roughly in the $0–$20 range.
Heteroskedasticity and homoskedasticity concern only the variance of the error term; in other words, they affect the standard error (SE) and all the values calculated using the SE (t-statistics, confidence intervals, …).
HETEROSKEDASTICITY: the conditional variance of U changes with X (for example, Var = 15 when X = 1 but Var = 25 when X = 2).
HOMOSKEDASTICITY: the conditional variance of U is the same for every value of X (for example, Var = 15 both when X = 1 and when X = 2).
Note that when U and X are independent, we can express Var(v) = Var(X) · Var(U).
The two formulas coincide (when n is large) in the special case of homoskedasticity.
The bottom line: you should ALWAYS use the heteroskedasticity-robust formulas; these are conventionally called heteroskedasticity-robust standard errors.
MAIN IDEA! In general, the error ui is heteroskedastic (that is, the variance of ui at a given value of Xi, 𝑣𝑎𝑟(𝑢𝑖 |𝑋𝑖 = 𝑥), depends on x).
A special case is when the error is homoskedastic (that is 𝑣𝑎𝑟(𝑢𝑖 |𝑋𝑖 = 𝑥) is constant). Homoskedasticity-only standard errors do NOT
produce valid statistical inferences when the errors are heteroskedastic, but heteroskedasticity-robust standard errors do.
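A sketch of the two formulas on simulated heteroskedastic data (the data-generating process and seed are invented). The robust variance estimator used here is the Eicker-Huber-White HC0 form, Var̂(β̂1) = Σ(Xi − X̄)²ûi² / (Σ(Xi − X̄)²)²:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000
X = rng.normal(10, 2, size=n)
U = rng.normal(0, 1, size=n) * (X - 10) ** 2   # heteroskedastic: spread fans out
Y = 2.0 + 0.5 * X + U

x_dev = X - X.mean()
b1 = np.sum(x_dev * (Y - Y.mean())) / np.sum(x_dev ** 2)
b0 = Y.mean() - b1 * X.mean()
u_hat = Y - b0 - b1 * X

# Homoskedasticity-only SE: valid only if Var(U|X) is constant
se_homo = np.sqrt(np.sum(u_hat ** 2) / (n - 2) / np.sum(x_dev ** 2))

# Heteroskedasticity-robust (Eicker-Huber-White, HC0) SE
se_robust = np.sqrt(np.sum(x_dev ** 2 * u_hat ** 2) / np.sum(x_dev ** 2) ** 2)

print(se_homo, se_robust)   # the robust SE is markedly larger on these data
```

In applied work one would normally rely on a library implementation (e.g. a robust covariance option in a regression package) rather than hand-rolling this formula.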
EXTRA! If the three least squares assumptions hold AND the regression errors are homoskedastic, then the OLS estimator is BLUE.
Moreover, if the three least squares assumptions hold, if the regression errors are homoskedastic AND if the regression errors are normally
distributed, then the OLS t-statistic computed using homoskedasticity-only standard errors has a Student t distribution when the null
hypothesis is true. The difference between the Student t distribution and normal distribution is negligible if the sample size is moderate
or large.
CONCLUSION
Returning to the California test score data set, there is a negative relationship between the student-teacher ratio and test scores,
but is this relationship necessarily a causal one? Districts with a lower STR have, on average, higher test scores. But does this mean
that reducing the STR will, in fact, increase scores?
There is, in fact, reason to worry that it might not. Hiring more teachers, after all, costs money, so wealthier school districts can
better afford small classes. Moreover, students at wealthier schools also have other advantages over their poorer neighbours,
including better facilities, newer books, and better-paid teachers.
What’s more, California has a large immigrant community; these immigrants tend to be poorer than the overall population, and, in
many cases, their children are not native English speakers. It thus might be that our negative estimated relationship between test
scores and the STR is a consequence of large classes being found in conjunction with many other factors that are, in fact, the
real cause of the lower test scores.
These other factors or “omitted variables”, could mean that the OLS analysis done so far has little value. Indeed, it could be
misleading: changing the STR alone would not change these other factors that determine a child’s performance at school. To
address this problem, we need a method that will allow us to isolate the effect on test scores of changing the STR, holding these
other factors constant. The method is MULTIPLE REGRESSION ANALYSIS.