The Simple Regression Model
y = β₀ + β₁x + u
Some Terminology
In the simple linear regression model y = β₀ + β₁x + u, we typically refer to y as the Dependent Variable, or Left-Hand Side Variable, or Explained Variable, or Regressand, and to x as the Independent Variable, or Right-Hand Side Variable, or Explanatory Variable, or Regressor, or Covariate, or Control Variable
A Simple Assumption
The average value of u, the error term, in the population is 0. That is, E(u) = 0. This is not a restrictive assumption, since we can always use β₀ to normalize E(u) to 0
E(y|x) as a linear function of x, where for any x the distribution of y is centered about E(y|x)
[Figure: conditional distributions f(y) at x₁ and x₂, each centered on the population regression line E(y|x) = β₀ + β₁x]
Population regression line, sample data points and the associated error terms
[Figure: sample points (x₁, y₁) through (x₄, y₄) scattered around the population regression line E(y|x) = β₀ + β₁x, with the error terms u₁ through u₄ shown as vertical distances from the line]
Setting the two sample moment conditions to zero gives the OLS first-order conditions (all sums run from i = 1 to n):

(1/n) Σ (yᵢ − β̂₀ − β̂₁xᵢ) = 0
(1/n) Σ xᵢ(yᵢ − β̂₀ − β̂₁xᵢ) = 0

From the first condition, ȳ = β̂₀ + β̂₁x̄, or β̂₀ = ȳ − β̂₁x̄

Substituting β̂₀ into the second condition gives

Σ xᵢ(yᵢ − ȳ) = β̂₁ Σ xᵢ(xᵢ − x̄)

which can be rewritten as

Σ (xᵢ − x̄)(yᵢ − ȳ) = β̂₁ Σ (xᵢ − x̄)²

so, provided Σ (xᵢ − x̄)² > 0,

β̂₁ = Σ (xᵢ − x̄)(yᵢ − ȳ) / Σ (xᵢ − x̄)²
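As a minimal sketch (with made-up illustrative data, not from the course), the closed-form estimates above can be computed directly:

```python
# Illustrative sketch: computing beta1_hat and beta0_hat from the
# closed-form OLS formulas. The data below are invented for the example.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# beta1_hat = sum (xi - x_bar)(yi - y_bar) / sum (xi - x_bar)^2
beta1_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
            sum((x - x_bar) ** 2 for x in xs)
# beta0_hat = y_bar - beta1_hat * x_bar
beta0_hat = y_bar - beta1_hat * x_bar
print(beta1_hat, beta0_hat)
```

Note that the slope formula requires Σ(xᵢ − x̄)² > 0, i.e., some variation in x.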
More OLS
Intuitively, OLS fits a line through the sample points so that the sum of squared residuals is as small as possible, hence the term "least squares". The residual, ûᵢ, is an estimate of the error term, uᵢ, and is the difference between the observed yᵢ and the fitted value on the sample regression line
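The minimization property can be checked numerically: perturbing the OLS intercept or slope can only increase the sum of squared residuals. A small sketch with invented data:

```python
# Sketch: OLS minimizes the sum of squared residuals, so any other line
# gives a (weakly) larger SSR on the same data. Data are illustrative.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.9, 2.2, 2.8, 4.1, 5.2]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

def ssr(a, b):
    """Sum of squared residuals for the line y = a + b*x."""
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))

print(ssr(b0, b1) <= ssr(b0 + 0.1, b1))  # perturbing the intercept only hurts
print(ssr(b0, b1) <= ssr(b0, b1 + 0.1))  # perturbing the slope only hurts
```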
Sample regression line, sample data points and the associated estimated error terms
[Figure: sample points (x₁, y₁) through (x₄, y₄) scattered around the fitted line ŷ = β̂₀ + β̂₁x, with the residuals û₁ through û₄ shown as vertical distances from the line]
The residual for observation i is

ûᵢ = yᵢ − β̂₀ − β̂₁xᵢ

By construction, the OLS residuals satisfy the two first-order conditions:

Σ ûᵢ = 0
Σ xᵢûᵢ = 0

and the OLS regression line passes through the point of sample means:

ȳ = β̂₀ + β̂₁x̄
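These residual properties can be verified numerically. A minimal sketch with invented data:

```python
# Sketch: checking that OLS residuals satisfy the two first-order
# conditions, sum(u_hat) = 0 and sum(x * u_hat) = 0, up to float error.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.5, 3.1, 4.2, 6.0]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

resid = [y - b0 - b1 * x for x, y in zip(xs, ys)]
print(abs(sum(resid)) < 1e-10)                              # sum of residuals is 0
print(abs(sum(x * u for x, u in zip(xs, resid))) < 1e-10)   # sum of x*u_hat is 0
```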
More Terminology
We can think of each observation as being made up of an explained part and an unexplained part, yᵢ = ŷᵢ + ûᵢ. We then define the following:

Σ (yᵢ − ȳ)² is the total sum of squares (SST)
Σ (ŷᵢ − ȳ)² is the explained sum of squares (SSE)
Σ ûᵢ² is the residual sum of squares (SSR)

Then SST = SSE + SSR.
Goodness-of-Fit
How do we think about how well our sample regression line fits our sample data? We can compute the fraction of the total sum of squares (SST) that is explained by the model; call this the R-squared of the regression: R² = SSE/SST = 1 − SSR/SST
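The decomposition SST = SSE + SSR, and hence the equivalence of the two R² expressions, can be checked directly. A sketch with invented data:

```python
# Sketch: computing SST, SSE, SSR for a fitted simple regression and
# verifying R^2 = SSE/SST = 1 - SSR/SST. Data are illustrative.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in xs]

sst = sum((y - y_bar) ** 2 for y in ys)                  # total sum of squares
sse = sum((f - y_bar) ** 2 for f in fitted)              # explained sum of squares
ssr = sum((y - f) ** 2 for y, f in zip(ys, fitted))      # residual sum of squares

print(abs(sst - (sse + ssr)) < 1e-10)    # SST = SSE + SSR
print(sse / sst, 1 - ssr / sst)          # two equivalent R^2 expressions
```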
Unbiasedness of OLS
Assume the population model is linear in parameters: y = β₀ + β₁x + u. Assume we can use a random sample of size n, {(xᵢ, yᵢ): i = 1, 2, …, n}, from the population model, so that we can write the sample model as yᵢ = β₀ + β₁xᵢ + uᵢ. Assume E(u|x) = 0, and thus E(uᵢ|xᵢ) = 0. Finally, assume there is variation in the xᵢ
Write the OLS slope estimator as

β̂₁ = Σ (xᵢ − x̄)yᵢ / s²ₓ, where s²ₓ = Σ (xᵢ − x̄)²

Substituting yᵢ = β₀ + β₁xᵢ + uᵢ into the numerator:

Σ (xᵢ − x̄)yᵢ = Σ (xᵢ − x̄)(β₀ + β₁xᵢ + uᵢ)
= β₀ Σ (xᵢ − x̄) + β₁ Σ (xᵢ − x̄)xᵢ + Σ (xᵢ − x̄)uᵢ

Since Σ (xᵢ − x̄) = 0 and Σ (xᵢ − x̄)xᵢ = Σ (xᵢ − x̄)² = s²ₓ, this simplifies to

Σ (xᵢ − x̄)yᵢ = β₁ s²ₓ + Σ (xᵢ − x̄)uᵢ

so that

β̂₁ = β₁ + Σ (xᵢ − x̄)uᵢ / s²ₓ

Taking expectations conditional on the xᵢ and using E(uᵢ|xᵢ) = 0 gives E(β̂₁) = β₁.
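Unbiasedness can be illustrated by simulation: drawing many samples from a population that satisfies the assumptions, the slope estimates should average out to the true β₁. A sketch with invented population parameters (β₀ = 1, β₁ = 2, standard normal errors):

```python
# Sketch: Monte Carlo check of unbiasedness. With y = 1 + 2x + u and
# E(u|x) = 0, the average of beta1_hat over many samples should be near 2.
import random

random.seed(0)
beta0, beta1 = 1.0, 2.0   # true (invented) population parameters
estimates = []
for _ in range(2000):
    xs = [random.uniform(0, 10) for _ in range(50)]
    ys = [beta0 + beta1 * x + random.gauss(0, 1) for x in xs]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
         sum((x - x_bar) ** 2 for x in xs)
    estimates.append(b1)

print(sum(estimates) / len(estimates))  # close to beta1 = 2
```

Individual estimates vary from sample to sample; unbiasedness only says their distribution is centered at the truth.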
Unbiasedness Summary
The OLS estimates of β₁ and β₀ are unbiased. The proof of unbiasedness depends on our four assumptions: if any assumption fails, then OLS is not necessarily unbiased. Remember, unbiasedness is a description of the estimator; in a given sample we may be near or far from the true parameter
Homoskedastic Case
[Figure: conditional densities f(y|x) at x₁ and x₂ with identical spread around E(y|x) = β₀ + β₁x]
Heteroskedastic Case
[Figure: conditional densities f(y|x) at x₁, x₂, and x₃ whose spread around E(y|x) = β₀ + β₁x increases with x]
Under homoskedasticity, Var(uᵢ|xᵢ) = σ². Let dᵢ = xᵢ − x̄ and s²ₓ = Σ (xᵢ − x̄)², so that β̂₁ = β₁ + (1/s²ₓ) Σ dᵢuᵢ. Then

Var(β̂₁) = (1/s²ₓ)² Σ dᵢ² Var(uᵢ)
= (1/s²ₓ)² σ² Σ dᵢ²
= (1/s²ₓ)² σ² s²ₓ
= σ² / s²ₓ

The standard error of the slope estimate is

se(β̂₁) = σ̂ / √(Σ (xᵢ − x̄)²)

where σ̂ is an estimate of σ obtained from the residuals.
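A minimal sketch of computing the standard error, assuming the usual unbiased error-variance estimator σ̂² = SSR/(n − 2) (standard for simple regression, though not spelled out in the slides above), with invented data:

```python
# Sketch: se(beta1_hat) = sigma_hat / sqrt(sum (xi - x_bar)^2),
# using sigma_hat^2 = SSR / (n - 2). Data are illustrative.
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 2.3, 2.8, 4.2, 4.9, 6.3]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sx2 = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sx2
b0 = y_bar - b1 * x_bar

ssr = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
sigma2_hat = ssr / (n - 2)                # estimate of the error variance
se_b1 = math.sqrt(sigma2_hat / sx2)       # standard error of the slope
print(se_b1)
```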