Chapter 3

Multiple Regression

3.1. Definitions
3.1.1 Multiple Regression Model
3.1.2 Population regression function
3.1.3 Sample regression function

3.2. OLS Estimator in multiple regression model

3.2.1 Ordinary least squares estimators
3.2.2 Assumptions of multiple regression
3.2.3 Unbiased and efficient properties
3.3. Measure of fit

3.1 Definitions
3.1.1 Multiple regression model
Review of Chapter 2:
- The error u arises because of factors, or
variables, that influence Y but are not included
in the regression function.
- Key assumption 3, that all other factors
affecting y are uncorrelated with x, is often
unrealistic. This makes it difficult to draw ceteris
paribus conclusions about how x affects y.
The multiple regression model (MRM):
• is more amenable to ceteris paribus analysis, because
it explicitly controls for other factors that affect y;
• lets us add more factors to the model, so that more of the
variation in y can be explained, giving a better model
for predicting the dependent variable.

Example: compare two models
$$\text{consum}_i = \beta_1 + \beta_2\,\text{income}_i + \beta_3\,\text{asset}_i + u_i \quad (1)$$
$$\text{consum}_i = \beta_1 + \beta_2\,\text{income}_i + u_i \quad (2)$$

We know that the estimated slope on income in the simple regression (2)
does not usually equal the estimated slope on income in the multiple
regression (1). There are two distinct cases where the two estimates are identical:

1. The partial effect of asset on consum is zero in the sample, that is,
$\hat{\beta}_3 = 0$ in model (1).

2. income and asset are uncorrelated in the sample.
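Both cases follow from a standard algebraic identity that links the two slope estimates on income. A short sketch in LaTeX; the tilde notation and the symbol $\tilde{\delta}$ are introduced here for illustration and do not appear in the slides:

```latex
\tilde{\beta}_2 = \hat{\beta}_2 + \hat{\beta}_3\,\tilde{\delta}
```

Here $\tilde{\beta}_2$ is the slope on income from the simple regression (2), $\hat{\beta}_2$ and $\hat{\beta}_3$ are the estimates from the multiple regression (1), and $\tilde{\delta}$ is the slope from a simple regression of asset on income. The two slopes on income coincide exactly when $\hat{\beta}_3 = 0$ (case 1) or $\tilde{\delta} = 0$, which happens when income and asset are uncorrelated in the sample (case 2).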

The MRM can incorporate fairly general functional form
relationships, for example a quadratic in income:
$$\text{consum}_i = \beta_1 + \beta_2\,\text{income}_i + \beta_3\,\text{income}_i^2 + u_i$$

The MRM is the most widely used vehicle for empirical
analysis in economics and other social sciences.

Example: wage = f(educ, exper)

3.1.2 Population regression function
$$Y_i = \beta_1 + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + u_i$$
• Y = the single dependent variable (criterion)
• X = two or more independent variables (predictors)
• $u_i$ = the stochastic disturbance term
• Sample size: as a rule of thumb, n >= 50 (at least 10 times as many cases
as independent variables)
• $\beta_1$ is the intercept
• $\beta_k$ measures the change in Y with respect to $X_k$,
holding other factors fixed.
3.1.3 The Sample Regression Function (SRF)
• Population regression function:
$$E(Y \mid X_i) = f(X_i) = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i}$$
• Sample regression function:
$$\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_{2i} + \hat{\beta}_3 X_{3i}$$
• $\hat{Y}_i$ = estimator of $E(Y \mid X_i)$
• $\hat{\beta}_1$ = estimator of $\beta_1$
• $\hat{\beta}_2$ = estimator of $\beta_2$
• $\hat{\beta}_3$ = estimator of $\beta_3$

• An estimator, also known as a (sample) statistic, is simply a rule or
formula or method that tells how to estimate the population
parameter from the information provided by the sample.
• A particular numerical value obtained by the estimator in an
application is known as an estimate.

Example: Multiple regression function
• Problem 3.2 (suppose that there are only two
independent variables in the MRM): A labor
economist would like to examine the effects of
job training on worker productivity. In this
case, there is little need for formal economic
theory. Basic economic understanding is
sufficient for realizing that factors such as
education and experience affect worker productivity.
Also, economists are well aware that workers are
paid commensurate with their productivity.
→ Model: wage = f(educ, exper)
wage = hourly wage
educ = years of formal education
exper = years of workforce experience
$$\text{wage} = \beta_1 + \beta_2\,\text{educ} + \beta_3\,\text{exper} + u$$

3.1.3 The Sample Regression Function (SRF)

In practice we observe only a sample of Y values corresponding to some
fixed X's. Can we estimate the PRF from the sample data?
If we have data for Problem 3.1, we can write the SRF:
$$\text{wage}_i = \hat{\beta}_1 + \hat{\beta}_2\,\text{educ}_i + \hat{\beta}_3\,\text{exper}_i + \hat{u}_i$$

3.2. The OLS estimator in the multiple regression model
• 3.2.1 Ordinary least squares estimators
• 3.2.2 Assumptions of the MRM
• 3.2.3 Unbiased and efficient properties

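3.2.1 OLS Estimators

Consider the three-variable model.

• To find the OLS estimators, let us first write the sample
regression function (SRF) as follows:
$$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_{2i} + \hat{\beta}_3 X_{3i} + \hat{u}_i \quad (7.4.1)$$
• The OLS estimators are chosen so that the residual sum of squares (RSS)
$\sum \hat{u}_i^2$ is as small as possible:
$$\sum \hat{u}_i^2 = \sum (Y_i - \hat{Y}_i)^2 \to \min$$
$$\sum \hat{u}_i^2 = \sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_{2i} - \hat{\beta}_3 X_{3i})^2 \to \min$$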

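• Setting the partial derivatives of the RSS with respect to
$\hat{\beta}_1$, $\hat{\beta}_2$ and $\hat{\beta}_3$ equal to zero gives the normal equations:
$$\sum Y_i = n\hat{\beta}_1 + \hat{\beta}_2 \sum X_{2i} + \hat{\beta}_3 \sum X_{3i}$$
$$\sum X_{2i} Y_i = \hat{\beta}_1 \sum X_{2i} + \hat{\beta}_2 \sum X_{2i}^2 + \hat{\beta}_3 \sum X_{2i} X_{3i}$$
$$\sum X_{3i} Y_i = \hat{\beta}_1 \sum X_{3i} + \hat{\beta}_2 \sum X_{2i} X_{3i} + \hat{\beta}_3 \sum X_{3i}^2$$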
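For reference, the three conditions of this step can be written out explicitly. The following is a sketch of the first-order conditions for minimizing the RSS of the three-variable model, in the notation of this section:

```latex
\frac{\partial \sum \hat{u}_i^2}{\partial \hat{\beta}_1}
  = -2 \sum \left( Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_{2i} - \hat{\beta}_3 X_{3i} \right) = 0

\frac{\partial \sum \hat{u}_i^2}{\partial \hat{\beta}_2}
  = -2 \sum X_{2i} \left( Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_{2i} - \hat{\beta}_3 X_{3i} \right) = 0

\frac{\partial \sum \hat{u}_i^2}{\partial \hat{\beta}_3}
  = -2 \sum X_{3i} \left( Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_{2i} - \hat{\beta}_3 X_{3i} \right) = 0
```

Dividing each condition by -2 and moving the terms in the estimators to the right-hand side gives one equation for $\sum Y_i$, one for $\sum X_{2i} Y_i$ and one for $\sum X_{3i} Y_i$.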
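• Denote deviations from the sample means by lowercase letters:
$$y_i = Y_i - \bar{Y}, \quad x_{2i} = X_{2i} - \bar{X}_2, \quad x_{3i} = X_{3i} - \bar{X}_3$$
Then:
$$\sum x_{2i}^2 = \sum X_{2i}^2 - n\bar{X}_2^2$$
$$\sum x_{3i}^2 = \sum X_{3i}^2 - n\bar{X}_3^2$$
$$\sum y_i^2 = \sum Y_i^2 - n\bar{Y}^2$$
$$\sum x_{2i} x_{3i} = \sum X_{2i} X_{3i} - n\bar{X}_2 \bar{X}_3$$
$$\sum y_i x_{2i} = \sum Y_i X_{2i} - n\bar{Y}\bar{X}_2$$
$$\sum y_i x_{3i} = \sum Y_i X_{3i} - n\bar{Y}\bar{X}_3$$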
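• We will obtain:
$$\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}_2 - \hat{\beta}_3 \bar{X}_3$$
$$\hat{\beta}_2 = \frac{(\sum y_i x_{2i})(\sum x_{3i}^2) - (\sum x_{2i} x_{3i})(\sum y_i x_{3i})}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i} x_{3i})^2}$$
$$\hat{\beta}_3 = \frac{(\sum y_i x_{3i})(\sum x_{2i}^2) - (\sum x_{2i} x_{3i})(\sum y_i x_{2i})}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i} x_{3i})^2}$$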

• Example: We have the following data (n = 10):

Y X2 X3
20 8 3
19 7 4
18 6 5
17 5 5
16 5 6
15 5 6
15 4 7
14 4 7
14 3 8
12 3 9
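• We obtain:
$$\sum Y_i = 160, \quad \sum Y_i^2 = 2616$$
$$\sum X_{2i} = 50, \quad \sum X_{2i}^2 = 274$$
$$\sum X_{3i} = 60, \quad \sum X_{3i}^2 = 390$$
$$\bar{Y} = 16, \quad \bar{X}_2 = 5, \quad \bar{X}_3 = 6$$
$$\sum Y_i X_{2i} = 835, \quad \sum Y_i X_{3i} = 920, \quad \sum X_{2i} X_{3i} = 274$$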
• and
$$\sum y_i^2 = \sum Y_i^2 - n\bar{Y}^2 = 56$$
$$\sum x_{2i}^2 = \sum X_{2i}^2 - n\bar{X}_2^2 = 24$$
$$\sum x_{3i}^2 = \sum X_{3i}^2 - n\bar{X}_3^2 = 30$$
$$\sum y_i x_{2i} = \sum Y_i X_{2i} - n\bar{Y}\bar{X}_2 = 35$$
$$\sum y_i x_{3i} = \sum Y_i X_{3i} - n\bar{Y}\bar{X}_3 = -40$$
$$\sum x_{2i} x_{3i} = \sum X_{2i} X_{3i} - n\bar{X}_2 \bar{X}_3 = -26$$
• and
$$\hat{\beta}_2 = 10/44 \approx 0.2273$$
$$\hat{\beta}_3 = -50/44 \approx -1.1364$$
$$\hat{\beta}_1 = 21.6818$$

$$\hat{Y}_i = 21.6818 + 0.2273 X_{2i} - 1.1364 X_{3i}$$
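The worked example can be checked numerically. The sketch below applies the deviation-form formulas for the three-variable model to the ten observations of the example; it uses only the Python standard library, and the function names are illustrative choices, not part of the slides.

```python
# Data from the worked example (n = 10 observations).
Y  = [20, 19, 18, 17, 16, 15, 15, 14, 14, 12]
X2 = [8, 7, 6, 5, 5, 5, 4, 4, 3, 3]
X3 = [3, 4, 5, 5, 6, 6, 7, 7, 8, 9]

def mean(values):
    return sum(values) / len(values)

def dev_cross(a, b):
    """Sum of cross-products of deviations from the sample means."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))

def ols_three_variable(y, x2, x3):
    """OLS estimates (b1, b2, b3) for y = b1 + b2*x2 + b3*x3 + u."""
    s_y2, s_y3 = dev_cross(y, x2), dev_cross(y, x3)
    s_22, s_33 = dev_cross(x2, x2), dev_cross(x3, x3)
    s_23 = dev_cross(x2, x3)
    denom = s_22 * s_33 - s_23 ** 2   # equals zero under perfect collinearity
    b2 = (s_y2 * s_33 - s_23 * s_y3) / denom
    b3 = (s_y3 * s_22 - s_23 * s_y2) / denom
    b1 = mean(y) - b2 * mean(x2) - b3 * mean(x3)
    return b1, b2, b3

b1, b2, b3 = ols_three_variable(Y, X2, X3)
print(f"b1 = {b1:.4f}, b2 = {b2:.4f}, b3 = {b3:.4f}")
# prints: b1 = 21.6818, b2 = 0.2273, b3 = -1.1364
```

The printed values agree with the estimates of the example up to rounding in the fourth decimal place.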

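• Variances and standard errors of the OLS estimators:
$$Var(\hat{\beta}_1) = \left[ \frac{1}{n} + \frac{\bar{X}_2^2 \sum x_{3i}^2 + \bar{X}_3^2 \sum x_{2i}^2 - 2\bar{X}_2 \bar{X}_3 \sum x_{2i} x_{3i}}{\sum x_{2i}^2 \sum x_{3i}^2 - (\sum x_{2i} x_{3i})^2} \right] \sigma^2$$
$$Var(\hat{\beta}_2) = \frac{\sum x_{3i}^2}{\sum x_{2i}^2 \sum x_{3i}^2 - (\sum x_{2i} x_{3i})^2}\, \sigma^2$$
$$Var(\hat{\beta}_3) = \frac{\sum x_{2i}^2}{\sum x_{2i}^2 \sum x_{3i}^2 - (\sum x_{2i} x_{3i})^2}\, \sigma^2$$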
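• or, equivalently,
$$Var(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_{2i}^2 (1 - r_{23}^2)}, \quad se(\hat{\beta}_2) = \sqrt{Var(\hat{\beta}_2)}$$
$$Var(\hat{\beta}_3) = \frac{\sigma^2}{\sum x_{3i}^2 (1 - r_{23}^2)}, \quad se(\hat{\beta}_3) = \sqrt{Var(\hat{\beta}_3)}$$
where $r_{23}$ is the sample coefficient of correlation between X2 and X3.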

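• In all these formulas $\sigma^2$ is the variance of the population
disturbances $u_i$. Since it is unknown, it is estimated from the residuals
of the three-variable model:
$$\hat{\sigma}^2 = \frac{\sum \hat{u}_i^2}{n - 3} \quad (7.4.19)$$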
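As a numerical illustration, the variance formulas can be evaluated on the ten-observation example of Section 3.2.1. This is a minimal sketch using only the standard library; it assumes the error variance is estimated by RSS/(n - 3), as is standard for the three-variable model, and the variable names are illustrative.

```python
import math

# Data from the worked example in Section 3.2.1.
Y  = [20, 19, 18, 17, 16, 15, 15, 14, 14, 12]
X2 = [8, 7, 6, 5, 5, 5, 4, 4, 3, 3]
X3 = [3, 4, 5, 5, 6, 6, 7, 7, 8, 9]
n = len(Y)

def mean(values):
    return sum(values) / len(values)

def dev_cross(a, b):
    """Sum of cross-products of deviations from the sample means."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))

s_22, s_33, s_23 = dev_cross(X2, X2), dev_cross(X3, X3), dev_cross(X2, X3)
s_y2, s_y3 = dev_cross(Y, X2), dev_cross(Y, X3)
denom = s_22 * s_33 - s_23 ** 2

# OLS estimates from the deviation-form formulas.
b2 = (s_y2 * s_33 - s_23 * s_y3) / denom
b3 = (s_y3 * s_22 - s_23 * s_y2) / denom
b1 = mean(Y) - b2 * mean(X2) - b3 * mean(X3)

# Residual sum of squares and the estimate of the error variance.
residuals = [y - (b1 + b2 * x2 + b3 * x3) for y, x2, x3 in zip(Y, X2, X3)]
rss = sum(e ** 2 for e in residuals)
sigma2_hat = rss / (n - 3)

# Variances from the first set of formulas.
var_b2 = sigma2_hat * s_33 / denom
var_b3 = sigma2_hat * s_22 / denom

# Equivalent form using the squared correlation between X2 and X3.
r23_sq = s_23 ** 2 / (s_22 * s_33)
var_b2_alt = sigma2_hat / (s_22 * (1 - r23_sq))
var_b3_alt = sigma2_hat / (s_33 * (1 - r23_sq))

se_b2, se_b3 = math.sqrt(var_b2), math.sqrt(var_b3)
print(f"sigma2_hat = {sigma2_hat:.4f}, se(b2) = {se_b2:.4f}, se(b3) = {se_b3:.4f}")
```

Both forms give the same variances. In this sample X2 and X3 are strongly correlated (r23 squared is about 0.94), which inflates the standard errors relative to the case of uncorrelated regressors.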
Example- Stata output
• Model: wage = f(educ,exper )

3.2.2 The Three-Variable Model: Notation and Assumptions
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$$
1. Linear regression model, that is, linear in the parameters.
2. X values are fixed in repeated sampling; X is assumed to be non-stochastic.
3. Zero mean value of disturbance ui: E(ui|X2i, X3i) = 0.
This implies zero covariance between ui and each X variable:
cov(ui, X2i) = cov(ui, X3i) = 0
4. Homoscedasticity, or constant variance of ui: Var(ui) = σ2
5. No serial correlation between the disturbances:
Cov(ui, uj) = 0, i ≠ j
6. The number of observations n must be greater than the number of
parameters to be estimated.
7. Variability in X values. The X values in a given sample must not all
be the same.
8. No specification bias: the model is correctly specified.
9. No exact collinearity between the X variables.
Assumption 3: Zero mean value of
disturbance ui: E(ui|X2i, X3i) = 0.
This assumption can fail if:
- the functional relationship between the
explained and explanatory variables is
misspecified in the equation;
- an important factor that is correlated
with any of x1, x2, …, xk is omitted.

When xj is correlated with u for any reason,
xj is said to be an endogenous explanatory variable.
Assumption 9: No exact collinearity between the
X variables.
• If an independent variable in the regression is
an exact linear combination of the other
independent variables, then we say the model
suffers from perfect collinearity, and it cannot
be estimated (see Chapter 5).
• Note that Assumption 9 allows the independent variables
to be correlated; they just cannot be perfectly
correlated. If we did not allow for any
correlation, then multiple regression would be
of very limited use for econometric analysis.
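To see why a model with perfect collinearity cannot be estimated, consider the common denominator of the OLS slope formulas in the three-variable model, the sum of squared deviations of X2 times that of X3 minus the squared sum of their cross-products. The sketch below uses a hypothetical regressor X3 = 2*X2 + 1 (an assumption made only for illustration) and shows that this denominator is zero.

```python
# X2 is taken from the worked example; X3 is a hypothetical regressor
# constructed as an exact linear function of X2 (perfect collinearity).
X2 = [8, 7, 6, 5, 5, 5, 4, 4, 3, 3]
X3 = [2 * x + 1 for x in X2]

def mean(values):
    return sum(values) / len(values)

def dev_cross(a, b):
    """Sum of cross-products of deviations from the sample means."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))

s_22 = dev_cross(X2, X2)
s_33 = dev_cross(X3, X3)
s_23 = dev_cross(X2, X3)

# Common denominator of the slope formulas for the three-variable model.
denom = s_22 * s_33 - s_23 ** 2
# Squared sample correlation between X2 and X3.
r23_sq = s_23 ** 2 / (s_22 * s_33)

print(f"denominator = {denom}, r23 squared = {r23_sq}")
```

With a zero denominator the slope formulas are undefined, so the separate effects of X2 and X3 cannot be identified. When the correlation is high but not perfect, the denominator is small but nonzero and the estimates exist, although with large variances.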
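3.2.3 Unbiased and efficient properties
Gauss-Markov Theorem: under the assumptions of Section 3.2.2,
$\hat{\beta}_1, \hat{\beta}_2, \dots, \hat{\beta}_k$ are the best
linear unbiased estimators (BLUEs) of $\beta_1, \beta_2, \dots, \beta_k$.

• An estimator $\hat{\beta}_j$ is an unbiased estimator of $\beta_j$ if
$$E(\hat{\beta}_j) = \beta_j$$

• An estimator $\hat{\beta}_j$ of $\beta_j$ is linear if and only if it can be
expressed as a linear function of the data on the dependent variable:
$$\hat{\beta}_j = \sum_{i=1}^{n} w_{ij} y_i$$
where each $w_{ij}$ can be a function of the sample values of the independent variables.

• "Best" is defined as smallest variance.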

3.2.3 Unbiased and efficient properties
• The sample regression line (surface) passes through
the means $(\bar{Y}, \bar{X}_2, \dots, \bar{X}_k)$.
• The mean value of the estimated Yi is equal to the mean
value of the actual Yi: $\bar{\hat{Y}} = \bar{Y}$.
• The sum of the residuals is equal to 0: $\sum_{i=1}^{n} \hat{u}_i = 0$.
• The residuals are uncorrelated with each $X_{ki}$: $\sum_{i=1}^{n} X_{ki} \hat{u}_i = 0$.
• The residuals are uncorrelated with $\hat{Y}_i$: $\sum_{i=1}^{n} \hat{Y}_i \hat{u}_i = 0$.
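These algebraic properties hold in any sample once the model includes an intercept. A minimal check on the ten-observation example of Section 3.2.1, using only the standard library (variable names are illustrative):

```python
# Data from the worked example in Section 3.2.1.
Y  = [20, 19, 18, 17, 16, 15, 15, 14, 14, 12]
X2 = [8, 7, 6, 5, 5, 5, 4, 4, 3, 3]
X3 = [3, 4, 5, 5, 6, 6, 7, 7, 8, 9]

def mean(values):
    return sum(values) / len(values)

def dev_cross(a, b):
    """Sum of cross-products of deviations from the sample means."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))

s_22, s_33, s_23 = dev_cross(X2, X2), dev_cross(X3, X3), dev_cross(X2, X3)
s_y2, s_y3 = dev_cross(Y, X2), dev_cross(Y, X3)
denom = s_22 * s_33 - s_23 ** 2
b2 = (s_y2 * s_33 - s_23 * s_y3) / denom
b3 = (s_y3 * s_22 - s_23 * s_y2) / denom
b1 = mean(Y) - b2 * mean(X2) - b3 * mean(X3)

fitted = [b1 + b2 * x2 + b3 * x3 for x2, x3 in zip(X2, X3)]
residuals = [y - f for y, f in zip(Y, fitted)]

# Each quantity below should be zero up to floating-point rounding.
sum_resid     = sum(residuals)
sum_x2_resid  = sum(x * e for x, e in zip(X2, residuals))
sum_x3_resid  = sum(x * e for x, e in zip(X3, residuals))
sum_fit_resid = sum(f * e for f, e in zip(fitted, residuals))
mean_gap      = mean(fitted) - mean(Y)
at_means_gap  = (b1 + b2 * mean(X2) + b3 * mean(X3)) - mean(Y)

for name, value in [("sum of residuals", sum_resid),
                    ("sum of X2 * residual", sum_x2_resid),
                    ("sum of X3 * residual", sum_x3_resid),
                    ("sum of fitted * residual", sum_fit_resid),
                    ("mean(fitted) - mean(Y)", mean_gap),
                    ("fit at the means - mean(Y)", at_means_gap)]:
    print(f"{name}: {value:.2e}")
```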


Standard errors of the OLS estimators

• If the errors were observable, an unbiased estimator of
$\sigma^2 = E(u_i^2)$ would be $\sum_{i=1}^{n} u_i^2 / n$.

→ This is not a true estimator because we cannot
observe the ui.

• The unbiased estimator of $\sigma^2$:
$$\hat{\sigma}^2 = \frac{\sum \hat{u}_i^2}{n - k} = \frac{RSS}{n - k}$$
• When the errors are normally distributed, $RSS/\sigma^2$ follows a $\chi^2$
distribution with df = number of
observations – number of estimated parameters = n - k.
• The positive square root $\hat{\sigma}$ is called the standard error of the
regression (SER), or root MSE. The SER is an estimator of the standard
deviation of the error term.
• For each slope coefficient:
$$Var(\hat{\beta}_j) = \frac{\sigma^2}{TSS_j (1 - R_j^2)}$$
• where $TSS_j = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2$ is the total sample
variation in xj and $R_j^2$ is the R-squared from
regressing xj on all other independent
variables (including an intercept).
• Since $\sigma$ is unknown, we replace it with its
estimator $\hat{\sigma}$. Standard error:
$$se(\hat{\beta}_j) = \hat{\sigma} / [TSS_j (1 - R_j^2)]^{1/2}$$
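This general formula nests the variance of the slope estimators in the three-variable model of Section 3.2.1. A short sketch of the reduction for $\hat{\beta}_2$, using the fact that with a single other regressor the R-squared from regressing X2 on X3 equals the squared simple correlation $r_{23}^2$:

```latex
k = 3: \quad TSS_2 = \sum x_{2i}^2, \qquad R_2^2 = r_{23}^2
\quad \Longrightarrow \quad
Var(\hat{\beta}_2) = \frac{\sigma^2}{TSS_2 \,(1 - R_2^2)}
                   = \frac{\sigma^2}{\sum x_{2i}^2 \,(1 - r_{23}^2)}
```

The same argument with the roles of X2 and X3 exchanged gives the variance of $\hat{\beta}_3$. The closer $R_j^2$ is to 1, the larger the variance of $\hat{\beta}_j$.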
3.3 Measure of fit: the coefficient of determination R2
• The total sum of squares (TSS):
$$TSS = \sum y_i^2 = \sum (Y_i - \bar{Y})^2 = \sum Y_i^2 - n\bar{Y}^2$$
• The explained sum of squares (ESS):
$$ESS = \sum \hat{y}_i^2 = \sum (\hat{Y}_i - \bar{Y})^2 = \hat{\beta}_2 \sum y_i x_{2i} + \hat{\beta}_3 \sum y_i x_{3i}$$
• The residual sum of squares (RSS):
$$RSS = \sum (Y_i - \hat{Y}_i)^2 = \sum \hat{u}_i^2 = TSS - ESS$$
• Goodness of fit: the coefficient of determination R2
$$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}$$
→ The fraction of the sample variation in Y that is explained
by X2 and X3.
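The decomposition can be illustrated with the ten-observation example of Section 3.2.1. The sketch below computes TSS, ESS and RSS from the deviation sums and then R2; it uses only the standard library, and the variable names are illustrative.

```python
# Data from the worked example in Section 3.2.1.
Y  = [20, 19, 18, 17, 16, 15, 15, 14, 14, 12]
X2 = [8, 7, 6, 5, 5, 5, 4, 4, 3, 3]
X3 = [3, 4, 5, 5, 6, 6, 7, 7, 8, 9]

def mean(values):
    return sum(values) / len(values)

def dev_cross(a, b):
    """Sum of cross-products of deviations from the sample means."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))

s_22, s_33, s_23 = dev_cross(X2, X2), dev_cross(X3, X3), dev_cross(X2, X3)
s_y2, s_y3 = dev_cross(Y, X2), dev_cross(Y, X3)
denom = s_22 * s_33 - s_23 ** 2
b2 = (s_y2 * s_33 - s_23 * s_y3) / denom
b3 = (s_y3 * s_22 - s_23 * s_y2) / denom

tss = dev_cross(Y, Y)            # total sum of squares
ess = b2 * s_y2 + b3 * s_y3      # explained sum of squares
rss = tss - ess                  # residual sum of squares
r2 = ess / tss

print(f"TSS = {tss:.4f}, ESS = {ess:.4f}, RSS = {rss:.4f}, R2 = {r2:.4f}")
```

For this sample the two regressors explain roughly 95 percent of the variation in Y.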
Example: Goodness of fit
• Determinants of college GPA:
- The variables in GPA1.dta include the college
grade point average (colGPA), high school GPA
(hsGPA) and achievement test score (ACT) for
a sample of 141 students from a large university.

Output interpretation
• hsGPA and ACT together explain about ?% of
the variation in college GPA for this sample of students.
• There are many other factors, including family
background, personality, quality of high school
education, and affinity for college, that contribute
to a student's college performance.

3.3 Measure of fit
• Note that R2 lies between 0 and 1.
o If it is 1, the fitted regression line explains 100 percent of
the variation in Y.
o If it is 0, the model does not explain any of the variation
in Y.
• The fit of the model is said to be "better" the closer R2 is to 1.
• As the number of regressors increases, R2 almost invariably
increases and never decreases.

R2 and the adjusted R2
• An alternative coefficient of determination, the adjusted R2 (denoted $\bar{R}^2$):
$$\bar{R}^2 = 1 - \frac{RSS/(n - k)}{TSS/(n - 1)}$$
$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k}$$
where k = the number of parameters in the model including the
intercept term.
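The two expressions for the adjusted R2 are equivalent. The sketch below implements both and evaluates them with RSS = 114/44 and TSS = 56, the values implied by the ten-observation example of Section 3.2.1 (n = 10, k = 3). The function names are illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2 from R2, sample size n and number of parameters k (incl. intercept)."""
    if n <= k:
        raise ValueError("need more observations than estimated parameters")
    return 1 - (1 - r2) * (n - 1) / (n - k)

def adjusted_r2_from_ss(rss, tss, n, k):
    """Adjusted R2 from the residual and total sums of squares."""
    if n <= k:
        raise ValueError("need more observations than estimated parameters")
    return 1 - (rss / (n - k)) / (tss / (n - 1))

# Values implied by the worked example in Section 3.2.1.
rss, tss, n, k = 114 / 44, 56.0, 10, 3
r2 = 1 - rss / tss

r2_adj = adjusted_r2(r2, n, k)
r2_adj_ss = adjusted_r2_from_ss(rss, tss, n, k)
print(f"R2 = {r2:.4f}, adjusted R2 = {r2_adj:.4f}")
```

Because (n - 1)/(n - k) exceeds 1 whenever k > 1, the adjusted R2 is below R2 unless R2 equals 1, and it can fall when an added regressor contributes little.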

• It is good practice to use the adjusted R2 rather than R2,
because R2 tends to give an overly optimistic
picture of the fit of the regression, particularly
when the number of explanatory variables is
not very small compared with the number of observations.

The game of maximizing adjusted R2
• Sometimes researchers play the game of maximizing
adjusted R2, that is, choosing the model that gives the
highest adjusted R2. → This may be dangerous.
• In regression analysis, our objective is not to obtain a
high adjusted R2 per se but rather to obtain
dependable estimates of the true population regression
coefficients and draw statistical inferences about them.
• Researchers should be more concerned about the
logical or theoretical relevance of the explanatory
variables to the dependent variable and their statistical significance.
Comparing Coefficients of Determination R2

• It is crucial to note that in comparing two models on the
basis of the coefficient of determination, whether adjusted
or not:
• the sample size n must be the same;
• the dependent variable must be the same;
• the explanatory variables may take any form.
Thus for the models
lnYi = β1 + β2X2i + β3X3i + ui (7.8.6)
Yi = α1 + α2X2i + α3X3i + ui (7.8.7)
the computed R2 terms cannot be compared, because the dependent
variables differ (ln Y in the first model, Y in the second).
Review: Partial correlation coefficients
• Example: we have a regression model with three variables:
Y, X2 and X3.
• The coefficient of correlation r is a measure of the degree
of linear association between two variables: r12 (correlation
coefficient between Y and X2), r13 (correlation coefficient
between Y and X3) and r23 (correlation coefficient between
X2 and X3). These are called gross or simple correlation
coefficients, or correlation coefficients of zero order.
• Does r12 in fact measure the "true" degree of (linear)
association between Y and X2 when X3 may be associated
with both of them?
→ We need a correlation coefficient that is independent of
the influence of X3 on X2 and Y → the partial correlation coefficient.

• r12,3 = partial correlation coefficient between Y and X2,
holding X3 constant.
• r13,2 = partial correlation coefficient between Y and X3,
holding X2 constant.
• r23,1 = partial correlation coefficient between X2 and X3,
holding Y constant.
→ These are called first-order correlation coefficients (order =
the number of secondary subscripts).
$$r_{12,3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$$

Example: Partial correlation coefficients
• Y = crop yield, X2 = rainfall, X3 = temperature.
Assume r12 = 0, so there is no simple association between
crop yield and rainfall. Assume further that r13 is positive and
r23 is negative. Then r12,3 will be positive:
holding temperature constant, there is a
positive association between yield and rainfall.
Since temperature X3 affects both yield Y and
rainfall X2, we need to remove the influence of
the nuisance variable temperature.
• In EViews: Quick -> Group Statistics -> Correlations
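The first-order partial correlation formula is easy to evaluate by hand or in code. The sketch below implements it and applies it to the crop-yield case; the numerical values r13 = 0.5 and r23 = -0.5 are hypothetical, chosen only to match the signs assumed in the example.

```python
import math

def partial_corr(r12, r13, r23):
    """First-order partial correlation between variables 1 and 2, holding 3 constant."""
    denom = math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))
    if denom == 0:
        raise ValueError("undefined when r13 or r23 equals +1 or -1")
    return (r12 - r13 * r23) / denom

# Hypothetical values: no simple correlation between yield and rainfall,
# positive yield-temperature and negative rainfall-temperature correlation.
r12, r13, r23 = 0.0, 0.5, -0.5
r12_3 = partial_corr(r12, r13, r23)
print(f"r12,3 = {r12_3:.4f}")   # prints: r12,3 = 0.3333
```

Although the simple correlation r12 is zero, the partial correlation is positive once the influence of temperature is removed, as described in the example.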

LEC 11
More on Functional Form
The Cobb–Douglas Production Function

• The Cobb–Douglas production function, in its stochastic
form, may be expressed as:
$$Y_i = \beta_1 X_{2i}^{\beta_2} X_{3i}^{\beta_3} e^{u_i} \quad (7.9.1)$$
where Y = output
X2 = labor input
X3 = capital input
u = stochastic disturbance term
e = base of natural logarithm
• If we log-transform this model, we obtain:
$$\ln Y_i = \ln \beta_1 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + u_i = \beta_0 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + u_i \quad (7.9.2)$$
where β0 = ln β1.
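The log transformation turns the Cobb–Douglas function into a model that is linear in the parameters, so the three-variable OLS formulas apply to the logged data. The sketch below uses hypothetical inputs and parameters (beta1 = 2, beta2 = 0.4, beta3 = 0.6) and no disturbance, so that OLS on the logs should recover the parameters up to rounding; none of these numbers come from the slides.

```python
import math

# Hypothetical parameters and inputs, chosen only for illustration.
beta1, beta2, beta3 = 2.0, 0.4, 0.6
labor   = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
capital = [2.0, 1.0, 4.0, 3.0, 6.0, 8.0]
# Noise-free output (u_i = 0) generated by the Cobb-Douglas function.
output = [beta1 * l ** beta2 * k ** beta3 for l, k in zip(labor, capital)]

def mean(values):
    return sum(values) / len(values)

def dev_cross(a, b):
    """Sum of cross-products of deviations from the sample means."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))

def fit_two_regressors(y, x2, x3):
    """OLS estimates (intercept, slope on x2, slope on x3)."""
    s_22, s_33, s_23 = dev_cross(x2, x2), dev_cross(x3, x3), dev_cross(x2, x3)
    s_y2, s_y3 = dev_cross(y, x2), dev_cross(y, x3)
    denom = s_22 * s_33 - s_23 ** 2
    b2 = (s_y2 * s_33 - s_23 * s_y3) / denom
    b3 = (s_y3 * s_22 - s_23 * s_y2) / denom
    b0 = mean(y) - b2 * mean(x2) - b3 * mean(x3)
    return b0, b2, b3

# Regress ln(output) on ln(labor) and ln(capital).
ln_y  = [math.log(v) for v in output]
ln_x2 = [math.log(v) for v in labor]
ln_x3 = [math.log(v) for v in capital]
b0, b2, b3 = fit_two_regressors(ln_y, ln_x2, ln_x3)

print(f"labor elasticity = {b2:.4f}, capital elasticity = {b3:.4f}, "
      f"exp(b0) = {math.exp(b0):.4f}")
```

The slope coefficients of the log-log model are the output elasticities of labor and capital, and the original scale parameter is recovered as exp(b0).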
EXAMPLE 7.3 Value Added, Labor Hours, and Capital Input in
the Manufacturing Sector

Nguyen Thu Hang, BMNV, FTU CS2



• The output elasticities of labor and capital were 0.4683 and
0.5213, respectively.
• Holding the capital input constant, a 1 percent increase in the labor
input led on average to about a 0.47 percent increase in output.
• Similarly, holding the labor input constant, a 1 percent increase in
the capital input led on average to about a 0.52 percent
increase in output.
More on Functional Form
Polynomial Regression Models

Figure 7.1: The U-shaped marginal cost curve shows that
the relationship between MC and output is nonlinear.

• Geometrically, the MC curve depicted in Figure 7.1
represents a parabola. Mathematically, the parabola is
represented by the following equation:
$$Y = \beta_0 + \beta_1 X + \beta_2 X^2 \quad (7.10.1)$$
which is called a quadratic function.
• The general kth-degree polynomial regression may be written as
$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \dots + \beta_k X_i^k + u_i \quad (7.10.3)$$
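A polynomial regression is still linear in the parameters, so a quadratic can be estimated as a multiple regression with X and X squared as the two regressors. The sketch below uses hypothetical, noise-free U-shaped data generated from Y = 10 - 3X + 0.5X^2 (an assumption made only for illustration) and recovers the coefficients with the three-variable OLS formulas.

```python
# Hypothetical noise-free data from a U-shaped quadratic, for illustration only.
X = [1, 2, 3, 4, 5, 6, 7, 8]
Y = [10 - 3 * x + 0.5 * x * x for x in X]
X_sq = [x * x for x in X]        # the second regressor is simply X squared

def mean(values):
    return sum(values) / len(values)

def dev_cross(a, b):
    """Sum of cross-products of deviations from the sample means."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))

def fit_two_regressors(y, x2, x3):
    """OLS estimates (intercept, slope on x2, slope on x3)."""
    s_22, s_33, s_23 = dev_cross(x2, x2), dev_cross(x3, x3), dev_cross(x2, x3)
    s_y2, s_y3 = dev_cross(y, x2), dev_cross(y, x3)
    denom = s_22 * s_33 - s_23 ** 2
    b2 = (s_y2 * s_33 - s_23 * s_y3) / denom
    b3 = (s_y3 * s_22 - s_23 * s_y2) / denom
    b0 = mean(y) - b2 * mean(x2) - b3 * mean(x3)
    return b0, b2, b3

beta0_hat, beta1_hat, beta2_hat = fit_two_regressors(Y, X, X_sq)

# For a U-shaped curve (beta2 > 0) the minimum is at X = -beta1 / (2 * beta2).
x_min = -beta1_hat / (2 * beta2_hat)

print(f"beta0 = {beta0_hat:.4f}, beta1 = {beta1_hat:.4f}, "
      f"beta2 = {beta2_hat:.4f}, minimum at X = {x_min:.4f}")
```

X and X squared are correlated but not perfectly collinear, because X squared is not a linear function of X, so the no-exact-collinearity assumption is not violated.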

More on Functional Form
Polynomial Regression Models
EXAMPLE 7.4 Estimating the Total Cost Function


