CHAPTER IV - Multiple Regression Model


Introduction

 The two-variable model studied so far is often inadequate in practice.
 In our consumption–income example, we assumed that only income X affects consumption Y.
 But economic theory is seldom so simple: besides income, a number of other variables are also likely to affect consumption expenditure.
 Therefore, we need to extend the simple two-variable regression model to models involving more than two variables.
 Adding more variables leads us to multiple regression models, that is, models in which the dependent variable (regressand) Y depends on two or more explanatory variables (regressors).

10/24/2017 Mai VU-FIE-FTU 2


Outline
1. Establishment of the model
2. The problem of estimation
3. Interpretation of multiple regression equation
4. The multiple coefficient of determination R2, the adjusted R2,
and the multiple coefficient of correlation R
5. Hypothesis testing
6. Prediction



1. Establishment of the model
 The regression model with k − 1 independent variables (and an intercept) is written in algebraic form as follows:

PRF: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_{k-1} X_{k-1,i} + u_i$  (3.01)

SRF: $Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i} + \dots + \hat\beta_{k-1} X_{k-1,i} + \hat u_i$  (3.02)

 $\beta_0$: intercept coefficient; $\beta_j$ (j = 1, …, k − 1): slope coefficients.
 $\hat\beta_j$, $\hat u_i$: point estimators of $\beta_j$ and $u_i$.



1. Establishment of the model
 Let X be an n x k matrix where we have observations
on k independent variables for n observations. Since
our model will usually contain a constant term, one of
the columns in the X matrix will contain only ones.
This column should be treated exactly the same as any
other column in the X matrix.
 Let Y be an n x 1 vector of observations on the
dependent variable.
 Let u be an n x 1 vector of disturbances or errors.
 Let β be a k x 1 vector of unknown population parameters that we want to estimate.
1. Establishment of the model
 Our statistical model will essentially look something
like the following:

$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}_{n\times 1},\quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{k-1} \end{pmatrix}_{k\times 1},\quad
u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}_{n\times 1},\quad
X = \begin{pmatrix} 1 & X_{11} & X_{21} & \cdots & X_{k-1,1} \\ 1 & X_{12} & X_{22} & \cdots & X_{k-1,2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{1n} & X_{2n} & \cdots & X_{k-1,n} \end{pmatrix}_{n\times k}$$

 Then, our PRF can be written in matrix form as:


$Y = X\beta + u$  (3.03)



1. Establishment of the model
 Our SRF can be written as: $Y = X\hat\beta + \hat u$  (3.04)
 Where:

$$\hat\beta = \begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_{k-1} \end{pmatrix}_{k\times 1} \quad\text{and}\quad \hat u = \begin{pmatrix} \hat u_1 \\ \hat u_2 \\ \vdots \\ \hat u_n \end{pmatrix}_{n\times 1}$$



2. The problem of estimation
2.1. Ordinary Least Squares Approach in Matrix Form
2.2. Properties of the OLS Estimators
2.3. The Gauss-Markov Assumptions
2.4. The Gauss-Markov Theorem
2.5. The Variance-Covariance Matrix of the OLS Estimates



2.1. OLS approach in matrix form
 Our estimates of the population parameters are referred to as $\hat\beta$.
 Recall that the criterion we use for obtaining our estimates is to find the estimator $\hat\beta$ that minimizes the sum of squared residuals:

$$\sum_{i=1}^{n} \hat u_i^2 = \sum_{i=1}^{n} \left( Y_i - (\hat\beta_0 + \hat\beta_1 X_{1i} + \dots + \hat\beta_{k-1} X_{k-1,i}) \right)^2 \to \min$$



2.1. OLS approach in matrix form

 Reminder: in linear algebra, the transpose of a matrix A is the matrix obtained by flipping A over its diagonal, that is, by switching the row and column indices; it is denoted $A^T$.

$$A = \begin{pmatrix} X_1 & X_2 & X_3 & X_4 \end{pmatrix} \;\to\; A^T = \begin{pmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{pmatrix}$$

 Property: for a column vector $v$, $v^T v = \sum_i v_i^2$ (a scalar). Equivalently, for the row vector A above, $A A^T = \sum_i X_i^2$.
2.1. OLS approach in matrix form
 Therefore,

$$\sum_{i=1}^{n} \hat u_i^2 = \hat u^T \hat u = (Y - X\hat\beta)^T (Y - X\hat\beta)
= Y^T Y - Y^T X\hat\beta - \hat\beta^T X^T Y + \hat\beta^T X^T X \hat\beta
= Y^T Y - 2\hat\beta^T X^T Y + \hat\beta^T X^T X \hat\beta$$

(since $Y^T X\hat\beta$ is a scalar, $Y^T X\hat\beta = (Y^T X\hat\beta)^T = \hat\beta^T X^T Y$).

 Then:

$$\sum_{i=1}^{n} \hat u_i^2 = f(\hat\beta) = Y^T Y - 2\hat\beta^T X^T Y + \hat\beta^T X^T X \hat\beta \qquad (3.05)$$
2.1. OLS approach in matrix form
 To find the $\hat\beta$ that minimizes the sum of squared residuals, we take the derivative of Eq. (3.05) with respect to $\hat\beta$ and set it equal to zero:

$$f'(\hat\beta) = \frac{\partial f(\hat\beta)}{\partial \hat\beta} = -2X^T Y + 2X^T X\hat\beta = 0 \;\Rightarrow\; X^T X\hat\beta = X^T Y$$

 Written out element by element, this is the system of normal equations:

$$\begin{pmatrix} n & \sum X_{1i} & \cdots & \sum X_{k-1,i} \\ \sum X_{1i} & \sum X_{1i}^2 & \cdots & \sum X_{1i} X_{k-1,i} \\ \vdots & \vdots & & \vdots \\ \sum X_{k-1,i} & \sum X_{k-1,i} X_{1i} & \cdots & \sum X_{k-1,i}^2 \end{pmatrix}_{k\times k}
\begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_{k-1} \end{pmatrix}_{k\times 1}
= \begin{pmatrix} \sum Y_i \\ \sum Y_i X_{1i} \\ \vdots \\ \sum Y_i X_{k-1,i} \end{pmatrix}_{k\times 1}$$


2.1. OLS approach in matrix form
 Or: $X^T X \hat\beta = X^T Y$
 Two things to note about the $X^T X$ matrix. First, it is always square, since it is k x k. Second, it is always symmetric.
 Recall that $X^T X$ and $X^T Y$ are known from our data, but $\hat\beta$ is unknown. If the inverse $(X^T X)^{-1}$ exists, then pre-multiplying both sides by this inverse gives us:

$$\hat\beta = (X^T X)^{-1} X^T Y$$
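The closed-form solution above can be checked numerically. A minimal NumPy sketch (the data below are made up for illustration and are not from the slides):

```python
import numpy as np

# Hypothetical data: n = 6 observations, a constant column and two regressors.
X = np.array([
    [1.0, 2.0, 1.0],
    [1.0, 3.0, 2.0],
    [1.0, 5.0, 2.0],
    [1.0, 7.0, 3.0],
    [1.0, 8.0, 5.0],
    [1.0, 9.0, 4.0],
])
Y = np.array([3.0, 5.0, 8.0, 10.0, 11.0, 14.0])

# Normal equations: (X'X) beta_hat = X'Y  =>  beta_hat = (X'X)^{-1} X'Y.
XtX = X.T @ X
XtY = X.T @ Y
beta_hat = np.linalg.solve(XtX, XtY)  # numerically safer than forming the inverse

residuals = Y - X @ beta_hat
print(beta_hat, residuals.sum())
```

Using `np.linalg.solve` (or `lstsq`) avoids explicitly inverting $X^T X$, which is the usual numerical practice.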
2.2. Properties of the OLS Estimators
1. The sum of the residuals is zero: $\sum \hat u_i = 0$.
2. The observed values of X are uncorrelated with the residuals: $\mathrm{cov}(\hat u, X) = 0$.
3. The sample mean of the residuals is zero: $\bar{\hat u} = \sum \hat u_i / n = 0$.
4. The regression hyperplane passes through the means of the observed values ($\bar X$ and $\bar Y$).
5. The predicted values of Y are uncorrelated with the residuals: $\mathrm{cov}(\hat u, \hat Y) = 0$.
6. The mean of the predicted Y's for the sample equals the mean of the observed Y's: $\bar Y = \bar{\hat Y}$.
(These properties hold whenever the model includes an intercept term.)
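These residual properties can be verified numerically. A small sketch with simulated data (the design matrix and true coefficients below are arbitrary assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 30, 3
# Intercept column plus two random regressors; arbitrary true coefficients.
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ beta_hat
u_hat = Y - Y_hat

print(u_hat.sum())               # property 1: numerically zero
print(X.T @ u_hat)               # property 2: X orthogonal to the residuals
print(Y.mean() - Y_hat.mean())   # property 6: means of Y and Y_hat agree
```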
2.2. Properties of the OLS Estimators
 Note that we know nothing about $\hat\beta$ except that it satisfies all of the properties discussed above.
 We need to make some assumptions about the true model in order to make any inferences regarding β (the true population parameters) from $\hat\beta$ (our estimator of the true parameters).
 Recall that $\hat\beta$ comes from our sample, but we want to learn about the true parameters.



2.3. The Gauss-Markov Assumptions
1. There is a linear relationship between Y and X:
Y = Xβ + u
2. X is an n x k matrix of full rank: This assumption
states that there is no perfect multicollinearity. In
other words, the columns of X are linearly
independent. This assumption is known as the
identification condition.



2.3. The Gauss-Markov Assumptions
3. The disturbances average out to zero for any value of X: E(u) = 0, i.e.

$$E(u) = E\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix} = \begin{pmatrix} E(u_1 \mid X_{11}, X_{21}, \dots, X_{k-1,1}) \\ E(u_2 \mid X_{12}, X_{22}, \dots, X_{k-1,2}) \\ \vdots \\ E(u_n \mid X_{1n}, X_{2n}, \dots, X_{k-1,n}) \end{pmatrix} = 0$$


2.3. The Gauss-Markov Assumptions
4. There is no autocorrelation and no heteroskedasticity in the model: $E(uu^T) = \sigma^2 I$, where I is the identity matrix.

$$E(uu^T) = E\left[\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix} \begin{pmatrix} u_1 & u_2 & \cdots & u_n \end{pmatrix}\right]
= \begin{pmatrix} E(u_1^2) & E(u_1 u_2) & \cdots & E(u_1 u_n) \\ E(u_2 u_1) & E(u_2^2) & \cdots & E(u_2 u_n) \\ \vdots & \vdots & & \vdots \\ E(u_n u_1) & E(u_n u_2) & \cdots & E(u_n^2) \end{pmatrix}$$
2.3. The Gauss-Markov Assumptions
 The assumption of homoskedasticity states that the variance of $u_i$ is the same ($\sigma^2$) for all i: $\mathrm{var}(u_i \mid X) = \sigma^2\ \forall i$.
 The assumption of no autocorrelation (uncorrelated errors) means that $\mathrm{cov}(u_i, u_j) = E(u_i u_j) = 0\ \forall i \ne j$.
 With these assumptions, we have:

$$E(uu^T) = \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{pmatrix}
= \sigma^2 \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} = \sigma^2 I$$


2.3. The Gauss-Markov Assumptions
5. X may be fixed or random, but must be generated by
a mechanism that is unrelated to ui.
6. u ~ N(0, σ2I): This assumption is not actually
required for the Gauss-Markov Theorem. However,
we often assume it to make hypothesis testing easier.



2.4. The Gauss-Markov theorem

 $\hat\beta$ is an unbiased estimator of β.
 $\hat\beta$ is a linear estimator of β.
 $\hat\beta$ has minimum variance among all linear unbiased estimators; that is, $\hat\beta$ is BLUE (best linear unbiased estimator).



2.5. The Variance-Covariance Matrix of the
OLS Estimates
 To measure the variation in and correlation between the estimated coefficients, we use the covariance matrix of the OLS estimator $\hat\beta$:

$$\mathrm{cov}(\hat\beta) = \begin{pmatrix} \mathrm{var}(\hat\beta_0) & \mathrm{cov}(\hat\beta_0, \hat\beta_1) & \cdots & \mathrm{cov}(\hat\beta_0, \hat\beta_{k-1}) \\ \mathrm{cov}(\hat\beta_1, \hat\beta_0) & \mathrm{var}(\hat\beta_1) & \cdots & \mathrm{cov}(\hat\beta_1, \hat\beta_{k-1}) \\ \vdots & \vdots & & \vdots \\ \mathrm{cov}(\hat\beta_{k-1}, \hat\beta_0) & \mathrm{cov}(\hat\beta_{k-1}, \hat\beta_1) & \cdots & \mathrm{var}(\hat\beta_{k-1}) \end{pmatrix} = \sigma^2 (X^T X)^{-1} \qquad (3.06)$$


2.5. The Variance-Covariance Matrix of the
OLS Estimates
 We estimate σ2 with $\hat\sigma^2$, where:

$$\hat\sigma^2 = \frac{\sum_{i=1}^{n} \hat u_i^2}{n - k} \qquad (3.07)$$
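Putting (3.06) and (3.07) together, a minimal NumPy sketch of the estimated variance-covariance matrix (the simulated data and true coefficients are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = X @ np.array([0.5, 1.0, 2.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
u_hat = Y - X @ beta_hat

sigma2_hat = (u_hat @ u_hat) / (n - k)          # Eq. (3.07): unbiased estimate of sigma^2
cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)  # Eq. (3.06): estimated cov(beta_hat)
se_beta = np.sqrt(np.diag(cov_beta))            # standard errors of the coefficients
print(se_beta)
```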


3. Interpretation of multiple regression equation
 Given a three-variable PRF as:
$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + u_t$
 On taking the conditional expectation of Y on both sides of the above equation, we obtain:
$E(Y_t \mid X_{2t}, X_{3t}) = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t}$  (3.08)
(the error term drops out because $E(u_t \mid X_{2t}, X_{3t}) = 0$).
 In words, (3.08) gives the conditional mean or expected value of Y conditional upon the given or fixed values of X2 and X3.
 Therefore, as in the two-variable case, multiple regression analysis is regression analysis conditional upon the fixed values of the regressors, and what we obtain is the average or mean value of Y (the mean response of Y) for the given values of the regressors.
3. Interpretation of multiple regression equation
 The regression coefficients β2 and β3 are known as
partial regression or partial slope coefficients.
 The meaning of partial regression coefficient is as
follows:
 β2 measures the change in the mean value of Y, E(Y), per
unit change in X2, holding the value of X3 constant.
 β3 measures the change in the mean value of Y per unit
change in X3, holding the value of X2 constant.



4. The multiple coefficient of determination R2 , adjusted
R2 and the multiple coefficient of correlation R
4.1. The multiple coefficient of determination R2
4.2. The adjusted R2
4.3. Comparing two R2 values
4.4. The game of maximizing $\bar R^2$
4.5. The multiple coefficient of correlation R



4.1. The multiple coefficient of determination R2
 In the two-variable case we saw that r2 measures the
goodness of fit of the regression equation; that is, it
gives the proportion or percentage of the total
variation in the dependent variable Y explained by the
(single) explanatory variable X.
 This notion of r2 can be easily extended to regression
models containing more than two variables.
 The quantity that gives this information is known as
the multiple coefficient of determination and is
denoted by R2.



4.1. The multiple coefficient of determination R2
 By definition:

$$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS} \qquad (3.09)$$
 R2, like r2, lies between 0 and 1. If it is 1, the fitted
regression line explains 100 percent of the variation in
Y.
 On the other hand, if it is zero, the model does not
explain any of the variation in Y.



4.2. The adjusted R2
 Recall the definition of the coefficient of determination:

$$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum \hat u_i^2}{\sum y_i^2} \qquad (3.12)$$

 To compare two R2 terms, one must take into account the number of X variables present in the model. This can be done readily if we consider an alternative coefficient of determination, the adjusted R2:

$$\bar R^2 = 1 - \frac{\sum \hat u_i^2 / (n - k)}{\sum y_i^2 / (n - 1)} \qquad (3.13)$$


4.2. The adjusted R2
Where
 k = the number of parameters in the model, including the intercept term.
 The term "adjusted" means adjusted for the degrees of freedom associated with the sums of squares entering into (3.12).
 Equation (3.13) can also be written as:

$$\bar R^2 = 1 - \frac{\hat\sigma^2}{S_Y^2} \qquad (3.14)$$

 Where $\hat\sigma^2$ is the residual variance, an unbiased estimator of the true σ2, and $S_Y^2$ is the sample variance of Y.
4.2. The adjusted R2
 It is easy to see that $\bar R^2$ and R2 are related because, substituting (3.12) into (3.13), we obtain:

$$\bar R^2 = 1 - (1 - R^2)\frac{n - 1}{n - k} \qquad (3.15)$$

 It is immediately apparent from Eq. (3.15) that:
 For k > 1, $\bar R^2 < R^2$, which implies that as the number of X variables increases, the adjusted R2 increases less than the unadjusted R2.
 $\bar R^2$ can be negative, although R2 is necessarily non-negative. In case $\bar R^2$ turns out to be negative in an application, its value is taken as zero.
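Equations (3.12), (3.13), and (3.15) can be cross-checked in code. A sketch with simulated data (the sample size, regressors, and coefficients are illustrative assumptions):

```python
import numpy as np

def r2_and_adjusted(Y, u_hat, k):
    """R^2 = 1 - RSS/TSS (Eq. 3.12) and the adjusted version via Eq. (3.15)."""
    n = len(Y)
    rss = u_hat @ u_hat
    tss = ((Y - Y.mean()) ** 2).sum()
    r2 = 1 - rss / tss
    r2_bar = 1 - (1 - r2) * (n - 1) / (n - k)
    return r2, r2_bar

rng = np.random.default_rng(2)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # k = 3 parameters
Y = X @ np.array([1.0, 0.8, -1.2]) + rng.normal(size=n)
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
r2, r2_bar = r2_and_adjusted(Y, Y - X @ beta_hat, k=3)
print(r2, r2_bar)
```

Since k > 1 and the fit is imperfect, the adjusted value comes out strictly below the unadjusted one, as Eq. (3.15) predicts.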
4.3. Comparing two R2 values
 Comparing Two R2 Values: in comparing two models
on the basis of the coefficient of determination,
whether adjusted or not, the sample size n and the
dependent variable must be the same; the explanatory
variables may take any form. Thus for the models:
$\ln Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$  (3.16)
$Y_i = \alpha_1 + \alpha_2 X_{2i} + \alpha_3 X_{3i} + u_i$  (3.17)
 the computed R2 terms cannot be compared. The
reason is as follows:



4.3. Comparing two R2 values
 By definition, R2 measures the proportion of the variation
in the dependent variable accounted for by the explanatory
variable(s).
 Therefore, in (3.16), R2 measures the proportion of the
variation in lnY explained by X2 and X3, whereas in (3.17) it
measures the proportion of the variation in Y, and the two
are not the same thing:
 a change in lnY gives a relative or proportional change in Y,
whereas a change in Y gives an absolute change.
 Therefore, $\mathrm{var}(\hat Y_i)/\mathrm{var}(Y_i)$ is not equal to $\mathrm{var}(\widehat{\ln Y_i})/\mathrm{var}(\ln Y_i)$, so the two coefficients of determination are not the same.



4.4. The game of maximizing $\bar R^2$
 Sometimes researchers play the game of maximizing $\bar R^2$, that is, choosing the model that gives the highest $\bar R^2$. But this may be dangerous, for in regression analysis our objective is not to obtain a high $\bar R^2$ per se but rather to obtain dependable estimates of the true population regression coefficients and to draw statistical inferences about them.
 The researcher should be more concerned about the logical or theoretical relevance of the explanatory variables to the dependent variable and their statistical significance.


4.5. The multiple coefficient of correlation R
 By taking the square roots of both sides of (3.09), we have:

$$R = \sqrt{\frac{ESS}{TSS}} = \sqrt{1 - \frac{RSS}{TSS}} \qquad (3.10)$$

 That is the coefficient of multiple correlation, and it is a


measure of the degree of association between Y and all the
explanatory variables jointly.
 Recall that in the two-variable case we defined the quantity r as
the coefficient of correlation which measures the degree of
(linear) association between two variables.
 Although r can be positive or negative, R is always taken to be
positive. In practice, however, R is of little importance. The more
meaningful quantity is R2.



4.5. The multiple coefficient of correlation R
 There is the following relationship between R2 and the variance of a partial regression coefficient in the k-variable multiple regression model:

$$\mathrm{var}(\hat\beta_j) = \frac{\sigma^2}{\sum x_j^2} \cdot \frac{1}{1 - R_j^2} \qquad (3.11)$$

 Where $\hat\beta_j$ is the partial regression coefficient of regressor Xj, and $R_j^2$ is the R2 in the (auxiliary) regression of Xj on the remaining regressors.


4.5. The multiple coefficient of correlation R
 In the simple regression model, we defined the coefficient of correlation r as a measure of the degree of linear association between two variables.
 For the three-variable regression model we can
compute three correlation coefficients: r12 (correlation
between Y and X2), r13 (correlation coefficient between
Y and X3), and r23 (correlation coefficient between X2
and X3) → These correlation coefficients are called
gross or simple correlation coefficients, or
correlation coefficients of zero order.



4.5. The multiple coefficient of correlation R
 Suppose we want a correlation coefficient between Y and X2 that is independent of the influence, if any, of X3 on X2 and Y.
 Such a correlation coefficient can be obtained and is
known appropriately as the partial correlation
coefficient. Conceptually, it is similar to the partial
regression coefficient.
 We define
 r12.3 = partial correlation coefficient between Y and X2,
holding X3 constant
 r13.2 = partial correlation coefficient between Y and X3,
holding X2 constant
 r23.1 = partial correlation coefficient between X2 and X3,
holding Y constant



4.5. The multiple coefficient of correlation R
 These partial correlations can be easily obtained from the simple, or zero-order, correlation coefficients as follows:

$$r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} \qquad (3.12)$$

$$r_{13.2} = \frac{r_{13} - r_{12} r_{23}}{\sqrt{(1 - r_{12}^2)(1 - r_{23}^2)}} \qquad (3.13)$$

$$r_{23.1} = \frac{r_{23} - r_{12} r_{13}}{\sqrt{(1 - r_{12}^2)(1 - r_{13}^2)}} \qquad (3.14)$$
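Formula (3.12) can be sketched in code and cross-checked against the residual-based definition of partial correlation (the simulated variables below are an assumption for illustration):

```python
import numpy as np

def partial_r(r12, r13, r23):
    """First-order partial correlation r12.3 from zero-order correlations, Eq. (3.12)."""
    return (r12 - r13 * r23) / np.sqrt((1 - r13**2) * (1 - r23**2))

rng = np.random.default_rng(3)
x3 = rng.normal(size=200)
x2 = x3 + rng.normal(size=200)   # X2 correlated with X3 by construction
y = x2 + x3 + rng.normal(size=200)

r12 = np.corrcoef(y, x2)[0, 1]   # subscript 1 denotes Y, as in the slides
r13 = np.corrcoef(y, x3)[0, 1]
r23 = np.corrcoef(x2, x3)[0, 1]
print(partial_r(r12, r13, r23))
```

The same number is obtained by regressing Y on X3 and X2 on X3 and correlating the two residual series, which is the conceptual definition of "holding X3 constant".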


4.5. The multiple coefficient of correlation R
 These partial correlations are called first-order correlation coefficients. By "order" we mean the number of secondary subscripts.
 Thus r12.34 would be a correlation coefficient of order two, r12.345 a correlation coefficient of order three, and so on.



4.5. The multiple coefficient of correlation R
 In general, the correlation coefficient between the dependent variable Y and the independent variable Xj is:

$$r_{0j} = \frac{\sum y_i x_{ji}}{\sqrt{\sum y_i^2 \sum x_{ji}^2}}$$

 The correlation between two independent variables Xt and Xj is:

$$r_{tj} = \frac{\sum x_{ti} x_{ji}}{\sqrt{\sum x_{ti}^2 \sum x_{ji}^2}}$$

 Where $y_i = Y_i - \bar Y$ and $x_{ji} = X_{ji} - \bar X_j$.


4.5. The multiple coefficient of correlation R
 Before moving on, note the following relationships between R2, simple correlation coefficients, and partial correlation coefficients:

$$R^2 = \frac{r_{12}^2 + r_{13}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2}$$

$$R^2 = r_{12}^2 + (1 - r_{12}^2) r_{13.2}^2$$

$$R^2 = r_{13}^2 + (1 - r_{13}^2) r_{12.3}^2$$


5. Hypothesis testing and predictions
5.1. Assumption about the distribution of the disturbance
5.2. Properties of OLS estimators under the normality
assumption
5.3. Review of statistics and probability
5.4. Confidence interval for regression coefficients
5.5. Hypothesis testing



5.1. Assumption about the distribution of
the disturbance
The disturbance follows the normal distribution: $u_i \sim N(0, \sigma^2)$.
This assumption allows us to derive the probability distribution of the OLS estimators.
If the sample size is small (say, less than 100), the normality assumption is necessary.
If the sample size is reasonably large, the assumption may be relaxed, by the central limit theorem.



5.2. Properties of OLS estimators under the
normality assumption
1. Unbiasedness.
2. Minimum variance among the entire class of unbiased estimators.
3. Consistency.
4. $\hat\beta_j \sim N(\beta_j, \mathrm{var}(\hat\beta_j))$
5. $t = \dfrac{\hat\beta_j - \beta_j}{se(\hat\beta_j)} \sim T_{n-k}$
6. $(n - k)\dfrac{\hat\sigma^2}{\sigma^2} \sim \chi^2(n - k)$
5.3. Review of statistics and probability
5.3.1. Probability distribution of 𝛽መ𝑗 and t
5.3.2. Interval estimation
5.3.3. The t-distribution
5.3.4. Critical t value
5.3.5. Confidence interval for regression βj
5.3.6. Interpretation of confidence interval
5.3.7. One-sided confidence interval
5.3.8. Confidence interval for σ2



5.3.1. Probability distribution of $\hat\beta_j$ and t

Recall the familiar result for the sample mean:

$$\bar X \sim N(\mu, \sigma^2/n), \qquad t = \frac{\bar X - \mu}{s_x/\sqrt{n}} \sim T_{n-1}$$

Analogously, for the OLS estimator:

$$\hat\beta_j \sim N(\beta_j, \mathrm{var}(\hat\beta_j)), \qquad t = \frac{\hat\beta_j - \beta_j}{se(\hat\beta_j)} \sim T_{n-k}$$




5.3.2. Interval estimation
• Point estimator $\hat\beta_j$: random; it may differ from the true parameter.
• Interval estimation: an interval around the point estimator that contains the true value of the parameter with a certain probability.
• To construct an interval we need:
✓Level of significance α: 1%, 5%, 10%
✓Level of confidence 1 − α: 99%, 95%, 90%
• $P(\hat\beta_j - \varepsilon \le \beta_j \le \hat\beta_j + \varepsilon) = 1 - \alpha$: the probability that the random interval $(\hat\beta_j - \varepsilon,\ \hat\beta_j + \varepsilon)$ contains the true $\beta_j$ is 1 − α.


5.3.3. The t-distribution

$$P(-t_{\alpha/2} \le t_s \le t_{\alpha/2}) = 1 - \alpha$$
$$P(t_s \le t_{\alpha}) = 1 - \alpha$$
$$P(t_s \ge -t_{\alpha}) = 1 - \alpha$$
• $t_{\alpha}$ and $t_{\alpha/2}$ are called the critical t values.
5.3.4. Critical t-value
 $t_{\alpha, n-k}$:
 Degrees of freedom (d.f.): n − k
 Level of significance: α
$$P(|t| \ge t_{\alpha/2}) = \alpha, \qquad P(t \ge t_{\alpha}) = \alpha, \qquad P(t \le -t_{\alpha}) = \alpha$$
 Look up the critical t in a t-distribution table, or compute it in Excel.

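The table or Excel lookups above can also be done with `scipy.stats.t.ppf` (assuming scipy is available; the α and d.f. below are hypothetical):

```python
from scipy import stats

alpha, df = 0.05, 25   # hypothetical: 5% level, n - k = 25 degrees of freedom

t_two_tail = stats.t.ppf(1 - alpha / 2, df)   # critical t_{alpha/2} for a two-tail test
t_one_tail = stats.t.ppf(1 - alpha, df)       # critical t_{alpha} for a one-tail test
print(t_two_tail, t_one_tail)
```

`ppf` is the inverse CDF, so `stats.t.ppf(1 - alpha/2, df)` returns the value with probability α/2 in the upper tail.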


5.3.5. Confidence interval for regression
coefficients 𝛽𝑗
 We have:

$$P\left(-t_{\alpha/2} \le \frac{\hat\beta_j - \beta_j}{se(\hat\beta_j)} \le t_{\alpha/2}\right) = 1 - \alpha$$

$$\Rightarrow\; P\left(\hat\beta_j - t_{\alpha/2}\, se(\hat\beta_j) \le \beta_j \le \hat\beta_j + t_{\alpha/2}\, se(\hat\beta_j)\right) = 1 - \alpha$$

 The interval

$$\left(\hat\beta_j - t_{\alpha/2}\, se(\hat\beta_j),\;\; \hat\beta_j + t_{\alpha/2}\, se(\hat\beta_j)\right)$$

is called the (1 − α) confidence interval for $\beta_j$.
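A sketch of the full confidence-interval computation for every coefficient, assuming numpy and scipy are available (the simulated data and true coefficients are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, k = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
u_hat = Y - X @ beta_hat
sigma2_hat = (u_hat @ u_hat) / (n - k)
se = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, n - k)
lower = beta_hat - t_crit * se   # beta_hat_j - t_{alpha/2} se(beta_hat_j)
upper = beta_hat + t_crit * se   # beta_hat_j + t_{alpha/2} se(beta_hat_j)
print(list(zip(lower.round(3), upper.round(3))))
```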
5.3.6. Interpretation of confidence interval
• Given a (1 − α) level of confidence: when the independent variable Xj increases by 1 unit, other things unchanged, the mean value of the dependent variable will change by an amount within the interval.
• Given the confidence level of (1 − α): in the long run, in 100(1 − α) out of 100 cases, intervals constructed in this way will contain the true βj.
• The width of the confidence interval is proportional to the standard error of the estimator.
• Confidence intervals are random.
5.3.7. One-sided confidence interval
• Left-sided confidence interval:
$$\beta_j \in \left(-\infty,\; \hat\beta_j + t_{\alpha, n-k}\, se(\hat\beta_j)\right)$$
• Right-sided confidence interval:
$$\beta_j \in \left(\hat\beta_j - t_{\alpha, n-k}\, se(\hat\beta_j),\; +\infty\right)$$
5.3.8. Confidence interval for σ2
Since

$$(n - k)\frac{\hat\sigma^2}{\sigma^2} \sim \chi^2(n - k)$$

we have

$$P\left(\chi^2_{1-\alpha/2} \le (n - k)\frac{\hat\sigma^2}{\sigma^2} \le \chi^2_{\alpha/2}\right) = 1 - \alpha$$

$$\Rightarrow\; P\left[(n - k)\frac{\hat\sigma^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le (n - k)\frac{\hat\sigma^2}{\chi^2_{1-\alpha/2}}\right] = 1 - \alpha$$

which gives the 100(1 − α)% confidence interval for σ2.
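A sketch of this interval, assuming scipy is available; the values of n, k, and the residual variance estimate below are made-up illustrations:

```python
from scipy import stats

n, k, alpha = 30, 3, 0.05
sigma2_hat = 4.0   # hypothetical residual variance estimate

chi2_hi = stats.chi2.ppf(1 - alpha / 2, n - k)   # chi^2_{alpha/2}, upper critical value
chi2_lo = stats.chi2.ppf(alpha / 2, n - k)       # chi^2_{1-alpha/2}, lower critical value

# (n-k) sigma2_hat / chi2_{alpha/2}  <=  sigma^2  <=  (n-k) sigma2_hat / chi2_{1-alpha/2}
lower = (n - k) * sigma2_hat / chi2_hi
upper = (n - k) * sigma2_hat / chi2_lo
print(round(lower, 3), round(upper, 3))
```

Note the inversion: the larger chi-square critical value produces the lower bound of the interval for σ².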


5.5. Hypothesis testing
5.5.1. H0 and H1
5.5.2. Testing an individual regression coefficient βj
5.5.3. Analysis of variance
5.5.4. Test the joint effect of regression coefficients



5.5.1. H0 and H1
• Null hypothesis H0:
✓reflects that there will be no observed effect for your
experiment.
✓is assumed to be true until evidence indicates otherwise.
✓is what we want to reject!

• Alternative hypothesis H1:


✓is an alternative to the null hypothesis.
✓reflects that there will be an observed effect for our
experiment.
✓is what we might believe to be true or hope to prove true!



5.5.2. Testing an individual regression
coefficient βj
• Two-sided testing (two-tail):
$$H_0: \beta_j = \beta_j^* \qquad H_1: \beta_j \ne \beta_j^*$$
• One-sided testing (one-tail):
$$H_0: \beta_j \le \beta_j^*,\; H_1: \beta_j > \beta_j^* \qquad \text{or} \qquad H_0: \beta_j \ge \beta_j^*,\; H_1: \beta_j < \beta_j^*$$


a. Confidence interval approach
Two steps:
• Construct a 100(1 − α)% confidence interval for βj.
• Decision rule:
✓If βj* under H0 falls within this confidence interval (βj* ∈ interval), do not reject H0.
✓If βj* under H0 falls outside the interval (βj* ∉ interval), reject H0.


b. Test of significance approach
• Given that H0: βj = βj* is true:

$$P\left(-t_{\alpha/2} \le \frac{\hat\beta_j - \beta_j^*}{se(\hat\beta_j)} \le t_{\alpha/2}\right) = 1 - \alpha$$

$$\Rightarrow\; P\left(\beta_j^* - t_{\alpha/2}\, se(\hat\beta_j) \le \hat\beta_j \le \beta_j^* + t_{\alpha/2}\, se(\hat\beta_j)\right) = 1 - \alpha$$

which gives the interval in which $\hat\beta_j$ will fall with probability 1 − α, given βj = βj*.
=> Region of acceptance of the null hypothesis.


Region(s) of acceptance/Region(s) of rejection
• The 100(1−α)% confidence interval established is known
as the region of acceptance (of H0).
• The region(s) outside the confidence interval is (are) called the region(s) of rejection (of H0), or the critical region(s).
• A statistic is said to be statistically significant if the value
of the test statistic lies in the critical region.

10/24/2017 Mai VU-FIE-FTU 63




Steps to do a t-test
• Step 1: Establish the hypotheses.
• Step 2: Find the test statistic (observed t value) ts
$$t_s = \frac{\hat\beta_j - \beta_j^*}{se(\hat\beta_j)}$$

• Step 3: Find the critical t value tc


✓Degrees of freedom: n − k
✓Level of significance: one-tail testing: α
two-tail testing: α/2
• Step 4: Compare ts to tc. Draw the conclusion.



Decision rule for t-test

Type of hypothesis | H0                  | H1       | Rejection zone
Two-tail           | βj = βj*            | βj ≠ βj* | |ts| > t(n−k; α/2)
Right-tail         | βj = βj* (βj ≤ βj*) | βj > βj* | ts > t(n−k; α)
Left-tail          | βj = βj* (βj ≥ βj*) | βj < βj* | ts < −t(n−k; α)


c. P-value approach
P-value: the practical level of significance.
• Given that H0 is true, p-value is the probability of
getting a value of the sample test statistic that is at
least as extreme as the one found from the sample
data.
• the lowest significance level at which a null
hypothesis can be rejected.



c. P-value approach
Steps to test an individual regression coefficient using p-
value approach:
• Step 1: Establish the hypotheses.
• Step 2: Find the test statistic (observed t value) ts
• Step 3: Find the p-value associated with the observed t-
value
✓Two-tail: T.DIST.2T(|ts|, d.f.)
✓Right-tail: T.DIST.RT(ts, d.f.)
✓Left-tail: T.DIST(ts, d.f., TRUE)
• Step 4: Compare p-value to α. Draw the conclusion.
✓Reject H0 if p-value < α
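Assuming scipy is available, the Excel lookups above have direct equivalents (the observed t statistic and degrees of freedom below are hypothetical):

```python
from scipy import stats

ts, df = 2.3, 27   # hypothetical observed t statistic and d.f. = n - k

p_two = 2 * stats.t.sf(abs(ts), df)   # two-tail p-value, like T.DIST.2T
p_right = stats.t.sf(ts, df)          # right-tail p-value, like T.DIST.RT
p_left = stats.t.cdf(ts, df)          # left-tail p-value, like T.DIST(..., TRUE)
print(p_two, p_right, p_left)
```

`sf` is the survival function 1 − CDF, so the two-tail p-value is just twice the upper-tail area at |ts|.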


5.5.3. Analysis of variance
• The analysis-of-variance approach: the ANOVA table is

Source of variation     | SS                                                                   | df    | MSS
Due to regression (ESS) | $\hat\beta_2 \sum y_i x_{2i} + \dots + \hat\beta_k \sum y_i x_{ki}$ | k − 1 | ESS/(k − 1)
Due to residual (RSS)   | $\sum \hat u_i^2$                                                    | n − k | RSS/(n − k)
Total (TSS)             | $\sum y_i^2$                                                         | n − 1 |


5.5.4. Test the joint effect of regression
coefficients
 Test the overall significance of an observed multiple
regression.
 Test the incremental contribution of an independent
variable/ a group of variables.



Testing the overall significance
• Hypotheses:
$$H_0: \beta_2 = \beta_3 = \dots = \beta_k = 0 \qquad H_1: \beta_2^2 + \beta_3^2 + \dots + \beta_k^2 \ne 0$$
• Null hypothesis: all the independent variables jointly do not explain any variation in the value of Y.
✓This means that R2 = 0.
• Equivalent hypotheses: H0: R2 = 0 versus H1: R2 ≠ 0.


Testing the overall significance
Steps to test the overall significance of a model:
• Step 1: Establish the hypotheses.
• Step 2: Calculate the test statistic:

$$F_s = \frac{ESS/(k - 1)}{RSS/(n - k)} = \frac{R^2/(k - 1)}{(1 - R^2)/(n - k)}$$

• Step 3: Find the critical F value $F_{\alpha}(k - 1, n - k)$.
• Step 4: Conclusion.
✓If Fs > Fα(k − 1, n − k), reject H0.
✓If Fs < Fα(k − 1, n − k), do not reject H0.
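Steps 2 through 4 can be sketched using only the regression summary numbers (the R², n, and k below are made-up values for illustration; scipy assumed):

```python
from scipy import stats

# Hypothetical regression summary: R^2 = 0.82, n = 50 observations, k = 4 parameters.
r2, n, k = 0.82, 50, 4
alpha = 0.05

f_stat = (r2 / (k - 1)) / ((1 - r2) / (n - k))   # F_s from R^2
f_crit = stats.f.ppf(1 - alpha, k - 1, n - k)    # critical F(k-1, n-k)
reject = f_stat > f_crit
print(round(f_stat, 2), round(f_crit, 2), reject)
```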
F-test
$$F = \frac{ESS/df}{RSS/df} = \frac{\left(\hat\beta_2 \sum y_i x_{2i} + \dots + \hat\beta_k \sum y_i x_{ki}\right)/(k - 1)}{\sum \hat u_i^2/(n - k)}$$

• Follows the F-distribution with k − 1 and n − k degrees of freedom.
• Provides a test of the null hypothesis that the true slope coefficients are simultaneously zero.


P-value
• P-value

• If p-value < α => Reject H0


• If p-value > α => Do not reject H0



The incremental contribution of
independent variables
• What is the marginal, or incremental, contribution
of a variable/group of variables, knowing that
another variable is already in the model and that it
is significantly related to the dependent variable?
• Is the incremental contribution statistically
significant?
• What is the criterion for adding variables to the
model?
• When should we remove a group of variables from the model?
10/24/2017 Mai VU-FIE-FTU 77
Test the joint significance of a variable/a
group of variables
• Consider the model: Y = β1 + β2X2+..+ β5X5 + u
Unrestricted model.
Coefficient of determination R2(U)
• Is β2 = β4 = 0? (Do the two variables corresponding to these parameters jointly have no effect on the dependent variable Y?)
• If β2 = β4 = 0, X2 and X4 should not be in the model:
Y = β1’ + β3’X3+ β5’X5 + u
Restricted model
Coefficient of determination R2(R)



Test the joint significance of a variable/a
group of variables
• Step 1: Establish the hypotheses:
$$H_0: \beta_2 = \beta_4 = 0 \qquad H_1: \beta_2^2 + \beta_4^2 \ne 0$$
• Step 2: Calculate the test statistic (m = the number of restrictions, here m = 2):

$$F_s = \frac{(ESS_U - ESS_R)/m}{RSS_U/(n - k)} = \frac{(R_U^2 - R_R^2)/m}{(1 - R_U^2)/(n - k)}$$

• Step 3: Find the critical F value $F_{\alpha}(m, n - k)$.
• Step 4: Conclusion.
✓If Fs > Fα(m, n − k), reject H0.
✓If Fs < Fα(m, n − k), do not reject H0.
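The restricted-versus-unrestricted F test can be sketched end to end (simulated data in which β2 = β4 = 0 holds by construction; numpy and scipy assumed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 60
x2, x3, x4, x5 = rng.normal(size=(4, n))
# True model has beta2 = beta4 = 0, so H0 is true by construction.
y = 1.0 + 1.5 * x3 - 0.7 * x5 + rng.normal(size=n)

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return e @ e

ones = np.ones(n)
rss_u = rss(np.column_stack([ones, x2, x3, x4, x5]), y)  # unrestricted, k = 5
rss_r = rss(np.column_stack([ones, x3, x5]), y)          # restricted: beta2 = beta4 = 0
m, k = 2, 5
f_stat = ((rss_r - rss_u) / m) / (rss_u / (n - k))       # same F, via RSS_R - RSS_U
p_val = stats.f.sf(f_stat, m, n - k)
print(round(f_stat, 3), round(p_val, 3))
```

Since TSS is the same in both models, $ESS_U - ESS_R = RSS_R - RSS_U$, which is the form used in the code.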


Testing the equality of 2 regression
coefficients
• Consider the model: Y = β1 + β2X2 + … + β5X5 + u
• Is β3 = β4? (Do X3 and X4 have the same impact on the dependent variable Y?)
• Test the hypotheses:
$$H_0: \beta_3 = \beta_4 \;(\text{i.e., } \beta_3 - \beta_4 = 0) \qquad H_1: \beta_3 \ne \beta_4$$
• Test statistic:
$$t_s = \frac{(\hat\beta_3 - \hat\beta_4) - (\beta_3 - \beta_4)}{se(\hat\beta_3 - \hat\beta_4)}$$


Testing the equality of 2 regression
coefficients
• Four steps:
• Step 1: Establish the hypotheses.
• Step 2: Find the test statistic (observed t value):
$$t_s = \frac{(\hat\beta_3 - \hat\beta_4) - (\beta_3 - \beta_4)}{se(\hat\beta_3 - \hat\beta_4)}$$
where
$$se(\hat\beta_3 - \hat\beta_4) = \sqrt{\mathrm{var}(\hat\beta_3) + \mathrm{var}(\hat\beta_4) - 2\,\mathrm{cov}(\hat\beta_3, \hat\beta_4)}$$
• Step 3: Find the critical t value tc.
• Step 4: Compare ts to tc. Draw the conclusion.
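A sketch of this t test on simulated data in which β3 = β4 by construction (numpy and scipy assumed; the design and coefficients are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])  # intercept + X2..X5
# True coefficients: beta3 = beta4 = 1.2, so H0 holds by construction.
Y = X @ np.array([1.0, 0.5, 1.2, 1.2, -0.3]) + rng.normal(size=n)
k = X.shape[1]

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
u_hat = Y - X @ beta_hat
cov_beta = (u_hat @ u_hat) / (n - k) * np.linalg.inv(X.T @ X)

# H0: beta3 = beta4 (columns 2 and 3 here, since column 0 is the intercept).
diff = beta_hat[2] - beta_hat[3]
se_diff = np.sqrt(cov_beta[2, 2] + cov_beta[3, 3] - 2 * cov_beta[2, 3])
ts = diff / se_diff
p_val = 2 * stats.t.sf(abs(ts), n - k)
print(round(ts, 3), round(p_val, 3))
```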




6. Prediction
• Predict the value of Y when we know the value of X:
✓Point estimate $\hat Y_0$
✓Mean prediction E(Y | X0)
✓Individual prediction Y0



6. Prediction
• Consider the (two-variable) model:
$$Y_i = \beta_1 + \beta_2 X_i + u_i$$
• Sample regression model:
$$\hat Y_i = \hat\beta_1 + \hat\beta_2 X_i$$
• If we know that X = X0, we predict the mean value of Y and the individual value of Y at the α level of significance.


6.1. Point estimation
• Get the point estimate $\hat Y_0$ by substituting the value X0 into the SRF:
$$\hat Y_0 = \hat\beta_1 + \hat\beta_2 X_0$$


6.2. Mean prediction
$$E(Y \mid X_0) \in \left(\hat Y_0 - \varepsilon_0,\; \hat Y_0 + \varepsilon_0\right)$$
Where
$$\varepsilon_0 = SE(\hat Y_0)\, t_{(n-2,\,\alpha/2)}, \qquad SE(\hat Y_0) = \sqrt{\mathrm{Var}(\hat Y_0)}, \qquad \mathrm{Var}(\hat Y_0) = \sigma^2\left(\frac{1}{n} + \frac{(X_0 - \bar X)^2}{\sum x_i^2}\right)$$


6.3. Individual prediction
$$Y_0 \in \left(\hat Y_0 - \varepsilon_0',\; \hat Y_0 + \varepsilon_0'\right)$$
where
$$\varepsilon_0' = SE(Y_0 - \hat Y_0)\, t_{(n-2,\,\alpha/2)}, \qquad SE(Y_0 - \hat Y_0) = \sqrt{\mathrm{Var}(Y_0 - \hat Y_0)}, \qquad \mathrm{Var}(Y_0 - \hat Y_0) = \sigma^2\left(1 + \frac{1}{n} + \frac{(X_0 - \bar X)^2}{\sum x_i^2}\right)$$
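Both prediction intervals can be sketched for the two-variable case (the sample below is made up for illustration; numpy and scipy assumed):

```python
import numpy as np
from scipy import stats

# Hypothetical two-variable sample.
X = np.array([2.0, 3.0, 5.0, 7.0, 8.0, 10.0, 12.0, 15.0])
Y = np.array([4.1, 5.0, 7.2, 9.1, 9.8, 12.3, 13.9, 17.0])
n = len(X)

b2 = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()
b1 = Y.mean() - b2 * X.mean()
u_hat = Y - (b1 + b2 * X)
sigma2 = (u_hat @ u_hat) / (n - 2)   # residual variance, d.f. = n - 2

x0, alpha = 9.0, 0.05
y0_hat = b1 + b2 * x0
t_crit = stats.t.ppf(1 - alpha / 2, n - 2)
sxx = ((X - X.mean()) ** 2).sum()

var_mean = sigma2 * (1 / n + (x0 - X.mean()) ** 2 / sxx)        # mean prediction
var_indiv = sigma2 * (1 + 1 / n + (x0 - X.mean()) ** 2 / sxx)   # individual prediction

mean_ci = (y0_hat - t_crit * np.sqrt(var_mean), y0_hat + t_crit * np.sqrt(var_mean))
indiv_ci = (y0_hat - t_crit * np.sqrt(var_indiv), y0_hat + t_crit * np.sqrt(var_indiv))
print(mean_ci, indiv_ci)
```

The individual-prediction interval is always wider than the mean-prediction interval, because of the extra "1 +" term in its variance.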


Assignment no 1

1. What is the conditional expectation function or the


population regression function?
2. What is the difference between the population and sample
regression functions? Is this a distinction without
difference?
3. What is the role of the stochastic error term ui in regression analysis? What is the difference between the stochastic error term and the residual $\hat u_i$?
4. Why do we need regression analysis? Why not simply use
the mean value of the regressand as its best value?
5. What do we mean by a linear regression model?



Model: log(wage) = β0+β1educ + β2exper +
β3tenure + u

(i) Interpretation of β1.


(ii) Calculate the exact percentage effect of another year of education on
the predicted wage level.
(iii) Test the null hypothesis that all the slope parameters in the model
are jointly equal to zero using a 1 percent significance level. What do you
conclude ?
(iv) We are interested in constructing a confidence interval for the
(conditional) predicted log(wage) when educ = 13, exper = 11 and tenure =
7. To obtain the standard error for the prediction we need to estimate a
transformed model that is equivalent to (2.1). Derive the transformed
model which will give a direct estimate of the prediction and the standard
error of the prediction.

