Solution Basic Econometrics


B.A. (Hons) Business Economics, 2019

Solution by Dr. Ganesh Manjhi


1. (a) True. Unbiasedness of the OLS estimators requires 𝐸(𝑢𝑖) = 0 and 𝐸(𝑋𝑖𝑢𝑖) = 0 (both are
needed to derive the normal equations by minimizing the residual sum of squares, and the former
is also needed in the step shown below); it does not require normality of the error term.
That is -
From the normal equations obtained by minimizing the residual sum of squares in the 2-variable
regression equation, the solution for 𝛽̂2 can be written as –

𝛽̂2 = ∑𝑥𝑖𝑌𝑖 / ∑𝑥𝑖² = ∑𝑘𝑖𝑌𝑖 ; where 𝑘𝑖 = 𝑥𝑖 / ∑𝑥𝑖².

𝛽̂2 = ∑ 𝑘𝑖 (𝛽1 + 𝛽2 𝑋𝑖 + 𝑢𝑖 )

𝛽̂2 = 𝛽2 + ∑𝑘𝑖𝑢𝑖 [using ∑𝑘𝑖 = 0 and ∑𝑘𝑖𝑋𝑖 = 1]

𝐸(𝛽̂2) = 𝛽2. Similarly, it can be shown that 𝐸(𝛽̂1) = 𝛽1.
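A short Monte Carlo sketch can illustrate this (all numbers are hypothetical; numpy assumed): the slope estimator stays unbiased even with non-normal errors, provided 𝐸(𝑢𝑖) = 0.

```python
import numpy as np

# A Monte Carlo sketch (hypothetical numbers): E(beta2_hat) = beta2 holds with
# non-normal errors (uniform here), as long as E(u_i) = 0.
rng = np.random.default_rng(0)
beta1, beta2, n, reps = 2.0, 0.5, 50, 20000
X = rng.uniform(0.0, 10.0, n)           # regressor held fixed across replications
x = X - X.mean()
k = x / np.sum(x**2)                    # the k_i weights from the derivation above

estimates = np.empty(reps)
for r in range(reps):
    u = rng.uniform(-1.0, 1.0, n)       # non-normal errors with E(u) = 0
    Y = beta1 + beta2 * X + u
    estimates[r] = np.sum(k * Y)        # beta2_hat = sum(k_i * Y_i)

print(estimates.mean())                 # close to the true beta2 = 0.5
```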

(b) False. From the normal equations obtained by minimizing the residual sum of squares in the
2-variable regression equation, the solution for 𝛽̂2 can be written as –

𝛽̂2 = ∑𝑥𝑖𝑌𝑖 / ∑𝑥𝑖² = ∑𝑘𝑖𝑌𝑖 ; where 𝑘𝑖 = 𝑥𝑖 / ∑𝑥𝑖².

𝛽̂2 = ∑ 𝑘𝑖 (𝛽1 + 𝛽2 𝑋𝑖 + 𝑢𝑖 )

𝛽̂2 = 𝛽2 + ∑ 𝑘𝑖 𝑢𝑖

𝐸(𝛽̂2 ) = 𝛽2
To find the variance, we have

𝑉𝑎𝑟(𝛽̂2) = 𝐸[𝛽̂2 − 𝐸(𝛽̂2)]²

⇒ 𝑉𝑎𝑟(𝛽̂2) = 𝐸(∑𝑘𝑖𝑢𝑖)²

⇒ 𝑉𝑎𝑟(𝛽̂2) = 𝐸[𝑘1²𝑢1² + 𝑘2²𝑢2² + ⋯ + 𝑘𝑛²𝑢𝑛² + 2𝑘1𝑘2𝑢1𝑢2 + ⋯ + 2𝑘𝑛−1𝑘𝑛𝑢𝑛−1𝑢𝑛]

⇒ 𝑉𝑎𝑟(𝛽̂2) = ∑𝑘𝑖²𝐸(𝑢𝑖²) + 2∑𝑖<𝑖′ 𝑘𝑖𝑘𝑖′𝐸(𝑢𝑖𝑢𝑖′)

If 𝐸(𝑢𝑖𝑢𝑖′) = 𝑚 ≠ 0, then 𝑉𝑎𝑟(𝛽̂2) = 𝜎²∑𝑘𝑖² + 2𝑚∑𝑖<𝑖′ 𝑘𝑖𝑘𝑖′ ≠ 𝜎²∑𝑘𝑖².


Suppose the variance of the slope coefficient without a serially correlated error term is
𝑉𝑎𝑟(𝛽̂2*) = 𝜎²∑𝑘𝑖²; then, with positive serial correlation, 𝑉𝑎𝑟(𝛽̂2*) < 𝑉𝑎𝑟(𝛽̂2). Hence a
serially correlated error term makes the slope estimator inefficient (see the simulation sketch below).
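A simulation sketch of this inefficiency (the AR(1) design, trending regressor, and all numbers are assumptions chosen purely for illustration):

```python
import numpy as np

# Simulation sketch (all numbers assumed): with a trending regressor, as in a
# time series, positively autocorrelated AR(1) errors inflate the sampling
# variance of the OLS slope relative to i.i.d. errors with the same innovations.
rng = np.random.default_rng(1)
n, reps, rho = 50, 10000, 0.8
X = np.arange(n, dtype=float)
x = X - X.mean()
k = x / np.sum(x ** 2)

def slope_var(serial):
    est = np.empty(reps)
    for r in range(reps):
        e = rng.normal(0, 1, n)
        u = np.empty(n)
        u[0] = e[0]
        for t in range(1, n):
            u[t] = (rho if serial else 0.0) * u[t - 1] + e[t]
        est[r] = np.sum(k * (1.0 + 0.5 * X + u))   # beta2_hat = sum k_i Y_i
    return est.var()

print(slope_var(False), slope_var(True))   # the second variance is clearly larger
```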
(c) False. In the double-log model log(𝑌𝑖) = log(𝛽1) + 𝛽2log(𝑋𝑖) + log(𝑢𝑖), the slope
coefficient is known as the elasticity of the dependent variable (Y) with respect to the
independent variable (X), not the growth rate. Suppose Y is the demand for internet and X is
the income of the consumer; then 𝛽2 is the income elasticity of demand for internet.
The elasticity 𝛽2 can also be written as 𝛽2 = (𝑑𝑌/𝑑𝑋)(𝑋/𝑌), where 𝛽2 is defined as the
percentage change in the demand for internet (Y) per one-percent change in income (X). For
example, 𝛽2 = 1.2 would mean a 1% rise in income is associated with a 1.2% rise in internet demand.
(d) False. The Durbin-Watson test for serial correlation is designed for first-order serial
correlation only. It is based on two assumptions: (i) 𝑌𝑡 = 𝛽1 + 𝛽2𝑋𝑡 + 𝑢𝑡, where 𝑢𝑡 =
𝜌𝑢𝑡−1 + 𝜀𝑡, −1 < 𝜌 < 1, or |𝜌| < 1; 𝜌 is called the coefficient of the error term lagged one
period, also known as the first-order autocorrelation coefficient. (ii) 𝐸(𝜀𝑡) = 0, 𝐸(𝜀𝑡²) = 𝜎𝜀² <
∞ and 𝐸(𝜀𝑡𝜀𝑡−𝑠) = 0 for all 𝑠 ≠ 0. The Durbin-Watson test takes the following steps – (i) Estimate
the OLS model and compute the residuals as 𝑢̂𝑡 = 𝑌𝑡 − (𝛽̂1 + 𝛽̂2𝑋𝑡2 + ⋯ + 𝛽̂𝑘𝑋𝑡𝑘). (ii)
Compute the Durbin-Watson statistic, 𝐷𝑊(𝑑) = ∑_{t=2}^{n}(𝑢̂𝑡 − 𝑢̂𝑡−1)² / ∑_{t=1}^{n}𝑢̂𝑡² ≅ 2(1 − 𝜌̂).
(iii) If 𝑑 < 2, the null and alternative hypotheses are 𝐻0: 𝜌 = 0 and 𝐻1: 𝜌 > 0; if 𝑑 > 2, 𝐻0: 𝜌 = 0 and 𝐻1: 𝜌 < 0.
Look up 𝑑𝐿 and 𝑑𝑈 from the Durbin-Watson table for k′ = number of independent variables
(excluding the constant); reject the null if 𝑑 ≤ 𝑑𝐿 and do not reject the null if 𝑑 ≥ 𝑑𝑈. If 𝑑𝐿 < 𝑑 <
𝑑𝑈, the test is inconclusive. Similarly, for 𝑑 > 2: if 𝑑 ≤ 4 − 𝑑𝑈, do not reject the null; if
𝑑 ≥ 4 − 𝑑𝐿, reject the null; and if 4 − 𝑑𝑈 < 𝑑 < 4 − 𝑑𝐿, the test is inconclusive.
The major problem with DW(d) is that the result for first-order serial correlation can be
inconclusive, in which case we need to conduct the Lagrange Multiplier test or another test
that can also handle higher-order serial correlation. A computational sketch of the steps follows.
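A minimal sketch of steps (i)-(iii) on simulated data (the model and all numbers are hypothetical; numpy assumed):

```python
import numpy as np

# A minimal sketch of the Durbin-Watson steps on simulated data.
rng = np.random.default_rng(2)
n = 60
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])
e = rng.normal(0, 1, n)
u = np.empty(n)
u[0] = e[0]
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + e[t]                  # positively autocorrelated errors
Y = X @ np.array([1.0, 0.5]) + u

beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]   # step (i): OLS estimates
resid = Y - X @ beta_hat                          # step (i): residuals
d = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)   # step (ii): DW statistic
print(d, 1 - d / 2)                               # d < 2 here, implied rho_hat > 0
```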
(e) True. Otherwise, there will be a dummy variable trap. Assume a model with qualitative and
quantitative variables –
𝑌𝑡 = 𝛼1 + 𝛼2 𝐷 + 𝛽𝑋 + 𝑢 (a)
where Y = wage earned; D = 1 if male, 0 otherwise; X = experience. The estimated relationships
for the two groups are –

Female: 𝐸(𝑌𝑡 /𝐷 = 0, 𝑋𝑡 ); 𝑌̂ = 𝛼̂1 + 𝛽̂ 𝑋

Male: 𝐸(𝑌𝑡 /𝐷 = 1, 𝑋𝑡 ); 𝑌̂ = (𝛼̂1 +𝛼̂2 ) + 𝛽̂ 𝑋.


The reason for not defining both dummies together is as follows. Consider the following
example: define 𝐷1 = 1 if male, 0 otherwise, and 𝐷2 = 1 if female, 0 otherwise. In this
situation, 𝑌𝑡 = 𝛼1 + 𝛼2𝐷1 + 𝛼3𝐷2 + 𝛽𝑋 + 𝑢 would have exact multicollinearity because
𝐷1 + 𝐷2 = 1, which reproduces the constant term. This is known as the dummy variable trap. To
avoid this problem, the number of dummy variables used in the regression equation is one less
than the total number of categories possible. However, if the model is modified as 𝑌𝑡 = 𝛽1𝐷1 +
𝛽2𝐷2 + 𝛿𝑋 + 𝑢, without a constant term, then there is no problem of exact multicollinearity and
hence no dummy variable trap (see the sketch below).

(f) False. This follows from the relationship between 𝑅² and 𝑅̅². For example, with N = 26,
k = 6 and 𝑅² = 0.1, we get 𝑅̅² = −0.125, so 𝑅̅² can be negative.
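A quick arithmetic check of this example:

```python
# Quick arithmetic check: adjusted R-squared can be negative.
n, k, r2 = 26, 6, 0.1
r2_adj = 1 - (n - 1) / (n - k) * (1 - r2)
print(r2_adj)    # -0.125
```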
(g) False. Using the deviation form of the equation (so that 𝑦̂𝑖 = 𝛽̂2𝑥𝑖):

∑𝑦̂𝑖𝑢̂𝑖 = 𝛽̂2∑𝑥𝑖𝑢̂𝑖

⇒ ∑𝑦̂𝑖𝑢̂𝑖 = 𝛽̂2∑𝑥𝑖(𝑦𝑖 − 𝛽̂2𝑥𝑖)

⇒ ∑𝑦̂𝑖𝑢̂𝑖 = 𝛽̂2∑𝑥𝑖𝑦𝑖 − 𝛽̂2²∑𝑥𝑖²

⇒ ∑𝑦̂𝑖𝑢̂𝑖 = 𝛽̂2²∑𝑥𝑖² − 𝛽̂2²∑𝑥𝑖² = 0 [since ∑𝑥𝑖𝑦𝑖 = 𝛽̂2∑𝑥𝑖²]

That is, the fitted values and the residuals are uncorrelated.


2. (a) Refer to pp. 102–103 (Appendix, Gujarati, D. N., Basic Econometrics, 4th Edition).

𝜎̂² = ∑𝑢̂𝑡² / (𝑛 − 2); for a given ∑𝑢̂𝑡², as n goes up, 𝜎̂² goes down.

(b) The Durbin-Watson test for serial correlation is designed for first-order serial
correlation only. The assumptions, the statistic 𝐷𝑊(𝑑) = ∑_{t=2}^{n}(𝑢̂𝑡 − 𝑢̂𝑡−1)² / ∑_{t=1}^{n}𝑢̂𝑡²
≅ 2(1 − 𝜌̂), and the decision rules based on 𝑑𝐿 and 𝑑𝑈 are as set out in 1(d) above. The major
problem with DW(d) remains that the result can be inconclusive, in which case a Lagrange
Multiplier or other higher-order test of serial correlation is needed.
For the given values N = 57, k = 2, DW(d) = 0.802, the hypotheses are 𝐻0: 𝜌 = 0 and
𝐻1: 𝜌 > 0 because 𝑑 < 2. From the Durbin-Watson table we find 𝑑𝐿 = 1.49 and 𝑑𝑈 = 1.64.
Notice that 𝑑 < 𝑑𝐿; hence reject the null of no serial correlation and conclude that there is
first-order positive serial correlation in the error term, with estimated value
𝜌̂ = 1 − 𝑑/2 = 0.599. [Note: in many colleges the Durbin-Watson tables may not have been
provided; if students have found the value of 𝜌̂, marks should be given for this.]

3. (a) See Chapter 11, pp. 387–389; illustrate with graphs. Consider a regression equation
𝑌𝑡 = 𝛽1 + 𝛽2𝑋𝑡2 + ⋯ + 𝛽𝑘𝑋𝑡𝑘 + 𝑢𝑡, where 𝑢𝑡 is a random variable and Var(𝑢𝑡/𝑋𝑡) = 𝜎𝑡² for
𝑡 = 1, 2, 3, …, 𝑛. That each observation has a different error variance means the error term is
heteroskedastic. The problem is more prevalent in cross-section and panel data; however, it can
also arise in time series. Methods for detecting heteroskedasticity are as follows – (i) graphical
inspection (students must have explained this); (ii) Lagrange Multiplier tests (Breusch-Pagan
test, Glejser test, Harvey-Godfrey test), Spearman's rank correlation test, Goldfeld-Quandt test,
White test (students must have explained at least one; a sketch of the Breusch-Pagan test is
given after this answer).
Consequences of ignoring heteroskedasticity –
(i) Effects on the properties of estimators: unbiasedness and consistency are not violated by
ignoring heteroskedasticity and using OLS to estimate 𝛽1 and 𝛽2. However, since
Var(𝑢𝑡/𝑋𝑡) = 𝜎² (constant variance) is used to establish the efficiency of 𝛽̂1 and 𝛽̂2, the
Gauss-Markov theorem no longer holds: the OLS estimators are inefficient. That is, it is
possible to find an alternative unbiased linear estimator with lower variance than the OLS estimator.
(ii) Effects on tests of hypotheses: the estimated variances and covariances of the regression
coefficients will be biased and inconsistent, and hence tests of hypotheses (t- and F-tests)
are invalid.
(iii) Since the OLS estimators are still unbiased, forecasts based on these estimates are still
unbiased; but because the estimators are inefficient, forecasts will also be inefficient.
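A minimal sketch of the Breusch-Pagan idea named above, on simulated heteroskedastic data (the auxiliary-regression form LM = n·R² is used; all numbers are illustrative):

```python
import numpy as np

# A minimal sketch of the Breusch-Pagan test: regress squared OLS residuals on
# the regressors and use LM = n * R^2. Data are simulated with error variance
# rising in X, so the test should detect heteroskedasticity.
rng = np.random.default_rng(3)
n = 200
x = rng.uniform(1, 10, n)
y = 1.0 + 0.5 * x + rng.normal(0, x)      # sd of the error grows with x

X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
e2 = (y - X @ b) ** 2                     # squared OLS residuals

g = np.linalg.lstsq(X, e2, rcond=None)[0] # auxiliary regression
fit = X @ g
r2_aux = 1 - np.sum((e2 - fit) ** 2) / np.sum((e2 - e2.mean()) ** 2)
print(n * r2_aux)                         # compare with chi-square(1): 3.84 at 5%
```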

3. (b) (i) The instantaneous growth rate of India's population is 2.4%. The compound growth
rate is Antilog(0.024) − 1 = 2.43%.
(ii) Before 1978: 𝐸(𝐿𝑜𝑔(𝑃𝑜𝑝)𝑡 /𝐷𝑡 = 0, 𝑡): 𝐿𝑜𝑔(𝑃𝑜𝑝)̂𝑡 = 4.77 + 0.015𝑡

1978 onwards: 𝐸(𝐿𝑜𝑔(𝑃𝑜𝑝)𝑡 /𝐷𝑡 = 1, 𝑡): 𝐿𝑜𝑔(𝑃𝑜𝑝)̂𝑡 = (4.77 − 0.075) + (0.015 + 0.011)𝑡
In Model B, the instantaneous growth rate of population before 1978 is 1.5%, and the compound
growth rate is Antilog(0.015) − 1 = 1.51%. From 1978 onwards, the instantaneous growth rate is
0.015 + 0.011 = 0.026, i.e., 2.6% (compound: Antilog(0.026) − 1 = 2.63%); the differential of
0.011 means the post-1978 growth rate is significantly different from the pre-1978 rate (see the
computation sketch after this answer).
(iii) If the dummy variables are defined as
𝐷1𝑡 = 1 if 1978 onwards, 0 otherwise;
𝐷2𝑡 = 1 if before 1978, 0 otherwise,
then using both dummies together gives 𝐷1𝑡 + 𝐷2𝑡 = 1, which reproduces the constant term. This
is known as the dummy variable trap, a case of exact multicollinearity: the regression
coefficients cannot be estimated.
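The growth-rate arithmetic used in this answer, as a quick check:

```python
import math

# Arithmetic behind 3(b): instantaneous growth rates are the log-linear slopes;
# compound rates are antilog(slope) - 1 (natural antilog).
for slope in (0.024, 0.015, 0.015 + 0.011):
    print(f"instantaneous {slope:.3f} -> compound {math.exp(slope) - 1:.4f}")
```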

4. (a) (i) The benchmark category is unmarried and Non-South resident. The mean hourly wage
of the benchmark is $8.81.
(ii) The mean hourly wage of those who are married is about $1.10 higher; the actual hourly
wage is $9.91 (= $8.81 + $1.10).
(iii) For those who live in the South, the mean hourly wage is lower by about $1.67; the
actual wage is $7.14 (= $8.81 − $1.67).
(b) (i) 𝛽̂2 = ∑𝑥𝑖𝑦𝑖 / ∑𝑥𝑖² = 16800/33000 = 0.509

Further, 𝛽̂1 = 𝑌̅ − 𝛽̂2𝑋̅ = 111 − 0.509 × 170 = 24.47. The estimated linear regression is
𝑌̂ = 24.47 + 0.509𝑋.

(ii) For the 2-variable regression equation, 𝑟²𝑋,𝑌 = 𝑅². So 𝑟𝑋,𝑌 = 𝐶𝑜𝑣(𝑋,𝑌)/(𝜎𝑋𝜎𝑌) =
16800/(√33000 × √17099) = 0.70725, and 𝑟²𝑋,𝑌 = 𝑅² = 0.5002. [One can also calculate 𝑅² by
𝑅² = 𝛽̂2²∑𝑥𝑖² / ∑𝑦𝑖².]
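A quick check of this arithmetic from the given summary statistics:

```python
import math

# Quick check of 4(b) from the given sums of squares and cross-products.
sum_xy, sum_x2, sum_y2 = 16800.0, 33000.0, 17099.0
y_bar, x_bar = 111.0, 170.0

b2 = sum_xy / sum_x2                      # 0.509
b1 = y_bar - b2 * x_bar                   # 24.47
r = sum_xy / math.sqrt(sum_x2 * sum_y2)   # 0.70725
print(b2, b1, r, r ** 2)                  # r^2 = R^2 = 0.5002
```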

(c) Consider the regression equation

𝑌𝑡 = 𝛽1 + 𝛽2𝑋𝑡 + 𝑢𝑡

Consider the scaling 𝑌𝑡* = 𝑤1𝑌𝑡, 𝑋𝑡* = 𝑤2𝑋𝑡 and 𝑢𝑡* = 𝑤1𝑢𝑡, and estimate the equation
𝑌𝑡* = 𝛽̂1* + 𝛽̂2*𝑋𝑡* + 𝑢̂𝑡*. Applying OLS to the scaled variables, the solutions are –

𝛽̂2* = (𝑤1/𝑤2)𝛽̂2; for 𝑤1 = 1 and 𝑤2 = 10, 𝛽̂2* = 𝛽̂2/10

𝛽̂1* = 𝑤1𝛽̂1; for 𝑤1 = 1 and 𝑤2 = 10, 𝛽̂1* = 𝛽̂1

The fitted value from the original equation is 𝑌̂𝑡 = 𝛽̂1 + 𝛽̂2𝑋𝑡, and with the new scaling
𝑌̂𝑡* = 𝛽̂1* + 𝛽̂2*𝑋𝑡* = 𝛽̂1 + (𝛽̂2/10)(10𝑋𝑡) = 𝑌̂𝑡. That is, only the slope coefficient changes,
not the intercept, and the fitted values are unaffected. Further, with the original equation the
residual is 𝑢̂𝑡 = 𝑌𝑡 − 𝑌̂𝑡, and with the new scaling 𝑢̂𝑡* = 𝑢̂𝑡 = 𝑌𝑡 − 𝑌̂𝑡. The residuals do not change.

If 10 is added to the variable X (i.e., 𝑋𝑡* = 𝑋𝑡 + 10), the fitted equation becomes

𝑌̂𝑡 = 𝛽̂1 + 𝛽̂2𝑋𝑡 = 𝛽̂1 + 𝛽̂2(𝑋𝑡* − 10)

⇒ 𝑌̂𝑡 = (𝛽̂1 − 10𝛽̂2) + 𝛽̂2𝑋𝑡*

⇒ 𝑌̂𝑡 = 𝛽̂1* + 𝛽̂2𝑋𝑡*, where 𝛽̂1* = 𝛽̂1 − 10𝛽̂2. In this case, the intercept is the only term that
changes, not the slope coefficient. There is no change in the residuals or fitted values when 10
is added to the variable X (see the numerical check below).
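A numerical check of both results (simulated data; the numbers are illustrative only):

```python
import numpy as np

# Numerical check of 4(c): scaling X by 10 divides the slope by 10 and leaves
# the intercept, fitted values, and residuals unchanged; adding 10 to X shifts
# only the intercept (by -10 * slope).
rng = np.random.default_rng(4)
X = rng.uniform(0, 5, 30)
Y = 2.0 + 1.5 * X + rng.normal(0, 1, 30)

def ols(x, y):
    b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - b2 * x.mean(), b2   # (intercept, slope)

print(ols(X, Y))        # (b1, b2)
print(ols(10 * X, Y))   # (b1, b2 / 10)
print(ols(X + 10, Y))   # (b1 - 10 * b2, b2)
```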

5. (a) (i) For the given 𝑅̅² = 0.277, first find the value of 𝑅² and then calculate the F-statistic
for overall significance using 𝐹𝑐 = [𝑅²/(𝑘−1)] / [(1−𝑅²)/(𝑛−𝑘)].

𝑅² = 1 − [(𝑛−𝑘)/(𝑛−1)](1 − 𝑅̅²) = 0.3222,

and 𝐹𝑐 = [0.3222/(4−1)] / [(1−0.3222)/(49−4)] = 4.833/0.6778 = 7.130 ~ 𝐹(3,45). The critical
value is 𝐹*(3,45) = 4.31 at the 1% level of significance. Since 𝐹𝑐 > 𝐹*(3,45), we reject the null
that the coefficients of Education, Experience and Age are simultaneously zero and conclude that
the model is overall significant.
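The arithmetic of part (i) as a quick check:

```python
# Quick check of part (i): recover R^2 from the adjusted R^2, then form the
# overall F-statistic.
n, k, r2_adj = 49, 4, 0.277
r2 = 1 - (n - k) / (n - 1) * (1 - r2_adj)
F = (r2 / (k - 1)) / ((1 - r2) / (n - k))
print(r2, F)    # 0.3222 and 7.13, compared with F*(3,45) = 4.31 at 1%
```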
(ii) For EDUC, 𝑡𝑐 = 142.510/34.86 = 4.088 > (𝑡* = 2.70): significant at the 1% level of significance.
For EXPER, 𝑡𝑐 = 43.225/14.303 = 3.0221 > (𝑡* = 2.70): significant at the 1% level of significance.
For AGE, 𝑡𝑐 = −1.913/8.695 = −0.2200, and |𝑡𝑐| < (𝑡* = 2.70): insignificant at the 1% level of significance.

(iii) This might be because of the backward-bending labour supply curve with respect to age, or
due to multicollinearity with experience (EXPER), or both.
(iv) Even though the t-statistic is low, the expected sign is correct and the result is
conceptually sound. Removing AGE from the regression equation would omit a relevant variable
and could cause omitted-variable bias in the remaining estimators.
(b) By definition,

𝐶𝑜𝑣(𝛽̂1 , 𝛽̂2 ) = 𝐸{[𝛽̂1 − 𝐸(𝛽̂1 )][𝛽̂2 − 𝐸(𝛽̂2 )]}

= 𝐸{[𝛽̂1 − 𝛽1][𝛽̂2 − 𝛽2]} [Since 𝛽̂1 = 𝑌̅ − 𝛽̂2𝑋̅ and 𝑌̅ = 𝛽1 + 𝛽2𝑋̅ + 𝑢̅, we have
𝛽̂1 − 𝛽1 = −(𝛽̂2 − 𝛽2)𝑋̅ + 𝑢̅; the 𝑢̅ term drops out in expectation because
𝐸[𝑢̅(𝛽̂2 − 𝛽2)] = 𝐸[𝑢̅∑𝑘𝑖𝑢𝑖] = (𝜎²/𝑛)∑𝑘𝑖 = 0]

= −𝑋̅𝐸(𝛽̂2 − 𝛽2)²

= −𝑋̅𝑉𝑎𝑟(𝛽̂2)

6. (a) 𝑆𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝐸𝑟𝑟𝑜𝑟(𝛼̂) = 𝛼̂/𝑡-ratio = 26.034/14.955 = 1.7408, and

𝑡-ratio = 𝛽̂/𝑆𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝐸𝑟𝑟𝑜𝑟(𝛽̂) = 0.137/0.028 = 4.8928

𝑟𝑉,𝑃 = 𝐶(𝑉,𝑃)/(𝜎𝑉𝜎𝑃), and also 𝐶(𝑉,𝑃) = 𝛽̂𝑆𝑃² = 126.84967, so 𝑟𝑉,𝑃 =
126.84967/(√31.954 × √925.91) = 0.73744

In the 2-variable regression equation, we have 𝑅² = 𝑟²𝑉,𝑃 = 0.5438. Next, with N = 22 and k = 2,

𝜎̂² = ∑𝑢̂𝑡²/(𝑛−𝑘) = 𝐸𝑆𝑆/(𝑛−𝑘) = 305.96/20 = 15.298, where ESS is the error sum of squares as
provided in the question. However, since the full form of ESS was not given in the question
paper, students might have found 𝜎̂² taking ESS as the explained sum of squares; in that case
𝜎̂² = 12.834. [That is, 𝑇𝑆𝑆 = 𝐸𝑆𝑆/𝑅², where ESS is the explained sum of squares, so
𝑇𝑆𝑆 = 305.96/0.5438 = 562.6333; 𝑅𝑆𝑆/𝑇𝑆𝑆 = 1 − 𝑅², so 𝑅𝑆𝑆 = (1 − 𝑅²)𝑇𝑆𝑆 =
0.4562 × 562.6333 = 256.6733, and 𝜎̂² = 𝑅𝑆𝑆/(𝑛−𝑘) = 256.6733/20 = 12.834.] Both answers may
be treated as correct, though only the former is consistent with the information provided.

𝑉̅ = 𝛼̂ + 𝛽̂𝑃̅
𝑉̅ = 26.034 + 0.137 × 54.478
𝑉̅ = 33.4975
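A quick check of the arithmetic in 6(a) from the reported summary statistics:

```python
import math

# Checking the arithmetic in 6(a) from the reported summary statistics.
alpha_hat, t_alpha = 26.034, 14.955
beta_hat, se_beta = 0.137, 0.028
S2_V, S2_P, P_bar = 31.954, 925.91, 54.478   # sample variances and mean from the question

se_alpha = alpha_hat / t_alpha               # 1.7408
t_beta = beta_hat / se_beta                  # 4.8928
cov_VP = beta_hat * S2_P                     # 126.85
r_VP = cov_VP / math.sqrt(S2_V * S2_P)       # 0.7374, so R^2 = 0.5438
V_bar = alpha_hat + beta_hat * P_bar         # 33.4975
print(se_alpha, t_beta, r_VP, r_VP ** 2, V_bar)
```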

(b) Consider a regression equation,

𝑌𝑡 = 𝑌̂𝑡 + 𝑢̂𝑡

Writing the equation in deviation form, then squaring and summing:
𝑦𝑡 = 𝑦̂𝑡 + 𝑢̂𝑡
∑𝑦𝑡² = ∑𝑦̂𝑡² + ∑𝑢̂𝑡² [since 2∑𝑦̂𝑡𝑢̂𝑡 = 0]

𝑇𝑆𝑆 = 𝐸𝑆𝑆 + 𝑅𝑆𝑆


Dividing the equation throughout by TSS gives

1 = 𝐸𝑆𝑆/𝑇𝑆𝑆 + 𝑅𝑆𝑆/𝑇𝑆𝑆.

𝑅² is defined as 𝑅² = 𝐸𝑆𝑆/𝑇𝑆𝑆 = 1 − 𝑅𝑆𝑆/𝑇𝑆𝑆,

and 𝑅̅² = 1 − [∑𝑢̂𝑡²/(𝑛−𝑘)] / [∑𝑦𝑡²/(𝑛−1)] = 1 − [𝑅𝑆𝑆/(𝑛−𝑘)] / [𝑇𝑆𝑆/(𝑛−1)] = 1 − [(𝑛−1)/(𝑛−𝑘)](1 − 𝑅²)
𝑅² and 𝑅̅² measure the proportion of variation in the dependent variable explained by the
independent variables; the former is used in the case of the 2-variable regression equation and
the latter in the case of regressions with more than two variables.
With the given information, N = 26, k = 6 and 𝑅² = 0.1, the value is
𝑅̅² = 1 − [(𝑛−1)/(𝑛−𝑘)](1 − 𝑅²) = −0.125.

𝑅̅² is the adjusted 𝑅², i.e., adjusted for the degrees of freedom associated with the sums of
squares entering 𝑅² = 1 − 𝑅𝑆𝑆/𝑇𝑆𝑆. That is, ∑𝑢̂𝑡² has (𝑛 − 𝑘) d.f. in a model involving k
parameters including the intercept term, and ∑𝑦𝑡² has (𝑛 − 1) d.f. From the formula for 𝑅̅² –
(i) for 𝑘 > 1, 𝑅̅² < 𝑅², implying that as the number of X variables increases, 𝑅̅² increases by
less than the unadjusted 𝑅²; and (ii) 𝑅̅² can be negative, although 𝑅² is necessarily non-negative.
𝑅̅² is a better measure of goodness of fit because it allows for the trade-off between an
increased 𝑅² and reduced d.f. Note that (𝑛−1)/(𝑛−𝑘) is never less than 1, so 𝑅̅² will never be
higher than 𝑅².

7. Write Short Notes-


(a) Jarque Bera (JB) Test
The Jarque-Bera (JB) test of normality is an asymptotic, or large-sample, test. The test
statistic is 𝐽𝐵 = 𝑛[𝑆²/6 + (𝐾−3)²/24], where n = sample size, S = skewness coefficient, and
K = kurtosis coefficient. Under the null hypothesis that the residuals are normally distributed,
S = 0 and K = 3 jointly, so the JB value is expected to be close to 0. The JB statistic follows
the chi-square distribution with 2 d.f. If the computed p-value of the JB statistic in an
application is sufficiently low, which will happen if the value of JB is very different from 0,
one can reject the null hypothesis that the residuals are normally distributed. But if the
p-value is reasonably high, which will happen if the value of the statistic is close to zero, we
do not reject the normality assumption.
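A minimal computational sketch of the JB statistic on simulated residuals (illustrative data; numpy assumed):

```python
import numpy as np

# A minimal sketch of the JB statistic on simulated residuals; try
# rng.exponential(1, 500) instead to see a clear rejection.
rng = np.random.default_rng(5)
u = rng.normal(0, 1, 500)

n = len(u)
m = u - u.mean()
S = np.mean(m ** 3) / np.mean(m ** 2) ** 1.5   # skewness coefficient
K = np.mean(m ** 4) / np.mean(m ** 2) ** 2     # kurtosis coefficient
JB = n * (S ** 2 / 6 + (K - 3) ** 2 / 24)
print(JB)                                      # compare with chi-square(2): 5.99 at 5%
```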
(b) Perfect Multicollinearity
If two independent variables are closely related, multicollinearity exists; if there is an exact
linear relationship between them, we have perfect (exact) multicollinearity.
Consider the case 𝑥𝑡3 = 2𝑥𝑡2, and write the regression equation in deviation form:
𝑦𝑡 = 𝛽2𝑥𝑡2 + 𝛽3𝑥𝑡3 + 𝑣𝑡. Minimizing the sum of squared residuals, the two normal equations
can be written as –

𝛽̂2∑𝑥𝑡2² + 𝛽̂3∑𝑥𝑡2𝑥𝑡3 = ∑𝑦𝑡𝑥𝑡2 (a)

𝛽̂2∑𝑥𝑡2𝑥𝑡3 + 𝛽̂3∑𝑥𝑡3² = ∑𝑦𝑡𝑥𝑡3 (b)

Substituting 𝑥𝑡3 = 2𝑥𝑡2 into Eq. (b) gives

𝛽̂2∑𝑥𝑡2(2𝑥𝑡2) + 𝛽̂3∑𝑥𝑡3(2𝑥𝑡2) = ∑𝑦𝑡(2𝑥𝑡2). This further gives

2𝛽̂2∑𝑥𝑡2² + 2𝛽̂3∑𝑥𝑡2𝑥𝑡3 = 2∑𝑦𝑡𝑥𝑡2 (c)

Eq. (c) is simply twice Eq. (a), so Eqs. (a) and (b) are not independent and cannot yield unique
estimates of 𝛽̂2 and 𝛽̂3. In matrix form, we get a singular matrix: the determinant of (𝑋′𝑋) is
∆ = 0, and hence we cannot invert it and solve the system. This is the problem of exact
multicollinearity (see the sketch below).
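A small numerical sketch (hypothetical data) showing the singular (𝑋′𝑋):

```python
import numpy as np

# Hypothetical data with x3 = 2 * x2: the (X'X) matrix of the deviation-form
# regressors is singular, so the normal equations have no unique solution.
rng = np.random.default_rng(6)
x2 = rng.uniform(-1, 1, 40)
x3 = 2 * x2
X = np.column_stack([x2, x3])
print(np.linalg.det(X.T @ X))   # ~0: exact multicollinearity
```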

(c) Dummy Variable Trap


Consider a model with qualitative and quantitative variables –
𝑌𝑡 = 𝛼1 + 𝛼2 𝐷 + 𝛽𝑋 + 𝑢 (a)
where Y = wage earned; D = 1 if male, 0 otherwise; X = experience. The estimated relationships
for the two groups are –

Female: 𝐸(𝑌𝑡 /𝐷 = 0, 𝑋𝑡 ); 𝑌̂ = 𝛼̂1 + 𝛽̂ 𝑋

Male: 𝐸(𝑌𝑡 /𝐷 = 1, 𝑋𝑡 ); 𝑌̂ = (𝛼̂1 +𝛼̂2 ) + 𝛽̂ 𝑋.


The natural question that arises is whether there is any difference in the relationship between
the groups. Comparing the equations for females and males, we can test 𝐻0: 𝛼2 = 0 against
𝐻1: 𝛼2 > 0 or 𝐻1: 𝛼2 ≠ 0. The appropriate test is a t-test on 𝛼̂2 with 𝑛 − 3 d.f. There is a
specific reason for not defining dummies for both categories. For example, define 𝐷1 = 1 if
male, 0 otherwise, and 𝐷2 = 1 if female, 0 otherwise. In this situation, 𝑌𝑡 = 𝛼1 + 𝛼2𝐷1 +
𝛼3𝐷2 + 𝛽𝑋 + 𝑢 would have exact multicollinearity because 𝐷1 + 𝐷2 = 1, which reproduces the
constant term. This is known as the dummy variable trap. To avoid this problem, the number of
dummy variables used in the regression equation is always one less than the total number of
categories possible. However, if the model is modified as 𝑌𝑡 = 𝛽1𝐷1 + 𝛽2𝐷2 + 𝛿𝑋 + 𝑢, without
a constant term, then there is no problem of exact multicollinearity and hence no dummy
variable trap.
(d) ANOVA
From the identity ∑𝑦𝑡² = ∑𝑦̂𝑡² + ∑𝑢̂𝑡²,

which implies ∑𝑦𝑡² = 𝛽̂2²∑𝑥𝑡² + ∑𝑢̂𝑡²

𝑇𝑆𝑆 = 𝐸𝑆𝑆 + 𝑅𝑆𝑆, where TSS = total sum of squares, ESS = explained sum of squares,
RSS = residual sum of squares. The study of the components of TSS is known as the analysis of
variance (ANOVA) from the regression viewpoint.
Associated with any sum of squares is its degrees of freedom, the number of independent
observations on which it is based. TSS has n−1 degrees of freedom because we lose 1 d.f. in
computing the sample mean 𝑌̅. RSS has n−2 degrees of freedom (true for the 2-variable
regression equation, where 𝛽1 is present). ESS has 1 d.f., which follows from the fact that
𝐸𝑆𝑆 = 𝛽̂2²∑𝑥𝑡² is a function of 𝛽̂2 only, since ∑𝑥𝑡² is known.
ANOVA Table for the Two-Variable Regression Model

Source of Variation     | SS*                  | d.f. | MSS+
Due to Regression (ESS) | ∑𝑦̂𝑡² = 𝛽̂2²∑𝑥𝑡²  | 1    | 𝛽̂2²∑𝑥𝑡²
Due to Residual (RSS)   | ∑𝑢̂𝑡²               | n−2  | ∑𝑢̂𝑡²/(𝑛−2) = 𝜎̂²
TSS                     | ∑𝑦𝑡²                | n−1  |

* Sum of squares. + Mean sum of squares, obtained by dividing SS by its d.f.
From the table, consider the following variable: 𝐹 = (𝑀𝑆𝑆 𝑜𝑓 𝐸𝑆𝑆)/(𝑀𝑆𝑆 𝑜𝑓 𝑅𝑆𝑆) =
𝛽̂2²∑𝑥𝑡² / [∑𝑢̂𝑡²/(𝑛−2)] = 𝛽̂2²∑𝑥𝑡²/𝜎̂². Assuming 𝑢𝑖 is normally distributed, and under the
null hypothesis (𝐻0) that 𝛽2 = 0, it can be shown that the F-variable follows the F-distribution
with 1 d.f. in the numerator and n−2 d.f. in the denominator.
It can be shown that

𝐸(𝛽̂2²∑𝑥𝑡²) = 𝜎² + 𝛽2²∑𝑥𝑡² (a)

𝐸(∑𝑢̂𝑡²/(𝑛−2)) = 𝐸(𝜎̂²) = 𝜎² (b)
If 𝛽2 = 0, Eqs. (a) and (b) provide identical estimates of the true 𝜎²; in this case X has no
linear influence on Y, and all the variation in Y is explained by the random error term 𝑢𝑖. If
𝛽2 ≠ 0, Eqs. (a) and (b) will differ, and part of the variation in Y will be explained by X.
Therefore the F-test provides a test of the null hypothesis 𝐻0: 𝛽2 = 0: if 𝐹𝑐 > 𝐹* (tabulated
value), reject the null, the probability of committing a Type I error being low (see the sketch below).
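A small sketch of the ANOVA decomposition and the F-variable on simulated data (all numbers illustrative):

```python
import numpy as np

# ANOVA decomposition and F-variable for the two-variable model, on simulated data.
rng = np.random.default_rng(7)
n = 40
X = rng.uniform(0, 10, n)
Y = 3.0 + 0.8 * X + rng.normal(0, 2, n)

x, y = X - X.mean(), Y - Y.mean()
b2 = np.sum(x * y) / np.sum(x ** 2)
ess = b2 ** 2 * np.sum(x ** 2)        # explained sum of squares, 1 d.f.
tss = np.sum(y ** 2)                  # total sum of squares, n-1 d.f.
rss = tss - ess                       # residual sum of squares, n-2 d.f.
F = ess / (rss / (n - 2))             # compare with F(1, n-2)
print(ess, rss, tss, F)
```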

(e) Structural Stability


If we have time-series data, it may happen that there is structural change in the relationship
between the regressand Y and the regressors; that is, the values of the parameters of the model
do not remain the same over the entire time period. Structural change can be due to external
factors, such as the impact of war or oil prices, or due to policy changes. Suppose we have a
regression of Y (savings) on X (disposable income), and the expected structural breaks are as
follows:
1970-1981: 𝑌𝑡 = 𝛼1 + 𝛼2 𝑋𝑡 + 𝑢1𝑡 , 𝑛1 = 12 (a)
1982-1995: 𝑌𝑡 = 𝛽1 + 𝛽2 𝑋𝑡 + 𝑢2𝑡 , 𝑛2 = 14 (b)
1970-1995: 𝑌𝑡 = 𝛾1 + 𝛾2 𝑋𝑡 + 𝑢𝑡 , 𝑛 = 𝑛1 + 𝑛2 = 26 (c)
Null hypothesis: 𝐻0: 𝛼1 = 𝛽1 = 𝛾1 and 𝛼2 = 𝛽2 = 𝛾2. That is, structural change can occur
through the intercept term, the slope coefficient, or both. Here, the slope coefficient is the
marginal propensity to save (MPS). One possible method for testing structural stability is the
Chow test.
Assume, (i) 𝑢1𝑡 ~𝑁(0, 𝜎 2 ) and 𝑢2𝑡 ~𝑁(0, 𝜎 2 ), and (ii) 𝑢1𝑡 and 𝑢2𝑡 are independently and
identically distributed. Steps for the Chow test are as follows –
(i) Estimate Eq. (c) and obtain RSS3 with 𝑑.𝑓. = 𝑛1 + 𝑛2 − 𝑘. RSS3 is called the restricted
residual sum of squares (RSSR) because it is obtained by imposing the restrictions 𝛼1 = 𝛽1 and
𝛼2 = 𝛽2, i.e., that the sub-period regressions are not different.
(ii) Estimate Eq. (a) and get RSS1 with 𝑑. 𝑓. = 𝑛1 − 𝑘.
(iii) Estimate Eq. (b) and get RSS2 with 𝑑. 𝑓. = 𝑛2 − 𝑘.
(iv) Since the two samples are independent, we can add RSS1 and RSS2 to obtain what we may call
the unrestricted residual sum of squares (RSSU). That is, we obtain 𝑅𝑆𝑆𝑈 = 𝑅𝑆𝑆1 + 𝑅𝑆𝑆2 with
𝑑.𝑓. = 𝑛1 + 𝑛2 − 2𝑘.
(v) Calculate 𝐹𝑐 = [(𝑅𝑆𝑆𝑅 − 𝑅𝑆𝑆𝑈)/𝑘] / [𝑅𝑆𝑆𝑈/(𝑛1 + 𝑛2 − 2𝑘)] ~ 𝐹(𝑘, 𝑛1+𝑛2−2𝑘). If
𝐹𝑐 > 𝐹* (tabulated value), reject the null of parameter (structural) stability; otherwise, do
not reject the null.
Objections to the Chow test are as follows – (i) the assumptions underlying the test must be
fulfilled (for example, 𝜎1² = 𝜎2²); (ii) the Chow test tells us only whether the two regressions
(a) and (b) are different, not whether the difference is on account of the intercept, the slopes,
or both; (iii) the Chow test assumes that we know the point(s) of structural break. A
computational sketch follows.
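A minimal sketch of the Chow steps on simulated data (the sub-sample sizes follow the example above; everything else is an assumption for illustration):

```python
import numpy as np

# A minimal sketch of the Chow test; the two sub-periods are simulated with
# different slopes, so the test should reject stability.
rng = np.random.default_rng(8)
n1, n2, k = 12, 14, 2

def rss(x, y):
    X = np.column_stack([np.ones(len(x)), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return e @ e

x1 = rng.uniform(0, 10, n1); y1 = 1.0 + 0.3 * x1 + rng.normal(0, 1, n1)
x2 = rng.uniform(0, 10, n2); y2 = 1.0 + 0.9 * x2 + rng.normal(0, 1, n2)   # slope shifts

rss_r = rss(np.concatenate([x1, x2]), np.concatenate([y1, y2]))   # restricted (pooled)
rss_u = rss(x1, y1) + rss(x2, y2)                                 # unrestricted
F = ((rss_r - rss_u) / k) / (rss_u / (n1 + n2 - 2 * k))
print(F)   # compare with F(k, n1 + n2 - 2k)
```

If 𝐹𝑐 exceeds the tabulated 𝐹, the null of structural stability is rejected, as in step (v).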
