
STATISTICS for Finance

Topic #3:
REGRESSION MISSPECIFICATION
(Video K)
STATISTICS IN FINANCE A.M. Fuertes/ 1 Introduction 23

Review Questions
Q1: Suppose that we are testing the null hypothesis H0 : β = 0.75 against the alternative
HA : β > 0.75, where β denotes the systematic risk of stock XYZ. The test statistic employed for the
analysis follows a t distribution with 3 degrees of freedom. The observed statistic, based on a linear
regression of stock prices (y) on the market index (x) including an intercept, equals 3.2.
(i) What is the rejection probability (p-value) of the test?
(ii) What is the critical value of the test at the 5% level?
(iii) Use your answers to (i) and (ii), separately, to answer the question: Is the hypothesis rejected?
Do you obtain the same answer in both cases?
(iv) What is the sample size used?
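A minimal sketch of how (i), (ii) and (iv) could be checked numerically. It assumes the one-sided alternative stated above and that the 3 degrees of freedom come from df = T − 2 (intercept plus slope). To stay dependency-free it uses the closed-form CDF of the t distribution with 3 degrees of freedom and a simple bisection for the quantile, rather than a statistics library.

```python
import math


def t3_cdf(t: float) -> float:
    """CDF of Student's t with 3 degrees of freedom (closed form)."""
    u = t / math.sqrt(3.0)
    return 0.5 + (u / (1.0 + u * u) + math.atan(u)) / math.pi


def t3_ppf(p: float, lo: float = -50.0, hi: float = 50.0) -> float:
    """Quantile of t(3), found by bisection on the monotone CDF."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if t3_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


t_obs = 3.2
p_value = 1.0 - t3_cdf(t_obs)   # (i) one-sided p-value, HA: beta > 0.75
crit_5 = t3_ppf(0.95)           # (ii) 5% one-sided critical value
T = 3 + 2                       # (iv) df = T - k with k = 2 estimated coefficients

print(f"p-value = {p_value:.4f}, 5% critical value = {crit_5:.3f}")
print("reject H0 (p-value rule):", p_value < 0.05, "| reject H0 (critical value rule):", t_obs > crit_5)
print("T =", T)
```

Both decision rules are printed side by side so the comparison asked for in (iii) is visible.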

Q2: Let X and Y be two (discrete or continuous) random variables. A useful expression for
the covariance is Cov(X, Y) = E(XY) − µXµY, where µX = E(X) and µY = E(Y). Derive this
expression from the definition of covariance, using the properties of the expectation operator.
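One possible layout of the derivation, using only linearity of expectation and the fact that µX and µY are constants:

```latex
\begin{aligned}
\operatorname{Cov}(X,Y) &= E\big[(X-\mu_X)(Y-\mu_Y)\big] \\
&= E\big[XY - \mu_Y X - \mu_X Y + \mu_X\mu_Y\big] \\
&= E(XY) - \mu_Y E(X) - \mu_X E(Y) + \mu_X\mu_Y \\
&= E(XY) - \mu_X\mu_Y - \mu_X\mu_Y + \mu_X\mu_Y \\
&= E(XY) - \mu_X\mu_Y .
\end{aligned}
```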

Q3: Let X denote a Normally distributed random variable with mean µX and standard deviation
σX. Show that subtracting its mean and dividing by its standard deviation yields another Normal
random variable with zero mean and unit standard deviation, Z = (X − µX)/σX ∼ N(0, 1).

Q4: Let X and Y represent the monthly percentage returns of assets A and B, respectively, which
are Normally distributed. The expected return and variance (risk measure) for asset A are 2
and 1.3, respectively, and for asset B they are 0.9 and 0.5. So we can write X ∼ N(2, 1.3) and
Y ∼ N(0.9, 0.5).
(i) What is the probability that an equally-weighted portfolio P1 = 0.5X + 0.5Y has a monthly return of more
than 1.75%? Answer this question assuming: (a) corr(X, Y) = 0, (b) corr(X, Y) = −1.
(ii) What is the probability that the portfolio P2 = 0.8X + 0.2Y has a monthly return between
1.75% and 2%? Answer this question assuming that the asset returns X and Y are independent.
(iii) Show geometrically (i.e., in a graph) the probabilities calculated in (i) and (ii) for the independence case.
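A minimal sketch for (i) and (ii), assuming X and Y are jointly Normal so that any portfolio wX·X + wY·Y is Normal with mean wXµX + wYµY and variance wX²σX² + wY²σY² + 2wXwY·ρσXσY. Note that the second argument of N(·, ·) in the question is a variance, while Python's `NormalDist` takes a standard deviation.

```python
from math import sqrt
from statistics import NormalDist

MU_X, VAR_X = 2.0, 1.3   # asset A: X ~ N(2, 1.3), second argument is the variance
MU_Y, VAR_Y = 0.9, 0.5   # asset B: Y ~ N(0.9, 0.5)


def portfolio(w_x: float, w_y: float, corr: float) -> NormalDist:
    """Distribution of w_x*X + w_y*Y for jointly Normal X, Y with correlation corr."""
    mean = w_x * MU_X + w_y * MU_Y
    cov = corr * sqrt(VAR_X * VAR_Y)
    var = w_x**2 * VAR_X + w_y**2 * VAR_Y + 2 * w_x * w_y * cov
    return NormalDist(mean, sqrt(var))   # NormalDist(mu, sigma)


# (i) equally-weighted portfolio: P(P1 > 1.75)
p_i_a = 1 - portfolio(0.5, 0.5, 0.0).cdf(1.75)    # (a) corr = 0
p_i_b = 1 - portfolio(0.5, 0.5, -1.0).cdf(1.75)   # (b) corr = -1

# (ii) P2 = 0.8X + 0.2Y, independent returns: P(1.75 < P2 < 2)
p2 = portfolio(0.8, 0.2, 0.0)
p_ii = p2.cdf(2.0) - p2.cdf(1.75)

print(f"(i)(a) {p_i_a:.4f}  (i)(b) {p_i_b:.4f}  (ii) {p_ii:.4f}")
```

The perfect negative correlation in (b) shrinks the portfolio variance sharply, which is why the tail probability drops.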

Q5: The following cross-section sample of abnormal returns (i.e., returns in excess of the average
or market return), in percent, has been obtained for a fund after a political event: {0.2, 0.5, 0.12, −0.15,
1.3, 0.72, 0.23, 0.55, 2.5, −1.5}. Test the hypothesis that the event has a significant effect on the stock
market.
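A minimal sketch of a one-sample t test of H0: mean abnormal return = 0 against a two-sided alternative. The critical value 2.262 is the standard table value for t(9) at the 5% level (two-sided); treating the ten observations as an iid Normal sample is an assumption.

```python
from math import sqrt
from statistics import mean, stdev

ar = [0.2, 0.5, 0.12, -0.15, 1.3, 0.72, 0.23, 0.55, 2.5, -1.5]   # abnormal returns, %
n = len(ar)

t_stat = mean(ar) / (stdev(ar) / sqrt(n))   # stdev uses the n - 1 divisor
T_CRIT_9 = 2.262                            # t(9), 5% two-sided (table value)

print(f"mean = {mean(ar):.3f}, t = {t_stat:.3f}, reject H0: {abs(t_stat) > T_CRIT_9}")
```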

Q6: Explain intuitively (using a graph) the unbiasedness and consistency properties of an estimator.

Q7: The question of interest is whether US mutual funds outperformed UK mutual funds in the
year 2006. To examine this issue empirically, the 2008 returns of 15 randomly selected mutual funds
of comparable characteristics in the US and UK were recorded with values US(%)={1.3, -0.2, 1.7, 2.1,
1.8, -0.3, 1.4, 1.5, -1.3, 2.0, 3.1, 2.2, -0.7, -1.1, 1.5} and UK(%)={-1.1, -0.5, 1.3, 1.2, 1.6, 0.9, -1.3,
1.7, 2.3, 1.1, -1.3, 2.5, 0.3, 1.1, -1.7}. Test the hypothesis of equal fund performance in the following
scenarios: a) the risk in the two markets can be assumed identical, b) unequal risk.
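A minimal sketch of the two scenarios: (a) the pooled-variance t statistic with n1 + n2 − 2 degrees of freedom, and (b) the Welch statistic with Welch-Satterthwaite degrees of freedom. Because both samples have 15 observations, the two statistics coincide numerically; only the degrees of freedom differ. Critical values are left to the t tables.

```python
from math import sqrt
from statistics import mean, variance

us = [1.3, -0.2, 1.7, 2.1, 1.8, -0.3, 1.4, 1.5, -1.3, 2.0, 3.1, 2.2, -0.7, -1.1, 1.5]
uk = [-1.1, -0.5, 1.3, 1.2, 1.6, 0.9, -1.3, 1.7, 2.3, 1.1, -1.3, 2.5, 0.3, 1.1, -1.7]
n1, n2 = len(us), len(uk)
diff = mean(us) - mean(uk)
v1, v2 = variance(us), variance(uk)          # sample variances, n - 1 divisor

# (a) equal risk: pooled variance, df = n1 + n2 - 2
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pooled = diff / sqrt(sp2 * (1 / n1 + 1 / n2))

# (b) unequal risk: Welch statistic and Welch-Satterthwaite df
a, b = v1 / n1, v2 / n2
t_welch = diff / sqrt(a + b)
df_welch = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))

print(f"t_pooled = {t_pooled:.3f} (df = {n1 + n2 - 2}); t_welch = {t_welch:.3f} (df = {df_welch:.1f})")
```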

Q8: Treat the observations given in Q7 as monthly returns of the largest US mutual fund and the
largest UK fund over 15 consecutive months. Test the hypothesis of equal fund performance using a
paired difference test. Is the use of the paired difference test justified in this context? (Hint: calculate
the sample correlation and test whether it is significant.)
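A minimal sketch of the paired difference test (df = n − 1) together with the hinted check on the sample correlation, using the usual statistic r·√(n − 2)/√(1 − r²), which is t(n − 2) under H0: ρ = 0. The data are the Q7 values reinterpreted as monthly series, as the question instructs.

```python
from math import sqrt
from statistics import mean, stdev

us = [1.3, -0.2, 1.7, 2.1, 1.8, -0.3, 1.4, 1.5, -1.3, 2.0, 3.1, 2.2, -0.7, -1.1, 1.5]
uk = [-1.1, -0.5, 1.3, 1.2, 1.6, 0.9, -1.3, 1.7, 2.3, 1.1, -1.3, 2.5, 0.3, 1.1, -1.7]
n = len(us)

# Paired difference test: H0: mean(US - UK) = 0, df = n - 1
d = [a - b for a, b in zip(us, uk)]
t_paired = mean(d) / (stdev(d) / sqrt(n))

# Sample correlation and its t test: H0: rho = 0, df = n - 2
mu, mk = mean(us), mean(uk)
sxy = sum((a - mu) * (b - mk) for a, b in zip(us, uk))
sxx = sum((a - mu) ** 2 for a in us)
syy = sum((b - mk) ** 2 for b in uk)
r = sxy / sqrt(sxx * syy)
t_r = r * sqrt(n - 2) / sqrt(1 - r**2)

print(f"paired t = {t_paired:.3f}; r = {r:.3f}, t(r) = {t_r:.3f}")
```

Pairing only buys precision when the two series are positively correlated; a small or negative r weakens the case for it.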

Q9: Let the random variable xt represent the monthly return of a portfolio of commodity futures.
Let us denote the mean and variance of this random variable by µ and σ², respectively. The monthly
commodity futures returns xt are normally distributed; this can be denoted xt ∼ N(µ, σ²). Suppose
that the aforementioned portfolio has yielded the following returns over 17 consecutive months: xt =
{1.2, 0.3, 2.1, −3.1, 1.0, 5.0, −0.8, 1.4, 3.8, 1.3, 6.1, −3.5, 2.9, 1.9, 0.7, −0.8, 1.9}. How likely is it that the
portfolio yields a negative monthly return?
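A minimal plug-in sketch: estimate µ and σ by the sample mean and sample standard deviation, then evaluate P(xt < 0) under N(µ̂, σ̂²). This ignores the sampling uncertainty in µ̂ and σ̂, which is an assumption of the sketch rather than of the question.

```python
from statistics import NormalDist, mean, stdev

x = [1.2, 0.3, 2.1, -3.1, 1.0, 5.0, -0.8, 1.4, 3.8, 1.3, 6.1, -3.5, 2.9, 1.9, 0.7, -0.8, 1.9]

mu_hat, sigma_hat = mean(x), stdev(x)                 # sample mean and SD (n - 1 divisor)
p_negative = NormalDist(mu_hat, sigma_hat).cdf(0.0)   # P(x_t < 0)

print(f"mu_hat = {mu_hat:.3f}, sigma_hat = {sigma_hat:.3f}, P(x < 0) = {p_negative:.3f}")
```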

Q10: According to a report, UK mutual funds tend to provide an (annualized) monthly return of
2.79%. The asset management company CISCB plc is marketing a relatively new fund, which has been
running for only two years and which they claim is able to provide a higher (annualized) monthly return.
A potential investor with good analytical skills gathers data on this fund and sets out to test the latter
claim. Here are his hypotheses: H0 : µ = 2.79 versus Ha : µ > 2.79 (where µ is the mean return of
this relatively new fund). Under which of the following conditions would the investor commit a Type
I error?
(a) The mean return of the CISCB fund is actually 2.79, and the investor fails to conclude it
is higher than 2.79.
(b) The mean return of the CISCB fund is actually higher than 2.79, and the investor concludes it is
higher than 2.79.
(c) The mean return of the CISCB fund is actually 2.79, and he concludes it is higher than 2.79.
(d) The mean return of the CISCB fund is actually higher than 2.79, and the investor fails to conclude
it is higher than 2.79.

Q11: “The significance level at which a test is conducted (e.g., 10%, 5% or 1%) refers to the rate
at which we are prepared to reject a true null hypothesis or Type I error.” Is this statement TRUE or
FALSE?

Q12: “Increasing the sample size of a test from, say, T = 20 to T = 60 observations (or data points)
would reduce the Type I error of the test.” Is this statement TRUE or FALSE?
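A small Monte Carlo sketch that can help with Q11 and Q12. It assumes iid N(0, 1) data (so H0: µ = 0 is true by construction) and a two-sided z test with known σ = 1; the setup is illustrative, not taken from the text. The share of rejections estimates the Type I error rate for each sample size.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def rejection_rate(T: int, reps: int = 10_000, alpha: float = 0.05, seed: int = 1) -> float:
    """Share of simulated samples in which a TRUE null H0: mu = 0 is rejected.

    Each sample has T iid N(0, 1) draws; the test is a two-sided z test with known sigma = 1.
    """
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(reps):
        z = fmean(rng.gauss(0.0, 1.0) for _ in range(T)) * sqrt(T)   # (mean - 0) / (1 / sqrt(T))
        if abs(z) > crit:
            rejections += 1
    return rejections / reps


for T in (20, 60):
    print(f"T = {T}: empirical Type I error = {rejection_rate(T):.3f}")
```

In both cases the rejection rate should hover around the chosen 5%: the Type I error rate is fixed by the significance level, while a larger sample mainly improves power.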

ANSWERS to Review Questions


Go to Moodle
2 Simple Regression Model

Review Questions
Q1: The following regression estimation results have been obtained by OLS:
yt = 0.20 + 1.34xt + εt

where y is monthly sales and x is advertising expenditure in the film industry.


a) What is the marginal effect of monthly advertising on sales? Is it constant?
b) What is the elasticity of monthly sales with respect to advertising? Is it constant?

Q2: Suppose that T observations are collected on the annual excess returns of the XYZ hedge fund
(y) and the excess returns on the FTSE100 market index (x). Suppose that OLS gives the following
regression estimates:
yt = 0.74 + 1.64xt + εt
a) An analyst tells you that according to a good financial model she expects the market to yield
an excess return of 20% next year. On the basis of this information, what would your expectation be
for the fund’s excess return next year?
b) What is the systematic risk (or CAPM beta) for this fund equal to?

Q3: The following regression parameters (with standard errors in parentheses) have been obtained by the ML
estimation method:

yt = 0.74 + 1.64xt + εt
(0.54) (0.94)

where y are the quarterly excess returns of portfolio AA and x is the FTSE100 excess returns. These
results are based on T = 22 quarterly observations.
a) Conduct a (one-sided) exact test and an asymptotic test of the hypothesis that the fund
returns are influenced by the market. State clearly the hypothesis that is being tested.
b) Do the results coincide? If not, which one should we follow?
c) Test for the hypothesis that the fund generates abnormal positive returns.
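A minimal sketch of the two one-sided tests (H0: coefficient = 0 against HA: coefficient > 0) for the slope, as in (a), and the intercept, as in (c). It assumes the exact test uses t with T − 2 = 20 degrees of freedom (table value 1.725 at 5%) and the asymptotic test uses N(0, 1).

```python
from statistics import NormalDist

T = 22                                  # quarterly observations; df = T - 2 = 20
estimates = {"alpha (intercept)": (0.74, 0.54), "beta (slope)": (1.64, 0.94)}   # (coef, SE)

T_CRIT_20 = 1.725                       # t(20), 5% one-sided (table value), exact test
Z_CRIT = NormalDist().inv_cdf(0.95)     # N(0, 1), 5% one-sided, about 1.645, asymptotic test

t_stats = {name: b / se for name, (b, se) in estimates.items()}
for name, t in t_stats.items():
    print(f"{name}: t = {t:.3f}; exact test rejects: {t > T_CRIT_20}; "
          f"asymptotic test rejects: {t > Z_CRIT}")
```

With only 22 observations the t critical value is noticeably larger than the Normal one, which is why the two tests can disagree for statistics near the boundary.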

Q4: Suppose you are interested in analyzing the relationship between CEO salaries and firm profits.
For this purpose, a sample of US executive compensation ($ million) and firm profits ($ million) is
collected for 10 randomly sampled firms:
comp     0.7   1.3   0.3   0.9   1.7   1.2   1.1   0.1   2.4   3.9
profits  932  1921   654  1213  2109  1327   127    98  2109  1987
a) Calculate the marginal change or sensitivity of compensation to profits and the corresponding
elasticity assuming a linear relationship. Interpret the estimates.
b) Repeat the previous question now assuming a nonlinear relationship between the two variables
using the following functional forms: quadratic, reciprocal, exponential, log-log, linear-log, log-inverse.
Interpret the results.
c) For each of the cases in a) and b) test the hypothesis that the marginal change is insignificant.
(Note: for the non-constant cases, conduct the test at the mean compensation level and/or mean
profits).
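A minimal sketch of part (a) only: the OLS slope of compensation on profits (the marginal change, in $ million of pay per $ million of profits) and the elasticity evaluated at the sample means, b·(mean profits)/(mean comp). The nonlinear forms in (b) could reuse the same helper after transforming comp and/or profits, but their marginal effects and elasticities then follow different formulas.

```python
from statistics import mean

comp = [0.7, 1.3, 0.3, 0.9, 1.7, 1.2, 1.1, 0.1, 2.4, 3.9]             # $ million
profits = [932, 1921, 654, 1213, 2109, 1327, 127, 98, 2109, 1987]    # $ million


def ols_simple(y, x):
    """Return (intercept, slope) of the OLS fit y = a + b*x."""
    xb, yb = mean(x), mean(y)
    sxy = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
    sxx = sum((xi - xb) ** 2 for xi in x)
    b = sxy / sxx
    return yb - b * xb, b


a, b = ols_simple(comp, profits)
elasticity_at_means = b * mean(profits) / mean(comp)
print(f"comp = {a:.4f} + {b:.6f} * profits; elasticity at the means = {elasticity_at_means:.3f}")
```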

Q5: Suppose we want to test H0 : γ = 3 at the 5% level. The test statistic we use, ϕ, follows N(0, 1)
under the null, and its observed value is ϕ = 1.42.
a) What are the critical value and the probability value (p-value) of a two-sided test against HA : γ ≠ 3?
What do they suggest?

b) What are the critical value and the probability value of a one-sided test against HA : γ > 3? What do
they suggest?
c) Represent in a graph the critical values and p-values of a and b.
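A minimal sketch of (a) and (b) with Python's standard-library Normal distribution: critical values are quantiles of N(0, 1) and p-values are tail areas beyond the observed ϕ.

```python
from statistics import NormalDist

Z = NormalDist()        # distribution of phi under H0
phi_obs = 1.42

# a) two-sided test, HA: gamma != 3
crit_two = Z.inv_cdf(0.975)                 # about 1.96
p_two = 2 * (1 - Z.cdf(abs(phi_obs)))

# b) one-sided test, HA: gamma > 3
crit_one = Z.inv_cdf(0.95)                  # about 1.645
p_one = 1 - Z.cdf(phi_obs)

print(f"two-sided: critical value = {crit_two:.3f}, p-value = {p_two:.4f}")
print(f"one-sided: critical value = {crit_one:.3f}, p-value = {p_one:.4f}")
```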

Q6: What do the terms ‘exact test’ and ‘asymptotic test’ mean? Give an example of each.

Q7: Suppose the OLS estimation results for yt = β1 + β2xt + εt are

ŷt = 0.07 + 0.87xt,   R2 = 0.89
     (0.04)  (0.30)

where yt are the returns on Sainsbury plc shares and xt are the returns on the FTSE All-Share index.
(i) How much of the variability of Sainsbury returns is explained by the market? (ii) What is the
correlation between Sainsbury returns and the market returns?

Q8: Suppose that your OLS estimation results give R2 = −0.34. Discuss this result.

Q9: Suppose that the plot of your regression residuals shows clusters of positive residuals followed
by clusters of negative residuals and so on. Discuss.

Q10: A portfolio manager is seeking to hedge the risk of corporate bond B by taking a short
position in the corresponding futures contract (F). More specifically, (s)he wants to calculate the
proportion of the value of the long position (B) that the short position should represent in order
to minimize the total variance of the portfolio (P). The latter is known in risk management as the
minimum risk hedge ratio (h hereafter), and it can be shown that h is given by

$$h = \frac{\operatorname{cov}(R_B, R_F)}{\operatorname{var}(R_F)}$$

So (s)he collects past data (T = 55 observations) on prices, PB and PF, and constructs the return series
yt ≡ ∆PB/PB and xt ≡ ∆PF/PF, respectively. (S)he also obtains the following summary statistics:

$$\sum_{t=1}^{T} y_t x_t = 17{,}000;\quad \sum_{t=1}^{T} x_t^2 = 9{,}000;\quad \bar{x} = 10;\quad \bar{y} = 30$$

(i) Can (s)he measure h (i.e., compute ĥ) using this information? (ii) Show that the hedge ratio is given
by h = cov(RB, RF)/var(RF). (iii) How can regression analysis effectively be applied in the context
of hedging?
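A minimal sketch for (i): the centred cross-product and sum of squares can be recovered from the raw sums, and the common 1/T (or 1/(T − 1)) factor cancels in the ratio. The resulting ĥ is exactly the OLS slope of yt on xt, which is one way into (iii).

```python
T = 55
sum_xy, sum_x2 = 17_000.0, 9_000.0
x_bar, y_bar = 10.0, 30.0

s_xy = sum_xy - T * x_bar * y_bar   # = sum over t of (x_t - x_bar)(y_t - y_bar)
s_xx = sum_x2 - T * x_bar**2        # = sum over t of (x_t - x_bar)^2
h_hat = s_xy / s_xx                 # sample cov / sample var = OLS slope of y on x

print(f"S_xy = {s_xy}, S_xx = {s_xx}, h_hat = {h_hat:.4f}")
```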

Q11: A stock market analyst is interested in studying the relationship between the annualized stock
returns for IVECO plc and those for the S&P500 over the period 1989M1-1998M12. The following
summary statistics are obtained from the data (yt = IVECO returns; xt = S&P returns), expressed in
percentage:

$$\bar{y} = 2.5;\quad \sum_{t=1}^{T}(y_t-\bar{y})^2 = 30;\quad \bar{x} = 1.7;\quad \sum_{t=1}^{T}(x_t-\bar{x})^2 = 40;\quad \sum_{t=1}^{T}(x_t-\bar{x})(y_t-\bar{y}) = 66$$

(i) Compute the OLS estimates of α and β in the model yt = α + βxt + εt ; (ii) The sum of squared
residuals is 10.93. Analysts in a brokerage house are using a beta (measure of systematic risk) for
IVECO equal to 1.6 in their research analysis. Are they doing the right thing? (iii) Form a 99%
confidence interval for the beta of IVECO plc. and interpret it.
Note: The 1%, 5% and 10% critical values of a two-sided standard normal test statistic are 2.57, 1.96
and 1.65, respectively.
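A minimal sketch of (i) to (iii). It assumes T = 120 (the monthly sample 1989M1 to 1998M12), estimates the error variance as RSS/(T − 2), and uses the 2.57 critical value quoted in the note. One caveat worth checking: the listed sums imply a squared sample correlation of 66²/(40 × 30) ≈ 3.6, which exceeds 1, so they cannot all come from the same data; the sketch therefore relies on the stated residual sum of squares rather than on Σ(yt − ȳ)².

```python
from math import sqrt

T = 120                     # monthly observations, 1989M1 to 1998M12 (assumption)
y_bar, x_bar = 2.5, 1.7
s_xx, s_xy = 40.0, 66.0     # sum (x - x_bar)^2 and sum (x - x_bar)(y - y_bar)
rss = 10.93                 # sum of squared residuals, given in (ii)

# (i) OLS estimates
beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar

# (ii) H0: beta = 1.6
sigma2_hat = rss / (T - 2)              # k = 2 estimated coefficients
se_beta = sqrt(sigma2_hat / s_xx)
t_vs_1_6 = (beta_hat - 1.6) / se_beta

# (iii) 99% confidence interval using the quoted critical value
Z_1PCT = 2.57
ci_99 = (beta_hat - Z_1PCT * se_beta, beta_hat + Z_1PCT * se_beta)

print(f"alpha = {alpha_hat:.3f}, beta = {beta_hat:.3f}, SE(beta) = {se_beta:.4f}")
print(f"t (H0: beta = 1.6) = {t_vs_1_6:.3f}; 99% CI = ({ci_99[0]:.3f}, {ci_99[1]:.3f})")
```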

Q12: Suppose data are available on two variables yta and ytb which have different degrees of
dispersion. More particularly, var(yta ) is very small (near zero) while var(ytb ) is much larger. We
run a regression of each variable on a constant only. Indicate whether the following statement is
True/False and justify your answer. “Since the variability of yta is very small then a regression of it
on a constant is expected to provide a reasonably good fit as measured by R2 . The opposite holds for
ytb , i.e. since it has a large variability around its mean then a regression of ytb on a constant (which
has zero variability) will give a very small R2 .”

Q13: Suppose that 40 monthly observations are available on the returns of a unit trust fund XYZ
(denoted RP), the T-bill rate, which is taken as a proxy for the ‘risk-free’ rate (rf), and the S&P500
index returns as a proxy for the market returns (RM). Defining yt ≡ RP − rf and xt ≡ RM − rf and
running an OLS regression, suppose that we obtain the estimates (standard errors in parentheses):

yt = 0.74 + 1.64xt + εt
     (0.54)  (0.80)
R2 = 0.65

Jensen’s alpha measure is the average return on a portfolio over and above that predicted by the
CAPM, given the portfolio’s beta and the average market return:

$$\alpha_P = E(R_P) - \left[E(r_f) + \beta_P\left(E(R_M) - E(r_f)\right)\right]$$

(i) If the excess return of the market increases by 1 percentage point, by how much is the excess
return of the unit trust predicted to increase?
(ii) Is it fair to say that this unit trust generates abnormal positive returns?

Q14: “A small R-squared does imply that the error variance is small relative to the variance of
y.” True or False?

Q15: A potential investor in a risky fund has collected information on the monthly return of the
fund over the past 12 months. He finds that the sample (annualized) mean return is x̄ = 2.29% with
a standard deviation of SD(x) = 0.20. The returns in the sample are roughly symmetric with no
distinct outliers. Based on this sample, which of the following is closest to a 90% confidence
interval for the mean return of this risky fund? (a) (2.09%, 2.49%); (b) (2.16%, 2.42%); (c) (2.19%,
2.39%); (d) (2.23%, 2.35%); (e) (2.27%, 2.31%).
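A minimal sketch, assuming SD(x) is the sample standard deviation of the 12 returns, so the interval is x̄ ± t(11)·SD/√12 with the table value t(11) = 1.796 for 90% two-sided coverage. The closest option is picked by the total distance between endpoints.

```python
from math import sqrt

n, x_bar, sd = 12, 2.29, 0.20
T_CRIT_11 = 1.796                      # t(11), 5% in each tail (table value)

half_width = T_CRIT_11 * sd / sqrt(n)
ci = (x_bar - half_width, x_bar + half_width)

options = {"a": (2.09, 2.49), "b": (2.16, 2.42), "c": (2.19, 2.39),
           "d": (2.23, 2.35), "e": (2.27, 2.31)}
closest = min(options, key=lambda k: abs(options[k][0] - ci[0]) + abs(options[k][1] - ci[1]))

print(f"90% CI = ({ci[0]:.3f}%, {ci[1]:.3f}%); closest option: ({closest})")
```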
ANSWERS to Review Questions
Go to Moodle
3 Multiple Regression Model

Review Questions
Q1: Let yt denote an emerging stock market index return, xt2 the sovereign bond yields and
xt3 the exchange rate volatility, all at a monthly frequency. Suppose that the regression model
yt = β1 + β2 xt2 + β3 xt3 + εt has been estimated by OLS using a sample of T=80 months and the
results are:
ŷt = 2.79 + 0.64xt2 − 1.02xt3
ESS(Explained Sum of Squares)=173.24
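RSS(Residual Sum of Squares)=48.61
Variance-covariance matrix of (β̂1, β̂2, β̂3):

$$\widehat{V}(\hat{\beta}) = \begin{pmatrix} 0.384 & 0.154 & 0.312 \\ 0.154 & 0.116 & 0.106 \\ 0.312 & 0.106 & 0.240 \end{pmatrix}$$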
(i) Test for the significance of each model coefficient using both an asymptotic and an exact test
and compare the results. Discuss. (ii) Calculate the R2 and the R̄2. (iii) Test for the overall significance
of the regression. (iv) Test the hypothesis that the coefficients of xt2 and xt3 are equal in absolute
value and of opposite sign.
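A minimal sketch of (i) to (iv), assuming the variance-covariance matrix is symmetric and ordered (β̂1, β̂2, β̂3). It computes the t ratios (to be compared with t(77) for the exact test and N(0, 1) for the asymptotic one), R2 = ESS/(ESS + RSS), the adjusted R2, the overall F statistic with (2, 77) degrees of freedom, and the t statistic for H0: β2 + β3 = 0, whose variance is var(β̂2) + var(β̂3) + 2cov(β̂2, β̂3). Critical values are left to the tables.

```python
from math import sqrt

T, k = 80, 3
b = [2.79, 0.64, -1.02]
V = [[0.384, 0.154, 0.312],
     [0.154, 0.116, 0.106],
     [0.312, 0.106, 0.240]]
ess, rss = 173.24, 48.61

# (i) t ratios for H0: beta_j = 0
t_ratios = [b[j] / sqrt(V[j][j]) for j in range(k)]

# (ii) R^2 and adjusted R^2
r2 = ess / (ess + rss)
r2_adj = 1 - (1 - r2) * (T - 1) / (T - k)

# (iii) overall F test, H0: beta_2 = beta_3 = 0, F ~ F(k - 1, T - k)
f_stat = (ess / (k - 1)) / (rss / (T - k))

# (iv) H0: beta_2 + beta_3 = 0 (equal magnitude, opposite sign)
se_sum = sqrt(V[1][1] + V[2][2] + 2 * V[1][2])
t_sum = (b[1] + b[2]) / se_sum

print("t ratios:", [round(t, 3) for t in t_ratios])
print(f"R2 = {r2:.4f}, adj. R2 = {r2_adj:.4f}, F = {f_stat:.2f}, t(b2 + b3 = 0) = {t_sum:.3f}")
```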

Q2: Suppose the OLS estimation results for yt = β1 + β2xt2 + β3xt3 + εt, where Σt(xt2 − x̄)² = 2.5,
are (with SE in parentheses): β̂1 = 0.17 (0.04), β̂2 = 1.6 (0.72), β̂3 = −1.45 (0.64), cov(β̂2, β̂3) = −0.2
and SER = 0.07.

(i) Is the regression intercept statistically significant?


(ii) Suppose that some theory suggests that β2 + β3 = 0. Is this theory supported by the data?
(iii) Can we infer from these results the sample correlation between xt2 and xt3 ?
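A minimal sketch of (i) and (ii). For (ii) the restriction β2 + β3 = 0 is tested with the standard error of β̂2 + β̂3, which needs the covariance term: se = √(SE2² + SE3² + 2·cov(β̂2, β̂3)). Comparison with critical values is left to the reader.

```python
from math import sqrt

b1, se1 = 0.17, 0.04
b2, se2 = 1.6, 0.72
b3, se3 = -1.45, 0.64
cov23 = -0.2

t_intercept = b1 / se1                           # (i) H0: beta_1 = 0
se_sum = sqrt(se2**2 + se3**2 + 2 * cov23)       # SE of beta_2_hat + beta_3_hat
t_theory = (b2 + b3) / se_sum                    # (ii) H0: beta_2 + beta_3 = 0

print(f"t(intercept) = {t_intercept:.2f}; t(beta_2 + beta_3 = 0) = {t_theory:.3f}")
```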

Q3: Indicate whether the following statement is True or False and justify your answer. “The
adjusted-R2 is a ‘softer’ decision rule than the SBC for assessing whether an additional regressor
should be included in a model and hence, a researcher basing his/her modeling decisions on the R̄2 is
more prone to data mining.”

Q4: The following nonlinear regression of inflation (yt ) on unemployment (xt ) has been estimated
using UK data 1960-2001
yt = β1 + β2 (1/xt ) + εt
as a proxy for the so-called Phillips curve. The estimation results produced R2 = 0.60. Indicate
whether the following statement is True or False and justify your answer: “The addition of the
regressor £/$, i.e. the nominal exchange rate level (zt ) will cause the R2 to increase if the relative
strength of the pound is a significant determinant of inflation and it will cause the R2 to decrease or
remain unchanged if the £/$ level has no economic importance in determining inflation”.

Q5: Indicate whether the following statement is True or False and justify your answer: “If an
additional regressor accounts for very little of the unexplained variation in the dependent variable, R̄2
falls whereas R2 increases or remains the same”.

Q6: List six situations where comparing alternative regression models on the basis of their R2 can
be misleading.

Q7: Discuss some commonalities and main differences between the AIC, SBC and HQC selection
criteria. What makes them ‘superior’ to the adjusted-R2?

Q8: Suppose that three alternative regression models, A, B and C have been estimated by maximum
likelihood. Their maximized log-likelihood functions are ln LA = 1702.3, ln LB = 1302.3 and ln LC =
1102.3. Is it adequate to select model A as best model?

Q9: Suppose that a researcher wants to test joint hypotheses on the parameters of an econometric
model. You are asked to suggest to this researcher two alternative test statistics for this purpose. Discuss
the relative merits and disadvantages of the two tests proposed.

Q10: The international finance hypothesis known as Purchasing Power Parity (PPP) states that
the foreign exchange rate and domestic prices of any two countries move proportionally in the long run
so that their currencies have, effectively, the same purchasing power. To investigate the PPP theory
we estimate by OLS a regression of the £/$ exchange rate (denoted yt ) on the UK and US consumer
price indexes denoted xU K,t and xU S,t , respectively, with all variables expressed in logarithmic form.
Using monthly data from 1992M1 to 2016M12, we obtain the following estimation results:

yt = 0.14 + 1.64xUK,t − 1.55xUS,t + εt

$$V(\hat{\beta}) = \begin{pmatrix} 0.29 & 0.13 & 0.07 \\ 0.13 & 0.64 & 0.02 \\ 0.07 & 0.02 & 0.23 \end{pmatrix}$$

Do these results support the PPP hypothesis? (Hint: The PPP hypothesis implies that Yt·XUS,t/XUK,t =
α, which, taking logarithms, can be expressed as yt + xUS,t − xUK,t = c, where yt = ln(Yt), xUS,t =
ln(XUS,t), xUK,t = ln(XUK,t) and c = ln(α). We can rewrite the latter expression as yt =
c + xUK,t − xUS,t, which suggests that a “soft” test of the PPP hypothesis can be formulated as
the null hypothesis that the regression coefficients of xUK,t and xUS,t are equal in magnitude and of opposite sign.)
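A minimal sketch of the “soft” PPP test H0: βUK + βUS = 0, assuming the variance-covariance matrix is ordered (intercept, xUK,t, xUS,t), so var(β̂UK) = 0.64, var(β̂US) = 0.23 and cov(β̂UK, β̂US) = 0.02.

```python
from math import sqrt

b_uk, b_us = 1.64, -1.55
var_uk, var_us, cov_uk_us = 0.64, 0.23, 0.02   # entries (2,2), (3,3), (2,3) of V(beta_hat)

se = sqrt(var_uk + var_us + 2 * cov_uk_us)     # SE of b_uk + b_us
t_ppp = (b_uk + b_us) / se                     # H0: beta_UK + beta_US = 0

print(f"b_UK + b_US = {b_uk + b_us:.2f}, SE = {se:.3f}, t = {t_ppp:.3f}")
```

A stricter version would test βUK = 1 and βUS = −1 jointly, for example with a Wald statistic.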

ANSWERS to Review Questions


Go to Moodle
4 Misspecification

Review Questions
Q1. Consider a model explaining the demand for food in the US. Let Q be the total real expenditure
on food, let X be the real total expenditure on all goods and services. To obtain these “real” variables
we have divided the nominal quantities by a price index deflating all to a 1980 basis. Economic theory
suggests that Q = f (X). We consider the following econometric model:

ln(Q) = β1 + β2 ln(X) + ε

Using data from 1965 to 1989 (25 annual observations), we estimate the model and obtain the following
EViews output [LQ = ln(Q) and LX = ln(X)].

Dependent variable: LQ
Parameter estimates

Coefficient Std. Error t-statistic Prob


β1 7.5900 .2777 27.327 .0001
β2 .3224 .0194 16.577 .0001

(a) State the interpretation of the estimated slope coefficient in the context of this problem.
(b) Test the null hypothesis that a 1% increase in real expenditure on all goods and services leads
to a 0.25% increase in the real expenditure on food against the alternative that it does not. What do
you conclude? Use the 0.01 (or 1%) significance level.
(c) Construct a 95% confidence interval estimate for β2 .
(d) What distributional assumption about the error term in this model must hold in order for
interval estimation and hypothesis testing to be valid? Why? What if the number of observations were
100 rather than 25?
(e) Suppose that in this model we have omitted an important variable, namely the real price of
food, relative to all other goods. What are the consequences of this omission for (a)-(c)?
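A minimal sketch of (b) and (c), assuming T − 2 = 23 degrees of freedom and the standard t(23) table values 2.807 (1% two-sided) and 2.069 (5% two-sided).

```python
b2, se2 = 0.3224, 0.0194
T_CRIT_23_1PCT = 2.807      # t(23), 0.5% in each tail (table value)
T_CRIT_23_5PCT = 2.069      # t(23), 2.5% in each tail (table value)

# (b) H0: beta_2 = 0.25 vs HA: beta_2 != 0.25
t_b = (b2 - 0.25) / se2
reject_1pct = abs(t_b) > T_CRIT_23_1PCT

# (c) 95% confidence interval for beta_2
ci_95 = (b2 - T_CRIT_23_5PCT * se2, b2 + T_CRIT_23_5PCT * se2)

print(f"t = {t_b:.3f}, reject at 1%: {reject_1pct}; 95% CI = ({ci_95[0]:.4f}, {ci_95[1]:.4f})")
```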

Q2. In an estimated regression, the reported significance t-statistic is -6.607 and the parameter
estimate is -37.819. What is the estimated variance of the OLS estimator for the parameter?
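Since the t statistic is the estimate divided by its standard error, the standard error is the estimate divided by the t statistic, and the variance is its square. A minimal sketch:

```python
t_stat, b_hat = -6.607, -37.819

se_hat = b_hat / t_stat      # t = b_hat / se  implies  se = b_hat / t
var_hat = se_hat**2          # estimated variance of the OLS estimator

print(f"SE = {se_hat:.4f}, variance = {var_hat:.4f}")
```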

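Q3. Suppose that, from a sample of 63 observations, the OLS estimates and their corresponding
variance-covariance estimates are given by

$$\begin{pmatrix}\hat\beta_1\\ \hat\beta_2\\ \hat\beta_3\end{pmatrix} = \begin{pmatrix}2\\ 3\\ -1\end{pmatrix};\qquad \widehat{\operatorname{cov}}\begin{pmatrix}\hat\beta_1\\ \hat\beta_2\\ \hat\beta_3\end{pmatrix} = \begin{pmatrix}3 & -2 & 1\\ -2 & 4 & 0\\ 1 & 0 & 3\end{pmatrix}$$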

Test each of the following hypotheses and clearly state the conclusion being made:
(a) β2 = 0
(b) β1 + 2β2 = 5
(c) β1 − β2 + β3 = 4
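A minimal sketch of a generic t test for a single linear restriction a′β = r, with se = √(a′Va). Each of (a) to (c) is one choice of the vector a and the value r. Under the usual assumptions the statistics would be compared with t(60), since 63 − 3 = 60.

```python
from math import sqrt

beta = [2.0, 3.0, -1.0]
V = [[3.0, -2.0, 1.0],
     [-2.0, 4.0, 0.0],
     [1.0, 0.0, 3.0]]


def t_linear(a, r):
    """t statistic for H0: a'beta = r, using se = sqrt(a' V a)."""
    est = sum(ai * bi for ai, bi in zip(a, beta))
    var = sum(a[i] * V[i][j] * a[j] for i in range(3) for j in range(3))
    return (est - r) / sqrt(var)


t_a = t_linear([0, 1, 0], 0)     # (a) beta_2 = 0
t_b = t_linear([1, 2, 0], 5)     # (b) beta_1 + 2*beta_2 = 5
t_c = t_linear([1, -1, 1], 4)    # (c) beta_1 - beta_2 + beta_3 = 4

print(f"(a) {t_a:.3f}  (b) {t_b:.3f}  (c) {t_c:.3f}")
```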

Q4. Discuss:
(1) What are the consequences of exact multicollinearity? What are the consequences of strong,
but not exact, multicollinearity?
(2) What can you do to overcome multicollinearity?
(3) Suppose that the RESET test statistic for the model estimated in Q1, ln(Q) = β1 + β2 ln(X) + ε,
is equal to 2.748 and its p-value is 0.064. What can you conclude?
(4) How would you go about choosing a functional form?

(5) How might you decide whether a functional form is adequate?

Q5. In Chapter 2 (Section 2.7) we discussed the potential problems of not including an intercept in
an empirical regression model when the true DGP has a nonzero intercept. Consider now the opposite
situation, that is, an intercept is included in the empirical regression model whereas the true DGP
has a zero intercept. Discuss the potential problems (if any!) of doing so.

Q6. “The R2 in the multiple regression yt = β1 + β2 xt2 + β3 xt3 + εt will never be less than the sum
of the R2 in the individual regressions yt = α1 + α2 xt2 + εt and yt = γ1 + γ2 xt3 + εt .” True/False?
Justify your answer.

Q7. “The omission of a variable that ought to be included can make the R2 underestimate the
true R2.” True/False? Justify your answer.

ANSWERS to Review Questions


Go to Moodle
5 Autocorrelated Errors

Review Questions
Q1: Are the following statements true or false? Justify your answer.
(a) When autocorrelation is present, the OLS estimators are unbiased but inefficient.
(b) The Durbin-Watson (DW) statistic can be misleading in dynamic regression models; however, the Breusch-
Godfrey LM test for autocorrelation is still valid in this case.
(c) The DW test is designed on the assumption that the variance of the error term is constant, i.e.
it assumes that the error term is homoskedastic.
(d) If we transform a model (using GLS) to circumvent the autocorrelation problem detected
in an initial regression model estimated by OLS, the R2 values of the two models are not comparable.
(e) One solution to the problem of autocorrelation is to replace the standard OLS standard errors
of the estimated coefficients by the Newey-West h.a.c. standard errors. However, this correction does
not solve the problem of inefficient estimates.

Q2: Using annual data for 1970-1987 on Yt = NYSE Composite Common Stock Price Index (December
31, 1985 = 100) and Xt = GNP ($ billions), the following regression results were obtained:

Ŷt = 88.543 − 0.0454Xt + 0.0000131Xt²
t-stat = (8.3792) (−5.0808) (8.0045)
R2 = 0.9577; DW = 1.67

where Xt² represents the square of GNP.


(a) Is there first order autocorrelation in this regression?
(b) Suppose that a regression of Yt on Xt only (i.e. not including the squared term) was estimated
by OLS and autocorrelation was found. In the light of your answer to (a) what can you say?
(c) A priori, would you expect a positive or negative relationship between stock prices and GNP?
Justify your answer.
(d) According to this model, is the sensitivity of stock prices to GNP positive or negative? (Note:
GNP for 1970 was 1015.5, while for 1987 it was 4526.7.)
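A minimal sketch for (d): in the quadratic fit the sensitivity is the derivative dŶ/dX = −0.0454 + 2(0.0000131)X, so it depends on the GNP level at which it is evaluated. The two GNP values quoted in the note are used below; the turning point is where the derivative is zero.

```python
b1, b2 = -0.0454, 0.0000131     # coefficients on X_t and X_t^2 in the fitted equation


def sensitivity(gnp: float) -> float:
    """Marginal effect dY/dX = b1 + 2*b2*X of the quadratic fit, evaluated at X = gnp."""
    return b1 + 2 * b2 * gnp


turning_point = -b1 / (2 * b2)   # GNP level at which the estimated slope changes sign

for gnp in (1015.5, 4526.7):
    print(f"GNP = {gnp}: dY/dX = {sensitivity(gnp):.4f}")
print(f"slope changes sign at GNP of about {turning_point:.1f}")
```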

Q3: Consider the following regression model

Ŷt = −49.4664+ .88544X2t + .09253X3t


t-stat= (-2.2392) (70.2936) (2.6933)
R2 = 0.9979; DW = 0.8755;

where
Y =personal consumption expenditure (1982 billions of dollars)
X2t =personal disposable income (1982 billions of dollars) (PDI)
X3t =Dow Jones Industrial Average Stock Index
The regression is based on US annual data from 1961 to 1985 (Gujarati and Porter, 2008).
(a) Is there first-order autocorrelation in the residuals of this regression? How do you know?
(b) Using the OLS estimate α̂ from ε̂t = αε̂t−1 + et, each of the above regression variables was
transformed as Yt∗ = Yt − α̂Yt−1 (and likewise for the two regressors), and the following OLS estimation
results were obtained:

Ŷt∗ = −17.97 + .89X∗2t + .09X∗3t
t-stat =          (30.72)  (2.66)
R2 = 0.9816; DW = 2.28
Has the problem of autocorrelation been resolved? How do you know?
(c) Comparing the original and transformed regressions, the t value of the PDI has dropped
dramatically. What does this suggest?

(d) Is the DW value from the transformed regression of any value in determining the presence, or
lack thereof, of autocorrelation in the transformed data?
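A minimal sketch of the quasi-differencing step described in (b), in the Cochrane-Orcutt style that drops the first observation. The question estimates α from a regression of ε̂t on ε̂t−1; as a rough cross-check the sketch instead uses the common approximation DW ≈ 2(1 − ρ), which is a swapped-in shortcut, not the method in the text.

```python
dw = 0.8755
rho_hat = 1 - dw / 2        # approximation from DW = 2(1 - rho), about 0.56


def quasi_difference(series, rho):
    """Return z*_t = z_t - rho * z_{t-1} for t = 2, ..., T (first observation dropped)."""
    return [curr - rho * prev for prev, curr in zip(series[:-1], series[1:])]


# Usage with hypothetical numbers; the same transform would be applied to Y, X2 and X3
print(f"rho_hat = {rho_hat:.4f}")
print(quasi_difference([1.0, 2.0, 4.0], 0.5))   # [1.5, 3.0]
```

After the transform, the intercept of the regression estimates β1(1 − α̂) rather than β1 itself.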

Q4: Is the following statement True or False? (Justify your answer). “A high R2 in conjunction
with a low DW statistic suggests that there is no autocorrelation.”

Q5: Is the following statement True or False? (Justify your answer). “A significant LM (or DW)
test for autocorrelated errors could occur even if the errors are not ‘truly’ autocorrelated but instead
the source of the problem may be a misspecified regression model — an incorrect regressor set or an
incorrect functional form.”

ANSWERS to Review Questions


Go to Moodle
6 Heteroskedastic errors

Review Questions
Q1: Are the following statements true or false? Justify your answer.
(a) In the presence of heteroskedasticity the least squares (OLS) estimator is biased as well as
inefficient.
(b) If heteroskedasticity is present, the conventional t and F tests are invalid and the problem is
not mitigated by increasing the sample size.
(c) In time series data heteroskedastic errors may give a misleading impression of autocorrelation.
(d) In the presence of heteroskedasticity the OLS method always overestimates the standard errors
of estimators.
(e) If the plot of the OLS residuals exhibits a ‘systematic’ pattern, rather than randomly fluctuating
around zero, it suggests that heteroskedasticity is present in the data.
(f) There is no general test of heteroskedasticity that is free of any assumption or approximation
regarding the time-varying error variance, var(εt) = σt².
(g) In regressions with marked heteroskedasticity when the sample size is small, the GLS standard
errors tend to be smaller than White standard errors.

Q2: (Gujarati and Porter, 2008) Consider the following two regressions run by Hanushek and
Jackson (Statistical Methods for Social Sciences, 1977) based on US data for 1946-1975 with standard
errors in parentheses
Ĉt = 26.19 + 0.6248GNPt − 0.4398Dt
S.E.  (2.73)   (0.0060)    (0.0736)
R2 = 0.999

Ĉt/GNPt = 25.92(1/GNPt) + 0.6246 − 0.4315(Dt/GNPt)
S.E.       (2.22)          (0.0068)   (0.0597)
R2 = 0.875
where C = aggregate private consumption expenditure, GNP = gross national product, D = national
defense expenditure; t = time.
The objective of Hanushek and Jackson’s study was to find out the effect of defense expenditure
on other expenditures in the economy.
(a) What might be the reason(s) for transforming the first equation into the second equation?
(b) If the objective of the transformation was to remove or reduce heteroskedasticity, what
assumption has been made about the error variance?
(c) Suppose that there is heteroskedasticity in the original model. With the information given,
can you tell whether the authors succeeded in removing it?
(d) Does the transformed regression have to be run through the origin? Why or why not?
(e) Can you compare the R2 values of the two regressions? Why or why not? Which one is the
relevant R2?

Q3: (Maddala, 2001) In a study of 27 industrial establishments of varying size, Y =the number of
supervisors and X =the number of supervised workers. Y varies from 30 to 210 and X from 247 to
1650. The OLS results for this cross-section were as follows
Yt = 14.448 + 0.115Xt + εt
t-stat= (0.011) (9.562)
T = 27, σ̂ = 21.73, R2 = 0.776

By plotting the squared residuals against X a positive relationship was found. A plot of the squared
residuals against 1/X showed a tighter relationship. Hence the assumption made was

var(εt) = σ²Xt²

The estimated equation was

Yt/Xt = 0.121 + 3.803(1/Xt) + et


t-stat= (.009) (4.570)
T = 27, σ̂ = 22.577, R2 = 0.7587

which in terms of the original variables can be rewritten as

Yt = 3.803 + 0.121Xt

(a) An investigator looks at the drop in R2 and concludes that the first equation is better. Is this
correct?
(b) What would the transformed equation be if it is assumed that var(εt) = σ²Xt instead of
var(εt) = σ²Xt²? How would you determine (in addition to the aforementioned plots) which of these
alternatives is the best one?
(c) Comment on the calculation of R2 from the transformed equation and the R2 from the equation
in terms of the original variables.

Q4: Indicate whether the following statement is true or false and justify your answer. “The
heteroskedasticity may be due to an omitted explanatory variable or an incorrect functional form”.
(T or F?)

Q5: Indicate whether the following statement is true or false and justify your answer. “A popular
form of heteroskedasticity for cross-section data is ARCH (autoregressive conditional heteroskedasticity),
originally derived by Engle (1982). Engle noticed that particularly for cross-section financial
data, large and small residuals tend to come in clusters, and this suggests that the variance of the
error may depend on the magnitude (i.e., irrespective of the sign) of the preceding error.”

Q6: Suppose that the model of interest is

∆(GDP/pc)j = β1 + β2 (RjM ) +εj

where ∆(GDP/pc)j denotes quarterly change in GDP per capita and RjM is the quarterly change in
a broad stock market index for a cross-section of j = 1, 2, ..., 50 industrialized countries. Suppose that
the estimation results are
            β̂1 (OLS)   β̂2 (OLS)   β̂1 (GLS)   β̂2 (GLS)
Estimate     0.07       0.32       0.04       0.36
SE          (0.08)     (0.13)     (0.15)     (0.18)
SE White    [0.20]     [0.21]

On the basis of these results, is there evidence of heteroskedasticity? Discuss.

Q7: It is well known that in the presence of heteroskedasticity (a) the OLS estimators are still unbiased
and consistent. It is also known that heteroskedasticity presents two problems for OLS regression analysis:
(b) The OLS estimators are no longer efficient. The efficient estimators are those provided by the
GLS estimation method.
(c) The conventional SE of the OLS estimators are incorrect.
Illustrate (i.e. further explain) the above points (a), (b) and (c) on the basis of the information
provided in the previous table (Q6).

Q8: Take the cross-section regression model


yj = β1 + β2 xj +εj

It is known from previous research that σj² = var(εj) = exp(αzj), where z is an observed variable taking
values zj = {0.14, 0.13, 0.20, ..., 0.12} and α is an unknown parameter. Sketch a feasible GLS procedure
to estimate β1 and β2 efficiently. Does the fact that the parameter α is unknown represent a problem in
applying GLS?
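A minimal sketch of one common two-step feasible GLS procedure for this multiplicative form of heteroskedasticity, run on hypothetical simulated data (β1 = 1, β2 = 2, α = 3 are illustrative values, not from the text). Step 1: OLS and residuals. Step 2: regress ln(ε̂j²) on zj; the slope estimates α, while the intercept only absorbs a scale constant. Step 3: weighted least squares with weights 1/σ̂j² = exp(−α̂zj). Because α is replaced by a consistent estimate, not knowing α is not in itself an obstacle in large samples.

```python
import math
import random


def wls(y, x, w):
    """Weighted least squares for y = b1 + b2*x with weights w; w = 1 gives OLS."""
    sw = sum(w)
    xb = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yb = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xb) * (yi - yb) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xb) ** 2 for wi, xi in zip(w, x))
    b2 = sxy / sxx
    return yb - b2 * xb, b2


# Hypothetical simulated data: var(eps_j) = exp(3 * z_j)
rng = random.Random(42)
n = 2000
z = [rng.uniform(0.0, 1.0) for _ in range(n)]
x = [rng.uniform(0.0, 10.0) for _ in range(n)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, math.exp(3.0 * zi / 2)) for xi, zi in zip(x, z)]
ones = [1.0] * n

# Step 1: OLS and residuals
b1_ols, b2_ols = wls(y, x, ones)
resid = [yi - b1_ols - b2_ols * xi for xi, yi in zip(x, y)]

# Step 2: regress ln(e_j^2) on z_j; the slope estimates alpha
_, alpha_hat = wls([math.log(e * e) for e in resid], z, ones)

# Step 3: weight observation j by 1 / sigma_j^2 = exp(-alpha_hat * z_j) and re-estimate
weights = [math.exp(-alpha_hat * zj) for zj in z]
b1_fgls, b2_fgls = wls(y, x, weights)

print(f"alpha_hat = {alpha_hat:.2f}; OLS b2 = {b2_ols:.3f}; FGLS b2 = {b2_fgls:.3f}")
```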

Q9: Suppose that a researcher has investigated the sensitivity of y = annual profits to x = advertising
expenditure using a cross-section of j = 1, 2, ..., 6 companies pertaining to the same industry. To do
so, this researcher has estimated a simple cross-section regression by OLS. The data used are

yj   2   3   9   1   4   3
xj   4   7  17  10  14   7

where ȳ = 3.7, x̄ = 9.8, Σj xj yj = 269 and var(xj) = 23.8. She is subsequently informed that this
sample comprises firms of different size (e.g. as measured by market share).
It can be shown that when the regression errors are εt ∼ iid(0, σ²), the variance of the OLS
slope coefficient estimate (in a simple model with one regressor only) is given by

$$\operatorname{var}(\hat{\beta}_2) = \frac{\sigma^2}{\sum_t (x_t - \bar{x})^2}.$$

However, if the errors are heteroskedastic, var(εt) = σt², then the latter formula no longer holds.
It can be shown that the correct formula in this context is

$$\operatorname{var}(\hat{\beta}_2) = \frac{\sum_t (x_t - \bar{x})^2 \sigma_t^2}{\left(\sum_t (x_t - \bar{x})^2\right)^2}.$$
(a) Report the OLS-based estimate of the sensitivity of annual profits to ‘ad’ expenditure obtained
by the researcher and the conventional OLS measure of sampling variability for this estimate. On the
basis of these results, can she claim that ad expenditure is a significant factor behind profits?
(b) On the basis of the second piece of information received by the researcher, which adjustment
to the initial research output should be made?
(c) What are the SEs obtained from the second formula usually called?
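The two variance formulas can be checked numerically on the six observations in the Q9 table. The NumPy sketch below computes the OLS slope, the conventional SE (σ² replaced by s² = SSR/(n − 2)) and a heteroskedasticity-robust SE obtained by plugging the squared OLS residuals e_t² into the second formula in place of the unknown σ_t². This plug-in is the basic White (often labelled HC0) version with no small-sample correction; other variants exist, and the choice here is an assumption of the sketch.

```python
import numpy as np

# Data from the Q9 table (six firms).
y = np.array([2, 3, 9, 1, 4, 3], dtype=float)     # annual profits
x = np.array([4, 7, 17, 10, 14, 7], dtype=float)  # advertising expenditure

n = len(y)
x_dev = x - x.mean()
sxx = np.sum(x_dev**2)

# OLS slope and intercept for y_t = b1 + b2 x_t + e_t
b2 = np.sum(x_dev * (y - y.mean())) / sxx
b1 = y.mean() - b2 * x.mean()
resid = y - (b1 + b2 * x)

# Conventional SE: first formula with sigma^2 replaced by s^2 = SSR / (n - 2)
s2 = np.sum(resid**2) / (n - 2)
se_conventional = np.sqrt(s2 / sxx)

# Robust SE: second formula with sigma_t^2 replaced by e_t^2
se_robust = np.sqrt(np.sum(x_dev**2 * resid**2) / sxx**2)

print(f"b2 = {b2:.4f}, conventional SE = {se_conventional:.4f}, robust SE = {se_robust:.4f}")
```

The sample mean of x, the sum of cross-products and the sample variance of x (with divisor n − 1) computed from these six values reproduce the summary statistics quoted in the question.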

ANSWERS to Review Questions


Go to Moodle
7 Qualitative Variables

Review Questions
Q1: Consider the following two models:
$y_i = \beta_0 + \beta_1 x_i + \beta_2 d_{2i} + \beta_3 d_{3i} + \varepsilon_i$    (16)

and

$y_i = \beta_0 + \beta_1 x_i + \beta_2 d_{2i} + \beta_3 d_{3i} + \beta_4 (d_{2i} x_i) + \beta_5 (d_{3i} x_i) + \varepsilon_i$    (17)
where

y = annual earnings of MBA graduates
x = years of work experience
d2 = 1 if Harvard MBA, 0 otherwise
d3 = 1 if Wharton MBA, 0 otherwise

(a) What is the interpretation of β2 and β3 in model (16)? What is the interpretation of β2, β3, β4 and β5 in model (17)?
(b) Which additional features (if any!) does model (17) capture over model (16)? Put differently,
why might one be interested in using the former instead of the latter?
(c) According to model (16) what are the expected annual earnings for a Harvard MBA graduate
with 5 years of experience? And according to model (17)?
(d) If β4 and β5 are individually statistically significant, would you choose (17) over (16)? If not,
which problem (if any) would arise?
(e) How would you test the hypothesis that β4 and β5 are both zero?
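A minimal sketch of the arithmetic behind (c) and (e), written so that model (16) is the special case β4 = β5 = 0 of model (17). In model (17) the experience slope for a Harvard graduate is β1 + β4, which is what the interaction term adds. For (e), one standard approach is an F test of the q = 2 restrictions, comparing the residual sum of squares of the restricted model (16) with that of the unrestricted model (17). The function names and every number in the usage lines are hypothetical, since the question provides no estimates.

```python
def expected_earnings(beta, x, harvard=0, wharton=0):
    """Conditional mean of y under model (17); set beta4 = beta5 = 0 for model (16).

    beta = (b0, b1, b2, b3, b4, b5); x = years of work experience.
    """
    b0, b1, b2, b3, b4, b5 = beta
    return b0 + b1 * x + b2 * harvard + b3 * wharton + b4 * harvard * x + b5 * wharton * x


def f_statistic(ssr_restricted, ssr_unrestricted, q, n, k):
    """F statistic for q linear restrictions.

    k is the number of parameters in the unrestricted model (6 for model (17)).
    Under H0 (and classical assumptions) it follows F(q, n - k).
    """
    return ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / (n - k))


# Hypothetical coefficients, for illustration only.
beta_16 = (50, 3, 20, 15, 0, 0)    # model (16): no interactions
beta_17 = (50, 3, 20, 15, 1, 0.5)  # model (17)
print(expected_earnings(beta_16, 5, harvard=1))  # b0 + 5*b1 + b2
print(expected_earnings(beta_17, 5, harvard=1))  # b0 + b2 + 5*(b1 + b4)

# Hypothetical SSRs for the restricted (16) and unrestricted (17) fits.
print(f_statistic(ssr_restricted=120.0, ssr_unrestricted=100.0, q=2, n=106, k=6))
```

The resulting F value would be compared with a critical value from the F(2, n − 6) distribution.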

Q2: Using data on weight (Wi) and height (Hi) for 51 students (36 males and 15 females), the following regressions are estimated by OLS. The estimation results are:

$\hat{W}_i = -232.06551 + 5.5662\,H_i$    (R1)
t-stat:       (−5.2066)     (8.6246)

$\hat{W}_i = -122.9621 + 23.8238\,d_i + 3.7402\,H_i$    (R2)
t-stat:       (−2.5884)     (4.0149)     (5.1613)

$\hat{W}_i = -107.9508 + 3.5105\,H_i + 2.0073\,d_i + 0.3263\,(d_i H_i)$    (R3)
t-stat:       (−1.2266)     (2.6087)     (0.0187)     (0.2035)
where weight is in pounds, height is in inches, and di = 1 for male, 0 for female.

The following correlation matrix is computed:

           Hi        di        diHi
Hi         1         .6276     .6752
di         .6276     1         .9971
diHi       .6752     .9971     1

(a) Which regression model would you choose for the empirical analysis, R1 or R2? Why?
(b) Suppose that regression model R2 is, in fact, the appropriate one but instead you chose regression R1. What are the consequences of this model misspecification for your inferences?
(c) What does the coefficient of the gender dummy variable di in regression R2 suggest?
(d) In regression R2 the gender dummy variable di is statistically significant in explaining weight differences, while in regression R3 it is statistically insignificant. How would you explain this contradiction?
(e) Between regression models R2 and R3, which one would you choose? Why?
(f) In models R2 and R3 the coefficient on the height variable is about the same, but the coefficient on the dummy variable for sex changes dramatically. Any idea what is going on?
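One way to quantify how strongly the R3 regressors overlap is through variance inflation factors (VIFs). For regressor j, VIF_j = 1/(1 − R_j²), where R_j² comes from regressing regressor j on the other regressors; a known identity is that these are the diagonal elements of the inverse of the regressors' correlation matrix. The NumPy sketch below applies that identity to the correlation matrix reported in Q2. Because the reported correlations are rounded to four decimals and the matrix is close to singular, the resulting VIFs should be read only as rough magnitudes.

```python
import numpy as np

# Correlation matrix of the R3 regressors, copied from Q2 (order: Hi, di, diHi).
labels = ["H", "d", "dH"]
R = np.array([
    [1.0,    0.6276, 0.6752],
    [0.6276, 1.0,    0.9971],
    [0.6752, 0.9971, 1.0],
])

# VIF_j = 1 / (1 - R_j^2) equals the j-th diagonal element of inv(R).
vif = np.diag(np.linalg.inv(R))

for name, v in zip(labels, vif):
    print(f"VIF({name}) = {v:.1f}")
```

A VIF above 10 is a commonly quoted rule-of-thumb warning sign of multicollinearity; in this sketch the gender dummy and the interaction term are far above that level, while height is not.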

Q3: We address the question of the starting salaries of the MSc graduate intake at the research department of investment bank RTI plc. More specifically, we want to assess the value of taking Statistics as part of the MSc. Let Si = salary ($), Gi = MSc grade point, and Ei = 1 if the graduate took Statistics, 0 otherwise. The estimated regression is:

$\hat{S}_i = 24200 + 1643\,G_i + 5033\,E_i$,    $R^2 = .74$    (18)
(S.E.)       (1078)      (352)       (456)

(a) Interpret the estimated equation.
(b) How would you assess whether or not women had lower starting salaries than men?
(c) How would you test whether the value of taking a Statistics course was the same for women and men?
(d) How would you modify the above regression model to allow (and test) for the possibility that women have a different starting salary and that the impact of the overall MSc grade on the starting salary also differs between women and men?
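The sketch below does two small things with equation (18). First, it computes the t-ratios implied by the reported coefficients and standard errors (assuming H0: coefficient = 0) and shows that, holding the grade fixed, the fitted salary difference between graduates with and without Statistics equals the Ei coefficient. Second, it builds the regressor matrix for one possible extended specification of the kind asked about in (d), using a hypothetical female dummy Fi (1 for women, 0 for men) and the interaction Fi·Gi. The dummy, the function names and the example inputs are assumptions for illustration; the estimation itself would need the underlying data, which the question does not provide.

```python
import numpy as np

# Coefficients and standard errors of equation (18), copied from the text.
coef = {"const": 24200.0, "G": 1643.0, "E": 5033.0}
se = {"const": 1078.0, "G": 352.0, "E": 456.0}

# t-ratios for H0: coefficient = 0
t_ratios = {k: coef[k] / se[k] for k in coef}


def predicted_salary(grade, took_statistics):
    """Fitted salary from (18) for a given grade point and Statistics indicator."""
    return coef["const"] + coef["G"] * grade + coef["E"] * took_statistics


def design_matrix(G, E, F):
    """Hypothetical regressors for an extended model with a female dummy F
    (1 = woman, 0 = man) and the interaction F*G: columns [1, G, E, F, F*G]."""
    G, E, F = (np.asarray(a, dtype=float) for a in (G, E, F))
    return np.column_stack([np.ones_like(G), G, E, F, F * G])


print({k: round(v, 2) for k, v in t_ratios.items()})
# Holding G fixed, taking Statistics shifts the fitted salary by the E coefficient.
print(predicted_salary(3.0, 1) - predicted_salary(3.0, 0))
print(design_matrix([3.0, 2.5], [1, 0], [0, 1]))
```

In a specification like this, the coefficient on F would capture a different starting salary for women and the coefficient on F·G a different effect of the grade, and both could be tested individually or jointly.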

Q4: A sample of 2820 British adults aged between 28 and 58 was examined to investigate the level of participation in the stock market as a function of income (I), expressed in thousands of £. Stock market participation is a binary variable (S = 1 if an individual owns shares in his or her portfolio, S = 0 otherwise). The following logit estimates (t ratios in parentheses) were obtained:

$\hat{S}_i = G(-2.64 + 0.29\,I_i)$
t-ratio:      (−3.35)    (4.05)

$\chi^2_1$ (1 df) = 16.681

(a) Using the above estimates, write an expression for the probability of share ownership which explicitly shows that a logit regression is a nonlinear model of the relationship between Si and Ii.
(b) What is the probability that an individual with an income of £10,000 will own shares? And at an income of £25,000? What is the rate of change of the probability at the income level £30,000?
(c) Comment on the statistical significance of the estimated logit model.
(d) Explain why running a simple OLS regression of Si on ln Ii would not give good estimates.
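The logit calculations in (b) and (c) can be sketched with the standard library alone. The code assumes G is the logistic CDF, G(u) = 1/(1 + e^(−u)), that income enters in thousands of £ (so £10,000 means I = 10), and that the reported statistic is referred to a χ² distribution with 1 df, as the text states. The marginal effect uses the standard logit derivative dP/dI = β1 G(u)(1 − G(u)), which depends on the income level, unlike a linear model. The chi-square tail probability with 1 df uses the identity P(χ²₁ > c) = P(|Z| > √c) = erfc(√(c/2)).

```python
import math

# Logit estimates from the text; income I is measured in thousands of pounds.
B0, B1 = -2.64, 0.29


def logistic(u):
    """Logistic CDF G(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))


def prob_owns_shares(income_thousands):
    """Fitted probability of share ownership at a given income level."""
    return logistic(B0 + B1 * income_thousands)


def marginal_effect(income_thousands):
    """dP/dI = B1 * G(u) * (1 - G(u)): change in probability per extra 1,000 pounds."""
    p = prob_owns_shares(income_thousands)
    return B1 * p * (1.0 - p)


def chi2_1df_pvalue(stat):
    """Upper-tail probability of a chi-square with 1 df: erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))


for inc in (10, 25):
    print(f"P(S=1 | I={inc}) = {prob_owns_shares(inc):.4f}")
print(f"dP/dI at I=30: {marginal_effect(30):.6f}")
print(f"p-value of chi2(1) = 16.681: {chi2_1df_pvalue(16.681):.2e}")
```

Because the probability is already close to 1 at high incomes, the marginal effect at I = 30 is much smaller than β1 = 0.29 itself, which is one concrete sign of the nonlinearity asked about in (a).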

ANSWERS to Review Questions


Go to Moodle
