HW3 Equity

Risk Management
Homework 2
Equity Return Predictability
Deadline: May 27th 2024, 23:59
This homework constitutes 20% of the total mark for this course. You may work in a group
of at most six students. You should clearly indicate the names of group members.
Submit a report with a detailed description of the analysis. The corresponding code and
the data should also be submitted. Make sure that the results can be reproduced with the
files you submit.
Good Luck!
The main goal of this project is to answer the question: do machine learning
methods explain equity returns better than traditional linear regressions?
The extant literature has documented that some variables can predict long-term
fluctuations in stock market returns. However, the short-term predictability of returns
remains elusive. This project will propose, compare, and evaluate a variety of machine
learning methods for predicting short-term equity returns and compare the results to the
standard linear regression models considered in prior literature.
1. Load the file “PredictorData2018.xlsx”. Focus on the following time series at the
monthly frequency:
– Stock Returns (Index): We use S&P 500 index month-end values from the Center for
Research in Security Prices (CRSP). For yearly and longer data frequencies, we can go
back as far as 1871, using data from Robert Shiller’s website. For the monthly frequency,
we can only begin in the CRSP period, that is, in 1927.
– Dividends (D12): Dividends are 12-month moving sums of dividends paid on
the S&P 500 index. The data are from Robert Shiller’s website from 1871 to 1987.
Dividends from 1988 to 2018 are from the S&P Corporation.
– Earnings (E12): Earnings are 12-month moving sums of earnings on the S&P
500 index. The data are again from Robert Shiller’s website from 1871 to 1987.
Earnings from 1988 to 2018 are estimates based on interpolation of quarterly
earnings provided by the S&P Corporation.
– The Book-to-Market Ratio (b/m) is the ratio of book value to market value for
the Dow Jones Industrial Average. Book values from 1920 to 2018 are from Value
Line’s website, specifically their Long-Term Perspective Chart of the Dow Jones
Industrial Average.
– Net Issues (ntis) is the ratio of 12-month moving sums of net issues by NYSE
listed stocks divided by the total end-of-year market capitalization of NYSE
stocks.
– Treasury Bills (tbl): Treasury-bill rates from 1920 to 1933 are the U.S. Yields
On Short-Term United States Securities, Three-Six Month Treasury Notes and
Certificates, Three-Month Treasury series in the NBER Macrohistory database.
Treasury-bill rates from 1934 to 2018 are the 3-Month Treasury Bill: Secondary
Market Rate from the economic research database of the Federal Reserve Bank
of St. Louis (FRED).
– Long-Term Yield (lty): Long-term government bond yields from 1919 to
2018 are the U.S. Yield On Long-Term United States Bonds series in the NBER’s
Macrohistory database.
– Corporate Bond Yields (AAA and BAA): Yields on AAA- and BAA-rated corporate
bonds from 1919 to 2018 are from FRED.
– Inflation (infl): Inflation is the Consumer Price Index (All Urban Consumers)
from 1919 to 2018 from the Bureau of Labor Statistics.
– Stock Variance (svar): Stock variance is computed as the sum of squared daily
returns on the S&P 500. Daily returns from 1871 to 1926 are from G. William
Schwert’s website; data from 1926 to 2018 are from CRSP.
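A minimal Python sketch of step 1, assuming pandas (with an Excel reader such as openpyxl) is available, that the workbook keeps the monthly series on a sheet named "Monthly", and that dates are stored in an integer column "yyyymm". The sheet name, the date column, and the possibility that Index is stored as text with thousands separators are all assumptions to check against your copy of the file; `clean_monthly` and `load_monthly` are hypothetical helper names.

```python
import pandas as pd


def clean_monthly(raw: pd.DataFrame) -> pd.DataFrame:
    """Index the monthly sheet by month-end date and coerce every column to float.

    Assumption: dates come as integers like 195001 in a column named "yyyymm".
    """
    df = raw.copy()
    # Turn 195001 into 1950-01-01, then roll forward to the month end (1950-01-31)
    dates = pd.to_datetime(df.pop("yyyymm").astype(int).astype(str), format="%Y%m")
    df.index = pd.DatetimeIndex(dates + pd.offsets.MonthEnd(0), name="date")
    # Assumption: some columns (e.g. Index) may be text such as "1,234.56";
    # strip the thousands separators and convert, turning blanks into NaN
    df = df.apply(
        lambda col: pd.to_numeric(col.astype(str).str.replace(",", ""), errors="coerce")
    )
    return df


def load_monthly(path: str = "PredictorData2018.xlsx", sheet: str = "Monthly") -> pd.DataFrame:
    # Assumption: the monthly data live on a sheet called "Monthly"
    return clean_monthly(pd.read_excel(path, sheet_name=sheet))
```

Keeping the cleaning step separate from the file read makes it easy to test on a small hand-made frame before running it on the full workbook.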
2. Using the post-WWII data from January 1950 (skip the period 1945-1949 to eliminate
the effect of WWII), estimate a predictive regression model of the following form:
$$R_{t+1} = \alpha + \beta^{\top} X_t + \varepsilon_{t+1},$$
where $R_{t+1}$ is the log return on the S&P 500 market portfolio, computed as
$$R_{t+1} = \ln\left(\text{Index}_{t+1} + \frac{D12_{t+1}}{12}\right) - \ln\left(\text{Index}_t\right),$$
and $X_t$ is a 12 × 1 vector of the following predictors, each lagged by one month:
(1) log dividend-price ratio (ln(D12/Index))
(2) log earnings-price ratio (ln(E12/Index))
(3) dividend payout ratio (D12/E12)
(4) stock variance (svar)
(5) book-to-market ratio (b/m)
(6) net issues (ntis)
(7) Treasury bill rate (tbl)
(8) long-term rate (ltr)
(9) term spread (ltr - tbl)
(10) default spread (AAA - BAA)
(11) default return spread (AAA - ltr)
(12) inflation based on the consumer price index (infl).
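The target and the twelve lagged predictors can be assembled as in the sketch below. It assumes the monthly data sit in a pandas DataFrame indexed by month-end date with columns named Index, D12, E12, svar, b/m, ntis, tbl, ltr, AAA, BAA and infl. It reads "long term rate (ltr)" literally as the `ltr` column; the file also contains the long-term yield `lty`, so check which series you intend to use. `build_dataset` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def build_dataset(df: pd.DataFrame, start: str = "1950-01"):
    """Return (R_{t+1}, X_t) aligned row by row, following the definitions in step 2."""
    # Log return including dividends: ln(Index_{t+1} + D12_{t+1}/12) - ln(Index_t)
    ret = np.log(df["Index"] + df["D12"] / 12) - np.log(df["Index"].shift(1))

    x = pd.DataFrame(index=df.index)
    x["dp"] = np.log(df["D12"] / df["Index"])   # (1) log dividend-price ratio
    x["ep"] = np.log(df["E12"] / df["Index"])   # (2) log earnings-price ratio
    x["de"] = df["D12"] / df["E12"]             # (3) dividend payout ratio
    x["svar"] = df["svar"]                      # (4) stock variance
    x["bm"] = df["b/m"]                         # (5) book-to-market
    x["ntis"] = df["ntis"]                      # (6) net issues
    x["tbl"] = df["tbl"]                        # (7) T-bill rate
    x["ltr"] = df["ltr"]                        # (8) long-term rate (assumed column)
    x["tms"] = df["ltr"] - df["tbl"]            # (9) term spread
    x["dfy"] = df["AAA"] - df["BAA"]            # (10) default spread
    x["dfr"] = df["AAA"] - df["ltr"]            # (11) default return spread
    x["infl"] = df["infl"]                      # (12) inflation

    # Shift predictors by one month so the row dated t+1 pairs R_{t+1} with X_t
    data = pd.concat([ret.rename("R"), x.shift(1)], axis=1).loc[start:].dropna()
    return data["R"], data.drop(columns="R")
```

The predictive regression in step 2 is then a regression of the returned target on the returned predictor matrix.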
Report the regression output and explain the sign and significance of all estimates.
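One way to produce the regression output is a plain OLS fit with classical (homoskedastic) standard errors, sketched below in NumPy. Because monthly returns may be heteroskedastic, heteroskedasticity- and autocorrelation-robust errors, for example statsmodels `OLS(...).fit(cov_type="HAC", cov_kwds={"maxlags": ...})`, are a reasonable alternative when judging significance. `ols_with_tstats` is a hypothetical helper name.

```python
import numpy as np


def ols_with_tstats(y: np.ndarray, X: np.ndarray):
    """OLS of y on [1, X]; returns coefficients, standard errors, t-stats and R^2."""
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X])            # first column estimates alpha
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)    # [alpha, beta_1, ..., beta_k]
    resid = y - Z @ coef
    dof = n - Z.shape[1]                            # degrees of freedom
    sigma2 = resid @ resid / dof                    # residual variance estimate
    cov = sigma2 * np.linalg.inv(Z.T @ Z)           # classical covariance matrix
    se = np.sqrt(np.diag(cov))
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return coef, se, coef / se, r2
```

The sign of each slope and its t-statistic are what the interpretation in step 2 asks about.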
3. Using the post-WWII data from January 1950 (skip the period 1945-1949 to eliminate
the effect of WWII), consider a predictive regression model of the following form:
$$R_{t+1} = \alpha + f(X_t) + \varepsilon_{t+1},$$
where $R_{t+1}$ is the log return on the S&P 500 market portfolio and $X_t$ is the 12 × 1
vector of lagged (one lag) predictors. Evaluate how much the following techniques
improve the prediction of stock market returns:
– OLS regressions,
– penalized linear regressions (Ridge, Lasso, Elastic Net),
– principal component analysis (PCA) (3, 5 and 10 components),
– random forests,
– boosted regression trees,
– extremely randomized regression trees,
– neural networks: shallow vs. deep (for example, 1 layer with 16 nodes vs. 2
layers with 16-8 nodes).
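Two of the linear techniques in this list have short closed-form versions, sketched below in NumPy as an illustration: ridge regression and principal-component regression on standardized predictors. Lasso and Elastic Net have no closed form; scikit-learn's `Lasso` and `ElasticNet`, together with `RandomForestRegressor`, `GradientBoostingRegressor`, `ExtraTreesRegressor` and `MLPRegressor(hidden_layer_sizes=(16,))` versus `(16, 8)`, are one possible toolkit for the remaining methods. The function names below are hypothetical, and scaling with training moments only is a design choice to avoid look-ahead bias.

```python
import numpy as np


def standardize(X_train, X_new):
    # Scale with training-sample moments only, so no future information leaks in
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    sd = np.where(sd > 0, sd, 1.0)
    return (X_train - mu) / sd, (X_new - mu) / sd


def ridge_forecast(X_train, y_train, X_new, lam):
    """Ridge: minimize ||y - a - Xb||^2 + lam * ||b||^2 (intercept not penalized)."""
    Xs, Xn = standardize(X_train, X_new)
    ybar = y_train.mean()            # intercept, since Xs has zero column means
    k = Xs.shape[1]
    b = np.linalg.solve(Xs.T @ Xs + lam * np.eye(k), Xs.T @ (y_train - ybar))
    return ybar + Xn @ b


def pcr_forecast(X_train, y_train, X_new, n_components):
    """Principal-component regression: OLS of y on the first n_components PCs of X."""
    Xs, Xn = standardize(X_train, X_new)
    # Right singular vectors of the standardized data are the PC loadings
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    W = Vt[:n_components].T
    F_train, F_new = Xs @ W, Xn @ W
    Z = np.column_stack([np.ones(len(F_train)), F_train])
    coef, *_ = np.linalg.lstsq(Z, y_train, rcond=None)
    return coef[0] + F_new @ coef[1:]
```

With `lam=0` ridge reproduces OLS, and with all 12 components PCR does too, which gives a quick sanity check; `n_components` of 3, 5 and 10 gives the PCA variants in the list.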
Starting from January 1981 ($t_0$ = 1981-01), apply the following estimation strategy.
Following common machine learning practice, split the historical data (that is, the data
available at the time you train the corresponding model) into two sub-samples: a training
set used to train the model and a validation set used to evaluate the estimated model on
an independent data set. Use model accuracy on the validation sample to search iteratively
for the hyperparameters that optimize the objective function.
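The validation step can be sketched generically as below: given the history available at one forecast date, in time order, fit each candidate hyperparameter on the first part and score it by mean squared error on the remainder. The 85% default anticipates the splitting rules that follow; `tune_on_validation` and the `fit_predict(X_train, y_train, X_eval, h)` interface are hypothetical, and MSE as the accuracy measure is an assumption.

```python
import numpy as np


def tune_on_validation(X, y, fit_predict, grid, train_frac=0.85):
    """Pick the hyperparameter with the lowest validation MSE.

    X, y contain only the history available at the forecast date, in time order.
    fit_predict(X_train, y_train, X_eval, h) returns predictions for X_eval.
    """
    n_train = int(train_frac * len(y))          # earliest observations train
    X_tr, y_tr = X[:n_train], y[:n_train]
    X_va, y_va = X[n_train:], y[n_train:]       # latest observations validate
    scores = {h: float(np.mean((y_va - fit_predict(X_tr, y_tr, X_va, h)) ** 2))
              for h in grid}
    best = min(scores, key=scores.get)
    return best, scores
```

For ridge, for example, `fit_predict` could wrap a ridge forecast and `grid` could be a list of penalty values.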
Regarding the splitting scheme, apply the following rules:
– keep the fractions of data used for training and validation fixed at 85% and 15%
of the historical data, respectively;
– training and validation samples are consecutive, that is, you always take the first
85% of the historical data for training and the remaining 15% for validation, preserving
the order of observations. In other words, you do not cross-validate by randomly
selecting independent subsets of data, so that the time-series dependence of both the
predictors and the target variable is preserved;
– forecasts are produced recursively using an expanding window procedure, that is, we
re-estimate a given model at each time t and produce out-of-sample forecasts of excess
returns at time t + 1. Because the window expands, the historical data, and hence the
training and validation samples, grow over time;
– notice that some methodologies do not require validation. For instance, neither
standard linear regressions nor PCA requires a pseudo-out-of-sample period to validate
the estimates. In these cases, we adopt the traditional separation between an in-sample
and an out-of-sample period, where the in-sample period consists of both the training
data and the validation data;
– for neural networks, it might be too computationally costly to fine-tune the networks
every month. For simplicity, if cross-validation is very slow and you do not have access
to a high-end computing cluster, you may re-estimate the network once per year. In other
words, if you fine-tune the network at time t, use this network, without changing the
hyperparameters, to forecast excess returns in every month of the following year, and
re-estimate the network only at t + 1 year.
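The rules above can be combined into one recursive loop, sketched below. Row i pairs $X_t$ with $R_{t+1}$, so the forecast for row i uses only rows before i. When a hyperparameter grid is given, the history is split consecutively 85/15; otherwise (OLS, PCA) the whole history is used for estimation. `refit_every=12` implements the once-a-year shortcut for neural networks. The names `expanding_window_forecasts` and the `fit(X_train, y_train, h)` interface, which must return a prediction function, are hypothetical, and re-estimating on the full history after tuning is an assumption; using the model fitted on the 85% training split alone is an equally defensible reading.

```python
import numpy as np


def expanding_window_forecasts(X, y, first_pred, fit, grid=None,
                               refit_every=1, train_frac=0.85):
    """One-step-ahead forecasts from an expanding window.

    first_pred is the row of the first forecast (t0 = 1981-01).
    fit(X_train, y_train, h) must return a function mapping X to predictions.
    refit_every=12 re-tunes and re-estimates once a year; 1 does so every month.
    """
    preds = np.full(len(y), np.nan)
    model, h = None, None
    for k, i in enumerate(range(first_pred, len(y))):
        if k % refit_every == 0:
            X_hist, y_hist = X[:i], y[:i]        # data available at the forecast date
            if grid is not None:
                # Consecutive split: first 85% trains, last 15% validates
                n_tr = int(train_frac * i)
                mse = [np.mean((y_hist[n_tr:] -
                                fit(X_hist[:n_tr], y_hist[:n_tr], g)(X_hist[n_tr:])) ** 2)
                       for g in grid]
                h = grid[int(np.argmin(mse))]
            # Assumption: re-estimate on the full history once h is chosen
            model = fit(X_hist, y_hist, h)
        # Between refits the previously estimated model is reused unchanged
        preds[i] = model(X[i:i + 1])[0]
    return preds
```

Passing a `fit` that ignores `X` and returns the sample mean of `y_train` gives the historical-mean benchmark from the same loop.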
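Report the out-of-sample Mean Squared Prediction Error (MSPE) and $R^2_{oos}$, computed
as follows:
$$\text{MSPE} = \frac{1}{T - t_0 - 1} \sum_{t=t_0}^{T-1} \left(R_{t+1} - \tilde{R}_{t+1}\right)^2,$$
$$R^2_{oos} = 1 - \frac{\sum_{t=t_0}^{T-1} \left(R_{t+1} - \tilde{R}_{t+1}\right)^2}{\sum_{t=t_0}^{T-1} \left(R_{t+1} - \bar{R}_{t+1}\right)^2},$$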
where $\tilde{R}_{t+1}$ is the one-step-ahead forecast of equity returns, $\bar{R}_{t+1}$ is the
historical mean return computed from the data available at time t, and $t_0$ is the date of
the first prediction. Discuss your results. You may want to answer some of the questions below.
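Both statistics can be computed as in the sketch below, once realized returns, model forecasts and historical-mean forecasts are aligned over the out-of-sample period. The MSPE divisor follows the formula as written, $T - t_0 - 1$, which is one less than the number of forecasts. Taking $\bar{R}_{t+1}$ as the expanding mean of all returns observed up to t is an assumption; `oos_metrics` and `prevailing_mean` are hypothetical helper names.

```python
import numpy as np


def oos_metrics(r, r_hat, r_bar):
    """MSPE and R2_oos over the forecast period t0..T-1.

    r, r_hat, r_bar: realized R_{t+1}, model forecasts, historical-mean forecasts,
    aligned and restricted to the out-of-sample period.
    """
    r, r_hat, r_bar = map(np.asarray, (r, r_hat, r_bar))
    sse_model = np.sum((r - r_hat) ** 2)
    sse_mean = np.sum((r - r_bar) ** 2)
    mspe = sse_model / (len(r) - 1)        # len(r) = T - t0 forecasts
    r2_oos = 1.0 - sse_model / sse_mean    # > 0 means the model beats the mean
    return mspe, r2_oos


def prevailing_mean(y, first_pred):
    """Historical-mean forecast of y[i] from y[0..i-1], for i >= first_pred >= 1."""
    y = np.asarray(y, dtype=float)
    csum = np.cumsum(y)
    i = np.arange(first_pred, len(y))
    return csum[i - 1] / i
```

A positive $R^2_{oos}$ indicates that the model's forecasts have a lower squared error than the historical mean over the evaluation period.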
– Compare the predictability implied by tree-based methods and by the standard
linear regressions.
– How do extremely randomized trees affect the predictability?
– How do shallow and deep neural networks affect the predictability?
Suggest and perform some robustness checks, and then discuss how your initial
results change.
