
Matematiska Institutionen
Department of Mathematics
Master's Thesis

Forecasting the Equity Premium and Optimal Portfolios

Johan Bjurgert and Marcus Edstrand
Reg Nr: LITH-MAT-EX--2008/04--SE
Linköping 2008

Matematiska institutionen
Linköpings universitet
581 83 Linköping

Forecasting the Equity Premium and Optimal Portfolios
Department of Mathematics, Linköpings universitet
Johan Bjurgert and Marcus Edstrand
LITH-MAT-EX--2008/04--SE

Handledare: Dr Jörgen Blomvall, MAI, Linköpings universitet
            Dr Wolfgang Mader, risklab GmbH
Examinator: Dr Jörgen Blomvall, MAI, Linköpings universitet

Linköping, 15 April, 2008
Abstract
The expected equity premium is an important parameter in many financial models, especially within portfolio optimization. A good forecast of the future equity premium is therefore of great interest. In this thesis we seek to forecast the equity premium, use it in portfolio optimization and then give evidence on how sensitive the results are to estimation errors and how the impact of these can be minimized.

Linear prediction models are commonly used by practitioners to forecast the expected equity premium, with mixed results. Choosing only the model that performs best in-sample for forecasting does not take model uncertainty into account. Our approach is to still use linear prediction models, but also to take model uncertainty into consideration by applying Bayesian model averaging. The predictions are used in the optimization of a portfolio with risky assets to investigate how sensitive portfolio optimization is to estimation errors in the mean vector and covariance matrix. This is performed by using a Monte Carlo based heuristic called portfolio resampling.

The results show that the predictive ability of linear models is not substantially improved by taking model uncertainty into consideration. This could mean that the main problem with linear models is not model uncertainty, but rather too low predictive ability. However, we find that our approach gives better forecasts than just using the historical average as an estimate. Furthermore, we find some predictive ability in the GDP, the short term spread and the volatility for the five years to come. Portfolio resampling proves to be useful when the input parameters in a portfolio optimization problem suffer from vast uncertainty.

Keywords: equity premium, Bayesian model averaging, linear prediction, estimation errors, Markowitz optimization
Acknowledgments
First of all we would like to thank risklab GmbH for giving us the opportunity to write this thesis. It has been a truly rewarding experience. We are grateful for the many inspirational discussions with Wolfgang Mader, our supervisor at risklab. He has also provided us with valuable comments and suggestions. We thank our supervisor at LiTH, Jörgen Blomvall, for his continuous support and feedback. Finally we would like to acknowledge our opponent Tobias Törnfeldt for his helpful comments.

Johan Bjurgert
Marcus Edstrand
Munich, April 2008
Contents

1 Introduction 5
1.1 Objectives 6
1.2 Problem definition 6
1.3 Limitations 6
1.4 Contributions 6
1.5 Outline 6

I Equity Premium Forecasting using Bayesian Statistics 7

2 The Equity Premium 9
2.1 What is the equity premium? 9
2.2 Historical models 10
2.3 Implied models 11
2.4 Conditional models 12
2.5 Multi factor models 13
2.6 A short summary of the models 14
2.7 What is a good model? 15
2.8 Chosen model 15

3 Linear Regression Models 17
3.1 Basic definitions 17
3.2 The classical regression assumptions 21
3.3 Robustness of OLS estimates 22
3.4 Testing the regression assumptions 23

4 Bayesian Statistics 25
4.1 Basic definitions 25
4.2 Sufficient statistics 26
4.3 Choice of prior 28
4.4 Marginalization 30
4.5 Bayesian model averaging 30
4.6 Using BMA on linear regression models 32

5 The Data Set and Linear Prediction 37
5.1 Chosen series 37
5.2 The historical equity premium 37
5.3 Factors explaining the equity premium 39
5.4 Testing the assumptions of linear regression 45
5.5 Forecasting by linear regression 51

6 Implementation 53
6.1 Overview 53
6.2 Linear prediction 54
6.3 Bayesian model averaging 55
6.4 Backtesting 55

7 Results 57
7.1 Univariate forecasting 57
7.2 Multivariate forecasting 60
7.3 Results from the backtest 62

8 Discussion of the Forecasting 65

II Using the Equity Premium in Asset Allocation 69

9 Portfolio Optimization 71
9.1 Solution of the Markowitz problem 71
9.2 Estimation error in Markowitz portfolios 76
9.3 The method of portfolio resampling 77
9.4 An example of portfolio resampling 78
9.5 Discussion of portfolio resampling 79

10 Backtesting Portfolio Performance 85
10.1 Backtesting setup and results 85

11 Conclusions 89

Bibliography 91

A Mathematical Preliminaries 97
A.1 Statistical definitions 97
A.2 Statistical distributions 98

B Code 100
B.1 Univariate predictions 100
B.2 Multivariate predictions 101
B.3 Merge time series 103
B.4 Load data into Matlab from Excel 103
B.5 Permutations 104
B.6 Removal of outliers and linear prediction 104
B.7 setSubColumn 104
B.8 Portfolio resampling 105
B.9 Quadratic optimization 106

List of Figures
3.1 OLS by means of projection 18
3.2 The effect of outliers 22
3.3 Example of a Q-Q plot 24
4.1 Bayesian revising of probabilities 26
5.1 The historical equity premium over time 38
5.2 Shapes of the yield curve 43
5.3 QQ-plot of the one step lagged residuals for factors 1-9 47
5.4 QQ-plot of the one step lagged residuals for factors 10-18 48
5.5 Lagged factors 1-9 versus returns on the equity premium 49
5.6 Lagged factors 10-18 versus returns on the equity premium 50
6.1 Flowchart 53
6.2 User interface 54
7.1 The equity premium from the univariate forecasts 58
7.2 Likelihood function values for different g-values 59
7.3 The equity premium from the multivariate forecasts 60
7.4 Backtest of univariate models 62
7.5 Backtest of multivariate models 63
9.1 Comparison of efficient and resampled frontier 81
9.2 Resampled portfolio allocation when shorting allowed 82
9.3 Resampled portfolio allocation when no shorting allowed 83
9.4 Comparison of estimation error in mean and covariance 84
10.1 Portfolio value over time using different strategies 86

List of Tables
2.1 Advantages and disadvantages of discussed models 14
3.1 Critical values for the Durbin-Watson test 23
5.1 The data set and sources 38
5.2 Basic statistics for the factors 40
5.3 Outliers identified by the leverage measure 45
5.4 Jarque-Bera test of normality 46
5.5 Durbin-Watson test of autocorrelation 46
5.6 Principle of lagging time series for forecasting 51
5.7 Lagged R² for univariate regression 52
7.1 Forecasting statistics in percent 57
7.2 The univariate model with highest probability over time 58
7.3 Out of sample R²_{os,uni} and hit ratios HR_uni 59
7.4 Forecasting statistics in percent 60
7.5 The multivariate model with highest probability over time 61
7.6 Forecasts for different g-values 61
7.7 Out of sample R²_{os,mv} and hit ratios HR_mv 61
9.1 Input parameters for portfolio resampling 78
10.1 Portfolio returns over time 86
10.2 Terminal portfolio value 87
Nomenclature
The most frequently used symbols and abbreviations are described here.

Symbols
Demanded portfolio return
β_{i,t}   Beta for asset i at time t
β_t       True least squares parameter at time t
Asset return vector
Ω_t       Information set at time t
Estimated covariance matrix
cov[X]    Covariance of the random variable X
β̂_t       Least squares estimate at time t
Sampled covariance matrix
û_t       Least squares sample residual at time t
λ_{m,t}   Market m price of risk at time t
C         Covariance matrix
I_n       The identity matrix of size n × n
w         Weights of assets
tr[X]     The trace of the matrix X
var[X]    Variance of the random variable X
D_{i,t}   Dividend for asset i at time t
E[X]      Expected value of the random variable X
r_{f,t}   Riskfree rate at time t to t + 1
r_{m,t}   Return from asset m at time t
u_t       Population residual in the least squares model at time t

Abbreviations
aHEP  Average historical equity premium
BMA   Bayesian model averaging
DJIA  Dow Jones Industrial Average
EEP   Expected equity premium
GDP   Gross domestic product
HEP   Historical equity premium
IEP   Implied equity premium
OLS   Ordinary least squares
REP   Required equity premium
Chapter 1
Introduction
The expected equity risk premium is one of the single most important economic variables. A meaningful estimate of the premium is critical to valuing companies and stocks and for planning future investments. However, the only premium that can be observed is the historical premium.

Since the equity premium is shaped by overall market conditions, factors influencing market conditions can be used to explain the equity premium. Although predictive power usually is low, the factors can also be used for forecasting. Many of the investigations undertaken typically set out to determine a best model, consisting of a set of economic predictors, and then proceed as if the selected model had generated the equity premium. Such an approach ignores the uncertainty in model selection, leading to overconfident inferences that are riskier than one thinks. In our thesis we will forecast the equity premium by computing a weighted average of a large number of linear prediction models, using Bayesian model averaging (BMA) so that model uncertainty is taken into account.

Having forecasted the equity premium, the key input for asset allocation optimization models, we conclude by highlighting the main pitfalls in the mean-variance optimization framework and present portfolio resampling as a way to arrive at suitable allocation decisions when the input parameters are very uncertain.
1.1 Objectives
The objective of this thesis is to build a framework for forecasting the equity
premium and then implement it to produce a functional tool for practical use.
Further, the impact of uncertain input parameters in mean-variance optimization
shall be investigated.
1.2 Problem definition
By means of BMA and linear prediction, what is the expected equity premium
for the years to come and how is it best used as an input in a mean variance
optimization problem?
1.3 Limitations
The practical part of this thesis is limited to the use of US time series only.
However, the theoretical framework is valid for all economies.
1.4 Contributions
To the best knowledge of the authors, this is the first attempt to forecast the equity premium using Bayesian model averaging with the priors specified later in the thesis.
1.5 Outline
The first part of the thesis is about forecasting the equity premium whereas the second part discusses the importance of parameter uncertainty in portfolio optimization.

In chapter 2 we present the concept of the equity premium, usual assumptions thereof and associated models. Chapter 3 describes the fundamental ideas of linear regression and its limitations. In chapter 4 we first present basic concepts of Bayesian statistics and then use them to combine the properties of linear prediction with Bayesian model averaging. Having defined the forecasting approach, we turn in chapter 5 to the factors explaining the equity premium. Chapter 6 addresses the implementation of the theory. Finally, chapter 7 presents our results and a discussion thereof is found in chapter 8. In chapter 9 we investigate the impact of estimation error on portfolio optimization. In chapter 10 we evaluate the performance of a portfolio when using the forecasted equity premium and portfolio resampling. With chapter 11 we conclude our thesis and make propositions for future investigations and work.
Part I
Equity Premium Forecasting
using Bayesian Statistics
Chapter 2
The Equity Premium
In this chapter we define the concept of the equity premium and present some models that have been used for estimating the premium. At the end of the chapter, a table summing up advantages and disadvantages of the different models is provided. The chapter concludes with a motivation of why we have chosen to work with multi factor models and a summary of criteria for a good model.
2.1 What is the equity premium?
As defined by Fernández [32], the equity premium can be split up into four different concepts. These concepts hold for single stocks as well as for stock indices. In our thesis the emphasis is on stock indices.

historical equity premium (HEP): the historical return of the stock market over the riskfree asset

expected equity premium (EEP): the expected return of the stock market over the riskfree asset

required equity premium (REP): the incremental return of the market portfolio over the riskfree rate required by an investor in order to hold the market portfolio, or the extra return that the overall stock market must provide over the riskfree asset to compensate for the extra risk

implied equity premium (IEP): the required equity premium that arises from a pricing model and from assuming that the market price is correct.
The HEP is observable on the financial market and is equal for all investors.¹ It is calculated by

\[
\mathrm{HEP}_t = r_{m,t} - r_{f,t-1} = \Big(\frac{P_t}{P_{t-1}} - 1\Big) - r_{f,t-1} \qquad (2.1)
\]

where $r_{m,t}$ is the return on the stock market, $r_{f,t-1}$ is the rate on a riskfree asset from $t-1$ to $t$, and $P_t$ is the stock index level.

¹ This is true as long as they use the same instruments and the same time resolution.
A widely used measure for $r_{m,t}$ is the return on a large stock index. For the second asset, $r_{f,t-1}$ in (2.1), the return on government securities is usually used. Some practitioners use the return on short-term treasury bills; some use the returns on long-term government bonds. Yields on bonds instead of returns have also been used to some extent. Despite the indisputable importance of the equity premium, a general consensus on exactly which assets should enter expression (2.1) does not exist. Questions like "Which stock index should be used?" and "Which riskfree instrument should be used and which maturity should it have?" remain unanswered.
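As an illustration, the HEP series in (2.1) can be computed directly from an index level series and a riskfree rate series. The Matlab sketch below uses made-up numbers and our own variable names; it is not taken from the thesis code in appendix B.

% Minimal sketch of equation (2.1): HEP_t = P_t/P_{t-1} - 1 - r_{f,t-1}.
% P holds year-end index levels, rf the riskfree rate set at the start of
% each year (in decimal form). Both series are illustrative only.
P  = [ 8000;  8800;  8300;  9500];   % index levels for years t = 0..3
rf = [0.045; 0.040; 0.035];          % riskfree rate from t-1 to t

marketReturn = P(2:end) ./ P(1:end-1) - 1;  % r_{m,t}
HEP = marketReturn - rf;                    % realized equity premium

% The arithmetic average of the realized premiums is the usual
% historical estimate (the aHEP in the nomenclature).
aHEP = mean(HEP);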
The EEP is made up of the market's expectations of future returns over a riskfree asset and is therefore not observable in the financial market. Its magnitude and the most appropriate way to produce estimates thereof is an intensively debated topic among economists. The market expectations shaping the premium are based on, at least, a non-negative premium and to some extent also average realizations of the HEP. This would mean that there is a relation between the EEP and the HEP. Some authors (e.g. [9], [21], [37] and [42]) even argue that there is a strict equality between the two, whereas others claim that the EEP is smaller than the HEP (e.g. [45], [6] and [22]). Although investors have different opinions about what the correct level of the expected equity premium is, many basic financial books recommend using 5-8%.²

The required equity premium (REP) is important in valuation since it is the key to determining the company's required return on equity.

If one believes that prices on the financial markets are correct, then the implied equity premium (IEP) would be an estimate of the expected equity premium (EEP).

We now turn to presenting models used to produce estimates of the different concepts.

² See for instance [8].
2.2 Historical models
Probably the most used method by practitioners is to use the historical realized equity premium as a proxy for the expected equity premium [64]. They thereby implicitly follow the relationship HEP = EEP.

Assuming that the historical equity premium is equal to the expected equity premium can be formulated as

\[
r_{m,t} = E_{t-1}[r_{m,t}] + e_{m,t} \qquad (2.2)
\]

where $e_{m,t}$ is the error term, the unexpected return. The expectation is often computed as the arithmetic average of all available values for the HEP. In equation (2.2) it is assumed that the errors are independent and have a mean of zero. The model then implies that investors are rational and that the random error term corresponds to their mistakes. It is also possible to model more advanced errors. For example, an autoregressive error term might be motivated since market returns sometimes exhibit positive autocorrelation. An AR(1) model then implies that investors need one time step to learn about their mistakes. [64]

The model has the advantages of being intuitive and easy to use. The drawbacks on the other hand are not few. Except for the usual problems with time series, such as the length used, outliers etc., the model suffers from problems with longer periods where the riskfree asset has a higher average return than the equity. Clearly, this is not plausible since an investor expects a positive return in order to invest.
2.3 Implied models
Implied models for the equity premium make use of the assumption EEP = IEP and are used much in the same way as investors use the Black and Scholes formula backwards to solve for implied volatility. The advantage of implied models is that they provide time-varying estimates of the expected market returns since prices and expectations change over time. The main drawback is that the validity is bounded by the validity of the model used. Lately, the inverse Black-Litterman model has attracted interest, see for instance [67]. Another more widely used model is the Gordon dividend growth model, which is further discussed in [11]. Under certain assumptions it can be written as

\[
P_{i,t} = \frac{E[D_{i,t+1}]}{E[r_{i,t+1}] - E[g_{i,t+1}]} \qquad (2.3)
\]

where $E[D_{i,t+1}]$ is the next year's expected dividend, $E[r_{i,t+1}]$ the required rate of return and $E[g_{i,t+1}]$ the company's expected growth rate of dividends from today until infinity.

Assuming that CAPM³ holds, the required rate of return for stock $i$ can be written as

\[
E[r_{i,t}] = r_{f,t} + \beta_{i,t} E[r_{m,t} - r_{f,t}] \qquad (2.4)
\]

By combining the two equations, where dividends are approximated as $E[D_{i,t+1}] = (1 + E[g_{i,t+1}])D_{i,t}$, under the assumption that $E[r_{f,t+1}] = r_{f,t+1}$, and by aggregating over all assets, we can now solve for the expected market risk premium

\[
E[r_{m,t+1}] = \frac{(1 + E[g_{m,t+1}])\,D_{m,t}}{P_{m,t}} + E[g_{m,t+1}]
= (1 + E[g_{m,t+1}])\,\mathrm{DivYield}_{m,t} + E[g_{m,t+1}] \qquad (2.5)
\]

where $E[r_{m,t+1}]$ is the expected market risk premium, $D_{m,t}$ is the sum of dividends from all companies, $E[g_{m,t+1}]$ is the expected growth rate of the dividends from today to infinity⁴, and $\mathrm{DivYield}_{m,t}$ is the current market price dividend yield. [64]

³ Capital asset pricing model, see [7]
⁴ $E[R_{m,t+1}] > E[g_{m,t+1}]$
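To make (2.5) concrete, here is a small Matlab sketch with made-up inputs. It follows the standard Gordon reading: the right-hand side of (2.5) gives an expected market return, from which an implied premium follows by subtracting a riskfree rate; the numbers and variable names are illustrative assumptions, not data from the thesis.

% Sketch of equation (2.5) with illustrative numbers.
divYield = 0.025;   % current market dividend yield D_{m,t}/P_{m,t}
g        = 0.05;    % assumed expected dividend growth rate E[g_{m,t+1}]
rf       = 0.04;    % riskfree rate

expectedMarketReturn = (1 + g)*divYield + g;      % right-hand side of (2.5)
impliedEquityPremium = expectedMarketReturn - rf; % IEP-type estimate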
One criticism against using the Gordon dividend growth model is that the result depends heavily on what number is used for the expected dividend growth rate; the problem is thereby shifted to forecasting the expected dividend growth rate.
2.4 Conditional models
Conditional models refer to models conditioning on the information investors use to estimate the risk premium, and thereby allow for time-varying estimates. On the other hand, the information set $\Omega_t$ used by investors is not observable on the market and it is not clear how to specify a method that investors use to form their expectations from the data set.

As an example of such a model, the conditional version of the CAPM implies the following restriction for the excess returns

\[
E[r_{i,t} \mid \Omega_{t-1}] = \beta_{i,t}\, E[r_{m,t} \mid \Omega_{t-1}] \qquad (2.6)
\]

where the market beta is

\[
\beta_{i,t} = \frac{\operatorname{cov}[r_{i,t}, r_{m,t} \mid \Omega_{t-1}]}{\operatorname{var}[r_{m,t} \mid \Omega_{t-1}]} \qquad (2.7)
\]

and $E[r_{i,t} \mid \Omega_{t-1}]$ and $E[r_{m,t} \mid \Omega_{t-1}]$ are the expected returns on asset $i$ and the market portfolio conditional on the investors' information set $\Omega_{t-1}$.⁵

Observing that the ratio $E[r_{m,t} \mid \Omega_{t-1}] / \operatorname{var}[r_{m,t} \mid \Omega_{t-1}]$ is the market price of risk $\lambda_{m,t}$, measuring the compensation an investor must receive for a unit increase in the market return variance [55], yields the following expression for the market portfolio's expected excess returns

\[
E[r_{m,t} \mid \Omega_{t-1}] = \lambda_{m,t}(\Omega_{t-1})\operatorname{var}[r_{m,t} \mid \Omega_{t-1}]. \qquad (2.8)
\]

By specifying a model for the conditional variance process, the equity premium can be estimated.

⁵ Both returns are in excess of the riskless rate of return $r_{f,t-1}$ and all returns are measured in one numeraire currency.
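A minimal sketch of how (2.8) could be operationalized: estimate the conditional variance with a simple exponentially weighted scheme and the price of risk from historical excess returns. The smoothing parameter, the data and the variable names below are illustrative assumptions of ours; the thesis does not prescribe this particular variance model.

% Sketch of equation (2.8): E[r_m | Omega] = lambda * var[r_m | Omega].
excessReturns = [0.08; -0.12; 0.15; 0.03; 0.10; -0.05];  % placeholder series

% Exponentially weighted estimate of the conditional variance
% (illustrative choice of smoothing parameter).
lambdaEwma = 0.94;
condVar = var(excessReturns);            % initialize with the sample variance
for t = 1:length(excessReturns)
    condVar = lambdaEwma*condVar + (1 - lambdaEwma)*excessReturns(t)^2;
end

% Price of risk estimated as mean excess return per unit of variance.
priceOfRisk = mean(excessReturns) / var(excessReturns);

% One-step-ahead premium estimate according to (2.8).
premiumForecast = priceOfRisk * condVar;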
2.5 Multi factor models
Multi factor models make use of the correlation between equity returns and returns from other economic factors. By choosing a set of economic factors and by determining the coefficients, the equity premium can be estimated as

\[
r_{m,t} = \alpha_t + \sum_j \beta_{j,t} X_{j,t} + \varepsilon_t \qquad (2.9)
\]

where the coefficients $\alpha$ and $\beta$ usually are calculated using the least squares method (OLS), $X$ contains the factors and $\varepsilon$ is the error.
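A minimal Matlab sketch of estimating (2.9) by OLS on lagged factors follows. The data are placeholders and the fit is the plain textbook least squares regression, not the Bayesian treatment developed in chapter 4.

% Sketch of equation (2.9): regress the premium on lagged factors by OLS.
% ep is the realized equity premium, F a matrix with one factor per column;
% both are illustrative placeholders.
ep = [0.06; -0.02; 0.09; 0.01; 0.05];
F  = [0.030 1.2; 0.025 1.5; 0.040 0.9; 0.020 1.1; 0.035 1.0];

y = ep(2:end);                            % premium at time t
X = [ones(size(F,1)-1,1) F(1:end-1,:)];   % intercept + factors lagged one step

beta     = X \ y;                 % OLS coefficients (alpha first, then betas)
yHat     = X * beta;              % fitted premium
forecast = [1 F(end,:)] * beta;   % forecast using the latest factor values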
The most prominent candidates for economic factors used as explanatory variables are the dividend to price ratio and the dividend yield (e.g. [60], [12], [28], [40] and [51]), the earnings to price ratio (e.g. [13], [14] and [48]), the book to market ratio (e.g. [46] and [58]), short term interest rates (e.g. [40] and [1]), yield spreads (e.g. [43], [15] and [29]), and more recently the consumption-wealth ratio (e.g. [50]). Other candidates are dividend payout ratios, corporate or net issuing ratios and beta premia (e.g. [37]), the term spread and the default spread (e.g. [2], [15], [29] and [43]), the inflation rate (e.g. [30], [27] and [19]), the value of high and low beta stocks (e.g. [57]) and aggregate financing activity (e.g. [3]).

Goyal and Welch [37] showed that most of the mentioned predictors performed worse out-of-sample than just assuming that the equity premium had been constant. They also found that the predictors were not stable, that is, their importance changes over time. Campbell and Thompson [16] on the other hand found that some of the predictors, with significant forecasting power in-sample, generally have better out-of-sample forecast power than a forecast based on the historical average.
2.6 A short summary of the models

Model type | Advantages | Disadvantages
Historical | Intuitive and easy to use | Might have problems with longer periods of negative equity premium; doubtful whether the past is an indicator of the future
Implied | Relatively simple to use; provides time varying estimates of the premium | The validity of the estimates is bounded by the validity of the used model; assumes market prices are correct
Conditional | Provides time varying estimates of the premium | The information used by investors is not visible on the market; models for determining how investors form their expectations from the information are not unambiguous
Multi Factor | High model transparency and results are easy to interpret | It is doubtful whether the past is an indicator of the future; forecasts are only possible for a short time horizon, due to lagging

Table 2.1. Table highlighting advantages and disadvantages of the discussed models
2.7 What is a good model?
These are model criteria that the authors, inspired by Vaihekoski [64], consider important for a good estimate of the equity premium:

Economic reasoning criteria
The premium estimate should be positive for most of the time
Model inputs should be visible at the financial markets
The estimated premium should be rather smooth over time, because investor preferences presumably do not change much over time
The model should provide different premium estimates for different time horizons, that is, taking investors' time structure into account

Technical reasoning criteria
The model should allow for time variation in the premium
The model should make use of the latest time t observation
The model should be provided with a precision of the estimated premium
It should be possible to use different time resolutions in the data input
2.8 Chosen model
All model categories previously stated are likely to be useful in estimating the equity premium. In our thesis we have chosen to work with multi factor models because they are intuitively more straightforward than both implied and conditional models; all model inputs are visible on the market and it is perfectly clear from the model how different factors add up to the equity premium. Furthermore, it is easy to add constraints to the model, which enables the use of economic reasoning as a complement to pure statistical analysis.
Chapter 3
Linear Regression Models
First we summarize the mechanics of linear regression and present some formulas that hold regardless of which statistical assumptions are made. Then we discuss different statistical assumptions about the properties of the model and the robustness of the estimates.
3.1 Basic definitions
Suppose that a scalar $y_t$ is related to a vector $x_t \in \mathbb{R}^{k \times 1}$ and a noise term $u_t$ according to the regression model

\[
y_t = x_t'\beta + u_t. \qquad (3.1)
\]

Definition 3.1 (Ordinary least squares, OLS) Given an observed sample $(y_1, y_2, \ldots, y_T)$, the ordinary least squares estimate of $\beta$ (denoted $\hat{\beta}_t$) is the value that minimizes the residual sum of squares

\[
V(\beta) = \sum_{t=1}^{T} \hat{u}_t^2(\beta) = \sum_{t=1}^{T} (y_t - \hat{y}_t)^2 = \sum_{t=1}^{T} (y_t - x_t'\beta)^2
\]

(see [38]).

Theorem 3.1 (Ordinary least squares estimate) The OLS estimate is given by

\[
\hat{\beta} = \Big[\sum_{t=1}^{T} x_t x_t'\Big]^{-1} \Big[\sum_{t=1}^{T} x_t y_t\Big] \qquad (3.2)
\]

assuming that the matrix $\sum_{t=1}^{T} x_t x_t' \in \mathbb{R}^{k \times k}$ is nonsingular (see [38]).

Proof: The result is found by differentiation,

\[
\frac{dV(\beta)}{d\beta} = -2\sum_{t=1}^{T} x_t (y_t - x_t'\beta) = 0,
\]

and the minimizing argument is thus

\[
\hat{\beta} = \Big[\sum_{t=1}^{T} x_t x_t'\Big]^{-1} \Big[\sum_{t=1}^{T} x_t y_t\Big]. \qquad \square
\]
Often, the regression model is written in matrix notation as

\[
y = X\beta + u, \qquad (3.3)
\]

where

\[
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
X = \begin{bmatrix} x_1' \\ x_2' \\ \vdots \\ x_n' \end{bmatrix}, \quad
u = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}.
\]

A perhaps more intuitive way to arrive at equation (3.2) is to project $y$ on the column space of $X$.

Figure 3.1. OLS by means of projection

The vector of the OLS sample residuals, $\hat{u}$, can then be written as $\hat{u} = y - X\hat{\beta}$. Consequently the loss function $V(\beta)$ for the least squares problem can be written $V(\beta) = \min_{\beta}(\hat{u}'\hat{u})$.

Since $\hat{y}$, the projection of $y$ on the column space of $X$, is orthogonal to $\hat{u}$,

\[
\hat{u}'\hat{y} = \hat{y}'\hat{u} = 0. \qquad (3.4)
\]

In the same way, the OLS sample residuals are orthogonal to the explanatory variables in $X$,

\[
\hat{u}'X = 0. \qquad (3.5)
\]
Now, substituting $\hat{y} = X\hat{\beta}$ into (3.4) yields

\[
(X\hat{\beta})'(y - X\hat{\beta}) = 0 \;\Longleftrightarrow\; \hat{\beta}'(X'y - X'X\hat{\beta}) = 0.
\]

By choosing the nontrivial solution for beta, and by noticing that if $X$ is of full rank then the matrix $X'X$ is also of full rank, we can compute the least squares estimator by inverting $X'X$,

\[
\hat{\beta} = (X'X)^{-1}X'y. \qquad (3.6)
\]

The OLS sample residual $\hat{u}$ shall not be confused with the population residual $u$. The vector of OLS sample residuals can be written as

\[
\hat{u} = y - X\hat{\beta} = y - X(X'X)^{-1}X'y = [I_n - X(X'X)^{-1}X']y = M_X y. \qquad (3.7)
\]

The relationship between the two errors can now be found by substituting equation (3.3) into equation (3.7)

\[
\hat{u} = M_X(X\beta + u) = M_X u. \qquad (3.8)
\]

The difference between the OLS estimate $\hat{\beta}$ and the true parameter $\beta$ is found by substituting equation (3.3) into (3.6)

\[
\hat{\beta} = (X'X)^{-1}X'[X\beta + u] = \beta + (X'X)^{-1}X'u. \qquad (3.9)
\]
Definition 3.2 (Coefficient of determination) The coefficient of determination, $R^2$, is defined as the fraction of variance that is explained by the model

\[
R^2 = \frac{\operatorname{var}[\hat{y}]}{\operatorname{var}[y]}.
\]

If we let $X$ include an intercept, then (3.5) also implies that the fitted residuals have a zero mean, $\frac{1}{n}\sum_{i=1}^{n}\hat{u}_i = 0$. Now we can decompose the variance of $y$ into the variance of $\hat{y}$ and $\hat{u}$

\[
\operatorname{var}[y] = \operatorname{var}[\hat{y} + \hat{u}] = \operatorname{var}[\hat{y}] + \operatorname{var}[\hat{u}] + 2\operatorname{cov}[\hat{y}, \hat{u}].
\]

Rewriting the covariance as

\[
\operatorname{cov}[\hat{y}, \hat{u}] = E[\hat{y}\hat{u}] - E[\hat{y}]E[\hat{u}]
\]

and by using $\hat{y} \perp \hat{u}$ and $E[\hat{u}] = 0$ we can write $R^2$ as

\[
R^2 = \frac{\operatorname{var}[\hat{y}]}{\operatorname{var}[y]} = 1 - \frac{\operatorname{var}[\hat{u}]}{\operatorname{var}[y]}.
\]

Since OLS minimizes the sum of squared fitted errors, which is proportional to $\operatorname{var}[\hat{u}]$, it also maximizes $R^2$.

By substituting the estimated variances, $R^2$ can be written as

\[
\frac{\operatorname{var}[\hat{y}]}{\operatorname{var}[y]}
= \frac{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2}{\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2}
= \frac{\sum_{i=1}^{n}\hat{y}_i^2 - n\bar{y}^2}{\sum_{i=1}^{n}y_i^2 - n\bar{y}^2}
= \frac{(X\hat{\beta})'(X\hat{\beta}) - n\bar{y}^2}{y'y - n\bar{y}^2}
= \frac{y'X(X'X)^{-1}X'y - n\bar{y}^2}{y'y - n\bar{y}^2}
\]

where the identity used is calculated as

\[
\sum_{i=1}^{n}(x_i - \bar{x})^2
= \sum_{i=1}^{n}\Big[x_i^2 - \frac{2}{n}x_i\sum_{i=1}^{n}x_i + \frac{1}{n^2}\Big(\sum_{i=1}^{n}x_i\Big)^2\Big]
= \sum_{i=1}^{n}x_i^2 - \frac{2}{n}\Big(\sum_{i=1}^{n}x_i\Big)^2 + \frac{n}{n^2}\Big(\sum_{i=1}^{n}x_i\Big)^2
= \sum_{i=1}^{n}x_i^2 - \frac{1}{n}\Big(\sum_{i=1}^{n}x_i\Big)^2
= \sum_{i=1}^{n}x_i^2 - n\bar{x}^2.
\]
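The closed-form estimate (3.6) and the $R^2$ identity above translate directly into a few lines of Matlab. The sketch below uses placeholder data and our own variable names; in practice the backslash operator is preferred over forming the inverse explicitly.

% Sketch of the OLS estimate (3.6) and the coefficient of determination.
X = [ones(6,1) (1:6)'];              % design matrix with an intercept
y = [1.1; 1.9; 3.2; 3.8; 5.1; 6.0];  % placeholder responses

betaHat = (X'*X) \ (X'*y);   % equation (3.6); equivalent to X \ y
yHat    = X * betaHat;       % projection of y on the column space of X
uHat    = y - yHat;          % OLS sample residuals

R2 = 1 - var(uHat) / var(y); % R^2 = 1 - var[u_hat]/var[y]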
3.2 The classical regression assumptions
The following assumptions¹ are used for later calculations:

1. $x_t$ is a vector of deterministic variables
2. $u_t$ is i.i.d. with mean 0 and variance $\sigma^2$ ($E[u] = 0$ and $E[uu'] = \sigma^2 I_n$)
3. $u_t$ is Gaussian $(0, \sigma^2)$

Substituting equation (3.3) into equation (3.6) and taking expectations using assumptions 1 and 2 establishes that $\hat{\beta}$ is unbiased,

\[
\hat{\beta} = (X'X)^{-1}X'[X\beta + u] = \beta + (X'X)^{-1}X'u \qquad (3.10)
\]
\[
E[\hat{\beta}] = \beta + (X'X)^{-1}X'E[u] = \beta \qquad (3.11)
\]

with covariance matrix given by

\[
E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)'] = E[(X'X)^{-1}X'uu'X(X'X)^{-1}] \qquad (3.12)
\]
\[
= (X'X)^{-1}X'E[uu']X(X'X)^{-1} = \sigma^2(X'X)^{-1}X'X(X'X)^{-1} = \sigma^2(X'X)^{-1}.
\]

When $u$ is Gaussian, the above calculations imply that $\hat{\beta}$ is Gaussian. Hence, the preceding results imply $\hat{\beta} \sim N(\beta, \sigma^2(X'X)^{-1})$.

It can further be shown that under assumptions 1, 2 and 3, $\hat{\beta}$ is BLUE², that is, no unbiased estimator of $\beta$ is more efficient than the OLS estimator $\hat{\beta}$.

¹ As treated in [38]
² BLUE, best linear unbiased estimator, see the Gauss-Markov theorem
3.3 Robustness of OLS estimates
The most serious problem with OLS is non-robustness to outliers. One single bad point will have a strong influence on the solution. To remedy this one can discard the worst fitting data point and recompute the OLS fit. In figure 3.2, the black line illustrates the result of discarding an outlier.

Figure 3.2. The effect of outliers

Deleting an extreme point can be justified by arguing that outliers are rare, which practically makes them unpredictable, and therefore the deletion would make the predictive power stronger. Sometimes extreme points correspond to extraordinary changes in economies and depending on context it might be more or less justified to discard them.

Because the outliers do not get a higher residual they might be easy to overlook. A good measure for the influence of a data point is its leverage.
Definition 3.3 (Leverage) To compute leverage in ordinary least squares, the hat matrix $H$ is given by $H = X(X'X)^{-1}X'$, where $X \in \mathbb{R}^{n \times p}$ and $n \geq p$.

Since $\hat{y} = X\hat{\beta} = X(X'X)^{-1}X'y$, the leverage measures how an observation estimates its own predicted value. The diagonal elements $h_{ii}$ of $H$ contain the leverage measures and are not influenced by $y$. A rule of thumb [39] for detecting outliers is that $h_{ii} > 2\frac{(p+1)}{n}$ signals a high leverage point, where $p$ is the number of columns in the predictor matrix $X$ aside from the intercept and $n$ is the number of observations. [39]
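The leverage diagnostics can be computed from the hat matrix as sketched below; the cutoff follows the rule of thumb above, while the data and variable names are placeholders of ours.

% Sketch of the leverage measure: diagonal of the hat matrix H.
X = [ones(6,1) [0.1; 0.2; 0.3; 0.4; 0.5; 5.0]];  % last point is extreme
H = X * ((X'*X) \ X');        % hat matrix H = X (X'X)^{-1} X'
h = diag(H);                  % leverage of each observation

p = size(X,2) - 1;            % predictors aside from the intercept
n = size(X,1);
highLeverage = find(h > 2*(p+1)/n);  % rule-of-thumb flag for outliers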
3.4 Testing the regression assumptions
Unfortunately assumption 2 can easily be violated for time series data since many time series exhibit autocorrelation, resulting in the OLS estimates being inefficient, that is, they have higher variability than they should.
Definition 3.4 (Autocorrelation function) The $j$th autocorrelation of a covariance stationary process³, denoted $\rho_j$, is defined as its $j$th autocovariance divided by the variance

\[
\rho_j \equiv \frac{\gamma_j}{\gamma_0}, \quad \text{where} \quad \gamma_j = E[(Y_t - \mu)(Y_{t-j} - \mu)]. \qquad (3.13)
\]

Since $\rho_j$ is a correlation, $|\rho_j| \leq 1$ for all $j$. Note also that $\rho_0$ equals unity for all covariance stationary processes.

A natural estimate of the sample autocorrelation $\rho_j$ is provided by the corresponding sample moments

\[
\hat{\rho}_j \equiv \frac{\hat{\gamma}_j}{\hat{\gamma}_0}, \quad \text{where} \quad
\hat{\gamma}_j = \frac{1}{T}\sum_{t=j+1}^{T}(Y_t - \bar{y})(Y_{t-j} - \bar{y}), \quad j = 0, 1, 2, \ldots, T-1,
\qquad \bar{y} = \frac{1}{T}\sum_{t=1}^{T}Y_t.
\]
Definition 3.5 (Durbin-Watson test) The Durbin-Watson test statistic is used to detect the presence of autocorrelation in the residuals from a regression analysis and is defined by

\[
DW = \frac{\sum_{t=2}^{T}(e_t - e_{t-1})^2}{\sum_{t=1}^{T}e_t^2} \qquad (3.14)
\]

where the $e_t$, $t = 1, 2, \ldots, T$, are the regression analysis residuals.

The null hypothesis of the statistic is that there is no autocorrelation, that is $\rho = 0$, against the alternative hypothesis that there is autocorrelation, $\rho \neq 0$. Durbin and Watson [23] derive lower and upper bounds for the critical values, see table 3.1.

$\rho = 0$: $DW \approx 2$, no correlation
$\rho = 1$: $DW \approx 0$, positive correlation
$\rho = -1$: $DW \approx 4$, negative correlation

Table 3.1. Critical values for the Durbin-Watson test.

³ For a definition of a covariance stationary process, see appendix A.1.
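The DW statistic in (3.14) is computed directly from the fitted residuals; a minimal Matlab sketch with placeholder residuals and our own variable names:

% Sketch of the Durbin-Watson statistic (3.14) on regression residuals e.
e  = [0.3; 0.1; -0.2; -0.4; 0.2; 0.5];   % placeholder residuals
DW = sum(diff(e).^2) / sum(e.^2);        % values near 2 indicate no autocorrelation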
One way to check assumption 3 is to plot the underlying probability distribution of the sample against the theoretical distribution; such a plot, shown in figure 3.3, is called a Q-Q plot.

Figure 3.3. Example of a Q-Q plot

For a more detailed analysis the Jarque-Bera test, a goodness of fit measure of departure from normality based on skewness and kurtosis, can be employed.
Definition 3.6 (Jarque-Bera test) The test statistic $JB$ is defined as

\[
JB = \frac{n}{6}\Big(S^2 + \frac{(K-3)^2}{4}\Big) \qquad (3.15)
\]

where $n$ is the number of observations, $S$ is the sample skewness and $K$ is the sample kurtosis, defined as

\[
S = \frac{\frac{1}{n}\sum_{k=1}^{n}(x_k - \bar{x})^3}{\big(\frac{1}{n}\sum_{k=1}^{n}(x_k - \bar{x})^2\big)^{3/2}}, \qquad
K = \frac{\frac{1}{n}\sum_{k=1}^{n}(x_k - \bar{x})^4}{\big(\frac{1}{n}\sum_{k=1}^{n}(x_k - \bar{x})^2\big)^{2}},
\]

where $\bar{x}$ is the sample mean.

Asymptotically $JB \sim \chi^2(2)$, which can be used to test the null hypothesis that data are from a normal distribution. The null hypothesis is a joint hypothesis of the skewness being 0 and the kurtosis being 3, since samples from a normal distribution have an expected skewness of 0 and an expected kurtosis of 3. The definition shows that any deviation from these expectations increases the $JB$ statistic.
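Expression (3.15) is likewise straightforward to evaluate without any toolbox functions; the critical value for a given significance level would be taken from the χ²(2) distribution. The sample below is a placeholder.

% Sketch of the Jarque-Bera statistic (3.15) for a sample x.
x  = [0.2; -0.5; 0.1; 0.7; -0.3; 0.4; -0.1; 0.0];  % placeholder sample
n  = length(x);
d  = x - mean(x);
S  = mean(d.^3) / mean(d.^2)^(3/2);   % sample skewness
K  = mean(d.^4) / mean(d.^2)^2;       % sample kurtosis
JB = n/6 * (S^2 + (K - 3)^2 / 4);     % compare with a chi-square(2) quantile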
Chapter 4
Bayesian Statistics
First, we introduce fundamental concepts of Bayesian statistics and then we provide tools for calculating posterior densities, which are crucial to our forecasting.
4.1 Basic definitions
Definition 4.1 (Prior and posterior) If $M_j$, $j \in J$, are considered models, then for any data $D$,

$p(M_j)$, $j \in J$, are called the prior probabilities of the $M_j$, $j \in J$,
$p(M_j \mid D)$, $j \in J$, are called the posterior probabilities of the $M_j$, $j \in J$,

where $p$ denotes probability distribution functions (see [5]).
Definition 4.2 (The likelihood function) Let $x = (x_1, \ldots, x_n)$ be a random sample from a distribution $p(x; \theta)$ depending on an unknown parameter $\theta$ in the parameter space $\mathcal{A}$. The function $l_x(\theta) = \prod_{i=1}^{n} p(x_i; \theta)$ is called the likelihood function.

The likelihood function is then the probability that the values $x_1, \ldots, x_n$ are in the random sample. Mind that the probability density is written as $p(x; \theta)$. This is to emphasize that $\theta$ is the underlying parameter; it will not be written out explicitly in the sequel. Depending on context we will also refer to the likelihood function as $p(x \mid \theta)$ instead of $l_x(\theta)$.
Theorem 4.1 (Bayes's theorem) Let $p(y, \theta)$ denote the joint probability density function (pdf) for a random observation vector $y$ and a parameter vector $\theta$, also considered random. Then according to usual operations with pdfs, we have

\[
p(y, \theta) = p(y \mid \theta)p(\theta) = p(\theta \mid y)p(y)
\]

and thus

\[
p(\theta \mid y) = \frac{p(\theta)p(y \mid \theta)}{p(y)} = \frac{p(\theta)p(y \mid \theta)}{\int_{\mathcal{A}} p(y \mid \theta)p(\theta)\,d\theta} \qquad (4.1)
\]

with $p(y) \neq 0$. In the discrete case, the theorem is written as

\[
p(\theta \mid y) = \frac{p(\theta)p(y \mid \theta)}{p(y)} = \frac{p(\theta)p(y \mid \theta)}{\sum_{\theta_i \in \mathcal{A}} p(y \mid \theta_i)p(\theta_i)}. \qquad (4.2)
\]

The last expression can be written as follows

\[
p(\theta \mid y) \propto p(\theta)p(y \mid \theta), \qquad
\text{posterior pdf} \propto \text{prior pdf} \times \text{likelihood function}, \qquad (4.3)
\]

here $p(y)$, the normalizing constant needed to obtain a proper distribution in $\theta$, is discarded and $\propto$ denotes proportionality. The use of the symbol $\propto$ is explained in the next section.
Figure 4.1 highlights the importance of Bayes's theorem and shows how the prior information enters the posterior pdf via the prior pdf, whereas all the sample information enters the posterior pdf via the likelihood function.

Figure 4.1. Bayesian revising of probabilities

Note that an important difference between Bayesian statistics and classical Fisher statistics is that the parameter vector $\theta$ is considered to be a stochastic variable rather than an unknown parameter.
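As a small numerical illustration of the discrete form (4.2), consider two models with equal prior probability and different likelihoods for the observed data; the numbers below are made up.

% Sketch of the discrete Bayes rule (4.2) for two competing models.
prior      = [0.5 0.5];        % p(M_1), p(M_2)
likelihood = [0.02 0.08];      % p(y | M_1), p(y | M_2) for the observed y

posterior = prior .* likelihood / sum(prior .* likelihood);
% posterior = [0.2 0.8]: the data shift the weight towards model 2.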
4.2 Sufficient statistics
A sufficient statistic can be seen as a summary of the information in data, where redundant and uninteresting information has been removed.

Definition 4.3 (Sufficient statistics) A statistic $t(x)$ is sufficient for an underlying parameter $\theta$ precisely if the conditional probability distribution of the data $x$, given the statistic $t(x)$, is independent of the parameter $\theta$ (see [17]).

In short, the definition states that $\theta$ can not give any further information about $x$ if $t(x)$ is sufficient for $\theta$, that is, $p(x \mid t, \theta) = p(x \mid t)$.

The Neyman factorization theorem provides a convenient characterization of a sufficient statistic.

Theorem 4.2 (Neyman factorization theorem) A statistic $t$ is sufficient for $\theta$ given $y$ if and only if there are functions $f$ and $g$ such that

\[
p(y \mid \theta) = f(t, \theta)g(y)
\]

where $t = t(y)$ (see [49]).

Proof: For a proof see [49].

Here, $t(y)$ is the sufficient statistic and the function $f(t, \theta)$ relates the sufficient statistic to the parameter $\theta$, while $g(y)$ is a $\theta$-independent normalization factor of the pdf.
It turns out that many of the common statistical distributions have a similar form. This leads to the definition of the exponential family.

Definition 4.4 (The exponential family) A distribution is from the one-parameter exponential family if it can be put into the form

\[
p(y \mid \theta) = g(y)h(\theta)\exp[t(y)\psi(\theta)].
\]

Equivalently, if the likelihood of $n$ independent observations $y = (y_1, y_2, \ldots, y_n)$ from this distribution is of the form

\[
l_y(\theta) \propto h(\theta)^n \exp\Big[\sum t(y_i)\,\psi(\theta)\Big],
\]

then it follows immediately from definition 4.2 that $\sum t(y_i)$ is sufficient for $\theta$ given $y$.
Example 4.1: Sufficient statistics for a Gaussian
For a sequence of independent Gaussian variables with unknown mean $\mu$,

\[
y_t = \mu + e_t \sim N(\mu, \sigma^2), \quad t = 1, 2, \ldots, N,
\]

\[
p(y \mid \mu) = \prod_{t=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big[-\frac{1}{2\sigma^2}(y_t - \mu)^2\Big]
= \underbrace{\exp\Big[-\frac{1}{2\sigma^2}\Big(N\mu^2 - 2\mu\sum y_t\Big)\Big]}_{=f(t,\mu)}\,
\underbrace{(2\pi\sigma^2)^{-N/2}\exp\Big[-\frac{1}{2\sigma^2}\sum y_t^2\Big]}_{=g(y)}
\]

so the sufficient statistic $t(y)$ is given by $t(y) = \sum y_t$.
4.3 Choice of prior
Suppose our model $M$ of a set of data $y$ is parameterized by $\theta$. Our knowledge about $\theta$ before $y$ is measured (given) is quantified by the prior pdf $p(\theta)$. After measuring $y$ the posterior pdf is available as $p(\theta \mid y) \propto p(y \mid \theta)p(\theta)$. It is clear that different assumptions on $p(\theta)$ lead to different inferences $p(\theta \mid y)$.

A good rule of thumb for prior selection is that your prior should represent the best knowledge available about the parameters before looking at data. For example, the number of scores in a football game can not be less than zero and is less than 1000, which justifies setting your prior equal to zero outside this interval. In the case that one does not have any information, a good idea might be to use an uninformative prior.

Definition 4.5 (Jeffreys prior) Jeffreys prior $p_J(\theta)$ is defined as proportional to the square root of the determinant of the Fisher information matrix of $p(y \mid \theta)$,

\[
p_J(\theta) \propto |J(\theta \mid y)|^{\frac{1}{2}} \qquad (4.4)
\]

where

\[
J(\theta \mid y)_{i,j} = -E_y\Big[\frac{\partial^2 \ln p(y \mid \theta)}{\partial\theta_i\,\partial\theta_j}\Big]. \qquad (4.5)
\]

The Fisher information is a way of measuring the amount of information that an observable random variable $y = (y_1, \ldots, y_n)$ carries about a set of unknown parameters $\theta = (\theta_1, \ldots, \theta_n)$. The notation $J(\theta \mid y)$ is used to make clear that the parameter vector $\theta$ is associated with the random variable $y$ and should not be thought of as conditioning. A perhaps more intuitive way¹ to write (4.5) is

\[
J(\theta \mid y)_{i,j} = \operatorname{cov}\Big[\frac{\partial}{\partial\theta_i}\ln p(y \mid \theta),\; \frac{\partial}{\partial\theta_j}\ln p(y \mid \theta)\Big]. \qquad (4.6)
\]

Mind that the Fisher information is only defined under certain regularity conditions, which is further discussed in [24]. One might wonder why Jeffreys made his prior proportional to the square root of the determinant of the Fisher information matrix. There is a perfectly good reason for this: consider a transformation of the unknown parameters $\theta$ to $\phi(\theta)$; then if $K$ is the matrix $K_{ij} = \partial\phi_i/\partial\theta_j$,

\[
J(\theta \mid y) = K'J(\phi \mid y)K
\]

and hence the determinant of the information satisfies

\[
|J(\theta \mid y)| = |J(\phi \mid y)||K|^2.
\]

Because $|K|$ is the Jacobian, and thus does not depend on $y$, it follows that $p_J(\theta) \propto |J(\theta \mid y)|^{\frac{1}{2}}$ provides a scale-invariant prior, which is a highly desirable property for a reference prior. In Jeffreys' own words, "any arbitrariness in the choice of parameters could make no difference to the results."

¹ Remember that $\operatorname{cov}[x, y] = E[(x - \mu_x)(y - \mu_y)]$.
Example 4.2
Consider a random sample $y = (y_1, \ldots, y_n) \sim N(\mu, \sigma)$, with the mean $\mu$ known and the variance unknown. The Jeffreys prior $p_J(\sigma)$ for $\sigma$ is then computed as follows

\[
L(\sigma \mid y) = \ln(p(y \mid \sigma))
= \ln\Big(\prod_{i=1}^{n}\frac{1}{\sigma\sqrt{2\pi}}\exp\Big[-\frac{(x_i - \mu)^2}{2\sigma^2}\Big]\Big)
= \ln\Big(\Big(\frac{1}{\sigma\sqrt{2\pi}}\Big)^n\exp\Big[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2\Big]\Big)
= -\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2 - n\ln\sigma + c
\]

\[
\frac{\partial^2 L}{\partial\sigma^2} = -\frac{3}{\sigma^4}\sum_{i=1}^{n}(x_i - \mu)^2 + \frac{n}{\sigma^2}
\]

\[
-E\Big[\frac{\partial^2 L}{\partial\sigma^2}\Big]
= \frac{3}{\sigma^4}E\Big[\sum_{i=1}^{n}(x_i - \mu)^2\Big] - \frac{n}{\sigma^2}
= \frac{3}{\sigma^4}(n\sigma^2) - \frac{n}{\sigma^2} = \frac{2n}{\sigma^2}
\]

\[
p_J(\sigma) \propto |J(\sigma \mid y)|^{\frac{1}{2}} \propto \frac{1}{\sigma}
\]
A natural question that arises is what choices of priors generate analytical expressions for the posterior distribution. This question leads to the notion of conjugate priors.

Definition 4.6 (Conjugate prior) Let $l$ be a likelihood function $l_y(\theta)$. A class $\Pi$ of prior distributions is said to form a conjugate family if the posterior density

\[
p(\theta \mid y) \propto p(\theta)l_y(\theta)
\]

is in the class $\Pi$ for all $y$ whenever the prior density is in $\Pi$ (see [49]).

There is a minor complication with the definition and a more rigorous definition is presented in [5]. However, the definition states the key principle in a clear enough manner.
Example 4.3
Let $x = (x_1, \ldots, x_n)$ have independent Poisson distributions with the same mean $\lambda$; then the likelihood function $l_x(\lambda)$ equals

\[
l_x(\lambda) = \prod_{i=1}^{n}\Big(\frac{\lambda^{x_i}}{x_i!}e^{-\lambda}\Big) \propto \lambda^{t}e^{-n\lambda}
\]

where $t = \sum_{i=1}^{n}x_i$ and by theorem 4.2 is sufficient for $\lambda$ given $x$.

If we let the prior of $\lambda$ be in the family $\Pi$ of constant multiples of chi-squared random variables, $p(\lambda) \propto \lambda^{v/2-1}e^{-S_0\lambda/2}$, then the posterior is also in $\Pi$,

\[
p(\lambda \mid x) \propto p(\lambda)l_x(\lambda) = \lambda^{t+v/2-1}e^{-\frac{\lambda}{2}(S_0 + 2n)}.
\]

The distribution of $p(\lambda)$ is explained in appendix A.2.

Conjugate priors are useful in computing posterior densities. Although there are not that many priors that are conjugate, there might be a risk of overuse since data might be better described by another distribution that is not conjugate.
4.4 Marginalization
A useful property of conditional probabilities is the possibility to integrate out undesired variables. According to usual operations of pdfs we have

\[
\int p(a, b)\,db = p(a).
\]

Analogously, for any likelihood function of two or more variables, marginal likelihoods with respect to any subset of the variables can be defined. Given the likelihood $l_y(\theta, M)$, the marginal likelihood $l_y(M)$ for model $M$ is

\[
l_y(M) = p(y \mid M) = \int p(y \mid \theta, M)p(\theta \mid M)\,d\theta.
\]

Unfortunately marginal likelihoods are often very difficult to calculate and numerical integration techniques might have to be employed.
4.5 Bayesian model averaging
To explain the powerful idea of Bayesian model averaging (BMA) we start with an example.

Example 4.4
Suppose we are analyzing data and believe that they arise from a set of probability distributions or models $\{M_i\}_{i=1}^{k}$. For example, the data might consist of a normally distributed outcome $y$ that we wish to predict future values of. We also have two other outcomes, $x_1$ and $x_2$, that covary with $y$. Using the two covariates as predictors of $y$ offers two models, $M_1$ and $M_2$, as explanations for what values $y$ is likely to take on in the future. A simple approach to deciding what future value of $y$ should be used might be to just average the two estimates. But if one of the models suffers from bad predictive ability, then the average of the two estimates is not likely to be especially good. Bayesian model averaging solves this issue by weighting the estimates $\hat{y}_1$ and $\hat{y}_2$ by how likely the models are,

\[
\hat{y} = p(M_1 \mid \text{Data})\,\hat{y}_1 + p(M_2 \mid \text{Data})\,\hat{y}_2.
\]

Using theory from the previous chapters it is possible to compute the probability $p(M_i \mid \text{Data})$ for each model.
We now treat the averaging more mathematically. Let $\Delta$ be a quantity of interest; then its posterior distribution given data $D$ is

\[
p(\Delta \mid D) = \sum_{k=1}^{K} p(\Delta \mid M_k, D)\,p(M_k \mid D). \qquad (4.7)
\]

This is a weighted average of the posterior distributions where each model $M_k$ is considered. The posterior probability for model $M_k$ is

\[
p(M_k \mid D) = \frac{p(D \mid M_k)p(M_k)}{\sum_{l=1}^{K} p(D \mid M_l)p(M_l)}, \qquad (4.8)
\]

where

\[
p(D \mid M_k) = \int p(D \mid \theta_k, M_k)\,p(\theta_k \mid M_k)\,d\theta_k \qquad (4.9)
\]

is the marginalized likelihood of the model $M_k$ with parameter vector $\theta_k$, as defined in section 4.4. All probabilities are implicitly conditional on the set of models being considered. The posterior mean and variance of $\Delta$ are given by

\[
E[\Delta \mid D] = \sum_{k=1}^{K} \hat{\Delta}_k\,p(M_k \mid D) \qquad (4.10)
\]

\[
\operatorname{var}[\Delta \mid D] = E[\Delta^2 \mid D] - E[\Delta \mid D]^2
= \sum_{k=1}^{K}\big(\operatorname{var}[\Delta \mid D, M_k] + \hat{\Delta}_k^2\big)\,p(M_k \mid D) - E[\Delta \mid D]^2 \qquad (4.11)
\]

where $\hat{\Delta}_k = E[\Delta \mid D, M_k]$ (see [41]).
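Equations (4.7)-(4.11) reduce to a probability-weighted combination once the model posteriors are available. A minimal Matlab sketch with made-up per-model forecasts and our own variable names:

% Sketch of the BMA point forecast and variance, equations (4.10)-(4.11).
postModel = [0.5 0.3 0.2];       % p(M_k | D) for three models
deltaHat  = [0.04 0.06 0.02];    % E[Delta | D, M_k]
deltaVar  = [1e-4 2e-4 3e-4];    % var[Delta | D, M_k]

bmaMean = sum(postModel .* deltaHat);                              % (4.10)
bmaVar  = sum(postModel .* (deltaVar + deltaHat.^2)) - bmaMean^2;  % (4.11)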
4.6 Using BMA on linear regression models
Here, the key issue is the uncertainty about the choice of regressors, that is, the model uncertainty. Each model $M_j$ is of the previously discussed form $y = X_j\beta_j + u \sim N(X_j\beta_j, \sigma^2 I_n)$, where the regressors $X_j \in \mathbb{R}^{n \times p_j}$, with the intercept included, correspond to the regressor set $j \in J$ specified in chapter 5. The quantity $y$ is the given data and we are interested in the quantity $\mu$, the regression line.

\[
p(y \mid \beta_j, \sigma^2) = l_y(\beta_j, \sigma^2)
= \Big(\frac{1}{2\pi\sigma^2}\Big)^{\frac{n}{2}}\exp\Big[-\frac{1}{2\sigma^2}(y - X_j\beta_j)'(y - X_j\beta_j)\Big]
\]
By completing the square in the exponent, the sum of squares can be written as

\[
(y - X\beta)'(y - X\beta) = (\beta - \hat{\beta})'X'X(\beta - \hat{\beta}) + (y - X\hat{\beta})'(y - X\hat{\beta}),
\]

where $\hat{\beta} = (X'X)^{-1}X'y$ is the OLS estimate. That the equality holds is proved by multiplying out the right hand side and checking that it equals the left hand side.

As pointed out in section 3.1, $(y - X\hat{\beta})$ is the residual vector $\hat{u}$, and its sum of squares divided by the number of observations less the number of covariates is known as the residual mean square, denoted by $s^2$,

\[
s^2 = \frac{\hat{u}'\hat{u}}{n - p} = \frac{\hat{u}'\hat{u}}{v} \;\Longleftrightarrow\; \hat{u}'\hat{u} = vs^2.
\]
It is convenient to denote $n - p$ by $v$, known as the degrees of freedom of the model. Now we can write the likelihood as

\[
l_y(\beta_j, \sigma^2) \propto (\sigma^2)^{-\frac{p_j}{2}}\exp\Big[-\frac{1}{2\sigma^2}(\beta_j - \hat{\beta}_j)'(X_j'X_j)(\beta_j - \hat{\beta}_j)\Big]\,
(\sigma^2)^{-\frac{v_j}{2}}\exp\Big[-\frac{v_j s_j^2}{2\sigma^2}\Big].
\]
The BMA analysis requires the specification of prior distributions for the parameters $\beta_j$ and $\sigma^2$. For $\sigma^2$ we choose an uninformative prior

\[
p(\sigma^2) \propto 1/\sigma^2, \qquad (4.12)
\]

which is the Jeffreys prior as calculated in example 4.2. For $\beta_j$ the g-prior, as introduced by Zellner [68], is applied,

\[
p(\beta_j \mid \sigma^2, M_j) \propto f_N(\beta_j \mid 0,\; \sigma^2 g(X_j'X_j)^{-1}), \qquad (4.13)
\]

where $f_N(w \mid m, V)$ denotes a normal density on $w$ with mean $m$ and covariance matrix $V$. The expression $\sigma^2(X'X)^{-1}$ is recognized as the covariance matrix of the OLS estimate and the prior covariance matrix is then assumed to be proportional to the sample covariance with a factor $g$ which is used as a design parameter. An increase of $g$ makes the distribution flatter and therefore gives higher posterior weights to large absolute values of $\beta_j$.

As shown by Fernández, Ley and Steel [33], the following three theoretical values of $g$ lead to consistency, in the sense of asymptotically selecting the correct model:

$g = 1/n$: the prior information is roughly equal to the information available from one data observation

$g = k/n$: here, more information is assigned to the prior as the number of predictors $k$ grows

$g = k^{(1/k)}/n$: now, less information is assigned to the prior as the number of predictors grows

To arrive at a posterior probability of the models given data we also need to specify the prior distribution for each model $M_j$ over the space of all $K = 2^{p-1}$ models,

\[
p(M_j) = p_j, \quad j = 1, \ldots, K, \qquad p_j > 0, \qquad \sum_{j=1}^{K} p_j = 1.
\]

In our application we chose $p_j = 1/K$, so that we have a uniform distribution over the model space, since we at this point have no reason to favor one model over another. Now, the priors chosen have the tractable property of an analytical expression for $l_y(M_j)$, the marginal likelihood.
Theorem 4.3 (Derivation of the marginal likelihood) Using the above specified priors, the marginalized likelihood function is given by

\[
l_y(M_j) = \int\!\!\int p(y \mid \beta_j, \sigma^2, M_j)\,p(\sigma^2)\,p(\beta_j \mid \sigma^2, M_j)\,d\beta_j\,d\sigma^2
= \frac{\Gamma(n/2)}{\pi^{n/2}(g+1)^{p/2}}\Big(y'y - \frac{g}{1+g}\,y'X_j(X_j'X_j)^{-1}X_j'y\Big)^{-\frac{n}{2}}.
\]
Proof: The joint density of the data and the parameters is

\[
l_y(M_j, \beta_j, \sigma^2) = p(y \mid \beta_j, \sigma^2, M_j)\,p(\beta_j \mid \sigma^2, M_j)\,p(\sigma^2)
\]
\[
= (2\pi\sigma^2)^{-n/2}\exp\Big[-\frac{1}{2\sigma^2}\big(v_j s_j^2 + (\beta_j - \hat{\beta}_j)'(X_j'X_j)(\beta_j - \hat{\beta}_j)\big)\Big]\,
(2\pi\sigma^2)^{-p/2}|Z_0|^{1/2}\exp\Big[-\frac{1}{2\sigma^2}\beta_j'Z_0\beta_j\Big]\cdot\frac{1}{\sigma^2},
\]

where $Z_0$ is written instead of the inverse covariance of the g-prior. To integrate the expression we start by completing the square of the exponents in $\beta$ (the index $j$ is suppressed):

\[
(\beta - \hat{\beta})'X'X(\beta - \hat{\beta}) + \beta'Z_0\beta
= (\beta - B_1)'(X'X + Z_0)(\beta - B_1) + \hat{\beta}'\big((X'X)^{-1} + Z_0^{-1}\big)^{-1}\hat{\beta},
\]

where $B_1 = (X'X + Z_0)^{-1}X'X\hat{\beta}$ and the identity $X'X(X'X + Z_0)^{-1}Z_0 = ((X'X)^{-1} + Z_0^{-1})^{-1}$ has been used. Hence we can write $l_y(M_j, \beta_j, \sigma^2)$ as

\[
\frac{1}{\sigma^2}(2\pi\sigma^2)^{-(n+p)/2}|Z_0|^{1/2}
\exp\Big[-\frac{1}{2\sigma^2}S_1\Big]\exp\Big[-\frac{1}{2\sigma^2}(\beta_j - B_1)'A_1(\beta_j - B_1)\Big],
\]

where

\[
S_1 = v_j s_j^2 + \hat{\beta}_j'\big((X_j'X_j)^{-1} + Z_0^{-1}\big)^{-1}\hat{\beta}_j, \qquad A_1 = Z_0 + X_j'X_j.
\]

The second exponential is the kernel of a multivariate normal density² and integrating with respect to $\beta$ yields

\[
\frac{1}{\sigma^2}(2\pi\sigma^2)^{-n/2}|Z_0|^{1/2}|A_1|^{-1/2}\exp\Big[-\frac{1}{2\sigma^2}S_1\Big],
\]

which in turn is the kernel of an inverted Wishart density³. We now integrate with respect to $\sigma^2$, resulting in

\[
l_y(M_j) = (2\pi)^{-n/2}|Z_0|^{1/2}|A_1|^{-1/2}\,S_1^{-n/2}\,c_0(n^* = n + 2,\, p^* = 1)\,k
\]

where $k$ is a proportionality constant canceling in the posterior expression and $c_0(n^* = n + 2, p^* = 1) = 2^{n/2}\Gamma(n/2)$. To obtain the marginal likelihood we substitute $Z_0$ with the inverse of the g-prior covariance, $\frac{1}{g}(X_j'X_j)$, where $\sigma^2$ has been integrated out:

\[
S_1^{-n/2} = \Big(v_j s_j^2 + \frac{1}{1+g}\,\hat{\beta}_j'(X_j'X_j)\hat{\beta}_j\Big)^{-n/2}
= \Big((y - X_j\hat{\beta}_j)'(y - X_j\hat{\beta}_j) + \frac{1}{1+g}\,\hat{\beta}_j'(X_j'X_j)\hat{\beta}_j\Big)^{-n/2}
= \Big(y'y - \frac{g}{1+g}\,y'X_j(X_j'X_j)^{-1}X_j'y\Big)^{-n/2},
\]

\[
|Z_0|^{1/2} = \Big|\frac{1}{g}X_j'X_j\Big|^{1/2} = (1/g)^{p/2}|X_j'X_j|^{1/2}, \qquad
|A_1|^{-1/2} = (1 + 1/g)^{-p/2}|X_j'X_j|^{-1/2}.
\]

And finally we arrive at

\[
l_y(M_j) = \frac{\Gamma(n/2)}{\pi^{n/2}(g+1)^{p/2}}\Big(y'y - \frac{g}{1+g}\,y'X_j(X_j'X_j)^{-1}X_j'y\Big)^{-\frac{n}{2}}. \qquad \square
\]

² For a definition, see Appendix A
³ For a definition, see Appendix A
Now, applying Bayes' rule yields the posterior model probabilities

\[
p(M_j \mid y) = \frac{p(y \mid M_j)\,p_j}{\sum_{k=1}^{K} p(y \mid M_k)\,p_k}.
\]
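Theorem 4.3 and the posterior weights above translate into a few lines of Matlab. The sketch below evaluates the log marginal likelihood for one model on placeholder data; normalizing the exponentials of these values over all models gives the posterior model probabilities. Variable names are ours and the data are illustrative only.

% Sketch of the log marginal likelihood of theorem 4.3 for one model M_j.
y = [0.05; -0.01; 0.07; 0.02; 0.04; 0.06];               % placeholder response
X = [ones(6,1) [0.02; 0.01; 0.03; 0.015; 0.025; 0.03]];  % intercept + regressor
[n, p] = size(X);
g = 1/n;                                                 % one of the suggested g-values

quadForm = y'*y - g/(1+g) * (y'*X) * ((X'*X) \ (X'*y));
logML = gammaln(n/2) - (n/2)*log(pi) - (p/2)*log(g+1) - (n/2)*log(quadForm);
% Posterior model weights follow by normalizing exp(logML) over all models.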
Meanwhile, the mean and variance of the predicted values, $\mu$, are given by

\[
\hat{\mu} = E[\mu \mid y] = \sum_{j=1}^{K} X_j\hat{\beta}_j\,p(M_j \mid y) \qquad (4.14)
\]

\[
\operatorname{var}[\mu \mid y] = \sum_{j=1}^{K}\Big[\sigma_u^2\,X_j(X_j'X_j)^{-1}X_j' + (X_j\hat{\beta}_j)^2\Big]p(M_j \mid y)
- \Big[\sum_{j=1}^{K} X_j\hat{\beta}_j\,p(M_j \mid y)\Big]^2 \qquad (4.16)
\]

where the expression $\operatorname{var}[\mu \mid y, M_k]$ from equation (4.11) is calculated as

\[
\operatorname{var}[\mu \mid y, M_k] = \operatorname{var}[X_k\hat{\beta}_k]
= E[X_k(\hat{\beta} - \beta)(\hat{\beta} - \beta)'X_k']
= X_k E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)']X_k'
= \sigma_u^2\,X_k(X_k'X_k)^{-1}X_k'. \qquad (4.17)
\]
The estimation error is calculated as

S_k = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \hat\sigma_{ii}}.   (4.18)
Finally the confidence interval for our BMA estimate of the equity premium is calculated as

I_{1-\alpha}(\hat\mu_k) = \hat\mu_k \pm \Phi^{-1}\!\left(1 - \frac{\alpha}{2}\right)\frac{S_k}{\sqrt{n}},   (4.19)
where \Phi(x) = P(X \le x) when X is N(0, 1). This interval results from the central limit theorem, stating that for a set of n i.i.d. random variables with finite mean \mu and variance \sigma^2, the sample average approaches the normal distribution with mean \mu and variance \sigma^2/n as n increases. This holds irrespectively of the shape of the original distribution. It then follows that, for each time step, the 2^18 estimates of the equity premium have a sample mean and variance that are normally distributed.
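To make the averaging step concrete, the following is a minimal Matlab sketch (not the thesis code in appendix B) of how the marginal likelihood in theorem 4.3 can be turned into posterior model probabilities and a BMA point forecast under equal prior model probabilities. The variable names y, Xlag, models and g are placeholders chosen only for the illustration.

```matlab
% Minimal BMA sketch with equal prior model probabilities.
% y      : n-by-1 vector of realized equity premia
% Xlag   : n-by-p matrix of predictors (to be lagged one step)
% models : cell array, models{j} holds the column indices of model j
% g      : g-prior parameter, e.g. 1/n
n     = length(y);
K     = numel(models);
logml = zeros(K,1);               % log marginal likelihoods, theorem 4.3
fcast = zeros(K,1);               % point forecast of each model
xnew  = Xlag(end,:);              % most recent, unused, observation

for j = 1:K
    Xj  = Xlag(1:end-1, models{j});          % one step lagged regressors
    yj  = y(2:end);
    pj  = size(Xj,2);
    bj  = Xj \ yj;                            % OLS estimate
    ssr = yj'*yj - g/(1+g) * yj'*Xj*((Xj'*Xj)\(Xj'*yj));
    logml(j) = gammaln(length(yj)/2) - (length(yj)/2)*log(pi) ...
             - (pj/2)*log(g+1) - (length(yj)/2)*log(ssr);
    fcast(j) = xnew(models{j}) * bj;
end

w     = exp(logml - max(logml));  % avoid numerical underflow
w     = w / sum(w);               % posterior model probabilities
muBMA = w' * fcast;               % BMA estimate, cf. equation (4.14)
```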
Chapter 5
The Data Set and Linear Prediction
In this chapter we first describe the used data set and then explain and motivate the predictors we have chosen to forecast the expected equity premium. We also check that our statistical assumptions hold and explain how the predictions are carried out.
5.1 Chosen series
The data set consists of information from three different sources: Bloomberg®, FRED® and ValueLine®, see table 5.1. In total the set consists of 18 different time series, which can be divided into three different groups: data on a large stock index, interest rates and macroeconomic factors. The data set has yearly data from 1959 to 2007 on each series. The time series from ValueLine ends in 2003 and has been prolonged with data from Bloomberg, while data from FRED covers the whole time span.
5.2 The historical equity premium
The historical yearly realized equity premium can be seen in figure 5.1, where the premium is calculated as in expression (2.1) with P_t as the index level of the Dow Jones Industrial Average (DJIA)¹ and r_{f,t-1} being the US 1-year treasury bill rate. It is this historical time series that will be used as dependent variable in the regression models.
¹ DJIA is a price-weighted average of 30 significant stocks traded on the New York Stock Exchange and the Nasdaq. In contrast, most stock indices are market capitalization weighted, meaning that larger companies account for larger proportions of the index.
Time series Bloomberg Ticker FRED Id Value Line
Dow Jones Industrial Average (DJIA) INDU Index.Px Last - X
DJIA Dividend Yield .Eqy Dvd Yld 12m - X
DJIA Price-Earnings Ratio .Pe Ratio - X
DJIA Book Value per share .Indx Weighted Book Val - X
DJIA Price-Dividend Ratio .Eqy Dvd Yld 12m - X
DJIA Earnings per share .Indx General Earn - X
Consumer Price Index - CPIAUCNS -
Effective Federal Funds Rate - FEDFUNDS -
3-month Treasury Bill - TB3MS -
1-Year Treasury Rate - GS1 -
10-Year Treasury Rate - GS10 -
Moody's Aaa Corp Bond Yield - AAA -
Moody's Baa Corp Bond Yield - BAA -
Producer Price Index - PPIFGS -
Industrial Production Index - INDPRO -
Personal Income - PI -
Gross Domestic Product - GDPA -
Consumer Sentiment - UMCSENT -
Table 5.1. The data set and sources
Figure 5.1. The historical equity premium over time
5.3 Factors explaining the equity premium
From the time series in table 5.1 we have constructed 18 predictors, which should
account for changes in the stock index as well as changes in the general economy.
1. Dividend yield is the dividend yield on the Dow Jones Industrial Average
Index (DJIA).
2. Price-earnings ratio is the price-earnings ratio on DJIA.
3. Book value per share is the book value per share on DJIA.
4. Price-dividend ratio is the price dividend ratio on DJIA.
5. Earnings per share is the earnings per share on DJIA.
6. Inflation is measured by the consumer price index for all urban consumers and all items.
7. Fed funds rate is the US effective federal funds rate.
8. Short term interest rate is the 3-month US treasury bill secondary market
rate.
9. Term spread short is the US 1-year treasury with constant maturity rate
less the 3-month US treasury bill secondary market rate.
10. Term spread long is the US 10-year treasury with constant maturity rate
less the US 1-year treasury with constant maturity rate.
11. Credit spread is Moody's Baa corporate bond yield returns less the Aaa corporate bond yield.
12. Producer price is the US producer price index for finished goods.
13. Industrial production is the US industrial production index.
14. Personal income is the US personal income.
15. GDP is the gross US domestic product.
16. Consumer sentiment is the University of Michigan time series for con-
sumer sentiment.
17. Volatility is the standard deviation of the returns on DJIA.
18. Earnings-book ratio is earnings per share divided by book value per share
for DJIA.
For all 18 factors above we have used the fractional change defined as

r_{i,t} = \frac{I_t}{I_{t-1}} - 1   (5.1)

where r_{i,t} is the return on factor i at time t and I_t is the factor level at time t. The basic statistics for the 18 factors are found in table 5.2.
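As a small illustration, the fractional change in equation (5.1) can be computed from a vector of factor levels in one line of Matlab; the values below are made up for the example.

```matlab
I = [100; 104; 102; 110];        % example factor levels I_t
r = I(2:end) ./ I(1:end-1) - 1;  % fractional change r_{i,t} = I_t / I_{t-1} - 1
```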
Factors
1 2 3 4 5 6 7 8 9
Mean 0.00 0.07 0.06 0.02 0.07 0.04 0.09 0.07 -0.02
Std 0.14 0.37 0.15 0.13 0.23 0.03 0.40 0.40 0.11
Median 0.00 -0.01 0.05 0.01 0.10 0.04 0.06 0.01 -0.02
Min -0.30 -0.38 -0.20 -0.23 -0.61 0.01 -0.71 -0.68 -0.34
Max 0.32 1.73 0.87 0.29 0.64 0.14 1.28 1.65 0.20
10 11 12 13 14 15 16 17 18
Mean -0.04 0.00 0.04 0.03 0.07 0.07 0.00 0.04 1.50
Std 0.27 0.04 0.04 0.05 0.03 0.03 0.14 0.01 11.84
Median 0.01 -0.01 0.02 0.03 0.07 0.06 0.00 0.04 0.79
Min -1.29 -0.10 -0.03 -0.09 0.01 0.00 -0.28 0.01 -52.24
Max 0.53 0.15 0.16 0.11 0.13 0.13 0.42 0.08 48.60
Table 5.2. Basic statistics for the factors
Dividend yield
The main reason for the supposed predictive power of the dividend yield is the positive relation between expected high dividend yields and high returns. This is a result from using a discounted cash flow framework under the assumption that the expected stock return is equal to a constant. For instance Campbell [11] has shown that the current stock price is equal to the expected present value of future dividends out to the infinite future. Assuming that the current dividend yields will remain the same in the future, the positive relation follows. This relationship can also be observed in the Gordon dividend growth model. In the absence of capital gains, the dividend yield is also the return on the stock and measures how much cash flow you are getting for each unit of cash invested.
Price-earnings ratio
The price-earnings ratio, price per share divided by earnings per share, measures how much an investor is willing to pay per unit of earnings. A high price-earnings ratio then suggests that investors think the firm has good growth opportunities or that the earnings are safe and therefore more valuable [7].
Book value per share
Book value per share, the value of equity divided by the number of outstanding shares, is the raw value of the stock and should be compared with the market value of the stock. These two figures are rarely the same. Most of the time a stock trades at a multiple of the book value. A low book value per share in comparison with the market value per share suggests that the stock is highly valued or perhaps even overvalued; the reciprocal also holds.
Price-dividend ratio
The price-dividend ratio, price per share divided by annual dividend per share, is the reciprocal of the dividend yield. A low ratio might mean that investors require a high rate of return or that they are not expecting dividend growth in the future [7]. As a consequence, a low ratio could be a forecast of less profitable times. A low ratio can also indicate either a fast growing company or a company with poor outlooks. A high ratio could either point to a mature company with few growth opportunities or just a mature stable company with temporarily low market value.
Earnings per share
Earnings per share, profit divided by the number of outstanding shares, is more interesting if you calculate and view the incremental change over a period of time. A steady rate of increasing earnings per share could suggest good performance, and decreasing earnings per share figures would suggest poor performance.
Inflation
Inflation is defined as the increase in the price of some set of goods and services in a given economy over a period of time [10]. The inflation is usually measured through a consumer price index, which measures nominal consumer prices for a basket of items bought by a typical consumer. The prices are weighted by the fraction the typical consumer spends on each item. [20]
Many different theories for the role and impact of inflation in an economy have been proposed, but they all have some basic implications in common. A high inflation makes people more interested in investing their savings in assets that are inflation protected, e.g. real estate, instead of holding fixed income assets such as bonds. By moving away from fixed income and investing in other assets the hope is that the returns will exceed the inflation. As a result, high inflation leads to reduced purchasing power as individuals reduce money holdings. High inflation is unpredictable and volatile. This creates uncertainty in the business community, reducing investment activity and thus economic growth. If a period of high inflation rules, a prolonged period of high unemployment must be paid to reduce inflation to modest levels again. This is the main reason for fearing high inflation. [44]
A low inflation usually implies that the price levels are expected to increase over time and therefore it is beneficial to spend and borrow in the short run. A low inflation is the starting point for a higher rate of inflation.
Central banks try to contain the rate of inflation to a predetermined interval, usually 2-3 %, in order to maintain a stable price level and currency value. The means for doing so are given to the banks by changing the discount rate - increasing the rate usually dampens the inflation and the other way around.
Generally, no producer is keen on lowering their prices, just as no employee accepts a decrease in their nominal salary. This means that a small level of inflation has to be allowed in order for the pricing system to work efficiently. Inflation levels above this threshold are considered negative, mainly due to the fact that inflation creates further inflation expectations. [44]
Besides being linked to the general state of the economy, inflation also has great impact on interest rates. If the inflation rises, so will the nominal interest rates, which in turn influence the business conditions. [44]
Federal funds rate
The federal funds rate is one of the most important money market instruments. It is the rate that banks in the US charge each other for lending overnight. Federal funds are tradable reserves that commercial banks are required to maintain with the Fed. The Fed does not pay interest on these reserves, so banks maintain the minimum reserve position possible and sell the excess to other banks short of cash to meet their reserve deposit needs. The federal funds rate therefore is roughly analogous to the London Interbank Offered Rate (LIBOR). [4]
A bank that wishes to finance a client venture but does not have the means to do so can borrow capital from another bank at the federal funds rate. As a result, the federal funds rate sets the threshold for how willing banks are to finance new ventures. As the rate increases, banks become more reluctant to take out these inter-bank loans. A low rate will on the other hand encourage banks to borrow money and hence increase the possibilities for businesses to finance new ventures. Therefore, this rate somewhat controls the US business climate.
Short term interest rate
The short term interest rate (3-month T-bills) is an important rate which many use as a proxy for the risk-free rate, and hence it enters many different valuation models used by practitioners. As a result, changes in the short term rate influence the market prices. For instance, an increase in the short term rate makes the present value of cash flows to the firm take on a smaller value, and a discounted cash flow model for a firm's stock would as a result imply a lower stock price. Another simple implication is that an increase also makes it more expensive for firms to finance themselves in the short run. In general, an increase in the short term rate tends to slow economic growth and dampen inflation. The short term interest rate is also linked, in its movements, to the federal funds rate.
Term spread
A yield curve can take on many different shapes and there are several different theories trying to explain the shape. When talking about the shape of the yield curve one refers to the slope of the curve. Is it flat, upward sloping, downward sloping or humped? Upward and downward sloping curves are also referred to as normal and inverted yield curves. A yield curve constructed from prices in the bond market can be used to calculate different term spreads, differences in rates for two different maturities. For this reason the term spread is related to the slope of the yield curve. Here we have defined the short term spread as the difference in rates between the maturities one year and three months, and the long term spread as the difference between ten years and one year maturities. Positive short and long term spreads could imply an upward sloping yield curve, and the opposite could imply a downward sloping curve. A positive short term spread and a negative long term spread could correspond to a humped yield curve.
Yield curves almost always slope upwards, figure 5.2 a. One reason for this is the expectation of future increases in inflation, and therefore investors require a premium for locking in their money at an interest rate that is not inflation protected. [44] As mentioned earlier, an increase in inflation comes with economic growth, which makes an upward sloping yield curve a sign of good times. The growth itself can also be partly explained by the lower short term rate, which makes it cheaper for companies to borrow for expanding. Furthermore, central banks are expected to fend off the expected rise in inflation with higher rates, decreasing the price of long-term bonds and thus increasing their yields. A downward sloping yield curve, figure 5.2 b, occurs when the expectation is that future inflation will be lower than current inflation and thus the expectation also is that the economy will slow down in the future [44]. A low long term bond yield is acceptable since the inflation is low. In fact, each of the six last recessions in the US has been preceded by an inverted yield curve [25]. This shape could also be developed as the Federal Reserve raises their nominal federal funds rate.
(a) Normal (b) Inverted (c) Flat (d) Humped
Figure 5.2. Shapes of the yield curve
A flat yield curve, figure 5.2 c, signals uncertainty in the economy and should not be visible for any longer time periods. Investors should in theory not have any incentive to hold long-dated bonds over shorter-dated bonds when there is no yield premium. Instead they would sell off long-dated bonds, resulting in higher yields in the long end and an upward sloping yield curve. A humped yield curve, figure 5.2 d, arises when investors expect interest rates to rise over the next several periods and then decline. It could also signal the beginning of a recession or just be the result of a shortage in the supply of long or short-dated bonds. [18]
Credit spread
Yields on corporate bonds are almost always higher than on treasuries with the same maturity. This is mainly a result of the higher default risk in corporate bonds, even if other factors have been suggested as well. The corporate spread, also known as the credit spread, is usually the difference between the yields on a Baa rated corporate bond and a government bond with the same time to maturity. Research [47] has shown that only around 20-50 percent of the credit spread can be accounted for by the default risk alone, when calculating the credit spread with government bonds as the reference instrument. If one instead uses Aaa rated corporate bonds as the reference, this number hopefully increases. Above all, the main reason for using the credit spread as an explaining/forecasting variable at all is that the credit spread seems to widen in recessions and to shrink in expansions during the business cycle [47]. It can also change as other bad news hit the market. Our corporate bond series have bonds with a maturity as close as possible to 30 years, and are averages of daily data.
Producer price
The producer price measures the average change over time in selling prices received by domestic producers of goods and services. It is measured from the perspective of the seller, in contrast with the consumer price index that measures from the purchaser's perspective. These two may differ due to government subsidies, sales and excise taxes and distribution costs. [63]
Industrial production and personal income
Industrial production measures the output from the US industrial sector, which is defined as comprising manufacturing, mining and electric and gas utilities [31]. Personal income measures the sum of wages and salaries in dollars for the US.
Gross domestic product
The gross domestic product (GDP) is considered a good measure of the size of an economy and how well it is performing. This statistic is defined as the market value of all goods and services produced within a country in a given time period and is computed every three months by the Bureau of Economic Analysis. More specifically, the GDP is the sum of spending divided into four broad categories: consumption, investment, government purchases and net exports. The change of the GDP describes how the economy varies, so therefore it is an indicator of the business cycle. [53]
Consumer sentiment
The consumer sentiment index is based on household interviews and gives an indication of the future business climate, personal finance and spending in the US, and therefore has implications on stocks, bonds and cash markets. [62]
Volatility
Volatility is the standard deviation of the change in value of a financial instrument. The volatility is here calculated on monthly observations for each year. The basic idea behind volatility as an explaining variable is that volatility is synonymous with risk. High volatility should imply a higher demand for risk compensation, a higher equity premium.
Earnings-book ratio
The earnings-book ratio relates the earnings per share to the book value per share and measures a firm's efficiency at generating profits. The ratio is also called ROE, return on equity. It is likely that a high ROE yields a high equity premium because general business conditions have to be good in order to generate a good ROE.
5.4 Testing the assumptions of linear regression
As discussed in chapter 3.3, the estimated coefficients in the OLS solution are very sensitive to outliers. By applying the leverage measure from definition 3.3 the outliers in table 5.3 have been found. Elements in y deviating more than three standard deviations from the mean of y have been removed and replaced by linearly interpolated values. This has been repeated three times for each factor time series. In total, an average of one outlier per time series factor per time step has been removed and interpolated.
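The exact leverage-based routine is part of the code in appendix B; the following is only a simplified Matlab sketch of the three-standard-deviation replacement step described above, with placeholder names.

```matlab
% Simplified sketch of the outlier treatment described above: elements
% deviating more than three standard deviations from the mean are replaced
% by linearly interpolated values, and the screening is repeated three times.
function y = replaceOutliers(y)
    for pass = 1:3
        idx = abs(y - mean(y)) > 3*std(y);    % flag outliers
        if ~any(idx), break; end
        t   = (1:length(y))';
        y(idx) = interp1(t(~idx), y(~idx), t(idx), 'linear', 'extrap');
    end
end
```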
Step   Outliers (total)
1      19
2      18
3      18
4      14
5      16
Table 5.3. Outliers identified by the leverage measure for univariate predictions
The assumptions that must hold for a linear regression model were presented in chapter 3.2, and the means for testing these assumptions were given in chapter 3.4. After having removed outliers, it is motivated to check for violations against the classical regression assumptions.
The QQ-plots for all factors are presented in figures 5.3 and 5.4. By visual inspection of each subplot, it is seen that for some factors the points on the plot fall close to the diagonal line - the error distribution is likely to be Gaussian. Other factors show signs of kurtosis due to the S-shaped form. A Jarque-Bera test at the significance level 0.05 has been performed to rule out the uncertainties of departures from the normal distribution. From the results in table 5.4 it is found that we cannot reject the null hypothesis that the residuals are Gaussian at significance level 0.05. The critical value represents the upper limit for the null hypothesis to hold, and the P-value represents the probability of observing the same outcome given that the null hypothesis is true; put another way, if the P-value is above the significance level we cannot reject the null hypothesis.
Factor        1    2    3    4    5    6    7    8    9
JB-Value    2.39 1.79 1.35 2.24 1.69 1.27 0.96 1.14 2.00
Crit-Value  4.84 4.88 4.95 4.92 4.95 4.89 4.95 4.93 4.93
P-Value     0.16 0.26 0.39 0.18 0.29 0.41 0.53 0.46 0.22
H0 or H1     H0   H0   H0   H0   H0   H0   H0   H0   H0
Factor       10   11   12   13   14   15   16   17   18
JB-Value    1.62 2.14 0.85 1.77 0.96 0.82 1.72 2.18 1.62
Crit-Value  4.94 4.98 4.93 4.92 4.91 4.90 4.91 4.88 4.94
P-Value     0.30 0.20 0.58 0.26 0.53 0.59 0.28 0.19 0.30
H0 or H1     H0   H0   H0   H0   H0   H0   H0   H0   H0
Table 5.4. Jarque-Bera test of normality at α = 0.05 for univariate residuals for lagged factors
To investigate the presence of autocorrelation in the residuals, a Durbin-Watson test is performed. If the Durbin-Watson test statistic is close to 2, it indicates that there is no autocorrelation in the residuals. As can be seen in table 5.5, all test statistics group around 2 and it can be assumed that autocorrelation is not present. It can be concluded from these two tests, and from checking that the errors indeed have an average of zero, that the classical regression assumptions in chapter 3.2 are fulfilled for the univariate models. For the multivariate models it has not been verified that the assumptions hold; this is due to the large number of models. Even if the assumptions are not fulfilled, OLS can still be used, but it is not guaranteed that it is the best linear unbiased estimate.
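As an illustration, the residual diagnostics above can be run for one lagged univariate regression with the Matlab functions jbtest, dwtest and qqplot; this is a minimal sketch with placeholder variable names, not the thesis code.

```matlab
% Sketch of the residual diagnostics for one lagged univariate regression.
% y : equity premium, x : one factor (both n-by-1); a one step lag is assumed.
X   = [ones(length(x)-1,1) x(1:end-1)];   % lagged regressor with intercept
b   = X \ y(2:end);                       % OLS fit
res = y(2:end) - X*b;                     % residuals

[hJB, pJB] = jbtest(res, 0.05);   % H0: residuals are Gaussian
[pDW, dw]  = dwtest(res, X);      % H0: no autocorrelation in residuals
qqplot(res);                      % visual check against the normal distribution
```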
Factor
1 2 3 4 5 6 7 8 9
DW-Value 1.83 2.10 2.02 1.88 2.10 2.19 2.09 2.09 2.16
P-Value 0.46 0.85 0.97 0.58 0.83 0.67 0.89 0.89 0.64
10 11 12 13 14 15 16 17 18
DW-Value 2.08 1.97 2.23 1.92 2.23 2.08 2.11 2.02 2.05
P-Value 0.92 0.82 0.57 0.67 0.56 0.95 0.81 0.91 0.98
Table 5.5. Durbin-Watson test of autocorrelation for univariate residuals for lagged
factors
(a) Dividend yield (b) Price-earnings ratio (c) Book value per share
(d) Price-dividend ratio (e) Earnings per share (f) Inflation
(g) Fed funds rate (h) Short term interest rate (i) Term spread short
Figure 5.3. QQ-Plot of the one step lagged residuals for factors 1-9 versus standard
normal pdf
(a) Term spread long (b) Credit spread (c) Producer price
(d) Industrial production (e) Personal income (f) Gross domestic product
(g) Consumer sentiment (h) Volatility (i) Earnings-book ratio
Figure 5.4. QQ-Plot of the one step lagged residuals for factors 10-18 versus standard
normal pdf
(a) Dividend yield (b) Price-earnings ratio (c) Book value per share
(d) Price-dividend ratio (e) Earnings per share (f) Inflation
(g) Fed funds rate (h) Short term interest rate (i) Term spread short
Figure 5.5. One step lagged factors 1-9 versus returns on the equity premium, outliers
marked with a circle
(a) Term spread long (b) Credit spread (c) Producer price
(d) Industrial production (e) Personal income (f) Gross domestic product
(g) Consumer sentiment (h) Volatility (i) Earnings-book ratio
Figure 5.6. One step lagged factors 10-18 versus returns on the equity premium, outliers
marked with a circle
5.5 Forecasting by linear regression
When forecasting time series data by using regression there are two different approaches. The first possibility would be to estimate the regression equation using all values of the dependent and the independent variables. When one wants to take a step ahead in time, forecasted values for the independent variables have to be inserted into the regression equation. In order to do this one must clearly be able to forecast the independent variables, e.g. by assuming an underlying process, and one has merely shifted the problem of forecasting the dependent variable to forecasting the independent variables.
The second possibility is to estimate the regression equation using lagged independent variables. If one wants to take one step ahead in time, then one would lag the independent variables one step. This is illustrated in table 5.6 where τ is the number of time lag steps. By inserting the most recent, unused, observations of the independent variables in the regression equation you get a one step forecasted value for the dependent variable. In fact, one could insert any of the unused observations of the independent variables, since it is already assumed that the regression equation holds over time. However, economically, it is common practice to use the most recent values since they probably contain more information about the future². It is the latter approach that has been used in this thesis. Plots for the univariate one step lagged regressions are found in figure 5.5 and figure 5.6.
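A minimal Matlab sketch of this lagging approach, for one factor and a horizon of τ steps, could look as follows; the variable names are placeholders and the thesis implementation is given in appendix B.

```matlab
% Sketch of forecasting with lagged regressors (cf. table 5.6).
% y : n-by-1 equity premium, x : n-by-1 factor, tau : forecast horizon in steps.
X    = [ones(length(x)-tau,1) x(1:end-tau)];  % regressors lagged tau steps
b    = X \ y(tau+1:end);                      % estimate the regression
yhat = [1 x(end)] * b;                        % forecast tau steps ahead using
                                              % the most recent observation
```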
Y         X_i
y_t       x_{i,t-τ}
y_{t-1}   x_{i,t-1-τ}
⋮         ⋮
y_{t-N}   x_{i,t-N-τ}
Table 5.6. Principle of lagging time series for forecasting
² This follows from the efficient market hypothesis
When a time series is regressed on other time series that are lagged, information is generally lost, resulting in smaller absolute values of R², see table 5.7. This does not need to be the case; sometimes lagged predictors provide a better R². This can be explained by, and observed in table 5.7, that it takes time for these predictors to have impact on the dependent variable. For instance, a higher R² in-sample would have been obtained for factor 15, GDP, if its time series had been lagged one step. The realized change in GDP does a better job in forecasting than in explaining that year's equity premium.
Factor Time lag
0 1 2 3 4 5
1 0.440 0.038 0.008 0.000 0.086 0.000
2 0.075 0.000 0.009 0.000 0.033 0.010
3 0.001 0.032 0.108 0.010 0.028 0.010
4 0.416 0.024 0.014 0.000 0.075 0.001
5 0.001 0.000 0.042 0.009 0.008 0.008
6 0.180 0.013 0.006 0.016 0.027 0.001
7 0.001 0.076 0.004 0.022 0.008 0.119
8 0.000 0.045 0.004 0.010 0.004 0.065
9 0.001 0.037 0.004 0.129 0.034 0.128
10 0.003 0.008 0.011 0.008 0.000 0.022
11 0.138 0.087 0.003 0.000 0.127 0.014
12 0.180 0.020 0.012 0.006 0.032 0.019
13 0.159 0.059 0.000 0.001 0.003 0.060
14 0.030 0.096 0.052 0.035 0.058 0.042
15 0.008 0.113 0.084 0.018 0.049 0.030
16 0.305 0.000 0.010 0.001 0.030 0.008
17 0.112 0.025 0.017 0.095 0.062 0.059
18 0.000 0.005 0.117 0.003 0.002 0.002
Table 5.7. Lagged R² for univariate regression with the equity premium as dependent variable
Chapter 6
Implementation
In this chapter it is explained how the theory from the previous chapters is implemented, and techniques and solutions are highlighted. All code is presented in appendix B.
6.1 Overview
The theory covered in the previous chapters is implemented using Matlab. To make the program easy to use, a user interface in Excel is constructed. Figure 6.1 describes the communication between Excel, VBA and Matlab.
Figure 6.1. Flowchart
Figure 6.2. User interface
6.2 Linear prediction
The predictions are implemented using Matlab's backslash operator, which solves equation systems of the form y = Xβ. Depending on the matrix properties of X, different factorizations are made in the call X\y. If the dimensions are not matched, the call is executed by first performing a factorization and the least squares estimate of β is calculated. If the dimensions are matched, then β = X\y is computed by Gaussian elimination. The backslash operator never computes explicit inverses.
The Jarque-Bera test, Durbin-Watson test and the QQ-plots are generated using the following Matlab calls: jbtest, dwtest and qqplot.
In the multivariate prediction, combinations of the 18 factors are selected using binary numbers from 1 to 2^18, where the ones symbolize factors included and the zeros symbolize factors not included in the different models.
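As an illustration of this enumeration, the following minimal Matlab sketch loops over the non-empty factor subsets encoded as binary numbers; the variable names are placeholders and the actual implementation is found in appendix B.

```matlab
% Sketch of enumerating factor subsets via binary numbers.
p = 18;
for m = 1:2^p - 1                 % each integer encodes one non-empty model
    include = bitget(m, 1:p);     % 1 = factor included, 0 = excluded
    cols    = find(include);      % column indices of the included factors
    % ... estimate the model on Xlag(:, cols) and store its marginal likelihood
end
```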
Surveys on the equity premium have shown that the large majority of professionals believe that the premium is confined to 2-13% [65]. Therefore, models yielding a negative value of the premium, or a value exceeding the historical mean of the premium by 1.28 standard deviations (corresponding to a 90% confidence interval), are not being used in the Bayesian model averaging and therefore do not influence the final premium estimate at all. Setting the upper bound to 1.28 standard deviations rules out premia larger than around 30%.
6.3 Bayesian model averaging
The Bayesian model averaging is straightforwardly implemented from the theoretical expression for the likelihood given in section 4.6, where g is set to be the reciprocal of the number of samples. As can be seen in table 7.6, the three different choices of g lead to almost the same results. The difficulties with the implementation lie in dealing with the large number of models, 2^18 ≈ 262000, in a time efficient manner. This problem has been solved by implementing a routine in C, called setSubColumn, that handles memory allocation more efficiently when working with matrices close to the maximal allowed matrix size in Matlab. The code is supplied in appendix B.
6.4 Backtesting
Since the HEP sometimes is negative while we do not allow for negative values of the premium, traditional backtesting would not be a fair benchmark for the performance of our prediction model. Instead we evaluate how well excess returns are estimated by allowing for negative values. To further investigate the predictive ability of our forecasting, an out-of-sample R² statistic is employed. The statistic is defined as

R^2_{os} = 1 - \frac{\sum_{t=1}^{n}(r_t - \hat{r}_t)^2}{\sum_{t=1}^{n}(r_t - \bar{r}_t)^2},   (6.1)

where \hat{r}_t is the fitted value from the predictive regression estimated through t-1 and \bar{r}_t is the historical average return, also measured through t-1. If the statistic is positive, then the predictive regression has a lower average mean squared error than the historical average.¹ Therefore, the statistic can be used to determine if a model has better predictive performance than applying the historical average.
A measure called hit ratio (HR) can be used as an indication of how good the forecast is at predicting the sign of the realized premium. It is simply the ratio of the number of times the forecast has the right sign to the length of the investigated time period. For an investor this is of interest since the hit ratio can be used as a buy-sell signal on the underlying asset. In the case of the equity premium, this is a biased measure since the long-term average of the HEP is positive.
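As an illustration, both statistics can be computed in a few lines of Matlab; r, rhat and rbar below are placeholder vectors holding the realized excess returns, the model forecasts and the historical averages over the evaluation period.

```matlab
% Sketch of the backtest statistics in (6.1) and the hit ratio.
% r    : realized excess returns over the evaluation period
% rhat : forecasts from the prediction model (estimated through t-1)
% rbar : historical average returns (also measured through t-1)
R2os = 1 - sum((r - rhat).^2) / sum((r - rbar).^2);   % out-of-sample R^2
HR   = mean(sign(rhat) == sign(r));                   % fraction of correct signs
```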
An interesting question is if the next year's predicted value will be realized within the coming business cycle, here approximated as five years and called the forward average. This value is calculated as a benchmark along with a five-year rolling average, here called the backward average. The results from the backtest are presented in the results section.
¹ This statistic is further investigated by Campbell and Thompson [16]
Chapter 7
Results
In this chapter we present our forecasts of the equity premium along with the
results from the backtest.
7.1 Univariate forecasting
In gure 7.1 the historical equity premium is prolonged with the estimated equity
premia for ve years ahead and plotted over time. The models used are univariate
and hence each model consists of only one factor, being 18 models in total.
The gures for the forecasted premia is displayed in table 7.1. Models not belong-
ing to the set specied in chapter 6 are not taken into consideration. In table 7.1
the labels Prediction Range and Mean refer to the range of the predicted values
and to the mean of these predicted values. Note that the Mean corresponds to the
prior believes.
k
is the estimate of the premium using bayesian model averaging.
The variance and a condence interval for this estimate is also presented.
Time step   Prediction Range   Mean   μ̂_k   S_k   I_0.90
Dec-08 0.00 - 16.0 3.69 4.20 15.27 0.58 - 7.83
Dec-09 0.00 - 14.4 2.36 3.07 15.29 -0.60 - 6.74
Dec-10 0.00 - 14.0 2.54 3.54 15.28 -0.17 - 7.24
Dec-11 0.00 - 15.1 2.94 4.84 15.30 1.08 - 8.59
Dec-12 0.00 - 8.9 3.36 4.05 15.34 0.25 - 7.85
Table 7.1. Forecasting statistics in percent
Figure 7.1. The equity premium from the univariate forecasts
In table 7.2 the factors constituting the univariate model with highest probability over time are presented. The factors are further explained in chapter 5. Note that the prior assumption about the model probabilities is 1/18 ≈ 5.5 percent for each model.
Time step   Factor   Pr(M_i)
1 Gross Domestic Product 6.47
2 Gross Domestic Product 7.38
3 Terms Spread Short 8.19
4 Volatility 9.23
5 Terms Spread Short 6.96
Table 7.2. The univariate model with highest probability over time
Figure 7.2 shows how the likelihood function changes for different g-values for each one step lagged predictor. Table 7.3 shows results from the backtest. The R²_os statistic shows that the univariate prediction model has better predictive performance than applying the historical average for the period 1991 to 1999. The hit ratio statistic, HR, shows how often the univariate predictions have the right sign, that is, if the premium is positive or negative. Mind that we allow for negative premium values when applying the HR statistic.
Figure 7.2. Likelihood function values for different g-values
Pred. step    1    2    3    4    5
R²_os,uni   0.21 0.26 0.23 0.05 0.14
HR_uni      0.6  0.2  0    0.6  0.2
Table 7.3. Out of sample R²_os,uni and hit ratios HR_uni
7.2 Multivariate forecasting
The corresponding results from multivariate predictions are presented below in
gure 7.3. As in the univariate case, no negative values are allowed and the upper
limit from chapter 6 is used. In table 7.4 the labels Prediction Range and Mean
refer to the range of the predicted values and to the mean of these predicted values.
Note that the Mean corresponds to the prior believes.
k
is the estimate of the
premium using Bayesian model averaging.
Figure 7.3. The equity premium from the multivariate forecasts
Time step   Prediction Range   Mean   μ̂_k   S_k   I_0.90
Dec-08 0.00 - 21.4 3.18 7.72 16.6 3.79 - 11.7
Dec-09 0.00 - 21.7 1.48 7.97 16.7 4.01 - 11.9
Dec-10 0.00 - 21.4 5.07 10.4 16.6 6.45 - 14.3
Dec-11 0.00 - 21.7 4.26 10.2 16.7 6.30 - 14.2
Dec-12 0.00 - 16.0 0.58 3.74 17.7 -0.21 - 7.70
Table 7.4. Forecasting statistics in percent
Time step   Factors included (1-18)   Pr(M_i)
1 0.001
2 0.002
3 0.0009
4 0.001
5 0.003
Table 7.5. The multivariate model with highest probability over time
In table 7.5 the factors constituting the multivariate models with highest probabilities over time are presented. The factors are discussed in chapter 5. Note that the prior assumption about the model probabilities is 1/(2^18) ≈ 0.00038 percent for each model.
Time horizon   g = 1/n   g = k^(1/(1+k))/n   g = k/n
Dec-08 7.7236 7.7274 7.8047
Dec-09 7.9769 7.9786 7.9509
Dec-10 10.384 10.340 10.568
Dec-11 10.248 10.251 10.344
Dec-12 3.7434 3.7433 3.7688
Table 7.6. Forecasts for dierent g-values
Table 7.6 depicts how the predicted values are influenced by the three choices of g. In the univariate case, the three choices coincide. Table 7.7 shows results from the backtest. The R²_os statistic shows that also the multivariate prediction model has better predictive performance than applying the historical average for the period 1991 to 1999. The hit ratio statistic, HR, shows how often the multivariate predictions have the right sign, that is, if the premium is positive or negative. Once again, we allow for negative premium values when applying the HR statistic.
Pred. step    1     2    3    4    5
R²_os,mv    0.23 -0.10 0.20 0.47 0.60
HR_mv       0.6   0.4  0.6  0.8  0.6
Table 7.7. Out of sample R²_os,mv and hit ratios HR_mv
7.3 Results from the backtest
In figures 7.4 and 7.5 our forecasts are compared with a backward average, a forward average and the HEP. An average of the forecasts is also compared to a forward average. The backtest is explained in chapter 6.4 and further discussed in the next chapter.
(a) Univariate backtest 1 year (b) Univariate backtest 2 year
(c) Univariate backtest 3 year (d) Univariate backtest 4 year
(e) Univariate backtest 5 year (f) 1991:1995 compared forward
Figure 7.4. Backtest of univariate models
(a) Multivariate backtest 1 year (b) Multivariate backtest 2 year
(c) Multivariate backtest 3 year (d) Multivariate backtest 4 year
(e) Multivariate backtest 5 year (f) 1991:1995 compared forward
Figure 7.5. Backtest of multivariate models
Chapter 8
Discussion of the Forecasting
In chapter 6.3 we specified the value of g to be used in this thesis as the reciprocal of the number of samples. For the sake of completeness, we have presented the outcome of the two other values of g in table 7.6. Apparently, the chosen value of g has the most impact on the 1-year horizon forecast and a decreasing impact on the other horizons. This can be explained by the rapidly decreasing forecasting performance of the covariance matrix for time lags above one, which in turn can be motivated by table 5.7 showing decreasing R²-values over time. In figure 7.2 the principal appearance of the likelihood function for the factors and different g-values can be seen. As explained earlier, increasing the value of g gives models with good adaptation to data a higher likelihood, while setting g to zero yields the same likelihood for all models. For large g-values, only models with a high degree of explanation will have impact in the BMA, and you have great confidence in your data. On the other hand, a decrease of g allows for more uncertainty to be taken into account.
Turning to the model criteria formulated in chapter 2.7, it is found that most of the criteria are fulfilled. The equity premium over the five-year horizon is positive, due to our added constraints; however, the confidence interval for the premium incorporates zero at some times.
The time variation criterion is not fulfilled in the sense that the regression line does not change considerably as new data points become available. The amount of used data is a trade-off between stability and incorporating the latest trend. The conflict lies in the confidence of the predictors. Using many data samples improves the precision of the predictors, but the greater the difference between the time to be predicted and that of the oldest samples, the more doubtful are the implications of old samples.
The smoothness of the estimates over time is questionable; our five-year predictions in the univariate case are rather smooth, whereas the multivariate forecasts exhibit greater fluctuations. Given the appearance of the realized equity premium until December 2007, which is strongly volatile, and that a multivariate model can explain more variance, it is reasonable that a multivariate model would generate results more similar to the input data, just as can be observed in the multivariate case, figure 7.3.
The time structure of the equity premium is not taken into consideration because the one-year yield, serving as the risk-free asset, does not alone account for the term structure.
Since all predictions suffer from an error it is important to be aware of the quality of the predictions. Our precision estimate takes the misfit of the models into account and therefore it says something about the uncertainty in our predictions. However, this precision does not say anything about the relevancy of using old data to forecast future values.
From the R²-values in table 5.7 it can be seen that there is some predictive ability at hand, even though it is small. Another piece of evidence of predictability is the deviation of the prior probabilities from the posterior probabilities. If there were no predictability at hand, why would the prior probability then be different from the posterior probability? The Mean in table 7.1 and table 7.4 corresponds to using the prior belief that all models have the same probability; the BMA estimate is never equal to this mean.
The univariate predictors with the highest probability in each time step, table 7.2, also enter the models with highest probability in table 7.5, except for GDP which is not a member of the multivariate model for the first time step. This can be summarized as the factors GDP, term spread short and volatility being important in the forecast for the next five years.
Having seen evidence of predictive ability, the question is now to what extent it can be used to produce accurate forecasts.
Backtesting our approach is not trivial, mainly because we cannot access the historical expected premium. Nevertheless, backtesting has been performed by doing a full five-year horizon forecast starting in each year between 1991 and 1995 respectively, and then comparing the point forecasts with the realized historical equity premium for each year. Here, no restrictions are imposed on the forecasts, i.e. negative excess returns are allowed. The results are presented in figure 7.4 and figure 7.5, where each plot corresponds to a time step (1, 2, 3, 4 or 5 years). These plots have also been complemented with the realized excess returns, as well as the five-year backward and the five-year forward average. In figure 7.4 f and figure 7.5 f, the arithmetic average of the full five-year horizon forecast is compared to the five-year forward average.
The univariate backtest shows that the forecast intervals at most capture 2 out of 5 HEPs, this at the one and two-year horizons. Otherwise, the forecasts tend to be far too low in comparison with the HEP. The number of times the HEP intersects with the forecasted intervals is at most 2, at the two-year horizon, figure 7.4 b. In general, the univariate forecasts do not seem to be flexible enough to fit the sometimes vast changes in the HEP and are far too low. The backtest has not provided us with any evidence of forecasting ability. However, when the forecast constraint is imposed, the predictive ability from 1991-1995 is superior to using the historical average. This can be seen from the R²-statistics in table 7.3. The four and five-year horizon forecasts, figure 7.4 d and e, capture 2 out of 5 forward averages, whereas the one-year horizon captures 3 backward averages. In figure 7.4 f it can be seen that averaging the forecasts does not give a better estimate of the forward average. From table 7.3 it can be seen that the hit ratios for the one and four-year horizons stand out, both scoring 60%. The results from the univariate backtest have shown that the best forecasts were received for the one and four-year horizons, of which none has a good forecast quality.
The multivariate backtest shows little sign of forecasting ability for our model. The number of times the HEP intersects with the forecasted interval is at most 3 out of 5 times. This happens at the three and four-year horizons, figure 7.5 c and d, and these are also the forecasts following the evolution of the HEP most closely. The four-year forecast depicts the change of the HEP the best, being correct 3 out of 4 times, however never getting the actual figures correct. The two and four-year forecasts capture the forward average the best; 2 out of 5 forecasted intervals are realized in average over the next 5 years. From figure 7.5 f, the only conclusion that can be drawn is that averaging our forecast for each time step does not provide a better estimate of the forward average. The R²-values in table 7.7 show signs of forecasting ability in comparison with the historical average at all time steps except for the two-year horizon, with the four and five-year horizon forecasts standing out. The most significant hit ratio is 80%, at the four-year horizon. In conclusion, the backtesting in the multivariate case has shown that for the test period the best results in all terms have been received for the four and five-year horizons, in particular the four-year horizon.
Summing up the results from the univariate and multivariate backtests, it cannot be said that the quality of the multivariate forecasts outperforms the quality of the univariate estimates when looking at the R²-values and hit ratios. However, the multivariate forecasts as such depict the evolution of the true excess returns in a better way. Contrary to what one could believe, the one-year horizon forecasts do not generate better forecasts than the other horizons. In fact, the best estimates are provided by the 4-year forecasts, both in the univariate and the multivariate case. Still, we recommend using the one-year horizon forecasts because they have the smallest time lag and therefore use more recent data. Furthermore, the result that the forecast power of multi-factor models is better than for a forecast based on the historical average is in line with Campbell and Thompson's findings [16].
Part II
Using the Equity Premium in Asset Allocation
Chapter 9
Portfolio Optimization
In modern portfolio theory it is assumed that expected returns and covariances are known with certainty. Naturally, this is not the case in practice - the inputs have to be estimated, and with this follow estimation errors. Errors in the estimates have a great impact on the optimal allocation weights in a portfolio; therefore it is of great interest to have as accurate forecasts of the input parameters as possible, which has been dealt with in part I of this thesis. Even if you have good estimates of the input parameters, estimation errors will still be present, they are just smaller. In this chapter we discuss and present the impact of estimation errors in portfolio optimization.
9.1 Solution of the Markowitz problem
The Markowitz problem is the foundation for single-period investment theory and relates the trade-off between expected rate of return and variance of the rate of return in a portfolio of risky assets. [52]
The model of Markowitz assumes that investors are only concerned about the mean, the variance and the correlation of the portfolio assets. A portfolio is said to be efficient if there is no other portfolio with the same expected return but with a lower risk, or if there is no other portfolio with the same risk but with a higher expected return. [54] An investor who seeks to minimize risk (standard deviation) always chooses the portfolio with the smallest standard deviation for a given mean, i.e. he is risk averse. An investor who for a given standard deviation wants to maximize the expected return is said to have the property of nonsatiation. An investor being risk averse and nonsatiated at the same time will always choose a portfolio on the efficient frontier, which is made up of the set of efficient portfolios. [52] The portfolio on the efficient frontier with the lowest standard deviation is called the minimum variance portfolio (MVP).
Given the number of assets n in the portfolio, the other statistical properties of the Markowitz problem can be described by its average return μ ∈ R^{n×1}, the
covariance matrix C ∈ R^{n×n} and the asset weights w ∈ R^{n×1}. The mathematical formulation of the Markowitz problem is now given as

min_w   w'Cw
s.t.    μ'w = \bar\mu
        1'w = 1,   (9.1)

where 1 is a column vector of ones. The first constraint says that the weights and their corresponding returns have to equal the desired return level. The second constraint means that the weights have to add up to one. Note that in this formulation the signs of the weights are not restricted; short selling is allowed. Following Zagst [66], the solution to problem (9.1) is given in theorem 9.1.
Theorem 9.1 (Solution of the Markowitz problem) If C is positive definite, then according to theorem A.1, C is invertible and its inverse is also positive definite. Further, denote

a = 1'C^{-1}μ
b = μ'C^{-1}μ
c = 1'C^{-1}1
d = bc - a².

The optimal solution of problem (9.1) is given as

w* = \frac{1}{d}\left((c\bar\mu - a)C^{-1}μ + (b - a\bar\mu)C^{-1}1\right)   (9.2)

with

σ²(\bar\mu) = w*'Cw* = \frac{c\bar\mu² - 2a\bar\mu + b}{d}.   (9.3)

The minimum variance portfolio, denoted by w_MVP, is given as

w_MVP = \frac{1}{c}C^{-1}1   (9.4)

and is located at

(μ_MVP, σ_MVP) = \left(\frac{a}{c}, \sqrt{\frac{1}{c}}\right).   (9.5)

Finally, the minimum variance set is given as

\bar\mu = μ_MVP ± \sqrt{\frac{d}{c}(σ² - σ²_MVP)}   (9.6)

where the positive case corresponds to the efficient frontier, since it dominates the negative case. σ²_MVP sets the lower bound for possible values of σ².
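As an illustration, the closed-form solution of theorem 9.1 maps directly into a few lines of Matlab; mu, C and muBar below are placeholders for the mean vector, the covariance matrix and the desired return level.

```matlab
% Closed-form Markowitz solution (theorem 9.1).
n     = length(mu);
Cinv  = inv(C);
a     = ones(1,n)*Cinv*mu;
b     = mu'*Cinv*mu;
c     = ones(1,n)*Cinv*ones(n,1);
d     = b*c - a^2;

wOpt  = ((c*muBar - a)*Cinv*mu + (b - a*muBar)*Cinv*ones(n,1)) / d;   % (9.2)
varP  = (c*muBar^2 - 2*a*muBar + b) / d;                              % (9.3)
wMVP  = Cinv*ones(n,1) / c;                                           % (9.4)
muMVP = a/c;  sigMVP = sqrt(1/c);                                     % (9.5)
```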
Proof:¹ Since C^{-1} is positive definite it holds that

b = μ'C^{-1}μ > 0   (9.7)

and also that

c = 1'C^{-1}1 > 0.   (9.8)

With the scalar product² ⟨x, y⟩ ≡ x'C^{-1}y and the Cauchy-Schwarz inequality it follows that

⟨1, μ⟩² = (1'C^{-1}μ)² = a² ≤ ⟨1, 1⟩⟨μ, μ⟩ = (1'C^{-1}1)(μ'C^{-1}μ) = bc,

and for μ ≠ k·1 it follows that

d = bc - a² > 0.   (9.9)

Furthermore, the Lagrangian for problem (9.1) is given as

L(w, u) = \frac{1}{2} w'Cw + u_1(\bar\mu - μ'w) + u_2(1 - 1'w)   (9.10)

where the objective function has been multiplied with the factor 1/2 for convenience only. w* is optimal if there exists a u = (u_1, u_2)' ∈ R² that satisfies the Kuhn-Tucker conditions

∂L/∂w_i (w*, u) = \sum_{j=1}^{n} c_{i,j} w*_j - u_1 μ_i - u_2 = 0,  ∀i   (9.11)
∂L/∂u_1 (w*, u) = \bar\mu - μ'w* = 0   (9.12)
∂L/∂u_2 (w*, u) = 1 - 1'w* = 0.   (9.13)

(9.11) ⟺ Cw* = u_1 μ + u_2 1 ⟺ w* = u_1 C^{-1}μ + u_2 C^{-1}1   (9.14)
(9.13) & (9.14) ⟹ 1'w* = u_1 1'C^{-1}μ + u_2 1'C^{-1}1 = a u_1 + c u_2 = 1   (9.15)
(9.12) & (9.14) ⟹ μ'w* = u_1 μ'C^{-1}μ + u_2 μ'C^{-1}1 = b u_1 + a u_2 = \bar\mu   (9.16)

(9.15) & (9.16) ⟹ \underbrace{\begin{pmatrix} a & c \\ b & a \end{pmatrix}}_{=A} \underbrace{\begin{pmatrix} u_1 \\ u_2 \end{pmatrix}}_{=u} = \begin{pmatrix} 1 \\ \bar\mu \end{pmatrix}.   (9.17)

Calculate the inverse of A as

A^{-1} = \frac{1}{\det(A)}\begin{pmatrix} a & -c \\ -b & a \end{pmatrix} = \frac{1}{a² - bc}\begin{pmatrix} a & -c \\ -b & a \end{pmatrix} = \frac{1}{d}\begin{pmatrix} -a & c \\ b & -a \end{pmatrix},   (9.18)

where d is greater than zero, see (9.9). Using (9.17) and (9.18) yields

u = A^{-1}\begin{pmatrix} 1 \\ \bar\mu \end{pmatrix} = \frac{1}{d}\begin{pmatrix} c\bar\mu - a \\ b - a\bar\mu \end{pmatrix}.   (9.19)

By inserting (9.19) into (9.14), equation (9.2), the optimal weights, is found:

w* = u_1 C^{-1}μ + u_2 C^{-1}1 = \frac{1}{d}\left((c\bar\mu - a)C^{-1}μ + (b - a\bar\mu)C^{-1}1\right).   (9.20)

Equation (9.3) follows by

σ²(\bar\mu) = w*'Cw*  [using (9.11)]  = u_1 μ'w* + u_2 1'w*  [using (9.15) & (9.16)]  = u_1\bar\mu + u_2   (9.21)
            [using (9.19)]  = \frac{1}{d}\left((c\bar\mu - a)\bar\mu + (b - a\bar\mu)\right) = \frac{c\bar\mu² - 2a\bar\mu + b}{d},   (9.22)

which has its minimum for

∂σ²(\bar\mu)/∂\bar\mu = \frac{1}{d}(2c\bar\mu - 2a) = 0 ⟺ \bar\mu_MVP = \frac{a}{c}   (9.23)

since the second partial derivative is positive,

∂²σ²(\bar\mu)/∂\bar\mu² = \frac{2c}{d} > 0  by (9.8) & (9.9).   (9.24)

(9.23) and (9.3) result in

σ_MVP = \sqrt{σ²(μ_MVP)} = \sqrt{\frac{c μ_MVP² - 2a μ_MVP + b}{d}} = \sqrt{\frac{c(a/c)² - 2a(a/c) + b}{d}} = \sqrt{\frac{1}{c}},   (9.25)

where c is positive, see (9.8). Together with (9.23) this gives equation (9.5), the location of the minimum variance portfolio,

(μ_MVP, σ_MVP) = (a/c, \sqrt{1/c}).

The weights of the minimum variance portfolio, equation (9.4), are found as follows:

w_MVP  [using (9.20)]  = \frac{1}{d}\left((c μ_MVP - a)C^{-1}μ + (b - a μ_MVP)C^{-1}1\right)
       [using (9.23)]  = \frac{1}{d}\left((c(a/c) - a)C^{-1}μ + (b - a(a/c))C^{-1}1\right)
       [using (9.9)]   = \frac{1}{c} C^{-1}1.   (9.26)

Finally, the efficient frontier in equation (9.6) is found by defining σ ≡ σ(\bar\mu):

σ²  [using (9.22)]  = \frac{c\bar\mu² - 2a\bar\mu + b}{d}
⟺ \frac{d}{c}σ² = \bar\mu² - 2\bar\mu\frac{a}{c} + \frac{b}{c} = \left(\bar\mu - \frac{a}{c}\right)² - \frac{a²}{c²} + \frac{b}{c}
   [using (9.9) & (9.23)]  = (\bar\mu - μ_MVP)² + \frac{d}{c}\cdot\frac{1}{c}
   [using (9.25)]          = (\bar\mu - μ_MVP)² + \frac{d}{c}σ²_MVP
⟺ (\bar\mu - μ_MVP)² = \frac{d}{c}(σ² - σ²_MVP)
⟺ \bar\mu = μ_MVP ± \sqrt{\frac{d}{c}(σ² - σ²_MVP)}.  □

¹ Following [66]
² See theorem A.1
If shorting was not allowed, the constraint of positive portfolio weights would have to be added to problem (9.1). The problem formulation would then be

min_w   w'Cw
s.t.    μ'w = \bar\mu
        1'w = 1
        w ≥ 0.   (9.27)

This optimization problem is quadratic just as problem (9.1), but in contrast it cannot be reduced to a set of linear equations due to the added inequality constraint. Instead, an iterative optimization method has to be used for finding the optimal weights. The problem is solved by making the call quadprog in Matlab. The function solves quadratic optimization problems by using active set methods.³
³ This is further explained in [35]
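A minimal sketch of such a quadprog call for problem (9.27) is given below; mu, C and muBar are placeholder names and the sketch is not the thesis implementation.

```matlab
% Sketch of solving the long-only Markowitz problem (9.27) with quadprog.
n    = length(mu);
H    = 2*C;                          % quadprog minimizes 0.5*w'*H*w, so H = 2C
f    = zeros(n,1);                   % no linear term in the objective
Aeq  = [mu'; ones(1,n)];             % mu'*w = muBar and 1'*w = 1
beq  = [muBar; 1];
lb   = zeros(n,1);                   % w >= 0, i.e. no short selling
w    = quadprog(H, f, [], [], Aeq, beq, lb, []);
```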
9.2 Estimation error in Markowitz portfolios
The estimated parameters, mean and covariance, used in Markowitz-based portfolio construction are often based on calculations on just one sample set from the return history. Input parameters derived from this sample set can only be expected to equal the parameters of the true distribution if the sample is very large and the distribution is stationary. If the distribution is non-stationary it could be advisable to instead use a smaller sample for estimating the parameters. We can now distinguish between two origins of the estimation error - a stationary but too short data set, or non-stationary data. [61] In this part of the thesis we will focus on estimation error originating from stationary but too short data sets.
Solving problem (9.27) for a given data set, where the means and covariances have been estimated on historical data, would generate portfolios that exhibit very different allocation weights. Some assets tend to never enter the solution as well. This is a natural result from solving the optimization problem - the assets with very attractive features dominate the solution. It is also here the estimation errors are likely to be large, which means that the impact of estimation errors on portfolio weights is maximized. [61] This is an undesired property of portfolio optimization that has been known for a long time [56]. Since the input parameters are treated as if they were known with certainty, even very small changes in them will trace out a new efficient frontier. The problem gets even worse as the number of assets increases, because this increases the probability of outliers. [61]
9.3 The method of portfolio resampling
Section 9.2 presented the problems with estimation errors in portfolio optimization due to treating input parameters as certain. A Monte Carlo approach called portfolio resampling has been introduced by Michaud [56] to deal with this. The basic idea is to allow for uncertainty in the input parameters by sampling from a distribution with parameters specified by estimates on historical data. Fabozzi [26] has summarized the procedure and it is described below.

Algorithm 9.1 (Portfolio resampling)
1. Estimate the mean vector, μ̂, and covariance matrix, Σ̂, from historical data.
2. Draw T random samples from the multivariate distribution N(μ̂, Σ̂) to estimate μ̂_i and Σ̂_i.
3. Calculate an efficient frontier from the input parameters from step 2 over the interval [μ_MVP,i, μ_MAX], which is partitioned into M equally spaced points. Record the weights w_{1,i}, ..., w_{M,i}.
4. Repeat steps 2 and 3 a total of I times.
5. Calculate the resampled portfolio weights as w̄_M = (1/I) Σ_{i=1}^{I} w_{M,i} and evaluate the resampled frontier with the mean vector and covariance matrix from step 1.

The number of draws T corresponds to the uncertainty in the inputs you are using. As the number of draws increases the dispersion decreases and the estimation error, the difference between the original estimated input parameters and the sampled input parameters, will become smaller. [61] Typically, the value of T is set to the length of the historical data set [61] and the value of I is set between 100 and 500 [26]. The number of portfolios M can be chosen freely according to how well the efficient frontier should be depicted.
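A minimal Matlab sketch of algorithm 9.1 for the long-only case is given below. It assumes mvnrnd from the Statistics Toolbox and reuses the quadprog formulation from section 9.1; muHat, SigmaHat, T, I and M are placeholders, and for simplicity the return grid in step 3 runs from the smallest to the largest resampled asset mean rather than exactly from the MVP return.

```matlab
% Sketch of portfolio resampling (algorithm 9.1), long-only case.
% muHat, SigmaHat : estimates from historical data (step 1)
% T, I, M         : number of draws, resamplings and frontier portfolios
n    = length(muHat);
wSum = zeros(n, M);
for i = 1:I
    R      = mvnrnd(muHat', SigmaHat, T);         % step 2: T sampled returns
    mu_i   = mean(R)';  Sig_i = cov(R);           % resampled input parameters
    levels = linspace(min(mu_i), max(mu_i), M);   % step 3: M return levels
    for m = 1:M
        Aeq = [mu_i'; ones(1,n)];  beq = [levels(m); 1];
        w   = quadprog(2*Sig_i, zeros(n,1), [], [], Aeq, beq, zeros(n,1), []);
        wSum(:,m) = wSum(:,m) + w;
    end
end
wResampled = wSum / I;   % step 5: averaged weights, evaluated with muHat, SigmaHat
```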
The new resampled frontier will appear below the original one. This follows from the weights w_{1,i}, ..., w_{M,i} being optimal relative to μ̂_i and Σ̂_i but inefficient relative to the original estimates μ̂ and Σ̂. Therefore, the resampled portfolio weights are also inefficient relative to μ̂ and Σ̂. By the sampling and reestimation that occurs at each step in the portfolio resampling process, the effect of estimation error is incorporated in the determination of the resampled portfolio weights. [26]
9.4 An example of portfolio resampling
A portfolio consisting of 8 different assets has been constructed. The assets are: a world commodity index; equity in the emerging markets, the US and Germany; bonds in the emerging markets, the US and Germany; and finally a real estate index. Their mean vector and covariance matrix have been estimated on data from 2002-2006 and can be found in table 9.1.
Bloomberg Ticker   Asset   Mean   Covariance (Cmdty EQEM EQUS EQDE BDEM BDUS BDDE Estate)
SPGCCITR Cmdty 0.57 0.21
NDLEEGF EQEM 0.08 0.32 0.21
INDU EQUS -0.05 0.17 0.18 0.05
DAX EQDE -0.08 0.31 0.30 0.64 0.07
JGENGLOG BDEM -0.01 0.01 0.00 -0.01 0.01 0.09
JPMTUS BDUS 0.01 -0.03 -0.03 -0.08 0.01 0.03 0.06
JPMTWG BDDE 0.01 -0.02 -0.02 -0.05 0.01 0.02 0.01 0.05
G250PGLL Estate -0.05 0.12 0.10 0.13 0.01 -0.01 0.00 0.19 0.10
Table 9.1. Input parameters for portfolio resampling
With the input parameters from table 9.1 a portfolio resampling has been carried out, with and without shorting allowed, and always with errors in both the means and the covariances. In figure 9.1 the resampled efficient frontiers are depicted. In figures 9.2 and 9.3 the portfolio allocations are found. Finally, the impact of errors in the means and in the covariances respectively is displayed in figure 9.4.
9.5 Discussion of portfolio resampling
As discussed earlier, the resampled frontier will plot below the efficient frontier,
just as in figure 9.1 b. However, when shorting is allowed the resampled frontier
coincides with the efficient frontier. Why is that? Estimation errors should
result in an increase in portfolio risk, showing up as an increase in volatility for
each return level. Instead, the estimation errors only result in a shortening of
the frontier. The explanation given by Scherer [61] is that highly positive returns
will be offset by highly negative returns when drawing from the original distribution.
The quadratic programming optimizer will invest heavily in the asset with highly
positive returns and short the asset with highly negative returns, and on average
these positions offset each other. When the long-only constraint is added, this is
no longer the case and the resampled frontier plots below the efficient frontier,
as in figure 9.1 b.
As a result of the above, the resampled portfolio weights when shorting is allowed
are essentially the same as those in the efficient portfolios. Most of the assets
enter the solution in the same way, as depicted in figure 9.2 b. When shorting
is no longer allowed, the resulting allocations in the efficient portfolios are
concentrated in only a few assets, and a small shift in the desired return level
can lead to rather different allocations, e.g. going from portfolio 6 to 7 in figure 9.3.
The resampled portfolios, on the other hand, exhibit a much smoother transition
between return levels and a greater diversification.
In the resampling, estimation errors have been assumed in both the means and the
covariances. In figure 9.4 the effect of estimation errors only in the means or only
in the covariances can be observed. It is found that estimation errors in the means
have a much greater impact than estimation errors in the covariances. A good
forecast of the mean will therefore improve the resulting allocations a great deal.
The averaging in the portfolio resampling method ensures that the weights still sum
to one, which is important. But averaging can sometimes prove to be misleading.
For instance, there is always a risk that the allocation weights for a given portfolio
are heavily influenced by a few lucky draws, making an asset look more attractive
than is justifiable. Averaging is indeed the main idea behind portfolio resampling,
but it is hard to defend final averaged portfolio weights that depend on a few
extreme outcomes. This criticism is discussed by Scherer [61]. However, the most
important criticism, also presented by Scherer [61], is that all resamplings are
derived from the same mean vector and covariance matrix. Because the true
distribution is unknown, all resampled portfolios deviate from the true parameters
in much the same way, and averaging will not help much in this case. Therefore it
is fair to say that all portfolios inherit the same estimation error.
It is found by Michaud [56] that resampled portfolios beat Markowitz portfolios
out-of-sample. However, well diversified portfolios in general tend to beat
Markowitz portfolios out-of-sample, so the outperformance cannot be ascribed
solely to the portfolio resampling method itself. Although the resampling heuristic
has some major drawbacks, it remains interesting since it is a first step towards
addressing estimation errors in portfolio optimization.
Figure 9.1. Comparison of efficient and resampled frontier: (a) shorting allowed, (b) no shorting allowed.
Figure 9.2. Resampled portfolio allocation when shorting allowed: (a) resampled weights, (b) mean-variance weights.
Figure 9.3. Resampled portfolio allocation when no shorting allowed: (a) resampled weights, (b) mean-variance weights.
Figure 9.4. Comparison of estimation error in mean and covariance: (a) errors in mean, (b) errors in covariance.
Chapter 10
Backtesting Portfolio Performance
In the first part of this thesis we developed a method for forecasting the equity
premium that took model uncertainty into account. It was found that our forecast
outperformed the historical average but was still associated with estimation
errors. In the previous chapter we presented portfolio resampling as a method for
dealing with such errors. In this chapter we evaluate whether portfolio resampling
can be used to improve the results obtained with our forecasts.
10.1 Backtesting setup and results
We benchmark the performance of a portfolio consisting of all the assets found in
table 9.1, except for equity and bonds from the emerging markets, using our forecasted
equity premium and portfolio resampling. For the two emerging-market assets the
available time series were too short.
Starting at the end of 1998 and going to the end of 2007, we solve problem (9.27)
and rebalance the portfolio at the end of each year. We do not allow short-
selling, since it was previously found that portfolio resampling only has an effect under
the long-only constraint. Transaction costs are not taken into account, since our
concern is the relative performance of the methods. The return vector, $\mu$, is fore-
cast using the arithmetic average of the returns up to time t for each asset, except
for US equity, where we use our one-year multivariate equity premium forecast
for time t. The target in problem (9.27) is chosen so that each portfolio has a volatility of
$\sqrt{0.02} \approx 14\%$ when rebalanced. The covariance matrix is always estimated on all
returns available up to time t. The resulting portfolio value over time is found in
figure 10.1 and the corresponding returns in table 10.1. In table 10.2 the
exact terminal portfolio values for ten resampling simulations are presented. A sketch
of how the portfolio matching the volatility target is selected is given below.
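The selection of the portfolio matching the volatility target can be sketched as follows (a minimal sketch with illustrative variable names; the actual implementation is listed in appendix B.8). Given the weights wFrontier of the frontier portfolios and the estimated covariance matrix histCov, the portfolio whose variance lies closest to the target 0.02, i.e. a volatility of about 14%, is chosen at each rebalancing date.

targetVar = 0.02;                        % targeted variance, i.e. volatility of sqrt(0.02), about 14%
portVar = zeros(1, nrPortfolios);
for i = 1:nrPortfolios
    portVar(i) = wFrontier(:,i)' * histCov * wFrontier(:,i);   % portfolio variance
end
[~, idx] = min(abs(portVar - targetVar));
wChosen = wFrontier(:, idx);             % weights used at this rebalancing date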
Figure 10.1. Portfolio value over time using different strategies
It is found that using our premium forecasts as input yields better performance
than just employing the historical average¹. Our forecast consistently generates
the highest portfolio value. As explained earlier, using accurate inputs in portfolio
optimization is very important.
Date EEP EEP&PR aHEP aHEP&PR
Dec-99 33.4 32.8 24.4 27.2
Dec-00 -3.6 -2.6 -2.8 -1.2
Dec-01 -17.1 -18.3 -17.1 -18.9
Dec-02 -16.8 -16.8 -23.3 -19.8
Dec-03 22.0 24.3 19.0 23.0
Dec-04 3.4 7.6 7.0 9.8
Dec-05 18.9 20.2 20.8 21.0
Dec-06 6.9 5.9 6.7 6.3
Dec-07 20.7 17.6 20.3 19.4
Table 10.1. Portfolio returns in percent over time. PR is the acronym for portfolio
resampling.
¹ For the asset equity US, the historical arithmetic average is referred to as aHEP.
          EEP     EEP&PR   aHEP    aHEP&PR
          1.716   1.701    1.520   1.731
                  1.765            1.671
                  1.750            1.713
                  1.700            1.717
                  1.785            1.728
                  1.768            1.672
                  1.750            1.755
                  1.790            1.730
                  1.767            1.675
                  1.766            1.736
Average:          1.754            1.713
Table 10.2. Terminal portfolio value for ten resampling simulations. The EEP and aHEP columns are deterministic and therefore contain a single value, whereas the resampled columns vary across simulations. PR is the acronym for portfolio resampling.
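The terminal values in table 10.2 are simply the compounded yearly returns from table 10.1. As a consistency check, compounding the EEP returns reproduces the EEP entry of table 10.2:

r = [33.4 -3.6 -17.1 -16.8 22.0 3.4 18.9 6.9 20.7] / 100;  % EEP returns from table 10.1
terminalValue = prod(1 + r)                                % approximately 1.716, as in table 10.2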
Portfolio resampling seems to improve performance if the input is very uncertain,
such as the aHEP. Resampling increases the terminal portfolio value by almost
20 percentage points on average for the aHEP, but only by about 4 percentage points
for the EEP. As seen in table 10.2, resampling generated a higher terminal value ten out
of ten times for the aHEP, whilst for the EEP resampling sometimes generated
a lower terminal portfolio value. This suggests that resampling is indeed
useful when the input parameters are uncertain, since the portfolio weights are
smoothed, more assets enter the solution, and the resulting portfolio is more
diversified. According to Michaud [56], well diversified portfolios, e.g. those obtained by
resampling, should outperform Markowitz portfolios out-of-sample, just as found
here. The pure EEP and aHEP portfolios are both outperformed by their resam-
pled counterparts. The rather small increase in portfolio value when resampling
with the EEP as input, compared to using the aHEP, points to the EEP containing
smaller estimation errors than the aHEP. This is also supported by the positive
$R^2_{os,mv}$ found in section 7.2.
In this backtest we find evidence that our multivariate forecast performs better
than the arithmetic average when used as input in a mean-variance asset allocation
problem. Portfolio resampling is also found to provide a good way of arriving at
meaningful asset allocations when the input parameters are very noisy.
Chapter 11
Conclusions
In this thesis we incorporate model uncertainty in the forecasting of the expected
equity premium by creating a large number of linear prediction models on which
we apply Bayesian model averaging. We also investigate the general impact of in-
put estimation errors in mean-variance optimization and evaluate the performance
of a Monte Carlo based heuristic called portfolio resampling.
It is found that the forecasting ability of multi-factor models is not substantially
improved by our approach. Our interpretation is that the largest problem
with multi-factor models is not model uncertainty, but rather too low predictive
ability.
Further, our investigation brings evidence that the GDP, the short term spread
and the volatility are useful in forecasting the expected equity premium for the five
years to come. Our investigations also show that multivariate models are to some
extent better than univariate models, but it cannot be said that any of them is
accurate in predicting the expected equity premium. Nevertheless, it is likely that
both provide better forecasts than using the arithmetic average of the historical
equity premium.
We have also found that portfolio resampling provides a good way to arrive at
meaningful allocation decisions when the optimization inputs are very noisy.
Our proposal for further work is to investigate whether a Bayesian analysis, not in-
volving linear regression, with carefully selected priors calibrated to reflect mean-
ingful economic information, provides better predictions for the expected equity
premium than the approach used in this thesis.
Bibliography
[1] Ang A. & Bekaert G., (2003), Stock return predictability: is it there?, Work-
ing Paper, University of Columbia.
[2] Avramov D., (2002), Stock return predictability and model uncertainty, Jour-
nal of Financial Economics, vol. 64, pp. 423-458.
[3] Baker M. & Wurgler J., (2000), The Equity Share in New Issues and Aggregate
Stock Returns, Journal of Finance, American Finance Association, vol. 55(5),
pp. 2219-2257.
[4] Benning J. F., (2007), Trading Strategies for Capital Markets, McGraw-Hill,
New York.
[5] Bernardo J. M. & Smith A., (1994), Bayesian Theory, John Wiley & Sons
Ltd.
[6] Bostock P., (2004), The Equity Premium, Journal of Portfolio Management
vol. 30(2), pp. 104-111.
[7] Brealey R. A., Myers S. C. & Allen F., (2006),Corporate Finance, McGraw-
Hill, New York.
[8] Brealey R. A., Myers S. C. & Allen F., (2000),Corporate Finance, McGraw-
Hill, New York.
[9] Brealey R. A., Myers S. C. & Allen F., (1996),Corporate Finance, McGraw-
Hill, New York.
[10] Burda M. & Wyplosz C., (1997), Macroeconomics: A European text, Oxford
University Press, New York.
[11] Campbell J. Y., Lo A. & MacKinlay A., (1997), The Econometrics of Financial
Markets, Princeton University Press.
[12] Campbell J. Y. & Shiller R. J., (1988) The dividend-price ratio and expecta-
tions of future dividends and discount factors, Review of Financial Studies,
vol. 1, pp. 195-228.
[13] Campbell J. Y. & Shiller R. J., (1988) Stock prices, earnings, and expected
dividends, Journal of Finance, vol. 43, pp. 661-676.
[14] Campbell J. Y. & Shiller R. J., (1998) Valuation ratios and the long-run stock
market outlook, Journal of Portfolio Management, vol. 24, pp. 11-26.
[15] Campbell, J. Y., (1987), Stock returns and the term structure, Journal of
Financial Economics, vol. 18, pp. 373-399.
[16] Campbell J. & Thompson S., (2005), Predicting the Equity Premium Out of
Sample: Can Anything Beat the Historical Average?, NBER Working Papers
11468, National Bureau of Economic Research.
[17] Casella G. & Berger R. L., (2002), Statistical Inference, 2nd ed. Duxbury
Press.
[18] Choudhry M., (2006), Bonds - A concise guide for investors, Palgrave Macmil-
lan, New York.
[19] Cohen R.B., Polk C. & Vuolteenaho T., (2005), Inflation Illusion in the Stock
Market: The Modigliani-Cohn Hypothesis, Quarterly Journal of Economics,
vol. 120, pp. 639-668.
[20] Dalén J., (2001), The Swedish Consumer Price Index - A Handbook of Meth-
ods, Statistiska Centralbyrån, SCB-Tryck, Örebro.
[21] Damodaran A., (2006), Damodaran on Valuation, John Wiley & Sons, New
York.
[22] Dimson E., Marsh P. & Staunton M., (2006), The Worldwide Equity Pre-
mium: A Smaller Puzzle, SSRN Working Paper No. 891620.
[23] Durbin J. & Watson G.S., (1950), Testing for Serial Correlation in Least
Squares Regression I, Biometrika vol. 37, pp. 409-428.
[24] Escobar L. A. & Meeker W. Q., (2000), The Asymptotic Equivalence of the
Fisher Information Matrices for Type I and Type II Censored Data from
Location-Scale Families., Working Paper.
[25] Estrella A. & Trubin M. R., (2006), The Yield Curve as a Leading Indicator:
Some Practical Issues, Current Issues in Economics and Finance - Federal
Reserve Bank of New York, vol. 12(5).
[26] Fabozzi F. J., Focardi S. M. & Kolm P. N., (2006), Financial Modeling of the
Equity Market, John Wiley & Sons, New Jersey.
[27] Fama E.F., (1981), Stock returns, real activity, inflation and money, American
Economic Review, pp. 545-565.
[28] Fama E. F. & French K. R., (1988), Dividend yields and expected stock
returns, Journal of Financial Economics, vol. 22, pp. 3-25.
[29] Fama E. F. & French K. R., (1989), Business conditions and expected returns
on stocks and bonds, Journal of Financial Economics, vol. 25, pp. 23-49.
[30] Fama E.F. & Schwert G.W., (1977), Asset Returns and Inflation, Journal of
Financial Economics, vol. 5(2), pp. 115-46.
[31] The Federal Reserve, Industrial production and capacity utilization, (2007),
Retrieved February 12, 2008 from
http://www.federalreserve.gov/releases/g17/20071214/
[32] Fernández P., (2006), Equity Premium: Historical, Expected, Required and
Implied, IESE Business School, Madrid.
[33] Fernández C., Ley E. & Steel M., (1998), Benchmark priors for Bayesian
Model Averaging, Working Paper.
[34] Franke J., Härdle W.K. & Hafner C.M., (2008), Statistics of Financial Markets:
An Introduction, Springer-Verlag, Berlin Heidelberg.
[35] Gill P. E. & Murray W., (1981), Practical Optimization, Academic Press,
London.
[36] Golub G. & Van Loan C., (1996), Matrix Computations, The Johns Hopkins
University Press, Baltimore.
[37] Goyal A. & Welch I., (2006), A Comprehensive Look at the Empirical Per-
formance of Equity Premium Prediction, Review of Financial Studies, forth-
coming.
[38] Hamilton J. D., (1994), Time Series Analysis, Princeton University Press.
[39] Harrell F. E., (2001), Regression Modeling Strategies, Springer-Verlag, New
York.
[40] Hodrick R. J., (1992), Dividend yields and expected stock returns: alternative
procedures for inference and measurement, Review of Financial Studies, vol.
5(3), pp. 257-286.
[41] Hoeting J. A., Madigan D. & Raftery A. E. & Volinsky C. T., (1999), Bayesian
Model Averaging: A Tutorial, Statistical Science 1999, vol. 14(4), pp. 382-417.
[42] Ibbotson Associates, (2006), Stocks, Bonds, Bills and Inflation, Valuation
Edition, 2006 Yearbook.
[43] Keim D. B. & Stambaugh R. F., (1986), Predicting returns in the stock and
bond markets, Journal of Financial Economics, vol. 17(2), pp. 357-390.
[44] Kennedy P. E., (2000), Macroeconomic Essentials - Understanding Economics
in the News, The MIT Press, Cambridge.
[45] Koller T. & Goedhart M. & Wessels D., (2005), Valuation: Measuring and
Managing the Value of Companies, McKinsey & Company, Inc. Wiley.
[46] Kothari S. P. & Shanken J., (1997), Book-to-market, dividend yield, and ex-
pected market returns: a time series analysis, Journal of Financial Economics,
vol. 44, pp. 169-203.
[47] Krainer J., What Determines the Credit Spread?, (2004), FRBSF Economic
Letter, Nr 2004-36.
[48] Lamont O., (1998), Earnings and expected returns, Journal of Finance, vol.
53, pp.1563-1587.
[49] Lee P. M., (2004), Bayesian Statistics an introduction, Oxford University
Press.
[50] Lettau M. & Ludvigson S., (2001), Consumption, aggregate wealth and expected
stock returns, Journal of Finance, vol. 56(3), pp. 815-849.
[51] Lewellen J., (2004), Predicting returns with financial ratios, Working Paper.
[52] Luenberger D. G., (1998), Investment Science, Oxford University Press, New
York.
[53] Mankiw G. N., (2002), Macroeconomics, Worth Publishers, New York.
[54] Mayer B., (2007), Credit as an Asset Class, Masters Thesis, TU Munich.
[55] Merton R. C., (1980), On Estimating the Expected Return on the Market:
An Exploratory Investigation, Journal of Financial Economics, vol. 8, pp.
323-361.
[56] Michaud R., (1998), Efficient Asset Management: A Practical Guide to Stock
Portfolio Optimization and Asset Allocation, Oxford University Press, New
York.
[57] Polk C., Thompson S, & Vuolteenaho T, (2005), Cross-sectional forecasts of
the equity premium, Journal of Financial Economics, vol. 81(1), pp. 101-141.
[58] Pontiff J. & Schall L. D., (1998), Book-to-market ratios as predictors of market
returns, Journal of Financial Economics, vol. 49, pp. 141-160.
[59] Press J. S., (1972), Applied Multivariate Analysis, Holt, Rinehart & Winston
Inc, University of Chicago.
[60] Rozeff M., (1984), Dividend yields are equity risk premiums, Journal of Port-
folio Management, vol. 11, pp. 68-75.
[61] Scherer B., (2004), Portfolio Construction and Risk Budgeting, Risk Books,
Incisive Financial Publishing Ltd.
[62] University of Michigan, Surveys of consumers, Retrieved February 9, 2008
from http://www.sca.isr.umich.edu/
[63] U.S. Department of Labor, Glossary, Retrieved February 5, 2008 from
http://www.bls.gov/bls/glossary.htm#P
[64] Vaihekoski M., (2005), Estimating Equity Risk Premium: Case Finland,
Lappeenranta University of Technology, Working paper.
[65] Welch, I., (2000),Views of Financial Economists on the Equity Premium and
on Professional Controversies, Journal of Business, vol. 73(4), pp. 501-537
[66] Zagst R., (2004), Lecture Notes - Asset Pricing, TU Munich.
[67] Zagst R. & Pöschik M., (2007), Inverse Portfolio Optimization under Con-
straints, Working Paper.
[68] Zellner A., (1986), On assessing prior distributions and bayesian regression
analysis with g-prior distributions, in Essays in Honor of Bruno de Finetti,
eds P.K. Goel and A. Zellner, Amsterdam: North-Holland, pp. 233-243.
Appendix A
Mathematical Preliminaries
A.1 Statistical definitions
Definition A.1 (Bias) Let $\hat{\theta}$ be a sample estimate of a vector of parameters $\theta$.
For example, $\hat{\theta}$ could be the sample mean $\bar{x}$. The estimate is then said to be
unbiased if $E[\hat{\theta}] = \theta$ (see [38]).
Definition A.2 (Stochastic process) A stochastic process $X_t$, $t \in \mathbb{Z}$, is a fam-
ily of random variables, defined on a probability space $(\Omega, \mathcal{F}, P)$.
At a specific time point t, $X_t$ is a random variable with a specific density function.
Given a specific $\omega \in \Omega$, $X(\omega) = (X_t(\omega),\, t \in \mathbb{Z})$ is a realization or a path of the
process (see [34]).
Definition A.3 (Autocovariance function) The autocovariance function of a
stochastic process $X_t$ is defined as
\[ \gamma(t, \tau) = E[(X_{t+\tau} - \mu_{t+\tau})(X_t - \mu_t)], \quad \tau \in \mathbb{Z}. \]
The autocovariance function is symmetric, that is, $\gamma(t, -\tau) = \gamma(t - \tau, \tau)$. In general
$\gamma(t, \tau)$ is dependent on t as well as on $\tau$. Below we define the important concept of
stationarity, which many times will simplify autocovariance functions (see [34]).
Definition A.4 (Stationarity) A stochastic process $X_t$ is covariance stationary
if
\[ E[X_t] = \mu \quad \text{and} \quad \gamma(t, \tau) = \gamma(\tau), \quad \forall t. \]
A stochastic process $X_t$ is strictly stationary if for any $t_1, \ldots, t_n$ and for all $n, s \in \mathbb{Z}$
it holds that the joint distribution
\[ F_{t_1, \ldots, t_n}(x_1, \ldots, x_n) = F_{t_1+s, \ldots, t_n+s}(x_1, \ldots, x_n). \]
For covariance stationary processes, the term weakly stationary is often used (see
[34]).
Definition A.5 (Trace of a matrix) The trace of a matrix $A \in \mathbb{R}^{n \times n}$ is de-
fined as the sum of the elements along the diagonal,
\[ \mathrm{tr}(A) = a_{11} + a_{22} + \cdots + a_{nn}, \]
(see [59]).
Definition A.6 (The gamma function) The gamma function can be defined
as the definite integral
\[ \Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\, dt \]
where $x \in \mathbb{R}$ and $x > 0$ (see [59]).
Definition A.7 (Positive definite matrix) A symmetric matrix $A \in \mathbb{R}^{n \times n}$ is
called positive definite if
\[ x^{\top} A x > 0, \quad \forall x \neq 0,\; x \in \mathbb{R}^n, \]
(see [34]).
Theorem A.1 (Properties of positive definite matrices) If A is positive def-
inite it defines an inner product on $\mathbb{R}^n$ as
\[ \langle x, y \rangle = x^{\top} A y. \]
In particular, the standard inner product for $\mathbb{R}^n$ is obtained when setting $A = I$.
Furthermore, A has only positive eigenvalues $\lambda_i$, is invertible, and its inverse
is also positive definite.
Proof: see [36], [59].
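As a small numerical illustration, positive definiteness of an estimated covariance matrix can be checked via its eigenvalues or a Cholesky factorization; the matrix below is an arbitrary example.

A = cov(randn(100, 4));       % a sample covariance matrix
isPosDef = all(eig(A) > 0)    % true if all eigenvalues are positive
[~, p] = chol(A);             % p == 0 also indicates positive definiteness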
A.2 Statistical distributions
Definition A.8 (The normal distribution) The variable Y has a Gaussian,
or normal, distribution with mean $\mu$ and variance $\sigma^2$ if
\[ f_Y(y) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left[-\frac{(y - \mu)^2}{2\sigma^2}\right]. \]
Definition A.9 (The Chi-Squared distribution) The probability density for
the $\chi^2$-distribution with v degrees of freedom is given by
\[ p_v(x) = \frac{x^{v/2 - 1} \exp[-x/2]}{\Gamma(v/2)\, 2^{v/2}}. \]
Definition A.10 (The multivariate normal distribution) Let $x \in \mathbb{R}^{p \times 1}$ be
a random vector with density function f(x). x is said to follow a multivariate
normal distribution with mean vector $\mu \in \mathbb{R}^{p \times 1}$ and covariance matrix $\Sigma \in \mathbb{R}^{p \times p}$
if
\[ f(x) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\!\left[-\frac{1}{2}(x - \mu)^{\top} \Sigma^{-1} (x - \mu)\right]. \]
If $|\Sigma| = 0$ the distribution of x is called degenerate and does not exist.
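As a small illustration, the density can be evaluated numerically with the Statistics Toolbox function mvnpdf and compared with a direct implementation of the formula above; the numbers below are arbitrary examples and the two values agree.

mu    = [0 0];
Sigma = [1 0.5; 0.5 1];
x     = [0.2 -0.1];
f1 = mvnpdf(x, mu, Sigma);                                         % toolbox evaluation
f2 = exp(-0.5*(x-mu)*(Sigma\(x-mu)'))/sqrt((2*pi)^2*det(Sigma));   % direct use of the formula above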
The inverted Wishart distribution is the multivariate generalization of the uni-
variate inverted gamma distribution. It is the distribution of the inverse of a
random matrix following the Wishart distribution, and it is the natural conjugate
prior for the covariance matrix of a normal distribution.
Definition A.11 (The inverted Wishart distribution) Let $U \in \mathbb{R}^{p \times p}$ be a
random matrix following the inverted Wishart distribution with positive definite
matrix G and n degrees of freedom. Then for $n > 2p$, the density of U is given by
\[ p(U) = c_0\, |G|^{(n-p-1)/2}\, |U|^{-n/2} \exp\!\left[-\frac{1}{2}\mathrm{tr}(U^{-1} G)\right] \]
and $p(U) = 0$ otherwise. The constant $c_0$ is given by
\[ c_0^{-1} = 2^{(n-p-1)p/2}\, \pi^{p(p-1)/4} \prod_{j=1}^{p} \Gamma\!\left(\frac{n-p-j}{2}\right). \]
Appendix B
Code
B.1 Univariate predictions
%input
[dates,values]=loadThesisData_LongDataSet(false);
[dates, returns, differ] = calcFactors_LongDataSet(dates, values);
eqp=returns(1:end,1); %this is the equity premium
returns=returns(1:end,2:end);
muci=[]; predRng=[]; allEst=[]; prob_model=[]; outliersStep=[];
%prediction horizon
horizon=5;
for k=1:horizon
y_bma=[];
x_bma=[];
res=[];
est=[];
removedModels=[];
usedModels=[];
outliers=0;
for j=1:length(returns(1,:))
[x, y, est_tmp, beta, resVec, outliersTmp]=predictClean(eqp(k+1:end)...
,returns(1:end-k,j),returns(end,j));
res = [res resVec];
est = [est est_tmp];
y_bma=[y_bma y];
x_bma=[x_bma x];
n=length(x(:,1));
p=length(x(1,:));
g=1/n;
if (est(j) > 0.0) && est(j)<mean(eqp(k+1:end))+1.28*rlstd(eqp(k+1:end))
P=x*inv(x'*x)*x';
likelihood(j)=(gamma(n/2)/((2*pi)^(n/2))/((1+g)^(p/2)))...
*(y'*y-(g/(1+g))*y'*P*y)^(-n/2);
usedModels = [usedModels j];
else
likelihood(j)=0;
removedModels = [removedModels j];
est(j)=0;
usedModels = [usedModels j];
end
outliers = outliers + outliersTmp;
end
outliersStep=[outliersStep outliers];
usedModelsBMA = usedModels*2-1;
p_model=likelihood./sum(likelihood);
weightedAvg =p_model*est;
prob_model=[prob_model p_model];
predRng = [predRng; 100*min(est) 100*max(est) 100*mean(est)];
allEst = [allEst est];
VARyhat_data=zeros(length(res(:,1)),length(res(:,1)));
for i = 1:length(returns(1,:))
VARyhat_data = VARyhat_data +(diag(res(:,i))*x_bma(:,i*2-1:i*2)...
*inv(x_bma(:,i*2-1:i*2)'*x_bma(:,i*2-1:i*2))*x_bma(:,i*2-1:i*2)'...
+y_bma(:,i)*y_bma(:,i)')*prob_model(i)-(y_bma(:,i)*prob_model(i))...
*(y_bma(:,i)*prob_model(i))';
end
STD_step(k) = sqrt(sum(diag(VARyhat_data))/length(diag(VARyhat_data)));
z=norminv([0.05 0.95],0,1);
muci=[muci; weightedAvg+z(1)*STD_step(k)/sqrt(length(res(:,1)))...
weightedAvg weightedAvg+z(2)*STD_step(k)/sqrt(length(res(:,1)))];
end
B.2 Multivariate predictions
[dates,values]=loadThesisData_LongDataSet(false);
%input
[dates, returns, differ] = calcFactors_LongDataSet(dates, values);
eqp=returns(:,1); regressor=returns(:,2:end);
numFactor=length(regressor(1,:)); numOfModel=2^numFactor;
horizon=5; %prediction horizon
comb=combinations(numFactor);
prob_model=zeros(numOfModel-1,horizon);
likelihood=zeros(numOfModel-1,1); tmp=zeros(numOfModel-1,1);
usedModels=zeros(1,horizon); predRng=zeros(3,horizon);
y_bma=zeros(length(returns),horizon);
res=zeros(length(eqp)-1,numOfModel-1); toto = ones(length(eqp),1);
r=zeros(1,horizon); allMag=[]; muci=[]; VARyhat_data=[];
for k=1:horizon
for i=1:numOfModel-1
%pick a model
L=length(regressor(:,1));
out=[comb(i,1)*ones(L,1) comb(i,2)*ones(L,1) comb(i,3)*ones(L,1)...
comb(i,4)*ones(L,1) comb(i,5)*ones(L,1) comb(i,6)*ones(L,1)...
comb(i,7)*ones(L,1) comb(i,8)*ones(L,1) comb(i,9)*ones(L,1)...
comb(i,10)*ones(L,1) comb(i,11)*ones(L,1) comb(i,12)*ones(L,1)...
comb(i,13)*ones(L,1) comb(i,14)*ones(L,1) comb(i,15)*ones(L,1)...
comb(i,16)*ones(L,1) comb(i,17)*ones(L,1) comb(i,18)*ones(L,1)];
output=out.*regressor;
modRegr = output(:,not(all(output(:,1:size(output,2))== 0)));
%predictions
[x, y, est_tmp, beta, resVec, outliersTmp]=predictClean(eqp(k+1:end)...
,modRegr(1:end-k,:),modRegr(end,:));
if (est_tmp>0)&&(est_tmp<(mean(eqp(k+1:end))+1.28*sqrt(var(eqp(k+1:end)))))
tmp(i)=est_tmp;
%calculate likelihood
n=length(x(:,1));
p=length(x(1,:));
g=p^(1/(1+p))/n;
P=x*inv(x'*x)*x';
likelihood(i)=(gamma(n/2)/((2*pi)^(n/2))/((1+g)^(p/2)))...
*(y'*y-(g/(1+g))*y'*P*y)^(-n/2);
else
likelihood(i)=0;
tmp(i)=0;
r(k)=r(k)+1;
end
setsubColumn(k+1,size(res,1),i,resVec,res);
end
%bma
p_model=likelihood./sum(likelihood);
magnitude=p_model*tmp;
prob_model(:,k)=p_model;
predRng(:,k)=[min(tmp); max(tmp); mean(tmp)];
allMag=[allMag magnitude];
y_bma(k+1:end,k)=y;
%Compute variance and confidence interval
%Instead of storing all models, create them again
VARyhat_data=zeros(length(y_bma(k+1:end,k)));
for i=1:numOfModel-1
%pick a model
L=length(regressor(:,1));
out=[comb(i,1)*ones(L,1) comb(i,2)*ones(L,1) comb(i,3)*ones(L,1)...
comb(i,4)*ones(L,1) comb(i,5)*ones(L,1) comb(i,6)*ones(L,1)...
comb(i,7)*ones(L,1) comb(i,8)*ones(L,1) comb(i,9)*ones(L,1)...
comb(i,10)*ones(L,1) comb(i,11)*ones(L,1) comb(i,12)*ones(L,1)...
comb(i,13)*ones(L,1) comb(i,14)*ones(L,1) comb(i,15)*ones(L,1)...
comb(i,16)*ones(L,1) comb(i,17)*ones(L,1) comb(i,18)*ones(L,1)];
output=out.*regressor;
modRegr = output(:,not(all(output(:,1:size(output,2))== 0)));
modRegr = [modRegr(1:end-k,:) ones(length(modRegr(1:end-k,:)),1)];
%intercept added
VARyhat_data = VARyhat_data + (diag(res(k:end,i))*modRegr*inv(modRegr'...
*modRegr)*modRegr'+y_bma(k+1:end,k)*y_bma(k+1:end,k)')...
*prob_model(i)-(y_bma(k+1:end,k)*prob_model(i))...
*(y_bma(k+1:end,k)*prob_model(i))';
end
STD_step(k) = sqrt(sum(diag(VARyhat_data))/(length(diag(VARyhat_data))));
z=norminv([0.05 0.95],0,1);
muci=[muci; allMag(k)+z(1)*STD_step(k)/sqrt(length(res(:,1)))...
allMag(k) allMag(k)+z(2)*STD_step(k)/sqrt(length(res(:,1)))];
end
B.3 Merge time series
Developed by Jörgen Blomvall, Linköping Institute of Technology
function [mergedDates, values] = mergeExcelData(sheetNames, data)
mergedDates = datenum('30-Dec-1899') + data{1}(:,1);
mergedDates(find(isnan(data{1}(:,2)))) = [];
for i = 2:length(sheetNames)
nMerged = length(mergedDates);
dates = datenum('30-Dec-1899') + data{i}(:,1);
newDates = zeros(size(mergedDates));
k = 1; n = 0; % counter initializations (assumed; required by the loops below)
for j = 1:nMerged
while (dates(k) < mergedDates(j) && k < length(dates))
k = k+1;
end
if (dates(k) == mergedDates(j) && ~isnan(data{i}(k,2)))
n = n+1;
newDates(n) = mergedDates(j);
end
end
mergedDates = newDates(1:n);
end
values = zeros(n, length(sheetNames));
for i = 1:length(sheetNames)
dates = datenum('30-Dec-1899') + data{i}(:,1);
k = 1;
for j = 1:n
while (dates(k) < mergedDates(j) && k < length(dates))
k = k+1;
end
if (dates(k) == mergedDates(j))
values(j,i) = data{i}(k,2);
else
error = 1
end
end
end
B.4 Load data into Matlab from Excel
Developed by Jörgen Blomvall, Linköping Institute of Technology
function [dates, values] = loadThesisData(interpolate)
%[status, sheetNames] = xlsfinfo('test_merge.xls'); % Do not work for all
% Matlab versions
sheetNames = {'DJtech' 'WoMat' 'ConsDisc' 'EnergySec' 'ConStap' 'Health'...
'Util' 'sp1500' 'sp500' 'spEarnYld' 'spMktCap' 'spPERat' 'spDaiNetDiv'...
'spIndxPxBook' 'spIndxAdjPe' 'spEqDvdYi12m' 'spGenPERat' 'spPrice'...
'spMovAvg200' 'spVol90d' 'MoodCAA' 'MoodBAA' 'tresBill3m' 'USgenTBill1M'...
'GovtYield10Y' 'CPI' 'PCECYOY'};
for i = 1:length(sheetNames)
data{i} = xlsread('runEqPred.xls', char(sheetNames(i)));
end
if interpolate
[dates, values] = mergeInterpolExcelData(sheetNames, data);
else
[dates, values] = mergeExcelData(sheetNames, data);
end
B.5 Permutations
function out = combinations(k);
total_num = 2^k; indicator = zeros(total_num,k); for i = 1:k;
temp_ones = ones( total_num/( 2^i),2^(i-1) );
temp_zeros = zeros( total_num/(2^i),2^(i-1) );
x_temp = [temp_ones; temp_zeros];
indicator(:,i) = reshape(x_temp,total_num,1);
end;
out = indicator;
B.6 Removal of outliers and linear prediction
function [x, y, est, beta, resVec, outliers]=predictClean(y, x, lastVal)
%remove outliers
xTmp=[]; outliers=0;
for i=1:length(x(1,:))
xVec=x(:,i);
for k=1:3 %nr of iterations for finding outliers
H_hat=xVec*inv(xVec'*xVec)*xVec';
Y=H_hat*y;
index=find(abs(Y-mean(Y))>3*rlstd(Y));
outliers=outliers+length(index);
for j=1:length(index)
if index(j)~= length(y)
xVec(index(j))= 0.5*xVec(index(j)+1)+0.5*xVec(index(j)-1);
else
xVec(index(j))=0.5*xVec(index(j)-1)+0.5*xVec(index(j));
end
end
end
xTmp = [xTmp xVec];
end
x=xTmp;
%OLS
x=[ones(length(x),1) x]; %adding intercept
beta=x\y; % OLS
est=[1 lastVal]*beta; %predicted value
resVec=(y-x*beta).^2; %residual vector
B.7 setSubColumn
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
int j; double *output;
double *src; double *dest; double *iStart, *iEnd, *col;
iStart = mxGetPr(prhs[0]); iEnd = mxGetPr(prhs[1]); col = mxGetPr(prhs[2]);
src = mxGetPr(prhs[3]); dest = mxGetPr(prhs[4]);
//mexPrintf("%d\n", (int)col[0]*mxGetM(prhs[4])+(int)iStart[0]-1);
/* Populate the output */
memcpy(&(dest[((int)col[0]-1)*mxGetM(prhs[4])+(int)iStart[0]-1]),
src, (int)(iEnd[0]-iStart[0]+1)*sizeof(double));
}
B.8 Portfolio resampling
% Load Data & Set Parameters
[dates,values]=loadThesisData_Resampling4(false);
volDesired = 0.02; nrAssets=6; T=17; I=200; nrPortfolios=30;
errMean=true; errCov=true;
normPort=false; resampPort=true; stocksNr = [1 2 3 4 5 6];
EQP=[0.1417 0.1148 0.1062 0.4478 0.1024 0.1372 0.0979 0.0635 0.0897 0.1084];
HEP=[0.0616 0.0760 0.0708 0.0326 0.0231 0.0253 0.0398 0.0578 0.0674 0.0450];
for l = 1:10
%1. Estimate Historical Mean & Cov
if normPort
histMean=mean(returns(1:end-(10-l),stocksNr));
histMean(2)=EQP(l);
%histMean(2)=HEP(l);
histCov=cov(returns(1:end-(10-l),stocksNr));
elseif resampPort
histMean=mean(returns(1:end-(10-l),stocksNr));
histMean(2)=EQP(l);
%histMean(2)=HEP(l);
histCov=cov(returns(1:end-(10-l),stocksNr));
end
%2. Sample the Distribution
if resampPort
wStarAll=zeros(nrAssets, nrPortfolios);
for j=1:I
r = mvnrnd(histMean,histCov,T);
sampMean = mean(r);
sampCov = cov(r);
%3. Calculate efficient sampled Frontier
if (errMean) && ~(errCov)
sampMean=sampMean;
sampCov=histCov;
elseif errCov && ~(errMean)
sampMean=histMean;
sampCov=sampCov;
elseif errCov && errMean
sampMean=sampMean;
sampCov=sampCov;
else
sampMean=histMean;
sampCov=histCov;
end
minMean = abs(min(sampMean));
maxMean = max(sampMean);
z=1;
for k=[minMean:(maxMean-minMean)/(nrPortfolios-1):maxMean]
[wStar(:,z), tmp] = solveQuad(sampMean, sampCov, nrAssets, k);
z=z+1;
end
%4. Repeat step 2-3
allReturn(:,j)=wStar'*histMean';
for q=1:nrPortfolios
allVol(q,j)=wStar(:,q)'*histCov*wStar(:,q);
end
wStarAll=wStarAll + wStar;
end
%5. Calculate Average Weights
wStarAll=wStarAll./I;
returnResamp=wStarAll'*histMean';
for i=1:nrPortfolios
volResamp(i)=wStarAll(:,i)'*histCov*wStarAll(:,i);
end
end
%6. Original Frontier
minMean = abs(min(histMean));
maxMean = max(histMean);
z=1;
for k=[minMean:(maxMean-minMean)/(nrPortfolios-1):maxMean]
[wStarHist(:,z), tmp] = solveQuad(histMean, histCov, nrAssets, k);
z=z+1;
end
returnHist=wStarHist'*histMean';
for i=1:nrPortfolios
volHist(i)=wStarHist(:,i)'*histCov*wStarHist(:,i);
end
prices((11-l),:)=values(end-(l-1),stocksNr);
if resampPort
[mvp_val mvp_nr] = min(volHist);
[tmpMin, portNr] = min(abs(volResamp(mvp_nr:end)-volDesired));
weights(l,:)=wStarAll(:, portNr+mvp_nr-1);
else
[mvp_val mvp_nr] = min(volHist);
[tmpMin, portNr] = min(abs(volHist(mvp_nr:end)-volDesired));
weights(l,:)=wStarHist(:, portNr+mvp_nr-1);
end
end
[V, wealth]=buySell2(weights,prices)
B.9 Quadratic optimization
function [w, fval] = solveQuad(histMean, histCov, nrAssets, muBar)
clc;
H=histCov*2; f=zeros(nrAssets,1); A=[]; b=[]; Aeq=[histMean; ones(1, nrAssets)]; beq=[muBar; 1]; lb=zeros(nrAssets,1);
ub=ones(nrAssets,1);
options = optimset('LargeScale','off');
[w, fval] = quadprog(H, f, A, b, Aeq, beq, lb, ub, [], options);
Copyright
The publishers will keep this document online on the Internet - or its possible re-
placement - for a period of 25 years from the date of publication barring exceptional
circumstances. The online availability of the document implies a permanent per-
mission for anyone to read, to download, to print out single copies for your own use
and to use it unchanged for any non-commercial research and educational purpose.
Subsequent transfers of copyright cannot revoke this permission. All other uses of
the document are conditional on the consent of the copyright owner. The publisher
has taken technical and administrative measures to assure authenticity, security
and accessibility. According to intellectual property law the author has the right to
be mentioned when his/her work is accessed as described above and to be protected
against infringement. For additional information about the Linköping University
Electronic Press and its procedures for publication and for assurance of document
integrity, please refer to its WWW home page: http://www.ep.liu.se/
© May 12, 2008. Johan Bjurgert & Marcus Edstrand