
The European Journal of Finance

ISSN: 1351-847X (Print) 1466-4364 (Online) Journal homepage: https://www.tandfonline.com/loi/rejf20

Can Warren Buffett forecast equity market corrections?

S. Lleo & W. T. Ziemba

To cite this article: S. Lleo & W. T. Ziemba (2019) Can Warren Buffett forecast
equity market corrections?, The European Journal of Finance, 25:4, 369-393, DOI:
10.1080/1351847X.2018.1521859

To link to this article: https://doi.org/10.1080/1351847X.2018.1521859

Published online: 24 Sep 2018.

THE EUROPEAN JOURNAL OF FINANCE
2019, VOL. 25, NO. 4, 369–393
https://doi.org/10.1080/1351847X.2018.1521859

Can Warren Buffett forecast equity market corrections?


S. Lleo^a and W. T. Ziemba^{b,c}

^a Finance Department, NEOMA Business School, Reims, France; ^b Alumni Professor of Financial Modeling and Stochastic Optimization (Emeritus), University of British Columbia, Vancouver, Canada; ^c Distinguished Visiting Associate, Systemic Risk Centre, London School of Economics, London, England

ABSTRACT
Warren Buffett suggested that the ratio of the market value of all publicly traded stocks to the Gross National Product could identify potential overvaluations and undervaluations in the US equity market when this ratio deviates above 120% or below 80%. We investigate whether this ratio is a statistically significant predictor of equity market corrections and rallies. We find that Buffett's decision rule does not deliver satisfactory forecasts. However, when we adopt a time-varying decision rule, the ratio becomes a statistically significant predictor of equity market corrections. The two time-varying decision rules are: (i) predict an equity market correction when the ratio exceeds a 95% one-tail confidence interval based on a normal distribution, and (ii) predict an equity market correction when the ratio exceeds a threshold computed using Cantelli's inequality. These new decision rules are robust to changes in the two key parameters: the confidence level and the forecasting horizon. This paper also shows that the MV/GNP ratio performs relatively well against the four most popular equity market correction models, but the ratio is not a particularly useful predictor of equity market rallies.

ARTICLE HISTORY
Received 31 December 2017; Accepted 30 August 2018

KEYWORDS
Stock market crashes; market value-to-GNP ratio; Warren Buffett; likelihood ratio test; Monte Carlo simulation; robustness

JEL CLASSIFICATIONS
G14; G15; G12; G10

Pointing to the spectacular rise in the S&P500 index, which has more than tripled since its March 2009 trough,
investors worry that the US equity market might be dangerously overvalued. At the time of writing, the S&P 500
index was near its record high at 2,400. Legendary investor George Soros entered a $2.2 billion put position on
the U.S. stock market in August 2014 (Kelly 2016). He subsequently closed this position in early 2016, following
a 12% drop in the S&P500 at the end of 2015, before entering a new put position shortly before the 2016 US
elections. Soros was still holding a similar position at the time of writing (Cox 2017). Robert Shiller echoes
George Soros’ bearish views. In a recent interview with Bloomberg (Clenfield and Haigh 2017), Shiller expressed
his concern that ‘the market is way overpriced’.
In an attempt to determine whether the US equity market is overvalued, investors and market commentators
are directing their attention to a variety of measures, ranging from Nobel laureate Robert Shiller’s Cyclically
Adjusted Price-to-Earnings (CAPE) (Campbell and Shiller 1988, 1998; Shiller 2006, 2015), to the CBOE’s Skew
index and to Warren Buffett’s ratio of the market value (MV) of all publicly traded stocks to the current level of
the GNP (MV/GNP) (Buffett and Loomis 2001).
This study focuses on the MV/GNP ratio. The MV/GNP ratio is particularly popular among investment
professionals and is frequently reported in the financial press, thanks in part to the reputation of Warren Buffett
as a legendary investor. However, so far, claims about the forecasting ability of the MV/GNP have not been
examined systematically. For that purpose, we construct the ratio using quarterly, seasonally-adjusted, final GNP
data from the first quarter of 1971 to the third quarter of 2016, for a total of 183 quarters. We use the Wilshire

CONTACT S. Lleo sebastien.lleo@neoma-bs.fr

© 2018 Informa UK Limited, trading as Taylor & Francis Group


5000 Full Cap Price Index as a proxy for the market value of all publicly traded securities. Both datasets come
from the Federal Reserve Economic Data (FRED) repository at the Federal Reserve Bank of St. Louis.
We study the ratio in three steps. First, we test statistically the MV/GNP ratio’s ability to forecast equity
market corrections, defined as a drop of more than 10% in the value of the S&P 500 within a year. Then, we
compare the forecasting accuracy of the MV/GNP ratio with 32 models based on the Bond Stock Earnings Yield
Differential (BSEYD) and the price-to-earnings (P/E) ratio. Finally, we investigate the converse claim that the
ratio can predict equity market rallies and bull markets, defined respectively as a rise of more than 10% and 20%
in the value of the S&P 500 within a year.
Our analysis shows that the MV/GNP ratio is not a satisfactory predictor of equity market corrections when
the decision rule uses a fixed threshold. However, we find that this ratio and its logarithm are statistically significant predictors of equity market corrections when the decision rule uses a time-varying, confidence-based
threshold based on either a normal distribution or on Cantelli’s inequality. What’s more, these models are robust
with respect to changes in their key parameters: confidence level and forecasting horizon.
We compare the six implementations of the MV/GNP ratio developed in this paper with 16 BSEYD-based
models and 16 P/E-based models (including the CAPE), on the same dataset, and with the same quarterly
frequency. We find that the MV/GNP ratio does not outperform the BSEYD, P/E and CAPE.
Finally, our analysis suggests that the MV/GNP ratio has a limited use as a predictor of equity market rallies
and bull markets. Although the MV/GNP ratio is reasonably accurate, it only signaled a fraction of all equity
market rallies and bull markets.
This paper contributes to the literature on bubbles and crashes. To our knowledge, there is no significant
research on the properties of the MV/GNP ratio despite Warren Buffett’s success and fame as an investment
manager. We investigate its properties as a predictor of equity market corrections and bear markets, arguably
the most important use of the ratio.
Most of the existing literature focuses on identifying bubble conditions or performing crash forecasts. The
validity of bubble detection and crash prediction models is often assessed heuristically one event at a time:
either a bubble bursts, or it does not; either a crash occurs or it does not. We conduct a rigorous statistical
study of the MV/GNP ratio as an equity correction predictor in this paper, based on the approach proposed
by Lleo and Ziemba (2017). The construction we propose for the predictor is fully out-of-sample to guarantee that there is no look-ahead bias. This construction has some necessary differences with the construction in
Lleo and Ziemba (2017) because we are dealing with quarterly data rather than daily data. As we have fewer,
less frequent observations, our estimate will be less precise, but the risk of significant autocorrelation is also
lower. The non-parametric nature of the likelihood test is crucial: equity market corrections are rare events,
and predicting them places us in the tail of the MV/GNP distribution. Conducting a non-parametric test does
not require assumptions on the distribution or its tail behavior, and this makes its conclusion more robust. We
complete this likelihood test with a Monte Carlo study for small sample bias, and with robustness tests on the
key parameters of the model: the confidence level and the forecasting horizon. Our approach to robustness
differs from the robust likelihood statistic proposed by Lleo and Ziemba (2017). We propose a simpler and more
direct approach in which robustness is related to the optimal choice of parameters that maximizes the predictor’s
accuracy. We also analyze the sensitivity of the model to the definition of equity market corrections, which Lleo
and Ziemba (2017) did not do in their study.

1. Literature review
The academic literature on bubbles and crashes is well established, with significant contributions by Blanchard and Watson (1982), Flood, Hodrick, and Kaplan (1986), Diba and Grossman (1988), Camerer (1989), Ziemba and Schwartz (1991), Allen and Gorton (1993), Abreu and Brunnermeier (2003), Gresnigt, Kole, and Franses (2015), and Lleo and Ziemba (2017).
The Oxford English dictionary defines a bubble as ‘a significant, usually rapid, increase in asset prices that is
soon followed by a collapse in prices and typically arises from speculation or enthusiasm rather than intrinsic
increases in value’. The economic literature adopts a more succinct definition: a bubble is ‘the possibility that
asset prices might deviate from intrinsic values based on market fundamental’ (Camerer 1989). However, this
definition raises the question of the choice of model used to determine the intrinsic value of an asset. For instance,
Flood, Hodrick, and Kaplan (1986) argue that bubbles can be interpreted as evidence of model misspecification.
The Oxford English dictionary defines a crash as ‘a sudden disastrous drop in the value or price of some-
thing’. Several definitions coexist in the finance literature. We discuss three main definitions. Gresnigt, Kole,
and Franses (2015) define a crash as a daily return located in the bottom five percentile of the distribution. This
definition is data-dependent. All samples will necessarily contain crashes, and the exact level of a crash will vary
across samples. Abreu and Brunnermeier (2003) define a crash as an instantaneous drop in the price of an asset
from an inflated level induced by a bubble, down to its fundamental value. This definition has the advantage of
relating bubbles and crashes, but it also implies that the fundamental value is known, which is seldom the case
in practice. In addition, the crash takes the asset price to its fundamental value, which rules out the possibility of
either under or over-reaction. Finally, Ziemba and Schwartz (1991) define a crash as a 10% decline in the level
of a stock market index within a year. This definition does not capture the swiftness of the drop, but it is precise,
introduces fewer assumptions, is independent of the data, and its time frame is consistent with the focus of this
study.
In this paper, we use the Ziemba-Schwartz definition of a 10% decline in the level of a stock market index
within a year, but call it an equity market correction rather than a crash. The term ‘equity market correction’ is
commonly-accepted among equity portfolio managers and widely reported in the financial press.
Blanchard and Watson (1982) assert that rational bubbles can be predicted when both prices and returns
are observable. Over the past thirty years, three broad categories of bubble identification and crash forecasting
models have been developed: fundamental models, stochastic models and sentiment-based models (Ziemba,
Lleo, and Zhitlukhin 2017).
Fundamental models use fundamental variables, such as stock prices, corporate earnings, interest rates, inflation, or GNP, to forecast crashes. In this category, we have the Bond-Stock Earnings Yield Differential (BSEYD) measure (Ziemba and Schwartz 1991), the price-to-earnings (P/E) ratio (Lleo and Ziemba 2017), the Cyclically Adjusted P/E ratio (CAPE), and the ratio of the market value of all publicly traded stocks to the current level of the GNP (MV/GNP). The BSEYD is the oldest and most studied measure. It is the difference between the yield of a long-term government bond and the earnings yield of an equity market. The BSEYD successfully
predicted the burst of the Japanese equity market bubble (Ziemba and Schwartz 1991). Lleo and Ziemba (2012)
conducted an event study and found that the BSEYD would have forecasted equity market crashes that occurred
in Iceland, China and the United States between 2007 and 2009. To investigate the forecasting accuracy of the
BSEYD, Lleo and Ziemba (2017) tested the BSEYD, P/E ratio and CAPE on the S&P 500 using daily data from
1964 to 2014. They found that the BSEYD, P/E ratio and CAPE, were statistically significant robust predictors of
corrections on the S&P500 over the period. However, the forecasting accuracy of the MV/GNP ratio has never
been tested.
Stochastic models construct a probabilistic representation of asset prices. This representation can be either a discrete-time or a continuous-time stochastic process. Examples include the local martingale model (Jarrow, Kchia, and Protter 2011a,b,c; Jarrow 2012), the disorder detection model (Shiryaev and Zhitlukhin 2012a,b; Shiryaev, Zhitlukhin, and Ziemba 2014, 2015) and the earthquake model (Gresnigt, Kole, and Franses 2015). When it comes to actual implementation, the local martingale model and the disorder detection model share the same starting point: they assume that the evolution of the price S(t) of an asset can be best described using a geometric process:

dS(t) = μ(t, S(t)) S(t) dt + σ(t, S(t)) S(t) dW(t),   S(0) = s_0,   t ∈ R_+,

where W(t) is a standard Brownian motion on the underlying probability space. However, the two models look at different aspects of the evolution. The local martingale model detects bubbles by testing whether the price process is a true martingale or a strict local martingale, a property determined by the behavior of the volatility function σ. The disorder detection model detects crashes by looking for a change of regime in the drift μ and the volatility σ. In contrast, the earthquake model implements the Epidemic-type Aftershock Sequence (ETAS) geophysics model proposed by Ogata (1988). It is based on a Hawkes process, a type of inhomogeneous point process. All three types of stochastic models perform well empirically. Jarrow,
Kchia and Protter found evidence of bubbles in stock price data relating to the internet bubble of 1998-2001,
and in LinkedIn’s stock price in May 2011. Shiryaev, Zhitlukhin and Ziemba computed an optimal selling time guaranteeing a profitable exit from Apple’s stock between 2009 and 2012, from the NASDAQ 100 index during the internet bubble, from the Nikkei index around 1990, and from Japanese land around 1990. Finally, Gresnigt, Kole, and Franses found that the rate of correct predictions in their early warning system was higher than the rate of false predictions, but they did not test statistically the accuracy of the predictions.
Sentiment-based models are behavioral models looking at crashes in relation to market sentiment (Fisher
and Statman 2000, 2003; Baker and Wurgler 2006) and collective behavioral biases such as overconfidence
and excessive optimism (Barone-Adesi, Mancini, and Shefrin 2013). Goetzmann, Kim, and Shiller (2016) use surveys of individual and institutional investors, conducted regularly over a 26-year period in the United States, to assess the subjective probability of a market crash. They observe that these subjective probabilities are much higher than the actual historical probabilities. To understand this observation, they examine a number of factors that influence investor responses and find evidence consistent with the effect of availability bias.

2. Theory: Buffett’s market value to GNP ratio as an equity correction forecasting model


2.1. Buffett’s market value to GNP ratio
Warren Buffett introduced the MV/GNP ratio in a Fortune Magazine article. He defined it as ‘the market value
of all publicly traded securities as a percentage of the country’s business - that is, as a percentage of GNP’.
This ratio gauges the total market value of companies against the value of the goods and services that these
companies produce. The market value of all publicly traded US securities reflects the capacity of US firms to
generate revenue, and to translate this revenue into stable earnings.
Buffett views this ratio as a simple rule to determine whether equity market are over- or under-valued (Buffett
and Loomis 2001):
The ratio has certain limitations in telling you what you need to know. Still, it is probably the best single measure of where
valuations stand at any given moment. And as you can see, nearly two years ago the ratio rose to an unprecedented level.
That should have been a very strong warning signal.
The US GNP represents the market value of all the products and services produced by US citizens and com-
panies regardless of where they are produced. By contrast, the US GDP is the market value of all the products
and services produced in the US, regardless of who produced it. To illustrate, the production of Apple in China
would be part of the US GNP but not GDP, while the cars produced in the US by Toyota would count in the US
GDP but not GNP. This argument justifies Buffett’s use of GNP in the ratio.
How different would the ratio be if we used GDP instead of GNP? Figure 1 shows two comparisons of US
GDP and GNP between the first quarter of 1971 and the third quarter of 2016. Panel (a) shows the evolution
of the GDP and GNP on a quarterly and seasonally-adjusted basis. The difference between the two measures is
small, with GNP rising slightly above the GDP in recent years. Panel (b) shows that the GNP-to-GDP ratio has
remained in a very narrow 1.00 to 1.02 range over the whole period. This suggests that using the US GDP rather
than the GNP does not have a material impact on the results.
We test this intuition with an ordinary least square regression of the quarterly GNP against the quarterly
GDP. Both series are in billions of dollars and seasonally adjusted. The R2 of the regression is 0.99, indicating
that the variability of the GDP explains almost all of the variability observed in the GNP. The F-statistic is above
2,400,000 with 182 degrees of freedom, rejecting the hypothesis that the slope coefficient is 0 at a 1% level of
significance. The estimate for the slope is actually 1.01, with a standard error of 0.0006. We also test the null
hypothesis that the slope is equal to 1 at a 1% level of significance. The t-statistic is 17, compared to a critical
value of 2.60. We reject the hypothesis. The corresponding 99% confidence interval around the slope estimate
is [1.009, 1.013].
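As an illustration, the regression above can be sketched with synthetic series standing in for the FRED data (the slope of 1.01 and the sample size of 183 quarters come from the paper; the noise level and all variable names are our assumptions):

```python
import numpy as np

# Synthetic quarterly series standing in for the 183 quarters of
# seasonally adjusted FRED data (1971 Q1 - 2016 Q3); levels in billions.
rng = np.random.default_rng(0)
gdp = np.linspace(1_100.0, 18_700.0, 183)
gnp = 1.01 * gdp + rng.normal(0.0, 20.0, size=183)  # slope 1.01, as estimated

# OLS with intercept: gnp = a + b * gdp
X = np.column_stack([np.ones_like(gdp), gdp])
beta, *_ = np.linalg.lstsq(X, gnp, rcond=None)
resid = gnp - X @ beta
s2 = resid @ resid / (len(gnp) - 2)                 # residual variance
se_b = np.sqrt(s2 / ((gdp - gdp.mean()) ** 2).sum())

r2 = 1.0 - (resid @ resid) / ((gnp - gnp.mean()) ** 2).sum()
t_stat = (beta[1] - 1.0) / se_b                     # t-test of H0: slope = 1
print(f"slope={beta[1]:.4f}, se={se_b:.5f}, R2={r2:.5f}, t={t_stat:.1f}")
```

On the actual FRED series, the paper reports a slope of 1.01 (standard error 0.0006), an R² of 0.99, and a t-statistic of 17 against the null hypothesis that the slope equals 1.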
Buffett and Loomis also suggested the following heuristic decision rule:
If the percentage relationship falls to the 70% or 80% area, buying stocks is likely to work very well for you. If the ratio
approaches 200%–as it did in 1999 and a part of 2000–you are playing with fire.
Figure 1. Comparison of the US quarterly GDP and quarterly GNP during the period 1971 Q1 to 2016 Q3. (a) US quarterly GDP and US quarterly
GNP (1971 Q1-2016 Q3) (b) Ratio of the US quarterly GNP to quarterly GDP (1971 Q1-2016 Q3).
Figure 2. Quarterly ratio of the full cap price Wilshire 5000 index to the GNP (1971 Q1 -2016 Q3).

In Figure 2 we show the ratio of the Wilshire 5000 Full Cap Price Index, a proxy for the total market value of
US stocks, to the US GNP from 1971 Q1 to 2016 Q3. The ratio reached its peak at 142% in March 2000. We also
indicate the 80% and 120% levels, which are often used by practitioners as a decision rule. The ratio has only risen above 120% on two occasions over the past forty-five years: during the Dot.Com bubble and at the end of 2014. Investors have taken this pattern as evidence that the stock market could experience a sharp decline.

2.2. Equity market corrections


Lleo and Ziemba (2017) define an equity market correction as a decline of at least 10% in the S&P500 index level
from peak to trough based on closing prices, over a maximum period of one year (252 trading days).
We identify a correction on the day when the daily closing price crosses the 10% threshold. The identification
algorithm is as follows:

(1) Identify all the local peaks and local troughs in the data set. Today is a local peak (trough) if there is no
higher (lower) closing price within ±90 days.
(2) Identify the correction. Today is an identification day if all of the following conditions hold:
(a) The closing level of the S&P500 today is down at least 10% from its highest level within the past year, and the close on the previous day was above the 10% threshold;
(b) The highest level reached by the S&P 500 prior to the present correction differs from the highest level
corresponding to a previous correction; and
(c) The highest level occurred after the local trough that followed the last correction.

The objective of these rules is to ensure that the corrections we identify are distinct. Two corrections are not
distinct if they occur within the same larger market decline.
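A minimal sketch of condition (a) in step (2), flagging the first day the index closes at least 10% below its one-year high, might look like this (conditions (b) and (c), which enforce distinctness, are omitted; the function and variable names are ours):

```python
import numpy as np

def correction_days(close, window=252, drop=0.10):
    """Flag day t the first time the close falls at least `drop` below
    its highest level over the past `window` trading days.
    This sketches condition (2a) only; conditions (2b) and (2c),
    which ensure corrections are distinct, are omitted."""
    flagged = []
    in_drawdown = False
    for t in range(window, len(close)):
        peak = close[t - window:t + 1].max()
        below = close[t] <= (1.0 - drop) * peak
        if below and not in_drawdown:
            flagged.append(t)       # identification day
        in_drawdown = below
    return flagged

# Toy series: a 300-day rise to 120, then a 15% slide over 60 days
prices = np.concatenate([np.linspace(100.0, 120.0, 300),
                         np.linspace(120.0, 102.0, 60)])
print(correction_days(prices))      # first day at least 10% below the peak
```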
To determine the timing of equity market corrections, we use the daily level of the S&P 500 Total Return Index
from January 1st, 1971 to September 30, 2016. We obtained this data from Bloomberg. Table 1 presents the 20
Table 1. The S&P 500 index experienced 20 market corrections between January 1, 1971 and September 30, 2016.

No. | Identification date | Peak date | S&P index at peak | Trough date | S&P index at trough | Peak-to-trough decline (%) | Peak-to-trough duration (days)
1 1971-08-04 1971-04-28 104.77 1971-11-23 90.16 13.9% 209
2 1973-04-27 1973-01-11 120.24 1974-10-03 62.28 48.2% 630
3 1975-08-08 1975-07-15 95.61 1975-09-16 82.09 14.1% 63
4 1977-05-25 1976-09-21 107.83 1978-03-06 86.9 19.4% 531
5 1978-10-26 1978-09-12 106.99 1978-11-14 92.49 13.6% 63
6 1979-10-25 1979-10-05 111.27 1979-11-07 99.87 10.2% 33
7 1980-03-10 1980-02-13 118.44 1980-03-27 98.22 17.1% 43
8 1981-08-24 1980-11-28 140.52 1981-09-25 112.77 19.7% 301
9 1984-02-13 1983-10-10 172.65 1984-07-24 147.82 14.4% 288
10 1987-10-15 1987-08-25 336.77 1987-12-04 223.92 33.5% 101
11 1990-01-30 1989-10-09 359.8 1990-01-30 322.98 10.2% 113
12 1990-08-17 1990-07-16 368.95 1990-10-11 295.46 19.9% 87
13 1997-10-27 1997-10-07 983.12 1997-10-27 876.99 10.8% 20
14 1998-08-14 1998-07-17 1186.75 1998-08-31 957.28 19.3% 45
15 1999-09-29 1999-07-16 1418.78 1999-10-15 1247.41 12.1% 91
16 2000-04-14 2000-03-24 1527.46 2001-04-04 1103.25 27.8% 376
17 2007-11-26 2007-10-09 1565.15 2009-03-09 676.53 56.8% 517
18 2010-05-20 2010-04-23 1217.28 2010-07-02 1022.58 16.0% 70
19 2011-08-04 2011-04-29 1363.61 2011-10-03 1099.23 19.4% 157
20 2015-08-24 2015-05-21 2130.82 2015-08-25 1867.61 12.4% 96

corrections that occurred over the period. On average, a correction lasted 195 days, causing a 19.9% decline in
the S&P 500 index.
Although we would like to ascertain whether the MV/GNP ratio forecasts bear markets, defined as declines
of at least 20% in the S&P500 index, we cannot do this convincingly. We only have five bear markets in our
sample, which is clearly insufficient to perform statistical inference. Conversely, we could consider equity market
declines of less than 10%. The pitfall here is that the frequency of declines increases rapidly as we lower the 10%
threshold. Figure 3 displays the absolute frequency of declines in the S&P 500 for various levels of declines with
a minimum of 5% and a maximum above 26%. Out of the 32 declines identified, 17 are less than 10% and only

Figure 3. Absolute frequency of equity market declines (January 1, 1971 to September 30, 2016).
five are above 20%.1 Because declines of less than 10% occur frequently, portfolio managers seldom focus on forecasting them and protecting against their effects.

2.3. Turning the market value to GNP ratio into a crash forecasting measure
Equity market correction forecasting models such as the BSEYD model (Ziemba and Schwartz 1991; Lleo
and Ziemba 2012, 2017) or the continuous time disorder detection model (Shiryaev, Zhitlukhin, and
Ziemba 2014, 2015) generate a signal to indicate a downturn in the equity market at a given horizon H. This
signal occurs whenever the value of a given measure crosses a threshold. Given a forecasting measure M(t), a
signal SIGNAL(t) occurs at time t whenever
SIGNAL(t) = M(t) − B(t) > 0 (1)
where B(t) is a time-varying threshold for the signal. The three determinants of the signal are:

(1) the choice of measure M(t);


(2) the definition of threshold B(t); and
(3) the specification of a time interval H between the occurrence of the signal and that of the equity market
correction.
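In code, the signal rule of Equation (1) is a simple elementwise comparison (a sketch; the function and variable names are ours):

```python
import numpy as np

def signals(m, b):
    """SIGNAL(t) = M(t) - B(t) > 0, elementwise over a series.
    `b` can be a scalar (the fixed 120% rule) or a time-varying
    threshold series of the same length as `m`."""
    return (np.asarray(m) - np.asarray(b)) > 0

mv_gnp = np.array([0.85, 1.05, 1.25, 1.30, 1.10])  # toy MV/GNP path
print(signals(mv_gnp, 1.20))                       # fixed 120% threshold
```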

2.3.1. The measure M(t)


The MV/GNP ratio is the main measure under consideration. We will also consider the logarithm of the Market
Value to GNP (log(MV/GNP)) ratio.
The logarithm has several advantages. First, it converts products into sums and ratios into differences. Second, the logarithm rescales large values into smaller ones. Third, the logarithm of the Wilshire 5000 index level and the logarithm of the GNP have an economic interpretation. The change in the logarithm of the Wilshire 5000 index level is the log return of the Wilshire 5000. Similarly, the change in the logarithm of the GNP is the logarithmic growth of the GNP. This means that we can interpret changes in log(MV/GNP) as the difference between the log return on financial investments and the logarithmic growth of productive assets. Clearly, financial returns cannot outpace yields on productive assets for extended periods of time.
We compute MV/GNP and log(MV/GNP) using end-of-period values. Using financial market and macroeconomic data series in the same forecast creates a synchronicity problem. Financial market data, such as the Wilshire 5000, are directly observable and readily available. Macroeconomic data series, such as GDP or GNP, are released with a time lag and are subject to revisions. Therefore, we cannot determine the quarterly value of the MV/GNP ratio using only information available at the end of the quarter. It is still best to use both the Wilshire 5000 index level and the final GNP release in our study.
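For instance, with end-of-quarter values (toy numbers, not the actual FRED series), the two measures are computed as:

```python
import numpy as np

# Hypothetical end-of-quarter values: Wilshire 5000 Full Cap Price Index
# level, and final, seasonally adjusted GNP in billions of dollars.
wilshire = np.array([14_000.0, 15_200.0, 16_100.0])
gnp = np.array([13_500.0, 13_900.0, 14_200.0])

mv_gnp = wilshire / gnp            # the MV/GNP measure
log_mv_gnp = np.log(mv_gnp)        # log(MV/GNP) = log(MV) - log(GNP)
print(mv_gnp.round(4), log_mv_gnp.round(4))
```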

2.3.2. The threshold B


Buffett and Loomis do not propose a clear threshold on the upside. Practitioners use a threshold of 120%, probably because of its apparent symmetry with Buffett’s 70% to 80% downside rule. Hence, we use a threshold of 120% in our study.
We also test the MV/GNP measure using two time-varying thresholds defined as the upper bound of a confidence interval (Ziemba and Schwartz 1991; Lleo and Ziemba 2012, 2017). We compute this upper bound as either a standard one-tail α confidence bound based on a normal distribution, or as a β confidence bound based on Cantelli’s inequality, a one-tailed version of Chebyshev’s inequality.2
To establish the upper bound of the confidence interval, we use the rolling horizon mean (moving average) and standard deviation of the distribution of the measure M(t). The q-quarter moving average at time t, denoted by $\mu_t^q$, and the corresponding rolling horizon standard deviation $\sigma_t^q$, are

$$\mu_t^q = \frac{1}{q} \sum_{i=0}^{q-1} r_{t-i}, \qquad \sigma_t^q = \sqrt{\frac{1}{q-1} \sum_{i=0}^{q-1} \left( r_{t-i} - \mu_t^q \right)^2 }, \qquad (2)$$

where $r_t$ is the value of the quarterly return r over quarter t. This construction has the advantage of providing data consistency, because rolling horizon means and standard deviations are not overly sensitive to the starting date of the calculation. Most importantly, this construction addresses the in-sample versus out-of-sample problem by using only past data.
In our analysis, the horizon for the rolling statistics is q = 8 quarters, or two years of data. The crash forecasting
literature generally suggests using one year of monthly or daily data. At a quarterly frequency, one year only offers
four observations, whereas using two years of data makes for more meaningful standard deviations.
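The rolling statistics of Equation (2), with q = 8 quarters and no look-ahead, can be sketched as follows (the function and variable names are ours):

```python
import numpy as np

def rolling_stats(r, q=8):
    """Rolling q-quarter mean and standard deviation of Equation (2),
    using only observations up to and including quarter t."""
    mu = np.full(len(r), np.nan)
    sigma = np.full(len(r), np.nan)
    for t in range(q - 1, len(r)):
        window = r[t - q + 1:t + 1]
        mu[t] = window.mean()
        sigma[t] = window.std(ddof=1)   # divides by q - 1, as in Equation (2)
    return mu, sigma

r = np.arange(1.0, 13.0)                # toy quarterly series 1, 2, ..., 12
mu, sigma = rolling_stats(r, q=8)
print(mu[7], sigma[7])                  # statistics of the first 8 quarters
```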
We select α = 95% for the upper bound of the standard confidence interval. This choice is consistent with
the crash forecasting literature (Ziemba and Schwartz 1991). Within the statistical inference literature, α = 95%
is also a widely accepted choice. Fisher first suggested the use of a two-tailed 5% significance level.3 Neyman and Pearson advocated that the level of significance needs to be selected a priori to avoid the use of p-values as a main decision rule (Neyman and Pearson 1933; Neyman 1934, 1937). Although Fisher later updated his view on the choice of a level of significance (Fisher 1955), the 5% significance level / 95% confidence level has remained a benchmark ever since for both one-tailed and two-tailed tests.
The expert opinion literature (Meyer and Booker 2001; O’Hagan 2006) also customarily asks for two-tailed 90% confidence bounds, translating into a 95% one-tailed confidence interval. Here, the frame of reference is subjective probability, and more specifically the personal probabilities of Ramsey, de Finetti and Savage (Savage 1971). In this framework, the equity correction forecasting model is subjective in nature and akin to an expert opinion. The confidence level α is the subjective level of confidence in the model’s ability to forecast ‘normal’ market operations. Any sharp increase above this level would indicate that we are outside of the confidence interval: a market disruption, such as an equity market correction, is likely to happen.
With Cantelli’s inequality, there is no clear rule on how to select β. We chose β = 25% to produce a slightly higher threshold than the standard confidence interval. For a normal distribution, we expect 5% of the observations to lie in the right tail, whereas Cantelli’s inequality implies that the percentage of observations in the right tail will be no higher than 25%.
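The two time-varying thresholds can be written explicitly. For the normal bound, B(t) = μ_t + z_α σ_t with z_0.95 ≈ 1.645. For Cantelli's inequality, P(X − μ ≥ kσ) ≤ 1/(1 + k²) = β gives k = √(1/β − 1), so β = 25% yields k = √3 ≈ 1.732. A sketch (the z-value is hard-coded for α = 95%; the function names are ours):

```python
import math

def normal_threshold(mu, sigma):
    """Upper bound of a one-tail 95% normal confidence interval:
    B = mu + z * sigma with z = z_0.95, approximately 1.6449."""
    return mu + 1.6449 * sigma

def cantelli_threshold(mu, sigma, beta=0.25):
    """Cantelli's inequality: P(X - mu >= k * sigma) <= 1 / (1 + k^2).
    Setting the bound equal to beta gives k = sqrt(1/beta - 1);
    beta = 25% yields k = sqrt(3), slightly above the normal z-value."""
    k = math.sqrt(1.0 / beta - 1.0)
    return mu + k * sigma

print(normal_threshold(1.0, 0.1))     # roughly 1.1645
print(cantelli_threshold(1.0, 0.1))   # roughly 1.1732
```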

2.3.3. The horizon H


The last parameter we specify is the forecasting horizon H. We set the horizon to H = 4 quarters prior to the local peak that preceded the equity market correction. This approach differs from setting the horizon with respect to the identification date (Lleo and Ziemba 2017). However, there is a tradeoff in this choice. Using the date of the local peak as a reference point gives greater precision, because the time between the local peak and the crash identification date differs across equity market corrections. The local peak date, however, is a by-product of the identification date; as such, it is not a key determinant of the crash predictor.

3. Calculations
This study follows six computational steps. We start by describing and computing six crash prediction models.
Then, we construct a hit sequence X, tracking the outcome of each signal. Next, we estimate the probability of
forecasting a correction and derive a likelihood ratio test. We investigate the effect of small sample bias with a Monte Carlo study and perform two robustness tests, on the confidence level α and the forecasting horizon H.
Finally, we examine the sensitivity of the measures to a change in the definition of equity market corrections.

3.1. Crash prediction models in this study


We study six crash prediction models:

(1) MV/GNP with fixed threshold at 120%;
(2) MV/GNP with threshold computed using a standard 95% one-tail confidence interval based on a normal distribution;
(3) MV/GNP with threshold computed using Cantelli’s inequality;
(4) log(MV/GNP) with fixed threshold at 120%;
(5) log(MV/GNP) with threshold computed using a standard 95% one-tail confidence interval based on a normal distribution; and
(6) log(MV/GNP) with threshold computed using Cantelli’s inequality.

3.2. Construction of the hit sequence X


Equity correction forecasting models have two components: (1) a signal which takes value 1 or 0 depending on
whether the measure has crossed the confidence level, and (2) an indicator which takes value 1 when an equity
market correction occurs and 0 otherwise. From a probabilistic perspective, these components are Bernoulli
random variables.
We define a signal indicator sequence S = {St , t = 1, . . . , T}. This sequence records the first quarter in a
series of positive signals. The signal indicator St takes the value 1 if the measure crosses the threshold on quarter
t but not on quarter t−1, and 0 otherwise. Thus, the event ‘a signal occurred for the first time on quarter t’ is
represented as {St = 1}. We express the signal indicator sequence as the vector s = (S1, . . . , St, . . . , ST).
We define Ct as the indicator function taking value 1 when an equity market correction occurs at time t, and
0 otherwise. Denote by Ct,H the indicator function returning 1 if the identification date of at least one equity
market correction occurs between time t and time t+H. The relation between Ct,H and Ct is


Ct,H := 1 − ∏_{i=1}^{H} (1 − Ct+i).    (3)

We identify the vector CH with the sequence CH := {Ct,H , t = 1, . . . , T − H} and define the vector cH :=
(C1,H , . . . , Ct,H , . . . CT−H,H ). The number of correct forecasts n is defined as


n = ∑_{t=1}^{T−H} Ct,H = 1ᵀ cH    (4)

where 1 is a vector with all entries set to 1 and vᵀ denotes the transpose of vector v.
The accuracy of the equity correction forecasting model is the conditional probability P(Ct,H = 1|St = 1) that
an equity market downturn occurs between time t and time t+H, given that we observe a signal at time t. The
higher the probability, the more accurate the model. We apply maximum likelihood to estimate this probability
and to test whether it is significantly higher than a random guess. We can obtain a simple analytical solution
because the conditional random variable {Ct,H = 1|St = 1} is a Bernoulli trial with probability p = P(Ct,H =
1|St = 1).
To estimate the probability p, we change the indexing to consider only events along the sequence {St |St =
1, t = 1, . . . T} and denote by X := {Xi , i = 1, . . . , N} the ‘hit sequence’ where Xi = 1 if the ith signal is followed
by a correction and 0 otherwise. Here N denotes the total number of signals, that is


N = ∑_{t=1}^{T} St = 1ᵀ s.    (5)

The sequence X can be expressed in vector notation as x = (X1 , X2 , . . . , XN ). The probability p is the ratio n/N.
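The construction above can be sketched in a few lines; the quarterly indexing and list representation are our assumptions:

```python
def hit_sequence(S, C, H=4):
    """Build the hit sequence X (equations (3)-(5)) from the signal
    indicators S and the correction indicators C.

    S[t] = 1 marks the first quarter of a run of positive signals;
    C[t] = 1 marks the identification date of a correction.  X[i] = 1 if
    the i-th signal is followed by at least one correction identified
    within the next H quarters, i.e. C_{t,H} = 1 - prod_{i=1..H}(1 - C_{t+i}).
    """
    T = len(S)
    X = []
    for t in range(T):
        if S[t] == 1:
            hit = any(C[t + i] == 1 for i in range(1, H + 1) if t + i < T)
            X.append(1 if hit else 0)
    return X
```

The number of signals is then N = len(X), the number of correct forecasts is n = sum(X), and the accuracy estimate is p̂ = n/N.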

3.3. Maximum likelihood estimate of p = P(Ct,H |St ) and likelihood ratio test
The likelihood function L associated with the observations sequence X is


L(p|X) := ∏_{i=1}^{N} p^{Xi} (1 − p)^{1−Xi}    (6)

and the log-likelihood function ℓ is

ℓ(p|X) := ln L(p|X) = (∑_{i=1}^{N} Xi) ln p + (N − ∑_{i=1}^{N} Xi) ln(1 − p)    (7)

This function is maximized for


p̂ := (∑_{i=1}^{N} Xi) / N    (8)
so the maximum likelihood estimate of the probability p = P(Ct,H |St ) is the historical proportion of correct
forecasts out of all observations.
Then we apply a likelihood ratio test to test the null hypothesis H0 : p = p0 against the alternative hypothesis
HA : p ≠ p0. Based on the null hypothesis, the probability that a random, uninformed signal correctly forecasts
a correction is equal to p0 . A significant departure above this level indicates that the measure we are considering
may contain some information about future equity market corrections.
This null hypothesis is meaningful because it is jointly consistent with the data at hand, and with the impli-
cations of financial economics that equity market corrections should not be predictable. This is a crucial, but
subtle and often overlooked, point in statistical inference. Rejecting or failing to reject a null hypothesis only truly
makes sense if the null hypothesis is meaningful within the context of both the underlying theory we are testing,
and the data. Setting a null hypothesis that ignores either theory or data could lead to spurious conclusions.
The likelihood ratio Λ is:

Λ = L(p = p0|X) / max_{p∈(0,1)} L(p|X) = L(p = p0|X) / L(p = p̂|X).    (9)
The statistic Y := −2 ln Λ is asymptotically χ²-distributed with ν = 1 degree of freedom. We reject the null
hypothesis H0 : p = p0 and accept that the model may have some predictive power if Y > c, where c is the
critical value chosen for the test. We perform the test for c = 2.71, 3.84, and 6.63, corresponding respectively to
90%, 95% and 99% confidence levels.
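A minimal implementation of the test, using only the standard library (for χ² with one degree of freedom, the survival function is erfc(√(y/2))):

```python
import math

def lr_test(n, N, p0):
    """Likelihood ratio test of H0: p = p0 given n correct forecasts out of
    N signals.  Returns (p_hat, Y, p_value) with Y = -2 ln Lambda compared
    against the asymptotic chi-squared distribution with 1 degree of freedom.
    """
    p_hat = n / N

    def loglik(p):
        # 0 * log(0) terms are treated as 0 so that p_hat in {0, 1} works
        ll = 0.0
        if n:
            ll += n * math.log(p)
        if N - n:
            ll += (N - n) * math.log(1 - p)
        return ll

    Y = -2.0 * (loglik(p0) - loglik(p_hat))
    p_value = math.erfc(math.sqrt(Y / 2.0)) if Y > 0 else 1.0
    return p_hat, Y, p_value
```

With n = 8 correct forecasts out of N = 11 signals and p0 = 38.64%, this gives Y ≈ 5.25 and a p-value of about 2.19%.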

3.4. Monte Carlo study for small sample bias


A limitation of the likelihood ratio test is that the χ² distribution is only valid asymptotically. In our case, the
number of correct forecasts follows a binomial distribution with an estimated probability of success p̂ and N
trials. However, only 20 corrections occurred during the time period in this study, and the number of signals
generated by the six models is even lower, ranging from 2 to 11. The continuous χ² distribution might not
provide an adequate approximation for this discrete distribution: p̂ might appear significantly different from
p0 under the χ² distribution but not under the true distribution. This difficulty is an example of small sample bias.
Therefore, we use Monte Carlo methods to obtain the empirical distribution of test statistics and address this
bias.
The Monte Carlo algorithm generates a large number K of paths. For each path k = 1, . . . , K, we simulate N
Bernoulli random variables with probability p0 of obtaining a 'success'. Denote by X^k := {Xi^k, i = 1, . . . , N} the
realized sequence, where Xi^k = 1 if the ith Bernoulli variable produces a 'success' and 0 otherwise. Compute the
maximum likelihood estimate for the probability of success given the realization sequence X^k as

p̂k := (∑_{i=1}^{N} Xi^k) / N,    (10)
and the test statistic for the path as

Yk = −2 ln Λk = −2 ln [ L(p = p0|X^k) / max_{p∈(0,1)} L(p|X^k) ] = −2 ln [ L(p = p0|X^k) / L(p = p̂k|X^k) ].    (11)

Once all the paths have been simulated, we use the K test statistics Yk, k = 1, . . . , K, to produce an empirical
distribution for the test statistic Y.

From the empirical distribution, we obtain critical values at the 90%, 95% and 99% confidence levels against
which we assess the forecasting test statistic Y. The empirical distribution also enables us to compute a p-value
for the forecasting test statistic. Finally, we compare the results obtained under the empirical distribution to
those derived using the asymptotic χ² distribution.
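The algorithm can be sketched as follows (the seed and the sorting-based quantile extraction are our implementation choices):

```python
import math
import random

def mc_critical_values(N, p0, K=10000, seed=1):
    """Empirical distribution of Y = -2 ln Lambda under H0: p = p0,
    simulated from K paths of N Bernoulli signals.  Returns the critical
    values at the 90%, 95% and 99% confidence levels."""
    rng = random.Random(seed)
    stats = []
    for _ in range(K):
        n = sum(1 for _ in range(N) if rng.random() < p0)
        p_hat = n / N
        # 0 * log(0) terms are treated as 0 for the boundary cases n = 0, N
        ll0 = (n * math.log(p0) if n else 0.0) \
            + ((N - n) * math.log(1 - p0) if N - n else 0.0)
        ll1 = (n * math.log(p_hat) if n else 0.0) \
            + ((N - n) * math.log(1 - p_hat) if N - n else 0.0)
        stats.append(-2.0 * (ll0 - ll1))
    stats.sort()
    return {c: stats[int(c * K) - 1] for c in (0.90, 0.95, 0.99)}
```

Because the number of correct forecasts is discrete, the empirical critical values sit close to, but not exactly at, the asymptotic χ² values of 2.71, 3.84 and 6.63.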

3.5. Parameter robustness and optimal parameter choice


The confidence level α and the forecasting horizon H are the two key parameters. The confidence level directly
affects the number of signals that the model generates, and indirectly the accuracy of the model. The forecasting
horizon influences the number of correct signals, as well as the uninformed probability p0 used in the significance
test, but it does not influence the number of signals generated. Producing an accurate forecast is easier over a
longer horizon than over a shorter one.
To test the robustness of the model forecasts, we compute the optimal value for the confidence level α, hold-
ing the forecasting horizon H constant, and then the optimal value for the forecasting horizon H, holding the
confidence level α constant.
We seek the confidence level α ∈ [0.9, 1] that maximizes the empirical accuracy p̂:

A = arg max_{α∈[0.9,1]} p̂(α; H)    (12)

We are interested in the lowest confidence level for which p̂ = 100% and in the evolution of the number of
predictions as the confidence level changes. Intuitively, we would expect the accuracy of the measure to increase
with the confidence level.
We are also interested in whether the model remains significantly better than a random guess if we choose a
confidence level at the lower end of the confidence range. Answering this question will give us an indication of
the robustness of the model in relation to a change or misspecification in the confidence level. This approach to
robustness is an application of the robust likelihood statistics proposed by Lleo and Ziemba (2017) to a single
parameter test.
We look at robustness with respect to the forecasting horizon H. Lleo and Ziemba’s robust likelihood statistics
cannot test the robustness of the model with respect to a change in forecasting horizon because changing the
forecasting horizon will affect the uninformed probability p0 . Therefore, we need a different approach. We hold
the confidence level α constant at 95% and look for the time horizon H that maximizes the empirical accuracy p̂:

H∗ = arg max_{H∈{1,2,3,4,5,6,7,8}} p̂(H; α)    (13)

We limit the range in our analysis to 8 quarters (2 years) after the signal. The practical usefulness of forecasting
a correction more than two years into the future is limited.
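A sketch of the horizon scan in equation (13); representing signal and correction identification dates as quarter indices is our assumption:

```python
def best_horizon(signal_quarters, correction_quarters, horizons=range(1, 9)):
    """Return the forecasting horizon H in {1,...,8} that maximizes the
    empirical accuracy p_hat(H): the fraction of signals followed by at
    least one correction identified within the next H quarters."""
    def accuracy(H):
        hits = sum(1 for s in signal_quarters
                   if any(s < c <= s + H for c in correction_quarters))
        return hits / len(signal_quarters)
    return max(horizons, key=accuracy)
```

`max` returns the first maximizer, i.e. the shortest horizon achieving the best accuracy.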

3.6. Model sensitivity to the definition of an equity market correction


We defined an equity market correction as a decline of at least 10% in the value of the S&P 500 index. To under-
stand the effect that a change in the magnitude of the decline may have on the significance of the forecasts, we
conduct our analysis using a decline of at least 8% in the value of the S&P 500 index.
Lowering the loss threshold from 10% to 8% increases the number of corrections from 20 to 24. This should
reduce the standard error in our inference. On the other hand, increasing the loss threshold would decrease the
number of corrections and markedly increase the standard error. For example, we only have 17 corrections at
a loss threshold of 11%. Table 2 lists the 24 declines of 8% or more that occurred between January 1, 1971 and
September 30, 2016.
The choice of a loss threshold will not affect the number of crash forecasts. It may have an effect on the
accuracy, and on the level of significance. As we lower the loss threshold from 10% to 8%, the uninformed
probability p0 will increase and the test statistic Y will decrease.
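The identification of declines of at least a given magnitude can be sketched as a drawdown scan (the paper's full procedure also records peak, trough and identification dates, which we omit here):

```python
def count_corrections(prices, threshold=0.10):
    """Count declines of at least `threshold` from the running peak.
    Each peak-to-trough episode is counted once; a new running high
    starts a new episode."""
    count = 0
    peak = prices[0]
    flagged = False
    for p in prices:
        if p >= peak:
            peak = p
            flagged = False              # new high resets the episode
        elif not flagged and (peak - p) / peak >= threshold:
            count += 1                   # decline crossed the threshold
            flagged = True
    return count
```

For instance, the first decline in Table 2 is (104.77 − 90.16)/104.77 ≈ 13.9%, so it is counted at both the 10% and the 8% thresholds.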

Table 2. The S&P 500 index experienced 24 declines of at least 8% between January 1, 1971 and September 30, 2016.

No. | Crash identification date | Peak date | S&P index at peak | Trough date | S&P index at trough | Peak-to-trough decline (%) | Peak-to-trough duration (days)
1 1971-07-29 1971-04-28 104.77 1971-11-23 90.16 13.9% 209
2 1973-03-21 1973-01-11 120.24 1974-10-03 62.28 48.2% 630
3 1975-08-04 1975-07-15 95.61 1975-09-16 82.09 14.1% 63
4 1976-11-10 1976-09-21 107.83 1976-11-10 98.81 8.4% 50
5 1978-04-18 1977-07-19 101.79 1978-11-14 92.49 9.1% 483
6 1979-10-19 1979-10-05 111.27 1979-11-07 99.87 10.2% 33
7 1980-03-06 1980-02-13 118.44 1980-03-27 98.22 17.1% 43
8 1980-12-10 1980-11-28 140.52 1981-09-25 112.77 19.7% 301
9 1984-02-06 1983-10-10 172.65 1984-07-24 147.82 14.4% 288
10 1986-09-12 1986-09-04 253.83 1986-09-29 229.91 9.4% 25
11 1987-10-12 1987-08-25 336.77 1987-12-04 223.92 33.5% 101
12 1990-01-22 1989-10-09 359.8 1990-01-30 322.98 10.2% 113
13 1990-08-06 1990-07-16 368.95 1990-10-11 295.46 19.9% 87
14 1994-04-04 1994-02-02 482.00 1994-04-04 438.92 8.9% 61
15 1998-08-04 1998-07-17 1186.75 1998-08-31 957.28 19.3% 45
16 1999-08-06 1999-07-16 1418.78 1999-10-15 1247.41 12.1% 91
17 2000-02-18 1999-12-31 1469.25 2001-04-04 1103.25 24.9% 460
18 2004-08-06 2004-02-11 1157.76 2004-08-12 1063.23 8.2% 183
19 2007-08-14 2007-07-19 1553.08 2007-08-15 1406.7 9.4% 27
20 2007-11-12 2007-10-09 1565.15 2009-03-09 676.53 56.8% 517
21 2010-02-08 2010-01-19 1150.23 2010-07-02 1022.58 11.1% 164
22 2011-08-02 2011-04-29 1363.61 2011-10-03 1099.23 19.4% 157
23 2012-05-17 2012-04-02 1419.04 2012-06-01 1278.04 9.9% 60
24 2015-08-24 2015-05-21 2130.82 2015-08-25 1867.61 12.4% 96

4. Results and discussion


4.1. Accuracy of the predictor
We perform our analysis using quarterly, seasonally-adjusted, final GNP data and the Wilshire 5000 Full Cap Price
Index, a proxy for the total market value of the US equity market. The data were obtained from the Federal Reserve
Economic Data (FRED) repository at the Federal Reserve Bank of St. Louis. This dataset covers the period from
the first quarter of 1971 to the third quarter of 2016, for a total of 183 quarters.
Table 3 presents the total number of signals, number of correct and incorrect forecasts, and proportion of cor-
rect and incorrect forecasts for the six models. The MV/GNP and log(MV/GNP) models using a fixed threshold
at 120% generated only two signals. The MV/GNP models based on confidence interval and Cantelli’s inequality

Table 3. Proportion of correct and incorrect forecasts for the six signal models.

Model (1) | Total number of signals (2) | Number of correct forecasts (3) | Proportion of correct forecasts (%) (4) | Number of incorrect forecasts (5) | Proportion of incorrect forecasts (%) (6)
MV/GNP (fixed threshold) 2 2 100.00% 0 0.00%
MV/GNP (confidence interval) 11 8 72.73% 3 27.27%
MV/GNP (Cantelli) 11 8 72.73% 3 27.27%
log(MV/GNP) (fixed threshold) 2 2 100.00% 0 0.00%
log(MV/GNP) (confidence interval) 10 8 80.00% 2 20.00%
log(MV/GNP) (Cantelli) 10 8 80.00% 2 20.00%
Note: The Total Number of Signals reported in Column 2 tallies distinct signals. It is calculated as the sum of all the elements of the indicator
sequence S. The Number of Correct Forecasts in Column 3 counts the signals that preceded a correction. It is calculated as the sum of all the
entries of the indicator sequence X. The Proportion of Correct Forecasts in Column 4 measures the accuracy of the signal. It is computed as the
ratio of column (3) to column (2). The Number of Incorrect Forecasts reported in Column 5 counts the number of signals that did not forecast a
correction. It is calculated as column (2) minus column (3). Finally, the Proportion of Incorrect Forecasts in Column 6 measures the percentage
of false positives given by the signal. It is computed as the ratio of column (5) to column (2).

produced 11 signals each. The log(MV/GNP) models based on confidence interval and Cantelli’s inequality
generated 10 signals each.
These results are markedly lower than the 20 equity market corrections recorded over the period. The main
reason is that the GNP is released quarterly, limiting the frequency of calculations and reducing the number of
signals. The accuracy of the model is the proportion of correct forecasts. It ranges from 72.73% for MV/GNP based
on confidence interval and Cantelli's inequality, to 80% for log(MV/GNP) based on confidence interval and
Cantelli's inequality, and to 100% for MV/GNP and log(MV/GNP) with a fixed threshold.

4.2. Maximum likelihood estimate of p = P(Ct,H |St ) and likelihood ratio test
The probability p0 is the probability of identifying an equity market correction within 4 quarters of a randomly
selected date. To compute p0 empirically, we tally the number of quarters that are at most 4 quarters before a
local peak and divide by the total number of quarters in the sample. We find that p0 = 38.64% over the entire
period. Essentially, a monkey pointing to a date on a calendar would have a roughly 40% chance of correctly
forecasting an equity market correction.
We can confirm this number heuristically. Given that 20 distinct corrections occurred, at most 4 × 20 =
80 quarters in our sample occur within 4 quarters prior to a local peak. Because there are 183 quarters in the
dataset, the heuristic probability is 43.72% (80/183), a bit higher than the empirical probability. The difference
between the heuristic and empirical probabilities is due to the fact that in reality equity market corrections are
not spread evenly through the period. Corrections might occur in quick succession, as was the case in the late
1990s when three corrections occurred within less than two years.
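The empirical and heuristic computations of p0 can be sketched as follows (quarter indices are our representation; a quarter counts if at least one local peak falls within the next H quarters):

```python
def uninformed_probability(peak_quarters, T, H=4):
    """Empirical p0: the fraction of the T quarters in the sample from
    which a randomly timed signal would be 'correct', i.e. quarters with
    at least one local peak in the following H quarters."""
    hits = sum(1 for t in range(T)
               if any(t < pk <= t + H for pk in peak_quarters))
    return hits / T
```

The heuristic bound in the text is simply 4 × 20 / 183 ≈ 43.72%; clustering of corrections (overlapping pre-peak windows) pushes the empirical value down to 38.64%.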
Table 4 presents the maximum likelihood estimate, likelihood ratio and test statistic for each of the six models.
With an accuracy of 100%, the MV/GNP and log(MV/GNP) models with fixed thresholds have a perfect track record,
but out of only 2 forecasts. The MV/GNP models using a confidence interval or Cantelli's inequality are significant
at the 5% level, while the log(MV/GNP) models are significant at the 1% level. Overall, all the models demonstrate an ability
to forecast equity market corrections.

4.3. Monte Carlo study for small sample bias


Table 5 reports the maximum likelihood estimate, empirical critical value at 90%, 95% and 99% confidence
levels, the test statistic and p-value for each of the six models. All the models based on time-varying thresholds are

Table 4. Maximum likelihood estimate and likelihood ratio test: uninformed prior.

Model | Total number of signals | Number of correct forecasts | ML estimate p̂ | L(p̂) | Likelihood ratio Λ | Test statistic −2 ln Λ | p-value
MV/GNP (fixed threshold) 2 2 100.00% – – – –
MV/GNP (confidence interval) 11 8 72.73% 1.59E-03 7.23E-02 5.25∗∗ 2.19%
MV/GNP (Cantelli) 11 8 72.73% 1.59E-03 7.23E-02 5.25∗∗ 2.19%
log(MV/GNP) (fixed threshold) 2 2 100.00% – – – –
log(MV/GNP) (confidence interval) 10 8 80.00% 6.71E-03 2.79E-02 7.16∗∗∗ 0.75%
log(MV/GNP) (Cantelli) 10 8 80.00% 6.71E-03 2.79E-02 7.16∗∗∗ 0.75%
Notes: The Total Number of Signals is calculated as the sum of all the elements in the indicator sequence S. The Number of Correct Forecasts tallies
corrections preceded by a signal. It is calculated as the sum of all the elements in the indicator sequence X. The Maximum Likelihood estimate p̂
is the probability of forecasting a correction. It maximizes the likelihood function of the model. It is equal to the ratio of the number of correct
forecasts to the total number of signals. L(p̂) is the likelihood of the correction forecasting model, computed using the maximum likelihood
estimate p̂. The likelihood ratio Λ = L(p0|X)/L(p = p̂|X) is the ratio of the likelihood under the null hypothesis p = p0 to the likelihood using
the estimated probability p̂. The estimated test statistic, equal to −2 ln Λ, is asymptotically χ²-distributed with 1 degree of freedom. The
p-value is the probability of obtaining a test statistic higher than the one actually observed, assuming that the null hypothesis is true. The degree
of significance and the p-value indicated in the table are both based on this distribution. The critical values at the 95%, 99% and 99.5% level are
respectively 3.84, 6.63 and 7.88.
∗ significant at the 10% level;
∗∗ significant at the 5% level;
∗∗∗ significant at the 1% level.

Table 5. Monte Carlo likelihood ratio test.

Signal Model | Total number of signals | ML estimate p̂ | Critical value: 90% confidence | Critical value: 95% confidence | Critical value: 99% confidence | Test statistic −2 ln Λ(p0) | Empirical p-value
MV/GNP (fixed threshold) 2 100.00% 3.71 3.71 3.71 – –
MV/GNP (confidence interval) 11 72.73% 2.80 4.97 5.25 5.25∗∗ 2.78%
MV/GNP (Cantelli) 11 72.73% 2.80 4.97 5.25 5.25∗∗ 2.82%
log(MV/GNP) (fixed threshold) 2 100.00% 3.71 3.71 3.71 – –
log(MV/GNP) (confidence interval) 10 80.00% 4.03 4.19 7.16 7.16∗∗ 1.38%
log(MV/GNP) (Cantelli) 10 80.00% 4.03 4.19 7.16 7.16∗∗ 1.38%
Notes: The Total Number of Signals is calculated as the sum of all the elements of the indicator sequence S. The Maximum Likelihood estimate p̂
is the probability of forecasting a correction. It maximizes the likelihood function of the model. It is equal to the ratio of the number of correct
forecasts to the total number of signals. Columns 4 to 6 report the critical values at a 90%, 95% and 99% confidence level for the empirical
distribution generated using K = 10,000 Monte-Carlo simulations. The test statistic in column 7 is equal to −2 ln Λ(p0) = −2 ln(L(p0|X)/L(p̂|X)).
The levels of significance are based on the empirical distribution. The p-value is the probability of obtaining a test statistic higher than the one
actually observed, assuming that the null hypothesis is true. The degree of significance indicated in the test statistic and p-value are both based
on an empirical distribution generated through Monte-Carlo simulations.
∗ significant at the 10% level;
∗∗ significant at the 5% level;
∗∗∗ significant at the 1% level.

significant at the 5% level. The likelihood ratio test based on the asymptotic χ² distribution is therefore remarkably
accurate.

4.4. Parameter sensitivity: confidence level


For MV/GNP with a threshold computed using a standard one-tail confidence interval based on a normal
distribution,

inf {α ∈ A : 100α ∈ ℕ} = 97%.    (14)

The condition 100α ∈ ℕ is purely aesthetic: it guarantees that we get an integer percentage. This choice of a
confidence level at 97% generates 6 signals (a bit less than a third of the 20 equity market corrections), but
reaches an accuracy of p̂ = 100%. As expected, increasing the confidence level α leads to a decline in the number
of signals and an increase in the accuracy of the forecast.
Table 6 reports the key statistics of MV/GNP for various confidence levels. Setting a confidence level α = 90%
generates 17 signals. The proportion of correct signals is 64.71%, and the model is significantly better than a random
guess at the 90% and 95% confidence levels (p-value = 3%). In fact, if we ventured outside of the [0.9, 1] range
to pick α = 85%, the model generates 13 correct forecasts out of 20 signals. Its accuracy, 65%, is significantly better
than a random guess at the 90% and 95% confidence levels (p-value = 1.73%). The model generates 20 signals
at α = 80%. With 16 correct forecasts, its accuracy would increase to 80%, making the model significantly better
than a random guess at the 90%, 95% and 99% confidence levels (p-value = 0.02%). This unexpected observation
suggests that the model is particularly robust to a change or a misspecification in the confidence interval, even
outside of our initial test range of [0.9, 1].
The accuracy of log(MV/GNP) is maximized for

inf {α ∈ A : 100α ∈ ℕ} = 98%.    (15)

At this confidence level, the log(MV/GNP) only produces one forecast. Table 6 reports the key statistics
for log(MV/GNP) at various confidence levels. These statistics are consistent with the pattern identified for
MV/GNP. Overall, the log(MV/GNP) model produces fewer signals than the MV/GNP model, but its test
statistics are higher for confidence levels ranging from 80% to 95%.

Table 6. Accuracy and statistical significance of MV/GNP and log(MV/GNP) as a function of the confidence level α.
Confidence level
80% 85% 90% 92.5% 95% 97.5% 99%
MV/GNP
Number of signals 20 20 17 17 11 4 0
Number of correct signals 16 13 11 11 8 4 0
Proportion of correct signals 80.00% 65.00% 64.71% 64.71% 72.73% 100.00% 0.00%
Test statistics 14.32∗∗∗ 5.66∗∗ 4.71∗∗ 4.71∗∗ 5.25∗∗ – –
p-value 0.02% 1.73% 3.00% 3.00% 2.19% – –
log(MV/GNP)
Number of signals 17 17 15 14 10 4 0
Number of correct signals 14 14 11 10 8 2 0
Proportion of correct signals 82.35% 82.35% 73.33% 71.43% 80.00% 50.00% 0.00%
Test statistics 13.71∗∗∗ 13.71∗∗∗ 7.43∗∗∗ 6.17∗∗ 7.16∗∗∗ 0.21 –
p-value 0.02% 0.02% 0.64% 1.30% 0.75% 64.51% –
Notes: The numbers presented in this table are based on a forecasting horizon H = 4 quarters. With this choice, the uninformed probability that a
random guess correctly identifies an equity market downturn is p0 = 38.64%. Rows 1, 2 and 3 report the total number of signals generated, the
number of correct signals, and the proportion of correct signals, computed as the ratio of the number of correct signals to the total number of
signals. Rows 4 and 5 respectively report the test statistic and p-value.
∗ significant at the 10% level;
∗∗ significant at the 5% level;
∗∗∗ significant at the 1% level.

4.5. Parameter sensitivity: forecasting horizon


We study the sensitivity of MV/GNP to a change in the forecasting horizon. The accuracy of the model is
maximized for

arg max_{H∈{1,2,3,4,5,6,7,8}} p̂(H; α) = 7.    (16)

The model produces 10 accurate forecasts out of 11 signals at H = 7. Table 7 reports the key statistics of the model
for H ∈ {1, 2, 3, 4, 5, 6, 7, 8}. As expected, both the number of correct signals and the uninformed probability p0
increase with the forecasting horizon. The small number of signals makes the test statistic unstable. For example,
the test statistic for H = 4 is 5.25, but it drops to 3.36 for H = 5, before increasing again to 4.35 for H = 6 and

Table 7. Accuracy and statistical significance of MV/GNP and log(MV/GNP) as a function of the forecasting horizon H.
Forecasting horizon (in quarters)
1 2 3 4 5 6 7 8
Uninformed probability p0 10.80% 21.02% 30.68% 39.08% 45.45% 51.70% 56.82% 61.36%
MV/GNP
Number of correct signals 1 4 7 8 8 9 10 10
Proportion of correct signals 9.09% 36.36% 63.64% 72.73% 72.73% 81.82% 90.91% 90.91%
Test statistics 0.035 1.36 5.05∗∗ 5.25∗∗ 3.36∗ 4.35∗∗ 6.28∗∗ 4.97∗∗
p-value 85.18% 24.25% 2.46% 2.19% 6.68% 3.69% 1.22% 2.58%
log(MV/GNP)
Number of correct signals 3 3 7 8 8 9 10 10
Proportion of correct signals 30.00% 30.00% 70.00% 80.00% 80.00% 90.00% 100.00% 100.00%
Test statistics 2.74 0.44 6.52∗∗ 7.16∗∗∗ 5.03∗∗ 6.82∗∗∗ – –
p-value 9.80% 50.51% 1.07% 0.75% 2.49% 0.90% – –
Notes: The numbers presented in this table are based on a confidence level α = 95%. With this choice, the model generated 11 signals. Row 1
presents the uninformed probability p0 that a random guess would correctly identify an equity market downturn. For each measure, the
subsequent rows report the number of correct signals, the proportion of correct signals (computed as the ratio of the number of correct signals
to the total number of signals), the test statistic and the p-value.
∗ significant at the 10% level;
∗∗ significant at the 5% level;
∗∗∗ significant at the 1% level.

Table 8. Maximum likelihood estimate and likelihood ratio test: 8% equity market decline.

Model | Total number of signals | Number of correct forecasts | ML estimate p̂ | L(p̂) | Likelihood ratio Λ | Test statistic −2 ln Λ | p-value
MV/GNP (fixed threshold) 2 2 100.00% – – – –
MV/GNP (confidence interval) 11 9 81.82% 5.43E-03 5.43E-02 5.83∗∗ 1.58%
MV/GNP (Cantelli) 11 9 81.82% 5.43E-03 5.43E-02 5.83∗∗ 1.58%
log(MV/GNP) (fixed threshold) 2 2 100.00% – – – –
log(MV/GNP) (confidence interval) 10 8 80.00% 6.71E-03 9.44E-02 4.72∗∗ 2.98%
log(MV/GNP) (Cantelli) 10 8 80.00% 6.71E-03 9.44E-02 4.72∗∗ 2.98%
Notes: The Total Number of Signals is calculated as the sum of all the elements in the indicator sequence S. The Number of Correct Forecasts tallies
corrections preceded by a signal. It is calculated as the sum of all the elements in the indicator sequence X. The Maximum Likelihood estimate p̂
is the probability of forecasting a correction. It maximizes the likelihood function of the model. It is equal to the ratio of the number of correct
forecasts to the total number of signals. L(p̂) is the likelihood of the correction forecasting model, computed using the maximum likelihood
estimate p̂. The likelihood ratio Λ = L(p0|X)/L(p = p̂|X) is the ratio of the likelihood under the null hypothesis p = p0 to the likelihood using
the estimated probability p̂. The estimated test statistic, equal to −2 ln Λ, is asymptotically χ²-distributed with 1 degree of freedom. The
p-value is the probability of obtaining a test statistic higher than the one actually observed, assuming that the null hypothesis is true. The degree
of significance and the p-value indicated in the table are both based on this distribution. The critical values at the 95%, 99% and 99.5% level are
respectively 3.84, 6.63 and 7.88.
∗ significant at the 10% level;
∗∗ significant at the 5% level;
∗∗∗ significant at the 1% level.

reaching 6.28 at H = 7. In addition, the model only starts to be significant for a horizon of at least three quarters.
It remains significant at longer horizons, except for H = 5.
log(MV/GNP) reaches 100% accuracy at H = 7. Like MV/GNP, log(MV/GNP) is only significant at horizons
of at least three quarters. We also observe that log(MV/GNP) has a higher test statistic than MV/GNP.
To conclude, both MV/GNP and log(MV/GNP) are robust with respect to changes in their key parameters.
Intriguingly, the accuracy of the models is higher at lower confidence levels. We do not have enough signals to
determine whether this is an artifact of the data or an observation that can be generalized.

4.6. Model sensitivity to the definition of an equity market correction


We analyze the impact of a small change in the magnitude of an equity market downturn to an 8% level. As
expected, lowering the loss threshold leads to an increase in the uninformed probability p0. This probability is
now equal to 46.59%, compared with 38.64% for a 10% correction.
Table 8 reports the maximum likelihood estimate, likelihood ratio and test statistic for each of the six models.
The correction forecasting models still perform very well under the alternate definition of equity market corrections.
Their accuracy has not declined. In fact, the accuracy of the two MV/GNP models has even increased
from 72.73% to 81.82%. The significance of the models has not declined markedly either, despite the noticeable
increase in the uninformed probability p0.
Table 9 reports the maximum likelihood estimate, the empirical critical values at the 90%, 95% and 99% confidence
levels obtained by Monte Carlo simulations, and the test statistic and p-value for each model. The Monte Carlo study
confirms our earlier conclusion: the likelihood ratio test based on the asymptotic χ² distribution is remarkably
accurate.
The correction forecasting models do not appear overly sensitive to a small change in the definition of equity
market corrections.

5. Comparing MV/GNP to BSEYD and P/E ratio


Lleo and Ziemba (2017) found that the BSEYD and P/E ratio were statistically significant predictors of equity
market corrections on the S&P 500 using daily data from 1964 to 2014. So far, our study has shown that the
MV/GNP ratio is also a statistically significant predictor of equity market corrections using quarterly data from
1971 to 2016. So, how do the MV/GNP, BSEYD, and P/E ratio compare with each other?

Table 9. Monte Carlo likelihood ratio test: 8% equity market decline.

Signal Model | Total number of signals | ML estimate p̂ | Critical value: 90% confidence | Critical value: 95% confidence | Critical value: 99% confidence | Test statistic −2 ln Λ(p0) | Empirical p-value
MV/GNP (fixed threshold) 2 100.00% 3.01 3.01 3.01 – –
MV/GNP (confidence interval) 11 81.82% 3.09 3.91 7.37 5.83∗∗ 3.02%
MV/GNP (Cantelli) 11 81.82% 3.09 3.91 7.37 5.83∗∗ 3.03%
log(MV/GNP) (fixed threshold) 2 100.00% 3.71 3.71 3.71 – –
log(MV/GNP) (confidence interval) 10 80.00% 3.08 4.72 6.32 4.72∗ 5.11%
log(MV/GNP) (Cantelli) 10 80.00% 3.08 4.72 6.32 4.72∗ 5.34%
Notes: The Total Number of Signals is calculated as the sum of all the elements of the indicator sequence S. The Maximum Likelihood estimate p̂
is the probability of forecasting a correction. It maximizes the likelihood function of the model. It is equal to the ratio of the number of correct
forecasts to the total number of signals. Columns 4 to 6 report the critical values at a 90%, 95% and 99% confidence level for the empirical
distribution generated using K = 10,000 Monte-Carlo simulations. The test statistic in column 7 is equal to −2 ln Λ(p0) = −2 ln(L(p0|X)/L(p̂|X)).
The levels of significance are based on the empirical distribution. The p-value is the probability of obtaining a test statistic higher than the one
actually observed, assuming that the null hypothesis is true. The degree of significance indicated in the test statistic and p-value are both based
on an empirical distribution generated through Monte-Carlo simulations.
∗ significant at the 10% level;
∗∗ significant at the 5% level;
∗∗∗ significant at the 1% level.

5.1. Constructing the BSEYD and P/E-Based prediction models


To construct a meaningful comparison with the MV/GNP measure, we need to use the same period (1971 to
2016) and quarterly frequency for the BSEYD and P/E measures. We expect that changing from a daily frequency
to a quarterly frequency will reduce the accuracy of the BSEYD and P/E.
We test four measures: P/E ratio, log(P/E) which is computed as the logarithm of the P/E ratio, BSEYD and
logBSEYD. The logBSEYD is defined as
logBSEYD(t) = log(r(t)/ρ(t)), (17)
where r(t) is the yield of a 10-year constant maturity U.S. Treasury Note (source: Board of Governors of the
Federal Reserve System), and ρ(t) is the earnings yield on the S&P500 calculated as the ratio of the earnings to
the level of the S&P500 or alternatively, as the reciprocal of the P/E ratio.
We use four definitions of earnings: current nominal earnings (E0), current real earnings (R0), average nom-
inal earnings over ten years (E10), and average real earnings over ten years (R10). Earnings data come from
Robert Shiller’s database.
In addition, we compute the threshold using two methods: confidence intervals and Cantelli’s inequality.
Therefore, we construct 32 models: 4 choices of measures (P/E, log(P/E), BSEYD, logBSEYD) × 4 definitions
of earnings (E0, R0, E10, R10) × 2 threshold calculation methods (confidence and Cantelli). These 32 models
include the four most popular models:

(1) The P/E ratio based on current earnings (P/E E0), commonly reported in the financial press;
(2) The P/E ratio based on ten-year average earnings (P/E E10), advocated by Graham and Dodd (1934);
(3) The BSEYD model (BSEYD E0), proposed by Ziemba and Schwartz (1991); and
(4) The CAPE (P/E R10), proposed by Robert Shiller.
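The 4 × 4 × 2 grid of models can be enumerated mechanically. The sketch below is illustrative (the dictionary layout and the helper for Equation (17) are our assumptions, not the authors' code):

```python
import math
from itertools import product

MEASURES = ("P/E", "log(P/E)", "BSEYD", "logBSEYD")
EARNINGS = ("E0", "R0", "E10", "R10")   # nominal/real, spot/10-year average
THRESHOLDS = ("confidence interval", "Cantelli")

# 4 measures x 4 earnings definitions x 2 threshold methods = 32 models.
models = [{"measure": m, "earnings": e, "threshold": t}
          for m, e, t in product(MEASURES, EARNINGS, THRESHOLDS)]
assert len(models) == 32

def log_bseyd(r: float, rho: float) -> float:
    """Equation (17): logBSEYD(t) = log(r(t)/rho(t)), with r the 10-year
    Treasury yield and rho the S&P 500 earnings yield (reciprocal of P/E)."""
    return math.log(r / rho)
```

The measure is positive when the bond yield exceeds the earnings yield, a situation the BSEYD literature associates with overvalued equities.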
5.2. Results and discussion
Tables 4 and 5 already presented statistical results for the six MV/GNP models. Tables 10 and 11 do the same
for the 32 BSEYD and P/E models.
Table 10 presents the number of signals, number of correct forecasts, maximum likelihood estimate for the
accuracy of measure, and statistics related to the likelihood test. The results vary strongly across models. The
number of signals ranges from 6 to 18 out of 20 equity market corrections, and the accuracy from 55% to 100%.
THE EUROPEAN JOURNAL OF FINANCE 387
Table 10. Maximum likelihood estimate and likelihood ratio test: uninformed prior.

Model | Total number of signals | Number of correct forecasts | ML Estimate p̂ | L(p̂) | Likelihood ratio Λ | Test statistic −2 ln Λ | p-value
Standard Measures - Confidence Intervals
BSEYD E0 12 9 75.00% 1.17E-03 3.78E-02 6.55∗∗ 1.05%
BSEYD R0 14 11 78.57% 6.93E-04 9.55E-03 9.30∗∗∗ 0.23%
P/E E0 11 6 54.55% 5.11E-04 5.66E-01 1.14 28.63%
P/E R0 14 10 71.43% 2.30E-04 4.56E-02 6.17∗∗ 1.30%
BSEYD E10 6 4 66.67% 2.19E-02 3.82E-01 1.92 16.55%
BSEYD R10 9 7 77.78% 8.50E-03 5.69E-02 5.73∗∗ 1.67%
P/E E10 10 9 90.00% 3.87E-02 3.04E-03 11.59∗∗∗ 0.07%
P/E R10 13 9 69.23% 3.27E-04 8.31E-02 4.98∗∗ 2.57%
Standard Measures - Cantelli
BSEYD E0 12 9 75.00% 1.17E-03 3.78E-02 6.55∗∗ 1.05%
BSEYD R0 15 11 73.33% 1.67E-04 2.43E-02 7.43∗∗∗ 0.64%
P/E E0 11 6 54.55% 5.11E-04 5.66E-01 1.14 28.63%
P/E R0 15 10 66.67% 7.14E-05 9.04E-02 4.81∗∗ 2.83%
BSEYD E10 6 4 66.67% 2.19E-02 3.82E-01 1.92 16.55%
BSEYD R10 10 7 70.00% 2.22E-03 1.34E-01 4.03∗ 4.48%
P/E E10 11 9 81.82% 5.43E-03 1.33E-02 8.64∗∗∗ 0.33%
P/E R10 14 9 64.29% 1.09E-04 1.53E-01 3.75∗ 5.27%
Log Measures - Confidence Intervals
logBSEYD E0 14 9 64.29% 1.09E-04 1.53E-01 3.75∗ 5.27%
logBSEYD R0 18 12 66.67% 1.06E-05 5.59E-02 5.77∗∗ 1.63%
log(P/E) E0 11 6 54.55% 5.11E-04 5.66E-01 1.14 28.63%
log(P/E) R0 14 9 64.29% 1.09E-04 1.53E-01 3.75∗ 5.27%
logBSEYD E10 7 5 71.43% 1.52E-02 2.14E-01 3.09∗ 7.89%
logBSEYD R10 10 7 70.00% 2.22E-03 1.34E-01 4.03∗∗ 4.48%
log(P/E) E10 9 9 100.00% – – – –
log(P/E) R10 11 9 81.82% 5.43E-03 1.33E-02 8.64∗∗∗ 0.33%
Log Measures - Cantelli
logBSEYD E0 14 9 64.29% 1.09E-04 1.53E-01 3.75∗ 5.27%
logBSEYD R0 18 12 66.67% 1.06E-05 5.59E-02 5.77∗∗ 1.63%
log(P/E) E0 11 6 54.55% 5.11E-04 5.66E-01 1.14 28.63%
log(P/E) R0 15 9 60.00% 4.13E-05 2.48E-01 2.79∗ 9.50%
logBSEYD E10 7 5 71.43% 1.52E-02 2.14E-01 3.09∗ 7.89%
logBSEYD R10 10 7 70.00% 2.22E-03 1.34E-01 4.03∗∗ 4.48%
log(P/E) E10 10 9 90.00% 3.87E-02 3.04E-03 11.59∗∗∗ 0.07%
log(P/E) R10 12 9 75.00% 1.17E-03 3.78E-02 6.55∗∗ 1.05%
Notes: The Total Number of Signals is calculated as the sum of all the elements in the indicator sequence S. The Number of Correct Forecasts tallies
corrections preceded by a signal. It is calculated as the sum of all the elements in the indicator sequence X. The Maximum Likelihood estimate p̂
is the probability of forecasting a correction. It maximizes the likelihood function of the model. It is equal to the ratio of the number of correct
forecasts to the total number of signals. L(p̂) is the likelihood of the correction forecasting model, computed using the maximum likelihood
estimate p̂. The likelihood ratio Λ = L(p0|X)/L(p̂|X) is the ratio of the likelihood under the null hypothesis p = p0 to the likelihood using the estimated probability p̂. The test statistic, equal to −2 ln Λ, is asymptotically χ2-distributed with 1 degree of freedom. The p-value is the probability of obtaining a test statistic higher than the one actually observed, assuming that the null hypothesis is true. The degree
of significance and the p-value indicated in the table are both based on this distribution. The critical values at the 95%, 99% and 99.5% level are
respectively 3.84, 6.63 and 7.88.
∗ significant at the 10% level;
∗∗ significant at the 5% level;
∗∗∗ significant at the 1% level.

Seven measures are significant at a 1% level. Another 11 are significant at a 5% level. Six measures are not
significant. This variability across measures is to be expected because correction predictions tend to be more
stable with more frequent measurements. Moreover, the choice of threshold method does not change the results
substantially.
Accuracy and the number of signals generated determine the statistical significance of a measure. Four mod-
els perform best: BSEYD R0, P/E E10, and log(P/E) E10 (significant at 1%), and logBSEYD R0 (significant
at 5%). However, these four models differ in their accuracy and the number of signals they generated. P/E E10
and log(P/E) E10 are more accurate, with an accuracy of 82% to 100%, versus 67% to 79% for BSEYD R0 and
Table 11. Monte Carlo likelihood ratio test.

Signal Model | Total number of signals | ML Estimate p̂ | Critical Value: 90% confidence | Critical Value: 95% confidence | Critical Value: 99% confidence | Test statistic −2 ln Λ(p0) | Empirical p-value
Standard Measures - Confidence Intervals
BSEYD E0 12 75.00% 2.76 3.85 6.55 6.55∗∗ 1.28%
BSEYD R0 14 78.57% 3.75 4.04 7.39 9.30∗∗∗ 0.31%
PE E0 11 54.55% 2.80 4.97 5.25 1.14 36.29%
PE R0 14 71.43% 3.75 4.04 7.39 6.17∗∗ 2.40%
BSEYD E10 6 66.67% 1.92 5.86 5.86 1.92 22.32%
BSEYD R10 9 77.78% 3.44 3.44 8.79 5.73∗∗ 3.12%
PE E10 10 90.00% 1.86 4.19 7.16 11.59∗∗∗ 0.09%
PE R10 13 69.23% 2.78 3.39 6.57 4.98∗∗ 4.08%
Standard Measures - Cantelli
BSEYD E0 12 75.00% 2.76 3.85 6.55 6.55∗∗ 1.58%
BSEYD R0 15 73.33% 2.79 4.72 7.43 7.43∗∗ 1.34%
PE E0 11 54.55% 2.80 4.97 5.25 1.14 35.90%
PE R0 15 66.67% 2.79 4.72 7.43 4.81∗ 5.96%
BSEYD E10 6 66.67% 1.92 5.86 5.86 1.92 22.32%
BSEYD R10 10 70.00% 4.03 4.19 7.16 4.03 10.31%
PE E10 11 81.82% 2.80 4.97 8.64 8.64∗∗ 1.09%
PE R10 14 64.29% 3.75 4.04 7.39 3.75 10.01%
Log Measures - Confidence Intervals
logBSEYD E0 14 64.29% 3.75 4.04 7.39 3.75 10.10%
logBSEYD R0 18 66.67% 2.21 4.14 6.87 5.77∗∗ 2.67%
logPE E0 11 54.55% 2.80 4.97 5.25 1.14 36.29%
logPE R0 14 64.29% 3.75 4.04 7.39 3.75 10.10%
logBSEYD E10 7 71.43% 3.09 3.09 6.84 3.09 11.42%
logBSEYD R10 10 70.00% 1.86 4.19 7.16 4.03∗ 9.76%
logPE E10 9 100.00% 3.44 3.44 8.79 – –
logPE R10 11 81.82% 2.80 4.97 5.25 8.64∗∗∗ 0.81%
Log Measures - Cantelli
logBSEYD E0 14 64.29% 1.95 4.04 7.39 3.75∗ 9.51%
logBSEYD R0 18 66.67% 2.21 4.14 6.87 5.77∗∗ 2.52%
logPE E0 11 54.55% 2.80 4.97 5.25 1.14 35.90%
logPE R0 15 60.00% 2.79 4.72 7.43 2.79 11.52%
logBSEYD E10 7 71.43% 3.09 6.65 6.84 3.09 12.09%
logBSEYD R10 10 70.00% 4.03 4.19 7.16 4.03 10.19%
logPE E10 10 90.00% 4.03 4.19 7.16 11.59∗∗∗ 0.12%
logPE R10 12 75.00% 2.76 3.85 6.55 6.55∗∗ 1.46%
Notes: The Total Number of Signals is calculated as the sum of all the elements of the indicator sequence S. The Maximum Likelihood estimate p̂
is the probability of forecasting a correction. It maximizes the likelihood function of the model. It is equal to the ratio of the number of correct
forecasts to the total number of signals. Columns 4 to 6 report the critical values at a 90%, 95% and 99% confidence level for the empirical distribution generated using K = 10,000 Monte-Carlo simulations. The test statistic in column 7 is equal to −2 ln Λ(p0) = −2 ln(L(p0|X)/L(p̂|X)).
The levels of significance are based on the empirical distribution. The p-value is the probability of obtaining a test statistic higher than the one
actually observed, assuming that the null hypothesis is true. The degree of significance indicated in the test statistic and p-value are both based
on an empirical distribution generated through Monte-Carlo simulations.
∗ significant at the 10% level;
∗∗ significant at the 5% level;
∗∗∗ significant at the 1% level.
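The empirical critical values in the Monte Carlo tables come from simulating the test statistic under the null. A sketch of that procedure (the K = 10,000 default follows the notes; the seeding and interface are our choices):

```python
import math
import random

def mc_critical_values(n_signals, p0, K=10_000, levels=(0.90, 0.95, 0.99), seed=1):
    """Simulate the finite-sample distribution of -2 ln Lambda(p0) under
    H0: each of n_signals forecasts is correct with probability p0, and
    return the empirical quantiles used as critical values."""
    rng = random.Random(seed)

    def stat(m):
        p_hat = m / n_signals

        def ll(p):
            out = 0.0
            if m > 0:
                out += m * math.log(p)
            if n_signals - m > 0:
                out += (n_signals - m) * math.log(1.0 - p)
            return out

        return -2.0 * (ll(p0) - ll(p_hat))

    # One Bernoulli draw per signal, K simulated statistics in total.
    draws = sorted(stat(sum(rng.random() < p0 for _ in range(n_signals)))
                   for _ in range(K))
    return [draws[min(int(level * K), K - 1)] for level in levels]

cv90, cv95, cv99 = mc_critical_values(n_signals=12, p0=0.39)
assert 0 <= cv90 <= cv95 <= cv99  # quantiles are ordered
```

Because the number of signals is small, these simulated critical values can differ noticeably from the asymptotic χ² values of 2.71, 3.84 and 6.63.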

logBSEYD R0. But BSEYD R0 and log BSEYD R0 generate more signals than P/E E10 and log(P/E) E10, with 14
to 18 signals versus 10 to 11 signals out of 20 equity market corrections.
Three other models (BSEYD E0, P/E R0 and P/E R10) deliver encouraging results. The worst performing
models are P/E E0, BSEYD E10, and their logarithmic versions.
The Monte Carlo study in Table 11 confirms these results, accentuating the gap between the best and worst
models.
How does the MV/GNP ratio compare with these models in terms of accuracy and number of signals
generated? The MV/GNP ratio based on a fixed threshold drops out of the race because it produces too few
signals (2 out of 20 equity market corrections). The MV/GNP ratio based on a dynamic threshold (confidence

interval or Cantelli’s inequality) performs better than the worst BSEYD and P/E measures, but not as well as the
best ones.
The MV/GNP ratio does not perform as well as P/E E10, BSEYD R0 and their logarithms. Compared with
P/E E10 and its logarithm, the MV/GNP ratio generates a similar number of signals, but is less accurate (73%
to 80% versus 82% to 100%). Compared with BSEYD R0, the MV/GNP ratio is slightly more accurate, but it
produces a quarter fewer signals (11 versus 14 to 18). Compared with logBSEYD R0, the log(MV/GNP) ratio is
more accurate (80% vs 67%), but it generates nearly 45% fewer signals (10 versus 18).
The MV/GNP ratio performs nearly as well as BSEYD E0, P/E R0, P/E R10 and their logarithms. Compared
with BSEYD E0, the MV/GNP ratio is slightly less accurate because it generates one fewer signal. Compared
with P/E R0 and P/E R10, the MV/GNP ratio is a bit more accurate (73% versus 64%-71%), but it produced
fewer signals (11 versus 13-14). Compared with logBSEYD E0 and log(P/E) R0, the log(MV/GNP) model
is substantially more accurate (80% versus 64%), but it generates fewer signals (10 versus 14 to 15). Finally,
log(MV/GNP) is broadly comparable to log(P/E) R10. It is a bit more accurate (80% versus 75%) but generates
slightly fewer signals (10 versus 12).
Overall, the MV/GNP ratio performs relatively well against the four most popular BSEYD and P/E-based models: it surpasses the P/E ratio based on current earnings (P/E E0), trails the P/E ratio based on ten-year average earnings (P/E E10), and broadly equals the BSEYD model (BSEYD E0) and the CAPE (P/E R10).

6. Can Warren Buffett also forecast equity market rallies?


Warren Buffett’s MV/GNP ratio can forecast equity market corrections reasonably well. Can it also predict equity
market rallies?
Similarly to our definition of equity market corrections, we define an equity market rally as an increase of at
least 10% in the S&P500 index level from trough to peak based on closing prices, over a maximum period of one
year (252 trading days). Table 12 presents all 25 rallies that occurred over the period. On average, a rally lasted
419 days, with a 37% rise in the level of the S&P 500 index. Seventeen rallies were bull markets, with an increase
of at least 20% in the level of the S&P 500 index.
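The trough-to-peak identification rule can be sketched as follows. This is a simplified illustration of the definition above, not the authors' exact algorithm (which also handles overlapping episodes and the identification dates reported in Table 12):

```python
def find_rallies(closes, min_rise=0.10, max_window=252):
    """Scan closing prices for trough-to-peak rises of at least min_rise
    (10%) within max_window trading days (one year)."""
    rallies = []
    t = 0
    while t < len(closes) - 1:
        trough = t
        peak = t
        # Find the highest close within the one-year window after the trough.
        for u in range(t + 1, min(t + max_window + 1, len(closes))):
            if closes[u] > closes[peak]:
                peak = u
        rise = closes[peak] / closes[trough] - 1.0
        if rise >= min_rise and peak > trough:
            rallies.append((trough, peak, rise))
            t = peak + 1  # resume the scan after the identified peak
        else:
            t += 1
    return rallies

# Toy series: a 20% run-up followed by a flat stretch yields one rally.
prices = [100, 102, 105, 110, 120, 119, 118]
assert len(find_rallies(prices)) == 1
```

Replacing the 10% threshold with 20% would flag bull markets instead of rallies, and reversing the inequality recovers the corrections definition used earlier in the paper.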

Table 12. The S&P 500 index experienced 25 market rallies between January 1, 1971 and September 30, 2016.

# | Peak Identification Date | Trough Date | S&P Index at Trough | Peak Date | S&P Index at Peak | Trough-to-peak rise (%) | Trough-to-peak duration (in days)
1 1973-02-12 1972-02-14 104.59 1973-02-13 116.78 11.7% 365
2 1973-10-11 1973-08-22 100.53 1973-10-12 111.44 10.9% 51
3 1974-03-13 1974-02-11 90.66 1974-03-13 99.74 10.0% 30
4 1974-10-10 1974-10-03 62.28 1975-07-15 95.61 53.5% 285
5 1978-04-24 1978-03-06 86.9 1978-09-12 106.99 23.1% 190
6 1979-04-24 1978-11-14 92.49 1980-02-13 118.44 28.1% 456
7 1980-05-22 1980-03-27 98.22 1980-11-28 140.52 43.1% 246
8 1981-11-02 1981-09-25 112.77 1983-10-10 172.65 53.1% 745
9 1984-08-07 1984-07-24 147.82 1987-08-25 336.77 127.8% 1127
10 1987-10-21 1987-10-19 224.84 1989-10-09 359.80 60.0% 721
11 1990-07-30 1990-01-30 322.98 1992-01-15 420.77 30.3% 715
12 1993-01-14 1992-04-08 394.5 1993-03-10 456.33 15.7% 336
13 1995-02-15 1994-04-04 438.92 1998-07-17 1157.76 163.8% 1565
14 2000-02-22 1999-03-02 1225.5 2000-03-24 1527.46 24.6% 388
15 2001-04-18 2001-04-04 1103.25 2001-05-21 1312.83 19.0% 47
16 2001-10-03 2001-09-21 965.8 2002-01-04 1172.51 21.4% 105
17 2002-07-29 2002-07-23 797.7 2004-02-11 1157.76 45.1% 568
18 2005-01-26 2004-08-12 1063.23 2005-03-07 1225.31 15.2% 207
19 2005-11-21 2005-04-20 1137.5 2006-03-17 1307.25 14.9% 331
20 2007-08-17 2006-08-23 1292.99 2007-10-09 1565.15 21.0% 412
21 2008-05-1 2008-03-10 1273.37 2008-05-19 1426.63 12.0% 70
22 2008-10-13 2008-10-10 899.22 2010-04-23 1217.28 35.4% 560
23 2010-09-15 2010-07-02 1022.58 2011-04-29 1363.61 33.3% 301
24 2011-10-14 2011-10-03 1099.23 2012-04-02 1419.04 29.1% 182
25 2014-10-21 2014-02-03 1741.89 2015-05-21 2130.82 22.3% 472

Given a forecasting measure M(t), a signal SIGNAL(t) occurs at time t whenever the measure crosses below a threshold B(t):

SIGNAL(t) = M(t) − B(t) < 0 (18)

We examine the following six prediction models:

(1) MV/GNP with fixed threshold at 80%;
(2) MV/GNP with threshold computed using a standard 95% one-tail confidence interval based on a normal distribution;
(3) MV/GNP with threshold computed using Cantelli’s inequality;
(4) log(MV/GNP) with fixed threshold at 80%;
(5) log(MV/GNP) with threshold computed using a standard 95% one-tail confidence interval based on a normal distribution; and
(6) log(MV/GNP) with threshold computed using Cantelli’s inequality.
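The time-varying thresholds in models (2), (3), (5) and (6) can be sketched from a rolling sample mean and standard deviation. The window length, α level, and interface below are illustrative assumptions, not the paper's exact calibration:

```python
import math
import statistics

def rally_signal(measure, t, horizon=4, alpha=0.05, method="cantelli"):
    """Equation (18): SIGNAL(t) = 1 if M(t) < B(t), where B(t) is a lower
    threshold built from the rolling mean and standard deviation of the
    measure over the preceding `horizon` observations.

    With a normal one-tail 95% confidence interval, B(t) = mean - 1.645*std.
    Under Cantelli's inequality, P(X <= mu - k*sigma) <= 1/(1 + k^2), so
    setting 1/(1 + k^2) = alpha gives k = sqrt(1/alpha - 1)."""
    window = measure[t - horizon:t]
    mu = statistics.mean(window)
    sigma = statistics.stdev(window)
    if method == "cantelli":
        k = math.sqrt(1.0 / alpha - 1.0)
    else:  # normal one-tail confidence interval at 95%
        k = 1.645
    threshold = mu - k * sigma
    return measure[t] < threshold
```

Symmetrically, the correction signals earlier in the paper fire when the measure crosses an *upper* threshold mu + k*sigma.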

Table 13 presents the maximum likelihood estimate, likelihood ratio and test statistic for each of the six models. The uninformed prior probability p0 is the probability of identifying an equity market rally within 4 quarters of a randomly selected date. We find that p0 = 51%, noticeably higher than the probability of identifying an equity market correction at random (39%). The MV/GNP with fixed threshold, the MV/GNP using Cantelli’s inequality, and the log(MV/GNP) using Cantelli’s inequality are 100% accurate, but they only produced 5 to 6 signals. Despite its high accuracy, the log(MV/GNP) using a confidence interval is significant at a 10% level only, while the log(MV/GNP) with fixed threshold and the MV/GNP using a confidence interval are not significant. Overall, none of the models succeeded in forecasting more than 30% of all rallies.
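The uninformed prior p0 is simply the unconditional hit rate of a random-date 'forecast'. A minimal sketch (using a 365-day horizon as a stand-in for the paper's 4 quarters):

```python
from datetime import date

def uninformed_prior(sample_dates, event_dates, horizon_days=365):
    """Fraction of observation dates followed by at least one event start
    (rally or correction trough) within horizon_days: the probability of
    calling an event from a randomly selected date, used as the null p0."""
    hits = sum(
        any(0 < (e - d).days <= horizon_days for e in event_dates)
        for d in sample_dates
    )
    return hits / len(sample_dates)

# Toy example: only the first observation date is followed by an event.
obs = [date(2000, 1, 1), date(2001, 1, 1), date(2005, 1, 1)]
events = [date(2000, 6, 1)]
assert abs(uninformed_prior(obs, events) - 1 / 3) < 1e-12
```

A high p0, as for rallies here, makes it harder for any signal to beat the random benchmark by a statistically significant margin.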
The Monte-Carlo study (Table 14) is broadly consistent with the results of the likelihood ratio test based on
the asymptotic χ 2 distribution.
To summarize, the MV/GNP ratio is not particularly helpful as a predictor of equity market rallies. Although
half of the models have perfect accuracy on the rallies they predicted, and the other half were 75% to 86%
accurate, all six models missed a majority of rallies. Because of the low number of predictions and the relatively high uninformed prior probability p0, it is difficult to establish the significance of measures based on the MV/GNP.

Table 13. Maximum likelihood estimate and likelihood ratio test: uninformed prior.

Model | Total number of signals | Number of correct forecasts | ML Estimate p̂ | L(p̂) | Likelihood ratio Λ | Test statistic −2 ln Λ | p-value
MV/GNP (fixed threshold) 5 5 100.00% – – – –
MV/GNP (confidence interval) 8 6 75.00% 1.11E-02 3.84E-01 1.92 16.64%
MV/GNP (Cantelli) 6 6 100.00% – – – –
log(MV/GNP) (fixed threshold) 5 4 80.00% 8.19E-02 4.08E-01 1.79 18.05%
log(MV/GNP) (confidence interval) 7 6 85.71% 5.67E-02 1.54E-01 3.74∗ 5.32%
log(MV/GNP) (Cantelli) 5 5 100.00% – – – –
Notes: The Total Number of Signals is calculated as the sum of all the elements in the indicator sequence S. The Number of Correct Forecasts tallies rallies preceded by a signal. It is calculated as the sum of all the elements in the indicator sequence X. The Maximum Likelihood estimate p̂ is the probability of forecasting a rally. It maximizes the likelihood function of the model. It is equal to the ratio of the number of correct forecasts to the total number of signals. L(p̂) is the likelihood of the rally forecasting model, computed using the maximum likelihood estimate p̂. The likelihood ratio Λ = L(p0|X)/L(p̂|X) is the ratio of the likelihood under the null hypothesis p = p0 to the likelihood using the estimated probability p̂. The test statistic, equal to −2 ln Λ, is asymptotically χ2-distributed with 1 degree of freedom. The p-value is the probability of obtaining a test statistic higher than the one actually observed, assuming that the null hypothesis is true. The degree
of significance and the p-value indicated in the table are both based on this distribution. The critical values at the 95%, 99% and 99.5% level are
respectively 3.84, 6.63 and 7.88.
∗ significant at the 10% level;
∗∗ significant at the 5% level;
∗∗∗ significant at the 1% level.

Table 14. Monte Carlo likelihood ratio test.

Signal Model | Total number of signals | ML Estimate p̂ | Critical Value: 90% confidence | Critical Value: 95% confidence | Critical Value: 99% confidence | Test statistic −2 ln Λ(p0) | Empirical p-value
MV/GNP (fixed threshold) 5 100.00% 2.07 6.71 7.16 – –
MV/GNP (confidence interval) 8 75.00% 2.28 4.79 5.34 1.92 28.83%
MV/GNP (Cantelli) 6 100.00% 3.10 3.10 8.59 – –
log(MV/GNP) (fixed threshold) 5 80.00% 2.07 6.71 7.16 1.79 36.93%
log(MV/GNP) (confidence interval) 7 85.71% 3.74 4.19 9.39 3.74 12.95%
log(MV/GNP) (Cantelli) 5 100.00% 2.07 6.71 7.16 – –
Notes: The Total Number of Signals is calculated as the sum of all the elements of the indicator sequence S. The Maximum Likelihood estimate p̂ is the probability of forecasting a rally. It maximizes the likelihood function of the model. It is equal to the ratio of the number of correct forecasts to the total number of signals. Columns 4 to 6 report the critical values at a 90%, 95% and 99% confidence level for the empirical distribution generated using K = 10,000 Monte-Carlo simulations. The test statistic in column 7 is equal to −2 ln Λ(p0) = −2 ln(L(p0|X)/L(p̂|X)).
The levels of significance are based on the empirical distribution. The p-value is the probability of obtaining a test statistic higher than the one
actually observed, assuming that the null hypothesis is true. The degree of significance indicated in the test statistic and p-value are both based
on an empirical distribution generated through Monte-Carlo simulations.
∗ significant at the 10% level;
∗∗ significant at the 5% level;
∗∗∗ significant at the 1% level.

7. Conclusion
Our analysis shows that Warren Buffett’s market value of all publicly traded securities as a percentage of GNP
(MV/GNP), and its logarithmic counterpart log(MV/GNP), can be statistically significant predictors of future market corrections. However, for
these measures to work, we need to use time-varying confidence-based thresholds rather than fixed thresholds.
This conclusion dispels a common myth about the MV/GNP ratio: that its absolute level is what matters. This myth has led market commentators and investment practitioners to suggest that the level of the MV/GNP is the harbinger of an impending market meltdown. After all, they argue, the MV/GNP is near its highest point since the Dot.Com
bubble. Our findings indicate that this claim cannot be substantiated. Using an arbitrary threshold fixed at 120%,
the MV/GNP would have signaled at most 2 out of the 20 equity market corrections that occurred between
1971 Q1 and 2016 Q3. This is not to say that the absolute level of the ratio is altogether irrelevant. In itself,
the level does not provide sufficient evidence to forecast most equity market corrections. In fact, as soon as we
change the fixed level for a simple stochastic threshold based on the rolling horizon sample mean and standard
deviation, the number of predictions soars to 10 or 11, with a 70% to 80% accuracy, and a strong statistical
significance.
Our comparative analysis shows that the MV/GNP ratio performs relatively well against the four most pop-
ular BSEYD and P/E-based models, but it predicts fewer equity market corrections than most BSEYD and
P/E-based models. We also found that the MV/GNP ratio was not a particularly useful predictor of equity market rallies.
A major and practical limitation of the MV/GNP ratio is its reliance on the GNP, which is only released
quarterly and subject to revisions. This reliance prevents the same timely measurements as with other crash
and equity correction measures. These timely measurements are crucial to identify and anticipate possible mar-
ket corrections. The recent development of a number of nowcasting methodologies, including the Atlanta Fed’s
GDPNow, responds to a rising demand for more frequent and accurate forecasts. Until a ‘GNPNow’ is available, we could consider using GDPNow and compute MV/GDP as a proxy for MV/GNP. However, we do not currently have enough GDPNow data to test this idea.
It is time to go back to the worried investors that we met in the introduction. If the MV/GNP ratio is currently
sending a mixed signal, what do other measures tell us? At the time of writing, Robert Shiller’s CAPE was at
32.09, much higher than its historical mean at 16.7, but still lower than its Black Tuesday peak reached at 32.6 in
September 1929, or its Dot.Com peak reached at 44.2 in December 1999 (Shiller 2018). Neither the BSEYD nor
William Ziemba’s proprietary put-call measures indicated that an equity market downturn is imminent. Overall,
the crash forecasting measures are not signaling an impending market meltdown.

This investigation also uncovered five additional research questions. If the MV/GNP is a statistically signif-
icant predictor of equity market corrections, can we also use it to forecast abnormally high returns? We have
established that the MV/GNP ratio works in the U.S. market. Does it also work in international markets? What
would be the loss of accuracy if we used a consensus forecast or a forecasting model instead of using the final
GNP release? Finally, would combining several equity correction forecasting models produce a more accurate
forecast of equity market corrections? Can we build a meta model?
Acknowledgements
The first author gratefully acknowledges support from Région Champagne Ardennes and the European Union through the RiskPerform Grant. We also wish to express our gratitude to two anonymous referees for their helpful comments and suggestions.

Data Reference
Board of Governors of the Federal Reserve System. H.15 Selected Interest Rates, https://www.federalreserve.gov/datadownload/Choose.aspx?rel=H15, last accessed on May 18, 2018.
Federal Reserve Bank of St. Louis, Federal Reserve Economic Data (FRED), https://fred.stlouisfed.org, last accessed on March 22, 2017.
Bloomberg, obtained from a Bloomberg terminal on May 17, 2018 via a Bloomberg Excel Add-In.
Robert Shiller. U.S. Stock Markets 1871–Present and CAPE Ratio, available at http://www.econ.yale.edu/~shiller/data.htm, last accessed on May 18, 2018.

Notes
1. When using different thresholds, the identification algorithm will produce slightly different results. Hence the total number of declines at or above 10% in Figure 3, which uses a threshold of 5%, will not sum to 20 as presented in Table 1.
2. See, for example, Problem 7.11.9 in Grimmett and Stirzaker (2001).
3. See for example pp. 45, 98, 104, 117 in Fisher (1933).

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This work was supported by Region Champagne Ardenne and European Union [RiskPerform].

ORCID
S. Lleo http://orcid.org/0000-0002-0732-2833

References
Abreu, D., and M. K. Brunnermeier. 2003. “Bubbles and Crashes.” Econometrica 71 (1): 173–204.
Allen, F., and G. Gorton. 1993. “Churning Bubbles.” The Review of Economic Studies 60 (4): 813–836.
Baker, M., and J. Wurgler. 2006. “Investor Sentiment and the Cross-Section of Stock Returns.” The Journal of Finance 61 (4):
1645–1680.
Barone-Adesi, G., L. Mancini, and H. Shefrin. 2013. A Tale of Two Investors: Estimating Optimism and Overconfidence. Technical
Report.
Blanchard, O. J., and M. W. Watson. 1982. “Bubbles, Rational Expectations, and Financial Markets.” In Crisis in the Economic and
Financial Structure, edited by P. Wachtel, 295–315. Cambridge, MA.
Buffett, W., and C. Loomis. 2001. “Warren Buffett on the Stock Market.” FORTUNE Magazine.
Camerer, C. 1989. “Bubbles and Fads in Asset Prices.” Journal of Economic Survey 3 (1): 3–41.
Campbell, J. Y., and R. J. Shiller. 1988. “Stock Prices, Earnings, and Expected Dividends.” The Journal of Finance 43 (3): 661–676.
Papers and Proceedings of the Forty-Seventh Annual Meeting of the American Finance Association, Chicago, Illinois, December
28–30, 1987.
Campbell, J. Y., and R. J. Shiller. 1998. “Valuation Ratios and the Long-Run Stock Market Outlook.” The Journal of Portfolio
Management 24: 11–26.

Clenfield, J., and A. Haigh. 2017. “Why Robert Shiller Is Worried About the Trump Rally.” https://www.bloomberg.com/news/
articles/2017-03-14/as-trump-charms-wall-street-robert-shiller-gets-dot-com-deja-vu, 03.
Cox, J. 2017. “George Soros loaded up with big bets against the stock market.” http://www.cnbc.com/2017/02/17/george-soros-
loaded-up-with-big-bets-against-the-stock-market.html, 02.
Diba, B. T., and H. I. Grossman. 1988. “The Theory of Rational Bubbles in Stock Prices.” The Economic Journal 98 (392): 746–754.
Fisher, R. A. 1933. Statistical Methods for Research Workers. 5th ed. Edinburgh and London: Oliver & Boyd.
Fisher, R. A. 1955. “Statistical Methods and Scientific Inference.” Journal of the Royal Statistical Society, Series B 17 (1): 69–78.
Fisher, K., and M. Statman. 2000. “Investor Sentiment and Stock Returns.” Financial Analysts Journal 56: 16–23.
Fisher, K., and M. Statman. 2003. “Consumer Confidence and Stock Returns.” The Journal of Portfolio Management 30: 115–127.
Flood, R. P., R. J. Hodrick, and P. Kaplan. 1986. “An Evaluation of Recent Evidence on Stock Market Bubbles.” NBER Working Paper
No. w1971.
Goetzmann, W., D. Kim, and R. Shiller. 2016. “Crash Beliefs from Investor Surveys.”
Graham, B., and D. L. Dodd. 1934. Security Analysis. 1st ed. New York, NY: Whittlesey House.
Gresnigt, F., E. Kole, and P. H. Franses. 2015. “Interpreting financial market crashes as earthquakes: A new Early Warning System
for medium term crashes.” Journal of Banking & Finance 56: 123–139.
Grimmett, G., and D. Stirzaker. 2001. Probability and Random Processes. Oxford, UK: Oxford University Press.
Jarrow, R. A. 2012. “Detecting Asset Price Bubbles.” The Journal of Derivatives 20 (1): 30–34.
Jarrow, R. A., Y. Kchia, and P. Protter. 2011a. “How to Detect an Asset Bubble.” SIAM Journal on Financial Mathematics 2:
839–865.
Jarrow, R. A., Y. Kchia, and P. Protter. 2011b. “Is There a Bubble in LinkedIn’s Stock Price?” The Journal of Portfolio Management 38
(1): 125–130.
Jarrow, R. A., Y. Kchia, and P. Protter. 2011c. “A Real Time Bubble Detection Methodology.” Bloomberg Risk Newsletter.
Kelly, K. 2016. “George Soros loads up on bearish market bet.” http://www.cnbc.com/2014/08/15/george-soros-loads-up-on-bearish-
market-bet.html, 06.
Lleo, S., and W. T. Ziemba. 2012. “Stock market crashes in 2007–2009: were we able to predict them?” Quantitative Finance 12 (8):
1161–1187.
Lleo, S., and W. T. Ziemba. 2017. “Does the Bond-Stock Earnings Yield Differential Model Predict Equity Market Corrections Better
Than High P/E Models?” Financial Markets, Institutions & Instruments 26 (2): 61–123.
Meyer, M. A., and J. M. Booker. 2001. Eliciting and Analyzing Expert Judgment: A Practical Guide. Philadelphia, PA: Society for
Industrial and Applied Mathematics.
Neyman, J. 1934. “On the Two Different Aspects of the Representative Method.” Journal of the Royal Statistical Society.
Neyman, J. 1937. “Outline of a Theory of Statistical Estimation Based on the Classical Theory of Probability.” Philosophical
Transactions of the Royal Society A 236 (767): 333–380.
Neyman, J., and E. S. Pearson. 1933. “The Testing of Statistical Hypotheses in Relation to Probabilities a Priori.” In Mathematical
Proceedings of the Cambridge Philosophical Society, Cambridge, UK, Vol. 29, 492–510.
Ogata, Y. 1988. “Statistical models for earthquake occurrences and residual analysis for point processes.” Journal of the American
Statistical Association 83 (401): 9–27.
O’Hagan, A. 2006. Uncertain Judgments: Eliciting Expert’s Probabilities. Chichester: Wiley.
Savage, L. J. 1971. The Foundations of Statistics. 2nd ed. New York, NY: Dover.
Shiller, R. J. 2006. “Irrational Exuberance Revisited.” CFA Institute Conference Proceedings Quarterly 23 (3): 16–25.
Shiller, R. J. 2015. Irrational Exuberance. 3rd ed. Princeton, NJ: Princeton University Press.
Shiller, R. J. 2018. “U.S. Stock Markets 1871–Present and CAPE Ratio.” http://www.econ.yale.edu/~shiller/data.htm, March.
Shiryaev, A. N., and M. V. Zhitlukhin. 2012a. “Bayesian Disorder Detection Problems on Filtered Probability Spaces.” Theory of
Probability and Its Applications 57 (3): 497–511.
Shiryaev, A. N., and M. V. Zhitlukhin. 2012b. Optimal stopping problems for a Brownian motion with a disorder on a finite interval.
Technical Report. arXiv:1212.3709.
Shiryaev, A. N., M. V. Zhitlukhin, and W. T. Ziemba. 2014. “When to Sell Apple and the Nasdaq 100? Trading Bubbles with a
Stochastic Disorder Model.” The Journal of Portfolio Management 40 (2): 1–10.
Shiryaev, A. N., M. V. Zhitlukhin, and W. T. Ziemba. 2015. “Land and Stock Bubbles, Crashes and Exit Strategies in Japan, Circa
1990 and in 2013.” Quantitative Finance (forthcoming).
Ziemba, W. T., S. Lleo, and M. Zhitlukhin. 2017. Stock Market Crashes: Predictable and Unpredictable. Singapore: World Scientific
Publishing.
Ziemba, W. T., and S. L. Schwartz. 1991. Invest Japan. Chicago: Probus.
