Pseudo-Mathematics and Financial Charlatanism: The Effects of Backtest Overfitting on Out-of-Sample Performance
Another thing I must point out is that you cannot prove a vague theory wrong. […] Also, if the process of computing the consequences is indefinite, then with a little skill any experimental result can be made to look like the expected consequences.
Richard Feynman [1964]

Introduction

A backtest is a historical simulation of an algorithmic investment strategy. Among other things, it computes the series of profits and losses that such strategy would have generated had that algorithm been run over that time period. Popular performance statistics, such as the Sharpe ratio or the Information ratio, are used to quantify the backtested strategy’s return on risk. Investors typically study those backtest statistics and then allocate capital to the best performing scheme.

Regarding the measured performance of a backtested strategy, we have to distinguish between two very different readings: in-sample (IS) and out-of-sample (OOS). The IS performance is the one simulated over the sample used in the design of the strategy (also known as “learning period” or “training set” in the machine-learning literature). The OOS performance is simulated over a sample not used in the design of the strategy (a.k.a. “testing set”). A backtest is realistic when the IS performance is consistent with the OOS performance.

When an investor receives a promising backtest from a researcher or portfolio manager, one of her key problems is to assess how realistic that simulation is. This is because, given any financial series, it is relatively simple to overfit an investment strategy so that it performs well IS.

Overfitting is a concept borrowed from machine learning and denotes the situation when a model targets particular observations rather than a general structure. For example, a researcher could design a trading system based on some parameters that target the removal of specific recommendations that she knows led to losses IS (a practice known as “data snooping”). After a few iterations, the researcher will come up with “optimal parameters”, which profit from features that are present in that particular sample but may well be rare in the population.
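The gap between IS and OOS readings can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes daily returns drawn as pure Gaussian noise (so no trial has any true edge), an annualization factor of 252, and hypothetical helper names `sharpe_ratio` and `best_in_sample`. It selects the trial with the highest IS Sharpe ratio and then reports the same trial's OOS Sharpe ratio.

```python
import math
import random
import statistics


def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio of a series of per-period excess returns."""
    mean = statistics.fmean(returns)
    stdev = statistics.stdev(returns)  # sample standard deviation
    return math.sqrt(periods_per_year) * mean / stdev


def best_in_sample(n_trials, n_obs, seed=0):
    """Select the trial with the highest IS Sharpe ratio.

    Each trial is 2 * n_obs draws of zero-mean noise (an assumption of this
    sketch): the first half is the IS sample, the second half the OOS sample.
    Returns (IS Sharpe, OOS Sharpe) of the selected trial.
    """
    rng = random.Random(seed)
    best_is, best_oos = -math.inf, None
    for _ in range(n_trials):
        returns = [rng.gauss(0.0, 0.01) for _ in range(2 * n_obs)]
        is_sr = sharpe_ratio(returns[:n_obs])
        oos_sr = sharpe_ratio(returns[n_obs:])
        if is_sr > best_is:
            best_is, best_oos = is_sr, oos_sr
    return best_is, best_oos


if __name__ == "__main__":
    is_sr, oos_sr = best_in_sample(n_trials=200, n_obs=252)
    print(f"best IS Sharpe: {is_sr:.2f}, same trial OOS: {oos_sr:.2f}")
```

Because every trial has zero expected return by construction, a high IS Sharpe ratio for the selected trial reflects selection alone, and there is no reason to expect it to persist OOS.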
Recent computational advances allow investment managers to methodically search through thousands or even millions of potential options for a profitable investment strategy. In many instances, that search involves a pseudo-mathematical argument which is spuriously validated through a backtest. For example, consider a time series of daily prices for a stock X. For every day in the sample, we can compute one average price of that stock using the previous m observations, $\bar{x}_m$, and another average price using the previous n observations, $\bar{x}_n$, where m < n. A popular investment strategy called “crossing moving averages” consists of owning X whenever $\bar{x}_m > \bar{x}_n$. Indeed, since the sample size determines a limited number of parameter combinations that m and n can adopt,

David H. Bailey is retired from Lawrence Berkeley National Laboratory. He is a Research Fellow at the University of California, Davis, Department of Computer Science. His email address is david@davidhbailey.com.

Jonathan M. Borwein is Laureate Professor of Mathematics at the University of Newcastle, Australia, and a Fellow of the Royal Society of Canada, the Australian Academy of Science, and the AAAS. His email address is jonathan.borwein@newcastle.edu.au.

Marcos López de Prado is Senior Managing Director at Guggenheim Partners, New York, and Research Affiliate at Lawrence Berkeley National Laboratory. His email address is lopezdeprado@lbl.gov.

Qiji Jim Zhu is Professor of Mathematics at Western Michigan University. His email address is zhu@wmich.edu.

DOI: http://dx.doi.org/10.1090/noti1105
4 Bailey et al. [1] propose a method to determine the degree to which a particular backtest may have been compromised by the risk of overfitting.
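As an illustration of the crossing-moving-averages rule from the Introduction (own X whenever the m-day average exceeds the n-day average, with m < n), the following sketch searches every window pair up to a cap on a simulated price series. The driftless random-walk price model, the cap `max_window`, and all function names are assumptions made for this example; they are not the authors' code.

```python
import random


def moving_average(prices, window, t):
    """Mean of the `window` prices strictly before day t."""
    return sum(prices[t - window:t]) / window


def crossover_pnl(prices, m, n, start, end):
    """Total P&L from owning one unit over day t whenever MA_m > MA_n (m < n).

    The position for day t uses only prices before t, so there is no look-ahead.
    """
    pnl = 0.0
    for t in range(max(start, n), end):
        if moving_average(prices, m, t) > moving_average(prices, n, t):
            pnl += prices[t] - prices[t - 1]
    return pnl


def random_walk(n_days, seed=0, start_price=100.0):
    """Driftless Gaussian random walk: no rule has a true edge on this series."""
    rng = random.Random(seed)
    prices = [start_price]
    for _ in range(n_days - 1):
        prices.append(prices[-1] + rng.gauss(0.0, 1.0))
    return prices


def grid_search(prices, max_window, start, end):
    """Try every (m, n) with 1 <= m < n <= max_window.

    Returns (best P&L, m, n, number of combinations tried).
    """
    results = [
        (crossover_pnl(prices, m, n, start, end), m, n)
        for n in range(2, max_window + 1)
        for m in range(1, n)
    ]
    best_pnl, m, n = max(results)
    return best_pnl, m, n, len(results)


if __name__ == "__main__":
    prices = random_walk(1000)
    max_window, split = 20, 500
    # Optimize on the first half (IS), then re-run the chosen pair on the rest (OOS).
    is_pnl, m, n, n_trials = grid_search(prices, max_window, max_window, split)
    oos_pnl = crossover_pnl(prices, m, n, split, len(prices))
    print(f"{n_trials} pairs tried; best IS pair (m={m}, n={n})")
    print(f"IS P&L: {is_pnl:.2f}, OOS P&L: {oos_pnl:.2f}")
```

The number of pairs tried grows quadratically with `max_window`, which gives a sense of how many configurations even this simple rule offers to an optimizer.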
Recentering a series is one way to introduce memory into a process, because some data points will now compensate for the extreme outcomes from other data points. By optimizing a backtest, the researcher selects a model configuration that spuriously works well IS and consequently is likely to generate losses OOS.

Serial Dependence

Imposing a global constraint is not the only situation in which overfitting actually is detrimental. To cite another (less restrictive) example, the same effect happens if the performance series is serially conditioned, such as a first-order autoregressive process,

$$\Delta m_\tau = (1 - \phi)\mu + (\phi - 1) m_{\tau-1} + \sigma \varepsilon_\tau \tag{10}$$

or, analogously,

$$m_\tau = (1 - \phi)\mu + \phi m_{\tau-1} + \sigma \varepsilon_\tau, \tag{11}$$

where the random shocks are again IID distributed as $\varepsilon_\tau \sim Z$. The following proposition is proven in the appendix. The number of observations that it takes for a process to reduce its divergence from the long-run equilibrium by half is known as the half-life period, or simply half-life (a familiar physical concept introduced by Ernest Rutherford in 1907).

Proposition 5 reaches the same conclusion as Proposition 3 (a compensation effect) without requiring a global constraint.

Is Backtest Overfitting a Fraud?

Consider an investment manager who emails his stock market forecast for the next month to $2^n x$ prospective investors, where x and n are positive integers. To half of them he predicts that markets will go up, and to the other half that markets will go down. After the month passes, he drops from his list the names to which he sent the incorrect forecast, and sends a new forecast to the remaining $2^{n-1} x$ names. He repeats the same procedure n times, after which only x names remain. These x investors have witnessed n consecutive infallible forecasts and may be extremely tempted to give this investment manager all of their savings. Of course, this is a fraudulent scheme based on random screening: The investment manager is hiding the fact that for every one of the x successful witnesses, he has tried $2^n$ unsuccessful ones (see Harris [8, p. 473] for a similar example).

To avoid falling for this psychologically compelling fraud, a potential investor needs to consider the economic cost associated with manufacturing the successful experiments, and require the investment manager to produce a number n for which