Bachelier, Louis (1870–1946)

Formation Years

Louis Bachelier was born in Le Havre, France, on March 11, 1870. His father, a native of Bordeaux, moved to Le Havre after his marriage to the daughter of a notable citizen of Le Havre. He started a wine and spirits shop, and bought and exported wines from Bordeaux and Champagne. At the time, Le Havre was an important port. The Protestant bourgeoisie in the city, which dominated the local cotton and coffee markets, occupied the upper echelons of society. The young Louis was educated at a high school in Le Havre. He seems to have been a fairly good student, but he interrupted his studies after earning his high school diploma in 1889, when both of his parents died in the span of a few weeks. Most likely to provide for his youngest brother and his older sister, he took over his father’s business, but he sold it after a few years. In 1892, he completed his military service as an infantryman and then moved to Paris, where his activities are unclear. What is clear, however, is that Bachelier pursued his interest in the stock market and undertook studies at the University of Paris, where in 1895 he obtained his bachelor’s degree in the mathematical sciences, without being a particularly distinguished student. After earning his degree, he continued to attend the lectures of the Faculty, including courses in mathematical physics taught by Poincaré and Boussinesq.

Although we cannot be absolutely certain, it is likely that in 1894, Bachelier attended lectures in probability theory given by Poincaré, which were published in 1896 and were based on the remarkable treatise that Joseph Bertrand published in 1888. His attendance at these lectures, his reading of treatises by Bertrand and Poincaré, and his interest in the stock market probably inspired his thesis, “Theory of Speculation”, which Bachelier [1] defended in Paris on March 29, 1900, before a jury composed of Appell, Boussinesq, and Poincaré. On the report by Henri Poincaré, he was conferred the rank of Doctor of Mathematics with an “honorable” designation, that is, a designation insufficient for him to obtain employment in higher education, which was extremely limited at the time.

Let us say a few words about this extraordinary thesis. The problem investigated by Bachelier is described in less than a page. The stock market is subject to innumerable random influences, and so it is unreasonable to expect a mathematically precise forecast of stock prices. However, we can try to establish the law of the changes in stock prices over a fixed period of time. The determination of this law was the subject of Bachelier’s thesis. The thesis was not particularly original. Since the early nineteenth century, people had applied probability theory to study exchange rates. In France, in particular, we can cite the work of Bicquilley (around 1800) or Jules Regnault (around 1850). In his thesis, Bachelier [1] intended to revisit this issue from several viewpoints taken from physics and probability theory, as these subjects were taught in Europe, including Paris, around 1900. He adapted these viewpoints to aid his investigation. The first method he used is the method adopted by Einstein, five years later, to determine the law of Brownian motion in a physical context. It consists of studying the integral equation that governs the probability that the change in price is y at time t, under two natural assumptions: the changes in price over two separate time intervals are independent, and the expectation of the change in price is zero. The resulting equation is a homogeneous version of the diffusion equation, now known as the Kolmogorov (or Chapman–Kolmogorov) equation, for which Bachelier boldly asserts that the appropriate solution is given by a centered Gaussian law with variance proportional to time t. He proved a statement already proposed, without justification, by Regnault in 1860: that the expectation of the absolute change in price after time t is proportional to the square root of t.

But this first method, which would eventually be used in the 1930s by physicists and probabilists, did not seem to satisfy Bachelier, since he proposed a second method, which was further developed in the 1930s by the Moscow School: the approximation of the law of Brownian motion by an infinite sequence of coin flips, properly normalized. Since the change in price over a given period of time is the result of a very large number of independent random variables, it is not surprising that this change in price is Gaussian. But the extension of this approximation to a continuous-time version is not straightforward. Bachelier, who already knew the result he wanted to obtain, states and prepares the way to the first known version of a theorem, which in the current
language reads as follows: let $\{X_1, X_2, \ldots, X_n, \ldots\}$ be a sequence of independent random variables taking the values 1 or −1, each with probability 1/2. If we let $S_n = X_1 + \cdots + X_n$ and let $[x]$ denote the integer part of a real number $x$, then

$$\left(\frac{1}{\sqrt{n}}\, S_{[nt]},\ t \ge 0\right) \longrightarrow (B_t,\ t \ge 0) \qquad (1)$$

in law as $n \to \infty$, where $(B_t,\ t \ge 0)$ is a standard Brownian motion.

This second method, which is somewhat difficult to read and not very rigorous, naturally leads to the previous solution. But it is still not sufficient. Bachelier proposes a third method, the “radiation (or diffusion) of probability”. Bachelier, having attended the lectures of Poincaré and Boussinesq on the theory of heat, was aware of the “method of Laplace”, which gives the fundamental solution of the heat equation, a solution that has exactly the form given by the first (and second) methods used by Bachelier. Hence, there is a coincidence to be elucidated. We know that Laplace probably knew the reason for this coincidence. Lord Rayleigh had recently noticed this coincidence in his solution to the problem of “random phases”. It is likely that neither Bachelier nor Poincaré had read the work of Rayleigh. Anyway, Bachelier, in turn, explains this curious intersection between the theory of heat and the prices of annuities on the Paris stock exchange. This is his third method, which can be summarized as follows.

Consider the game of flipping a fair coin an infinite number of times and set $f(n, x) = \mathbb{P}(S_n = x)$. It has been known since at least the seventeenth century that

$$f(n+1, x) = \tfrac{1}{2} f(n, x-1) + \tfrac{1}{2} f(n, x+1) \qquad (2)$$

Subtracting $f(n, x)$ from both sides of the equation, we obtain

$$f(n+1, x) - f(n, x) = \tfrac{1}{2}\left[ f(n, x+1) - 2 f(n, x) + f(n, x-1) \right] \qquad (3)$$

It then suffices to take the unit 1 in the preceding equation to be infinitely small to obtain the heat equation

$$\frac{\partial f}{\partial n} = \frac{1}{2} \frac{\partial^2 f}{\partial x^2} \qquad (4)$$

whose solution is the law of a centered Gaussian random variable with variance n.

Theory of Speculation

At the stock market, probability radiates like heat. This “demonstrates” the role of Gaussian laws in problems related to the stock market, as acknowledged by Poincaré himself in his report: “A little reflection shows that the analogy is real and the comparison legitimate. The arguments of Fourier are applicable, with very little change, to this problem that is so different from the problem to which these arguments were originally applied.” And Poincaré regretted that Bachelier did not develop this point further, though this point would be developed in a masterly way by Kolmogorov in a famous article published in 1931 in the Mathematische Annalen. In fact, the first and third methods used by Bachelier are intrinsically linked: the Chapman–Kolmogorov equation for any regular Markov process is equivalent to a partial differential equation of parabolic type. In all regular Markovian schemes that are continuous, probability radiates like heat from a fire fanned by the thousand winds of chance. And further work, exploiting this real analogy, would transform not only the theory of Markov processes but also the century-old theory of Fourier equations and parabolic equations.

Now, having determined the law of price changes, all calculations of financial products involving time follow easily. But Bachelier did not stop there. He proposed a general theory of speculation integrating all stock market products that could be proposed to clients, whose (expected) value at maturity, and therefore whose price, can be calculated using general formulas resulting from theory. The most remarkable product that Bachelier priced was based on the maximum value of a stock during the period between its purchase and a maturity date (usually one month later). In this case, one must determine the law of the maximum of a stock price over some interval of time. This problem would be of concern to Norbert Wiener, the inventor of the mathematical theory of Brownian motion, in 1923. It involves knowing a priori the law of the price over an infinite time interval, but it was not known, either in 1923 or in 1900, how to easily calculate the integrals of functions of an infinite number of variables. Let us explain the reasoning used by Bachelier [1] as an example of his methods of analysis.
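As a numerical aside on the third method summarized earlier, the following Python sketch iterates recurrence (2) for the fair coin-flipping game and compares the resulting probabilities with the centered Gaussian law of variance n that solves the heat equation (4). The function names, the choice n = 400, and the factor 2 (which accounts for S_n taking only values of the same parity as n) are illustrative assumptions and are not taken from Bachelier's text.

```python
import math


def coin_flip_law(n_steps):
    """Return {x: P(S_n = x)} for the fair +1/-1 coin-flipping game.

    Each pass applies recurrence (2):
    f(n + 1, x) = f(n, x - 1) / 2 + f(n, x + 1) / 2.
    """
    f = {0: 1.0}  # f(0, x): the game starts at 0 with certainty
    for _ in range(n_steps):
        g = {}
        for x, p in f.items():
            # The mass at x moves to x - 1 or x + 1 with probability 1/2 each.
            g[x - 1] = g.get(x - 1, 0.0) + 0.5 * p
            g[x + 1] = g.get(x + 1, 0.0) + 0.5 * p
        f = g
    return f


def gaussian_density(n, x):
    """Density at x of a centered Gaussian law with variance n."""
    return math.exp(-x * x / (2.0 * n)) / math.sqrt(2.0 * math.pi * n)


n = 400  # illustrative number of coin flips
law = coin_flip_law(n)
for x in (0, 10, 20, 40):
    exact = law[x]
    # S_n lives on a lattice of spacing 2 (same parity as n), hence the factor 2.
    approx = 2.0 * gaussian_density(n, x)
    print(f"x = {x:3d}   exact = {exact:.6f}   Gaussian = {approx:.6f}")
```

The two columns should agree closely for moderate x, which is the discrete counterpart of the statement that the heat equation is solved by a Gaussian law whose variance grows linearly with the number of steps.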
Bachelier proceeded in two different ways. The first way was based on the second method developed in Bachelier’s thesis. It consists of discretizing time in steps of $\Delta t$, and introducing a change in price at each step of $\pm\Delta x$. Bachelier wanted to calculate the probability that before time $t = n\Delta t$, the game (or price) exceeds a given value $c = m\Delta x$. Let $n = m + 2p$. Bachelier proposed to first calculate the probability that the price c is reached for the first time at exactly time t. To this end, he uses the gambler’s ruin argument: the probability is equal to $(m/n)\, C_n^p\, 2^{-n}$, which Bachelier obtained from the ballot formula of Bertrand, which he learned from Poincaré or Bertrand’s work, or perhaps both. It then suffices to pass properly to the limit so that $\Delta x = O(\sqrt{\Delta t})$. One then obtains the probability that the price exceeds c before t. Bachelier then noted that this probability is equal to twice the probability that the price exceeds c at time t.

The result is Bachelier’s formula for the law of the maximum $M_t$ of the price $B_t$ over the interval $[0, t]$; that is,

$$\mathbb{P}(M_t > c) = 2\, \mathbb{P}(B_t > c) \qquad (5)$$

It would have been difficult to proceed in a simpler fashion. Having obtained this formula, Bachelier had to justify it in a simple way to understand why it holds. Bachelier therefore added to his first calculation (which was somewhat confusing and difficult to follow) a “direct demonstration” without passing to the limit. He used the argument that “the price cannot pass the threshold c over a time interval of length t without having done so previously” and hence that

$$\mathbb{P}(B_t > c) = \mathbb{P}(M_t > c)\, \alpha \qquad (6)$$

where α is the probability that the price, having attained c before time t, is greater than c at time t. The latter probability is obviously 1/2, owing to the symmetry between the sample paths that, after attaining c, lie above c at time t and those that lie below it. And Bachelier concludes: “It is remarkable that the multiple integral that expresses the probability $\mathbb{P}(M_t > c)$ does not seem amenable to ordinary methods of calculation, but can be determined by very simple probabilistic reasoning.” It was, without doubt, the first example of the use of the reflection principle in probability theory. In two steps, a complicated calculation yields a simple formula by using a very simple probabilistic (or combinatorial) argument.

Of course, Bachelier had to do his mathematics without a safety net. What could his safety net have been? The mathematical analysis available during his time could not deal with such strange objects and calculations. It was not until the following year, 1901, that Lebesgue introduced the integral based on the measure that Borel had just recently constructed. The Daniell integral, which Wiener used, dates to 1920, and it was not until the 1930s that European mathematicians realized that computing probabilities with respect to Brownian motion, or with respect to sequences of independent random variables, could be done using Lebesgue measure on the unit interval. Since Lebesgue’s theory came to be viewed as one of the strongest pillars of analysis in the twentieth century, this approach gave probability theory a very strong analytic basis. One had to wait much longer to place the stochastic calculus of Brownian motion and sample path arguments involving stopping times into a relatively uniform analytical framework. Anyway, Bachelier had little concern for either this new theory in analysis or the work of his contemporaries, whom he never cites. He refers to the work of Laplace, Bertrand, and Poincaré, who never cared about the Lebesgue integral, and so Bachelier always ignored its existence.

It seems that in 1900, Bachelier [1] saw very clearly how to model the continuous movement of stock prices and he established new computational techniques, derived notably from the classical techniques involving infinite sequences of fair coin flips. He provided an intermediate mathematical argument to explain a new class of functions that reflected the vagaries of the market, just as in the eighteenth century, when one used geometric reasoning and physical intuition to explain things.

After the Thesis

His Ph.D. thesis defended, Bachelier suddenly seemed to discover the immensity of a world in which randomness exists. The theory of the stock market allowed him to view the classical results of probability with a new eye, and it opened new viewpoints for him. Starting in 1901, Bachelier showed that the known results about infinite sequences of fair coin flips could all (or almost all) be obtained from stock
market theory and that one can derive new results that are more precise than anyone had previously suspected. In 1906, Bachelier proposed an almost general theory of “related probabilities”, that is to say, a theory about what would, 30 years later, be called Markov processes. This article by Bachelier was the starting point of the major study by Kolmogorov in 1931 that we have already mentioned. All of Bachelier’s work was published with the distant but caring recommendation of Poincaré, so that by 1910, Bachelier, whose income remains unknown and was probably modest, was permitted to teach a “free course” in probability theory at the Sorbonne, without compensation. Shortly thereafter, he won a scholarship that allowed him to publish his Calculus of Probability, Volume I, Paris, Gauthier-Villars, 1912 (Volume II never appeared), which included all of his work since his thesis. This very surprising book was not widely circulated in France, and had no impact on the Paris stock market or on French mathematics, but it was one of the sources that motivated work in stochastic processes at the Moscow School in the 1930s. It also influenced work by the American School on sums of independent random variables in the 1950s, and at the same time, influenced new theories in mathematical finance that were developing in the United States. And, as things should rightly be, these theories traced back to France, where Bachelier’s name had become so well recognized that in 2000, the centenary of his “Theory of Speculation” was celebrated.

The First World War interrupted the work of Bachelier, who was summoned for military service in September 1914 as a simple soldier. When the war ended in December 1918, he was a sublieutenant in the Army Service Corps. He served far from the front, but he carried out his service with honor. As a result, in 1919, the Directorate of Higher Education in Paris believed it was necessary to appoint Bachelier to a university outside of Paris, since the war had decimated the ranks of young French mathematicians and there were many positions to be filled. After many difficulties, due to his marginalization in the French mathematical community and the incongruent nature of his research, Bachelier finally received tenure in 1927 (at the age of 57) as a professor at the University of Besançon, where he remained until his retirement in 1937. Throughout the postwar years, Bachelier essentially did not publish any original work. He married in 1920, but his wife died a few months later. He was often ill and he seems to have been quite isolated.

In 1937, he moved with his sister to Saint-Malo in Brittany. During World War II, he moved to Saint-Servan, where he died in 1946. He seemed to be aware of the new theory of stochastic processes that was then developing in Paris and Moscow, and that was progressively spreading all over the world. He attempted to claim credit for the things that he had done, without any success. He regained his appetite for research, to the point that in 1941, at the age of 70, he submitted a note for publication to the Academy of Sciences in Paris on the “probability of maximum oscillations”, in which he demonstrated a fine mastery of the theory of Brownian motion, whose systematic study Paul Lévy had undertaken starting in 1938. Paul Lévy, the principal French researcher of the theory of Brownian motion, recognized, albeit belatedly, the work of Bachelier, and his work provided a more rigorous foundation for Bachelier’s “theory of speculation”.

Reference

[1] Bachelier, L. (1900). Théorie de la spéculation, Thèse Sciences mathématiques Paris. Annales Scientifiques de l’Ecole Normale Supérieure 17, 21–86; The Random Character of Stock Market Prices, P. Cootner, ed, MIT Press, Cambridge, 1964, pp. 17–78.

Further Reading

Courtault, J.M. & Kabanov, Y. (eds) (2002). Louis Bachelier: Aux origines de la Finance Mathématique, Presses Universitaires Franc-Comtoises, Besançon.
Taqqu, M.S. (2001). Bachelier and his times: a conversation with Bernard Bru, Finance and Stochastics 5(1), 3–32.

Related Articles

Black–Scholes Formula; Markov Processes; Martingales; Option Pricing: General Principles.

BERNARD BRU
Samuelson, Paul A.

Paul Anthony Samuelson (1915–) is Institute Professor Emeritus at the Massachusetts Institute of Technology where he has taught since 1940. He earned a BA from the University of Chicago in 1935 and his PhD in economics from Harvard University in 1941. He received the John Bates Clark Medal in 1947 and the National Medal of Science in 1996. In 1970, he became the first American to receive the Alfred Nobel Memorial Prize in Economic Sciences. His textbook, Economics, first published in 1948, and in its 18th edition, is the best-selling and arguably the most influential economics textbook of all time.

Paul Samuelson is the last great general economist: never again will any one person make such foundational contributions to so many distinct areas of economics. His prolific and profound theoretical contributions over seven decades of published research have been universal in scope, and his ramified influence on the whole of economics has led to foundational contributions in virtually every field of economics, including financial economics. Representing 27 years of scientific writing from 1937 to the middle of 1964, the first two volumes of his Collected Scientific Papers contain 129 articles and 1772 pages. These were followed by the publication of the 897-page third volume in 1972, which registers the succeeding seven years’ product of 78 articles published when he was between the ages of 49 and 56 [18]. A mere five years later, at the age of 61, Samuelson had published another 86 papers, which fill the 944 pages of the fourth volume. A decade later, the fifth volume appeared with 108 articles and 1064 pages. A glance at his list of publications since 1986 assures us that a sixth and even seventh volume could be filled. That Samuelson paid no heed to the myth of debilitating age in science is particularly well-exemplified in his contributions to financial economics, with all but 6 of his more than 60 papers being published after he had reached the age of 50.

Samuelson’s contributions to quantitative finance, as with mathematical economics generally, have been foundational and wide-ranging. They include reconciling the axioms of expected utility theory first with nonstochastic theories of choice [9] and then with the ubiquitous and practical mean–variance criterion of choice [16]; exploring the foundations of diversification [13] and optimal portfolio selection when facing fat-tailed, infinite-variance return distributions [14]; and, over a span of nearly four decades, analyzing the systematic dependence on age of optimal portfolio strategies, in particular, optimal long-horizon investment strategies, and the improper use of the Law of Large Numbers to arrive at seemingly dominating strategies for the long run [10, 15, 17, 21–27]. In investigating the oft-told tale that investors become systematically more conservative as they get older, Samuelson shows that perfectly rational risk-averse investors with constant relative risk aversion will select the same fraction of risky stocks versus safe cash period by period, independently of age, provided that the investment opportunity set is unchanging. Having shown that greater investment conservatism is not an inevitable consequence of aging, he later [24] demonstrates conditions under which such behavior can be optimal: with mean-reverting changing opportunity sets, older investors will indeed be more conservative than in their younger days, provided that they are more risk averse than a growth-optimum, log-utility maximizer. To complete the rich set of age-dependent risk-taking behaviors, Samuelson shows that rational investors may actually become less conservative with age, if either they are less risk averse than log or if the opportunity set follows a trending, momentum-like dynamic process. He recently confided that in finance, this analysis is a favorite brainchild of his.

Published in the same issue of the Industrial Management Review, “Proof That Properly Anticipated Prices Fluctuate Randomly” and “Rational Theory of Warrant Pricing” are perhaps the two most influential Samuelson papers in quantitative finance. During the decade before their printed publication in 1965, Samuelson had set down, in an unpublished manuscript, many of the results in these papers and had communicated them in lectures at MIT, Yale, Carnegie, the American Philosophical Society, and elsewhere. In the early 1950s, he supervised a PhD thesis on put and call pricing [5].

The sociologist or historian of science would undoubtedly be able to develop a rich case study of alternative paths for circulating scientific ideas by exploring the impact of this oral publication of research in rational expectations, efficient markets, geometric Brownian motion, and warrant pricing in the period between 1956 and 1965.

Samuelson (1965a) and Eugene Fama independently provide the foundation of the Efficient Market
theory that developed into one of the most important concepts in modern financial economics. As indicated by its title, the principal conclusion of the paper is that in well-informed and competitive speculative markets, the intertemporal changes in prices will be essentially random. Samuelson has described the reaction (presumably his own as well as that of others) to this conclusion as one of “initial shock – and then, upon reflection, that it is obvious”. The argument is as follows: the time series of changes in most economic variables (gross national product (GNP), inflation, unemployment, earnings, and even the weather) exhibit cyclical or serial dependencies. Furthermore, in a rational and well-informed capital market, it is reasonable to presume that the prices of common stocks, bonds, and commodity futures depend upon such economic variables. Thus, the shock comes from the seemingly inconsistent conclusion that in such well-functioning markets the changes in speculative prices should exhibit no serial dependencies. However, once the problem is viewed from the perspective offered in the paper, this seeming inconsistency disappears and all becomes obvious.

Starting from the consideration that in a competitive market, if everyone knew that a speculative security was expected to rise in price by more (less) than the required or fair expected rate of return, it would already be bid up (down) to negate that possibility, Samuelson postulates that securities will be priced at each point in time so as to yield this fair expected rate of return. Using a backward-in-time induction argument, he proves that the changes in speculative prices around that fair return will form a martingale. And this follows no matter how much serial dependency there is in the underlying economic variables upon which such speculative prices are formed. In an informed market, therefore, current speculative prices will already reflect anticipated or forecastable future changes in the underlying economic variables that are relevant to the formation of prices, and this leaves only the unanticipated or unforecastable changes in these variables as the sole source of fluctuations in speculative prices.

Samuelson is careful to warn the reader against interpreting his mathematically derived theoretical conclusions about markets as empirical statements. Nevertheless, for 40 years, his model has been important to the understanding and interpretation of the empirical results observed in real-world markets. For the most part in those ensuing years, his interpretation of the data is that organized markets where widely owned securities are traded are well approximated as microefficient, meaning that the relative pricing of individual securities within the same or very similar asset classes is such that active asset management applied to those similar securities (e.g., individual stock selection) does not earn greater risk-adjusted returns.

However, Samuelson is discriminating in his assessment of the efficient market hypothesis as it relates to real-world markets. He notes a list of the “few not-very-significant apparent exceptions” to microefficient markets [23, p. 5]. He also expresses belief that there are exceptionally talented people who can probably garner superior risk-corrected returns, and even names a few. He does not see them as offering a practical broad alternative investment prescription for active management since such talents are few and hard to identify. Although Samuelson believes strongly in the microefficiency of markets, he expresses doubt about macromarket efficiency: namely, he holds that asset-value “bubbles” do indeed occur.

There is no doubt that the mainstream of the professional investment community has moved significantly in the direction of Paul Samuelson’s position during the 35 years since he issued his challenge to that community to demonstrate widespread superior performance [20]. Indexing as either a core investment strategy or a significant component of institutional portfolios is ubiquitous, and even among those institutional investors who believe they can deliver superior performance, performance is typically measured incrementally relative to an index benchmark and the expected performance increment to the benchmark is generally small compared to the expected return on the benchmark itself. It is therefore with no little irony that as investment practice has moved in this direction, for the last 15 years, academic research has moved in the opposite direction, strongly questioning even the microefficiency case for the efficient market hypothesis. The conceptual basis of these challenges comes from theories of asymmetric information and institutional rigidities that limit the arbitrage mechanisms that enforce microefficiency, and of cognitive dissonance and other systematic behavioral dysfunctions among individual investors that are purported to distort market prices away from rationally determined asset prices in identified ways. A substantial quantity of empirical
evidence has been assembled, but there is consider- his paper, Samuelson thus chose the term European
able controversy over whether it does indeed make for the relatively simple(-minded)-to-value option
a strong case to reject market microefficiency in the contract that can only be exercised at expiration and
Samuelsonian sense. What is not controversial at all American for the considerably more-(complex)-to-
is that Paul Samuelson’s efficient market hypothesis value option contract that could be exercised early,
has had a deep and profound influence on finance any time on or before its expiration date.
research and practice for more than 40 years and all Although real-world options are almost always
indications are that it will continue to do so well into of the American type, published analyses of option
the future. pricing prior to his 1965 paper focused exclusively
If one were to describe the 1960s as “the decade on the evaluation of European options and therefore
of capital asset pricing and market efficiency” in did not include the extra value to the option from the
view of the important research gains in quantitative right to exercise early.
finance during then, one need hardly say more than The most striking comparison to make between
“the Black-Scholes option pricing model” to justify the Black–Scholes option pricing theory and Samuel-
describing the 1970s as “the decade of option and son’s rational theory [12] is the formula for the
derivative security pricing.” Samuelson was ahead of option price. The Samuelson partial differential equa-
the field in recognizing the arcane topic of option tion for the option price is the same as the correspond-
pricing as a rich area for problem choice and solution. ing equation for the Black–Scholes option price if
By at least the early 1950s, Samuelson had shown that the assumption of an absolute random walk or arithmetic Brownian motion for stock price changes leads to absurd prices for long-lived options, and this was done before his rediscovery of Bachelier’s pioneering work [1] in which this very assumption is made. He introduced the alternative process of a “geometric” Brownian motion in which changes in the log of the price follow a Brownian motion, possibly with a drift. His paper on the rational theory of warrant pricing [12] resolves a number of apparent paradoxes that had plagued the existing mathematical theory of option pricing from the time of Bachelier. In the process (with the aid of a mathematical appendix provided by H. P. McKean, Jr), Samuelson also derives much of what has become the basic mathematical structure of option pricing theory today.

Bachelier [1] considered options that could only be exercised on the expiration date. In modern times, the standard terms for options and warrants permit the option holder to exercise on or before the expiration date. Samuelson coined the terms European option to refer to the former and American option to refer to the latter. As he tells the story, to get a practitioner’s perspective in preparation for his research, he went to New York to meet with a well-known put and call dealer (there were no traded options exchanges until 1973) who happened to be Swiss. Upon his identifying himself and explaining what he had in mind, Samuelson was quickly told, “You are wasting your time—it takes a European mind to understand options.” Later on, when writing

one sets the Samuelson parameter for the expected return on the underlying stock equal to the riskless interest rate minus the dividend yield and sets the Samuelson parameter for the expected return on the option equal to the riskless interest rate. It should, however, be underscored that the mathematical equivalence between the two formulas with the redefinition of parameters is purely a formal one. The Samuelson model simply posits the expected returns for the stock and option. By employing a dynamic hedging or replicating portfolio strategy, the Black–Scholes analysis derives the option price without the need to know either the expected return on the stock or the required expected return on the option. Therefore, the fact that the Black–Scholes option price satisfies the Samuelson formula implies neither that the expected returns on the stock and option are equal nor that they are equal to the riskless rate of interest. Furthermore, it should also be noted that Black–Scholes pricing of options does not require knowledge of investors’ preferences and endowments as is required, for example, in the sequel Samuelson and Merton [28] warrant pricing paper. The “rational theory” put forward in 1965 is thus clearly a “miss” with respect to the Black–Scholes development. However, as this analysis shows, it is just as clearly a “near miss”. See [6, 19] for a formal comparison of the two models.

Extensive reviews of Paul Samuelson’s remarkable set of contributions to quantitative finance can be found in [2–4, 7, 8].

References

[1] Bachelier, L. (1900, 1966). Théorie de la Spéculation, Gauthier-Villars, Paris, in The Random Character of Stock Market Prices, P. Cootner, ed, MIT Press, Cambridge.
[2] Bernstein, P.L. (2005). Capital Ideas: The Improbable Origins of Modern Wall Street, John Wiley & Sons, Hoboken.
[3] Carr, P. (2008). The father of financial engineering, Bloomberg Markets 17, 172–176.
[4] Fischer, S. (1987). Samuelson, Paul Anthony, The New Palgrave: A Dictionary of Economics, MacMillan Publishing, Vol. 4, pp. 234–241.
[5] Kruizenga, R. (1956). Put and Call Options: A Theoretical and Market Analysis, Doctoral dissertation, MIT, Cambridge, MA.
[6] Merton, R.C. (1972). Continuous-time speculative processes: appendix to P. A. Samuelson’s ‘mathematics of speculative price’, in Mathematical Topics in Economic Theory and Computation, R.H. Day & S.M. Robinson, eds, Society for Industrial and Applied Mathematics, Philadelphia, pp. 1–42, reprinted in SIAM Review 15, 1973.
[7] Merton, R.C. (1983). Financial economics, in Paul Samuelson and Modern Economic Theory, E.C. Brown & R.M. Solow, eds, McGraw Hill, New York.
[8] Merton, R.C. (2006). Paul Samuelson and financial economics, in Samuelsonian Economics and the Twenty-First Century, M. Szenberg, L. Ramrattan & A. Gottesman, eds, Oxford University Press, Oxford, Reprinted in American Economist 50, no. 2 (Fall 2006).
[9] Samuelson, P.A. (1952). Probability, utility, and the independence axiom, Econometrica 20, 670–678, Collected Scientific Papers, I, Chap. 14.
[10] Samuelson, P.A. (1963). Risk and uncertainty: a fallacy of large numbers, Scientia 57, 1–6, Collected Scientific Papers, I, Chap. 16.
[11] Samuelson, P.A. (1965). Proof that properly anticipated prices fluctuate randomly, Industrial Management Review 6, 41–49, Collected Scientific Papers, III, Chap. 198.
[12] Samuelson, P.A. (1965). Rational theory of warrant pricing, Industrial Management Review 6, 13–39, Collected Scientific Papers, III, Chap. 199.
[13] Samuelson, P.A. (1967). General proof that diversification pays, Journal of Financial and Quantitative Analysis 2, 1–13, Collected Scientific Papers, III, Chap. 201.
[14] Samuelson, P.A. (1967). Efficient portfolio selection for Pareto-Levy investments, Journal of Financial and Quantitative Analysis 2, 107–122, Collected Scientific Papers, III, Chap. 202.
[15] Samuelson, P.A. (1969). Lifetime portfolio selection by dynamic stochastic programming, Review of Economics and Statistics 51, 239–246, Collected Scientific Papers, III, Chap. 204.
[16] Samuelson, P.A. (1970). The fundamental approximation theorem of portfolio analysis in terms of means, variances and higher moments, Review of Economic Studies 37, 537–542, Collected Scientific Papers, III, Chap. 203.
[17] Samuelson, P.A. (1971b). The ‘Fallacy’ of maximizing the geometric mean in long sequences of investing or gambling, Proceedings of the National Academy of Sciences of United States of America 68, 2493–2496, Collected Scientific Papers, III, Chap. 207.
[18] Samuelson, P.A. (1972). The Collected Scientific Papers of Paul A. Samuelson, R.C. Merton, ed, MIT Press, Cambridge, Vol. 3.
[19] Samuelson, P.A. (1972). Mathematics of speculative price, in Mathematical Topics in Economic Theory and Computation, R.H. Day & S.M. Robinson, eds, Society for Industrial and Applied Mathematics, Philadelphia, pp. 1–42, reprinted in SIAM Review 15, 1973, Collected Scientific Papers, IV, Chap. 240.
[20] Samuelson, P.A. (1974). Challenge to judgment, Journal of Portfolio Management 1, 17–19, Collected Scientific Papers, IV, Chap. 243.
[21] Samuelson, P.A. (1979). Why we should not make mean log of wealth big though years to act are long, Journal of Banking and Finance 3, 305–307.
[22] Samuelson, P.A. (1989). A case at last for age-phased reduction in equity, Proceedings of the National Academy of Science of United States of America 86, 9048–9051.
[23] Samuelson, P.A. (1989). The judgment of economic science on rational portfolio management: indexing, timing, and long-horizon effects, Journal of Portfolio Management Fall, 16, 4–12.
[24] Samuelson, P.A. (1991). Long-run risk tolerance when equity returns are mean regressing: pseudoparadoxes and vindication of ‘businessmen’s risk’, in Money, Macroeconomics, and Economic Policy: Essays in Honor of James Tobin, W.C. Brainard, W.D. Nordhaus & H.W. Watts, eds, The MIT Press, Cambridge, pp. 181–200.
[25] Samuelson, P.A. (1992). At last a rational case for long horizon risk tolerance and for asset-allocation timing? in Active Asset Allocation, D.A. Robert & F.J. Fabozzi, eds, Probus Publishing, Chicago.
[26] Samuelson, P.A. (1994). The long-term case for equities and how it can be oversold, Journal of Portfolio Management Fall, 21, 15–24.
[27] Samuelson, P.A. (1997). Proof by certainty equivalents that diversification-across-time does worse, risk-corrected, than diversification-throughout-time, Journal of Risk and Uncertainty 14, 129–142.
[28] Samuelson, P.A. & Merton, R.C. (1969). A complete model of warrant pricing that maximizes utility, Industrial Management Review 10, 17–46, Collected Scientific Papers, III, Chap. 200.

Further Reading

Samuelson, P.A. (1966). The Collected Scientific Papers of Paul A. Samuelson, J.E. Stiglitz, ed, MIT Press, Cambridge, Vols. 1 and 2.
Samuelson, P.A. (1971). Stochastic speculative price, Proceedings of the National Academy of Sciences of the United States of America 68, 335–337, Collected Scientific Papers, III, Chap. 206.
Samuelson, P.A. (1977). The Collected Scientific Papers of Paul A. Samuelson, H. Nagatani & K. Crowley, eds, MIT Press, Cambridge, Vol. 4.
Samuelson, P.A. (1986). The Collected Scientific Papers of Paul A. Samuelson, K. Crowley, ed, MIT Press, Cambridge, Vol. 5.

ROBERT C. MERTON
Black, Fischer

The central focus of the career of Fischer Black (1938–1995) was on teasing out the implications of the capital asset pricing model (CAPM) for the changing institutional framework of financial markets of his day. He became famous for the Black–Scholes options formula [14], an achievement that is now widely recognized as having opened the door to modern quantitative finance and financial engineering. Fischer was the first quant, but a very special kind of quant because of his taste for the big picture [16].

Regarding that big picture, as early as 1970, he sketched a vision of the future that has by now largely come true:

    Thus a long term corporate bond could actually be sold to three separate persons. One would supply the money for the bond; one would bear the interest rate risk; and one would bear the risk of default. The last two would not have to put up any capital for the bonds, although they might have to post some sort of collateral.

Today we recognize the last two instruments as an interest rate swap and a credit default swap, the two instruments that have been the central focus of financial engineering ever since.

All of the technology involved in this engineering can be traced back to roots in the original Black–Scholes option pricing formula [14]. Black himself came up with a formula through CAPM, by thinking about the exposure to systematic risk that was involved in an option, and how that exposure changes as the price of the underlying changes. Today the formula is more commonly derived using the Ito formula and the option replication idea introduced by Merton [17]. For a long time, Black himself was unsure about the social utility of equity options. If all they do is to allow people to achieve the same risk exposure they could achieve by holding equity outright with leverage, then what is the point?

The Black–Scholes formula and the hedging methodology behind it subsequently became a central pillar in the pricing of contingent claims of all kinds and in doing so gave rise to many innovations that contributed to making the world more like his 1970 vision. Black and Cox [9] represents an early attempt to use the option pricing technology to price default risk. Black [4] similarly uses the option pricing technology to price currency risk. Perhaps Black’s most important use of the tools was in his work on interest rate derivatives, in the famous Black–Derman–Toy term structure model [10].

Black got his start in finance after already earning his PhD in applied mathematics (Harvard, 1964) when he learned about CAPM from Treynor [18], his colleague at the business consulting firm Arthur D. Little, Inc. Fischer had never taken a single course in economics or finance, nor did he ever do so subsequently. Nevertheless, the field was underdeveloped at the time, and Fischer managed to set himself up as a financial consultant and to parlay his success in that capacity into a career in academia (University of Chicago 1971–1975, Massachusetts Institute of Technology 1975–1984), and then into a partnership at the Wall Street investment firm of Goldman Sachs (1984–1995). There can be no doubt that his early success with the options pricing formula opened these doors. The more important point is how, in each of these settings, Fischer used the opportunity he had been given to help promote his vision of a CAPM future for the financial side of the economy.

CAPM is only about a world of debt and equity, and the debt in that world is both short term and risk free. In such a world, everyone holds the fully diversified market portfolio of equity and then adjusts risk exposure by borrowing or lending in the market for risk-free debt. As equity values fluctuate, outstanding debt also fluctuates, as people adjust their portfolios to maintain desired risk exposure. One implication of CAPM, therefore, is that there should be a market for passively managed index mutual funds [15]. Another implication is that the regulatory apparatus surrounding banking, both lending and deposit taking, should be drastically relaxed to facilitate dynamic adjustment of risk exposure [3]. And yet a third implication is that there might be a role for an automatic risk rebalancing instrument, essentially what is known today as portfolio insurance [6, 13].

Even while Black was working on remaking the world in the image of CAPM, he was also expanding the image of the original CAPM to include a world without a riskless asset in his famous zero-beta model [1] and to include a world with multiple currencies in his controversial universal hedging model [2, 7] that subsequently formed the analytical core of the Black–Litterman model of global asset allocation [11, 12].

These and other contributions to quantitative finance made Fischer Black famous, but according to him, his most important work was the two books he wrote that extended the image of CAPM to the real economy, including the theory of money and business cycles [5, 8]. The fluctuation of aggregate output, he reasoned, was nothing more than the fluctuating yield on the national stock of capital. Just as risk is the price we pay for higher expected yield, business fluctuation is also the price we pay for higher expected rates of economic growth.

The rise of modern finance in the last third of the twentieth century transformed the financial infrastructure within which businesses and households interact. A system of banking institutions was replaced by a system of capital markets, as financial engineering developed ways to turn loans into bonds. This revolution in institutions has also brought with it a revolution in our thinking about how the economy works, including the role of government regulation and stabilization policy. Crises in the old banking system gave rise to the old macroeconomics. Crises in the new capital markets system will give rise to a new macroeconomics, possibly built on the foundations laid by Fischer Black.

References

[1] Black, F. (1972). Capital market equilibrium with restricted borrowing, Journal of Business 45, 444–455.
[2] Black, F. (1974). International capital market equilibrium with investment barriers, Journal of Financial Economics 1, 337–352.
[3] Black, F. (1975). Bank funds management in an efficient market, Journal of Financial Economics 2, 323–339.
[4] Black, F. (1976). The pricing of commodity contracts, Journal of Financial Economics 3, 167–179.
[5] Black, F. (1987). Business Cycles and Equilibrium, Basil Blackwell, Cambridge, MA.
[6] Black, F. (1988). Individual investment and consumption under uncertainty, in Portfolio Insurance, A Guide to Dynamic Hedging, D.L. Luskin, ed, John Wiley & Sons, New York, pp. 207–225.
[7] Black, F. (1990). Equilibrium exchange rate hedging, Journal of Finance 45, 899–907.
[8] Black, F. (1995). Exploring General Equilibrium, MIT Press, Cambridge, MA.
[9] Black, F. & Cox, J.C. (1976). Valuing corporate securities: some effects of bond indenture provisions, Journal of Finance 31, 351–368.
[10] Black, F., Derman, E. & Toy, W.T. (1990). A one-factor model of interest rates and its application to treasury bond options, Financial Analysts Journal 46, 33–39.
[11] Black, F. & Litterman, R. (1991). Asset allocation: combining investor views with market equilibrium, Journal of Fixed Income 1, 7–18.
[12] Black, F. & Litterman, R. (1992). Global portfolio optimization, Financial Analysts Journal 48, 28–43.
[13] Black, F. & Perold, A.F. (1992). Theory of constant proportion portfolio insurance, Journal of Economic Dynamics and Control 16, 403–426.
[14] Black, F. & Scholes, M. (1973). The pricing of options and corporate liabilities, Journal of Political Economy 81, 637–654.
[15] Black, F. & Scholes, M. (1974). From theory to a new financial product, Journal of Finance 29, 399–412.
[16] Mehrling, P.G. (2005). Fischer Black and the Revolutionary Idea of Finance, John Wiley & Sons, Hoboken, New Jersey.
[17] Merton, R.C. (1973). Theory of rational option pricing, Bell Journal of Economics and Management Science 4, 141–183.
[18] Treynor, J.L. (1962). Toward a theory of market value of risky assets, in Asset Pricing and Portfolio Performance, R.A. Korajczyk, ed, Risk Books, London, pp. 15–22.

Related Articles

Black–Scholes Formula; Black–Litterman Approach; Option Pricing Theory: Historical Perspectives; Merton, Robert C.; Modern Portfolio Theory; Term Structure Models; Sharpe, William F.

PERRY MEHRLING
Mandelbrot, Benoit

Benoit B. Mandelbrot, Sterling Professor Emeritus of Mathematical Sciences at Yale University and IBM Fellow Emeritus at the IBM Research Center, best known as the “father of fractal geometry”, is a Polish-born French-American multidisciplinary scientist with numerous contributions to different fields of knowledge including mathematics, statistics, hydrology, physics, engineering, physiology, economics and, last but not least, quantitative finance. In this short text we will focus on Mandelbrot’s contributions to the study of financial markets.

Benoit Mandelbrot was born in Warsaw, Poland, on November 20, 1924 in a family of scholars from Lithuania. In 1936 Mandelbrot’s family moved to Paris, where he was influenced by his mathematician uncle Szolem Mandelbrojt (1899–1983). He entered the Ecole Polytechnique in 1944. Among his professors at Polytechnique was Paul Lévy, whose pioneering work on stochastic processes influenced Mandelbrot.

After two years at Caltech and after obtaining a doctoral degree in mathematics from the University of Paris in 1952, he started his scientific career at the Centre National de la Recherche Scientifique in Paris, before moving on to various scientific appointments which included those at Ecole Polytechnique, Université de Lille, the University of Geneva, MIT, Princeton, University of Chicago, and finally the IBM Thomas J. Watson Research Center in Yorktown Heights, New York and Yale University where he spent the longer part of his career.

A central thread in his scientific career is the “ardent pursuit of the concept of roughness” which resulted in a rich theoretical apparatus—fractal and multifractal geometry—whose aim is to describe and represent the order hidden in apparently wildly disordered and random phenomena ranging from the geometry of coastlines to the variation of foreign exchange rates. In his own words:

    The roughness of clusters in the physics of disorder, of turbulent flows, of exotic noises, of chaotic dynamical systems, of the distribution of galaxies, of coastlines, of stock price charts, and of mathematical constructions,—these have typified the topics I studied.

He formalized the notion of ‘fractal process’—and later, that of multifractal [13]—which provided a tool for quantifying the “degree of irregularity” of various random phenomena in mathematics, physics, and economics.

Benoit Mandelbrot’s numerous awards include the 1993 Wolf Prize for Physics and the 2003 Japan Prize for Science and Technology, the 1985 F. Barnard Medal for Meritorious Service to Science (“Magna est Veritas”) of the US National Academy of Sciences, the 1986 Franklin Medal for Signal and Eminent Service in Science of the Franklin Institute of Philadelphia, the 1988 Charles Proteus Steinmetz Medal of IEEE, the 2004 Prize of Financial Times/Deutschland, and a Humboldt Preis from the Alexander von Humboldt Stiftung.

From Mild to Wild Randomness: The Noah Effect

Mandelbrot developed an early interest in the stochastic modeling of financial markets. Familiar with the work of Louis Bachelier (see Bachelier, Louis (1870–1946)), Mandelbrot published a series of pioneering studies [6–8, 21] on the tail behavior of the distribution of price variations, where he advocated the use of heavy-tailed distributions and scale-invariant Lévy processes for modeling price fluctuations. The discovery of the heavy-tailed nature of price movements led him to coin the term “wild randomness” for describing market behavior, as opposed to the “mild randomness” represented by Bachelier’s Brownian model, which later became the standard approach embodied in the Black–Scholes model. Mandelbrot likened the sudden bursts of volatility in financial markets to the “Noah effect”, by analogy with the flood which destroys the world in Noah’s biblical story:

    In science, all important ideas need names and stories to fix them in the memory. It occurred to me that the market’s first wild trait, abrupt change or discontinuity, is prefigured in the tale of Noah. As Genesis relates, in Noah’s six-hundredth year God ordered the Great Flood to purify a wicked world. [. . .] The flood came and went, catastrophic but transient. Market crashes are like that: at times, even a great bank or brokerage house can seem like a little boat in a big storm.

Long-range Dependence: The Joseph Effect

Another early insight of Mandelbrot’s studies of financial and economic data was the presence of long-range dependence [9–11] in market fluctuations:

    The market’s second wild trait—almost cycles—is prefigured in the story of Joseph. The Pharaoh dreamed that seven fat cattle were feeding in the meadows, when seven lean kine rose out of the Nile and ate them. [. . .] Joseph, a Hebrew slave, called the dreams prophetic: Seven years of famine would follow seven years of prosperity. [. . .] Of course, this is not a regular or predictable pattern. But the appearance of one is strong. Behind it is the influence of long-range dependence in an otherwise random process or, put another way, a long-term memory through which the past continues to influence the random fluctuations of the present. I called these two distinct forms of wild behavior the Noah effect and the Joseph effect. They are two aspects of one reality.

Building on his earlier work [22, 23] on long-range dependence in hydrology and fractional Brownian motion, Mandelbrot proposed the use of fractional processes for modeling long-range dependence and scaling properties of economic quantities (see Long Range Dependence).

Multifractal Models and Stochastic Time Changes

In a series of papers [2, 4, 20] with Adlai Fisher and Laurent Calvet, Mandelbrot studied the scaling properties of the USD/DEM foreign exchange rate at frequencies ranging from a few minutes to weeks and, building on earlier work by Clark [3] and Mandelbrot [12, 13], introduced a new family of stochastic models, where the (log) price of an asset is represented by a time-changed fractional Brownian motion, where the time change, representing market activity, is given by a multifractal (see Multifractals) increasing process (see Mixture of Distribution Hypothesis; Time Change) [5, 15]:

    The key step is to introduce an auxiliary quantity called trading time. The term is self-explanatory and embodies two observations. While price changes over fixed clock time intervals are long-tailed, price changes between successive transactions stay near-Gaussian over sometimes long period between discontinuities. Following variations in the trading volume, the time interval between successive transactions vary greatly. This suggests that trading time is related to volume.

The topic of multifractal modeling in finance was further developed in [1, 17–19]; a nontechnical account is given in [16].

Mandelbrot’s work in quantitative finance has been generally 20 years ahead of its time: many of his ideas proposed in the 1960s—such as long-range dependence, volatility clustering, and heavy tails—became mainstream in financial modeling in the 1990s. If this is anything of a pattern, his more recent work in the field might deserve a closer look.

Perhaps one of the most important insights of his work on financial modeling is to closely examine the empirical features of data before axiomatizing and writing down complex equations, a timeless piece of advice which can be a useful guide for quantitative modeling in finance.

Mandelbrot’s work in finance is summarized in the books [14, 15] and a popular account of this work is given in the book [5].

References

[1] Barral, J. & Mandelbrot, B. (2002). Multifractal products of cylindrical pulses, Probability Theory and Related Fields 124, 409–430.
[2] Calvet, L., Fisher, A. & Mandelbrot, B. (1997). Large Deviations and the Distribution of Price Changes. Cowles Foundation Discussion Papers: 1165.
[3] Clark, P.K. (1973). A subordinated stochastic process model with finite variance for speculative prices, Econometrica 41(1), 135–155.
[4] Fisher, A., Calvet, L.M. & Mandelbrot, B. (1997). Multifractality of the Deutschmark/US Dollar exchange rates. Cowles Foundation Discussion Papers: 1166.
[5] Hudson, R.L. (2004). The (Mis)behavior of Prices: A Fractal View of Risk, Ruin, and Reward, Basic Books, New York, & Profile Books, London, pp. xxvi + 329.
[6] Mandelbrot, B. (1962). Sur certains prix spéculatifs: faits empiriques et modèle basé sur les processus stables additifs de Paul Lévy, Comptes Rendus (Paris) 254, 3968–3970.
[7] Mandelbrot, B. (1963). The variation of certain speculative prices, The Journal of Business of the University of Chicago 36, 394–419.
[8] Mandelbrot, B. (1963). New methods in statistical economics, The Journal of Political Economy 71, 421–440.
[9] Mandelbrot, B. (1971). Analysis of long-run dependence in economics: the R/S technique, Econometrica 39, (July Supplement), 68–69.
[10] Mandelbrot, B. (1971). When can price be arbitraged efficiently? A limit to the validity of the random-walk and martingale models, Review of Economics and Statistics 53, 225–236.
[11] Mandelbrot, B. (1972). Statistical methodology for nonperiodic cycles: from the covariance to R/S analysis, Annals of Economic and Social Measurement 1, 257–288.
[12] Mandelbrot, B. (1973). Comments on “A subordinated stochastic process model with finite variance for speculative prices.” by Peter K. Clark, Econometrica 41, 157–160.
[13] Mandelbrot, B. (1974). Intermittent turbulence in self-similar cascades; divergence of high moments and dimension of the carrier, Journal of Fluid Mechanics 62, 331–358.
[14] Mandelbrot, B. (1997). Fractals and Scaling in Finance: Discontinuity, Concentration, Risk, Springer, New York, pp. x + 551.
[15] Mandelbrot, B. (1997). Fractales, hasard et finance (1959–1997), Flammarion (Collection Champs), Paris, p. 246.
[16] Mandelbrot, B. (1999). A Multifractal Walk down Wall Street, Scientific American, February 1999, pp. 50–53.
[17] Mandelbrot, B. (2001). Scaling in financial prices, I: tails and dependence, Quantitative Finance 1, 113–123.
[18] Mandelbrot, B. (2001). Scaling in financial prices, IV: multifractal concentration, Quantitative Finance 1, 641–649.
[19] Mandelbrot, B. (2001). Stochastic volatility, power-laws and long memory, Quantitative Finance 1, 558–559.
[20] Mandelbrot, B., Fisher, A. & Calvet, L. (1997). The Multifractal Model of Asset Returns. Cowles Foundation Discussion Papers: 1164.
[21] Mandelbrot, B. & Taylor, H.M. (1967). On the distribution of stock price differences, Operations Research 15, 1057–1062.
[22] Mandelbrot, B. & Van Ness, J.W. (1968). Fractional Brownian motions, fractional noises and applications, SIAM Review 10, 422–437.
[23] Mandelbrot, B. & Wallis, J.R. (1968). Noah, Joseph and operational hydrology, Water Resources Research 4, 909–918.

Further Reading

Mandelbrot, B. (1966). Forecasts of future prices, unbiased markets and “martingale” models, The Journal of Business of the University of Chicago 39, 242–255.
Mandelbrot, B. (1982). The Fractal Geometry of Nature.
Mandelbrot, B. (2003). Heavy tails in finance for independent or multifractal price increments, in Handbook on Heavy Tailed Distributions in Finance, T.R. Svetlozar, ed., Handbooks in Finance, 30, Elsevier, pp. 1–34, Vol. 1.

Related Articles

Exponential Lévy Models; Fractional Brownian Motion; Heavy Tails; Lévy Processes; Long Range Dependence; Mixture of Distribution Hypothesis; Stylized Properties of Asset Returns.

RAMA CONT
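The contrast between mild and wild randomness in the Mandelbrot entry, and the long memory behind the Joseph effect, can be made concrete with two closed-form quantities. The sketch below is illustrative only and makes two assumptions that are not in the entry itself. First, the standard Cauchy law is used as a convenient example of a heavy-tailed stable distribution (stability index 1). It is not presented as Mandelbrot's fitted model. Second, long memory is illustrated with the autocorrelation of unit-step increments of a fractional Brownian motion with Hurst exponent `hurst`. All function names are hypothetical.

```python
from math import atan, erfc, pi, sqrt


def gaussian_tail(k):
    """P(|Z| > k) for a standard normal variable Z (mild randomness)."""
    return erfc(k / sqrt(2.0))


def cauchy_tail(k):
    """P(|X| > k) for a standard Cauchy variable X (a heavy-tailed stable law)."""
    return 1.0 - (2.0 / pi) * atan(k)


def fgn_autocorr(lag, hurst):
    """Autocorrelation at an integer lag of unit-step fractional Brownian increments.

    hurst = 0.5 gives uncorrelated increments, hurst > 0.5 gives positive,
    slowly decaying correlations (long-range dependence).
    """
    h2 = 2.0 * hurst
    return 0.5 * (abs(lag + 1) ** h2 - 2.0 * abs(lag) ** h2 + abs(lag - 1) ** h2)


# Noah effect: probability of a move larger than k scale units.
for k in (2, 5, 10):
    print(k, gaussian_tail(k), cauchy_tail(k))

# Joseph effect: correlation between increments that are `lag` steps apart.
for lag in (1, 10, 100):
    print(lag, fgn_autocorr(lag, 0.5), fgn_autocorr(lag, 0.8))
```

Under the Gaussian law a move of ten scale units is practically impossible, while under the heavy-tailed law it remains a few-percent event. With a Hurst exponent above one half, increments far apart in time stay positively correlated.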
Sharpe, William F. market portfolio. Sharpe’s next step was to derive a
relationship between the expected return on any risky
asset and the expected return on the market. As a
William Forsyth Sharpe (born on June 16, 1934) is matter of curiosity, the CAPM relationship does not
one of the leading contributors to financial economics appear in the body of the paper but rather as the final
and shared the Nobel Memorial Prize in Economic equation in footnote 23 on page 438.
Sciences in 1990 with Harry Markowitz and Merton The CAPM relationship in modern notation is
Miller. His most important contribution is the capital
asset pricing model (CAPM), which provided an E[Rj ] − rf = βj (E[Rm ] − rf ) (1)
equilibrium-based relationship between the expected
return on an asset and its risk as measured by where Rj is the return on security j , Rm is the return
its covariance with market portfolio. Similar ideas on the market portfolio of all risky assets, rf is the
were developed by John Lintner, Jack Treynor (see return on the risk-free security, and
Treynor, Lawrence Jack), and Jan Mossin around
the same time. Sharpe has made other important Cov(Rj , Rm )
βj = (2)
contributions to the field of financial economics but, V ar(Rm )
given the space limitations, we only describe two of
his contributions: the CAPM and the Sharpe ratio. is the beta of security j . The CAPM asserts that
It is instructive to trace the approach used by the excess expected return on a risky security is
Sharpe in developing the CAPM. His starting point equal to the security’s beta times the excess expected
was Markowitz’s model of portfolio selection, which return on the market. Note that this is a single period
showed how rational investors would select optimal model and that it is formulated in terms of ex ante
portfolios. If investors only care about the expected expectations. Note also that formula (2) provides an
return and the variance of their portfolios, then the explicit expression for the risk of a security in terms
optimal weights can be obtained by quadratic pro- of its covariance with the market and the variance
gramming. The inputs to the optimization are the with the market.
expected returns on the individual securities and The CAPM has become widely used in both
their covariance matrix. In 1963, Sharpe [1] showed investment finance and corporate finance. It can
how to simplify the computations required under the be used as a tool in portfolio selection and also
Markowitz approach. He assumed that each secu- in the measurement of investment performance of
rity’s return was generated by two random factors: portfolio managers. The CAPM is also useful in
one common to all securities and a second factor capital budgeting applications since it gives a formula
that was uncorrelated across securities. This assump- for the required expected return on an investment. For
tion leads to a simple diagonal covariance matrix. this reason, the CAPM is often used in rate hearings
Although the initial motivation for this simplify- in some jurisdictions for regulated entities such as
ing assumption was to reduce the computational utility companies or insurance companies.
time, it would turn out to have deep economic The insights from the CAPM also played an
significance. important role in subsequent theoretical advances,
These economic ideas were developed in Sharpe’s but owing to space constraint we only mention one.
[2] Journal of Finance paper. He assumed that all The original derivation of the classic Black–Scholes
investors would select mean-variance-efficient port- option formula was based on the CAPM. Black
folios. He also assumed that investors had homoge- assumed that the return on the stock and the return on
neous beliefs and that investors could borrow and its associated warrant both obeyed the CAPM. Hence
lend at the same riskless rate. As Tobin had shown, he was able to obtain expressions for the expected
this implied two fund separations where the investor return on both of these securities and he used this in
would divide his money between the risk-free asset deriving the Black–Scholes equation for the warrant
and an efficient portfolio of risky assets. Sharpe price.
highlighted the importance of the notion of equi- The second contribution that we discuss is the
librium in this context. This efficient portfolio of Sharpe ratio. In the case of a portfolio p with
risky assets in equilibrium can be identified with the expected return E[Rp ] and standard deviation σp , the
Sharpe ratio is

$$\frac{E[R_p] - r_f}{\sigma_p} \qquad (3)$$

Sharpe [3] introduced this formula in 1966. It represents the excess expected return on the portfolio normalized by the portfolio's standard deviation and thus provides a compact measure of the reward to variability. The Sharpe ratio is also known as the market price of risk. Sharpe used this ratio to evaluate the performance of mutual funds, and it is now widely used as a measure of portfolio performance.

In continuous-time finance, the instantaneous Sharpe ratio, $\gamma_t$, plays a key role in the transformation of a Brownian motion under the real-world measure P to a Brownian motion under the risk-neutral measure Q. Suppose $W_t$ is a Brownian motion under P and $\tilde{W}_t$ is a Brownian motion under Q; then we have, from the Girsanov theorem under suitable conditions on $\gamma$,

$$d\tilde{W}_t = dW_t + \gamma_t\,dt \qquad (4)$$

It is interesting to see that the Sharpe ratio figures so prominently in this fundamental relationship in modern mathematical finance.

Bill Sharpe has made several other notable contributions to the development of the finance field. His papers have profoundly influenced investment science and portfolio management. He developed the first binomial tree model (see Binomial Tree) for option pricing, the gradient method for asset allocation optimization and returns-based style analysis for evaluating the style and performance of investment funds. Sharpe has helped translate these theoretical ideas into practical applications. These applications include the creation of index funds and several aspects of retirement portfolio planning. He has written a number of influential textbooks, including Investments, used throughout the world. It is clear that Sharpe's ideas have been of great significance in the subsequent advances in the discipline of finance.

References

[1] Sharpe, W.F. (1963). A simplified model for portfolio analysis, Management Science 9(2), 277–293.
[2] Sharpe, W.F. (1964). Capital asset prices: a theory of market equilibrium under conditions of risk, The Journal of Finance XIX(3), 425–442.
[3] Sharpe, W.F. (1966). Mutual fund performance, Journal of Business 39, 119–138.

Further Reading

Sharpe, W.F., Alexander, G.J. & Bailey, J. (1999). Investments, Prentice-Hall.

Related Articles

Capital Asset Pricing Model; Style Analysis; Binomial Tree.

PHELIM BOYLE
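As a rough illustration of the Sharpe ratio in equation (3), the sketch below estimates it from a sample of per-period returns, using the sample mean in place of the expected return and the sample standard deviation in place of the portfolio's standard deviation. The function name, the return series, and the riskless rate are hypothetical values chosen for illustration; none of them come from Sharpe's papers.

```python
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free_rate):
    """Sample estimate of (E[R_p] - r_f) / sigma_p from per-period returns."""
    excess_return = mean(returns) - risk_free_rate
    return excess_return / stdev(returns)  # stdev is the sample standard deviation


# Hypothetical annual portfolio returns and riskless rate (illustration only).
example_returns = [0.12, 0.04, -0.03, 0.15, 0.07]
example_rf = 0.02

print(round(sharpe_ratio(example_returns, example_rf), 4))
```

The returns and the riskless rate must refer to the same period length; a ratio computed from monthly data is not directly comparable with one computed from annual data.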
Markowitz, Harry

Harry Max Markowitz, born in Chicago in 1927, said in his 1990 Nobel Prize acceptance speech that, as a child, he was unaware of the Great Depression, which caused a generation of investors and noninvestors the world over to mistrust the markets. However, it was a slim, 15-page paper published by Markowitz as a young man that would eventually transform the way people viewed the relationship between risk and return, and that overhauled the way the investment community constructed diversified portfolios of securities.

Markowitz was working on his dissertation in economics at the University of Chicago when his now-famous “Portfolio Selection” paper appeared in the March 1952 issue of the Journal of Finance [1]. He was 25. He went on to win the Nobel Prize in Economic Sciences in 1990 for providing the cornerstone to what came to be known as modern portfolio theory (see Modern Portfolio Theory).

Markowitz shared the Nobel Prize with Merton H. Miller and William F. Sharpe (see Sharpe, William F.), who were recognized, respectively, for their work on how firms' capital structure and dividend policy affect their stock price, and the development of the capital asset pricing model, which presents a way to measure the riskiness of a stock relative to the performance of the stock market as a whole. Together, the three redefined the way investors thought about the investment process, and created the field of financial economics. Markowitz, whose work built on earlier work on diversification by Yale University's James Tobin, who received a Nobel Prize in 1981, was teaching at Baruch College at the City University of New York when he won the Nobel at the age of 63.

Markowitz received a bachelor of philosophy in 1947 and a PhD in economics in 1955, both from the University of Chicago. Years later he said that when he decided to study economics, his philosophical interests drew him toward the “economics of uncertainty”. At Chicago, he studied with Milton Friedman, Jacob Marschak, Leonard Savage, and Tjalling Koopmans, and became a student member of the famed Cowles Commission for Research in Economics (which moved to Yale University in 1955 and was renamed the Cowles Foundation).

The now-landmark 1952 “Portfolio Selection” paper skipped over the problem of selecting individual stocks and focused instead on how a manager or investor selects a portfolio best suited to the individual's risk and return preferences. Pre-Markowitz, diversification was considered important, but there was no framework to determine how diversified a portfolio was or how an investor could create a well-diversified portfolio.

Keeping in mind that “diversification is both observed and sensible,” the paper began from the premise that investors consider expected return a “desirable thing” and risk an “undesirable thing”. Markowitz's first insight was to look at a portfolio's risk as the variance of its returns. This offered a way to quantify investment risk that previously had not existed. He then perceived that a portfolio's riskiness depended not just on the expected returns and variances of the individual assets but also on the correlations between the assets in the portfolio. For Markowitz, the wisdom of diversification was not simply a matter of holding a large number of different securities, but of holding securities whose value did not rise and fall in tandem with one another. “It is necessary to avoid investing in securities with high covariances among themselves,” he stated in the paper. Investing in companies in different industries, for instance, increased a portfolio's diversification and, paradoxically, improved the portfolio's expected returns by reducing its variance.

Markowitz's paper laid out a mathematical theory for deriving the set of optimal portfolios based on their risk-return characteristics. Markowitz showed how mean-variance analysis could be used to find a set of securities whose risk-return combinations were deemed “efficient”. Markowitz referred to this as the expected returns–variance of returns rule (E-V rule). The range of possible risk–return combinations yielded what Markowitz described as efficient and inefficient portfolios, an idea he based on Koopmans' notion that there are efficient and inefficient allocations of resources [3]. Koopmans, at the time, was one of Markowitz's professors. Markowitz's notion of efficient portfolios was subsequently called the efficient frontier. “Not only does the E-V hypothesis imply diversification, it implies the ‘right kind’ of diversification for the ‘right reason,’” Markowitz wrote. The optimal portfolio was the one that would provide the minimum risk for a
given expected return, or the highest expected return for a given level of risk. An investor would select the portfolio whose risk-return characteristics he preferred.

It has been said many times over the years that Markowitz's portfolio theory provided, at long last, the math behind the adage “Don't put all your eggs in one basket.” In 1988, Sharpe said of Markowitz's portfolio selection concept: “I liked the parsimony, the beauty, of it. ... I loved the mathematics. It was simple but elegant. It had all the aesthetic qualities that a model builder likes” [5].

Back in 1952, Markowitz already knew the practical value of the E-V rule he had crafted. It functioned, his paper noted, both “as a hypothesis to explain well-established investment behavior and as a maxim to guide one's own action.” However, Markowitz's insight was deeper. The E-V rule enabled the investment management profession to distinguish between investment and speculative behavior, which helped fuel the gradual institutionalization of the investment management profession. In the wake of Markowitz's ideas, investment managers could strive to build portfolios that were not simply groupings of speculative stocks but well-diversified sets of securities designed to meet the risk-return expectations of investors pursuing clear investment goals.

Markowitz's ideas gained traction slowly, but within a decade investment managers were turning to Markowitz's theory of portfolio selection (see Modern Portfolio Theory) to help them determine how to select portfolios of diversified securities. This occurred as institutional investors in the United States were casting around for ways to structure portfolios that relied more on analytics and less on relationships with brokers and bankers. In the intervening years, Markowitz expanded his groundbreaking work. In 1956, he published the Critical Line Algorithm, which explained how to compute the efficient frontier for portfolios with large numbers of securities subject to constraints. In 1959, he published Portfolio Selection: Efficient Diversification of Investments, which delved further into the subject and explored the relationship between his mean-variance analysis and the fundamental theories of action under uncertainty of John von Neumann and Oskar Morgenstern, and of Leonard J. Savage [2].

However, while Markowitz is most widely known for his work in portfolio theory, he has said that he values another prize he received more than the Nobel: the von Neumann Prize in operations research theory. That prize, he said, recognized the three main research areas that have defined his career. Markowitz received the von Neumann Prize in 1989 from the Operations Research Society of America and the Institute of Management Sciences (now combined as INFORMS) for his work on portfolio theory, sparse matrix techniques, and the high-level simulation language SIMSCRIPT.

After Chicago, Markowitz went to the RAND Corp. in Santa Monica, CA, where he worked with a group of economists on linear programming techniques. In the mid-1950s, he developed sparse matrix methods, a technique for solving large mathematical optimization problems. Toward the end of the decade, he went to General Electric to build models of manufacturing plants in the company's manufacturing services department. After returning to RAND in 1961, he and his team developed a high-level programming language for simulations called SIMSCRIPT to support Air Force projects that involved simulation models. The language was published in 1962. The same year, Markowitz and former colleague Herb Karr formed CACI, the California Analysis Center Inc. The firm later changed its name to Consolidated Analysis Centers Inc. and became a publicly traded company that provided IT services to the government and intelligence community. It is now called CACI International.

Markowitz's career has ranged across academia, research, and business. He worked in the money management industry as president of Arbitrage Management Company from 1969 to 1972. From 1974 until 1983, Markowitz was at IBM's T.J. Watson Research Center in Yorktown Heights, NY. He has taught at the University of California at Los Angeles, Baruch College and, since 1994, at the University of California at San Diego. He continues to teach at UC-San Diego and is an academic consultant to Index Fund Advisors, a financial services firm that provides low-cost index funds to investors.

In the fall of 2008 and subsequent winter, Markowitz's landmark portfolio theory came under harsh criticism in the lay press as all asset classes declined together. Markowitz, however, argued that the credit crisis and ensuing losses highlighted the benefits of diversification and exposed the risks in
not understanding, or in misunderstanding, the correlations between assets in a portfolio. “Portfolio theory was not invalidated, it was validated,” he noted in a 2009 interview with Index Fund Advisors [4]. He has said numerous times over the years that there are no “shortcuts” to understanding the tradeoff between risk and return. “US portfolio theorists do not talk about risk control,” he said in that interview. “It sounds like you can control risk. You can't.” “But diversification,” he continued, “is the next best thing.”

References

[1] Markowitz, H.M. (1952). Portfolio selection, Journal of Finance 7, 77–91.
[2] Markowitz, H.M. (1959). Portfolio Selection: Efficient Diversification of Investments, John Wiley & Sons, New York.
[3] Markowitz, H.M. (2002). An Interview with Harry Markowitz by Jeffrey R. Yost, Charles Babbage Institute, University of Minnesota, Minneapolis, MN.
[4] Markowitz, H.M. (2009). An Interview with Harry M. Markowitz by Mark Hebner, Index Fund Advisors, Irvine, CA.
[5] Sharpe, W.F. (1988). Revisiting the Capital Asset Pricing Model, an interview by Jonathan Burton. Dow Jones Asset Manager, May/June, 20–28.

Related Articles

Modern Portfolio Theory; Risk–Return Analysis; Sharpe, William F.

NINA MEHTA
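The dependence of portfolio risk on the correlations between assets, as described in this article, can be sketched for the simplest case of two assets. The snippet below uses the standard two-asset variance formula; the function name, weights, expected returns, and volatilities are hypothetical numbers chosen only for illustration and do not come from Markowitz's paper.

```python
import math


def portfolio_stats(w1, mu1, mu2, sigma1, sigma2, rho):
    """Expected return and standard deviation of a two-asset portfolio.

    w1 is the weight on asset 1 (asset 2 gets 1 - w1); rho is the
    correlation between the two assets' returns.
    """
    w2 = 1.0 - w1
    expected = w1 * mu1 + w2 * mu2
    variance = (
        (w1 * sigma1) ** 2
        + (w2 * sigma2) ** 2
        + 2.0 * w1 * w2 * rho * sigma1 * sigma2  # covariance term
    )
    return expected, math.sqrt(variance)


# Hypothetical assets: same expected return (8%) and volatility (20%),
# held in equal weights, under different assumed correlations.
for rho in (1.0, 0.5, 0.0, -0.5):
    er, sd = portfolio_stats(0.5, 0.08, 0.08, 0.20, 0.20, rho)
    print(f"rho={rho:+.1f}  E[R]={er:.3f}  sigma={sd:.3f}")
```

In this example the expected return is the same in every row, while the standard deviation falls as the assumed correlation falls, which is the sense in which holding securities that do not move in tandem reduces risk.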
Merton, Robert C.

Robert C. Merton is the John and Natty McArthur University Professor at Harvard Business School. In 1966, he earned a BS in engineering mathematics from Columbia University, where he published his first paper, “The ‘Motionless’ Motion of Swift's Flying Island”, in the Journal of the History of Ideas [4]. He then went on to pursue graduate studies in applied mathematics at the California Institute of Technology, leaving the institution with an MS in 1967. He obtained a PhD in economics in 1970 from the Massachusetts Institute of Technology, where he worked under the Nobel laureate Paul A. Samuelson (see Samuelson, Paul A.). His dissertation was entitled “Analytical Optimal Control Theory as Applied to Stochastic and Nonstochastic Economics.” Prior to joining Harvard in 1988, Merton served on the finance faculty of the Massachusetts Institute of Technology.

In 1997, Merton shared the Nobel Prize in Economic Sciences with Myron Scholes “for a new method to determine the value of derivatives”. Merton taught himself stochastic dynamic programming and Ito calculus during graduate school at the Massachusetts Institute of Technology and subsequently introduced Ito calculus (see Stochastic Integrals) into finance and economics. Continuous-time stochastic calculus has become a cornerstone of mathematical finance, and more than anyone Merton is responsible for making manifest the mathematical tool's power in financial modeling and applications. Merton has also produced highly regarded work on dynamic models of optimal lifetime consumption and portfolio selection, equilibrium asset pricing, contingent-claim analysis, and financial systems. Merton's monograph “Continuous-Time Finance” [8] is a classic introduction to these topics.

Merton proposed an intertemporal capital asset pricing model (ICAPM) [6] (see Capital Asset Pricing Model), a model empirically more attractive than the single-period capital asset pricing model (CAPM) (see Capital Asset Pricing Model). Assuming continuous-time stochastic processes with continuous decision-making and trading, Merton showed that mean–variance portfolio choice is optimal at each moment of time. It explained when and how the CAPM could hold in a dynamic setting. As an extension, Merton looked at the case when the set of investment opportunities is stochastic and evolves over time. Investors hold a portfolio to hedge against shifts in the opportunity set of security returns. This implies that investors are compensated in the expected return for bearing the risk of shifts in the opportunity set of security returns, in addition to bearing market risk. Because of this additional compensation in expected return, in equilibrium, expected returns on risky assets may differ from the riskless expected return even when they have no market risk. Through this work, we obtain an empirically more useful version of CAPM that allows for multiple risk factors. Merton's ICAPM predated many subsequently published multifactor models like the arbitrage pricing theory [11] (see Arbitrage Pricing Theory).

Merton's work in the 1970s laid the foundation for modern derivative pricing theory (see Option Pricing: General Principles). His paper “Theory of Rational Option Pricing” [5] is one of the two classic papers on derivative pricing that led to the Black–Scholes–Merton option pricing theory (see Black–Scholes Formula). Merton's essential contribution was his hedging (see Hedging) argument for option pricing based on no arbitrage; he showed that one can use the prescribed dynamic trading strategy under Black–Scholes [1] to offset the risk exposure of an option and obtain a perfect hedge under the continuous trading limit. In other words, he discovered how to construct a “synthetic option” using continual revision of a “self-financing” portfolio involving the underlying asset and riskless borrowing to replicate the expiration-date payoff of the option. And no arbitrage dictates that the cost of constructing this synthetic option must give the price of the option even if it does not exist. This seminal paper also extended the Black–Scholes model to allow for predictably changing interest rates, dividend payments on the underlying asset, changing exercise price, and early exercise under American options. Merton also produced “perhaps the first closed-form formula for an exotic option” [12]. Merton's approach to derivative securities provided the intellectual basis for the rise of the profession of financial engineering.

The Merton model (see Structural Default Risk Models) refers to an increasingly popular structural credit risk model introduced by Merton [7] in the early 1970s. Drawing on the insight that the payoff
structure of the leveraged equity of a firm is identical to that of a call option (see Call Options) on the market value of the assets of the whole firm, Merton proposed that the leveraged equity of a firm could be valued as if it were a call option on the assets of the whole firm. The isomorphic (same payoff structure) price relation between the leveraged equity of a firm and a call option allows one to apply the Black–Scholes–Merton contingent-claim pricing model to value the equity [7]. The value of the corporate debt could then be obtained by subtracting the value of the option-type structure that the leveraged equity represents from the total market value of the assets. Merton's methodology offered a way to obtain valuation functions for the equity and debt of a firm, a measure of the risk of the debt, as well as all the Greeks of contingent-claim pricing. The Merton model provided a useful basis for valuing and assessing corporate debt, its risk, and the sensitivity of debt value to various parameters (e.g., the delta gives the sensitivity of either debt value or equity value to change in asset value). Commercial versions of the Merton model include the KMV model and the Jarrow–Turnbull model.

Since the 1990s, Merton has collaborated with Zvi Bodie, Professor of Finance at Boston University, to develop a new line of research on the financial system [2, 9, 10]. They adopted a functional perspective, “similar in spirit to the functional approach in sociology pioneered by Robert K. Merton (1957)” [3, 9]. By focusing on the underlying functions of financial systems, the functional perspective takes functions rather than institutions and forms as the conceptual anchor in its analysis of financial institutional change over time and contemporaneous institutional differences across borders. The functional perspective is also useful for predicting and guiding financial institutional change. The existing approaches of neoclassical, institutional, and behavioral theories in economics are taken as complementary in the functional approach to understanding financial systems.

Merton has made significant contributions to finance across a broad spectrum, and they are too numerous to mention exhaustively. His other works include those on Markowitz–Sharpe-type models with investors with homogeneous beliefs but with incomplete information about securities, the use of jump-diffusion models (see Jump-diffusion Models) in option pricing, valuation of market forecasts, pension reforms, and employee stock options (see Employee Stock Options).

In addition to his academic duties, Merton has also been a partner of the now-defunct hedge fund Long-Term Capital Management (see Long-Term Capital Management) and is currently Chief Scientific Officer at the Trinsum Group.

References

[1] Black, F. & Scholes, M. (1973). The pricing of options and corporate liabilities, Journal of Political Economy 81(3), 637–659.
[2] Crane, D., Froot, K., Mason, S., Perold, A., Merton, R.C., Bodie, Z., Sirri, E. & Tufano, P. (1995). The Global Financial System: A Functional Perspective, Harvard Business School Press, Boston, MA.
[3] Merton, R.K. (1957). Social Theory and Social Structure, revised and enlarged edition, The Free Press, Glencoe, IL.
[4] Merton, R.C. (1966). The “Motionless” Motion of Swift's flying island, Journal of the History of Ideas 27, 275–277.
[5] Merton, R.C. (1973). Theory of rational option pricing, Bell Journal of Economics and Management Science 4(1), 141–183.
[6] Merton, R.C. (1973). An intertemporal capital asset pricing model, Econometrica 41(5), 867–887.
[7] Merton, R.C. (1974). On the pricing of corporate debt: the risk structure of interest rates, Journal of Finance 29(2), 449–470.
[8] Merton, R.C. (1990). Continuous-Time Finance, Blackwell, Malden, MA.
[9] Merton, R.C. & Bodie, Z. (1995). A conceptual framework for analyzing the financial system. Chapter 1 in The Global Financial System: A Functional Perspective, D. Crane, K. Froot, S. Mason, A. Perold, R. Merton, Z. Bodie, E. Sirri, & P. Tufano, eds, Harvard Business School Press, Boston, MA, pp. 3–31.
[10] Merton, R.C. & Bodie, Z. (2005). Design of financial systems: towards a synthesis of function and structure, Journal of Investment Management 3(1), 1–23.
[11] Ross, S. (1976). The arbitrage theory of capital asset pricing, Journal of Economic Theory 13(3), 341–360.
[12] Rubinstein, M. (2006). A History of the Theory of Investments, John Wiley & Sons, Hoboken, NJ, p. 240.
Further Reading

Merton, R.C. (1990). Continuous-Time Finance, Blackwell, Malden, MA.

Related Articles

Black, Fischer; Black–Scholes Formula; Jump-diffusion Models; Long-Term Capital Management; Merton Problem; Option Pricing: General Principles; Option Pricing Theory: Historical Perspectives; Partial Differential Equations; Samuelson, Paul A.; Structural Default Risk Models; Thorp, Edward.

ALEX HAMILTON CHAN
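The structural credit risk idea described in this article, valuing leveraged equity as a call option on the firm's assets and obtaining debt as the remainder, can be sketched with the Black–Scholes call formula. This is a minimal illustration under simplifying assumptions that are not spelled out in the article: a single zero-coupon debt issue with face value `debt_face` due at `maturity`, a constant riskless rate, and lognormal asset values with constant volatility. All function names, parameter names, and input values are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def merton_equity_and_debt(assets, debt_face, rate, asset_vol, maturity):
    """Value equity as a European call on firm assets; debt is the residual."""
    sqrt_t = math.sqrt(maturity)
    d1 = (
        math.log(assets / debt_face) + (rate + 0.5 * asset_vol ** 2) * maturity
    ) / (asset_vol * sqrt_t)
    d2 = d1 - asset_vol * sqrt_t
    equity = assets * norm_cdf(d1) - debt_face * math.exp(-rate * maturity) * norm_cdf(d2)
    debt = assets - equity  # total asset value minus the option-like equity claim
    return equity, debt


# Hypothetical firm: assets 100, zero-coupon debt with face value 80 due in 1 year.
equity, debt = merton_equity_and_debt(
    assets=100.0, debt_face=80.0, rate=0.05, asset_vol=0.25, maturity=1.0
)
print(f"equity={equity:.2f}  debt={debt:.2f}")
```

Under these assumptions the debt is worth less than the same promised payment discounted at the riskless rate, and the gap is one simple measure of the credit risk borne by the lenders.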
Arbitrage: Historical Perspectives

The concept of arbitrage has acquired a precise, technical meaning in quantitative finance (see Arbitrage Pricing Theory; Arbitrage Strategy; Arbitrage Bounds). In theoretical pricing of derivative securities, an arbitrage is a riskless trading strategy that generates a positive profit with no net investment of funds. This definition can be loosened to allow the positive profit to be nonnegative, with no possible future state having a negative outcome and at least one state with a positive outcome. Pricing formulas for specific contingent claims are derived by assuming an absence of arbitrage opportunities. Generalizing this notion of arbitrage, the fundamental theorem of asset pricing provides that an absence of arbitrage opportunities implies the existence of an equivalent martingale measure (see Fundamental Theorem of Asset Pricing; Equivalent Martingale Measures). Combining absence of arbitrage with a linear model of asset returns, the arbitrage pricing theory decomposes the expected return of a financial asset into a linear function of various economic risk factors, including market indices. Sensitivity of expected return to changes in each factor is represented by a factor-specific beta coefficient. Significantly, while riskless arbitrage imposes restrictions on prices observed at a given point in time, the arbitrage pricing theory seeks to explain expected returns, which involve prices observed at different points in time.

In contrast to the technical definitions of arbitrage used in quantitative finance, colloquial usage of arbitrage in modern financial markets refers to a range of trading strategies, including municipal bond arbitrage; merger arbitrage; and convertible bond arbitrage. Correctly executed, these strategies involve trades that are low risk relative to the expected return but do have possible outcomes where profits can be negative. Similarly, uncovered interest arbitrage seeks to exploit differences between foreign and domestic interest rates leaving the risk of currency fluctuations unhedged. These notions of risky arbitrage can be contrasted with covered interest arbitrage, which corresponds to the definition of arbitrage used in quantitative finance of a riskless trading strategy that generates a positive profit with no net investment of funds. Cash-and-carry arbitrages related to financial derivatives provide other examples of arbitrages relevant to the quantitative finance usage. Among the general public, confusion about the nature of arbitrage permitted Bernard Madoff to use the illusion of arbitrage profit opportunities to attract “hedge fund investments” into the gigantic Ponzi scheme that collapsed in late 2008. Tracing the historical roots of arbitrage trading provides some insight into the various definitions of arbitrage in modern usage.

Arbitrage in Ancient Times

Records about business practices in antiquity are scarce and incomplete. Available evidence is primarily from the Middle East and suggests that mercantile trade in ancient markets was extensive and provided a number of avenues for risky arbitrage. Potential opportunities were tempered by the lack of liquidity in markets; the difficulties of obtaining information and moving goods over distances; and inherent political and economic risks. Trading institutions and available securities were relatively simple. Circa 1760 BC, the Code of Hammurabi dealt extensively with matters of trade and finance. Sumerian cuneiform tablets from that era indicate a rudimentary form of bill of exchange transaction was in use where a payment (disbursement) would be made in one location in the local unit of account, for example, barley, in exchange for disbursement (payment) at a later date in another location of an agreed upon amount of that local currency, for example, lead [6]. The date was typically determined by the accepted transport time between the locations. Two weeks to a month was a commonly observed time between the payment and repayment. The specific payment location was often a temple.

Ancient merchants developed novel and complex solutions to address the difficulties and risks in executing various arbitrage transactions. Because the two payments involved in the ancient bill of exchange were separated by distance and time, a network of agents, often bound together by family or tribal ties, was required to disburse and receive funds or goods in the different locations. Members of the caravan or ship transport were often involved in taking goods on consignment for sale in a different location where the cost of the goods would be repaid [6, p.15–6]. The merchant arbitrageur would offset the cost of purchasing goods given on consignment with payments from
other merchants seeking to avoid the risks of carrying significant sums of money over long distances, making a local payment in exchange for a disbursement of the local currency in a different location. The basic cash-and-carry arbitrage is complicated by the presence of different payment locations and currency units. The significant risk of delivery failure or nonpayment was controlled through the close-knit organizational structure of the merchant networks [7]. These same networks provided information on changing prices in different regions that could be used in geographical goods arbitrage.

The gradual introduction of standardized coinage starting around 650 BC expanded available arbitraging opportunities to include geographical arbitrage of physical coins to exploit differing exchange ratios [6, p.19–20]. For example, during the era of the Athenian empire (480–404 BC), Persia maintained a bimetallic coinage system where silver was undervalued relative to gold. The resulting export of silver coins from Persia to Greece and elsewhere in the Mediterranean is an early instance of a type of arbitrage activity that became a mainstay of the arbitrageur in later years. This type of arbitrage trading was confined to money changers with the special skills and tools to measure the bullion value of coins. In addition to the costs and risks of transportation, the arbitrage was restricted by the seigniorage and minting charges levied in the different political jurisdictions. Because coinage was exchanged by weight and trading by bills of exchange was rudimentary, there were no arbitrageurs specializing solely in “arbitrating of exchange rates”. Rather, arbitrage opportunities arose from the trading activities of networks of merchants and money changers. These opportunities included uncovered interest arbitrage between areas with low interest rates, such as Jewish Palestine, and those with high rates, such as Babylonia [6, p.18–19].

Evolution of the Bill of Exchange

Though the precise origin of the practice is unknown, “arbitration of exchange” first developed during the Middle Ages. Around the time of the First Crusade, Genoa had emerged as a major sea power and important trading center. The Genoa fairs had become sufficiently important economic and financial events to attract traders from around the Mediterranean. To deal with the problems of reconciling transactions using different coinages and units of account, a forum for arbitrating exchange rates was introduced. On the third day of each fair, a representative body composed of recognized merchant bankers would assemble and determine the exchange rates that would prevail for that fair. The process involved each banker suggesting an exchange rate and, after some discussion, a voting process would determine the exchange rates that would apply at that fair. Similar practices were adopted at other important fairs later in the Middle Ages. At Lyon, for example, Florentine, Genoese, and Lucca bankers would meet separately to determine rates, with the average of these group rates becoming the official rate. These rates would then apply to bill transactions and other business conducted at the fair. Rates typically stayed constant between fairs in a particular location, providing the opportunity for arbitraging of exchange rates across fairs in different locations.

From ancient beginnings involving commodity transactions of merchants, the bill of exchange evolved during the Middle Ages to address the difficulties of using specie or bullion to conduct foreign exchange transactions in different geographical locations. In general, a bill of exchange contract involved four persons and two payments. The bill is created when a “deliverer” exchanges domestic cash money for a bill issued by a “taker”. The issued bill of exchange is drawn on a correspondent or agent of the taker who is situated abroad. The correspondent, the “payer”, is required to pay a stated amount of foreign cash money to the “payee”, to whom the bill is made payable. Consider the precise text of an actual bill of exchange from the early seventeenth century that appeared just prior to the introduction of negotiability [28, p.123]:

March 14, 1611
In London for £69.15.7 at 33.9
At half usance pay by this first of exchange to Francesco Rois Serra sixty-nine pounds, fifteen shillings, and seven pence sterling at thirty-three shillings and nine pence groat per £ sterling, value [received] from Master Francesco Pinto de Britto, and put it into our account, God be with you.
Giovanni Calandrini and Filippo Burlamachi
Accepted
[On the back:] To Balthasar Andrea in Antwerp
First 117.15.0 [pounds groat]
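The amounts written on the bill above can be checked with a little arithmetic. The sketch below converts £69.15.7 sterling at the stated rate of 33 shillings 9 pence groat per £ sterling. It assumes the usual pre-decimal convention of 12 pence to the shilling and 20 shillings to the pound for both the sterling and the groat units of account; that convention, and the helper names, are assumptions of this sketch and are not stated in the bill itself.

```python
from fractions import Fraction

PENCE_PER_SHILLING = 12   # assumed for both sterling and groat accounts
SHILLINGS_PER_POUND = 20  # assumed for both sterling and groat accounts
PENCE_PER_POUND = PENCE_PER_SHILLING * SHILLINGS_PER_POUND


def to_pence(pounds, shillings, pence):
    """Convert a pounds.shillings.pence amount to pence."""
    return pounds * PENCE_PER_POUND + shillings * PENCE_PER_SHILLING + pence


def from_pence(total_pence):
    """Split pence into (pounds, shillings, pence); pence may be fractional."""
    pounds, rest = divmod(total_pence, PENCE_PER_POUND)
    shillings, pence = divmod(rest, PENCE_PER_SHILLING)
    return int(pounds), int(shillings), pence


# Face amount of the bill in sterling: 69 pounds, 15 shillings, 7 pence.
sterling_pence = to_pence(69, 15, 7)

# Exchange rate: 33 shillings 9 pence groat per pound sterling.
groat_pence_per_pound_sterling = to_pence(0, 33, 9)

# Exact amount payable in Antwerp, in pence groat.
groat_pence = Fraction(sterling_pence * groat_pence_per_pound_sterling, PENCE_PER_POUND)

print(from_pence(groat_pence))
```

Under these assumptions the conversion gives 117 pounds, 15 shillings, and 9/16 of a penny groat, which agrees with the 117.15.0 pounds groat noted on the back of the bill once the fraction of a penny is dropped.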

The essential features of the bill of exchange all appear here: the four separate parties; the final payment being made in a different location from the original payment; and the element of currency exchange. “Usance” is the period of time, set by custom, before a bill of exchange could be redeemed at its destination. For example, usance was 3 months between Italy and London and 4 weeks between Holland and London. The practice of issuing bills at usance, as opposed to specifying any number of days to maturity, did not disappear until the nineteenth century [34, p.7].

Commercial and financial activities in the Middle Ages were profoundly impacted by Church doctrine and arbitrage trading was no exception. Exchange rates determined for a given fair would have to be roughly consistent with triangular arbitrage to avoid Church sanctions. In addition, the Church usury prohibition impacted the payment of interest on money loans. Because foreign exchange transactions were licit under canon law, it was possible to disguise the payment of interest in a combination of bill of exchange transactions referred to as dry exchange or fictitious exchange [13, p.380–381], [17, 26]. The associated exchange and re-exchange of bills was a risky set of transactions that could be covertly used to invest money balances or to borrow funds to finance the contractual obligations. The expansion of bill trading for financial purposes combined with the variation in the exchange rates obtained at fairs in different locations provided the opportunity of geographical arbitrage of exchange rates using bills of exchange. It was this financial practice of exploiting differences in bill exchange rates between financial centers that evolved into the “arbitration of exchange” identified by la Porte [22], Savary [30], and Postlethwayt [24] in the eighteenth century.

The bill of exchange contract evolved over time to meet the requirements of merchant bankers. As monetary units became based on coinage with specific bullion content, the relationship between exchange rates in different geographical locations for bills of exchange, coinage, and physical bullion became the mainstay of traders involved in “arbitration of exchange”. Until the development of the “inland” bill in the early seventeenth century in England, all bills of exchange involved some form of foreign exchange trading, and hence the name bill of exchange. Contractual features of the bill of exchange, such as negotiability and priority of claim, evolved over time producing a number of different contractual variations [9, 15, 26]. The market for bills of exchange also went through a number of different stages. At the largest and most strategic medieval fairs, financial activities, especially settlement and creation of bills of exchange, came to dominate the trading in goods [27]. By the sixteenth century, bourses such as the Antwerp Exchange were replacing the fairs as the key international venues for bill trading.

Arbitrage in Coinage and Bullion

Arbitrage trading in coins and bullion can be traced to ancient times. Reflecting the importance of the activity to ordinary merchants in the Middle Ages, methods of determining the bullion content of coins from assay results, and rates of exchange between coins once bullion content had been determined, formed a substantial part of important commercial arithmetics, such as the Triparty (1484) of Nicolas Chuquet [2]. The complications involved in trading without a standardized unit of account were imposing. There were a sizable number of political jurisdictions that minted coins, each with distinct characteristics and weights [14]. Different metals and combinations of metals were used to mint coinage. The value of silver coins, the type of coins most commonly used for ordinary transactions, was constantly changing because of debasement and “clipping”. Over time, significant changes in the relative supply of gold and silver, especially due to inflows from the New World, altered the relative values of bullion. As a result, merchants in a particular political jurisdiction were reluctant to accept foreign coinage at the par value set by the originating jurisdiction. It was common practice for foreign coinage to be assayed and a value set by the mint conducting the assay. Over time, this led to considerable market pressures to develop a unit of account that would alleviate the expensive and time-consuming practice of determining coinage value.

An important step in the development of such a standardized unit of account occurred in 1284 when the Doge of Venice began minting the gold ducat: a coin weighing about 3.5 g and struck in 0.986 gold. While ducats did circulate, the primary function was as a trade coin. Over time, the ducat was adopted as a standard for gold coins in other countries, including other Italian city states, Spain,
Austria, the German city states, France, Switzerland, and England. Holland first issued a ducat in 1487 and, as a consequence of the global trading power of Holland in the sixteenth and seventeenth centuries, the ducat became the primary trade coin for the world. Unlike similar coins such as the florin and guinea, the ducat specifications of about 3.5 g of 0.986 gold did not change over time. The use of mint parities for specific coins and market prices for others did result in the gold–silver exchange ratio differing across jurisdictions. For example, in 1688, the Amsterdam gold–silver ratio for the silver rixdollar mint price and gold ducat market price was 14.93 and, in London, the mint price ratio was 15.58 for the silver shilling and gold guinea [25, p.475]. Given transport and other costs of moving bullion, such gold/silver price ratio differences were not usually sufficient to generate significant bullion flows. However, combined in trading with bills of exchange, substantial bullion flows did occur from arbitrage trading.

Details of a May 1686 arbitrage by a London goldsmith involving bills of exchange and gold coins are provided by Quinn [25, p.479]. The arbitrage illustrates how the markets for gold, silver, and bills of exchange interacted. At that time, silver was the primary monetary metal used for transactions though gold coins were available. Prior to 1663, when the English Mint introduced milling of coins with serrated edges to prevent clipping, all English coins were “hammered” [20]. The minting technology of hammering coins was little changed from Roman times. The process produced imperfect coins, not milled at the edges, which were only approximately equal in size, weight, and imprint making altered coins difficult to identify [29, ch.4]. Such coins were susceptible to clipping, resulting in circulating silver coins that were usually under the nominal Mint weight. Despite a number of legislative attempts at remedying the situation, around 1686, the bulk of the circulating coins in England were still hammered silver. The Mint would buy silver and gold by weight in exchange for milled silver shilling coins at a set price per ounce. When the market price of silver rose sufficiently above the mint price, English goldsmiths would melt the milled silver coin issued by the Mint, though it was technically illegal to do so.

In addition to mint prices for silver and gold, there were also market prices for gold and silver. Around 1686, the Mint would issue guineas in exchange for silver shillings at a fixed price (£1.075 = 21s. 6d./oz.). In Amsterdam, the market price for a Dutch gold ducat was 17.5 schellingen (S). Observing that the ducat contained 0.1091 ounces of recoverable gold and the guinea 0.2471 ounces, it follows that 36.87 S could be obtained for £1 if gold was used to effect the exchange. Or, put differently, 1 ducat would produce £0.4746. Because transportation of coins and bullion was expensive, there was a sizable band within which rates on bills of exchange could fluctuate without producing bullion flows. If the (S/£) bill exchange rate rose above the rate of exchange for gold plus transport costs, merchants in Amsterdam seeking funds in London would prefer to send gold rather than buy bills of exchange on London. Merchants in London seeking funds in Amsterdam would buy bills on Amsterdam to benefit from the favorable exchange. Similarly, if the bill exchange rate fell below the rate of exchange for silver plus transport costs, merchants in London would gain by exporting silver to Amsterdam rather than buying a bill on Amsterdam.

To reconstruct the 1686 goldsmith arbitrage, observe that the exchange rate for a 4-week bill in London on Amsterdam at the time of the arbitrage was 37.8 (S/£). Obtaining gold ducats in Holland for £0.4746 and allowing for transport costs of 1.5% and transport time of 1 week produces gold in London for £0.4676. Using this gold to purchase a bill of exchange on Amsterdam produces 17.6715 S in Amsterdam 5 weeks after the trade is initiated, an arbitrage profit of 0.1715 S. Even if the gold can be borrowed in Amsterdam and repaid in silver, the trade is not riskless owing to the transport risk and the possible movement in bill rates before the bill is purchased in London. These costs would be mitigated significantly for a London firm also operating in the bill and bullion market of Amsterdam, as was the case with a number of London goldsmiths. The strength of the pound sterling in the bill market from 1685–1688 generated gold inflows to England from this trade higher than any other four-year period in the seventeenth century [25, p.478]. The subsequent weakening of the pound in the bill market from 1689 until the great recoinage in 1696 led to arbitrage trades switching from producing gold inflows to substantial outflows of silver from melted coins and clipping.
Bill of Exchange Arbitrage

The roots of “arbitration of exchange” can be traced to the transactions of medieval merchant bankers seeking to profit from discrepancies in bill exchange rates across geographical locations [27, 28]. For example, if sterling bills on London were cheaper in Paris than in Bruges, then medieval bankers would profit by selling sterling in Bruges and buying in Paris. The effect of such transactions was to keep all exchange rates roughly in parity with the triangular arbitrage condition. Temporary discrepancies did occur but such trading provided a mechanism of adjustment. The arbitrages were risky even when done entirely with bills of exchange. Owing to the slowness of communications, market conditions could change before bills of exchange reached their destination and the re-exchange could be completed. As late as the sixteenth century, only the Italian merchant bankers, the Fuggers of Augsburg, and a few other houses with correspondents in all banking centers were able to engage actively in arbitrage [28, p.137]. It is not until the eighteenth century that markets for bills were sufficiently developed to permit arbitration of exchange to become standard practice of merchants deciding on the most profitable method of remitting or drawing funds offshore.

The transactions in arbitration of exchange by medieval bankers are complicated by the absence of offsetting cash flows in the locations where bills are bought and sold. In the example above, the purchase of a bill in Paris would require funds, which are generated by the bill sale in Bruges. The profits are realized in London. Merchant bankers would be able to temporarily mitigate the associated geographical fund imbalances with internally generated capital, but re-exchanges or movements of bullion were necessary if imbalances persisted. To be consistent with the spirit of the self-financing element of modern riskless arbitrage, the example of medieval banker arbitrage among Paris, Bruges, and London can be extended to two issuing locations and two payment centers. It is possible for the same location to be used as both the issuing and payment location but that will not be assumed. Let the two issuing locations be, say, Antwerp and Hamburg, with the two payment locations being London and Venice. The basic strategy involves making offsetting bill transactions in the two issuing locations and then matching the settlements in the payment centers.

In the following example, $G is the domestic currency in Hamburg and $A is the domestic currency in Antwerp. The forward exchange rate embedded in the bill transaction is denoted as F1 for Ducats/$A; F2 for Ducats/$G; F3 for £/$G; and F4 for £/$A.

In Hamburg
  Acquire $G QG using a bill which agrees to pay ($G QG F2) in Venice at time T
  Deliver the $G QG on another bill which agrees to be repaid ($G QG F3) in London at time T

In Antwerp
  Acquire $A QA using a bill which agrees to pay ($A QA F4) in London at time T
  Deliver the $A QA on another bill which agrees to be repaid ($A QA F1) in Venice at time T

At t = 0, the cash flows from all the bill transactions offset. If the size of the borrowings in the two issuing centers is calculated to produce the same maturity value, in terms of the domestic currencies of the two payment centers, then the profit on the transaction depends on the relative values of the payment center currencies in the issuing centers. If there is sufficient liquidity in the Hamburg and Antwerp bill markets, the banker can generate triangular arbitrage trades designed to profit from discrepancies in bid/offer rates arising in different geographical locations.

To see the precise connection to triangular arbitrage, consider the profit function from the trading strategy. At time T in Venice, the cash flows would provide ($A QA F1) − ($G QG F2). And, in London, the cash flows would provide ($G QG F3) − ($A QA F4). For the intermediary operating in both locations, the resulting profit (π) on the trade would be the sum of the two cash flows:

\[
\begin{aligned}
\pi(T) &= (\$A\,Q_A F_1 - \$G\,Q_G F_2) + (\$G\,Q_G F_3 - \$A\,Q_A F_4) \\
       &= \$A\,Q_A (F_1 - F_4) + \$G\,Q_G (F_3 - F_2)
\end{aligned}
\tag{1}
\]
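Equation (1) can be checked numerically. The sketch below is illustrative only: every rate and principal is hypothetical, and sizing the Hamburg principal as Q_G = Q_A × F0, where F0 is a spot $G/$A rate, is an assumption made so that the two borrowings are of equal value. It follows the text's convention of summing the Venice and London cash flows into a single profit figure.

```python
from fractions import Fraction

def bill_arbitrage_profit(q_a, q_g, f1, f2, f3, f4):
    """Cash flows at time T from the four bill transactions, as in equation (1).

    q_a, q_g: principal raised in Antwerp ($A) and Hamburg ($G)
    f1: Ducats/$A, f2: Ducats/$G, f3: pounds/$G, f4: pounds/$A
    Returns (venice_flow, london_flow, total).
    """
    venice = q_a * f1 - q_g * f2   # ducats received less ducats paid
    london = q_g * f3 - q_a * f4   # sterling received less sterling paid
    return venice, london, venice + london

# Hypothetical rates, chosen only for illustration (not from the source).
f0 = Fraction(2)          # $G per $A
f2 = Fraction(1, 2)       # Ducats per $G
f3 = Fraction(1, 10)      # pounds per $G
q_a = Fraction(1000)
q_g = q_a * f0            # ASSUMPTION: equal principal in both issuing centers

# Case 1: direct rates agree with the cross rates, so nothing is earned.
f4 = f0 * f3
consistent = bill_arbitrage_profit(q_a, q_g, f0 * f2, f2, f3, f4)

# Case 2: Ducats/$A is quoted 2% above its cross rate F0 * F2.
f1_rich = f0 * f2 * Fraction(51, 50)
mispriced = bill_arbitrage_profit(q_a, q_g, f1_rich, f2, f3, f4)

# The factored form on the second line of equation (1) gives the same total.
factored = q_a * (f1_rich - f4) + q_g * (f3 - f2)

print(consistent[2], mispriced[2], factored)  # 0 20 20
```

With consistent rates both payment centers net to zero. Once one direct rate departs from its cross rate, the whole profit appears in the payment center whose currency is mispriced (here Venice).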
Constructing the principal values of the two transactions to be of equal value now permits the substitution of QG = QA ($G/$A), where ($G/$A) = F0 is the prevailing exchange rate between $G and $A:

\[
\begin{aligned}
\pi(T) &= \$A\,Q_A \left[ (F_1 - F_0 F_2) - (F_4 - F_0 F_3) \right] \\
       &= \$A\,Q_A \left[ \left( \frac{\text{Ducats}}{\$A} - \frac{\$G}{\$A}\,\frac{\text{Ducats}}{\$G} \right) - \left( \frac{\pounds}{\$A} - \frac{\$G}{\$A}\,\frac{\pounds}{\$G} \right) \right]
\end{aligned}
\tag{2}
\]

The two values in brackets will be zero if triangular arbitrage holds for both currencies. If the direct and indirect exchange rates for one of the currencies are not consistent with triangular arbitrage, then the banker can obtain a self-financing arbitrage profit.

Arbitration of Exchange

By the eighteenth century, the bill market in key financial centers such as Amsterdam, London, Hamburg, and Paris had developed to the point where merchants as well as bankers could engage in arbitration of exchange to determine the most profitable method of remitting funds to or drawing funds from offshore locations. From a relatively brief treatment in early seventeenth century sources, for example, [13], merchants’ manuals detailing technical aspects of bill trading were available by the beginning of the eighteenth century. The English work by Justice, A General Treatise on Monies and Exchanges [9], an expanded translation of an earlier treatise in French by M. Ricard, details the workings of bill transactions, recognizing subtle characteristics in the bill contract. However, as a reflection of the rudimentary state of the English bill market in the early eighteenth century, Justice did not approve of “drawing bills upon one country payable in another” due to the “difference in the Laws of Exchange, in different countries” giving rise to “a great many inconveniences” [9, p.28]. As the eighteenth century progressed, there was substantial growth in the breadth and depth of the bill market supported by increases in speed of communication between key financial centers with London emerging as the focal point [16, 31]. This progress was reflected in the increasingly sophisticated treatment of arbitration of exchange in merchants’ manuals.

Merchants’ manuals of the eighteenth and nineteenth centuries typically present arbitration of exchange from the perspective of a merchant engaged in transferring funds. In some sources, self-financing arbitrage opportunities created by combining remitting and drawing opportunities are identified. Discussions of the practice invariably involve calculations of the “arbitrated rates”. Earlier manuals such as the one by Le Moine [11] only provide a few basic calculations aimed to illustrate the transactions involved. The expanded treatment in Postlethwayt [24] provides a number of worked calculations. In one example, exchange rates at London are given as London–Paris 31 3/4 pence sterling for 1 French crown; London–Amsterdam as 240 pence sterling for 414 groats. Worked calculations are given for the problem “What is the proportional arbitrated price between Amsterdam and Paris?” Considerable effort is given to show the arithmetic involved in determining this arbitrated rate as 54 123/160 groat for 1 crown. Using this calculated arbitrated exchange rate and the already known actual London–Paris rate, Postlethwayt then proceeds to determine the arbitrated rate for London–Amsterdam using these exchange rates for Paris–London and Paris–Amsterdam, finding that it equals 240 pence sterling for 414 groats.

Having shown how to determine arbitrated rates, Postlethwayt provides worked examples of appropriate arbitrage trades when the actual exchange rate is above or below the arbitrated rate. For example, when the arbitrated Amsterdam–Paris rate is above the actual rate, calculations are provided to demonstrate that drawing sterling in London by selling a bill on Paris, using the funds to buy a bill on Amsterdam and then exchanging the guilders/groats received in Amsterdam at the actual rate to cover the crown liability in Paris will produce a self-financing arbitrage profit. Similarly, when the arbitrated Amsterdam–Paris rate is below the actual rate, the trades in the arbitrage involve drawing sterling in London by selling a bill on Amsterdam, using the funds to buy a bill on Paris and then exchanging at the actual Amsterdam–Paris exchange rate the crowns received in Paris to cover the guilder liability. This is similar to the risky medieval banker arbitrage where the rate on re-exchange is uncertain. Though the actual rate is assumed to be known, in practice, this rate could change over the time period it takes to settle the relevant bill transactions. However, the degree of risk
facing the medieval banker was mitigated by the eighteenth century due to the considerably increased speed of communication between centers and subsequent developments in the bill contract, such as negotiability and priority of claim.

Earlier writers on arbitration of exchange, such as Postlethwayt, accurately portrayed the concept but did not adequately detail all costs involved in the transactions. By the nineteenth century, merchants’ manuals such as [34] accurately described the range of adjustments required for the actual execution of the trades. Taking the perspective of a London merchant with sterling seeking to create a fund of francs in Paris, a difference is recognized between two methods of determining the direct rate of exchange: buying a bill in the London market for payment in Paris; or having correspondents in Paris issue for francs a bill for sterling payment in London. In comparing with the arbitrated rates, the more advantageous direct rate is used. In determining direct rates, 3-month bill exchange rates are used even though the trade is of shorter duration. These rates are then adjusted to “short” rates to account for the interest factor. Arbitrated rates are calculated and, in comparing with direct rates, an additional brokerage charge (plus postage) is deducted from the indirect trade due to the extra transaction involved; for example, a London merchant buys a bill for payment in Frankfurt, which is then sold in Paris. No commissions are charged as it is assumed that the trade is done “between branches of the same house, or on joint account” [34, p.98].

Arbitrage in Securities and Commodities

Arbitrage involving bills of exchange survives in modern times in the foreign exchange swap trades of international banks. Though this arbitrage is of central historical importance, it attracts less attention now than a range of arbitrage activities involving securities and commodities that benefited from the financial and derivative security market developments of the nineteenth century. Interexchange and geographical arbitrages were facilitated by developments in communication. The invention of the telegraph in 1844 permitted geographical arbitrage in stocks and shares between London and the provincial stock exchanges by the 1850s. This trade was referred to as shunting. In 1866, Europe and America were linked by cable, significantly enhancing the speed at which price discrepancies across international markets could be identified. Telegraph technology allowed the introduction of the stock market ticker in 1867. Opportunity for arbitraging differences in the prices of securities across markets was further aided by expansion of the number and variety of stocks and shares, many of which were interlisted on different regional and international exchanges. (Where applicable, the nineteenth century convention of referring to fixed-income securities as stocks and common stocks as shares will be used.) For example, after 1873 arbitraging the share price of Rio Tinto between the London and Paris stock exchanges was a popular trade.

Cohn [3, p.3] attributes “the enormous increase in business on the London Stock Exchange within the last few years” to the development of “Arbitrage transactions between London and Continental Bourses”. In addition to various government bond issues, available securities liquid enough for arbitrage trading included numerous railway securities that appeared around the middle of the century. For example, both Haupt [8] and Cohn [3] specifically identify over a dozen securities traded in Amsterdam that were sufficiently liquid to be available for arbitrage with London. Included on both lists are securities as diverse as the Illinois and Erie Railway shares and the Austrian government silver loan. Securities of mines and banks increased in importance as the century progressed. The expansion in railway securities, particularly during the US consolidations of the 1860s, led to the introduction of traded contingencies associated with these securities such as rights issues, warrant options, and convertible securities. Weinstein [33] identifies this development as the beginning of arbitrage in equivalent securities, which, in modern times, encompasses convertible bond arbitrage and municipal bond arbitrage. However, early eighteenth century English and French subscription shares do have a similar claim [32]. Increased liquidity in the share market provided increased opportunities for option trading in stocks and shares.

Also during the nineteenth century, trading in “time bargains” evolved with the commencement of trading in such contracts for agricultural commodities on the Chicago Board of Trade in 1851. While initially structured as forward contracts, adoption of the General Rules of the Board of Trade in 1865 laid a foundation for trading of modern
futures contracts. Securities and contracts with contingencies have a history stretching to ancient times when trading was often done using samples and merchandise contracts had to allow for time to delivery and the possibility that the sample was not representative of the delivered goods. Such contingencies were embedded in merchandise contracts and were not suited to arbitrage trading. The securitization of such contingencies into forward contracts that are adaptable to cash-and-carry arbitrage trading can be traced to the introduction of “to arrive” contracts on the Antwerp bourse during the sixteenth century [19, ch.9]. Options trading was a natural development on the trade in time bargains, where buyers could either take delivery or could pay a fixed fee in lieu of delivery. In effect, such forward contracts were bundled with an option contract having the premium paid at delivery.

Unlike arbitration of exchange using bills of exchange, which was widely used and understood by the eighteenth century, arbitrage trades involving options, also known as privileges and premiums, were not. Available sources on such trades conducted in Amsterdam, Joseph de la Vega [21, ch.3] and Isaac da Pinto [19, p.366–377], were written by observers who were not the actual traders, so only crude details of the arbitrage trades are provided. Conversion arbitrages for put and call options, which involve knowledge of put–call parity, are described by both de la Vega and da Pinto. Despite this, prior to the mid-nineteenth century, options trading was a relatively esoteric activity confined to a specialized group of traders. Having attracted passing mention by Cohn [3], Castelli [1, p.2] identifies “the great want of a popular treatise” on options as the reason for undertaking a detailed treatment of mostly speculative option trading strategies. In a brief treatment, Castelli uses put–call parity in an arbitrage trade combining a short position in “Turks 5%” in Constantinople with a written put and purchased call in London. The trade is executed to take advantage of “enormous contangoes collected at Constantinople” [1, p.74–77].

Etymology and Historical Usage

The Oxford International Dictionary [12] defines arbitrage as: “the traffic in bills of exchange drawn on sundry places, and bought or sold in sight of the daily quotations of rates in several markets. Also, the similar traffic in stock.” The initial usage is given as 1881. Reference is also directed to “arbitration of exchange” where the definition is “the determination of the rate of exchange to be obtained between two countries or currencies, when the operation is conducted through a third or several intermediate ones, in order to ascertain the most advantageous method of drawing or remitting bills.” The singular position given to “arbitration of exchange” trading using bills of exchange recognizes the practical importance of these securities in arbitrage activities up to that time. The Oxford International Dictionary definition does not recognize the specific concepts of arbitrage, such as triangular currency arbitrage or interexchange arbitrage, or that such arbitrage trading applies to coinage, bullion, commodities, and shares as well as to trading bills of exchange. There is also no recognition that doing arbitrage with bills of exchange introduces two additional elements not relevant to triangular arbitrage for manual foreign exchange transactions: time and location.

The word “arbitrage” is derived from a Latin root (arbitrari, to give judgment; arbitrio, arbitration) with variants appearing in the Romance languages. Consider the modern Italian variants: arbitraggio is the term for arbitrage; arbitrato is arbitration or umpiring; and arbitrare is to arbitrate. Similarly, for modern French variants, arbitrage is arbitration; arbitrer is to arbitrate a quarrel or to umpire; and arbitre is an arbitrator or umpire. Recognizing that the “arbitration of prices” concept underlying arbitrage predates Roman times, the historical origin where the word arbitrage or a close variant was first used in relation to arbitrating differences in prices is unknown. A possible candidate involves arbitration of exchange rates for different currencies observed at the medieval fairs, around the time of the First Crusade (1100). The dominance of Italian bankers in this era indicates the first usage was the close variant, arbitrio, with the French “arbitrage” coming into usage during the eighteenth century. Religious and social restrictions effectively barred public discussion of the execution and profitability of such banking activities during the Middle Ages, though account books of the merchant banks do remain as evidence that there was significant arbitrage trading.
As late as the seventeenth century, important English sources on the Law Merchant such as Gerard Malynes, Lex Mercatoria [13], make no reference to arbitrage trading strategies in bills of exchange. In contrast, a similar text in Italian, Il Negotiante (1638) by Giovanni Peri [18], a seventeenth century Italian merchant, has a detailed discussion on exchange dealings. Peri states that profit is the objective of all trade and that the “activity directed to this end is subject to chance, which mocks at every calculation. Yet there is still ample space for reasonable calculation in which the possibility of adverse fortunes is never left out of account” [5, p.327]. This mental activity engaged in the service of business is called arbitrio. Peri identifies a connection between speculation on future exchange rate movements and the arbitrio concept of arbitrage: “the profits from exchange dealings originate in price differences and not in time” with profits turning to losses if re-exchange is unfavorable [18, p.150]. For Peri, the connection between speculation and arbitrage applies to commodities and specie, as well as bills of exchange.

The first published usage of “arbitrage” in discussing the relationship between exchange rates and the most profitable locations for issuing and settling a bill of exchange appears in French in La Science des Négocians et Teneurs de Livres [22, p.452]. From the brief reference in a glossary of terms by de la Porte, a number of French sources, including the section Traité des arbitrages by Mondoteguy in Le Moine, Le Negoce d’Amsterdam [11] and Savary, Dictionnaire Universel de Commerce (1730, 2nd ed.) [30], developed a more detailed presentation of arbitrage transactions involving bills of exchange. An important eighteenth century English source, The Universal Dictionary of Trade and Commerce [24], is an expanded translation of Savary where the French word “arbitrage” is translated into English as “arbitration”. This is consistent with the linguistic convention of referring to arbitration instead of arbitrage found in the earlier English source, The Merchant’s Public Counting House [23]. This led to the common English use of the terms “simple arbitrations”, “compound arbitrations”, and “arbitrated rates”. The practice of using arbitration instead of arbitrage continues into nineteenth century works by Patrick Kelly, The Universal Cambist [10] and William Tate, The Modern Cambist [34]. The latter book went into six editions.

Following the usage of “arbitrage” in German and Dutch works in the 1860s, common usage of “arbitrageur” in English appears with Ottomar Haupt, The London Arbitrageur [8], though reference is still made to “arbitration of exchange” as the activity of the arbitrageur. Haupt produced similar works in German and French that used “arbitrage” to describe the calculation of parity relationships. A pamphlet by Maurice Cohn, The Stock Exchange Arbitrageur [3], describes “arbitrage transactions” between bourses but also uses “arbitration” to refer to calculated parity relationships. Charles Castelli’s The Theory of “Options” in Stocks and Shares [1] concludes with a section on “combination of options with arbitrage operations” where arbitrage has exclusive use and no mention is made of “arbitration” of prices or rates across different locations. Following Arbitrage in Bullion, Coins, Bills, Stocks, Shares and Options by Henry Deutsch [4], “arbitration of exchange” is no longer commonly used.

References

[1] Castelli, C. (1877). The Theory of “Options” in Stocks and Shares, F. Mathieson, London.
[2] Chuquet, N. (1484, 1985). Triparty, in Nicolas Chuquet, Renaissance Mathematician, G. Flegg, C. Hay & B. Moss, eds, D. Reidel Publishing, Boston.
[3] Cohn, M. (1874). The London Stock Exchange in Relation with the Foreign Bourses. The Stock Exchange Arbitrageur, Effingham Wilson, London.
[4] Deutsch, H. (1904, 1933). Arbitrage in Bullion, Coins, Bills, Stocks, Shares and Options, 3rd Edition, Effingham Wilson, London.
[5] Ehrenberg, R. (1928). Capital and Finance in the Age of the Renaissance, translated from the German by Lucas, H. Jonathan Cape, London.
[6] Einzig, P. (1964). The History of Foreign Exchange, 2nd Edition, Macmillan, London.
[7] Greif, A. (1989). Reputation and coalitions in medieval trade: evidence on the Maghribi Traders, Journal of Economic History 49, 857–882.
[8] Haupt, O. (1870). The London Arbitrageur; or, the English Money Market in connexion with foreign Bourses. A Collection of Notes and Formulae for the Arbitration of Bills, Stocks, Shares, Bullion and Coins, with all the Important Foreign Countries, Trubner and Co., London.
[9] Justice, A. (1707). A General Treatise on Monies and Exchanges; in which those of all Trading Nations are Describ’d and Consider’d, S. and J. Sprint, London.
[10] Kelly, P. (1811, 1835). The Universal Cambist and Commercial Instructor; Being a General Treatise on Exchange including the Monies, Coins, Weights and

Measures, of all Trading Nations and Colonies, 2nd Edition, Lackington, Allan and Co., London, 2 Vols.
[11] Le Moine de l’Espine, J. (1710). Le Negoce d’Amsterdam . . . Augmenté d’un Traité des arbitrages & des changes sur les principales villes de l’Europe (by Jacques Mondoteguy), Chez Pierre Brunel, Amsterdam.
[12] Little, W., Fowler, H. & Coulson, J. (1933, 1958). Oxford International Dictionary of the English Language, Leland Publishing, Toronto, revised and edited by C. Onions, 1958.
[13] Malynes, G. (1622, 1979). Consuetudo, vel Lex Mercatoria or The Ancient Law Merchant, Adam Islip, London. Reprinted (1979) by Theatrum Orbis Terrarum, Amsterdam.
[14] McCusker, J. (1978). Money and Exchange in Europe and America, 1600–1775, University of North Carolina Press, Chapel Hill NC.
[15] Munro, J. (2000). English ‘Backwardness’ and financial innovations in commerce with the low countries, 14th to 16th centuries, in International Trade in the Low Countries (14th–16th Centuries), P. Stabel, B. Blondé, A. Greve, eds, Garant, Leuven-Apeldoorn, pp. 105–167.
[16] Neal, L. & Quinn, S. (2001). Networks of information, markets, and institutions in the rise of London as a financial centre, 1660–1720, Financial History Review 8, 7–26.
[17] Noonan, J. (1957). The Scholastic Analysis of Usury, Harvard University Press, Cambridge, MA.
[18] Peri, G. (1638, 1707). Il Negotiante, Giacomo Hertz, Venice. (last revised edition 1707).
[19] Poitras, G. (2000). The Early History of Financial Economics, 1478–1776, Edward Elgar, Cheltenham, U.K.
[20] Poitras, G. (2004). William Lowndes, 1652–1724, in Biographical Dictionary of British Economists, R. Donald, ed., Thoemmes Press, Bristol, UK, pp. 699–702.
[21] Poitras, G. (2006). Pioneers of Financial Economics: Contributions Prior to Irving Fisher, Edward Elgar, Cheltenham, UK, Vol. I.
[22] la Porte, M. (1704). La Science des Négocians et Teneurs de Livres, Chez Guillaume Chevelier, Paris.
[23] Postlethwayt, M. (1750). The Merchant’s Public Counting House, John and Paul Napton, London.
[24] Postlethwayt, M. (1751, 1774). The Universal Dictionary of Trade and Commerce, 4th Edition, John and Paul Napton, London.
[25] Quinn, S. (1996). Gold, silver and the glorious revolution: arbitrage between bills of exchange and bullion, Economic History Review 49, 473–490.
[26] de Roover, R. (1944). What is dry exchange? A contribution to the study of English mercantilism, Journal of Political Economy 52, 250–266.
[27] de Roover, R. (1948). Banking and Credit in Medieval Bruges, Harvard University Press, Cambridge, MA.
[28] de Roover, R. (1949). Gresham on Foreign Exchange, Harvard University Press, Cambridge, MA.
[29] Sargent, T. & Velde, F. (2002). The Big Problem of Small Change, Princeton University Press, Princeton, NJ.
[30] Savary des Bruslons, J. (1730). Dictionnaire Universel de Commerce, Chez Jacques Etienne, Paris, Vol. 3.
[31] Schubert, E. (1989). Arbitrage in the foreign exchange markets of London and Amsterdam during the 18th Century, Explorations in Economic History 26, 1–20.
[32] Shea, G. (2007). Understanding financial derivatives during the south sea bubble: the case of the south sea subscription shares, Oxford Economic Papers 59 (Special Issue), 73–104.
[33] Weinstein, M. (1931). Arbitrage in Securities, Harper & Bros, New York.
[34] Tate, W. (1820, 1848). The Modern Cambist: Forming a Manual of Foreign Exchanges, in the Different Operations of Bills of Exchange and Bullion, 6th Edition, Effingham Wilson, London.

GEOFFREY POITRAS

Utility Theory: Historical Perspectives

The first recorded mention of a concave utility function in the context of risk and uncertainty is in a manuscript of Daniel Bernoulli [4] in 1738, though credit should also be given to Gabriel Cramer, who, according to Bernoulli himself, developed a remarkably similar theory in 1728. Bernoulli proposes a resolution of a paradox posed in 1713 by his cousin Nicholas Bernoulli. Known as the St. Petersburg paradox, it challenges the idea that rational agents value random outcomes by their expected returns. Specifically, a game is envisioned in which a fair coin is tossed repeatedly and the payoff equals $2^n$ ducats if the first heads appears on the $n$th toss. The expected value of the payoff can be computed as

$$\frac{1}{2} \times 2 + \frac{1}{4} \times 4 + \frac{1}{8} \times 8 + \cdots + \frac{1}{2^n} \times 2^n + \cdots = +\infty \tag{1}$$

but, clearly, no one would pay an infinite, or even a large finite, amount of money for a chance to play such a game. Daniel Bernoulli suggests that the satisfaction or utility $U(w)$ from a payoff of size $w$ should not be proportional to $w$ (as mandated by the then prevailing valuation by expectation), but should exhibit diminishing marginal returns; in contemporary language, the derivative $U'$ of the function $U$ should be decreasing (see Utility Function). Proposing a logarithmic function as a suitable $U$, Bernoulli suggests that the value of the game to the agent should be calculated as the expected utility

$$\frac{1}{2} \times \log(2) + \frac{1}{4} \times \log(4) + \frac{1}{8} \times \log(8) + \cdots + \frac{1}{2^n} \times \log(2^n) + \cdots = \log(4) \tag{2}$$

Bernoulli’s theory was poorly received by his contemporaries. It was only a hundred years later that Herman Gossen [11] used Bernoulli’s idea of diminishing marginal utility of wealth to formulate his “Laws of Economic Activity”. Gossen’s “Second law” (the idea that the ratio of exchange values of two goods must equal the ratio of marginal utilities of the traders) presaged, but did not directly influence, what would become known in economics as the “Marginalist revolution” led by William Jevons [13], Carl Menger [17], and Leon Walras [26].

Axiomatization

The work of Gossen notwithstanding, another century passed before the scientific community took an interest in Bernoulli’s ideas (with some notable exceptions such as Alfred Marshall [16] or Francis Edgeworth’s entry on probability [8] in the celebrated 1911 edition of Encyclopedia Britannica). In 1936, Franz Alt published the first axiomatic treatment of decision making, in which he deduces the existence of an implied utility function solely on the basis of a simple set of plausible axioms. Eight years later, Oskar Morgenstern and John von Neumann published the widely influential “Theory of Games and Economic Behavior” [25]. Along with other contributions (the most important being a mathematically rigorous foundation of game theory), they develop, at great length, a theory similar to Alt’s. Both Alt’s and the von Neumann–Morgenstern axiomatizations study a preference relation on the collection of all lotteries (probability distributions on finite sets of outcomes) and show that one lottery is preferred to the other if and only if the expected utility of the former is larger than the expected utility of the latter. The major conceptual leap accomplished by Alt, von Neumann, and Morgenstern was to show that the behavior of a rational agent necessarily coincides with the behavior of an agent who values uncertain payoffs using an expected utility.

The Subjectivist Revolution and the State-preference Approach

All of the aforementioned derivations of the expected-utility hypothesis assumed the existence of a physical (objective) probability over the set of possible outcomes of the random payoff. An approach in which both the probability distribution and the utility function are determined jointly from simple behavioral axioms has been proposed by Leonard Savage [23], who was inspired by the work of Frank Ramsey [21] and Bruno de Finetti [5, 6].
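
As a numerical aside on the St. Petersburg game of equations (1) and (2), the following Python sketch evaluates truncated versions of both series. It is only an illustration under stated assumptions: the function names are hypothetical, the agent is taken to have zero initial wealth (as equation (2) implicitly does), and the sums are cut off after a finite number of tosses.

```python
import math


def truncated_expected_payoff(n_terms: int) -> float:
    """Partial sum of equation (1): sum over n of (1/2**n) * 2**n.

    Every term equals 1, so the partial sum equals n_terms and the
    full series diverges.
    """
    return sum((0.5 ** n) * (2 ** n) for n in range(1, n_terms + 1))


def truncated_expected_log_utility(n_terms: int) -> float:
    """Partial sum of equation (2): sum over n of (1/2**n) * log(2**n).

    log(2**n) is computed as n * log(2) to avoid forming large powers.
    """
    return sum((0.5 ** n) * n * math.log(2) for n in range(1, n_terms + 1))


if __name__ == "__main__":
    for n_terms in (10, 20, 40):
        print(
            n_terms,
            truncated_expected_payoff(n_terms),
            truncated_expected_log_utility(n_terms),
        )
    # The expected log-utility converges to log(4), so the sure payoff
    # with the same utility is exp(log(4)), that is, about 4 ducats.
    print("log(4) =", math.log(4))
    print("sure payoff of equal utility =",
          math.exp(truncated_expected_log_utility(200)))
```

The partial sums of the expected payoff grow without bound (one unit per additional toss), while the expected logarithmic utility settles at log(4), which is the utility of a sure payoff of 4 ducats.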

One of the major features of the expected-utility theory is the separation between the utility function and the resolution of uncertainty, in that equal payoffs in different states of the world yield the same utilities. It has been argued that, while sometimes useful, such a separation is not necessary. An approach in which the utility of a payoff depends not only on its monetary value but also on the state of the world has been proposed. Such an approach has been popularized through the work of Kenneth Arrow [2] (see Arrow, Kenneth) and Gerard Debreu [7], largely because of its versatility and compatibility with general-equilibrium theory where the payoffs are not necessarily monetary. Further successful applications have been made by Roy Radner [20] and many others.

Empirical Paradoxes and Prospect Theory

Although the early statistical evidence was mostly anecdotal, many empirical studies have found significant inconsistencies between the observed behavior and the axioms of utility theory. The most influential of these early studies were performed by George Shackle [24], Maurice Allais [1], and Daniel Ellsberg [9]. In 1979, Daniel Kahneman and Amos Tversky [14] proposed “prospect theory” as a psychologically more plausible alternative to the expected utility theory.

Utility in Financial Theory

The general notion of a numerical value associated with a risky payoff was introduced to finance by Harry Markowitz [15] (see Markowitz, Harry) through his influential “portfolio theory”. Markowitz’s work made transparent the need for a precise measurement and quantitative understanding of the levels of “risk aversion” (degree of concavity of the utility function) in financial theory. Even though a similar concept had been studied by Milton Friedman and Leonard Savage [10] before that, the major contribution to this endeavor was made by John Pratt [19] and Kenneth Arrow [3].

With the advent of stochastic calculus (developed by Kiyosi Itô [12], see Itô, Kiyosi (1915–2008)), the mathematical tools for continuous-time financial modeling became available. Paul Samuelson [22] (see Samuelson, Paul A.) introduced geometric Brownian motion as a model for stock evolution, and it was not long before it was combined with expected utility theory in the work of Robert Merton [18] (see Merton, Robert C.).

References

[1] Allais, M. (1953). La psychologie de l’homme rationnel devant le risque: critique des postulats et axiomes de l’école Américaine, Econometrica 21(4), 503–546. Translated and reprinted in Allais and Hagen, 1979.
[2] Arrow, K.J. (1953). Le Rôle des valeurs boursières pour la Répartition la meilleure des risques, Econométrie, Colloques Internationaux du Centre National de la Recherche Scientifique, Paris 11, 41–47; Published in English as (1964). The role of securities in the optimal allocation of risk-bearing, Review of Economic Studies 31(2), 91–96.
[3] Arrow, K.J. (1965). Aspects of the Theory of Risk-Bearing, Yrjö Jahnsson Foundation, Helsinki.
[4] Bernoulli, D. (1954). Exposition of a new theory on the measurement of risk, Econometrica 22(1), 23–36. Translation from the Latin by Dr. Louise Sommer of work first published 1738.
[5] de Finetti, B. (1931). Sul significato soggettivo della probabilità, Fundamenta Mathematicae 17, 298–329.
[6] de Finetti, B. (1937). La prévision: ses lois logiques, ses sources subjectives, Annales de l’Institut Henri Poincaré 7(1), 1–68.
[7] Debreu, G. (1959). Theory of Value: An Axiomatic Analysis of Economic Equilibrium, Cowles Foundation Monograph # 17, Yale University Press.
[8] Edgeworth, F.Y. (1911). Probability and Expectation, Encyclopedia Britannica.
[9] Ellsberg, D. (1961). Risk, ambiguity and the Savage axioms, Quarterly Journal of Economics 75, 643–669.
[10] Friedman, M. & Savage, L.J. (1952). The expected-utility hypothesis and the measurability of utility, Journal of Political Economy 60, 463–474.
[11] Gossen, H.H. (1854). The Laws of Human Relations and the Rules of Human Action Derived Therefrom, MIT Press, Cambridge, 1983. Translated from 1854 original by Rudolph C. Blitz with an introductory essay by Nicholas Georgescu-Roegen.
[12] Itô, K. (1942). On stochastic processes. I. (Infinitely divisible laws of probability), Japanese Journal of Mathematics 18, 261–301.
[13] Jevons, W.S. (1871). The Theory of Political Economy. History of Economic Thought Books, McMaster University Archive for the History of Economic Thought.
[14] Kahneman, D. & Tversky, A. (1979). Prospect theory: an analysis of decision under risk, Econometrica 47(2), 263–292.
[15] Markowitz, H. (1952). Portfolio selection, Journal of Finance 7(1), 77–91.

[16] Marshall, A. (1895). Principles of Economics, 3rd Edition, 1st Edition 1890, Macmillan, London, New York.
[17] Menger, C. (1871). Principles of Economics, 1981 edition of 1971 Translation, New York University Press, New York.
[18] Merton, R.C. (1969). Lifetime portfolio selection under uncertainty: the continuous-time case, The Review of Economics and Statistics 51, 247–257.
[19] Pratt, J. (1964). Risk aversion in the small and in the large, Econometrica 32(1), 122–136.
[20] Radner, R. (1972). Existence of equilibrium of plans, prices, and price expectations in a sequence of markets, Econometrica 40(2), 289–303.
[21] Ramsey, F.P. (1931). The foundations of mathematics and other logical essays, in Truth and Probability, R.B. Braithwaite, ed, Kegan, Paul, Trench, Trubner & Co., Harcourt, Brace and Company, London, New York, Chapter VII, pp. 156–198.
[22] Samuelson, P.A. (1965). Rational theory of Warrant Pricing, Industrial Management Review 6(2), 13–31.
[23] Savage, L.J. (1954). The Foundations of Statistics, John Wiley & Sons Inc., New York.
[24] Shackle, G.L.S. (1949). Expectations in Economics, Gibson Press.
[25] von Neumann, J. & Morgenstern, O. (2007). Theory of Games and Economic Behavior, Anniversary Edition. 1st Edition, 1944, Princeton University Press, Princeton, NJ.
[26] Walras, L. (1874). Eléments d’économie Politique Pure, 4th Edition, L. Corbaz, Lausanne.

Related Articles

Behavioral Portfolio Selection; Expected Utility Maximization; Merton Problem; Risk Aversion; Risk–Return Analysis.

GORDAN ŽITKOVIĆ

Itô, Kiyosi (1915–2008)

Kiyosi Itô was born in 1915, nearly 50 years after the Meiji Restoration. Responding to the appearance of the “Black Ships” in Yokohama harbor and Commodore Perry’s demand that they open their doors, the Japanese overthrew the Tokugawa shogunate and in 1868 “restored” the emperor Meiji to power. The Meiji Restoration initiated a period of rapid change during which Japan made a concerted and remarkably successful effort to transform itself from an isolated, feudal society into a modern state that was ready to play a major role in the world. During the first phase of this period, they sent their best and brightest abroad to acquire and bring back to Japan the ideas and techniques that had previously been denied entry by the shogunate’s closed door policy. However, by 1935, the year that Itô entered Tokyo University, the Japanese transformation process had already moved to a second phase, one in which the best and brightest were kept at home to study, assimilate, and eventually disseminate the vast store of information which had been imported during the first phase. Thus, Itô and his peers were expected to choose a topic that they would first teach themselves and then teach their compatriots. For those of us who had the benefit of step-by-step guidance from knowledgeable teachers, it is difficult to imagine how Itô and his fellow students managed, and we can only marvel at the fact that they did.

The topic which Itô chose was that of stochastic processes. At the time, the field of stochastic processes had only recently emerged and was still in its infancy. N. Wiener (1923) had constructed Brownian motion, A.N. Kolmogorov (1933) and Wm. Feller (1936) had laid the analytic foundations on which the theory of diffusions would be built, and P. Lévy (1937) had given a pathspace interpretation of infinitely divisible laws. However, in comparison to well-established fields such as complex analysis, stochastic processes still looked more like a haphazard collection of examples than a unified field.

Having studied mechanics, Itô from the outset was drawn to Lévy’s pathspace perspective with its emphasis on paths and dynamics, and he set as his goal the reconciliation of Kolmogorov and Feller’s analytic treatment with Lévy’s pathspace picture. To carry out his program, he first had to thoroughly understand Lévy, and, as anyone who has attempted to read Lévy in the original knows, this in itself is a daunting task. Indeed, I have my doubts that, even now, many of us would know what Lévy did had Itô not explained it to us. Be that as it may, Itô’s first published paper (1941) was devoted to a reworking (incorporating important ideas due to J.L. Doob) of Lévy’s theory of homogeneous, independent increment processes.

Undoubtedly as a dividend of the time and effort which he spent unraveling Lévy’s ideas, shortly after completing this paper Itô had a wonderful insight of his own. To explain his insight, imagine that the space $M_1(\mathbb{R})$ of probability measures on $\mathbb{R}$ has a differentiable structure in which the underlying dynamics is given by convolution. Then, if $t \in [0, \infty) \longmapsto \mu_t \in M_1(\mathbb{R})$ is a “smooth curve” which starts at the unit point mass $\delta_0$, its “tangent” at time 0 should be given by the limit

$$\lim_{n \to \infty} \left(\mu_{1/n}\right)^{\star n}$$

where $\star$ denotes convolution and therefore $\nu^{\star n}$ is the $n$-fold convolution power of $\nu \in M_1(\mathbb{R})$. What Itô realized is that, if this limit exists, it must be an infinitely divisible law. Applied to $\mu_t = P(t, x, \cdot)$, where $(t, x) \in [0, \infty) \times \mathbb{R} \longmapsto P(t, x, \cdot) \in M_1(\mathbb{R})$ is the transition probability function for a Markov process, this key observation led Itô to view Kolmogorov’s forward equation as describing the flow of a vector field on $M_1(\mathbb{R})$. In addition, because infinitely divisible laws play in the geometry of $M_1(\mathbb{R})$ the role (see End Note a) that straight lines play in Euclidean space, he saw that one should be able to “integrate” Kolmogorov’s equation by piecing together infinitely divisible laws, just as one integrates a vector field in Euclidean space by piecing together straight lines.

Profound as the preceding idea is, Itô went a step further. Again under Lévy’s influence, he wanted to transfer his idea to a pathspace setting. He reasoned that if the transition function can be obtained by concatenating infinitely divisible laws, then the paths of the associated stochastic processes must be obtainable by concatenating paths coming from Lévy’s independent increment processes, and that one should be able to encode this concatenation procedure in some sort of “differential equation” for the resulting paths. The implementation of this program required him to develop what is now called the “Itô calculus”.
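
As a hedged illustration of the tangent limit described above, consider a curve of measures that is not itself a convolution semigroup. The example is not taken from the original essay; the particular curve (a straight-line interpolation between the point masses at 0 and 1) is an assumption made only for concreteness. Its tangent at time 0 turns out to be the Poisson law with mean 1, which is infinitely divisible:

```latex
% Hypothetical example curve, chosen only for illustration:
\mu_t = (1-t)\,\delta_0 + t\,\delta_1 , \qquad 0 \le t \le 1 .

% The n-fold convolution power of \mu_{1/n} is the Binomial(n, 1/n) law:
\left(\mu_{1/n}\right)^{\star n}
  = \sum_{k=0}^{n} \binom{n}{k}
    \left(\tfrac{1}{n}\right)^{k} \left(1 - \tfrac{1}{n}\right)^{n-k} \delta_k .

% By the classical Poisson limit theorem, as n tends to infinity:
\lim_{n \to \infty} \left(\mu_{1/n}\right)^{\star n}
  = \sum_{k=0}^{\infty} \frac{e^{-1}}{k!}\, \delta_k .
```

By contrast, when $\mu_t$ is the centered Gaussian law with variance $t$, every power $(\mu_{1/n})^{\star n}$ already equals $\mu_1$, which is the situation described in End Note a.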

It was during the period when he was working out the details of his calculus that he realized that, at least in the special case when paths are continuous, there is a formula which plays the role in his calculus that the chain rule plays in Newton’s. This formula, which appeared for the first time in a footnote, is what we now call Itô’s formula. Humble as its origins may have been, it has become one of the three or four most famous mathematics formulae of the twentieth century. Itô’s formula is not only a boon of unquestioned and inestimable value to mathematicians but also has become an indispensable tool in the world of mathematically oriented finance.

Itô had these ideas in the early 1940s, around the time when Japan attacked Pearl Harbor and its population had to face the consequent horrors. In view of the circumstances, it is not surprising that few inside Japan, and nobody outside of Japan, knew what Itô was doing for nearly a decade. Itô did publish an outline of his program in a journal of mimeographed notes (1942) at Osaka University, but he says that only his friend G. Maruyama really read what he had written. Thus, it was not until 1950, when he sent the manuscript for a monograph to Doob, who arranged that it be published by the A.M.S. as a Memoir, that Itô’s work began to receive the attention which it deserved. Full appreciation of Itô’s ideas by the mathematical community came only after first Doob and then H.P. McKean applied martingale theory to greatly simplify some of Itô’s more technical arguments.

Despite its less than auspicious beginning, the story has a happy ending. Itô spent many years traveling the world: he has three daughters, one living in Japan, one in Denmark, and one in America. He is, in large part, responsible for the position of Japan as a major force in probability theory, and he has disciples all over the planet. His accomplishments are widely recognized: he is a member of the Japanese Academy of Sciences and the National Academy of Sciences; and he is the recipient of, among others, the Kyoto, Wolf, and Gauss Prizes. When I think of Itô’s career and the rocky road that he had to travel, I recall what Jack Schwartz told a topology class I was attending about Jean Leray’s invention of spectral sequences. At the time, Leray was a prisoner in a German prison camp for French intellectuals, each of whom attempted to explain to the others something about which he was thinking. With the objective of not discussing anything that might be useful to the enemy, Leray chose to talk about algebraic topology rather than his own work on partial differential equations, and for this purpose, he introduced spectral sequences as a pedagogic tool. After relating this anecdote, Schwartz leaned back against the blackboard and spent several minutes musing about the advantages of doing research in ideal working conditions.

Kiyosi Itô died at the age of 93 on November 10, 2008. He is survived by his three daughters. A week before his death, he received the Cultural Medal from the Japanese emperor. The end of an era is fast approaching.

End Notes

a. Note that when $t \longmapsto \mu_t$ is the flow of the infinitely divisible law $\mu$ in the sense that $\mu_1 = \mu$ and $\mu_{s+t} = \mu_s \star \mu_t$, $\mu = (\mu_{1/n})^{\star n}$ for all $n \ge 1$, which is the convolution analog of $f(1) = n^{-1} f(n)$ for a linear function on $\mathbb{R}$.

References

[1] Stroock, D. & Varadhan S.R.S. (eds) (1986). Selected Papers: K. Itô, Springer-Verlag.
[2] Stroock, D. (2003). Markov Processes from K. Itô’s Perspective, Annals of Mathematical Studies, Vol. 155, Princeton University Press.
[3] Stroock, D. (2007). The Japanese Journal of Mathematical Studies 2(1).

Further Reading

A selection of Itô’s papers as well as an essay about his life can be found in [1]. The first half of the book [2] provides a lengthy exposition of Itô’s ideas about Markov processes. Reference [3] is devoted to articles, by several mathematicians, about Itô and his work. In addition, thumbnail biographies can be found on the web at www-groups.dcs.st-and.ac.uk/history/Biographies/Ito.html and www.math.uah.edu/stat/biographies/Ito.xhtml

DANIEL W. STROOCK

Thorp, Edward

Edward O. Thorp is a mathematician who has made seminal contributions to games of chance and investment science. He invented original strategies for the game of blackjack that revolutionized the game. Together with Sheen Kassouf, he showed how warrants could be hedged using a short position in the underlying stocks and described and implemented arbitrage portfolios of stocks and warrants. Thorp made other important contributions to the development of option pricing and to investment theory and practice. He has had a very successful record as an investment manager. This note contains a brief account of some of his major contributions.

Thorp studied physics as an undergraduate and obtained his PhD in mathematics from the University of California at Los Angeles in 1958. The title of his dissertation was Compact Linear Operators in Normed Spaces, and he has published several papers on functional analysis. He taught at UCLA, MIT, and New Mexico State University and was professor of mathematics and finance at the University of California at Irvine.

Thorp’s interest in devising scientific systems for playing games of chance began when he was a graduate student in the late 1950s. He invented a system for playing roulette and also became interested in blackjack and devised strategies based on card counting systems. While at MIT, he collaborated with Claude Shannon, and together they developed strategies for improving the odds at roulette and blackjack. One of their inventions was a wearable computer that was the size of a modern-day cell phone. In 1962, Thorp [3] published Beat the Dealer: A Winning Strategy for the Game of Twenty-One. This book had a profound impact on the game of blackjack as gamblers tried to implement his methods, and casinos responded with various countermeasures that were sometimes less than gentle.

In June 1965, Thorp’s interest in warrants was piqued by reading Sydney Fried’s RHM Warrant Survey. He was motivated by the intellectual challenge of warrant valuation and by the prospect of making money using these instruments. He developed his initial ideas on warrant pricing and investing during the summer of 1965. Sheen Kassouf, who was, like Thorp, a new faculty member at the University of California’s newly established campus at Irvine, was also interested in warrants because of his own investing. Kassouf had analyzed market data to determine the key variables that affected warrant prices. On the basis of his analysis, Kassouf developed an empirical formula for a warrant’s price in terms of these variables.

In September 1965, Thorp and Kassouf discovered their mutual interest in warrant pricing and began their collaboration. In 1967, they published their book, Beat the Market, in which they proposed a method for hedging warrants using the underlying stock and developed a formula for the hedge ratio [5]. Their insights on warrant pricing were used (see End Note a) by Black and Scholes in their landmark 1973 paper on option pricing.

Thorp and Kassouf were aware that the conventional valuation method was based on projecting the warrant’s expected terminal payoff and discounting back to current time. This approach involved two troublesome parameters: the expected return on the warrant and the appropriate discount rate. Black and Scholes in their seminal paper would show that the values of both these parameters had to coincide with the riskless rate. There is strong evidence (see End Note b) that Thorp independently discovered this solution in 1967 and used it in his personal investment strategies. Thorp (see End Note c) makes it quite clear that the credit rightfully belongs to Black and Scholes:

“Black Scholes was a watershed. It was only after seeing their proof that I was certain that this was the formula, and they justifiably get all the credit. They did two things that are required. They proved the formula (I didn’t) and they published it (I didn’t).”

Thorp made a number of other contributions to the development of option theory and modern finance and his ideas laid the foundations for further advances. As one illustration based on my own experience, I will mention Thorp’s essential contribution to a paper that David Emanuel and I published in 1980 [2]. Our paper examined the distribution of a hedged portfolio of a stock and option that was rebalanced after a short interval. The key equation on which our paper rests was first developed by Thorp in 1976 [4].

Throughout his career, Edward Thorp has applied mathematical tools to develop highly original solutions to difficult problems and he has demonstrated a unique ability to implement these solutions in a practical way.

End Notes

a. Black and Scholes state, “One of the concepts we use in developing our model was expressed by Thorp and Kassouf.”
b. For a more detailed discussion of this issue, see Boyle and Boyle [1] Chapter Five.
c. Email to the author dated July 26, 2000.

References

[1] Boyle, P.P. & Boyle, F.P. (2001). Derivatives: the Tools that Changed Finance, Risk Books, UK.
[2] Boyle, P.P. & Emanuel, D. (1980). Discretely adjusted option hedges, Journal of Financial Economics 8(3), 259–282.
[3] Thorp, E.O. (1962). Beat the Dealer: A Winning Strategy for the Game of Twenty-One, Random House, New York.
[4] Thorp, E.O. (1976). Common stock volatilities in option formulas, Proceedings, Seminar on the Analysis of Security Prices, Center for Research in Security Prices, Graduate School of Business, University of Chicago, Vol. 21, 1, May 13–14, pp. 235–276.
[5] Thorp, E.O. & Kassouf, S. (1967). Beat the Market: A Scientific Stock Market System, Random House, New York.

PHELIM BOYLE

Option Pricing Theory: work on warrant pricing [117]. Samuelson derived
valuation formulas for both European and American
Historical Perspectives options, coining these terms in the process.
Samuelson’s derivation was almost identical to
that used nearly a decade later to derive the
This article traces the history of option pricing theory from the turn of the twentieth century to the present. This history documents and clarifies the origins of the key contributions (authors and papers) to the theory of option pricing and hedging. Contributions with respect to the empirical understanding of the theories are not discussed, except implicitly, because the usefulness and longevity of any model is based on its empirical validity.

It is widely agreed that the modern theory of option pricing began in 1973 with the publication of the Black–Scholes–Merton model [12, 104]. Except for the early years (pre-1973), this history is restricted to papers that use the no arbitrage and complete markets technology to price options. Equilibrium option pricing models are not discussed herein. In particular, this excludes the consideration of option pricing in incomplete markets. An outline for this article is as follows. The following section discusses the early years of option pricing (pre-1973). The remaining sections deal with 1973 to the present: the section “Equity Derivatives” discusses the Black–Scholes–Merton model; the section “Interest Rate Derivatives” concerns the Heath–Jarrow–Morton model; and the section “Credit Derivatives” corresponds to credit risk derivative pricing models.

Early Option Pricing Literature (Pre-1973)

Interestingly, many of the basic insights of option pricing originated in the early years, that is, pre-1973. It all began at the turn of the century in 1900 with Bachelier’s [4] derivation of an option pricing formula in his doctoral dissertation on the theory of speculation at France’s Sorbonne University. Although remarkably close to the Black–Scholes–Merton model, Bachelier’s formula was flawed because he used normally distributed stock prices that violated limited liability. More than half a century later, Paul Samuelson read Bachelier’s dissertation, recognized this flaw, and fixed it by using geometric Brownian motion instead in his [...]

[...] Black–Scholes–Merton formula, except that instead of invoking the no arbitrage principle to derive the valuation formula, Samuelson postulated the condition that the discounted option’s payoffs follow a martingale (see [117], p. 19). Furthermore, it is also interesting to note that, in the appendix to this article, Samuelson and McKean determined the price of an American option by observing the correspondence between an American option’s valuation and the free boundary problem for the heat equation.

A few years later, instead of invoking the postulate that discounted option payoffs follow a martingale, Samuelson and Merton [118] derived this condition as an implication of a utility-maximizing investor’s behavior. In this article, they also showed that the option’s price could be viewed as its discounted expected value, where instead of using the actual probabilities to compute the expectation, one employs utility or risk-adjusted probabilities (see expression (20) on page 26). These risk-adjusted probabilities are now known as “risk-neutral” or “equivalent martingale” probabilities. Contrary to a widely held belief, the use of “equivalent martingale probabilities” in option pricing theory predated the paper by Cox and Ross [36] by nearly 10 years (Merton (footnote 5 p. 218, [107]) points out that Samuelson knew this fact as early as 1953).

Unfortunately, these early option pricing formulas depended on the expected return on the stock, or equivalently, the stock’s risk premium. This dependency made the formulas difficult to estimate and to use. The reason for this difficulty is that the empirical finance literature has documented that the stock’s risk premium is nonstationary. It varies across time according to both changing tastes and changing economic fundamentals. This nonstationarity makes both the modeling of risk premia and their estimation problematic. Indeed, at present, there is still no generally accepted model for an asset’s risk premium that is consistent with historical data (see [32], Part IV for a review).

Perhaps the most important criticism of this early approach to option pricing is that it did not invoke the riskless hedging argument in conjunction with the no arbitrage principle to price an option. (The first use of
riskless hedging with no arbitrage to prove a pricing relationship between financial securities can be found in [110].) And, as such, these valuation formulas provided no insights into how to hedge an option using the underlying stock and riskless borrowing.

It can be argued that the idea of hedging an option is the single most important insight of modern option pricing theory. The use of the no arbitrage hedging argument to price an option can be traced to the seminal papers by Black and Scholes [12] and Merton [104], although the no arbitrage hedging argument itself has been attributed to Merton (see [79] in this regard).

Equity Derivatives

Fischer Black, Myron Scholes, and Robert Merton pioneered the modern theory of option pricing with the publication of the Black–Scholes–Merton option pricing model [12, 104] in 1973. The original Black–Scholes–Merton model is based on five assumptions: (i) competitive markets, (ii) frictionless markets, (iii) geometric Brownian motion, (iv) deterministic interest rates, and (v) no credit risk. For the purposes of this section, the defining characteristics of this model are the assumptions of deterministic interest rates and no credit risk.

The original derivation followed an economic hedging argument. The hedging argument involves holding simultaneous and offsetting positions in a stock and option that generates an instantaneous riskless position. This, in turn, implies a partial differential equation (pde.) for the option’s value that is subject to a set of boundary conditions. The solution under geometric Brownian motion is the Black–Scholes formula.

It was not until six years later that the martingale pricing technology was introduced by Harrison and Kreps [65] and Harrison and Pliska [66, 67], providing an alternative derivation of the Black–Scholes–Merton model. These papers, and later refinements by Delbaen and Schachermayer [40, 41, 42], introduced the first and second fundamental theorems of asset pricing, thereby providing the rigorous foundations to option pricing theory.

Roughly speaking, the first fundamental theorem of asset pricing states that no arbitrage is equivalent to the existence of an equivalent martingale probability measure, that is, a probability measure that makes the discounted stock price process a martingale. The second fundamental theorem of asset pricing states that the market is complete if and only if the equivalent martingale measure is unique. A complete market is one in which any derivative security’s payoffs can be generated by a dynamic trading strategy in the stock and riskless asset. These two theorems enabled the full-fledged use of stochastic calculus for option pricing theory. A review and summary of these results can be found in [43].

At the beginning, this alternative and more formal approach to option pricing theory was viewed as only of tangential interest. Indeed, all existing option pricing theorems could be derived without this technology and only using the more intuitive economic hedging argument. It was not until the Heath–Jarrow–Morton (HJM) model [70] was developed (circulating as a working paper in 1987) that this impression changed. The HJM model was the first significant application that could not be derived without the use of the martingale pricing technology. More discussion relating to the HJM model is contained in the section “Interest Rate Derivatives”.

Extensions

The original Black–Scholes–Merton model is based on the following five assumptions: (i) competitive markets, (ii) frictionless markets, (iii) geometric Brownian motion, (iv) deterministic interest rates, and (v) no credit risk. The first two assumptions, competitive and frictionless markets, are the mainstay of finance. Competitive markets means that all traders act as price takers, believing their trades have no impact on the market price. Frictionless markets imply that there are no transaction costs or trade restrictions, for example, no short sale constraints. Geometric Brownian motion implies that the stock price is lognormally distributed with a constant volatility. Deterministic interest rates are self-explanatory. No credit risk means that the investors (all counterparties) who trade financial securities will not default on their obligations.

Extensions of the Black–Scholes–Merton model that relaxed assumptions (i)–(iii) quickly flourished. Significant papers relaxing the geometric Brownian motion assumption include those by Merton [106] and Cox and Ross [36], who studied jump and jump-diffusion processes. Merton’s paper [106] also
included the insight that if unhedgeable jump risk is diversifiable, then it carries no risk premium. Under this assumption, one can value jump risk using the statistical probability measure, enabling the simple pricing of options in an incomplete market. This insight was subsequently invoked in the context of stochastic volatility option pricing and in the context of pricing credit risk derivatives.

Merton [104], Cox [34], and Cox and Ross [36] were among the first to study stochastic volatility option pricing in a complete market. Option pricing with stochastic volatility in incomplete markets was subsequently studied by Hull and White [73] and Heston [71]. More recent developments in this line of research use an HJM-type model [70] with a term structure of forward volatilities (see [51, 52]). Stochastic volatility models are of considerable current interest in the pricing of volatility swaps, variance swaps, and options on variance swaps.

A new class of Levy processes was introduced by Madan and Milne [102] into option pricing and generalized by Carr et al. [20]. Levy processes have the nice property that their characteristic function is known, and it can be shown that an option’s price can be represented in terms of the stock price’s characteristic function. This leads to some alternative numerical procedures for computing option values using fast Fourier transforms (see [23]). For a survey of the use of Levy processes in option pricing, see [33].

The relaxation of the frictionless market assumption has received less attention in the literature. The inclusion of transaction costs into option pricing was originally studied by Leland [99], while Heath and Jarrow [69] studied the imposition of margin requirements. A more recent investigation into the impact of transaction costs on option pricing, using the martingale pricing technology, can be found in [26].

The relaxation of the competitive market assumption was first studied by Jarrow [77, 78] via the consideration of a large trader whose trades change the price. Jarrow’s approach maintains the no arbitrage assumption, or in this context, a no market manipulation assumption (see also [5]).

In between a market with competitive traders and a market with a large trader is a market where traders have only a temporary impact on the market price. That is, purchases/sales change the price paid/received depending upon a given supply curve. Traders act as price takers with respect to the supply curve. Such a price impact is called liquidity risk. Liquidity risk, of this type, can be considered as an endogenous transaction cost. This extension is studied in [26]. Liquidity risk is currently a hot research topic in option pricing theory.

The Black–Scholes–Merton model has been applied to foreign currency options (see [58]) and to all types of exotic options on both equities and foreign currencies. A complete reference for exotic options is [44].

Computations

The original derivation of the Black–Scholes–Merton model yields an option’s value satisfying a pde. subject to a set of boundary conditions. For a European call or put option, under geometric Brownian motion, the pde. has an analytic solution. For American options under geometric Brownian motion, analytic solutions are not available for puts independent of dividend payments on the underlying stock, and for American calls with dividends. For different stock price processes, analytic solutions are often not available either, even for European options. In these cases, numerical solutions are needed. The first numerical approaches employed in this regard were finite difference methods (see [15, 16]).

Closely related, but containing more economic intuition, option prices can also be computed numerically by using a binomial approximation. The first users in this regard were Sharpe ([122], chapter 16) and Rendleman and Bartter [113]. Cox et al. [37] published the definitive paper documenting the binomial model and its convergence to the continuous time limit (see also [68]). A related paper on convergence of discrete time models to continuous time models is that by Duffie and Protter [48].

The binomial pricing model, as it is now known, is also an extremely useful pedagogical device for explaining option pricing theory. This is true because the binomial model uses only discrete time mathematics. As such, it is usually the first model presented in standard option pricing textbooks. It is interesting to note that both of the first two textbooks on option pricing utilized the binomial model in this fashion (see [38] and [84]).

Another technique for computing option values is to use series expansions (see [50, 83, 123]). Series expansions are also useful for hedging exotic options that employ only static hedge positions with
plain vanilla options (see [38], chapter 7.2, and [24, 63, 116]).

As computing a European option’s price is equivalent to computing an expectation, an alternative approach to either finite difference methods or the binomial model is Monte Carlo simulation. The paper that introduced this technique to option pricing is by Boyle [13]. This technique has become very popular because of its simplicity and its ability to handle high-dimensional problems (greater than three dimensions). This technique has also recently been extended to pricing American options. Important contributions in this regard are by Longstaff and Schwartz [101] and Broadie and Glasserman [18]. For a complete reference on Monte Carlo techniques, see [61].

Following the publication of Merton’s original paper [104], which contained an analytic solution for a perpetual American put option, much energy has been expended in the search for analytic solutions for both American puts and calls with finite maturities. For the American call, with a finite number of known dividends, a solution was provided by Roll [115]. For American puts, breaking the maturity of the option into a finite number of discrete intervals, the compound option pricing technique is applicable (see [60] and [93]). More recently, the decomposition of American options into a European option and an early exercise premium was discovered by Carr et al. [22], Kim [96], and Jacka [75].

These computational procedures are more generally applicable to all derivative pricing models, including those discussed in the next two sections.

Interest Rate Derivatives

Interest rate derivative pricing models provided the next major advance in option pricing theory. Recall that a defining characteristic of the Black–Scholes–Merton model is that it assumes deterministic interest rates. This assumption limits its usefulness in two ways. First, it cannot be used for long-dated contracts. Indeed, for long-dated contracts (greater than a year or two), interest rates cannot be approximated as being deterministic. Second, for short-dated contracts, if the underlying asset’s price process is highly correlated with interest rate movements, then interest rate risk will affect hedging, and therefore valuation. The extreme cases, of course, are interest rate derivatives where the underlyings are the interest rates themselves.

During the late 1970s and 1980s, interest rates were high and volatile, relative to historical norms. New interest rate risk management tools were needed because the Black–Scholes–Merton model was not useful in this regard. In response, a class of interest rate pricing models was developed by Vasicek [124], Brennan and Schwartz [17], and Cox et al. (CIR) [35]. This class, called the spot rate models, had two limitations. First, they depended on the market price(s) of interest rate risk, or equivalently, the expected return on default free bonds. This dependence, just as with the option pricing models pre-Black–Scholes–Merton, made their implementation problematic. Second, these models could not easily match the initial yield curve. This calibration is essential for the accurate pricing and hedging of interest rate derivatives because any discrepancies in yield curve matching may indicate “false” arbitrage opportunities in the priced derivatives.

To address these problems, Ho and Lee [72] applied the binomial model to interest rate derivatives with a twist. Instead of imposing an evolution on the spot rate, they had the zero coupon bond price curve evolve in a binomial tree. Motivated by this paper, Heath, Jarrow, and Morton [70] generalized this idea in the context of a continuous time and multifactor model to price interest rate derivatives. The key step in the derivation of the HJM model was determining the necessary and sufficient conditions for an arbitrage free evolution of the term structure of interest rates.

The defining characteristic of the HJM model is that there is a continuum of underlying assets, a term structure, whose correlated evolution needs to be considered when pricing and hedging options. For interest rate derivatives, this term structure is the term structure of interest rates. To be specific, it is the term structure of default free interest rates. But there are other term structures of relevance, including foreign interest rates, commodity futures prices, convenience yields on commodities, and equity forward volatilities. These alternative applications are discussed later in this section.

To simplify the mathematics, HJM focused on forward rates instead of zero-coupon bond prices. The martingale pricing technology was the tool used to obtain the desired conditions, the “HJM drift conditions”. Given the HJM drift conditions and the fact that the interest rate derivative market is
complete in the HJM model, standard techniques are then applied to price interest rate derivatives.

The HJM model is very general: all previous spot rate models are special cases. In fact, the labels Vasicek, extended Vasicek (or sometimes Hull and White [74]), and CIR are now exclusively used to identify subclasses of the HJM model. Subclasses are uniquely identified by a particular volatility structure for the evolution of the forward rate curve. For example, the Ho and Lee model is now identified as a single factor HJM model, where the forward rate volatility is a constant across maturities. This can be shown to be the term structure evolution to which the Ho and Lee binomial model converges.

Adoption of the HJM model was slow at first, hampered mostly by computational concerns, but as these computational concerns dissipated, the modern era for pricing interest rate derivatives was born. As mentioned previously, the HJM model was very general. In its most unrestricted form, the evolution of the term structure of interest rates could be path dependent (non-Markov) and it could generate negative interest rates with positive probability. Research into the HJM model proceeded in two directions: (i) investigations into the abstract mathematical structure of HJM models and (ii) studying subclasses that had nice analytic and computational properties for applications.

With respect to the understanding of the mathematical structure of HJM models, three questions arose. First, what structures would guarantee interest rates that remained positive? Second, given an initial forward rate curve and its evolution, what is the class of forward rate curves that can be generated by all possible evolutions? Third, under what conditions is an HJM model a finite dimensional Markov process? The first question was answered by Flesaker and Hughston [55], Rogers [114], and Jin and Glasserman [91]. The second was solved by Bjork and Christensen [7] and Filipovic [56]. The third was studied by Cheyette [30], Caverhill [25], Jeffrey [92], Duffie and Kan [45], and Bjork and Svensson [9], among others.

The original HJM model had the term structure of interest rates generated by a finite number of Brownian motions. Extensions include (i) jump processes (see [8, 53, 82]); (ii) stochastic volatilities (see [1, 31]); and (iii) random fields (see [64, 95]).

Subclasses

Subsequent research developed special cases of the HJM model that have nice analytic and computational properties for implementation. Perhaps the most useful class, for its analytic properties, is the affine model of Duffie and Kan [45] and Dai and Singleton [39]. The class of models is called affine because the spot rate can be written as an affine function of a given set of state variables. The affine class includes both the Vasicek and CIR models as mentioned earlier. This class of term structure evolutions has known characteristic functions for the spot rate, which enables numerical computations for various interest rate derivatives (see [47]). Extensions of the affine class include those by Filipovic [57], Chen et al. [28], and Cheng and Scaillet [29].

The original HJM paper showed that instantaneous forward rates being lognormally distributed is inconsistent with no arbitrage. Hence, geometric Brownian motion was excluded as an acceptable forward rate process. This was unfortunate because it implies that caplets, options on forward rates, will not satisfy Black’s formula [10]. And historically, because of the industry’s familiarity with the Black–Scholes formula (a close relative of Black’s formula), Black’s formula was used extensively to value caplets. This inconsistency between theory and practice led to a search for a theoretical justification for using Black’s formula with caplets.

This problem was resolved by Sandmann et al. [119], Miltersen et al. [109], and Brace et al. [14]. The solution was to use a simple interest rate, compounded discretely, for the London Interbank Offered Rate (LIBOR). Of course, simple rates better match practice. And it was shown that a simple LIBOR rate could evolve as a geometric Brownian motion in an arbitrage free setting. Subsequently, the lognormal evolution has been extended to jump diffusions (see [62]), Levy processes (see [54]), and stochastic volatilities (see [1]).

Key to the use of the “LIBOR model”, as it has become known, is the forward price martingale measure. The forward price martingale measure is an equivalent probability measure that makes asset payoffs at some future date T martingales when discounted by the T maturity zero coupon bond price. The forward price martingale measure
was first discovered by Jarrow [76] and later independently discovered by Geman [59] (see [112] for a discussion of the LIBOR model and its history).

Applications

The HJM model has been extended to multiple term structures and applied to foreign currency derivatives [2], to equities and commodities [3], and to Treasury inflation protected bonds [89]. The HJM model has also been applied to term structures of futures prices (see [21] and [108]), term structures of convenience yields [111], term structures of credit risky bonds (discussed in the next section), and term structures of equity forward volatilities ([51, 52] and [121]). In fact, it can be shown that almost all option pricing applications can be viewed as special cases of a multiple term structure HJM model (see [88]). A summary of many of these applications can be found in [19].

Credit Derivatives

The previously discussed models excluded the consideration of default when trading financial securities. The first model for studying credit risk, called the structural approach, was introduced by Merton [105]. Credit risk, although always an important consideration in fixed income markets, dramatically expanded its market-wide recognition with the introduction of trading in credit default swaps after the mid-1990s. The reason for this delayed importance was that it took until then for the interest rate derivative markets to mature sufficiently for sophisticated financial institutions to successfully manage/hedge equity, foreign currency, and interest rate risk. This risk-controlling ability enabled firms to seek out arbitrage opportunities, and in the process, lever up on the remaining financial risks, which are credit/counterparty, liquidity, and operational risk. This greater risk exposure by financial institutions to both credit and liquidity risk (as evidenced by the events surrounding the failure of Long Term Capital Management) spurred the more rapid development of credit risk modeling.

As the first serious contribution to credit risk modeling, Merton’s original model was purposely simple. Merton considered credit risk in the context of a firm issuing only a single zero coupon bond. As such, risky debt could be decomposed into riskless debt plus a short put option on the assets of the firm. Shortly thereafter, extensions to address this simple liability structure were quickly discovered by Black and Cox [11], Jones et al. [94], and Leland [100], among others.

The structural approach to credit risk modeling has two well-known empirical shortcomings: (i) that default occurs smoothly, implying that bond prices do not jump at default and (ii) that the firm’s assets are neither traded nor observable. The first shortcoming means that for short maturity bonds, credit spreads as implied by the structural model are smaller than those observed in practice. Extensions of the structural approach that address the absence of a jump at default include that by Zhou [125]. These extensions, however, did not overcome the second shortcoming.

Almost 20 years after Merton’s original paper, Jarrow and Turnbull [85, 86] developed an alternative credit risk model that overcame the second shortcoming. As a corollary, this approach also overcame the first shortcoming. This alternative approach has become known as the reduced form model. Early important contributions to the reduced form model were by Lando [97], Madan and Unal [103], Jarrow et al. [80], and Duffie and Singleton [49].

As the credit derivative markets expanded, so did extensions to the reduced form model. To consider credit rating migration, Jarrow et al. [80] introduced a Markov chain model, where the states correspond to credit ratings. Next, there was the issue of default correlation for pricing credit derivatives on baskets (e.g., collateralized debt obligations (CDOs)). This correlation was first handled with Cox processes (Lando [97]).

The use of Cox processes induces default correlations across firms through common state variables that drive the default intensities. But when conditioning on the state variables, defaults are assumed to be independent across firms. If this structure is true, then after conditioning, defaults are diversifiable in a large portfolio and require no additional risk premium. The implication is that the empirical and risk neutral default intensities are equal. This equality, of course, would considerably simplify direct estimation of the risk neutral default intensity [81].
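
The conditional independence structure just described can be illustrated with a small simulation. What follows is only a sketch under stated assumptions, not the construction used in [97]: the common state variable is a hypothetical two-regime default intensity (0.02 or 0.30 per year, equally likely and constant over a one-year horizon), only two firms are considered, and the names simulate and default_correlation are introduced here purely for illustration.

```python
import math
import random


def simulate(n_paths=20000, horizon=1.0, seed=0):
    """Return (state, default_a, default_b) triples for two firms.

    Assumption for illustration only: the common state variable is a
    two-regime default intensity (0.02 or 0.30 per year), equally likely.
    """
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        intensity = rng.choice([0.02, 0.30])  # common state variable
        # Default probability over the horizon for a constant intensity.
        p_default = 1.0 - math.exp(-intensity * horizon)
        # Given the state, the two firms default independently.
        default_a = rng.random() < p_default
        default_b = rng.random() < p_default
        paths.append((intensity, default_a, default_b))
    return paths


def default_correlation(paths):
    """Sample correlation between the two firms' default indicators."""
    n = len(paths)
    mean_a = sum(a for _, a, _ in paths) / n
    mean_b = sum(b for _, _, b in paths) / n
    cov = sum((a - mean_a) * (b - mean_b) for _, a, b in paths) / n
    var_a = mean_a * (1.0 - mean_a)
    var_b = mean_b * (1.0 - mean_b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # correlation undefined without any variation
    return cov / math.sqrt(var_a * var_b)


paths = simulate()
unconditional = default_correlation(paths)
conditional = {
    state: default_correlation([p for p in paths if p[0] == state])
    for state in (0.02, 0.30)
}
print("unconditional:", round(unconditional, 3))
print("conditional on state:", {s: round(c, 3) for s, c in conditional.items()})
```

Under these assumed parameters, the unconditional correlation is clearly positive, while the correlation within each regime is close to zero up to sampling noise. This is the sense in which conditioning on the state variables removes the dependence across firms.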
This is not the only mechanism through which default correlations can be generated. Default contagion is also possible through competitive industry considerations. This type of default contagion is a type of “counterparty” risk, and it was first studied in the context of a reduced form model by Jarrow and Yu [90]. “Counterparty risk” in a reduced form model, an issue in and of itself, was previously studied by Jarrow and Turnbull [86, 87].

Finally, default correlation could be induced via information flows as well. Indeed, a default by one firm may cause other firms’ default intensities to increase as the market learns about the reasons for the realized default (see [120]). Finding a suitable correlation structure for implementation and estimation is still a topic of considerable interest.

An important contribution to the credit risk model literature was the integration of structural and reduced form models. These two credit risk models can be understood through the information sets used in their construction. Structural models use the management’s information set, while reduced form models use the market’s information set. Indeed, the manager has access to the firm’s asset values, while the market does not. The first paper making this connection was by Duffie and Lando [46], who viewed the market as having the management’s information set plus noise, due to the accounting process. An alternative view is that the market has a coarser partitioning of management’s information, that is, less of it. Both views are reasonable, but the mathematics is quite different. The second approach was first explored by Cetin et al. [27].

Credit risk modeling continues to be a hot area of research. Books on the current state of the art with respect to credit risk derivative pricing models are by Lando [98] and Bielecki and Rutkowski [6].

References

[1] Andersen, L. & Brotherton-Ratcliffe, R. (2005). Extended LIBOR market models with stochastic volatility, Journal of Computational Finance 9, 1–26.
[2] Amin, K. & Jarrow, R. (1991). Pricing foreign currency options under stochastic interest rates, Journal of International Money and Finance 10(3), 310–329.
[3] Amin, K. & Jarrow, R. (1992). Pricing American options on risky assets in a stochastic interest rate economy, Mathematical Finance 2(4), 217–237.
[4] Bachelier, L. (1900). Theorie de la Speculation, Ph.D. Dissertation, L’Ecole Normale Superieure. English translation in P. Cootner (ed.) (1964) The Random Character of Stock Market Prices, MIT Press, Cambridge, MA.
[5] Bank, P. & Baum, D. (2004). Hedging and portfolio optimization in illiquid financial markets with a large trader, Mathematical Finance 14(1), 1–18.
[6] Bielecki, T. & Rutkowski, M. (2002). Credit Risk: Modeling, Valuation, and Hedging, Springer Verlag.
[7] Bjork, T. & Christensen, B. (1999). Interest rate dynamics and consistent forward rate curves, Mathematical Finance 9(4), 323–348.
[8] Bjork, T., Di Masi, G., Kabanov, Y. & Runggaldier, W. (1997). Towards a general theory of bond markets, Finance and Stochastics 1, 141–174.
[9] Bjork, T. & Svensson, L. (2001). On the existence of finite dimensional realizations for nonlinear forward rate models, Mathematical Finance 11(2), 205–243.
[10] Black, F. (1976). The pricing of commodity contracts, Journal of Financial Economics 3, 167–179.
[11] Black, F. & Cox, J. (1976). Valuing corporate securities: some effects of bond indenture provisions, Journal of Finance 31, 351–367.
[12] Black, F. & Scholes, M. (1973). The pricing of options and corporate liabilities, Journal of Political Economy 81, 637–659.
[13] Boyle, P. (1977). Options: a Monte Carlo approach, Journal of Financial Economics 4, 323–338.
[14] Brace, A., Gatarek, D. & Musiela, M. (1997). The market model of interest rate dynamics, Mathematical Finance 7(2), 127–147.
[15] Brennan, M. & Schwartz, E. (1977). The valuation of American put options, Journal of Finance 32, 449–462.
[16] Brennan, M. & Schwartz, E. (1978). Finite difference methods and jump processes arising in the pricing of contingent claims: a synthesis, Journal of Financial and Quantitative Analysis 13, 461–474.
[17] Brennan, M. & Schwartz, E. (1979). A continuous time approach to the pricing of bonds, Journal of Banking and Finance 3, 135–155.
[18] Broadie, M. & Glasserman, P. (1997). Pricing American style securities by simulation, Journal of Economic Dynamics and Control 21, 1323–1352.
[19] Carmona, R. (2007). HJM: a unified approach to dynamic models for fixed income, credit and equity markets. Paris-Princeton Lectures on Mathematical Finance 2004, Lecture Notes in Mathematics, vol. 1919, Springer Verlag.
[20] Carr, P., Geman, H., Madan, D. & Yor, M. (2003). Stochastic volatility for Levy processes, Mathematical Finance 13, 345–382.
[21] Carr, P. & Jarrow, R. (1995). A discrete time synthesis of derivative security valuation using a term structure of futures prices, in Handbooks in OR & MS, R. Jarrow, V. Maksimoviz & W. Ziemba, eds, Elsevier Science B.V., Vol. 9, pp. 225–249.
[22] Carr, P., Jarrow, R. & Myneni, R. (1992). Alternative characterizations of American put options, Mathematical Finance 2(2), 87–106.
[23] Carr, P. & Madan, D. (1998). Option valuation using the fast Fourier transform, Journal of Computational Finance 2, 61–73.
[24] Carr, P. & Madan, D. (1998). Toward a theory of volatility trading, in Volatility, R. Jarrow, ed., Risk Publications, pp. 417–427.
[25] Caverhill, A. (1994). When is the spot rate Markovian?, Mathematical Finance 4, 305–312.
[26] Çetin, U., Jarrow, R. & Protter, P. (2004). Liquidity risk and arbitrage pricing theory, Finance and Stochastics 8, 311–341.
[27] Cetin, U., Jarrow, R., Protter, P. & Yildirim, Y. (2004). Modeling credit risk with partial information, The Annals of Applied Probability 14(3), 1167–1178.
[28] Chen, L., Filipovic, D. & Poor, H. (2004). Quadratic term structure models for risk free and defaultable rates, Mathematical Finance 14(4), 515–536.
[29] Cheng, P. & Scaillet, O. (2007). Linear-quadratic jump diffusion modeling, Mathematical Finance 17(4), 575–598.
[30] Cheyette, O. (1992). Term structure dynamics and mortgage valuation, Journal of Fixed Income 1, 28–41.
[31] Chiarella, C. & Kwon, O. (2000). A complete Markovian stochastic volatility model in the HJM framework, Asia-Pacific Financial Markets 7, 293–304.
[32] Cochrane, J. (2001). Asset Pricing, Princeton University Press.
[33] Cont, R. & Tankov, P. (2004). Financial Modeling with Jump Processes, Chapman & Hall.
[44] Detemple, J. (2006). American Style Derivatives: Valuation and Computation, Financial Mathematics Series, Chapman & Hall/CRC.
[45] Duffie, D. & Kan, R. (1996). A yield factor model of interest rates, Mathematical Finance 6, 379–406.
[46] Duffie, D. & Lando, D. (2001). Term structure of credit spreads with incomplete accounting information, Econometrica 69, 633–664.
[47] Duffie, D., Pan, J. & Singleton, K. (2000). Transform analysis and asset pricing for affine jump-diffusions, Econometrica 68, 1343–1376.
[48] Duffie, D. & Protter, P. (1992). From discrete to continuous time finance: weak convergence of the financial gain process, Mathematical Finance 2(1), 1–15.
[49] Duffie, D. & Singleton, K. (1999). Modeling term structures of defaultable bonds, Review of Financial Studies 12(4), 687–720.
[50] Dufresne, D. (2000). Laguerre series for Asian and other options, Mathematical Finance 10(4), 407–428.
[51] Dupire, B. (1992). Arbitrage pricing with stochastic volatility. Proceedings of AFFI Conference, Paris, June.
[52] Dupire, B. (1996). A Unified Theory of Volatility. Paribas working paper.
[53] Eberlein, E. & Raible, S. (1999). Term structure models driven by general Levy processes, Mathematical Finance 9(1), 31–53.
[54] Eberlein, E. & Ozkan, F. (2005). The Levy LIBOR model, Finance and Stochastics 9, 327–348.
[34] Cox, J. (1975). Notes on Option Pricing I: Constant
[55] Flesaker, B. & Hughston, L. (1996). Positive interest,
Elasticity of Variance Diffusions, working paper, Stan-
Risk Magazine 9, 46–49.
ford University.
[56] Filipovic, D. (2001). Consistency Problems for Heath
[35] Cox, J., Ingersoll, J. & Ross, S. (1985). A theory of
Jarrow Morton Interest Rate Models, Springer Lecture
the term structure of interest rates, Econometrica 53,
385–407. Notes in Mathematics, Vol. 1760, Springer Verlag.
[36] Cox, J. & Ross, S.A. (1976). The valuation of options [57] Filipovic, D. (2002). Separable term structures and the
for alternative stochastic processes, Journal of Finan- maximal degree problem, Mathematical Finance 12(4),
cial Economics 3(1/2), 145–166. 341–349.
[37] Cox, J., Ross, S. & Rubinstein, M. (1979). Option [58] Garman, M. & Kohlhagen, S. (1983). Foreign currency
pricing: a simplified approach, Journal of Financial exchange values, Journal of International Money and
Economics 7, 229–263. Finance 2, 231–237.
[38] Cox, J. & Rubinstein, M. (1985). Option Markets, [59] Geman, H. (1989). The Importance of the Forward
Prentice Hall. Neutral Probability in a Stochastic Approach of Interest
[39] Dai, Q. & Singleton, K. (2000). Specification analysis Rates, working paper, ESSEC.
of affine term structure models, Journal of Finance 55, [60] Geske, R. (1979). The valuation of compound options,
1943–1978. Journal of Financial Economics 7, 63–81.
[40] Delbaen, F. & Schachermayer, W. (1994). A general [61] Glasserman, P. (2004). Monte Carlo Methods in Finan-
version of the fundamental theorem of asset pricing, cial Engineering, Springer Verlag.
Mathematische Annalen 300, 463–520. [62] Glasserman, P. & Kou, S. (2003). The term structure
[41] Delbaen, F. & Schachermayer, W. (1995). The exis- of simple forward rates with jump risk, Mathematical
tence of absolutely continuous local Martingale mea- Finance 13(3), 383–410.
sures, Annals of Applied Probability 5, 926–945. [63] Green, R. & Jarrow, R. (1987). Spanning and com-
[42] Delbaen, F. & Schachermayer, W. (1998). The fun- pleteness in markets with contingent claims, Journal of
damental theorem for unbounded stochastic processes, Economic Theory 41(1), 202–210.
Mathematische Annalen 312, 215–250. [64] Goldstein, R. (2000). The term structure of interest
[43] Delbaen, F. & Schachermayer, W. (2006). The Mathe- rates as a random field, Review of Financial Studies
matics of Arbitrage, Springer Verlag. 13(2), 365–384.
Option Pricing Theory: Historical Perspectives 9

[65] Harrison, J. & Kreps, D. (1979). Martingales and discontinuities in asset returns, Mathematical Finance
arbitrage in multiperiod security markets, Journal of 5(4), 311–336.
Economic Theory 20, 381–408. [83] Jarrow, R. & Rudd, A. (1982). Approximate option
[66] Harrison, J. & Pliska, S. (1981). Martingales and valuation for arbitrary stochastic processes, Journal of
stochastic integrals in the theory of continuous trad- Financial Economics 10, 347–369.
ing, Stochastic Processes and Their Applications 11, [84] Jarrow, R. & Rudd, A. (1983). Option Pricing, Dow
215–260. Jones Irwin.
[67] Harrison, J. & Pliska, S. (1983). A stochastic cal- [85] Jarrow, R. & Turnbull, S. (1992). Credit risk: drawing
culus model of continuous trading: complete mar- the analogy, Risk Magazine 5(9).
kets, Stochastic Processes and Their Applications 15, [86] Jarrow, R. & Turnbull, S. (1995). Pricing derivatives
313–316. on financial securities subject to credit risk, Journal of
[68] He, H. (1990). Convergence of discrete time to conti- Finance 50(1), 53–85.
nous time contingent claims prices, Review of Financial [87] Jarrow, R. & Turnbull, S. (1997). When swaps are
Studies 3, 523–546. dropped, Risk Magazine 10(5), 70–75.
[69] Heath, D. & Jarrow, R. (1987). Arbitrage, continuous [88] Jarrow, R. & Turnbull, S. (1998). A unified approach
trading and margin requirments, Journal of Finance 17, for pricing contingent claims on multiple term struc-
1129–1142. tures, Review of Quantitative Finance and Accounting
[70] Heath, D., Jarrow, R. & Morton, A. (1992). Bond 10(1), 5–19.
pricing and the term structure of interest rates: a [89] Jarrow, R. & Yildirim, Y. (2003). Pricing treasury infla-
new methodology for contingent claims valuation, tion protected securities and related derivatives using
Econometrica 60(1), 77–105. an HJM model, Journal of Financial and Quantitative
[71] Heston, S. (1993). A closed form solution for options Analysis 38(2), 337–358.
with stochastic volatility with applications to bond [90] Jarrow, R. & Yu, F. (200). Counterparty risk and the
and currency options, Review of Financial Studies 6, pricing of defaultable securities, Journal of Finance
56(5), 1765–1799.
327–343.
[91] Jin, Y. & Glasserman, P. (2001). Equilibrium positive
[72] Ho, T. & Lee, S. (1986). Term structure movements
interest rates: a unified view, Review of Financial
and pricing interest rate contingent claims, Journal of
Studies 14, 187–214.
Finance 41, 1011–1028.
[92] Jeffrey, A. (1995). Single factor heath Jarrow Morton
[73] Hull, J. & White, A. (1987). The pricing of options on
term structure models based on Markov spot rate
assets with stochastic volatilities, Journal of Finance
dynamics, Journal of Financial and Quantitative Anal-
42, 271–301.
ysis 30, 619–642.
[74] Hull, J. & White, A. (1990). Pricing interest rate
[93] Johnson, H. (1983). An analytic approximation of
derivative securities, Review of Financial Studies 3,
the American put price, Journal of Financial and
573–592. Quantitative Analysis 18, 141–148.
[75] Jacka, S. (1991). Optimal stopping and the American [94] Jones, E., Mason, S. & Rosenfeld, E. (1984). Con-
put, Mathematical Finance 1, 1–14. tingent claims analysis of corporate capital structures:
[76] Jarrow, R. (1987). The pricing of commodity options an empirical investigation, Journal of Finance 39,
with stochastic interest rates, Advances in Futures and 611–627.
Options Research 2, 15–28. [95] Kennedy, D. (1994). The term structure of interest rates
[77] Jarrow, R. (1992). Market manipulation, bubbles, cor- as a Gaussian random field, Mathematical Finance 4,
ners and short squeezes, Journal of Financial and 247–258.
Quantitative Analysis 27(3), 311–336. [96] Kim, J. (1990). The analytic valuation of American
[78] Jarrow, R. (1994). Derivative security markets, market options, Review of Financial Studies 3, 547–572.
manipulation and option pricing, Journal of Financial [97] Lando, D. (1998). On Cox processes and credit
and Quantitative Analysis 29(2), 241–261. risky securities, Review of Derivatives Research 2,
[79] Jarrow, R. (1999). In honor of the Nobel Laureates 99–120.
Robert C. Merton and Myron S. Scholes: a partial [98] Lando, D. (2004). Credit Risk Modeling: Theory and
differential equation that changed the world, Journal Applications, Princeton University Press, Princeton.
of Economic Perspectives 13(4), 229–248. [99] Leland, H. (1985). Option pricing and replication with
[80] Jarrow, R., Lando, D. & Turnbull, S. (1997). A Markov transaction costs, Journal of Finance 15,
model for the term structure of credit risk spreads, 1283–1391.
Review of Financial Studies 10(1), 481–523. [100] Leland, H. (1994). Corporate debt value, bond coven-
[81] Jarrow, R., Lando, D. & Yu, F. (2005). Default risk ants and optimal capital structure, Journal of Finance
and diversification: theory and empirical applications, 49, 1213–1252.
Mathematical Finance 15(1), 1–26. [101] Longstaff, F. & Schwartz, E. (2001). Valuing American
[82] Jarrow, R. & Madan, D. (1995). Option pricing using options by simulation: a simple least squares approach,
the term structure of interest rates to hedge systematic Review of Financial Studies 14, 113–147.
10 Option Pricing Theory: Historical Perspectives

[102] Madan, D. & Milne, F. (1991). Option pricing with [115] Roll, R. (1977). An analytic valuation formula for
variance gamma martingale components, Mathematical unprotected American call options on stocks with
Finance 1, 39–55. known dividends, Journal of Financial Economics 5,
[103] Madan, D. & Unal, H. (1998). Pricing the risks of 251–258.
default, Review of Derivatives Research 2, 121–160. [116] Ross, S. (1976). Options and efficiency, Quarterly
[104] Merton, R.C. (1973). The theory of rational option Journal of Economics 90, 75–89.
pricing, Bell Journal of Economics and Management [117] Samuelson, P. (1965). Rational theory of warrant
Science 4, 141–183. pricing, Industrial Management Review 6, 13–39.
[105] Merton, R.C. (1974). On the pricing of corporate debt: [118] Samuelson, P. & Merton, R.C. (1969). A complete
the risk structure of interest rates, Journal of Finance model of warrant pricing that maximizes utility, Indus-
29, 449–470. trial Management Review 10(2), 17–46.
[106] Merton, R.C. (1976). Option pricing when underlying [119] Sandmann, K., Sondermann, D. & Miltersen, K.
stock returns are discontinuous, Journal of Financial (1995). Closed form term structure derivatives in a
Economics 3, 125–144. heath Jarrow Morton model with log-normal annu-
[107] Merton, R.C. (1990). Continuous Time Finance, Basil ally compunded interest rates, Proceedings of the
Blackwell, Cambridge, Massachusetts. Seventh Annual European Research Symposium,
[108] Miltersen, K., Nielsen, J. & Sandmann, K. (2006). Bonn, September 1994, Chicago Board of Trade,
New no-arbitrage conditions and the term structure of pp. 145–164.
interest rate futures, Annals of Finance 2, 303–325. [120] Schonbucher, P. (2004). Information Driven Default
[109] Miltersen, K., Sandmann, K. & Sondermann, D. (1997). Contagion, working paper, ETH Zurich.
Closed form solutions for term structure derivatives [121] Schweizer, M. & Wissel, J. (2008). Term structure of
with log-normal interest rates, Journal of Finance 52, implied volatilities: absence of arbitrage and existence
409–430. results, Mathematical Finance 18(1), 77–114.
[110] Modigliani, F. & Miller, M. (1958). The cost of capital, [122] Sharpe, W. (1981). Investments, Prentice Hall, Engle-
corporation finance, and the theory of investment, wood Cliffs.
American Economic Review 48, 261–297. [123] Turnbull, S. & Wakeman, L. (1991). A quick algo-
[111] Nakajima, K. & Maeda, A. (2007). Pricing commodity rithm for pricing European average options, Journal of
spread options with stochastic term structure of conve- Financial and Quantitative Analysis 26, 377–389.
nience yields and interest rates, Asia Pacific Financial [124] Vasicek, O. (1977). An equilibrium characterization of
Markets 14, 157–184. the term structure, Journal of Financial Economics 5,
[112] Rebonato, R. (2002). Modern Pricing of Interest Rate 177–1888.
Derivatives: The LIBOR Market Model land Beyond, [125] Zhou, C. (2001). The term structure of credit spreads
Princeton University Press. with jump risk, Journal of Banking and Finance 25,
[113] Rendleman, R. & Bartter, B. (1979). Two state option 2015–2040.
pricing, Journal of Finance 34, 1093–1110.
[114] Rogers, L. (1994). The potential approach to the term ROBERT A. JARROW
structure of interest rates and foreign exchange rates,
Mathematical Finance 7, 157–176.
Modern Portfolio Theory

Modern portfolio theory (MPT) is generally defined as the body of financial economics beginning with Markowitz’ famous 1952 paper, “Portfolio Selection”, and extending through the next several decades of research into what has variously been called Financial Decision Making under Uncertainty, The Theory of Investments, The Theory of Financial Economics, Theory of Asset Selection and Capital–Market Equilibrium, and The Revolutionary Idea of Finance [45, 53, 58, 82, 88, 98]. Usually this definition includes the Capital Asset Pricing Model (CAPM) and its various extensions. Markowitz once remarked to Marschak that the first “CAPM” should be attributed to Marschak because of his pioneering work in the field [56]; Marschak politely declined the honor.

The original CAPM, as we understand it today, was first developed by Treynor [91, 92], and subsequently independently derived in the works of Sharpe [84], Lintner [47], and Mossin [65]. With the exception of some commercially successful multifactor models that implement the approaches pioneered in [71, 72, 74, 75], most practitioners have little use for market models other than the CAPM, although (or, perhaps rather, because of the simplicity it derives from the fact that) its conclusions are based on extremely restrictive and unrealistic assumptions. Academics have spent much time and effort attempting to substantiate or refute the validity of the CAPM as a positive economic model. The best examples of such attempts are [13, 28]. Roll [70] effectively ended this debate, however, by demonstrating that, since the “market portfolio” is not measurable, the CAPM can never be empirically proven or disproven.

History of Modern Portfolio Theory

The history of MPT extends back farther than the history of CAPM, to Tobin [90], Markowitz [53], and Roy [78], all of whom consider the “price of risk”. For more detailed treatments of MPT and pre-MPT financial economic thought, refer to [22, 69, 82]. The prehistory of MPT can be traced further yet, to Hicks [34], who includes the “price of risk” in his discussion of commodity futures, and to Williams [95], who considers stock prices to be determined by the present value of discounted future dividends. MPT prehistory can be traced even further back, to Bachelier [3], who was the first to describe arithmetic Brownian motion with the objective of determining the value of financial derivatives, and all the way to Bernoulli [7], who originated the concept of risk aversion while working to solve the St. Petersburg Paradox. Bernoulli, in his derivation of logarithmic utility, suggested that people maximize “moral expectation”, which today we call expected utility; further, Bernoulli, like Markowitz [53] and Roy [78], advised risk-averse investors to diversify: “. . . it is advisable to divide goods which are exposed to some small danger into several portions rather than to risk them all together.”

Notwithstanding this ancient history, MPT is inextricably connected to CAPM, which for the first time placed the investor’s problem in the context of an economic equilibrium. This modern approach finds its origin in the work of Mossin [65], Lintner [47, 48], and Sharpe [84], and even earlier in Treynor [91, 92]. Accounts of these origins can be found in [8, 29, 85]. Treynor [92] built on the single-period discrete-time foundation of Markowitz [53, 54] and Tobin [90]. Similar CAPMs of this type were later published in [47, 48, 84]. Mossin [65] clarified Sharpe [84] by providing a more precise specification of the equilibrium conditions. Fama [26] reconciled the Sharpe and Lintner models; Lintner [49] incorporated heterogeneous beliefs; and Mayers [57] allowed for concentrated portfolios through trading restrictions on risky assets, transactions costs, and information asymmetries. Black [10] utilized the two-fund separation theorem to construct the zero-beta CAPM, by using a portfolio that is orthogonal to the market portfolio in place of a risk-free asset. Rubinstein [79] extended the model to higher moments and also (independently of Black) derived the CAPM without a riskless asset.

Discrete-time multiperiod models were the next step; these models generally extend the discrete-time single-period model into an intertemporal setting in which investors maximize the expected utility of lifetime consumption and bequests. Building upon the multiperiod lifetime consumption literature of Phelps [68], Mirrlees [63], Yaari [97], Levhari and Srinivasan [44], and Hahn [30], models of this type include those of Merton [59, 60], Samuelson [83], Hakansson [31, 32], Fama [27], Beja [4], Rubinstein [80, 81], Long [50, 51], Kraus and Litzenberger
[41], and culminate in the consumption CAPMs (CCAPMs) of Lucas [52] and Breeden [15].

The multiperiod approach was taken to its continuous-time limit in the intertemporal CAPM (“ICAPM”) of Merton [61]. In addition to the standard assumptions (limited liability of assets, no market frictions, individual trading does not affect prices, the market is in equilibrium, a perfect borrowing and lending market exists, and no nonnegativity constraints, the last relaxing the no short-sale rule employed by Tobin and Sharpe but not by Treynor and Lintner), this model assumes that trading takes place continuously through time, as opposed to at discrete points in time. Rather than assuming normally distributed security returns, the ICAPM assumes a lognormal distribution of prices and a geometric Brownian motion of security returns. Also, the constant rate of interest provided by the risk-free asset in the CAPM is replaced by a dynamically changing rate, which is certain in the next instant but uncertain in the future. Williams [96] extended this model by relaxing the homogeneous expectations assumption, and Duffie and Huang [23] confirmed that such a relaxation is consistent with the ICAPM. The continuous-time model was shown to be consistent with a single-beta CCAPM by Breeden [15]. Hellwig [33] and Duffie and Huang [24] construct continuous-time models that allow for informational asymmetries. The continuous-time model was further extended to include macroeconomic factors in [20]. Kyle [42] constructs an ICAPM to model insider trading.

These, and other CAPMs, including the international models of Black [12], Solnik [86], and Stulz [89], as well as the CAPMs of Ross [73, 76] and Stapleton and Subrahmanyam [87], are reviewed in [16, 17, 19, 62, 77]. Bergstrom [5] provides a survey of continuous-time models.

Extensions of the CAPM have also been developed for use, in particular, in industrial applications; for example, Cummins [21] reviews the models of Cooper [18], Biger and Kahane [9], Fairley [25], Kahane [39], Hill [35], Ang and Lai [2], and Turner [94], which are specific to the insurance industry. More recent work continues to extend the theory. Nielsen [66, 67], Allingham [1], and Berk [6] examine conditions for equilibrium in the CAPM. Current research, such as the collateral-adjusted CCAPM of Hindy and Huang [36] and the parsimonious conditional discrete-time CAPM and simplified infinite-date model of LeRoy [43], continues to build upon the model originated in [91]. Each is perhaps more realistic, if less elegant, than the original. And yet it is the single-period, discrete-time CAPM that has become popular and endured, as all great models do, precisely because it is simple and unrealistic. It is realistic enough, apparently, to be coincident with the utility functions of a great many agents.

A Perspective on CAPM

One of the puzzles that confronts the historian of CAPM is the changing attitude over time and across different scholarly communities toward the seminal work of Treynor [91, 92]. Contemporaries consistently cited the latter paper [11, 13, 37, 38], including also [84, 85]. However, in other papers, such as [16, 45, 55], these citations were not made. Histories and bibliographies continue to take note of Treynor’s contribution [8, 14, 58, 82], but not textbooks or the scholarly literature that builds on CAPM. Why not?

One reason is certainly that Treynor’s manuscript [92] was not actually published in a book until much later [40], although the paper did circulate widely in mimeograph form. Another is that Treynor never held a permanent academic post, and so did not have a community of students and academic colleagues to draw attention to his work. A third is that, although Treynor continued to write on financial topics (writings collected in [93]), these writings were consistently addressed to practitioners, not to an academic audience.

Even more than these, perhaps the most important reason (paradoxically) is the enormous attention that was paid in subsequent years to refinement of MPT. Unlike Markowitz and Sharpe, Treynor came to CAPM from a concern about the firm’s capital budgeting problem, not the investor’s portfolio allocation problem. (This concern is clear in the 1961 draft, which builds explicitly on [64].) This was the same concern, of course, that motivated Lintner, and it is significant therefore that the CAPMs of Lintner and Sharpe were originally seen as different theories, rather than different formulations of the same theory.

Because the portfolio choice problem became such a dominant strand of academic research, it
was perhaps inevitable that retrospective accounts of CAPM would emphasize the line of development that passes from the individual investor’s problem to the general equilibrium problem, which is to say the line that passes through Tobin and Markowitz to Sharpe. Lintner and Mossin come in for some attention, as academics who not only contributed their own version of CAPM but also produced a series of additional contributions to the academic literature. However, Treynor was not only interested in a different problem but also was, and remained, a practitioner.

Conclusion

In 1990, the world beyond financial economists was made aware of the importance of MPT, when Markowitz and Sharpe, along with Miller, were awarded the Nobel Prize in Economics for their roles in the development of MPT. In the presentation speech, Assar Lindbeck of the Royal Swedish Academy of Sciences said, “Before the 1950s, there was hardly any theory whatsoever of financial markets. A first pioneering contribution in the field was made by Harry Markowitz, who developed a theory . . . [which] shows how the multidimensional problem of investing under conditions of uncertainty in a large number of assets . . . may be reduced to the issue of a trade-off between only two dimensions, namely the expected return and the variance of the return of the portfolio . . . . The next step in the analysis is to explain how these asset prices are determined. This was achieved by development of the so-called Capital Asset Pricing Model, or CAPM. It is for this contribution that William Sharpe has been awarded. The CAPM shows that the optimum risk portfolio of a financial investor depends only on the portfolio manager’s prediction about the prospects of different assets, not on his own risk preferences . . . . The Capital Asset Pricing Model has become the backbone of modern price theory of financial markets” [46].

References

[1] Allingham, M. (1991). Existence theorems in the capital asset pricing model, Econometrica 59(4), 1169–1174.
[2] Ang, J.S. & Lai, T.-Y. (1987). Insurance premium pricing and ratemaking in competitive insurance and capital asset markets, The Journal of Risk and Insurance 54, 767–779.
[3] Bachelier, L. (1900). Théorie de la spéculation, Annales Scientifiques de l’École Normale Supérieure 17, 3e série, 21–86; Translated by Boness, A.J. and reprinted in Cootner, P.H. (ed.) (1964). The Random Character of Stock Market Prices, MIT Press, Cambridge. (Revised edition, first MIT Press Paperback Edition, July 1967). pp. 17–78; Also reprinted as Bachelier, L. (1995). Théorie de la Spéculation & Théorie Mathématique du jeu, (2 titres en 1 vol.) Les Grands Classiques Gauthier-Villars, Éditions Jacques Gabay, Paris, Part 1, pp. 21–86.
[4] Beja, A. (1971). The structure of the cost of capital under uncertainty, The Review of Economic Studies 38, 359–369.
[5] Bergstrom, A.R. (1988). The history of continuous-time econometric models, Econometric Theory 4(3), 365–383.
[6] Berk, J.B. (1992). The Necessary and Sufficient Conditions that Imply the CAPM, working paper, Faculty of Commerce, University of British Columbia, Canada; Subsequently published as (1997). Necessary conditions for the CAPM, Journal of Economic Theory 73, 245–257.
[7] Bernoulli, D. (1738). Exposition of a new theory on the measurement of risk, Papers of the Imperial Academy of Science, Petersburg, Vol. II, pp. 175–192; Translated and reprinted in Sommer, L. (1954). Econometrica 22(1), 23–36.
[8] Bernstein, P.L. (1992). Capital Ideas: The Improbable Origins of Modern Wall Street, The Free Press, New York.
[9] Biger, N. & Kahane, Y. (1978). Risk considerations in insurance ratemaking, The Journal of Risk and Insurance 45, 121–132.
[10] Black, F. (1972). Capital market equilibrium with restricted borrowing, Journal of Business 45(3), 444–455.
[11] Black, F. (1972). Equilibrium in the creation of investment goods under uncertainty, in Studies in the Theory of Capital Markets, M.C. Jensen, ed., Praeger, New York, pp. 249–265.
[12] Black, F. (1974). International capital market equilibrium with investment barriers, Journal of Financial Economics 1(4), 337–352.
[13] Black, F., Jensen, M.C. & Scholes, M. (1972). The capital asset pricing model: some empirical tests, in Studies in the Theory of Capital Markets, M.C. Jensen, ed., Praeger, New York, pp. 79–121.
[14] Brealey, R.A. & Edwards, H. (1991). A Bibliography of Finance, MIT Press, Cambridge.
[15] Breeden, D.T. (1979). An intertemporal asset pricing model with stochastic consumption and investment opportunities, Journal of Financial Economics 7(3), 265–296.
[16] Breeden, D.T. (1987). Intertemporal portfolio theory and asset pricing, in The New Palgrave Finance, J. Eatwell, M. Milgate & P. Newman, eds, W.W. Norton, New York, pp. 180–193.
[17] Brennan, M.J. (1987). Capital asset pricing model, in The New Palgrave Finance, J. Eatwell, M. Milgate & P. Newman, eds, W.W. Norton, New York, pp. 91–102.
[18] Cooper, R.W. (1974). Investment Return and Property-Liability Insurance Ratemaking, Huebner Foundation, University of Pennsylvania, Philadelphia.
[19] Copeland, T.E. & Weston, J.F. (1987). Asset pricing, in The New Palgrave Finance, J. Eatwell, M. Milgate & P. Newman, eds, W.W. Norton, New York, pp. 81–85.
[20] Cox, J.C., Ingersoll Jr, J.E. & Ross, S.A. (1985). An intertemporal general equilibrium model of asset prices, Econometrica 53(2), 363–384.
[21] Cummins, J.D. (1990). Asset pricing models and insurance ratemaking, ASTIN Bulletin 20(2), 125–166.
[22] Dimson, E. & Mussavian, M. (2000). Three Centuries of Asset Pricing, Social Science Research Network Electronic Library, paper 000105402.pdf, January.
[23] Duffie, D. & Huang, C.F. (1985). Implementing Arrow-Debreu equilibria by continuous trading of few long-lived securities, Econometrica 53, 1337–1356; Also reprinted in Schaefer, S. (ed.) (2000). Continuous-Time Finance, Edward Elgar, London.
[24] Duffie, D. & Huang, C.F. (1986). Multiperiod security markets with differential information: martingales and resolution times, Journal of Mathematical Economics 15, 283–303.
[25] Fairley, W. (1979). Investment income and profit margins in property-liability insurance: theory and empirical tests, Bell Journal of Economics 10, 192–210.
[26] Fama, E.F. (1968). Risk, return, and equilibrium: some clarifying comments, Journal of Finance 23(1), 29–40.
[27] Fama, E.F. (1970). Multiperiod consumption-investment decisions, The American Economic Review 60, 163–174.
[28] Fama, E.F. & MacBeth, J. (1973). Risk, return and equilibrium: empirical tests, The Journal of Political Economy 81(3), 607–636.
[29] French, C.W. (2003). The Treynor capital asset pricing model, Journal of Investment Management 1(2), Second quarter, 60–72.
[30] Hahn, F.H. (1970). Savings and uncertainty, The Review of Economic Studies 37(1), 21–24.
[31] Hakansson, N.H. (1969). Optimal investment and consumption strategies under risk, an uncertain lifetime, and insurance, International Economic Review 10(3), 443–466.
[32] Hakansson, N.H. (1970). Optimal investment and consumption strategies under risk for a class of utility functions, Econometrica 38(5), 587–607.
[33] Hellwig, M.F. (1982). Rational expectations equilibrium with conditioning on past prices: a mean-variance example, Journal of Economic Theory 26, 279–312.
[34] Hicks, J.R. (1939). Value and Capital: An Inquiry into some Fundamental Principles of Economic Theory, Clarendon Press, Oxford.
[35] Hill, R. (1979). Profit regulation in property-liability insurance, Bell Journal of Economics 10, 172–191.
[36] Hindy, A. & Huang, M. (1995). Asset Pricing With Linear Collateral Constraints, unpublished manuscript, Graduate School of Business, Stanford University, March.
[37] Jensen, M.C. (ed.) (1972). Studies in the Theory of Capital Markets, Praeger, New York.
[38] Jensen, M.C. (1972). The foundations and current state of capital market theory, in Studies in the Theory of Capital Markets, M.C. Jensen, ed., Praeger, New York, pp. 3–43.
[39] Kahane, Y. (1979). The theory of insurance risk premiums: a re-examination in the light of recent developments in capital market theory, ASTIN Bulletin 10(2), 223–239.
[40] Korajczyk, R.A. (1999). Asset Pricing and Portfolio Performance: Models, Strategy and Performance Metrics, Risk Books, London.
[41] Kraus, A. & Litzenberger, R.H. (1975). Market equilibrium in a multiperiod state-preference model with logarithmic utility, Journal of Finance 30(5), 1213–1227.
[42] Kyle, A.S. (1985). Continuous auctions and insider trading, Econometrica 53(3), 1315–1335.
[43] LeRoy, S.F. (2002). Theoretical Foundations for Conditional CAPM, unpublished manuscript, University of California, Santa Barbara, May.
[44] Levhari, D. & Srinivasan, T.N. (1969). Optimal savings under uncertainty, The Review of Economic Studies 36(106), 153–163.
[45] Levy, H. & Sarnat, M. (eds) (1977). Financial Decision Making under Uncertainty, Academic Press, New York.
[46] Lindbeck, A. (1990). The Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel 1990 presentation speech, Nobel Lectures, Economics 1981–1990, K.-G. Mäler, ed., World Scientific Publishing Co., Singapore, 1992.
[47] Lintner, J. (1965). The valuation of risk assets and the selection of risky investments in stock portfolios and capital budgets, The Review of Economics and Statistics 47, 13–37.
[48] Lintner, J. (1965). Securities prices, risk, and maximal gains from diversification, Journal of Finance 20(4), 587–615.
[49] Lintner, J. (1969). The aggregation of investor’s diverse judgment and preferences in purely competitive securities markets, Journal of Financial and Quantitative Analysis 4, 347–400.
[50] Long Jr, J.B. (1972). Consumption-investment decisions and equilibrium in the securities markets, in Studies in the Theory of Capital Markets, M.C. Jensen, ed., Praeger, New York, pp. 146–222.
[51] Long Jr, J.B. (1974). Stock prices, inflation and the term structure of interest rates, Journal of Financial Economics 2, 131–170.
[52] Lucas Jr, R.E. (1978). Asset prices in an exchange economy, Econometrica 46(6), 1429–1445.
[53] Markowitz, H.M. (1952). Portfolio selection, Journal of Finance 7(1), 77–91.
[54] Markowitz, H.M. (1959). Portfolio Selection: Efficient Journal of Financial and Quantitative Analysis 8(3),
Diversification of Investments, Cowles Foundation for 317–333.
Research in Economics at Yale University, Monograph [73] Ross, S.A. (1975). Uncertainty and the heterogeneous
#6. John Wiley & Sons, Inc., New York. (2nd Edition, capital good model, The Review of Economic Studies
1991, Basil Blackwell, Inc., Cambridge). 42(1), 133–146.
[55] Markowitz, H.M. (2000). Mean-Variance Analysis in [74] Ross, S.A. (1976). The arbitrage theory of capital asset
Portfolio Choice and Capital Markets, Frank J. Fabozzi pricing, Journal of Economic Theory 13(3), 341–360.
Associates, New Hope. [75] Ross, S.A. (1976). Risk, return and arbitrage, in Risk and
[56] Marschak, J. (1938). Money and the theory of assets, Return in Finance, I. Friend & J. Bicksler, eds, Ballinger,
Econometrica 6, 311–325. Cambridge, pp. 1–34.
[57] Mayers, D. (1972). Nonmarketable assets and capital [76] Ross, S.A. (1978). Mutual fund separation in financial
market equilibrium under uncertainty, in Studies in the theory—the separating distributions, Journal of Eco-
Theory of Capital Markets, M.C. Jensen, ed., Praeger, nomic Theory 17(2), 254–286.
New York, pp. 223–248. [77] Ross, S.A. (1987). Finance, in The New Palgrave
[58] Mehrling, P. (2005). Fischer Black and the Revolution- Finance, J. Eatwell, M. Milgate & P. Newman, eds,
ary Idea of Finance, Wiley, Hoboken. W.W. Norton, New York, pp. 1–34.
[59] Merton, R.C. (1969). Lifetime portfolio selection under [78] Roy, A.D. (1952). Safety first and the holding of assets,
uncertainty: the continuous time case, The Review of Econometrica 20(3), 431–439.
Economics and Statistics 51, 247–257; Reprinted as [79] Rubinstein, M. (1973). The fundamental theorem of
chapter 4 of Merton, R.C. (1990). Continuous-Time parameter-preference security valuation, Journal of Fin-
Finance, Blackwell, Cambridge, pp. 97–119. ancial and Quantitative Analysis 8, 61–69.
[60] Merton, R.C. (1971). Optimum consumption and port- [80] Rubinstein, M. (1974). A Discrete-Time Synthesis of
folio rules in a continuous time model, Journal of Eco- Financial Theory, Working Paper 20, Haas School
nomic Theory 3, 373–413; Reprinted as chapter 5 of of Business, University of California at Berkeley;
Reprinted in Research in Finance, JAI Press, Greenwich,
Merton, R.C. (1990). Continuous-Time Finance, Black-
Vol. 3, pp. 53–102.
well, Cambridge pp. 120–165.
[81] Rubinstein, M. (1976). The valuation of uncertain
[61] Merton, R.C. (1973). An intertemporal capital asset
income streams and the pricing of options, Bell Journal
pricing model, Econometrica 41, 867–887; Reprinted
of Economics 7, Autumn, 407–425.
as chapter 15 of Merton, R.C. (1990). Continuous-Time
[82] Rubinstein, M. (2006). A History of the Theory of Invest-
Finance, Blackwell, Cambridge, pp. 475–523.
ments My Annotated Bibliography, Wiley, Hoboken.
[62] Merton, R.C. (1990). Continuous-Time Finance, Black-
[83] Samuelson, P.A. (1969). Lifetime portfolio selection
well, Cambridge. (revised paperback edition, 1999
by dynamic stochastic programming, The Review of
reprint).
Economics and Statistics 51(3), 239–246.
[63] Mirrlees, J.A. (1965). Optimum Accumulation Under [84] Sharpe, W.F. (1964). Capital asset prices: a theory of
Uncertainty. unpublished manuscript. December. market equilibrium under conditions of risk, Journal of
[64] Modigliani, F. & Miller, M.H. (1958). The cost of cap- Finance 19(3), 425–442.
ital, corporation finance, and the theory of investment, [85] Sharpe, W.F. (1990). Autobiography, in Les Prix Nobel
The American Economic Review 48, 261–297. 1990, Tore Frängsmyr, ed., Nobel Foundation, Stock-
[65] Mossin, J. (1966). Equilibrium in a capital asset market, holm.
Econometrica 34(4), 768–783. [86] Solnik, B. (1974). An equilibrium model of interna-
[66] Nielsen, L.T. (1990). Equilibrium in CAPM without tional capital markets, Journal of Economic Theory 8(4),
a riskless asset, The Review of Economic Studies 57, 500–524.
315–324. [87] Stapleton, R.C. & Subrahmanyam, M. (1978). A mul-
[67] Nielsen, L.T. (1990). Existence of equilibrium in CAPM, tiperiod equilibrium asset pricing model, Econometrica
Journal of Economic Theory 52, 223–231. 46(5), 1077–1095.
[68] Phelps, E.S. (1962). The accumulation of risky capi- [88] Stone, B.K. (1970). Risk, Return, and Equilibrium, a
tal: a sequential utility analysis, Econometrica 30(4), General Single-Period Theory of Asset Selection and
729–743. Capital-Market Equilibrium, MIT Press, Cambridge.
[69] Poitras, G. (2000). The Early History of Financial [89] Stulz, R.M. (1981). A model of international asset
Economics, Edward Elgar, Cheltenham. pricing, Journal of Financial Economics 9(4), 383–406.
[70] Roll, R. (1977). A critique of the asset pricing theory’s [90] Tobin, J. (1958). Liquidity preference as behavior
tests, Journal of Financial Economics 4(2), 129–176. towards risk, The Review of Economic Studies (67),
[71] Rosenberg, B. (1974). Extra-market component of 65–86. Reprinted as Cowles Foundation Paper 118.
covariance in security returns, Journal of Financial and [91] Treynor, J.L. (1961). Market Value, Time and Risk .
Quantitative Analysis 9(2), 263–273. unpublished manuscript dated 8/8/61.
[72] Rosenberg, B. & McKibben, W. (1973). The prediction [92] Treynor, J.L. (1962). Toward a Theory of Market Value
of systematic and specific risk in security returns, of Risky Assets, unpublished manuscript. “Rough Draft”
dated by Mr. Treynor to the fall of 1962. A final version of the American Economic Association, Boston, December;
was published in 1999, in Asset Pricing and Portfolio Subsequently extended and published as (1965). Investment
Performance, R.A. Korajczyk, ed., Risk Books, London, decision under uncertainty: choice-theoretic approaches, The
pp. 15–22. Quarterly Journal of Economics 79(5), 509–536; Also, see
[93] Treynor, J.L. (2007). Treynor on Institutional Investing, (1966). Investment decision under uncertainty: applications
Wiley, Hoboken. of the state-preference approach, The Quarterly Journal of
[94] Turner, A.L. (1987). Insurance in an equilibrium asset Economics 80(2), 252–277.
pricing model, in Fair Rate of Return in Property- Itô, K. (1944). Stochastic integrals, Proceedings of the Imperial
Liability Insurance, J.D. Cummins & S.E. Harrington, Academy Tokyo 22, 519–524.
eds, Kluwer Academic Publishers, Norwell. Itô, K. (1951). Stochastic differentials, Applied Mathematics
[95] Williams, J.B. (1938). The Theory of Investment Value, and Optimization 1, 374–381.
Harvard University Press, Cambridge. Itô, K. (1998). My sixty years in studies of probability
[96] Williams, J.T. (1977). Capital asset prices with het- theory, acceptance speech of the Kyoto prize in basic
erogeneous beliefs, Journal of Financial Economics 5, sciences, in The Inamori Foundation Yearbook 1998, Inamori
219–239. Foundation, Kyoto.
[97] Yaari, M.E. (1965). Uncertain lifetime, life insurance, Jensen, M.C. (1968). The performance of mutual funds in the
and the theory of the consumer, The Review of Economic period 1945-64, Journal of Finance 23(2), 389–416.
Studies 32(2), 137–150. Jensen, M.C. (1969). Risk, the pricing of capital assets, and
[98] The Royal Swedish Academy of Sciences (1990). The the evaluation of investment portfolios, Journal of Business
Sveriges Riksbank Prize in Economic Sciences in Mem- 42(2), 167–247.
ory of Alfred Nobel 1990 , Press release 16 October 1990. Keynes, J.M. (1936). The General Theory of Employment,
Interest, and Money, Harcourt Brace, New York.
Further Reading Leontief, W. (1947). Postulates: Keynes’ general theory and
the classicists, in The New Economics: Keynes’ Influence on
Theory and Public Policy, S.E. Harris, ed., Knopf, New York,
Arrow, K.J. (1953). Le Rôle des Valeurs Boursières pour la
Chapter 19, pp. 232–242.
Répartition la Meilleure des Risques, Économetrie, Collo-
Lintner, J. (1965). Securities Prices and Risk; the Theory and
ques Internationaux du Centre National de la Recherche
a Comparative Analysis of AT&T and Leading Industrials,
Scientifique 11, 41–47.
Paper Presented at the Bell System Conference on the Eco-
Black, F. & Scholes, M. (1973). The pricing of options and
nomics of Regulated Public Utilities, University of Chicago
corporate liabilities, The Journal of Political Economy 81(3),
637–654. Business School, Chicago, June.
Cootner, P.H. (ed.) (1964). The Random Character of Stock Lintner, J. (1970). The market price of risk, size of market
Market Prices, MIT Press, Cambridge. (Revised edition, and investor’s risk aversion, The Review of Economics and
First MIT Press Paperback Edition, July 1967). Statistics 52, 87–99.
Courtault, J.M., Kabanov, Y., Bru, B., Crépel, P., Lebon, I. & Lintner, J. (1971). The effects of short selling and margin
Le Marchand, A. (2000). Louis Bachelier on the centenary requirements in perfect capital markets, Journal of Financial
of théorie de la spéculation, Mathematical Finance 10(3), and Quantitative Analysis 6, 1173–1196.
341–353. Lintner, J. (1972). Finance and Capital Markets, National
Cvitanić, J., Lazrak, A., Martinelli, L. & Zapatero, F. (2002). Bureau of Economic Research, New York.
Revisiting Treynor and Black (1973): an Intertemporal Model Mandelbrot, B.B. (1987). Louis Bachelier, in The New Pal-
of Active Portfolio Management , unpublished manuscript. grave Finance, J. Eatwell, M. Milgate & P. Newman, eds,
The University of Southern California and the University W.W. Norton, New York, pp. 86–88.
of British Columbia. Markowitz, H.M. (1952). The utility of wealth, The Journal of
Duffie, D. (1996). Dynamic Asset Pricing Theory, 2nd Edition, Political Economy 60(2), 151–158.
Princeton University Press, Princeton. Markowitz, H.M. (1956). The optimization of a quadratic func-
Eatwell, J., Milgate, M. & Newman, P. (eds) (1987). The New tion subject to linear constraints, Naval Research Logistics
Palgrave Finance, W.W. Norton, New York. Quarterly 3, 111–133.
Friedman, M. & Jimmie Savage, L. (1948). The utility analysis Markowitz, H.M. (1957). The elimination form of the inverse
of choices involving risk, The Journal of Political Economy and its application to linear programming, Management
56(4), 279–304. Science 3, 255–269.
Friend, I. & Bicksler, J.L. (1976). Risk and Return in Finance, Marschak, J. (1950). Rational behavior, uncertain prospects,
Ballinger, Cambridge. and measurable utility, Econometrica 18(2), 111–141.
Hakansson, N.H. (1987). Portfolio analysis, in The New Pal- Marschak, J. (1951). Why “Should” statisticians and busi-
grave Finance, J. Eatwell, M. Milgate & P. Newman, eds, nessmen maximize “moral expectation”?, Proceedings of
W.W. Norton, New York, pp. 227–236. the Second Berkeley Symposium on Mathematical Statistics
Hirshleifer, J. (1963). Investment Decision Under Uncertainty, and Probability, University of California Press, Berkeley,
Papers and Proceedings of the Seventy-Sixth Annual Meeting pp. 493–506. Reprinted as Cowles Foundation Paper 53.
Marshall, A. (1890, 1891). Principles of Economics, 2nd Sharpe, W.F. (1961a). Portfolio Analysis Based on a Simpli-
Edition, Macmillan and Co., London and New York. fied Model of the Relationships Among Securities, unpub-
Merton, R.C. (1970). A Dynamic General Equilibrium Model lished doctoral dissertation. University of California at Los
of the Asset Market and Its Application to the Pricing of Angeles, Los Angeles.
the Capital Structure of the Firm, Working Paper 497-70, Sharpe, W.F. (1961b). A Computer Program for Portfolio Anal-
Sloan School of Management, MIT, Cambridge; Reprinted ysis Based on a Simplified Model of the Relationships Among
as chapter 11 of Merton, R.C. (1990). Continuous-Time Securities, unpublished mimeo. University of Washington,
Finance, Blackwell, Cambridge, pp. 357–387. Seattle.
Merton, R.C. (1972). An analytic derivation of the efficient Sharpe, W.F. (1963). A simplified model for portfolio analysis,
portfolio frontier, Journal of Financial and Quantitative Management Science 9(2), 277–293.
Analysis 7, 1851–1872. Sharpe, W.F. (1966). Mutual fund performance, Journal of
Miller, M.H. & Modigliani, F. (1961). Dividend policy, Business 39,(Suppl), 119–138.
growth and the valuation of shares, Journal of Business 34, Sharpe, W.F. (1970). Portfolio Theory and Capital Markets,
235–264. McGraw-Hill, New York.
Modigliani, F. & Miller, M.H. (1963). Corporate income taxes Sharpe, W.F. (1977). The capital asset pricing model: a
and the cost of capital, The American Economic Review 53, ‘multi-Beta’ interpretation, in Financial Decision Making
433–443. Under Uncertainty, H. Levy & M. Sarnat, eds, Har-
Mossin, J. (1968). Optimal multiperiod portfolio policies, court Brace Jovanovich, Academic Press, New York, pp.
Journal of Business 41(2), 215–229. 127–136.
Mossin, J. (1969a). A note on uncertainty and preferences in Sharpe, W.F. & Alexander, G.J. (1978). Investments, 4th
a temporal context, The American Economic Review 59(1), Edition, (1990), Prentice-Hall, Englewood Cliffs.
172–174. Taqqu, M.S. (2001). Bachelier and his times: a conver-
Mossin, J. (1969b). Security pricing and investment criteria in sation with Bernard Bru, Finance and Stochastics 5(1),
competitive markets, The American Economic Review 59(5), 3–32.
749–756. Treynor, J.L. (1963). Implications for the Theory of Finance,
Mossin, J. (1973). Theory of Financial Markets, Prentice-Hall, unpublished manuscript. “Rough Draft” dated by Mr.
Englewood Cliffs. Treynor to the spring of 1963.
Mossin, J. (1977). The Economic Efficiency of Financial Mar- Treynor, J.L. (1965). How to rate management of investment
kets, Lexington, Lanham. funds, Harvard Business Review 43, 63–75.
von Neumann, J.L. & Morgenstern, O. (1953). Theory of Treynor, J.L. & Black, F. (1973). How to use security analysis
Games and Economic Behavior, 3rd Edition, Princeton to improve portfolio selection, Journal of Business 46(1),
University Press, Princeton. 66–88.
Roy, A.D. (1956). Risk and rank or safety first generalised,
Economica 23(91), 214–228.
Rubinstein, M. (1970). Addendum (1970), in Portfolio Related Articles
Selection: Efficient Diversification of Investments, Cowles
Foundation for Research in Economics at Yale University,
Monograph #6, H.M. Markowitz, ed., 1959. John Wiley & Bernoulli, Jacob; Black–Litterman Approach;
Sons, Inc., New York. (2nd Edition, 1991, Basil Blackwell, Risk–Return Analysis; Markowitz, Harry; Mutual
Inc., Cambridge), pp. 308–315. Funds; Sharpe, William F..
Savage, L.J. (1954). The Foundations of Statistics, John Wiley
& Sons, New York. CRAIG W. FRENCH
Long-Term Capital Management

Background

Long-Term Capital Management (LTCM) launched its flagship fund on February 24, 1994,
with $1.125 billion in capital, making it the largest start-up hedge fund to date. Over
$100 million came from the partners themselves, especially those who came from the
proprietary trading operation that John Meriwether had headed at Salomon Brothers. At
Salomon, the profit generated by this group had regularly exceeded the profit generated
by the entire firm, and the idea of LTCM was to continue this record on their own. To
help them, they also recruited a dream team of academic talent, most notably Myron
Scholes and Robert Merton (see Merton, Robert C.), who would win the 1997 Nobel Prize
in Economics for their pioneering work in financial economics. But they were not alone;
half of the founding partners taught finance at major business schools.

The first few years of the fund continued the success of the Salomon years (Table 1).

The fund was closed to new capital in 1995 and quickly grew to $7.5 billion of capital
by the end of 1997. At this time the partners decided, given the lack of additional
opportunities, to pay a dividend of $2.7 billion, which left the capital at the
beginning of 1998 at $4.8 billion.

Table 1 LTCM returns

Year   Net return (%)   Gross return (%)   Dollar profits ($ billion)   Ending capital ($ billion)
1994   20               28                 0.4                          1.6
1995   43               59                 1.3                          3.6
1996   41               57                 2.1                          5.2
1997   17               25                 1.4                          7.5

Investment Style

The fund invested in relative-value convergence trades. It would buy cheap assets and
hedge as many of the systematic risk factors as possible by selling rich assets. The
resulting “spread” trade had significantly less risk than the outright trade, so LTCM
would lever the spread trade to raise the overall risk level, as well as the expected
return on invested capital.

An example of such a trade is an on-the-run versus off-the-run trade. In August 1998,
30-year treasuries (the on-the-run bond) had a yield to maturity of 5.50%. The 29-year
bond (the off-the-run issue) was 12 basis points (bp) cheaper, with a yield to maturity
of 5.62%. The outright risk of 30-year treasury bonds was a standard deviation of around
85 bp per year. The spread trade only had a risk level of around 3.5 bp per year, so the
spread trade could be levered 25 to 30 to 1, bringing it in line with the market risk of
30-year treasuries.

LTCM would never do a trade that mathematically looked attractive according to its
models unless it qualitatively understood why the trade worked and what forces would
bring the “spreads” to convergence. In the case of the on-the-run versus off-the-run
trade, the main force leading to a difference in yields between the two bonds is
liquidity. The 30-year bond is priced higher by 12 bp (approximately 1.2 points on a par
bond) because some investors are willing to pay more to own a more liquid bond. But in
six months’ time, when the treasury issues a new 30-year bond, that new bond will be the
most liquid one and the old 30-year bond will lose its liquidity premium. This means
that in six months’ time, it will trade at a yield similar to that of the old 29-year
bond, thus bringing about a convergence of the spread.

LTCM was involved in many such relative-value trades, in many different and seemingly
unrelated markets and instruments. These included trades in government bond spreads,
swap spreads, yield curve arbitrage, mortgage arbitrage, volatility spreads, risk
arbitrage, and equity relative value trades. In each case, the bet was that some spread
would converge over time.

Risk Management

LTCM knew that a major risk in pursuing relative-value convergence trades was whether it
could hold the trades until they converged. To ensure this, LTCM insisted that investors
lock in equity capital for 3 years, so there would be no premature liquidation from
investor cashout. This equity lock-in also gave counterparties comfort that LTCM had
long-lasting creditworthiness, and that enabled LTCM to acquire preferential financing.

As a further protection, LTCM also made extensive use of term financing. If the
on-the-run/off-the-run trade might take six months to converge, LTCM would finance the
securities for six months, instead of rolling the financing overnight. LTCM also had
two-way mark-to-market provisions in all of its over-the-counter contracts. Thus for its
relative value trades that consisted of both securities and contractual agreements it
had fully symmetric marks, so that the only time LTCM had to put additional equity
capital into a trade was if the spreads widened out. The fund also had term debt and
backstop credit lines in place as alternative funding.

LTCM also stress tested its portfolio relative to potential economic shocks to the
system, and hedged against the consequences. As an example, in 1995, LTCM had a large
swapped position in Italian government bonds. The firm got very worried that if the
Republic of Italy defaulted, it would have a sizable loss. So it purchased insurance
against this potential default by doing a credit default swap on the Italian government
bonds.

But the primary source of risk management was the benefit that the portfolio obtained
from diversification. If the relative value strategies had very low correlations with
each other, then the risk of the overall portfolio would be low. LTCM assumed that in
the long run these correlations were low because of the loose economic ties between the
trades, although in the short run these correlations could be significantly higher. LTCM
also assumed that the downside risk on some of the trades diminished as spreads got very
wide, on the assumption that other leveraged funds would rush in to take advantage. In
retrospect, these assumptions were all falsified by experience.

Before the crisis, LTCM had a historical risk level of $45 million daily standard
deviation of return on the fund. See Figure 1 for historical daily returns.

After the fund reached global scale in 1995, the risk level was remarkably stable. In
fact, the partners had actually predicted a higher risk level for the fund, as they
assumed that the correlations among the relative value trades would be higher than
historical levels. But in 1998, all this changed.
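The two quantitative arguments in the Investment Style and Risk Management sections can be sketched in a few lines of Python. This is only an illustration, not LTCM's actual risk model: the 85 bp and 3.5 bp figures come from the text, while the function names, the 10-trade book, the unit risk per trade, and the single common correlation are assumptions made for the example.

```python
import math


def leverage_to_match(outright_vol_bp: float, spread_vol_bp: float) -> float:
    """Leverage at which a spread trade has the same volatility as the outright position."""
    return outright_vol_bp / spread_vol_bp


def portfolio_vol(n_trades: int, vol_per_trade: float, rho: float) -> float:
    """Volatility of n equally sized trades sharing one pairwise correlation rho."""
    # Sum of n variances plus n*(n-1) covariance terms, each rho * vol^2.
    variance = n_trades * vol_per_trade ** 2 * (1.0 + (n_trades - 1) * rho)
    return math.sqrt(variance)


# Figures quoted in the text: 85 bp outright risk versus 3.5 bp spread risk.
lev = leverage_to_match(85.0, 3.5)

# Hypothetical book (assumed numbers): 10 trades, 1 unit of standalone risk each.
normal_times = portfolio_vol(10, 1.0, 0.0)  # uncorrelated trades
crisis_times = portfolio_vol(10, 1.0, 1.0)  # correlations at one

print(f"leverage to match outright risk: {lev:.1f}x")
print(f"portfolio risk, rho = 0: {normal_times:.2f}")
print(f"portfolio risk, rho = 1: {crisis_times:.2f}")
```

The ratio of the quoted risk levels comes out at roughly 24, close to the lower end of the 25 to 30 range given in the text. In the hypothetical book, risk is the square root of the number of trades when correlations are zero and grows to the full sum of the standalone risks when correlations reach one, which is the diversification benefit that the fund relied on.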

Figure 1 Historical daily returns (millions of dollars), February 24, 1994 to July 22, 1998



The 1998 Crisis

In 1998, LTCM was up slightly in the first four months of the year. Then, in May, the
portfolio lost 6% and in June, it lost 10%. In early July, the portfolio rebounded by
about 7% and the partners reduced the underlying risk of the portfolio accordingly by
about 10%.

The crisis was triggered by the Russian default on its domestic bonds on August 17,
1998. LTCM did not have many Russian positions, so its direct losses were small, but the
default did initiate the process that was to follow as unrelated markets all over the
world reacted. On Friday August 21, LTCM had a one-day loss of $550 million. (A risk arb
deal that was set to close on that day, that of Ciena and Tellabs, broke, causing a $160
million loss. Swap spreads that normally move about 1 bp a day were out 21 bp intraday.)
The Russian debt crisis had triggered a flight out of all relative-value positions. In
the illiquid days at the end of August, these liquidations caused a downward spiral as
new losses led to more liquidations and more losses. The result was that by the end of
August LTCM was down by 53% for the year, with the capital now at $2.3 billion.

While the Russian default triggered the economic crisis in August, it was an LTCM crisis
in September. Would the fund fail? Many other institutions with similar positions
liquidated them in advance of the potential failure. Some market participants bet
against the firm and counterparties marked contractual agreements at extremely wide
levels to obtain additional cushions against bankruptcy. The partners hired Goldman
Sachs to help them raise additional capital and to sell off assets; for this, Goldman
Sachs received 50% of the management company.

The leverage of the firm went to enormous levels involuntarily (Figure 2), not because
of an increase in assets but because of equity falling. In the event, attempts to raise
additional funds failed and on Monday, September 21, the fund lost another $550 million,
putting its capital for the first time below $1 billion. On Wednesday, at the behest of
the Federal Reserve, the 15 major counterparties met at the New York Fed to discuss the
situation.

During the meeting, at 11:00 AM the partners received a telephone call from Warren
Buffett, who was on a satellite phone while vacationing with Bill Gates in Alaska. He
said that LTCM was about to receive a bid on its entire portfolio from him and that he
hoped they would seriously consider it. At 11:30 AM LTCM received the fax message given
in Figure 3.

Figure 2 Leverage, June 1994 to September 1998
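The point that leverage rose because equity fell, not because assets grew, can be illustrated with a small Python sketch. The capital figures are the ones quoted in the text. Holding assets constant and normalising them so that leverage starts at 1.0 is an assumption made purely for illustration, since the text does not state the size of the balance sheet, and 1.0 stands in for "below $1 billion".

```python
def leverage(assets: float, equity: float) -> float:
    """Balance-sheet leverage: assets divided by equity capital."""
    return assets / equity


# Equity capital in $ billion, as quoted in the text.
equity_path = {
    "start of 1998": 4.8,
    "end of August 1998": 2.3,
    "September 21, 1998": 1.0,  # stand-in for "below $1 billion"
}

# Assumption: assets are held constant and scaled so that leverage starts at 1.0.
assets = equity_path["start of 1998"]
relative_leverage = {date: leverage(assets, equity) for date, equity in equity_path.items()}

for date, multiple in relative_leverage.items():
    print(f"{date}: leverage is {multiple:.2f}x its start-of-1998 level")
```

Under that assumption, leverage roughly doubles by the end of August and is almost five times its starting level by September 21, without a single asset being added.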

HIGHLY CONFIDENTIAL

September 23, 1998

Mr. John Meriwether


Chief Executive Officer
Long-Term Capital Management, LP.
One East Weaver Street
Greenwich, CT 06331-5146

Dear Mr. Meriwether:

Subject to the following deal structure, the partnership described below proposes to purchase
the assets of Long-Term Capital Management (and/or its affiliates and subsidiaries, collectively
referred to as "Long-Term Capital") for $250 million.

The purchaser will be a limited partnership whose investors will be Berkshire Hathaway for $3
billion, American International Group for $700 million and Goldman Sachs for $300 million (or
each of their respective affiliates). All management of the assets will be under the sole control
of the partnership and will be transferred to the partnership in an orderly manner.

This bid is also subject to the following:

1) The limited partnership described herein will not assume any liabilities of Long-Term
Capital arising from any activities prior to the purchase by the partnership

2) All current financing provided to Long-Term Capital will remain in place under current
terms and conditions.

The names of the proposal participants may not be disclosed to anyone. If the names are
disclosed, the bid will expire.

This bid will expire at 12:30 p.m. New York time on September 23, 1998.

Sincerely,

Warren E. Buffett Maurice R. Greenberg Jon S. Corzine

Agreed and Accepted on behalf of Long-Term Capital

John Meriwether

Figure 3 Copy of the $250 million offer for Long-Term Capital Management

The partners were unable to accept the proposal as it was crafted. The fund had
approximately 15 000 distinct positions. Each of these positions was a credit
counterparty transaction (i.e., a repo or swap contract). Transfer of those positions to
the Buffett-led group would require the approval of all the counterparties. Clearly, all
of LTCM’s counterparties would prefer to have Warren Buffett as a creditor as opposed to
an about-to-be-bankrupt hedge fund. But it was going to be next to impossible to obtain
complete approval in one hour.

The partners proposed, as an alternative, that the group make an emergency equity
infusion into the fund in return for 90% ownership and the right to kick the partners
out as managers. Under this plan, all the financing would stay in place and the
third-party investors could be redeemed at any time. Unfortunately, the lawyers were not
able to get Buffett back on his satellite phone and no one was prepared to consummate
the deal without his approval.

At the end of the day, 14 financial institutions (everyone with the exception of Bear
Stearns) agreed to make an emergency $3.625 billion equity infusion into the fund. The
plan was essentially a no-fault bankruptcy where the creditors of a company (in this
case, the secured creditors) make an equity investment, cramming down the old equity
holders, in order to liquidate the company in an orderly manner.

Why did the Fed orchestrate the bailout? The answer has to do with how the bankruptcy
laws are applied with respect to financial firms. When LTCM did the on-the-run versus
off-the-run strategy, the risk of the two sides of the trade netted within the fund. But
in bankruptcy, each side of the trade liquidates its collateral separately, and sends a
bill to LTCM. The risk involved in the position is thus no longer netted at 3.5 bp but
is actually 85 bp per side. Although the netted risk of LTCM was $45 million per day,
the gross risk was much larger, more like $30 million per day with each of 15
counterparties.

As conditions worsened, early in September, the partners had been going around to the
counterparties and explaining this enormous potential risk factor in the event of
bankruptcy and the large losses that the counterparties would potentially face. They
separately asked each dealer to make an equity infusion to shore up LTCM’s capital
situation. But it was a classic Prisoner’s Dilemma problem. No dealer would commit
unless everyone else did. It was necessary to get everyone in the same room, so that
they would all know the full extent of the exposures and all commit together, and that
could not happen until bankruptcy was imminent.

In the event, the private bailout was a success. No counterparty had any losses on their
collateral. By the end of the first quarter of 1999, the fund had rallied 25% from its
value at the time of the bailout. At that time third-party investors were paid off. The
consortium of banks decided to continue the liquidation at a faster pace and, by
December 1999, the liquidation was complete. The banks had no losses and had made a 10%
return on their investment.

Investors who had made a $1 investment at the beginning of 1998 would have seen their
investment fall to 8 cents at the time of the bailout, and would have received 10 cents
on April 1, 1999. But in its earlier years, LTCM had made high returns and paid out high
dividends such that of its 100 investors only 12 actually lost money, and only 6 lost
more than $2 million. The median investor actually had a 19% internal rate of return
(IRR) even including the loss.

The partners did not fare as well. Their capital was about $2 billion at the beginning
of 1998 and they received no final payout.

Lessons Learned

The LTCM crisis illustrates some of the pitfalls of a VaR-based risk management system
(see Value-at-Risk), where the risk of the portfolio is determined by the exogenous
economic relationships among the trades. During the crisis, all of LTCM’s trades moved
together with correlations approaching one, even though the trades were economically
diverse. It was hard to believe that the returns from US mortgage arbitrage trades would
be highly related to LTCM’s Japanese warrant and convertible book or highly related to
their European government bond spread trades. Yet, during the crisis these correlations
all moved toward one, resulting in a failure of diversification and creating enormous
risk for the fund.

What was the common thread in all of these trades? It was not that they were
economically related, but more that they had similar holders of the trades with common
risk tolerances. When these hedge funds and proprietary trading groups at the banks lost
money in the Russian crisis they were ordered by senior management to reduce their risk
exposures. The trades that they took off were the relative-value trades. As they unwound
their positions in the illiquid days of August, the spreads went out further, causing
more losses and further unwinds.

This risk might be better classified as endogenous risk, risk that comes about not from
the fundamental

economic relationships of the cash flows of the securities but in a crisis through the
common movements of the holders of the trades. Prudent risk management practices need to
manage the portfolio risk not just for normal times but for crisis times, taking into
account the endogenous aspect of risk.

Related Articles

Merton, Robert C.; Risk Management: Historical Perspectives; Value-at-Risk.

ERIC ROSENFELD
Bubbles and Crashes scramble to unload whatever they have bought
at greater and greater losses and cash becomes
king.
The two acclaimed classic books, Galbraith’s “The Although this makes for compelling reading, many
Great Crash 1929” [40] and Kindleberger’s “Manias, questions remain unanswered. There is little consid-
Panics and Crashes” [61], provide the most commonly eration about how much fundamentals contributed to
accepted explanation of the 1929 boom and crash. the bull market and what might have triggered the
Galbraith argues that a bubble was formed in the
speculative mania. Galbraith [40] cited margin buy-
stock market during the rapid economic growth in
ing, the formation of closed-end investment trusts, the
the 1920s. Both he and Kindleberger, in his extensive
transformation of financiers into celebrities, and other
historical compendium of financial excesses, empha-
qualitative signs of euphoria to support his view.
size the irrational element—the mania—that induced
Recent evidence supports the concept of the growth
the public to invest in the bull “overheating” market.
of a social procyclical mood that promotes the attrac-
The rise in the stock market, according to Galbraith’s
tion for investing in the stock markets by a larger
account (1954 and 1988, pp. xii-xiii), depended on
and larger fraction of the population as the bubble
“the vested interest in euphoria [that] leads men and
grows [88].
women, individuals and institutions to believe that all
Furthermore, Galbraith’s and Kindleberger’s
will be better, that they are meant to be richer and to
accounts are vague about the causes of the market
dismiss as intellectually deficient what is in conflict
crash, believing that almost any event could have
with that conviction.” This eagerness to buy stocks
triggered irrational investors to sell toward the end
was then fueled by an expansion of credit in the form
of bubble, not really explaining the reason for the
of brokers’ loans that encouraged investors to become
crash. Instead, they sidestep the thorny question of
dangerously leveraged. In this respect, Shiller [91]
the occurrence and timing of the crash by focusing
argues that the increase in stock price was driven by
on the inevitability of the bubble’s collapse and
irrational euphoria among individual investors, fed by
emphatic media, which maximized TV ratings and suggest several factors that could have exploded
catered to investor demand for pseudonews. public confidence and caused prices to plummet.
Kindleberger [61] summarizes his compilation of Furthermore, little has been done to identify the
many historical bubbles as follows. precise role of external events in provoking the
collapse.
• The upswing usually starts with an opportu- In the words of Shiller [91], a crash is a time when
nity—new markets, new technologies, or some “the investing public en masse capriciously changes
significant political change—and investors look- its mind.” However, as with the more rational the-
ing for good returns. ories, this explanation again leaves unanswered the
• It proceeds through the euphoria of rising prices, question of why such tremendous capricious changes
particularly of assets, while an expansion of credit in sentiment occur. Alternatively, it amounts to sur-
inflates the bubble. rendering the explanation to the vagaries of “capri-
• In the manic phase, investors scramble to get out cious changes”. Other studies have argued that even
of money and into illiquid investments such as though fundamentals appeared high in 1929, Fisher
stocks, commodities, real estate, or tulip bulbs: “a [35], for example, argued throughout 1929 and 1930
larger and larger group of people seeks to become that the high level of prices in 1929 reflected an
rich without a real understanding of the processes expectation that future corporate cash flows would be
involved.” very high. Fisher believed this expectation to be war-
• Ultimately, the markets stop rising and peo- ranted after a decade of steadily increasing earnings
ple who have borrowed heavily find themselves and dividends, of rapidly improving technologies, and
overstretched. This is “distress”, which generates of monetary stability. In hindsight, it has become
unexpected failures, followed by “revulsion” or clear that even though fundamentals appeared high in
“discredit”. 1929, the stock market rise was clearly excessive. A
• The final phase is a self-feeding panic, where recent empirical study [25] concludes that the stocks
the bubble bursts. People of wealth and credit making up the S&P500 composite were priced at least
30% above fundamentals in late summer 1929. White [107] suggests that the 1929 boom cannot be readily explained by fundamentals, represented by expected dividend growth or changes in the equity premium.

While Galbraith’s and Kindleberger’s classical views have been most often cited by the mass media, they have received little scholarly attention. Since the 1960s, in parallel with the emergence of the efficient-market hypothesis, their position has lost ground among economists and especially among financial economists. More recent works, described at the end of this article, revive their views in the form of quantitative diagnostics.

Efficient-market Hypothesis

The efficient-markets hypothesis (see Efficient Market Hypothesis) states that asset prices reflect fundamental value, defined as the discounted sum of expected future cash flows where, in forming expectations, investors “correctly process” all available information. Therefore, in an efficient market, there is “no free lunch”: no investment strategy can earn excess risk-adjusted average returns or average returns greater than are warranted for its risk. Proponents of the efficient-markets hypothesis, Friedman and Schwartz [39] and Fama [34], argue that rational speculative activity would eliminate riskless arbitrage opportunities. Fama ([34], p. 38) states that, if there are many sophisticated traders in the market, they may cause these bubbles to burst before they have a chance to really get under way.

However, after years of effort, it has become clear that some basic empirical facts about the stock markets cannot be understood in this framework [106]. The efficient-markets hypothesis entirely lost ground after the burst of the Internet bubble in 2000, which provided one of the most striking recent episodes of anomalous price behavior and volatility in one of the most developed capital markets of the world. The movement of Internet stock prices during the late 1990s was extraordinary in many respects. The Internet sector earned over 1000% returns on its public equity in the two-year period from early 1998 through February 2000. The valuations of these stocks began to collapse shortly thereafter and by the end of the same year, they had returned to pre-1998 levels, losing nearly 70% from the peak. The extraordinary returns of 1998–February 2000 had largely disappeared by the end of 2000. Although in February 2000 the vast majority of Internet-related companies had negative earnings, the Internet sector in the United States was equal to 6% of the market capitalization of all US public companies and 20% of the publicly traded volume of the US stock market [82, 83].

Ofek and Richardson [83] used the financial data from 400 companies in the Internet-related sectors and analyzed to what extent their stock prices differed from their fundamental values estimated using the Miller and Modigliani [79] model for stock valuation [38]. Since almost all companies in the Internet sector had negative earnings, they estimated the (implied) price-to-earnings (P/E) ratios, which are derived from the revenue streams of these firms rather than their earnings that would be read from the 1999 financial data. Their results are striking. Almost 20% of the Internet-related firms have P/E ratios in excess of 1500, while over 50% exceed 500, and the aggregate P/E ratio of the entire Internet sector is 605. Under the assumption that the aggregate long-run P/E ratio is 20 on average (which is already at the high end from a historical point of view), the Internet sector would have needed to generate 40.6% excess returns over a 10-year period to justify the P/E ratio of 605 implied in 2000. The vast majority of the implied P/Es are much too high relative to the P/Es usually obtained by firms. By almost any standard, this clearly represented “irrational” valuation levels. These and similar figures led many to believe that this set of stocks was in the midst of an asset price bubble.

From the theoretical point of view, some rational equilibrium asset-pricing models allow for the presence of bubbles, as pointed out for infinite-horizon models in discrete-time setups by Blanchard and Watson [9]. Loewenstein and Willard [70, 71] characterized the necessary and sufficient conditions for the absence of bubbles in complete and incomplete markets equilibria with several types of borrowing constraints and in which agents are allowed to trade continuously. For zero net supply assets, including financial derivatives with finite maturities, they show that bubbles can generally exist and have properties different from their discrete-time, infinite-horizon counterparts. However, Lux and Sornette [73] demonstrated that exogenous rational bubbles are hardly reconcilable with some of the stylized facts of financial data at a very elementary level.
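
The 40.6% figure quoted for the Internet sector can be checked with a few lines of arithmetic. The sketch below assumes (the text does not spell this out) that the figure is an annualized rate: the rate g at which the sector would have to outperform each year, for 10 years, so that a P/E ratio of 605 falls back to a long-run P/E ratio of 20. The function name and its parameters are illustrative and are not taken from Ofek and Richardson [83].

```python
def implied_annual_excess_rate(pe_now: float, pe_long_run: float, years: int) -> float:
    """Annual rate g that solves pe_now / (1 + g) ** years == pe_long_run.

    Assumption (ours, for illustration): the excess return compounds
    annually and the P/E ratio converges to its long-run level after
    exactly `years` years.
    """
    return (pe_now / pe_long_run) ** (1.0 / years) - 1.0


# Values quoted in the text: aggregate P/E of 605, long-run P/E of 20, 10 years.
rate = implied_annual_excess_rate(pe_now=605.0, pe_long_run=20.0, years=10)
print(f"{rate:.1%}")  # prints 40.6%
```

Under this reading, the quoted number is simply (605/20)^(1/10) - 1, which suggests that the "40.6% excess returns over a 10-year period" is best understood as a per-year rate sustained for a decade.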

Jarrow et al. [53] showed that if financial agents prefer more to less (no dominance assumption), then the only bubbles that can exist in complete markets are uniformly integrable martingales, and these can exist with an infinite lifetime. Under these conditions, put–call parity holds and there are no bubbles in standard call and put options. Their analysis implies that if one believes that asset price bubbles exist, then asset markets must be incomplete. Jarrow et al. [54] extend their discussion in [53] to characterize all possible price bubbles in an incomplete market, satisfying the “no free lunch with vanishing risk” and “no dominance” assumptions. Their [54] new theory for bubbles is formulated in terms of different local martingale measures across time, which leads to some testable predictions on derivative pricing in the presence of bubbles.

Heterogeneous Beliefs and Limits to Arbitrage

The collapsing Internet bubble has thrown new light on the old subject and raised the acute question of why rational investors have not moved earlier into the market and driven the Internet stock prices back to their fundamental valuations.

Two conditions are, in general, invoked as being necessary for prices to deviate from the fundamental value. First, there must be some degree of irrationality in the market; that is, investors’ demand for stocks must be driven by something other than fundamentals, such as overconfidence in the future. Second, even if a market has such investors, the general argument is that rational investors will drive prices back to fundamental value. To avoid this, there needs to be some limit on arbitrage. Shleifer and Vishny [92] provide a description of various limits of arbitrage. With respect to the equity market, clearly the most important impediment to arbitrage is short-sales restrictions. Roughly 70% of mutual funds explicitly state (in the Securities and Exchange Commission (SEC) form N-SAR) that they are not permitted to sell short [2]. Seventy-nine percent of equity mutual funds make no use of derivatives whatsoever (either futures or options), suggesting further that funds do not take synthetically short positions [64]. These figures indicate that the vast majority of funds never take short positions.

Recognizing that the world has limited arbitrage and significant numbers of irrational investors, the finance literature has evolved to increasingly recognize the evidence of deviations from the fundamental value. One important class of theories shows that there can be large movements in asset prices caused by the combined effects of heterogeneous beliefs and short-sales constraints. The basic idea traces back to the original capital asset pricing model (CAPM) theories, in particular, to Lintner’s model of asset prices with investors having heterogeneous beliefs [69]. In his model, asset prices are a weighted average of beliefs about asset payoffs with the weights being determined by the investor’s risk aversion and beliefs about asset price covariances. Lintner [69] and many others after him show that widely inflated prices can occur.

Many other asset-pricing models in the spirit of Lintner [69] have been proposed [19, 29, 48, 52, 78, 89]. In these models that assume heterogeneous beliefs and short-sales restrictions, the asset prices are determined at equilibrium to the extent that they reflect the heterogeneous beliefs about payoffs, but short-sales restrictions force the pessimistic investors out of the market, leaving only optimistic investors and thus inflated asset price levels. However, when short-sales restrictions no longer bind investors, then prices fall. This provides a possible account of the bursting of the Internet bubble that developed in 1998–2000. As documented by Ofek and Richardson [83], and by Cochrane [20], typically as much as 80% of Internet-related shares were locked up. This is due to the fact that many Internet companies had gone through recent initial public offerings (IPOs) and regulations stipulate that shares held by insiders and other pre-IPO equity holders cannot be traded for at least six months after the IPO date. The float of the Internet sector dramatically increased as the lockups of many of these stocks expired. The unlocking of literally hundreds of billions of dollars of shares in the Internet sector in Spring 2000 was equivalent to removing short-sales restrictions. And the collapse of Internet stock prices coincided with a dramatic expansion in the number of publicly tradable shares of Internet companies. Among many others, Hong et al. [49] explicitly model the relationship between the number of publicly tradable shares of an asset and the propensity for speculative bubbles to form. So far, the theoretical models based on agents with heterogeneous beliefs facing short-sales restrictions are considered among the most
convincing models to explain the bursting of the Internet bubble.

Another test of this hypothesis on the origin of the 2000 market crash is provided by the search for possible discrepancies between option and stock prices. Indeed, even though it is difficult for rational investors to borrow Internet stocks for short selling due to the lockup period discussed above, they should have been able to construct equivalent synthetic short positions by purchasing puts and writing calls in the option market and either borrowing or lending cash, without the need for borrowing the stocks. The question is now transformed into finding some evidence for the use or the absence of such a strategy and the reason for its absence in the latter case. One possible thread is that, if short selling through option positions was difficult or impractical, prices in the stock and options markets should decouple [67]. Using a sample of closing bid and ask prices for 9026 option pairs for three days in February 2000 along with closing trade prices for the underlying equities, Ofek and Richardson [83] find that 36% of the Internet stocks had put–call parity violations as compared to only 23.8% of the other stocks. One reason for put–call parity violations may be that short-sale restrictions prevent arbitrage from equilibrating option and stock prices. Hence, one interpretation of the finding that there are more put–call parity violations for Internet stocks is that short-sale constraints are more frequently binding for Internet stocks. Furthermore, Ofek et al. [84] provide a comprehensive comparison of the prices of stocks and options, using closing options quotes and closing trades on the underlying stock for July 1999 through November 2001. They find that there are large differences between the synthetic stock price and the actual stock price, which implies the presence of apparent arbitrage opportunities involving selling actual shares and buying synthetic shares. They interpret their findings as evidence that short-sale constraints provide meaningful limits to arbitrage that can allow prices of identical assets to diverge.

By defining a bubble as a price process that, when discounted, is a local martingale under the risk-neutral measure but not a martingale, Cox and Hobson [21] provide a complementary explanation for the failure of put–call parity. Intuitively, the local martingale model views a bubble as a stopped stochastic process for which the expectation exhibits a discontinuity when it ends. It can then be shown that several standard results fail for local martingales: put–call parity does not hold, the price of an American call exceeds that of a European call, and call prices are no longer increasing in maturity (for a fixed strike).

Thus, it would seem that the issue of the origin of the 2000 crash is settled. However, Battalio and Schultz [6] arrive at the opposite conclusion, using proprietary intraday option trade and quote data generated in the days surrounding the collapse of the Internet bubble. They find that the general public could cheaply short synthetically using options, and this information could have been transmitted to the stock market, in line with the absence of evidence that synthetic stock prices diverged from actual stock prices. The difference between the work of Ofek and Richardson [83] and Ofek et al. [84], on the one hand, and Battalio and Schultz [6], on the other, is that the former used closing option quotes and last stock trade prices from the OptionMetrics Ivy database. As pointed out by Battalio and Schultz [6], OptionMetrics matches closing stock trades that occurred no later than 4:00 pm, and perhaps much earlier, with closing option quotes posted at 4:02 pm. Furthermore, option market makers that post closing quotes on day t are not required to trade at those quotes on day t + 1. Likewise, dealers and specialists in the underlying stocks have no obligation to execute incoming orders at the price of the most recent transaction. Hence, closing option quotes and closing stock prices obtained from the OptionMetrics database do not represent contemporaneous prices at which investors could have simultaneously traded. To address this problem, Battalio and Schultz [6] use a unique set of intraday option price data. They first ensure that the synthetic and the actual stock prices that they compare are synchronous, and then they discard quotes that, according to exchange rules, are only indicative of the prices at which liquidity demanders could have traded. They find that almost all of the remaining apparent put–call parity violations disappear when they discard locked or crossed quotes and quotes from fast options markets. In other words, the apparent arbitrage opportunities almost always arise from quotes upon which investors could not actually trade. Battalio and Schultz [6] conclude that short-sale constraints were not responsible for the high prices of Internet stocks at the peak of the bubble and that small investors could have
sold short synthetically using options, and this information would have been transmitted to the stock market. The fact that investors did not take advantage of these opportunities to profit from overpriced Internet stocks suggests that the overpricing was not as obvious then as it is now, with the benefit of hindsight. Schultz [90] provides additional evidence that contemporaneous lockup expirations and equity offerings do not explain the collapse of Internet stocks because the stocks that were restricted to a fixed supply of shares by lockup provisions actually performed worse than stocks with an increasing supply of shares. This shows that current explanations for the collapse of Internet stocks are incomplete.

Riding Bubbles

One cannot understand crashes without knowing the origin of bubbles. In a nutshell, speculative bubbles are caused by “precipitating factors” that change public opinion about markets or that have an immediate impact on demand and by “amplification mechanisms” that take the form of price-to-price feedback, as stressed by Shiller [91]. Consider the example of a housing-market bubble. A number of fundamental factors can influence price movements in housing markets. The following characteristics have been shown to influence the demand for housing: demographics, income growth, employment growth, changes in financing mechanisms, interest rates, as well as changes in the characteristics of the geographic location such as accessibility, schools, or crime, to name a few. On the supply side, attention has been paid to construction costs, the age of the housing stock, and the industrial organization of the housing market. The elasticity of supply has been shown to be a critical factor in the cyclical behavior of home prices. The cyclical process that we observed in the 1980s in those cities experiencing boom-and-bust cycles was caused by the general economic expansion, best proxied by employment gains, which drove up the demand. In the short run, those increases in demand encountered an inelastic supply of housing and developable land, inventories of for-sale properties shrank, and vacancy declined. As a consequence, prices accelerated. This provided an amplification mechanism as it led buyers to anticipate further gains, and the bubble was born. Once prices overshoot or supply catches up, inventories begin to rise, time on the market increases, vacancy rises, and price increases slow down, eventually encountering downward stickiness. The predominant story about home prices is always the prices themselves [91, 93]; the feedback from initial price increases to further price increases is a mechanism that amplifies the effects of the precipitating factors. If prices are going up rapidly, there is much word-of-mouth communication, a hallmark of a bubble. The word of mouth can spread optimistic stories and thus help cause an overreaction to other stories, such as ones about employment. The amplification can work on the downside as well.

Hedge funds are among the most sophisticated investors, probably closer to the ideal of “rational arbitrageurs” than any other class of investors. It is therefore particularly telling that successful hedge-fund managers have been repeatedly reported to ride rather than attack bubbles, suggesting the existence of mechanisms that entice rational investors to surf bubbles rather than attempt to arbitrage them. However, the evidence may not be that strong and could even be circular, since only successful hedge-fund managers would survive a given 2–5 year period, opening the possibility that the mentioned evidence could result in large part from a survival bias [14, 44]. Keeping this in mind, we now discuss two classes of models, which attempt to justify why sophisticated “rational” traders would be willing to ride bubbles. These models share a common theme: rational investors try to ride bubbles, and the incentive to ride the bubble stems from predictable “sentiment”, that is, anticipation of continuing bubble growth [1] and predictable feedback trader demand [26, 27]. An important implication of these theories is that rational investors should be able to reap gains from riding a bubble at the expense of less-sophisticated investors.

Positive Feedback Trading by Noise Traders

The term noise traders was introduced first by Kyle [65] and Black [8] to describe irrational investors. Thereafter, many scholars exploited this concept to extend the standard models by introducing the simplest possible heterogeneity in terms
of two interacting populations of rational and irrational agents. One can say that the one-representative-agent theory is being progressively replaced by a two-representative-agents theory, analogously to the progress from the one-body to the two-body problem in astronomy.

De Long et al. [26, 27] introduced a model of market bubbles and crashes, which exploits this idea of the possible role of noise traders in the development of bubbles as a possible mechanism for why asset prices may deviate from the fundamentals over rather long time periods. Their inspiration came from the observation of successful investors such as George Soros, who reveal that they often exploit naive investors following positive feedback strategies or momentum investment strategies. Positive feedback investors are those who buy securities when prices rise and sell when prices fall. In the words of Jegadeesh and Titman [55], positive feedback investors are buying winners and selling losers. In a description of his own investment strategy, Soros [101] stresses that the key to his success was not to counter the irrational wave of enthusiasm that appears in financial markets, but rather to ride this wave for a while and sell out much later. The model of De Long et al. [26, 27] assumes that when rational speculators receive good news and trade on this news, they recognize that the initial price increase will stimulate buying by noise traders who will follow positive feedback trading strategies with a delay. In anticipation of these purchases, rational speculators buy more today, and so drive prices up today higher than fundamental news warrants. Tomorrow, noise traders buy in response to the increase in today’s price and so keep prices above the fundamentals. The key point is that trading between rational arbitrageurs and positive feedback traders gives rise to bubble-like price patterns. In their model, rational speculators destabilize prices because their trading triggers positive feedback trading by other investors. Positive feedback trading reinforced by arbitrageurs’ jumping on the bandwagon leads to a positive autocorrelation of returns at short horizons. Eventually, selling out or going short by rational speculators will pull the prices back to the fundamentals, entailing a negative autocorrelation of returns at longer horizons. In summary, the model of De Long et al. [26, 27] suggests the coexistence of intermediate-horizon momentum and long-horizon reversals in stock returns.

Their work was followed by a number of behavioral models based on the idea that trend chasing by one class of agents produces momentum in stock prices [5, 22, 50]. The most influential empirical evidence on momentum strategies came from the work of Jegadeesh and Titman [55, 56], who established that stock returns exhibit momentum behavior at intermediate horizons. Strategies that buy stocks that have performed well in the past and sell stocks that have performed poorly in the past generate significant positive returns over 3- to 12-month holding periods. De Bondt and Thaler [24] documented long-term reversals in stock returns. Stocks that performed poorly in the past perform better over the next 3–5 years than stocks that performed well in the past. These findings present a serious challenge to the view that markets are semistrong-form efficient.

In practice, do investors engage in momentum trading? A growing number of empirical studies address momentum trading by investors, with somewhat conflicting results. Lakonishok et al. [66] analyzed the quarterly holdings of a sample of pension funds and found little evidence of momentum trading. Grinblatt et al. [45] examined the quarterly holdings of 274 mutual funds and found that 77% of the funds in their sample engaged in momentum trading [105]. Nofsinger and Sias [81] examined total institutional holdings of individual stocks and found evidence of intraperiod momentum trading. Using a different sample, Gompers and Metrick [41] investigated the relationship between institutional holdings and lagged returns and concluded that once they controlled for the firm size, there was no evidence of momentum trading. Griffin et al. [43] reported that, on a daily and intraday basis, institutional investors engaged in trend chasing in NASDAQ 100 stocks. Finally, Badrinath and Wahal [4] documented the equity trading practices of approximately 1200 institutions from the third quarter of 1987 through the third quarter of 1995. They decomposed trading by institutions into (i) the initiation of new positions (entry), (ii) the termination of previous positions (exit), and (iii) the adjustments to ongoing holdings. Institutions were found to act as momentum traders when they enter stocks but as contrarian traders when they exit or make adjustments to ongoing holdings. Badrinath and Wahal [4] found significant differences in trading practices among different types of institutions. These studies are limited in their ability to capture the full range of trading
practices, in part because they focus almost exclusively on the behavior of institutional investors. In summary, many experimental studies and surveys suggest that positive feedback trading exists in greater or lesser degrees.

Synchronization Failures among Rational Traders

Abreu and Brunnermeier [1] propose a completely different mechanism justifying why rational traders ride rather than arbitrage bubbles. They consider a market where arbitrageurs face synchronization risk and, as a consequence, delay exploiting arbitrage opportunities. Rational arbitrageurs are supposed to know that the market will eventually collapse. They know that the bubble will burst as soon as a sufficient number of (rational) traders sell out. However, the dispersion of rational arbitrageurs’ opinions on market timing and the consequent uncertainty on the synchronization of their sell-off delay this collapse, allowing the bubble to grow. In this framework, bubbles persist in the short and intermediate term because short sellers face synchronization risk, that is, uncertainty regarding the timing of the correction. As a result, arbitrageurs who conclude that other arbitrageurs are as yet unlikely to trade against the bubble find it optimal to ride the still growing bubble for a while.

Like other institutional investors, hedge funds with large holdings in US equities have to report their quarterly equity positions to the SEC on Form 13F. Brunnermeier and Nagel [15] extracted hedge-fund holdings from these data, including those of well-known managers such as Soros, Tiger, Tudor, and others in the period from 1998 to 2000. They found that, over the sample period 1998–2000, hedge-fund portfolios were heavily tilted toward highly priced technology stocks. The proportion of their overall stock holdings devoted to this segment was higher than the corresponding weight of technology stocks in the market portfolio. In addition, the hedge funds in their sample skillfully anticipated price peaks of individual technology stocks. On a stock-by-stock basis, hedge funds started cutting back their holdings before prices collapsed, switching to technology stocks that still experienced rising prices. As a result, hedge-fund managers captured the upturn, but avoided much of the downturn. This is reflected in the fact that hedge funds earned substantial excess returns in the technology segment of the NASDAQ.

Complex Systems Approach to Bubbles and Crashes

Bhattacharya and Yu [7] provide a summary of recent efforts to expand on the above concepts, in particular, to address the two main questions of (i) the cause(s) of bubbles and crashes and (ii) the possibility to diagnose them ex ante. Many financial economists recognize that positive feedbacks and, in particular, herding are the key factors for the growth of bubbles. Herding can result from a variety of mechanisms, such as anticipation by rational investors of noise traders’ strategies [26, 27], agency costs and monetary incentives given to competing fund managers [23], sometimes leading to the extreme of Ponzi schemes [28], rational imitation in the presence of uncertainty [88], and social imitation.

The Madoff Ponzi scheme is a significant recent illustration, revealed by the unfolding of the financial crisis that started in 2007 [97]. It is the world’s biggest fraud allegedly perpetrated by long-time investment adviser Bernard Madoff, arrested on December 11, 2008 and sentenced on June 29, 2009 to 150 years in prison, the maximum allowed. His fraud led to losses of 65 billion US dollars that caused reverberations around the world as the list of victims included many wealthy private investors, charities, hedge funds, and major banks in the United States, Europe, and Asia. The Madoff Ponzi scheme surfed on the general psychology, characterizing the first decade of the twenty-first century, of exorbitant unsustainable expected financial gains. It is a remarkable illustration of the problem of implementing sound risk management, due diligence processes, and of the capabilities of the SEC, the US markets watchdog, when markets are booming and there is a general sentiment of a new economy and new financial era, in which old rules are believed not to apply anymore [75]. Actually, the Madoff Ponzi scheme is only the largest of a surprising number of other Ponzi schemes revealed by the financial crisis in many different countries (see accounts from village.albourne.com).

Discussing social imitation is often considered off-stream among financial economists but warrants
some scrutiny, given its pervasive presence in human affairs. On the question of the ex ante detection of bubbles, Gurkaynak [46] summarizes the dismal state of the econometric approach, stating that the “econometric detection of asset price bubbles cannot be achieved with a satisfactory degree of certainty. For each paper that finds evidence of bubbles, there is another one that fits the data equally well without allowing for a bubble. We are still unable to distinguish bubbles from time-varying or regime-switching fundamentals, while many small sample econometrics problems of bubble tests remain unresolved.” The following discusses an arguably off-stream approach that, by using concepts and tools from the theory of complex systems and statistical physics, suggests that ex ante diagnostic and partial predictability might be possible [93].

Social Mimetism, Collective Phenomena, Bifurcations, and Phase Transitions

Market behavior is the aggregation of the individual behavior of the many investors participating in it. In an economy of traders with completely rational expectations and the same information sets, no bubbles are possible [104]. Rational bubbles can, however, occur in infinite-horizon models [9], with dynamics of growth and collapse driven by noise traders [57, 59]. However, the key issue is to understand by what detailed mechanism the aggregation of many individual behaviors can give rise to bubbles and crashes. Modeling social imitation and social interactions requires using approaches, little known to financial economists, that address the fundamental question of how global behaviors can emerge at the macroscopic level. This extends the representative agent approach, but it also goes well beyond the introduction of heterogeneous agents. A key insight from statistical physics and complex systems theory is that systems with a large number of interacting agents, open to their environment, self-organize their internal structure and their dynamics with novel and sometimes surprising “emergent” out-of-equilibrium properties. A central property of a complex system is the possible occurrence and coexistence of many large-scale collective behaviors with a very rich structure, resulting from the repeated nonlinear interactions among its constituents.

How can this help address the question of what is/are the cause(s) of bubbles and crashes? The crucial insight is that a system, made of competing investors subjected to the myriad of influences, both exogenous news and endogenous interactions and reflexivity, can develop into endogenously self-organized self-reinforcing regimes, which would qualify as bubbles, and that crashes occur as a global self-organized transition. Mathematicians refer to this behavior as a bifurcation or more specifically as a catastrophe [103]. Physicists call these phenomena phase transitions [102]. The implication of modeling a market crash as a bifurcation is to solve the question of what makes a crash: in the framework of bifurcation theory (or phase transitions), sudden shifts in behavior arise from small changes in circumstances, with qualitative changes in the nature of the solutions that can occur abruptly when the parameters change smoothly. A minor change of circumstances, of interaction strength, or heterogeneity may lead to a sudden and dramatic change, such as during an earthquake and a financial crash.

Most approaches for explaining crashes search for possible mechanisms or effects that operate at very short timescales (hours, days, or weeks at most). According to the “bifurcation” approach, the underlying cause of the crash should be found in the preceding months and years, in the progressively increasing buildup of market cooperativity, or effective interactions between investors, often translated into accelerating ascent of the market price (the bubble). According to this “critical” point of view, the specific manner in which prices collapsed is not the most important problem: a crash occurs because the market has entered an unstable phase and any small disturbance or process may reveal the existence of the instability.

Ising Models of Social Imitation and Phase Transitions

Perhaps the simplest and historically most important model describing how the aggregation of many individual behaviors can give rise to macroscopic out-of-equilibrium dynamics such as bubbles, with bifurcations in the organization of social systems due to slight changes in the interactions, is the Ising model [16, 80]. In particular, Orléan [85, 86] captured the paradox of combining rational and imitative behavior under the name mimetic rationality, by developing
models of mimetic contagion of investors in the stock markets, which are based on irreversible processes of opinion forming. Roehner and Sornette [88], among others, showed that the dynamical updating rules of the Ising model are obtained in a natural way as the optimal strategy of rational traders with limited information who have the possibility to make up for their lack of information via information exchange with other agents within their social network. The Ising model is one of the simplest models describing the competition between the ordering force of imitation or contagion and the disordering impact of private information or idiosyncratic noise (see [77] for a technical review).

Starting with a framework suggested by Blume [10, 11], Brock [12], and Durlauf [30–33], Phan et al. [87] summarize the formalism, starting with different implementations of the agents’ decision processes whose aggregation is inspired by statistical mechanics to account for social influence in individual decisions. Lux and Marchesi [72], Brock and Hommes [13], Kaizoji [60], and Kirman and Teyssiere [63] also developed related models in which agents’ successful forecasts reinforce the forecasts. Such models have been found to generate swings in opinions, regime changes, and long memory. An essential feature of these models is that agents are wrong for some of the time, but whenever they are in the majority they are essentially right. Thus, they are not systematically irrational [62]. Sornette and Zhou [99] show how Bayesian learning added to the Ising model framework reproduces the stylized facts of financial markets. Harras and Sornette [47] show how overlearning from lucky runs of random news in the presence of social imitation may lead to endogenous bubbles and crashes.

These models allow one to combine the questions on the cause of both bubbles and crashes, as resulting from the collective emergence of herding via self-reinforcing imitation and social interactions, which are then susceptible to phase transitions or bifurcations occurring under minor changes in the control parameters. Hence, the difficulty in answering the question of “what causes a bubble and a crash” may, in this context, be attributed to this distinctive attribute of a dynamical out-of-equilibrium system to exhibit bifurcation behavior in its dynamics. This line of thought has been pursued by Sornette and his coauthors, to propose a novel operational diagnostic of bubbles.

V-3 Bubble as Superexponential Price Growth, Diagnostic, and Prediction

Bubbles are often defined as exponentially explosive prices, which are followed by a sudden collapse. As summarized, for instance, by Gurkaynak [46], the problem with this definition is that any exponentially growing price regime (one that would be called a bubble) can also be rationalized by a fundamental valuation model. This is related to the problem that the fundamental price is not directly observable, giving no strong anchor to understand observed prices. This was exemplified during the last Internet bubble by fundamental pricing models, which incorporated real options in the fundamental valuation, justifying basically any price. Mauboussin and Hiler [76] were among the most vocal proponents of the proposition, offered close to the peak of the Internet bubble that culminated in 2000, that better business models, the network effect, first-to-scale advantages, and real options effect could account rationally for the high prices of dot-com and other New Economy companies. These interesting views expounded in early 1999 were in synchrony with the bull market of 1999 and preceding years. They participated in the general optimistic view and added to the strength of the herd. Later, after the collapse of the bubble, these explanations seemed less attractive. This did not escape the US Federal Reserve chairman Greenspan [42], who said: “Is it possible that there is something fundamentally new about this current period that would warrant such complacency? Yes, it is possible. Markets may have become more efficient, competition is more global, and information technology has doubtless enhanced the stability of business operations. But, regrettably, history is strewn with visions of such new eras that, in the end, have proven to be a mirage. In short, history counsels caution.”

In this vein, the buzzword “new economy” so much used in the late 1990s was also in use in the 1960s during the “tronic boom”, which was also followed by a market crash, and during the bubble of the late 1920s before the October 1929 crash. In the latter case, the “new” economy was referring to firms in the utility sector. It is remarkable how traders do not learn the lessons of their predecessors.

A better model derives from the mechanism of positive feedbacks discussed above, which generically gives rise to faster-than-exponential growth of
price (termed superexponential) [95, 96]. An exponentially growing price is characterized by a constant expected growth rate. The geometric random walk is the standard stochastic price model embodying this class of behaviors. A superexponentially growing price is such that the growth rate itself grows as a result of positive feedbacks of price, momentum, and other characteristics on the growth rate [95]. As a consequence of the acceleration, the mathematical models generalizing the geometric random walk exhibit so-called finite-time singularities. In other words, the resulting processes are not defined for all times: the dynamics has to end after a finite life and to transform into something else. This captures well the transient nature of bubbles, and the fact that the crashes ending the bubbles are often the antechambers to different market regimes.

Such an approach may be thought of, at first sight, to be inadequate or too naive to capture the intrinsic stochastic nature of financial prices, whose null hypothesis is the geometric random walk model [74]. However, it is possible to generalize this simple deterministic model to incorporate nonlinear positive feedback on the stochastic Black–Scholes model, leading to the concept of stochastic finite-time singularities [3, 36, 37, 51, 95]. Much work still needs to be done on this theoretical aspect.

In a series of empirical papers, Sornette and his collaborators have used this concept to empirically test for bubbles and prognosticate their demise, often in the form of crashes. Johansen and Sornette [58] provide perhaps the most inclusive series of tests of this approach. First, they identify the most extreme cumulative losses (drawdowns) in a variety of asset classes, markets, and epochs, and show that they belong to a probability density distribution, which is distinct from the distribution of 99% of the smaller drawdowns (the more “normal” market regime). These drawdowns can thus be called outliers or kings [94]. Second, they show that, for two-thirds of these extreme drawdowns, the market prices followed a superexponential behavior before their occurrences, as characterized by the calibration of the power law with a finite-time singularity.

This provides a systematic approach to diagnose bubbles ex ante, as shown in a series of real-life tests [98, 100, 108–111]. Although this approach has enjoyed a large visibility in the professional financial community around the world (banks, mutual funds, hedge funds, investment houses, etc.), it has not yet received the attention from the academic financial community that it perhaps deserves given the stakes. This is probably due to several factors, which include the following: (i) the origin of the hypothesis coming from analogies with complex critical systems in physics and the theory of complex systems, which constitutes a well-known obstacle to climb the ivory towers of standard financial economics; (ii) the nonstandard (from an econometric viewpoint) formulation of the statistical tests performed until present (in this respect, see the attempts in terms of a Bayesian analysis of log-periodic power law (LPPL) precursors [17] to focus on the time series of returns instead of prices, and of regime-switching model of LPPL [18]); (iii) the nonstandard expression of some of the mathematical models underpinning the hypothesis; and (iv) perhaps an implicit general belief in academia that forecasting financial instabilities is inherently impossible. Lin et al. [68] have recently addressed problem (ii) by combining a mean-reverting volatility process and a stochastic conditional return, which reflects nonlinear positive feedbacks and continuous updates of the investors’ beliefs and sentiments. When tested on the S&P500 US index from January 3, 1950 to November 21, 2008, the model correctly identifies the bubbles that ended in October 1987, in October 1997, in August 1998, and the information and communication technologies (ICT) bubble that ended in the first quarter of 2000. Using Bayesian inference, Lin et al. [68] find a very strong statistical preference for their model compared with a standard benchmark, in contradiction with Chang and Feigenbaum [17], who used a unit-root model for residuals.

V-4 Bubbles and the Great Financial Crisis of 2007

It is appropriate to end this article with some comments on the relationship between the momentous financial crisis and bubbles. The financial crisis, which started with an initially well-defined epicenter focused on mortgage-backed securities (MBS), has been cascading into a global economic recession, whose increasing severity and uncertain duration are continuing to lead to massive losses and damage for billions of people. At the time of writing (July 2009), the world still suffers from a major financial crisis that has transformed into the worst economic recession since the Great Depression, perhaps on its way
to surpass it. Heavy central bank interventions and government spending programs have been launched worldwide and especially in the United States and Europe, in the hope of unfreezing credit and bolstering consumption.

The current financial crisis is a perfect illustration of the major role played by financial bubbles. We refer to the analysis, figures, and references in [97], which articulate a general framework, suggesting that the fundamental cause of the unfolding financial and economic crisis is the accumulation of five bubbles:

1. the “new economy” ICT bubble that started in the mid-1990s and ended with the crash of 2000;
2. the real-estate bubble launched in large part by easy access to a large amount of liquidity as a result of the active monetary policy of the US Federal Reserve lowering the fed rate from 6.5% in 2000 to 1% in 2003 and 2004 in a successful attempt to alleviate the consequence of the 2000 crash;
3. the innovations in financial engineering with the collateralized debt obligations (CDOs) and other derivatives of debts and loan instruments issued by banks and eagerly bought by the market, accompanying and fueling the real-estate bubble;
4. the commodity bubble(s) on food, metals, and energy; and
5. the stock market bubble that peaked in October 2007.

These bubbles, by their interplay and mutual reinforcement, have led to the illusion of a “perpetual money machine”, allowing financial institutions to extract wealth from an unsustainable artificial process. This realization calls into question the soundness of many of the interventions to address the recent liquidity crisis that tend to encourage more consumption.

References

[1] Abreu, D. & Brunnermeier, M.K. (2003). Bubbles and crashes, Econometrica 71, 173–204.
[2] Almazan, A., Brown, K.C., Carlson, M. & Chapman, D.A. (2004). Why constrain your mutual fund manager? Journal of Financial Economics 73, 289–321.
[3] Andersen, J.V. & Sornette, D. (2004). Fearless versus fearful speculative financial bubbles, Physica A 337(3–4), 565–585.
[4] Badrinath, S.G. & Wahal, S. (2002). Momentum trading by institutions, Journal of Finance 57(6), 2449–2478.
[5] Barberis, N., Shleifer, A. & Vishny, R. (1998). A model of investor sentiment, Journal of Financial Economics 49, 307–343.
[6] Battalio, R. & Schultz, P. (2006). Option and the bubble, Journal of Finance 61(5), 2071–2102.
[7] Bhattacharya, U. & Yu, X. (2008). The causes and consequences of recent financial market bubbles: an introduction, Review of Financial Studies 21(1), 3–10.
[8] Black, F. (1986). Noise, The Journal of Finance 41(3), 529–543. Papers and Proceedings of the Forty-Fourth Annual Meeting of the American Finance Association, New York, NY, December 28–30, 1985.
[9] Blanchard, O.J. & Watson, M.W. (1982). Bubbles, rational expectations and speculative markets, in Crisis in Economic and Financial Structure: Bubbles, Bursts, and Shocks, P. Wachtel, ed., Lexington Books, Lexington.
[10] Blume, L.E. (1993). The statistical mechanics of strategic interaction, Games and Economic Behavior 5, 387–424.
[11] Blume, L.E. (1995). The statistical mechanics of best-response strategy revisions, Games and Economic Behavior 11, 111–145.
[12] Brock, W.A. (1993). Pathways to randomness in the economy: emergent nonlinearity and chaos in economics and finance, Estudios Económicos 8, 3–55.
[13] Brock, W.A. & Hommes, C.H. (1999). Rational animal spirits, in The Theory of Markets, P.J.J. Herings, G. van der Laan & A.J.J. Talman, eds, North-Holland, Amsterdam, pp. 109–137.
[14] Brown, S.J., Goetzmann, W., Ibbotson, R.G. & Ross, S.A. (1992). Survivorship bias in performance studies, Review of Financial Studies 5(4), 553–580.
[15] Brunnermeier, M.K. & Nagel, S. (2004). Hedge funds and the technology bubble, Journal of Finance 59(5), 2013–2040.
[16] Callen, E. & Shapero, D. (1974). A theory of social imitation, Physics Today July, 23–28.
[17] Chang, G. & Feigenbaum, J. (2006). A Bayesian analysis of log-periodic precursors to financial crashes, Quantitative Finance 6(1), 15–36.
[18] Chang, G. & Feigenbaum, J. (2007). Detecting log-periodicity in a regime-switching model of stock returns, Quantitative Finance 8, 723–738.
[19] Chen, J., Hong, H. & Stein, J. (2002). Breadth of ownership and stock returns, Journal of Financial Economics 66, 171–205.
[20] Cochrane, J.H. (2003). Stocks as money: convenience yield and the tech-stock bubble, in Asset Price Bubbles, W.C. Hunter, G.G. Kaufman & M. Pomerleano, eds, MIT Press, Cambridge.
[21] Cox, A.M.G. & Hobson, D.G. (2005). Local martingales, bubbles and option prices, Finance and Stochastics 9(4), 477–492.
[22] Daniel, K., Hirshleifer, D. & Subrahmanyam, A. (1998). Investor psychology and security market under- and overreactions, The Journal of Finance 53(6), 1839–1885.
[23] Dass, N., Massa, M. & Patgiri, R. (2008). Mutual funds and bubbles: the surprising role of contracted incentives, Review of Financial Studies 21(1), 51–99.
[24] De Bondt, W.F.M. & Thaler, R.H. (1985). Does the stock market overreact? Journal of Finance 40, 793–805.
[25] De Long, J.B. & Shleifer, A. (1991). The stock market bubble of 1929: evidence from closed-end mutual funds, The Journal of Economic History 51(3), 675–700.
[26] De Long, J.B., Shleifer, A., Summers, L.H. & Waldmann, R.J. (1990a). Positive feedback investment strategies and destabilizing rational speculation, The Journal of Finance 45(2), 379–395.
[27] De Long, J.B., Shleifer, A., Summers, L.H. & Waldmann, R.J. (1990b). Noise trader risk in financial markets, The Journal of Political Economy 98(4), 703–738.
[28] Dimitriadi, G.G. (2004). What are “Financial Bubbles”: approaches and definitions, Electronic journal “INVESTIGATED in RUSSIA” http://zhurnal.ape.relarn.ru/articles/2004/245e.pdf
[29] Duffie, D., Garleanu, N. & Pedersen, L.H. (2002). Security lending, shorting and pricing, Journal of Financial Economics 66, 307–339.
[30] Durlauf, S.N. (1991). Multiple equilibria and persistence in aggregate fluctuations, American Economic Review 81, 70–74.
[31] Durlauf, S.N. (1993). Nonergodic economic growth, Review of Economic Studies 60(203), 349–366.
[32] Durlauf, S.N. (1997). Statistical mechanics approaches to socioeconomic behavior, in The Economy as an Evolving Complex System II, Santa Fe Institute Studies in the Sciences of Complexity, B. Arthur, S. Durlauf & D. Lane, eds, Addison-Wesley, Reading, MA, Vol. XXVII.
[33] Durlauf, S.N. (1999). How can statistical mechanics contribute to social science? Proceedings of the National Academy of Sciences of the USA 96, 10582–10584.
[34] Fama, E.F. (1965). The Behavior of Stock-Market Prices, Journal of Business 38(1), 34–105.
[35] Fisher, I. (1930). The Stock Market Crash-and After, Macmillan, New York.
[36] Fogedby, H.C. (2003). Damped finite-time-singularity driven by noise, Physical Review E 68, 051105.
[37] Fogedby, H.C. & Poukaradzez, V. (2002). Power laws and stretched exponentials in a noisy finite-time-singularity model, Physical Review E 66, 021103.
[38] French, K.R. & Poterba, J.M. (1991). Were Japanese stock prices too high? Journal of Financial Economics 29(2), 337–363.
[39] Friedman, M. & Schwartz, A.J. (1963). A Monetary History of the United States, 1867–1960, Princeton University Press, Princeton.
[40] Galbraith, J.K. (1954/1988). The Great Crash 1929, Houghton Mifflin Company, Boston.
[41] Gompers, P.A. & Metrick, A. (2001). Institutional investors and equity prices, Quarterly Journal of Economics 116, 229–259.
[42] Greenspan, A. (1997). Federal Reserve’s Semiannual Monetary Policy Report, before the Committee on Banking, Housing, and Urban Affairs, U.S. Senate, February 26.
[43] Griffin, J.M., Harris, J. & Topaloglu, S. (2003). The dynamics of institutional and individual trading, Journal of Finance 58, 2285–2320.
[44] Grinblatt, M. & Titman, S. (1992). The persistence of mutual fund performance, Journal of Finance 47, 1977–1984.
[45] Grinblatt, M., Titman, S. & Wermers, R. (1995). Momentum investment strategies, portfolio performance and herding: a study of mutual fund behavior, The American Economic Review 85(5), 1088–1105.
[46] Gurkaynak, R.S. (2008). Econometric tests of asset price bubbles: taking stock, Journal of Economic Surveys 22(1), 166–186.
[47] Harras, G. & Sornette, D. (2008). Endogenous versus Exogenous Origins of Financial Rallies and Crashes in an Agent-based Model with Bayesian Learning and Imitation, ETH Zurich preprint (http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1156348)
[48] Harrison, M. & Kreps, D. (1978). Speculative investor behavior in a stock market with heterogeneous expectations, Quarterly Journal of Economics 92, 323–336.
[49] Hong, H., Scheinkman, J. & Xiong, W. (2006). Asset float and speculative bubbles, Journal of Finance 59(3), 1073–1117.
[50] Hong, H. & Stein, J.C. (2003). Differences of Opinion, short-sales constraints, and market crashes, The Review of Financial Studies 16(2), 487–525.
[51] Ide, K. & Sornette, D. (2002). Oscillatory finite-time singularities in finance, population and rupture, Physica A 307(1–2), 63–106.
[52] Jarrow, R. (1980). Heterogeneous expectations, restrictions on short sales, and equilibrium asset prices, Journal of Finance 35, 1105–1113.
[53] Jarrow, R., Protter, P. & Shimbo, K. (2007). Asset price bubbles in a complete market, in Advances in Mathematical Finance (Festschrift in honor of Dilip Madan’s 60th birthday), M.C. Fu, R.A. Jarrow, J.-Y. Yen & R.J. Elliott, eds, Birkhäuser, pp. 97–122.
[54] Jarrow, R., Protter, P. & Shimbo, K. (2008). Asset price bubbles in incomplete markets, Mathematical Finance to appear.
[55] Jegadeesh, N. & Titman, S. (1993). Returns to buying winners and selling losers: Implications for stock market efficiency, Journal of Finance 48, 65–91.
[56] Jegadeesh, N. & Titman, S. (2001). Profitability of momentum strategies: An evaluation of alternative explanations, Journal of Finance 54, 699–720.
[57] Johansen, A., Ledoit, O. & Sornette, D. (2000). Crashes as critical points, International Journal of Theoretical and Applied Finance 3(2), 219–255.
[58] Johansen, A. & Sornette, D. (2004). Endogenous versus Exogenous Crashes in Financial Markets, preprint at http://papers.ssrn.com/paper.taf?abstract_id=344980, published as “Shocks, Crashes and Bubbles in Financial Markets,” Brussels Economic Review (Cahiers économiques de Bruxelles), 49 (3/4), Special Issue on Nonlinear Analysis (2006) (http://ideas.repec.org/s/bxr/bxrceb.html)
[59] Johansen, A., Sornette, D. & Ledoit, O. (1999). Predicting financial crashes using discrete scale invariance, Journal of Risk 1(4), 5–32.
[60] Kaizoji, T. (2000). Speculative bubbles and crashes in stock markets: an interacting agent model of speculative activity, Physica A 287(3–4), 493–506.
[61] Kindleberger, C.P. (1978). Manias, Panics and Crashes: A History of Financial Crises, Basic Books, New York.
[62] Kirman, A.P. (1997). Interaction and Markets, G.R.E.Q.A.M. 97a02, Université Aix-Marseille III.
[63] Kirman, A.P. & Teyssiere, G. (2002). Micro-economic models for long memory in the volatility of financial time series, in The Theory of Markets, P.J.J. Herings, G. van der Laan & A.J.J. Talman, eds, North-Holland, Amsterdam, pp. 109–137.
[64] Koski, J.L. & Pontiff, J. (1999). How are derivatives used? Evidence from the mutual fund industry, Journal of Finance 54(2), 791–816.
[65] Kyle, A.S. (1985). Continuous auctions and insider trading, Econometrica 53, 1315–1335.
[66] Lakonishok, J., Shleifer, A. & Vishny, R.W. (1992). The impact of institutional trading on stock prices, Journal of Financial Economics 32, 23–43.
[67] Lamont, O.A. & Thaler, R.H. (2003). Can the market add and subtract? Mispricing in tech stock carve-outs, Journal of Political Economy 111(2), 227–268. University of Chicago Press.
[68] Lin, L., Ren, R.E. & Sornette, D. (2009). A Consistent Model of ‘Explosive’ Financial Bubbles With Mean-Reversing Residuals, preprint at http://papers.ssrn.com/abstract=1407574
[69] Lintner, J. (1969). The aggregation of investors’ diverse judgments and preferences in purely competitive security markets, Journal of Financial and Quantitative Analysis 4, 347–400.
[70] Loewenstein, M. & Willard, G.A. (2000a). Rational equilibrium asset-pricing bubbles in continuous trading models, Journal of Economic Theory 91(1), 17–58.
[71] Loewenstein, M. & Willard, G.A. (2000b). Local martingales, arbitrage and viability: free snacks and cheap thrills, Economic Theory 16, 135–161.
[72] Lux, T. & Marchesi, M. (1999). Scaling and criticality in a stochastic multi-agent model of a financial market, Nature 397, 498–500.
[73] Lux, T. & Sornette, D. (2002). On rational bubbles and fat tails, Journal of Money, Credit and Banking Part 1 34(3), 589–610.
[74] Malkiel, B.G. (2007). A Random Walk Down Wall Street: The Time-Tested Strategy for Successful Investing, W.W. Norton & Co., revised and updated edition (December 17, 2007).
[75] Markopolos, H. (2009). Testimony of Harry Markopolos, CFA, CFE (Chartered Financial Analyst, Certified Fraud Examiner), before the U.S. House of Representatives, Committee on Financial Services. Wednesday, February 4, 2009, 9:30am, McCarter & English LLP, Boston.
[76] Mauboussin, M.J. & Hiler, B. (1999). Rational Exuberance? Equity Research, Credit Suisse First Boston, pp. 1–6. January 26, 1999.
[77] McCoy, B.M. & Wu, T.T. (1973). The Two-Dimensional Ising Model, Harvard University, Cambridge, MA.
[78] Miller, E. (1977). Risk, uncertainty and divergence of opinion, Journal of Finance 32, 1151–1168.
[79] Miller, M.H. & Modigliani, F. (1961). Dividend policy, growth, and the valuation of shares, Journal of Business 34(4), 411–433.
[80] Montroll, E.W. & Badger, W.W. (1974). Introduction to Quantitative Aspects of Social Phenomena, Gordon and Breach, New York.
[81] Nofsinger, J.R. & Sias, R.W. (1999). Herding and feedback trading by institutional and individual investors, Journal of Finance 54, 2263–2295.
[82] Ofek, E. & Richardson, M. (2002). The valuation and market rationality of internet stock prices, Oxford Review of Economic Policy 18(3), 265–287.
[83] Ofek, E. & Richardson, M. (2003). DotCom mania: the rise and fall of internet stock prices, The Journal of Finance 58(3), 1113–1137.
[84] Ofek, E., Richardson, M. & Whitelaw, R.F. (2004). Limited arbitrage and short sale constraints: evidence from the options market, Journal of Financial Economics 74(2), 305–342.
[85] Orléan, A. (1989). Mimetic contagion and speculative bubbles, Theory and Decision 27, 63–92.
[86] Orléan, A. (1995). Bayesian interactions and collective dynamics of opinion – herd behavior and mimetic contagion, Journal of Economic Behavior and Organization 28, 257–274.
[87] Phan, D., Gordon, M.B. & Nadal, J.-P. (2004). Social interactions in economic theory: an insight from statistical mechanics, in Cognitive Economics – An Interdisciplinary Approach, P. Bourgine & J.-P. Nadal, eds, Springer, Berlin.
[88] Roehner, B.M. & Sornette, D. (2000). Thermometers of speculative frenzy, European Physical Journal B 16, 729–739.
[89] Scheinkman, J. & Xiong, W. (2003). Overconfidence and speculative bubbles, Journal of Political Economy 111, 1183–1219.
[90] Schultz, P. (2008). Downward-sloping demand curves, the supply of shares, and the collapse of internet stock prices, Journal of Finance 63, 351–378.
[91] Shiller, R. (2000). Irrational Exuberance, Princeton University Press, Princeton, NJ.
[92] Shleifer, A. & Vishny, R. (1997). Limits of arbitrage, Journal of Finance 52, 35–55.
[93] Sornette, D. (2003). Why Stock Markets Crash (Critical Events in Complex Financial Systems), Princeton University Press, Princeton, NJ.
[94] Sornette, D. (2009). Dragon-Kings, Black Swans and the Prediction of Crises, in press in the International Journal of Terraspace Science and Engineering (http://ssrn.com/abstract=1470006).
[95] Sornette, D. & Andersen, J.V. (2002). A nonlinear super-exponential rational model of speculative financial bubbles, International Journal of Modern Physics C 13(2), 171–188.
[96] Sornette, D., Takayasu, H. & Zhou, W.-X. (2003). Finite-time singularity signature of hyperinflation, Physica A: Statistical Mechanics and Its Applications 325, 492–506.
[97] Sornette, D. & Woodard, R. (2009). Financial bubbles, real estate bubbles, derivative bubbles, and the financial and economic crisis, to appear in the Proceedings of APFA7 (Applications of Physics in Financial Analysis), in New Approaches to the Analysis of Large-Scale Business and Economic Data, M. Takayasu, T. Watanabe & H. Takayasu, eds., Springer (2010) (e-print at http://arxiv.org/abs/0905.0220)
[103] Thom, R. (1989). Structural Stability and Morphogenesis: An Outline of a General Theory of Models, Addison-Wesley, Reading, MA.
[104] Tirole, J. (1982). On the possibility of speculation under rational expectations, Econometrica 50, 1163–1182.
[105] Wermers, R. (1999). Mutual fund herding and the impact on stock prices, Journal of Finance 54(2), 581–622.
[106] West, K.D. (1988). Bubbles, fads and stock price volatility tests: a partial evaluation, Journal of Finance 43(3), 639–656.
[107] White, E.N. (2006). Bubbles and Busts: The 1990s in the Mirror of the 1920s, NBER Working Paper No. 12138.
[108] Zhou, W.-X. & Sornette, D. (2003). 2000–2003 real estate bubble in the UK but not in the USA, Physica A 329, 249–263.
[109] Zhou, W.-X. & Sornette, D. (2006). Is there a real-estate bubble in the US? Physica A 361, 297–308.
[110] Zhou, W.-X. & Sornette, D. (2007). A Case Study of Speculative Financial Bubbles in the South African Stock Market 2003–2006, ETH Zurich preprint (http://arxiv.org/abs/physics/0701171)
[111] Zhou, W.-X. & Sornette, D. (2008). Analysis of the real estate market in Las Vegas: bubble, seasonal patterns, and prediction of the CSW indexes, Physica A 387, 243–260.

Further Reading

[98] Sornette, D., Woodard, R. & Zhou, W.-X. (2008).
Abreu, D & Brunnermeier, M.K. (2002). Synchronization risk
The 2006–2008 Oil Bubble and Beyond , ETH Zurich
and delayed arbitrage, Journal of Financial Economics 66,
preprint (http://arXiv.org/abs/0806.1170)
341–360.
[99] Sornette, D. & Zhou, W.-X. (2006a). Importance
Farmer, J.D. (2002). Market force, ecology and evolution,
of positive feedbacks and over-confidence in a self-
Industrial and Corporate Change 11(5), 895–953.
fulfilling ising model of financial markets, Physica
Narasimhan, J. & Titman, S. (1993). Returns to buying winners
A: Statistical Mechanics and its Applications 370(2), and selling losers: implications for stock market efficiency,
704–726. The Journal of Finance 48(1), 65–91.
[100] Sornette, D. & Zhou, W.-X. (2006b). Predictability Narasimhan, J. & Titman, S. (2001). Profitability of momentum
of large future changes in major financial indices, strategies: an evaluation of alternative explanations, The
International Journal of Forecasting 22, 153–168. Journal of Finance 56(2), 699–720.
[101] Soros, G. (1987). The Alchemy of Finance: Reading the Shleifer, A & Summers, L.H. (1990). The noise trader approach
Mind of the Market, Wiley, Chichester. to finance, The Journal of Economic Perspectives 4(2),
[102] Stanley, H.E. (1987). Introduction to Phase Transitions 19–33.
and Critical Phenomena, Oxford University Press,
USA. TAISEI KAIZOJI & DIDIER SORNETTE
Ross, Stephen

The central focus of the work of Ross (1944–) has been to tease out the consequences of the assumption that all riskless arbitrage opportunities have already been exploited and none remain. The empirical relevance of the no arbitrage assumption is especially high in the area of financial markets for two simple reasons: there are many actors actively searching for arbitrage opportunities, and the exploitation of such opportunities is relatively costless. For finance, therefore, the principle of no arbitrage is not merely a convenient assumption that makes it possible to derive clean theoretical results but even more an idealization of observable empirical reality, and a characterization of the deep and simple structure underlying multifarious surface phenomena. For one whose habits of mind were initially shaped by the methods of natural science, specifically physics as taught by Richard Feynman (B.S. California Institute of Technology, 1965), finance seemed to be an area of economics where a truly scientific approach was possible.

It was exposure to the Black–Scholes option pricing theory, when Ross was starting his career as an assistant professor at the University of Pennsylvania, that first sparked his interest in the line of research that would occupy him for the rest of his life. If the apparently simple and eminently plausible assumption of no arbitrage could crack the problem of option pricing, perhaps it could crack other problems in finance as well. In short order, Ross produced what he later called the fundamental theorem of asset pricing [7, p. 101], which linked the absence of arbitrage with the existence of a positive linear pricing rule [12, 15] (see Fundamental Theorem of Asset Pricing).

Perhaps the most important practical implication of this theorem is that it is possible to price assets that are not yet traded simply by reference to the price of assets that are already traded, and to do so without the need to invoke any particular theory of asset pricing. This opened the possibility of creating new assets, such as options, that would in practical terms “complete” markets, and so help move the economy closer to the ideal efficient frontier characterized by Kenneth Arrow (see Arrow, Kenneth) as a complete set of markets for state-contingent securities [11]. Here, in the abstract, is arguably the vision that underlies the entire field of financial engineering.

The general existence of a linear pricing rule has further implications that Ross would later group together in what he called the pricing rule representation theorem [7, p. 104]. Most important for practical purposes is the existence of positive risk-neutral probabilities and an associated riskless rate of interest, a feature first noted in [4, 5]. It is this general feature that makes it possible to model option prices by treating the underlying stock price as a binomial random variable in discrete time, as first introduced by Cox et al. [6] in an approach that is now ubiquitous in industry practice. It is this same general feature that makes it possible to characterize asset prices generally as following a martingale under the equivalent martingale measure [9], a characterization that is also now routine in financial engineering practice.

What is most remarkable about these consequences of the no arbitrage point of view is how little economics has to do with it. Ross, a trained economist (Harvard, PhD, 1969), might well have built a rather different career, perhaps in the area of agency theory where he made one of the early seminal contributions [10], but once he found finance he never looked back. (His subsequent involvement in agency theory largely focused on financial intermediation in a world with no arbitrage, as in [14, 18].)

When Ross was starting his career, economists had already begun making inroads into finance, and one of the consequences was the Sharpe–Lintner capital asset pricing model (CAPM) (see Modern Portfolio Theory). Ross [16] reinterpreted the CAPM as a possible consequence of no arbitrage and then proposed his own arbitrage pricing theory [13] as a more general consequence that would be true whenever asset prices were generated by a linear factor model such as

$R_i = E_i + \beta_{ij} f_j + \varepsilon_i, \quad i = 1, \ldots, n \qquad (1)$

where $E_i$ is the expected return on asset $i$, $f_j$ is an exogenous systematic factor, and $\varepsilon_i$ is the random noise.

In such a world, it follows from no arbitrage that the expected return on asset $i$, in excess of the risk-free rate of return $r$, is equal to a linear combination of the factor loadings $\beta_{ij}$:

$E_i - r = \lambda_j \beta_{ij} \qquad (2)$

This is the APT generalization of the CAPM security market line that connects the mean–variance of the market $(r_M, \sigma_M)$ to that of the risk-free asset $(r, 0)$. It also follows that the optimal portfolio choice for any agent can be characterized as a weighted sum of n mutual funds, one for each factor. This is the APT generalization of the CAPM two-fund separation theorem, and unlike CAPM it does not depend on any special assumptions about either utility functions or the stochastic processes driving asset returns. In a certain sense, it does not depend on economics.

We can understand the work of Cox et al. [1–3] as an attempt to connect the insights of no arbitrage back to economic “fundamentals”. “In work on contingent claims analysis, such as option pricing, it is common, and to a first approximation reasonable, to insist only on a partial equilibrium between the prices of the primary and derivative assets. For something as fundamental as the rate of interest, however, a general equilibrium model is to be preferred” [1, p. 773]. They produce a general equilibrium model driven by a k-dimensional vector of state variables, but are forced to specialize the model considerably in order to achieve definite results for the dynamics of interest rates and the term structure. Here, more than anywhere else in Ross’s wide-ranging work, we see the tension between the methodologies of economics and finance. It is this experience, one supposes, that lies behind his subsequent defense of the “isolated and eccentric tradition” that is unique to finance [17, p. 34]. The tradition to which he refers is the practice of approaching financial questions from the perspective of no arbitrage, without the apparatus of utility and production functions and without demand and supply.

Not content with having established the core principles and fundamental results of the no arbitrage approach to finance, Ross devoted his subsequent career to making sure that the significance and wide applicability of these results was appreciated by both academicians and practitioners. Toward that end, his own voluminous writings have been multiplied by the work of the many students whom he trained at the University of Pennsylvania, then Yale, and then MIT [8].

References

[1] Cox, J.C., Ingersoll Jr, J.E. & Ross, S. (1981). A re-examination of traditional hypotheses about the term structure of interest rates, Journal of Finance 36(4), 769–799.
[2] Cox, J.C., Ingersoll Jr, J.E. & Ross, S. (1985a). An intertemporal general equilibrium model of asset prices, Econometrica 53(2), 363–384.
[3] Cox, J.C., Ingersoll Jr, J.E. & Ross, S. (1985b). A theory of the term structure of interest rates, Econometrica 53(2), 385–407.
[4] Cox, J.C. & Ross, S.A. (1976a). The valuation of options for alternative stochastic processes, Journal of Financial Economics 3, 145–166.
[5] Cox, J.C. & Ross, S.A. (1976b). A survey of some new results in financial option pricing theory, Journal of Finance 31(2), 383–402.
[6] Cox, J.C., Ross, S.A. & Rubinstein, M. (1979). Option pricing: a simplified approach, Journal of Financial Economics 7, 229–263.
[7] Dybvig, P.H. & Ross, S.A. (1987). Arbitrage, in New Palgrave, A Dictionary of Economics, J. Eatwell, M. Milgate & P. Newman, eds, Macmillan, London, pp. 100–106.
[8] Grinblatt, M. (ed) (2008). Stephen A. Ross, Mentor: Influence Through Generations, McGraw Hill, New York.
[9] Harrison, J.M. & Kreps, D. (1979). Martingales and arbitrage in multiperiod securities markets, Journal of Economic Theory 20(3), 381–408.
[10] Ross, S.A. (1973). The economic theory of agency: the principal’s problem, American Economic Review 63(2), 134–139.
[11] Ross, S.A. (1976a). Options and efficiency, Quarterly Journal of Economics 90(1), 75–89.
[12] Ross, S.A. (1976b). Return, risk, and arbitrage, in Risk and Return in Finance, I. Friend & J. Bicksler, eds, Ballinger, Cambridge, pp. 189–217.
[13] Ross, S.A. (1976c). The arbitrage theory of capital asset pricing, Journal of Economic Theory 13, 341–360.
[14] Ross, S.A. (1977). The determination of financial structure: the incentive-signalling approach, Bell Journal of Economics 8(1), 23–40.
[15] Ross, S.A. (1978b). A simple approach to the valuation of risky streams, Journal of Business 51(3), 453–475.
[16] Ross, S.A. (1982). On the general validity of the mean-variance approach in large markets, in Financial Economics: Essays in Honor of Paul Cootner, W. Sharpe & P. Cootner, eds, Prentice-Hall.
[17] Ross, S.A. (1987). The interrelations of finance and economics: theoretical perspectives, American Economic Review 77(2), 29–34.
[18] Ross, S.A. (2004). Markets for agents: fund management, in The Legacy of Fischer Black, B.N. Lehman, ed, Oxford University Press.

Further Reading

Ross, S.A. (1974). Portfolio Turnpike theorems for constant policies, Journal of Financial Economics 1, 171–198.
Ross, S.A. (1978a). Mutual fund separation in financial theory: the separating distributions, Journal of Economic Theory 17(2), 254–286.

Related Articles

Arbitrage: Historical Perspectives; Arbitrage Pricing Theory; Black, Fischer; Equivalent Martingale Measures; Martingale Representation Theorem; Option Pricing Theory: Historical Perspectives; Risk-neutral Pricing.

PERRY MEHRLING
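
As a numerical illustration of the arbitrage pricing relation in equation (2) of the Ross entry, the following minimal sketch computes expected returns from factor loadings and factor risk premia. It assumes that the right-hand side of equation (2) is summed over the factors j, which the printed equation does not show explicitly. The function name `apt_expected_returns`, the argument names, and all numbers are hypothetical and are not taken from the entry.

```python
def apt_expected_returns(risk_free, lambdas, betas):
    """Return E_i = r + sum_j lambda_j * beta_ij for every asset i.

    risk_free -- the risk-free rate r
    lambdas   -- one risk premium lambda_j per factor (assumed values)
    betas     -- one row of factor loadings beta_ij per asset
    """
    expected = []
    for beta_row in betas:
        if len(beta_row) != len(lambdas):
            raise ValueError("each asset needs one loading per factor")
        # Excess return is a linear combination of the factor loadings.
        premium = sum(lam * beta for lam, beta in zip(lambdas, beta_row))
        expected.append(risk_free + premium)
    return expected


if __name__ == "__main__":
    r = 0.02                      # hypothetical risk-free rate
    lambdas = [0.05, 0.01]        # hypothetical premia for two factors
    betas = [
        [1.0, 0.5],               # asset 1
        [0.0, 0.0],               # asset 2: no factor exposure
        [1.2, -0.3],              # asset 3
    ]
    for i, e in enumerate(apt_expected_returns(r, lambdas, betas), start=1):
        print(f"asset {i}: expected return {e:.4f}")
```

An asset with zero loading on every factor earns the risk-free rate in this sketch, which is the sense in which equation (2) generalizes the CAPM security market line.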
Fisher, Irving

The American economist Irving Fisher (born 1867, died 1947) advanced the use of formal mathematical and statistical techniques in economics and finance, both in his own pioneering research in monetary and capital theory and in his roles as a mentor to a handful of talented doctoral students and as founding president of the Econometric Society. As an undergraduate and a graduate student at Yale University, Fisher studied with the physicist J. Willard Gibbs and the economist and sociologist William Graham Sumner. Fisher’s 1891 doctoral dissertation in economics and mathematics, Mathematical Investigations in the Theory of Value and Prices (reprinted in [12], Vol. 1), was the first North American use of general equilibrium analysis, and indeed an independent rediscovery of general equilibrium, because Fisher did not read the works of Léon Walras and F.Y. Edgeworth until his thesis was nearly completed. To accompany this thesis, Fisher constructed a hydraulic mechanism to simulate the determination of equilibrium prices and quantities, a remarkable achievement in the days before electronic computers (see Brainard and Scarf in [5] and Schwalbe in [14]). Initially appointed to teach mathematics at Yale, Fisher soon switched to political economy, teaching at Yale until he retired in 1935. Stricken with tuberculosis in 1898, Fisher was on leave for three years, and did not resume a full teaching load until 1903. This ordeal turned Fisher into a relentless crusader for healthier living and economic reforms, dedicated to improving the world and confident of overcoming adversity and daunting obstacles [1, 5, 14]. As a scientific economist and as a reformer, Fisher was a brilliant and multifaceted innovator, but he never managed to pull his ideas together in a grand synthesis.

In The Nature of Capital and Income, Fisher [7] popularized the concept of net present value, viewing capital as the present discounted value of an expected income stream. Controversially, Fisher excluded saving from his definition of income, and advocated a spending tax instead of a tax on income as usually defined. Since saving is the acquisition of assets whose market value is the net present value of the expected taxable income from owning the assets, a tax on income (as usually defined) would involve double taxation and would introduce a distortion favoring consumption at the expense of saving, a view now increasingly held by economists. Fisher [7] also discussed the pricing and allocation of risk in financial markets, using a “coefficient of caution” to represent subjective attitudes to risk tolerance [2, 3, 18]. In The Rate of Interest, Fisher [8] drew on the earlier work of John Rae and Eugen von Böhm-Bawerk to examine how intertemporal allocation and the real interest rate depend on impatience (time preference) and opportunity to invest (expected rate of return over cost). He illustrated this analysis with the celebrated “Fisher diagram” showing optimal smoothing of consumption over two periods. According to the “Fisher separation theorem,” the time pattern of consumption is independent of the time pattern of income (assuming perfect credit markets), because the net present value of expected lifetime income is the relevant budget constraint for consumption and saving decisions, rather than income in a particular period. Fisher’s analysis of consumption smoothing across time periods provided the basis for later permanent-income and life-cycle models of consumption, and was extended by others to consumption smoothing across possible states of the world. John Maynard Keynes later identified his concept of the marginal efficiency of capital with Fisher’s rate of return over costs.

Fisher’s Appreciation and Interest [6] presented the “Fisher equation,” decomposing nominal interest into real interest and expected inflation, formalizing and expounding an idea that had been briefly noted by, among others, John Stuart Mill and Alfred Marshall. With $i$ as the nominal interest rate, $j$ as the real interest rate, and $a$ as the expected rate of appreciation of the purchasing power of money ([6] appeared at the end of two decades of falling prices),

$(1 + j) = (1 + a)(1 + i) \qquad (1)$

in Fisher’s notation. This analysis of the relationship between interest rates expressed in two different standards (money and goods, gold and silver, dollars and pounds sterling) led Fisher [6] to uncovered interest parity (the difference between nominal interest rates in two currencies is the expected rate of change of the exchange rate) and to a theory of the term structure of interest rates as reflecting expectations about future changes in the purchasing power of money. In later work (see [12], Vol. 9), Fisher correlated nominal interest with a distributed lag of past price level changes, deriving expected inflation adaptively from past inflation. Distributed lags were introduced into economics by Fisher, who was also among the first economists to use correlation analysis. Long after Fisher’s death, his pioneering 1926 article [10], correlating unemployment with a distributed lag of inflation, was reprinted in 1973, under the title “I Discovered the Phillips Curve.”

In The Purchasing Power of Money, Fisher [13] upheld the quantity theory of money, arguing that changes in the quantity of money affect real output and real interest during adjustment periods of up to 10 years, but affect only nominal variables in the long run. He extended the quantity theory’s equation of exchange to include bank deposits:

$MV + M'V' = PT \qquad (2)$

where $M$ is currency, $M'$ bank deposits, $V$ and $V'$ the velocities of circulation of currency and bank deposits, respectively, $P$ the price level, and $T$ an index of the volume of transactions. Fisher attributed economic fluctuations to the slow adjustment of nominal interest to monetary shocks, resulting from what he termed “the money illusion” in the title of a 1928 book (in [12], Vol. 8). The economy would be stable if, instead of pegging the dollar price of gold, monetary policy followed Fisher’s “compensated dollar” plan of regularly varying the price of gold to target an index number of prices. Inflation targeting is a modern version of Fisher’s proposed price level target (without attempting a variable peg of the price of gold, which would have made Fisher’s plan vulnerable to speculative attacks). Failing to persuade governments to stabilize the purchasing power of money, Fisher attempted to neutralize the effects of price level changes by advocating the creation of indexed financial instruments, persuading Rand Kardex (later Remington Rand) to issue the first indexed bond (see [12], Vol. 8). Fisher tried to educate the public against money illusion, publishing a weekly index of wholesale prices calculated by an index number institute operating out of his house in New Haven, Connecticut. Indexed bonds, the compensated dollar, statistical verification of the quantity theory, and eradication of money illusion all called for a measure of the price level. In The Making of Index Numbers, Fisher [9] argued that a simple formula, the geometric mean of the Laspeyres (base-year weighted) index and the Paasche (current-year weighted) index, was the best index number for that and all other purposes, as it came closer than any other formula to satisfying seven tests for such desirable properties as determinateness, proportionality, and independence of the units of measurement. Later research demonstrated that no formula can satisfy more than six of the seven tests, although which one should be dropped remains an open question. Three quarters of a century later, the “Fisher ideal index” began to be adopted by governments.

Beyond his own work, Fisher encouraged quantitative research by others, notably through Yale dissertations by J. Pease Norton [16] and Chester A. Phillips [17] and through his role as founding president of the Econometric Society. Norton’s Statistical Studies in the New York Money Market is now recognized as a landmark in time-series analysis, while Phillips’s Bank Credit (together with later work by Fisher’s former student James Harvey Rogers) analyzed the creation and absorption of bank deposits by the banking system [4]. Arguing that fluctuations in the purchasing power of money make money and bonds risky assets, contrary to the widespread “money illusion,” Fisher and his students advocated common stocks as a long-term investment, with the return on stocks more than compensating for their risk, once risk is calculated in real rather than in nominal terms.

Fisher was swept up in the “New Economy” rhetoric of the 1920s stock boom. He promoted several ventures, of which by far the most successful was his “Index Visible,” a precursor of the Rolodex. Fisher sold Index Visible to Rand Kardex for shares and stock options, which he exercised with borrowed money. In mid-1929, Fisher’s net worth was 10 million dollars. Had he died then, he would have been remembered like Keynes as a financial success as well as a brilliant theorist; however, a few years later, Fisher’s debts exceeded his assets by a million dollars, a loss of 11 million dollars, which, as John Kenneth Galbraith remarked, was “a substantial sum of money, even for a professor of economics” [1, 3]. Worst of all for his public and professional reputation, Fisher memorably asserted in October 1929, on the eve of the Wall Street crash, that stock prices appeared to have reached a permanently high plateau. McGrattan and Prescott [15] hold that Fisher was right to deny that stocks were overvalued in 1929 given the price/earnings multiples of the time. Whether or not Fisher could reasonably be faulted for not predicting the subsequent errors of public policy that converted the downturn into the Great Depression, and even though many others were just as mistaken about the future course of stock prices, Fisher’s mistaken prediction was particularly pithy, quotable, and memorable, and his reputation suffered as severely as his personal finances. Fisher’s 1933 article on “The Debt-Deflation Theory of Great Depressions” [11], linking the fragility of the financial system to the nonneutrality of inside nominal debt whose real value grew as the price level fell, was much later taken up by such economists as Hyman Minsky, James Tobin, Ben Bernanke, and Mervyn King [5, 14], but in the 1930s Fisher had lost his audience.

Fisher’s 1929 debacle (together with his enthusiastic embrace of causes ranging from a new world map projection, the unhealthiness of smoking, and the usefulness of mathematics in economics, through the League of Nations, universal health insurance, and a low-protein diet to, more regrettably, prohibition and eugenics) long tarnished his public and professional reputation, but he has increasingly come to be recognized as a great figure in the development of theoretical and quantitative economics, including financial economics.

References

[1] Allen, R.L. (1993). Irving Fisher: A Biography, Blackwell, Cambridge, MA.
[2] Crockett, J.H. Jr. (1980). Irving Fisher on the financial economics of uncertainty, History of Political Economy 12, 65–82.
[3] Dimand, R. (2007). Irving Fisher and financial economics: the equity premium puzzle, the predictability of stock prices, and intertemporal allocation under risk, Journal of the History of Economic Thought 29, 153–166.
[4] Dimand, R. (2007). Irving Fisher and his students as financial economists, in Pioneers of Financial Economics, G. Poitras, ed., Edward Elgar, Cheltenham, UK, Vol. 2, pp. 45–59.
[5] Dimand, R. & Geanakoplos, J. (eds) (2005). Celebrating Irving Fisher, Blackwell, Malden, MA.
[6] Fisher, I. (1896). Appreciation and Interest, Macmillan for American Economic Association, New York (reprinted in Fisher [12], 1).
[7] Fisher, I. (1906). The Nature of Capital and Income, Macmillan, New York (reprinted in Fisher [12], 2).
[8] Fisher, I. (1907). The Rate of Interest, Macmillan, New York (reprinted in Fisher [12], 3).
[9] Fisher, I. (1922). The Making of Index Numbers, Houghton Mifflin, Boston (reprinted in Fisher [12], 7).
[10] Fisher, I. (1926). A statistical relation between unemployment and price changes, International Labour Review 13, 785–792. Reprinted (1973) as Lost and found: I discovered the Phillips curve – Irving Fisher, Journal of Political Economy 81, 496–502.
[11] Fisher, I. (1933). The debt-deflation theory of great depressions, Econometrica 1, 337–357 (reprinted in Fisher [12], 10).
[12] Fisher, I. (1997). The Works of Irving Fisher, W.J. Barber, ed, Pickering & Chatto, London.
[13] Fisher, I. & Brown, H.G. (1911). The Purchasing Power of Money, Macmillan, New York (reprinted in Fisher [12], 4).
[14] Loef, H. & Monissen, H. (eds) The Economics of Irving Fisher, Edward Elgar, Cheltenham, UK.
[15] McGrattan, E. & Prescott, E. (2004). The 1929 stock market: Irving Fisher was right, International Economic Review 45, 991–1009.
[16] Norton, J.P. (1902). Statistical Studies in the New York Money Market, Macmillan, New York.
[17] Phillips, C. (1920). Bank Credit, Macmillan, New York.
[18] Stabile, D. & Putnam, B. (2002). Irving Fisher and statistical approaches to risk, Review of Financial Economics 11, 191–203.

ROBERT W. DIMAND
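
Two of the formulas described in the Fisher entry lend themselves to a short worked example: the Fisher equation (1 + j) = (1 + a)(1 + i), and the "Fisher ideal index" defined as the geometric mean of the Laspeyres and Paasche indexes. The sketch below is only an illustration under stated assumptions. The function names, the two-good basket, and all prices, quantities, and rates are hypothetical and do not come from the entry.

```python
from math import sqrt


def real_rate(nominal_rate, money_appreciation):
    """Solve (1 + j) = (1 + a)(1 + i) for the real rate j.

    nominal_rate       -- i, the nominal interest rate
    money_appreciation -- a, expected appreciation of money's purchasing power
    """
    return (1.0 + money_appreciation) * (1.0 + nominal_rate) - 1.0


def laspeyres(p0, p1, q0):
    """Price index weighted by base-year quantities q0."""
    return sum(p * q for p, q in zip(p1, q0)) / sum(p * q for p, q in zip(p0, q0))


def paasche(p0, p1, q1):
    """Price index weighted by current-year quantities q1."""
    return sum(p * q for p, q in zip(p1, q1)) / sum(p * q for p, q in zip(p0, q1))


def fisher_ideal(p0, p1, q0, q1):
    """Geometric mean of the Laspeyres and Paasche indexes."""
    return sqrt(laspeyres(p0, p1, q0) * paasche(p0, p1, q1))


if __name__ == "__main__":
    # Hypothetical two-good economy: base-year and current-year data.
    p0, p1 = [1.0, 2.0], [1.5, 2.0]
    q0, q1 = [10.0, 5.0], [8.0, 6.0]
    print("Laspeyres:", laspeyres(p0, p1, q0))
    print("Paasche:  ", paasche(p0, p1, q1))
    print("Fisher:   ", fisher_ideal(p0, p1, q0, q1))
    # With prices expected to fall (a > 0), the real rate exceeds the nominal rate.
    print("Real rate:", real_rate(nominal_rate=0.04, money_appreciation=0.02))
```

In this example buyers shift away from the good whose price rose, so the Laspeyres index lies above the Paasche index and the Fisher index falls between the two.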
Modigliani, Franco

An Italian-born economist who fled the fascist regime of Benito Mussolini at the outbreak of WWII, Modigliani pursued the study of economics at the New School for Social Research (renamed New School University) in New York, where he received his doctorate in 1944. He taught at several universities but, from 1962 on, he stayed at the Massachusetts Institute of Technology. His famous dissertation on the Keynesian system served as a springboard for many of his lifetime contributions, which include stabilization policies, the FRB–MIT–Penn–SSRC Model (MPS), the Modigliani–Miller (M&M) theorem (see Modigliani–Miller Theorem) and the life cycle hypothesis (LCH). Modigliani was awarded the Nobel Memorial Prize in economics in 1985 for research in the latter two areas.

Modigliani contributed to making the disciplines of financial economics and macroeconomics operational, and thus more quantitative from a neoclassical perspective. The influence of his teachers, particularly J. Marschak and A. Wald, is seen in his quantitative MPS model based on Keynesian economic thought and his M&M hypothesis in financial economics. The macroeconomic framework that Modigliani built emphasized the savings, consumption, investment, and liquidity components of the Keynesian model. He explained the anomalous fluctuations of the savings ($S$) to income ($Y$) ratio during the 1940s and 1950s. He explained the $S/Y$ ratio by the relative position in the income distribution of individuals, and by secular and cyclical changes in income ([3], Vol. 2). The secular changes represent differences in real income per capita above the highest level reached in any preceding year, signifying his contribution to the relative income hypothesis in consumption theory. The cyclical changes represent variation in money income measured by an index, $(Y_t - Y_t^0)/Y_t$, where $Y_t$ is real income per capita in current time, and $Y_t^0$ is the past peak level of such income. He estimated that the secular and the cyclical effects on income were approximately 0.1% and 0.125%, respectively. These coefficients translate to an $S/Y$ ratio of about 11.7%. Klein and Ozmucur [1] revisited Modigliani’s $S/Y$ specification with a much larger sample size and were able to reaffirm the robustness of the model.

In 1954, Modigliani laid the groundwork for the now-famous life cycle hypothesis (LCH) ([5], Vol. 6, pp. 3–45). The LCH bracketed broader macroeconomic problems such as why $S/Y$ is larger in rich countries than in poor countries; why $S$ is greater for farm families than urban families; why lower status urban families save less than other urban families; why when a higher future income is expected, more of current income will be consumed now; why in countries with rising income that is expected to continue to increase, $S/Y$ will be smaller; and why property income that mostly accrues to the rich is largely saved, whereas wages that are mostly earned by the poor are largely spent. To answer these questions, the LCH model maintains the relative income concept of the early $S/Y$ model. The income concept is, however, more encompassing in being high or low relative to the individual’s lifetime or permanent income, marking Modigliani’s contribution to the permanent income hypothesis in consumption theory. The LCH captures how individuals save when they are young, spend when they are old, and make bequests to their children. In that scenario, consumption, $C$, is uniform over time, $T$, or $C(T) = (N/L)Y$, where $L$ is the number of years the representative individual lives; $N < L$ is the number of years the individual earns labor income, and $Y$ is average income. Average income is represented by a flat line, $Y(T)$, up to $N$, which falls to zero after $N$, when the individual retires. Since income is earned for $N$ periods, lifetime income is $NY$, and savings is defined as the excess of $Y(T)$ over $C(T)$.

The empirical estimate of the LCH included a wealth-effect variable on consumption. Saving during an individual’s early working life is one way in which wealth accumulates. Such an accumulation of wealth reaches a peak during the person’s working age when income is highest. Individuals also inherit wealth. If the initial stock of wealth is $A_0$, then, at a certain age, $\tau$, a person’s consumption can be expressed as $(L - \tau)C = A + (N - \tau)Y$. Thus, we have a model of consumption explained by income and wealth or assets that can be confronted with data. An early estimate of the coefficient of this LCH model yielded $C = 0.76Y + 0.073A$ (Modigliani, ibid., 70). The result reconciled an early controversy that the short-run propensity to consume from income was between 70% and 80%, and the long-run propensity was approximately 100%. The
reconciliation occurs because the short-run marginal propensity to consume (MPC) is 0.766, and assuming assets, A, are approximately five times income, while labor income is approximately 80% of income, then a long-run MPC is approximately 0.98, obtained as 0.8(0.76) + 5(0.073).

Modigliani’s largest quantitative effort was the MPS model. Working with the Board of Governors of the Federal Reserve Banks (FRB) and the Social Science Research Council (SSRC), Modigliani built the MIT–Penn–SSRC (MPS) econometric model in the 1960s. The 1968 version, which had 171 endogenous and 119 exogenous variables, predicted poorly in the 1970s and 1980s. In 1996, the FRB/US model replaced the MPS by incorporating rational and vector autoregression types of expectations with a view to improving forecasts. The financial sector was the dominant module in the MPS model. The net worth of consumers took the form of the real value of money and debt. The demand for money depended on the nominal interest rate and the current value of output. Unborrowed reserves influenced the short-term money rate of interest and the nominal money supply, and through the term structure effect, the short-term rate affected the long-term rate and hence savings, which is essential for the expansion of output and employment. Out of this process came the following two fitted demand and supply equations that characterized the financial sector:

$$M_d = -0.0021\,iY - 0.0043\,r_s Y + 0.542\,Y + 0.0046\,NP + 0.833\,M_{d,t-1} \tag{1}$$

$$\begin{aligned}
FR = {} & (0.001 - 0.00204\,S_2 - 0.00237\,S_3 - 0.00223\,S_4)\,D_{t-1} + 0.00122\,i\,D_{t-1} \\
& + 0.00144\,d\,dD_{t-1} + 0.646\,(1-\delta)\,RU \\
& - 0.502\,\delta\,CL + 0.394\,RD + 0.705\,FR_{t-1}
\end{aligned} \tag{2}$$

where M_d is demand for deposits held by the public, Y is gross national product (GNP), r_s is the savings deposit rate, i is the available return on short-term assets, P is expected profits, FR is free reserves, S_i are seasonal adjustments, D is the expected value of the stock of member banks’ deposits, RU is unborrowed reserves, CL is commercial loans, RL is a reserve release term, and δ is a constant. The equations indicate that the causal link from unborrowed reserves to GNP works through lags, causing delayed responses to policy measures.

Another of Modigliani’s noteworthy contributions to quantitative analysis is the Modigliani and Miller (M&M) theorem [6], which has created a revolution in corporate finance equivalent to the revolution in portfolio theory by H. Markowitz and W. Sharpe. The M&M hypothesis stands on two major propositions, namely that “. . . market value of any firm is independent of its capital structure and is given by capitalizing its expected return at the rate ρk appropriate to its class,” and that “the average cost of capital to any firm is completely independent of the capital structure and is equal to the capitalization rate of a pure equity stream of its class” (Italics original) ([4], Vol. 3, 10–11). The M&M model can be demonstrated for a firm with no growth, no new net investment, and no taxes. The firm belongs to a risk group whose shares can be substituted for one another.

The value of the firm can be written as V_j ≡ S_j + D_j = X̄_j/ρ_j, where X̄_j measures expected return on assets, ρ_j measures the interest rate for a given risk class, D_j is the market value of bonds, and S_j is the market value of stocks. For instance, if the earnings before interest and taxes (EBIT) are $5000 and if the low-risk interest is 10%, then the value of the firm is $5000/0.10 = $50 000.

The proposition of the M&M hypothesis is often expressed as an invariance principle based on the idea that the value of a firm is independent of how it is financed. The proof of this invariance is based on arbitrage. As stated by Modigliani, “. . . an investor can buy and sell stocks and bonds in such a way as to exchange one income stream for another . . . the value of the overpriced shares will fall and that of the underpriced shares will rise, thereby tending to eliminate the discrepancy between the market values of the firms” (ibid., p. 11). For example, an investor can get a 6% return either by holding the stocks of an unlevered firm (0.06X1), or holding the stocks and debts of a levered firm, that is, [0.06(X2 − rD2) of stocks + 0.06rD2 of debts], where the subscripts refer to firms, X is stock, D is debt, and r is return.

The M&M hypothesis was a springboard for many new works in finance. A first extension of the model by the authors reflected the effect of corporate taxes. Further analysis incorporating the effects of
personal and corporate income taxes does not change the value of the firm because both personal and corporate tax rates tend to cancel out. Researchers dealt with questions that arise when the concept of risk class used in the computation of a firm’s value is replaced with perfect market assumptions, and when mean–variance models are used instead of arbitrage. The value of the firm was also found to be independent of dividend policy. By changing the discount rate for the purpose of calculating a firm’s present value, it was found that bankruptcy can have an effect on the value of a firm. Macroeconomic variables such as the inflation rate can result in the underestimation of the value of a firm’s equity.

The M&M theorem has been extended into many areas of modern research. It supports the popular Black–Scholes capital structure model. It has been used to validate the effect of the Tax Reform Act of 1986 on values of the firm. Modern capital asset pricing model (CAPM) scholars such as Sharpe (Sharpe, William F.), J. Lintner, and J. Treynor [2] were influenced by the M&M result in the construction of their financial models and ratios.

On a personal level, Modigliani was an outstandingly enthusiastic, passionate, relentless, and focus-driven teacher and exceptional researcher whose arena was both economic theory and the real empirical world.

References

[1] Klein, L.R. & Ozmucur, S. (2005). The Wealth Effect: A Contemporary Update, paper presented at the New School University.
[2] Mehrling, P. (2005). Fischer Black and the Revolutionary Idea of Finance, John Wiley & Sons, Inc, Hoboken.
[3] Modigliani, F. (1980). Fluctuations in the saving-income ratio: a problem in economic forecasting, in The Collected Papers of Franco Modigliani, The Life Cycle Hypothesis of Savings, A. Abel & S. Johnson, eds, The MIT Press, Cambridge, MA, Vol. 2.
[4] Modigliani, F. (1980). The cost of capital, corporate finance and the theory of investment, in The Collected Papers of Franco Modigliani, The Theory of Finance and Other Essays, A. Abel, ed., The MIT Press, Cambridge, MA, Vol. 3.
[5] Modigliani, F. (2005). Collected Papers of Franco Modigliani, F. Modigliani, ed., The MIT Press, Cambridge, MA, Vol. 6.
[6] Modigliani, F. & Miller, M. (1958). The cost of capital, corporation finance and the theory of investment, American Economic Review 48(3), 261–297.

Further Reading

Modigliani, F. (2003). The Keynesian Gospel according to Modigliani, The American Economist 47(1), 3–24.
Ramrattan, L. & Szenberg, M. (2004). Franco Modigliani 1918–2003, in memoriam, The American Economist 43(1), 3–8.
Szenberg, M. & Ramrattan, L. (2008). Franco Modigliani, A Mind That Never Rests with a Foreword by Robert M. Solow, Palgrave Macmillan, Houndmills, Basingstoke and New York.

Related Articles

Modigliani–Miller Theorem.

MICHAEL SZENBERG & LALL RAMRATTAN
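
The arithmetic quoted in the Modigliani article (the long-run MPC of the LCH estimate, the capitalized value of a firm, and the 6% holding comparison) can be checked with a short script. This is only a sketch under stated assumptions: the function names are hypothetical, the 0.06 is read as the fraction of the firm's securities held, and the interest rate and debt level in the usage note are made-up illustration values, not figures from the article.

```python
"""Illustrative check of the arithmetic quoted in the article (not the authors' code)."""


def long_run_mpc(c_income, c_assets, labor_share=0.8, asset_income_ratio=5.0):
    """Long-run MPC when labor income is labor_share of income and A = ratio * Y."""
    return labor_share * c_income + asset_income_ratio * c_assets


def firm_value(expected_earnings, capitalization_rate):
    """M&M value of the firm, V = X / rho, whatever the debt/equity split."""
    return expected_earnings / capitalization_rate


def levered_income(fraction, earnings, interest_rate, debt):
    """Income from holding the same fraction of a levered firm's stock and debt."""
    from_stock = fraction * (earnings - interest_rate * debt)
    from_debt = fraction * interest_rate * debt
    return from_stock + from_debt


def unlevered_income(fraction, earnings):
    """Income from holding a fraction of an unlevered firm's stock."""
    return fraction * earnings


if __name__ == "__main__":
    # The text quotes both 0.766 and 0.76 for the income coefficient.
    print(round(long_run_mpc(0.766, 0.073), 3))
    print(round(long_run_mpc(0.76, 0.073), 3))
    # EBIT of $5000 capitalized at 10%.
    print(firm_value(5000, 0.10))
    # Hypothetical levered firm: same earnings, 5% interest on 20000 of debt.
    print(levered_income(0.06, 5000, 0.05, 20000), unlevered_income(0.06, 5000))
```

With the coefficient 0.766 the long-run MPC comes out near 0.978, and with 0.76 near 0.973, so the quoted 0.98 matches the three-decimal coefficient more closely. The last line shows the invariance argument: the interest term cancels, so the levered and unlevered holdings pay the same income for any debt level.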
Arrow, Kenneth

Most financial decisions are made under conditions of uncertainty. Yet a formal analysis of markets under uncertainty emerged only recently, in the 1950s. The matter is complex as it involves explaining how individuals make decisions when facing uncertain situations, the behavior of market instruments such as insurance, securities, and their prices, the welfare properties of the distribution of goods and services under uncertainty, and how risks are shared among the traders. It is not even obvious how to formulate market clearing under conditions of uncertainty. A popular view in the middle of the last century was that markets would only clear on the average and asymptotically in large economies (see end note a). This approach was a reflection of how insurance markets work, and followed a notion of actuarially fair trading.

A different formulation was proposed in the early 1950s by Arrow and Debreu [10, 12, 30]. They introduced an economic theory of markets in which the treatment of uncertainty follows basic principles of physics. The contribution of Arrow and Debreu is as fundamental as it is surprising. For Arrow and Debreu, markets under uncertainty are formally identical to markets without uncertainty. In their approach, uncertainty all but disappears (see end note b).

It may seem curious to explain trade with uncertainty as though uncertainty did not matter. The disappearing act of the issue at stake is an unusual way to think about financial risk, and how we trade when facing such risks. But the insight is valuable. Arrow and Debreu produced a rigorous, consistent, general theory of markets under uncertainty that inherits the most important properties of markets without uncertainty. In doing so, they forced us to clarify what is intrinsically different about uncertainty.

This article summarizes the theory of markets under uncertainty that Arrow and Debreu created, including critical issues that arise from it, and also its legacy. It focuses on the way Arrow introduced securities: how he defined them and the limits of his theory. It mentions the theory of insurance that Arrow pioneered together with Malinvaud and others [6], as well as the theory of risk bearing that Arrow developed on the basis of expected utility [7], following the axioms of Von Neumann and Morgenstern [41], Herstein and Milnor [33], De Groot [31], and Villegas [40]. The legacy of Arrow’s work is very extensive and some of it surprising. This article describes his legacy along three lines: (i) individual and idiosyncratic risks, (ii) rare risks and catastrophic events, and (iii) endogenous uncertainty.

Biographical Background

Kenneth Joseph Arrow is an American economist and joint winner of the Nobel Memorial Prize in Economics with John Hicks in 1972. Arrow taught at Stanford University and Harvard University. He is one of the founders of modern (post World War II) economic theory, and one of the most important economists of the twentieth century. For a full biographical note, the reader is referred to [18]. Born in 1921 in New York City to Harry and Lilian Arrow, Kenneth was raised in the city. He graduated from Townsend Harris High School and earned a bachelor’s degree from the City College of New York studying under Alfred Tarski. After graduating in 1940, he went to Columbia University and after a hiatus caused by World War II, when he served with the Weather Division of the Army Air Forces, he returned to Columbia University to study under the great statistician Harold Hotelling. He received a master’s degree in 1941 studying under A. Wald, who was the supervisor of his master’s thesis on stochastic processes. From 1946 to 1949 he spent his time partly as a graduate student at Columbia and partly as a research associate at the Cowles Commission for Research in Economics at the University of Chicago; it was in Chicago that he met his wife Selma Schweitzer. During that time, he also held the position of Assistant Professor of Economics at the University of Chicago. Initially interested in following a career as an actuary, in 1951 he earned his doctorate in economics from Columbia University working under the supervision of Harold Hotelling and Albert Hart. His published work on risk started in 1951 [3]. In developing his own approach to risk, Arrow grappled with the ideas of Shackle [39], Knight [35], and Keynes [34] among others, seeking and not always finding a rigorous mathematical foundation. His best-known works on financial markets date back to 1953 [4, 5]. These works provide a solid foundation based on the
role of securities in the allocation of risks [4, 5, 7, 9, 10]. His approach can be described as a state contingent security approach to the allocation of risks in an economy, and is largely an extension of the same approach he followed in his work on general equilibrium theory with Gerard Debreu, for which he was awarded the Nobel Prize in 1972 [8]. Nevertheless, his work connects also with social issues of risk allocation and with the French literature of the time, especially [1, 2].

Markets under Uncertainty

The Arrow–Debreu theory conceptualizes uncertainty with a number of possible states of the world s = 1, 2, . . . that may occur. Commodities can be in one of several states, and are traded separately in each of the states of nature. In this theory, one does not trade a good, but a “contingent good”, namely, a good in each state of the world: apples when it rains and apples when it shines [10, 12, 30]. This way the theory of markets with N goods and S states of nature is formally identical to the theory of markets without uncertainty but with N × S commodities. Traders trade “state contingent commodities”. This simple formulation allows one to apply the results of the theory of markets without uncertainty to markets with uncertainty. One recovers most of the important results such as (i) the existence of a market equilibrium and (ii) the “invisible hand theorem” that establishes that market solutions are always Pareto efficient. The approach is elegant, simple, and general.

Along with its elegance and simplicity, the formulation of this theory can be unexpectedly demanding. It requires that we all agree on all the possible states of the world that describe “collective uncertainty”, and that we trade accordingly. This turns out to be more demanding than it seems: for example, one may need to have one market for apples when it rains and a separate one for when it does not, with separate market prices for each case. The assumption requires N × S markets to guarantee market efficiency, a requirement that in some cases militates against the applicability of the theory. In a later article, Arrow simplified the demands of the theory and reduced the number of markets needed for efficiency by defining “securities”, which are different payments of money exchanged among the traders in different states of nature [4, 5]. This new approach no longer requires trading “contingent” commodities but rather trading a combination of commodities and securities. Arrow proves that by trading commodities and securities, one can achieve the same results as trading state contingent commodities [4, 5]. Rather than needing N × S markets, one needs fewer markets, namely, N markets for commodities and S − 1 markets for securities. This approach was a great improvement and led to the study of securities in a rigorous and productive manner, an area in which his work has left a large legacy. The mathematical requirement to reach Pareto efficiency was simplified gradually to require that the securities traded should provide for each trader a set of choices with the same dimension as the original state contingent commodity approach. When this condition is not satisfied, the markets are called “incomplete”. This led to a large literature on incomplete markets, for example, [26, 32], in which Pareto efficiency is not assured, and government intervention may be required, an area that exceeds the scope of this article.

Individual Risk and Insurance

The Arrow–Debreu theory is not equally well suited for all types of risks. In some cases, it could require an unrealistically large number of markets to reach efficient allocations. A clear example of this phenomenon arises for those risks that pertain to one individual at a time, called individual risks, which are not readily interpreted as states of the world on which we all agree and are willing to trade. Individuals’ accidents, illnesses, deaths, and defaults are frequent and important risks that fall under this category. Arrow [6] and Malinvaud [37] showed how individual uncertainty can be reformulated or reinterpreted as collective uncertainty. Malinvaud formalized the creation of states of collective risks from individual risks, by lists that describe all individuals in the economy, each in one state of individual risk. The theory of markets can be reinterpreted accordingly [14, 37, 38], yet remains somewhat awkward. The process of trading under individual risk using the Arrow–Debreu theory requires an unrealistically large number of markets. For example, with N individuals, each in one of two individual states G (good) and B (bad), the number of (collective) states that are required to apply the Arrow–Debreu
theory is S = 2^N. The number of markets required is as above, either S × N or N + S − 1. But with N = 300 million people, as in the US economy, applying the Arrow–Debreu approach would require N × S = N × 2^(300 million) markets to achieve Pareto efficiency, more markets than the total number of particles in the known universe [25]. For this reason, individual uncertainty is best treated with another formulation of uncertainty involving individual states of uncertainty and insurance rather than securities, in which market clearing is defined on the average and may never actually occur. In this new approach, instead of requiring N + S − 1 markets, one requires only N commodity markets and, with two states of individual risk, just one security: an insurance contract suffices to obtain asymptotic efficiency [37, 38]. This is a satisfactory theory of individual risk and insurance, but it leads only to asymptotic market clearing and Pareto efficiency. More recently, the theory was improved and it was shown that one can obtain exact market-clearing solutions and Pareto-efficient allocations based on N commodity markets with the introduction of a limited number of financial instruments called mutual insurance [14]. It is shown in [14] that if there are N households (consisting of H types), each facing the possibility of being in S individual states together with T collective states, then ensuring Pareto optimality requires only H(S − 1)T independent mutual insurance policies plus T pure Arrow securities.

Choice and Risk Bearing

Choice under uncertainty explains how individuals rank risky outcomes. In describing how we rank choices under uncertainty, one follows principles that were established to describe the way nature ranks what is most likely to occur, a topic that was widely explored and is at the foundation of statistics [31, 40]. To explain how individuals choose under conditions of uncertainty, Arrow used behavioral axioms that were introduced by Von Neumann and Morgenstern [41] for the theory of games (see end note c) and axioms defined by De Groot [31] and Villegas [40] for the foundation of statistics. The main result obtained in the middle of the twentieth century was that under rather simple behavioral assumptions, individuals behave as though they were optimizing an “expected utility function”. This means that they behave as though they have (i) a utility u for commodities, which is independent of the state of nature, and (ii) subjective probabilities about how likely the various states of nature are. Using the classic axioms one constructs a ranking of choice under uncertainty, obtaining the well-known expected utility approach. Specifically, traders choose over “lotteries” that achieve different outcomes in different states of nature. When states of nature and outcomes are represented by real numbers in R, a lottery is a function f : R → R^N, a utility is a function u : R^N → R, and a subjective probability is p : R → [0, 1] with ∫_R p(s) ds = 1. Von Neumann, Arrow, and Herstein and Milnor all obtained the same classic “representation theorem” that identifies choice under uncertainty by the ranking of lotteries according to a real-valued function W, where W has the now familiar “expected utility” form:

$$W(f) = \int_{s \in R} p(s)\, u(f(s))\, ds \tag{1}$$

The utility function u is typically bounded to avoid paradoxical behavior (see end note d). The expected utility approach just described has been generally used since the mid-twentieth century. Despite its elegance and appeal, from the very beginning, expected utility has been unable to explain a host of experimental evidence that was reported in the work of Allais [2] and others. There has been a persistent conflict between theory and observed behavior, but no axiomatic foundation to replace Von Neumann’s foundational approach. The reason for this discrepancy has been identified more recently, and it is attributed to the fact that expected utility is dominated by frequent events and neglects rare events, even those that are potentially catastrophic, such as widespread default in today’s economies. That expected utility neglects rare events was shown in [17, 19, 23]. In [23], the problem was traced back to Arrow’s axiom of monotone continuity [7], which Arrow attributed to Villegas [40], and to the corresponding continuity axioms of Herstein and Milnor, and De Groot [31], who defined a related continuity condition denoted “SP4”. Because of this property, on which Arrow’s work is based, the expected utility approach has been characterized as the “dictatorship” of frequent events, since it is dominated by the consideration of “normal” and frequent events [19]. To correct this bias, and to represent more accurately how we choose
under uncertainty, and to arrive at a more realistic meaning of rationality, a new axiom was added in [17, 19, 21], requiring equal treatment for frequent and for rare events. The new axiom was subsequently proven to be the logical negation of Arrow’s monotone continuity, which was shown to neglect small probability events [23].

The new axioms led to a “representation theorem” according to which the ranking of lotteries is a modified expected utility formula

$$W(f) = \int_{s \in R} p(s)\, u(f(s))\, ds + \phi(f) \tag{2}$$

where φ is a continuous linear function on lotteries defined by a finitely additive measure, rather than a countably additive measure [17, 19]. This measure assigns most weight to rare events. The new formulation has both types of measures, so the new characterization of choice under uncertainty incorporates both (i) frequent and (ii) rare events in a balanced manner, conforming more closely to the experimental evidence on how humans choose under uncertainty [15]. The new specification gives well-deserved importance to catastrophic risks, and a special role to fear in decision making [23], leading to a more realistic theory of choice under uncertainty and foundations of statistics [15, 23, 24]. The legacy of Kenneth Arrow’s work is surprising but strong: the new theory of choice under uncertainty coincides with the old when there are no catastrophic risks so that, in reality, the new theory is an extension of the old to incorporate rare events. Some of the most interesting applications are to environmental risks such as global warming [25]. Here Kenneth Arrow’s work was prescient: Arrow was a contributor to the early literature on environmental risks and irreversibilities [11], along with option values.

Endogenous Uncertainty and Widespread Default

Some of the risks we face are not created by nature. They are our own creation, such as global warming or the financial crisis of 2008 and 2009 anticipated in [27]. In physics, the realization that the observer matters, that the observer is a participant and creates uncertainty, is called Heisenberg’s uncertainty principle. The equivalent in economics is an uncertainty principle that describes how we create risks through our economic behavior. This realization led to the new concept of “markets with endogenous uncertainty”, created in 1991, and embodied in early articles [16, 27, 28] that established some of the basic principles and welfare theorems in markets with endogenous uncertainty. These and other later articles ([20, 25, 27, 36]) established basic principles of existence and the properties of the general equilibrium of markets with endogenous uncertainty. It is possible to extend the Arrow–Debreu theory of markets to encompass markets with endogenous uncertainty and also to prove the existence of market equilibrium under these conditions [20]. But in the new formulation, Heisenberg’s uncertainty principle rears its quizzical face. It is shown that it is no longer possible to fully hedge the risks that we create ourselves [16], no matter how many financial instruments we create. The equivalent of Russell’s paradox in mathematical logic appears also in this context due to the self-referential aspects of endogenous uncertainty [16, 20]. Pareto efficiency of equilibrium can no longer be ensured. Some of the worst economic risks we face are endogenously determined, for example, those that led to the 2008–2009 global financial crisis [27]. In [27] it was shown that the creation of financial instruments to hedge individual risks, such as credit default insurance that is often a subject of discussion in today’s financial turmoil, by itself induces collective risks of widespread default. The widespread default that we experience today was anticipated in [27], in 1991, and in 2006, when it was attributed to endogenous uncertainty created by financial innovation as well as to our choices of regulation or deregulation of financial instruments. Examples are the extent of reserves that are required for investment banking operations, and the creation of mortgage-backed securities that are behind many of the default risks faced today [29]. Financial innovation of this nature, and the attendant regulation of new financial instruments, cause welfare gains for individuals but at the same time create new risks for society, which bears the collective risks that ensue, as observed in 2008 and 2009. In this context, an extension of the Arrow–Debreu theory of markets can no longer treat markets with endogenous uncertainty as equivalent to markets with standard commodities. The symmetry of markets with and without uncertainty is now broken. We face a brave new world of financial innovation and the
endogenous uncertainty that we create ourselves. Creation and hedging of risks are closely linked, and endogenous uncertainty has acquired a critical role in market performance and economic welfare, an issue that Kenneth Arrow has more recently tackled himself through joint work with Frank Hahn [13].

Acknowledgments

Many thanks are due to Professors Rama Cont and Perry Mehrling of Columbia University and Barnard College, respectively, for their comments and excellent suggestions.

End Notes

a. See [37, 38]; later on Werner Hildenbrand followed this approach.
b. They achieved the same for their treatment of economic dynamics. Trading over time and under conditions of uncertainty characterizes financial markets.
c. And similar axioms used by Herstein and Milnor [33].
d. Specifically to avoid the so-called St. Petersburg paradox, see [7].

References

[1] Allais, M. (ed) (1953). Fondements et Applications de la Theorie du Risque en Econometrie, CNRS, Paris.
[2] Allais, M. (1987). The general theory of random choices in relation to the invariant cardinality and the specific probability function, in Risk Decision and Rationality, B.R. Munier, ed., Reidel, Dordrecht, The Netherlands, pp. 233–289.
[3] Arrow, K. (1951). Alternative approaches to the theory of choice in risk-taking situations, Econometrica 19(4), 404–438.
[4] Arrow, K. (1953). Le Role des Valeurs Boursieres pour la Repartition la Meilleure des Risques, Econometrie 11, 41–47. Paris CNRS, translated in English in RES 1964 (below).
[5] Arrow, K. (1953). The role of securities in the optimal allocation of risk bearing, Proceedings of the Colloque sur les Fondements et Applications de la Theorie du Risque en Econometrie. CNRS, Paris. English translation published in The Review of Economic Studies Vol. 31, No. 2, April 1964, pp. 91–96.
[6] Arrow, K. (1963). Uncertainty and the welfare economics of medical care, American Economic Review 53, 941–973.
[7] Arrow, K. (1970). Essays on the Theory of Risk Bearing, North Holland, Amsterdam.
[8] Arrow, K. (1972). General Economic Equilibrium: Purpose Analytical Techniques Collective Choice, Les Prix Nobel en 1972, Stockholm Nobel Foundation pp. 253–272.
[9] Arrow, K. (1983). Collected Papers of Kenneth Arrow, Belknap Press of Harvard University Press.
[10] Arrow, K.J. & Debreu, G. (1954). Existence of an equilibrium for a competitive economy, Econometrica 22, 265–290.
[11] Arrow, K.J. & Fisher, A. (1974). Environmental preservation, uncertainty and irreversibilities, Quarterly Journal of Economics 88(2), 312–319.
[12] Arrow, K. & Hahn, F. (1971). General Competitive Analysis, Holden Day, San Francisco.
[13] Arrow, K. & Hahn, F. (1999). Notes on sequence economies, transaction costs and uncertainty, Journal of Economic Theory 86, 203–218.
[14] Cass, D., Chichilnisky, G. & Wu, H.M. (1996). Individual risk and mutual insurance, Econometrica 64, 333–341.
[15] Chanel, O. & Chichilnisky, G. (2009). The influence of fear in decisions: experimental evidence, Journal of Risk and Uncertainty 39(3).
[16] Chichilnisky, G. (1991, 1996). Markets with endogenous uncertainty: theory and policy, Columbia University Working paper 1991 and Theory and Decision 41(2), 99–131.
[17] Chichilnisky, G. (1996). Updating Von Neumann Morgenstern axioms for choice under uncertainty with catastrophic risks. Proceedings of Conference on Catastrophic Risks, Fields Institute for Mathematical Sciences, Toronto, Canada.
[18] Chichilnisky, G. (ed) (1999). Markets Information and Uncertainty: Essays in Honor of Kenneth Arrow, Cambridge University Press.
[19] Chichilnisky, G. (2000). An axiomatic treatment of choice under uncertainty with catastrophic risks, Resource and Energy Economics 22, 221–231.
[20] Chichilnisky, G. (1999/2008). Existence and optimality of general equilibrium with endogenous uncertainty, in Markets Information and Uncertainty: Essays in Honor of Kenneth Arrow, 2nd Edition, G. Chichilnisky, ed., Cambridge University Press, Chapter 5.
[21] Chichilnisky, G. (2009). The foundations of statistics with Black Swans, Mathematical Social Sciences, DOI:10.1016/j.mathsocsci.2009.09.007.
[22] Chichilnisky, G. (2009). The limits of econometrics: non parametric estimation in Hilbert spaces, Econometric Theory 25, 1–17.
[23] Chichilnisky, G. (2009). “The Topology of Fear” invited presentation at NBER conference in honor of Gerard Debreu, UC Berkeley, December 2006, Journal of Mathematical Economics 45(11–12), December 2009. Available online 30 June 2009, ISSN 0304–4068, DOI: 10.1016/j.jmateco.2009.06.006.
[24] Chichilnisky, G. (2009a). Subjective Probability with Black Swans, Journal of Probability and Statistics (in press, 2010).
[25] Chichilnisky, G. & Heal, G. (1993). Global environmental risks, Journal of Economic Perspectives, Special Issue on the Environment, Fall, 65–86.
[26] Chichilnisky, G. & Heal, G. (1996). On the existence and the structure of the pseudo-equilibrium manifold, Journal of Mathematical Economics 26, 171–186.
[27] Chichilnisky, G. & Wu, H.M. (1991, 2006). General equilibrium with endogenous uncertainty and default, Working Paper Stanford University, 1991, Journal of Mathematical Economics 42, 499–524.
[28] Chichilnisky, G., Heal, G. & Dutta, J. (1991). Endogenous Uncertainty and Derivative Securities in a General Equilibrium Model, Working Paper Columbia University.
[29] Chichilnisky, G., Heal, G. & Tsomocos, D. (1995). Option values and endogenous uncertainty with asset backed securities, Economics Letters 48(3–4), 379–388.
[30] Debreu, G. (1959). Theory of Value: An Axiomatic Analysis of Economic Equilibrium, John Wiley & Sons, New York.
[31] De Groot, M.H. (1970, 2004). Optimal Statistical Decisions, John Wiley & Sons, Hoboken, New Jersey.
[32] Geanakoplos, J. (1990). An introduction to general equilibrium with incomplete asset markets, Journal of Mathematical Economics 19, 1–38.
[33] Herstein, I.N. & Milnor, J. (1953). An axiomatic approach to measurable utility, Econometrica 21, 291–297.
[34] Keynes, J.M. (1921). A Treatise on Probability, MacMillan and Co., London.
[35] Knight, F. (1921). Risk, Uncertainty and Profit, Houghton Mifflin and Co., New York.
[36] Kurz, M. & Wu, H.M. (1996). Endogenous uncertainty in a general equilibrium model with price-contingent contracts, Economic Theory 6, 461–488.
[37] Malinvaud, E. (1972). The allocation of individual risks in large markets, Journal of Economic Theory 4, 312–328.
[38] Malinvaud, E. (1973). Markets for an exchange economy with individual risks, Econometrica 41, 383–410.
[39] Shackle, G.L. (1949). Expectations in Economics, Cambridge University Press, Cambridge, UK.
[40] Villegas, C. (1964). On qualitative probability σ-algebras, Annals of Mathematical Statistics 35, 1789–1800.
[41] Von Neumann, J. & Morgenstern, O. (1944). Theory of Games and Economic Behavior, Princeton University Press, Princeton, NJ.

Related Articles

Arrow–Debreu Prices; Risk Aversion; Risk Premia; Utility Theory: Historical Perspectives.

GRACIELA CHICHILNISKY
Efficient Markets Theory: Historical Perspectives

Without any doubt, it can be said that the efficient market hypothesis (EMH) was crucial in the emergence of financial economics as a proper subfield of economics. But this was not its original goal: EMH was initially created to give a theoretical explanation of the random character of stock market prices.

The historical roots of EMH can be traced back to the nineteenth century and the early twentieth century in the work of Regnault and Bachelier, but their work was isolated and not embedded in a scientific community interested in finance. More immediate roots of the EMH lie in the empirical work of Cowles, Working, and Kendall from 1933 to 1959, which laid the foundation for the key works published in the period from 1959 (Roberts) to 1976 (Fama’s reply to LeRoy). More than any other single contributor, it was Fama [7] in his 1965 dissertation, building on the work of Roberts, Cowles, and Cootner, who formulated the EMH, suggesting that stock prices reflect all available information, and that, consequently, the actual value of a security is equal to its price. In addition, because new information arrives randomly, stock prices fluctuate randomly.

The idea that stock prices fluctuate randomly was not new: in 1863, a French broker, Jules Regnault [20], had already suggested it. Regnault was the first author to put forward this hypothesis, to validate it empirically, and to give it a theoretical interpretation. In 1900, Louis Bachelier [1], a French mathematician, used Regnault’s hypothesis and framework to develop the first mathematical model of Brownian motion, and tested the model by using it to price futures and options. In retrospect, we can recognize that Bachelier’s doctoral dissertation constitutes the first work in mathematical finance. Unfortunately for him, however, financial economics did not then exist as a scientific field, and there was no organized scientific community interested in his research. Consequently, both Regnault and Bachelier were ignored by economists until the 1960s.

Although these early authors did suggest modeling stock prices as a stochastic process, they did not formulate the EMH as it is known today. EMH was genuinely born in linking three elements that originally existed independently of each other: (i) the mathematical model of a stochastic process (random walk, Brownian motion, or martingale); (ii) the concept of economic equilibrium; and (iii) the statistical results about the unpredictability of stock market prices. EMH’s creation took place only between 1959 and 1976, when a large number of economists became familiar with these three features. Between the time of Bachelier and the development of EMH, there were no theoretical preoccupations per se about the random character of stock prices, and research was only empirical.

Empirical Research between 1933 and 1959

Between 1933 and the end of the 1950s, only three authors dealt with the random character of stock market prices: Cowles [3, 4], Working [24, 25], and Kendall [13]. They compared stock price fluctuations with random simulations and found similarities. One point must be underlined: these works were strictly statistical, and no theory explained these empirical results.

The situation changed at the end of the 1950s and during the 1960s because of three particular events. First, the Koopmans–Vining controversy at the end of the 1940s led to a decline of descriptive approaches and to the increased use of modeling based on theoretical foundations. Second, modern probability theory, and consequently also the theory of stochastic processes, became usable for nonmathematicians. Significantly, economists were attracted to the new formalisms by some features that were already familiar consequences of economic equilibrium. Most important, the zero expected profit when prices follow a Brownian motion reminded economists of the zero marginal profit in the equilibrium of a perfectly competitive market. Third, research on the stock market became more and more popular among scholars: groups of researchers and seminars in financial economics became organized; scientific journals such as the Journal of Financial and Quantitative Analysis were created and a community of scholars was born. This context raised awareness about the need for theoretical investigations, and these investigations, in turn, allowed the creation of the EMH.

Theoretical Investigations during the 1960s

Financial economists did not speak immediately of EMH; they talked about “random walk theory”. Following his empirical results, Working [26] was the first author to suggest a theoretical explanation; he established an explicit link between the unpredictable arrival of information and the random character of stock market price changes. However, this paper made no link with economic equilibrium and, probably for this reason, it was not widely diffused. Instead, it was Roberts [21], a professor at the University of Chicago, who first suggested a link between economic concepts and the random walk model by using the “arbitrage proof” argument that had been popularized by Modigliani and Miller [19]. Then, Cowles [5] made an important step by identifying a link between financial econometric results and economic equilibrium. Finally, two years later, Cootner [2] linked the random walk model, information, and economic equilibrium, and set out the idea of EMH, although he did not use that expression.

Cootner [2] had the essential idea of EMH, but he did not make the crucial empirical link because he considered that real-world stock price variations were not purely random. This point of view was defended by economists from MIT (such as Samuelson) and Stanford University (such as Working). By contrast, economists from the University of Chicago claimed that real stock markets were perfect, and so were more inclined to characterize them as efficient. Thus, it was a scholar from the University of Chicago, Eugene Fama, who formulated the EMH.

In his 1965 PhD thesis, Fama gave the first theoretical account of EMH. In that account, the key assumption is the existence of “sophisticated traders” who, due to their skills, make a better estimate of intrinsic valuation than do other agents by using all available information. Provided that such traders have predominant access to financial resources, their activity of buying underpriced assets and selling overpriced assets will tend to make prices equal the intrinsic values about which they have a shared assessment and also to eliminate any expectation of profit from trading. Linking these consequences with the random walk model, Fama added that because information arrives randomly, stock prices have to fluctuate randomly. Fama thus offered the first clear link between empirical results about stock price variations, the random walk model, and economic equilibrium. EMH was born.

Evolution of Fama’s Definition during the 1970s

Five years after his PhD dissertation, Fama [8] offered a mathematical demonstration of the EMH. He simplified his first definition by making the implicit assumption of a representative agent. He also used another stochastic process: the martingale model, which had been introduced to model the random character of stock market prices by Samuelson [22] and Mandelbrot [17]. The martingale model is less restrictive than the random walk model: the martingale model requires only independence of the conditional expectation of price changes, whereas the random walk model also requires independence involving the higher conditional moments (i.e., variance, skewness, and kurtosis) of the probability distribution of price changes. For Fama’s [8] purposes, the most important attraction of the martingale formalism was its explicit reference to a set of information, $\Phi_t$,

$$E(P_{t+1} \mid \Phi_t) - P_t = 0 \tag{1}$$

As such, the martingale model could be used to test the implication of EMH that, if all available information is used, the expected profit is zero. This idea led to the definition of an efficient market that is generally used nowadays: “a market in which prices always ‘fully reflect’ available information is called ‘efficient’” [8].

However, in 1976, LeRoy [15] showed that Fama’s demonstration is tautological and that his theory is not testable. Fama answered by modifying his definition and he also admitted that any test of the EMH is a test of both market efficiency and the model of equilibrium used by investors. In addition, it is striking to note that the test suggested by Fama [9] (i.e., markets are efficient if stock prices are equal to the prediction provided by the model of equilibrium used) does not imply any clear causality between the random character of stock market prices and the EMH; it is mostly a plausible correlation valid only for some cases.
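
The difference between the martingale model and the random walk model described in this section can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the variance recursion (an ARCH(1)-style rule), the parameter values `omega` and `alpha`, and the helper names `simulate_increments` and `lag1_autocorr` are choices made for this example and do not come from the works cited. Each simulated price change has zero conditional mean, so the price satisfies a condition of the form of equation (1), but its variance depends on the previous change, so the changes are not independent as a random walk would require.

```python
import random


def simulate_increments(n, omega=0.7, alpha=0.3, seed=0):
    """Simulate price changes with zero conditional mean and a variance
    that depends on the previous change (illustrative ARCH(1)-style rule)."""
    rng = random.Random(seed)
    increments = []
    prev = 0.0
    for _ in range(n):
        # Conditional variance rises after a large previous change.
        variance = omega + alpha * prev * prev
        # Zero-mean Gaussian shock scaled by the conditional std deviation.
        prev = rng.gauss(0.0, 1.0) * variance ** 0.5
        increments.append(prev)
    return increments


def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den


changes = simulate_increments(100_000)
rho_changes = lag1_autocorr(changes)
rho_squares = lag1_autocorr([c * c for c in changes])

print(f"lag-1 autocorrelation of changes:         {rho_changes:.3f}")
print(f"lag-1 autocorrelation of squared changes: {rho_squares:.3f}")
```

In such a run, the autocorrelation of the changes should be close to zero, which is consistent with (though weaker than) the martingale condition, while the squared changes should be clearly positively autocorrelated, which is exactly the kind of dependence in higher moments that the random walk model excludes.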

The Proliferation of Definitions since the 1970s

Fama’s modification of his definition proved to be a fateful admission. In retrospect, it is clear that the theoretical content of EMH comprised its suggestion of a link between some mathematical model, some empirical results, and some concept of economic equilibrium. The precise linkage proposed by Fama was, however, only one of many possible linkages, as subsequent literature would demonstrate. Just so, LeRoy [14] and Lucas [16] provided theoretical proofs that efficient markets and the martingale hypothesis are two distinct ideas: the martingale property is neither necessary nor sufficient for an efficient market. In a similar way, Samuelson [23], who gave a mathematical proof that prices may be permanently equal to the intrinsic value and fluctuate randomly, explained that it cannot be excluded that some agents make profits, contrary to the original definition of EMH. De Meyer and Saley [6] show that stock market prices follow a martingale even if not all available information is contained in stock market prices.

This proliferation at the level of theory has been matched by proliferation at the level of empirical testing, as the definition of EMH has changed depending on the emphasis placed by each author on one particular feature. For instance, Fama et al. [10] defined an efficient market as “a market that adjusts rapidly to new information”; Jensen [12] considered that “a market is efficient with respect to information set $\theta_t$ if it is impossible to make economic profit by trading on the basis of information set $\theta_t$”; according to Malkiel [18] “the market is said to be efficient with respect to some information set [...] if security prices would be unaffected by revealing that information to all participants. Moreover, efficiency with respect to an information set [...] implies that it is impossible to make economic profits by trading on the basis of [that information set]”.

The situation is similar regarding the tests: the type of test used depends on the definition used by the authors and on the data used (for instance, most of the tests are done with low frequency or daily data, while statistical arbitrage opportunities are discernible and exploitable at high frequency using algorithmic trading). Moreover, some authors have used the weakness of the definitions to criticize the very relevance of efficient markets. For instance, Grossman and Stiglitz [11] argued that because information is costly, prices cannot perfectly reflect all available information. Consequently, they considered that perfectly information-efficient markets are impossible.

The history of EMH shows that the definition of this theory is plural, and the initial project of EMH (the creation of a link between a mathematical model, the concept of economic equilibrium, and statistical results about the unpredictability of stock market prices) has not been fully achieved. Moreover, this theory is not empirically refutable (since a test of the random character of stock prices does not imply a test on efficiency). Nevertheless, financial economists have considered EMH as one of the pillars of financial economics because it played a key role in the creation and history of financial economics by linking financial results with standard economics. This link is the main contribution of EMH.

References

[1] Bachelier, L. (1900). Théorie de la spéculation, reproduced in Annales de l’Ecole Normale Supérieure, 3ème série 17, in Random Character of Stock Market Prices (English Translation: P.H. Cootner, ed, (1964)), M.I.T. Press, Cambridge, MA, pp. 21–86.
[2] Cootner, P.H. (1962). Stock prices: random vs. systematic changes, Industrial Management Review 3(2), 24–45.
[3] Cowles, A. (1933). Can stock market forecasters forecast? Econometrica 1(3), 309–324.
[4] Cowles, A. (1944). Stock market forecasting, Econometrica 12(3/4), 206–214.
[5] Cowles, A. (1960). A revision of previous conclusions regarding stock price behavior, Econometrica 28(4), 909–915.
[6] De Meyer, B. & Saley, H.M. (2003). On the strategic origin of Brownian motion in finance, International Journal of Game Theory 31, 285–319.
[7] Fama, E.F. (1965). The behavior of stock-market prices, Journal of Business 38(1), 34–105.
[8] Fama, E.F. (1970). Efficient capital markets: a review of theory and empirical work, Journal of Finance 25(2), 383–417.
[9] Fama, E.F. (1976). Efficient capital markets: reply, Journal of Finance 31(1), 143–145.
[10] Fama, E.F., Fisher, L., Jensen, M.C. & Roll, R. (1969). The adjustment of stock prices to new information, International Economic Review 10(1), 1–21.
[11] Grossman, S.J. & Stiglitz, J.E. (1980). The impossibility of informationally efficient markets, American Economic Review 70(3), 393–407.
[12] Jensen, M.C. (1978). Some anomalous evidence regarding market efficiency, Journal of Financial Economics 6, 95–101.

[13] Kendall, M.G. (1953). The analysis of economic time-series. Part I: prices, Journal of the Royal Statistical Society 116, 11–25.
[14] LeRoy, S.F. (1973). Risk-aversion and the martingale property of stock prices, International Economic Review 14(2), 436–446.
[15] LeRoy, S.F. (1976). Efficient capital markets: comment, Journal of Finance 31(1), 139–141.
[16] Lucas, R.E. (1978). Asset prices in an exchange economy, Econometrica 46(6), 1429–1445.
[17] Mandelbrot, B. (1966). Forecasts of future prices, unbiased markets, and “Martingale” models, Journal of Business 39(1), 242–255.
[18] Malkiel, B.G. (1992). Efficient Market Hypothesis, in The New Palgrave Dictionary of Money and Finance, P. Newman, M. Milgate & J. Eatwell, eds, Macmillan, London.
[19] Modigliani, F. & Miller, M.H. (1958). The cost of capital, corporation finance and the theory of investment, The American Economic Review 48(3), 261–297.
[20] Regnault, J. (1863). Calcul des Chances et Philosophie de la Bourse, Mallet-Bachelier and Castel, Paris.
[21] Roberts, H.V. (1959). Stock-market “Patterns” and financial analysis: methodological suggestions, Journal of Finance 14(1), 1–10.
[22] Samuelson, P.A. (1965). Proof that properly anticipated prices fluctuate randomly, Industrial Management Review 6(2), 41–49.
[23] Samuelson, P.A. (1973). Proof that properly discounted present value of assets vibrate randomly, Bell Journal of Economics 4(2), 369–374.
[24] Working, H. (1934). A random-difference series for use in the analysis of time series, Journal of the American Statistical Association 29, 11–24.
[25] Working, H. (1949). The investigation of economic expectations, The American Economic Review 39(3), 150–166.
[26] Working, H. (1956). New ideas and methods for price research, Journal of Farm Economics 38, 1427–1436.

Further Reading

Jovanovic, F. (2008). The construction of the canonical history of financial economics, History of Political Economy 40(3), 213–242.
Jovanovic, F. & Le Gall, P. (2001). Does God practice a random walk? The “financial physics” of a 19th century forerunner, Jules Regnault, European Journal of the History of Economic Thought 8(3), 323–362.
Jovanovic, F. & Poitras, G. (eds) (2007). Pioneers of Financial Economics: Twentieth Century Contributions, Edward Elgar, Cheltenham, Vol. 2.
Poitras, G. (ed) (2006). Pioneers of Financial Economics: Contributions prior to Irving Fisher, Edward Elgar, Cheltenham, Vol. 1.
Rubinstein, M. (1975). Securities market efficiency in an Arrow-Debreu economy, The American Economic Review 65(5), 812–824.

Related Articles

Bachelier, Louis (1870–1946); Efficient Market Hypothesis.

FRANCK JOVANOVIC

Econophysics

The Prehistoric Times of Econophysics

The term econophysics was introduced in the 1990s, endorsed in 1999 by the publication of Mantegna & Stanley’s “An Introduction to Econophysics” [33]. The word “econophysics”, paralleling the quests of biophysics or geophysics, suggests that there is a physics-based approach to economics.

From classical to neoclassical economics and until now, economists have been inspired by the conceptual and mathematical developments of the physical sciences and by their remarkable successes in describing and predicting natural phenomena. Reciprocally, physics has been enriched several times by developments first observed in economics. Well before the christening of econophysics as the incarnation of the multidisciplinary study of complex large-scale financial and economic systems, a multitude of small and large collisions have punctuated the development of these two fields. We now mention a few that illustrate the remarkable commonalities and cross-fertilization.

In his “Inquiry into the Nature and Causes of the Wealth of Nations” (1776), Adam Smith found inspiration in the Philosophiae Naturalis Principia Mathematica (1687) of Isaac Newton, specifically based on the (novel at the time) notion of causative forces.

The recognition of the importance of feedbacks to fathom the sheer complexity of economic systems has been at the root of economic thinking for a long time. Toward the end of the nineteenth century, the microeconomists Francis Edgeworth and Alfred Marshall drew on some of the ideas of physicists to develop the notion that the economy achieves an equilibrium state like that described for gases by Clerk Maxwell and Ludwig Boltzmann. The general equilibrium theory now at the core of much of economic thinking is nothing but a formalization of the idea that “everything in the economy affects everything else” [18], reminiscent of mean-field theory or self-consistent effective medium methods in physics, but emphasizing and transcending these ideas much beyond their initial sense in physics.

While developing the field of microeconomics in his “Cours d’Economie Politique” (1897), the economist and philosopher Vilfredo Pareto was the first to describe, for the distribution of incomes, the eponymous power laws that would later become the center of attention of physicists and other scientists observing this remarkable and universal statistical signature in the distribution of event sizes (earthquakes, avalanches, landslides, storms, forest fires, solar flares, commercial sales, war sizes, and so on) punctuating so many natural and social systems [3, 29, 35, 41].

While attempting to model the erratic motion of bonds and stock options in the Paris Bourse in 1900, mathematician Louis Bachelier developed the mathematical theory of diffusion (and the first elements of financial option pricing) and solved the parabolic diffusion equation five years before Albert Einstein [10] established the theory of Brownian motion based on the same diffusion equation (also underpinning the theory of random walks) in 1905. The ensuing modern theory of random walks now constitutes one of the fundamental pillars of theoretical physics and economics and finance models.

In the early 1960s, mathematician Benoit Mandelbrot [28] pioneered the use in financial economics of heavy-tailed distributions (Lévy stable laws) as opposed to the traditional Gaussian (normal) law. A cohort of economists, notably at the University of Chicago (Merton Miller, Eugene Fama, and Richard Roll), at MIT (Paul Samuelson), and at Carnegie Mellon University (Thomas Sargent), initially followed in his footsteps. In his PhD thesis, Eugene Fama confirmed that the frequency distribution of the changes in the logarithms of prices was “leptokurtic”, that is, with a high peak and fat tails. However, other notable economists (Paul Cootner and Clive Granger) opposed Mandelbrot’s proposal, on the basis of the argument that “the statistical theory that exists for the normal case is nonexistent for the other members of the class of Lévy laws.” The coup de grace was the mounting empirical evidence that the distributions of returns were becoming closer to the Gaussian law at timescales larger than one month, at odds with the self-similarity hypothesis associated with the Lévy laws [7, 23]. Much of the effort in the econophysics literature of the late 1990s and early 2000s revisited and refined this hypothesis, confirming on the one hand the existence of the variance (which rules out the class of Lévy distributions proposed by Mandelbrot), but also suggesting a power-law tail with an exponent close to 3 [16, 32]. Several other groups have discussed alternatives, such as exponential [39]

or stretched exponential distributions [19, 24, 26]. Financial engineers actually care about these apparent technicalities because the tail structure controls the Value at Risk and other measures of large losses, and physicists care because the tail may constrain the underlying mechanism(s). For instance, Gabaix et al. [14] attribute the large movements in stock market activity to the interplay between the power-law distribution of the sizes of large financial institutions and the optimal trading of such large institutions. In this domain, econophysics focuses on models that can reproduce and explain the main stylized facts of financial time series: non-Gaussian fat tail distribution of returns, long-range autocorrelation of volatility and the absence of correlation of returns, multifractal property of the absolute value of returns, and so on.

In the late 1960s, Benoit Mandelbrot left financial economics but, inspired by this first episode, went on to explore other uncharted territories to show how nondifferentiable geometries (for which he coined the term fractal), previously developed by mathematicians from the 1870s to the 1940s, could provide new ways to deal with the real complexity of the world [29]. He later returned to finance in the late 1990s in the midst of the econophysics enthusiasm to model the multifractal properties associated with the long-memory properties observed in financial asset returns [2, 30, 31, 34, 43].

Notable Contributions

The modern econophysicists are implicitly and sometimes explicitly driven by the hope that the concept of “universality” holds in economics and finance. The value of this strategy remains to be validated [42], as most econophysicists have not yet digested the subtleties of economic thinking and have failed to marry their ideas and techniques with mainstream economics. The following is a partial list of a few notable exceptions: precursory physics approach to social systems [15], agent-based models, induction, evolutionary models [1, 9, 11, 21], option theory for incomplete markets [4, 6], interest rate curves [5, 38], minority games [8], theory of Zipf law and its economic consequences [12, 13, 27], theory of large price fluctuations [14], theory of bubbles and crashes [17, 22, 40], random matrix theory applied to covariance of returns [20, 36, 37], and methods and models of dependence between financial assets [25, 43].

At present, the most exciting progress seems to be unfolding at the boundary between economics and the biological, cognitive, and behavioral sciences. While it is difficult to argue for a physics-based foundation of economics and finance, physics still has a role to play as a unifying framework full of concepts and tools to deal with the complex. The modeling skills of physicists explain their impressive numbers in investment and financial institutions, where their data-driven approach coupled with a pragmatic sense of theorizing has made them a most valuable commodity on Wall Street.

Acknowledgments

We would like to thank Y. Malevergne for many discussions and a long-term enjoyable and fruitful collaboration.

References

[1] Arthur, W.B. (2005). Out-of-equilibrium economics and agent-based modeling, in Handbook of Computational Economics, Vol. 2: Agent-Based Computational Economics, K. Judd & L. Tesfatsion, eds, Elsevier, North Holland.
[2] Bacry, E., Delour, J. & Muzy, J.-F. (2001). Multifractal random walk, Physical Review E 64, 026103.
[3] Bak, P. (1996). How Nature Works: The Science of Self-Organized Criticality, Copernicus, New York.
[4] Bouchaud, J.-P. & Potters, M. (2003). Theory of Financial Risk and Derivative Pricing: From Statistical Physics to Risk Management, 2nd Edition, Cambridge University Press.
[5] Bouchaud, J.-P., Sagna, N., Cont, R., El-Karoui, N. & Potters, M. (1999). Phenomenology of the interest rate curve, Applied Mathematical Finance 6, 209.
[6] Bouchaud, J.-P. & Sornette, D. (1994). The Black-Scholes option pricing problem in mathematical finance: generalization and extensions for a large class of stochastic processes, Journal de Physique I France 4, 863–881.
[7] Campbell, J.Y., Lo, A.W. & MacKinlay, A.C. (1997). The Econometrics of Financial Markets, Princeton University Press, Princeton.
[8] Challet, D., Marsili, M. & Zhang, Y.-C. (2005). Minority Games, Oxford University Press, Oxford.
[9] Cont, R. & Bouchaud, J.-P. (2000). Herd behavior and aggregate fluctuations in financial markets, Journal of Macroeconomic Dynamics 4(2), 170–195.

[10] Einstein, A. (1905). On the motion of small particles [29] Mandelbrot, B.B. (1982). The Fractal Geometry of
suspended in liquids at rest required by the molecular- Nature, W.H. Freeman, San Francisco.
kinetic theory of heat, Annalen der Physik 17, 549–560. [30] Mandelbrot, B.B. (1997). Fractals and Scaling in
[11] Farmer, J.D. (2002). Market forces, ecology and evolu- Finance: Discontinuity, Concentration, Risk, Springer,
tion, Industrial and Corporate Change 11(5), 895–953. New York.
[12] Gabaix, X. (1999). Zipf’s law for cities: an explanation, [31] Mandelbrot, B.B., Fisher, A. & Calvet, L. (1997). A
Quarterly Journal of Economics 114(3), 739–767. Multifractal Model of Asset Returns, Cowles Founda-
[13] Gabaix, X. (2005). The Granular Origins of Aggregate tion Discussion Papers 1164, Cowles Foundation, Yale
Fluctuations, working paper, Stern School of Business, University.
New York. [32] Mantegna, R.N. & Stanley, H.E. (1995). Scaling behav-
[14] Gabaix, X., Gopikrishnan, P., Plerou, V. & Stanley, H.E. ior in the dynamics of an economic index, Nature 376,
(2003). A theory of power law distributions in financial 46–49.
market fluctuations, Nature 423, 267–270. [33] Mantegna, R. & Stanley, H.E. (1999). An Introduction to
[15] Galam, S. & Moscovici, S. (1991). Towards a theory of Econophysics: Correlations and Complexity in Finance,
collective phenomena: consensus and attitude changes Cambridge University Press, Cambridge and New York.
in groups, European Journal of Social Psychology 21, [34] Muzy, J.-F., Sornette, D., Delour, J. & Arneodo, A.
49–74. (2001). Multifractal returns and hierarchical portfolio
[16] Gopikrishnan, P., Plerou, V., Amaral, L.A.N., Meyer, M. theory, Quantitative Finance 1, 131–148.
& Stanley, H.E. (1999). Scaling of the distributions of [35] Newman, M.E.J. (2005). Power laws, Pareto distri-
fluctuations of financial market indices, Physical Review butions and Zipf’s law, Contemporary Physics 46,
E 60, 5305–5316. 323–351.
[17] Johansen, A., Sornette, D. & Ledoit, O. (1999). Pre- [36] Pafka, S. & Kondor, I. (2002). Noisy covariance matri-
dicting financial crashes using discrete scale invariance, ces and portfolio optimization, European Physical Jour-
Journal of Risk 1(4), 5–32. nal B 27, 277–280.
[18] Krugman, P. (1996). The Self-Organizing Economy, [37] Plerou, V., Gopikrishnan, P., Rosenow, B., Ama-
Blackwell, Malden. ral, L.A.N. & Stanley, H.E. (1999). Universal and non-
[19] Laherrere, J. & Sornette, D. (1999). Stretched exponen- universal properties of cross correlations in financial
tial distributions in nature and economy: fat tails with time series, Physical Review Letters 83(7), 1471–1474.
characteristic scales, European Physical Journal B 2, [38] Santa-Clara, P. & Sornette, D. (2001). The dynamics
525–539. of the forward interest rate curve with stochastic string
[20] Laloux, L., Cizeau, P., Bouchaud, J.-P. & Potters, M. shocks, The Review of Financial Studies 14(1), 149–185.
(1999). Noise dressing of financial correlation matrices, [39] Silva, A.C., Prange, R.E. & Yakovenko, V.M. (2004).
Physical Review Letters 83, 1467–1470. Exponential distribution of financial returns at meso-
[21] Lux, T. & Marchesi, M. (1999). Scaling and criticality scopic time lags: a new stylized fact, Physica A 344,
in a stochastic multi-agent model of financial market, 227–235.
Nature 397, 498–500. [40] Sornette, D. (2003). Why Stock Markets Crash, Critical
[22] Lux, T. & Sornette, D. (2002). On rational bubbles and Events in Complex Financial Systems, Princeton Univer-
fat tails, Journal of Money, Credit and Banking, Part 1 sity Press.
34(3), 589–610. [41] Sornette, D. (2006). Critical Phenomena in Natural Sci-
[23] MacKenzie, D. (2006). An Engine, Not a Camera: ences, Chaos, Fractals, Self-organization and Disorder:
How Financial Models Shape Markets, The MIT Press, Concepts and Tools, Series in Synergetics, 2nd Edition,
Cambridge, London. Springer, Heidelberg.
[24] Malevergne, Y., Pisarenko, V.F. & Sornette, D. (2005). [42] Sornette, D., Davis, A.B., Ide, K., Vixie, K.R., Pis-
Empirical distributions of log-returns: between the arenko, V. & Kamm, J.R. (2007). Algorithm for model
stretched exponential and the power law? Quantitative validation: theory and applications, Proceedings of the
Finance 5(4), 379–401. National Academy of Sciences of the United States of
[25] Malevergne, Y. & Sornette, D. (2003). Testing the Gaus- America 104(16), 6562–6567.
sian copula hypothesis for financial assets dependences, [43] Sornette, D., Malevergne, Y. & Muzy, J.F. (2003). What
Quantitative Finance 3, 231–250. causes crashes? Risk 16, 67–71. http://arXiv.org/abs/
[26] Malevergne, Y. & Sornette, D. (2006). Extreme Finan- cond-mat/0204626
cial Risks: From Dependence to Risk Management,
Springer, Heidelberg.
[27] Malevergne, Y. & Sornette, D. (2007). A two-factor Further Reading
Asset Pricing Model Based on the Fat Tail Distri-
bution of Firm Sizes, ETH Zurich working paper. Bachelier, L. (1900). Théorie de la speculation, Annales de
http://arxiv.org/abs/physics/0702027 l’Ecole Normale Supérieure (translated in the book Ran-
[28] Mandelbrot, B.B. (1963). The variation of certain spec- dom Character of Stock Market Prices), Théorie des prob-
ulative prices, Journal of Business 36, 394–419. abilités continues, 1906, Journal des Mathematiques Pures
et Appliquées; Les Probabilités cinematiques et dynamiques, 1913, Annales de l’Ecole Normale Supérieure.
Cardy, J.L. (1996). Scaling and Renormalization in Statistical Physics, Cambridge University Press, Cambridge.
Pareto, V. (1897). Cours d’Économique Politique, Macmillan, Paris, Vol. 2.
Stanley, H.E. (1999). Scaling, universality, and renormalization: three pillars of modern critical phenomena, Reviews of Modern Physics 71(2), S358–S366.

GILLES DANIEL & DIDIER SORNETTE
Kolmogorov, Andrei Nikolaevich

Andrei Nikolaevich Kolmogorov was born on April 25, 1903 and died on October 20, 1987 in the Soviet Union.

Springer Verlag published (in German) Kolmogorov’s monograph “Foundations of the Theory of Probability” more than seventy-five years ago [3]. In this small, 80-page book, he not only provided the logical foundation of the mathematical theory of probability (axiomatics) but also defined new concepts: conditional probability as a random variable, conditional expectations, the notion of independence, the use of Borel fields of probability, and so on. The “Main theorem” in Chapter III “Probability in Infinite Spaces” indicated how to construct stochastic processes starting from their finite-dimensional distributions. His approach has made the development of modern mathematical finance possible.

Before writing “Foundations of the Theory of Probability”, Kolmogorov wrote his great paper “Analytical Methods in Probability Theory” [2], which gave birth to the theory of Markov processes in continuous time. In this paper, Kolmogorov presented his famous forward and backward differential equations, which are often-used tools in probability theory and its applications. He also gave credit to L. Bachelier for the latter’s pioneering investigations of probabilistic schemes evolving continuously in time.

The two works mentioned earlier laid the groundwork for all subsequent developments of the theory of probability and stochastic processes. Today, it is impossible to imagine the state of these sciences without Kolmogorov’s contributions.

Kolmogorov developed many fundamentally important concepts that have determined the progress in different branches of mathematics and other branches of science and arts. Being an outstanding mathematician and scientist, he obtained, besides fundamental results in the theory of probability [5], results in the theory of trigonometric series, measure and set theory, the theory of integration, approximation theory, constructive logic, topology, the theory of superposition of functions and Hilbert’s thirteenth problem, classical mechanics, ergodic theory, the theory of turbulence, diffusion and models of population dynamics, mathematical statistics, the theory of algorithms, information theory, the theory of automata and applications of mathematical methods in the humanities (including work in the theory of poetry, the statistics of text, and history), and the history and methodology of mathematics for school children and teachers of school mathematics [4–6]. For more descriptions of Kolmogorov’s works, see [1, 7].

References

[1] Bogolyubov, N.N., Gnedenko, B.V. & Sobolev, S.L. (1983). Andrei Nikolaevich Kolmogorov (on his eightieth birthday), Russian Mathematical Surveys 38(4), 9–27.
[2] Kolmogoroff, A. (1931). Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung, Mathematische Annalen 104, 415–458.
[3] Kolmogoroff, A. (1933). Grundbegriffe der Wahrscheinlichkeitsrechnung, Springer, Berlin.
[4] Kolmogorov, A.N. (1991). Mathematics and mechanics, in Mathematics and its Applications (Soviet Series 25), V.M. Tikhomirov, ed., Kluwer, Dordrecht, Vol. I, pp. xx+551.
[5] Kolmogorov, A.N. (1992). Probability theory and mathematical statistics, in Mathematics and its Applications (Soviet Series 26), A.N. Shiryayev, ed., Kluwer, Dordrecht, Vol. II, pp. xvi+597.
[6] Kolmogorov, A.N. (1993). Information theory and the theory of algorithms, in Mathematics and its Applications (Soviet Series 27), A.N. Shiryayev, ed., Kluwer, Dordrecht, Vol. III, pp. xxvi+275.
[7] Shiryaev, A.N. (2000). Andrei Nikolaevich Kolmogorov (April 25, 1903 to October 20, 1987). A biographical sketch of his life and creative paths, in Kolmogorov in Perspective, American Mathematical Society, London Mathematical Society, pp. 1–87.

ALBERT N. SHIRYAEV
Bernoulli, Jacob

Jacob Bernoulli (1654–1705), the son and grandson of spice merchants in the city of Basel, Switzerland, was trained to be a Protestant clergyman, but, following his own interests and talents, instead became the professor of mathematics at the University of Basel from 1687 until his death. He taught mathematics to his nephew Nicolaus Bernoulli (1687–1759) and to his younger brother Johann (John, Jean) Bernoulli (1667–1748), who was trained in medicine, but took over as professor of mathematics at Basel after Jacob’s death in 1705. As a professor of mathematics, Johann Bernoulli, in turn, taught mathematics to his sons, including Daniel Bernoulli (1700–1782), known for the St. Petersburg paradox in probability, as well as for work in hydrodynamics. Jacob and Johann Bernoulli were among the first to read and understand Gottfried Wilhelm Leibniz’s articles in the Acta Eruditorum of 1684 and 1686, in which Leibniz put forth the new algorithm of calculus. They helped to develop and spread Leibniz’s calculus throughout Europe, Johann teaching calculus to the Marquis de l’Hôpital, who published the first calculus textbook. Nicolaus Bernoulli wrote his master’s thesis [1] on the basis of the manuscripts of Jacob’s still unpublished Art of Conjecturing, and helped to spread its contents in the years between Jacob’s death and the posthumous publication of Jacob’s work in 1713 [2]. In the remainder of this article, the name “Bernoulli” without any first name refers to Jacob Bernoulli. (Readers should be aware that the many Bernoulli mathematicians are not infrequently confused with each other. For instance, it was Jacob’s son Nicolaus, also born in 1687, but a painter and not a mathematician, who had the Latin manuscript of [2] printed, and not his nephew Nicolaus, although the latter wrote a brief preface.)

As far as the application of the art of conjecturing to economics (or finance) is concerned, much of the mathematics that Jacob Bernoulli inherited relied more on law and other institutional factors than it relied on statistics or mathematical probability, a discipline that did not then exist. Muslim traders had played a significant role in Mediterranean commerce in the medieval period and in the development of mathematics, particularly algebra, as well. Muslim mathematical methods were famously transmitted to Europe by Leonardo of Pisa, also known as Fibonacci [6]. Rather than relying on investments with guaranteed rates of return, which were frowned upon as involving usury, Muslim trade was often carried out by partnerships or companies, many involving members of extended families. Such partnerships would be based on a written contract between those involved, spelling out the agreed-upon division of the profits once voyagers had returned and the goods had been sold, the shares of each partner depending upon their investment of cash, supply of capital goods such as ships or warehouses, and labor. According to Islamic law, if one of the partners in such an enterprise died before the end of the anticipated period of the venture, his heirs were entitled to demand the dissolution of the firm, so that they might receive their legal inheritances. Not infrequently, applied mathematicians were called upon to calculate the value of the partnership on a given intermediate date, so that the partnership could be dissolved fairly.

In Arabic and then Latin books of commercial arithmetic or business mathematics in general (geometry, for instance the volumes of barrels, might also be included), there were frequently problems of “societies” or partnerships, which later evolved into the so-called “problem of points” concerning the division of the stakes of a gambling game if it were terminated before its intended end. Typically, the values of the various partners’ shares were calculated using (i) the amounts invested; (ii) the length of time each amount was invested in the company if all the partners were not equal in this regard; and (iii) the original contract, which generally specified the division of the capital and profits among partners traveling to carry out the business and those remaining at home. The actual mathematics involved in making these calculations was similar to the mathematics of calculating the price of a mixture [2, 7, 8]. (If, as was often the case, “story problems” were described only in long paragraphs, what was intended might seem much more complex than if everything could have been set out in the subsequently developed notation of algebraic equations.)

In Part IV of [2], Bernoulli had intended to apply the mathematics of games of chance, expounded in Parts I–III of the book on the basis of Huygens’ work, by analogy, to civil, moral, and economic problems. The fundamental principle of Huygens’ and Bernoulli’s mathematics of games of chance was that the game should be fair and that players should
pay to play a game in proportion to their expected winnings. Most games, like business partnerships, were assumed to involve only the players, so that the total paid in would equal the total paid out at the end. Here, a key concept was the number of “cases” or possible alternative outcomes. If a player might win a set amount if a die came up a 1, then there were said to be six cases, corresponding to the six faces of the die, of which one, the 1, would be favorable to that player. For this game to be fair, the player should pay in one-sixth of the amount he or she would win if the 1 were thrown.

Bernoulli applied this kind of mathematics in an effort to quantify the evidence that an accused person had committed a crime by systematically combining all the various types of circumstantial evidence of the crime. He supposed that something similar might be done to judge life expectancies, except that no one knew all the “cases” that might affect life expectancy, such as the person’s inherited vigor and healthiness, the diseases to which a person might succumb, the accidents that might happen, and so forth. With the law that later came to be known as the weak law of large numbers, Bernoulli proposed to discover a posteriori from the results many times observed in similar situations what the ratios of unobserved underlying “cases” might be. Most people realize, Bernoulli said, that if you want to judge what may happen in the future by what has happened in the past, you are less liable to be mistaken if you have made more observations or have a longer time series of outcomes. What people do not know, he said, is whether, if you make more and more observations, you can be more and more sure, without limit, that your prediction is reliable. By his proof he claimed to show that there was no limit to the degree of confidence or probability one might have that the ratio of results would fall within some interval around an expected ratio. In addition, he made a rough calculation of the number of trials (later called Bernoulli trials) that would be needed for a proposed degree of certainty. The mathematics he used in his proof basically involved binomial expansions and the possible combinations and permutations of outcomes (“successes” or “failures”) over a long series of trials. After a long series of trials, the distribution of ratios of outcomes would take the shape of a bell curve, with increasing percentages of outcomes clustering around the central value. For a comparison of Jacob Bernoulli’s proof with Nicolaus Bernoulli’s proof of the same theorem, see [5].

In correspondence with Leibniz, Bernoulli unsuccessfully tried to obtain from Leibniz a copy of Jan De Witt’s rare pamphlet, in Dutch, on the mathematics of annuities; this was the sort of problem to which he hoped to apply his new mathematical theory [4]. Leibniz, in reply, without having been told the mathematical basis of Bernoulli’s proof of his law for finding, a posteriori, ratios of cases, for instance, of surviving past a given age, objected that no such approach would work because the causes of death might be changeable over time. What if a new disease should make an appearance, leading to an increase of early deaths? Bernoulli’s reply was that, if there were such changed circumstances, then it would be necessary to make new observations to calculate new ratios for life expectancies or values of annuities [2].

But what if not only were there no fixed ratios of cases over time, but no such regularities (underlying ratios of cases) at all? For Bernoulli this was not a serious issue because he was a determinist, believing that from the point of view of the Creator everything is determined and known eternally. It is only because we humans do not have such godlike knowledge that we cannot know the future in detail. Nevertheless, we can increase the security and prudence of our actions through the application of the mathematical art of conjecturing that he proposed to develop. Even before the publication of The Art of Conjecturing, Abraham De Moivre had begun to carry out with great success the program that Bernoulli had begun [3]. Although, for Bernoulli, probability was an epistemic concept, and expectation was more fundamental than relative chances, De Moivre established mathematical probability on the basis of relative frequencies.

References

[1] Bernoulli, N. (1709). De Usu Artis Conjectandi in Jure, in Die Werke von Jacob Bernoulli III, B.L. van der Waerden, ed., Birkhäuser, Basel, pp. 287–326. An English translation of Chapter VII can be found at http://www.york.ac.uk/depts/mathes/histstat/bernoulli n.htm [last access December 13, 2008].
[2] Bernoulli, J. (2006). [Ars Conjectandi (1713)], English translation in Jacob Bernoulli, The Art of Conjecturing together with Letter to a Friend on Sets in Court Tennis, E.D. Sylla ed., The Johns Hopkins University Press, Baltimore.
[3] De Moivre, A. (1712). De Mensura Sortis, seu, de Probabilitate Eventuum in Ludis a Casu Fortuito Pendentibus, Philosophical Transactions of the Royal Society 27, 213–264; translated by Bruce McClintock in Hald, A. (1984a). A. De Moivre: ‘De Mensura Sortis’ or ‘On the Measurement of Chance’ . . . Commentary on ‘De Mensura Sortis’, International Statistical Review 52, 229–262. After Bernoulli’s The Art of Conjecturing, De Moivre published The Doctrine of Chances, London 1718, 1738, 1756.
[4] De Witt, J. (1671). Waerdye van Lyf-renten, in Die Werke von Jacob Bernoulli III, B.L. van der Waerden, ed., Birkhäuser, Basel, pp. 328–350.
[5] Hald, A. (1984b). Nicholas Bernoulli’s theorem, International Statistical Review 52, 93–99; Cf. Hald, A. (1990). A History of Probability and Statistics and Their Applications before 1750, Wiley, New York.
[6] Leonardo of Pisa (Fibonacci) (2002). [Liber Abaci (1202)], English translation in Fibonacci’s Liber Abaci: A Translation into Modern English of Leonardo Pisano’s Book of Calculation, Springer Verlag, New York.
[7] Sylla, E. (2003). Business ethics, commercial mathematics, and the origins of mathematical probability, in Oeconomies in the Age of Newton, M. Schabas & N. De Marchi, eds, Annual Supplement to History of Political Economy, Duke University Press, Durham, Vol. 35, pp. 309–327.
[8] Sylla, E. (2006). Revised and expanded version of [7]: “Commercial Arithmetic, theology and the intellectual foundations of Jacob Bernoulli’s Art of Conjecturing”, in G. Poitras, ed., Pioneers of Financial Economics, Contributions Prior to Irving Fisher, Edward Elgar Publishing, Cheltenham UK and Northampton MA, Vol. 1.

EDITH DUDLEY SYLLA
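
The fair-stake rule and the weak law of large numbers described in this article can be illustrated with a short numerical sketch. It is not Bernoulli's own computation: the function names, the tolerance of 1/20, and the trial counts are assumptions chosen for illustration, and only the die example (one favorable case out of six) comes from the text. The second function sums exact binomial terms to find the probability that the observed ratio of successes lies within a given interval around the underlying ratio of cases.

```python
from fractions import Fraction
from math import comb


def fair_stake(prize, favorable_cases, total_cases):
    """Stake that makes a game fair: the prize weighted by the ratio of cases."""
    return prize * Fraction(favorable_cases, total_cases)


def prob_ratio_within(n, p, eps):
    """Exact probability that the success ratio k/n over n independent trials
    lies within eps of p (inclusive), summing binomial terms."""
    total = Fraction(0)
    for k in range(n + 1):
        if abs(Fraction(k, n) - p) <= eps:
            total += comb(n, k) * p**k * (1 - p) ** (n - k)
    return total


# Hypothetical illustration: a die shows a 1 with ratio of cases 1/6.
p = Fraction(1, 6)
eps = Fraction(1, 20)  # assumed tolerance around the underlying ratio
for n in (6, 60, 600):
    print(n, float(prob_ratio_within(n, p, eps)))
```

With these assumed values, the printed probabilities increase with the number of trials, which is the qualitative content of the law.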
Treynor, Lawrence Jack

Jack Lawrence Treynor was born in Council Bluffs, Iowa, on February 21, 1930 to Jack Vernon Treynor and Alice Cavin Treynor. In 1951, he graduated from Haverford College on Philadelphia’s Main Line with a Bachelor of Arts degree in mathematics. He served two years in the US Army before moving to Cambridge, MA to attend Harvard Business School. After a year writing cases for Professor Robert Anthony, Treynor went to work for the Operations Research department at Arthur D. Little in 1956.

Treynor was particularly inspired by the 1958 paper coauthored by Franco Modigliani and Merton H. Miller, titled “The Cost of Capital, Corporation Finance, and the Theory of Investment.” At the invitation of Modigliani, Treynor spent a sabbatical year at MIT between 1962 and 1963. While at MIT, Treynor made two presentations to the finance faculty, the first of which, “Toward a Theory of the Market Value of Risky Assets,” introduced the capital asset pricing model (CAPM). The CAPM says that the return on an asset should equal the risk-free rate plus a premium proportional to its contribution to the risk in the market portfolio. The model is often referred to as the Treynor–Sharpe–Lintner–Mossin CAPM to reflect the fact that it was simultaneously and independently developed by multiple individuals, albeit with slight differences. Treynor’s paper was not published until Robert Korajczyk included the unrevised version in his 1999 book, Asset Pricing and Portfolio Performance; it is also included in the “Risk” section of Treynor’s own 2007 book, Treynor on Institutional Investing (Wiley, 2008). William F. Sharpe’s 1964 version, which was built on the earlier work of Harry M. Markowitz, won the Nobel Prize for Economics in 1990.

The CAPM makes no assumptions about the factor structure of the market. In particular, it does not assume the single-factor structure of the so-called market model. However, in his Harvard Business Review papers on performance measurement, Treynor assumed a single factor. He used a regression of returns on managed funds against returns on the “market” to estimate the sensitivity of the fund to the market factor and then used the slope of that regression line to estimate the contribution of market fluctuations to a fund’s rate of return, which permitted him to isolate the portion of fund return that was actually due to the selection skills of the fund manager. In 1981, Fischer Black wrote an open letter in the Financial Analysts Journal, stating that Treynor had “developed the capital asset pricing model before anyone else.”

In the second Harvard Business Review paper, Treynor and Kay Mazuy used a curvilinear regression line to test whether funds were more sensitive to the market in the years when the market went up versus the years when the market went down.

When Fischer Black arrived at Arthur D. Little in 1965, Black took an interest in Treynor’s work and later inherited Treynor’s caseload (after Treynor went to work for Merrill Lynch). In their paper, “How to Use Security Analysis to Improve Portfolio Selection,” Treynor and Black proposed viewing portfolios as having three distinct parts: a riskless part, a highly diversified part (devoid of specific risk), and an active part (which would have both specific risk and market risk). The paper spells out the optimal balance, not only between the three parts but also between the individual securities in the active part.

In 1966, Treynor was hired by Merrill Lynch where he headed Wall Street’s first quantitative research group. Treynor left Merrill Lynch in 1969 to serve as the editor of the Financial Analysts Journal, with which he stayed until 1981. Treynor then joined Harold Arbit in starting Treynor–Arbit Associates, an investment firm based in Chicago. Treynor continues to serve on the advisory boards of the Financial Analysts Journal and the Journal of Investment Management, where he is also case editor.

In addition to his 1976 book published with William Priest and Patrick Regan titled The Financial Reality of Pension Funding under ERISA, Treynor coauthored Machine Tool Leasing in 1956 with Richard Vancil of Harvard Business School. Treynor has authored and coauthored more than 90 papers on such topics as risk, performance measurement, economics, trading (market microstructure), accounting, investment value, active management, and pensions. He has also written 20 cases, many published in the Journal of Investment Management.

Treynor’s work has appeared in the Financial Analysts Journal, the Journal of Business, the Harvard Business Review, the Journal of Finance, and the Journal of Investment Management, among others. Some of Treynor’s works were published under the pen-name “Walter Bagehot,” a cover that offered him
anonymity while allowing him to share his often unorthodox theories. He promoted notions such as random walks, efficient markets, risk/return trade-off, and betas that others in the field actively avoided. Treynor has since become renowned not only for pushing the envelope with new ideas but also for encouraging others to do the same. Eighteen of his papers have appeared in anthologies.

Two papers that have not been anthologized are “Treynor’s Theory of Inflation” and “Will the Phillips Curve Cause World War III?” In these papers, he points out that, because in industry labor and capital are complements (rather than substitutes, as depicted in economics textbooks), over the business cycle they will become more or less scarce together. However, when capital gets more or less scarce, the identity of the marginal machine will change. If the real wage is determined by the marginal productivity of labor then (as Treynor argues) it is determined by the labor productivity of the marginal machine. As demand rises and the marginal machines get older and less efficient, the real wage falls, but labor negotiations fix the money wage. In order to satisfy the identity

\[ \text{money prices} \equiv \frac{\text{money wage}}{\text{real wage}} \tag{1} \]

when the real wage falls, money prices must rise. According to Nobel Laureate Merton Miller, Treynor’s main competitor on the topic, the Phillips curve is “just an empirical regularity” (i.e., just data snooping).

Treynor won the Financial Analysts Journal’s Graham and Dodd Scroll award in 1968, in 1982, twice in 1987 (for “The Economics of the Dealer Function” and “Market Efficiency and the Bean Jar Experiment”), in 1998 for “Bulls Bears and Market Bubbles”, and in 1999 for “The Investment Value of Brand Franchise.” In 1981 Treynor was again recognized for his research, winning the Graham and Dodd award for “Best Paper” titled “What Does It Take to Win the Trading Game?” In 1987, he was presented with the James R. Vertin Award of the Research Foundation of the Institute of Chartered Financial Analysts, “in recognition of his research, notable for its relevance and enduring value to investment professionals.” In addition, the Financial Analysts Association presented him with the Nicholas Molodovsky Award in 1985, “in recognition of his outstanding contributions to the profession of financial analysis of such significance as to change the direction of the profession and raise it to higher standards of accomplishment.” He received the Roger F. Murray prize in 1994 from the Institute for Quantitative Research in Finance for “Active Management as an Adversary Game.” That same year he was also named a Distinguished Fellow of the Institute for Quantitative Research in Finance along with William Sharpe, Merton Miller, and Harry Markowitz. In 1997, he received the EBRI Lillywhite Award, which is “awarded to persons who have had distinguished careers in the investment management and employee benefits fields and whose outstanding service enhances Americans’ economic security.” In 2007, he received The Award for Professional Excellence, presented periodically by the CFA Institute Board to “a member of the investment profession whose exemplary achievement, excellence of practice, and true leadership have inspired and reflected honor upon our profession to the highest degree” (previous winners were Jack Bogle and Warren Buffett). In 2008, he was recognized as the 2007 IAFE/SunGard Financial Engineer of the Year for his contributions to financial theory and practice.

Treynor taught investments at Columbia University while working at the Financial Analysts Journal. Between 1985 and 1988, Treynor taught investments at the University of Southern California.

He is currently President of Treynor Capital Management in Palos Verdes, California.

Further Reading

Bernstein, P.L. (1992). Capital Ideas: The Improbable Origins of Modern Wall Street, The Free Press, New York.
Black, F.S. (1981). An open letter to Jack Treynor, Financial Analysts Journal July/August, 14.
Black, F.S. & Treynor, J.L. (1973). How to use security analysis to improve portfolio selection, The Journal of Business 46(1), 66–88.
Black, F.S. & Treynor, J.L. (1986). Corporate investment decision, in Modern Developments in Financial Management, S.C. Myers, ed., Praeger Publishers.
French, C. (2003). The Treynor capital asset pricing model, Journal of Investment Management 1(2), 60–72.
Keynes, J.M. (1936). The General Theory of Employment, Interest, and Money, Harcourt Brace, New York.
Korajczyk, R. (1999). Asset Pricing and Portfolio Performance: Models, Strategy and Performance Metrics, Risk Books, London.
Lintner, J. (1965a). The valuation of risk assets and the selection of risky investments in stock portfolios and capital budgets, The Review of Economics and Statistics 47, 13–37.
Lintner, J. (1965b). Securities prices, risk, and maximal gains from diversification, The Journal of Finance 20(4), 587–615.
Markowitz, H.M. (1952). Portfolio selection, The Journal of Finance 7(1), 77–91.
Mehrling, P. (2005). Fischer Black and the Revolutionary Idea of Finance, Wiley, New York.
Modigliani, F. & Miller, M.H. (1958). The cost of capital, corporation finance, and the theory of investment, The American Economic Review 48, 261–297.
Sharpe, W.F. (1964). Capital asset prices: a theory of market equilibrium under conditions of risk, The Journal of Finance 19(3), 425–442.
Treynor, J.L. (1961). Market Value, Time, and Risk. Unpublished manuscript. Dated 8/8/1961, #95-209.
Treynor, J.L. (1962). Toward a Theory of Market Value of Risky Assets. Unpublished manuscript. Dated Fall of 1962.
Treynor, J.L. (1963). Implications for the Theory of Finance. Unpublished manuscript. Dated Spring of 1963.
Treynor, J.L. (1965). How to rate management of investment funds, Harvard Business Review 43, 63–75.
Treynor, J.L. (2007). Treynor on Institutional Investing, Wiley, New York.
Treynor, J.L. & Mazuy, K. (1966). Can mutual funds outguess the market? Harvard Business Review 44, 131–136.
Treynor, J.L. & Vancil, R. (1956). Machine Tool Leasing, Management Analysis Center.

Related Articles

Black, Fischer; Capital Asset Pricing Model; Factor Models; Modigliani, Franco; Samuelson, Paul A.; Sharpe, William F.

ETHAN NAMVAR
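
The CAPM relation and the single-factor regression described in this article can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Treynor's published procedure: the function names are hypothetical, returns are taken to be excess returns over the risk-free rate, and ordinary least squares is assumed for the regression of fund returns on market returns.

```python
def capm_expected_return(risk_free, beta, market_return):
    """CAPM: risk-free rate plus a premium proportional to beta."""
    return risk_free + beta * (market_return - risk_free)


def regress_fund_on_market(market_excess, fund_excess):
    """Ordinary least squares of fund excess returns on market excess returns.
    Returns (intercept, slope); the slope estimates market sensitivity."""
    n = len(market_excess)
    mean_m = sum(market_excess) / n
    mean_f = sum(fund_excess) / n
    var_m = sum((m - mean_m) ** 2 for m in market_excess)
    cov_mf = sum(
        (m - mean_m) * (f - mean_f) for m, f in zip(market_excess, fund_excess)
    )
    slope = cov_mf / var_m
    intercept = mean_f - slope * mean_m
    return intercept, slope


def market_and_selection(fund_excess, market_excess, slope):
    """Split one period's fund excess return into the part attributed to
    market fluctuations and the remainder attributed to selection."""
    market_part = slope * market_excess
    return market_part, fund_excess - market_part


# Made-up illustrative data, not taken from any fund.
market = [-0.02, 0.01, 0.03, 0.05]
fund = [0.001 + 1.2 * m for m in market]
alpha, beta = regress_fund_on_market(market, fund)
print(alpha, beta)
print(capm_expected_return(0.03, beta, 0.08))
```

The intercept plays the role of the return not explained by the market factor; in the made-up data it is constant by construction.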
Rubinstein, Edward Mark

Mark Rubinstein, the only child of Sam and Gladys Rubinstein of Seattle, Washington, was born on June 8, 1944. He attended the Lakeside School in Seattle and graduated in 1962 as one of the two graduation speakers. He earned an A.B. in Economics, magna cum laude, from Harvard College in 1966 and an MBA with a concentration in finance from the Graduate School of Business at Stanford University in 1968. In 1971, Rubinstein earned his Ph.D. in Finance from the University of California, Los Angeles (UCLA). During this time at UCLA, he was heavily influenced by the microeconomist Jack Hirshleifer. In July 1972, he became an assistant professor in finance at the University of California at Berkeley, where he remained for his entire career. He was advanced to tenure unusually early in 1976 and became a full professor in 1980.

Rubinstein’s early work concentrated on asset pricing. Specifically, between 1971 and 1973, his research centered on the mean–variance capital asset pricing model and came to include skewness as a measure of risk [3–5]. Rubinstein’s extension has new relevance as several researchers have since determined its predictive power in explaining realized security returns. In 1974, Rubinstein’s research turned to more general models of asset pricing. He developed an extensive example of multiperiod security market equilibrium, which later became the dominant model used by academics in their theoretical papers on asset pricing. Unlike earlier work, he left the intertemporal process of security returns to be determined in equilibrium rather than as a datum (although as special cases he assumed a random walk and constant interest rates). Rubinstein was thus able to derive conditions for the existence of a random walk and an unbiased term structure of interest rates. He also was the first to derive a simple equation in equilibrium for valuing a risky stream of income received over time. He published the first paper to show explicitly how and why in equilibrium investors would want to hold long-term bonds in their portfolios, and in particular would want to hold a riskless (in terms of income) annuity maturing at their death, foreshadowing several strands of later research.

In 1975, Rubinstein began developing theoretical models of “efficient markets.” In 1976, he published a paper showing that the same formula derived by Black and Scholes for valuing options could come from an alternative set of assumptions based on risk aversion and discrete-time trading opportunities. (Black and Scholes had required continuous trading and continuous price movements.)

Working together with Cox et al. [1], Rubinstein published the popular and original paper developing the binomial option pricing model, one of the most widely cited papers in financial economics and now probably the most widely used model by professional traders to value derivatives. The model is often referred to as the Cox–Ross–Rubinstein option pricing (CRR) model. At the same time, Rubinstein began work with Cox [2] on their own text, Options Markets, which was eventually published in 1985 and won the biennial award of the University of Chicago for the best work by professors of business concerning any area of business.

He supplemented his academic work with firsthand experience as a market maker in options when he became a member of the Pacific Stock Exchange. In 1981, together with Hayne E. Leland and John W. O’Brien, Rubinstein founded Leland O’Brien Rubinstein (LOR) Associates, the original portfolio insurance firm. At the time, the novel idea of portfolio insurance had been put forth by Leland, later fully developed together with Rubinstein, and successfully marketed among large institutional investors by O’Brien. Their business grew extremely rapidly, only to be cut short when they had to share the blame for the October 1987 stock market crash. Not admitting defeat, LOR invented another product that became the first exchange-traded fund (ETF), the SuperTrust, listed on the American Stock Exchange in 1992. Rubinstein also published a related article examining alternative basket vehicles.

In the early 1990s, Rubinstein published a series of eight articles in Risk magazine showing how option pricing tools could easily be applied to value a host of so-called exotic derivatives, which were just becoming popular.

Motivated by the failure after 1987 of index options to be priced anywhere close to the predictions of the Black–Scholes formula, in an article
published in the Journal of Finance [8], he developed an important generalization of the original binomial model, which he called implied binomial trees. The article included new techniques for inferring risk-neutral probability distributions from options on the same underlying asset. Rubinstein’s revisions of the model provide the natural generalization of the standard binomial model to accommodate arbitrary expiration date risk-neutral probability distributions. This paper, in turn, spurred new academic work on option pricing in the latter half of the 1990s and found immediate application among various professionals. In 1998 and 1999, Rubinstein rounded out his work on derivatives by publishing a second text titled “Rubinstein on Derivatives,” which expanded its domain from calls and puts to futures and more general types of derivatives. The book also pioneered new ways to integrate computers as an aid to learning.

After a 1999 debate about the empirical rationality of financial markets with the key behavioral finance theorist, Richard Thaler, Rubinstein began to rethink the concept of efficient markets. In 2001, he published a version of his conference argument in the Financial Analysts Journal [6, 7], titled “Rational Markets? Yes or No: The Affirmative Case,” which won the Graham and Dodd Plaque award in 2002.

He then returned to the more general theory of investments with which he had begun his research career as a doctoral student. In 2006, Rubinstein [11] published “A History of the Theory of Investments: My Annotated Bibliography,” an academic history of the theory of investments from the thirteenth to the beginning of the twenty-first century, systematizing the knowledge, and identifying the relations between apparently disparate lines of research. No other book has so far been written that comes close to examining in detail the intellectual path that has led to modern financial economics (particularly, in the subarea of investments). Rubinstein shows that the discovery of key ideas in finance is much more complex and multistaged than anyone had realized. Too few are given too much credit, and sometimes original work has been forgotten.

Rubinstein has taught and lectured widely. During his career, he has given 303 invited lectures, including conference presentations, full course seminars, and honorary addresses all over the United States and around the world. He has served as chairman of the Berkeley finance group, and as director of the Berkeley Program in Finance; he is the founder of the Berkeley Options Database (the first large transaction-level database ever assembled with respect to options and stocks). He has served on the editorial boards of numerous finance journals. He has authored 62 journal articles, published 3 books, and developed several computer programs dealing with derivatives.

Rubinstein is currently a professor of finance at the Haas School of Business at the University of California, Berkeley. Many of his papers are frequently reprinted in survey publications, and he has won numerous prizes and awards for his research and writing on financial economics. He was named “Businessman of the Year” (one of 12) in 1987 by Fortune magazine. In 1995, the International Association of Financial Engineers (IAFE) named him the 1995 IAFE/SunGard Financial Engineer of the Year. In 2000, he was elected to Derivatives Strategy Magazine’s “Derivatives Hall of Fame” and named in the “RISK Hall of Fame” by Risk Magazine in 2002. Of all his awards, the one he cherishes the most is the 2003 Earl F. Cheit Teaching award in the Masters of Financial Engineering Program at the University of California, Berkeley [10] (Rubinstein, M.E. (2003). A Short Career Biography. Unpublished.)

Rubinstein has two grown-up children, Maisiee and Judd. He lives with Diane Rubinstein in the San Francisco Bay Area.

References

[1] Cox, J.C., Ross, S.A. & Rubinstein, M.E. (1979). Option pricing: a simplified approach, Journal of Financial Economics September, 229–263.
[2] Cox, J.C. & Rubinstein, M.E. (1985). Options Markets, Prentice-Hall.
[3] Rubinstein, M.E. (1973). The fundamental theorem of parameter-preference security valuation, Journal of Financial and Quantitative Analysis January, 61–69.
[4] Rubinstein, M.E. (1973). A comparative statics analysis of risk premiums, Journal of Business October.
[5] Rubinstein, M.E. (1973). A mean-variance synthesis of corporate financial theory, Journal of Finance March.
[6] Rubinstein, M.E. (1989). Market basket alternatives, Financial Analysts Journal September/October.
[7] Rubinstein, M.E. (2001). Rational markets? Yes or No: the affirmative case, Financial Analysts Journal May/June.
[8] Rubinstein, M.E. (1994). Implied binomial trees, Journal of Finance July, 771–818.
[9] Rubinstein, M.E. (2000). Rubinstein on Derivatives, Risk Books.
[10] Rubinstein, M.E. (2003). All in All, it’s been a Good Life, The Growth of Modern Risk Management: A History July, 581–585.
[11] Rubinstein, M.E. (2006). A History of the Theory of Investments: My Annotated Bibliography, John Wiley & Sons, New York.

ETHAN NAMVAR

Infinite Divisibility

We say that a random variable $X$ has an infinitely divisible (ID) distribution (in short, $X$ is ID) if for every integer $n \ge 1$ there exist $n$ independent identically distributed (i.i.d.) random variables $X_1, \ldots, X_n$ such that $X_1 + \cdots + X_n \overset{d}{=} X$, where $\overset{d}{=}$ denotes equality in distribution. Alternatively, $X$ (or its distribution $\mu$) is ID if for all $n \ge 1$, $\mu$ is the $n$-fold convolution $\mu_n * \cdots * \mu_n$, where $\mu_n$ is a probability distribution.

There are several advantages in using infinitely divisible distributions and processes in financial modeling. First, they offer wide possibilities for modeling alternatives to the Gaussian and stable distributions, while maintaining a link with the central limit theorem and a rich probabilistic structure. Second, they are closely linked to Lévy processes: for each ID distribution $\mu$ there is a Lévy process (see Lévy Processes) $\{X_t : t \ge 0\}$ with $X_1$ having distribution $\mu$. Third, every stationary distribution of an Ornstein–Uhlenbeck process (see Ornstein–Uhlenbeck Processes) belongs to the class L of ID distributions, which are self-decomposable (SD). We say that a random variable $X$ is SD if it has the linear autoregressive property: for any $\theta \in (0, 1)$, there is a random variable $\varepsilon_\theta$ independent of $X$ such that $X \overset{d}{=} \theta X + \varepsilon_\theta$.

The concept of infinite divisibility in probability was introduced in 1929 by de Finetti. Its theory was established in the 1930s by Khintchine, Kolmogorov, and Lévy. Motivated by applications arising in different fields, from the 1960s on there was a renewed interest in the subject, in particular, among many other topics, in the study of concrete examples and subclasses of ID distributions. Historical notes and references are found in [3, 6, 8, 9].

Link with the Central Limit Theorem

The class of ID distributions is characterized as the class of possible limit laws for triangular arrays of the form $X_{n,1} + \cdots + X_{n,k_n} - a_n$, where $k_n > 0$ is an increasing sequence, $X_{n,1}, \ldots, X_{n,k_n}$ are independent random variables for every $n \ge 1$, $a_n$ are normalizing constants, and $\{X_{n,j}\}$ is infinitesimal: $\lim_{n\to\infty} \max_{1 \le j \le k_n} P(|X_{n,j}| > \varepsilon) = 0$ for each $\varepsilon > 0$. On the other hand, the class L of SD distributions is characterized as the class of possible limit laws for normalized sequences of the form $(X_1 + \cdots + X_n - a_n)/b_n$, where $X_1, X_2, \ldots$ are independent random variables and $a_n$ and $b_n > 0$ are sequences of numbers with $\lim_{n\to\infty} b_n = \infty$ and $\lim_{n\to\infty} b_{n+1}/b_n = 1$.

Lévy–Khintchine Representation

In terms of characteristic functions (see Filtering), a random variable $X$ is ID if $\varphi(u) = E[e^{iuX}]$ is represented by $\varphi = (\varphi_n)^n$, where $\varphi_n$ is the characteristic function of a probability distribution for every $n \ge 1$. We define the characteristic exponent or cumulant function of $X$ by $\Psi(u) = \log \varphi(u)$. The Lévy–Khintchine representation establishes that a distribution function $\mu$ is ID if and only if its characteristic exponent is represented by

$$\Psi(u) = iau - \tfrac{1}{2}u^2\sigma^2 + \int_{\mathbb{R}} \left(e^{iux} - 1 - iux\,1_{\{|x| \le 1\}}\right)\nu(dx), \qquad u \in \mathbb{R} \tag{1}$$

where $\sigma^2 \ge 0$, $a \in \mathbb{R}$ and $\nu$ is a positive measure on $\mathbb{R}$ with no atom at zero and $\int_{\mathbb{R}} \min(1, |x|^2)\,\nu(dx) < \infty$. The triplet $(a, \sigma^2, \nu)$ is unique and is called the generating triplet of $\mu$, while $\nu$ is its Lévy measure. When $\nu$ is zero, we have the Gaussian distribution. We speak of the purely non-Gaussian case when $\sigma^2 = 0$. When $\nu(dx) = h(x)\,dx$ is absolutely continuous, we call the nonnegative function $h$ the Lévy density of $\nu$. Distributions in the class L are also characterized by having Lévy densities of the form $h(x) = |x|^{-1} g(x)$, where $g$ is nondecreasing in $x < 0$ and nonincreasing in $x > 0$.

A nonnegative ID random variable is characterized by a special form of its Lévy–Khintchine representation: it is purely non-Gaussian, $\nu(-\infty, 0) = 0$, $\int_{|x| \le 1} |x|\,\nu(dx) < \infty$, and

$$\Psi(u) = ia_0 u + \int_{\mathbb{R}_+} \left(e^{iux} - 1\right)\nu(dx) \tag{2}$$

where $a_0 \ge 0$ is called the drift. The associated Lévy process $\{X_t : t \ge 0\}$ is called a subordinator.
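As a numerical illustration of representation (2), the following sketch integrates $e^{iux} - 1$ against the gamma Lévy density $\alpha x^{-1} e^{-\beta x}$ (the example listed under Classical Examples and Criteria) and compares the result with the closed form $-\alpha \log(1 - iu/\beta)$, taking the drift $a_0 = 0$. The function names, the use of Simpson's rule, the truncation point, and the parameter values are illustrative assumptions rather than part of the article.

```python
import cmath
import math


def gamma_levy_integrand(x, u, alpha, beta):
    """(e^{iux} - 1) * h(x) with h(x) = alpha * exp(-beta * x) / x, x >= 0."""
    if x == 0.0:
        # Limit as x -> 0: (e^{iux} - 1) / x -> i u.
        return 1j * u * alpha
    return (cmath.exp(1j * u * x) - 1.0) * alpha * math.exp(-beta * x) / x


def char_exponent_numeric(u, alpha, beta, n_intervals=20000, cutoff=40.0):
    """Approximate equation (2) with a0 = 0 by Simpson's rule on [0, cutoff/beta].

    The cutoff and the number of intervals are assumptions; the neglected
    tail of the integral is of order exp(-cutoff).
    """
    if n_intervals % 2:
        n_intervals += 1  # Simpson's rule needs an even number of intervals
    upper = cutoff / beta
    h = upper / n_intervals
    total = gamma_levy_integrand(0.0, u, alpha, beta)
    total += gamma_levy_integrand(upper, u, alpha, beta)
    for k in range(1, n_intervals):
        weight = 4.0 if k % 2 else 2.0
        total += weight * gamma_levy_integrand(k * h, u, alpha, beta)
    return total * h / 3.0


def char_exponent_closed_form(u, alpha, beta):
    """Characteristic exponent of the gamma law: -alpha * log(1 - i u / beta)."""
    return -alpha * cmath.log(1.0 - 1j * u / beta)


if __name__ == "__main__":
    u, alpha, beta = 1.5, 2.0, 3.0
    print(char_exponent_numeric(u, alpha, beta))
    print(char_exponent_closed_form(u, alpha, beta))
```

The two printed values should agree closely; the real part is negative and the imaginary part equals $\alpha \arctan(u/\beta)$, as expected for a law supported on the positive half-line.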
A subordinator is a nonnegative increasing process having characteristic exponent (2). Subordinators are useful models for random time evolutions.

Several properties of an ID random variable $X$ are related to corresponding properties of its Lévy measure $\nu$. For example, the $k$th moment $E|X|^k$ is finite if and only if $\int_{|x|>1} |x|^k\,\nu(dx)$ is finite. Likewise, for the IDlog condition: $\int_{|x|>2} \ln|x|\,\nu(dx) < \infty$ if and only if $\int_{|x|>2} \ln|x|\,\mu(dx) < \infty$.

The monograph [8] has a detailed study of multivariate ID distributions and their associated Lévy processes.

Classical Examples and Criteria

The Poisson distribution with mean $\lambda > 0$ is ID with Lévy measure $\nu(B) = \lambda 1_{\{1\}}(B)$, but is not SD. A compound Poisson distribution is the law of $X = \sum_{i=1}^{N} Y_i$, where $N, Y_1, Y_2, \ldots$ are independent random variables, $N$ has a Poisson distribution with mean $\lambda$, and the $Y_i$ have the same distribution $G$, with $G(\{0\}) = 0$. Any compound Poisson distribution is ID with Lévy measure $\nu(B) = \lambda G(B)$. This distribution is a building block for all other ID laws, since every ID distribution is the limit of a sequence of compound Poisson distributions.

An important example of an SD law is the gamma distribution with shape parameter $\alpha > 0$ and rate (inverse scale) parameter $\beta > 0$. It has Lévy density $h(x) = \alpha x^{-1} e^{-\beta x}$, $x > 0$. The $\alpha$-stable distribution, with $0 < \alpha < 2$ and purely non-Gaussian, is also SD. Its Lévy density is $h(x) = c_1 x^{-1-\alpha}$ on $(0, \infty)$ and $h(x) = c_2 |x|^{-1-\alpha}$ on $(-\infty, 0)$, with $c_1 \ge 0$, $c_2 \ge 0$ and $c_1 + c_2 > 0$.

There is no explicit characterization of infinite divisibility in terms of densities or distributions. However, there are some sufficient or necessary conditions to test for infinite divisibility. A nonnegative random variable with density $f$ is ID in any of the following cases: (i) $\log f$ is convex, (ii) $f$ is completely monotone, or (iii) $f$ is hyperbolically completely monotone [9]. If $X$ is symmetric around zero, it is ID if it has a density that is completely monotone on $(0, \infty)$. For a non-Gaussian ID distribution $F$, its tail behavior is $-\log(1 + F(-x) - F(x)) = O(x \log x)$ when $x \to \infty$. Hence, no bounded random variable is ID, and if a density has a decay of the type $c_1 \exp(-c_2 x^2)$ with some $c_1, c_2$ positive and if it is not Gaussian, then $F$ is not ID. An important property of SD distributions is that they always have densities that are unimodal.

Infinite divisibility is preserved under some mixtures of distributions. One has the surprising fact that any mixture of the exponential distribution is ID: $X \overset{d}{=} YV$ is ID whenever $V$ has an exponential distribution and $Y$ is an arbitrary nonnegative random variable independent of $V$. The monograph [9] has a detailed study of ID mixtures.

Stochastic Integral Representations

Several classes of ID distributions are characterized by stochastic integrals (see Stochastic Integrals) of a nonrandom function with respect to a Lévy process [2]. The classical example is the class L, which is also characterized as all the laws of $X \overset{d}{=} \int_0^\infty e^{-t}\,dZ_t$, where $Z_t$ is a Lévy process having Lévy measure $\nu_Z$ with the IDlog condition. More generally, the stochastic integral $\int_0^1 \log t^{-1}\,dZ_t$ is well defined for every Lévy process $Z_t$. Denote by $B(\mathbb{R})$ the class of all the distributions of these stochastic integrals. The class $B(\mathbb{R})$ coincides with those ID laws with completely monotone Lévy density. It is also characterized as the smallest class that contains all mixtures of exponential distributions and is closed under convolution, convergence, and reflection. It is sometimes called the Bondesson–Goldie–Steutel class of distributions. Multivariate extensions are presented in [2].

Generalized Gamma Convolutions

The class of generalized gamma convolutions (GGCs) is the smallest class of probability distributions on $\mathbb{R}_+$ that contains all gamma distributions and is closed under convolution and convergence in distribution [6]. These laws are in the class L and have Lévy density of the form $h(x) = x^{-1} g(x)$, $x > 0$, with $g$ a completely monotone function on $(0, \infty)$. Most of the classical distributions on $\mathbb{R}_+$ are GGC: gamma, lognormal, positive $\alpha$-stable, Pareto, Student t-distribution, Gumbel, and F-distribution. Of special applicability in financial modeling is the family of generalized inverse Gaussian distributions [4, 7].

A distribution $\mu$ with characteristic exponent $\Psi$ is GGC if and only if there exists a positive Radon
measure $U$ on $(0, \infty)$ such that

$$\Psi(u) = ia_0 u - \int_0^\infty \log\left(1 - \frac{iu}{s}\right) U(ds) \tag{3}$$

with $\int_0^1 |\log x|\,U(dx) < \infty$ and $\int_1^\infty U(dx)/x < \infty$. The measure $U = U_\mu$ is called the Thorin measure of $\mu$. So, the triplet of $\mu$ is $(a_0, 0, \nu_\mu)$, where the Lévy measure is concentrated on $(0, \infty)$ and such that $\nu_\mu(dx) = \frac{dx}{x} \int_0^\infty e^{-xs}\,U_\mu(ds)$. Moreover, any GGC is the law of a Wiener-gamma integral $\int_0^\infty h(u)\,d\gamma_u$, where $(\gamma_t ; t \ge 0)$ is the standard gamma process with Lévy measure $\nu(dx) = e^{-x}\,(dx/x)$ and $h$ is a Borel function $h : \mathbb{R}_+ \to \mathbb{R}_+$ with $\int_0^\infty \log(1 + h(t))\,dt < \infty$. The function $h$ is called the Thorin function of $\mu$ and is obtained as follows. Let $F_U(x) = \int_0^x U(dy)$ for $x \ge 0$ and let $F_U^{-1}(s)$ be the right continuous inverse of $F_U(s)$ in the sense of composition of functions, that is, $F_U^{-1}(s) = \inf\{t > 0 ; F_U(t) \ge s\}$ for $s \ge 0$. Then, $h(s) = 1/F_U^{-1}(s)$ for $s \ge 0$. For the positive $\alpha$-stable distributions, $0 < \alpha < 1$, $h(s) = \{s\theta\,\Gamma(\alpha + 1)\}^{-1/\alpha}$ for a $\theta > 0$.

For distributions on $\mathbb{R}$, Thorin also introduced the class $T(\mathbb{R})$ of extended generalized gamma convolutions as the smallest class that contains the GGC and is closed under convolution, convergence in distribution, and reflection. These distributions are in the class L and are characterized by the alternative representation of their characteristic exponents

$$\Psi(u) = iua - \tfrac{1}{2}u^2\sigma^2 - \int_{\mathbb{R}_+} \left[\ln\left(1 - \frac{iu}{x}\right) + \frac{iux}{1 + x^2}\right] U(dx) \tag{4}$$

where $a \in \mathbb{R}$, $\sigma^2 \ge 0$ and $U : \mathbb{R}_+ \to \mathbb{R}_+$ is a nondecreasing function with $U(0) = 0$, $\int_0^1 |\ln(x)|\,U(dx) < \infty$ and $\int_1^\infty x^{-2}\,U(dx) < \infty$. Several examples of Thorin distributions are given in [6, 9]. Any member of this class is the law of a stochastic integral $\int_0^\infty g^*(t)\,dZ_t$, where $Z_t$ is a Lévy process with $Z_1$ satisfying the IDlog condition and $g^*$ is the inverse of the incomplete gamma function $g(t) = \int_t^\infty u^{-1} e^{-u}\,du$ [2].

Type G Distributions

A random variable $X$ is of type G if $X \overset{d}{=} \sqrt{V}\,N$, where $N$ and $V$ are independent random variables with $V$ being nonnegative ID and $N$ having the standard normal distribution. Any type G distribution is ID and it is interpreted as the law of a random time changed Brownian motion $B_V$, where $\{B_t : t \ge 0\}$ is a Brownian motion independent of $V$. When we know the Lévy measure $\rho$ of $V$, we can compute the Lévy density of $X$ as $h(x) = (2\pi)^{-1/2} \int_{\mathbb{R}_+} s^{-1/2} e^{-x^2/(2s)}\,\rho(ds)$ as well as its characteristic exponent

$$\Psi_X(u) = \int_{\mathbb{R}_+} \left(e^{-\frac{1}{2}u^2 s} - 1\right)\rho(ds) \tag{5}$$

Many classical distributions are of type G and SD: the variance gamma distribution, where $V$ has a gamma distribution; the Student t, where $V$ has a reciprocal chi-square distribution; and the symmetric $\alpha$-stable distributions, $0 < \alpha < 2$, where $V$ is a positive $\alpha/2$-stable random variable, including the Cauchy distribution case $\alpha = 1$. Of special relevance in financial modeling are the normal inverse Gaussian, with $V$ following the inverse Gaussian law [1], and the zero-mean symmetric generalized hyperbolic distributions, where $V$ has the generalized inverse Gaussian law [5, 7]; all their moments are finite and they can accommodate heavy tails.

Tempered Stable Distributions

Tempered stable distributions (see Tempered Stable Process) are useful in mathematical finance as an attractive alternative to stable distributions, since they can have moments and heavy tails at the same time. Their corresponding Lévy and Ornstein–Uhlenbeck processes combine both the stable and Gaussian trends. An ID distribution on $\mathbb{R}$ is tempered stable if it is purely non-Gaussian and if its Lévy measure is of the form

$$\nu(B) = \int_{\mathbb{R}} \int_0^\infty 1_B(sx)\, s^{-1-\alpha} g(s)\,ds\,\tau(dx) \tag{6}$$

where $0 < \alpha < 2$, $g$ is a completely monotone function on $(0, \infty)$ and $\tau$ is a finite Borel measure on $\mathbb{R}$ such that $\tau$ has no atom at zero and $\int_{\mathbb{R}} |x|^\alpha\,\tau(dx) < \infty$. These distributions are in class L and constitute a proper subclass of the class of Thorin distributions $T(\mathbb{R})$.
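The type G construction $X \overset{d}{=} \sqrt{V}\,N$ can be checked by simulation. The sketch below takes $V$ gamma distributed (the variance gamma case) and compares a Monte Carlo estimate of $E[\cos(uX)]$ with $\exp(\Psi_X(u))$, where equation (5) evaluated for the gamma Lévy measure $\rho(ds) = \alpha s^{-1} e^{-\beta s}\,ds$ gives $\Psi_X(u) = -\alpha \log(1 + u^2/(2\beta))$. The function names, seed, sample size, and parameter values are assumptions chosen for illustration.

```python
import math
import random


def sample_type_g(shape, rate, rng):
    """Draw X = sqrt(V) * N with V ~ gamma(shape, rate) and N ~ N(0, 1)."""
    # random.gammavariate takes a *scale* parameter, hence 1 / rate.
    v = rng.gammavariate(shape, 1.0 / rate)
    n = rng.gauss(0.0, 1.0)
    return math.sqrt(v) * n


def empirical_char_function(u, samples):
    """Monte Carlo estimate of E[cos(uX)].

    The law of X is symmetric, so its characteristic function is real.
    """
    return sum(math.cos(u * x) for x in samples) / len(samples)


def char_function_from_exponent(u, shape, rate):
    """exp(Psi_X(u)) with Psi_X(u) = -shape * log(1 + u^2 / (2 * rate))."""
    return math.exp(-shape * math.log(1.0 + u * u / (2.0 * rate)))


if __name__ == "__main__":
    rng = random.Random(12345)  # fixed seed, for reproducibility only
    draws = [sample_type_g(2.0, 3.0, rng) for _ in range(100000)]
    print(empirical_char_function(1.0, draws))
    print(char_function_from_exponent(1.0, 2.0, 3.0))
```

The agreement is only up to Monte Carlo error, which shrinks like the inverse square root of the sample size.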
References

[1] Barndorff-Nielsen, O.E. (1998). Processes of normal inverse Gaussian type, Finance and Stochastics 2, 41–68.
[2] Barndorff-Nielsen, O.E., Maejima, M. & Sato, K. (2006). Some classes of multivariate infinitely divisible distributions admitting stochastic integral representations, Bernoulli 12, 1–33.
[3] Barndorff-Nielsen, O.E., Mikosch, T. & Resnick, S. (eds) (2001). Lévy Processes: Theory and Applications, Birkhäuser, Boston.
[4] Barndorff-Nielsen, O.E. & Shephard, N. (2001). Non-Gaussian Ornstein–Uhlenbeck-based models and some of their uses in financial economics (with Discussion), Journal of the Royal Statistical Society Series B 63, 167–241.
[5] Bibby, B.M. & Sorensen, M. (2003). Hyperbolic distributions in finance, in Handbook of Heavy Tailed Distributions in Finance, S.T. Rachev, ed, Elsevier, Amsterdam.
[6] Bondesson, L. (1992). Generalized Gamma Convolutions and Related Classes of Distributions and Densities, Lecture Notes in Statistics, Springer, Berlin, Vol. 76.
[7] Eberlein, E. & Hammerstein, E.V. (2004). Generalized hyperbolic and inverse Gaussian distributions: limiting cases and approximation of processes, in Seminar on Stochastic Analysis, Random Fields and Applications IV, Progress in Probability, R.C. Dalang, M. Dozzi & F. Russo, eds, Birkhäuser, Vol. 58, pp. 221–264.
[8] Sato, K. (1999). Lévy Processes and Infinitely Divisible Distributions, Cambridge University Press, Cambridge.
[9] Steutel, F.W. & Van Harn, K. (2003). Infinite Divisibility of Probability Distributions on the Real Line, Marcel-Dekker, New York.

Further Reading

James, L.F., Roynette, B. & Yor, M. (2008). Generalized gamma convolutions, Dirichlet means, Thorin measures, with explicit examples, Probability Surveys 8, 346–415.
Rosinski, J. (2007). Tempering stable processes, Stochastic Processes and Their Applications 117, 677–707.

Related Articles

Exponential Lévy Models; Heavy Tails; Lévy Processes; Ornstein–Uhlenbeck Processes; Tempered Stable Process; Time-changed Lévy Process.

VÍCTOR PÉREZ-ABREU

Ornstein–Uhlenbeck Processes

There are several reasons why Ornstein–Uhlenbeck processes are of practical interest in financial stochastic modeling. These continuous-time stochastic processes offer the possibility of capturing important distributional deviations from Gaussianity and of flexible modeling of dependence structures, while retaining analytic tractability.

An Ornstein–Uhlenbeck (OU) process is defined as the solution $X_t$ of a Langevin-type stochastic differential equation (SDE) $dX_t = -\lambda X_t\,dt + dZ_t$, where $\lambda > 0$ and $Z_t$ is a Lévy process (see Lévy Processes). The process is named after L. S. Ornstein and G. E. Uhlenbeck who, in 1930, considered the classical Langevin equation when $Z$ is a Brownian motion, and hence $X_t$ is a Gaussian process. Historical notes, references, and details are found in [6, 7], while modeling aspects are found in [1]. At the time of writing, new extensions and applications of OU processes are thriving, many of them motivated by financial modeling.

The Gaussian OU Process

Let $\{B_t : t \ge 0\}$ be a standard Brownian motion, $\sigma$ a positive constant, and $x_0$ a real constant. The classical OU process

$$X_t = e^{-\lambda t} x_0 + \sigma \int_0^t e^{-\lambda(t-s)}\,dB_s, \qquad t \ge 0 \tag{1}$$

is the solution of the classical Langevin equation $dX_t = -\lambda X_t\,dt + \sigma\,dB_t$, $X_0 = x_0$. It was originally proposed as a model for the velocity of a Brownian particle and it is the continuous-time analog of the discrete-time autoregressive process AR(1). In mathematical finance, the OU process is used to model the dynamics of interest rates and volatilities of asset prices. The process $X_t$ is a Gaussian process with (almost surely) continuous sample paths, mean function $E(X_t) = x_0 e^{-\lambda t}$, and covariance

$$\mathrm{Cov}(X_t, X_s) = \frac{\sigma^2}{2\lambda}\left(e^{-\lambda|t-s|} - e^{-\lambda(t+s)}\right) \tag{2}$$

For $t = s$, we obtain $\mathrm{var}(X_t) = \frac{\sigma^2}{2\lambda}\left(1 - e^{-2\lambda t}\right)$.

Let $N$ be a zero-mean Gaussian random variable with variance $\sigma^2/(2\lambda)$, independent of the Brownian motion $\{B_t : t \ge 0\}$. The process $X_t = e^{-\lambda t} N + \sigma e^{-\lambda t} \int_0^t e^{\lambda s}\,dB_s$ is a stationary Gaussian process with $\mathrm{Cov}(X_t, X_s) = \frac{\sigma^2}{2\lambda} e^{-\lambda|t-s|}$. Moreover, $X_t$ is a Markov process with stationary transition probability

$$P_t(x, B) = \frac{\sqrt{\lambda}}{\sigma\sqrt{\pi\left(1 - e^{-2\lambda t}\right)}} \int_B \exp\left\{-\frac{\lambda\left(y - x e^{-\lambda t}\right)^2}{\sigma^2\left(1 - e^{-2\lambda t}\right)}\right\} dy \tag{3}$$

Non-Gaussian OU Processes

Let $\{Z_t : t \ge 0\}$ be a Lévy process (see Lévy Processes). A solution of the Langevin-type SDE $dX_t = -\lambda X_t\,dt + dZ_t$ is a stochastic process $\{X_t : t \ge 0\}$ with right-continuous and left-limit paths satisfying the equation

$$X_t = X_0 - \lambda \int_0^t X_s\,ds + Z_t, \qquad t \ge 0 \tag{4}$$

When $X_0$ is independent of $\{Z_t : t \ge 0\}$, the unique (almost surely) solution is the OU process

$$X_t = e^{-\lambda t} X_0 + \int_0^t e^{-\lambda(t-s)}\,dZ_s, \qquad t \ge 0 \tag{5}$$

We call $Z_t$ the background driving Lévy process (BDLP). Of special relevance in financial modeling is the case when $Z_t$ is a nonnegative increasing Lévy process (a subordinator) and $X_0$ is nonnegative. The corresponding OU process is positive, moves up entirely by jumps, and then tails off exponentially. Hence it can be used as a variance process.

Every OU process is a time-homogeneous Markov process starting from $X_0$, and its transition probability $P_t(x, dy)$ is infinitely divisible (see Infinite Divisibility) with characteristic function (see Filtering)

$$\int_{\mathbb{R}} e^{iuy}\,P_t(x, dy) = \exp\left\{ixue^{-\lambda t} + \int_0^t \Psi\left(e^{-\lambda s} u\right) ds\right\} \tag{6}$$
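The subordinator-driven case can be simulated exactly when the BDLP is compound Poisson, because between jumps the path of equation (5) simply decays at rate $\lambda$. The sketch below assumes a Poisson jump intensity and exponential jump sizes (both illustrative choices, as are the function names and parameter values) and compares the sample mean of $X_t$ with $e^{-\lambda t} x_0 + E[Z_1](1 - e^{-\lambda t})/\lambda$, which follows from taking expectations in (5).

```python
import math
import random


def simulate_ou_endpoint(x0, lam, jump_rate, mean_jump, t_end, rng):
    """Return X_{t_end} from equation (5) for a compound Poisson BDLP.

    Z jumps at the points of a Poisson process with intensity jump_rate,
    with i.i.d. exponential jump sizes of mean mean_jump (assumptions).
    Between jumps the path decays deterministically, so
    X_t = exp(-lam t) x0 + sum_i exp(-lam (t - tau_i)) J_i is exact.
    """
    x_t = math.exp(-lam * t_end) * x0
    tau = rng.expovariate(jump_rate)           # first jump time
    while tau <= t_end:
        jump = rng.expovariate(1.0 / mean_jump)
        x_t += math.exp(-lam * (t_end - tau)) * jump
        tau += rng.expovariate(jump_rate)      # next inter-arrival time
    return x_t


def ou_mean(x0, lam, jump_rate, mean_jump, t_end):
    """E[X_t] = exp(-lam t) x0 + E[Z_1] (1 - exp(-lam t)) / lam."""
    decay = math.exp(-lam * t_end)
    return decay * x0 + jump_rate * mean_jump * (1.0 - decay) / lam


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed, for reproducibility only
    params = dict(x0=1.0, lam=1.5, jump_rate=3.0, mean_jump=0.5, t_end=2.0)
    values = [simulate_ou_endpoint(rng=rng, **params) for _ in range(20000)]
    print(sum(values) / len(values), ou_mean(**params))
    print(min(values) > 0.0)  # the process stays positive
```

Every simulated value is positive, since a nonnegative start and nonnegative jumps are only ever multiplied by decaying exponentials, which matches the description of the process as a candidate variance process.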

In equation (6), $\Psi$ is the characteristic exponent of the Lévy process $Z_t$, given by the Lévy–Khintchine representation

$$\Psi(u) = iau - \tfrac{1}{2}u^2\sigma^2 + \int_{\mathbb{R}} \left(e^{iux} - 1 - iux\,1_{\{|x| \le 1\}}\right)\nu(dx), \qquad u \in \mathbb{R} \tag{7}$$

where $\sigma^2 \ge 0$, $a \in \mathbb{R}$, and $\nu$, the Lévy measure, is a positive measure on $\mathbb{R}$ with $\nu(\{0\}) = 0$ and $\int_{\mathbb{R}} \min(1, |x|^2)\,\nu(dx) < \infty$. For each $t > 0$, the probability distribution of $Z_t$ has characteristic function $\varphi_t(u) = E[e^{iuZ_t}] = \exp(t\Psi(u))$. When the Lévy measure is zero, $Z_t$ is a Brownian motion with variance $\sigma^2$ and drift $a$.

The Integrated OU Process

A non-Gaussian OU process $X_t$ has the same jump times as $Z_t$, as one sees from equation (4). However, $X_t$ and $Z_t$ cobreak in the sense that a linear combination of the two does not jump. We see this by considering the continuous integrated OU process $I_t^X = \int_0^t X_s\,ds$, which has two alternative representations

$$I_t^X = \lambda^{-1}\{X_0 - X_t + Z_t\} = \lambda^{-1}\left(1 - e^{-\lambda t}\right)X_0 + \lambda^{-1}\int_0^t \left(1 - e^{-\lambda(t-s)}\right) dZ_s \tag{8}$$

In the Gaussian case, the process $I_t^X$ is interpreted as the displacement of the Brownian particle. In financial applications, $I_t^X$ is used to model integrated variance [1].

Stationary Distribution and the Stationary OU Process

An OU process has an asymptotic distribution $\mu$ when $t \to \infty$ if it does not have too many big jumps. This is achieved if $Z_1$ is IDlog: $\int_{|x|>2} \ln|x|\,\nu(dx) < \infty$, where $\nu$ is the Lévy measure of $Z_1$. In this case, $\mu$ does not depend on $X_0$ and we call $\mu$ the stationary distribution of $X_t$. Moreover, $\mu$ is a self-decomposable (SD) distribution (and hence infinitely divisible): for any $\theta \in (0, 1)$, there is a random variable $\varepsilon_\theta$ independent of $X$ such that $X \overset{d}{=} \theta X + \varepsilon_\theta$. Conversely, for every SD distribution $\mu$ there exists a Lévy process $Z_t$ with $Z_1$ being IDlog and such that $\mu$ is the stationary distribution of the OU process driven by $Z_t$.

The strictly stationary OU process is defined as

$$X_t = e^{-\lambda t} \int_{-\infty}^t e^{\lambda s}\,dZ_s, \qquad t \in \mathbb{R} \tag{9}$$

where $\{Z_t : t \in \mathbb{R}\}$ is a Lévy process constructed as follows: let $\{Z_t^1 : t \ge 0\}$ be a Lévy process with characteristic exponent $\Psi_1$ and let $\{Z_t^2 : t \ge 0\}$ be a Lévy process with characteristic exponent $\Psi_2(u) = \Psi_1(-u)$ and independent of $Z^1$. Then $Z_t = Z_t^1$ for $t \ge 0$ and $Z_t = Z_{-t}^2$ for $t < 0$. In this case, the law of $X_t$ is SD and conversely, for any SD law $\mu$ there exists a BDLP $Z_t$ such that equation (9) determines a stationary OU process with distribution $\mu$. As a result, taking $X_0 = \int_{-\infty}^0 e^{\lambda s}\,dZ_s$, we can always consider (5) as a strictly stationary OU process with a prescribed SD distribution $\mu$. It is an important example of a continuous-time moving average process.

Generalizations

The monographs [6, 7] contain a detailed study of multivariate OU processes, while matrix extensions are considered in [2]. Another extension is the generalized OU process, which has arisen in several financial applications [4, 8]. It is defined as

$$X_t = e^{-\xi_t} X_0 + e^{-\xi_t} \int_0^t e^{\xi_{s-}}\,d\eta_s, \qquad t \ge 0 \tag{10}$$

where $\{(\xi_t, \eta_t) : t \ge 0\}$ is a bivariate Lévy process, independent of $X_0$. This process is a homogeneous Markov process starting from $X_0$, and, in general, the existence of the stationary solution depends on the convergence of integrals of exponentials of Lévy processes. For example, when $\xi$ and $\eta$ are independent, and if $\xi_t \to \infty$ and $V_\infty = \int_0^\infty e^{-\xi_{s-}}\,d\eta_s$ is defined and finite, then the law of $V_\infty$ is the unique stationary solution of $X_t$. In the dependent case, the generalized OU process admits a stationary solution that does not degenerate to a constant process if and only if $V_\infty = \lim_{t\to\infty} \int_0^t e^{-\xi_{s-}}\,dL_s$ exists and is finite almost surely and does not
degenerate to a constant random variable, and where $L_t$ is the accompanying Lévy process $L_t = \eta_t + \sum_{0 < s \le t} \left(e^{-\Delta\xi_s} - 1\right)\Delta\eta_s - t\,E(B_1^\xi, B_1^\eta)$, where $\Delta\xi_s = \xi_s - \xi_{s-}$, with $B_1^\xi$, $B_1^\eta$ the Gaussian parts of $\xi$ and $\eta$ respectively [3, 5].

References

[1] Barndorff-Nielsen, O.E. & Shephard, N. (2001). Non-Gaussian Ornstein-Uhlenbeck-based models and some of their uses in financial economics (with discussion), The Journal of the Royal Statistical Society B 63, 167–241.
[2] Barndorff-Nielsen, O.E. & Stelzer, R. (2007). Positive-definite matrix processes of finite variation, Probability and Mathematical Statistics 27, 3–43.
[3] Carmona, P., Petit, F. & Yor, M. (2001). Exponential functionals of Lévy processes, in Lévy Processes. Theory and Applications, O.E. Barndorff-Nielsen, T. Mikosch & S.I. Resnick, eds, Birkhäuser, pp. 41–55.
[4] Klüppelberg, C., Lindner, A. & Maller, R. (2006). Continuous time volatility modelling: COGARCH versus Ornstein-Uhlenbeck models, in The Shiryaev Festschrift: From Stochastic Calculus to Mathematical Finance, Y. Kabanov, R. Liptser & J. Stoyanov, eds, Springer, pp. 392–419.
[5] Lindner, A. & Maller, R. (2005). Lévy processes and the stationarity of generalised Ornstein-Uhlenbeck processes, Stochastic Processes and Their Applications 115, 1701–1722.
[6] Rocha-Arteaga, A. & Sato, K. (2003). Topics in Infinitely Divisible Distributions and Lévy Processes, Aportaciones Matemáticas Investigación, Mexican Mathematical Society, 17.
[7] Sato, K. (1999). Lévy Processes and Infinitely Divisible Distributions, Cambridge University Press, Cambridge.
[8] Yor, M. (2001). Exponential Functionals of Brownian Motion and Related Processes, Springer, New York.

Related Articles

Infinite Divisibility; Lévy Processes; Stochastic Integrals.

VÍCTOR PÉREZ-ABREU

Fractional Brownian Motion

A fractional Brownian motion (fBm) is a self-similar Gaussian process, defined as follows:

Definition 1 Let $0 < H < 1$. The Gaussian stochastic process $\{B_H(t)\}_{t \ge 0}$ satisfying the following three properties

(i) $B_H(0) = 0$
(ii) $E[B_H(t)] = 0$ for all $t \ge 0$,
(iii) for all $s, t \ge 0$,

$$E[B_H(t)B_H(s)] = \tfrac{1}{2}\left(|t|^{2H} - |t - s|^{2H} + |s|^{2H}\right) \tag{1}$$

is called the (standard) fBm with parameter $H$.

The fBm has been the subject of numerous investigations, in particular, in the context of long-range dependence (often referred to as long memory). fBm was first introduced in 1940 by Kolmogorov (see Kolmogorov, Andrei Nikolaevich) [11], but its main properties and its relevance in many fields of application such as economics, finance, turbulence, and telecommunications were first discussed in the seminal paper of Mandelbrot (see Mandelbrot, Benoit) and Van Ness [12].

For historical reasons, the parameter $H$ is also referred to as the Hurst coefficient. In fact, in 1951, while he was investigating the flow of the river Nile, the British hydrologist H. E. Hurst [10] noticed that his measurements showed dependence properties and, in particular, long memory behavior in the sense that they seemed to require models whose autocorrelation functions exhibit a power law decay at large timescales. This index of dependence $H$ always takes values between 0 and 1 and indicates relatively long-range dependence if $H > 0.5$; for example, Hurst observed $H = 0.91$ in the case of Nile level data.

If $H = 0.5$, it is obvious from equation (1) that the increments of fBm are independent and $\{B_{0.5}(t)\}_{t \in \mathbb{R}} = \{B(t)\}_{t \in \mathbb{R}}$ is ordinary Brownian motion. Moreover, fBm has stationary increments which, for $H \ne 0.5$, are not independent.

One can define a parametric family of fBms in terms of the stochastic Weyl integral (see e.g. [16], chapter 7.2). In fact, for any $a, b \in \mathbb{R}$,

$$\{B_H(t)\}_{t \in \mathbb{R}} \overset{d}{=} \left\{\int_{\mathbb{R}} \Big(a\big[(t - s)_+^{H - \frac{1}{2}} - (-s)_+^{H - \frac{1}{2}}\big] + b\big[(t - s)_-^{H - \frac{1}{2}} - (-s)_-^{H - \frac{1}{2}}\big]\Big)\,dB(s)\right\}_{t \in \mathbb{R}} \tag{2}$$

where $u_+ = \max(u, 0)$, $u_- = \max(-u, 0)$, and $\{B(t)\}_{t \in \mathbb{R}}$ is a two-sided standard Brownian motion constructed by taking a Brownian motion $B_1$ and an independent copy $B_2$ and setting $B(t) = B_1(t)1_{\{t \ge 0\}} - B_2(-t-)1_{\{t < 0\}}$.

If we choose $a = \sqrt{\Gamma(2H + 1)\sin(\pi H)}/\Gamma(H + 1/2)$ and $b = 0$ in equation (2), then $\{B_H(t)\}_{t \in \mathbb{R}}$ is an fBm satisfying equation (1).

fBm admits a Volterra type representation $B_H(t) = \int_0^t K_H(t, s)\,B(ds)$, where $K_H$ is some square integrable kernel (see [13] or [1] for details).

Properties

Many properties of fBm, like self-similarity, are given by its fractional index $H$.

Definition 2 A real-valued stochastic process $\{X(t)\}_{t \in \mathbb{R}}$ is self-similar with index $H$ if for all $c > 0$, $\{X(ct)\}_{t \in \mathbb{R}} \overset{d}{=} c^H \{X(t)\}_{t \in \mathbb{R}}$, where $\overset{d}{=}$ denotes equality in distribution.

Proposition 1 Fractional Brownian motion (fBm) is self-similar with index $H$. Moreover, fBm is the only self-similar Gaussian process with stationary increments.

Now, we consider the increments of fBm.

Definition 3 The stationary process $\{Y(t)\}_{t \in \mathbb{R}}$ given by

$$Y(t) = B_H(t) - B_H(t - 1), \qquad t \in \mathbb{R} \tag{3}$$

is called fractional Gaussian noise.
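Because fBm is a zero-mean Gaussian process, covariance (1) is enough to sample it exactly on a finite grid. The sketch below builds the covariance matrix on an equally spaced grid and multiplies its Cholesky factor by independent standard normals. This is one standard exact method, chosen here for brevity; the function names, grid, and seed are assumptions, and the cubic cost of the factorization limits it to short paths.

```python
import math
import random


def fbm_covariance(s, t, hurst):
    """Covariance (1): 0.5 * (|t|^{2H} - |t - s|^{2H} + |s|^{2H})."""
    two_h = 2.0 * hurst
    return 0.5 * (abs(t) ** two_h - abs(t - s) ** two_h + abs(s) ** two_h)


def cholesky(matrix):
    """Lower-triangular L with L L^T = matrix (symmetric positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def make_fbm_sampler(hurst, n_steps, t_end=1.0):
    """Return the grid t_k = k * t_end / n_steps (k >= 1) and a path sampler."""
    times = [t_end * (k + 1) / n_steps for k in range(n_steps)]
    cov = [[fbm_covariance(s, t, hurst) for t in times] for s in times]
    lower = cholesky(cov)

    def sample(rng):
        normals = [rng.gauss(0.0, 1.0) for _ in range(n_steps)]
        path = [sum(lower[i][k] * normals[k] for k in range(i + 1))
                for i in range(n_steps)]
        return [0.0] + path  # prepend B_H(0) = 0

    return times, sample


if __name__ == "__main__":
    times, sample = make_fbm_sampler(hurst=0.75, n_steps=16)
    print(sample(random.Random(1)))
```

For $H = 0.5$ the covariance reduces to $\min(s, t)$, so the same sampler produces ordinary Brownian motion.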
Figure 1 Various sample paths, each showing 500 points of fBm (horizontal axis: t; vertical axis: BH(t); paths shown for H = 0.55, H = 0.75, and H = 0.95)

For $n \in \mathbb{N}$, it follows by the stationarity of the increments of $B_H$ that

$$\rho_H(n) := \operatorname{cov}(Y(k + n), Y(k)) = \frac{1}{2}\left(|n + 1|^{2H} - 2|n|^{2H} + |n - 1|^{2H}\right) \tag{4}$$

Proposition 2

(i) If $0 < H < 0.5$, $\rho_H$ is negative and $\sum_{n=1}^{\infty} |\rho_H(n)| < \infty$.
(ii) If $H = 0.5$, $\rho_H$ equals 0, that is, the increments are independent.
(iii) If $0.5 < H < 1$, $\rho_H$ is positive, $\sum_{n=1}^{\infty} |\rho_H(n)| = \infty$, and

$$\rho_H(n) \sim C n^{2H - 2}, \quad n \to \infty \tag{5}$$

Hence, for $0.5 < H < 1$ the increments of fBm are persistent or long-range dependent, whereas for $0 < H < 0.5$ they are said to be antipersistent.

Proposition 3 The sample paths of fBm are continuous. In particular, for every $\tilde{H} < H$ there exists a modification of $B_H$ whose sample paths are almost surely (a.s.) locally $\tilde{H}$-Hölder continuous on $\mathbb{R}$, that is, for each trajectory, there exists a constant $c > 0$ such that

$$|B_H(t) - B_H(s)| \le c|t - s|^{H - \epsilon} \tag{6}$$

for any $\epsilon > 0$.

Figure 1 shows the sample paths of fBm for various values of the Hurst parameter $H$.

Proposition 4 The sample paths of fBm are of finite $p$-variation for every $p > 1/H$ and of infinite $p$-variation if $p < 1/H$.

Consequently, for $H < 0.5$ the quadratic variation is infinite. On the other hand, if $H > 0.5$ it is known that the quadratic variation of fBm is zero, whereas the total variation is infinite.

Corollary 1 This shows that for $H \neq 1/2$, fBm cannot be a semimartingale.

A proof of this well-known fact can be found in, for example, [15] or [4].

Since fBm is not a semimartingale, one cannot use the Itô stochastic integral (see Stochastic Integrals) when considering integrals with respect to fBm. Recently, integration with respect to fBm has been studied extensively and various approaches have been proposed to define a stochastic integration theory for fBm (see, e.g., [14] for a survey).
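The sign and decay statements about the autocovariance of fractional Gaussian noise can be checked numerically. The sketch below assumes the standard form $\rho_H(n) = \frac{1}{2}(|n + 1|^{2H} - 2|n|^{2H} + |n - 1|^{2H})$ and, for the comparison with the power law, the constant $C = H(2H - 1)$, which comes from a second-order Taylor expansion and is not stated in the article. The function name `fgn_autocovariance` and the lag range are illustrative.

```python
import numpy as np


def fgn_autocovariance(n, hurst):
    """rho_H(n) = (|n+1|^2H - 2|n|^2H + |n-1|^2H) / 2 for fractional Gaussian noise."""
    n = np.asarray(n, dtype=float)
    return 0.5 * (np.abs(n + 1) ** (2 * hurst)
                  - 2 * np.abs(n) ** (2 * hurst)
                  + np.abs(n - 1) ** (2 * hurst))


lags = np.arange(1, 2001)
for hurst in (0.3, 0.5, 0.8):
    rho = fgn_autocovariance(lags, hurst)
    # Partial sums of |rho| stay bounded for H < 0.5 and keep growing for H > 0.5.
    print(f"H={hurst}: rho(1)={rho[0]:+.4f}, "
          f"sum of |rho(n)| for n <= 2000: {np.abs(rho).sum():.4f}")

# Compare rho_H(n) with the assumed asymptotic form H(2H - 1) n^(2H - 2) at n = 2000.
hurst = 0.8
ratio = fgn_autocovariance(2000, hurst) / (hurst * (2 * hurst - 1) * 2000.0 ** (2 * hurst - 2))
print(f"H={hurst}: rho(2000) / (C * 2000^(2H-2)) = {ratio:.6f}")
```

For $H = 0.5$ every lag gives exactly zero, which matches the independence of Brownian increments.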

Applications in Finance

Many studies of financial time series point to long-range dependence (see Long Range Dependence), which indicates the potential usefulness of fBm in financial modeling (see [7] for a summary and references). One obstacle is that fBm is not a semimartingale (see Semimartingale), so the Itô integral cannot be used to define the gain of a self-financing portfolio as, for instance, in the Black–Scholes model (see Black–Scholes Formula). Various approaches have been developed for integrating fBm, some of which are as follows:

1. The pathwise Riemann–Stieltjes fractional integral defined by

$$\int_0^T f(t)\, dB_H(t) = \lim_{|\pi| \to 0} \sum_{k=0}^{n-1} f(t_k)\left(B_H(t_{k+1}) - B_H(t_k)\right) \tag{7}$$

where $\pi = \{t_k : 0 = t_0 < t_1 < \ldots < t_n = T\}$ is a partition of the interval $[0, T]$ and $f$ has bounded $p$-variation for some $p < 1/(1 - H)$ a.s.

2. Under some regularity conditions on $f$, the fractional Wick–Itô integral has the form

$$\int_0^T f(t)\, \delta B_H(t) = \lim_{|\pi| \to 0} \sum_{k=0}^{n-1} f(t_k) \diamond \left(B_H(t_{k+1}) - B_H(t_k)\right) \tag{8}$$

where $\diamond$ represents the Wick product [18] and the convergence is the $L^2(\Omega)$-convergence of random variables [2].

Whereas the pathwise fractional integral mirrors a Stratonovich integral, the Wick–Itô–Skorohod calculus is similar to the Itô calculus; for example, integrals always have zero expectation.

The Wick–Itô integral was constructed by Duncan et al. [8] and later applied to finance by, for example, Hu and Oksendal [9] in a fractional Black–Scholes pricing model in which the "gain" of a self-financing portfolio $\phi$ is replaced by $\int_0^T \phi(t)\, \delta S(t)$. However, results produced by this approach are controversial: indeed, for a piecewise constant strategy (represented by a simple predictable process) $\phi$, this definition does not coincide with the capital gain of the portfolio, so the approach lacks an economic interpretation [3]. An interesting study is [17], where the implications of different notions of integrals for the problem of arbitrage and the self-financing condition in the fractional pricing model are considered.

An alternative is to use mixed Brownian motion, defined as the sum of a (regular) Brownian motion and an fBm with index $H$ which, under some conditions on $H$, is a semimartingale [5]. Alternatively, Rogers [15] proposes to modify the behavior near zero of the kernel in equation (2) to obtain a semimartingale. In both cases, one loses self-similarity but retains long-range dependence.

On the other hand, there is empirical evidence of long-range dependence in absolute returns [7], showing that it might be more interesting to use fractional processes as models of volatility rather than prices [6]. Fractional volatility processes are compatible with the semimartingale assumption for prices, so the technical obstacles discussed above do not necessarily arise when defining portfolio gain processes (see Long Range Dependence; Multifractals).

References

[1] Baudoin, F. & Nualart, D. (2003). Equivalence of Volterra processes, Stochastic Processes and their Applications 107, 327–350.
[2] Bender, C. (2003). An Itô formula for generalized functionals of a fractional Brownian motion with arbitrary Hurst parameter, Stochastic Processes and their Applications 104, 81–106.
[3] Björk, T. & Hult, H. (2005). A note on Wick products and the fractional Black-Scholes model, Finance and Stochastics 9, 197–209.
[4] Cheridito, P. (2001). Regularizing Fractional Brownian Motion with a View towards Stock Price Modelling, PhD Dissertation, ETH Zurich.
[5] Cheridito, P. (2003). Arbitrage in fractional Brownian motion models, Finance and Stochastics 7, 533–553.
[6] Comte, F. & Renault, E. (1998). Long memory in continuous time stochastic volatility models, Mathematical Finance 8, 291–323.

[7] Cont, R. (2005). Long range dependence in financial time series, in Fractals in Engineering, E. Lutton & J. Levy-Vehel, eds, Springer.
[8] Duncan, T.E., Hu, Y. & Pasik-Duncan, B. (2000). Stochastic calculus for fractional Brownian motion I. Theory, SIAM Journal of Control and Optimization 28, 582–612.
[9] Hu, Y. & Oksendal, B. (2003). Fractional white noise calculus and applications to finance, Infinite Dimensional Analysis, Quantum Probability and Related Topics 6, 1–32.
[10] Hurst, H. (1951). Long term storage capacity of reservoirs, Transactions of the American Society of Civil Engineers 116, 770–1299.
[11] Kolmogorov, A.N. (1940). Wienersche Spiralen und einige andere interessante Kurven im Hilbertschen Raum, Comptes Rendus (Doklady) Académie des Sciences URSS (N.S.) 26, 115–118.
[12] Mandelbrot, B.B. & Van Ness, J.W. (1968). Fractional Brownian motions, fractional noises and applications, SIAM Review 10, 422–437.
[13] Norros, I., Valkeila, E. & Virtamo, J. (1999). An elementary approach to a Girsanov formula and other analytical results on fractional Brownian motion, Bernoulli 5, 571–589.
[14] Nualart, D. (2003). Stochastic calculus with respect to the fractional Brownian motion and applications, Contemporary Mathematics 336, 3–39.
[15] Rogers, L.C.G. (1997). Arbitrage with fractional Brownian motion, Mathematical Finance 7, 95–105.
[16] Samorodnitsky, G. & Taqqu, M. (1994). Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance, Chapman & Hall, New York.
[17] Sottinen, T. & Valkeila, E. (2003). On arbitrage and replication in the fractional Black-Scholes pricing model, Statistics and Decisions 21, 93–107.
[18] Wick, G.-C. (1950). Evaluation of the collision matrix, Physical Review 80, 268–272.

Further Reading

Doukhan, P., Oppenheim, G. & Taqqu, M.S. (2003). Theory and Applications of Long-Range Dependence, Birkhäuser, Boston.
Lin, S.J. (1995). Stochastic analysis of fractional Brownian motion, Stochastics and Stochastics Reports 55, 121–140.

Related Articles

Long Range Dependence; Mandelbrot, Benoit; Multifractals; Semimartingale; Stylized Properties of Asset Returns.

TINA M. MARQUARDT
Lévy Processes

A Lévy process is a continuous-time stochastic process with independent and stationary increments. Lévy processes may be thought of as the continuous-time analogs of random walks. Mathematically, a Lévy process can be defined as follows.

Definition 1 An $\mathbb{R}^d$-valued stochastic process $X = \{X_t : t \ge 0\}$ defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ is said to be a Lévy process if it possesses the following properties:

1. The paths of $X$ are $\mathbb{P}$-almost surely right continuous with left limits.
2. $\mathbb{P}(X_0 = 0) = 1$.
3. For $0 \le s \le t$, $X_t - X_s$ is equal in distribution to $X_{t-s}$.
4. For $0 \le s \le t$, $X_t - X_s$ is independent of $\{X_u : u \le s\}$.

Historically, Lévy processes have always played a central role in the study of stochastic processes, with some of the earliest work dating back to the early 1900s. The reason for this is that, mathematically, they represent an extremely robust class of processes, which exhibit many of the interesting phenomena that appear in, for example, the theories of stochastic and potential analysis. Moreover, this in turn, together with their elementary definition, has made Lévy processes an extremely attractive class of processes for modeling in a wide variety of physical, biological, engineering, and economic scenarios. Indeed, the first appearance of particular examples of Lévy processes can be found in the foundational works of Bachelier [1, 2], concerning the use of Brownian motion within the context of financial mathematics, and Lundberg [9], concerning the use of Poisson processes within the context of insurance mathematics.

The term Lévy process honors the work of the French mathematician Paul Lévy who, although not alone in his contribution, played an instrumental role in bringing together an understanding and characterization of processes with stationary and independent increments. In earlier literature, Lévy processes have been dealt with under various names. In the 1940s, Lévy himself referred to them as a subclass of processus additifs (additive processes), that is, processes with independent increments. For the most part, however, research literature through the 1960s and 1970s refers to Lévy processes simply as processes with stationary and independent increments. One sees a change in language through the 1980s and by the 1990s the use of the term Lévy process had become standard.

Judging by the volume of published mathematical research articles, the theory of Lévy processes can be said to have experienced a steady flow of interest from the time of the foundational works, for example, of Lévy [8], Kolmogorov [7], Khintchine [6], and Itô [5]. However, it was arguably in the 1990s that a surge of interest in this field of research occurred, drastically accelerating the breadth and depth of understanding and application of the theory of Lévy processes. While there are many who made prolific contributions during this period, as well as thereafter, the general progression of this field of mathematics was enormously encouraged by the monographs of Bertoin [3] and Sato [10]. It was also the growing research momentum in the field of financial and insurance mathematics that stimulated a great deal of the interest in Lévy processes in recent times, thus entwining the modern theory of Lévy processes ever more with its historical roots.

Lévy Processes and Infinite Divisibility

The properties of stationary and independent increments imply that a Lévy process is a Markov process. One may show in addition that Lévy processes are strong Markov processes. From Definition 1 alone it is otherwise difficult to understand the richness of the class of Lévy processes. To get a better impression in this respect, it is necessary to introduce the notion of an infinitely divisible distribution. Generally, an $\mathbb{R}^d$-valued random variable $\Theta$ has an infinitely divisible distribution if for each $n = 1, 2, \ldots$ there exists a sequence of i.i.d. random variables $\Theta_{1,n}, \ldots, \Theta_{n,n}$ such that

$$\Theta \overset{d}{=} \Theta_{1,n} + \cdots + \Theta_{n,n} \tag{1}$$

where $\overset{d}{=}$ is equality in distribution. Alternatively, this relation can be expressed in terms of characteristic exponents. That is to say, if $\Theta$ has characteristic exponent $\Psi(u) := -\log \mathbb{E}(e^{iu \cdot \Theta})$, then $\Theta$ is infinitely divisible if and only if for all $n \ge 1$ there exists a characteristic exponent of a probability distribution, say $\Psi_n$, such that $\Psi(u) = n\Psi_n(u)$ for all $u \in \mathbb{R}^d$.
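As a concrete illustration of the relation $\Psi(u) = n\Psi_n(u)$, the sketch below uses the Poisson distribution with rate $\lambda$, whose characteristic exponent under the convention above is $\Psi(u) = \lambda(1 - e^{iu})$; the $n$ i.i.d. pieces are then Poisson with rate $\lambda/n$. The choice of the Poisson law, the parameter values, the sample size, and the function names `poisson_exponent` and `empirical_exponent` are assumptions made for the example only.

```python
import numpy as np


def poisson_exponent(u, rate):
    """Psi(u) = -log E[exp(i u X)] = rate * (1 - exp(i u)) for X ~ Poisson(rate)."""
    return rate * (1.0 - np.exp(1j * u))


def empirical_exponent(samples, u):
    """Monte Carlo estimate of -log E[exp(i u X)] (principal branch of the logarithm)."""
    return -np.log(np.mean(np.exp(1j * u * samples)))


rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
rate, n, u = 2.0, 4, 0.7

# Exact check: the exponent of the whole equals n times the exponent of one piece.
psi = poisson_exponent(u, rate)
psi_from_pieces = n * poisson_exponent(u, rate / n)

# Simulation check: sum n independent Poisson(rate / n) pieces per sample.
pieces = rng.poisson(rate / n, size=(200_000, n))
psi_empirical = empirical_exponent(pieces.sum(axis=1), u)

print(psi, psi_from_pieces, psi_empirical)
```

The first two printed values agree exactly up to rounding, while the third is a noisy estimate whose error shrinks as the sample size grows.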

It turns out that $\Theta$ has an infinitely divisible distribution if and only if there exists a triple $(a, \Sigma, \Pi)$, where $a \in \mathbb{R}^d$, $\Sigma$ is a $d \times d$ matrix whose eigenvalues are all nonnegative, and $\Pi$ is a measure concentrated on $\mathbb{R}^d \setminus \{0\}$ satisfying $\int_{\mathbb{R}^d} (1 \wedge |x|^2)\, \Pi(dx) < \infty$, such that

$$\Psi(u) = i a \cdot u + \frac{1}{2} u \cdot \Sigma u + \int_{\mathbb{R}^d} \left(1 - e^{iu \cdot x} + i u \cdot x \mathbf{1}_{(|x| < 1)}\right) \Pi(dx) \tag{2}$$

for every $u \in \mathbb{R}^d$. Here, we use the notation $u \cdot x$ for the Euclidean inner product and $|x|$ for Euclidean distance. The measure $\Pi$ is called the Lévy (characteristic) measure and it is unique. The identity in equation (2) is known as the Lévy–Khintchine formula.

The link between Lévy processes and infinitely divisible distributions becomes clear when one notes that for each $t > 0$ and any $n = 1, 2, \ldots$,

$$X_t = X_{t/n} + (X_{2t/n} - X_{t/n}) + \cdots + (X_t - X_{(n-1)t/n}) \tag{3}$$

As a result of the fact that $X$ has stationary independent increments, it follows that $X_t$ is infinitely divisible.

It can be deduced from the above observation that any Lévy process has the property that for all $t \ge 0$

$$\mathbb{E}\left(e^{iu \cdot X_t}\right) = e^{-t\Psi(u)} \tag{4}$$

where $\Psi(u) := \Psi_1(u)$ is the characteristic exponent of $X_1$, which has an infinitely divisible distribution. The converse of this statement is also true, thus constituting the Lévy–Khintchine formula for Lévy processes.

Theorem 1 (Lévy–Khintchine formula for Lévy processes). Suppose that $a \in \mathbb{R}^d$, $\Sigma$ is a $d \times d$ matrix whose eigenvalues are all nonnegative, and $\Pi$ is a measure concentrated on $\mathbb{R}^d \setminus \{0\}$ satisfying $\int_{\mathbb{R}^d} (1 \wedge |x|^2)\, \Pi(dx) < \infty$. Then there exists a Lévy process having characteristic exponent

$$\Psi(u) = i a \cdot u + \frac{1}{2} u \cdot \Sigma u + \int_{\mathbb{R}^d} \left(1 - e^{iu \cdot x} + i u \cdot x \mathbf{1}_{(|x| < 1)}\right) \Pi(dx) \tag{5}$$

Two fundamental examples of Lévy processes, which are shown in the next section to form the "building blocks" of all the other Lévy processes, are Brownian motion and compound Poisson processes. A Brownian motion is the Lévy process associated with the characteristic exponent

$$\Psi(u) = \frac{1}{2} u \cdot \Sigma u \tag{6}$$

and therefore has increments over time periods of length $t$, which are Gaussian distributed with covariance matrix $t\Sigma$. It can be shown that, up to the addition of a linear drift, Brownian motions are the only Lévy processes that have continuous paths.

A compound Poisson process is the Lévy process associated with the characteristic exponent

$$\Psi(u) = \int_{\mathbb{R}^d} \left(1 - e^{iu \cdot x}\right) \lambda F(dx) \tag{7}$$

where $\lambda > 0$ and $F$ is a probability distribution. Such processes may be described pathwise by the piecewise linear process

$$\sum_{i=1}^{N_t} \xi_i, \quad t \ge 0 \tag{8}$$

where $\{\xi_i : i \ge 1\}$ is a sequence of i.i.d. random variables with common distribution $F$, and $\{N_t : t \ge 0\}$ is a Poisson process with rate $\lambda$; the latter is the process with initial value zero and with unit increments whose interarrival times are independent and exponentially distributed with parameter $\lambda$.

It is a straightforward exercise to show that the sum of any finite number of independent Lévy processes is also a Lévy process. Under some circumstances, one may show that a countably infinite sum of Lévy processes also converges in an appropriate sense to a Lévy process. This idea forms the basis of the Lévy–Itô decomposition, discussed in the next section, where, as alluded to above, the Lévy processes that are summed together are either a Brownian motion with drift or a compound Poisson process with drift.

The Lévy–Itô Decomposition

Hidden in the Lévy–Khintchine formula is a representation of the path of a given Lévy process. Every

Lévy process may always be written as the independent sum of up to a countably infinite number of other Lévy processes, at most one of which will be a linear Brownian motion and the remaining processes will be compound Poisson processes with drift.

Let $\Psi$ be the characteristic exponent of some infinitely divisible distribution with associated triple $(a, \Sigma, \Pi)$. The necessary assumption that $\int_{\mathbb{R}^d} (1 \wedge |x|^2)\, \Pi(dx) < \infty$ implies that $\Pi(A) < \infty$ for all Borel $A$ such that 0 is in the interior of $A^c$ and, in particular, that $\Pi(\{x : |x| \ge 1\}) \in [0, \infty)$. With this in mind, it is not difficult to see that, after some simple reorganization, for $u \in \mathbb{R}^d$, the Lévy–Khintchine formula can be written in the form

$$\Psi(u) = \left\{ i u \cdot a + \frac{1}{2} u \cdot \Sigma u \right\} + \left\{ \lambda_0 \int_{|x| \ge 1} \left(1 - e^{iu \cdot x}\right) F_0(dx) \right\} + \sum_{n \ge 1} \left\{ \lambda_n \int_{2^{-n} \le |x| < 2^{-(n-1)}} \left(1 - e^{iu \cdot x}\right) F_n(dx) + i \lambda_n u \cdot \int_{2^{-n} \le |x| < 2^{-(n-1)}} x F_n(dx) \right\} \tag{9}$$

where $\lambda_0 = \Pi(\{x : |x| \ge 1\})$, $F_0(dx) = \Pi(dx)/\lambda_0$, and for $n = 1, 2, 3, \ldots$, $\lambda_n = \Pi(\{x : 2^{-n} \le |x| < 2^{-(n-1)}\})$ and $F_n(dx) = \Pi(dx)/\lambda_n$ (with the understanding that the $n$th integral is absent if $\lambda_n = 0$). This decomposition suggests that the Lévy process $X = \{X_t : t \ge 0\}$ associated with $\Psi$ may be written as the independent sum

$$X_t = Y_t + X_t^{(0)} + \lim_{k \uparrow \infty} \sum_{n=1}^{k} X_t^{(n)}, \quad t \ge 0 \tag{10}$$

where

$$Y_t = B_t - at, \quad t \ge 0 \tag{11}$$

with $\{B_t : t \ge 0\}$ a $d$-dimensional Brownian motion with covariance matrix $\Sigma$,

$$X_t^{(0)} = \sum_{i=1}^{N_t^{(0)}} \xi_i^{(0)}, \quad t \ge 0 \tag{12}$$

with $\{N_t^{(0)} : t \ge 0\}$ a Poisson process with rate $\lambda_0$ and $\{\xi_i^{(0)} : i \ge 1\}$ independent and identically distributed with common distribution $F_0(dx)$ concentrated on $\{x : |x| \ge 1\}$, and for $n = 1, 2, 3, \ldots$

$$X_t^{(n)} = \sum_{i=1}^{N_t^{(n)}} \xi_i^{(n)} - \lambda_n t \int_{2^{-n} \le |x| < 2^{-(n-1)}} x F_n(dx), \quad t \ge 0 \tag{13}$$

with $\{N_t^{(n)} : t \ge 0\}$ a Poisson process with rate $\lambda_n$ and $\{\xi_i^{(n)} : i \ge 1\}$ independent and identically distributed with common distribution $F_n(dx)$ concentrated on $\{x : 2^{-n} \le |x| < 2^{-(n-1)}\}$. The limit in equation (10) needs to be understood in the appropriate context, however.

It is a straightforward exercise to deduce that $X_\cdot^{(n)}$ is a square integrable martingale on account of the fact that it is a centered compound Poisson process together with the fact that $x^2$ is integrable in the neighborhood of the origin against the measure $\Pi$. It is not difficult to see that $\sum_{n=1}^{k} X_\cdot^{(n)}$ is also a square integrable martingale. The convergence of $\sum_{n=1}^{k} X_\cdot^{(n)}$ as $k \uparrow \infty$ can happen in one of two ways. The two quantities

$$\lim_{k \uparrow \infty} \sum_{n=1}^{k} \sum_{i=1}^{N_t^{(n)}} |\xi_i^{(n)}| \quad \text{and} \quad \lim_{k \uparrow \infty} \sum_{n=1}^{k} \int_{2^{-n} \le |x| < 2^{-(n-1)}} |x| \lambda_n F_n(dx) \tag{14}$$

are either simultaneously finite or infinite (for all $t > 0$), where the random limit is understood in the almost sure sense. When both are finite, that is to say, when $\int_{|x| < 1} |x|\, \Pi(dx) < \infty$, then $\sum_{n=1}^{\infty} X_\cdot^{(n)}$ is well defined as the difference of a stochastic process with jumps and a linear drift. Conversely, when $\int_{|x| < 1} |x|\, \Pi(dx) = \infty$, it can be shown that, thanks to the assumption $\int_{|x| < 1} |x|^2\, \Pi(dx) < \infty$, $\sum_{n=1}^{k} X_\cdot^{(n)}$ converges uniformly over finite time horizons in the $L^2$ norm as $k \uparrow \infty$. In that case, the two exploding limits in equation (14) compensate one another in the right way for their difference to converge in the prescribed sense.

Either way, the properties of stationary and independent increments and almost surely right continuous paths with left limits that belong to $\sum_{n=1}^{k} X_\cdot^{(n)}$ as a finite sum of Lévy processes are also inherited by the limiting process as $k \uparrow \infty$. It is also the case that the limiting Lévy process is also a square integrable

martingale just as the elements of the approximating sequence are.

Path Variation

Consider any function $f : [0, \infty) \to \mathbb{R}^d$. Given any partition $P = \{a = t_0 < t_1 < \cdots < t_n = b\}$ of the bounded interval $[a, b]$, define the variation of $f$ over $[a, b]$ with partition $P$ by

$$V_P(f, [a, b]) = \sum_{i=1}^{n} |f(t_i) - f(t_{i-1})| \tag{15}$$

The function $f$ is said to be of bounded variation over $[a, b]$ if

$$V(f, [a, b]) := \sup_P V_P(f, [a, b]) < \infty \tag{16}$$

where the supremum is taken over all partitions of $[a, b]$. Moreover, $f$ is said to be of bounded variation if the above inequality is valid for all bounded intervals $[a, b]$. If $V(f, [a, b]) = \infty$ for all bounded intervals $[a, b]$, then $f$ is said to be of unbounded variation.

For any given stochastic process $X = \{X_t : t \ge 0\}$, we may adopt these notions in the almost sure sense. So, for example, the statement "$X$ is a process of bounded variation" (or "has paths of bounded variation") simply means that as a random mapping, $X : [0, \infty) \to \mathbb{R}^d$ is of bounded variation almost surely.

In the case that $X$ is a Lévy process, the Lévy–Itô decomposition also gives the opportunity to establish a precise characterization of the path variation of a Lévy process. Since any Lévy process may be written as the independent sum as in equation (10) and any $d$-dimensional Brownian motion is known to have paths of unbounded variation, it follows that any Lévy process for which $\Sigma \neq 0$ has unbounded variation. In the case that $\Sigma = 0$, since the paths of the component $X^{(0)}$ in equation (10) are independent and clearly of bounded variation (they are piecewise linear), the path variation of $X$ is characterized by the way in which the component $\sum_{n=1}^{k} X_t^{(n)}$ converges. In the case that

$$\int_{|x| < 1} |x|\, \Pi(dx) < \infty \tag{17}$$

the Lévy process $X$ will thus be of bounded variation and otherwise, when the above integral is infinite, the paths are of unbounded variation.

In the case that $d = 1$, as an extreme case of a Lévy process with bounded variation, it is possible that the process $X$ has nondecreasing paths, in which case it is called a subordinator. As is apparent from the Lévy–Itô decomposition (9), this will necessarily occur when $\Pi(-\infty, 0) = 0$,

$$\int_{(0,1)} x\, \Pi(dx) < \infty \tag{18}$$

and $\Sigma = 0$. In that case, reconsidering the decomposition (10), one may identify

$$X_t = \left(-a - \int_{(0,1)} x\, \Pi(dx)\right) t + \lim_{k \uparrow \infty} \sum_{n=0}^{k} \sum_{i=1}^{N_t^{(n)}} \xi_i^{(n)} \tag{19}$$

On account of the assumption $\Pi(-\infty, 0) = 0$, all the jumps $\xi_i^{(n)}$ are nonnegative. Hence, it is also a necessary condition that

$$-a - \int_{(0,1)} x\, \Pi(dx) \ge 0 \tag{20}$$

for $X$ to have nondecreasing paths. These necessary conditions are also sufficient.

Lévy Processes as Semimartingales

Recall that a semimartingale with respect to a given filtration $\mathbb{F} := \{\mathcal{F}_t : t \ge 0\}$ is defined as the sum of an $\mathbb{F}$-local martingale and an $\mathbb{F}$-adapted process of bounded variation. The importance of semimartingales is that they form a natural class of stochastic processes with respect to which one may construct a stochastic integral and thereafter perform calculus. Moreover, the theory of stochastic calculus plays a significant role in mathematical finance as it can be used as a key ingredient in justifying the pricing and hedging of derivatives in markets where risky assets are modeled as positive semimartingales.

A popular choice of model for risky assets in recent years has been the exponential of a Lévy process (see Exponential Lévy Models). Lévy processes have also been used as building blocks in more complex stochastic models for prices, such as stochastic

volatility models with jumps (see Barndorff-Nielsen and Shephard (BNS) Models) and time-changed Lévy models (see Time-changed Lévy Process). The monograph of Cont and Tankov [4] gives an extensive exposition on these types of models. Thanks to Itô's formula for semimartingales, the exponential of a Lévy process is a semimartingale as soon as the Lévy process itself can be shown to be a semimartingale. However, reconsidering equation (10) and recalling that $B$ and $\lim_{k \uparrow \infty} \sum_{n=1}^{k} X_\cdot^{(n)}$ are martingales and that $\{X_t^{(0)} - at : t \ge 0\}$ is an adapted process with bounded variation paths, it follows immediately that any Lévy process is a semimartingale.

References

[1] Bachelier, L. (1900). Théorie de la spéculation, Annales Scientifiques de l'École Normale Supérieure 17, 21–86.
[2] Bachelier, L. (1901). Théorie mathématique du jeu, Annales Scientifiques de l'École Normale Supérieure 18, 143–210.
[3] Bertoin, J. (1996). Lévy Processes, Cambridge University Press, Cambridge.
[4] Cont, R. & Tankov, P. (2004). Financial Modelling with Jump Processes, Financial Mathematics Series, Chapman & Hall/CRC.
[5] Itô, K. (1942). On stochastic processes. I. (Infinitely divisible laws of probability), Japanese Journal of Mathematics 18, 261–301.
[6] Khintchine, A. (1937). A new derivation of one formula by Levy P., Bulletin of Moscow State University I(1), 1–5.
[7] Kolmogorov, A.N. (1932). Sulla forma generale di un processo stocastico omogeneo (un problema di B. de Finetti), Atti Reale Accademia Nazionale dei Lincei Rend 15, 805–808.
[8] Lévy, P. (1934). Sur les intégrales dont les éléments sont des variables aléatoires indépendantes, Annali della Scuola Normale Superiore di Pisa 3–4, 217–218, 337–366.
[9] Lundberg, F. (1903). Approximerad framställning av sannolikhetsfunktionen, Återförsäkring av kollektivrisker, Akademisk Afhandling Almqvist och Wiksell, Uppsala.
[10] Sato, K. (1999). Lévy Processes and Infinitely Divisible Distributions, Cambridge University Press, Cambridge.

Related Articles

Generalized Hyperbolic Models; Infinite Divisibility; Jump Processes; Lévy Copulas; Normal Inverse Gaussian Model; Poisson Process; Stochastic Exponential; Tempered Stable Process; Time-changed Lévy Process; Variance-gamma Model.

ANDREAS E. KYPRIANOU
Wiener–Hopf Decomposition

A fundamental part of the theory of random walks and Lévy processes is a set of conclusions, which, in modern times, are loosely referred to as the Wiener–Hopf factorization. Historically, the identities around which the Wiener–Hopf factorization is centered are the culmination of a number of works that include [2–4, 6–8, 14–17], and many others, although the analytical roots of the so-called Wiener–Hopf method go much further back than these probabilistic references; see, for example, [9, 13]. The importance of the Wiener–Hopf factorization for either a random walk or a Lévy process is that it characterizes the range of the running maximum of the process as well as the times at which new maxima are attained. We deal with the Wiener–Hopf factorization for random walks before moving to the case of Lévy processes. The discussion very closely follows the ideas of [6, 7]. Indeed, for the case of random walks, we shall not refrain from providing proofs, as their penetrating and yet elementary nature reveals a simple path decomposition that is arguably more fundamental than the Wiener–Hopf factorization itself. The Wiener–Hopf factorization for Lévy processes is essentially a technical variant of the case for random walks and we only state it without proof.

Random Walks and Infinite Divisibility

Suppose that $\{\xi_i : i = 1, 2, \ldots\}$ is a sequence of $\mathbb{R}$-valued independent and identically distributed (i.i.d.) random variables defined on the common probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with common distribution function $F$. Let

$$S_0 = 0 \quad \text{and} \quad S_n = \sum_{i=1}^{n} \xi_i \tag{1}$$

The process $S = \{S_n : n \ge 0\}$ is called a (real valued) random walk. For convenience, we make a number of assumptions on $F$. First,

$$\min\{F(0, \infty), F(-\infty, 0)\} > 0 \tag{2}$$

meaning that the random walk may experience both positive and negative jumps, and second, $F$ has no atoms. In the prevailing analysis, we repeatedly refer to general and specific classes of infinitely divisible random variables (see Infinite Divisibility). An $\mathbb{R}^d$-valued random variable $X$ is infinitely divisible if for each $n = 1, 2, 3, \ldots$

$$X \overset{d}{=} X_{(1,n)} + \cdots + X_{(n,n)} \tag{3}$$

where $\{X_{(i,n)} : i = 1, \ldots, n\}$ are i.i.d. and the equality is in distribution. In other words, if $\mu$ is the characteristic function of $X$, then for each $n = 1, 2, 3, \ldots$ we have $\mu = (\mu_n)^n$, where $\mu_n$ is the characteristic function of some $\mathbb{R}^d$-valued random variable.

In general, if $X$ is any $\mathbb{R}^d$-valued random variable that is also infinitely divisible, then for each $\theta \in \mathbb{R}^d$, $E(e^{i\theta \cdot X}) = e^{-\Psi(\theta)}$ where

$$\Psi(\theta) = i a \cdot \theta + \frac{1}{2} Q(\theta) + \int_{\mathbb{R}^d} \left(1 - e^{i\theta \cdot x} + i\theta \cdot x \mathbf{1}_{(|x| < 1)}\right) \Pi(dx) \tag{4}$$

where $a \in \mathbb{R}^d$, $Q$ is a positive semidefinite quadratic form on $\mathbb{R}^d$ and $\Pi$ is a measure supported in $\mathbb{R}^d \setminus \{0\}$ such that

$$\int_{\mathbb{R}^d} \left(1 \wedge |x|^2\right) \Pi(dx) < \infty \tag{5}$$

Here, $|\cdot|$ is Euclidean distance and, for $a, b \in \mathbb{R}^d$, $a \cdot b$ is the usual Euclidean inner product.

A special example of an infinitely divisible distribution is the geometric distribution. The symbol $\Gamma_p$ always denotes a geometric distribution with parameter $p \in (0, 1)$ defined on $(\Omega, \mathcal{F}, \mathbb{P})$. In particular,

$$P(\Gamma_p = k) = p q^k, \quad k = 0, 1, 2, \ldots \tag{6}$$

where $q = 1 - p$. The geometric distribution has the following properties that are worth recalling for the forthcoming discussion. First,

$$P(\Gamma_p \ge k) = q^k \tag{7}$$

and, second, the lack-of-memory property:

$$P(\Gamma_p \ge n + m \mid \Gamma_p \ge m) = P(\Gamma_p \ge n), \quad n, m = 0, 1, 2, \ldots \tag{8}$$

A more general class of infinitely divisible distributions than the latter, which will shortly be of use,

are those that may be expressed as the distribution of a random walk sampled at an independent and geometrically distributed time; $S_{\Gamma_p} = \sum_{i=1}^{\Gamma_p} \xi_i$. (Note, we interpret $\sum_{i=1}^{0}$ as the empty sum.) To justify the previous claim, a straightforward computation shows that for each $n = 1, 2, 3, \ldots$

$$\mathbb{E}\left(e^{i\theta S_{\Gamma_p}}\right) = \left( \frac{p^{1/n}}{\left(1 - q\,\mathbb{E}\left(e^{i\theta \xi_1}\right)\right)^{1/n}} \right)^{n} = \left( \mathbb{E}\left(e^{i\theta S_{\Lambda_{1/n,p}}}\right) \right)^{n} \tag{9}$$

where $\Lambda_{1/n,p}$ is a negative binomial random variable with parameters $1/n$ and $p$, which is independent of $S$. The latter has distribution mass function

$$\mathbb{P}(\Lambda_{1/n,p} = k) = \frac{1}{k!} \frac{\Gamma(k + 1/n)}{\Gamma(1/n)} p^{1/n} q^k \tag{10}$$

for $k = 0, 1, 2, \ldots$

Wiener–Hopf Factorization for Random Walks

We now turn our attention to the Wiener–Hopf factorization. Fix $0 < p < 1$ and define

$$G = \inf\left\{k = 0, 1, \ldots, \Gamma_p : S_k = \max_{j = 0, 1, \ldots, \Gamma_p} S_j\right\} \tag{11}$$

where $\Gamma_p$ is a geometrically distributed random variable with parameter $p$, which is independent of the random walk $S$; that is, $G$ is the first visit of $S$ to its maximum over the time period $\{0, 1, \ldots, \Gamma_p\}$. Now define

$$N = \inf\{n > 0 : S_n > 0\} \tag{12}$$

In other words, $N$ is the first visit of $S$ to $(0, \infty)$ after time 0.

Theorem 1 (Wiener–Hopf Factorization for Random Walks) Assume all of the notation and conventions above.

(i) $(G, S_G)$ is independent of $(\Gamma_p - G, S_{\Gamma_p} - S_G)$ and both pairs are infinitely divisible.

(ii) For $0 < s \le 1$ and $\theta \in \mathbb{R}$

$$E\left(s^G e^{i\theta S_G}\right) = \exp\left\{ -\int_{(0,\infty)} \sum_{n=1}^{\infty} \left(1 - s^n e^{i\theta x}\right) q^n \frac{1}{n} F^{*n}(dx) \right\} \tag{13}$$

(iii) For $0 < s \le 1$ and $\theta \in \mathbb{R}$

$$E\left(s^N e^{i\theta S_N}\right) = 1 - \exp\left\{ -\int_{(0,\infty)} \sum_{n=1}^{\infty} s^n e^{i\theta x} \frac{1}{n} F^{*n}(dx) \right\} \tag{14}$$

Note that the third part of the Wiener–Hopf factorization characterizes what is known as the ladder height process of the random walk $S$. The latter is the bivariate random walk $(T, H) := \{(T_n, H_n) : n = 0, 1, 2, \ldots\}$ where $(T_0, H_0) = (0, 0)$, and otherwise for $n = 1, 2, 3, \ldots$,

$$T_n = \begin{cases} T_{n-1} + \min\{k \ge 1 : S_{T_{n-1} + k} > H_{n-1}\} & \text{if } T_{n-1} < \infty \\ \infty & \text{if } T_{n-1} = \infty \end{cases}$$

and

$$H_n = \begin{cases} S_{T_n} & \text{if } T_n < \infty \\ \infty & \text{if } T_n = \infty \end{cases} \tag{15}$$

That is to say, the process $(T, H)$, until becoming infinite in value, represents the times and positions of the running maxima of $S$, the so-called ladder times and ladder heights. It is not difficult to see that $T_n$ is a stopping time for each $n = 0, 1, 2, \ldots$ and hence, thanks to the i.i.d. increments of $S$, the increments of $(T, H)$ are i.i.d. with the same law as the pair $(N, S_N)$.

Proof (i) The path of the random walk may be broken into $\nu \in \{0, 1, 2, \ldots\}$ finite (or completed) excursions from the maximum followed by an additional excursion, which straddles the random time $\Gamma_p$. Here, we understand the use of the word straddle to mean that if $\ell$ is the index of the left end point of the straddling excursion then $\ell \le \Gamma_p$. By the strong Markov property for random walks and lack of memory, the completed excursions must have the same law, namely, that of a random walk sampled on the time points $\{1, 2, \ldots, N\}$ conditioned on the

event that $\{N \le \Gamma_p\}$ and hence $\nu$ is geometrically distributed with parameter $1 - P(N \le \Gamma_p)$. Mathematically, we express

$$(G, S_G) = \sum_{i=1}^{\nu} \left(N^{(i)}, H^{(i)}\right) \tag{16}$$

where the pairs $\{(N^{(i)}, H^{(i)}) : i = 1, 2, \ldots\}$ are independent having the same distribution as $(N, S_N)$ conditioned on $\{N \le \Gamma_p\}$. Note also that $G$ is the sum of the lengths of the latter conditioned excursions and $S_G$ is the sum of the respective increments of the terminal value over the initial value of each excursion. In other words, $(G, S_G)$ is the componentwise sum of $\nu$ independent copies of $(N, S_N)$ (with $(G, S_G) = (0, 0)$ if $\nu = 0$). Infinite divisibility follows as a consequence of the fact that $(G, S_G)$ is a geometric sum of i.i.d. random variables. The independence of $(G, S_G)$ and $(\Gamma_p - G, S_{\Gamma_p} - S_G)$ is immediate from the decomposition described above.

Feller's classic duality lemma (cf. [3]) for random walks says that for any $n = 0, 1, 2, \ldots$ (which may later be randomized with an independent geometric distribution), the independence and common distribution of increments implies that $\{S_{n-k} - S_n : k = 0, 1, \ldots, n\}$ has the same law as $\{-S_k : k = 0, 1, \ldots, n\}$. In the current context, the duality lemma also implies that the pair $(\Gamma_p - G, S_{\Gamma_p} - S_G)$ is equal in distribution to $(D, S_D)$ where

$$D := \sup\left\{k = 0, 1, \ldots, \Gamma_p : S_k = \min_{j = 0, 1, \ldots, \Gamma_p} S_j\right\} \tag{17}$$

(ii) Note that, as a geometric sum of i.i.d. random variables, the pair $(\Gamma_p, S_{\Gamma_p})$ is infinitely divisible. For $s \in (0, 1)$ and $\theta \in \mathbb{R}$, let $q = 1 - p$ and note also that, on one hand,

$$E\left(s^{\Gamma_p} e^{i\theta S_{\Gamma_p}}\right) = E\left( E\left(s e^{i\theta S_1}\right)^{\Gamma_p} \right) = \sum_{k \ge 0} p \left(q s E\left(e^{i\theta S_1}\right)\right)^k = \frac{p}{1 - q s E\left(e^{i\theta S_1}\right)} \tag{18}$$

and, on the other hand, with the help of Fubini's Theorem,

$$\begin{aligned} \exp\left\{ -\int_{\mathbb{R}} \sum_{n=1}^{\infty} \left(1 - s^n e^{i\theta x}\right) q^n \frac{1}{n} F^{*n}(dx) \right\} &= \exp\left\{ -\sum_{n=1}^{\infty} \left(1 - s^n E\left(e^{i\theta S_n}\right)\right) \frac{q^n}{n} \right\} \\ &= \exp\left\{ -\sum_{n=1}^{\infty} \left(1 - s^n E\left(e^{i\theta S_1}\right)^n\right) \frac{q^n}{n} \right\} \\ &= \exp\left\{ \log(1 - q) - \log\left(1 - s q E\left(e^{i\theta S_1}\right)\right) \right\} \\ &= \frac{p}{1 - q s E\left(e^{i\theta S_1}\right)} \end{aligned} \tag{19}$$

where, in the last equality, we have applied the Mercator–Newton series expansion of the logarithm. Comparing the conclusions of the last two series of equalities, the required expression for $E(s^{\Gamma_p} e^{i\theta S_{\Gamma_p}})$ follows. The Lévy measure mentioned in equation (4) is thus identifiable as

$$\Pi(dy, dx) = \sum_{n=1}^{\infty} \delta_{\{n\}}(dy) F^{*n}(dx) \frac{1}{n} q^n \tag{20}$$

for $(y, x) \in \mathbb{R}^2$.

We know that $(\Gamma_p, S_{\Gamma_p})$ may be written as the independent sum of $(G, S_G)$ and $(\Gamma_p - G, S_{\Gamma_p} - S_G)$, where both are infinitely divisible. Further, the former has Lévy measure supported on $\{1, 2, \ldots\} \times (0, \infty)$ and the latter has Lévy measure supported on $\{1, 2, \ldots\} \times (-\infty, 0)$. In addition, $E(s^G e^{i\theta S_G})$ extends to the upper half of the complex plane in $\theta$ (and is continuous on the real axis) and $E(s^{\Gamma_p - G} e^{i\theta (S_{\Gamma_p} - S_G)})$ extends to the lower half of the complex plane in $\theta$ (and is continuous on the real axis).ᵃ Taking account of equation (4), this forces the factorization of the expression for $E(s^{\Gamma_p} e^{i\theta S_{\Gamma_p}})$ in such a way that

$$E\left(s^G e^{i\theta S_G}\right) = \exp\left\{ -\int_{(0,\infty)} \sum_{n=1}^{\infty} \left(1 - s^n e^{i\theta x}\right) q^n F^{*n}(dx)/n \right\} \tag{21}$$

(iii) Note that the path decomposition given in part (i) shows that

$$E\left(s^G e^{i\theta S_G}\right) = E\left( s^{\sum_{i=1}^{\nu} N^{(i)}} e^{i\theta \sum_{i=1}^{\nu} H^{(i)}} \right) \tag{22}$$
4 Wiener–Hopf Decomposition

where the pairs $\{(N^{(i)}, H^{(i)}) : i = 1, 2, \ldots\}$ are independent having the same distribution as $(N, S_N)$ conditioned on $\{N \le \Gamma_p\}$. Hence, we have

$$\begin{aligned}
E\big(s^G e^{i\theta S_G}\big)
&= \sum_{k \ge 0} P(N > \Gamma_p) P(N \le \Gamma_p)^k\, E\Big(s^{\sum_{i=1}^{k} N^{(i)}} e^{i\theta \sum_{i=1}^{k} H^{(i)}}\Big) \\
&= \sum_{k \ge 0} P(N > \Gamma_p) P(N \le \Gamma_p)^k\, E\big(s^N e^{i\theta S_N} \,\big|\, N \le \Gamma_p\big)^k \\
&= \sum_{k \ge 0} P(N > \Gamma_p)\, E\big(s^N e^{i\theta S_N} \mathbf{1}_{(N \le \Gamma_p)}\big)^k \\
&= \sum_{k \ge 0} P(N > \Gamma_p)\, E\big((qs)^N e^{i\theta S_N}\big)^k \\
&= \frac{P(N > \Gamma_p)}{1 - E\big((qs)^N e^{i\theta S_N}\big)}
\end{aligned} \tag{23}$$

Note that in the fourth equality we use the fact that $P(\Gamma_p \ge n) = q^n$.

The required equality to be proved follows by setting $s = 0$ in equation (21) to recover

$$P(N > \Gamma_p) = \exp\Big\{-\int_{(0,\infty)} \sum_{n=1}^{\infty} \frac{q^n}{n} F^{*n}(dx)\Big\} \tag{24}$$

and then plugging this back into the right-hand side of equation (23) and rearranging.

Lévy Processes and Infinite Divisibility

A (one-dimensional) stochastic process $X = \{X_t : t \ge 0\}$ is called a Lévy process (see Lévy Processes) on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$ if

1. $X$ has paths that are $\mathbb{P}$-almost surely right continuous with left limits;
2. given $0 \le s \le t < \infty$, $X_t - X_s$ is independent of $\{X_u : u \le s\}$;
3. given $0 \le s \le t < \infty$, $X_t - X_s$ is equal in distribution to $X_{t-s}$; and
4. $$\mathbb{P}(X_0 = 0) = 1 \tag{25}$$

It is easy to deduce that if $X$ is a Lévy process, then for each $t > 0$ the random variable $X_t$ is infinitely divisible. Indeed, one may also show via a straightforward computation that

$$\mathbb{E}\big(e^{i\theta X_t}\big) = e^{-\Psi(\theta) t} \quad \text{for all } \theta \in \mathbb{R},\ t \ge 0 \tag{26}$$

where, in its most general form, $\Psi$ takes the form given in equation (4). Conversely, it can also be shown that given a Lévy–Khintchine exponent (4) of an infinitely divisible random variable, there exists a Lévy process that satisfies equation (26). In the special case that the Lévy–Khintchine exponent $\Psi$ belongs to that of a positive-valued infinitely divisible distribution, it follows that the increments of the associated Lévy process must be positive and hence its paths are necessarily monotone increasing. In full generality, a Lévy process may be naively thought of as the independent sum of a linear Brownian motion plus an independent process with discontinuities in its path, which, in turn, may be seen as the limit (in an appropriate sense) of the partial sums of a sequence of compound Poisson processes with drift. The book by Bertoin [1] gives a comprehensive account of the above details.

The definition of a Lévy process suggests that it may be thought of as a continuous-time analog of a random walk. Let us introduce the exponential random variable with parameter $p$, denoted by $\mathbf{e}_p$, which henceforth is assumed to be independent of all other random quantities under discussion and defined on the same probability space. Like the geometric distribution, the exponential distribution also has a lack-of-memory property in the sense that for all $0 \le s, t < \infty$ we have $\mathbb{P}(\mathbf{e}_p > t + s \mid \mathbf{e}_p > t) = \mathbb{P}(\mathbf{e}_p > s) = e^{-ps}$. Moreover, $\mathbf{e}_p$, and, more generally, $X_{\mathbf{e}_p}$, is infinitely divisible. Indeed, straightforward computations show that for each $n = 1, 2, 3, \ldots$

$$\mathbb{E}\big(e^{i\theta X_{\mathbf{e}_p}}\big) = \left(\Big(\frac{p}{p + \Psi(\theta)}\Big)^{1/n}\right)^{n} = \Big(\mathbb{E}\big(e^{i\theta X_{\gamma_{1/n,p}}}\big)\Big)^{n} \tag{27}$$

where $\gamma_{1/n,p}$ is a gamma distribution with parameters $1/n$ and $p$, which is independent of $X$. The latter has distribution

$$\mathbb{P}(\gamma_{1/n,p} \in dx) = \frac{p^{1/n}}{\Gamma(1/n)}\, x^{-1+1/n} e^{-px}\, dx \tag{28}$$

for $x > 0$.
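The identity in equation (27) can be checked numerically. The following is a minimal sketch, assuming (purely for illustration) that X is a linear Brownian motion with drift 0.4 and unit volatility, so that its exponent is Ψ(θ) = −i·0.4·θ + θ²/2; the function names, parameter values, and the Simpson quadrature are choices made here, not part of the article.

```python
import cmath


def levy_exponent(theta, drift=0.4, sigma=1.0):
    """Exponent Psi(theta) of a linear Brownian motion (illustrative assumption),
    defined through E[exp(i*theta*X_t)] = exp(-Psi(theta) * t)."""
    return -1j * drift * theta + 0.5 * (sigma * theta) ** 2


def cf_at_exponential_time(theta, p, horizon=40.0, steps=40000):
    """Approximate E[exp(i*theta*X_{e_p})] = int_0^inf p*exp(-p*t)*exp(-Psi*t) dt
    with the composite Simpson rule on [0, horizon] (steps must be even)."""
    psi = levy_exponent(theta)
    h = horizon / steps

    def integrand(t):
        return p * cmath.exp(-(p + psi) * t)

    total = integrand(0.0) + integrand(horizon)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3


theta, p, n = 1.3, 2.0, 5
numeric = cf_at_exponential_time(theta, p)
closed_form = p / (p + levy_exponent(theta))

# Principal n-th root: the candidate characteristic function of X sampled at an
# independent gamma(1/n, p) time, whose n-th power should recover closed_form.
nth_root = closed_form ** (1.0 / n)

print(abs(numeric - closed_form))
print(abs(nth_root ** n - closed_form))
```

Both printed differences should be at the level of numerical round-off, which is consistent with the first equality in (27) and with the n-fold factorization behind infinite divisibility.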

Wiener–Hopf Factorization for Lévy Processes

The Wiener–Hopf factorization for a one-dimensional Lévy process is slightly more technical than for random walks but, in principle, appeals to essentially the same ideas that have been exhibited in the above exposition of the Wiener–Hopf factorization for random walks. In this section, therefore, we give only a statement of the Wiener–Hopf factorization. The reader who is interested in the full technical details is directed primarily to the article by Greenwood and Pitman [6] for a natural and insightful probabilistic presentation (in the author's opinion). Alternative accounts based on the aforementioned article can be found in the books by Bertoin [1] and Kyprianou [12], and derivation of the Wiener–Hopf factorization for Lévy processes from the Wiener–Hopf factorization for random walks can be found in [18].

Before proceeding to the statement of the Wiener–Hopf factorization, we first need to introduce the ladder process associated with any Lévy process $X$. Here, we encounter more subtleties than for the random walk. Consider the range of the times and positions at which the process $X$ attains new maxima. That is to say, the random set $\{(t, \overline{X}_t) : \overline{X}_t = X_t\}$ where $\overline{X}_t = \sup_{s \le t} X_s$ is the running maximum. It turns out that this range is equal in law to the range of a killed bivariate subordinator $(\tau, H) = \{(\tau_t, H_t) : t < \zeta\}$, where the killing time $\zeta$ is an independent and exponentially distributed random variable with some rate $\lambda \ge 0$. In the case that $\lim_{t \uparrow \infty} \overline{X}_t = \infty$, there should be no killing in the process $(\tau, H)$ and hence $\lambda = 0$ and we interpret $\mathbb{P}(\zeta = \infty) = 1$. Note that we may readily define the Laplace exponent of the killed process $(\tau, H)$ by

$$\mathbb{E}\big(e^{-\alpha \tau_t - \beta H_t} \mathbf{1}_{(t < \zeta)}\big) = e^{-\kappa(\alpha, \beta) t} \tag{29}$$

for all $\alpha, \beta \ge 0$ where, necessarily, $\kappa(\alpha, \beta) = \lambda + \phi(\alpha, \beta)$, $\lambda$ is the rate of $\zeta$, and $\phi$ is the bivariate Laplace exponent of the unkilled process $\{(\tau_t, H_t) : t \ge 0\}$. Analogous to the role played by the joint probability generating and characteristic exponent of the pair $(N, S_N)$ in Theorem 1 (iii), the quantity $\kappa(\alpha, \beta)$ also is prominent in the Wiener–Hopf factorization for Lévy processes, which we state below. To do so, we give one final definition. For each $t > 0$, let

$$G_{\mathbf{e}_p} = \sup\{s < \mathbf{e}_p : X_s = \overline{X}_s\} \tag{30}$$

Theorem 2 (The Wiener–Hopf Factorization for Lévy Processes) Suppose that $X$ is any Lévy process other than a compound Poisson process. As usual, denote by $\mathbf{e}_p$ an independent and exponentially distributed random variable.

(i) The pairs

$$(G_{\mathbf{e}_p}, \overline{X}_{\mathbf{e}_p}) \quad \text{and} \quad (\mathbf{e}_p - G_{\mathbf{e}_p}, \overline{X}_{\mathbf{e}_p} - X_{\mathbf{e}_p}) \tag{31}$$

are independent and infinitely divisible.

(ii) For $\alpha, \beta \ge 0$

$$\mathbb{E}\big(e^{-\alpha G_{\mathbf{e}_p} - \beta \overline{X}_{\mathbf{e}_p}}\big) = \frac{\kappa(p, 0)}{\kappa(p + \alpha, \beta)} \tag{32}$$

(iii) The Laplace exponent $\kappa(\alpha, \beta)$ may be identified in terms of the law of $X$ in the following way,

$$\kappa(\alpha, \beta) = k \exp\left(\int_0^{\infty} \int_0^{\infty} \big(e^{-t} - e^{-\alpha t - \beta x}\big) \frac{1}{t}\, \mathbb{P}(X_t \in dx)\, dt\right) \tag{33}$$

where $\alpha, \beta \ge 0$ and $k$ is a dimensionless strictly positive constant.

The First Passage Problem and Mathematical Finance

There are many applications of the Wiener–Hopf factorization in applied probability, and mathematical finance is no exception in this respect. One of the most prolific links is the relationship between the information contained in the Wiener–Hopf factorization and the distributions of the first passage times

$$\tau_x^+ := \inf\{t > 0 : X_t > x\} \quad \text{and} \quad \tau_x^- := \inf\{t > 0 : X_t < x\} \tag{34}$$

together with the overshoots $X_{\tau_x^+} - x$ and $x - X_{\tau_x^-}$, where $x \in \mathbb{R}$. In turn, this is helpful for the pricing of certain types of exotic options.

For example, in a simple market model for which there is one risky asset modeled by an exponential Lévy process and one riskless asset with a fixed rate of return, say $r > 0$, the value of a perpetual American put, or indeed a perpetual down-and-in

put, boils down to the computation of the following quantity:

$$v_y(x) := \mathbb{E}\Big(e^{-r \tau_y^-} \big(K - e^{X_{\tau_y^-}}\big)^+ \,\Big|\, X_0 = x\Big) \tag{35}$$

where $y \in \mathbb{R}$ and $z^+ = \max\{0, z\}$ and the expectation is taken with respect to an appropriate risk-neutral measure that keeps $X$ in the class of Lévy processes (e.g., the measure that occurs as a result of the Esscher transform). To see the connection with the Wiener–Hopf factorization consider the following lemma and its corollary:

Lemma 1 For all $\alpha > 0$, $\beta \ge 0$ and $x \ge 0$ we have

$$\mathbb{E}\Big(e^{-\alpha \tau_x^+ - \beta X_{\tau_x^+}} \mathbf{1}_{(\tau_x^+ < \infty)}\Big) = \frac{\mathbb{E}\Big(e^{-\beta \overline{X}_{\mathbf{e}_\alpha}} \mathbf{1}_{(\overline{X}_{\mathbf{e}_\alpha} > x)}\Big)}{\mathbb{E}\Big(e^{-\beta \overline{X}_{\mathbf{e}_\alpha}}\Big)} \tag{36}$$

Proof First, assume that $\alpha, \beta, x > 0$ and note that

$$\begin{aligned}
\mathbb{E}\Big(e^{-\beta \overline{X}_{\mathbf{e}_\alpha}} \mathbf{1}_{(\overline{X}_{\mathbf{e}_\alpha} > x)}\Big)
&= \mathbb{E}\Big(e^{-\beta \overline{X}_{\mathbf{e}_\alpha}} \mathbf{1}_{(\tau_x^+ < \mathbf{e}_\alpha)}\Big) \\
&= \mathbb{E}\Big(\mathbf{1}_{(\tau_x^+ < \mathbf{e}_\alpha)}\, e^{-\beta X_{\tau_x^+}}\, \mathbb{E}\Big(e^{-\beta (\overline{X}_{\mathbf{e}_\alpha} - X_{\tau_x^+})} \,\Big|\, \mathcal{F}_{\tau_x^+}\Big)\Big)
\end{aligned} \tag{37}$$

Now, conditionally on $\mathcal{F}_{\tau_x^+}$ and on the event $\tau_x^+ < \mathbf{e}_\alpha$, the random variables $\overline{X}_{\mathbf{e}_\alpha} - X_{\tau_x^+}$ and $\overline{X}_{\mathbf{e}_\alpha}$ have the same distribution, thanks to the lack-of-memory property of $\mathbf{e}_\alpha$ and the strong Markov property. Hence, we have the factorization

$$\mathbb{E}\Big(e^{-\beta \overline{X}_{\mathbf{e}_\alpha}} \mathbf{1}_{(\overline{X}_{\mathbf{e}_\alpha} > x)}\Big) = \mathbb{E}\Big(e^{-\alpha \tau_x^+ - \beta X_{\tau_x^+}}\Big)\, \mathbb{E}\Big(e^{-\beta \overline{X}_{\mathbf{e}_\alpha}}\Big) \tag{38}$$

The case that $\beta$ or $x$ is equal to zero can be achieved by taking limits on both sides of the above equality.

By replacing $X$ by $-X$ in Lemma 1, we get the following analogous result for first passage into the negative half line.

Corollary 1 For all $\alpha, \beta \ge 0$ and $x \ge 0$, we have

$$\mathbb{E}\Big(e^{-\alpha \tau_{-x}^- + \beta X_{\tau_{-x}^-}} \mathbf{1}_{(\tau_{-x}^- < \infty)}\Big) = \frac{\mathbb{E}\Big(e^{\beta \underline{X}_{\mathbf{e}_\alpha}} \mathbf{1}_{(-\underline{X}_{\mathbf{e}_\alpha} > x)}\Big)}{\mathbb{E}\Big(e^{\beta \underline{X}_{\mathbf{e}_\alpha}}\Big)} \tag{39}$$

In that case, we may develop the expression in equation (35) by using Corollary 1 to obtain

$$v_y(x) = \frac{\mathbb{E}\Big(\big(K\, \mathbb{E}\big(e^{\underline{X}_{\mathbf{e}_r}}\big) - e^{x + \underline{X}_{\mathbf{e}_r}}\big) \mathbf{1}_{(-\underline{X}_{\mathbf{e}_r} > x - y)}\Big)}{\mathbb{E}\big(e^{\underline{X}_{\mathbf{e}_r}}\big)} \tag{40}$$

where $\underline{X}_t = \inf_{s \le t} X_s$ is the running infimum. Ultimately, further development of the expression on the right-hand side above requires knowledge of the distribution of $\underline{X}_{\mathbf{e}_r}$. This is information, which, in principle, can be extracted from the Wiener–Hopf factorization.

We conclude by mentioning the articles [5, 10] and [11] in which the Wiener–Hopf factorization is used for the pricing of barrier options (see Lookback Options).

End Notes

a. It is this part of the proof that makes the connection with the general analytic technique of the Wiener–Hopf method of factorizing operators. This also explains the origin of the terminology Wiener–Hopf factorization for what is otherwise a path, and consequently distributional, decomposition.

References

[1] Bertoin, J. (1996). Lévy Processes, Cambridge University Press.
[2] Borovkov, A.A. (1976). Stochastic Processes in Queueing Theory, Springer-Verlag.
[3] Feller, W. (1971). An Introduction to Probability Theory and its Applications, 2nd Edition, Wiley, Vol. II.
[4] Fristedt, B.E. (1974). Sample functions of stochastic processes with stationary independent increments, Advances in Probability 3, 241–396.
[5] Fusai, G., Abrahams, I.D. & Sgarra, C. (2006). An exact analytical solution for discrete barrier options, Finance and Stochastics 10, 1–26.
[6] Greenwood, P.E. & Pitman, J.W. (1979). Fluctuation identities for Lévy processes and splitting at the maximum, Advances in Applied Probability 12, 839–902.
[7] Greenwood, P.E. & Pitman, J.W. (1980). Fluctuation identities for random walk by path decomposition at the maximum. Abstracts of the Ninth Conference on Stochastic Processes and Their Applications, Evanston, Illinois, 6–10 August 1979, Advances in Applied Probability 12, 291–293.
[8] Gusak, D.V. & Korolyuk, V.S. (1969). On the joint distribution of a process with stationary independent increments and its maximum. Theory of Probability 14, 400–409.
[9] Hopf, E. (1934). Mathematical Problems of Radiative Equilibrium. Cambridge tracts, No. 31.
[10] Jeannin, M. & Pistorius, M.R. (2007). A Transform Approach to Calculate Prices and Greeks of Barrier Options Driven by a Class of Lévy. Available at arXiv: http://arxiv.org/abs/0812.3128.
[11] Kudryavtsev, O. & Levendorski, S.Z. (2007). Fast and Accurate Pricing of Barrier Options Under Levy Processes. Available at SSRN: http://ssrn.com/abstract=1040061.
[12] Kyprianou, A.E. (2006). Introductory Lectures on Fluctuations of Lévy Processes with Applications, Springer.
[13] Payley, R. & Wiener, N. (1934). Fourier Transforms in the Complex Domain, American Mathematical Society. Colloquium Publications, New York, Vol. 19.
[14] Percheskii, E.A. & Rogozin, B.A. (1969). On the joint distribution of random variables associated with fluctuations of a process with independent increments, Theory of Probability and its Applications 14, 410–423.
[15] Spitzer, E. (1956). A combinatorial lemma and its application to probability theory, Transactions of the American Mathematical Society 82, 323–339.
[16] Spitzer, E. (1957). The Wiener-Hopf equation whose kernel is a probability density, Duke Mathematical Journal 24, 327–343.
[17] Spitzer, E. (1964). Principles of Random Walk, Van Nostrand.
[18] Sato, K.-I. (1999). Lévy Processes and Infinitely Divisible Distributions, Cambridge University Press.

Related Articles

Fractional Brownian Motion; Infinite Divisibility; Lévy Processes; Lookback Options.

ANDREAS E. KYPRIANOU
Poisson Process

In this article, we present the main results on Poisson processes, which are standard examples of jump processes. The reader can refer to the books [2, 5] for the study of standard Poisson processes, or [1, 3, 4, 6] for general Poisson processes.

Counting Processes and Stochastic Integrals

Let $(T_n, n \ge 0)$ be a strictly increasing sequence of random times (i.e., nonnegative random variables on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$) such that $\lim_{n \to \infty} T_n = \infty$, with $T_0 = 0$. The counting process $N$ associated with $(T_n, n \ge 0)$ is defined as

$$N_t = \begin{cases} n & \text{if } t \in [T_n, T_{n+1}[ \\ +\infty & \text{otherwise} \end{cases} \tag{1}$$

or, equivalently,

$$N_t = \sum_{n \ge 1} \mathbf{1}_{\{T_n \le t\}} = \sum_{n \ge 1} n\, \mathbf{1}_{\{T_n \le t < T_{n+1}\}} \tag{2}$$

It is an increasing, right-continuous process. We denote by $N_{t-}$ the left limit of $N_s$ when $s \to t$, $s < t$ and by $\Delta N_s = N_s - N_{s-}$ the jump process of $N$. The stochastic integral of a real-valued process $C$ with respect to the increasing process $N$ is defined as

$$(C \star N)_t := \int_0^t C_s\, dN_s = \int_{]0,t]} C_s\, dN_s = \sum_{n=1}^{\infty} C_{T_n} \mathbf{1}_{\{T_n \le t\}} \tag{3}$$

The natural filtration of $N$ (i.e., the smallest right-continuous and complete filtration that makes the process $N$ adapted) is denoted by $\mathbf{F}^N$.

Standard Poisson Process

The standard Poisson process is a counting process $(N_t, t \ge 0)$ with stationary and independent increments, that is,

• for every $s, t \ge 0$, $N_{t+s} - N_t$ is independent of $\mathcal{F}_t^N$; and
• for every $s, t \ge 0$, the r.v. $N_{t+s} - N_t$ has the same law as $N_s$.

For any fixed $t \ge 0$, the random variable $N_t$ has a Poisson law, with parameter $\lambda t$, that is, $\mathbb{P}(N_t = n) = e^{-\lambda t} \big((\lambda t)^n / n!\big)$ and, for every $x > 0$, $t > 0$, $u, \alpha \in \mathbb{R}$

$$\begin{aligned}
&\mathbb{E}(N_t) = \lambda t, \qquad \mathrm{Var}(N_t) = \lambda t \\
&\mathbb{E}(x^{N_t}) = e^{\lambda t (x - 1)}; \qquad \mathbb{E}(e^{iuN_t}) = e^{\lambda t (e^{iu} - 1)}; \qquad \mathbb{E}(e^{\alpha N_t}) = e^{\lambda t (e^{\alpha} - 1)}
\end{aligned} \tag{4}$$

From the property of independence and stationarity of the increments, it follows that the process $(M_t := N_t - \lambda t, t \ge 0)$ is a martingale. More generally, if $H$ is an $\mathbf{F}^N$-predictablea bounded process, then the following processes are $\mathbf{F}^N$-martingales:

$$\begin{aligned}
&(H \star M)_t := \int_0^t H_s\, dM_s = \int_0^t H_s\, dN_s - \lambda \int_0^t H_s\, ds \\
&\big((H \star M)_t\big)^2 - \lambda \int_0^t H_s^2\, ds \\
&\exp\left(\int_0^t H_s\, dN_s - \lambda \int_0^t (e^{H_s} - 1)\, ds\right)
\end{aligned} \tag{5}$$

In particular, the processes $(M_t^2 - \lambda t, t \ge 0)$ and $(M_t^2 - N_t, t \ge 0)$ are martingales. The process $(\lambda t, t \ge 0)$ is the predictable quadratic variation process of $M$ (or the compensator of $N$), denoted by $\langle N \rangle$, the process $(N_t, t \ge 0)$ equals in this case its optional quadratic variation, denoted by $[N]$.

The above martingale properties do not extend to $\mathbf{F}^N$-adapted processes $H$. For example, from the simple equality $\int_0^t (N_s - N_{s-})\, dM_s = N_t$, it follows that $\int_0^t N_s\, dM_s$ is not a martingale.

Predictable Representation Property

Proposition 1 Let $N$ be a Poisson process, and $H_\infty \in L^2(\mathcal{F}_\infty^N)$, a square-integrable random variable. Then, there exists an $\mathbf{F}^N$-predictable process $(h_s, s \ge 0)$ such that

$$H_\infty = \mathbb{E}(H_\infty) + \int_0^\infty h_s\, dM_s \tag{6}$$

and $\mathbb{E}\big(\int_0^\infty h_s^2\, ds\big) < \infty$, where $M_t = N_t - \lambda t$.
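The closed-form moments and transforms of equation (4) are easy to sanity-check by summing directly against the Poisson probabilities. The sketch below does this for one arbitrary set of parameter values; the helper names `poisson_pmf` and `expectation`, the truncation level, and the numbers chosen are illustrative assumptions.

```python
import cmath
import math


def poisson_pmf(n, mean):
    """P(N_t = n) for a Poisson law with parameter `mean` = lambda * t,
    evaluated in log-space to avoid overflow of n!."""
    return math.exp(-mean + n * math.log(mean) - math.lgamma(n + 1))


def expectation(g, mean, n_max=200):
    """E[g(N_t)] approximated by truncating the series at n_max terms."""
    return sum(g(n) * poisson_pmf(n, mean) for n in range(n_max))


lam, t = 1.5, 2.0
mean = lam * t
x, u, alpha = 0.7, 0.9, 0.3

first_moment = expectation(lambda n: n, mean)
variance = expectation(lambda n: n * n, mean) - first_moment ** 2
pgf = expectation(lambda n: x ** n, mean)                    # E[x^{N_t}]
char_fn = expectation(lambda n: cmath.exp(1j * u * n), mean)  # E[e^{iuN_t}]
mgf = expectation(lambda n: math.exp(alpha * n), mean)        # E[e^{alpha N_t}]

print(first_moment, variance)
print(pgf, math.exp(mean * (x - 1)))
print(char_fn, cmath.exp(mean * (cmath.exp(1j * u) - 1)))
print(mgf, math.exp(mean * (math.exp(alpha) - 1)))
```

Each printed pair should agree up to truncation and round-off error, which is negligible here because the Poisson tail beyond 200 terms is vanishingly small for a parameter of 3.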

It follows that if $X$ is a square-integrable $\mathbf{F}^N$-martingale, there exists an $\mathbf{F}^N$-predictable process $(x_s, s \ge 0)$ such that $X_t = X_0 + \int_0^t x_s\, dM_s$.

Independent Poisson Processes

Here, we assume that the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ is endowed with a filtration $\mathbf{F}$.

A process $(N^1, \ldots, N^d)$ is a $d$-dimensional $\mathbf{F}$-Poisson process (with $d \ge 1$) if each $(N^j, j = 1, \ldots, d)$ is a right-continuous $\mathbf{F}$-adapted process such that $N_0^j = 0$, and if there exist constants $(\lambda_j, j = 1, \ldots, d)$ such that for every $t \ge s \ge 0$, $\forall n_j \in \mathbb{N}$,

$$\mathbb{P}\Big(\cap_{j=1}^{d} (N_t^j - N_s^j = n_j) \,\Big|\, \mathcal{F}_s\Big) = \prod_{j=1}^{d} e^{-\lambda_j (t - s)} \frac{(\lambda_j (t - s))^{n_j}}{n_j!} \tag{7}$$

Proposition 2 An $\mathbf{F}$-adapted process $N$ is a $d$-dimensional $\mathbf{F}$-Poisson process if and only if

1. each $N^j$ is an $\mathbf{F}$-Poisson process
2. no two $N^j$'s jump simultaneously.

Inhomogeneous Poisson Processes

We assume that the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ is endowed with a filtration $\mathbf{F}$.

Definition

Let $\lambda$ be an $\mathbf{F}$-adapted nonnegative process satisfying $\mathbb{E}\big(\int_0^t \lambda_s\, ds\big) < \infty$, $\forall t$, and $\int_0^\infty \lambda_s\, ds = \infty$.

An inhomogeneous Poisson process $N$ with stochastic intensity $\lambda$ is a counting process such that for every nonnegative $\mathbf{F}$-predictable process $(\phi_t, t \ge 0)$, the following equality is satisfied:

$$\mathbb{E}\left(\int_0^\infty \phi_s\, dN_s\right) = \mathbb{E}\left(\int_0^\infty \phi_s \lambda_s\, ds\right) \tag{8}$$

Therefore $(M_t = N_t - \int_0^t \lambda_s\, ds, t \ge 0)$ is an $\mathbf{F}$-martingale, and if $\phi$ is an $\mathbf{F}$-predictable process such that $\forall t$, $\mathbb{E}\big(\int_0^t |\phi_s| \lambda_s\, ds\big) < \infty$, then $(\int_0^t \phi_s\, dM_s, t \ge 0)$ is an $\mathbf{F}$-martingale. The process $\Lambda_t = \int_0^t \lambda_s\, ds$ is called the compensator of $N$.

An inhomogeneous Poisson process with stochastic intensity $\lambda$ can be viewed as a time change of $\widetilde{N}$, a standard Poisson process: indeed, the process $(N_t = \widetilde{N}_{\Lambda_t}, t \ge 0)$ is an inhomogeneous Poisson process with stochastic intensity $(\lambda_t, t \ge 0)$.

For $H$ an $\mathbf{F}$-predictable process satisfying some integrability conditions, the following processes are martingales:

$$\begin{aligned}
&(H \star M)_t = \int_0^t H_s\, dM_s = \int_0^t H_s\, dN_s - \int_0^t \lambda_s H_s\, ds \\
&\big((H \star M)_t\big)^2 - \int_0^t \lambda_s H_s^2\, ds \\
&\exp\left(\int_0^t H_s\, dN_s - \int_0^t \lambda_s (e^{H_s} - 1)\, ds\right)
\end{aligned} \tag{9}$$

Stochastic Calculus

Integration by Parts Formula. Let $dX_t = b_t\, dt + \varphi_t\, dM_t$ and $dY_t = c_t\, dt + \psi_t\, dM_t$, where $\varphi$ and $\psi$ are predictable processes, and $b$, $c$ are adapted processes such that the processes $X$ and $Y$ are well defined. Then,

$$X_t Y_t = xy + \int_0^t Y_{s-}\, dX_s + \int_0^t X_{s-}\, dY_s + [X, Y]_t \tag{10}$$

where $[X, Y]_t$ is the quadratic covariation process, defined as

$$[X, Y]_t := \int_0^t \varphi_s \psi_s\, dN_s \tag{11}$$

In particular, if $dX_t = \varphi_t\, dM_t$ and $dY_t = \psi_t\, dM_t$ (i.e., $X$ and $Y$ are local martingales), the process $(X_t Y_t - [X, Y]_t, t \ge 0)$ is a martingale. It can be noted that, in that case, the process $(X_t Y_t - \langle X, Y \rangle_t, t \ge 0)$, where $\langle X, Y \rangle_t = \int_0^t \varphi_s \psi_s \lambda_s\, ds$ is also a martingale. The process $\langle X, Y \rangle$ is the compensator of $[X, Y]$ if $[X, Y]$ is integrable (see Compensators). The predictable process $(\langle X, Y \rangle_t, t \ge 0)$ is called the predictable covariation process of the pair $(X, Y)$, or the compensator of the product $XY$. If $dX_t^i = x_t^i\, dN_t^i$, where $N^i$, $i = 1, 2$ are independent inhomogeneous Poisson processes, the covariation processes $[X^1, X^2]$ and $\langle X^1, X^2 \rangle$ are null, and $X^1 X^2$ is a martingale.

Itô's Formula. Itô's formula is a special case of the general one; it is a bit simpler and is used for the

processes that have bounded variation. Let $b$ be an adapted process and $\varphi$ a predictable process with adequate integrability conditions, and

$$dX_t = b_t\, dt + \varphi_t\, dM_t = (b_t - \varphi_t \lambda_t)\, dt + \varphi_t\, dN_t \tag{12}$$

and $F \in C^{1,1}(\mathbb{R}_+ \times \mathbb{R})$. Then, the process $(F(t, X_t), t \ge 0)$ is a semimartingale with decomposition

$$F(t, X_t) = Z_t + A_t \tag{13}$$

where $Z$ is a local martingale given by

$$Z_t = F(0, X_0) + \int_0^t \big[F(s, X_{s-} + \varphi_s) - F(s, X_{s-})\big]\, dM_s \tag{14}$$

and $A$ a bounded variation process

$$A_t = \int_0^t \Big(\partial_t F(s, X_s) + b_s\, \partial_x F(s, X_s) + \lambda_s \big[F(s, X_{s-} + \varphi_s) - F(s, X_s) - \varphi_s\, \partial_x F(s, X_s)\big]\Big)\, ds \tag{15}$$

Exponential Martingales

Proposition 3 Let $N$ be an inhomogeneous Poisson process with stochastic intensity $(\lambda_t, t \ge 0)$, and $(\mu_t, t \ge 0)$ a predictable process such that $\int_0^t |\mu_s| \lambda_s\, ds < \infty$. Then, the process $L$ defined by

$$L_t = \begin{cases}
\exp\left(-\int_0^t \mu_s \lambda_s\, ds\right) & \text{if } t < T_1 \\[4pt]
\prod_{n,\, T_n \le t} (1 + \mu_{T_n}) \exp\left(-\int_0^t \mu_s \lambda_s\, ds\right) & \text{if } t \ge T_1
\end{cases} \tag{16}$$

is a local martingale solution of

$$dL_t = L_{t-} \mu_t\, dM_t, \quad L_0 = 1 \tag{17}$$

Moreover, if $\mu$ is such that $\forall s$, $\mu_s > -1$,

$$\begin{aligned}
L_t &= \exp\left(-\int_0^t \mu_s \lambda_s\, ds + \int_0^t \ln(1 + \mu_s)\, dN_s\right) \\
&= \exp\left(-\int_0^t \big(\mu_s - \ln(1 + \mu_s)\big) \lambda_s\, ds + \int_0^t \ln(1 + \mu_s)\, dM_s\right)
\end{aligned} \tag{18}$$

The local martingale $L$ is denoted by $\mathcal{E}(\mu \star M)$ and named the Doléans-Dade exponential (alternatively, the stochastic exponential) of the process $\mu \star M$. If $\mu > -1$, the process $L$ is nonnegative and is a martingale if $\forall t$, $\mathbb{E}(L_t) = 1$ (this is the case if $\mu$ satisfies $-1 + \delta < \mu_s < C$ where $C$ and $\delta > 0$ are two constants).

If $\mu$ is not greater than $-1$, then the process $L$ defined in equation (16) may take negative values.

Change of Probability Measure

Let $\mu$ be a predictable process such that $\mu > -1$, and $\int_0^t \lambda_s |\mu_s|\, ds < \infty$, and let $L$ be the positive exponential local martingale solution of

$$dL_t = L_{t-} \mu_t\, dM_t, \quad L_0 = 1 \tag{19}$$

Assume that $L$ is a martingale, and let $\mathbb{Q}$ be the probability measure equivalent to $\mathbb{P}$ defined on $\mathcal{F}_t$ by $\mathbb{Q}|_{\mathcal{F}_t} = L_t\, \mathbb{P}|_{\mathcal{F}_t}$. Under $\mathbb{Q}$, the process

$$M_t^{\mu} := M_t - \int_0^t \mu_s \lambda_s\, ds = N_t - \int_0^t (\mu_s + 1) \lambda_s\, ds, \quad t \ge 0 \tag{20}$$

is a local martingale, hence $N$ is a $\mathbb{Q}$-inhomogeneous Poisson process, with intensity $\lambda(1 + \mu)$.

Compound Poisson Processes

Definition and Properties

Let $\lambda$ be a positive number, and $F(dy)$ be a probability law on $\mathbb{R}$. A $(\lambda, F)$-compound Poisson process is a process $X = (X_t, t \ge 0)$ of the form

$$X_t = \sum_{n=1}^{N_t} Y_n = \sum_{n > 0,\, T_n \le t} Y_n \tag{21}$$

where $N$ is a standard Poisson process with intensity $\lambda > 0$, and the $(Y_n, n \ge 1)$ are i.i.d. square-integrable random variables with law $F(dy) = \mathbb{P}(Y_1 \in dy)$, independent of $N$.

Proposition 4 A compound Poisson process has stationary and independent increments; for fixed $t$, the

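cumulative distribution function of $X_t$ is

$$\mathbb{P}(X_t \le x) = e^{-\lambda t} \sum_{n=0}^{\infty} \frac{(\lambda t)^n}{n!} F^{*n}(x) \tag{22}$$

where the star indicates a convolution.

If $\mathbb{E}(|Y_1|) < \infty$, the process $(Z_t = X_t - t\lambda \mathbb{E}(Y_1), t \ge 0)$ is a martingale and $\mathbb{E}(X_t) = \lambda t\, \mathbb{E}(Y_1)$.

If $\mathbb{E}(Y_1^2) < \infty$, the process $(Z_t^2 - t\lambda \mathbb{E}(Y_1^2), t \ge 0)$ is a martingale and $\mathrm{Var}(X_t) = \lambda t\, \mathbb{E}(Y_1^2)$.

Introducing the random measure $\mu = \sum_{n=1}^{\infty} \delta_{T_n, Y_n}$ on $\mathbb{R}_+ \times \mathbb{R}$, that is,

$$\mu(\omega, ]0, t], A) = \sum_{n > 0,\, T_n(\omega) \le t} \mathbf{1}_{Y_n(\omega) \in A} \tag{23}$$

and denoting by $(f * \mu)_t$ the integral

$$\int_0^t \int_{\mathbb{R}} f(x)\, \mu(\omega, ds, dx) = \sum_{n > 0,\, T_n \le t} f(Y_n(\omega)) = \sum_{n=1}^{N_t} f(Y_n(\omega)) \tag{24}$$

we obtain that

$$M_t^f = (f * \mu)_t - t\lambda \mathbb{E}(f(Y_1)) = \int_0^t \int_{\mathbb{R}} f(x) \big(\mu(\omega, ds, dx) - \lambda F(dx)\, ds\big) \tag{25}$$

is a martingale.

Martingales

Proposition 5 If $X$ is a $(\lambda, F)$-compound Poisson process, for any $\alpha$ such that $\int_{-\infty}^{\infty} e^{\alpha x} F(dx) < \infty$, the process

$$Z_t = \exp\left(\alpha X_t - \lambda t \int_{-\infty}^{\infty} (e^{\alpha x} - 1) F(dx)\right) \tag{26}$$

is a martingale and

$$\mathbb{E}(e^{\alpha X_t}) = \exp\left(\lambda t \int_{-\infty}^{\infty} (e^{\alpha x} - 1) F(dx)\right) = \exp\big(\lambda t\, \mathbb{E}(e^{\alpha Y_1} - 1)\big) \tag{27}$$

In other words, for any $\alpha$ such that $\mathbb{E}(e^{\alpha X_t}) < \infty$ (or equivalently $\mathbb{E}(e^{\alpha Y_1}) < \infty$), the process $(e^{\alpha X_t} / \mathbb{E}(e^{\alpha X_t}), t \ge 0)$ is a martingale. More generally, let $f$ be a bounded Borel function. Then, the process

$$\exp\left(\sum_{n=1}^{N_t} f(Y_n) - \lambda t \int_{-\infty}^{\infty} (e^{f(x)} - 1) F(dx)\right) \tag{28}$$

is a martingale. In particular,

$$\mathbb{E}\left(\exp\left(\sum_{n=1}^{N_t} f(Y_n)\right)\right) = \exp\left(\lambda t \int_{-\infty}^{\infty} (e^{f(x)} - 1) F(dx)\right) \tag{29}$$

Change of Measure

Let $X$ be a $(\lambda, F)$-compound Poisson process, $\widetilde{\lambda} > 0$, and $\widetilde{F}$ a probability measure on $\mathbb{R}$, absolutely continuous with respect to $F$, with Radon–Nikodym density $\varphi$, that is, $\widetilde{F}(dx) = \varphi(x) F(dx)$. The process

$$L_t = \exp\left(t(\lambda - \widetilde{\lambda}) + \sum_{s \le t} \ln\Big(\frac{\widetilde{\lambda}}{\lambda} \varphi(\Delta X_s)\Big)\right) \tag{30}$$

is a positive martingale (take $f(x) = \ln((\widetilde{\lambda}/\lambda)\, \varphi(x))$ in equation (28)) with expectation 1. Set $d\mathbb{Q}|_{\mathcal{F}_t} = L_t\, d\mathbb{P}|_{\mathcal{F}_t}$.

Proposition 6 Under $\mathbb{Q}$, the process $X$ is a $(\widetilde{\lambda}, \widetilde{F})$-compound Poisson process.

Let $\alpha$ be such that $\mathbb{E}(e^{\alpha Y_1}) < \infty$. The particular case with $\varphi(x) = e^{\alpha x} / \mathbb{E}(e^{\alpha Y_1})$ and $\widetilde{\lambda} = \lambda \mathbb{E}(e^{\alpha Y_1})$ corresponds to the Esscher transform for which

$$d\mathbb{Q}|_{\mathcal{F}_t} = \frac{e^{\alpha X_t}}{\mathbb{E}(e^{\alpha X_t})}\, d\mathbb{P}|_{\mathcal{F}_t} \tag{31}$$

We emphasize that there exist changes of probability that do not preserve the compound Poisson process property. For the predictable representation theorem, see Point Processes.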
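Equation (27) and the Esscher transform parameters above can be illustrated with a jump law that has finitely many atoms, so that every expectation is a finite sum. This is a minimal sketch under that assumption: the three-point jump law, the parameter values, and all function names are hypothetical choices made for the example.

```python
import math

# Hypothetical discrete jump law F: jump size -> probability (an assumption
# made here so that all integrals against F reduce to finite sums).
jumps = {-1.0: 0.3, 0.5: 0.5, 2.0: 0.2}
lam, t, alpha, beta = 1.2, 1.5, 0.4, 0.25


def mgf_jump(a, law=jumps):
    """E[exp(a * Y_1)] under the given jump law."""
    return sum(prob * math.exp(a * y) for y, prob in law.items())


def mgf_by_conditioning(a, n_max=150):
    """E[exp(a * X_t)] obtained by conditioning on N_t = n and truncating the series."""
    m, mean = mgf_jump(a), lam * t
    return sum(
        math.exp(-mean + n * math.log(mean) - math.lgamma(n + 1)) * m ** n
        for n in range(n_max)
    )


def mgf_closed_form(a, intensity=lam, law=jumps):
    """Right-hand side of equation (27)."""
    return math.exp(intensity * t * (mgf_jump(a, law) - 1.0))


# Esscher transform: new intensity and new jump law, as stated after Proposition 6.
lam_tilde = lam * mgf_jump(alpha)
jumps_tilde = {y: prob * math.exp(alpha * y) / mgf_jump(alpha) for y, prob in jumps.items()}

# Under Q, E_Q[exp(beta X_t)] = E_P[exp((alpha+beta) X_t)] / E_P[exp(alpha X_t)].
lhs_under_q = mgf_closed_form(alpha + beta) / mgf_closed_form(alpha)
rhs_under_q = mgf_closed_form(beta, intensity=lam_tilde, law=jumps_tilde)

print(mgf_by_conditioning(alpha), mgf_closed_form(alpha))
print(sum(jumps_tilde.values()))
print(lhs_under_q, rhs_under_q)
```

The first pair checks (27), the second line confirms that the transformed jump law is again a probability law, and the last pair is consistent with Proposition 6 in this special case: the tilted process has the Laplace transform of a compound Poisson process with the new intensity and jump law.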

An Example: Double Exponential Model

The compound Poisson process is said to be a double exponential process if the law of the random variable $Y_1$ is

$$F(dx) = \big(p\, \theta_1 e^{-\theta_1 x} \mathbf{1}_{\{x > 0\}} + (1 - p)\, \theta_2 e^{\theta_2 x} \mathbf{1}_{\{x < 0\}}\big)\, dx \tag{32}$$

where $p \in\, ]0, 1[$ and $\theta_i$, $i = 1, 2$ are positive numbers. Under an Esscher transform, this model is still a double exponential model. This particular dynamic allows one to compute the Laplace transform of the first hitting times of a given level.

End Notes

a. We recall that adapted continuous-on-left processes are predictable. The process $N$ is not predictable.

References

[1] Brémaud, P. (1981). Point Processes and Queues: Martingale Dynamics, Springer-Verlag, Berlin.
[2] Çinlar, E. (1975). Introduction to Stochastic Processes, Prentice Hall.
[3] Cont, R. & Tankov, P. (2004). Financial Modeling with Jump Processes, Chapman & Hall/CRC.
[4] Jeanblanc, M., Yor, M. & Chesney, M. (2009). Mathematical Models for Financial Markets, Springer, Berlin.
[5] Karlin, S. & Taylor, H. (1975). A First Course in Stochastic Processes, Academic Press, San Diego.
[6] Protter, P.E. (2005). Stochastic Integration and Differential Equations, 2nd Edition, Springer, Berlin.

Related Articles

Lévy Processes; Martingales; Martingale Representation Theorem.

MONIQUE JEANBLANC
Point Processes

This article gives a brief overview of general point processes. We refer to the books [1–5] for proofs and advanced results.

Marked Point Processes

Definition

An increasing sequence of random times is called a univariate point process. A simple example is the Poisson process.

Given a univariate point process, we associate to every time $T_n$ a mark $Z_n$. More precisely, let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, $(Z_n, n \ge 1)$ a sequence of random variables taking values in a measurable space $(E, \mathcal{E})$, and $(T_n, n \ge 1)$ an increasing sequence of nonnegative random variables. We assume that $\lim T_n = \infty$, so that there is only a finite number of $n$ such that, for a given $t$, one has $T_n \le t$. We define the process $N$ as follows. For each set $A \in \mathcal{E}$, $N_t(A) = \sum_n \mathbf{1}_{\{T_n \le t\}} \mathbf{1}_{\{Z_n \in A\}}$ is the number of "marks" in the set $A$ before time $t$. The natural filtration of $N$ is

$$\mathcal{F}_t^N = \sigma(N_s(A),\ s \le t,\ A \in \mathcal{E}) \tag{1}$$

The predictable $\sigma$-algebra $\mathcal{P}$ is the $\sigma$-algebra defined on $\Omega \times \mathbb{R}_+$ that is generated by the sets

$$A \times \{0\},\ A \in \mathcal{F}_0^N; \qquad A \times ]s, t],\ A \in \mathcal{F}_s^N,\ s \le t \tag{2}$$

The associated random counting measure $\mu(\omega, ds, dz)$ is defined as follows: let $\Phi$ be a map

$$(t, \omega, z) \in (\mathbb{R}_+, \Omega, E) \to \Phi(t, \omega, z) \in \mathbb{R} \tag{3}$$

We set

$$\int_{]0,t]} \int_E \Phi(s, z)\, \mu(ds, dz) = \sum_{n=1}^{\infty} \Phi(T_n, Z_n) \mathbf{1}_{\{T_n \le t\}} = \sum_{n=1}^{N_t(E)} \Phi(T_n, Z_n) \tag{4}$$

The process $N$ is called a marked point process. This is a generalization of the compound Poisson process: we have introduced, in particular, a spatial dimension for the size of jumps, which are no longer i.i.d. random variables.

A map $\Phi$ is predictable if it is $\mathcal{P} \otimes \mathcal{E}$ measurable. The compensator of the marked point process $N$ is the unique predictable random measure $\nu$ on $(\mathbb{R}_+ \times E, \mathcal{G} \otimes \mathcal{E})$ such that, for every bounded predictable process $\Phi$

$$\mathbb{E}\left(\int_0^t \int_E \Phi(s, z; \omega)\, \mu(\omega; ds, dz)\right) = \mathbb{E}\left(\int_0^t \int_E \Phi(s, z; \omega)\, \nu(\omega; ds, dz)\right) \tag{5}$$

In the case of a marked point process on $\mathbb{R} \times \mathbb{R}^d$, the compensator admits an explicit representation: let $G_n(dt, dz)$ be a regular version of the conditional distribution of $(T_{n+1}, Z_{n+1})$ with respect to $\mathcal{F}_{T_n}^N = \sigma\{(T_1, Z_1), \ldots, (T_n, Z_n)\}$. Then,

$$\nu(dt, dz) = \sum_n \frac{G_n(dt, dz)}{G_n([t, \infty[ \times \mathbb{R}^d)} \mathbf{1}_{\{T_n < t \le T_{n+1}\}} \tag{6}$$

Intensity Process

In what follows, we assume that, for any $A \in \mathcal{E}$, the process $(N_t(A), t \ge 0)$ admits the $\mathbf{F}$-predictable intensity $(\lambda_t(A), t \ge 0)$, that is, there exists a nonnegative process $(\lambda_t(A), t \ge 0)$ such that

$$N_t(A) - \int_0^t \lambda_s(A)\, ds \tag{7}$$

is an $\mathbf{F}$-martingale. Then, if $X_t = \sum_{n=1}^{N_t(E)} \Phi(T_n, Z_n)$ where $\Phi$ is an $\mathbf{F}$-predictable process that satisfies

$$\mathbb{E}\left(\int_{]0,t]} \int_E |\Phi(s, z)|\, \lambda_s(dz)\, ds\right) < \infty \tag{8}$$

the process

$$X_t - \int_0^t \int_E \Phi(s, z)\, \lambda_s(dz)\, ds = \int_{]0,t]} \int_E \Phi(s, z) \big[\mu(ds, dz) - \lambda_s(dz)\, ds\big] \tag{9}$$

is a martingale and, in particular,

$$\mathbb{E}\left(\int_{]0,t]} \int_E \Phi(s, z)\, \mu(ds, dz)\right) = \mathbb{E}\left(\int_{]0,t]} \int_E \Phi(s, z)\, \lambda_s(dz)\, ds\right) \tag{10}$$

The random measure $\mu(ds, dz) - \lambda_s(dz)\, ds$ is the compensated measure of $\mu$.

Example Compound Poisson Process. Let $X_t = \sum_{n=1}^{N_t} Y_n$ be a $(\lambda, F)$-compound Poisson process. We can consider the $Y_n$s as marks and introduce the marked point process $N_t(A) = \sum_{n=1}^{N_t} \mathbf{1}_{Y_n \in A}$. For any $A$, the process $(N_t(A), t \ge 0)$ is a compound Poisson process, and $(N_t(A) - \lambda t\, \mathbb{P}(Y_1 \in A), t \ge 0)$ is a martingale. The intensity of the marked point process $N$ is $\lambda_t(dz) = \lambda F(dz)$. Moreover, if $A_i$ are disjoint sets, the processes $N(A_i)$ are independent. The counting random measure $\mu$ satisfies

$$\int_0^t \int_{\mathbb{R}} f(x)\, \mu(\omega; ds, dx) = \sum_{k=1}^{N_t} f(Y_k) \tag{11}$$

and we obtain, in particular, that, as in the article on Poisson processes (see Poisson Process)

$$M_t^f = \int_0^t \int_{\mathbb{R}} f(x) \big(\mu(\omega; ds, dx) - ds\, \lambda F(dx)\big) \tag{12}$$

is a martingale.

Predictable Representation Property

Let $\mathbf{F}^N$ be the filtration generated by the marked point process with intensity $\lambda_s(dz)$. Then, any $(\mathbb{P}, \mathbf{F}^N)$-martingale $M$ admits the representation

$$M_t = M_0 + \int_0^t \int_E \Phi(s, x) \big(\mu(ds, dx) - \lambda_s(dx)\, ds\big) \tag{13}$$

where $\Phi$ is an $\mathbf{F}^N$-predictable process such that

$$\mathbb{E}\left(\int_0^t \int_E |\Phi(s, x)|\, \lambda_s(dx)\, ds\right) < \infty \tag{14}$$

Change of Probability Measure

Let $\mu$ be the random measure of a marked point process with intensity $\lambda_t(A) = \alpha_t m_t(A)$, where $m$ is a probability measure. We shall say that the marked point process admits $(\alpha_t, m_t(dz))$ as $\mathbb{P}$-local characteristics. Let $(\psi_t, h_t(z))$ be two predictable positive processes such that

$$\int_0^t \psi_s \alpha_s\, ds < \infty, \qquad \int_E h_t(z)\, m_t(dz) = 1 \tag{15}$$

Let $L$ be the solution of

$$dL_t = L_{t-} \int_E \big(\psi_t h_t(z) - 1\big) \big(\mu(dt, dz) - \alpha_t m_t(dz)\, dt\big), \quad L_0 = 1 \tag{16}$$

If $\mathbb{E}(L_t) = 1$ (so that $L$ is a martingale), setting $\mathbb{Q}|_{\mathcal{F}_t} = L_t\, \mathbb{P}|_{\mathcal{F}_t}$, the marked point process has the $\mathbb{Q}$-local characteristics $(\psi_t \alpha_t, h_t(z) m_t(dz))$.

Example Compound Poisson Process. The change of measure for compound Poisson processes can be written in terms of random measures. Let

$$\begin{aligned}
L_t &= \exp\left(\int_{\mathbb{R}} f(x)\, N_t(dx) - t\lambda \int_{-\infty}^{\infty} (e^{f(x)} - 1) F(dx)\right) \\
&= \exp\left(\int_0^t \int_{\mathbb{R}} f(x)\, \mu(ds, dx) - t \int_{-\infty}^{\infty} (e^{f(x)} - 1)\, \lambda F(dx)\right)
\end{aligned} \tag{17}$$

be a martingale. Define $d\mathbb{Q}|_{\mathcal{F}_t} = L_t\, d\mathbb{P}|_{\mathcal{F}_t}$. Then,

$$\int_0^t \int_{\mathbb{R}} \big(\mu(ds, dx) - ds\, e^{f(x)} \lambda F(dx)\big) \tag{18}$$

is a $\mathbb{Q}$-martingale as obtained in the article on Poisson processes (see Poisson Process).

Poisson Point Processes

Poisson Measures

Let $(E, \mathcal{E})$ be a measurable space. A random measure $\mu$ on $(E, \mathcal{E})$ is a Poisson measure with intensity $\nu$, where $\nu$ is a $\sigma$-finite measure on $(E, \mathcal{E})$, if

1. for every set B ∈ E with ν(B) < ∞, µ(B) If n(


) < ∞, the process Nt
− tn(
) is an
follows a Poisson distribution with parameter F-martingale.
ν(B);
2. for disjoint sets $B_i$, $i \le n$, the variables $\mu(B_i)$, $i \le n$, are independent.

Point Processes

Let $(E, \mathcal{E})$ be a measurable space and $\delta$ an additional point. We set $E_\delta = E \cup \{\delta\}$, $\mathcal{E}_\delta = \sigma(\mathcal{E}, \{\delta\})$.

Definition 1 Let $e$ be a stochastic process defined on a probability space $(\Omega, \mathcal{F}, P)$, taking values in $(E_\delta, \mathcal{E}_\delta)$. The process $e$ is a point process if

1. the map $(t, \omega) \to e_t(\omega)$ is $\mathcal{B}(]0, \infty[) \otimes \mathcal{F}$-measurable;
2. the set $D_\omega = \{t : e_t(\omega) \neq \delta\}$ is a.s. countable.

For every measurable set $B$ of $]0, \infty[ \times E$, we set

\[ N^B(\omega) := \sum_{s \ge 0} \mathbb{1}_B(s, e_s(\omega)) \tag{19} \]

In particular, if $B = ]0, t] \times \Gamma$, we write

\[ N_t^\Gamma = N^B = \mathrm{Card}\{s \le t : e(s) \in \Gamma\} \tag{20} \]

Poisson Point Processes

Definition 2 An $\mathbb{F}$-Poisson point process $e$ is a point process such that

1. $N_t^E < \infty$ a.s. for every $t$
2. for any $\Gamma \in \mathcal{E}$, the process $N^\Gamma$ is $\mathbb{F}$-adapted
3. for any $s$ and $t$ and any $\Gamma \in \mathcal{E}$, $N_{s+t}^\Gamma - N_t^\Gamma$ is independent from $\mathcal{F}_t$ and distributed as $N_s^\Gamma$.

In particular, for any disjoint family $(\Gamma_i, i = 1, \ldots, d)$, the $d$-dimensional process $(N_t^{\Gamma_i}, i = 1, \ldots, d)$ is a Poisson process.

Definition 3 The $\sigma$-finite measure on $\mathcal{E}$ defined by

\[ n(\Gamma) = \frac{1}{t}\, \mathbb{E}(N_t^\Gamma) \tag{21} \]

is called the characteristic measure of $e$.

Proposition 1 (Compensation Formula). Let $H$ be a measurable positive process vanishing at $\delta$. Then

\[ \mathbb{E}\Big[ \sum_{s \ge 0} H(s, \omega, e_s(\omega)) \Big] = \mathbb{E}\Big[ \int_0^\infty ds \int H(s, \omega, u)\, n(du) \Big] \tag{22} \]

If, for any $t$, $\mathbb{E}\big[ \int_0^t ds \int H(s, \omega, u)\, n(du) \big] < \infty$, the process

\[ \sum_{s \le t} H(s, \omega, e_s(\omega)) - \int_0^t ds \int H(s, \omega, u)\, n(du) \tag{23} \]

is a martingale.

Proposition 2 (Exponential Formula). If $f$ is a measurable function such that $\int_0^t ds \int |f(s, u)|\, n(du) < \infty$ for every $t$, then,

\[ \mathbb{E}\Big[ \exp\Big( i \sum_{0 < s \le t} f(s, e_s) \Big) \Big] = \exp\Big( \int_0^t ds \int \big(e^{i f(s, u)} - 1\big)\, n(du) \Big) \tag{24} \]

Moreover, if $f \ge 0$,

\[ \mathbb{E}\Big[ \exp\Big( - \sum_{0 < s \le t} f(s, e_s) \Big) \Big] = \exp\Big( - \int_0^t ds \int \big(1 - e^{-f(s, u)}\big)\, n(du) \Big) \tag{25} \]

References

[1] Cont, R. & Tankov, P. (2004). Financial Modeling with Jump Processes, Chapman & Hall/CRC.
[2] Dellacherie, C. & Meyer, P.-A. (1980). Probabilités et Potentiel, chapitres V-VIII, Hermann, Paris. English translation (1982), Probabilities and Potential, Chapters V-VIII, North-Holland.
[3] Jacod, J. & Shiryaev, A.N. (2003). Limit Theorems for Stochastic Processes, 2nd Edition, Springer Verlag.
[4] Last, G. & Brandt, A. (1995). Marked Point Processes on the Real Line. The Dynamic Approach, Springer, Berlin.
[5] Protter, P.E. (2005). Stochastic Integration and Differential Equations, 2nd Edition, Springer, Berlin.

Related Articles

Lévy Processes; Martingales; Martingale Representation Theorem.

MONIQUE JEANBLANC
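As a numerical illustration of the exponential formula (25) in the entry above, the following sketch compares a Monte Carlo estimate of the left-hand side with the closed-form right-hand side. The setup is an assumption made only for this example: the characteristic measure is taken to be $n(du) = c\, e^{-u}\, du$ on $]0, \infty[$ (so points arrive at Poisson rate $c$ with i.i.d. Exp(1) marks) and $f(s, u) = u$. The function names are hypothetical.

```python
import math
import random


def sample_marks(rate, horizon, rng):
    """Marks of a Poisson point process on ]0, horizon] x ]0, inf[.

    Assumed characteristic measure: n(du) = rate * exp(-u) du, i.e. points
    arrive at Poisson rate `rate` and carry i.i.d. Exp(1) marks.
    """
    marks = []
    t = rng.expovariate(rate)            # first arrival time
    while t <= horizon:
        marks.append(rng.expovariate(1.0))
        t += rng.expovariate(rate)       # next inter-arrival time
    return marks


def laplace_functional_mc(rate, horizon, n_paths=20000, seed=0):
    """Monte Carlo estimate of E exp(-sum_{s<=t} f(s, e_s)) with f(s, u) = u."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        total += math.exp(-sum(sample_marks(rate, horizon, rng)))
    return total / n_paths


def laplace_functional_exact(rate, horizon):
    """Right-hand side of (25): exp(-t * int (1 - e^{-u}) n(du)).

    For n(du) = rate * e^{-u} du the inner integral equals rate / 2.
    """
    return math.exp(-horizon * rate / 2.0)
```

With `rate = 2.0` and `horizon = 1.0` the two functions should agree up to Monte Carlo error; the sampling scheme (exponential inter-arrival times) is one simple choice among several.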
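Compensators

In probability theory, the compensator of a stochastic process designates a quantity that, once subtracted from a stochastic process, yields a martingale.

Compensator of a Random Time

Let $(\Omega, \mathbb{G}, \mathbb{P})$ be a filtered probability space and $\tau$ a $\mathbb{G}$-stopping time. The process $H_t = \mathbb{1}_{\tau \le t}$ is a $\mathbb{G}$-adapted increasing process, hence a $\mathbb{G}$-submartingale and admits a Doob–Meyer decomposition as

\[ H_t = M_t + \Lambda_t \tag{1} \]

where $M$ is a $\mathbb{G}$-local martingale and $\Lambda$ a $\mathbb{G}$-predictable increasing process. The process $\Lambda$, called the $\mathbb{G}$-compensator of $H$, is constant after $\tau$, that is, $\Lambda_t = \Lambda_{t \wedge \tau}$. The process $\Lambda$ "compensates" $H$ with the meaning that $H - \Lambda$ is a martingale. If $\tau$ is $\mathbb{G}$-predictable, then $\Lambda_t = H_t$. The continuity of $\Lambda$ is equivalent to the fact that $\tau$ is a $\mathbb{G}$-totally inaccessible stopping time. If $\Lambda$ is absolutely continuous with respect to the Lebesgue measure, that is, if $\Lambda_t = \int_0^t \lambda_s^{\mathbb{G}}\, ds$, the nonnegative $\mathbb{G}$-adapted process $\lambda^{\mathbb{G}}$ is called the intensity rate of $\tau$. Note that $\lambda_t^{\mathbb{G}}$ is null on the set $\tau \le t$.

For any integrable random variable $X \in \mathcal{G}_T$, one has

\[ \mathbb{E}(X \mathbb{1}_{T < \tau} \mid \mathcal{G}_t) = \mathbb{1}_{\{t < \tau\}} \big( V_t - \mathbb{E}(\Delta V_\tau \mathbb{1}_{\tau \le T} \mid \mathcal{G}_t) \big) \tag{2} \]

with $V_t = e^{\Lambda_t}\, \mathbb{E}(X e^{-\Lambda_T} \mid \mathcal{G}_t)$.

In the following examples, $\tau$ is a given random time, that is, a nonnegative random variable, and $\mathbb{H}$ the natural filtration of $H$ (i.e., the smallest filtration satisfying the usual conditions such that the process $H$ is adapted). The random time $\tau$ is a $\mathbb{H}$-stopping time.

Elementary Case

Let $\tau$ be an exponential random variable with constant parameter $\lambda$. Then, the $\mathbb{H}$-compensator of $H$ is $\lambda(t \wedge \tau)$. More generally, if $\tau$ is a nonnegative random variable with cumulative distribution function $F$, taken continuous on the right ($F(t) = \mathbb{P}(\tau \le t)$) such that $\forall t$, $F(t) < 1$, the $\mathbb{H}$-compensator of $\tau$ is $\Lambda_t = \int_0^{t \wedge \tau} \frac{dF(s)}{1 - F(s-)}$. If $F$ is continuous, the $\mathbb{H}$-compensator is $\Lambda_t = -\ln(1 - F(t \wedge \tau))$.

Cox Processes

Let $\mathbb{F}$ be a given filtration, $\lambda$ a given $\mathbb{F}$-adapted nonnegative process, $\Lambda_t^{\mathbb{F}} = \int_0^t \lambda_s\, ds$, and $\Theta$ a random variable with exponential law, independent of $\mathbb{F}$. Let us define the random time $\tau$ as

\[ \tau = \inf\{ t : \Lambda_t^{\mathbb{F}} \ge \Theta \} \tag{3} \]

Then, the process

\[ \mathbb{1}_{\tau \le t} - \int_0^{t \wedge \tau} \lambda_s\, ds = \mathbb{1}_{\tau \le t} - \Lambda_{t \wedge \tau}^{\mathbb{F}} \tag{4} \]

is a martingale in the filtration $\mathbb{G} = \mathbb{F} \vee \mathbb{H}$, the smallest filtration that contains $\mathbb{F}$, making $\tau$ a stopping time (in fact a totally inaccessible stopping time). The $\mathbb{G}$-compensator of $H$ is $\Lambda_t = \Lambda_{t \wedge \tau}^{\mathbb{F}}$, and the $\mathbb{G}$-intensity rate is $\lambda_t^{\mathbb{G}} = \mathbb{1}_{t < \tau} \lambda_t$. In that case, for an integrable random variable $X \in \mathcal{F}_T$, one has

\[ \mathbb{E}(X \mathbb{1}_{T < \tau} \mid \mathcal{G}_t) = \mathbb{1}_{t < \tau}\, e^{\Lambda_t^{\mathbb{F}}}\, \mathbb{E}(X e^{-\Lambda_T^{\mathbb{F}}} \mid \mathcal{F}_t) \tag{5} \]

and, for $H$, an $\mathbb{F}$-predictable (bounded) process

\[ \mathbb{E}(H_\tau \mathbb{1}_{\tau \le T} \mid \mathcal{G}_t) = H_\tau \mathbb{1}_{\tau \le t} + \mathbb{1}_{t < \tau}\, e^{\Lambda_t^{\mathbb{F}}}\, \mathbb{E}\Big( \int_t^T H_s e^{-\Lambda_s^{\mathbb{F}}} \lambda_s\, ds \,\Big|\, \mathcal{F}_t \Big) \tag{6} \]

Conditional Survival Probability

Assume now that $\tau$ is a nonnegative random variable on the filtered probability space $(\Omega, \mathbb{F}, \mathbb{P})$ with conditional survival probability $G_t := \mathbb{P}(\tau > t \mid \mathcal{F}_t)$, taken continuous on the right and let $\mathbb{G} = \mathbb{F} \vee \mathbb{H}$. The random time $\tau$ is a $\mathbb{G}$-stopping time.

If $\tau$ is an $\mathbb{F}$-predictable stopping time (hence a $\mathbb{G}$-predictable stopping time), then $G_t = \mathbb{1}_{\tau > t}$ and $\Lambda = H$.

In what follows, we assume that $G_t > 0$ and we introduce the Doob–Meyer decomposition of the $\mathbb{F}$-supermartingale $G$, that is, $G_t = Z_t - A_t$, where $Z$ is an $\mathbb{F}$-martingale and $A$ is an increasing $\mathbb{F}$-predictable process. Then, the $\mathbb{G}$-compensator of $\tau$ is $\Lambda_t = \int_0^{t \wedge \tau} (G_{s-})^{-1}\, dA_s$. If $dA_t = a_t\, dt$, the $\mathbb{G}$-intensity rate is $\lambda_t^{\mathbb{G}} = \mathbb{1}_{t < \tau} (G_{t-})^{-1} a_t$. Moreover, if $G$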
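is continuous, then for an integrable random variable $X \in \mathcal{F}_T$, one has

\[ \mathbb{E}(X \mathbb{1}_{T < \tau} \mid \mathcal{G}_t) = \mathbb{1}_{t < \tau} (G_t)^{-1}\, \mathbb{E}(X G_T \mid \mathcal{F}_t) \tag{7} \]

It is often convenient to introduce the $\mathbb{F}$-adapted process $\lambda_t = (G_{t-})^{-1} a_t$, equal to $\lambda_t^{\mathbb{G}}$ on the set $t < \tau$. We shall call this process the predefault-intensity process.

A particular case occurs when the process $G$ is nonincreasing and absolutely continuous with respect to the Lebesgue measure, that is, $G_t = \int_t^\infty g_s\, ds$, where $g \ge 0$. In that case, the $\mathbb{G}$-adapted intensity rate is $\lambda_t^{\mathbb{G}} = (G_t)^{-1} g_t \mathbb{1}_{t < \tau}$, the predefault intensity is $\lambda_t = (G_t)^{-1} g_t$ and, for an integrable random variable $X \in \mathcal{F}_T$,

\[ \mathbb{E}(X \mathbb{1}_{T < \tau} \mid \mathcal{G}_t) = \mathbb{1}_{t < \tau}\, e^{\Lambda_t^{\mathbb{F}}}\, \mathbb{E}(X e^{-\Lambda_T^{\mathbb{F}}} \mid \mathcal{F}_t) \tag{8} \]

where $\Lambda^{\mathbb{F}}$ is the $\mathbb{F}$-adapted process defined as

\[ \Lambda_t^{\mathbb{F}} = \int_0^t \lambda_s\, ds = \int_0^t (G_s)^{-1} g_s\, ds \tag{9} \]

Aven's Lemma

The Aven lemma has the following form: let $(\Omega, \mathcal{G}_t, \mathbb{P})$ be a filtered probability space and $N$ be a counting process. Assume that $\mathbb{E}(N_t) < \infty$ for any $t$. Let $(h_n, n \ge 1)$ be a sequence of real numbers converging to 0, and

\[ Y_t^{(n)} = \frac{1}{h_n}\, \mathbb{E}(N_{t + h_n} - N_t \mid \mathcal{G}_t) \tag{10} \]

Assume that there exist nonnegative $\mathbb{G}$-adapted processes $\lambda_t$ and $y_t$ such that

1. For any $t$, $\lim_n Y_t^{(n)} = \lambda_t$ (11)
2. For any $t$, there exists for almost all $\omega$ an $n_0 = n_0(t, \omega)$ such that $|Y_s^{(n)} - \lambda_s(\omega)| \le y_s(\omega)$, $s \le t$, $n \ge n_0(t, \omega)$ (12)
3. $\int_0^t y_s\, ds < \infty$, $\forall t$ (13)

Then, $N_t - \int_0^t \lambda_s\, ds$ is a $\mathbb{G}$-martingale.

For the particular case of a random time, we obtain the following: assume that $\lim_{h \to 0} \frac{1}{h} \mathbb{P}(t < \tau \le t + h \mid \mathcal{G}_t) = \lambda_t^{\mathbb{G}}$, and that there exists a Lebesgue integrable process $y$ such that $\big| \frac{1}{h} \mathbb{P}(t < \tau \le t + h \mid \mathcal{G}_t) - \lambda_t^{\mathbb{G}} \big| \le y_t$ for any $h$ small enough. Then $\lambda^{\mathbb{G}}$ is the $\mathbb{G}$-intensity of $\tau$.

In the case of the conditional survival probability model, the predefault intensity $\lambda^{\mathbb{G}}$ is

\[ \lambda_t^{\mathbb{G}} = \lim_{h \to 0} \frac{1}{h\, \mathbb{P}(t < \tau \mid \mathcal{F}_t)}\, \mathbb{P}(t < \tau \le t + h \mid \mathcal{F}_t) \tag{14} \]

See [2] for an extensive study.

Shrinking

Assume that $\mathbb{G}^*$ is a subfiltration of $\mathbb{G}$ such that $\tau$ is a $\mathbb{G}^*$ (and a $\mathbb{G}$) stopping time. Assume that $\tau$ admits a $\mathbb{G}$-intensity rate equal to $\lambda^{\mathbb{G}}$. Then, the $\mathbb{G}^*$-intensity of $\tau$ is $\lambda_t^* = \mathbb{E}(\lambda_t^{\mathbb{G}} \mid \mathcal{G}_t^*)$ (see [1]).

As we have seen above, in the survival probability approach, the value of the intensity can be given in terms of the conditional survival probability. Assume that $G_t = \mathbb{P}(\tau > t \mid \mathcal{F}_t) = Z_t - \int_0^t a_s\, ds$, where $Z$ is an $\mathbb{F}$-martingale and that $\mathbb{G}^* = \mathbb{F}^* \vee \mathbb{H}$ where $\mathbb{F}^* \subset \mathbb{F}$. Then, the $\mathbb{F}^*$-conditional survival probability of $\tau$ is

\[ G_t^* = \mathbb{P}(\tau > t \mid \mathcal{F}_t^*) = \mathbb{E}(G_t \mid \mathcal{F}_t^*) = X_t^* - \int_0^t a_s^*\, ds \tag{15} \]

where $X^*$ is an $\mathbb{F}^*$-martingale and $a_s^* = \mathbb{E}(a_s \mid \mathcal{F}_s^*)$. It follows that the $\mathbb{G}^*$-intensity rate of $\tau$ writes as (we assume, for simplicity, that $G$ and $G^*$ are continuous)

\[ \lambda_t^* = \mathbb{1}_{t < \tau} \frac{a_t^*}{G_t^*} = \mathbb{1}_{t < \tau} \frac{\mathbb{E}(\lambda_t^{\mathbb{G}} G_t \mid \mathcal{F}_t^*)}{\mathbb{E}(G_t \mid \mathcal{F}_t^*)} \tag{16} \]

It is useful to note that one can start with a model in which $\tau$ is an $\mathbb{F}$-predictable stopping time (hence $\mathbb{G} = \mathbb{F}$, and a $\mathbb{G}$-intensity rate does not exist) and consider a smaller filtration (e.g., the trivial filtration) for which there exists an intensity rate, computed by means of the conditional survival probability.

Compensator of an Increasing Process

The notion of interest in this section is that of dual predictable projection, which we define as follows:

Proposition 1 Let $A$ be an integrable increasing process (not necessarily $\mathbb{F}$-adapted). There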
exists a unique $\mathbb{F}$-predictable increasing process $(A_t^{(p)}, t \ge 0)$, called the $\mathbb{F}$-dual predictable projection of $A$ such that

\[ \mathbb{E}\Big( \int_0^\infty H_s\, dA_s \Big) = \mathbb{E}\Big( \int_0^\infty H_s\, dA_s^{(p)} \Big) \tag{17} \]

for any positive $\mathbb{F}$-predictable process $H$.

The definition of compensator of a random time can be interpreted in terms of dual predictable projection: if $\tau$ is a random time, the $\mathbb{F}$-predictable compensator associated with $\tau$ is the dual predictable projection $A^\tau$ of the increasing process $\mathbb{1}_{\{\tau \le t\}}$. It satisfies

\[ \mathbb{E}(k_\tau) = \mathbb{E}\Big( \int_0^\infty k_s\, dA_s^\tau \Big) \tag{18} \]

for any positive, $\mathbb{F}$-predictable process $k$.

Examples

Covariation Processes. Let $M$ be a martingale and $[M]$ its quadratic variation process. If $[M]$ is integrable, its compensator is $\langle M \rangle$.

Standard Poisson Process. If $N$ is a Poisson process, $(M_t = N_t - \lambda t, t \ge 0)$ is a martingale, and $\lambda t$ is the compensator of $N$; the martingale $M$ is called the compensated martingale.

Compensated Poisson Integrals. Let $N$ be a time inhomogeneous Poisson process with deterministic intensity $\lambda$ and $\mathbb{F}^N$ its natural filtration. The process

\[ \Big( M_t = N_t - \int_0^t \lambda(s)\, ds,\ t \ge 0 \Big) \tag{19} \]

is an $\mathbb{F}^N$-martingale. The increasing function $\Lambda(t) := \int_0^t \lambda(s)\, ds$ is called the (deterministic) compensator of $N$.

Random Measures

Definitions

The compensator of a random measure $\mu$ is the unique random measure $\nu$ such that

1. for every predictable process $H$, the process $(H \star \nu)$ is predictable (the measure $\nu$ is said to be predictable) and
2. for every predictable process $H$ such that the process $|H| \star \mu$ is increasing and locally integrable, the process $(H \star \mu - H \star \nu)$ is a local martingale.

Examples

If $N$ is a Lévy process with Lévy measure $\nu$

\[ \int_\Lambda f(x)\, N_t(\cdot, dx) - t \int_\Lambda f(x)\, \nu(dx) = \sum_{0 < s \le t} f(\Delta X_s) \mathbb{1}_\Lambda(\Delta X_s) - t \int_\Lambda f(x)\, \nu(dx) \tag{20} \]

is a martingale, the compensator of $\int_\Lambda f(x)\, N_t(\cdot, dx)$ is $t \int_\Lambda f(x)\, \nu(dx)$.

For other examples see the article on point processes (see Point Processes).

References

[1] Brémaud, P. & Yor, M. (1978). Changes of filtration and of probability measures, Zeit Wahr and Verw Gebiete 45, 269–295.
[2] Zeng, Y. (2006). Compensators of Stopping Times, PhD thesis, Cornell University.

Further Reading

Brémaud, P. (1981). Point Processes and Queues. Martingale Dynamics, Springer-Verlag, Berlin.
Çinlar, E. (1975). Introduction to Stochastic Processes, Prentice Hall.
Cont, R. & Tankov, P. (2004). Financial Modeling with Jump Processes, Chapman & Hall/CRC.
Jeanblanc, M., Yor, M. & Chesney, M. (2009). Mathematical Models for financial Markets, Springer, Berlin.
Karlin, S. & Taylor, H. (1975). A First Course in Stochastic Processes, Academic Press, San Diego.

Related Articles

Doob–Meyer Decomposition; Filtrations; Intensity-based Credit Risk Models; Point Processes.

MONIQUE JEANBLANC
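The elementary case of the Compensators entry above (an exponential time $\tau$ with parameter $\lambda$, whose compensator is $\lambda(t \wedge \tau)$) can be checked numerically. The sketch below is a plain Monte Carlo illustration, not part of the original article: it estimates the mean of $\mathbb{1}_{\tau \le t} - \lambda(t \wedge \tau)$, which should be close to 0 for every $t$ because the compensated process is a martingale started at 0. The function name and default sample size are hypothetical choices.

```python
import random


def mean_compensated_indicator(lam, t, n_paths=50000, seed=1):
    """Monte Carlo mean of M_t = 1_{tau <= t} - lam * min(t, tau).

    tau is exponential with parameter lam; the martingale property of M
    implies E[M_t] = 0 for every t >= 0.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        tau = rng.expovariate(lam)             # exponential random time
        indicator = 1.0 if tau <= t else 0.0   # H_t = 1_{tau <= t}
        compensator = lam * min(t, tau)        # Lambda_t = lam * (t ^ tau)
        total += indicator - compensator
    return total / n_paths
```

Checking only the mean is a weak test of the martingale property; it confirms the centering $\mathbb{E}(H_t) = \mathbb{E}(\Lambda_t) = 1 - e^{-\lambda t}$ but not the conditional statement.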
Heavy Tails

The three most cited stylized properties attributed to log-returns of financial assets or stocks are (i) a kurtosis much larger than 3, the kurtosis of a normal distribution; (ii) serial dependence without correlation; and (iii) volatility clustering. Any realistic and useful model for log-returns must account for all three of these characteristics. In this article, the focus is on the large kurtosis property, which is indicative of heavy tails in the returns. Although this stylized fact may not draw the same level of attention as the other two, it can have a serious impact on modeling and inference questions related to financial time series. One such application is the estimation of the Value at Risk, which is an important entity in the finance industry. For example, financial institutions would like to estimate large quantiles of the absolute returns, that is, the level at which the probability that an absolute return exceeds this value is small such as 0.01 or less. The estimation of these large quantiles is extremely sensitive to the shape assumed for the tail of the marginal distribution. A light-tailed assumption for the tails can severely underestimate the actual quantiles of the marginal distribution. In addition to Value at Risk, heavy tails can impact the estimation of key measures of dependencies in financial time series. This includes the sample autocorrelation of the time series and of functions of the time series such as absolute values and squares. Standard central limit theory for mixing sequences generally directly applies to the sample autocorrelation functions (ACFs) of a financial time series and its squares, provided the fourth and eighth moments, respectively, are finite. If these moments are infinite, as may well be the case for financial time series, then the asymptotic behavior of the sample ACFs is often nonstandard. As it turns out, GARCH processes and stochastic volatility (SV) processes, which are the primary modeling engines for financial returns, exhibit heavy tails in the marginal distribution. We focus on heavy tails and how the concept of regular variation plays a vital role in both these processes.

It is often a misconception to associate heavy-tailed distributions with a very large variance. Rather, the term is used to describe data that exhibit bursts of outlying observations. These outlying observations could be orders of magnitude larger than the median of the observations. In the early 1960s, Mandelbrot (see Mandelbrot, Benoit) [31], Mandelbrot and Taylor [32], and Fama [21] realized that the marginal distribution of returns appeared to be heavy tailed. To cope with heavy tails, they considered non-Gaussian stable distributions for the marginals. Since this class of distributions has infinite variance, it was a slightly controversial approach. On the other hand, for many financial time series, there is evidence that the marginal distribution may have a finite variance but an infinite fourth moment. Figure 1 contains two financial time series that exhibit heavy tails. Figure 1(a) consists of the daily pound/US dollar exchange rate from October 1, 1981 to June 28, 1985, while Figure 1(b) displays the log-returns of the daily closing price of Merck stock from January 2, 2003 through April 28, 2006. One can certainly detect the occasional bursts of outlying observations in both series that are representative of heavy tails. As described in the second section (see Figure 3c and d), there is statistical evidence that the tail behavior of the marginal distribution is heavy with possibly infinite fourth moments.

Regular variation is a natural and often used concept to describe and model heavy-tailed phenomena. Many processes that are designed to model financial time series, such as the GARCH and heavy-tailed SV processes, have the property that all finite-dimensional distributions are regularly varying. For such processes, one can apply standard results from extreme value theory for establishing limiting behavior of the extremes of the process, the sample ACF of the process and its squares, and a host of other statistics. The regular variation condition and its properties are described in the second section. In the third section, some of the main results on regular variation for GARCH and SV processes, respectively, are described. The fourth section describes some of the applications of the regular variation conditions mentioned in the third section, with emphasis on extreme values, point processes, and sample autocorrelations.

Regular Variation

Multivariate regular variation plays an indispensable role in extreme value theory and often serves as the starting point for modeling multivariate extremes. In some respect, one can regard a random vector that is regularly varying as the heavy-tailed analog
of a Gaussian random vector. Unlike a Gaussian random vector, which is characterized by the mean vector and all pairwise covariances, a regularly varying random vector in $d$ dimensions is characterized by two components, an index $\alpha > 0$ and a random vector $\Theta$ with values in $\mathbb{S}^{d-1}$, where $\mathbb{S}^{d-1}$ denotes the unit sphere in $\mathbb{R}^d$ with respect to the norm $|\cdot|$. The random vector $\mathbf{X}$ is said to be regularly varying with index $-\alpha$ if for all $t > 0$,

\[ \frac{P(|\mathbf{X}| > tu,\ \mathbf{X}/|\mathbf{X}| \in \cdot)}{P(|\mathbf{X}| > u)} \xrightarrow{v} t^{-\alpha} P(\Theta \in \cdot) \quad \text{as } u \to \infty \tag{1} \]

The symbol $\xrightarrow{v}$ stands for vague convergence on $\mathbb{S}^{d-1}$; vague convergence of measures is treated in detail in [27]. See [24, 36, 37] for background on multivariate regular variation. In this context, the convergence in equation (1) holds for all continuity sets $A \in \mathcal{B}(\mathbb{S}^{d-1})$ of $\Theta$. In particular, equation (1) implies that the modulus of the random vector $|\mathbf{X}|$ is regularly varying, that is,

\[ \lim_{u \to \infty} \frac{P(|\mathbf{X}| > tu)}{P(|\mathbf{X}| > u)} = t^{-\alpha} \tag{2} \]

Hence, roughly speaking, from the defining equation (1), the modulus and angular parts of the random vector, $|\mathbf{X}|$ and $\mathbf{X}/|\mathbf{X}|$, are independent in the limit, that is,

\[ P(\mathbf{X}/|\mathbf{X}| \in A \mid |\mathbf{X}| > u) \to P(\Theta \in A) \quad \text{as } u \to \infty \tag{3} \]

The distribution of $\Theta$ is often called the spectral measure of the regularly varying random vector. The modulus has power-law-like tails in the sense that

\[ P(|\mathbf{X}| > x) = L(x)\, x^{-\alpha} \tag{4} \]

where $L(x)$ is a slowly varying function, that is, for any $t > 0$, $L(tx)/L(x) \to 1$ as $x \to \infty$. This property implies that the $r$th moments of $|\mathbf{X}|$ are infinite for $r > \alpha$ and finite for $r < \alpha$.

[Figure 1: two time-series panels, (a) "Pound/Dollar exchange rate 10/1/81-6/28/85" (exchange returns against time) and (b) "Log-returns for Merck 1/2/03-4/28/06" (log-returns against time).]

Figure 1 Log-returns for US/pound exchange rate, October 1, 1981 to June 28, 1985 (a) and log-returns for closing price of Merck stock, January 2, 2003 to April 28, 2006 (b)

There is a second characterization of regular variation that is often useful in applications. Replacing $u$ in equation (1) by the sequence $a_n > 0$ satisfying $nP(|\mathbf{X}| > a_n) \to 1$ (i.e., we may take $a_n$ to be the $1 - n^{-1}$ quantile of $|\mathbf{X}|$), we obtain

\[ nP(|\mathbf{X}| > t a_n,\ \mathbf{X}/|\mathbf{X}| \in \cdot) \xrightarrow{v} t^{-\alpha} P(\Theta \in \cdot) \quad \text{as } n \to \infty \tag{5} \]

As expected, the multivariate regular variation condition collapses to the standard condition in the one-dimensional case $d = 1$. In this case, $\mathbb{S}^0 = \{-1, 1\}$, so that the random variable $X$ is regularly
varying if and only if $|X|$ is regularly varying

\[ \lim_{u \to \infty} \frac{P(|X| > tu)}{P(|X| > u)} = t^{-\alpha} \tag{6} \]

and the tail balancing condition,

\[ \lim_{u \to \infty} \frac{P(X > u)}{P(|X| > u)} = p \quad \text{and} \quad \lim_{u \to \infty} \frac{P(X < -u)}{P(|X| > u)} = q \tag{7} \]

holds, where $p$ and $q$ are nonnegative constants with $p + q = 1$. The Pareto distribution, $t$-distribution, and nonnormal stable distributions are all examples of one-dimensional distributions that are regularly varying.

Example 1 (Independent components). Suppose that $\mathbf{X} = (X_1, X_2)$ consists of two independent and identically distributed (i.i.d.) components, where $X_1$ is a regularly varying random variable. The scatter plot of 10 000 replicates of these pairs, where $X_1$ has a $t$-distribution with 3 degrees of freedom, is displayed in Figure 2(a). The $t$-distribution is regularly varying, with index $\alpha$ being equal to the degrees of freedom. In this case, the spectral measure is a discrete distribution, which places equal mass at the intersection of the unit circle and the coordinate axes. That is,

\[ P\Big( \Theta = \frac{\pi k}{2} \Big) = \frac{1}{4} \quad \text{for } k = -1, 0, 1, 2 \tag{8} \]

The scatter plot in Figure 2 reflects the form of the spectral distribution. The points that are far from the origin occur only near the coordinate axes. The interpretation is that the probability that both components of the random vector are large at the same time is quite small.

Example 2 (Totally Dependent Components). In contrast to the independent case of Example 1, suppose that both components of the vector are identical, that is, $\mathbf{X} = (X, X)$, with $X$ regularly varying in one dimension. Independent replicates of this random vector would just produce points lying on a 45° line through the origin. Here, it is easy to see that the vector is regularly varying with spectral measure given by

\[ P\Big( \Theta = \frac{\pi}{4} \Big) = p \quad \text{and} \quad P\Big( \Theta = -\frac{3\pi}{4} \Big) = q \tag{9} \]

where the angles $\pi/4$ and $-3\pi/4$ are the directions of $(1, 1)$ and $(-1, -1)$, respectively.

[Figure 2: two scatter plots, (a) "Independent components" ($x_2$ against $x_1$) and (b) $x_{t+1}$ against $x_t$.]

Figure 2 Scatter plot of 10 000 pairs of observations with i.i.d. components having a t-distribution with 3 degrees of freedom (a) and 10 000 observations of $(X_t, X_{t+1})$ from an AR(1) process (b)

Example 3 (AR(1) Process). Let $\{X_t\}$ be the AR(1) process defined by the recursion:

\[ X_t = 0.9 X_{t-1} + Z_t \tag{10} \]
where $\{Z_t\}$ is an i.i.d. sequence of random variables that have a symmetric stable distribution with exponent 1.8. This stable distribution is regularly varying with index $\alpha = 1.8$. Since $X_t = \sum_{j=0}^\infty 0.9^j Z_{t-j}$ is a linear process, it follows [14, 15] that $X_t$ is also symmetric and regularly varying with index 1.8. In fact, $X_t$ has a symmetric stable distribution with exponent 1.8 and scale parameter $(1 - 0.9^{1.8})^{-1/1.8}$. The scatter plot of consecutive observations $(X_t, X_{t+1})$ based on 10 000 observations generated from an AR(1) process is displayed in Figure 2(b). It can be shown that all finite-dimensional distributions of this time series are regularly varying. The spectral distribution of the vector consisting of two consecutive observations $\mathbf{X} = (X_t, X_{t+1})$ is given by

\[ P(\Theta = \pm \arctan(0.9)) = 0.9898 \quad \text{and} \quad P(\Theta = \pm \pi/2) = 0.0102 \tag{11} \]

As seen in Figure 2, most of the points in the scatter plot, especially those far from the origin, cluster tightly around the line through the origin with slope 0.9. This corresponds to the large mass at $\arctan(0.9)$ of the distribution of $\Theta$. One can also detect a smattering of extreme points clustered around the vertical axis.

Estimation of α

A great deal of attention in the extreme value theory community has been devoted to the estimation of $\alpha$ in the regular variation condition (1). The generic Hill estimate is often a good starting point for this task. There are more sophisticated versions of Hill estimates, see [23] for a nice treatment of Hill estimators, but for illustration we stick with the standard version. For observations $X_1, \ldots, X_n$ from a nonnegative-valued time series, let $X_{n:1} > \cdots > X_{n:n}$ be the corresponding descending order statistics. If the data were in fact i.i.d. from a Pareto distribution, then the maximum likelihood estimator of $\alpha^{-1}$ based on the largest $m + 1$ order statistics is

\[ \hat{\alpha}^{-1} = \frac{1}{m} \sum_{j=1}^m \big( \ln X_{n:j} - \ln X_{n:m+1} \big) \tag{12} \]

Different values of $m$ produce an array of $\alpha$ estimates. The typical operating procedure is to plot the estimate of $\alpha$ versus $m$ and choose a value of $m$ where the plot appears horizontal for an extended segment. See [7, 37] for other procedures for selecting $m$. There is the typical bias versus variance trade-off, with larger $m$ producing smaller variance but larger bias. Figure 3 contains graphs of the Hill estimate of $\alpha$ as a function of $m$ for the two simulated series in Figure 2 and the exchange rate and log-return data of Figure 1. In all cases, one can see a range of $m$ for which the graph of $\hat{\alpha}$ is relatively flat. Using this segment as an estimate of $\alpha$, we would estimate the index as approximately 3 for the two simulated series, approximately 3 for the exchange rate data, and around 3.5 for the stock price data. (The value of $\alpha$ for the two simulated series is indeed 3.) Also displayed on the plots are 95% confidence intervals for $\alpha$, assuming the data are i.i.d. As suggested by these plots, the return data appear to have quite heavy tails.

Estimation of the Spectral Distribution

Using property (3), a naive estimate of the distribution of $\Theta$ is based on the angular components $\mathbf{X}_t/|\mathbf{X}_t|$ in the sample. One simply uses the empirical distribution of these angular pieces for which the modulus $|\mathbf{X}_t|$ exceeds some large threshold. More details can be found in [37]. For the scatter plots in Figure 2, we produced in Figure 4 kernel density estimates of the spectral density function for the random variable $\Theta$ on $(-\pi, \pi]$. In the graph for the i.i.d. data, one can see large spikes at $\theta = -\pi, -\pi/2, 0, \pi/2, \pi$, corresponding to the coordinate axes (the values at $-\pi$ and $\pi$ should be grouped together). For the AR(1) process, on the other hand, the density estimate puts large mass at $\theta = \arctan(0.9)$ and $\theta = \arctan(0.9) - \pi$, corresponding to the line with slope 0.9 in the first and third quadrants, respectively. Since there are only a few points on the vertical axis, the density estimate does not register much mass at $\pm\pi/2$.

Regular Variation for GARCH and SV Processes

GARCH Processes

The autoregressive conditional heteroscedastic (ARCH) process developed by Engle [19] and its generalized version, GARCH, developed by Engle
and Bollerslev [20] are perhaps the most popular models for financial time series (see GARCH Models). Although there are many variations of the GARCH process, we focus on the traditional version. We say that $\{X_t\}$ is a GARCH($p$, $q$) process if it is a strictly stationary solution of the equations:

\[ X_t = \sigma_t Z_t, \qquad \sigma_t^2 = \alpha_0 + \sum_{i=1}^p \alpha_i X_{t-i}^2 + \sum_{j=1}^q \beta_j \sigma_{t-j}^2, \quad t \in \mathbb{Z} \tag{13} \]

where the noise or innovations sequence $(Z_t)_{t \in \mathbb{Z}}$ is an i.i.d. sequence with mean zero and unit variance. It is usually assumed that all coefficients $\alpha_i$ and $\beta_j$ are nonnegative, with $\alpha_0 > 0$. For identification purposes, the variance of the noise is assumed to be 1 since otherwise its standard deviation can be absorbed into $\sigma_t$. $(\sigma_t)$ is referred to as the volatility sequence of the GARCH process.

[Figure 3: four Hill plots (Hill estimate against $m$): (a) "Hill plot for independent components", (b) "Hill plot for AR(1)", (c) "Hill plot for exchange rate", (d) "Hill plot for Merck returns".]

Figure 3 Hill plots for tail index: (a) i.i.d. data in Figure 2; (b) AR(1) process in Figure 2; (c) log-returns for US/pound exchange rate; and (d) log-returns for Merck stock, January 2, 2003 to April 28, 2006

The parameters are typically chosen to ensure that a causal and strictly stationary solution to the equations (13) exists. This means that $X_t$ has a representation as a measurable function of the past and present noise values $Z_s$, $s \le t$. The necessary and sufficient conditions for the existence and uniqueness of a stationary ergodic solution to equation (13) are
given in [35] for the GARCH(1, 1) case and for the general GARCH($p$, $q$) case in [4]; see [30] for a summary of the key properties of a GARCH process. In some cases, one only assumes weak stationarity, in which case the conditions on the parameters reduce substantially. A GARCH process is weakly stationary if and only if

\[ \alpha_0 > 0 \quad \text{and} \quad \sum_{j=1}^p \alpha_j + \sum_{j=1}^q \beta_j < 1 \tag{14} \]

To derive properties of the tail of the finite-dimensional distributions of a GARCH process, including the marginal distribution, it is convenient to embed the squares $X_t^2$ and $\sigma_t^2$ in a stochastic recurrence equation (SRE). This embedding can be used to derive other key properties of the process beyond the finite-dimensional distributions. For example, conditions for stationarity and $\beta$-mixing can be established from the properties of SREs and general theory of Markov chains. Here, we focus on the tail behavior.

[Figure 4: two kernel density estimates plotted against $\theta$ on $(-\pi, \pi]$: (a) "Independent components" and (b) "AR(1)".]

Figure 4 The estimation of the spectral density function for i.i.d. components (a) and for the AR(1) process (b) from Figure 2

One builds an SRE by including the volatility process in the state vector. An SRE takes the form

\[ \mathbf{Y}_t = \mathbf{A}_t \mathbf{Y}_{t-1} + \mathbf{B}_t \tag{15} \]

where $\mathbf{Y}_t$ is an $m$-dimensional random vector, $\mathbf{A}_t$ is an $m \times m$ random matrix, $\mathbf{B}_t$ is a random vector, and $\{(\mathbf{A}_t, \mathbf{B}_t)\}$ is an i.i.d. sequence. Under suitable conditions on the coefficient matrices and error matrices, one can derive various properties about the Markov chain $\mathbf{Y}_t$. For example, iteration of equation (15) yields a unique stationary and causal solution:

\[ \mathbf{Y}_t = \mathbf{B}_t + \sum_{i=1}^\infty \mathbf{A}_t \cdots \mathbf{A}_{t-i+1} \mathbf{B}_{t-i}, \quad t \in \mathbb{Z} \tag{16} \]

To ensure almost sure (a.s.) convergence of the infinite series in equation (16), and hence the existence of a unique strictly stationary solution to equation (15), it is assumed that the top Lyapunov exponent given by

\[ \gamma = \inf_{n \ge 1} n^{-1}\, E \log \| \mathbf{A}_n \cdots \mathbf{A}_1 \| \tag{17} \]

is negative, where $\|\cdot\|$ is the operator norm corresponding to a given norm in $\mathbb{R}^m$.

Now, the GARCH process, at least its squares, can be embedded into an SRE by choosing
\[
\mathbf{Y}_t = \begin{pmatrix} \sigma_{t+1}^2 \\ \vdots \\ \sigma_{t-q+2}^2 \\ X_t^2 \\ \vdots \\ X_{t-p+1}^2 \end{pmatrix},
\qquad
\mathbf{A}_t = \begin{pmatrix}
\alpha_1 Z_t^2 + \beta_1 & \beta_2 & \cdots & \beta_{q-1} & \beta_q & \alpha_2 & \alpha_3 & \cdots & \alpha_p \\
1 & 0 & \cdots & 0 & 0 & 0 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 & 0 & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1 & 0 & 0 & 0 & \cdots & 0 \\
Z_t^2 & 0 & \cdots & 0 & 0 & 0 & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 & 0 & 1 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 1 & 0
\end{pmatrix},
\qquad
\mathbf{B}_t = (\alpha_0, 0, \ldots, 0)' \tag{18}
\]

where, as required, $\{(\mathbf{A}_t, \mathbf{B}_t)\}$ is an i.i.d. sequence. The top row in the SRE for the GARCH specification follows directly from the definition of the squared volatility process $\sigma_{t+1}^2$ and the property that $X_t = \sigma_t Z_t$.

In general, the top Lyapunov coefficient $\gamma$ for the GARCH SRE cannot be calculated explicitly. However, a sufficient condition for $\gamma < 0$ is given as

\[ \sum_{i=1}^p \alpha_i + \sum_{j=1}^q \beta_j < 1 \tag{19} \]

see p. 122 [4]. It turns out that this condition is also necessary and sufficient for the existence of a weakly stationary solution to the GARCH recursions. The solution will also be strictly stationary in this case.

It has been noted that for many financial time series, the GARCH(1,1) often provides an adequate model or is at least a good starter model. This is one of the few models where the Lyapunov coefficient can be computed explicitly. In this case, the SRE equation essentially collapses to the one-dimensional SRE given as

\[ \sigma_{t+1}^2 = \alpha_0 + (\alpha_1 Z_t^2 + \beta_1)\, \sigma_t^2 = A_t \sigma_t^2 + \alpha_0 \tag{20} \]

where $A_t = \alpha_1 Z_t^2 + \beta_1$. The elements in the second row in the vector and matrix components of equation (18) play no role in this case. Hence,

\[ \gamma = n^{-1} E \log (A_n \cdots A_1) = E \log A_1 = E \log (\alpha_1 Z^2 + \beta_1) \tag{21} \]

The conditions [35], $E \log(\alpha_1 Z^2 + \beta_1) < 0$ and $\alpha_0 > 0$, are necessary and sufficient for the existence of a stationary causal nondegenerate solution to the GARCH(1,1) equations.

Once the squares and volatility sequence, $X_t^2$ and $\sigma_t^2$, respectively, are embedded in an SRE, then one can apply classical theory for SREs as developed by Kesten [28], (see also [22]), and extended by Basrak et al. [2], to establish regular variation of the tails of $X_t^2$ and $\sigma_t^2$. The following result by Basrak et al. [1] summarizes the key results applied to a GARCH process.

Theorem 1 Consider the process $(\mathbf{Y}_t)$ in equation (18) obtained from embedding a stationary GARCH process into the SRE (18). Assume that $Z$ has a positive density on $\mathbb{R}$ such that $E(|Z|^h) < \infty$ for $h < h_0$ and $E(|Z|^{h_0}) = \infty$ for some $h_0 \in (0, \infty]$. Then with $\mathbf{Y} = \mathbf{Y}_1$, there exist $\alpha > 0$, a constant $c > 0$, and a random vector $\Theta$ on the unit sphere $\mathbb{S}^{p+q-2}$ such that

\[ x^{\alpha/2} P(|\mathbf{Y}| > x) \to c \quad \text{as } x \to \infty \tag{22} \]

and for every $t > 0$

\[ \frac{P(|\mathbf{Y}| > tx,\ \mathbf{Y}/|\mathbf{Y}| \in \cdot)}{P(|\mathbf{Y}| > x)} \xrightarrow{w} t^{-\alpha/2} P(\Theta \in \cdot) \quad \text{as } x \to \infty \tag{23} \]

where $\xrightarrow{w}$ denotes weak convergence on the Borel $\sigma$-field of $\mathbb{S}^{p+q-2}$.ᵃ
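For the GARCH(1,1) case, both the Lyapunov exponent of equation (21) and the tail index $\alpha$ can be computed numerically once a noise density is fixed. The sketch below assumes standard normal noise $Z$ (an assumption made only for this illustration) and uses the characterization given later in the article as equation (27), $E[(\alpha_1 Z^2 + \beta_1)^{\alpha/2}] = 1$. Expectations are approximated by a trapezoidal rule and the root is found by bisection; all function names and numerical settings are hypothetical.

```python
import math


def gaussian_expectation(g, half_width=12.0, n_steps=4800):
    """Trapezoidal approximation of E g(Z) for Z ~ N(0, 1) (assumed noise law)."""
    dz = 2.0 * half_width / n_steps
    total = 0.0
    for k in range(n_steps + 1):
        z = -half_width + k * dz
        weight = 0.5 if k in (0, n_steps) else 1.0
        total += weight * g(z) * math.exp(-0.5 * z * z)
    return total * dz / math.sqrt(2.0 * math.pi)


def lyapunov_exponent(alpha1, beta1):
    """gamma = E log(alpha1 * Z^2 + beta1), as in equation (21); needs beta1 > 0."""
    return gaussian_expectation(lambda z: math.log(alpha1 * z * z + beta1))


def tail_index(alpha1, beta1, tol=1e-10, kappa_max=64.0):
    """Solve E[(alpha1 * Z^2 + beta1)^kappa] = 1 for kappa > 0; return 2 * kappa."""
    def h(kappa):
        moment = gaussian_expectation(
            lambda z: (alpha1 * z * z + beta1) ** kappa)
        return moment - 1.0

    hi = 1.0
    while h(hi) <= 0.0:            # expand until the moment exceeds 1
        hi *= 2.0
        if hi > kappa_max:
            raise ValueError("no root found below kappa_max")
    lo = 0.0                       # h is convex, h(0) = 0 and h'(0) = gamma < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return lo + hi                 # alpha = 2 * kappa with kappa = (lo + hi) / 2
```

For an IGARCH choice such as `tail_index(0.1, 0.9)` the result should be close to 2, in line with the article's remark that $\alpha = 2$ when $\alpha_1 + \beta_1 = 1$. The bisection relies on $\gamma < 0$, so `lyapunov_exponent` is worth checking first.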
It follows that the components of the vector of $\mathbf{Y}$ are also regularly varying so that

\[ P(|X_1| > x) \sim c_1 x^{-\alpha} \quad \text{and} \quad P(\sigma_1 > x) \sim c_2 x^{-\alpha} \tag{24} \]

for some positive constants $c_1$ and $c_2$. A straightforward application of Breiman's lemma [6], (cf. [13], Section 4), allows us to remove the absolute values in $X_1$ to obtain

\[ P(X_1 > x) = P(\sigma_1 Z_1^+ > x) \sim E((Z_1^+)^\alpha)\, P(\sigma_1 > x) \tag{25} \]

\[ P(X_1 \le -x) = P(-\sigma_1 Z_1^- \le -x) \sim E((Z_1^-)^\alpha)\, P(\sigma_1 > x) \tag{26} \]

where $Z_1^\pm$ are the respective positive and negative parts of $Z_1$. With the exception of simple models such as the GARCH(1,1), there is no explicit formula for the index $\alpha$ of regular variation of the marginal distribution. In principle, $\alpha$ could be estimated from the data using a Hill style estimator, but an enormous sample size would be required in order to obtain a precise estimate of the index.

In the GARCH(1,1) case, $\alpha$ is found by solving the following equation:

\[ E\big[ (\alpha_1 Z^2 + \beta_1)^{\alpha/2} \big] = 1 \tag{27} \]

This equation can be solved for $\alpha$ by numerical and/or simulation methods for fixed values of $\alpha_1$ and $\beta_1$ from the stationarity region of a GARCH(1,1) process and assuming a concrete density for $Z$. (See [12] for a table of values of $\alpha$ for various choices of $\alpha_1$ and $\beta_1$.) Note that in the case of an integrated GARCH (IGARCH) process where $\alpha_1 + \beta_1 = 1$, we have $\alpha = 2$. This holds regardless of the distribution of $Z_1$, provided it has a finite variance. Since the marginal distribution of an IGARCH process has Pareto-like tails with index 2, the variance is infinite.

While equations (25) and (26) describe only the regular variation of the marginal distribution, it is also true that the finite-dimensional distributions are regularly varying. To see this in the GARCH(1,1) case, we note that the volatility process is given as

\[ \sigma_{t+1}^2 = (\alpha_1 Z_t^2 + \beta_1)\, \sigma_t^2 + \alpha_0 \tag{28} \]

so that

\[ (\sigma_1^2, \ldots, \sigma_m^2) = \big( 1,\ \alpha_1 Z_1^2 + \beta_1,\ (\alpha_1 Z_2^2 + \beta_1)(\alpha_1 Z_1^2 + \beta_1),\ \ldots,\ (\alpha_1 Z_{m-1}^2 + \beta_1) \cdots (\alpha_1 Z_1^2 + \beta_1) \big)\, \sigma_1^2 + \mathbf{R}_m = \mathbf{D}_m \sigma_1^2 + \mathbf{R}_m \tag{29} \]

where $\mathbf{R}_m$ has tails that are lighter than those for $\sigma_1^2$. Now since $\mathbf{D}_m = (D_1, \ldots, D_m)$ is independent of $\sigma_1^2$ and has an $\alpha/2 + \delta$ moment for some $\delta > 0$, it follows by a generalization of Breiman's lemma [1] that

\[ \mathbf{U}_m := (X_1^2, \ldots, X_m^2) = \mathbf{F}_m \sigma_1^2 + \mathbf{R}_m \tag{30} \]

where $\mathbf{F}_m = (Z_1^2 D_1, \ldots, Z_m^2 D_m)$ is regularly varying with

\[ \lim_{x \to \infty} \frac{P(|\mathbf{U}_m| > x,\ \mathbf{U}_m/|\mathbf{U}_m| \in A)}{P(|\mathbf{U}_m| > x)} = \lim_{x \to \infty} \frac{P(|\mathbf{F}_m| \sigma_1^2 > x,\ \mathbf{F}_m/|\mathbf{F}_m| \in A)}{P(|\mathbf{F}_m| \sigma_1^2 > x)} = \frac{E\big[ |\mathbf{F}_m|^{\alpha/2} I_A(\mathbf{F}_m/|\mathbf{F}_m|) \big]}{E|\mathbf{F}_m|^{\alpha/2}} \tag{31} \]

It follows that the finite-dimensional distributions of a GARCH process are regularly varying.

Stochastic Volatility Processes

The SV process also starts with the multiplicative model (13)

\[ X_t = \sigma_t Z_t \tag{32} \]

with $(Z_t)$ being an i.i.d. sequence of random variables. If $\mathrm{var}(Z_t) < \infty$, then it is conventional to assume that $Z_t$ has mean 0 and variance 1. Unlike the GARCH process, the volatility process $(\sigma_t)$ for SV processes is assumed to be independent of the sequence $(Z_t)$. Often, one assumes that $\log \sigma_t^2$ is a linear Gaussian process given by

\[ \log \sigma_t^2 = Y_t = \mu + \sum_{j=0}^\infty \psi_j \eta_{t-j} \tag{33} \]

where $(\psi_j)$ is a sequence of square summable coefficients and $(\eta_t)$ is a sequence of i.i.d. $N(0, \sigma^2)$ random variables independent of $(Z_t)$. If $\mathrm{var}(Z_t)$ is
finite and equal to 1, then the SV process $X_t = \sigma_t Z_t = \exp\{Y_t/2\} Z_t$ is white noise with mean 0 and variance $\exp\{\mu + \sigma^2 \sum_{j=0}^{\infty} \psi_j^2/2\}$. One advantage of such processes is that one can explicitly compute the autocovariance function (ACVF) of any power of $X_t$ and its absolute values. For example, the ACVF of the squares of $(X_t)$ is, for $h > 0$, given as

\[ \gamma_{|X|^2}(h) = E(\exp\{Y_0 + Y_h\}) - (E \exp\{Y_0\})^2 = \exp\Bigl\{2\mu + \sigma^2 \sum_{i=0}^{\infty} \psi_i^2\Bigr\} \Bigl(\exp\Bigl\{\sigma^2 \sum_{i=0}^{\infty} \psi_i \psi_{i+h}\Bigr\} - 1\Bigr) = e^{2\mu} e^{\gamma_Y(0)} \bigl(e^{\gamma_Y(h)} - 1\bigr) \tag{34} \]

Note that as $h \to \infty$,

\[ \gamma_{|X|^2}(h) = e^{2\mu} e^{\gamma_Y(0)} \bigl(e^{\gamma_Y(h)} - 1\bigr) \sim e^{2\mu} e^{\gamma_Y(0)} \gamma_Y(h) \tag{35} \]

so that the ACVF of the squares of the SV process converges to zero at the same rate as that of the log-volatility process.

If $Z_t$ has a Gaussian distribution, then the tail of $X_t$ remains light although a bit heavier than a Gaussian [3]. This is in contrast to the GARCH case where an i.i.d. Gaussian input leads to heavy-tailed marginals of the process. On the other hand, for SV processes, if the $Z_t$ have heavy tails, for example, if $Z_t$ has a t-distribution, then Davis and Mikosch [10] show that $X_t$ is regularly varying. Furthermore, in this case, any finite collection of $X_t$'s has the same limiting joint tail behavior as an i.i.d. sequence with regularly varying marginals. Specifically, the two random vectors $(X_1, \ldots, X_k)$ and $(E|\sigma_1|^\alpha)^{1/\alpha} (Z_1, \ldots, Z_k)$ have the same joint tail behavior.

Limit Theory GARCH and SV Processes

Convergence of Maxima

If $(X_t)$ is a stationary sequence of random variables with common distribution function $F$, then often one can directly relate the limiting distribution of the maxima, $M_n = \max\{X_1, \ldots, X_n\}$, to $F$. Assuming that $X_1$ is regularly varying with index $-\alpha$ and choosing the sequence $(a_n)$ such that $n(1 - F(a_n)) \to 1$, then

\[ F^n(a_n x) \to G(x) = \begin{cases} 0, & x \le 0 \\ e^{-x^{-\alpha}}, & x > 0 \end{cases} \tag{36} \]

This relation is equivalent to convergence in distribution of the maxima of the associated independent sequence $(\hat X_t)$ (i.e., the sequence $(\hat X_t)$ is i.i.d. with common distribution function $F$) normalized by $a_n$ to the Fréchet distribution $G$. Specifically, if $\hat M_n = \max\{\hat X_1, \ldots, \hat X_n\}$, then

\[ P(a_n^{-1} \hat M_n \le x) \to G(x) \tag{37} \]

Under mild mixing conditions on the sequence $(X_t)$ [29], we have

\[ P(a_n^{-1} M_n \le x) \to H(x) \tag{38} \]

with $H$ a nondegenerate distribution function if and only if

\[ H(x) = G^\theta(x) \tag{39} \]

for some $\theta \in (0, 1]$. The parameter $\theta$ is called the extremal index and can be viewed as a sample size adjustment for the maxima of the dependent sequence due to clustering of the extremes. The case $\theta = 1$ corresponds to no clustering, in which case the limiting behavior of $M_n$ and $\hat M_n$ is identical. In case $\theta < 1$, $M_n$ behaves asymptotically like the maximum of $n\theta$ independent observations. The reciprocal of the extremal index $1/\theta$ of a stationary sequence $(X_t)$ also has the interpretation as the expected size of clusters of high-level exceedances in the sequence.

There are various sufficient conditions for ensuring that $\theta = 1$. Perhaps the most common anticlustering condition is $D'$ [28], which has the following form:

\[ \limsup_{n\to\infty}\ n \sum_{t=2}^{[n/k]} P(X_1 > a_n x,\ X_t > a_n x) = O(1/k) \tag{40} \]

as $k \to \infty$. Hence, if the stationary process $(X_t)$ satisfies a mixing condition and $D'$, then

\[ P(a_n^{-1} M_n \le x) \to G(x) \tag{41} \]
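The limits in equations (36), (39), and (41) can be checked numerically in a case where everything is available in closed form. The following is a minimal sketch, assuming an exact Pareto distribution $F(x) = 1 - x^{-\alpha}$ for $x \ge 1$ (an illustrative choice, not one made in the text), for which $a_n = n^{1/\alpha}$ solves $n(1 - F(a_n)) = 1$. The function names and the way the extremal index enters (as the exponent $n\theta$) are assumptions made for illustration only.

```python
import math

def frechet_cdf(x, alpha):
    """Frechet limit G(x) = exp(-x^(-alpha)) for x > 0, and 0 otherwise (eq. 36)."""
    return math.exp(-x ** (-alpha)) if x > 0 else 0.0

def pareto_cdf(x, alpha):
    """Assumed example distribution: F(x) = 1 - x^(-alpha) for x >= 1."""
    return 1.0 - x ** (-alpha) if x >= 1 else 0.0

def normalized_max_cdf(x, n, alpha, theta=1.0):
    """F(a_n x)^(n * theta) with a_n = n^(1/alpha).

    theta = 1 gives the exact law of the normalized maximum of n i.i.d.
    Pareto variables; theta < 1 mimics a dependent sequence whose maximum
    behaves like that of n * theta independent observations (eq. 39).
    """
    a_n = n ** (1.0 / alpha)  # solves n * (1 - F(a_n)) = 1 for the Pareto
    return pareto_cdf(a_n * x, alpha) ** (n * theta)

if __name__ == "__main__":
    alpha, x = 3.0, 1.5
    for n in (10, 100, 10_000, 1_000_000):
        print(n, normalized_max_cdf(x, n, alpha), frechet_cdf(x, alpha))
```

For the dependent case, comparing `normalized_max_cdf(x, n, alpha, theta)` with `frechet_cdf(x, alpha) ** theta` illustrates why $\theta$ acts as a sample size adjustment.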
Returning to the GARCH setting, we assume that the conditions of Theorem 1 are satisfied. Then we know that $P(|X| > x) \sim c_1 x^{-\alpha}$ for some $\alpha, c_1 > 0$, and we can even specify the value of $\alpha$ in the GARCH(1,1) case by solving equation (27). Now choosing $a_n = n^{1/\alpha} c_1^{1/\alpha}$, we have $nP(|X_1| > a_n) \to 1$ and defining $M_n = \max\{|X_1|, \ldots, |X_n|\}$, we obtain

\[ P(a_n^{-1} M_n \le x) \to \exp\{-\theta_1 x^{-\alpha}\} \tag{42} \]

where the extremal index $\theta_1$ is strictly less than 1. Explicit formulae for the extremal index of a general GARCH process are hard to come by. In some special cases, such as the ARCH(1) and the GARCH(1,1), there are more explicit expressions. For example, in the GARCH(1,1) case, the extremal index $\theta_1$ for the maxima of the absolute values of the GARCH process is given by Mikosch and Stărică [34]

\[ \theta_1 = \frac{\displaystyle \lim_{k\to\infty} E\Bigl(|Z_1|^\alpha - \max_{j=2,\ldots,k+1} \Bigl| Z_j^2 \prod_{i=2}^{j} A_i \Bigr|^{\alpha/2}\Bigr)_+}{E|Z_1|^\alpha} \tag{43} \]

The above expression can be evaluated by Monte Carlo simulation; see, for example, [25] for the ARCH(1) case with standard normal noise $Z_t$, and [18], Section 8.1, where one can also find some advice as to how the extremal index of a stationary sequence can be estimated from data.

The situation is markedly different for SV processes. For the SV process with either light- or heavy-tailed noise, one can show that $D'$ is satisfied and hence the extremal index is always 1 (see [3] for the light-tailed case and [10] for the heavy-tailed case). Hence, although both GARCH and SV models exhibit stochastic clustering, only the GARCH process displays extremal clustering.

Convergence of Point Processes

The theory of point processes plays a central role in extreme value theory and in combination with regular variation can be a powerful tool for establishing limiting behavior of other statistics beyond extreme order statistics. As in the previous section, suppose that $(\hat X_t)$ is an i.i.d. sequence of nonnegative random variables with common distribution $F$ that has regularly varying tails with index $-\alpha$. Choosing the sequence $a_n$ satisfying $n(1 - F(a_n)) \to 1$, we have

\[ nP(\hat X_1 > a_n x) \to x^{-\alpha} \tag{44} \]

as $n \to \infty$. Now equation (44) can be strengthened to the statement

\[ nP(a_n^{-1} \hat X_1 \in B) \to \nu(B) \tag{45} \]

for all suitably chosen Borel sets $B$, where the measure $\nu$ is defined by its value on intervals of the form $(a, b]$ with $a > 0$ as

\[ \nu(a, b] = a^{-\alpha} - b^{-\alpha} \tag{46} \]

The convergence in equation (45) can be connected with the convergence in distribution of a sequence of point processes. For a bounded Borel set $B$ in $E = [0, \infty] \setminus \{0\}$, define the sequence of point processes $(\hat N_n)$ by

\[ \hat N_n(B) = \#\{a_n^{-1} \hat X_j \in B,\ j = 1, 2, \ldots, n\} \tag{47} \]

If $B$ is the interval $(a, b]$ with $0 < a < b \le \infty$, then since the $\hat X_j$ are i.i.d., $\hat N_n(B)$ has a binomial distribution with number of trials $n$ and probability of success

\[ p_n = P(a_n^{-1} \hat X_1 \in (a, b]) \tag{48} \]

It then follows from equations (45) and (46) that $\hat N_n(B)$ converges in distribution to a Poisson random variable $N(B)$ with mean $\nu(B)$. In fact, we have the stronger point process convergence:

\[ \hat N_n \xrightarrow{d} N \tag{49} \]

where $N$ is a Poisson process on $E$ with mean measure $\nu(dx)$ and $\xrightarrow{d}$ denotes convergence in distribution of point processes. For our purposes, $\xrightarrow{d}$ for point processes means that for any collection of bounded (see End Note b) Borel sets $B_1, \ldots, B_k$ for which $P(N(\partial B_j) > 0) = 0$, $j = 1, \ldots, k$, we have

\[ (\hat N_n(B_1), \ldots, \hat N_n(B_k)) \xrightarrow{d} (N(B_1), \ldots, N(B_k)) \tag{50} \]

on $\mathbb{R}^k$ [18, 29, 36].
As an application of equation (49), define $\hat M_{n,k}$ to be the $k$th largest among $\hat X_1, \ldots, \hat X_n$. For $y \le x$, the event $\{a_n^{-1} \hat M_n \le x,\ a_n^{-1} \hat M_{n,k} \le y\} = \{\hat N_n(x, \infty) = 0,\ \hat N_n(y, x] \le k - 1\}$ and hence, since the Poisson counts of the disjoint sets $(x, \infty)$ and $(y, x]$ are independent with means $x^{-\alpha}$ and $y^{-\alpha} - x^{-\alpha}$,

\[ P(a_n^{-1} \hat M_n \le x,\ a_n^{-1} \hat M_{n,k} \le y) = P(\hat N_n(x, \infty) = 0,\ \hat N_n(y, x] \le k - 1) \to P(N(x, \infty) = 0,\ N(y, x] \le k - 1) = e^{-x^{-\alpha}}\, e^{-(y^{-\alpha} - x^{-\alpha})} \sum_{j=0}^{k-1} \frac{(y^{-\alpha} - x^{-\alpha})^j}{j!} = e^{-y^{-\alpha}} \sum_{j=0}^{k-1} \frac{(y^{-\alpha} - x^{-\alpha})^j}{j!} \tag{51} \]

As a second application of the Poisson convergence in equation (49), note that the limiting Poisson process $N$ has points located at $\Gamma_k^{-1/\alpha}$, where $\Gamma_k = E_1 + \cdots + E_k$ is the sum of $k$ i.i.d. unit exponentially distributed random variables. If $\alpha \ge 1$, the result is more complicated; if $\alpha < 1$, the points of the limit process are summable and we obtain the convergence of partial sums:

\[ a_n^{-1} \sum_{t=1}^{n} \hat X_t \xrightarrow{d} \sum_{j=1}^{\infty} \Gamma_j^{-1/\alpha} \tag{52} \]

In other words, the sum of the points of the point process $\hat N_n$ converges in distribution to the sum of points in the limiting Poisson process.

For a stationary time series $(X_t)$ with heavy tails that satisfies a suitable mixing condition, such as strong mixing, and the anticlustering condition $D'$, the convergence in equation (49) remains valid, as well as the limit in equation (52), at least for positive random variables. For example, this is the case for SV processes. If the condition $D'$ is replaced by the assumption that all finite-dimensional distributions are regularly varying, then there is a point process convergence result for $N_n$ corresponding to $(X_t)$. However, the limit point process in this case is more difficult to describe. Essentially, the point process has anchors located at the Poisson points $\Gamma_j^{-1/\alpha}$. At each of these anchor locations, there is an independent cluster of points that can be described by the distribution of the angular measures in the regular variation condition [8, 9]. These results can then be applied to functions of the data, such as lagged products, to establish the convergence in distribution of the sample autocovariance function. This is the subject of the following section.

The Behavior of the Sample Autocovariance and Autocorrelation Functions

The ACF is one of the principal tools used in classical time series modeling. For a stationary Gaussian process, the dependence structure of the process is completely determined by the ACF. The ACF also conveys important dependence information for linear processes. To some extent, the dependence governed by a linear filter can be fully recovered from the ACF. For time series consisting of financial returns, the data are uncorrelated, so the value of the ACF is substantially diminished. Nevertheless, the ACF of other functions of the process such as the squares and absolute values can still convey useful information about the nature of the nonlinearity in the time series. For example, slow decay of the ACF of the squares is consistent with the volatility clustering present in the data. For a stationary time series $(X_t)$, the ACVF and ACF are defined as

\[ \gamma_X(h) = \mathrm{cov}(X_0, X_h) \quad\text{and}\quad \rho_X(h) = \mathrm{corr}(X_0, X_h) = \frac{\gamma_X(h)}{\gamma_X(0)}, \quad h \ge 0 \tag{53} \]

respectively. Now for observations $X_1, \ldots, X_n$ from the stationary time series, the ACVF and ACF are estimated by their sample counterparts, namely, by

\[ \hat\gamma_X(h) = \frac{1}{n} \sum_{t=1}^{n-h} (X_t - \bar X_n)(X_{t+h} - \bar X_n) \tag{54} \]

and

\[ \hat\rho_X(h) = \frac{\hat\gamma_X(h)}{\hat\gamma_X(0)} = \frac{\sum_{t=1}^{n-h} (X_t - \bar X_n)(X_{t+h} - \bar X_n)}{\sum_{t=1}^{n} (X_t - \bar X_n)^2} \tag{55} \]

where $\bar X_n = n^{-1} \sum_{t=1}^{n} X_t$ is the sample mean.

Even though the sample ACVF is an average of random variables, its asymptotic behavior is determined by the extreme values, at least in the case of heavy-tailed data. Regular variation and point process theory are the two ingredients that play a key role in deriving limit theory for the sample ACVF and ACF. In particular, one applies the point process techniques alluded to in the previous section to the
stationary process consisting of products $(X_t X_{t+h})$. The first such results were established by Davis and Resnick [14–16] in a linear process setting. Extensions by Davis and Hsing [8] and Davis and Mikosch [9] allowed one to consider more general time series models beyond linear ones. The main idea is to consider a point process $N_n$ based on products of the form $X_t X_{t+h}/a_n^2$. After establishing convergence of this point process, in many cases one can apply the continuous mapping theorem to show that the sum of the points that comprise $N_n$ converges in distribution to the sum of the points that make up the limiting point process. Although the basic idea for establishing these results is rather straightforward, the details are slightly complex. These ideas have been applied to the case of GARCH processes in [1] and to SV processes in [10], which are summarized below.

The GARCH Case

The scaling in the limiting distribution for the sample ACF depends on the index of regular variation $\alpha$ specified in Theorem 1. We summarize the results for the various cases of $\alpha$.

1. If $\alpha \in (0, 2)$, then $\hat\rho_X(h)$ and $\hat\rho_{|X|}(h)$ have nondegenerate limit distributions. The same statement holds for $\hat\rho_{X^2}(h)$ when $\alpha \in (0, 4)$.
2. If $\alpha \in (2, 4)$, then both $\hat\rho_X(h)$ and $\hat\rho_{|X|}(h)$ converge in probability to their deterministic counterparts $\rho_X(h)$ and $\rho_{|X|}(h)$, respectively, at the rate $n^{1-2/\alpha}$ and the limit distribution is a complex function of non-Gaussian stable random variables.
3. If $\alpha \in (4, 8)$, then

\[ n^{1-4/\alpha} \bigl(\hat\rho_{X^2}(h) - \rho_{X^2}(h)\bigr) \xrightarrow{d} S_{\alpha/2}(h) \tag{56} \]

where the random variable $S_{\alpha/2}(h)$ is a function of infinite variance stable random variables.
4. If $\alpha > 4$, then one can apply standard central limit theorems for stationary mixing sequences to establish a limiting normal distribution [17, 26]. In particular, $(\hat\rho_X(h))$ and $(\hat\rho_{|X|}(h))$ have Gaussian limits at $\sqrt{n}$-rates. The corresponding result holds for $(X_t^2)$ when $\alpha > 8$.

These results show that the limit theory for the sample ACF of a GARCH process is rather complicated when the tails are heavy. In fact, there is considerable empirical evidence based on extreme value statistics as described in the second section, indicating that log-return series might not have a finite fourth or fifth moment (see End Note c), and then the limit results above would show that the usual confidence bands for the sample ACF based on the central limit theorem and the corresponding $\sqrt{n}$-rates are far too optimistic in this case.

The Stochastic Volatility Case

For a more direct comparison with the GARCH process, we choose a distribution for the noise process that matches the power law tail of the GARCH with index $\alpha$. Then

\[ \Bigl(\frac{n}{\ln n}\Bigr)^{1/\alpha} \hat\rho_X(h) \quad\text{and}\quad \Bigl(\frac{n}{\ln n}\Bigr)^{1/(2\alpha)} \hat\rho_{X^2}(h) \tag{57} \]

converge in distribution for $\alpha \in (0, 2)$ and $\alpha \in (0, 4)$, respectively. This illustrates the excellent large sample behavior of the sample ACF for SV models even if $\rho_X$ and $\rho_{X^2}$ are not defined [11, 13]. Thus, even if $\mathrm{var}(Z_t) = \infty$ or $E Z_t^4 = \infty$, the estimates $\hat\rho_X(h)$ and $\hat\rho_{X^2}(h)$, respectively, converge to zero at a rapid rate. This is in marked contrast with the situation for GARCH processes, where under similar conditions on the marginal distribution, the respective sample ACFs converge in distribution to random variables without any scaling.

End Notes

a. Basrak et al. [1] proved this result under the condition that $\alpha/2$ is not an even integer. Boman and Lindskog [5] removed this condition.
b. Here bounded means bounded away from zero.
c. See, for example, [18], Chapter 6, and [33].

References

[1] Basrak, B., Davis, R.A. & Mikosch, T. (2002). Regular variation of GARCH processes, Stochastic Processes and Their Applications 99, 95–116.
[2] Basrak, B., Davis, R.A. & Mikosch, T. (2002). A characterization of multivariate regular variation, The Annals of Applied Probability 12, 908–920.
[3] Breidt, F.J. & Davis, R.A. (1998). Extremes of stochastic volatility models, The Annals of Applied Probability 8, 664–675.
[4] Bougerol, P. & Picard, N. (1992). Stationarity of GARCH processes and of some nonnegative time series, Journal of Econometrics 52, 115–127.
[5] Boman, J. & Lindskog, F. (2007). Support Theorems for the Radon Transform and Cramér-Wold Theorems. Technical report, KTH, Stockholm.
[6] Breiman, L. (1965). On some limit theorems similar to the arc-sin law, Theory of Probability and Its Applications 10, 323–331.
[7] Coles, S. (2001). An Introduction to Statistical Modeling of Extreme Values, Springer, London.
[8] Davis, R.A. & Hsing, T. (1995). Point process and partial sum convergence for weakly dependent random variables with infinite variance, Annals of Probability 23, 879–917.
[9] Davis, R.A. & Mikosch, T. (1998). The sample autocorrelations of heavy-tailed processes with applications to ARCH, Annals of Statistics 26, 2049–2080.
[10] Davis, R.A. & Mikosch, T. (2001). Point process convergence of stochastic volatility processes with application to sample autocorrelation, Journal of Applied Probability 38A, 93–104.
[11] Davis, R.A. & Mikosch, T. (2001). The sample autocorrelations of financial time series models, in W.J. Fitzgerald, R.L. Smith, A.T. Walden & P.C. Young, (eds), Nonlinear and Nonstationary Signal Processing, Cambridge University Press, Cambridge, pp. 247–274.
[12] Davis, R.A. & Mikosch, T. (2009). Extreme value theory for GARCH processes, in T. Andersen, R.A. Davis, J.-P. Kreiss & T. Mikosch, (eds), Handbook of Financial Time Series, Springer, New York, pp. 187–200.
[13] Davis, R.A. & Mikosch, T. (2009). Probabilistic properties of stochastic volatility models, in T. Andersen, R.A. Davis, J.-P. Kreiss & T. Mikosch, (eds), Handbook of Financial Time Series, Springer, New York, pp. 255–267.
[14] Davis, R.A. & Resnick, S.I. (1985). Limit theory for moving averages of random variables with regularly varying tail probabilities, Annals of Probability 13, 179–195.
[15] Davis, R.A. & Resnick, S.I. (1985). More limit theory for the sample correlation function of moving averages, Stochastic Processes and Their Applications 20, 257–279.
[16] Davis, R.A. & Resnick, S.I. (1986). Limit theory for the sample covariance and correlation functions of moving averages, Annals of Statistics 14, 533–558.
[17] Doukhan, P. (1994). Mixing Properties and Examples, Lecture Notes in Statistics, Vol. 85, Springer Verlag, New York.
[18] Embrechts, P., Klüppelberg, C. & Mikosch, T. (1997). Modelling Extremal Events for Insurance and Finance, Springer, Berlin.
[19] Engle, R.F. (1982). Autoregressive conditional heteroscedastic models with estimates of the variance of United Kingdom inflation, Econometrica 50, 987–1007.
[20] Engle, R.F. & Bollerslev, T. (1986). Modelling the persistence of conditional variances. With comments and a reply by the authors, Econometric Reviews 5, 1–87.
[21] Fama, E.F. (1965). The behaviour of stock market prices, Journal of Business 38, 34–105.
[22] Goldie, C.M. (1991). Implicit renewal theory and tails of solutions of random equations, Annals of Applied Probability 1, 126–166.
[23] de Haan, L. & Ferreira, A. (2006). Extreme Value Theory: An Introduction, Springer, New York.
[24] de Haan, L. & Resnick, S.I. (1977). Limit theory for multivariate sample extremes, Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 40, 317–337.
[25] de Haan, L., Resnick, S.I., Rootzén, H. & de Vries, C.G. (1989). Extremal behaviour of solutions to a stochastic difference equation with applications to ARCH processes, Stochastic Processes and Their Applications 32, 213–224.
[26] Ibragimov, I.A. & Linnik, Yu.V. (1971). Independent and Stationary Sequences of Random Variables, Wolters-Noordhoff, Groningen.
[27] Kallenberg, O. (1983). Random Measures, 3rd edition, Akademie-Verlag, Berlin.
[28] Kesten, H. (1973). Random difference equations and renewal theory for products of random matrices, Acta Mathematica 131, 207–248.
[29] Leadbetter, M.R., Lindgren, G. & Rootzén, H. (1983). Extremes and Related Properties of Random Sequences and Processes, Springer, New York.
[30] Lindner, A. (2009). Stationarity, mixing, distributional properties and moments of GARCH(p,q) processes, in T. Andersen, R.A. Davis, J.-P. Kreiss & T. Mikosch, (eds), Handbook of Financial Time Series, Springer, New York.
[31] Mandelbrot, B. (1963). The variation of certain speculative prices, Journal of Business 36, 394–419.
[32] Mandelbrot, B. & Taylor, H. (1967). On the distribution of stock price differences, Operations Research 15, 1057–1062.
[33] Mikosch, T. (2003). Modelling dependence and tails of financial time series, in B. Finkenstädt & H. Rootzén, (eds), Extreme Values in Finance, Telecommunications and the Environment, Chapman & Hall, pp. 185–286.
[34] Mikosch, T. & Stărică, C. (2000). Limit theory for the sample autocorrelations and extremes of a GARCH(1,1) process, Annals of Statistics 28, 1427–1451.
[35] Nelson, D.B. (1990). Stationarity and persistence in the GARCH(1,1) model, Econometric Theory 6, 318–334.
[36] Resnick, S.I. (1987). Extreme Values, Regular Variation, and Point Processes, Springer, New York.
[37] Resnick, S.I. (2007). Heavy Tail Phenomena; Probabilistic and Statistical Modeling, Springer, New York.

Further Reading

Resnick, S.I. (1986). Point processes, regular variation and weak convergence, Advances in Applied Probability 18, 66–138.
Taylor, S.J. (1986). Modelling Financial Time Series, Wiley, Chichester.

Related Articles

Extreme Value Theory; GARCH Models; Mandelbrot, Benoit; Mixture of Distribution Hypothesis; Risk Measures: Statistical Estimation; Stochastic Volatility Models; Volatility.

RICHARD A. DAVIS

Filtering

The Filtering Problem

Consider a randomly evolving system, the state of which is denoted by $x_t$; this state may not be directly observable. Denote by $y_t$ the observation at time $t \in [0, T]$ ($x_t$ and $y_t$ may be vector-valued): $y_t$ is supposed to be probabilistically related to $x_t$. For instance, $y_t$ may represent a noisy measurement of $x_t$.

The process $x_t$ is generally supposed to evolve in a Markovian way according to a given (a priori) distribution $p(x_t \mid x_s)$, $s \le t$. The dynamics of $y_t$ are given in terms of the process $x_t$; a general assumption is that, given $x_t$, the process $y_t$ is independent of its past and so one may consider as given the distribution $p(y_t \mid x_t)$. The information on $x_t$ at a given $t \in [0, T]$ is thus represented by the past and present observations of $y_t$, that is, by $y_0^t := \{y_s;\ s \le t\}$ or, equivalently, by the filtration $\mathcal{F}_t^y := \sigma\{y_s;\ s \le t\}$. This information, combined with the a priori dynamics of $x$ given by $p(x_t \mid x_s)$ can, via a Bayes-type formula, be synthesized in the conditional or posterior distribution $p(x_t \mid y_0^t)$ of $x_t$, given $y_0^t$, and this distribution is called the filter distribution.

The filtering problem consists now in determining, possibly in a recursive way, the filter distribution at each $t \le T$. It can also be seen as a dynamic extension of Bayesian statistics: for $x_t \equiv x$ an unknown parameter, the dynamic model for $x$ given by $p(x_t \mid x_s)$ reduces to a prior distribution for $x$ and the filter $p(x \mid y_0^t)$ is then simply the posterior distribution of $x$, given the observations $y_s$, $s \le t$.

In many applications, it suffices to determine a synthetic value of the filter distribution $p(x_t \mid y_0^t)$. In particular, given an (integrable) function $f(\cdot)$, one may want to compute

\[ E\{f(x_t) \mid y_0^t\} = E\{f(x_t) \mid \mathcal{F}_t^y\} = \int f(x)\, dp(x \mid y_0^t) \tag{1} \]

The quantity in equation (1) may be seen as the best estimate of $f(x_t)$, given $y_0^t$, with respect to the mean square error criterion in the sense that $E\{(E\{f(x_t) \mid y_0^t\} - f(x_t))^2\} \le E\{(g(y_0^t) - f(x_t))^2\}$ for all measurable (and integrable) functions $g(y_0^t)$ of the available information. In this sense, one may also consider $E\{f(x_t) \mid \mathcal{F}_t^y\}$ as the optimal filter for $f(x_t)$. Notice that determining $E\{f(x_t) \mid \mathcal{F}_t^y\}$ is no more restrictive than determining the entire filter distribution $p(x_t \mid y_0^t)$; in fact, by taking $f(x) = e^{i\lambda x}$ for a generic $\lambda$, the $E\{f(x_t) \mid \mathcal{F}_t^y\}$ in equation (1) leads to the conditional characteristic function of $x_t$ given $y_0^t$.

Related to the filtering problem are the prediction problem, that is, that of determining $p(x_t \mid y_0^s)$ for $s < t$, and the interpolation or smoothing problem concerning $p(x_t \mid y_0^s)$ for $t < s$. Given the Bayesian nature of the filtering problem, one can also consider the so-called combined filtering and parameter estimation problem: if the dynamics $p(x_t \mid x_s)$ for $x$ include an unknown parameter $\theta$, one may consider the problem of determining the joint conditional distribution $p(x_t, \theta \mid \mathcal{F}_t^y)$.

Models for the Filtering Problem

To solve a given filtering problem, one has to specify the two basic inputs, namely, $p(x_t \mid x_s)$ and $p(y_t \mid x_t)$. A classical model in discrete time is

\[ \begin{cases} x_{t+1} = a(t, x_t) + b(t, x_t)\, w_t \\ y_t = c(t, x_t) + v_t \end{cases} \tag{2} \]

where $w_t$ and $v_t$ are (independent) sequences of independent random variables and the distribution of $x_0$ is given. Notice that in equation (2) the process $x_t$ is Markov and $y_t$ represents the indirect observations of $x_t$, affected by additive noise.

The continuous time counterpart is

\[ \begin{cases} dx_t = a(t, x_t)\, dt + b(t, x_t)\, dw_t \\ dy_t = c(t, x_t)\, dt + dv_t \end{cases} \tag{3} \]

and notice that, here, $y_t$ represents the cumulative observations up to $t$. These basic models allow for various extensions: $x_t$ may, for example, be a jump-diffusion process or a Markov process with a finite number of states, characterized by its transition intensities. Also the observations may more generally be a jump-diffusion such as

\[ dy_t = c(t, x_t)\, dt + dv_t + dN_t \tag{4} \]

where $N_t$ is a doubly stochastic Poisson process, the intensity $\lambda_t = \lambda(x_t)$ of which depends on $x_t$. Further generalizations are, of course, possible.

Analytic Solutions of the Filtering Problem

Discrete Time. By the Markov property of the process $x_t$ and the fact that, given $x_t$, the process $y_t$ is independent of its past, with the use of Bayes' formula one easily obtains the following two-step recursions

\[ \begin{cases} p(x_t \mid y_0^{t-1}) = \int p(x_t \mid x_{t-1})\, dp(x_{t-1} \mid y_0^{t-1}) \\ p(x_t \mid y_0^t) \propto p(y_t \mid x_t)\, p(x_t \mid y_0^{t-1}) \end{cases} \tag{5} \]

where $\propto$ denotes "proportional to" and the first step corresponds to the prediction step while the second one is the updating step. The recursions start with $p(x_0 \mid y_0^0) = p(x_0)$. Although equation (5) represents a fully recursive relation, its actual computation is made difficult not only by the presence of the integral in $x_{t-1}$, but also by the fact that this integral is parameterized by $x_t$ that, in general, takes infinitely many values. Depending on the model, one can however obtain explicit solutions as will be shown below. The most general of such situations arises when one can find a finitely parameterized class of distributions of $x_t$ that is closed under the operator implicit in equation (5), that is, such that, whenever $p(x_{t-1} \mid y_0^{t-1})$ belongs to this class, then $p(x_t \mid y_0^t)$ also belongs to it. A classical case is the linear conditionally Gaussian case that corresponds to a model of the form

\[ \begin{cases} x_{t+1} = A_t(y_0^t)\, x_t + B_t(y_0^t)\, w_t \\ y_t = C_t(y_0^t)\, x_t + R_t(y_0^t)\, v_t \end{cases} \tag{6} \]

where the coefficients may depend on the entire past of the observations $y_t$, and $w_t$, $v_t$ are independent i.i.d. sequences of standard Gaussian random variables. For such a model, $p(x_t \mid y_0^t)$ is Gaussian at each $t$ and therefore characterized by mean and (co)variance that can be recursively computed by the well-known Kalman–Bucy filter. Denoting

\[ \hat x_{t|t-1} := E\{x_t \mid y_0^{t-1}\}; \qquad \hat x_{t|t} := E\{x_t \mid y_0^t\} \]
\[ P_{t|t-1} := E\{(x_t - \hat x_{t|t-1})(x_t - \hat x_{t|t-1})' \mid y_0^{t-1}\} \]
\[ P_{t|t} := E\{(x_t - \hat x_{t|t})(x_t - \hat x_{t|t})' \mid y_0^t\} \tag{7} \]

the Kalman–Bucy filter is given by (dropping for simplicity the dependence on $y_0^t$),

\[ \begin{cases} \hat x_{t|t-1} = A_{t-1}\, \hat x_{t-1|t-1} \\ P_{t|t-1} = A_{t-1} P_{t-1|t-1} A_{t-1}' + B_{t-1} B_{t-1}' \end{cases} \tag{8} \]

which represents the prediction step, and

\[ \begin{cases} \hat x_{t|t} = \hat x_{t|t-1} + L_t\, [y_t - C_t \hat x_{t|t-1}] \\ P_{t|t} = P_{t|t-1} - L_t C_t P_{t|t-1} \end{cases} \tag{9} \]

which represents the updating step with $\hat x_{0|-1}$ the mean of $x_0$ and $P_{0|-1}$ its variance. Furthermore,

\[ L_t := P_{t|t-1} C_t'\, [C_t P_{t|t-1} C_t' + R_t R_t']^{-1} \tag{10} \]

Notice that, in the prediction step, the estimate of $x_t$ is propagated one step further on the basis of the given a priori dynamics of $x_t$, while in the updating step one takes into account the additional information coming from the current observation. A crucial role in the updating step given by equation (9) is played by

\[ y_t - C_t \hat x_{t|t-1} = y_t - C_t A_{t-1} \hat x_{t-1|t-1} = y_t - C_t E\{x_t \mid y_0^{t-1}\} = y_t - E\{y_t \mid y_0^{t-1}\} \tag{11} \]

which represents the new information given by $y_t$ with respect to its best estimate $E\{y_t \mid y_0^{t-1}\}$ and is therefore called innovation.

The Kalman–Bucy filter has been extremely successful and has also been applied to Gaussian models that are nonlinear by simply linearizing the nonlinear coefficient functions around the current best estimate of $x_t$. In this way, one obtains an approximate filter, called the extended Kalman filter.

Exact solutions for the discrete time filtering problem can also be obtained for the case when $x_t$ is a finite-state Markov chain with, say, $N$ states defined by its transition probability matrix. In this case, the filter is characterized by its conditional state probability vector that we denote by $\pi_t = (\pi_t^1, \ldots, \pi_t^N)$ with $\pi_t^i := P\{x_t = i \mid \mathcal{F}_t^y\}$.

Continuous Time. For the solution of a general continuous time problem, we have two main approaches, namely, the innovations approach that extends the innovation representation of the Kalman
filter where, combining equations (8) and (9), this latter representation is given by

\[ \hat x_{t|t} = A_{t-1} \hat x_{t-1|t-1} + L_t\, [y_t - C_t A_{t-1} \hat x_{t-1|t-1}] \tag{12} \]

and the so-called reference probability approach. For the sake of brevity, we discuss here only the innovations approach (Kushner–Stratonovich equation) and we do it for the case of the model in equation (3), mentioning briefly possible extensions to other cases. For the reference probability approach (Zakai equation), we refer to the literature (for instance, [8, 19]).

We denote by $L$ the generator of the Markov diffusion $x_t$ in equation (3), that is, assuming $x \in \mathbb{R}^n$, for a function $\phi(t, x) \in C^{1,2}$, we have

\[ L\phi(t, x) = a(t, x)\, \phi_x(t, x) + \frac{1}{2} \sum_{i,j=1}^{n} \sigma_{ij}(t, x)\, \phi_{x_i x_j}(t, x) \tag{13} \]

with $\sigma(t, x) := b(t, x)\, b'(t, x)$. Furthermore, for a generic (integrable) $f(\cdot)$, we let $\hat f_t := E\{f(x_t) \mid \mathcal{F}_t^y\}$. The innovations approach now leads, in the case of the model given by equation (3), to the following dynamics, also called the Kushner–Stratonovich equation (see e.g., [19, 8]):

\[ d\hat f_t = \widehat{Lf(x_t)}\, dt + \bigl[\widehat{c(t, x_t) f(x_t)} - \widehat{c(t, x_t)}\, \hat f_t\bigr]\, \bigl[dy_t - \widehat{c(t, x_t)}\, dt\bigr] \tag{14} \]

which (see equation (3)) is based on the innovations $dy_t - \widehat{c(t, x_t)}\, dt = dy_t - E\{dy_t \mid \mathcal{F}_t^y\}$. In addition to the stochastic integral, the main difficulty with equation (14) is that, to compute $\hat f$, one needs $\widehat{cf}$, which, in turn, requires $\widehat{c^2 f}$, and so on. In other words, equation (14) is not a closed system of stochastic differential equations. Again, for particular models, equation (14) leads to a closed system as it happens with the linear-Gaussian version of equation (3) that leads to the continuous time Kalman–Bucy filter, which is analogous to its discrete time counterpart. A further case arises when $x_t$ is finite-state Markov with transition intensity matrix $Q = \{q_{ij}\}$, $i, j = 1, \ldots, N$. Putting $\pi_t(i) := P\{x_t = i \mid \mathcal{F}_t^y\}$ and taking $f(\cdot)$ as the indicator function of the various values of $x_t$, equation (14) becomes (on replacing $L$ by $Q$)

\[ d\pi_t(j) = \sum_{i=1}^{N} \pi_t(i)\, q_{ij}\, dt + \pi_t(j) \Bigl[c(t, j) - \sum_{i=1}^{N} \pi_t(i)\, c(t, i)\Bigr] \Bigl[dy_t - \sum_{i=1}^{N} \pi_t(i)\, c(t, i)\, dt\Bigr] \tag{15} \]

For more results when $x_t$ is finite-state Markov, we refer to [10], and, in particular, see [11].

We just mention that one can write the dynamics of $\hat f_t$ also in the case of jump-diffusion observations as in equation (4) (see [17]) and one can, furthermore, obtain an evolution equation, a stochastic partial differential equation (PDE), for the conditional density $p(x_t) = p(x_t \mid y_0^t)$, whenever it exists, that involves the formal adjoint $L^*$ of the $L$ in equation (13) (see [19]).

Numerical Solutions of the Filtering Problem

As we have seen, an explicit analytic solution to the filtering problem can be obtained only for special models so that, remaining within analytic solutions, in general, one has to use an approximation approach. As already mentioned, one such approximation consists in linearizing the nonlinear model, both in discrete and continuous time, and this leads to the extended Kalman filter. Another approach consists in approximating the original model by one where $x_t$ is finite-state Markov. The latter approach goes back mainly to Kushner and coworkers; see, for example, [18] (for a financial application, see also [13]). A more direct numerical approach is simulation-based and given by the so-called particle approach to filtering that has been successfully introduced more recently and that is summarized next.

Simulation-based Solution (Particle Filters). Being simulation-based, this solution method as such is applicable only to discrete time models; continuous time models have to be first discretized in time. There are various variants of particle filters but, analogous to the analytical approaches, they all proceed along two steps, a prediction step and an updating step, and
at each step the relevant distribution (predictive and filter distribution, respectively) is approximated by a discrete probability measure supported by a finite number of points. These approaches vary mainly in the updating step.

A simple version of a particle filter is as follows (see [3]): in the generic period $t-1$, approximate $p(x_{t-1} \mid y_0^{t-1})$ by a discrete distribution $((x_{t-1}^1, p_{t-1}^1), \dots, (x_{t-1}^L, p_{t-1}^L))$, where $p_{t-1}^i$ is the probability that $x_{t-1} = x_{t-1}^i$. Consider each location $x_{t-1}^i$ as the position of a "particle".

1. Prediction step
Propagate each of the particles $x_{t-1}^i \to \hat x_t^i$ over one time period, using the given (discrete time) evolution dynamics of $x_t$: referring to the model in equation (2), just simulate independent trajectories of $x_t$ starting from the various $x_{t-1}^i$. This leads to an approximation of $p(x_t \mid y_0^{t-1})$ by the discrete distribution $((\hat x_t^1, \hat p_t^1), \dots, (\hat x_t^L, \hat p_t^L))$, where one puts $\hat p_t^i = p_{t-1}^i$.

2. Updating step
Update the weights using the new observation $y_t$ by putting $p_t^i = c\,p_{t-1}^i\,p(y_t \mid \hat x_t^i)$, where $c$ is the normalization constant (see the second relation in equation (5) for an analogy).

Notice that $p(y_t \mid \hat x_t^i)$ may be viewed as the likelihood of particle $\hat x_t^i$, given the observation $y_t$, so that in the updating step one weighs each particle according to its likelihood. There exist various improvements of this basic setup. There are also variants where, in the updating step, each particle is made to branch into a random number of offspring, where the mean number of offspring is taken to be proportional to the likelihood of that position. In this latter variant, the number of particles increases and one can show that, under certain assumptions, the empirical distribution of the particles converges to the true filter distribution. There is a vast literature on particle filters, of which we mention [5] and, in particular, [1].

Filtering in Finance

There are various situations in finance where filtering problems may arise, but one typical situation is given by factor models. These models have proven to be useful for capturing the complicated nonlinear dynamics of real asset prices, while at the same time being parsimonious and numerically tractable. In addition, with Markovian factor processes, Markov-process techniques can be fruitfully applied. In many financial applications of factor models, the investors have only incomplete information about the actual state of the factors and this may induce model risk. In fact, even if the factors are associated with economic quantities, some of them are difficult to observe precisely. Furthermore, abstract factors without economic interpretation are often included in the specification of a model to increase its flexibility. Under incomplete information of the factors, their values have to be inferred from observable quantities and this is where filtering comes in as an appropriate tool.

Most financial problems concern pricing as well as portfolio management, in particular, hedging and portfolio optimization. While portfolio management is performed under the physical measure, for pricing, one has to use a martingale measure. Filtering problems in finance may therefore be considered under the physical or the martingale measures, or under both (see [22]). In what follows, we shall discuss filtering for pricing problems, with examples from term structure and credit risk, as well as for portfolio management. More general aspects can be found, for example, in the recent papers [6, 7], and [23].

Filtering in Pricing Problems

This section is to a large extent based on [14]. In Markovian factor models, the price of an asset at a generic time $t$ can, under full observation of the factors, be expressed as an instantaneous function $\Pi(t, x_t)$ of time and the value of the factors. Let $\mathcal{G}_t$ denote the full filtration that measures all the processes of interest, and let $\mathcal{F}_t \subset \mathcal{G}_t$ be a subfiltration representing the information of an investor. What is an arbitrage-free price in the filtration $\mathcal{F}_t$? Assume the asset to be priced is a European derivative with maturity $T$ and claim $H \in \mathcal{F}_T$. Let $N$ be a numeraire, adapted to the investor filtration $\mathcal{F}_t$, and let $Q^N$ be the corresponding martingale measure. One can easily prove the following:

Lemma 1 Let $\Pi(t, x_t) = N_t\,E^{Q^N}\{H/N_T \mid \mathcal{G}_t\}$ be the arbitrage-free price of the claim $H$ under the full
information $\mathcal{G}_t$ and $\hat\Pi(t) = N_t\,E^{Q^N}\{H/N_T \mid \mathcal{F}_t\}$ the corresponding arbitrage-free price in the investor filtration. It then follows that

$$ \hat\Pi(t) = E^{Q^N}\{\Pi(t, x_t) \mid \mathcal{F}_t\} \tag{16} $$

Furthermore, if the savings account $B_t = \exp\{\int_0^t r_s\,ds\}$ with corresponding martingale measure $Q$ is $\mathcal{F}_t$-adapted, then

$$ \hat\Pi(t) = E^{Q}\{\Pi(t, x_t) \mid \mathcal{F}_t\} \tag{17} $$

We thus see that, to compute the right-hand sides in equation (16) or equation (17), namely, the price of a derivative under restricted information given its price under full information, one has to solve the filtering problem for $x_t$ given $\mathcal{F}_t$ under a martingale measure. We present now two examples.

Example 1 (Term structure of interest rates). The example is a simplified version adapted from [15]. Consider a factor model for the term structure where the unobserved (multivariate) factor process $x_t$ satisfies the linear-Gaussian model

$$ dx_t = F x_t\,dt + D\,dw_t \tag{18} $$

In this case, the term structure is exponentially affine in $x_t$ and one has

$$ p(t, T; x_t) = \exp[A(t,T) - B(t,T)\,x_t] \tag{19} $$

with $A(t,T)$, $B(t,T)$ satisfying well-known first-order ordinary differential equations to exclude arbitrage. Passing to log-prices for the bonds, one gets the linear relationship $y_t^T := \log p(t,T;x_t) = A(t,T) - B(t,T)\,x_t$. Assume now that investors cannot observe $x_t$, but they can observe the short rate and the log-prices of a finite number $n$ of zero-coupon bonds, perturbed by additive noise. This leads to a system of the form

$$ \begin{cases} dx_t = F x_t\,dt + D\,dw_t \\ dr_t = (\alpha_t^0 + \beta_t^0 x_t)\,dt + \sigma_t^0\,dw_t + dv_t^0 \\ dy_t^i = (\alpha_t^i + \beta_t^i x_t)\,dt + \sigma_t^i\,dw_t + (T_i - t)\,dv_t^i, \quad i = 1,\dots,n \end{cases} \tag{20} $$

where $v_t^i$, $i = 0,\dots,n$ are independent Wiener processes and the coefficients are related to those in equations (18) and (19). The time-dependent volatility in the perturbations of the log-prices reflects the fact that it tends to zero as time approaches maturity.

From the filtering point of view, the system (20) is a linear-Gaussian model with $x_t$ unobserved and the observations given by $(r_t, y_t^i)$. We shall thus put $\mathcal{F}_t = \sigma\{r_s, y_s^i;\ s \le t,\ i = 1,\dots,n\}$. The filter distribution is Gaussian and, via the Kalman filter, one can obtain its conditional mean $m_t$ and (co)variance $\Sigma_t$. Applying Lemma 1 and using the moment-generating function of a Gaussian random variable, we obtain the arbitrage-free price, in the investor filtration, of an illiquid bond with maturity $T$ as follows:

$$ \begin{aligned} \hat p(t,T) &= E\{p(t,T;x_t) \mid \mathcal{F}_t\} \\ &= \exp[A(t,T)]\,E\{\exp[-B(t,T)\,x_t] \mid \mathcal{F}_t\} \\ &= \exp\bigl[A(t,T) - B(t,T)\,m_t + \tfrac{1}{2}\,B(t,T)\,\Sigma_t\,B'(t,T)\bigr] \end{aligned} \tag{21} $$

For the given setup, the expectation is under the martingale measure $Q$ with the money market account $B_t$ as numeraire. To apply Lemma 1, we need the numeraire to be observable and this contrasts with the assumption that $r_t$ is observable only in noise. This difficulty can be overcome (see [14]), but by suitably changing the drifts in equation (20) (corresponding to a translation of $w_t$), one may also consider the model in equation (20) under a martingale measure for which the numeraire is different from $B_t$ and observable.

A further filter application to the term structure of interest rates can be found in [2].

Example 2 (Credit risk). One of the main issues in credit risk is the modeling of the dynamic evolution of the default state of a given portfolio. To formalize the problem, given a portfolio of $m$ obligors, let $y_t := (y_{t,1}, \dots, y_{t,m})$ be the default indicator process where $y_{t,i} := 1_{\{\tau_i \le t\}}$ with $\tau_i$ the random default time of obligor $i$, $i = 1,\dots,m$. In line with the factor modeling philosophy, it is natural to assume that default intensities depend on an unobservable latent process $x_t$. In particular, if $\lambda_i(t)$ is the default intensity of obligor $i$, $i = 1,\dots,m$, assume $\lambda_i(t) = \lambda_i(x_t)$. Note that this generates information-driven contagion: it is, in fact, well known that the intensities with respect to $\mathcal{F}_t$ are given by $\hat\lambda_i(t) = E\{\lambda_i(x_t) \mid \mathcal{F}_t\}$. Hence the news that an obligor has defaulted leads, via filtering, to an update of the distribution
of $x_t$ and thus to a jump in the default intensities of the still surviving obligors. In this context, we shall consider the pricing of illiquid credit derivatives on the basis of the investor filtration supposed to be given by the default history and noisily observed prices of liquid credit derivatives.

We assume that, conditionally on $x_t$, the defaults are independent with intensities $\lambda_i(x_t)$ and that $(x_t, y_t)$ is jointly Markov. A credit derivative has the payoff linked to default events in a given reference portfolio and so one can think of it as a random variable $H \in \mathcal{F}_T^y$ with $T$ being the maturity. Its full information price at the generic $t \le T$, that is, in the filtration $\mathcal{G}_t$ that measures also $x_t$, is given by $\tilde H_t = E\{e^{-r(T-t)} H \mid \mathcal{G}_t\}$ where $r$ is the short rate and the expectation is under a given martingale measure $Q$. By the Markov property of $(x_t, y_t)$, one gets a representation of the form

$$ \tilde H_t = E\{e^{-r(T-t)} H \mid \mathcal{G}_t\} := a(t, x_t, y_t) \tag{22} $$

for a suitable $a(\cdot)$. In addition to the default history, we assume that the investor filtration also includes noisy observations of liquid credit derivatives. In view of equation (22), it is reasonable to model such observations as

$$ dz_t = \gamma(t, x_t, y_t)\,dt + d\beta_t \tag{23} $$

where the various quantities may also be column vectors, $\beta_t$ is an independent Wiener process and $\gamma(\cdot)$ is a function of the type of $a(\cdot)$ in equation (22). The investor filtration is then $\mathcal{F}_t = \mathcal{F}_t^y \vee \mathcal{F}_t^z$. The price at $t < T$ of the credit derivative in the investor filtration is now $H_t = E\{e^{-r(T-t)} H \mid \mathcal{F}_t\}$ and by Lemma 1 we have

$$ H_t = E\{e^{-r(T-t)} H \mid \mathcal{F}_t\} = E\{a(t, x_t, y_t) \mid \mathcal{F}_t\} \tag{24} $$

Again, if one knows the price $a(t, x_t, y_t)$ in $\mathcal{G}_t$, one can thus obtain the price in $\mathcal{F}_t$ by computing the right-hand side in equation (24) and for this we need the filter distribution of $x_t$ given $\mathcal{F}_t$.

To define the corresponding filtering problem, we need a more precise model for $(x_t, y_t)$ (the process $z_t$ is already given by equation (23)). Since $y_t$ is a jump process, the model cannot be one of those for which we had described an explicit analytic solution. Without entering into details, we refer to [13] (see also [14]), where a jump-diffusion model is considered that allows for common jumps between $x_t$ and $y_t$. In [13] it is shown that an arbitrarily good approximation to the filter solution can be obtained both analytically and by particle filtering.

We conclude this section with a couple of additional remarks:

1. Traditional credit risk models are either structural models or reduced-form (intensity-based) models. Example 2 belongs to the latter class. In structural models, the default of the generic obligor/firm $i$ is defined as the first passage time of the asset value $V_i(t)$ of the firm at a given (possibly stochastic) barrier $K_i(t)$, that is,

$$ \tau_i = \inf\{t \ge 0 \mid V_i(t) \le K_i(t)\} \tag{25} $$

In such a context, filtering problems may arise when either $V_i(t)$ or $K_i(t)$ or both are not exactly known/observable (see e.g., [9]).

2. Can a structural model also be seen as a reduced-form model? At first sight, this is not clear since $\tau_i$ in equation (25) is predictable, while in intensity-based models it is totally inaccessible. However, it turns out (see e.g., [16]) that, while $\tau_i$ in equation (25) is predictable with respect to the full filtration (measuring also $V_i(t)$ and $K_i(t)$), it becomes totally inaccessible in the smaller investor filtration that, say, does not measure $V_i(t)$ and, furthermore, it admits an intensity.

Filtering in Portfolio Management Problems

Rather than presenting a general treatment (for this, we refer to [21] and the references therein), we discuss here two specific examples in models with unobserved factors, one in discrete time and one in continuous time. Contrary to the previous section on pricing, here we shall work under the physical measure $P$.

A Discrete Time Case. To motivate the model, start from the classical continuous time asset price model $dS_t = S_t[a\,dt + x_t\,dw_t]$ where $w_t$ is Wiener and $x_t$ is the nondirectly observable volatility process (factor). For $y_t := \log S_t$, one then has

$$ dy_t = \Bigl(a - \frac{1}{2}x_t^2\Bigr)\,dt + x_t\,dw_t \tag{26} $$

Passing to discrete time with step $\delta$, let for $t = 0,\dots,T$ the process $x_t$ be a Markov chain with $m$
states $x^1, \dots, x^m$ (may result from a time discretization of a continuous time $x_t$) and

$$ y_t = y_{t-1} + \Bigl(a - \frac{1}{2}x_{t-1}^2\Bigr)\delta + x_{t-1}\sqrt{\delta}\,\varepsilon_t \tag{27} $$

with $\varepsilon_t$ i.i.d. standard Gaussian as it results from equation (26) by applying the Euler–Maruyama scheme. Notice that $(x_t, y_t)$ is Markov. Having for simplicity only one stock to invest in, denote by $\phi_t$ the number of shares of stock held in the portfolio in period $t$ with the rest invested in a riskless bond $B_t$ (for simplicity assume $r = 0$). The corresponding self-financed wealth process then evolves according to

$$ V_{t+1}^{\phi} = V_t^{\phi} + \phi_t\bigl(e^{y_{t+1}} - e^{y_t}\bigr) := F\bigl(V_t^{\phi}, \phi_t, y_t, y_{t+1}\bigr) \tag{28} $$

and $\phi_t$ is supposed to be adapted to $\mathcal{F}_t^y$; denote by $\mathcal{A}$ the class of such strategies. Given a horizon $T$, consider the following investment criterion

$$ J_{opt}(V_0) = \sup_{\phi \in \mathcal{A}} J(V_0, \phi) = \sup_{\phi \in \mathcal{A}} E\Bigl\{\sum_{t=0}^{T-1} r_t(x_t, y_t, V_t^{\phi}, \phi_t) + f(x_T, y_T, V_T^{\phi})\Bigr\} \tag{29} $$

which, besides portfolio optimization, includes also hedging problems. The problem in equations (27), (28), and (29) is now a stochastic control problem under partial/incomplete information given that $x_t$ is an unobservable factor process.

A standard approach to dynamic optimization problems under partial information is to transform them into corresponding complete information ones whereby $x_t$ is replaced by its filter distribution given $\mathcal{F}_t^y$. Letting $\pi_t^i := P\{x_t = x^i \mid \mathcal{F}_t^y\}$, $i = 1,\dots,m$, we first adapt the filter dynamics in equation (5) to our situation to derive a recursive relation for $\pi_t = (\pi_t^1, \dots, \pi_t^m)$. Being $x_t$ finite-state Markov, $p(x_{t+1} \mid x_t)$ is given by the transition probability matrix and the integral in equation (5) reduces to a sum. On the other hand, $p(y_t \mid x_t)$ in equation (5) corresponds to the model in equation (2) that does not include our model in equation (27) for $y_t$. One can however easily see that equation (27) leads to a distribution of the form $p(y_t \mid x_{t-1}, y_{t-1})$, and equation (5) can be adapted to become here

$$ \begin{cases} \pi_0 = \mu \quad \text{(initial distribution for } x_t\text{)} \\ \pi_t^i \propto \sum_{j=1}^{m} p(y_t \mid x_{t-1} = j, y_{t-1})\,p(x_t = i \mid x_{t-1} = j)\,\pi_{t-1}^j \end{cases} \tag{30} $$

In addition, we may consider the law of $y_t$ conditional on $(\pi_{t-1}, y_{t-1}) = (\pi, y)$ that is given by

$$ Q_t(\pi, y, dy') = \sum_{i,j=1}^{m} p(y' \mid x_{t-1} = j, y)\,p(x_t = i \mid x_{t-1} = j)\,\pi^j \tag{31} $$

From equations (30) and (31), it follows easily that $(\pi_t, y_t)$ is a sufficient statistic and an $\mathcal{F}_t^y$-Markov process.

To transform the original partial information problem with criterion (29) into a corresponding complete observation problem, put $\hat r_t(\pi, y, v, \phi) = \sum_{i=1}^{m} r_t(x^i, y, v, \phi)\,\pi^i$ and $\hat f(\pi, y, v) = \sum_{i=1}^{m} f(x^i, y, v)\,\pi^i$ so that, by double conditioning, one obtains

$$ \begin{aligned} J(V_0, \phi) &= E\Bigl\{\sum_{t=0}^{T-1} E\bigl\{r_t(x_t, y_t, V_t^{\phi}, \phi_t) \mid \mathcal{F}_t^y\bigr\} + E\bigl\{f(x_T, y_T, V_T^{\phi}) \mid \mathcal{F}_T^y\bigr\}\Bigr\} \\ &= E\Bigl\{\sum_{t=0}^{T-1} \hat r_t(\pi_t, y_t, V_t^{\phi}, \phi_t) + \hat f(\pi_T, y_T, V_T^{\phi})\Bigr\} \end{aligned} \tag{32} $$

Owing to the Markov property of $(\pi_t, y_t)$, one can write the following (backward) dynamic programming recursions:

$$ \begin{cases} u_T(\pi, y, v) = \hat f(\pi, y, v) \\ u_t(\pi, y, v) = \sup_{\phi \in \mathcal{A}} \bigl[\hat r_t(\pi, y, v, \phi) + E\{u_{t+1}(\pi_{t+1}, y_{t+1}, F(v, \phi, y, y_{t+1})) \mid (\pi_t, y_t) = (\pi, y)\}\bigr] \end{cases} \tag{33} $$

where the function $F(\cdot)$ was defined in equation (28), and $\phi$ here refers to the generic choice of $\phi = \phi_t$ in period $t$. It leads to the optimal investment strategy $\phi^*$ and the optimal value $J_{opt}(V_0) = u_0(\mu, y_0, V_0)$. It can, in fact, be shown that the strategy and value thus
obtained are optimal also for the original incomplete information problem when $\phi$ there is required to be $\mathcal{F}_t^y$-adapted.

To actually compute the recursions in equation (33), one needs the conditional law of $(\pi_{t+1}, y_{t+1})$ given $(\pi_t, y_t)$, which can be deduced from equations (30) and (31). In this context, notice that, even if $x$ is $m$-valued, $\pi_t$ takes values in the $m$-dimensional simplex that is $\infty$-valued. To actually perform the calculation, one needs an approximation leading to a finite-valued process $(\pi_t, y_t)$ and to this effect various approaches have appeared in the literature (for an approach with numerical results see [4]).

A Continuous Time Case. Consider the following market model where $x_t$ is an unobserved factor process and $S_t$ is the price of a single risky asset:

$$ \begin{cases} dx_t = F_t(x_t)\,dt + R_t(x_t)\,dM_t \\ dS_t = S_t\,[a_t(S_t, x_t)\,dt + \sigma_t(S_t)\,dw_t] \end{cases} \tag{34} $$

with $w_t$ a Wiener process and $M_t$ a not necessarily continuous martingale, independent of $w_t$. Since, in continuous time, $\int_0^t \sigma_s^2\,ds$ can be estimated by the empirical quadratic variation of $S_t$, in order not to have degeneracy in the filter to be derived below for $x_t$, we do not let $\sigma(\cdot)$ depend also on $x_t$. For the riskless asset, we assume for simplicity that its price is $B_t \equiv \text{const}$ (short rate $r = 0$). In what follows, it is convenient to consider log-prices $y_t = \log S_t$, for which

$$ dy_t = \Bigl[a_t(S_t, x_t) - \frac{1}{2}\sigma_t^2(S_t)\Bigr]dt + \sigma_t(S_t)\,dw_t := A_t(y_t, x_t)\,dt + B_t(y_t)\,dw_t \tag{35} $$

Investing in this market in a self-financing way and denoting by $\rho_t$ the fraction of wealth invested in the risky asset, we have from $\frac{dV_t}{V_t} = \rho_t \frac{dS_t}{S_t} = \rho_t \frac{d e^{y_t}}{e^{y_t}}$ that

$$ dV_t = V_t\Bigl[\rho_t\Bigl(A_t(y_t, x_t) + \frac{1}{2}B_t^2(y_t)\Bigr)dt + \rho_t B_t(y_t)\,dw_t\Bigr] \tag{36} $$

We want to consider the problem of maximization of expected utility from terminal wealth, without consumption, and with a power utility function. Combining equations (34), (35), and (36) we obtain the following portfolio optimization problem under incomplete information where the factor process $x_t$ is not observed and where we shall require that $\rho_t$ is $\mathcal{F}_t^y$-adapted:

$$ \begin{cases} dx_t = F_t(x_t)\,dt + R_t(x_t)\,dM_t \quad \text{(unobserved)} \\ dy_t = A_t(y_t, x_t)\,dt + B_t(y_t)\,dw_t \quad \text{(observed)} \\ dV_t = V_t\bigl[\rho_t\bigl(A_t(y_t, x_t) + \frac{1}{2}B_t^2(y_t)\bigr)dt + \rho_t B_t(y_t)\,dw_t\bigr] \\ \sup_{\rho} E\{(V_T)^{\mu}\}, \quad \mu \in (0,1) \end{cases} \tag{37} $$

As in the previous discrete time case, we shall now transform this problem into a corresponding one under complete information, thereby replacing the unobserved state variable $x_t$ by its filter distribution, given $\mathcal{F}_t^y$, that is, $\pi_t(x) := p(x_t \mid \mathcal{F}_t^y)|_{x_t = x}$. Even if $x_t$ is finite-dimensional, $\pi_t(\cdot)$ is $\infty$-dimensional. We have seen above cases where the filter distribution is finitely parameterized, namely, the linear-Gaussian case (Kalman filter) and when $x_t$ is finite-state Markov. The parameters characterizing the filter were seen to evolve over time driven by the innovations process (see equations (8), (10) and (14)). In what follows, we then assume that the filter is parameterized by a vector process $\xi_t \in \mathbb{R}^p$, that is, $\pi_t(x) := p(x_t \mid \mathcal{F}_t^y)|_{x_t = x} = \pi(x; \xi_t)$ and that $\xi_t$ satisfies

$$ d\xi_t = \beta_t(y_t, \xi_t)\,dt + \eta_t(y_t, \xi_t)\,d\bar w_t \tag{38} $$

where $\bar w_t$ is Wiener and given by the innovations process. We now specify this innovations process $\bar w_t$ for our general model in equation (37). To this effect, putting $\bar A_t(y_t, \xi_t) := \int A_t(y_t, x)\,d\pi_t(x; \xi_t)$, let

$$ d\bar w_t := B_t^{-1}(y_t)\,[dy_t - \bar A_t(y_t, \xi_t)\,dt] \tag{39} $$

and notice that, replacing $dy_t$ from equation (35), this definition implies a translation of the original $(P, \mathcal{F}_t)$-Wiener $w_t$, that is,

$$ d\bar w_t = dw_t + B_t^{-1}(y_t)\,\bigl[A_t(y_t, x_t) - \bar A_t(y_t, \xi_t)\bigr]\,dt \tag{40} $$
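The step from equation (39) to equation (40) is a direct substitution of the log-price dynamics of equation (35). As a short worked check (writing $\bar A_t$ for the filtered drift introduced just before equation (39); the bar is assumed notation to tell it apart from $A_t(y_t, x_t)$):

```latex
\begin{aligned}
d\bar w_t
  &= B_t^{-1}(y_t)\,\bigl[dy_t - \bar A_t(y_t,\xi_t)\,dt\bigr] \\
  &= B_t^{-1}(y_t)\,\bigl[A_t(y_t,x_t)\,dt + B_t(y_t)\,dw_t - \bar A_t(y_t,\xi_t)\,dt\bigr] \\
  &= dw_t + B_t^{-1}(y_t)\,\bigl[A_t(y_t,x_t) - \bar A_t(y_t,\xi_t)\bigr]\,dt .
\end{aligned}
```

The drift correction $B_t^{-1}(y_t)[A_t(y_t,x_t) - \bar A_t(y_t,\xi_t)]$ is the quantity that a Girsanov-type change of measure has to remove for $\bar w_t$ to be a Wiener process.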
This translation in turn implies the change of measure $P \to \bar P$ with

$$ \frac{d\bar P}{dP}\Big|_{\mathcal{F}_T} = \exp\Bigl\{\int_0^T \bigl[\bar A_t(y_t, \xi_t) - A_t(y_t, x_t)\bigr]\,B_t^{-1}(y_t)\,dw_t - \frac{1}{2}\int_0^T \bigl[\bar A_t(y_t, \xi_t) - A_t(y_t, x_t)\bigr]^2 B_t^{-2}(y_t)\,dt\Bigr\} \tag{41} $$

We obtain thus as the complete information problem corresponding to equation (37), the following, which is defined on the space $(\Omega, \mathcal{F}, \mathcal{F}_t, \bar P)$ with Wiener $\bar w_t$:

$$ \begin{cases} d\xi_t = \beta_t(y_t, \xi_t)\,dt + \eta_t(y_t, \xi_t)\,d\bar w_t \\ dy_t = \bar A_t(y_t, \xi_t)\,dt + B_t(y_t)\,d\bar w_t \\ dV_t = V_t\bigl[\rho_t\bigl(\bar A_t(y_t, \xi_t) + \frac{1}{2}B_t^2(y_t)\bigr)dt + \rho_t B_t(y_t)\,d\bar w_t\bigr] \\ \sup_{\rho} \bar E\{(V_T)^{\mu}\}, \quad \mu \in (0,1) \end{cases} \tag{42} $$

One can now use methods for complete information problems to solve equation (42), and it can also be shown that the solution to equation (42) gives a solution of the original problem for which $\rho_t$ was assumed $\mathcal{F}_t^y$-adapted.

We remark that other reformulations of the incomplete information problem as a complete information one are also possible (see e.g., [20]).

A final comment concerns hedging under incomplete information (incomplete market). When using the quadratic hedging criterion, that is, $\min_{\rho} E_{S_0,V_0}\{(H_T - V_T^{\rho})^2\}$, its quadratic nature implies that if $\phi_t^*(x_t, y_t)$ is the optimal strategy (number of units invested in the risky asset) under complete information also of $x_t$, then, under the partial information $\mathcal{F}_t^y$, the optimal strategy is simply the projection $E\{\phi_t^*(x_t, y_t) \mid \mathcal{F}_t^y\}$ that can be computed on the basis of the filter of $x_t$ given $\mathcal{F}_t^y$ (see [12]).

References

[1] Bain, A. & Crisan, D. (2009). Fundamentals of stochastic filtering, in Series: Stochastic Modelling and Applied Probability, Vol. 60, Springer Science+Business Media, New York.
[2] Bhar, R., Chiarella, C., Hung, H. & Runggaldier, W. (2005). The volatility of the instantaneous spot interest rate implied by arbitrage pricing: a dynamic Bayesian approach, Automatica 42, 1381–1393.
[3] Budhiraja, A., Chen, L. & Lee, C. (2007). A survey of numerical methods for nonlinear filtering problems, Physica D 230, 27–36.
[4] Corsi, M., Pham, H. & Runggaldier, W.J. (2008). Numerical approximation by quantization of control problems in finance under partial observations, to appear in Mathematical Modeling and Numerical Methods in Finance. Handbook of Numerical Analysis, A. Bensoussan & Q. Zhang, eds, Elsevier, Vol. 15.
[5] Crisan, D., Del Moral, P. & Lyons, T. (1999). Interacting particle systems approximations of the Kushner–Stratonovich equation, Advances in Applied Probability 31, 819–838.
[6] Cvitanic, J., Liptser, R. & Rozovski, B. (2006). A filtering approach to tracking volatility from prices observed at random times, The Annals of Applied Probability 16, 1633–1652.
[7] Cvitanic, J., Rozovski, B. & Zaliapin, I. (2006). Numerical estimation of volatility values from discretely observed diffusion data, Journal of Computational Finance 9, 1–36.
[8] Davis, M.H.A. & Marcus, S.I. (1981). An Introduction to nonlinear filtering, in Stochastic Systems: The Mathematics of Filtering and Identification and Applications, M. Hazewinkel & J.C. Willems, eds, D. Reidel, Dordrecht, pp. 53–75.
[9] Duffie, D. & Lando, D. (2001). Term structure of credit risk with incomplete accounting observations, Econometrica 69, 633–664.
[10] Elliott, R.J. (1993). New finite-dimensional filters and smoothers for noisily observed Markov chains, IEEE Transactions on Information Theory, IT-39, 265–271.
[11] Elliott, R.J., Aggoun, L. & Moore, J.B. (1994). Hidden Markov models: estimation and control, in Applications of Mathematics, Springer-Verlag, Berlin-Heidelberg-New York, Vol. 29.
[12] Frey, R. & Runggaldier, W. (1999). Risk-minimizing hedging strategies under restricted information: the case of stochastic volatility models observed only at discrete random times, Mathematical Methods of Operations Research 50(3), 339–350.
[13] Frey, R. & Runggaldier, W. (2008). Credit risk and incomplete information: a nonlinear filtering approach, preprint, Universität Leipzig. Available from www.math.uni-leipzig.de/%7Efrey/publications-frey.html.
[14] Frey, R. & Runggaldier, W.R. Nonlinear filtering in models for interest-rate and credit risk, to appear in Handbook of Nonlinear Filtering, D. Crisan & B. Rozovski, eds, Oxford University Press (to be published in 2009).
[15] Gombani, A., Jaschke, S. & Runggaldier, W. (2005). A filtered no arbitrage model for term structures with noisy data, Stochastic Processes and Applications 115, 381–400.
[16] Jarrow, R. & Protter, P. (2004). Structural versus reduced-form models: a new information based perspective, Journal of Investment Management 2, 1–10.
[17] Kliemann, W., Koch, G. & Marchetti, F. (1990). On the unnormalized solution of the filtering problem with counting process observations, IEEE IT-36, 1415–1425.
[18] Kushner, H.J. & Dupuis, P. (1992). Numerical methods for stochastic control Problems in continuous time, in Applications of Mathematics, Springer, New York, Vol. 24.
[19] Liptser, R.S. & Shiryaev, A.N. (2001). Statistics of random processes, Series: Applications of Mathematics; Stochastic Modelling and Applied Probability, Springer-Verlag, Berlin, Vols. I, II.
[20] Nagai, H. & Runggaldier, W.J. (2008). PDE approach to utility maximization for market models with hidden Markov factors, in Seminar on Stochastic Analysis, Random Fields and Applications V, R.C. Dalang, M. Dozzi, & F. Russo, eds, Progress in Probability, Birkhäuser Verlag, Vol. 59, pp. 493–506.
[21] Pham, H. Portfolio optimization under partial observation: theoretical and numerical aspects, to appear in Handbook of Nonlinear Filtering, D. Crisan & B. Rozovski, eds, Oxford University Press (to be published in 2009).
[22] Runggaldier, W.J. (2004). Estimation via stochastic filtering in financial market models, in Mathematics of Finance. Contemporary Mathematics, G. Yin & Q. Zhang, eds, AMS, Vol. 351, pp. 309–318.
[23] Zeng, Y. (2003). A partially observed model for micromovement of asset prices with Bayes estimation via filtering, Mathematical Finance 13, 411–444.

WOLFGANG RUNGGALDIER
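As a supplement to the section "Simulation-based Solution (Particle Filters)" of this article, the two-step scheme described there (propagate each particle, then reweight it by the likelihood of the new observation) can be sketched in a few lines. This is only an illustration under stated assumptions: the scalar linear-Gaussian model below, with hypothetical parameters `a`, `q`, `r`, stands in for the article's equation (2), which is not reproduced here, and it is chosen so that the result can be compared with the exact Kalman filter. The function names are invented for this sketch.

```python
import numpy as np

def particle_filter(y, a, q, r, n_particles=5000, seed=0):
    """Basic particle filter for the assumed toy model
    x_t = a*x_{t-1} + q*eps_t,  y_t = x_t + r*nu_t,  x_0 ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n_particles)          # particle positions
    p = np.full(n_particles, 1.0 / n_particles)    # particle weights
    means = []
    for y_t in y:
        # Prediction step: propagate every particle, weights unchanged.
        x = a * x + q * rng.normal(size=n_particles)
        # Updating step: p_t^i = c * p_{t-1}^i * p(y_t | x_hat_t^i).
        p = p * np.exp(-0.5 * ((y_t - x) / r) ** 2)
        p = p / p.sum()                            # normalization constant c
        means.append(np.sum(p * x))
    return np.array(means), x, p

def kalman_filter(y, a, q, r):
    """Exact filter mean for the same toy model, used as a reference."""
    m, P = 0.0, 1.0
    means = []
    for y_t in y:
        m, P = a * m, a * a * P + q * q            # prediction
        K = P / (P + r * r)                        # Kalman gain
        m, P = m + K * (y_t - m), (1.0 - K) * P    # update
        means.append(m)
    return np.array(means)

# Simulate a short observation record from the assumed model.
a, q, r, T = 0.9, 0.5, 1.0, 10
data_rng = np.random.default_rng(1)
x_prev = data_rng.normal()
y = np.empty(T)
for t in range(T):
    x_prev = a * x_prev + q * data_rng.normal()
    y[t] = x_prev + r * data_rng.normal()

pf_means, particles, weights = particle_filter(y, a, q, r)
kf_means = kalman_filter(y, a, q, r)
max_gap = np.max(np.abs(pf_means - kf_means))
```

Because this version never resamples or branches, the weights tend to concentrate on a few particles as the number of periods grows; the variants with offspring mentioned in the text are one way of countering this, and they are not implemented here.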
Filtrations

The notion of filtration, introduced by Doob, has become a fundamental feature of the theory of stochastic processes. Most basic objects, such as martingales, semimartingales, stopping times, or Markov processes, involve the notion of filtration.

Definition 1 Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. A filtration $\mathbb{F}$, on $(\Omega, \mathcal{F}, \mathbb{P})$, is an increasing family $(\mathcal{F}_t)_{t \ge 0}$ of sub-$\sigma$-algebras of $\mathcal{F}$. In other words, for each $t$, $\mathcal{F}_t$ is a $\sigma$-algebra included in $\mathcal{F}$ and if $s \le t$, $\mathcal{F}_s \subset \mathcal{F}_t$. A probability space $(\Omega, \mathcal{F}, \mathbb{P})$ endowed with a filtration $\mathbb{F}$ is called a filtered probability space.

We now give a definition that is very closely related to that of a filtration.

Definition 2 A stochastic process $(X_t)_{t \ge 0}$ on $(\Omega, \mathcal{F}, \mathbb{P})$ is adapted to the filtration $(\mathcal{F}_t)$ if, for each $t \ge 0$, $X_t$ is $\mathcal{F}_t$-measurable.

A stochastic process $X$ is always adapted to its natural filtration $\mathbb{F}^X$, where for each $t \ge 0$, $\mathcal{F}_t^X = \sigma(X_s, s \le t)$ (the last notation means that $\mathcal{F}_t^X$ is the smallest $\sigma$-algebra with respect to which all the variables $(X_s, s \le t)$ are measurable). $\mathbb{F}^X$ is, hence, the smallest filtration to which $X$ is adapted.

The parameter $t$ is often thought of as time, and the $\sigma$-algebra $\mathcal{F}_t$ represents the set of information available at time $t$, that is, events that have occurred up to time $t$. Thus, the filtration $\mathbb{F}$ represents the evolution of the information or knowledge of the world with time. If $X$ is an adapted process, then $X_t$, its value at time $t$, depends only on the evolution of the universe prior to $t$.

Definition 3 Let $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ be a filtered probability space.

1. The filtration $\mathbb{F}$ is said to be complete if $(\Omega, \mathcal{F}, \mathbb{P})$ is complete and if $\mathcal{F}_0$ contains all the $\mathbb{P}$-null sets.
2. The filtration $\mathbb{F}$ is said to satisfy the usual hypotheses if it is complete and right continuous, that is, for all $t \ge 0$, $\mathcal{F}_t = \mathcal{F}_{t+}$, where

$$ \mathcal{F}_{t+} = \bigcap_{u > t} \mathcal{F}_u \tag{1} $$

Some fundamental theorems, such as the Début theorem, require the usual hypotheses. Hence, very often in the literature on the theory of stochastic processes and mathematical finance, the underlying filtered probability spaces are assumed to satisfy the usual hypotheses. This assumption is not very restrictive for the following reasons:

1. Any filtration can easily be made complete and right continuous; indeed, given a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$, we first complete the probability space $(\Omega, \mathcal{F}, \mathbb{P})$, and then we add all the $\mathbb{P}$-null sets to every $\mathcal{F}_{t+}$, $t \ge 0$. The new filtration thus obtained satisfies the usual hypotheses and is called the usual augmentation of $\mathbb{F}$;
2. Moreover, in most classical and encountered cases, the filtration $\mathbb{F}$ is right continuous. Indeed, this is the case when, for instance, $\mathbb{F}$ is the natural filtration of a Brownian motion, a Lévy process, a Feller process, or a Hunt process [8, 9].

Enlargements of Filtrations

For more precise and detailed references, the reader can consult the books [4–6, 8] or the survey article [7].

Generalities

Let $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ be a filtered probability space satisfying the usual hypotheses. Let $\mathbb{G}$ be another filtration satisfying the usual hypotheses and such that $\mathcal{F}_t \subset \mathcal{G}_t$ for every $t \ge 0$. One natural question is, how are the $\mathbb{F}$-semimartingales modified when considered as stochastic processes in the larger filtration $\mathbb{G}$? Given the importance of semimartingales and martingales (in particular, in mathematical finance where they are used to model prices), it seems natural to characterize situations where the semimartingale or martingale properties are preserved.

Definition 4 We shall say that the pair of filtrations $(\mathbb{F}, \mathbb{G})$ satisfies the $(H')$ hypothesis if every $\mathbb{F}$-semimartingale is a $\mathbb{G}$-semimartingale.

Remark 1 In fact, using a classical decomposition of semimartingales due to Jacod and Mémin, it is enough to check that every bounded $\mathbb{F}$-martingale is a $\mathbb{G}$-semimartingale.
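To make Definitions 1 and 2 above concrete, the following sketch builds the natural filtration of a three-step random walk on a finite sample space, where each σ-algebra can be represented by the partition of outcomes into atoms. The finite, discrete-time setting and all function names are assumptions made for illustration; the definitions themselves are stated for continuous time and general probability spaces.

```python
from itertools import product

# Finite sample space: all outcomes of three coin tosses (+1 / -1).
omega = list(product([-1, 1], repeat=3))

def X(t):
    """Random walk X_t(w) = w_1 + ... + w_t, returned as a function on omega."""
    return lambda w: sum(w[:t])

def atoms(observe):
    """Partition omega into the atoms generated by the observation map."""
    groups = {}
    for w in omega:
        groups.setdefault(observe(w), []).append(w)
    return list(groups.values())

def natural_filtration(t):
    """Atoms of sigma(X_s, s <= t): outcomes sharing the same path up to t."""
    return atoms(lambda w: tuple(X(s)(w) for s in range(t + 1)))

def is_measurable(Y, partition):
    """On a finite space, Y is measurable iff it is constant on every atom."""
    return all(len({Y(w) for w in atom}) == 1 for atom in partition)

def is_finer(fine, coarse):
    """True if every atom of `fine` lies inside a single atom of `coarse`."""
    return all(any(set(a) <= set(c) for c in coarse) for a in fine)

F = [natural_filtration(t) for t in range(4)]

atom_counts = [len(p) for p in F]                                # information grows
increasing = all(is_finer(F[t + 1], F[t]) for t in range(3))     # F_s is contained in F_t
adapted = all(is_measurable(X(t), F[t]) for t in range(4))       # X is adapted
x2_known_at_time_1 = is_measurable(X(2), F[1])                   # X_2 is not F_1-measurable
```

The partitions refine as `t` grows, which is the finite-space counterpart of the inclusion of the σ-algebras in Definition 1, and the last check shows why adaptedness is a real restriction: the value of the walk at time 2 cannot be read off from the information available at time 1.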
Definition 5 We shall say that the pair of filtrations $(\mathbb{F}, \mathbb{G})$ satisfies the $(H)$ hypothesis if every $\mathbb{F}$-local martingale is a $\mathbb{G}$-local martingale.

The theory of enlargements of filtrations, developed in the late 1970s, provides answers to questions such as those mentioned earlier. Currently, this theory has been widely used in mathematical finance, especially in insider trading models and in models of default risk. The insider trading models are usually based on the so-called initial enlargements of filtrations, whereas the models of default risk fit well in the framework of the progressive enlargements of filtrations. More precisely, given a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$, there are essentially two ways of enlarging filtrations:

• initial enlargements, for which $\mathcal{G}_t = \mathcal{F}_t \vee \mathcal{H}$ for every $t \ge 0$, that is, the new information $\mathcal{H}$ is brought in at the origin of time and
• progressive enlargements, for which $\mathcal{G}_t = \mathcal{F}_t \vee \mathcal{H}_t$ for every $t \ge 0$, that is, the new information is brought in progressively as the time $t$ increases.

Before presenting the basic theorems on enlargements of filtrations, we state a useful theorem due to Stricker.

Theorem 1 (Stricker [10]). Let $\mathbb{F}$ and $\mathbb{G}$ be two filtrations as above, such that for all $t \ge 0$, $\mathcal{F}_t \subset \mathcal{G}_t$. If $(X_t)$ is a $\mathbb{G}$-semimartingale that is $\mathbb{F}$-adapted, then it is also an $\mathbb{F}$-semimartingale.

Initial Enlargements of Filtrations

The most important theorem on initial enlargements of filtrations is due to Jacod and deals with the special case where the initial information brought in at the origin of time consists of the σ-algebra generated by a random variable. More precisely, let $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ be a filtered probability space satisfying the usual assumptions. Let $Z$ be an $\mathcal{F}$ measurable random variable. Define

\[ \mathcal{G}_t = \bigcap_{\varepsilon>0} \left( \mathcal{F}_{t+\varepsilon} \vee \sigma\{Z\} \right), \quad t \ge 0 \tag{2} \]

In financial models, the filtration $\mathbb{F}$ represents the public information in a financial market and the random variable $Z$ stands for the additional (anticipating) information of an insider.

The conditional laws of $Z$ given $\mathcal{F}_t$, for $t \ge 0$, play a crucial role in initial enlargements.

Theorem 2 (Jacod's criterion). Let $Z$ be an $\mathcal{F}$ measurable random variable and let $Q_t(\omega, dx)$ denote the regular conditional distribution of $Z$ given $\mathcal{F}_t$, $t \ge 0$. Suppose that for each $t \ge 0$, there exists a positive σ-finite measure $\eta_t(dx)$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ such that

\[ Q_t(\omega, dx) \ll \eta_t(dx) \quad \text{almost surely} \tag{3} \]

Then every $\mathbb{F}$-semimartingale is a $\mathbb{G}$-semimartingale.

Remark 2 In fact, this theorem still holds for random variables with values in a standard Borel space. Moreover, the existence of the σ-finite measure $\eta_t(dx)$ is equivalent to the existence of one positive σ-finite measure $\eta(dx)$ such that $Q_t(\omega, dx) \ll \eta(dx)$ and in this case $\eta$ can be taken to be the distribution of $Z$.

Now we give classical corollaries of Jacod's theorem.

Corollary 1 Let $Z$ be independent of $\mathcal{F}_\infty$. Then, every $\mathbb{F}$-semimartingale is a $\mathbb{G}$-semimartingale.

Corollary 2 Let $Z$ be a random variable taking on only a countable number of values. Then every $\mathbb{F}$-semimartingale is a $\mathbb{G}$-semimartingale.

In some cases, it is possible to obtain an explicit decomposition of an $\mathbb{F}$-local martingale as a $\mathbb{G}$-semimartingale [4–8]. For example, if $Z = B_{t_0}$, for some fixed time $t_0 > 0$ and a Brownian Motion $B$, it can be shown that Jacod's criterion holds for $t < t_0$ and that every $\mathbb{F}$-local martingale is a semimartingale for $0 \le t < t_0$, but not necessarily including $t_0$. Indeed in this case, there are $\mathbb{F}$-local martingales that are not $\mathbb{G}$-semimartingales. Moreover, $B$ is a $\mathbb{G}$-semimartingale, which decomposes as

\[ B_t = B_0 + \widetilde{B}_t + \int_0^{t \wedge t_0} \frac{B_{t_0} - B_s}{t_0 - s}\, ds \tag{4} \]

where $(\widetilde{B}_t)$ is a $\mathbb{G}$ Brownian Motion.

Remark 3 There are cases where Jacod's criterion does not hold but where other methods apply [4, 6, 7].
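
Progressive Enlargements of Filtrations

Let $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ be a filtered probability space satisfying the usual hypotheses, and $\rho : (\Omega, \mathcal{F}) \to (\mathbb{R}_+, \mathcal{B}(\mathbb{R}_+))$ be a random time. We enlarge the initial filtration $\mathbb{F}$ with the process $(\rho \wedge t)_{t \ge 0}$, so that the new enlarged filtration $\mathbb{F}^\rho$ is the smallest filtration (satisfying the usual assumptions) containing $\mathbb{F}$ and making $\rho$ a stopping time (i.e., for all $t \ge 0$, $\mathcal{F}^\rho_t = \mathcal{K}^o_{t+}$, where $\mathcal{K}^o_t = \mathcal{F}_t \vee \sigma(\rho \wedge t)$). One may interpret $\rho$ as the instant of default of an issuer; the given filtration $\mathbb{F}$ can be thought of as the filtration of default-free prices, for which $\rho$ is not a stopping time. Then, the filtration $\mathbb{F}^\rho$ is the defaultable market filtration used for the pricing of defaultable assets.

A few processes play a crucial role in our discussion:

• the $\mathbb{F}$-supermartingale
\[ Z^\rho_t = \mathbb{P}[\rho > t \mid \mathcal{F}_t] \tag{5} \]
chosen to be càdlàg, associated to $\rho$ by Azéma [1];
• the $\mathbb{F}$-dual optional projection of the process $\mathbf{1}_{\{\rho \le t\}}$, denoted by $A^\rho_t$ (see [7, 8] for a definition of dual optional projections); and
• the càdlàg martingale
\[ \mu^\rho_t = \mathbb{E}[A^\rho_\infty \mid \mathcal{F}_t] = A^\rho_t + Z^\rho_t \tag{6} \]

Theorem 3 Every $\mathbb{F}$-local martingale $(M_t)$, stopped at $\rho$, is an $\mathbb{F}^\rho$-semimartingale, with canonical decomposition:

\[ M_{t \wedge \rho} = \widetilde{M}_t + \int_0^{t \wedge \rho} \frac{d\langle M, \mu^\rho \rangle_s}{Z^\rho_{s-}} \tag{7} \]

where $(\widetilde{M}_t)$ is an $\mathbb{F}^\rho$-local martingale.

The most interesting case in the theory of progressive enlargements of filtrations is when $\rho$ is an honest time or equivalently the end of an $\mathbb{F}$ optional set $\Gamma$, that is,

\[ \rho = \sup\{t : (t, \omega) \in \Gamma\} \tag{8} \]

Indeed, in this case, the pair of filtrations $(\mathbb{F}, \mathbb{F}^\rho)$ satisfies the $(H')$ hypothesis: every $\mathbb{F}$-local martingale $(M_t)$ is an $\mathbb{F}^\rho$-semimartingale, with canonical decomposition:

\[ M_t = \widetilde{M}_t + \int_0^{t \wedge \rho} \frac{d\langle M, \mu^\rho \rangle_s}{Z^\rho_{s-}} - \mathbf{1}_{\{\rho \le t\}} \int_\rho^t \frac{d\langle M, \mu^\rho \rangle_s}{1 - Z^\rho_{s-}} \tag{9} \]

The next decomposition formulas are used for pricing in default models:

Proposition 1

1. Let $\xi \in L^1$. Then a càdlàg version of the martingale $\xi_t = \mathbb{E}[\xi \mid \mathcal{F}^\rho_t]$, on the set $\{t < \rho\}$, is given by:

\[ \xi_t \mathbf{1}_{t<\rho} = \frac{1}{Z^\rho_t}\, \mathbf{1}_{t<\rho}\, \mathbb{E}[\xi \mathbf{1}_{t<\rho} \mid \mathcal{F}_t] \tag{10} \]

2. Let $\xi \in L^1$ and let $\rho$ be an honest time. Then a càdlàg version of the martingale $\xi_t = \mathbb{E}[\xi \mid \mathcal{F}^\rho_t]$ is given as

\[ \xi_t = \frac{1}{Z^\rho_t}\, \mathbb{E}[\xi \mathbf{1}_{t<\rho} \mid \mathcal{F}_t]\, \mathbf{1}_{t<\rho} + \frac{1}{1 - Z^\rho_t}\, \mathbb{E}[\xi \mathbf{1}_{t\ge\rho} \mid \mathcal{F}_t]\, \mathbf{1}_{t\ge\rho} \tag{11} \]

The (H) Hypothesis

The $(H)$ hypothesis, in contrast to the $(H')$ hypothesis, is sometimes presented as a no-arbitrage condition in default models. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space satisfying the usual assumptions. Let $\mathbb{F}$ and $\mathbb{G}$ be two subfiltrations of $\mathcal{F}$, with

\[ \mathcal{F}_t \subset \mathcal{G}_t \tag{12} \]

Brémaud and Yor [2] have proven the following characterization of the $(H)$ hypothesis:

Theorem 4 The following are equivalent:

1. Every $\mathbb{F}$-martingale is a $\mathbb{G}$-martingale.
2. For all $t \ge 0$, the sigma fields $\mathcal{G}_t$ and $\mathcal{F}_\infty$ are independent conditionally on $\mathcal{F}_t$.

Remark 4 We also say that $\mathbb{F}$ is immersed in $\mathbb{G}$.

In the framework of the progressive enlargement of some filtration $\mathbb{F}$ with a random time $\rho$, the $(H)$ hypothesis is equivalent to one of the following hypotheses [3]:

1. $\forall t$, the σ-algebras $\mathcal{F}_\infty$ and $\mathcal{F}^\rho_t$ are conditionally independent given $\mathcal{F}_t$.
2. For all bounded $\mathcal{F}_\infty$ measurable random variables $F$ and all bounded $\mathcal{F}^\rho_t$ measurable random variables $G_t$, we have

\[ \mathbb{E}[F G_t \mid \mathcal{F}_t] = \mathbb{E}[F \mid \mathcal{F}_t]\, \mathbb{E}[G_t \mid \mathcal{F}_t] \tag{13} \]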
3. For all bounded $\mathcal{F}^\rho_t$ measurable random variables $G_t$:

\[ \mathbb{E}[G_t \mid \mathcal{F}_\infty] = \mathbb{E}[G_t \mid \mathcal{F}_t] \tag{14} \]

4. For all bounded $\mathcal{F}_\infty$ measurable random variables $F$,

\[ \mathbb{E}[F \mid \mathcal{F}^\rho_t] = \mathbb{E}[F \mid \mathcal{F}_t] \tag{15} \]

5. For all $s \le t$,

\[ \mathbb{P}[\rho \le s \mid \mathcal{F}_t] = \mathbb{P}[\rho \le s \mid \mathcal{F}_\infty] \tag{16} \]

In view of applications to financial mathematics, a natural question is, how is the $(H)$ hypothesis affected when we make an equivalent change of probability measure?

Proposition 2 Let $\mathbb{Q}$ be a probability measure that is equivalent to $\mathbb{P}$ (on $\mathcal{F}$). Then, every $(\mathbb{F}, \mathbb{Q})$-semimartingale is a $(\mathbb{G}, \mathbb{Q})$-semimartingale.

Now, define

\[ \left.\frac{d\mathbb{Q}}{d\mathbb{P}}\right|_{\mathcal{F}_t} = R_t, \qquad \left.\frac{d\mathbb{Q}}{d\mathbb{P}}\right|_{\mathcal{G}_t} = R'_t \tag{17} \]

If $Y = \frac{d\mathbb{Q}}{d\mathbb{P}}$, then the hypothesis $(H)$ holds under $\mathbb{Q}$ if and only if

\[ \forall X \ge 0,\ X \in \mathcal{F}_\infty, \qquad \frac{\mathbb{E}_{\mathbb{P}}[XY \mid \mathcal{G}_t]}{R'_t} = \frac{\mathbb{E}_{\mathbb{P}}[XY \mid \mathcal{F}_t]}{R_t} \tag{18} \]

In particular, when $\frac{d\mathbb{Q}}{d\mathbb{P}}$ is $\mathcal{F}_\infty$ measurable, $R_t = R'_t$ and the hypothesis $(H)$ holds under $\mathbb{Q}$.

A decomposition formula is given below.

Theorem 5 If $(X_t)$ is a $(\mathbb{F}, \mathbb{Q})$-local martingale, then the stochastic process

\[ I_X(t) = X_t + \int_0^t \frac{R'_{s-}}{R'_s} \times \left( \frac{1}{R_{s-}}\, d[X, R]_s - \frac{1}{R'_{s-}}\, d[X, R']_s \right) \tag{19} \]

is a $(\mathbb{G}, \mathbb{Q})$-local martingale.

References

[1] Azéma, J. (1972). Quelques applications de la théorie générale des processus I, Inventiones Mathematicae 18, 293–336.
[2] Brémaud, P. & Yor, M. (1978). Changes of filtration and of probability measures, Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 45, 269–295.
[3] Elliott, R.J., Jeanblanc, M. & Yor, M. (2000). On models of default risk, Mathematical Finance 10, 179–196.
[4] Jeulin, T. (1980). Semi-martingales et Grossissements d'une Filtration, Lecture Notes in Mathematics, Springer, Vol. 833.
[5] Jeulin, T. & Yor, M. (eds) (1985). Grossissements de Filtrations: Exemples et Applications, Lecture Notes in Mathematics, Springer, Vol. 1118.
[6] Mansuy, R. & Yor, M. (2006). Random Times and Enlargements of Filtrations in a Brownian Setting, Lecture Notes in Mathematics, Springer, Vol. 1873.
[7] Nikeghbali, A. (2006). An essay on the general theory of stochastic processes, Probability Surveys 3, 345–412.
[8] Protter, P.E. (2005). Stochastic Integration and Differential Equations, 2nd Edition, version 2.1, Springer.
[9] Revuz, D. & Yor, M. (1999). Continuous Martingales and Brownian Motion, 3rd Edition, Springer.
[10] Stricker, C. (1977). Quasi-martingales, martingales locales, semimartingales et filtration naturelle, Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 39, 55–63.

Further Reading

Jacod, J. (1985). Grossissement initial, hypothèse (H'), et théorème de Girsanov, in Grossissements de Filtrations: Exemples et Applications, T. Jeulin & M. Yor, eds, Springer, pp. 15–35.

Related Articles

Compensators; Equivalence of Probability Measures; Martingale Representation Theorem; Martingales; Poisson Process; Semimartingale.

DELIA COCULESCU & ASHKAN NIKEGHBALI

Local Times

The most obvious way to measure the time that a random process $X$ spends at a value $b$ on a time interval $[0, t]$ is to compute $\int_0^t \mathbf{1}_{\{X_s = b\}}\, ds$. The problem is that this expression might be equal to 0, although the process $X$ actually visits the value $b$. This is realized by the real Brownian motion (for a definition of this process, see Lévy Processes). Indeed, if we denote this process $B$, then for every fixed real $b$ the set $\{s \ge 0 : B_s = b\}$ has 0 Lebesgue measure and is infinite (and uncountable). However, one can measure the time that $B$ spends at $b$ by using the notion of local time defined by

\[ L^b_t = \lim_{\varepsilon \to 0} \frac{1}{2\varepsilon} \int_0^t \mathbf{1}_{\{|B_s - b| < \varepsilon\}}\, ds \tag{1} \]

where the limit is a pathwise limit.

For a fixed $b$, the process $(L^b_t, t \ge 0)$ is an increasing process that only increases at times when $B$ takes the value $b$. Under the assumption that $B$ starts at 0, the processes $(L^0_t, t \ge 0)$ and $(2 \sup_{0 \le s \le t} B_s, t \ge 0)$ have the same law. This identity is due to Paul Lévy.

As $b$ varies and $t$ is fixed, one obtains the process $(L^b_t, b \in \mathbb{R})$, which actually represents the density of occupation time of $B$ during the time interval $[0, t]$. This fact corresponds to the following formula called the occupation time formula

\[ \int_0^t f(B_s)\, ds = \int_{\mathbb{R}} f(b) L^b_t\, db \tag{2} \]

for every measurable bounded function $f$. This formula provides a definition of the local time equivalent to definition (1). For a fixed $t$, one does not know, special times excepted, the law of the process $(L^b_t, b \in \mathbb{R})$, but a lot of trajectorial results are established. For example, from [6], we have

\[ \liminf_{t \to \infty}\ \sup_{x \in \mathbb{R}} L^x_t\, (t^{-1} \log\log t)^{1/2} = c \tag{3} \]

with $0 < c < \infty$, and

\[ \limsup_{t \to \infty}\ \sup_{x \in \mathbb{R}} L^x_t\, (t \log\log t)^{-1/2} = \sqrt{2} \tag{4} \]

One of these special times is $T_a$, the first hitting time by $B$ of a given value $a$. The law of $(L^b_{T_a}, b \in \mathbb{R})$ is described by one of the famous Ray–Knight Theorems (see [8, Chapter XI]).

The local time of $B$ can also be considered as a doubly indexed process. As such it is a.s. jointly continuous in $b$ and $t$ (see [9]) and deterministic functions on $\mathbb{R} \times \mathbb{R}_+$ can be integrated with respect to $(L^b_t, b \in \mathbb{R}, t \ge 0)$ (see Itô's Formula).

Local Time of a Semimartingale

Similarly to formula (2), one can define the local time process of a semimartingale $Y$ (for the definition of a semimartingale, see Stochastic Integrals) by using the following occupation time formula:

\[ \int_0^t f(Y_s)\, d[Y]^c_s = \int_{\mathbb{R}} f(b) L^b_t(Y)\, db \tag{5} \]

where $([Y]^c_s, s \ge 0)$ is the continuous part of the quadratic variation of $Y$ also denoted by $\langle Y \rangle$ (for the definition see Stochastic Integrals). For a fixed $b$, $(L^b_t(Y), t \ge 0)$ is a.s. continuous.

The obtained local time process $(L^b_t(Y), b \in \mathbb{R}, t \ge 0)$ satisfies the following formula, called Tanaka's formula:

\[ |Y_t - b| = |Y_0 - b| + \int_0^t \operatorname{sgn}(Y_s - b)\, dY_s + L^b_t + \sum_{0 < s \le t} \left\{ |Y_s - b| - |Y_{s-} - b| - \operatorname{sgn}(Y_{s-} - b)\, \Delta Y_s \right\} \tag{6} \]

where the function sgn is defined by $\operatorname{sgn}(x) = \mathbf{1}_{x>0} - \mathbf{1}_{x \le 0}$. Tanaka's formula actually provides a definition of the local time equivalent to formula (5). Thanks to this formula, Paul Lévy's identity is extended in [5] to the continuous semimartingales starting from 0 under the form

\[ (L^0_t, t \ge 0) \overset{\text{(law)}}{=} \left( 2 \sup_{0 \le s \le t} \int_0^s \operatorname{sgn}(-Y_u)\, dY_u,\ t \ge 0 \right) \tag{7} \]

One can actually see Tanaka's formula as an example of extension of Itô's formula (see Itô's Formula).

Local time is also involved in inequalities reminiscent of the Burkholder–Davis–Gundy ones. Indeed, in [2], it is shown that there exist two universal positive and finite constants $c$ and $C$ such that

\[ c\, \mathbb{E}\left[\sup_t |X_t|\right] \le \mathbb{E}\left[\sup_a L^a_\infty\right] \le C\, \mathbb{E}\left[\sup_t |X_t|\right] \tag{8} \]
for any continuous local martingale $X$ with $X_0 = 0$.

Local Time of a Markov Process

One can define the local time process of a Markov process $X$ at a value $b$ of its state space only if $b$ is regular for $X$ (see Markov Processes for the definition of Markov process). This means that starting from $b$, the process $X$ then visits $b$ at arbitrarily small times. Not every Markov process has this property. For example, a real-valued Lévy process (see Lévy Processes for the definition of that process) has this property at every point if its characteristic exponent $\psi$ satisfies [3, Chapter II]

\[ \int_{-\infty}^{+\infty} \Re\left( \frac{1}{1 + \psi(x)} \right) dx < \infty \tag{9} \]

When $b$ is regular for $X$, there exists a unique (up to a multiplicative constant) increasing continuous additive functional, that is, an adapted process $(\ell^b_t(X), t \ge 0)$ starting from 0 such that

\[ \ell^b_{t+s}(X) = \ell^b_t(X) + \ell^b_s(X) \circ \theta_t \tag{10} \]

increasing only at times when $X$ takes the value $b$. This process is called the local time at $b$.

When it exists, the local time process $(\ell^b_t(X), b \in E, t \ge 0)$ of a Markov process $X$ with state space $E$ might be jointly continuous in $b$ and $t$. A necessary and sufficient condition for that property is given in [1] for Lévy processes as follows: set $h(a) = \frac{1}{\pi} \int_{-\infty}^{\infty} (1 - \cos ab)\, \Re(1/\psi(b))\, db$ and $m(\varepsilon) = \int_{\mathbb{R}} da\, \mathbf{1}_{\{h(a) < \varepsilon\}}$ for $\varepsilon > 0$; then the considered Lévy process has a continuous local time process if

\[ \int_{0+} \operatorname{Log}\left( \frac{1}{m(\varepsilon)} \right) d\varepsilon < \infty \tag{11} \]

This result concerning Lévy processes has been extended to symmetric Markov processes in [7] and to general Markov processes in [4].

We mention that under condition (9), the local time process of a Lévy process $X$ satisfies the same occupation time formula as for the real Brownian motion:

\[ \int_0^t f(X_s)\, ds = \int_E f(b)\, \ell^b_t(X)\, db \tag{12} \]

In case a random process is both a Markov process with regular points and a semimartingale, it then admits two local time processes that are different (they might coincide as in the case of the Brownian motion). As an example, consider a symmetric stable process $X$ with index in $(1, 2)$ (for definition see Lévy Processes). We have $[X]^c = 0$; hence, as a semimartingale, $X$ has a local time process that identically equals 0. However, as a Markov process, $X$ has a local time process that satisfies formula (12) and hence differs from 0. Besides, in this case, condition (11) is satisfied.

References

[1] Barlow, M.T. (1988). Necessary and sufficient conditions for the continuity of local times for Lévy processes, Annals of Probability 16, 1389–1427.
[2] Barlow, M.T. & Yor, M. (1981). (Semi-) Martingale inequalities and local times, Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 55, 237–254.
[3] Bertoin, J. (1996). Lévy Processes, Cambridge University Press.
[4] Eisenbaum, N. & Kaspi, H. (2007). On the continuity of local times of Borel right Markov processes, Annals of Probability 35, 915–934.
[5] El Karoui, N. & Chaleyat-Maurel, M. (1978). Un problème de réflexion et ses applications au temps local et aux équations différentielles stochastiques sur $\mathbb{R}$, in Temps locaux, Astérisque, Société Mathématique de France, Paris, Vol. 52–53, pp. 117–144.
[6] Kesten, H. (1965). An iterated logarithm law for local time, Duke Mathematical Journal 32, 447–456.
[7] Marcus, M. & Rosen, J. (1995). Sample path properties of the local times of strongly symmetric Markov processes via Gaussian processes, Annals of Probability 20, 1603–1684.
[8] Revuz, D. & Yor, M. (1999). Continuous Martingales and Brownian Motion, 3rd Edition, Springer.
[9] Trotter, H. (1958). A property of Brownian motion paths, Illinois Journal of Mathematics 2, 425–433.

NATHALIE EISENBAUM
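
As a numerical illustration of the Brownian local time of definition (1) and of the occupation time formula (2), the following Python sketch simulates one discretized Brownian path, estimates the local time at each level of a grid by the fixed-ε version of (1), and compares the two sides of (2) for $f = \cos$. The function name, step count, value of ε, and seed are illustrative assumptions and not part of the original article. At fixed ε the agreement is largely built into the discretization (the error is bounded by ε times the Lipschitz constant of $f$ times $t$), so the sketch illustrates the reading of local time as a density of occupation time rather than the existence of the limit in (1).

```python
import math
import random


def occupation_formula_check(t=1.0, n_steps=20000, eps=0.01, seed=1):
    """Compare both sides of the occupation time formula on one simulated path.

    Returns (lhs, rhs) where
      lhs -- Riemann sum for the time integral of f(B_s), with f = cos
      rhs -- sum over grid levels b of f(b) * L_hat(b) * (2 * eps), where
             L_hat(b) = (time spent in [b - eps, b + eps)) / (2 * eps)
             is the fixed-eps approximation of the local time at b.
    """
    rng = random.Random(seed)
    dt = t / n_steps
    sd = math.sqrt(dt)
    f = math.cos
    b = 0.0
    lhs = 0.0
    time_in_bin = {}  # bin index i corresponds to the level 2 * eps * i
    for _ in range(n_steps):
        lhs += f(b) * dt
        i = math.floor(b / (2 * eps) + 0.5)  # index of the nearest grid level
        time_in_bin[i] = time_in_bin.get(i, 0.0) + dt
        b += rng.gauss(0.0, sd)
    rhs = 0.0
    for i, spent in time_in_bin.items():
        level = 2 * eps * i
        local_time = spent / (2 * eps)  # fixed-eps version of definition (1)
        rhs += f(level) * local_time * (2 * eps)
    return lhs, rhs


if __name__ == "__main__":
    lhs, rhs = occupation_formula_check()
    print("time integral of f(B_s):", lhs)
    print("space integral of f * L:", rhs)
```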
Stochastic Integrals

If $H_t$ represents the number of shares of a certain asset held by an investor and $X_t$ denotes the price of the asset, the gain on $[0, t]$ from the trading strategy $H$ is often represented as

\[ \int_0^t H_s\, dX_s \tag{1} \]

Here, our goal is to give a precise meaning to such "stochastic integrals", where $H$ and $X$ are stochastic processes verifying appropriate assumptions.

Looking at the time-series data for price evolution of, say, a stock, one realizes that placing smoothness assumptions, such as differentiability, on the paths of $X$ would be unrealistic. Consequently, this puts us in a situation where the theory of ordinary integration is no longer sufficient for our purposes. In what follows, we construct the stochastic integral $\int H\, dX$ for a class of integrands and integrators that are as large as possible while satisfying certain conditions.

The stochastic processes that we use are defined on a complete probability space $(\Omega, \mathcal{F}, \mathbb{P})$. We always assume that all the processes are jointly measurable, that is, for any process $(Y_t)_{0 \le t < \infty}$ the map $(t, \omega) \mapsto Y_t(\omega)$ is measurable with respect to $\mathcal{B}(\mathbb{R}_+) \times \mathcal{F}$, where $\mathcal{B}(\mathbb{R}_+)$ is the Borel σ-algebra on $[0, \infty)$. In addition, we are given a filtration $(\mathcal{F}_t)_{0 \le t \le \infty}$ (see Filtrations), which models the accumulation of our information over time. The filtration $(\mathcal{F}_t)_{0 \le t \le \infty}$ is usually denoted by $\mathbb{F}$ for convenience. We say that a jointly measurable process, $Y$, is adapted (or $\mathbb{F}$-adapted if we need to specify the filtration) if $Y_t \in \mathcal{F}_t$ for all $t$, $0 \le t < \infty$. We assume that the following hypotheses hold true.

Assumption 1 The filtered complete probability space $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ satisfies the usual hypotheses (see Filtrations).

Although the above hypotheses are restrictive, they are satisfied in many situations. The natural filtration of a Lévy process, in particular a Brownian motion, satisfies the usual hypotheses once completed. The same is true for the natural filtration of any counting process or "reasonable" strong Markov process (see, e.g., [7] for a more detailed discussion of the usual hypotheses and their consequences).

Having fixed the stochastic base on which all the processes are defined, let us go back to our primary task of defining the integral $\int H\, dX$. If $X$ is a process of finite variation, the theory is that of Lebesgue–Stieltjes integration.

Definition 1 A stochastic process $X$ is said to be càdlàg (for continu à droite, limites à gauche from French) if it a.s. has sample paths that are right continuous on $[0, \infty)$ with left limits on $(0, \infty)$. Similarly, a stochastic process $X$ is said to be càglàd (for continu à gauche, limites à droite) if it a.s. has sample paths that are left continuous on $(0, \infty)$ with right limits on $[0, \infty)$. We denote the space of adapted, càdlàg (respectively, càglàd) processes by $\mathbb{D}$ (respectively, $\mathbb{L}$).

Definition 2 Let $X$ be a càdlàg process. For a given $\omega \in \Omega$, the variation of the path $X(\omega)$ on the compact interval $[a, b]$ is defined as

\[ \sup_{\pi \in \mathcal{P}} \sum_{t_i \in \pi} \left| X_{t_{i+1}}(\omega) - X_{t_i}(\omega) \right| \tag{2} \]

where $\mathcal{P}$ is the set of all finite partitions of $[a, b]$. $X$ is said to be a finite variation (FV) process if $X$ is càdlàg and almost all paths of $X$ have finite variation on each compact interval of $[0, \infty)$.

If $X$ is an FV process, for fixed $\omega$, it induces a signed measure on $\mathbb{R}_+$ and thus we can define a jointly measurable integral $\int_0^t H_s(\omega)\, dX_s(\omega)$ for any bounded and jointly measurable $H$. In other words, the integral $\int H\, dX$ can be defined path by path as a Lebesgue–Stieltjes integral, if $H$ is a jointly measurable process such that $\int_0^t H_s(\omega)\, dX_s(\omega)$ exists and is finite for all $t > 0$, a.s.

Unfortunately, the set of FV processes is not rich enough if one wants to give a rigorous meaning to $\int H\, dX$ using only Stieltjes integration. When we replace $X$ with, say, a Brownian motion, the theory of Stieltjes integration fails to work since the Brownian motion is known to have paths of infinite variation in every compact interval of $\mathbb{R}_+$. Therefore, one needs to develop a concept of integration with respect to a class of processes that is large enough to cover processes such as the Brownian motion or the more general Lévy processes, which find frequent applications in different fields.

To find the weakest conditions on $X$ so that $\int H\, dX$ is well defined, we start with the simplest
possible form for the integrand $H$ and work gradually to extend the stochastic integral to more complex integrands by imposing conditions on $X$ but making sure that these conditions are as minimal as possible at the same time.

The simplest integrand one can think of is of the following form:

\[ H_t(\omega) = \mathbf{1}_{(S(\omega), T(\omega)]}(t) := \begin{cases} 1 & \text{if } S(\omega) < t \le T(\omega) \\ 0 & \text{otherwise} \end{cases} \tag{3} \]

where $S$ and $T$ are stopping times (see Filtrations) with respect to $\mathbb{F}$. In financial terms, this corresponds to a buy-and-hold strategy, whereby one unit of the asset is bought at, possibly random, time $S$ and sold at time $T$. If $X$ is the stochastic process representing the price of the asset, the net profit of such a trading strategy after time $T$ is equal to $X_T - X_S$. This leads us to define $\int H\, dX$ as

\[ \int_0^t H_s\, dX_s = X_{t \wedge T} - X_{t \wedge S} \tag{4} \]

where $t \wedge T := \min\{t, T\}$ for all $t$, $0 \le t < \infty$, and stopping times $T$. Clearly, the process $H$ in equation (3) has paths that are left continuous and possess right limits. We could similarly have defined $\int H\, dX$ for $H$ of the form, say, $\mathbf{1}_{[S,T)}$. However, there is a good reason for insisting on paths that are continuous from the left on $(0, \infty)$ as we see in Example 1. Let us denote $\int_0^t H_s\, dX_s$ by $(H \cdot X)_t$.

Theorem 1 Let $H$ be of the form (3) and $M$ be a martingale (see Martingales). Then $H \cdot M$ is a martingale.

Later, we will see that the above theorem holds for a more general class of integrands so that the stochastic integrals preserve the martingale property. The following example shows why the left continuity for $H$ is a reasonable restriction from a financial perspective.

Example 1 Let $N$ be a Poisson process with intensity $\lambda$ and define $X$ by $X_t = \lambda t - N_t$. It is well known that $X$ is a martingale. Suppose that there exists a traded asset with a price process given by $X$. Under normal circumstances, one should not be able to make arbitrage profits by trading in this asset since its price does not change over time on average. Indeed, if $H$ is of the form (3), then $H \cdot X$ is a martingale with expected value zero so that the traders earn zero profit on average, as expected. Now consider another strategy $H = \mathbf{1}_{[0,T_1)}$, where $T_1$ is the time of the first jump of $N$. Since $X$ is an FV process, $H \cdot X$ is well defined as a Stieltjes integral and is given by $(H \cdot X)_t = \lambda(t \wedge T_1) > 0$, a.s., being the value of the portfolio at time $t$. Thus, this trading strategy immediately accumulates arbitrage profits. A moment of reflection reveals that such a trading strategy is not feasible under usual circumstances since it requires the knowledge of the time of a market crash, time $T_1$ in this case, before it happens. If we use $H = \mathbf{1}_{[0,T_1]}$ instead, this problem disappears.

Naturally, one will want the stochastic integral to be linear. Given a linear integral operator, we can define $H \cdot X$ for integrands that are linear combinations of processes of the form (3).

Definition 3 A process $H$ is said to be simple predictable if $H$ has a representation

\[ H_t = H_0 \mathbf{1}_{\{0\}}(t) + \sum_{i=1}^{n} H_i \mathbf{1}_{(T_i, T_{i+1}]}(t) \tag{5} \]

where $0 = T_1 \le \cdots \le T_{n+1} < \infty$ is a finite sequence of stopping times, $H_0 \in \mathcal{F}_0$, $H_i \in \mathcal{F}_{T_i}$, $1 \le i \le n$ with $|H_i| < \infty$, a.s., $0 \le i \le n$. The collection of simple predictable processes is denoted by $\mathbf{S}$.

Let $L^0$ be the space of finite-valued random variables endowed with the topology of convergence in probability. Define the linear mapping $I_X : \mathbf{S} \to L^0$ as

\[ I_X(H) = (H \cdot X)_\infty := H_0 X_0 + \sum_{i=1}^{n} H_i \left( X_{T_{i+1}} - X_{T_i} \right) \tag{6} \]

where $H$ has the representation given in equation (5). Note that this definition does not depend on the particular choice of representation for $H$.

Another property that the operator $I_X$ must have is that it should satisfy some version of the bounded convergence theorem. This will inevitably place some restrictions on the stochastic process $X$. Thus, to have a large enough class of integrators, we choose a
reasonably weak version. A particularly weak version of the bounded convergence theorem is that the uniform convergence of $H^n$ to $H$ in $\mathbf{S}$ implies the convergence of $I_X(H^n)$ to $I_X(H)$ only in probability. Let $\mathbf{S}_u$ be the space $\mathbf{S}$ topologized by uniform convergence and recall that for a process $X$ and a stopping time $T$, the notation $X^T$ denotes the process $(X_{t \wedge T})_{t \ge 0}$.

Definition 4 A process $X$ is a total semimartingale if $X$ is càdlàg, adapted and $I_X : \mathbf{S}_u \to L^0$ is continuous. $X$ is a semimartingale (see Semimartingale) if, for each $t \in [0, \infty)$, $X^t$ is a total semimartingale.

This continuity property of $I_X$ allows us to extend the definition of stochastic integrals to a class of integrands that is larger than $\mathbf{S}$ when the integrator is a semimartingale.

It follows from the definition of a semimartingale that semimartingales form a vector space. One can also show that all square integrable martingales and all adapted FV processes are semimartingales (see Semimartingale). Therefore, the sum of a square integrable martingale and an adapted FV process would also be a semimartingale. The converse of this statement is also "essentially" true. The precise statement is the following theorem.

Theorem 2 (Bichteler–Dellacherie Theorem). Let $X$ be a semimartingale. Then there exist processes $M$, $A$, with $M_0 = A_0 = 0$ such that

\[ X_t = X_0 + M_t + A_t \tag{7} \]

where $M$ is a local martingale and $A$ is an adapted FV process.

Here, we emphasize that this decomposition is not necessarily unique. Indeed, suppose that $X$ has the decomposition $X = X_0 + M + A$ and the space $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ supports a Poisson process $N$ with intensity $\lambda$. Then $Y_t = N_t - \lambda t$ will define a martingale, which is also an FV process. Therefore, $X$ can also be written as $X = X_0 + (M + Y) + (A - Y)$. The reason for the nonuniqueness is the existence of martingales that are of finite variation. However, if $X$ has a decomposition $X = X_0 + M + A$, where $M$ is a local martingale and $A$ is predictable$^{a}$ and FV with $M_0 = A_0 = 0$, then such a decomposition is unique since all predictable local martingales that are of finite variation have to be constant.

Arguably, Brownian motion is the most well known of all semimartingales. In the following section, we develop stochastic integration with respect to a Brownian motion.

$L^2$ Theory of Stochastic Integration with Respect to Brownian Motion

We assume that there exists a Brownian motion, $B$, on $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ with $B_0 = 0$, and that $\mathcal{F}_0$ only contains the $(\mathcal{F}, \mathbb{P})$-null sets. First, we define the notion of predictability, which is the key concept in defining the stochastic integral.

Definition 5 The predictable σ-algebra $\mathcal{P}$ on $[0, \infty) \times \Omega$ is defined to be the smallest σ-algebra on $[0, \infty) \times \Omega$ with respect to which every adapted càglàd process is measurable. A process is said to be predictable if it is a $\mathcal{P}$-measurable map from $[0, \infty) \times \Omega$ to $\mathbb{R}$.

Clearly, $\mathbf{S} \subset \mathcal{P}$. Actually, there is more to this as is shown by the next theorem.

Theorem 3 Let $b\mathbf{S}$ be the set of elements of $\mathbf{S}$ that are bounded a.s. Then, $\mathcal{P} = \sigma(b\mathbf{S})$, that is, $\mathcal{P}$ is generated by the processes in $b\mathbf{S}$.

By linearity of the stochastic integral and Theorem 1 and using the fact that Brownian motion has increments independent from the past with a certain Gaussian distribution, we have the following.

Theorem 4 Let $H \in b\mathbf{S}$ and define $(H \cdot B)_t = (H \cdot B^t)_\infty$, that is, $(H \cdot B)_t$ is the stochastic integral of $H$ with respect to $B^t$. Then $H \cdot B$ is a martingale and

\[ \mathbb{E}\left[ (H \cdot B)_t^2 \right] = \int_0^t \mathbb{E}[H_s^2]\, ds \tag{8} \]

In the following, we construct the stochastic integral with respect to Brownian motion for a subset of predictable processes. To keep the exposition simple, we restrict our attention to a finite interval $[0, T]$, where $T$ is arbitrary but deterministic. Define

\[ L^2(B^T) := \left\{ H \in \mathcal{P} : \int_0^T \mathbb{E}[H_s^2]\, ds < \infty \right\} \tag{9} \]

which is a Hilbert space. Note that $b\mathbf{S} \subset L^2(B^T)$. Letting $L^2(\mathcal{F}_T)$ denote the space of square integrable
$\mathcal{F}_T$-measurable random variables, Theorem 4 now implies the map

\[ I_{B^T} : b\mathbf{S} \to L^2(\mathcal{F}_T) \tag{10} \]

defined by

\[ I_{B^T}(H) = (H \cdot B)_T \tag{11} \]

is an isometry. Consequently, we can extend the definition of the stochastic integral uniquely to the closure of $b\mathbf{S}$ in $L^2(B^T)$. An application of the monotone class theorem along with Theorem 3 yields that the closure is the whole $L^2(B^T)$.

Theorem 5 Let $H \in L^2(B^T)$. Then the Itô integral $(H \cdot B)_T$ of $H$ with respect to $B^T$ is the image of $H$ under the extension of the isometry $I_{B^T}$ to the whole of $L^2(B^T)$. In particular,

\[ \mathbb{E}\left[ (H \cdot B)_T^2 \right] = \int_0^T \mathbb{E}[H_s^2]\, ds \tag{12} \]

Moreover, the process $Y$ defined by $Y_t = (H \cdot B)_{t \wedge T}$ is a square integrable martingale.

The property (12) is often called the Itô isometry.

Stochastic Integration with Respect to General Semimartingales

In the previous section, we developed the stochastic integration for Brownian motion over the interval $[0, T]$. We need to mention here that the method employed works not only for Brownian motion but also for any martingale $M$ that is square integrable over $[0, T]$, the latter case requiring some extra effort mainly for establishing the existence of the so-called quadratic variation process associated with $M$. This would, in turn, allow us to extend the definition of the stochastic integral with respect to $X$ of the form $X = M + A$, where $M$ is a square integrable martingale and $A$ is a process of finite variation on compacts by defining, under some conditions on $H$,

\[ H \cdot X = H \cdot M + H \cdot A \tag{13} \]

where $H \cdot A$ can be computed as a path-by-path Lebesgue–Stieltjes integral. In this section, we establish the stochastic integral with respect to a general semimartingale. The idea would be similar to the construction of the stochastic integral with respect to Brownian motion. We show that the integral operator is a continuous mapping from the set of simple predictable processes into an appropriate space so that we can extend the set of possible integrands to the closure of $\mathbf{S}$ in a certain topology.

Definition 6 A sequence of processes $(H^n)_{n \ge 1}$ converges to a process $H$ uniformly on compacts in probability (UCP) if, for each $t > 0$, $\sup_{0 \le s \le t} |H^n_s - H_s|$ converges to 0 in probability.

The following result is not surprising and one can refer to, for example, [7] for a proof.

Theorem 6 The space $\mathbf{S}$ is dense in $\mathbb{L}$ under the UCP topology.

The following mapping is key to defining the stochastic integral with respect to a general semimartingale.

Definition 7 For $H \in \mathbf{S}$ and $X$ being a càdlàg process, define the linear mapping $J_X : \mathbf{S} \to \mathbb{D}$ by

\[ J_X(H) = H_0 X_0 + \sum_{i=1}^{n} H_i \left( X^{T_{i+1}} - X^{T_i} \right) \tag{14} \]

where $H$ has the representation as in equation (5).

Note the difference between $J_X$ and $I_X$. $I_X$ maps processes into random variables, whereas $J_X$ maps processes into processes.

Definition 8 For $H \in \mathbf{S}$ and $X$ being an adapted càdlàg process, we call $J_X(H)$ the stochastic integral of $H$ with respect to $X$.

Observe that $J_X(H)_t = I_{X^t}(H)$. This property, combined with the definition of a semimartingale, yields the following continuity property for $J_X$.

Theorem 7 Let $X$ be a semimartingale and $\mathbf{S}_{\mathrm{UCP}}$ (respectively $\mathbb{D}_{\mathrm{UCP}}$) denote the space $\mathbf{S}$ (respectively, $\mathbb{D}$) endowed with the UCP topology. Then the mapping $J_X : \mathbf{S}_{\mathrm{UCP}} \to \mathbb{D}_{\mathrm{UCP}}$ is continuous.

Using Theorem 6, we can now extend the integration operator $J_X$ from $\mathbf{S}$ to $\mathbb{L}$ by continuity, since $\mathbb{D}_{\mathrm{UCP}}$ is a complete metric space$^{b}$.

Definition 9 Let $X$ be a semimartingale. The continuous linear mapping $J_X : \mathbb{L}_{\mathrm{UCP}} \to \mathbb{D}_{\mathrm{UCP}}$ obtained as the extension of $J_X : \mathbf{S}_{\mathrm{UCP}} \to \mathbb{D}_{\mathrm{UCP}}$ is called the stochastic integral.

Note that, in contrast to the $L^2$ theory utilized in the previous section, we do not need to impose any integrability conditions on either $X$ or $H$ to establish the existence of the stochastic integral $H \cdot X$ as long as $H$ remains in $\mathbb{L}$. The above continuity property of the stochastic integrals moreover allows us to approximate $H \cdot X$ using Riemann sums.

Definition 10 Let $\sigma$ denote a finite sequence of finite stopping times:

\[ 0 = T_0 \le T_1 \le \cdots \le T_k < \infty. \tag{15} \]

The sequence $\sigma$ is called a random partition. A sequence of random partitions $\sigma_n$

\[ \sigma_n : 0 = T^n_0 \le T^n_1 \le \cdots \le T^n_{k_n} \tag{16} \]

is said to tend to identity if

1. $\lim_{n \to \infty} \sup_j T^n_j = \infty$, a.s. and
2. $\sup_j |T^n_{j+1} - T^n_j|$ converges to 0 a.s.

Let $Y$ be a process and $\sigma$ be a random partition. Define the process

\[ Y^\sigma := Y_0 \mathbf{1}_{\{0\}} + \sum_j Y_{T_j} \mathbf{1}_{(T_j, T_{j+1}]} \tag{17} \]

Consequently, if $Y$ is in $\mathbb{D}$ or $\mathbb{L}$

\[ Y^\sigma \cdot X = Y_0 X_0 + \sum_j Y_{T_j} \left( X^{T_{j+1}} - X^{T_j} \right) \tag{18} \]

for any semimartingale $X$.

Theorem 8 Let $X$ be a semimartingale and let $\int_{0+}^t H_s\, dX_s$ denote $(H \cdot X)_t - H_0 X_0$ for any $H \in \mathbb{L}$. If $Y$ is a process in $\mathbb{D}$ or in $\mathbb{L}$, and $(\sigma_n)$ is a sequence of random partitions tending to identity, then the process $\left( \int_{0+}^t Y^{\sigma_n}_s\, dX_s \right)_{t \ge 0}$ converges to the stochastic integral $(Y_-) \cdot X$ in UCP, where $Y_-$ is the process defined as $(Y_-)_s = \lim_{r \to s, r < s} Y_r$, for $s > 0$, and $(Y_-)_0 = 0$.

Example 2 As an application of the above theorem, we calculate $\int_0^t B_s\, dB_s$, where $B$ is a standard Brownian motion with $B_0 = 0$. Let $(\sigma_n)$ be a sequence of random partitions of the form (16) tending to identity and let $B^n = B^{\sigma_n}$. Note that

\[
\begin{aligned}
\int_0^t B^n_s\, dB_s &= \sum_{t_j \in \sigma_n,\, t_j < t} B_{t_j} \left( B_{t \wedge t_{j+1}} - B_{t_j} \right) \\
&= \frac{1}{2} \sum_{t_j \in \sigma_n,\, t_j < t} \left( B_{t \wedge t_{j+1}} + B_{t_j} \right) \left( B_{t \wedge t_{j+1}} - B_{t_j} \right) - \frac{1}{2} \sum_{t_j \in \sigma_n,\, t_j < t} \left( B_{t \wedge t_{j+1}} - B_{t_j} \right)^2 \\
&= \frac{1}{2} B^2_{t \wedge T^n_{k_n}} - \frac{1}{2} \sum_{t_j \in \sigma_n,\, t_j < t} \left( B_{t \wedge t_{j+1}} - B_{t_j} \right)^2
\end{aligned}
\tag{19}
\]

As $n$ tends to $\infty$, the sum$^{c}$ in equation (19) is known to converge to $t$. Obviously, $B^2_{T^n_{k_n} \wedge t}$ tends to $B_t^2$ since $\sigma_n$ tends to identity. Thus, we conclude via Theorem 8 that

\[ \int_0^t B_s\, dB_s = \frac{1}{2} B_t^2 - \frac{t}{2} \tag{20} \]

since $B$ is continuous with $B_0 = 0$. Thus, the integration rules for a stochastic integral are quite different from those for an ordinary integral. Indeed, if $A$ were a continuous process of finite variation with $A_0 = 0$, then the Riemann–Stieltjes integral of $A \cdot A$ will yield the following formula:

\[ \int_0^t A_s\, dA_s = \frac{1}{2} A_t^2 \tag{21} \]

As in the case of Brownian motion, stochastic integration with respect to a semimartingale preserves the martingale property.

Theorem 9 Let $H \in \mathbb{L}$ such that $\lim_{t \downarrow 0} |H_t| < \infty$ and $X$ be a local martingale (see Martingales). Then $H \cdot X$ is also a local martingale.

Next, we would like to weaken the restriction that an integrand must be in $\mathbb{L}$. If we want the stochastic integral to still preserve the martingale property with this extended class of integrands, we inevitably need to restrict our attention to predictable processes. To see this, consider the process $H = \mathbf{1}_{[0,T_1)}$ in Example 1. This process is not predictable since the jump times of a Poisson process are not predictable stopping times. As we have shown in Example 1, the

integral of $H$ with respect to a particular martingale is not a martingale.

Before we allow more general predictable integrands in a stochastic integral, we need to develop the notion of quadratic variation of a semimartingale. This is discussed in the following section.

Properties of Stochastic Integrals

In this section, $H$ denotes an element of $\mathbb{L}$ and $X$ denotes a semimartingale. For a process $Y \in \mathbb{D}$, we define $\Delta Y_t = Y_t - Y_{t-}$, the jump at $t$. Recall that two processes $Y$ and $Z$ are said to be indistinguishable if $\mathbb{P}\{\omega : Y_t(\omega) = Z_t(\omega),\ \forall t\} = 1$.

Theorem 10 Let $T$ be a stopping time. Then $(H \cdot X)^T = H 1_{[0,T]} \cdot X = H \cdot (X^T)$.

Theorem 11 The jump process $(\Delta(H \cdot X)_t)_{t \ge 0}$ is indistinguishable from $(H_t \Delta X_t)_{t \ge 0}$.

In finance theory, one often needs to work under the so-called risk-neutral measure rather than the empirical or objective measure $\mathbb{P}$. Recall that the definitions of a semimartingale and its stochastic integral are given in spaces topologized by convergence in probability. Thus, one may wonder whether the value of a stochastic integral remains unchanged under an equivalent change of measure. The following theorem shows that this is indeed the case. Let $\mathbb{Q}$ be another probability measure on $(\Omega, \mathcal{F})$ and let $H_{\mathbb{Q}} \cdot X$ denote the stochastic integral of $H$ with respect to $X$ computed under $\mathbb{Q}$.

Theorem 12 Let $\mathbb{Q} \ll \mathbb{P}$. Then, $H_{\mathbb{Q}} \cdot X$ is indistinguishable from $H_{\mathbb{P}} \cdot X$.

Theorem 13 Let $\mathbb{G} = (\mathcal{G}_t)_{t \ge 0}$ be another filtration such that $H$ is in both $\mathbb{L}(\mathbb{G})$ and $\mathbb{L}(\mathbb{F})$, and such that $X$ is also a $\mathbb{G}$-semimartingale. Then $H_{\mathbb{G}} \cdot X$ is indistinguishable from $H_{\mathbb{F}} \cdot X$.

The following theorem shows that the stochastic integral is an extension of the Lebesgue–Stieltjes integral.

Theorem 14 If $X$ is an FV process, then $H \cdot X$ is indistinguishable from the Lebesgue–Stieltjes integral, computed path by path. Consequently, $H \cdot X$ is an FV process.

Theorem 15 The stochastic integral is associative. That is, $H \cdot X$ is also a semimartingale and, if $G \in \mathbb{L}$,

$$G \cdot (H \cdot X) = (GH) \cdot X. \tag{22}$$

Definition 11 The quadratic variation process of $X$, denoted by $[X, X] = ([X, X]_t)_{t \ge 0}$, is defined as

$$[X, X] = X^2 - 2 X_- \cdot X. \tag{23}$$

Recall that $X_{0-} = 0$. Let $Y$ be another semimartingale. The quadratic covariation of $X$ and $Y$, denoted by $[X, Y]$, is defined as

$$[X, Y] = XY - Y_- \cdot X - X_- \cdot Y. \tag{24}$$

Since $X_-$ (and $Y_-$) belongs to $\mathbb{L}$, we can use Theorem 8 to deduce the following.

Theorem 16 Let $Y$ be a semimartingale. The quadratic covariation $[X, Y]$ of $X$ and $Y$ is an adapted càdlàg process that satisfies the following:

1. $[X, Y]_0 = X_0 Y_0$ and $\Delta[X, Y] = \Delta X \Delta Y$.
2. If $(\sigma_n)$ is a sequence of partitions tending to identity, then

$$X_0 Y_0 + \sum_j \left(X^{T^n_{j+1}} - X^{T^n_j}\right)\left(Y^{T^n_{j+1}} - Y^{T^n_j}\right) \to [X, Y] \tag{25}$$

with convergence in UCP, where $\sigma_n$ is of the form (16).
3. If $T$ is any stopping time, then $[X^T, Y] = [X, Y^T] = [X, Y]^T$.

Moreover, $[X, X]$ is increasing.

Since $[X, X]$ is increasing and càdlàg by definition, we immediately deduce that $[X, X]$ is of finite variation. Moreover, the following polarization identity

$$[X, Y] = \frac{1}{2}\left([X + Y, X + Y] - [X, X] - [Y, Y]\right) \tag{26}$$

reveals that $[X, Y]$ is the difference of two increasing processes; therefore, $[X, Y]$ is an FV process as well. This, in turn, implies that $XY$ is also a semimartingale and yields the integration by parts formula:

$$X_t Y_t = (X_- \cdot Y)_t + (Y_- \cdot X)_t + [X, Y]_t. \tag{27}$$
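The computation in Example 2 and the convergence in part 2 of Theorem 16 can be checked numerically on a single simulated path. The following is a minimal sketch rather than anything taken from the article: it assumes a uniform deterministic grid in place of a random partition, and the function names (`simulate_brownian_path`, `left_point_sum`, `squared_increment_sum`) are illustrative only.

```python
import math
import random


def simulate_brownian_path(t_final, n_steps, rng):
    """Sample B on a uniform grid of [0, t_final], starting from B_0 = 0."""
    step_sd = math.sqrt(t_final / n_steps)
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, step_sd))
    return path


def left_point_sum(path):
    """Riemann sum with the integrand evaluated at the left endpoint t_j."""
    return sum(b0 * (b1 - b0) for b0, b1 in zip(path, path[1:]))


def squared_increment_sum(path):
    """Sum of squared increments over the grid (realized quadratic variation)."""
    return sum((b1 - b0) ** 2 for b0, b1 in zip(path, path[1:]))


rng = random.Random(0)
t_final, n_steps = 1.0, 100_000
path = simulate_brownian_path(t_final, n_steps, rng)

riemann = left_point_sum(path)            # approximates the Ito integral of B dB
qv = squared_increment_sum(path)          # approximates [B, B]_t = t
ito_value = 0.5 * path[-1] ** 2 - 0.5 * t_final   # right-hand side of (20)

print("left-point Riemann sum :", riemann)
print("(B_t^2 - t) / 2        :", ito_value)
print("sum of squared steps   :", qv)
```

On any grid the left-point sum equals half of $B_t^2$ minus half of the sum of squared increments, which is the algebra behind (19); the only random error in the comparison with (20) is therefore the gap between the realized quadratic variation and $t$.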

When $X$ and $Y$ are FV processes, the classical integration by parts formula reads as follows:

$$X_t Y_t = X_0 Y_0 + (X_- \cdot Y)_t + (Y_- \cdot X)_t + \sum_{0 < s \le t} \Delta X_s \Delta Y_s. \tag{28}$$

Therefore, if $X$ or $Y$ is a continuous process of finite variation, then $[X, Y] = X_0 Y_0$. In particular, if $X$ is a continuous FV process, then its quadratic variation is equal to $X_0^2$.

Theorem 17 Let $X$ and $Y$ be two semimartingales, and let $H$ and $K$ be two measurable processes. Then one has a.s.

$$\int_0^\infty |H_s||K_s|\,|d[X, Y]_s| \le \left(\int_0^\infty H_s^2\,d[X, X]_s\right)^{1/2} \left(\int_0^\infty K_s^2\,d[Y, Y]_s\right)^{1/2}. \tag{29}$$

The above inequality is called the Kunita–Watanabe inequality. An immediate consequence of this inequality is that if $X$ or $Y$ has zero quadratic variation, then $[X, Y] = 0$. The following theorem follows from the definition of quadratic variation and Theorem 9.

Theorem 18 Let $X$ be a local martingale. Then, $X^2 - [X, X]$ is a local martingale. Moreover, $[X, X]$ is the unique adapted càdlàg and FV process $A$ such that $X^2 - A$ is a local martingale and $\Delta A = (\Delta X)^2$ with $A_0 = X_0^2$.

Note that the uniqueness in the above theorem is lost if we do not impose $\Delta A = (\Delta X)^2$. Roughly speaking, the above theorem suggests that $\mathbb{E}(X_t^2) = \mathbb{E}([X, X]_t)$ when $X$ is a martingale. The following corollary formalizes this intuition.

Corollary 1 Let $X$ be a local martingale. Then, $X$ is a martingale with $\mathbb{E}(X_t^2) < \infty$ for all $t \ge 0$ if and only if $\mathbb{E}([X, X]_t) < \infty$ for all $t \ge 0$. If $\mathbb{E}([X, X]_t) < \infty$, then $\mathbb{E}(X_t^2) = \mathbb{E}([X, X]_t)$.

The following corollary to Theorem 18 is of fundamental importance in the theory of martingales.

Corollary 2 Let $X$ be a continuous local martingale, and let $S \le T \le \infty$ be stopping times. If $X$ has paths of finite variation on the stochastic interval $(S, T)$, then $X$ is constant on $[S, T]$. Moreover, if $[X, X]$ is constant on $[S, T] \cap [0, \infty)$, then $X$ is also constant there.

The following result is quite handy when it comes to the calculation of the quadratic covariation of two stochastic integrals.

Theorem 19 Let $Y$ be a semimartingale and $K \in \mathbb{L}$. Then

$$[H \cdot X, K \cdot Y]_t = \int_0^t H_s K_s\,d[X, Y]_s. \tag{30}$$

In the following section, we define the stochastic integral for predictable integrands. However, we already have all the results needed to present the celebrated Itô's formula.

Theorem 20 (Itô's Formula). Let $X$ be a semimartingale and $f$ be a $C^2$ real function. Then $f(X)$ is again a semimartingale and the following formula holds:

$$
\begin{aligned}
f(X_t) - f(X_0) ={}& \int_{0+}^{t} f'(X_{s-})\,dX_s + \frac{1}{2} \int_{0+}^{t} f''(X_{s-})\,d[X, X]_s \\
&+ \sum_{0 < s \le t} \left\{ f(X_s) - f(X_{s-}) - f'(X_{s-}) \Delta X_s - \frac{1}{2} f''(X_{s-}) (\Delta X_s)^2 \right\}.
\end{aligned}
\tag{31}
$$

Stochastic Integration for Predictable Integrands

In this section, we weaken the hypothesis that $H \in \mathbb{L}$ in order for $H \cdot X$ to be well defined for a semimartingale $X$. As explained earlier, we restrict our attention to predictable processes since we want the stochastic integral to preserve the martingale property. We will not be able to show the existence of the stochastic integral $H \cdot X$ for all $H \in \mathcal{P}$ but, as in the section L2 Theory of Stochastic Integration with Respect to Brownian Motion, we give a meaning to $H \cdot X$ for the appropriately integrable processes in $\mathcal{P}$. First, we assume that $X$ is a special semimartingale, that is, there exist processes $M$ and $A$ such that $M$ is a

local martingale and $A$ is predictable and of finite variation with $M_0 = A_0 = 0$ and $X = X_0 + M + A$. This decomposition of a special semimartingale is unique and is called the canonical decomposition. Without loss of generality, let us assume that $X_0 = 0$.

Definition 12 Let $X$ be a special semimartingale with the canonical decomposition $X = M + A$. The $\mathcal{H}^2$ norm of $X$ is defined as

$$\|X\|_{\mathcal{H}^2} := \left\| [M, M]_\infty^{1/2} \right\|_{L^2} + \left\| \int_0^\infty |dA_s| \right\|_{L^2}. \tag{32}$$

The space of $\mathcal{H}^2$ semimartingales consists of special semimartingales with finite $\mathcal{H}^2$ norm. We write $X \in \mathcal{H}^2$ to indicate that $X$ belongs to the space of $\mathcal{H}^2$ semimartingales.

One can show that the space of $\mathcal{H}^2$ semimartingales is a Banach space, which is the key property needed to extend the definition of stochastic integrals to a more general class of integrands. Let $b\mathbb{L}$ denote the space of bounded adapted processes with càglàd paths and $b\mathcal{P}$ denote the space of bounded predictable processes.

Definition 13 Let $X \in \mathcal{H}^2$ with the canonical decomposition $X = M + A$ and let $H, J \in b\mathcal{P}$. We define the metric $d_X(H, J)$ as

$$d_X(H, J) := \left\| \left( \int_0^\infty (H_s - J_s)^2\,d[M, M]_s \right)^{1/2} \right\|_{L^2} + \left\| \int_0^\infty |H_s - J_s|\,|dA_s| \right\|_{L^2}. \tag{33}$$

From the monotone class theorem, we obtain the following.

Theorem 21 For $X \in \mathcal{H}^2$, the space $b\mathbb{L}$ is dense in $b\mathcal{P}$ under $d_X(\cdot, \cdot)$.

It is straightforward to show that if $H \in b\mathbb{L}$ and $X \in \mathcal{H}^2$, then $H \cdot X \in \mathcal{H}^2$. The following is an immediate consequence of the definition of $d_X(\cdot, \cdot)$.

Theorem 22 Let $X \in \mathcal{H}^2$ and $(H^n) \subset b\mathbb{L}$ be such that $(H^n)$ is Cauchy under $d_X(\cdot, \cdot)$. Then, $(H^n \cdot X)$ is Cauchy in $\mathcal{H}^2$.

Moreover, it is easy to show that if $(H^n) \subset b\mathbb{L}$ and $(J^n) \subset b\mathbb{L}$ converge to the same limit under $d_X(\cdot, \cdot)$, then $(H^n \cdot X)$ and $(J^n \cdot X)$ converge to the same limit in $\mathcal{H}^2$. Thus, we can now define the stochastic integral $H \cdot X$ for any $H \in b\mathcal{P}$.

Definition 14 Let $X \in \mathcal{H}^2$ and $H \in b\mathcal{P}$. Let $(H^n) \subset b\mathbb{L}$ be such that $\lim_{n\to\infty} d_X(H^n, H) = 0$. The stochastic integral $H \cdot X$ is the unique semimartingale $Y \in \mathcal{H}^2$ such that $\lim_{n\to\infty} H^n \cdot X = Y$ in $\mathcal{H}^2$.

Note that if $B$ is a standard Brownian motion, $B$ is not in $\mathcal{H}^2$ but $B^T \in \mathcal{H}^2$ for any deterministic and finite $T$. Therefore, for any $H \in b\mathcal{P}$, $H \cdot B^T$ is well defined. Moreover, $H \in b\mathcal{P}$ implies $H \in L^2(B^T)$, where $L^2(B^T)$ is the space defined in the section L2 Theory of Stochastic Integration with Respect to Brownian Motion. One can easily check that the stochastic integral $H \cdot B^T$ defined by Definition 14 is indistinguishable from the stochastic integral $H \cdot B^T$ defined in that section. Clearly, $b\mathcal{P}$ is strictly contained in $L^2(B^T)$, and we know from the same section that it is possible to define the stochastic integral with respect to $B^T$ for any process in $L^2(B^T)$. Thus, it is natural to ask whether we can extend the stochastic integral given by Definition 14 to integrands that satisfy a certain square integrability condition.

Definition 15 Let $X \in \mathcal{H}^2$ with the canonical decomposition $X = M + A$. We say that $H \in \mathcal{P}$ is $(\mathcal{H}^2, X)$ integrable if

$$\mathbb{E}\left[ \int_0^\infty H_s^2\,d[M, M]_s \right] + \mathbb{E}\left[ \left( \int_0^\infty |H_s|\,|dA_s| \right)^2 \right] < \infty. \tag{34}$$

It can be shown that if $H \in \mathcal{P}$ is $(\mathcal{H}^2, X)$ integrable, then $(H^n \cdot X)$ is a Cauchy sequence in $\mathcal{H}^2$, where $H^n = H 1_{\{|H| \le n\}}$ is in $b\mathcal{P}$, which means that we can define the stochastic integral for such $H$.

Definition 16 Let $X \in \mathcal{H}^2$ and let $H \in \mathcal{P}$ be $(\mathcal{H}^2, X)$ integrable. The stochastic integral $H \cdot X$ is defined to be $\lim_{n\to\infty} H^n \cdot X$, with convergence in $\mathcal{H}^2$, where $H^n = H 1_{\{|H| \le n\}}$.
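To make the truncation in Definition 16 concrete, the sketch below takes a Brownian integrator stopped at a deterministic time $T$, for which $[M, M]_s = s \wedge T$ and $A = 0$, and the unbounded integrand $H = B$. It estimates $\mathbb{E}\int_0^T (H_s - H^n_s)^2\,ds$ by Monte Carlo for a few truncation levels. This is an illustration under stated assumptions, not part of the article: the time grid, sample sizes, the non-integer truncation levels, and the name `truncation_error` are all choices made here.

```python
import math
import random


def truncation_error(levels, t_final=1.0, n_steps=200, n_paths=2000, seed=1):
    """Monte Carlo estimate of E int_0^T (H_s - H^n_s)^2 ds for H = B.

    With H^n = H 1{|H| <= n} the difference H - H^n equals B 1{|B| > n}.
    Returns a dict mapping each truncation level to its estimate.
    """
    rng = random.Random(seed)
    dt = t_final / n_steps
    step_sd = math.sqrt(dt)
    totals = {level: 0.0 for level in levels}
    for _ in range(n_paths):
        b = 0.0
        for _ in range(n_steps):
            # b is the left-endpoint value used on the current grid interval
            for level in levels:
                if abs(b) > level:
                    totals[level] += b * b * dt
            b += rng.gauss(0.0, step_sd)
    return {level: total / n_paths for level, total in totals.items()}


errors = truncation_error(levels=(0.5, 1.0, 2.0, 3.0))
for level, err in errors.items():
    print(f"truncation level {level}: estimated squared distance {err:.4f}")
```

The estimates decrease as the level grows because the integrand $B^2 1_{\{|B| > n\}}$ decreases pointwise in $n$, which is the mechanism that makes $(H^n \cdot X)$ a Cauchy sequence in this special case.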

In the case $X = B^T$, we have $M = B^T$ and $A = 0$; therefore, $H$ being $(\mathcal{H}^2, X)$ integrable is equivalent to the condition

$$\int_0^T \mathbb{E}(H_s^2)\,ds < \infty, \tag{35}$$

which gives exactly the elements of $L^2(B^T)$.

So far, we have been able to define the stochastic integral with predictable integrands only for semimartingales in $\mathcal{H}^2$. This seems to be a major restriction. However, as the following theorem shows, it is not. Recall that for a stopping time $T$, $X^{T-} = X 1_{[0,T)} + X_{T-} 1_{[T,\infty]}$.

Theorem 23 Let $X$ be a semimartingale with $X_0 = 0$. Then $X$ is prelocally in $\mathcal{H}^2$. That is, there exists a nondecreasing sequence of stopping times $(T^n)$, $\lim_{n\to\infty} T^n = \infty$ a.s., such that $X^{T^n-} \in \mathcal{H}^2$ for each $n \ge 1$.

Definition 17 Let $X$ be a semimartingale and $H \in \mathcal{P}$. The stochastic integral $H \cdot X$ is said to exist if there exists a sequence of stopping times $(T^n)$ increasing to $\infty$ a.s. such that $X^{T^n-} \in \mathcal{H}^2$ for each $n \ge 1$, and such that $H$ is $(\mathcal{H}^2, X^{T^n-})$ integrable for each $n \ge 1$. In this case, we write $H \in L(X)$ and define the stochastic integral as

$$H \cdot X = H \cdot (X^{T^n-}) \quad \text{on } [0, T^n) \tag{36}$$

for each $n$.

A particular case when $H \cdot X$ is well defined is when $H$ is locally bounded.

Theorem 24 Let $X$ be a semimartingale and $H \in \mathcal{P}$ be locally bounded. Then, $H \in L(X)$.

We also have the martingale preservation property.

Theorem 25 Let $M$ be a local martingale and $H \in \mathcal{P}$ be locally bounded. Then, $H \cdot M$ is a local martingale.

The general statement that $M$ being a local martingale and $H \in L(M)$ imply that $H \cdot M$ is a local martingale is not true. The following example is due to Emery and can be taken as a starting point for a study of sigma-martingales (see Equivalent Martingale Measures).

Example 3 Let $T$ be an exponential random variable with parameter 1, let $U$ be an independent random variable with $\mathbb{P}(U = 1) = \mathbb{P}(U = -1) = 1/2$, and set $X = U 1_{[T,\infty)}$. Then, $X$ is a martingale in its own filtration. Let $H$ be defined as $H_t = \frac{1}{t} 1_{\{t > 0\}}$. $H$ is a deterministic predictable integrand. Note that $H$ is not locally bounded, being only continuous on $(0, \infty)$. $H \cdot X$ exists as a Lebesgue–Stieltjes integral since $X$ has paths of finite variation. However, $H \cdot X$ is not a local martingale since, for any stopping time $S$ with $\mathbb{P}(S > 0) > 0$, $\mathbb{E}(|(H \cdot X)_S|) = \infty$.

When $M$ is a continuous local martingale, the theory becomes nicer.

Theorem 26 Let $M$ be a continuous local martingale and let $H \in \mathcal{P}$ be such that $\int_0^t H_s^2\,d[M, M]_s < \infty$ for each $t \ge 0$. Then $H \in L(M)$ and $H \cdot M$ is a continuous local martingale.

The question may arise as to whether the properties of the stochastic integral stated for left-continuous integrands in the section Properties of Stochastic Integrals continue to hold when we allow predictable integrands. The answer is positive except for Theorems 13 and 14. Still, if $X$ is a semimartingale with paths of finite variation on compacts and if $H \in L(X)$ is such that the Stieltjes integral $\int_0^t |H_s|\,|dX_s|$ exists a.s. for each $t \ge 0$, then the stochastic integral $H \cdot X$ agrees with the Stieltjes integral computed path by path. However, $H \cdot X$ is not necessarily an FV process. See [7, Exercise 45 in Chapter IV] for a counterexample. The analogous result for Theorem 13 is the following, which is particularly useful when one needs to study asymmetric information in financial markets where some traders possess extra information compared to others.

Theorem 27 Let $\mathbb{G}$ be another filtration satisfying the usual hypotheses and suppose that $\mathcal{F}_t \subset \mathcal{G}_t$ for each $t \ge 0$, and that $X$ remains a semimartingale with respect to $\mathbb{G}$. Let $H$ be locally bounded and predictable for $\mathbb{F}$. Then $H$ is locally bounded and predictable for $\mathbb{G}$, the stochastic integral $H_{\mathbb{G}} \cdot X$ exists, and it is equal to $H_{\mathbb{F}} \cdot X$.

It is important to have $H$ locally bounded in the above theorem; see [4] for a counterexample in the context of enlargement of filtrations.

We end this section with the dominated convergence theorem for stochastic integrals.

Theorem 28 Let $X$ be a semimartingale and $(H^n) \subset \mathcal{P}$ be a sequence converging a.s. to a limit $H \in \mathcal{P}$. If there exists a process $G \in L(X)$ such that $|H^n| \le G$ for all $n$, then $H^n \in L(X)$ for all $n$, $H \in L(X)$, and $(H^n \cdot X)$ converges to $H \cdot X$ in UCP.

Concluding Remarks

In this article, we used the approach of Protter [7] to define the semimartingale as a good integrator and construct its stochastic integral. Another approach that is closely related is given by Chou et al. [1], who developed stochastic integration for general predictable integrands with respect to a semimartingale in a space endowed with the semimartingale topology. Historically, the stochastic integral was first proposed for Brownian motion by Itô [3], then for continuous martingales, then for square integrable martingales, and finally for càdlàg processes that can be written as the sum of a locally square integrable local martingale and an FV process by J.L. Doob, H. Kunita, S. Watanabe, P. Courrège, P.A. Meyer, and others. Later, in 1970, Doléans-Dade and Meyer [2] showed that the local square integrability condition could be relaxed, which led to the traditional definition of a semimartingale as a sum of a local martingale and an FV process. A different theory of stochastic integration, the Itô-belated integral, was developed by McShane [5]. It imposed different restrictions on the integrators and the integrands, used a theory of "gauges", and appeared to be very different from the approach here. It turns out, however, that when the integral $\int H\,dX$ made sense both as a stochastic integral in the sense developed here and as an Itô-belated integral, they were indistinguishable. See [6] for a comparison of these two integrals. Another related stochastic integral is the Fisk–Stratonovich (FS) integral, which was developed by Fisk and Stratonovich independently. The FS integral obeys the integration by parts formula for FV processes when at least one of the integrand or the integrator is continuous.

End Notes

a. See Definition 5 for the definition of a predictable process.
b. For a proof of the fact that UCP is metrizable and complete under that metric, see [7].
c. This sum converges to the quadratic variation of $B$ over the interval $[0, t]$, as we see in Theorem 16.

References

[1] Chou, C.S., Meyer, P.A. & Stricker, C. (1980). Sur les intégrales stochastiques de processus prévisibles non bornés, Séminaire de Probabilités, XIV. Lecture Notes in Mathematics, 784, Springer, Berlin, pp. 128–139.
[2] Doléans-Dade, C. & Meyer, P.-A. (1970). Intégrales stochastiques par rapport aux martingales locales, Séminaire de Probabilités, IV. Lecture Notes in Mathematics, 124, Springer, Berlin, pp. 77–107.
[3] Itô, K. (1944). Stochastic integral, Proceedings of the Imperial Academy of Tokyo 20, 519–524.
[4] Jeulin, T. (1980). Semi-martingales et Grossissement d'une Filtration, Lecture Notes in Mathematics, Springer, Berlin, Vol. 833.
[5] McShane, E.J. (1974). Stochastic Calculus and Stochastic Models, Probability and Mathematical Statistics, Academic Press, New York, Vol. 25.
[6] Protter, P. (1979). A comparison of stochastic integrals, The Annals of Probability 7(2), 276–289.
[7] Protter, P. (2005). Stochastic Integration and Differential Equations, 2nd Edition, Version 2.1, Springer, Berlin.

Related Articles

Arbitrage Strategy; Complete Markets; Equivalent Martingale Measures; Filtrations; Itô's Formula; Martingale Representation Theorem; Semimartingale.

UMUT ÇETIN
Equivalence of Probability Measures

In finance it is often important to consider different probability measures. The statistical measure, commonly denoted by $P$, is supposed to (ideally) reflect the real-world dynamics of financial assets. A risk-neutral measure (see Equivalent Martingale Measures), often denoted by $Q$, is the measure of choice for the valuation of derivative securities. Prices of traded assets are supposed to be (local) $Q$-martingales, and hence their dynamics (as seen under $Q$) typically differ from their actual behavior (as modeled under $P$). How far apart can the dynamics under these two measures be in terms of qualitative behavior? We would not expect that events that do not occur in the real world, in the sense that they have $P$-probability zero, like a stock price exploding to infinity, would have positive $Q$-probability in the risk-neutral world. This discussion leads to the notion of absolute continuity.

Definition 1 Let $P$, $Q$ be two probability measures defined on a measurable space $(\Omega, \mathcal{F})$. We say that $Q$ is absolutely continuous with respect to $P$, denoted by $Q \ll P$, if all $P$-zero sets are also $Q$-zero sets. If $Q \ll P$ and $P \ll Q$ we say that $P$ and $Q$ are equivalent, denoted by $P \sim Q$. In other words, two equivalent measures have the same zero sets.

Let $Q \ll P$. By the Radon–Nikodym theorem there exists a density $Z = dQ/dP$ so that for $f \in L^1(Q)$ we can calculate its expectation with respect to $Q$ by

$$E_Q[f] = E_P[Zf]. \tag{1}$$

Note that if $Q$ is absolutely continuous, but not equivalent to $P$, then we have $P(Z = 0) > 0$.

We now look at a dynamic picture, and assume that we also have a filtration $(\mathcal{F}_t)_{0 \le t \le T}$ at our disposal, where $T$ is some fixed finite time horizon. For $t \le T$ let

$$Z_t = E_P[Z \mid \mathcal{F}_t]. \tag{2}$$

We call the martingale $Z = (Z_t)$ the density process of $Q$. The Bayes formula tells us how to calculate conditional expectations with respect to $Q$ in terms of $P$. Let $0 \le s \le t \le T$ and let $f$ be $\mathcal{F}_t$-measurable and in $L^1(Q)$. We then have

$$Z_s E_Q[f \mid \mathcal{F}_s] = E_P[Z_t f \mid \mathcal{F}_s]. \tag{3}$$

As a consequence of Bayes' formula, we get that if $M$ is a $Q$-martingale then $ZM$ is a $P$-martingale and vice versa. Hence, we can turn any $Q$-martingale into a $P$-martingale by just multiplying it with the density process. It follows that the martingale property is not invariant under equivalent measure changes.

There are, however, a couple of important objects like stochastic integrals and quadratic variations which do remain invariant under equivalent measure changes although they depend, by their definition, a priori on some probability measure. Let us illustrate this in the case of the quadratic variation of a semimartingale $S$. This is defined to be the limit in $P$-probability of the sum of the squared $S$-increments over a time grid, for vanishing mesh size. It is elementary that convergence in $P$-probability implies convergence in $Q$-probability if $Q \ll P$, and thus convergence in $P$-probability is equivalent to convergence in $Q$-probability when $P$ and $Q$ are equivalent. This implies, for example, that quadratic variations remain the same under a change to an equivalent probability measure.

The compensator or angle bracket process, however, is not invariant with respect to equivalent measure changes. It is defined (for reasonable processes $S$) as the process $\langle S \rangle$ one has to subtract from the quadratic variation process $[S]$ to turn it into a local martingale. But, as we have seen, the martingale property typically gets lost by switching the measure. As an example, consider a Poisson process $N$ with intensity $\lambda$. We have $[N] = N$, so the compensator equals $\lambda t$. As we shall see below, the effect of an equivalent measure change is that the intensity changes as well, to $\mu$, say, so the compensator under the new measure would be $\mu t$.

Girsanov's Theorem

As we have discussed above, the martingale property is not preserved under measure changes. Fortunately, it turns out that at least the semimartingale property is preserved. Moreover, it is possible to state the precise semimartingale decomposition under the new measure $Q$. This result is known in the literature as Girsanov's theorem, although it was rather Cameron and Martin who proved a first version of

it in a Wiener space setting. Later on it was extended in various levels of generality by Girsanov, Meyer, and Lenglart, among many others.

Let us first give some examples. They are all consequences of the general formulation of Girsanov's theorem to be given below.

1. Let $B$ be a $P$-Brownian motion, $\mu \in \mathbb{R}$, and define an equivalent measure $Q$ by the stochastic exponential

$$\frac{dQ}{dP} = \mathcal{E}(-\mu B)_T = \exp\left(-\mu B_T - \frac{1}{2}\mu^2 T\right). \tag{4}$$

Then $\tilde{B} = B + \mu t$ is a $Q$-Brownian motion (up to time $T$). Alternatively stated, the semimartingale decomposition of $B$ under $Q$ is $B = \tilde{B} - \mu t$. Hence the effect of the measure change is to add a drift term to the Brownian motion.

2. Let $N_t - \lambda t$ be a compensated Poisson process on an interval $[0, T]$ with $P$-intensity $\lambda > 0$, and let $\kappa > 0$. Define an equivalent measure $Q$ by

$$\frac{dQ}{dP} = e^{-\kappa\lambda T} \prod_{0 < s \le T} (1 + \kappa \Delta N_s) = e^{-\kappa\lambda T} (1 + \kappa)^{N_T} = \exp\left(N_T \ln(1 + \kappa) - \kappa\lambda T\right). \tag{5}$$

Then $N$ is a Poisson process on $[0, T]$ under $Q$ with intensity $(1 + \kappa)\lambda$. The process $N_t - (1 + \kappa)\lambda t$ is a compensated Poisson process under $Q$ and thus a $Q$-martingale. Hence the effect of the measure change is to change the intensity of the Poisson process, or in other words, to add a drift term to the compensated Poisson process.

One of the most important applications of measure changes in mathematical finance is to find martingale measures for the price process $S$ of some risky asset.

Definition 2 A martingale measure for $S$ is a probability measure $Q$ such that $S$ is a $Q$-local martingale.

Let us now state a general form of Girsanov's theorem. It is not the most general setting, though, since we will assume that $Q$ is equivalent to $P$, which suffices for most applications in finance. This is due to the fact that one would often choose $Q$ to be a martingale measure for the price process, and then equivalence is a necessary condition to exclude arbitrage opportunities [1]. There is, however, also a result which covers the case where $Q$ is only absolutely continuous, but not equivalent to $P$, and which has been proven by Lenglart [2].

Theorem 1 (Girsanov's Theorem: Standard Version). Let $P \sim Q$, with density process given by

$$Z_t = E\left[\frac{dQ}{dP} \,\Big|\, \mathcal{F}_t\right]. \tag{6}$$

If $S$ is a semimartingale under $P$ with decomposition $S = M + A$ (here $M$ is a local martingale, and $A$ a process of locally finite variation), then $S$ is a semimartingale under $Q$ as well and has decomposition

$$S = \left(M - \int \frac{1}{Z}\,d[Z, M]\right) + \left(A + \int \frac{1}{Z}\,d[Z, M]\right). \tag{7}$$

In particular, $M - \int \frac{1}{Z}\,d[Z, M]$ is a local $Q$-martingale.

In situations where the process $S$ may exhibit jumps, it is often more convenient to apply a version of Girsanov's theorem which uses the angle bracket instead of the quadratic covariation.

Theorem 2 (Girsanov's Theorem: Predictable Version). Let $P \sim Q$, with density process as above, and let $S = M + A$ be a $P$-semimartingale. Given that $\langle Z, M \rangle$ exists (with respect to $P$), the decomposition of $S$ under $Q$ is

$$S = \left(M - \int \frac{1}{Z_-}\,d\langle Z, M \rangle\right) + \left(A + \int \frac{1}{Z_-}\,d\langle Z, M \rangle\right). \tag{8}$$

Here $Z_-$ denotes the left-continuous version of $Z$.

Whereas the standard version of Girsanov's theorem always works, we need an integrability condition (existence of $\langle Z, M \rangle$) for the predictable version.
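The two introductory examples (densities (4) and (5)) can be sanity-checked by weighting samples drawn under $P$ with the density, using $E_Q[f] = E_P[Zf]$ from (1). The sketch below is an illustration under assumptions made here, not the article's method: it only looks at the terminal time $T$, the parameter values are arbitrary, and the helper names (`brownian_check`, `sample_poisson`, `poisson_check`) are hypothetical.

```python
import math
import random


def brownian_check(mu, t_final, n_samples, rng):
    """Estimate E_P[Z_T], E_Q[B_T + mu T] and E_Q[(B_T + mu T)^2] via density (4)."""
    weight_acc = mean_acc = second_acc = 0.0
    for _ in range(n_samples):
        b_t = rng.gauss(0.0, math.sqrt(t_final))              # B_T sampled under P
        z_t = math.exp(-mu * b_t - 0.5 * mu * mu * t_final)   # dQ/dP from (4)
        shifted = b_t + mu * t_final                          # candidate Q-Brownian motion
        weight_acc += z_t
        mean_acc += z_t * shifted
        second_acc += z_t * shifted * shifted
    return weight_acc / n_samples, mean_acc / n_samples, second_acc / n_samples


def sample_poisson(mean, rng):
    """Knuth's multiplication method; adequate for small means."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def poisson_check(lam, kappa, t_final, n_samples, rng):
    """Estimate E_P[Z_T] and E_Q[N_T] via density (5)."""
    weight_acc = mean_acc = 0.0
    for _ in range(n_samples):
        n_t = sample_poisson(lam * t_final, rng)              # N_T sampled under P
        z_t = math.exp(n_t * math.log(1.0 + kappa) - kappa * lam * t_final)
        weight_acc += z_t
        mean_acc += z_t * n_t
    return weight_acc / n_samples, mean_acc / n_samples


rng = random.Random(42)
weight_b, mean_b, second_b = brownian_check(mu=0.5, t_final=1.0, n_samples=100_000, rng=rng)
weight_n, mean_n = poisson_check(lam=2.0, kappa=0.5, t_final=1.0, n_samples=100_000, rng=rng)

print("Brownian: E_P[Z] ~", weight_b, " E_Q[B+mu T] ~", mean_b, " E_Q[(B+mu T)^2] ~", second_b)
print("Poisson : E_P[Z] ~", weight_n, " E_Q[N_T] ~", mean_n, " target", (1 + 0.5) * 2.0 * 1.0)
```

If the examples hold, the density should average to about 1 in both cases, the shifted Brownian value should have mean near 0 and second moment near $T$ under $Q$, and the $Q$-mean of $N_T$ should be near $(1 + \kappa)\lambda T$.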

However, in case $S = M + A$ for a local martingale $M$ and a finite variation process $A$, it is rarely the case in a discontinuous framework that $dA \ll d[M]$, whereas it is quite natural in financial applications that $dA \ll d\langle M \rangle$ (see below).

In mathematical finance, these results are often applied to find a martingale measure for the price process $S$. Consider, for example, the Bachelier model where $S = B + \mu t$ is a Brownian motion plus drift. If we now take, as above, the measure change given by the density process $Z_t = \exp\left(-\mu B_t - \frac{1}{2}\mu^2 t\right)$, then we have (since $dZ = -\mu Z\,dB$)

$$
\begin{aligned}
A + \int \frac{1}{Z}\,d[Z, M]
&= \mu t + \int \frac{1}{Z}\,d\left[-\mu \int Z\,dB,\ B\right] \\
&= \mu t + \int \frac{1}{Z}\,d\left(-\mu \int Z\,dt\right) \\
&= 0.
\end{aligned}
\tag{9}
$$

According to Girsanov's theorem (here the standard version coincides with the predictable one since $S$ is continuous), the price process $S$ is therefore a $Q$-local martingale (and, in fact, a Brownian motion according to Lévy's characterization), and hence $Q$ is a martingale measure for $S$.

More generally, Girsanov's theorem implies an important structural result for the price process $S$ in an arbitrage-free market. As has been mentioned above, it is essentially true that some no-arbitrage property implies the existence of an equivalent martingale measure $Q$ for $S = M + A$, with density process $Z$. Therefore, by the predictable version (8), given that $\langle Z, M \rangle$ exists, we must have

$$A = -\int \frac{1}{Z_-}\,d\langle Z, M \rangle \tag{10}$$

to get that $S$ is a local $Q$-martingale. As it follows from the so-called Kunita–Watanabe inequality that

$$d\langle Z, M \rangle \ll d\langle M \rangle \tag{11}$$

(here $\langle Z, M \rangle$ and $\langle M \rangle$ are interpreted as the associated measures on the nonnegative real line), we conclude that

$$dA \ll d\langle M \rangle \tag{12}$$

and hence there exists some predictable process $\lambda$ such that

$$S = M + \int \lambda\,d\langle M \rangle. \tag{13}$$

For example, in the Bachelier model $S = B + \mu t$ we have that $\langle B \rangle_t = t$, and hence $\lambda$ equals the constant $\mu$.

The predictable version of Girsanov's theorem can now be applied to remove the drift $\int \lambda\,d\langle M \rangle$ as follows: we define a probability measure $Q$ via

$$\frac{dQ}{dP} = \mathcal{E}\left(-\int \lambda\,dM\right)_T, \tag{14}$$

where $\mathcal{E}$ denotes the Doléans-Dade stochastic exponential, assuming that $\mathcal{E}\left(-\int \lambda\,dM\right)$ is a martingale. The corresponding density process $Z$ therefore satisfies the stochastic differential equation

$$dZ = -Z_- \lambda\,dM. \tag{15}$$

It follows that

$$\langle Z, M \rangle = \left\langle -\int Z_- \lambda\,dM,\ M \right\rangle = -\int Z_- \lambda\,d\langle M \rangle \tag{16}$$

and

$$S = M + \int \lambda\,d\langle M \rangle = M - \int \frac{1}{Z_-}\,d\langle Z, M \rangle \tag{17}$$

is, by the predictable version of Girsanov's theorem, a local $Q$-martingale: the drift has been removed by the measure change.

This representation of $S$ has an important consequence for the structure of martingale measures, provided the so-called structure condition holds:

$$\int_0^T \lambda_s^2\,d\langle M \rangle_s < \infty \quad P\text{-a.s.} \tag{18}$$

In that case, the remarkable conclusion we can draw from (13) is that the existence of an equivalent martingale measure for $S$ implies that $S$ is a special semimartingale, that is, its finite variation part is predictable and therefore the semimartingale decomposition (13) is unique. Moreover, the following result holds.

Proposition 1 Let $Q$ be an equivalent martingale measure for $S$, and let the structure condition (18) hold. Then the density process $Z$ of $Q$ with respect to $P$ is given by the stochastic exponential

$$Z = \mathcal{E}\left(-\int \lambda\,dM + L\right) \tag{19}$$

for some process $L$ such that $L$ as well as $[M, L]$ are local $P$-martingales. The converse statement is true as well, assuming that all involved processes are locally bounded: if $Q$ is a probability measure whose density process can be written as in equation (19) with $L$ as above, then $Q$ is a martingale measure for $S$.

This result is fundamental in incomplete markets (see Complete Markets), where there are many equivalent martingale measures for the price process $S$. Indeed, any choice of $L$ as in the statement of the proposition gives one particular pricing measure.

In applications in finance, the density process $Z$ can also be interpreted in terms of a change of numeraire.

References

[1] Delbaen, F. & Schachermayer, W. (2006). The Mathematics of Arbitrage, Springer, Berlin.
[2] Protter, P.E. (2005). Stochastic Integration and Differential Equations, 2nd Edition, Version 2.1, Springer, Heidelberg.

Related Articles

Change of Numeraire; Equivalent Martingale Measures; Semimartingale; Stochastic Exponential; Stochastic Integrals.

THORSTEN RHEINLÄNDER
Skorokhod Embedding

Analysis of a random evolution focuses initially on the behavior at a fixed deterministic, or random, time. The process and time horizon are known and we investigate the marginal law of the process. If we reverse this point of view, we face the embedding problem. We fix a probability distribution and a (well-understood) stochastic process and we try to design a random time such that the process at this time behaves according to the specified distribution. In other words, we know what we want to see and we ask when to look for it.

This Skorokhod embedding problem (SEP), or Skorokhod stopping problem, first formulated and solved by A.V. Skorokhod in 1961 (English translation in 1965 [20]), is thus the problem of representing a given distribution $\mu$ as the distribution of a given stochastic process (such as a Brownian motion) at some stopping time. It has stimulated research in probability theory for over 40 years now: the problem has been changed, generalized, or specialized in various ways. We discuss some key results in the domain, along with the applications in quantitative finance, namely to the computation of robust market-consistent prices and hedges of exotic derivatives.

The Skorokhod Embedding Problem

The SEP can be stated as follows: Given a stochastic process $(X_t : t \ge 0)$ and a probability measure $\mu$, find a minimal stopping time $\tau$ such that $X_\tau$ has the law $\mu$: $X_\tau \sim \mu$.

At first, there seems to be a trivial solution to the SEP when $X_t = B_t$ is a Brownian motion. Write $\Phi$ and $F_\mu$ for the cumulative distribution function of the standard normal distribution and of $\mu$, respectively. Then $F_\mu^{-1}(\Phi(B_1))$ has law $\mu$ and hence the stopping time $\tau = \inf\{t \ge 2 : B_t = F_\mu^{-1}(\Phi(B_1))\}$ satisfies $B_\tau \sim \mu$. However, this solution is intuitively "too large", in particular $\mathbb{E}\tau = \infty$. A meaningful solution needs to be "small". To express this, Skorokhod [20] imposed $\mathbb{E}\tau < \infty$ and solved the problem explicitly for any centered target measure with finite variance. To avoid the restriction on the set of target measures, in general, one requires $\tau$ to be minimal. Minimality of $\tau$ signifies that if a stopping time $\rho$ satisfies $\rho \le \tau$ and $X_\rho \sim X_\tau$ then $\rho = \tau$. When $\mathbb{E}B_\tau = 0$, minimality of $\tau$ is equivalent to $(B_{t \wedge \tau} : t \ge 0)$ being a uniformly integrable martingale (see [6, 12]) and, in consequence, when $\mathbb{E}B_\tau^2 < \infty$, it is further equivalent to $\mathbb{E}\tau < \infty$. Note that we can have many, in fact, infinitely many, minimal stopping times all of which embed the same distribution $\mu$.

We want $\tau$ to be "small" to enable us to iterate the embedding procedure. In this way, Skorokhod [20] represented a random walk as a Brownian motion stopped at an increasing sequence of stopping times and deduced properties of the random walk from the well-understood behavior of Brownian motion. As a simple example, one can use the representation to deduce the central limit theorem from the strong law of large numbers (cf. [14, Sec. 11.2]). The ideas of embedding processes into Brownian motion were extended and finally led to the celebrated work of Monroe [13], who proved that any semimartingale is a time-changed Brownian motion.

The SEP, as stated above, does not necessarily have a solution; existence of a solution depends greatly on $X$ and $\mu$. This can be seen already for real-valued diffusions [6]. However, for Brownian motion on $\mathbb{R}$, or any continuous local martingale $(X_t)$ with $\langle X \rangle_\infty = \infty$ a.s., there is always a solution to the SEP and there are numerous explicit constructions (typically, for the case of centered $\mu$), of which we give two examples below (cf. [14]).

Explicit Solutions

Skorokhod [20] and Dubins [8] solved the SEP for Brownian motion and arbitrary centered$^{a}$ probability measure $\mu$. However, the search for new solutions continued and was, to a large extent, motivated by the properties of the stopping times. Researchers sought simple explicit solutions that would have additional optimal properties. Several solutions were obtained using stopping times of the form

$$\tau = \inf\{t : (A_t, B_t) \in \Gamma\}, \qquad \Gamma = \Gamma(\mu) \subset \mathbb{R}^2 \tag{1}$$

which is a first hitting time for the Markov process $(A_t, B_t)$, where $(A_t)$ is some auxiliary increasing process. We now give two examples.

Consider $A_t = t$ and let $\tau_R$ be the resulting stopping time in (1). Root [17] proved that for any centered $\mu$ there is a barrier $\Gamma = \Gamma(\mu)$ such that $B_{\tau_R} \sim \mu$, where a barrier is a set in $\mathbb{R}_+ \times \mathbb{R}$ (time–space) such that if a point is in $\Gamma$, then all points to the right of it are also in $\Gamma$ (see Figure 1).
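As a minimal sketch of a hitting-time embedding of the form (1), consider the centered two-point measure that puts mass b/(a+b) at -a and mass a/(a+b) at b. The first exit time of the interval (-a, b) embeds this measure, and its barrier does not depend on time. The code below is an illustration under stated assumptions rather than part of the original article: it replaces Brownian motion by a simple symmetric random walk (for which the same exit probabilities hold exactly when a and b are positive integers), and the function names and parameters are hypothetical.

```python
import random

def exit_time_embedding(a, b, rng):
    """Run a simple symmetric random walk from 0 until it leaves (-a, b).

    Returns (stopped value, number of steps). The walk stands in for a
    Brownian motion; a and b are positive integers (illustrative assumption).
    """
    x, steps = 0, 0
    while -a < x < b:
        x += 1 if rng.random() < 0.5 else -1
        steps += 1
    return x, steps

def simulate(a, b, n_paths=20000, seed=0):
    """Estimate P(stopped value = b) and the mean exit time by Monte Carlo."""
    rng = random.Random(seed)
    hits_b, total_steps = 0, 0
    for _ in range(n_paths):
        x, steps = exit_time_embedding(a, b, rng)
        hits_b += (x == b)
        total_steps += steps
    return hits_b / n_paths, total_steps / n_paths

if __name__ == "__main__":
    p_b, mean_tau = simulate(a=2, b=3)
    # Theory for the walk: P(hit b first) = a / (a + b), E[tau] = a * b.
    print(p_b, mean_tau)
```

The mean exit time a*b equals the variance of the target measure, which matches the finite-variance case in which minimality is equivalent to a finite expected stopping time.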

[Figure 1: The barrier $\Gamma$ and Root stopping time $\tau_R$ embedding a uniform law. Axes: time $t$ (horizontal), $B_t$ (vertical).]

Later Rost (cf. [14]) proved an analogous result replacing $\Gamma(\mu)$ with a reversed barrier $\tilde\Gamma = \tilde\Gamma(\mu)$, which is a set in time–space such that if a point is in $\tilde\Gamma$, then all the points to the left of it are also in $\tilde\Gamma$. We denote $\tilde\tau_R$ the first hitting of $\tilde\Gamma(\mu)$. Rost (cf. [14, 19]) proved that for any other solution $\tau$ to the SEP and any positive convex function $f$, we have

$$\mathbb{E} f(\tau_R) \le \mathbb{E} f(\tau) \le \mathbb{E} f(\tilde\tau_R) \tag{2}$$

In financial terms, as we will see, this implies bounds on the prices of volatility derivatives. Given a measure $\mu$, the barrier $\Gamma$ and the reversed barrier $\tilde\Gamma$ are not known explicitly. However, using techniques of partial differential equations, they can be computed numerically together with the bounds in equation (2) (see [9]).

Consider now $A_t = \overline{B}_t = \sup_{u \le t} B_u$ in equation (1). Azéma and Yor [1] proved that, for a probability measure $\mu$ satisfying $\int x\,\mu(dx) = B_0$, the stopping time

$$\tau_{AY} = \inf\{t : \Psi_\mu(B_t) \le \overline{B}_t\}, \quad \text{where } \Psi_\mu(x) = \frac{1}{\mu([x,\infty))} \int_{[x,\infty)} u\,\mu(du) \tag{3}$$

is minimal and $B_{\tau_{AY}} \sim \mu$. The Azéma–Yor stopping time is also optimal as it stochastically maximizes the maximum: $\mathbb{P}(\overline{B}_\tau \ge \alpha) \le \mathbb{P}(\overline{B}_{\tau_{AY}} \ge \alpha)$, for all $\alpha \ge 0$ and any minimal $\tau$ with $B_\tau \sim B_{\tau_{AY}}$. Later, Perkins [16] developed a stopping time $\tau_P$, which, in turn, stochastically minimizes the maximum. As we will see, these two solutions induce upper and lower bounds on the price of a one-touch option.

Applications

Robust Price Bounds

In the standard approach to pricing and hedging, one postulates a model for the underlying, calibrates it to the market prices of liquidly traded vanilla options (see Call Options), and then uses the model to derive prices and associated hedges for exotic over-the-counter products (such as Barrier Options; Lookback Options; Foreign Exchange Options). Prices and hedges will be correct only if the model describes the real world perfectly, which is not very likely. The SEP-driven approach uses the market data to deduce bounds on the prices consistent with no-arbitrage and the associated super-replicating strategies (see Superhedging), which are robust to model misspecification.

Assume absence of arbitrage (see Fundamental Theorem of Asset Pricing) and work under a risk-neutral measure (see Risk-neutral Pricing) so that the forward price process (see Forwards and Futures) $(S_t : t \le T)$ is a martingale. Equivalently, under a simplifying assumption of zero interest rates, $S_t$ is simply the stock price process. We are interested in pricing an exotic option with payoff given by a path-dependent functional $F(S)_T$. Our main example considered below is a one-touch option struck at $\alpha$ that pays 1 if the stock price reaches $\alpha$ before maturity $T$: $O^\alpha(S)_T = \mathbf{1}_{\overline{S}_T \ge \alpha}$, where $\overline{S}_T = \sup_{t \le T} S_t$. It follows from Monroe's theorem that $S_t = B_{\rho_t}$, for a Brownian motion $(B_t)$ with $B_0 = S_0$ and some increasing sequence of stopping times $\rho_t : t \le T$ (possibly relative to an enlarged filtration). We make no other assumptions about the dynamics of the underlying. Instead, we propose to investigate the restrictions induced by the market data.

Suppose, first, that we know the market prices of calls and puts (see Call Options) for all strikes at one maturity $T$. This is equivalent to knowing the distribution $\mu$ of $S_T$ (cf. [3]). Thus, we can see the stopping time $\rho = \rho_T$ as a solution to the SEP for $\mu$. Conversely, given a solution $\tau$ to the SEP for $\mu$, the process $\tilde S_t = B_{\tau \wedge \frac{t}{T-t}}$ is a model for the stock-price process consistent with the observed prices of calls and puts at maturity $T$. In this way, we obtain

a correspondence that allows us to identify market models with solutions to the SEP and vice versa. In consequence, to estimate the fair price of the exotic option $\mathbb{E}F(S)_T$, it suffices to bound $\mathbb{E}F(B)_\tau$ among all solutions $\tau$ to the SEP. More precisely, if $F(S)_T = F(B)_{\rho_T}$ a.s., then we have

$$\inf_{\tau : B_\tau \sim \mu} \mathbb{E}F(B)_\tau \le \mathbb{E}F(S)_T \le \sup_{\tau : B_\tau \sim \mu} \mathbb{E}F(B)_\tau \tag{4}$$

where all stopping times $\tau$ are minimal. Consider, for example, a volatility derivative$^{b}$ paying $F(S)_T = f(\langle S \rangle_T)$, for some positive convex function $f$, and suppose that the underlying $(S_t)$ is continuous. Then, by Dubins–Schwarz theorem, we can take the time change $\rho_t = \langle S \rangle_t$ so that $f(\langle S \rangle_T) = f(\rho_T) = F(B)_{\rho_T}$. Using inequality (2), inequality (4) becomes

$$\mathbb{E} f(\tau_R) \le \mathbb{E} f(\langle S \rangle_T) \le \mathbb{E} f(\tilde\tau_R) \tag{5}$$

where $B_{\tau_R} \sim S_T \sim B_{\tilde\tau_R}$ (cf. [9]).

When $(S_t)$ has jumps, typically one of the bounds in inequality (4) remains true and the other degenerates. In the example of a one-touch option, one sees that $O^\alpha(S)_T \le O^\alpha(B)_{\rho_T}$ and the fair price is always bounded above by $\sup_\tau \{\mathbb{P}(\overline{B}_\tau \ge \alpha) : B_\tau \sim \mu\}$. Furthermore, the supremum is attained by the Azéma–Yor construction discussed above. The best lower bound on the price in the presence of jumps is the obvious bound $\mu([\alpha, \infty))$. In consequence, the price of a one-touch option $\mathbb{E}O^\alpha(S)_T = \mathbb{P}(\overline{S}_T \ge \alpha)$ is bounded by

$$\mu([\alpha, \infty)) \le \mathbb{P}(\overline{S}_T \ge \alpha) \le \mathbb{P}(\overline{B}_{\tau_{AY}} \ge \alpha) = \mu([\Psi_\mu^{-1}(\alpha), \infty)) \tag{6}$$

and the lower bound can be improved to $\mathbb{P}(\overline{B}_{\tau_P} \ge \alpha)$ under the hypothesis that $(S_t)$ is continuous, where $\tau_P$ is Perkins' stopping time (see [5] for detailed discussion and numerical examples). Selling a one-touch option for a lower price than the upper bound in equation (6) necessarily involves some risk. If additional modeling assumptions are made, then a lower price can be justified, but this new price is not necessarily robust to model misspecification.

The above analysis can be extended if we know more market data. For example, knowing prices of puts and calls at some earlier expiry $T_1 < T$ would lead to solving the SEP, constrained by embedding an intermediate law $\mu_1$ before $\mu$. This was achieved by Brown et al. [4] who gave an explicit construction of an embedding that maximizes the maximum. As we have seen, in financial terms, this amounts to obtaining the least upper bound on the price of a one-touch option.

In practice, we do not observe the prices of calls and puts for all strikes but only for a finite family of strikes. As a result, the terminal law of $S_T$ is not specified entirely and one needs to optimize among possible terminal laws (cf. [5, 10]). In general, different sets of market prices lead to embedding problems with different constraints. The resulting problems can be complex. In particular, to our best knowledge, there are no known optimal solutions to the SEP with multiple intermediate law constraints.

Robust Hedging

Once we know the price-range for an option, we want to understand model-free super-replicating strategies (see Superhedging). In general, to achieve this, we need to develop a pathwise approach to the SEP. Following [5], we treat the example of a one-touch option. We develop a super-replicating portfolio with the initial wealth equal to the upper bound displayed in equation (6).

The key observation lies in the following simple inequality:

$$\mathbf{1}_{\overline{S}_T \ge \alpha} \le \frac{(S_T - K)^+}{\alpha - K} + \frac{S_{\varsigma \wedge T} - S_T}{\alpha - K}\,\mathbf{1}_{\overline{S}_T \ge \alpha} \tag{7}$$

where $\alpha > S_0, K$ and $\varsigma = \inf\{t : S_t \ge \alpha\}$. Taking expectations yields $\mathbb{P}(\overline{S}_T \ge \alpha) \le C(K)/(\alpha - K)$, where $C(K)$ denotes the price of a European call with strike $K$ and maturity $T$. Taking the optimal $K = K^*$ such that $C(K^*) = (\alpha - K^*)|C'(K^*)|$ we find $\mathbb{P}(\overline{S}_T \ge \alpha) \le |C'(K^*)| = \mathbb{P}(S_T \ge K^*)$. On the other hand, using $|C'(K)| = \mu([K, \infty))$, where $\mu \sim S_T$, we have

$$C(K) = \int_K^\infty (u - K)\,\mu(du) = |C'(K)|\,\bigl(\Psi_\mu(K) - K\bigr) \tag{8}$$

The equation for $K^*$ implies readily that $K^* = \Psi_\mu^{-1}(\alpha)$ and the bound we have derived coincides with equation (6).

Inequality (7) encodes the super-replicating strategy. The first term of the right-hand side means we buy $1/(\alpha - K^*)$ calls with strike $K^*$. The second

term is a simple dynamic trading: if the price reaches level $\alpha$, we sell $1/(\alpha - K^*)$ forwards on the stock. At the cost of $C_1 = C(K^*)/(\alpha - K^*)$ we are then guaranteed to super-replicate the one-touch regardless of the dynamics of the underlying. In consequence, selling the one-touch for $C_2 > C_1$ would be an arbitrage opportunity as we would make a riskless profit of $C_2 - C_1$. Finally, note that our derivation of the superhedge is pathwise and makes no assumptions about the existence (or uniqueness) of the pricing measure.

Other Resources

The arguments for robust pricing and hedging of lookback (see Lookback Options) and barrier (see Barrier Options) options can be found in the pioneering work of Hobson [10] and in [5]. Dupire [9] investigated volatility derivatives using the SEP. Cox et al. [7] designed pathwise inequalities to derive price range and robust super-replicating strategies for derivatives paying a convex function of the local time (see Local Times; Corridor Variance Swap). The idea of no-arbitrage bounds on the prices goes back to Merton [11] (see Arbitrage Bounds). This was refined in no-good deals (see Good-deal Bounds) pricing, where one postulates that markets not only exclude arbitrage opportunities but also any highly desirable investments. No-good deals pricing yields tighter bounds on the prices but requires an arbitrary choice of utility function.

We refer to [14] for an extended survey of the SEP, including its history and overview of its applications. We have not discussed here the SEP for processes other than Brownian motion. Rost [18] investigated the problem for a general Markov process and has a necessary and sufficient condition on the target measure $\mu$ for existence of an embedding. Bertoin and Le Jan [2] then developed an explicit solution, in a broad class of Markov processes, which was based on additive functionals. More recently, the approach of Vallois [21] was extended to provide explicit solutions for classes of discontinuous processes including Azéma's martingale [15].

Acknowledgments

This research was supported by a Marie Curie Intra-European Fellowship at Imperial College London within the 6th European Community Framework Programme.

End Notes

a. When modeling the stock price process, implicitly we shift both $B$ and $\mu$ by a constant $S_0$.
b. Here, written on the realized quadratic variation of the stock itself and not the log process.

References

[1] Azéma, J. & Yor, M. (1979). Une solution simple au problème de Skorokhod, in Séminaire de Probabilités, XIII, Lecture Notes in Mathematics, Springer, Berlin, Vol. 721, pp. 90–115.
[2] Bertoin, J. & Le Jan, Y. (1992). Representation of measures by balayage from a regular recurrent point, Annals of Probability 20(1), 538–548.
[3] Breeden, D.T. & Litzenberger, R.H. (1978). Prices of state-contingent claims implicit in option prices, The Journal of Business 51(4), 621–651.
[4] Brown, H., Hobson, D. & Rogers, L.C.G. (2001). The maximum maximum of a martingale constrained by an intermediate law, Probability Theory and Related Fields 119(4), 558–578.
[5] Brown, H., Hobson, D. & Rogers, L.C.G. (2001). Robust hedging of barrier options, Mathematical Finance 11(3), 285–314.
[6] Cox, A. & Hobson, D. (2006). Skorokhod embeddings, minimality and non-centered target distributions, Probability Theory and Related Fields 135(3), 395–414.
[7] Cox, A., Hobson, D. & Obłój, J. (2008). Pathwise inequalities for local time: applications to Skorokhod embeddings and optimal stopping, Annals of Applied Probability 18(5), 1870–1896.
[8] Dubins, L.E. (1968). On a theorem of Skorohod, The Annals of Mathematical Statistics 39, 2094–2097.
[9] Dupire, B. (2005). Arbitrage Bounds for Volatility Derivatives as a Free Boundary Problem, http://www.math.kth.se/pde finance/presentations/Bruno.pdf.
[10] Hobson, D. (1998). Robust hedging of the lookback option, Finance and Stochastics 2, 329–347.
[11] Merton, R.C. (1973). Theory of rational option pricing, Bell Journal of Economics and Management Science 4, 141–183.
[12] Monroe, I. (1972). On embedding right continuous martingales in Brownian motion, The Annals of Mathematical Statistics 43, 1293–1311.
[13] Monroe, I. (1978). Processes that can be embedded in Brownian motion, The Annals of Probability 6(1), 42–56.
[14] Obłój, J. (2004). The Skorokhod embedding problem and its offspring, Probability Surveys 1, 321–392.
[15] Obłój, J. (2007). An explicit solution to the Skorokhod embedding problem for functionals of excursions of Markov processes, Stochastic Processes and their Applications 117(4), 409–431.

[16] Perkins, E. (1986). The Cereteli-Davis solution to the $H^1$-embedding problem and an optimal embedding in Brownian motion, in Seminar on stochastic processes, 1985 (Gainesville, Fla., 1985), Progress in Probability and Statistics, Birkhäuser Boston, Boston, Vol. 12, pp. 172–223.
[17] Root, D.H. (1969). The existence of certain stopping times on Brownian motion, The Annals of Mathematical Statistics 40, 715–718.
[18] Rost, H. (1971). The stopping distributions of a Markov Process, Inventiones Mathematicae 14, 1–16.
[19] Rost, H. (1976). Skorokhod stopping times of minimal variance, in Séminaire de Probabilités, X, Lecture Notes in Mathematics, Springer, Berlin, Vol. 511, pp. 194–208.
[20] Skorokhod, A.V. (1965). Studies in the Theory of Random Processes, Addison-Wesley Publishing Co., Reading, Translated from the Russian by Scripta Technica, Inc.
[21] Vallois, P. (1983). Le problème de Skorokhod sur R: une approche avec le temps local, in Séminaire de Probabilités, XVII, Lecture Notes in Mathematics, Springer, Berlin, Vol. 986, pp. 227–239.

Related Articles

Arbitrage Bounds; Arbitrage: Historical Perspectives; Arbitrage Pricing Theory; Arbitrage Strategy; Barrier Options; Complete Markets; Convex Risk Measures; Good-deal Bounds; Hedging; Implied Volatility Surface; Martingales; Model Calibration; Static Hedging; Superhedging.

JAN OBŁÓJ
Markov Processes

A Markov process is a process that evolves in a memoryless way: its future law depends on the past only through the present position of the process. This property can be formalized in terms of conditional expectations: a process $(X_t, t \ge 0)$ adapted to the filtration $(\mathcal{F}_t)_{t \ge 0}$ (representing the information available at time $t$) is a Markov process if

$$\mathbb{E}(f(X_{t+s}) \mid \mathcal{F}_t) = \mathbb{E}(f(X_{t+s}) \mid X_t) \tag{1}$$

for all $s, t \ge 0$ and $f$ bounded and measurable.

The interest of such a process in financial models becomes clear when one observes that the price of an option, or more generally, the value at time $t$ of any future claim with maturity $T$, is given by the general formula (see Risk-neutral Pricing)

$$V_t = \text{value at time } t = \mathbb{E}(\text{discounted payoff at time } T \mid \mathcal{F}_t) \tag{2}$$

where the expectation is computed with respect to a pricing measure (see Equivalent Martingale Measures). The Markov property is a frequent assumption in financial models because it provides powerful tools (semigroup, theory of partial differential equations (PDEs), etc.) for the quantitative analysis of such problems.

Assuming the Markov property (1) for $(S_t, t \ge 0)$, the value $V_t$ of the option can be expressed as

$$V_t = \mathbb{E}(e^{-r(T-t)} f(S_T) \mid \mathcal{F}_t) = \mathbb{E}(e^{-r(T-t)} f(S_T) \mid S_t) \tag{3}$$

so $V_t$ can be expressed as a (deterministic) function of $t, S_t$: $u(t, S_t) = \mathbb{E}(e^{-r(T-t)} f(S_T) \mid S_t)$. Furthermore, this function $u$ is shown to be the solution of a parabolic PDE, the Kolmogorov backward equation.

The goal of this article is to present the Markov processes and their relation with PDEs, and to illustrate the role of Markovian models in various financial problems. We give a general overview of the links between Markov processes and PDEs without giving more details and we focus on the case of Markov processes solution to stochastic differential equations (SDEs).

We will restrict ourselves to $\mathbb{R}^d$-valued Markov processes. The set of Borel subsets of $\mathbb{R}^d$ is denoted by $\mathcal{B}$. In the following, we will denote a Markov process by $(X_t, t \ge 0)$, or simply $X$ when no confusion is possible.

Markov Property and Transition Semigroup

A Markov process retains no memory of where it has been in the past. Only the current state of the process influences its future dynamics. The following definition formalizes this notion:

Definition 1 Let $(X_t, t \ge 0)$ be a stochastic process defined on a probability filtered space $(\Omega, \mathcal{F}_t, \mathbb{P})$ with values in $\mathbb{R}^d$. $X$ is a Markov process if

$$\mathbb{P}(X_{t+s} \in \Gamma \mid \mathcal{F}_t) = \mathbb{P}(X_{t+s} \in \Gamma \mid X_t) \quad \mathbb{P}\text{-a.s.} \tag{4}$$

for all $s, t \ge 0$ and $\Gamma \in \mathcal{B}$. Equation (4) is called the Markov property of the process $X$. The Markov process is called time homogeneous if the law of $X_{t+s}$ conditionally on $X_t = x$ is independent of $t$.

Observe that equation (4) is equivalent to equation (1) and that $X$ is a time-homogeneous Markov process if there exists a positive function $P$ defined on $\mathbb{R}_+ \times \mathbb{R}^d \times \mathcal{B}$ such that

$$P(s, X_t, \Gamma) = \mathbb{P}(X_{t+s} \in \Gamma \mid \mathcal{F}_t) \tag{5}$$

holds $\mathbb{P}$-a.s. for all $t, s \ge 0$ and $\Gamma \in \mathcal{B}$. $P$ is called the transition function of the time homogeneous Markov process $X$.

For the moment, we restrict ourselves to the time-homogeneous case.

Proposition 1 The transition function $P$ of a time-homogeneous Markov process $X$ satisfies

1. $P(t, x, \cdot)$ is a probability measure on $\mathbb{R}^d$ for any $t \ge 0$ and $x \in \mathbb{R}^d$,
2. $P(0, x, \cdot) = \delta_x$ (unit mass at $x$) for any $x \in \mathbb{R}^d$,
3. $P(\cdot, \cdot, \Gamma)$ is measurable for any $\Gamma \in \mathcal{B}$,

and for any $s, t \ge 0$, $x \in \mathbb{R}^d$, $\Gamma \in \mathcal{B}$, $P$ satisfies the Chapman–Kolmogorov property

$$P(t + s, x, \Gamma) = \int_{\mathbb{R}^d} P(s, y, \Gamma)\,P(t, x, dy) \tag{6}$$
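The Chapman–Kolmogorov property (6) can be checked numerically in a simple case. The sketch below is an illustration, not part of the original article: it assumes standard one-dimensional Brownian motion, whose transition function has the Gaussian density N(x, t), and verifies the density form of (6) with a plain Riemann sum. The function names, grid size, and integration window are hypothetical choices.

```python
import numpy as np

def gauss_kernel(t, x, y):
    """Density in y of P(t, x, dy) for standard Brownian motion: N(x, t)."""
    return np.exp(-(y - x) ** 2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)

def chapman_kolmogorov_gap(t, s, x, z, half_width=12.0, n=4001):
    """Absolute gap between p_{t+s}(x, z) and the integral over y of
    p_t(x, y) * p_s(y, z), approximated by a Riemann sum on a uniform grid."""
    y = np.linspace(x - half_width, x + half_width, n)
    dy = y[1] - y[0]
    composed = np.sum(gauss_kernel(t, x, y) * gauss_kernel(s, y, z)) * dy
    return abs(gauss_kernel(t + s, x, z) - composed)

if __name__ == "__main__":
    # The gap should be at the level of floating-point rounding.
    print(chapman_kolmogorov_gap(t=0.7, s=1.3, x=0.2, z=-0.5))
```

The same check applies to any transition density known in closed form; only `gauss_kernel` would need to change.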

From an analytical viewpoint, we can think of the transition function as a Markov semigroup$^{a}$ $(P_t, t \ge 0)$, defined by

$$P_t f(x) := \int_{\mathbb{R}^d} P(t, x, dy)\,f(y) = \mathbb{E}(f(X_t) \mid X_0 = x) \tag{7}$$

in which case the Chapman–Kolmogorov equation becomes the semigroup property

$$P_s P_t = P_{t+s}, \qquad s, t \ge 0 \tag{8}$$

Conversely, given a Markov semigroup $(P_t, t \ge 0)$ and a probability measure $\nu$ on $\mathbb{R}^d$, it is always possible to construct a Markov process $X$ with initial law $\nu$ that satisfies equation (7) (see [9, Th.4.1.1]). The links between PDEs and Markov processes are based on this equivalence between semigroups and Markov processes. This can be expressed through a single object: the infinitesimal generator.

Strong Markov Property, Feller Processes

Recall that a random time $\tau$ is called an $\mathcal{F}_t$-stopping time if $\{\tau \le t\} \in \mathcal{F}_t$ for any $t \ge 0$.

Definition 2 A Markov process $(X_t, t \ge 0)$ with transition function $P(t, x, \Gamma)$ is strong Markov if, for any $\mathcal{F}_t$-stopping time $\tau$,

$$\mathbb{P}(X_{\tau + t} \in \Gamma \mid \mathcal{F}_\tau) = P(t, X_\tau, \Gamma) \tag{9}$$

for all $t \ge 0$ and $\Gamma \in \mathcal{B}$.

Let $C_0(\mathbb{R}^d)$ denote the space of bounded continuous functions on $\mathbb{R}^d$, which vanish at infinity, equipped with the $L^\infty$ norm denoted by $\|\cdot\|$.

Definition 3 A Feller semigroup$^{b}$ is a strongly continuous,$^{c}$ positive, Markov semigroup $(P_t, t \ge 0)$ such that $P_t : C_0(\mathbb{R}^d) \to C_0(\mathbb{R}^d)$ and

$$\begin{aligned} &\forall f \in C_0(\mathbb{R}^d), \quad 0 \le f \Rightarrow 0 \le P_t f \\ &\forall f \in C_0(\mathbb{R}^d)\ \forall x \in \mathbb{R}^d, \quad P_t f(x) \to f(x) \text{ as } t \to 0 \end{aligned} \tag{10}$$

For a Feller semigroup, the corresponding Markov process can be constructed as a strong Markov process.

Theorem 1 ([9] Th.4.2.7). Given a Feller semigroup $(P_t, t \ge 0)$ and any probability measure $\nu$ on $\mathbb{R}^d$, there exists a filtered probability space $(\Omega, \mathcal{F}_t, \mathbb{P})$ and a strong Markov process $(X_t, t \ge 0)$ on this space with values in $\mathbb{R}^d$ with initial law $\nu$ and with transition function $P_t$. A strong Markov process whose semigroup is Feller is called a Feller process.

Infinitesimal Generator

We are now in a position to introduce the key notion of infinitesimal generator of a Feller process.

Definition 4 For a Feller process $(X_t, t \ge 0)$, the infinitesimal generator of $X$ is the (generally unbounded) linear operator $L : \mathcal{D}(L) \to C_0(\mathbb{R}^d)$ defined as follows. We write $f \in \mathcal{D}(L)$ if, for some $g \in C_0(\mathbb{R}^d)$, we have

$$\frac{\mathbb{E}(f(X_t) \mid X_0 = x) - f(x)}{t} \to g(x) \tag{11}$$

when $t \to 0$ for the norm $\|\cdot\|$, and we then define $Lf = g$.

By Theorem 1, an equivalent definition can be obtained by replacing $X$ by its Feller semigroup $(P_t, t \ge 0)$. In particular, for all $f \in \mathcal{D}(L)$,

$$Lf(x) = \lim_{t \to 0} \frac{P_t f(x) - f(x)}{t} \tag{12}$$

An important property of the infinitesimal generator is that it allows one to construct fundamental martingales associated with a Feller process.

Theorem 2 ([21], III.10). Let $X$ be a Feller process on $(\Omega, \mathcal{F}_t, \mathbb{P})$ with infinitesimal generator $L$ such that $X_0 = x \in \mathbb{R}^d$. For all $f \in \mathcal{D}(L)$,

$$f(X_t) - f(x) - \int_0^t Lf(X_s)\,ds \tag{13}$$

defines an $\mathcal{F}_t$-martingale. In particular,

$$\mathbb{E}(f(X_t)) = f(x) + \mathbb{E}\left[\int_0^t Lf(X_s)\,ds\right] \tag{14}$$

As explained earlier, the law of a Markov process is characterized by its semigroup. In most cases, a Feller semigroup can be itself characterized by its infinitesimal generator (the precise conditions for

this to hold are given by the Hille–Yosida theorem, see [21, Th.III.5.1]). For almost all Markov financial models, these conditions are well established and always satisfied (see Examples 1, 2, 3, and 4). As illustrated by equation (14), when $\mathcal{D}(L)$ is large enough, the infinitesimal generator captures the law of the whole dynamics of a Markov process and provides an analytical tool to study the Markov process.

The other major mathematical tool used in finance is the stochastic calculus (see Stochastic integral, Itô formula), which applies to Semimartingales (see [18]). It is therefore crucial for applications to characterize under which conditions a Markov process is a semimartingale. This question is answered for very general processes in [5]. We mention that this is always the case for Feller diffusions, defined later.

Feller Diffusions

Let us consider the particular case of continuous Markov processes, which include the solutions of stochastic differential equations (SDEs).

Definition 5 A Feller diffusion on $\mathbb{R}^d$ is a Feller process $X$ on $\mathbb{R}^d$ that has continuous paths, and such that the domain $\mathcal{D}(L)$ of the generator $L$ of $X$ contains the space $C_K^\infty(\mathbb{R}^d)$ of infinitely differentiable functions of compact support.

Feller diffusions are Markov processes admitting a second-order differential operator as infinitesimal generator.

Theorem 3 For any $f \in C_K^\infty(\mathbb{R}^d)$, the infinitesimal generator $L$ of a Feller diffusion has the form

$$Lf(x) = \frac{1}{2} \sum_{i,j=1}^d a_{ij}(x) \frac{\partial^2 f}{\partial x_i \partial x_j}(x) + \sum_{i=1}^d b_i(x) \frac{\partial f}{\partial x_i}(x) \tag{15}$$

where the functions $a_{ij}(\cdot)$ and $b_i(\cdot)$, $1 \le i, j \le d$, are continuous and the matrix $a = (a_{ij}(x))_{1 \le i,j \le d}$ is nonnegative definite symmetric for all $x \in \mathbb{R}^d$.

Kolmogorov Equations

Observe by equation (12) that the semigroup $P_t$ of a Feller process $X$ satisfies the following differential equation; for all $f \in \mathcal{D}(L)$,

$$\frac{d}{dt} P_t f = L P_t f \tag{16}$$

This equation is called Kolmogorov's backward equation. In particular, if $L$ is a differential operator (e.g., if $X$ is a Feller diffusion), the function $u(t, x) = P_t f(x)$ is the solution of the PDE

$$\begin{cases} \dfrac{\partial u}{\partial t} = Lu \\ u(0, x) = f(x) \end{cases} \tag{17}$$

Conversely, if this PDE admits a unique solution, then its solution is given by

$$u(t, x) = \mathbb{E}(f(X_t) \mid X_0 = x) \tag{18}$$

This is the simplest example of a probabilistic interpretation of the solution of a PDE in terms of a Markov process.

Moreover, because Feller semigroups are strongly continuous, it is easy to check that the operators $P_t$ and $L$ commute. Therefore, equation (16) may be rewritten as

$$\frac{d}{dt} P_t f = P_t L f \tag{19}$$

This equation is known as Kolmogorov's forward equation. It is the weak formulation of the equation

$$\frac{d}{dt} \mu_t^x = L^* \mu_t^x \tag{20}$$

where the probability measure $\mu_t^x$ on $\mathbb{R}^d$ denotes the law of $X_t$ conditioned on $X_0 = x$ and where $L^*$ is the adjoint operator of $L$. In particular, with the notation of Theorem 3, if $X$ is a Feller diffusion and if $\mu_t^x(dy)$ admits a density $q(x; t, y)$ with respect to Lebesgue's measure on $\mathbb{R}^d$ (which holds, e.g., if the functions $b_i(x)$ and $a_{ij}(x)$ are bounded and locally Lipschitz, if the functions $a_{ij}(x)$ are globally Hölder and if the matrix $a(x)$ is uniformly positive definite [10, Th.6.5.2]), the forward Kolmogorov equation is the weak form (in the sense of the distribution theory) of the PDE

$$\frac{\partial}{\partial t} q(x; t, y) = -\sum_{i=1}^d \frac{\partial}{\partial y_i}\bigl(b_i(y)\,q(x; t, y)\bigr) + \frac{1}{2} \sum_{i,j=1}^d \frac{\partial^2}{\partial y_i \partial y_j}\bigl(a_{ij}(y)\,q(x; t, y)\bigr) \tag{21}$$
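Equation (21) can be sanity-checked in a case where the density is known in closed form. The sketch below is an illustration under stated assumptions, not part of the original article: it takes d = 1 with constant coefficients b and a, so that the law of X_t started at x is N(x + bt, at), and evaluates the residual of (21) by central finite differences, with the factor 1/2 on the second-order term as in the generator (15). Function names, parameter values, and the step size h are hypothetical.

```python
import numpy as np

def density(t, y, x=0.0, b=0.3, a=0.5):
    """Transition density q(x; t, y) of dX = b dt + sqrt(a) dB: N(x + b t, a t)."""
    return np.exp(-(y - x - b * t) ** 2 / (2.0 * a * t)) / np.sqrt(2.0 * np.pi * a * t)

def fokker_planck_residual(t, y, b=0.3, a=0.5, h=1e-3):
    """Central-difference residual of dq/dt + b dq/dy - (a/2) d2q/dy2.

    For constant coefficients this is the 1-d form of equation (21), so the
    residual should vanish up to discretization error.
    """
    q = lambda tt, yy: density(tt, yy, b=b, a=a)
    q_t = (q(t + h, y) - q(t - h, y)) / (2.0 * h)
    q_y = (q(t, y + h) - q(t, y - h)) / (2.0 * h)
    q_yy = (q(t, y + h) - 2.0 * q(t, y) + q(t, y - h)) / h ** 2
    return q_t + b * q_y - 0.5 * a * q_yy

if __name__ == "__main__":
    grid = np.linspace(-2.0, 3.0, 11)
    print(np.max(np.abs(fokker_planck_residual(1.0, grid))))
```

Dropping the factor 0.5 in the last line of `fokker_planck_residual` leaves a residual that no longer vanishes, which is a quick way to confirm the normalization of the second-order term.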

Equation (21) is known as the Fokker–Planck equation and gives another family of PDEs that have probabilistic interpretations. The Fokker–Planck equation has applications in finance for quantiles, Value at Risk, or risk measure computations [22], whereas Kolmogorov's backward equation (17) is more suited to financial problems related to the hedging of derivatives products or portfolio allocation (see the section "Parabolic PDEs Associated to Markov Processes", and sequel).

Time-inhomogeneous Markov Processes

The law of a time-inhomogeneous Markov process is described by the doubly indexed family of operators $(P_{s,t}, 0 \le s \le t)$ where, for any bounded measurable $f$ and any $x \in \mathbb{R}^d$,

$$P_{s,t} f(x) = \mathbb{E}(f(X_t) \mid X_s = x) \tag{22}$$

Then, the semigroup property becomes, for $s \le t \le r$,

$$P_{s,t} P_{t,r} = P_{s,r} \tag{23}$$

Definition 3 of Feller semigroups can be generalized to time-inhomogeneous processes as follows. The time-inhomogeneous Markov process $X$ is called a Feller time-inhomogeneous process if $(P_{s,t}, 0 \le s \le t)$ is a family of positive, Markov linear operators on $C_0(\mathbb{R}^d)$ which is strongly continuous in the sense

$$\forall s \ge 0,\ x \in \mathbb{R}^d,\ f \in C_0(\mathbb{R}^d), \quad \|P_{s,t} f - f\| \to 0 \text{ as } t \to s \tag{24}$$

In this case, it is possible to generalize the notion of infinitesimal generator. For any $t$, let

$$L_t f(x) = \lim_{s \to 0} \frac{P_{t,t+s} f(x) - f(x)}{s} = \lim_{s \to 0} \frac{\mathbb{E}\bigl[f(X_{t+s}) \mid X_t = x\bigr] - f(x)}{s} \tag{25}$$

for any $f \in C_0(\mathbb{R}^d)$ such that $L_t f \in C_0(\mathbb{R}^d)$ and the limit above holds in the sense of the norm $\|\cdot\|$. The set of such $f \in C_0(\mathbb{R}^d)$ is called the domain $\mathcal{D}(L_t)$ of the operator $L_t$. $(L_t, t \ge 0)$ is called the family of time-inhomogeneous infinitesimal generators of the process $X$.

All the results on Feller processes stated earlier can be easily transposed to the time-inhomogeneous case, observing that if $(X_t, t \ge 0)$ is a time-inhomogeneous Markov process on $\mathbb{R}^d$, then $(\tilde X_t, t \ge 0)$, where $\tilde X_t = (t, X_t)$, is a time-homogeneous Markov process on $\mathbb{R}_+ \times \mathbb{R}^d$. Moreover, if $X$ is time-inhomogeneous Feller, it is elementary to check that the process $\tilde X$ is time-homogeneous Feller as defined in Definition 3. Its semigroup $(\tilde P_t, t \ge 0)$ is linked to the time-inhomogeneous semigroup by the relation

$$\tilde P_t f(s, x) = \mathbb{E}[f(s + t, X_{s+t}) \mid X_s = x] = \bigl(P_{s,s+t} f(s + t, \cdot)\bigr)(x) \tag{26}$$

for all bounded and measurable $f : \mathbb{R}_+ \times \mathbb{R}^d \to \mathbb{R}$. If $\tilde L$ denotes the infinitesimal generator of the process $\tilde X$, it is elementary to check that, for any $f(t, x) \in \mathcal{D}(\tilde L)$ that is differentiable with respect to $t$, with derivative uniformly continuous in $(t, x)$, $x \mapsto f(t, x)$ belongs to $\mathcal{D}(L_t)$ for any $t \ge 0$ and

$$\tilde L f(t, x) = \frac{\partial f}{\partial t}(t, x) + \bigl(L_t f(t, \cdot)\bigr)(x) \tag{27}$$

On this observation, it is possible to apply Theorem 3 to time-inhomogeneous Feller diffusions, defined as continuous time-inhomogeneous Feller processes with infinitesimal generators $(L_t, t \ge 0)$ such that $C_K^\infty(\mathbb{R}^d) \subset \mathcal{D}(L_t)$ for any $t \ge 0$. For such processes, there exist continuous functions $b_i$ and $a_{ij}$, $1 \le i, j \le d$, from $\mathbb{R}_+ \times \mathbb{R}^d$ to $\mathbb{R}$ such that the matrix $a(t, x) = (a_{ij}(t, x))_{1 \le i,j \le d}$ is symmetric nonnegative definite and

$$L_t f(x) = \frac{1}{2} \sum_{i,j=1}^d a_{ij}(t, x) \frac{\partial^2 f}{\partial x_i \partial x_j}(x) + \sum_{i=1}^d b_i(t, x) \frac{\partial f}{\partial x_i}(x) \tag{28}$$

for all $t \ge 0$, $x \in \mathbb{R}^d$ and $f \in C_K^\infty(\mathbb{R}^d)$.

For more details on time-inhomogeneous Markov processes, we refer to [10].

Example 1 Brownian Motion The standard one-dimensional Brownian motion $(B_t, t \ge 0)$ is a Feller diffusion in $\mathbb{R}$ ($d = 1$) such that $B_0 = 0$ and for

which the parameters of Theorem 3 are $b = 0$ and $a = 1$. The Brownian motion is the fundamental prototype of Feller diffusions. Other diffusions are inherited from this process because they can be expressed as solutions to SDEs driven by independent Brownian motions (see later). Similarly, the standard $d$-dimensional Brownian motion is a vector of $d$ independent standard one-dimensional Brownian motions and corresponds to the case $b_i = 0$ and $a_{ij} = \delta_{ij}$ for $1 \le i, j \le d$, where $\delta_{ij}$ is the Kronecker delta function ($\delta_{ij} = 1$ if $i = j$ and 0 otherwise).

Example 2 Black–Scholes Model In the Black–Scholes model, the underlying asset price $S_t$ follows a geometric Brownian motion with constant drift $\mu$ and volatility $\sigma$:

$$S_t = S_0 \exp\bigl((\mu - \sigma^2/2)\,t + \sigma B_t\bigr) \tag{29}$$

where $B$ is a standard Brownian motion. With Itô's formula, it is easily checked that $S$ is a Feller diffusion with infinitesimal generator

$$Lf(x) = \mu x f'(x) + \tfrac{1}{2} \sigma^2 x^2 f''(x) \tag{30}$$

Itô's formula also yields

$$S_t = S_0 + \mu \int_0^t S_s\,ds + \sigma \int_0^t S_s\,dB_s \tag{31}$$

which can be written as the SDE

$$dS_t = \mu S_t\,dt + \sigma S_t\,dB_t \tag{32}$$

The correspondence between the SDE and the second-order differential operator $L$ appears below as a general fact.

Example 3 Stochastic Differential Equations SDEs are probably the most used Markov models in finance. Solutions of SDEs are examples of Feller diffusions. When the parameters $b_i$ and $a_{ij}$ of Theorem 3 are sufficiently regular, a Feller process $X$ with generator equation (15) can be constructed as the solution of the SDE

$$dX_t = b(X_t)\,dt + \sigma(X_t)\,dB_t \tag{33}$$

where $b(x) \in \mathbb{R}^d$ is $(b_1(x), \ldots, b_d(x))$, where the $d \times r$ matrix $\sigma(x)$ satisfies $a_{ij}(x) = \sum_{k=1}^r \sigma_{ik}(x)\,\sigma_{jk}(x)$ (i.e., $a = \sigma \sigma'$) and where $B_t$ is an $r$-dimensional standard Brownian motion. For example, when $d = r$, one can take for $\sigma(x)$ the symmetric square root matrix of the matrix $a(x)$.

The construction of Markov solutions to the SDE (33) with generator (15) is possible if $b$ and $\sigma$ are globally Lipschitz with linear growth [13, Th.5.2.9], or if $b$ and $a$ are bounded and continuous functions [13, Th.5.4.22]. In the second case, the SDE has a solution in a weaker sense. Uniqueness (at least in law) and the strong Markov property hold if $b$ and $\sigma$ are locally Lipschitz [13, Th.5.2.5], or if $b$ and $a$ are Hölder continuous and the matrix $a$ is uniformly positive definite [13, Rmk.5.4.30, Th.5.4.20]. In the one-dimensional case, existence and uniqueness for the SDE (32) can be proved under weaker assumptions [13, Sec.5.5].

In all these cases, the Markov property allows one to identify the SDE (33) with its generator (15). This will allow us to make the link between parabolic PDEs and the corresponding SDE in the section "Parabolic PDEs Associated to Markov Processes" and sequel.

Similarly, one can associate to the time-inhomogeneous SDE

$$dX_t = b(t, X_t)\,dt + \sigma(t, X_t)\,dB_t \tag{34}$$

the time-inhomogeneous generators (28). Existence for this SDE holds if $b_i$ and $\sigma_{ij}$ are globally Lipschitz in $x$ and locally bounded (uniqueness holds if $b_i$ and $\sigma_{ij}$ are only locally Lipschitz in $x$). As earlier, in this case, a solution to equation (34) is strong Markov. We refer the reader to [16] for more details.

Example 4 Backward Stochastic Differential Equations Backward stochastic differential equations are SDEs where a random variable is given as a terminal condition. Let us motivate the definition of a backward SDE (BSDE) by continuing the study of the elementary example of the introduction of this article.

Consider an asset $S_t$ modeled by the Black–Scholes SDE (32) and assume that it is possible to borrow and lend cash at a constant risk-free interest rate $r$. A self-financed trading strategy is determined by an initial portfolio value and the amount $\pi_t$ of the portfolio value placed in the risky asset at time $t$. Given the stochastic process $(\pi_t, t \ge 0)$, the portfolio

value $V_t$ at time $t$ solves the SDE

$$dV_t = rV_t\,dt + \pi_t(\mu - r)\,dt + \sigma\pi_t\,dB_t \tag{35}$$

where $B$ is the Brownian motion driving the dynamics (32) of the risky asset $S$.

Assume that this portfolio serves to hedge a call option with strike $K$ and maturity $T$. This problem can be expressed as finding a couple of processes $(V_t, \pi_t)$ adapted to the Brownian filtration $\mathcal{F}_t = \sigma(B_s, s \le t)$ such that

$$V_t = (S_T - K)^+ - \int_t^T \left(rV_s + \pi_s(\mu - r)\right)ds - \int_t^T \sigma\pi_s\,dB_s \tag{36}$$

Such SDEs with terminal condition and with unknown process driving the Brownian integral are called BSDEs. This particular BSDE admits a unique solution (see the section "Quasi- and Semilinear PDEs and BSDEs") and can be explicitly solved. Because $V_0$ is $\mathcal{F}_0$-measurable, it is nonrandom and therefore $V_0$ is the usual arbitrage-free price of the option. In particular, choosing $\mu = r$, we recover the usual formula for the arbitrage-free price $V_0 = \mathbb{E}[e^{-rT}(S_T - K)^+]$, and the quantity of risky asset $\pi_t/S_t$ in the portfolio is given by the Black–Scholes $\Delta$-hedge $\partial u/\partial x\,(t, S_t)$, where $u(t, x)$ is the solution of the Black–Scholes PDE (see Exchange Options)

$$\begin{cases} \dfrac{\partial u}{\partial t} + rx\dfrac{\partial u}{\partial x} + \dfrac{\sigma^2}{2}x^2\dfrac{\partial^2 u}{\partial x^2} - ru = 0 & \forall (t, x) \in [0, T) \times (0, +\infty) \\ u(T, x) = f(x) & \forall x \in (0, +\infty) \end{cases} \tag{37}$$

with $f(x) = (x - K)^+$ for the call option considered here. Applying Itô's formula to $u(t, S_t)$, an elementary computation shows that $u(t, S_t)$ solves the same SDE (35) with $\mu = r$ as $V_t$, with the same terminal condition. Therefore, by uniqueness, $V_t = u(t, S_t)$. Usually, for more general BSDEs, $(\pi_t, t \ge 0)$ is an implicit process given by the martingale representation theorem. In the section "Quasi- and Semilinear PDEs and BSDEs", we give results on the existence and uniqueness of solutions of BSDEs, and on their links with nonlinear PDEs.

Discontinuous Markov Processes

In financial models, it is sometimes natural to consider discontinuous Markov processes, for example, when one wants to take into account jumps in prices. This can sometimes be done by modeling the dynamics using Poisson processes, Lévy processes, or other jump processes (see Jump Processes). In particular, it is possible to define SDEs where the Brownian motion is replaced by a Lévy process (see CGMY model, NIG model, or Generalized hyperbolic model for examples). In this situation, the generator is an integro-differential operator and the parabolic PDE is replaced by a partial integro-differential equation (see Partial integro-differential Equations).

Dimension of the State Space

In many pricing/hedging problems, the dimension of the pricing PDE is greater than the dimension of the state space of the underlyings. In such cases, the financial problem is apparently related to non-Markov stochastic processes. However, it can usually be expressed in terms of Markov processes if one increases the dimension of the process considered. For example, in the context of Markov short rates $(r_t, t \ge 0)$, the pricing of a zero-coupon bond is expressed in terms of the process $R_t = \int_0^t r_s\,ds$, which is not Markovian, whereas the couple $(r_t, R_t)$ is Markovian. For Asian options on a Markov asset, the couple formed by the asset and its integral is Markovian. If the asset involves a stochastic volatility that is itself the solution to an SDE (see Heston model and SABR model), then the couple formed by the asset value and its volatility is Markov. As mentioned earlier, another important example is given by time-inhomogeneous Markov processes that become time homogeneous when one considers the couple formed by the current time and the original process.

In some cases, the dimension of the system can be reduced while preserving the Markovian nature of the problem. In the case of the portfolio management of multidimensional Black–Scholes prices with deterministic volatility matrix, mean return vector, and interest rate, the dimension of the problem is actually reduced to one (see Merton problem). When the volatility matrix, the mean return vector, and the interest rate are Markov processes of dimension $d'$, the dimension of the problem is reduced to $d' + 1$.
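The short-rate example of this section can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: it takes Vasicek dynamics $dr_t = \kappa(\theta - r_t)\,dt + \sigma\,dB_t$ for the short rate (a model not specified in the text), uses a plain Euler scheme, and the names `step` and `zero_coupon_price` are hypothetical. Its purpose is to show that one update of the pair $(r_t, R_t)$ needs only the current pair, which is the sense in which the augmented process is Markovian.

```python
import math
import random


def step(state, dt, kappa, theta, sigma, rng):
    """One Euler step of the augmented state (r, R).

    The update reads only the current pair, so (r, R) is a Markov chain
    even though R alone is not.
    """
    r, big_r = state
    dw = rng.gauss(0.0, math.sqrt(dt))
    r_next = r + kappa * (theta - r) * dt + sigma * dw  # assumed Vasicek dynamics
    big_r_next = big_r + r * dt                         # R_t = int_0^t r_s ds
    return (r_next, big_r_next)


def zero_coupon_price(r0, maturity, kappa, theta, sigma,
                      n_steps=50, n_paths=5000, seed=0):
    """Monte Carlo estimate of E[exp(-R_T)] for the pair started at (r0, 0)."""
    rng = random.Random(seed)
    dt = maturity / n_steps
    total = 0.0
    for _ in range(n_paths):
        state = (r0, 0.0)
        for _ in range(n_steps):
            state = step(state, dt, kappa, theta, sigma, rng)
        total += math.exp(-state[1])
    return total / n_paths
```

Calling `zero_coupon_price(0.03, 1.0, 0.5, 0.04, 0.01)` returns an estimate of the bond price for these illustrative parameters; the same pattern (carry the running integral alongside the asset) applies to the Asian option case mentioned above.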

Parabolic PDEs Associated to Markov Processes

Computing the value of any future claim with fixed maturity (for example, the price of a European option on an asset that solves an SDE), or solving an optimal portfolio management problem, amounts to solving a parabolic second-order PDE, that is, a PDE of the form

$$\frac{\partial u}{\partial t}(t, x) + L_t u(t, x) = f(t, x, u(t, x), \nabla u(t, x)), \quad (t, x) \in \mathbb{R}_+ \times \mathbb{R}^d \tag{38}$$

where $\nabla u(t, x)$ is the gradient of $u(t, x)$ with respect to $x$ and the linear differential operator $L_t$ has the form (28).

The goal of this section is to explain the links between these PDEs and the original diffusion process, or some intermediate Markov process. We will distinguish between linear parabolic PDEs, where the function $f(t, x, y, z)$ does not depend on $z$ and is linear in $y$, semilinear parabolic PDEs, where the function $f(t, x, y, z)$ does not depend on $z$ but is nonlinear in $y$, and quasi-linear parabolic PDEs, where the function $f(t, x, y, z)$ is nonlinear in $(y, z)$. We will also discuss the links between diffusion processes and some fully nonlinear PDEs (Hamilton–Jacobi–Bellman (HJB) equations or variational inequalities) of the form

$$F\left(t, \frac{\partial u}{\partial t}(t, x), u(t, x), \nabla u(t, x), Hu(t, x)\right) = 0, \quad (t, x) \in \mathbb{R}_+ \times \mathbb{R}^d \tag{39}$$

for some nonlinear function $F$, where $Hu$ denotes the Hessian matrix of $u$ with respect to the space variable $x$.

Such problems involve several notions of solutions discussed in the literature (see viscosity solution). In the sections "Brownian Motion, Ornstein–Uhlenbeck Process, and the Heat Equation" and "Linear Case", we consider classical solutions, that is, solutions that are continuously differentiable with respect to the time variable, and twice continuously differentiable with respect to the space variables. In the sections "Quasi- and Semilinear PDEs and BSDEs" and "Optimal Control, Hamilton–Jacobi–Bellman Equations, and Variational Inequalities", because of the nonlinearity of the problem, classical solutions may not exist, and one must consider the weaker notion of viscosity solutions.

In the section "Brownian Motion, Ornstein–Uhlenbeck Process, and the Heat Equation", we consider heat-like equations where the solution can be explicitly computed. The section "Linear Case" deals with linear PDEs, the section "Quasi- and Semilinear PDEs and BSDEs" deals with quasi- and semilinear PDEs and their links with BSDEs, and the section "Optimal Control, Hamilton–Jacobi–Bellman Equations, and Variational Inequalities" deals with optimal control problems.

Brownian Motion, Ornstein–Uhlenbeck Process, and the Heat Equation

The heat equation is the first example of a parabolic PDE with a basic probabilistic interpretation (for which there is no need of stochastic calculus):

$$\begin{cases} \dfrac{\partial u}{\partial t}(t, x) = \dfrac{1}{2}\Delta u(t, x), & (t, x) \in (0, +\infty) \times \mathbb{R}^d \\ u(0, x) = f(x), & x \in \mathbb{R}^d \end{cases} \tag{40}$$

where $\Delta$ denotes the Laplacian operator of $\mathbb{R}^d$. When $f$ is a bounded measurable function, it is well known that the solution of this problem is given by the formula

$$u(t, x) = \int_{\mathbb{R}^d} f(y)\,g(x; t, y)\,dy \tag{41}$$

where

$$g(x; t, y) = \frac{1}{(2\pi t)^{d/2}} \exp\left(-\frac{|x - y|^2}{2t}\right) \tag{42}$$

and $|\cdot|$ denotes the Euclidean norm on $\mathbb{R}^d$. The function $g$ is often called the fundamental solution of the heat equation. We recognize that $g(x; t, y)\,dy$ is the law of $x + B_t$, where $B$ is a standard $d$-dimensional Brownian motion. Therefore, equation (41) may be rewritten as

$$u(t, x) = \mathbb{E}[f(x + B_t)] \tag{43}$$

which provides a simple probabilistic interpretation of the solution of the heat equation in $\mathbb{R}^d$ as a particular case of equation (18). Note that equation (40)
 
involves the infinitesimal generator $(1/2)\Delta$ of the Brownian motion.

Let us mention two other cases where the link between PDEs and stochastic processes can be made without stochastic calculus. The first one is the Black–Scholes model, solution to the SDE

$$dS_t = S_t(\mu\,dt + \sigma\,dB_t) \tag{44}$$

When $d = 1$, its infinitesimal generator is $Lf(x) = \mu x f'(x) + (\sigma^2/2)x^2 f''(x)$ and its law at time $t$ when $S_0 = x$ is $l(x; t, y)\,dy$, where

$$l(x; t, y) = \frac{1}{\sigma y\sqrt{2\pi t}} \exp\left(-\frac{1}{2\sigma^2 t}\left[\log\frac{y}{x} - \left(\mu - \frac{\sigma^2}{2}\right)t\right]^2\right) \tag{45}$$

Then, for any bounded and measurable $f$, elementary computations show that

$$u(t, x) = \int_0^{\infty} f(y)\,l(x; t, y)\,dy \tag{46}$$

satisfies

$$\begin{cases} \dfrac{\partial u}{\partial t}(t, x) = Lu(t, x), & (t, x) \in (0, +\infty)^2 \\ u(0, x) = f(x), & x \in (0, +\infty) \end{cases} \tag{47}$$

Here again, this formula immediately gives the probabilistic interpretation

$$u(t, x) = \mathbb{E}[f(S_t) \mid S_0 = x] \tag{48}$$

The last example is the Ornstein–Uhlenbeck process in $\mathbb{R}$

$$dX_t = \beta X_t\,dt + \sigma\,dB_t \tag{49}$$

with $\beta \in \mathbb{R}$, $\sigma > 0$ and $X_0 = x$. The infinitesimal generator of this process is $Af(x) = \beta x f'(x) + (\sigma^2/2)f''(x)$. It can be easily checked that $X_t$ is a Gaussian random variable with mean $x\exp(\beta t)$ and variance $\sigma^2(\exp(2\beta t) - 1)/2\beta$, with the convention that $(\exp(2\beta t) - 1)/2\beta = t$ if $\beta = 0$. Therefore, its probability density function is given by

$$h(x; t, y) = \sqrt{\frac{\beta}{\sigma^2\pi(\exp(2\beta t) - 1)}}\,\exp\left(-\frac{\beta\,(y - x\exp(\beta t))^2}{\sigma^2(\exp(2\beta t) - 1)}\right) \tag{50}$$

Then, for any bounded and measurable $f$,

$$u(t, x) = \int_{\mathbb{R}} f(y)\,h(x; t, y)\,dy = \mathbb{E}[f(X_t) \mid X_0 = x] \tag{51}$$

is solution of

$$\begin{cases} \dfrac{\partial u}{\partial t}(t, x) = Au(t, x), & (t, x) \in (0, +\infty) \times \mathbb{R} \\ u(0, x) = f(x), & x \in \mathbb{R} \end{cases} \tag{52}$$

Linear Case

The probabilistic interpretations of the previous PDEs can be generalized to a large class of linear parabolic PDEs with arbitrary second-order differential operator, interpreted as the infinitesimal generator of a Markov process. Assume that the vector $b(t, x) \in \mathbb{R}^d$ and the $d \times r$ matrix $\sigma(t, x)$ are uniformly bounded and locally Lipschitz functions on $[0, T] \times \mathbb{R}^d$ and consider the SDE in $\mathbb{R}^d$

$$dX_t = b(t, X_t)\,dt + \sigma(t, X_t)\,dB_t \tag{53}$$

where $B$ is a standard $r$-dimensional Brownian motion. Set $a = \sigma\sigma^{\top}$ and assume also that the $d \times d$ matrix $a(t, x)$ is uniformly Hölder and satisfies the uniform ellipticity condition: there exists $\gamma > 0$ such that for all $(t, x) \in [0, T] \times \mathbb{R}^d$ and $\xi \in \mathbb{R}^d$,

$$\sum_{i,j=1}^{d} a_{ij}(t, x)\,\xi_i\xi_j \ge \gamma|\xi|^2 \tag{54}$$

Let $(L_t)_{t \ge 0}$ be the family of time-inhomogeneous infinitesimal generators of the Feller diffusion $X_t$ solution to the SDE (53), given by equation (28). Consider the Cauchy problem

$$\begin{cases} \dfrac{\partial u}{\partial t}(t, x) + L_t u(t, x) + c(t, x)u(t, x) = f(t, x), & (t, x) \in [0, T) \times \mathbb{R}^d \\ u(T, x) = g(x), & x \in \mathbb{R}^d \end{cases} \tag{55}$$

where $c(t, x)$ is uniformly bounded and locally Hölder on $[0, T] \times \mathbb{R}^d$, $f(t, x)$ is locally Hölder on $[0, T] \times \mathbb{R}^d$, $g(x)$ is continuous on $\mathbb{R}^d$ and

$$|f(t, x)| + |g(x)| \le A\exp(a|x|), \quad \forall (t, x) \in [0, T] \times \mathbb{R}^d \tag{56}$$

for some constants $A, a > 0$. Under these conditions, it follows easily from Theorems 6.4.5 and 6.4.6 of [10] that equation (55) admits a unique classical solution $u$ such that

$$|u(t, x)| \le A'\exp(a|x|) \quad \forall (t, x) \in [0, T] \times \mathbb{R}^d \tag{57}$$

for some constant $A' > 0$.

The following result is known as the Feynman–Kac formula and can be deduced from equation (57) using exactly the same method as for [10, Th.6.5.3] and using the fact that, under our assumptions, $X_t$ has finite exponential moments [10, Th.6.4.5].

Theorem 4 Under the previous assumptions, the solution of the Cauchy problem (55) is given by

$$u(t, x) = \mathbb{E}\left[g(X_T)\exp\left(\int_t^T c(s, X_s)\,ds\right) \,\Big|\, X_t = x\right] - \mathbb{E}\left[\int_t^T f(s, X_s)\exp\left(\int_t^s c(\alpha, X_\alpha)\,d\alpha\right)ds \,\Big|\, X_t = x\right] \tag{58}$$

Let us mention that this result can be extended to parabolic linear PDEs on bounded domains [10, Th.6.5.2] and to elliptic linear PDEs on bounded domains [10, Th.6.5.1].

Example 5 European Options The Feynman–Kac formula has many applications in finance. Let us consider the case of a European option on a one-dimensional Markov asset $(S_t, t \ge 0)$ with payoff $g(S_u, 0 \le u \le T)$. The arbitrage-free value at time $t$ of this option is

$$V_t = \mathbb{E}[e^{-r(T-t)}g(S_u, t \le u \le T) \mid \mathcal{F}_t] \tag{59}$$

By the Markov property (1), this quantity only depends on $S_t$ and $t$ [10, Th.2.1.2]. The Feynman–Kac formula (58) allows one to characterize $V$ in the case where $g$ depends only on $S_T$ and $S$ is a Feller diffusion.

Most often, the asset SDE

$$dS_t = S_t\left(\mu(t, S_t)\,dt + \sigma(t, S_t)\,dB_t\right) \tag{60}$$

cannot satisfy the uniform ellipticity assumption (54) in the neighborhood of 0. Therefore, Theorem 4 does not apply directly. This is a general difficulty for financial models. However, in most cases (and in all the examples below), it can be overcome by taking the logarithm of the asset price. In our case, we assume that the process $(\log S_t, 0 \le t \le T)$ is a Feller diffusion on $\mathbb{R}$ with time-inhomogeneous generator

$$L_t\phi(y) = \tfrac{1}{2}a(t, y)\phi''(y) + b(t, y)\phi'(y) \tag{61}$$

that satisfies the assumptions of Theorem 4. This holds, for example, for the Black–Scholes model (32). This assumption implies that $S$ is a Feller diffusion on $(0, +\infty)$ whose generator takes the form

$$\tilde{L}_t\phi(x) = \tfrac{1}{2}\tilde{a}(t, x)x^2\phi''(x) + \tilde{b}(t, x)x\phi'(x) \tag{62}$$

where $\tilde{a}(t, x) = a(t, \log x)$ and $\tilde{b}(t, x) = b(t, \log x) + a(t, \log x)/2$.

Assume also that $g(x)$ is continuous on $\mathbb{R}_+$ with polynomial growth when $x \to +\infty$. Then, by Theorem 4, the function

$$v(t, y) = \mathbb{E}\left[e^{-r(T-t)}g(S_T) \mid \log S_t = y\right] \tag{63}$$

is solution to the Cauchy problem

$$\begin{cases} \dfrac{\partial v}{\partial t}(t, y) + L_t v(t, y) - rv(t, y) = 0, & (t, y) \in [0, T) \times \mathbb{R} \\ v(T, y) = g(\exp(y)), & y \in \mathbb{R} \end{cases} \tag{64}$$
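The representation (63) can be checked numerically in the simplest case. The sketch below assumes the Black–Scholes model under the pricing measure, so that $\log S$ has constant coefficients $a = \sigma^2$ and $b = r - \sigma^2/2$ in (61), and it compares a Monte Carlo estimate of the conditional expectation with the closed-form call price. The function names are hypothetical and the parameter values in the usage line are illustrative only.

```python
import math
import random


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_closed_form(x, strike, r, sigma, tau):
    """Black-Scholes call price for spot x and time to maturity tau > 0."""
    d1 = (math.log(x / strike) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return x * norm_cdf(d1) - strike * math.exp(-r * tau) * norm_cdf(d2)


def feynman_kac_mc(y, payoff, r, sigma, tau, n_paths=50000, seed=0):
    """Estimate v(t, y) = E[exp(-r*tau) g(S_T) | log S_t = y].

    log S_T is sampled exactly: it is Gaussian with mean
    y + (r - sigma^2/2) * tau and variance sigma^2 * tau.
    """
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma**2) * tau
    vol = sigma * math.sqrt(tau)
    total = 0.0
    for _ in range(n_paths):
        log_s_T = y + drift + vol * rng.gauss(0.0, 1.0)
        total += payoff(math.exp(log_s_T))
    return math.exp(-r * tau) * total / n_paths
```

For instance, `feynman_kac_mc(math.log(100.0), lambda s: max(s - 100.0, 0.0), 0.05, 0.2, 1.0)` should agree with `bs_call_closed_form(100.0, 100.0, 0.05, 0.2, 1.0)` up to Monte Carlo error.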

Making the change of variable $x = \exp(y)$, $u(t, x) = v(t, \log x)$ is solution to

$$\begin{cases} \dfrac{\partial u}{\partial t}(t, x) + \tilde{b}(t, x)x\dfrac{\partial u}{\partial x}(t, x) + \dfrac{1}{2}\tilde{a}(t, x)x^2\dfrac{\partial^2 u}{\partial x^2}(t, x) - ru(t, x) = 0, & (t, x) \in [0, T) \times (0, +\infty) \\ u(T, x) = g(x), & x \in (0, +\infty) \end{cases} \tag{65}$$

and $V_t = u(t, S_t)$. The Black–Scholes PDE (37) is a particular case of this result.

Example 6 An Asian Option We give an example of a path-dependent option for which the uniform ellipticity condition of the matrix $a$ does not hold. An Asian option is an option where the payoff is determined by the average of the underlying price over the period considered. Consider the Asian call option

$$\left(\frac{1}{T}\int_0^T S_u\,du - K\right)^+ \tag{66}$$

on a Black–Scholes asset $(S_t, t \ge 0)$ following

$$dS_t = rS_t\,dt + \sigma S_t\,dB_t \tag{67}$$

where $B$ is a standard one-dimensional Brownian motion. The arbitrage-free price at time $t$ is

$$\mathbb{E}\left[e^{-r(T-t)}\left(\frac{1}{T}\int_0^T S_u\,du - K\right)^+ \,\Big|\, S_t\right] \tag{68}$$

To apply the Feynman–Kac formula, one must express this quantity as the (conditional) expectation of the value at time $T$ of some Markov quantity. This can be done by introducing the process

$$A_t = \int_0^t S_u\,du, \quad 0 \le t \le T \tag{69}$$

It is straightforward to check that $(S, A)$ is a Feller diffusion on $(0, +\infty)^2$ with infinitesimal generator

$$Lf(x, y) = rx\frac{\partial f}{\partial x}(x, y) + \frac{\sigma^2}{2}x^2\frac{\partial^2 f}{\partial x^2}(x, y) + \frac{1}{T}x\frac{\partial f}{\partial y}(x, y) \tag{70}$$

Even after the change of variable $(\log S, A)$, Theorem 4 does not apply to this process because the infinitesimal generator is degenerate (it has no second-order derivative in $y$). Formally, the Feynman–Kac formula would give that

$$u(t, x, y) := \mathbb{E}[e^{-r(T-t)}(A_T/T - K)^+ \mid S_t = x, A_t = y] \tag{71}$$

is solution to the PDE

$$\begin{cases} \dfrac{\partial u}{\partial t} + \dfrac{\sigma^2}{2}x^2\dfrac{\partial^2 u}{\partial x^2} + rx\dfrac{\partial u}{\partial x} + \dfrac{1}{T}x\dfrac{\partial u}{\partial y} - ru = 0, & (t, x, y) \in [0, T) \times (0, +\infty) \times \mathbb{R} \\ u(T, x, y) = (y/T - K)^+, & (x, y) \in (0, +\infty) \times \mathbb{R} \end{cases} \tag{72}$$

Actually, it is possible to justify the previous statement in the specific case of a one-dimensional Black–Scholes asset: $u$ can be written as

$$u(t, x, y) = e^{-r(T-t)}\,x\,\varphi\left(t, \frac{KT - y}{x}\right) \tag{73}$$

(see [20]) where $\varphi(t, z)$ is the solution of the one-dimensional parabolic PDE

$$\begin{cases} \dfrac{\partial\varphi}{\partial t}(t, z) + \dfrac{\sigma^2 z^2}{2}\dfrac{\partial^2\varphi}{\partial z^2}(t, z) - \left(\dfrac{1}{T} + rz\right)\dfrac{\partial\varphi}{\partial z}(t, z) + r\varphi(t, z) = 0, & (t, z) \in [0, T) \times \mathbb{R} \\ \varphi(T, z) = (-z)^+/T, & z \in \mathbb{R} \end{cases} \tag{74}$$

From this, it is easy to check that $u$ solves equation (72).

Note that this relies heavily on the fact that the underlying asset follows the Black–Scholes model. As far as we know, no rigorous justification of

the Feynman–Kac formula is available for Asian options on more general assets.

Quasi- and Semilinear PDEs and BSDEs

The link between quasi- and semilinear PDEs and BSDEs is motivated by the following formal argument. Consider the semilinear PDE

$$\begin{cases} \dfrac{\partial u}{\partial t}(t, x) + L_t u(t, x) = f(u(t, x)), & (t, x) \in (0, T) \times \mathbb{R} \\ u(T, x) = g(x), & x \in \mathbb{R} \end{cases} \tag{75}$$

where $(L_t)$ is the family of infinitesimal generators of a time-inhomogeneous Feller diffusion $(X_t, t \ge 0)$. Assume that this PDE admits a classical solution $u(t, x)$. Assume also that we can find a unique adapted process $(Y_t, 0 \le t \le T)$ such that

$$Y_t = \mathbb{E}\left[g(X_T) - \int_t^T f(Y_s)\,ds \,\Big|\, \mathcal{F}_t\right] \quad \forall t \in [0, T] \tag{76}$$

Now, by Itô's formula applied to $u(t, X_t)$,

$$u(t, X_t) = \mathbb{E}\left[g(X_T) - \int_t^T f(u(s, X_s))\,ds \,\Big|\, \mathcal{F}_t\right] \tag{77}$$

Therefore, $Y_t = u(t, X_t)$ and the stochastic process $Y$ provides a probabilistic interpretation of the solution of the PDE (75). Now, by the martingale decomposition theorem, if $Y$ satisfies (76), there exists an adapted process $(Z_t, 0 \le t \le T)$ such that

$$Y_t = g(X_T) - \int_t^T f(Y_s)\,ds - \int_t^T Z_s\,dB_s \quad \forall t \in [0, T] \tag{78}$$

where $B$ is the same Brownian motion as the one driving the Feller diffusion $X$. In other words, $Y$ is solution of the SDE $dY_t = f(Y_t)\,dt + Z_t\,dB_t$ with terminal condition $Y_T = g(X_T)$.

The following definition of a BSDE generalizes the previous situation. Given functions $b_i(t, x)$ and $\sigma_{ij}(t, x)$ that are globally Lipschitz in $x$ and locally bounded ($1 \le i, j \le d$) and a standard $d$-dimensional Brownian motion $B$, consider the unique solution $X$ of the time-inhomogeneous SDE

$$dX_t = b(t, X_t)\,dt + \sigma(t, X_t)\,dB_t \tag{79}$$

with initial condition $X_0 = x$. Consider also two functions $f : [0, T] \times \mathbb{R}^d \times \mathbb{R}^k \times \mathbb{R}^{k \times d} \to \mathbb{R}^k$ and $g : \mathbb{R}^d \to \mathbb{R}^k$. We say that $((Y_t, Z_t), t \ge 0)$ solve the BSDE

$$dY_t = f(t, X_t, Y_t, Z_t)\,dt + Z_t\,dB_t \tag{80}$$

with terminal condition $g(X_T)$ if $Y$ and $Z$ are progressively measurable processes with respect to the Brownian filtration $\mathcal{F}_t = \sigma(B_s, s \le t)$ such that, for any $0 \le t \le T$,

$$Y_t = g(X_T) - \int_t^T f(s, X_s, Y_s, Z_s)\,ds - \int_t^T Z_s\,dB_s \tag{81}$$

Example 4 corresponds to $g(x) = (x - K)^+$, $f(t, x, y, z) = ry + z(\mu - r)/\sigma$ and $Z_t = \sigma\pi_t$, as can be read off by comparing equations (36) and (81). Note that the role of the implicit unknown process $Z$ is to make $Y$ adapted.

The existence and uniqueness of $(Y, Z)$ solving equation (81) hold under the assumptions that $g(x)$ is continuous with polynomial growth in $x$, $f(t, x, y, z)$ is continuous with polynomial growth in $x$ and linear growth in $y$ and $z$, and $f$ is uniformly Lipschitz in $y$ and $z$. Let us denote by (A) all these assumptions. We refer to [17] for the proof of this result and the general theory of BSDEs (see also forward-backward SDEs).

Consider the quasi-linear parabolic PDE

$$\begin{cases} \dfrac{\partial u}{\partial t}(t, x) + L_t u(t, x) = f(t, x, u(t, x), \nabla_x u(t, x)\sigma(t, x)), & (t, x) \in (0, T) \times \mathbb{R}^d \\ u(T, x) = g(x), & x \in \mathbb{R}^d \end{cases} \tag{82}$$

The following results give the links between the BSDE (80) and the PDE (82).
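The BSDE of Example 4 can be illustrated numerically before turning to these results. The sketch below is a discrete-time approximation under stated assumptions: it simulates the Black–Scholes asset exactly on a time grid with real-world drift $\mu$, starts the portfolio at the Black–Scholes price, and updates it with an Euler discretization of the portfolio SDE (35) using $\pi_t = S_t\,\partial u/\partial x(t, S_t)$ (so that $Z_t = \sigma\pi_t$). The function names and parameter values are hypothetical. If the representation is right, the terminal portfolio value should be close to the terminal condition $(S_T - K)^+$ on every path, whatever the value of $\mu$.

```python
import math
import random


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(x, strike, r, sigma, tau):
    """Black-Scholes call price u(t, x) and delta du/dx, with tau = T - t > 0."""
    d1 = (math.log(x / strike) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    price = x * norm_cdf(d1) - strike * math.exp(-r * tau) * norm_cdf(d2)
    return price, norm_cdf(d1)


def hedging_errors(s0, strike, mu, r, sigma, maturity,
                   n_steps=200, n_paths=300, seed=0):
    """Return V_T - (S_T - K)^+ for each simulated path."""
    rng = random.Random(seed)
    dt = maturity / n_steps
    errors = []
    for _ in range(n_paths):
        s = s0
        v, _ = bs_call(s0, strike, r, sigma, maturity)  # Y_0 = u(0, S_0)
        for k in range(n_steps):
            tau = maturity - k * dt
            _, delta = bs_call(s, strike, r, sigma, tau)
            pi = delta * s  # amount held in the risky asset; Z = sigma * pi
            z = rng.gauss(0.0, 1.0)
            s_next = s * math.exp((mu - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * z)
            # Discretized portfolio SDE (35), written as r(V - pi) dt + pi dS/S
            v += r * (v - pi) * dt + pi * (s_next - s) / s
            s = s_next
        errors.append(v - max(s - strike, 0.0))
    return errors
```

The residual errors come only from rebalancing on a finite grid and shrink as `n_steps` grows; changing `mu` moves the simulated paths but not the initial value, in line with the remark that $V_0$ is nonrandom.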

Theorem 5 ([15], Th.4.1). Assume that $b(t, x)$, $\sigma(t, x)$, $f(t, x, y, z)$, and $g(x)$ are continuous and differentiable with respect to the space variables $x, y, z$ with uniformly bounded derivatives. Assume also that $b$, $\sigma$, and $f$ are uniformly bounded and that $a = \sigma\sigma^{\top}$ is uniformly elliptic. Then equation (82) admits a unique classical solution $u$ and

$$Y_t = u(t, X_t) \quad \text{and} \quad Z_t = \nabla_x u(t, X_t)\,\sigma(t, X_t) \tag{83}$$

Theorem 6 ([17], Th.2.4). Assume (A) and that $b(t, x)$ and $\sigma(t, x)$ are globally Lipschitz in $x$ and locally bounded. Define the function $u(t, x) = Y_t^{t,x}$, where $Y^{t,x}$ is the solution to the BSDE (80) on the time interval $[t, T]$, where $X$ is solution to the SDE (79) with initial condition $X_t = x$. Then $u$ is a viscosity solution of equation (82).

Theorem 5 gives an interpretation of the solution of a BSDE in terms of the solution of a quasi-linear PDE. In particular, in Example 4, it gives the usual interpretation of the hedging strategy $\pi_t = Z_t/\sigma$ as the $\Delta$-hedge of the option price. Note also that Theorem 5 implies that the process $(X, Y, Z)$ is Markov, a fact which is not obvious from the definition. Conversely, Theorem 6 shows how to construct a viscosity solution of a quasi-linear PDE from BSDEs.

BSDEs provide an indirect tool to compute quantities related to a solution $X$ of the SDE (such as the hedging price and strategy of an option based on the process $X$). BSDEs also have links with general stochastic control problems, which we will not discuss (see BSDEs). Here, we give an example of application to the pricing of an American put option.

Example 7 Pricing of an American Put Option Consider a Black–Scholes underlying asset $S$ and assume for simplicity that the risk-free interest rate $r$ is zero. The price of an American put option on $S$ with strike $K$ and maximal exercise time $T$ is given by

$$\sup_{0 \le \tau \le T} \mathbb{E}^*[(K - S_\tau)^+] \tag{84}$$

where $\tau$ is a stopping time and where $\mathbb{P}^*$ is the risk-neutral probability measure, under which the process $S$ is simply a Black–Scholes asset with zero drift.

In the case of a European put option, the price is given by the solution of the BSDE

$$Y_t = (K - S_T)^+ - \int_t^T Z_s\,dB_s \tag{85}$$

by a similar argument as in Example 4. In the case of an American put option, the price at time $t$ is necessarily at least $(K - S_t)^+$. It is therefore natural to include this condition by considering the BSDE (85) reflected on the obstacle $(K - S_t)^+$. Mathematically, this corresponds to the problem of finding adapted processes $Y$, $Z$, and $R$ such that

$$\begin{cases} Y_t = (K - S_T)^+ - \displaystyle\int_t^T Z_s\,dB_s + R_T - R_t \\ Y_t \ge (K - S_t)^+ \\ R \text{ is continuous, increasing, } R_0 = 0 \text{ and } \displaystyle\int_0^T [Y_t - (K - S_t)^+]\,dR_t = 0 \end{cases} \tag{86}$$

The process $R$ increases only when $Y_t = (K - S_t)^+$, in such a way that $Y$ cannot cross this obstacle. The existence of a solution of this problem is a particular case of general results (see [7]). As a consequence of the following theorem, this reflected BSDE gives a way to compute the price of the American put option.

Theorem 7 ([7], Th.7.2). The American put option has the price $Y_0$, where $(Y, Z, R)$ solves the reflected BSDE (86).

The essential argument of the proof is the following. Fix $t \in [0, T)$ and a stopping time $\tau \in [t, T]$. Since

$$Y_\tau - Y_t = R_t - R_\tau + \int_t^\tau Z_s\,dB_s \tag{87}$$

and because $R$ is increasing, $Y_t = \mathbb{E}^*[Y_\tau + R_\tau - R_t \mid \mathcal{F}_t] \ge \mathbb{E}^*[(K - S_\tau)^+ \mid \mathcal{F}_t]$. Conversely, if $\tau_t^* = \inf\{u \in [t, T] : Y_u = (K - S_u)^+\}$, because $Y > (K - S)^+$ on $[t, \tau_t^*)$, $R$ is constant on this interval and

$$Y_t = \mathbb{E}^*[Y_{\tau_t^*} + R_{\tau_t^*} - R_t \mid \mathcal{F}_t] = \mathbb{E}^*[(K - S_{\tau_t^*})^+ \mid \mathcal{F}_t] \tag{88}$$

Therefore,

$$Y_t = \operatorname*{ess\,sup}_{t \le \tau \le T} \mathbb{E}^*[(K - S_\tau)^+ \mid \mathcal{F}_t] \tag{89}$$

which gives another interpretation for the solution $Y$ of the reflected BSDE. Applying this for $t = 0$ yields $Y_0 = \sup_{\tau \le T} \mathbb{E}^*[(K - S_\tau)^+]$ as stated.

Moreover, as shown by the previous computation, the process $Y$ provides an interpretation of the optimal exercise policy as the first time where $Y$ hits the obstacle $(K - S)^+$. This fact is actually natural from equation (89); the optimal exercise policy is the first time where the current payoff equals the maximal future expected payoff.

As will appear in the next section, where it arises as the solution of an optimal stopping problem, if $S_0 = x$, the price of this American put option is $u(0, x)$, where $u$ is the solution of the nonlinear PDE

$$\begin{cases} \min\left\{u(t, x) - (K - x)^+ \,;\, -\dfrac{\partial u}{\partial t}(t, x) - \dfrac{\sigma^2}{2}x^2\dfrac{\partial^2 u}{\partial x^2}(t, x)\right\} = 0, & (t, x) \in (0, T) \times (0, +\infty) \\ u(T, x) = (K - x)^+, & x \in (0, +\infty) \end{cases} \tag{90}$$

Therefore, similarly as in Theorem 6, the reflected BSDE (86) provides a probabilistic interpretation of the solution of this PDE.

The (formal) essential argument of the proof of this result can be summarized as follows (for details, see [14, Section V.3.1]). Consider the solution $u$ of equation (90) and apply Itô's formula to $u(t, S_t)$. Then, for any stopping time $\tau \in [0, T]$,

$$u(0, x) = \mathbb{E}[u(\tau, S_\tau)] - \mathbb{E}\left[\int_0^\tau \left(\frac{\partial u}{\partial t}(t, S_t) + \frac{\sigma^2}{2}S_t^2\frac{\partial^2 u}{\partial x^2}(t, S_t)\right)dt\right] \tag{91}$$

Because $u$ is solution of equation (90), $u(0, x) \ge \mathbb{E}[u(\tau, S_\tau)] \ge \mathbb{E}[(K - S_\tau)^+]$. Hence, $u(0, x) \ge \sup_{0 \le \tau \le T} \mathbb{E}[(K - S_\tau)^+]$.

Conversely, if $\tau^* = \inf\{0 \le t \le T : u(t, S_t) = (K - S_t)^+\}$, then

$$\frac{\partial u}{\partial t}(t, S_t) + \frac{\sigma^2}{2}S_t^2\frac{\partial^2 u}{\partial x^2}(t, S_t) = 0 \quad \forall t \in [0, \tau^*] \tag{92}$$

Therefore, for $\tau = \tau^*$, all the inequalities in the previous computation are equalities and $u(0, x) = \sup_{0 \le \tau \le T} \mathbb{E}[(K - S_\tau)^+]$.

Optimal Control, Hamilton–Jacobi–Bellman Equations, and Variational Inequalities

We discuss only two main families of stochastic control problems: finite horizon problems and optimal stopping problems. Other classes of optimal control problems appearing in finance are mentioned at the end of this section.

Finite Horizon Problems

The study of optimal control problems with finite horizon is motivated, for example, by the questions of portfolio management, quadratic hedging of options, or super-hedging cost for uncertain volatility models.

Let us consider a controlled diffusion $X^\alpha$ in $\mathbb{R}^d$ solution to the SDE

$$dX_t^\alpha = b(X_t^\alpha, \alpha_t)\,dt + \sigma(X_t^\alpha)\,dB_t \tag{93}$$

where $B$ is a standard $r$-dimensional Brownian motion and the control $\alpha$ is a given progressively measurable process taking values in some compact metric space $A$. Such a control is called admissible. For simplicity, we consider the time-homogeneous case and we assume that the control does not act on the diffusion coefficient $\sigma$ of the SDE. Assume that $b(x, a)$ is bounded, continuous, and Lipschitz in the variable $x$, uniformly in $a \in A$. Assume also that $\sigma$ is Lipschitz and bounded. For any $a \in A$, we introduce the linear differential operator

$$L^a\varphi = \frac{1}{2}\sum_{i,j=1}^{d}\left(\sum_{k=1}^{r}\sigma_{ik}(x)\sigma_{jk}(x)\right)\frac{\partial^2\varphi}{\partial x_i\partial x_j} + \sum_{i=1}^{d} b_i(x, a)\frac{\partial\varphi}{\partial x_i} \tag{94}$$

which is the infinitesimal generator of $X^\alpha$ when $\alpha$ is a constant equal to $a \in A$.
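Returning to the American put of Example 7, the obstacle structure of equation (90) has a simple discrete analogue. The sketch below is an illustration rather than the method of the references cited above: it uses a Cox–Ross–Rubinstein binomial tree (an approximation chosen here for convenience), and the function name `put_tree` and all parameter values are hypothetical. At each node the value is the maximum of the immediate payoff $(K - x)^+$ and the discounted expected value one step later, which mirrors the two branches of the minimum in (90). An interest rate parameter is kept for generality, whereas Example 7 sets $r = 0$.

```python
import math


def put_tree(s0, strike, r, sigma, maturity, n_steps=200, american=True):
    """Put price on a Cox-Ross-Rubinstein tree by backward induction."""
    dt = maturity / n_steps
    up = math.exp(sigma * math.sqrt(dt))
    down = 1.0 / up
    disc = math.exp(-r * dt)
    p = (math.exp(r * dt) - down) / (up - down)  # risk-neutral up probability
    # Terminal condition u(T, x) = (K - x)^+; index j counts up-moves
    values = [max(strike - s0 * up**j * down**(n_steps - j), 0.0)
              for j in range(n_steps + 1)]
    for n in range(n_steps - 1, -1, -1):
        for j in range(n + 1):
            # Discounted expectation of the next-step values (continuation value)
            cont = disc * (p * values[j + 1] + (1.0 - p) * values[j])
            if american:
                payoff = max(strike - s0 * up**j * down**(n - j), 0.0)
                cont = max(cont, payoff)  # value never falls below the obstacle
            values[j] = cont
    return values[0]
```

With a positive rate, `put_tree(..., american=True)` exceeds the European value obtained with `american=False`. In the zero-rate setting of Example 7 the two values coincide on the tree, because the payoff is convex and the asset is a martingale, so the continuation value never falls below the obstacle.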
 
A typical form of finite horizon optimal control problems in finance consists in computing

$$u(t, x) = \inf_{\alpha \text{ admissible}} \mathbb{E}\left[e^{-rT}g(X_T^\alpha) + \int_t^T e^{-rt}f(X_t^\alpha, \alpha_t)\,dt \,\Big|\, X_t^\alpha = x\right] \tag{95}$$

where $f$ and $g$ are continuous and bounded functions, and in finding an optimal control $\alpha^*$ that realizes the minimum. Moreover, it is desirable to find a Markov optimal control, that is, an optimal control having the form $\alpha_t^* = \psi(t, X_t)$. Indeed, in this case, the controlled diffusion $X^{\alpha^*}$ is a Markov process.

In the case of a nondegenerate diffusion coefficient, we have the following link between optimal control problems and a semilinear PDE.

Theorem 8 Under the additional assumption that $\sigma$ is uniformly elliptic, $u$ is the unique bounded classical solution of the Hamilton–Jacobi–Bellman (HJB) equation

$$\begin{cases} \dfrac{\partial u}{\partial t}(t, x) + \inf_{a \in A}\{L^a u(t, x) + f(x, a)\} - ru(t, x) = 0, & (t, x) \in (0, T) \times \mathbb{R}^d \\ u(T, x) = g(x), & x \in \mathbb{R}^d \end{cases} \tag{96}$$

Furthermore, a Markov control $\alpha_t^* = \psi(t, X_t)$ is optimal for a fixed initial condition $x$ and initial time $t = 0$ if and only if

$$L^{\psi(t,x)}u(t, x) + f(x, \psi(t, x)) = \inf_{a \in A}\{L^a u(t, x) + f(x, a)\} \tag{97}$$

for almost every $(t, x) \in [0, T] \times \mathbb{R}^d$.

This is Theorem III.2.3 of [3] restricted to the case of precise controls (see later).

Here again, the essential argument of the proof can be easily (at least formally) written: consider any admissible control $\alpha$ and the corresponding controlled diffusion $X^\alpha$ with initial condition $X_0 = x$. By Itô's formula applied to $e^{-rt}v(t, X_t^\alpha)$, where $v$ is the solution of equation (96),

$$\mathbb{E}[e^{-rT}v(T, X_T^\alpha)] = v(0, x) + \int_0^T e^{-rt}\,\mathbb{E}\left[\frac{\partial v}{\partial t}(t, X_t^\alpha) + L^{\alpha_t}v(t, X_t^\alpha) - rv(t, X_t^\alpha)\right]dt \tag{98}$$

Therefore, by equation (96),

$$v(0, x) \le \mathbb{E}\left[e^{-rT}g(X_T^\alpha) + \int_0^T e^{-rt}f(X_t^\alpha, \alpha_t)\,dt \,\Big|\, X_0^\alpha = x\right] \tag{99}$$

for any admissible control $\alpha$. Now, for the Markov control $\alpha^*$ defined in Theorem 8, all the inequalities in the previous computation are equalities. Hence $v = u$.

The cases where $\sigma$ is not uniformly elliptic or where $\sigma$ also depends on the current control $\alpha_t$ are much more difficult. In both cases, it is necessary to enlarge the set of admissible controls by considering relaxed controls, that is, controls that take values in the set $\mathcal{P}(A)$ of probability measures on $A$. For such a control $\alpha$, the terms $b(x, \alpha_t)$ and $f(x, \alpha_t)$ in equations (93) and (95) are replaced by $\int b(x, a)\,\alpha_t(da)$ and $\int f(x, a)\,\alpha_t(da)$, respectively. The admissible controls of the original problem correspond to relaxed controls that are Dirac masses at each time. These are called precise controls.

The value $\tilde{u}$ of this new problem is defined as in equation (95), but the infimum is taken over all progressively measurable processes $\alpha$ taking values in $\mathcal{P}(A)$. It is possible to prove under general assumptions that both problems give the same value: $\tilde{u} = u$ (cf. [3, Cor.I.2.1] or [8, Th.2.3]).

In these cases, one usually cannot prove the existence of a classical solution of equation (96). The weaker notion of viscosity solution is generally the correct one. In all the cases treated in the literature, $u = \tilde{u}$ solves the same HJB equation as in Theorem 8, except that the infimum is taken over $\mathcal{P}(A)$ instead of $A$ (cf. [3, Th.IV.2.2] for the case without control on $\sigma$). However, it is in general far from trivial to obtain a result on precise controls from the result on relaxed controls. This is due to the fact that

usually no result is available on the existence and the characterization of a Markov-relaxed optimal control. The only examples where it has been done require restrictive assumptions (cf. [8, Cor.6.8]). However, in most financial applications, the value function $u$ is the most useful information. In practice, one usually only needs to compute a control that gives an expected value arbitrarily close to the optimal one.

Optimal Stopping Problems

Optimal stopping problems arise in finance, for example, in the pricing of American options (when to sell a claim or an asset?) or in production models (when to extract or produce a good? when to stop production?).

Let us consider a Feller diffusion $X$ in $\mathbb{R}^d$ solution to the SDE

$$dX_t = b(t, X_t)\,dt + \sigma(t, X_t)\,dB_t \tag{100}$$

where $B$ is a standard $d$-dimensional Brownian motion. As in equation (28), let $(L_t)_{t \ge 0}$ denote its family of time-inhomogeneous infinitesimal generators. Denote by $\mathcal{T}(t, T)$ the set of stopping times valued in $[t, T]$.

A typical form of optimal stopping problems consists in computing

$$u(t, x) = \inf_{\tau \in \mathcal{T}(t, T)} \mathbb{E}\left[e^{-r(\tau - t)}g(\tau, X_\tau) + \int_t^\tau e^{-r(s - t)}f(s, X_s)\,ds \,\Big|\, X_t = x\right] \tag{101}$$

and in characterizing an optimal stopping time.

Assume that $b(t, x)$ is bounded and continuously differentiable with bounded derivatives and that $\sigma(t, x)$ is bounded, continuously differentiable with respect to $t$ and twice continuously differentiable with respect to $x$ with bounded derivatives. Assume also that $\sigma$ is uniformly elliptic. Finally, assume that $g(t, x)$ is differentiable with respect to $t$ and twice differentiable with respect to $x$ and that

$$|f(t, x)| + \left|\frac{\partial g}{\partial t}(t, x)\right| + \sum_{i=1}^{d}\left|\frac{\partial g}{\partial x_i}(t, x)\right| \le Ce^{\mu|x|} \tag{102}$$

for positive constants $C$ and $\mu$.

Theorem 9 ([2], Sec.III.4.9). Under the previous assumptions, $u(t, x)$ admits first-order derivatives with respect to $t$ and second-order derivatives with respect to $x$ that are $L^p$ for all $1 \le p < \infty$. Moreover, $u$ is the solution of the variational inequality

$$\begin{cases} \max\left\{u(t, x) - g(t, x) \,;\, -\dfrac{\partial u}{\partial t}(t, x) - L_t u(t, x) + ru(t, x) - f(t, x)\right\} = 0, & (t, x) \in (0, T) \times \mathbb{R}^d \\ u(T, x) = g(T, x), & x \in \mathbb{R}^d \end{cases} \tag{103}$$

The proof of this result is based on a similar (formal) justification as the one we gave for equation (90). We refer to [12] for a similar result under weaker assumptions more suited to financial models when $f = 0$ (this is in particular the case for American options).

In some cases (typically with $f = 0$, see [11]), it can be shown that the infimum in equation (101) is attained for the stopping time

$$\tau^* = \inf\left\{t \le s \le T : u(s, X_s^{t,x}) = g(s, X_s^{t,x})\right\} \tag{104}$$

where $X^{t,x}$ is the solution of the SDE (100) with initial condition $X_t^{t,x} = x$.

Generalizations and Extensions

An optimal control problem can also be solved through the optimization of a family of BSDEs related to the laws of the controlled diffusions. On this question, we refer to [19] and BSDEs.

In this section, we considered only very specific optimal control problems. Other important families of optimal control problems are given by impulse control problems, where the control may induce a jump of the underlying stochastic process, or ergodic control problems, where the goal is to optimize a quantity related to the stationary behavior of the controlled
diffusion. Impulse control has applications, for example, in stock or resource management problems. In the finite horizon case, when the underlying asset follows a model with stochastic or elastic volatility or when the market is incomplete, other optimal control problems can be considered, such as characterizing the superhedging cost, or minimizing some risk measure. Various constraints can be included in the optimal control problem, such as maximizing the expectation of a utility with the constraint that this utility has a fixed volatility, or minimizing the volatility for a fixed expected utility. One can also impose Gamma constraints on the control. Another important extension of optimal control problems arises when one wants to solve an HJB equation numerically. Usual discretization methods require restricting the problem to a bounded domain and fixing artificial boundary conditions. The numerical solution can be interpreted as the solution of an optimal control problem in a bounded domain. In this situation, a crucial question is to quantify the impact on the discretized solution of an error on the artificial boundary condition (which usually cannot be computed exactly).

On Numerical Methods

The Feynman–Kac formula for linear PDEs allows one to use Monte Carlo methods to compute the solution of the PDE. These methods are especially useful when the solution of the PDE has to be computed at a small number of points, or when the dimension is large (typically greater than or equal to 4), since they provide a rate of convergence independent of the dimension.

Concerning quasi- or semilinear PDEs and some optimal control problems (e.g., American put options in the section "Quasi- and Semilinear PDEs and BSDEs"), interpretations in terms of BSDEs provide indirect Monte Carlo methods of numerical computation (see [1] for Bermudan options or [4, 6] for general BSDE schemes). These methods have the advantage that they do not require artificial boundary conditions. However, their speed of convergence to the exact solution is still largely unknown, and could depend on the dimension of the problem.

For high-dimensional HJB equations, the analytical discretization methods lead to significant numerical problems. First, these methods need to solve an optimization problem at each node of the discretization grid, which can be very costly in high dimension or difficult depending on the particular constraints imposed on the control. Moreover, these methods require localizing the problem, that is, solving the problem in a bounded domain with artificial boundary conditions, which are usually difficult to compute precisely. This localization problem can be solved by computing the artificial boundary condition with a Monte Carlo method based on BSDEs. However, the error analysis of this method is based on the probabilistic interpretation of HJB equations in bounded domains, which is a difficult problem in general.

End Notes

a. A Markov semigroup family $(P_t, t \ge 0)$ on $\mathbb{R}^d$ is a family of bounded linear operators of norm 1 on the set of bounded measurable functions on $\mathbb{R}^d$ equipped with the $L^\infty$ norm, which satisfies equation (8).
b. This is not the most general definition of Feller semigroups (see [21, Def. III.6.5]). In our context, because we only introduce analytical objects from stochastic processes, the semigroup $(P_t)$ is naturally defined on the set of bounded measurable functions.
c. The strong continuity of a semigroup is usually defined as $\|P_t f - f\| \to 0$ as $t \to 0$ for all $f \in C_0(\mathbb{R}^d)$. However, in the case of Feller semigroups, this is equivalent to the weaker formulation (10) (see [21, Lemma III.6.7]).

References

[1] Bally, V. & Pagès, G. (2003). Error analysis of the optimal quantization algorithm for obstacle problems, Stochastic Processes and their Applications 106(1), 1–40.
[2] Bensoussan, A. & Lions, J.-L. (1982). Applications of Variational Inequalities in Stochastic Control, Studies in Mathematics and its Applications, North-Holland Publishing, Amsterdam, Vol. 12 (Translated from the French).
[3] Borkar, V.S. (1989). Optimal Control of Diffusion Processes, Pitman Research Notes in Mathematics Series, Longman Scientific & Technical, Harlow, Vol. 203.
[4] Bouchard, B. & Touzi, N. (2004). Discrete-time approximation and Monte-Carlo simulation of backward stochastic differential equations, Stochastic Processes and their Applications 111(2), 175–206.
[5] Çinlar, E. & Jacod, J. (1981). Representation of semimartingale Markov processes in terms of Wiener processes and Poisson random measures, in Seminar on Stochastic Processes, 1981 (Evanston, Ill., 1981), Progress in Probability and Statistics, Birkhäuser, Boston, Vol. 1, pp. 159–242.
[6] Delarue, F. & Menozzi, S. (2006). A forward-backward stochastic algorithm for quasi-linear PDEs, Annals of Applied Probability 16(1), 140–184.
[7] El Karoui, N., Kapoudjian, C., Pardoux, E., Peng, S. & Quenez, M.C. (1997). Reflected solutions of backward SDE's, and related obstacle problems for PDE's, Annals of Probability 25(2), 702–737.
[8] El Karoui, N., Huu Nguyen, D. & Jeanblanc-Picqué, M. (1987). Compactification methods in the control of degenerate diffusions: existence of an optimal control, Stochastics 20(3), 169–219.
[9] Ethier, S.N. & Kurtz, T.G. (1986). Markov Processes: Characterization and Convergence, Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics, John Wiley & Sons, New York.
[10] Friedman, A. (1975). Stochastic Differential Equations and Applications, Vol. 1, Probability and Mathematical Statistics, Academic Press [Harcourt Brace Jovanovich Publishers], New York, Vol. 28.
[11] Jacka, S.D. (1993). Local times, optimal stopping and semimartingales, Annals of Applied Probability 21(1), 329–339.
[12] Jaillet, P., Lamberton, D. & Lapeyre, B. (1990). Variational inequalities and the pricing of American options, Acta Applicandae Mathematicae 21(3), 263–289.
[13] Karatzas, I. & Shreve, S.E. (1988). Brownian Motion and Stochastic Calculus, Graduate Texts in Mathematics, Springer-Verlag, New York, Vol. 113.
[14] Lamberton, D. & Lapeyre, B. (1996). Introduction to Stochastic Calculus Applied to Finance, Chapman & Hall, London (Translated from the 1991 French original by Nicolas Rabeau and François Mantion).
[15] Ma, J., Protter, P. & Yong, J.M. (1994). Solving forward-backward stochastic differential equations explicitly: a four step scheme, Probability Theory and Related Fields 98(3), 339–359.
[16] Øksendal, B. (2003). Stochastic Differential Equations: An Introduction with Applications, 6th Edition, Universitext, Springer-Verlag, Berlin.
[17] Pardoux, E. (1998). Backward stochastic differential equations and viscosity solutions of systems of semilinear parabolic and elliptic PDEs of second order, in Stochastic Analysis and Related Topics: The Geilo Workshop, B.O.L. Decreusefond, J. Gjerde & A. Ustunel, eds, Birkhäuser, pp. 79–127.
[18] Protter, P. (2001). A partial introduction to financial asset pricing theory, Stochastic Processes and Their Applications 91(2), 169–203.
[19] Quenez, M.C. (1997). Stochastic control and BSDEs, in Backward Stochastic Differential Equations (Paris, 1995–1996), Pitman Research Notes in Mathematics Series, Longman, Harlow, Vol. 364, pp. 83–99.
[20] Rogers, L.C.G. & Shi, Z. (1995). The value of an Asian option, Journal of Applied Probability 32(4), 1077–1088.
[21] Rogers, L.C.G. & Williams, D. (1994). Diffusions, Markov Processes, and Martingales, Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics, 2nd Edition, John Wiley & Sons, Chichester, Vol. 1.
[22] Talay, D. & Zheng, Z. (2003). Quantiles of the Euler scheme for diffusion processes and financial applications, Mathematical Finance 13(1), 187–199, Conference on Applications of Malliavin Calculus in Finance (Rocquencourt, 2001).

MIREILLE BOSSY & NICOLAS CHAMPAGNAT
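As a numerical companion to the optimal stopping problem (101) and its obstacle characterization (103), the following minimal Python sketch prices an American put by backward dynamic programming. Everything specific here is an assumption for illustration and is not taken from the article: the Cox–Ross–Rubinstein binomial tree stands in for the diffusion (100), the function name `put_binomial` and its parameters are hypothetical, and the problem is written as a reward maximization (a supremum over stopping times) rather than the cost minimization of (101). At each node the value is the larger of the exercise payoff and the discounted continuation value, which is the discrete counterpart of the obstacle condition.

```python
import math


def put_binomial(s0, strike, r, sigma, maturity, n, american=True):
    """Price a put on a CRR binomial tree with n steps (illustrative sketch)."""
    dt = maturity / n
    u = math.exp(sigma * math.sqrt(dt))    # up factor
    d = 1.0 / u                            # down factor
    q = (math.exp(r * dt) - d) / (u - d)   # risk-neutral up probability
    disc = math.exp(-r * dt)               # one-step discount factor

    # Terminal payoff; node j has j up-moves and n - j down-moves.
    values = [max(strike - s0 * u**j * d**(n - j), 0.0) for j in range(n + 1)]

    # Backward induction: compare immediate exercise with continuation.
    for i in range(n - 1, -1, -1):
        for j in range(i + 1):
            cont = disc * (q * values[j + 1] + (1.0 - q) * values[j])
            if american:
                exercise = max(strike - s0 * u**j * d**(i - j), 0.0)
                values[j] = max(exercise, cont)
            else:
                values[j] = cont
    return values[0]


if __name__ == "__main__":
    american = put_binomial(100.0, 100.0, 0.05, 0.2, 1.0, 200, american=True)
    european = put_binomial(100.0, 100.0, 0.05, 0.2, 1.0, 200, american=False)
    print("American put:", american)
    print("European put:", european)
```

The first time the exercise payoff equals the value along a path plays the role of the stopping time (104). With a zero interest rate the two prices coincide in this model, since early exercise of the put then brings no benefit.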
Doob–Meyer Decomposition

Submartingales are processes that grow on average. Subject to some condition of uniform integrability, they can be written uniquely as the sum of a martingale and a predictable increasing process. This result is known as the Doob–Meyer decomposition.

Consider a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F}, P)$. It consists of a probability space $(\Omega, \mathcal{F}, P)$ and a filtration $\mathbb{F} = (\mathcal{F}_t)_{t \ge 0}$, that is, an increasing family of sub-σ-fields of $\mathcal{F}$. The σ-field $\mathcal{F}_t$ stands for the information available at time $t$. A random event $A$ belongs to $\mathcal{F}_t$ if we know at time $t$ whether it will take place or not, that is, $A$ does not depend on randomness in the future. For technical reasons, one typically assumes right continuity, that is, $\mathcal{F}_t = \bigcap_{s > t} \mathcal{F}_s$.

A martingale (see Martingales) (respectively submartingale, supermartingale) is an adapted, integrable process $(X_t)_{t \in \mathbb{R}_+}$ satisfying

\[ E(X_t \mid \mathcal{F}_s) = X_s \tag{1} \]

(respectively $\ge X_s$, $\le X_s$) for $s \le t$. Moreover, we require these processes to be a.s. càdlàg, that is, right-continuous with left-hand limits. Adaptedness means that $X_t$ is $\mathcal{F}_t$-measurable, that is, the random value $X_t$ is known at the latest at time $t$. Integrability $E(|X_t|) < \infty$ is needed for the conditional expectation to be defined. The crucial martingale equality (1) means that the best prediction of future values of $X$ is the current value, that is, $X$ will stay on the current level on average. In other words, it does not exhibit any positive or negative trend. If $X$ denotes the price of a security, this asset does not produce profits or losses on average. Submartingales, on the other hand, grow on average. Put differently, they show an upward trend compared to a martingale. This loose statement is made precise in terms of the Doob–Meyer decomposition.

As a starting point, consider a discrete-time process $X = (X_t)_{t = 0, 1, 2, \dots}$. In discrete time, a process $X$ is called predictable if $X_t$ is $\mathcal{F}_{t-1}$-measurable for $t = 1, 2, \dots$. This means that the value $X_t$ is known already one period ahead. The Doob decomposition states that any submartingale $X$ can be written uniquely as

\[ X_t = M_t + A_t \tag{2} \]

with a martingale $M$ and an increasing predictable process $A$ satisfying $A_0 = 0$. While the intuitive meaning of $M$ and $A$ may not be obvious, the corresponding decomposition of the increments $\Delta X_t := X_t - X_{t-1}$ is easier to understand.

\[ \Delta X_t = \Delta M_t + \Delta A_t \tag{3} \]

can be interpreted in the sense that the increment $\Delta X_t$ consists of a predictable trend $\Delta A_t$ and a random deviation $\Delta M_t$ from that trend. Its implication $\Delta A_t = E(\Delta X_t \mid \mathcal{F}_{t-1})$ means that $\Delta A_t$ is the best prediction of $\Delta X_t$ in a mean-square sense, based on the information up to time $t - 1$.

The natural decomposition (3) does not make sense for continuous-time processes, but an analog of equation (2) still exists. To this end, the notion of predictability must be extended to continuous time. A process $X = (X_t)_{t \in \mathbb{R}_+}$ is called predictable if, viewed as a mapping on $\Omega \times \mathbb{R}_+$, it is measurable with respect to the σ-field generated by all adapted, left-continuous processes. Intuitively, this rather abstract definition means that $X_t$ is known slightly ahead of time $t$. In view of the discrete-time case, it may seem more natural to require that $X_t$ be $\mathcal{F}_{t-}$-measurable, where $\mathcal{F}_{t-}$ stands for the smallest sub-σ-field containing all $\mathcal{F}_s$, $s < t$. However, this slightly weaker condition turns out to be too weak for the general theory.

In order for a decomposition (2) into a martingale $M$ and a predictable increasing process $A$ to exist, one must assume some uniform integrability of $X$. The process $X$ must belong to the so-called class (D), which amounts to a rather technical condition implying $\sup_{t \ge 0} E(|X_t|) < \infty$ but being itself implied by $E(\sup_{t \ge 0} |X_t|) < \infty$. For its precise definition, we need to introduce the concept of a stopping time, which is not only an indispensable tool for the general theory of stochastic processes but also interesting for applications, for example, in mathematical finance. A $[0, \infty]$-valued random variable $T$ is called a stopping time if $\{T \le t\} \in \mathcal{F}_t$ for any $t \ge 0$. Intuitively, $T$ stands for a random time, which is generally not known in advance but at the latest once it has happened (e.g., the time of a phone call, the first time when a stock hits 100, the time when you crash your car into a tree). In financial applications, it appears, for example, as the exercise time of an American option.

Stopping times can be classified by their degree of suddenness. Predictable stopping times do not come
entirely as a surprise because one anticipates them. Formally, a stopping time $T$ is called predictable if it allows for an announcing sequence, that is, for a sequence $(T_n)_{n \in \mathbb{N}}$ of stopping times satisfying $T_0 < T_1 < T_2 < \dots$ on $\{T > 0\}$ and $T_n \to T$ as $n \to \infty$. This is the case for a continuous stock price hitting 100 or for the car crashing into a tree, because you can literally see the level 100 or the tree coming increasingly closer. Phone calls, strikes of lightning, or jumps of a Lévy process, on the other hand, are of an entirely different kind because they happen completely out of the blue. Such stopping times $T$ are called totally inaccessible, which formally means that $P(S = T < \infty) = 0$ for all predictable stopping times $S$.

Coming back to our original theme, a process $X$ is said to be of class (D) if the set $\{X_T : T \text{ finite stopping time}\}$ is uniformly integrable, which in turn means that

\[ \lim_{c \to \infty} \sup_{T \text{ finite stopping time}} E\left( 1_{\{|X_T| > c\}} |X_T| \right) = 0 \]

The Doob–Meyer decomposition can now be stated as follows:

Theorem 1 Any submartingale $X$ of class (D) allows for a unique decomposition

\[ X_t = M_t + A_t \tag{4} \]

with a martingale $M$ and some predictable increasing process $A$ satisfying $A_0 = 0$.

The martingale $M$ turns out to be of class (D) as well, which implies that it converges a.s. and in $L^1$ to some terminal random variable $M_\infty$. Since the whole martingale $M$ can be recovered from its limit via $M_t = E(M_\infty \mid \mathcal{F}_t)$, one can formally identify such uniformly integrable martingales with their limit.

In the case of an Itô process

\[ dX_t = H_t\,dW_t + K_t\,dt \tag{5} \]

the Doob–Meyer decomposition is easily obtained. Indeed, we have $M_t = X_0 + \int_0^t H_s\,dW_s$ and $A_t = \int_0^t K_s\,ds$. A general Itô process need not, of course, be a submartingale. Nevertheless, equation (5) suggests that a similar decomposition exists for more general processes. This is indeed the case. For a generalization covering all Itô processes we relax both the martingale property of $M$ and the monotonicity of $A$. In general, $A$ is only required to be of finite variation, that is, the difference of two increasing processes. In the Itô process example, these are $A_t^{(+)} = \int_0^t \max(K_s, 0)\,ds$ and $A_t^{(-)} = \int_0^t \max(-K_s, 0)\,ds$. Put differently, the trend may change its direction every now and then.

To cover all Itô processes, one must also allow for local martingales rather than martingales. $M$ is said to be a local martingale if there exists a sequence of stopping times $(T_n)_{n \in \mathbb{N}}$, which increases to $\infty$ almost surely, such that $M^{T_n}$ is a martingale for any $n$. Here, the stopped process $M^{T_n}$ is defined as $M_t^{T_n} := M_{\min(T_n, t)}$, that is, it stays constant after time $T_n$ (as, for example, your wealth does if you sell an asset at $T_n$). This rather technical concept appears naturally in the general theory of stochastic processes. For example, stochastic integrals $M_t = \int_0^t H_s\,dN_s$ relative to martingales $N$ generally fail to be martingales but are typically local martingales or a little less, namely, σ-martingales.

A local martingale is a uniformly integrable martingale if and only if it is of class (D). Nevertheless, one should be careful with thinking that local martingales behave basically as martingales up to some integrability. For example, there exist local martingales $M_t = \int_0^t H_s\,dW_s$ with $M_0 = 0$ and $M_1 = 1$ a.s. and such that $E(|M_t|) < \infty$, $t \ge 0$. Even though such a process has no trend in a local sense, it behaves entirely differently from a martingale on a global scale. The difference between local martingales and martingales leads to many technical problems in mathematical finance. For example, the previous example may be interpreted in the sense that dynamic investment in a perfectly reasonable martingale may lead to arbitrage unless the set of trading strategies is restricted to some admissible subset.

Let us come back to generalizing the Doob–Meyer decomposition. Without class (D) it reads as follows:

Theorem 2 Any submartingale $X$ allows for a unique decomposition (4) with a local martingale $M$ and some predictable increasing process $A$ satisfying $A_0 = 0$.

For a considerably larger class of processes $X$, there exists a canonical decomposition (4) with a local martingale $M$ and some predictable process $A$ of finite variation, which starts in 0. These processes are called special semimartingales and they play a key role in stochastic calculus. The slightly larger
class of semimartingales is obtained, if $A$ is only required to be adapted rather than predictable. This class is, in some sense, the largest one that allows for the definition of a stochastic integral $\int_0^t H_s\,dX_s$ satisfying a mild continuity property. In the general semimartingale case, decomposition (4) should not be called canonical because it is not unique. Moreover, $A$ should not be regarded as a trend unless it is predictable. On the other hand, if the jumps of a semimartingale $X$ are sufficiently integrable (e.g., bounded), then $X$ is special and hence allows for a canonical decomposition resembling the Doob–Meyer decomposition of a submartingale.

Further Reading

Protter, P. (2004). Stochastic Integration and Differential Equations, 2nd Edition, Version 2.1, Springer, Berlin.

Related Articles

American Options; Martingales; Semimartingale.

JAN KALLSEN
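The discrete-time Doob decomposition (2)–(3) can be made concrete with a small numerical sketch. The example below is an illustration under stated assumptions, not something taken from the article: it takes a simple symmetric random walk $S$ with steps $\pm 1$, uses the submartingale $X_t = S_t^2$, and the function name `doob_decomposition` is hypothetical. The predictable trend increment is computed as $\Delta A_t = E(\Delta X_t \mid \mathcal{F}_{t-1})$ from the state known at time $t - 1$, and the martingale part is obtained as $M = X - A$.

```python
import itertools


def doob_decomposition(steps):
    """Return paths (X, M, A) with X_t = S_t**2 = M_t + A_t for a fair +/-1 walk S."""
    s = 0
    x_path, m_path, a_path = [0.0], [0.0], [0.0]
    for xi in steps:
        # Predictable trend: average the two possible increments of X given
        # the position s known one period ahead (each step has probability 1/2).
        up_inc = (s + 1) ** 2 - s ** 2
        down_inc = (s - 1) ** 2 - s ** 2
        delta_a = 0.5 * up_inc + 0.5 * down_inc
        s += xi
        x = float(s * s)
        a_path.append(a_path[-1] + delta_a)
        x_path.append(x)
        m_path.append(x - a_path[-1])  # martingale part M = X - A
    return x_path, m_path, a_path


if __name__ == "__main__":
    # Averaging M over all equally likely paths of length 4 checks that the
    # martingale part has constant expectation M_0 = 0.
    paths = list(itertools.product([-1, 1], repeat=4))
    mean_m = sum(doob_decomposition(p)[1][-1] for p in paths) / len(paths)
    print("mean of M_4 over all paths:", mean_m)
```

In this example the trend increment equals 1 at every step, so $A_t = t$ and $M_t = S_t^2 - t$, which matches the interpretation of $A$ as the predictable upward drift of the submartingale.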
Forward–Backward Stochastic Differential Equations (SDEs)

A forward–backward stochastic differential equation (FBSDE) is a system of two Itô-type stochastic differential equations (SDEs) over $[0, T]$ taking the following form:

\[ \begin{cases} dX_t = b(t, \omega, X_t, Y_t, Z_t)\,dt + \sigma(t, \omega, X_t, Y_t, Z_t)\,dW_t, & X_0 = x; \\ dY_t = -f(t, \omega, X_t, Y_t, Z_t)\,dt + Z_t\,dW_t, & Y_T = g(\omega, X_T) \end{cases} \tag{1} \]

Here $W$ is a standard Brownian motion defined on a complete probability space $(\Omega, \mathcal{F}, P)$, and $\mathbb{F} \triangleq \{\mathcal{F}_t\}_{0 \le t \le T}$ is the filtration generated by $W$ augmented with all the null sets. The coefficients $b, \sigma, f, g$ are progressively measurable; $b, \sigma, f$ are $\mathbb{F}$-adapted for fixed $(x, y, z)$; and $g$ is $\mathcal{F}_T$-measurable for fixed $x$. The first equation is forward because the initial value $X_0$ is given, while the second one is backward because the terminal condition $Y_T$ is given. The solution to FBSDE (1) consists of three $\mathbb{F}$-adapted processes $(X, Y, Z)$ that satisfy equation (1) for any $t$, $P$ almost surely (a.s.), and

\[ \|(X, Y, Z)\|^2 \triangleq E\left\{ \sup_{0 \le t \le T} \left[ |X_t|^2 + |Y_t|^2 \right] + \int_0^T |Z_t|^2\,dt \right\} < \infty \tag{2} \]

BSDEs can be traced back to the 1973 paper by Bismut [7], where a linear BSDE is introduced as an adjoint equation for a stochastic control problem. Bensoussan [6] proved the well-posedness of general linear BSDEs by using the martingale representation theorem. The general theory of nonlinear BSDEs, however, originated from the seminal work of Pardoux and Peng [37]. Their motivation was to study the general Pontryagin-type maximum principle for stochastic optimal controls; see, for example, [40]. Independent of the development of this theory, Duffie and Epstein [19, 20] proposed the concept of stochastic recursive utility, and it turns out that BSDEs provide exactly the right mathematical tool for it.

Peng [41], and Pardoux and Peng [38], then studied decoupled FBSDEs, that is, $b$ and $\sigma$ do not depend on $(y, z)$. They discovered the deep relation between Markovian FBSDEs (i.e., FBSDEs with deterministic coefficients) and PDEs, via the so-called nonlinear Feynman–Kac formula. Soon after that, people found that such FBSDEs had very natural applications in option pricing theory, and thus extended the Black–Scholes formula to a much more general framework. In particular, the solution triplet $(X, Y, Z)$ can be interpreted as the underlying asset price, the option price, and the hedging portfolio, respectively. El Karoui et al. [22] further introduced reflected BSDEs, which are appropriate for pricing American options, again in a general framework. See the survey paper [24] and the section Applications for such applications.

The theory of coupled FBSDEs was originally motivated by Black's consol rate conjecture. Antonelli [1] proved the first well-posedness result, when the time duration $T$ is small. For arbitrary $T$, there are three typical approaches, each with its limitations. The most famous one is the four-step scheme, proposed by Ma et al. [34]. On the basis of this scheme, Duffie et al. [21] confirmed Black's conjecture. The theory has also been applied to various areas, especially in finance and in stochastic control.

There have been numerous publications on the subject. We refer interested readers to the books [23, 35], and the references therein for the general theory and applications.

Decoupled FBSDEs

Since $b$ and $\sigma$ do not depend on $(y, z)$, one can first solve the forward SDE and then the backward one. The main idea in [37] to solve BSDEs is to apply the Picard iteration, or equivalently, the contraction mapping theorem.

Theorem 1 ([38]). Assume that $b, \sigma$ do not depend on $(y, z)$; that $b, \sigma, f, g$ are uniformly Lipschitz continuous in $(x, y, z)$, uniformly on $(\omega, t)$; and that

\[ I_0^2 \triangleq E\left\{ \int_0^T \left[ |b(t, \cdot, 0)|^2 + |\sigma(t, \cdot, 0)|^2 + |f(t, \cdot, 0, 0, 0)|^2 \right] dt + |g(\cdot, 0)|^2 \right\} < \infty \tag{3} \]
Then FBSDE (1) admits a unique solution $(X, Y, Z)$, and there exists a constant $C$, depending only on $T$, the dimensions, and the Lipschitz constant, such that $\|(X, Y, Z)\|^2 \le C\left[ |x_0|^2 + I_0^2 \right]$ (with $x_0 = X_0$ the initial value).

When $\dim(Y) = 1$, we have the following comparison result for the BSDE. For $i = 1, 2$, assume $(b, \sigma, f_i, g_i)$ satisfy the assumptions in Theorem 1 and let $(X, Y^i, Z^i)$ denote the corresponding solutions to equation (1). If $f^1 \le f^2$, $g^1 \le g^2$, $P$ a.s., for any $(t, x, y, z)$, then $Y_t^1 \le Y_t^2$, $\forall t$, $P$ a.s.; see, for example, [24]. On the basis of this result, Lepeltier and San Martín [31] constructed solutions to BSDEs with non-Lipschitz coefficients. Moreover, Kobylanski [30] and Briand and Hu [10] proved the well-posedness of BSDEs whose generator $f$ has quadratic growth in $Z$. Such BSDEs are quite useful in practice.

When the coefficients are deterministic, the decoupled FBSDE (1) becomes

\[ \begin{cases} dX_t = b(t, X_t)\,dt + \sigma(t, X_t)\,dW_t, & X_0 = x; \\ dY_t = -f(t, X_t, Y_t, Z_t)\,dt + Z_t\,dW_t, & Y_T = g(X_T) \end{cases} \tag{4} \]

In this case, the FBSDE is associated with the following system of parabolic PDEs:

\[ \begin{cases} u_t^i + \frac{1}{2} \operatorname{tr}\left[ u_{xx}^i \sigma\sigma^*(t, x) \right] + u_x^i b(t, x) + f^i(t, x, u, u_x \sigma(t, x)) = 0, & i = 1, \dots, m; \\ u(T, x) = g(x) \end{cases} \tag{5} \]

Theorem 2 ([38]). Assume $b, \sigma, f, g$ satisfy all the conditions in Theorem 1.

(i) If PDE (5) has a classical solution $u \in C^{1,2}([0, T] \times \mathbb{R}^n)$, then

\[ Y_t = u(t, X_t), \quad Z_t = u_x \sigma(t, X_t) \tag{6} \]

(ii) In general, define

\[ u(t, x) \triangleq E\{ Y_t \mid X_t = x \} \tag{7} \]

Then $u$ is deterministic and $Y_t = u(t, X_t)$. Moreover, when $m = 1$, $u$ is the unique viscosity solution to the PDE (5).

In this case, $X$ is a Markov process; then by equation (6) the solution $(X, Y, Z)$ is Markovian. For this reason we call equation (4) a Markovian FBSDE. We note that in the Black–Scholes model, as we see in the section Applications, the PDE (5) is linear and one can solve for $u$ explicitly. Then equation (6) in fact gives us the well-known Black–Scholes formula. Moreover, the hedging portfolio $Z_t \sigma^{-1}(t, X_t)$ is the sensitivity of the option price $Y_t$ with respect to the underlying asset price $X_t$. This is exactly the idea of $\Delta$-hedging. On the other hand, when $f$ is linear in $(y, z)$, equation (7) is actually equivalent to the Feynman–Kac formula. In general, when $m = 1$, equation (7) provides a probabilistic representation for the viscosity solution to the PDE (5), and thus is called a nonlinear Feynman–Kac formula. Such a representation formula is also available for $u_x$ [36].

The link between FBSDEs and PDEs opens the door to efficient Monte Carlo methods for high-dimensional PDEs and FBSDEs, and thus also for many financial problems. This approach can effectively overcome the curse of dimensionality; see, for example, [3–5, 8, 27, 45], and [12]. There are also some numerical algorithms for non-Markovian BSDEs and coupled FBSDEs; see, for example, [2, 9, 18, 33], and [17].

Coupled FBSDEs

The theory of coupled FBSDEs is much more complex and is far from complete. There are mainly three approaches for its well-posedness, each with its limitations. Since the precise statements of the results require complicated notation and technical conditions, we refer readers to the original research papers and focus only on the main ideas here.

Method 1 Contraction Mapping This method works very well for BSDEs and decoupled FBSDEs. However, to ensure the constructed mapping is a contraction, for coupled FBSDEs one has to assume some stronger conditions. The first well-posedness result was by Antonelli [1], which has been extended further by Pardoux and Tang [39]. Roughly speaking, besides the standard Lipschitz conditions, FBSDE (1) is well posed in one of the following three cases: (i) $T$ is small and either $\sigma_z$ or $g_x$ is small; (ii) $X$ is weakly coupled into the BSDE (i.e., $g_x$ and $f_x$ are small) or $(Y, Z)$ are weakly coupled into the FSDE (i.e., $b_y, b_z, \sigma_y, \sigma_z$ are small); or (iii) $b$ is deeply decreasing in $x$ (i.e., $[b(\cdot, x_1, \cdot) -$
$b(\cdot, x_2, \cdot)][x_1 - x_2] \le -C|x_1 - x_2|^2$ for some large $C$) or $f$ is deeply decreasing in $y$. Antonelli [1] also provides a counterexample to show that, under Lipschitz conditions only, equation (1) may have no solution.

Method 2 Four-step Scheme This is the most popular method for coupled FBSDEs with deterministic coefficients, proposed by Ma et al. [34]. The main idea is to use the close relationship between Markovian FBSDEs and PDEs, in the spirit of Theorem 2. Step 1 in [34] deals with the dependence of $\sigma$ on $z$, which works only in very limited cases. The more interesting case is that $\sigma$ does not depend on $z$. Then the other three steps read as follows:

Step 2. Solve the following PDE with $u(T, x) = g(x)$: for $i = 1, \dots, m$,

\[ u_t^i + \frac{1}{2} \operatorname{tr}\left[ u_{xx}^i \sigma\sigma^*(t, x, u) \right] + u_x^i b(t, x, u, u_x \sigma(t, x, u)) + f^i(t, x, u, u_x \sigma(t, x, u)) = 0 \tag{8} \]

Step 3. Solve the following FSDE:

\[ X_t = x + \int_0^t b\big(s, X_s, u(s, X_s), u_x(s, X_s) \sigma(s, X_s, u(s, X_s))\big)\,ds + \int_0^t \sigma(s, X_s, u(s, X_s))\,dW_s \tag{9} \]

Step 4. Set

\[ Y_t \triangleq u(t, X_t), \quad Z_t \triangleq u_x(t, X_t) \sigma(t, X_t, u(t, X_t)) \tag{10} \]

The main result in [34] is essentially the following theorem.

Theorem 3 Assume (i) $b, \sigma, f, g$ are deterministic, uniformly Lipschitz continuous in $(x, y, z)$, and $\sigma$ does not depend on $z$; (ii) PDE (8) has a classical solution $u$ with bounded derivatives. Then FBSDE (1) has a unique solution.

This result has been improved by Delarue [16] and Zhang [46], by weakening the requirement on $u$ to only uniform Lipschitz continuity in $x$. Delarue [16] assumes some sufficient conditions on the deterministic coefficients to ensure such Lipschitz continuity. In particular, one key condition is that the coefficient $\sigma$ be uniformly nondegenerate. Zhang [46] allows the coefficients to be random and $\sigma$ to be degenerate, but assumes all processes are one-dimensional along with some special compatibility condition on the coefficients, so that a similarly defined random field $u(t, \omega, x)$ is uniformly Lipschitz continuous in $x$.

Method 3 Method of Continuation The idea is that, if an FBSDE is well-posed, then a new FBSDE with slightly modified coefficients is also well-posed. The problem is then to find sufficient conditions so that this modification procedure can go arbitrarily long. This method allows the coefficients to be random and $\sigma$ to be degenerate. However, it requires some monotonicity conditions; see, for example, [29, 42], and [43]. For example, [29] assumes that, for some constant $\beta > 0$ and for any $\theta_i = (x_i, y_i, z_i)$, $i = 1, 2$,

\[ [b(t, \omega, \theta_1) - b(t, \omega, \theta_2)][y_1 - y_2] + [\sigma(t, \omega, \theta_1) - \sigma(t, \omega, \theta_2)][z_1 - z_2] - [f(t, \omega, \theta_1) - f(t, \omega, \theta_2)][x_1 - x_2] \ge \beta\left[ |x_1 - x_2|^2 + |y_1 - y_2|^2 + |z_1 - z_2|^2 \right] \tag{11} \]

\[ [g(\omega, x_1) - g(\omega, x_2)][x_1 - x_2] \le -\beta |x_1 - x_2|^2 \tag{12} \]

Applications

We now present some typical applications of FBSDEs.

1. Option pricing and hedging

Let us consider the standard Black–Scholes model. The financial market consists of two underlying assets, a riskless one $B_t$ and a risky one $S_t$. Assume an investor holds a portfolio $(x_t, \pi_t)_{0 \le t \le T}$, with its wealth $V_t \triangleq x_t B_t + \pi_t S_t$. We say the portfolio is self-financing if $dV_t = x_t\,dB_t + \pi_t\,dS_t$; that is, the change
of the wealth is solely due to the change of the underlying assets' prices.

Now consider a European call option with terminal payoff $g(S_T) \triangleq (S_T - K)^+$. We say a self-financing portfolio $(x_t, \pi_t)$ is a perfect hedge of the option if $V_T = g(S_T)$. Under a no-arbitrage assumption, $V_t$ is the unique fair option price at $t$. Let $r$ denote the interest rate of $B$, $\mu$ the appreciation rate, and $\sigma$ the volatility of $S$. Then $(S, V, \pi)$ satisfy the following linear FBSDE:

\[ \begin{cases} dS_t = S_t[\mu\,dt + \sigma\,dW_t], & S_0 = s_0; \\ dV_t = [r(V_t - \pi_t S_t) + \mu \pi_t S_t]\,dt + \pi_t S_t \sigma\,dW_t, & V_T = g(S_T) \end{cases} \tag{13} \]

If the borrowing interest rate $R$ is greater than the lending interest rate $r$, then the drift term of $dV_t$ becomes $r(V_t - \pi_t S_t)^+ - R(V_t - \pi_t S_t)^- + \mu \pi_t S_t$, and thus the BSDE becomes nonlinear. The coupled FBSDE gives a nice framework for the large investor problem, where the investment may affect the value of $S_t$. Assume $dS_t = \mu(t, S_t, V_t, \pi_t)\,dt + \sigma(t, S_t, V_t, \pi_t)\,dW_t$. Then the system becomes coupled. We refer to [24] and [15] for a more detailed exposition.

2. American option and reflected FBSDEs

Consider an American option with generator $f$, terminal payoff function $g$, and early exercise payoff $L_t$. Let $X$ denote the underlying asset price, $Y$ the option price, and $Z\sigma^{-1}$ the hedging portfolio. Then the American option solves the following reflected FBSDE with an extra component $K$, which is continuous and increasing, with $K_0 = 0$:

\[ \begin{cases} dX_t = b(t, \omega, X_t)\,dt + \sigma(t, \omega, X_t)\,dW_t, & X_0 = x_0; \\ dY_t = -f(t, \omega, X_t, Y_t, Z_t)\,dt + Z_t\,dW_t - dK_t, & Y_T = g(\omega, X_T); \\ Y_t \ge L_t; \quad [Y_t - L_t]\,dK_t = 0 \end{cases} \tag{14} \]

Here $K_T - K_t$ can be interpreted as the time value of the option. Moreover, the optimal exercise time is $\tau \triangleq \inf\{t \ge 0 : Y_t = L_t\} \wedge T$. See [22] for more details.

In the Markovian case with $L_t = h(t, X_t)$, the RFBSDE (14) is associated with the following obstacle problem of PDE with $u(T, x) = g(x)$, in the spirit of Theorem 2:

\[ \min\left( u - h(t, x),\; -u_t - \frac{1}{2} \operatorname{tr}\left( u_{xx} \sigma\sigma^*(t, x) \right) - u_x b(t, x) - f(t, x, u, u_x \sigma) \right) = 0 \tag{15} \]

3. Some further extensions

The previous two models consider complete markets. El Karoui and Quenez [26] studied superhedging problems in incomplete markets. They have shown that the superhedging price of a contingent claim is the increasing limit of solutions of a sequence of BSDEs. Cvitanić et al. [14] also studied superhedging problems, but in the case that there is a constraint on the portfolio part $Z$. It turns out that the superhedging price is the minimal solution to an FBSDE with reflection/constraint on $Z$. Buckdahn and Hu [11] studied a similar problem, but using coupled FBSDEs with reflections.

Another application is the zero-sum Dynkin game. The value process $Y$ is the solution to a BSDE with double barriers of $Y$: $L_t \le Y_t \le U_t$. In this case, besides $(Y, Z)$, the solution consists of two increasing processes $K^+, K^-$ satisfying $[Y_t - L_t]\,dK_t^+ = [U_t - Y_t]\,dK_t^- = 0$, and an equilibrium of the game is a pair of stopping times: $\tau_1^* \triangleq \inf\{t : Y_t = L_t\} \wedge T$, $\tau_2^* \triangleq \inf\{t : Y_t = U_t\} \wedge T$. The work in [13, 28] and [32] is along this line.

4. Black's consol rate conjecture

Let $r$ denote the short-rate process and $Y_t \triangleq E_t\left\{ \int_t^\infty \exp\left( -\int_t^s r_l\,dl \right) ds \right\}$ be the consol price. Assume

\[ dr_t = \mu(r_t, Y_t)\,dt + \alpha(r_t, Y_t)\,dW_t \tag{16} \]

for some deterministic functions $\mu, \alpha$. The question is whether $Y$ satisfies certain SDEs. Black conjectured that there exists a function $A$, depending on $\mu$ and $\alpha$, such that $dY_t = [r_t Y_t - 1]\,dt + A(r_t, Y_t)\,dW_t$.

The conjecture is confirmed in [21] by using FBSDEs. Assume $r$ is "hidden Markovian," that is, $r_t = h(X_t)$ for some deterministic function $h$ and some Markov process $X$. Consider the following
FBSDE over infinite horizon:

$$
\begin{cases}
dX_t = b(X_t, Y_t)\,dt + \sigma(X_t, Y_t)\,dW_t, \quad X_0 = x;\\
dY_t = [h(X_t)Y_t - 1]\,dt + Z_t\,dW_t,\\
Y_t \text{ is bounded a.s. uniformly in } t \in [0, \infty)
\end{cases}
$$

The above FBSDE is associated with the following elliptic PDE

$$\frac{1}{2}\sigma^2(x, u)u''(x) + b(x, u)u'(x) - h(x)u(x) + 1 = 0 \tag{17}$$

Assume equation (17) has a bounded classical solution $u$. Then Black's conjecture is true with $A(x, y) = \sigma(x, y)u'(x)$.

5. Stochastic control
This is the original motivation to study BSDEs. The classical results in the literature assumed that the diffusion coefficient $\sigma$ was independent of the control; then the problem was essentially parallel to a deterministic control problem. With the help of BSDEs, one can derive necessary conditions for stochastic control problems in a general framework. To illustrate the idea, we show a very simple example here. We refer readers to [7, 25, 40], and [44] for more details in this aspect.
Assume the state process is

$$X_t = x + \int_0^t \sigma(s, a_s)\,dW_s \tag{18}$$

where $a$ is the control in some admissible set $\mathcal{A}$. The goal is to find optimal $a^*$ to maximize the utility (or minimize the cost) $J(a) = E\left[g(X_T) + \int_0^T h(t, a_t)\,dt\right]$; that is, we want to find $a^* \in \mathcal{A}$ such that $J(a^*) \ge J(a)$, for all $a \in \mathcal{A}$.
Define an adjoint equation which is a BSDE:

$$Y_t = g'(X_T) - \int_t^T Z_s\,dW_s \tag{19}$$

Then for any $\Delta a$, one can show that

$$
\nabla J(a, \Delta a) = \lim_{\varepsilon \to 0} \frac{1}{\varepsilon}[J(a + \varepsilon\Delta a) - J(a)]
= E\int_0^T [\sigma'(t, a_t)Z_t + h'(t, a_t)]\,\Delta a_t\,dt
$$

where $\sigma'$, $h'$ are derivatives with respect to $a$. If $a^*$ is optimal, then $\nabla J(a^*, \Delta a) \le 0$ for any $\Delta a$. As a necessary condition, we obtain the stochastic maximum principle:

$$\sigma'(t, a_t^*)Z_t + h'(t, a_t^*) = 0 \tag{20}$$

Under certain technical conditions, we get $a_t^* = I(t, Z_t)$ for some deterministic function $I$. Plugging this into equations (18) and (19) we obtain a coupled FBSDE.

References

[1] Antonelli, F. (1993). Backward-forward stochastic differential equations, The Annals of Applied Probability 3(3), 777–793.
[2] Bally, V. (1997). Approximation scheme for solutions of BSDE, in Backward Stochastic Differential Equations (Paris 1995–1996), N. El Karoui & L. Mazliak, eds, Pitman Research Notes in Mathematics Series, Longman, Harlow, Paris, Vol. 364, pp. 177–191.
[3] Bally, V. & Pagès, G. (2003). Error analysis of the quantization algorithm for obstacle problems, Stochastic Processes and their Applications 106, 1–40.
[4] Bender, C. & Denk, R. (2007). A forward scheme for backward SDEs, Stochastic Processes and their Applications 117(12), 1793–1823.
[5] Bender, C. & Zhang, J. (2008). Time discretization and Markovian iteration for coupled FBSDEs, The Annals of Applied Probability 18(1), 143–177.
[6] Bensoussan, A. (1983). Stochastic maximum principle for distributed parameter systems, Journal of the Franklin Institute 315(5–6), 387–406.
[7] Bismut, J.M. (1973). Théorie Probabiliste du Contrôle des Diffusions, Memoirs of the American Mathematical Society, Providence, Rhode Island, Vol. 176.
[8] Bouchard, B. & Touzi, N. (2004). Discrete-time approximation and Monte-Carlo simulation of backward stochastic differential equations, Stochastic Processes and their Applications 111, 175–206.
[9] Briand, P., Delyon, B. & Mémin, J. (2001). Donsker-type theorem for BSDEs, Electronic Communications in Probability 6, 1–14.
[10] Briand, P. & Hu, Y. (2006). BSDE with quadratic growth and unbounded terminal value, Probability Theory and Related Fields 136(4), 604–618.
[11] Buckdahn, R. & Hu, Y. (1998). Hedging contingent claims for a large investor in an incomplete market, Advances in Applied Probability 30(1), 239–255.
[12] Cheridito, P., Soner, M., Touzi, N. & Victoir, N. (2006). Second order backward stochastic differential equations and fully non-linear parabolic PDEs, Communications in Pure and Applied Mathematics 60, 1081–1110.
[13] Cvitanić, J. & Karatzas, I. (1996). Backward SDE's with reflection and Dynkin games, The Annals of Probability 24, 2024–2056.
[14] Cvitanić, J., Karatzas, I. & Soner, M. (1998). Backward stochastic differential equations with constraints on the gains-process, The Annals of Probability 26(4), 1522–1551.
[15] Cvitanić, J. & Ma, J. (1996). Hedging options for a large investor and forward-backward SDE's, The Annals of Applied Probability 6(2), 370–398.
[16] Delarue, F. (2002). On the existence and uniqueness of solutions to FBSDEs in a non-degenerate case, Stochastic Processes and their Applications 99(2), 209–286.
[17] Delarue, F. & Menozzi, S. (2006). A forward backward stochastic algorithm for quasi-linear PDEs, The Annals of Applied Probability 16, 140–184.
[18] Douglas, J., Ma, J. & Protter, P. (1996). Numerical methods for forward backward stochastic differential equations, The Annals of Applied Probability 6, 940–968.
[19] Duffie, D. & Epstein, L. (1992). Stochastic differential utility, Econometrica 60, 353–394.
[20] Duffie, D. & Epstein, L. (1992). Asset pricing with stochastic differential utility, Review of Financial Studies 5, 411–436.
[21] Duffie, D., Ma, J. & Yong, J. (1995). Black's consol rate conjecture, The Annals of Applied Probability 5(2), 356–382.
[22] El Karoui, N., Kapoudjian, C., Pardoux, E., Peng, S. & Quenez, M.C. (1997). Reflected solutions of backward SDE's, and related obstacle problems for PDE's, The Annals of Probability 25(2), 702–737.
[23] El Karoui, N. & Mazliak, L. (1997). Backward Stochastic Differential Equations, Pitman Research Notes in Mathematics Series, Longman, Harlow, Vol. 364.
[24] El Karoui, N., Peng, S. & Quenez, M.C. (1997). Backward stochastic differential equations in finance, Mathematical Finance 7, 1–72.
[25] El Karoui, N., Peng, S. & Quenez, M.C. (2001). A dynamic maximum principle for the optimization of recursive utilities under constraints, The Annals of Applied Probability 11(3), 664–693.
[26] El Karoui, N. & Quenez, M.C. (1995). Dynamic programming and pricing of contingent claims in an incomplete market, SIAM Journal on Control and Optimization 33(1), 29–66.
[27] Gobet, E., Lemor, J.-P. & Warin, X. (2005). A regression-based Monte-Carlo method to solve backward stochastic differential equations, The Annals of Applied Probability 15, 2172–2202.
[28] Hamadene, S. & Lepeltier, J.-P. (1995). Zero-sum stochastic differential games and backward equations, Systems and Control Letters 24(4), 259–263.
[29] Hu, Y. & Peng, S. (1995). Solution of forward-backward stochastic differential equations, Probability Theory and Related Fields 103(2), 273–283.
[30] Kobylanski, M. (2000). Backward stochastic differential equations and partial differential equations with quadratic growth, The Annals of Probability 28(2), 558–602.
[31] Lepeltier, J.P. & San Martín, J. (1997). Backward stochastic differential equations with continuous coefficients, Statistics and Probability Letters 32, 425–430.
[32] Ma, J. & Cvitanic, J. (2001). Reflected forward-backward SDEs and obstacle problems with boundary conditions, Journal of Applied Mathematics and Stochastic Analysis 14(2), 113–138.
[33] Ma, J., Protter, P., San Martín, J. & Torres, S. (2002). Numerical method for backward stochastic differential equations, The Annals of Applied Probability 12(1), 302–316.
[34] Ma, J., Protter, P. & Yong, J. (1994). Solving forward-backward stochastic differential equations explicitly - a four step scheme, Probability Theory and Related Fields 98, 339–359.
[35] Ma, J. & Yong, J. (1999). Forward-backward Stochastic Differential Equations and their Applications, Lecture Notes in Mathematics, Springer, Vol. 1702.
[36] Ma, J. & Zhang, J. (2002). Representation theorems for backward SDEs, The Annals of Applied Probability 12, 1390–1418.
[37] Pardoux, E. & Peng, S. (1990). Adapted solutions of backward stochastic equations, System and Control Letters 14, 55–61.
[38] Pardoux, E. & Peng, S. (1992). Backward Stochastic Differential Equations and Quasilinear Parabolic Partial Differential Equations, Lecture Notes in CIS, Springer, Vol. 176, pp. 200–217.
[39] Pardoux, E. & Tang, S. (1999). Forward-backward stochastic differential equations and quasilinear parabolic PDEs, Probability Theory and Related Fields 114(2), 123–150.
[40] Peng, S. (1990). A general stochastic maximum principle for optimal control problems, SIAM Journal on Control and Optimization 28(4), 966–979.
[41] Peng, S. (1992). A nonlinear Feynman-Kac formula and applications, in Control Theory, Stochastic Analysis and Applications: Proceedings of the Symposium on System Sciences and Control Theory (Hangzhou, 1992), S.P. Shen & J.M. Yong, eds, World Scientific Publications, River Edge, NJ, pp. 173–184.
[42] Peng, S. & Wu, Z. (1999). Fully coupled forward-backward stochastic differential equations and applications to optimal control, SIAM Journal on Control and Optimization 37(3), 825–843.
[43] Yong, J. (1997). Finding adapted solutions of forward-backward stochastic differential equations: method of continuation, Probability Theory and Related Fields 107(4), 537–572.
[44] Yong, J. & Zhou, X. (1999). Stochastic Controls: Hamiltonian Systems and HJB Equations, Springer.
[45] Zhang, J. (2004). A numerical scheme for BSDEs, The Annals of Applied Probability 14(1), 459–488.
[46] Zhang, J. (2006). The wellposedness of FBSDEs, Discrete and Continuous Dynamical Systems-series B 6, 927–940.

Related Articles

Backward Stochastic Differential Equations; Backward Stochastic Differential Equations: Numerical Methods; Doob–Meyer Decomposition.

JIANFENG ZHANG
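
As a numerical illustration of the linear FBSDE (13), the following minimal sketch simulates the forward price $S$ and the wealth $V$ for a European call, $g(x) = (x - K)^+$, and checks that starting from the Black–Scholes price with $\pi_t$ equal to the Black–Scholes delta approximately reproduces $V_T = g(S_T)$. The payoff, the parameter values, and all function names are assumptions made for this illustration; the wealth update uses $\pi_t(S_{t+\Delta t} - S_t)$ as the discrete counterpart of $\pi_t S_t(\mu\,dt + \sigma\,dW_t)$.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)  # elementwise error function without SciPy


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))


def bs_call_price(s, k, r, sigma, tau):
    """Black-Scholes call price with time to maturity tau > 0."""
    d1 = (np.log(s / k) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return s * norm_cdf(d1) - k * math.exp(-r * tau) * norm_cdf(d2)


def bs_call_delta(s, k, r, sigma, tau):
    """Black-Scholes delta, used here as the portfolio process pi_t."""
    d1 = (np.log(s / k) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    return norm_cdf(d1)


def replicate_call(s0=100.0, k=100.0, r=0.03, mu=0.08, sigma=0.2, T=1.0,
                   n_steps=200, n_paths=2000, seed=0):
    """Simulate (S, V) of equation (13); return terminal wealth and payoff."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    s = np.full(n_paths, s0)
    v = np.full(n_paths, float(bs_call_price(s0, k, r, sigma, T)))  # V_0
    for i in range(n_steps):
        tau = T - i * dt                      # remaining time, always >= dt
        pi = bs_call_delta(s, k, r, sigma, tau)
        dw = rng.normal(0.0, math.sqrt(dt), size=n_paths)
        s_next = s * np.exp((mu - 0.5 * sigma**2) * dt + sigma * dw)
        # Cash earns r; the stock position earns the realized price change.
        v = v + r * (v - pi * s) * dt + pi * (s_next - s)
        s = s_next
    return v, np.maximum(s - k, 0.0)


if __name__ == "__main__":
    v_T, payoff = replicate_call()
    print("mean |V_T - g(S_T)|:", np.abs(v_T - payoff).mean())
```

The drift $\mu$ enters the simulation of $S$ but not the choice of $V_0$ or $\pi$, which is consistent with the fact that the solution of the backward component does not depend on $\mu$. The residual error comes from rebalancing only at discrete times and shrinks as `n_steps` grows.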
Martingale Representation Theorem

The "martingale representation theorem" is one of the fundamental theorems of stochastic calculus. It was first noted by Itô [9] (see Itô, Kiyosi (1915–2008)) as an application of multiple Wiener–Itô integrals. It was later modified and extended to various forms by many authors, but the basic theme remains the same: a square-integrable (local) martingale with respect to the filtration generated by a Brownian motion can always be represented as an Itô integral with respect to that Brownian motion. An immediate consequence would then be that every square-integrable martingale with respect to a Brownian filtration must have continuous paths. The martingale representation theorem is particularly useful in fields such as nonlinear filtering and mathematical finance [12] (see Second Fundamental Theorem of Asset Pricing) and it is a fundamental building block of the theory of backward stochastic differential equations [17, 19] (see Backward Stochastic Differential Equations).
To state the martingale representation theorem more precisely, let us consider a probability space $(\Omega, \mathcal{F}, P)$, on which is defined a $d$-dimensional Brownian motion $B$. We denote the filtration generated by $B$ as $\mathbb{F}^B = \{\mathcal{F}_t^B\}_{t \ge 0}$, where $\mathcal{F}_t^B = \sigma\{B_s : s \le t\} \vee \mathcal{N}$, $t \ge 0$, and $\mathcal{N}$ is the set of all $P$-null sets in $\mathcal{F}$. It can be checked that the filtration $\mathbb{F}^B$ is right continuous (i.e., $\mathcal{F}_t^B = \mathcal{F}_{t+}^B = \cap_{\varepsilon > 0}\mathcal{F}_{t+\varepsilon}^B$, $t \ge 0$), and $\mathcal{F}_t^B$ contains all $P$-null sets of $\mathcal{F}$. In other words, $\mathbb{F}^B$ satisfies the so-called usual hypotheses [20] (see Filtrations). Let us denote $\mathcal{M}^2(\mathbb{F}^B)$ to be the set of all square-integrable $\mathbb{F}^B$-martingales and $\mathcal{M}_c^2(\mathbb{F}^B)$ to be the subspace of $\mathcal{M}^2(\mathbb{F}^B)$ of all those martingales that have continuous paths. The most common martingale representation theorem is the following:

Theorem 1 Let $M \in \mathcal{M}^2(\mathbb{F}^B)$. Then there exists a $d$-dimensional $\mathbb{F}^B$-predictable process $H$ with $E\int_0^T |H_s|^2\,ds < \infty$ for all $T > 0$, such that

$$M_t = M_0 + \int_0^t (H_s, dB_s) = M_0 + \sum_{i=1}^d \int_0^t H_s^i\,dB_s^i, \quad \forall t \ge 0 \tag{1}$$

Furthermore, the process $H$ is unique modulo $dt \times dP$-null sets. Consequently, it holds that $\mathcal{M}^2(\mathbb{F}^B) = \mathcal{M}_c^2(\mathbb{F}^B)$.

The proof of this theorem can be found in standard reference books in stochastic analysis, for example, Ikeda and Watanabe [8], Karatzas and Shreve [12], Liptser and Shiryaev [14], Protter [20], and Rogers and Williams [21], to mention a few. But the work of Dellacherie [1] is worth mentioning, since it is the basis for many other proofs in the literature.
Note that if $\xi$ is an $\mathcal{F}_T^B$-measurable random variable for some $T > 0$ with finite second moments, then $M_t = E[\xi \mid \mathcal{F}_t^B]$, $t \ge 0$, defines a square-integrable $\mathbb{F}^B$-martingale. We therefore have the following corollary:

Corollary 1 Assume that $\xi$ is an $\mathcal{F}_T^B$-measurable random variable for some $T > 0$, such that $E[|\xi|^2] < \infty$. Then there exists a $d$-dimensional $\mathbb{F}^B$-predictable process $H$ with $E\int_0^T |H_s|^2\,ds < \infty$ such that

$$\xi = E[\xi] + \int_0^T (H_s, dB_s) = E[\xi] + \sum_{i=1}^d \int_0^T H_s^i\,dB_s^i, \quad P\text{-a.s.} \tag{2}$$

Furthermore, the process $H$ is unique modulo $dt \times dP$-null sets.

We remark that in the above corollary, the process $H$, often referred to as the martingale integrand or representation kernel of the martingale $M$, could depend on the duration $T > 0$; therefore, a more precise notation would be $H = H^T$, if the time duration $T$ has to be taken into consideration. But the uniqueness of the representation implies that the family $\{H^T\}$ is actually "consistent" in the sense that $H_t^{T_1} = H_t^{T_2}$, $dt \times dP$ a.e. on $[0, T_1] \times \Omega$, if $T_1 \le T_2$.
The martingale representation theorem can be generalized to local martingales [12, 20, 21]:

Theorem 2 Every $\mathbb{F}^B$-local martingale is continuous and is the stochastic integral with respect to $B$ of
a predictable process $H$ such that

$$P\left\{\int_0^t |H_s|^2\,ds < \infty : t \ge 0\right\} = 1 \tag{3}$$

We note that there is a slight difference between Corollary 1 and Theorem 2, on the integrability of the integrand $H$. In fact, without the local martingale assumption the "local" square integrability such as equation (3) does not guarantee the uniqueness of the process $H$ in Corollary 1. A very elegant result in this regard is attributed to Dudley [4], who proved that any almost surely finite $\mathcal{F}_T$-measurable random variable $\xi$ can be represented as a stochastic integral evaluated at $T$, and the "martingale integrand" satisfies only equation (3). However, such representation does not have uniqueness. This point was further investigated in [7]. In this study, the filtration is generated by a higher dimensional Brownian motion, of which $B$ is only a part of the components. We also refer to [12] for the discussions on this issue.
Itô's original martingale representation theorem has been extended to many other situations when the Brownian motion is replaced by certain semimartingales. In this section, we give a brief summary of these cases. For simplicity in what follows, we shall consider only martingales rather than local martingales. The versions for the latter are essentially identical, but with slightly relaxed integrability requirements on the representing integrands, as we saw in Theorem 2.

Representation under Non-Brownian Filtrations

We recall that one of the most important assumptions in the martingale representation theorems is that the filtration is generated by the Brownian motion (or "Brownian-filtration"). When this assumption is removed, the representation may still hold, but the form will change. There are different ways to adjust the result:

1. Fix the probability space, but change the form of representation (by adding an orthogonal martingale).
2. Fix the probability space, but use more information of the martingale to be represented.
3. Extend the probability space, but keep the form of the representation.

The generalization of type (1) essentially uses the idea of orthogonal decomposition of the Hilbert space. In fact, note that $\mathcal{M}^2(\mathbb{F})$ is a Hilbert space, and let $\mathcal{H}$ denote the set of all $H \in \mathcal{M}^2(\mathbb{F})$ such that $H_t = \int_0^t \Phi_s\,dB_s$, $t \ge 0$, for some progressively measurable process $\Phi \in L^2([0, T] \times \Omega)$. Then $\mathcal{H}$ is a closed subspace of $\mathcal{M}^2(\mathbb{F})$; thus for any $M \in \mathcal{M}^2(\mathbb{F})$ the following decomposition holds:

$$M_t = M_0 + H_t + N_t = M_0 + \int_0^t \Phi_s\,dB_s + N_t, \quad t \ge 0 \tag{4}$$

where $N \in \mathcal{H}^\perp$, the subspace of $\mathcal{M}^2(\mathbb{F})$ consisting of all martingales that are "orthogonal" to $\mathcal{H}$. We refer to [12] and [20], for example, for detailed discussions for this type of representations. The generalizations of types (2) and (3) keep the original form of the representation. We now list two results adapted from Ikeda–Watanabe [8].

Theorem 3 Let $M^i \in \mathcal{M}_c^2(\mathbb{F})$, $i = 1, 2, \ldots, d$. Suppose that $\Phi_{i,j} \in \mathcal{L}^1(\mathbb{F})$ and $\Psi_{i,k} \in \mathcal{L}^2(\mathbb{F})$, $i, j, k = 1, 2, \ldots, d$, exist such that for $i, j = 1, 2, \ldots, d$,

$$\langle M^i, M^j \rangle_t = \int_0^t \Phi_s^{ij}\,ds \quad \text{and} \quad \Phi_s^{i,j} = \sum_{k=1}^d \Psi_s^{ik}\Psi_s^{jk}, \quad P\text{-a.s.} \tag{5}$$

and $\det(\Psi_s^{jk}) \ne 0$, a.s., for all $s \ge 0$. Then there exists a $d$-dimensional $\mathbb{F}$-Brownian motion $B = \{(B_t^1, \ldots, B_t^d) : t \ge 0\}$ such that

$$M_t^i = M_0^i + \sum_{k=1}^d \int_0^t \Psi_s^{ik}\,dB_s^k, \quad i = 1, 2, \ldots, d \tag{6}$$

We remark that the assumption $\det(\Psi_s^{jk}) \ne 0$ in Theorem 3 is quite restrictive, which implies, among other things, that the representing Brownian motion has to have the same dimension as the given martingale (thus the representation kernel is "squared"). This restriction can be removed by allowing the probability space to be enlarged (or extended, see [8]).
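
To make Theorem 3 concrete, the sketch below treats the simplest case of a constant, invertible kernel. A martingale $M$ is simulated with increments $dM = A\,dW$ for a hidden Brownian motion $W$; using only $\Phi = AA^{\top}$, we pick the Cholesky factor $\Psi$ (so that $\Phi = \Psi\Psi^{\top}$ and $\det\Psi \ne 0$) and set $dB = \Psi^{-1}dM$. The matrix `A`, the function name, and the sample sizes are assumptions made for this illustration only; the check is that $B_T$ has covariance close to $T$ times the identity and that $M$ is recovered from $\Psi$ and $B$ as in equation (6).

```python
import numpy as np


def recover_brownian_motion(A, T=1.0, n_steps=50, n_paths=20000, seed=0):
    """Simulate dM = A dW, then rebuild a Brownian motion B from M and Phi."""
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    dt = T / n_steps
    # Increments of the hidden driving noise W, shape (n_paths, n_steps, d).
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps, d))
    dM = dW @ A.T                     # martingale increments, d<M^i, M^j> = Phi_ij dt
    Phi = A @ A.T                     # density of the quadratic covariation
    Psi = np.linalg.cholesky(Phi)     # a square root of Phi; in general Psi != A
    dB = dM @ np.linalg.inv(Psi).T    # dB = Psi^{-1} dM
    dM_rebuilt = dB @ Psi.T           # representation (6) in increment form
    B_T = dB.sum(axis=1)              # terminal values, shape (n_paths, d)
    cov_B_T = B_T.T @ B_T / n_paths   # sample covariance of B_T (mean is zero)
    return dM, dM_rebuilt, cov_B_T


if __name__ == "__main__":
    A = np.array([[1.0, 0.5], [0.3, 2.0]])
    dM, dM_rebuilt, cov = recover_brownian_motion(A)
    print("max reconstruction error:", np.abs(dM - dM_rebuilt).max())
    print("sample covariance of B_T:\n", cov)
```

Here $B$ differs from the hidden $W$ by the orthogonal matrix $\Psi^{-1}A$, which illustrates that the representing Brownian motion depends on the chosen square root $\Psi$ of $\Phi$.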
Theorem 4 Let $M^i \in \mathcal{M}_c^2(\mathbb{F})$, $i = 1, 2, \ldots, d$. Suppose that $\Phi_{i,j}, \Psi_{i,k} \in \mathcal{L}^0(\mathbb{F})$, $i, j = 1, 2, \ldots, d$, $k = 1, 2, \ldots, r$, exist such that for $i, j = 1, 2, \ldots, d$ and $k = 1, 2, \ldots, r$, $\int_0^t |\Phi_s^{ij}|\,ds < \infty$ and $\int_0^t |\Psi_s^{ik}|^2\,ds < \infty$, $t \ge 0$, $P$-a.s., and that

$$\langle M^i, M^j \rangle_t = \int_0^t \Phi_s^{ij}\,ds \quad \text{and} \quad \Phi_s^{i,j} = \sum_{k=1}^r \Psi_s^{ik}\Psi_s^{jk}, \quad P\text{-a.s.} \tag{7}$$

Then there exists an extension $(\tilde{\Omega}, \tilde{\mathcal{F}}, \tilde{P}; \tilde{\mathbb{F}})$ of $(\Omega, \mathcal{F}, P; \mathbb{F})$, and an $r$-dimensional $\tilde{\mathbb{F}}$-Brownian motion $B = \{(B_t^1, \ldots, B_t^r) : t \ge 0\}$ such that

$$M_t^i = M_0^i + \sum_{k=1}^r \int_0^t \Psi_s^{ik}\,dB_s^k, \quad i = 1, 2, \ldots, d \tag{8}$$

Representation for Discontinuous Martingales

Up to this point, all the representable martingales are, in fact necessarily, continuous. This clearly excludes many important martingales, most notably the compensated Poisson processes. Thus another generalization of the martingale representation theorem is to replace the Brownian motion by Poisson random measure. We refer to Ikeda and Watanabe [8], for example, for the basic notions of Poisson point process and Poisson random measures.
Let $p$ be a Poisson point process (see Point Processes) on some state space $(X, \mathcal{B}(X))$, where $\mathcal{B}(X)$ stands for the Borel field of $X$. For each $t > 0$ and $U \in \mathcal{B}(X)$, define the counting measure $N_p(t, U) = \sum_{s \le t} 1_U(p(s))$. We assume that the point process $p$ is of class (QL), that is, the compensator $\hat{N}_p(\cdot, U) = E[N_p(\cdot, U)]$ is continuous for each $U$; and $\tilde{N}_p(t, U) = N_p(t, U) - \hat{N}_p(t, U)$ is a martingale. Similar to the Brownian case, we can define the filtration generated by $p$ as $\mathcal{F}_t^p = \sigma\{N_p(s, U) : s \le t, U \in \mathcal{B}(X)\}$ (or make it right continuous by defining $\tilde{\mathcal{F}}_t^p = \cap_{\varepsilon > 0}\mathcal{F}_{t+\varepsilon}^p$), and denote $\mathbb{F}^p = \{\mathcal{F}_t^p\}_{t \ge 0}$. We then have the following analog of Theorem 1.

Theorem 5 Let $M \in \mathcal{M}^2(\mathbb{F}^p)$. Then there exists an $\mathbb{F}^p$-predictable random field $f : \Omega \times [0, \infty) \times X \to \mathbb{R}$ satisfying $E\int_0^t\int_X |f(s, x, \cdot)|^2\,\hat{N}_p(ds, dx) < \infty$, such that

$$M_t = M_0 + \int_0^{t+}\int_X f(s, x, \cdot)\,\tilde{N}_p(ds, dx), \quad t \ge 0 \tag{9}$$

We should note that like Theorem 1, Theorem 5 also has generalizations that could be considered as counterparts of Theorems 3 and 4 [8]. It is worth noting that by combining Theorems 1 and 5, it is possible to obtain a martingale representation theorem that involves both Brownian motion and the Poisson random measure. Keeping the Lévy–Khintchine formula (see Lévy Processes) (or Lévy–Itô Theorem) in mind, we have the following representation theorem, which is a simplified version resulting from a much deeper and extensive exposition by Jacod and Shiryaev [10] (see also [13]). Let $\mathbb{F}$ be the filtration generated by a Lévy process with the Brownian component $B$ and Poisson component $N$.

Theorem 6 Suppose that $M \in \mathcal{M}^2(\mathbb{F})$. Then there exist an $\mathbb{F}$-adapted process $H$ and a random field $G$ satisfying $E\int_0^T |H_s|^2\,ds < \infty$, $E\int_0^t\int_{\mathbb{R}\setminus\{0\}} |G(s, x)|^2\,\hat{N}(ds, dx) < \infty$, such that

$$M_t = M_0 + \int_0^t H_s\,dB_s + \int_0^t\int_{\mathbb{R}\setminus\{0\}} G(s, x)\,\tilde{N}(ds, dx) \tag{10}$$

Moreover, the elements of the pair $(H, G)$ are unique in their respective spaces.

In Theorem 6, the Brownian component and the Poisson component of the Lévy process have to be treated separately, and one cannot simply replace the Brownian motion in Theorem 1 by a Lévy process. In fact, the martingale representation for Lévy process is a much more subtle issue, and was recently studied by Nualart and Schoutens [18] via the chaotic representation using the so-called Teugels martingales. We refer also to Løkka [15] for a more recent development on this issue.
A natural question now is whether the martingale representation theorem can still hold (in the usual sense) for martingales with jumps. The answer to this question has an important implication in finance, since, as we shall see subsequently, this is the
same as asking whether a market could be complete when the dynamics of the underlying assets have jumps. It turns out that there indeed exists a class of martingales, known as the normal martingales, that are discontinuous in general but for which the martingale representation theorem holds. A square-integrable martingale $M$ is called normal if $\langle M \rangle_t = t$ (cf. [2]). The class of normal martingales, in particular, includes those martingales that satisfy the so-called structure equation (cf. [5, 6]). Examples of normal martingales satisfying the structure equation include Brownian motion, compensated Poisson process, the Azéma martingale, and the "parabolic" martingale [20]. The martingale representation, or more precisely the Clark–Ocone formula, was proved in [16]. The application of such a representation in finance was first done by Dritschel and Protter [3] (see also [11]).

Relation with Hedging

The martingale representation theorem is the basis for the arguments leading to market completeness, a fundamental component in the "Second Fundamental Theorem" of mathematical finance (see Second Fundamental Theorem of Asset Pricing). Consider a market modeled by a probability space $(\Omega, \mathcal{F}, P, \mathbb{F})$, where $\mathbb{F}$ is the filtration generated by a Brownian motion that represents market randomness, and denote it by $B$. Assume that the market is arbitrage free; then there exists a risk neutral measure $Q$ (see Fundamental Theorem of Asset Pricing), equivalent to $P$. The arbitrage price at time $t \in [0, T]$ for any contingent $T$-claim $X$ is given by the discounted present value formula:

$$V_t = e^{-r(T-t)}E^Q[X \mid \mathcal{F}_t], \quad t \in [0, T] \tag{11}$$

where $r$ is the (constant) interest rate. If $X$ is square integrable, then $M_t = e^{-rt}V_t$, $t \ge 0$, is a square-integrable $\mathbb{F}$-martingale under $Q$. Applying the martingale representation theorem one has

$$M_t = M_0 + \int_0^t \phi_s\,dB_s, \quad t \in [0, T] \tag{12}$$

for some square-integrable, $\mathbb{F}$-predictable process $\phi$. Or equivalently, assuming that the volatility of the market, denoted by $\sigma$, is positive, we can write

$$V_t = V_0 + \int_0^t rV_s\,ds + \int_0^t \pi_s\sigma_s\,dB_s, \quad t \in [0, T] \tag{13}$$

where $\pi_t = e^{rt}\phi_t\sigma_t^{-1}$, $t \ge 0$. The process $\pi$ is then exactly the "hedging strategy" for the claim $X$, that is, the amount of money one should invest in the stock, so that $V_T = X$, almost surely.
The martingale representation theorem also plays an important role in portfolio optimization problems, especially in finding optimal strategies [12].
One of the abstract forms of the hedging problem described earlier is the so-called backward stochastic differential equation (BSDE), which is the problem of finding a pair of $\mathbb{F}$-adapted processes $(V, Z)$ so that the following terminal value problem for a stochastic differential equation similar to (13) holds:

$$dV_t = f(t, V_t, Z_t)\,dt + Z_t\,dB_t, \quad t \in [0, T]; \qquad V_T = X \tag{14}$$

See Forward–Backward Stochastic Differential Equations (SDEs); Backward Stochastic Differential Equations.

References

[1] Dellacherie, C. (1974). Intégrales Stochastiques par Rapport aux Processus de Wiener et de Poisson, Séminaire de Probabilités (Univ. de Strasbourg) IV, Lecture Notes in Math, Springer-Verlag, Berlin, Vol. 124, 77–107.
[2] Dellacherie, C., Maisonneuve, B. & Meyer, P.A. (1992). Probabilités et Potentiel: Chapitres XVII à XXIV, Hermann, Paris.
[3] Dritschel, M. & Protter, P. (1999). Complete markets with discontinuous security price, Finance and Stochastics 3(2), 203–214.
[4] Dudley, R.M. (1977). Wiener functionals as Itô integrals, Annals of Probability 5, 140–141.
[5] Emery, M. (1989). On the Azéma Martingales, Séminaire de Probabilités XXIII, Lecture Notes in Mathematics, Vol. 1372, Springer Verlag, pp. 66–87.
[6] Emery, M. (2006). Chaotic representation property of certain Azéma martingales, Illinois Journal of Mathematics 50(2), 395–411.
[7] Emery, M., Stricker, C. & Yan, J. (1983). Valeurs prises par les martingales locales continues à un instant donné, Annals of Probability 11, 635–641.
[8] Ikeda, N. & Watanabe, S. (1981). Stochastic Differential Equations and Diffusion Processes, North-Holland.
[9] Itô, K. (1951). Multiple Wiener integral, Journal of Mathematical Society of Japan 3, 157–169.
[10] Jacod, J. & Shiryaev, A.N. (1987). Limit Theorems for Stochastic Processes, Springer-Verlag, Berlin.
[11] Jeanblanc, M. & Privault, N. (2002). A complete market model with Poisson and Brownian components, Seminar on Stochastic Analysis, Random Fields and Applications, Ascona; Progress in Probability, 52, 189–204.
[12] Karatzas, I. & Shreve, S.E. (1987). Brownian Motion and Stochastic Calculus, Springer.
[13] Kunita, H. (2004). Representation of martingales with jumps and applications to mathematical finance, in Stochastic Analysis and Related Topics in Kyoto, Advanced Studies in Pure Mathematics 41, H. Kunita, S. Watanabe & Y. Takahashi eds, Mathematical Society of Japan, Tokyo, pp. 209–232.
[14] Liptser, R.S. & Shiryaev, A.N. (1977). Statistics of Random Processes. Vol I: General Theory, Springer-Verlag, New York.
[15] Løkka, A. (2004). Martingale representation of functionals of Lévy processes, Stochastic Analysis and Applications 22(4), 867–892.
[16] Ma, J., Protter, P. & San Martin, J. (1998). Anticipating integrals for a class of martingales, Bernoulli 4(1), 81–114.
[17] Ma, J. & Yong, J. (1999). Forward-Backward Stochastic Differential Equations and Their Applications, LNM 1702, Springer.
[18] Nualart, D. & Schoutens, W. (2000). Chaotic and predictable representations for Lévy processes, Stochastic Processes and their Applications 90, 109–122.
[19] Pardoux, E. & Peng, S. (1990). Adapted solutions of backward stochastic equations, System and Control Letters 14, 55–61.
[20] Protter, P. (1990). Stochastic Integration and Stochastic Differential Equations, Springer.
[21] Rogers, L.C.G. & Williams, D. (1987). Diffusions, Markov Processes and Martingales, Vol. 2: Itô Calculus, John Wiley & Sons.

Further Reading

Dellacherie, C. & Meyer, P. (1978). Probabilities and Potential, North-Holland.
Doob, J.L. (1984). Classical Potential Theory and its Probabilistic Counterparts, Springer.
Revuz, D. & Yor, M. (1991, 1994). Continuous Martingales and Brownian Motion, Springer.

Related Articles

Backward Stochastic Differential Equations; Convex Duality; Complete Markets; Filtrations; Second Fundamental Theorem of Asset Pricing.

JIN MA
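
As a small numerical check of the representation in Corollary 1 of the preceding article, take $d = 1$ and $\xi = B_T^2$. Itô's formula gives $B_T^2 = T + \int_0^T 2B_s\,dB_s$, so $E[\xi] = T$ and the representation kernel is $H_s = 2B_s$. The sketch below approximates the Itô integral by left-endpoint sums on simulated paths; the function name, grid size, and number of paths are assumptions chosen for illustration.

```python
import numpy as np


def representation_error(T=1.0, n_steps=1000, n_paths=2000, seed=0):
    """Return xi - (E[xi] + sum of H dB) per path for xi = B_T^2, H_s = 2 B_s."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    B = np.cumsum(dB, axis=1)
    # Brownian path at the left endpoint of each step (B_0 = 0), which keeps
    # the integrand non-anticipating as the Ito integral requires.
    B_left = np.hstack([np.zeros((n_paths, 1)), B[:, :-1]])
    xi = B[:, -1] ** 2
    ito_sum = np.sum(2.0 * B_left * dB, axis=1)
    return xi - (T + ito_sum)


if __name__ == "__main__":
    err = representation_error()
    print("mean |error|:", np.abs(err).mean())
```

On a grid the difference equals $\sum_k (\Delta B_k)^2 - T$, the gap between the realized and the limiting quadratic variation, so it tends to zero as `n_steps` increases.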
Backward Stochastic Differential Equations

Backward stochastic differential equations (BSDEs) occur in situations where the terminal (as opposed to the initial) condition of stochastic differential equations is a given random variable. Linear BSDEs were first introduced by Bismut (1976) as the adjoint equation associated with the stochastic version of the Pontryagin maximum principle in control theory. The general case of a nonlinear BSDE was first introduced by Peng and Pardoux [23] to give a Feynman–Kac representation of nonlinear parabolic partial differential equations (PDEs). The solution of a BSDE consists of a pair of adapted processes $(Y, Z)$ satisfying

$$-dY_t = f(t, Y_t, Z_t)\,dt - Z_t\,dW_t, \quad Y_T = \xi \tag{1}$$

where $f$ is called the driver and $\xi$ the terminal condition. This type of equation appears naturally in hedging problems. For example, in a complete market (see Complete Markets), the price process $(Y_t)_{0 \le t \le T}$ of a European contingent claim $\xi$ with maturity $T$ corresponds to the solution of a BSDE with a linear driver $f$ and a terminal condition equal to $\xi$.
Reflected BSDEs were introduced by El Karoui et al. [6]. In the case of a reflected BSDE, the solution $Y$ is constrained to be greater than a given process called the obstacle. A nondecreasing process $K$ is introduced in the equation in order to push (upward) the solution so that the constraint is satisfied, and this push is minimal, that is, $Y$ satisfies the following equation:

$$-dY_t = f(t, Y_t, Z_t)\,dt + dK_t - Z_t\,dW_t, \quad Y_T = \xi \tag{2}$$

with $(Y_t - S_t)\,dK_t = 0$. One can show that the price of an American option (possibly with some nonlinear constraints) is the solution of a reflected BSDE, where the obstacle is given by the payoff process.

Definition and Properties

We adopt the following notation: $\mathbb{F} = \{\mathcal{F}_t, 0 \le t \le T\}$ is the natural filtration of an $n$-dimensional Brownian motion $W$; $L^2$ is the set of random variables $\xi$ that are $\mathcal{F}_T$-measurable and square-integrable; $\mathbb{H}^2$ is the set of predictable processes $\phi$ such that $E\int_0^T |\phi_t|^2\,dt < \infty$. In the following, the sign $'$ denotes transposition.
Let us consider the following BSDE (with dimension 1 to simplify the presentation):

$$-dY_t = f(t, Y_t, Z_t)\,dt - Z_t\,dW_t, \quad Y_T = \xi \tag{3}$$

where $\xi \in L^2$ and $f$ is a driver, that is, it satisfies the following assumptions: $f : \Omega \times [0, T] \times \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}$ is $\mathcal{P} \otimes \mathcal{B} \otimes \mathcal{B}^n$-measurable, $f(\cdot, 0, 0) \in \mathbb{H}^2$ and $f$ is uniformly Lipschitz with respect to $y, z$ with constant $C > 0$. Such a pair $(\xi, f)$ is called a pair of standard parameters. If the driver $f$ does not depend on $y$ and $z$, the solution $Y$ of equation (3) is then given as

$$Y_t = E\left[\xi + \int_t^T f(s)\,ds \,\Big|\, \mathcal{F}_t\right] \tag{4}$$

and the martingale representation theorem for Brownian motion ([16] Theorem 4.15) gives the existence of a unique process $Z \in \mathbb{H}^2$ such that

$$E\left[\xi + \int_0^T f(s)\,ds \,\Big|\, \mathcal{F}_t\right] = Y_0 + \int_0^t Z_s\,dW_s \tag{5}$$

In 1990, Peng and Pardoux [23] stated the following theorem.

Theorem 1 If $\xi \in L^2$ and if $f$ is a driver, then there exists a unique pair of solutions $(Y, Z) \in \mathbb{H}^2 \times \mathbb{H}^2$ of equation (3).

In [7], El Karoui et al. have given a short proof of this theorem based on a priori estimations of the solutions. More precisely, the proposition is given as follows:

Proposition 1 (A Priori Estimations). Let $f^1, \xi^1, f^2, \xi^2$ be standard parameters. Let $(Y^1, Z^1)$ be the solution associated with $f^1, \xi^1$ and $(Y^2, Z^2)$ be the solution associated with $f^2, \xi^2$. Let $C$ be the Lipschitz constant of $f^1$. Substitute $\delta Y_t = Y_t^1 - Y_t^2$, $\delta Z_t = Z_t^1 - Z_t^2$, and $\delta_2 f_t = f^1(t, Y_t^2, Z_t^2) - f^2(t, Y_t^2, Z_t^2)$. For $(\lambda, \mu, \beta)$ such that $\lambda^2 > C$ and $\beta$ sufficiently
large, that is, $\beta > C(2 + \lambda^2) + \mu^2$, the following estimations hold:

$$\|\delta Y\|_\beta^2 \le T\left[e^{\beta T}E(|\delta Y_T|^2) + \frac{1}{\mu^2}\|\delta_2 f\|_\beta^2\right] \tag{6}$$

$$\|\delta Z\|_\beta^2 \le \frac{\lambda^2}{\lambda^2 - C}\left[e^{\beta T}E(|\delta Y_T|^2) + \frac{1}{\mu^2}\|\delta_2 f\|_\beta^2\right] \tag{7}$$

where $\|\delta Y\|_\beta^2 = E\int_0^T e^{\beta t}|\delta Y_t|^2\,dt$.

From these estimations, uniqueness and existence of a solution follow by using the fixed point theorem applied to the function $\Phi : \mathbb{H}_\beta^2 \otimes \mathbb{H}_\beta^2 \to \mathbb{H}_\beta^2 \otimes \mathbb{H}_\beta^2$; $(y, z) \mapsto (Y, Z)$, where $(Y, Z)$ is the solution associated with the driver $f(t, y_t, z_t)$ and $\mathbb{H}_\beta^2$ denotes the space $\mathbb{H}^2$ endowed with norm $\|\cdot\|_\beta$. Indeed, by using the previous estimations, one can show that for sufficiently large $\beta$, the mapping $\Phi$ is strictly contracting, which gives the existence of a unique fixed point, which is the solution of the BSDE.
In addition, from "a priori estimations" (Proposition 1), some continuity and differentiability of solutions of BSDEs (with respect to some parameter) can be derived ([7] section 2).
Furthermore, the estimations of Proposition 1 are also very useful to derive some results concerning approximation or discretization of BSDEs [14].

Let $S \le T$ be a stopping time, and denote by $Y_t(S, \xi)$ the solution of the BSDE with terminal time $T$, coefficient $f(t, y, z)1_{\{t \le S\}}$, and terminal condition $\xi$ ($\mathcal{F}_S$-measurable). Both the processes $(Y_t(S, Y_S), Z_t(S, Y_S); t \in [0, T])$ and $(Y_{t \wedge S}(T, \xi), Z(T, \xi)1_{\{t \le S\}}; t \in [0, T])$ are solutions of the BSDE with terminal time $T$, coefficient $f(t, y, z)1_{\{t \le S\}}$, and terminal condition $Y_S$. By uniqueness, these processes are the same $dP \otimes dt$-a.s.
The simplest case is that of a linear BSDE. Let $(\beta, \gamma)$ be a bounded $(\mathbb{R}, \mathbb{R}^n)$-valued predictable process and let $\varphi \in \mathbb{H}^2(\mathbb{R})$, $\xi \in L^2(\mathbb{R})$. We consider the following BSDE:

$$-dY_t = (\varphi_t + Y_t\beta_t + Z_t\gamma_t)\,dt - Z_t\,dW_t, \quad Y_T = \xi \tag{9}$$

By applying Itô's formula to $\Gamma_t Y_t$, it can easily be shown that the process $\Gamma_t Y_t + \int_0^t \Gamma_s\varphi_s\,ds$ is a local martingale and even a uniformly integrable martingale, which gives the following proposition.

Proposition 3 The solution $(Y, Z)$ of the linear BSDE (9) satisfies

$$\Gamma_t Y_t = E\left[\xi\Gamma_T + \int_t^T \Gamma_s\varphi_s\,ds \,\Big|\, \mathcal{F}_t\right] \tag{10}$$

where $\Gamma$
is the adjoint process (corresponding to a
Recall the dependence of the solutions of BSDEs change of numéraire or a deflator in finance) defined
with respect to terminal time T and terminal condi-
by d
t =
t [βt dt + γt∗ dWt ],
0 = 1.
tion ξ by the notation (Yt (T , ξ ), Zt (T , ξ )). We have
the following flow property. Remark 1 First, it can be noted that if ξ and ϕ are
positive, then the process Y is positive. Second, if in
Proposition 2 (Flow Property). Let (Y (T , ξ ), Z
addition Y0 = 0 a.s., then for any t, Yt = 0 a.s. and
(T , ξ )) be the solution of a BSDE associated with the
ϕt = 0 dt ⊗ dP -a.s.
terminal time T > 0 and standard parameters (ξ , f ).
For any stopping time S ≤ T , From the first point in this remark, one can derive
the classical comparison theorem, which is a key
Yt (T , ξ ) = Yt (S, YS (T , ξ )), property of BSDEs.
Zt (T , ξ ) = Zt (S, YS (T , ξ )),
Theorem 2 (Comparison Theorem). If f 1 , ξ 1 and
t ∈ [0, S], dP ⊗ dt-almost surely (8) f 2 , ξ 2 are standard parameters and if (Y 1 , Z 1 )
(respectively (Y 2 , Z 2 )) is the solution associated with
Proof By conventional notation, we define the solu- (f 1 , ξ 1 ) (respectively (f 2 , ξ 2 )) satisfying
tion of the BSDE with terminal condition (T , ξ ) for

t ≥ T by (Yt = ξ, Zt = 0). Thus, if T ≥ T , then 1. ξ 1 ≥ ξ 2 P -a.s.

(Yt , Zt ); t ≤ T is the unique solution of the BSDE 2. δ2 ft = f 1 (t, Yt2 , Zt2 ) − f 2 (t, Yt2 , Zt2 ) ≥ 0 dt ×

with terminal time T , coefficient f (t, y, z)1{t≤T } , and dP -a.s.
terminal condition ξ . 3. f 1 (t, Yt2 , Zt2 ) ∈ IH 2 .

Then, we have $Y^1_{\cdot} \ge Y^2_{\cdot}$ $P$-a.s.

In addition, the comparison theorem is strict, that is, on the event $\{Y_t^1 = Y_t^2\}$, we have $\xi^1 = \xi^2$ a.s., $f^1(s, Y_s^2, Z_s^2) = f^2(s, Y_s^2, Z_s^2)$ $ds \times dP$-a.s. and $Y_s^1 = Y_s^2$ a.s., $t \le s \le T$.

Idea of the proof. We denote by $\delta Y$ the spread between those two solutions: $\delta Y_t = Y_t^2 - Y_t^1$ and $\delta Z_t = Z_t^2 - Z_t^1$ (the roles of the indices 1 and 2 are exchanged with respect to the statement of Theorem 2, so that the assumptions read $\xi^2 \ge \xi^1$ and $\varphi_t := f^2(t, Y_t^1, Z_t^1) - f^1(t, Y_t^1, Z_t^1) \ge 0$). The problem is to show that under the above assumptions, $\delta Y_t \ge 0$.

Now, the pair $(\delta Y, \delta Z)$ is the solution of the following LBSDE:

$$-d\delta Y_t = \left[\delta_y f^2(t)\,\delta Y_t + \delta_z f^2(t)\,\delta Z_t + \varphi_t\right]dt - \delta Z_t\,dW_t, \quad \delta Y_T = \xi^2 - \xi^1 \qquad (11)$$

where $\delta_y f^2(t) = \dfrac{f^2(t, Y_t^2, Z_t^2) - f^2(t, Y_t^1, Z_t^2)}{Y_t^2 - Y_t^1}$ if $Y_t^2 - Y_t^1$ is not equal to 0, and 0 otherwise (and the same for $\delta_z f^2(t)$). Now, since the driver $f^2$ is supposed to be uniformly Lipschitz with respect to $(y, z)$, it follows that $\delta_y f^2(t)$ and $\delta_z f^2(t)$ are bounded. In addition, $\varphi_t$ and $\delta Y_T$ are nonnegative. It follows from the first point of Remark 1 that the solution $\delta Y_t$ of the LBSDE (11) is nonnegative. In addition, the second point of Remark 1 gives the strict comparison theorem.

From this theorem, we then state a general principle for minima of BSDEs [7]: if a driver $f$ can be written as an infimum of a family of drivers $f^\alpha$ and if a random variable $\xi$ can be written as an infimum of random variables $\xi^\alpha$, then the solution of the BSDE associated with $f$ and $\xi$ can be written as the infimum of the solutions of the BSDEs associated with $f^\alpha, \xi^\alpha$. More precisely, we have the following proposition.

Proposition 4 (Minima of BSDEs). Let $(f, f^\alpha; \alpha \in A)$ be a family of drivers and let $(\xi, \xi^\alpha; \alpha \in A)$ be a family of terminal conditions. Let $(Y, Z)$ be the solution of the BSDE associated with $(f, \xi)$ and let $(Y^\alpha, Z^\alpha)$ be the solution of the BSDE associated with $(f^\alpha, \xi^\alpha)$. Suppose that there exists a parameter $\bar\alpha$ such that

$$f(t, Y_t, Z_t) = \operatorname*{ess\,inf}_{\alpha} f^\alpha(t, Y_t, Z_t) = f^{\bar\alpha}(t, Y_t, Z_t), \quad dt \otimes dP\text{-a.s.} \qquad (12)$$

$$\xi = \operatorname*{ess\,inf}_{\alpha} \xi^\alpha = \xi^{\bar\alpha}, \quad P\text{-a.s.} \qquad (13)$$

Then,

$$Y_t = \operatorname*{ess\,inf}_{\alpha} Y_t^\alpha = Y_t^{\bar\alpha}, \quad 0 \le t \le T,\ P\text{-a.s.} \qquad (14)$$

Proof For each $\alpha$, since $f(t, Y_t, Z_t) \le f^\alpha(t, Y_t, Z_t)$ $dt \otimes dP$-a.s. and $\xi \le \xi^\alpha$, the comparison theorem gives that $Y_t \le Y_t^\alpha$, $0 \le t \le T$, $P$-a.s. It follows that

$$Y_t \le \operatorname*{ess\,inf}_{\alpha} Y_t^\alpha, \quad 0 \le t \le T,\ P\text{-a.s.} \qquad (15)$$

Now, by assumption, it is clear that $Y_t = Y_t^{\bar\alpha}$, $0 \le t \le T$, $P$-a.s., which gives that the inequality in (15) is an equality, which ends the proof.

Note also that from the strict comparison theorem, one can derive an optimality criterion [7]:

Proposition 5 A parameter $\bar\alpha$ is 0-optimal (i.e., $\min_\alpha Y_0^\alpha = Y_0^{\bar\alpha}$) if and only if

$$f(s, Y_s, Z_s) = f^{\bar\alpha}(s, Y_s, Z_s) \quad dP \otimes ds\text{-a.s.}, \qquad \xi = \xi^{\bar\alpha} \quad P\text{-a.s.} \qquad (16)$$

The flow property (Proposition 2) of the value function corresponds to the dynamic programming principle in stochastic control. Indeed, using the same notation as in Proposition 2, for any stopping time $S \le T$,

$$Y_t(T, \xi) = \operatorname*{ess\,inf}_{\alpha} Y_t^\alpha(S, Y_S(T, \xi)), \quad 0 \le t \le S,\ P\text{-a.s.} \qquad (17)$$

From the principle on minima of BSDEs (Proposition 4), one can easily obtain some links between BSDEs and stochastic control (see, e.g., [10] Section 3 for a financial presentation or [26] for a more classical presentation in stochastic control).

Note, in particular, that if this principle on minima of BSDEs is formulated a bit differently, it can be seen as a verification theorem for some stochastic control problem written in terms of BSDEs. More precisely, let $(f^\alpha; \alpha \in A)$ be a family of drivers and let $(\xi^\alpha; \alpha \in A)$ be a family of terminal conditions. Let $(Y^\alpha, Z^\alpha)$ be the solution of the BSDE associated with $(f^\alpha, \xi^\alpha)$. The value function is defined at time $t$ as

$$\bar Y_t = \operatorname*{ess\,inf}_{\alpha} Y_t^\alpha, \quad P\text{-a.s.} \qquad (18)$$
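To make Proposition 4 and the value function (18) concrete, the following worked example may help. It is only a sketch: the one-dimensional family of drivers $f^\alpha(t, y, z) = \alpha z$ with constant $\alpha \in [-\mu, \mu]$ and the common terminal condition $\xi = W_T$ are illustrative assumptions and do not come from the text.

```latex
% Assumed family: f^\alpha(t,y,z) = \alpha z, \alpha \in [-\mu,\mu], \xi^\alpha = \xi = W_T.
% Each linear BSDE  -dY^\alpha_t = \alpha Z^\alpha_t\,dt - Z^\alpha_t\,dW_t,  Y^\alpha_T = W_T
% is solved explicitly by
\[
  Z^\alpha_t = 1, \qquad Y^\alpha_t = W_t + \alpha\,(T - t),
\]
% since then  -dY^\alpha_t = \alpha\,dt - dW_t = \alpha Z^\alpha_t\,dt - Z^\alpha_t\,dW_t.
% The value function is therefore
\[
  \bar Y_t = \operatorname*{ess\,inf}_{\alpha \in [-\mu,\mu]} Y^\alpha_t
           = W_t - \mu\,(T - t), \qquad \text{attained at } \bar\alpha = -\mu .
\]
% The infimum driver is  f(t,y,z) = \inf_{\alpha} \alpha z = -\mu |z|,  and the BSDE
%   -dY_t = -\mu |Z_t|\,dt - Z_t\,dW_t,  Y_T = W_T
% has the solution
\[
  Z_t = 1, \qquad Y_t = W_t - \mu\,(T - t),
\]
% so that  f(t,Y_t,Z_t) = -\mu = f^{\bar\alpha}(t,Y_t,Z_t)  as required by (12),
% and  Y_t = \bar Y_t = Y^{\bar\alpha}_t  as stated in (14).
```

In this example the minimizing parameter is the same for all $t$ and $\omega$; in general $\bar\alpha$ is a process.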

If there exist standard parameters $f$ and $\xi$ and a parameter $\bar\alpha$ such that equation (12) holds, then the value function coincides with the solution of the BSDE associated with $(f, \xi)$. In other words, $\bar Y_t = Y_t$, $0 \le t \le T$, $P$-a.s., where $(Y, Z)$ denotes the solution of the BSDE associated with $(f, \xi)$. It can be noted that this verification theorem generalizes the well-known Hamilton–Jacobi–Bellman verification theorem, which holds in a Markovian framework.

Indeed, recall that in the Markovian case, that is, the case where the driver and the terminal condition are functions of a state process, Peng and Pardoux (1992) have given an interpretation of the solution of a BSDE in terms of a PDE [24]. More precisely, the state process $X^{t,x}_{\cdot}$ is a diffusion of the following type:

$$dX_s = b(s, X_s)\,ds + \sigma(s, X_s)\,dW_s, \quad X_t = x \qquad (19)$$

Then, let us consider $(Y^{t,x}, Z^{t,x})$ solution of the following BSDE:

$$-dY_s = f(s, X_s^{t,x}, Y_s, Z_s)\,ds - Z_s\,dW_s, \quad Y_T = g(X_T^{t,x}) \qquad (20)$$

where $b$, $\sigma$, $f$, and $g$ are deterministic functions. In this case, one can show that under quite weak conditions, the solution $(Y_s^{t,x}, Z_s^{t,x})$ depends only on time $s$ and on the state process $X_s^{t,x}$ (see [7] Section 4). In addition, if $f$ and $g$ are uniformly continuous with respect to $x$ and if $u$ denotes the function such that $Y_t^{t,x} = u(t, x)$, one can show (see [24] or [10] p. 226 for a shorter proof) that $u$ is a viscosity solution of the following PDE:

$$\partial_t u + Lu(t, x) + f(t, x, u(t, x), \partial_x u\,\sigma(t, x)) = 0, \quad u(T, x) = g(x) \qquad (21)$$

where $L$ denotes the infinitesimal generator of $X$ (see Forward–Backward Stochastic Differential Equations (SDEs); Markov Processes). There are some complementary results concerning the case of a non-Brownian filtration (see [1] or [7] Section 5). In addition, some properties of differentiability in Malliavin's sense of the solution of a BSDE can be given [7, 24]. In particular, under some smoothness assumptions on $f$, the process $Z_t$ corresponds to the Malliavin derivative of $Y_t$, that is,

$$D_t Y_t = Z_t, \quad dP \otimes dt\text{-a.s.} \qquad (22)$$

Many attempts have been made to relax the Lipschitz assumption on the driver $f$; for instance, Lepeltier and San Martín [19] have proved, by an approximation method, the existence of a solution for BSDEs with a driver $f$ that is only continuous with linear growth. Kobylanski [17] studied the case of quadratic BSDEs [20]. To give some intuition on quadratic BSDEs, let us consider the following simple example:

$$-dY_t = \frac{Z_t^2}{2}\,dt - Z_t\,dW_t, \quad Y_T = \xi \qquad (23)$$

Let us make the exponential change of variable $y_t = e^{Y_t}$. By applying Itô's formula, we easily derive

$$dy_t = e^{Y_t} Z_t\,dW_t, \quad y_T = e^{\xi} \qquad (24)$$

and hence, if $\xi$ is supposed to be bounded and $Z \in \mathbb{H}^2$, we have $y_t = E[e^{\xi} \mid \mathcal{F}_t]$. Thus, for quadratic BSDEs, it seems quite natural to suppose that the terminal condition is bounded. More precisely, the following existence result holds [17].

Proposition 6 (Quadratic BSDEs). If the terminal condition $\xi$ is bounded and if the driver $f$ has linear growth in $y$ and quadratic growth in $z$, that is,

$$|f(t, y, z)| \le C(1 + |y| + |z|^2) \qquad (25)$$

then there exists an adapted pair of processes $(Y, Z)$, which is the solution of the quadratic BSDE associated with $f$ and $\xi$ such that the process $Y$ is bounded and $Z \in \mathbb{H}^2$.

The idea is to make an exponential change of variable $y_t = e^{2CY_t}$ and to show the existence of a solution by an approximation method. More precisely, it is possible to show that there exists a nonincreasing sequence of Lipschitz drivers $F^p$, which converges to $F$ (where $F$ is the driver of the BSDE satisfied by $y_t$). Then, one can show that the (nonincreasing) sequence $y^p$ of solutions of classical BSDEs associated with $F^p$ converges to a solution $y$ of the BSDE associated with the driver $F$ and terminal condition $e^{2C\xi}$, which gives the desired result.
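The exponential change of variable in (23) and (24) gives $Y_0 = \log E[e^{\xi}]$, which can be checked numerically. The sketch below is illustrative only: the bounded terminal condition $\xi = \tanh(W_T)$, the horizon $T = 1$, the sample size, and the function names are assumptions made here, not part of the text. It compares a Monte Carlo estimate with a trapezoidal quadrature against the Gaussian density.

```python
import math
import random


def terminal_value(w):
    """Bounded terminal condition xi = tanh(W_T) (an illustrative choice)."""
    return math.tanh(w)


def y0_monte_carlo(T=1.0, n_paths=200_000, seed=0):
    """Estimate Y_0 = log E[exp(xi)] by sampling W_T ~ N(0, T)."""
    rng = random.Random(seed)
    sd = math.sqrt(T)
    total = 0.0
    for _ in range(n_paths):
        total += math.exp(terminal_value(rng.gauss(0.0, sd)))
    return math.log(total / n_paths)


def y0_quadrature(T=1.0, n_grid=4001, width=8.0):
    """Same quantity via the trapezoidal rule against the N(0, T) density."""
    sd = math.sqrt(T)
    lo, hi = -width * sd, width * sd
    h = (hi - lo) / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        w = lo + i * h
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        density = math.exp(-w * w / (2.0 * T)) / math.sqrt(2.0 * math.pi * T)
        total += weight * math.exp(terminal_value(w)) * density * h
    return math.log(total)


if __name__ == "__main__":
    print("Monte Carlo:", y0_monte_carlo())
    print("Quadrature: ", y0_quadrature())
```

The two numbers should agree up to Monte Carlo error. By Jensen's inequality $Y_0 \ge E[\xi] = 0$ here, which reflects the positive quadratic driver in (23).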

BSDE for a European Option

Consider a market model with a nonrisky asset, whose price per unit $P_0(t)$ at time $t$ satisfies

$$dP_0(t) = P_0(t)\,r(t)\,dt \qquad (26)$$

and $n$ risky assets, where the price of the $i$th stock $P_i(t)$ is modeled by the linear stochastic differential equation

$$dP_i(t) = P_i(t)\left[b_i(t)\,dt + \sum_{j=1}^{n} \sigma_{i,j}(t)\,dW_t^j\right] \qquad (27)$$

driven by a standard n-dimensional Wiener process $W = (W^1, \ldots, W^n)^{*}$, defined on a filtered probability space $(\Omega, \mathbb{F}, P)$. We assume the filtration $\mathbb{F}$ generated by the Brownian motion $W$ is complete. The probability $P$ corresponds to the objective probability measure. The coefficients $r$, $b_i$, $\sigma_{i,j}$ are $\mathbb{F}$-predictable processes. We denote the vector $(b_1, \ldots, b_n)^{*}$ by $b$ and the volatility matrix $(\sigma_{i,j}, 1 \le i \le n, 1 \le j \le n)$ by $\sigma$. We will assume that the matrix $\sigma_t$ has full rank for any $t \in [0, T]$. Let $\theta_t = (\theta_t^1, \ldots, \theta_t^n)^{*}$ be the classical risk-premium vector defined as

$$\theta_t = \sigma_t^{-1}(b_t - r_t \mathbf{1}) \quad P\text{-a.s.} \qquad (28)$$

The coefficients $\sigma$, $b$, $\theta$, and $r$ are supposed to be bounded.

Let us consider a small investor, who can invest in the $n + 1$ basic securities. We denote by $(X_t)$ the wealth process. At each time $t$, he/she chooses the amount $\pi_i(t)$ invested in the $i$th stock.

More precisely, a portfolio process is an adapted process $\pi = (\pi_1, \ldots, \pi_n)^{*}$ with $\int_0^T |\sigma_t^{*}\pi_t|^2\,dt < \infty$, $P$-a.s.

The strategy is supposed to be self-financing, that is, the wealth process satisfies the following dynamics:

$$dX_t^{x,\pi} = r_t X_t\,dt + \pi_t^{*}\sigma_t\,(dW_t + \theta_t\,dt) \qquad (29)$$

Generally, the initial wealth $x = X_0$ is taken as a primitive, and for an initial endowment and portfolio process $(x, \pi)$, there exists a unique wealth process $X$, which is the solution of the linear equation (29) with initial condition $X_0 = x$. Therefore, there exists a one-to-one correspondence between pairs $(x, \pi)$ and trading strategies $(X, \pi)$.

Let $T$ be a strictly positive real, which will be the terminal time of our problem. Let $\xi$ be a European contingent claim settled at time $T$, that is, an $\mathcal{F}_T$-measurable square-integrable random variable (it can be thought of as a contract that pays the amount $\xi$ at time $T$). By a direct application of BSDE results, we derive that there exists a unique $P$-square-integrable strategy $(X, \pi)$ such that

$$dX_t = r_t X_t\,dt + \pi_t^{*}\sigma_t\theta_t\,dt + \pi_t^{*}\sigma_t\,dW_t, \quad X_T = \xi \qquad (30)$$

$X_t$ is the price of claim $\xi$ at time $t$ and $(X, \pi)$ is a hedging strategy for $\xi$.

In the case of constraints such as the case of a borrowing interest rate $R_t$ greater than the bond rate $r$ (see [10] p. 201 and 216 or [7]), the case of taxes [8], or the case of a large investor (whose strategy has an influence on prices, see [10] p. 216), the dynamics of the wealth-portfolio strategy is no longer linear. Generally, it can be written as follows:

$$-dX_t = b(t, X_t, \sigma_t^{*}\pi_t)\,dt - \pi_t^{*}\sigma_t\,dW_t \qquad (31)$$

where $b$ is a driver (the classical case corresponds to the case where $b(t, x, z) = -r_t x - z^{*}\theta_t$).

Let $\xi$ be a square-integrable European contingent claim. BSDE results give the existence and the uniqueness of a $P$-square-integrable strategy $(X, \pi)$ such that

$$-dX_t = b(t, X_t, \sigma_t^{*}\pi_t)\,dt - \pi_t^{*}\sigma_t\,dW_t, \quad X_T = \xi \qquad (32)$$

As in the classical case, $X_t$ is the price of the claim $\xi$ at time $t$ and $(X, \pi)$ is a hedging strategy of $\xi$. Also note that, under some smoothness assumptions on the driver $b$, by equality (22), the hedging portfolio process (multiplied by the volatility) $\sigma_t^{*}\pi_t$ corresponds to the Malliavin derivative $D_t X_t$ of the price process, that is,

$$D_t X_t = \sigma_t^{*}\pi_t, \quad dP \otimes dt\text{-a.s.} \qquad (33)$$

which generalizes (to the nonlinear case) the useful result stated by Karatzas and Ocone [21] in the linear case. Thus, we obtain a nonlinear price system (see [10] p. 209), that is, a mapping that, for each $\xi \in L^2(\mathcal{F}_T)$ and $T \ge 0$, associates an adapted process $(X_t^b(\xi, T))_{0 \le t \le T}$, where $X_t^b(\xi, T)$ denotes the solution of the BSDE associated with the driver $b$, terminal condition $\xi$, and terminal time $T$.
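In the classical linear case (30), the driver is $b(t, x, z) = -r_t x - z\theta_t$, so Proposition 3 gives $X_0 = E[\Gamma_T \xi]$ with the deflator $\Gamma_T = \exp(-rT - \theta W_T - \theta^2 T/2)$ when $r$ and $\theta$ are constant. The sketch below checks this under the objective probability $P$ for a call option in a one-stock model and compares it with the Black–Scholes formula. The constant coefficients, the numerical parameter values, and the function names are assumptions chosen for illustration.

```python
import math
import random


def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(s0, strike, r, sigma, T):
    """Closed-form Black-Scholes call price, used as the reference value."""
    d1 = (math.log(s0 / strike) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return s0 * normal_cdf(d1) - strike * math.exp(-r * T) * normal_cdf(d2)


def deflator_call_price(s0, strike, r, b, sigma, T, n_paths=200_000, seed=0):
    """Monte Carlo estimate of X_0 = E[Gamma_T * xi] under P.

    The stock has drift b under P; theta = (b - r) / sigma is the risk premium
    and Gamma_T is the deflator (adjoint process) of the linear BSDE.
    """
    theta = (b - r) / sigma
    rng = random.Random(seed)
    sd = math.sqrt(T)
    total = 0.0
    for _ in range(n_paths):
        w = rng.gauss(0.0, sd)  # W_T under P
        s_T = s0 * math.exp((b - 0.5 * sigma**2) * T + sigma * w)
        gamma_T = math.exp(-r * T - theta * w - 0.5 * theta**2 * T)
        total += gamma_T * max(s_T - strike, 0.0)
    return total / n_paths


if __name__ == "__main__":
    print("Deflator Monte Carlo:", deflator_call_price(100.0, 100.0, 0.05, 0.10, 0.2, 1.0))
    print("Black-Scholes:       ", black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0))
```

The drift $b$ enters the simulation but not the price: its effect is cancelled by the deflator, which is the BSDE counterpart of pricing under the risk-neutral measure.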

By the comparison theorem, this price system is nondecreasing with respect to $\xi$ and satisfies the no-arbitrage property:

A1. If $\xi^1 \ge \xi^2$ and if $X_t^b(\xi^1, T) = X_t^b(\xi^2, T)$ on an event $A \in \mathcal{F}_t$, then $\xi^1 = \xi^2$ on $A$.

By the flow property of BSDEs (Proposition 2), it is also consistent: more precisely, if $S$ is a stopping time (smaller than $T$), then for each time $t$ smaller than $S$, the price associated with payoff $\xi$ and maturity $T$ coincides with the price associated with maturity $S$ and payoff $X_S^b(\xi, T)$, that is,

A2. $\forall t \le S$, $X_t^b(\xi, T) = X_t^b(X_S^b(\xi, T), S)$.

In addition, if $b(t, 0, 0) \ge 0$, then, by the comparison theorem, the price $X^b_{\cdot}$ is positive. Finally, if $b$ is sublinear with respect to $(x, \pi)$ (which is generally the case), then, by the comparison theorem, the price system is sublinear. Also note that if $b(t, 0, 0) = 0$, then the price of a contingent claim $\xi = 0$ is equal to 0, that is, $X_t^b(0, T) = 0$ and moreover (see, e.g., [25]), the price system satisfies the zero–one law property, that is,

A3. $X_t(1_A\xi, T) = 1_A X_t(\xi, T)$ a.s. for $t \le T$, $A \in \mathcal{F}_t$, and $\xi \in L^2(\mathcal{F}_T)$.

Furthermore, if $b$ does not depend on $x$, then the price system satisfies the translation invariance property:

A4. $X_t(\xi + \xi', T) = X_t(\xi, T) + \xi'$, for any $\xi \in L^2(\mathcal{F}_T)$ and $\xi' \in L^2(\mathcal{F}_t)$.

Intuitively, it can be interpreted as a market with interest rate $r$ equal to zero.

In the case where the driver $b$ is convex with respect to $(x, \pi)$ (which is generally the case), we have a variational formulation of the price of a European contingent claim (see [7] or [10] Prop. 3.8 p. 215). Indeed, by classical properties of convex analysis, $b$ can be written as the maximum of a family of affine functions. More precisely, we have

$$b(t, x, \pi) = \sup_{(\beta, \gamma) \in \mathcal{A}} \{b^{\beta,\gamma}(t, x, \pi)\} \qquad (34)$$

where $b^{\beta,\gamma}(t, x, \pi) = B(t, \beta_t, \gamma_t) - \beta_t x - \gamma_t^{*}\pi$, where $B(t, \cdot, \cdot)$ is the polar function of $b$ with respect to $x, \pi$, that is,

$$B(\omega, t, \beta, \gamma) = \inf_{(x, \pi) \in \mathbb{R} \times \mathbb{R}^n} \left[b(\omega, t, x, \pi) + \beta_t(\omega)\,x + \gamma_t(\omega)^{*}\pi\right] \qquad (35)$$

$\mathcal{A}$ is a bounded set of pairs of adapted processes $(\beta, \gamma)$ such that $E\int_0^T B(t, \beta_t, \gamma_t)^2\,dt < +\infty$. The properties of BSDEs give the following variational formulation:

$$X_t^b = \operatorname*{ess\,sup}_{(\beta, \gamma) \in \mathcal{A}} X_t^{\beta,\gamma} \qquad (36)$$

where $X^{\beta,\gamma}$ is the solution of the linear BSDE associated with the driver $b^{\beta,\gamma}$ and terminal condition $\xi$. In other words, $X^{\beta,\gamma}$ is the classical linear price of $\xi$ in a fictitious market with interest rate $\beta$ and risk-premium $\gamma$. The function $B$ can be interpreted as a cost function or a penalty function (which is equal to 0 in quite a few examples).

An interesting question that follows is "Under what conditions does a nonlinear price system have a BSDE representation?" In 2002, Coquet et al. [3] gave the first answer to this question.

Theorem 3 Let $X(\cdot)$ be a price system, that is, a mapping that, for each $\xi \in L^2(\mathcal{F}_T)$ and $T \ge 0$, associates an adapted process $(X_t(\xi, T))_{0 \le t \le T}$ that is nondecreasing, which satisfies the no-arbitrage property (A1), time consistency (A2), zero–one law (A3), and translation invariance property (A4). Suppose that it satisfies the following assumption: There exists some $\mu > 0$ such that $X_0(\xi + \xi', T) - X_0(\xi, T) \le Y_0^{\mu}(\xi', T)$, for any $\xi \in L^2(\mathcal{F}_T)$ and $\xi'$ a positive random variable in $L^2(\mathcal{F}_T)$, where $Y_t^{\mu}(\xi', T)$ is the solution of the following BSDE:

$$-dY_t = \mu|Z_t|\,dt - Z_t\,dW_t, \quad Y_T = \xi' \qquad (37)$$

Then the price system has a BSDE representation, that is, there exists a standard driver $b(t, z)$ that does not depend on $x$ such that $b(t, 0) = 0$ and that is Lipschitz with respect to $z$ with coefficient $\mu$, such that $X(\xi, T)$ corresponds to the solution of the BSDE associated with the terminal time $T$, driver $b$, and terminal condition $\xi$, for any $\xi \in L^2(\mathcal{F}_T)$, $T \ge 0$, that is, $X(\xi, T) = X^b(\xi, T)$.

In this theorem, the existence of the coefficient $\mu$ might be interpreted in terms of risk aversion.

Many nonlinear BSDEs also appear in the case of an incomplete market (see Complete Markets). For example, the superreplication price of a European contingent claim can be obtained as the limit

of a nondecreasing sequence of penalized prices, which are solutions of nonlinear BSDEs [9, 10]. Another example is given by the pricing of a European contingent claim via exponential utility maximization in an incomplete market. In this case, El Karoui and Rouge [11] have stated that the price of such an option is the solution of a quadratic BSDE. More precisely, let us consider a complete market (see Complete Markets) [11] that contains $n$ securities, whose (invertible) volatility matrix is denoted by $\sigma_t$. Suppose that only the first $j$ securities are available for hedging and their volatility matrix is denoted by $\sigma_t^1$. The utility function is given by $u(x) = -e^{-\gamma x}$, where $\gamma$ ($\ge 0$) corresponds to the risk-aversion coefficient. Let $\xi$ be a given contingent claim corresponding to an exercise time $T$; in other words, $\xi$ is a bounded $\mathcal{F}_T$-measurable variable. Let $(X_t(\xi, T))$ (also denoted by $(X_t)$) be the forward price process defined via the exponential utility function as in [11]. By Theorem 5.1 in [11], there exists $Z \in \mathbb{H}^2(\mathbb{R}^n)$ such that the pair $(X, Z)$ is solution of the quadratic BSDE:

$$-dX_t = \left[-(\eta_t + \sigma_t^{-1}\nu_t^0) \cdot Z_t + \frac{\gamma}{2}|\Pi(Z_t)|^2\right]dt - Z_t\,dW_t, \quad X_T = \xi \qquad (38)$$

where $\eta$ is the classical relative risk process, $\nu^0$ is a given process [11], and $\Pi(z)$ denotes the orthogonal projection of $z$ onto the kernel of $\sigma_t^1$.

Dynamic Risk Measures

In the same way as in the previous section, some dynamic measures of risk can be induced quite simply by BSDEs (note that time-consistent dynamic risk measures are otherwise very difficult to deal with). More precisely, let $b$ be a standard driver. We define a dynamic risk measure $\rho^b$ as follows: for each $T \ge 0$ and $\xi \in L^2(\mathcal{F}_T)$, we set

$$\rho^b_{\cdot}(\xi, T) = X^b_{\cdot}(-\xi, T) \qquad (39)$$

where $(X_t^b(-\xi, T))$ denotes the solution of the BSDE associated with the terminal condition $-\xi$, terminal time $T$, and driver $b(t, \omega, x, z)$ [25]. Also note that $\rho^b_{\cdot}(\xi, T) = -X^{\bar b}_{\cdot}(\xi, T)$, where $\bar b(t, x, z) = -b(t, -x, -z)$.

Then, by the results of the previous section, the dynamic risk measure $\rho^b$ is nonincreasing and satisfies the no-arbitrage property (A1). In addition, the risk measure $\rho^b$ is also consistent.

If $b$ is superadditive with respect to $(x, z)$, then the dynamic risk measure $\rho^b$ is subadditive, that is, for any $T \ge 0$, $\xi, \xi' \in L^2(\mathcal{F}_T)$, $\rho_t^b(\xi + \xi', T) \le \rho_t^b(\xi, T) + \rho_t^b(\xi', T)$.

If $b(t, 0, 0) = 0$, then $\rho^b$ satisfies the zero–one law (A3).

In addition, if $b$ does not depend on $x$, then the measure of risk satisfies the translation invariance property (A4).

In addition, if $b$ is positively homogeneous with respect to $(x, z)$, then the risk measure $\rho^b$ is positively homogeneous with respect to $\xi$, that is, $\rho^b_{\cdot}(\lambda\xi, T) = \lambda\rho^b_{\cdot}(\xi, T)$, for each real $\lambda \ge 0$, $T \ge 0$, and $\xi \in L^2(\mathcal{F}_T)$.

If $b$ is convex (respectively, concave) with respect to $(x, z)$, then $\rho^b$ is concave (respectively, convex) with respect to $\xi$. Furthermore, if $b$ is concave (respectively, convex), we have a variational formulation of the risk measure $\rho^b$ (similar to the one obtained for nonlinear price systems). Note that in the case where $b$ does not depend on $x$, this dual formulation corresponds to a famous theorem for convex and translation-invariant risk measures [12] and the polar function $B$ corresponds to the penalty function.

Clearly, Theorem 3 can be written in terms of risk measures. Thus, it gives the following interesting result.

Proposition 7 Let $\rho$ be a dynamic risk measure, that is, a mapping that, for each $\xi \in L^2(\mathcal{F}_T)$ and $T \ge 0$, associates an adapted process $(\rho_t(\xi, T))_{0 \le t \le T}$. Suppose that $\rho$ is nonincreasing and satisfies assumptions (A1)–(A4) and that there exists some $\mu > 0$ such that $\rho_0(\xi + \xi', T) - \rho_0(\xi, T) \ge -Y_0^{\mu}(\xi', T)$, for any $\xi \in L^2(\mathcal{F}_T)$ and $\xi'$ a positive random variable in $L^2(\mathcal{F}_T)$, where $Y_t^{\mu}(\xi', T)$ is the solution of BSDE (37). Then, $\rho$ can be represented by a backward equation, that is, there exists a standard driver $b(t, z)$, which is Lipschitz with respect to $z$ with coefficient $\mu$, such that $\rho = \rho^b$ a.s.

Relation with Recursive Utility

Another example of BSDEs in finance is given by recursive utilities introduced by Duffie and Epstein [5]. Such a utility function associated with

a consumption rate $(c_t, 0 \le t \le T)$ corresponds to the solution of BSDE (3) with terminal condition $\xi$, which can be interpreted as a terminal reward (which can be a function of terminal wealth) and a driver $f(t, c_t, y)$ depending on the consumption rate $c_t$. The case of a standard utility function corresponds to a linear driver $f$ of the form $f(t, c, y) = u(c) - \beta_t y$, where $u$ is a nondecreasing and concave deterministic function and $\beta$ corresponds to the discount rate.

Note that by BSDE results, we may consider a driver $f$ that depends on the variability process $Z_t$ [7]. The generalized recursive utility is then the solution of the BSDE associated with $\xi$ and $f(t, c_t, y, z)$. The standard utility function can be generalized to the following model first introduced by Chen and Epstein [2]:

$$f(t, c, y, z) = u(c) - \beta_t y - K \cdot |z| \qquad (40)$$

where $K = (K_1, \ldots, K_n)$ and $|z| = (|z_1|, \ldots, |z_n|)$. The constants $K_i$ can be interpreted as risk-aversion coefficients (or ambiguity-aversion coefficients).

By the flow property of BSDEs, recursive utility is consistent. In addition, by the comparison theorem, if $f$ is concave with respect to $(c, y, z)$ (respectively, nondecreasing with respect to $c$), then recursive utility is concave (respectively, nondecreasing) with respect to $c$.

In the case where the driver $f$ is concave, we have a variational formulation of recursive utility (first stated in [7]) similar to the one obtained for nonlinear convex price systems (see the previous section). Let $F(t, c_t, \cdot, \cdot)$ be the polar function of $f$ with respect to $y, z$ and let $\mathcal{A}(c)$ be the (bounded) set of pairs of adapted processes $(\beta, \gamma)$ such that $E\int_0^T F(t, c_t, \beta_t, \gamma_t)^2\,dt < +\infty$. Properties on optimization of BSDEs lead us to derive the following variational formulation:

$$Y_t = \operatorname*{ess\,inf}_{(\beta, \gamma) \in \mathcal{A}} Y_t^{\beta,\gamma} \qquad (41)$$

where $Y^{\beta,\gamma}$ is the solution of the linear BSDE associated with the driver $f^{\beta,\gamma}(t, c, y, z) := F(t, c, \beta_t, \gamma_t) + \beta_t y + \gamma_t z$ and the terminal condition $\xi$. Note that $Y^{\beta,\gamma}$ corresponds to a standard utility function evaluated under a discount rate $-\beta$ and under a probability $Q^\gamma$ with density with respect to $P$ given by $Z^\gamma(T) = \exp\left(-\int_0^T \gamma_s\,dW_s - \frac{1}{2}\int_0^T |\gamma_s|^2\,ds\right)$. Indeed, we have

$$Y_t^{\beta,\gamma} = E_{Q^\gamma}\left[\int_t^T e^{\int_t^s \beta_u\,du} F(s, c_s, \beta_s, \gamma_s)\,ds + e^{\int_t^T \beta_u\,du}\,Y_T \,\Big|\, \mathcal{F}_t\right] \qquad (42)$$

El Karoui et al. [8] considered the optimization problem of a recursive utility with nonlinear constraints on the wealth. By using BSDE techniques, the authors state a maximum principle that gives a necessary and sufficient condition of optimality. The variational formulation can also be used to transform the initial problem into a max–min problem, which can be written as a min–max problem under some assumptions.

Reflected BSDEs

Reflected BSDEs have been introduced by El Karoui et al. [6]. For a reflected BSDE, the solution is constrained to be greater than a given process called the obstacle.

Let $S^2$ be the set of predictable processes $\phi$ such that $E(\sup_t |\phi_t|^2) < +\infty$. We are given a couple of standard parameters, that is, a standard driver $f(t, y, z)$ and a process $\{\xi_t, 0 \le t \le T\}$ called the obstacle, which is supposed to be continuous on $[0, T[$, adapted, belonging to $S^2$ and satisfying $\lim_{t \to T} \xi_t \le \xi_T$.

A solution of the reflected BSDE associated with $f$ and $\xi$ corresponds to a triplet $(Y, Z, K) \in S^2 \times \mathbb{H}^2 \times S^2$ such that

$$-dY_t = f(t, Y_t, Z_t)\,dt + dK_t - Z_t\,dW_t, \quad Y_T = \xi_T \qquad (43)$$

with $Y_t \ge \xi_t$, $0 \le t \le T$, and where $K$ is a nondecreasing, continuous, adapted process equal to 0 at time 0 such that $\int_0^T (Y_s - \xi_s)\,dK_s = 0$. The process $K$ can be interpreted as the minimal push, which allows the solution to stay above the obstacle.

We first give a characterization of the solution (first stated by El Karoui and Quenez [10]). For each $t \in [0, T]$, let us denote by $\mathcal{T}_t$ the set of stopping times $\tau$ such that $\tau \in [t, T]$ a.s.

For each $\tau \in \mathcal{T}_t$, we denote by $(X_s(\tau, \xi_\tau), \pi_s(\tau, \xi_\tau), t \le s \le \tau)$ the (unique) solution of the

BSDE associated with the terminal time $\tau$, terminal condition $\xi_\tau$, and coefficient $f$. We easily derive the following property.

Proposition 8 (Characterization). Suppose that $(Y, Z, K)$ is solution of the reflected BSDE (43). Then, for each $t \in [0, T]$,

$$Y_t = X_t(D_t, \xi_{D_t}) = \operatorname*{ess\,sup}_{\tau \in \mathcal{T}_t} X_t(\tau, \xi_\tau) \qquad (44)$$

where $D_t = \inf\{u \ge t;\ Y_u = \xi_u\}$.

Proof By using the fact that $Y_{D_t} = \xi_{D_t}$ and since the process $K$ is constant on $[t, D_t]$, we easily derive that $(Y_s, t \le s \le D_t)$ is the solution of the BSDE associated with the terminal time $D_t$, terminal condition $\xi_{D_t}$, and coefficient $f$, that is,

$$Y_t = X_t(D_t, \xi_{D_t}) \qquad (45)$$

It remains now to show that $Y_t \ge X_t(\tau, \xi_\tau)$, for each $\tau \in \mathcal{T}_t$.

Fix $\tau \in \mathcal{T}_t$. On the interval $[t, \tau]$, the pair $(Y_s, Z_s)$ satisfies

$$-dY_s = f(s, Y_s, Z_s)\,ds + dK_s - Z_s\,dW_s, \quad Y_\tau = Y_\tau \qquad (46)$$

In other words, the pair $(Y_s, Z_s, t \le s \le \tau)$ is the solution of the BSDE associated with the terminal time $\tau$, terminal condition $Y_\tau$, and coefficient

$$f(s, y, z) + dK_s$$

Since $f(s, y, z) + dK_s \ge f(s, y, z)$ and since $Y_\tau \ge \xi_\tau$, the comparison theorem for BSDEs gives

$$Y_t \ge X_t(\tau, \xi_\tau) \qquad (47)$$

and the proof is complete.

Proposition 8 gives the uniqueness of the solution:

Corollary 1 (Uniqueness). There exists at most one solution of the reflected BSDE (43).

In addition, from Proposition 8 and the comparison theorem for classical BSDEs, we quite naturally derive the following comparison theorem for RBSDEs (see [6] or [18] for a shorter proof).

Proposition 9 (Comparison). Let $\xi^1$, $\xi^2$ be two obstacle processes and let $f^1$, $f^2$ be two coefficients. Let $(Y^1, Z^1, K^1)$ (respectively, $(Y^2, Z^2, K^2)$) be a solution of the reflected BSDE (43) for $(\xi^1, f^1)$ (respectively, for $(\xi^2, f^2)$) and assume that

• $\xi^1 \le \xi^2$ a.s.
• $f^1(t, y, z) \le f^2(t, y, z)$, $t \in [0, T]$, $(y, z) \in \mathbb{R} \times \mathbb{R}^d$.

Then, $Y_t^1 \le Y_t^2$ $\forall t \in [0, T]$ a.s.

As in the case of classical BSDEs, some a priori estimations similar to equations (6) and (7) can be given [6]. From these estimations, we can derive the existence of a solution, that is, the following theorem.

Theorem 4 There exists a unique solution $(Y, Z, K)$ of RBSDE (43).

Sketch of the proof. The arguments are the same as in the classical case. The only problem is to show the existence of a solution in the case where the driver $f$ does not depend on $y, z$. However, this problem is already solved by optimal stopping time theory. Indeed, recall that by Proposition 8, if $Y$ is a solution of the RBSDE associated with the driver $f(t)$ and obstacle $\xi$, then

$$Y_t = \operatorname*{ess\,sup}_{\tau \in \mathcal{T}_t} X_t(\tau, \xi_\tau) = \operatorname*{ess\,sup}_{\tau \in \mathcal{T}_t} E\left[\int_t^\tau f(s)\,ds + \xi_\tau \,\Big|\, \mathcal{F}_t\right] \qquad (48)$$

Thus, to show the existence of a solution, a natural candidate is the process

$$\bar Y_t = \operatorname*{ess\,sup}_{\tau \in \mathcal{T}_t} E\left[\int_t^\tau f(s)\,ds + \xi_\tau \,\Big|\, \mathcal{F}_t\right] \qquad (49)$$

Then, by using classical results of the Snell envelope theory, we derive that there exist a nondecreasing continuous process $K$ and an adapted process $Z$ such that $(\bar Y, Z, K)$ is the solution of the RBSDE associated with $f$ and $\xi$.

Remark 2 The existence of a solution of the reflected BSDE can also be derived by an approximation method via penalization [6]. Indeed, one can show that the sequence of penalized processes $(Y^n, n \in \mathbb{N})$, defined as the solutions of classical

BSDEs

$$-dY_t^n = f(t, Y_t^n, Z_t^n)\,dt + n\,(Y_t^n - S_t)^-\,dt - Z_t^n\,dW_t, \qquad Y_T^n = \xi \tag{50}$$

is nondecreasing (by the comparison theorem) and that it converges a.s. to the solution $Y$ of the reflected BSDE.

In the Markovian case [6], that is, in the case where the driver and the obstacle are functions of a state process, we can give an interpretation of the solution of the reflected BSDE in terms of an obstacle problem. More precisely, the framework is the same as in the case of a Markovian BSDE. The state process $X^{t,x}_{\cdot}$ follows the dynamics (19). Let $(Y^{t,x}, Z^{t,x}, K^{t,x})$ be the solution of the reflected BSDE:

$$-dY_s = f(s, X_s^{t,x}, Y_s, Z_s)\,ds + dK_s - Z_s\,dW_s, \qquad Y_T = g(X_T^{t,x}) \tag{51}$$

with $Y_s \ge \xi_s := h(s, X_s^{t,x})$, $t \le s \le T$. Moreover, we assume that $h(T, x) \le g(x)$ for $x \in \mathbb{R}^d$. The functions $f$, $h$ are deterministic and satisfy

$$h(t, x) \le K(1 + |x|^p), \qquad t \in [0, T],\ x \in \mathbb{R}^d \tag{52}$$

In this case, if $u$ denotes the function such that $Y_t^{t,x} = u(t, x)$, we have the following theorem.

Theorem 5 Suppose that the coefficients $f$, $b$, $\sigma$, and $h$ are jointly continuous with respect to $t$ and $x$. Then, the function $u(t, x)$ is a viscosity solution of the following obstacle problem:

$$\min\bigl((u - h)(t, x),\ -\partial_t u - \mathcal{L}u - f(t, x, u(t, x), \partial_x u\,\sigma(t, x))\bigr) = 0, \qquad u(T, x) = g(x) \tag{53}$$

Idea of the proof. A first proof [6] can be given by using the approximation of the solution $Y$ of the RBSDE by the increasing sequence $Y^n$ of penalized solutions of BSDEs (50). By the previous results on classical BSDEs in the Markovian case, we know that $Y_t^{n,t,x} = u_n(t, x)$ where $u_n$ is the unique viscosity solution of a parabolic PDE. Thus, we have that $u_n(t, x) \uparrow u(t, x)$ as $n \to \infty$ and by using classical techniques of the theory of viscosity solutions, it is possible to show that $u(t, x)$ is a viscosity solution of the obstacle problem (53).

Another proof can be given by directly showing that $u$ is a viscosity solution of the obstacle problem [18].

Under quite standard assumptions on the coefficients, there exists a unique viscosity solution (see Monotone Schemes) of the obstacle problem (53) [6]. Generalizations of the previous results have been done on reflected BSDEs. Cvitanic and Karatzas [4] have studied reflected BSDEs with two obstacles and their links with stochastic games. Hamadène et al. [15] have studied reflected BSDEs with two obstacles with continuous coefficients. Gegout-Petit and Pardoux [13] have studied reflected BSDEs in a convex domain, Ouknine [22] has studied reflected BSDEs with jumps, and finally Kobylanski et al. [18] have studied reflected quadratic RBSDEs.

Reflected BSDEs and Pricing of an American Option under Constraints

In this section, we see how these results can be applied to the problem of evaluation of an American option (see, e.g., [10] Section 5.4). The framework is the one that is described in the previous section (a complete market with nonlinear constraints such as a large investor).

Recall that an American option consists, at time $t$, in the selection of a stopping time $\nu \ge t$ and (once this exercise time is chosen) of a payoff $\xi_\nu$, where $(\xi_t, 0 \le t \le T)$ is a continuous adapted process on $[0, T)$ with $\lim_{t \to T} \xi_t \le \xi_T$.

Let $\nu$ be a fixed stopping time. Then, from the results on classical BSDEs, there exists a unique pair of square-integrable adapted processes $(X(\nu, \xi_\nu), \pi(\nu, \xi_\nu))$, denoted also by $(X^\nu, \pi^\nu)$, satisfying

$$-dX_t^\nu = b(t, X_t^\nu, \pi_t^\nu)\,dt - (\pi_t^\nu)\,dW_t, \qquad X_T^\nu = \xi \tag{54}$$

(To simplify the presentation, $\sigma_t$ is assumed to be equal to the identity.) $X(\nu, \xi_\nu)$ corresponds to the price of a European option of exercise time $\nu$ and payoff $\xi_\nu$.
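The penalization argument behind (50) can be illustrated numerically. The following Python sketch is only an illustration under assumptions that are not part of the text: a one-dimensional state process on a recombining binomial tree with multiplicative Euler steps, the linear driver f(y) = -r y, the obstacle h(x) = (K - x)^+ with g = h, and an implicit treatment of the penalty term. The function names and parameter values are hypothetical.

```python
import math

def obstacle(x, strike):
    """Assumed obstacle h(x) = (K - x)^+, also used as terminal condition g."""
    return max(strike - x, 0.0)

def tree_level(s0, r, sigma, h, i):
    """State values at step i; index j counts the up-moves."""
    up = 1.0 + r * h + sigma * math.sqrt(h)
    down = 1.0 + r * h - sigma * math.sqrt(h)
    return [s0 * up ** j * down ** (i - j) for j in range(i + 1)]

def penalized_value(penalty, s0=100.0, strike=100.0, r=0.06, sigma=0.4,
                    horizon=0.5, steps=200):
    """Y_0 of a discrete penalized BSDE with driver f(y) = -r y.

    penalty = 0 gives the unconstrained value; large penalties push the
    solution toward the obstacle from below.
    """
    h = horizon / steps
    y = [obstacle(x, strike) for x in tree_level(s0, r, sigma, h, steps)]
    for i in range(steps - 1, -1, -1):
        new_y = []
        for j, x in enumerate(tree_level(s0, r, sigma, h, i)):
            cond_exp = 0.5 * (y[j] + y[j + 1])      # E[Y_{i+1} | G_i]
            cont = cond_exp - h * r * cond_exp       # explicit driver step
            obs = obstacle(x, strike)
            if cont >= obs:
                new_y.append(cont)                   # penalty term inactive
            else:
                # implicit penalty: solve Y = cont + h * penalty * (obs - Y)
                new_y.append((cont + h * penalty * obs) / (1.0 + h * penalty))
        y = new_y
    return y[0]

if __name__ == "__main__":
    for p in (0.0, 10.0, 100.0, 1000.0, 1e6):
        print(p, round(penalized_value(p), 4))
```

In this sketch the values are nondecreasing in the penalty parameter, which mirrors the monotonicity of the sequence $Y^n$ obtained from the comparison theorem.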
The price of the American option is then given by a right continuous left limited (RCLL) process $Y$, satisfying for each $t$,

$$Y_t = \operatorname*{ess\,sup}_{\nu \in \mathcal{T}_t} X_t(\nu, \xi_\nu), \qquad P\text{-a.s.} \tag{55}$$

By the previous results, the price $(Y_t, 0 \le t \le T)$ corresponds to the solution of a reflected BSDE associated with the coefficient $b$ and obstacle $\xi$. In other words, there exists a process $\pi \in \mathbb{H}^2$ and $K$ an increasing continuous process such that

$$-dY_t = b(t, Y_t, \pi_t)\,dt + dK_t - \pi_t\,dW_t, \qquad Y_T = \xi_T \tag{56}$$

with $Y_\cdot \ge \xi_\cdot$ and $\int_0^T (Y_t - \xi_t)\,dK_t = 0$. In addition, the stopping time $D_t = \inf\{s \ge t : Y_s = \xi_s\}$ is optimal, that is,

$$Y_t = \operatorname*{ess\,sup}_{\nu \in \mathcal{T}_t} X_t(\nu, \xi_\nu) = X_t(D_t, \xi_{D_t}) \tag{57}$$

Moreover, by the minimality property of the increasing process $K$, the process $Y$ corresponds to the superreplication price of the option, that is, the smallest price that allows the superreplication of the payoff.

One can also easily state that the price system $\xi_\cdot \mapsto Y_\cdot(\xi_\cdot)$ is nondecreasing and sublinear if $b$ is sublinear with respect to $x$, $\pi$. Note (see [10] p. 239) that the nonarbitrage property holds only in a weak sense: more precisely, let $\xi_\cdot$ and $\xi'_\cdot$ be two payoffs and let $Y$ and $Y'$ be their associated prices. If $\xi_\cdot \ge \xi'_\cdot$ and also $Y_0 = Y'_0$, then $D_0 \le D'_0$, the payoffs are equal at time $D_0$, and the prices are equal until $D_0$.

In the previous section, we have seen how, in the case where the driver $b$ is convex, one can obtain a variational formulation of the price of a European option. Similarly, one can show that the price of an American option is equal to the value function of a mixed control problem [10].

References

[1] Buckdahn, R. (1993). Backward Stochastic Differential Equations Driven by a Martingale. Preprint.
[2] Chen, Z. & Epstein, L. (1998). Ambiguity, Risk and Asset Returns in Continuous Time, working paper 1998, University of Rochester.
[3] Coquet, F., Hu, Y., Mémin, J. & Peng, S. (2002). Filtration-consistent nonlinear expectations and related g-expectations, Probability Theory and Related Fields 123, 1–27.
[4] Cvitanić, J. & Karatzas, I. (1996). Backward stochastic differential equations with reflection and Dynkin games, Annals of Probability 24, 2024–2056.
[5] Duffie, D. & Epstein, L. (1992). Stochastic differential utility, Econometrica 60, 353–394.
[6] El Karoui, N., Kapoudjian, C., Pardoux, E., Peng, S. & Quenez, M.C. (1997). Reflected solutions of Backward SDE’s and related obstacle problems for PDE’s, The Annals of Probability 25(2), 702–737.
[7] El Karoui, N., Peng, S. & Quenez, M.C. (1997). Backward stochastic differential equations in finance, Mathematical Finance 7(1), 1–71.
[8] El Karoui, N., Peng, S. & Quenez, M.C. (2001). A dynamic maximum principle for the optimization of recursive utilities under constraints, Annals of Applied Probability 11(3), 664–693.
[9] El Karoui, N. & Quenez, M.C. (1995). Dynamic programming and pricing of a contingent claim in an incomplete market, SIAM Journal on Control and Optimization 33(1), 29–66.
[10] El Karoui, N. & Quenez, M.C. (1996). Non-linear pricing theory and backward stochastic differential equations, in Financial Mathematics (Bressanone), Lecture Notes in Mathematics 1656, W.J. Runggaldier, ed., Springer.
[11] El Karoui, N. & Rouge, R. (2000). Contingent claim pricing via utility maximization, Mathematical Finance 10(2), 259–276.
[12] Föllmer, H. & Schied, A. (2004). Stochastic Finance: An Introduction in Discrete Time, Walter de Gruyter, Berlin.
[13] Gegout-Petit, A. & Pardoux, E. (1996). Equations différentielles stochastiques rétrogrades réfléchies dans un convexe, Stochastics and Stochastics Reports 57, 111–128.
[14] Gobet, E. & Labart, C. (2007). Error expansion for the discretization of Backward Stochastic Differential Equations, Stochastic Processes and their Applications 10(2), 259–276.
[15] Hamadène, S., Lepeltier, J.P. & Matoussi, A. (1997). Double barrier reflected backward SDE’s with continuous coefficient, in Backward Stochastic Differential Equations, Pitman Research Notes in Mathematics Series 364, N. El Karoui & L. Mazliak, eds, Longman.
[16] Karatzas, I. & Shreve, S. (1991). Brownian Motion and Stochastic Calculus, Springer Verlag.
[17] Kobylanski, M. (2000). Backward stochastic differential equations and partial differential equations with quadratic growth, The Annals of Probability 28, 558–602.
[18] Kobylanski, M., Lepeltier, J.P., Quenez, M.C. & Torres, S. (2002). Reflected BSDE with super-linear quadratic coefficient, Probability and Mathematical Statistics 22, Fasc. 1, 51–83.
[19] Lepeltier, J.P. & San Martín, J. (1997). Backward stochastic differential equations with continuous coefficients, Statistics and Probability Letters 32, 425–430.
[20] Lepeltier, J.P. & San Martín, J. (1998). Existence for BSDE with superlinear-quadratic coefficient, Stochastics and Stochastics Reports 63, 227–240.
[21] Ocone, D. & Karatzas, I. (1991). A generalized Clark representation formula with application to optimal portfolios, Stochastics and Stochastics Reports 34, 187–220.
[22] Ouknine, Y. (1998). Reflected backward stochastic differential equation with jumps, Stochastics and Stochastics Reports 65, 111–125.
[23] Pardoux, E. & Peng, S. (1990). Adapted solution of backward stochastic differential equation, Systems and Control Letters 14, 55–61.
[24] Pardoux, E. & Peng, S. (1992). Backward stochastic differential equations and Quasilinear parabolic partial differential equations, Lecture Notes in CIS 176, 200–217.
[25] Peng, S. (2004). Nonlinear Expectations, Nonlinear Evaluations and Risk Measures, Lecture Notes in Math., 1856, Springer, Berlin, pp. 165–253.
[26] Quenez, M.C. (1997). “Stochastic Control and BSDE’s”, in “Backward Stochastic Differential Equations”, N. El Karoui & L. Mazliak, eds, Pitman Research Notes in Mathematics Series 364, Longman.

Related Articles

Backward Stochastic Differential Equations: Numerical Methods; Convex Risk Measures; Forward–Backward Stochastic Differential Equations (SDEs); Markov Processes; Martingale Representation Theorem; Mean–Variance Hedging; Recursive Preferences; Stochastic Control; Stochastic Integrals; Superhedging.

MARIE-CLAIRE QUENEZ
Backward Stochastic Differential Equations: Numerical Methods

Nonlinear backward stochastic differential equations (BSDEs) were introduced in 1990 by Pardoux and Peng [34]. The interest in BSDEs comes from their connections with partial differential equations (PDEs) [14, 38]; stochastic control (see Stochastic Control); and mathematical finance (see [16, 17], among others). In particular, as shown in [15], BSDEs are a useful tool in the pricing and hedging of European options. In a complete market, the price process $Y$ of $\xi$ is a solution of a BSDE. BSDEs are also useful in quadratic hedging problems in incomplete markets (see Mean–Variance Hedging).

Existence and uniqueness of solutions of BSDEs under the assumption that the generator is locally Lipschitz can be found in [19]. A similar result was obtained by Lepeltier and San Martín in the case when the coefficient is continuous with linear growth [23]. The same authors [24] generalized these results under the assumption that the coefficients have a superlinear quadratic growth. Other extensions of existence and uniqueness of BSDEs are dealt with in [20, 25, 30]. Stability of solutions of BSDEs has been studied, for example, in [1], where the authors analyze stability under disturbances in the filtration. In [6], the authors show the existence and uniqueness of the solution and the link with integral-PDEs (see Partial Integro-differential Equations (PIDEs)). An existence theorem for BSDEs with jumps is presented in [25, 36]. The authors state a theorem for Lipschitz generators proved by fixed point techniques [37].

Since BSDE solutions are explicit in only a few cases, it is natural to search for numerical methods approximating the unique solution of such equations and to know the associated type of convergence. Some methods of approximation have been developed.

A four-step algorithm is proposed in [27] to solve equations of forward–backward type, relating the type of approximation to PDE theory. On the other hand, in [3], a method of random discretization in time is used where the convergence of the method for the solution $(Y, Z)$ needs regularity assumptions only, but for simulation studies multiple approximations are needed. See also [10, 13, 28] for forward–backward systems of SDE (FBSDE) solutions, [18] for a regression-based Monte Carlo method, [39] for approximating solutions of BSDEs, and [35] for Monte Carlo valuation of American options.

On the other hand, in [2, 9, 11, 26] the authors replace Brownian motion by simple random walks in order to define numerical approximations for BSDEs. This technique simplifies the computation of conditional expectations involved at each time step.

A quantization (see Quantization Methods) technique was suggested in [4, 5] for the resolution of reflected backward stochastic differential equations (RBSDEs) when the generator $f$ does not depend on the control variable $z$. This method is based on the approximation of continuous time processes on a finite grid, and requires a further estimation of the transition probabilities on the grid.

In [8], the authors propose a discrete-time approximation of RBSDEs. The $L^p$ norm of the error is shown to be of the order of the time step. On the other hand, a numerical approximation for a class of RBSDEs, based on numerical approximations for BSDEs and on approximations given in [29], can be found in [31, 33].

Recently, work on numerical schemes for BSDEs with jumps is given in [22] and is based on the approximation of the Brownian motion and a Poisson process by two simple random walks. Finally, for decoupled FBSDEs with jumps a numerical scheme is proposed in [7].

Let $\Omega = C([0, 1], \mathbb{R}^d)$ and consider the canonical Wiener space $(\Omega, \mathcal{F}, \mathbb{P}, \mathcal{F}_t)$, in which $B_t(\omega) = \omega(t)$ is a standard $d$-dimensional Brownian motion. We consider the following BSDE:

$$Y_t = \xi + \int_t^T f(s, Y_s, Z_s)\,ds - \int_t^T Z_s\,dB_s \tag{1}$$

where $\xi$ is an $\mathcal{F}_T$-measurable square integrable random variable and $f$ is Lipschitz continuous in the space variable with Lipschitz constant $L$. The solution of equation (1) is a pair of adapted processes $(Y, Z)$, which satisfies the equation.

Numerical Methods for BSDEs

One approach for a numerical scheme for solving BSDEs is based upon a discretization of the equation
(1) by replacing $B$ with a simple random walk. To be more precise, let us consider the symmetric random walk $W^n$:

$$W_t^n := \frac{1}{\sqrt{n}} \sum_{k=1}^{n c_n(t)} \zeta_k^n, \qquad 0 \le t \le T \tag{2}$$

where $\{\zeta_k^n\}_{1 \le k \le n}$ is an i.i.d. Bernoulli symmetric sequence. We define $\mathcal{G}_k^n := \sigma(\zeta_1^n, \ldots, \zeta_k^n)$. Throughout this section $c_n(t) = [nt]/n$, and $\xi^n$ denotes a square integrable random variable, measurable w.r.t. $\mathcal{G}_n^n$, that should converge to $\xi$. We assume that $W^n$ and $B$ are defined in the same probability space.

In [26], the authors consider the case when the generator depends only on the variable $Y$, which makes the analysis simpler. In this situation, the BSDE (1) is given by

$$Y_t = \xi + \int_t^T f(Y_s)\,ds - \int_t^T Z_s\,dB_s \tag{3}$$

whose solution is given by

$$Y_t = \mathbb{E}\Bigl[\xi + \int_t^T f(Y_s)\,ds \Bigm| \mathcal{F}_t\Bigr] \tag{4}$$

which can be discretized in time with step-size $h = T/n$ by solving a discrete BSDE given by

$$Y_{t_i}^n = \xi^n + \frac{1}{n} \sum_{j=i}^{n} f(Y_{t_j}^n) - \sum_{j=i}^{n-1} Z_{t_j}^n\,\Delta W_{t_{j+1}}^n \tag{5}$$

This equation has a unique solution $(Y_t^n, Z_t^n)$ since the martingale $W^n$ has the predictable representation property. It can be checked that solving this equation is equivalent to finding a solution to the following implicit iteration problem:

$$Y_{t_i}^n = \mathbb{E}\Bigl[Y_{t_{i+1}}^n + \frac{1}{n} f(Y_{t_i}^n) \Bigm| \mathcal{G}_i^n\Bigr] \tag{6}$$

which, due to the adaptedness condition, is equivalent to

$$Y_{t_i}^n - \frac{1}{n} f(Y_{t_i}^n) = \mathbb{E}\bigl[Y_{t_{i+1}}^n \bigm| \mathcal{G}_i^n\bigr] \tag{7}$$

Furthermore, once $Y_{t_{i+1}}^n$ is determined, $Y_{t_i}^n$ is solved via equation (7) by a fixed point technique:

$$X^0 = \mathbb{E}\bigl[Y_{t_{i+1}}^n \bigm| \mathcal{G}_i^n\bigr], \qquad X^{k+1} = X^0 + \frac{1}{n} f(X^k) \tag{8}$$

It is standard to show that if $f$ is uniformly Lipschitz in the spatial variable $x$ with Lipschitz constant $L$ (we also assume that $f$ is bounded by $R$), then the iterations of this procedure will converge to the true solution of equation (7) at a geometric rate $L/n$. Therefore, in the case where $n$ is large enough, one iteration would already give us the error estimate $|Y_{t_i}^n - X^1| \le LR/n^2$, producing a good approximate solution of equation (7). Consequently, the explicit numerical scheme is given by

$$\begin{cases}
\hat{Y}_T^n = \xi^n; \quad \hat{Z}_T^n = 0 \\
X_{t_i}^n = \mathbb{E}\bigl[\hat{Y}_{t_{i+1}}^n \bigm| \mathcal{G}_i^n\bigr] \\
\hat{Y}_{t_i}^n = X_{t_i}^n + \frac{1}{n} f(X_{t_i}^n) \\
\hat{Z}_{t_i}^n = \mathbb{E}\Bigl[\bigl(\hat{Y}_{t_{i+1}}^n + \frac{1}{n} f(\hat{Y}_{t_i}^n) - \hat{Y}_{t_i}^n\bigr)\bigl(\Delta W_{t_{i+1}}^n\bigr)^{-1} \Bigm| \mathcal{G}_i^n\Bigr]
\end{cases} \tag{9}$$

The convergence of $\hat{Y}^n$ to $Y$ is proved in the sense of the Skorohod topology in [9, 26]. In [11], the convergence of the sequence $Y^n$ is established using the tool of convergence of filtrations. See also [3] for the case where $f$ depends on both variables $y$ and $z$.

Application to European Options

In the Black–Scholes model (see Black–Scholes Formula), the stock price $S$ follows

$$dS_t = \mu S_t\,dt + \sigma S_t\,dB_t \tag{10}$$

which is the continuous version of

$$\frac{S_{t+\Delta t} - S_t}{S_t} \approx \mu\,\Delta t + \sigma\,\Delta B_t \tag{11}$$

where the relative return has linear growth plus a random perturbation. $\sigma$ is called the volatility and it is a measure of uncertainty. In this particular case, $S$ has an explicit solution given by the Doléans-Dade exponential

$$S_t = S_0\,e^{(\mu - \frac{1}{2}\sigma^2)t + \sigma B_t} \tag{12}$$

We assume the existence of a riskless asset whose evolution is given by $\beta_t = \beta_0 e^{rt}$, where $r$ is a constant interest rate. Then $\beta$ satisfies the ODE:

$$\beta_t = \beta_0 + r \int_0^t \beta_s\,ds \tag{13}$$
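Before the replication argument is developed, the discrete schemes of the previous section can be made concrete. The following Python sketch compares the implicit equation (7), solved by the fixed-point iteration (8), with the explicit scheme (9). It is a minimal illustration under assumptions not stated in the text: T = 1, a one-dimensional random walk, and a terminal condition of the form ξ^n = g(W_1^n). The names `solve_bsde`, `g` and `picard_iters` are hypothetical.

```python
import math

def solve_bsde(f, g, n, explicit=True, picard_iters=50):
    """Backward induction for Y_t = xi + int f(Y) ds - int Z dB on a binomial tree.

    Level i has i + 1 nodes; node j carries W^n_{t_i} = (2 j - i) / sqrt(n).
    Returns the pair (Y_0, Z_0).
    """
    sqrt_n = math.sqrt(n)
    y = [g((2 * j - n) / sqrt_n) for j in range(n + 1)]   # Y_T = xi^n
    z0 = 0.0
    for i in range(n - 1, -1, -1):
        new_y = []
        for j in range(i + 1):
            x0 = 0.5 * (y[j] + y[j + 1])                  # X^0 = E[Y_{i+1} | G_i]
            if explicit:
                yi = x0 + f(x0) / n                       # single iteration, as in (9)
            else:
                yi = x0
                for _ in range(picard_iters):             # iteration (8) for equation (7)
                    yi = x0 + f(yi) / n
            new_y.append(yi)
        if i == 0:
            # Z_0 = E[Y_{t_1} (Delta W)^{-1}] with Delta W = +-1/sqrt(n)
            z0 = 0.5 * (y[1] - y[0]) * sqrt_n
        y = new_y
    return y[0], z0

if __name__ == "__main__":
    # Linear generator f(y) = a*y with xi = B_1^2: the exact value is Y_0 = exp(a).
    a = -0.5
    y0, _ = solve_bsde(lambda v: a * v, lambda w: w * w, n=200)
    print(y0, math.exp(a))
```

For a bounded Lipschitz generator such as `math.sin`, the explicit and implicit variants of this sketch stay close to each other, in line with the one-iteration error estimate quoted before (9).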
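A portfolio is a pair of adapted processes $(a_t, b_t)$ that represent the amount of investment in both assets at time $t$ (both can be positive or negative). The wealth process is then given by

$$Y_t = a_t S_t + b_t \beta_t \tag{14}$$

We assume $Y$ is self-financing:

$$dY_t = a_t\,dS_t + b_t\,d\beta_t \tag{15}$$

A call option gives the holder the right to buy an agreed quantity of a particular commodity $S$ at a certain time (the expiration date, $T$) for a certain price (the strike price $K$). The holder has to pay a fee (called a premium $q$) for this right. If the option can be exercised only at $T$, the option is called European. If it can be exercised at any time before $T$, it is called American. The main question is, what is the right price for an option? Mathematically, $q$ is determined by the existence of a replication strategy with the initial value $q$ and final value $(S_T - K)^+$; that is, find $(a_t, b_t)$ such that

$$Y_t = a_t S_t + b_t \beta_t, \qquad Y_T = (S_T - K)^+, \qquad Y_0 = q \tag{16}$$

We look for a solution to this problem of the form $Y_t = w(t, S_t)$ with $w(T, x) = (x - K)^+$. Using Itô’s formula, we get

$$\begin{aligned}
Y_t &= Y_0 + \int_0^t \frac{\partial w}{\partial x}\,dS_s + \frac{1}{2}\int_0^t \frac{\partial^2 w}{\partial x^2}\,d[S, S]_s + \int_0^t \frac{\partial w}{\partial t}\,ds \\
&= Y_0 + \int_0^t \frac{\partial w}{\partial x}\{\mu S_s\,ds + \sigma S_s\,dB_s\} + \int_0^t \frac{1}{2}\frac{\partial^2 w}{\partial x^2}\sigma^2 S_s^2\,ds + \int_0^t \frac{\partial w}{\partial t}\,ds \\
&= Y_0 + \int_0^t \frac{\partial w}{\partial x}\sigma S_s\,dB_s + \int_0^t \Bigl(\frac{1}{2}\frac{\partial^2 w}{\partial x^2}\sigma^2 S_s^2 + \frac{\partial w}{\partial x}\mu S_s + \frac{\partial w}{\partial t}\Bigr)ds
\end{aligned} \tag{17}$$

Using the self-financing property, we obtain

$$\begin{aligned}
Y_t &= Y_0 + \int_0^t a_s\,dS_s + \int_0^t b_s\,d\beta_s \\
&= Y_0 + \int_0^t a_s\{\mu S_s\,ds + \sigma S_s\,dB_s\} + \int_0^t b_s\,d\beta_s \\
&= Y_0 + \int_0^t a_s \sigma S_s\,dB_s + \int_0^t (r b_s \beta_s + a_s \mu S_s)\,ds
\end{aligned} \tag{18}$$

Using the uniqueness in the predictable representation property for Brownian motion (see Martingale Representation Theorem), we obtain that

$$\begin{aligned}
a_s \sigma S_s &= \sigma S_s \frac{\partial w}{\partial x} \\
r b_s \beta_s + a_s \mu S_s &= \frac{1}{2}\sigma^2 S_s^2 \frac{\partial^2 w}{\partial x^2} + \mu S_s \frac{\partial w}{\partial x} + \frac{\partial w}{\partial t} \\
a_s &= \frac{\partial w}{\partial x}(s, S_s) \\
b_s &= \frac{Y_s - a_s S_s}{\beta_s}
\end{aligned} \tag{19}$$

Since $r\,\frac{Y_s - a_s S_s}{\beta_s}\,\beta_s + a_s \mu S_s = \frac{1}{2}\sigma^2 S_s^2 \frac{\partial^2 w}{\partial x^2} + \mu S_s \frac{\partial w}{\partial x} + \frac{\partial w}{\partial t}$, the equation for $w$ is

$$\frac{\partial w}{\partial t} + \frac{1}{2}\sigma^2 x^2 \frac{\partial^2 w}{\partial x^2} = -r x \frac{\partial w}{\partial x} + r w, \qquad w(T, x) = (x - K)^+ \tag{20}$$

The solution of this PDE is related to a BSDE, which we deduce now. Let us start again from the self-financing assumption

$$\begin{aligned}
(S_T - K)^+ = Y_T &= Y_t + \int_t^T \frac{\partial w}{\partial x}\,dS_s + \int_t^T r\Bigl(Y_s - \frac{\partial w}{\partial x} S_s\Bigr)ds \\
&= Y_t + \int_t^T \sigma S_s \frac{\partial w}{\partial x}\,dB_s + \int_t^T \Bigl(r Y_s + (\mu - r) S_s \frac{\partial w}{\partial x}\Bigr)ds
\end{aligned} \tag{21}$$

from which we deduce

$$Y_t = \xi + \int_t^T (\alpha Z_s - r Y_s)\,ds - \int_t^T Z_s\,dB_s \tag{22}$$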
with $\alpha = \frac{r - \mu}{\sigma}$, $\xi = \bigl(S_0\,e^{(\mu - \frac{1}{2}\sigma^2)T + \sigma B_T} - K\bigr)^+$, and $Z_s = \sigma S_s \frac{\partial w}{\partial x}$. In this case, we have an explicit solution for $w$ given by

$$Y_0 = S_0\,\Phi(g(T, S_0)) - K e^{-rT}\,\Phi(h(T, S_0)) \tag{23}$$

$$w(t, x) = x\,\Phi(g(T - t, x)) - K e^{-r(T - t)}\,\Phi(h(T - t, x)) \tag{24}$$

where $g(t, x) = \frac{\ln(x/K) + (r + \frac{1}{2}\sigma^2)t}{\sigma\sqrt{t}}$, $h(t, x) = g(t, x) - \sigma\sqrt{t}$ and $\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\,dy$ is the standard normal distribution function. In general, for example, when $\sigma$ may depend on time and $(S_t)$, we obtain a BSDE for $(Y_t)$ coupled with a forward equation for $(S_t)$, that can be solved numerically.

Numerical Methods for RBSDEs

In this section, we are interested in the numerical approximation of BSDEs with reflection (in short, RBSDEs). We present here the case of one lower barrier, which we assume is an Itô process (a sum of a Brownian martingale and a continuous finite variation process).

$$Y_t = \xi + \int_t^T f(s, Y_s, Z_s)\,ds - \int_t^T Z_s\,dB_s + K_T - K_t, \qquad 0 \le t \le T \tag{25}$$

$$Y_t \ge L_t, \quad 0 \le t \le T, \qquad \text{and} \qquad \int_0^T (Y_t - L_t)\,dK_t = 0 \tag{26}$$

where, as before, $f$ is the generator, $\xi$ is the terminal condition, and $L = (L_t)$ is the reflecting barrier. Under the Lipschitz assumption on $f$ (see [14] and for generalizations see [12, 21, 32]) there is a unique solution $(Y, Z, K)$ of adapted processes, with the condition that $K$ is increasing and minimal in the sense that it is supported at the times $Y$ touches the boundary.

The numerical scheme for RBSDEs that we present here is based on a penalization of equation (26) [14] coupled with a use of the standard Euler scheme. The penalization equation is given by

$$Y_t^\varepsilon = \xi + \int_t^1 f(s, Y_s^\varepsilon, Z_s^\varepsilon)\,ds - \int_t^1 Z_s^\varepsilon\,dB_s + \frac{1}{\varepsilon} \int_t^1 (L_s - Y_s^\varepsilon)^+\,ds \tag{27}$$

In this framework, we define

$$K_t^\varepsilon := \frac{1}{\varepsilon} \int_0^t (L_s - Y_s^\varepsilon)^+\,ds, \qquad 0 \le t \le 1 \tag{28}$$

where $\varepsilon$ is the penalization parameter. In order to have an explicit iteration, we include an extra Picard iteration, and the numerical procedure is then

$$Y_{t_i}^{\varepsilon,p+1,n} = Y_{t_{i+1}}^{\varepsilon,p+1,n} + \frac{1}{n} f\bigl(t_i, Y_{t_i}^{\varepsilon,p,n}, Z_{t_i}^{\varepsilon,p,n}\bigr) + \frac{1}{n\varepsilon}\bigl(L_{t_i} - Y_{t_i}^{\varepsilon,p,n}\bigr)^+ - \frac{1}{\sqrt{n}} Z_{t_i}^{\varepsilon,p+1,n}\,\zeta_{i+1} \tag{29}$$

$$K_{t_{i+1}}^{\varepsilon,p+1,n} - K_{t_i}^{\varepsilon,p+1,n} := \frac{1}{n\varepsilon}\bigl(L_{t_i} - Y_{t_i}^{\varepsilon,p+1,n}\bigr)^+ \qquad \text{for } i \in \{n - 1, \ldots, 0\} \tag{30}$$

Theorem 1 Under the assumptions

A1. $f$ is Lipschitz continuous and bounded;
A2. $L$ is assumed to be an Itô process;
A3. $\lim_{n \to +\infty} \mathbb{E}\Bigl[\sup_{s \in [0, T]} \bigl|\mathbb{E}[\xi \mid \mathcal{F}_s] - \mathbb{E}[\xi^n \mid \mathcal{G}_{c_n(s)}^n]\bigr|\Bigr] = 0$

the quadruple $(\xi^n, Y^{\varepsilon,p,n}, Z^{\varepsilon,p,n}, K^{\varepsilon,p,n})$ converges in the Skorohod topology toward the solution $(\xi, Y, Z, K)$ of the RBSDE (26) (the order is first $p \to \infty$, then $n \to \infty$ and finally $\varepsilon \to 0$).

A Procedure Based on Ma and Zhang’s Method

We now introduce a numerical scheme based on a suggestion given in [29]. The new ingredient is to use a standard BSDE with no reflection and then
impose in the final condition of every step of the discretization that the solution must be above the barrier. Schematically we have

• $Y_1^n := \xi^n$
• for $i = n, n - 1, \ldots, 1$ let $(\tilde{Y}^n, Z^n)$ be the solution of the BSDE:

$$\tilde{Y}_{t_{i+1}}^n = Y_{t_i}^n + \frac{1}{n} f(s, \tilde{Y}_s^n, Z_s^n) - Z_s^n\,(W_{t_{i+1}}^n - W_{t_i}^n) \tag{31}$$

• define $Y_{t_{i+1}}^n = \tilde{Y}_{t_{i+1}}^n \vee L_{t_{i+1}}$
• let $K_0^n = 0$ and define $K_{t_i}^n := \sum_{j=1}^{i} \bigl(Y_{t_{j-1}}^n - \tilde{Y}_{t_{j-1}}^n\bigr)$

Clearly $K^n$ is predictable and we have

$$Y_{t_{i-1}}^n = Y_{t_i}^n + \int_{t_{i-1}}^{t_i} f(s, \tilde{Y}_s^n, Z_s^n)\,ds - \int_{t_{i-1}}^{t_i} Z_s^n\,dW_s^n + K_{t_i}^n - K_{t_{i-1}}^n \tag{32}$$

Theorem 2 Under the assumptions A1, A2 of Theorem 1 and

$$\lim_{n \to +\infty} \mathbb{E}\Bigl[\sup_{s \in [0, T]} \bigl|\mathbb{E}[\xi \mid \mathcal{F}_s] - \mathbb{E}[\xi^n \mid \mathcal{G}_{c_n(s)}^n]\bigr|^2\Bigr] = 0 \tag{33}$$

Figure 1 Binomial tree for six time steps, r = 0.06, σ = 0.4, and T = 0.5. Node i.j is the j-th node from the top at level i; node values:
Level 1: 1.1 = 100
Level 2: 2.1 = 117.3299316; 2.2 = 84.67006838
Level 3: 3.1 = 137.663129; 3.2 = 99.3433333; 3.3 = 71.6902048
Level 4: 4.1 = 161.520055; 4.2 = 116.559465; 4.3 = 84.1140683; 4.4 = 60.7001454
Level 5: 5.1 = 189.51137; 5.2 = 136.759141; 5.3 = 98.6909788; 5.4 = 71.2194391; 5.5 = 51.3948546
Level 6: 6.1 = 222.35356; 6.2 = 160.459406; 6.3 = 115.794058; 6.4 = 83.5617192; 6.5 = 60.3015478; 6.6 = 43.5160586
Level 7: 7.1 = 260.88728; 7.2 = 188.266912; 7.3 = 135.861089; 7.4 = 98.042908; 7.5 = 70.7517648; 7.6 = 51.0573618; 7.7 = 35.8450765
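The first levels of the tree in Figure 1 appear to be consistent with multiplicative Euler steps S → S(1 + r h ± σ√h) with h = 1/6; this reading is inferred from the node values and is not stated in the text. The Python sketch below rebuilds such a tree and then applies the "solve one step, then push the solution above the barrier" idea of the scheme above. The linear driver f(y) = -r y, the put-type barrier (K - x)^+ and all function names are assumptions made for illustration only.

```python
import math

def stock_tree(s0, r, sigma, h, steps):
    """Levels 0..steps; level i lists its nodes from the top (index j counts down-moves)."""
    up = 1.0 + r * h + sigma * math.sqrt(h)
    down = 1.0 + r * h - sigma * math.sqrt(h)
    return [[s0 * up ** (i - j) * down ** j for j in range(i + 1)]
            for i in range(steps + 1)]

def projected_value(s0, strike, r, sigma, horizon, steps):
    """Backward induction: one-step linear BSDE (driver -r*y), then maximum with the barrier."""
    h = horizon / steps
    tree = stock_tree(s0, r, sigma, h, steps)

    def barrier(x):
        return max(strike - x, 0.0)          # assumed put-type barrier

    y = [barrier(x) for x in tree[steps]]    # terminal condition
    for i in range(steps - 1, -1, -1):
        # unreflected one-step solution: conditional expectation plus driver term
        y_tilde = [0.5 * (y[j] + y[j + 1]) * (1.0 - r * h) for j in range(i + 1)]
        # impose the constraint: the solution must stay above the barrier
        y = [max(v, barrier(x)) for v, x in zip(y_tilde, tree[i])]
    return y[0]

if __name__ == "__main__":
    tree = stock_tree(100.0, 0.06, 0.4, 1.0 / 6.0, 6)
    print([round(v, 4) for v in tree[1]])    # second level of the tree
    for s0 in (80.0, 100.0, 120.0):
        print(s0, round(projected_value(s0, 100.0, 0.06, 0.4, 0.5, 200), 4))
```

The second loop is only a rough analogue of the procedure described in the text; it does not reproduce the rows of Table 1, which come from the authors' own simulation.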
we have

$$\lim_{n \to \infty} \mathbb{E}\Bigl[\sup_{0 \le i \le n} \bigl|Y_{t_i} - Y_{t_i}^n\bigr|^2 + \int_0^1 \bigl|Z_t - Z_t^n\bigr|^2\,dt\Bigr] = 0 \tag{34}$$

Application to American Options

An American option (see American Options) is one that can be exercised at any time between the purchase date and the expiration date $T$, which we assume is nonrandom and for the sake of simplicity we take $T = 1$. This situation is more general than the European-style option, which can only be exercised on the date of expiration. Since an American option provides an investor with a greater degree of flexibility, the premium for this option should be higher than the premium for a European-style option.

We consider a financial market described by a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{0 \le t \le T}, \mathbb{P})$. As above, we consider the following adapted processes: the price of the risky asset $S = (S_t)_{0 \le t \le T}$ and the wealth process $Y = (Y_t)_{0 \le t \le T}$. We assume that the interest rate $r$ is constant. The aim is to obtain $Y_0$, the value of the American option.

We assume that there exists a risk-neutral measure (see Equivalent Martingale Measures) allowing one to compute prices of all contingent claims as the expected value of their discounted cash flows. The equation that describes the evolution of $Y$ is given by a linear reflected BSDE coupled with the forward equation for $S$.

$$Y_t = (K - S_1)^+ - \int_t^1 (r Y_s + (\mu - r) Z_s)\,ds + K_1 - K_t - \int_t^1 Z_s\,dB_s \tag{35}$$

$$S_t = S_0 + \int_0^t \mu S_s\,ds + \int_0^t \sigma S_s\,dB_s \tag{36}$$

The increasing process $K$ keeps the process $Y$ above the barrier $L_t = (S_t - K)^+$ (for a call option) in a minimal way, that is, $Y_t \ge L_t$, $dK_t \ge 0$, and

$$\int_0^1 (Y_t - L_t)\,dK_t = 0 \tag{37}$$

The exercise random time is given by the following stopping time $\tau = \inf\{t : Y_t - L_t < 0\}$ that represents the exit time from the market for the investor. As usual, we take $\tau = 1$ if $Y$ never touches the boundary $L$. At $\tau$ the investor will buy the stock if $\tau < 1$, otherwise he/she does not exercise the option. In this problem, we are interested in finding $Y_t$, $Z_t$, and $\tau$.

In Table 1 and Figure 1, we summarize the results of a simulation for the American option.

Table 1 Numerical scheme for an American option with 18 steps, K = 100, r = 0.06, σ = 0.4, and T = 0.5, and different values of S0

n             S0 = 80     S0 = 100    S0 = 120
1             20          11.2773     4.1187
2             22.1952     10.0171     3.8841
3             21.8707     10.7979     3.1489
4             22.8245     10.1496     3.9042
...           ...         ...         ...
15            22.6775     10.8116     3.7119
16            22.6068     10.6171     3.6070
17            22.7144     10.7798     3.6811
18            22.6271     10.6125     3.6364
Real values   21.6059     9.9458      4.0611

Acknowledgments

Jaime San Martín’s research is supported by Nucleus Millennium Information and Randomness P04-069-F and BASAL project. Soledad Torres’ research is supported by PBCT-ACT 13 Stochastic Analysis Laboratory, Chile.

References

[1] Antonelli, F. (1996). Stability of backward stochastic differential equations, Stochastic Processes and Their Applications 62(1), 103–114.
[2] Antonelli, F. & Kohatsu-Higa, A. (2000). Filtration stability of backward SDE’s, Stochastic Analysis and Applications 18(1), 11–37.
[3] Bally, V. (1997). Approximation Scheme for Solutions of BSDE. Backward Stochastic Differential Equations. (Paris, 1995–1996), Pitman Research Notes Mathematics Series, Longman, Harlow, Vol. 364, pp. 177–191.
[4] Bally, V. & Pagès, G. (2003). A quantization algorithm for solving multi-dimensional discrete-time optimal stopping problems, Bernoulli 9(6), 1003–1049.
[5] Bally, V., Pagès, G. & Printems, J. (2001). A Stochastic Quantization Method for Nonlinear Problems. Monte Carlo and Probabilistic Methods for Partial Differential Equations (Monte Carlo, 2000). Monte Carlo Methods and Applications 7 (no. 1–2), pp. 21–33.
[6] Barles, G., Buckdahn, R. & Pardoux, E. (1997). BSDEs and integral-partial differential equations, Stochastics and Stochastics Reports 60(1–2), 57–83.
[7] Bouchard, B. & Elie, R. (2005). Discrete time approximation of decoupled forward-backward SDE with jumps. Stochastic Processes and Their Applications 118(1), 53–75.
[8] Bouchard, B. & Touzi, N. (2004). Discrete-time approximation and Monte-Carlo simulation of backward stochastic differential equations, Stochastic Processes and Their Applications 111(2), 175–206.
[9] Briand, P., Delyon, B. & Mémin, J. (2001). Donsker-Type theorem for BSDEs, Electronic Communications in Probability 6, 1–14.
[10] Chevance, D. (1997). Numerical Methods for Backward Stochastic Differential Equations. Numerical Methods in Finance, Publications of the Newton Institute, Cambridge University Press, Cambridge, pp. 232–244.
[11] Coquet, F., Mémin, J. & Slomiński, L. (2001). On Weak Convergence of Filtrations, Séminaire de Probabilités, XXXV, Lecture Notes in Mathematics, Springer, Berlin, Vol. 1755, pp. 306–328.
[12] Cvitanic, J. & Karatzas, I. (1996). Backward stochastic differential equations with reflections and Dynkin games, Annals of Probability 24, 2024–2056.
[13] Douglas, J., Ma, J. & Protter, P. (1996). Numerical methods for forward-backward stochastic differential equations, Annals of Applied Probability 6(3), 940–968.
[14] El Karoui, N., Kapoudjian, C., Pardoux, E. & Quenez, M.C. (1997). Reflected solutions of backward SDE’s, and related obstacle problems for PDE’s, Annals of Probability 25(2), 702–737.
[15] El Karoui, N., Peng, S. & Quenez, M.C. (1997). Backward stochastic differential equations in finance, Mathematical Finance 7, 1–71.
[16] El Karoui, N. & Quenez, M.C. (1997). Imperfect Markets and Backward Stochastic Differential Equation. Numerical Methods in Finance, Publications of the Newton Institute, Cambridge University Press, Cambridge, pp. 181–214.
[17] El Karoui, N. & Rouge, R. (2000). Contingent claim pricing via utility maximization, Mathematical Finance 10(2), 259–276.
[18] Gobet, E., Lemor, J.-P. & Warin, X. (2005). A regression-based Monte Carlo method to solve backward stochastic differential equations, Annals of Applied Probability 15(3), 2172–2202.
[19] Hamadene, S. (1996). Équations différentielles stochastiques rétrogrades: les cas localement Lipschitzien, Annales de l’institut Henri Poincaré (B) Probabilités et Statistiques 32(5), 645–659.
[20] Kobylanski, M. (2000). Backward stochastic differential equations and partial differential equations with quadratic growth, Annals of Probability 28, 558–602.
[21] Kobylanski, M., Lepeltier, J.P., Quenez, M.C. & Torres, S. (2002). Reflected BSDE with Superlinear quadratic coefficient, Probability and Mathematical Statistics 22 (Fasc. 1), 51–83.
[22] Lejay, A., Mordecki, E. & Torres, S. (2008). Numerical method for backward stochastic differential equations with jumps. Submitted, preprint inria-00357992.
[23] Lepeltier, J.P. & San Martín, J. (1997). Backward stochastic differential equations with continuous coefficient, Statistics and Probability Letters 32(4), 425–430.
[24] Lepeltier, J.P. & San Martín, J. (1998). Existence for BSDE with superlinear-quadratic coefficients, Stochastics and Stochastics Reports 63, 227–240.
[25] Li, X. & Tang, S. (1994). Necessary condition for optimal control of stochastic systems with random jumps, SIAM Journal on Control and Optimization 32(5), 1447–1475.
[26] Ma, J., Protter, P., San Martín, J. & Torres, S. (2002). Numerical method for backward stochastic differential equations, Annals of Applied Probability 12, 302–316.
[27] Ma, J., Protter, P. & Yong, J. (1994). Solving forward-backward stochastic differential equations explicitly: a four step scheme, Probability Theory and Related Fields 98(3), 339–359.
[28] Ma, J. & Yong, J. (1999). Forward-Backward Stochastic Differential Equations and their Applications. Lecture Notes in Mathematics, Springer Verlag, Berlin, p. 1702.
[29] Ma, J. & Zhang, L. (2005). Representations and regularities for solutions to BSDE’s with reflections, Stochastic Processes and their Applications 115, 539–569.
[30] Mao, X.R. (1995). Adapted Solutions of BSDE with Non-Lipschitz coefficients, Stochastic Processes and their Applications 58, 281–292.
[31] Martínez, M., San Martín, J. & Torres, S. Numerical method for Reflected Backward Stochastic Differential Equations. Submitted.
[32] Matoussi, A. (1997). Reflected solutions of backward stochastic differential equations with continuous coefficient, Statistics and Probability Letters 34, 347–354.
[33] Mémin, J., Peng, S. & Xu, M. (2008). Convergence of solutions of discrete reflected backward SDE’s and simulations, Acta Matematicae Applicatae Sinica 24(1), 1–18.
[34] Pardoux, E. & Peng, S. (1990). Adapted solution of backward stochastic differential equation, Systems and Control Letters 14, 55–61.
[35] Rogers, L.C.G. (2002). Monte Carlo valuation of American options, Mathematical Finance 12(3), 271–286.
[36] Situ, R. (1997). On solution of backward stochastic differential equations with jumps, Stochastic Processes and their Applications 66(2), 209–236.
[37] Situ, R. & Yin, J. (2003). On solutions of forward-backward stochastic differential equations with Poisson jumps, Stochastic Analysis and Applications 21(6), 1419–1448.
[38] Sow, A.B. & Pardoux, E. (2004). Probabilistic interpretation of a system of quasilinear parabolic PDEs, Stochastics and Stochastics Reports 76(5), 429–477.
[39] Zhang, J. (2004). A numerical scheme for BSDEs, Annals of Applied Probability 14(1), 459–488.

Related Articles

American Options; Backward Stochastic Differential Equations; Forward–Backward Stochastic Differential Equations (SDEs); Markov Processes; Martingales; Martingale Representation Theorem; Mean–Variance Hedging; Partial Differential Equations; Partial Integro-differential Equations (PIDEs); Quantization Methods; Stochastic Control.

JAIME SAN MARTÍN & SOLEDAD TORRES
Stochastic Exponential

Let $X$ be a semimartingale with $X_0 = 0$. Then there exists a unique semimartingale $Z$ that satisfies the equation

$$Z_t = 1 + \int_0^t Z_{s-}\, dX_s \tag{1}$$

It is called the stochastic exponential of $X$ and is denoted by $\mathcal{E}(X)$. Sometimes the stochastic exponential is also called the Doléans exponential, after the French mathematician Catherine Doléans-Dade. Note that $Z_-$ denotes the left-limit process, so that the integrand in the stochastic integral is predictable.

We first give some examples:

1. If $B$ is a Brownian motion, then an application of Itô’s formula reveals that

$$\mathcal{E}(B)_t = \exp\left(B_t - \tfrac{1}{2} t\right) \tag{2}$$

2. Likewise, the stochastic exponential for a compensated Poisson process $N_t - \lambda t$ is given as

$$\mathcal{E}(N - \lambda t)_t = \exp(-\lambda t) \times 2^{N_t} = \exp\left(\ln(2)\, N_t - \lambda t\right) \tag{3}$$

3. The classical Samuelson model for the evolution of stock prices is also given as a stochastic exponential. The price process $S$ is modeled here as the solution of the stochastic differential equation

$$\frac{dS_t}{S_t} = \sigma\, dB_t + \mu\, dt \tag{4}$$

Here, we consider the constant trend coefficient $\mu$, the volatility $\sigma$, and a Brownian motion $B$. The solution to this equation (with $S_0 = 1$) is

$$S_t = \mathcal{E}(\sigma B_t + \mu t) = \exp\left(\sigma B_t + \left(\mu - \tfrac{1}{2}\sigma^2\right) t\right) \tag{5}$$

For a general semimartingale $X$ as above, the expression for the stochastic exponential is

$$Z_t = \exp\left(X_t - \tfrac{1}{2}[X]_t\right) \prod_{0 < s \le t} (1 + \Delta X_s) \exp\left(-\Delta X_s + \tfrac{1}{2}(\Delta X_s)^2\right) \tag{6}$$

where the possibly infinite product converges. Here $[X]$ denotes the quadratic variation process of $X$.

In case $X$ is a local martingale vanishing at zero with $\Delta X > -1$, then $\mathcal{E}(X)$ is a strictly positive local martingale. This property renders the stochastic exponential very useful as a model for asset prices in case the price process is directly modeled under a martingale measure, that is, in the risk neutral world. However, considering some Lévy process $X$, many authors prefer to model the price process as $\exp(X)$ rather than $\mathcal{E}(X)$ since this form is better suited for applying Laplace transform methods. In fact, the two representations are equivalent because starting with a model of the form $\exp(X)$, one can always find a Lévy process $\tilde{X}$ such that $\exp(X) = \mathcal{E}(\tilde{X})$ and vice versa (in case the stochastic exponential is positive). The detailed calculations involving characteristic triplets can be found in Goll and Kallsen [3].

Finally, for any two semimartingales $X$, $Y$ we have the formula

$$\mathcal{E}(X)\, \mathcal{E}(Y) = \mathcal{E}(X + Y + [X, Y]) \tag{7}$$

which generalizes the multiplicative property of the usual exponential function.

Martingale Property

The most crucial issue from the point of view of mathematical finance is that, given $X$ is a local martingale, the stochastic exponential $\mathcal{E}(X)$ may fail to be a martingale. Let us give an illustration of this phenomenon. We assume that the price process of a risky asset evolves as the stochastic exponential $Z_t = \exp\left(B_t - \tfrac{1}{2} t\right)$ where $B$ is a standard Brownian motion starting in zero. Since one-dimensional Brownian motion is almost surely recurrent, and therefore gets negative for arbitrarily large times, zero must be an accumulation point of $Z$. As $Z$ can be written as a stochastic integral with respect to $B$, it is a local martingale, and hence a supermartingale by Fatou’s lemma
because it is bounded from below. We conclude by the supermartingale convergence theorem that $Z$ converges (necessarily to zero). This shows that

$$\lim_{t \to \infty} Z_t = 0 \quad P\text{-a.s.} \tag{8}$$

Holding one stock of the asset with price process $Z$ therefore amounts to following a suicide strategy, since one starts with an initial capital of one and ends up with no money at all at time infinity. The mathematical explanation for this phenomenon is that $Z$ is not a martingale on the closed interval $[0, \infty]$, or equivalently, the family $\{Z_t, t \in \mathbb{R}_+\}$ is not uniformly integrable.

What is more, one of the main applications of stochastic exponentials is that they are intricately related to measure changes since they qualify as candidates for density processes (see Girsanov’s theorem). Let us fix a filtered probability space $(\Omega, \mathcal{F}_\infty, (\mathcal{F}_t), P)$. In case the stochastic exponential is positive, we may define a new measure $Q$ on $\mathcal{F}_\infty$ via

$$\frac{dQ}{dP} = Z_\infty \tag{9}$$

If $Z$ is a uniformly integrable martingale, then $Q$ is a probability measure since $E[Z_\infty] = Z_0 = 1$. On the other hand, if $Z$ is a strict local martingale, hence a strict supermartingale, then we get $Q(\Omega) = E[Z_\infty] < 1$. It is therefore of paramount interest to have criteria at hand for stochastic exponentials to be true martingales. We first focus on the continuous case.

Theorem 1 (Kazamaki’s Criterion). Let $M$ be a continuous local martingale. Suppose

$$\sup_T E\left[\exp\left(\tfrac{1}{2} M_T\right)\right] < \infty \tag{10}$$

where the supremum is taken over all bounded stopping times $T$. Then $\mathcal{E}(M)$ is a uniformly integrable martingale.

A slightly weaker result, which, however, is often easier to apply, is given by the following criterion.

Theorem 2 (Novikov’s Criterion). Let $M$ be a continuous local martingale. Suppose

$$E\left[\exp\left(\tfrac{1}{2} [M]_\infty\right)\right] < \infty \tag{11}$$

Then $\mathcal{E}(M)$ is a uniformly integrable martingale.

Nevertheless, these results are still not applicable in many practically important situations, for example, if one wants to construct martingale measures in stochastic volatility models driven by Brownian motions. In that case, the following result taken from Liptser and Shiryaev [8] often turns out to be useful.

Theorem 3 Let $T$ be a finite time horizon, $\vartheta$ a predictable process with

$$P\left(\int_0^T \vartheta_s^2\, ds < \infty\right) = 1 \tag{12}$$

and $B$ a Brownian motion. Provided that there is $\varepsilon > 0$ such that

$$\sup_{0 \le t \le T} E\left[\exp\left(\varepsilon \vartheta_t^2\right)\right] < \infty \tag{13}$$

then the stochastic exponential $\mathcal{E}\left(\int \vartheta\, dB\right)$ is a martingale on $[0, T]$.

Let us now turn to the discontinuous case. A generalization of Novikov’s criterion has been obtained by Lepingle and Mémin [7] where more results in this direction can be found.

Theorem 4 Let $M$ be a locally bounded local $P$-martingale with $\Delta M > -1$. If

$$E\left[\exp\left(\tfrac{1}{2} [M^c]_\infty\right) \prod_t (1 + \Delta M_t) \exp\left(-\frac{\Delta M_t}{1 + \Delta M_t}\right)\right] < \infty \tag{14}$$

then $\mathcal{E}(M)$ is a uniformly integrable martingale. Here $M^c$ denotes the continuous local martingale part of $M$.

The situation is particularly transparent for Lévy processes; see Cont and Tankov [1].

Theorem 5 If $M$ is both a Lévy process and a local martingale, then its stochastic exponential $\mathcal{E}(M)$ (given that it is positive) is already a martingale.

Alternative conditions for ensuring that stochastic exponentials are martingales in case of Brownian motion driven stochastic volatility models have been
provided in Hobson [4] as well as in Wong and Heyde [9]. Moreover, Kallsen and Shiryaev [6] give results generalizing and complementing the criteria in Lepingle and Mémin [7]. In case of local martingales of stochastic exponential form $\mathcal{E}(X)$, where $X$ denotes one component of a multivariate affine process, Kallsen and Muhle-Garbe [5] give sufficient conditions for $\mathcal{E}(X)$ to be a true martingale. Finally, there are important links between stochastic exponentials of BMO-martingales, reverse Hölder inequalities, and weighted norm inequalities (i.e., inequalities generalizing martingale inequalities to certain semimartingales); compare Doléans-Dade and Meyer [2].

References

[1] Cont, R. & Tankov, P. (2003). Financial Modelling with Jump Processes, Chapman & Hall/CRC Press, Boca Raton.
[2] Doléans-Dade, C. & Meyer, P.A. (1979). Inégalités de normes avec poids, Séminaire de Probabilités de Strasbourg 13, 313–331.
[3] Goll, T. & Kallsen, J. (2000). Optimal portfolio with logarithmic utility, Stochastic Processes and their Applications 89, 91–98.
[4] Hobson, D. (2004). Stochastic volatility models, correlation and the q-optimal measure, Mathematical Finance 14, 537–556.
[5] Kallsen, J. & Muhle-Garbe, J. (2007). Exponentially Affine Martingales and Affine Measure Changes, preprint, TU München.
[6] Kallsen, J. & Shiryaev, A.N. (2002). The cumulant process and Esscher’s change of measure, Finance and Stochastics 6, 397–428.
[7] Lepingle, D. & Mémin, J. (1978). Sur l’intégrabilité uniforme des martingales exponentielles, Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 42, 175–203.
[8] Liptser, R. & Shiryaev, A.N. (1977). Statistics of Random Processes I, Springer, Berlin.
[9] Wong, B. & Heyde, C.C. (2004). On the martingale property of stochastic exponentials, Journal of Probability and its Applications 41, 654–664.

THORSTEN RHEINLÄNDER

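As a numerical companion to the Martingale Property discussion of the stochastic exponential $Z_t = \exp(B_t - t/2)$, the following is a minimal Monte Carlo sketch using only the Python standard library. It samples $Z_T$ directly from $B_T \sim N(0, T)$ and reports the sample mean together with the share of paths that have fallen below a small threshold. The function names, the threshold 0.01, the horizons, and the sample size are illustrative assumptions and are not taken from the article.

```python
import math
import random


def simulate_terminal_values(horizon, n_paths, rng):
    """Draw n_paths samples of Z_T = exp(B_T - T/2), where B_T ~ N(0, T)."""
    sd = math.sqrt(horizon)
    return [math.exp(rng.gauss(0.0, sd) - 0.5 * horizon) for _ in range(n_paths)]


def summarize(horizon, n_paths=20000, seed=0, threshold=0.01):
    """Return (sample mean of Z_T, fraction of samples with Z_T < threshold)."""
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    values = simulate_terminal_values(horizon, n_paths, rng)
    mean = sum(values) / n_paths
    frac_small = sum(1 for v in values if v < threshold) / n_paths
    return mean, frac_small


if __name__ == "__main__":
    # Hypothetical horizons chosen only for illustration.
    for horizon in (1.0, 4.0, 25.0, 100.0):
        mean, frac_small = summarize(horizon)
        print(f"T={horizon:6.1f}  sample mean={mean:.3f}  "
              f"share with Z_T<0.01: {frac_small:.3f}")
```

For a short horizon the sample mean should sit close to $E[Z_T] = 1$, while the share of paths below the threshold grows towards one as $T$ increases. For long horizons a finite sample typically underestimates the mean, because the expectation is carried by very rare, very large paths; this is the lack of uniform integrability in numerical form.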
Martingales

The word martingale originated from Middle French. It means a device for steadying a horse’s head or checking its upward movement. In eighteenth-century France, martingale also referred to a class of betting strategies in which a player increases the stake, usually by doubling, each time a bet is lost. The word “martingale”, which appeared in the official dictionary of the Academy in 1762 (in the sense of a strategy), means “a strategy that consists in betting all that you have lost”. See [7] for more about the origin of martingales. The simplest version of the martingale betting strategies was designed to beat a fair game in which the gambler wins his stake if a coin comes up heads and loses it if the coin comes up tails. The strategy had the gambler keep doubling his bet until the first head eventually occurs. At this point, the gambler stops the game and recovers all previous losses, besides winning a profit equal to the original stake. Logically, if a gambler is able to follow this “doubling strategy” (in French, it is still referred to as la martingale), he would win sooner or later. But in reality, the exponential growth of the bets would bankrupt the gambler quickly. It is Doob’s optional stopping theorem (the cornerstone of martingale theory) that shows the impossibility of successful betting strategies.

In probability theory, a martingale is a stochastic process (a collection of random variables) such that the conditional expectation of an observation at some future time $t$, given all the observations up to some earlier time $s < t$, is equal to the observation at that earlier time $s$. The name “martingale” was introduced by Jean Ville (1910–1989) as a synonym of “gambling system” in his book on “collectif” in the Borel collection, 1938. However, the concept of martingale was created and investigated as early as 1934 by Paul Pierre Lévy (1886–1971), and a lot of the original development of the theory was done by Joseph Leo Doob (1910–2004). At present, martingale theory is one of the central themes of modern probability. It plays a very important role in the study of stochastic processes. In practice, a martingale is a model of a fair game. In financial markets, a fair game means that there is no arbitrage. Mathematical finance builds the bridge that connects no-arbitrage arguments and martingale theory. The fundamental theorem (principle) of asset pricing states, roughly speaking, that a mathematical model for stochastic asset prices $X$ is free of arbitrage if and only if $X$ is a martingale under an equivalent probability measure. The fair price of a contingent claim associated with those assets $X$ is the expectation of its payoff under the martingale equivalent measure (risk neutral measure).

Martingale theory is a vast field of study, and this article only gives an introduction to the theory and describes its use in finance. For a complete description, readers should consult texts such as [4, 13] and [6].

Discrete-time Martingales

A (finite or infinite) sequence of random variables $X = \{X_n \mid n = 0, 1, 2, \ldots\}$ on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ is called a discrete-time martingale (respectively, submartingale, supermartingale) if for all $n = 0, 1, 2, \ldots$, $\mathbb{E}[|X_n|] < \infty$ and

$$\mathbb{E}[X_{n+1} \mid X_0, X_1, \ldots, X_n] = X_n \quad (\text{respectively } \ge X_n,\ \le X_n) \tag{1}$$

By the tower property of conditional expectations, equation (1) is equivalent to

$$\mathbb{E}[X_n \mid X_0, X_1, \ldots, X_k] = X_k \quad (\text{respectively } \ge X_k,\ \le X_k), \text{ for any } k \le n \tag{2}$$

Obviously, $X$ is a submartingale if and only if $-X$ is a supermartingale. Every martingale is also a submartingale and a supermartingale; conversely, any stochastic process that is both a submartingale and a supermartingale is a martingale. The expectation $\mathbb{E}[X_n]$ of a martingale $X$ at time $n$ is a constant for all $n$. This is one of the reasons that in a fair game, the asset of a player is supposed to be a martingale. For a supermartingale $X$, $\mathbb{E}[X_n]$ is a nonincreasing function of $n$, whereas for a submartingale $X$, $\mathbb{E}[X_n]$ is a nondecreasing function of $n$. Here is a mnemonic for remembering which is which: “Life is a supermartingale; as time advances, expectation decreases.” The conditional expectation of $X_n$ in equation (2) should be evaluated on the basis
of all information available up to time $k$, which can be summarized by a $\sigma$-algebra $\mathcal{F}_k$,

$$\mathcal{F}_k = \{\text{all events occurring at times } i = 0, 1, 2, \ldots, k\} \tag{3}$$

A sequence of increasing $\sigma$-algebras $\{\mathcal{F}_n \mid n = 0, 1, 2, \ldots\}$, that is, $\mathcal{F}_k \subseteq \mathcal{F}_n \subseteq \mathcal{F}$ for $k \le n$, is called a filtration, denoted by $\mathbb{F}$. When $\mathcal{F}_n$ is the smallest $\sigma$-algebra containing all the information of $X$ up to time $n$, $\mathcal{F}_n$ is called the $\sigma$-algebra generated by $X_0, X_1, \ldots, X_n$, denoted by $\sigma\{X_0, X_1, \ldots, X_n\}$, and $\mathbb{F}$ is called the natural filtration of $X$. For another sequence of random variables $\{Y_k \mid k = 0, 1, \ldots\}$, let $\mathcal{F}_k = \sigma\{Y_0, Y_1, \ldots, Y_k\}$; then $\mathbb{E}[X_n \mid Y_0, Y_1, \ldots, Y_k] = \mathbb{E}[X_n \mid \mathcal{F}_k]$.

A sequence of random variables $X = \{X_n \mid n = 0, 1, 2, \ldots\}$ on the filtered probability space $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ is said to be adapted if $X_n$ is $\mathcal{F}_n$-measurable for each $n$, which means that given $\mathcal{F}_n$, there is no randomness in $X_n$. An adapted $X$ is called a discrete-time martingale (respectively submartingale, supermartingale) with respect to the filtration $\mathbb{F}$, if for each $n$, $\mathbb{E}[|X_n|] < \infty$, and

$$\mathbb{E}[X_n \mid \mathcal{F}_k] = X_k \quad (\text{respectively } \ge X_k,\ \le X_k), \text{ for any } k \le n \tag{4}$$

Example 1 (Closed Martingales). Let $Z$ be a random variable with $\mathbb{E}|Z| < \infty$; then for any filtration $\mathbb{F} = (\mathcal{F}_n)$, $X_n = \mathbb{E}[Z \mid \mathcal{F}_n]$ is a martingale (also called a martingale closed by $Z$). Conversely, for any martingale $X$ on a finite probability space, there exists a random variable $Z$ such that $X_n = \mathbb{E}[Z \mid \mathcal{F}_n]$.

Example 2 (Partial Sums of i.i.d. Random Variables). Let $Z_1, Z_2, \ldots$ be a sequence of independent, identically distributed (i.i.d.) random variables such that $\mathbb{E}[Z_n] = \mu$ and $\mathrm{Var}(Z_n) = \sigma^2 < \infty$, and that the moment generating function $\varphi(\theta) = \mathbb{E}[\theta^{Z_1}]$ exists for some $\theta > 0$. Let $S_n$ be the partial sum, $S_n = Z_1 + \cdots + Z_n$, also called a random walk. Let $\mathcal{F}_n = \sigma\{Z_1, \ldots, Z_n\}$. Then

$$S_n - n\mu, \qquad (S_n - n\mu)^2 - n\sigma^2, \qquad \frac{\theta^{S_n}}{[\varphi(\theta)]^n} \tag{5}$$

are all martingales. If $\mathbb{P}(Z_k = +1) = p$, $\mathbb{P}(Z_k = -1) = q = 1 - p$, then $S_n$ is called a simple random walk and $(q/p)^{S_n}$ is a martingale since $\varphi(q/p) = 1$; in particular, when $p = q = 1/2$, $S_n$ is called a simple symmetric random walk. If $Z_k$ has the Bernoulli distribution, $\mathbb{P}(Z_k = +1) = p$, $\mathbb{P}(Z_k = 0) = q = 1 - p$, then $S_n$ has the binomial distribution $(n, p)$, and $(q/p)^{2S_n - n}$ is a martingale since $\varphi([q/p]^2) = q/p$.

Example 3 (Polya’s Urn). An urn initially contains $r$ red and $b$ blue marbles. One is chosen randomly. Then it is put back together with another one of the same color. Let $X_n$ be the number of red marbles in the urn after $n$ iterations of this procedure, and let $Y_n = X_n/(n + r + b)$. Then the sequence $Y_n$ is a martingale.

Example 4 (A Convex Function of Martingales). By Jensen’s inequality, a convex function of a martingale is a submartingale. Similarly, a convex and nondecreasing function of a submartingale is also a submartingale. Examples of convex functions are $\max(x - k, 0)$ for constant $k$, $|x|^p$ for $p \ge 1$, and $e^{\theta x}$ for constant $\theta$.

Example 5 (Martingale Transforms). Let $X$ be a martingale with respect to the filtration $\mathbb{F}$ and $H$ be a predictable process with respect to $\mathbb{F}$, that is, $H_n$ is $\mathcal{F}_{n-1}$-measurable for $n \ge 1$, where $\mathcal{F}_0 = \{\emptyset, \Omega\}$. A martingale transform of $X$ by $H$ is defined by

$$(H \cdot X)_n = H_0 X_0 + \sum_{i=1}^{n} H_i (X_i - X_{i-1}) \tag{6}$$

where the expression $H \cdot X$ is the discrete analog of the stochastic integral $\int H\, dX$. If $\mathbb{E}|(H \cdot X)_n| < \infty$ for $n \ge 1$, then $(H \cdot X)_n$ is a martingale with respect to $\mathbb{F}$. The interpretation is that in a fair game $X$, if we choose our bet at each stage on the basis of the prior history, that is, the bet $H_n$ for the $n$th gamble only depends on $\{X_0, X_1, \ldots, X_{n-1}\}$, then the game will continue to be fair. If $X_n$ is the asset price at time $n$ and $H_n$ is the number of shares of the asset held by the investor during the time period from time $n - 1$ until time $n$, more precisely, for the time interval $[n - 1, n)$, then $(H \cdot X)_n$ is the total gain (or loss) up to time $n$ (the value of the portfolio at time $n$ with the trading strategy $H$).

A random variable $T$ taking values in $\{0, 1, 2, \ldots; \infty\}$ is a stopping time with respect to a filtration $\mathbb{F} = \{\mathcal{F}_n \mid n = 0, 1, 2, \ldots\}$, if for each $n$, the
event $\{T = n\}$ is $\mathcal{F}_n$-measurable, or equivalently, the event $\{T \le n\}$ is $\mathcal{F}_n$-measurable. If $S$ and $T$ are stopping times, then $S + T$, $S \vee T = \max(S, T)$, and $S \wedge T = \min(S, T)$ are all stopping times. In particular, $T \wedge n$ is a bounded stopping time for any fixed time $n$. $X^T_n := X_{T \wedge n}$ is said to be the process $X$ stopped at $T$, since on the event $\{\omega \mid T(\omega) = k\}$, $X^T_n = X_k$ for $n = k, k + 1, \ldots$.

Doob’s Optional Stopping Theorem

Let $X$ be a martingale and $T$ be a bounded stopping time with respect to the same filtration $\mathbb{F}$; then $\mathbb{E}[X_T] = \mathbb{E}[X_0]$. Conversely, for an adapted process $X$, if $\mathbb{E}[|X_T|] < \infty$ and $\mathbb{E}[X_T] = \mathbb{E}[X_0]$ hold for all bounded stopping times $T$, then $X$ is a martingale. This theorem says roughly that stopping a martingale at a stopping time $T$ does not alter its expectation, provided that the decision when to stop is based only on information available up to time $T$. The theorem also shows that a martingale stopped at a stopping time is still a martingale, and there is no way to be sure to win in a fair game if the stopping time is bounded.

Continuous-time Martingales

A continuous-time stochastic process $X$ on a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ is a collection of random variables $X = \{X_t : 0 \le t \le \infty\}$, where $X_t$ is a random variable observed at time $t$, and the filtration $\mathbb{F} = \{\mathcal{F}_t : 0 \le t \le \infty\}$ is a family of increasing $\sigma$-algebras, $\mathcal{F}_s \subseteq \mathcal{F}_t \subseteq \mathcal{F}$ for $s \le t$. A process $X$ is said to be adapted if $X_t$ is $\mathcal{F}_t$-measurable for each $t$. A random variable $T$ taking values in $[0, \infty]$ is called a stopping time if the event $\{T \le t\}$ is $\mathcal{F}_t$-measurable for each $t$. The stopping time $\sigma$-algebra $\mathcal{F}_T$ is defined to be $\mathcal{F}_T = \{A \in \mathcal{F} \mid A \cap \{T \le t\} \in \mathcal{F}_t, \text{ all } t \ge 0\}$, which represents the information up to the stopping time $T$.

A real-valued, adapted process $X$ is called a continuous-time martingale (respectively supermartingale, submartingale) with respect to the filtration $\mathbb{F}$ if

1. $$\mathbb{E}[|X_t|] < \infty, \text{ for } t > 0 \tag{7}$$

2. $$\mathbb{E}[X_t \mid \mathcal{F}_s] = X_s \quad (\text{respectively } \le X_s,\ \ge X_s), \text{ a.s. for any } 0 \le s \le t \tag{8}$$

Continuous-time martingales have the same properties as discrete-time martingales. For example, Doob’s optional stopping theorem says that for a martingale $X_t$ with right continuous paths, which is closed in $L^1$ by a random variable $X_\infty$, we have

$$\mathbb{E}[X_T \mid \mathcal{F}_S] = X_S \quad \text{a.s. for any two stopping times } 0 \le S \le T \tag{9}$$

The most important continuous-time martingale is Brownian motion, which was named for the Scottish botanist Robert Brown, who, in 1827, observed ceaseless and irregular movement of pollen grains suspended in water. It was studied by Albert Einstein in 1905 at the level of modern physics. Its mathematical model was first rigorously constructed in 1923 by Norbert Wiener. Brownian motion is also called a Wiener process. The Wiener process gave rise to the study of continuous-time martingales, and has been an example that helps mathematicians to understand stochastic calculus and diffusion processes.

It was Louis Bachelier (1870–1946), now recognized as the founder of mathematical finance (see [9]), who first, in 1900, used Brownian motion $B$ to model short-term stock prices $S_t$ at a time $t$ in financial markets, that is, $S_t = S_0 + \sigma B_t$, where $\sigma > 0$ is a constant. Now we can see that if Brownian motion $B$ is defined on $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$, then the price process $S$ is a martingale under the probability measure $\mathbb{P}$.

In 1965, the American economist Paul Samuelson rediscovered Bachelier’s ideas and proposed the geometric Brownian motion $S_0 \exp\{(\mu - \sigma^2/2)t + \sigma B_t\}$ as a model for long-term stock prices $S_t$. That is, $S_t$ follows the stochastic differential equation (SDE): $dS_t = \mu S_t\, dt + \sigma S_t\, dB_t$. From this simple structure, we get the famous Black–Scholes option price formulas for European calls and puts. This SDE is now called the Black–Scholes equation (model). Contrary to Bachelier’s setting, the price process $S$ is not a martingale under $\mathbb{P}$. However, by Girsanov’s theorem, there is a unique probability measure $\mathbb{Q}$, which is equivalent to $\mathbb{P}$, such that the discounted stock price $e^{-rt} S_t$ is a martingale under $\mathbb{Q}$ for $0 \le t \le T$, where $r$ is the riskless rate of interest, and $T > 0$ is a fixed constant.

The reality is not as simple as the above linear SDE. A simple generalization is $dS_t = \mu(t, S_t)\, dt + \sigma(t, S_t)\, dB_t$. If one believes that risky asset prices
have jumps, an appropriate model might be

$$dS_t = \mu(t, S_t)\, dt + \sigma(t, S_t)\, dB_t + J(t, S_t)\, dN_t \tag{10}$$

where $N$ is a Poisson process with intensity $\lambda$, $J(t, S_t)$ refers to the jump size, and $N$ indicates when the jumps occur. Since $N$ is a counting (pure jump) process with independent and stationary increments, both $N_t - \lambda t$ and $(N_t - \lambda t)^2 - \lambda t$ are martingales. For a more general model, we could replace $N$ by a Lévy process, a class that includes Brownian motion and the Poisson process as special cases.

Under these general mathematical models, it becomes hard to turn the fundamental principle of asset pricing into a precise mathematical theorem: the absence of arbitrage possibilities for a stochastic process $S$, a semimartingale defined on $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$, is equivalent to the existence of an equivalent measure $\mathbb{Q}$, under which $S$ is a local martingale or, sometimes, a sigma martingale. See [2] or [3].

Local Martingales and Finite Variation Processes

There are two types of processes with only jump discontinuities. A process is said to be càdlàg if it almost surely (a.s.) has sample paths that are right continuous, with left limits. A process is said to be càglàd if it almost surely has sample paths that are left continuous, with right limits. The words càdlàg and càglàd are acronyms from the French for continu à droite, limites à gauche, and continu à gauche, limites à droite, respectively. Let

$$\mathbb{D} = \text{the space of adapted processes with càdlàg paths}, \qquad \mathbb{L} = \text{the space of adapted processes with càglàd paths} \tag{11}$$

An adapted, càdlàg process $A$ is called a finite variation (FV) process if $\sup \sum_{i=1}^{N} |A_{t_i} - A_{t_{i-1}}|$ is finite almost surely for each constant $t > 0$, where the supremum is taken over the set of all partitions $0 = t_0 \le t_1 \le \cdots \le t_N = t$. An FV process is a difference of two increasing processes. Although the Brownian motion $B$ has continuous paths, it has paths of infinite variation on $[0, t]$, which prevents us from defining the stochastic integral $\int H\, dB$ as a Riemann–Stieltjes integral, path by path.

An adapted, càdlàg process $M$ is called a local martingale with respect to a filtration $\mathbb{F}$ if there exists an increasing sequence of stopping times $T_n$ with $\lim_{n \to \infty} T_n = \infty$ almost surely, such that for each $n$, $M_{t \wedge T_n}$ is a martingale. A similar concept is that of a locally bounded function: for example, $1/t$ is not bounded over $(0, 1]$, but it is bounded on the interval $[1/n, 1]$ for any integer $n$. A process moving very rapidly though with continuous paths, or jumping unboundedly and frequently, might not be a martingale. However, we could modify it to be a martingale by stopping it properly, that is, it is a martingale up to a stopping time, but may not be a martingale for all time.

The class of local martingales includes martingales as special cases. For example, let $M$ be a local martingale: if for every $t > 0$, $\mathbb{E}\{\sup_{s \le t} |M_s|\} < \infty$, then $M$ is a martingale; if for all $t > 0$, $\mathbb{E}\{[M, M]_t\} < \infty$, then $M$ is a martingale, and $\mathbb{E}\{M_t^2\} = \mathbb{E}\{[M, M]_t\}$. Conversely, if $M$ is a martingale with $\mathbb{E}\{M_t^2\} < \infty$ for all $t > 0$, then $\mathbb{E}\{[M, M]_t\} < \infty$ for all $t > 0$. For the definition of quadratic variation $[M, M]_t$, see equation (14) in the next section.

Not all local martingales are martingales. Here is a typical example of a local martingale that is not a martingale. Lots of continuous-time martingales, supermartingales, and submartingales can be constructed from Brownian motion, since it has independent and stationary increments and it can be approximated by a random walk. For example, let $B$ be a standard Brownian motion in $\mathbb{R}^3$ with $B_0 = x \ne 0$. Let $u(y) = \|y\|^{-1}$, a superharmonic function on $\mathbb{R}^3$. Then $M_t = u(B_t)$ is a positive supermartingale. Since $\lim_{t \to \infty} \sqrt{t}\, \mathbb{E}\{M_t\} = \sqrt{2/\pi}$ and $\mathbb{E}\{M_0\} = u(x)$, $M$ does not have constant expectations and it cannot be a martingale. $M$ is known as the inverse Bessel process. For each $n$, we define a stopping time $T_n = \inf\{t > 0 : \|B_t\| \le 1/n\}$. Since the function $u$ is harmonic outside of the ball of radius $1/n$ centered at the origin, the process $\{M_{t \wedge T_n} : t \ge 0\}$ is a martingale for each $n$. Therefore, $M$ is a local martingale.

Semimartingales and Stochastic Integrals

Today stocks and bonds are traded globally almost 24 hours a day, and online trading happens every second.
When trading takes place almost continuously, it is simpler to use a continuous-time stochastic process to model the price $X$. The value of the portfolio at time $t$ with the continuous-time trading strategy $H$ becomes the limit of sums as shown in the martingale transform $(H \cdot X)_n$ in equation (6), that is, the stochastic integral $\int_0^t H_s\, dX_s$. Stochastic calculus is more complicated than regular calculus because $X$ can have paths of infinite variation, especially when $X$ has unbounded jumps, for example, when $X$ is Brownian motion, a continuous-time martingale, or a local martingale. For stochastic integration theory, see Stochastic Integrals or consult [8, 11] and [12], and other texts.

Let $0 = T_1 \le \cdots \le T_{n+1} < \infty$ be a sequence of stopping times and $H_i \in \mathcal{F}_{T_i}$ with $|H_i| < \infty$. A process $H$ with a representation

$$H_t = H_0 \mathbf{1}_{\{0\}}(t) + \sum_{i=1}^{n} H_i \mathbf{1}_{(T_i, T_{i+1}]}(t) \tag{12}$$

is called a simple predictable process. The collection of simple predictable processes is denoted by $\mathbf{S}$. For a process $X \in \mathbb{D}$ and $H \in \mathbf{S}$ having the representation (12), we define a linear mapping as the martingale transforms in equation (6) in the discrete-time case

$$(H \cdot X)_t = H_0 X_0 + \sum_{i=1}^{n} H_i (X_{t \wedge T_{i+1}} - X_{t \wedge T_i}) \tag{13}$$

If for any $H \in \mathbf{S}$ and each $t \ge 0$, the sequence of random variables $(H^n \cdot X)_t$ converges to $(H \cdot X)_t$ in probability, whenever $H^n \in \mathbf{S}$ converges to $H$ uniformly, then $X$ is called a semimartingale. For example, an FV process, a local martingale with continuous paths, and a Lévy process are all semimartingales.

Since the space $\mathbf{S}$ is dense in $\mathbb{L}$, for any $H \in \mathbb{L}$, there exists $H^n \in \mathbf{S}$ such that $H^n$ converges to $H$. For a semimartingale $X$ and a process $H \in \mathbb{L}$, the stochastic integral $\int H\, dX$, also denoted by $(H \cdot X)$, is defined by $\lim_{n \to \infty} (H^n \cdot X)$. For any $H \in \mathbb{L}$, $H \cdot X$ is a semimartingale; it is an FV process if $X$ is, and it is a local martingale if $X$ is. But $H \cdot X$ may not be a martingale even if $X$ is. $H \cdot X$ is a martingale if $X$ is a local martingale and $\mathbb{E}\{\int_0^t H_s^2\, d[X, X]_s\} < \infty$ for each $t > 0$.

For a semimartingale $X$, its quadratic variation $[X, X]$ is defined by

$$[X, X]_t = X_t^2 - 2 \int_0^t X_{s-}\, dX_s \tag{14}$$

where $X_{s-}$ denotes the left limit at $s$. Let $[X, X]^c$ denote the path-by-path continuous part of $[X, X]$, and $\Delta X_s = X_s - X_{s-}$ be the jump of $X$ at $s$; then $[X, X]_t = [X, X]^c_t + \sum_{0 \le s \le t} (\Delta X_s)^2$. For an FV process $X$, $[X, X]_t = \sum_{0 \le s \le t} (\Delta X_s)^2$. In particular, if $X$ is an FV process with continuous paths, then $[X, X]_t = X_0^2$ for all $t \ge 0$. For a continuous local martingale $X$, $X_t^2 - [X, X]_t$ is a continuous local martingale. Moreover, if $[X, X]_t = X_0^2$ for all $t$, then $X_t = X_0$ for all $t$; in other words, if an FV process is also a continuous local martingale, then it is a constant process.

Lévy’s Characterization of Brownian Motion

A process $X$ is a standard Brownian motion if and only if it is a continuous local martingale with $[X, X]_t = t$.

The theory of stochastic integration for integrands in $\mathbb{L}$ is sufficient to establish Itô’s formula, the Girsanov–Meyer theorem, and to study SDEs. For example, the stochastic exponential of a semimartingale $X$ with $X_0 = 0$, written $\mathcal{E}(X)$, is the unique semimartingale $Z$ that is a solution of the linear SDE: $Z_t = 1 + \int_0^t Z_{s-}\, dX_s$. When $X$ is a continuous local martingale, so is $\mathcal{E}(X)_t = \exp\{X_t - \tfrac{1}{2}[X, X]_t\}$. Furthermore, if Kazamaki’s Criterion $\sup_T \mathbb{E}\{\exp(\tfrac{1}{2} X_T)\} < \infty$ holds, where the supremum is taken over all bounded stopping times, or if Novikov’s Criterion $\mathbb{E}\{\exp(\tfrac{1}{2}[X, X]_\infty)\} < \infty$ holds (stronger but easier to check in practice), then $\mathcal{E}(X)$ is a martingale. See [10] for more on these conditions. When $X$ is Brownian motion, $\mathcal{E}(X)_t = \exp\{X_t - \tfrac{1}{2} t\}$ is referred to as geometric Brownian motion.

The space of integrands $\mathbb{L}$ is not general enough to have local times and martingale representation theory, which is essential for hedging in finance. On the basis of the Bichteler–Dellacherie theorem ($X$ is a semimartingale if and only if $X = M + A$, where $M$ is a local martingale and $A$ is an FV process), we can extend the stochastic integration from $\mathbb{L}$ to the space $\mathcal{P}$ of predictable processes, which are measurable with respect to $\sigma\{H : H \in \mathbb{L}\}$. For a semimartingale
$X$, if a predictable $H$ is $X$ integrable, that is, we can define the stochastic integral $H \cdot X$, then we write $H \in L(X)$ (see chapter 4 of [8]). If $H \in \mathcal{P}$ is locally bounded, then $H \in L(X)$ and $H \cdot X$ is a local martingale if $X$ is. However, if $H \in \mathcal{P}$ is not locally bounded or $H \notin \mathbb{L}$, then $H \cdot X$ may not be a local martingale even if $X$ is an $L^2$ martingale. For such an example due to M. Émery, see p. 152 of [5] or p. 176 of [8]. If $X$ is a local martingale and $H \in L(X)$, then $H \cdot X$ is a sigma martingale.

Sigma Martingales

The concept of a sigma martingale was introduced by Chou [1] and further analyzed by Émery [5]. It has seen a revival in popularity owing to Delbaen and Schachermayer [2]; see [8] for a more detailed treatment. Sigma martingales relate to martingales as sigma-finite measures relate to finite measures. A sigma martingale, which may not be a local martingale, has the essential features of a martingale.

A semimartingale $X$ is called a sigma martingale if there exists a martingale $M$ and a nonnegative $H \in \mathcal{P}$ such that $X = H \cdot M$, or, equivalently, if there exists a nonnegative $H \in \mathcal{P}$ such that $H \cdot X$ is a martingale.

A local martingale is a sigma martingale, but a sigma martingale with large jumps might fail to be a local martingale. If $X$ is a sigma martingale and if either $\sup_{s \le t} |X_s|$ or $\sup_{s \le t} |\Delta X_s|$ is locally integrable (for example, $X$ has continuous paths or bounded jumps), then $X$ is a local martingale. If $X$ is a sigma martingale and $H \in L(X)$, then $H \cdot X$ is always a sigma martingale.

The concept of a sigma martingale is new in the context of mathematical finance. It was introduced to deal with possibly unbounded jumps of the asset price process $X$. When we consider the process $X$ with jumps, it is often convenient to assume the jumps to be unbounded, for example, the Lévy processes and the family of ARCH, GARCH processes. If the conditional distribution of jumps is Gaussian, then the process is not locally bounded. In that case, the concept of a sigma martingale is unavoidable. On the other hand, if we are only interested in how to price and hedge some contingent claims, not the underlying assets $X$, then it might not be necessary to require the asset price $X$ to be a (local) martingale and it suffices to require $H \cdot X$ to be a martingale for some $H$, that is, $X$ is a sigma martingale. Moreover, nonnegative sigma martingales are local martingales, so in particular for stock prices, we do not need to consider sigma martingales.

Finally, we cite two fundamental theorems of asset pricing from chapters 8 and 14 of [3] to see why we need sigma martingales in mathematical finance.

Theorem 1 Let the discounted price process $S$ be a locally bounded semimartingale defined on $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$. Then there exists a probability measure $\mathbb{Q}$ (equivalent to $\mathbb{P}$) under which $S$ is a local martingale, if and only if $S$ satisfies the condition of no free lunch with vanishing risk (NFLVR).

Here the concept of NFLVR is a mild strengthening of the concept of no arbitrage, which was introduced by Delbaen and Schachermayer in [2].

Theorem 2 If we assume that $S$ is a nonlocally bounded semimartingale, then we have a general theorem by replacing the term “local martingale” by the term “sigma martingale” in Theorem 1 above. However, if $S \ge 0$, then “local martingale” suffices, because sigma martingales bounded below are a priori local martingales.

Conclusion

A local martingale is a martingale up to a sequence of stopping times that goes to $\infty$, while a sigma martingale is a countable sum (a mixture) of martingales.

References

[1] Chou, C.S. (1977). Caractérisation d’une classe de semimartingales, Séminaire de Probabilités XIII, LNM, Vol. 721, Springer, pp. 250–252.
[2] Delbaen, F. & Schachermayer, W. (1998). The Fundamental Theorem of Asset Pricing for Unbounded Stochastic Processes, Mathematische Annalen, Vol. 312, Springer, pp. 215–250.
[3] Delbaen, F. & Schachermayer, W. (2006). The Mathematics of Arbitrage, Springer Finance Series, Springer-Verlag, New York.
[4] Dellacherie, C. & Meyer, P.A. (1982). Probabilities and Potential, Vol. 29 of North-Holland Mathematics Studies, North-Holland, Amsterdam.
[5] Émery, M. (1980). Compensation de processus à variation finie non localement intégrables, Séminaire de Probabilités XIV, LNM, Vol. 784, Springer, pp. 152–160.
[6] Ethier, S. & Kurtz, T.G. (1986). Markov Processes: Characterization and Convergence, Wiley, New York.
[7] Mansuy, R. (2005). Histoire de martingales, Mathématiques et Sciences Humaines/Mathematical Social Sciences 169(1), 105–113.
[8] Protter, P. (2003). Stochastic Integration and Differential Equations, Applications of Mathematics, 2nd Edition, Springer, Vol. 21.
[9] Protter, P. (2007). Louis Bachelier’s Theory of Speculation: The Origins of Modern Finance, M. Davis & A. Etheridge, eds, a book review in the Bulletin of the American Mathematical Society, Vol. 45, No. 4, pp. 657–660.
[10] Protter, P. & Shimbo, K. (2006). No Arbitrage and General Semimartingales. To appear in the Festschrift.
[11] Revuz, D. & Yor, M. (1991). Continuous Martingales and Brownian motion, Grundlehren der Mathematischen Wissenschaften, 3rd Edition, Springer, Vol. 293.
[12] Rogers, L.C.G. & Williams, D. (2000). Diffusions, Markov Processes and Martingales, Vols 1 and 2, Cambridge University Press.
[13] Williams, D. (1991). Probability with Martingales, Cambridge University Press.

Related Articles

Equivalent Martingale Measures; Fundamental Theorem of Asset Pricing; Markov Processes; Martingale Representation Theorem.

LIQING YAN
Itô’s Formula

For a function depending on space and time parameters, rules of differentiation are well known. For a function depending on space and time parameters and also on a randomness parameter, Itô’s formulas provide rules of differentiation. These rules of differentiation are based on the complementary notion of stochastic integration (see Stochastic Integrals).

More precisely, given a probability space $(\Omega, \mathbb{P}, \mathcal{F}, (\mathcal{F}_t)_{t\ge 0})$, Itô’s formulas deal with $(F(X_t); t \ge 0)$, where $F$ is a deterministic function defined on $\mathbb{R}$ and $(X_t)_{t\ge 0}$ is a random process such that integration of locally bounded predictable processes is possible with respect to $(X_t)_{t\ge 0}$ and satisfies a property equivalent to the Lebesgue dominated convergence theorem. This means that $(X_t)_{t\ge 0}$ is a semimartingale and therefore has a finite quadratic variation process $([X]_t, t \ge 0)$ (see Stochastic Integrals) defined as

$$[X]_t = \lim_{n\to\infty} \sum_{i} \left(X_{s^n_{i+1}} - X_{s^n_i}\right)^2 \quad \text{in probability, uniformly on time intervals} \tag{1}$$

where $(s^n_i)_{1\le i\le n}$ is a subdivision of $[0, t]$ whose mesh converges to 0 as $n$ tends to ∞.

We will see that Itô’s formulas also provide information on the stochastic structure of the process $(F(X_t), t \ge 0)$. We first introduce the formula established by Itô in 1951. Consider a process $(X_t)_{t\ge 0}$ of the form

$$X_t = \int_0^t H_s \, dB_s + \int_0^t G_s \, ds \tag{2}$$

where $(B_s)_{s\ge 0}$ is a real-valued Brownian motion, and $(H_s)_{s\ge 0}$ and $(G_s)_{s\ge 0}$ are locally bounded predictable processes. Then for every $C^2$-function $F$ from $\mathbb{R}$ to $\mathbb{R}$, we have

$$F(X_t) = F(X_0) + \int_0^t F'(X_s) H_s \, dB_s + \int_0^t F'(X_s) G_s \, ds + \frac{1}{2}\int_0^t H_s^2 F''(X_s) \, ds \tag{3}$$

The process defined in formula (2) is an example of continuous semimartingale. Here is the classical Itô formula for a general semimartingale $(X_s)_{s\ge 0}$ (e.g., [7, 9]) and $F$ in $C^2$:

$$F(X_t) = F(X_0) + \int_0^t F'(X_{s-}) \, dX_s + \frac{1}{2}\int_0^t F''(X_s) \, d[X]^c_s + \sum_{0\le s\le t} \left\{ F(X_s) - F(X_{s-}) - F'(X_{s-}) \Delta X_s \right\} \tag{4}$$

where $[X]^c$ is the continuous part of $[X]$. For continuous semimartingales, formula (4) becomes

$$F(X_t) = F(X_0) + \int_0^t F'(X_s) \, dX_s + \frac{1}{2}\int_0^t F''(X_s) \, d[X]_s \tag{5}$$

In the special case when $(X_t)_{t\ge 0}$ is a real Brownian motion, then $[X]_t = t$.

The multidimensional version of formula (4) gives the expansion of $F(X^{(1)}_t, X^{(2)}_t, \ldots, X^{(d)}_t)$ for $F$ a real-valued function of $C^2(\mathbb{R}^d)$ and $d$ semimartingales $X^{(1)}, X^{(2)}, \ldots, X^{(d)}$. We set $X = (X^{(1)}, X^{(2)}, \ldots, X^{(d)})$:

$$F(X_t) = F(X_0) + \sum_{i=1}^{d} \int_0^t \frac{\partial F}{\partial x_i}(X_{s-}) \, dX^{(i)}_s + \frac{1}{2} \sum_{1\le i,j\le d} \int_0^t \frac{\partial^2 F}{\partial x_i \partial x_j}(X_{s-}) \, d[X^{(i)}, X^{(j)}]^c_s + \sum_{0\le s\le t} \left\{ F(X_s) - F(X_{s-}) - \sum_{i=1}^{d} \frac{\partial F}{\partial x_i}(X_{s-}) \Delta X^{(i)}_s \right\} \tag{6}$$

Note the Itô formula corresponding to the case of the couple of semimartingales $(X_t, t)_{t\ge 0}$ with $X$
continuous and $F$ in $C^2(\mathbb{R}^2)$:

$$F(X_t, t) = F(X_0, 0) + \int_0^t \frac{\partial F}{\partial x}(X_s, s) \, dX_s + \int_0^t \frac{\partial F}{\partial t}(X_s, s) \, ds + \frac{1}{2}\int_0^t \frac{\partial^2 F}{\partial x^2}(X_s, s) \, d[X]_s \tag{7}$$

Each of the above Itô formulas gives a decomposition of the process $(F(X_t), t \ge 0)$ that can be reduced to the sum of a local martingale and an adapted bounded variation process. This shows that $F(X)$ is a semimartingale. In practical situations, the considered function $F$ might not be a $C^2$-function and the process $F(X)$ might not be a semimartingale. Hence, many authors have written extensions of the above formulas relaxing this $C^2$-condition. Some of them use the notion of local times (see Local Times), whose definition can actually be set by the following first extension of the Itô formula.

For $F$ a real-valued convex function and $X$ a semimartingale, $F(X)$ is a semimartingale too and

$$F(X_t) = F(X_0) + \int_0^t F'(X_{s-}) \, dX_s + A_t \tag{8}$$

where $F'$ is the left derivative of $F$ and $(A_t, t \ge 0)$ is an adapted, right-continuous increasing process such that $\Delta A_s = F(X_s) - F(X_{s-}) - F'(X_{s-}) \Delta X_s$.

Choosing $F(x) = |x - a|$, one obtains the existence of an increasing process $(L^a_t, t \ge 0)$ such that

$$|X_t - a| = |X_0 - a| + \int_0^t \operatorname{sgn}(X_{s-} - a) \, dX_s + L^a_t + \sum_{0<s\le t} \left\{ |X_s - a| - |X_{s-} - a| - \operatorname{sgn}(X_{s-} - a) \Delta X_s \right\} \tag{9}$$

The process $L^a$ is called the local time process of $X$ at $a$ (see Local Times for an alternative definition and basic properties). Note that $L^a$ is continuous in $t$.

Coming back to formula (8), denote by $\mu$ the second derivative of $F$ in the generalized function sense; then the Meyer–Itô formula goes further by giving the expression of the bounded variation process $A$:

$$F(X_t) = F(X_0) + \int_0^t F'(X_{s-}) \, dX_s + \sum_{0<s\le t} \left\{ F(X_s) - F(X_{s-}) - F'(X_{s-}) \Delta X_s \right\} + \frac{1}{2}\int_{\mathbb{R}} L^x_t \, \mu(dx) \tag{10}$$

The Meyer–Itô formula is also available for functions $F$ that are the difference of two convex functions.

For the semimartingales $X$ such that for every $t > 0$: $\sum_{0<s\le t} |\Delta X_s| < \infty$ a.s., Bouleau and Yor extended the Meyer–Itô formula to functions $F$ admitting a Radon–Nikodym derivative with respect to the Lebesgue measure. Indeed, the Bouleau–Yor formula [2] states in that case

$$F(X_t) = F(X_0) + \int_0^t F'(X_{s-}) \, dX_s + \sum_{0<s\le t} \left\{ F(X_s) - F(X_{s-}) - F'(X_{s-}) \Delta X_s \right\} - \frac{1}{2}\int_{\mathbb{R}} F'(x) \, d_x L^x_t \tag{11}$$

Note that the Bouleau–Yor formula requires the construction of a stochastic integration of deterministic functions with respect to the process $(L^x_t, x \in \mathbb{R})$, although this last process might not be a semimartingale. Besides, this formula shows that the process $(F(X_t), t \ge 0)$ might not be a semimartingale but a Dirichlet process (i.e., the sum of a local martingale and a 0-quadratic variation process).

In the special case of a real Brownian motion $(B_t, t \ge 0)$, the Föllmer, Protter, and Shiryayev formula offers an extension of the Bouleau–Yor formula to space–time functions $G$ defined on $\mathbb{R} \times \mathbb{R}_+$ admitting a Radon–Nikodym derivative with respect to the space parameter $\partial G/\partial x$ with some continuity properties (see [6] for the detailed assumptions):

$$G(B_t, t) = G(B_0, 0) + \int_0^t G(B_s, ds) + \int_0^t \frac{\partial G}{\partial x}(B_s, s) \, dB_s + \frac{1}{2}\left[\frac{\partial G}{\partial x}(B_\cdot, \cdot), B\right]_t \tag{12}$$
with $\int_0^t G(B_s, ds) = \lim_{n\to\infty} \sum_{i=1}^{n} \left( G(B_{s^n_{i+1}}, s^n_{i+1}) - G(B_{s^n_{i+1}}, s^n_i) \right)$ in probability, where $(s^n_i)_{1\le i\le n}$ is a subdivision of $[0, t]$ whose mesh converges to 0 as $n$ tends to ∞ (Reference 5 contains a similar result and Reference 1 extends it to nondegenerate diffusions).

Another way to extend the Bouleau–Yor formula, in the case of a real Brownian motion, consists in the construction of the stochastic integration of locally bounded deterministic space–time functions $f(x, t)$ with respect to the local time process $(L^x_t, x \in \mathbb{R}, t \ge 0)$ of $B$. That way one obtains, for the functions $G$ admitting locally bounded first-order derivatives, Eisenbaum’s formula [3]:

$$G(B_t, t) = G(B_0, 0) + \int_0^t \frac{\partial G}{\partial x}(B_s, s) \, dB_s + \int_0^t \frac{\partial G}{\partial t}(B_s, s) \, ds - \frac{1}{2}\int_0^t \int_{\mathbb{R}} \frac{\partial G}{\partial x}(x, s) \, dL^x_s \tag{13}$$

The comparison of formula (13) with formulas (12) and (7) provides some rules of integration with respect to the local time process of $B$ such as

• for $f$ a continuous function on $\mathbb{R} \times \mathbb{R}_+$

$$\int_0^t \int_{\mathbb{R}} f(x, s) \, dL^x_s = -\left[ f(B_\cdot, \cdot), B_\cdot \right]_t \tag{14}$$

• for $f$ a locally bounded function on $\mathbb{R} \times \mathbb{R}_+$ admitting a locally bounded Radon–Nikodym derivative $\partial f/\partial x$

$$\int_0^t \int_{\mathbb{R}} f(x, s) \, dL^x_s = -\int_0^t \frac{\partial f}{\partial x}(B_s, s) \, ds \tag{15}$$

See [2] for an extension of formula (13) to Lévy processes.

We now mention the special case of a space–time function $G(x, s)$ defined as follows:

$$G(x, s) = G_1(x, s) \mathbf{1}_{\{x < b(s)\}} + G_2(x, s) \mathbf{1}_{\{x \ge b(s)\}} \tag{16}$$

where $(b(s), s \ge 0)$ is a continuous curve and $G_1$ and $G_2$ are $C^2$-functions that coincide on $x = b(s)$ (but not their derivatives). This case is treated in [8] for $X$ a continuous semimartingale and in [4] for $X$ a Lévy process such that $\sum_{0\le s\le t} |\Delta X_s| < \infty$ a.s. Both use the notion of local time of $X$ along the curve $b$, denoted $(L^{b(\cdot)}_s, s \ge 0)$ and defined as

$$L^{b(\cdot)}_t = \lim_{\varepsilon\to 0} \frac{1}{2\varepsilon} \int_0^t \mathbf{1}_{(|X_s - b(s)| < \varepsilon)} \, d[X]^c_s \quad \text{uniformly on compacts in } L^1 \tag{17}$$

When $b$ is equal to the constant $a$, $L^{b(\cdot)}$ coincides with the local time at the value $a$. These formulas have the following form:

$$\begin{aligned}
G(X_t, t) = {} & G(X_0, 0) + \int_0^t \frac{\partial G}{\partial x}(X_{s-}, s) \, dX_s \\
& + \int_0^t \frac{\partial G_1}{\partial t}(X_s, s) \mathbf{1}_{(X_s < b(s))} \, ds + \int_0^t \frac{\partial G_2}{\partial t}(X_s, s) \mathbf{1}_{(X_s \ge b(s))} \, ds \\
& + \frac{1}{2}\int_0^t \left( \frac{\partial^2 G_1}{\partial x^2}(X_s, s) \mathbf{1}_{(X_s < b(s))} + \frac{\partial^2 G_2}{\partial x^2}(X_s, s) \mathbf{1}_{(X_s \ge b(s))} \right) d[X]^c_s \\
& + \frac{1}{2}\int_0^t \left( \frac{\partial G_2}{\partial x} - \frac{\partial G_1}{\partial x} \right)(b(s), s) \, d_s L^{b(\cdot)}_s \\
& + \sum_{0<s\le t} \left\{ G(X_s, s) - G(X_{s-}, s) - \frac{\partial G}{\partial x}(X_{s-}, s) \Delta X_s \right\}
\end{aligned} \tag{18}$$

Note that $\partial G/\partial x$ exists as a Radon–Nikodym derivative and is equal to $(\partial G_1/\partial x)(x, s) \mathbf{1}_{(x < b(s))} + (\partial G_2/\partial x)(x, s) \mathbf{1}_{(x \ge b(s))}$. The formula (18) is helpful in free-boundary problems of optimal stopping. Other illustrations of formula (13) are given in [4] for multidimensional Lévy processes.

References

[1] Bardina X. & Jolis M. (1997). An extension of Itô’s formula for elliptic diffusion processes, Stochastic Processes and their Applications 69, 83–109.
[2] Bouleau N. & Yor M. (1981). Sur la variation quadratique des temps locaux de certaines semimartingales, Comptes Rendus de l’Académie des Sciences 292, 491–494.
[3] Eisenbaum N. (2000). Integration with respect to local time, Potential Analysis 13, 303–328.
[4] Eisenbaum N. (2006). Local time-space stochastic calculus for Lévy processes, Stochastic Processes and their Applications 116(5), 757–778.
[5] Errami M., Russo F. & Vallois P. (2002). Itô formula for $C^{1,\lambda}$-functions of a càdlàg process, Probability Theory and Related Fields 122, 191–221.
[6] Föllmer H., Protter P. & Shiryayev A.N. (1995). Quadratic covariation and an extension of Itô’s formula, Bernoulli 1(1/2), 149–169.
[7] Jacod J. & Shiryayev A.N. (2003). Limit Theorems for Stochastic Processes, 2nd Edition, Springer.
[8] Peskir G. (2005). A change-of-variable formula with local time on curves, Journal of Theoretical Probability 18, 499–535.
[9] Protter, P. (2004). Stochastic Integration and Differential Equations, 2nd Edition, Springer.

Related Articles

Lévy Processes; Local Times; Stochastic Integrals.

NATHALIE EISENBAUM
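As an illustration of formula (5) in the Itô's Formula article, the following sketch checks the identity numerically for a simulated Brownian path and $F(x) = x^2$, for which (5) reads $B_t^2 = 2\int_0^t B_s \, dB_s + [B]_t$. It is only a sketch under stated assumptions: the step count, the seed, and the function names are illustrative choices and do not come from the article, and the stochastic integral is approximated by left-point sums.

```python
import math
import random


def simulate_brownian_increments(n_steps, horizon, seed=0):
    """Return n_steps independent N(0, dt) increments of a Brownian path on [0, horizon]."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    return [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]


def ito_terms_for_square(increments):
    """For F(x) = x^2, accumulate the discrete analogues of the terms in formula (5).

    Returns (B_T, left-point sum of F'(B_s) dB_s, sum of squared increments).
    """
    b = 0.0
    stochastic_integral = 0.0   # sum of 2 * B_{s_i} * (B_{s_{i+1}} - B_{s_i})
    quadratic_variation = 0.0   # sum of (B_{s_{i+1}} - B_{s_i})^2, as in formula (1)
    for db in increments:
        stochastic_integral += 2.0 * b * db
        quadratic_variation += db * db
        b += db
    return b, stochastic_integral, quadratic_variation


increments = simulate_brownian_increments(n_steps=100_000, horizon=1.0, seed=42)
b_T, integral, qv = ito_terms_for_square(increments)

lhs = b_T ** 2          # F(B_T), with F(B_0) = 0
rhs = integral + qv     # (1/2) * F'' * [B]_T = [B]_T because F'' = 2
print(f"F(B_T) = {lhs:.6f}, Ito right-hand side = {rhs:.6f}, [B]_T = {qv:.4f}")
```

With left-point sums the two sides agree up to rounding, because the discrete identity telescopes exactly; the approximation enters only through the quadratic variation, which should be close to the horizon $t = 1$.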
Lévy Copulas

Lévy copulas characterize the dependence among components of multidimensional Lévy processes. They are similar to copulas of probability distributions but are defined at the level of Lévy measures. Lévy copulas separate the dependence structure of a Lévy measure from the one-dimensional marginal measures, meaning that any d-dimensional Lévy measure can be constructed from a set of one-dimensional margins and a Lévy copula. This suggests the construction of parametric multidimensional Lévy models by combining arbitrary one-dimensional Lévy processes with a Lévy copula from a parametric family. The Lévy copulas were introduced in [4] for spectrally one-sided Lévy processes and in [6, 7] in the general case. Subsequent theoretical developments include Barndorff-Nielsen and Lindner [1], who discuss further interpretations of Lévy copulas and various transformations of these objects. Farkas et al. [5] develop deterministic numerical methods for option pricing in models based on Lévy copulas, and the simulation algorithms for multidimensional Lévy processes based on their Lévy copulas are discussed in [4, 7].

In finance, Lévy copulas are useful to model joint moves of several assets in various settings including portfolio risk management, option pricing [8], insurance [3], and operational risk modeling [2].

Lévy Measures and Tail Integrals

A Lévy process on $\mathbb{R}^d$ is described by its characteristic triplet $(A, \nu, \gamma)$, where $A$ is a positive semidefinite $d \times d$ matrix, $\gamma \in \mathbb{R}^d$, and $\nu$ is a positive Radon measure on $\mathbb{R}^d \setminus \{0\}$, satisfying $\int_{\mathbb{R}^d \setminus \{0\}} (|x|^2 \wedge 1) \, \nu(dx) < \infty$ and called the Lévy measure of $X$. The matrix $A$ is the covariance matrix of the continuous martingale (Brownian motion) part of $X$, and $\nu$ describes the independent jump part. It makes sense, therefore, to describe the dependence structure of the jump part of $X$ with a suitable notion of copula at the level of the Lévy measure.

In the same way that the distribution of a random vector can be represented by its distribution function, the Lévy measure of a Lévy process will be represented by its tail integral. If we are only interested in, say, positive jumps, the definition of the tail integral is simple: given an $\mathbb{R}^d$-valued Lévy process with Lévy measure $\nu$ supported by $[0, \infty)^d$, the tail integral of $\nu$ is the function $U : (0, \infty)^d \to [0, \infty)$ defined by

$$U(x_1, \ldots, x_d) = \nu\big((x_1, \infty) \times \cdots \times (x_d, \infty)\big) \tag{1}$$

In the general case, care must be taken to avoid the possible singularity of $\nu$ near zero: so the tail integral is a function $U : (\mathbb{R} \setminus \{0\})^d \to \mathbb{R}$ defined by

$$U(x_1, \ldots, x_d) := \prod_{i=1}^{d} \operatorname{sgn}(x_i) \, \nu\left( \prod_{j=1}^{d} \mathcal{I}(x_j) \right) \tag{2}$$

where $\mathcal{I}(x) := (x, \infty)$ if $x > 0$ and $\mathcal{I}(x) := (-\infty, x]$ if $x < 0$.

Given an $\mathbb{R}^d$-valued Lévy process $X$ and a nonempty set of indices $I \subset \{1, \ldots, d\}$, the $I$ margin of $X$ is the Lévy process of lower dimension that contains only those components of $X$ whose indices are in $I$: $X^I := (X^i)_{i\in I}$. The $I$-marginal tail integral $U^I$ of $X$ is then simply the tail integral of the process $X^I$.

Lévy Copulas: The General Case

Central to the theory of Lévy copulas are the notions of a d-increasing function and the margins of a d-increasing function. Intuitively speaking, a function $F$ is d-increasing if $dF$ is a positive measure on $\mathbb{R}^d$ in the sense of Lebesgue–Stieltjes integration. Similarly, the margin $F^I$ is defined so that the measure $d(F^I)$ induced by $F^I$ coincides with the $I$ margin of the measure $dF$. Let us now turn to precise definitions.

We set $\overline{\mathbb{R}} := (-\infty, \infty]$ and for $a, b \in \overline{\mathbb{R}}^d$, we write $a \le b$ if $a_k \le b_k$, $k = 1, \ldots, d$. In this case, $(a, b]$ denotes the interval

$$(a, b] := (a_1, b_1] \times \cdots \times (a_d, b_d] \tag{3}$$

For a function $F : \overline{\mathbb{R}}^d \to \overline{\mathbb{R}}$, the $F$-volume of $(a, b]$ is defined by

$$V_F((a, b]) := \sum_{u \in \{a_1, b_1\} \times \cdots \times \{a_d, b_d\}} (-1)^{N(u)} F(u) \tag{4}$$

where $N(u) := \#\{k : u_k = a_k\}$. In particular, $V_F((a, b]) = F(b) - F(a)$ for $d = 1$ and $V_F((a, b]) = F(b_1, b_2) + F(a_1, a_2) - F(a_1, b_2) - F(b_1, a_2)$ for
$d = 2$. If $F(u) = \prod_{i=1}^{d} u_i$, the $F$-volume of any interval is equal to its Lebesgue measure.

A function $F : \overline{\mathbb{R}}^d \to \overline{\mathbb{R}}$ is called d-increasing if $V_F((a, b]) \ge 0$ for all $a \le b$. The distribution function of a random vector is one example of a d-increasing function. The tail integral $U$ was defined in such a way that $(-1)^d U$ is d-increasing in every orthant (but not on the entire space).

Let $F : \overline{\mathbb{R}}^d \to \overline{\mathbb{R}}$ be a d-increasing function such that $F(u_1, \ldots, u_d) = 0$ if $u_i = 0$ for at least one $i$. For an index set $I$, the $I$ margin of $F$ is the function $F^I : \overline{\mathbb{R}}^{|I|} \to \overline{\mathbb{R}}$, defined by

$$F^I((u_i)_{i\in I}) := \lim_{a\to\infty} \sum_{(u_i)_{i\in I^c} \in \{-a, \infty\}^{|I^c|}} F(u_1, \ldots, u_d) \prod_{i\in I^c} \operatorname{sgn} u_i \tag{5}$$

where $I^c := \{1, \ldots, d\} \setminus I$. In particular, we have $F^{\{1\}}(u) = F(u, \infty) - \lim_{a\to -\infty} F(u, a)$ for $d = 2$. To understand the reasoning leading to the above definition of margins, note that any positive measure $\mu$ on $\overline{\mathbb{R}}^d$ naturally induces an increasing function $F$ via

$$F(u_1, \ldots, u_d) := \mu\big( (u_1 \wedge 0, u_1 \vee 0] \times \cdots \times (u_d \wedge 0, u_d \vee 0] \big) \prod_{i=1}^{d} \operatorname{sgn} u_i \tag{6}$$

for $u_1, \ldots, u_d \in \overline{\mathbb{R}}$. The margins of $\mu$ are usually defined by

$$\mu^I(A) = \mu\big( \{u \in \overline{\mathbb{R}}^d : (u_i)_{i\in I} \in A\} \big), \quad A \subset \overline{\mathbb{R}}^{|I|} \tag{7}$$

It is now easy to see that the margins of $F$ are induced by the margins of $\mu$ in the sense of equation (6).

A function $F : \overline{\mathbb{R}}^d \to \overline{\mathbb{R}}$ is called a Lévy copula if it satisfies the following four conditions (the first one is just a nontriviality requirement):

1. $F(u_1, \ldots, u_d) \ne \infty$ for $(u_1, \ldots, u_d) \ne (\infty, \ldots, \infty)$;
2. $F(u_1, \ldots, u_d) = 0$ if $u_i = 0$ for at least one $i \in \{1, \ldots, d\}$;
3. $F$ is d-increasing; and
4. $F^{\{i\}}(u) = u$ for any $i \in \{1, \ldots, d\}$, $u \in \mathbb{R}$.

Lévy Copulas: The Spectrally One-sided Case

If $X$ has only positive jumps in each component, or if we are only interested in the positive jumps of $X$, only the values $F(u_1, \ldots, u_d)$ for $u_1, \ldots, u_d \ge 0$ are relevant. We can then set $F(u_1, \ldots, u_d) = 0$ if $u_i < 0$ for at least one $i$, which greatly simplifies the definition of the margins:

$$F^I((u_i)_{i\in I}) = F(u_1, \ldots, u_d)\big|_{u_j = +\infty, \, j \notin I} \tag{8}$$

Taking the margins now amounts to replacing the variable that is being integrated out with infinity, exactly the same procedure as for probability distribution functions. Restricting a Lévy copula to $[0, \infty]^d$ in such a way, we obtain a Lévy copula for spectrally positive Lévy processes, or, for short, a positive Lévy copula.

Sklar’s Theorem for Lévy Processes

The following theorem [4, 7] characterizes the dependence structure of Lévy processes in terms of Lévy copulas:

Theorem 1

1. Let $X = (X^1, \ldots, X^d)$ be an $\mathbb{R}^d$-valued Lévy process. Then there exists a Lévy copula $F$ such that the tail integrals of $X$ satisfy

$$U^I((x_i)_{i\in I}) = F^I((U_i(x_i))_{i\in I}) \tag{9}$$

for any nonempty index set $I \subset \{1, \ldots, d\}$ and any $(x_i)_{i\in I} \in (\mathbb{R} \setminus \{0\})^{|I|}$. The Lévy copula $F$ is unique on $\prod_{i=1}^{d} \operatorname{Ran} U_i$.

2. Let $F$ be a d-dimensional Lévy copula and $U_i$, $i = 1, \ldots, d$, tail integrals of real-valued Lévy processes. Then there exists an $\mathbb{R}^d$-valued Lévy process $X$ whose components have tail integrals $U_1, \ldots, U_d$ and whose marginal tail integrals satisfy equation (9) for any nonempty $I \subset \{1, \ldots, d\}$ and any $(x_i)_{i\in I} \in (\mathbb{R} \setminus \{0\})^{|I|}$. The Lévy measure $\nu$ of $X$ is uniquely determined by $F$ and $U_i$, $i = 1, \ldots, d$.

In particular, applying the above theorem with $I = \{1, \ldots, d\}$, we obtain the usual formula

$$U(x_1, \ldots, x_d) = F(U_1(x_1), \ldots, U_d(x_d)) \tag{10}$$
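The $F$-volume of equation (4), on which the d-increasing property and hence the definition of a Lévy copula rest, can be sketched in a few lines. The helper names `f_volume` and `product_function` are hypothetical and introduced only for this illustration; the sketch assumes finite corner points with $a_k < b_k$, so that $N(u)$ is unambiguous.

```python
from itertools import product


def f_volume(F, a, b):
    """F-volume of the interval (a, b] as in equation (4).

    Sums (-1)^{N(u)} * F(u) over all corners u of the box, where N(u)
    counts the coordinates of u taken from the lower corner a.
    Assumes a[k] < b[k] for every k (an assumption of this sketch).
    """
    d = len(a)
    total = 0.0
    for choice in product((0, 1), repeat=d):   # 0 -> take a[k], 1 -> take b[k]
        u = tuple(b[k] if c else a[k] for k, c in enumerate(choice))
        n_lower = d - sum(choice)              # N(u) in equation (4)
        total += (-1) ** n_lower * F(*u)
    return total


def product_function(*u):
    """F(u) = u_1 * ... * u_d, whose F-volume is the Lebesgue measure of the box."""
    result = 1.0
    for x in u:
        result *= x
    return result


a = (0.5, 1.0, -2.0)
b = (2.0, 3.0, 1.0)
volume = f_volume(product_function, a, b)
lebesgue = (b[0] - a[0]) * (b[1] - a[1]) * (b[2] - a[2])
print(volume, lebesgue)
```

For $d = 2$ the loop reduces to the four-term expression $F(b_1, b_2) + F(a_1, a_2) - F(a_1, b_2) - F(b_1, a_2)$ quoted in the text. Checking $V_F \ge 0$ on a grid of boxes gives a necessary, not sufficient, numerical test of condition 3 for a candidate $F$.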
If the one-dimensional marginal Lévy measures are infinite and have no atoms, $\operatorname{Ran} U_i = (-\infty, 0) \cup (0, \infty)$ for any $i$ and one can compute $F$ directly via

$$F(u_1, \ldots, u_d) = U\big( U_1^{-1}(u_1), \ldots, U_d^{-1}(u_d) \big) \tag{11}$$

Examples and Parametric Families

The components of a pure-jump Lévy process are independent if and only if they never jump together, that is, if the Lévy measure is supported by the coordinate axes. This leads to a characterization of Lévy processes with independent components in terms of their Lévy copulas: the components $X^1, \ldots, X^d$ of an $\mathbb{R}^d$-valued Lévy process $X$ are independent if and only if their Brownian motion parts are independent and if $X$ has a Lévy copula of the form

$$F_\perp(x_1, \ldots, x_d) := \sum_{i=1}^{d} x_i \prod_{j \ne i} \mathbf{1}_{\{\infty\}}(x_j) \tag{12}$$

The Lévy copula of independence is thus different from the copula of independent random variables $C_\perp(u_1, \ldots, u_d) = u_1 \cdots u_d$, which emphasizes the fact that the two notions are far from being the same and the “copula” intuition cannot always be applied to Lévy copulas.

The complete dependence copula, on the other hand, turns out to have a similar form to the classical case. Recall that a subset $S$ of $\mathbb{R}^d$ is called ordered if, for any two vectors $u, v \in S$, either $u_k \le v_k$, $k = 1, \ldots, d$ or $u_k \ge v_k$, $k = 1, \ldots, d$. Similarly, $S$ is called strictly ordered if, for any two different vectors $u, v \in S$, either $u_k < v_k$, $k = 1, \ldots, d$ or $u_k > v_k$, $k = 1, \ldots, d$. Furthermore, set

$$K := \{x \in \mathbb{R}^d : \operatorname{sgn} x_1 = \cdots = \operatorname{sgn} x_d\} \tag{13}$$

The jumps of an $\mathbb{R}^d$-valued Lévy process $X$ are said to be completely dependent or comonotonic if there exists a strictly ordered subset $S \subset K$ such that $\Delta X_t := X_t - X_{t-} \in S$, $t \in \mathbb{R}_+$ (except for some null set of paths). The condition $\Delta X_t \in K$ means that if the components of a Lévy process are comonotonic, they always jump in the same direction. An $\mathbb{R}^d$-valued Lévy process whose Lévy measure is supported by an ordered set $S \subset K$ is described by the complete dependence Lévy copula given by

$$F_\parallel(x) := \min(|x_1|, \ldots, |x_d|) \, \mathbf{1}_K(x) \prod_{i=1}^{d} \operatorname{sgn} x_i \tag{14}$$

Conversely, if $F_\parallel$ is a Lévy copula of $X$, then the Lévy measure of $X$ is supported by an ordered subset of $K$. If, in addition, the tail integrals $U_i$ of $X^i$ are continuous and satisfy $\lim_{x\to 0} U_i(x) = \infty$, $i = 1, \ldots, d$, then $F_\parallel$ is the unique Lévy copula of $X$ and the jumps of $X$ are completely dependent. For positive Lévy copulas, expression (14) simplifies to

$$F_\parallel(x_1, \ldots, x_d) := \min(x_1, \ldots, x_d) \tag{15}$$

that is, we recover the expression of the complete dependence copula of random variables (but the two functions are defined on different domains!).

One simple and convenient parametric family of positive Lévy copulas is similar to the Clayton family of copulas; it is therefore called the Clayton–Lévy copula:

$$F(u_1, \ldots, u_d) = \left( \sum_{i=1}^{d} u_i^{-\theta} \right)^{-1/\theta}, \quad u_1, \ldots, u_d \ge 0 \tag{16}$$

The reader can easily check that this copula converges to the complete dependence copula $F_\parallel$ as $\theta \to \infty$ and to the independence copula $F_\perp$ as $\theta \to 0$. This construction can be generalized to a Lévy copula on $\overline{\mathbb{R}}^d$:

$$F(u_1, \ldots, u_d) = 2^{2-d} \left( \sum_{i=1}^{d} |u_i|^{-\theta} \right)^{-1/\theta} \left( \eta \mathbf{1}_{\{u_1 \cdots u_d \ge 0\}} - (1 - \eta) \mathbf{1}_{\{u_1 \cdots u_d < 0\}} \right) \tag{17}$$

defines a two-parameter family of Lévy copulas. The role of the parameters is easiest to analyze in the case $d = 2$, when equation (17) becomes

$$F(u, v) = \left( |u|^{-\theta} + |v|^{-\theta} \right)^{-1/\theta} \left( \eta \mathbf{1}_{\{uv \ge 0\}} - (1 - \eta) \mathbf{1}_{\{uv < 0\}} \right) \tag{18}$$

From this equation, it is readily seen that the parameter $\eta$ determines the dependence of the sign of jumps: when $\eta = 1$, the two components always jump in the
same direction, and when η = 0, positive jumps in one component are accompanied by negative jumps in the other and vice versa. The parameter θ is responsible for the dependence of absolute values of jumps in different components.

Figure 1 shows the scatter plots of weekly returns in an exponential Lévy model with variance gamma (see Variance-gamma Model) margins and the dependence pattern given by the Lévy copula (18) with two different sets of dependence parameters, both of which lead to a correlation of 50% but have different tail dependence patterns. It is clear that when a precise description of tail events such as simultaneous large jumps is necessary, Lévy copulas offer more freedom in modeling dependence than traditional correlation-based approaches. A natural application of Lévy copulas arises in the context of multidimensional gap options [8] that are exotic products whose payoff depends on the total number of sharp downside moves in a basket of assets.

Figure 1 Scatter plots of returns in a two-dimensional variance gamma model with correlation ρ = 50% and different tail dependence. (a) Strong tail dependence (η = 0.75 and θ = 10) and (b) weak tail dependence (η = 0.99 and θ = 0.61)

References

[1] Barndorff-Nielsen, O.E. & Lindner, A.M. (2007). Lévy copulas: dynamics and transforms of upsilon type, Scandinavian Journal of Statistics 34, 298–316.
[2] Böcker, K. & Klüppelberg, C. (2007). Multivariate operational risk: dependence modelling with Lévy copulas, ERM Symposium Online Monograph, Society of Actuaries, and Joint Risk Management, section newsletter.
[3] Bregman, Y. & Klüppelberg, C. (2005). Ruin estimation in multivariate models with Clayton dependence structure, Scandinavian Actuarial Journal November(6), 462–480.
[4] Cont, R. & Tankov, P. (2004). Financial Modelling with Jump Processes, Chapman & Hall/CRC Press.
[5] Farkas, W., Reich, N. & Schwab, C. (2007). Anisotropic stable Lévy copula processes-analytical and numerical aspects, Mathematical Models and Methods in Applied Sciences 17, 1405–1443.
[6] Kallsen, J. & Tankov, P. (2006). Characterization of dependence of multidimensional Lévy processes using Lévy copulas, Journal of Multivariate Analysis 97, 1551–1572.
[7] Tankov, P. (2004). Lévy Processes in Finance: Inverse Problems and Dependence Modelling, PhD thesis, Ecole Polytechnique, France.
[8] Tankov, P. (2008). Pricing and Hedging Gap Risk, preprint, available at http://papers.ssrn.com.

Related Articles

Copulas: Estimation; Exponential Lévy Models; Lévy Processes; Multivariate Distributions; Operational Risk.

PETER TANKOV
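The limiting behaviour of the Clayton–Lévy copula of equation (16) in the Lévy Copulas article, and the sign mechanism of its two-sided version (18), can be explored numerically. The following is a minimal sketch: the function names are hypothetical, the treatment of zero and infinite arguments follows conditions 2 and 4 of the Lévy copula definition, and factoring out the smallest argument is only a numerical-stability choice of this sketch.

```python
import math


def clayton_levy(u, theta):
    """Positive Clayton-Levy copula (16): (sum_i u_i^(-theta))^(-1/theta), u_i in [0, inf]."""
    if any(x == 0.0 for x in u):
        return 0.0                       # condition 2: zero if any argument is zero
    finite = [x for x in u if not math.isinf(x)]
    if not finite:
        return math.inf                  # F(inf, ..., inf) = inf
    m = min(finite)
    # Infinite arguments contribute 0 to the sum; factor out m to avoid overflow.
    s = sum((x / m) ** (-theta) for x in finite)
    return m * s ** (-1.0 / theta)


def clayton_levy_2d(u, v, theta, eta):
    """Two-parameter family (18) for d = 2."""
    magnitude = clayton_levy((abs(u), abs(v)), theta)
    sign_weight = eta if u * v >= 0 else -(1.0 - eta)
    return magnitude * sign_weight


# Margin condition 4: sending the other argument to infinity returns u.
print(clayton_levy((2.0, math.inf), 1.5))

# Large theta approaches min(u1, u2), small theta approaches 0 for finite arguments.
print(clayton_levy((2.0, 3.0), 200.0), clayton_levy((2.0, 3.0), 0.01))

# eta = 1 puts no mass on jumps of opposite sign.
print(clayton_levy_2d(1.0, -1.0, 1.0, 1.0))
```

For finite positive arguments the independence copula (12) vanishes, which is why the small-θ value above is close to zero rather than to the product $u_1 u_2$.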
Convex Duality

Convex duality refers to a general principle that allows us to associate with an original minimization program (the primal problem) a class of concave maximization programs (the dual problem), which, under some conditions, are equivalent to the primal. The unifying principles underlying these methods can be traced back to the basic duality that exists between a convex set of points in the plane and the set of supporting lines (hyperplanes). Duality tools can be applied to nonconvex programs too, but are most effective for convex problems.

Convex optimization problems naturally arise in many areas of finance; we mention just a few of them (see the list of the related entries at the end of this article): maximization of expected utility in complete or incomplete markets, mean–variance portfolio selection and CAPM, utility indifference pricing, selection of the minimal entropy martingale measure, and model calibration. This short and nonexhaustive list should give a hint of the scope of convex duality methods in financial applications.

Consider the following primal minimization (convex) problem:

$$(P): \quad \min f(v) \quad \text{subject to } v \in A \tag{1}$$

where $A$ is a convex subset of some vector space $V$ and $f : A \to \mathbb{R}$ is a convex function. Convex duality principles consist in pairing this problem with a dual maximization (concave) problem:

$$(D): \quad \max g(w) \quad \text{sub } w \in B \tag{2}$$

where $B$ is a convex subset of some other vector space $W$ (possibly $W = V$) and $g : B \to \mathbb{R}$ is a concave function.

In general, by applying a duality principle, we usually try to

1. find a lower bound for the value of the primal problem, or, better
2. find the value of the primal problem, or, even better
3. find the solutions, if any, of the primal problem.

Different duality principles differ in the way the dual problem is built. Two main principles are Lagrange duality and Fenchel duality. Even though they are formally equivalent, at least in the finite-dimensional case, they provide different insights into the problem. We will see below how the Lagrange and Fenchel duality principles practically accomplish the tasks 1 to 3 above.

For the topics to be presented below, comprehensive references are [4] and [1] for the finite-dimensional case ([1] also provides an extensive account of numerical methods) and [2] for the infinite-dimensional case.

Lagrange Duality in Finite-dimensional Problems

We consider finite-dimensional problems, that is, $V = \mathbb{R}^N$ for some $N \ge 1$. We denote by $v \cdot w$ the inner product between two vectors $v, w \in \mathbb{R}^N$ and use $v \ge 0$ as a shorthand for $v_n \ge 0 \ \forall n$. Let $f, h_1, \ldots, h_M : C \to \mathbb{R}$ be $M + 1$ convex functions, where $C \subseteq \mathbb{R}^N$ is a convex set. Setting $h = (h_1, \ldots, h_M)$, so that $h$ is a convex function from $C$ to $\mathbb{R}^M$, we consider, as the primal problem, the minimization of $f$ under $M$ inequality constraints:

$$(P): \quad \min f(v) \quad \text{sub } v \in A = \{v \in C : h(v) \le 0\} \subset \mathbb{R}^N \tag{3}$$

To build a dual problem, we define the so-called Lagrangian function

$$L(v, w) := f(v) + w \cdot h(v), \quad v \in C, \ w \in \mathbb{R}^M \tag{4}$$

and note that $f(v) = \sup_{w \ge 0} L(v, w)$ for any $v \in A$. As a consequence, we can write the primal problem in terms of $L$:

$$(P): \quad \inf_{v \in C} \sup_{w \ge 0} L(v, w) \tag{5}$$

The dual problem is then defined by switching the supremum with the infimum:

$$(D): \quad \sup_{w \ge 0} \inf_{v \in C} L(v, w) \tag{6}$$
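The passage from (5) to (6) can be made concrete on a one-dimensional toy problem that is not taken from the article: minimize $f(v) = v^2$ subject to $h(v) = 1 - v \le 0$. The sketch below approximates both problems by brute force on grids; the grid bounds and step sizes are arbitrary assumptions, and $C = \mathbb{R}$ is replaced by the interval $[-5, 5]$.

```python
def f(v):
    """Objective of the toy primal problem (an illustrative choice)."""
    return v * v


def h(v):
    """Single inequality constraint h(v) <= 0, i.e. v >= 1."""
    return 1.0 - v


def lagrangian(v, w):
    """L(v, w) = f(v) + w * h(v), as in equation (4) with M = 1."""
    return f(v) + w * h(v)


def dual_function(w, v_grid):
    """g(w) = inf over v in C of L(v, w), approximated on a grid."""
    return min(lagrangian(v, w) for v in v_grid)


v_grid = [i / 100.0 for i in range(-500, 501)]   # stand-in for C = R
w_grid = [i / 20.0 for i in range(0, 101)]       # multipliers w >= 0

# Primal value p: minimize f over the feasible grid points.
primal_value = min(f(v) for v in v_grid if h(v) <= 0.0)

# Dual value d: maximize the dual function over w >= 0.
dual_value = max(dual_function(w, v_grid) for w in w_grid)

print(f"p = {primal_value:.4f}, d = {dual_value:.4f}")
```

Analytically, $g(w) = w - w^2/4$ here, which is maximized at $w = 2$. In this example the two values coincide, so the lower bound provided by the dual is attained; in general only $d \le p$ can be expected without further assumptions.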
2 Convex Duality

In the terminology of the introductory section, the practical situations, “branch and bound” algorithms in
dual problem is then integer programming being a prominent example. It
also provides a workable condition that characterizes
(D) : max g(w) sub w ∈ B a solution pair, at least when there is no duality gap.
Strong duality, on the contrary, requires a precise
= {w ∈ D : w ≥ 0} ⊂ M (7) topological assumption: the interior of the constraint
where set has to be nonempty (Slater condition). We note,
g(w) = inf L(v, w) (8) however, that this condition is satisfied in most cases,
v∈C at least in the present finite-dimensional setting.
and D = {w ∈ M : g(w) > −∞} is the domain The proof is then based on a separating hyperplane
of g. It can be proved that D is a convex set and g theorem, that in turn requires convexity assumptions
is a concave function on D even if f is not convex: about f and h. When strong duality holds, and
therefore the dual problem is always concave, even provided we are able to actually solve the dual
when the primal problem is not convex. problem, we obtain the exact value of the primal (no
We assume throughout primal and dual feasibility, duality gap).
that is, A and B are assumed to be nonempty. Dual feasibility would however be ensured under Slater conditions for A (see below). Let $p = \inf_A f$ and $d = \sup_B g$ be the (possibly infinite) values of the primal and the dual. A primal (dual) solution is $\hat{v} \in A$ ($\hat{w} \in B$), if any, such that $f(\hat{v}) = p$ ($g(\hat{w}) = d$); a solution pair is a feasible pair $(\hat{v}, \hat{w}) \in A \times B$ made by a primal and a dual solution.

Lagrange Duality Theorem

1. Weak duality
Primal boundedness ($p > -\infty$) implies dual boundedness ($d < +\infty$) and

$$p \ge d \quad (p - d \ge 0 \text{ is called the duality gap}) \tag{9}$$

Moreover, if there is no duality gap ($p = d$), then $(\hat{v}, \hat{w}) \in A \times B$ is a solution pair if and only if

$$\hat{w} \cdot h(\hat{v}) = 0 \quad \text{and} \quad L(\hat{v}, \hat{w}) = g(\hat{w}) \tag{10}$$

In this case, $\hat{w}$ is usually called a Lagrange multipliers vector.

2. Strong duality
If, in addition, there exists $v \in C$ such that $h_m(v) < 0$ for all m (Slater condition), then there is no duality gap and there exists a dual solution.

See [4] or [1] for a proof.

Weak duality, whose proof is trivial, holds under very general conditions: in particular, the primal problem need not be convex. It gives a lower bound for the value of the primal problem, which is useful in many

We can add a finite number (say L) of linear equality constraints to (P), obtaining

$$(P): \ \min f(v) \ \text{ sub } \ v \in A = \{v \in C : h(v) \le 0,\ Qv = r\} \subset \mathbb{R}^N \tag{11}$$

where Q is an L × N matrix and $r \in \mathbb{R}^L$. The Lagrangian is defined as

$$L(v, w) = f(v) + w^{in} \cdot h(v) + w^{eq} \cdot (Qv - r), \quad v \in C,\ w = (w^{in}, w^{eq}) \in \mathbb{R}^M \times \mathbb{R}^L \tag{12}$$

in such a way that

$$\inf_{v \in A} f(v) = \inf_{v \in C} \ \sup_{w^{in} \ge 0,\ w^{eq} \in \mathbb{R}^L} L(v, w) \tag{13}$$

The dual problem is then

$$(D): \ \max g(w) \ \text{ sub } \ w \in B = \{w \in D : w^{in} \ge 0\} \subset \mathbb{R}^M \times \mathbb{R}^L \tag{14}$$

where, as before, $g(w) = \inf_{v \in C} L(v, w)$, and D is the domain of g. It is worth noting that if the primal problem has equality constraints only, then the only constraint of the dual problem is $w \in D$.

A Lagrange duality theorem can then be stated and also proved in this case, reaching similar conclusions. We just have to replace $\hat{w}$ with $\hat{w}^{in}$ in the first condition in (10), and modify the Slater condition as follows:

• There exists $v \in \mathrm{ri}(C)$ such that $h_m(v) < 0$ for all m and $Qv = r$ (15)
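As a small worked illustration of the Lagrange duality theorem (weak duality (9), the optimality conditions (10), and the Slater condition), consider the following one-dimensional problem. The problem itself is a hypothetical example chosen for this sketch and does not come from the article; it only uses the definitions of L, g, p and d given above.

```latex
% Illustrative example (assumed, not from the source): C = \mathbb{R}, M = 1, h(v) = 1 - v.
\begin{align*}
(P):\quad & \min v^2 \ \text{ sub } \ v \in A = \{v \in \mathbb{R} : 1 - v \le 0\},
  \qquad p = 1 \ \text{ attained at } \hat{v} = 1, \\
L(v, w) &= v^2 + w\,(1 - v), \\
g(w) &= \inf_{v \in \mathbb{R}} L(v, w) = w - \tfrac{1}{4} w^2
  \quad (\text{the infimum is attained at } v = w/2), \\
(D):\quad & \max \bigl\{ w - \tfrac{1}{4} w^2 \bigr\} \ \text{ sub } \ w \ge 0,
  \qquad d = 1 \ \text{ attained at } \hat{w} = 2 .
\end{align*}
% Slater condition: v = 2 gives h(2) = -1 < 0, so p = d is expected; indeed p = d = 1.
% Conditions (10): \hat{w}\, h(\hat{v}) = 2 \cdot 0 = 0 and
% L(\hat{v}, \hat{w}) = 1 + 2 \cdot 0 = 1 = g(2) = 2 - 1 .
```

Here the multiplier $\hat{w} = 2$ is strictly positive because the constraint is active at the primal solution.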
The relative interior ri(C) is the interior of the convex set C relative to the affine hull of C. For instance, if $C = [0, 1] \times \{0\} \subset \mathbb{R}^2$, then $\mathrm{ri}(C) = (0, 1) \times \{0\}$ (because the affine hull of C is $\mathbb{R} \times \{0\}$), while the interior of C is clearly empty (see [4] for more on relative interiors and related topics about convex sets).

In many concrete problems, C is a polyhedron, that is, it is the (convex and closed) set defined by a certain finite set of linear inequalities, and all the functions $h_m$ are affine. If we assume, in addition, that f may be extended to a finite convex function over all $\mathbb{R}^N$, Farkas Lemma allows us to prove strong duality without requiring any Slater condition. Remarkably, if f is linear too, then the existence of a primal solution is ensured.

The Lagrange duality theorem provides us with a simple criterion for the existence of a dual solution and a set of conditions characterizing a possible primal solution. It is, however, not directly concerned with the existence of a primal solution. To ensure this, one has to assume stronger conditions such as compactness of C or coercivity of f. A third condition (f linear) has been described above.

We have seen that the dual problem usually looks much better than the primal: it is always concave and its solvability is guaranteed under mild assumptions about the primal. This fact is particularly useful in designing numerical procedures. Moreover, even when the primal is solvable, the dual often proves easier to handle. We provide a simple example that should clarify the point.

A standard linear programming (LP) problem comes, by definition, in the form

$$(P): \ \min c \cdot v \ \text{ sub } \ Qv = r,\ v \ge 0,\ v \in \mathbb{R}^N \tag{16}$$

where $c \in \mathbb{R}^N$, Q is an L × N matrix and $r \in \mathbb{R}^L$. An easy computation shows that the dual problem is (T denotes transposition)

$$(D): \ \max r \cdot w \ \text{ sub } \ Q^T w \le c,\ w \in \mathbb{R}^L \tag{17}$$

We know that strong duality holds in this case, and that the existence of a solution pair is guaranteed. In particular, $(Q^T \hat{w} - c) \cdot \hat{v} = 0$ is a necessary condition for a pair $(\hat{v}, \hat{w})$ to be a solution. The dual problem, however, has L variables and N constraints and thus can often be more tractable than the primal if N is much larger than L. This is the basis for great enhancements in existing numerical methods.

A last remark concerns the word "duality": any dual problem can be turned into an equivalent minimization primal problem. It turns out that the bidual, that is, the dual of this new primal problem, seldom coincides with the original primal problem. LP problems are an important exception: the bidual of an LP problem is the problem itself.

Fenchel Duality in Finite-dimensional Problems

Fenchel duality, which we will derive from Lagrange duality, may be applied to primal problems in the form

$$(P): \ \min \{f_1(v) - f_2(v)\} \ \text{ sub } \ v \in A = C_1 \cap C_2 \subset \mathbb{R}^N \tag{18}$$

where $C_1, C_2 \subseteq \mathbb{R}^N$ are convex, $f_1 : C_1 \to \mathbb{R}$ is convex, and $f_2 : C_2 \to \mathbb{R}$ is concave.

Consider the function $f(x, y) = f_1(x) - f_2(y)$ defined on $\mathbb{R}^{2N}$ and clearly convex. We can restate the primal as

$$(P): \ \min f(x, y) \ \text{ sub } \ (x, y) \in A = \{(x, y) \in C_1 \times C_2 : x = y\} \subset \mathbb{R}^{2N} \tag{19}$$

where the N fictitious linear constraints ($x_n = y_n$ for all n) allow us to apply the Lagrange duality machinery. Writing the constraints as $y - x = 0$, the Lagrangian function is $L(x, y, w) = f_1(x) - f_2(y) + w \cdot (y - x)$ and, using some simple algebra, we compute

$$g(w) = \inf_{x \in C_1,\ y \in C_2} L(x, y, w) = f_2^*(w) - f_1^*(w) \tag{20}$$

where

$$f_1^*(w) = \sup_{x \in C_1} \{w \cdot x - f_1(x)\} \tag{21}$$

is, by definition, the convex conjugate (indeed, $f_1^*$ is convex) of the convex function $f_1$, and

$$f_2^*(w) = \inf_{y \in C_2} \{w \cdot y - f_2(y)\} \tag{22}$$
is the concave conjugate (indeed, $f_2^*$ is concave) of the concave function $f_2$. As a consequence, the dual problem is

$$(D): \ \max \{f_2^*(w) - f_1^*(w)\} \ \text{ sub } \ w \in B = C_1^* \cap C_2^* \subset \mathbb{R}^N \tag{23}$$

where $C_1^*$ and $C_2^*$ are the domains of $f_1^*$ and $f_2^*$, respectively. Assuming primal feasibility and boundedness, the Lagrange duality theorem yields the Fenchel duality theorem.

Fenchel Duality Theorem

1. Weak duality
If there is no duality gap, $(\hat{v}, \hat{w})$ is a solution pair if and only if

$$\hat{v} \cdot \hat{w} = f_1(\hat{v}) + f_1^*(\hat{w}) = f_2(\hat{v}) + f_2^*(\hat{w}) \tag{24}$$

2. Strong duality
There is no duality gap between the primal and the dual, and there is a dual solution, provided one of the following conditions is satisfied:
(a) $\mathrm{ri}(C_1) \cap \mathrm{ri}(C_2)$ is nonempty
(b) $C_1$ and $C_2$ are polyhedra and $f_1$ (resp. $f_2$) may be extended to a finite convex (concave) function over all $\mathbb{R}^N$

See [4] or [1] for a proof.

We say that a convex function f is closed if, for any $a \in \mathbb{R}$, the set $\{v : f(v) \le a\}$ is closed; a similar definition applies to concave functions, where the inequality inside the set is reversed. A sufficient, though not necessary, condition for f to be closed is continuity on all C. A celebrated result (the Fenchel–Moreau theorem) states that $(f^*)^* \equiv f$, provided f is a closed (convex or concave) function. Therefore, if in the primal problem $f_1$ and $f_2$ are closed, then the dual problem of the dual coincides with the primal, and the duality is therefore complete. Thanks to this fact, an application of the Fenchel duality theorem to the dual problem allows us to state that the primal has a solution provided one of the following conditions is satisfied:

1. $\mathrm{ri}(C_1^*) \cap \mathrm{ri}(C_2^*)$ is nonempty.
2. $C_1^*$ and $C_2^*$ are polyhedra, and $f_1^*$ (resp. $f_2^*$) may be extended to a finite convex (concave) function over all $\mathbb{R}^N$.

Fenchel duality can sometimes be effectively used for general problems in the form

$$(P): \ \min f(v) \ \text{ sub } \ v \in C \subset \mathbb{R}^N \tag{25}$$

where f and C are convex. Indeed, such a problem can be cast in the form (18) provided we set $f_1 = f$, $f_2 = 0$ (concave), $C_1 = \mathbb{R}^N$, and $C_2 = C$. The dual problem is given by equation (23), where

$$f_1^*(w) = \sup_{v \in \mathbb{R}^N} \{w \cdot v - f(v)\} \tag{26}$$

is an unconstrained problem and

$$f_2^*(w) = \inf_{v \in C} w \cdot v \tag{27}$$

has a simple goal function.

We have derived Fenchel duality as a by-product of Lagrange duality. However, it is possible to go in the opposite direction, by first proving Fenchel duality (unsurprisingly, using hyperplane separation arguments, see [2]) and then writing a Lagrange problem in the Fenchel form, so that Lagrange duality can be derived (see [3]). Therefore, at least in the finite-dimensional setting, Lagrange and Fenchel duality are formally equivalent.

Duality in Infinite-dimensional Problems

For infinite-dimensional problems, Lagrange and Fenchel duality exhibit a large formal similarity with the finite-dimensional counterparts we have described so far. Nevertheless, the technical topological assumptions, which are needed to ensure duality, become much less trivial when the space $V = \mathbb{R}^N$ is replaced by an infinite-dimensional Banach space. We give a brief account of these differences.

Let V be a Banach space and consider the primal problem

$$(P): \ \min f(v) \ \text{ sub } \ v \in A = \{v \in C : h(v) \le 0\} \subset V \tag{28}$$

where $C \subseteq V$ is a convex set, and $f : C \to \mathbb{R}$ and $h : C \to \mathbb{R}^M$ are convex functions. Then, by mimicking the finite-dimensional case, the dual problem is

$$(D): \ \max g(w) \ \text{ sub } \ w \in B = \{w \in D : w \ge 0\} \subset \mathbb{R}^M \tag{29}$$
where $g(w) = \inf_{v \in C} \{f(v) + w \cdot h(v)\}$, and D is the domain of g. We can note that the dual is finite-dimensional, but the definition of g involves an infinite-dimensional problem. A perfect analog of the finite-dimensional Lagrange duality theorem may be derived in this more general case too (see [2]) with essentially the same Slater condition (existence of some $v \in C$ such that $h_m(v) < 0$ for any m). We can also introduce a finite set of linear inequalities: this case can be handled in exactly the same way as in the finite-dimensional case. However, the hypothesis $\mathrm{ri}(C) \ne \emptyset$ is not completely trivial here.

Fenchel duality too can be much generalized. Indeed, let V be a Banach space, $W = V^*$ its dual space (the Banach space of continuous linear forms on V), and denote by $\langle v, v^* \rangle$ the action of $v^* \in V^*$ on $v \in V$. Consider the primal problem

$$(P): \ \min \{f_1(v) - f_2(v)\} \ \text{ sub } \ v \in A = C_1 \cap C_2 \subset V \tag{30}$$

where $C_1, C_2 \subseteq V$ are convex sets, $f_1$ is convex on $C_1$, and $f_2$ is concave on $C_2$. Then, again by mimicking the finite-dimensional case, we associate the primal with the dual

$$(D): \ \max \{f_2^*(v^*) - f_1^*(v^*)\} \ \text{ sub } \ v^* \in B = C_1^* \cap C_2^* \subset V^* \tag{31}$$

where

$$f_1^*(v^*) = \sup_{v \in C_1} \{\langle v, v^* \rangle - f_1(v)\} \quad \text{and} \quad f_2^*(v^*) = \inf_{v \in C_2} \{\langle v, v^* \rangle - f_2(v)\} \tag{32}$$

are the convex and concave conjugates of $f_1$ and $f_2$, respectively, and $C_1^*$ and $C_2^*$ are their domains. Then, with obvious formal modifications, the Fenchel duality theorem holds in this case, too (see again [2]). However, to obtain strong duality, we must supplement conditions (a) or (b) with the following

• Either $\{(v, a) \in V \times \mathbb{R} : f_1(v) \le a\}$ or $\{(v, a) \in V \times \mathbb{R} : f_2(v) \ge a\}$ has a nonempty interior.

This latter condition, which, in the finite-dimensional setting, follows from (a) or (b), must be checked separately in the present case.

References

[1] Bertsekas, D.P. (1995). Nonlinear Programming, Athena Scientific, Belmont.
[2] Luenberger, D.G. (1969). Optimization by Vector Space Methods, Wiley, New York.
[3] Magnanti, T.L. (1974). Fenchel and Lagrange duality are equivalent, Mathematical Programming 7, 253–258.
[4] Rockafellar, R.T. (1970). Convex Analysis, Princeton University Press, Princeton.

Related Articles

Capital Asset Pricing Model; Expected Utility Maximization; Expected Utility Maximization: Duality Methods; Minimal Entropy Martingale Measure; Model Calibration; Optimization Methods; Risk–Return Analysis; Robust Portfolio Optimization; Stochastic Control; Utility Function; Utility Indifference Valuation.

GIACOMO SCANDOLO
Squared Bessel Processes

Squares of Bessel processes enjoy both an additivity property and a scaling property, which are, arguably, the main reasons why these processes occur naturally in a number of Brownian, or linear diffusion, studies. This survey is written in a minimalist manner; the aim is to refer the reader to a few references where many facts and formulae are discussed in detail.

Squared Bessel (BESQ) Processes

A squared Bessel (BESQ) process $(X_t^{(x,\delta)}, t \ge 0)$ may be defined (in law) as the solution of the stochastic differential equation:

$$X_t = x + 2 \int_0^t \sqrt{X_s}\, d\beta_s + \delta t, \quad X_t \ge 0 \tag{1}$$

where x is the starting value: $X_0 = x$, δ is the so-called dimension of X, and $(\beta_s)_{s \ge 0}$ is standard Brownian motion. For any integer dimension δ, $(X_t, t \ge 0)$ may be obtained as the square of the Euclidean norm of δ-dimensional Brownian motion.

The general theory of stochastic differential equations (SDEs) ensures that equation (1) enjoys pathwise uniqueness, hence uniqueness in law, and consequently the strong Markov property. Denoting by $Q_x^{\delta}$ the law of $(X_t)_{t \ge 0}$, solution of equation (1), on the canonical space $C_+ \equiv C(\mathbb{R}_+, \mathbb{R}_+)$, where $(Z_u, u \ge 0)$ is taken as the coordinate process, there is the convolution property:

$$Q_x^{\delta} * Q_{x'}^{\delta'} = Q_{x+x'}^{\delta+\delta'} \tag{2}$$

which holds for all $x, x', \delta, \delta' \ge 0$ ([7]); in other terms, adding two independent BESQ processes yields another BESQ process, whose starting point, respectively dimension, is the sum of the starting points, respectively dimensions.

It follows from equation (2) that, for any positive measure $\mu(du)$ on $\mathbb{R}_+$ such that $\int \mu(du)(1 + u) < \infty$, if $I_\mu = \int \mu(du) Z_u$, then

$$Q_x^{\delta}\left(\exp\left(-\tfrac{1}{2} I_\mu\right)\right) = (A_\mu)^{\delta} (B_\mu)^{x} \tag{3}$$

with $A_\mu = (\phi_\mu(\infty))^{1/2}$; $B_\mu = \exp\left(\tfrac{1}{2} \phi_\mu'(0+)\right)$ for $\phi_\mu$, the unique decreasing solution of the Sturm–Liouville equation: $\phi'' = \phi \mu$; $\phi(0) = 1$.

Equation (3) may be considered as the (generalized) Laplace transform (with argument µ) of the probability $Q_x^{\delta}$, while, as $Q_x^{\delta}$ is infinitely divisible for any fixed δ and x, the next formula is the Lévy–Khintchine representation of $Q_x^{\delta}$:

$$Q_x^{\delta}\left(\exp\left(-\tfrac{1}{2} I_\mu\right)\right) = \exp\left(-\int_{C_+} M_{x,\delta}(dz) \left(1 - e^{-\frac{1}{2} I_\mu(z)}\right)\right) \tag{4}$$

where $M_{x,\delta} = xM + \delta N$, for M and N two σ-finite measures on $C_+$, which are described in detail in, for example, [5] and [6].

Brownian Local Times and BESQ Processes

The Ray–Knight theorems for Brownian local times $(L_t^y; y \in \mathbb{R}, t \ge 0)$ express the laws of $(L_T^y; y \in \mathbb{R})$ for some very particular stopping times in terms of certain $Q_x^{\delta}$'s, namely,

1. if $T = T_a$ is the first hitting time of a by Brownian motion, then $Z_y^{(a)} \equiv L_{T_a}^{a-y}$, $y \ge 0$, satisfies the following:

$$Z_y = 2 \int_0^y \sqrt{Z_z}\, d\beta_z + 2 (y \wedge a) \tag{5}$$

2. if $T = \tau_\ell$ is the first time $(L_t^0, t \ge 0)$, the Brownian local time at level 0, reaches $\ell$, then $(L_{\tau_\ell}^{y}, y \ge 0)$ and $(L_{\tau_\ell}^{-y}, y \ge 0)$ are two independent BESQ processes, distributed as $Q_\ell^0$.

An Implicit Representation in Terms of Geometric Brownian Motions

Lamperti [3] showed a one-to-one correspondence between Lévy processes $(\xi_t, t \ge 0)$ and semistable Markov processes $(\Sigma_u, u \ge 0)$ via the (implicit) formula:

$$\exp(\xi_t) = \Sigma_{\int_0^t ds\, \exp(\xi_s)}, \quad t \ge 0 \tag{6}$$
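The additivity property (2) and the product form in (3) can be checked by simulation in the integer-dimension case, where $X_t$ is the squared Euclidean norm of a δ-dimensional Brownian motion started at a point of squared norm x. The following is a minimal sketch under stated assumptions, not a method taken from the article: the function names and parameter values are hypothetical, and the measure is taken to be the point mass $\mu = 2\lambda\,\varepsilon_t$, so that $\tfrac{1}{2} I_\mu = \lambda Z_t$. For this choice the Sturm–Liouville equation can be solved by hand (φ is linear on [0, t] and constant afterwards), giving $A_\mu = (1 + 2\lambda t)^{-1/2}$ and $B_\mu = \exp(-\lambda / (1 + 2\lambda t))$.

```python
import math
import random


def sample_besq(x, delta, t, rng):
    """Exact draw of X_t for a BESQ process started at x, integer dimension delta.

    Uses X_t = |a + B_t|^2 for a delta-dimensional Brownian motion B and a
    starting vector a with |a|^2 = x (here a = (sqrt(x), 0, ..., 0)).
    """
    total = 0.0
    for i in range(delta):
        start = math.sqrt(x) if i == 0 else 0.0
        total += (start + math.sqrt(t) * rng.gauss(0.0, 1.0)) ** 2
    return total


def laplace_closed_form(x, delta, t, lam):
    """E[exp(-lam * X_t)] = A**delta * B**x for the point mass mu = 2*lam*eps_t."""
    a_mu = (1.0 + 2.0 * lam * t) ** -0.5
    b_mu = math.exp(-lam / (1.0 + 2.0 * lam * t))
    return a_mu ** delta * b_mu ** x


def laplace_monte_carlo(x, delta, t, lam, n, rng):
    """Monte Carlo estimate of E[exp(-lam * X_t)] from n exact draws."""
    return sum(math.exp(-lam * sample_besq(x, delta, t, rng)) for _ in range(n)) / n


rng = random.Random(0)          # fixed seed so the run is reproducible
t, lam, n = 1.0, 0.5, 50_000    # illustrative values only

# Product form (3): Monte Carlo estimate against the closed form.
mc = laplace_monte_carlo(1.0, 3, t, lam, n, rng)
exact = laplace_closed_form(1.0, 3, t, lam)

# Additivity (2): independent BESQ(x=1, delta=1) + BESQ(x=2, delta=2) draws
# should have the Laplace transform of BESQ(x=3, delta=3).
mc_sum = sum(
    math.exp(-lam * (sample_besq(1.0, 1, t, rng) + sample_besq(2.0, 2, t, rng)))
    for _ in range(n)
) / n
exact_sum = laplace_closed_form(3.0, 3, t, lam)

print(f"product form: MC={mc:.4f}  closed form={exact:.4f}")
print(f"additivity:   MC={mc_sum:.4f}  closed form={exact_sum:.4f}")
```

The exact sampler is only available for integer δ; for general δ ≥ 0 one would have to discretize equation (1) or sample a noncentral chi-square law instead, which this sketch does not attempt.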
In the particular case where $\xi_t = 2(B_t + \nu t)$, $t \ge 0$, formula (6) becomes

$$\exp(2(B_t + \nu t)) = X^{(1,\delta)}_{\int_0^t ds\, \exp(2(B_s + \nu s))} \tag{7}$$

where, in agreement with our notation, $(X_u^{(1,\delta)}, u \ge 0)$ denotes a BESQ process starting from 1 with dimension $\delta = 2(1 + \nu)$. We note that in equation (7), δ may be negative, that is, $\nu < -1$; however, formula (7) reveals $(X_u^{(1,\delta)})$ only for $u \le T_0(X^{(1,\delta)})$, the first hitting time of 0 by $(X^{(1,\delta)})$. Nonetheless, the study of BESQ$^\delta$, for any $\delta \in \mathbb{R}$, has been developed in [1].

Absolute continuity relationships between the laws of different BESQ processes may be derived from equation (7), combined with the Cameron–Martin relationship between the laws of $(B_t + \nu t, t \ge 0)$ and $(B_t, t \ge 0)$.

Precisely, one thus obtains, for $\delta \ge 2$:

$$Q_x^{\delta}\big|_{\mathcal{Z}_u} = \left(\frac{Z_u}{x}\right)^{\nu/2} \exp\left(-\frac{\nu^2}{2} \int_0^u \frac{ds}{Z_s}\right) \bullet Q_x^{2}\big|_{\mathcal{Z}_u} \tag{8}$$

where $\mathcal{Z}_u \equiv \sigma\{Z_s, s \le u\}$, and $\nu = \frac{\delta}{2} - 1$. The combination of equations (7) and (8) may be used to derive results about $(B_t + \nu t, t \ge 0)$ from results about $X^{x,\delta}$ (and vice versa). In particular, the law of

$$A_{T_\lambda}^{(\nu)} := \int_0^{T_\lambda} ds\, \exp(2(B_s + \nu s)) \tag{9}$$

where $T_\lambda$ denotes an independent exponential time, was derived in ([8], Paper 2) from this combination.

Some Explicit Formulae for BESQ Functionals

Formula (3), when µ is replaced by λµ, for any scalar $\lambda \ge 0$, yields the explicit Laplace transform of $I_\mu$, provided the function $\phi_{\lambda\mu}$ is known explicitly, which is the case for $\mu(dt) = a t^{\alpha} 1_{(t \le A)}\, dt + b\, \varepsilon_A(dt)$ and many other examples.

Consequently, the semigroup of BESQ may be expressed explicitly in terms of Bessel functions, as well as the Laplace transforms of first hitting times (see, for example, [2]) and distributions of last passage times (see, for example, [4]). Chapter XI of [6] is entirely devoted to Bessel processes.

References

[1] Göing-Jaeschke, A. & Yor, M. (2003). A survey and some generalizations of Bessel processes, Bernoulli 9(2), 313–350.
[2] Kent, J. (1978). Some probabilistic properties of Bessel functions, The Annals of Probability 6, 760–770.
[3] Lamperti, J. (1972). Semi-stable Markov processes, Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 22, 205–225.
[4] Pitman, J. & Yor, M. (1981). Bessel processes and infinitely divisible laws, in Stochastic Integrals, D. Williams, ed., LNM 851, Springer, pp. 285–370.
[5] Pitman, J. & Yor, M. (1982). A decomposition of Bessel Bridges, Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 59, 425–457.
[6] Revuz, D. & Yor, M. (1999). Continuous Martingales and Brownian Motion, 3rd Edition, Springer.
[7] Shiga, T. & Watanabe, S. (1973). Bessel diffusions as a one-parameter family of diffusion processes, Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 27, 37–46.
[8] Yor, M. (2001). Exponential Functionals of Brownian Motion and Related Processes, Springer-Finance.

Related Articles

Affine Models; Cox–Ingersoll–Ross (CIR) Model; Heston Model; Simulation of Square-root Processes.

MARC J. YOR
Semimartingale

Semimartingales form an important class of processes in probability theory, especially in the theory of stochastic integration and its applications. They serve as natural models for asset pricing, since under no-arbitrage assumptions a price process must be a semimartingale [1, 3].

Let $(\Omega, \mathcal{F}, \mathbb{F} = (\mathcal{F}_t)_{t \ge 0}, P)$ be a complete probability space that satisfies the usual assumptions (i.e., $\mathcal{F}_0$ contains all P-null sets of $\mathcal{F}$ and the filtration $\mathbb{F}$ is right continuous). A càdlàg, adapted process X is called a semimartingale if it admits a decomposition

$$X_t = X_0 + A_t + M_t \tag{1}$$

where $X_0$ is $\mathcal{F}_0$-measurable, A is a process with finite variation, M is a local martingale, and $A_0 = M_0 = 0$. If, moreover, A is predictable (i.e., measurable with respect to the σ-algebra generated by all left-continuous processes), X is called a special semimartingale. In this case, the decomposition (1) is unique and we call it the canonical decomposition. Clearly, the set of all semimartingales is a vector space.

For any $a > 0$, a semimartingale X can be further decomposed as

$$X_t = X_0 + A_t + D_t + N_t \tag{2}$$

where D and N are local martingales such that D is a process with finite variation and the jumps of N are bounded by 2a (see [6] p. 126).

Alternatively, semimartingales can be defined as a class of "good integrators". Let S be the collection of all simple predictable processes equipped with the uniform convergence in (t, ω). A process H is called simple predictable if it has the representation

$$H_t = H_0 1_{\{0\}}(t) + \sum_{i=1}^{n} H_i 1_{(T_i, T_{i+1}]}(t) \tag{3}$$

where $0 = T_1 \le \cdots \le T_{n+1} < \infty$ are stopping times, $H_i$ are $\mathcal{F}_{T_i}$-measurable and $|H_i| < \infty$ almost surely. Let $L^0$ be the space of (finite-valued) random variables topologized by convergence in probability. For a given process X, we define a linear mapping (stochastic integral) $I_X : S \to L^0$ by

$$I_X(H) = H_0 X_0 + \sum_{i=1}^{n} H_i (X_{T_{i+1}} - X_{T_i}) \tag{4}$$

A process X is defined to be a semimartingale if it is càdlàg, adapted, and the mapping $I_X : S \to L^0$ is continuous. Such processes are "good integrators", because they satisfy the following bounded convergence theorem: the uniform convergence of $H^n$ to H (in S) implies the convergence in probability of $I_X(H^n)$ to $I_X(H)$. As a consequence, when X is a semimartingale, the domain of the stochastic integral $I_X$ can be extended to the space of all predictable processes H (see Stochastic Integrals).

Indeed, these two definitions are equivalent. This result is known as the Bichteler–Dellacherie theorem [2, 4].

Examples

• Càdlàg adapted processes with finite variation are semimartingales.
• All càdlàg, adapted martingales, submartingales, and supermartingales are semimartingales.
• Brownian motion is a continuous martingale. Hence, it is a semimartingale.
• Lévy processes are semimartingales.
• Itô diffusions of the form

$$X_t = X_0 + \int_0^t a_s\, ds + \int_0^t \sigma_s\, dW_s \tag{5}$$

where W is a Brownian motion, are (continuous) semimartingales. In particular, solutions of stochastic differential equations of the type $dX_t = a(t, X_t)\,dt + \sigma(t, X_t)\,dW_t$ are semimartingales.

Quadratic Variation of Semimartingales

Quadratic variation is an important characteristic of a semimartingale. It is also one of the crucial objects in financial econometrics as it serves as a measure of the variability of a price process.

Let X, Y be semimartingales. The quadratic variation process $[X, X] = ([X, X]_t)_{t \ge 0}$ is given as

$$[X, X]_t = X_t^2 - X_0^2 - 2 \int_0^t X_{s-}\, dX_s \tag{6}$$
where $X_{s-} = \lim_{u < s,\, u \to s} X_u$ ($X_{0-} = X_0$). The quadratic covariation of X and Y is defined by

$$[X, Y]_t = X_t Y_t - X_0 Y_0 - \int_0^t X_{s-}\, dY_s - \int_0^t Y_{s-}\, dX_s \tag{7}$$

which is also known as the integration by parts formula (see [5] p. 51). Obviously, the operator $(X, Y) \to [X, Y]$ is symmetric and bilinear. We therefore have the polarization identity

$$[X, Y] = \frac{1}{2} \left([X + Y, X + Y] - [X, X] - [Y, Y]\right) \tag{8}$$

The quadratic (co-)variation process has the following properties:

1. $\Delta[X, Y] = \Delta X\, \Delta Y$ with $\Delta Z_s = Z_s - Z_{s-}$ ($\Delta Z_0 = 0$) for any càdlàg process Z.
2. [X, Y] has finite variation and [X, X] is an increasing process.
3. Let A, B be càglàd, adapted processes. Then it holds that

$$\left[\int A_s\, dX_s, \int B_s\, dY_s\right]_t = \int_0^t A_s B_s\, d[X, Y]_s \tag{9}$$

Furthermore, the quadratic variation process can be written as a sum of its continuous and discontinuous parts:

$$[X, X]_t = [X, X]_t^c + \sum_{0 \le s \le t} |\Delta X_s|^2 \tag{10}$$

where $[X, X]^c$ denotes the continuous part of [X, X]. A semimartingale X is called quadratic pure jump if $[X, X]^c = 0$.

For any subdivision $0 = t_0^n < \cdots < t_{k_n}^n = t$ with $\max_i |t_i^n - t_{i-1}^n| \to 0$, it holds that

$$\sum_{i=1}^{k_n} (X_{t_i^n} - X_{t_{i-1}^n})(Y_{t_i^n} - Y_{t_{i-1}^n}) \xrightarrow{\ p\ } [X, Y]_t \tag{11}$$

The latter suggests the realized variance as a natural consistent estimator of the quadratic variation (see Realized Volatility and Multipower Variation).

Stability Properties of Semimartingales

Semimartingales turn out to be invariant under change of measure. Indeed, if Q is a probability measure that is absolutely continuous with respect to P, then every P-semimartingale is a Q-semimartingale. When X is a P-semimartingale with decomposition (1) and P, Q are equivalent probability measures, then X is a Q-semimartingale with the decomposition $X_t = X_0 + \tilde{A}_t + \tilde{M}_t$, where

$$\tilde{M}_t = M_t - \int_0^t \frac{1}{Z_s}\, d[Z, M]_s \tag{12}$$

$Z_t = E_P\left[\frac{dQ}{dP} \,\middle|\, \mathcal{F}_t\right]$ and $\tilde{A}_t = X_t - X_0 - \tilde{M}_t$. The latter result is known as Girsanov's Theorem (see [6] p. 133).

Furthermore, semimartingales are stable under certain changes of filtration. Let X be a semimartingale for the filtration $\mathbb{F}$. If $\mathbb{G} \subset \mathbb{F}$ is a subfiltration and X is adapted to $\mathbb{G}$, then X is a semimartingale for $\mathbb{G}$ (Stricker's Theorem). Semimartingales are also invariant under certain enlargements of filtration. Let $\mathcal{A} \subset \mathcal{F}$ be a collection of events such that $A, B \in \mathcal{A}$, $A \ne B$, implies $A \cap B = \emptyset$. Let $\mathcal{H}_t$ be generated by $\mathcal{F}_t$ and $\mathcal{A}$. Then every $(\mathbb{F}, P)$-semimartingale is a $(\mathbb{H}, P)$-semimartingale (Jacod's Countable Expansion).

Itô's Formula

Semimartingales are stable under $C^2$-transformation. Let $X = (X^1, \ldots, X^d)$ be a d-dimensional semimartingale and $f : \mathbb{R}^d \to \mathbb{R}$ be a function with continuous second-order partial derivatives. Then f(X) is again a semimartingale and Itô's formula holds:

$$f(X_t) - f(X_0) = \sum_{i=1}^{d} \int_0^t \frac{\partial f}{\partial x_i}(X_{s-})\, dX_s^i + \frac{1}{2} \sum_{i,j=1}^{d} \int_0^t \frac{\partial^2 f}{\partial x_i \partial x_j}(X_{s-})\, d[X^i, X^j]_s^c + \sum_{0 \le s \le t} \left( f(X_s) - f(X_{s-}) - \sum_{i=1}^{d} \frac{\partial f}{\partial x_i}(X_{s-})\, \Delta X_s^i \right) \tag{13}$$
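The convergence in (11) and the integration by parts formula (7) can be illustrated numerically in the simplest continuous case. The sketch below is an illustration under assumptions that are not part of the article: X and Y are taken to be two correlated scaled Brownian motions sampled on an equidistant grid, so that $[X, X]_t = \sigma_x^2 t$ and $[X, Y]_t = \rho\, \sigma_x \sigma_y t$; all function names and parameter values are hypothetical.

```python
import math
import random


def realized_covariation(xs, ys):
    """Sum of products of increments of two sampled paths, as in (11)."""
    return sum(
        (xs[i] - xs[i - 1]) * (ys[i] - ys[i - 1]) for i in range(1, len(xs))
    )


def correlated_brownian_paths(t, n, sigma_x, sigma_y, rho, rng):
    """Sample X = sigma_x * W1 and Y = sigma_y * (rho * W1 + sqrt(1 - rho^2) * W2)
    on an equidistant grid with n steps over [0, t]."""
    dt = t / n
    xs, ys = [0.0], [0.0]
    for _ in range(n):
        dw1 = math.sqrt(dt) * rng.gauss(0.0, 1.0)
        dw2 = math.sqrt(dt) * rng.gauss(0.0, 1.0)
        xs.append(xs[-1] + sigma_x * dw1)
        ys.append(ys[-1] + sigma_y * (rho * dw1 + math.sqrt(1.0 - rho**2) * dw2))
    return xs, ys


rng = random.Random(1)                  # fixed seed so the run is reproducible
t, n = 1.0, 20_000                      # illustrative values only
sigma_x, sigma_y, rho = 0.2, 0.3, 0.5
xs, ys = correlated_brownian_paths(t, n, sigma_x, sigma_y, rho, rng)

rv_xx = realized_covariation(xs, xs)    # estimates [X, X]_t = sigma_x^2 * t
rv_xy = realized_covariation(xs, ys)    # estimates [X, Y]_t = rho * sigma_x * sigma_y * t

# Discrete analogue of integration by parts (7); it holds exactly on the grid:
# X_t Y_t - X_0 Y_0 = sum X_{i-1} dY_i + sum Y_{i-1} dX_i + sum dX_i dY_i.
int_x_dy = sum(xs[i - 1] * (ys[i] - ys[i - 1]) for i in range(1, len(xs)))
int_y_dx = sum(ys[i - 1] * (xs[i] - xs[i - 1]) for i in range(1, len(xs)))
by_parts = xs[-1] * ys[-1] - xs[0] * ys[0] - int_x_dy - int_y_dx

print(f"realized [X,X]_t = {rv_xx:.5f}  (theory {sigma_x**2 * t:.5f})")
print(f"realized [X,Y]_t = {rv_xy:.5f}  (theory {rho * sigma_x * sigma_y * t:.5f})")
print(f"by-parts residual = {by_parts - rv_xy:.2e}")
```

Evaluating the integrands at the left endpoint of each grid interval mirrors the left limits $X_{s-}$ and $Y_{s-}$ in (7), which is why the discrete identity holds exactly rather than only in the limit.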
One of the most interesting applications of Itô's formula is the so-called Doléans–Dade exponential (see Stochastic Exponential). Let X be a (one-dimensional) semimartingale with $X_0 = 0$. Then there exists a unique semimartingale Z that satisfies the equation $Z_t = 1 + \int_0^t Z_{s-}\, dX_s$. This solution is denoted by $\mathcal{E}(X)$ (the Doléans–Dade exponential) and is given by

$$\mathcal{E}(X)_t = \exp\left(X_t - \frac{1}{2} [X, X]_t\right) \prod_{0 \le s \le t} (1 + \Delta X_s) \exp\left(-\Delta X_s + \frac{1}{2} |\Delta X_s|^2\right) \tag{14}$$

Moreover, we obtain the identity $\mathcal{E}(X)\mathcal{E}(Y) = \mathcal{E}(X + Y + [X, Y])$.

An important example is $X_t = at + \sigma W_t$, where W denotes the Brownian motion and a, σ are constant. In this case, the continuous solution $\mathcal{E}(X)_t = \exp\left(\left(a - \frac{\sigma^2}{2}\right) t + \sigma W_t\right)$ is known as the Black–Scholes model.

References

[1] Back, K. (1991). Asset prices for general processes, Journal of Mathematical Economics 20(4), 371–395.
[2] Bichteler, K. (1981). Stochastic integration and Lp-theory of semimartingales, Annals of Probability 9, 49–89.
[3] Delbaen, F. & Schachermayer, W. (1994). A general version of the fundamental theorem of asset pricing, Mathematische Annalen 300, 463–520.
[4] Dellacherie, C. (1980). Un survol de la théorie de l'intégrale stochastique, Stochastic Processes and their Applications 10, 115–144.
[5] Jacod, J. & Shiryaev, A.N. (2003). Limit Theorems for Stochastic Processes, 2nd Edition, Springer-Verlag.
[6] Protter, P.E. (2005). Stochastic Integration and Differential Equations, 2nd Edition, Springer-Verlag.

Further Reading

Revuz, D. & Yor, M. (2005). Continuous Martingales and Brownian Motion, 3rd Edition, Springer-Verlag.

Related Articles

Doob–Meyer Decomposition; Equivalence of Probability Measures; Filtrations; Itô's Formula; Martingales; Poisson Process; Stochastic Exponential; Stochastic Integrals.

MARK PODOLSKIJ
Capital Asset Pricing Model

The 1990 Nobel Prize winner William Sharpe [49, 50] introduced one cornerstone of modern finance theory with his seminal capital asset pricing model (CAPM), for which Black [9], Lintner [35, 36], Mossin [43], and Treynor [54] proposed analogous and extended versions. He thereby proposed an answer to financial theory's question about the uncertainty surrounding any investment and any financial asset. Indeed, financial theory raised the question of how risk impacts the fixing of asset prices in the financial market (see Modern Portfolio Theory), and William Sharpe proposed an explanation of the link prevailing between risky asset prices and market equilibrium. The CAPM therefore proposes a characterization of the link between the risk and return of financial assets, on the one hand, and market equilibrium, on the other. This fundamental relationship establishes that the expected excess return of a given risky asset (see Expectations Hypothesis; Risk Premia) corresponds to the expected market risk premium (i.e., market price of risk) times a constant parameter called beta (i.e., a proportionality constant). The beta is a measure of the asset's relative risk and represents the asset price's propensity to move with the market. Indeed, the beta assesses the extent to which the asset's price simultaneously follows the market trend. Namely, the CAPM explains that, on average, the unique source of risk impacting the returns of risky assets comes from the broad financial market to which all the risky assets belong and on which they are all traded. The main result is that the global risk of a given financial asset can be split into two distinct components, namely, a market-based component and a specific component. This specific component vanishes within well-diversified portfolios so that their global risk reduces to the broad market influence.

Framework and Risk Typology

The CAPM provides a foundation for the theory of market equilibrium, which relies on both the utility theory (see Utility Theory: Historical Perspectives) and the portfolio selection theory (see Markowitz, Harry). The main focus consists of analyzing and understanding the behaviors and transactions of market participants on the financial market. Under this setting, market participants are assumed to act simultaneously so that they can invest their money in only two asset classes, namely, risky assets, which are contingent claims, and nonrisky assets such as the risk-free asset. The confrontation between the supply and demand of financial assets in the market allows, therefore, for establishing an equilibrium price (for each traded asset) once the supply of financial assets satisfies the demand of financial assets. The uncertainty surrounding contingent claims is such that the general equilibrium theory explains risky asset prices by the equality between the supply and demand of financial assets. Under this setting, Sharpe [49, 50] assumes that the returns of contingent claims depend on each other only due to a unique exogenous market factor called the market portfolio. The other potential impacting factors are assumed to be random.

Hence, the CAPM results immediately from the Markowitz [37, 38] setting since it represents an equilibrium model of financial asset prices (see Markowitz, Harry). Basically, market participants hold portfolios, which are composed of the risk-free asset and the market portfolio (representing the set of all traded risky assets). The market portfolio is moreover a mean–variance efficient portfolio, which is optimally diversified and satisfies equilibrium conditions (see Efficient Markets Theory: Historical Perspectives; Efficient Market Hypothesis; Risk–Return Analysis). Consequently, holding a risky asset such as a stock is equivalent to holding a combination of the risk-free asset and the market portfolio, the market portfolio being the unique market factor.

The Capital Asset Pricing Model

Specifically, Sharpe [49, 50] describes the uncertainty underlying contingent claims with a one-factor model, the CAPM. The CAPM illustrates the establishment of financial asset prices under uncertainty and under market equilibrium. Such equilibrium is partial and takes place under a set of restrictive assumptions.

Assumptions

1. Markets are perfect and without frictions: no tax, no transaction costs (see Transaction Costs),
and no possibility of manipulating asset prices in the market (i.e., perfect market competition).
2. Information is instantaneously and perfectly available in the market so that investors simultaneously access the same information set without any cost.
3. Market participants invest over one time period so that we consider a one-period model setting.
4. Financial assets are infinitely divisible and liquid.
5. Lending and borrowing processes apply the risk-free rate (same rate of interest), and there is no short sale constraint.
6. Asset returns are normally distributed so that expected returns and corresponding standard deviations are sufficient to describe the assets' behaviors (i.e., their probability distributions). The Gaussian distribution assumption is equivalent to a quadratic utility setting.
7. Investors are risk averse and rational. Moreover, they seek to maximize the expected utility of their future wealth/of the future value of their investment/portfolio (see Expected Utility Maximization: Duality Methods; Expected Utility Maximization; and the two-fund separation theorem of Tobin [52]).
8. Investors build homogeneous expectations about the future variation of interest rates. All the investors build the same forecasts about the expected returns and the variance–covariance matrix of stock returns. Therefore, there is a unique set of optimal portfolios. Basically, investors share the same opportunity sets, which means they consider the same sets of accessible and "interesting" portfolios.
9. The combination of two distinct and independent risk factors drives the evolution of any risky return over time, namely, the broad financial market and the fundamental/specific features of the asset under consideration. Basically, the risk level embedded in asset returns results from the trade-off between a market risk factor and an idiosyncratic risk factor.

The market risk factor is also called systematic risk factor and nondiversifiable risk factor. It represents a risk factor, which is common to any traded financial asset. Specifically, the market risk factor represents the global evolution of the financial market and the economy (i.e., trend of the broad market, business cycle), and impacts any risky asset. Indeed, it characterizes the systematic fluctuations in asset prices, which result from the broad market. In a complementary way, the specific risk factor is also called idiosyncratic risk factor, unsystematic risk factor, or diversifiable risk factor. It represents a component, which is peculiar to each financial asset or to each financial asset class (e.g., small or large caps). This specific component in asset prices has no link with the broad market. Moreover, the systematic risk factor is priced by the market, whereas the idiosyncratic risk factor is not priced by the market. Specifically, market participants ascribe a nonzero expected return to the market risk factor, whereas they ascribe a zero expected return to the specific risk factor. This feature results from the fact that the idiosyncratic risk can easily be mitigated within a well-diversified portfolio, namely, a portfolio with a sufficient number of heterogeneous risky assets so that their respective idiosyncratic risks cancel each other. Thus, a diversified portfolio's global risk (i.e., total variance) results only from the market risk (i.e., systematic risk).

CAPM equation

Under the previous assumptions, the CAPM establishes a linear relationship between a portfolio's expected risk premium and the expected market risk premium as follows:

$$E[R_P] = r_f + \beta_P \times \left(E[R_M] - r_f\right) \tag{1}$$

where $R_M$ is the return of the market portfolio; $R_P$ is the return of portfolio P (which may also correspond to a given stock i); $r_f$ is the risk-free interest rate; $\beta_P$ is the beta of portfolio P; and $E[R_M] - r_f$ is the market price of risk. The market portfolio M is composed of all the available and traded assets in the market. The weights of the market portfolio's components are proportional to their corresponding market capitalization relative to the global broad market capitalization. Therefore, the market portfolio is representative of the broad market evolution and its related systematic risk. Finally, $\beta_P$ is a systematic risk measure also called Sharpe coefficient since it quantifies the sensitivity of portfolio P or stock i to the broad market. Basically, the portfolio's beta is written as

$$\beta_P = \frac{\mathrm{Cov}(R_P, R_M)}{\mathrm{Var}(R_M)} = \frac{\sigma_{PM}}{\sigma_M^2} \tag{2}$$

where $\mathrm{Cov}(R_P, R_M) = \sigma_{PM}$ is the covariance between the portfolio's return and the market return,

[Figure 1: Security market line. Horizontal axis: portfolio’s systematic risk β, with β_M = 1 and β_P marked. Vertical axis: expected return E[R], with r_f, E[R_M], and E[R_P] marked. Annotations: time price = risk-free rate; market price of risk; risk premium = systematic risk times market price of risk.]

and $\mathrm{Var}(R_M) = \sigma_M^2$ is the market return’s variance over the investment period. In other words, beta is the risk of covariation between the portfolio’s and the market’s returns normalized by the market return’s variance. Therefore, beta is a relative risk measure. Under the Gaussian return assumption, the standard deviation, or equivalently the variance, is an appropriate risk metric for measuring the dispersion risk of asset returns.

Therefore, under equilibrium, the portfolio’s expected return $E[R_P]$ equals the risk-free rate increased by a risk premium. The risk premium is a linear function of the systematic risk measure as represented by the beta and the market price of risk as represented by the expected market risk premium. Such a relationship is qualified as the security market line (SML; see Figure 1). Since idiosyncratic risk can be diversified away, only the systematic risk component in asset returns matters.^a Intuitively, diversified portfolios cannot get rid of their respective dependency on the broad market. From a portfolio management perspective, the CAPM relationship then focuses mainly on diversified portfolios, namely, portfolios or stocks with no idiosyncratic risk.

It then becomes useless to keep any idiosyncratic risk in a given portfolio since such a risk is not priced by the market. The beta parameter subsequently becomes the only means to control the portfolio’s risk since the CAPM relationship (1) establishes the premium investors require to bear the portfolio’s systematic risk. Indeed, the higher the dependency on the broad financial market is, the greater the risk premium required by investors becomes. Consequently, the beta parameter allows investors to classify assets as a function of their respective systematic risk level (see Table 1).

Assets with negative beta values are usually specific commodity securities such as gold-linked assets. Moreover, risk-free securities such as cash or Treasury bills, Treasury bonds, or Treasury notes belong to the zero-beta asset class. Risk-free securities are independent from the broad market and exhibit a zero variance, or equivalently a zero standard deviation. However, the class of zero-beta securities also includes risky assets, namely, assets with a nonzero variance, which are not correlated with the market.

Table 1 Systematic risk classification

Beta level                 Classification
β > 1                      Offensive, cyclical asset amplifying market variations
0 < β < 1                  Defensive asset absorbing market variations
β = 1                      Market portfolio or asset mimicking market variations
β = 0                      Asset with no market dependency
β lies between −1 and 1    Asset with low systematic risk level
|β| lies above 1           Asset with a higher risk level than the broad market’s risk
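As a minimal sketch of how equations (1) and (2) and the first rows of Table 1 might be applied to data, the following Python fragment estimates a beta from sample covariances and plugs it into the CAPM relationship. The function names, the tolerance, the return series, and the rates are illustrative assumptions and do not come from the article; the label for negative betas is likewise an assumption, since Table 1 has no dedicated row for them.

```python
import numpy as np

def beta(asset_returns, market_returns):
    """Equation (2): sample Cov(R_P, R_M) divided by sample Var(R_M)."""
    r_p = np.asarray(asset_returns, dtype=float)
    r_m = np.asarray(market_returns, dtype=float)
    cov = np.cov(r_p, r_m, ddof=1)          # 2 x 2 sample covariance matrix
    return cov[0, 1] / cov[1, 1]

def capm_expected_return(beta_p, risk_free, expected_market_return):
    """Equation (1): risk-free rate plus beta times the market risk premium."""
    return risk_free + beta_p * (expected_market_return - risk_free)

def classify(beta_p, tol=1e-8):
    """First four rows of Table 1 (tol is an assumed numerical tolerance)."""
    if abs(beta_p) <= tol:
        return "asset with no market dependency"
    if abs(beta_p - 1.0) <= tol:
        return "market portfolio or asset mimicking market variations"
    if beta_p > 1.0:
        return "offensive, cyclical asset"
    if beta_p > 0.0:
        return "defensive asset"
    return "negative-beta asset"            # assumed label, not a Table 1 row

# Hypothetical return series: the asset moves 1.5 times as much as the market.
market = np.array([0.02, -0.01, 0.03, 0.00, 0.01])
asset = 0.001 + 1.5 * market

b = beta(asset, market)
expected = capm_expected_return(b, risk_free=0.02, expected_market_return=0.08)
label = classify(b)
```

Because the hypothetical asset series is an exact linear function of the market series, the estimated beta recovers the slope used to build it; with real data the estimate would depend on the chosen index, window, and sampling frequency.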

Estimation and Usefulness

The CAPM theory gives a partial equilibrium relationship, which is assumed to be stable over time. However, how can we estimate such a linear relationship in practice and how do we estimate a portfolio’s beta? How useful is this theory to market participants and investors?

Empirical Estimation

As a first point, under the Gaussian return assumption, beta coefficients can be computed while considering the covariance and variance of asset returns over the one-period investment horizon (see equation (2)). However, this way of computing beta coefficients does not work in a non-Gaussian world. Moreover, beta estimates depend on the selected market index, the studied time window, and the frequency of historical data [8].

As a second point, empirical estimations of the CAPM consider historical data and select a stock market index as a proxy for the CAPM market portfolio. Basically, the CAPM is tested while running two possible types of regressions based on observed asset returns (i.e., past historical data). Therefore, stocks’ and portfolios’ betas are estimated by regressing past asset returns on past market portfolio returns. We therefore focus on the potential existence of a linear relationship between stock/asset returns and market returns. The first possible estimation method corresponds to the market model regression as follows:

$$R_{it} - r_f = \alpha_i + \beta_i \times (R_{Mt} - r_f) + \varepsilon_{it} \tag{3}$$

where $R_{it}$ is the return of asset $i$ at time $t$; $R_{Mt}$ is the market portfolio’s return at time $t$, namely, the systematic risk factor as represented by the chosen market benchmark, which is the unique explanatory factor; $r_f$ is the short-term risk-free rate; $\varepsilon_{it}$ is a Gaussian white noise with a zero expectation and a constant variance $\sigma_{\varepsilon_i}^2$; $\alpha_i$ is a constant trend coefficient; and the slope coefficient $\beta_i$ is simply the beta of asset $i$. The trend coefficient $\alpha_i$ measures the distance of the asset’s average return to the security market line, namely, the propensity of asset $i$ to overperform (i.e., $\alpha_i > 0$) or to underperform (i.e., $\alpha_i < 0$) the broad market. In other words, $\alpha_i$ is the difference between the average return observed on past history and the expected return forecast provided by the security market line. The error term $\varepsilon_{it}$ represents the diversifiable/idiosyncratic risk factor describing the return of asset $i$. Therefore, $R_{Mt}$ and $\varepsilon_{it}$ are assumed to be independent, whereas the $(\varepsilon_{it})$ are supposed to be mutually independent. Regression equation (3) is simply the ex-post form of the CAPM relationship, namely, the application of CAPM to past observed data [27].

The second method for estimating CAPM betas is the characteristic line, for which we consider the following regression:

$$R_{it} = a_i + b_i \times R_{Mt} + \varepsilon_{it} \tag{4}$$

where $a_i$ and $b_i$ are constant trend and slope regression coefficients, respectively [51]. Moreover, such coefficients have to satisfy the following constraints:

$$\alpha_i = a_i - (1 - b_i) \times r_f \tag{5}$$

$$\beta_i = b_i \tag{6}$$

Regression equations (3) and (4) are only valid under the strong assumptions that the $\alpha_i$ and $\beta_i$ coefficients are stationary over time (i.e., time stability), and that each regression equation is a valid model over each one-period investment horizon.

In practice, the market model (3) is estimated over a two-year window of weekly data, whereas the characteristic line (4) is estimated over a five-year window of monthly data. Basically, the market model and the characteristic line use, as a market proxy, well-chosen stock market indexes such as the NYSE index and the S&P500 index, respectively, which are adapted to the frequency of the historical data under consideration.

Practical Use

A sound estimation process is very important insofar as the CAPM relationship intends to satisfy investors’ needs. From this viewpoint, the main goal of CAPM estimation is first to use past-history beta estimates to forecast future betas, that is, to extract information from past history in order to predict future betas. However, extrapolating past beta estimates to build future beta values may generate estimation errors resulting from outliers due to firm-specific events or structural changes either in the broad market or in the firm [10].

Second, the CAPM is a benchmark tool that supports investors’ decisions. Specifically, the SML is used to identify undervalued (i.e., above SML) and overvalued (i.e., below SML) stocks under a fundamental

analysis setting. Indeed, investors compare observed stock returns with CAPM required returns and then assess the performance of the securities under consideration. Therefore, the CAPM relationship provides investors with a tool for investment decisions and trading strategies since it provides buy and sell signals, and drives asset allocation across different asset classes.

Third, the CAPM allows for building classical performance measures such as the Sharpe ratio (see Sharpe Ratio), the Treynor index, or Jensen’s alpha (see Style Analysis; Performance Measures). Finally, the CAPM theory can be transposed to firm valuation insofar as the equilibrium value of the firm is the discounted value of its future expected cash flows. The discounting factor is just mitigated by one identified risk factor affecting equity [20, 29, 30, 47]. According to the theorem proposed by Modigliani and Miller [40–42] (see Modigliani–Miller Theorem), the cost of equity capital for an indebted firm corresponds to the risk-free rate increased by an operating risk premium (independent from the firm’s debt) times a leverage-specific factor. The firm’s risk is therefore measured by the beta of its equity (i.e., equity’s systematic risk), which also depends on the beta of the firm’s assets and on the firm’s leverage. Indeed, the leverage increases the beta of equity in a perfect market and therefore increases the firm’s risk, which represents the probability of facing a default situation. However, an optimal capital structure may result from market imperfections such as taxes, agency costs, bankruptcy costs, and information asymmetry among others. For example, there exists a trade-off between the costs incurred by a financial distress (i.e., default) and the potential tax benefits inferred from leverage (i.e., debt). Consequently, applying the CAPM to establish the cost of capital allows for budget planning and capital budgeting insofar as choosing an intelligent debt level allows for maximizing the firm value. Namely, there exists an optimal capital structure.

Limitations and Model Extensions

The CAPM is only valid under its strong seminal assumptions and exhibits a range of shortcomings as reported by Banz [6], for example. However, in practice and in the real financial world, many of these assumptions are violated. As a result, the CAPM suffers from various estimation problems that impact its efficiency. Indeed, Campbell et al. [14] show the poor performance of CAPM over the 1990s investment period in the United States. Such a result has several possible explanations, among which are missing explanatory factors, heteroscedasticity or autocorrelation patterns in returns, and time-varying or nonstationary CAPM regression estimates. For example, heteroscedastic return features imply that the static estimation of the CAPM is flawed under the classic setting (e.g., ordinary least squares linear regression). One has, therefore, to use appropriate techniques while running the CAPM regression under heteroscedasticity or non-Gaussian stock returns (see [7], for example, and see also Generalized Method of Moments (GMM); GARCH Models).

General Violations

Basic CAPM assumptions are not satisfied in the market and engender a set of general violations. First, lending and borrowing rates of interest are different in practice. Generally speaking, it is more expensive to borrow money than to lend money in terms of interest rate level. Second, the risk-free rate is not constant over time but one can focus on its arithmetic mean over the one-period investment horizon. Moreover, the choice of the risk-free rate employed in the CAPM has to be balanced with the unit-holding period under consideration. Third, transaction costs are often observed on financial markets and constitute part of the brokers’ and dealers’ commissions. Fourth, the market benchmark as well as stock returns are often nonnormally distributed and skewed [44]. Indeed, asset returns are skewed, leptokurtic [55], and they exhibit volatility clusters (i.e., time-varying volatility) and long memory patterns [2, 45]. Moreover, the market portfolio is assumed to be composed of all the risky assets available on the financial market so as to represent the portfolio of all the traded securities. Therefore, the broad market proxy or market benchmark should encompass stocks, bonds, human capital, real estate assets, and foreign assets (see the critique of Roll [46]). Fifth, financial assets are not infinitely divisible so that only fixed amounts or proportions of shares, stocks, and other traded financial instruments can be bought or sold.

Finally, the static representation of CAPM is at odds with the dynamic investment decision process. This limitation gives birth to multiperiodic extensions of CAPM. Extensions are usually called intertemporal capital asset pricing models (ICAPMs),

and extend the CAPM framework to several unit-holding periods (see [11, 39]).

Trading, Information, and Preferences

Insider trading theory assumes that some market participants hold some private information. Specifically, information asymmetry prevails so that part of existing information is not available to all investors. Under such a setting, Easley and O’Hara [22] and Wang [56] show that the trade-off between public and private information affects any firm’s cost of capital as well as the related return required by investors. Namely, the existence of private information increases the return required by uninformed investors. Under information asymmetry, market participants indeed exchange information through observed trading prices [18]. Moreover, heterogeneity prevails across investors’ preferences. Namely, they exhibit different levels of risk tolerance, which drives their respective investments and behaviors in the financial market. Finally, homogeneous expectations are inconsistent with the symmetry in the motives of transaction underlying any given trade. For a transaction to take place, the buy side has to meet the sell side. Indeed, Anderson et al. [4] show that heterogeneous beliefs play a nonnegligible role in asset pricing.

Nonsynchronous Trading

Often, the market factor of risk and stocks are not traded at the same time on the financial market, specifically at the daily frequency level. This stylized fact engenders the so-called nonsynchronous trading problem. When the market portfolio is composed of highly liquid stocks, the nonsynchronism problem is reduced within the portfolio as compared to an individual stock. However, for less liquid stocks or less liquid financial markets, the previous stylized fact becomes an issue under the CAPM estimation setting. To bypass this problem, the asset pricing theory introduces one-lag systematic risk factor(s) as additional explanatory factor(s) to describe asset returns [13, 21, 48].

Missing Factors

The poor explanatory power of the CAPM setting [14] comes from the lack of information describing stock returns in the market among others. The broad market’s uncertainty is described by a unique risk factor: the market portfolio. Indeed, considering the market portfolio as the unique source of systematic risk, or equivalently as the unique systematic risk information source, is insufficient. To bypass this shortcoming, a wide academic literature proposes to add complementary factors to the CAPM in order to better forecast stock returns (see Arbitrage Pricing Theory; Predictability of Asset Prices; Factor Models). Those missing factors are often qualified as asset pricing anomalies [5, 24, 26, 31]. Namely, the absence of key explanatory factors generates misestimations in computed beta values. For example, Fama and French [25] propose to consider two additional factors, namely, the issuing firm’s size and book-to-market characteristics. Further, Carhart [16] proposes to add a fourth complementary factor called momentum. The stock momentum represents the significance of recent past stock returns on the current observed stock returns. Indeed, investors’ sentiment and preferences may explain expected returns to some extent. From this perspective, momentum is important since investors distinguish between poorly and highly performing stocks over a recent past history. More recently, Li [34] proposed two additional factors to the four previous ones, namely, the earnings-to-price ratio and the share turnover as a liquidity indicator. Indeed, Acharya and Pedersen [1], Brennan and Subrahmanyam [12], Chordia et al. [19], and Keene and Peterson [32] underlined the importance of liquidity as an explanatory factor in asset pricing. Basically, the trading activity impacts asset prices since the degree of transactions’ fluidity drives the continuity of observed asset prices. In other words, traded volumes impact market prices, and the impact’s magnitude depends on the nature of market participants [17].

Time-varying Betas

Some authors like Tofallis [53] questioned the soundness of CAPM while assessing and forecasting stock returns’ performance. Indeed, the CAPM relationship is assumed to remain stable over time insofar as it relies on constant beta estimates over each unit-holding period (i.e., reference time window). Such a process assumes implicitly that beta estimates remain stable in the near future so that ex-post beta estimates are good future risk indicators. However, time instability is a key feature of beta estimates. For example,

Gençay et al. [28] and Koutmos and Knif [33] support time-varying betas in CAPM estimation.

Moreover, CAPM-type asset pricing models often suffer from error-in-variables problems coupled with time-varying parameters features [15]. To solve such problems, authors like Amman and Verhofen [3], Ellis [23], and Wang [57] among others advocate using conditional versions of the CAPM. Moreover, Amman and Verhofen [3] and Wang [57] show the efficiency of conditional asset pricing models and exhibit the superior performance of the conditional CAPM setting as compared to other asset pricing models.

End Notes

a. Specifically, the systematic risk represents that part of returns’ global risk/variance, which is common to all traded assets, or equivalently, which results from the broad market’s influence.

References

[1] Acharya, V.V. & Pedersen, L.H. (2005). Asset pricing with liquidity risk, Journal of Financial Economics 77(2), 375–410.
[2] Adrian, T. & Rosenberg, J. (2008). Stock Returns and Volatility: Pricing the Short-run and Long-run Components of Market Risk, Staff Report No 254, Federal Reserve Bank of New York.
[3] Amman, M. & Verhofen, M. (2008). Testing conditional asset pricing models using a Markov chain Monte Carlo approach, European Financial Management 14(3), 391–418.
[4] Anderson, E.W., Ghysels, E. & Juergens, J.L. (2005). Do heterogeneous beliefs matter for asset pricing? Review of Financial Studies 18(3), 875–924.
[5] Avramov, D. & Chordia, T. (2006). Asset pricing models and financial market anomalies, Review of Financial Studies 19(3), 1001–1040.
[6] Banz, R. (1981). The relationship between return and market value of common stocks, Journal of Financial Economics 9(1), 3–18.
[7] Barone Adesi, G., Gagliardini, P. & Urga, G. (2004). Testing asset pricing models with coskewness, Journal of Business and Economic Statistics 22(4), 474–495.
[8] Berk, J. & DeMarzo, P. (2007). Corporate Finance, Pearson International Education, USA.
[9] Black, F. (1972). Capital market equilibrium with restricted borrowing, Journal of Business 45(3), 444–455.
[10] Bossaerts, P. & Hillion, P. (1999). Implementing statistical criterion to select return forecasting models: what do we learn? Review of Financial Studies 12(2), 405–428.
[11] Breeden, D. (1979). An intertemporal capital asset pricing model with stochastic consumption and investment opportunities, Journal of Financial Economics 7(3), 265–296.
[12] Brennan, M.J. & Subrahmanyam, A. (1996). Market microstructure and asset pricing: on the compensation for illiquidity in stock returns, Journal of Financial Economics 41(3), 441–464.
[13] Busse, J.A. (1999). Volatility timing in mutual funds: evidence from daily returns, Review of Financial Studies 12(5), 1009–1041.
[14] Campbell, J.Y., Lettau, M., Malkiel, B.G. & Xu, Y. (2001). Have individual stocks become more volatile? An empirical exploration of idiosyncratic risk, Journal of Finance 56(1), 1–43.
[15] Capiello, L. & Fearnley, T.A. (2000). International CAPM with Regime Switching GARCH Parameters. Graduate Institute of International Studies, University of Geneva. Research Paper No 17.
[16] Carhart, M.M. (1997). On persistence in mutual fund performance, Journal of Finance 52(1), 57–82.
[17] Carpenter, A. & Wang, J. (2007). Herding and the information content of trades in the Australian dollar market, Pacific-Basin Finance Journal 15(2), 173–194.
[18] Chan, H., Faff, R., Ho, Y.K. & Ramsay, A. (2006). Asymmetric market reactions of growth and value firms with management earnings forecasts, International Review of Finance 6(1–2), 79–97.
[19] Chordia, T., Roll, R. & Subrahmanyam, A. (2001). Trading activity and expected stock returns, Journal of Financial Economics 59(1), 3–32.
[20] Cohen, R.D. (2008). Incorporating default risk into Hamada’s equation for application to capital structure, Wilmott Magazine March, 62–68.
[21] Dimson, E. (1979). Risk measurement when shares are subject to infrequent trading, Journal of Financial Economics 7(2), 197–226.
[22] Easley, D. & O’Hara, M. (2004). Information and the cost of capital, Journal of Finance 59(4), 1553–1583.
[23] Ellis, D. (1996). A test of the conditional CAPM with simultaneous estimation of the first and second conditional moments, Financial Review 31(3), 475–499.
[24] Faff, R. (2001). An Examination of the Fama and French three-factor model using commercially available factors, Australian Journal of Management 26(1), 1–17.
[25] Fama, E.F. & French, K.R. (1993). Common risk factors in the returns on stocks and bonds, Journal of Financial Economics 33(1), 3–56.
[26] Fama, E.F. & French, K.R. (1996). Multi-factor explanations of asset pricing anomalies, Journal of Finance 51(1), 55–84.
[27] Friend, I. & Westerfield, R. (1980). Co-skewness and capital asset pricing, Journal of Finance 35(4), 897–913.
[28] Gençay, R., Selçuk, F. & Whitcher, B. (2003). Systematic risk and timescales, Quantitative Finance 3(1), 108–116.

[29] Hamada, R. (1969). Portfolio analysis market equilibrium and corporation finance, Journal of Finance 24(1), 13–31.
[30] Hamada, R. (1972). The effect of the firm’s capital structure on the systematic risk of common stocks, Journal of Finance 27(2), 435–451.
[31] Hu, O. (2007). Applicability of the Fama-French three-factor model in forecasting portfolio returns, Journal of Financial Research 30(1), 111–127.
[32] Keene, M.A. & Peterson, D.R. (2007). The importance of liquidity as a factor in asset pricing, Journal of Financial Research 30(1), 91–109.
[33] Koutmos, G. & Knif, J. (2002). Estimating systematic risk using time-varying distributions, European Financial Management 8(1), 59–73.
[34] Li, X. (2001). Performance Evaluation of Recommended Portfolios of Individual Financial Analysts. Working Paper, Owen Graduate School of Management, Vanderbilt University.
[35] Lintner, J. (1965). The valuation of risky assets and the selection of risky investments in stock portfolios and capital budgets, Review of Economics and Statistics 47(1), 13–37.
[36] Lintner, J. (1969). The aggregation of investor’s diverse judgments and preferences in purely competitive security markets, Journal of Financial and Quantitative Analysis 4(4), 347–400.
[37] Markowitz, H.W. (1952). Portfolio selection, Journal of Finance 7(1), 77–91.
[38] Markowitz, H.W. (1959). Portfolio Selection. Efficient Diversification of Investment, John Wiley & Sons, New York.
[39] Merton, R.C. (1973). An intertemporal capital asset pricing model, Econometrica 41(5), 867–887.
[40] Modigliani, F. & Miller, M.H. (1958). The cost of capital, corporation finance and the theory of investment, American Economic Review 48(3), 261–297.
[41] Modigliani, F. & Miller, M.H. (1963). Corporate income taxes and the cost of capital: a correction, American Economic Review 53(3), 433–443.
[42] Modigliani, F. & Miller, M.H. (1966). Some estimates of the cost of capital to the utility industry 1954-7, American Economic Review 56(3), 333–391.
[43] Mossin, J. (1966). Equilibrium in a capital asset market, Econometrica 34(4), 768–783.
[44] Nelson, D.B. (1991). Conditional heteroskedasticity in asset returns: a new approach, Econometrica 59(2), 347–370.
[45] Oh, G., Kim, S. & Eom, C. (2008). Long-term memory and volatility clustering in high-frequency price changes, Physica A: Statistical Mechanics and Its Applications 387(5–6), 1247–1254.
[46] Roll, R. (1977). A critique of the asset pricing theory’s tests: Part one: on past and potential testability of the theory, Journal of Financial Economics 4(1), 129–176.
[47] Rubinstein, M. (1973). A mean variance synthesis of corporate financial theory, Journal of Finance 38(1), 167–181.
[48] Scholes, M. & Williams, J. (1977). Estimating betas from non synchronous data, Journal of Financial Economics 5(3), 309–327.
[49] Sharpe, W.F. (1963). A simplified model of portfolio analysis, Management Science 9(2), 227–293.
[50] Sharpe, W.F. (1964). Capital asset prices: a theory of market equilibrium under risk, Journal of Finance 19(3), 425–442.
[51] Smith, K.V. & Tito, D.A. (1969). Risk-return measures of ex post portfolio performance, Journal of Financial and Quantitative Analysis 4(4), 449–471.
[52] Tobin, J. (1958). Liquidity preferences as behavior towards risk, Review of Economic Studies 25(1), 65–86.
[53] Tofallis, C. (2008). Investment volatility: a critique of standard beta estimation and a simple way forward, European Journal of Operational Research 187(3), 1358–1367.
[54] Treynor, J. (1961). Toward a theory of the market value of risky assets. Unpublished manuscript; published in 1999 as Chapter 2 of Asset Pricing and Portfolio Performance: Models, Strategy and Performance Metrics, Korajczyk, Robert A., ed., Risk Books, London, pp. 15–22.
[55] Verhoeven, P. & McAleer, M. (2004). Fat tails and asymmetry in financial volatility models, Mathematics and Computers in Simulation 64(3–4), 351–361.
[56] Wang, J. (1993). A model of intertemporal asset prices under asymmetric information, Review of Economic Studies 60(2), 249–282.
[57] Wang, K.Q. (2003). Asset pricing with conditioning information: a new test, Journal of Finance 58(1), 161–196.

Related Articles

Arbitrage Pricing Theory; Efficient Markets Theory: Historical Perspectives; Markowitz, Harry; Modigliani, Franco; Sharpe, William F.

HAYETTE GATFAOUI
Arbitrage Pricing Theory

The arbitrage pricing theory (APT) was introduced by Ross [10] as an alternative to the capital asset pricing model (CAPM). The model derives a multibeta representation of expected returns relative to a set of $K$ reference variables under assumptions that may be described roughly as follows:

1. There exists no mean–variance arbitrage.
2. The asset returns follow a $K$-factor model.
3. The reference variables and the factors are nontrivially correlated.^a

The first assumption implies that there are no portfolios with arbitrarily large expected returns and unit variance. The second one assumes that the returns are a function of $K$ factors common to all assets, and a noise term specific to each asset. The third one identifies the sets of reference variables for which the model works.

The model predictions may have approximation errors. However, these errors are small for each portfolio whose weight on each asset is small (a well-diversified portfolio).

Early versions of the model unnecessarily assumed that the factors are equal to the reference variables. The extension of the model to arbitrary sets of reference variables comes at the cost of increasing the bound on the approximation errors by a multiplicative factor. However, when focusing on pricing of only well-diversified portfolios, this seems to be unimportant because each of the approximation errors is small and a multiplicative factor does not change the size of the error much.

Factor Representation

Consider a finite sequence of random variables $\{Z_i;\ i = 1, \ldots, N\}$ with finite variances that will be held fixed throughout the article. It is regarded as representing the excess^b returns of a given set of assets (henceforth “assets $i = 1, \ldots, N$”). Without any further assumptions

$$Z_i = b_{i,0} + \sum_{k=1}^{K} b_{i,k} f_k + e_i; \quad i = 1, \ldots, N$$

where $f_1, \ldots, f_K$ are the first $K$ factors in the principal component analysis (PCA) of the sequence $\{Z_i;\ i = 1, \ldots, N\}$. The $b_{i,k}$ are the factor loadings and the $e_i$ are the residuals from projecting the $Z_i$ on the factors.

The $K + 1$ largest eigenvalue of the covariance matrix of the $Z_i$, denoted by $\Delta^2(K)$, is interpreted as a measure of the extent to which our sequence of assets has a $K$-factor representation. The PCA selects the $f_k$ so that $\Delta^2(K)$ is minimized. In addition, $\Delta^2(K)$ is also the largest eigenvalue of the covariance matrix of the $e_i$.

Diversified Portfolios

Let $w \in \mathbb{R}^N$ be a portfolio in assets $i = 1, \ldots, N$. Its excess return is

$$Z_w = \sum_{i=1}^{N} w_i Z_i.$$

Its representation as a linear function of the factors is

$$Z_w = b_{w,0} + \sum_{k=1}^{K} b_{w,k} f_k + e_w$$

where $b_{w,k} = \sum_{i=1}^{N} w_i b_{i,k}$ are the factor loadings and $e_w = \sum_{i=1}^{N} w_i e_i$ is the residual, which satisfies

$$\mathrm{Var}[e_w] \le \Delta^2(K) \sum_{i=1}^{N} w_i^2$$

A portfolio $w = (w_1, \ldots)$ is called an (approximate) well-diversified portfolio if

$$\sum_{i=1}^{N} w_i^2 \approx 0 \tag{1}$$

Intuitively, a well-diversified portfolio is one with a large number of assets that has a small weight in many of them, and, in addition, there is no single asset for which the weight is not small.

The variance of the residual of a well-diversified portfolio is small and thus its excess return is approximately a linear function of the factors; that is,

$$Z_w \approx b_{w,0} + \sum_{k=1}^{K} b_{w,k} f_k \tag{2}$$

Although $\sum_{i=1}^{N} w_i^2 \approx 0$, $Z_w$ may not be small. For example, let $w_i = 1/N$; then we have $\sum_{i=1}^{N} w_i^2 = 1/N$, and $b_{w,k} = (1/N) \sum_{i=1}^{N} b_{i,k}$.

A further discussion on well-diversified portfolios can be found in [4].
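The factor representation and the residual-variance bound for a portfolio can be checked numerically. The sketch below is only an illustration under stated assumptions: the returns are simulated, the sizes `N`, `T`, `K`, the noise scale, and all variable names are hypothetical, and the PCA is carried out as an eigendecomposition of the sample covariance matrix with NumPy. It verifies that the residual variance of an equally weighted portfolio does not exceed the (K+1)-th largest eigenvalue times the sum of squared weights.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, K = 50, 2000, 2                      # hypothetical sizes

# Simulated excess returns: K common factors plus asset-specific noise.
loadings = rng.normal(size=(N, K))
factors = rng.normal(size=(T, K))
Z = factors @ loadings.T + 0.5 * rng.normal(size=(T, N))

# PCA via the eigendecomposition of the N x N sample covariance matrix.
cov = np.cov(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]          # reorder to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Covariance of the residuals e_i left after projecting on the first K factors.
top = eigvecs[:, :K]
resid_cov = cov - (top * eigvals[:K]) @ top.T
bound_eig = eigvals[K]                     # (K+1)-th largest eigenvalue

# Equally weighted portfolio w_i = 1/N, so the sum of squared weights is 1/N.
w = np.full(N, 1.0 / N)
sum_sq = w @ w
resid_var = w @ resid_cov @ w              # Var[e_w]
```

With equal weights the sum of squared weights shrinks like 1/N, so the residual variance of the portfolio becomes small as the number of assets grows, while the factor loadings of the portfolio are simply the cross-sectional averages of the asset loadings.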

Multibeta Representation

Throughout the article we consider a fixed set of K reference variables {g_1, . . . , g_K} with respect to which we derive an approximate multibeta representation defined as

$$E[Z_i] = \sum_{k=1}^{K} B_{i,k}\lambda_k + \rho_i \qquad (3)$$

where

$$B_{i,k} = \mathrm{Cov}(Z_i, g_k) \qquad (4)$$

This means that

$$E[Z_i] \approx \sum_{k=1}^{K} B_{i,k}\lambda_k \qquad (5)$$

where ρ_i is the approximation error in pricing asset i. The sum of the squares of these approximation errors, that is,

$$\sum_{i=1}^{N} \rho_i^{2} = \quad^{2} \qquad (6)$$

determines the quality of the approximation.

The APT Bound

Huberman [3] showed that  is finite for an infinite sequence of excess returns but did not derive a bound. Such bounds were derived by Chamberlain & Rothschild [1], in the case where the reference variables are the factors, and by Reisman [7], in the general case. Reisman showed that

$$\quad \le \quad S V \qquad (7)$$

where  ^2(K) is the K + 1 largest eigenvalue of the covariance matrix of the Z_i; S is the lowest upper bound on expected excess return among portfolios with unit variance;  ^2 = 1 − R^2 of the regression of the tangency portfolio on the reference variables;  is an increasing function of the largest eigenvalue of (G^t G)^{−1}, where G = Corr(f_n, g_m)_{n,m=1,...,K} is the cross-correlation matrix of the factors and the reference variables; and V^2 is a bound on the variances of the Z_i. See [5, 8] for further details.

What is important about the bound is that neither  nor  depends on the number of assets, N. This means that the size of the bound depends on the number of assets N only through  (K), S, and V, which may be bounded as this number increases to infinity.

The Pricing Errors

The pricing error of any portfolio w,

$$\sum_{i=1}^{N} w_i \rho_i = \rho_w \qquad (8)$$

satisfies

$$|\rho_w|^{2} \le \quad \sum_{i=1}^{N} w_i^{2} \qquad (9)$$

Provided  is not large and N is large, the pricing error on each well-diversified portfolio is small. For a single asset i, we only get that most of the ρ_i are small. However, for a few of the assets the ρ_i may not be small.

Example

Assume that each Z_i is given by

$$Z_i = a_i + b_i f + e_i$$

where the e_i are mutually uncorrelated and have zero mean, and f has a zero mean and unit variance and is uncorrelated with all the e_i.

The APT implies that every random variable g for which cov(g, f) is not zero can serve as a reference variable. Thus there exists a constant λ so that

$$E[Z_i] = \mathrm{cov}(Z_i, g)\lambda + \rho_i \quad \text{for each } i$$

In addition, for each well-diversified portfolio w, we have

$$E[Z_w] \approx \mathrm{cov}(Z_w, g)\lambda$$

In this example,  = 1/corr(f, g)^2;  (1) and S and  may take arbitrary values.

Empirical Studies

Empirical studies attempted to find the sets of reference variables for which the hypothesis that

$$E[Z_i] = \sum_{k=1}^{K} B_{i,k}\lambda_k$$

cannot be rejected. Roll and Ross [9] identified sets of macroeconomic variables that are believed to be responsible for stock price movements and tested whether they explain expected returns in the major US markets. Trzcinka [13] applied PCA to identify the factors. He showed that a small number of factors

may explain most of the variation of the market. Then he tested the multibeta representation with these factors as reference variables.

Equilibrium APT

The CAPM implies that the market portfolio is mean–variance efficient. If the market portfolio is a well-diversified one, then it is spanned by the factors. In that case, we get that if the reference variables are the factors, then  is small, which implies that the approximation error for each asset in the sequence is small. Connor [2] and Wei [14] derived a related result which is called equilibrium APT.

Arbitrage and APT

S measures the extent to which arbitrage in the mean–variance sense exists. It is equal to the maximal expected excess return per unit variance of portfolios in the Z_i. A large S can be interpreted as some form of arbitrage. However, it is not an arbitrage in the standard sense, as there are examples in which S is finite and arbitrage exists. See Reisman [6].

Testability

It was pointed out by Shanken [11, 12] that an inequality of the type given in equation (7) is a tautology. That is, it is a mathematical statement and thus cannot be rejected.

Assume that we performed statistical tests that imply that the probability that the bound in equation (7) holds is small. Then the only explanation can be that it was a bad sample. Since equation (7) is a tautology, there is no other explanation.

Nevertheless, this does not imply that the bound is not useful. The bound translates prior beliefs on the sizes of  , S, and  into a prior belief on a bound on the size of the approximation error of each well-diversified portfolio.

The relationship between the sizes of  , S, and  and the model assumptions is illustrated in the next section.

The APT Assumptions

The model is derived under assumptions on the extent to which there exists

1. a factor structure with K factors;
2. no mean–variance arbitrage;
3. nontrivial correlation between our set of reference variables and the first K factors in the PCA.

The parameters  , S, and  are measures of the extent to which each of the above assumptions holds. The larger a parameter is, the greater the extent to which the related assumption does not hold.

What this says is that the model translates our beliefs on the extent to which the model assumptions hold to a belief on a bound on the size of the approximation errors in pricing well-diversified portfolios.

Summary

The APT implies that each (approximate) well-diversified portfolio is (approximately) priced by a set of K reference variables.

What distinguishes this model from the K-factor CAPM is the set of reference variables that is implied by each of the models.

In the CAPM, the market portfolio is mean–variance efficient and its return must be equal to a linear function of the set of reference variables.

In contrast, in the APT, the reference variables are any set that is nontrivially correlated with the common factors of the returns and it may not span the mean–variance frontier.

End Notes

a. The cross-correlation matrix is nonsingular.
b. The excess return is the return minus the risk-free rate.

References

[1] Chamberlain, G. & Rothschild, M. (1983). Arbitrage, factor structure, and mean variance analysis on large asset markets, Econometrica 51, 1281–1304.
[2] Connor, G. (1984). A unified beta pricing theory, Journal of Economic Theory 34, 13–31.
[3] Huberman, G. (1982). A simple approach to arbitrage pricing, Journal of Economic Theory 28, 183–191.
[4] Ingersoll Jr, J.E. (1984). Some results in the theory of arbitrage pricing, Journal of Finance 39, 1021–1039.
[5] Nawalkha, S.K. (1997). A multibeta representation theorem for linear asset pricing theories, Journal of Financial Economics 46, 357–381.
[6] Reisman, H. (1988). A general approach to the Arbitrage Pricing Theory (APT), Econometrica 56, 473–476.

[7] Reisman, H. (1992). Reference variables, factor structure, and the approximate multibeta representation, Journal of Finance 47, 1303–1314.
[8] Reisman, H. (2002). Some comments on the APT, Quantitative Finance 2, 378–386.
[9] Roll, R. & Ross, S.A. (1980). An empirical investigation of the arbitrage pricing theory, Journal of Finance 35, 1073–1103.
[10] Ross, S.A. (1976). The arbitrage theory of capital asset pricing, Journal of Economic Theory 13, 341–360.
[11] Shanken, J. (1982). The arbitrage pricing theory: is it testable? Journal of Finance 37, 1129–1140.
[12] Shanken, J. (1992). The current state of the arbitrage pricing theory, Journal of Finance 47, 1569–1574.
[13] Trzcinka, C. (1986). On the number of factors in the arbitrage pricing model, Journal of Finance 41, 347–368.
[14] Wei, K.C.J. (1988). An asset-pricing theory unifying CAPM and APT, Journal of Finance 43, 881–892.

Related Articles

Capital Asset Pricing Model; Correlation Risk; Factor Models; Risk–Return Analysis; Ross, Stephen; Sharpe, William F.

HAIM REISMAN
Efficient Market Hypothesis^a

The topic of capital market efficiency plays a central role in introductory instruction in finance. After investigating the risk–return trade-off and the selection of optimal portfolios, instructors find it natural to go on to raise the question of what information is incorporated in the estimates of risk and expected return that underlie portfolio choices. Information that is "fully reflected" in security prices (and therefore in investors' estimates of expected return and risk) cannot be used to construct successful trading rules, which are defined as those with an abnormally high expected return for a given risk. In contrast, information that is not fully reflected in security prices can be so used. Students appear to find this material plausible and intuitive, and this is the basis of its appeal. Best of all, the idea of capital market efficiency appears not to depend on the validity of particular models, implying that students can grasp the major ideas without wading through the details of finance models.

However, those who are accustomed to relying on formal models to discipline their thinking find that capital market efficiency has the disadvantage of its advantage: the fact that market efficiency is not grounded in a particular model (unlike, e.g., portfolio theory) means that it is not so easy to determine what efficiency really means. To see this, consider the assertion of Fama [8] that capital market efficiency can only be tested in conjunction with a particular model of returns. This statement implies that there exist two independent sources of restrictions on the data that are being tested jointly: the assumed model and market efficiency. Analysts who are used to deriving all restrictions being tested from the assumed model find this puzzling: what is the additional source of information that is separate from the model?

This question was not addressed clearly in the major expositions of market efficiency offered by its proponents. One way to resolve this ambiguity is to look at the empirical tests that are interpreted as supporting or contradicting market efficiency. Most of the empirical evidence that Fama [7] interpreted as supporting market efficiency is based on a particular model: expected returns conditional on some prespecified information set are constant. For example, return autocorrelatedness is evidence against market efficiency only if market efficiency is identified with constancy of expected returns. On this reading, the additional restriction implied by market efficiency might consist of the assumption that investors have rational expectations. The market model explains asset prices based on investors' subjective perceptions of their environment; the assumption of rational expectations is needed to connect these subjective perceptions with objective correlations. Admittedly, it is pure conjecture to assume that proponents intend this identification of market efficiency with rational expectations; as Berk [1] pointed out, there is no mention of rational expectations in [7, 8].

In many settings, conditional expected returns are constant over time when agents are risk neutral. If agents are risk averse, expected returns will generally differ across securities, as is clear from the capital asset pricing model (see Capital Asset Pricing Model), and will change over time according to the realizations of the conditioning variables even in stationary settings [14, 19]. Hence, if investors are risk averse, the assumption of rational expectations will not generally lead to returns that are fair games.

Analysts who understood that constancy of expected returns requires the assumption of risk neutrality (or some other even more extreme assumption, such as that growth rates of gross domestic product are independently and identically distributed over time) were skeptical about the empirical evidence offered in support of market efficiency. From the fact that high-risk assets generate higher average returns than low-risk assets (or from the fact that agents purchase insurance even at actuarially unfavorable prices, or from a variety of other considerations), we know that investors are risk averse. If so, there is no reason to expect that conditional expected returns will be constant.

One piece of evidence offered in the 1970s, which appeared to contradict the consensus in support of market efficiency, had to do with the volatility of security prices and returns. If conditional expected returns are constant, then the volatility of stock prices depends entirely on the volatility of dividends (under some auxiliary assumptions, such as exclusion of bubbles). This observation led LeRoy and Porter [16] and Shiller [23] to suggest that bounds on the volatility of stock prices and returns can be derived from the volatility of dividends. These authors concluded that stock prices appear to be more

volatile than can be justified by the volatility of dividends. This finding corroborated the informal opinion (that was subsequently confirmed by Cutler et al. [6]) that large moves in stock prices generally cannot be convincingly associated with contemporaneous news that would materially affect expected future dividends.

Connecting the volatility of stock prices with that of dividends required a number of auxiliary econometric specifications. These were supplied differently by LeRoy–Porter and Shiller. However, both sets of specifications turned out to be controversial (see [9] for a survey of the econometric side of the variance-bounds tests). Some analysts, such as Marsh and Merton [20], concluded that the appearance of excess volatility was exactly what should be expected in an efficient market, although the majority opinion was that resolving the econometric difficulties reduces but does not eliminate the excess volatility [25].

It was understood throughout that the variance bounds were implications of the assumption that expected returns are constant. As noted, this was the same model that was implicitly assumed in the market efficiency tests summarized by Fama. The interest in the variance-bounds tests derived from the fact that the results of the two sets of tests of the same model appeared to be so different. In the late 1980s, there was a growing realization that small but persistent autocorrelations in returns could explain the excess volatility of prices [24]. This connection is particularly easy to understand if we employ the Campbell–Shiller log-linearization. Defining r_{t+1} as the log stock return from t to t + 1, p_t as the log stock price at t, and d_t as the log dividend level, we have

$$p_t \cong k + p_{dt} + p_{rt} \qquad (1)$$

where p_{dt} and p_{rt} are given by

$$p_{dt} = E_t\left[\sum_{j=1}^{\infty} \rho^{j}\,[(1-\rho)\,d_{t+j}]\right] \qquad (2)$$

and

$$p_{rt} = -E_t\left[\sum_{j=1}^{\infty} \rho^{j}\, r_{t+j}\right] \qquad (3)$$

(see [2–4]). Here, k and ρ are parameters associated with the log-linearization. Thus p_{dt} and p_{rt} capture price variations induced by expected dividend variations and expected return variations, respectively.

The attractive feature of the log-linearization is that expectations of future dividends and expectations of future returns appear symmetrically and additively in relation (1). Without the log-linearization, dividends would appear in the numerator of the present-value relation and returns in the denominator, rendering the analysis less tractable.

As noted, the market-efficiency tests of Fama and the variance bounds are implications of the hypothesis that p_{rt} is a constant. If p_{rt} is, in fact, random and positively correlated with p_{dt}, then the assumption of constancy of expected returns will bias the implied volatility of p_t downward. Campbell and Shiller found that if averages of future returns are regressed on current stock prices, a significant proportion of the variation can be explained, contradicting the specification that expected returns are constant.

Campbell et al. noted that as economists came to understand the connection between return autocorrelatedness and price and return volatility, the variance-bounds results seemed less controversial:

  LeRoy and Porter [16] and Shiller [23] started a heated debate in the early 1980s by arguing that stock prices are too volatile to be rational forecasts of future dividends discounted at a constant rate. This controversy has since died down, partly because it is now more clearly understood that a rejection of constant-discount-rate models is not the same as a rejection of Efficient Capital Markets, and partly because regression tests have convinced many financial economists that expected stock returns are time-varying rather than constant ([2] p. 275).

This passage, in implying that the return autocorrelation results provide an explanation for excess stock price volatility, is a bit misleading. The log-linearized present-value relation (1) is not a theoretical model with the potential to explain price volatility. Rather, it is very close to an identity (the only respect in which equation (1) imposes substantive restrictions lies in the assumption that the infinite sums converge; this rules out bubbles). The Campbell–Shiller exercise amounts to decomposing price variation into dividend variation, return variation, and a covariance term and observing that the latter two terms are not negligible quantitatively. This, although useful, is a restatement of the variance-bounds result, not an explanation of it. Explaining excess volatility would involve accounting in economic terms for the fact that expected returns have the time structure that they do. Campbell and Shiller have not done

this, nor has anyone else. LeRoy–Porter's conclusion from the variance-bounds tests was that we do not understand why asset prices move as they do. That conclusion is no less true now than it was when the variance-bounds results were first reported.

Fama's assertion that market efficiency is testable, but only in conjunction with a model of market returns, can be given another reading. Rather than identifying market efficiency with the proposition that investors have rational expectations (alternatively, with the decision to model investors as having rational expectations), one can associate market efficiency with the proposition that asset prices behave as one would expect if security markets were entirely frictionless. In such markets, prices respond quickly to information, implying that investors cannot use publicly available information to construct profitable trading rules because that information is reflected in security prices as soon as it becomes available. In contrast, the presence of major frictions in asset markets is held to imply that prices may respond slowly to information. In that case, the frictions prevent investors from exploiting the resulting trading opportunities.

In the foregoing argument, it is presumed that trading frictions and transactions costs are analogous to adjustment costs. In the theory of investment, it is sometimes assumed that investment in capital goods induces costs that motivate firms to change quantities (in this case, physical capital) more slowly than they would otherwise. It appears natural to assume that prices are similar. For example, real estate prices are held to respond slowly to relevant information because the costs implied by the illiquidity of real estate preclude the arbitrages that would otherwise bring about rapid price adjustment.

Recent work on the valuation of assets in the presence of market frictions raises questions as to the appropriateness of the analogy between quantity adjustment and price adjustment. It is correct that, if prices respond slowly to information, investors may be unable to construct the trades that exploit the mispricing because of frictions. This, however, does not establish that markets clear in settings where prices adjust slowly. Equilibrium models that characterize asset prices in the presence of frictions suggest that in equilibrium prices respond quickly to shocks, just as in the absence of frictions. For example, Krainer [11] and Krainer and LeRoy [13] analyzed equilibrium prices of illiquid assets such as real estate in a model that accounts explicitly for illiquidity in terms of search and matching. In a similar setting, Krainer [12] introduced economy-wide shocks and found that, despite the illiquidity of real estate, prices adjust instantaneously to the shocks, just as in liquid markets.

A similar result was demonstrated by Lim [17]. He considered the determination of asset prices when short sales are restricted. Lintner [18] and Miller [21], among others, proposed that short sale restrictions cause securities to trade at higher prices than they would otherwise. This is held to occur because investors with negative information may be unable to trade based on their information, whereas those with positive information can buy without restriction. Empirical evidence is held to support this result [5, 10, 22]. Lim showed that this outcome will not occur if investors have rational expectations about the extent of short sales restrictions. Under rational expectations, prices in Lim's model follow a martingale under the natural probabilities (reflecting assumed risk neutrality), just as they would in the absence of short sales restrictions.

These results were derived in settings that imposed strong restrictions, and it is not clear how general they are. However, the preliminary conclusion is that if market efficiency is defined as the absence of frictions, empirical evidence of quick adjustment of prices to information cannot necessarily be interpreted as supporting market efficiency, since that outcome would occur in the presence of frictions.

It could be objected that none of these considerations supports distinguishing between the implications of an asset pricing model and market efficiency, however defined. All testable restrictions are derived from an assumed model; so, the question is, what can be gained by identifying some of these restrictions with something called market efficiency? This is particularly debatable, given the ambiguity in the usage of this term now. Berk [1] suggested dropping the term "market efficiency" from financial economics, and this might be the best course.

End Notes

a. An evaluation of the idea of capital market efficiency has been presented elsewhere [15]. In this essay, repetition of material found there has been avoided as much as possible.

References

[1] Berk, J. (2007). A Critique of the Efficient Capital Markets Hypothesis. Reproduced, Haas School of Business, University of California, Berkeley.
[2] Campbell, J.Y., Lo, A.W. & MacKinlay, A.C. (1996). The Econometrics of Financial Markets, Princeton University Press, Princeton, NJ, 275.
[3] Campbell, J.Y. & Shiller, R.J. (1988). The dividend-price ratio and expectations of future dividends and discount factors, Review of Financial Studies 1, 195–228.
[4] Campbell, J.Y. & Shiller, R. (1988). Stock prices, earnings, and expected dividends, Journal of Finance 43, 661–676.
[5] Cheng, J.W., Chang, E.C. & Yu, Y. (2007). Short-sales constraints and price discovery: evidence from the Hong Kong market, Journal of Finance 62(5), 2097–2121.
[6] Cutler, D., Poterba, J. & Summers, L. (1989). What moves stock prices? Journal of Portfolio Management 15, 4–12.
[7] Fama, E.F. (1970). Efficient capital markets: a review of theory and empirical work, Journal of Finance 25, 383–417.
[8] Fama, E.F. (1991). Efficient capital markets: II, Journal of Finance 46, 1575–1617.
[9] Gilles, C. & LeRoy, S.F. (1991). Econometric aspects of the variance-bounds tests: a survey, Review of Financial Studies 4, 753–791.
[10] Jones, C. & Lamont, O. (2002). Short-sale constraints and stock returns, Journal of Financial Economics 66, 207–239.
[11] Krainer, J. (1997). Pricing Illiquid Assets with a Matching Model. Reproduced, University of Minnesota.
[12] Krainer, J. (2001). A theory of liquidity in residential real estate markets, Journal of Urban Economics 13, 32–53.
[13] Krainer, J. & LeRoy, S.F. (2002). Equilibrium valuation of illiquid assets, Economic Theory 19, 223–242.
[14] LeRoy, S.F. (1973). Risk aversion and the martingale model of stock prices, International Economic Review 14, 436–446.
[15] LeRoy, S.F. (1989). Efficient capital markets and martingales, Journal of Economic Literature 27, 1583–1621.
[16] LeRoy, S.F. & Porter, R.D. (1981). The present value relation: tests based on implied variance bounds, Econometrica 49, 555–574.
[17] Lim, B. (2007). Short-sales Constraints and Price Bubbles. Reproduced, University of California, Santa Barbara.
[18] Lintner, J. (1969). The aggregation of investors' diverse judgments and preferences in purely competitive security markets, Journal of Financial and Quantitative Analysis 4(4), 347–400.
[19] Lucas, R.E. (1978). Asset prices in an exchange economy, Econometrica 46, 1429–1445.
[20] Marsh, T.A. & Merton, R.C. (1986). Dividend variability and variance bounds tests for the rationality of stock market prices, American Economic Review 76, 483–498.
[21] Miller, E.M. (1977). Risk, uncertainty, and divergence of opinion, Journal of Finance 32(4), 1151–1168.
[22] Ofek, E. & Richardson, M. (2003). Dotcommania: the rise and fall of internet stock prices, Journal of Finance 58(3), 1113–1137.
[23] Shiller, R.J. (1981). Do stock prices move too much to be justified by subsequent changes in dividends? American Economic Review 71, 421–436.
[24] Summers, L. (1986). Does the stock market rationally reflect fundamental values, Journal of Finance 41, 591–600.
[25] West, K.D. (1988). Bubbles, fads and stock price volatility: a partial evaluation, Journal of Finance 43, 636–656.

Related Articles

Expectations Hypothesis; Predictability of Asset Prices; Risk Aversion; Risk Premia; Transaction Costs.

STEPHEN F. LEROY
Expectations Hypothesis

  If the attractiveness of an economic hypothesis is measured by the number of papers which statistically reject it, the expectations theory of the term structure is a knockout [43].

The term expectations hypothesis (EH) stands for numerous statements that link yields, returns on bonds, and forward rates of different maturities and periods. The EH has been the basis of empirical and theoretical work in fixed income following the work of Macaulay [54]. These hypotheses were developed for understanding the returns and yields on long- versus short-term bonds and the time series movements of the term structure. The literature distinguishes between the pure expectations hypothesis (PEH), which postulates that (i) expected excess returns on long-term over short-term bonds are zero, (ii) yield term premia are zero, or (iii) forward term premia are zero, from the EH, which postulates that (i) expected excess returns are constant over time, (ii) yield term premia are constant, or (iii) forward term premia are constant over time.

We review the literature related to the EH. We present the different forms of both the PEH and the less strong EH. We show that their mathematical expressions depend on the researchers' choice of model (continuous time versus discrete time) and on their choice of compounding frequency for returns: continuous (log return) versus discrete (simple return). Depending on these choices, we may or may not have equivalence among the several forms of the (pure) EH. In addition, we examine which of the statements can be derived from a no-arbitrage general equilibrium model. Lastly, we present empirical evidence against the EH mainly from the US data, and the less strong rejection of the hypotheses when using non-US data.

Notation

To formulate the different forms of the EH, we need to introduce the basic fixed income assets and concepts associated with them. Even though all the empirical research is done using discrete time models, the theoretical literature predominantly uses continuous time models, mainly for reasons of tractability and simplicity. In comparing the expected returns on two bonds of different maturities, however, the returns may be compounded in any of four natural ways: continuously, to the shorter bond's maturity, to the longer bond's maturity, or to the nearest available future date. For these reasons, in the following, we introduce notation that is flexible enough to accommodate the description of discrete as well as continuous time models and all possible ways that compounding may take place.

A zero-coupon bond or discount bond, the simplest fixed income security, promises a single fixed payment at a specified date in the future known as the maturity date. The size of this payment is called the face value of the bond. Examples of such securities are Treasury bills, which are bonds issued by the US government with maturities up to a year.

We denote the price of a zero-coupon bond that matures τ ∈ ℝ₊ periods from now and pays 1 unit at maturity as P_t^{(τ)}. Denote the yield to maturity, compounded once per period, for this zero-coupon bond by Y_t^{(τ)}. Then prices and yields are connected through the following equation:

$$P_t^{(\tau)} = \frac{1}{(1 + Y_t^{(\tau)})^{\tau}} \qquad (1)$$

It is common in the empirical finance literature to work with log or continuously compounded variables. This has the usual advantage of linearizing exponential affine equations that arise frequently in asset pricing and of defining comparable yield values independent of the remaining horizon value τ. Using lowercase letters for logs, the relationship between log yield and log price is

$$y_t^{(\tau)} = -\frac{p_t^{(\tau)}}{\tau} \qquad (2)$$

The collection of all these yields for different maturities is called the zero term structure of interest rates. Buying this bond at time t and reselling it at time t + s generates a holding period return of

$$R_{t \to t+s}^{(\tau)} = \frac{P_{t+s}^{(\tau-s)}}{P_t^{(\tau)}} = \frac{(1 + Y_t^{(\tau)})^{\tau}}{(1 + Y_{t+s}^{(\tau-s)})^{\tau-s}} \qquad (3)$$

and a log holding period return of

$$r_{t \to t+s}^{(\tau)} = p_{t+s}^{(\tau-s)} - p_t^{(\tau)} = s\,y_t^{(\tau)} - (\tau - s)\left(y_{t+s}^{(\tau-s)} - y_t^{(\tau)}\right) \qquad (4)$$
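To make the notation concrete, here is a minimal Python sketch of equations (1) to (4) for a zero-coupon bond. The function names and the numerical inputs are hypothetical and serve only as an illustration; the log yield is taken to be y = log(1 + Y), which is what equations (1) and (2) imply together.

```python
import math


def price_from_yield(Y, tau):
    """Equation (1): price of a zero-coupon bond paying 1 in tau periods."""
    return 1.0 / (1.0 + Y) ** tau


def log_yield_from_price(price, tau):
    """Equation (2): y = -p / tau, where p = log(price)."""
    return -math.log(price) / tau


def holding_period_return(Y_now, tau, Y_later, s):
    """Equation (3): gross return from buying at t and reselling at t + s."""
    if not 0 < s <= tau:
        raise ValueError("holding period must satisfy 0 < s <= tau")
    return (1.0 + Y_now) ** tau / (1.0 + Y_later) ** (tau - s)


def log_holding_period_return(y_now, tau, y_later, s):
    """Equation (4): log holding period return in terms of log yields."""
    return s * y_now - (tau - s) * (y_later - y_now)


# Hypothetical inputs: a 5-period bond yielding 5%, resold after 2 periods
# when the remaining 3-period yield is 4%.
Y_now, tau, Y_later, s = 0.05, 5, 0.04, 2
gross = holding_period_return(Y_now, tau, Y_later, s)
y_now = log_yield_from_price(price_from_yield(Y_now, tau), tau)
y_later = log_yield_from_price(price_from_yield(Y_later, tau - s), tau - s)
log_ret = log_holding_period_return(y_now, tau, y_later, s)

# The log of the gross return from (3) should match (4) up to rounding.
print(gross, math.exp(log_ret))
```

When s equals τ the future yield drops out of equation (3), so the return is known at time t; for any shorter holding period it depends on the yield prevailing at t + s.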

Clearly, the holding period s cannot exceed the time to maturity τ, s ≤ τ. The above equation shows that the holding period return on a zero-coupon bond is not known at time t unless the holding period coincides with the lifetime of the bond. In this case, the holding period return is the yield to maturity. Otherwise, the return is a random variable that depends on the future evolution of yields.

Even though returns are unknown, bonds can be combined to guarantee a fixed interest rate on an investment to be made in the future; the interest rate on this investment is called a forward rate. The forward and log forward rates guaranteed at time t for an investment made at time t + s until time t + τ, where s ≤ τ, are given as

$$1 + F_t^{(s,\tau)} = \left[\frac{1}{P_t^{(\tau)}/P_t^{(s)}}\right]^{\frac{1}{\tau-s}} = \left[\frac{(1 + Y_t^{(\tau)})^{\tau}}{(1 + Y_t^{(s)})^{s}}\right]^{\frac{1}{\tau-s}} \qquad (5)$$

$$f_t^{(s,\tau)} = \frac{p_t^{(s)} - p_t^{(\tau)}}{\tau - s} = y_t^{(s)} + \frac{\tau}{\tau - s}\left(y_t^{(\tau)} - y_t^{(s)}\right) \qquad (6)$$

Finally, the short-term interest rate is the limit of yields to maturity as maturity approaches zero, r_t = lim_{τ↓0} y_t^{(τ)}.

Bond Pricing

Bonds are usually priced with the use of the so-called risk-neutral probability measure  , which is equivalent to the true or physical or data-generating measure  .^a The pricing under   is done with the use of a pricing kernel M, which is the result of the no-arbitrage assumption. For an asset that promises payoff S(T) at time T, its price now at time t is given by the expected discounted cash flows equation:

$$S(t) = E_t^{\quad}\left[\frac{M(T)}{M(t)}\,S(T)\right] \qquad (7)$$

It can be shown that given the shocks of the economy dz(t), M takes the following form:

$$\frac{dM(t)}{M(t)} = -r(t)\,dt - \quad(t)\,dz(t) \qquad (8)$$

and the two measures   and  ( ) are connected through the Radon–Nikodym derivative

$$\frac{d\quad(\quad)}{d\quad} = \xi_T = \exp\left(-\frac{1}{2}\int_0^T \quad(s)\,\quad(s)\,ds - \int_0^T \quad(s)\,ds\right) \qquad (9)$$

This gives rise to the following pricing equation under both measures:

$$S(t) = E_t^{\quad}\left[\frac{M(T)}{M(t)}\,S(T)\right] = E_t^{\quad}\left[e^{-\int_t^T r(s)\,ds}\,S(T)\right] \qquad (10)$$

Easy manipulations of the above equation can prove that under  , the instantaneous expected returns for all assets are equal to the risk-free rate. For this reason, the measure   is also called the risk-neutral measure. Specializing the above equation for a zero-coupon bond that matures at time t + τ and promises a payoff of $1 gives

$$P_t^{(\tau)} = E_t^{\quad}\left[\frac{M(T)}{M(t)}\right] = E_t^{\quad}\left[e^{-\int_t^T r(s)\,ds}\right] \qquad (11)$$

From the above pricing equations, we observe that the key variables that govern the bond price dynamics are (i) the interest rate r(t) and (ii) the prices of risk  (t). Different assumptions about the functional form of these variables imply different data-generating processes for bond yields. A large part of the current fixed income literature is devoted to studying these different models and the goodness of fitting the raw bond yield data cross-sectionally and in time series. We examine below the features of the most representative bond pricing models.

The Expectations Hypothesis

The term expectations hypothesis stands for numerous statements that link yields, returns on bonds, and forward rates of different maturities and periods. It is important to note that initially, starting with Hicks [49], Lutz [53], and Macaulay [54], these statements were not formally derived from any fully specified equilibrium model, but rather merely hypothesized. For this reason, the term expectations hypothesis is not associated with only one mathematical statement.
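As a numerical check on the forward-rate definitions in equations (5) and (6), the following Python sketch computes a forward rate from two zero-coupon yields. It assumes that the 1/(τ − s) term in equation (5) acts as an exponent, which is the reading that makes (5) consistent with (6) when logs are taken; the function names and the yields used are hypothetical.

```python
import math


def forward_rate(Y_s, s, Y_tau, tau):
    """Equation (5): per-period forward rate fixed at t for [t + s, t + tau]."""
    if not 0 <= s < tau:
        raise ValueError("need 0 <= s < tau")
    gross = (1.0 + Y_tau) ** tau / (1.0 + Y_s) ** s
    return gross ** (1.0 / (tau - s)) - 1.0


def log_forward_rate(y_s, s, y_tau, tau):
    """Equation (6): f = y_s + tau / (tau - s) * (y_tau - y_s)."""
    if not 0 <= s < tau:
        raise ValueError("need 0 <= s < tau")
    return y_s + tau / (tau - s) * (y_tau - y_s)


# Hypothetical yields: 3% for one period and 4% for two periods.
F = forward_rate(0.03, 1, 0.04, 2)
f = log_forward_rate(math.log(1.03), 1, math.log(1.04), 2)

# log(1 + F) from (5) should equal f from (6) up to rounding.
print(F, math.exp(f) - 1.0)
```

Because the two-period yield exceeds the one-period yield in this example, the implied forward rate for the second period lies above both of them.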

These hypotheses were developed for understanding of that future period, or equivalently, forward term
the returns and yields on long- versus short-term premiag are zeroc :
bonds, and the time series movements of the term
(1 + Yt(n) )n  
structure. Later, researchers developed theoretical 1 + Ft(n−1,n) = = Ɛ 1 + Y (1)
t t+n−1
models that give rise to some of the hypothesized (1 + Yt(n−1) )n−1
equations associated with the EH [20, 21, 27, 57]. (14)
The literature distinguishes between the PEH, The last form of the PEH equates the n-period bond
which postulates that (i) expected excess returns on return with the one-period bond and n − 1 period
long-term over short-term bonds are zero, (ii) yield bond:
term premia are zero, or that (iii) forward term premia 

n

n−1 
are zero, from the EH, which postulates that (i) 1 + Yt(n) = 1 + Yt(1) Ɛ 1 + Y (n−1)
t t+1
expected excess returns are constant over time, (ii)
yield term premia are constant, or (iii) forward term (15)
premia are constant over time. In the following, all
the forms of the PEH in discrete time with discrete Even though the above expressions describe dif-
and continuous compounding and in continuous time ferent forms of the PEH, they are not mutually
(continuous compounding) are presented. We will see equivalent. Assuming that the above expressions are
that the PEH expressions derived in all these models true for all t and n, it can be shown that (i) equa-
are not equivalent across models as well as within tion (13) is equivalent to equation (15), (ii) equation
each model. (14) implies equation (13) (therefore equation (15)),
but the opposite is not true unless
∞ we make the
(1)
additional assumption that 1 + Yt+j are uncor-
Pure Expectations Hypothesis in Discrete j =1
related with each other, and (iii) equations (12) and
Time (15) are inconsistent, because the expected value of
the inverse of a random variable is not in general
Discrete Compounding
equal to the inverse of its expected value.
The first form of the PEH equates the expected To summarize, the PEH cannot hold in both
returns on one-period (short-term) and n-period its one-period form and its n-period form, and,
(long-term) bonds, or equivalently, expected excess essentially there are three different (competing) forms
returns on long-term over short-term bonds are zero: of the PEH in discrete time, the excess return
expression (12), the yield premia expression (13), and
  the forward premia expression (14).
(1 + Yt(1) ) = Ɛ (n)
t 1 + Rt→t+1
Imposing more structure in the term structure
 model by assuming that the interest rate is lognormal

n
−n+1 
and homoscedastic, we can quantify the effect of
= 1 + Yt(n) · Ɛ
t 1 + Y (n−1)
t+1 Jensen’s inequality. Under this additional assumption,
the excess one-period bond returns under the different
(12)
hypotheses can be shown to be of 1 Var[rt→t+1 (n)

2
(1)
The second form of the PEH equates the n-period yt ] order. Therefore, the difference between the one-
expected returns on the one-period and n-period period excess bond returns of different PEH forms is
(n)
bonds, or equivalently, yield term premia are zerob : Var[rt→t+1 − yt(1) ]. Using sample means and standard
deviations, we can get an estimate and a standard

n 

error of the above quantity. This magnitude is very
1 + Yt(n) = Ɛ
t 1 + Yt(1) 1 + Yt+1
(1)
small for short-term bonds and becomes significant

 only for long-term bonds; hence, the differences
(1)
· · · 1 + Yt+n−1 (13) between different forms of the PEH are small except
for very long term zero-coupon bonds. Thus, the data
The third form of the PEH equates the expected future reject all forms of the PEH at the short end, but
one-period spot rate with the current forward rate reject no forms of the PEH at the long end of the
term structure. In this sense, the distinction between the different forms of the PEH is not critical for evaluating this hypothesis.

Continuous Compounding

Most empirical research, though, uses neither of the above PEH forms, but a log form of them. Once the PEH is formulated in logs, all the forms of the PEH become equivalent. Using log returns, the counterparts of equations (12), (13), and (14) are

$$y_t^{(1)} = \mathbb{E}_t\!\left[r_{t\to t+1}^{(n)}\right] \tag{16}$$

$$y_t^{(n)} = \frac{1}{n} \sum_{i=0}^{n-1} \mathbb{E}_t\!\left[y_{t+i}^{(1)}\right] \tag{17}$$

$$f_t^{(\tau-1,\tau)} = \mathbb{E}_t\!\left[y_{t+\tau-1}^{(1)}\right] \tag{18}$$

The empirical literature uses equations (17) and (18) in order to construct two related notions of term premia that have played a prominent role in the literature of expected bond returns: the yield premium,

$$c_t^{(n)} \equiv y_t^{(n)} - \frac{1}{n} \sum_{i=0}^{n-1} \mathbb{E}_t\!\left[y_{t+i}^{(1)}\right] \tag{19}$$

and the forward term premium,

$$p_t^{(n)} \equiv f_t^{(n,n+1)} - \mathbb{E}_t\!\left[y_{t+n}^{(1)}\right]. \tag{20}$$

Derivations of PEH- and EH-tested formulas follow below.

Pure Expectations Hypothesis in Continuous Time

Cox et al. [27] restate the PEH forms in continuous time and prove that the different forms are incompatible. The equivalent of expression (12) in continuous time is created by assuming that the holding period is the shortest possible, that is, infinitesimal. In this case, the PEH takes the following form:

$$\frac{\mathbb{E}_t\!\left[dP_t^{(\tau)}\right]}{P_t^{(\tau)}} = r(t)\,dt \tag{21}$$

This expression states that all bonds have the same expected infinitesimal return, equal to the short-term interest rate. However, the above expression is also known to hold under $\mathbb{Q}$; under the risk-neutral measure, all assets have the same expected return, equal to the risk-free rate. This implies that this form of the PEH postulates that $\mathbb{Q} = \mathbb{P}$. The continuous-time equivalent of expression (13) is

$$\frac{1}{P_t^{(\tau)}} = \mathbb{E}_t\!\left[e^{\int_t^{t+\tau} r(s)\,ds}\right] \tag{22}$$

This statement equates the guaranteed return from holding any zero-coupon bond to maturity with the total return expected from rolling over a series of short-term period bonds. The continuous-time equivalent of equation (14) is

$$\frac{-\partial P_t^{(\tau)} / \partial \tau}{P_t^{(\tau)}} = \mathbb{E}_t\!\left[r(t + \tau)\right] \tag{23}$$

The left-hand side of the equation is the current infinitesimal forward rate at time $t + \tau$, and the right-hand side is the expected future spot rate at $t + \tau$. Integrating the last equation and applying the boundary condition $P_t^{(0)} = 1$ gives

$$-\ln\!\left[P_t^{(\tau)}\right] = \int_t^{t+\tau} \mathbb{E}_t\!\left[r(s)\right] ds \tag{24}$$

Formulating the PEH in continuous time makes the pairwise incompatibility of equations (21), (22), and (24) transparent. If we define the random variable $\tilde{X} \equiv \exp\!\left(-\int_t^{t+\tau} r(s)\,ds\right)$, then these equations can be rewritten as

$$P = \mathbb{E}_t[\tilde{X}] \tag{25}$$

$$P^{-1} = \mathbb{E}_t[\tilde{X}^{-1}] \tag{26}$$

$$\ln(P) = \mathbb{E}_t[\ln \tilde{X}] \tag{27}$$

By invoking Jensen's inequality, one can show that $1/\mathbb{E}_t[\tilde{X}^{-1}] \le \exp(\mathbb{E}_t[\ln \tilde{X}]) \le \mathbb{E}_t[\tilde{X}]$, so the bond price is lowest under equation (22) and highest under equation (21). The yields to maturity implied from equations (21), (22), and (24) therefore satisfy the relationship (with some abuse of notation):

$$y_t^{(\tau)}(21) \le y_t^{(\tau)}(24) \le y_t^{(\tau)}(22) \tag{28}$$

In this model, it is also easy to see that the expected excess returns are positive in all hypotheses except in equation (21).

Perhaps the most influential result of Cox et al. [27] is the characterization of the PEH forms that can be the result of a (no-arbitrage) equilibrium model.
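The Jensen-inequality comparison of equations (25), (26), and (27) can be checked numerically. The following is a minimal sketch, not taken from Cox et al. [27]: the function name `implied_yields`, the 10-year horizon, and the normal distribution assumed for the average short rate are all hypothetical choices made only for illustration. It relies on the fact that, for any sample of positive numbers, the harmonic mean is at most the geometric mean, which is at most the arithmetic mean.

```python
import math
import random


def implied_yields(x_samples, tau):
    """Yields implied by the three PEH forms, given samples of the
    stochastic discount term X = exp(-integral of r over [t, t+tau])."""
    n = len(x_samples)
    # Equation (25): P = E[X] (arithmetic mean of X)
    p_25 = sum(x_samples) / n
    # Equation (26): 1/P = E[1/X] (harmonic mean of X)
    p_26 = 1.0 / (sum(1.0 / x for x in x_samples) / n)
    # Equation (27): ln P = E[ln X] (geometric mean of X)
    p_27 = math.exp(sum(math.log(x) for x in x_samples) / n)

    def to_yield(price):
        # Continuously compounded yield to maturity of a zero-coupon bond
        return -math.log(price) / tau

    return to_yield(p_25), to_yield(p_26), to_yield(p_27)


random.seed(0)
tau = 10.0  # assumed maturity in years
# Assumption: the average short rate over [t, t+tau] is normal(5%, 2%).
samples = [math.exp(-tau * random.gauss(0.05, 0.02)) for _ in range(10_000)]

y_25, y_26, y_27 = implied_yields(samples, tau)
print(f"yield under (25): {y_25:.6f}")
print(f"yield under (27): {y_27:.6f}")
print(f"yield under (26): {y_26:.6f}")
```

Under these assumptions the yield implied by equation (25) is the smallest and the yield implied by equation (26) is the largest, with the gap growing in the variance of the short rate and in the maturity, which is the Jensen effect the text describes.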
They examine whether there exist pricing kernels (i.e., prices of risk) that can satisfy the resulting pricing PDE in the economy and at the same time satisfy the form of the PEH under examination. They conclude that only equation (21) can be sustained by an equilibrium model. By definition, equation (21) implies that $\mathbb{Q} = \mathbb{P}$; therefore, selecting $\Lambda(t) = 0$ gives rise to a valid pricing kernel. Cox et al. [27] prove that the other forms do not give rise to a valid pricing kernel. However, McCulloch [57] later showed that their claim is incorrect. Generalizing a preexisting discrete-time model to continuous time, he shows that there exists an equilibrium economy that also gives rise to equation (23).

Expectation Hypothesis

As described above, the difference between the EH and the PEH is that the term premia under the PEH are assumed to be zero, whereas under the EH they are assumed to be constant. Therefore, to formulate the different forms of the EH we need to add in each form of the PEH a constant term that depends only upon the remaining time to maturity of the corresponding bond considered in each form of the EH.

Even though the different forms of the PEH are generally incompatible, Campbell [21] showed that the different forms of the EH are not incompatible and he derived a general equilibrium model that sustained several forms of the EH at the same time. His model is set up in continuous time. In addition, special cases of the models examined in [16] and [47] provide equilibrium models that give rise to constant term premia [36].

Next, we show the most commonly tested equations of the EH in the literature.

Tests of the Expectations Hypothesis

The EH has been under scrutiny at least since the work of Macaulay [54]. In this study, Macaulay emphasizes the low (given the EH is true) correlation between forward rates and subsequent spot rates. Since then, the EH has been tested in hundreds of studies, and in all of them, with only a few exceptions, it has been rejected. Some of the early papers that test the EH are those of Sutch [74], Shiller [69, 70, 71], Modigliani and Shiller [58], and Sargent [67, 68].

Fama [39, 40] and Fama and Bliss [41] also present challenges of the EH where they find evidence of rich patterns of variation in expected returns across time and maturities. Keim and Stambaugh [50], Fama and French [42], and Campbell and Ammer [23] show that yield spreads help to forecast excess return on bonds as well as on other long-term assets.

Perhaps the most widely cited tests of the EH are the Campbell and Shiller [24] regressions based on the equations:

$$\mathbb{E}_t\!\left[y_{t+m}^{(n-m)} - y_t^{(n)}\right] = \alpha_{nm} + \frac{m}{n-m}\left(y_t^{(n)} - y_t^{(m)}\right) \tag{29}$$

which are a more general form of the regressions in [30] based on the following equations:

$$\mathbb{E}_t\!\left[y_{t+1}^{(n-1)} - y_t^{(n)}\right] = \alpha_{n1} + \frac{1}{n-1}\left(y_t^{(n)} - r_t\right) \tag{30}$$

that are used to assess the goodness of fit of the different term structure models. The derivations of the above equations are shown in detail in the above-mentioned papers and in [73]. In short, from equation (4) we have that the one-period excess bond continuously compounded return is equal to

$$\mathbb{E}_t\!\left[r_{t\to t+1}^{(n)} - r_t\right] = -(n-1)\,\mathbb{E}_t\!\left[y_{t+1}^{(n-1)} - y_t^{(n)}\right] + \left(y_t^{(n)} - r_t\right) \tag{31}$$

and it can also be shown that it is equal to

$$\mathbb{E}_t\!\left[r_{t\to t+1}^{(n)} - r_t\right] = -(n-1)\,\mathbb{E}_t\!\left[c_{t+1}^{(n-1)} - c_t^{(n)}\right] + p_t^{(n-1)} \tag{32}$$

where $c_t^{(n)}$ and $p_t^{(n)}$ are the yield and forward premia defined in equations (19) and (20), respectively. The last expression implies that if the PEH holds (i.e., $c_t^{(n)} = 0$, $p_t^{(n)} = 0$) then the expected excess returns are zero, whereas if the EH holds (i.e., $c_t^{(n)} = c(n)$, $p_t^{(n)} = p(n)$), then the expected excess returns are constants that depend on the time to maturity $n$. Combining equations (31) and (32) gives rise to equation (30), the well-known LPY regressions of Dai and Singleton [30].

Campbell and Shiller [24] and Dai and Singleton [30], among others, document the failure of both the
regressions (31) and (32), which are true under the EH. According to these equations, the coefficients of the nonconstant terms when regressing $y_{t+m}^{(n-m)} - y_t^{(n)}$ onto $\frac{m}{n-m}(y_t^{(n)} - y_t^{(m)})$, or $\frac{1}{n-1}(y_t^{(n)} - r_t)$ if $m = 1$, should be equal to unity. Not only are the estimated coefficients not unity but also they are often statistically significantly negative, particularly for large $n$. This means that the EH fails more significantly for long-term bonds. The intuition of the EH is that increases in the slope of the term structure $(y_t^{(n)} - r_t)$ reflect expectations of rising future short spot rates. For the "buy an n-period bond and hold it to maturity" investment strategy to match, on average, the returns from rolling over short rates in a rising short rate environment, the price of the long bond should decrease such that the yield increases ($y_{t+1}^{(n-1)} > y_t^{(n)}$). The regression results suggest that the slope of the yield curve does not even forecast correctly the direction of the changes in the long-bond yields. The underreaction of long rates to changes in the term spread has also been studied in [56]. Elaborating and further documenting this underreaction, Campbell [22] finds that yield spreads do not forecast short-run changes in long yields (against EH) but do forecast long-run changes in short yields (consistent with EH).

Backus et al. [7] tested the EH by running regressions based on analogous equations for the forward rates:

$$\mathbb{E}_t\!\left[f_{t+1}^{(n-1,n)} - r_t\right] = \alpha + \left(f_t^{(n-1,n)} - r_t\right) \tag{33}$$

They also find that the regression coefficients of $(f_t^{(n-1,n)} - r_t)$ are not unity as the null hypothesis sets, but slightly less than one and significantly different from one. They also show that the small differences of the estimated coefficients from unity in the above regressions do not constitute separate or weaker findings against the EH than the deviations of the Campbell–Shiller coefficients from unity, but are actually the same. They constructed a one-factor model and showed that the small differences in the coefficients of the Backus et al. [7] regressions from their null value translate into large negative values for the Campbell–Shiller coefficients.

Similar to the above forward regressions are the forward term regressions tested in [43, 72]. Shiller et al. [72] use a log-linearized model [70, 71] that allows them to test several models of the EH without having to use discount rates (which are easily available for maturities up to a year, but for longer maturities the rates have to be constructed using spline methods) but with the use of the easily observable coupon-bond yields. One of their regressions is based on the EH equation:

$$\mathbb{E}_t\!\left[y_{t+m}^{(n-m)} - y_t^{(m)}\right] = \alpha_{n,m} + \left(f_t^{(m,n)} - y_t^{(m)}\right) \tag{34}$$

Using coupon-bond yields does not change the result that future bond yield changes cannot be predicted by the current term spread or forward spread. Even the direction cannot be predicted correctly. They suggest time-varying risk premia as a plausible solution to the failure of the EH. Froot [43] also tests the same equation, trying to understand whether its failure is the result of time-varying term premia or of expected future spot rates under- or overreacting to changes in short rates. He finds that for short maturities its failure is due to variation in term premia, but this is not true for long maturities.

A recent paper that has received a lot of attention is Cochrane and Piazzesi [26]. Cochrane and Piazzesi [26] have revisited the forecasting regressions of Fama and Bliss using the term structure of forward rates instead of a single forward rate. Their most notable finding is that the coefficients from regressing excess bond returns over one year onto the one-year forward rates for the next five years exhibit a tentlike shape for all maturity bonds. The tentlike shape similarities for all bond maturities suggest that a single common factor may be underlying the predictability of excess bond returns for all maturities. Another very interesting fact in [26] is the high $R^2$s generated in the above regressions. The $R^2$s range between 36% and 39%. This is substantially more predictability than in [41] using a single forward factor.

A series of other papers have also examined alternative reasons for the failure of the EH: Mankiw and Miron [55] find that interest rate movements were more predictable before the founding of the Federal Reserve in 1913, and the downward bias appears to be smaller in that period. Campbell and Ammer [23] emphasize that long-term bond yields vary primarily in response to changing expected inflation. Rudebusch [64] argues that contemporary Federal Reserve operating procedures lead to predictable interest rate movements in the very short run and the very long run, but tend to smooth away predictable movements in the medium run. Balduzzi et al. [8]
argue that spreads between short-term rates and the overnight federal funds rate are mainly driven by expectations of changes in the target, and not by the transitory dynamics of the overnight rate around the target. Hence, the bias in tests of the EH that they document can be mainly attributed to the erroneous anticipation of future changes in monetary policy.

Several studies have examined the small-sample bias of the regression coefficients. Bekaert and Hodrick [10] argue that the past use of large sample critical regions, instead of their small-sample counterparts, may have overstated the evidence against the expectations theory. They find that the evidence against the EH for these interest rates and exchange rates is much less strong than under asymptotic inference. Other studies, though, such as Backus et al. [7] and Bekaert, Hodrick, and Marshall [11], find that the small-sample properties of the regressions like the ones shown in this article are actually biased upward; this means the true Campbell–Shiller coefficients are more negative than the ones estimated in the regressions, heightening the puzzle related to the failure of the EH.

Researchers have also looked at the validity of the EH outside the United States. The tendency in these studies is to find Campbell–Shiller coefficients that are less than zero but less negative than the US data results. Some of those studies that are done primarily for European countries and show mixed results are Bekaert and Hodrick [10], Boero and Torricelli [14], Evans [37], Gerlach and Smets [44], Hardouvelis [48], and Kugler [51].

Conclusion

The EH constitutes several hypotheses that were generated to understand bond returns and their yields through the help of other maturity bond returns or investment strategies and forward rates. We showed that these hypotheses can be formulated in many different ways and using different models (continuous time vs discrete and continuous compounding vs discrete). The different hypotheses are not equivalent. Therefore, to test the validity of the EH, numerous different expressions have to be tested.

The consensus is that the EH fails in the US data. Its failure, though, is less strong or mixed for the non-US data. Researchers trying to understand the failure of the EH have hinted at different reasons that may give rise to time-varying expected excess returns on bonds. Even though the understanding of the failure of the EH is not complete, part of the literature is devoted to creating models that better capture this failure and that better replicate the data.

In this strand we can put the papers of reduced form (affine or nonaffine, with or without macrovariables) term structure models, such as Ahn et al. [1], Ang and Bekaert [2], Bansal and Zhou [9], Bikbov and Chernov [12, 13], Buraschi et al. [17], Dai and Singleton [29, 30, 32], Diebold et al. [33], Duarte [34], Evans [38], Leippold and Wu [52], and Naik and Lee [59], and those of the structural form models with or without macrovariables, such as Ang et al. [3, 6], Ang and Piazzesi [4, 5], Brandt and Wang [15], Buraschi and Jiltsov [18, 19], Dai [28], Greenwood and Vayanos [45], Guibaud et al. [46], Piazzesi [62], Piazzesi and Schneider [63], Rudebusch and Wu [65, 66], Vayanos and Vila [75], and Wachter [76].

Acknowledgments

The author thanks Aggie Moon for providing research assistantship. The author takes the responsibility for errors if any.

End Notes

a. Cochrane [25], Dai and Singleton [31], Duffie [35], Nielsen [60], Piazzesi [61], Singleton [73].
b. In the EH literature the term yield premium is used to denote the difference of the nth root of the terms in the left- and right-hand side of equation (13).
c. In the EH literature the term forward premium is used to denote the difference of the terms in the left- and right-hand side of equation (14).

References

[1] Ahn, D.-H., Dittmar, R.F. & Gallant, A.R. (2002). Quadratic term structure models: theory and evidence, Review of Financial Studies 15, 243–288.
[2] Ang, A. & Bekaert, G. (2002). Regime switches in interest rates, Journal of Business and Economic Statistics 20, 163–182.
[3] Ang, A., Dong, S. & Piazzesi, M. (2007). No-Arbitrage Taylor Rules, National Bureau of Economic Research.
[4] Ang, A. & Piazzesi, M. (2003a). A no-arbitrage vector autoregression of term structure dynamics with macroeconomic and latent variables, Journal of Monetary Economics 50, 745–787.
[5] Ang, A. & Piazzesi, M. (2003b). A no-arbitrage vector autoregression of term structure dynamics with macroeconomic and latent variables, Journal of Monetary Economics 50, 745–787.
[6] Ang, A., Piazzesi, M. & Wei, M. (2006). What does the yield curve tell us about the GDP growth? Journal of Econometrics 131, 359–403.
[7] Backus, D., Foresi, S., Mozumdar, A. & Wu, L. (2001). Predictable changes in yields and forward rates, Journal of Financial Economics 59, 281–311.
[8] Balduzzi, P., Bertola, G. & Foresi, S. (1997). A model of target changes and the term structure of interest rates, Journal of Monetary Economics 39, 223–249.
[9] Bansal, R. & Zhou, H. (2002). Term structure of interest rates with regime shifts, Journal of Finance 57, 1997–2044.
[10] Bekaert, G. & Hodrick, R.J. (2001). Expectations hypothesis tests, Journal of Finance 56, 1357–1394.
[11] Bekaert, G., Hodrick, R.J. & Marshall, D.A. (1997). On biases in tests of the expectations hypothesis of the term structure of interest rates, Journal of Financial Economics 44, 309–348.
[12] Bikbov, R. & Chernov, M. (2005). Term Structure and Volatility: Lessons from the Eurodollar Futures and Options. Working Paper, London Business School, London.
[13] Bikbov, R. & Chernov, M. (2006). No-Arbitrage Macroeconomic Determinants. Working Paper, London Business School.
[14] Boero, G. & Torricelli, C. (1997). The Expectations Hypothesis of the Term Structure: Evidence for Germany. Working Paper CRENoS 1997/4. Centre for North South Economic Research, University of Cagliari and Sassari, Sardinia, revised.
[15] Brandt, M.W. & Wang, K.Q. (2003). Time-varying risk aversion and unexpected inflation, Journal of Monetary Economics 50, 1457–1498.
[16] Breeden, D. (1986). Consumption, production and interest rates: a synthesis, Journal of Financial Economics 7, 265–296.
[17] Buraschi, A., Cieslak, A. & Trojani, F. (2007). Correlation Risk and the Term Structure of Interest Rates. Working Paper, Imperial College, U.K.
[18] Buraschi, A. & Jiltsov, A. (2005). Inflation risk premia and the expectations hypothesis, Journal of Financial Economics 75, 429–490.
[19] Buraschi, A. & Jiltsov, A. (2007). Term structure of interest rates implications of habit persistence, Journal of Finance 62, 3009–3063.
[20] Campbell, J.Y. (1986a). Bond and stock returns in a simple exchange model, Quarterly Journal of Economics 101, 785–803.
[21] Campbell, J.Y. (1986b). A defense of traditional hypotheses about the term structure of interest rates, Journal of Finance 41, 183–193.
[22] Campbell, J.Y. (1995). Some lessons from the yield curve, Journal of Economic Perspectives 9, 129–152.
[23] Campbell, J.Y. & Ammer, J. (1993). What moves the stock and bond markets? A variance decomposition for long-term asset returns, Journal of Finance 48, 3–37.
[24] Campbell, J.Y. & Shiller, R.J. (1991). Yield spreads and interest rate movements: A Bird's eye view, Review of Economic Studies 58, 495–514.
[25] Cochrane, J. (2000). Asset Pricing, Princeton University Press, Princeton.
[26] Cochrane, J. & Piazzesi, M. (2005). Bond risk premia, American Economic Review 95, 138–160.
[27] Cox, J.C., Ingersoll, J.E. & Ross, S.A. (1981). A re-examination of traditional hypotheses about the term structure of interest rates, Journal of Finance 36, 769–799.
[28] Dai, Q. (2003). Term Structure Dynamics in a Model with Stochastic Internal Habit. Working Paper, New York University.
[29] Dai, Q. & Singleton, K. (2000). Specification analysis of affine term structure models, Journal of Finance 55, 1943–1978.
[30] Dai, Q. & Singleton, K. (2002). Expectations puzzles, time-varying risk premia, and affine models of the term structure, Journal of Financial Economics 63, 415–442.
[31] Dai, Q. & Singleton, K. (2003a). Fixed-income pricing, in Handbook of Economics and Finance, C. Constantinides, M. Harris & R. Stulz, eds, North Holland, Amsterdam.
[32] Dai, Q. & Singleton, K. (2003b). Term structure dynamics in theory and reality, Review of Financial Studies 16, 631–678.
[33] Diebold, F., Rudebusch, G. & Aruoba, B. (2006). The macroeconomy and the yield curve: a dynamic latent factor approach, Journal of Econometrics 131, 309–338.
[34] Duarte, J. (2004). Evaluating an alternative risk preference in affine term structure models, Review of Financial Studies 17, 370–404.
[35] Duffie, D. (1996). Dynamic Asset Pricing Theory, Princeton University Press, Princeton.
[36] Dunn, K.B. & Singleton, K.J. (1986). Modeling the term structure of interest rates under non-separable utility and durability of goods, Journal of Financial Economics 17, 27–55.
[37] Evans, M.D. (2000). Regime Shifts, Risk, and the Term Structure. Working Paper, Georgetown University.
[38] Evans, M.D. (2003). Real risk, inflation risk, and the term structure, The Economic Journal 113, 345–389.
[39] Fama, E.F. (1984a). The information in the term structure, Journal of Financial Economics 13, 509–528.
[40] Fama, E.F. (1984b). Term premiums in bond returns, Journal of Financial Economics 13, 529–546.
[41] Fama, E.F. & Bliss, R.R. (1987). The information in long-maturity forward-rates, American Economic Review 77, 680–692.
[42] Fama, E.F. & French, K.R. (1989). Business conditions and expected returns on stocks and bonds, Journal of Financial Economics 29, 23–49.
[43] Froot, K.A. (1989). New hope for the expectations hypothesis of the term structure of interest rates, Journal of Finance 44, 283–305.
[44] Gerlach, S. & Smets, F. (1997). The term structure of Euro-rates: some evidence in support of the expectations hypothesis, Journal of International Money and Finance 16, 305–321.
[45] Greenwood, R. & Vayanos, D. (2008). Bond Supply and Excess Bond Returns. Working Paper, London School of Economics.
[46] Guibaud, S., Nosbusch, Y. & Vayanos, D. (2007). Preferred Habitat and the Optimal Maturity Structure of Government Debt. Working Paper, London School of Economics.
[47] Hansen, L.P. & Singleton, K. (1983). Stochastic consumption, risk aversion, and the temporal behavior of asset returns, Journal of Political Economy 91, 249–268.
[48] Hardouvelis, G. (1994). The term structure spread and future changes in long and short rates in G7 countries, Journal of Monetary Economics 33, 255–283.
[49] Hicks, J.R. (1939). Value and Capital, Oxford University Press, Oxford.
[50] Keim, D.B. & Stambaugh, R.F. (1986). Predicting returns in the stock and bond markets, Journal of Financial Economics 17, 357–390.
[51] Kugler, P. (1997). Central bank policy reaction and the expectations hypothesis of the term structure, International Journal of Financial Economics 2, 164–181.
[52] Leippold, M. & Wu, L. (2003). Design and estimation of quadratic term structure models, European Finance Review 7, 47–73.
[53] Lutz, F.A. (1940). The structure of interest rates, The Quarterly Journal of Economics 55, 36–63.
[54] Macaulay, F.R. (1938). Some Theoretical Problems Suggested by the Movements of Interest Rates, Bond Yields, and Stock Prices in the United States Since 1856. NBER Working Paper Series, New York.
[55] Mankiw, N.G. & Miron, J.A. (1986). The changing behavior of the term structure of interest rates, Quarterly Journal of Economics 101, 211–228.
[56] Mankiw, N.G. & Summers, L.H. (1984). Do long-term interest rates overreact to short-term rates? Brookings Papers on Economic Activity 1, 223–242.
[57] McCulloch, J.H. (1993). A reexamination of traditional hypotheses about the term structure: a comment, Journal of Finance 48, 779–789.
[58] Modigliani, F. & Shiller, R.J. (1973). Inflation, rational expectations, and the term structure of interest rates, Economica 40, 12–43.
[59] Naik, V. & Lee, M.H. (1997). Yield Curve Dynamics with Discrete Shifts in Economic Regimes: Theory and Estimation. Unpublished Working Paper, Faculty of Commerce, University of British Columbia.
[60] Nielsen, L.T. (1999). Pricing and Hedging of Derivative Securities, Oxford University Press, Oxford.
[61] Piazzesi, M. (2003). Affine term structure models, Handbook of Financial Econometrics, Elsevier, p. 828.
[62] Piazzesi, M. (2005). Bond yields and the federal reserve, Journal of Political Economy 113, 311–344.
[63] Piazzesi, M. & Schneider, M. (2007). Equilibrium yield curves, NBER/Macroeconomics Annual 21, 389–442.
[64] Rudebusch, G.D. (1995). Federal reserve interest rate targeting, rational expectations, and the term structure, Journal of Monetary Economics 35, 245–274.
[65] Rudebusch, G.D. & Wu, T. (2004a). A Macro-Finance Model of the Term Structure, Monetary Policy, and the Economy. Working Paper, Federal Reserve Bank of San Francisco.
[66] Rudebusch, G.D. & Wu, T. (2004b). The Recent Shift in Term Structure Behavior from a No-Arbitrage Macro-Finance Perspective. Working Paper, Federal Reserve Bank of San Francisco.
[67] Sargent, T.J. (1972). Rational expectations and the term structure of interest rates, Journal of Money, Credit and Banking 4, 74–97.
[68] Sargent, T.J. (1979). A note on maximum likelihood estimation of the rational expectations model of the term structure, Journal of Monetary Economics 5, 133–143.
[69] Shiller, R.J. (1972). Rational Expectations and the Term Structure of Interest Rates. Ph.D. Dissertation, MIT.
[70] Shiller, R.J. (1979). The volatility of long-term interest rates and expectations models of the term structure, Journal of Political Economy 87, 1190–1219.
[71] Shiller, R.J. (1981). Do stock prices move too much to be justified by subsequent changes in dividends? American Economic Review 71, 421–436.
[72] Shiller, R.J., Campbell, J.Y. & Schoenholtz, K.L. (1983). Forward rates and future policy: interpreting the term structure of interest rates, Brookings Papers on Economic Activity 14(1), 173–224.
[73] Singleton, K.J. (2006). Empirical Dynamic Asset Pricing, Princeton University Press, Princeton.
[74] Sutch, R.C. (1970). Expectations, risk, and the term structure of interest rates, Journal of Finance 25, 703.
[75] Vayanos, D. & Vila, J.-L. (2007). A Preferred-Habitat Model of the Term Structure of Interest Rates. Working Paper, London School of Economics.
[76] Wachter, J.A. (2006). A consumption-based model of the term structure of interest rates, Journal of Financial Economics 79, 365–399.

Further Reading

Longstaff, F.A. (2000). The term structure of very short-term rates: new evidence for the expectations hypothesis, Journal of Financial Economics 58, 397–415.
Sutch, R.C. (1968). Expectations, Risk, and the Term Structure of Interest Rates. Dissertation, MIT.

ANTONIOS SANGVINATSOS
Stochastic Discount Factors

Economic agents make investment decisions within active and liquid financial markets. Capital is allocated today in exchange for some future income stream. If there is no uncertainty regarding the future payoff of an investment opportunity, the yield that will be asked on the investment will equal the risk-free interest rate prevailing for the period from the time of investment until the time of the payoff. However, in the presence of any payoff uncertainty at the time of undertaking an investment venture, economic agents will typically ask for risk compensation, and thus for some investment-specific yield, which will discount the expected future payoff stream. The yields that particular agents ask for depend both on their statistical views on possible future outcomes and on their attitudes toward risk.

Yields vary across different investment opportunities and their interrelations are difficult to explain. For the same agent, a different discounting factor has to be used for every separate valuation occasion. If, however, one is ready to accept discounting that varies randomly with the possible outcomes, and therefore accepts the concept of a stochastic discount factor, then an economically consistent theory can be developed. Asset valuation becomes a matter of randomly discounting payoffs under different states of nature and weighing them according to the agent's probability structure. The advantage of this approach is that a single discounting mechanism suffices to describe how any asset is priced by the agent.

We discuss the theory of stochastic discount factors first in a discrete-time, finite state space and then in the more practical case of Itô-process models.

Stochastic Discount Factors in Discrete Probability Spaces

We start by introducing all relevant ideas in a very simple one-time-period framework with finitely many states of the world. There are plenty of textbooks with extensive exposition on these and other related themes (e.g., [1] or the first chapters of [9]; see also [5] for the general state-space case).

The Setup

Consider a very simplistic example of an economy, where there are only two dates of interest, represented by times $t = 0$ (today) and $t = T$ (the financial-planning horizon). There are several states of nature possible at time $T$ and, for the time being, these are represented as a finite set $\Omega$. Only one $\omega \in \Omega$ will be revealed at time $T$, but this is not known in advance today.

In the market, there is a baseline asset with a price process $S^0 = (S^0_t)_{t=0,T}$. Here, $S^0_0$ is a strictly positive constant and $S^0_T(\omega) > 0$ for all $\omega \in \Omega$. The process $\beta := S^0_0 / S^0$ is called the deflator. It is customary to regard this baseline asset as riskless, providing a simple annualized interest rate $r \in \mathbb{R}_+$ for investment from today to time $T$; in this case, $S^0_0 = 1$ and $S^0_T = 1 + rT$. This viewpoint is not adopted here, since it is unnecessary.

Together with the baseline asset, there exist $d$ other liquid traded assets whose prices $S^i_0$, $i = 1, \ldots, d$, today are known constants, but the prices $S^i_T$, $i = 1, \ldots, d$, at day $T$ depend on the outcome $\omega \in \Omega$, that is, they are random variables.

Agent Portfolio Selection via Expected Utility Maximization

Consider an economic agent in the market as described above. Faced with inherent uncertainty, the agent postulates some likelihood on the possible outcomes, modeled via a probability mass $P : \Omega \to [0, 1]$ with $\sum_{\omega \in \Omega} P(\omega) = 1$. This gives rise to a probability $\mathbb{P}$ on the subsets of $\Omega$ defined via $\mathbb{P}[A] = \sum_{\omega \in A} P(\omega)$ for all $A \subseteq \Omega$. This probability can either be subjective, that is, coming from views that are agent specific, or historical, that is, arising from statistical considerations via some estimation procedure.

Economic agents act in the market and optimally invest to maximize their satisfaction. Each agent has some preference structure on the possible future random payoffs that is represented here via the expected utility paradigm (see End Note a). There exists a continuously differentiable, increasing, and strictly concave function $U : \mathbb{R} \to \mathbb{R}$, such that the agent will prefer a random payoff $\xi : \Omega \to \mathbb{R}$ to another random payoff $\zeta : \Omega \to \mathbb{R}$ at time $T$ if and only if $\mathbb{E}_{\mathbb{P}}[U(\zeta)] \le \mathbb{E}_{\mathbb{P}}[U(\xi)]$, where $\mathbb{E}_{\mathbb{P}}$ denotes expectation with respect to the probability $\mathbb{P}$.
Starting with capital $x \in \mathbb{R}$, an economic agent chooses at day zero a strategy $\theta \equiv (\theta^1, \ldots, \theta^d) \in \mathbb{R}^d$, where $\theta^j$ denotes the units of the $j$th asset held in the portfolio. What remains, $x - \sum_{i=1}^d \theta^i S^i_0$, is invested in the baseline asset. If $X^{(x;\theta)}$ is the wealth generated starting from capital $x$ and investing according to $\theta$, then $X_0^{(x;\theta)} = x$ and

\[
X_T^{(x;\theta)} = \Big( x - \sum_{i=1}^d \theta^i S^i_0 \Big) \frac{S^0_T}{S^0_0} + \sum_{i=1}^d \theta^i S^i_T
= x \frac{S^0_T}{S^0_0} + \sum_{i=1}^d \theta^i \Big( S^i_T - \frac{S^0_T}{S^0_0} S^i_0 \Big) \tag{1}
\]

or, in deflated terms, $\beta_T X_T^{(x;\theta)} = x + \sum_{i=1}^d \theta^i (\beta_T S^i_T - S^i_0)$. The agent's objective is to choose a strategy in such a way as to maximize expected utility, that is, find $\theta_*$ such that

\[
\mathbb{E}_{\mathbb{P}}\big[ U\big( X_T^{(x;\theta_*)} \big) \big] = \sup_{\theta \in \mathbb{R}^d} \mathbb{E}_{\mathbb{P}}\big[ U\big( X_T^{(x;\theta)} \big) \big] \tag{2}
\]

The above problem will have a solution if and only if no arbitrages exist in the market. By definition, an arbitrage is a wealth generated from zero initial capital by some $\theta \in \mathbb{R}^d$ such that $\mathbb{P}[X_T^{(0;\theta)} \ge 0] = 1$ and $\mathbb{P}[X_T^{(0;\theta)} > 0] > 0$. It is easy to see that arbitrages exist in the market if and only if $\sup_{\theta \in \mathbb{R}^d} \mathbb{E}_{\mathbb{P}}[U(X_T^{(x;\theta)})]$ is not attained by some $\theta_* \in \mathbb{R}^d$. Assuming, then, the no-arbitrage (NA) condition, concavity of the function $\mathbb{R}^d \ni \theta \mapsto \mathbb{E}_{\mathbb{P}}[U(X_T^{(x;\theta)})]$ will imply that the first-order conditions

\[
\frac{\partial}{\partial \theta^i} \Big|_{\theta = \theta_*} \mathbb{E}_{\mathbb{P}}\big[ U\big( X_T^{(x;\theta)} \big) \big] = 0, \quad \text{for all } i = 1, \ldots, d \tag{3}
\]

will provide the solution $\theta_*$ to the problem. Since the expectation is just a finite sum, the differential operator can pass inside, and then the first-order conditions for optimality are

\[
0 = \mathbb{E}_{\mathbb{P}}\Big[ \frac{\partial}{\partial \theta^i} \Big|_{\theta = \theta_*} U\big( X_T^{(x;\theta)} \big) \Big]
= \mathbb{E}_{\mathbb{P}}\Big[ U'\big( X_T^{(x;\theta_*)} \big) \Big( S^i_T - \frac{S^0_T}{S^0_0} S^i_0 \Big) \Big], \quad i = 1, \ldots, d \tag{4}
\]

The above is a nonlinear system of $d$ equations to be solved for $d$ unknowns $(\theta_*^1, \ldots, \theta_*^d)$. Under NA, the system (4) has a solution $\theta_*$. Actually, under a trivial nondegeneracy condition in the market, the solution is unique; even if the optimal strategy $\theta_*$ is not unique, strict concavity of $U$ implies that the optimal wealth $X_T^{(x;\theta_*)}$ generated is unique.

A little bit of algebra on equation (4) gives, for all $i = 1, \ldots, d$,

\[
S^i_0 = \mathbb{E}_{\mathbb{P}}\big[ Y_T S^i_T \big], \quad \text{where} \quad
Y_T := \frac{U'\big( X_T^{(x;\theta_*)} \big)}{\mathbb{E}_{\mathbb{P}}\big[ (S^0_T / S^0_0)\, U'\big( X_T^{(x;\theta_*)} \big) \big]} \tag{5}
\]

Observe that since $U$ is continuously differentiable and strictly increasing, $U'$ is a strictly positive function, and therefore $\mathbb{P}[Y_T > 0] = 1$. Equation (5) also holds trivially for $i = 0$. Note that the random variable $Y_T$ obtained above depends on the utility function $U$, the probability $\mathbb{P}$, as well as on the initial capital $x \in \mathbb{R}$.

Definition 1 In the model described above, a process $Y = (Y_t)_{t=0,T}$ will be called a stochastic discount factor if $\mathbb{P}[Y_0 = 1, Y_T > 0] = 1$ and $S^i_0 = \mathbb{E}_{\mathbb{P}}[Y_T S^i_T]$ for all $i = 0, \ldots, d$.

If $Y$ is a stochastic discount factor, using equation (1), one can actually show that

\[
\mathbb{E}_{\mathbb{P}}\big[ Y_T X_T^{(x;\theta)} \big] = x, \quad \text{for all } x \in \mathbb{R} \text{ and } \theta \in \mathbb{R}^d \tag{6}
\]

In other words, the process $Y X^{(x;\theta)}$ is a $\mathbb{P}$-martingale for all $x \in \mathbb{R}$ and $\theta \in \mathbb{R}^d$.

Connection with Risk-neutral Valuation

Since $\mathbb{E}_{\mathbb{P}}[S^0_T Y_T] = S^0_0 > 0$, we can define a probability mass $Q$ by requiring that $Q(\omega) = (S^0_T(\omega) / S^0_0)\, Y_T(\omega) P(\omega)$, which defines a probability $\mathbb{Q}$ on subsets of $\Omega$ in the obvious way. Observe that, for any $A \subseteq \Omega$, $\mathbb{Q}[A] > 0$ if and only if $\mathbb{P}[A] > 0$; we say that the probabilities $\mathbb{P}$ and $\mathbb{Q}$ are equivalent and we denote this by $\mathbb{Q} \sim \mathbb{P}$. Now, rewrite equation (5) as

\[
S^i_0 = \mathbb{E}_{\mathbb{Q}}\big[ \beta_T S^i_T \big], \quad \text{for all } i = 0, \ldots, d \tag{7}
\]
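The construction in equations (1), (4), (5), and (7) can be traced numerically. The following is a minimal sketch under assumptions that are not in the article: a two-state market with one risky asset, made-up prices and probabilities, an exponential utility $U(w) = -e^{-aw}$, and a plain bisection search for the root of the first-order condition (4). It computes $\theta_*$, the induced $Y_T$, and the probability mass $Q$.

```python
import math

# Illustrative assumptions (not from the article): two states, one risky asset,
# exponential utility U(w) = -exp(-a * w), and made-up prices and probabilities.
P = [0.6, 0.4]                      # agent's probability mass P(omega)
S0_0, S0_T = 1.0, [1.05, 1.05]      # baseline asset today and at time T
S1_0, S1_T = 100.0, [120.0, 90.0]   # risky asset today and at time T
x, a = 100.0, 0.01                  # initial capital, risk-aversion parameter


def expect(values, probs):
    """Expectation of a state-by-state list of values under a probability mass."""
    return sum(p * v for p, v in zip(probs, values))


def u_prime(w):
    """Marginal utility U'(w) for U(w) = -exp(-a * w)."""
    return a * math.exp(-a * w)


# Excess payoff S_T^1 - (S_T^0 / S_0^0) * S_0^1 in each state, as in equation (1).
excess = [s - b / S0_0 * S1_0 for s, b in zip(S1_T, S0_T)]


def wealth(theta):
    """Terminal wealth X_T^{(x; theta)} from equation (1), state by state."""
    return [x * b / S0_0 + theta * e for b, e in zip(S0_T, excess)]


def foc(theta):
    """Right-hand side of the first-order condition (4); decreasing in theta."""
    return expect([u_prime(w) * e for w, e in zip(wealth(theta), excess)], P)


def solve_theta(lo=-1000.0, hi=1000.0, tol=1e-12):
    """Bisection for the root of foc on a bracket assumed to contain it."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if foc(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


theta_star = solve_theta()
marginal = [u_prime(w) for w in wealth(theta_star)]
norm = expect([b / S0_0 * m for b, m in zip(S0_T, marginal)], P)
Y_T = [m / norm for m in marginal]                      # equation (5)
Q = [b / S0_0 * y * p for b, y, p in zip(S0_T, Y_T, P)]  # risk-neutral mass
beta_T = [S0_0 / b for b in S0_T]                       # deflator at time T

print("theta*        :", theta_star)
print("E_P[Y_T S_T^1]:", expect([y * s for y, s in zip(Y_T, S1_T)], P))
print("Q             :", Q)
print("E_Q[b_T S_T^1]:", expect([b * s for b, s in zip(beta_T, S1_T)], Q))
```

In this toy market the state with the lower payoff receives the larger value of $Y_T$, and both pricing identities recover the assumed price of the risky asset. A different utility or initial capital would change $Y_T$ only through $U'$ evaluated at the optimal wealth.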
A probability $\mathbb{Q}$, equivalent to $\mathbb{P}$, with the property prescribed in equation (7) is called risk-neutral or an equivalent martingale measure. In this simple framework, stochastic discount factors and risk-neutral probabilities are in one-to-one correspondence. In fact, more can be said.

Theorem 1 [Fundamental Theorem of Asset Pricing] In the discrete model as described previously, the following three conditions are equivalent:

1. There are no arbitrage opportunities.
2. A stochastic discount factor exists.
3. A risk-neutral probability measure exists.

The fundamental theorem of asset pricing was first formulated by Ross [11] and it took 20 years to reach a very general version of it in general semimartingale models that are beyond the scope of our treatment here. The interested reader can check the monograph [3], where the history of the theorem and all its proofs are presented.

The Important Case of the Logarithm

The most well-studied case of utility on the real line is $U(x) = \log(x)$, both because of its computational simplicity and for the theoretical value that it has. Since the logarithmic function is only defined on the strictly positive real line, it does not completely fall in the aforementioned framework, but it is easy to see that the described theory is still valid.

Consider an economic agent with logarithmic utility that starts with initial capital $x = 1$. Call $X^* = X^{(1;\theta_*)}$ the optimal wealth corresponding to log-utility maximization. The fact that $U'(x) = 1/x$ allows us to define a stochastic discount factor $Y^*$ via $Y^*_0 = 1$ and

\[
Y^*_T = \frac{1}{X^*_T \, \mathbb{E}_{\mathbb{P}}\big[ 1 / (\beta_T X^*_T) \big]} \tag{8}
\]

From $\mathbb{E}_{\mathbb{P}}[Y^*_T X^*_T] = 1$, it follows that $\mathbb{E}_{\mathbb{P}}[1/(\beta_T X^*_T)] = 1$ and therefore $Y^* = 1/X^*$. This simple relationship between the log-optimal wealth and the stochastic discount factor that is induced by it is one of the keys to characterizing the existence of stochastic discount factors in more complicated models and their relationship with absence of free lunches. It finds good use in the section Stochastic Discount Factors for Itô Processes for the case of models using Itô processes.

Arbitrage-free Prices

For a claim with random payoff $H_T$ at time $T$, an arbitrage-free (AF) price $H_0$ is a price at time zero such that the extended market that consists of the original traded assets with asset prices $S^i$, $i = 0, \ldots, d$, augmented by the new claim, remains AF. If the claim is perfectly replicable, that is, if there exist $x \in \mathbb{R}$ and $\theta \in \mathbb{R}^d$ such that $X_T^{(x;\theta)} = H_T$, it is easily seen that the unique AF price for the claim is $x$. However, it is frequently the case that a newly introduced claim is not perfectly replicable using the existing liquid assets. In that case, there exists more than one AF price for the claim; actually, the set of all the possible AF prices is $\{ \mathbb{E}_{\mathbb{P}}[Y_T H_T] \mid Y \text{ is a stochastic discount factor} \}$. To see this, first pick a stochastic discount factor $Y$ and set $H_0 = \mathbb{E}_{\mathbb{P}}[Y_T H_T]$; then, $Y$ remains a stochastic discount factor for the extended market, which therefore does not allow for any arbitrage opportunities. Conversely, if $H_0$ is an AF price for the new claim, we know from Theorem 1 that there exists a stochastic discount factor $Y$ for the extended market, which satisfies $H_0 = \mathbb{E}_{\mathbb{P}}[Y_T H_T]$ and is trivially a stochastic discount factor for the original market. The result we just mentioned justifies the appellation "Fundamental theorem of asset pricing" for Theorem 1.

Utility Indifference Pricing

Suppose that a new claim promising some random payoff at time $T$ is issued. Depending on the claim's present traded price, an economic agent might be inclined to take a long or short position; this will depend on whether the agent considers the market price low or high, respectively. There does exist a market price level of the claim that will make the agent indifferent to going long or short on an infinitesimal (see End Note b) amount of the asset. This price level is called the indifference price. In the context of claim valuation, utility indifference prices were introduced in [2] (see End Note c); however, they had been widely used previously in economics. Indifference prices depend on the particular agent's views, preferences, and portfolio structure, and should not be confused with market prices, which are established by the forces of supply and demand.

Since the discussed preference structures are based on expected utility, it makes sense to try and understand quantitatively how utility indifference prices are
formed. Under the present setup, consider a claim with random payoff $H_T$ at time $T$. The question we wish to answer is this: what is the indifference price $H_0$ of this claim today for an economic agent?

For the time being, let $H_0$ be any price set by the market for the claim. The agent will invest in the risky assets and will hold $\theta$ units of them, as well as the new claim, taking a position of $\epsilon$ units. Then, the agent's terminal payoff is

\[
X_T^{(x;\theta,\epsilon)} := X_T^{(x;\theta)} + \epsilon \Big( H_T - \frac{S^0_T}{S^0_0} H_0 \Big) \tag{9}
\]

The agent will again maximize expected utility, that is, will invest $(\theta_*, \epsilon_*) \in \mathbb{R}^d \times \mathbb{R}$ such that

\[
\mathbb{E}_{\mathbb{P}}\big[ U\big( X_T^{(x;\theta_*,\epsilon_*)} \big) \big] = \sup_{(\theta,\epsilon) \in \mathbb{R}^d \times \mathbb{R}} \mathbb{E}_{\mathbb{P}}\big[ U\big( X_T^{(x;\theta,\epsilon)} \big) \big] \tag{10}
\]

If $H_0$ is the agent's indifference price, it must follow that $\epsilon_* = 0$ in the above maximization problem; then, the agent's optimal decision regarding the claim would be not to buy or sell any units of the asset. In particular, the concave function $\mathbb{R} \ni \epsilon \mapsto \mathbb{E}_{\mathbb{P}}[U(X_T^{(x;\theta_*,\epsilon)})]$ should achieve its maximum at $\epsilon = 0$. First-order conditions give that $H_0$ is the agent's indifference price if

\[
0 = \frac{\partial}{\partial \epsilon} \Big|_{\epsilon = 0} \mathbb{E}_{\mathbb{P}}\big[ U\big( X_T^{(x;\theta_*,\epsilon)} \big) \big]
= \mathbb{E}_{\mathbb{P}}\Big[ U'\big( X_T^{(x;\theta_*,0)} \big) \Big( H_T - \frac{S^0_T}{S^0_0} H_0 \Big) \Big] \tag{11}
\]

A remark is in order before writing down the indifference-pricing formula. The strategy $\theta_*$ that has been appearing above represents the optimal holding in the liquid traded assets when all assets and the claim are available; it is not, in general, the agent's optimal asset holdings if the claim were not around. Nevertheless, if the solution of problem (10) is such that the optimal holdings in the claim are $\epsilon_* = 0$, then $\theta_*$ are also the agent's optimal asset holdings if there had been no claim to begin with. In other words, if $\epsilon_* = 0$, $X_T^{(x;\theta_*,0)}$ is exactly the same quantity $X_T^{(x;\theta_*)}$ that appears in equation (4). Remembering the definition of the stochastic discount factor $Y_T$ of equation (5), we can write

\[
H_0 = \mathbb{E}_{\mathbb{P}}[Y_T H_T] \tag{12}
\]

It is important to observe that $Y_T$ depends on a number of factors, namely, the probability $\mathbb{P}$, the utility $U$, and the initial capital $x$, but not on the particular claim to be valued. Thus, we need only one evaluation of the stochastic discount factor and we can use it to find indifference prices with respect to all kinds of different claims.

State Price Densities

For a fixed $\omega \in \Omega$, consider an Arrow–Debreu security that pays off a unit of account at time $T$ if the state of nature is $\omega$, and pays off nothing otherwise. The indifference price of this security for the economic agent is $p(\omega) := Y_T(\omega) P(\omega)$. Since $Y$ appears as the density of the "state price" $p$ with respect to the probability $\mathbb{P}$, stochastic discount factors are also termed state price densities in the literature. For two states of nature $\omega$ and $\omega'$ of $\Omega$ such that $Y_T(\omega) < Y_T(\omega')$, an agent who uses the stochastic discount factor $Y$ would consider $\omega'$ a more unfavorable state than $\omega$ and would be inclined to pay more for insurance against adverse market movements.

Comparison with Real-world Valuation

Only for the purpose of what is presented here, assume that $S^0_0 = 1$ and $S^0_T = 1 + rT$ for some $r \in \mathbb{R}_+$. Let $Y$ be a stochastic discount factor; then, we have $1 = S^0_0 = \mathbb{E}_{\mathbb{P}}[Y_T S^0_T] = (1 + rT)\, \mathbb{E}_{\mathbb{P}}[Y_T]$. Pick any claim with random payoff $H_T$ at time $T$ and use $H_0 = \mathbb{E}_{\mathbb{P}}[Y_T H_T]$ to write

\[
H_0 = \frac{1}{1 + rT} \mathbb{E}_{\mathbb{P}}[H_T] + \operatorname{cov}_{\mathbb{P}}(Y_T, H_T) \tag{13}
\]

where $\operatorname{cov}_{\mathbb{P}}(\cdot, \cdot)$ is used to denote covariance of two random variables with respect to $\mathbb{P}$. The first term $(1 + rT)^{-1} \mathbb{E}_{\mathbb{P}}[H_T]$ of the above formula describes "real-world" valuation for an agent who would be neutral under his views $\mathbb{P}$ in facing the risk coming from the random payoff $H_T$. This risk-neutral attitude is usually absent: agents require compensation for the risk they undertake, or might even feel inclined to pay more for a security that will insure them in cases of unfavorable outcomes. This is exactly mirrored by the
correction factor $\operatorname{cov}_{\mathbb{P}}(Y_T, H_T)$ appearing in equation (13). If the covariance of $Y_T$ and $H_T$ is negative, the claim tends to pay more when $Y_T$ is low. By the discussion in the section State Price Densities, this means that the payoff will be high in states that are not greatly feared by the agent, who will therefore be inclined to pay less than what the real-world valuation gives. On the contrary, if the covariance of $Y_T$ and $H_T$ is positive, $H_T$ will pay off higher in dangerous states of nature for the agent (where $Y_T$ is also high), and the agent's indifference price will be higher than the real-world valuation.

Stochastic Discount Factors for Itô Processes

The Model

Uncertainty is modeled via a probability space $(\Omega, \mathcal{F}, \mathbf{F}, \mathbb{P})$, where $\mathbf{F} = (\mathcal{F}_t)_{t \in [0,T]}$ is a filtration representing the flow of information. The market consists of a locally riskless savings account whose price process $S^0$ satisfies $S^0_0 > 0$ and

\[
\frac{dS^0_t}{S^0_t} = r_t \, dt, \quad t \in [0, T] \tag{14}
\]

for some $\mathbf{F}$-adapted, positive short-rate process $r = (r_t)_{t \in \mathbb{R}_+}$. It is obvious that $S^0_t = S^0_0 \exp\big( \int_0^t r_u \, du \big)$ for $t \in [0, T]$. We define the deflator $\beta$ via

\[
\beta_t = \frac{S^0_0}{S^0_t} = \exp\Big( - \int_0^t r_u \, du \Big), \quad t \in [0, T] \tag{15}
\]

The movement of $d$ risky assets will be modeled via Itô processes:

\[
\frac{dS^i_t}{S^i_t} = b^i_t \, dt + \langle \sigma^{\cdot i}_t, dW_t \rangle, \quad t \in \mathbb{R}_+, \quad i = 1, \ldots, d \tag{16}
\]

Here, $b = (b^1, \ldots, b^d)$ is the $\mathbf{F}$-adapted $d$-dimensional process of appreciation rates, $W = (W^1, \ldots, W^m)$ is an $m$-dimensional $\mathbb{P}$-Brownian motion representing the sources of uncertainty in the market, and $\langle \cdot, \cdot \rangle$ denotes the usual inner product notation: $\langle \sigma^{\cdot i}_t, dW_t \rangle = \sum_{j=1}^m \sigma^{ji}_t \, dW^j_t$, where $(\sigma^{ji})_{1 \le j \le m,\, 1 \le i \le d}$ is the $\mathbf{F}$-adapted $(m \times d)$-matrix-valued process whose entry $\sigma^{ji}_t$ represents the impact of the $j$th source of uncertainty on the $i$th asset at time $t \in [0, T]$. With "$\top$" denoting transposition, $c := \sigma^\top \sigma$ is the $d \times d$ local covariation matrix. To avoid degeneracies in the market, it is required that $c_t$ has full rank for all $t \in [0, T]$, $\mathbb{P}$-almost surely (a.s.). This implies, in particular, that $d \le m$: there are at least as many sources of uncertainty in the market as there are liquid assets to hedge away the uncertainty risk. Models of this sort are classical in the quantitative finance literature; see, for example, [8].

Definition 2 A risk premium is any $m$-dimensional, $\mathbf{F}$-adapted process $\lambda$ satisfying $\sigma^\top \lambda = b - r\mathbf{1}$, where $\mathbf{1}$ is the $d$-dimensional vector with all unit entries.

The terminology "risk premium" is best explained in the case $d = m = 1$; then $\lambda = (b - r)/\sigma$ is the premium over the risk-free rate that investors require per unit of risk associated with the (only) source of uncertainty. In the general case, $\lambda^j$ can be interpreted as the premium required for the risk associated with the $j$th source of uncertainty, represented by the Brownian motion $W^j$. In incomplete markets, when $d < m$, Proposition 1 shows all the different choices for $\lambda$. Each choice will parameterize the different risk attitudes of different investors. In other words, risk premia characterize the possible stochastic discount factors, as is revealed in Theorem 3.

If $m = d$, the equation $\sigma^\top \lambda = b - r\mathbf{1}$ has only one solution: $\lambda^* = \sigma c^{-1} (b - r\mathbf{1})$. If $d < m$ there are many solutions, but they can be characterized using easy linear algebra.

Proposition 1 The risk premia are exactly all processes of the form $\lambda = \lambda^* + \kappa$, where $\lambda^* := \sigma c^{-1} (b - r\mathbf{1})$ and $\kappa$ is any adapted process with $\sigma^\top \kappa = 0$.

If $\lambda = \lambda^* + \kappa$ in the notation of Proposition 1, then $\langle \lambda^*, \kappa \rangle = (b - r\mathbf{1})^\top c^{-1} \sigma^\top \kappa = 0$. Then, $|\lambda|^2 = |\lambda^*|^2 + |\kappa|^2$, where $|\lambda^*|^2 = \langle b - r\mathbf{1}, c^{-1} (b - r\mathbf{1}) \rangle$.

Stochastic Discount Factors

The usual method of obtaining stochastic discount factors in continuous time is through risk-neutral measures. The fundamental theorem of asset pricing in the present Itô-process setting states that absence of free lunches with vanishing risk (see End Note d) is equivalent to the existence of a probability $\mathbb{Q} \sim \mathbb{P}$ such that $\beta S^i$ is (only) a local $\mathbb{Q}$-martingale for all $i = 0, \ldots, d$. (For
the definition of local martingales, check, e.g., [7].) In that case, by defining $Y$ via $Y_t = \beta_t \, (d\mathbb{Q}/d\mathbb{P})|_{\mathcal{F}_t}$, $Y S^i$ is a local $\mathbb{P}$-martingale for all $i = 0, \ldots, d$. The last property is taken here as the definition of a stochastic discount factor.

Definition 3 Consider the above Itô-process setup. A stochastic process $Y$ is called a stochastic discount factor if

• $Y_0 = 1$ and $Y_T > 0$, $\mathbb{P}$-a.s.
• $Y S^i$ is a local $\mathbb{P}$-martingale for all $i = 0, 1, \ldots, d$.

In the case where $Y S^0$ is an actual martingale, that is, $\mathbb{E}_{\mathbb{P}}[Y_T S^0_T] = S^0_0$, a risk-neutral measure $\mathbb{Q}$ is readily defined via the recipe $d\mathbb{Q} = (Y_T S^0_T / S^0_0) \, d\mathbb{P}$. However, this is not always the case, as Example 1 below will show. Therefore, existence of a stochastic discount factor is a weaker notion than existence of a risk-neutral measure. For some practical applications, though, these differences are unimportant. There is further discussion of this point later in the section Stochastic Discount Factors and Equivalent Martingale Measures.

Example 1 Let $S^0 \equiv 1$ and $S^1$ be a three-dimensional Bessel process with $S^1_0 = 1$. If $\mathbf{F}$ is the natural filtration of $S^1$, it can be shown that the only stochastic discount factor is $Y = 1/S^1$, which is a strict local martingale in the terminology of [4].

Credit Constraints on Investment

In view of the theoretical possibility of continuous trading, to avoid so-called doubling strategies (and for the fundamental theorem of asset pricing to hold), credit constraints have to be introduced. The wealth of agents has to be bounded from below by some constant, representing the credit limit. Shifting the wealth appropriately, one can assume that the credit limit is set to zero; therefore, only positive wealth processes are allowed in the market.

Since only strictly positive processes are considered, it is more convenient to work with proportions of investment, rather than absolute quantities as was the case in the section Stochastic Discount Factors in Discrete Probability Spaces. Pick some $\mathbf{F}$-adapted process $\pi = (\pi^1, \ldots, \pi^d)$. For $i = 1, \ldots, d$ and $t \in [0, T]$, the number $\pi^i_t$ represents the percentage of capital in hand invested in asset $i$ at time $t$. In that case, $\pi^0 = 1 - \sum_{i=1}^d \pi^i$ will be invested in the savings account. Denote by $X^\pi$ the wealth generated by starting from unit initial capital ($X^\pi_0 = 1$) and investing according to $\pi$. Then,

\[
\frac{dX^\pi_t}{X^\pi_t} = \sum_{i=0}^d \pi^i_t \frac{dS^i_t}{S^i_t} = \big( r_t + \langle \pi_t, b_t - r_t \mathbf{1} \rangle \big) \, dt + \langle \sigma_t \pi_t, dW_t \rangle \tag{17}
\]

To ensure that the above wealth process is well defined, we must assume that

\[
\int_0^T | \langle \pi_t, b_t - r_t \mathbf{1} \rangle | \, dt < +\infty \quad \text{and} \quad \int_0^T \langle \pi_t, c_t \pi_t \rangle \, dt < +\infty, \quad \mathbb{P}\text{-a.s.} \tag{18}
\]

The set of all $d$-dimensional, $\mathbf{F}$-adapted processes $\pi$ that satisfy equation (18) is denoted by $\Pi$. A simple use of the integration-by-parts formula gives the following result:

Proposition 2 If $Y$ is a stochastic discount factor, then $Y X^\pi$ is a local martingale for all $\pi \in \Pi$.

Connection with "No Free Lunch" Notions

The next line of business is to obtain an existence result about stochastic discount factors in the present setting, also connecting their existence to an NA-type notion. Remember, from the section The Important Case of the Logarithm, the special stochastic discount factor that is the reciprocal of the log-optimal wealth process. We proceed somewhat heuristically to compute the analogous processes for the Itô-process model. The linear stochastic differential equation (17) has the following solution, expressed in logarithmic terms:

\[
\log X^\pi = \int_0^\cdot \Big( r_t + \langle \pi_t, b_t - r_t \mathbf{1} \rangle - \frac{1}{2} \langle \pi_t, c_t \pi_t \rangle \Big) dt + \int_0^\cdot \langle \sigma_t \pi_t, dW_t \rangle \tag{19}
\]

Assuming that the local martingale term $\int_0^\cdot \langle \sigma_t \pi_t, dW_t \rangle$ in equation (19) is an actual martingale, the aim is to maximize the expectation of the drift
term. Notice that we can actually maximize the drift pathwise if we choose the portfolio $\pi_* = c^{-1} (b - r\mathbf{1})$. We need to ensure that $\pi_*$ is in $\Pi$. It is easy to see that the equations in (18) are both satisfied if and only if $\int_0^T |\lambda^*_t|^2 \, dt < \infty$ $\mathbb{P}$-a.s., where $\lambda^* := \sigma c^{-1} (b - r\mathbf{1})$ is the special risk premium of Proposition 1. Under this assumption, $\pi_* \in \Pi$. Call $X^* = X^{\pi_*}$ and define

\[
Y^* := \frac{1}{X^*} = \beta \exp\Big( - \int_0^\cdot \langle \lambda^*_t, dW_t \rangle - \frac{1}{2} \int_0^\cdot |\lambda^*_t|^2 \, dt \Big) \tag{20}
\]

Using the integration-by-parts formula, it is rather straightforward to check that $Y^*$ is a stochastic discount factor. In fact, the ability to define $Y^*$ is the way to establish that a stochastic discount factor exists, as the next result shows.

Theorem 2 For the Itô-process model considered above, the following are equivalent.

1. The set of stochastic discount factors is nonempty.
2. $\int_0^T |\lambda^*_t|^2 \, dt < \infty$, $\mathbb{P}$-a.s.; in that case, $Y^*$ defined in equation (20) is a stochastic discount factor.
3. For any $\epsilon > 0$, there exists $\ell = \ell(\epsilon) \in \mathbb{R}_+$ such that $\mathbb{P}[X^\pi_T > \ell] < \epsilon$ uniformly over all portfolios $\pi \in \Pi$.

The interested reader is referred to [6], where the property of the market described in statement 3 of the above theorem is termed No Unbounded Profit with Bounded Risk.

The next structural result about the stochastic discount factors in the Itô-process setting reveals the importance of $Y^*$ as a building block.

Theorem 3 Assume that $\mathbf{F}$ is the filtration generated by the Brownian motion $W$. Then, any stochastic discount factor $Y$ in the previous Itô-process model can be decomposed as $Y = Y^* N^\kappa$, where $Y^*$ was defined in equation (20) and

\[
N^\kappa_t = \exp\Big( - \int_0^t \langle \kappa_u, dW_u \rangle - \frac{1}{2} \int_0^t |\kappa_u|^2 \, du \Big), \quad \forall t \in [0, T] \tag{21}
\]

where $\kappa$ is an $m$-dimensional $\mathbf{F}$-adapted process with $\sigma^\top \kappa = 0$.

If the assumption that $\mathbf{F}$ is generated by $W$ is removed, one still obtains a similar result with $N^\kappa$ being replaced by any positive $\mathbf{F}$-martingale $N$ with $N_0 = 1$ that is strongly orthogonal to $W$. The specific representation obtained in Theorem 3 comes from the martingale representation theorem of Brownian filtrations; see, for example, [7].

Stochastic Discount Factors and Equivalent Martingale Measures

Consider an agent who uses a stochastic discount factor $Y$ for valuation purposes. There is a possibility that $Y S^i$ could be a strict local $\mathbb{P}$-martingale for some $i = 0, \ldots, d$, which would mean that (see End Note e) $S^i_0 > \mathbb{E}_{\mathbb{P}}[Y_T S^i_T]$. The last inequality is puzzling in the sense that the agent's indifference price for the $i$th asset, which is $\mathbb{E}_{\mathbb{P}}[Y_T S^i_T]$, is strictly lower than the market price $S^i_0$. In such a case, the agent would be expected to wish to short some units of the $i$th asset. This is indeed what is happening; however, because of credit constraints, this strategy is infeasible. Example 2 below illustrates this fact.

Before presenting the example, an important issue should be clarified. One might rush to state that such "inconsistencies" are tied to the notion of a stochastic discount factor as it appears in Definition 3, which is strictly weaker than existence of a probability $\mathbb{Q} \sim \mathbb{P}$ that makes all discounted processes $\beta S^i$ local $\mathbb{Q}$-martingales for $i = 0, \ldots, d$. However, even if such a probability did exist, $\beta S^i$ could be a strict local $\mathbb{Q}$-martingale for some $i = 1, \ldots, d$; in that case, $S^i_0 > \mathbb{E}_{\mathbb{Q}}[\beta_T S^i_T]$ and the same mispricing problem persists.

Example 2 Let $S^0 \equiv 1$, $S^1$ be the reciprocal of a three-dimensional Bessel process starting at $S^1_0 = 1$ under $\mathbb{Q}$ and $\mathbf{F}$ be the filtration generated by $S^1$. Here, $\mathbb{Q}$ is the only equivalent local martingale measure and $1 = S^1_0 > \mathbb{E}_{\mathbb{Q}}[S^1_T]$ for all $T > 0$. This is a complete market: an agent can start with capital $\mathbb{E}_{\mathbb{Q}}[S^1_T]$ and invest in a way so that at time $T$ the wealth generated is exactly $S^1_T$. Naturally, the agent would like to go as long as possible in this replicating portfolio and as short as possible in the actual asset. However, in doing so, the possible downside risk is infinite throughout the life of the investment and the enforced credit constraints will disallow such strategies.
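The strict inequality $1 = S^1_0 > \mathbb{E}_{\mathbb{Q}}[S^1_T]$ of Example 2 can be checked by simulation. The sketch below rests on two facts that are not stated in the article and are brought in here as assumptions: a three-dimensional Bessel process started at 1 has the law of the Euclidean norm of a three-dimensional Brownian motion started at a point of norm 1, and the expectation of the reciprocal norm at time $T$ has the closed form $\operatorname{erf}(1/\sqrt{2T})$. The function names, sample size, and seed are illustrative choices.

```python
import math
import random


def mc_reciprocal_bessel3(T, n_paths=200_000, seed=0):
    """Monte Carlo estimate of E[1 / R_T], R a 3-d Bessel process with R_0 = 1.

    R_T is sampled as the norm of a 3-d Brownian motion started at (1, 0, 0),
    so only the time-T marginal is simulated, not the whole path.
    """
    rng = random.Random(seed)
    sd = math.sqrt(T)
    total = 0.0
    for _ in range(n_paths):
        bx = 1.0 + sd * rng.gauss(0.0, 1.0)
        by = sd * rng.gauss(0.0, 1.0)
        bz = sd * rng.gauss(0.0, 1.0)
        total += 1.0 / math.sqrt(bx * bx + by * by + bz * bz)
    return total / n_paths


def exact_reciprocal_bessel3(T):
    """Closed form erf(1 / sqrt(2 T)) for E[1 / R_T] when R_0 = 1."""
    return math.erf(1.0 / math.sqrt(2.0 * T))


for horizon in (0.25, 1.0, 4.0):
    estimate = mc_reciprocal_bessel3(horizon)
    exact = exact_reciprocal_bessel3(horizon)
    print(f"T = {horizon}: Monte Carlo {estimate:.4f}, closed form {exact:.4f}")
```

Both the estimate and the closed form stay strictly below the market price $S^1_0 = 1$ and decrease as the horizon grows, which is the gap between market price and replication price that the example describes.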
In the context of Example 2, the law of one price fails, since the asset that provides payoff $S^1_T$ at time $T$ has a market price $S^1_0$ and a replication price $\mathbb{E}_{\mathbb{Q}}[S^1_T] < S^1_0$. Therefore, if the law of one price is to be valid in the market, one has to insist on existence of an equivalent (true) martingale measure $\mathbb{Q}$, where each discounted process $\beta S^i$ is a true (and not only local) $\mathbb{Q}$-martingale for all $i = 0, \ldots, d$. For pricing purposes then, it makes sense to ask that the stochastic discount factor $Y^\kappa$ that is chosen according to Theorem 3 is such that $Y^\kappa S^i$ is a true $\mathbb{P}$-martingale for all $i = 0, \ldots, d$. Such stochastic discount factors give rise to probabilities $\mathbb{Q}^\kappa$ that make all deflated asset-price processes $\mathbb{Q}^\kappa$-martingales and can be used as pricing measures.

Let us now specialize to the important "diffusion" case where $r_t = r \in \mathbb{R}$ for all $t \in [0, T]$ and $\sigma_t = \eta(t, S_t)$ for all $t \in [0, T]$, where $\eta$ is a nice function with values in the space of $(m \times d)$-matrices. As long as a claim written only on the traded assets is concerned, the choice of $\mathbb{Q}^\kappa$ for pricing is irrelevant, since the asset prices under $\mathbb{Q}^\kappa$ have dynamics

\[
\frac{dS^i_t}{S^i_t} = r_t \, dt + \langle \sigma^{\cdot i}_t, dW^\kappa_t \rangle, \quad \forall t \in [0, T], \quad i = 1, \ldots, d \tag{22}
\]

where $W^\kappa$ is a $\mathbb{Q}^\kappa$-Brownian motion. However, if one is interested in pricing a claim written on a nontraded asset whose price process $Z$ has $\mathbb{P}$-dynamics

\[
dZ_t = a_t \, dt + \langle f_t, dW_t \rangle, \quad t \in [0, T] \tag{23}
\]

for $\mathbf{F}$-adapted $a$ and $f = (f^1, \ldots, f^m)$, then the $\mathbb{Q}^\kappa$-dynamics of $Z$ are

\[
dZ_t = \big( a_t - \langle f_t, \lambda^*_t \rangle - \langle f_t, \kappa_t \rangle \big) \, dt + \langle f_t, dW^\kappa_t \rangle, \quad \forall t \in [0, T] \tag{24}
\]

The dynamics of $Z$ will be independent of the choice of $\kappa$ only if the volatility structure of the process $Z$, given by $f$, is in the range of $\sigma$. This will mean that $\langle f, \kappa \rangle = 0$ for all $\kappa$ such that $\sigma^\top \kappa = 0$ and that $Z$ is perfectly replicable using the traded assets. As long as there is any randomness in the movement of $Z$ that cannot be captured by investing in the traded assets, that is, if there exists some $\kappa$ with $\sigma^\top \kappa = 0$ and $\langle f, \kappa \rangle$ not being identically zero, perfect replicability fails and pricing becomes a more complicated issue, depending on the preferences of the particular agent as given by the choice of $\kappa$ to form the stochastic discount factor.

End Notes

a. One can impose natural conditions on preference relations defined on the set of all possible outcomes that will lead to numerical representation of the preference relationship via expected utility maximization. This was axiomatized in [10]; see also Chapter 2 of [5] for a nice exposition.
b. We stress "infinitesimal" because when the portfolio holdings of the agent change, the indifference prices also change; thus, for large sales or buys that will considerably change the portfolio structure, there might appear an incentive, that was not there before, to sell or buy the asset.
c. For this reason, utility indifference prices are sometimes referred to as Davis prices.
d. Free lunches with vanishing risk is the suitable generalization of the notion of arbitrages to get a version of the fundamental theorem of asset pricing in continuous time. The reader is referred to [3].
e. The inequality follows because positive local martingales are supermartingales; see, for example, [7].

References

[1] Cochrane, J.H. (2001). Asset Pricing, Princeton University Press.
[2] Davis, M.H.A. (1997). Option pricing in incomplete markets, in Mathematics of Derivative Securities (Cambridge, 1995), Publications of the Newton Institute, Cambridge University Press, Cambridge, Vol. 15, pp. 216–226.
[3] Delbaen, F. & Schachermayer, W. (2006). The Mathematics of Arbitrage, Springer Finance, Springer-Verlag, Berlin.
[4] Elworthy, K.D., Li, X.-M. & Yor, M. (1999). The importance of strictly local martingales; applications to radial Ornstein-Uhlenbeck processes, Probability Theory and Related Fields 115, 325–355.
[5] Föllmer, H. & Schied, A. (2004). Stochastic Finance, extended Edition, de Gruyter Studies in Mathematics, Walter de Gruyter & Co., Berlin, Vol. 27.
[6] Karatzas, I. & Kardaras, C. (2007). The numéraire portfolio in semimartingale financial models, Finance and Stochastics 11, 447–493.
[7] Karatzas, I. & Shreve, S.E. (1991). Brownian Motion and Stochastic Calculus, 2nd Edition, Graduate Texts in Mathematics, Springer-Verlag, New York, Vol. 113.
[8] Karatzas, I. & Shreve, S.E. (1998). Methods of Mathematical Finance, Applications of Mathematics (New York), Springer-Verlag, New York, Vol. 39.
[9] Lamberton, D. & Lapeyre, B. (1996). Introduction to Stochastic Calculus Applied to Finance, Chapman & Hall, London. Translated from the 1991 French original by Nicolas Rabeau and François Mantion.
[10] von Neumann, J. & Morgenstern, O. (2007). Theory of Games and Economic Behavior, anniversary edition, Princeton University Press, Princeton, NJ. With an introduction by Harold W. Kuhn and an afterword by Ariel Rubinstein.
[11] Ross, S.A. (1976). The arbitrage theory of capital asset pricing, Journal of Economic Theory 13, 341–360.

Related Articles

Arrow–Debreu Prices; Change of Numeraire; Complete Markets; Equivalent Martingale Measures; Fundamental Theorem of Asset Pricing; Pricing Kernels.

CONSTANTINOS KARDARAS
Utility Function

Behavior and Preferences

Modern utility theory studies preference orderings over choice sets and their numerical representations. Consider a decision maker (DM) who has to choose among a set X of alternatives. The set X is called the DM's choice set. In the deterministic case, which is our focus here, alternatives are certain, without any uncertainty. For example, in consumer theory, the DM is a consumer and X is the consumption set that he/she faces, that is, a subset of ℝⁿ whose elements x = (x₁, …, xₙ) represent the consumption bundles available to consumers. In intertemporal choice problems, X is a subset of ℝ^∞, the space of sequences {xₜ}, t = 1, 2, …, where xₜ is the DM's outcome at time t. Alternatives become more complicated objects under uncertainty, such as random variables in one-period problems and stochastic processes in intertemporal problems. This more general case is not considered here.

DMs have some preferences over the elements of X; they may like some alternatives more than others or may be indifferent among some of them. For example, in consumer theory consumers will rank consumption bundles in their consumption sets according to their tastes.

This motivates the introduction of preference orderings ≽ defined on the choice set X. The ordering ≽ has the following interpretation: for any two vectors x and y in X, we write x ≽ y if the DM either strictly prefers x to y or is indifferent between the two.

The ordering ≽ is the basic primitive of the theory. The following two relations are derived from ≽:

1. for any two vectors x and y in X, we write x ≻ y if the DM strictly prefers x to y. Formally, x ≻ y if x ≽ y, but not y ≽ x;
2. for any two vectors x and y in X, we write x ∼ y if the DM is indifferent between x and y. Formally, x ∼ y if both x ≽ y and y ≽ x.

On the preference ordering ≽, which is the theory's "raw material," some properties are considered.

Axiom 1 (Transitivity). For any three elements x, y, and z in X, if x ≽ y and y ≽ z, then x ≽ z.

Transitivity is a rationality assumption. Its violation generates cycles, for example, x ≽ y ≽ z ≻ x. The most troublesome consequence of such cycles is that there might not exist a best element in the choice set X. For example, suppose that X = {x, y, z} and that x ≻ y and y ≻ z. If transitivity is violated, we get the cycle x ≻ y ≻ z ≻ x and there is no best element in X.

Axiom 2 (Completeness). For any two elements x and y in X, x ≽ y, y ≽ x, or both.

This is a simple, but not innocuous, property. A DM's preference ≽ satisfies this property if, when faced with any two alternatives in X, he/she can always say which one he/she prefers. As alternatives may be very different, this might be a strong requirement (see [1, 6], for weakenings of this assumption).

Note that Axiom 2 implies reflexivity, that is, x ≽ x for all x ∈ X. When ≽ is reflexive and transitive (e.g., when it satisfies Axioms 1 and 2), following the consumer theory terminology, we call indifference curves the equivalence classes [x] = {y ∈ X : y ∼ x} for any x ∈ X. We denote the collection {[x] : x ∈ X} of all indifference curves by X/∼, which is a partition of X. That is, each x ∈ X belongs to one, and only one, indifference curve.

Axioms 1 and 2 do not depend on any particular structure of the set X. In most applications, however, X is a subset of an ordered vector space (V, ≥), that is, of a space V that has both a vector and an order structure. The space ℝⁿ endowed with the natural order ≥ is an important example of an ordered vector space. Given any x, y ∈ ℝⁿ, when the vectors x and y are regarded as consumption bundles, x ≥ y means that the bundle x has at least as much of each good as the bundle y, while the convex combination αx + (1 − α)y is interpreted as a mix of the two vectors (implicitly we are assuming that goods are suitably divisible).

The following axioms are based on the order and vector structures of X. For simplicity, we assume X ⊆ ℝⁿ, though most of what follows holds in more general ordered vector spaces with units. Here, x > y means x ≥ y and x ≠ y (i.e., xᵢ > yᵢ for at least some i = 1, …, n).

Axiom 3 (Monotonicity). For any two elements x and y in X ⊆ ℝⁿ, if x > y, then x ≻ y.
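
On a finite choice set, Axioms 1 to 3 can be checked by brute force. The following is only a minimal sketch in Python: the function names, the cyclic relation on {x, y, z}, and the two preferences on a small grid of two-good bundles are all hypothetical illustrations, not part of the theory itself. A preference is passed as a function pref(x, y) that returns True when x is weakly preferred to y.

```python
from itertools import product

def is_complete(X, pref):
    """Axiom 2: for every pair, pref(x, y) or pref(y, x) holds (or both)."""
    return all(pref(x, y) or pref(y, x) for x, y in product(X, repeat=2))

def is_transitive(X, pref):
    """Axiom 1: pref(x, y) and pref(y, z) together imply pref(x, z)."""
    return all(pref(x, z)
               for x, y, z in product(X, repeat=3)
               if pref(x, y) and pref(y, z))

def is_monotone(X, pref):
    """Axiom 3 on a finite set of vectors: x > y implies x strictly preferred."""
    def greater(x, y):
        # x > y means x >= y componentwise and x != y
        return x != y and all(a >= b for a, b in zip(x, y))
    def strictly(x, y):
        return pref(x, y) and not pref(y, x)
    return all(strictly(x, y) for x, y in product(X, repeat=2) if greater(x, y))

def best_elements(X, pref):
    """Elements that are weakly preferred to every alternative in X."""
    return [x for x in X if all(pref(x, y) for y in X)]

# Hypothetical cyclic preference on X = {x, y, z}: x over y, y over z, z over x.
X = ["x", "y", "z"]
cycle_pairs = {("x", "x"), ("y", "y"), ("z", "z"),
               ("x", "y"), ("y", "z"), ("z", "x")}
def cyclic(a, b):
    return (a, b) in cycle_pairs

# Hypothetical preferences on a grid of two-good bundles.
bundles = list(product(range(3), repeat=2))
def by_total(x, y):
    return sum(x) >= sum(y)      # ranks bundles by total quantity
def by_first_good(x, y):
    return x[0] >= y[0]          # ignores the second good entirely

print(is_complete(X, cyclic), is_transitive(X, cyclic), best_elements(X, cyclic))
print(is_transitive(bundles, by_total), is_monotone(bundles, by_total))
print(is_monotone(bundles, by_first_good))
```

In this sketch the cyclic relation is complete but not transitive, and its set of best elements is empty, which is the difficulty described for cycles. The preference that ignores the second good is complete and transitive but fails Axiom 3, because adding more of the second good leaves the DM indifferent.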
This axiom connects the order ≥ on X and the DM's preference relation ≽. In the context of consumer theory, it says that "the more, the better." In particular, given two vectors x and y with x ≥ y, it is enough that x has strictly more of at least some good i to be strictly preferred to y. This means that all goods are "essential," that is, the DM pays attention to each of them. Moreover, observe that, by Axiom 3 and reflexivity, x ≥ y implies x ≽ y. This is because x ≥ y if either x = y or x > y.

The following two axioms rely on the vector structure of X.

Axiom 4 (Archimedean). Suppose that x, y, and z are any three elements of a convex X ⊆ ℝⁿ such that x ≻ y ≻ z. Then there exist α, β ∈ (0, 1) such that αx + (1 − α)z ≻ y ≻ βx + (1 − β)z.

According to this axiom, there are no infinitely preferred or infinitely despised alternatives. That is, given any pairs x ≻ y and y ≻ z, alternative x cannot be infinitely better than y, and alternative z cannot be infinitely worse than y. Indeed, we can always mix x and z to get better alternatives, that is, αx + (1 − α)z, or worse alternatives, that is, βx + (1 − β)z, than y.

It may be useful to remember the analogous property that holds for real numbers: if x, y, and z are real numbers with x > y > z, then there exist α, β ∈ (0, 1) such that αx + (1 − α)z > y > βx + (1 − β)z. This property does not hold any more if we consider ∞ and −∞, that is, the extended real line ℝ̄ = [−∞, ∞]. Specifically, let x = ∞ or z = −∞. In this case, x is infinitely larger than y, z is infinitely smaller than y, and there are no α, β ∈ (0, 1) that satisfy the previous inequality. In fact, α∞ = ∞ and β(−∞) = −∞ for all α, β ∈ (0, 1).

Axiom 5 (Convexity). Given any two elements x and y of a convex set X ⊆ ℝⁿ, if x ∼ y then αx + (1 − α)y ≽ x for all α ∈ [0, 1].

This axiom captures a preference for mixing: given any two indifferent alternatives, the DM always prefers any of their combinations to each of the original alternatives. This preference for mixing is often assumed in applications and is a convexity property of indifference curves (see End Note a), the modern counterpart of the classic assumption of diminishing marginal utility.

Summing up, we have introduced a few properties that are often assumed on the preference ≽. All these axioms are behavioral, that is, they are expressed in terms of choice behavior. In particular, their behavioral meaning is transparent and, with the exception of the Archimedean axiom, they are all behaviorally falsifiable by suitable choice patterns. For example, one can show that a DM does not satisfy the transitivity axiom by finding alternatives x, y, z ∈ X over which his/her choices exhibit the cycle x ≽ y ≽ z ≻ x. This choice pattern would be enough to reject the hypothesis that his/her preference over X is transitive.

The use of preference axioms that have a transparent behavioral interpretation and that are falsifiable through choice behavior is the main methodological tenet of modern utility theory, often called the revealed preference methodology. In fact, choice behavior data are regarded as the only observable data that economic theories can rely upon.

Another important methodological feature of modern utility theory is that it adopts a weak notion of rationality, which requires only the consistency of choices without any demand on their motives. For example, transitivity is viewed as a rationality requirement in this sense because its violations would entail inconsistent patterns of choices that no DM would consciously follow, regardless of his/her motivations (see [15], for a recent discussion of this methodological issue).

Paretian Utility Functions

Although the preference ordering ≽ is the fundamental notion, for analytical convenience it is often of interest to find a numerical representation of ≽. Such numerical representations are called utility functions; formally, a real-valued function u : X → ℝ is a (Paretian) utility function if, for any pair x, y ∈ X,

$$x \succeq y \quad \text{if and only if} \quad u(x) \ge u(y) \tag{1}$$

In particular, for the derived relations ≻ and ∼ it holds, respectively, x ≻ y if and only if u(x) > u(y) and x ∼ y if and only if u(x) = u(y). Indifference curves can thus be written in terms of utility functions as [x] = {y ∈ X : u(y) = u(x)}.

Utility functions are analytically very convenient, but do not have any intrinsic psychological meaning: what matters is that they numerically rank vectors in the same way as the preference ordering ≽. This implies, inter alia, that every monotone
transformation of a utility function is still a utility function, that is, utility functions are invariant under monotone transformations. To see why this is the case, let u(X) = {u(x) : x ∈ X} ⊆ ℝ be the range of u and f : u(X) → ℝ a (strictly) monotone function, that is, t > s implies f(t) > f(s) for any scalars t, s ∈ u(X). Clearly, x ≽ y if and only if (f ∘ u)(x) ≥ (f ∘ u)(y) for any pair x, y ∈ X, and this shows that the transformation f ∘ u is still a utility function.

Example 1 A classic utility function u : ℝ²₊₊ → ℝ is the Cobb–Douglas utility function:

$$u(x, y) = x^{a} y^{1-a} \quad \text{with} \quad 0 \le a \le 1 \tag{2}$$

Suppose a preference ≽ is represented by a Cobb–Douglas utility function. Then, ≽ is also represented by the following monotone transformations of u:

1. $\lg(u(x, y)) = \lg(x^{a} y^{1-a}) = a \lg x + (1 - a) \lg y$;
2. $\sqrt{u(x, y)} = \sqrt{x^{a} y^{1-a}} = x^{a/2} y^{(1-a)/2}$; and
3. $u(x, y)^{3} = x^{3a} y^{3(1-a)}$.

In view of this invariance under monotone transformations, the utility theory presented here is often called ordinal utility theory. Observe that in this ordinal theory, utility differences such as u(x) − u(y) are of no interest. This is because inequalities such as u(x) − u(y) ≥ u(z) − u(w) have no meaning in this setup: given any such inequality, it is easy to come up with monotone transformations f : ℝ → ℝ such that (f ∘ u)(x) − (f ∘ u)(y) < (f ∘ u)(z) − (f ∘ u)(w). An important consequence of this observation is that incremental ratios of utility functions defined on subsets of ℝⁿ have no interest, except for their sign. For example, the classic notion of decreasing marginal utility, which is based on properties of the partial derivatives ∂u(x)/∂xₖ, is thus meaningless in ordinal utility theory.

In applications, utility functions u : X → ℝ are often used in optimization problems

$$\max_{x \in C} u(x) \tag{3}$$

where C is a suitable subset of the choice set X, determined by possible constraints that limit the DM's choices. For example, in consumer theory, C is given by the budget set

$$C = \left\{ x \in X : \sum_{i=1}^{n} p_i x_i \le w \right\} \tag{4}$$

where w is the consumer's wealth and each pᵢ is the price per unit of good i.

It is immediately seen that the solutions of the optimization problem (3) are the same, regardless of what monotone transformation of u is selected to make calculations. On the other hand, all these monotone transformations represent the same preference ≽ and the solutions reflect only the DM's basic preference ≽, not the particular utility function used to represent ≽. This further shows that ≽ is the fundamental notion. The choice of which u to use, among all equivalent monotone transformations, is only a matter of analytical convenience (e.g., in the Cobb–Douglas case, it is often convenient to use the logarithmic version a lg x + (1 − a) lg y).

The optimization problems (3), which play a key role in economics, also illustrate the analytical importance of utility functions. In fact, a numerical representation of preferences allows us to use the powerful methods of optimization theory to find and characterize the solutions of problem (3), which would be otherwise impossible if we were to rely only on the preference ≽. In other words, though the study of the preference ≽ is what gives ordinal utility theory its scientific status by making it a behaviorally founded and falsifiable theory, it is its numerical representation provided by utility functions that gives the theory its operational content.

Given the importance of utility functions, the main problem of ordinal utility theory is to establish conditions under which the preference ordering ≽ admits a utility representation. This is not a simple problem. We first state an existence result for the special case when the collection X/∼ of indifference curves is at most countable.

Theorem 1 A preference ordering ≽ defined on a choice set X with X/∼ at most countable satisfies Axioms 1 and 2 if and only if there exists a function u : X → ℝ such that equation (1) holds.

Proof. See [12], page 14.

Matters are more complicated when the collection X/∼ is uncountable. It is easy to come up with examples of preferences that satisfy Axioms 1 and 2 and
do not admit a utility representation (see Example 2). We refer to [2, 12, 18] for general representation theorems. Here we establish an existence result for the important special case of preferences defined on ℝⁿ, based on [3]. It is closely related to Theorems 3.3 and 3.6 of Fishburn (1970). For brevity, we omit its proof.

Write x ≤ ∞ (respectively, x ≥ −∞) when either x ∈ ℝⁿ or xᵢ = ∞ (respectively, xᵢ = −∞) for each i. That is, x ≤ ∞ or x ≥ −∞ means that either each xᵢ is finite or each xᵢ is infinite. A subset of ℝⁿ is a closed order interval if, given −∞ ≤ y < z ≤ ∞, it has the form [y, z] = {x ∈ ℝⁿ : y ≤ x ≤ z} and is an open order interval if it has the form (y, z) = {x ∈ ℝⁿ : yᵢ < xᵢ < zᵢ for each i}. The half-open order intervals [y, z) and (y, z] are similarly defined. For example, [z, ∞) = {x ∈ ℝⁿ : x ≥ z}, and so [0, ∞) = ℝⁿ₊.

A function u : X → ℝ is monotone if x > y implies u(x) > u(y) and is quasiconcave if its upper sets {x : u(x) ≥ t} are convex for all t ∈ ℝ [16]. Since {y : u(y) ≥ u(x)} = {y : y ≽ x}, the quasiconcavity of u implies the convexity of the upper contour sets of indifference curves (cf. End Note a).

Theorem 2 For a preference ordering ≽ defined on an order interval X ⊆ ℝⁿ, the following conditions are equivalent:

1. ≽ satisfies Axioms 1–4 and
2. there exists a monotonic and continuous function u : X → ℝ such that equation (1) holds.

Moreover, Axiom 5 holds if and only if u is quasiconcave.

Theorem 2 is an important result. Almost every work in economics contains a utility function, often defined on order intervals of ℝⁿ and assumed to be monotone and quasiconcave. Theorem 2 shows the behavioral conditions that underlie this key modeling assumption.

By Theorem 2, the convexity Axiom 5 is equivalent to the quasiconcavity of the utility function u. This is a substantially weaker property than the concavity of u, which would require u(αx + (1 − α)y) ≥ αu(x) + (1 − α)u(y) for all x, y ∈ X and all α ∈ [0, 1]. For example, any increasing function u : ℝ → ℝ is automatically quasiconcave.

Since concave utility functions are often used in applications because of their remarkable properties in optimization problems, a natural question is whether, among all monotone transformations f ∘ u of a quasiconcave utility function u, there exists a concave one; this would ensure the existence of a concave representation of a preference ≽ that satisfies Axiom 5. This important question was first studied by de Finetti [11], who showed that there exist quasiconcave functions that do not have any concave monotone transformation. Hence, convex indifference curves are not necessarily determined by a concave utility function (the converse is obviously true) and quasiconcavity in Theorem 2 cannot be improved to concavity. Inter alia, the seminal paper of de Finetti started the study of quasiconcave functions, later substantially developed by Fenchel [8], which is arguably the most important generalization of concavity.

Finally, observe that the utility function in Theorem 2 is continuous even though none of the axioms involves any topological notion. This is a remarkable consequence of the order and vector structures that the axioms use.

We close with an example of a preference that does not admit a utility representation.

Example 2 Lexicographic preferences are a classic example of preference orderings that do not admit a utility representation. Set X = ℝ² and say that x ≽ y if either x₁ > y₁ or x₁ = y₁ and x₂ ≥ y₂. That is, the DM first looks at the first coordinate: if x₁ > y₁, then x ≽ y. However, if x₁ = y₁, then the DM turns his/her attention to the second coordinate: if x₂ ≥ y₂, then x ≽ y. This is how dictionaries order words and this motivates the name of this particular ordering. Although they satisfy Axioms 1–3, it can be proved ([18], pages 24–25) that lexicographic preferences do not admit a utility representation (it is easy to check that they do not satisfy the Archimedean axiom).

Brief Historical Remarks

The early development of utility theory is surveyed in the two 1950 articles of George Stigler [24]. Here it is worth noting that originally utility functions were regarded as a primitive notion whose role was to quantify a Benthamian pain/pleasure calculus. In other words, utility functions were viewed as a measure or a quantification of an underlying physiological phenomenon. This view of utility theory is sometimes called cardinalism and utility
functions derived within this approach are called cardinal utility functions. A key feature of cardinalism is that utility differences and their ratios are meaningful notions that quantify differences in pain/pleasure that DMs experience among different quantities of the outcomes. In particular, marginal utilities measure the marginal pain/pleasure that results from choices and these played a central role in the early cardinal consumer theory.

However, the difficulty of any reliable scientific measurement of cardinal utility raised serious doubts on the scientific status of cardinalism. At the end of the nineteenth century Pareto revolutionized utility theory by showing that an ordinal approach, based on indifference curves as a primitive notion (unlike Edgeworth [7], who introduced them as level curves of an original cardinal utility function), was enough for consumer theory purposes [20]. In particular, Pareto showed that the classic consumer problem could be solved and characterized by replacing marginal utilities with marginal rates of substitution along indifference curves. For example, the classic key assumption of diminishing marginal utilities is replaced by the convexity property (Axiom 5) of indifference curves (the latter is actually a stronger property, unless utility functions are separable).

Unlike cardinal utility functions, indifference curves and their properties can be empirically determined and tested. Pareto's insight thus represented a key methodological advance and his ordinal approach, later substantially extended by Hicks and Allen [17, 23], is today the mainstream version of consumer theory. More generally, Pareto's ordinal revolution paved the way to the modern use of preferences as the primitive notion of decision theory. In fact, the use of preferences is the natural conceptual development of Pareto's original insight of considering indifference curves as a primitive notion. The first appearance of preferences as primitive notions seems to be in [9, 13]. They earned their current central theoretical place in decision theory with the classic works [4, 9, 12].

The utility theory under certainty outlined here reached its maturity in the 1960s (see, e.g., [5]). Subsequent work on decision theory has been mainly concerned with choice under uncertainty, extending the scope of the seminal contributions [9, 10, 19, 21, 22]. We refer the reader to [14] for a thorough and updated introduction to these more recent advances.

End Notes

a. Observe that this convexity property of indifference curves is weaker than the convexity of their upper contour sets {y ∈ X : y ≽ x}.

References

[1] Aumann, R. (1962). Utility theory without the completeness axiom, Econometrica 30, 445–462.
[2] Bridges, D.S. & Mehta, G.B. (1995). Representations of Preference Orderings, Springer-Verlag, Berlin.
[3] Cerreia-Vioglio, S., Maccheroni, F., Marinacci, M. & Montrucchio, L. (2009). Uncertainty Averse Preferences, mimeo.
[4] Debreu, G. (1959). Theory of Value, Yale University Press.
[5] Debreu, G. (1964). Continuity properties of Paretian utility, International Economic Review 5, 285–293.
[6] Dubra, J., Maccheroni, F. & Ok, E.A. (2004). Expected utility theory without the completeness axiom, Journal of Economic Theory 115, 118–133.
[7] Edgeworth, F.Y. (1881). Mathematical Psychics: An Essay on the Application of Mathematics to the Moral Sciences, Kegan Paul, London.
[8] Fenchel, W. (1953). Convex Cones, Sets, and Functions, Princeton University Press, Princeton.
[9] de Finetti, B. (1931). Sul significato soggettivo della probabilità, Fundamenta Mathematicae 18, 298–329.
[10] de Finetti, B. (1937). La prévision: ses lois logiques, ses sources subjectives, Annales de l'Institut Henri Poincaré 7, 1–68.
[11] de Finetti, B. (1949). Sulle stratificazioni convesse, Annali di Matematica Pura ed Applicata 30, 173–183.
[12] Fishburn, P.C. (1970). Utility Theory for Decision Making, Wiley, New York.
[13] Frisch, R. (1926). Sur un problème d'économie pure, Norsk Matematisk Forenings Skrifter 1, 1–40.
[14] Gilboa, I. (2009). Theory of Decision under Uncertainty, Cambridge University Press, Cambridge.
[15] Gilboa, I., Maccheroni, F., Marinacci, M. & Schmeidler, D. (2009). Objective and subjective rationality in a multiple priors model, Econometrica, forthcoming.
[16] Greenberg, H.J. & Pierskalla, W.P. (1971). A review of quasi-convex functions, Operations Research 19, 1553–1570.
[17] Hicks, J.R. & Allen, R.G.D. (1934). A reconsideration of the theory of value I, II, Economica 1, 52–76, 196–219.
[18] Kreps, D.M. (1988). Notes on the Theory of Choice, Westview Press, London.
[19] von Neumann, J. & Morgenstern, O. (1947). Theory of Games and Economic Behavior, 2nd Edition, Princeton University Press, Princeton.
[20] Pareto, V. (1906). Manuale di Economia Politica, Società Editrice Libraria, Milano.
[21] Ramsey, F.P. (1931). Truth and probability, in Foundations of Mathematics and other Essays, R.B. Braithwaite, ed., Routledge.
[22] Savage, L.J. (1954). The Foundations of Statistics, Wiley, New York.
[23] Slutsky, E. (1915). Sulla teoria del bilancio del consumatore, Giornale degli Economisti 51, 1–26.
[24] Stigler, G.J. (1950). The development of utility theory I, II, Journal of Political Economy 58, 307–327, 373–396.

Related Articles

Expected Utility Maximization: Duality Methods; Expected Utility Maximization; Recursive Preferences; Risk Aversion; Utility Indifference Valuation; Utility Theory: Historical Perspectives.

MASSIMO MARINACCI

Recursive Preferences

The standard additive utility model defines time-t utility for a discrete-time consumption process {cₜ ; t = 1, …, T} as

$$U_t = E_t \sum_{s=t}^{T} e^{-\beta(s-t)} u(c_s) = E_t\{u(c_t) + e^{-\beta} U_{t+1}\} \tag{1}$$

where Eₜ denotes the conditional expectation. The virtue of the model is its simplicity: only discounted probabilities and the function u determine preferences. However, the additive treatment of states and times precludes the model from distinguishing between aversion to variability in consumption across states and across time. In fact, the agent's preferences are entirely determined by preferences over deterministic consumption streams (see [23]). Furthermore, because agents care only about the distribution of future consumption, they do not care about the temporal resolution of uncertainty.

A more flexible preference model is obtained with the Kreps and Porteus [14] recursive specification (see also [11]):

$$U_t = F(c_t, E_t u(U_{t+1})), \quad U_T = v(c_T) \tag{2}$$

where the aggregator function F models intertemporal substitution and u the aversion to risk in next period's utility. The popular Epstein and Zin [12] model is the special case characterized by scale-invariant preferences (Uₜ homogeneous in (cₜ, Uₜ₊₁) and v(c) = c) and constant elasticity of substitution. The stochastic differential utility (SDU) formulation

$$U_t = E_t \left\{ \int_{s=t}^{T} \left[ b(c_s, U_s)\,ds + \tfrac{1}{2}\, a(U_s)\, d[U, U]_s \right] \right\} \tag{3}$$

where [·, ·] denotes quadratic variation, was obtained by Duffie and Epstein [8] as the continuous-time limit of recursive utility. Time-additive utility is the special case b(c, U) = u(c) − βU and a = 0. Skiadas [22] shows that SDU includes the robust control formulations of Anderson et al. [1], Hansen et al. [13], and Maenhout [17]. It is straightforward to show that SDU also includes the continuous-time limit of Chew [3] and Dekel [7] preferences.

In this paper, we examine the generalized SDU model, given in differential form by equation (5). This preference model was introduced by Lazrak and Quenez [15] to unify the recursive formulation of Duffie and Epstein [8] and the multiple-prior formulation of Chen and Epstein [2]. Schroder and Skiadas [19] and Skiadas [23] (see also [20] for the case with jumps) show that the more flexible form of the aggregator allows preferences to depend on the source of risk (e.g., domestic versus foreign), as well as first-order risk aversion (which imposes a higher penalty for small levels of risk) in addition to the standard second-order risk aversion dependence in equation (3) (see End Note a).

Relative to the time-additive model, the loss of tractability under generalized SDU is surprisingly small and mainly confined to the complete-markets setting. In the case of power utility, for example, once incompleteness or market constraints are imposed, the additive problem is no simpler to solve than a more general class of scale-invariant (homothetic) recursive utility. The tractability of the most popular additive utility models is obtained not from additivity but from the scale or translation invariance property. The second and third sections examine the recursive classes with these invariance properties and show that their solution essentially reduces to solving a single constrained backward stochastic differential equation.

After defining the preferences and markets in the second and third sections, the general solution to the optimal portfolio and consumption problem is presented in the fourth section. The solution is obtained by first characterizing the utility supergradient density (a generalization of marginal utility) and the state-price density. The state-price result is useful in other asset-pricing applications because it characterizes the set of pricing operators consistent with no arbitrage in a general market setting (see End Note b). The optimal consumption process is obtained by equating a supergradient density and state-price density (a generalized notion of equating marginal utility and prices). All results in this article are based on [18–20]; these references also develop more specialized and tractable formulations, based on quadratic modeling of risk aversion, and the last introduces jump risk (modeled by marked point processes).

All uncertainty is generated by d-dimensional standard Brownian motion B over the finite time horizon [0, T], supported by a probability space (Ω, F, P). All processes dealt with in this article are assumed to be progressively measurable with respect to the augmented filtration {Fₜ : t ∈ [0, T]} generated
by B. We define a cash flow as a process x such that $E\left[\int_0^T x_t^2\,dt + x_T^2\right] < \infty$. We interpret xₜ as a time-t payment rate and x_T as a lump-sum terminal payment. The set of all cash flows is denoted H, which we regard as a Hilbert space under the inner product

$$(x|y) = E\left[\int_0^T x_t y_t\,dt + x_T y_T\right], \quad x, y \in H \tag{4}$$

The set of consumption plans is the convex cone C ⊆ H. Finally, we let S_p, p = 1, 2, denote the set of cash flows satisfying $E\left[\operatorname{ess\,sup}_{t \in [0,T]} |x_t|^p\right] < \infty$. The qualification "almost surely" is omitted throughout. The coefficients of all the stochastic differential equations introduced will be assumed sufficiently integrable so that the equations are well defined.

Recursive Preferences

We define preferences in terms of a utility aggregator, F : Ω × [0, T] × ℝ^(2+d) → ℝ. For every consumption plan c ∈ C, we assume that there is a unique solution (U, Σ) to the backward stochastic differential equation (BSDE):

$$dU_t = -F(t, c_t, U_t, \Sigma_t)\,dt + \Sigma_t'\,dB_t, \quad U_T = F(T, c_T) \tag{5}$$

(terminal utility depends only on (ω, T, c_T)), and we define U(c) = U. Throughout we assume that F is differentiable, F(ω, t, ·) is concave, and that the range of F_c(ω, t, ·, U, Σ) is (0, ∞). SDU is the special case F(ω, t, c, U, Σ) = b(ω, t, c, U) + ½ a(ω, t, U) Σ′Σ, and standard additive utility corresponds to the linear aggregator F(ω, t, c, U, Σ) = u(t, c) − β(t)U. A multiple-priors formulation of [2] is given by

$$dU_t = -\left[ b(t, c_t, U_t) - \max_{\theta \in \Theta_t} \theta' \Sigma_t \right] dt + \Sigma_t'\,dB_t \tag{6}$$

for some function Θ from Ω × [0, T] to the set of convex compact subsets of ℝ^d.

Markets and the Wealth Equation

The agent is endowed with initial financial wealth w₀ and an endowment process e ∈ H. The dollar value of the agent's position in m risky assets is represented by the process φ = (φ¹, …, φᵐ)′. The agent's financial wealth process (not including the present value of the future endowment), W, is defined in terms of the wealth aggregator f : Ω × [0, T] × ℝ^(m+1) → ℝ, which represents the instantaneous expected growth of the agent's portfolio. Cuoco and Cvitanić [4] propose a nonlinear wealth aggregator to model the price impact of a large investor or differential borrowing and lending rates. Trading and wealth constraints are modeled by requiring the vector (Wₜ, φₜ) to lie in a convex set K ⊆ ℝ^(1+m) at all times. The returns diffusion matrix is an ℝ^(d×m)-valued process σ^R. The agent's plan (c, W, φ) is feasible if it satisfies the budget equation

$$dW_t = (f(t, W_t, \phi_t) + e_t - c_t)\,dt + \phi_t' \sigma_t^{R\prime}\,dB_t, \quad W_0 = w_0, \quad c_T = W_T + e_T, \quad (W_t, \phi_t) \in K \tag{7}$$

and the integrability conditions $\int_0^t \left( |f(s, W_s, \phi_s)| + \phi_s' \sigma_s^{R\prime} \sigma_s^{R} \phi_s \right) ds < \infty$ and (−Wₜ)⁺ ∈ S₂ (the latter to rule out doubling-type strategies). A consumption plan c is feasible if it is part of a feasible plan (c, W, φ).

Example 1 Linear Budget Equation. Suppose that a money-market security pays interest at a rate rₜ, and the risky assets' instantaneous excess returns relative to r are $dR_t = \mu_t^R\,dt + \sigma_t^{R\prime}\,dB_t$. Then we get the standard case:

$$f(\omega, t, w, \alpha) = r(\omega, t)\,w + \alpha' \mu^R(\omega, t), \quad (w, \alpha) \in K \tag{8}$$

Example 2 Different Borrowing and Lending Rates. Extending Example 1, if b is a strictly positive process and money-market lending and borrowing occur at the rates rₜ and rₜ + bₜ, respectively, then

$$f(\omega, t, w, \alpha) = r(\omega, t)\,w + \mu^R(\omega, t)' \alpha - b(\omega, t)\,(1'\alpha - w)^{+}, \quad (w, \alpha) \in K \tag{9}$$

For a related analysis, see Appendix B of [6].

General Solution Method

The agent's problem is to choose an optimal consumption plan: a feasible c such that U(c) ≥ U(c̃)
for all other feasible consumption plans c̃. We first show that optimality of c is essentially equivalent to the utility supergradient density of U at c satisfying the conditions for a state-price density, and then characterize these density equations in terms of the utility and wealth aggregators defined above. The resulting first-order conditions satisfy a constrained forward–backward stochastic differential equation (FBSDE) system.

Given the feasible consumption plan c, the process π ∈ H is a state-price density at c if

$$(\pi|x) \le 0 \quad \text{for all } x \text{ such that } c + x \text{ is a feasible consumption plan} \tag{10}$$

We can interpret (π|x) as the net present value of the cash flow x, which must be nonpositive for any feasible (i.e., affordable) incremental cash flow.

The process π ∈ H is a supergradient density of U₀ at c if

$$U_0(c + x) \le U_0(c) + (\pi|x) \quad \text{for all } x \text{ such that } c + x \in C \tag{11}$$

and a utility gradient density of U₀ at c if

$$(\pi|x) = \lim_{\alpha \downarrow 0} \frac{U_0(c + \alpha x) - U_0(c)}{\alpha} \quad \text{for all } x \text{ such that } c + \alpha x \in C \text{ for some } \alpha > 0 \tag{12}$$

If π is a supergradient density of U₀ at c and the utility gradient of U₀ at c exists, then the utility gradient density is π.

The general optimality result follows.

Proposition 1 Suppose that (c, W, φ) is a feasible plan. If π ∈ H is both a supergradient density of U₀ at c and a state-price density at c, then the plan (c, W, φ) is optimal. Conversely, if the plan (c, W, φ) is optimal and π ∈ H is a utility gradient density of U₀ at c, then π is a state-price density at c.

To apply Proposition 1, we obtain the dynamics of the utility supergradient and state-price densities corresponding to the utility and market models, as discussed in the sections Recursive Preferences and Markets and the Wealth Equation. Both depend on the feasible reference plan (c, W, φ) and are expressed in terms of the differential or superdifferential (in the absence of differentiability or in the presence of constraints) of the corresponding aggregator.

The superdifferential of f(t, ·) at (ω, t, w, α) relative to the constraint set K is the set ∂f(ω, t, w, α) of all pairs (d_w, d_φ) ∈ ℝ^(1+m) such that

$$f(\omega, t, \tilde{w}, \tilde{\alpha}) - f(\omega, t, w, \alpha) \le d_w(\tilde{w} - w) + d_\phi'(\tilde{\alpha} - \alpha) \quad \text{for all } (\tilde{w}, \tilde{\alpha}) \in K \tag{13}$$

Sufficient conditions for a state-price density follow (see End Note c).

Proposition 2 Suppose that (c, W, φ) is feasible and π ∈ H₊₊ satisfies

$$\frac{d\pi_t}{\pi_t} = -\zeta_t\,dt - \eta_t'\,dB_t, \quad (\zeta_t, \sigma_t^{R\prime} \eta_t) \in \partial f(t, W_t, \phi_t) \tag{14}$$

and πW ∈ S₁. Then π is a state-price density at c.

The process η is often called the market price of risk, with ηₜⁱ representing the time-t shadow incremental expected wealth return per unit additional exposure to dBₜⁱ. The drift term ζ represents the shadow incremental return per unit wealth. In the case of a linear budget equation (8) and K = ℝ^(1+m) (no constraints but possibly incomplete markets), we obtain the standard result ζₜ = rₜ and $\mu_t^R = \sigma_t^{R\prime} \eta_t$.

Example 3 Collateral Constraint. Suppose that there is a single risky asset (m = 1), and, as in Example 1, f(ω, t, w, α) = r(ω, t)w + µ^R(ω, t)α. We consider an agent who faces the collateral constraint:

K = {(w, α) ∈ ℝ² : w ≥ |α|}   (15)

for some  ∈ (0, 1). Then condition (ζ, σ^R η) ∈ ∂f(W, φ) is equivalent to the following restrictions:

δt = ζt − rt ≥ 0, εt = µRt − t ∈ [−δt , δt ]
(φt > 0 ⇒ εt = δt ), (φt < 0 ⇒ εt = −δt ),
(Wt > |φt | ⇒ δt = 0)   (16)

Papers analyzing collateral constraints in a Brownian setting and additive utility include [5, 16].

Assuming differentiability of the utility aggregator F (nondifferentiability is accommodated by replacing

the differential with a superdifferential defined as for optimality conditions in the form of a constrained
f ), we now provide sufficient conditions for a utility FBSDE system:
supergradient density.
dU = − F (I(λ, U, ), U, )dt +   dB,
Proposition 3 Suppose that c ∈ C, (U , ) solves
UT = F (T , WT + eT )
BSDE (5), π ∈ H++ satisfies
dλt
πt = Et Fc (t, ct , Ut , t ) (17) = − (ζ + FU + σ λ F )dt + σ λ dB,
λt
λT = Fc (T , WT + eT )
where
dW = (f (W, φ) + e − I(λ, U, ))dt + φ  σtR dBt ,
dEt
= FU (t, ct , Ut , t )dt+ W0 = w0
Et
F (t, ct , Ut , t ) dBt , E0 = 1 (18) (ζ, −σ R (F + σ λ )) ∈ ∂f (W, φ),
(φ, W ) ∈ K (23)
and EU ∈ S1 . Then, π is a utility gradient density of
U0 at c. Given a solution (U, , λ, σ λ , W, φ) and suit-
able integrability assumptions (to satisfy Propositions
The supergradient density expression (17) is con- 1–3), then c in equation (21) defines an optimal con-
sistent with the calculations of Skiadas [2], Duffie and sumption plan.
Skiadas [9], Chen and Epstein [10], and El Karoui
et al. [21]. All these papers assume Lipschitz-growth
conditions that are violated in our setting. Scale and Translation-invariant Solutions
We now apply Proposition 1 to characterize the
first-order conditions. A key role in the solution is The first-order conditions significantly simplify when
played by the strictly positive process utility and wealth dynamics fall into either the scale
or translation-invariant classes. The scale-invariant,
λt = Fc (t, ct , Ut , t ) (19) or homothetic, class exhibits homogeneity of degree
one in consumption (when in certainty equivalent
computed at the optimum, which represents the form) and includes, as special cases, homothetic
derivative of time-t optimal utility with respect to Duffie–Epstein utility and additive power and log
time-t wealth (as in the familiar envelope result). We utility. The translation-invariant class exhibits quasi-
solve for µλ and σ λ in the Ito expansion linearity with respect to a reference consumption
dλt stream and generalizes additive exponential utility.
= µλt dt + σtλ dBt (20) In both cases, the FBSDE of the first-order condi-
λt
tions uncouples into a single pure backward equation
by applying Ito’s lemma to the utility gradient for λ and a pure forward equation for wealth.
density, πt = Et λt , and matching coefficients with
those of the state-price density in Proposition 2. Scale-invariant Class
Having solved for λ, invert equation (19) to express
the consumption plan c as We assume that consumption is strictly positive, and
the aggregator F (ω, t, ·) is homogeneous of degree
ct = I(t, λt , Ut , t ) (21) one, allowing the representation
where the function I :  × [0, T ] × (0, ∞) × d+1  
c 
→  is defined implicitly through the following F (ω, t, c, U, ) = U G ω, t, , ,
equation: U U
F (T , c) = c (24)
Fc (t, I(t, y, U, ), U, ) = y, y ∈ (0, ∞) (22)
It is easy to confirm that utility is therefore
Combining the dynamics of λ with the utility homogeneous of degree one in consumption:
BSDE (5), the budget equation (7), and the state-
pricing restriction of Proposition 2, we obtain the U (kc) = kU (c) for all k ∈ + and c ∈ C (25)

Defining σtU = t /Ut , the BSDE (5) is equiva- and therefore σtU = σtλ + σtR ψt . Recalling λt =
lent to Gc (t, ct /Ut , σtU ), we define the inverse function
dUt IG () analogously to equation (22) to obtain ct /Ut =
= −G(t, ct /Ut , σtU )dt + σtU  dBt , UT = cT IG (t, λt , σtU ). Defining the dual function of G∗
Ut
(26)
G∗ (t, λ, σ ) = G(t, IG (t, λ, σ ), σ ) − IG (t, λ, σ )λ
Example 4 Schroder and Skiadas [19] show that
the quasi-quadratic aggregator (32)

1 we obtain the first-order conditions (necessary and


G(ω, t, x, σ ) = g(ω, t, x) −   Q(ω, t) (27) sufficient) as a constrained backward equation for λ
2
(independent of wealth):
where Q is positive definite for all (ω, t), is par-
ticularly tractable, while allowing the modeling of dλ 
source-dependent second-order risk aversion through = − G∗ (λ, σ λ + σ R ψ) + f 1 (ψ)
Q. The continuous-time version of Epstein and Zin λ−

[11] is the special case with Q = γ I , for some con- 
+ψ  σ R σ λ dt + σtλ dBt , λT = 1,
stant γ > 0, and
x 1−η − 1 − σ R (Gσ + σ λ ) ∈ ∂f 1 (ψ) (33)
g(ω, t, x) = α + β (28)
1−η Given a solution (λ, σ λ , ψ) and sufficient regular-
for constants α and η, β > 0. Additive utility cor- ity, then ct /Wt = λt IG (t, λ, σ ) is substituted into the
responds to γ = η (the coefficient of relative risk wealth equation to complete the solution.
aversion is equal to the inverse of the elasticity of
intertemporal substitution). Translation-invariant Class
We allow consumption to take any value in  and
On the markets side, we assume that wealth is
fix some strictly positive and bounded reference
strictly positive; the endowment process, e, is zero;
consumption plan γ ∈ H. The aggregator is assumed
constraints are on investment proportions, φt /Wt ∈
to satisfy
K 1 , for some convex set K 1 ⊆ m ; and the wealth
aggregator f (ω, t, ·) is homogeneous of degree one.  
c
Letting ψt = φt /Wt denote the vector of investment F (ω, t, c, U, ) = G ω, t, − U,  ,
proportions, and defining f 1 (t, ·) = f (t, 1, ·), the γ (ω, t)
budget equation (7) becomes c
F (ω, T , c) = (34)
  γ (ω, T )
dWt ct
= f 1 (t, ψt ) − dt + ψt σtR dBt , which implies that U is quasilinear with respect to γ :
Wt Wt
U (c + kγ ) = U (c) + k for all k ∈  and c ∈ C
W0 = w0 , cT = WT , ψt ∈ K 1 (29) (35)
(in the linear budget constraint case of Example 1,
Example 5 Additive exponential utility corre-
we have f 1 (t, ψt ) = rt + ψt µRt ).
sponds to
The state-price density condition (ζt , σtR ηt ) ∈
∂f (t, Wt , φt ) is then equivalent to 1
G(ω, t, x, ) = β(ω, t) − exp(−x) −   
2
ζt = f 1 (t, ψt ) − ψt σtR ηt and σtR ηt ∈ ∂f 1 (t, ψt ) (36)
This follows because the ordinally equivalent
(30) utility Vt = − exp(−Ut ) satisfies (under suitable
integrability restrictions)
The scale-invariance properties imply that at the
optimum utility is proportional to wealth:  T   s 
cs
Vt = E t − exp − βu du − ds
Ut = λt Wt (31) t t γs

  
T
cT γ / . Utility and marginal utility of wealth processes
− exp − βu du − (37) satisfy
t γT
1 1
On the markets side, we assume that the reference Ut = (Yt + Wt ), λt = (41)
consumption stream γ is part of the feasible plan t t
(γ , , κ):
where (Y, φ 0 ) is determined by a constrained back-
  ward SDE, given below, that is independent of finan-
dt γt cial wealth.
= µt −
κ
dt + κ  σtR dBt , T = γT
t t Defining the superdifferential notation ∂f 0 anal-
(38) ogously to ∂f , the state-price density condition
(ζ, σ R η) ∈ ∂f (W, φ) is equivalent to ζ = µκ −
That is,  is the price of a fund paying dividend κ  σ R η and σ R η ∈ ∂f 0 (φ 0 ). Defining the inverse
process γ ; κ ∈ m represents the investment propor- and dual functions X, G∗ :  × [0, T ] × d+1 → 
tions of the fund; and µκ is the fund’s instantaneous by
expected return process.
For any (w, α) ∈ K, we assume (w + v, α + Gx (ω, t, X(ω, t, y, ), ) = y,
vκ) ∈ K and f (ω, t, w + v, α + vκ) = f (ω, t,
G∗(ω, t, y, ) = G(ω, t, X(ω, t, y,),)
w, α) + vµκ (ω, t) for all v ∈ . That is, trading in
the portfolio κ is unrestricted and earns instantaneous − y X(ω, t, y, )
expected return µκ regardless of the agent’s plan.
(42)
For example, under the linear budget equation
(Example 1), we have µκ = r + κ  µR .
the processes (Y, σ Y , φ 0 ) satisfy
Defining the zero-wealth constraint set, aggrega-
tor, and portfolio and consumption processes
dY = − (e − Y µκ + f 0 (φ 0 ) + G∗ (δ, )

K 0 = {α : (0, α) ∈ K}, − κ  σ R )dt + σ Y  dB, YT = eT

f 0 (ω, t, α) = f (ω, t, 0, α), σ + (φ − κY ) σ
Y 0 R
= ,

φt0 = φt − Wt κ, 

γt − σ R (G − σ R κ) ∈ ∂f 0 (φ 0 ), φ0 ∈ K 0
ct0 = ct − Wt (39)
t (43)

the budget equation (7) is equivalent to Given the solution (Y, σ Y , φ 0 ) and sufficient regu-
larity, the optimal wealth-independent component of
dt consumption is
dWt = Wt + (f 0 (t, φt0 ) + et − ct0 )dt
t  
γt γt
+ φt0 σtR dBt , ct0 = Yt + γt X t, , t and cT0 = eT
t t
cT0 = eT , φt0 ∈ K 0 (40)
(44)
At the optimum, the quasi-linearity of utility and Substituting (c0 , φ 0 ) into the budget equation (40),
markets implies that there are two components to the optimal plan is (c0 + W γ / , W, φ 0 + W κ).
consumption and trading. The pair (c0 , φ 0 ) depends
on the investment opportunity set and the endowment,
but is independent of W . All incremental financial Acknowledgments
wealth is invested in the portfolio κ, and the resulting
dividend stream rate γ is consumed; therefore, (c − I am grateful to Costis Skiadas for many fruitful years of
c0 , φ − φ 0 ) depend only on W and the dividend yield joint research, on which this article is based.

End Notes

a. See also [25], which develops the discrete-time counterpart of (5), and [24], which develops the continuous-time formulations of other notions of ambiguity aversion.
b. We define arbitrage in the constrained case as a feasible incremental cash flow (given the current portfolio of the agent) that is nonnegative and nonzero.
c. With some additional mild technical conditions, Schroder and Skiadas [20] show the necessity of the state-price characterization for the market settings in the scale and translation-invariant classes, which are discussed below in the text.

References

[1] Anderson, E., Hansen, L. & Sargent, T. (2000). Robustness, Detection and the Price of Risk, working paper, Department of Economics, University of Chicago.
[2] Chen, Z. & Epstein, L. (2002). Ambiguity, risk, and asset returns in continuous time, Econometrica 70, 1403–1443.
[3] Chew, S.H. (1983). A generalization of the quasi-linear mean with applications to the measurement of inequality and decision theory resolving the Allais paradox, Econometrica 51, 1065–1092.
[4] Cuoco, D. & Cvitanić, J. (1998). Optimal consumption choices for a large investor, Journal of Economic Dynamics and Control 22, 401–436.
[5] Cuoco, D. & Liu, H. (2000). A martingale characterization of consumption choices and hedging costs with margin requirements, Mathematical Finance 10, 355–385.
[6] Cvitanić, J. & Karatzas, I. (1992). Convex duality in constrained portfolio optimization, The Annals of Applied Probability 2, 767–818.
[7] Dekel, E. (1986). An axiomatic characterization of preferences under uncertainty: weakening the independence axiom, Journal of Economic Theory 40, 304–318.
[8] Duffie, D. & Epstein, L.G. (1992). Stochastic differential utility, Econometrica 60, 353–394.
[9] Duffie, D. & Skiadas, C. (1994). Continuous-time security pricing: a utility gradient approach, Journal of Mathematical Economics 23, 107–131.
[10] El Karoui, N., Peng, S. & Quenez, M.-C. (2001). A dynamic maximum principle for the optimization of recursive utilities under constraints, Annals of Applied Probability 11, 664–693.
[11] Epstein, L.G. & Zin, S.E. (1989). Substitution, risk aversion, and the temporal behavior of consumption and asset returns: a theoretical framework, Econometrica 57, 937–969.
[12] Epstein, L.G. & Zin, S.E. (1991). Substitution, risk aversion, and the temporal behavior of consumption and asset returns: an empirical analysis, The Journal of Political Economy 99, 263–286.
[13] Hansen, L., Sargent, T., Turmuhambetova, G. & Williams, N. (2001). Robustness and Uncertainty Aversion, working paper, Department of Economics, University of Chicago.
[14] Kreps, D. & Porteus, E. (1978). Temporal resolution of uncertainty and dynamic choice theory, Econometrica 46, 185–200.
[15] Lazrak, A. & Quenez, M.C. (2003). A generalized stochastic differential utility, Mathematics of Operations Research 28, 154–180.
[16] Liu, J. & Longstaff, F.A. (2004). Losing money on arbitrage: optimal dynamic portfolio choice in markets with arbitrage opportunities, Review of Financial Studies 17(3), 611–641.
[17] Maenhout, P. (1999). Robust Portfolio Rules and Asset Pricing, working paper, INSEAD.
[18] Schroder, M. & Skiadas, C. (2003). Optimal lifetime consumption-portfolio strategies under trading constraints and generalized recursive preferences, Stochastic Processes and Their Applications 108, 155–202.
[19] Schroder, M. & Skiadas, C. (2005). Lifetime consumption-portfolio choice under trading constraints and nontradeable income, Stochastic Processes and Their Applications 115, 1–30.
[20] Schroder, M. & Skiadas, C. (2008). Optimality and state pricing in constrained financial markets with recursive utility under continuous and discontinuous information, Mathematical Finance 18, 199–238.
[21] Skiadas, C. (1992). Advances in the Theory of Choice and Asset Pricing, Ph.D. Thesis, Stanford University.
[22] Skiadas, C. (2003). Robust control and recursive utility, Finance and Stochastics 7, 475–489.
[23] Skiadas, C. (2008). Dynamic portfolio choice and risk aversion, in Handbooks in OR & MS, J.R. Birge & V. Linetsky, eds, Elsevier, Vol. 15, Chapter 19, pp. 789–843.
[24] Skiadas, C. (2008). Smooth Ambiguity Aversion Toward Small Risks and Continuous-Time Recursive Utility, working paper, Kellogg School of Management, Northwestern University.
[25] Skiadas, C. (2009). Asset Pricing Theory, Princeton University Press, Princeton, NJ.

Related Articles

Backward Stochastic Differential Equations; Utility Function; Utility Theory: Historical Perspectives.

MARK SCHRODER

Risk Aversion

An agent is risk averse if he/she dislikes the actions whose outcomes are not certain. In the following, only actions with one-dimensional final outcomes, for example, sums of money, are taken into consideration. To define risk aversion, it is necessary that a probability is associated to every possible consequence, that is, that the actions can be represented as lotteries. A lottery is simple if all possible consequences are final outcomes (sums of money) and it is compound if other lotteries are included among its consequences. An outcome coincides with a degenerate lottery, that is, the lottery that generates it with probability one.

Formally, a decision-making situation under risk is represented by the quintuple $\langle S; 2^S; p; X; L \rangle$, where $S$ is a set of states of the nature; $2^S$ is its power set (i.e., the set of all subsets of $S$, the empty set included); $p$ is a probability distribution on $2^S$; $X$ is a set of outcomes (with $X \subseteq \mathbb{R}$ if they are one-dimensional); and $L$ is the set of lotteries. A simple lottery is represented by $\ell = (x(E), p(E))_{E \in Part(S)}$, where outcomes and probabilities are associated with the events $E \subseteq S$ that form a partition $Part(S)$ of $S$; a compound lottery by $\ell = (\ell(E), p(E))_{E \in Part(S)}$, with $\ell(E) = (\ell(E'), p(E'))_{E' \in Part(E)}$; and a degenerate lottery by $\ell = (x, 1)$. A simple lottery is also represented by the cumulative probability function $F : X \to [0, 1]$, where $F(.)$ is a nondecreasing function with range $[0, 1]$, and, if $S$ is finite, that is, $S = \{s_1, \ldots, s_m\}$, by $\ell = (x_i, p_i)_{i=1}^n$, where $p_i = p(E_i)$ with $E_i = \{s_h \in S : x(s_h) = x_i\}$. An agent in a risky situation is a system of preferences $\langle L, \succsim \rangle$ over the set of lotteries. Let $\langle L, \succsim \rangle$ be regular (i.e., complete and transitive) and continuous. Moreover, let it be strongly monotone with respect to degenerate lotteries, that is, $(x, 1) \succ (x', 1)$ if $x > x'$. Then, preferences can be represented by a utility function $U : L \to \mathbb{R}$, that is, such that $U(\ell) \ge U(\ell')$ if and only if $\ell \succsim \ell'$. This function is not necessarily the expected utility function. However, if the expected utility model is introduced, then every lottery is equivalent to a simple lottery because of the compound lottery principle (implied by expected utility), according to which any compound lottery is indifferent to the simple (or reduced) lottery that associates to each final outcome its compound probability, and preferences are represented by the expected utility function, which is represented by $EU(\ell) = \int_{x \in X} u(x)\,dF(x)$ if $F(.)$ is differentiable or $EU(\ell) = \sum_{i=1}^n p_i u(x_i)$ if the lottery is finite. The von Neumann–Morgenstern (or Bernoulli) utility function $u : X \to \mathbb{R}$ represents the preferences over the set of degenerate lotteries, that is, over the set of outcomes.

Definition 1 (Expected Value). The expected value of a lottery $\ell \in L$ is $EV(\ell) = \int_{x \in X} x\,dF(x)$ or, if the lottery is finite, $EV(\ell) = \sum_{i=1}^n p_i x_i$ and the function $EV : L \to X$ is the expected value function.

Definition 2 (Certainty Equivalent). The certainty equivalent $CE(\ell)$ of a lottery $\ell \in L$ is the outcome for which the individual is indifferent between this outcome and the lottery, that is, $(CE(\ell), 1) \sim \ell$, where $(CE(\ell), 1)$ is the degenerate lottery with outcome $CE(\ell)$. Having $U(\ell) = u(CE(\ell))$, the certainty equivalent function is $CE(\ell) = u^{-1}(U(\ell))$. If the system of preferences $\langle L, \succsim \rangle$ can be represented by an expected utility function, then $CE(\ell) = u^{-1}(EU(\ell))$.

Proposition 1 (Existence and Uniqueness of the Certainty Equivalent). Let us assume that the set of outcomes is compact, that is, $X = [\underline{x}, \overline{x}]$ and the system of preferences $\langle L, \succsim \rangle$ is regular (i.e., complete and transitive), continuous and such that $(\overline{x}, 1) \succsim \ell \succsim (\underline{x}, 1)$ for every $\ell \in L$. Then, there exists one and only one certainty equivalent $CE(\ell) \in X$ for every $\ell \in L$.

The following discussion on the notion of risk aversion refers, for the sake of simplicity, to finite simple lotteries on a compact set of outcomes, where not specified differently.

Global Risk Aversion

Definition 3 (Risk Premium and Global Risk Aversion). The risk premium $RP(\ell)$ of a lottery is the maximum sum of money that the agent is willing to pay to get the expected value of the lottery in place of the lottery. Therefore,

$$RP(\ell) = EV(\ell) - CE(\ell) \tag{1}$$

since the conditions $(EV(\ell) - RP(\ell), 1) \sim \ell$ and $\ell \sim (CE(\ell), 1)$ imply $EV(\ell) - RP(\ell) = CE(\ell)$. The agent denotes (global) risk aversion if his/her system of preferences $\langle L, \succsim \rangle$ requires $CE(\ell) \le EV(\ell)$, so $RP(\ell) \ge 0$, for every $\ell \in L$. The agent is risk loving if $RP(\ell) \le 0$ and risk neutral if $RP(\ell) = 0$. He/she is strictly risk averse if $RP(\ell) > 0$ for every nondegenerate $\ell \in L$ (strictly risk loving if $RP(\ell) < 0$). An agent is globally neither risk averse nor risk loving if there is a pair $\ell, \ell' \in L$ for which $RP(\ell) > 0$ and $RP(\ell') < 0$.
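As a minimal numerical sketch of Definitions 1 to 3, the following Python snippet computes the expected value, the certainty equivalent, and the risk premium of a finite simple lottery. It assumes, beyond what Definition 3 itself requires, that preferences follow the expected utility model with $u(x) = \sqrt{x}$; the function names and the example lottery are illustrative choices, not part of the original text.

```python
import math

def expected_value(lottery):
    """EV of a finite lottery given as [(outcome, probability), ...]."""
    return sum(p * x for x, p in lottery)

def certainty_equivalent(lottery, u, u_inv):
    """CE = u^{-1}(EU); valid only under the (assumed) expected utility model."""
    eu = sum(p * u(x) for x, p in lottery)
    return u_inv(eu)

def risk_premium(lottery, u, u_inv):
    """RP = EV - CE, as in equation (1)."""
    return expected_value(lottery) - certainty_equivalent(lottery, u, u_inv)

# Illustrative assumption: concave utility u(x) = sqrt(x) on X = [0, 100].
u = math.sqrt
u_inv = lambda v: v * v

lottery = [(100.0, 0.5), (0.0, 0.5)]
ev = expected_value(lottery)                  # 50.0
ce = certainty_equivalent(lottery, u, u_inv)  # 25.0
rp = risk_premium(lottery, u, u_inv)          # 25.0, positive: risk averse on this lottery
```

For a degenerate lottery such as `[(36.0, 1.0)]` the same functions return a zero risk premium, in line with the strict version of the definition, which only constrains nondegenerate lotteries.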

Figure 1 Indifference curve of a risk averse expected utility agent

Proposition 2 [5]. Let us introduce the set of the lotteries that are not preferred to the certain outcome $x$ and the set of lotteries that have an expected value not higher than $x$, that is,

$$G(x) = \{\ell \in L : CE(\ell) \le x\}, \qquad H(x) = \{\ell \in L : EV(\ell) \le x\} \tag{2}$$

The agent is risk averse if and only if $H(x) \subseteq G(x)$ for every $x \in X$, risk loving if and only if $H(x) \supseteq G(x)$, and risk neutral if and only if $H(x) = G(x)$.
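Proposition 2 can be illustrated numerically. The sketch below assumes an expected utility agent with the concave utility $u(x) = \sqrt{x}$ (an illustrative assumption; the proposition itself does not require expected utility) and checks, on randomly drawn two-outcome lotteries, that membership in $H(x)$ implies membership in $G(x)$. The helper names `in_H` and `in_G` are hypothetical.

```python
import math
import random

def ev(lottery):
    """Expected value of a finite lottery [(outcome, probability), ...]."""
    return sum(p * x for x, p in lottery)

def ce(lottery):
    # Assumed expected utility agent with u(x) = sqrt(x), so u^{-1}(v) = v**2.
    return sum(p * math.sqrt(x) for x, p in lottery) ** 2

def in_H(lottery, x):
    """H(x): lotteries whose expected value is not higher than x."""
    return ev(lottery) <= x

def in_G(lottery, x):
    """G(x): lotteries not preferred to the certain outcome x (CE <= x)."""
    return ce(lottery) <= x

rng = random.Random(0)  # fixed seed so the check is reproducible
violations = 0
for _ in range(10_000):
    p = rng.random()
    lottery = [(rng.uniform(0, 100), p), (rng.uniform(0, 100), 1 - p)]
    x = rng.uniform(0, 100)
    # A small tolerance guards against floating-point rounding only.
    if in_H(lottery, x) and ce(lottery) > x + 1e-9:
        violations += 1  # would contradict H(x) ⊆ G(x)

# The inclusion is generally strict: this lottery is in G(30) but not in H(30).
example = [(100.0, 0.5), (0.0, 0.5)]
strict = in_G(example, 30.0) and not in_H(example, 30.0)
```

A random search of this kind can only fail to find a counterexample; it does not prove the inclusion, which follows from the concavity of the assumed utility function.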

In the Hirshleifer–Yaari diagram, where simple lotteries with only two possible outcomes with given probabilities are represented, the certainty equivalent of a lottery $\ell^* = (x_1^*, p; x_2^*, 1 - p)$ is, by definition, determined as the intersection of the 45° line and the corresponding indifference curve. Therefore, the certainty equivalent is equal to the coordinates of this point. Moreover, the expected value of the same lottery is equal to the coordinates of the point where the 45° line intersects the expected value line (described by the equation $p x_1 + (1 - p) x_2 = EV(\ell^*) = p x_1^* + (1 - p) x_2^*$). This line passes through $\ell^*$ and has the slope equal to $-\frac{p}{1 - p}$. Thus, the agent is risk averse if the first intersection point is not above the second one, as shown in Figures 1 and 2.

Figure 2 Indifference curve of a risk averse nonexpected utility agent

Proposition 2 indicates that risk aversion implies $H(CE(\ell^*)) \subseteq G(CE(\ell^*))$. It means that the indifference curve and the expected value line passing through the same point on the 45° line do not cross and that the indifference curve is to the north-east with respect to the expected value line.

Proposition 3 If the expected utility model applies, then the agent is risk averse if and only if his/her von Neumann–Morgenstern utility function $u : X \to \mathbb{R}$ is concave, risk loving if and only if it is convex, and risk neutral if and only if it is linear.

The inequality $\sum_{i=1}^n p_i u(x_i) \le u\left(\sum_{i=1}^n p_i x_i\right)$, which is called the Jensen inequality, is a definition of concavity and is equivalent to $EU(\ell) \le u(EV(\ell))$. In Figure 3 we can see how the concavity of the function $u(.)$ implies risk aversion. The expected value, expected utility, and certainty equivalent are represented for the lottery $\ell = (x_1, 0.5; x_2, 0.5)$. The concavity of the von Neumann–Morgenstern function $u(.)$ implies the concavity of the expected utility function with respect to the outcomes. That is, if $u(\lambda x_i + (1 - \lambda) x_i') \ge \lambda u(x_i) + (1 - \lambda) u(x_i')$ for every pair $x_i, x_i' \in X$ and every $\lambda \in [0, 1]$,

then $EU(\ell'') \ge \lambda EU(\ell) + (1 - \lambda) EU(\ell')$ for every $\lambda \in [0, 1]$ and every triplet $\ell, \ell', \ell'' \in L$, with $\ell = (x_i, p_i)_{i=1}^n$, $\ell' = (x_i', p_i)_{i=1}^n$ and $\ell'' = (x_i'', p_i)_{i=1}^n$, where $x_i'' = \lambda x_i + (1 - \lambda) x_i'$. Thus, if the agent is risk averse and the expected utility theory holds, the function $EU(.)$ is concave (and, all the more so, quasiconcave) with respect to the outcomes. Consequently, the indifference curves in the Hirshleifer–Yaari diagram are convex (as described in Figure 1, but not in Figure 2, which can represent an agent who is risk averse but does not maximize expected utility).

Figure 3 Risk aversion and concavity of utility function (plot of $u(x)$ against $x$ for the lottery $\ell = (x_1, 0.5; x_2, 0.5)$, marking $u(x_1)$, $u(x_2)$, $U(\ell) = (u(x_1) + u(x_2))/2$, $u(EV(\ell)) = u((x_1 + x_2)/2)$, $CE(\ell) = u^{-1}(U(\ell))$ and $EV(\ell) = (x_1 + x_2)/2$)

Proposition 4 [5]. The agent is risk averse if the certainty equivalent function $CE : L \to X$ is convex with respect to the probabilities. The agent is risk loving if it is concave and risk neutral if it is linear.

The condition stated in Proposition 4 for risk aversion is sufficient, but not necessary, nor is it necessary that the certainty equivalent function $CE(.)$ is quasiconvex with respect to the probabilities. However, if the expected utility theory holds and there is risk aversion, then the certainty equivalent function is convex with respect to the probabilities: in fact, in such a case, we have $CE(\ell) = u^{-1}\left(\sum_{i=1}^n p_i u(x_i)\right)$, where function $u(.)$ is increasing and concave and function $u^{-1}(.)$ is increasing and convex.

Definition 4 (Comparison of Risk Aversion across Agents). An agent A is more risk averse than agent B if their systems of preferences $\langle L, \succsim_A \rangle$ and $\langle L, \succsim_B \rangle$ give $CE_A(\ell) \le CE_B(\ell)$ for every $\ell \in L$.

In the Hirshleifer–Yaari diagram, this definition implies that the indifference curves of the agents that go through the same point on the 45° line do not cross and that the indifference curve of the more risk averse agent is to the north-east with respect to the indifference curve of the less risk averse agent, as shown in Figure 4.

Figure 4 Indifference curves of two agents of whom one is more risk averse than the other (axes $x_1$ and $x_2$; curves $U_A(\ell^*)$ and $U_B(\ell^*)$ through $\ell^*$, with $CE_A(\ell^*)$ and $CE_B(\ell^*)$ marked on the $x_1$ axis)

Proposition 5 [7]. If agent A is more risk averse than agent B and the expected utility model applies,

then the von Neumann–Morgenstern utility function $u_A(.)$ is a concave transformation of $u_B(.)$. That is, there exists an increasing and concave function $g : \mathbb{R} \to \mathbb{R}$ such that $u_A(x) = g(u_B(x))$ for every $x \in X$.

Local Risk Aversion

Till now, we considered global risk aversion, that is, the relationship $CE(\ell) \le EV(\ell)$ was introduced for every lottery $\ell \in L$. Now, let us consider local risk aversion, by taking into account only small lotteries, that is, the lotteries that have only little differences in consequences. For this purpose, we denote the lottery $(x + t x_i, p_i)_{i=1}^n$ with $x + t\ell$, where $\ell = (x_i, p_i)_{i=1}^n$.

Definition 5 (Local Risk Aversion). An agent is locally risk averse, if, for every $x \in X$ and $\ell \in L$, there exists a $t^* > 0$ such that $CE(x + t\ell) \le EV(x + t\ell)$ for all $t \in [0, t^*]$. Thus, if the certainty equivalent function is differentiable, then the agent is locally risk averse if $\lim_{t \to 0} \frac{d}{dt}\left(EV(x + t\ell) - CE(x + t\ell)\right) > 0$ and only if $\lim_{t \to 0} \frac{d}{dt}\left(EV(x + t\ell) - CE(x + t\ell)\right) \ge 0$ for every $x \in X$ and $\ell \in L$. By analogy, the definition holds with reversed inequality signs for the local risk loving.

Although the global risk aversion requires that in the Hirshleifer–Yaari diagram the indifference curve and the expected value line passing through some point on the 45° line do not cross and that the indifference curve is to the north-east with respect to the expected value line, this condition needs to be satisfied only in the vicinity of the 45° line for the local risk aversion.

Proposition 6 If the expected utility theory holds, then the agent is locally risk averse if and only if his/her von Neumann–Morgenstern utility function $u : X \to \mathbb{R}$ is concave. In other words, if the expected utility theory holds, then the conditions for local and global risk aversion (risk loving or neutrality) are the same.

Measure of the Risk Aversion

If the expected utility theory holds, then the local risk aversion can be measured by the concavity of the von Neumann–Morgenstern utility function $u(.)$. However, the second derivative of the utility function $u''(.)$, which is a measure of its concavity, is not invariant to increasing linear transformations of $u(.)$. An invariant measure is the de Finetti–Arrow–Pratt coefficient of risk aversion (due to de Finetti [3], Pratt [7], and Arrow [1]). This measure of (absolute) risk aversion is defined as

$$r(x) = -\frac{u''(x)}{u'(x)} \tag{3}$$

There also exists a measure of relative risk aversion $r_r(x) = -x \frac{u''(x)}{u'(x)}$, which is important in the case of multiplicative lotteries $\ell = (\alpha_i W, p_i)_{i=1}^n$.

The de Finetti–Arrow–Pratt measure can be justified in relation to the local risk premium, which is (by Definition 3)

$$RP(x + t\ell) = EV(x + t\ell) - CE(x + t\ell) = x + t\,EV(\ell) - u^{-1}\left(\sum_{i=1}^n p_i u(x + t x_i)\right) \tag{4}$$

Then, assuming that this function is differentiable with respect to $t$, we get $RP(x) = 0$, $\left.\frac{\partial RP(x + t\ell)}{\partial t}\right|_{t=0} = 0$ and $\left.\frac{\partial^2 RP(x + t\ell)}{\partial t^2}\right|_{t=0} = -\frac{u''(x)}{u'(x)} \sigma^2(\ell)$. Therefore, in the neighborhood of the certain outcome $x$, the risk premium is proportional to the de Finetti–Arrow–Pratt measure. Nevertheless, the fact that only the second derivative of the risk premium can be different from zero at $t = 0$, while the first derivative is always equal to zero, means that the expected utility theory allows only for local risk aversion of the second order, while that of the first order is zero. Other theories (e.g., rank-dependent expected utility, which is discussed later) also allow for the risk aversion of the first order and can, as a result, describe the preferences that indicate more relevant types of aversion to risk (like the one presented in the Allais paradox) than the risk aversion admitted by the expected utility theory and measured by the de Finetti–Arrow–Pratt index.

Local risk aversion in the Hirshleifer–Yaari diagram is linked to the curvature of indifference curves at the point where they intersect the 45° line. In other words, it is linked to the value of the second derivative $x_2''(x_1)$ at $x_1 = x$, where

the function $x_2(x_1)$ that represents the indifference curve is implicitly defined by the condition $CE(x_1, x_2) = x$. Then, if the expected utility theory holds, we get $x_2(x) = x$, $x_2'(x) = -\frac{p}{1 - p}$ and $x_2''(x) = -\frac{p}{(1 - p)^2} \frac{u''(x)}{u'(x)}$, that is, the curvature of the indifference curves along the 45° line is proportional to the de Finetti–Arrow–Pratt measure of risk aversion.

The dependence of the de Finetti–Arrow–Pratt index $r(x)$ on $x$ defines the decreasing absolute risk aversion if $r'(x) < 0$ (increasing if $r'(x) > 0$), as well as, with regard to $r_r(x)$, the decreasing relative risk aversion if $r_r'(x) < 0$ (increasing if $r_r'(x) > 0$).

Aversion Toward Increases in Risk

Risk aversion can also be analyzed taking into account the riskiness of lotteries, that is, considering preference for less risky lotteries. However, there does not exist a unique definition of riskiness according to which lotteries can be ordered. In the following, only two definitions of riskiness are examined. Both introduce a partial ordering criterion.

1. The first definition refers to mean preserving spreads (introduced by Rothschild and Stiglitz [10]). A lottery $\ell = (x_i, p_i)_{i=1}^n$ is not less risky than lottery $\ell^* = (x_i^*, p_i^*)_{i=1}^n$ if $\ell$ can be obtained from $\ell^*$ by mean preserving spreads. That is, if $EV(\ell) = EV(\ell^*)$, $x_i = x_i^*$ for every $i = 1, \ldots, n$ and $p_i = p_i^*$ for every $i = 1, \ldots, n$ except for three outcomes $x_a > x_b > x_c$, for which we have $p_a \ge p_a^*$, $p_b \le p_b^*$, and $p_c \ge p_c^*$. For example, $\ell = (x_1, p_1; x_2, p_2; x_3, p_3)$ is not less risky than $\ell^* = (x_1, p_1^*; x_2, p_2^*; x_3, p_3^*)$ if $p_2 \le p_2^*$, $p_1 = p_1^* + \frac{x_2 - x_3}{x_1 - x_3}(p_2^* - p_2)$, $p_3 = p_3^* + \frac{x_1 - x_2}{x_1 - x_3}(p_2^* - p_2)$, and $x_1 > x_2 > x_3$.

Definition 6 (Aversion to Mean Preserving Spreads Increases in Risk). An agent is averse to the increases in risk if $CE(\ell) \le CE(\ell^*)$ for every pair of lotteries $\ell, \ell^* \in L$ with $\ell$ not less risky than $\ell^*$ (according to mean preserving spreads).

Proposition 7 If an agent is averse to mean preserving spreads increases in risk, then he/she is also risk averse (for this reason, sometimes the aversion to mean preserving spreads increases in risk is called strong risk aversion and the risk aversion as introduced in Definition 3 is called weak risk aversion [2]). To be precise, if $CE(\ell) \le CE(\ell^*)$ for every pair $\ell, \ell^* \in L$ with $\ell$ not less risky than $\ell^*$ (according to mean preserving spreads), then $CE(\ell) \le EV(\ell)$ for every $\ell \in L$.

Proposition 8 If the expected utility model applies, then there is aversion toward mean preserving spreads increases in risk if and only if the von Neumann–Morgenstern utility function $u : X \to \mathbb{R}$ is concave.

Note that the concavity of the utility function is a necessary and sufficient condition for both risk aversion and aversion to increases in risk (determined by mean preserving spreads). The equality of this condition holds in the case of expected utility theory. For other theories, we will generally have two different conditions (one for risk aversion and the other for the aversion to increases in risk).

An ordering of the lotteries according to their riskiness that is equivalent to the mean preserving spreads concept (for the lotteries that have equal expected value) is provided by the notion of the second-order stochastic dominance.

Definition 7 (First-order Stochastic Dominance). A lottery $\ell = (x_i, p_i)_{i=1}^n$, where $x_i > x_{i+1}$ for every $i = 1, \ldots, n - 1$, first-order stochastically dominates lottery $\ell' = (x_i, p_i')_{i=1}^n$ if $\sum_{h=1}^{i} p_h \ge \sum_{h=1}^{i} p_h'$ (or, equivalently, $\sum_{h=i+1}^{n} p_h \le \sum_{h=i+1}^{n} p_h'$) for every $i = 1, \ldots, n - 1$, that is, with respect to the cumulative probability functions (introduced earlier), if $F(x) \le F'(x)$ for every $x \in X$.

First-order stochastic dominance means that probabilities of the better (worse) outcomes are higher (lower) in the dominant lottery than in the dominated lottery. It implies that $EV(\ell) \ge EV(\ell')$ and, also, $CE(\ell) \ge CE(\ell')$ for a rational agent.

Definition 8 (Second-order Stochastic Dominance). A lottery $\ell = (x_i, p_i)_{i=1}^n$, where $x_i > x_{i+1}$ for every $i = 1, \ldots, n - 1$, second-order stochastically dominates lottery $\ell' = (x_i, p_i')_{i=1}^n$ if $D_j(\ell, \ell') = \sum_{i=j}^{n-1} (x_i - x_{i+1}) \sum_{h=1}^{i} (p_h - p_h') \ge 0$ for every $j = 1, \ldots, n - 1$, that is, with respect to the cumulative probability functions in the continuous case, if $\int_{\underline{x}}^{x} (F(t) - F'(t))\,dt \le 0$ for every $x \in X = [\underline{x}, \overline{x}]$. The first-order
stochastic dominance implies second-order stochastic dominance, but not vice versa.

Proposition 9 Let two lotteries $\ell$ and $\ell'$ have the same expected value, so that $\sum_{i=1}^{n-1} (x_i - x_{i+1}) \sum_{h=1}^{i} (p_h - p_h') = 0$. If the lottery $\ell'$ is more risky than $\ell$ (according to the mean preserving spreads criterion), then $\ell$ second-order stochastically dominates $\ell'$. Conversely, if $\ell$ second-order stochastically dominates $\ell'$, then $\ell'$ can be obtained from $\ell$ by a sequence of mean preserving spreads.

The equivalence of the second-order stochastic dominance and mean preserving spreads for the lotteries with the same expected value implies that the same conditions that determine the aversion to the increases in risk (introduced by mean preserving spreads) also determine the aversion for the lotteries that are second-order stochastically dominated (in comparison between lotteries of the same expected value).

2. The second definition of riskiness refers to probability mixtures [11]. According to this definition, a compound lottery is, ceteris paribus, more risky than a simple lottery. More precisely, let us define as a probability mixture of two simple lotteries $\ell_a = (x_a(s_j), p(s_j))_{j=1}^m$ and $\ell_b = (x_b(s_j), p(s_j))_{j=1}^m$, where $S = \{s_1, \ldots, s_m\}$ is the set of the states of the nature, the two-stages lottery $\lambda \ell_a \oplus (1 - \lambda)\ell_b = (((x_a(s_j), \lambda), (x_b(s_j), (1 - \lambda))), p(s_j))_{j=1}^m$, where $\lambda \in [0, 1]$. Figure 5 represents the simplest case of a probability mixture.

Figure 5 Probability mixture of two lotteries

Definition 9 (Aversion to Probability Mixture Increases in Risk). An agent is averse to the increases in risk if $CE(\lambda \ell_a \oplus (1 - \lambda)\ell_b) \le \max\{CE(\ell_a), CE(\ell_b)\}$ for every pair of lotteries $\ell_a, \ell_b \in L$ and $\lambda \in [0, 1]$.

Note that the expected utility model implies neutrality toward probability mixture increases in risk, since this model satisfies the compound lottery principle, according to which $EU(\lambda \ell_a \oplus (1 - \lambda)\ell_b) = \lambda EU(\ell_a) + (1 - \lambda) EU(\ell_b)$.

Risk Aversion and Aversion to Increasing Risk with Regard to Rank-dependent Expected Utility

Let us take into consideration a generalization of expected utility theory in order to show some aspects of risk aversion and aversion to increasing risk, which appear very different from the case of expected utility.

Definition 10 (Rank-dependent Expected Utility [8, 4]). The system of preferences $\langle L, \succsim \rangle$ is represented by rank-dependent expected utility $U : L \to \mathbb{R}$ if, for every lottery $\ell \in L$ with $\ell = (x_i, p_i)_{i=1}^n$ and $x_i > x_{i+1}$ for every $i = 1, \ldots, n - 1$, where $x_i \in X$ with $X = [\underline{x}, \overline{x}] \subset \mathbb{R}$, we have

$$U(\ell) = u(x_n) + \sum_{i=1}^{n-1} (u(x_i) - u(x_{i+1}))\, \varphi\!\left(\sum_{h=1}^{i} p_h\right)$$ (5)

where function $u : X \to \mathbb{R}$ represents the system of preferences $\langle X, \succsim \rangle$ over the set of outcomes and function $\varphi : [0, 1] \to [0, 1]$, which is increasing, with $\varphi(0) = 0$ and $\varphi(1) = 1$, distorts the decumulative probability function.

Thus, the rank-dependent expected utility model describes the agent’s system of preferences by means of a utility function on outcomes and a probability distortion function (while the expected utility model requires only the first function). Note that, when
the probability distortion function is the identity function, that is, when $\varphi(p) = p$ for every $p \in [0, 1]$, then rank-dependent expected utility coincides with expected utility.

Recalling that an agent is risk averse if $CE(\ell) \le EV(\ell)$ for every $\ell \in L$, that is, if the risk premium $RP(\ell) = EV(\ell) - CE(\ell)$ is nonnegative for every $\ell \in L$, let us split the risk premium $RP(\ell)$ into two parts: first-order risk premium $RP_1(\ell) = CE_{EU}(\ell) - CE(\ell)$ (with $CE_{EU}(\ell) = u^{-1}\left(\sum_{i=1}^{n} p_i u(x_i)\right)$) and second-order risk premium $RP_2(\ell) = EV(\ell) - CE_{EU}(\ell)$.

Proposition 10 [6]. Let $\langle L, \succsim \rangle$ be represented by rank-dependent expected utility. There is first-order risk aversion, that is, $RP_1(\ell) = CE_{EU}(\ell) - CE(\ell) \ge 0$ for every $\ell \in L$, if and only if probability distortion function $\varphi : [0, 1] \to [0, 1]$ is such that $\varphi(p) \le p$ for every $p \in [0, 1]$. The agent exhibits second-order risk aversion, that is, $RP_2(\ell) = EV(\ell) - CE_{EU}(\ell) \ge 0$ for every $\ell \in L$, if and only if the utility function $u : X \to \mathbb{R}$ is concave. As a consequence, an agent is risk averse, that is, $RP(\ell) = EV(\ell) - CE(\ell) \ge 0$ for every $\ell \in L$, if $\varphi(p) \le p$ for every $p \in [0, 1]$ and $u : X \to \mathbb{R}$ is concave. In essence, the condition $\varphi(p) \le p$ means that the agent overstates the probabilities of some bad outcomes and understates the probabilities of some better outcomes. Because of the probability distortion, rank-dependent expected utility admits first-order risk aversion, therefore allowing for a significant risk aversion even when stakes are small, contrary to expected utility [9]. This may be relevant in finance applications when the agent’s choice concerns lotteries in which a small amount of wealth is involved.

Proposition 11 Let $\langle L, \succsim \rangle$ be represented by rank-dependent expected utility. The agent is locally risk averse if the probability distortion function $\varphi : [0, 1] \to [0, 1]$ is such that $\varphi(p) < p$ for every $p \in (0, 1)$ and only if $\varphi(p) \le p$.

In other words, if the rank-dependent expected utility theory holds, then the condition for local risk aversion concerns only the probability distortion function. (As a consequence, the de Finetti–Arrow–Pratt coefficient of risk aversion, which has as its object the utility function $u(\cdot)$, is of no importance in the case of rank-dependent expected utility.)

Another interesting point is that the first-order derivative of risk premium $RP(x + t\ell)$ with respect to $t$ is generally nonzero and discontinuous at $t = 0$. For example, if $n = 2$ and $x_1 > x_2$, we get $\lim_{t \to 0^+} \frac{\partial RP(x + t\ell)}{\partial t} = (x_1 - x_2)(p_1 - \varphi(p_1))$ and $\lim_{t \to 0^-} \frac{\partial RP(x + t\ell)}{\partial t} = (x_1 - x_2)(p_2 - \varphi(p_2))$. (However, the expected utility theory would yield $\lim_{t \to 0} \frac{\partial RP(x + t\ell)}{\partial t} = 0$.) In Figure 6, the curve $RP_{RDEU}(t)$ represents function $RP(x + t\ell)$ of a risk averse agent, where $\tan \alpha = (x_1 - x_2)(p_1 - \varphi(p_1))$ and $\tan \beta = (x_1 - x_2)(p_2 - \varphi(p_2))$. The curve $RP_{EU}(t)$ represents the same function when the expected utility theory holds. In the Hirshleifer–Yaari diagram (Figure 7), the indifference curves have a kink at $x_1 = x_2$, with $\lim_{x_1 - x_2 \to 0^+} \frac{dx_2(x_1)}{dx_1} = \frac{\varphi(p_1)}{1 - \varphi(p_1)} = \tan \gamma$ and $\lim_{x_1 - x_2 \to 0^-} \frac{dx_2(x_1)}{dx_1} = \frac{\varphi(p_2)}{1 - \varphi(p_2)} = \tan \delta$.

Figure 6 Risk premium function of a risk averse agent (curves $RP_{RDEU}(t)$ and $RP_{EU}(t)$ plotted against $t$)

Figure 7 Indifference curve of a risk averse rank-dependent expected utility agent (axes $x_1$ and $x_2$)

If the expected utility theory is valid, then both risk aversion and aversion toward increases in risk (introduced with mean preserving spreads) come from the same condition, which is concavity of the von Neumann–Morgenstern utility function (Propositions 3 and 8). These conditions are different when the rank-dependent expected utility theory holds. Moreover, a rank-dependent expected utility agent may exhibit aversion to probability mixture increases in risk, while the expected utility agent is always neutral.

Proposition 12 Let $\langle L, \succsim \rangle$ be represented by rank-dependent expected utility. Then, an agent is averse toward (mean preserving spreads) increases in risk if the function $\varphi : [0, 1] \to [0, 1]$ is convex and the function $u : X \to \mathbb{R}$ is concave. He/she is averse toward (probability mixtures) increases in risk if and only if the function $\varphi : [0, 1] \to [0, 1]$ is convex.

References

[1] Arrow, K.J. (1965). Aspects of the Theory of Risk-Bearing, Yrjö Jahnssonin Säätiö, Helsinki.
[2] Cohen, M.D. (1995). Risk-aversion concepts in expected- and non-expected-utility models, Geneva Papers on Risk and Insurance Theory 20, 73–91.
[3] de Finetti, B. (1952). Sulla preferibilità, Giornale degli Economisti NS 11, 685–709.
[4] Machina, M.J. (1987). Choice under uncertainty: problems solved and unsolved, Economic Perspectives 1, 121–154.
[5] Montesano, A. (1999). Risk and uncertainty aversion on certainty equivalent functions, in Beliefs, Interactions and Preferences in Decision Making, M.J. Machina & B. Munier, eds, Kluwer, Dordrecht, pp. 23–52.
[6] Montesano, A. (1999). Risk and uncertainty aversion with reference to the theories of expected utility, rank dependent expected utility, and Choquet expected utility, in Uncertain Decisions: Bridging Theory and Experiments, L. Luini, ed, Kluwer, Boston, pp. 3–37.
[7] Pratt, J.W. (1964). Risk aversion in the small and in the large, Econometrica 32, 122–136.
[8] Quiggin, J. (1982). A theory of anticipated utility, Journal of Economic Behavior and Organization 3, 323–343.
[9] Rabin, M. (2000). Risk aversion and expected utility theory: a calibration theorem, Econometrica 68, 1281–1292.
[10] Rothschild, M. & Stiglitz, J.E. (1970). Increasing risk: I. A definition, Journal of Economic Theory 2, 225–243.
[11] Wakker, P.P. (1994). Separating marginal utility and probabilistic risk aversion, Theory and Decision 36, 1–44.

Related Articles

Ambiguity; Behavioral Portfolio Selection; Expected Utility Maximization; Risk–Return Analysis; Utility Function.

ALDO MONTESANO
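A numerical sketch may help with the rank-dependent expected utility formula (5), the split of the risk premium into its first-order and second-order parts, and the second-order stochastic dominance test of Definition 8. It assumes, purely for illustration, $u(x) = \sqrt{x}$, $\varphi(p) = p^2$ (so that $\varphi(p) \le p$), the lottery $(100, 1/2; 0, 1/2)$, and a three-outcome mean preserving spread; none of these choices comes from the article, and all function names are hypothetical.

```python
import math


def ssd_dominates(xs, p, q, tol=1e-12):
    """Check D_j(l, l') >= 0 for every j = 1..n-1 (Definition 8).

    xs: common outcomes in strictly decreasing order.
    p, q: probabilities of the lotteries l and l' on those outcomes.
    """
    cum = 0.0
    terms = []
    for i in range(len(xs) - 1):
        cum += p[i] - q[i]                      # sum over h <= i of (p_h - p'_h)
        terms.append((xs[i] - xs[i + 1]) * cum)
    tail = 0.0
    for term in reversed(terms):                # D_j is the tail sum over i >= j
        tail += term
        if tail < -tol:
            return False
    return True


def rdeu(xs, p, u, phi):
    """Rank-dependent expected utility, formula (5); xs strictly decreasing."""
    total = u(xs[-1])
    cum = 0.0
    for i in range(len(xs) - 1):
        cum += p[i]                             # decumulative probability
        total += (u(xs[i]) - u(xs[i + 1])) * phi(cum)
    return total


def risk_premia(xs, p, u, u_inv, phi):
    """Return (RP1, RP2, RP) as defined before Proposition 10."""
    ev = sum(pi * xi for xi, pi in zip(xs, p))
    ce_eu = u_inv(sum(pi * u(xi) for xi, pi in zip(xs, p)))
    ce = u_inv(rdeu(xs, p, u, phi))
    return ce_eu - ce, ev - ce_eu, ev - ce


# Illustrative assumptions, not taken from the article.
u, u_inv = math.sqrt, lambda v: v * v
phi = lambda prob: prob ** 2                    # convex, with phi(p) <= p

rp1, rp2, rp = risk_premia([100.0, 0.0], [0.5, 0.5], u, u_inv, phi)
print(rp1, rp2, rp)

# Mean preserving spread of (3, 0.2; 2, 0.6; 1, 0.2): move 0.2 away from x_2.
xs = [3.0, 2.0, 1.0]
p_star, p_spread = [0.2, 0.6, 0.2], [0.3, 0.4, 0.3]
print(ssd_dominates(xs, p_star, p_spread), ssd_dominates(xs, p_spread, p_star))
```

Under these assumptions the first-order premium is 18.75 and the second-order premium is 25, which add up to the total premium of 43.75. The original lottery second-order stochastically dominates its spread, and not the other way round. Replacing `phi` with the identity makes the first-order premium vanish, in line with the remark that rank-dependent expected utility then coincides with expected utility.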
Ambiguity

In the literature on decision making under uncertainty, ambiguity is now consistently used to define those decision settings in which an economic agent perceives “[...] uncertainty about probability, created by missing information that is relevant and could be known” [17]. Other terms have been used interchangeably, notably “Knightian uncertainty,” based on Knight’s [32] distinction between “risk” (a context in which all the relevant “odds” are known and unanimously agreed upon) and “uncertainty” (a context in which some “odds” are not known). The term ambiguity, which avoids charging uncertainty with too many meanings, was introduced in [12], the paper that first showed how ambiguity represents a normative criticism of Savage’s [38] subjective expected utility (SEU) model.

Ellsberg proposed two famous thought experiments involving choices on urns in which the exact distribution of ball colors is unknown (one of which was anticipated in both [29] and [32]). A variant of Ellsberg’s so-called two-urn paradox is the following example, due to David Schmeidler. “Suppose that I ask you to make bets on two coins, one taken out of your pocket (a coin, which you have flipped countless times), the other taken out of my pocket. If asked to bet on ‘heads’ or on ‘tails’ on one of the two coins, would you rather bet on your coin or mine?” Most people, when posed this question, announce a mild but strict preference for betting on their own coin rather than on somebody else’s, both for heads and for tails. The rationale is precisely that their coin has a well-understood stochastic behavior, while the other person’s coin does not; that is, its behavior is ambiguous. The possibility that the coin be biased, although remote, cannot be dismissed altogether. This pattern of preference is called ambiguity aversion, and is, as suggested, very common (e.g., [6, p. 646] references many experimental replications of the “paradox”). It is easy to see that it is not compatible with the SEU model. For, suppose that a decision maker has a probabilistic prior $P$ over the state space $S = \{HH, HT, TH, TT\}$ (where $HT$ is the state in which the familiar coin lands heads up and the unfamiliar coin lands tails up, etc.). Then, by saying that he/she prefers a bet that pays off ¤1 if the familiar coin lands heads up (that is, a bet on the event $A = \{HH, HT\}$) to the bet that pays ¤1 if the unfamiliar coin lands heads up (that is, a bet on the event $B = \{HH, TH\}$), an SEU decision maker reveals that

$$u(1) P(A) + u(0)(1 - P(A)) > u(1) P(B) + u(0)(1 - P(B))$$ (1)

that is, $P(A) > P(B)$. Analogously, by preferring the bet on tails on the familiar coin to the bet on tails on the unfamiliar coin, an SEU decision maker reveals that

$$P(\{TH, TT\}) = P(A^c) = 1 - P(A) > 1 - P(B) = P(B^c) = P(\{HT, TT\})$$ (2)

that is, $P(A) < P(B)$: a contradiction. Yet, few people would immediately describe these preferences as being an example of irrationality. Ellsberg reports that Savage himself chose in the manner described above, and did not feel that his choices were clearly wrong [12, p. 656]. (Indeed, Savage was aware of the issue well before Ellsberg proposed his thought experiments, for Savage wrote in the Foundations of Statistics (pp. 57–58) that “there seem to be some probability relations about which we feel relatively ‘sure’ as compared to others,” adding that he did not know how to make such a notion of comparatively “sure” less vague.)

Ellsberg’s paper generated quite a bit of debate immediately after its publication (most of which is discussed in Ellsberg’s PhD dissertation [13]), but the lack of axiomatically founded models that could encompass a concern for ambiguity while retaining most of the compelling features of the SEU model worked to douse the flames. Moreover, the so-called Allais paradox [2], another descriptive failure of expected utility, which predated Ellsberg’s by a few years, monopolized the attention of decision theorists until the early 1980s. However, statisticians such as Good [23] and Arthur Dempster [9] did lay the foundations of statistics with sets of probabilities, providing analysis and technical results, which eventually made it into the toolbox of decision theorists.

Models of Ambiguity-sensitive Preferences

The interest in ambiguity as a reason for departure from the SEU model was revived by David Schmeidler, who proposed and characterized axiomatically
two of the most successful models of decision making in the presence of ambiguity, the Choquet expected utility (CEU) and the maxmin expected utility (MEU) models.

CEU [39] “resolves” the Ellsberg paradox by allowing a decision maker’s willingness to bet on an event to be represented by a set-function that is not necessarily additive; that is, a $v$, which, to disjoint events $A$ and $B$, may assign $v(A \cup B) \ne v(A) + v(B)$. More precisely, call a capacity any function $v$ defined on a $\sigma$-algebra $\Sigma$ of subsets of a state space $S$, which satisfies the following properties: (i) $v(\emptyset) = 0$, (ii) $v(S) = 1$, (iii) for any $A, B \in \Sigma$ such that $A \subseteq B$, $v(A) \le v(B)$. (Note that a probability (charge) is $v$, which satisfies instead of (iii) the property $v(A \cup B) = v(A) + v(B) - v(A \cap B)$ for any $A, B \in \Sigma$.) It is simple to see that if $v$ represents a decision maker’s beliefs, we may observe the preferences described above in the two-coin example. Just substitute $P$ in equations (1) and (2) with $v$ satisfying $v(A) = v(A^c) = 1/2$ and $v(B) = v(B^c) = 1/4$. The obvious question is that of defining expectations for a notion of “belief”, which is not a measure. As the model’s name suggests, Schmeidler used the notion of integral for capacities, which was developed by Choquet [8]. Formally, given a capacity space $(S, \Sigma, v)$ and a $\Sigma$-measurable function $a : S \to \mathbb{R}$, the Choquet integral of $a$ with reference to (w.r.t.) $v$ is given by the following formula:

$$\int_S a(s)\,dv(s) \equiv \int_0^{\infty} v(\{s \in S : a(s) \ge \alpha\})\,d\alpha + \int_{-\infty}^{0} [v(\{s \in S : a(s) \ge \alpha\}) - 1]\,d\alpha$$ (3)

This is shown to correspond to Lebesgue integration when the capacity $v$ is a probability. Schmeidler provided axioms on a decision maker’s preference relation $\succsim$, which guarantee that the latter is represented by the Choquet expectation w.r.t. $v$ of a real-valued utility function $u$ (on final prizes $x \in X$). Precisely, given choice options (acts) $f, g : S \to X$,

$$f \succsim g \iff \int_S u(f(s))\,dv(s) \ge \int_S u(g(s))\,dv(s)$$ (4)

That is, the decision maker prefers $f$ to $g$ whenever the Choquet integral of $u \circ f$ is greater than that of $u \circ g$. The interested reader is referred to Schmeidler’s paper for details of the axiomatization. For our purpose, it suffices to observe that, not too surprisingly, the key axiomatic departure from SEU (in the variant due to [3]) is a relaxation of the independence axiom (or what Savage calls the sure-thing principle), which is the property of preferences that the Ellsberg-like preferences above violate.

Not all capacities give rise to behavior which is averse to ambiguity, as in the above example. Schmeidler proposed the following behavioral notion of aversion to ambiguity. Assuming that the payoffs $x$ can themselves be (objective and additive) lotteries over a set of certain prizes, define for any $\alpha \in [0, 1]$ the $\alpha$-mixture of acts $f$ and $g$ as follows: for any $s \in S$,

$$(\alpha f + (1 - \alpha) g)(s) \equiv \alpha f(s) + (1 - \alpha) g(s)$$ (5)

where the object on the right-hand side is the lottery that pays off prize $f(s)$ with probability $\alpha$ and prize $g(s)$ with probability $(1 - \alpha)$. Now, say that a preference satisfies ambiguity hedging (Schmeidler calls this property uncertainty aversion) if for any $f$ and $g$ such that $f \sim g$ we have

$$\alpha f + (1 - \alpha) g \succsim f$$ (6)

for any $\alpha$. That is, the decision maker may prefer to “hedge” the ambiguous returns of two indifferent acts by mixing them appropriately. This makes sense if we consider two acts whose payoff profiles are negatively correlated (over $S$), so that the mixture has a payoff profile, which is flatter, hence less sensitive to the information on $S$, than the original acts. (Ghirardato and Marinacci [20] discuss ambiguity hedging, arguing that it captures more than just the ambiguity aversion of equations (1) and (2).) Schmeidler shows that a CEU decision maker satisfies ambiguity hedging if and only if her capacity $v$ is supermodular; that is, for any $A, B \in \Sigma$,

$$v(A \cup B) \ge v(A) + v(B) - v(A \cap B)$$ (7)

Ambiguity hedging also plays a key role in the second model of ambiguity-sensitive preferences proposed by Schmeidler, the MEU model introduced jointly with Itzhak Gilboa [21]. In MEU, the decision maker’s preferences are represented by (a utility function $u$ and) a set $C$ of probability charges
on $(S, \Sigma)$, which is nonempty, (weak*-)closed and convex, as follows:

$$f \succsim g \iff \min_{P \in C} \int_S u(f(s))\,dP(s) \ge \min_{P \in C} \int_S u(g(s))\,dP(s)$$ (8)

Thus, the presence of ambiguity is reflected by the nonuniqueness of the prior probabilities over the set of states. In the authors’ words, “the subject has too little information to form a prior. Hence, (s)he considers a set of priors as possible” [21, p. 142]. In the two-coin example, let $S$ be the product space $\{H, T\} \times \{H, T\}$ and consider the set of priors

$$C \equiv \bigcup_{a \in [1/4, 3/4]} \{\{1/2, 1/2\} \times \{a, 1 - a\}\}$$ (9)

It is easy to see that a decision maker with such a $C$ will “assign” to events $A$ and $A^c$ the weight $\min_{P \in C} P(A) = 1/2 = \min_{P \in C} P(A^c)$, and to events $B$ and $B^c$ the weight $\min_{P \in C} P(B) = 1/4 = \min_{P \in C} P(B^c)$, thus displaying the classical Ellsberg preferences. Gilboa and Schmeidler showed that MEU is axiomatically very close to CEU. While ambiguity hedging is required (being single-handedly responsible for the “min” in the representation; see [19]), a weaker version of independence is used.

Ambiguity hedging characterizes the intersection of the CEU and MEU models. Schmeidler [39] shows that a decision maker’s preferences have both CEU and MEU representations if and only if (i) the $v$ in the CEU representation is supermodular, and (ii) the lower envelope of the set $C$ in the MEU representation, $\underline{C}(\cdot) \equiv \min_{P \in C} P(\cdot)$, is a supermodular capacity and $C$ is the set of all the probability charges that dominate $\underline{C}$ (the core of $\underline{C}$). On the other hand, there are CEU preferences that are not MEU (take a capacity $v$ which is not supermodular), and MEU preferences that are not CEU (see [30, Example 1]).

The CEU and MEU models brought ambiguity back to the forefront of decision theoretic research, and in due course, as “applications” of such theoretical models started to appear, they were key in attracting the attention of mainstream economics and finance.

On the theoretical front, a number of alternative axiomatic models have been developed. First, there are generalizations of CEU and MEU. For instance, Maccheroni et al. [33] presented a model that they called variational preferences, which relaxes the independence condition used in MEU while retaining the ambiguity hedging condition. An important special case of variational preferences is the so-called multiplier model of Hansen and Sargent [25], a key model in the applications literature to be discussed later. Siniscalchi [42] proposed a model that he called vector expected utility, in which an act is evaluated by modifying its expectation (w.r.t. a “baseline probability”) by an adjustment function capturing ambiguity attitudes. Such a model is also built with applications in mind, as it (potentially) employs a smaller number of parameters than CEU and MEU.

Second, Bewley [4] (originally circulated in 1986) suggested that ambiguity might result in incompleteness of preferences, rather than in violation of independence. Under such assumptions, he found a representation in which a set of priors $C$ appears in a “unanimity” sense as follows:

$$f \succsim g \iff \int_S u(f(s))\,dP(s) \ge \int_S u(g(s))\,dP(s) \quad \text{for all } P \in C$$ (10)

That is, the decision maker prefers $f$ over $g$ whenever $f$ dominates $g$ according to every “possible scenario” in $C$. Preferences are undecided otherwise, and Bewley suggested completing them by following an “inertia” rule: the status quo is retained if undominated by any available act. In a model that joins the two research strands just described, Ghirardato et al. [19] showed that if we drop ambiguity hedging from the MEU axioms, we can still obtain the set of priors $C$ as a “unanimous” representation of a suitably defined incomplete subset of the decision maker’s preference relation, which they interpreted as “unambiguous” preference (i.e., a preference that is not affected by the presence of ambiguity). This yields a model, of which both CEU and MEU are special cases, in which the decision maker evaluates act $f$ via the functional

$$V(f) = a(f) \min_{P \in C} \int_S u(f(s))\,dP(s) + (1 - a(f)) \max_{P \in C} \int_S u(f(s))\,dP(s)$$ (11)
where $a(f) \in [0, 1]$ is the decision maker’s ambiguity aversion in evaluating $f$ (a generalization of the decision rule suggested by Hurwicz [27]).

A third modeling approach relaxes the “reduction of compound lotteries” property that is built within the expected utility model. The basic idea is that the decision maker forms a “second-order” probability $\mu$ over the set of possible priors over $S$, and that he/she does not reduce the resulting compound probability. That is, he/she could evaluate act $f$ by first calculating its expectation $E_P(u \circ f) \equiv \int u(f(s))\,dP(s)$ with respect to each prior $P$ that he/she deems possible, and then computing

$$\int_{\Delta} \phi(E_P(u \circ f))\,d\mu(P)$$ (12)

where $\Delta$ denotes the set of all possible probability charges on $(S, \Sigma)$, and $\phi : \mathbb{R} \to \mathbb{R}$ is a function, which is not necessarily affine. This is the reasoning adopted by Segal [40], followed by Ergin and Gul [16], Klibanoff et al. [31], Nau [37], and Seo [41]. The case of SEU corresponds to $\phi$ being affine, while Klibanoff et al. [31] show that $\phi$ being concave corresponds intuitively to ambiguity averse preferences. That is, the “external” utility function describes ambiguity attitude, while the “internal” one describes risk attitude. An important feature of such a model is that its representation is smooth (in utility space), whereas those of MEU and CEU are generally not. For this reason, this is called the smooth ambiguity model.

In concluding this brief survey of decision models, it is important to stress that, owing to space constraints, the focus is on static models. The literature on intertemporal models is more recent and less developed, in part, because of the fact that non-SEU preferences often violate a property called dynamic consistency [18], making it hard to use the traditional dynamic programming tools. Important contributions in this area are found in [14, 22] (characterizing the so-called recursive MEU model) and [24, 34].

Applications

As mentioned above, the CEU and MEU models were finally successful in introducing ambiguity into mainstream research in economics and finance. Many papers have been written, which assume that (some) agents have CEU or MEU preferences. The interested reader is referred to [36] for an extensive survey of such applications, while some applications to finance are briefly discussed here.

In a seminal contribution, Dow and Werlang [10] showed that a CEU agent with supermodular capacity may display a nontrivial bid–ask spread on the price of an (ambiguous) Arrow security, even without frictions. If the price of the security falls within such an interval, the agent will not want to trade the security at all (given an initial riskless position). Epstein and Wang [15] employed the recursive MEU model to study the equilibrium of a representative agent economy à la Lucas. They showed that price indeterminacy can arise in equilibrium for reasons that are closely related to Dow and Werlang’s observation. Other contributions followed along this line; for example, see [7, 35, 43]. More recently, the smooth ambiguity model has also been receiving attention; see, for example, [28].

Though originally not motivated by the Ellsberg paradox and ambiguity, the “model uncertainty” literature due to Hansen et al. ([26], but more comprehensively found in [25]) falls squarely within the scope of the applications of ambiguity. Moreover, both decision models they employ are special cases of the models described above: the “multiplier model” is a special case of variational preferences, and the “constraint model” is a special case of MEU.

Most of the applications of ambiguity to finance (an exception being [11]) are cast in a representative agent environment, with the preferences of the representative agent satisfying in one case MEU, in another CEU, and so on. Recent work on experimental finance by Bossaerts et al. [5] and Ahn et al. [1] finds that experimental subjects, when making portfolio choices with ambiguous Arrow securities, display substantial heterogeneity in ambiguity attitudes. Because Bossaerts et al. [5] show that such heterogeneity may easily result in a breakdown of the representative agent result, such findings cast some doubt on the generality of a representative agent approach to financial markets equilibrium.

References

[1] Ahn, D., Choi, S., Gale, D. & Shachar, K. (2007). Estimating Ambiguity Aversion in a Portfolio Choice Experiment, UC Berkeley, Mimeo.
[2] Allais, M. (1953). Le comportement de l’homme rationnel devant le risque: Critique des postulats
Ambiguity 5

et axiomes de l’école américaine, Econometrica 21, [23] Good, I.J. (1962). Subjective probability as the mea-
503–546. sure of a nonmeasurable set, in Logic, Methodology
[3] Anscombe, F.J. & Aumann, R.J. (1963). A definition of and Philosophy of Science, E. Nagel, P. Suppes &
subjective probability, Annals of Mathematical Statistics A. Tarski, eds, Stanford University Press, Stanford,
34, 199–205. pp. 319–329.
[4] Bewley, T. (2002). Knightian decision theory: part I, [24] Hanany, E. & Klibanoff, P. (2007). Updating prefer-
Decisions in Economics and Finance 25(2), 79–110. ences with multiple priors, Theoretical Economics 2(3),
(First version 1986). 261–298.
[5] Bossaerts, P., Ghirardato, P., Guarnaschelli, S. & [25] Hansen, L.P. & Sargent, T.J. (2007). Robustness, Prince-
Zame, W.R. (2006). Ambiguity and asset markets: the- ton University Press, Princeton, NJ.
ory and experiment, Review of Financial Studies, forth- [26] Hansen, L.P., Sargent, T.J. & Tallarini, T.D. (1999).
coming, Notebook 27, Collegio Carlo Alberto. Robust permanent income and pricing, Review of Eco-
[6] Camerer, C. (1995). Individual decision making, in The nomic Studies 66, 873–907.
Handbook of Experimental Economics, J.H. Kagel & [27] Hurwicz, L. (1951). Optimality Criteria for Decision
A.E. Roth, eds, Princeton University Press, Princeton, Making under Ignorance. Statistics 370, Cowles Com-
NJ, pp. 587–703. mission Discussion Paper.
[7] Chen, Z. & Epstein, L.G. (1999). Ambiguity, Risk [28] Izhakian, Y. & Benninga, S. (2008). The Uncertainty
and Asset Returns in Continuous Time, University of Premium in an Ambiguous Economy. Technical report,
Rochester, Mimeo. Recanati School of Business, Tel-Aviv University.
[8] Choquet, G. (1953). Theory of capacities, Annales de [29] Keynes, J.M. (1921). A treatise on probability, The
l’Institut Fourier (Grenoble) 5, 131–295. Collected Writings of John Maynard Keynes, Macmil-
[9] Dempster, A.P. (1967). Upper and lower probabilities lan, London and Basingstoke, paperback 1988 edition,
induced by a multi-valued mapping, Annals of Mathe- Vol. VIII.
matical Statistics 38, 325–339. [30] Klibanoff, P. (2001). Characterizing uncertainty aversion
through preference for mixtures, Social Choice and
[10] Dow, J. & Werlang, S. (1992). Uncertainty aversion,
Welfare 18, 289–301.
risk aversion, and the optimal choice of portfolio,
[31] Klibanoff, P., Marinacci, M. & Mukerji, S. (2005). A
Econometrica 60, 197–204.
smooth model of decision making under ambiguity,
[11] Easley, D. & O’Hara, M. Ambiguity and nonparticipa-
Econometrica 73(6), 1849–1892.
tion: the role of regulation, Review of Financial Studies
[32] Knight, F.H. (1921). Risk, Uncertainty and Profit,
22(5), 1817–1843.
Houghton Mifflin, Boston.
[12] Ellsberg, D. (1961). Risk, ambiguity, and the Savage
[33] Maccheroni, F., Marinacci, M. & Rustichini, A. (2006).
axioms, Quarterly Journal of Economics 75, 643–669.
Ambiguity aversion, robustness, and the variational
[13] Ellsberg, D. (2001). Risk, Ambiguity and Decision. PhD
representation of preferences, Econometrica 74(6),
thesis, Harvard University, 1962. Published by Garland 1447–1498.
Publishing Inc., New York. [34] Maccheroni, F., Marinacci, M. & Rustichini, A. (2006).
[14] Epstein, L.G. & Schneider, M. (2003). Recursive Dynamic variational preferences, Journal of Economic
multiple-priors, Journal of Economic Theory 113, 1–31. Theory 128(1), 4–44.
[15] Epstein, L.G. & Wang, T. (1994). Intertemporal asset [35] Mukerji, S. & Tallon, J.-M. (2001). Ambiguity aversion
pricing under Knightian uncertainty, Econometrica 62, and incompleteness of financial markets, Review of
283–322. Economic Studies 68(4), 883–904.
[16] Ergin, H. & Gul, F. (2004). A Subjective Theory of [36] Mukerji, S. & Tallon, J.-M. (2004). An overview of
Compound Lotteries. February. economic applications of David Schmeidler’s models
[17] Frisch, D. & Baron, J. (1988). Ambiguity and rationality, of decision making under uncertainty, in Uncertainty
Journal of Behavioral Decision Making 1, 149–157. in Economic Theory: A Collection of Essays in Honor
[18] Ghirardato, P. (2002). Revisiting Savage in a conditional of David Schmeidler’s 65th Birthday, I. Gilboa, ed.,
world, Economic Theory 20, 83–92. Routledge, Chapter 13, pp. 283–302.
[19] Ghirardato, P., Maccheroni, F. & Marinacci, M. (2004). [37] Nau, R.F. (2006). Uncertainty aversion with second-
Differentiating ambiguity and ambiguity attitude, Jour- order utilities and probabilities, Management Science
nal of Economic Theory 118(2), 133–173. 52(1), 136.
[20] Ghirardato, P. & Marinacci, M. (2002). Ambiguity made [38] Savage, L.J. (1954). The Foundations of Statistics,
precise: a comparative foundation, Journal of Economic Wiley, New York.
Theory 102, 251–289. [39] Schmeidler, D. (1989). Subjective probability and
[21] Gilboa, I. & Schmeidler, D. (1989). Maxmin expected expected utility without additivity, Econometrica 57,
utility with a non-unique prior, Journal of Mathematical 571–587.
Economics 18, 141–153. [40] Segal, U. (1987). The Ellsberg paradox and risk
[22] Gilboa, I. & Schmeidler, D. (1993). Updating ambiguous aversion: an anticipated utility approach, International
beliefs, Journal of Economic Theory 59, 33–49. Economic Review 28, 175–202.
[41] Seo, K. (2006). Ambiguity and Second-order Belief, University of Rochester, Mimeo.
[42] Siniscalchi, M. Vector expected utility and attitudes toward variation, Econometrica 77(3), 801–855.
[43] Uppal, R. & Wang, T. (2003). Model misspecification and under-diversification, Journal of Finance 58(6), 2465–2486.

Related Articles

Behavioral Portfolio Selection; Convex Risk Measures; Expected Utility Maximization; Expected Utility Maximization: Duality Methods; Risk Aversion; Utility Function; Utility Theory: Historical Perspectives.

PAOLO GHIRARDATO
Risk Premia

Risk premia are the expected excess returns that compensate investors for taking on aggregate risk. The first section of this article defines risk premia analytically. The second section surveys empirical evidence on equity, bond, and currency excess returns. The third section reviews the models that explain these risk premia.

Theoretical Definition

Risk premia are derived analytically from Euler equations that link returns to stochastic discount factors (SDFs). These Euler equations can be derived under three different assumptions: complete markets, the law of one price, or the existence of investors’ preferences. These three assumptions are reviewed here, followed by the analytical definition of risk premia.

Euler Equations

Utility-based Asset Pricing. Assume that the investor derives some utility $u$ from consumption $C$ now and in the next period. This setup can be easily generalized to many periods. Let us find the price $P_t$ at time $t$ of a payoff $X_{t+1}$ at time $t+1$. Let $Q$ be the original consumption level in the absence of any asset purchase and let $\xi$ be the amount of the asset the investor chooses to buy. The constant subjective discount factor is $\beta$. The maximization problem of this investor is

$$\max_{\xi}\; u(C_t) + E_t[\beta u(C_{t+1})] \quad \text{subject to: } C_t = Q_t - P_t \xi, \quad C_{t+1} = Q_{t+1} + X_{t+1} \xi \tag{1}$$

Substituting the constraints into the objective and setting the derivative with respect to $\xi$ to zero yields

$$P_t u'(C_t) = E_t[\beta u'(C_{t+1}) X_{t+1}] \tag{2}$$

where $P_t u'(C_t)$ is the loss in utility if the investor buys another unit of the asset, and $E_t[\beta u'(C_{t+1}) X_{t+1}]$ is the expected and discounted increase in utility he/she obtains from the extra payoff $X_{t+1}$. The investor continues to buy or sell the asset until the marginal loss equals the marginal gain. The Euler equation is thus

$$P_t = E_t\left[\beta \frac{u'(C_{t+1})}{u'(C_t)} X_{t+1}\right] = E_t[M_{t+1} X_{t+1}] \tag{3}$$

where the SDF $M_{t+1}$ is defined as $M_{t+1} \equiv \beta u'(C_{t+1})/u'(C_t)$.

Complete Markets. Let us now abstract from utilities and assume that markets are complete. There are $S$ states of nature tomorrow, and $s$ denotes an individual state. A contingent claim is a security that pays one dollar (or one unit of the consumption good) in one state $s$ only tomorrow. The price today of this contingent claim is $P_c(s)$. In complete markets, investors can buy any contingent claim (or synthesize all contingent claims). Let $\mathcal{X}$ be the payoff space and $X(s) \in \mathcal{X}$ denote an asset’s payoff in state of nature $s$. Let $\pi(s)$ be the probability that state $s$ occurs. Then the price of this asset is

$$P(X) = \sum_{s=1}^{S} P_c(s) X(s) = \sum_{s=1}^{S} \pi(s) \frac{P_c(s)}{\pi(s)} X(s) \tag{4}$$

We define $M$ as the ratio of the contingent claim’s price to the corresponding state’s probability, $M(s) \equiv P_c(s)/\pi(s)$, to obtain the Euler equation in complete markets:

$$P(X) = \sum_{s=1}^{S} \pi(s) M(s) X(s) = E(MX) \tag{5}$$

Law of One Price and the Absence of Arbitrage. Finally, assume now that markets are incomplete and that we simply observe a set of prices $P$ and payoffs $X$. Under a minimal set of assumptions, some discount factor exists that represents the observed prices by the same equation $P = E(MX)$. These assumptions are defined below:

Definition 1 Free portfolio formation: $X_1, X_2 \in \mathcal{X} \Rightarrow aX_1 + bX_2 \in \mathcal{X}$ for any real $a$ and $b$.

Definition 2 Law of one price: $P(aX_1 + bX_2) = aP(X_1) + bP(X_2)$.

Note that free portfolio formation rules out short sales constraints, bid/ask spreads, leverage limitations, and so on. The law of one price says
that investors cannot make instantaneous profits by repackaging portfolios. These assumptions lead to the following theorem:

Theorem 1 Given free portfolio formation and the law of one price, there exists a unique payoff $X^* \in \mathcal{X}$ such that $P(X) = E(X^* X)$ for all $X \in \mathcal{X}$.

As a result, there exists an SDF $M$ such that $P(X) = E(MX)$. Note that the existence of a discount factor implies the law of one price: $E[M(X + Y)] = E[MX] + E[MY]$. The theorem reverses this logic. Cochrane [7] offers a geometric and an arithmetic proof. With a stronger assumption, the absence of arbitrage, the SDF is strictly positive and thus represents some (potentially unknown) preferences. Let us first review the definition of the absence of arbitrage and then turn to this new theorem.

Definition 3 Absence of arbitrage: A payoff space $\mathcal{X}$ and pricing function $P(X)$ leave no arbitrage opportunities if every payoff $X$ that is always nonnegative ($X \geq 0$ almost surely) and strictly positive ($X > 0$) with some positive probability has some strictly positive price $P(X) > 0$.

In other words, no arbitrage says that one cannot get for free a portfolio that might pay off positively but will certainly never cost one anything. This assumption leads to the next theorem:

Theorem 2 No arbitrage and the law of one price imply the existence of a strictly positive discount factor $M > 0$ such that $P = E(MX)$, $\forall X \in \mathcal{X}$.

We have seen three ways to derive the Euler equation that links any asset’s price to the SDF. Before we exploit the Euler equation to define risk premia, note that only aggregate risk matters for asset prices.

Aggregate and Idiosyncratic Risk. Only the component of payoffs that is correlated with the SDF shows up in the asset’s price. Idiosyncratic risk, uncorrelated with the SDF, generates no premium. To see this, let us project $X$ on $M$ and decompose the payoff as follows:

$$X = \mathrm{proj}(X|M) + \varepsilon \tag{6}$$

Projecting $X$ on $M$ is like regressing $X$ on $M$ without a constant:

$$\mathrm{proj}(X|M) = \frac{E(MX)}{E(M^2)} M \tag{7}$$

The residuals $\varepsilon$ are orthogonal to the right-hand side variable $M$: $E(M\varepsilon) = 0$, which means that the price of $\varepsilon$ is zero. The price of the projection of $X$ on $M$ is the price of $X$:

$$P(\mathrm{proj}(X|M)) = E\left(M \frac{E(MX)}{E(M^2)} M\right) = E(MX) \tag{8}$$

Payoffs and Returns. We have reviewed three frameworks that lead to the Euler equation. This equation defines the asset price $P$ for any asset. For stocks, the payoff $X_{t+1}$ is the price next period $P_{t+1}$ and the dividend $D_{t+1}$. For a one-period bond, the payoff is 1: one buys a bond at price $P_t$ and receives 1 dollar next period. Alternatively, we can write the Euler equation in terms of returns. For stocks, returns are payoffs divided by prices: $R_{t+1} = X_{t+1}/P_t$. For bonds, one pays 1 dollar today and receives $R_{t+1}$ dollars tomorrow. In any case, the Euler equation in terms of returns is thus

$$E_t[M_{t+1} R_{t+1}] = 1 \tag{9}$$

The Euler equation naturally applies to a risk-free asset. If one pays 1 dollar today and receives $R^f_t$ dollars tomorrow for sure, the risk-free rate $R^f_t$ satisfies

$$R^f_t = 1/E_t[M_{t+1}] \tag{10}$$

Expected Excess Returns

Definition of Risk Premia. Applying the definition of the covariance to the Euler equation (9) for the asset return $R^i$ leads to $E_t(M_{t+1}) E_t(R^i_{t+1}) + \mathrm{Cov}_t[M_{t+1}, R^i_{t+1}] = 1$. Using the definition of the risk-free rate in equation (10), we obtain

$$E_t(R^i_{t+1}) - R^f_t = -R^f_t\, \mathrm{Cov}_t[M_{t+1}, R^i_{t+1}] \tag{11}$$

The left-hand side of equation (11) defines the expected excess return. The right-hand side of equation (11) defines the risk premium. When the asset return $R^i$ is negatively correlated with the SDF, the investor expects a positive excess return on asset
$i$. All assets have an expected return equal to the risk-free rate, plus a risk adjustment that is positive or negative.

To gain some intuition on the definition above, let us consider the case of preference-based SDFs. Assume that utility increases, and marginal utility decreases, with consumption; this is the consumption-capital asset pricing model (consumption-CAPM). Here, the SDF, also known as the intertemporal marginal rate of substitution, is the ratio of the marginal utility of consumption tomorrow to the marginal utility of consumption today. Substituting the SDF into equation (11), we obtain

$$E_t(R^i_{t+1}) - R^f_t = -R^f_t\, \frac{\mathrm{Cov}_t[\beta u'(C_{t+1}), R^i_{t+1}]}{u'(C_t)} \tag{12}$$

Marginal utility $u'(C)$ declines as consumption $C$ rises. Thus, an asset’s expected excess return is positive if its return covaries positively with consumption. The reason is as follows. Our assumption on the investors’ utility function implies that investors dislike uncertainty about consumption. An asset whose return covaries positively with consumption pays off well when the investor is already feeling wealthy and pays off badly when he/she is already feeling poor. Such an asset makes the investor’s consumption stream more volatile, and so must promise a higher expected return to induce investors to hold it.

Beta-representation and Market Price of Risk. We can rewrite the right-hand side of equation (11) as

$$\underbrace{\left(\frac{\mathrm{Cov}_t[M_{t+1}, R^i_{t+1}]}{\mathrm{Var}_t[M_{t+1}]}\right)}_{\beta_{i,M}} \underbrace{\left(-\frac{\mathrm{Var}_t[M_{t+1}]}{E_t[M_{t+1}]}\right)}_{\lambda_M} \tag{13}$$

$E_t(R^i_{t+1}) - R^f_t = \beta_{i,M} \lambda_M$ is then a beta-representation of the Euler equation. Note that $\lambda_M$ is independent of the asset $i$. It is called the market price of risk. $\beta_{i,M}$ is the quantity of risk. The expected excess return on asset $i$ is equal to the quantity of risk of this asset times the price of risk.

Euler Equation with Log Returns and Log SDF. To interpret risk premia, it is often easier to rewrite the previous results in terms of the log SDF $m_{t+1}$ and log return $r^i_{t+1}$. Assuming that the SDF and returns are lognormal, equation (9) leads to

$$E_t(m_{t+1}) + E_t(r^i_{t+1}) + \frac{1}{2}\mathrm{Var}_t(m_{t+1}) + \frac{1}{2}\mathrm{Var}_t(r^i_{t+1}) + \mathrm{Cov}_t(m_{t+1}, r^i_{t+1}) = 0 \tag{14}$$

where lowercase letters denote logs. The same equation holds for the risk-free rate $r^f_t$. Let $\tilde{r}^{e,i}_{t+1}$ be the excess return corrected for the Jensen term: $\tilde{r}^{e,i}_{t+1} = r^i_{t+1} - r^f_t + \frac{1}{2}\mathrm{Var}_t(r^i_{t+1})$. Then, the expected log excess return is equal to

$$E_t(\tilde{r}^{e,i}_{t+1}) = -\mathrm{Cov}_t(m_{t+1}, \tilde{r}^{e,i}_{t+1}) \tag{15}$$

For the consumption-CAPM, the utility each period is $u(C) = C^{1-\gamma}/(1-\gamma)$. The log SDF depends only on consumption growth and is equal to $m_{t+1} = \log\beta - \gamma g - \gamma(\Delta c_{t+1} - g)$, where $\Delta c_{t+1}$ is log consumption growth and $g$ is the average consumption growth. In this case, the expected excess return is equal to

$$E_t(\tilde{r}^{e,i}_{t+1}) = \gamma\, \mathrm{Cov}_t(\Delta c_{t+1} - g, \tilde{r}^{e,i}_{t+1}) \tag{16}$$

Again, assets whose returns covary positively with consumption must promise positive expected returns to induce investors to hold them.

Empirical Evidence

Now the empirical stylized facts on risk premia are discussed. A large literature shows that, in many asset markets, expected excess returns are sizable and time-varying. The equity, bond, and currency markets are considered (see Predictability of Asset Prices).

Stock Markets

Evidence of large risk premia abounds on equity markets. The size of the average excess return on the stock market is actually puzzling from a consumption-based asset pricing perspective; it constitutes the equity premium puzzle. Moreover, expected equity returns appear time-varying.

Equity Premium Puzzle. To understand the equity premium puzzle, let us first define the Sharpe ratio.
Definition 4 The Sharpe ratio SR measures how much return the investor receives per unit of volatility:

$$SR = \frac{E(R^i) - R^f}{\sigma(R^i)} \tag{17}$$

where $\sigma(R^i)$ denotes the standard deviation of the return $R^i$.

Over the period 1927–2006 in the United States, real excess returns on the New York Stock Exchange (NYSE) stock index have averaged 8%, with a standard deviation of 20%, and thus the Sharpe ratio has been about 0.4. Starting from equation (11) and using the fact that correlations are below unity, the Sharpe ratio is linked to the first and second moments of SDFs:

$$\frac{E(R^i) - R^f}{\sigma(R^i)} \leq \frac{\sigma(M)}{E(M)} \tag{18}$$

Now, recall the consumption-CAPM and assume that consumption is lognormal. Then, the right-hand side is approximately

$$\frac{\sigma(M)}{E(M)} = \sqrt{e^{\gamma^2 \sigma_c^2} - 1} \approx \gamma \sigma_c \tag{19}$$

where $\sigma_c$ denotes the standard deviation of consumption growth. Aggregate nondurable and services consumption growth has a mean of 2% and a standard deviation of 1%, implying a risk-aversion coefficient of 40! If we take into account the low correlation between consumption growth rates and market returns, the implied risk aversion is even higher. This is the equity premium puzzle of Mehra and Prescott [16]. Such a high risk-aversion coefficient implies implausibly high risk-free rates. This is the risk-free rate puzzle of Weil [19]. The above-mentioned evidence is based on realized excess returns. Yet similar results are obtained with expected excess returns, which turn out to be large and time-varying.

Time-varying Expected Excess Returns. The Campbell and Shiller [5] decomposition of stock returns frames the evidence on stock market predictability. To see this, start from $1 = R^{-1}_{t+1} R_{t+1} = R^{-1}_{t+1}(P_{t+1} + D_{t+1})/P_t$, and multiply both sides by the price-dividend ratio $P_t/D_t$ to obtain

$$\frac{P_t}{D_t} = R^{-1}_{t+1}\left(1 + \frac{P_{t+1}}{D_{t+1}}\right)\frac{D_{t+1}}{D_t} \tag{20}$$

Taking logs leads to

$$p_t - d_t = -r_{t+1} + \Delta d_{t+1} + \log(1 + e^{p_{t+1} - d_{t+1}}) \tag{21}$$

A first-order Taylor approximation of the last term around the mean price-dividend ratio $P/D$ gives

$$p_t - d_t = -r_{t+1} + \Delta d_{t+1} + k + \rho(p_{t+1} - d_{t+1}) \tag{22}$$

where $k = \log(1 + P/D)$ and $\rho = (P/D)/(1 + P/D)$. Iterating forward and assuming that $\lim_{j \to \infty} \rho^j (p_{t+j} - d_{t+j}) = 0$, one obtains

$$p_t - d_t = \text{Constant} + \sum_{j=1}^{\infty} \rho^{j-1}(\Delta d_{t+j} - r_{t+j}) \tag{23}$$

This equation holds ex-post, and thus also ex-ante:

$$p_t - d_t = \text{Constant} + E_t \sum_{j=1}^{\infty} \rho^{j-1}(\Delta d_{t+j} - r_{t+j}) \tag{24}$$

Now multiply both sides by $p_t - d_t - E(p_t - d_t)$ and take unconditional expectations. Then the variance of the log price-dividend ratio is

$$\mathrm{var}(p_t - d_t) = \mathrm{cov}\left(p_t - d_t, \sum_{j=1}^{\infty} \rho^{j-1} \Delta d_{t+j}\right) - \mathrm{cov}\left(p_t - d_t, \sum_{j=1}^{\infty} \rho^{j-1} r_{t+j}\right) \tag{25}$$

The fact that the price-dividend ratio varies means that either dividend growth rates or returns must be forecastable. The question is: which one is forecastable? Long-horizon regressions show little predictability in dividend growth rates and some predictability in returns and excess returns (Table 1).

We have seen that the aggregate stock market offers evidence of sizable and time-varying risk premia. Many subsets of the market offer comparable results. For example, Fama and French [12] sort stocks along different dimensions (e.g., their market size, book-to-market ratios, or past returns), build the corresponding portfolios, and obtain large cross sections of returns. Buying the stocks in the last portfolio and selling the ones in the first portfolio leads to large and predictable excess returns, and thus evidence of equity risk premia.
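
The equity premium puzzle arithmetic in equations (17)–(19) can be checked with a few lines of code. The following is a minimal sketch, not part of the original article: the function names are hypothetical, and the inputs are simply the figures quoted in the text (an 8% mean excess return, 20% return volatility, and 1% consumption growth volatility). It compares the linear approximation $\gamma \sigma_c$ with the exact lognormal expression.

```python
import math


def sharpe_ratio(mean_excess_return, return_volatility):
    """Equation (17): expected excess return per unit of volatility."""
    return mean_excess_return / return_volatility


def sdf_volatility_ratio(gamma, sigma_c):
    """Exact lognormal expression in equation (19) for sigma(M)/E(M)."""
    return math.sqrt(math.exp((gamma * sigma_c) ** 2) - 1.0)


def implied_risk_aversion(sharpe, sigma_c):
    """Smallest gamma for which the bound (18) holds, inverting equation (19)."""
    return math.sqrt(math.log(1.0 + sharpe ** 2)) / sigma_c


# Figures quoted in the text (assumed inputs, not re-estimated here).
sr = sharpe_ratio(0.08, 0.20)
sigma_c = 0.01

gamma_approx = sr / sigma_c                      # uses sigma(M)/E(M) ~ gamma * sigma_c
gamma_exact = implied_risk_aversion(sr, sigma_c)  # uses the exact square-root formula

print("Sharpe ratio:", round(sr, 3))
print("Implied gamma (approximation):", round(gamma_approx, 1))
print("Implied gamma (exact lognormal):", round(gamma_exact, 1))
```

Under these inputs the approximation gives the risk-aversion coefficient of 40 quoted above, while the exact formula gives a slightly lower value of about 38.5; either way the implied coefficient is far above conventional estimates.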
Table 1 Long-horizon stock market predictability tests

             Excess returns             Dividend growth
Horizon h    α        s.e.    R²        α        s.e.    R²
1             3.77    1.38    0.07      −0.11    1.00    0.00
2             7.46    2.36    0.12      −0.76    0.86    0.01
3            12.07    3.70    0.18       0.12    0.98    0.00
4            17.62    5.27    0.24       0.41    1.26    0.00
5            22.01    5.66    0.29       0.03    0.89    0.00

This table reports slope coefficients α, standard errors s.e., and R² from in-sample predictability tests. In the left panel, the univariate regressions are $R^e_{t,t+h} = C + \alpha D_t/P_t + \varepsilon_{t+h}$, where $R^e_{t,t+h}$ denotes the h-year ahead stock market excess return and $D_t/P_t$ the dividend-price ratio. In the right panel, the regressions are $D_{t+h}/D_t = C + \alpha D_t/P_t + \varepsilon_{t+h}$, where $D_{t+h}/D_t$ denotes the h-year ahead dividend growth rate. The sample relates to the period 1927–2006. Data are annual.
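
The regressions behind Table 1 can be sketched as follows. This is an illustrative sketch only: it uses simulated data rather than the article's sample, the function name `long_horizon_slope` and all simulation parameters are assumptions, and h-period outcomes are approximated by summing one-period values. It does not compute standard errors; with overlapping observations these would need a correction for serial correlation.

```python
import numpy as np


def long_horizon_slope(predictor, one_period_outcome, horizon):
    """OLS of the h-period-ahead cumulated outcome on a constant and the predictor.

    one_period_outcome[t] is the outcome realised between t and t+1.
    Returns (slope, r_squared).
    """
    x = np.asarray(predictor, dtype=float)
    y1 = np.asarray(one_period_outcome, dtype=float)
    n = len(x) - horizon
    # Cumulate h consecutive one-period outcomes starting at each date t.
    y = np.array([y1[t:t + horizon].sum() for t in range(n)])
    X = np.column_stack([np.ones(n), x[:n]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r_squared = 1.0 - resid.var() / y.var()
    return coef[1], r_squared


# Simulated annual data (hypothetical parameters): a persistent
# dividend-price ratio and excess returns that load on it.
rng = np.random.default_rng(0)
T = 80
dp = np.empty(T)
dp[0] = 0.04
for t in range(1, T):
    dp[t] = 0.04 + 0.9 * (dp[t - 1] - 0.04) + 0.005 * rng.standard_normal()
excess = 3.0 * dp + 0.18 * rng.standard_normal(T)

for h in (1, 3, 5):
    slope, r2 = long_horizon_slope(dp, excess, h)
    print(h, round(slope, 2), round(r2, 2))
```

When the predictor is persistent, slopes and R² from such regressions tend to grow with the horizon, which is the pattern reported in the left panel of Table 1.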

Bond Markets

Equivalent results are obtained on bond markets, where expected excess returns exist and are time-varying. These results contradict the usual “expectation hypothesis of the term structure”. The hypothesis is reviewed first, followed by the empirical evidence on bond excess returns (see Expectations Hypothesis).

The expectation hypothesis can be defined in three equivalent ways:

• The yield $y^n_t$ of a bond with maturity $n$ is equal to the average of the expected yields of future one-year bonds, up to a constant risk premium:

$$y^n_t = \frac{1}{n} E_t\left(y^1_t + y^1_{t+1} + \cdots + y^1_{t+n-1}\right) \tag{26}$$

• The forward rate equals the expected future spot rate, up to a constant risk premium:

$$f^{n \to n+1}_t = E_t(y^1_{t+n}) \tag{27}$$

• The expected holding-period return (defined as the return on buying a bond of a given maturity $n$ and selling it in the next period) is the same for any maturity $n$, up to a constant risk premium:

$$E_t(hpr^n_{t+1}) = y^1_t, \quad \forall n \tag{28}$$

where $hpr^n_{t+1} = p^{n-1}_{t+1} - p^n_t$ denotes the log holding-period return and $p^n_t$ the log price of a bond of maturity $n$ (see End Note a).

Following Campbell and Shiller [6], the expectation hypothesis is often tested with the following equation:

$$y^{n-1}_{t+1} - y^n_t = \alpha + \beta_n \left(\frac{y^n_t - y^1_t}{n-1}\right) + \varepsilon_{t+1} \tag{29}$$

The expectation hypothesis implies that $\beta_n = 1$. In the data, the slope coefficient $\beta_n$ is significantly below 1, often negative, and decreasing with the horizon $n$ (Table 2). The rejection of the expectation hypothesis implies that bond markets offer time-varying, expected excess returns.

Table 2 Expectation hypothesis tests

          n=2       n=3       n=4       n=5
β_n      −0.88     −1.46     −1.62     −1.70
s.e.     [0.47]    [0.48]    [0.53]    [0.61]

This table reports slope coefficients $\beta_n$ and associated standard errors from the following univariate regressions: $y^{n-1}_{t+1} - y^n_t = \alpha + \beta_n \left(\frac{y^n_t - y^1_t}{n-1}\right) + \varepsilon_{t+1}$, where $y^n_t$ denotes the n-year bond yield. The sample relates to the period 1952–2006. Data are annual.

Currency Markets

Risk premia are also prevalent on currency markets. Currency excess returns correspond to the following investment strategy: borrowing in the domestic currency, exchanging this amount for some foreign currency, lending abroad, and converting back the earnings into the domestic currency. According to the standard uncovered interest rate parity (UIP) condition, the expected change in exchange rate should be equal to the interest rate differential between foreign
and domestic risk-free bonds. In this case, expected currency excess returns should be zero. However, the UIP condition is clearly rejected in the data. In a simple regression of exchange rate changes on interest rate differentials, UIP predicts a slope coefficient of 1. Instead, empirical work following Hansen and Hodrick [13] and Fama [11] consistently reveals a regression coefficient that is smaller than 1 and very often negative. The international economics literature refers to these negative UIP slope coefficients as the UIP puzzle or forward premium anomaly. Negative slope coefficients mean that currencies with higher than average interest rates actually tend to appreciate. Investors in foreign one-period discount bonds thus earn the interest rate spread, which is known at the time of their investment, plus the bonus from the appreciation of the currency during the holding period. As a result, the failure of the UIP condition implies positive predictable excess returns when investing in high interest rate currencies and negative excess returns when investing in low interest rate currencies. Lustig and Verdelhan [15] build portfolios of currency excess returns by sorting currencies on their interest rate differentials with the United States. They obtain a large cross section of currency excess returns and show that these excess returns compensate the US investor for bearing US aggregate macroeconomic risk because high interest rate currencies tend to depreciate in bad times. As a result, currency excess returns are also evidence of risk premia.

To summarize this section, equity, bond, and currency markets offer predictable excess returns, and are thus characterized by risk premia. Now the potential theoretical explanations of these risk premia are discussed.

Theoretical Interpretations

As observed above, the consumption-CAPM (also known as power utility) can replicate average equity excess returns only with implausibly high risk-aversion coefficients. Moreover, if consumption growth shocks are close to independent and identically distributed (i.i.d.), as they are in the data, this model does not explain time variations in expected excess returns. A large literature seeks to address these shortcomings and offers different interpretations of the observed risk premia. Now the three most successful classes of models in this literature, namely, habit preferences, long-run risk, and disaster risk, are reviewed.

Habit Preferences

Habit preferences assume that the agent does not care about the absolute level of his/her consumption, but cares about its relative level compared to a habit level that can be interpreted as a subsistence level, past consumption, or the neighbors’ consumption. Hence, preferences over habits $H$ are defined using ratios or differences ($C/H$ or $C - H$), where $H$ depends on past consumption: $H_t = f(C_{t-1}, C_{t-2}, \ldots)$. Major examples of habit preferences are found in Abel [1], Campbell and Cochrane [4], Constantinides [8] and Sundaresan [18]. Preferences defined using differences between consumption and habit (e.g., $u(C) = (C - H)^{1-\gamma}/(1-\gamma)$) imply a time-varying risk-aversion coefficient if the percentage gap between consumption and habit changes through time:

$$\gamma_t = -\frac{C U_{CC}}{U_C} = \frac{\gamma C_t}{C_t - H_t} \tag{30}$$

Campbell and Cochrane [4] propose a model along these lines. In their model, the habit level is slow moving; in bad times, consumption falls close to the habit level, and the investor is very risk averse. This model offers a new interpretation of risk premia: investors fear bad returns and wealth loss because they tend to happen in recessions, when consumption falls relative to its recent past. These preferences generate many interesting asset pricing features: pro-cyclical variations of stock prices, long-horizon predictability, countercyclical variation of stock market volatility, countercyclicality of the Sharpe ratio, and the short- and long-run equity premium.

Long-run Risk

The long-run risk literature works off the class of preferences due to Epstein and Zin [9, 10] and Kreps and Porteus [14]. These preferences impute a concern for the timing of the resolution of uncertainty to agents, and the risk-aversion coefficient is no longer the inverse of the intertemporal elasticity of substitution as it is with the consumption-CAPM (see Recursive Preferences). Building on these preferences,
Bansal and Yaron [2] propose a model where the consumption and dividend growth processes contain a low-frequency component and are heteroscedastic. These two features capture time-varying growth rates and time-varying economic uncertainty. Because this low-frequency component is persistent, a high value today signals high expected consumption growth in the future. If the intertemporal elasticity of substitution is above 1, then, in response to higher expected growth, agents buy more assets, and the price to consumption ratio rises: the intertemporal substitution effect dominates the wealth effect. In this case, asset prices are high in good times and low in bad times; thus, investors require risk premia. In this model, agents have preference for early resolution of uncertainty, which increases the risk compensation for long-run growth and uncertainty risks.

Disaster Risk

In the disaster risk literature, the agent is characterized by the usual constant relative risk-aversion preferences. Rietz [17] assumes that in each period a small-probability disaster may occur, and in this case, consumption and dividends drop sharply. Barro [3] calibrates disaster probabilities from the twentieth-century global history and shows that they are consistent with the high equity premium, low risk-free rate, and volatile stock returns. In this model, risk premia exist because investors fear rare economic disasters.

Conclusion

Under a minimal set of assumptions, any return satisfies a simple Euler equation. This equation implies that expected returns in excess of the risk-free rate, that is, risk premia, exist because returns comove with aggregate factors that matter for the investor. Empirical evidence from the equity, bond, and currency markets points to large and time-varying predictable excess returns. A recent literature tries to replicate and interpret these risk premia as compensations for recession, long-run, or disaster risks.

Acknowledgments

I owe a great part of my knowledge on risk premia to John Cochrane and to his book on “Asset Pricing”, which has inspired large parts of this article.

End Notes

a. Recall that the yield $y^n_t$ of an n-year bond is a fraction of the log price $p^n_t$ of the bond: $y^n_t = -\frac{1}{n} p^n_t$.

References

[1] Abel, A.B. (1990). Asset prices under habit formation and catching up with the Joneses, American Economic Review 80(2), 38–42.
[2] Bansal, R. & Yaron, A. (2004). Risks for the long run: a potential resolution of asset pricing puzzles, The Journal of Finance 59, 1481–1509.
[3] Barro, R. (2006). Rare disasters and asset markets in the twentieth century, Quarterly Journal of Economics 121, 823–866.
[4] Campbell, J.Y. & Cochrane, J.H. (1999). By force of habit: a consumption-based explanation of aggregate stock market behavior, Journal of Political Economy 107(2), 205–251.
[5] Campbell, J.Y. & Shiller, R.J. (1988). The dividend-price ratio and expectations of future dividends and discount factors, Review of Financial Studies 1, 195–228.
[6] Campbell, J.Y. & Shiller, R.J. (1991). Yield spreads and interest rates: a bird’s eye view, Review of Economic Studies 58, 495–514.
[7] Cochrane, J.H. (2001). Asset Pricing. Princeton University Press, Princeton, NJ.
[8] Constantinides, G.M. (1990). Habit formation: a resolution of the equity premium puzzle, The Journal of Political Economy 98, 519–543.
[9] Epstein, L.G. & Zin, S. (1989). Substitution, risk aversion and the temporal behavior of consumption and asset returns: a theoretical framework, Econometrica 57, 937–969.
[10] Epstein, L.G. & Zin, S. (1991). Substitution, risk aversion and the temporal behavior of consumption and asset returns, Journal of Political Economy 99(6), 263–286.
[11] Fama, E. (1984). Forward and spot exchange rates, Journal of Monetary Economics 14, 319–338.
[12] Fama, E.F. & French, K.R. (1992). The cross-section of expected stock returns, Journal of Finance 47(2), 427–465.
[13] Hansen, L.P. & Hodrick, R.J. (1980). Forward exchange rates as optimal predictors of future spot rates: an econometric analysis, Journal of Political Economy 88(5), 829–853.
[14] Kreps, D. & Porteus, E.L. (1978). Temporal resolution of uncertainty and dynamic choice theory, Econometrica 46, 185–200.
[15] Lustig, H. & Verdelhan, A. (2007). The cross-section of foreign currency risk premia and consumption growth risk, American Economic Review 97(1), 89–117.
[16] Mehra, R. & Prescott, E. (1985). The equity premium: a puzzle, Journal of Monetary Economics 15(2), 145–161.
[17] Rietz, T.A. (1988). The equity risk premium: a solution, Journal of Monetary Economics 22, 117–131.
[18] Sundaresan, S. (1989). Intertemporal dependent preferences and the volatility of consumption and wealth, The Review of Financial Studies 2(1), 73–88.
[19] Weil, P. (1989). The equity premium puzzle and the risk-free rate puzzle, Journal of Monetary Economics 24, 401–424.

Related Articles

Arbitrage Pricing Theory; Capital Asset Pricing Model; Stochastic Discount Factors; Utility Function.

ADRIEN VERDELHAN
Predictability of Asset Prices

Predictability can be interpreted in many ways in finance. The fundamental issue in asset pricing is to determine the relationship between risk and reward. To quantify such a relationship, an economic model is built to “predict” how the expected asset returns should vary with their risk measures. In this case, predictability means contemporaneous association between the expected return of an asset and the expected returns of different risk factors. For example, the capital asset pricing model (CAPM) predicts that a security’s expected risk premium is proportional to the expected return from the market factor, where the proportionality reflects the systematic risk measure. This type of predictability is not the focus of this article. Instead, the focus is on whether future security returns can be predicted from current known information.

One important assumption used to build a rational asset pricing model is market efficiency (see Efficient Market Hypothesis), in which security prices reflect all available information quickly and fairly. This was interpreted literally in the 1950s and 1960s as saying that any lagged variables possess no power in predicting current or future security prices or returns. Modern finance theory, however, has a different interpretation for the evidence of return predictability. In fact, researchers have recognized since the 1980s that expected returns can vary over time due to changes in investors’ risk tolerance and/or investment opportunities [30] over business cycles. If business cycles are predictable to some degree, returns can also be predictable, which poses no challenge to the efficient market hypothesis (EMH). Under this view, one should not rely solely on historical average returns to estimate expected returns when making investment decisions. In other words, the task of estimating expected returns precisely largely depends on our ability to predict future stock returns.

Given the fact that the serial correlations for aggregate stock returns are weak, especially in the recent decade, the quest for additional predictors goes on. Many financial variables have been shown to possess predictive power for stock returns. A partial list of these variables can be characterized as variables that are related to interest rates: relative interest rate [7], term spread and the default spread [7, 16, 23], inflation rate [14, 18]; variables that are related to “one over the price”: dividend yield [10], payout yield [4], earning–price ratio and dividend–earnings (payout) ratio [26], book-to-market ratio [25, 32]; and other variables including aggregate net issuing activity [2] and consumption–wealth–income ratio [27].

Although the focus is on the rational explanation for predictability, the evidence has also been interpreted differently under different views. Their differences are illustrated by the following story. Once there were four students walking on a street with their professor. A dollar bill lying on the sidewalk quickly caught the professor’s eye. The professor asked the four students why nobody was picking up the dollar bill. The first student answered that, although the dollar bill was real, people just pretended not to see it. The second student argued that the dollar bill was just an illusion (or a statistical illusion). The third student said that, even though the dollar bill was real, no one would bother to pick it up because it was too costly to do so (transactions costs). The last student’s answer was that the dollar bill was real: someone had left it there for a needy person. Generally speaking, the first student is a behaviorist; the second and third students hold the traditional efficient market view; and the last student holds the modern view on the EMH. No matter which student’s answer represents your view, predictability cannot be too large. There is an old saying: if you can predict the market, why aren’t you rich?

The existence of predictability is crucial in testing the conditional asset pricing models [19], in return decomposition [8], in asset allocation [22], and so on. Because of the theoretical foundation for predictability, this article focuses primarily on aggregate market returns. Predictability is also related to anomalies. An anomaly is defined as the deviation from an asset pricing model. In most empirical studies, anomalies are tied to a specific part of the market, such as small firms, firms with low book-to-market ratios, and so on, or particular sample periods, such as January, weekends, and so on. A detailed review on anomalies can be found in [35].

This article intends to offer a perspective on both the evidence and the reasons for return predictability. A detailed discussion about the economic reasons for predictability is given in the section Economic Interpretation of Predictability. Recent empirical studies
2 Predictability of Asset Prices

have uncovered many useful predictors, which are summarized in the section Understanding Some Useful Predictors. Predictability is not without controversy. Many of the statistical issues in testing predictability are discussed in the section Statistical Issues, followed by a conclusion in the last section.

Evidence on Predictability

The simplest form of predictability is return autocorrelation. To gain a perspective on the magnitude of the serial correlation, returns of different frequencies and over different sample periods are examined. Owing to the availability of daily returns, the whole sample period is from 1962 to 2006. The summary statistics are listed in Table 1 for both value-weighted and equal-weighted NYSE/AMEX/NASDAQ composite index returns. For the whole sample period, the average value-weighted index daily return is 0.044% with a volatility of 0.859%. Such a large difference between average return and volatility implies a very low Sharpe ratio of 5%. If returns are autocorrelated, the “true” Sharpe ratio should be larger (note a). For the value-weighted index returns, the autocorrelation is about 13%. This already large autocorrelation increases further to 31% when an equal-weighted index is used. If we fit an AR(1) model to the equal-weighted index returns, we see an $R^2$ of 9.61%, which is the square of the 31% autocorrelation. The difference in autocorrelation between the two types of index returns suggests that small stocks are more predictable than large stocks. To see whether such predictability is stable over time, the whole sample period is split into two. From Table 1, it is clear that most of the predictability from past returns is concentrated in the early sample period from 1962 to 1984, with autocorrelations as high as 22.4 and 38.5% for value-weighted and equal-weighted indices, respectively.

Predictability in daily returns might be subject to the market microstructure effects discussed in the section The Economic Interpretation of Predictability. One way to alleviate such effects is to examine the behavior of monthly returns. For both value- and equal-weighted index returns, the autocorrelations are substantially attenuated. For example, over the whole sample period, the autocorrelation for value-weighted index returns is only 4.3%, almost negligible. For the equal-weighted index, however, the autocorrelation is still as large as 17.6% for the whole sample period and is stable over the two subsample periods. Therefore, it can be concluded that return serial correlations are more likely to occur in small stocks. Given that there are still substantial serial correlations in low-frequency small stock return data, market microstructure effects cannot be the only factor.

If future returns can only be weakly predicted by past returns, are there other variables that help to predict returns? In Table 2, we further study return predictability using three other variables: the dividend yield, the repurchasing yield, and the relative interest rate. Our sample starts in 1952 after a major shift in the interest rate regime by the Federal Reserve. To be representative, we focus on the value-weighted index returns. During the first 27 years from 1952 to 1978, both the dividend yield and the relative interest rate

Table 1 Autocorrelations in index returns

                         Value weighted               Equal weighted
Sample period        Mean      SD     Corr.       Mean      SD     Corr.
Panel A: daily returns
1962–2006           0.044   0.859     13.2       0.069   0.744     31.0
1962–1984           0.035   0.794     22.4       0.068   0.787     38.5
1985–2006           0.053   0.922      6.1       0.071   0.696     21.0
Panel B: monthly returns
1962–2006           0.929   4.216      4.3       1.186   5.345     17.6
1962–1984           0.772   4.422      6.0       1.285   6.252     16.4
1985–2006           1.079   4.010      1.9       1.092   4.312     20.0

This table reports the characteristics of NYSE/AMEX/NASDAQ composite index returns over different sample periods and for different frequencies. “Corr.” stands for the first-order autocorrelation; “SD” is the standard deviation. All entries are in percent.
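
As a rough illustration of how the statistics in Table 1 relate to one another, the sketch below computes a first-order autocorrelation from a return series and reproduces the two back-of-the-envelope numbers quoted in the text: the daily Sharpe ratio and the $R^2$ implied by an AR(1) fit, which equals the squared autocorrelation. The function name `first_order_autocorr`, the zero risk-free rate, and the simulated series are illustrative assumptions, not the article's data or code.

```python
import random

def first_order_autocorr(x):
    """Sample first-order autocorrelation of a sequence."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

# Numbers quoted from Table 1 (daily returns, 1962-2006, in percent).
vw_mean, vw_sd = 0.044, 0.859
ew_corr = 0.31

sharpe_daily = vw_mean / vw_sd   # mean over volatility; zero risk-free rate assumed
ar1_r2 = ew_corr ** 2            # R^2 of an AR(1) regression is the squared autocorrelation

# Hypothetical check on simulated data: an AR(1) series with coefficient 0.31.
random.seed(0)
r = [0.0]
for _ in range(50_000):
    r.append(0.31 * r[-1] + random.gauss(0.0, 1.0))
rho_hat = first_order_autocorr(r)

print(f"Sharpe ratio: {sharpe_daily:.3f}, AR(1) R^2: {ar1_r2:.4f}, rho_hat: {rho_hat:.3f}")
```

On real data, the same function would be applied to the daily or monthly index return series instead of the simulated one.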

Table 2 VAR results for index returns

Dependent
variable         r_t     (D/P)_t   (F/P)_t    rrel_t    Adjusted R^2
Panel A: sample period 1952–1978
r_{t+1}         0.061     10.90     0.675    −11.67        0.062
(D/P)_{t+1}    −0.000     0.966     0.003     0.042        0.956
(F/P)_{t+1}    −0.001     0.038     0.943     0.034        0.898
rrel_{t+1}      0.000     0.032     0.005     0.731        0.529
Panel B: sample period 1979–2005
r_{t+1}         0.030     0.461     3.508    −0.801        0.009
(D/P)_{t+1}    −0.000     0.994    −0.009     0.005        0.985
(F/P)_{t+1}    −0.001     0.029     0.971     0.071        0.960
rrel_{t+1}     −0.000     0.009    −0.010     0.751        0.560

This table reports the VAR results for the four variables including the value-weighted NYSE/AMEX/NASDAQ composite index return, dividend yield, repurchasing yield, and the relative interest rate over different sample periods. The bold face number indicates that the estimate is statistically significant at a 5% level

have helped to predict returns, with an adjusted $R^2$ of 6.2%. In contrast, the repurchasing yield becomes more important over the next 27 years from 1979 to 2005, with an adjusted $R^2$ of 0.9%. The evidence suggests that returns are predictable even if not by their own past values. Despite the large persistence of all three predictors shown in Table 2, statistical adjustment of the estimates is unlikely to take away the predictive power of the three variables (see the section Statistical Issues).

Predictability and Market Efficiency

Historically, predictability has been associated with market inefficiency. According to the fundamental law of valuation, a security price should reflect its expected fundamental value for risk-neutral investors with zero interest rate:

$$P_t = E[V^* \mid I_t], \qquad P_{t+1} = E[V^* \mid I_{t+1}] \tag{1}$$

where $V^*$ is the fundamental value and $I_t$ is the information set at time $t$. Since the information set $I_t$ is included in the information set $I_{t+1}$, the following result is obtained by the law of iterated expectations:

$$P_t = E[V^* \mid I_t] = E[E(V^* \mid I_{t+1}) \mid I_t] = E[P_{t+1} \mid I_t] \tag{2}$$

Equation (2) suggests that security prices should follow a martingale process (note b). The best predictor for future prices is the current price. In other words, we have

$$\operatorname{Cov}[(P_{t+j} - P_{t+i}), (P_{t+l} - P_{t+k}) \mid I_t] = 0 \tag{3}$$

where $i < j < k < l$. In other words, nonoverlapping price changes are uncorrelated at all leads and lags. If we interpret the price difference as a return, it means that returns should be unpredictable.

This analysis defines the notion of the EMH. Financial markets are said to be efficient if security prices rapidly reflect all relevant information about asset values, and all securities are fairly priced in light of the available information. In other words, the EMH describes how security prices should react to available information and how prices should evolve over time. Under this framework, return predictability serves as evidence against the EMH.

Does the EMH indeed exclude predictability? To answer this question, we focus on a stronger version of the martingale process, which is the random walk process, and assume that investors are risk averse. The random walk process was first used by Bachelier (1900) to model stock prices in his dissertation, and was rekindled by Merton in the late 1960s. For convenience, we use the log price $p_t$:

$$p_{t+1} = \mu + p_t + \epsilon_{t+1} \tag{4}$$

where $\mu$ is the expected price change. If we define the return as $r_{t+1} = p_{t+1} - p_t$, equation (4) can be expressed as

$$r_{t+1} = \mu + \epsilon_{t+1} \tag{5}$$
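
A minimal simulation sketch of equations (4) and (5), assuming i.i.d. Gaussian shocks and hypothetical values for $\mu$ and the shock volatility (neither is given in the article): log prices follow a random walk with drift, so returns average $\mu$ and show essentially no first-order serial correlation.

```python
import random

random.seed(1)
mu, sigma, n = 0.0005, 0.01, 100_000   # hypothetical drift, shock volatility, sample size

# Equation (4): p_{t+1} = mu + p_t + eps_{t+1}
p = [0.0]
for _ in range(n):
    p.append(mu + p[-1] + random.gauss(0.0, sigma))

# Equation (5): r_{t+1} = p_{t+1} - p_t = mu + eps_{t+1}
r = [p[t + 1] - p[t] for t in range(n)]

mean_r = sum(r) / n
num = sum((r[t] - mean_r) * (r[t - 1] - mean_r) for t in range(1, n))
den = sum((v - mean_r) ** 2 for v in r)
autocorr = num / den   # close to zero under the random walk

print(f"mean return: {mean_r:.5f}, first-order autocorrelation: {autocorr:.4f}")
```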

Strictly speaking, the EMH only restricts the residual $\epsilon_{t+1}$ to satisfy the condition $E[\epsilon_{t+1} \mid I_t] = 0$ at any time $t$ in either equation (4) or (5). Since $\mu$ is determined by an asset pricing model, such as the CAPM, the traditional view on the EMH implicitly assumes that $\mu$ is constant. Modern finance theory, however, has offered a different view on $\mu$. For example, Fama and French [17] have suggested that the risk premium might be higher in an economic downturn than at the peak of a business cycle. This evidence suggests that the expected return might be time varying. In fact, many asset pricing models since Merton have emphasized the idea of changing investment opportunities, which require additional risk compensation over time. Alternatively, investors’ risk tolerance might change over time, which will cause investors to demand different levels of risk premium. No matter which scenario is more likely, one should allow $\mu$ to be time varying:

$$r_{t+1} = \mu_{t+1} + \epsilon_{t+1} \tag{6}$$

Although under the EMH we still have the condition $E[\epsilon_{t+1} \mid I_t] = 0$, $E[\mu_{t+1} \mid I_t]$ is not necessarily constant. For example, if risk premia change with the business cycle and the business cycle is predictable, returns should also be predictable. This analysis opens a channel for predictability to coexist with the EMH.

Returns from a buy-and-hold strategy on the market portfolio correspond to returns for a representative investor. Predictability means that someone can implement a trading strategy that requires a full investment in some periods and a zero or a short position in other periods in order to earn higher returns than those from a buy-and-hold strategy. Clearly, this investment strategy cannot be implemented by the representative investor since he/she has to be fully invested in the equity market. Although such a strategy will pay off in the long run, it is not without risk in the short term. The success of this strategy depends on the degree of predictability. Therefore, predictability cannot be too large, or too many investors would defect from being representative investors.

The Economic Interpretation of Predictability

Without assuming irrationality and market inefficiency, how can we interpret predictability under the traditional framework? Most explanations focus on market microstructure effects and transactions costs. This section reviews the bid–ask bounce, nonsynchronous trading, and transactions costs in explaining the return autocorrelation.

Bid–ask Bounce

Returns tend to be negatively autocorrelated in the short run. One possible explanation is offered by Roll [34] from the perspective of bid and ask price differences. In the absence of information, sell orders and buy orders arrive with the same probabilities. In other words, a buy order is likely to follow a sell order, which results in a negative autocorrelation. In particular, let $P_t^*$ be the fundamental value:

$$P_t = P_t^* + I_t (s/2) \tag{7}$$

$$I_t = \begin{cases} +1 & \text{if buy order, with prob} = 0.5 \\ -1 & \text{if sell order, with prob} = 0.5 \end{cases} \tag{8}$$

where $s$ is the bid–ask spread. This implies a price change of $\Delta P_t = \Delta P_t^* + (I_t - I_{t-1}) s/2$. In other words, autocorrelation is related to the spread $s$ in the following way:

$$\operatorname{Cov}(\Delta P_{t-1}, \Delta P_t) = -s^2/4 \tag{9}$$

Since bid–ask spreads tend to be larger for small company stocks than for large stocks, autocorrelation will be stronger for small firms than for large firms, other things being equal. Equation (9) can also be used to back out the implied bid–ask spread.

If the autocorrelation is due to differences in the bid and ask prices, the effect should be smaller if the average of the bid and ask prices is used to compute returns instead of the actual closing prices. Similarly, low-frequency returns, such as monthly returns, should have weaker autocorrelation than high-frequency returns, such as daily returns, which is true in general. We should also see a drop in autocorrelations over time when the average bid–ask spread shrinks, especially after decimalization. This is confirmed in Table 1. In general, investors cannot design a trading strategy to obtain excess returns in this case, since the bid and ask effect is due to market friction.
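
The bid–ask bounce mechanism in equations (7) to (9) can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the fundamental value follows a Gaussian random walk that is independent of trade direction, and the spread, volatility, and sample size are hypothetical values. It estimates the first-order autocovariance of price changes and backs out the implied spread as $2\sqrt{-\operatorname{Cov}}$.

```python
import math
import random

random.seed(2)
s, sigma, n = 0.10, 0.05, 200_000   # hypothetical spread, fundamental volatility, sample size

# Fundamental value follows a random walk; trade direction is +1 or -1 with probability 0.5.
p_star, prices = 100.0, []
for _ in range(n):
    p_star += random.gauss(0.0, sigma)
    direction = random.choice((1, -1))
    prices.append(p_star + direction * s / 2)      # equation (7)

# First-order autocovariance of observed price changes, to compare with equation (9).
dp = [prices[t] - prices[t - 1] for t in range(1, n)]
m = sum(dp) / len(dp)
cov = sum((dp[t] - m) * (dp[t - 1] - m) for t in range(1, len(dp))) / (len(dp) - 1)

# Invert equation (9); undefined if the estimated autocovariance is not negative.
implied_spread = 2 * math.sqrt(-cov) if cov < 0 else float("nan")
print(f"autocovariance: {cov:.5f} (theory {-s**2 / 4:.5f}), implied spread: {implied_spread:.4f}")
```

With real data the estimated autocovariance can turn out positive, in which case the implied spread is undefined; the sketch returns NaN in that case rather than guessing.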

Nonsynchronous Trading

Although individual stock returns might exhibit negative serial correlation, portfolio returns tend to be positively autocorrelated. Lo and MacKinlay [29] have offered “nonsynchronous trading” as a mechanism generating such a positive autocorrelation. In practice, not all stocks, especially small stocks, are traded at any given moment. On the arrival of market-wide news, stocks that are not currently traded will, when trading resumes next period, have returns similar to those of the stocks traded today, which will make the portfolio of all stocks look autocorrelated.

To illustrate the idea, suppose that there are two stocks A and B following random walk processes, implying no autocorrelation in their own returns. At the release of market-wide news at time $t = 0$, we would have observed returns of $R_1^A$ and $R_1^B$ for stocks A and B, respectively. Owing to the commonality of the news, we assume $\operatorname{Cov}(R_1^A, R_1^B) > 0$. If stock A is not traded and stock B is traded, however, we will only observe $\hat{R}_1^A = 0$ and $\hat{R}_1^B = R_1^B$. Similarly, common news is released at $t = 1$. Both stocks are traded this time, resulting in returns of $R_2^A$ and $R_2^B$ for the two stocks. Owing to the random walk assumption on individual stocks, we have $\operatorname{Cov}(R_1^A, R_2^A) = 0$ and $\operatorname{Cov}(R_1^B, R_2^B) = 0$. This structure can be summarized as follows:

$$\begin{array}{l|c|c|} \text{Stock A:} & \hat{R}_1^A = 0 & \hat{R}_2^A = R_1^A + R_2^A \\ \text{Stock B:} & \hat{R}_1^B = R_1^B & \hat{R}_2^B = R_2^B \\ & t = 0 \text{ to } t = 1 & t = 1 \text{ to } t = 2 \end{array} \tag{10}$$

Now, consider an equal-weighted portfolio of the two stocks. The portfolio returns in the two periods are $\hat{R}_1^P = \frac{1}{2} R_1^B$ and $\hat{R}_2^P = \frac{1}{2}(R_2^B + R_1^A + R_2^A)$. Because the period-2 returns are uncorrelated with $R_1^B$, the only surviving term is the covariance between $R_1^B$ and $R_1^A$, and it is easy to see that $\operatorname{Cov}(\hat{R}_1^P, \hat{R}_2^P) = \frac{1}{4} \operatorname{Cov}(R_1^A, R_1^B) > 0$. The same idea applies to the case when different stocks are traded at different times. Using daily returns from 1962 to 1985 to form 20 size portfolios, Lo and MacKinlay reported the following first-order autocorrelations and the probability of nontrading (Table 3):

Table 3 The probability of nontrading; adapted from Lo and MacKinlay [29]

t | t+1       Small    Medium    Large    Probability of nontrading
Small         0.35      0.21     0.02     0.291
Medium        0.39      0.31     0.09     0.025
Large         0.33      0.36     0.17     0.008

Clearly, a large autocorrelation of 35% in small stock portfolio returns can be supported by a 29% likelihood of nontrading in small stocks (note c). However, it is difficult to justify the 17% autocorrelation in large stock portfolio returns by the corresponding likelihood of nontrading. In addition, the cross autocorrelation of 33% from the large stock portfolio to the small stock portfolio is also consistent with nontrading in small stocks. Under this view, no money can be made even with a positive return autocorrelation.

Transactions Costs

Nontrading is a bit of an econometric device. Instead, security prices could be slow to update on the arrival of information due to transactions costs. In other words, transactions costs put a wedge in how prices might change over time. Let $P_t^*$ be investors’ correct valuation. They will trade only when the price change covers the transactions costs. In other words, there will be a bound around the current price. Only when $P_t^*$ moves far enough to overcome the bound will we see a price change. Otherwise, there will be zero excess demand within the bound. Such a slow adjustment will create a positive autocorrelation in security returns.

Cross Autocorrelation

The evidence of large stock returns predicting small stock returns seems to be persuasive since it exists in both daily and monthly stock returns. Although one might attribute the phenomenon to the nonsynchronous trading story at the daily frequency, nontrading is less likely for monthly returns. In response, Boudoukh et al. [5] offered an alternative explanation that uses the serial correlation and the contemporaneous correlation to explain the cross correlation. Suppose that security $i$’s return follows an AR(1) process of the following form:

$$r_{i,t+1} = \mu + \theta r_{i,t} + \epsilon_{i,t+1} \tag{11}$$

It is easy to see that $\theta = \operatorname{Corr}(r_{i,t+1}, r_{i,t})$. Multiplying both sides of equation (11) by $r_{j,t}$ and assuming that $\operatorname{Cov}(\epsilon_{i,t+1}, r_{j,t}) = 0$, we have the following relation:

$$\operatorname{Corr}(r_{i,t+1}, r_{j,t}) = \operatorname{Corr}(r_{i,t+1}, r_{i,t}) \operatorname{Corr}(r_{i,t}, r_{j,t}) \tag{12}$$
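
Equation (12) can be illustrated numerically. The sketch below is a hypothetical setup, not the procedure of Boudoukh et al. [5]: security $i$ follows the AR(1) process in equation (11) with $\mu = 0$, and security $j$ is constructed to move with security $i$ in the same period plus independent noise, so that $\operatorname{Cov}(\epsilon_{i,t+1}, r_{j,t}) = 0$ holds by construction. The coefficients 0.36, 0.8, and 0.7 are arbitrary illustrative choices.

```python
import math
import random

def corr(x, y):
    """Sample correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(3)
theta, n = 0.36, 200_000   # hypothetical AR(1) coefficient and sample size

ri, rj = [0.0], [0.0]
for _ in range(n):
    ri.append(theta * ri[-1] + random.gauss(0.0, 1.0))   # equation (11) with mu = 0
    rj.append(0.8 * ri[-1] + random.gauss(0.0, 0.7))      # r_j co-moves with r_i in the same period

own_auto = corr(ri[1:], ri[:-1])     # Corr(r_{i,t+1}, r_{i,t})
contemp = corr(ri, rj)               # Corr(r_{i,t}, r_{j,t})
cross_auto = corr(ri[1:], rj[:-1])   # Corr(r_{i,t+1}, r_{j,t})

print(f"cross autocorrelation: {cross_auto:.3f}, product from (12): {own_auto * contemp:.3f}")
```

The two printed numbers should agree up to sampling error, which is the content of equation (12): no nontrading is needed to generate a cross autocorrelation once own autocorrelation and contemporaneous correlation are present.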

Table 4 Portfolio correlations adapted from Boudoukh et al. [5]

Portfolio      Small(t+1)   Medium(t+1)   Large(t+1)   Small(t)   Medium(t)
Small(t)          0.36         0.19          0.03
Medium(t)         0.35         0.22          0.06         0.89
Large(t)          0.28         0.21          0.07         0.72       0.91

This table reports cross- and auto-correlations among size portfolios

As seen from equation (12), the cross autocorrelation is essentially the own autocorrelation scaled by the contemporaneous correlation. Using a different sample period, Boudoukh et al. [5] found results that are consistent with equation (12) (Table 4).

Applying equation (12), we can compute the predicted cross autocorrelations as

$$\operatorname{Corr}^*(r_{\text{small},t+1}, r_{\text{large},t}) = \operatorname{Corr}(r_{\text{small},t+1}, r_{\text{small},t}) \operatorname{Corr}(r_{\text{small},t}, r_{\text{large},t}) = 0.36 \times 0.72 = 0.26 \tag{13}$$

$$\operatorname{Corr}^*(r_{\text{large},t+1}, r_{\text{small},t}) = \operatorname{Corr}(r_{\text{large},t+1}, r_{\text{large},t}) \operatorname{Corr}(r_{\text{large},t}, r_{\text{small},t}) = 0.07 \times 0.72 = 0.05 \tag{14}$$

These numbers are very close to the actual cross autocorrelations of 0.28 and 0.03 shown in the table. Therefore, we do not need frequent nontrading to justify the observed cross autocorrelation. However, we still need to understand the serial correlation.

Time-varying Expected Returns

The mechanisms for the observed autocorrelation discussed in the previous sections largely rely on market frictions. As discussed in the section Predictability and Market Efficiency, an alternative rational explanation for predictability is the time-varying expected return. Given the unobservable nature of expected returns, Conrad and Kaul [13] proposed to characterize the movement in expected returns as following a simple AR(1) process of

$$r_{t+1} = E_t(r_{t+1}) + \epsilon_{t+1} \tag{15}$$

$$E_t(r_{t+1}) = \bar{r} + \phi E_{t-1}(r_t) + u_t \tag{16}$$

Note that the coefficients in equations (15) and (16) can be estimated using the Kalman filter procedure. Testing the hypothesis of time-varying expected returns is equivalent to testing whether $\phi = 0$ in the above models. Using the 10 size-sorted weekly (Wednesday to Tuesday) portfolio returns from 1962 to 1985, Conrad and Kaul [13] found that the autocorrelation coefficients are 41 and 9% for the small and the large decile portfolios, respectively, which are both statistically significant when compared to the confidence bound of $1/\sqrt{T} = 0.03$. Although the persistence parameter estimates ($\hat{\phi}$) of 0.589 and 0.087 for small and large portfolios, respectively, are very different, they are both statistically significant at a 1% level.

It is important to understand why expected returns change over time. In the CAPM world, it is implicitly assumed that a firm will continue to produce the same widgets and face the same uncertainty when selling these widgets in the market. In other words, the risk structure in future cash flows (CFs) is fixed. Thus, the comovement with the overall market is fixed. At the same time, investors’ attitude toward risk does not change, which implies constant expected returns. Such a model structure may reasonably describe the real world over a short period of time.

Over a longer horizon, however, investment opportunities can change due to either technological advances or changes in consumers’ preferences toward goods and services. For example, Apple used to be in the business of making personal computers and software 10 years ago. Today, a significant portion of Apple’s business is in consumer electronics, including music players and cell phones. Under this view, both the risk environment of a firm and the risk tolerance of investors could change over time. Therefore, the observed predictability may simply provide compensation for investors’ exposure to the risk of changes in investment opportunities or reflect differences in the required risk compensation due to changes in risk tolerance over different economic conditions. In this case, a representative investor will not try to use the predictability to alter his/her asset allocations. For example, if he/she knows that the next period stock return will likely be high, he/she should allocate more assets to stocks. However, if he/she understands that the high return is associated with a high expected return due to his/her increased risk aversion next period,

he/she would not increase his/her holding of the risky stocks.

Understanding Some Useful Predictors

In an interesting paper, Boudoukh et al. [5] argue that the observed autocorrelation in returns is neither due to market inefficiency nor attributable to time-varying expected returns. If autocorrelation in the returns of an index, such as the S&P 500, is due to market inefficiency or time-varying expected returns, the same autocorrelation should be observed in the S&P 500 futures contract returns too, but they did not find supportive evidence (note d). This result seems to kill the predictability associated with autocorrelation, but does not necessarily provide evidence against other forms of predictability. Moreover, autocorrelation in returns is a sufficient condition for predictability. It is not a necessary condition. There could exist nonreturn-based predictors. Suppose that the return generating process is as follows:

$$r_{t+1} = \beta z_t + \epsilon_{t+1} \tag{17}$$

where $z_t$ is a predictor with $\operatorname{Cov}(z_t, \epsilon_{t+1}) = 0$ and $\operatorname{Cov}(\epsilon_t, \epsilon_{t+1}) = 0$. Since $\operatorname{Cov}(r_{t+1}, r_t) = \beta \operatorname{Cov}(z_t, r_t) + \operatorname{Cov}(\epsilon_{t+1}, r_t) \approx \beta \operatorname{Cov}(z_t, r_t)$, autocorrelation could be close to zero as long as $\operatorname{Cov}(z_t, r_t)$ is small, which is usually the case. Therefore, recent literature has focused its attention on predictors other than past returns. An incomplete list includes the short-term interest rate, term spread, default spread, inflation rate, dividend yield, book-to-market ratio, consumption–wealth–income ratio, repurchasing yield, and so on. It is important to know why these variables predict returns in the first place. Without theory, a variable found to be useful in predicting stock returns could be a result of data mining.

A closer look at these predictors reveals that they are either related to business cycles or associated with stock prices. Since expected returns could vary with business cycles, variables that predict business cycles, such as the term spread or the default spread, should be useful predictors. Many significant predictors, such as the dividend yield, book-to-market ratio, and repurchasing yield, contain the element of one over price. This common feature comes from the fact that security prices reflect investors’ expectations, and expectations are good predictors of future values. To further illustrate this rationale, we can use mathematical models to relate returns to prices or other variables.

The Dividend–price Ratio: Log-linearization

Perhaps the most frequently used predictor is the dividend–price ratio or the dividend yield. This is also the variable that has been scrutinized the most [1]. Despite the many statistical issues discussed in the following section, it is important to understand why the dividend–price ratio should predict future returns. We start from the following return identity:

$$R_{t+1} = \frac{P_{t+1} + D_{t+1}}{P_t} = \frac{P_{t+1}}{D_{t+1}} \frac{D_t}{P_t} \frac{D_{t+1}}{D_t} \left(1 + \frac{D_{t+1}}{P_{t+1}}\right) \tag{18}$$

It is difficult to allow for time-varying expected returns due to the nonlinearity in equation (18). We, thus, take the natural log of both sides of equation (18) and apply a Taylor series expansion around the steady state. After simplifying [8], we obtain the following equation:

$$d_t - p_t = \text{const} + \rho (d_{t+1} - p_{t+1}) + r_{t+1} - \Delta d_{t+1} \tag{19}$$

where $\rho = 1/(1 + D/P) = 1/1.04 = 0.96$ (with $D/P$ being the steady-state dividend–price ratio), $\Delta d_{t+1} = d_{t+1} - d_t$, and lowercase variables represent logs of the corresponding uppercase variables. Under the assumption of a stationary dividend–price ratio, we can solve equation (19) forward,

$$d_t - p_t = \text{const} + \sum_{j=0}^{\infty} \rho^j (-\Delta d_{t+1+j} + r_{t+1+j}) \tag{20}$$

Equation (20) implies that a high dividend–price ratio must mean either low future dividend growth or high future returns. In addition, dividends and returns that are closer to the present are more influential than dividends and returns far in the future because $\rho$ is less than 1.
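
To make the log-linearization concrete, the sketch below checks equation (19) on one hypothetical pair of prices and dividends chosen so that $D/P$ stays near the 4% steady state used in the text. The constant is taken from the standard first-order Taylor expansion of $\log(1 + e^{d - p})$ around the steady state, which is an assumption about how "const" is defined, since the article does not spell it out; all numerical inputs are illustrative.

```python
import math

DP_bar = 0.04                        # steady-state dividend-price ratio used in the text
rho = 1.0 / (1.0 + DP_bar)           # about 0.96
# Linearization constant from a first-order Taylor expansion of log(1 + exp(d - p));
# equation (19) then holds approximately with const = -k.
k = -math.log(rho) - (1.0 - rho) * math.log(1.0 / rho - 1.0)

# Hypothetical prices and dividends, chosen so that D/P stays near 4%.
P_t, D_t = 100.0, 4.0
P_next, D_next = 106.0, 4.4

p_t, d_t = math.log(P_t), math.log(D_t)
p_next, d_next = math.log(P_next), math.log(D_next)
r_next = math.log((P_next + D_next) / P_t)   # exact log return from identity (18)

lhs = d_t - p_t
rhs = -k + rho * (d_next - p_next) + r_next - (d_next - d_t)   # right side of (19)

# Discount weights rho**j in equation (20): near-term terms matter more.
weights = [rho ** j for j in range(0, 50, 10)]

print(f"rho = {rho:.4f}, lhs = {lhs:.5f}, rhs = {rhs:.5f}")
print("weights at j = 0, 10, 20, 30, 40:", [round(w, 3) for w in weights])
```

The gap between the two sides is the second-order approximation error, which grows as the dividend–price ratio moves further from its steady-state value.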

From an empirical perspective, we can compute the volatility of the dividend–price ratio by multiplying both sides of equation (20) by $(d_t - p_t)$ and taking expectations:

$$\operatorname{Var}(d_t - p_t) = -\operatorname{Cov}\Bigl(d_t - p_t, \sum_{j=0}^{\infty} \rho^j \Delta d_{t+1+j}\Bigr) + \operatorname{Cov}\Bigl(d_t - p_t, \sum_{j=0}^{\infty} \rho^j r_{t+1+j}\Bigr) \tag{21}$$

Since the volatility of $(d_t - p_t)$ is positive, it is clear that the dividend–price ratio will forecast either dividend growth or future returns. Empirical evidence suggests that the $(d - p)$ variable does not forecast future dividend growth. Therefore, the $(d - p)$ ratio must forecast future returns. Again, such predictability does not imply market inefficiency in the sense of predicting $\epsilon_{t+1}$ in equation (6). A testing strategy based on equation (21) is to regress the sum of future returns on the dividend–price ratio:

$$r_{t+1} + \cdots + r_{t+\tau} = \alpha + \beta(\tau)(d_t - p_t) + \epsilon_{t+1,t+\tau} \tag{22}$$

where $\tau$ is the number of future periods. Table 5 is adapted from Campbell et al. [9] on monthly NYSE/AMEX/NASDAQ composite index returns. Clearly, the degree of predictability measured by $R^2$ increases monotonically with the return horizon. For example, the $R^2$s are 0.7, 8.6, 21.7, and 41.9% over 1 month, 1 year, 2 years, and 4 years, respectively, in the early sample period of 1927–1951. Similarly, the $R^2$s continue to be impressive with 1.8, 18.8, 32.2, and 41.7% over 1 month, 1 year, 2 years, and 4 years, respectively, in the later sample period from 1952 to 1994.

Issues with Long-horizon Regressions

Long-horizon regressions, such as equation (22), were first advocated by Fama and French [15]. Despite the impressive magnitude of the $R^2$s, the power of the associated long-horizon tests is doubtful. The issue involves the use of overlapping observations due to the limited availability of data [33]. In general, overlapping samples could lead to large efficiency gains when the independent variables in a predictive regression are serially uncorrelated. However, most predictors are highly autocorrelated, which implies limited efficiency gains when using overlapping observations. For example, for the 60 years of returns used in the Fama and French [15] study, although the nominal sample size is large using overlapping returns, the effective sample size is not much larger than 12

Table 5 Long-horizon results for index returns

Forecast horizon     1 month   3 months   12 months   24 months   48 months
Panel A: sample period 1927–1994
β(τ)                  0.016     0.043      0.200       0.386       0.654
t(τ)                  1.553     1.420      2.257       4.115       3.870
R^2(τ)                0.007     0.014      0.073       0.143       0.261
Panel B: sample period 1927–1951
β(τ)                  0.024     0.054      0.304       0.667       1.085
t(τ)                  0.980     0.793      1.915       3.841       3.693
R^2(τ)                0.007     0.011      0.086       0.217       0.419
Panel C: sample period 1951–1994
β(τ)                  0.027     0.080      0.327       0.579       0.843
t(τ)                  3.118     3.152      3.181       3.072       3.508
R^2(τ)                0.018     0.049      0.188       0.322       0.417

This table reports results from regressing future τ-month returns on the current dividend–price ratio for the value-weighted NYSE/AMEX/NASDAQ composite index return. The regression model is given as follows:

$$r_{t+1} + \cdots + r_{t+\tau} = \alpha + \beta(\tau)(d_t - p_t) + \epsilon_{t+1,t+\tau}$$


(the nonoverlapping 5-year sample) due to the highly persistent regressor. As pointed out by Boudoukh et al. [6], if an innovation in an independent variable happens to coincide with the next period return, this relationship will be repeated many times in the long-horizon regression since the shock will not die out for many periods and the particular return will appear many times in the overlapping return series.

Under the null hypothesis of no predictability, that is, $\beta(\tau) = 0, \forall \tau$ in equation (22), Kirby [24] has shown the following asymptotic result for the $R^2$ from a predictive regression:

$$T \times R^2 \xrightarrow{d} \chi^2(K) \tag{23}$$

where $T$ is the number of observations and $K$ is the number of independent variables in the regression. Let us use a numerical example to illustrate the point. For a univariate regression with $K = 1$ and $T = 12$, what would we expect to see?

• Since the mean of a $\chi^2(1)$ random variable is 1, we have $E(12R^2) = 1$, which implies that $E(R^2) = 8.3\%$.
• The 95% cutoff for $R^2$ is expected to be 32% since the critical value for a $\chi^2(1)$ distribution under the same confidence level is 3.84. In other words, we can expect to see $R^2$s as high as 32% even though there is no predictability.

Therefore, long-horizon regression results are avoided in this article.

The Payout Ratio or the Repurchasing Yield

Recently, Boudoukh et al. [4] have proposed to use the total payout ratio as a predictor for stock returns. The importance of the new predictor can be illustrated by its impressive $R^2$ of 26% using annual data over the sample period from 1926 to 2003. The use of the payout ratio can be justified since investors’ total wealth could also be affected by share repurchasing. In fact, representative investors should care about the total distribution, which includes both direct dividend distributions and repurchases. If there are rational reasons to believe that the dividend yield predicts stock returns, the payout yield should play a similar role. In fact, the implementation of SEC Rule 10b-18 in 1982 gives firms an incentive to rely more on repurchases due to the tax advantages for investors.

Payout Ratio. Repurchasing could be used either to reduce the effect of stock option exercise or to substitute for dividends. To construct a measure that reflects the latter, Boudoukh et al. [4] used the change in treasury stock adjusted for potential asynchronicity between the repurchase and option exercise as a measure of repurchasing (TS). They also used total repurchasing from the CF statement. Results are summarized in Table 6. Clearly, the predictive power of the dividend–price ratio ($D/P$) has gone down when comparing the $R^2$s from the two sample periods. In particular, according to Table 6, the $R^2$ has decreased from 13 to 5.5% when including the recent sample period.
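
The numerical example given earlier under equation (23), with $K = 1$ and $T = 12$, can be reproduced with a short calculation and a Monte Carlo check. The sketch below is illustrative only: it assumes independent Gaussian draws for both the predictor and the return, and the helper name `r_squared` is hypothetical. Because equation (23) is an asymptotic result, the simulated values for so small a $T$ need not match the asymptotic ones exactly.

```python
import random

def r_squared(x, y):
    """R^2 from a univariate OLS regression of y on x (squared sample correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

T, K = 12, 1
chi2_mean, chi2_crit_95 = K, 3.84     # mean and 95% critical value of chi-square(1)
expected_r2 = chi2_mean / T            # about 8.3%
cutoff_r2 = chi2_crit_95 / T           # 32%

# Monte Carlo under the null: predictor and return are independent draws.
random.seed(4)
sims = []
for _ in range(20_000):
    x = [random.gauss(0.0, 1.0) for _ in range(T)]
    y = [random.gauss(0.0, 1.0) for _ in range(T)]
    sims.append(r_squared(x, y))
sims.sort()
mc_mean = sum(sims) / len(sims)
mc_95 = sims[int(0.95 * len(sims))]

print(f"asymptotic: mean {expected_r2:.3f}, 95% cutoff {cutoff_r2:.3f}")
print(f"simulated:  mean {mc_mean:.3f}, 95% cutoff {mc_95:.3f}")
```

Either way, an $R^2$ in the range of 30% is unremarkable with an effective sample of 12 observations, which is the point of the example.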

Table 6 Dividend yield and payout ratio

                 ln(D/P)   ln(CF/M)   ln(TS/M)   ln(0.1 + Net payout)
1926–2003
Coef              0.116     0.209      0.172      0.759
t-ratio           2.240     3.396      2.854      5.311
R^2               0.055     0.091      0.080      0.262
Sim p-value       0.083     0.011      0.020      0.000
1926–1984
Coef              0.296     0.280      0.300      0.794
t-ratio           3.666     3.688      3.741      5.342
R^2               0.130     0.121      0.135      0.300
Sim p-value       0.044     0.054      0.043      0.001

This table reports results from predictive regressions using various predictors. “CF/M” is the total repurchasing from the cash flow statement (CF) over the total market value, while “TS/M” is the change in treasury stock adjusted for potential asynchronicity between the repurchase and option exercise as a measure of repurchasing (TS) over the total market value. “D/P” is the usual dividend–price ratio for the value-weighted NYSE/AMEX/NASDAQ composite index

Table 7 The adjusted R^2s for predictive regression using the dividend yield (D/P) and/or the repurchasing yield (F/P) over different sample periods

                 1952–2005        1952–1978        1979–2005
Frequency       D/P     F/P      D/P     F/P      D/P     F/P
Monthly         0.5     1.0      1.9     0.0      0.0     1.7
Quarterly       1.9     2.5      5.6     0.0      0.7     5.6

In contrast, the repurchasing yield (measured as the ratio between repurchasing and market capitalization) is impressive. No matter how it is measured, its explanatory power is much larger than that of the pure dividend–price ratio. Moreover, when using the net payout yield (measured as the ratio between repurchasing minus new issuing plus dividend and the market capitalization), R² is as high as 26%. Although the payout yield is important empirically, its significance is overstated in Boudoukh et al. [4]. The most significant contributor to the predictive power of the payout yield is the new issuing yield when examining their accounting-based measures separately. Furthermore, the predictive power of the new issuing yield largely comes from the two outliers of 1929 and 1930. In other words, the new issuing yield offers no explanatory power once the sample period starts from 1931 instead of 1926.

An Alternative Approach to Construct the Repurchasing Yield. The conventional approach of computing returns ignores changes in the market capitalization associated with changes in the total number of shares outstanding. When the number of shares changes over time due to either repurchasing or seasoned offerings, capital gains do not purely reflect growth potential. From an asset pricing perspective, it is more important to consider different components of returns from a representative investor’s perspective. In other words, we can decompose returns from the standpoint of a representative investor instead of a buy-and-hold investor. In particular, we can rewrite the return identity as follows:

$$R_{t+1} \equiv \frac{S_{t+1} D_{t+1}}{S_t P_t} + \frac{(S_t - S_{t+1})(P_{t+1} + D_{t+1})}{S_t P_t} + \frac{S_{t+1} P_{t+1}}{S_t P_t} \qquad (24)$$

where $S_t$ is the number of shares outstanding at time t. Equation (24) can be interpreted in the following way:

• first term: dividend yield (D/P) for a representative shareholder;
• second term: net repurchasing yield (F/P), valued at the price before the ex-dividend day; and
• third term: change in market capitalization, which reflects growth.

Using NYSE/AMEX/NASDAQ index returns, we can construct both the smoothed dividend yield (D/P) and the repurchasing yield (F/P) over the past 12 months. Table 7 reports the adjusted R²s for various predictive regressions.
For the whole sample period from 1952 to 2005, quarterly returns are more predictable than monthly returns. Overall, the repurchasing yield has higher predictive power than the dividend yield. When we split the sample into two, it becomes clear that the two predictors have played very different roles. Almost all the predictive power in the first half of the sample comes from the dividend yield, whereas the majority of the predictive power in the second half of the sample is due to the repurchasing yield. This evidence is consistent with the observation of a decreasing trend in the dividend yield and the increasing role played by repurchases.

Statistical Issues

The use of many predictors can be controversial. In many cases, the issue lies in the statistical inference due to the persistence in predictors. These issues include spurious regression, biased estimates due to correlations between innovations to predictors and stock returns, and error in variables when using imperfect predictors.
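To make the first of these issues concrete, the following is a minimal Monte Carlo sketch, not taken from the article. It regresses one random walk on a second, independent random walk and counts how often the slope's conventional t-statistic exceeds 1.96 in absolute value, then repeats the exercise with independent white noise. The function names, sample size, number of simulations, and seed are all illustrative assumptions.

```python
import math
import random


def ols_t_stat(y, x):
    """Conventional OLS t-statistic of the slope in y = a + b*x + e."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope / std_err


def random_walk(n, rng):
    """Cumulative sum of n independent standard normal shocks."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def white_noise(n, rng):
    """n independent standard normal draws (no persistence)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def rejection_rate(n_obs=100, n_sims=300, seed=1, walks=True):
    """Share of simulations in which |t| > 1.96 for two independent series."""
    rng = random.Random(seed)
    make_series = random_walk if walks else white_noise
    hits = 0
    for _ in range(n_sims):
        y = make_series(n_obs, rng)
        x = make_series(n_obs, rng)
        if abs(ols_t_stat(y, x)) > 1.96:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    print("independent random walks:", rejection_rate(walks=True))
    print("independent white noise: ", rejection_rate(walks=False))
```

With white noise the rejection rate should stay near the nominal 5%, while with random walks it is typically far higher, which is the sense in which persistence alone can produce an apparently significant relationship.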

Spurious Regression

When regressing one nonstationary random variable on another independent nonstationary random variable, we often observe a significant relationship between the two variables. This is because, in a finite sample, both variables are likely to appear to be trending. Spurious regression was first discussed by Granger and Newbold [21]. At first glance, spurious regression may not seem likely for a predictive regression since stock returns on the left-hand side of a regression are not persistent at all. However, if we consider stock returns as containing a persistent expected return component, the predictive regression could be spurious [20]. This problem can be even more severe when researchers are mining for predictors because highly persistent series are more likely to be found significant in the search for predictors. The simulation results in [20] suggest that many of the useful predictors found in the literature could be subject to this criticism.

Predictive Regression

Owing to persistence in the predictors and a correlation between innovations to predictors and stock returns, Stambaugh [36] has suggested that both the coefficient estimate and the t-ratio in a predictive regression are biased. For example, when the current stock price is high, the current return will also be high, whereas the current dividend–price ratio will be low since the D/P ratio has a price in the denominator. Such an association implies a negative relationship between innovations to D/P ratios and innovations to returns. Such a negative correlation will couple with the typical downward bias in the persistence parameter estimate of the D/P ratio to make the predictive regression coefficient biased upward. More specifically, suppose that we have the following system:

$$r_{t+1} = \beta z_t + \epsilon_{t+1} \qquad (25)$$

$$z_{t+1} = \psi z_t + u_{t+1} \qquad (26)$$

where $z_t$ is the predictor. It can be shown [28] that

$$E(\hat{\beta} - \beta) = \gamma(\hat{\psi} - \psi) \qquad (27)$$

where $\gamma$ is from the regression of $\epsilon_t = \gamma u_t + v_t$. Since $\gamma$ is negative in the case of the dividend–price ratio and $\hat{\psi}$ is typically biased downward, equation (27) suggests that the beta estimate is biased upward. Therefore, Stambaugh concluded that the predictive power of dividend yield is exaggerated.
While Stambaugh’s bias adjustment is based on the well-known bias result of $\hat{\psi}$ being $-(1 + 3\psi)/T$, Lewellen [28] observes that such a bias typically occurs in the data that appear to mean-revert more strongly than they truly do. However, predictors such as the dividend yield are hardly mean-reverting. They contain roots very close to unity. Instead, Lewellen [28] proposed to use $\psi = 1$ as the true value in equation (27) in order to derive a conservative adjustment. For example, using NYSE returns and the log dividend–price ratio over the sample period from 1946 to 2000 in equation (25), the least-squares estimate of $\beta$ is 0.92, with a standard error of 0.48. When applying the Stambaugh [36] bias correction, the estimate becomes 0.20 with a one-sided p-value of 0.308. In contrast, using Lewellen’s conservative bias adjustment, the estimate becomes 0.66 with a t-ratio of 4.67.

Implied Constraint

To push the idea of incorporating prior knowledge as in Lewellen [28], Cochrane [12] argued that the coefficients from predictive regressions of returns, dividend growth, and dividend–price ratio using the lagged dividend–price ratio should be constrained. In particular, if we run the following regressions,

$$r_{t+1} = a_r + b_r(d_t - p_t) + \epsilon_{r,t+1} \qquad (28)$$

$$\Delta d_{t+1} = a_d + b_d(d_t - p_t) + \epsilon_{d,t+1} \qquad (29)$$

$$d_{t+1} - p_{t+1} = a_{dp} + \psi(d_t - p_t) + \epsilon_{dp,t+1} \qquad (30)$$

the regression coefficients $b_r$, $b_d$, and $\psi$ should be related. In fact, by substituting equations (28) through (30) into equation (19), we have the following results:

$$b_r = 1 - \rho\psi + b_d \qquad (31)$$

$$\epsilon_{r,t+1} = \epsilon_{d,t+1} - \epsilon_{dp,t+1} \qquad (32)$$

Since $\rho < 1$ and $\psi < 1$, equation (31) implies that $b_r$ and $b_d$ cannot both be zero, so the joint hypothesis of $b_r = 0$ and $b_d = 0$ cannot hold. In other words, if we fail to reject the hypothesis of $b_d = 0$, we cannot ignore the evidence that $b_r$ is positive in the predictive
regression of equation (28). As shown in Table 8 (adapted from [12]), the $\hat{b}_d$ estimate is very close to zero with a large standard error. Therefore, $b_r$ is probably close to 0.101 as implied by equation (31) using $\rho = 0.9638$, which is close to the actual estimate of 0.097.
At the same time, equation (32) suggests that shocks to returns and the dividend–price ratio are highly correlated with each other, which is indeed true from Table 8. For example, the negative correlation is as high as 70%. Table 8 also shows that the estimated coefficients $b_r$, $b_d$, and $\psi$ and their corresponding implied values from equation (31) are amazingly close.

Table 8 The actual parameter estimates and the implied parameters; adapted from Cochrane [12]

                                   Implied             Correlation
         Estimates    σ(b̂)        value                r        d        dp
b̂_r      0.097       0.050        0.101       r       19.6     66.0    −70.0
b̂_d      0.008       0.044        0.004       d       66.0     14.0      7.5
ψ̂        0.941       0.047        0.945       dp     −70.0      7.5     15.3

Error in Variables

If predictability is driven by time-varying expected returns, predictors should predict the expected return. In other words, the conventional predictive regression implicitly assumes that the expected return is a linear function of the predictors. However, from the small magnitude of predictability, it is clear that predictors can only be noisy estimates of the true expected returns. In other words, the predictive regressions are subject to an error-in-variables problem, which will bias estimates. To overcome the problem, Pastor and Stambaugh [31] proposed to model the expected return as an unobservable component and to allow its innovation to be correlated with innovations in predictors. Specifically, they propose the following model:

$$r_{t+1} = \mu_t + \epsilon_{t+1} \qquad (33)$$

$$\mu_{t+1} = \mu_0 + \phi\mu_t + u_{t+1} \qquad (34)$$

$$z_{t+1} = \psi z_t + v_{t+1} \qquad (35)$$

where $\mu_t$ is the expected return and $z_t$ is the predictor. In this system, predictors affect the expected return through a correlation between $u_{t+1}$ and $v_{t+1}$. In other words, information in the predictors helps to improve the “quality” of the expected return estimates in the spirit of the classical seemingly unrelated regression (SUR). Since the system of equations (33)–(35) can be reduced to a predictive regression when $\mu_t = z_t$, it should perform at least as well as the predictive regression. Additional constraints can be imposed to improve the estimation efficiency. For example, when there is a positive shock to the expected return, future expected returns will be high due to the persistence, which will result in a low price, or equivalently a low return. Therefore, we can incorporate this prior constraint of the negative correlation between $u_{t+1}$ and $\epsilon_{t+1}$. Using quarterly data and imposing an economic prior in a Bayesian framework, Pastor and Stambaugh [31] found that the dividend yield is a very useful predictor.
In Pastor and Stambaugh [31], a predictor affects the expected return through an indirect channel by improving the precision of the expected return estimate, in the same spirit as the SUR regression. To push the idea further, Baranchuk and Xu [3] studied both the direct and indirect effects of predictors on the expected return. In particular, equations (34) and (35) are replaced by

$$\mu_{t+1} = \mu_0 + \phi\mu_t + \delta m_t + u_{t+1} \qquad (36)$$

$$z_{t+1} = m_t + \eta_{t+1} \qquad (37)$$

$$m_{t+1} = \alpha m_t + v_{t+1} \qquad (38)$$

where $m_t$ is the expected predictor. In this framework, the expected predictor directly affects the level of expected return, whereas the unexpected predictor continues to influence the efficiency of the time-varying expected return through its correlation with the innovation to the expected return. Using both the dividend yield and repurchasing yield, Baranchuk and Xu [3] were able to demonstrate the very different roles played by the two predictors. The repurchasing yield affects the expected return directly, whereas the dividend yield works through the indirect channel affecting the precision in the estimate of the expected return. From a technical perspective, such an elaborate model structure also avoids the potential spurious regression and possesses the ability to incorporate an economic prior.
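As a rough illustration of how the pieces of this last system fit together, the sketch below simulates equations (33), (36), (37), and (38) in the order the timing subscripts imply. It is not the estimation procedure of [3] or [31]. The function name, every default parameter value, and the choice to correlate $u_{t+1}$ with $\eta_{t+1}$ through a single parameter `corr_u_eta` are assumptions made only for illustration.

```python
import math
import random


def simulate_system(n_periods, mu0=0.001, phi=0.9, delta=0.05, alpha=0.95,
                    sd_eps=0.04, sd_u=0.002, sd_eta=0.01, sd_v=0.01,
                    corr_u_eta=-0.5, seed=0):
    """Simulate equations (33), (36), (37), and (38).

    All defaults are illustrative assumptions, not estimates. corr_u_eta is
    the assumed correlation between u_{t+1} (innovation to the expected
    return) and eta_{t+1} (the unexpected part of the predictor).
    """
    rng = random.Random(seed)
    mu = mu0 / (1.0 - phi)  # start mu_t at its mean when m_t averages zero
    m = 0.0                 # expected predictor m_t
    returns, predictors, expected_returns = [], [], []
    for _ in range(n_periods):
        # Correlated standard normal draws for eta_{t+1} and u_{t+1}.
        e_eta = rng.gauss(0.0, 1.0)
        e_u = (corr_u_eta * e_eta
               + math.sqrt(1.0 - corr_u_eta ** 2) * rng.gauss(0.0, 1.0))
        eta, u = sd_eta * e_eta, sd_u * e_u

        expected_returns.append(mu)                   # mu_t
        returns.append(mu + rng.gauss(0.0, sd_eps))   # (33): r_{t+1}
        predictors.append(m + eta)                    # (37): z_{t+1}

        mu_next = mu0 + phi * mu + delta * m + u      # (36): direct channel
        m = alpha * m + rng.gauss(0.0, sd_v)          # (38): m_{t+1}
        mu = mu_next
    return returns, predictors, expected_returns


if __name__ == "__main__":
    r, z, mu_path = simulate_system(240, seed=42)
    print("average realized return:", sum(r) / len(r))
    print("average expected return:", sum(mu_path) / len(mu_path))
```

Setting `delta=0` switches off the direct channel, so that the predictor matters only through the assumed correlation between the two innovations; setting `corr_u_eta=0` does the reverse.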

Out-of-sample Predictive Power

If predictability is due to time-varying expected returns, a representative investor will not attempt to make abnormal returns since both his/her risk exposure and risk tolerance change over time. However, a nonrepresentative (isolated) investor might be able to take advantage of return predictability in order to outperform the market. Goyal and Welch [37] ran a horse race on the out-of-sample predictive power of a model based on an unconditional forecast versus models with conditional forecasts using different predictors, including the following:

• the dividend–price ratio and the dividend yield [10];
• the earnings price ratio and dividend–earnings (payout) ratio [26];
• the short-term interest rate [7];
• the term spread and the default spread [7, 16, 23];
• the inflation rate [14, 18];
• the book-to-market ratio [25, 32];
• the consumption, wealth, and income ratio [27]; and
• the aggregate net issuing activity [2].

After comparing the (conditional) root-mean-squared errors (RMSEs) with respect to the predicted returns to the (unconditional) RMSEs using a simple sample mean, Goyal and Welch [37] concluded that in-sample predictability can be very different from out-of-sample performance. In most cases, the unconditional RMSEs are smaller than the conditional RMSEs. Therefore, they believe that most results from predictive regressions are just statistical illusions.
Similar to the idea of using prior information to improve the predictive power for future returns as in [12], prior economic constraints are valuable information and should be used simultaneously. Campbell and Yogo [11] recognized that if we are really predicting the expected returns in a predictive regression, we should throw out the negative predicted returns since the expected returns should always be positive. By constraining the predicted returns to be nonnegative, Campbell and Yogo [11] found that most predictors in the above list are indeed useful in predicting future returns even out of sample.
In a related study, Xu [38] recognized that, given the low R²s in predictive regressions, it is very difficult to provide accurate predictions about the magnitude of future returns due to errors in the parameter estimates. One has to estimate at least two parameters in a predictive regression, while a sample mean corresponds to only one parameter estimate. The additional estimation error could easily overwhelm the benefit of using predictors. Therefore, in out-of-sample studies, a more useful question to ask is whether we are able to predict the direction of future market movement. On the basis of this idea, Xu [38] studied the economic significance of the following trading strategy:
Trading strategy: Invest in a risky asset today only if the predicted future asset return is positive.
Under the t-distributed return assumption, there exists a moderate condition under which the trading strategy will outperform a buy-and-hold strategy. Using inflation, relative interest rate, and dividend–price ratio as predictors, Xu [38] showed that such a trading strategy could potentially double the performance of a buy-and-hold strategy over the sample period from 1952 to 1998.

Concluding Comments

Given the vast literature on predictability, in this article attention was focused on the questions of the existence of predictability and the interpretation of predictability. Return predictability has always been a challenge to the EMH. The traditional view on the evidence is either to deny the evidence with the help of statistical methods or to attribute the phenomenon to market frictions. For example, most predictors, except for the past returns, are persistent. Such a statistical property may result in a spurious regression. Predictors are also imperfect, which will bring in an error-in-variables problem in estimation. Many market microstructure effects, such as bid–ask bounce and nonsynchronous trading, may induce autocorrelation in the short run and among small stocks. The modern view, however, takes a more positive approach by recognizing the time-varying risk premium due to changes in either investment opportunities or investors’ risk tolerance. If this is indeed the case, many variables that predict business cycles should also help to predict returns, for example, the interest rate. Other variables that contain a price component can also predict returns because prices reflect expectations and should summarize all future changes in the expected return or CF distributions.

Many statistical issues can be dealt with using a more elaborate model structure and the use of an economic prior. For example, if predictability is due to changes in the risk premium, we can model the expected returns explicitly as following an AR(1) process and impose the nonnegativity constraint. We can also alleviate the market microstructure effects by using low-frequency returns such as monthly or quarterly returns. From an empirical perspective, however, we should not expect to find huge return predictability. It would be odd if some economic agents did not try to exploit economic profits, given that they are not subject to the kind of risks to which a representative investor might be exposed. Indeed, many studies tend to find economically weak but statistically significant evidence of predictability.
Overall, we believe that the evidence points to the direction of predictable returns even under careful statistical inference. If this is the case, will evidence on predictability have any implications on asset pricing? The answer is yes, as is evident from the literature on testing the conditional asset pricing models. Predictability also has implications on investors’ asset allocation decisions [22]. If returns are positively correlated over time, investors might want to allocate less wealth to equity. This is because a risk-averse investor understands that he/she will be subject to even larger downside risk if today’s return is low. This is still an active area of research.

End Notes

a. Suppose returns follow the AR(1) process

$$r_t - \mu = \theta(r_{t-1} - \mu) + \epsilon_t$$

the true Sharpe ratio defined as $\mu/\sigma$ can be expressed as

$$\frac{\mu}{\sigma} = \frac{1}{\sqrt{1 - \theta^2}} \, \frac{\mu}{\sigma_y}$$

b. When the discount rate is not zero, we can define a discounted process such that it is a martingale.
c. The positive autocorrelation in the equal-weighted index shown in Table 1 is also consistent with the nonsynchronous trading story.
d. Since holdings in the futures contracts are much smaller than the market capitalization of the 500 largest companies, the evidence could be consistent with our argument that representative investors will not try to trade on the predictability, while a nonrepresentative investor in a segment of the market could.

References

[1] Ang, A. & Bekaert, G. (2007). Stock return predictability: is it there? Review of Financial Studies 20, 651–707.
[2] Baker, M. & Wurgler, J. (2000). The equity share in new issues and aggregate stock returns, Journal of Finance 55, 2219–2257.
[3] Baranchuk, N. & Xu, Y. (2007). What Predicts Stock Returns? The Role of Expected versus Unexpected Predictors. Working paper, University of Texas at Dallas.
[4] Boudoukh, J., Michaely, R., Richardson, M.P. & Roberts, M.R. (2007). On the importance of measuring payout yield: implications for empirical asset pricing, Journal of Finance 62, 877–915.
[5] Boudoukh, J., Richardson, M.P. & Whitelaw, R.F. (1994). Tale of three schools: insights on autocorrelations of short-horizon stock returns, Review of Financial Studies 7, 539–573.
[6] Boudoukh, J., Richardson, M. & Whitelaw, R.F. (2008). The myth of long-horizon predictability, Review of Financial Studies 21, 1577–1605.
[7] Campbell, J.Y. (1987). Stock returns and term structure, Journal of Financial Economics 18, 373–399.
[8] Campbell, J.Y. (1991). A variance decomposition for stock returns, Economic Journal 101, 157–179.
[9] Campbell, J.Y., Lo, A.W. & MacKinlay, A.C. (1997). The Econometrics of Financial Markets, Princeton University Press, Princeton.
[10] Campbell, J.Y. & Shiller, R. (1988). Stock prices, earnings, and expected dividends, Journal of Finance 43, 661–676.
[11] Campbell, J.Y. & Yogo, M. (2006). Efficient tests of stock return predictability, Journal of Financial Economics 81, 27–60.
[12] Cochrane, J.H. (2008). The dog that did not bark: a defense of return predictability, Review of Financial Studies 21, 1533–1575.
[13] Conrad, J. & Kaul, G. (1988). Time-variation in expected returns, The Journal of Business 61, 409–425.
[14] Fama, E. (1981). Stock returns, real activity, inflation, and money, American Economic Review 71, 545–565.
[15] Fama, E. & French, K. (1988). Permanent and temporary components of stock prices, Journal of Political Economy 96, 246–273.
[16] Fama, E. & French, K. (1989). Business conditions and expected returns on stock and bonds, Journal of Financial Economics 25, 23–49.
[17] Fama, E. & French, K. (1996). Multifactor explanations of asset pricing anomalies, Journal of Finance 51, 55–84.
[18] Fama, E. & Schwert, G.W. (1977). Asset returns and inflation, Journal of Financial Economics 5, 115–146.
[19] Ferson, W.E. & Harvey, C.R. (1991). The variation of economic risk premiums, Journal of Political Economy 99, 385–415.

[20] Ferson, W.E., Sarkissian, S. & Simin, T.T. (2003). Spurious regressions in financial economics? Journal of Finance 58, 1393–1414.
[21] Granger, C.W.J. & Newbold, P. (1974). Spurious regressions in economics, Journal of Econometrics 2, 111–120.
[22] Kandel, S. & Stambaugh, R.F. (1996). On the predictability of stock returns: an asset allocation perspective, The Journal of Finance 51, 385–424.
[23] Keim, D. & Stambaugh, R. (1986). Predicting returns in stock and bond markets, Journal of Financial Economics 17, 357–390.
[24] Kirby, C. (1997). Measuring the predictability in stock and bond returns, Review of Financial Studies 10, 579–630.
[25] Kothari, S. & Shanken, J. (1997). Book-to-market, dividend yield, and expected market returns: a time-series analysis, Journal of Financial Economics 44, 169–203.
[26] Lamont, O. (1998). Earnings and expected returns, Journal of Finance 53, 1563–1587.
[27] Lettau, M. & Ludvigson, S. (2001). Consumption, aggregate wealth, and expected stock returns, Journal of Finance 56, 815–849.
[28] Lewellen, J. (2004). Predicting returns with financial ratios, Journal of Financial Economics 74, 209–235.
[29] Lo, A. & MacKinlay, A.C. (1990). An econometric analysis of nonsynchronous-trading, Journal of Econometrics 45, 181–212.
[30] Pastor, L. & Stambaugh, R.F. (2008). Predictive Systems: Living with Imperfect Predictors, Working Paper, 12814, NBER.
[31] Pastor, L. & Stambaugh, R.F. (2007). Predictive Systems: Living with Imperfect Predictors. NBER, Working Paper.
[32] Pontiff, J. & Schall, L.D. (1998). Book-to-market ratio as predictors of market returns, Journal of Financial Economics 49, 141–160.
[33] Richardson, M. & Smith, T. (1991). Tests of financial models in the presence of overlapping observations, Review of Financial Studies 4, 227–254.
[34] Roll, R. (1984). A simple implicit measure of the effective bid-ask spread in an efficient market, Journal of Finance 39, 1127–1140.
[35] Schwert, G. (2003). Anomalies and market efficiency, in Handbook of Economics and Finance, G. Constantinides, M. Harris & R. Stulz, eds, North Holland, Amsterdam, Netherlands, Chapter 17.
[36] Stambaugh, R.F. (1999). Predictive regressions, Journal of Financial Economics 54, 375–421.
[37] Welch, I. & Goyal, A. (2008). A comprehensive look at the empirical performance of equity premium prediction, Review of Financial Studies 21, 1455–1508.
[38] Xu, Y. (2004). Small levels of predictability and large economic gains, Journal of Empirical Finance 11, 247–275.

Related Articles

Capital Asset Pricing Model; Efficient Market Hypothesis; Expectations Hypothesis; Risk Premia.

YEXIAO XU
Real Options

Real options theory is about decision making and value creation in an uncertain world. It owes its success to its ability to reconcile frequently observed investment behaviors that are seemingly inconsistent with rational choices at the firm level. For instance, Dixit [15] uses real options to explain why firms undertake investments only if they expect a yield in excess of a required hurdle rate, thus violating the Marshallian theory of long- and short-run equilibria.a,b This is because, relative to a setting in which there is no uncertainty, unforeseeable future payouts discourage commitment to a project unless the expected profitability of the project is sufficiently high. The real options methodology makes it possible to identify and value risky investments and, under certain conditions, even to take advantage of uncertainty. Indeed, as we shall see, this valuation approach insures investments against possible adverse outcomes while retaining upside potential.c

Definition of a Real Option

A real option gives its holder the right, but not the obligation, to take an action (e.g., deferring, expanding, contracting, or abandoning) for a specified price, called the exercise (or strike) price, on or before some specified future date. We can identify at least six factors that affect the value of a real option: the value of the underlying risky asset (i.e., the project, investment, or acquisition); the exercise price; the volatility of the value of the underlying asset; the time to expiration of the option; the interest rate; and the dividend rate of the underlying asset (i.e., the cash outflows or inflows over the life of the option). If the value of the underlying project, its standard deviation, or the time to expiration increases, so too does the value of the option. The value of the (call) option also increases if the risk-free rate of interest goes up. Lost dividends decrease the value of the option.d A higher exercise price reduces (augments) the value of a call (put) option.e
The quantitative origins of real options derive from the seminal work of Black and Scholes [2] and Merton [32] on financial options pricing (see Black–Scholes Formula). These roots are evident in the assumptions that trading and decision making take place in continuous time and that the underlying sources of uncertainty follow Brownian motions. Even though these assumptions may be unsuitable in some corporate contexts, they permit the derivation of precise theoretical solutions, thereby proving to be essential.f,g The focus of this earlier literature has been on valuing individual real options: the option to expand a project, for instance, is an American call option (see American Options). So is a deferral option that gives a firm the right to delay the start of a project. The option to abandon a project, or to scale back by selling a fraction of it for a fixed price, is formally an American put (see American Options).
Real-world projects, however, are often more complex in that they involve a collection of real options, whose values may interact. The recent development in financial options interdependencies has enabled a smoother transition from a theoretical stage to an application stage.h Margrabe’s [29] valuation of an option to exchange one risky asset for another (see Margrabe Formula) finds immediate application in the modeling of switching options, which allow a firm to switch between two modes of operation. Geske [19] values options on options, called compound options, which may be applied to growth opportunities that become available only if earlier investments are undertaken. Phased investments belong to this category. Thus, almost paradoxically, in this relatively new field of research, the mathematically most complex models, which apply sophisticated contingent claims analysis techniques, entail a great wealth of factual applications.i Moreover, numerous studies show that real options represent a sizable fraction of a firm’s value; both Kester [25] and Pindyck [35], for instance, estimate that the value of a firm’s growth options is more than half its market value of equity if demand volatility exceeds 20%. For this reason, the theory of real options has gained significant importance among management practitioners whose choices determine the success or failure of their enterprises. Amram and Kulatilaka [1] collect several case studies to show practitioner audiences how real options can improve capital investment planning and results. In particular, they list three real options characteristics that are of great use to managers: (i) options payoffs are contingent on the manager’s decisions; (ii) options valuations are aligned with financial market valuations; and (iii) options thinking can be used to design and manage strategic investments proactively. The real options paradigm,
however, is only the last stage in the evolution of valuation models. The traditional approach to valuing investment projects, which owes its origins to John Hicks and Irving Fisher, is based on net present value. This technique involves discounting expected net cash flows from a project at a discount rate that reflects the risk of those cash flows, called the risk-adjusted discount rate. Brennan and Trigeorgis [8] characterize these first-phase models as static, or mechanistic. The second-phase models are controllable cash-flow models, in which projects can be managed actively in response to the resolution of exogenous uncertainties. Since they ignore strategic investment, both first- and second-phase models often lead to suboptimal decisions. Dynamic, game-theoretic options models assume that projects can be managed actively, instead.j These models take into account not only the resolution of exogenous uncertainties but also the actions of outside parties. For this reason, an area of immense importance within game-theoretic options models concerns market competition and strategy.
Strategic firm interactions are isomorphic to a portfolio of real options.k Furthermore, the payouts of a project (as well as its value) can be seen as the outcome of a game among the inside agent, outside agents, and nature. Dixit [14] and Williams [40] were the first to consider real options within an equilibrium context. Smit and Ankum [37], among others, study competitive reactions within a game-theoretic framework under different market structures. In the same line of research is Grenadier’s [21] analysis of a perfectly competitive real-estate market with stochastic demand and time to build.l

Solution of the Basic Model

Besides particular cases, all investment expenditures have two important characteristics. First, they are at least partly irreversible, and second, they can be delayed so that the firm has the opportunity to wait for new information to arrive before committing any resources.
The most basic continuous-time model of irreversible investment was originally developed by McDonald and Siegel [31]. In their problem, a firm must decide when to invest in a single risky project, denoted by V, with a fixed known cost I. The project is assumed to follow a geometric Brownian motion with expected return and volatility indicated by $\mu$ and $\sigma$, respectively. The project’s payout rate equals $\delta$. Formally, the process can be written as

$$\frac{dV}{V} = (\mu - \delta)\,dt + \sigma\,dz \qquad (1)$$

where dz is the increment of a Wiener process and $(dz)^2 = dt$.m,n In addition, denote the value of the firm’s investment opportunity (its option to invest) by F(V). It can be shown that the optimal rule is to invest at the date $\tau^*$ when the project’s value first exceeds a certain optimal threshold $V^*$. This rule maximizes

$$F(V) = \max_{\tau} E\left[(V_\tau - I)e^{-\mu\tau}\right], \quad V_0 = V \qquad (2)$$

over all possible stopping times $\tau$, where E is the expectation operator. Prior to undertaking the project, the only return to holding the investment option is its capital appreciation, so that

$$\mu F(V)\,dt = E[dF(V)] \qquad (3)$$

Expanding dF(V) using Itô’s lemma yields

$$dF(V) = F'(V)\,dV + \frac{1}{2}F''(V)(dV)^2 \qquad (4)$$

where primes indicate derivatives. Lastly, substituting equation (1) in (4) and taking expectations on both sides gives

$$\frac{1}{2}\sigma^2 V^2 F''(V) + (\mu - \delta)V F'(V) - \mu F(V) = 0 \qquad (5)$$

Equation (5) must be solved simultaneously for the option value F(V) and the optimal investment threshold $V^*$, subject to three boundary conditions:

$$F(0) = 0 \qquad (6)$$

$$F(V^*) = V^* - I \qquad (7)$$

$$F'(V^*) = 1 \qquad (8)$$

Equation (6) is equivalent to stating that the investment option is worthless when the project’s outcome is null. Equations (7) and (8) indicate the payoff and marginal value associated with the optimum. To derive $V^*$, we must guess a functional form

that satisfies equation (5) and verify if it works. In Numerical Methods in Real Options
particular, if we take F (V ) = AV β , then
In practice, most real option problems must be solved
∗ βI using numerical methods. Until recently, these meth-
V = (9)
β −1 ods were so complex that only few companies found
it practical to use them when formulating operat-
and ing strategies. However, advances in both compu-
 tational power and understanding of the techniques
 2 over the last 20 years have made it feasible to apply
1 µ−δ µ−δ 1 2µ
β= − + − + (10) real options thinking to strategic decision making.
2 σ2 σ 2 2 σ2
Numerical solutions give not only the value of the
project but also the optimal strategy for exercising the
The optimal rule is to invest when the value of
β options.t The simplest real option problems involv-
the project exceeds the cost by a factor > 1.
β −1 ing one or two state variables can be more conve-
This result is in contrast with net present value, niently solved using binomial or trinomial trees in one
which prescribes to invest as long as the value of the or two dimensions (see Finite Element Methods).u
project exceeds the cost (V ∗ = I ). However, since When a problem involves more state variables, per-
the latter rule does not account for uncertainty and haps path dependent, the more practical solution is
irreversibility, it is incorrect and it leads to suboptimal to use Monte Carlo simulation methods (see Monte
decisions. Carlo Simulation).v,w In order to do so, we use the
Furthermore, as it is apparent from the solution, assumption that properly anticipated prices (or cash
the higher the risk of the project, measured by σ , the flows) fluctuate randomly. Regardless of the pattern
larger are the value of the option and the opportunity of cash flows that a project is expected to have, the
cost of investing. Increasing values of the growth changes in its present value will follow a random
rate, µ, also cause F (V ) and V ∗ to be higher. On walk. This theorem, attributable to Paul Samuelson,
the other hand, larger expected payout rates, δ, lower allows us to combine any number of uncertainties
both F (V ) and V ∗ as holding the option becomes by using Monte Carlo techniques, and to produce
more expensive. an estimate of the present value of a project con-
Dixit and Pindyck [16] show how the optimal ditional on the set of random variables drawn from
investment rule can be found by using both dynamic their underlying distributions. More generally, there
programming (as it is done above) and contingent are two types of numerical techniques for option
claims analysis.o valuation: (i) those that approximate the underlying
Contingent claims methods require one important stochastic processes directly and (ii) those approxi-
assumption: stochastic changes in the value of the mating the resulting partial differential equation. The
project must be spanned by existing assets in the first category includes lattice approaches and Monte
economy (see Complete Markets). Specifically, cap- Carlo simulations. Examples of the second cate-
ital markets must be sufficiently complete so that gory include numerical integration (see Quadrature
one could find an asset, or construct a dynamic Methods); and the implicit/explicit finite difference
portfolio of assets, the price of which is perfectly schemes (see Finite Difference Methods for Bar-
correlated with the value of the project (see Risk- rier Options; Finite Difference Methods for Early
neutral Pricing).p,q This assumption allows properly Exercise Options) used by Brennan [6], Brennan and
taking into account all the flexibility (options) that Schwartz [7], and Majd and Pindyck [28], among
the project might have and using all the information others.
contained in market prices (e.g., futures prices) when
such prices exist.r If the sources of uncertainty in a
project are not traded assets (examples of which are Conclusions
product demand uncertainty, geological uncertainty,
technological uncertainty, cost uncertainty, etc.), an The application of option concepts to value real assets
equilibrium model of asset prices can be used to value has been an important growth area in the theory
the contingent claim.s and practice of finance. The insights and techniques
derived from option pricing have proven capable of quantifying the managerial operating flexibility and strategic interactions thus far ignored by conventional net present value and other quantitative approaches. This flexibility represents a substantial part of the value of many projects and neglecting it can undervalue investments and induce a misallocation of resources. By explicitly incorporating management flexibility into the analysis, real options have provided the tools for properly valuing corporate resources and capital budgeting.

End Notes

a. Marshall's [30] analysis states that if price exceeds long-run average cost, then existing firms expand and new ones enter a business.

b. Symmetrically, firms often do not exit a business for lengthy periods, even after the price falls substantially below long-run average cost. This phenomenon is dubbed hysteresis.

c. Amram and Kulatilaka [1], Brennan and Trigeorgis [8], Copeland and Antikarov [10], Dixit and Pindyck [16], Grenadier [21], Schwartz and Trigeorgis [36], and Smit and Trigeorgis [38] represent core reference volumes on real investment decisions under uncertainty. The survey article by Boyer et al. [4] is a noteworthy collection of the most notable contributions to the literature on strategic investment games, from the pioneering works of Gilbert and Harris [20] and Fudenberg and Tirole [18] to more recent contributions.

d. For a thorough examination of the variables driving real options analysis, the reader is referred to [10], Chapter 1.

e. An interesting example of the effect of an option's exercise price on its value is presented by Moel and Tufano [33]. They study the bidding for rights to explore and develop a copper mine in Peru. A peculiar aspect of the transaction is the nature of the bidding rules that the Peruvian government required bidders to follow. Each bid was required to specify the minimum amount that the bidder would spend on developing the property if they decided to go ahead after exploration. This is equivalent to allowing the bidders to specify the exercise price of their development option. This structure gave rise to incentives that affected the amount that firms would offer, thus inducing successful bidders to make uneconomic investments.

f. Boyarchenko and Levendorskii [3] relax these assumptions and show how to analyze firm decisions in discrete time.

g. Cox, Ross, and Rubinstein's [12] binomial approach enables a more simplified valuation of options in discrete time.

h. Detemple [13] provides a complete treatment of American-style derivatives pricing. He analyzes in detail both plain and exotic contingent claims and presents recent results on the numerical computation of optimal exercise boundaries, hedging prices, and hedging portfolios.

i. Flexible manufacturing, natural resource investments, land development, leasing, large-scale energy projects, research and development, and foreign investment are all examples of real options cases.

j. Trigeorgis and Mason [39] remark that option valuation can be seen as a special version of decision tree analysis. Decision scientists propose the use of decision tree analysis [34] to capture the value of operating flexibility associated with many projects.

k. Luehrman [27] explains how a business strategy compares to a series of options more than to a single option. De facto, executing a strategy almost always involves making a sequence of decisions: some actions are taken immediately, while others are deliberately deferred.

l. The time-to-build and continuous-time features of Grenadier's [21] model translate into an infinite state space. Despite this, he is able to determine the optimal construction rules by engineering an artificial economy with a finite state space in which the equilibrium strategy is identical to that of the true economy.

m. According to equation (1), the current project value is known but its future values are uncertain.

n. Chapters 3 and 4 in [16] provide a thorough overview of the mathematical tools necessary to study investment decisions using a continuous-time approach.

o. Although equivalent, the two methodologies are conceptually rather different: while the former relies on the option's value satisfying the Bellman equation, the latter is founded on the construction of a risk-free portfolio formed by a long position in the firm's option and a short position in units of the firm's project. Chapter 5 in [16] presents a detailed explanation, along with a guided derivation, of the optimal rule obtained on adopting each technique.

p. Duffie [17] gives great emphasis to the implications of complete markets for asset pricing under uncertainty.

q. Harrison and Kreps [22], Harrison and Pliska [23], and others have shown that, in complete markets, the absence of arbitrage implies the existence of a probability distribution such that securities are priced on the basis of their discounted (at the risk-free rate) expected cash flows, where expectation is determined under the risk-neutral probability measure. If all risks can be hedged, this probability is unique. The critical advantage of working in the risk-neutral environment is that it is a convenient environment for option pricing.

r. The reader is referred to [36] for a more rigorous discussion on the application of contingent claims analysis to determine a project's optimal operating policy.

s. See [11] for the derivation of a fundamental partial differential equation that must be satisfied by the value of all contingent claims on the value of state variables that are not traded assets.
t. Broadie and Detemple [9] conduct a careful evaluation of the many methods for computing American option prices.

u. Boyle [5] shows how lattice frameworks can be extended to handle two state variables.

v. In the last few years, methods have been developed that allow using simulations for solving American-style options. For example, Longstaff and Schwartz [26] developed a least-squares Monte Carlo approach to compare the value of immediate exercise with the conditional expected value from continuation.

w. Hull and White [24] suggest a control variate technique to improve computational efficiency when a similar derivative asset with an analytic solution is available.

References

[1] Amram, M. & Kulatilaka, N. (1999). Real Options: Managing Strategic Investment in an Uncertain World, Harvard Business School Press, Boston, MA.
[2] Black, F. & Scholes, M. (1973). The pricing of options and corporate liabilities, The Journal of Political Economy 81(3), 637–654.
[3] Boyarchenko, S. & Levendorskii, S. (2000). Entry and exit strategies under Non-Gaussian distributions, in Project Flexibility, Agency, and Competition, M. Brennan & L. Trigeorgis, eds, Oxford University Press, Inc., New York, NY, pp. 71–84.
[4] Boyer, R., Gravelle, E. & Lasserre, P. (2004). Real Options and Strategic Competition: A Survey. Working Paper.
[5] Boyle, P. (1988). A lattice framework for option pricing with two state variables, The Journal of Financial and Quantitative Analysis 23(1), 1–12.
[6] Brennan, M. (1979). The pricing of contingent claims in discrete time models, The Journal of Finance 34(1), 53–68.
[7] Brennan, M. & Schwartz, E. (2001). Finite differences methods and jump processes arising in the pricing of contingent claims: a synthesis, in Real Options and Investment Under Uncertainty: Classical Readings and Recent Contributions, E. Schwartz & L. Trigeorgis, eds, The MIT Press, Cambridge, MA, pp. 559–570.
[8] Brennan, M. & Trigeorgis, L. (2000). Project Flexibility, Agency, and Competition, Oxford University Press, Inc., New York, NY.
[9] Broadie, M. & Detemple, J. (1996). American option valuation: new bounds, approximations, and a comparison of existing methods, The Review of Financial Studies 9(4), 1211–1250.
[10] Copeland, T. & Antikarov, V. (2001). Real Options: A Practitioner's Guide, W.W. Norton & Company, New York.
[11] Cox, J., Ingersoll, J. & Ross, S. (1985). An intertemporal general equilibrium model of asset prices, Econometrica 53(2), 363–384.
[12] Cox, J., Ross, S. & Rubinstein, M. (1979). Option pricing: a simplified approach, The Journal of Financial Economics 7(3), 229–263.
[13] Detemple, J. (2005). American-Style Derivatives: Valuation and Computation, Chapman & Hall/CRC.
[14] Dixit, A. (1989). Entry and exit decisions under uncertainty, The Journal of Political Economy 97(3), 620–638.
[15] Dixit, A. (1992). Investment and hysteresis, The Journal of Economic Perspectives 6(1), 107–132.
[16] Dixit, A. & Pindyck, R. (1994). Investment Under Uncertainty, Princeton University Press, Princeton, NJ.
[17] Duffie, D. (1996). Dynamic Asset Pricing Theory, Princeton University Press, Princeton, NJ.
[18] Fudenberg, D. & Tirole, J. (1985). Preemption and rent equalization in the adoption of new technology, The Review of Economic Studies 52(3), 383–401.
[19] Geske, R. (1979). A note on analytical valuation formula for unprotected American call options on stocks with known dividends, The Journal of Financial Economics 7, 375–380.
[20] Gilbert, R. & Harris, R. (1984). Competition with lumpy investment, RAND Journal of Economics 15(2), 197–212.
[21] Grenadier, S. (2000). Strategic options and product market competition, in Project Flexibility, Agency, and Competition, M. Brennan & L. Trigeorgis, eds, Oxford University Press, Inc., New York, NY, pp. 275–296.
[22] Harrison, M. & Kreps, D. (1979). Martingales and arbitrage in multiperiod securities markets, The Journal of Economic Theory 20(3), 381–408.
[23] Harrison, J. & Pliska, S. (1981). Martingales and stochastic integrals in the theory of continuous trading, Stochastic Processes and Their Applications 11, 215–260.
[24] Hull, J. & White, A. (1988). The use of control variate technique in option pricing, The Journal of Financial and Quantitative Analysis 23(3), 237–251.
[25] Kester, W. (2001). Today's options for tomorrow's growth, in Real Options and Investment Under Uncertainty: Classical Readings and Recent Contributions, E. Schwartz & L. Trigeorgis, eds, The MIT Press, Cambridge, MA, pp. 33–46.
[26] Longstaff, F. & Schwartz, E. (2001). Valuing American options by simulations: a simple least-squares approach, The Review of Financial Studies 14(1), 113–147.
[27] Luehrman, T. (2001). Strategy as a portfolio of real options, in Real Options and Investment Under Uncertainty: Classical Readings and Recent Contributions, E. Schwartz & L. Trigeorgis, eds, The MIT Press, Cambridge, MA, pp. 385–404.
[28] Majd, S. & Pindyck, R. (1987). Time to build, option value, and investment decisions, The Journal of Financial Economics 18(1), 7–27.
[29] Margrabe, W. (1978). The value of an option to exchange one asset for another, The Journal of Finance 33(1), 177–186.
[30] Marshall, A. (1890). Principles of Economics, Macmillan and Co, London.
[31] McDonald, R. & Siegel, D. (1986). The value of waiting to invest, The Quarterly Journal of Economics 101(4), 707–728.
[32] Merton, R. (1973). Theory of rational option pricing, Bell Journal of Economics 4(1), 141–183.
[33] Moel, A. & Tufano, P. (2000). Bidding for the Antamina mine: valuation and incentives in a real options context, in Project Flexibility, Agency, and Competition, M. Brennan & L. Trigeorgis, eds, Oxford University Press, London, pp. 128–150.
[34] Myers, S. (2001). Finance theory and financial strategy, in Real Options and Investment Under Uncertainty: Classical Readings and Recent Contributions, E. Schwartz & L. Trigeorgis, eds, The MIT Press, Cambridge, MA, pp. 19–32.
[35] Pindyck, R. (2001). Irreversible investment, capacity choice, and the value of the firm, in Real Options and Investment Under Uncertainty: Classical Readings and Recent Contributions, E. Schwartz & L. Trigeorgis, eds, The MIT Press, Cambridge, MA, pp. 313–334.
[36] Schwartz, E. & Trigeorgis, L. (2001). Real Options and Investment Under Uncertainty: Classical Readings and Recent Contributions, The MIT Press, Cambridge, MA.
[37] Smit, H. & Ankum, L. (1993). A real options and game-theoretic approach to corporate investment strategy under competition, Financial Management 22(3), 241–250.
[38] Smit, H. & Trigeorgis, L. (2004). Strategic Investment: Real Options and Games, Princeton University Press, Princeton, NJ.
[39] Trigeorgis, L. & Mason, S. (2001). Valuing managerial flexibility, in Real Options and Investment Under Uncertainty: Classical Readings and Recent Contributions, E. Schwartz & L. Trigeorgis, eds, The MIT Press, Cambridge, MA, pp. 47–60.
[40] Williams, J. (1993). Equilibrium and options on real assets, The Review of Financial Studies 6(4), 825–850.

Further Reading

Grenadier, S. (2000). Game Choices: The Intersection of Real Options and Game Theory, Risk Books, London.

Related Articles

Black–Scholes Formula; Option Pricing: General Principles; Options: Basic Definitions; Swing Options.

DORIANA RUFFINO
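
As a numerical companion to the basic model of this article, the following minimal Python sketch evaluates equations (9) and (10) and the trial solution F(V) = AV^β. It is an illustration under stated assumptions, not the authors' implementation: the function names and the parameter values (µ = 0.08, δ = 0.04, σ = 0.20, I = 1) are hypothetical, and the constant A is obtained here by imposing the value-matching condition (7), which gives A = (V* − I)/(V*)^β.

```python
import math


def beta(mu, delta, sigma):
    """Exponent beta of equation (10), the positive root of the quadratic
    0.5*sigma^2*b*(b - 1) + (mu - delta)*b - mu = 0 implied by equation (5)."""
    a = (mu - delta) / sigma**2
    return 0.5 - a + math.sqrt((a - 0.5) ** 2 + 2.0 * mu / sigma**2)


def investment_threshold(mu, delta, sigma, cost):
    """Optimal threshold V* = beta * I / (beta - 1), equation (9)."""
    b = beta(mu, delta, sigma)
    if b <= 1.0:
        # With mu > 0, beta exceeds 1 only when the payout rate delta is positive.
        raise ValueError("beta must exceed 1 for a finite threshold")
    return b / (b - 1.0) * cost


def option_value(v, mu, delta, sigma, cost):
    """Value of the option to invest: A * V**beta below V*, V - I at or above it.
    A is fixed by the value-matching condition (7); this step is an assumption
    of the sketch rather than a formula quoted from the article."""
    b = beta(mu, delta, sigma)
    v_star = investment_threshold(mu, delta, sigma, cost)
    if v >= v_star:
        return v - cost
    a_const = (v_star - cost) / v_star**b
    return a_const * v**b


if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration.
    mu, delta, sigma, cost = 0.08, 0.04, 0.20, 1.0
    print("beta      =", beta(mu, delta, sigma))
    print("V* / I    =", investment_threshold(mu, delta, sigma, cost) / cost)
    print("F(V = I)  =", option_value(cost, mu, delta, sigma, cost))
```

Under these illustrative parameters the threshold lies well above the cost I, and the option is still worth something at V = I, where the net present value rule would already permit investment. Raising `sigma` in the sketch raises both the threshold and the option value, in line with the comparative statics described in the article.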
Employee Stock Options

Employee stock options (ESOs) are call options issued by a company and given to its employees as part of their remuneration. The rationale is that granting the employee options will align his or her interests with those of the firm's shareholders. This is particularly relevant for managers and Chief Executive Officers (CEOs) whose behavior has more impact on firm value than that of lower ranked employees.

ESOs are prevalent in both the United States and Europe. In the fiscal year 1999, 94% of the S&P 500 companies granted options to their top executives, and the value at the grant date represented 47% of total pay for the CEOs [14]. The 2005 Mercer Compensation Survey [34] reports that over 75% of CEOs receive option grants and options account for 32% of CEO pay. The Hay Group's 2006 European Executive Pay Survey [15] found that 55% of the companies in the study used stock options.

ESOs are American call options on the company stock granted to the employee. They typically have a number of characteristics that distinguish them from financial options; see [38] and [35] for overviews. There is usually an initial vesting period during which the options cannot be exercised. Cliff vesting is a structure where all options granted on a given date become exercisable after an initial period, usually 2–4 years. Stepped vesting refers to a structure where a proportion of an option grant becomes exercisable each year, for example, 10% after one year, then 20%, 30%, and 40% each subsequent year. The most common structure is straight vesting where the proportions are equal, say one-third of the grant is exercisable after each of the first three years (see [2, 30], and [25]). During this period, typically, the employee must forfeit the remaining unvested options if he or she resigns or is fired. Clearly, if there is no vesting period, the options are American style, whereas, in the limit, as the vesting period approaches maturity, the options become European (see American Options; Call Options for descriptions of European and American options).

After the vesting period, the options may be exercised at any time up to and including the maturity date. These options are typically long dated with a 10-year maturity being most common. The employee is not able to sell or transfer the options at any time. This is in keeping with the alignment or incentive effect of options. The option terms are modified if the employee exits the firm either because he or she is fired, leaves, retires, or dies. These "sunset rules" vary widely across firms (see [8] for details), but typically the employee is given a period of time in which to exercise the options or forfeit them. The length of time is generally longest if the employee retires and shortest if the employee leaves or is fired. In addition to being unable to unwind an option position by selling it, employees are typically also restricted from short selling the stock of their company and thus are very restricted in terms of hedging their option exposure [5].

There have been a number of empirical studies of ESO exercise patterns. Huddart and Lang [23] study exercise behavior in a sample of eight firms that volunteered internal records on option grants and exercises from 1982 to 1994. They find a pervasive pattern of option exercises well before expiration: the mean fraction of option life elapsed at the time of exercise varied from 0.26 to 0.79 over companies. Bettis et al. [2] analyze a unique database of more than 140 000 option exercises by corporate executives at almost 4000 firms during the period 1996 through 2002. They find 10-year options were exercised a median of 4.25 years before expiry. A further feature documented in the data is that of block exercise. Huddart and Lang [23] find that the mean fraction of options from a single grant exercised by an employee at one time varied from 0.18 to 0.72 over employees at a number of companies. Similarly, Aboody [1] reports yearly mean percentages of options exercised over the life of 5 and 10 year options, showing exercises are spread over the life of the options. Some of these block exercises are due to the nature of the vesting structure (for instance, Huddart and Lang [23] find spikes on vest dates corresponding to large exercises on those dates), but there are also other block exercises on dates that cannot be explained by vesting.

There are many questions of interest, including "What is the employee's optimal exercise policy?", "What are the options worth to him or her?", and "What is the corresponding cost to the company of granting the options?" The employee's exercise policy and option value should incorporate the features described above, his or her inability to hedge being key. The cost to the company should reflect the
value of the option liability to the issuing corporation. This usually entails the assumption that shareholders are well diversified, so the cost should be the risk-neutral option value conditional on the optimal exercise behavior of the employee. This distinction between the option value to the employee (often called subjective value) and the cost to the company is important and arises because the employee cannot perfectly hedge the risk of the option exposure, while shareholders are typically assumed to be well diversified.

The need to quantify the company cost is particularly relevant in light of changes in accounting rules, which require companies to expense options at the grant date. In 1995, the Financial Accounting Standards Board (FASB) set a standard to require firms to expense stock options using "fair value". However, this included the possibility to calculate the option cost to the firm as the option's intrinsic value at the grant date. Perhaps motivated by this, companies mainly granted options that were at-the-money, thus calculating a zero value for the expense. The huge growth of employee options and a series of corporate scandals led to pressure for changes to these rules, and new regulations (FASB 123R in the United States, International Financial Reporting Standards (IFRS) 2 in Europe) were introduced in 2004. From 2005 onward, these regulations required companies to use a "fair value method" of accounting for the expense of employee options, and although recommendations are made concerning appropriate methods, there is still much scope for interpretation by companies. For instance, use of the (European) Black–Scholes price with an estimated "expected term" is an acceptable and popular approach. Despite these changes, the granting of options that are at-the-money is still typical.

To take into account the nonhedgeability aspect of employee options, we need to move outside of the complete market or risk-neutral pricing framework to an incomplete setting (see Complete Markets). There have been many papers in the literature in this direction, beginning with [22, 31, 32], and [14], amongst others. These papers typically develop binomial models that take trading restrictions and employee risk aversion into account and compute a certainty equivalent or subjective value for the employee options. These models make the simplistic assumption that any nonoption wealth is invested in a riskless bank account, and most treat the options as European. Also in a binomial model, Cai and Vijh [3] and Carpenter [5] assume nonoption or outside wealth is invested in a Merton-style portfolio, but only allow for a one-off choice of this portfolio.

Many of the papers mentioned above observe that the utility-based or subjective valuation to the employee is much lower than the equivalent Black–Scholes value (the value obtained in an equivalent complete market setting); however, this is not universally true in models where nonoption wealth is invested in a riskless bond [14]. Generally, however, the (subjective) value of the options to the employee is less than the cost of the options to the company because of the employee's hedging restrictions.

These models have been extended to incorporate the impact of optimal investment of outside wealth in a market or risky asset, rather than just a bank account. This was tackled in the natural setting of utility-indifference pricing (see [19] for a survey containing many references) for European options by Henderson [17]. This allows the employee to reduce risk by partial hedging in the market asset, which would seem to reflect what can be done in practice. The basic setup for continuous-time models with hedging in the market asset is as follows. The market M follows a geometric Brownian motion

$$\frac{dM}{M} = \mu\,dt + \sigma\,dB \tag{1}$$

where µ, σ are constants, and B is standard Brownian motion. Let W be a standard Brownian motion and assume $dB\,dW = \rho\,dt$. We can write $dW = \rho\,dB + \sqrt{1 - \rho^2}\,dZ$ for Z a Brownian motion independent of B. The company stock S also follows a geometric Brownian motion:

$$\frac{dS}{S} = \nu\,dt + \eta\,dW = \nu\,dt + \eta\left(\rho\,dB + \sqrt{1 - \rho^2}\,dZ\right) \tag{2}$$

The term $\rho^2\eta^2$ represents the hedgeable or market component of the total risk of the stock and $(1 - \rho^2)\eta^2$ is the unhedgeable or idiosyncratic/firm-specific risk of the stock. When $\rho^2 = 1$ all the risk can be hedged and an employee with an option on the stock S is able to perfectly hedge the risk he or she faces. (To avoid arbitrage, we should have $\nu - r = (\mu - r)\eta/\sigma$. More generally, CAPM imposes the relation
$\nu - r = (\mu - r)\eta\rho/\sigma$; see Capital Asset Pricing Model).

The employee can invest in a riskless asset with interest rate r and hold a cash amount $\theta_t$ in the market at time t. The dynamics of the wealth account X are then

$$dX = \theta\,\frac{dM}{M} + r(X - \theta)\,dt \tag{3}$$

If the employee is granted λ European call options with strike K then he or she solves

$$V(t, X_t, S_t, \lambda) = \sup_{\theta_u;\,u \ge t} E_t\left[U\left(X_T + \lambda(S_T - K)^+\right)\right] \tag{4}$$

Under the assumption of exponential utility, closed-form solutions are obtained for the value function. The utility-based or utility-indifference value p of the λ options solves $V(t, x + p, S_t, 0) = V(t, x, S_t, \lambda)$. In such models, it is straightforward to show that, in the limit, as the (absolute value of) correlation between the company stock and market approaches one, the Black–Scholes or complete market value is recovered. This value is then an upper bound on the utility-based valuation. In a European option setting, the Black–Scholes value represents the cost to the company, so we see the value to the employee is lower than the cost to the company. The other comparison of interest is to consider what difference the ability to undertake partial hedging in the market makes. The ability to partially hedge is valuable to the employee and his or her utility-based or subjective option value is higher than without the hedging/investment opportunity. In other words, the subjective value increases in (absolute value of) correlation. Similar to the models without the market asset, the higher the employee's risk aversion, the lower the utility-based option value.

Of course, as we described earlier, employee stock options are American options, and allow for early exercise once the options have vested. Some of the aforementioned papers also treat American style options and the general intuition is that hedging restrictions of the employee result in an earlier exercise and a lower subjective value than the equivalent Black–Scholes (complete market) American option. In the continuous-time model with investment in the market asset, closed-form results are found under the assumptions of exponential utility and perpetual options in [18] and numerical solutions for finite maturity in [33]. Kadam et al. [29] considered the case of the perpetual option but without the partial hedging in the market. The exercise threshold and option values both decrease with risk aversion and increase with (absolute value of) correlation. Just as in the European case, the ability to partially hedge risk is valuable to the employee. He or she places a higher value on the option and waits longer to exercise it. It is also possible that stock volatility reduces the option value in some scenarios because of the interaction of the convex payoff with the concave utility function; see [17, 18, 33], and also [37]. Since the cost to the company is just the risk-neutral option value conditional on optimal exercise by the employee, it is also decreasing with risk aversion and increasing with (absolute value of) correlation [13]. Detemple and Sundaresan [9] and Ingersoll [25] also allow for optimal investment in a market portfolio and consider numerical approaches to the marginal pricing of small quantities of options.

As mentioned earlier, the data indicates that employees exercise options in a number of tranches on different occasions. Consideration of models that only allow for one option or one exercise time is not consistent with this observation. Vesting is one feature that clearly encourages such block exercise behavior, and indeed, Huddart and Lang [23] observe that many exercises take place immediately when the options vest. However, vesting does not appear to explain all of the intertemporal exercises, since not all exercises occur immediately upon vesting. Another reason for intertemporal exercise is risk aversion and the inability to hedge risk due to restrictions. Jain and Subrahmanian [26] consider a binomial model for a risk-averse employee who is granted a number of options. Grasselli [12] extends the binomial framework to include optimal investment in a correlated market asset. These papers find numerically that optimal behavior is to exercise options when the stock price reaches a boundary and the discrete nature of the model results in exercise occurring at a discrete set of dates or stock price levels. Rogers and Scheinkman [36] make similar observations numerically in a discrete approximation to a continuous-time model without investment opportunities in a market asset.

Grasselli and Henderson [13] show that under the assumption of exponential utility and perpetual options, closed-form solutions can be derived for the multiple-option problem with investment opportunities in a market asset. In fact, they show that
4 Employee Stock Options

given N options, there are N unique stock price thresholds at which the employee should exercise an option. These thresholds are obtained using a recursive relation. The price thresholds are increasing as the quantity of options falls. In other words, when the employee has fewer options remaining, he or she is exposed to less risk, and thus is willing to wait for a higher price threshold before exercising further options. Similar comparative statics apply as in the single American option case: thresholds, option values, and company cost are decreasing in risk aversion and increasing in (absolute value of) correlation. In addition, they show that the cost to the company is underestimated if a single optimal exercise threshold is used. Since, in reality, options are not exercised one at a time, the paper also introduces a transaction cost on exercise, which restores block exercise as the optimal solution, again found in closed form. Leung and Sircar [33] consider the finite-maturity version of the problem, which leads to numerical solution of the free-boundary problem. They also include features such as vesting and job termination risk.

As described earlier, option terms change upon departure of an employee from the company and this should be incorporated into pricing models. Employee departure is typically modeled by an exogenous exponentially distributed time with constant intensity, independent of the stock price, similar to a reduced-form approach in credit modeling (see Structural Default Risk Models). The papers [5, 6, 27, 39], and [33], among others, incorporate departure into a variety of setups in this manner.

Although we do not discuss estimation in any detail here, it is clear that estimation of such models is difficult. The models require estimates of risk aversion, outside wealth, and employee departure rate, which are not easily obtained. Bettis et al. [2] and Carpenter [5] have attempted calibration exercises on utility style models to exercise data; however, many simplifying assumptions have to be made due to data limitations. For example, they assume an option grant is exercised on one date only rather than on multiple occasions. Perhaps surprising is the finding of Carpenter [5] that after a calibration to data, a reduced-form model of employee departure is as capable as a utility-maximizing model in explaining option exercises. This finding motivates another strand of the literature, which models option exercise exogenously by postulating an exercise boundary in terms of the moneyness of the option; see [24] and [7]. This style of model has the attraction of simplicity and is much easier to calibrate, since the employee's risk aversion is no longer used. For this reason, it may well be a fruitful approach for calculating an approximation to the cost of the options to the company for accounting purposes.

We now turn to briefly discuss a number of other features relevant in employee compensation. Typically employees receive new grants of options periodically; however, companies also engage in resetting (where the option strike of existing options is adjusted downward when the options are out-of-the-money) and reloading (where additional options are granted automatically when existing options are exercised [10]).

Besides the traditional employee options described in this article, companies have increasingly granted performance-based options, which link option vesting or exercise to the achievement of market or accounting-based performance targets. These options are very popular in Europe, but have, until recently, been less common in the United States; see [11] and references therein. Compensation linked to accounting data is potentially open to manipulation, and managers with such options may be motivated to inflate earnings. There is a large literature on the connection between compensation involving accounting-based targets and earnings management, either of a direct nature [4] or through accrual-based management or manipulation (Healy [16]).

Performance-based options can also have exercise prices contingent on performance relative to a comparison group; these are known as indexed options. See Johnson and Tian [28], who value such options in a risk-neutral framework using techniques from exchange or Margrabe options (see Margrabe Formula). Managers are then rewarded as a function of performance relative to a peer group rather than on absolute performance [20].

Other important issues that have not been discussed here include the impact of dilution: when options are exercised, the company typically issues new shares. Another important issue is the influence the CEO has on the stock price via his or her effort or choice of projects/risk. The problem of how best to compensate managers, given the benefits of improved incentives and the costs of inefficient risk-sharing, is the subject of a large literature on the principal–agent problem; see the classic reference [21].
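The exogenous departure model described in this article (an exponentially distributed departure time with constant intensity, independent of the stock price) can be illustrated with a minimal sketch. The function names and the intensity value below are hypothetical and chosen only for illustration; nothing here is calibrated to data or taken from the cited papers.

```python
import math


def prob_still_employed(intensity: float, years: float) -> float:
    """Survival probability P(tau > t) = exp(-intensity * t) for an
    exponentially distributed departure time tau with constant intensity."""
    return math.exp(-intensity * years)


def expected_tenure(intensity: float) -> float:
    """Mean of the exponential departure time, 1 / intensity."""
    return 1.0 / intensity


if __name__ == "__main__":
    departure_intensity = 0.10   # hypothetical: 10% per year
    vesting_period = 3.0         # hypothetical vesting period in years
    p = prob_still_employed(departure_intensity, vesting_period)
    print(f"P(employee still present at vesting) = {p:.4f}")
    print(f"Expected tenure = {expected_tenure(departure_intensity):.1f} years")
```

Because the intensity is constant, the probability of reaching the vesting date depends only on the length of the vesting period, which is what makes this reduced-form specification easy to combine with a stock price model.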

References

[1] Aboody, D. (1996). Market valuation of employee stock options, Journal of Accounting and Economics 22, 357–391.
[2] Bettis, J.C., Bizjak, J.M. & Lemmon, M.L. (2005). Exercise behavior, valuation and the incentive effects of employee stock options, Journal of Financial Economics 76, 445–470.
[3] Cai, J. & Vijh, A. (2005). Executive stock and option valuation in a two state-variable framework, Journal of Derivatives 12, 19–27.
[4] Camara, A. & Henderson, V. (2007). Performance Based Compensation and Direct Earnings Management, Working paper.
[5] Carpenter, J.N. (1998). The exercise and valuation of executive stock options, Journal of Financial Economics 48, 127–158.
[6] Carr, P. & Linetsky, V. (2000). The valuation of executive stock options in an intensity-based framework, European Finance Review 4, 211–230.
[7] Cvitanic, J., Wiener, Z. & Zapatero, F. (2008). Analytic pricing of employee stock options, Review of Financial Studies 21, 683–724.
[8] Dahiya, S. & Yermack, D. (2008). You can't take it with you: Sunset provisions for equity compensation when managers retire, resign or die, Journal of Corporate Finance 14, 499–511.
[9] Detemple, J. & Sundaresan, S. (1999). Nontraded asset valuation with portfolio constraints: a binomial approach, Review of Financial Studies 12, 835–872.
[10] Dybvig, P. & Lowenstein, M. (2003). Employee reload options: pricing, hedging and optimal exercise, Review of Financial Studies 12, 145–171.
[11] Gerakos, J.J., Goodman, T.H., Ittner, C.D. & Larcker, D.F. (2005). The Adoption and Characteristics of Performance Stock Options, Working paper.
[12] Grasselli, M. (2005). Nonlinearity, Correlation and the Valuation of Employee Options, Working paper.
[13] Grasselli, M. & Henderson, V. (2009). Risk aversion, effort and block exercise of executive stock options, Journal of Economic Dynamics and Control 33, 109–127.
[14] Hall, B.J. & Murphy, K.J. (2002). Stock options for undiversified executives, Journal of Accounting and Economics 33, 3–42.
[15] Hay Group (2007). 2006 European executive pay survey, The Executive Edition (1), 11–12.
[16] Healy, P.M. (1985). The effect of bonus schemes on accounting decisions, Journal of Accounting and Economics 7, 85–107.
[17] Henderson, V. (2005). The impact of the market portfolio on the valuation, incentive and optimality of executive stock options, Quantitative Finance 5(1), 35–47.
[18] Henderson, V. (2007). Valuing the option to invest in an incomplete market, Mathematics and Financial Economics 1, 103–128.
[19] Henderson, V. & Hobson, D. (2009). Utility indifference pricing–an overview, in Indifference Pricing, R. Carmona, ed, Princeton University Press, Chapter 2.
[20] Holmstrom, B. (1982). Moral hazard in teams, Bell Journal of Economics 13, 1324–1340.
[21] Holmstrom, B. & Milgrom, P. (1987). Aggregation and linearity in the provision of intertemporal incentives, Econometrica 55, 303–328.
[22] Huddart, S. (1994). Employee stock options, Journal of Accounting and Economics 18, 207–231.
[23] Huddart, S. & Lang, M. (1996). Employee stock option exercises: an empirical analysis, Journal of Accounting and Economics 21, 5–43.
[24] Hull, J. & White, A. (2002). How to value employee stock options, Financial Analysts Journal 60(1), 114–119.
[25] Ingersoll, J.E. (2006). The subjective and objective evaluation of compensation stock options, Journal of Business 79, 453–487.
[26] Jain, A. & Subramanian, A. (2004). The intertemporal exercise and valuation of employee stock options, The Accounting Review 79(3), 705–743.
[27] Jennergren, L. & Naslund, B. (1993). A comment on "Valuation of executive stock options and the FASB proposal", The Accounting Review 68, 179–183.
[28] Johnson, S. & Tian, Y. (2000). Indexed executive stock options, Journal of Financial Economics 57, 35–64.
[29] Kadam, A., Lakner, P. & Srinivasan, A. (2005). Executive Stock Options: Value to the Executive and Cost to the Firm, Working paper, City University.
[30] Kole, S. (1997). The complexity of compensation contracts, Journal of Financial Economics 43, 79–104.
[31] Kulatilaka, N. & Marcus, A.J. (1994). Valuing employee stock options, Financial Analysts Journal November-December, 46–56.
[32] Lambert, R.A., Larcker, D.F. & Verrecchia, R.E. (1991). Portfolio considerations in valuing executive compensation, Journal of Accounting Research 29(1), 129–149.
[33] Leung, T. & Sircar, R. (2009). Accounting for risk aversion, vesting, job termination risk and multiple exercises in valuation of employee stock options, Mathematical Finance 19(1), 99–128.
[34] Mercer Human Resource Consulting. (2006). 2005 CEO Compensation Survey and Trends.
[35] Murphy, K.J. (1999). Executive compensation, in Handbook of Labor Economics, O. Ashenfelter & D. Card, eds, North Holland, Vol. 3.
[36] Rogers, L.C.G. & Scheinkman, J. (2007). Optimal exercise of executives stock options, Finance and Stochastics 11, 357–372.
[37] Ross, S.A. (2004). Compensation, incentives and the duality of risk aversion and riskiness, Journal of Finance 59(1), 207–225.
[38] Rubinstein, M. (1995). On the accounting valuation of employee stock options, Journal of Derivatives 3, 8–24.

[39] Sircar, R. & Xiong, W. (2007). A general framework for evaluating executive stock options, Journal of Economic Dynamics and Control 31(7), 2317–2349.

Further Reading

Black, F. & Scholes, M. (1973). The pricing of options and corporate liabilities, Journal of Political Economy 81(3), 637–654.

Related Articles

American Options; Black–Scholes Formula; Call Options; Capital Asset Pricing Model; Complete Markets; Structural Default Risk Models.

VICKY HENDERSON & JIA SUN

Arbitrage Strategy

It is difficult to imagine a normative condition that is more widely accepted and unquestionable in the minds of anyone involved in the field of quantitative finance than the absence of arbitrage opportunities in a financial market. Put plainly, an arbitrage strategy allows a financial agent to make certain profit out of nothing, that is, out of zero initial investment. This has to be disallowed on economic grounds if the market is in an equilibrium state, as opportunities for riskless profit would result in an instantaneous movement of prices of certain financial instruments.

Let us give an illustrative example of an arbitrage strategy in the foreign exchange market, commonly called the triangular arbitrage. Suppose that Mary, in Paris, is buying^a the US dollar for €0.685. Tom, in San Francisco, is buying Japanese yen for $0.009419. Finally, Toru, in Tokyo, is buying one euro for ¥155.02. All these transactions are supposed to be able to occur at the same time. There is something worth noting in the situation just described, something that could allow you to make riskless profit. Let us see how. You borrow $10 000 from your rich aunt Clara and tell her you will return the money in a matter of minutes. First, you approach Mary and change all your dollars to euros. This means that you will get €6850. With the euros in hand, you contact Toru and change them into yen: you will get ¥(6850 × 155.02) = ¥1 061 887. Finally, you call Tom, wire him all your yen and change them back to dollars, which gets you $(1 061 887 × 0.009419) ≈ $10 001.91. You give the $10 000 back to your aunt Clara as promised, and you have managed to create $1.91 out of thin air.

Although the above-mentioned example is oversimplistic, it gives a clear idea of what arbitrage is: a position on a combination of assets that requires zero initial capital and results in a profit with no risk involved. Let us now take a step further and see what will happen under the situation of the preceding example. As more and more investors become aware of the discrepancy between prices, they will all try to use the same smart strategy that you used for their benefit. Everyone will be trying to exchange US dollars for euros in the first step of the arbitrage, which will drive Mary to start buying the US dollar for less than €0.685 because of the high demand for the euros she is selling. Similarly, Tom will start buying Japanese yen for less than $0.009419 and Toru will be buying euro for less than ¥155.02. Very soon, the situation will be such that nobody is able to make a riskless profit anymore.

The economic rationale behind asking for nonexistence of arbitrage opportunities is based exactly on the discussion in the previous paragraph. If arbitrage opportunities were present in the market, a multitude of investors would try to take advantage of them simultaneously. Therefore, there would be an almost instantaneous move of the prices of certain financial instruments as a response to a supply–demand imbalance. This price movement will continue until any opportunity for riskless profit is no longer available.

It is important to note that the preceding, somewhat theoretical, discussion does not imply that arbitrage opportunities never exist in practice. On the contrary, it has been observed that opportunities for some, albeit usually minuscule, riskless profit appear frequently as a consequence of the large number of geographically distant trading locations, as well as a result of the numerous financial products that have sprung up and are sometimes interrelated in complicated ways. Realizing that such opportunities exist is a matter of rapid access to information that a certain group of investors, so-called arbitrageurs, has. It is rather the existence of arbitrageurs acting in financial markets that ensures that when arbitrage opportunities exist, they will be fleeting.

The principle of not allowing for arbitrage opportunities in financial markets has far-reaching consequences and has immensely boosted research in quantitative finance. The ground-breaking papers of Black (see Black, Fischer) and Scholes [1] and Merton (see Merton, Robert C.) [3], published in 1973, were the first instances explaining how absence of arbitrage opportunities leads to rational pricing and hedging formulas for European-style options in a geometric Brownian motion financial model.^b This idea was subsequently taken up and generalized by many authors and has led to a profound understanding of the interplay between the economics of financial markets and the mathematics of stochastic processes, with deep-reaching results; see Fundamental Theorem of Asset Pricing; Risk-neutral Pricing; Equivalent Martingale Measures; and Free Lunch for some amazing developments on this path.
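The arithmetic of the triangular arbitrage example in this article can be checked with a short sketch. The three bid quotes are the ones quoted in the text; the function and constant names are hypothetical, and transaction costs and bid–ask spreads are ignored, as in the example itself.

```python
from decimal import Decimal

# Bid quotes taken from the example in the text.
USD_TO_EUR = Decimal("0.685")      # Mary buys 1 US dollar for 0.685 euro
EUR_TO_JPY = Decimal("155.02")     # Toru buys 1 euro for 155.02 yen
JPY_TO_USD = Decimal("0.009419")   # Tom buys 1 yen for 0.009419 US dollar


def round_trip(dollars: Decimal) -> Decimal:
    """Dollars held after the cycle USD -> EUR -> JPY -> USD."""
    euros = dollars * USD_TO_EUR
    yen = euros * EUR_TO_JPY
    return yen * JPY_TO_USD


def has_triangular_arbitrage() -> bool:
    """The cycle is an arbitrage when one dollar comes back as more than one."""
    return USD_TO_EUR * EUR_TO_JPY * JPY_TO_USD > 1


if __name__ == "__main__":
    borrowed = Decimal("10000")
    final = round_trip(borrowed)
    print(f"final: ${final:.2f}, profit: ${final - borrowed:.2f}")
    print("arbitrage available:", has_triangular_arbitrage())
```

The product of the three quotes exceeds one by roughly two parts in ten thousand, which is exactly the riskless gain per dollar borrowed; once the quotes adjust so that the product is at most one, the cycle no longer yields a profit.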

We close the discussion of arbitrages on an amusing note. Such is the firm belief in the principle of not allowing for arbitrage opportunities in financial modeling that even jokes have been created in order to substantiate it further. We quote directly from Chapter 1 of [2], which can be used as an excellent introduction to arbitrage theory: A professor working in Mathematical Finance and a normal^c person go on a walk and the normal person sees a €100 bill lying on the street. When the normal person wants to pick it up, the professor says: "Don't try to do that. It is absolutely impossible that there is a €100 bill lying on the street. Indeed, if it were lying on the street, somebody else would have picked it up before you."

End Notes

a. All the prices referred to in this example are bid prices of the currencies involved.
b. For historical perspectives regarding option pricing and hedging, see Black, Fischer; Merton, Robert C.; Arbitrage: Historical Perspectives; and Option Pricing Theory: Historical Perspectives. For a more thorough quantitative treatment, see Risk-neutral Pricing.
c. Is this bold distancing from normality of mathematical finance professors, clearly implied by the authors of [2], a decisive step toward illuminating the perception they have of their own personalities? Or is it just a gimmick used to add another humorous ingredient to the joke? The answer is left for the reader to determine.

References

[1] Black, F. & Scholes, M. (1973). The pricing of options and corporate liabilities, The Journal of Political Economy 81, 637–654.
[2] Delbaen, F. & Schachermayer, W. (2006). The Mathematics of Arbitrage, Springer Finance, Springer-Verlag, Berlin.
[3] Merton, R.C. (1973). Theory of rational option pricing, Bell Journal of Economics and Management Science 4, 141–183.

Further Reading

Dalang, R.C., Morton, A. & Willinger, W. (1990). Equivalent martingale measures and no-arbitrage in stochastic securities market models, Stochastics and Stochastics Reports 29, 185–201.
Delbaen, F. (1992). Representing martingale measures when asset prices are continuous and bounded, Mathematical Finance 2, 107–130.
Delbaen, F. & Schachermayer, W. (1994). A general version of the fundamental theorem of asset pricing, Mathematische Annalen 300, 463–520.
Delbaen, F. & Schachermayer, W. (1998). The fundamental theorem of asset pricing for unbounded stochastic processes, Mathematische Annalen 312, 215–250.
Elworthy, K.D., Li, X.-M. & Yor, M. (1999). The importance of strictly local martingales; applications to radial Ornstein-Uhlenbeck processes, Probability Theory and Related Fields 115, 325–355.
Föllmer, H. & Schied, A. (2004). Stochastic Finance, de Gruyter Studies in Mathematics, extended Edition, Walter de Gruyter & Co., Berlin, Vol. 27.
Harrison, J.M. & Kreps, D.M. (1979). Martingales and arbitrage in multiperiod securities markets, Journal of Economic Theory 20, 381–408.
Harrison, J.M. & Pliska, S.R. (1981). Martingales and stochastic integrals in the theory of continuous trading, Stochastic Processes and Their Applications 11, 215–260.
Hull, J.C. (2008). Options, Futures, and Other Derivatives, 7th Edition, Prentice Hall.
Shreve, S.E. (2004). Stochastic Calculus for Finance. I: The Binomial Asset Pricing Model, Springer Finance, Springer-Verlag, New York.

Related Articles

Black, Fischer; Equivalent Martingale Measures; Fundamental Theorem of Asset Pricing; Free Lunch; Good-deal Bounds; Merton, Robert C.; Ross, Stephen; Risk-neutral Pricing.

CONSTANTINOS KARDARAS

Fundamental Theorem of Asset Pricing

Consider a financial market modeled by a price process $S$ on an underlying probability space $(\Omega, \mathcal{F}, \mathbb{P})$. The fundamental theorem of asset pricing, which is one of the pillars supporting the modern theory of Mathematical Finance, states that the following two statements are essentially equivalent:

1. $S$ does not allow for arbitrage (NA).
2. There exists a probability measure $Q$ on the underlying probability space $(\Omega, \mathcal{F}, \mathbb{P})$, which is equivalent to $\mathbb{P}$ and under which the process $S$ is a martingale.

We have formulated this theorem in vague terms, which will be made precise in the sequel: we formulate versions of this theorem that use precise definitions and avoid the use of the word essentially.

The story of this theorem started, like most of modern Mathematical Finance, with the work of Black (see Black, Fischer), Scholes [3], and Merton (see Merton, Robert C.) [25]. These authors consider a model $S = (S_t)_{0 \le t \le T}$ of geometric Brownian motion proposed by Samuelson (see Samuelson, Paul A.) [30], which is widely known today as the Black–Scholes model. Presumably every reader of this article is familiar with the well-known technique to price options in this framework (see Risk-neutral Pricing): one changes the underlying measure $\mathbb{P}$ to an equivalent measure $Q$ under which the discounted stock price process is a martingale. Subsequently, one prices options (and other derivatives) by simply taking expectations with respect to this "risk neutral" or "martingale" measure $Q$.

In fact, this technique was not the novel feature of [3, 25]. It was used by actuaries for some centuries and it was also used by Bachelier [2] in 1900, who considered Brownian motion (which, of course, is a martingale) as a model $S = (S_t)_{0 \le t \le T}$ of a stock price process. In fact, the prices obtained by Bachelier (see Bachelier, Louis (1870–1946)) by this method were, at least for the empirical data considered by Bachelier himself, very close to those derived from the celebrated Black–Merton–Scholes formula ([34]).

The decisive novel feature of the Black–Merton–Scholes approach was the argument that links this pricing technique with the notion of arbitrage: the payoff function of an option can be precisely replicated by trading dynamically in the underlying stock. This idea, which is credited in footnote 3 of [3] to Merton, opened a completely new perspective on how to deal with options, as it linked the pricing issue with the idea of hedging, that is, dynamically trading in the underlying asset.

The technique of replicating an option is completely absent in Bachelier's early work; apparently, the idea of "spanning" a market by forming linear combinations of primitive assets first appears in the Economics literature in the classic paper by Arrow (see Arrow, Kenneth) [1]. The mathematically delightful situation, that the market is complete in the sense that all derivatives can be replicated, occurs in the Black–Scholes model as well as in Bachelier's original model of Brownian motion (see Second Fundamental Theorem of Asset Pricing). Another example of a model in continuous time sharing this property is the compensated Poisson process, as observed by Cox and Ross (see Ross, Stephen) [4]. Roughly speaking, these are the only models in continuous time sharing this seductively beautiful "martingale representation property" (see [16, 39] for a precise statement on the uniqueness of these families of models).

Appealing as it might be, the consideration of "complete markets" as above is somewhat dangerous from an economic point of view: the precise replicability of options, which is a sound mathematical theorem in the framework of the above models, may lead to the illusion that this is also true in economic reality. However, these models are far from matching reality in a one-to-one manner. Rather, they only highlight important aspects of reality and therefore should not be considered as ubiquitously appropriate.

For many purposes, it is of crucial importance to put oneself into a more general modeling framework.

When the merits as well as the limitations of the Black–Merton–Scholes approach unfolded in the late 1970s, the investigations into the fundamental theorem of asset pricing started. As Harrison and Pliska formulate it in their classic paper [15]: "it was a desire to better understand their formula which originally motivated our study, . . . ".

The challenge was to obtain a deeper insight into the relation of the following two aspects: on

one hand, the methodology of pricing by taking expectations with respect to a properly chosen "risk neutral" or "martingale" measure $Q$; on the other hand, the methodology of pricing by "no arbitrage" considerations. Why, after all, do these two seemingly unrelated approaches yield identical results in the Black–Merton–Scholes approach? Maybe even more importantly: how far can this phenomenon be extended to more involved models?

To the best of the author's knowledge, the first person to take up these questions in a systematic way was Ross (see Ross, Stephen) [29]; see also [4, 27, 28]. He chose the following setting to formalize the situation: fix a topological, ordered vector space $(X, \tau)$, modeling the possible cash flows (e.g., the payoff function of an option) at a fixed time horizon $T$. A good choice is, for example, $X = L^p(\Omega, \mathcal{F}, \mathbb{P})$, where $1 \le p \le \infty$ and $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{0 \le t \le T}, \mathbb{P})$ is the underlying filtered probability space. The set of marketed assets $M$ is a subspace of $X$.

In the context of a stock price process $S = (S_t)_{0 \le t \le T}$ as above, one might think of $M$ as all the outcomes of an initial investment $x \in \mathbb{R}$ plus the result of subsequent trading according to a predictable trading strategy $H = (H_t)_{0 \le t \le T}$. This yields (in discounted terms) an element

$$m = x + \int_0^T H_t \, dS_t \qquad (1)$$

in the set $M$ of marketed claims. It is natural to price the above claim $m$ by setting $\pi(m) = x$, as this is the net investment necessary to finance the above claim $m$.

For notational convenience, we shall assume in the sequel that $S$ is a one-dimensional process. It is straightforward to generalize to the case of $d$ risky assets by assuming that $S$ is $\mathbb{R}^d$-valued and replacing the above integral by

$$m = x + \int_0^T \sum_{i=1}^d H_t^i \, dS_t^i \qquad (2)$$

Some words of warning about the stochastic integral (1) seem necessary. The precise admissibility conditions, which should be imposed on the stochastic integral (1), in order to make sense both mathematically as well as economically, are a subtle issue. Much of the early literature on the fundamental theorem of asset pricing struggled exactly with this question. An excellent reference is [14]. Ross [29] circumvented this problem by deliberately leaving this issue aside and simply starting with the modeling assumption that the subset $M \subseteq X$ as well as a pricing operator $\pi : M \to \mathbb{R}$ are given.

Let us now formalize the notion of arbitrage. In the above setting, we say that the no arbitrage assumption is satisfied if, for $m \in M$, satisfying $m \ge 0$, $\mathbb{P}$-a.s. and $\mathbb{P}[m > 0] > 0$, we have $\pi(m) > 0$. In prose, this means that it is not possible to find a claim $m \in M$, which bears no risk (as $m \ge 0$, $\mathbb{P}$-a.s.), yields some gain with strictly positive probability (as $\mathbb{P}[m > 0] > 0$), and such that its price $\pi(m)$ is less than or equal to zero.

The question that now arises is whether it is possible to extend $\pi : M \to \mathbb{R}$ to a nonnegative, continuous linear functional $\pi^* : X \to \mathbb{R}$.

What does this have to do with the issue of martingale measures? This theme was developed in detail by Harrison and Kreps [14]. Suppose that $X = L^p(\Omega, \mathcal{F}, \mathbb{P})$ for some $1 \le p < \infty$, that the price process $S = (S_t)_{0 \le t \le T}$ satisfies $S_t \in X$, for each $0 \le t \le T$, and that $M$ contains (at least) the "simple integrals" on the process $S = (S_t)_{0 \le t \le T}$ of the form

$$m = x + \sum_{i=1}^n H_i \, (S_{t_i} - S_{t_{i-1}}) \qquad (3)$$

Here $x \in \mathbb{R}$, $0 = t_0 < t_1 < \ldots < t_n = T$ and $(H_i)_{i=1}^n$ is a (say) bounded process which is predictable, that is, $H_i$ is $\mathcal{F}_{t_{i-1}}$-measurable. The sums in equation (3) are the Riemann sums corresponding to the stochastic integrals (1). The Riemann sums (3) have a clear-cut economic interpretation [14]. In equation (3) we do not have to bother about subtle convergence issues as only finite sums are involved in the definition. It is therefore a traditional (minimal) requirement that the Riemann sums of the form (3) are in the space $M$ of marketed claims; naturally, the price of a claim $m$ of the form (3) should be defined as $\pi(m) = x$.

Now suppose that the functional $\pi$, which is defined for the claims of the form (3), can be extended to a continuous, nonnegative functional $\pi^*$ defined on $X = L^p(\Omega, \mathcal{F}, \mathbb{P})$. If such an extension $\pi^*$ exists, it is induced by some function $g \in L^q(\Omega, \mathcal{F}, \mathbb{P})$, where $\frac{1}{p} + \frac{1}{q} = 1$. The nonnegativity of $\pi^*$ is tantamount to $g \ge 0$, $\mathbb{P}$-a.s., and the fact that $\pi^*(\mathbf{1}) = 1$ shows that $g$ is the density of a probability measure $Q$ with Radon–Nikodym derivative $\frac{dQ}{d\mathbb{P}} = g$.
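The role of the density $g$ can be made concrete in the simplest finite setting. The following is a minimal sketch under stated assumptions: a hypothetical one-period market with two states, reference probabilities 1/2 each, and discounted prices 100 today and 120 or 90 tomorrow. None of these numbers or names come from the article. The pricing functional is $\pi^*(f) = \mathbb{E}_{\mathbb{P}}[g f]$, and marketed claims are the one-step case of the simple integrals (3).

```python
from fractions import Fraction as F

# Hypothetical two-state, one-period market (discounted prices).
P = {"up": F(1, 2), "down": F(1, 2)}        # reference measure P
S0 = F(100)                                 # price at time 0
S1 = {"up": F(120), "down": F(90)}          # price at time 1


def martingale_density():
    """Return g = dQ/dP, where Q is chosen so that E_Q[S1 - S0] = 0."""
    q_up = (S0 - S1["down"]) / (S1["up"] - S1["down"])
    Q = {"up": q_up, "down": 1 - q_up}
    return {w: Q[w] / P[w] for w in P}


def price(claim, g):
    """Extended pricing functional pi*(f) = E_P[g * f]."""
    return sum(P[w] * g[w] * claim[w] for w in P)


def marketed_claim(x, H):
    """Claim m = x + H * (S1 - S0): initial investment x plus trading gains."""
    return {w: x + H * (S1[w] - S0) for w in P}


if __name__ == "__main__":
    g = martingale_density()
    print("density g:", g)
    print("pi*(1) =", price({"up": F(1), "down": F(1)}, g))
    print("pi*(5 + 3 (S1 - S0)) =", price(marketed_claim(F(5), F(3)), g))
```

In this sketch $g$ is strictly positive, so $Q$ is equivalent to $\mathbb{P}$; the constant claim has price one, and every marketed claim is priced at its initial investment $x$, which is the finite-state counterpart of the extension property described above.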

If we can find such an extension $\pi^*$ of $\pi$, we thus find a probability measure $Q$ on $(\Omega, \mathcal{F}, \mathbb{P})$ for which

$$\pi\Big(\sum_{i=1}^n H_i \, (S_{t_i} - S_{t_{i-1}})\Big) = \mathbb{E}_Q\Big[\sum_{i=1}^n H_i \, (S_{t_i} - S_{t_{i-1}})\Big] \qquad (4)$$

for every bounded predictable process $H = (H_i)_{i=1}^n$ as above. Since the claim on the left-hand side has initial investment $x = 0$, its price is zero, so equation (4) is tantamount to $(S_t)_{0 \le t \le T}$ being a martingale under $Q$ (see [14, Theorem 2] or [11, Lemma 2.2.6]).

To sum up, in the case $1 \le p < \infty$, finding a continuous, nonnegative extension $\pi^* : L^p(\Omega, \mathcal{F}, \mathbb{P}) \to \mathbb{R}$ of $\pi$ amounts to finding a $\mathbb{P}$-absolutely continuous measure $Q$ with $\frac{dQ}{d\mathbb{P}} \in L^q$ and such that $(S_t)_{0 \le t \le T}$ is a martingale under $Q$.

At this stage, it becomes clear that in order to find such an extension $\pi^*$ of $\pi$, the Hahn–Banach theorem should come into play in some form, for example, in one of the versions of the separating hyperplane theorem.

In order to be able to do so, Ross assumes ([29, p. 472]) that ". . .we will endow X with a strong enough topology to insure that the positive orthant $\{x \in X \mid x > 0\}$ is an open set, . . .". In practice, the only infinite-dimensional ordered topological vector space $X$, such that the positive orthant has nonempty interior, is $X = L^\infty(\Omega, \mathcal{F}, \mathbb{P})$, endowed with the topology induced by $\|\cdot\|_\infty$.

Hence the two important cases, applying to Ross' hypothesis, are when either the probability space $\Omega$ is finite, so that $X = L^p(\Omega, \mathcal{F}, \mathbb{P})$ simply is finite dimensional and its topology does not depend on $1 \le p \le \infty$, or if $(\Omega, \mathcal{F}, \mathbb{P})$ is infinite and $X = L^\infty(\Omega, \mathcal{F}, \mathbb{P})$ equipped with the norm $\|\cdot\|_\infty$.

After these preparations we can identify the two convex sets to be separated: let $A = \{m \in M : \pi(m) \le 0\}$ and $B$ be the interior of the positive cone of $X$. Now make the easy, but crucial, observation: these sets are disjoint if and only if the no-arbitrage condition is satisfied. As one always can separate an open convex set from a disjoint convex set, we find a functional $\tilde{\pi}$, which is strictly positive on $B$, while $\tilde{\pi}$ takes nonpositive values on $A$. By normalizing $\tilde{\pi}$, that is, letting $\pi^* = \tilde{\pi}(\mathbf{1})^{-1} \tilde{\pi}$, we have thus found the desired extension.

In summary, the first precise version of the fundamental theorem of asset pricing is established in [29], the proof relying on the Hahn–Banach theorem. There are, however, serious limitations: in the case of infinite $(\Omega, \mathcal{F}, \mathbb{P})$, the present result only applies to $L^\infty(\Omega, \mathcal{F}, \mathbb{P})$ endowed with the norm topology. In this case, the continuous linear functional $\pi^*$ only is in $L^\infty(\Omega, \mathcal{F}, \mathbb{P})^*$ and not necessarily in $L^1(\Omega, \mathcal{F}, \mathbb{P})$; in other words, we cannot be sure that $\pi^*$ is induced by a probability measure $Q$, as it may happen that $\pi^* \in L^\infty(\Omega, \mathcal{F}, \mathbb{P})^*$ also has a singular part.

Another drawback, which already appears in the case of finite $\Omega$ (in which case $\pi^*$ certainly is induced by some $Q$ with $\frac{dQ}{d\mathbb{P}} = g \in L^1(\Omega, \mathcal{F}, \mathbb{P})$), is the following: we cannot be sure that the function $g$ is strictly positive $\mathbb{P}$-a.s. or, in other words, that $Q$ is equivalent to $\mathbb{P}$.

After this early work by Ross, a major advance in the theory was achieved between 1979 and 1981 by three seminal papers [14, 15, 24] by Harrison, Kreps, and Pliska. In particular, [14] is a landmark in the field. It uses a similar setting as [29], namely, an ordered topological vector space $(X, \tau)$ and a linear functional $\pi : M \to \mathbb{R}$, where $M$ is a linear subspace of $X$. Again the question is whether there exists an extension of $\pi$ to a linear, continuous, strictly positive $\pi^* : X \to \mathbb{R}$. This question is related in [14] to the issue of whether $(M, \pi)$ is viable as a model of economic equilibrium. Under proper assumptions on the convexity and continuity of the preferences of agents, this is shown to be equivalent to the extension discussed above.

The paper [14] also analyzes the case when $\Omega$ is finite. Of course, only processes $S = (S_t)_{t=0}^T$ indexed by finite, discrete time $\{0, 1, \ldots, T\}$ make sense in this case. For this easier setting, the following precise theorem was stated and proved in the subsequent paper [15] by Harrison and Pliska:

Theorem 1 ([15, Theorem 2.7]): suppose the stochastic process $S = (S_t)_{t=0}^T$ is based on a finite, filtered, probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t=0}^T, \mathbb{P})$. The market model contains no arbitrage possibilities if and only if there is an equivalent martingale measure for $S$.

The proof again relies on a (finite-dimensional version) of the Hahn–Banach theorem plus an extra argument making sure to find a measure $Q$, which is equivalent to $\mathbb{P}$. Harrison and Pliska thus have achieved a precise version of the above meta-theorem in terms of equivalent martingale measures, which does not use the word "essentially". Actually, the theme of the Harrison–Pliska theorem goes back

much further, to the work of Shimony [35] and Kemeny [22] on symbolic logic in the tradition of Carnap, de Finetti, and Ramsey. These authors showed that, in a setting with only finitely many states of the world, a family of possible bets does not allow (by taking linear combinations) for making a riskless profit (i.e., one certainly does not lose but wins with strictly positive probability), if and only if there is a probability measure $Q$ on these finitely many states, which prices the possible bets by taking conditional $Q$-expectations.

The restriction to finite $\Omega$ is very severe in applications: the flavor of the theory, building on Black–Scholes–Merton, is precisely the concept of continuous time. Of course, this involves infinite probability spaces $(\Omega, \mathcal{F}, \mathbb{P})$.

Many interesting questions were formulated in the papers [14, 15], hinting at the difficulties of proving a version of the fundamental theorem of asset pricing beyond the setting of finite probability spaces.

A major breakthrough in this direction was achieved by Kreps [24]: as above, let $M \subseteq X$ and a linear functional $\pi : M \to \mathbb{R}$ be given. The typical choice for $X$ will now be $X = L^p(\Omega, \mathcal{F}, \mathbb{P})$, for $1 \le p \le \infty$, equipped with the topology $\tau$ of convergence in norm, or, if $X = L^\infty(\Omega, \mathcal{F}, \mathbb{P})$, equipped with the Mackey topology $\tau$ induced by $L^1(\Omega, \mathcal{F}, \mathbb{P})$. This setting will make sure that a continuous linear functional on $(X, \tau)$ will be induced by a measure $Q$, which is absolutely continuous with respect to $\mathbb{P}$.

The no-arbitrage assumption means that $M_0 := \{m \in M : \pi(m) = 0\}$ intersects the positive orthant $X_+$ of $X$ only in $\{0\}$. In order to obtain an extension of $\pi$ to a continuous, linear functional $\pi^* : X \to \mathbb{R}$ we have to find an element in $(X, \tau)^*$, which separates the convex set $M_0$ from the disjoint convex set $X_+ \setminus \{0\}$, that is, the positive orthant of $X$ with 0 deleted.

Easy examples show that, in general, this is not possible. In fact, this is not much of a surprise (if $X$ is infinite-dimensional) as we know that some topological condition is needed for the Hahn–Banach theorem to work.

It is always possible to separate a closed convex set from a disjoint compact convex set by a continuous linear functional. In fact, one may even get strict separation in this case. It is this version of the Hahn–Banach theorem that Kreps eventually applies. But how? After all, neither $M_0$ nor $X_+ \setminus \{0\}$ are closed in $(X, \tau)$, let alone compact.

Here is the ingenious construction of Kreps: define

\[ A = \overline{M_0 - X_+} \qquad (5) \]

where the bar denotes the closure with respect to the topology $\tau$. We shall require that $A$ still satisfies

\[ A \cap X_+ = \{0\} \qquad (6) \]

This property is baptized as “no free lunch” by Kreps:

Definition 1 [24]: The financial market defined by $(X, \tau)$, $M$, and $\pi$ admits a free lunch if there are nets $(m_\alpha)_{\alpha \in I} \in M_0$ and $(h_\alpha)_{\alpha \in I} \in X_+$ such that

\[ \lim_{\alpha \in I} (m_\alpha - h_\alpha) = x \qquad (7) \]

for some $x \in X_+ \setminus \{0\}$.

It is easy to verify that the negation of the above definition is tantamount to the validity of equation (6).

The economic interpretation of the “no free lunch” condition is a sharpening of the “no-arbitrage condition”. If the latter is violated, we can simply find an element $x \in X_+ \setminus \{0\}$, which also lies in $M_0$. If the former fails, we cannot quite guarantee this, but we can find $x \in X_+ \setminus \{0\}$, which can be approximated in the $\tau$-topology by elements of the form $m_\alpha - h_\alpha$. The passage from $m_\alpha$ to $m_\alpha - h_\alpha$ means that agents are allowed to “throw away money”, that is, to abandon a positive element $h_\alpha \in X_+$. This combination of the “free disposal” assumption with the possibility of passing to limits is crucial in Kreps’ approach (5) as well as in most of the subsequent literature. It was shown in ([Ex. 3.3] [32]; [33]) that the (seemingly ridiculous) “free disposal” assumption cannot be dropped.

Definition (5) is tailor-made for the application of Hahn–Banach. If the no free lunch condition (6) is satisfied, we may, for any $h \in X_+$, separate the $\tau$-closed, convex set $A$ from the one-point set $\{h\}$ by an element $\pi_h \in (X, \tau)^*$. As $0 \in A$, we may assume that $\pi_h|_A \le 0$ while $\pi_h(h) > 0$. We thus have obtained a nonnegative (as $-X_+ \subseteq A$), continuous linear functional $\pi_h$, which is strictly positive on a given $h \in X_+$. Supposing that $X_+$ is $\tau$-separable (which is the case in the above setting of $L^p$-spaces if $(\Omega, \mathcal{F}, \mathbb{P})$ is countably generated), fix a dense sequence $(h_n)_{n=1}^\infty$ and find strictly positive scalars $\mu_n > 0$ such that $\pi^* = \sum_{n=1}^\infty \mu_n \pi_{h_n}$ converges to a
probability measure in $(X, \tau)^* = L^q(\Omega, \mathcal{F}, \mathbb{P})$, where $\frac{1}{p} + \frac{1}{q} = 1$. This yields the desired extension $\pi^*$ of $\pi$ which is strictly positive on $X_+ \setminus \{0\}$.

We still have to specify the choice of $(M_0, \pi)$. The most basic choice is to take for given $S = (S_t)_{0 \le t \le T}$ the space generated by the “simple integrands” (3) as proposed in [14]. We thus may deduce from Kreps’ arguments in [24] the following version of the fundamental theorem of asset pricing.

Theorem 2 Let $(\Omega, \mathcal{F}, \mathbb{P})$ be countably generated and $X = L^p(\Omega, \mathcal{F}, \mathbb{P})$ endowed with the norm topology $\tau$, if $1 \le p < \infty$, or the Mackey topology induced by $L^1(\Omega, \mathcal{F}, \mathbb{P})$, if $p = \infty$.
Let $S = (S_t)_{0 \le t \le T}$ be a stochastic process taking values in $X$. Define $M_0 \subseteq X$ to consist of the simple stochastic integrals $\sum_{i=1}^n H_i (S_{t_i} - S_{t_{i-1}})$ as in equation (3).
Then the “no free lunch” condition (6) is satisfied if and only if there is a probability measure $Q$ with $\frac{dQ}{d\mathbb{P}} \in L^q(\Omega, \mathcal{F}, \mathbb{P})$, where $\frac{1}{p} + \frac{1}{q} = 1$, such that $(S_t)_{0 \le t \le T}$ is a $Q$-martingale.

This remarkable theorem of Kreps sets new standards. For the first time, we have a mathematically precise statement of our meta-theorem applying to a general class of models in continuous time. There are still some limitations, however.

When applying the theorem to the case $1 \le p < \infty$, we find the requirement $\frac{dQ}{d\mathbb{P}} \in L^q(\Omega, \mathcal{F}, \mathbb{P})$ for some $q > 1$, which is not very pleasant. After all, we want to know what exactly corresponds (in terms of some no-arbitrage condition) to the existence of an equivalent martingale measure $Q$. The $q$-moment condition is unnatural in most applications. In particular, it is not invariant under equivalent changes of measure, which are carried out often in the applications.

The most interesting case of the above theorem is $p = \infty$. However, in this case, the requirement $S_t \in X = L^\infty(\Omega, \mathcal{F}, \mathbb{P})$ is unduly strong for most applications. In addition, for $p = \infty$, we run into the subtleties of the Mackey topology $\tau$ (or the weak-star topology, which does not make much of a difference) on $L^\infty(\Omega, \mathcal{F}, \mathbb{P})$. We shall discuss this issue below.

The “heroic period” of the development of the fundamental theorem of asset pricing marked by Ross [29], Harrison–Kreps [14], Harrison–Pliska [15], and Kreps [24], put the issue on safe mathematical grounds and brought some spectacular results. However, it still left many questions open; quite a number of them were explicitly stated as open problems in these papers.

Subsequently a rather extensive literature developed, answering these problems and opening new perspectives. We cannot give a full account of all this literature and refer, for example, to the monograph [11] for more extensive information. We can give an outline.

As regards the situation for $1 \le p \le \infty$ in Kreps’ theorem, this issue was further developed by Duffie and Huang [12] and, in particular, by Stricker [36]. This author related the no free lunch condition of Kreps to a theorem by Yan [37] obtained in the context of the Bichteler–Dellacherie theorem on the characterization of semimartingales. Using Yan’s theorem, Stricker gave a different proof of Kreps’ theorem, which does not need the assumption that $(\Omega, \mathcal{F}, \mathbb{P})$ is countably generated.

A beautiful extension of the Harrison–Pliska theorem was obtained in 1990 by Dalang, Morton, and Willinger [5]. They showed that, for an $\mathbb{R}^d$-valued process $(S_t)_{t=0}^T$ in finite discrete time, the no-arbitrage condition is indeed equivalent to the existence of an equivalent martingale measure. The proof is surprisingly tricky, at least for the case $d \ge 2$. It is based on the measurable selection theorem (the suggestion to use this theorem is credited to Delbaen). Different proofs of the Dalang–Morton–Willinger theorem have been given in [17, 20, 21, 26, 31].

An important question left unanswered by Kreps was whether one can, in general, replace the use of nets $(m_\alpha - h_\alpha)_{\alpha \in I}$, indexed by $\alpha$ ranging in a general ordered set $I$, simply by sequences $(m_n - h_n)_{n=1}^\infty$. In the context of continuous processes, $S = (S_t)_{0 \le t \le T}$, a positive answer was given by Delbaen in [6], if one is willing to make the harmless modification to replace the deterministic times $0 = t_0 \le t_1 \le \dots \le t_n = T$ in equation (3) by stopping times $0 = \tau_0 \le \tau_1 \le \dots \le \tau_n = T$. A second case, where the answer to this question is positive, are processes $S = (S_t)_{t=0}^\infty$ in infinite, discrete time as shown in [32].

The Banach–Steinhaus theorem implies that, for a sequence $(m_n - h_n)_{n=1}^\infty$ converging in $L^\infty(\Omega, \mathcal{F}, \mathbb{P})$ with respect to the weak-star (or Mackey) topology, the norms $(\|m_n - h_n\|_\infty)_{n=1}^\infty$ remain bounded (“uniform boundedness principle”). Therefore, it follows that in the above two cases of continuous processes $S = (S_t)_{0 \le t \le T}$ or processes $(S_t)_{t=0}^\infty$ in infinite, discrete time, the “no free lunch” condition of Kreps can be equivalently replaced by the “no free lunch
with bounded risk” condition introduced in [32]: in equation (7) above, we additionally impose that $(\|m_\alpha - h_\alpha\|_\infty)_{\alpha \in I}$ remains bounded. In this case, we have that there is a constant $M > 0$ such that $m_\alpha \ge -M$, $\mathbb{P}$-a.s. for each $\alpha \in I$, which explains the wording “bounded risk”.

However, in the context of general semimartingale models $S = (S_t)_{0 \le t \le T}$, a counter-example was given by Delbaen and the author in ([Ex. 7.8] [7]) showing that the “no free lunch with bounded risk” condition does not imply the existence of an equivalent martingale measure. Hence, in a general setting and by only using simple integrals, there is no possibility of getting any more precise information on the free lunch condition than the one provided by Kreps’ theorem.

At this stage it became clear that, in order to obtain sharper results, one has to go beyond the framework of simple integrals (3) and rather use general stochastic integrals (1). After all, the simple integrals are only a technical gimmick, analogous to step functions in measure theory. In virtually all the applications, for example, the replication strategy of an option in the Black–Scholes model, one uses general integrals of the form (1).

General integrands pose a number of questions to be settled. First of all, the integral (1) has to be mathematically well defined. The theory of stochastic calculus starting with K. Itô, and developed in particular by the Strasbourg school of probability around Meyer, provides very precise information on this issue: there is a good integration theory for a given stochastic process $S = (S_t)_{0 \le t \le T}$ if and only if $S$ is a semimartingale (theorem of Bichteler–Dellacherie).

Hence, mathematical arguments lead to the model assumption that $S$ has to be a semimartingale. However, what about an economic justification of this assumption? Fortunately, the economic reasoning hints in the same direction. It was shown by Delbaen and the author that, for a locally bounded stochastic process $S = (S_t)_{0 \le t \le T}$, a very weak form of Kreps’ “no free lunch” condition involving simple integrands (3), implies already that $S$ is a semimartingale (see [Theorem 7.2] [7], for a precise statement).

Hence, it is natural to assume that the model $S = (S_t)_{0 \le t \le T}$ of stock prices is a semimartingale so that the stochastic integral (1) makes sense mathematically, for all $S$-integrable, predictable processes $H = (H_t)_{0 \le t \le T}$. As pointed out, [14, 15] impose, in addition, an admissibility condition to rule out doubling strategies and similar schemes.

Definition 2 ([Def. 2.7] [7]): An $S$-integrable predictable process $H = (H_t)_{0 \le t \le T}$ is called admissible if there is a constant $M > 0$ such that

\[ \int_0^t H_u \, dS_u \ge -M, \quad \text{a.s., for } 0 \le t \le T \qquad (8) \]

The economic interpretation is that the economic agent, trading according to the strategy, has to respect a finite credit line $M$.

Let us now sketch the approach of [7]. Define

\[ K = \left\{ \int_0^T H_t \, dS_t : H \text{ admissible} \right\} \qquad (9) \]

which is a set of (equivalence classes of) random variables. Note that by equation (8) the elements $f \in K$ are uniformly bounded from below, that is, $f \ge -M$ for some $M \ge 0$. On the other hand, there is no reason why the positive part $f_+$ should obey any boundedness or integrability assumption.

As a next step, we “allow agents to throw away money” similarly as in Kreps’ work [24]. Define

\[ C = \{ g \in L^\infty(\Omega, \mathcal{F}, \mathbb{P}) : g \le f \text{ for some } f \in K \} = \left[ K - L^0_+(\Omega, \mathcal{F}, \mathbb{P}) \right] \cap L^\infty(\Omega, \mathcal{F}, \mathbb{P}) \qquad (10) \]

where $L^0_+(\Omega, \mathcal{F}, \mathbb{P})$ denotes the set of nonnegative measurable functions.

By construction, $C$ consists of bounded random variables, so that we can use the functional analytic duality theory between $L^\infty$ and $L^1$. The difference between the subsequent definition and Kreps’ approach is that it pertains to the norm topology $\|\cdot\|_\infty$ rather than to the Mackey topology on $L^\infty(\Omega, \mathcal{F}, \mathbb{P})$.

Definition 3 ([2.8] [11]): A locally bounded semimartingale $S = (S_t)_{0 \le t \le T}$ satisfies the no free lunch with vanishing risk condition if

\[ \overline{C} \cap L^\infty_+(\Omega, \mathcal{F}, \mathbb{P}) = \{0\} \qquad (11) \]

where $\overline{C}$ denotes the $\|\cdot\|_\infty$-closure of $C$.

Here is the translation of equation (11) into prose: the process $S$ fails the above condition if there is a function $g \in L^\infty_+(\Omega, \mathcal{F}, \mathbb{P})$ with $\mathbb{P}[g > 0] > 0$ and a sequence $(f_n)_{n=1}^\infty$ of the form

\[ f_n = \int_0^T H^n_t \, dS_t \qquad (12) \]
where $H^n$ are admissible integrands, such that

\[ f_n \ge g - \frac{1}{n} \quad \text{a.s.} \qquad (13) \]

Hence the condition of no free lunch with vanishing risk is intermediate between the (stronger) no free lunch condition of Kreps and the (weaker) no-arbitrage condition. A violation of the latter would require that there is a nonnegative function $g$ with $\mathbb{P}[g > 0] > 0$, which is of the form

\[ g = \int_0^T H_t \, dS_t \qquad (14) \]

for an admissible integrand $H$. Condition (13) does not quite guarantee this, but something (at least from an economic point of view) very close: we can uniformly approximate from below such a $g$ by the outcomes $f_n$ of admissible trading strategies.

The main result of Delbaen and the author [7] reads as follows.

Theorem 3 ([Corr. 1.2] [7]): Let $S = (S_t)_{0 \le t \le T}$ be a locally bounded real-valued semimartingale.
There is a probability measure $Q$ on $(\Omega, \mathcal{F})$, which is equivalent to $\mathbb{P}$ and under which $S$ is a local martingale if and only if $S$ satisfies the condition of no free lunch with vanishing risk.

This is a mathematically precise theorem, which, in my opinion, is quite close to the vague “meta-theorem” at the beginning of this article. The difference to the intuitive “no arbitrage” idea is that the agent has to be willing to sacrifice (at most) the quantity $\frac{1}{n}$ in equation (13), where we may interpret $\frac{1}{n}$ as, say, 1 cent.

The proof of the above theorem is rather long and technical and a more detailed discussion goes beyond the scope of this article. To the best of the author’s knowledge, no essential simplification of this proof has been achieved so far ([19]).

Mathematically speaking, the statement of the theorem looks very suspicious at first glance: after all, the no free lunch with vanishing risk condition pertains to the norm topology of $L^\infty(\Omega, \mathcal{F}, \mathbb{P})$. Hence it seems that, when applying the Hahn–Banach theorem, one can only obtain a linear functional in $L^\infty(\Omega, \mathcal{F}, \mathbb{P})^*$, which is not necessarily of the form $\frac{dQ}{d\mathbb{P}} \in L^1(\Omega, \mathcal{F}, \mathbb{P})$, as we have seen in Ross’ work [29].

The reason why the above theorem, nevertheless, is true is a little miracle: it turns out ([Th. 4.2] [7]) that, under the assumption of no free lunch with vanishing risk, the set $C$ defined in equation (10) is automatically weak-star closed in $L^\infty(\Omega, \mathcal{F}, \mathbb{P})$. This pleasant fact is not only a crucial step in the proof of the above theorem; maybe even more importantly, it also found other applications. For example, to find general existence results in the theory of utility optimization (see Expected Utility Maximization: Duality Methods) it is of crucial importance to have a closedness property of the set over which one optimizes: for these applications, the above result is very useful [23].

Without going into the details of the proof, the importance of certain elements in the set $K$ is pointed out. The admissibility rules out the use of doubling strategies. The opposite of such a strategy can be called a suicide strategy. It is the mathematical equivalent of making a bet at the roulette, leaving it as well as all gains on the table as long as one keeps winning, and waiting until one loses for the first time. Such strategies, although admissible, do not reflect economic efficiency. More precisely, we define the following.

Definition 4 An admissible outcome $\int_0^T H_t \, dS_t$ is called maximal if there is no other admissible strategy $H'$ such that $\int_0^T H'_t \, dS_t \ge \int_0^T H_t \, dS_t$ with $\mathbb{P}\left[ \int_0^T H'_t \, dS_t > \int_0^T H_t \, dS_t \right] > 0$.

In the proof of Theorem 3, these elements play a crucial role and the heart of the proof consists in showing that every element in $K$ is dominated by a maximal element. However, besides their mathematical relevance, they also have a clear economic interpretation. There is no use in implementing a strategy that is not maximal as one can do better. Nonmaximal elements can also be seen as bubbles [18].

In Theorem 3, we only assert that $S$ is a local martingale under $Q$. In fact, this technical concept cannot be avoided in this setting. Indeed, fix an $S$-integrable, predictable, admissible process $H = (H_t)_{0 \le t \le T}$ as well as a bounded, predictable, strictly positive process $(k_t)_{0 \le t \le T}$. The subsequent identity holds true trivially.

\[ \int_0^t H_u \, dS_u = \int_0^t \frac{H_u}{k_u} \, d\tilde{S}_u, \quad 0 \le t \le T \qquad (15) \]
where

\[ \tilde{S}_u = \int_0^u k_v \, dS_v, \quad 0 \le u \le T \qquad (16) \]

The message of equations (15) and (16) is that the classes of processes obtained by taking admissible stochastic integrals on $S$ or $\tilde{S}$ simply coincide. An easy interpretation of this rather trivial fact is that the possible investment opportunities do not depend on whether stock prices are denoted in euros or in cents (this corresponds to taking $k_t \equiv 100$ above). However, it may very well happen that $\tilde{S}$ is a martingale while $S$ only is a local martingale. In fact, the concept of local martingales may even be characterized in these terms ([Proposition 2.5] [10]): a semimartingale $S$ is a local martingale if and only if there is a strictly positive, decreasing, predictable process $k$ such that $\tilde{S}$ defined in equation (16) is a martingale.

Again we want to emphasize the role of the maximal elements. It turns out ([8, 11]) that $\int_0^T H_t \, dS_t$ is maximal if and only if there is an equivalent local martingale measure $Q$ such that the process $\int_0^t H_u \, dS_u$ is a martingale and not just a local martingale under $Q$. One can show ([9, 11]) that for a given sequence of maximal elements $\int_0^T H^n_t \, dS_t$, one can find one and the same equivalent local martingale measure $Q$ such that all the processes $\int_0^t H^n_u \, dS_u$ are $Q$-martingales. Another useful and related characterization ([8, 11]) is that if a process $V_t = x + \int_0^t H_u \, dS_u$ defines a maximal element $\int_0^T H_u \, dS_u$ and remains strictly positive, the whole financial market can be rewritten in terms of $V$ as a new numéraire without losing the no-arbitrage properties. The change of numéraire and the use of the maximal elements allow one to introduce a numéraire invariant concept of admissibility, see [9] for details. An important result of [9] is that the sum of maximal elements is again a maximal element.

Theorem 3 above still contains one severe limitation of generality, namely, the local boundedness assumption on $S$. As long as we only deal with continuous processes $S$, this requirement is, of course, satisfied. However, if one also considers processes with jumps, in most applications it is natural to drop the local boundedness assumption.

The case of general semimartingales $S$ (without any boundedness assumption) was analyzed in [10]. Things become a little trickier as the concept of local martingales has to be weakened even further: we refer to Equivalent Martingale Measures for a discussion of the concept of sigma-martingales. This concept allows one to formulate a result pertaining to a perfectly general setting.

Theorem 4 ([Corr. 1.2] [7]): Let $S = (S_t)_{0 \le t \le T}$ be an $\mathbb{R}^d$-valued semimartingale.
There is a probability measure $Q$ on $(\Omega, \mathcal{F})$, which is equivalent to $\mathbb{P}$ and under which $S$ is a sigma-martingale if and only if $S$ satisfies the condition of no free lunch with vanishing risk with respect to admissible strategies.

One may still ask whether it is possible to formulate a version of the fundamental theorem, which does not rely on the concepts of local or sigma-martingales, but rather on “true” martingales.

This was achieved by Yan [38] by applying a clever change of numéraire technique (see Change of Numeraire; see also [Section 5] [13]): let us suppose that $(S_t)_{0 \le t \le T}$ is a positive semimartingale, which is natural if we model, for example, prices of shares (while the previous setting of not necessarily positive price processes also allows for the modeling of forwards, futures etc.).

Let us weaken the admissibility condition (8) above, by calling a predictable, $S$-integrable process allowable if

\[ \int_0^t H_u \, dS_u \ge -M(1 + S_t) \quad \text{a.s., for } 0 \le t \le T \qquad (17) \]

The economic idea underlying this notion is well known and allows for the following interpretation: an agent holding $M$ units of stock and bond may, in addition, trade in $S$ according to the trading strategy $H$ satisfying equation (17); the agent will then remain liquid during $[0, T]$.

By taking $S + 1$ as new numéraire and replacing admissible by allowable trading strategies, Yan obtains the following theorem.

Theorem 5 ([Theorem 3.2] [38]) Suppose that $S$ is a positive semimartingale.
There is a probability measure $Q$ on $(\Omega, \mathcal{F})$, which is equivalent to $\mathbb{P}$ and under which $S$ is a martingale if and only if $S$ satisfies the condition of no free lunch with vanishing risk with respect to allowable trading strategies.
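
To make the finite-state statement of Theorem 1 concrete, the following is a minimal sketch for the simplest possible case: one period, two states of the world, one stock and a riskless rate $r$. The function names `martingale_probability` and `arbitrage_position` and the numerical prices are hypothetical illustrations, not taken from [15]. Under these assumptions the sketch checks that an equivalent martingale measure exists exactly when neither buying nor short-selling one share (financed at the riskless rate) is an arbitrage.

```python
from fractions import Fraction


def martingale_probability(s0, s_up, s_down, r=0):
    """Risk-neutral probability of the up state, or None if none exists.

    Assumed toy model: one period, two states, time-1 prices
    s_down < s_up, time-0 price s0, riskless rate r.
    """
    if not s_down < s_up:
        raise ValueError("expected s_down < s_up")
    # Solve q * s_up + (1 - q) * s_down = (1 + r) * s0 for q.
    q = Fraction((1 + r) * s0 - s_down) / Fraction(s_up - s_down)
    # Q is equivalent to P only if both states keep strictly positive mass.
    return q if 0 < q < 1 else None


def arbitrage_position(s0, s_up, s_down, r=0):
    """Return +1 (buy) or -1 (short) if that position is an arbitrage, else None."""
    forward = (1 + r) * s0  # time-1 cost of financing one share
    for h in (1, -1):
        gains = [h * (s1 - forward) for s1 in (s_up, s_down)]
        # Arbitrage: never a loss, and a strict gain in at least one state.
        if min(gains) >= 0 and max(gains) > 0:
            return h
    return None


if __name__ == "__main__":
    for prices in [(100, 120, 90), (100, 120, 105)]:
        q = martingale_probability(*prices)
        h = arbitrage_position(*prices)
        # In this toy model exactly one of q and h is None.
        print(prices, "q =", q, "arbitrage position =", h)
```

In this two-state model the equivalence is elementary, since both conditions reduce to $s_{\text{down}} < (1 + r) s_0 < s_{\text{up}}$; the content of the theorems above lies in extending this equivalence to many periods, many assets, and infinite probability spaces.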

References

[1] Arrow, K. (1964). The role of securities in the optimal allocation of risk-bearing, Review of Economic Studies 31, 91–96.
[2] Bachelier, L. (1964). Théorie de la Spéculation, Annales Scientifiques de l’École Normale Supérieure 17, 21–86. English translation in: Cootner, P. (ed), The Random Character of Stock Market Prices, MIT Press.
[3] Black, F. & Scholes, M. (1973). The pricing of options and corporate liabilities, Journal of Political Economy 81, 637–659.
[4] Cox, J. & Ross, S. (1976). The valuation of options for alternative stochastic processes, Journal of Financial Economics 3, 145–166.
[5] Dalang, R.C., Morton, A. & Willinger, W. (1990). Equivalent Martingale measures and no-arbitrage in stochastic securities market model, Stochastics and Stochastic Reports 29, 185–201.
[6] Delbaen, F. (1992). Representing martingale measures when asset prices are continuous and bounded, Mathematical Finance 2, 107–130.
[7] Delbaen, F. & Schachermayer, W. (1994). A general version of the fundamental theorem of asset pricing, Mathematische Annalen 300, 463–520.
[8] Delbaen, F. & Schachermayer, W. (1995). The no-arbitrage condition under a change of numéraire, Stochastics and Stochastic Reports 53, 213–226.
[9] Delbaen, F. & Schachermayer, W. (1997). The Banach space of workable contingent claims in arbitrage theory, Annales de l’IHP (B) Probability and Statistics 33, 113–144.
[10] Delbaen, F. & Schachermayer, W. (1998). The fundamental theorem of asset pricing for unbounded stochastic processes, Mathematische Annalen 312, 215–250.
[11] Delbaen, F. & Schachermayer, W. (2006). The Mathematics of Arbitrage, Springer Finance, Springer, p. 371.
[12] Duffie, D. & Huang, C.F. (1986). Multiperiod security markets with differential information; martingales and resolution times, Journal of Mathematical Economics 15, 283–303.
[13] Guasoni, P., Rásonyi, M. & Schachermayer, W. (2009). The fundamental theorem of asset pricing for continuous processes under small transaction costs, Annals of Finance, forthcoming.
[14] Harrison, J.M. & Kreps, D.M. (1979). Martingales and arbitrage in multiperiod securities markets, Journal of Economic Theory 20, 381–408.
[15] Harrison, J.M. & Pliska, S.R. (1981). Martingales and stochastic integrals in the theory of continuous trading, Stochastic Processes and their Applications 11, 215–260.
[16] Harrison, J.M. & Pliska, S.R. (1983). A stochastic calculus model of continuous trading: complete markets, Stochastic Processes and their Applications 11, 313–316.
[17] Jacod, J. & Shiryaev, A.N. (1998). Local martingales and the fundamental asset pricing theorems in the discrete-time case, Finance and Stochastics 2(3), 259–273.
[18] Jarrow, R., Protter, P. & Shimbo, K. (2007). Asset price bubbles in complete markets, in Advances in Mathematical Finance, Appl. Numer. Harmon. Anal., Birkhäuser, Boston, MA, pp. 97–121.
[19] Kabanov, Y.M. (1997). On the FTAP of Kreps-Delbaen-Schachermayer (English), in Statistics and Control of Stochastic Processes, Y.M. Kabanov ed., World Scientific, Singapore, pp. 191–203. The Liptser Festschrift. Papers from the Steklov seminar held in Moscow, Russia, 1995–1996.
[20] Kabanov, Y.M. & Kramkov, D. (1994). No-arbitrage and equivalent martingale measures: an elementary proof of the Harrison–Pliska theorem, Theory of Probability and its Applications 39(3), 523–527.
[21] Kabanov, Y.M. & Stricker, Ch. (2001). A teachers’ note on no-arbitrage criteria, Séminaire de Probabilités XXXV, Springer Lecture Notes in Mathematics 1755, 149–152.
[22] Kemeny, J.G. (1955). Fair bets and inductive probabilities, Journal of Symbolic Logic 20(3), 263–273.
[23] Kramkov, D. & Schachermayer, W. (1999). The asymptotic elasticity of utility functions and optimal investment in incomplete markets, Annals of Applied Probability 9(3), 904–950.
[24] Kreps, D.M. (1981). Arbitrage and equilibrium in economics with infinitely many commodities, Journal of Mathematical Economics 8, 15–35.
[25] Merton, R.C. (1973). The theory of rational option pricing, Bell Journal of Economics and Management Science 4, 141–183.
[26] Rogers, L.C.G. (1994). Equivalent martingale measures and no-arbitrage, Stochastics and Stochastic Reports 51(1–2), 41–49.
[27] Ross, S. (1976). The arbitrage theory of capital asset pricing, Journal of Economic Theory 13, 341–360.
[28] Ross, S. (1977). Return, risk and arbitrage, Risk and Return in Finance 1, 189–218.
[29] Ross, S. (1978). A simple approach to the valuation of risky streams, Journal of Business 51, 453–475.
[30] Samuelson, P.A. (1965). Proof that properly anticipated prices fluctuate randomly, Industrial Management Review 6, 41–50.
[31] Schachermayer, W. (1992). A Hilbert space proof of the fundamental theorem of asset pricing in finite discrete time, Insurance: Mathematics and Economics 11(4), 249–257.
[32] Schachermayer, W. (1994). Martingale Measures for Discrete time Processes with Infinite Horizon, Mathematical Finance 4, 25–56.
[33] Schachermayer, W. (2005). A note on arbitrage and closed convex cones, Mathematical Finance (1), forthcoming.
[34] Schachermayer, W. & Teichmann, J. (2005). How close are the option pricing formulas of Bachelier and Black-Merton-Scholes? Mathematical Finance 18(1), 55–76.
[35] Shimony, A. (1955). Coherence and the axioms of confirmation, The Journal of Symbolic Logic 20, 1–28.
[36] Stricker, Ch. (1990). Arbitrage et Lois de Martingale, Annales de l’Institut Henri Poincaré, Probabilités et Statistiques 26, 451–460.
[37] Yan, J.A. (1980). Caractérisation d’une classe d’ensembles convexes de $L^1$ ou $H^1$, in Séminaire de Probabilités XIV, J. Azema, M. Yor, eds, Springer Lecture Notes in Mathematics 784, Springer, pp. 220–222.
[38] Yan, J.A. (1998). A new look at the fundamental theorem of asset pricing, Journal of Korean Mathematics Society 35, 659–673.
[39] Yor, M. (1978). Sous-espaces denses dans $L^1$ ou $H^1$ et représentation des martingales, in Séminaire de Probabilités XII, Springer Lecture Notes in Mathematics, Springer, Vol. 649, pp. 265–309.

Related Articles

Arbitrage Strategy; Arrow, Kenneth; Change of Numeraire; Equivalent Martingale Measures; Martingales; Martingale Representation Theorem; Risk-neutral Pricing; Stochastic Integrals.

WALTER SCHACHERMAYER
Risk-neutral Pricing

A classical problem arising frequently in business is the valuation of future cash flows that are risky. By the term risky we mean that the payment is not of a deterministic nature; rather there is some uncertainty in the amount of the future cash flows. Of course, in real life, virtually everything happening in the future contains some element of uncertainty.

As an example, let us think of an investment project, say, a company plans to build a new factory. A classical way to proceed is to calculate a net asset value. One tries to estimate the future cash flows generated by the project in the subsequent periods. In the present example, they will initially be negative; this initial investment should be compensated by the positive cash flows in the later periods. Having fixed these estimates of the future cash flows for all periods, one calculates a net asset value by discounting these cash flows to the present date. But, of course, there is uncertainty involved in the estimation of the future cash flows and people doing these calculations are, of course, aware of that. The usual way to compensate for this uncertainty is to apply an interest rate that is higher than the riskless^a rate of return corresponding to the rate of return of government bonds.

The spread between the riskless rate of return and the interest rate used for discounting the future cash flows in the calculation of the net asset value can be quite substantial in order to compensate for the riskiness. Only if the net asset value, obtained by discounting with a rather high rate of return, remains positive, will the management of the company engage in the investment project.

Mathematically speaking, the above procedure may be described as follows: first, one determines the expected values of the future cash flows and, subsequently, one discounts by using an elevated discount factor. However, there is no systematic way of mathematically approaching the question of how the degree of uncertainty in the determination of the expected values can be quantified, and in which way this should be taken into account to determine the spread between the interest rates.

We now turn to a different approach, which interchanges the roles of taking expectations and discounting in taking the riskiness of the cash flows into account. This approach is used in modern mathematical finance, in particular, in the Black–Scholes formula. However, the idea goes back much further and the method was used by actuaries for centuries.

Think of a life insurance contract. To focus on the essential point, we consider the simplest case: a one-year death insurance. If the insured person dies within the subsequent year, the insured sum $S$, say $S$ = ¤1, is paid out at the end of this year; if the insured person survives the year, nothing is paid, and the contract ends at the end of the year.

To calculate the premium^b for this contract, actuaries look up in their mortality tables^c the probability that the insured person dies within one year. The traditional notation for this probability is $q_x$, where $x$ denotes the age of the insured person.

To calculate the premium for such a one-year death insurance contract, with $S$ normalized to $S = 1$, actuaries apply the formula

\[ P = \frac{1}{1+i} \, q_x \qquad (1) \]

The term $q_x$ is just the expected value of the future cash flow and $i$ denotes “the” interest rate: hence the premium $P$ is the discounted expected value of the cash flow at the end of the year.

It is important to note that actuaries use a “conservative” value for the interest rate, for example, $i$ = 3%. In practical terms, this corresponds quite well to the “riskless rate of return”. In any case, it is quite different, in practical as well as in theoretical terms, from the discount factors used to calculate the net asset value of a risky future cash flow according to the method stated above.

But, after all, the premium of our death insurance contract also corresponds to the present value of an uncertain future cash flow! How do actuaries account for the risk involved in this cash flow, if not via an appropriate choice of the interest rate?

The answer is simple when looking at equation (1): apart from the interest rate $i$ the probability $q_x$ of dying within the next year also enters the calculation of $P$. The art of the actuarial profession is to choose the “good” value for $q_x$. Typically, actuaries very well know the actual mortality probabilities in their portfolio of contracts, which often consists of several hundred thousand contracts; in other words, they have a very good understanding of what the “true value” of $q_x$ is. However, they do not apply this “true value” in their premium calculations: in equation (1) they
would apply a value for q_x which is substantially higher than the “true” value of q_x.
Actuaries speak about mortality tables of the first kind and the second kind.

Mortality tables of the second kind reflect the “true probabilities”. They are only used
for the internal analysis of the profitability of the insurance company. On the other
hand, in the daily life of actuaries only the mortality tables of the first kind, which
display the “modified” probabilities q_x, are used. They are used not only for the
calculation of premia but also for all quantities of relevance involved in an insurance
policy, such as surrender values, reserves, and so on. This constitutes a big strength of
the actuarial technique: actuaries are always armed with perfectly coherent logic when
doing all these calculations. This logic is that of a fair game or, mathematically
speaking, of a martingale. Indeed, if the q_x correctly modeled the mortality of the
insured person and if i were the interest rate that the insurance company could precisely
achieve when investing the premia, then the premium calculation (1) would make the
insurance contract a fair game.

It is important to note that this argument pertains only to a kind of virtual world, as
it is precisely the task of actuaries to choose the mortalities q_x in a prudent way such
that they do not coincide with the “true” probabilities. In the case of insurance
contracts where the insurance company has to pay in the case of death, actuaries choose
the probabilities q_x higher than the “true ones”. This happens in the simple example
considered above. On the other hand, if the insurance company has to pay when the insured
person is still alive, for example, in the case of a pension, actuaries use probabilities
q_x which are lower than the “true ones”, in order to be on the safe side.

These actuarial techniques have been elaborated on as this will be helpful to understand
more clearly the essence of the option pricing approach of Black, Scholes, and Merton.
Their well-known model for the risky stock S and the risk-free bond B is

$$dS_t = S_t \mu \, dt + S_t \sigma \, dW_t, \qquad dB_t = B_t r \, dt \tag{2}$$

The task is to value a (European) derivative on the stock S at expiration time T, for
example, C_T = (S_T − K)^+. As explained earlier (see Complete Markets), the solution
proposed by Black, Scholes, and Merton is

$$C_0 = e^{-rT} \, \mathbb{E}_Q[C_T] \tag{3}$$

The above equation is a perfect analog to the premium of a death insurance contract (1).
The first term, taking care of the discounting, uses the “conservative” choice of a
riskless interest rate r. The second term gives the expected value of the future cash
flow, taken under the risk-neutral probability measure Q. This probability measure Q is
chosen in such a way that the dynamics (2) of the stock under Q become

$$dS_t = S_t r \, dt + S_t \sigma \, dW_t \tag{4}$$

The point is that the drift term S_t r dt of S under Q is in line with the growth rate of
the risk-free bond

$$dB_t = B_t r \, dt \tag{5}$$

The interpretation of (4) is that if the market were correctly modeled by the probability
Q, then the market would be risk neutral. In mathematical terms, the process
(e^{-rt} S_t)_{0≤t≤T}, that is, the stock price process discounted by the risk-free
interest rate r, is a martingale under Q.

As in the actuarial context above, the mathematical model of a financial market under the
risk-neutral measure Q pertains to a virtual world, and not to the real world. In
reality, that is, under ℙ, we would typically have µ > r. Fixing this case, Girsanov’s
formula (see Equivalence of Probability Measures; Stochastic Exponential) tells us
precisely that the probability measure Q represents a “prudent choice of probability”. It
gives less weight than the original measure ℙ to the events which are favorable for the
buyer of a stock, that is, when S_T is large. On the other hand, Q gives more weight than
ℙ to unfavorable events, that is, when S_T is small. This can be seen from Girsanov’s
formula

$$\frac{dQ}{d\mathbb{P}} = \exp\left( -\frac{\mu - r}{\sigma} W_T - \frac{(\mu - r)^2}{2\sigma^2} T \right) \tag{6}$$

and the dynamics of the stock price process S under ℙ resulting from (2)

$$S_T = S_0 \exp\left( \sigma W_T + \left( \mu - \frac{\sigma^2}{2} \right) T \right) \tag{7}$$
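As a numerical illustration of equations (3), (6), and (7), the following minimal sketch prices a European call in two ways: by simulating S_T directly under Q (drift r), and by simulating under ℙ (drift µ) and weighting each payoff with the Girsanov density (6). Both estimates should be close to the standard Black–Scholes closed-form price, which is not stated in the text and is used here only as a benchmark. The function names and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def call_closed_form(s0, strike, r, sigma, maturity):
    """Black-Scholes call price (standard benchmark formula, not from the text)."""
    vol = sigma * math.sqrt(maturity)
    d1 = (math.log(s0 / strike) + (r + 0.5 * sigma ** 2) * maturity) / vol
    d2 = d1 - vol
    return s0 * normal_cdf(d1) - strike * math.exp(-r * maturity) * normal_cdf(d2)


def call_under_q(s0, strike, r, sigma, maturity, n_paths, seed):
    """Equation (3): discounted mean payoff, with S_T simulated under Q (drift r)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        w_t = rng.gauss(0.0, math.sqrt(maturity))
        s_t = s0 * math.exp(sigma * w_t + (r - 0.5 * sigma ** 2) * maturity)
        total += max(s_t - strike, 0.0)
    return math.exp(-r * maturity) * total / n_paths


def call_under_p_reweighted(s0, strike, mu, r, sigma, maturity, n_paths, seed):
    """Simulate S_T under P via (7), then weight each payoff by dQ/dP from (6)."""
    rng = random.Random(seed)
    theta = (mu - r) / sigma  # market price of risk
    total = 0.0
    for _ in range(n_paths):
        w_t = rng.gauss(0.0, math.sqrt(maturity))
        s_t = s0 * math.exp(sigma * w_t + (mu - 0.5 * sigma ** 2) * maturity)
        density = math.exp(-theta * w_t - 0.5 * theta ** 2 * maturity)
        total += density * max(s_t - strike, 0.0)
    return math.exp(-r * maturity) * total / n_paths


if __name__ == "__main__":
    # Hypothetical parameters: S_0 = K = 100, r = 3%, mu = 8%, sigma = 20%, T = 1.
    print(call_closed_form(100.0, 100.0, 0.03, 0.2, 1.0))
    print(call_under_q(100.0, 100.0, 0.03, 0.2, 1.0, 100_000, seed=1))
    print(call_under_p_reweighted(100.0, 100.0, 0.08, 0.03, 0.2, 1.0, 100_000, seed=2))
```

In the second estimator the density is small exactly on the paths where W_T, and hence S_T, is large, which is the “prudent” reweighting described above.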
Fixing a random element ω ∈ Ω, the Radon–Nikodym derivative dQ/dℙ(ω) is small iff W_T(ω)
is large, and the latter is large iff S_T(ω) is large.

In many applications, it is not even necessary to consider the original “true”
probability measure ℙ. There are hundreds of papers containing the sentence: “we work
under the risk-neutral measure Q”. This is parallel to the situation of an actuary in
his/her daily work: he/she does not bother about the “true” mortality probabilities, but
only about the probabilities listed in the mortality table of the first kind.

The history of the valuation formula (3), in fact, goes back much further than Black,
Scholes, and Merton. Already in 1900, L. Bachelier applied this formula in his thesis [1]
in order to price options. It seems worthwhile to have a closer look. Bachelier did not
use a discount factor, such as e^{-rT}, in equation (3). The reason is that in 1900
prices underlying the option were denoted in forward prices at the Paris stock exchange
(called “true prices” by Bachelier, who also carefully adjusted for coupon payments; see
[6] for details). As is well known, when considering forward prices the discount factor
disappears. In modern terminology, this fact boils down to “Black’s formula”.

As regards the second term in equation (3), Bachelier started from the very beginning
with a martingale model, namely, (scaled) Brownian motion [6]

$$S_t = S_0 + \sigma W_t, \qquad 0 \le t \le T \tag{8}$$

In other words, he also “worked assuming the risk-neutral probability”.

In fact, in the first pages of his thesis Bachelier does speak about two kinds of
probabilities. The following is a quote from [1]:

(i) The probability which might be called “mathematical”, which can be determined a
priori and which is studied in games of chance.
(ii) The probability dependent on future events and, consequently impossible to predict
in a mathematical manner.
This latter is the probability that the speculator tries to predict.

With a large portion of goodwill and hindsight, one might interpret (i) as something like
the risk-neutral probability Q, while (ii) describes something like the historical
measure ℙ.

Risk-neutral Pricing for General Models

In the Black–Scholes model (2) there is only one risk-neutral measure Q under which the
discounted stock price process becomes a martingale.^d This feature characterizes
complete financial markets (see Complete Markets). In this case, we not only obtain from
equation (3) a price C_0 for the derivative security C_T, but we get much more: the
derivative can be perfectly replicated by starting at time t = 0 with the initial
investment given by equation (3) and subsequent dynamical trading in the underlying stock
S. This is the essence of the approach of Black, Scholes, and Merton; it has no parallel
in the classical actuarial approach or in the work of L. Bachelier.

What happens in incomplete financial markets, that is, when there is more than one
risk-neutral measure Q? It has been shown by Harrison and Pliska [4] that equation (3)
yields precisely all the consistent pricing rules for derivatives on S, when Q runs
through the set of risk-neutral measures equivalent to ℙ. We denote the latter set by
M^e(S). The term consistent means that there should be no arbitrage possibilities when
all possible derivatives on S are traded at the price given by equation (3).

But what is the good choice of Q ∈ M^e(S)? In general, this question is as meaningless as
the question: what is the good choice of an element in some convex subset of a vector
space? In order to allow for a more intelligent version of this question, one needs
additional information. It is here that the original probability measure ℙ comes into
play again: a popular approach is to choose the element Q ∈ M^e(S) which is “closest”
to ℙ.

In order to make this idea precise, fix a strictly convex function V(y), for example,

$$V(y) = y\,(\ln(y) - 1), \qquad y > 0 \tag{9}$$

$$\text{or} \quad V(y) = \frac{y^2}{2}, \qquad y \in \mathbb{R} \tag{10}$$

Determine Q̂ ∈ M^e(S) as the optimizer of the optimization problem

$$\mathbb{E}_{\mathbb{P}}\left[ V\left( \frac{dQ}{d\mathbb{P}} \right) \right] \to \min!, \qquad Q \in M^e(S) \tag{11}$$

To illustrate this with the above examples: for V(y) = y(ln(y) − 1), this corresponds to
choosing the element Q̂ ∈ M^e(S) minimizing the relative entropy
H(Q|ℙ) = E_Q[ln(dQ/dℙ)]; for V(y) = y²/2, this corresponds to choosing Q ∈ M^e(S)
minimizing the L²-norm ‖dQ/dℙ‖_{L²(ℙ)} = E_ℙ[(dQ/dℙ)²]^{1/2}.

Under appropriate conditions, the minimization problem (11) has a solution, which then is
unique by the strict convexity assumption.

There is an interesting connection to the issue of Utility Indifference Valuation. Let
U(x) be the (negative) Legendre–Fenchel transform of V, that is,

$$U(x) = \inf_{y} \{ xy + V(y) \} \tag{12}$$

For the two examples above, we obtain

$$U(x) = -e^{-x} \tag{13}$$

$$\text{or} \quad U(x) = -\frac{x^2}{2} \tag{14}$$

which may be interpreted as utility functions. It turns out that, under appropriate
assumptions, the optimizer Q̂ in equation (11) yields precisely the marginal utility
indifference pricing rule when plugged into equation (3) (see Utility Indifference
Valuation).

In particular, we may conclude that pricing by marginal utility [2, 3, 5] is a consistent
pricing rule in the sense of Harrison and Kreps.

End Notes

a. In real life nothing is actually riskless: in practice, the riskless rate of return
corresponds to government bonds (provided that the government is reliable).
b. We do not consider costs, taxes, and so on, which are eventually added to this
premium; we only consider the “net premium”.
c. A mortality table (horrible word!) is nothing but a list of probabilities q_x, where x
runs through the relevant ages, say x = 18, . . . , 110. The first mortality table was
constructed by Edmond Halley in 1693.
d. To be precise: this result only holds true if for the underlying filtered probability
space (Ω, F, (F_t)_{0≤t≤T}, ℙ) we have F = F_T and the filtration (F_t)_{0≤t≤T} is
generated by (S_t)_{0≤t≤T}.

References

[1] Bachelier, L. (1900). Théorie de la Spéculation, Annales Scientifiques de l’École
Normale Supérieure 17, 21–86; English translation in: P. Cootner (ed.) (1964). The Random
Character of Stock Market Prices, MIT Press.
[2] Davis, M. (1997). Option pricing in incomplete markets, in Mathematics of Derivative
Securities, M.A.H. Dempster & S.R. Pliska, eds, Cambridge University Press, pp. 216–226.
[3] Foldes, D. (2000). Valuation and martingale properties of shadow prices: an
exposition, Journal of Economic Dynamics and Control 24, 1641–1701.
[4] Harrison, J.M. & Pliska, S.R. (1981). Martingales and stochastic integrals in the
theory of continuous trading, Stochastic Processes and their Applications 11, 215–260.
[5] Rubinstein, M. (1976). The valuation of uncertain income streams and the pricing of
options, Bell Journal of Economics 7, 407–426.
[6] Schachermayer, W. (2003). Introduction to the mathematics of financial markets, in
Lecture Notes in Mathematics 1816 - Lectures on Probability Theory and Statistics,
Saint-Flour Summer School 2000 (Pierre Bernard, editor), S. Albeverio, W. Schachermayer &
M. Talagrand, eds, Springer Verlag, Heidelberg, pp. 111–177.

Further Reading

Black, F. & Scholes, M. (1973). The pricing of options and corporate liabilities, Journal
of Political Economy 81, 637–659.
Delbaen, F. & Schachermayer, W. (2006). The Mathematics of Arbitrage, Springer Finance,
p. 371.
Merton, R.C. (1973). The theory of rational option pricing, Bell Journal of Economics and
Management Science 4, 141–183.
Ross, S. (1978). A simple approach to the valuation of risky streams, Journal of Business
51, 453–475.
Samuelson, P.A. (1965). Proof that properly anticipated prices fluctuate randomly,
Industrial Management Review 6, 41–50.

Related Articles

Change of Numeraire; Complete Markets; Equivalent Martingale Measures; Fundamental
Theorem of Asset Pricing; Model Calibration; Monte Carlo Simulation; Pricing Kernels;
Stochastic Discount Factors.

WALTER SCHACHERMAYER
Hedging

In a complete market (see Complete Markets) derivative securities are redundant in the
sense that they can be replicated by the gains from trading via a self-financing
admissible strategy in the underlying asset. This replicating strategy is then called the
hedging strategy for the claim.

More formally, we fix some filtered probability space (Ω, A, (F_t), P). The (discounted)
price process of a risky asset is modeled by an (F_t)-adapted semimartingale S. A claim B
is an F_T-measurable random variable, where T is the maturity of the claim. B is
attainable if there exists a constant c and an admissible strategy ϑ such that

$$B = c + \int_0^T \vartheta_t \, dS_t \tag{1}$$

The quintuple (Ω, A, (F_t), P, S) models a financial market. A market is complete if all
bounded claims are attainable. Finally, a market that is not complete is called
incomplete.

If there exists an equivalent martingale measure (see Equivalent Martingale Measures) Q
for S in a complete market, it must be unique according to some version of the second
fundamental theorem of asset pricing (see Second Fundamental Theorem of Asset Pricing).
Moreover, S has the predictable representation property (PRP) (see Martingale
Representation Theorem) with respect to (w.r.t.) (Q, (F_t)), meaning that every
(Q, (F_t))-martingale can be written as a sum of its initial value and a stochastic
integral w.r.t. S. These facts can be used to show existence of an optimal hedging
strategy as follows: we consider for each bounded claim B the associated Q-martingale V
given by

$$V_t = \mathbb{E}_Q[\, B \mid \mathcal{F}_t \,], \qquad t \le T \tag{2}$$

By the PRP, there exists an admissible strategy ϑ such that

$$V_t = V_0 + \int_0^t \vartheta_u \, dS_u, \qquad t \le T \tag{3}$$

In particular, for t = T, we get

$$B = \mathbb{E}_Q[B] + \int_0^T \vartheta_t \, dS_t \tag{4}$$

To calculate ϑ, note that we can express ϑ as the (symbolic) differential of angle
bracket processes (w.r.t. Q),

$$\vartheta_t = \frac{d\langle V, S \rangle_t}{d\langle S \rangle_t} \tag{5}$$

which then in turn can often be evaluated by assuming more specific structures for the
price process S and the claim B.

Quadratic Risk Minimization

In incomplete markets, one cannot in general hedge a claim perfectly, and hence, there
will always be some remaining risk, which can be minimized according to various criteria.
The Föllmer–Sondermann (FS) [5] approach consists in an orthogonal projection in L²(Q) of
a square-integrable claim B onto the subspace spanned by the constants and stochastic
integrals w.r.t. the price process S (which we assume to be locally square-integrable).
Here, Q is some martingale measure for S that has been obtained either via calibration or
according to some optimality criterion.

More precisely, given a claim B ∈ L²(Q, F_T), we want to minimize

$$\mathbb{E}_Q\left[ \left( B - c - \int_0^T \vartheta_t \, dS_t \right)^2 \right] \tag{6}$$

over all constants c and all ϑ ∈ L²(S), that is, predictable processes ϑ such that
E_Q[∫_0^T ϑ_t² d[S]_t] < ∞. Hence, the goal is to project B onto the linear space

$$K = \left\{ c + \int_0^T \vartheta_t \, dS_t : c \in \mathbb{R},\ \vartheta \in L^2(S) \right\} \subset L^2(Q) \tag{7}$$

For ϑ as above we also denote

$$K_0 = \left\{ \int_0^T \vartheta_t \, dS_t : \vartheta \in L^2(S) \right\} \subset L^2(Q) \tag{8}$$

By its very construction, the stochastic integral yields an isometry (here, we understand
[S] as the measure on [0, T] which is associated with the increasing process [S])

$$K_0 \cong L^2(\Omega \times [0, T],\ Q \otimes [S]) \tag{9}$$
$$\int_0^T \vartheta_t \, dS_t \longleftrightarrow \vartheta \tag{10}$$

since we have

$$\mathbb{E}_Q\left[ \left( \int_0^T \vartheta_t \, dS_t \right)^2 \right] = \mathbb{E}_Q\left[ \int_0^T \vartheta_t^2 \, d[S]_t \right] \tag{11}$$

Hence, K_0 is isometrically isomorphic to an L²-space and therefore closed. Therefore, we
can apply the orthogonal projection theorem for Hilbert spaces to get a decomposition

$$B = c^B + \int_0^T \vartheta_t^B \, dS_t + L_T \tag{12}$$

where L_T is orthogonal to each element of K; in particular, E_Q[L_T] = 0 since 1 ∈ K. It
follows that we have c^B = E_Q[B], and ϑ^B is called the FS optimal hedging strategy. As
processes, L_t := E_Q[L_T | F_t] and S are strongly orthogonal in the sense that LS is a
Q-martingale or, equivalently, ⟨L, S⟩ = 0, where the predictable covariation ⟨., .⟩ here
refers to the measure Q. This implies

$$\langle V, S \rangle = \int \vartheta^B \, d\langle S, S \rangle \tag{13}$$

where V_t := E_Q[B | F_t] denotes the martingale generated by B. Moreover, a simple
calculation yields

$$\mathbb{E}_Q[L_T^2] = \mathbb{E}_Q\left[ \langle V, V \rangle_T - \int_0^T (\vartheta_t^B)^2 \, d\langle S, S \rangle_t \right] \tag{14}$$

Equation (13) is sometimes written as

$$\vartheta^B = \frac{d\langle V, S \rangle}{d\langle S, S \rangle} \tag{15}$$

We call

$$V = c^B + \int \vartheta^B \, dS + L \tag{16}$$

the Galtchouk–Kunita–Watanabe (GKW) decomposition of B or rather V relative to S.

In some models, one can compute the optimal (risk-minimizing) hedging strategy by solving
a partial integro-differential equation [1] or by a generalized Clark–Ocone formula from
Malliavin calculus [2].

Utility-indifference Hedging

Let u be some utility function defined on the whole real line. If there exists a number π
satisfying

$$\sup_{\vartheta} \mathbb{E}\left[ u\left( x + \int_0^T \vartheta_t \, dS_t \right) \right] = \sup_{\vartheta} \mathbb{E}\left[ u\left( x + \int_0^T \vartheta_t \, dS_t + \pi - B \right) \right] \tag{17}$$

then it is called the utility-indifference price of the claim B. It is the threshold
where the investor is indifferent whether just to maximize expected utility from a pure
investment into the stock with the price process S or to sell in addition a claim B and
collect a premium π for this.

The optimal strategies ϑ on both sides of equation (17) typically differ. The difference

$$\theta := \varphi^B - \varphi^0 \tag{18}$$

of the optimizers on the right- and the left-hand side, respectively, can be interpreted
as a utility-based hedging strategy. It corresponds to the adjustment of the investor’s
portfolio made in order to account for the option.

Let us consider exponential utility

$$u(x) = 1 - \exp(-\alpha x) \tag{19}$$

where α > 0. If θ^α denotes the exponential utility-based hedging strategy corresponding
to selling α units of the claim B, then it turns out that under quite general conditions
the associated normalized gains (1/α) ∫_0^T θ_t^α dS_t converge in L²(Q_0), Q_0 being the
minimal entropy martingale measure, to ∫_0^T ϑ_t^B dS_t. Here, ϑ^B is the integrand
coming from the GKW decomposition (12) w.r.t. Q_0; see [6] and the references contained
therein.

Further Approaches to Hedging

Ideally, one would like to find a hedging strategy that always allows one to
superreplicate the claim B. Finding such a strategy is related to the optional
decomposition theorem for supermartingales which are bounded from below. However, it
turns out that pursuing such a superhedging strategy is too expensive in the sense that
the corresponding price
typically equals the highest price consistent with no-arbitrage pricing, that is, it
amounts to sup_Q E_Q[B], where the supremum is taken over all the equivalent martingale
measures Q.

Therefore, it has been proposed by Föllmer and Leukert [3] to maximize the probability of
a successful hedge given a certain amount of initial capital, a concept that they call
quantile hedging. However, with this approach there is no protection for the worst case
scenarios other than portfolio diversification, and technically, it might be difficult to
implement this since it corresponds to hedging a knock-out option. The same authors [4],
moreover, considered efficient hedges which minimize the expected shortfall weighted by
some loss function. In this way, the investor may interpolate between the extremes of no
hedge and a superhedge, depending on the accepted level of shortfall risk.

References

[1] Cont R., Tankov P. & Voltchkova E. (2007). Hedging with options in presence of jumps,
in Stochastic Analysis and Applications: The Abel Symposium 2005 in honor of Kiyosi Ito,
F.E. Benth, G. Di Nunno, T. Lindstrom, B. Øksendal & T. Zhang, eds, Springer,
pp. 197–218.
[2] Di Nunno, G. (2002). Stochastic integral representation, stochastic derivatives and
minimal variance hedging, Stochastics and Stochastics Reports 73, 181–198.
[3] Föllmer, H. & Leukert, P. (1999). Quantile hedging, Finance and Stochastics 3,
251–273.
[4] Föllmer, H. & Leukert, P. (2000). Efficient hedging: cost versus shortfall risk,
Finance and Stochastics 4, 117–146.
[5] Föllmer, H. & Sondermann, D. (1986). Hedging of non-redundant contingent claims.
Contributions to mathematical economics, in Honor of G. Debreu, W. Hildenbrand &
A. Mas-Colell, eds, Elsevier Science Publications, North-Holland, pp. 205–223.
[6] Kallsen, J. & Rheinländer, T. (2008). Asymptotic Utility-based Pricing and Hedging
for Exponential Utility. Preprint.

Related Articles

Complete Markets; Delta Hedging; Equivalent Martingale Measures; Mean–Variance Hedging;
Option Pricing: General Principles; Second Fundamental Theorem of Asset Pricing;
Stochastic Integrals; Superhedging; Uncertain Volatility Model; Utility Indifference
Valuation.

THORSTEN RHEINLÄNDER
Complete Markets

According to the arbitrage pricing of derivative securities, the arbitrage price of a
financial derivative is defined as the wealth of a self-financing trading strategy based
on traded primary assets, which replicates the terminal payoff at maturity (or, more
generally, all cash flows) from the financial derivative. Hence, an important issue
arises whether any financial derivative admits a replicating strategy in a given model;
if this property holds, then the market model is said to be complete. Completeness of a
market model ensures that any derivative security can be priced by arbitrage and hedged
by dynamic trading in primary traded assets. For example, in the framework of the Cox,
Ross, and Rubinstein [9] model, not only the call and put options but also any
path-independent or path-dependent contingent claim can be replicated by dynamic trading
in stock and bond. Similarly, the classic Black and Scholes [3] model enjoys the property
of completeness, although a suitable technical assumption needs to be imposed on the
class of considered contingent claims.

Even for an incomplete model, the class of hedgeable derivatives, formally represented by
attainable contingent claims, can be sufficiently large for practical purposes.
Therefore, completeness should not be seen as a necessary requirement, as opposed to the
no-arbitrage property, which is an indispensable feature of any financial model used for
arbitrage pricing of derivative securities.

Finite Market Models

The issue of completeness of a finite market model was analyzed, among others, by Taqqu
and Willinger [24]. The finiteness of a market means that the underlying probability
space is finite, Ω = {ω_1, ω_2, . . . , ω_d}, and trading activities may only occur at
the finite set of dates, denoted as {0, 1, . . . , T}. As a standard example of a finite
market model, one may quote, for instance, the Cox, Ross, and Rubinstein [9] binomial
tree model (see Binomial Tree) or any of its multinomial extensions.

Let S^1, S^2, . . . , S^k be the stochastic processes describing the spot (or cash)
prices of some non-dividend paying financial assets. As customary, we postulate that the
price process of at least one asset is given as a strictly positive process, so that it
can be selected as a numéraire asset. Let us then assume that S_t^k > 0 for every t ≤ T.
To emphasize the special role of the process S^k, we will sometimes write B instead of
S^k. We assume that all assets are perfectly divisible and the market is frictionless,
that is, there are no restrictions on the short-selling of assets, and no transaction
costs, taxes, and so on.

We consider a probability space (Ω, F_T, ℙ), which is equipped with a filtration
𝔽 = (F_t)_{t≤T}. A probability measure ℙ, to be interpreted as the real-life
probability, is an arbitrary probability measure on (Ω, F_T) such that ℙ(ω_i) > 0 for
every i = 1, 2, . . . , d. For convenience, we assume throughout that the σ-field F_0 is
trivial, that is, F_0 = {∅, Ω}. All processes considered in what follows are assumed to
be 𝔽-adapted.

Trading Strategies

The component φ_t^i of a trading strategy φ = (φ^1, φ^2, . . . , φ^k) represents the
number of units of the ith security held by an investor at time t. In other words,
φ_t^i S_t^i is the amount of funds invested in the ith security at time t. Hence, the
wealth process V(φ) of a trading strategy φ is given by the equality, for
t = 0, 1, . . . , T,

$$V_t(\varphi) = \sum_{i=1}^{k} \varphi_t^i S_t^i \tag{1}$$

The initial wealth V_0(φ) = φ_0 · S_0 is also referred to as the initial cost of φ.

A trading strategy φ is said to be self-financing whenever it satisfies the following
condition, for every t = 0, 1, . . . , T − 1,

$$\sum_{i=1}^{k} \varphi_t^i S_{t+1}^i = \sum_{i=1}^{k} \varphi_{t+1}^i S_{t+1}^i \tag{2}$$

In the financial interpretation, this condition means that the portfolio φ is revised at
any date t in such a way that there are no infusions of external funds and no funds are
withdrawn from the portfolio. We denote by Φ the vector space of all self-financing
trading strategies. The gains process G(φ) of any trading strategy φ equals, for
t = 0, 1, . . . , T,

$$G_t(\varphi) = \sum_{u=0}^{t-1} \sum_{i=1}^{k} \varphi_u^i \left( S_{u+1}^i - S_u^i \right) \tag{3}$$
with G_0(φ) = 0. It can be checked that a trading strategy φ is self-financing if and
only if the equality V_t(φ) = V_0(φ) + G_t(φ) holds for every t = 0, 1, . . . , T.

Replication and Arbitrage

A European contingent claim X with maturity T is an arbitrary F_T-measurable random
variable. Since the space Ω is assumed to be a finite set with d elements, any claim X
has the representation X = (X(ω_1), X(ω_2), . . . , X(ω_d)) ∈ ℝ^d. Hence, the class 𝒳 of
all contingent claims that settle at T may be identified with the vector space ℝ^d.

A replicating strategy for the contingent claim X, which settles at time T, is a
self-financing trading strategy φ such that V_T(φ) = X. For any claim X, we denote by Φ_X
the class of all replicating strategies for X.

The wealth process V(φ) of an arbitrary strategy φ from Φ_X is called a replicating
process of X in M. Finally, we say that a claim X is attainable in M if it admits at
least one replicating strategy. We denote the class of all attainable claims by 𝒜.

Definition 1 A market model M is said to be complete if every claim X ∈ 𝒳 is attainable
in M or, equivalently, if for every F_T-measurable random variable X there exists at
least one trading strategy φ ∈ Φ such that V_T(φ) = X. In other words, a market model M
is complete whenever 𝒳 = 𝒜.

Let X be an arbitrary attainable claim that settles at time T. We say that X is uniquely
replicated in M if it admits a unique replicating process in M, that is, if the equality
V_t(φ) = V_t(ψ), t ∈ [0, T], holds for arbitrary trading strategies φ, ψ from Φ_X. Then
the process V(φ) is termed the wealth process of X in M.

Arbitrage Price

A trading strategy φ ∈ Φ is called an arbitrage opportunity if V_0(φ) = 0 and the
terminal wealth of φ satisfies

$$\mathbb{P}(V_T(\varphi) \ge 0) = 1 \quad \text{and} \quad \mathbb{P}(V_T(\varphi) > 0) > 0 \tag{4}$$

where ℙ is the real-world probability measure. We say that a market M = (S, Φ) is
arbitrage free if there are no arbitrage opportunities in the class Φ of all
self-financing trading strategies.

It can be shown that if the market model M is arbitrage free, then any attainable
contingent claim X is uniquely replicated in M. The converse implication is not true,
however, that is, the uniqueness of the wealth process of any attainable contingent claim
does not imply the arbitrage-free property of a market, in general. Therefore, the
existence and uniqueness of the wealth process associated with any attainable claim is
insufficient to justify the term arbitrage price. Indeed, it is easy to give an example
of a finite market in which all claims can be uniquely replicated, but there exists a
strictly positive claim which can be replicated by a self-financing strategy with a
negative initial cost.

Definition 2 Let the market model M be arbitrage free. Then the wealth process of an
attainable claim X is called the arbitrage price of X in M and it is denoted by π_t(X)
for every t = 0, 1, . . . , T.

Risk-neutral Valuation Formula

Recall that we write S^k = B. Let us denote by S* the process of relative prices, which
equals, for every t = 0, 1, . . . , T,

$$S_t^* = \left( S_t^1 B_t^{-1}, S_t^2 B_t^{-1}, \ldots, S_t^k B_t^{-1} \right) = \left( S_t^{*1}, S_t^{*2}, \ldots, S_t^{*(k-1)}, 1 \right) \tag{5}$$

where we denote S^{*i} = S^i B^{-1}. Recall that the probability measures ℙ and ℚ on
(Ω, F) are said to be equivalent if, for any event A ∈ F, the equality ℙ(A) = 0 holds if
and only if ℚ(A) = 0. Similarly, ℚ is said to be absolutely continuous with respect to ℙ
if, for any event A ∈ F, the equality ℙ(A) = 0 implies that ℚ(A) = 0. Clearly, if the
probability measures ℙ and ℚ are equivalent, then they are also absolutely continuous
with respect to each other. The following concept is crucial in the so-called
risk-neutral valuation approach.

Definition 3 A probability measure ℙ* on (Ω, F_T) equivalent to ℙ (absolutely continuous
with respect to ℙ, respectively) is called an equivalent martingale measure for S* (a
generalized martingale measure for S*, respectively) if the relative price S* is a
ℙ*-martingale with respect to the filtration 𝔽.
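Definition 3 can be checked by hand in a one-period model with two states and two assets (a stock and a bond used as numéraire). The following minimal sketch computes the only candidate martingale probability of the up-state and classifies it as equivalent to ℙ, merely absolutely continuous with respect to ℙ, or not a probability at all. The function names and the numerical parameters are assumptions made for illustration; exact rational arithmetic is used so that the martingale equality holds without rounding.

```python
from fractions import Fraction


def risk_neutral_up_probability(up, down, rate):
    """Solve q*up + (1 - q)*down = 1 + rate for q (one-period binomial model)."""
    return (1 + rate - down) / (up - down)


def discounted_expectation(s0, up, down, rate, q_up):
    """Expected relative price S_1 / B_1 under the candidate measure (B_0 = 1)."""
    return (q_up * s0 * up + (1 - q_up) * s0 * down) / (1 + rate)


def classify(q_up):
    """Classify the candidate measure in the sense of Definition 3.

    Assumes the real-world measure gives positive mass to both states.
    """
    if 0 < q_up < 1:
        return "equivalent"        # same null sets as the real-world measure
    if q_up in (0, 1):
        return "generalized only"  # absolutely continuous, but not equivalent
    return "none"                  # not a probability measure


if __name__ == "__main__":
    # Hypothetical model: stock moves by factor 6/5 or 9/10, interest rate 5%.
    up, down, rate = Fraction(6, 5), Fraction(9, 10), Fraction(1, 20)
    q = risk_neutral_up_probability(up, down, rate)
    print(q, classify(q), discounted_expectation(100, up, down, rate, q))

    # If the up-factor equals 1 + rate, the candidate puts no mass on the
    # down-state: a generalized martingale measure that is not equivalent.
    q_edge = risk_neutral_up_probability(Fraction(21, 20), down, rate)
    print(q_edge, classify(q_edge))
```

The second case is a simple instance of a model in which a generalized martingale measure exists although no equivalent one does.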
An -adapted, k-dimensional process S ∗ = no-arbitrage property of M. Recall that trivially


(S ∗1 , S ∗2 , . . . , S ∗k ) is a ∗ -martingale with respect P(M) ⊆ Q(M) so that the class Q(M) is manifestly
to a filtration  if the equality nonempty if P(M) is so.
∗i
Ɛ∗ (St+1 | Ft ) = St∗i (6) Proposition 1 Assume that the class P(M) is
nonempty. Then the market M is arbitrage free.
holds for every i and t = 0, 1, . . . , T − 1. Moreover, the arbitrage price process of any attain-
We denote by P(S ∗ ) and Q(S ∗ ) the class of able contingent claim X, which settles at time T , is
all equivalent martingale measures for S* and the class of all generalized martingale measures for S*, respectively, so that the inclusion P(S*) ⊆ Q(S*) holds. It is not difficult to provide an example in which the class P(S*) is empty, whereas the class Q(S*) is not.

Definition 4 A probability measure ℙ* on (Ω, F_T) equivalent to ℙ (absolutely continuous with respect to ℙ, respectively) is called an equivalent martingale measure for M = (S, Φ) (a generalized martingale measure for M = (S, Φ), respectively) if for every trading strategy φ ∈ Φ the relative wealth process V*(φ) = V(φ)B^{-1} is a ℙ*-martingale with respect to the filtration 𝔽.

We write P(M) (Q(M), respectively) to denote the class of all equivalent martingale measures (of all generalized martingale measures, respectively) for M. For conciseness, an equivalent martingale measure (a generalized martingale measure, respectively) is abbreviated as EMM (GMM, respectively). Note that an equivalent martingale measure is sometimes referred to as a risk-neutral probability.

It can be shown that a trading strategy φ is self-financing if and only if the relative wealth process V*(φ) = V(φ)B^{-1} satisfies, for every t = 0, 1, . . . , T,

\[ V_t^*(\varphi) = V_0^*(\varphi) + \sum_{u=0}^{t-1} \sum_{i=1}^{k} \varphi_u^i \, (S_{u+1}^{i*} - S_u^{i*}) \tag{7} \]

Therefore, for any φ ∈ Φ and any GMM ℙ* the relative wealth V*(φ) is a ℙ*-martingale with respect to the filtration 𝔽. This leads to the following result.

Lemma 1 A probability measure ℙ* on (Ω, F_T) is a GMM for the market model M if and only if it is a GMM for the relative price process S*, that is, P(S*) = P(M) and Q(S*) = Q(M).

The next result shows that the existence of an EMM for M is a sufficient condition for the

given by the risk-neutral valuation formula, for every t = 0, 1, . . . , T,

\[ \pi_t(X) = B_t \, \mathbb{E}_{\mathbb{P}^*}(X B_T^{-1} \mid \mathcal{F}_t) \tag{8} \]

where ℙ* is any EMM (or GMM) for the market model M.

It can be checked that the binomial tree model (see Binomial Tree) with deterministic interest rates is complete, whereas its extension in which the stock price is modeled by a trinomial tree is incomplete. Completeness relies, in particular, on the choice of traded primary assets. Hence, it is natural to ensure completeness of an incomplete model by adding new traded instruments (typically, plain-vanilla options).

Completeness of a Finite Market

We already know that if the set of equivalent martingale measures is nonempty, then the market model M is arbitrage free. It turns out that this condition is also necessary for the no-arbitrage property of the market model M.

Proposition 2 Suppose that the market model M is arbitrage free. Then the class P(M) of equivalent martingale measures for M is nonempty.

This leads to the following version of the first fundamental theorem of asset pricing (the First FTAP).

Theorem 1 A finite market model M is arbitrage free if and only if the class P(M) is nonempty, that is, there exists at least one equivalent martingale measure for M.

In the case of a finite market model, this result was established by Harrison and Pliska [13]. For a probabilistic approach to the First FTAP we refer to Taqqu and Willinger [24], who examine the case of a finite market model, and to papers by Dalang et al. [10] and Schachermayer [23], who study the case of a discrete-time model with infinite state space.
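The remark that a binomial tree is complete while a trinomial tree is not can be checked by hand in a one-period setting. The following Python sketch is an illustration under assumptions made here, not part of the article: the numbers (S0 = 100, factors 1.2, 1.0, 0.8, zero interest rate, strike 100) and all function names are hypothetical choices. It computes the single risk-neutral probability of the binomial model, samples the one-parameter family of EMMs of the trinomial model, and applies the risk-neutral valuation formula (8) at t = 0 under each of them.

```python
def binomial_emm(u, d, r):
    """Unique risk-neutral up-probability in a one-period binomial model."""
    return ((1.0 + r) - d) / (u - d)


def trinomial_emms(u, m, d, r, n=9):
    """Sample the one-parameter family of EMMs in a one-period trinomial model.

    The middle probability lam is the free parameter; q_u and q_d follow from
    q_u + lam + q_d = 1 and q_u*u + lam*m + q_d*d = 1 + r.
    """
    measures = []
    for j in range(1, n + 1):
        lam = j / (n + 1.0)
        q_u = ((1.0 + r) - lam * m - (1.0 - lam) * d) / (u - d)
        q_d = 1.0 - lam - q_u
        if q_u > 0.0 and q_d > 0.0:  # equivalence: every state keeps positive mass
            measures.append((q_u, lam, q_d))
    return measures


def price(payoffs, probs, r):
    """Risk-neutral valuation at t = 0 with B_0 = 1 and B_T = 1 + r."""
    return sum(q * x for q, x in zip(probs, payoffs)) / (1.0 + r)


# Hypothetical one-period data, chosen only for illustration.
S0, u, m, d, r, K = 100.0, 1.2, 1.0, 0.8, 0.0, 100.0
states = [S0 * u, S0 * m, S0 * d]
call = [max(s - K, 0.0) for s in states]

q_bin = binomial_emm(u, d, r)
emms = trinomial_emms(u, m, d, r)
stock_prices = [price(states, q, r) for q in emms]  # the stock itself is attainable
call_prices = [price(call, q, r) for q in emms]     # the call is not

print("binomial risk-neutral probability:", q_bin)
print("stock price under each trinomial EMM:", stock_prices)
print("call price under each trinomial EMM:", call_prices)
```

Under every sampled EMM the stock is priced at S0, as it must be, whereas the call price changes with the measure. By the attainability criterion of Corollary 1, the call therefore cannot be replicated with the stock and the savings account alone in this trinomial example.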

The following fundamental result provides a relationship between the completeness property of a finite market model and the uniqueness (or nonuniqueness) of an EMM. Any result of this kind is commonly referred to as the second fundamental theorem of asset pricing.

Theorem 2 Assume that a market model M is arbitrage free so that the class P(M) is nonempty. Then M is complete if and only if the uniqueness of an equivalent martingale measure for M holds.

If an arbitrage-free market model is incomplete, not all claims are attainable and the class P(M) of equivalent martingale measures comprises more than one element. In that case, one can use the following result to determine whether a given contingent claim is attainable.

Corollary 1 A contingent claim X ∈ 𝒳 is attainable in an arbitrage-free market model M if and only if the map ℙ* ↦ E_{ℙ*}(X B_T^{-1}) from P(M) to ℝ is constant.

It follows from this result that if a claim is attainable, so that its arbitrage price is well defined, the price can be computed using the risk-neutral valuation formula under any of (possibly several) martingale measures. In addition, if the risk-neutral valuation formula yields the same result for any choice of an EMM for the market model at hand, then a given claim is necessarily attainable.

Multidimensional Black and Scholes Model

A multidimensional Black and Scholes model is a natural extension to a multiasset setup of the classic Black and Scholes [3] options pricing model. Let k denote the number of primary risky assets. For any i = 1, . . . , k, the price process S^i of the ith risky asset, referred to as the ith stock, is modeled as an Itô process (the dot · stands for the inner product in ℝ^d)

\[ dS_t^i = S_t^i \, (\mu_t^i \, dt + \sigma_t^i \cdot dW_t) \tag{9} \]

with S_0^i > 0 or, more explicitly,

\[ dS_t^i = S_t^i \Big( \mu_t^i \, dt + \sum_{j=1}^{d} \sigma_t^{ij} \, dW_t^j \Big) \tag{10} \]

where W = (W^1, . . . , W^d) is a standard d-dimensional Brownian motion, defined on a filtered probability space (Ω, 𝔽, ℙ). We make the natural assumption that the underlying filtration 𝔽 coincides with the filtration 𝔽^W generated by the Brownian motion W. The coefficients µ^i and σ^i follow bounded progressively measurable processes on the space (Ω, 𝔽, ℙ), with values in ℝ and ℝ^d, respectively. An important special case is obtained by postulating that for every i the volatility coefficient σ^i is represented by a fixed vector in ℝ^d and the appreciation rate µ^i is a real number.

For brevity, we write σ = σ_t to denote the volatility matrix, that is, the time-dependent random matrix [σ_t^{ij}], whose ith row specifies the volatility of the ith traded stock. The last primary security is the risk-free savings account B with the price process S^{k+1} = B satisfying

\[ dB_t = r_t B_t \, dt, \qquad B_0 = 1 \tag{11} \]

for a bounded, nonnegative, progressively measurable interest rate process r. This means that, for every t ∈ [0, T],

\[ B_t = \exp\Big( \int_0^t r_u \, du \Big) \tag{12} \]

To ensure the absence of arbitrage opportunities, we postulate the existence of a d-dimensional, progressively measurable process γ such that the equality

\[ r_t - \mu_t^i = \sum_{j=1}^{d} \sigma_t^{ij} \gamma_t^j = \sigma_t^i \cdot \gamma_t \tag{13} \]

is satisfied simultaneously for every i = 1, . . . , k (for Lebesgue a.e. t ∈ [0, T], with probability one). Note that the market price for risk γ is not uniquely determined, in general. Indeed, the uniqueness of a solution γ to this equation holds only if d ≤ k and the volatility matrix σ has full rank for every t ∈ [0, T].

For example, if d = k and the volatility matrix σ is nonsingular (for Lebesgue a.e. t ∈ [0, T], with probability one), then, for every t ∈ [0, T],

\[ \gamma_t = \sigma_t^{-1} (r_t \mathbf{1} - \mu_t) \tag{14} \]

where 1 denotes the d-dimensional vector with every component equal to one, and µ_t is the vector with components µ_t^i. For any process γ satisfying the

above equation, we introduce a probability measure ℙ* on (Ω, F_T) by setting

\[ \frac{d\mathbb{P}^*}{d\mathbb{P}} = \exp\Big( \int_0^T \gamma_u \cdot dW_u - \frac{1}{2} \int_0^T |\gamma_u|^2 \, du \Big), \qquad \mathbb{P}\text{-a.s.} \tag{15} \]

provided that the right-hand side in the last formula is well defined. The Doléans (stochastic) exponential

\[ \eta_t = \exp\Big( \int_0^t \gamma_u \cdot dW_u - \frac{1}{2} \int_0^t |\gamma_u|^2 \, du \Big) \tag{16} \]

is known to be a strictly positive supermartingale (but not necessarily a martingale) under ℙ, since it may happen that E_ℙ(η_T) < 1. A probability measure ℙ* equivalent to ℙ is well defined if and only if the process η follows a ℙ-martingale, that is, when E_ℙ(η_T) = 1. For the last property to hold, it is enough (but not necessary) that γ is a bounded process.

Assume that the class of martingale measures is nonempty. By virtue of the Girsanov theorem, the process W*, which equals, for every t ∈ [0, T],

\[ W_t^* = W_t - \int_0^t \gamma_u \, du \tag{17} \]

is a d-dimensional standard Brownian motion on (Ω, 𝔽, ℙ*). It follows from the Itô formula that the discounted stock price S_t^{*i} = S_t^i B_t^{-1} satisfies under ℙ*

\[ dS_t^{*i} = S_t^{*i} \, \sigma_t^i \cdot dW_t^* \tag{18} \]

for any i = 1, . . . , k. This means that the discounted prices of all stocks follow local martingales under ℙ*, so that any probability measure described above is a martingale measure for our model and it corresponds to the choice of the savings account as the numéraire asset. The class of tame strategies relative to B is defined by postulating that the discounted wealth of a strategy follows a stochastic process bounded from below. The market model obtained in this way is referred to as the multidimensional Black and Scholes model.

In the classic version of the multidimensional Black and Scholes model, one postulates that d = k, the constant volatility matrix σ is nonsingular, and the appreciation rates µ^i and the continuously compounded interest rate r are constant. It is easily seen that under these assumptions, the martingale measure ℙ* exists and is unique.

Completeness of the Multidimensional Black and Scholes Model

The completeness of the multidimensional Black and Scholes model is defined in much the same way as for a finite market model, except that certain technical restrictions need to be imposed on the class of contingent claims we wish to hedge and price. This is linked to the fact that not all self-financing trading strategies are deemed to be admissible. Some of them should be excluded in order to ensure the no-arbitrage property of the model (in addition to the existence of a martingale measure). Typically, one considers the class of tame strategies to play the role of admissible trading strategies.

The multidimensional Black and Scholes model is said to be complete if any ℙ*-integrable, bounded from below contingent claim X is attainable, that is, if for any such claim X there exists an admissible trading strategy φ such that X = V_T(φ). Otherwise, the market model is said to be incomplete.

Since, by assumption, the interest rate process r is nonnegative and bounded, the integrability and boundedness of X is therefore equivalent to the integrability and boundedness of the discounted claim X/B_T. It is not postulated that the uniqueness of an EMM holds, and thus the ℙ*-integrability of X refers to any EMM for the model. The next result establishes necessary and sufficient conditions for the completeness of the Black and Scholes market.

Proposition 3 The following are equivalent:
1. the multidimensional Black and Scholes model is complete;
2. inequality d ≤ k holds and the volatility matrix σ has full rank for Lebesgue a.e. t ∈ [0, T], with probability 1;
3. there exists a unique equivalent martingale measure ℙ* for discounted stock price S^{*i} for every i = 1, . . . , k.

The classic one-dimensional Black and Scholes market model introduced in [3] is clearly a special case of the multidimensional Black and Scholes model. Hence, the above results apply also to the classic Black and Scholes market model, in which the
martingale measure ℙ* is well known to be unique. We conclude that the one-dimensional Black and Scholes market model is complete, that is, any ℙ*-integrable contingent claim is ℙ*-attainable and thus it can be priced by arbitrage.

In the general semimartingale framework, the equivalence of the uniqueness of an EMM and the completeness of a market model was conjectured by Harrison and Pliska [13, 14] (see also [18]). The case of the Brownian filtration is examined in [16]. Chatelain and Stricker [7, 8] provide definitive results for the case of continuous local martingales (see also [1, 20] for related results). They focus on the important distinction between the vector and componentwise stochastic integrals.

Local and Stochastic Volatility Models

Note that we have examined the completeness of the market model in which trading was restricted to a predetermined family of primary securities. In practice, several derivative securities are also traded either on organized exchanges or over-the-counter and thus they can be used to formally complete a given market model. Let us comment briefly on two classes of models in which, for simplicity, we assume that the bond price is deterministic.

Following Dupire [12], we define the stock price as a solution to the following stochastic differential equation:

\[ dS_t = S_t \, (\mu(S_t, t) \, dt + \sigma(S_t, t) \, dW_t^*) \tag{19} \]

where S_0 > 0 and the function σ : ℝ_+ × ℝ_+ → ℝ represents the so-called local volatility. In practice, the function σ is obtained by fitting the model to market quotes of traded options. A model of this form is complete and thus any derivative security with the stock price as an underlying asset can be hedged and priced by arbitrage (provided, of course, that the model is arbitrage free). Another example of a complete model in which the volatility follows a stochastic process is discussed by Hobson and Rogers [15].

In a typical stochastic volatility model, the stock price S is governed by the equation

\[ dS_t = \mu(S_t, t) \, dt + \sigma_t S_t \, dW_t \tag{20} \]

where the stochastic volatility process σ satisfies

\[ d\sigma_t = a(\sigma_t, t) \, dt + b(\sigma_t, t) \, d\widetilde{W}_t \tag{21} \]

where W and W̃ are (possibly correlated) one-dimensional Brownian motions defined on some filtered probability space (Ω, 𝔽, ℙ). Owing to the presence of the Brownian motion W̃, stochastic volatility models are incomplete if stock and bond are the only traded primary assets. By postulating that some plain-vanilla options are traded, it is possible to complete a stochastic volatility model, however. Completeness of a model of financial market with traded call and put options and related topics, such as static hedging of exotic options, was examined by several authors: Bajeux-Besnainou and Rochet [2], Breeden and Litzenberger [4], Brown et al. [5], Carr et al. [6], Derman et al. [11], Madan and Milne [17], Nachman [19], Romano and Touzi [21], and Ross [22], to mention a few.

References

[1] Artzner, P. & Heath, D. (1995). Approximate completeness with multiple martingale measures, Mathematical Finance 5, 1–11.
[2] Bajeux-Besnainou, I. & Rochet, J.-C. (1996). Dynamic spanning: are options an appropriate instrument, Mathematical Finance 6, 1–16.
[3] Black, F. & Scholes, M. (1973). The pricing of options and corporate liabilities, Journal of Political Economy 81, 637–654.
[4] Breeden, D. & Litzenberger, R. (1978). Prices of state-contingent claims implicit in option prices, Journal of Business 51, 621–651.
[5] Brown, H., Hobson, D. & Rogers, L. (2001). Robust hedging of options, Applied Mathematical Finance 5, 17–43.
[6] Carr, P., Ellis, K. & Gupta, V. (1998). Static hedging of exotic options, Journal of Finance 53, 1165–1190.
[7] Chatelain, M. & Stricker, C. (1994). On componentwise and vector stochastic integration, Mathematical Finance 4, 57–65.
[8] Chatelain, M. & Stricker, C. (1995). Componentwise and vector stochastic integration with respect to certain multi-dimensional continuous local martingales, in Seminar on Stochastic Analysis, Random Fields and Applications, E. Bolthausen, M. Dozzi, F. Russo, eds, Birkhäuser, Boston, Basel, Berlin, pp. 319–325.
[9] Cox, J.C., Ross, S.A. & Rubinstein, M. (1979). Option pricing: a simplified approach, Journal of Financial Economics 7, 229–263.
[10] Dalang, R.C., Morton, A. & Willinger, W. (1990). Equivalent martingale measures and no-arbitrage in stochastic securities market model, Stochastics and Stochastic Reports 29, 185–201.
[11] Derman, E., Ergener, D. & Kani, I. (1995). Static options replication, Journal of Derivatives 2(4), 78–95.
[12] Dupire, B. (1994). Pricing with a smile, Risk 7(1), 18–20.
[13] Harrison, J.M. & Pliska, S.R. (1981). Martingales and stochastic integrals in the theory of continuous trading, Stochastic Processes and their Applications 11, 215–260.
[14] Harrison, J.M. & Pliska, S.R. (1983). A stochastic calculus model of continuous trading: complete markets, Stochastic Processes and their Applications 15, 313–316.
[15] Hobson, D.G. & Rogers, L.C.G. (1998). Complete model with stochastic volatility, Mathematical Finance 8, 27–48.
[16] Jarrow, R.A. & Madan, D. (1991). A characterization of complete markets on a Brownian filtration, Mathematical Finance 1, 31–43.
[17] Madan, D.B. & Milne, F. (1993). Contingent claims valued and hedged by pricing and investing in a basis, Mathematical Finance 4, 223–245.
[18] Müller, S. (1989). On complete securities markets and the martingale property of securities prices, Economics Letters 31, 37–41.
[19] Nachman, D. (1989). Spanning and completeness with options, Review of Financial Studies 1, 311–328.
[20] Pratelli, M. (1996). Quelques résultats du calcul stochastique et leur application aux marchés financiers, Astérisque 236, 277–290.
[21] Romano, M. & Touzi, N. (1997). Contingent claims and market completeness in a stochastic volatility model, Mathematical Finance 7, 399–412.
[22] Ross, S.A. (1976). Options and efficiency, Quarterly Journal of Economics 90, 75–89.
[23] Schachermayer, W. (1992). A Hilbert space proof of the fundamental theorem of asset pricing in finite discrete time, Insurance: Mathematics and Economics 11, 249–257.
[24] Taqqu, M.S. & Willinger, W. (1987). The analysis of finite security markets using martingales, Advances in Applied Probability 19, 1–25.

Related Articles

Binomial Tree; Local Volatility Model; Martingale Representation Theorem; Second Fundamental Theorem of Asset Pricing.

MAREK RUTKOWSKI
Equivalent Martingale Measures

The usual setting of mathematical finance is provided by a d-dimensional stochastic process S = (S_t)_{0≤t≤T} based on and adapted to a filtered probability space (Ω, F, (F_t)_{0≤t≤T}, ℙ). This process S models the price evolution of d risky stocks, which is random. To alleviate notation, we assume from the very beginning that these prices are denoted in discounted terms: fix a traded asset, the “bond”, as numéraire and express stock prices S in units of this bond. This simple and classical technique allows us to dispense with discount factors in the formulae below (compare Section 2.1 in [6] for more details).

A central topic in mathematical finance is to decide whether there is a probability measure Q, equivalent to ℙ, such that S is a martingale under Q. This is the theme of the fundamental theorem of asset pricing (see Fundamental Theorem of Asset Pricing). Once we know that there exist equivalent martingale measures, they can be used to determine risk-neutral prices of derivative securities by taking expectations under these measures (see Risk-neutral Pricing), and to replicate, respectively, sub- or superreplicate, the derivative.

In fact, we were less precise in the previous paragraph (as is usual in this context) by requiring that S is a martingale. It turns out that some technical care is needed here, involving the notions of local martingales and, more generally, of sigma-martingales. This article deals precisely with these technical variants of the concept of a martingale. We start by giving precise definitions.

Definition 1 An ℝ^d-valued stochastic process (S_t)_{0≤t≤T} based on and adapted to (Ω, F, (F_t)_{0≤t≤T}, ℙ) is called a

(i) martingale if

\[ \mathbb{E}[S_t \mid \mathcal{F}_u] = S_u, \qquad 0 \le u \le t \le T \tag{1} \]

(ii) local martingale if there exists a sequence (τ_n)_{n=1}^∞ of [0, T] ∪ {+∞}-valued stopping times, increasing a.s. to ∞, such that the stopped processes S^{τ_n} are all martingales, where

\[ S_t^{\tau_n} = S_{t \wedge \tau_n}, \qquad 0 \le t \le T \tag{2} \]

(iii) sigma-martingale if there is an ℝ^d-valued martingale M = (M_t)_{0≤t≤T} and a predictable M-integrable ℝ_+-valued process ϕ such that S = ϕ · M.

The process ϕ · M is defined as the stochastic integral in the sense of semimartingales. The underlying theory, by now well understood, was developed notably by the school of P.A. Meyer in Strasbourg [10–12]:

\[ (\varphi \cdot M)_t = \int_0^t \varphi_u \, dM_u, \qquad 0 \le t \le T \tag{3} \]

It is not obvious, but true, that a local martingale is a sigma-martingale, so that (i) ⇒ (ii) ⇒ (iii) holds true above, while the reverse implications fail to hold true as we discuss later.

Why is it necessary to introduce these generalizations of the concept of a martingale? Let us start with a familiar example of a martingale, namely, geometric Brownian motion

\[ M_t = \exp\Big( W_t - \frac{t}{2} \Big), \qquad t \ge 0 \tag{4} \]

where the process (W_t)_{t≥0} is a standard Brownian motion.

Clearly, (M_t)_{t≥0} is a martingale (with reference to its natural filtration) when t ranges in [0, ∞[. But what happens if we include t = ∞ into the time set? It is straightforward to verify that

\[ M_\infty := \lim_{t \to \infty} M_t \tag{5} \]

exists a.s. and equals

\[ M_\infty = 0 \tag{6} \]

Hence we may well define the continuous process (M_t)_{0≤t≤∞}; this process is not a martingale any more as

\[ 1 = M_0 > \mathbb{E}[M_\infty] = 0 \tag{7} \]

In this example, the breakdown of the martingale property happens at t = ∞. However, it is purely formal to shift this problem to any other point T ∈ ]0, ∞[, for example, T = 1. Indeed, letting

\[ \widetilde{M}_t = M_{\tan(t\pi/2)}, \quad 0 \le t < 1, \qquad \widetilde{M}_1 = M_\infty = 0 \tag{8} \]



we find a process (M̃_t)_{0≤t≤1}, having a.s. continuous paths, which fails to be a martingale. However, it is intuitively clear that “locally”, that is, before t assumes the value 1, the process (M̃_t)_{0≤t≤1} is “something like a martingale”. The good way to formalize this intuition is to find a “localizing” sequence of stopping times as in (ii) above. The canonical choice^a is

\[ \tau_n = \inf\{ t \in [0, 1] : |\widetilde{M}_t| \ge n \} \tag{9} \]

which is a [0, 1] ∪ {∞}-valued stopping time, if we define the infimum over the empty set to be equal to ∞. It is straightforward to verify that (τ_n)_{n=1}^∞ satisfies the requirements of (ii) above.

In the above example it holds true that (M̃_t)_{0≤t<1} is a martingale, that is, when t ranges in [0, 1[. In other words, the problem only arises at t = 1. However, there is a more subtle phenomenon in this context, where the problem not only appears at one single value of t, but also for all t.

The canonical example for this phenomenon is the inverse of the three-dimensional Bessel process^b (R_t)_{0≤t≤1}. It may be defined by R_0 = 1 and

\[ dR_t = R_t^2 \, dW_t, \qquad 0 \le t \le 1 \tag{10} \]

It turns out that equation (10) well defines a stochastic process with continuous paths, which is a local martingale (define (τ_n)_{n=1}^∞ as in equation (9)). However, the function

\[ t \mapsto \mathbb{E}[R_t], \qquad 0 \le t \le 1 \tag{11} \]

is strictly decreasing on the entire interval [0, 1]. Intuitively speaking, this may be interpreted as (R_t)_{0≤t≤1} “losing mass in continuous time”. We leave it to the reader to develop his or her own intuition for this remarkable phenomenon. In any case, this example should convince the reader that the concept of local martingales, involving “localizing” sequences of stopping times, is a useful and natural notion.

To underline this claim even further, think of a diffusion process (X_t)_{0≤t≤1} satisfying the equation

\[ dX_t = \sigma(X_t, t) \, dW_t + \mu(X_t, t) \, dt, \qquad 0 \le t \le 1 \tag{12} \]

A typical argument used, for example, in the derivation of the Black–Scholes partial differential equation (PDE) (compare Complete Markets or [Vol. II] [13]), goes as follows: for the drift term µ(X_t, t) we have µ(X_t, t) ≡ 0 if and only if (X_t)_{0≤t≤1} is a martingale. This is a very useful argument. However, this argument is not quite complete, as a glance at equation (10) reveals where we only obtain a local martingale. The correct statement is that the process X is a local martingale if and only if µ(X_t, t) = 0, a.s. with respect to dℙ ⊗ dλ.

Local Martingales in Finance

As a concrete example of a local martingale modeling a price process we consider S̃ = (S̃_t)_{0≤t≤T} = (M̃_t)_{0≤t≤1}, the time-changed geometric Brownian motion defined in equation (8). We consider S̃ to be defined on (Ω, F, (F_t)_{0≤t≤T}, ℙ), where the filtration (F_t)_{0≤t≤T} is generated by S̃ and F = F_T. We conclude that Q = ℙ is the unique probability measure Q on F, which is equivalent to ℙ and such that S̃ is a local martingale under Q. This quickly follows from the fact that Q = ℙ is the unique probability measure on F, equivalent to ℙ, such that the Brownian motion W = (W_t)_{0≤t<∞} in equation (4) is a martingale under Q.

The question arises whether S̃ defines a sound, that is, arbitrage-free model of a financial market. At first glance, things seem suspicious; after all, (S̃_t)_{0≤t≤T} is a ridiculous stock; it starts at S̃_0 = 1 and ends a.s. at S̃_T = 0. Hence buying a stock S̃ and holding it up to time T = 1 is a very “silly investment”. M. Harrison and S. Pliska [8] called such an investment a suicide strategy. But it is not forbidden to be silly.

A much more appealing investment strategy would be to go short in the stock at time 0, and to hold this short position up to time T, as this strategy yields a.s. a gain of one Euro. However, unfortunately, this is forbidden. To understand why this is the case, let us recall from Fundamental Theorem of Asset Pricing the definition of admissibility: a trading strategy H = (H_t)_{0≤t≤T} for a (general semimartingale) stock price process S is defined as a predictable strategy, which is S-integrable. By definition, the stochastic integral

\[ (H \cdot S)_t = \int_0^t (H_u, dS_u), \qquad 0 \le t \le T \tag{13} \]

then is well defined. We call H admissible, if there is a constant C > 0 such that a.s.

\[ (H \cdot S)_t \ge -C, \qquad 0 \le t \le T \tag{14} \]
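Why does condition (14) exclude the short position H ≡ −1 in S̃? Its running gain is 1 − S̃_t, so a credit line C is breached as soon as S̃_t exceeds 1 + C. A standard fact that is not stated in the text (Doob's maximal identity for a nonnegative continuous martingale started at 1 and tending to 0) gives probability 1/(1 + C) for this event, which is positive for every C, so no finite credit line suffices. The following Python sketch checks this by simulation. It is a rough illustration under assumptions made here: it simulates M from equation (4) on its own time scale rather than the time-changed S̃ (the supremum of the path is the same), it stops at a finite horizon, and it monitors the path on a grid, which biases the estimate somewhat downward. The function name and all parameters are hypothetical choices.

```python
import math
import random


def breach_probability(credit_line, n_paths=2000, dt=0.04, horizon=12.0, seed=0):
    """Estimate P(the short position in M loses more than credit_line at some time).

    M_t = exp(W_t - t/2) is simulated on a grid. The short position H = -1 has
    running gain 1 - M_t, so the credit line is breached once M_t > 1 + C.
    """
    rng = random.Random(seed)
    log_barrier = math.log(1.0 + credit_line)
    n_steps = int(round(horizon / dt))
    sd = math.sqrt(dt)
    breaches = 0
    for _ in range(n_paths):
        x = 0.0  # x = log M_t
        for _ in range(n_steps):
            x += sd * rng.gauss(0.0, 1.0) - 0.5 * dt
            if x > log_barrier:
                breaches += 1
                break
    return breaches / n_paths


estimates = {c: breach_probability(c) for c in (1.0, 3.0)}
for c, p in estimates.items():
    print(f"C = {c}: simulated {p:.3f}, continuous-time value {1.0 / (1.0 + c):.3f}")
```

The simulated frequencies sit slightly below 1/2 and 1/4 because of the discrete monitoring, but they stay well away from zero: for any credit line C, a positive fraction of paths forces the short seller through it before the stock finally collapses.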

The finite credit line C rules out doubling strategies and similar schemes that capitalize on taking higher and higher risks. A typical representative of such a “kind of doubling strategy” is the strategy of going short in the stock (S̃_t)_{0≤t≤T}, which corresponds to taking H_t ≡ −1, for 0 ≤ t ≤ T.

We now shall convince ourselves that local martingales yield sound, arbitrage-free models of financial markets. It turns out that it does not matter whether we start with a true martingale S = (S_t)_{0≤t≤T} or with a local martingale S, if we are only interested in the admissible stochastic integrals H · S. Indeed, it was shown by J.P. Ansel and C. Stricker [1, Corr. 3.5] that, given a local martingale S and an admissible integrand H, the stochastic integral H · S is a local martingale and therefore (using once more the fact that H · S is bounded from below) a supermartingale. In particular, the process H · S cannot increase in expectation; but it may very well decrease in expectation as, for example, the process S̃ above.

The following characterization of local martingales (see [4, Prop. 2.5] for more on this issue) is useful in this context.

Proposition 1 For an ℝ^d-valued semimartingale S the following are equivalent.

(i) S is a local martingale.
(ii) S = ϕ · M, where M is an ℝ^d-valued martingale, and ϕ is an ℝ_+-valued, M-integrable, predictable, increasing process.

From this proposition and the trivial formula^c

\[ H \cdot M = \frac{H}{\varphi} \cdot (\varphi \cdot M) = \frac{H}{\varphi} \cdot S \tag{15} \]

which holds true for every ℝ^d-valued, predictable, M-integrable process H = (H_t)_{0≤t≤T} we deduce that the family of processes, which are stochastic integrals on the local martingale S coincides with the family of processes, which are stochastic integrals on the martingale M. Also note that H is admissible for M if and only if H/ϕ is admissible for S.

The bottom line of formula (15) is that there is no difference between the stock price process S and M in Proposition 1 if we are only interested in the admissible stochastic integrals on these processes: these two families of stochastic integrals coincide.

Sigma-martingales

For continuous stock price processes S or, more generally, for locally bounded processes S, the concept of local martingales is sufficiently general to characterize those models that satisfy the condition of no free lunch with vanishing risk (see Fundamental Theorem of Asset Pricing).

However, if we pass to processes S that are not locally bounded any more, we still need one more step of generalization. The key concept for doing so was introduced by C. Chou [2] and M. Émery [7] under the name of “semimartingale de la classe (Σ_m)”. In [4], F. Delbaen and this author took the liberty of calling these processes sigma-martingales. The reason for this is that their relation to martingales is analogous to the relation between sigma-finite measures and finite measures, as seen from Definition 1 above. Also note the (only) difference between Definition 1 (iii) of a sigma-martingale, and the characterization of a local martingale as given in Proposition 1: in the latter the predictable, ℝ_+-valued process ϕ is supposed to be increasing while there is no such restriction in the former one.

Here is the illuminating example, due to M. Émery [7] (compare [4, Ex. 2.2]), of the archetypical sigma-martingale, which fails to be a local martingale.

Example 1 We start with an exponentially distributed random variable τ and an independent Bernoulli random variable ε, that is, ℙ[ε = 1] = ℙ[ε = −1] = 1/2. These random variables are based on some probability space (Ω, F, ℙ).

Define the process M = (M_t)_{t≥0} by

\[ M_t = \begin{cases} 0, & \text{for } 0 \le t < \tau \\ \varepsilon, & \text{for } \tau \le t \end{cases} \tag{16} \]

The verbal description goes like this: the process M remains at zero until time τ; then a coin is flipped, independently of τ, and the process M continues at the level +1 or −1, according to the result of this coin flip.

Denoting by (F_t)_{t≥0} the filtration generated by (M_t)_{t≥0}, it is rather obvious that (M_t)_{t≥0} is a martingale in this filtration (F_t)_{t≥0}. To keep in line with the above notation, we only consider the finite-horizon process (M_t)_{0≤t≤T}, but the example could as well be presented for the infinite horizon.

Let ϕ = (ϕ_t)_{0≤t≤1} be the deterministic process

\[ \varphi_t = t^{-1}, \qquad 0 \le t \le 1 \tag{17} \]

and define the stochastic integral S = ϕ · M, for which we get

\[ S_t = \begin{cases} 0, & \text{for } 0 \le t < \tau \\ \tau^{-1} \varepsilon, & \text{for } \tau \le t \end{cases} \tag{18} \]

The process S = (S_t)_{0≤t≤1} is a well-defined stochastic integral (in the pointwise Stieltjes sense). The verbal description of S goes as follows: again S remains at 0 until time τ and then, depending on the sign of ε, it jumps to +τ^{-1} or −τ^{-1}.

Is the process S a martingale? Morally speaking, one might think yes, as it has the same odds of jumping up or down.^d But this intuition goes wrong: indeed, the notion of martingale is based on some (conditional) expectations to be zero. When we do the calculations in the present example we end up with expressions of the form ∞ − ∞, which creates a problem. Indeed, we have

\[ \mathbb{E}[|S_t|] = \infty, \qquad \text{for } 0 < t \le 1 \tag{19} \]

as is easily seen from ∫_0^t u^{-1} du = ∞, for t > 0. In fact, it is not hard to show [7] that, for every (F_t)_{0≤t≤1}-stopping time σ : Ω → [0, 1], such that ℙ[σ > 0] > 0, we have

\[ \mathbb{E}[|S_\sigma|] = \infty \tag{20} \]

It follows that S even fails to be a local martingale. But, of course, S is a sigma-martingale by its very construction.

The message of the above example is that the notion of sigma-martingale is tailor-made to save the intuition that, from a “moral point of view”, the above process S is “something like a martingale”.

Let us turn from moral considerations to finance again: the question arises as to whether a process S = (S_t)_{0≤t≤T}, which is a sigma-martingale under some measure Q equivalent to ℙ, well defines a sound, that is, arbitrage-free, model of a financial market. The answer is analogous to the case of a local martingale, namely, a resounding yes. If S = ϕ · M, for some ℝ_+-valued predictable process ϕ, then again the “trivial formula” (15) above holds true. Hence again the families of admissible stochastic integrals on the processes S and M coincide. If these are the only relevant objects, as is the case for the classical approach to no-arbitrage theory as proposed by M. Harrison and S. Pliska [8], the processes S and M work equally well. In particular, the Ansel–Stricker Theorem carries over to sigma-martingales (see [4, Th. 5.5] for a somewhat stronger version of this result).

It is not hard to show that a locally bounded process, which is a sigma-martingale, is already a local martingale [4, Prop. 2.5 and 2.6]. Émery’s example shows that this is not the case any more if we drop the local boundedness assumption. From a financial point of view, however, the question of interest arises in a slightly different version. Is there an example of a process S = (S_t)_{0≤t≤T}, which is a sigma-martingale, say under ℙ, but such that it fails to be a local martingale under any probability measure Q equivalent to ℙ?

Émery’s original example does not provide a counterexample to this question; in this example, it is not hard to pass from ℙ to Q such that S even becomes a Q-martingale. However, in [4, Ex. 2.3] a variant of Émery’s example has been constructed, which is a process S taking values in ℝ² answering the above question negatively. It seems worth mentioning that, to the best of the author’s knowledge, it is unknown whether there also is a counterexample of a process S, taking values only in ℝ, to this question.

Separating Measures

We have seen in the preceding sections that, for a process S = (S_t)_{0≤t≤T} which is a sigma-martingale under some probability measure Q and for each admissible integrand H, we have the inequality

\[ \mathbb{E}_Q[(H \cdot S)_T] \le 0 \tag{21} \]

Indeed, the theorem of Ansel–Stricker [1, Corr. 3.5] and its extension to sigma-martingales [4, Th. 5.5] imply that H · S is a local martingale and, using again the boundedness from below, the process H · S is a supermartingale.

The notion of a separating measure, introduced by Y. Kabanov in [9], takes this inequality (21) as defining property. To formalize this idea, we assume that S is an ℝ^d-valued semimartingale on some filtered probability space (Ω, F, (F_t)_{0≤t≤T}, ℙ). We say that a measure Q, equivalent to ℙ, is a separating measure for S if, for all admissible, predictable S-integrable integrands H, inequality (21) holds true.
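Returning to Example 1, the divergence in (19) can be made visible numerically. Assume, purely for illustration, that τ is exponential with rate 1 (the text does not fix the rate). Then E[|S_t|] = E[τ^{-1}; τ ≤ t] = ∫_0^t u^{-1} e^{-u} du. Restricting to the event {τ > ε} gives a finite integral that grows like log(1/ε) as ε tends to 0. The Python sketch below evaluates this truncated integral with a midpoint rule after the substitution u = e^x; the function name, the truncation device and the grid size are choices made here, not part of Émery's construction.

```python
import math


def truncated_abs_moment(t, eps, n=20000, rate=1.0):
    """Midpoint-rule value of the integral of u^{-1} * rate * exp(-rate*u) over [eps, t].

    This equals E[|S_t|; tau > eps] when tau is exponential with the given rate.
    The substitution u = exp(x) removes the 1/u factor, so the integrand
    becomes rate * exp(-rate * exp(x)) on [log eps, log t].
    """
    a, b = math.log(eps), math.log(t)
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        total += rate * math.exp(-rate * math.exp(x))
    return total * h


cutoffs = (1e-2, 1e-4, 1e-6, 1e-8)
values = {eps: truncated_abs_moment(1.0, eps) for eps in cutoffs}
for eps in cutoffs:
    print(f"eps = {eps:g}: truncated E|S_1| = {values[eps]:.4f}")
```

Each reduction of the cutoff by a factor of 100 adds roughly log(100) ≈ 4.6 to the truncated moment, so the values increase without bound. This is the logarithmic divergence behind (19), and it is the reason why no conditional expectation of S can be formed in the usual sense.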

If $S$ is bounded, then it is straightforward to verify that the validity of inequality (21), for all admissible $H$, is tantamount to $S$ being a martingale. It follows that, if $S$ is locally bounded, then the validity of inequality (21), for all admissible $H$, is tantamount to $S$ being a local martingale. Hence, we do not find anything new by using the notion of separating measure in the context of locally bounded semimartingales $S$. However, for semimartingales $S$ that are not locally bounded, we do find something new; as observed above, if $S$ is a sigma-martingale under $Q$ then inequality (21) holds true, for all admissible $H$. But the converse does not hold true. The difference is illustrated by the subsequent easy one-period example. To stay in line with the present notation, we write it as an example in continuous time.

Example 2 Let $X$ be an $\mathbb{R}$-valued random variable, defined on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$, which is unbounded from above and from below. For example, we may choose $X$ to be normally distributed. The process $S = (S_t)_{0 \le t \le 1}$ is defined as

$$S_t = \begin{cases} 0, & 0 \le t < 1 \\ X, & t = 1 \end{cases} \tag{22}$$

Defining $(\mathcal{F}_t)_{0 \le t \le 1}$ as the filtration generated by $S = (S_t)_{0 \le t \le 1}$, we find that the only $(\mathcal{F}_t)_{0 \le t \le 1}$-predictable processes are the constant processes $H = (H_t)_{0 \le t \le 1}$. Among those, the only $S$-admissible predictable process is $H = 0$. Indeed, if $H \equiv \text{const} \neq 0$, the process $H \cdot S$ is not bounded from below in the sense of inequality (14).

The condition (21) therefore is trivially satisfied, for each probability measure $Q$ equivalent to $\mathbb{P}$. On the other hand, $S$ is a martingale (or, equivalently, a local or a sigma-martingale) under $Q$ if and only if

$$\mathbb{E}_Q[X] = \mathbb{E}_Q[S_1] = 0 \tag{23}$$

Hence we see that, in this easy example, the class of separating measures $Q$ is strictly bigger than the class of probability measures $Q$, equivalent to $\mathbb{P}$, under which $S$ is a sigma-martingale.

Where does the nomenclature “separating measure” come from? This concept arises naturally as an intermediary step in the proof of the Fundamental Theorem of Asset Pricing (compare [9] for a careful analysis of the arguments in [3] and [4] and, in particular, for the introduction of the name “separating measure”).

In the context of this theorem, after surmounting some difficulties, an application of the Hahn–Banach theorem plus an exhaustion argument due to J. Yan ([15], compare also [14]) provides a $\sigma^*$-continuous, linear functional $F : L^\infty(\Omega, \mathcal{F}, \mathbb{P}) \to \mathbb{R}$ which strictly separates the set of random variables of the form$^{e}$

$$(H \cdot S)_T = \int_0^T (H_t, dS_t) \tag{24}$$

where $H$ runs through the admissible integrands, from $L^\infty_+(\Omega, \mathcal{F}, \mathbb{P}) \setminus \{0\}$, that is, the positive orthant with the origin 0 deleted. Normalizing the functional $F$ by $F(\mathbf{1}) = 1$, this translates into the fact that $F$ is of the form

$$F(g) = \mathbb{E}_Q[g], \quad g \in L^\infty(\Omega, \mathcal{F}, \mathbb{P}) \tag{25}$$

where $Q$ is a separating measure.

If the process $S$ is bounded (respectively, locally bounded), it immediately follows that $S$ is a martingale (respectively, a local martingale) under this separating measure $Q$, which concludes the proof of the fundamental theorem of asset pricing (see [3]).

If, however, $S$ fails to be locally bounded, then we cannot conclude that $S$ is “some kind of martingale” under the separating measure $Q$, as is illustrated by Example 2 above. Some further work is needed (which was carried out in [4]) to pass from the separating measure $Q$ to a probability measure $\tilde{Q}$, which is equivalent to $\mathbb{P}$ and under which $S$ is a sigma-martingale. It turns out that, in the setting of the fundamental theorem of asset pricing [4], the set of such sigma-martingale measures is dense with respect to $\|\cdot\|_1$ in the set of separating measures for $S$. In particular, this set is nonempty, provided we have found a separating measure. This argument concludes the proof of the fundamental theorem of asset pricing also in the case of a general, $\mathbb{R}^d$-valued semimartingale $S$.

End Notes

a. For continuous local martingales $(M_t)_{t \ge 0}$ starting at $M_0 = 0$ the choice of stopping times via equation (9) always works, that is, gives a sequence $(\tau_n)_{n=1}^{\infty}$ satisfying the requirements of (ii) above. In the case of càdlàg local martingales this is not true any more and one may give examples of local martingales where equation (9) does not define a sequence of localizing stopping times.

b. The name is derived from the following fact: let $(B_t)_{0 \le t \le 1} = (B_t^1, B_t^2, B_t^3)_{0 \le t \le 1}$ be an $\mathbb{R}^3$-valued standard Brownian motion starting at $B_0 = (B_0^1, B_0^2, B_0^3) = (1, 0, 0)$. Let $R_t = \|B_t\|^{-1}$ where $\|\cdot\|$ denotes the Euclidean norm on $\mathbb{R}^3$. Then $(R_t)_{0 \le t \le 1}$ satisfies equation (10), where $(W_t)_{0 \le t \le 1}$ is a one-dimensional Brownian motion adapted to the filtration generated by the three-dimensional Brownian motion $(B_t)_{0 \le t \le 1}$. We refer to [11] for a beautiful presentation of the theory of Bessel processes (compare also [5]).

c. It is easy to verify that in Proposition 1 (as well as in Definition 1 (iii)), we may assume without loss of generality that $\varphi$ takes its values in $]0, \infty[$ (or, equivalently, in $[1, \infty[$). See [4, Prop. 2.5] for details.

d. A precise statement is that the processes $S$ and $-S$ have the same law, which obviously is the case.

e. To be precise, we have to consider the random variables $(H \cdot S)_T \wedge C$, where $C$ runs through $\mathbb{R}_+$, to make sure that these random variables are in $L^\infty(\Omega, \mathcal{F}, \mathbb{P})$.

References

[1] Ansel, J.P. & Stricker, C. (1994). Couverture des actifs contingents et prix maximum, Annales de l’Institut Henri Poincaré – Probabilités et Statistiques 30, 303–315.
[2] Chou, C.S. (1977/78). Caractérisation d’une classe de semimartingales, in Séminaire de Probabilités XIII, Springer Lecture Notes in Mathematics, Springer, Vol. 721, pp. 250–252.
[3] Delbaen, F. & Schachermayer, W. (1994). A general version of the fundamental theorem of asset pricing, Mathematische Annalen 300, 463–520.
[4] Delbaen, F. & Schachermayer, W. (1998). The fundamental theorem of asset pricing for unbounded stochastic processes, Mathematische Annalen 312, 215–250.
[5] Delbaen, F. & Schachermayer, W. (1995). Arbitrage possibilities in Bessel processes and their relations to local martingales, Probability Theory and Related Fields 102, 357–366.
[6] Delbaen, F. & Schachermayer, W. (2006). The Mathematics of Arbitrage, Springer Finance, p. 371. ISBN: 3-540-21992-7.
[7] Émery, M. (1980). Compensation de processus à variation finie non localement intégrables, in Séminaire de Probabilités XIV, J. Azema & M. Yor, eds, Springer Lecture Notes in Mathematics, Springer, Vol. 784, pp. 152–160.
[8] Harrison, J.M. & Pliska, S.R. (1981). Martingales and stochastic integrals in the theory of continuous trading, Stochastic Processes and their Applications 11, 215–260.
[9] Kabanov, Y.M. (1997). On the FTAP of Kreps-Delbaen-Schachermayer (English), in Statistics and Control of Stochastic Processes, Y.M. Kabanov, B.L. Rozovskii & A.N. Shiryaev, eds, The Liptser Festschrift. Papers from the Steklov Seminar held in Moscow, Russia, 1995–1996, World Scientific, Singapore, pp. 191–203.
[10] Protter, P. (1990). Stochastic integration and differential equations. A new approach, in Applications of Mathematics, (2nd edition, 2003, corrected third printing: 2005) Springer-Verlag, Berlin, Heidelberg, New York, Vol. 21.
[11] Revuz, D. & Yor, M. (1991). Continuous martingales and Brownian motion, in Grundlehren der Mathematischen Wissenschaften, 3rd edition, 1999, corrected third printing: 2005, Springer, Vol. 293.
[12] Rogers, L.C.G. & Williams, D. (2000). Diffusions, Markov Processes and Martingales, Cambridge University Press, Vol. I and II.
[13] Shreve, S. (2004). Stochastic calculus for finance, Springer Finance I, II, 208, 550.
[14] Stricker, Ch. (1990). Arbitrage et Lois de martingale, Annales de l’Institut Henri Poincaré – Probabilités et Statistiques 26, 451–460.
[15] Yan, J.A. (1980). Caractérisation d’une classe d’ensembles convexes de $L^1$ ou $H^1$, in Séminaire de Probabilités XIV, J. Azema & M. Yor, eds, Springer Lecture Notes in Mathematics, Springer, Vol. 784, pp. 220–222.

Related Articles

Arbitrage Strategy; Complete Markets; Free Lunch; Fundamental Theorem of Asset Pricing; Martingales; Minimal Entropy Martingale Measure; Minimal Martingale Measure; Risk-neutral Pricing; Second Fundamental Theorem of Asset Pricing.

WALTER SCHACHERMAYER
Second Fundamental Theorem of Asset Pricing

The second fundamental theorem of asset pricing concerns the mathematical characterization of the economic concept of market completeness for liquid and frictionless markets with an arbitrary number of assets. The theorem establishes the mathematical necessary and sufficient conditions in order to guarantee that every contingent claim on the market can be duplicated with a portfolio of primitive assets. For finite asset economies, completeness (i.e., perfect replication of every claim on the market by admissible self-financing strategies) is equivalent to uniqueness of the equivalent martingale measure. This result can be extended to market models with an infinite number of assets by defining completeness in terms of approximate replication of claims by attainable ones. Hence several definitions of completeness are possible, and in the sequel we will present and discuss them extensively.

Finite Number of Assets

The second fundamental theorem appeared in [9] under the assumption that the interest rate is zero and that the agent employs only simple trading strategies to address the following issue, raised in the economic literature [1, 20, 22]: given a financial market, which contingent claims are “spanned” by a given set of market securities?

In the seminal paper [7], it was already observed that in the idealized Black–Scholes market the cash flow of an option can be duplicated by managing a portfolio containing only stock and bond. A natural question is then as follows: for which contingent claim does this result hold in more general markets? When does it hold for all contingent claims on the market?

For markets with a finite number of asset prices, the answer to this problem was provided for the first time in [10, 11]. Here we follow the notation of [11] in order to state the second fundamental theorem.

Let $T < \infty$ be a fixed time horizon; consider a probability space $(\Omega, \mathcal{F}, P)$ endowed with a filtration $(\mathcal{F}_t)_{t \in [0,T]}$ satisfying the usual conditions and such that $\mathcal{F}_0$ contains only $\Omega$ and the null sets of $P$ and with $\mathcal{F}_T = \mathcal{F}$. Let $S = (S_t^0, \ldots, S_t^d)_{t \in [0,T]}$ be a $(d+1)$-dimensional strictly positive semimartingale, whose components $S^0, \ldots, S^d$ are right continuous with left limits. Moreover, we assume that $S_0^0 = 1$. Here, the stochastic process $S_t^k$ represents the value at time $t$ of the $k$th security on the market. The discounted price process $Z = (Z_t^1, \ldots, Z_t^d)_{t \in [0,T]}$ is then defined by setting $Z^k = S^k / S^0$, for $k = 1, \ldots, d$.

Let $\mathbb{P}$ be the set of probability measures $Q$ on $(\Omega, \mathcal{F})$ that are equivalent to $P$ and such that $Z$ is a (vector) martingale under $Q$. We assume that $\mathbb{P}$ is not empty, that is, that the market is arbitrage free (see Fundamental Theorem of Asset Pricing). We fix an element $P^*$ in $\mathbb{P}$ and denote by $E^*$ the expectation under $P^*$. Let $L(Z)$ denote the set of all vector-valued, predictable processes $H = (H_t^1, \ldots, H_t^d)_{t \in [0,T]}$ that are integrable with respect to the semimartingale $Z$. For further details on $L(Z)$, we refer to Remark 1 below.

Definition 1 A stochastic process $\phi \in L(Z)$ is said to be an admissible self-financing strategy if

(i) the discounted value process $V^*(\phi) := \sum_{k=1}^d \phi^k Z^k$ is almost surely nonnegative;
(ii) $V^*(\phi)$ satisfies the self-financing condition

$$V_t^*(\phi) = V_0^*(\phi) + \int_0^t \sum_{k=1}^d \phi_s^k \, dZ_s^k, \quad t \in [0,T]; \tag{1}$$

(iii) $V^*(\phi)$ is a martingale under $P^*$.

Condition (iii) is introduced here to rule out “certain foolish strategies that throw out money” [11], that is, for no-arbitrage reasons. Note also that in the preceding definition only the last condition may depend on the choice of the reference measure $P^*$.

A contingent claim $X$ with maturity $T$ is then represented by a nonnegative ($\mathcal{F}_T$-measurable) random variable. Such a claim is said to be attainable if there exists an admissible trading strategy $\phi$ such that $V_T^*(\phi) = X / S_T^0$. The model is said to be complete if every claim$^{a}$ is attainable.

Theorem 1 (The second fundamental theorem of asset pricing, [11]). Let $\mathbb{P} \neq \emptyset$. Then the following statements are equivalent:

(i) The model is complete under $P^*$.

(ii) Every $P^*$-martingale $M$ can be represented in the form

$$M_t = M_0 + \int_0^t \sum_{k=1}^d H_s^k \, dZ_s^k, \quad t \in [0,T] \tag{2}$$

for some $H \in L(Z)$ (predictable representation property).
(iii) $\mathbb{P}$ is a singleton, that is, there exists a unique equivalent martingale measure for $Z$.

The proof of this theorem relies on some results of [12, 14], Chapter XI, relating the representation property (2) to a condition involving a certain set of probability measures.

Remark 1 In Theorem 1 the definition of the space $L(Z)$ is crucial, as shown by a counterexample in [19]. From reference [16] we obtain that $L(Z)$ must be the largest class of integrands over which multidimensional integrals with respect to $Z$ can be defined, as done implicitly in [11]. Hence by Theorem 4.6 of [12] we have that $L(Z)$ is the space of the vector-valued, predictable processes $H = (H_t^1, \ldots, H_t^d)_{t \in [0,T]}$ such that

$$\int_0^t \sum_{i,j=1}^d H_s^i H_s^j \, d[Z^i, Z^j]_s, \quad t \in [0,T] \tag{3}$$

is locally integrable.

Completeness can be easily characterized in some particular cases, as shown by the following examples.

Example 1 Consider a market with a finite number of assets in discrete time $\{0, \ldots, T\}$ and let $\mathcal{P}_t$ be the partition of $\Omega$ underlying $\mathcal{F}_t$. For each cell $A$ of $\mathcal{P}_t$, $t \in \{0, \ldots, T-1\}$, we define as splitting index of $A$ the number $K_t(A)$ of cells of $\mathcal{P}_{t+1}$ that are contained in $A$. Then completeness can be characterized as follows.

Proposition 1 (Proposition 2.12 of [10]). Let $\mathbb{P} \neq \emptyset$ and suppose that the securities are not redundant.$^{b}$ Then the model is complete if and only if $K_t(A) = d + 1$ for all $A \in \mathcal{P}_t$ and $t = 0, \ldots, T-1$.

Hence completeness is a matter of dimension. Corollary 4.2 of [23] shows that if the market is complete, then the splitting index $K_t(A)$ is determined by the price process $S$ only, that is, for every $t = 0, \ldots, T$ and each $A \in \mathcal{P}_t$, we have $K_t(A) = \dim(\operatorname{span}\{S_{t+1}(\omega) : \omega \in A\})$. Hence it is sufficient to check if the rank of the matrix with columns formed by the vectors $S_{t+1}(\omega)$, $\omega \in A$, equals the splitting index $K_t(A)$ of $A$. By using this geometric property of the sample paths of the price process, an algorithm is then provided in [23] to check if finite securities markets in discrete time are complete.

Example 2 In the case when security prices follow Itô processes on a multidimensional Brownian filtration, completeness of the market can be characterized in terms of the volatility matrix of the underlying asset prices, as shown in [3, 16, 18]. Consider a market with $d$ risky assets given by Itô processes of the form

$$S_t^i = S_0^i \exp\left( \int_0^t \alpha_s^i \, ds - \frac{1}{2} \int_0^t \sum_{j=1}^n (\sigma_s^{ij})^2 \, ds + \sum_{j=1}^n \int_0^t \sigma_s^{ij} \, dW_s^j \right), \quad t \in [0,T] \tag{4}$$

$i = 1, \ldots, d$, on the probability space $(\Omega, \mathcal{F}, P)$ endowed with the (augmented) natural filtration $(\mathcal{F}_t)_{t \in [0,T]}$ generated by the $n$-dimensional Brownian motion $W = (W_t^1, \ldots, W_t^n)_{t \in [0,T]}$ with $\mathcal{F}_T = \mathcal{F}$. Here $S^0$ can be assumed constantly equal to 1 for the sake of simplicity. For $t \in [0,T]$ we denote by $\Sigma_t(\omega)$ the (random) volatility matrix, whose entries are given by

$$[\Sigma_t(\omega)]_{ij} = \sigma_t^{ij}(\omega), \quad i = 1, \ldots, d, \; j = 1, \ldots, n \tag{5}$$

If for all $i = 1, \ldots, d$, $S_0^i$ is a positive constant, $(\alpha_t^i)_{t \in [0,T]}$ an adapted stochastic process with

$$\int_0^T |\alpha_s^i| \, ds < \infty, \quad \text{a.s.} \tag{6}$$

and $(\sigma_t^{ij})_{t \in [0,T]}$ are adapted stochastic processes with

$$\int_0^T (\sigma_s^{ij})^2 \, ds < \infty, \quad \text{a.s.} \tag{7}$$

for $j = 1, \ldots, n$, then the following characterization of market completeness holds.

Theorem 2 (Theorem 4 of [3], Theorem 2.2 and 3.2 of [16]). Let $\mathbb{P} \neq \emptyset$. Then the market is complete if and only if $P(\operatorname{rank}(\Sigma_t) = d \text{ for almost all } t \in [0,T]) = 1$.

For further references, see also Theorem 4.1 of [18]. Since there are $n$ sources of randomness represented by the Brownian motions, it is natural to expect that $n$ sufficiently independent asset prices are needed for completeness. Clearly, if $d < n$ the market cannot be complete.

Example 3 If price processes are discontinuous but with a finite number of jump sizes, then we obtain again a characterization of completeness in terms of the volatility matrix, as shown by the following theorem attributed to Bättig [3]. We set again $S^0 = 1$ and consider price processes driven by a multivariate point process$^{c}$ $\mu$ with compensator $\nu(dt, dx) = K_t(dx)\, dt$ such that

$$S_t^i = S_0^i \, \mathcal{E}(R^i)_t, \quad t \in [0,T], \; i = 1, \ldots, d \tag{8}$$

with

$$R_t^i = \int_0^t \alpha_s^i \, ds + \int_{[0,t] \times E} \sigma^i(u, x) \big( \mu(du, dx) - \nu(du, dx) \big), \quad t \in [0,T], \; i = 1, \ldots, d, \tag{9}$$

where the $\sigma^i(t, x)$s are bounded $d\mu \otimes dP$ a.e., $\mathcal{E}$ is the Doléans exponential (for the definition, we refer to Theorem I.4.61 of [13]), and $E \subset \mathbb{R}$. Note that here $\sigma^i$, $\mu$, and $\nu$ may depend on $\omega$, but for the sake of simplicity we do not indicate this dependence. In this context, asset prices may have jumps that can be thought of as the result of possible shocks that may trigger the market. If the cardinality $|E|$ of $E$ is finite, we denote again by $\Sigma_t$ the volatility matrix, whose row vectors are given by $(\sigma^i(t, x))_{x \in E}$, $i = 1, \ldots, d$.

Theorem 3 (Theorem 5 of [3]). Let $\mathbb{P} \neq \emptyset$, $|E| < \infty$ and $K_t(\{x\}) > 0$ for every $x \in E$. Then the market is complete if and only if $P(\operatorname{rank}(\Sigma_t) = |E| \text{ for almost all } t \in [0,T]) = 1$.

Furthermore, in the case of a finite number of jumps that may trigger the economy, the characterization of market completeness is similar to the Itô price process case, that is, one needs $|E|$ sufficiently independent processes for completeness in presence of the $|E|$ sources of randomness, given by the $|E|$ different possible shocks.

We have seen that the key to completeness is the predictable representation property. Hence, a natural question concerns the kind of martingales for which the predictable representation property is satisfied. For the continuous case, we have that the predictable representation property holds for diffusion processes that are martingales and have either Lipschitz coefficients [24] or a nondegenerate diffusion matrix and continuous coefficients [14]. The only one-dimensional martingales with stationary and independent increments that satisfy the predictable representation property are the Wiener and the Poisson martingales [25]. Hence the representation property holds for finite Lévy measures, but it fails for infinite Lévy measures. In the next section, we discuss the second fundamental theorem in the case of infinite dimensional financial markets.

Infinite Number of Assets

Many applications of hedging involve dynamic trading in principle in infinitely many securities, for example, in pricing of interest rate derivatives by using pure discount bonds or in the use of the term and strike structure of European put and call options to hedge exotic derivatives, when asset prices are driven by Lévy measures. Hence it is natural to develop infinite dimensional market models to address this kind of issue. The problem now is to establish whether the second fundamental theorem still holds if the market is endowed with an infinite number of assets.

By defining a complete market via the density of a vector space, the second fundamental theorem is proved in [8] to hold true for (infinitely many) continuous and bounded asset price processes, if all the martingales with respect to the reference filtration $(\mathcal{F}_t)$ are continuous ([8], Theorem 6.7). In the case of a general filtration, Theorem 6.5 of [8] states that completeness is equivalent to $P^*$ being an extreme point of $\mathbb{P}$, that is, a weaker version of the second fundamental theorem holds.

The hypothesis of continuity cannot be dropped and in the presence of jumps (discontinuities) and infinitely many assets, a counterexample to the second fundamental theorem is provided in [2], where an economy with infinitely many assets is constructed,

in which the market is complete; yet, there exists an infinity of equivalent martingale measures. Since the formulation of this counterexample, many papers have studied the problem of extending the result of the second fundamental theorem to markets with infinitely many assets. Since many definitions of completeness are possible, the solution to the counterexample of [2] relies on the choice of the definition of completeness that is adopted. A first answer to this problem was provided in 1997 by Björk et al. [5, 6], where Theorem 6.11 shows that in the presence of infinitely many assets and a continuum of jump sizes, the uniqueness of the equivalent martingale measure is equivalent to the market being approximately complete, that is, every bounded contingent claim can be approached in $L^2(Q)$ for some $Q \in \mathbb{P}$ by a sequence of hedgeable claims.

In 1999, a number of papers appeared [3, 4, 15, 17] at the same time, where new definitions of market completeness were proposed in order to maintain the second fundamental theorem, even in complex economies. The equivalence between market completeness and uniqueness of the pricing measure is maintained by introducing a notion of market completeness that is independent both of the notion of no arbitrage and of a chosen equivalent martingale measure. In finite-dimensional markets, the definition of market completeness is given in terms of replicating value processes in economies without arbitrage possibilities and with respect to a given equivalent martingale measure. However, the issue of completeness is about the ability to replicate certain cash flows, and not about how these cash flows are valued or whether these values are arbitrage free. From this perspective, the appropriate measure to address the issue of completeness is the statistical probability measure $P$, and not an equivalent martingale measure that may also not exist. In reference [17], this new approach was also motivated by the empirical asset pricing literature. Moreover, an example in [3] shows an economy where the existence of an equivalent martingale measure precludes the possibility of market completeness. Hence in references [3, 4, 15, 17], the concept of exact (almost everywhere) replication of a contingent claim via an admissible portfolio is substituted by the notion of approximation of a contingent claim. The main outlines of this approach are the following.

Let $\mathbb{M}$ denote the space of the $P$-absolutely continuous signed measures on $\mathcal{F}_T$. Then $Q \in \mathbb{M}$ can be interpreted as a market agent’s personal way of assigning values to claims, that is, the set $\mathbb{M}$ represents the possible contingent claims valuation measures held by traders. An agent using the valuation measure $Q \in \mathbb{M}$ assigns to a contingent claim $H$ the value $\int H \, dQ$. The fact that $\mathbb{M}$ is given by the $P$-absolutely continuous signed measures on $\mathcal{F}_T$ has two particular meanings: first that all traders agree on null events, and second, that there can be strictly positive random variables with negative personal value. For a given trader, represented by $Q \in \mathbb{M}$, two contingent claims $H_1$ and $H_2$ are approximately equal if

$$\left| \int (H_1 - H_2) \, dQ \right| < \varepsilon \quad \text{for small } \varepsilon > 0 \tag{10}$$

Denote the space of all bounded contingent claims by $C$. The finite intersections of the sets of the form $B(H_1, \varepsilon) = \{ H_2 \in C : | \int (H_1 - H_2) \, dQ | < \varepsilon \}$, $H_1 \in C$, and $\varepsilon > 0$, give a basis for a topology $\tau^Q$ on $C$. We endow $C$ with the coarsest topology $\tau$ finer than all of the $\tau^Q$, $Q \in \mathbb{M}$. This topology is now agent independent, that is, two claims are approximately equal if all the agents believe that their values are close. The topology $\tau$ is usually referred to as the weak* topology on $C$ [21].

An agent is then allowed to trade in a finite number of assets via self-financing, bounded, stopping time simple strategies that yield a bounded payoff at $T$. As in the previous section, a (bounded) claim is said to be attainable if it can be replicated by one of such strategies. In this setting, the market is said to be quasicomplete if any contingent claim $H \in C$ can be approximated by attainable claims in the weak* topology induced by $\mathbb{M}$ on $C$. Since the weak* topology as well as the trading strategies are agent measure independent, the same is true for this notion of completeness. Consider now the space $\mathbb{M}^{\pm}$ of the $P$-absolutely continuous signed martingale measures. Then the following generalized version of the second fundamental theorem holds.

Theorem 4 (The second fundamental theorem of asset pricing, Theorem 2 of [3], Theorem 1 of [4], Theorem 5 of [17]). Let $\mathbb{M}^{\pm} \neq \emptyset$. Then there exists a unique $P$-absolutely continuous signed martingale measure if and only if the market is quasicomplete.

The proof of this theorem relies on the theory of linear operators between locally convex topological vector spaces.

Since the market is endowed with an infinite number of assets, in principle, trading in infinitely many assets may be possible. To take this possibility into account, in [5, 6, 15, 17] portfolios consisting of infinitely many assets are allowed by considering measure-valued strategies. The result of Theorem 4 still holds in the case of market models where measure-valued strategies are allowed as shown in Theorem 6.11 of [5] and Theorem 2.1 of [15].

This approach resolves the paradox of the counterexample of [2], since the economy considered in [2] is incomplete under this new definition of market completeness. Moreover, if $\mathbb{P} \neq \emptyset$ and the number of assets is finite or the asset prices are given by continuous processes, then Theorem 5 of [4] shows that the market model is quasicomplete if and only if it is complete.

End Notes

a. We say that a contingent claim is integrable if $E^*[X / S_T^0] < \infty$. By Definition 1, it follows that an attainable contingent claim is necessarily integrable. Hence we can restate the definition of market completeness as follows. The model is said to be complete if every integrable claim is attainable.

b. The price process is said to contain a redundancy if $P(\alpha \cdot S_{t+1} = 0 \mid A) = 1$ for some nontrivial vector $\alpha$, some $t < T$, and some $A \in \mathcal{P}_t$.

c. Let $E$ be a Blackwell space. An $E$-multivariate point process is an integer-valued random measure on $[0,T] \times E$ with $\mu([0,t] \times E) < \infty$ for every $\omega$, $t \in [0,T]$ (see Definition III.1.23 of [13]).

References

[1] Arrow, K. (1964). The role of securities in the optimal allocation of risk-bearing, Review of Economic Studies 31, 91–96.
[2] Artzner, P. & Heath, D. (1995). Approximate completeness with multiple martingale measures, Mathematical Finance 5, 1–11.
[3] Bättig, R. (1999). Completeness of securities market models–an operator point of view, The Annals of Applied Probability 9, 529–566.
[4] Bättig, R. & Jarrow, R.A. (1999). The second fundamental theorem of asset pricing: a new approach, The Review of Financial Studies 12, 1219–1235.
[5] Björk, T., Di Masi, G., Kabanov, Y. & Runggaldier, W. (1997). Towards a general theory of bond markets, Finance and Stochastics 1, 141–174.
[6] Björk, T.G., Kabanov, Y. & Runggaldier, W. (1997). Bond market structure in the presence of marked point processes, Mathematical Finance 7, 211–223.
[7] Black, F. & Scholes, M. (1973). The pricing of options and corporate liabilities, Journal of Political Economy 81, 637–659.
[8] Delbaen, F. (1992). Representing martingale measures when asset prices are continuous and bounded, Mathematical Finance 2, 107–130.
[9] Harrison, J.M. & Kreps, D.M. (1979). Martingales and arbitrage in multiperiod securities markets, Journal of Economic Theory 20, 381–408.
[10] Harrison, J.M. & Pliska, S.R. (1981). Martingales and stochastic integrals in the theory of continuous trading, Stochastic Processes and Their Applications 11, 215–260.
[11] Harrison, J.M. & Pliska, S.R. (1983). A stochastic calculus model of continuous trading: complete markets, Stochastic Processes and Their Applications 15, 313–316.
[12] Jacod, J. (1979). Calcul Stochastique et Problèmes des Martingales, Lecture Notes in Mathematics, No. 714, Springer-Verlag, Berlin, Heidelberg, New York.
[13] Jacod, J. & Shiryaev, A.N. (1987). Limit Theorems for Stochastic Processes, Springer-Verlag, Berlin, Heidelberg, New York.
[14] Jacod, J. & Yor, M. (1977). Etude des solutions extrémales et représentation intégrales des solutions pour certains problèmes des martingales, Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 38, 83–125.
[15] Jarrow, R.A., Jin, X. & Madan, D.B. (1999). The second fundamental theorem of asset pricing, Mathematical Finance 9, 255–273.
[16] Jarrow, R.A. & Madan, D.B. (1991). A characterization of complete security markets on a Brownian filtration, Mathematical Finance 1, 31–43.
[17] Jarrow, R.A. & Madan, D.B. (1999). Hedging contingent claims on semimartingales, Finance and Stochastics 3, 111–134.
[18] Londono, J.A. (2004). State tameness: a new approach for credit constrains, Electronic Communications in Probability 9, 1–13.
[19] Müller, S.M. (1989). On complete securities markets and the martingale property of securities, Economics Letters 31, 37–41.
[20] Ross, S. (1976). The arbitrage theory of capital asset pricing, Journal of Economic Theory 13, 341–360.
[21] Rudin, W. (1991). Functional Analysis, 2nd Edition, McGraw-Hill, New York.
[22] Stiglitz, J. (1972). On the optimality of the stock market allocation of investment, Quarterly Journal of Economics 86, 25–60.

[23] Taqqu, M.S. & Willinger, W. (1987). The analysis of finite security markets using martingales, Advances in Applied Probability 19, 1–25.
[24] Yamada, T. & Watanabe, S. (1971). On the uniqueness of solutions of stochastic differential equations, Journal of Mathematics of Kyoto University 11, 155–167.
[25] Yor, M. (1977). Remarques Sur la Représentation des Martingales Comme intégrales Stochastiques, Séminaire de probabilités de Strasbourg XI, Lecture Notes in Mathematics, No. 581, Springer, New York, pp. 502–517.

Related Articles

Equivalence of Probability Measures; Equivalent Martingale Measures; Fundamental Theorem of Asset Pricing; Hedging; Martingales; Martingale Representation Theorem.

FRANCESCA BIAGINI
Expected Utility Maximization: Duality Methods

Expected utility maximization has a long tradition in modern mathematical finance. It dates back to the 1950s [18] when it provided a theoretical foundation to the (Markowitz’s) mean–variance asset allocation method (see Risk–Return Analysis). The objective of a rational and risk-averse agent is captured by a concave function, the utility $U$ of the agent (see Utility Function). It is typically assumed that $U$ is increasing since the agent prefers more wealth to less. Given his/her $U$, the agent chooses the portfolio $P^*$ that maximizes the agent’s expected utility over a horizon $[0, T]$.

Some famous case studies are considered in [12, 13], where the agent is planning for retirement in a Black–Scholes (and thus complete) financial market (see Merton Problem). The complete market framework (see Complete Markets) is a convenient mathematical idealization as any conceivable risk can be hedged by cleverly investing in the market. As a consequence, independently of the specific utility of the agent, the price of any claim is also uniquely assigned since by the no-arbitrage principle it must coincide with the initial value of the hedging portfolio.

In the more realistic situation of an incomplete market, when there are, for example, intrinsic, nontraded sources of risk, both the valuation and the hedging problems become highly nontrivial issues. Expected utility maximization has also turned out to perform very well in the pricing problem in the general, incomplete market setup. The related pricing techniques are known as pricing by marginal utility and indifference pricing and are discussed briefly in this article (for more details see Utility Indifference Valuation).

The use of increasingly more complex probabilistic models of financial assets has continued to pose new mathematical challenges. If the setup is that of general non-Markovian diffusion or semimartingale models, direct methods from stochastic optimal control (as originally done by Merton and many others after him) become increasingly difficult to handle. As first suggested by Bismut [4], convex duality (see Convex Duality) is a powerful alternative approach. In the mid-1980s, with the works of Pliska [14], He and Pearson [8], Karatzas et al. [10], and Cox and Huang [5] this new methodology started to fully develop. Relying on convex duality (see Convex Duality) and martingale (see Martingales) methods, it enables the treatment of the most general cases. The price to pay for the achieved generality is that the results obtained have a mathematical existence–uniqueness–characterization form. As is always the case, explicit calculations require the specification of a (very) tractable model.

The presentation given here is based on the convex duality approach, in a general semimartingale model. For a treatment of the same problem with martingale methods in a diffusion context, see Expected Utility Maximization or [9].

Examples

Consider an agent who is a price taker, that is, his/her actions do not affect market prices, and whose goal is to trade dynamically in a financial market up to a horizon $T$, in order to achieve maximum expected utility. A host of features can be taken into account, such as the initial endowment, the possibility of intertemporal consumption, and the presence of a random endowment at time $T$. A list of various situations is given in the following. The mathematical details are discussed in the next section.

1. Maximizing Utility of Terminal Wealth

The preferences of the investor are represented by a von Neumann–Morgenstern utility function

$$U : \mathbb{R} \to [-\infty, +\infty) \tag{1}$$

which must be not identical to $-\infty$, increasing, and concave.

Typical examples are $U(x) = \ln x$, $U(x) = \frac{1}{\alpha} x^{\alpha}$ with $\alpha < 1$, $\alpha \neq 0$, where it is intended that $U(x) = -\infty$ outside the domain, and $U(x) = -\frac{1}{\gamma} e^{-\gamma x}$ with $\gamma > 0$.

No consumption occurs before time $T$. The agent has the initial endowment $x$ and can invest in the financial market. The resulting optimization problem is

$$\sup_{k \in K(x)} E[U(k)] \tag{2}$$
where K(x) is the set of random wealths that can be obtained at time T (terminal wealths) with initial wealth x.

The formulation of the problem with random endowment, namely, when the agent receives at T an additional cashflow B (say, an option), is the following:

$$\sup_{k \in K(x)} E[U(k + B)] \tag{3}$$

as his/her possible terminal wealths are now of the form k + B.

2. Maximizing Utility of Consumption

Suppose that the agent is not particularly interested in consumption at the terminal time T, but rather he/she is willing to consume over the entire planning horizon. A consumption plan C for the agent is determined by its random rate of consumption c(t) at time t for all t ∈ [0, T]. It is evident from the financial perspective that the rate c(t) must be nonnegative, so the consumption in the interval [t, t + dt] increases by the quantity c(t)dt. The goal of the agent is thus the selection of the best consumption plan over [0, T], starting with an initial endowment x ≥ 0. The utility function will now measure the degree of satisfaction with the intertemporal consumption or, better, with the rate of consumption. As this measure may change over time, the utility also depends on the time parameter:

$$U : [0, T] \times \mathbb{R} \to [-\infty, +\infty) \tag{4}$$

When t is fixed, U(t, ·) is a utility function with the same properties as in case 1. As the rate of consumption cannot be negative, U(t, x) = −∞ when x < 0. The agent may clearly benefit from the opportunity of investing in the financial market, so in general his/her position can be expressed by a consumption plan C and a dynamically changing portfolio P. If $X^{C,P}(t)$ is the total wealth of the position (C, P) at time t, then, as there is no inflow of cash, the variation of the wealth in [t, t + dt] must satisfy

$$dX^{C,P}(t) = -c(t)\,dt + dV^{P}(t) \tag{5}$$

where $dV^{P}(t)$ is the variation of the value of the portfolio P at time t due to market fluctuations. Let A(x) indicate the set of all such pairs (C, P) of consumption plans and portfolios when starting from the wealth level x. The maximization is then that of the expected integrated utility from the rate of consumption:

$$\sup_{(C,P) \in A(x)} E\left[\int_0^T U(t, c(t))\,dt\right] \tag{6}$$

3. Maximizing Utility of Terminal Wealth and Consumption

Alternatively, the agent may wish to maximize expected utility from terminal wealth and intertemporal consumption given his/her initial wealth x ≥ 0. Therefore, there are two utilities, U(·) and U(t, ·), from terminal wealth and from the rate of consumption, respectively. Let A(x) be the set of the possible pairs (C, P) of consumption plans and portfolios obtained with initial wealth x, and let $X^{C,P}(T)$ be the terminal wealth from the choice (C, P). Then the optimal consumption–investment is the couple (C*, P*) that solves

$$\sup_{(C,P) \in A(x)} \left\{ E\left[\int_0^T U(t, c(t))\,dt\right] + E\left[U\left(X^{C,P}(T)\right)\right] \right\} \tag{7}$$

The case selected in the following section for the illustration of the duality technique and the main results is the first, that is, utility maximization of terminal wealth. When intertemporal consumption is taken into account, similar results can be proved. In addition, case 3 turns out to be a superposition of cases 1 and 2, as shown in Chapters 3, 6 of [9].

Maximizing the Utility of (Discounted) Terminal Wealth

An analysis of any optimization problem relies on a precise definition of the domain of optimization and the objective function. Therefore, the study of maximization (2) requires a specification of

1. the financial market model and the admissible terminal wealths;
2. the technical assumptions on U; and
3. some joint condition on the market model and the utility function.

1. The financial market model considered is frictionless and consists of N risky assets and one
risk-free asset (money market account). Although it is not necessary, for the sake of convenience, it is assumed that the risk-free asset, $S^0$, is constantly equal to 1, that is, the prices are discounted. The N risky assets are globally indicated by $S = (S^1, \ldots, S^N)$. The trading can occur continuously in [0, T]. $S = (S_t)_{t \le T}$ is, in fact, an $\mathbb{R}^N$-valued, continuous-time process, defined on a filtered probability space $(\Omega, (\mathcal{F}_t)_{t \le T}, P)$. Since the wealth from an investment in this market is a (stochastic) integral, S is assumed to be a semimartingale so that the object “integral with respect to S” is mathematically well defined (see Stochastic Integrals). For expository reasons, S is a locally bounded semimartingale. This class of models is already very general, as all the diffusions are locally bounded semimartingales, as well as any jump-diffusion process with bounded jumps.

The agent has an initial endowment x and there are no restrictions on the quantities he/she can buy, sell, or sell short. $H_t = (H_t^1, \ldots, H_t^N)$ is the random vector with the number of shares of each risky asset that the agent holds in the infinitesimal interval [t, t + dt]. $B_t$ represents the number of shares of the risk-free asset held in the same interval. $H = (H_t)_t$ and $B = (B_t)_t$ are the corresponding processes and are referred to as the strategy of the agent. To be technically precise, H must be a predictable process and B a semimartingale. As there is no consumption and no infusion of money in the trading period [0, T], the wealth from a strategy (H, B) is the process X that solves

$$\begin{cases} dX_t = (H_t\,dS_t + B_t\,dS_t^0) = H_t\,dS_t \\ X_0 = x \end{cases} \tag{8}$$

or, in integral form, $X_t = x + \int_0^t H_s\,dS_s$. This can be equivalently stated as the strategy (H, B) is self-financing. Since $dS^0 = 0$, the self-financing condition enables a representation of the wealth X only in terms of H. This is the reason one typically refers to H only as the strategy.

As usual in continuous-time trading (see Fundamental Theorem of Asset Pricing), to avoid phenomena like doubling strategies, not every self-financing H is allowed. A self-financing strategy H is said to be admissible only if during the trading the losses do not exceed a finite credit line. That is, H is admissible if there exists some constant c > 0 such that

$$\text{for all } t \in [0, T], \quad \int_0^t H_s\,dS_s \ge -c \quad P\text{-a.s.} \tag{9}$$

so that for any x the wealth process $X = x + \int H_s\,dS_s$ is also bounded from below. Maximizing expected utility from terminal wealth means, in fact, maximizing expected utility from the set K(x) of those random variables $X_T$ that can be represented as $X_T = x + \int_0^T H_t\,dS_t$ with H admissible in the sense of equation (9).

Hereafter, the notation E[·] indicates P-expectation. When considering expectation under another probability Q, the notation is explicitly $E_Q[\cdot]$.

As shown by Delbaen and Schachermayer [7], a financially relevant set of probabilities is $M^e$, namely, the set of the equivalent (local) martingale probabilities for S. When the market is complete, this set consists of only one probability, but in the general, incomplete market case, this set is infinite. Under each probability $Q \in M^e$, S is a (local) martingale and thus Q is a risk-neutral probability. This is the theoretical justification for the use of each of these Qs as a pricing measure for any derivative claim B, with (arbitrage-free) price given by the expectation $E_Q[B]$.

However, we need the less restrictive set M of the absolutely continuous (local) martingale probabilities Q ≪ P for S, as this is the set that will show up in the dual problem. The set M can be characterized in the following manner:

$$M = \left\{ Q \ll P \;\middle|\; E_Q\left[\int_0^T H_t\,dS_t\right] \le 0 \quad \forall \text{ adm. } H \right\} \tag{10}$$

as the set of absolutely continuous probability measures that give nonpositive expectation to the terminal wealths from admissible self-financing strategies starting with zero wealth. Therefore, given any $X_T \in K(x)$ and any $Q \in M$,

$$E_Q[X_T] = E_Q\left[x + \int_0^T H_t\,dS_t\right] \le x \tag{11}$$

2. Hypothesis on U. As a case study, let us assume that U is finite valued on ℝ, that is, the wealth can become arbitrarily negative (the closest
references are [2, 16]). A typical example is the exponential utility. The reason we prefer the exponential utility (and all the other utilities with the properties listed below) to, for example, the logarithmic or the power utilities is that the dual problem is easier to interpret. References for the case when there are constraints on the wealth (then U is finite only on a half-line), like U(x) = ln x or $U(x) = \frac{1}{\alpha} x^{\alpha}$, are [11], [17], and the bibliography contained therein.

A main difficulty the reader may encounter when comparing this literature is that the language and style in the papers differ. Very recently, Biagini and Frittelli [3] proposed a unifying approach that works both for the case of U finite on all of ℝ and for the case of U finite only on a half-line. The result there is enabled by the choice of an innovative duality (an Orlicz space duality), naturally induced by the utility function U.

Regarding U, it is here required that

• U is strictly concave, strictly increasing, and differentiable over (−∞, +∞) and
• $\lim_{x \downarrow -\infty} U'(x) = +\infty$ and $\lim_{x \to +\infty} U'(x) = 0$ (these are known as the Inada conditions on the marginal utility U′).

In addition, U must satisfy the reasonable asymptotic elasticity condition RAE(U) introduced in [11, 16]:

$$\liminf_{x \to -\infty} \frac{x\,U'(x)}{U(x)} > 1, \qquad \limsup_{x \to +\infty} \frac{x\,U'(x)}{U(x)} < 1 \tag{12}$$

In the cited references, it is also shown that this condition is necessary and sufficient for the duality to work properly if U is fixed and one considers all possible financial markets. However, within a specific market model, one may state more general necessary and sufficient conditions on U that enable the duality approach and ensure the existence of the optimal investment. We choose to impose RAE(U) because it has the advantage that it can be easily verified. Note also that RAE(U) is already satisfied by the commonly used utility functions, for example, by the classic exponential function $U(x) = -\frac{1}{\gamma} e^{-\gamma x}$.

3. The convex conjugate V and a joint condition between preferences and the market. The conjugate V of U is the function

$$V(y) = \sup_x \{U(x) - yx\} \tag{13}$$

and, apart from some minus signs, it coincides with the Fenchel conjugate of U (see Convex Duality). Thus, V is a convex function, which is identically equal to +∞ when y < 0. It is also differentiable on (0, +∞) and its derivative is $V' = -(U')^{-1}$. Traditionally, the inverse of the marginal utility $(U')^{-1}$ is denoted by I. By mere definition of V, for all x, y, the Fenchel inequality holds

$$U(x) \le xy + V(y) \tag{14}$$

and the above relation is, in fact, an equality iff y = U′(x) or equivalently $x = (U')^{-1}(y) = I(y)$. Also note that

$$U(x) = \inf_y \{xy + V(y)\} = \inf_{y > 0} \{xy + V(y)\} \tag{15}$$

The typical example (and most used) is the following couple (U, V):

$$U(x) = -\frac{1}{\gamma} e^{-\gamma x}, \qquad V(y) = \begin{cases} \frac{1}{\gamma}(y \ln y - y) & y > 0 \\ 0 & y = 0 \\ +\infty & y < 0 \end{cases} \tag{16}$$

Let us recall that a probability Q absolutely continuous with respect to P is said to have finite generalized entropy (or, also, finite V-divergence), if its density $\frac{dQ}{dP}$ is integrable when composed with V:

$$E\left[V\left(\frac{dQ}{dP}\right)\right] < +\infty \tag{17}$$

The joint condition required between preferences and the market is actually a condition between V and the set of probabilities M, which is as follows:

Condition 1 There exists a $Q_0 \in M$ with finite generalized entropy, that is, $E\left[V\left(\frac{dQ_0}{dP}\right)\right] < +\infty$.

Duality in Complete Market Models

Suppose that the market is complete and arbitrage free. Thus, there exists a unique equivalent martingale measure $Q \in M^e$, which, by Condition 1, has also
finite generalized entropy. Let us restate problem (2), the primal problem

$$u(x) := \sup_{X_T \in K(x)} E[U(X_T)] \tag{18}$$

where u(x) denotes the optimal level of the expected utility. It is not difficult to derive an upper bound for u(x). From inequality (14), in fact, for all $X_T \in K(x)$ and for all y > 0

$$U(X_T) \le X_T\, y \frac{dQ}{dP} + V\left(y \frac{dQ}{dP}\right) \tag{19}$$

and taking P-expectations on both sides

$$E[U(X_T)] \le xy + E\left[V\left(y \frac{dQ}{dP}\right)\right] \tag{20}$$

because $E\left[X_T \frac{dQ}{dP}\right] = E_Q[X_T] \le x$. Therefore, taking the supremum over $X_T$ and the infimum over y,

$$u(x) = \sup_{X_T \in K(x)} E[U(X_T)] \le \inf_{y > 0} \left\{ xy + E\left[V\left(y \frac{dQ}{dP}\right)\right] \right\} \tag{21}$$

As noted by Merton, the above supremum is not necessarily reached over the restricted set of admissible terminal wealths K(x). Following a well-known procedure in the calculus of variations, a relaxation of the primal problem makes it possible to obtain the optimal terminal wealth. Here, this means enlarging K(x) slightly and considering the larger set

$$K_Q(x) := \{k \in L^1(Q) \mid E_Q[k] \le x\} \tag{22}$$

$K_Q(x)$ is simply the set of claims that have initial price smaller than or equal to the initial endowment x. An application of the separating hyperplane theorem gives that $K_Q(x)$ is the norm closure of $K(x) - L^1_+(Q)$ in $L^1(Q)$. Then, an approximation argument shows that the optimal expected value u(x) and

$$u_Q(x) := \sup_{k \in K_Q(x)} E[U(k)] \tag{23}$$

are, in fact, equal. The relaxed maximization problem over $K_Q(x)$ is much simpler than the original one over K(x). The replication-with-admissible-strategies issue has been removed and there is just an inequality constraint, given by the pricing measure Q. To find out the value $u_Q(x) = u(x)$, one can now apply the traditional Lagrange multiplier method to get

$$u_Q(x) = \sup_{k \in K_Q(x)} E[U(k)] = \sup_{k \in L^1(Q)} \inf_{y > 0} \{E[U(k)] + y(x - E_Q[k])\} \tag{24}$$

The dual problem is defined by exchanging the inf and the sup in the above expression:

$$\inf_{y > 0} \sup_{k \in L^1(Q)} \{E[U(k)] + y(x - E_Q[k])\} \tag{25}$$

From [15, Theorem 21] or from a direct computation, the inner sup is actually equal to

$$xy + E\left[V\left(y \frac{dQ}{dP}\right)\right] \tag{26}$$

so that the dual problem takes the traditional form

$$\inf_{y > 0} \left\{ xy + E\left[V\left(y \frac{dQ}{dP}\right)\right] \right\} \tag{27}$$

which is exactly the right-hand side of equation (21). Thanks to Condition 1, the dual problem is always finite valued and so is u. A priori, however, one has only the chain $u(x) = u_Q(x) \le \inf_{y > 0} \left\{ xy + E\left[V\left(y \frac{dQ}{dP}\right)\right] \right\}$, but under the current assumptions there is no duality gap:

$$u(x) = u_Q(x) = \inf_{y > 0} \left\{ xy + E\left[V\left(y \frac{dQ}{dP}\right)\right] \right\} \tag{28}$$

the infimum is a minimum and the supremum over $K_Q(x)$ is reached. In fact, the RAE(U) condition on the utility function U implies that $E\left[V\left(y \frac{dQ}{dP}\right)\right] < +\infty$ for all y > 0, so the infimum in (27) can be obtained by differentiation under the expectation sign. The dual minimizer y* (which depends on x) is then the unique solution of

$$x + E_Q\left[V'\left(y \frac{dQ}{dP}\right)\right] = 0 \tag{29}$$

or, equivalently, y* is the unique solution of

$$E_Q\left[I\left(y \frac{dQ}{dP}\right)\right] = x \tag{30}$$
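For the exponential couple (U, V) of equation (16), equation (30) can be solved by hand, which makes the absence of a duality gap easy to check numerically. The following is a minimal sketch under assumptions that are not in the text: a finite state space, with the physical probabilities `p` and the pricing probabilities `q` simply given as lists (all strictly positive), and a hypothetical function name `exp_utility_dual`. With I(y) = −(ln y)/γ, equation (30) gives ln y* = −γx − E_Q[ln dQ/dP].

```python
import math


def exp_utility_dual(x, gamma, p, q):
    """Complete-market duality for U(w) = -exp(-gamma*w)/gamma on a finite state space.

    x     : initial endowment
    gamma : risk-aversion parameter (> 0)
    p, q  : physical and pricing probabilities per state (illustrative inputs,
            all strictly positive so that Q is equivalent to P)
    Returns (y_star, k_star, primal_value, dual_value).
    """
    # Density dQ/dP in each state.
    z = [qi / pi for pi, qi in zip(p, q)]
    # E_Q[ln dQ/dP], the relative entropy of Q with respect to P.
    entropy = sum(qi * math.log(zi) for qi, zi in zip(q, z))

    # Equation (30) with I(y) = -ln(y)/gamma, solved in closed form.
    y_star = math.exp(-gamma * x - entropy)
    # Candidate optimal claim I(y* dQ/dP), state by state.
    k_star = [-math.log(y_star * zi) / gamma for zi in z]

    def utility(w):
        return -math.exp(-gamma * w) / gamma

    def conjugate(y):
        # V(y) for y > 0, as in equation (16).
        return (y * math.log(y) - y) / gamma

    primal_value = sum(pi * utility(ki) for pi, ki in zip(p, k_star))
    dual_value = x * y_star + sum(pi * conjugate(y_star * zi) for pi, zi in zip(p, z))
    return y_star, k_star, primal_value, dual_value


# Illustrative numbers only.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.35, 0.25]
y_star, k_star, primal_value, dual_value = exp_utility_dual(1.0, 2.0, p, q)
budget = sum(qi * ki for qi, ki in zip(q, k_star))  # E_Q of the candidate claim
```

In this setting the candidate claim costs exactly x under Q, and the primal and dual values agree up to rounding, both being equal to −y*/γ.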
Therefore, the (unique) optimal claim is $k^* = I\left(y^* \frac{dQ}{dP}\right)$ because it verifies the following:

• the balance equation $E_Q[k^*] = x$, so $k^* \in K_Q(x)$ and
• the Fenchel equality

$$U(k^*) = k^* y^* \frac{dQ}{dP} + V\left(y^* \frac{dQ}{dP}\right) \tag{31}$$

from which, by taking the P-expectations, we get

$$E[U(k^*)] = y^* E\left[\frac{dQ}{dP} k^*\right] + E\left[V\left(y^* \frac{dQ}{dP}\right)\right] = y^* x + E\left[V\left(y^* \frac{dQ}{dP}\right)\right] \tag{32}$$

which proves the main equality (28).

By market completeness, the martingale representation theorem applies, so that k* can be obtained via a self-financing strategy H*:

$$k^* = x + \int_0^T H_t^*\,dS_t \tag{33}$$

though H* is not admissible in general, that is, when optimally investing, the agent can incur arbitrarily large losses.

Moreover, as a function of x, the optimal value u(x) is also a utility function finite on ℝ, with the same properties as U. The duality equation (28) shows that u and

$$v(y) = \begin{cases} E\left[V\left(y \frac{dQ}{dP}\right)\right] & \text{if } y \ge 0 \\ +\infty & \text{otherwise} \end{cases} \tag{34}$$

are conjugate functions.

The relationship between the primal and dual optima can also be expressed as

$$\frac{dQ}{dP} = \frac{1}{y^*} U'(k^*) \tag{35}$$

which amounts to saying that $\frac{dQ}{dP}$ is proportional to one’s marginal utility from the optimal investment. Therefore, in the complete market case, pricing by taking Q-expectations coincides with the pricing by marginal utility principle, introduced in the option pricing context by Davis [6].

Duality in Incomplete Market Models

The same methodology applies to the incomplete market framework, but the technicalities require some more effort. The main results are (more or less intuitive) generalizations of what happens in the complete case, as summarized below (see [2, 16] for the proofs).

1. The duality relation is the natural generalization of equation (28):

$$u(x) = \sup_{X_T \in K(x)} E[U(X_T)] = \inf_{y > 0,\, Q \in M} \left\{ xy + E\left[V\left(y \frac{dQ}{dP}\right)\right] \right\} \tag{36}$$

and there exists a unique couple of dual minimizers y*, Q*.

2. As in the complete case, the supremum of the expected utility on K(x) may be not reached. $K_V(x)$ denotes the set of $k \in L^1(Q)$ such that $E_Q[k] \le x$ for all $Q \in M$ with finite generalized entropy. Then, the supremum of the expected utility on $K_V(x)$ coincides with the value u(x) and it is a maximum. The claim $k^* \in K_V$ attaining the maximum is unique and the relationship between primal and dual optima still holds

$$\frac{dQ^*}{dP} = \frac{1}{y^*} U'(k^*) \tag{37}$$

3. Q* may be not equivalent to P. However, in the case Q* ∼ P, k* can be obtained through a self-financing strategy H*, albeit not admissible in general.

4. The optimal value u as a function of the initial endowment x is a utility function, with the same properties as U. In fact, it is finite on ℝ, strictly concave, strictly increasing, it verifies the Inada conditions, and RAE(u) holds. The duality relation (36), rewritten as $u(x) = \inf_{y > 0} \{xy + v(y)\}$ with

$$v(y) = \begin{cases} \inf_{Q \in M} E\left[V\left(y \frac{dQ}{dP}\right)\right] & \text{if } y \ge 0 \\ +\infty & \text{otherwise} \end{cases} \tag{38}$$
shows that u and v are conjugate functions (see Convex Duality).

As Q* results from a minimax theorem, it is also known as the minimax measure. For the applications, it is important to know that there are easy sufficient conditions that guarantee that Q* is equivalent to P, such as the following: (i) U(+∞) = +∞ as noted in [1] or (ii) in case $U(x) = -\frac{1}{\gamma} e^{-\gamma x}$, the existence of a $Q \in M^e$ with finite generalized entropy (see [17] for an extensive bibliography).

When Q* is indeed equivalent to P, its selection in the class of all risk-neutral, equivalent probabilities $M^e$ as the pricing measure is economically motivated by its proportionality to the marginal utility from the optimal investment.

Utility Maximization with Random Endowment

Under all the conditions stated above (on the market, on U, and on both), suppose that the agent has a random endowment B at T, in addition to the initial wealth x. For example, B can be the payoff of a European option expiring at T. The agent’s goal is still the maximization of expected utility from terminal wealth, which now becomes

$$u(x, B) := \sup_{X_T \in K(x)} E[U(B + X_T)] \tag{39}$$

The duality results, in this case, are similar to the ones just shown. In fact,

$$u(x, B) = \min_{y > 0,\, Q \in M} \left\{ xy + y E_Q[B] + E\left[V\left(y \frac{dQ}{dP}\right)\right] \right\} \tag{40}$$

Note that the maximization without the claim can be seen as a particular case of the one above, with B = 0: u(x, 0) = u(x). The solution of a utility maximization problem with random endowment is the key step to the indifference pricing technique. The (buyer’s) indifference price of B is, in fact, the unique price $p_B$ that solves

$$u(x - p_B, B) = u(x, 0) \tag{41}$$

This means that the agent is indifferent, that is, he/she has the same (optimal expected) utility, between (i) paying $p_B$ at time t = 0 and receiving B at T and (ii) not entering into a deal for the claim B.

References

[1] Bellini, F. & Frittelli, M. (2002). On the existence of minimax martingale measures, Mathematical Finance 12/1, 1–21.
[2] Biagini, S. & Frittelli, M. (2005). Utility maximization in incomplete markets for unbounded processes, Finance and Stochastics 9, 493–517.
[3] Biagini, S. & Frittelli, M. (2008). A unified framework for utility maximization problems: an Orlicz space approach, Annals of Applied Probability 18/3, 929–966.
[4] Bismut, J.M. (1973). Conjugate convex functions in optimal stochastic control, Journal of Mathematical Analysis and Applications 44, 384–404.
[5] Cox, J.C. & Huang, C.F. (1989). Optimal consumption and portfolio policies when asset prices follow a diffusion process, Journal of Economic Theory 49, 33–83.
[6] Davis, M.H.A. (1997). Option pricing in incomplete markets, in Mathematics of Derivative Securities, M. Dempster & S.R. Pliska, eds, Cambridge University Press, pp. 216–227.
[7] Delbaen, F. & Schachermayer, W. (1994). A general version of the fundamental theorem of asset pricing, Mathematische Annalen 300, 463–520.
[8] He, H. & Pearson, N.D. (1991). Consumption and portfolio policies with incomplete markets and short-sale constraints: the infinite-dimensional case, Journal of Economic Theory 54, 259–304.
[9] Karatzas, I. & Shreve, S. (1998). Methods of Mathematical Finance, Springer.
[10] Karatzas, I., Shreve, S., Lehoczky, J. & Xu, G. (1991). Martingale and duality methods for utility maximization in an incomplete market, SIAM Journal on Control and Optimization 29, 702–730.
[11] Kramkov, D. & Schachermayer, W. (1999). The asymptotic elasticity of utility function and optimal investment in incomplete markets, Annals of Applied Probability 9/3, 904–950.
[12] Merton, R.C. (1969). Lifetime portfolio selection under uncertainty: the continuous-time case, The Review of Economics and Statistics 51, 247–257.
[13] Merton, R.C. (1971). Optimum consumption and portfolio rules in a continuous-time model, Journal of Economic Theory 3, 373–413.
[14] Pliska, S.R. (1986). A stochastic calculus model of continuous trading: optimal portfolios, Mathematics of Operations Research 11, 371–382.
[15] Rockafellar, R.T. (1974). Conjugate Duality and Optimization, Conference Board of Math. Sciences Series, SIAM Publications, No. 16.
[16] Schachermayer, W. (2001). Optimal investment in incomplete markets when wealth may become negative, Annals of Applied Probability 11/3, 694–734.
[17] Schachermayer, W. (2004). Portfolio Optimization in incomplete financial markets, Notes of the Scuola Normale Superiore di Pisa, Cattedra Galileiana, downloadable at http://www.fam.tuwien.ac.at/∼wschach/pubs/
[18] Tobin, J. (1958). Liquidity preference as behavior towards risk, Review of Economic Studies 25, 68–85.

Related Articles

Complete Markets; Convex Duality; Equivalent Martingale Measures; Expected Utility Maximization; Merton Problem; Second Fundamental Theorem of Asset Pricing; Utility Function; Utility Indifference Valuation.

SARA BIAGINI
Change of Numeraire

Consider a financial market model with nondividend paying asset price processes $(S^0, S^1, \ldots, S^N)$ living on a filtered probability space $(\Omega, \mathcal{F}, \mathbf{F}, P)$, where $\mathbf{F} = \{\mathcal{F}_t\}_{t \ge 0}$ and P is the objective probability measure. For general results concerning completeness, self-financing portfolios, martingale measures, and arbitrage, see Arbitrage Strategy; Fundamental Theorem of Asset Pricing; Risk-neutral Pricing.

We choose the asset $S^0$ as the numeraire asset, and we assume that $S_t^0 > 0$ with probability 1. From general theory, we know that (modulo integrability and technical conditions) the market is free of arbitrage if and only if there exists a measure $Q^0 \sim P$ such that the normalized price processes

$$\frac{S_t^0}{S_t^0}, \frac{S_t^1}{S_t^0}, \ldots, \frac{S_t^N}{S_t^0}$$

are $Q^0$ martingales. Using the notation $Z^i = S^i / S^0$, we thus also have, apart from the nominal price system $S^0, S^1, \ldots, S^N$, the normalized price system $Z^0, Z^1, \ldots, Z^N$. The economic importance of the normalized system is clarified by the following standard result.

Proposition 1 With notation as defined above the following hold.

• A portfolio is self-financing in the S system if and only if it is self-financing in the Z system.
• A portfolio is an arbitrage opportunity in the S system if and only if it is an arbitrage in the Z system.
• The S market is complete if and only if the Z market is complete.
• In the Z market, the asset $Z^0$ has the property that $Z_t^0 \equiv 1$, so it represents a bank account with zero interest rate.

If $X \in \mathcal{F}_T$ is a fixed contingent claim with exercise date T, and if we denote the (not necessarily unique) arbitrage-free price process of X by $\Pi_t[X]$, then by applying the above-mentioned result to the extended market $S^0, S^1, \ldots, S^N, \Pi_t[X]$ we see that $\Pi_t[X] / S_t^0$ is a $Q^0$ martingale, and using this fact together with the obvious fact that $\Pi_T[X] = X$ we obtain the basic pricing formula:

$$\Pi_t[X] = S_t^0\, E^0\left[\left.\frac{X}{S_T^0}\,\right|\, \mathcal{F}_t\right] \tag{1}$$

where $E^0$ denotes integration with respect to (w.r.t.) $Q^0$.

Very often one uses the bank account B with dynamics

$$dB_t = r_t B_t\,dt, \qquad B_0 = 1 \tag{2}$$

where r is the short rate, as numeraire. The corresponding martingale measure $Q^B$ is then often denoted by Q and referred to as “the risk neutral martingale measure”. In this case, the pricing formula becomes

$$\Pi_t[X] = E^Q\left[\left. e^{-\int_t^T r_s\,ds} X \,\right|\, \mathcal{F}_t\right] \tag{3}$$

In many concrete situations, the computational work needed for the determination of arbitrage-free prices can be drastically reduced by a clever choice of numeraire, and the purpose of this article is to analyze such changes.

To set the scene, we consider a fixed risk neutral martingale measure Q for the numeraire B, and an alternative numeraire asset $S^0$ with the corresponding martingale measure $Q^0$. Our first task is to find the measure transformation between Q and $Q^0$.

To see what $Q^0$ must look like, we consider a fixed time T and an arbitrarily chosen T-claim X. Assuming enough integrability we then know that, by using B as the numeraire, the arbitrage-free price of X at time t = 0 is given as

$$\Pi_0[X] = E^Q\left[\frac{X}{B_T}\right] \tag{4}$$

On the other hand, using $S^0$ as numeraire, the price is also given by the following formula:

$$\Pi_0[X] = S_0^0\, E^0\left[\frac{X}{S_T^0}\right] \tag{5}$$

Defining the likelihood process L by $L_t = dQ^0 / dQ$ on $\mathcal{F}_t$, we thus have

$$E^Q\left[\frac{X}{B_T}\right] = S_0^0\, E^Q\left[L_T \frac{X}{S_T^0}\right] \tag{6}$$
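Matching the two sides of equation (6) state by state suggests the candidate density $L_T = S_T^0 / (S_0^0 B_T)$. The following is a minimal sketch that checks this candidate in a one-period model with finitely many states; the function name `price_two_ways` and all numbers are illustrative assumptions, not part of the text. It prices a claim once with the bank account as numeraire, as in equation (4), and once with $S^0$ as numeraire, as in equation (5).

```python
def price_two_ways(q, bank_T, s0_T, payoff):
    """Price a T-claim with two numeraires in a one-period, finite-state model.

    q       : risk neutral probabilities per state (bank account numeraire, B_0 = 1)
    bank_T  : bank account value B_T per state
    s0_T    : value S^0_T of the alternative numeraire per state (strictly positive)
    payoff  : claim payoff X per state
    Returns (price_with_bank, price_with_s0, q0), where q0 are the probabilities
    of the states under the measure built from the candidate density L_T.
    """
    # Time-0 price of S^0: since S^0/B is a Q-martingale, S^0_0 = E^Q[S^0_T / B_T].
    s0_0 = sum(qi * s / b for qi, s, b in zip(q, s0_T, bank_T))

    # Candidate likelihood L_T = S^0_T / (S^0_0 * B_T) and the induced measure Q^0.
    likelihood = [s / (s0_0 * b) for s, b in zip(s0_T, bank_T)]
    q0 = [qi * li for qi, li in zip(q, likelihood)]

    # Equation (4): bank account as numeraire.
    price_with_bank = sum(qi * x / b for qi, x, b in zip(q, payoff, bank_T))
    # Equation (5): S^0 as numeraire, expectation taken under Q^0.
    price_with_s0 = s0_0 * sum(q0i * x / s for q0i, x, s in zip(q0, payoff, s0_T))
    return price_with_bank, price_with_s0, q0


# Illustrative numbers only.
price_bank, price_s0, q0 = price_two_ways(
    q=[0.3, 0.5, 0.2],
    bank_T=[1.05, 1.02, 1.08],
    s0_T=[90.0, 100.0, 120.0],
    payoff=[0.0, 5.0, 25.0],
)
```

In this sketch the two prices coincide up to rounding and the weights `q0` sum to one, so the candidate density does define a probability measure that reproduces the same price system.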
Since this holds for all $X \in \mathcal{F}_T$, we have the following basic result.

Proposition 2 Under the above-mentioned assumptions, the likelihood process L, defined as

$$L_t = \frac{dQ^0}{dQ}, \quad \text{on } \mathcal{F}_t, \quad 0 \le t \le T \tag{7}$$

is given by the formula

$$L_t = \frac{S_t^0}{S_0^0 \cdot B_t} \tag{8}$$

We note that since $S^0 / B$ is a Q martingale, the likelihood process L is also, as expected, a Q martingale.

As an immediate corollary we have the following.

Proposition 3 Assume that the $S^0$ dynamics under the Q measure are of the form

$$dS_t^0 = r_t S_t^0\,dt + S_t^0 \sigma_t\,dW_t^Q \tag{9}$$

where $W^Q$ is a d-dimensional Q Wiener process, r is the short rate, and σ is a d-dimensional optional row vector process. Then the dynamics for the likelihood process L are of the form

$$dL_t = L_t \sigma_t\,dW_t^Q \tag{10}$$

We can thus easily construct the relevant Girsanov transformation directly from the volatility of the $S^0$-process.

We can, in a straightforward manner, extend Proposition 3 to change from one numeraire measure $Q^0$ to another numeraire measure $Q^1$. The proof is obvious.

Proposition 4 Let $S^0$ and $S^1$ be two strictly positive numeraire assets with the corresponding martingale measures $Q^0$ and $Q^1$. Denote the likelihood process $L^{0,1}$ as

$$L_t^{0,1} = \frac{dQ^1}{dQ^0}, \quad \text{on } \mathcal{F}_t \tag{11}$$

Then $L^{0,1}$ is given by

$$L_t^{0,1} = \frac{S_t^1}{S_t^0} \cdot \frac{S_0^0}{S_0^1} \tag{12}$$

Remark 1 It may perhaps seem surprising that even in the case of an incomplete market, we obtain a unique martingale measure $Q^0$. In more detail, the situation is as follows.

• If the market is incomplete, then there will exist several risk neutral measures Q.
• Each of these measures generates a different price system, defined by the pricing formula (3).
• Choosing one particular Q is thus equivalent to choosing one particular price system.
• For a given numeraire $S^0$, there will also exist several different martingale measures $Q^0$.
• Each of these measures generates a different price system, defined by the pricing formula (1).
• If a risk neutral measure Q and thus a price system are fixed, there exists a unique measure $Q^0$ such that $Q^0$ generates the same price system as Q.
• The measure transformations considered here are precisely those corresponding to a change of measure within a given price system.

Pricing Homogeneous Contracts

Using a numeraire $S^0$ is particularly useful when the claim X is of the form $X = S_T^0 \cdot Y$, since then we obtain the following simple expression:

$$\Pi_t[X] = S_t^0\, E^0[Y \mid \mathcal{F}_t] \tag{13}$$

A typical example when this situation occurs is when dealing with derivatives defined in terms of several underlying assets. Assume, for example, that we are given two asset prices $S^0$ and $S^1$, and that the contract X to be priced is of the form $X = \Phi(S_T^0, S_T^1)$, where Φ is a given linearly homogeneous function. Using the standard machinery, we would have to compute the price as

$$\Pi_t[X] = E^Q\left[\left. e^{-\int_t^T r(s)\,ds}\, \Phi(S_T^0, S_T^1) \,\right|\, \mathcal{F}_t\right] \tag{14}$$

which essentially amounts to the calculation of a triple integral. If we instead use $S^0$ as numeraire we have

$$\Pi_t[X] = S_t^0\, E^0\left[\left.\frac{1}{S_T^0} \Phi(S_T^0, S_T^1) \,\right|\, \mathcal{F}_t\right] = S_t^0\, E^0\left[\left.\Phi\left(1, \frac{S_T^1}{S_T^0}\right) \,\right|\, \mathcal{F}_t\right]$$
$$= S_t^0\, E^0[\varphi(Z_T) \mid \mathcal{F}_t] \tag{15}$$

where $\varphi(z) = \Phi(1, z)$ and $Z_T = S_T^1 / S_T^0$. Note that the factor $S_t^0$ is the price of the traded asset $S^0$ at time t, so this quantity does not have to be computed: it can be directly observed on the market. Thus, the computational work is reduced to computing a single integral.

As an example, assume that we have two stocks, $S^0$ and $S^1$, with price processes of the following form under the objective probability P:

$$dS_t^0 = \alpha S_t^0\,dt + \sigma S_t^0\,d\tilde{W}_t^0 \tag{16}$$

$$dS_t^1 = \beta S_t^1\,dt + \delta S_t^1\,d\tilde{W}_t^1 \tag{17}$$

Here $\tilde{W}^0$ and $\tilde{W}^1$ are assumed to be independent P-Wiener processes, but it would also be easy to treat the case when there is a coupling between the two assets.

Under Q the price dynamics will be given as

$$dS_t^0 = r S_t^0\,dt + \sigma S_t^0\,dW_t^0 \tag{18}$$

$$dS_t^1 = r S_t^1\,dt + \delta S_t^1\,dW_t^1 \tag{19}$$

where $W^0$ and $W^1$ are Q-Wiener processes, and from Proposition 3 it follows that the Girsanov transformation from Q to $Q^0$ has a likelihood process with dynamics given as

$$dL_t = L_t \sigma\,dW_t^0 \tag{20}$$

The T-claim to be priced is an exchange option, which gives the holder the right, but not the obligation, to exchange one $S^0$ share for one $S^1$ share at time T. Formally, this means that the claim is given by $X = \max[S_T^1 - S_T^0, 0]$, and we note that we have a linearly homogeneous contract function. From equation (15), the price process is given as

$$\Pi_t[X] = S_t^0\, E^0[\max[Z_T - 1, 0] \mid \mathcal{F}_t] \tag{21}$$

with $Z_t = S_t^1 / S_t^0$. We are thus, in fact, valuing a European call option on $Z_T$, with strike price K = 1. By construction, Z will be a $Q^0$-martingale, and since a Girsanov transformation will not affect the volatility, it follows easily from equations (16) and (17) that the $Q^0$-dynamics of Z are given by

$$dZ_t = Z_t \sqrt{\sigma^2 + \delta^2}\,dW_t \tag{22}$$

where W is a standard $Q^0$-Wiener process. The price is thus given by the following formula:

$$\Pi_t[X] = S_t^0 \cdot c(t, Z_t) \tag{23}$$

Here c(t, z) is given directly by the Black–Scholes formula as the price of a European call option, valued at t, with time of maturity T, strike price K = 1, short rate r = 0, on a stock with volatility $\sqrt{\sigma^2 + \delta^2}$ and price z.

Forward Measures

We now specialize the theory to the case when the chosen numeraire is a zero coupon bond. As can be expected, this choice of numeraire is particularly useful when dealing with interest rate derivatives.

Suppose, therefore, that we are given a specified bond market model with a fixed risk neutral martingale measure Q (always with B as numeraire). For a fixed time of maturity T, we now choose the price process p(t, T), of a zero coupon bond maturing at T, as our new numeraire.

Definition 1 The T-forward measure $Q^T$ is defined as

$$L_t^T = \frac{dQ^T}{dQ} \tag{24}$$

on $\mathcal{F}_t$ for 0 ≤ t ≤ T, where $L^T$ is defined as

$$L_t^T = \frac{p(t, T)}{B_t\, p(0, T)} \tag{25}$$

Observing that p(T, T) = 1 we have the following useful pricing formula as an immediate corollary of Proposition 3.

Proposition 5 For any sufficiently integrable T-claim X, we have the pricing formula

$$\Pi_t[X] = p(t, T)\, E^T[X \mid \mathcal{F}_t] \tag{26}$$

where $E^T$ denotes integration w.r.t. $Q^T$.

Note again that the price p(t, T) does not have to be computed. It can be observed directly on the market at time t.

A natural question to ask is when Q and $Q^T$ coincide. This occurs if and only if we Q-a.s. have
$L^T(T) = 1$, that is when

$$ 1 = \frac{p(T, T)}{B_T \, p(0, T)} = \frac{e^{-\int_0^T r(s)\,ds}}{E^Q\left[ e^{-\int_0^T r(s)\,ds} \right]} \tag{27} $$

that is if and only if $r$ is deterministic.

The General Option Pricing Formula

We now present a fairly general formula for the pricing of European call options.
Therefore, assume that we are given a financial market with a (possibly stochastic) short
rate $r$ and a strictly positive asset price process $S$. We also assume the existence of a
risk neutral martingale measure $Q$.

Consider now a fixed time $T$, and a European call on $S$ with exercise date $T$ and strike
price $K$. We are, thus, considering the $T$-claim:

$$ X = \max[S_T - K, 0] \tag{28} $$

The main trick when dealing with options is to write $X$ as

$$ X = (S_T - K) \cdot I\{S_T \ge K\} = S_T \cdot I\{S_T \ge K\} - K \cdot I\{S_T \ge K\} \tag{29} $$

where $I$ denotes an indicator function. Using the linear property of pricing we thus obtain

$$ \Pi_t[X] = \Pi_t\left[ S_T \cdot I\{S_T \ge K\} \right] - K \cdot \Pi_t\left[ I\{S_T \ge K\} \right] \tag{30} $$

For the first term, we change to the measure $Q^S$ having $S$ as numeraire, and for the second
term, we use the $T$-forward measure. Using the pricing formula (1) twice, once for each
numeraire, we obtain the following basic option pricing formula, where we recognize the
structure of the standard Black–Scholes formula.

Proposition 6 Given the above-mentioned assumptions, the option price is given as

$$ \Pi_t[X] = S_t \, Q^S(S_T \ge K \mid \mathcal{F}_t) - K \, p(t, T) \, Q^T(S_T \ge K \mid \mathcal{F}_t) \tag{31} $$

Notes

The first use of a numeraire different from the risk-free asset $B$ was probably in [8]
where, however, the technique is not explicitly discussed. The first explicit use of a
change of numeraire was in [7], where an underlying stock was used as numeraire in order to
value an exchange option. The numeraire change is also used in [4, 5] and basically in all
later works on the existence of martingale measures in order to reduce the general case to
the basic case of zero short rate. In these papers, the numeraire change as such is,
however, not put to systematic use as an instrument for facilitating the computation of
option prices in complicated models. In the context of interest rate theory, changes of
numeraire were used and discussed independently in [2] and (within a Gaussian framework) in
[6], where in both cases a bond maturing at a fixed time $T$ is used as numeraire. A
systematic study of general changes of numeraire can be found in [3]. For further examples
of the change of numeraire technique see [1].

References

[1] Benninga, S., Björk, T. & Wiener, Z. (2002). On the use of numeraires in option pricing,
    Journal of Derivatives 43–58.
[2] Geman, H. (1989). The Importance of the Forward Neutral Probability in a Stochastic
    Approach of Interest Rates. Working paper, ESSEC, 10.
[3] Geman, H., El Karoui, N. & Rochet, J.-C. (1995). Changes of numéraire, changes of
    probability measure and option pricing, Journal of Applied Probability 32, 443–458.
[4] Harrison, J. & Kreps, J. (1979). Martingales and arbitrage in multiperiod markets,
    Journal of Economic Theory 11, 418–443.
[5] Harrison, J. & Pliska, S. (1981). Martingales and stochastic integrals in the theory of
    continuous trading, Stochastic Processes and Applications 11, 215–260.
[6] Jamshidian, F. (1989). An exact bond option formula. Journal of Finance 44, 205–209.
[7] Margrabe, W. (1978). The value of an option to exchange one asset for another. Journal
    of Finance 33, 177–186.
[8] Merton, R. (1973). The theory of rational option pricing. Bell Journal of Economics and
    Management Science 4, 141–183.

Related Articles

Forward and Swap Measures.

TOMAS BJÖRK
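The exchange option example of the preceding article (equations (21) to (23)) can be sketched numerically. The following is a minimal illustration, not part of the original text: the function names `norm_cdf`, `bs_call_zero_rate`, and `exchange_option_price` are hypothetical, `tau` stands for the time to maturity $T - t$ and is assumed strictly positive, and the two driving Wiener processes are assumed independent as in equations (16) and (17).

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_zero_rate(z: float, strike: float, vol: float, tau: float) -> float:
    """Black-Scholes call price c(t, z) with short rate r = 0 and tau = T - t > 0."""
    total_vol = vol * math.sqrt(tau)
    d1 = (math.log(z / strike) + 0.5 * total_vol ** 2) / total_vol
    d2 = d1 - total_vol
    return z * norm_cdf(d1) - strike * norm_cdf(d2)


def exchange_option_price(s0: float, s1: float, sigma: float, delta: float, tau: float) -> float:
    """Price of max[S_T^1 - S_T^0, 0] via equation (23), with S^0 as numeraire.

    s0, s1 are today's prices of S^0 and S^1; sigma, delta are their volatilities.
    """
    z = s1 / s0                               # normalized price Z_t = S_t^1 / S_t^0
    vol = math.sqrt(sigma ** 2 + delta ** 2)  # volatility of Z under Q^0, eq. (22)
    return s0 * bs_call_zero_rate(z, 1.0, vol, tau)


price = exchange_option_price(100.0, 100.0, 0.2, 0.2, 1.0)
```

Note that neither the short rate nor the drifts $\alpha$, $\beta$ enter the computation: after normalizing by $S^0$, only the volatility of $Z$ and the observed price $S_t^0$ matter.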
Utility Indifference Valuation

Under market frictions like illiquidity or transaction costs, contingent claims can
incorporate some inevitable intrinsic risk that cannot be completely hedged away but
remains with the holder. In general, they cannot be synthesized by dynamical trading in
liquid assets and hence cannot be priced by no-arbitrage arguments alone. Still, an agent
(she) can determine a valuation with respect to her preferences towards risk. The utility
indifference value for a variation in the quantity of illiquid assets held by the agent is
defined as the compensating variation of wealth, under which her maximal expected utility
remains unchanged.

Consider an agent acting in a financial market with $d + 1$ liquid assets, which can be
traded at market prices in a frictionless way anytime up to horizon $T < \infty$. In addition,
there are $J$ illiquid assets providing risky payouts $(B^j)_{j=1,\dots,J}$ at $T$. A preference
order of the agent is described by an (indirect) utility function $u_t^b(x)$, describing the
maximal expected utility obtainable when holding at $t$ a position consisting of wealth
$x \in \mathbb{R}$ invested in liquid assets (at market prices) and $b \in \mathbb{R}^J$ shares of
illiquid assets. At time $t$, the agent prefers a position $(x, b)$ to $(x', b')$ if
$u_t^b(x) \ge u_t^{b'}(x')$. She is indifferent if $u_t^b(x) = u_t^{b'}(x')$. The agent's utility
indifference (buy) value for adding $\delta$ illiquid assets to her current position $(x, b)$ is
defined as the compensating variation $\pi_t^{b,x}(\delta)$ of her present wealth that leaves her
utility unchanged, that is, as the solution to

$$ u_t^{b+\delta}\left( x - \pi_t^{b,x}(\delta) \right) = u_t^b(x) \tag{1} $$

The indifference sell value is $-\pi_t^{b,x}(-\delta)$. In comparison, the certainty equivalent
for adding $\delta$ to her position in illiquid assets is the equivalent variation
$c_t^{b,x}(\delta)$ of the wealth that yields the same utility, that is, it is the solution to

$$ u_t^b\left( x + c_t^{b,x}(\delta) \right) = u_t^{b+\delta}(x) \tag{2} $$

Equations (1), (2) have unique solutions if the functions $x \mapsto u_t^b(x)$ are strictly
increasing and have the same range for any $b$. The notion compensating variation is
classical from the economic theory of demand by John Richard Hicks [7]. Alternatively, the
terms indifference value ("price") and reservation price have been used frequently in
recent literature. We use the terms synonymously, but note that the classical terminology
appears more accurate in reflecting the definition (1): in general, the compensating
variation $\pi_t^{b,x}(\delta)$ is not a "price" for which $\delta$ illiquid assets can be traded in
the market. Also, $\pi_t^{b,x}(\delta)$ is determined only at $t$ in dependence of the position
$(x, b)$ prevailing and the variation $\delta$ occurring at the same time $t$; it should not be
interpreted prematurely as a "value at $t$" that could be attributed at times before $t$ to
the payoff $\delta B$ at $T$.

Next, we introduce the setup for a market model in which a family of utility functions
$u_t$, $t \le T$, is to be obtained. For simplicity, we consider a finite probability space
$(\Omega, \mathcal{F}, P)$. For time $t \in \{0, \dots, T\}$ being discrete, the information flow is
described by a filtration $(\mathcal{F}_t)_{0 \le t \le T}$ (see Filtrations), that is defined by
refining partitions $\mathcal{A}_t$ of $\Omega$ and corresponds to a nonrecombining tree (see
Arrow–Debreu Prices, Figure 1). The smallest nonempty events of $\mathcal{F}_t$ are called atoms
$A \in \mathcal{A}_t$ of $\mathcal{F}_t$. Take $\mathcal{F}_0$ as trivial, $\mathcal{F}_T = \mathcal{F} = 2^\Omega$ as
the set of all events, and all probabilities $P(\omega) > 0$, $\omega \in \Omega$, to be positive.
Random variables $X_t$ known at time $t$ are denoted by $X_t \in L_{\mathcal{F}_t}$. They are
determined by their values on each atom $A \in \mathcal{A}_t$, and can be identified with elements
of a suitable $\mathbb{R}^N$. A process $(X_t)_{t \le T}$ is adapted if $X_t \in L_{\mathcal{F}_t}$ for any
$t$. Inequalities and properties stated for random variables and functions are meant to hold
for all outcomes (coordinates). Conditional expectations with respect to $\mathcal{F}_t$ are
denoted by $E_t[\cdot]$. For a family $\{X_t^a\} \subset L_{\mathcal{F}_t}(\mathbb{R})$, the random variable
$\operatorname{ess\,sup}_a X_t^a$ takes the value $\sup_a X_t^a(A)$ on atom $A \in \mathcal{A}_t$.

The price evolution of $d$ liquidly traded risky assets is described by an
$\mathbb{R}^d$-valued adapted process $(S_t)_{0 \le t \le T}$. All prices are expressed in units of a
further liquid riskless asset (unit of account) whose price is constant at 1. If, for
example, the unit of account is the zero-coupon bond with maturity $T$, all prices are
expressed in $T$-forward units. A trading strategy $(\vartheta_t)_{t \le T} \in \Theta$ is described by
the numbers of liquid assets $\vartheta_t \in L_{\mathcal{F}_{t-1}}$ to be held over any period
$[t - 1, t)$. Its gains from $t$ until $T$ are
$\int_t^T \vartheta \, dS := \sum_{k=t+1}^T \vartheta_k \Delta S_k$, with $\Delta S_k := S_k - S_{k-1}$. Any
$X_T \in L_{\mathcal{F}_T}$ of the form $X_T = x + \int_t^T \vartheta \, dS$ represents a
wealth at time $T$ that is attainable from $x \in L_{\mathcal{F}_t}$ at $t$ by trading. Let
$\mathcal{X}_T(x, t)$ denote the set of all such $X_T$, and set
$\mathcal{X}_t(x) := \mathcal{X}_t(x, t - 1)$. In addition to the liquid assets, there exist $J$
illiquid assets delivering payoffs $B = (B^j)_{1 \le j \le J} \in L_{\mathcal{F}_T}(\mathbb{R}^J)$ at $T$. A
quantity $b \in \mathbb{R}^J$ of illiquid assets provides at $T$ the payoff
$bB := \sum_j b^j B^j$. We assume that the market is free of arbitrage in the sense that the
set $\mathcal{M}^e$ of equivalent probability measures $Q$ under which $S$ is a martingale (see
Martingales) is nonempty. This is equivalent to assuming that all sets $\mathcal{D}_{s,t}$,
$s < t \le T$, of conditional state-price densities are nonempty. Technically,
$\mathcal{D}_{s,t}$ is the set of strictly positive $D_{s,t} \in L_{\mathcal{F}_t}$ satisfying
$E_s[D_{s,t}] = 1$ and $E_s[D_{s,t} \int_s^t \vartheta \, dS] = 0$ for all $\vartheta \in \Theta$. For brevity,
let $\mathcal{D}_t := \mathcal{D}_{t-1,t}$. State-price densities are related to the likelihood
density process $Z_t = E_t[dQ/dP]$ of a $Q \in \mathcal{M}^e$ by $D_{s,t} = Z_t / Z_s$.

Conditional Utility Functions and Dual Problems

Our agent's objective is to maximize her expected utility (3) of wealth at $T$ for a direct
utility function $U$, which is finite, differentiable, strictly increasing, and concave on
all of $\mathbb{R}$, with $\lim_{x \to -\infty} U'(x) = \infty$ and $\lim_{x \to \infty} U'(x) = 0$. Holding
position $(x, b) \in \mathbb{R} \times \mathbb{R}^J$ in liquid and illiquid assets at $t \le T$, she
maximizes

$$ u_t^b(x) := \operatorname*{ess\,sup}_{X_T \in \mathcal{X}_T(x,t)} E_t\left[ U(X_T + bB) \right]
   = \operatorname*{ess\,sup}_{X_T \in \mathcal{X}_T(x,t)} E_t\left[ u_T^b(X_T) \right] \tag{3} $$

We call $u_t^b(x)$ u-regular if (for all $b$, $\omega$) the function $x \mapsto u_t^b(x)$ is strictly
concave, increasing, and continuously differentiable on $\mathbb{R}$ with
$\lim_{x \to -\infty} (u_t^b)'(x) = +\infty$ and $\lim_{x \to +\infty} (u_t^b)'(x) = 0$. For $t = T$,
$u_T^b(x) = U(x + bB)$ satisfies the condition

$$ u_t^b(x) \text{ is u-regular, concave, and differentiable in } (x, b),
   \text{ with } u_t^b(\mathbb{R}) = U(\mathbb{R}) \tag{4} $$

with $U(\mathbb{R}) = \{U(x) : x \in \mathbb{R}\}$ denoting the range of $U$. The primal problems (3)
are related, see A1–3, to the dual problems

$$ v_t^b(y) := \operatorname*{ess\,inf}_{D_{t,T} \in \mathcal{D}_{t,T}} E_t\left[ V^b(y D_{t,T}) \right],
   \quad y > 0 \tag{5} $$

for the conjugate function $V^b(y) := \operatorname{ess\,sup}_x \left( U(x + bB) - xy \right)$
($y > 0$, $b \in \mathbb{R}^J$), with $V^b(y) = V^0(y) + ybB$ and $v_T^b(y) = V^b(y)$. For later
arguments, we assume the following:

(A1) $u_t^b(x)$ satisfies condition (4) for any $t$, $b$, and the value functions are conjugate:

$$ v_t^b(y) = \operatorname*{ess\,sup}_{x \in \mathbb{R}} \left( u_t^b(x) - xy \right), \quad y > 0 \tag{6} $$

$$ u_t^b(x) = \operatorname*{ess\,inf}_{y > 0} \left( v_t^b(y) + xy \right), \quad x \in \mathbb{R} \tag{7} $$

(A2) For any $t \le T$ and $b, x, y \in L_{\mathcal{F}_{t-1}}$, there exist unique $\hat{X}_t^b(x)$ and
$\hat{D}_t^b(y)$ that attain the single-period optima (8) and (9)

$$ u_{t-1}^b(x) = \operatorname*{ess\,sup}_{X_t \in \mathcal{X}_t(x,t-1)} E_{t-1}\left[ u_t^b(X_t) \right]
   = E_{t-1}\left[ u_t^b(\hat{X}_t^b(x)) \right] \tag{8} $$

$$ v_{t-1}^b(y) = \operatorname*{ess\,inf}_{D_t \in \mathcal{D}_t} E_{t-1}\left[ v_t^b(y D_t) \right]
   = E_{t-1}\left[ v_t^b(y \hat{D}_t^b(y)) \right] \tag{9} $$

and satisfy $(u_t^b)'(\hat{X}_t^b(x)) = y \hat{D}_t^b(y)$ for $x$ and $y$ being related by
$(u_{t-1}^b)'(x) = y$.

(A3) For $t \le T$, $b, x, y \in L_{\mathcal{F}_t}$, unique optima $\hat{X}_T^b$ and $\hat{D}_{t,T}^b$ for
the multiperiod problems (3), (5) are attained, and can be constructed by dynamical
programming $\hat{X}_k^b = \hat{X}_k^b(\hat{X}_{k-1}^b)$ and
$\hat{D}_{t,k}^b = \hat{D}_{t,k-1}^b \hat{D}_k^b(y \hat{D}_{t,k-1}^b)$ for $t < k \le T$, with
$\hat{D}_k^b(\cdot)$ and $\hat{X}_k^b(\cdot)$ from A2 and $\hat{X}_t^b := x$ and $\hat{D}_{t,t}^b := 1$. The
optima satisfy $(u_k^b)'(\hat{X}_k^b) = y \hat{D}_{t,k}^b$, $t < k \le T$, for $x$ and $y$ being related
by $(u_t^b)'(x) = y$.

$\Omega$ being finite, A1–3 can be shown by convex duality; by arguments as in [21] it follows
inductively that A1–3 hold for $t - 1$ on each atom, given they hold at $t$. Also see Convex
Duality and Second Fundamental Theorem of Asset Pricing. Let us just mention here that,
under regularity, the transforms (7) and (6) are inversions of each other, and
$-(v_t^b)'(y)$ is the inverse function of the marginal utility
$(u_t^b)'(x) = \frac{\partial}{\partial x} u_t^b(x)$.

Properties of Utility Indifference Values

Concavity (Convexity)
By concavity of $U$, indifference buy (sell) values $\pi_t^{b,x}(\delta)$ (respectively
$-\pi_t^{b,x}(-\delta)$) are concave (convex) with respect to the quantity $\delta$ of illiquid assets
that they compensate for, that is,

$$ \lambda \pi_t^{b,x}(\delta^1) + (1 - \lambda) \pi_t^{b,x}(\delta^2)
   \le \pi_t^{b,x}\left( \lambda \delta^1 + (1 - \lambda) \delta^2 \right) \quad \text{for } \lambda \in [0, 1] \tag{10} $$

Monotonicity
Monotonicity of $U$ implies, on any atom $A \in \mathcal{A}_t$, that

$$ 1_A \pi_t^{b,x}(\delta) \le 1_A \pi_t^{b,x}(\delta') \quad \text{if } 1_A \delta B \le 1_A \delta' B
   \quad \text{for } \delta, \delta' \in \mathbb{R}^J \tag{11} $$

and that $1_A \pi_t^{b,x}(\delta) = 0$ holds if $1_A \delta B = 0$.

Dynamic consistency with no arbitrage
So far, we took the agent to trade optimally in liquid assets, while holding a fixed
position $b$ in illiquid assets. Now, suppose that she is ready to buy (or sell) at her
compensating variations shares of illiquid assets in quantities as requested by another
agent (he), dynamically over time. Let $\beta_t - \beta_0 \in L_{\mathcal{F}_{t-1}}(\mathbb{R}^J)$ denote the
cumulative position in illiquid assets she has accepted until date $t - 1$, when she
initially has held $\beta_0 := b \in \mathbb{R}^J$. At $t - 1 < T$, he chooses to sell
$\Delta\beta_t \in L_{\mathcal{F}_{t-1}}(\mathbb{R}^J)$ illiquid assets. Given $\hat{X}_{t-1}$ is the wealth in
liquid assets she arrived with at $t - 1$, paying compensating variation changes her liquid
wealth to $\tilde{X}_{t-1} := \hat{X}_{t-1} - \pi_{t-1}^{\beta_{t-1}, \hat{X}_{t-1}}(\Delta\beta_t)$ such that the
utility of her position stays equal
$u_{t-1}^{\beta_{t-1}}(\hat{X}_{t-1}) = u_{t-1}^{\beta_t}(\tilde{X}_{t-1})$. Investing optimally for the next
period according to A2 from her new position $(\tilde{X}_{t-1}, \beta_t)$ (without knowing his future
$(\Delta\beta_{t+k})_{k \ge 1}$), she arrives at $t$ with liquid wealth

$$ \hat{X}_t = \hat{X}_t^{\beta_t}(\tilde{X}_{t-1})
   =: \hat{X}_{t-1} - \pi_{t-1}^{\beta_{t-1}, \hat{X}_{t-1}}(\Delta\beta_t) + \int_{t-1}^t \hat{\vartheta} \, dS \tag{12} $$

for an optimal strategy $\hat{\vartheta}$ over $(t - 1, t]$. Given an initial wealth
$\hat{X}_0 = x \in \mathbb{R}$, the wealth process $\hat{X}_t$ is determined by compensating variations
and A2 such that $(u_t^{\beta_t}(\hat{X}_t))$ is a martingale. Trading against indifference
valuations but not following strategy $\hat{\vartheta}$ would result in a suboptimal wealth process
$X_t$ for which the utility process $(u_t^{\beta_t}(X_t))$ is a supermartingale, therefore
decreasing in the mean. By accepting to trade illiquid assets against her indifference
values, she is not offering arbitrage opportunities to her counterparty. Indeed, a strategy
$\theta \in \Theta$ would offer arbitrage profits to him, jointly with $(\beta_t)$, if his gains

$$ G_T := \int_0^T \theta \, dS + \sum_{t=1}^T \pi_{t-1}^{\beta_{t-1}, \hat{X}_{t-1}}(\Delta\beta_t) - (\beta_T - b) B \tag{13} $$

satisfy $G_T \ge 0$ and $P[G_T > 0] > 0$. Unwinding her illiquid asset position at $T$ leaves her
with final wealth

$$ \tilde{X}_T = x + \int_0^T \hat{\vartheta} \, dS - \sum_{t=1}^T \pi_{t-1}^{\beta_{t-1}, \hat{X}_{t-1}}(\Delta\beta_t) + \beta_T B \tag{14} $$

Adding equation (13) to equation (14) would imply
$E[u_T^b(x + \int_0^T (\theta + \hat{\vartheta}) \, dS)] > E[u_T^{\beta_T}(\hat{X}_T)] = u_0^b(x)$, contradicting
definition (3).

Static no-arbitrage bounds
In particular, there is no arbitrage from buy(sell)-and-hold strategies in illiquid assets.
For $x \in \mathbb{R}$, $b, \delta \in \mathbb{R}^J$ it thus holds

$$ \pi_t^{b,x}(\delta) \le \operatorname*{ess\,sup}_{Q \in \mathcal{M}^e} E_t^Q[\delta B]
   \quad \text{and} \quad
   -\pi_t^{b,x}(-\delta) \ge \operatorname*{ess\,inf}_{Q \in \mathcal{M}^e} E_t^Q[\delta B] \tag{15} $$

For replicable payoffs $B^j = B_t^j + \int_t^T \vartheta^{B^j} \, dS$ with $\vartheta^{B^j}$ in $\Theta$ for all $j$,
$B_t \in L_{\mathcal{F}_t}(\mathbb{R}^J)$, the indifference value $\pi_t^{b,x}(\delta)$ equals the replication
cost (market price) $\delta B_t$.

Marginal indifference values
In general, $\pi_t^{b,x}(\delta)$ is nonlinear in $\delta$. Since
$\varepsilon \mapsto u_t^{b + \varepsilon\delta}(x - \pi_t^{b,x}(\varepsilon\delta))$ is constant, it holds

$$ \left. \frac{\partial}{\partial \varepsilon} \pi_t^{b,x}(\varepsilon\delta) \right|_{\varepsilon = 0}
   = \frac{\operatorname{grad}_b u_t^b(x)}{(u_t^b)'(x)} \, \delta \tag{16} $$
Hence, marginal indifference values, that is, compensating variations for infinitesimal
changes of quantities, are linear in $\delta$ and given by the ratio of the gradient of
$u_t^b(x)$ with respect to $b$ and the marginal utility of wealth $(u_t^b)'(x)$. The principle of
valuation at ratios of marginal utilities is classical in economics, see for example [4].
Marginal indifference values can be computed from optimizers of the dual problem. They
coincide with prices of an arbitrage-free dynamical price process in an enlarged market,
where previously illiquid assets are tradable at "shadow" price processes, which are such
that the utility maximizing agent is not trading those assets. To see this for $t = 0$, fix
$x$, $b$. For $y$ and $\hat{D}_{0,T}^b$ from A3, let $R_k := E_k^Q[B]$, $k \le T$, for
$dQ := \hat{D}_{0,T}^b \, dP$. Let $\bar{u}_0^b(x)$, $\bar{v}_0^b(y)$ be the primal and dual value functions
(cf. equations (3), (5)) of the market $\bar{S} = (S, R)$ that is enlarged by the additional
price process $R$. The set of state-price densities for the enlarged market is smaller, but
includes the minimizer for equation (5). Hence $\bar{v}_0^b(\cdot) \ge v_0^b(\cdot)$ and
$\bar{v}_0^b(y) = v_0^b(y)$, implying $(\bar{v}_0^b)'(y) = (v_0^b)'(y)$ and $\bar{u}_0^b(x) = u_0^b(x)$ by A1.
Thus, the optimal strategy in the enlarged market is not trading the additional asset at
the shadow price process $(R_t)$. The agent is, in particular, indifferent to infinitesimal
initial variations of her position at shadow prices. Hence, $R_0$ must be given by the ratio
in (16) of marginal utilities at $t = 0$. If the agent is taken to be representative for the
whole market, holding a net supply of $b$ illiquid assets, then $(R_t)$ could be interpreted
as a partial equilibrium price process.

Numeraire dependence
In general, utility indifference values depend on the utility functions and the numeraire
(unit of account) with respect to which they are defined. But it is possible to choose
state-dependent utility functions with respect to another numeraire such that indifference
values (and optimal strategies) become numeraire invariant. Let $(N_t)$ be the price process
of a tradable numeraire, that is, $N_t = N_0 + \int_0^t \vartheta^N \, dS$ for $t \le T$, $\vartheta^N \in \Theta$, with
$N > 0$. Then indifference values coincide, that means
$\pi_{t,N}^{b,x}(\delta) = \pi_t^{b,xN_t}(\delta) / N_t$ holds, if utilities and payoffs with respect to $N$
satisfy the relations $u_{t,N}^b(x) = u_t^b(xN_t)$ (for $t = T$, hence all $t$) and
$B_N := B / N_T$. Likewise, for numeraires $N$, $\tilde{N}$, the relations should be
$u_{t,\tilde{N}}^b(x) = u_{t,N}^b(x \tilde{N}_t / N_t)$ and $B_{\tilde{N}} = B_N N_T / \tilde{N}_T$.

Partial hedging
Compensating variations can be associated with a utility-based hedging strategy, which, for
an aggregate position $(x, b)$ at $t = 0$, is defined as the strategy whose wealth process is
$\hat{X}_t^b(x) - \hat{X}_t^0(x + c_0^{0,x}(b))$, for optimal wealth processes $\hat{X}^b$, $\hat{X}^0$ from A3
and $c_0^{0,x}(b)$ from equation (2). The risk that remains under partial hedging can be
substantial, see the example below.

Case of Exponential Utility

Much of the literature on indifference pricing deals with exponential utility
$U(x) = -\frac{1}{\alpha} \exp(-\alpha x)$ of constant absolute risk aversion $\alpha > 0$. Because $U$
factorizes, the utility functions are of the form
$u_t^b(x) = -\frac{1}{\alpha} e^{-\alpha(x + C_t^b)}$, $t \le T$, for random variables
$C_t^b \in L_{\mathcal{F}_t}$ not depending on $x$, with $C_T^b = bB$. Clearly,
$\pi_t^{b,x}(\delta) = C_t^{b+\delta} - C_t^b$ does not depend on $x$, and the compensating variation (1)
and the equivalent variation (2) coincide for exponential utility. From the dual value
functions $v_t^b(y) = \frac{y}{\alpha}(\log y - 1 + \alpha C_t^b)$ from equation (5), one obtains a
general formula

$$ \pi_t^{0,x}(\delta) = \operatorname*{ess\,inf}_{D \in \mathcal{D}_{t,T}} \left\{ E_t[D(\delta B)]
   + \frac{1}{\alpha} \left( E_t[D \log D] - E_t[\hat{D}_{t,T}^0 \log \hat{D}_{t,T}^0] \right) \right\} \tag{17} $$

for the indifference value $\pi_t^{b,x}(\delta) = \pi_t^{0,x}(\delta + b) - \pi_t^{0,x}(b)$, where
$\hat{D}_{t,T}^0$ is the minimizer of equation (5) for $b = 0$ that satisfies
$E_t[\hat{D}_{t,T}^0 \log \hat{D}_{t,T}^0] = \operatorname{ess\,inf}_{D_{t,T}} E_t[D_{t,T} \log D_{t,T}]$. By
equation (17), utility indifference sell values $\delta B \mapsto -\pi_t^{b,x}(-\delta)$ are monotonic in
$\alpha$, and satisfy the properties of convexity, translation invariance, and monotonicity,
that constitute a convex risk measure (see Convex Risk Measures).

Under particular model assumptions, indifference values $\pi_t^{0,x}(\delta)$ can be computed by a
backward induction scheme

$$ -\pi_{t-1}^{0,x}(\delta) = E_{t-1}^{Q^0}\left[ \frac{1}{\alpha} \log
   E_{\mathcal{G}_t}\left[ \exp\left( -\alpha \pi_t^{0,x}(\delta) \right) \right] \right] \tag{18} $$
starting from $\pi_T^{0,x}(\delta) = \delta B$. Roughly speaking, the assumptions needed comprise
certain independence conditions plus semicompleteness of the market at each period. The
scheme (18) has intuitive appeal, in showing that the indifference valuation is computed
here by intertwining two well-known valuation methods: First, one takes an exponential
certainty equivalent with respect to nontradable risk at the inner expectation (with
$\mathcal{F}_{t-1} \subset \mathcal{G}_t \subset \mathcal{F}_t$); after that one takes a risk-neutral expectation
of this certainty equivalent at the outer expectation (under the minimal entropy martingale
measure), where $\mathcal{G}_t$-risk is taken as replicable from $t - 1$. See [1, 18] for precise
technical assumptions and examples.

Example in Continuous Time

For an instructive example, consider a (nonfinite) filtered probability space
$(\Omega, (\mathcal{F}_t)_{t \le T}, P)$ with Brownian motions $(W_t)$ and
$(\tilde{W}_t) = (\rho W_t + \sqrt{1 - \rho^2} \, W_t^\perp)$ correlated by $\rho \in [-1, 1]$. The price process
of a single risky asset is $S_t = S_0 + \sigma(W_t + \lambda t)$ with $\lambda, S_0 \in \mathbb{R}$, $\sigma > 0$, as in
the model by Louis Bachelier. The illiquid asset's payout $B := Y_T$ for
$Y_t = Y_0 + \tilde{\sigma}(\tilde{W}_t + \tilde{\lambda} t)$ can be interpreted as position in a nontraded but
correlated asset. Trading strategies $\vartheta \in \Theta$ are taken to be adapted and bounded. The
maximal expected exponential utilities

$$ u_t^b(x) = -\frac{1}{\alpha} \exp\left( -\alpha \left( x + \frac{1}{2} \frac{\lambda^2}{\alpha} (T - t)
   + \pi_t^{0,x}(b) \right) \right), \quad x, b \in \mathbb{R} \tag{19} $$

are then attained by the optimal strategies
$\hat{\vartheta}^b = \frac{\lambda}{\sigma \alpha} - b \frac{\tilde{\sigma}}{\sigma} \rho$, with

$$ \pi_t^{0,x}(b) = b Y_t + b \tilde{\sigma} (\tilde{\lambda} - \lambda \rho)(T - t)
   - \frac{1}{2} \alpha b^2 \tilde{\sigma}^2 (1 - \rho^2)(T - t) \tag{20} $$

Indifference values $\pi_t^{b,x}(\delta) = \pi_t^{0,x}(b + \delta) - \pi_t^{0,x}(b)$ for exponential utility
do not depend on wealth $x$. Optimality of equation (19) and $\hat{\vartheta}^b$ follows by noting that
$u_t^b(\int_0^t \vartheta \, dS)$ is a martingale for $\vartheta = \hat{\vartheta}^b$ and a supermartingale for any other
$\vartheta \in \Theta$. Clearly, indifference buy (sell) values $\pi_t^{b,x}(\delta)$ (respectively
$-\pi_t^{b,x}(-\delta)$) are decreasing (increasing) in the risk aversion $\alpha$. They are linear in
the quantity $\delta$ only if correlation is perfect ($|\rho| = 1$). Then, they coincide with the
replication cost of $\delta B$. Marginal utility indifference values are given by

$$ \frac{\partial}{\partial \delta} \pi_t^{b,x}(\delta) = Y_t + \tilde{\sigma} (\tilde{\lambda} - \lambda \rho)(T - t)
   - \alpha (b + \delta) \tilde{\sigma}^2 (1 - \rho^2)(T - t) \tag{21} $$

Under the (minimal entropy) martingale measure $dQ^0 = \exp(-\lambda W_T - \lambda^2 T / 2) \, dP$ we have
$S_t = S_0 + \sigma W_t^0$ for independent $Q^0$-Brownian motions $(W_t^0)$ and $(W_t^\perp)$.
Indifference values can be expressed by

$$ -\pi_t^{0,x}(b) = \frac{1}{\alpha (1 - \rho^2)} \log E_t^{Q^0}\left[ \exp\left( \alpha (1 - \rho^2)(-bB) \right) \right] \tag{22} $$

Formulas like equation (22) have also been obtained for different models, including the case
where the price processes of the risky and the nontraded asset (underlying of $B$) are given
by correlated geometric Brownian motions, see [6, 16].

To discuss the possible size of the partial hedging error, let $S_0$, $Y_0$, $\lambda$, $\tilde{\lambda}$ be
zero, $T = 1$, and $\sigma = \tilde{\sigma}$. We assume that the agent has accepted initially an illiquid
position $b$ at her indifference valuation. Her utility-based partial hedging strategy when
holding $b$ illiquid assets is $\hat{\vartheta}^b - \hat{\vartheta}^0 = -b\rho$. Her hedging error
$H = -\pi_0^{0,x}(b) + bB - b\sigma\rho W_1 = \frac{1}{2} \alpha b^2 \sigma^2 (1 - \rho^2) + b \sigma \sqrt{1 - \rho^2} \, W_1^\perp$
is normally distributed. Its standard deviation accounts for $\sqrt{1 - \rho^2} \times 100\%$ of that
for the unhedged payoff $bB = bY_T$. For correlation $\rho = 80\%$, for example, the error size is
still substantial at a ratio of 60%. Even for $\rho = 99\%$, it is still above 14%. To be
compensated for the remaining risk in terms of her expected utility, the agent requires
$-\pi_0^{0,x}(b) = \frac{1}{2} \alpha b^2 \sigma^2 (1 - \rho^2)$ at $t = 0$. Her compensating variation of
wealth is proportional to the variance of $H$ and to her risk aversion $\alpha$.

Further Reading

To value options under transaction costs, indifference valuation was applied in [8]. The
method is not limited to European payoffs. For payoffs with optimal exercise features, see
[10, 17]. Indifference values for payoff streams could be defined by equation (1) for
utilities that reflect preferences on future payment streams, like in [22]. For results on
nonexponential
utilities, see [5, 12, 14]. For performance of utility-based hedging strategies, see [15].
Besides dynamical programming and convex duality, solutions have been obtained by backward
stochastic differential equations (see Backward Stochastic Differential Equations)
[10, 13, 20], also for non-convex closed constraints [9] and jumps [2]. For asymptotic
results on valuation and hedging for small volumes, see [2, 5, 12, 13]. A Paretian
equilibrium formulation for indifference pricing has been presented in [11]. Being
nonlinear, indifference values can reflect diversification or accumulation of risk for
application areas like real options or insurance, see [6, 14, 19, 22]; but modeling and
computation are more demanding, since a portfolio of assets cannot be valued by parts in
general. Instead, each component is to be judged by its contribution to the overall
portfolio. More comprehensive references are given in [1, 3, 4, 6, 16].

References

[1] Becherer, D. (2003). Rational hedging and valuation of integrated risks under constant
    absolute risk aversion, Insurance: Mathematics and Economics 33, 1–28.
[2] Becherer, D. (2006). Bounded solutions to backward SDEs with jumps for utility
    optimization and indifference hedging, Annals of Applied Probability 16, 2027–2054.
[3] Davis, M.H.A. (2006). Optimal hedging with basis risk, in From Stochastic Calculus to
    Mathematical Finance, Y. Kabanov, R. Liptser & J. Stoyanov, eds, Springer, Berlin,
    pp. 169–188.
[4] Foldes, L. (2000). Valuation and martingale properties of shadow prices: an exposition,
    Journal of Economic Dynamics and Control 24, 1641–1701.
[5] Henderson, V. (2002). Valuation of claims on non-traded assets using utility
    maximization, Mathematical Finance 12, 351–373.
[6] Henderson, V. & Hobson, D. (2008). Utility indifference pricing: an overview, in
    Indifference Pricing, R. Carmona, ed, Princeton University Press, pp. 44–74.
[7] Hicks, J.R. (1956). A Revision of Demand Theory, Oxford University Press, Oxford.
[8] Hodges, S.D. & Neuberger, A. (1989). Optimal replication of contingent claims under
    transaction costs, Review of Futures Markets 8, 222–239.
[9] Hu, Y., Imkeller, P. & Müller, M. (2005). Utility maximization in incomplete markets,
    Annals of Applied Probability 15, 1691–1712.
[10] Kobylanski, M., Lepeltier, J., Quenez, M. & Torres, S. (2002). Reflected backward SDE
    with super-linear quadratic coefficient, Probability and Mathematical Statistics 22,
    51–83.
[11] Kramkov, D. & Bank, P. (2007). A model for large investor, where she trades at utility
    indifference prices of market makers, ICMS, Edinburgh, Presentation,
    www.icms.org.uk/downloads/quantfin/Kramkov.pdf
[12] Kramkov, D. & Sirbu, M. (2007). Asymptotic analysis of utility-based hedging strategies
    for small number of contingent claims, Stochastic Processes and Applications 117,
    1606–1620.
[13] Mania, M. & Schweizer, M. (2005). Dynamic exponential utility indifference valuation,
    Annals of Applied Probability 15, 2113–2143.
[14] Møller, T. (2003). Indifference pricing of insurance contracts: applications,
    Insurance: Mathematics and Economics 32, 295–315.
[15] Monoyios, M. (2004). Performance of utility-based strategies for hedging basis risk,
    Quantitative Finance 4, 245–255.
[16] Musiela, M. & Zariphopoulou, T. (2004). An example of indifference pricing under
    exponential preferences, Finance and Stochastics 8, 229–239.
[17] Musiela, M. & Zariphopoulou, T. (2004). Indifference prices of early exercise claims,
    in Mathematics of Finance, G. Yin & Q. Zhang, eds, Contemporary Mathematics, AMS,
    Vol. 351, pp. 259–273.
[18] Musiela, M. & Zariphopoulou, T. (2004). A valuation algorithm for indifference prices
    in incomplete markets, Finance and Stochastics 8, 399–414.
[19] Porchet, A., Touzi, N. & Warin, X. (2008). Valuation of power plants by utility
    indifference and numerical computation, Mathematical Methods of Operations Research
    [Online], DOI: 10.007/s00186-008-0231-z.
[20] Rouge, R. & El Karoui, N. (2000). Pricing via utility maximization and entropy,
    Mathematical Finance 10, 259–276.
[21] Schachermayer, W. (2002). Optimal investment in incomplete financial markets, in
    Mathematical Finance: Bachelier Congress 2000, H. Geman, D. Madan & S.R. Pliska, eds,
    Springer, Berlin, pp. 427–462.
[22] Smith, J.E. & Nau, R.F. (1995). Valuing risky projects: option pricing theory and
    decision analysis, Management Science 41, 795–816.

Related Articles

Complete Markets; Expected Utility Maximization: Duality Methods; Good-deal Bounds;
Hedging; Minimal Entropy Martingale Measure; Utility Theory: Historical Perspectives;
Utility Function.

DIRK BECHERER
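The closed-form indifference value (20) and the partial hedging figures quoted in the continuous-time example of the preceding article can be checked with a few lines of code. This is a minimal sketch under the assumptions of that example (Bachelier-type dynamics, exponential utility); the function names and the argument `tau`, standing for $T - t$, are hypothetical and not part of the original text.

```python
import math


def indifference_value(b, y_t, sigma_tilde, lam, lam_tilde, rho, alpha, tau):
    """Equation (20): pi_t^{0,x}(b) for exponential utility, with tau = T - t."""
    drift_term = b * sigma_tilde * (lam_tilde - lam * rho) * tau
    # Charge for the unhedgeable part of the risk; vanishes when |rho| = 1.
    risk_charge = 0.5 * alpha * b ** 2 * sigma_tilde ** 2 * (1.0 - rho ** 2) * tau
    return b * y_t + drift_term - risk_charge


def residual_risk_ratio(rho):
    """Standard deviation of the hedging error H relative to the unhedged payoff bB."""
    return math.sqrt(1.0 - rho ** 2)


def buy_value(b, delta, **model):
    """pi_t^{b,x}(delta) = pi_t^{0,x}(b + delta) - pi_t^{0,x}(b); independent of wealth x."""
    return indifference_value(b + delta, **model) - indifference_value(b, **model)


ratio_80 = residual_risk_ratio(0.80)   # the 60% ratio quoted in the example
ratio_99 = residual_risk_ratio(0.99)   # still above 14%
```

With $S_0$, $Y_0$, $\lambda$, $\tilde{\lambda}$ set to zero and $T = 1$, `-indifference_value(b, 0.0, sigma, 0.0, 0.0, rho, alpha, 1.0)` reproduces the required compensation $\frac{1}{2} \alpha b^2 \sigma^2 (1 - \rho^2)$.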
Superhedging noticed that, in the presence of leverage constraints,
superhedging may be cheaper than perfect hedging.
The same phenomenon has been observed by Benais
Pricing and hedging of contingent claims are the et al. [1] in the presence of transaction costs.
two main problems of mathematical finance. They The characterization of superhedging strategies
both have a clear and transparent solution when the and prices is the object of a family of results called
underlying market model is complete, that is, for superhedging theorems.
each contingent claim with promised payoff H there
exists a self-financing admissible trading strategy
Superhedging Theorems
whose wealth at maturity equals H (see Complete
Markets). Such a strategy is called the hedging A large literature has been devoted to characterizing
strategy of the contingent claim H . The smallest the set of all initial endowments that allows to
initial wealth that allows to reach H at maturity via admissible trading is called the hedging price of H. Under a suitable no-arbitrage assumption (see Fundamental Theorem of Asset Pricing), the second fundamental theorem of asset pricing (see Second Fundamental Theorem of Asset Pricing) states that replicability of every contingent claim is equivalent to the uniqueness of the equivalent martingale measure $\mathbb{Q}$ (see Equivalent Martingale Measures). It turns out that in a complete market (see Complete Markets), the hedging price at time $t = 0$ of a contingent claim H, denoted by $p(H)$, coincides with the expectation of discounted H under the unique equivalent martingale measure $\mathbb{Q}$, that is, $p(H) = E_{\mathbb{Q}}[D_T H]$ where $D_T$ is a discounting factor over $[0, T]$.

If the market model is incomplete, there exist contingent claims that are not perfectly replicable via admissible trading strategies. In other words, in such financial models, contingent claims are not redundant assets. Therefore, since perfect replicability cannot be always achieved, this requirement has to be relaxed. One way of doing this consists in introducing the concept of superhedging.

Given a contingent claim H with maturity $T > 0$, a superhedging strategy for H is an admissible trading strategy such that its terminal wealth $V_T$ super-replicates H, that is, $V_T \ge H$. The superhedging price of H is the smallest initial endowment that allows an investor to super-replicate H at maturity; in other words, it is the initial value $V_0$ of the superhedging strategy of H.

Superhedging was introduced and investigated first by El Karoui and Quenez [13, 14] in a continuous-time setting where the risky assets follow a multidimensional diffusion process. Independently, Naik and Uppal [25] studied the same problem in a discrete-time model with finite set of scenarios and

superhedge a contingent claim H as a first crucial step to compute the superhedging price, the infimum of that set. In this article, we focus essentially on continuous-time hedging of European options, that is, with a fixed exercise time T, and distinguish between two cases: frictionless incomplete markets and markets with frictions. For superhedging in discrete-time models and for American options, the interested reader could see respectively Föllmer and Schied’s book [17] and American Options.

Frictionless Incomplete Markets

To facilitate the discussion, let us fix the notation first. We consider a market model composed of $d \ge 1$ risky assets whose discounted price dynamics is described by a càdlàg and locally bounded semimartingale $S = (S_t)_{t \in [0,T]}$, where $T > 0$ is a given finite time horizon. S is defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and adapted to a filtration $(\mathcal{F}_t)_{t \in [0,T]}$ with $\mathcal{F}_t \subset \mathcal{F}$ for all $t \le T$ satisfying usual conditions. Notice that prices S are already discounted; this is equivalent to assuming that the spot interest rate $r = 0$. This model is, in general, incomplete, that is, it may admit infinitely many equivalent martingale measures (see Equivalent Martingale Measures).

Let H be a positive $\mathcal{F}_T$-measurable random variable, modeling the final payoff of a given contingent claim, for example, $H = (S_T - K)^+$, a European call option written on S, with maturity T and strike price $K > 0$.

An admissible trading strategy is a couple $(x, \theta)$ where $x \in \mathbb{R}$ is an initial endowment and $\theta = (\theta_t)_{t \in [0,T]}$ a predictable S-integrable process, such that the corresponding wealth $V_t^{x,\theta} = x + \int_0^t \theta_u \, dS_u \ge -a$ for every $t \in [0, T]$ and for some threshold $a > 0$. We denote $\mathcal{A}$ as the set of all admissible strategies.
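
The informal description of the superhedging price as the smallest initial endowment that super-replicates H can be made concrete in a toy setting. The following is a minimal sketch, assuming a one-period model with a single discounted risky asset and finitely many scenarios; the function name `superhedging_price` and the numerical values are illustrative assumptions, not part of the article. In such a model the cheapest capital for a fixed position θ is the maximum over scenarios of $H - \theta (S_T - S_0)$, a convex piecewise linear function of θ, so it suffices to inspect its kinks.

```python
from itertools import combinations


def superhedging_price(s0, s_T, payoff):
    """Return (x, theta) minimising x subject to x + theta*(s - s0) >= h in every scenario."""
    gains = [s - s0 for s in s_T]
    if not (min(gains) < 0 < max(gains)):
        raise ValueError("gains must take both signs for the price to be finite")

    def cost(theta):
        # Smallest initial capital such that holding theta units dominates the payoff.
        return max(h - theta * g for h, g in zip(payoff, gains))

    # cost() is convex and piecewise linear, so its minimum sits at a kink,
    # that is, at a theta where two scenario constraints bind at once.
    candidates = [
        (h1 - h2) / (g1 - g2)
        for (h1, g1), (h2, g2) in combinations(zip(payoff, gains), 2)
        if g1 != g2
    ]
    theta = min(candidates, key=cost)
    return cost(theta), theta


# Hypothetical trinomial example: S_0 = 100, S_T in {80, 100, 120}, call with strike 100.
s0, strike = 100.0, 100.0
s_T = [80.0, 100.0, 120.0]
call = [max(s - strike, 0.0) for s in s_T]
price, theta = superhedging_price(s0, s_T, call)
print(price, theta)  # 10.0 0.5
```

In this example no strategy replicates the call exactly, and the cheapest dominating strategy holds half a unit of the asset at a cost of 10.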

Definition 1 Let $H \ge 0$ be a given contingent claim. $(x, \theta) \in \mathcal{A}$ is a superhedging strategy for H if $V_T^{x,\theta} \ge H$ $\mathbb{P}$-a.s. (almost surely). Moreover, the superhedging price $\bar{p}(H)$ of H is given by

$$\bar{p}(H) = \inf\left\{ x \in \mathbb{R} : \exists (x, \theta) \in \mathcal{A},\ V_T^{x,\theta} \ge H \text{ a.s.} \right\} \tag{1}$$

The fundamental result in the literature on superhedging is the dual characterization of the set $D_H$ of all initial endowments $x \in \mathbb{R}$ leading to superhedge H. In an incomplete frictionless market, the relevant dual variables are the densities of all equivalent martingale measures $d\mathbb{Q}/d\mathbb{P}$. We denote $\mathcal{M}^e$ as the set of all equivalent (local) martingale measures for S. In this setting, the superhedging theorem states that

$$D_H = \left\{ x \in \mathbb{R} : E_{\mathbb{Q}}[H] \le x,\ \forall \mathbb{Q} \in \mathcal{M}^e \right\} \tag{2}$$

An important consequence of equation (2) is that the superhedging price $\bar{p}(H)$ satisfies

$$\bar{p}(H) = \sup_{\mathbb{Q} \in \mathcal{M}^e} E_{\mathbb{Q}}[H] \tag{3}$$

While an advantage of superhedging is that it is preference free, from the previous characterization of $\bar{p}(H)$ as the biggest expectation $E_{\mathbb{Q}}[H]$ over all equivalent martingale measures, it becomes apparent that pursuing a superhedging strategy can be too expensive, depending on the financial model and on the constraints on portfolios. This is the main disadvantage of such a criterion, which is, nonetheless, of great interest as a benchmark. Moreover, for an agent with a large risk aversion and under transaction costs (see the section Markets with Frictions), the reservation price approaches the superhedging price, as established in [2].

El Karoui and Quenez [13, 14] first proved the superhedging theorem in an Itô’s diffusion setting and Delbaen and Schachermayer [10, 11] generalized it to, respectively, a locally bounded and unbounded semimartingale model, using a Hahn–Banach separation argument.

The superhedging theorem can be extended in order to characterize the dynamics of the minimal superhedging portfolio of a contingent claim H, that is, the cheapest at any time t of all superhedging portfolios of H with the same initial wealth. This extension is a consequence of the so-called optional decomposition of supermartingales. The optional decomposition was first proved in [13, 14] for diffusions and then extended to general semimartingales by Kramkov [24], Föllmer and Kabanov [15], and Delbaen and Schachermayer [12]. This is a very deep result of the general theory of stochastic processes and roughly states that any càdlàg positive $\mathbb{Q}$-supermartingale X, for any $\mathbb{Q} \in \mathcal{M}^e$, can be decomposed as follows:

$$X_t = X_0 + \int_0^t \theta_u \, dS_u - C_t, \quad t \in [0, T] \tag{4}$$

where θ is a predictable, S-integrable process and C an increasing optional process, to be interpreted as a cumulative consumption process. What is remarkable is that the local martingale part can be represented as a stochastic integral with respect to S so that it is a local martingale under any equivalent martingale measure $\mathbb{Q}$. In this sense, decomposition (4) is universal. The price to pay is that the increasing process C is, in general, not predictable as in the Doob–Meyer decomposition (see Doob–Meyer Decomposition) but only optional. The process C has the economic interpretation of cumulative consumption.

The decomposition (4) implies that the wealth dynamics of the minimal superhedging portfolio for a contingent claim H is given by

$$V_t = \operatorname*{ess\,sup}_{\mathbb{Q} \in \mathcal{M}^e} E_{\mathbb{Q}}[H \mid \mathcal{F}_t], \quad t \in [0, T] \tag{5}$$

An analogous result holds for American contingent claims too (see [13–15, 24] for details at increasing levels of generality).

Finally, in the more specific setting of stochastic volatility models, Cvitanić et al. [8] compute the superhedging strategy and price for a contingent claim $H = g(S_T)$, yielding that the former is a buy-and-hold strategy and so the latter is just $S_0$. The same study is carried over under portfolio constraints.

Markets with Frictions

In the previous section, we made the implicit assumption that investors can trade in continuous time and without frictions. This is clearly a strong idealization of the real world; that is why during the last 15 years much effort has been devoted to the superhedging approach under various types of trading constraints.
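
Returning briefly to the frictionless dual formula (3), the supremum over martingale measures can be evaluated by hand in a toy model. The sketch below assumes a one-period model with one discounted asset and three scenarios; the function names and numbers are illustrative assumptions. The supremum over equivalent martingale measures is in general not attained, so the code maximizes over the extreme points of the closed set of (merely absolutely continuous) martingale measures, which in this finite setting gives the same value because each extreme point is a limit of equivalent measures.

```python
def martingale_measure_vertices(gains):
    """Extreme points of {q >= 0, sum(q) = 1, sum(q * gains) = 0} for a single asset."""
    n = len(gains)
    vertices = []
    for i, gi in enumerate(gains):
        if gi == 0:
            q = [0.0] * n
            q[i] = 1.0  # point mass on a scenario where the price does not move
            vertices.append(q)
        for j, gj in enumerate(gains):
            if gi < 0 < gj:
                q = [0.0] * n
                q[i] = gj / (gj - gi)  # weight on the down scenario
                q[j] = -gi / (gj - gi)  # weight on the up scenario
                vertices.append(q)
    return vertices


def dual_price(gains, payoff):
    """Largest expected payoff over the extreme martingale measures."""
    return max(
        sum(qk * hk for qk, hk in zip(q, payoff))
        for q in martingale_measure_vertices(gains)
    )


# Hypothetical example: S_0 = 100, S_T in {80, 100, 120}, call with strike 100.
gains = [-20.0, 0.0, 20.0]
payoff = [0.0, 0.0, 20.0]
upper = dual_price(gains, payoff)

# Every equivalent martingale measure here has the form (a, 1 - 2a, a), 0 < a < 1/2,
# and its expected payoff 20a stays below the bound.
inner = [a * payoff[0] + (1 - 2 * a) * payoff[1] + a * payoff[2] for a in (0.1, 0.25, 0.49)]

# Primal side: initial capital 10 with theta = 0.5 dominates the payoff in every scenario.
dominates = all(10.0 + 0.5 * g >= h for g, h in zip(gains, payoff))
print(upper, inner, dominates)
```

The dual bound equals 10, which is also the capital of the dominating strategy checked on the last lines, in line with equation (3).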

Transaction Costs. Financial models with proportional transaction costs were studied first by Jouini and Kallal [19] and then generalized in a series of papers by Kabanov and his coauthors [20–22].

For the reader’s convenience, we briefly introduce the model, following the bid–ask matrix formalism introduced by Schachermayer [27], which is only one of many equivalent convenient ways of describing it (see, e.g., [22] and Transaction Costs for more details).

We consider an economy with $d \ge 1$ risky assets (e.g., foreign currencies); $\pi_t^{ij}(\omega)$ denotes the number of physical units of asset i that can be exchanged with 1 unit of asset j at time $t \in [0, T]$. All of them are assumed to be adapted to some filtration and càdlàg. An important role is played by the so-called solvency region $K_t(\omega)$, the cone generated by the unit vectors $e_i$ and $\pi^{ij} e_i - e_j$ for $1 \le i, j \le d$. Elements of $K_t(\omega)$ are all the positions that can be liquidated into a portfolio with a nonnegative quantity of each currency. We denote $K_t^*(\omega)$ as the positive polar of $K_t(\omega)$.

A self-financing portfolio process is modeled by a d-dimensional finite variation process $V = (V_t)_{t \in [0,T]}$ such that each infinitesimal change $dV_t(\omega)$ lies in $-K_t(\omega)$, that is, a portfolio change at time t has to be done according to the trading terms described by the solvency cone $K_t$.

In this setting, so-called strictly consistent price systems play the same role as the equivalent martingale measures. A strictly consistent price system Z is a positive non-null d-dimensional martingale such that each $Z_t(\omega)$ belongs to the relative interior of $K_t^*(\omega)$ almost surely for all $t \in [0, T]$. We denote $\mathcal{Z}^s$ as the set of all strictly consistent price systems. A standard assumption is that there exists at least one of such Z’s, that is, $\mathcal{Z}^s \ne \emptyset$, which is equivalent to some kind of no-arbitrage condition (see Transaction Costs for details).

Let $H = (H^1, \ldots, H^d)$ be a d-dimensional contingent claim such that $H + a\mathbf{1} \in K_T$ for some $a \in \mathbb{R}$. We say that an admissible portfolio V (see End Note a) superhedges H if $V_T - H \in K_T$. Consider the set $D_H$ of all initial endowment $x \in \mathbb{R}^d$ such that there exists an admissible portfolio V, $V_0 = x$, that superhedges H. In this model, the superhedging theorem states that

$$D_H = \left\{ x \in \mathbb{R}^d : E[\langle Z_T, H \rangle] \le \langle x, Z_0 \rangle,\ \forall Z \in \mathcal{Z}^s \right\} \tag{6}$$

where $\langle \cdot, \cdot \rangle$ denotes the usual scalar product in $\mathbb{R}^d$. This theorem has been proven with increasing degree of generality by Cvitanić and Karatzas [7], Kabanov [20], and Kabanov and Last [21] for continuous bid–ask processes $(\pi_t)_{t \in [0,T]}$ and constant proportional transaction costs, by Kabanov and Stricker [22] under slightly more general assumptions and finally, motivated by a counterexample constructed by Rásonyi [26], Campi and Schachermayer [5] extend it to discontinuous π.

Explicit computations of the superhedging price have been performed in [3, 9, 18] for a European-type contingent claim $H = g(S_T)$, where $S_T$ is the price at time T of a given asset in terms of some fixed numéraire. Under different assumptions, the superhedging strategy is a buy-and-hold one, so that the corresponding superhedging price is the price at time $t = 0$ of the underlying $S_0$.

Finally, duality methods for American options under proportional transaction costs are briefly treated in Transaction Costs.

Other Types of Market Frictions. Superhedging has also been studied under other types of constraints on, for example, shortselling and/or borrowing (see, e.g., Cvitanić and Karatzas’ paper [6] and Karatzas and Shreve’s book [23], Chapter 5, for more details). Very often, an agent willing to superhedge a contingent claim H has to choose a strategy fulfilling a given set of constraints. Let us denote $\mathcal{A}_c$ as the class of constrained trading strategies. In this case, the constrained superhedging price $\bar{p}_c(H)$ is given by

$$\bar{p}_c(H) = \inf\left\{ x \in \mathbb{R}^d : \exists (x, \theta) \in \mathcal{A}_c,\ V_T^{x,\theta} \ge H \right\} \tag{7}$$

Cvitanić and Karatzas [6] gave the first dual characterization of $\bar{p}_c(H)$ in a diffusions setting, which was further generalized to general semimartingales by Föllmer and Kramkov [16] via a constrained version of the optional decomposition theorem, whose original version we already discussed at the end of the section Frictionless Incomplete Markets.

We conclude by mentioning a recent series of papers by Broadie et al. [4] and by Soner and Touzi [28, 29] on superhedging under gamma constraints, where an agent is allowed to hedge H, having at the same time a control on the gamma of his or her portfolios.
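
The bid–ask matrix formalism of the transaction-cost model can be illustrated at a single date and a single scenario. The sketch below is an illustration under stated assumptions: the helper names `solvency_generators` and `in_polar_cone` and the numerical bid–ask matrix are hypothetical, and the matrix is indexed so that `pi[i][j]` is the number of units of asset i exchanged for one unit of asset j. A vector z lies in the positive polar $K^*$ exactly when its scalar product with every generator of K is nonnegative; membership in the relative interior, as required of a strictly consistent price system, would need strict inequalities and is not checked here.

```python
def solvency_generators(pi):
    """Generators of the solvency cone K: e_i and pi[i][j] * e_i - e_j for i != j."""
    d = len(pi)
    generators = []
    for i in range(d):
        unit = [0.0] * d
        unit[i] = 1.0
        generators.append(unit)
        for j in range(d):
            if i != j:
                g = [0.0] * d
                g[i] = pi[i][j]  # units of asset i given up ...
                g[j] = -1.0  # ... to cover a short position of one unit of asset j
                generators.append(g)
    return generators


def in_polar_cone(z, pi, tol=1e-12):
    """True if <z, g> >= 0 for every generator g of K, i.e. z lies in K*."""
    return all(
        sum(zk * gk for zk, gk in zip(z, g)) >= -tol
        for g in solvency_generators(pi)
    )


# Hypothetical two-currency market: 1.02 units of one currency buy 1 unit of the other.
pi = [[1.0, 1.02], [1.02, 1.0]]
print(in_polar_cone([1.0, 1.0], pi))   # frictionless rate 1 lies within the bid-ask spread
print(in_polar_cone([1.0, 1.05], pi))  # implied rate 1.05 lies outside the spread
```

The check amounts to requiring that the exchange rates implied by z lie within the bid–ask spread, which is the sense in which consistent price systems replace martingale measures here.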

End Notes

a. We remark en passant that the notion of admissibility in the presence of transaction costs, that we do not give here, is a subtle one. The interested reader could look at [5] for a short discussion.

References

[1] Benais, B., Lesne, J.P., Pagès, H. & Scheinkman, J. (1992). Derivative asset pricing with transaction costs, Mathematical Finance 2, 63–86.
[2] Bouchard, B., Kabanov, Yu.M. & Touzi, N. (2001). Option pricing by large risk aversion utility under transaction costs, Decisions in Economics and Finance 24, 127–136.
[3] Bouchard, B. & Touzi, N. (2000). Explicit solution of the multivariate super-replication problem under transaction costs, Annals of Applied Probability 10, 685–708.
[4] Broadie, M., Cvitanić, J. & Soner, H.M. (1998). Optimal replication of contingent claims under portfolio constraints, The Review of Financial Studies 11, 59–79.
[5] Campi, L. & Schachermayer, W. (2006). A super-replication theorem in Kabanov’s model of transaction costs, Finance and Stochastics 10(4), 579–596.
[6] Cvitanić, J. & Karatzas, I. (1993). Hedging contingent claims with constrained portfolios, The Annals of Applied Probability 3(3), 652–681.
[7] Cvitanić, J. & Karatzas, I. (1996). Hedging and portfolio optimization under transaction costs: a martingale approach, Mathematical Finance 6(2), 133–165.
[8] Cvitanić, J., Pham, H. & Touzi, N. (1999). Super-replication in stochastic volatility models under portfolio constraints, Journal of Applied Probability 36(2), 523–545.
[9] Cvitanić, J., Pham, H. & Touzi, N. (1999). A closed form solution to the problem of super-replication under transaction costs, Finance and Stochastics 3, 35–54.
[10] Delbaen, F. & Schachermayer, W. (1994). A general version of the fundamental theorem of asset pricing, Mathematische Annalen 300, 463–520.
[11] Delbaen, F. & Schachermayer, W. (1998). The fundamental theorem of asset pricing for unbounded stochastic processes, Mathematische Annalen 312, 215–250.
[12] Delbaen, F. & Schachermayer, W. (1999). A compactness principle for bounded sequences of martingales with applications, Proceedings of the Seminar of Stochastic Analysis, Random Fields and Applications, Progress in Probability 45, 137–173.
[13] El Karoui, N. & Quenez, M.-C. (1991). Programmation dynamique et évaluation des actifs contingents en marché incomplet. (French) [Dynamic programming and pricing of contingent claims in an incomplete market], Comptes Rendus de l’Académie des Sciences Série Mathématiques 313(12), 851–854.
[14] El Karoui, N. & Quenez, M.-C. (1995). Dynamic programming and pricing of contingent claims in an incomplete market, SIAM Journal of Control and Optimization 33(1), 27–66.
[15] Föllmer, H. & Kabanov, Yu.M. (1998). Optional decomposition and Lagrange multipliers, Finance and Stochastics 2(1), 69–81.
[16] Föllmer, H. & Kramkov, D. (1997). Optional decompositions under constraints, Probability Theory and Related Fields 109, 1–25.
[17] Föllmer, H. & Schied, A. (2004). Stochastic Finance: An Introduction in Discrete Time, 2nd Edition, de Gruyter Studies in Mathematics, Berlin, P. 27.
[18] Guasoni, P., Rásonyi, M. & Schachermayer, W. (2007). Consistent price systems and face-lifting under transaction costs, Annals of Applied Probability 18(2), 491–520.
[19] Jouini, E. & Kallal, H. (1995). Martingales and arbitrage in securities markets with transaction costs, Journal of Economic Theory 66, 178–197.
[20] Kabanov, Yu.M. (1999). Hedging and liquidation under transaction costs in currency markets, Finance and Stochastics 3(2), 237–248.
[21] Kabanov, Yu.M. & Last, G. (2002). Hedging under transaction costs in currency markets: a continuous-time model, Mathematical Finance 12(1), 63–70.
[22] Kabanov, Yu. & Stricker, Ch. (2002). Hedging of contingent claims under transaction costs, in Advances in Finance and Stochastics. Essays in Honour of Dieter Sondermann, K. Sandmann & Ph. Schonbucher, eds, Springer, Berlin, Heidelberg, New York.
[23] Karatzas, I. & Shreve, S. (1998). Methods of Mathematical Finance, Springer.
[24] Kramkov, D. (1996). Optional decomposition of supermartingales and hedging contingent claims in incomplete security markets, Probability Theory and Related Fields 105, 459–479.
[25] Naik, V. & Uppal, R. (1994). Leverage constraints and the optimal hedging of stock and bond options, Journal of Financial and Quantitative Analysis 29(2), 199–222.
[26] Rásonyi, M. (2003). A Remark on the Superhedging Theorem Under Transaction Costs. Séminaires de Probabilités XXXVII, Lecture Notes in Mathematics, 1832, Springer, pp. 394–398.
[27] Schachermayer, W. (2004). The fundamental theorem of asset pricing under proportional transaction costs in finite discrete time, Mathematical Finance 14(1), 19–48.
[28] Soner, H.M. & Touzi, N. (2000). Super-replication under gamma constraints, SIAM Journal of Control and Optimization 39(1), 73–96.
[29] Soner, M. & Touzi, N. (2007). Hedging under gamma constraints by optimal stopping and face-lifting, Mathematical Finance 17(1), 59–80.

LUCIANO CAMPI
Free Lunch

In the process of building realistic mathematical models of financial markets, absence of opportunities for riskless profit is considered to be a minimal normative assumption in order for the market to be in an equilibrium state. The reason is quite obvious. If opportunities for riskless profit were present in the market, every economic agent would try to reap them. Prices would then instantaneously move in response to an imbalance between supply and demand. This sudden price movement would continue as long as opportunities for riskless profit are still present in the market. Therefore, in market equilibrium, no such opportunities should be possible.

The aforementioned simple and very natural idea has proved very fruitful and has led to great mathematical as well as economic insight in the theory of quantitative finance. A rigorous formulation of the exact definition of “absence of opportunities for riskless profit” turned out to be a highly nontrivial task that troubled mathematicians and economists for at least two decades (see End Note a). As the road unfolded, the valuable input of the theory of stochastic analysis in financial theory was obvious; in the other direction, the development of the theory of stochastic processes benefited immensely from problems that emerged purely from these financial considerations.

Since the late 1970s, there has been a notion that there is a deep connection between the absence of opportunities for riskless profit and the existence of a risk-neutral measure (see End Note b), that is, a probability that is equivalent to the original one under which the discounted asset price processes have some kind of martingale property. Existence of such measures is of major practical importance, since they open the road to pricing illiquid assets or contingent claims in the market (see Risk-neutral Pricing). The result of the above notion has been called the fundamental theorem of asset pricing (FTAP); for a detailed account, see Fundamental Theorem of Asset Pricing.

The easiest and most classical way to formulate the notion of riskless profit is via the so-called arbitrage strategy (see Arbitrage Strategy). An arbitrage is a combination of positions in the traded assets that requires zero initial capital and results in nonnegative outcome with a strictly positive probability of the wealth being strictly positive at a fixed time point in the future (after liquidation has taken place). Naturally, the previous formulation of an arbitrage presupposes that a probabilistic model for the random movement of liquid asset prices has been set up. In [5], a discrete state space, multiperiod discrete-time financial market was considered. For this model, the authors showed the equivalence between the economic “no arbitrage” (NA) condition and the mathematical stipulation of existence of an equivalent probability that makes the discounted asset price processes martingales.

Crucial in the proof of the result in [5] was the separating hyperplane theorem in finite-dimensional Euclidean spaces. One of the convex sets to be separated is the class of all terminal outcomes resulting from trading and possible consumption starting from zero capital; the other is the positive orthant. The NA condition is basically the statement that the intersection of these two convex sets consists of only the zero vector.

After the publication of [5], a saga of papers followed that were aimed, one way or another, at strengthening the conclusion by considering more complicated market models. It quickly became obvious that the previous NA condition is no longer sufficient to imply the existence of a risk-neutral measure; it is too weak. In infinite-dimensional spaces, separation by hyperplanes, made possible by means of the geometric version of the Hahn–Banach theorem, requires the closedness of the set C of all terminal outcomes resulting from trading and possible consumption starting from zero capital. The simple NA condition does not imply this, in general. This has led Kreps [7] to define a free lunch as a generalized, asymptotic form of an arbitrage.

Essentially, a free lunch is a possibly infinite-valued random variable f with $\mathbb{P}[f \ge 0] = 1$ and $\mathbb{P}[f > 0] > 0$ that belongs to the closure of C. Once an appropriate topology is defined on $L^0$, the space of all random variables, in order for the last closure (call it $\tilde{C}$) to make sense, the “no-free-lunch” (NFL) condition states that (see End Note c) $\tilde{C} \cap L^0_+ = \{0\}$. Kreps [7] used this idea with a very weak topology on locally convex spaces and showed the existence of a separating measure (see End Note d). However, apart from trivial cases, this topology does not stem from a metric, which means that closedness cannot be described in terms of convergence of sequences. This makes the definition of a free lunch quite nonintuitive.
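
The equivalence between NA and the existence of an equivalent martingale measure, established in [5] for finite models, can be checked directly in the simplest possible case. The following minimal sketch assumes one period, one discounted risky asset, and finitely many scenarios, each with positive probability under the original measure; the function name and the weighting scheme are illustrative assumptions and only one of many ways to pick such a measure.

```python
def equivalent_martingale_measure(s0, s_T):
    """Return strictly positive probabilities q with sum(q * (s - s0)) = 0, or None.

    None is returned when the gains take only one sign (and are not all zero),
    in which case buying or short-selling the asset is an arbitrage.
    """
    gains = [s - s0 for s in s_T]
    up = sum(g for g in gains if g > 0)  # total upward move
    down = -sum(g for g in gains if g < 0)  # total downward move
    if (up > 0) != (down > 0):
        return None
    # Weight up-scenarios by the total down-move and down-scenarios by the total
    # up-move, so that the weighted gains cancel; flat scenarios get weight 1.
    weights = [down if g > 0 else up if g < 0 else 1.0 for g in gains]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical examples with S_0 = 100.
q = equivalent_martingale_measure(100.0, [80.0, 100.0, 120.0])
no_measure = equivalent_martingale_measure(100.0, [100.0, 110.0, 120.0])
print(q, no_measure)
```

In the first example every scenario receives positive weight and the expected gain vanishes; in the second the asset can never lose value, so holding it is an arbitrage and no equivalent martingale measure exists.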

After [7], there were many attempts to introduce a condition closely related to NFL that would be more economically plausible, albeit still equivalent to NFL, and would prove equivalent to the existence of a risk-neutral measure. In general finite-horizon, discrete-time markets, it was shown in [1] (see End Note e) that the plain NA condition is equivalent to NFL. This seemed to suggest the possibility of a nice counterpart of the NFL condition for more complicated models. Delbaen [2] treated the case of continuous time, bounded, and continuous asset prices and used a neat condition, equivalent to NFL, called (see End Note f) no free lunch with bounded risk (NFLBR) that can be stated in terms of sequence convergence. Essentially, the NFLBR condition precludes asymptotic arbitrage at some fixed point in time, when the overall downside risk of all the wealth processes involved is bounded. Later, [8] treated the case of infinite-horizon discrete-time models, where the NFLBR condition was once again used.

At this point, with the continuous-path and infinite-horizon discrete-time cases resolved, there seemed to be one more “gluing” step to reach a general version of the FTAP for semimartingale models. Not only did Delbaen and Schachermayer make this step for semimartingale models, they actually further weakened the NFLBR condition to the no free lunch with vanishing risk (NFLVR) condition, where the previous asymptotic arbitrage at some fixed point in time is precluded and the overall downside risk of all the wealth processes tends to zero in the limit. In more precise mathematical terms, the NFLVR condition can be stated as $\bar{C} \cap L^0_+ = \{0\}$, where $\bar{C}$ is the closure in the very strong $L^\infty$-topology of (almost sure) uniform convergence.

The NFLVR condition was finally the one that proved itself to be the most fruitful in obtaining a general version of the FTAP; see [3] and [4] (also see Fundamental Theorem of Asset Pricing). It is both economically plausible and mathematically convenient. Needless to say, and like many great results in science, the final simplicity and clarity of the result’s statement came with the price that the corresponding proof was extremely technical.

End Notes

a. The exact market viability definition is still sometimes the source of debate.
b. Also called an equivalent martingale measure; see Equivalent Martingale Measures for an account of the different notions that the previous appellation encompasses.
c. $L^0_+$ is the subset of $L^0$ consisting of nonnegative random variables.
d. A separating measure is a probability $\mathbb{Q}$ equivalent to the original one such that all elements of C have nonpositive expectation with respect to $\mathbb{Q}$. In this context, also see Fundamental Theorem of Asset Pricing. Note that in the case of a continuous-time market model with locally bounded asset prices, a separating measure automatically makes the discounted asset prices local martingales. This was proved in [3].
e. For a compact and rather elementary proof of this result, see [6].
f. The appellation to this condition was actually coined by W. Schachermayer in [8].

References

[1] Dalang, R.C., Morton, A. & Willinger, W. (1990). Equivalent martingale measures and no-arbitrage in stochastic securities market models. Stochastics and Stochastics Reports 29, 185–201.
[2] Delbaen, F. (1992). Representing martingale measures when asset prices are continuous and bounded. Mathematical Finance 2, 107–130.
[3] Delbaen, F. & Schachermayer, W. (1994). A general version of the fundamental theorem of asset pricing, Mathematische Annalen 300, 463–520.
[4] Delbaen, F. & Schachermayer, W. (1998). The fundamental theorem of asset pricing for unbounded stochastic processes. Mathematische Annalen 312, 215–250.
[5] Harrison, J.M. & Kreps, D.M. (1979). Martingales and arbitrage in multiperiod securities markets, Journal of Economic Theory 20, 381–408.
[6] Kabanov, Y. & Stricker, C. (2001). A teachers’ note on no-arbitrage criteria, in Seminaire de Probabilites, XXXV, Lecture Notes in Mathematics, Springer, Berlin, Vol. 1755, 149–152.
[7] Kreps, D.M. (1981). Arbitrage and equilibrium in economies with infinitely many commodities, Journal of Mathematical Economics 8, 15–35.
[8] Schachermayer, W. (1994). Martingale measures for discrete-time processes with infinite horizon, Mathematical Finance 4, 25–55.

CONSTANTINOS KARDARAS
Minimal Entropy Martingale Measure

Consider a stochastic process $S = (S_t)_{t \ge 0}$ on a probability space $(\Omega, \mathcal{F}, P)$ and adapted to a filtration $\mathbb{F} = (\mathcal{F}_t)_{t \ge 0}$. Each $S_t$ takes values in $\mathbb{R}^d$ and models the discounted prices at time t of d basic assets traded in a financial market. An equivalent local martingale measure (ELMM) for S, possibly on $[0, T]$ for a time horizon $T < \infty$, is a probability measure Q equivalent to the original (historical, real-world) measure P (on $\mathcal{F}_T$, if there is a T) such that S is a local Q-martingale (on $[0, T]$, respectively); see Equivalent Martingale Measures. If S is a nonnegative P-semimartingale, the fundamental theorem of asset pricing says that the existence of an ELMM Q for S is equivalent to the absence-of-arbitrage condition (NFLVR) that S admits no free lunch with vanishing risk; see Fundamental Theorem of Asset Pricing.

Definition 1 Fix a time horizon $T < \infty$. An ELMM $Q^E$ for S on $[0, T]$ is called minimal entropy martingale measure (MEMM) if $Q^E$ minimizes the relative entropy $H(Q|P)$ over all ELMMs Q for S on $[0, T]$.

Recall that the relative entropy is defined as

$$H(Q|P) := \begin{cases} E_P\left[ \dfrac{dQ}{dP} \log \dfrac{dQ}{dP} \right] & \text{if } Q \ll P \\ +\infty & \text{otherwise} \end{cases} \tag{1}$$

This is an example of the general concept of an f-divergence of the form

$$D_f(Q|P) := \begin{cases} E_P\left[ f\left( \dfrac{dQ}{dP} \right) \right] & \text{if } Q \ll P \\ +\infty & \text{otherwise} \end{cases} \tag{2}$$

where f is a convex function on $[0, \infty)$; see [26, 49], or [22] for a number of examples. The minimizer $Q^{*,f}$ of $D_f(\,\cdot\,|P)$ is then called f-optimal ELMM.

In many situations arising in mathematical finance, f-optimal ELMMs come up via duality from expected utility maximization problems; see Expected Utility Maximization: Duality Methods; Expected Utility Maximization. One starts with a utility function U (see Utility Function) and obtains f (up to an affine function) as the convex conjugate of U, that is,

$$f(y) - \alpha y - \beta = \sup_x \left( U(x) - xy \right) \tag{3}$$

Finding $Q^{*,f}$ is then the dual to the primal problem of maximizing the expected utility

$$\vartheta \mapsto E\left[ U\left( x_0 + \int_0^T \vartheta_r \, dS_r \right) \right] \tag{4}$$

from terminal wealth over allowed investment strategies ϑ. Moreover, under suitable conditions, the solutions $Q^{*,f}$ and $\vartheta^{*,U}$ are related by

$$\frac{dQ^{*,f}}{dP} = \text{const.}\ U'\left( x_0 + \int_0^T \vartheta_r^{*,U} \, dS_r \right) \tag{5}$$

More details can, for instance, be found in [26, 41, 46, 67, 68]. Relative entropy comes up with $f_E(y) = y \log y$ when one starts with the exponential utility functions $U_\alpha(x) = -e^{-\alpha x}$ with risk aversion $\alpha > 0$. The duality in this special case has been studied in detail in [8, 18, 40].

Since $f_E$ is strictly convex, the minimal entropy martingale measure is always unique. If S is locally bounded, the MEMM (on $[0, T]$) exists if and only if there is at least one ELMM Q for S on $[0, T]$ with $H(Q|P) < \infty$ [21]. For general unbounded S, the MEMM need not exist; [21] contains a counterexample, and [1] shows how the duality above will then fail. In [21], it is also shown that the MEMM is automatically equivalent to P, even if it is defined as the minimizer of $H(Q|P)$ over all P-absolutely continuous local martingale measures for S on $[0, T]$, provided that there exists some ELMM Q for S on $[0, T]$ with $H(Q|P) < \infty$. Moreover, the density of $Q^E$ with respect to P on $\mathcal{F}_T$ has a very specific form; it is given by

$$\left. \frac{dQ^E}{dP} \right|_{\mathcal{F}_T} = Z_T^E = Z_0 \exp\left( \int_0^T \vartheta_r^E \, dS_r \right) \tag{6}$$

for some constant $Z_0 > 0$ and some predictable S-integrable process $\vartheta^E$. This has been proved in [21] for models in finite discrete time and in [26, 28] in general; see also [23] for an application to finding optimal strategies in a Lévy process setting. Note,

however, that representation (6) holds only at the time horizon, T; the density process

$$Z_t^E = \left. \frac{dQ^E}{dP} \right|_{\mathcal{F}_t} = E_P\left[ Z_T^E \,\middle|\, \mathcal{F}_t \right], \quad 0 \le t \le T \tag{7}$$

is usually quite difficult to find. We remark that the above results on both the equivalence to P and the structure of the $f_E$-optimal $Q^E$ have versions for more general f-divergences [26]. (Essentially, equation (6) is relation (5) in the case of exponential utility, but it can also be proved directly without using general duality.)

The history of the minimal entropy martingale measure $Q^E$ is not straightforward to trace. A general definition and an authoritative exposition are given by Frittelli [21]. However, the idea of the so-called minimax measures to link martingale measures via duality to utility maximization already appears, for instance, in [30, 31, 41]; see also [8]. Other early contributors include Miyahara [53], who used the term “canonical martingale measure”, and Stutzer [70]; some more historical comments and references are contained in [71]. Even before, in [20], it was shown that the property defining the MEMM is satisfied by the so-called minimal martingale measure if S is continuous and the so-called mean-variance trade-off of S has constant expectation over all ELMMs for S; see also Minimal Martingale Measure. The most prominent example for this occurs when S is a Markovian diffusion [53].

After the initial foundations, work on the MEMM has mainly concentrated on three major areas. The first aims to determine or describe the MEMM and, in particular, its density process $Z^E$ more explicitly in specific models. This has been done, among others, for the following:

• stochastic volatility models: see [9, 10, 35, 62, 63], and compare also Volatility; Barndorff-Nielsen and Shephard (BNS) Models;
• jump-diffusions [54]; and
• Lévy processes (see Lévy Processes), both in general and in special settings: see [36] for an overview and [42, 43] for some examples. In particular, many studies have considered exponential Lévy models (see Exponential Lévy Models) where $S = S_0 \, \mathcal{E}(L)$ and L is a Lévy process under P. There, the existence of the MEMM $Q^E$ reduces to an analytical condition on the Lévy triplet of L. Moreover, $Q^E$ is then given by an Esscher transform (see Esscher Transform) and L is again a Lévy process under $Q^E$; see, for instance, [13, 19, 24, 39].

For continuous semimartingales S, an alternative approach is to characterize $Z^E$ via semimartingale backward equations or backward stochastic differential equations [50, 52]. The results in [56, 57] use a mixture of the above ideas in a specific class of models.

The second major area is concerned with convergence questions. Several authors have proved, in several settings and with various techniques, that the minimal entropy martingale measure $Q^E$ is the limit, as $p \downarrow 1$, of the so-called p-optimal martingale measures obtained by minimizing the f-divergence associated to the function $f(y) = y^p$. This line of research was initiated in [27, 28], and later contributions include [39, 52, 65]. In [45, 60], this convergence is combined with the general duality (5) from utility maximization in order to obtain convergence results for optimal wealths and strategies as well.

The third, and by far the most important area of research on the MEMM, is centered on its link to the exponential utility maximization problem; see [8, 18] for a detailed exposition of this issue. More specifically, the MEMM is very useful when one studies the valuation of contingent claims by (exponential) utility indifference valuation; see Utility Indifference Valuation. To explain this, we fix an initial capital $x_0$ and a random payoff H due at time T. The maximal expected utility one can obtain by trading in S via some strategy ϑ, if one starts with $x_0$ and has to pay out H in T, is

$$\sup_{\vartheta} E\left[ U\left( x_0 + \int_0^T \vartheta_r \, dS_r - H \right) \right] =: u(x_0; -H) \tag{8}$$

and the utility indifference value $x_H$ is then implicitly defined by

$$u(x_0 + x_H; -H) = u(x_0; 0) \tag{9}$$

Hence, $x_H$ represents the monetary compensation required for selling H if one wants to achieve utility indifference at the optimal investment behavior. If $U = U_\alpha$ is exponential, its multiplicative structure
makes the analysis of the utility indifference value $x_H$ tractable, in remarkable contrast to all other classical utility functions. Moreover, $u(x_0; -H)$ as well as $x_H$ and the optimal strategy $\vartheta_H^*$ can be described with the help of a minimal entropy martingale measure (defined here with respect to a new, H-dependent reference measure $P_H$ instead of P). This topic was first studied in [4, 58, 59, 64]; later work has examined intertemporally dynamic extensions [5, 51], descriptions via backward stochastic differential equations (BSDEs) in specific models [6, 51], extensions to more general payoff structures [38, 47, 48, 61], and so on [29, 37, 69].

Apart from the above, there are a number of other areas where the minimal entropy martingale measure has come up; these include the following:

• option price comparisons [7, 11, 32–34, 55];
• generalizations or connections to other optimal ELMMs [2, 14, 15, 66]; see also Minimal Martingale Measure and [20];
• utility maximization with a random time horizon [12];
• good deal bounds [44]; see also Good-deal Bounds; and
• a calibration game [25].

There are also many papers that simply choose the MEMM as pricing measure for option pricing applications; especially in papers from the actuarial literature, this approach is often motivated by the connections between the MEMM and the Esscher transformation. Finally, we mention that the idea of looking for a martingale measure subject to a constraint on relative entropy also naturally comes up in calibration problems; see, for instance, [3, 16, 17] and Model Calibration.

References

[1] Acciaio, B. (2005). Absolutely continuous optimal martingale measures, Statistics and Decisions 23, 81–100.
[2] Arai, T. (2001). The relations between minimal martingale measure and minimal entropy martingale measure, Asia-Pacific Financial Markets 8, 137–177.
[3] Avellaneda, M. (1998). Minimum-relative-entropy calibration of asset pricing models, International Journal of Theoretical and Applied Finance 1, 447–472.
[4] Becherer, D. (2003). Rational hedging and valuation of integrated risks under constant absolute risk aversion, Insurance: Mathematics and Economics 33, 1–28.
[5] Becherer, D. (2004). Utility-indifference hedging and valuation via reaction-diffusion systems, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 460, 27–51.
[6] Becherer, D. (2006). Bounded solutions to backward SDEs with jumps for utility optimization and indifference hedging, Annals of Applied Probability 16, 2027–2054.
[7] Bellamy, N. (2001). Wealth optimization in an incomplete market driven by a jump-diffusion process, Journal of Mathematical Economics 35, 259–287.
[8] Bellini, F. & Frittelli, M. (2002). On the existence of minimax martingale measures, Mathematical Finance 12, 1–21.
[9] Benth, F.E. & Karlsen, K.H. (2005). A PDE representation of the density of the minimal entropy martingale measure in stochastic volatility markets, Stochastics 77, 109–137.
[10] Benth, F.E. & Meyer-Brandis, T. (2005). The density process of the minimal entropy martingale measure in a stochastic volatility model with jumps, Finance and Stochastics 9, 563–575.
[11] Bergenthum, J. & Rüschendorf, L. (2007). Convex ordering criteria for Lévy processes, Advances in Data Analysis and Classification 1, 143–173.
[12] Blanchet-Scalliet, C., El Karoui, N. & Martellini, L. (2005). Dynamic asset pricing theory with uncertain time-horizon, Journal of Economic Dynamics and Control 29, 1737–1764.
[13] Chan, T. (1999). Pricing contingent claims on stocks driven by Lévy processes, Annals of Applied Probability 9, 504–528.
[14] Choulli, T. & Stricker, C. (2005). Minimal entropy-Hellinger martingale measure in incomplete markets, Mathematical Finance 15, 465–490.
[15] Choulli, T. & Stricker, C. (2006). More on minimal entropy-Hellinger martingale measure, Mathematical Finance 16, 1–19.
[16] Cont, R. & Tankov, P. (2004). Nonparametric calibration of jump-diffusion option pricing models, Journal of Computational Finance 7, 1–49.
[17] Cont, R. & Tankov, P. (2006). Retrieving Lévy processes from option prices: regularization of an ill-posed inverse problem, SIAM Journal on Control and Optimization 45, 1–25.
[18] Delbaen, F., Grandits, P., Rheinländer, T., Samperi, D., Schweizer, M. & Stricker, C. (2002). Exponential hedging and entropic penalties, Mathematical Finance 12, 99–123.
[19] Esche, F. & Schweizer, M. (2005). Minimal entropy preserves the Lévy property: how and why, Stochastic Processes and their Applications 115, 299–327.
[20] Föllmer, H. & Schweizer, M. (1991). Hedging of contingent claims under incomplete information, in M.H.A. Davis & R.J. Elliott, eds, Applied Stochastic Analysis,
Stochastics Monographs, Gordon and Breach, London, Vol. 5, pp. 389–414.
[21] Frittelli, M. (2000). The minimal entropy martingale measure and the valuation problem in incomplete markets, Mathematical Finance 10, 39–52.
[22] Frittelli, M. (2000). Introduction to a theory of value coherent with the no-arbitrage principle, Finance and Stochastics 4, 275–297.
[23] Fujiwara, T. (2004). From the minimal entropy martingale measures to the optimal strategies for the exponential utility maximization: the case of geometric Lévy processes, Asia-Pacific Financial Markets 11, 367–391.
[24] Fujiwara, T. & Miyahara, Y. (2003). The minimal entropy martingale measures for geometric Lévy processes, Finance and Stochastics 7, 509–531.
[25] Glonti, O., Harremoes, P., Khechinashvili, Z., Topsøe, F. & Tbilisi, G. (2007). Nash equilibrium in a game of calibration, Theory of Probability and its Applications 51, 415–426.
[26] Goll, T. & Rüschendorf, L. (2001). Minimax and minimal distance martingale measures and their relationship to portfolio optimization, Finance and Stochastics 5, 557–581.
[27] Grandits, P. (1999). The p-optimal martingale measure and its asymptotic relation with the minimal entropy martingale measure, Bernoulli 5, 225–247.
[28] Grandits, P. & Rheinländer, T. (2002). On the minimal entropy martingale measure, Annals of Probability 30, 1003–1038.
[29] Grasselli, M. (2007). Indifference pricing and hedging for volatility derivatives, Applied Mathematical Finance 14, 303–317.
[30] He, H. & Pearson, N.D. (1991). Consumption and portfolio policies with incomplete markets and short-sale constraints: the finite-dimensional case, Mathematical Finance 1(3), 1–10.
[31] He, H. & Pearson, N.D. (1991). Consumption and portfolio policies with incomplete markets and short-sale constraints: the infinite dimensional case, Journal of Economic Theory 54, 259–304.
[32] Henderson, V. (2005). Analytical comparisons of option prices in stochastic volatility models, Mathematical Finance 15, 49–59.
[33] Henderson, V. & Hobson, D.G. (2003). Coupling and option price comparisons in a jump-diffusion model, Stochastics and Stochastics Reports 75, 79–101.
[34] Henderson, V., Hobson, D., Howison, S. & Kluge, T. (2005). A comparison of option prices under different pricing measures in a stochastic volatility model with correlation, Review of Derivatives Research 8, 5–25.
[35] Hobson, D. (2004). Stochastic volatility models, correlation, and the q-optimal measure, Mathematical Finance 14, 537–556.
[36] Hubalek, F. & Sgarra, C. (2006). Esscher transforms and the minimal entropy martingale measure for exponential Lévy models, Quantitative Finance 6, 125–145.
[37] İlhan, A., Jonsson, M. & Sircar, R. (2005). Optimal investment with derivative securities, Finance and Stochastics 9, 585–595.
[38] İlhan, A. & Sircar, R. (2006). Optimal static-dynamic hedges for barrier options, Mathematical Finance 16, 359–385.
[39] Jeanblanc, M., Klöppel, S. & Miyahara, Y. (2007). Minimal $f^q$-martingale measures for exponential Lévy processes, Annals of Applied Probability 17, 1615–1638.
[40] Kabanov, Y.M. & Stricker, C. (2002). On the optimal portfolio for the exponential utility maximization: remarks to the six-author paper, Mathematical Finance 12, 125–134.
[41] Karatzas, I., Lehoczky, J.P., Shreve, S.E. & Xu, G.L. (1991). Martingale and duality methods for utility maximization in an incomplete market, SIAM Journal on Control and Optimization 29, 702–730.
[42] Kassberger, S. & Liebmann, T. (2008). Minimal q-entropy Martingale Measures for Exponential Time-changed Lévy Processes and within Parametric Classes, preprint, University of Ulm, http://www.uni-ulm.de/mawi/finmath/people/kassberger.html
[43] Kim, Y.S. & Lee, J.H. (2007). The relative entropy in CGMY processes and its applications to finance, Mathematical Methods of Operations Research 66, 327–338.
[44] Klöppel, S. & Schweizer, M. (2007). Dynamic utility-based good deal bounds, Statistics and Decisions 25, 285–309.
[45] Kohlmann, M. & Niethammer, C.R. (2007). On convergence to the exponential utility problem, Stochastic Processes and their Applications 117, 1813–1834.
[46] Kramkov, D. & Schachermayer, W. (1999). The asymptotic elasticity of utility functions and optimal investment in incomplete markets, Annals of Applied Probability 9, 904–950.
[47] Leung, T. & Sircar, R. (2008). Exponential Hedging with Optimal Stopping and Application to ESO Valuation, preprint, Princeton University, http://ssrn.com/abstract=1111993
[48] Leung, T. & Sircar, R. (2009). Accounting for risk aversion, vesting, job termination risk and multiple exercises in valuation of employee stock options, Mathematical Finance 19, 99–128.
[49] Liese, F. & Vajda, I. (1987). Convex Statistical Distances, Teubner.
[50] Mania, M., Santacroce, M. & Tevzadze, R. (2003). A semimartingale BSDE related to the minimal entropy martingale measure, Finance and Stochastics 7, 385–402.
[51] Mania, M. & Schweizer, M. (2005). Dynamic exponential utility indifference valuation, Annals of Applied Probability 15, 2113–2143.
[52] Mania, M. & Tevzadze, R. (2003). A unified characterization of q-optimal and minimal entropy martingale measures by semimartingale backward equations, Georgian Mathematical Journal 10, 289–310.
[53] Miyahara, Y. (1995). Canonical martingale measures of incomplete assets markets, Probability Theory and Mathematical Statistics: Proceedings of the Seventh Japan-Russia Symposium, Tokyo, pp. 343–352.
[54] Miyahara, Y. (1999). Minimal entropy martingale measures of jump type price processes in incomplete assets markets, Asia-Pacific Financial Markets 6, 97–113.
[55] Møller, T. (2004). Stochastic orders in dynamic reinsurance markets, Finance and Stochastics 8, 479–499.
[56] Monoyios, M. (2006). Characterisation of optimal dual measures via distortion, Decisions in Economics and Finance 29, 95–119.
[57] Monoyios, M. (2007). The minimal entropy measure and an Esscher transform in an incomplete market, Statistics and Probability Letters 77, 1070–1076.
[58] Musiela, M. & Zariphopoulou, T. (2004). An example of indifference prices under exponential preferences, Finance and Stochastics 8, 229–239.
[59] Musiela, M. & Zariphopoulou, T. (2004). A valuation algorithm for indifference prices in incomplete markets, Finance and Stochastics 8, 399–414.
[60] Niethammer, C.R. (2008). On convergence to the exponential utility problem with jumps, Stochastic Analysis and Applications 26, 169–196.
[61] Oberman, A. & Zariphopoulou, T. (2003). Pricing early exercise contracts in incomplete markets, Computational Management Science 1, 75–107.
[62] Rheinländer, T. (2005). An entropy approach to the Stein and Stein model with correlation, Finance and Stochastics 9, 399–413.
[63] Rheinländer, T. & Steiger, G. (2006). The minimal entropy martingale measure for general Barndorff-Nielsen/Shephard models, Annals of Applied Probability 16, 1319–1351.
[64] Rouge, R. & El Karoui, N. (2000). Pricing via utility maximization and entropy, Mathematical Finance 10, 259–276.
[65] Santacroce, M. (2005). On the convergence of the p-optimal martingale measures to the minimal entropy martingale measure, Stochastic Analysis and Applications 23, 31–54.
[66] Santacroce, M. (2006). Derivatives pricing via p-optimal martingale measures: some extreme cases, Journal of Applied Probability 43, 634–651.
[67] Schachermayer, W. (2001). Optimal investment in incomplete markets when wealth may become negative, Annals of Applied Probability 11, 694–734.
[68] Schäl, M. (2000). Portfolio optimization and martingale measures, Mathematical Finance 10, 289–303.
[69] Stoikov, S. (2006). Pricing options from the point of view of a trader, International Journal of Theoretical and Applied Finance 9, 1245–1266.
[70] Stutzer, M. (1996). A simple nonparametric approach to derivative security valuation, Journal of Finance 51, 1633–1652.
[71] Stutzer, M.J. (2000). Simple entropic derivation of a generalized Black-Scholes option pricing model, Entropy 2, 70–77.

Related Articles

Entropy-based Estimation; Exponential Lévy Models; Minimal Martingale Measure; Risk-neutral Pricing; Semimartingale.

MARTIN SCHWEIZER
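
As a rough numerical illustration of the utility indifference value defined by (8) and (9) in the article above, the following sketch works in a one-period toy market. Everything specific in it is an assumption made for illustration and is not taken from the article: the three-point distribution of the price increment `dS`, the risk aversion `alpha`, the payoff `H`, the search bracket for the strategy, and all function names. It uses the fact that for U(x) = −exp(−αx) the initial capital factors out of (8), so that x_H = (1/α) log( inf_ϑ E[exp(−α(ϑ ΔS − H))] / inf_ϑ E[exp(−α ϑ ΔS)] ).

```python
import math


def min_expected_exp(alpha, probs, dS, H, lo=-10.0, hi=10.0, iters=200):
    """Minimise theta -> E[exp(-alpha * (theta * dS - H))] by ternary search.

    The objective is convex in theta; the bracket [lo, hi] is an assumption
    and must contain the minimiser.
    """
    def f(theta):
        return sum(p * math.exp(-alpha * (theta * s - h))
                   for p, s, h in zip(probs, dS, H))

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    theta = 0.5 * (lo + hi)
    return f(theta), theta


def max_expected_utility(x0, alpha, probs, dS, H):
    """u(x0; -H) for U(x) = -exp(-alpha * x) in the one-period toy model."""
    best, _ = min_expected_exp(alpha, probs, dS, H)
    return -math.exp(-alpha * x0) * best


def indifference_value(alpha, probs, dS, H):
    """x_H solving u(x0 + x_H; -H) = u(x0; 0); it does not depend on x0."""
    with_claim, _ = min_expected_exp(alpha, probs, dS, H)
    without_claim, _ = min_expected_exp(alpha, probs, dS, [0.0] * len(H))
    return math.log(with_claim / without_claim) / alpha


if __name__ == "__main__":
    probs = [0.3, 0.3, 0.4]      # hypothetical probabilities under P
    dS = [-1.0, 0.0, 1.0]        # hypothetical discounted price increments
    H = [0.0, 0.0, 1.0]          # hypothetical payoff due at time T
    print(indifference_value(1.0, probs, dS, H))
```

In this toy setting a payoff of the form c + k ΔS can be replicated at cost c, so its indifference value should come out as c (provided the shifted minimiser stays inside the bracket), which gives a simple sanity check on the sketch.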
Minimal Martingale Measure

Let $S = (S_t)$ be a stochastic process on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t), P)$ that models the discounted prices of primary traded assets in a financial market. An equivalent local martingale measure (ELMM) for S is a probability measure Q equivalent to the original (historical) measure P such that S is a local Q-martingale (see Equivalent Martingale Measures). If S is a nonnegative P-semimartingale, the fundamental theorem of asset pricing says that an ELMM Q for S exists if and only if S satisfies the no-arbitrage condition (NFLVR), that is, admits no free lunch with vanishing risk (see Fundamental Theorem of Asset Pricing). By Girsanov's theorem, S is then under P a semimartingale with a decomposition $S = S_0 + M + A$ into a local P-martingale M and an adapted process A of finite variation. If S is special under P, then A can be chosen predictable and the resulting canonical decomposition of S is unique. We say that S satisfies the structure condition (SC) if M is locally P-square-integrable and A has the form $A = \int d\langle M\rangle\, \lambda$ for a predictable process λ such that the increasing process $\int \lambda\, d\langle M\rangle\, \lambda$ is finite-valued. In an Itô process model where S is given by a stochastic differential equation $dS_t = S_t\big((\mu_t - r_t)\, dt + \sigma_t\, dW_t\big)$, the latter process is given by $\int \big((\mu_t - r_t)/\sigma_t\big)^2\, dt$, the integrated squared instantaneous Sharpe ratio of S (see Sharpe Ratio).

Definition 1. Suppose S satisfies (SC). An ELMM $\hat P$ for S with P-square-integrable density $d\hat P/dP$ is called minimal martingale measure (MMM) (for S) if $\hat P = P$ on $\mathcal{F}_0$ and if every local P-martingale L that is locally P-square-integrable and strongly P-orthogonal to M is also a local $\hat P$-martingale. We call $\hat P$ orthogonality preserving if L is also strongly $\hat P$-orthogonal to S.

The basic idea for the MMM first appeared in [46] in a more specific model, where it was used as an auxiliary technical tool in the context of local risk-minimization (see also Hedging for an overview of key ideas on hedging and Mean–Variance Hedging for an alternative quadratic approach). More precisely, the so-called locally risk-minimizing strategy for a given contingent claim H was obtained there (under some specific assumptions) as the integrand from the classical Galtchouk–Kunita–Watanabe decomposition of H under $\hat P$. However, the introduction of $\hat P$ in [46] and also in [47] was still somewhat ad hoc.

The above definition was given in [18] where the main results presented here can also be found. In particular, [18] showed that for continuous S, the Galtchouk–Kunita–Watanabe decomposition of H under the MMM $\hat P$ provides (under very mild integrability conditions) the so-called Föllmer–Schweizer decomposition of H under the original measure P, and this in turn immediately gives the locally risk-minimizing strategy for H. We emphasize that this is no longer true, in general, if S has jumps. The MMM subsequently found various other applications and uses and has become fairly popular, especially in models with continuous price processes.

Suppose now S satisfies (SC). For every ELMM Q for S with $dQ/dP \in L^2(P)$, the density process then takes the form

$$Z^Q := \frac{dQ}{dP}\Big|_{\mathbb{F}} = Z_0^Q\, \mathcal{E}\Big(-\int \lambda\, dM + L^Q\Big) \qquad (1)$$

with some locally P-square-integrable local P-martingale $L^Q$. If the MMM $\hat P$ exists, then it has $\hat Z_0 = 1$ and $L^{\hat P} \equiv 0$, and its density process is thus given by the stochastic exponential (see Stochastic Exponential)

$$\hat Z = \mathcal{E}\Big(-\int \lambda\, dM\Big) = \exp\Big(-\int \lambda\, dM - \frac{1}{2}\int \lambda\, d[M]\, \lambda\Big) \times \prod \big(1 - \lambda\, \Delta M\big) \exp\Big(\lambda\, \Delta M + \frac{1}{2}(\lambda\, \Delta M)^2\Big) \qquad (2)$$

The advantage of this explicit representation is that it allows one to determine the MMM $\hat P$ and its density process $\hat Z$ directly from the ingredients M and λ of the canonical decomposition of S. Conversely, one can start with the above expression for $\hat Z$ to define a candidate for the density process of the MMM. This gives existence of the MMM under the following conditions:
1. $\hat Z$ is strictly positive; this happens if and only if $\lambda\, \Delta M < 1$, that is, all the jumps of $\int \lambda\, dM$ are strictly below 1.
2. The local P-martingale $\hat Z$ is a true P-martingale.
3. $\hat Z$ is P-square-integrable.

Condition 1 automatically holds (on any finite time interval) if S, hence also M, is continuous; it typically fails in models where S has jumps. Conditions 2 and 3 can fail even if 1 holds and even if there exists some ELMM for S with P-square-integrable density; see [45] or [15] for a counterexample.

The above explicit formula for $\hat Z$ shows that $\hat P$ is minimal in the sense that its density process contains the smallest number of symbols among all ELMMs Q. More seriously, the original idea was that $\hat P$ should turn S into a (local) martingale while having a minimal impact on the overall martingale structure of our setting. This is captured and made precise by the definition. If S is continuous, one can show that $\hat P$ is even orthogonality preserving; see [18] for this, and note that this usually fails if S has jumps.

To some extent, the naming of the “minimal” martingale measure is misleading since $\hat P$ was not originally defined as the minimizer of a particular functional on ELMMs. However, if S is continuous, Föllmer and Schweizer [18] have proved that $\hat P$ minimizes

$$Q \mapsto H(Q|P) - E_Q\Big[\frac{1}{2}\int_0^\infty \lambda_u\, d\langle M\rangle_u\, \lambda_u\Big] \qquad (3)$$

over all ELMMs Q for S; see also [49]. Moreover, Schweizer [50] has shown that if S is continuous, then $\hat P$ minimizes the reverse relative entropy $H(P|Q)$ over all ELMMs Q for S; this no longer holds if S has jumps. Under more restrictive assumptions, other minimality properties for $\hat P$ have been obtained by several authors. However, a general result under the sole assumption (SC) is not available so far.

There is a large amount of literature related to the MMM. In fact, a Google Scholar search for “minimal martingale measure” (enclosed in quotation marks) produced in April 2008 a list of well over 400 hits. As a first category, this contains papers where the MMM is studied per se or used as in the original approach of local risk-minimization. In terms of topics, the following areas of related work can be found in that category:

• Properties, characterization results, and generalizations for the MMM: [1, 4, 9–11, 14, 19, 33, 36, 37, 49, 51];
• Convergence results for option prices (computed under the MMM): [25, 32, 42, 44];
• Applications to hedging: [7, 39, 47, 48] (see also Hedging).
• Uses for option pricing: [8, 13, 55], to name only a very few; comparison results for option prices are given in [22, 24, 34] (see also Risk-neutral Pricing).
• Problems and counterexamples: [15, 16, 43, 45, 52].
• Equilibrium justifications for using the MMM: [26, 40].

A second category of papers contains those where the MMM has (sometimes unexpectedly) come up in connection with various other problems and topics in mathematical finance. Examples include the following:

• Classical utility maximization and utility indifference valuation [3, 20, 21, 23, 35, 41, 53, 54]: the MMM here often appears because the special structure of a given model implies that $\hat P$ has a particular optimality property (see also Expected Utility Maximization; Expected Utility Maximization: Duality Methods; Utility Indifference Valuation; and Minimal Entropy Martingale Measure).
• The numeraire portfolio and growth-optimal investment [2, 12]: this is related to the minimization of the reverse relative entropy $H(P|\cdot)$ over ELMMs (see also Kelly Problem).
• The concept of value preservation [28–30]: here the link seems to come up because value preservation is, like local risk-minimization, a local optimality criterion.
• Good deal bounds in incomplete markets [5, 6]: the MMM naturally shows up here because good deal bounds are formulated via instantaneous quadratic restrictions on the pricing kernel (ELMM) to be chosen (see also Good-deal Bounds; Sharpe Ratio; Market Price of Risk).
• Local utility maximization [27]; again, the link here is due to the local nature of the criterion that is used.
• Risk-sensitive control [17, 31, 38]; this is an area where the connection to the MMM seems
not yet well understood. See also Risk-sensitive Asset Management.

References

[1] Arai, T. (2001). The relations between minimal martingale measure and minimal entropy martingale measure, Asia-Pacific Financial Markets 8, 137–177.
[2] Becherer, D. (2001). The numeraire portfolio for unbounded semimartingales, Finance and Stochastics 5, 327–341.
[3] Berrier, F., Rogers, L.C.G. & Tehranchi, M. (2008). A Characterization of Forward Utility Functions, preprint, http://www.statslab.cam.ac.uk/~mike/forward-utilities.pdf.
[4] Biagini, F. & Pratelli, M. (1999). Local risk minimization and numeraire, Journal of Applied Probability 36, 1126–1139.
[5] Björk, T. & Slinko, I. (2006). Towards a general theory of good-deal bounds, The Review of Finance 10, 221–260.
[6] Černý, A. (2003). Generalised Sharpe ratios and asset pricing in incomplete markets, European Finance Review 7, 191–233.
[7] Černý, A. & Kallsen, J. (2007). On the structure of general mean-variance hedging strategies, The Annals of Probability 35, 1479–1531.
[8] Chan, T. (1999). Pricing contingent claims on stocks driven by Lévy processes, The Annals of Applied Probability 9, 504–528.
[9] Choulli, T. & Stricker, C. (2005). Minimal entropy-Hellinger martingale measure in incomplete markets, Mathematical Finance 15, 465–490.
[10] Choulli, T. & Stricker, C. (2006). More on minimal entropy-Hellinger martingale measure, Mathematical Finance 16, 1–19.
[11] Choulli, T., Stricker, C. & Li, J. (2007). Minimal Hellinger martingale measures of order q, Finance and Stochastics 11, 399–427.
[12] Christensen, M.M. & Larsen, K. (2007). No arbitrage and the growth optimal portfolio, Stochastic Analysis and Applications 25, 255–280.
[13] Colwell, D.B. & Elliott, R.J. (1993). Discontinuous asset prices and non-attainable contingent claims, Mathematical Finance 3, 295–308.
[14] Delbaen, F., Grandits, P., Rheinländer, T., Samperi, D., Schweizer, M. & Stricker, C. (2002). Exponential hedging and entropic penalties, Mathematical Finance 12, 99–123.
[15] Delbaen, F. & Schachermayer, W. (1998). A simple counterexample to several problems in the theory of asset pricing, Mathematical Finance 8, 1–11.
[16] Elliott, R.J. & Madan, D.B. (1998). A discrete time equivalent martingale measure, Mathematical Finance 8, 127–152.
[17] Fleming, W.H. & Sheu, S.J. (2002). Risk-sensitive control and an optimal investment model II, The Annals of Applied Probability 12, 730–767.
[18] Föllmer, H. & Schweizer, M. (1991). Hedging of contingent claims under incomplete information, in Applied Stochastic Analysis, Stochastics Monographs, M.H.A. Davis & R.J. Elliott, eds, Gordon and Breach, London, Vol. 5, pp. 389–414.
[19] Grandits, P. (2000). On martingale measures for stochastic processes with independent increments, Theory of Probability and its Applications 44, 39–50.
[20] Grasselli, M. (2007). Indifference pricing and hedging for volatility derivatives, Applied Mathematical Finance 14, 303–317.
[21] Henderson, V. (2002). Valuation of claims on nontraded assets using utility maximization, Mathematical Finance 12, 351–373.
[22] Henderson, V. (2005). Analytical comparisons of option prices in stochastic volatility models, Mathematical Finance 15, 49–59.
[23] Henderson, V. & Hobson, D.G. (2002). Real options with constant relative risk aversion, Journal of Economic Dynamics and Control 27, 329–355.
[24] Henderson, V. & Hobson, D.G. (2003). Coupling and option price comparisons in a jump-diffusion model, Stochastics and Stochastics Reports 75, 79–101.
[25] Hong, D. & Wee, I.S. (2003). Convergence of jump-diffusion models to the Black-Scholes model, Stochastic Analysis and Applications 21, 141–160.
[26] Jouini, E. & Napp, C. (1999). Continuous Time Equilibrium Pricing of Nonredundant Assets, Leonard N. Stern School Finance Department Working Paper 99-008, New York University, http://w4.stern.nyu.edu/finance/research.cfm?doc id=1216, http://www.stern.nyu.edu/fin/workpapers/papers99/wpa99008.pdf.
[27] Kallsen, J. (2002). Utility-based derivative pricing in incomplete markets, in Mathematical Finance – Bachelier Congress 2000, H. Geman, D. Madan, S.R. Pliska & T. Vorst, eds, Springer-Verlag, Berlin, Heidelberg, New York, pp. 313–338.
[28] Korn, R. (1998). Value preserving portfolio strategies and the minimal martingale measure, Mathematical Methods of Operations Research 47, 169–179.
[29] Korn, R. (2000). Value preserving strategies and a general framework for local approaches to optimal portfolios, Mathematical Finance 10, 227–241.
[30] Korn, R. & Schäl, M. (1999). On value preserving and growth optimal portfolios, Mathematical Methods of Operations Research 50, 189–218.
[31] Kuroda, K. & Nagai, H. (2002). Risk-sensitive portfolio optimization on infinite time horizon, Stochastics and Stochastics Reports 73, 309–331.
[32] Lesne, J.-P., Prigent, J.-L. & Scaillet, O. (2000). Convergence of discrete time option pricing models under stochastic interest rates, Finance and Stochastics 4, 81–93.
[33] Mania, M. & Tevzadze, R. (2003). A unified characterization of q-optimal and minimal entropy martingale
measures by semimartingale backward equation, The Georgian Mathematical Journal 10, 289–310.
[34] Møller, T. (2004). Stochastic orders in dynamic reinsurance markets, Finance and Stochastics 8, 479–499.
[35] Monoyios, M. (2004). Performance of utility-based strategies for hedging basis risk, Quantitative Finance 4, 245–255.
[36] Monoyios, M. (2006). Characterisation of optimal dual measures via distortion, Decisions in Economics and Finance 29, 95–119.
[37] Monoyios, M. (2007). The minimal entropy measure and an Esscher transform in an incomplete market, Statistics and Probability Letters 77, 1070–1076.
[38] Nagai, H. & Peng, S. (2002). Risk-sensitive portfolio optimization with partial information on infinite time horizon, The Annals of Applied Probability 12, 173–195.
[39] Pham, H., Rheinländer, T. & Schweizer, M. (1998). Mean-variance hedging for continuous processes: new results and examples, Finance and Stochastics 2, 173–198.
[40] Pham, H. & Touzi, N. (1996). Equilibrium state prices in a stochastic volatility model, Mathematical Finance 6, 215–236.
[41] Pirvu, T.A. & Haussmann, U.G. (2007). On Robust Utility Maximization, University of British Columbia, arXiv:math/0702727, preprint.
[42] Prigent, J.-L. (1999). Incomplete markets: convergence of options values under the minimal martingale measure, Advances in Applied Probability 31, 1058–1077.
[43] Rheinländer, T. (2005). An entropy approach to the Stein and Stein model with correlation, Finance and Stochastics 9, 399–413.
[44] Runggaldier, W.J. & Schweizer, M. (1995). Convergence of option values under incompleteness, in Seminar on Stochastic Analysis, Random Fields and Applications, E. Bolthausen, M. Dozzi & F. Russo, eds, Birkhäuser Verlag, Basel, pp. 365–384.
[45] Schachermayer, W. (1993). A counterexample to several problems in the theory of asset pricing, Mathematical Finance 3, 217–229.
[46] Schweizer, M. (1988). Hedging of options in a general semimartingale model, Dissertation ETH Zürich 8615.
[47] Schweizer, M. (1991). Option hedging for semimartingales, Stochastic Processes and their Applications 37, 339–363.
[48] Schweizer, M. (1992). Mean-variance hedging for general claims, The Annals of Applied Probability 2, 171–179.
[49] Schweizer, M. (1995). On the minimal martingale measure and the Föllmer-Schweizer decomposition, Stochastic Analysis and Applications 13, 573–599.
[50] Schweizer, M. (1999). A minimality property of the minimal martingale measure, Statistics and Probability Letters 42, 27–31.
[51] Schweizer, M. (2001). A guided tour through quadratic hedging approaches, in Option Pricing, Interest Rates and Risk Management, E. Jouini, J. Cvitanić & M. Musiela, eds, Cambridge University Press, Cambridge, pp. 538–574.
[52] Sin, C.A. (1998). Complications with stochastic volatility models, Advances in Applied Probability 30, 256–268.
[53] Stoikov, S. & Zariphopoulou, T. (2004). Optimal investments in the presence of unhedgeable risks and under CARA preferences, in IMA Volume in Mathematics and its Applications, in press.
[54] Tehranchi, M. (2004). Explicit solutions of some utility maximization problems in incomplete markets, Stochastic Processes and their Applications 114, 109–125.
[55] Zhang, X. (1997). Numerical analysis of American option pricing in a jump-diffusion model, Mathematics of Operations Research 22, 668–690.

HANS FÖLLMER & MARTIN SCHWEIZER
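
To make the explicit density formula (2) of the article above more concrete, here is a minimal numerical sketch for the simplest continuous case. It assumes an Itô model with constant coefficients µ, r, σ (an assumption for illustration, as are all names and parameter values below). In that case the jump factors in (2) disappear, $\int \lambda\, dM = \theta W_T$ with the Sharpe ratio θ = (µ − r)/σ, and the density of the MMM at time T reduces to exp(−θ W_T − θ²T/2). The sketch checks by quadrature that this density has expectation 1 and that it turns the discounted price into a martingale, in the sense that E[Ẑ_T S_T] = S_0.

```python
import math


def normal_expectation(g, z_max=10.0, n=4000):
    """Approximate E[g(Z)] for Z ~ N(0, 1) by the trapezoidal rule on [-z_max, z_max]."""
    h = 2.0 * z_max / n
    total = 0.0
    for i in range(n + 1):
        z = -z_max + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * g(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


def mmm_density(z, mu, r, sigma, T):
    """Density of the MMM at time T, written in terms of W_T = sqrt(T) * z."""
    theta = (mu - r) / sigma  # instantaneous Sharpe ratio (constant here)
    return math.exp(-theta * math.sqrt(T) * z - 0.5 * theta * theta * T)


def terminal_price(z, s0, mu, r, sigma, T):
    """Discounted price S_T solving dS = S((mu - r) dt + sigma dW) with W_T = sqrt(T) * z."""
    return s0 * math.exp((mu - r - 0.5 * sigma ** 2) * T + sigma * math.sqrt(T) * z)


if __name__ == "__main__":
    s0, mu, r, sigma, T = 100.0, 0.08, 0.03, 0.2, 1.0  # hypothetical parameters
    mass = normal_expectation(lambda z: mmm_density(z, mu, r, sigma, T))
    price = normal_expectation(
        lambda z: mmm_density(z, mu, r, sigma, T) * terminal_price(z, s0, mu, r, sigma, T)
    )
    print(mass, price)
```

The quantity θ²T computed implicitly here is the integrated squared instantaneous Sharpe ratio mentioned in the article; with time-dependent or stochastic coefficients the same checks would need a simulation scheme instead of one-dimensional quadrature.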
Good-deal Bounds

Most contingent claims valuation is based, at least notionally, on the concept of exact replication. The difficulties of exactly replicating derivative positions suggest that in many cases we should, instead, put bounds around the value of an instrument. These bounds ought to depend on model assumptions and on the prices of securities that would be used to exploit mispricing. No-arbitrage bounds are often very weak, so good-deal bounds provide an attractive alternative.

Good-deal bounds provide a range of prices within which an instrument must trade if it is not to offer a surprisingly good reward-for-risk opportunity. This is illustrated in Figure 1, where the horizontal axis represents the distribution of future payoffs (or values) after zero cost hedging. In an incomplete market setting, rather strong assumptions are needed to arrive at a unique forward value, such as $p^*$ in the figure. Conversely, risk-free arbitrage typically allows a rather wide band of prices, as between the upper and lower bounds $b_+$, $b_-$. We can hope to obtain a much narrower band without the need for strong assumptions if we simply preclude profitable opportunities. This gives the good-deal bounds $p_+$ and $p_-$. These bounds have two alternative interpretations: we can think of them as establishing normative bid and ask forward prices for a particular trader or as predicting a range in which we expect the market price to lie.

This line of valuation analysis now has an interesting history and it has inspired a quite significant literature, much of it very mathematical. There are a great many different variations by which the philosophy just described can be implemented.

This article aims to cover the main issues without going too deeply into mathematical technicalities. We begin by considering a simple illustrative example to provide intuitive insights into the nature of the analysis, including the use of duality in the solutions. We then sketch the history of this topic, including the generalized Sharpe ratio. Finally, there is a discussion of the role of the utility function (see Utility Function) in the analysis, of applications, and of the more recent literature.

for some derivative which can, at best, only be partly hedged. There is no chance of replicating this claim exactly, and super-replication bounds may be too loose to be practically helpful. The company expects to trade using some kind of statistical arbitrage, for which each transaction passes a minimum reward-for-risk threshold and overall to obtain a portfolio that performs much better than that minimum.

More specifically, reservation forward bid and ask prices $p_- < p_+$ are to be determined at time zero for a derivative that will pay a random amount $\tilde C_T$ at later date T. We suppose a von Neumann–Morgenstern utility function U(.) for date T wealth and a forward wealth endowment of $W_0$. The reservation prices are constructed so that trade will provide a level of expected utility at a predetermined level $U_R$ that exceeds the expected utility that could be reached without it by A > 0.

Figure 2 illustrates the construction. The horizontal axis represents the price of the contingent claim. The vertical axis represents the expected utility obtained from buying (or selling) the optimal quantity of the claim. Outside the super-replication bounds, $b_-$, $b_+$, unbounded wealth can be obtained.

In the case where no hedging will be undertaken and the forward price of the claim is p, we simply have the optimization of the quantity θ bought or sold as

$$\max_{\theta}\; E\Big[U\big(W_0 + \theta(\tilde C_T - p)\big)\Big] \qquad (1)$$

If p is low enough we will expect to buy the claim, and if p is high enough we will want to sell it. Intuitively the good-deal lower bound, $p_-$, is the highest price at which we can buy the claim and obtain expected utility of $U_R$, and the good-deal upper bound, $p_+$, is the lowest price at which we can sell the claim and obtain expected utility of $U_R$.

Now consider the first-order conditions from the optimization:

$$E\Big[(\tilde C_T - p)\, U'\big(W_0 + \theta(\tilde C_T - p)\big)\Big] = 0, \quad \text{so} \quad p = \frac{E\big[\tilde C_T\, U'(\tilde W_T)\big]}{E\big[U'(\tilde W_T)\big]}, \qquad (2)$$

Illustration
Consider the problem faced by a financial interme-  
diary in determining reservation bid and ask prices where W̃T = W0 + θ C̃T − p (3)
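As a purely illustrative sketch of the optimization (1) and the pricing relation (3), the following Python code computes reservation prices for the no-hedging case. Everything specific in it is an assumption made for the example and is not taken from the article: the three-point payoff distribution, the negative exponential utility $U(w) = -e^{-aw}$, the parameter values, and the function names. It searches for the highest purchase price and the lowest sale price at which the optimal trade reaches $U_R = U(W_0) + A$, and then checks that the lower bound agrees with the marginal-utility-weighted price in equation (3).

```python
import math

def expected_utility(theta, price, payoffs, probs, w0, a):
    """E[U(W0 + theta*(C - price))] for the assumed utility U(w) = -exp(-a*w)."""
    return sum(q * -math.exp(-a * (w0 + theta * (c - price)))
               for c, q in zip(payoffs, probs))

def best_trade(price, payoffs, probs, w0, a, bound=50.0):
    """Maximise expected utility over theta in [-bound, bound].

    Expected utility is concave in theta, so a ternary search suffices.
    """
    lo, hi = -bound, bound
    for _ in range(200):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if (expected_utility(m1, price, payoffs, probs, w0, a)
                < expected_utility(m2, price, payoffs, probs, w0, a)):
            lo = m1
        else:
            hi = m2
    theta = 0.5 * (lo + hi)
    return theta, expected_utility(theta, price, payoffs, probs, w0, a)

def boundary_price(good, bad, target, payoffs, probs, w0, a):
    """Bisect between a price where the target utility is attainable (good)
    and a price where it is not (bad); returns the boundary on the good side."""
    for _ in range(80):
        mid = 0.5 * (good + bad)
        if best_trade(mid, payoffs, probs, w0, a)[1] >= target:
            good = mid
        else:
            bad = mid
    return good

# Hypothetical inputs, chosen only for illustration.
payoffs = [0.0, 1.0, 3.0]
probs = [0.3, 0.4, 0.3]
w0, a, premium = 0.0, 1.0, 0.02
u_required = -math.exp(-a * w0) + premium          # U_R = U(W0) + A
mean_payoff = sum(c * q for c, q in zip(payoffs, probs))

# With no hedging, trading at the mean payoff adds nothing, so it is the "bad" end.
p_minus = boundary_price(min(payoffs), mean_payoff, u_required, payoffs, probs, w0, a)
p_plus = boundary_price(max(payoffs), mean_payoff, u_required, payoffs, probs, w0, a)

# Check equation (3) at the lower bound: p = E[C U'(W)] / E[U'(W)].
theta_buy, _ = best_trade(p_minus, payoffs, probs, w0, a)
marginal = [q * a * math.exp(-a * (w0 + theta_buy * (c - p_minus)))
            for c, q in zip(payoffs, probs)]
implied_price = sum(c * m for c, m in zip(payoffs, marginal)) / sum(marginal)

print(p_minus, mean_payoff, p_plus, implied_price)
```

The bisection relies on the optimal expected utility falling as the purchase price rises towards the no-trade price (and rising as the sale price moves above it), which holds in this simple setting. A larger premium $A$ pushes the two prices further apart.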
Thus, the reservation price corresponds to pricing with stochastic discount factors induced by the marginal utility at the optimal wealth levels corresponding to this price.

Figure 1 Good-deal bounds. Alternative forward prices and the distribution of future values after zero cost hedging: $b^-$, $b^+$: super-replication bounds; $p^-$, $p^+$: good-deal bounds; $p^*$: unique (indifference) price. (Horizontal axis: payoff; vertical axis: density; prices marked in increasing order $b^-$, $p^-$, $p^*$, $p^+$, $b^+$.)

Figure 2 Expected utility against price. The good-deal bounds, $p^-$, $p^+$, are defined as the prices at or beyond which expected utility of $U_R$ can be obtained. (Horizontal axis: price, marked in increasing order $b^-$, $p^-$, $p^*$, $p^+$, $b^+$; vertical axis: expected utility, with the level $U_R$ marked.)

In principle, the extension to hedging is straightforward. The gains or losses from a self-financing strategy with zero initial cost are simply added into the date $T$ wealth. If at date $t$ the strategy involves holdings $x_t$ at prices $P_t$, the expression for wealth at date $T$ becomes

$$\tilde{W}_T = W_0 + \theta\left(\tilde{C}_T - p\right) + \int_0^T x_t\, dP_t \tag{4}$$

Ideally, we would like to find and use the optimum hedging strategies, but any strategies that enhance expected utility will provide tighter reservation prices. Note that if the claim can be replicated exactly, then both the good-deal bounds will tend to the replication cost. Similarly, the good-deal bounds will always be at least as tight as any super-replication bounds that can be based on the same assumptions.

So far, we have only described the primal view of this problem. Well-known duality results provide an alternative viewpoint that provides both insights and alternative computational schemes. The good-deal lower bound is characterized as the infimum of values over nonnegative changes of measure, $m$, that price all reference assets and have insufficient dispersion to provide higher levels of expected utility. For example,

$$p^- = \inf_{m \ge 0} E\left[m\,\tilde{C}_T\right] \quad \text{subject to} \quad E[V(m)] \le A, \quad E[m S_T] = S_0 \tag{5}$$

where $V(m)$ is the conjugate function of $U$, defined by

$$V(m) = \sup_{S_T}\left\{U(S_T) - m S_T\right\} \quad \text{for } m > 0 \tag{6}$$

and the final constraint in equation (5) represents the correct pricing of reference assets.

Note that this only differs from the formulation for super-replication (no-arbitrage) bounds by the addition of the inequality constraint in equation (5), which precludes extreme changes of measure that would generate expected utility greater than $U_R$. In both cases, the more assets we hedge with, the more the change of measure is constrained and the tighter the valuation bounds.

Early Literature

Finding bounds on the values of derivatives has a long history. Merton [15] summarizes the conventional upper and lower bounds on vanilla options and how they are enforced by arbitrage. The subsequent contributions of Harrison and Kreps [10], Dybvig and Ross [8], and others have shown the pricing implications of no arbitrage more generally. Later papers by Perrakis and Ryan [16] and Levy [14] obtained slightly more general bounds on the prices of options, for example, based on stochastic dominance by adding some additional stronger assumptions. More recently a number of papers, such as [11], have considered super-replication bounds on exotic options when vanilla options can be used to engineer the hedge (see Arbitrage Bounds). Interest in this topic has further intensified with the growth of the literature on Lévy processes (see Exponential Lévy Models), exotic options, and incomplete markets.
Much of the work in the incomplete markets literature focuses on ways to obtain a particular pricing measure and hence unique prices (for example, see Minimal Entropy Martingale Measure; Minimal Martingale Measure and Schweizer [17]), but it is not clear why a particular agent would be prepared to trade at these prices.

The good-deal literature represents an important alternative between these two paths. Hansen and Jagannathan [9] provide a crucial stepping stone. They showed that the Sharpe ratio on any security is bounded by the coefficient of variation of the stochastic discount factor (see Stochastic Discount Factors). The Sharpe ratio provides a very natural benchmark (see [18]) and Cochrane and Saá-Requejo [6] subsequently used this to limit the volatility of the stochastic discount factor and infer the first no-good-deal prices, conditional on the absence of high Sharpe ratios. At about the same time, a related paper by Bernardo and Ledoit [2] showed how similar bounds could be obtained relative to a maximum gain–loss ratio for the economy as a whole. These papers have their disadvantages. Cochrane and Saá-Requejo work with quadratic utility (and sometimes truncated quadratic utility), whereas Bernardo and Ledoit use Domar–Musgrave utility (i.e., two linear segments). This led Hodges [12] to investigate bounds based on the more conventional choice of exponential utility and to thereby introduce the idea of a generalized Sharpe ratio.

This concept was extended by Černý and Hodges [5] into the more general framework of good-deal pricing mostly used today. By then, it was already clear that these prices satisfied the criteria for coherent risk measures of Artzner et al. [1], namely, the linearity, subadditivity, and monotonicity properties. This includes the representation of the lower good-deal price as an infimum over values from alternative pricing measures. Nevertheless, Jaschke and Küchler [13] provided an important clarification and unification of these ideas.

General Framework

The general framework of "no-good-deal" pricing (first described by Černý and Hodges [5]) places no-arbitrage and representative agent equilibrium at the two ends of a spectrum of possibilities. They define a desirable claim as one which provides a specific level of von Neumann–Morgenstern expected utility and a good-deal as a desirable claim with zero or negative price. Within the analysis, it is assumed that any quantity of any claim may be bought or sold.

The economy contains a collection of claims with predetermined prices, so-called basis assets. These claims generate the marketed subspace $M$ and their prices define a price correspondence on this subspace.

In an incomplete market, it is often convenient to suppose that the market is augmented in such a way that the resulting complete market contains no arbitrages. Instead, we can more powerfully augment the market so that the complete market contains no good-deals. We obtain a set of pricing functionals that form a subset of those that simply preclude arbitrage. The link between no arbitrage and strictly positive pricing rules carries over to good-deals and enables price restrictions to be placed on nonmarketed claims. Under suitable technical assumptions, the no-good-deal price region for a set of claims is a convex set, and redundant assets have unique good-deal prices.

With an acceptance set of deals, $K$, typically defined in terms of expected utility, the upper and lower good-deal bounds can be defined simply as

$$p^+ = \inf_{p,\,x_t}\left\{p \;\middle|\; -\tilde{C}_T + p + \int_0^T x_t\, dS_t \in K\right\} \tag{7}$$

and

$$p^- = \sup_{p,\,x_t}\left\{p \;\middle|\; \tilde{C}_T - p + \int_0^T x_t\, dS_t \in K\right\} \tag{8}$$

For a given utility function, the positions of the good-deal bounds naturally depend on the required expected utility premium, $A$. The higher this level, the further apart the bounds will be. Coherent risk measures, well into the tails of the final distribution, can be obtained if high levels are employed for $A$. Except for the case of exponential utility, the bounds also depend on the initial wealth level.

Generalized Sharpe Ratios

One method for setting the required premium comes from the Sharpe ratio available on a market opportunity. This gives rise to what are called generalized Sharpe ratio bounds (see [12] or [4]). The idea is to first compute the level of expected utility $U_R$ attainable from a market opportunity offering a specific annualized Sharpe ratio, such as 0.25, and without any investment in the derivative. The good-deal
bounds that are supported by this level of expected utility (but without this market opportunity) are then said to correspond to a generalized Sharpe ratio of 0.25. In the case of negative exponential utility, the wealth level and the risk aversion parameter play the same role and become irrelevant since the opportunity can be accepted at any scale. This provides a particularly simple implementation with minimal parameter requirements.

Subsequent analysis by Černý [4] further expands both the notions and the analysis of generalized Sharpe ratios. The analysis provides details of the dual formulations for alternative standard utility functions. For example, the dual constraints on the change of measure $m$ for different utility functions are as given in Table 1.

Table 1 Stochastic discount factor constraints for various utility functions

Utility function               Constraint
Quadratic: Cochrane et al.     $E[m^2] \le 1 + A^2$
Exponential                    $E[m \ln m] \le A$
Power, RRA = $\gamma$          $E[m^{1-1/\gamma}] \le (1 + A^{\gamma})^{1/\gamma - 1}$
Logarithmic                    $-E[\ln m] \le \ln(1 + A)$

The various properties of the utility affect the details of the mathematical analysis considerably. For some features to work cleanly we need unbounded utility, whereas for others the behavior for low wealth levels is critical. Exponential utility precludes any delta hedge that gives a short lognormal position over finite time, even though it would have a smaller standard deviation than the fully covered position. Capping such a liability at a finite level can therefore have a big effect on the good-deal price resulting from such an analysis. Depending on the context, this may or may not be desirable. While exponential utility precludes fat negative tails, such as the short lognormal, power and log utility preclude the possibility of any negative future wealth, and even stronger effects can, in principle, derive from this.

With constant absolute risk aversion (CARA) utility, changing the scale of investment is equivalent to changing the level of risk aversion. With constant relative risk aversion (CRRA), it is equivalent to scaling the initial wealth, $W_0$. The CRRA-based good-deal bound thus searches across measures with the same exponent, but different wealth levels. There may be some advantages to finding alternative utility functions that have properties intermediate between the Domar–Musgrave function used by Bernardo and Ledoit and the negative exponential one.

Coherent Risk Measures

Jaschke and Küchler [13] expand the link between good-deal bounds and coherent risk measures. They show that there is a one-to-one correspondence between

1. "coherent risk measures" (see Convex Risk Measures)
2. cones of "desirable claims"
3. partial orderings
4. good-deal valuation bounds
5. sets of "admissible" price systems.

It should be noted from this analysis that it is sufficient but not necessary to use expected utility to define all the abstract measures considered in their paper. In other words, acceptance sets must be consistent with coherence, but not necessarily with expected utility.

It is clear from the foregoing that good-deal analysis can easily be applied as the basis of risk measurement and will satisfy the axioms of coherent risk measures (see Convex Risk Measures). It can also be applied as a method of risk adjustment for performance measurement. For example, a utility-based generalized Sharpe ratio, when applied to an empirical distribution, provides a method of adjusting for skewness in the distribution. In doing so, it makes sense to apply a negative sign to situations where a short position would have been optimal.

Recent and Prospective Literature

Important new papers continue to appear quite regularly; a few recent ones are mentioned here. Staum [19] provides much of the background, treating good-deals from the perspective of convex optimization. Bjork and Slinko [3] provide extensions to Cochrane and Saá-Requejo in a multidimensional jump-diffusion setting. There are further papers that expand on the dynamic aspects of this analysis, apply it to settings with stochastic volatility, or implement similar optimizations using mathematical programming. There are also a number of papers which, although not directly within the framework developed here, deal with related ideas in different ways.
The apparently simple concept of good-deal bounds has turned out to provide a great deal of richness for mathematicians to analyze, and there are now many variations on this theme in the published literature. Although the theory stems from a practical desire, very few of the papers have an applied flavor. Rather little algorithmic or numerical work has been reported, and most of that uses only somewhat simplified models, seldom calibrated to the market. The good-deal bounds approach could easily be adapted to deal with model risk, something which is hinted at in Cont [7]. The literature needs more real applications, and, perhaps, the balance will have changed when the next survey of this area comes to be written.

References

[1] Artzner, P., Delbaen, F., Eber, J. & Heath, D. (1999). Coherent measures of risk, Mathematical Finance 9(3), 203–228.
[2] Bernardo, A. & Ledoit, O. (1996). Gain, loss and asset pricing, Journal of Political Economy 108(1), 144–172.
[3] Bjork, T. & Slinko, I. (2006). Towards a general theory of good-deal bounds, Review of Finance 10, 221–260.
[4] Černý, A. (2003). Generalised Sharpe ratios and asset pricing in incomplete markets, European Finance Review 7, 191–233.
[5] Černý, A. & Hodges, S.D. (2001). The theory of good-deal pricing in financial markets, in Selected Proceedings of the First Bachelier Congress Held in Paris, 2000, H. Geman, D. Madan, S.R. Pliska & T. Vorst, eds, Springer Verlag.
[6] Cochrane, J.H. & Saá-Requejo, J. (2000). Beyond arbitrage: 'Good-Deal' asset price bounds in incomplete markets, Journal of Political Economy 108(1), 79–119.
[7] Cont, R. (2006). Model uncertainty and its impact on the pricing of derivative instruments, Mathematical Finance 16(3), 519–547.
[8] Dybvig, P.H. & Ross, S.A. (1987). Arbitrage, in The New Palgrave: A Dictionary of Economics, J. Eatwell, M. Milgate & P. Newman, eds, Macmillan, London, Vol. 1, pp. 100–106.
[9] Hansen, L.P. & Jagannathan, R. (1991). Implications of security market data for models of dynamic economies, Journal of Political Economy 99, 225–262.
[10] Harrison, J. & Kreps, J. (1979). Martingales and arbitrage in multiperiod securities markets, Journal of Economic Theory 11, 215–260.
[11] Hobson, D.G. (1998). Robust hedging of the lookback option, Finance and Stochastics 2, 329–347.
[12] Hodges, S.D. (1998). A Generalization of the Sharpe Ratio and its Applications to Valuation Bounds and Risk Measures. FORC Preprint 1998/88, University of Warwick.
[13] Jaschke, S. & Küchler, U. (2001). Coherent risk measures and good-deal bounds, Finance and Stochastics 5, 181–200.
[14] Levy, H. (1985). Upper and lower bounds of put and call option values: stochastic dominance approach, Journal of Finance 40, 1197–1218.
[15] Merton, R.C. (1973). Theory of rational option pricing, Bell Journal of Economics 4, 141–183.
[16] Perrakis, S. & Ryan, P.J. (1984). Option pricing bounds in discrete time, Journal of Finance 39, 519–525.
[17] Schweizer, M. (1995). On the minimal martingale measure and the Föllmer-Schweizer decomposition, Stochastic Analysis and its Applications 13, 573–599.
[18] Sharpe, W.F. (1994). The Sharpe ratio, Journal of Portfolio Management 21, 49–59.
[19] Staum, J. (2004). Pricing and hedging in incomplete markets: fundamental theorems and robust utility maximization, Mathematical Finance 14(2), 141–161.

Related Articles

Arbitrage Strategy; Convex Risk Measures; Stochastic Discount Factors; Sharpe Ratio; Super-hedging; Utility Function.

STEWART D. HODGES
Arrow–Debreu Prices

Arrow–Debreu prices are the prices of "atomic" time and state-contingent claims, which deliver one unit of a specific consumption good if a specific uncertain state realizes at a specific future date. For instance, claims on the good "ice cream tomorrow" are split into different commodities depending on whether the weather will be good or bad, so that good-weather and bad-weather ice cream tomorrow can be traded separately. Such claims were introduced by Arrow and Debreu in their work on general equilibrium theory under uncertainty, to allow agents to exchange state and time contingent claims on goods. Thereby the general equilibrium problem with uncertainty can be reduced to a conventional one without uncertainty. In finite-state financial models, Arrow–Debreu securities delivering one unit of the numeraire good can be viewed as natural atomic building blocks for all other state–time contingent financial claims; their prices determine a unique arbitrage-free price system.

Arrow–Debreu Equilibrium Prices

This section explains Arrow–Debreu prices in an equilibrium context, where they originated; see [1, 3]. We first consider a single-period model with uncertain states that will be extended to multiple periods later. For this exposition, we restrict ourselves to a single consumption good only, and consider a pure exchange economy without production.

Let $(\Omega, \mathcal{F})$ be a measurable space of finitely many outcomes $\omega \in \Omega = \{1, 2, \ldots, m\}$, where the $\sigma$-field $\mathcal{F} = 2^{\Omega}$ is the power set of all events $A \subset \Omega$. There is a finite set of agents, each seeking to maximize the utility $u^a(c^a)$ from his or her consumption $c^a = (c_0^a, c_1^a(\omega)_{\omega \in \Omega})$ at present and future dates 0 and 1, given some endowment that is denoted by a vector $(e_0^a, e_1^a(\omega)) \in \mathbb{R}^{1+m}_{++}$. For simplicity, let consumption preferences of agent $a$ be of the expected utility form

$$u^a(c^a) = U_0^a(c_0) + \sum_{\omega=1}^{m} P^a(\omega)\, U_\omega^a(c_1(\omega)) \tag{1}$$

where $P^a(\omega) > 0$ are subjective probability weights, and the direct utility functions $U_\omega^a$ and $U_0^a$ are, for present purposes, taken to be of the form $U_i^a(c) = d_i^a c^{\gamma}/\gamma$ with relative risk aversion coefficient $\gamma = \gamma^a \in (0, 1)$ and discount factors $d_i^a > 0$. This example for preferences satisfies the general requirements (non-satiation, continuity and convexity) on preferences for state-contingent consumption in [3], which need not be of the separable subjective expected utility form above. The only way for agents to allocate their consumption is by exchanging state-contingent claims for the delivery of some units of the (perishable) consumption good at a specific future state. Let $q_\omega$ denote the price at time 0 for the state-contingent claim that pays $q_0 > 0$ units if and only if state $\omega \in \Omega$ is realized. Given the endowments and utility preferences of the agents, an equilibrium is given by consumption allocations $c^{a*}$ and a linear price system $(q_\omega)_{\omega \in \Omega} \in \mathbb{R}^m_+$ such that,

1. for any agent $a$, his or her consumption $c^{a*}$ maximizes $u^a(c^a)$ over all $c^a$ subject to budget constraint $(c_0^a - e_0^a) q_0 + \sum_{\omega} (c_1^a - e_1^a)(\omega)\, q_\omega \le 0$, and
2. markets clear, that is $\sum_a (c_t^a - e_t^a)(\omega) = 0$ for all dates $t = 0, 1$ and states $\omega$.

An equilibrium exists and yields a Pareto optimal allocation; see [3], Chapter 7, or the references below. Relative equilibrium prices $q_\omega/q_0$ of the Arrow securities are determined by first-order conditions from the ratio of marginal utilities evaluated at optimal consumption: for any $a$,

$$\frac{q_\omega}{q_0} = P^a(\omega)\, \frac{\partial}{\partial c_1^a} U_\omega^a\left(c_1^{a*}(\omega)\right) \Big/ \frac{\partial}{\partial c_0^a} U_0^a\left(c_0^{a*}\right) \tag{2}$$

To demonstrate the existence of equilibrium, the classical approach is to show that excess demand vanishes, that is markets clear, by using a fixed point argument (see Chapter 17 in [8]). To this end, it is convenient to consider $c^a$, $e^a$ and $q = (q_0, q_1, \ldots, q_m)$ as vectors in $\mathbb{R}^{1+m}$. Since only relative prices matter, we may and shall suppose that prices are normalized so that $\sum_{0}^{m} q_i = 1$, that is, the vector $q$ lies in the unit simplex $\Delta = \{q \in \mathbb{R}^{1+m}_+ \mid \sum_{0}^{m} q_i = 1\}$. The budget condition 1 then reads compactly as $(c^a - e^a) q \le 0$, where the left-hand side is the inner product in $\mathbb{R}^{1+m}$. For given prices $q$, the optimal consumption of agent $a$ is given by the inverse of the marginal utility, evaluated at a multiple of the state price density (see equation (12) for the general definition in the multiperiod case), as
$$c_0^{a*} = c_0^{a*}(q) = \left(U_0^{a\prime}\right)^{-1}(\lambda^a q_0) \quad \text{and} \quad c_{1,\omega}^{a*} = c_{1,\omega}^{a*}(q) = \left(U_\omega^{a\prime}\right)^{-1}\left(\lambda^a q_\omega / P^a(\omega)\right), \quad \omega \in \Omega \tag{3}$$

where $\lambda^a = \lambda^a(q) > 0$ is determined by the budget constraint $(c^{a*} - e^a) q = 0$ as the Lagrange multiplier associated to the constrained optimization problem 1. Equilibrium is attained at prices $q^*$ where the aggregate excess demand

$$z(q) := \sum_a \left(c^{a*}(q) - e^a\right) \tag{4}$$

vanishes, that is $z(q^*) = 0$. One can check that $z : \Delta \to \mathbb{R}^{1+m}$ is continuous in the (relative) interior $\operatorname{int} \Delta := \Delta \cap \mathbb{R}^{1+m}_{++}$ of the simplex, and that $|z(q^n)|$ goes to $\infty$ when $q^n$ tends to a point on the boundary of $\Delta$. Since each agent exhausts his or her budget constraint 1. with equality, Walras' law $z(q) q = 0$ holds for any $q \in \operatorname{int} \Delta$. Let $\Delta_n$ be an increasing sequence of compact sets exhausting the simplex interior: $\operatorname{int} \Delta = \cup_n \Delta_n$. Set $\nu^n(z) := \{q \in \Delta_n \mid zq \ge zp \ \forall p \in \Delta_n\}$, and consider the correspondence (a multivalued mapping)

$$\Pi^n : (q, z) \to \left(\nu^n(z), z(q)\right) \tag{5}$$

that can be shown to be convex, nonempty valued, and maps the compact convex set $\Delta_n \times z(\Delta_n)$ into itself. Hence, by Kakutani's fixed point theorem, it has a fixed point $(q^{n*}, z^{n*}) \in \Pi^n(q^{n*}, z^{n*})$. This implies that

$$z(q^{n*})\, q \le z(q^{n*})\, q^{n*} = 0 \quad \text{for all } q \in \Delta_n \tag{6}$$

using Walras' law. A subsequence of $q^{n*}$ converges to a limit $q^* \in \Delta$. Provided one can show that $q^*$ is in the interior simplex $\operatorname{int} \Delta$, existence of equilibrium follows. Indeed, it follows that $z(q^*) q \le 0$ for all $q \in \operatorname{int} \Delta$, implying that $z(q^*) = 0$ since $z(q^*) q^* = 0$ by Walras' law. To show that any limit point of $q^{n*}$ is indeed in $\operatorname{int} \Delta$, it suffices to show that $|z(q^{n*})|$ is bounded in $n$, recalling that $z$ explodes at the simplex boundary. Indeed, $z = \sum_a z^a$ is bounded from below since each agent's excess demand satisfies $z^a = c^a - e^a \ge -e^a$. This lower bound implies also an upper bound, by using equation (6) applied with some $q \in \Delta_1 \subset \Delta_n$, since $0 < \varepsilon \le q_i \le 1$ uniformly in $i$. This establishes existence of equilibrium. To ensure uniqueness of equilibrium, a sufficient condition is that all agents' risk aversions are less than or equal to 1, that is $\gamma^a \in (0, 1]$ for all $a$; see [2].

For multiple consumption goods, the above ideas generalize if one considers consumption bundles and state-contingent claims of every good. Arrow [1] showed that in the case of multiple consumption goods, all possible consumption allocations are spanned if agents could trade as securities solely state-contingent claims on the unit of account (so-called Arrow securities), provided that spot markets with anticipated prices for all other goods exist in all future states. In the sequel, we only deal with Arrow securities in financial models with a single numeraire good that serves as unit of account, and could for simplicity be considered as money ("euro"). If the set of outcomes $\Omega$ were (uncountably) infinite, the natural notion of atomic securities is lost, although a state price density (stochastic discount factor, deflator) may still exist, which could be interpreted intuitively as an Arrow–Debreu state price per unit probability.

Multiple Period Extension and No-arbitrage Implications

The one-period setting with finitely many states is easily extended to finitely many periods with dates $t \in \{0, \ldots, T\}$ by considering an enlarged state space of suitable date–event pairs (see Chapter 7 in [3]). To this end, it is mathematically convenient to describe the information flow by a filtration $(\mathcal{F}_t)$ that is generated by a stochastic process $X = (X_t(\omega))_{0 \le t \le T}$ (abstract, at this stage) on the finite probability space $(\Omega, \mathcal{F}, P_0)$. Let $\mathcal{F}_0$ be trivial, $\mathcal{F}_T = \mathcal{F} = 2^{\Omega}$, and assume $P_0(\{\omega\}) > 0$, $\omega \in \Omega$. The $\sigma$-field $\mathcal{F}_t$ contains all events that are based on information from observing paths of $X$ up to time $t$, and is defined by a partition of $\Omega$. The smallest nonempty events in $\mathcal{F}_t$ are "$t$-atomic" events $A \in \mathcal{A}_t$ of the type $A = [x_0 \cdots x_t] := \{X_0 = x_0, \ldots, X_t = x_t\}$, and constitute a partition of $\Omega$. Figure 1 illustrates the partitions $\mathcal{A}_t$ corresponding to the filtration $(\mathcal{F}_t)_{t=0,1,2}$ in a five-element space $\Omega$, as generated by a process $X_t$ taking values $a, \ldots, f$. It shows that a filtration can be represented by a nonrecombining tree. There are eight (atomic) date–event pairs $(t, A)$, $A \in \mathcal{A}_t$. An adapted process $(c_t)_{t \ge 0}$, describing, for instance, a consumption allocation, has the property that $c_t$ is constant on each atom $A$ of partition $\mathcal{A}_t$, and hence is determined by specifying its value $c_t(A)$ at each node point $A \in \mathcal{A}_t$ of the tree. Arrow–Debreu prices
$q(t, A)$ are specified for each node of the tree and represent the value at time 0 of one unit of account at date $t$ in node $A \in \mathcal{A}_t$.

Figure 1 Equivalent representations of the multiperiod case. (a) Tree of the filtration generating process $X_t$. (b) Partitions $\mathcal{A}_t$ of filtration $(\mathcal{F}_t)_{t=0,1,2}$. (Nodes by date: date 0: [a]; date 1: [ab], [ac]; date 2: [abd], [abb], [aba], [ace], [acf].)

Technically, this is easily embedded in the previous single-period setting by passing to an extended space $\Omega' := \{1, \ldots, T\} \times \Omega$ with $\sigma$-field $\mathcal{F}'$ generated by all sets $\{t\} \times A$ with $A$ being an (atomic) event of $\mathcal{F}_t$, and $P_0'(\{t\} \times A) := \mu(t) P_0(A)$ for a (strictly positive) probability measure $\mu$ on $\{1, \ldots, T\}$.

For the common no-arbitrage pricing approach in finance, the focus is to price contingent claims solely in relation to prices of other claims, which are taken as exogenously given. In doing so, the aim of the model shifts from the fundamental economic equilibrium task to explain all prices, toward a "financial engineering" task to determine prices from already given other prices solely by no-arbitrage conditions, which are a necessary prerequisite for equilibrium. From this point of view, the (atomic) Arrow–Debreu securities span a complete market, as every contingent payoff $c$, paying $c_t(A)$ at time $t$ in atomic event $A \in \mathcal{A}_t$, can be decomposed by $c = \sum_{t, A \in \mathcal{A}_t} c_t(A) 1_{(t,A)}$ into a portfolio of atomic Arrow securities, paying one euro at date $t$ in event $A$. Hence the no-arbitrage price of the claim must be $\sum_{t, A \in \mathcal{A}_t} c_t(A)\, q(t, A)$. Given that all atomic Arrow–Debreu securities are traded at initial time 0, the market is statically complete in that any state-contingent cash flow $c$ can be replicated by a portfolio of Arrow–Debreu securities that is formed statically at initial time, without the need for any dynamic trading. The no-arbitrage price for $c$ simply equals the cost of replication by Arrow–Debreu securities. It is easy to check that, if all prices are determined like this and trading takes place only at time 0, the market is free of arbitrage, given all Arrow–Debreu prices are strictly positive. To give some examples, the price at time 0 of a zero coupon bond paying one euro at date $t$ equals $ZCB_t = \sum_{A \in \mathcal{A}_t} q(t, A)$. For the absence of arbitrage, the $t$-forward prices $q^f(t, A')$, $A' \in \mathcal{A}_t$, must be related to spot prices of Arrow–Debreu securities by

$$q^f(t, A') = \frac{q(t, A')}{\sum_{A \in \mathcal{A}_t} q(t, A)} = \frac{1}{ZCB_t}\, q(t, A'), \quad A' \in \mathcal{A}_t \tag{7}$$

Hence, the forward prices $q^f(t, A')$ are normalized Arrow–Debreu prices and constitute a probability measure $Q^t$ on $\mathcal{F}_t$, which is the $t$-forward measure associated to the $t$-ZCB as numeraire, and yields $q^f(t, A) = E^t[1_A]$ for $A \in \mathcal{A}_t$, with $E^t$ denoting expectation under $Q^t$. Below, we also consider "non-atomic" state-contingent claims with payoffs $c_k(\omega) = 1_{(t,B)}(k, \omega)$, $k \le T$, for $B \in \mathcal{F}_t$, whose Arrow–Debreu prices are denoted by $q(t, B) = \sum_{A \in \mathcal{A}_t, A \subset B} q(t, A)$.

Arrow–Debreu Prices in Dynamic Arbitrage-free Markets

In the above setting, information is revealed dynamically over time, but trading decisions are static in that they are entirely made at initial time 0. To discuss relations between initial and intertemporal
4 Arrow–Debreu Prices

Arrow–Debreu prices in arbitrage-free models with dynamic trading, this section extends the above setting, assuming that all Arrow–Debreu securities are tradable dynamically over time.

Let \(q_s(t, A_t)\), \(s \le t\), denote the price process of the Arrow–Debreu security paying one euro at \(t\) in state \(A_t \in \mathcal{A}_t\). At maturity \(t\), \(q_t(t, A_t) = 1_{A_t}\) takes value 1 on \(A_t\) and is 0 otherwise. For the absence of arbitrage, it is clearly necessary that Arrow–Debreu prices are nonnegative, and that \(q_s(t, A_t)(A_s) > 0\) holds for \(s < t\) at \(A_s \in \mathcal{A}_s\) if and only if \(A_s \supset A_t\). Further, for \(s < t\) it must hold that

\[ q_s(t, A_t)(A_s) = q_s(s+1, A_{s+1})(A_s) \times q_{s+1}(t, A_t)(A_{s+1}) \tag{8} \]

for \(A_s \in \mathcal{A}_s\), \(A_{s+1} \in \mathcal{A}_{s+1}\) such that \(A_s \supset A_{s+1} \supset A_t\).

In fact, the above conditions are also sufficient to ensure that the market model is free of arbitrage: at any date \(t\), the Arrow–Debreu prices for the next date define the interest rate \(R_{t+1}\) for the next period \((t, t+1)\) of length \(\Delta t > 0\) of a savings account \(B_t = \exp\left(\sum_{s=1}^{t} R_s \Delta t\right)\) by

\[ \exp(-R_{t+1}(A_t)\Delta t) = \sum_{A_{t+1} \in \mathcal{A}_{t+1},\, A_{t+1} \subset A_t} q_t(t+1, A_{t+1})(A_t), \quad \text{for } A_t \in \mathcal{A}_t \tag{9} \]

which is locally riskless, in that \(R_{t+1}\) is known at time \(t\), that means it is \(\mathcal{F}_t\)-measurable. They define an equivalent risk neutral probability \(Q^B\) by determining its transition probabilities from any \(A_t \in \mathcal{A}_t\) to \(A_{t+1} \in \mathcal{A}_{t+1}\) with \(A_t \supset A_{t+1}\) as

\[ Q^B(A_{t+1} \mid A_t) = \frac{q_t(t+1, A_{t+1})(A_t)}{\sum_{A \in \mathcal{A}_{t+1},\, A \subset A_t} q_t(t+1, A)(A_t)} \tag{10} \]

The transition probability (10) can be interpreted as one-period forward price when being at \(A_t\) at time \(t\), for one euro at date \(t+1\) in event \(A_{t+1}\), cf. (7). Since all \(B\)-discounted Arrow–Debreu price processes \(q_s(t, A_t)/B_s\), \(s \le t\), are martingales under \(Q^B\) thanks to equations (8, 10), the model is free of arbitrage by the fundamental theorem of asset pricing, see [6]. For initial Arrow–Debreu prices, denoted by \(q_0(t, A_t) \equiv q(t, A_t)\), the martingale property and equations (8, 10) imply that

\[ q(t+1, A_{t+1}) = e^{-R_{t+1}(A_t)\Delta t}\, Q^B(A_{t+1} \mid A_t)\, q(t, A_t) \tag{11} \]

Hence \(q(t, A_t) = Q^B(A_t)/B_t(A_t)\) for \(A_t \in \mathcal{A}_t\).

The deflator or state price density for agent \(a\) is the adapted process \(\zeta^a_t\) defined by

\[ \zeta^a_t(A_t) := \frac{q(t, A_t)}{P^a(A_t)} = \frac{Q^B(A_t)}{B_t(A_t)\,P^a(A_t)}, \quad A_t \in \mathcal{A}_t \tag{12} \]

so that \(\zeta^a_t S_t\) is a \(P^a\)-martingale for any security price process \(S\), e.g. \(S_t = q_t(T, A_T)\), \(t \le T\), with \(A_T \in \mathcal{A}_T\). If one chooses, instead of \(B_t\), another security \(N_t = \sum_{A_T \in \mathcal{A}_T} N_T(A_T)\, q_t(T, A_T)\) with \(N_T > 0\) as the numeraire asset for discounting, one can define an equivalent measure \(Q^N\) by

\[ \frac{Q^N(A)}{P^a(A)} = \frac{N_T(A)}{N_0(A)}\, \zeta^a_T(A), \quad A \in \mathcal{A}_T \tag{13} \]

which has the property that \(S_t/N_t\) is a \(Q^N\)-martingale for any security price process \(S\). Taking \(N = (ZCB^T_t)_{t \le T}\) as the \(T\)-zero-coupon bond yields the \(T\)-forward measure \(Q^T\).

If \(X\) is a \(Q^B\)-Markov process, the conditional probability \(Q^B(A_{t+1} \mid A_t)\) in equation (11) is a transition probability \(p_t(x_{t+1} \mid x_t) := Q^B(X_{t+1} = x_{t+1} \mid X_t = x_t)\), where \(A_k = [x_1 \ldots x_k]\) for \(k = t, t+1\). By summation of suitable atomic events

\[ q(t+1, X_{t+1} = x_{t+1}) = \sum_{x_t} e^{-R(x_t)\Delta t}\, p_t(x_{t+1} \mid x_t)\, q(t, X_t = x_t) \tag{14} \]

where the sum is over all \(x_t\) from the range of \(X_t\).

Application Examples: Calibration of Pricing Models

The role of Arrow–Debreu securities as “atomic building blocks” is theoretical, in that there exist no corresponding securities in real financial markets. Nonetheless, they are of practical use in the calibration of pricing models. For this section, \(X\) is taken to be a \(Q^B\)-Markov process, possibly time-inhomogeneous.
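The forward recursion in equation (14) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the recombining two-branch lattice, the up-probability of 0.5, and the names `arrow_debreu_step`, `binomial_transitions`, `zero_coupon_prices`, and `rate_fn` are hypothetical and not taken from the article. The sketch also uses the fact that the zero-coupon bond price for date t is the sum of the Arrow–Debreu prices over all states at t.

```python
import math


def arrow_debreu_step(q, rates, trans, dt):
    """One application of equation (14).

    q[i]        Arrow-Debreu price of state i at date t
    rates[i]    short rate for the period (t, t+1) when in state i
    trans[i][j] risk-neutral probability of moving from state i to state j
    """
    q_next = [0.0] * len(trans[0])
    for i, q_i in enumerate(q):
        discount = math.exp(-rates[i] * dt)
        for j, p_ij in enumerate(trans[i]):
            q_next[j] += discount * p_ij * q_i
    return q_next


def binomial_transitions(n_states, p_up=0.5):
    """Hypothetical recombining lattice: state i moves to i + 1 or stays at i."""
    trans = [[0.0] * (n_states + 1) for _ in range(n_states)]
    for i in range(n_states):
        trans[i][i + 1] = p_up
        trans[i][i] = 1.0 - p_up
    return trans


def zero_coupon_prices(rate_fn, n_steps, dt):
    """Zero-coupon bond prices for dates 0..n_steps by forward induction."""
    q = [1.0]          # one euro today in the single initial state
    zcb = [1.0]
    for t in range(n_steps):
        rates = [rate_fn(t, i) for i in range(len(q))]
        q = arrow_debreu_step(q, rates, binomial_transitions(len(q)), dt)
        zcb.append(sum(q))   # prices for date t are discarded after this step
    return zcb


# Usage with an assumed state-dependent short rate (illustrative numbers only).
bond_prices = zero_coupon_prices(lambda t, i: 0.03 + 0.01 * i, n_steps=3, dt=1.0)
```

Only the current vector of state prices is kept in memory, which is why summing over states is cheaper than summing over paths on a recombining tree.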

The first example concerns the calibration of a short rate model to some given term structure of zero coupon prices \((ZCB_t)_{t \le T}\), implied by market quotes. For such models, a common calibration procedure relies on a suitable time-dependent shift of the state space for the short rate (see [7], Chapter 28.7). Let suitable functions \(R^*_t\) be given such that the variations of \(R^*_t(X_t)\) already reflect the desired volatility and mean-reversion behavior of the (discretized) short rate. Making an ansatz \(R_t(X_t) := R^*_t(X_t) + \alpha_t\) for the short rate, the calibration task is to determine the parameters \(\alpha_t\), \(1 \le t \le T\), such that

\[ ZCB_t = E^B\left[\exp\left(-\sum_{k \le t} \left(R^*_k(X_k) + \alpha_k\right)\Delta t\right)\right] \tag{15} \]

with the expectation being taken under the risk neutral measure \(Q^B\). It is obvious that this determines all the \(\alpha_t\) uniquely. When computing this expectation to obtain the \(\alpha_t\) by forward induction, it is efficient to use Arrow–Debreu prices \(q(t, X_t = x_t)\), since \(X\) usually can be implemented by a recombining tree. Summing over the range of states \(x_t\) of \(X_t\) is more efficient than summing over all paths of \(X\). Suppose that \(\alpha_k\), \(k \le t\), and \(q(t, X_t = x_t)\) for all values \(x_t\) have been computed already. Using equation (14), one can then compute \(\alpha_{t+1}\) from the equation

\[ ZCB_{t+1} = \sum_{x_{t+1}} \sum_{x_t} q(t, X_t = x_t)\, e^{-(R^*_{t+1}(x_t) + \alpha_{t+1})\Delta t}\, p_t(x_{t+1} \mid x_t) \tag{16} \]

where the number of summands in the double sum is typically bounded or grows at most linearly in \(t\). Then Arrow–Debreu prices \(q(t+1, X_{t+1} = x_{t+1})\) for the next date \(t+1\) are computed using equation (14), while those for \(t\) can be discarded.

The second example concerns the calibration to an implied volatility surface. Let \(X\) denote the discounted stock price \(X_t = S_t \exp(-rt)\) in a trinomial tree model with constant interest \(R_t := r\) and \(\Delta t = 1\). Each \(X_{t+1}/X_t\) can attain three possible values \(\{m, u, d\} := \{1, e^{\pm\sigma\sqrt{2}}\}\) with positive probability, for \(\sigma > 0\). The example is motivated by the task to calibrate the model to given prices of European calls and puts by a suitable choice of the (nonunique) risk neutral Markov transition probabilities for \(X_t\). We focus here on the main step for this task, which is to show that the Arrow–Debreu prices of all state-contingent claims, which pay one unit at some \(t\) if \(X_t = x_t\) for some \(x_t\), already determine the risk neutral transition probabilities of \(X\). It is easy to see that these prices are determined by those of calls and puts for sufficiently many strikes and maturities. Indeed, strikes at all tree levels of the stock for each maturity date \(t\) are sufficient, since Arrow–Debreu payoffs are equal to those of suitable butterfly options that are combinations of such calls and puts. From given Arrow–Debreu prices \(q(t, X_t = x_t)\) for all \(t, x_t\), the transition probabilities \(p_t(x_{t+1} \mid x_t)\) are computed as follows: starting from the highest stock level \(x_t\) at some date \(t\), one obtains \(p_t(x_t u \mid x_t)\) by equation (14) with \(R_t(x_t) = r\) and \(\Delta t = 1\). The remaining transition probabilities \(p_t(x_t m \mid x_t)\), \(p_t(x_t d \mid x_t)\) from \((t, x_t)\) are determined from

\[ p_t(x_t u \mid x_t)\,u + p_t(x_t m \mid x_t)\,m + p_t(x_t d \mid x_t)\,d = 1 \tag{17} \]

and \(p_t(x_t u \mid x_t) + p_t(x_t m \mid x_t) + p_t(x_t d \mid x_t) = 1\). Using these results, the transition probabilities from the second highest (and subsequent) stock level(s) are implied by equation (14) in a similar way. This yields all transition probabilities for any \(t\).

To apply this in practice, the call and put prices for the maturities and strikes required would be obtained from real market quotes, using suitable interpolation, and the trinomial state space (i.e., \(\sigma\), \(r\), \(\Delta t\)) has to be chosen appropriately to ensure positivity of all \(p_t\), see [4, 5].

References

[1] Arrow, K.J. (1964). The role of securities in the optimal allocation of risk-bearing, As translated and reprinted in 1964, Review of Economic Studies 31, 91–96.
[2] Dana, R.A. (1993). Existence and uniqueness of equilibria when preferences are additively separable, Econometrica 61, 953–957.
[3] Debreu, G. (1959). Theory of Value: An Axiomatic Analysis of Economic Equilibrium, Yale University Press, New Haven.
[4] Derman, E., Kani, I. & Chriss, N. (1996). Implied trinomial trees of the volatility smile, Journal of Derivatives 3, 7–22.
[5] Dupire, B. (1997). Pricing and hedging with smiles, in Mathematics of Derivative Securities, M.A.H. Dempster & S.R. Pliska, eds, Cambridge University Press, Cambridge, pp. 227–254.

[6] Harrison, J. & Kreps, D. (1979). Martingales and arbitrage in multiperiod securities markets, Journal of Economic Theory 20, 381–408.
[7] Hull, J. (2006). Options, Futures and Other Derivative Securities, Prentice Hall, Upper Saddle River, New Jersey.
[8] Mas-Colell, A., Whinston, M.D. & Green, J.R. (1995). Microeconomic Theory, Oxford University Press, Oxford.

Related Articles

Arrow, Kenneth; Complete Markets; Dupire Equation; Fundamental Theorem of Asset Pricing; Model Calibration; Pricing Kernels; Risk-neutral Pricing; Stochastic Discount Factors.

DIRK BECHERER & MARK H.A. DAVIS
Options: Basic Definitions

A financial option is a contract conferring on the holder the right, but not the obligation, to engage in some transaction, on precisely specified terms, at some time in the future. When the holder does decide to engage in the transaction, he/she is said to exercise the option. There are two parties to an option contract, generally known as the writer and the buyer or holder. The “optionality” accrues to the buyer; the writer has the obligation to carry out his/her side of the transaction, should the buyer exercise the option. An option is vulnerable if there is considered to be nonnegligible risk that the writer will fail to do this, that is, he/she will default on his/her side of the contract.

The classic contracts are European call and put options. A call option entitles the holder to purchase a specified number \(N\) of units of a security at a fixed price \(K\) per unit, at a specified time \(T\). If \(S_T\) is the market price of the security at time \(T\), the holder will exercise the option if and only if \(S_T > K\) (otherwise, it would be cheaper to buy in the market). Since the holder has now acquired for \(K\) per unit something that is worth \(S_T\), he/she makes a profit of \(N(S_T - K)\). Thus, in general, the profit is \(N[S_T - K]^+ = N \max(S_T - K, 0)\). Similarly, the profit on a put contract, entitling the holder to sell at \(K\), is \(N[K - S_T]^+\). The asset on which the option is written is called the underlying asset. A call option is in the money if \(S_T > K\), at the money (ATM) if \(S_T = K\) and out of the money if \(S_T < K\). These terms are also used at earlier times \(t < T\) (e.g., ATM if \(S_t = K\) etc.) even though the option cannot be exercised then. An option is ATM forward at time \(t\) if \(K = F(t, T)\) where \(F(t, T)\) is the forward price at time \(t\) for purchase at time \(T\).

Option Contracts

In general, several assets may underlie a given contract, as for example in an exchange option where the holder has the right to exchange one asset for another. Options are sometimes called contingent claims or derivative securities. These two synonymous terms include options, but refer more generally to any contract whose value is a function of the values of some collection of underlying assets.

There are several binary classifications that help define an option contract.

European/American

An option is European if it must be exercised, if at all, on a specified date, or a specified sequence of dates (for example, a cap is an interest-rate option consisting of a sequence of call options on the Libor rate; each of these options is exercised when it falls due if in the money). In an American option, by contrast, the time of exercise is at the holder’s discretion. The classic American option involves a fixed final time \(T\) and allows the holder to exercise at any time \(T' \le T\), leaving the holder with the problem of determining an exercise strategy that will maximize the option's value to the holder. This immediately implies three things: (i) the value of an American option that has not already been exercised at any time \(t\) can never be less than the intrinsic value (\([K - S_t]^+\) for a put option), since one possible strategy is always to exercise now, (ii) the value can never be less than the value of the corresponding European option, since another possible strategy is never to exercise before \(T\), and (iii) the value is a nondecreasing function of the final maturity time, since for \(T_1 < T_2\) the \(T_1\) option is just the \(T_2\) option with the additional restriction that it never be exercised beyond \(T_1\). The difference between the American and European values for the same contract is called the early exercise premium. Sometimes, American options have some restriction on the set of allowable exercise times; for example, the conversion option in convertible bonds often prohibits the investor from converting before a certain minimum time. Of course, any such restrictions reduce the value since they reduce the class of exercise strategies. A particular case is the Bermuda option, which can only be exercised at one of a finite number of times. This is the normal situation in interest-rate options, where there is a natural sequence of coupon dates at which exercise decisions may be taken. Bermuda options are presumably so called because Bermuda is somewhere between America and Europe, but much closer to America.

Traded/OTC

Traded options are those where the parties trade through the medium of an organized exchange,

while “over the counter” (OTC) options are bilateral agreements between market counterparties. Option exchanges have become increasingly globalized in recent years. They include the US–European consortium NYSE Euronext, Chicago Mercantile Exchange (CME), Eurex and EDX, all of which offer a range of financial contracts, and a number of specialist commodity exchanges such as NYMEX (oil), ICE, and the London Metal Exchange (LME). An exchange offers contracts on an underlying asset such as an individual stock or a stock index such as the S&P500, with a range of maturity times and strike values. New options are added as the old ones roll off, and the strikes offered are in a range around the spot price of the underlying asset at the time the contract is initiated (the options may turn out to be far in or out of the money at later times, of course). In a traded options market, prices are determined by supply and demand. If the exercise times are \(T_i\) and the strike values \(K_j\) then the matrix \(V = [\hat{\sigma}_{ij}]\), where \(\hat{\sigma}_{ij}\) is the implied volatility corresponding to the \((T_i, K_j)\) contract, defines the so-called volatility surface that plays a key role in option risk management.

All interest-rate options and most FX (foreign exchange) options are OTC, but many are, nevertheless, very liquidly traded and market information on implied volatilities is readily available.

Physical Settlement/Cash Settlement

Many single-stock options and commodity options are physically settled, that is, at exercise the holder pays the strike value and takes delivery of a share certificate or a barrel of oil. (One can, however, avoid physical delivery by selling the option shortly before final maturity.) The alternative is cash settlement, where the holder is simply paid a cash amount, such as \([S_T - K]^+\) for a call option, at exercise. When the underlying is an index like the S&P500 this is the only way: one cannot deliver the index! In this case, the amount paid is \(c \times [I_T - K]^+\) where \(I_T\) is the value of the index and \(c\) is the contractually specified dollar value of one index point.

Liquid/Illiquid

Like any other traded asset, an option contract is liquid if there is large market depth, that is, there are a significant number of active traders in the market, none of whom controls a significant proportion of the total supply. In these circumstances, the price is well established, since the last trade was never very long ago; bid/ask spreads will be tight, buyers and sellers can enter the market at will, and there is little room for price manipulation. By contrast, in an illiquid market, it may be hard to establish a market price when actual trades are infrequent and bid/ask spreads are wide. The liquid/illiquid classification is not immutable: a liquid market can suddenly become illiquid if there is some shock that forces everybody onto the same side of the market. Several well-recorded disasters in the derivatives market have been due to this phenomenon.

(Plain) Vanilla/Exotic

The simplest, most standard, and most widely traded options are often referred to as plain vanilla options. This would certainly include all exchange-traded options. An “exotic” option is an OTC option with nonstandard features of some kind, which requires significant modeling effort to value, and where different analysts could well come up with significantly different valuations. Exotic options often involve several underlying assets and complicated payment streams, but even a simple call option can be exotic if it poses significant hedging difficulties, as for example do long-dated equity options. On the other hand, barrier options, for example, which once would have been considered exotic, have now become vanilla in some markets such as FX, because they are so widely traded.

Path-dependent/Path-independent

An option is path-dependent if its exercise value depends on the value of the underlying asset at more than one time. Examples are barrier and Asian options, and any American option. The exercise value of a path-independent option is a function only of the underlying price, say \(S_T\), at the maturity time \(T\), as for example in Black–Scholes. Valuation then only requires specifying the one-dimensional risk-neutral distribution of \(S_T\), whereas for a path-dependent option a distribution in path space is required, making the valuation more computationally intensive and model dependent.
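The difference between path-independent and path-dependent exercise values can be made concrete with a small Python sketch. The function names, the sample price paths, and the convention that the barrier is monitored only at the listed observation times are illustrative assumptions, not part of the article.

```python
def european_call_payoff(path, strike):
    """Path-independent: depends only on the terminal price S_T."""
    return max(path[-1] - strike, 0.0)


def asian_call_payoff(path, strike):
    """Path-dependent: depends on the arithmetic average of the observed prices."""
    average = sum(path) / len(path)
    return max(average - strike, 0.0)


def up_and_out_call_payoff(path, strike, barrier):
    """Path-dependent: worthless if any observed price reaches the barrier."""
    if max(path) >= barrier:
        return 0.0
    return max(path[-1] - strike, 0.0)


# Two hypothetical price paths that end at the same terminal price.
path_a = [100.0, 120.0, 90.0, 110.0]
path_b = [100.0, 95.0, 105.0, 110.0]

for path in (path_a, path_b):
    print(
        european_call_payoff(path, 100.0),
        asian_call_payoff(path, 100.0),
        up_and_out_call_payoff(path, 100.0, 115.0),
    )
```

Both paths give the same European call payoff because they share the terminal price, while the Asian and barrier payoffs differ, which is why valuing those contracts needs a distribution over whole paths rather than over the terminal price alone.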

Option Definitions

The purpose of this section is to collect together introductory definitions of various option contracts, or features of option contracts, found in the market. We refer to specialist articles in this encyclopaedia for a detailed treatment, and to standard textbooks such as Hull [1]. In these definitions, we use \(S_t\) to denote a generic underlying asset price, on which is written an option that starts at time 0 with final maturity at time \(T\).

Asian

An Asian option is one whose exercise value depends on the average price over some range of times. Generally, this is an arithmetic average of the form \(\bar{S} = (1/n) \sum_{i=1}^{n} S_{t_i}\) and, of course, in reality, it is always a finite sum, although for purposes of analysis it is often convenient to consider continuous averaging \(\bar{S}^c = (1/T) \int_0^T S_t \, dt\). Averaging may be over the entire length of the contract or some much shorter period. For example, in commodity options call option values are generally based on the 10-day or one-month average price immediately prior to maturity, rather than on the spot price, to deter market manipulation.

Barrier

A barrier option involves one or both of two prices, a lower barrier \(L < S_0\) and an upper barrier \(U > S_0\), a strike \(K\), and a maturity time \(T\). Let \(\tau_L = \inf\{t : S_t \le L\}\), \(\tau_U = \inf\{t : S_t \ge U\}\) and \(\tau_{LU} = \min(\tau_L, \tau_U)\). A knockout option expires worthless if a specified one of these times occurs before \(T\), while a knock-in option expires worthless unless this time occurs before \(T\). An up-and-out call is a knockout call option based on \(\tau_U\), so formally its exercise value is \(1_{(\tau_U > T)} [S_T - K]^+\). Similarly, an up-and-in call has exercise value \(1_{(\tau_U \le T)} [S_T - K]^+\). The sum is an ordinary call option. There are analogous definitions for down-and-out and down-and-in options based on \(\tau_L\). Normally, these would be put options. A double barrier option knocks out or in at time \(\tau_{LU}\). In the Black–Scholes model, there is an analytic formula for single-barrier options, based on the reflection principle for Brownian motion, but double barrier options require numerical methods.

Knockout options are cheaper than their plain vanilla counterparts because the exercise value is strictly less with positive probability. Essentially, the buyer of a vanilla option pays a premium for events he/she may regard as overwhelmingly unlikely. By buying a barrier option instead, he/she avoids paying this premium.

Basket

Consider a portfolio containing \(w_i\) units of asset \(i\) for \(i = 1, \ldots, n\). A basket call option then has exercise value \([X - K]^+\) where \(X = \sum_i w_i S_i(T)\) is the portfolio value at time \(T\). The main problem in valuing basket options is the enormous number of correlation coefficients involved, for even moderate portfolio size \(n\).

Bermuda

These were already mentioned above. Probably the most common example is the Bermuda swaption, entitling the holder to enter a swap at an agreed fixed-side rate at any one of a list of coupon dates (or, equivalently, the right to walk away from an existing swap contract).

Chooser

A chooser option involves three times \(0, T_1, T_2\), and a strike \(K\). The option is entered and the strike set at time 0, and at time \(T_1\), the holder selects whether it is to be a put or a call. The appropriate exercise value is then evaluated and paid at \(T_2\). Thus the value at \(T_1\) is the maximum of the put and call values at that time. Given this fact, valuation is straightforward in the Black–Scholes model.

Digital

A digital option pays a fixed cash amount if some condition is realized. For example, an up-and-in digital barrier option with maturity \(T\) and barrier level \(U\) will pay a fixed amount \(X\) if \(\tau_U < T\). Payment might be made at \(\tau_U\) or at \(T\).

Exchange

An exchange option has exercise value \([aS_2(T) - S_1(T)]^+\), that is, the holder has the right to exchange

one unit of asset 1 for \(a\) units of asset 2 at the maturity time \(T\). Exchange options can be priced by the Margrabe formula, originally introduced in [2].

Forward-starts, Ratchets, and Cliquets

A forward-start option involves two times \(0 < T_1 < T_2\). The premium is paid at time 0 and the exercise value at final maturity \(T_2\) is \([S_{T_2} - mS_{T_1}]^+\), where \(m\) is a contractually specified “moneyness” factor. If, for example, \(m = 1.05\), then effectively the strike is set at \(T_1\) in such a way that the option is 5% out of the money at that time. This is a pure volatility play in that the value essentially only depends on the forward volatility between \(T_1\) and \(T_2\). A ratchet or cliquet is a string of forward start options over times \(T_i, T_{i+1}\), so \(T_{i+1}\) is the maturity date for option \(i\) and the start date for option \(i + 1\).

Lookback

Let \(S_{\max}(t) = \max_{u \le t} S(u)\) and \(S_{\min}(t) = \min_{u \le t} S(u)\). The exercise values of a lookback call and a lookback put are \([S(T) - S_{\min}(T)]\) and \([S_{\max}(T) - S(T)]\), respectively. The holders of these options can essentially buy at the minimum price and sell at the maximum price. Black–Scholes valuation of these options uses the reflection principle for Brownian motion, in the same way as for barrier options.

Passport

A passport option is a call option where the underlying asset is a traded portfolio, and the holder has the right to choose the trading strategy in this portfolio.

Quanto

A quanto or cross-currency option is written on an underlying asset denominated in currency A, but the exercise value is paid in currency B. For example, we could have an option on the USD-denominated S&P index \(I(t)\) where the exercise value is GBP \(c[I(T) - K]^+\). We can write the constant as \(c = c_1 c_2\) where \(c_1\) is the conventional dollar value of an index point, while \(c_2\) is an exchange rate (the number of pounds per dollar). Thus a quanto option is a combination of a “foreign”-denominated option plus an exchange-rate guarantee. Valuation amounts to deriving the state-price density applicable to a market model including foreign as well as domestic assets.

Russian

A Russian option is a perpetual lookback option, that is, an American lookback option with no final maturity time.

References

[1] Hull, J.C. (2000). Options, Futures and Other Derivatives, 4th Edition, Prentice Hall.
[2] Margrabe, W. (1978). The value of an option to exchange one asset for another, Journal of Finance 33, 177–186.

MARK H.A. DAVIS
Option Pricing: General Principles

Option contracts are financial assets that involve an element of choice for the owner. Depending on an event, the holder of an option contract can exercise his/her options stated in the contract, that is, to undertake certain specified actions. The typical example of an option contract gives the holder the right to buy a specific stock at a contracted price and time in the future. The contracted price is called the strike price, whereas the exercise time is when the option may be executed. Such contracts are known as call options, and the event that triggers the execution of the option is that the underlying stock price is above the strike. There is a plethora of different options traded in today’s modern financial markets, where the financial events may include credit, weather related situations, and so on. One usually refers to derivatives or claims as being financial assets whose values are dependent on other financial assets.

There are two fundamental questions that the option pricing theory tries to answer. First, what is the fair price of a claim, and second, how can one replicate the claim. The second question immediately implies the answer to the first, since if we can find an investment strategy in the market that replicates the claim, the cost of this replication should be the fair price. This replication strategy is frequently called the hedging strategy of the claim. The key financial concepts in pricing and replication are arbitrage (or rather the absence of such) and completeness. A mathematical concept related to these is the equivalent martingale measure, also known as the risk-neutral probability.

Explaining the Basic Concepts

To understand the concepts used, it is informative to consider a very simple (and highly unrealistic) one-period binomial model. Suppose that we have a stock with value $100 today and two possible outcomes in one year. Either the stock price can increase to $110 or it can remain unchanged. The interest rate earned on bank deposits is set to 5% yearly and considered the risk-free investment in the market.

Suppose that we wish to find a fair price of a call option with strike $105 in one year. This option will effectively pay out $5 if the stock increases, whereas the holder will not exercise it if the stock value is $100. Consider now an investment today in \(a = 0.5\) number of stocks and \(b = -\$50/1.05 \approx -\$47.62\) deposited in the bank. A simple calculation reveals that this investment yields exactly the same as holding the option. In fact, this is the only investment in the stock and bank that perfectly replicates the option payoff and we, therefore, call it the replicating strategy of the option. The cost of replication is \(P = \$50/21 \approx \$2.38\).

We argue that the fair price of the option should be the same as the cost \(P\) of buying the replicating strategy. If the price would be higher, say \(\tilde{P} > P\), then one could do the following. Sell \(n\) options for that price and buy \(n\) of the replicating strategy. At exercise, any claims from the options sold will be covered exactly by the replicating strategies bought. However, we have received the cash amount of \(n \times \tilde{P}\) for selling the options and paid out the amount \(n \times P\) for replication, thus leaving us with a profit. There is no risk attached with this investment proposition, and we can make the profit arbitrarily high by simply increasing \(n\). This is what is known as an arbitrage opportunity, and in efficient markets, this should not be possible (or at least be ruled out quickly). If \(\tilde{P} < P\), we reverse the positions above to create an arbitrage. The definition of a fair price is the price for which no arbitrage possibility exists. Thus, the option price in our example must be \(P = \$50/21 \approx \$2.38\).

We note that the probability of a stock price increase did not enter into our analysis. The fair price is unaffected by this probability, since the hedging strategy is the same no matter how likely the stock price is to increase to $110. The price of the option does not depend on the expected return of the stock, but only on the spread in the two possible outcomes of the stock price at exercise time, or, in other words, the volatility.

One may ask if the price of an option can be stated as the present expected value of the payoff at exercise. From the above derivations, we see that this is, in general, not the case since the price is not a function of the probability of a stock price increase. Hence, a present value price of the option would lead to arbitrage possibilities. However, we may rephrase the question and ask whether there exists a probability \(q\) for price increase such that the fair price can be

expressed as a present value? Letting \(q = 0.5\), we can easily convince ourselves that

\[ P = \frac{1}{1.05}\{q \times 5 + (1 - q) \times 0\} = \frac{1}{1.05}\,\mathbb{E}_q[\text{option payoff}] \tag{1} \]

where \(\mathbb{E}_q\) is the expectation with respect to the probability \(q\). This probability of a stock price increase is not the probability for a price increase observed in the market, but a constructed probability for which the option price can be expressed as a present expected value.

The probability \(q\) has an interesting property that actually defines it. The present expected value of the stock price is equal to today’s value,

\[ 100 = \frac{1}{1.05}\,\mathbb{E}_q[\text{stock price}] \tag{2} \]

Hence, the discounted stock price is a martingale with respect to the probability \(q\). Further, the return on an investment in the stock coincides with the risk-free rate under \(q\), defending the name “risk-neutral probability” often assigned to \(q\).

Option Pricing in Continuous Time

Our binomial one-period example basically contains the main concepts for pricing of options and claims in more general and realistic market models. Moving to a stock price that evolves dynamically in time with stochastic marginal changes, the principles of option pricing remain basically the same, while introducing interesting technical challenges. We now look at the case when the stock price follows a geometric Brownian motion (GBM), that is,

\[ \frac{dS(t)}{S(t)} = \mu \, dt + \sigma \, dB(t) \tag{3} \]

defined on a probability space \((\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, \mathbb{P})\) with the filtration \(\mathcal{F}_t\) generated by the Brownian motion modeling the information flow. The GBM model indicates that returns over a short interval \(dt\) are independent with mean \(\mu \, dt\) and volatility \(\sigma\sqrt{dt}\); more precisely, logarithmic returns are independent and normally distributed, with mean \((\mu - \sigma^2/2)\, dt\) and volatility \(\sigma\sqrt{dt}\). The model was first proposed for stock price dynamics by Samuelson [7] and later used by Black and Scholes [1] and Merton [6] in their derivation of the famous option pricing formula. We suppose that the market is frictionless in the sense that there are no transaction costs incurred when we trade in the stock or the bank, and there are no restrictions on short or long positions. Further, the interest rate is the same whether we borrow or lend money, and the market is perfectly liquid.

The main difference from the one-period model is that we can invest in the underlying stock at all times up to maturity of the claim. Obviously, we can also do the same with the bank deposit, which is now assumed to yield a continuously compounding interest rate \(r\). An investment strategy will consist of \(a(t)\) shares of the stock and \(b(t)\) invested in the bank at time \(t\). Since investors cannot foresee the future, the investment decisions at time \(t\) can only be based upon the available market information, which is contained in the filtration \(\mathcal{F}_t\). The value at time \(t\) of the portfolio is

\[ V(t) = a(t)S(t) + b(t)R(t) \tag{4} \]

where \(R(t) = \exp(rt)\), the value of an initial bank deposit of 1. Further, since we are interested in creating strategies that are replicating an option, we wish to rule out any external funding or withdrawal of money in the portfolio we are setting up. This leads to the so-called self-financing hypothesis, saying that any change in portfolio value comes from a change in the underlying stock price and bank deposit. Mathematically, we can formulate this condition as

\[ dV(t) = a(t) \, dS(t) + b(t) \, dR(t) \tag{5} \]

Note that Itô’s formula implies dynamics for \(V(t)\) in which the differentials of \(a(t)\) and \(b(t)\) appear. The self-financing hypothesis indicates that the terms involving these differentials sum to zero.

For the one-period binomial model, we recall the existence of an equivalent martingale measure for which the discounted stock price is a martingale. Applying the Girsanov theorem, we find a probability measure \(\mathbb{Q}\) equivalent to the market probability \(\mathbb{P}\), for which the process \(W(t)\) with differential

\[ dW(t) = \frac{\mu - r}{\sigma} \, dt + dB(t) \tag{6} \]

is a Brownian motion. By a direct calculation, we find

\[ d(e^{-rt}S(t)) = \sigma (e^{-rt}S(t)) \, dW(t) \tag{7} \]

which is a martingale under \(\mathbb{Q}\). Furthermore, by discounting the portfolio process \(V(t)\) and applying the self-financing hypothesis, we find

\[ d(e^{-rt}V(t)) = \sigma a(t)(e^{-rt}S(t)) \, dW(t) \tag{8} \]

Hence, the discounted portfolio process is also a martingale under \(\mathbb{Q}\).

Consider a claim with maturity at time \(T\) and a payoff represented by the random variable \(X\), where \(X\) is \(\mathcal{F}_T\)-measurable and integrable with respect to \(\mathbb{Q}\). The (for the moment unknown) price at time \(t\) of the claim is denoted by \(P(t)\). Suppose that the discounted price of the claim is a martingale with respect to \(\mathbb{Q}\) and that we have a self-financing portfolio consisting of investments in the stock, the bank notes, and the claim. Further, we construct the investment such that the initial price is zero. The discounted value process of this portfolio will then (by the same reasoning as above) give a martingale process under \(\mathbb{Q}\), and hence the expectation with respect to \(\mathbb{Q}\) of the portfolio value at any future time must be the same as the initial investment, namely, zero. Thus, unless the portfolio value is identically zero, there is under \(\mathbb{Q}\) a positive probability of having a negative portfolio value, which implies by equivalence of \(\mathbb{Q}\) with \(\mathbb{P}\) that we cannot have any arbitrage opportunities in this market. On the other hand, if the market does not allow for any arbitrage, one can show that \(e^{-rt}P(t)\) must be a \(\mathbb{Q}\)-martingale. We refer to [3] for the connection between no-arbitrage and existence of equivalent martingale measures. It is a financially reasonable condition to assume that the market is arbitrage free.

By the martingale representation theorem, there exists an adapted stochastic process \(\phi(t)\) such that

\[ d(e^{-rt}P(t)) = \phi(t) \, dW(t) \tag{9} \]

whenever \(\exp(-rt)P(t)\) is square-integrable with respect to \(\mathbb{Q}\). By defining \(a(t) = \phi(t)/(\sigma \exp(-rt)S(t))\) and \(b(t) = \exp(-rt)(P(t) - a(t)S(t))\), the portfolio \(V(t)\) given by the investment strategy \((a, b)\) is self-financing. Moreover, \(V(T) = P(T) = X\), implying [...]

Thus, as a natural generalization of the binomial one-period model case, any claim has a price given as the expected present value, where the expectation is taken with respect to the risk-neutral probability. Note that \(S\) does not depend on its expected return \(\mu\) under \(\mathbb{Q}\), and therefore the price \(P(t)\) is independent of this. The volatility \(\sigma\) is, however, a crucial parameter for the determination of price.

If we let \(X\) be the payoff of a call option written on \(S\), one can calculate the conditional expectation in equation (10) to derive the famous Black–Scholes formula. Further, the process \(\phi(t)\) is in this case explicitly known, and it turns out that the investment strategy \(a(t)\) is the derivative of the price \(P(t)\) with respect to \(S(t)\). This derivative is known as the delta of the call option. Moreover, the strategy given by \(a(t)\) is called delta-hedging.

Option Pricing in Incomplete Markets

Recall that we have assumed a frictionless market. In practice, transaction costs are normally incurred when buying and selling shares. Hence, since a delta-hedging strategy \((a, b)\) involves incessant trading, it will become infinitely costly if implemented. In addition, there are practical limits to how big a short position we can take (e.g., due to credit limits and collaterals). Theoretically, there exists only one replicating strategy since the martingale representation theorem prescribes a unique integrand process \(\phi(t)\). Introducing frictions such as transaction costs or short-selling limits in the market rules out the possibility to replicate claims in general, and the market is said to be incomplete. We remark that in an incomplete market, there still exist claims that can be replicated, and by the no-arbitrage principle, the price of these is characterized by the cost of replication, as we have argued above. However, a natural question arises: what can we say about pricing and hedging of claims where no replicating strategy exists?
that it is a replicating strategy for the claim. Further- One approach suggests to look at super- and
more the market becomes complete, meaning that sub-replicating strategies. A super(sub-)replicating
there exists a replicating strategy for all claims, X strategy is a self-financing portfolio of stock and
being square-integrable with respect to . bank deposit, which at least(most) has the same
Now, again appealing to the -martingale prop- value as the claim at maturity. Letting Pmax (Pmin )
erty of e−rt P (t), we find by definition that be the infimum (supremum) over all prices of such
super(sub-)replicating strategies, it follows that any
P (t) = e−r(T −t) Ɛ [X | Ft ] (10) price P in the interval (Pmin , Pmax ) is arbitrage
4 Option Pricing: General Principles

free. Furthermore, any self-financing strategy that utility from the two investment scenarios, the indif-
costs less than Pmax will always have a positive ference price of the claim is defined as the price that
probability of having a value lower than the claim at makes one indifferent between the two opportunities.
maturity, and thus full replication is impossible. This The choice of an exponential utility function leads
leaves the issuer of the claim with some unhedgable to prices where the singular case of zero risk aver-
risk. An acceptable or fair price of the claim will sion coincides with the price defined by the minimal
reflect the compensation the issuer demands for entropy martingale measure [4]. This price lends
taking on this risk. itself to the interpretation of being the price that is
A change in the stock price dynamics gives equally desirable for both the issuer and the buyer in
another source of incompleteness in the market. The the case when both parties have zero risk aversion.
GBM model is rather unnatural from an empirical For all other risk aversions, the seller will charge
point of view, since observed stock price returns on higher prices, and the buyer will demand lower.
the marketplace are frequently far from being nor- The difference of the two optimal investment strate-
mally distributed nor are they independent. Stock gies obtained from utility maximization becomes the
price models including stochastic volatility and/or hedging strategy. This and other similar approaches
stochastic drivers other than Brownian motion have have gained a lot of academic attention in the recent
been proposed. For instance, on the basis of empir- years.
ics, the returns may be modeled by a heavy-tailed Another path to pricing in incomplete markets is
distribution, which gives rise to a Lévy process in to try to complete the market by adding options. The
the geometric dynamics of the stock price. A conse- required number of options to complete the market is
quence of such a seemingly innocent change in the closely linked to the number of sources of uncertainty
structure is that there exists a continuum (in general) and the number of assets. For example, considering
of equivalent martingale measures  such that the a GBM with a stochastic volatility following the
discounted stock price is a martingale. The compli- Heston model gives two random sources and one
cating implication of this is the absence of martin- asset. Following the analysis in [2], one call option is
gale representations when it becomes impossible to sufficient to complete the market. In [2], the necessary
and sufficient conditions to complete markets are
find an investment strategy replicating the claim. As
given in the case when the filtration is spanned
for markets with frictions, we have no possibility
by more Brownian motions than there are traded
of replication, but an interval of possible arbitrage-
assets.
free prices. In addition, in this case, the issuer
of the claim needs to accept a certain unhedgable
risk. References
To price claims in incomplete markets, one must
resort to methods that take into account the risk posed
[1] Black, F. & Scholes, M. (1973). The pricing of options
on the issuer. Popular approaches include minimal- and corporate liabilities, Journal of Political Economy 81,
variance hedging, where the strategy minimizing the 637–654.
variance (that is, the risk) is sought for. The price [2] Davis, M. & Obloj, J. (2008). Market completion using
of the claim is the cost of buying the minimal- options, in Advances in Mathematics of Finance, L. Stet-
variance strategy [8] plus a compensation for the tner, ed., Banach Center Publications, pp. 49–60, Vol. 43.
[3] Delbaen, F. & Schachermayer, W. (1994). A general
unhedged risk. Another possibility that has gained
version of the fundamental theorem of asset pricing,
a lot of attention in the option pricing literature is Matematische Annalen 300, 463–520.
indifference pricing (see also the seminal work of [4] El Karoui, N. & Rouge, R. (2000). Pricing via utility
Hodges and Neuberger [5]). Here, one considers an maximization and entropy, Mathematical Finance 10(2),
investor who has two opportunities. Either he/she can 259–276.
invest his/her funds in the market, or he/she can sell [5] Hodges, S. & Neuberger, A. (1989). Optimal replication
of contingent claims under transaction costs, Review of
a claim and invest his/her funds along with claim
Futures Markets 8, 222–239.
price. In the latter case, he/she has more funds for [6] Merton, R. (1973). Theory of rational option pricing,
investment, but on the other hand, he/she faces a Bell Journal of Economics and Management Science 4,
claim at maturity. By optimizing his/her expected 141–183.
Option Pricing: General Principles 5

[7] Samuelson, P.A. (1965). Proof that properly anticipat- Related Articles
ing prices fluctuate randomly, Industrial Management
Reviews 6, 41–49.
[8] Schweizer, M. (2001). A guided tour through quadratic Binomial Tree; Black–Scholes Formula; Hedging;
hedging approaches, in Option Pricing, Interest Rates, and Option Pricing Theory: Historical Perspectives.
Risk Management, E. Jouini, J. Cvitanic & M. Musiela,
eds, Cambridge University Press, pp. 538–574. FRED E. BENTH
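
As a numerical companion to equation (10) of the article above, the following minimal Python sketch prices a European call as a discounted risk-neutral expectation and compares it with the closed-form Black–Scholes value. It assumes that under $\mathbb{Q}$ the stock follows a GBM with drift $r$, so that $S(T) = S(0)\exp\left((r - \sigma^2/2)T + \sigma W(T)\right)$. The function names and all parameter values (spot 100, strike 100, $r = 0.05$, $\sigma = 0.2$, $T = 1$) are illustrative assumptions, not taken from the article.

```python
import math
import random
from statistics import NormalDist


def black_scholes_call(s0, strike, r, sigma, maturity):
    """Closed-form value of equation (10) at t = 0 for X = max(S(T) - K, 0)."""
    normal = NormalDist()
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(s0 / strike) + (r + 0.5 * sigma**2) * maturity) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    price = s0 * normal.cdf(d1) - strike * math.exp(-r * maturity) * normal.cdf(d2)
    delta = normal.cdf(d1)  # a(0) = dP/dS, the delta-hedging position in the stock
    return price, delta


def monte_carlo_call(s0, strike, r, sigma, maturity, n_paths=50_000, seed=0):
    """Estimate exp(-rT) * E_Q[max(S(T) - K, 0)] by sampling S(T) under Q."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma**2) * maturity  # the drift is r under Q; mu plays no role
    vol = sigma * math.sqrt(maturity)
    total_payoff = 0.0
    for _ in range(n_paths):
        terminal_price = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        total_payoff += max(terminal_price - strike, 0.0)
    return math.exp(-r * maturity) * total_payoff / n_paths


# Illustrative parameters (assumptions, not from the article)
price, delta = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
mc_price = monte_carlo_call(100.0, 100.0, 0.05, 0.2, 1.0)
print(f"closed form: {price:.4f}  delta: {delta:.4f}  Monte Carlo: {mc_price:.4f}")
```

The expected return $\mu$ does not appear anywhere in the code, which mirrors the remark that the price is independent of it, whereas changing `sigma` moves both the price and the delta.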
Forwards and Futures

Futures and forwards are financial contracts that make it possible to reduce the price risk that arises from the intention to buy or sell certain assets at a later date. A forward contract specifies in advance the price that will be paid at such a later date for the delivery of the asset. This obviously reduces the price risk for that transaction to zero for all parties involved. A futures contract, on the other hand, guarantees that changes in the asset’s price that occur before the delivery date will be compensated for immediately when they arise. This compensation is achieved by offsetting payments into a bank account that is called the margin account. This significantly reduces the price risk associated with the futures transaction, since the only possible remaining source of uncertainty is now due to the interest rate used for the margin account.

The assets that are bought or sold at the delivery date can be storable commodities (such as gold, oil, and agricultural products), nonstorable commodities (such as electricity), or other financial assets (such as stocks, bonds, options, or currencies). Forward contracts are also used by parties to agree in advance on an interest rate that will be paid or charged during a later time period, in so-called forward rate agreements (FRAs). Similarly, one can buy and sell futures on the value of money deposited in a bank account. For such interest rate futures, which include the very popular eurodollar and euribor contracts, there is no actual delivery but the contract is fulfilled by cash settlement instead.

Here, we discuss only the general pricing principles for forwards and futures. We refer to other articles in the encyclopedia for detailed information concerning the delivery procedures and methods to quote prices for specific futures and forward contracts, such as eurodollar futures (see Eurodollar Futures and Options), forward rate agreements (see LIBOR Rate), electricity (see Electricity Forward Contracts), commodity (see Commodity Forward Curve Modeling), and foreign exchange forwards (see Currency Forward Contracts).

Using Futures and Forwards

Forward contracts are usually agreed upon by two parties who directly negotiate the terms of such contracts, which can therefore be very flexible. The two parties need to agree on the specific asset (often called the underlying asset) and on the precise quantities that are bought or sold, on the exact date when the transactions take place (the delivery date), and the price that will be charged on that date (the forward price). Usually the forward price is chosen in such a way that both parties agree to sign the contract without any money changing hands before the delivery date. This implies that the forward contract starts with having zero market value, since both parties are willing to sign it without receiving or paying any money for it. Later on, the contract may have a positive or negative market value, since every change in the market price of the underlying asset will make the existing agreement as written in the contract more beneficial to one of the parties and less beneficial to the other one. The forward contract may, therefore, become a serious liability for one of the two parties involved, so there is the risk that this party is no longer willing or able to honor the terms of the contract on the delivery date. This counterparty risk problem can be avoided by the use of futures contracts.

Futures are standardized contracts that are traded on futures exchanges. When entering a futures contract, a margin account on the futures exchange is opened and a payment into that account is required, to make it possible for the exchange to withdraw money when appropriate. The exchange publishes a futures price for every contract, which is updated regularly to reflect price changes in the underlying. Whenever a new futures price is announced, an amount of cash that is equal to the difference between the new futures price and the previous one is paid into or withdrawn from the margin account, depending on whether one is short the contract or long the contract. Parties that intend to buy the underlying are long the contract, and they, therefore, receive money if the futures price goes up and pay when it goes down. Parties that intend to sell are short the contract, and they, therefore, pay money if the futures price goes up and receive money when it goes down. This procedure is known as marking to market. Since on the delivery date the futures price is always equal to the underlying asset price, a possible difference between the initial futures price and the current asset price has been compensated for by the intermediate payments into the margin account.

Parties with opposite positions in the futures market deal only with the exchange instead of with each other, which explains the need for standardized contracts and the significant reduction in counterparty risk. Since no cash is needed to enter into a new (long or short) futures contract as long as there is enough money left in the margin account, it is easy to change a position in futures once such an account has been established. One can terminate existing long contracts by simply taking a position in offsetting short contracts or vice versa, and many parties close their position just before the delivery date if they are only interested in compensation for price changes and not in the actual delivery. This makes futures very convenient to use for hedging purposes (see Hedging) and for speculation on an underlying’s price movements. Likewise, it is quite easy for the exchange to close the futures position of a party who refuses to put more money in their margin account when asked to do so in the so-called margin call.

These characteristics have made futures very popular financial instruments and the market for them is huge. In 2008, more than eight billion futures contracts were traded worldwide with underlying assets^a in equity indices (37%), individual equity (31%), interest rates (18%), agricultural goods (5%), energy (3%), currencies (3%), and metals (2%). The most popular are contracts on the S&P 500 and Dow Jones indices, followed by eurodollar and eurobund futures, and contracts on white sugar, soybeans, crude oil, aluminum, and gold. The notional amounts underlying futures on interest rates, equity indices, and currencies at the world’s exchanges were estimated to be 27 trillion, 1.6 trillion, and 175 billion US dollars, respectively, in June 2008^b.

Pricing Methods for Forwards in Discrete Time

To analyze the futures and forward prices, we first look at discrete-time models, and then look at generalizations in continuous time.

Consider a discrete-time market model on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with a filtration $(\mathcal{F}_n)_{n \in \mathcal{N}}$, where $\mathcal{N} = \{0, 1, \ldots, N\}$ denotes our discrete-time set. We define assets $S$ and $B$ to model the underlying asset and a bank account, respectively, with associated stochastic price processes $(S_n)_{n \in \mathcal{N}}$ and $(B_n)_{n \in \mathcal{N}}$. We assume that $S$ is adapted and that $B$ is predictable with respect to this filtration, and that both $B$ and $1/B$ are bounded. Associated with the asset $S$ are cash flows $(D_n)_{n \in \mathcal{N}}$, where $D_n$ denotes the sum of all cash flows caused by holding one unit of $S$ at time $n$. These cash flows can be positive (such as dividends when $S$ is a stock, or interest when $S$ is a currency) or negative (such as storage costs when $S$ is a commodity). We will always assume perfect market liquidity (see Liquidity), so all assets can be bought and sold in all possible quantities for their current market prices and no transaction costs (see Transaction Costs) are charged.

The cash flows associated with a forward contract that is initiated at time $T_0 \in \mathcal{N}$ take place at the time of delivery $T_d \in \mathcal{N}$ that is specified in the contract, with $T_0 \le T_d$. At time $T_d$, the asset is delivered while the forward price agreed upon at the initial time $T_0$ for delivery at time $T_d$, which we denote by $F(T_0, T_d)$, is paid in return. Since this forward price needs to be determined at time $T_0$, it should be $\mathcal{F}_{T_0}$-measurable. Moreover, the forward price is chosen in such a way that both parties agree to enter the contract without any cash changing hands at this initial time.

In complete and arbitrage-free markets (see Arbitrage Pricing Theory), it is often possible to find an explicit expression for the forward price $F(T_0, T_d)$, since the cash flows associated with the contract can then be replicated using other assets with known prices. Let us assume that there exists a unique martingale measure $\mathbb{Q}$, which is equivalent to $\mathbb{P}$, such that the discounted versions of tradable assets are martingales under this measure (see Equivalent Martingale Measures). This is almost equivalent to the assumption of a complete and arbitrage-free market; for the exact statement, see Fundamental Theorem of Asset Pricing. Contingent claims that pay a cash-flow stream of $\mathcal{F}_n$-measurable amounts $X_n$ at the times $n \in \mathcal{N}$ in such markets have a unique price $p$ at time $k \in \mathcal{N}$ equal to

$$p_k = B_k \sum_{n \in \mathcal{N},\, n \ge k} \mathbb{E}_{\mathbb{Q}}\left[X_n / B_n \mid \mathcal{F}_k\right] \qquad (1)$$

A specific example is the zero-coupon bond price at time $k$ for the delivery of one unit of cash at time $T > k$, which is equal to $p(k, T) = B_k\, \mathbb{E}_{\mathbb{Q}}\left[1/B_T \mid \mathcal{F}_k\right]$.

Suppose that an investor enters into a forward contract at time $T_0$, which obliges him/her to deliver the underlying asset $S$ at time $T_d$, and that he/she buys the underlying asset at time $T_0$ to hold it until delivery. This will lead to a cash flow of $-S_{T_0}$ at time $T_0$, to cash flows $D_n$ at times $\{n \in \mathcal{N} : T_0 \le n \le T_d\}$, and a cash flow of $F(T_0, T_d)$ at time $T_d$ when he/she delivers the asset. Since a forward contract is entered into without any money changing hands and since the net position after delivery will be zero, the value of the cash-flow stream defined above must be zero if there is no arbitrage in the market. Using the previous equation, we thus find that

$$0 = -S_{T_0} + B_{T_0}\, \mathbb{E}_{\mathbb{Q}}\left[F(T_0, T_d)/B_{T_d} \mid \mathcal{F}_{T_0}\right] + B_{T_0} \sum_{T_0 \le n \le T_d} \mathbb{E}_{\mathbb{Q}}\left[D_n / B_n \mid \mathcal{F}_{T_0}\right] \qquad (2)$$

Since $F(T_0, T_d)$ is $\mathcal{F}_{T_0}$-measurable, this leads to the following expression for a forward price in a complete and arbitrage-free market:

$$F(T_0, T_d) = \frac{S_{T_0}/B_{T_0} - \sum_{T_0 \le n \le T_d} \mathbb{E}_{\mathbb{Q}}\left[D_n / B_n \mid \mathcal{F}_{T_0}\right]}{\mathbb{E}_{\mathbb{Q}}\left[1/B_{T_d} \mid \mathcal{F}_{T_0}\right]} = \frac{S_{T_0} - B_{T_0} \sum_{T_0 \le n \le T_d} \mathbb{E}_{\mathbb{Q}}\left[D_n / B_n \mid \mathcal{F}_{T_0}\right]}{p(T_0, T_d)} \qquad (3)$$

In particular, when there are no dividends or storage costs, the forward price is simply equal to the current price of the underlying asset divided by the appropriate discount factor $p(T_0, T_d)$ until delivery. For commodities, where the cash flows $D_n$ are often negative since they represent storage costs, this formula (3) is known as the cost-of-carry formula. Conversely, when the actual possession of an underlying asset is more beneficial than just holding the forward contract, this can be modeled by introducing positive cash flows $D_n$. Such benefits are often expressed as a rate, the so-called convenience yield, which may fluctuate as a result of changing expectations concerning the availability of the underlying asset on the delivery date.

The initial price of a forward contract is zero, but when the underlying asset’s price changes, so does the value of an existing contract. If we denote by $G(T_0, T_d, k)$ the value at time $k$ of a forward contract entered at time $T_0 \le k$ for delivery at time $T_d \ge k$, then a similar argument as before leads to

$$G(T_0, T_d, k) = B_k\, \mathbb{E}_{\mathbb{Q}}\left[\frac{F(k, T_d) - F(T_0, T_d)}{B_{T_d}} \,\middle|\, \mathcal{F}_k\right] = p(k, T_d)\left(F(k, T_d) - F(T_0, T_d)\right) \qquad (4)$$

Pricing Methods for Futures in Discrete Time

All cash flows associated with a futures contract take place via the margin account. Let $(M_n)_{n \in \mathcal{N}}$ be the process describing the value of the margin account associated with a long position in one futures contract on the underlying asset $S$ defined above. If $f(k, T_d)$ is the futures price at time $k$ for delivery of one unit of the asset at time $T_d > k$ ($k, T_d \in \mathcal{N}$), then the margin account values will satisfy

$$M_{k+1} = \frac{B_{k+1}}{B_k} M_k + f(k+1, T_d) - f(k, T_d) \qquad (5)$$

where we assume that the interest rate used for the margin account is the same as the one used for $B$.

Futures prices are determined by supply and demand on the futures exchanges, but if we assume a complete and arbitrage-free market for $S$ and $B$, we can derive a theoretical formula for the futures price. We consider an investment strategy where at a certain time $k \in \mathcal{N}$, we open a new margin account, put an initial margin amount $M_k$ into it, and take a long position in a futures contract for delivery at time $T_d$. One time step later, we go short one futures contract for the same delivery date, which effectively closes our futures position, and we then empty our margin account. Since our net position is then zero again and since we do not pay or receive money to go long or short a futures contract, the total value of this cash-flow stream at time $k$ should be equal to zero, so

$$0 = -M_k + B_k\, \mathbb{E}_{\mathbb{Q}}\left[\frac{M_{k+1}}{B_{k+1}} \,\middle|\, \mathcal{F}_k\right] = B_k\, \mathbb{E}_{\mathbb{Q}}\left[\frac{f(k+1, T_d) - f(k, T_d)}{B_{k+1}} \,\middle|\, \mathcal{F}_k\right] \qquad (6)$$

Since $B$ was assumed to be predictable, that is, $B_{k+1}$ is $\mathcal{F}_k$-measurable for all $k \in \mathcal{N} \setminus \{N\}$, we may conclude from the above that the futures price process $f(\cdot, T_d)$ is a $\mathbb{Q}$-martingale for any fixed delivery date $T_d \in \mathcal{N}$, and hence

$$f(k, T_d) = \mathbb{E}_{\mathbb{Q}}\left[S_{T_d} \mid \mathcal{F}_k\right] \qquad (7)$$

since $f(T_d, T_d) = S_{T_d}$. Note that this formula no longer holds if $B$ fails to be predictable or when the interest rates paid on the bank account $B$ and the margin account $M$ are different.
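
The forward price formula (3) and the futures price formula (7) can be compared on a small finite model. The following Python sketch is only an illustration under stated assumptions: a hypothetical two-period market with four equally likely paths under $\mathbb{Q}$, no dividends or storage costs ($D_n = 0$), and a bank account whose second-period interest rate is revealed at time 1, so that $B$ is predictable. All numbers and names are invented for the example.

```python
# Hypothetical two-period market (times 0, 1, 2) with four equally likely paths under Q.
B0 = 1.0
R0 = 0.05                      # rate earned from time 0 to 1, known at time 0
R1 = {"u": 0.07, "d": 0.03}    # rate from time 1 to 2, known at time 1, so B is predictable

# (Q-probability, state at time 1, asset price S_2 at delivery)
PATHS = [
    (0.25, "u", 130.0),
    (0.25, "u", 105.0),
    (0.25, "d", 100.0),
    (0.25, "d", 80.0),
]


def bank_account_at_delivery(state):
    """B_2 = B_0 (1 + R0)(1 + R1[state])."""
    return B0 * (1.0 + R0) * (1.0 + R1[state])


def expect(g):
    """E_Q[g | F_0] for a function g of (state, S_2) on the finite path space."""
    return sum(q * g(state, s2) for q, state, s2 in PATHS)


discount = expect(lambda state, s2: 1.0 / bank_account_at_delivery(state))
discounted_asset = expect(lambda state, s2: s2 / bank_account_at_delivery(state))

bond = B0 * discount              # zero-coupon bond price p(0, 2)
spot = B0 * discounted_asset      # S_0, because S / B is a Q-martingale when D = 0
forward = spot / bond             # equation (3) without dividends or storage costs
futures = expect(lambda state, s2: s2)   # equation (7)

# Q-covariance between S_2 and 1 / B_2, which drives the forward-futures gap
covariance = discounted_asset - futures * discount

print(f"p(0,2) = {bond:.6f}, S_0 = {spot:.4f}")
print(f"forward = {forward:.4f}, futures = {futures:.4f}")
print(f"gap = {forward - futures:.4f}, B_0 * cov / p(0,2) = {B0 * covariance / bond:.4f}")
```

In this example the asset tends to be high on the paths where interest rates are high, so $S_2$ and $1/B_2$ are negatively correlated and the forward price comes out below the futures price. Setting both entries of `R1` to the same value makes the bank account deterministic and the two prices coincide.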
Continuous-time Models

The generalization to continuous-time models is rather straightforward for forward contracts, but more subtle for futures contracts.

Assume that the price process of the underlying asset is a stochastic process $S$ on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with a filtration $(\mathcal{F}_t)_{t \in [0,T]}$ that satisfies the usual conditions, that is, it is right continuous and $\mathcal{F}_0$ contains all $\mathbb{P}$-null sets. We will assume that the process $S$ is an adapted semimartingale and that the bank account process $B$ is an adapted and predictable semimartingale, and $B$ and $1/B$ are assumed to be bounded almost surely. We model the dividend and storage costs of the asset $S$ using an adapted semimartingale $D$, with the interpretation that the total amount of dividends received minus the storage costs paid between two times $t_1$ and $t_2$ is equal to $D_{t_2} - D_{t_1-}$, where $0 \le t_1 < t_2 \le T$.

As in the discrete-time case, we assume that we have a complete and arbitrage-free market and that there exists a unique measure $\mathbb{Q}$ that is equivalent to $\mathbb{P}$ such that discounted versions of tradable assets become martingales under this measure (see Equivalent Martingale Measures). We model contingent claims by a cumulative cash-flow stream $(X_t)_{t \in [0,T]}$, which is an adapted semimartingale. The total cash amount paid out by the contingent claim between two times $t_1$ and $t_2$ is given by $X_{t_2} - X_{t_1-}$, and $X_t - X_{t-}$ corresponds to a payment at the single time $t$ (with $t, t_1, t_2 \in [0, T]$ and $t_2 \ge t_1$). Such contingent claims have a unique price $p$ in a complete and arbitrage-free market, which at time $t$ is equal to

$$p_t = B_t\, \mathbb{E}_{\mathbb{Q}}\left[\int_t^T \left(\frac{dX_u}{B_{u-}} + d\left[X, \frac{1}{B}\right]_u\right) \,\middle|\, \mathcal{F}_t\right] \qquad (8)$$

The last term involving the brackets compensates for the fact that the cash flows $X$ and the bank account may have nonzero covariation, so it disappears when $B$ has finite variation and is continuous, or when $B$ is deterministic. Compare this to the discrete-time case, where we assumed that $(B_n)_{n \in \mathcal{N}}$ is predictable.

To determine the correct forward price $F(T_0, T_d)$ for a forward contract initiated at time $T_0$ for delivery at time $T_d$, we follow the same arguments as in the discrete-time case. If we borrow money to buy the underlying asset today and then hold on to it until we deliver it at the delivery date in return for a payment of the forward price, the total value of this cash-flow stream should be zero since we enter the forward contract without any cash payments. Therefore, $p_t$ should be zero in the formula above if we substitute $t = T_0$ and the cash-flow stream

$$X_t = 0 \ (t < T_0,\ t > T_d), \quad X_{T_0} = -S_{T_0}, \quad X_t = D_t \ (t \in\, ]T_0, T_d[\,), \quad X_{T_d} = D_{T_d} + F(T_0, T_d) \qquad (9)$$

Using the fact that the forward price $F(T_0, T_d)$ must be $\mathcal{F}_{T_0}$-measurable then leads to

$$F(T_0, T_d) = \frac{1}{p(T_0, T_d)}\left(S_{T_0} - B_{T_0}\, \mathbb{E}_{\mathbb{Q}}\left[\int_{T_0}^{T_d} \left(\frac{dD_u}{B_{u-}} + d\left[D, \frac{1}{B}\right]_u\right) \,\middle|\, \mathcal{F}_{T_0}\right]\right) \qquad (10)$$

The formula for the value of a forward contract at a later time after $T_0$ is the same as in the discrete-time case.

We now turn to the definition of a futures price process $(f(t, T_d))_{t \in [0, T_d]}$ in continuous time for delivery at a fixed time $T_d \in [0, T]$. Let $(\psi_t)_{t \in [0, T_d]}$ be a futures investment strategy: a bounded and predictable stochastic process such that $\psi_t$ represents the number of futures contracts (positive or negative) we own at time $t$. The associated margin account process $(M_t)_{t \in [0,T]}$ is then defined on $[0, T]$ as

$$dM_t = M_t \frac{dB_t}{B_{t-}} + \psi_t\, df(t, T_d) \qquad (11)$$

with $M_0 \in \mathbb{R}$, where we have again assumed that the margin account earns the same interest rate as the bank account $B$. As mentioned before, the futures price process should be equal to the underlying asset price at delivery, so $f(T_d, T_d) = S_{T_d}$.

In a complete and arbitrage-free market, we consider an investment strategy where at any time $t \in [0, T_d]$ we open a new margin account and put an initial margin amount $M_t$ in, go long one futures contract at time $t$, wait until a later date $s \in\, ]t, T_d]$ and close our futures position by going short one contract, and close our margin account. If there is no arbitrage, the discounted value of the cash flows from this strategy should be zero at time $t$ since we start and end without any position, so

$$M_t = B_t\, \mathbb{E}_{\mathbb{Q}}\left[\frac{M_s}{B_s} \,\middle|\, \mathcal{F}_t\right] \qquad (12)$$

This shows that $M/B$ is a martingale under $\mathbb{Q}$, that is, the margin account should be a tradable asset. A bit of stochastic calculus shows that

$$d\left(\frac{M_t}{B_t}\right) = \frac{df(t, T_d)}{B_{t-}} + d\left[f(\cdot, T_d), \frac{1}{B}\right]_t \qquad (13)$$

and we see that if $B$ is continuous, of finite variation, bounded, and bounded away from zero, then the futures price process $f(\cdot, T_d)$ is itself a martingale under $\mathbb{Q}$ and hence

$$f(t, T_d) = \mathbb{E}_{\mathbb{Q}}\left[f(T_d, T_d) \mid \mathcal{F}_t\right] = \mathbb{E}_{\mathbb{Q}}\left[S_{T_d} \mid \mathcal{F}_t\right] \qquad (14)$$

Note that in this case the difference between the forward and futures prices at time $t = T_0$ can be expressed as

$$F(T_0, T_d) - f(T_0, T_d) = \frac{B_{T_0}}{p(T_0, T_d)}\left(\mathbb{E}_{\mathbb{Q}}\left[\frac{S_{T_d}}{B_{T_d}} \,\middle|\, \mathcal{F}_{T_0}\right] - \mathbb{E}_{\mathbb{Q}}\left[S_{T_d} \mid \mathcal{F}_{T_0}\right] \mathbb{E}_{\mathbb{Q}}\left[\frac{1}{B_{T_d}} \,\middle|\, \mathcal{F}_{T_0}\right]\right) \qquad (15)$$

Since the expression in brackets is the $\mathcal{F}_{T_0}$-conditional covariance between $S_{T_d}$ and $1/B_{T_d}$, we immediately see that forward and futures prices coincide if and only if these two stochastic variables are uncorrelated when conditioned on $\mathcal{F}_{T_0}$, for example, when the bank account $B$ is deterministic.

Extensions

For clarity of exposition, we have focused here on forward and futures prices in complete and arbitrage-free markets without transaction costs (see Transaction Costs). Early papers on the theoretical pricing methods are by Black [3] for deterministic interest rates and Cox et al. [5] and Jarrow and Oldfield [9] for the general case. Continuous resettlement is treated by Duffie and Stanton [7] and Karatzas and Shreve [10], see also [12]. See [2] for a very clear summary of the principles involved. For excellent introductions to the practical organization of futures and forward markets and for empirical results on prices, the books by Duffie [6], Hull [8], and Kolb [11] are recommended.

For incomplete markets, there is a theory of equilibrium in futures markets under mean–variance preferences; see, for example, [14] and the consumption-based capital asset pricing model of Breeden [4] (see also Capital Asset Pricing Model). Many futures allow a certain flexibility regarding the exact product that must be delivered and regarding the time of delivery. The value of this last “timing option” is analyzed in a paper by Biagini and Björk [1].

When the bank account process $B$ is not of finite variation and continuous, the futures price is no longer a martingale under $\mathbb{Q}$; however, under some technical conditions, it can be shown to be a martingale under another equivalent measure that can be found using a multiplicative Doob–Meyer decomposition (see Doob–Meyer Decomposition) as shown in [15]. The assumption that $B$ and $1/B$ are bounded is often too restrictive in practice; see [13] for weaker conditions.

End Notes

a. Sector estimates based on the US data, by the Futures Industry Association.
b. Quarterly Review, December 2008, Bank for International Settlements.

References

[1] Biagini, F. & Björk, T. (2007). On the timing option in a futures contract, Mathematical Finance 17(2), 267–283.
[2] Björk, T. (2004). Arbitrage Theory in Continuous Time, 2nd Edition, Oxford University Press.
[3] Black, F. (1976). The pricing of commodity contracts, Journal of Financial Economics 3(1–2), 167–179.
[4] Breeden, D.T. (1980). Consumption risk in futures markets, Journal of Finance 35(2), 503–520.
[5] Cox, J.C., Ingersoll, J. Jr. & Ross, S.A. (1981). The relation between forward prices and futures prices, Journal of Financial Economics 9(4), 321–346.
[6] Duffie, D. (1989). Futures Markets, Prentice-Hall.
[7] Duffie, D. & Stanton, R. (1992). Pricing continuously resettled contingent claims, Journal of Economic Dynamics and Control 16(3–4), 561–573.
[8] Hull, J. (2003). Options, Futures and Other Derivatives, 5th Edition, Prentice-Hall.
[9] Jarrow, R.A. & Oldfield, G.S. (1981). Forward contracts and futures contracts, Journal of Financial Economics 9(4), 373–382.
[10] Karatzas, I. & Shreve, S. (1998). Methods of Mathematical Finance, Springer-Verlag.
[11] Kolb, R. (2003). Futures, Options, and Swaps, 4th Edition, Blackwell Publishing.
[12] Norberg, R. & Steffensen, M. (2005). What is the time value of a stream of investments? Journal of Applied Probability 42, 861–866.
[13] Pozdnyakov, V. & Steele, J.M. (2004). On the martingale framework for futures prices, Stochastic Processes and Their Applications 109, 69–77.
[14] Richard, S.F. & Sundaresan, M.S. (1981). A continuous time equilibrium model of forward prices and futures prices in a multigood economy, Journal of Financial Economics 9(4), 347–371.
[15] Vellekoop, M. & Nieuwenhuis, H. (2007). Cash Dividends and Futures Prices on Discontinuous Filtrations. Technical Report 1838, University of Twente.

Related Articles

Commodity Forward Curve Modeling; Currency Forward Contracts; Electricity Forward Contracts; Eurodollar Futures and Options; LIBOR Rate.

MICHEL VELLEKOOP
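
The marking-to-market mechanism of the article above, that is, the margin account recursion in equation (5), can be traced numerically. The Python sketch below is a minimal illustration under stated assumptions: the futures price path, the 1% per-period interest rate, and the initial margin are invented, and the `position` argument (+1 for long, -1 for short) is a hypothetical extension of equation (5) in the spirit of the strategy $\psi_t$ in equation (11).

```python
def margin_account_path(initial_margin, futures_prices, bank_account, position=1):
    """Apply M_{k+1} = (B_{k+1} / B_k) * M_k + position * (f_{k+1} - f_k) step by step.

    position = +1 is a long contract as in equation (5); position = -1 is a short one.
    """
    if len(futures_prices) != len(bank_account):
        raise ValueError("futures_prices and bank_account must have the same length")
    balances = [initial_margin]
    for k in range(len(futures_prices) - 1):
        interest_factor = bank_account[k + 1] / bank_account[k]
        settlement = position * (futures_prices[k + 1] - futures_prices[k])
        balances.append(interest_factor * balances[-1] + settlement)
    return balances


# Illustrative inputs (assumptions, not from the article)
futures_prices = [100.0, 102.0, 99.0, 101.0]
bank_account = [1.01**k for k in range(4)]   # B_k with a flat 1% rate per period

long_balances = margin_account_path(10.0, futures_prices, bank_account, position=1)
short_balances = margin_account_path(10.0, futures_prices, bank_account, position=-1)

for k, (long_m, short_m) in enumerate(zip(long_balances, short_balances)):
    print(f"k={k}  f={futures_prices[k]:7.2f}  long M={long_m:9.5f}  short M={short_m:9.5f}")
```

The long account is credited whenever the futures price rises and debited when it falls, and the short account moves the opposite way, while both earn interest at the bank account rate between settlements.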
Black–Scholes Formula

“If options are correctly priced in the market, it should not be possible to make sure profits by creating portfolios of long and short positions in options and their underlying stocks. Using this principle, a theoretical valuation formula for options is derived.” These sentences, from the abstract of the great paper [2] by Fischer Black and Myron Scholes, encapsulate the basic idea that, with the asset price model they employ, insisting on absence of arbitrage is enough to obtain a unique value for a call option on the asset. The resulting formula, equation (6) below, is the most famous formula in financial economics, and, in fact, that whole subject splits decisively into the pre-Black–Scholes and post-Black–Scholes eras.

This article aims to give a self-contained derivation of the formula, some discussion of the hedge parameters, and some extensions of the formula, and to indicate why a formula based on a stylized mathematical model, which is known not to be a particularly accurate representation of real asset prices, has nevertheless proved so effective in the world of option trading. The section The Model and Formula formulates the model and states and proves the formula. As is well known, the formula can equally well be stated in the form of a partial differential equation (PDE); this is equation (9) below. The next section discusses the PDE aspects of Black–Scholes. The section Hedge Parameters summarizes information about the option ‘greeks’, while the sections The Black ‘Forward’ Option Formula and A Universal Black Formula introduce what is actually a more useful form of Black–Scholes, usually known as the Black formula. Finally, the section Implied Volatility and Market Trading discusses the applications of the formula in market trading. We define the implied volatility and demonstrate a “robustness” property of Black–Scholes, which implies that effective hedging can be achieved even if the “true” price process is substantially different from Black and Scholes’ stylized model.

The Model and Formula

Let $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\in\mathbb{R}_+}, \mathbb{P})$ be a probability space with a given filtration $(\mathcal{F}_t)$ representing the flow of information in the market. Traded asset prices are $\mathcal{F}_t$-adapted stochastic processes on $(\Omega, \mathcal{F}, \mathbb{P})$. We assume that the market is frictionless: assets may be held in arbitrary amount, positive and negative, the interest rate for borrowing and lending is the same, and there are no transaction costs (i.e., the bid–ask spread is 0). While there may be many traded assets in the market, we fix attention on two of them. First, there is a “risky” asset whose price process $(S_t, t\in\mathbb{R}_+)$ is assumed to satisfy the stochastic differential equation (SDE)

$$dS_t = \mu S_t\,dt + \sigma S_t\,dw_t \tag{1}$$

with given drift $\mu$ and volatility $\sigma$. Here $(w_t, t\in\mathbb{R}_+)$ is an $(\mathcal{F}_t)$-Brownian motion. Equation (1) has a unique solution: if $S_t$ satisfies equation (1), then by the Itô formula

$$d\log S_t = \left(\mu - \tfrac12\sigma^2\right)dt + \sigma\,dw_t \tag{2}$$

so that $S_t$ satisfies equation (1) if and only if

$$S_t = S_0\exp\left(\left(\mu - \tfrac12\sigma^2\right)t + \sigma w_t\right) \tag{3}$$

Asset $S_t$ is assumed to have a constant dividend yield $q$, that is, the holder receives a dividend payment $qS_t\,dt$ in the time interval $[t, t+dt)$. Secondly, there is a riskless asset paying interest at a fixed continuously compounding rate $r$. The exact form of this asset is unimportant: it could be a money-market account in which \$1 deposited at time $s$ grows to \$$e^{r(t-s)}$ at time $t$, or it could be a zero-coupon bond maturing with a value of \$1 at some time $T$, so that its value at $t \le T$ is

$$B_t = \exp(-r(T-t)) \tag{4}$$

This grows, as required, at rate $r$:

$$dB_t = rB_t\,dt \tag{5}$$

Note that equation (5) does not depend on the final maturity $T$ (the same growth rate is obtained from any zero-coupon bond) and the choice of $T$ is a matter of convenience.

A European call option on $S_t$ is a contract, entered at time 0 and specified by two parameters $(K, T)$, which gives the holder the right, but not the obligation, to purchase 1 unit of the risky asset at price $K$ at time $T > 0$. (In the frictionless market setting, an option to buy $N$ units of stock is equivalent to $N$ options on a single unit, so we do not need to include quantity as a parameter.) If $S_T \le K$ the option is worthless and will not be exercised. If $S_T > K$ the holder can exercise his option, buying the asset at price $K$, and then immediately selling it at the prevailing market price $S_T$, realizing a profit of $S_T - K$. Thus, the exercise value of the option is $[S_T - K]^+ = \max(S_T - K, 0)$. Similarly, the exercise value of a European put option, conferring on the holder the right to sell at a fixed price $K$, is $[K - S_T]^+$. In either case, the exercise value is nonnegative and, in the above model, is strictly positive with positive probability, so the option buyer should pay the writer a premium to acquire it. Black and Scholes [2] showed that there is a unique arbitrage-free value for this premium.

Theorem 1

1. In the above model, the unique arbitrage-free value at time $t < T$ when $S_t = S$ of the call option maturing at time $T$ with strike $K$ is

$$C(t,S) = e^{-q(T-t)}SN(d_1) - e^{-r(T-t)}KN(d_2) \tag{6}$$

where $N(\cdot)$ denotes the cumulative standard normal distribution function

$$N(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-\frac12 y^2}\,dy \tag{7}$$

and

$$d_1 = \frac{\log(S/K) + (r - q + \sigma^2/2)(T-t)}{\sigma\sqrt{T-t}}, \qquad d_2 = d_1 - \sigma\sqrt{T-t} \tag{8}$$

2. The function $C(t,S)$ may be characterized as the unique $C^{1,2}$ solution^a of the Black–Scholes PDE

$$\frac{\partial C}{\partial t} + (r-q)S\frac{\partial C}{\partial S} + \frac12\sigma^2S^2\frac{\partial^2 C}{\partial S^2} - rC = 0 \tag{9}$$

solved backward in time with the terminal boundary condition

$$C(T,S) = [S-K]^+ \tag{10}$$

3. The value of the put option with exercise time $T$ and strike $K$ is

$$P(t,S) = e^{-r(T-t)}KN(-d_2) - e^{-q(T-t)}SN(-d_1) \tag{11}$$

To prove the theorem, we are going to show that the call option value can be replicated by a dynamic trading strategy investing in the asset $S_t$ and in the zero-coupon bond $B_t = e^{-r(T-t)}$. A trading strategy is specified by an initial capital $x$ and a pair of adapted processes $\alpha_t, \beta_t$ representing the number of units of $S, B$ respectively held at time $t$; the portfolio value at time $t$ is then $X_t = \alpha_tS_t + \beta_tB_t$, and by definition $x = \alpha_0S_0 + \beta_0B_0$. The trading strategy $(x, \alpha, \beta)$ is admissible if

(i) $\int_0^T \alpha_t^2S_t^2\,dt < \infty$ a.s.
(ii) $\int_0^T |\beta_t|\,dt < \infty$ a.s.
(iii) There exists a constant $L \ge 0$ such that $X_t \ge -L$ for all $t$, a.s. (12)

The gain from trade in $[s,t]$ is

$$\int_s^t \alpha_u\,dS_u + \int_s^t \beta_u\,dB_u + \int_s^t q\alpha_uS_u\,du$$

where the first integral is an Itô stochastic integral. This is the sum of the accumulated capital gains/losses in the two assets plus the total dividend received. The trading strategy is self-financing if

$$\alpha_tS_t + \beta_tB_t - \alpha_sS_s - \beta_sB_s = \int_s^t \alpha_u\,dS_u + \int_s^t q\alpha_uS_u\,du + \int_s^t \beta_u\,dB_u \tag{13}$$

implying that the change in portfolio value over any interval is entirely due to gains from trade (the accumulated increments in the value of the assets in the portfolio plus the total dividend received).

We can always create self-financing strategies by fixing $\alpha$, the investment in the risky asset, and investing all residual wealth in the bond. Indeed, the value of the risky asset holding at time $t$ is $\alpha_tS_t$, so if the total portfolio value is $X_t$ we take $\beta_t = (X_t - \alpha_tS_t)/B_t$. The portfolio value process is then defined implicitly as the solution of the SDE

$$\begin{aligned} dX_t &= \alpha_t\,dS_t + q\alpha_tS_t\,dt + \beta_t\,dB_t \\ &= \alpha_t\,dS_t + q\alpha_tS_t\,dt + (X_t - \alpha_tS_t)r\,dt \\ &= rX_t\,dt + \alpha_tS_t\sigma(\theta\,dt + dw_t) \end{aligned} \tag{14}$$

where $\theta = (\mu - r + q)/\sigma$. This strategy is always self-financing since $X_t$ is, by definition, the gains from trade process, while the value is $\alpha S + \beta B = X$.

Proof of Theorem 1 The key step is to put the “wealth equation” (14) into a more convenient form by change of measure. Define a measure $\mathbb{Q}$, the so-called risk-neutral measure, on $(\Omega, \mathcal{F}_T)$ by the Radon–Nikodým derivative

$$\frac{d\mathbb{Q}}{d\mathbb{P}} = \exp\left(-\theta w_T - \tfrac12\theta^2T\right) \tag{15}$$

(The right-hand side has expectation 1, since $w_T \sim N(0,T)$.) Expectation with respect to $\mathbb{Q}$ will be denoted $\mathbb{E}_{\mathbb{Q}}$. By the Girsanov theorem, $\check{w}_t = w_t + \theta t$ is a $\mathbb{Q}$-Brownian motion, so that from equation (1) the SDE satisfied by $S_t$ under $\mathbb{Q}$ is

$$dS_t = (r-q)S_t\,dt + \sigma S_t\,d\check{w}_t \tag{16}$$

so that for $t < T$

$$S_T = S_t\exp\left(\left(r - q - \tfrac12\sigma^2\right)(T-t) + \sigma(\check{w}_T - \check{w}_t)\right) \tag{17}$$

Applying the Itô formula and equation (14) we find that, with $\tilde{X}_t = e^{-rt}X_t$ and $\tilde{S}_t = e^{-rt}S_t$,

$$d\tilde{X}_t = \alpha_t\tilde{S}_t\sigma\,d\check{w}_t \tag{18}$$

Thus $e^{-rt}X_t$ is a $\mathbb{Q}$-local martingale under condition (12)(i). Let $h(S) = [S-K]^+$ and suppose there exists a replicating strategy, that is, a strategy $(x, \alpha, \beta)$ with value process $X_t$ constructed as in equation (14) such that $X_T = h(S_T)$ a.s. Suppose also that $\alpha_t$ satisfies the stronger condition

$$\mathbb{E}_{\mathbb{Q}}\int_0^T \alpha_t^2S_t^2\,dt < \infty \tag{19}$$

Then $\tilde{X}_t$ is a $\mathbb{Q}$-martingale, and hence for $t < T$

$$X_t = e^{-r(T-t)}\,\mathbb{E}_{\mathbb{Q}}[h(S_T)\,|\,\mathcal{F}_t] \tag{20}$$

and in particular

$$x = e^{-rT}\,\mathbb{E}_{\mathbb{Q}}[h(S_T)] \tag{21}$$

Now $S_t$ is a Markov process, so the conditional expectation in equation (20) is a function of $S_t$, and indeed we see from equation (17) that $S_T$ is a function of $S_t$ and the increment $(\check{w}_T - \check{w}_t)$, which is independent of $\mathcal{F}_t$. Writing $(\check{w}_T - \check{w}_t) = Z\sqrt{T-t}$ where $Z \sim N(0,1)$, the expectation is simply a one-dimensional integral with respect to the normal distribution. Hence, $X_t = C(t, S_t)$ where

$$C(t,S) = \frac{e^{-r(T-t)}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} h\left(S\exp\left((r - q - \sigma^2/2)(T-t) - \sigma x\sqrt{T-t}\right)\right)e^{-\frac12x^2}\,dx \tag{22}$$

Straightforward calculations show that this integral is equal to the closed-form expression in equation (6).

The argument so far shows that if there is a replicating strategy, the initial capital required must be $x = C(0, S_0)$ where $C$ is defined by equation (22). It remains to identify the strategy $(x, \alpha, \beta)$ and to show that it is admissible. Let us temporarily take for granted the assertions of part (2) of the theorem; these will be proved in Theorem 3 below, where we also show that $(\partial C/\partial S)(t,S) = e^{-q(T-t)}N(d_1)$, so that in particular $0 < \partial C/\partial S < 1$.

The replicating strategy is $A = (x, \alpha, \beta)$ defined by

$$x = C(0,S_0), \qquad \alpha_t = \frac{\partial C}{\partial S}(t,S_t), \qquad \beta_t = \frac{1}{rB_t}\left(\frac{\partial C}{\partial t} + \frac12\sigma^2S_t^2\frac{\partial^2 C}{\partial S^2} - qS_t\frac{\partial C}{\partial S}\right) \tag{23}$$

Indeed, using the PDE (9) we find that $X_t = \alpha_tS_t + \beta_tB_t = C(t,S_t)$, so that $A$ is replicating and also $X_t \ge 0$, so that condition (12)(iii) is satisfied. From equation (17)

$$S_t^2 = S_0^2\exp\left((2r - 2q - \sigma^2)t + 2\sigma\check{w}_t\right) \tag{24}$$

so that $\mathbb{E}_{\mathbb{Q}}[S_t^2] = S_0^2\exp((2r - 2q + \sigma^2)t)$. Since $|e^{-r(T-t)}\,\partial C/\partial S| < 1$, this shows that $\mathbb{E}_{\mathbb{Q}}\int_0^T \alpha_t^2S_t^2\,dt < \infty$, that is, condition (19) is satisfied. Since $\beta_t$ is, almost surely (a.s.), a continuous function of $t$ it satisfies equation (12)(ii). Thus $A$ is admissible. The gain from trade in an interval $[s,t]$ is

$$\begin{aligned} \int_s^t \alpha_u\,dS_u + \int_s^t q\alpha_uS_u\,du + \int_s^t \beta_u\,dB_u &= \int_s^t \frac{\partial C}{\partial S}\,dS_u + \int_s^t\left(\frac{\partial C}{\partial t} + \frac12\sigma^2S_u^2\frac{\partial^2 C}{\partial S^2}\right)du \\ &= \int_s^t dC \\ &= C(t,S_t) - C(s,S_s) \end{aligned} \tag{25}$$

(The first equality follows from the definition of $\alpha$ and $\beta$, and the resulting expression is just the Itô formula applied to the function $C$.) This confirms the self-financing property and completes the proof.

Finally, part (3) of the theorem follows from the model-free put–call parity relation $C - P = e^{-q(T-t)}S - e^{-r(T-t)}K$ and symmetry of the normal distribution: $N(-x) = 1 - N(x)$. □

The replicating strategy derived above is known as delta hedging: the number of units of the risky asset held in the portfolio is equal to the Black–Scholes delta, $\Delta = \partial C/\partial S$.

So far, we have concentrated entirely on the hedging of call options. We conclude this section by showing that, with the class of trading strategies we have defined, there are no arbitrage opportunities in the Black–Scholes model.

Theorem 2 There is no admissible trading strategy in a single asset and the zero-coupon bond that generates an arbitrage opportunity, in the Black–Scholes model.

Proof Suppose $X_t$ is the portfolio value process corresponding to an admissible trading strategy $(x, \alpha, \beta)$. There is an arbitrage opportunity if $x = 0$ and, for some $t$, $X_t \ge 0$ a.s. and $\mathbb{P}[X_t > 0] > 0$, or equivalently $\mathbb{E}[X_t] > 0$. This is the $\mathbb{P}$-expectation, but $\mathbb{E}[X_t] > 0 \Leftrightarrow \mathbb{E}_{\mathbb{Q}}[\tilde{X}_t] > 0$ since $\mathbb{P}$ and $\mathbb{Q}$ are equivalent measures and $e^{-rt} > 0$. From equation (18), $\tilde{X}_t$ is a $\mathbb{Q}$-local martingale which, by the definition of admissibility, is bounded below by a constant $-L$. It follows that $\tilde{X}_t$ is a supermartingale, so if $x = 0$, then $\mathbb{E}_{\mathbb{Q}}[\tilde{X}_t] \le 0$ for any $t$. So no arbitrage can arise from the strategy $(0, \alpha, \beta)$. □

The Black–Scholes Partial Differential Equation

Theorem 3

1. The Black–Scholes PDE (9) with boundary condition (10) has a unique $C^{1,2}$ solution, given by equation (6).

2. The Black–Scholes “delta”, $\Delta(t,S)$, is given by

$$\Delta(t,S) = \frac{\partial}{\partial S}C(t,S) = e^{-q(T-t)}N(d_1) \tag{26}$$

Proof It can, with some pain, be directly checked that $C(t,S)$ defined by equation (6) does satisfy the Black–Scholes PDE (9), (10), and a further calculation (not quite as simple as it appears) gives the formula (26) for the Black–Scholes delta. It is, however, enlightening to take the original route of Black and Scholes and relate the equation (9) to a simpler equation, the heat equation. Note from the explicit expression (17) for the price process under the risk-neutral measure that, given the starting point $S_t$, there is a one-to-one relation between $S_T$ and the Brownian increment $\check{w}_T - \check{w}_t$. We can therefore always express things interchangeably in “$S$ coordinates” or in “$\check{w}$ coordinates”. In fact, we already made use of this in deriving the integral price expression (22). Here we proceed as follows.

For fixed parameters $S_0, r, q, \sigma$, define the functions $\phi: \mathbb{R}_+\times\mathbb{R} \to \mathbb{R}_+$ and $u: [0,T)\times\mathbb{R} \to \mathbb{R}_+$ by

$$\phi(t,x) = S_0\exp\left(\left(r - q - \tfrac12\sigma^2\right)t + \sigma x\right) \tag{27}$$

and

$$u(t,x) = C(t, \phi(t,x)) \tag{28}$$

Note that the inverse function $\psi(t,s) = \phi^{-1}(t,s)$ (i.e., the solution for $x$ of the equation $s = \phi(t,x)$) is

$$\psi(t,s) = \frac{1}{\sigma}\left(\log\left(\frac{s}{S_0}\right) - \left(r - q - \tfrac12\sigma^2\right)t\right) \tag{29}$$

A direct calculation shows that $C$ satisfies equation (9) if and only if $u$ satisfies the heat equation

$$\frac{\partial u}{\partial t} + \frac12\frac{\partial^2 u}{\partial x^2} - ru = 0 \tag{30}$$
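The closed-form expressions stated so far can also be checked numerically. The following Python sketch is an illustration only, not part of the derivation: it evaluates the call and put formulas (6) and (11) and the delta (26), and computes a finite-difference residual of the pricing PDE. The function names, step sizes and sample inputs are assumptions made for this example, and the sketch takes the drift to be $r - q$ both in $d_1$ and in the first-order PDE term, which is what the risk-neutral dynamics (16) imply.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function N(x), equation (7)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def d1_d2(S, K, tau, r, q, sigma):
    """Return (d1, d2) for time to maturity tau = T - t > 0."""
    a = sigma * math.sqrt(tau)
    d1 = (math.log(S / K) + (r - q + 0.5 * sigma ** 2) * tau) / a
    return d1, d1 - a


def call_price(S, K, tau, r, q, sigma):
    """Call value, equation (6)."""
    d1, d2 = d1_d2(S, K, tau, r, q, sigma)
    return (math.exp(-q * tau) * S * norm_cdf(d1)
            - math.exp(-r * tau) * K * norm_cdf(d2))


def put_price(S, K, tau, r, q, sigma):
    """Put value, equation (11)."""
    d1, d2 = d1_d2(S, K, tau, r, q, sigma)
    return (math.exp(-r * tau) * K * norm_cdf(-d2)
            - math.exp(-q * tau) * S * norm_cdf(-d1))


def call_delta(S, K, tau, r, q, sigma):
    """Black-Scholes delta, equation (26)."""
    d1, _ = d1_d2(S, K, tau, r, q, sigma)
    return math.exp(-q * tau) * norm_cdf(d1)


def pde_residual(S, K, tau, r, q, sigma, hS=0.01, ht=1e-4):
    """Central finite-difference residual of the pricing PDE with drift (r - q)S.

    Since tau = T - t, the calendar-time derivative is dC/dt = -dC/dtau.
    The step sizes hS and ht are illustrative choices.
    """
    def C(s, ta):
        return call_price(s, K, ta, r, q, sigma)

    dC_dt = -(C(S, tau + ht) - C(S, tau - ht)) / (2.0 * ht)
    dC_dS = (C(S + hS, tau) - C(S - hS, tau)) / (2.0 * hS)
    d2C_dS2 = (C(S + hS, tau) - 2.0 * C(S, tau) + C(S - hS, tau)) / hS ** 2
    return (dC_dt + (r - q) * S * dC_dS
            + 0.5 * sigma ** 2 * S ** 2 * d2C_dS2 - r * C(S, tau))


if __name__ == "__main__":
    # Hypothetical inputs chosen only for illustration.
    S, K, tau, r, q, sigma = 100.0, 95.0, 0.5, 0.03, 0.01, 0.25
    c = call_price(S, K, tau, r, q, sigma)
    p = put_price(S, K, tau, r, q, sigma)
    parity_gap = c - p - (math.exp(-q * tau) * S - math.exp(-r * tau) * K)
    print("call:", c, "put:", p)
    print("put-call parity gap:", parity_gap)
    print("delta:", call_delta(S, K, tau, r, q, sigma))
    print("PDE residual:", pde_residual(S, K, tau, r, q, sigma))
```

The parity gap and the PDE residual should both be close to zero up to floating-point and discretization error; a residual that is clearly nonzero would indicate an inconsistency between the formula and the PDE coefficients being tested.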
If $W_t$ is Brownian motion on some probability space and $u$ is a $C^{1,2}$ function, then an application of the Itô formula shows that

$$d\left(e^{-rt}u(t,W_t)\right) = e^{-rt}\left(\frac{\partial u}{\partial t} + \frac12\frac{\partial^2 u}{\partial x^2} - ru\right)dt + e^{-rt}\frac{\partial u}{\partial x}\,dW_t \tag{31}$$

If $u$ satisfies equation (30) with boundary condition $u(T,x) = g(x)$ and

$$\mathbb{E}\int_0^T\left(\frac{\partial u}{\partial x}(t,W_t)\right)^2dt < \infty \tag{32}$$

then the process $t \mapsto e^{-rt}u(t,W_t)$ is a martingale so that, with $\mathbb{E}_{t,x}$ denoting the conditional expectation given $W_t = x$,

$$e^{-rt}u(t,x) = \mathbb{E}_{t,x}\left[e^{-rT}u(T,W_T)\right] = \mathbb{E}_{t,x}\left[e^{-rT}g(W_T)\right] \tag{33}$$

Since $W_T \sim N(x, T-t)$, this shows that $u$ is given by

$$u(t,x) = \frac{e^{-r(T-t)}}{\sqrt{2\pi(T-t)}}\int_{-\infty}^{\infty} g(y)\,e^{-\frac{(y-x)^2}{2(T-t)}}\,dy \tag{34}$$

A sufficient condition for equation (32) is

$$\frac{1}{\sqrt{2\pi T}}\int_{-\infty}^{\infty} g^2(y)\,e^{-y^2/2T}\,dy < \infty \tag{35}$$

In our case, the boundary condition is $g(x) = [\phi(T,x) - K]^+ < \phi(T,x)$ and this condition is easily checked. Hence, equation (30) with this boundary condition has unique $C^{1,2}$ solution (34), implying that the inverse function $C(t,S) = u(t,\psi(t,S))$ given by equation (22) is the unique $C^{1,2}$ solution of equation (9) as claimed. □

Hedge Parameters

Bringing in all the parameters, the Black–Scholes formula (6) is a six-parameter function $C(t,S) = C(\tau, S, K, r, q, \sigma)$, where $\tau = T - t$ is the time to maturity. For risk-management purposes, it is important to know the sensitivities of the option value to changes in the parameters. The conventional hedge parameters or “greeks” are given in Table 1. There are slight notational problems in that “vega” is not the name of a Greek letter (here we have used upper-case upsilon, but this is not necessarily a conventional choice) and upper-case rho coincides with Latin P, so this parameter is usually written $\rho$, risking confusion with correlation parameters. The expressions in the right-hand column are readily obtained from the sensitivity parameters (42) and (43) of the “universal” Black Formula introduced below.

Table 1 Black–Scholes risk parameters

Name    Symbol       Definition                        Value
Delta   $\Delta$     $\partial C/\partial S$           $e^{-q\tau}N(d_1)$
Gamma   $\Gamma$     $\partial^2 C/\partial S^2$       $\dfrac{e^{-q\tau}N'(d_1)}{S\sigma\sqrt{\tau}}$
Theta   $\Theta$     $-\partial C/\partial\tau$        $-\dfrac{e^{-q\tau}SN'(d_1)\sigma}{2\sqrt{\tau}} + qe^{-q\tau}SN(d_1) - rKe^{-r\tau}N(d_2)$
Rho     $P$          $\partial C/\partial r$           $K\tau e^{-r\tau}N(d_2)$
Vega    $\Upsilon$   $\partial C/\partial\sigma$       $e^{-q\tau}S\sqrt{\tau}\,N'(d_1)$

Delta is, of course, the Black–Scholes hedge ratio. Gamma measures the convexity of $C$ and is at its maximum when the option is close to being at the money. Since gamma is the rate of change of delta, frequent rebalancing of the hedge portfolio will be required in areas of high gamma. Theta is defined as $-\partial C/\partial\tau$ and is generally negative (as can be seen from the table, it is always negative for a call option on an asset with no dividends). It represents the “time decay” in the option value as the maturity time is reduced, that is, real time advances. As regards rho, it is not immediately obvious, without doing the calculation, what its sign will be: on one hand, increasing $r$ increases the forward price, pushing a call option further into the money, while on the other hand increased $r$ implies heavier discounting, reducing option value. As can be seen from the table, the first effect wins: rho is always positive. Vega is in some ways the most important parameter, since a key risk in managing books of traded options is “vega risk”, and in Black–Scholes this is completely “outside the model”. Bringing it back inside the model is the subject of stochastic volatility.

An extensive discussion of the risk parameters and their uses can be found in Hull [6].

The Black “Forward” Option Formula

The six-parameter representation $C(\tau, S, K, r, q, \sigma)$ is not the best parameterization of Black–Scholes. For the asset $S_t$ with dividend yield $q$, the forward price at time $t$ for delivery at time $T$ is $F(t,T) = S_te^{(r-q)(T-t)}$ (this is a model-free result, not related to the Black–Scholes model). We can trivially re-express the price formula (6) as

$$C(t,S_t) = B(t,T)\left(F(t,T)N(d_1) - KN(d_2)\right) \tag{36}$$

with

$$d_1 = \frac{\log(F(t,T)/K) + \frac12\sigma^2(T-t)}{\sigma\sqrt{T-t}}, \qquad d_2 = d_1 - \sigma\sqrt{T-t} \tag{37}$$

where $B(t,T) = e^{-r(T-t)}$ is the zero-coupon bond value or “discount factor” from $T$ to $t$. There is, however, far more to this than just a change of notation. First, the continuously compounding rate $r$ is not market data. The market data at time $t$ is the set of discount factors $B(t,t')$ for $t' > t$. We see from equation (36) that “$r$” plays two distinct roles in Black–Scholes: it appears in the computation of the forward price $F$ and the discount factor $B$. But both of these are more fundamental than $r$ itself and are, in fact, market data which, as equation (36) shows, can be used directly. A further advantage is that the exact mechanism of dividend payment is not important, as long as there is an unambiguously defined forward price.

Formula (36) is known as the Black formula and is the most useful version of Black–Scholes, being widely applied in connection with FX (foreign exchange) and interest-rate options as well as dividend-paying equities. Fundamentally, it relates to a price model in which the price is expressed in the risk-neutral measure as $S_t = F(0,t)M_t$ where $M_t$ is the exponential martingale

$$M_t = \exp\left(\sigma\check{w}_t - \tfrac12\sigma^2t\right) \tag{38}$$

which is equivalent to equation (17). This model accords with the general fact that, in a world of deterministic interest rates, the forward price is the expected price in the risk-neutral measure, that is, the ratio $S_t/F(0,t)$ is a positive martingale with expectation 1. The exponential martingale (38) is the simplest continuous-path process with these properties.

A Universal Black Formula

The parameterization of Black–Scholes can be further compressed as follows. First, note that $\sigma$ and $\tau = (T-t)$ do not appear separately, but only in the combination $a = \sigma\sqrt{T-t}$, where $a^2$ is sometimes known as the operational time. Next, define the “moneyness” $m$ as $m(t,T) = K/F(t,T)$, and define

$$d(a,m) = \frac{a}{2} - \frac{\log m}{a} \tag{39}$$

(so that $d_1 = d(\sigma\sqrt{T-t}, K/F(t,T))$). Then the Black formula (36) becomes

$$C = BF\,f(a,m) \tag{40}$$

where

$$f(a,m) = N(d(a,m)) - mN(d(a,m) - a) \tag{41}$$

Now $BF$ is the price of a zero-strike call, or equivalently the price to be paid at time $t$ for delivery of the asset at time $T$. Formula (40) says that the price of the $K$-strike call is the (model-free) price of the zero-strike call modified by a factor $f$ that depends only on the moneyness and operational time. We call $f$ the universal Black–Scholes function, and a graph of it is shown in Figure 1. With $N' = dN/dx$ and $d = d(a,m)$ we find that $mN'(d-a) = N'(d)$ and hence obtain the following very simple expressions for the first-order derivatives:

$$\frac{\partial f}{\partial a}(a,m) = N'(d) \tag{42}$$

$$\frac{\partial f}{\partial m}(a,m) = -N(d-a) \tag{43}$$

In particular, $\partial f/\partial a > 0$ and $\partial f/\partial m < 0$ for all $a, m$.
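The universal function and its derivatives are easy to check numerically. The short Python sketch below is an illustration under stated assumptions rather than a reference implementation: it codes $d(a,m)$ and $f(a,m)$ from equations (39) and (41), the derivatives (42) and (43), and the price in the form (40). The function names and the sample values of $a$, $m$ and the market inputs are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    """Standard normal density N'(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def d(a, m):
    """d(a, m) = a/2 - log(m)/a, equation (39); requires a > 0 and m > 0."""
    return 0.5 * a - math.log(m) / a


def f(a, m):
    """Universal Black-Scholes function, equation (41)."""
    return norm_cdf(d(a, m)) - m * norm_cdf(d(a, m) - a)


def df_da(a, m):
    """Derivative of f in operational volatility a, equation (42)."""
    return norm_pdf(d(a, m))


def df_dm(a, m):
    """Derivative of f in moneyness m, equation (43)."""
    return -norm_cdf(d(a, m) - a)


def black_call(B, F, K, sigma, tau):
    """Call price in the form C = B * F * f(a, m), equation (40)."""
    a = sigma * math.sqrt(tau)
    return B * F * f(a, K / F)


if __name__ == "__main__":
    a, m, h = 0.3, 1.1, 1e-6  # hypothetical sample point and bump size
    print("f:", f(a, m))
    print("df/da analytic vs numeric:",
          df_da(a, m), (f(a + h, m) - f(a - h, m)) / (2 * h))
    print("df/dm analytic vs numeric:",
          df_dm(a, m), (f(a, m + h) - f(a, m - h)) / (2 * h))
    # Hypothetical market inputs: discount factor B and forward price F.
    print("call:", black_call(B=0.97, F=102.0, K=100.0, sigma=0.2, tau=1.0))
```

Because only $B$, $F$, $K$ and $a$ enter, the same function prices equity, FX and interest-rate options once the appropriate discount factor and forward price are supplied.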
[Figure 1 The universal Black–Scholes function: the factor $f$ plotted against moneyness $m$ and $a$]

This minimal parameterization of Black–Scholes is used in studies of stochastic volatility; see, for example, Gatheral [5].

Implied Volatility and Market Trading

So far, our discussion has been entirely within the Black–Scholes model. What happens if we attempt to use Black–Scholes delta hedging in real market trading? This question has been considered by several authors, including El Karoui et al. [3] and Fouque et al. [4], though neither of these discusses the effect of jumps in the price process.

In the universal price formula (40), the parameters $B, F, m$ are market data, so we can regard the formula as a mapping $a \mapsto p = BFf(a,m)$ from $a$ to price $p \in [B[F-K]^+, BF)$. In a traded options market, $p$ is market data (but must lie in the stated interval, else there is a static arbitrage opportunity). In view of equation (42), $f(a,m)$ is strictly increasing in $a$ and hence there is a unique value $a = \hat{a}(p)$ such that $p = BFf(\hat{a}(p), m)$. The implied volatility is $\hat\sigma(p) = \hat{a}(p)/\sqrt{T-t}$. If the underlying price process $S_t$ actually were geometric Brownian motion (1), then $\hat\sigma$ would be the same, and equal to the volatility $\sigma$, for call options of all strikes and maturities. Of course, this is never the case in practice; see [5] for a discussion. Here, we restrict ourselves to examining what happens if we naïvely apply the Black–Scholes delta-hedge when in reality the underlying process is not geometric Brownian motion, taking $q = 0$ for simplicity. Specifically, we assume that the “true” price model, under measure $\mathbb{P}$, is

$$S_t = S_0 + \int_0^t \eta_uS_{u-}\,du + \int_0^t \kappa_uS_{u-}\,dW_u + \int_{[0,t]\times\mathbb{R}} S_{u-}v_u(z)\,\mu(du,dz) \tag{44}$$

where $\mu$ is a finite-activity Poisson random measure, so that there is a finite measure $\nu$ on $\mathbb{R}$ such that $\mu([0,t]\times A) - \nu(A)t \equiv (\mu - \pi)([0,t]\times A)$ is a martingale for each $A \in \mathcal{B}(\mathbb{R})$. The coefficients $\eta$, $\kappa$ and $v$ are predictable processes. Assume that $\eta$, $\kappa$ and $v$ are such that the solution to the SDE (44) is well defined and, moreover, that $v_t(z) > -1$ so $S_t > 0$ almost surely. This is a very general model including path-dependent coefficients, stochastic volatility, and jumps. Readers unfamiliar with jump-diffusion models can set $\mu = \nu = \pi = 0$ below, and refer to the last paragraph of this section for comments on the effect of jumps.

Consider the scenario of selling at time 0 a European call option at implied volatility $\hat\sigma$, that is, for the price $p = C(T, S_0, K, r, \hat\sigma)$ and then following a Black–Scholes delta-hedging trading strategy based on constant volatility $\hat\sigma$ until the option expires at time $T$. As usual, we shall denote $C(t,s) = C(T-t, s, K, r, \hat\sigma)$, so that the hedge portfolio, with value process $X_t$, is constructed by holding $\alpha_t := \partial_SC(t,S_{t-})$ units of the risky asset $S$, and the remainder $\beta_t := \frac{1}{B_t}(X_{t-} - \alpha_tS_{t-})$ units in the riskless asset $B$ (a unit notional zero-coupon bond). This portfolio, initially funded by the option sale (so $X_0 = p$), defines a self-financing trading strategy. Hence, the portfolio value process $X$ satisfies the SDE

$$\begin{aligned} X_t = p &+ \int_0^t \partial_SC(u,S_{u-})\eta_uS_{u-}\,du + \int_0^t \partial_SC(u,S_{u-})\kappa_uS_{u-}\,dW_u \\ &+ \int_{[0,t]\times\mathbb{R}} \partial_SC(u,S_{u-})S_{u-}v_u(z)\,\mu(du,dz) + \int_0^t \left(X_u - \partial_SC(u,S_{u-})S_u\right)r\,du \end{aligned} \tag{45}$$
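The hedging experiment just described can be mimicked in a small simulation. The Python sketch below is an illustration under simplifying assumptions that are not part of the general model: there are no jumps, the true volatility $\kappa$ is a constant, the drift $\eta$ is a constant, the hedge is rebalanced at a finite number of dates rather than continuously, and all parameter values and function names are hypothetical. It sells the call at the hedging volatility $\hat\sigma$, delta-hedges at that volatility, and reports the average terminal difference between the hedge portfolio and the option payoff.

```python
import math
import random


def norm_cdf(x):
    """Standard normal cumulative distribution function N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(S, K, tau, r, sigma):
    """Black-Scholes call value with q = 0 and time to maturity tau > 0."""
    a = sigma * math.sqrt(tau)
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / a
    return S * norm_cdf(d1) - math.exp(-r * tau) * K * norm_cdf(d1 - a)


def bs_delta(S, K, tau, r, sigma):
    """Black-Scholes delta with q = 0 and time to maturity tau > 0."""
    a = sigma * math.sqrt(tau)
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / a
    return norm_cdf(d1)


def hedging_error(true_vol, hedge_vol, rng, S0=100.0, K=100.0, T=1.0,
                  r=0.02, eta=0.05, n_steps=250):
    """Simulate one path and return X_T - [S_T - K]^+ for the option writer.

    The true price follows geometric Brownian motion with drift eta and
    volatility true_vol (an assumed special case); the hedge uses hedge_vol.
    """
    dt = T / n_steps
    S = S0
    X = bs_call(S0, K, T, r, hedge_vol)  # portfolio funded by the option sale
    for i in range(n_steps):
        tau = T - i * dt                      # time to maturity, always > 0 here
        alpha = bs_delta(S, K, tau, r, hedge_vol)
        cash = X - alpha * S                  # residual wealth in the riskless asset
        z = rng.gauss(0.0, 1.0)
        S_new = S * math.exp((eta - 0.5 * true_vol ** 2) * dt
                             + true_vol * math.sqrt(dt) * z)
        X = alpha * S_new + cash * math.exp(r * dt)  # self-financing update
        S = S_new
    return X - max(S - K, 0.0)


def mean_hedging_error(true_vol, hedge_vol, n_paths=200, seed=0):
    """Average terminal hedging error over n_paths simulated paths."""
    rng = random.Random(seed)
    total = sum(hedging_error(true_vol, hedge_vol, rng) for _ in range(n_paths))
    return total / n_paths


if __name__ == "__main__":
    print("hedge vol above true vol:", mean_hedging_error(0.20, 0.30))
    print("hedge vol below true vol:", mean_hedging_error(0.30, 0.20))
    print("hedge vol equal to true vol:", mean_hedging_error(0.20, 0.20))
```

In this restricted setting one would expect the writer to come out ahead on average when the hedging volatility exceeds the true volatility and behind when it is lower, with an error near zero (up to discretization noise) when the two coincide.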
Now define $Y_t = C(t,S_t)$, so that, in particular, $Y_0 = p$. Applying the Itô formula (Lemma 4.4.6 of [1]) gives

$$\begin{aligned} Y_t = p &+ \int_0^t \partial_tC(u,S_{u-})\,du + \int_0^t \partial_SC(u,S_{u-})\eta_uS_{u-}\,du + \int_0^t \partial_SC(u,S_{u-})\kappa_uS_{u-}\,dW_u \\ &+ \frac12\int_0^t \partial^2_{SS}C(u,S_{u-})\kappa_u^2S_{u-}^2\,du + \int_{[0,t]\times\mathbb{R}} \left(C(u,S_{u-}(1+v_u(z))) - C(u,S_{u-})\right)\mu(du,dz) \end{aligned} \tag{46}$$

Thus the ‘hedging error’ process defined by $Z_t := X_t - Y_t$ satisfies the SDE

$$\begin{aligned} Z_t &= \int_0^t rX_u\,du - \int_0^t \left(rS_{u-}\partial_SC(u,S_{u-}) + \partial_tC(u,S_{u-}) + \frac12\kappa_u^2S_{u-}^2\partial^2_{SS}C(u,S_{u-})\right)du \\ &\quad - \int_{[0,t]\times\mathbb{R}} \left(C(u,S_{u-}(1+v_u(z))) - C(u,S_{u-}) - \partial_SC(u,S_{u-})S_{u-}v_u(z)\right)\mu(du,dz) \\ &= \int_0^t rZ_u\,du + \frac12\int_0^t \Gamma(u,S_{u-})S_{u-}^2(\hat\sigma^2 - \kappa_u^2)\,du \\ &\quad - \int_{[0,t]\times\mathbb{R}} \left(C(u,S_{u-}(1+v_u(z))) - C(u,S_{u-}) - \partial_SC(u,S_{u-})S_{u-}v_u(z)\right)\mu(du,dz) \end{aligned} \tag{47}$$

where $\Gamma(t,S_t) = \partial^2_{SS}C(t,S_t)$, and the last equality follows from the Black–Scholes PDE. Therefore, the final difference between the hedging strategy and the required option payout is given by

$$\begin{aligned} Z_T &= X_T - [S_T-K]^+ \\ &= \frac12\int_0^T e^{r(T-t)}S_{t-}^2\Gamma(t,S_{t-})(\hat\sigma^2 - \kappa_t^2)\,dt \\ &\quad - \int_{[0,T]\times\mathbb{R}}\int_0^1\int_0^{\xi} e^{r(T-t)}\Gamma(t,S_{t-}(1+\zeta v_t(z)))\,d\zeta\,d\xi\;v_t^2(z)S_{t-}^2\,\pi(dt,dz) - M_T \end{aligned} \tag{48}$$

where $M_T$ is the terminal value of the martingale

$$M_t = \int_{[0,t]\times\mathbb{R}}\int_0^1\int_0^{\xi} e^{r(T-u)}\Gamma(u,S_{u-}(1+\zeta v_u(z)))\,d\zeta\,d\xi\;v_u^2(z)S_{u-}^2\,(\mu-\pi)(du,dz) \tag{49}$$

Equation (48) is a key formula, as it shows that successful hedging is quite possible even under significant model error. Without some “robustness” property of this kind, it is hard to imagine that the derivatives industry could exist at all, since hedging under realistic conditions would be impossible.

Consider first the case $\mu \equiv 0$, where $S_t$ has continuous sample paths and the last two terms in equation (48) vanish. Then, successful hedging depends entirely on the relationship between the implied volatility $\hat\sigma$ and the true “local volatility” $\kappa_t$. Note from Table 1 that $\Gamma_t > 0$. If we, as option writers, are lucky and $\hat\sigma^2 \ge \kappa_t^2$ a.s. for all $t$, then the hedging strategy makes a profit with probability 1 even though the true price model is substantially different from the assumed model as in equation (1). On the other hand, if we underestimate the volatility, we will consistently make a loss. The magnitude of the profit or loss depends on the option convexity $\Gamma$. If $\Gamma$ is small, then hedging error is small even if the volatility has been grossly misestimated.

For the option writer, jumps in either direction are unambiguously bad news. Since $C$ is convex, $\Delta C > (\partial C/\partial S)\Delta S$, so the last term in equation (47) is monotone decreasing: the hedge profit takes a hit every time there is a jump, either upward or downward, in the underlying price. However, there is some recourse: in equation (48), $M_T$ has expectation 0 while the penultimate term is negative. By increasing $\hat\sigma$ we increase $\mathbb{E}[Z_T]$, so we could arrive at a situation where $\mathbb{E}[Z_T] > 0$, although in this case there is no possibility of a profit with probability 1 because of the martingale term. All of this reinforces the trader’s intuition that one can offset additional hedge costs by charging more upfront (i.e., increasing $\hat\sigma$) and hedging at the higher level of implied volatility.

End Notes

a. A two-parameter function is $C^{1,2}$ if it is once (twice) continuously differentiable in the first (second) argument.

References

[1] Applebaum, D. (2004). Lévy Processes and Stochastic Calculus, Cambridge University Press.
[2] Black, F. & Scholes, M. (1973). The pricing of options and corporate liabilities, Journal of Political Economy 81, 637–654.
[3] El Karoui, N., Jeanblanc-Picqué, M. & Shreve, S.E. (1998). Robustness of the Black and Scholes formula, Mathematical Finance 8, 93–126.
[4] Fouque, J.-P., Papanicolaou, G. & Sircar, K.R. (2000). Derivatives in Financial Markets with Stochastic Volatility, Cambridge University Press.
[5] Gatheral, J. (2006). The Volatility Surface, Wiley.
[6] Hull, J.C. (2005). Options, Futures and Other Derivatives, 6th Edition, Prentice Hall.

MARK H.A. DAVIS
Exchange Options

Definition and Examples

A European exchange option is a contract that gives the buyer the right to exchange two (possibly dividend-paying) assets $A$ and $B$ at a fixed expiration time $T$, say, to receive $A$ and deliver (or pay) $B$; thus, the option payoff is

$$(A_T - B_T)^+ := \max(A_T - B_T, 0) \tag{1}$$

(American and Bermudan exchange options are complicated by early optimal exercise and not discussed here.) An ordinary (European) call or put on an asset struck at $K$ can be viewed, as in [9], as an option to exchange the asset with the $T$-maturity zero-coupon bond of principal $K$. More generally, a call or put on an $s$-maturity forward contract ($s \ge T$) on a zero-dividend asset is equivalent to an option to exchange the asset at time $T$ with an $s$-maturity zero-coupon bond. Options to exchange two stocks or commodities provide good hypothetical examples but are not prevalent in the market place.

Exchange options are related to spread options with time-$T$ payoffs of the form $(X - Y)^+$, given two prescribed time-$T$ observables $X$ and $Y$. A common structure is a CMS spread option, with $X$ and $Y$, say, the 20-year and the 2-year spot swap rates at time $T$. A spread option can be viewed as an exchange option when there exist (or can be replicated) two zero-dividend assets $A$ and $B$ such that $A_T = X$ and $B_T = Y$. In the CMS case, $A$ and $B$ can be taken as the coupon cash flows of two CMS bonds or swaps. In practice, exchange options on dividend-paying assets are reduced to the zero-dividend case in a similar way.

Interest-rate swaptions, including caplets and floorlets as one-period special cases, can be viewed both as ordinary call or put options struck at par on coupon bonds and more directly as options to exchange the fixed and floating cash flow legs of a swap. The latter is the standard as it imposes the classical assumption of a lognormal ratio $A_T/B_T$ on the forward swap rate (a swap-curve concept) rather than on the forward coupon bond price.

An exchange option is related to its reverse by parity: $(Y - X)^+ = (X - Y)^+ + Y - X$. (Hence, an American option to exchange two fixed zero-dividend assets is not exercised early.)

Pricing and Hedging Approaches

The exchange option is a special case of a path-independent contingent claim with payoff being a homogeneous function of the underlying asset prices at expiration. It is governed by the same general theory (see Option Pricing: General Principles). One makes sure that the underlying assets are arbitrage free, which implies that there are no free lunches in a strong sense. If the payoff can be attained by a sufficiently regular self-financing trading strategy (SFTS) (e.g., a bounded number of shares or “deltas”), then the law of one price holds and the option price at each time is defined as the value of the self-financing portfolio. Otherwise, arbitrage-free pricing is not unique. We do not discuss this case, but only mention that one approach then chooses a linear pricing kernel (e.g., the minimal measure) among the many then available and another is nonlinear based on expected utility maximization.

Payoff replication by an SFTS is a question of predictable representation. As the payoff in this case is a path-independent function of the underliers, it seems natural that the option price as well as deltas be functions of time and the underliers at that time. This has been the traditional Markovian approach, beginning with Black and Scholes [1] and immediate extension by Merton [9] (see Black–Scholes Formula). Their simple choice of a geometric Brownian motion for the underlying asset in [1] and more generally of a deterministic-volatility forward price process in [9] meant that the underlying SDE and the associated PDE had constant coefficients (in log-state). Itô’s formula was applied to construct a riskless hedge, with the deltas (hedge ratios) simply given by partial derivatives of the unique solution to the PDE.

Black and Scholes constructed an SFTS for a call option struck at $K$ by dynamically rebalancing long positions on the underlying asset $A$ financed by shorting the riskless money market asset $B^* = (e^{rt})$, post an initial investment equal to the option price. Merton’s extension to stochastic interest rate $r$ treated the call as an option $C$ to exchange the asset $A$ with the $T$-maturity zero-coupon bond $B$ of principal $K$. The Black–Scholes model corresponded to a
2 Exchange Options
deterministic bond price $B_t = e^{-r(T-t)}K$, but now, in general, $B$ had infinite variation.
The former’s simplicity was nonetheless recaptured by exploiting the homogeneous symmetry of the
option payoff to reduce dimensionality by 1: in effect, a projective transformation that hedged
the forward option contract $F := C/B$ with trades in the forward asset $X := A/B$. The relevant
volatility was accordingly the forward price volatility. An SFTS in the two assets and Itô’s
formula led to a PDE for the homogeneous option price function $C(t, A, B)$ and an equivalent
PDE for the forward option price function $F(t, X)$.

Margrabe [8] extended the theory in [9] to an option to exchange any two correlated assets
assuming constant volatilities (see Margrabe Formula). He observed akin to [9] that the
self-financing equation with $\partial C/\partial A$ and $\partial C/\partial B$ as deltas is,
by Itô’s formula, equivalent to $C(t, A, B)$ satisfying a PDE with no first-order terms in
$A$, $B$. Choosing $C$ as the homogenized Black–Scholes function, it followed by Euler’s formula
for homogeneous functions that $\partial C/\partial A$ and $\partial C/\partial B$ in fact
formed an SFTS. The result demonstrated that (in this case) the exchange option is replicated by
dynamically going long in $A$ and short in $B$, with no trades in any other asset. (This fails
in general, e.g., a bond exchange option in a $k \ge 3$ factor non-Gaussian short-rate model.)
“Taking asset two as numéraire,” Margrabe [8] also presented (acknowledging Stephen Ross) a key
financial invariance argument as a heuristic alternative to the PDE algebraic proof of [9],
reducing to a call on $A/B$ struck at 1 in the Black–Scholes model with zero interest rate.

Martingale theory leads to a conceptual as well as computationally practical representation of
solutions to the PDEs that describe option prices as a conditional expectation of terminal
payoff. Harrison and Kreps [5] and Harrison and Pliska [6] developed, in related papers, an
equivalent martingale measure framework that not only made this fruitful representation of the
option price available but also laid a more general and probabilistic formulation of the notion
of a dynamic hedge, or its mirror image, a replicating SFTS (see Risk-neutral Pricing). Their
arbitrage-free semimartingale approach does permit path dependency, yet accommodates Markovian
SDE/PDE models even better. They took the money market asset $B^*$ as a tradable entering any
hedge, giving it a general stochastic form $B^*_t = e^{\int_0^t r_s\,ds}$ for discounting
payoffs before expectation. In concert with Black–Scholes but in contrast to Merton and
Margrabe, the finite variation asset $B^*$ was their exclusive choice of numéraire.

With the advent of the forward measure sometime later (see Forward and Swap Measures), it was
evident that Merton’s choice of an infinite variation zero-coupon bond $B$ as the financing
hedge instrument fitted equivalent martingale measure theory perfectly well, and it led to
quicker derivations of concrete pricing formulae than $B^*$, as discounting is conveniently
performed outside the expectation [4, 7]. Another useful numéraire was the one by Neuberger [10]
to price interest-rate swaptions. Viewed as an option to exchange the fixed and floating swap
cash flows, the assets’ ratio $A/B$ represents the forward swap rate here. The assumption in
[10] that the ratio has deterministic volatility yielded a model that has since served as
industry standard to quote swaption-implied volatilities (see Swap Market Models). Here, it is
noteworthy that the ratio $A/B$ has deterministic volatility but $A$ and $B$ themselves
decidedly do not. In time, El-Karoui et al. [4] showed that one can basically change numéraire
to any asset $B$ and associate with it an equivalent measure under which $A/B$ is a martingale
for every other asset $A$ (see Change of Numeraire).

Today, option pricing and hedging theory has advanced further and in many directions. Especially
relevant to our discussion of exchange options is the principle of numéraire invariance and
arbitrage-free modeling. For in-depth studies of these and related topics, we refer the reader
to [3] and [2], among other excellent books. Our approach is to concentrate on the modeling in
“projective coordinate” $X := A/B$, and impose for the most part conditions that are invariant
under the transformation $X \mapsto 1/X$.

The Deterministic-volatility and Exponential-Poisson Models

The option to exchange two assets with a deterministic volatility $\sigma(t)$ of the asset price
ratio $X = A/B$ is celebrated as the simplest nontrivial example in option pricing theory. Its
classical Black–Scholes/Merton option price function and explicit representation of the “deltas”
(“hedge ratios”) illustrate the principles that underlie options in many assets with arbitrary
homogeneous payoffs and more

general dynamics. There is another concrete albeit less known example with simple jumps in $X$
involving the Poisson rather than the normal distribution. The pattern is similar, with the main
difference being that the deltas are the partial differences rather than the partial derivatives
of the option price function.

We fix, throughout, a stochastic basis $(\Omega, (\mathcal{F}_t), \mathcal{F}, \mathbb{P})$ with
time horizon $t \in [0, T]$, $T > 0$. In this section, we fix two zero-dividend assets with
price processes $A = (A_t)$ and $B = (B_t)$.

The Exchange Option Price Process

When $A$ and $B$ are semimartingales, we call a pair $(\delta^A, \delta^B)$ of (locally) bounded
predictable processes a (locally) bounded SFTS (see, more generally, the section Self-financing
Trading Strategies) if $C = C_0 + \int \delta^A\,dA + \int \delta^B\,dB$, where

\[ C = \delta^A A + \delta^B B \tag{2} \]

Clearly, $C$ is then a semimartingale, $\Delta C = \delta^A \Delta A + \delta^B \Delta B$, and
hence $C_- = \delta^A A_- + \delta^B B_-$. The differential form of the self-financing equation
is often handy:

\[ dC = \delta^A\,dA + \delta^B\,dB \tag{3} \]

SFTSs form a linear space. If there exists a unique bounded SFTS $(\delta^A, \delta^B)$ such that

\[ C_T = (A_T - B_T)^+ \tag{4} \]

then it is justified to call $C$ the exchange option price process and $\delta^A$ and $\delta^B$
the deltas.

Assume now that the semimartingales $A$ and $B$ are positive and have positive left limits.

The numéraire invariance principle (see the section Numéraire Invariance and more
comprehensively the section The Invariance Principle) states that if $(\delta^A, \delta^B)$ is a
locally bounded SFTS, then $C = \delta^A A + \delta^B B$ satisfies
$d\left(\frac{C}{B}\right) = \delta^A\,d\left(\frac{A}{B}\right)$ (similarly by symmetry with
$A$ as numéraire). This is useful for uniqueness. Numéraire invariance also states the converse:
if $C$ is a semimartingale and $\delta^A$ a locally bounded predictable process such that
$d\left(\frac{C}{B}\right) = \delta^A\,d\left(\frac{A}{B}\right)$, then $(\delta^A, \delta^B)$
is an SFTS and equations (2) and (3) hold, where
$\delta^B = \frac{C_-}{B_-} - \delta^A \frac{A_-}{B_-}$.

This reduces existence to finding an $F_0$ and $\delta^A$ such that

\[ \left(\frac{A_T}{B_T} - 1\right)^+ = F_0 + \int_0^T \delta^A_t\,d\left(\frac{A_t}{B_t}\right) \tag{5} \]

The exchange option price process is then the semimartingale
$C = B\left(F_0 + \int \delta^A\,d\left(\frac{A}{B}\right)\right)$.

Numéraire invariance in effect reduces general option pricing and hedging to a market where one
of the asset price processes equals 1 identically. The remaining task is to find the above
“projective” predictable representation of the ratio payoff against the ratio process.

Deterministic-volatility Exchange Option Model

Let $\sigma(t) > 0$ be a continuous positive function. Define the Black–Scholes/Merton
projective option price function

\[ f(t, x) := x\,\delta_A(t, x) + \delta_B(t, x) \tag{6} \]

for $t \le T$, $x > 0$, where $\delta_A(T, x) := 1_{x>1}$, $\delta_B(T, x) := -1_{x>1}$, and for
$t < T$,

\[ \delta_A(t, x) := N\left(\frac{\log x}{\sqrt{\nu_t}} + \frac{\sqrt{\nu_t}}{2}\right), \qquad
   \delta_B(t, x) := -N\left(\frac{\log x}{\sqrt{\nu_t}} - \frac{\sqrt{\nu_t}}{2}\right) \tag{7} \]

where $\nu_t := \int_t^T \sigma^2(s)\,ds$ and $N(\cdot)$ is the normal distribution function.
The function $f(t, x)$ is continuous, and on $t < T$ is $C^1$ in $t$ and analytic in $x$. In
addition, $-1 \le \delta_B \le 0 \le \delta_A \le 1$, and

\[ f(T, x) = (x - 1)^+, \qquad \frac{\partial f}{\partial x}(t, x) = \delta_A(t, x) \tag{8} \]

As is well known and seen in the sections Deterministic-volatility Model Uniqueness and
Projective Continuous SDE SFTS, the function $f(t, x)$ is the unique $C^{1,2}$ (on $t < T$)
solution with bounded partial derivative $\frac{\partial f}{\partial x}(t, x)$ subject to
$f(T, x) = (x - 1)^+$ of the PDE

\[ \frac{\partial f}{\partial t}(t, x) + \frac{1}{2}\sigma^2(t)\,x^2\,\frac{\partial^2 f}{\partial x^2}(t, x) = 0 \tag{9} \]
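As a rough numerical illustration of equations (6) to (8), the following Python sketch evaluates the projective price function and its two deltas from the remaining variance $\nu_t$. The function names (`norm_cdf`, `deltas`, `price`, `remaining_variance`) and the midpoint quadrature are assumptions made for this sketch and are not part of the original text.

```python
import math


def norm_cdf(z):
    """Standard normal distribution function N(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def deltas(x, nu):
    """Return (delta_A, delta_B) of equation (7) for a ratio x > 0.

    nu is the remaining variance nu_t; nu <= 0 is treated as t = T,
    where the deltas are the indicator functions below equation (6).
    """
    if nu <= 0.0:
        ind = 1.0 if x > 1.0 else 0.0
        return ind, -ind
    s = math.sqrt(nu)
    d = math.log(x) / s
    return norm_cdf(d + s / 2.0), -norm_cdf(d - s / 2.0)


def price(x, nu):
    """Projective option price f = x * delta_A + delta_B, equation (6)."""
    d_a, d_b = deltas(x, nu)
    return x * d_a + d_b


def remaining_variance(sigma, t, T, steps=1000):
    """Midpoint-rule approximation of nu_t = int_t^T sigma(s)^2 ds.

    sigma is any callable s -> sigma(s); the quadrature is an
    illustrative choice, not something prescribed by the text.
    """
    h = (T - t) / steps
    return sum(sigma(t + (k + 0.5) * h) ** 2 for k in range(steps)) * h
```

Under these assumptions, the exchange option price would be obtained as `B * price(A / B, nu)` and the hedge would hold `deltas(A / B, nu)` units of the two assets; a central finite difference of `price` in `x` can be compared with `delta_A` to check the second identity in equation (8).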
   
Assume now $A = BX$ for some positive continuous semimartingale $X > 0$ satisfying

\[ d[\log X]_t = \sigma^2(t)\,dt, \qquad (A = BX) \tag{10} \]

Under this assumption, one traditionally defines the exchange option price process $C$ by

\[ C := BF, \qquad F = (F_t), \qquad F_t := f(t, X_t) \tag{11} \]

Clearly, $C_T = (A_T - B_T)^+$. The definition is justified using the continuous semimartingales

\[ \delta^A_t := \delta_A(t, X_t) = \frac{\partial f}{\partial x}(t, X_t), \qquad
   \delta^B_t := \delta_B(t, X_t) = F_t - \delta^A_t X_t \tag{12} \]

Clearly, $C = \delta^A A + \delta^B B$, and the deltas are bounded: $0 \le \delta^A \le 1$ and
$-1 \le \delta^B \le 0$. Since $f(t, x)$ satisfies the PDE (9) (as directly verified) and
$\frac{\partial f}{\partial x}(t, X_t) = \delta^A_t$, by Itô’s formula, the continuous
semimartingale $F := (f(t, X_t))$ satisfies the predictable representation

\[ dF = \delta^A\,dX \tag{13} \]

If, at this stage, we assume that $B$ is a semimartingale, then $A$ and $C$ are semimartingales
too, and by the invariance principle discussed next, $dC = \delta^A\,dA + \delta^B\,dB$ and
$(\delta^A, \delta^B)$ is a bounded SFTS.

Numéraire Invariance

Let $X$ and $F$ be two semimartingales and $\delta^A$ be a locally bounded predictable process
such that $dF = \delta^A\,dX$. Set $\delta^B = F - \delta^A X$. Clearly
$\delta^B = F_- - \delta^A X_-$ since $\Delta F = \delta^A \Delta X$. Let $B$ be any
semimartingale. Set $A = BX$, $C = BF$. Clearly $C = \delta^A A + \delta^B B$. We claim
$dC = \delta^A\,dA + \delta^B\,dB$, so $(\delta^A, \delta^B)$ is an SFTS.

Indeed, this follows by applying Itô’s product rule to $BF$, then substituting
$dF = \delta^A\,dX$ and $F_- = \delta^B + \delta^A X_-$, followed by Itô’s product rule on $BX$:

\[ \begin{aligned}
dC = d(BF) &= B_-\,dF + F_-\,dB + d[B, F] \\
&= B_-\,\delta^A\,dX + (\delta^B + \delta^A X_-)\,dB + \delta^A\,d[B, X] \\
&= \delta^A\,d(BX) + \delta^B\,dB = \delta^A\,dA + \delta^B\,dB
\end{aligned} \tag{14} \]

Conversely, if $A$ and $B$ are semimartingales with $B, B_- > 0$ and $(\delta^A, \delta^B)$ is
an SFTS, then $d\left(\frac{C}{B}\right) = \delta^A\,d\left(\frac{A}{B}\right)$, where
$C = \delta^A A + \delta^B B$. (See the section The Invariance Principle for a more lucid
treatment.)

Exponential-Poisson Exchange Option Model

Assume that the two zero-dividend asset price processes $A$ and $B$ satisfy $A = BX$, where $X$
is a semimartingale satisfying

\[ X_t = X_0\,e^{\beta P_t - (e^\beta - 1)\lambda t} \tag{15} \]

for some constants $\beta \ne 0$, $\lambda > 0$ and semimartingale $P$ such that $[P] = P$ and
$P_0 = 0$ (Thus, $P_t = \sum_{s \le t} 1_{\Delta P_s \ne 0}$). Define the projective option
price function $f(t, x)$, $x > 0$ by

\[ f(t, x) := \sum_{n=0}^{\infty} \left(x\,e^{\beta n - (e^\beta - 1)\lambda (T-t)} - 1\right)^+
   \frac{\lambda^n}{n!}(T-t)^n e^{-\lambda (T-t)} \tag{16} \]

and exchange option price process by

\[ C := BF, \qquad F = (F_t), \qquad F_t := f(t, X_t) \tag{17} \]

Clearly $f(T, x) = (x - 1)^+$ and $C_T = (A_T - B_T)^+$. One has the predictable representation

\[ dF = \delta^A\,dX \tag{18} \]

as shown shortly, where

\[ \delta^A_t := \delta_A(t, X_{t-}), \qquad
   \delta_A(t, x) := \frac{f(t, e^\beta x) - f(t, x)}{(e^\beta - 1)x} \tag{19} \]

Thus by numéraire invariance, $(\delta^A, \delta^B)$ is an SFTS if $A$ and $B$ are
semimartingales, where

\[ \delta^B := F - \delta^A X = F_- - \delta^A X_- \tag{20} \]

Moreover, it is bounded. Indeed, since $|(e^\beta y - 1)^+ - (y - 1)^+| \le |e^\beta - 1|\,y$
for any $y > 0$,

\[ 0 \le \delta_A(t, x) \le \sum_{n=0}^{\infty} e^{\beta n - (e^\beta - 1)\lambda (T-t)}
   \frac{\lambda^n}{n!}(T-t)^n e^{-\lambda (T-t)} = 1 \tag{21} \]
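The series (16) and the partial-difference delta (19) lend themselves to a direct numerical sketch. The Python fragment below truncates the Poisson sum after a fixed number of terms; the function names, the argument `tau` standing for $T - t$, and the default truncation `n_terms=100` are assumptions of this sketch rather than anything specified in the text, and the truncation is only adequate for moderate values of $\beta$ and $\lambda (T-t)$.

```python
import math


def exp_poisson_price(x, tau, beta, lam, n_terms=100):
    """Truncated version of the series (16); tau = T - t is the remaining time."""
    if tau <= 0.0:
        return max(x - 1.0, 0.0)          # terminal payoff f(T, x) = (x - 1)^+
    drift = math.exp(-(math.exp(beta) - 1.0) * lam * tau)
    mean = lam * tau
    weight = math.exp(-mean)              # Poisson probability of n = 0 jumps
    total = 0.0
    for n in range(n_terms):
        payoff = max(x * math.exp(beta * n) * drift - 1.0, 0.0)
        total += payoff * weight
        weight *= mean / (n + 1)          # Poisson probability of n + 1 jumps
    return total


def exp_poisson_delta_a(x, tau, beta, lam, n_terms=100):
    """Partial-difference delta of equation (19)."""
    up = exp_poisson_price(math.exp(beta) * x, tau, beta, lam, n_terms)
    base = exp_poisson_price(x, tau, beta, lam, n_terms)
    return (up - base) / ((math.exp(beta) - 1.0) * x)
```

In this sketch the second delta of equation (20) would follow as `price - delta_a * x`. For an `x` large enough that every term of the series is in the money, the price reduces to `x - 1` and the delta to 1, which is the boundary case of the bound (21).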

Hence, $0 \le \delta^A \le 1$. Similarly, $-1 \le \delta^B \le 0$.

We note that $f(t, x)$ is not $C^1$ in $x$ (though convex, absolutely continuous, and piecewise
analytic in $x$). We also caution that this model is arbitrage free only when
$\mathbb{P}\{P_t = n\} > 0$ for all $t > 0$ and $n \in \mathbb{N}$, for example, when $P$ is a
Poisson process under an equivalent measure.

Derivation of the Predictable Representation

To show $dF = \delta^A\,dX$ (equation (18)), we first note that $[P]^c = 0$ since $[P] = P$;
hence, $(\Delta P)^2 = \Delta P$ and $P_t = [P]_t = \sum_{s \le t} \Delta P_s$. If $v(p)$,
$p \in \mathbb{Z}$, is any function, then clearly $V = (v(P_t))$ is a semimartingale and we have

\[ \Delta V_t = v(P_t) - v(P_{t-}) = (v(P_t) - v(P_{t-}))\,\Delta P_t
   = (v(P_{t-} + 1) - v(P_{t-}))\,\Delta P_t \tag{22} \]

Hence, as $V$ is clearly the sum of its jumps,

\[ V_t - v(0) = \sum_{s \le t} \Delta V_s
   = \sum_{s \le t} (v(P_{s-} + 1) - v(P_{s-}))\,\Delta P_s
   = \int_0^t (v(P_{s-} + 1) - v(P_{s-}))\,dP_s \tag{23} \]

Likewise, $(u(t, P_t))$ is a semimartingale for any $C^1$ in $t$ function $u(t, p)$,
$p \in \mathbb{Z}$, and one has

\[ du(t, P_t) = \frac{\partial u}{\partial t}(t, P_{t-})\,dt
   + (u(t, P_{t-} + 1) - u(t, P_{t-}))\,dP_t \tag{24} \]

Now, define the function

\[ x(t, p) := X_0\,e^{\beta p - (e^\beta - 1)\lambda t} \qquad (p \in \mathbb{Z}) \tag{25} \]

Clearly $X_t = x(t, P_t)$. Applying equation (24) to the function $x(t, p)$ and using that

\[ \frac{\partial x}{\partial t}(t, p) = -x(t, p)(e^\beta - 1)\lambda, \qquad
   x(t, p + 1) - x(t, p) = x(t, p)(e^\beta - 1) \tag{26} \]

(or alternatively applying Itô’s formula to $x(t, P_t)$ and simplifying) yields

\[ dX_t = X_{t-}(e^\beta - 1)\,d(P_t - \lambda t) \tag{27} \]

Next, define the function of $t \le T$ and $p \in \mathbb{Z}$,

\[ u(t, p) := f(t, x(t, p)) = \sum_{n=0}^{\infty}
   \left(X_0\,e^{\beta (p+n) - (e^\beta - 1)\lambda T} - 1\right)^+
   \frac{\lambda^n}{n!}(T-t)^n e^{-\lambda (T-t)} \tag{28} \]

Clearly, $u(t, P_t) = F_t$. One readily verifies that $u(t, p)$ satisfies the equation

\[ \frac{\partial u}{\partial t}(t, p) + \lambda\,(u(t, p + 1) - u(t, p)) = 0 \tag{29} \]

Hence by equation (24) we have,

\[ dF_t = (u(t, P_{t-} + 1) - u(t, P_{t-}))\,d(P_t - \lambda t) \tag{30} \]

Combining this with equation (27) and the fact that clearly

\[ u(t, p + 1) - u(t, p) = f(t, e^\beta x(t, p)) - f(t, x(t, p)) \tag{31} \]

we conclude that, as desired,

\[ dF_t = \frac{f(t, e^\beta X_{t-}) - f(t, X_{t-})}{(e^\beta - 1)X_{t-}}\,dX_t \tag{32} \]

The Homogeneous Option Price Function

There is an alternative derivation of the self-financing equation
$dC = \delta^A\,dA + \delta^B\,dB$ much along that in [9] and [8] that does not employ numéraire
invariance. It is related to a family of two-dimensional PDEs satisfied by the Merton/Margrabe
homogeneous option price function $c(t, a, b)$ below.

Let $f(t, x)$, $x > 0$, be any $C^{1,2}$ function, for example, as in equation (6). Define the
homogenized function

\[ c(t, a, b) := b\,f\left(t, \frac{a}{b}\right) \qquad (a, b > 0) \tag{33} \]

Then $c(t, a, b)$ is homogeneous of degree 1 in $(a, b)$, and hence by Euler’s formula

\[ c(t, a, b) = \frac{\partial c}{\partial a}(t, a, b)\,a + \frac{\partial c}{\partial b}(t, a, b)\,b \tag{34} \]
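Euler’s formula (34) for the homogenized function (33) can be checked numerically. The Python sketch below homogenizes an arbitrary function of the ratio, with the time argument suppressed, and measures the gap in equation (34) using central finite differences. The helper names (`homogenize`, `euler_gap`, `bs_projective`), the fixed variance `nu=0.04`, and the step size are illustrative assumptions, not part of the original text.

```python
import math


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bs_projective(x, nu=0.04):
    """Equation (6) at a fixed time, with an assumed remaining variance nu > 0."""
    s = math.sqrt(nu)
    d = math.log(x) / s
    return x * norm_cdf(d + s / 2.0) - norm_cdf(d - s / 2.0)


def homogenize(f):
    """Return c(a, b) := b * f(a / b), i.e. equation (33) with t suppressed."""
    return lambda a, b: b * f(a / b)


def euler_gap(c, a, b, h=1e-6):
    """|c - (a * dc/da + b * dc/db)|, using central differences with relative step h."""
    dc_da = (c(a * (1 + h), b) - c(a * (1 - h), b)) / (2 * a * h)
    dc_db = (c(a, b * (1 + h)) - c(a, b * (1 - h))) / (2 * b * h)
    return abs(c(a, b) - (a * dc_da + b * dc_db))
```

The gap should be at the level of finite-difference error for any smooth `f`, for instance `euler_gap(homogenize(bs_projective), 3.0, 2.0)`, since the identity depends only on degree-1 homogeneity and not on the particular price function.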

A laborious repeated application of the chain rule on equation (33) gives

\[ a^2\,\frac{\partial^2 c}{\partial a^2}(t, a, b) = b^2\,\frac{\partial^2 c}{\partial b^2}(t, a, b)
   = -ab\,\frac{\partial^2 c}{\partial a\,\partial b}(t, a, b)
   = b\,x^2\,\frac{\partial^2 f}{\partial x^2}(t, x), \qquad x := \frac{a}{b} \tag{35} \]

Let $\sigma(t)$, $\sigma_A(t, a, b)$, $\sigma_B(t, a, b)$, $\sigma_{AB}(t, a, b)$ be any
functions $(a, b > 0)$ such that

\[ \sigma^2(t) = \sigma_A^2(t, a, b) + \sigma_B^2(t, a, b) - 2\sigma_{AB}(t, a, b) \tag{36} \]

Using equations (35), (36), and
$\frac{\partial c}{\partial t}(t, a, b) = b\,\frac{\partial f}{\partial t}\left(t, \frac{a}{b}\right)$,
we see that $c(t, a, b)$ satisfies the PDE

\[ \frac{\partial c}{\partial t} + \frac{1}{2}\sigma_A^2(t, a, b)\,a^2\,\frac{\partial^2 c}{\partial a^2}
   + \frac{1}{2}\sigma_B^2(t, a, b)\,b^2\,\frac{\partial^2 c}{\partial b^2}
   + \sigma_{AB}(t, a, b)\,ab\,\frac{\partial^2 c}{\partial a\,\partial b} = 0 \tag{37} \]

if and only if $f(t, x)$ satisfies the PDE (9):
$\frac{\partial f}{\partial t} + \frac{1}{2}\sigma^2(t)\,x^2\,\frac{\partial^2 f}{\partial x^2} = 0$.

The PDE (9) was utilized in [1] and [9] (but not in [8]), and Merton [9] stated its equivalence
to the PDE (37) (assuming $\sigma_A$, etc., depend only on $t$). As noted in [9] and expounded
in [8], if $d[\log A]_t = \sigma_A^2(t)\,dt$, $d[\log B]_t = \sigma_B^2(t)\,dt$ and
$d[\log A, \log B]_t = \sigma_{AB}(t)\,dt$, then Itô’s formula and equation (37) imply at once
$dc(t, A_t, B_t) = \delta^A_t\,dA_t + \delta^B_t\,dB_t$, with $\delta^A$ and $\delta^B$ as in
equation (40), and thus $(\delta^A, \delta^B)$ is an SFTS with price process $c(t, A, B)$ by
Euler’s formula (34).

Let us expand on this (see also the sections Self-financing Trading Strategies and Homogeneous
Continuous Markovian SFTS). Let $\sigma(t) > 0$ be a continuous function, and $f(t, x)$ be the
Black–Scholes/Merton function (6). Set $c(t, a, b) := b\,f(t, a/b)$. Clearly,

\[ \frac{\partial c}{\partial a}(t, a, b) = \frac{\partial f}{\partial x}\left(t, \frac{a}{b}\right)
   = \delta_A\left(t, \frac{a}{b}\right) \tag{38} \]

This combined with Euler’s formula (34) and the definition (6) $f := \delta_A x + \delta_B$ give

\[ \frac{\partial c}{\partial b}(t, a, b) = \delta_B\left(t, \frac{a}{b}\right) \tag{39} \]

Assume that $A$ and $B$ are positive semimartingales with positive left limits and $X := A/B$
has deterministic volatility $\sigma(t)$: $d[X]_t = X_t^2\,\sigma^2(t)\,dt$. Using equation
(12), the deltas are conveniently the sensitivities of the homogeneous Merton/Margrabe function:

\[ \delta^A_t = \frac{\partial c}{\partial a}(t, A_t, B_t), \qquad
   \delta^B_t = \frac{\partial c}{\partial b}(t, A_t, B_t) \tag{40} \]

Since $X$ is continuous, we also have
$\delta^A_t = \frac{\partial c}{\partial a}(t, A_{t-}, B_{t-})$ and similarly $\delta^B_t$. The
section Deterministic-Volatility Exchange Option Model yields $dC = \delta^A\,dA + \delta^B\,dB$
with $C_t = B_t f(t, X_t) = c(t, A_t, B_t)$. Therefore, by equation (40) and Itô’s formula,

\[ \frac{\partial c}{\partial t}\,dt + \frac{1}{2}\frac{\partial^2 c}{\partial a^2}\,d[A]^c_t
   + \frac{1}{2}\frac{\partial^2 c}{\partial b^2}\,d[B]^c_t
   + \frac{\partial^2 c}{\partial a\,\partial b}\,d[A, B]^c_t = 0 \tag{41} \]

where the partial derivatives are evaluated at $(t, A_{t-}, B_{t-})$ and $[\cdot]^c$ is the
bracket continuous part. (The jump term in Itô’s formula vanishes as it equals
$\sum_{s \le t} (\Delta C_s - \delta^A_s \Delta A_s - \delta^B_s \Delta B_s) = 0$.)

Returning to the approach of Merton [9], assume now that
$d[\log A]_t = \sigma_A^2(t, A_t, B_t)\,dt$ for some function $\sigma_A$ and similarly
$d[\log B] = \sigma_B^2\,dt$ and $d[\log A, \log B] = \sigma_{AB}\,dt$. Then equation (36) holds
using $\log X = \log A - \log B$. Since $f(t, x)$ satisfies the PDE (9), the PDE (37) follows as
before by the chain rule. However, equation (37) implies equation (41), which by Itô’s formula
in turn implies the self-financing equation $dC = \delta^A\,dA + \delta^B\,dB$ with $\delta^A$
and $\delta^B$ given by equation (40).

Change of Numéraire

The solution $c(t, a, b)$ to the PDE (37) subject to $c(T, a, b) = (a - b)^+$ can be expressed
in a form $\mathbb{E}(X - Y)^+$ for some random variables $X$ and $Y > 0$ with means $a$ and
$b$. Expectations of this form often become more tractable by a change of measure as in

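[4]. Define the equivalent probability measure $\mathbb{Q}$ by
$\frac{d\mathbb{Q}}{d\mathbb{P}} := \frac{Y}{\mathbb{E}Y}$. Clearly,

\[ \mathbb{E}^{\mathbb{Q}}\left[\frac{X}{Y}\right] = \frac{\mathbb{E}(X)}{\mathbb{E}(Y)}
   \qquad \left(\frac{d\mathbb{Q}}{d\mathbb{P}} := \frac{Y}{\mathbb{E}(Y)}\right) \tag{42} \]

Replacing $X$ by $(X - Y)^+$ in equation (42) and using the homogeneity to factor out $Y$, we get

\[ \mathbb{E}(X - Y)^+ = \mathbb{E}(Y)\,\mathbb{E}^{\mathbb{Q}}\left(\frac{X}{Y} - 1\right)^+ \tag{43} \]

If $X/Y$ is $\mathbb{Q}$-lognormally distributed then equation (43) together with equation (42)
readily yields

\[ \mathbb{E}(X - Y)^+ = \mathbb{E}(X)\,N\left(\frac{\log(\mathbb{E}X/\mathbb{E}Y)}{\sqrt{\nu^{\mathbb{Q}}}}
   + \frac{\sqrt{\nu^{\mathbb{Q}}}}{2}\right)
   - \mathbb{E}(Y)\,N\left(\frac{\log(\mathbb{E}X/\mathbb{E}Y)}{\sqrt{\nu^{\mathbb{Q}}}}
   - \frac{\sqrt{\nu^{\mathbb{Q}}}}{2}\right) \tag{44} \]

where $\nu^{\mathbb{Q}} := \mathrm{var}^{\mathbb{Q}}[\log(X/Y)]$. When $X$ and $Y$ are
bivariately lognormally distributed, it is not difficult to show that $X/Y$ is lognormally
distributed in both $\mathbb{P}$ and $\mathbb{Q}$ with the same log-variance
$\nu^{\mathbb{Q}} = \nu := \mathrm{var}[\log(X/Y)]$. Then $\nu^{\mathbb{Q}}$ can be replaced
with $\nu$ in equation (44). This occurs when the functions $\sigma_A$, $\sigma_B$ and
$\sigma_{AB}$ in equation (37) are independent of $a$ and $b$, as in [8, 9].

Uniqueness

Assume that $A$ and $B$ are positive semimartingales with positive left limits such that
$X := A/B$ is a square-integrable martingale under an equivalent probability measure
$\mathbb{Q}$ and $d\langle X \rangle^{\mathbb{Q}}_t = X_{t-}^2\,\sigma_t^2\,dt$ for some nowhere
zero process $\sigma$, where $\langle X \rangle^{\mathbb{Q}}$ is the $\mathbb{Q}$-compensator of
$[X]$. (Of course, $\langle X \rangle^{\mathbb{Q}} = [X]$ if $X$ is continuous.) Let
$(\delta^A, \delta^B)$ be an SFTS and set $C := \delta^A A + \delta^B B$. We claim that
$\delta^A = \delta^B = 0$ if $C_T = 0$ and $\delta^A$ is bounded.

Indeed, set $F := C/B$. By numéraire invariance, $dF = \delta^A\,dX$. Hence, $F$ is a
$\mathbb{Q}$-square-integrable martingale since $X$ is and $\delta^A$ is bounded. Thus, $F = 0$
since $F_T = C_T/B_T = 0$. Hence,
$0 = d\langle F \rangle^{\mathbb{Q}} = (\delta^A)^2 X_-^2\,\sigma^2\,dt$. However,
$X_-^2\,\sigma^2 > 0$. Thus, $\delta^A = 0$ and $\delta^B = F - \delta^A X = 0$.

In general, since $F := C/B$ is a $\mathbb{Q}$-martingale, we have the following pricing formula:

\[ C_t = B_t\,\mathbb{E}^{\mathbb{Q}}\left[\frac{C_T}{B_T} \,\Big|\, \mathcal{F}_t\right] \tag{45} \]

Deterministic-volatility Model Uniqueness

Assume that $A$ and $B$ are positive semimartingales with positive left limits and $X := A/B$
is an Itô process following

\[ \frac{dX_t}{X_t} = \mu_t\,dt + \sigma_t\,dZ_t, \qquad \left(X := \frac{A}{B}\right) \tag{46} \]

where $Z$ is a Brownian motion and $\mu$ and $\sigma > 0$ are predictable processes with
$\sigma$ bounded and $\mathbb{E}\,e^{\frac{1}{2}\int_0^T (\mu_t/\sigma_t)^2\,dt} < \infty$. Let
$(\delta^A, \delta^B)$ be an SFTS with $\delta^A$ bounded. Set $C := \delta^A A + \delta^B B$.
We claim that $\delta^A = \delta^B = 0$ if $C_T = 0$. Indeed, the process

\[ M := \mathcal{E}\left(-\int \frac{\mu}{\sigma}\,dZ\right)
   = e^{-\int \frac{\mu}{\sigma}\,dZ - \frac{1}{2}\int \left(\frac{\mu}{\sigma}\right)^2 dt} \tag{47} \]

is then a positive martingale with $M_0 = 1$. Define the equivalent probability measure
$\mathbb{Q}$ by $d\mathbb{Q} = M_T\,d\mathbb{P}$. The process
$W := Z + \int \frac{\mu}{\sigma}\,dt$ is a $\mathbb{Q}$-Brownian motion because $[W]_t = t$ and
$W$ is a $\mathbb{Q}$-local martingale as $MW$ is a local martingale using Itô’s product rule:

\[ d(MW) - W\,dM = M\,dW + d[W, M]
   = M\left(dZ + \frac{\mu}{\sigma}\,dt\right) - M\,\frac{\mu}{\sigma}\,d[Z] = M\,dZ \tag{48} \]

Moreover, $dX = X\sigma\,dW$ by equation (46). Therefore, $X$ is a $\mathbb{Q}$-square
integrable martingale since $\sigma$ is bounded. The claim, thus, follows by the section
Uniqueness.

Assume now that $\sigma_t$ is deterministic. The results of the section Deterministic-Volatility
Exchange Option Model hold since $d[\log X] = \sigma_t^2\,dt$. However, we can now derive them
more conceptually. Indeed, both conditioned on $\mathcal{F}_t$ and unconditionally, $X_T/X_t$ is
$\mathbb{Q}$-lognormally distributed with mean 1 and log-variance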

$\int_t^T \sigma_s^2\,ds$ since
$X_T = X_t\,e^{\int_t^T \sigma_s\,dW_s - \frac{1}{2}\int_t^T \sigma_s^2\,ds}$. Hence, by
equation (45),

\[ f(t, X_t) = \mathbb{E}^{\mathbb{Q}}[(X_T - 1)^+ \mid \mathcal{F}_t] \quad \text{where} \quad
   f(t, x) := \mathbb{E}^{\mathbb{Q}}\left(x\,\frac{X_T}{X_t} - 1\right)^+ \tag{49} \]

which function readily equals the Black–Scholes/Merton option price function (6). Thus,
$F := (f(t, X_t))$ is a $\mathbb{Q}$-martingale. Therefore, Itô’s formula implies that
$f(t, x)$ satisfies the PDE (9) and $dF = \delta^A\,dX$ where
$\delta^A := \frac{\partial f}{\partial x}(t, X_t)$. Numéraire invariance now yields that the
pair $(\delta^A, \delta^B := F - \delta^A X)$ is an SFTS. Clearly, $C_T = (A_T - B_T)^+$ where
$C := \delta^A A + \delta^B B = BF$.

Exponential-Poisson Model Uniqueness

Let $\beta \ne 0$ be a constant and $\kappa$ and $\lambda$ be positive continuous adapted
processes such that $\lambda$ is bounded and
$\mathbb{E}\,e^{\int_0^T \left(\frac{\lambda_t}{\kappa_t} - 1\right)^2 \kappa_t\,dt} < \infty$.
Let $P$ be a semimartingale satisfying $[P] = P$ with $P_0 = 0$ and compensator
$\int \kappa\,dt$. Assume that $A$ and $B$ are positive semimartingales with positive left
limits and $X := A/B$ satisfies

\[ dX_t = X_{t-}(e^\beta - 1)(dP_t - \lambda_t\,dt) \tag{50} \]

Using $de^{\beta P} = (e^\beta - 1)e^{\beta P_-}\,dP$ or as in the section Derivation of the
Predictable Representation, this is equivalent to the integrated form

\[ X_t = X_0\,e^{\beta P_t - (e^\beta - 1)\int_0^t \lambda_s\,ds} \tag{51} \]

Let $(\delta^A, \delta^B)$ be an SFTS with $\delta^A$ bounded. Set
$C := \delta^A A + \delta^B B$. We claim that $\delta^A = \delta^B = 0$ if $C_T = 0$. Indeed,
$\mathbb{E}\,e^{\left\langle \int \left(\frac{\lambda}{\kappa} - 1\right)(dP - \kappa\,dt) \right\rangle_T}
= \mathbb{E}\,e^{\int_0^T \left(\frac{\lambda_t}{\kappa_t} - 1\right)^2 \kappa_t\,dt} < \infty$,
so the positive local martingale

\[ M := \mathcal{E}\left(\int \left(\frac{\lambda}{\kappa} - 1\right)(dP - \kappa\,dt)\right)
   = e^{-\int (\lambda - \kappa)\,dt} \prod_{s \le \cdot}
     \left(1 + \left(\frac{\lambda_s}{\kappa_s} - 1\right)\Delta P_s\right) \tag{52} \]

is a martingale. Define the equivalent probability measure $\mathbb{Q}$ by
$d\mathbb{Q} = M_T\,d\mathbb{P}$. Then $N := P - \int \lambda\,dt$ is a $\mathbb{Q}$-local
martingale as $MN$ is a local martingale by Itô’s product rule:

\[ \begin{aligned}
d(MN) - N_-\,dM &= M_-\,dN + d[M, N] \\
&= M_-(dP - \lambda\,dt) + M_-\left(\frac{\lambda}{\kappa} - 1\right)dP \\
&= M_-\,\frac{\lambda}{\kappa}\,(dP - \kappa\,dt)
\end{aligned} \tag{53} \]

Therefore, by equation (50), $X$ is a $\mathbb{Q}$-square-integrable martingale (in fact, in
$H^p(\mathbb{Q})$ for all $p > 0$) since $\lambda$ is bounded. Thus, by the section Uniqueness,
$\delta^A = \delta^B = 0$ if $C_T = 0$, as claimed.

Assume now that $\lambda$ is a positive constant. By equation (51) we have a special case of the
exponential-Poisson model. Further, $P$ is a $\mathbb{Q}$-Poisson process with intensity
$\lambda$ since $[P] = P$. We now have uniqueness, but additionally, the previous results follow
more conceptually as follows.

Conditioned on $\mathcal{F}_t$, $P_T - P_t$ is $\mathbb{Q}$-Poisson distributed with mean
$\lambda(T - t)$. Its unconditional $\mathbb{Q}$-distribution is identical. Thus, the
$\mathcal{F}_t$-conditional and the unconditional $\mathbb{Q}$-distribution of $X_T/X_t$ are
identical and are exponentially Poisson distributed with mean 1. Hence, by equation (45),

\[ f(t, X_t) = \mathbb{E}^{\mathbb{Q}}[(X_T - 1)^+ \mid \mathcal{F}_t] \quad \text{where} \quad
   f(t, x) := \mathbb{E}^{\mathbb{Q}}\left(x\,\frac{X_T}{X_t} - 1\right)^+ \tag{54} \]

which function readily equals that defined in equation (16). Thus, $F := (f(t, X_t))$ is a
$\mathbb{Q}$-martingale. Using this and equation (24), one shows that $F$ satisfies equation
(32) and with it that the pair $(\delta^A, \delta^B)$ as defined in equation (19), equation (20)
is a bounded SFTS for the exchange option.

Extension to Dividends

Consider two assets with positive price processes $\hat{A}$ and $\hat{B}$ and continuous
dividend yields $y^A_t$ and $y^B_t$. When there exist traded or replicable zero-dividend assets
$A$ and $B$ such that $A_T = \hat{A}_T$ and $B_T = \hat{B}_T$ (if not, there is little hope of
replication), it is natural to

define the price process of the option to exchange $\hat{A}$ and $\hat{B}$ to be that of the
option to exchange $A$ and $B$. If $y^A$ and $y^B$ are deterministic, then consistent with the
treatment of dividends in [9], $A$ (and similarly $B$) is simply given by

\[ A_t := a\,\tilde{A}_t = e^{-\int_t^T y^A_s\,ds}\,\hat{A}_t, \qquad
   \tilde{A}_t := e^{\int_0^t y^A_s\,ds}\,\hat{A}_t, \qquad
   a := e^{-\int_0^T y^A_t\,dt} \tag{55} \]

Note $A/B$ is a semimartingale if and only if $\hat{A}/\hat{B}$ is, in which case
$[\log A/B] = [\log \hat{A}/\hat{B}]$.

In general, $\tilde{A}_t$ is the price of the zero-dividend asset that initially buys one share
of $\hat{A}$ and thereon continually reinvests all dividends in $\hat{A}$ itself. What is
required is that the four zero-dividend assets $A$, $\tilde{A}$, $B$, and $\tilde{B}$ be
arbitrage free in relation to one another (see the section Arbitrage-free Semimartingales and
Uniqueness).

For instance, say $\hat{A}$ and $\hat{B}$ are the yen/dollar and yen/Euro exchange rates viewed
as yen-denominated dividend assets. Then $A$ is the yen-value of the US $T$-maturity zero-coupon
bond and $\tilde{A}$ is the yen-value of the US money market asset. This exchange option is
equivalent to a Euro-denominated call struck at 1 on the Euro/dollar exchange rate
$\hat{A}/\hat{B}$. The ratio $A/B$ is the forward Euro/dollar exchange rate. If it has
deterministic volatility, we are as in a setting of [7], which yields the same pricing formula
as that from the section Deterministic-volatility Exchange Option Model.

Pricing and Hedging Options with Homogeneous Payoffs

We took some shortcuts to quickly present the main results for two of the simplest and among the
most interesting examples. A better understanding of the principles at work requires
generalization to contingent claims $C$ on many assets with price processes
$A = (A^1, \cdots, A^m) > 0$ and a path-independent payoff $C_T = h(A_T)$ given as a homogeneous
function $h(a)$, $a \in \mathbb{R}^m_+$, of the asset prices $A_T$ at expiration time $T$.
Combined with an underlying SDE and the resulting PDE, such a Markovian setting utilizes the
invariance principle and equivalent martingale measures to derive unique pricing and construct
an SFTS that replicates the given payoff $h(A_T)$ in general. The construction is explicit in
the multivariate extensions of the deterministic-volatility and exponential-Poisson models.

The homogeneity of the payoff function $h(a)$ implies $h(A_T) = A^m_T\,g(X_T)$ where
$g(x) := h(x, 1)$, $x \in \mathbb{R}^n_+$, $n := m - 1$, and
$X := \left(\frac{A^1}{A^m}, \cdots, \frac{A^n}{A^m}\right)$. Once a predictable representation
$F = F_0 + \delta \cdot X$, $F_T = g(X_T)$ is found, then by numéraire invariance
$\delta := (\delta, \delta^m)$ will be an SFTS with payoff $h(A_T)$, where
$\delta^m := F_- - \sum_{i=1}^n \delta^i X^i_- = F - \sum_{i=1}^n \delta^i X^i$. Uniqueness of
pricing requires boundedness of partial derivatives (or differences) of $h(a)$ (or $g(x)$) and
that $A$ be arbitrage free, meaning $X$ is a martingale under an equivalent measure. Arbitrage
freedom holds “generically” when the matrix $(\langle X^i, X^j \rangle)$ is nonsingular,
basically a “no-redundant-asset” condition. Then the SFTS is also unique.

Libor and swap derivatives are among contingent claims with homogeneous payoffs.

Self-financing Trading Strategies

By an SFTS we mean a pair $(\delta, A)$ of an $m$-dimensional semimartingale
$A = (A^1, \ldots, A^m)$ and an $A$-integrable predictable vector process
$\delta = (\delta^1, \ldots, \delta^m)$ such that (with $\delta \cdot A$ denoting the
$m$-dimensional stochastic integral)

\[ \sum_{i=1}^m \delta^i A^i = \sum_{i=1}^m \delta^i_0 A^i_0 + \delta \cdot A \tag{56} \]

We then say $\delta$ is an SFTS for $A$. This is equivalent to saying that the SFTS price process

\[ C := \sum_{i=1}^m \delta^i A^i \tag{57} \]

satisfies $C = C_0 + \delta \cdot A$. Clearly, $C$ is then a semimartingale,
$\Delta C = \sum_i \delta^i \Delta A^i$, and hence

\[ C_- = \sum_{i=1}^m \delta^i A^i_- \tag{58} \]

If $\delta^i$ are bounded (say by $b$) and $A^i$ are martingales, then the SFTS price process
$C$ is a martingale

because $C$ is then a local martingale that is dominated by a martingale $M$:

\[ |C_t| \le b \sum_i |A^i_t| = b \sum_i |\mathbb{E}[A^i_T \mid \mathcal{F}_t]|
   \le b \sum_i \mathbb{E}[|A^i_T| \mid \mathcal{F}_t] =: M_t \tag{59} \]

As suggested by the case of a locally bounded $\delta$, we often use the differential form

\[ dC = \sum_{i=1}^m \delta^i\,dA^i \tag{60} \]

of the equation $C = C_0 + \delta \cdot A$ as a convenient symbolic equivalent in calculations.
One interprets $A^i$ as prices of $m$ zero-dividend assets and $\delta^i_t$ as the number of
shares invested in them at time $t$. Then $C_t$ indicates the resultant self-financing portfolio
price by equation (57), and equation (60) is the self-financing equation, implying that the
change $dC$ in the portfolio price is only due to the changes $dA^i$ in the asset prices with no
financing from outside.

Assume for the remainder of this subsection as a way of motivation that $A$ is continuous and
$C_t = c(t, A_t)$ for some $C^{1,2}$ function $c(t, a)$.ᵃ Then by equation (60) and Itô’s
formula, we have

\[ \frac{\partial c}{\partial t}(t, A_t)\,dt + \frac{1}{2}\sum_{i,j=1}^m
   \frac{\partial^2 c}{\partial a_i\,\partial a_j}(t, A_t)\,d[A^i, A^j]_t
   = \sum_{i=1}^m \left(\delta^i_t - \frac{\partial c}{\partial a_i}(t, A_t)\right)dA^i_t \tag{61} \]

In particular, if $\delta^i_t = \frac{\partial c}{\partial a_i}(t, A_t)$ for all $i$ then
$c(t, A_t) = \sum_i \frac{\partial c}{\partial a_i}(t, A_t)\,A^i_t$ by equation (57) and

\[ \frac{\partial c}{\partial t}(t, A_t)\,dt + \frac{1}{2}\sum_{i,j=1}^m
   \frac{\partial^2 c}{\partial a_i\,\partial a_j}(t, A_t)\,d[A^i, A^j]_t = 0 \tag{62} \]

In general,
$\sum_{i,j} \left(\delta^i - \frac{\partial c}{\partial a_i}\right)\left(\delta^j - \frac{\partial c}{\partial a_j}\right)d[A^i, A^j] = 0$
since the (left) right-hand side of equation (61) has finite variation. Thus, if $[A^i]$ are
absolutely continuous and the $m \times m$ matrix $\left(\frac{d}{dt}[A^i, A^j]\right)$ is
nonsingular, then $\delta^i_t = \frac{\partial c}{\partial a_i}(t, A_t)$, so equation (62) holds
and $c(t, A_t) = \sum_i \frac{\partial c}{\partial a_i}(t, A_t)\,A^i_t$. If further the support
of $A_t$ is a cone, it follows $c(t, a)$ is homogeneous of degree 1 in $a$ on that cone.

Assume that $M^i := e^{-\int r\,dt} A^i$ are local martingales under an equivalent measure for
some locally bounded predictable process $r$. Then
$dA^i = rA^i\,dt + e^{\int r\,dt}\,dM^i$; thus, by equations (61) and (57)

\[ \frac{\partial c}{\partial t}(t, A_t)\,dt + \frac{1}{2}\sum_{i,j=1}^m
   \frac{\partial^2 c}{\partial a_i\,\partial a_j}(t, A_t)\,d[A^i, A^j]_t
   = r_t\left(C_t - \sum_{i=1}^m \frac{\partial c}{\partial a_i}(t, A_t)\,A^i_t\right)dt \tag{63} \]

Hence, if $c(t, a)$ is homogeneous (in $a$), then by Euler’s formula equation (62) holds (yet
$\delta^i_t$ may differ from $\frac{\partial c}{\partial a_i}(t, A_t)$ if there are
redundancies, for then a regular replicating SFTS is not unique).

Given a homogeneous payoff function $h(a)$, the section Homogeneous Continuous Markovian SFTS
constructs under suitable assumptions a homogeneous solution $c(t, a)$ to equation (62) with
$c(T, a) = h(a)$. Clearly then, by Euler and Itô formulae,
$\left(\frac{\partial c}{\partial a_i}(t, A_t)\right)$ is an SFTS for $A$ (as observed in [9]
and highlighted in [8], see the section The Homogeneous Option Price Function). To this end, we
first factor out the homogeneous symmetry of $h(a)$ next.

The Invariance Principle

Let $(\delta, A)$ be an SFTS and $S$ a (scalar) semimartingale such that $\delta$ is
$SA := (SA^1, \cdots, SA^m)$-integrable. Then $(\delta, SA)$ is an SFTS. Consequently,

\[ d(SC) = \sum_{i=1}^m \delta^i\,d(SA^i) \tag{64} \]

where $C := \sum_i \delta^i A^i = C_0 + \delta \cdot A$, that is,
$SC = S_0 C_0 + \delta \cdot (SA)$. Indeed, by Itô’s product rule, then substituting for $dC$
and $C_-$ and regrouping, followed

by Itô’s product rule again,

\[ \begin{aligned}
d(SC) &= S_-\,dC + C_-\,dS + d[S, C] \\
&= S_- \sum_{i=1}^m \delta^i\,dA^i + \sum_{i=1}^m \delta^i A^i_-\,dS + \sum_{i=1}^m \delta^i\,d[S, A^i] \\
&= \sum_{i=1}^m \delta^i\,(S_-\,dA^i + A^i_-\,dS + d[S, A^i]) \\
&= \sum_{i=1}^m \delta^i\,d(SA^i)
\end{aligned} \tag{65} \]

Interpreting $S$ as an exchange rate, this result [3, 4, 8], called numéraire invariance, means
that the self-financing property is independent of the base currency. (To the best of our
knowledge, the term was coined in the 1992 edition of [3], where a similar proof is given.)

If $S, S_- > 0$, then applied to the semimartingale $1/S$ we see that $\delta$ is an SFTS for
$A$ if and only if it is one for $SA$. Thus, if equation (57) holds, then equations (60) and
(64) are equivalent.

Assume now that $A^m, A^m_- > 0$ and $m \ge 2$. Define the $n := m - 1$ dimensional
semimartingale

\[ X := \left(\frac{A^1}{A^m}, \ldots, \frac{A^n}{A^m}\right), \qquad n := m - 1 \tag{66} \]

Taking $S = 1/A^m$, it follows that $\delta$ is an SFTS for $A$ if and only if it is an SFTS
for $A/A^m = (X, 1)$, that is, if and only if $F := C/A^m$ satisfies
$F = F_0 + \delta \cdot X$ where $\delta := (\delta^1, \cdots, \delta^n)$. Clearly in this case,
$F = \sum_{i=1}^n \delta^i X^i + \delta^m$ and $F_- = \sum_{i=1}^n \delta^i X^i_- + \delta^m$ as
$\Delta F = \delta \cdot \Delta X$. Thus,

\[ \delta^m = F - \sum_{i=1}^n \delta^i X^i = F_- - \sum_{i=1}^n \delta^i X^i_-, \qquad
   \left(F := \frac{C}{A^m}\right) \tag{67} \]

(When $m = 1$, a similar argument shows that $\delta$ must be a constant, as intuitively
obvious.)

Conversely, suppose that $\delta$ is an $X$-integrable process and $F$ is a process such that
$F = F_0 + \delta \cdot X$. Define $\delta^m$ by either of the above formulas; the other then
holds as before. Obviously then $\delta = (\delta, \delta^m)$ is an SFTS for $(X, 1)$ with price
process $F$. Hence by numéraire invariance, $\delta$ is an SFTS for $A$ with price process
$C = A^m F$, provided $\delta$ is $A$-integrable.

Thus, numéraire invariance shows that in order to find an SFTS with a given time-$T$ payoff
$C_T$ it is sufficient to find processes $\delta$ and $F$ such that $F = F_0 + \delta \cdot X$
and $F_T = C_T/A^m_T$.

Since $\delta^m = F - \sum_{i=1}^n \delta^i X^i$, the $m$th delta $\delta^m$ is like $F$
determined by $\delta$ and $F_0$. As such, one interprets the $m$-th asset as the “numéraire
asset” chosen to finance an otherwise arbitrary trading strategy $\delta$ in the other assets,
post an initial investment of $C_0 = A^m_0 F_0$.

We often use the differential form $dF = \sum_{i=1}^n \delta^i\,dX^i$ of the equation
$F = F_0 + \delta \cdot X$.

Arbitrage-free Semimartingales and Uniqueness

We call a semimartingale $A = (A^1, \cdots, A^m)$, $m \ge 2$, arbitrage free if there exists a
positive semimartingale $S$ with $S_- > 0$ such that $SA^i$ are martingales for all $i$. Such a
process $S$ is called a state price density or deflator for $A$. The law of one price (with
bounded deltas) justifies the terminology:

If $A$ is arbitrage free and $\delta$ is a bounded SFTS for $A$, then $SC$ is a martingale where
$C := \sum_{i=1}^m \delta^i A^i$; consequently, $C = 0$ if $C_T = 0$.

Indeed, by numéraire invariance $\delta$ is then an SFTS for $SA$ with price process $SC$. Hence
by the section Self-financing Trading Strategies, $SC$ is a martingale, implying $SC = 0$ if
$C_T = 0$, and with it $C = 0$, as claimed.

A simple and well-known argument yields that if $A^m, A^m_- > 0$, then $A$ is arbitrage free if
and only if there exists an equivalent probability measure $\mathbb{Q}$ such that $X$ is a
$\mathbb{Q}$-martingale, where $X := \left(\frac{A^1}{A^m}, \cdots, \frac{A^n}{A^m}\right)$,
$n := m - 1$.ᵇ Numéraire invariance then implies that $C/A^m$ is a $\mathbb{Q}$-martingale for
the price process $C := \sum_i \delta^i A^i$ of any bounded SFTS $\delta$, and hence

\[ C_t = A^m_t\,\mathbb{E}^{\mathbb{Q}}\left[\frac{C_T}{A^m_T} \,\Big|\, \mathcal{F}_t\right] \tag{68} \]

Indeed, by numéraire invariance, $\delta$ is an SFTS for $A/A^m$ with price process $C/A^m$.
Hence, $C/A^m$ is a $\mathbb{Q}$-martingale by the section Self-Financing Trading Strategies
since $A/A^m$ is a $\mathbb{Q}$-martingale and $\delta$ is bounded.

Suppose that $X$ is a $\mathbb{Q}$-square-integrable martingale and $\delta^i$ are bounded for
$i \le n$. Then $F := C/A^m$ is a $\mathbb{Q}$-square-integrable martingale

since dF = ni=1 δ i dX i
n by i numéraire invariance. unique bounded SFTS for (X, 1) with payoff g(XT ),
Moreover, d
F = ij =1 δ δ j d
X i , X j  . Thus, if
 provided d[X i , X j ] = X i X j σ ij dt for some nonsin-
ij
gular matrix process (σt ).

X are continuous and the n × n matrix
i 
absolutely

d/dt
X i , X j  is nonsingular, then given any ran-
dom variableR, there exists at most one SFTS δ for Example: Projective Deterministic Volatility
A such that m i=1 δT AT = R and δ are bounded for
i i i

i ≤ n. Let X = (X 1 , . . . , X n ) > 0 be a continuous n-


dimensional martingale such that
j
Projective Continuous Markovian SFTS d[X i , X j ]t = Xti Xt σij (t)dt (74)
2
Let X = (X 1 , · · · , X n ) be a continuous vector martin- for some n deterministic continuous functions σij (t).
gale. In this, subsection x ∈ n+ if X > 0 (the main So, d[log X i , log X j ]t = σij (t)dt. Conditioned on Ft
case of interest); otherwise, x ∈ n . Let g(x) be a and unconditionally, XT /Xt is then multivariately
Borel function of linear growth (so Ɛ|g(XT )| < ∞), lognormally distributed, with mean (1, · · · , 1) and
T
and f (t, x) be a continuous function, C 1,2 on t < T . log-covariance matrix ( t σij (s)ds). Let P (t, T , z)
Set m := n + 1 and define the C 1 functions denote its distribution function. Let g(x) be a Borel
function of linear growth. Define the function
∂f  
δi (t, x) : = (t, x), i ≤ n, XT1 XTn
∂xi f (t, x) := Ɛ g x1 1 , . . . , xn n (75)
Xt Xt

n
δm (t, x) : = f (t, x) − δi (t, x)xi (69) Obviously, f (T , x) = g(x). Clearly, f (t, x) can
i=1 also be represented in two other ways as
and the continuous vector process 
δ = (δ 1 , . . . , δ m ), δti := δi (t, Xt ) (70) f (t, x) = g(x1 z1 , . . . , xn zn )P (t, T , dz)
n+
First suppose that  
XT1 Xn
f (t, Xt ) = Ɛ[g(XT ) | Ft ] (71) = Ɛ g x1 1 , . . . , xn Tn | Ft (76)
Xt Xt
Then the process F := (f (t, Xt )) is a martingale,
Equation (71) holds by the second equality, and
and since X i are also martingales, Itô’s formula yields
f (t, x) is C 1 in t and smooth (even analytic) in x on
n
∂f t < T as seen by changing variable in the integral
dFt = (t, Xt )dXti , (72)
∂x i
to y i = x i zi and differentiating under the integral
i=1
sign in the first equality. Therefore by equation (73),
and f (t, x) satisfies the PDE
1  ∂ 2f
n
∂f 1 
n
(t, Xt )dt + (t, Xt )d[X i , X j ]t = 0 ∂f ∂ 2f
∂t 2 i,j =1 ∂xi ∂xj + σij (t)xi xj =0 (77)
∂t 2 i,j =1 ∂xi ∂xj
(73)
on the support of X, equation (72) holds, and δ is an
Clearly, FT = g(XT ) and equation (72) imply δ is SFTS for (X, 1) with price process F := (f (t, Xt )), a
an SFTS for (X, 1) with price process F . martingale by equation (71). If g(x) is dx-absolutely
Conversely, suppose that f (t, x) satisfies equa- ∂g
continuous with bounded partial derivatives ∂x
tion (73) or equivalently, by Itô’s formula, equa- i
tion (72). By equation (72), δ is an SFTS for (as L1 functions), then g(x) has linear growth,
loc p
(X, 1) with price process F := f (t, Xt ). Thus by the Ɛ|g(XT )| < ∞ for p > 0, and
section Self-financing Trading Strategies, if δi (t, x)  
are bounded then F is a martingale and if further ∂f XTi ∂g XT1 XTn
f (T , x) = g(x) then equation (71) holds. Moreover, (t, x) = Ɛ x1 1 , . . . , x n n
∂xi Xti ∂xi Xt Xt
as in the section Arbitrage-free Semimartingales and
Uniqueness, δ given by equation (70) is then the (78)

∂f bounded (hence of linear growth) Borel function


Thus, δi (t, x) = ∂x (t, x) are bounded. If g(x) −
 ∂g
i g(x), then the assumptions of the section Projective
x is bounded, then so is δm (t, x) as Continuous Markovian SFTS are satisfied and the
∂xi i
conclusions hold. In particular, equation (72) then
    n    holds, and since
XT ∂g XT XTi
δm (t, x) = Ɛ g x − x
Xt ∂xi Xt Xti
i=1 d[X i , X j ] = X i X j σij (t, X)dt where
(79)

k

1,2 σij (t, x) : = ϕil (t, x)ϕj l (t, x) (82)


It further follows that if f (t, x) is any C
∂f l=1
function with bounded partials ∂x (t, x) satisfying
i
f (T , x) = 0 for all x and the PDE (77), then F := it follows from equation (73) that, at least on the
(f (t, Xt )) = 0. Indeed, equation (72) then holds by support of X, f (t, x) satisfies the PDE
PDE (77) and Itô’s formula, implying F is a square-
1 
n
integrable martingale. Thus F = 0 since FT = 0. As ∂f ∂ 2f
(t, x) + xi xj σij (t, x) (t, x) = 0
such, f (t, x) = 0 identically if the support of Xt ∂t 2 i,j =1 ∂xi ∂xj
equals n+ for every t. This is so if the matrix (σij (t)) (83)
is nonsingular at least near 0, and it is “generically”
so even when the matrix has rank 1 but is time In the deterministic-volatility case, the functions
dependent. ϕij and hence σij are independent of x and simply
XTt,x = xXT /Xt , explaining why in this special case
Projective Continuous SDE SFTS f (t, x) is also given by equation (73).
In general, if g(x) is absolutely continuous with
Continuous Markovian positive martingales X = bounded derivatives and the probability transition
(X 1 , · · · , X n ) often arise as solutions to an SDE sys- function of X is sufficiently regular, one shows, as
tem of the form in the deterministic volatility case, that the x-partial
derivatives of f (the deltas) are bounded and thereby

k
dXti = Xti
j
ϕij (t, Xt )dWt (80) concludes uniqueness.
j =1
If σij (t, x) are homogeneous of degree 0 in x, then
(assumed) uniqueness and symmetry of PDE (73)
where W 1 , · · · , W k are independent Brownian under dilation in x imply that f (t, x) is homogeneous
motions and ϕij (t, x), x ∈ n+ , are continuous of degree 1 in x if g(x) is so. By Euler’s formula then
bounded functions. As is well known, for each δm (t, x) = 0 in equation (69), implying (δ 1 , · · · , δ n )
s ≤ T and x ∈ n+ , there is a unique continuous is an SFTS for X.
semimartingale X s,x = (Xts,x ) on [s, T ] with Xss,x =
x satisfying this SDE; moreover, X s,x is a positive
square-integrable martingale (in fact in all Hp ) Homogeneous Continuous Markovian SFTS
since ϕij (t, x) are bounded. Fixing an X0 ∈ n+ , the
solution on [0, T ] starting at X0 at time 0 is denoted Let A = (A1 , · · · , Am ) be a semimartingale with
as X = X 0,X0 . The Markov property holds: for any A, A− > 0 such that X i := Ai /Am are Itô processes
Borel function g(x) of linear growth, following

Ɛ[g(XT ) | Ft ] = f (t, Xt ) where 


k
ij j
dXti = Xti ϕt (dZt + φ j dt)
f (t, x) : = Ɛ g(XTt,x ) (81) j =1

Clearly f (T , x) = g(x). (Intuitively, f (t, x) = (i = 1, . . . , n := m − 1) (84)


Ɛ[g(XT ) | Xt = x].)
Thus if we assume that ϕij (t, x) are sufficiently where Z j are independent Brownian motions and φ j ,
regular so that f (t, x) is C 1,2 on t < T for every ϕ ij are locally bounded predictable processes with

 T ∂c (t, A )
Then Ct = c(t, At ). Agreeably, δti = ∂a
j
1/2 (φt )2 dt
ϕ ij bounded and Ɛ e j 0 < ∞. Define the i
t

martingale by equation (69). (For i = m use Euler’s formula for


c(t, a).) By the continuity of X and equation (69),
  ∂c (t, A ) too. Therefore by Itô’s formula,
δti = ∂a
k 
 i
t−

M : = E − φ j dZ j 
1  ∂ 2c
j =1 m
∂c
k    (t, At− )dt + (t, At− )d[Ai , Aj ]ct = 0
− φ j dZ j +
1
(φ i )2 dt
∂t 2 i,j =1 ∂ai ∂aj
j =1 2
=e (85) (87)

 measure  by d = MT d. Then W :=


i
and the (The term for the  sum of jumps in Itô’s formula
Z i + φ i dt are -Brownian motions and are - vanishes since C = δ i Ai .) This yields the PDE
independent since [W k , W l ] = 0 for k  = l. Hence, ∂c + 1  a a σ A (t, a) ∂ 2 c = 0 for the special
Xi  are -square-integrable martingales as dX i = ∂t 2 i,j i j ij ∂ai ∂aj
i j A
X i k ij
j =1 ϕ dW
j
and ϕ ij are bounded. Thus, A is case d[A , A ]t = At At σij (t, At )dt for some func-
i j

arbitrage free. tions σijA (t, a). The quotient-space PDE (83) is more
Now let h(a), a ∈ m + > 0, be a homogeneous fundamental for it holds in general (even when A is
function of linear growth. Define g(x) := h(x, 1), discontinuous) and has one lower dimension. Change
ij i
x ∈ n+ . Assume further that ϕt = ϕij (t, Xt ) for of variable Li = Xi+1 − 1 (i < n), Ln = X n − 1,
some continuous bounded functions ϕij (t, x). Then X
transforms equation (83) to the Libor market model
equation (80) holds, and hence the section PDE.
Projective Continuous SDE SFTS applied under
measure  shows that X is -Markovian in Multivariate Poisson Predictable Representation
that Ɛ [g(XT ) | Ft ] := f (t, Xt ) where f (t, x) =
Ɛ g(XTt,x ), as in equation (81). Thus, by the section Let P = (P 1 , · · · , P k ) be a vector of independent
Projective Continuous SDE SFTS, equations (72) Poisson processes P i with intensities λi > 0. For
and (73) hold and δ as defined in equation (70) any C 1 in t function u(t, p), p ∈ k , the process
is an SFTS for (X, 1). Therefore by numéraire u(t, P ) = (u(t, Pt )) is a finite activity semimartin-
invariance, δ is an SFTS for A with price process gale, and using [P i , P j ] = 0, one has
 u(t, P ) =
i
C = Am F . The homogeneity of h(a) further implies i i u(t, P− )P , where
CT = Am T g(XT ) = h(AT ).
We have thus constructed an SFTS with the given i u(t, p) := u(t, p1 , . . . , pi + 1, . . . , pn ) − u(t, p)
payoff h(AT ). As in the section Example: Projec- (88)
tive Deterministic Volatility or Projective Continuous denotes the ith forward partial difference of u(t, p)
SDE SFTS, we ensure its boundedness by requir- in p. This in turn readily implies
ing the x-partial derivatives of g(x) or equivalently
a-partial derivatives of h(a) (as L1loc functions) be ∂u  k

bounded and thereby get unique pricing. For (very) du(t, P ) = (t, P− )dt + i u(t, P− )dP i (89)
∂t i=1
low dimensions n, the PDE (83) is suitable for
numerical valuation in the absence of a closed-form
Let v(p), p ∈ k be a function of exponential
solution.
linear growth. Define the function
Although the option price process and the deltas
are already found, let us also consider the homoge- ∞
neous option price function referred to in the section 
u(t, p) : = v(p + q)
Self-financing Trading Strategies, and now naturally
q1 ,...,qk =0
defined by

k q
λi
× i
(T − t)qi e−λi (T −t) (p ∈ k )
a1 an qi !
c(t, a) := am f t, m , . . . , m (86) i=1
a a (90)

Clearly, u(T , p) = v(p). Since the unconditional Since


distribution of PT −t is Poisson and is the same as the
distribution of PT − Pt conditioned on Ft , we have
∂xi  k
(t, p) = − xi (t, p) (eβij − 1)λj ,
u(t, p) = Ɛ[v(p + PT − Pt )] ∂t j =1

= Ɛ[v(p + PT − Pt ) | Ft ] (91) j xi (t, p) = xi (t, p)(eβij − 1) (97)


Hence, u(t, Pt ) = Ɛ[v(PT ) | Ft ]. (Intuitively,
u(t, p) = Ɛ[v(PT ) | Pt = p].) Thus, the process it follows from equation (89) (or easily also from
Itô’s formula) that
F = (Ft ), Ft := u(t, Pt ) = Ɛ[v(PT ) | Ft ] (92)

is a martingale. But so are P j − λj t. Therefore in 


k
dX i = X−
i
(eβij − 1)
view of equation (89), it follows that
j =1


k
× d(P j − λj t) (Xti := xi (t, Pt )) (98)
dF = i u(t, P− )d(P i − λi t) (93)
i=1
 Letβil α = (αij ) be any n × k matrix such that
and u(t, p) satisfies the equation i (e − 1)αij = δj l , all 1 ≤ j, l ≤ k. Then

∂u  k

n
(t, Pt− ) + λi i u(t, Pt− ) = 0 (94) dX i
∂t d(P j − λj t) = αij i
(99)
i=1
i=1
X−
Since FT = v(PT ) and F0 = u(0, 0), combining
equations (90) and (93) yields the following repre- Now let g(x), x ∈ n+ , be a function of linear
sentation: growth; define the function


 
k q
λi v(p) := g(x1 (T , p), . . . , xn (T , p)), (p ∈ n )
qi −λi T
v(PT ) = v(q1 , . . . , qk ) i
T e (100)
q1 ,...,qk =0 i=1
qi !
k 
 T
and the function u(t, p) by equation (90). By
+ i u(t, Pt− )d(Pti − λi t) (95) the section Multivariate Poisson Predictable Representation,
i=1 0 F := (u(t, Pt )) is a martingale with
FT = v(PT ) = g(XT ) and is represented as equation
(93). Substituting equation (99) into equation (93)
Projective Exponential-Poisson SFTS yields
Let P = (P 1 , · · · , P k ) be a vector of independent 
n
dF = δ i dX i (101)
Poisson processes P j with intensities λj > 0. Let
i=1
X0 ∈ n+ , n ≥ k, and β = (βij ) be an n × k matrix
such that the n × k matrix (eβij − 1) has full rank.
where
Then the processes X i := (xi (t, Pt )), i = 1, · · · , n,
1 
k
are square-integrable martingales (in fact in all Hp ),
δti := αij j u(t, Pt− ) (102)
where i
Xt− j =1
 
k
Thus, δ = (δ 1 , · · · , δ m ) is an 
SFTS for (X, 1)
xi (t, p) : = X0i exp  (βij pj − (eβij − 1)λj t)
j =1
where m := n + 1 and δ m := F − ni=1 δ i X i .
It is more desirable to express δ in terms of X.
(p ∈  ) k
(96) One has u(t, p) = f (t, x(t, p)), where

      
XT XT
f (t, x) : = Ɛ g x =Ɛ g x | Ft
Xt Xt
 ∞   n n q
n
(β q −(eβ1j −1)λj (T −t)) (β q −(eβnj −1)λj (T −t)) λi i
= g x1 e j =1 1j j ,. . . , xn e j =1 nj j (T − t)qi e−λi (T −t)
q ,···,q =0 i=1
q i !
1 n

(103)

The equalities follow from the definition of v(p) the last equality following  from equation  (98).
n
above and of u(t, p) in equation (90) together with However, the n × n matrix l=1 (e
βil
− 1 (eβj l −
the two formulae following it.c We clearly have 1)λl )ni,j =1 is nonsingular. Therefore, θ i = 0, that
f (T , x) = g(x) and is, δ̂ i = δ i for i ≤ n, implying δ̂ m = δ m too
as F̂ = F .
Ft := u(t, Pt ) = f (t, Xt ) = Ɛ[g(XT ) | Ft ] (104)
One shows, as in the section Exponential-Poisson
Since u(t, p) = f (t, x(t, p)), the deltas in equa- Exchange Option Model, that the processes δ i are
tion (102) are given by partial differences of f (t, x) bounded if γi (x) are bounded, where
as

1 
k
δti = δi (t, Xt− ) where
γi (x) : = αij (g(eβ1j x1 , . . . , eβnj xn ) − g(x)),
xi j =1
1 
k
δi (t, x) : = αij (f (t, eβ1j x1 , · · · , eβnj xn )
xi j =1 
n
γm (x) : = g(x) − γi (x)xi (107)
− f (t, x)) (105) i=1

We have unique pricing since (X, 1) is arbitrage


free (as X i are martingales). Specifically, if δ̂ is
Homogeneous Exponential-Poisson SFTS
another SFTS  for (X, 1) with payoff F̂T = g(XT ),
then F̂ := ni=1 δ̂ i X i + δ̂ m = F provided that either
Let A > 0 be an m-dimensional semimartingale with
all δ̂ i , i ≤ n are bounded or all δ̂ i − δ i , i ≤ n are
A− > 0 and set X := (Ai /Am )ni=1 , n := m − 1, as
bounded.
Indeed, then F̂ = F̂0 + δ̂ · X is a martingale, since before. Assume that
X is square integrable (in the second case, also use
that F is a martingale). Hence, F̂ = F as F̂T = FT . 
k
j j
Moreover, if k = n we have unique hedging, that dXti = Xt−
i
(eβij − 1)(dPt − λt dt) (108)
is, δ̂ = δ for any bounded SFTS δ̂ for (X, 1) with j =1

payoff F̂T = g(XT ). Indeed, F̂ = F , as before; thus,


setting θ i := δ̂ i − δ i gives where 1 ≤ k ≤ n, βij are constants with the n ×
k matrix (eβij − 1) of full rank, λj > 0 are

n bounded predictable processes, and P j are semi-
0 = d
F̂ − F = θ i θ j d
X i , X j martingales with [P j , P l ] = 0 for j  = l such
j
i,j =1 that [P j ] = P j , P0 = 0, and P j − κ j dt are

n 
n local martingales for some locally bounded pre-
j
= θ i θ j X−
i
X− (eβil − 1)(eβj l − 1)λl dt dictable
processes κ j > 0. Assume
further that
 j 2
i,j =1 l=1 k  T λt j
Ɛ exp j =1 0 j −1 κt dt < ∞.
(106) κt

Owing to the above growth condition, the positive If the support of At is a proper surface, for example, if
local martingale m = 2 and A2 is deterministic as in the Black–Scholes
model or A2t = a2 (t, A1t ) as in Markovian short-rate mod-
  els, then obviously there exist infinitely many nonhomo-
k 
 j geneous functions ĉ(t, a) such that Ct = ĉ(t, At ). (Such
λ
M : = E − 1 (dP − κ dt)
j j
a homogeneous function also exists under some assump-
j =1
κj tions as in the section Homogeneous Continuous Markovian
k  j j  SFTS.)
− (λ −κ )dt
 n
λ j b.
Indeed, first assume that A is arbitrage free and let S
=e j =1 (1+ s
− 1 Psj ) m
j =1
κ j
s be a state price density. The martingale M := SA m
s≤· Ɛ[S0 A0 ]
(109) clearly satisfies Ɛ MT = 1. Hence, the equivalent measure
 defined by d = MT d is a probability measure.
i
Since MXi = SA m is a martingale, Xi is a -
is a martingale. Define the measure  by d  = Ɛ[S0 A0 ]
i
martingale by Bayes’ rule. Conversely, assume  that X are
MT d. As inthe section Exponential-Poisson Model d 
Uniqueness, λj dt are the -compensator of P j . -martingales for some . Define Mt := Ɛ | Ft >
d
This, equation (108), and boundedness of λj imply 0. Then (the right continuous version of) M = (Mt ) is
a martingale (so M− > 0). By Bayes’ rule MXi are
that X i are -square integrable martingales. Thus, A
martingales since Xi are -martingales. Set S := M/Am .
is arbitrage free. As before, the SDE (108) integrates Then S, S− > 0 and SAi = MXi . Thus S is a deflator,
to k j
t j as desired. Further, since SC is a martingale for any
β P −(eβij −1) λ ds bounded SFTS δ, by the Bayes’ rule SC/M = C/Am is
Xti = X0i e j =1 ij t 0 s (110)
a -martingale.
c.
Now assume λj are constant. Then P j are - The
  projective
 option price function f (t, x) :=
Poisson processes with intensities λj and are inde- Ɛ g x XT , also encountered for the log-Gaussian case
Xt
pendent since [P j , P l ] = 0, j  = l. Let h(a), a ∈ in equation (75), satisfies f (t, Xt ) = Ɛ[g(XT ) | Ft ] in gen-
m eral when X is the exponential of any n-dimensional
+ , be a homogeneous function of linear growth.
Define g(x) := h(x, 1), x ∈ n+ . The section Pro- process of independent increments (inhomogeneous Lévy
process), but we no longer have hedging in general.
jective Exponential-Poisson SFTS applied under 
nthati δ i given by equation (105) (with δ =
m
implies
References
F − i=1 δ X ) is an SFTS for (X, 1) with price pro-
cess F = (f (t, Xt )) satisfying FT = g(XT ), where
[1] Black, F. & Scholes, M. (1973). The pricing of options
f (t, x) is defined explicitly by equation (103), and corporate liabilities, Journal of Political Economy
or equivalently, f (t, x) = Ɛ g(xXT /Xt ). There- 81, 637–659.
fore, by numéraire invariance, δ is an SFTS for [2] Delbaen, F. & Schachermayer, W. (2006). The Mathe-
A with price process C := Am F satisfying CT = matics of Arbitrage, Springer.
Am g(XT ) = h(AT ) by homogeneity. [3] Duffie, D. (2001). Dynamic Asset Pricing Theory, 3rd
Assume finally that the payoff function h(a) is Edition, Princeton University Press.
[4] El-Karoui, N., Geman, H. & Rochet, J.C. (1995).
such that the functions γi (x) defined in equation Change of numeraire, change of probability measure,
(107) are bounded (e.g., h(a) = max(a 1 , · · · , a m )). and option pricing, Journal of Applied Probability 32,
By the section Projective Exponential-Poisson SFTS, 443–458.
if k = n, then δ is the unique bounded SFTS for [5] Harrison, M.J. & Kreps, D.M. (1979). Martingales and
A with payoff CT = h(AT ). In general, since A arbitrage in multiperiod securities markets, Journal of
is arbitrage free, Ĉ = C for any other bounded Economic Theory 20, 381–408.
[6] Harrison, M.J. & Pliska, S. (1981). Martingales and
 i δ̂i for A with payoff ĈT = h(AT ), where Ĉ :=
SFTS stochastic integrals in the theory of continuous trad-
i δ̂ A . ing, Stochastic Processes and their Applications 11,
215–260.
[7] Jamshidian, F. (1993). Options and futures evaluation
End Notes with deterministic volatilities, Mathematical Finance
3(2), 149–159.
a.
Clearly, then the restriction of (any such) c(t, a) to the [8] Margrabe, W. (1978). The value of an option to
support of A is unique, and if ĉ(t, a) is any function that exchange one asset for another, Journal of Finance 33,
equals c(t, a) on the support of A, then Ct = ĉ(t, At ) too. 177–186.
[9] Merton, R. (1973). Theory of rational option pricing, Bell Journal of Economics 4(1), 141–183.
[10] Neuberger, A. (1990). Pricing Swap Options Using the Forward Swap Market, IFA Preprint.

Related Articles

Arbitrage Strategy; Caps and Floors; CMS Spread Products; Equivalent Martingale Measures; Foreign Exchange Options; Forward–Backward Stochastic Differential Equations (SDEs); Hedging; Itô’s Formula; Markov Processes; Martingales; Poisson Process.

FARSHID JAMSHIDIAN

Binomial Tree

This model, introduced by Cox et al. [1] in 1979, has played a decisive role in the development of the derivatives industry. Its simple structure and easy implementation gave analysts the ability to price a huge range of financial derivatives in an almost routine way. Nowadays its value is largely pedagogical, in that the whole theory of arbitrage pricing in complete markets can be explained in a couple of pages in the context of the binomial model. The model is covered in every mathematical finance textbook, but we mention in particular [4], which is entirely devoted to the binomial model, and [2] for a careful treatment of American options.

The One-period Model

Suppose we have an asset whose price is $S$ today and whose price tomorrow can only be one of two known values $S_0$, $S_1$ (we take $S_0 > S_1$); see Figure 1. This apparently highly artificial situation is the kernel of the binomial model. We also suppose there is a bank account paying a daily rate of interest $r_1$, so that one dollar today is worth $R = 1 + r_1$ dollars tomorrow. We assume that borrowing is possible from the bank at the same rate of interest $r_1$, and that the risky asset can also be borrowed (sold short, in the usual financial terminology). The only other assumption is that

$$S_1 < RS < S_0 \tag{1}$$

If $RS \le S_1$, we could borrow an amount $B$ from the bank and buy $B/S$ shares of the risky asset. Tomorrow these will be worth at least $S_1 B/S$, while only $RB$ has to be repaid to the bank, leaving a profit of either $B(S_1 - RS)/S$ or $B(S_0 - RS)/S$. Both of these are nonnegative and at least one is strictly positive. This is an arbitrage opportunity: no initial investment, no loss, and the chance of a positive profit at the end (see Arbitrage Strategy). There is also an arbitrage opportunity if $RS \ge S_0$, realized by short-selling the risky asset.

A derivative security, contingent claim, or option is a contract that pays tomorrow an amount that depends only on tomorrow’s asset price. Thus any such claim can only have two values, say $O_0$ and $O_1$, corresponding to “underlying” prices $S_0$, $S_1$, as shown in Figure 1.

Figure 1 One-period binomial tree: from $S$ today, the price moves to $S_0$ (claim value $O_0$) or to $S_1$ (claim value $O_1$)

Suppose we form a portfolio today consisting of $N$ shares of the risky asset and an amount $B$ in the bank (either or both of $N$, $B$ could be negative). The value today of this portfolio is $p = B + NS$ and its value tomorrow will be $RB + NS_0$ or $RB + NS_1$. Now choose $B$, $N$ such that

$$RB + NS_0 = O_0, \qquad RB + NS_1 = O_1 \tag{2}$$

There is a unique solution as long as $S_1 \ne S_0$, given by

$$N^* = \frac{O_0 - O_1}{S_0 - S_1}, \qquad B^* = \frac{1}{R}\left(O_0 - N^* S_0\right) \tag{3}$$

With these choices, the portfolio value tomorrow exactly coincides with the derivative security payoff, whichever way the price moves. If the derivative security is offered today for any price other than $p = B^* + N^* S$, there is an arbitrage opportunity (realized by “borrowing the portfolio” and buying the option, or conversely). Thus “arbitrage pricing” reduces to the solution of a pair of simultaneous linear equations.

It is easily checked that $p = (q_0 O_0 + q_1 O_1)/R$, where

$$q_0 = \frac{RS - S_1}{S_0 - S_1}, \qquad q_1 = \frac{S_0 - RS}{S_0 - S_1} \tag{4}$$

We see that $q_0$, $q_1$ depend only on the underlying market parameters, not on $O_0$ or $O_1$, that $q_0 + q_1 = 1$, and that $q_0, q_1 > 0$ if and only if the no-arbitrage condition (1) holds. Thus under this condition $q_0$, $q_1$ define a probability measure $Q$ and we can write the price of the derivative as

$$p = E_Q\left[\frac{1}{R}\,O\right] \tag{5}$$
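As a concrete check of equations (2)–(5), the following minimal sketch solves the one-period hedging problem in exact rational arithmetic. The function name `one_period_hedge` and the numerical inputs (a call struck at 100 on an asset that moves from 100 to 110 or 95, with $R = 1.01$) are illustrative assumptions, not values taken from the article.

```python
from fractions import Fraction


def one_period_hedge(S, S0, S1, R, O0, O1):
    """Replicating portfolio and price in the one-period binomial model.

    Hypothetical helper for illustration; symbols follow equations (1)-(5).
    Returns (N, B, q0, q1, price).
    """
    S, S0, S1, R, O0, O1 = (Fraction(x) for x in (S, S0, S1, R, O0, O1))
    if not (S1 < R * S < S0):
        raise ValueError("no-arbitrage condition S1 < R*S < S0 is violated")
    N = (O0 - O1) / (S0 - S1)        # shares of the risky asset, equation (3)
    B = (O0 - N * S0) / R            # amount in the bank, equation (3)
    q0 = (R * S - S1) / (S0 - S1)    # risk-neutral weights, equation (4)
    q1 = (S0 - R * S) / (S0 - S1)
    price = B + N * S                # value of the portfolio today
    return N, B, q0, q1, price


# Illustrative (assumed) inputs: call with strike 100, so O0 = 10 and O1 = 0.
R = Fraction(101, 100)
N, B, q0, q1, price = one_period_hedge(100, 110, 95, R, 10, 0)
print(N, B, q0, q1, price)   # 2/3 -19000/303 2/5 3/5 400/101
```

The portfolio cost $B + NS$ and the discounted $Q$-expectation $(q_0 O_0 + q_1 O_1)/R$ agree exactly here (both equal 400/101), which is the content of equation (5).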
Note that $Q$, the so-called risk-neutral measure, emerges from the “no-arbitrage” argument. We said nothing in formulating the model about the probability of an upward or downward move, and the above argument does not imply that this probability has to be given by $Q$. A further feature of $Q$ is that if we compute the expected price tomorrow under $Q$ we find that

$$S = \frac{1}{R}\left(q_0 S_0 + q_1 S_1\right) \tag{6}$$

showing that the discounted price process is a $Q$-martingale. This is summarized as follows:

• Under condition (1) there is a unique arbitrage-free price for the contingent claim.
• Condition (1) is equivalent to the existence of a unique probability measure $Q$ under which the discounted asset price is a martingale.
• The contingent claim value is obtained by computing the discounted expectation of its exercise value with respect to a certain probability measure $Q$.

Much of the classic theory of mathematical finance (see Fundamental Theorem of Asset Pricing; Risk-neutral Pricing) is concerned with identifying conditions under which these three statements hold for more general price models. They hold in particular for the multiperiod models discussed below.

The Multiperiod Model

More realistic models can be obtained by generalizing the binomial model to $n$ periods. We consider a discrete-time price process $S(i)$, $i = 0, \ldots, n$, such that, at each time $i$, $S(i)$ takes one of $i + 1$ values $S_{i0} > S_{i1} > \cdots > S_{ii}$. While we could consider general values for these constants, the most useful case is that in which the price moves “up” by a factor $u$ or “down” by a factor $d = 1/u$, giving a recombining tree with $S_{ij} = S u^{i-2j}$ where $S = S(0)$; see Figure 2 for the two-period case. We can define a probability measure $Q$ by specifying that $P[S(i+1) = uS(i) \mid S(i)] = q_0$ and $P[S(i+1) = dS(i) \mid S(i)] = q_1$, where $q_0$ and $q_1$ are given by equation (4) above; in this case, $q_0 = (Ru - 1)/(u^2 - 1)$, $q_1 = 1 - q_0$. Thus $S(i)$ is a discrete-time Markov process under $Q$ with homogeneous transition probabilities. Specifically, it is a multiplicative random walk in that each successive value is obtained from the previous one by multiplication by an independent positive random factor.

Figure 2 Two-period binomial tree: $S$ moves to $uS$ or $dS$ at time 1 and to $u^2S$, $S$, or $d^2S$ at time 2, with claim values $O_0$, $O_1$, $O_2$

Consider the two-period case of Figure 2 and a contingent claim with exercise value $O$ at time 2, where $O = O_0, O_1, O_2$ in the three states as shown. By the one-period argument, the no-arbitrage price for the claim at time 1 is $v_{1,0} = (q_0 O_0 + q_1 O_1)/R$ if the price is $uS$ and $v_{1,1} = (q_0 O_1 + q_1 O_2)/R$ if the price is $dS$. However, now our contingent claim is equivalent to a one-period claim with payoff $v_{1,0}$, $v_{1,1}$, so its value at time 0 is just $(q_0 v_{1,0} + q_1 v_{1,1})/R$, which is equal to

$$v_{0,0} = E_Q\left[\frac{1}{R^2}\,O\right] \tag{7}$$

Generalizing to $n$ periods and a claim that pays amounts $O_0, \ldots, O_n$ at time $n$, the value at time 0 is

$$v_{0,0} = E_Q\left[\frac{1}{R^n}\,O\right] = \frac{1}{R^n}\sum_{j=0}^{n} C_j^n\, q_0^{\,n-j} q_1^{\,j}\, O_j \tag{8}$$

where $C_j^n$ is the binomial coefficient $C_j^n = n!/(j!\,(n-j)!)$. From equation (3) the initial hedge ratio (the number $N$ of shares in the hedging portfolio at time 0) is

$$N = \frac{v_{1,0} - v_{1,1}}{uS - dS} = \frac{1}{S R^{n-1}(u - d)}\sum_{j=0}^{n-1} C_j^{n-1}\, q_0^{\,n-1-j} q_1^{\,j}\,(O_j - O_{j+1}) \tag{9}$$

For example, suppose $S = 100$, $R = 1.001$, $u = 1.04$, $n = 25$, and $O$ is a call option with strike $K = 100$, so that $O_j = [S u^{n-2j} - K]^+$. The option value
is $v_{0,0} = 9.086$ and $N = 0.588$. The initial holding in the bank is therefore $v_{0,0} - NS = -49.72$. This is the typical situation: hedging involves leverage (borrowing from the bank to invest in shares).

Scaling the Binomial Model

Now let us consider scaling the binomial model to a continuous limit. Take a fixed time horizon $T$ and think of the price $S(i)$ above, now written $S_n(i)$, as the price at time $iT/n = i\,\Delta t$. Suppose the continuously compounding rate of interest is $r$, so that $R = e^{r\Delta t}$. Finally, define $h = \log u$ and $X(i) = \log(S(i)/S(0))$; then $X(i)$ is a random walk on the lattice $\{\ldots, -2h, -h, 0, h, \ldots\}$ with right and left probabilities $q_0$, $q_1$ as defined earlier and $X(0) = 0$. If we now take $h = \sigma\sqrt{\Delta t}$ for some constant $\sigma$, we find that

$$q_0,\, q_1 = \frac{1}{2} \pm \frac{h}{2\sigma^2}\left(r - \frac{1}{2}\sigma^2\right) + O(h^2) \tag{10}$$

Thus $Z(i) := X(i) - X(i-1)$ are independent random variables with

$$E\,Z(i) = \frac{h^2}{\sigma^2}\left(r - \frac{1}{2}\sigma^2\right) + O(h^3) = \left(r - \frac{1}{2}\sigma^2\right)\Delta t + O(n^{-3/2}) \tag{11}$$

$$\operatorname{var}(Z(i)) = \sigma^2 \Delta t + O(n^{-2}) \tag{12}$$

Hence $X_n(T) := X(n) = \sum_{i=1}^{n} Z(i)$ has mean $\mu_n$ and variance $V_n$ such that $\mu_n \to (r - \sigma^2/2)T$ and $V_n \to \sigma^2 T$ as $n \to \infty$. By the central limit theorem, the distribution of $X_n(T)$ converges weakly to the normal distribution with the limiting mean and variance. If the contingent claim payoff is a continuous bounded function $O = h(S_n(n))$, then the option value converges to a normal expectation that can be written as

$$V_0(S) = \frac{e^{-rT}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} h\!\left(S \exp\!\left(\left(r - \frac{1}{2}\sigma^2\right)T + \sigma\sqrt{T}\,x\right)\right) e^{-\frac{1}{2}x^2}\,dx \tag{13}$$

This is the Black–Scholes formula. It can be given in more explicit terms when, for example, $h(S) = [S - K]^+$, the standard call option (see Black–Scholes Formula).

American Options

In the multiperiod binomial model, the basic computational step is the backward recursion

$$v_{i-1,j} = \frac{1}{R}\left(q_0 v_{i,j} + q_1 v_{i,j+1}\right) \tag{14}$$

defining the values at time step $i - 1$ from those at time $i$ by discounted conditional expectation, starting with the exercise values $v_{n,j} = O_j$ at the final time $n$. In an American option, we have the right to exercise at any time, the exercise value at time $i$ being some given function $h(i, S_i)$, for example, $h(i, S_i) = [K - S_i]^+$ for an American put. The exercise value at node $(i, j)$ in the binomial tree is therefore $\tilde h(i, j) = h(i, S u^{i-2j})$. In this case, it is natural to replace equation (14) by

$$v_{i-1,j} = \max\{v^c_{i-1,j},\ \tilde h(i-1, j)\} \tag{15}$$

where $v^c_{i-1,j}$ is given by the right-hand side of equation (14). At each node $(i-1, j)$, we compare the “continuation value” $v^c_{i-1,j}$ with the “immediate exercise” value $\tilde h(i-1, j)$ and take the larger value. This intuition is correct, and the value $v_{0,0}$ obtained by applying equation (15) for $i = n, n-1, \ldots, 1$ and with starting condition $v_{n,j} = \tilde h(n, j)$ is the unique arbitrage-free value of the American option at time 0.

The reader should refer to American Options for a complete treatment, but, in outline, the argument establishing the above claim is as follows. The algorithm divides the set of nodes into two, the stopping set $\mathcal{S} = \{(i, j) : v_{i,j} = \tilde h(i, j)\}$ and the complementary continuation set $\mathcal{C}$. By definition, $(n, j) \in \mathcal{S}$ for $j = 0, \ldots, n$. Let $\tau^*$ be the stopping time $\tau^* = \min\{i : S_i \in \mathcal{S}\}$. Then $\tau^*$ is the optimal time at which the holder of the option should exercise. The process $V_i = v_{i,S_i}/R^i$ is a supermartingale, while the stopped process $V_{i\wedge\tau^*}$ is a martingale with the property that $V_{i\wedge\tau^*} \ge h(i\wedge\tau^*, S_{i\wedge\tau^*})/R^i$. These facts follow from the general theory of optimal stopping, but are not hard to establish directly in the present case. The value $V_{i\wedge\tau^*}$ can be replicated by trading in the underlying asset (using the basic hedging strategy (3)
4 Binomial Tree

derived for the one-period model). It follows that this where F0,t is the forward price quoted at time 0 for
strategy (call it SR) is the cheapest superreplicating exchange at time t and
strategy, that is, x = v0,0 is the minimum capital  
required to construct a trading strategy with value Xi 1
Mt = exp σ Wt − σ 2 t (17)
at time i with the property that Xi ≥ h(i, Si ) for all i 2
almost surely. If the seller of the option is paid more
than v0,0 , then he or she can put the excess in the is the exponential martingale with Brownian motion
bank and employ the trading strategy SR, which is guaranteed to cover his or her obligation to the buyer whenever he or she chooses to exercise. Conversely, if the seller will accept $p < v_{0,0}$ for the option then the buyer should short SR, obtaining an initial value $v_{0,0}$ of which $p$ is paid to the seller and $v_{0,0} - p$ placed in (for clarity) a second bank account. The short strategy has value $-X_i$ and the buyer exercises at $\tau^*$, receiving from the seller the exercise value $h(\tau^*, S_{\tau^*}) = X_{\tau^*}$, which is equal and opposite to the value of the short hedge at $\tau^*$. Thus, there is an arbitrage opportunity for one party or the other unless the price is $v_{0,0}$.

The impact of the binomial model as introduced by Cox et al. [1] is largely due to the fact that the European option pricer can be turned into an American option pricer by a trivial one-line modification of the code. Pricing American options in (essentially) the Black–Scholes model was recognized as a free-boundary problem in partial differential equations (PDE) by McKean [3] in 1965, but the only computational techniques were PDE methods (see Finite Difference Methods for Early Exercise Options) generally designed for much more complicated problems.

Computations in the Binomial Model

Nowadays the binomial model is rarely, if ever, used for practical problems, largely because it is comprehensively outperformed by the trinomial tree (see Tree Methods).

First, the form of the tree given above is probably not the best if we want to regard the tree as an approximation to the Black–Scholes model. We see from equation (10) that the risk-neutral probabilities $q_0, q_1$ depend on $r$, so if we want to calibrate the model to the market yield curve we will need time-varying $q_0, q_1$. This can be avoided if we write the Black–Scholes model as

$$S_t = F_{0,t} M_t \tag{16}$$

$W_t$. See Black–Scholes Formula for this representation. $F_{0,t}$ only depends on the spot price $S_0$ and the yield curve (and the dividend yield, if any), so the only stochastic modeling required relates to the Brownian motion $\sigma W_t$. Here we can use a standard "symmetric random walk" approximation: divide the time interval $[0, T]$ into $n$ intervals of length $\delta = T/n$ and take a space step of length $h = \sigma\sqrt{\delta}$. At each discrete time point, the random walk (denoted $X_i$) takes a step of $\pm h$ with probability 1/2 each; this is just a binomial tree with equal up and down probabilities. For a single step $Z = X_i - X_{i-1} = \pm h$ we have $E[e^Z] = \cosh h$, so if we define $\alpha = \log(\cosh h)$ then $M_i^{(n)} = \exp(X_i - \alpha i)$ is a positive discrete-time martingale with $E[M_i^{(n)}] = 1$. It is a standard result that the sequence $M^{(n)}$ (suitably interpreted) converges weakly to $M$ given by equation (17) as $n \to \infty$. This gives us a discrete-time model

$$S_i^{(n)} = F_{0,i\delta} M_i^{(n)} \tag{18}$$

such that $E[S_i^{(n)}] = F_{0,i\delta}$ holds exactly at each $i$. At node $(i, j)$ in the tree the corresponding price is $F_{0,i\delta} \exp((n - j)h - i\alpha)$. Essentially, we have replaced the original multiplicative random walk representing the price $S(t)$ by an additive random walk representing the return process $\log S(t)$. The advantages of this are (i) all the yield curve aspects are bundled up in the model-free function $F$, and (ii) the stochastic model is "universal" (and very simple).

The decisive drawback of any binomial model is the absolute inflexibility with respect to volatility: it is impossible to maintain a recombining tree while allowing time-varying volatility. This means that the model cannot be calibrated to more than a single option price, making it useless for real pricing applications. The trinomial tree gets around this: we can adjust the local volatility by changing the transition probabilities while maintaining the tree geometry (i.e., the constant spatial step $h$).
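The symmetric random walk tree described above, together with the "one-line modification" that turns a European pricer into an American one, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `tree_put` is hypothetical, the yield curve is taken to be flat so that the forward is $F_{0,t} = S_0 e^{rt}$, there are no dividends, and nodes are indexed by the number of down-steps `j` (so $X_i = (i - 2j)h$), which may differ from the node labelling used in the text.

```python
import math


def tree_put(S0, K, r, sigma, T, n, american=False):
    """Put price on the symmetric random-walk tree S_i = F_{0,i*delta} * exp(X_i - i*alpha).

    Assumptions (not from the article): flat rate r, no dividends,
    node (i, j) means j down-steps out of i, i.e. X_i = (i - 2*j) * h.
    """
    delta = T / n
    h = sigma * math.sqrt(delta)        # space step h = sigma * sqrt(delta)
    alpha = math.log(math.cosh(h))      # makes exp(X_i - i*alpha) a martingale
    disc = math.exp(-r * delta)         # one-step discount factor

    def price(i, j):
        forward = S0 * math.exp(r * i * delta)   # F_{0,i*delta} for a flat curve
        return forward * math.exp((i - 2 * j) * h - i * alpha)

    # Terminal pay-offs at time step n.
    values = [max(K - price(n, j), 0.0) for j in range(n + 1)]

    # Backward induction with up/down probability 1/2 each.
    for i in range(n - 1, -1, -1):
        values = [disc * 0.5 * (values[j] + values[j + 1]) for j in range(i + 1)]
        if american:
            # The one-line modification: compare with immediate exercise.
            values = [max(v, K - price(i, j)) for j, v in enumerate(values)]
    return values[0]


if __name__ == "__main__":
    european = tree_put(100.0, 100.0, 0.05, 0.2, 1.0, 500)
    american = tree_put(100.0, 100.0, 0.05, 0.2, 1.0, 500, american=True)
    print(f"European put: {european:.4f}, American put: {american:.4f}")
```

Because the interest rate enters only through the forward and the discount factor, replacing the flat curve by a market yield curve would change `forward` and `disc` but leave the probabilities at 1/2, which is the point of the representation (16).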
References

[1] Cox, J., Ross, S. & Rubinstein, M. (1979). Option pricing: a simplified approach, Journal of Financial Economics 7, 229–263.
[2] Elliott, R.J. & Kopp, P.E. (2005). Mathematics of Financial Markets, 2nd Edition, Springer.
[3] McKean, H.P. (1965). Appendix to P.A. Samuelson, rational theory of warrant pricing, Industrial Management Review 6, 13–31.
[4] Shreve, S.E. (2005). Stochastic Calculus for Finance, Vol 1: The Binomial Asset Pricing Model, Springer.

Related Articles

Black–Scholes Formula; Quantization Methods; Tree Methods.

MARK H.A. DAVIS
American Options

An American option is a contract between the seller and the buyer. It is characterized by a nonnegative random function of time $Z$ and a maturity. The option can be exercised at any time $t$ between the initial date and the maturity. If the buyer exercises the option at time $t$, he/she receives the amount of money $Z(t)$ at time $t$. The buyer may exercise the option only once before the maturity. The price of an American option is always greater than or equal to the price of the corresponding European option (see Black–Scholes Formula). Indeed, the buyer of an American option gets more rights than the holder of a European option, as he/she may exercise the option at any time before the maturity. This is the right of early exercise, and the difference between the American and European option prices is called the early exercise premium. The basic American options are American call and put options (see Call Options): they allow the buyer to buy (call) or sell (put) a financial asset at a price $K$ (the strike price) on or before a date (the maturity) agreed in advance. The function $Z$ associated with the call (respectively put) option is then $Z(t) = (S_t - K)^+$ (respectively $Z(t) = (K - S_t)^+$), where $S_t$ is the value at time $t$ of the underlying financial asset.

The study of American options began in 1965 with McKean [41], who considered the pricing problem as an optimal stopping problem and reduced it to a free boundary problem. The option value is then computable if one knows the free boundary, called the optimal exercise boundary. In 1976, van Moerbeke [48] exhibited some properties of this boundary. The formalization of the American option pricing problem as an optimal stopping problem was done in the two pioneering works of Bensoussan and Karatzas [5, 32].

They proved that, under no arbitrage and completeness assumptions (see Complete Markets), the value process of an American option is the Snell envelope of the pay-off process, that is, the smallest supermartingale greater than the pay-off process. From previous works on these processes [23], we can derive some properties of the value process. In particular, we obtain a characterization of optimal exercise times. In the section American Option and Snell Envelope, we present the main results and some numerical methods based on this characterization of the option value process.

We can adopt a complementary point of view to study an American option. If we specify the evolution model for the underlying assets of the option, we can characterize the option value as the solution of a variational inequality. This method, introduced by Bensoussan and Lions [6], was applied to American options by Jaillet et al. [31]. We present this variational approach in the section Analytic Properties of American Options. We conclude this survey by giving results on exercise regions. In particular, we recall a formula linking the European and the American option prices known as the early exercise premium formula (see the section Exercise Region).

American Option and Snell Envelope

To price and hedge an American option, we have to choose a model for the financial market. We consider a filtered probability space $(\Omega, \mathbb{F} = (\mathcal{F}_t)_{0 \le t \le T}, \mathbb{P})$, where $T$ is the maturity of our investment, $\mathcal{F}_t$ the information available at time $t$, and $\mathbb{P}$ the historical probability. We assume that the market is composed of $d + 1$ assets: $S^0, S^1, \ldots, S^d$. $S^0$ is a deterministic process representing the time value of money. The others are risky assets such that $S^i_t$ is the value of asset $i$ at time $t$. In this section, we assume that the market does not offer arbitrage opportunities and is complete (see Complete Markets). Harrison and Pliska [28] observed that the no arbitrage assumption is equivalent to the existence of a probability measure equivalent to the historical one under which the discounted asset price processes are martingales. In a complete market, such a probability measure is unique and called the risk-neutral probability measure (see Risk-neutral Pricing). We will denote it by $\mathbb{P}^*$.

American Option Pricing

We present the problems linked to the American option study. The first one is the option pricing. An American option is characterized by an adapted and nonnegative process $(Z_t)_{t \ge 0}$, which represents the option pay-off if its owner exercises it at time $t$. We generally define $Z$ as a function of one or several underlying assets. For instance, for a call option with strike price $K$, we have $Z_t = (S_t - K)^+$
or for a put option on the minimum of two assets, we have $Z_t = (K - \min(S^1_t, S^2_t))^+$. There also exist options, called Amerasian options, where the pay-off depends on the whole path of the assets, for instance, $Z_t = \left(K - \frac{1}{t}\int_0^t S_u \, du\right)^+$.

Using arbitrage arguments, Bensoussan and Karatzas [5, 32] have shown that the discounted American option value at time $t$ is the Snell envelope of the discounted pay-off process [19, 43]. For the definition and general properties of the Snell envelope, we refer to [23] for continuous time and to [44] for discrete time. We can then assert that the price at time $t$ of an American option with pay-off process $Z$ and maturity $T$ is

$$P_t = \operatorname*{ess\,sup}_{\tau \in \mathcal{T}_{t,T}} \mathbb{E}^*\left[\frac{S^0_t}{S^0_\tau} Z_\tau \,\Big|\, \mathcal{F}_t\right] \tag{1}$$

where $\mathcal{T}_{t,T}$ is the set of $\mathbb{F}$-stopping times with values in $[t, T]$.

The second problem appearing in the option theory consists in determining a hedging strategy for the option seller (see Hedging). The solution follows directly from the Snell envelope properties. Indeed, if $X$ is a process, we will denote the discounted process by $\tilde{X} = X / S^0$ and we have the following result ([35, Corollary 10.2.4]).

Proposition 1 The process $(\tilde{P}_t)_{0 \le t \le T}$ is the smallest right-continuous supermartingale that dominates $(\tilde{Z}_t)_{0 \le t \le T}$.

As $(\tilde{P}_t)_{0 \le t \le T}$ is a supermartingale, it admits a Doob decomposition (see Doob–Meyer Decomposition). There exist a unique right-continuous martingale $(M_t)_{0 \le t \le T}$ and a unique nondecreasing, continuous, adapted process $(A_t)_{0 \le t \le T}$ such that $A_0 = 0$ and $\tilde{P}_t = M_t - A_t$ for all $t \in [0, T]$. This decomposition of $\tilde{P}$ is very useful to determine a surreplication strategy for an American option (see Superhedging). A strategy is defined as a predictable process $(\varphi_t)_{0 \le t \le T}$ such that the value, at time $t$, of the portfolio associated with this strategy is $V_t(\varphi) = \sum_{i=0}^{d} \varphi^i_t S^i_t$. In a complete market, each contingent claim is replicable; hence there exists a self-financing strategy $\varphi$ such that $V_T(\varphi) = S^0_T M_T$. As $\tilde{V}(\varphi)$ is a martingale under the risk-neutral probability, we get $\tilde{V}_t(\varphi) = M_t$ for all $t \in [0, T]$. In conclusion, we have constructed a self-financing strategy such that

$$\forall t \in [0, T], \quad V_t(\varphi) = P_t + S^0_t A_t \ge P_t \tag{2}$$

This is a surreplication strategy for American options. Moreover, for this strategy, the initial wealth for hedging the option is minimal because we have $V_0(\varphi) = P_0$.

The third problem arising in the American option theory is linked to the early exercise opportunity. Contrary to European options, for the American option holder, knowing the arbitrage price of his/her option is not enough. He/she has to know when it is optimal to exercise the option. The tool to study this problem is optimal stopping theory.

Optimal Exercise

We recall some useful results of optimal stopping theory and apply them to the American put option in the Black–Scholes model. These results are proved in [23] in a larger setting and their financial applications have been developed in [35].

An optimal stopping time for an American option holder is a stopping time that maximizes his/her expected discounted gain. Consequently, a stopping time $\nu$ is optimal if we have

$$\mathbb{E}^*[\tilde{Z}_\nu] = \sup_{\tau \in \mathcal{T}_{0,T}} \mathbb{E}^*[\tilde{Z}_\tau] \tag{3}$$

We have a characterization of optimal stopping times, thanks to the following theorem.

Theorem 1 Let $\tau^* \in \mathcal{T}_{0,T}$. $\tau^*$ is an optimal stopping time if and only if $P_{\tau^*} = Z_{\tau^*}$ and the process $(\tilde{P}_{t \wedge \tau^*})_{0 \le t \le T}$ is a martingale.

It follows from this result that the stopping time

$$\tau^* = \inf\{t \ge 0 : P_t = Z_t\} \wedge T \tag{4}$$

is an optimal stopping time and, obviously, it is the smallest one. We can easily determine the largest optimal stopping time by using the Doob decomposition of the supermartingale. We introduce the following stopping time:

$$\nu^* = \inf\{t \ge 0 : A_t > 0\} \wedge T \tag{5}$$

and it is easy to see that $\nu^*$ is the largest optimal stopping time.

We then apply these results to an American put option in the Black–Scholes framework (see Black–Scholes Formula). We assume that the
underlying asset $S$ of the option is solution, under the risk-neutral probability, to the following equation:

$$dS_t = S_t (r \, dt + \sigma \, dW_t) \tag{6}$$

with $r, \sigma > 0$ and $W$ a standard Brownian motion. From the Markov property of $S$, we can deduce that the option price at time $t$ is $P(t, S_t)$, where

$$P(t, x) = \sup_{\tau \in \mathcal{T}_{0,T-t}} \mathbb{E}^*\left[e^{-r\tau} (K - S_\tau)^+ \mid S_0 = x\right] \tag{7}$$

It is easy to see that $t \mapsto P(t, x)$ is nonincreasing for all $x \in [0, +\infty)$. Moreover, for $t \in [0, T]$, the function $x \mapsto P(t, x)$ is convex [24, 29, 30]. From the convexity of $P$, we deduce that there exists a unique optimal stopping time: $\tau^* = \inf\{t \ge 0 : P(t, S_t) = (K - S_t)^+\} \wedge T$. We introduce the so-called critical price or free boundary $s(t) = \inf\{x \in [0, +\infty) : P(t, x) > (K - x)^+\}$ and can write that

$$\tau^* = \inf\{t \ge 0 : S_t \le s(t)\} \wedge T = \inf\{t \ge 0 : W_t \le \alpha(t)\} \wedge T$$

$$\text{with } \alpha(t) = \frac{1}{\sigma}\left[\ln\frac{s(t)}{S_0} - \left(r - \frac{\sigma^2}{2}\right) t\right] \tag{8}$$

Hence, $\tau^*$ is the first time the Brownian motion reaches $\alpha$. If $\alpha$ were known, we could compute $\tau^*$ and then $P$. However, the only way to get the law of $\tau^*$ explicitly is to reduce the dimension by considering options with infinite maturity (also known as perpetual options). In this case, we have the following result ([37, Proposition 4.5]).

Proposition 2 The value function of an American perpetual put option is

$$P^\infty(x) = \sup_{\tau \in \mathcal{T}_{0,+\infty}} \mathbb{E}^*\left[e^{-r\tau} (K - S_\tau)^+ \mathbf{1}_{\{\tau < +\infty\}} \mid S_0 = x\right] \tag{9}$$

and is given by

$$P^\infty(x) = \begin{cases} K - x & \text{if } x \le s^* \\ (K - s^*) \left(\dfrac{s^*}{x}\right)^{\gamma} & \text{if } x > s^* \end{cases} \qquad \text{where } \gamma = \frac{2r}{\sigma^2} \text{ and } s^* = \frac{K\gamma}{1 + \gamma} \tag{10}$$

is the critical price.

Another technique to reduce the dimension of the problem is the randomization of the maturity applied in [9, 13], but only approximations of the option price can be obtained in this way. In the following section, we present methods to approximate $P$ based on the discretization of the problem.

Approximation of the American Option Value

To approximate $P_t$, it is natural to restrict the set of exercise dates to a finite one. We then introduce a subdivision $S = \{t_1, \ldots, t_n\}$ of the interval $[0, T]$ and assume that the option owner can exercise only at a date in $S$. Such options are called Bermuda options and their price at time $t$ is given by

$$P^n_t = \operatorname*{ess\,sup}_{\tau \in \mathcal{T}^n_{t,T}} \mathbb{E}^*\left[S^0_t \tilde{Z}_\tau \mid \mathcal{F}_t\right] \tag{11}$$

where $\mathcal{T}^n_{t,T}$ is the set of $\mathbb{F}$-stopping times with values in $S \cap [t, T]$. We obviously have $\lim_{n \to +\infty} P^n = P$ and some estimates of the error have been given in [1, 15]. For perpetual put options, Dupuis and Wang [21] have obtained a first-order expansion of the error on the value function and on the critical prices. In the case of finite maturity, this problem is still open; we just know that the error is proportional to $\frac{1}{n}$ for the value function and to $\frac{1}{\sqrt{n}}$ for the critical prices [18].

We have to determine $P^n_{t_i}$ for all $i \in \{1, \ldots, n\}$. For this, we use the so-called dynamic programming equation:

$$\begin{cases} P^n_T = Z_T \\ P^n_{t_i} = \max\left(Z_{t_i},\ \mathbb{E}^*\left[\dfrac{S^0_{t_i}}{S^0_{t_{i+1}}} P^n_{t_{i+1}} \,\Big|\, \mathcal{F}_{t_i}\right]\right) \end{cases} \tag{12}$$

This equation is easy to understand with financial arguments. At maturity of the option, it is obvious that the option price $P^n_T$ is equal to the pay-off $Z_T$. At time $t_i < T$, the option holder has two choices: either he/she exercises and earns $Z_{t_i}$, or he/she keeps the option and holds the option value $P^n_{t_{i+1}}$ at time $t_{i+1}$. Hence, using the no arbitrage assumption, one can prove that at time $t_i$ the option seller should receive

$$\max\left(Z_{t_i},\ \mathbb{E}^*\left[\frac{S^0_{t_i}}{S^0_{t_{i+1}}} P^n_{t_{i+1}} \,\Big|\, \mathcal{F}_{t_i}\right]\right) \tag{13}$$

Computing the Bermuda option price consists now in calculating the expectations in the dynamic
programming equation. On the one hand, Monte Carlo techniques have been applied to solve this problem (see Monte Carlo Simulation for Stochastic Differential Equations; Bermudan Options and [11]). More precisely, we can quote some regression methods based on projections on a Hilbert space basis [40, 47], quantization algorithms proposed in [1, 2], and some Monte Carlo methods based on Malliavin calculus [3, 8]. On the other hand, we can use a discrete approximation of the underlying assets process. A widely used model is the Cox, Ross, and Rubinstein model (see Binomial Tree). We introduce a family of independent and identically distributed Bernoulli variables $(U_n)_{1 \le n \le N}$ with values in $\{b, h\}$, where $-1 < b < h$. We then consider only two assets $S^0$ and $S$ whose respective initial values are 1 and $S_0$ such that

$$S^0_n = (1 + r)^n \quad \text{and} \quad S_n = S_{n-1}(1 + U_n) \qquad \forall n \in \{1, \ldots, N\} \tag{14}$$

where $r > 0$ is the constant interest rate of the market. From the no arbitrage assumption, it follows that $b < r < h$ and that, under the risk-neutral probability $\mathbb{P}^*$, we have $p := \mathbb{P}^*(U_1 = h) = \frac{r - b}{h - b}$. Hence, using the Markov property of $S$, we can price an American option on $S$. For instance, for a put option with exercise price $K$, we get $P_n = F(n, S_n)$, where $F$ is the solution to the following equation:

$$\begin{cases} F(N, x) = (K - x)^+ \\ F(n, x) = \max\left(K - x,\ \dfrac{1}{1 + r}\bigl(p\,F(n + 1, x(1 + h)) + (1 - p)\,F(n + 1, x(1 + b))\bigr)\right) \end{cases} \tag{15}$$

The convergence of binomial approximations was first studied in a general setting in [34]. The rate of convergence is difficult to get, but some estimates are given in [36, 38].

In conclusion, for some simple models, one can numerically solve the option pricing problem. However, only the time variable is discretized. Analytical methods have been developed and provide a better understanding of the links between time and space variables. In particular, we can characterize the option value as a solution to a variational inequality and get an approximation of its solution, thanks to finite difference methods. This characterization has many consequences from the theoretical point of view and on practical aspects.

Analytic Properties of American Options

In this section, we assume that the asset price process follows a model called the local volatility model (see Local Volatility Model). This model is complete and takes into account the volatility smile observed when one calibrates the Black–Scholes model (see Model Calibration; Implied Volatility Surface and [20]). We suppose that the asset price process is solution to the following stochastic differential equation:

$$dS^i_t = S^i_t \left(b_i(t, S_t)\, dt + \sum_{j=1}^{d} \sigma_{i,j}(t, S_t)\, dW^j_t\right) \tag{16}$$

where $W$ is a standard Brownian motion on $\mathbb{R}^d$, $b$ a function mapping $[0, T] \times [0, +\infty)^d$ into $\mathbb{R}^d$, and $\sigma$ a function mapping $[0, T] \times [0, +\infty)^d$ into $\mathbb{R}^{d \times d}$. Moreover, we assume that $b$ is bounded and Lipschitz continuous, that $\sigma$ is Lipschitz continuous in the space variable, and that there exist $\alpha \ge 1/2$ and $\sigma_H$ such that $\forall x \in [0, +\infty)^d$, $(t, s) \in [0, T]^2$, $|\sigma(t, x) - \sigma(s, x)| \le \sigma_H |t - s|^\alpha$. Moreover, to ensure the completeness of the market and the nondegeneracy of the partial differential equation satisfied by European option price functions, we assume that there exist $m > 0$ and $M > 0$ such that

$$\forall (t, x, \xi) \in [0, T] \times [0, +\infty)^d \times \mathbb{R}^d, \quad m^2 \|\xi\|^2 \le \xi^* \sigma^* \sigma(t, x)\, \xi \le M^2 \|\xi\|^2 \tag{17}$$

From the Markov property of the process $S$, at time $t$, the price of an American option with maturity $T$ and pay-off process $(f(S_t))_{0 \le t \le T}$ is $P(t, S_t)$, where

$$P(t, x) = \sup_{\tau \in \mathcal{T}_{t,T}} \mathbb{E}\left[e^{-r(\tau - t)} f(S_\tau) \mid S_t = x\right] \tag{18}$$

The Value Function

To compute the option price, we now have to study the option value function $P$. From its definition, we can derive immediate properties:

• $\forall x \in [0, +\infty)^d$, $P(T, x) = f(x)$
• $\forall (t, x) \in [0, T] \times [0, +\infty)^d$, $P(t, x) \ge f(x)$
• If the coefficients $\sigma$ and $b$ do not depend on time, we can write

$$P(t, x) = \sup_{\tau \in \mathcal{T}_{0,T-t}} \mathbb{E}\left[e^{-r\tau} f(S_\tau) \mid S_0 = x\right] \tag{19}$$

and then the function $t \mapsto P(t, x)$ is nonincreasing on $[0, T]$.

Under some regularity assumptions on the pay-off function, we can derive some important continuity properties of $P$. In this section, we assume that $f$ is nonnegative and continuous on $[0, +\infty)^d$ such that

$$\exists (M, n) \in [0, +\infty) \times \mathbb{N},\ \forall x \in [0, +\infty)^d, \quad |f(x)| + \sum_{i=1}^{d} \left|\frac{\partial f}{\partial x_i}(x)\right| \le M(1 + |x|^n) \tag{20}$$

These assumptions are generally satisfied by the pay-off functions appearing in finance, especially by the pay-off functions of put and call options. In this setting, we have the following result [31].

Proposition 3 There exists a constant $C > 0$ such that

$$\forall t \in [0, T],\ \forall (x, y) \in [0, +\infty)^{2d}, \quad |P(t, x) - P(t, y)| \le C |x - y| \tag{21}$$

$$\forall x \in [0, +\infty)^d,\ \forall (t, s) \in [0, T]^2, \quad |P(t, x) - P(s, x)| \le C \left|(T - t)^{1/2} - (T - s)^{1/2}\right| \tag{22}$$

As a consequence of this result, we can assert that the first-order derivatives of $P$ in the sense of distributions are locally bounded on the open set $(0, T) \times (0, +\infty)^d$. This plays a crucial role in the characterization of $P$ as a solution to a variational inequality.

Variational Inequality

In a more general setting, Bensoussan and Lions [6] have studied existence and uniqueness of solutions of variational inequalities and linked these solutions to those of optimal stopping problems. Applying this method to the American option problem, Jaillet et al. have proved that the value function $P$ can be characterized as the unique solution, in the sense of distributions, of the following variational inequality [31]:

$$\begin{cases} DP \le 0, \quad f \le P, \quad (P - f)\,DP = 0 \ \text{a.e.} \\ P(T, x) = f(x) \ \text{on } [0, +\infty)^d \end{cases} \tag{23}$$

where we set

$$Dh(t, x) = \frac{\partial h}{\partial t} + \frac{1}{2} \sum_{i,j=1}^{d} (\sigma\sigma^*)_{i,j}(t, x)\, x_i x_j \frac{\partial^2 h}{\partial x_i \partial x_j} + \sum_{i=1}^{d} b_i(t, x)\, x_i \frac{\partial h}{\partial x_i} - rh \tag{24}$$

This inequality derives directly from the properties of the Snell envelope. Indeed, the condition $DP \le 0$ is the analytic translation of the supermartingale property of $\tilde{P}$, $f \le P$ corresponds to $Z \le P$, and the fact that one of these two inequalities has to be an equality follows from the martingale property of $(\tilde{P}_{t \wedge \tau^*})_{0 \le t \le T}$.

From the variational inequality, we can use numerical methods, such as finite difference methods, to compute the option price (see Finite Difference Methods for Early Exercise Options and [31]). From a theoretical point of view, we can deduce some analytic properties of $P$. If we add the condition that second-order derivatives of the pay-off function are bounded from below, we have the following result.

Proposition 4 Regularity of $P$

1. Smooth fit property: For $t \in [0, T)$, the function $x \mapsto P(t, x)$ is continuously differentiable and its first derivatives are uniformly bounded on $[0, T] \times [0, +\infty)^d$.
2. There exists a constant $C > 0$ such that for all $(t, x) \in [0, T) \times [0, +\infty)^d$, we have

$$\left|\frac{\partial P}{\partial t}(t, x)\right| + \left|D^2 P(t, x)\right| \le \frac{C}{(T - t)^{1/2}} \tag{25}$$

where $D^2 P$ is the Hessian matrix of $P$.

The smooth fit property has equally been established with probabilistic arguments, using the early
exercise premium formula presented in the section Exercise Region [30, 43]. In connection with free boundary problems, some analytic methods have been developed in [26] from which we can deduce the continuity of $\frac{\partial P}{\partial t}$ on $[0, T) \times [0, +\infty)^d$.

Thanks to the variational inequality, we can establish the so-called robustness of the Black–Scholes formula [24]. The two main results obtained are the following.

Proposition 5 We assume that $d = 1$. If the pay-off function is convex, then the value function $P$ is equally convex. Moreover, if there exist $\sigma_1, \sigma_2 > 0$ such that $\sigma_1 \le \sigma \le \sigma_2$, then we have

$$P^{\sigma_1} \le P \le P^{\sigma_2} \tag{26}$$

where $P^{\sigma_i}$ is the value function of the American option on an underlying asset with volatility $\sigma_i$.

The propagation of convexity has been proved with probabilistic arguments in [29] and can be extended to the case $d > 1$. The robustness of the Black–Scholes formula is equally useful from a practical point of view because it allows us to construct surreplication and subreplication strategies using a constant volatility.

When there is only one risky asset modeled as a geometric Brownian motion, the analytic properties presented in this section can be used to transform, thanks to Green's theorem, the variational inequality into an integral equation (see Integral Equation Methods for Free Boundaries). This point of view has been adopted to provide new numerical methods [16] and to get theoretical results such as the convexity of the critical price for the put option [22] or its behavior near maturity [16, 25].

Integro-differential Equation

The integro-differential approach can be extended to the American option on jump diffusions (see Partial Integro-differential Equations (PIDEs)). In 1976, Merton (see Merton, Robert C. and [42]) introduced a model including some discontinuities in the asset value process. He considered a risky asset whose value process is solution to the following equation:

$$dS_t = S_{t^-}\left(\mu\, dt + \sigma\, dW_t + d\Bigl(\sum_{i=1}^{N_t} U_i\Bigr)\right) \tag{27}$$

where $\mu \in \mathbb{R}$, $\sigma > 0$, $W$ is a standard Brownian motion, $N$ is a Poisson process with intensity $\lambda > 0$, and the $U_i$ are independent and identically distributed variables with values in $(-1, +\infty)$ such that $\mathbb{E}[U_i^2] < +\infty$.

This model is not complete but, up to a change of probability measure, we can suppose that $\mu = r - \lambda \mathbb{E}[U_1]$, where $r > 0$ is the constant interest rate of the market. Hence, $\tilde{S}$ is a martingale with respect to the filtration generated by $W$, $N$, and $(U_i \mathbf{1}_{\{i \le N_t\}})_{0 \le t \le T}$. The option price is then determined as the initial wealth of a replication portfolio, which minimizes the quadratic risk. Merton obtained closed formulas to calculate the European option price. In this model, Zhang [50] extended the variational inequality approach to evaluate the American option price and obtained a characterization of the value function as solution to the following integro-differential equation:

$$\begin{cases} DP + IP \le 0, \quad f \le P, \quad (DP + IP)(P - f) = 0 \ \text{a.e.} \\ P(T, x) = f(x) \ \text{on } [0, +\infty) \end{cases} \tag{28}$$

with

$$Dh(t, x) = \frac{\partial h}{\partial t} + \frac{\sigma^2 x^2}{2} \frac{\partial^2 h}{\partial x^2} + \mu x \frac{\partial h}{\partial x} - rh, \qquad Ih(t, x) = \lambda \int \bigl(h(t, x + z) - h(t, x)\bigr)\, \nu(dz) \tag{29}$$

where $\nu$ is the law of $\ln(1 + U_1)$. Zhang used this equation to derive numerical schemes for approximating $P$. However, he could not obtain a description of the optimal exercise strategies. This was studied by Pham [46], who obtained a pricing decomposition formula and some properties of the exercise boundary.

In conclusion, analytic properties of the American option value function have been used to build numerical pricing methods and to get some theoretical properties. Although the variational point of view is better for understanding the discretization of American options, it is less explicit than the probabilistic methods. We can remark that a specific region of $[0, T] \times [0, +\infty)^d$ appears in these two approaches: the so-called exercise region

$$E = \{(t, x) \in [0, T) \times [0, +\infty)^d : P(t, x) = f(x)\} \tag{30}$$
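On a lattice, the exercise region can be read off directly from the backward induction: a node belongs to it when the pay-off is at least the discounted continuation value. The following Python sketch illustrates this for an American put in the Cox, Ross, and Rubinstein model of equations (14) and (15). It is an illustration only: the function name `crr_american_put`, the return format, and the choice to report the largest exercised node price per time step as a discrete critical price are assumptions made here, not part of the article.

```python
def crr_american_put(S0, K, r, b, h, N):
    """American put in the model S_n = S_{n-1} * (1 + U_n), U_n in {b, h}.

    r, b, h are per-period quantities with b < r < h.
    Returns (price, boundary), where boundary[n] is the largest node price at
    step n for which immediate exercise is optimal with a positive pay-off,
    or None if no such node exists (a hypothetical convention).
    """
    p = (r - b) / (h - b)  # risk-neutral probability of the move h

    def node(n, k):
        # Price after k moves of size h and n - k moves of size b.
        return S0 * (1 + h) ** k * (1 + b) ** (n - k)

    values = [max(K - node(N, k), 0.0) for k in range(N + 1)]
    boundary = [None] * (N + 1)
    exercised = [node(N, k) for k in range(N + 1) if K - node(N, k) > 0]
    boundary[N] = max(exercised) if exercised else None

    for n in range(N - 1, -1, -1):
        new_values, exercised = [], []
        for k in range(n + 1):
            x = node(n, k)
            cont = (p * values[k + 1] + (1 - p) * values[k]) / (1 + r)
            payoff = K - x
            if payoff > 0 and payoff >= cont:
                exercised.append(x)        # node lies in the exercise region
            new_values.append(max(payoff, cont))
        values = new_values
        boundary[n] = max(exercised) if exercised else None
    return values[0], boundary


if __name__ == "__main__":
    import math

    steps = 200
    dt = 1.0 / steps
    up = math.exp(0.2 * math.sqrt(dt)) - 1      # assumed parametrization
    down = math.exp(-0.2 * math.sqrt(dt)) - 1
    price, boundary = crr_american_put(100.0, 100.0, 0.05 * dt, down, up, steps)
    print(f"price = {price:.4f}, critical price at maturity = {boundary[steps]:.4f}")
```

The discrete boundary obtained this way is only as fine as the lattice, so it should be read as a rough picture of the exercise region rather than an estimate of the free boundary with controlled error.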
If we knew $E$, on the one hand, we would be able to determine the law of optimal stopping times and, on the other hand, the option pricing problem would be reduced to solving a partial differential equation in the complementary set of $E$. In the following section, we recall some results on exercise regions and in particular we give a price decomposition, known as the early exercise premium formula, which involves the exercise region.

Exercise Region

Description

In the section Optimal Exercise, we have already presented a brief description of the exercise region of an American put option on a single underlying following the Black–Scholes model. These results are still true in the local volatility model introduced in the section Analytic Properties of American Options. Hence, for a put option with maturity $T$ and strike price $K$, we have

$$E = \{(t, x) \in [0, T) \times [0, +\infty) : x \le s(t)\} \quad \text{with} \quad s(t) = \inf\{x \in [0, +\infty) : P(t, x) > f(x)\} \tag{31}$$

Using the integral equation satisfied by $P$ in the Black–Scholes model, we can apply general results proved in [27] for free boundary problems and assert that $s$ is continuously differentiable on $[0, T)$. It has been shown that this is still true in the local volatility model using some blow-up techniques and monotonicity formulas [7, 12]. Moreover, Kim [33] proved that $\lim_{t \to T} s(t) = \min\left(K, \frac{rK}{\delta}\right)$ if $S$ is solution to $dS_t = S_t((r - \delta)\,dt + \sigma(t, S_t)\,dW_t)$. We will see that the behavior of $s$ near maturity has been extensively studied.

The description of the exercise region for options on several assets is more interesting because in high dimension numerical methods are less efficient and it helps to have a better understanding of these products. Broadie and Detemple were the first to investigate this problem [10]. They give precise descriptions of the exercise region shapes for the most traded options on several assets. Their results were completed by Villeneuve [49]. In particular, he gives a characterization of the nonemptiness of the exercise region. We just quote here the main results concerning a call option on the maximum of two assets, but the same kinds of results exist for many other options.

We denote by $E_t$ the temporal section of the exercise region. For a call option on the maximum of two assets, $S^1$ and $S^2$, $E_t$ can be decomposed into two regions: $E^1_t = E_t \cap \{(x_1, x_2) \in [0, +\infty)^2 : x_2 \le x_1\}$ and $E^2_t = E_t \cap \{(x_1, x_2) \in [0, +\infty)^2 : x_1 \le x_2\}$. These two regions are convex and can be rewritten as follows:

$$E^1_t = \{(x_1, x_2) \in [0, +\infty)^2 : s_1(t, x_2) \le x_1\} \quad \text{and} \quad E^2_t = \{(x_1, x_2) \in [0, +\infty)^2 : s_2(t, x_1) \le x_2\} \tag{32}$$

where $s_1$ and $s_2$ are the respective continuous boundaries of $E^1_t$ and $E^2_t$.

To compute these boundaries, we can use the early exercise premium formula given in the following section.

Early Exercise Premium Formula

At about the same time, several authors exhibited a decomposition formula for the American option price [14, 30, 43]. This formula is very enlightening from a financial point of view because it consists in writing that $P_t = P^e_t + a_t$, where $P^e_t$ is the corresponding European option price and $a$ is a nonnegative function of time corresponding to the premium the option buyer has to pay to get the right of early exercise. If the exercise region is known, a closed formula allows us to compute this premium. We recall this formula for a put option on a dividend-paying asset following the Black–Scholes model:

$$P(t, x) = P^e(t, x) + \int_t^T \mathbb{E}^*\left[e^{-r(u - t)} (rK - \delta S_u)\, \mathbf{1}_{\{S_u \le s(u)\}} \mid S_t = x\right] du \tag{33}$$

where $\delta > 0$ is the dividend rate and $P^e(t, x) = \mathbb{E}^*\left[e^{-r(T - t)} (K - S_T)^+ \mid S_t = x\right]$. The integrand is written here on the put's exercise region $\{S_u \le s(u)\}$ of (31), where the holder who has exercised earns interest on $K$ and forgoes the dividends on $S$. This formula is equally interesting from a theoretical point of view as it leads to an integral equation for the critical price:

$$K - s(t) = P^e(t, s(t)) + \int_t^T \mathbb{E}^*\left[e^{-r(u - t)} (rK - \delta S_u)\, \mathbf{1}_{\{S_u \le s(u)\}} \mid S_t = s(t)\right] du \tag{34}$$
This formula has been extended in [10] to American options on several assets. For the call on the maximum of two assets, we get

$$\begin{aligned} P(t, x) = P^e(t, x) &+ \int_t^T \mathbb{E}^*\left[e^{-r(u - t)} \left(\delta_1 S^1_u - rK\right) \mathbf{1}_{\{S^1_u \ge s_1(u, S^2_u)\}} \mid S_t = x\right] du \\ &+ \int_t^T \mathbb{E}^*\left[e^{-r(u - t)} \left(\delta_2 S^2_u - rK\right) \mathbf{1}_{\{S^2_u \ge s_2(u, S^1_u)\}} \mid S_t = x\right] du \end{aligned} \tag{35}$$

Once again an integral equation could be derived for $(s_1, s_2)$.

We can also use this formula and the integral equation satisfied by the free boundary to study the behavior of the exercise region for short maturity. This is a crucial point for numerical methods. Indeed, we have seen that both the value function and the free boundary present irregularities near maturity, which implies instability in numerical methods.

Behavior Near Maturity

The behavior of the exercise region near maturity has been extensively studied when there is only one underlying asset. In his pioneering work, van Moerbeke conjectured a parabolic behavior for the boundary near maturity [48]. However, when the asset does not distribute dividends, it has been shown that there is an extra logarithmic factor [4]. Lamberton and Villeneuve have then proved that, in the Black–Scholes model, the free boundary has a parabolic behavior if its limit is a point of regularity for the pay-off function; otherwise a logarithmic factor appears [39]. This result has been extended to the local volatility model in [17]. In a recent paper [16], new approximations are provided for the location of the free boundary by using the integral equation satisfied by $P$ and $s$. However, this technique cannot be extended to the case of options on several assets. When there are several underlying assets, the behavior of the exercise boundary near maturity has been studied by Nyström [45], who has proved that the convergence rate, when time to maturity goes to 0, is faster than parabolic.

References

[1] Bally, V. & Pagès, G. (2003). Error analysis of the quantization algorithm for obstacle problems, Stochastic Processes and their Applications 106, 1–40.
[2] Bally, V., Pagès, G. & Printems, J. (2005). A quantization method for pricing and hedging multi-dimensional American style options, Mathematical Finance 15, 119–168.
[3] Bally, V., Caramellino, L. & Zanette, A. (2005). Pricing American options by Monte Carlo methods using a Malliavin Calculus approach, Monte Carlo Methods and Applications 11, 97–133.
[4] Barles, G., Burdeau, J., Romano, M. & Sansoen, N. (1995). Critical stock price near expiration, Mathematical Finance 5, 77–95.
[5] Bensoussan, A. (1984). On the theory of option pricing, Acta Applicandae Mathematicae 2, 139–158.
[6] Bensoussan, A. & Lions, J.L. (1982). Applications of Variational Inequalities in Stochastic Control, North-Holland.
[7] Blanchet, A. (2006). On the regularity of the free boundary in the parabolic obstacle problem. Application to American options, Nonlinear Analysis 65(7), 1362–1378.
[8] Bouchard, B., Ekeland, I. & Touzi, N. (2004). On the Malliavin approach to Monte-Carlo approximation of conditional expectations, Finance and Stochastics 8(1), 45–71.
[9] Bouchard, B., El Karoui, N. & Touzi, N. (2005). Maturity randomisation for stochastic control problems, Annals of Applied Probability 15(4), 2575–2605.
[10] Broadie, M. & Detemple, J.B. (1997). The valuation of American options on multiple assets, Mathematical Finance 7, 241–286.
[11] Broadie, M. & Glasserman, P. (1997). Pricing American-style securities using simulation, Journal of Economic Dynamics and Control 21, 1323–1352.
[12] Caffarelli, L., Petrosyan, A. & Shahgholian, H. (2004). Regularity of a free boundary in parabolic potential theory, Journal of the American Mathematical Society 17(4), 827–869.
[13] Carr, P. (1998). Randomization and the American Put, The Review of Financial Studies 11, 597–626.
[14] Carr, P., Jarrow, R. & Myneni, R. (1992). Alternative characterization of American put options, Mathematical Finance 2, 87–106.
[15] Carverhill, A.P. & Webber, N. (1990). American options: theory and numerical analysis, in Options: Recent Advances in Theory and Practice, S. Hodges, ed, Manchester University Press.
[16] Chadam, J. & Chen, X. (2007). Analytical and numerical approximations for the early exercise boundary for American put options, to appear in Dynamics of Continuous, Discrete and Impulsive Systems 10, 649–657.
[17] Chevalier, E. (2005). Critical price near maturity for an American option on a dividend-paying stock in
a local volatility model, Mathematical Finance 15, 439–463.
[18] Chevalier, E. (2007). Bermudean approximation of the free boundary associated with an American option, Free Boundary Problems: Theory and Applications 154, 137–147.
[19] Duffie, D. (1992). Dynamic Asset Pricing Theory, Princeton University Press, Princeton.
[20] Dupire, B. (1994). Pricing with a smile, Risk Magazine 7, 18–20.
[21] Dupuis, P. & Wang, H. (2004). On the convergence from discrete to continuous time in an optimal stopping problem, Annals of Applied Probability 15, 1339–1366.
[22] Ekström, E. (2004). Convexity of the optimal stopping boundary for the American put option, Journal of Mathematical Analysis and Applications 299, 147–156.
[23] El Karoui, N. (1981). Les aspects probabilistes du contrôle stochastique, Lecture Notes in Mathematics 876, 72–238. Springer-Verlag.
[24] El Karoui, N., Jeanblanc-Piqué, M. & Shreve, S. (1998). Robustness of the Black-Scholes formula, Mathematical Finance 8, 93–126.
[25] Evans, J.D., Keller, R.J. & Kuske, R. (2002). American options on assets with dividends near expiry, Mathematical Finance 12(3), 219–237.
[26] Friedman, A. (1975). Stochastic Differential Equations and Applications, Academic Press, New York, Vol. 1.
[27] Friedman, A. (1976). Stochastic Differential Equations and Applications, Academic Press, New York, Vol. 2.
[28] Harrison, J.M. & Pliska, S.R. (1981). Martingales and stochastic integrals in the theory of continuous trading, Stochastic Processes and their Applications 11, 215–260.
[29] Hobson, D. (1998). Volatility misspecification, option pricing and superreplication via coupling, The Annals of Applied Probability 8(1), 193–205.
[30] Jacka, S.D. (1991). Optimal stopping and the American put, Mathematical Finance 1, 1–14.
[31] Jaillet, P., Lamberton, D. & Lapeyre, B. (1990). Variational inequalities and the pricing of American options, Acta Applicandae Mathematicae 21, 263–289.
[32] Karatzas, I. (1988). On the pricing of American options, Applied Mathematics and Optimization 17, 37–60.
[33] Kim, I.J. (1990). The analytic valuation of American options, Review of Financial Studies 3, 547–572.
[34] Kushner, H.J. (1977). Probability Methods for Approximations in Stochastic Control and for Elliptic Equations, Academic Press, New York.
[35] Lamberton, D. (1998). American Options, in Statistics in Finance, D. Hand & S. Jacka, eds, Arnold Applications of Statistics Series, Edward Arnold, London.
[36] Lamberton, D. (1998). Error estimates for the binomial approximation of American put options, Annals of Applied Probability 8, 206–233.
[37] Lamberton, D. & Lapeyre, B. (1996). Introduction to Stochastic Calculus Applied to Finance, Chapman and Hall, London.
[38] Lamberton, D. & Pagès, G. (1990). Sur l’approximation des réduites, Annales de l’I.H.P., Probabilités et Statistiques 26(2), 331–355.
[39] Lamberton, D. & Villeneuve, S. (2003). Critical price for an American option on a dividend-paying stock, The Annals of Applied Probability 13, 800–815.
[40] Longstaff, F.A. & Schwartz, E.S. (2001). Valuing American options by simulations: a simple least squares approach, Review of Financial Studies 14, 113–147.
[41] McKean, H.P. Jr. (1965). Appendix: a free boundary problem for the heat equation arising from a problem in mathematical economics, Industrial Management Review 6, 32–39.
[42] Merton, R.C. (1976). Option pricing when underlying stock returns are discontinuous, Journal of Financial Economics 3, 125–144.
[43] Myneni, R. (1992). The pricing of the American option, Annals of Applied Probability 2, 1–23.
[44] Neveu, J. (1975). Discrete-Parameter Martingales, North Holland, Amsterdam.
[45] Nyström, K. (2007). On the behaviour near expiry for multi-dimensional American options, to appear in Journal of Mathematical Analysis and Applications 339, 664–654.
[46] Pham, H. (1997). Optimal stopping, free boundary and American option in a jump-diffusion model, Applied Mathematics and Optimization 35, 145–164.
[47] Tsitsiklis, J.N. & Van Roy, B. (2001). Regression methods for pricing complex American-Style options, IEEE Transactions on Neural Networks 12(4), 694–703.
[48] Van Moerbeke, P. (1976). On optimal stopping and free boundary problems, Archive for Rational Mechanics and Analysis 20, 101–148.
[49] Villeneuve, S. (1999). Exercise region of American options on several assets, Finance and Stochastics 3, 295–322.
[50] Zhang, X.L. (1997). Numerical analysis of American option pricing in a jump-diffusion model, Mathematics of Operations Research 22, 668–690.

Further Reading

Black, F. & Scholes, M. (1973). The pricing of options and corporate liabilities, Journal of Political Economy 81, 637–659.
Dalang, R.C., Morton, A. & Willinger, W. (1990). Equivalent martingale measures and no-arbitrage in stochastic securities market models, Stochastics and Stochastics Reports 29(2), 185–202.
Detemple, J. (2005). American-Style Derivatives: Valuation and Computation, Financial Mathematics Series, Chapman & Hall/CRC, New York.
Detemple, J., Feng, S. & Tian, W. (2003). The valuation of American call options on the minimum of two dividend-paying assets, Annals of Applied Probability 13, 953–983.
Friedman, A. (1964). Partial Differential Equations of Parabolic Type, Prentice-Hall: Englewood Cliffs, New Jersey.
Merton, R.C. (1973). Theory of rational option pricing, Bell Journal of Economics and Management Science 4, 141–183.

Related Articles

Bermudan Options; Bermudan Swaptions and Callable Libor Exotics; Early Exercise Options: Upper Bounds; Exercise Boundary Optimization Methods; Finite Difference Methods for Early Exercise Options; Integral Equation Methods for Free Boundaries; Point Processes; Swing Options.

ETIENNE CHEVALIER
Asian Options

An Asian option is also known as a fixed-strike Asian option or an average price or average rate option. These options have a payoff based on the average of an underlying asset price over a specified time period. The Asian option has a payoff dependent on the average of the asset price and a strike that is fixed in advance. The other type of Asian option is the average strike option (or floating strike), where the payoff is determined by the difference between the underlying asset price and its average (see Average Strike Options). Asian options are path-dependent options as their payoff depends on the asset price path rather than just on the terminal value.

If the average is computed using a finite sample of asset price observations taken at a set of regularly spaced time points, we have a discrete Asian option. A continuous time Asian option is obtained by computing the average via the integral of the price path over an interval of time. In reality, contracts are based on discrete averaging; however, if there are a large number of averaging dates, there are advantages in working in continuous time. The average itself can be defined to be geometric or arithmetic. When the geometric average is used, the Asian option has a closed-form solution for the price, whereas the option with arithmetic average does not have a known closed-form solution.

One of the reasons Asian options were invented was to avoid price manipulation toward the end of the option’s life. By making the payoff depend on the average price rather than on the price itself, such manipulations have little effect on the option value. For this reason, Asian options are usually of European style. The possibility of exercise before the expiration date would make the option more vulnerable to price manipulation; see [11]. The payoff of an Asian option cannot be obtained by combining other instruments such as vanilla options, forwards, or futures.

Asian options are commonly used for currencies, interest rates, and commodities, and more recently in energy markets. They are useful in corporate hedging situations, for instance, a company exchanging foreign currency for domestic currency at regular intervals. Each transaction could be hedged separately with derivatives or a single Asian option could hedge the “average” rate over the period during which the currency is transferred.

An advantage for the buyer of an Asian option is that it is often less expensive than an equivalent vanilla option. This is because the volatility of the average is lower than the volatility of the asset itself. Another advantage in thinly traded markets is that the payoff does not depend only on the price of the asset on a particular day.

Consider the standard Black–Scholes economy with a risky asset (stock) and a money market account. We also assume the existence of a risk-neutral probability measure $Q$ (equivalent to the real-world measure $P$) under which discounted asset prices are martingales. Under measure $Q$ we denote the expectation by $\mathbb{E}$, and under $Q$, the stock price follows:

$$\frac{dS_t}{S_t} = (r - \delta)\,dt + \sigma\,dW_t \tag{1}$$

where $r$ is the constant continuously compounded interest rate, $\delta$ is a continuous dividend yield, $\sigma$ is the instantaneous volatility of asset return, and $W$ is a $Q$-Brownian motion. The reader is referred to Black–Scholes Formula for details on the Black–Scholes model and Risk-neutral Pricing for a discussion of risk-neutral pricing.

The Asian contract is written at time 0 and expires at $T > t_0$. The averaging begins at time $0 \le t_0$ and occurs over the period $[t_0, T]$. (It is possible to have contracts where the averaging period finishes before maturity $T$, but this case is not covered here.) It is of interest to calculate the price of the option at the current time $t$, where $0 \le t \le T$. The position of $t$ compared to the start of the averaging, $t_0$, may vary. If $t \le t_0$, the option is “forward starting”. The special case $t = t_0$ is called a starting option here. If $t > t_0$, the option is termed in progress as the averaging has begun.

We consider an Asian contract that is based on the value $A^l_T$, where we denote $l = c$ for continuous averaging and $l = d$ for discrete averaging. The continuous arithmetic average is given as

$$A^c_t = \frac{1}{t - t_0}\int_{t_0}^{t} S_u\,du, \qquad t > t_0 \tag{2}$$

and by continuity, we define $A^c_{t_0} = S_{t_0}$. For the discrete arithmetic average, denote $0 \le t_0 < t_1 < \dots < t_n = T$, and for current time $t_m \le t < t_{m+1}$ (for integer $0 \le m \le n$),

$$A^d_t = \frac{1}{m+1}\sum_{0 \le i \le m} S_{t_i} \tag{3}$$
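As an illustration of the discrete arithmetic average in equation (3), the following is a minimal Python sketch. The function name `discrete_arithmetic_average` and the sample observation dates and prices are hypothetical and chosen only for this example; the sketch assumes the observation times are sorted in increasing order.

```python
import numpy as np


def discrete_arithmetic_average(times, prices, t):
    """Running discrete arithmetic average A^d_t of equation (3).

    times  : increasing observation dates t_0 < t_1 < ... < t_n
    prices : observed prices S_{t_0}, ..., S_{t_n}
    t      : current time, with t_m <= t < t_{m+1}
    """
    times = np.asarray(times, dtype=float)
    prices = np.asarray(prices, dtype=float)
    # m is the index of the last observation date not later than t
    m = int(np.searchsorted(times, t, side="right")) - 1
    if m < 0:
        raise ValueError("averaging has not started yet: t < t_0")
    # average of the m + 1 prices observed so far
    return prices[: m + 1].mean()


# Hypothetical quarterly observations over one year
times = [0.0, 0.25, 0.5, 0.75, 1.0]
prices = [100.0, 104.0, 98.0, 102.0, 106.0]
print(discrete_arithmetic_average(times, prices, 0.6))  # mean of the first three prices
print(discrete_arithmetic_average(times, prices, 1.0))  # mean of all five prices
```

For a current time between two observation dates, the function averages only the prices already observed, which is the "in progress" situation described above.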
The corresponding geometric average $G^l_t$, $l = c, d$, is defined to be

$$G^c_t = \exp\left(\frac{1}{t - t_0}\int_{t_0}^{t} \ln S_u\,du\right) \tag{4}$$

for continuous averaging and

$$G^d_t = \left(S_{t_0} S_{t_1} \cdots S_{t_m}\right)^{1/(m+1)} \tag{5}$$

for discrete averaging.

The payoff of an Asian call with arithmetic averaging is given as

$$(A^l_T - K)^+ \tag{6}$$

and the payoff of an Asian put with arithmetic averaging is given as

$$(K - A^l_T)^+ \tag{7}$$

where $K$ is the fixed strike. Option payoffs depending on the geometric average are identical with $A^l_T$ replaced by $G^l_T$.

By standard arbitrage arguments, the time-$t$ price of the Asian call is

$$e^{-r(T-t)}\,\mathbb{E}\left[(A^l_T - K)^+ \mid \mathcal{F}_t\right] \tag{8}$$

and the price of the put is

$$e^{-r(T-t)}\,\mathbb{E}\left[(K - A^l_T)^+ \mid \mathcal{F}_t\right] \tag{9}$$

It is worth noting that in pricing the Asian option, we need to consider only those cases where $t \le t_0$. For $t > t_0$, the option is in progress, and we can write $e^{-r(T-t)}\,\mathbb{E}[(A_T - K)^+ \mid \mathcal{F}_t]$ as

$$e^{-r(T-t)}\,\mathbb{E}_t\left[\left(\frac{1}{T-t_0}\int_t^T S_u\,du + \frac{t-t_0}{T-t_0}A_t - K\right)^+\right] = \frac{T-t}{T-t_0}\,e^{-r(T-t)}\,\mathbb{E}_t\left[\left(\frac{1}{T-t}\int_t^T S_u\,du - \left(\frac{T-t_0}{T-t}K - \frac{t-t_0}{T-t}A_t\right)\right)^+\right] \tag{10}$$

where $\mathbb{E}_t$ denotes the expectation conditional on information at time $t$. This is now the time-$t$ price of $\frac{T-t}{T-t_0}$ times an Asian option with averaging beginning at time $t$, with modified strike $\frac{T-t_0}{T-t}K - \frac{t-t_0}{T-t}A_t$. The prices of Asian options also satisfy a put–call parity (documented in many papers including Levy [13]), so it is enough to consider the call option and derive the price of the put from this.

The main difficulty in pricing and hedging the Asian option is that the random variable $A_T$ does not have a lognormal distribution. This makes the pricing very involved, and an explicit formula does not exist to date. This is an interesting mathematical problem and many research papers have been and still are written on the topic. The first of these was by Boyle and Emanuel [3] in 1980.

Early methods for pricing the Asian option with arithmetic average involved replacing the arithmetic average $A_T$ with the geometric average $G_T$, which is lognormally distributed; see [5, 10, 11, 15, 17]. This gives a simple formula, but it underprices the call significantly. However, it is worth noting that the formula leads to a scaling known as the $1/\sqrt{3}$ rule, since for $t > t_0$, the volatility is scaled down by this factor. That is, the formula involves the term $\sigma \frac{1}{\sqrt{3}} \sqrt{T - t}$. This is a particularly useful observation if the averaging period is quite short relative to the life of the option. See [12], among others, for a description and more details.

The second class of methods approximates the true distribution of the arithmetic average by an approximate distribution, usually lognormal with appropriate parameters. True moments for $A_T$ are equated with those implied by a lognormal model, so

$$\mathbb{E}\,A_T^n = e^{n\alpha + \frac{1}{2} n^2 v^2} \tag{11}$$

for any integer $n$, where $\alpha$ and $v$ are the mean and standard deviation of a normally distributed variable. This idea was used in a number of papers including [13]. Turnbull and Wakeman [19] also corrected for skew and kurtosis by expanding about the lognormal. The practical advantage of such approximations is their ease of implementation; however, typically these
methods work well for some parameter values but not for others.

A further analytical technique in approximating the price of the Asian option is to establish price bounds. Curran [6] and Rogers and Shi [16] used conditioning to obtain a two-dimensional integral, which proves to be a tight lower bound for the option.

Much work has been done on pricing the Asian option using quasi-analytic methods. Geman and Yor [8] derived a closed-form solution for an “in-the-money” Asian call and a Laplace transform for “at-the-money” and “out-of-the-money” cases. Their methods are based on a relationship between geometric Brownian motion and time-changed Bessel processes. To price the option, one must invert the Laplace transform numerically; see [7]. Shaw [18] demonstrated that the inversion can be done quickly and efficiently for all reasonable parameter choices in Mathematica, making this a fast and effective approach. Linetsky [14] produced a quasi-analytic pricing formula using eigenfunction methods, with highly accurate results, also employing a package such as Mathematica.

Direct numerical methods such as Monte Carlo or quasi-Monte Carlo simulation and finite-difference partial differential equation (PDE) methods can be used to price the Asian option (see Lattice Methods for Path-dependent Options). In fact, given the popularity of such techniques, these methods were probably amongst the first used by practitioners (and remain popular today). Monte Carlo simulation was used to price Asian options by Broadie and Glasserman [4] and Kemna and Vorst [11], among many other more recent researchers. Simulation methods have the advantage of being widely used by practitioners to price derivatives, so no “new” method is required. Additional practical features such as stochastic volatility or interest rates can be incorporated without a significant increase in complexity. Control variates can often be used (e.g., using a geometric Asian option when pricing an arithmetic option). Additionally, simulation is often used as a benchmark price against which other methods are tested. The disadvantage is that simulation is computationally expensive, even when variance reduction techniques are used. Lapeyre and Temam [12] showed that Monte Carlo simulation can be competitive under the more advanced schemes they propose and with variance reduction techniques.

The Asian option is an exotic path-dependent option since the value at any point in time depends on the history of the underlying asset price. Specifically, the value of the option at $t$ depends on the current level of the underlying asset $S_t$, time to expiry $T - t$, and the average level of the underlying up to $t$, $A_t$. Zvan et al. [21] presented numerical methods for solving the resulting pricing PDE in these variables. It turns out that the problem can be reduced to two variables (one state and the other time). Rogers and Shi [16], Alziary et al. [1], and Andreasen [2] formulated a one-dimensional PDE. The PDE approach is flexible in that it can handle market realities, but it is difficult to solve numerically as the diffusion term is very small for values of interest on the finite-difference grid. Vecer [20] reformulated the problem using analogies to passport options [9] to obtain an unconditionally stable PDE, which is more easily solved.

Methods based on discrete sampling become more appropriate when there are relatively few averaging dates. One simplistic approach is a scaling correction to volatility as described. Other possibilities include a Monte Carlo simulation or numerical solution of a sequence of PDEs [2]. Monte Carlo simulation can be quite efficient when there are only a small number of averaging dates, since the first “step” can take one straight to the averaging period (under the usual exponential Brownian motion model). Andreasen [2] priced discretely sampled Asian options using finite-difference schemes on a sequence of PDEs. This is particularly efficient if the averaging period is short and hence there are only a small number of PDEs to solve. He compared his PDE results to those of Monte Carlo simulation and showed that the finite-difference schemes get within a penny of the Monte Carlo simulation in less than a second of CPU time.

To conclude, there has been ongoing research into the methods for pricing the Asian option. It seems, however, that the current state-of-the-art pricing methods (good implementation of inversion of Laplace transform, eigenfunction and other expansions, stable PDE, and Monte Carlo simulation where appropriate) are fast, accurate, and adequate for most uses.

References

[1] Alziary, B., Decamps, J.P. & Koehl, P.F. (1997). A PDE approach to Asian options: analytical and numerical evidence, Journal of Banking and Finance 21(5), 613–640.
[2] Andreasen, J. (1998). The pricing of discretely sampled Asian and lookback options: a change of numeraire approach, Journal of Computational Finance 2(1), 5–30.
[3] Boyle, P. & Emanuel, D. (1980). The Pricing of Options on the Generalized Mean. Working paper, University of British Columbia.
[4] Broadie, M. & Glasserman, P. (1996). Estimating security price derivatives using simulation, Management Science 42, 269–285.
[5] Conze, A. & Viswanathan, R. (1991). European path dependent options: the case of geometric averages, Finance 12(1), 7–22.
[6] Curran, M. (1992). Beyond average intelligence, Risk 5, 60.
[7] Fu, M., Madan, D. & Wang, T. (1999). Pricing continuous Asian options: a comparison of Monte Carlo and Laplace transform inversion methods, Journal of Computational Finance 2(2), 49–74.
[8] Geman, H. & Yor, M. (1993). Bessel processes, Asian options and perpetuities, Mathematical Finance 3, 349–375.
[9] Henderson, V. & Hobson, D. (2000). Local time, coupling and the passport option, Finance and Stochastics 4(1), 69–80.
[10] Jarrow, R.A. & Rudd, A. (1983). Option Pricing. Irwin, IL.
[11] Kemna, A.G.Z. & Vorst, A.C.F. (1990). A pricing method for options based on average asset values, Journal of Banking and Finance 14, 113–129.
[12] Lapeyre, B. & Temam, E. (2000). Competitive Monte Carlo methods for the pricing of Asian options, Journal of Computational Finance 5, 39–57.
[13] Levy, E. (1992). Pricing European average rate currency options, Journal of International Money and Finance 11(5), 474–491.
[14] Linetsky, V. (2004). Spectral expansions for Asian (average price) options, Operations Research 52(6), 856–867.
[15] Ritchken, P., Sankarasubramanian, L. & Vijh, A.M. (1993). The valuation of path-dependent contracts on the average, Management Science 39(10), 1202–1213.
[16] Rogers, L.C.G. & Shi, Z. (1995). The value of an Asian option, Journal of Applied Probability 32, 1077–1088.
[17] Ruttiens, A. (1990). Classical replica, Risk February, 33–36.
[18] Shaw, W. (2000). A Reply to Pricing Continuous Asian Options by Fu, Madan and Wang, Working paper.
[19] Turnbull, S.M. & Wakeman, L.M. (1991). A quick algorithm for pricing European average options, Journal of Financial and Quantitative Analysis 26(3), 377–389.
[20] Vecer, J. (2001). A new PDE approach for pricing arithmetic average Asian options, Journal of Computational Finance 4(4), 105–113.
[21] Zvan, R., Forsyth, P. & Vetzal, K. (1998). Robust numerical methods for PDE models of Asian options, Journal of Computational Finance 2, 39–78.

Related Articles

Average Strike Options; Black–Scholes Formula; Lattice Methods for Path-dependent Options; Risk-neutral Pricing.

VICKY HENDERSON
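The Asian Options article notes that a geometric Asian option can serve as a control variate when an arithmetic Asian option is priced by Monte Carlo. The following Python sketch illustrates that idea for a discretely averaged call valued at time 0 under the lognormal model of equation (1). It is only a sketch under stated assumptions: the closed-form geometric price uses the standard fact that the log of a geometric average of lognormal prices is normal, a formula the article mentions but does not write out, and all function names and parameter values are hypothetical.

```python
import numpy as np
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def geometric_asian_call(s0, k, r, delta, sigma, times):
    """Time-0 price of a discretely averaged geometric Asian call.

    Assumes positive, increasing averaging dates with times[-1] = T.
    ln G_T is normal with mean m and variance v under model (1).
    """
    times = np.asarray(times, dtype=float)
    n = times.size
    m = log(s0) + (r - delta - 0.5 * sigma**2) * times.mean()
    # Var(sum_i W_{t_i}) = sum_{i,j} min(t_i, t_j)
    v = sigma**2 * np.minimum.outer(times, times).sum() / n**2
    d1 = (m - log(k) + v) / sqrt(v)
    d2 = d1 - sqrt(v)
    discount = exp(-r * times[-1])
    return discount * (exp(m + 0.5 * v) * norm_cdf(d1) - k * norm_cdf(d2))


def asian_call_control_variate(s0, k, r, delta, sigma, times,
                               n_paths=100_000, seed=0):
    """Monte Carlo price of the arithmetic Asian call, payoff (6),
    using the geometric Asian call as a control variate.

    Returns (controlled estimate, its standard error, plain estimate).
    """
    rng = np.random.default_rng(seed)
    times = np.asarray(times, dtype=float)
    dt = np.diff(times, prepend=0.0)  # the first step goes straight to t_0
    z = rng.standard_normal((n_paths, times.size))
    steps = (r - delta - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    log_s = log(s0) + np.cumsum(steps, axis=1)
    discount = exp(-r * times[-1])
    arith = discount * np.maximum(np.exp(log_s).mean(axis=1) - k, 0.0)
    geom = discount * np.maximum(np.exp(log_s.mean(axis=1)) - k, 0.0)
    # regression coefficient that minimises the variance of the estimator
    cov = np.cov(arith, geom)
    beta = cov[0, 1] / cov[1, 1]
    exact_geom = geometric_asian_call(s0, k, r, delta, sigma, times)
    controlled = arith - beta * (geom - exact_geom)
    std_err = controlled.std(ddof=1) / np.sqrt(n_paths)
    return controlled.mean(), std_err, arith.mean()


# Hypothetical contract: 12 monthly averaging dates over one year
monthly = np.arange(1, 13) / 12.0
price, std_err, plain = asian_call_control_variate(100.0, 100.0, 0.05, 0.0, 0.2, monthly)
print(price, std_err, plain)
```

Because the arithmetic and geometric payoffs are highly correlated path by path, the controlled estimator typically has a much smaller standard error than the plain average of the arithmetic payoffs for the same number of paths.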
Arbitrage Bounds

A key question in option pricing concerns how to incorporate information about the prices of existing, liquidly traded options into the prices of exotic options. In the classical Black–Scholes model, where there is only one parameter to choose, this question becomes: what do existing prices tell us about the volatility? Since the Black–Scholes model lacks the flexibility to capture all the market information, a wide variety of pricing models have been proposed. Rather than specifying a model and pricing with respect to this model, an alternative approach is to construct model-free arbitrage bounds on the price of exotic options. Arbitrage bounds are constraints on the price of an option, due to the absence of arbitrage strategies. These strategies are typically derived from relationships between the payoff of an option and the payoff of a simple trading strategy constructed from other related derivatives; for example, the strategy might be a buy-and-hold strategy. If such a simple trading strategy can be shown to be worth at least as much as the corresponding option at maturity in every possible outcome, then the initial cost of the trading strategy must be at least the cost of the option, or else there exists a simple arbitrage. An important feature of these bounds is that they are often valid for a very wide class of models.

Arbitrage Bounds for Call Prices

Perhaps the earliest and simplest example of arbitrage bounds are the following inequalities, which are described in the seminal paper [29]:

$$\max\{0,\ S_0 - B(T)K\} \le C(K, T) \le S_0 \tag{1}$$

where $C(K, T)$ is the time-0 price of a European call option on the asset $(S_t)_{t \ge 0}$ with strike $K$ and maturity $T$, and $B(T)$ is the time-0 price of a bond that is worth $1 at time $T$. These bounds can be derived from the following simple arbitrages:

1. Suppose $C(K, T) > S_0$. Then we can construct an arbitrage by selling the call option and buying the asset. We receive an initial positive cash flow, while at maturity the option is worth $(S_T - K)^+$, which is at most $S_T$, the value of the asset we hold.

2. Suppose $C(K, T) < S_0 - B(T)K$. Then we can construct an arbitrage by buying the call option with strike $K$, selling short the asset, and buying $K$ units of the bond that pays $1 at time $T$. At time 0, we receive the cash amount

$$S_0 - B(T)K - C(K, T) \tag{2}$$

which, by assumption, is strictly positive. At maturity, writing $x^+ = \max\{x, 0\}$, we hold a portfolio whose value is

$$(S_T - K)^+ - (S_T - K) \tag{3}$$

which is nonnegative.

3. Finally, it is clear that the call option must have a nonnegative value (i.e., $C(K, T) \ge 0$), but this can also be considered a consequence of the arbitrage strategy of “buying” the derivative (for a negative price), and hence receiving a positive cash flow initially and a nonnegative one at maturity.

There are some key features of the above example that are repeated in other similar applications. Note, first of all, that the inequalities make no modeling assumptions: the final value of the arbitrage portfolios will be larger/smaller than the call option for any final value of the asset, so these bounds are truly independent of any model for the underlying asset. Secondly, the bounds are the best we can do in the following sense: it can be shown that there are arbitrage-free models for the asset price under which the bounds are tight. For example, if the interest rates are deterministic, and the asset price satisfies $S_t = S_0 B(t)$, then the lower bounds hold for all strikes, and there is no arbitrage in the market. Alternatively, the upper and lower bounds can be shown to be the Black–Scholes price of an option in the limit as $\sigma \to \infty$ and $\sigma \to 0$, respectively.

In practice, these bounds are far too wide for most practical purposes, although they can be useful as a check that a pricing algorithm is producing sensible numerical results. Part of the reason for this wide range of values is the relatively small amount of information that is being used in deriving the bound. In general, one would expect to have some information about the behavior of the market. A natural place to look for further information is in the market prices of other vanilla options: in model-specific pricing, this information is commonly used for calibration of the model. However, the
information contained in these prices can also be used to provide arbitrage bounds on the prices of other exotic derivatives through the formulation of appropriate portfolios.

Breeden–Litzenberger Formula

One of the initial works to consider the pricing implications of vanilla options on exotic options is [6]. Here, the authors suppose that the values of calls at all strikes and a given maturity are known, and observe that

$$p(x) = \frac{1}{B(T)} \left.\frac{\partial^2 C(K, T)}{\partial K^2}\right|_{K=x} \tag{4}$$

can be thought of as the density of a random variable. The value at time 0 of an option whose payoff is only a function of the terminal value of the asset, $f(S_T)$, can then be shown to be

$$B(T) \int f(x)\,p(x)\,dx \tag{5}$$

or, intuitively, the discounted expectation under the density implied by the call prices. We can see this by noting that (at least for twice-differentiable functions $f$), we have

$$f(S) = f(0) + S f'(0) + \int_0^\infty f''(K)\,(S - K)^+\,dK \tag{6}$$

and therefore may replicate the contract $f(S)$ exactly by holding $f(0)$ in cash, buying $f'(0)$ units of the asset, and “holding” a continuous portfolio of calls consisting of $f''(K)\,dK$ units of call options with strikes in $[K, K + dK]$. Since this portfolio replicates the exotic option exactly, by an arbitrage argument, the prices must agree. The price of the portfolio of calls can be shown to be equation (5). In practice, some discrete approximation of such a portfolio is necessary, and this is generally possible provided the calls trade at a suitably large range of strikes.

One of the interesting consequences of this result is that we have a representation for the price of the exotic option as a discounted expectation. A key result in modern mathematical finance is the fundamental theorem of asset pricing, which allows us to deduce from the assumption of no arbitrage that the price of an option may be written as a discounted expectation under a suitable probability measure. However, an assumption of the fundamental theorem of asset pricing is that there is a (known) model for the underlying asset. In the situation we wish to consider, there is no such model. It is therefore not immediate that we can say anything about any probabilistic structure that might help us. One of the interesting consequences of this result is that it does provide some information about the underlying probabilistic structure: namely, that the call prices “imply” a risk-neutral distribution for the asset price, and that there are arbitrage relationships that ensure that any other option whose payoff depends only on the final value of the asset also has the price implied by this probability measure.

Arbitrage Bounds for Exotic Options

A general approach that is implied by the above examples is the following: suppose we know the prices of (and can trade in) a set of “vanilla” derivatives. Consider also an exotic option, for example, a barrier option. Without making any (strong) assumptions about a model for the underlying asset, what does arbitrage imply about the price of the barrier option? Through a suitable set of trades in the underlying and vanilla options, we should be able to construct portfolios and self-financing trading strategies that either dominate, or are dominated by, the payoff of the exotic option. If we can find a portfolio that dominates the exotic option, then the initial cost of this portfolio (which is known) must be at least as much as the price of the exotic option, or else there will be an arbitrage from buying the portfolio and selling the exotic option. The price of this portfolio therefore provides an upper bound on the price of the option. In a similar manner, we may also find a lower bound for the price of the option by looking for portfolios and trading strategies in the underlying and vanilla options that result in a terminal value that is always dominated by the exotic option. Note that we are, in general, interested in the least upper bound and also the greatest lower bound that can be attained, since these will give the tightest possible bounds.

We have been vague about two concepts here: first, we said that we would not want to make any “strong” assumptions about the model of the underlying asset.
The exact assumptions that different examples make Of course, not all markets fit naturally into this
about the underlying models vary from case to case, framework, and so other settings should also be
but typically we might assume, perhaps, that the considered, as, for example, in [27], where arbitrage
underlying asset price is continuous (or at least, bounds for fixed income markets are considered.
that it continuously crosses a barrier), or that the
price process satisfies some symmetry assumption.
Secondly, we have not specified what types of trading Barrier Options
strategies we wish to consider: this is because, in
part, this depends heavily on the assumptions on the One of the simplest classes of options that can be
price process—for example, trading strategies that considered are the various types of barrier options,
involve a trade when the asset first crosses a barrier and one of the simplest of these options is the one-
often assume that the underlying crosses the barrier touch barrier option: this is an option that pays $1 at
continuously; the assumption on the symmetry of maturity if the barrier is breached during the lifetime
the asset price results in identities connecting the of the contract, and expires worthless if the barrier is
prices of call and put options. However, the important not hit before maturity. Suppose that the price process
point to note here is that we work typically in a is continuous, and suppose further that the riskless
class of price processes that are too large to be interest rate is zero. Then [7] provides an upper bound
able to hedge dynamically in any meaningful way, on the price of the option, OT (R, T ), where R is the
so that continuously rebalancing the portfolio is not level of the barrier, R > S0 , and T is the maturity of
an option. Two important classes of strategies are the option. The bound that is derived in [7] is
static strategies, which involve purchasing an initial
portfolio of the underlying and vanilla options, and C(x, T )
holding this to maturity (see Static Hedging), and OT (R, T ) ≤ inf (7)
x≤R R−x
semistatic strategies, which involve a fixed position
in the options, and some trading in the underlying The bound can be most clearly seen by noting the
asset, often at hitting times of certain levels or corresponding arbitrage strategy: suppose that the
sets. bound does not hold, then we can find an x for which
C(x, T )
Consistency of Vanilla Options OT (R, T ) > (8)
R−x
Since we are looking for arbitrage in the market We sell the one-touch option, and buy R−x 1
units
when we add an exotic option, it is important that of the call with strike x and maturity T . If the
the initial prices of the vanilla options do include an barrier at R is not hit, the one-touch option expires
arbitrage. In the case of equity markets, where the worthless, and our call option may have positive
underlying vanilla options are call options, written value. Alternatively, suppose that at some time, the
on a given set of strikes and maturities, this is a barrier is hit. At this time, we enter into a forward
question that has been studied by a number of authors contract on the asset. Specifically, we sell R−x 1
units
[9, 11, 13, 15, 18]. The fundamental conclusion of a forward struck at R. Since the current value of
that may be arrived at from all these works is the the asset is R, and we have assumed that the interest
following: the prices of calls are arbitrage free if rates are zero, we may enter into such a contract
and only if there exists a model under which the for free. At maturity, the value of our position in
prices agree with the discounted expectation under the forward will be R−S T
, and the total value of our
R−x
the model. Moreover, the existence of the model has position in the call and the forward is
a relatively straightforward characterization in terms
of the properties of the call prices, so that for a given 1 R − ST 1
set of call prices, the conditions may be checked with (ST − x)+ + =
R−x R−x R−x
relative ease. Moreover, some practical concerns can  
be included in the models: [15] allows the inclusion × (ST − x) + (x − ST )+ + (R − ST )
of default of the asset, while [18] also allows for the (x − ST )+
inclusion of dividends. = −1 + (9)
R−x
where we write x^+ = max{x, 0}. Since the payoff of the portfolio is at least that of the one-touch option in every case, and the portfolio was set up for a net credit, we have an arbitrage.

It can also be shown that the bound here is the best that can be attained: specifically, it can be shown that there exists a model under which there is equality in the inequality (7). By considering the form of the hedge, we can also say something about the extremal model. For equality to hold in equation (7), we must always have equality between the payoff of the one-touch option, and the value of the hedging portfolio. The case where the barrier is not hit requires that

    0 = \frac{(S_T - x)^+}{R - x}    (10)

or, equivalently, that S_T is always below x. The case where the barrier is struck requires that

    1 = 1 + \frac{(x - S_T)^+}{R - x}    (11)

or that S_T is always above x. In other words, in the extremal model, the paths that hit the barrier will, at maturity, finish above the minimizing value of x, while those that do not hit the barrier will always end up below x.

A similar approach allows us to find a lower bound. In this case, the hedging portfolio consists of a digital call struck at the barrier, so that the payoff of this option is simply $1 if the asset ends up above the barrier, and put options struck below the barrier, at some y < R. Note that the digital call can, in theory at least, be arbitrarily closely approximated by buying a suitably large number of calls just below the strike, and selling the same number of calls at the strike, so that we can deduce the price of the digital call from the prices of the vanilla call options. The prices of the puts can be deduced from put–call parity. In a manner similar to the above, we can find the "best" bound by finding the value of y that corresponds to the most expensive portfolio. Again, the bound is tight, in the sense that there exists a model under which we attain equality. We can also describe the behavior in this model: the paths that hit the barrier will end up either below y or above R. Those that do not hit the barrier will finish between y and R.

Using extensions of these ideas, similar bounds can be found for other common barrier options, for example, down-and-in calls. Full details can be found in [7].

There are a number of observations that we can make about the solution to the above problem, and which extend more generally. First, the extension to nonzero interest rates is nontrivial: one of the assumptions that was made in constructing the trading strategy was that, when the barrier is struck, we would be able to enter into a forward contract with a strike at the barrier. If there are nonzero interest rates, we will not be able to enter into such a contract at no cost. Consequently, these results are only generally valid in cases where there is zero cost of carry, for example, where the underlying is a forward price, in foreign exchange markets where both currencies have the same interest rate, or commodities where the interest rate is the same as the convenience yield. Secondly, recall that the only assumption we made on the paths was continuity. This assumption is key to knowing that we can sell forward as we hit the barrier. In fact, the upper bound will still hold if the path is not continuous, provided we sell forward the first time that we go above the barrier, at which point, we can enter into a forward contract that is at least as good for our purposes. Note, however, that under the model for which the bound is tight, we must cross the barrier continuously. The same is not true of the lower bound, which fails if the asset price does not cross the barrier continuously. If the path is not assumed continuous, a new bound can be derived, which corresponds to the asset jumping immediately to its final value. The third aspect to note about these constructions is that there is a natural extension to the case where calls are available at finitely many strikes. Consider the upper bound on the one-touch option, and suppose that calls trade at a finite set of strikes K_1, K_2, ..., K_n. Rather than taking the infimum over x where x < R, to get an upper bound, we can take the minimum over the strikes at which calls are available

    OT(R, T) \le \min_{i : K_i < R} \frac{C(K_i, T)}{R - K_i}    (12)

The previous arguments can be applied directly to show that this is an upper bound. It can also be shown that there is a model that fits with the call prices, and under which this bound is attained, so the resulting bounds are the best possible. Details of this extension can be found in [7].
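As a rough illustration of the finite-strike upper bound on the one-touch option, the following Python sketch takes the minimum of C(K_i, T)/(R - K_i) over the quoted strikes below the barrier, and evaluates the terminal value of the corresponding semistatic hedge (calls plus the forward sold when the barrier is hit). The function names and the call quotes are hypothetical and chosen only for illustration; zero interest rates and a continuous crossing of the barrier are assumed, as in the text.

```python
from typing import Mapping, Tuple


def one_touch_upper_bound(call_prices: Mapping[float, float],
                          barrier: float) -> Tuple[float, float]:
    """Return (bound, strike) minimizing C(K_i, T) / (R - K_i) over K_i < R.

    call_prices maps strike -> call price (illustrative quotes, not market data).
    """
    ratios = {k: c / (barrier - k) for k, c in call_prices.items() if k < barrier}
    if not ratios:
        raise ValueError("need at least one call struck below the barrier")
    best_strike = min(ratios, key=ratios.get)
    return ratios[best_strike], best_strike


def hedge_payoff(s_T: float, strike: float, barrier: float, hit: bool) -> float:
    """Terminal value of 1/(R - x) calls struck at x, plus, if the barrier was
    hit, 1/(R - x) forwards sold at R (entered at zero cost under zero rates)."""
    units = 1.0 / (barrier - strike)
    calls = units * max(s_T - strike, 0.0)
    forward = units * (barrier - s_T) if hit else 0.0
    return calls + forward


if __name__ == "__main__":
    # Hypothetical call quotes; the strike at the barrier itself is ignored.
    quotes = {90.0: 12.0, 100.0: 6.0, 110.0: 2.5, 120.0: 0.9}
    bound, strike = one_touch_upper_bound(quotes, barrier=120.0)
    print(bound, strike)  # 0.25 110.0
    # If the barrier was hit, the hedge is worth at least the $1 payoff.
    print(hedge_payoff(130.0, strike, 120.0, hit=True))   # 1.0
    print(hedge_payoff(100.0, strike, 120.0, hit=True))   # 2.0
```

With these made-up quotes the cheapest superhedge uses the call struck at 110, and the hedge value on paths that hit the barrier is 1 + (x - S_T)^+/(R - x), which is never below the one-touch payoff.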
Put–Call Symmetry

An alternative approach to the pricing of barrier options using the above techniques is to introduce the concept of "put–call symmetry". Following [5], we say that put–call symmetry holds if the value of a call struck at K > S_t, and a put struck at H < S_t, satisfy

    C(K) K^{-1/2} = P(H) H^{-1/2}    (13)

where the current asset price S_0 is the geometric mean of H and K: (KH)^{1/2} = S_0. While this is a more general concept, in the context of a local volatility model, this assumption can be interpreted in terms of a symmetry condition on the volatility: \sigma(S_t, t) = \sigma(S_0^2 / S_t, t). In particular, this is an assumption that is satisfied whenever the volatility is a deterministic function of time. Alternatively, if we graph the implied volatility smile against log(K/S_t), the smile should be symmetric. Note that, as above, we still require either the interest rate to be zero, or, for example, to be working with a forward price.

Under the assumption that put–call symmetry holds at all future times, we can construct replicating portfolios for many types of barrier options. Consider the case of a down-and-in call (see Barrier Options), with a barrier at R and strike K, so R < S_0. Then we may hedge the option simply by purchasing initially K/R puts at H, where H = R^2 / K. If the asset never reaches the barrier, both the down-and-in call and the put expire worthless, so we consider the behavior at the barrier. When the asset is at the barrier, put–call symmetry implies

    C(K) = \frac{K}{R} P(H)    (14)

and so we may sell the puts and buy a call with strike K. Thus this portfolio exactly replicates the down-and-in call.

The results described above were initially introduced in [5], where, in addition to considering knock-in and knockout calls, and the one-touch option above, the authors also included the lookback option by expressing it as a portfolio of suitable down-and-in options. Further developments can be found in [10], which considers the replication of more general options in this framework, and [12], which extends to double knockout calls, rolldown calls, and ratchet calls. Further extensions to these ideas, where the volatility is assumed to be a known function of the underlying and time, can be found in [2]. A different approach to static hedging is given in [20].

Arbitrage Bounds via Skorokhod Embeddings

As shown by Dupire [21], if prices of calls at all strikes and all maturities are known, there is a unique diffusion model, the local volatility model, which matches those call prices. If we drop the diffusion assumption, we are led to follow the line of reasoning from [6]. One of the conclusions from this work is that knowing the call prices at all strikes at a fixed future maturity implies the law of the asset price under the risk-neutral measure at this fixed future date. Further, as a consequence of the assumption of no arbitrage, we believe that under the risk-neutral measure, the discounted asset price should be a martingale. In this manner, we should be able to restrict the class of possible (discounted, risk-neutral) price processes to the class of martingales that have a given terminal distribution. If we now also wish to infer information about the price of an exotic option, we can ask the question: what is the largest/smallest price implied by the martingale price processes in this class? Moreover, we might hope to find an arbitrage if the option trades outside this range.

One of the simplest examples to consider is the one-touch option above: under the risk-neutral measure, the price of the option is the discounted probability that the price process goes above the barrier before the expiry date. By restricting ourselves to the class of martingales with a given terminal law, we should be able to deduce some information about the possible values of this probability, and thus of the price of the option. The key to using this approach efficiently is to find a suitable representation of the set of martingales with the given terminal distribution.

A classical result from probability theory, the Dambis–Dubins–Schwarz Theorem, states that any continuous martingale may be written as the time change of a Brownian motion (see, for example, [33, Chapter V]), and this is essentially true if the martingale is only right continuous [30]. Hence, if the discounted asset price is a martingale, one would expect it to be a time change of a Brownian motion; that is, we would expect to be able to write

    B(t) S_t = W_{\tau(t)}    (15)
where W_t is a Brownian motion, \tau(t) is increasing in t and is a stopping time for all t. As a consequence, any martingale price process should be a time change of a Brownian motion. If, in addition, we know that the law of S_T under the risk-neutral measure is implied by the call prices, we also know that W_{\tau(T)} has a given law. Finally, suppose that the time change is continuous (as it will be if the price process is continuous), then many of the properties in which we are interested remain unaffected by the exact form of the time change. For example, consider the probability that the discounted asset price goes above a barrier R before time T. This is the same as the probability that the Brownian motion W_t with W_{\tau(t)} = B(t) S_t goes above the barrier before time \tau(T). Moreover, consider two time changes \tau(t) and \tilde{\tau}(t), such that we always have \tau(T) = \tilde{\tau}(T). Then the probability that the barrier has been breached will be the same for the price processes corresponding to the time change \tau and the time change \tilde{\tau}. Consequently, if we are concerned with such path properties of the underlying price process, when we look in the Brownian setting, we need only differentiate between different final stopping times \tau(T), and not different time changes.

The argument then goes as follows: suppose we know call prices at all strikes at time T. From this information, we may deduce the law of the discounted asset price B(T) S_T, which we assume to be a time change of a Brownian motion, and whose value at some stopping time \tau therefore has the same law. Since the time change in the intermediate time is assumed to be continuous, and its exact form will not impact the quantities of interest, we get a one-to-one correspondence between possible price processes and the class of stopping times of a Brownian motion that have a given law. This line of reasoning is of interest, since the problem of finding a stopping time with a given terminal law has a long history in the probabilistic literature, where it is known as the Skorokhod embedding problem. In particular, given a distribution \mu, we say that a stopping time \tau is a (Skorokhod) embedding of \mu, if W_\tau has law \mu. The survey paper [31] gives a comprehensive account of the probabilistic literature on the Skorokhod embedding problem.

Getting back to the one-touch option, we see that the upper bound will correspond to the stopping time that maximizes the probability that the Brownian motion has gone above the barrier within the class of embeddings, and the minimum will correspond to the stopping time that minimizes the probability within this class. The construction of arbitrage bounds for the price of the option is therefore equivalent to the identification of extremal Skorokhod embeddings for the law implied by the call prices at maturity, as seen in [7]. The construction that attains this maximum is due to Azéma and Yor [3], while the construction that attains the minimum is due to Perkins [32], and it can be shown that these embeddings do indeed have the behavior that was hypothesized previously: for the upper bound, those paths that hit the barrier remain above the level x derived in the bound, while in the lower bound, those paths that hit the barrier all either finish above the barrier, or stop below y.

The Skorokhod embedding approach was initially explored in [23]. In this work, it is shown that the upper bound on the price of a lookback option can be computed in terms of the available call prices. Moreover, Hobson [23] has constructed a trading strategy that will result in an arbitrage should the lookback option trade above the given bound. In this case, the strategy involves constructing an initial portfolio of calls (purchased at the specified prices) and then selling these calls appropriately as the price process sets new maxima. The price at which the calls can be sold will be at least the intrinsic value of the call, and it can be shown that the profit from selling off the calls appropriately will be at least the payoff from the lookback option. A simple lower bound is also derived, but without assuming any continuity. For discontinuous asset prices, the lower bound is attained by the price process that jumps immediately to its final value. In terms of the corresponding Skorokhod embeddings, the upper bound has close connections with the embedding due to Azéma and Yor [3]; this can be shown to maximize the law of the maximum over the class of embeddings. Further, it can be shown that if we use the price process that corresponds to the stopping time constructed in [3], then the trading strategy dominating the lookback option actually attains equality, demonstrating that the upper bound is the best possible. This connection between an extremal Skorokhod embedding and a corresponding bound on the price of a connected exotic option has been exploited a number of times: in [8], these techniques are used to generalize the above results to the case where the call prices at an intermediate time are also known; in [24] the embedding due to Perkins [32] is generalized to
provide a lower bound on the price of a forward start digital option, under the assumption that the price process is continuous; in [16] the embedding of Vallois [35] is used to provide an upper bound on products related to corridor variance options.

A related development of these ideas is considered in [28], wherein the problem of fitting martingales to marginal distributions specified at all maturities is presented, and some solutions corresponding to the different Skorokhod embedding approaches, the local volatility models of Dupire [21], and processes with independent increments are discussed.

Advantages and Disadvantages

From a theoretical point of view, the results described above provide a clear, satisfactory picture: for a relatively large class of options, a range of model-free prices, or even exact prices, can be established. Where there is a range of prices, the upper and lower bounds can usually be shown to be tight, and trading strategies produced that result in arbitrages, should the bounds be violated.

However, the results have often been produced under strong restrictions on the mechanics of the market: typically, the cost of carry has been assumed to be zero, and factors such as transaction costs have been ignored. To some extent, these factors can be added into the bounds, although this is at the expense of wider bounds. Moreover, the bounds that result from the model-free techniques have a tendency to be rather wide. Figure 1 illustrates the resulting bounds for the one-touch option, comparing the upper and lower bounds described earlier, with the actual price derived from a Black–Scholes model. The range of the bounds is, for interesting values, of the order of 5% of the final payoff above the Black–Scholes price, and as much as 15% below the Black–Scholes price. These ranges are of much too high an order to be helpful for pricing purposes.

[Figure 1: plot of price (vertical axis, 0 to 1) against strike (horizontal axis, 90 to 120), showing three curves: upper bound, Black–Scholes price, and lower bound.]

Figure 1 Upper and lower model-free bounds on the price of a one-touch option, as a function of the strike, compared with the Black–Scholes price. The interest rate is 0, the asset price is $90, and σ = 15%

How else might these techniques be of use in practice? One important feature is the tendency to produce simple hedging portfolios. These allow a trader to cover a position in a derivative with a portfolio that needs little or no ongoing management, and through which they have a guaranteed lower bound on any possible hedging error. Several authors, for example, [22, 34], have produced comparisons between static or semistatic and dynamic hedging. In [34], there is no clear outperformance by either strategy, but in some circumstances the static or semistatic hedging strategy outperforms the dynamic strategy. In [22], the authors consider barrier options, and find that some static hedging strategies for barrier options appear to outperform dynamic strategies. Another useful observation is that by identifying the extremal models, one can identify the key model properties that influence the price of the option: for example, in finding bounds for the one-touch barrier, the extremal models were identified as those models that either hit the barrier and stay close, or those models that hit the barrier and end up far away. Knowledge of these extremes might help in deciding where the real price might lie in relation to the arbitrage bounds, or how prices of the option might react to large structural changes to the market. Finally, arbitrage bounds can also be considered as a special case of the good-deal bounds of [14]. Good-deal bounds provide a range of prices, outside of which there exists a trading strategy whose payoff may be considered a "good deal", which is not necessarily an arbitrage, but is sufficiently close to one to be very desirable for an investor.

Additional Resources

There are a number of papers [17, 25, 26] that consider deriving bounds on the price of basket options, where the payoff of the option depends
on the value of a weighted sum of a number of assets, and where calls are traded on each of the underlying assets. There are also connections to [1], where bounds on the prices of Asian options are derived.

Another class of options where similar hedging techniques have been considered are installment options [19], which are options similar to a European call, but where the holder pays for the option in a set number of installments, and has the option to stop paying the installments at any point before maturity, thereby losing the final payoff for the contract.

A common complication that arises in constructing many of the bounds and their respective hedging portfolios is that there can be some nontrivial optimization problems, typically, large linear programming problems [4, 17, 25].

References

[1] Albrecher, H., Mayer, P.A. & Schoutens, W. (2008). General lower bounds for arithmetic Asian option prices, Applied Mathematical Finance 15(2), 123–149.
[2] Andersen, L.B.G., Andreasen, J. & Eliezer, D. (2002). Static replication of barrier options: some general results, Journal of Computational Finance 5(4), 1–25.
[3] Azéma, J. & Yor, M. (1979). Une solution simple au problème de Skorokhod, in Séminaire de Probabilités, XIII (Univ. Strasbourg, Strasbourg, 1977/78), Lecture Notes in Mathematics, Springer, Berlin, Vol. 721, pp. 90–115.
[4] Bertsimas, D. & Popescu, I. (2002). On the relation between option and stock prices: a convex optimization approach, Operations Research 50(2), 358–374.
[5] Bowie, J. & Carr, P. (1994). Static simplicity, Risk 7(8), 45–49.
[6] Breeden, D.T. & Litzenberger, R.H. (1978). Prices of state-contingent claims implicit in option prices, Journal of Business 51(4), 621–651.
[7] Brown, H., Hobson, D. & Rogers, L.C.G. (2001a). Robust hedging of barrier options, Mathematical Finance 11(3), 285–314.
[8] Brown, H., Hobson, D. & Rogers, L.C.G. (2001b). The maximum maximum of a martingale constrained by an intermediate law, Probability Theory and Related Fields 119(4), 558–578.
[9] Buehler, H. (2006). Expensive martingales, Quantitative Finance 6(3), 207–218.
[10] Carr, P. & Chou, A. (1997). Breaking barriers, Risk 10(9), 139–145.
[11] Carr, P. & Madan, D.B. (2005). A note on sufficient conditions for no arbitrage, Finance Research Letters 2, 125–130.
[12] Carr, P., Ellis, K. & Gupta, V. (1998). Static hedging of exotic options, Journal of Finance 53(3), 1165–1190.
[13] Carr, P., Geman, H., Madan, D.B. & Yor, M. (2003). Stochastic volatility for Lévy processes, Mathematical Finance 13(3), 345–382.
[14] Cerny, A. & Hodges, S.D. (1999). The theory of good-deal pricing in financial markets, FORC preprint, No. 98/90.
[15] Cousot, L. (2007). Conditions on option prices for absence of arbitrage and exact calibration, Journal of Banking and Finance 31, 3377–3397.
[16] Cox, A.M.G., Hobson, D.G. & Obłój, J. (2008). Pathwise inequalities for local time: applications to Skorokhod embeddings and optimal stopping, Annals of Applied Probability 18(5), 1870–1896.
[17] d'Aspremont, A. & El Ghaoui, L. (2006). Static arbitrage bounds on basket option prices, Mathematical Programming 106(3), Series A, 467–489.
[18] Davis, M.H.A. & Hobson, D.G. (2007). The range of traded option prices, Mathematical Finance 17(1), 1–14.
[19] Davis, M.H.A., Schachermayer, W. & Tompkins, R.G. (2001). Installment options and static hedging, in Mathematical Finance (Konstanz, 2000), Trends in Mathematics, Birkhäuser, Basel, pp. 131–139.
[20] Derman, E., Ergener, D. & Kani, I. (1995). Static options replication, Journal of Derivatives 2, 78–95.
[21] Dupire, B. (1994). Pricing with a smile, Risk 7, 32–39.
[22] Engelmann, B., Fengler, M.R., Nalholm, M. & Schwender, P. (2006). Static versus dynamic hedges: an empirical comparison for barrier options, Review of Derivatives Research 9(3), 239–264.
[23] Hobson, D.G. (1998). Robust hedging of the lookback option, Finance and Stochastics 2(4), 329–347.
[24] Hobson, D.G. & Pedersen, J.L. (2002). The minimum maximum of a continuous martingale with given initial and terminal laws, Annals of Probability 30(2), 978–999.
[25] Hobson, D., Laurence, P. & Wang, T. (2005a). Static-arbitrage upper bounds for the prices of basket options, Quantitative Finance 5(4), 329–342.
[26] Hobson, D., Laurence, P. & Wang, T. (2005b). Static-arbitrage optimal subreplicating strategies for basket options, Insurance, Mathematics & Economics 37(3), 553–572.
[27] Jaschke, S.R. (1997). Arbitrage bounds for the term structure of interest rates, Finance and Stochastics 2(1), 29–40.
[28] Madan, D.B. & Yor, M. (2002). Making Markov martingales meet marginals: with explicit constructions, Bernoulli 8(4), 509–536.
[29] Merton, R.C. (1973). Theory of rational option pricing, The Bell Journal of Economics and Management Science 4(1), 141–183.
[30] Monroe, I. (1972). On embedding right continuous martingales in Brownian motion, Annals of Mathematical Statistics 43, 1293–1311.
[31] Obłój, J. (2004). The Skorokhod embedding problem and its offspring, Probability Surveys 1, 321–390, electronic.
[32] Perkins, E. (1986). The Cereteli-Davis solution to the H^1-embedding problem and an optimal embedding in Brownian motion, in Seminar on Stochastic Processes, 1985 (Gainesville, Fla., 1985), Progress in Probability and Statistics, Birkhäuser Boston, Boston, Vol. 12, pp. 172–223.
[33] Revuz, D. & Yor, M. (1999). Continuous Martingales and Brownian Motion, Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], 3rd Edition, Springer-Verlag, Berlin, Vol. 293.
[34] Tompkins, R. (1997). Static versus dynamic hedging of exotic options: an evaluation of hedge performance via simulation, Netexposure 1, 1–28.
[35] Vallois, P. (1983). Le problème de Skorokhod sur R: une approche avec le temps local, in Seminar on Probability, XVII, Lecture Notes in Mathematics, Springer, Berlin, Vol. 986, pp. 227–239.

Related Articles

Arbitrage Strategy; Barrier Options; Dupire Equation; Good-deal Bounds; Hedging; Model Calibration; Skorokhod Embedding; Static Hedging.

ALEXANDER COX
Average Strike Options

An average strike option is also known as an Asian option with floating strike. These options have a payoff based on the difference between the terminal asset price and the average of an underlying asset price over a specified time period. The other type of Asian option is the fixed-strike option, where the payoff is determined by the average of an underlying asset price and a fixed strike set in advance (see Asian Options).

If the average is computed using a finite sample of asset price observations taken at a set of regularly spaced time points, we have a discrete average strike option. A continuous time option is obtained by computing the average via the integral of the price path over an interval of time. The average itself can be defined to be geometric or arithmetic. As for the fixed strike Asian option, when the geometric average is used, the average strike option has a closed-form solution for the price, whereas the option with arithmetic average does not have a known closed-form solution.

We concentrate on the continuous time, average strike option of European style with arithmetic averaging. A discussion of the uses and rationale for introducing Asian contracts is given in Asian Options. Average strike options are closely related to these options, but are less commonly used in practice.

Consider the standard Black–Scholes economy with a risky asset (stock) and a money market account. We also assume the existence of a risk-neutral probability measure Q (equivalent to the real-world measure P) under which discounted asset prices are martingales. We denote expectation under measure Q by \mathbb{E}, and the stock price follows

    \frac{dS_t}{S_t} = (r - \delta) \, dt + \sigma \, dW_t    (1)

where r is the constant continuously compounded interest rate, \delta is a continuous dividend yield, \sigma is the instantaneous volatility of asset return, and W is a Q-Brownian motion. The reader is referred to Black–Scholes Formula for details on the Black–Scholes model and Risk-neutral Pricing for a discussion of risk-neutral pricing.

We consider a contract that is based on the value A_T, where (A_t)_{t \ge t_0} is the arithmetic average

    A_t = \frac{1}{t - t_0} \int_{t_0}^{t} S_u \, du, \quad t > t_0    (2)

and by continuity, we define A_{t_0} = S_{t_0}. The corresponding geometric average G_t is defined as

    G_t = \exp\left( \frac{1}{t - t_0} \int_{t_0}^{t} \ln S_u \, du \right)    (3)

The contract is written at time 0 (with 0 \le t_0) and expires at T > t_0. It is of interest to calculate the price of the option at the current time t, where 0 \le t \le T. The position of t compared to the start of the averaging, t_0, may vary, as described in Asian Options.

The payoff of an average strike call with arithmetic averaging is given as

    (S_T - A_T)^+    (4)

and the payoff of an average strike put with arithmetic averaging is

    (A_T - S_T)^+    (5)

Average strike option payoffs with geometric averaging are identical, with A_T replaced by G_T. The buyer of an average strike call is able to exchange the terminal asset price for the average of the asset price over a given period. For this reason, it is sometimes referred to as a lookback on the average (see Lookback Options for a discussion of the lookback option).

By standard arbitrage arguments, the time-t price of the average strike call is

    e^{-r(T - t)} \, \mathbb{E}\left[ (S_T - A_T)^+ \mid \mathcal{F}_t \right]    (6)

and the price of the put is

    e^{-r(T - t)} \, \mathbb{E}\left[ (A_T - S_T)^+ \mid \mathcal{F}_t \right]    (7)

It turns out that we need to consider only the case t \ge t_0, where the option is "in progress". The forward starting case (t < t_0) can be rewritten as a modified option with averaging starting at t, today. This is in contrast to the Asian option with fixed strike, where the difficult case was when the option was forward
starting. As for the Asian option, the average strike option satisfies a put–call parity; see [1] for details.

The average strike option is an exotic path-dependent option, as the price depends on the path of the underlying asset via the average. The distribution of the average $A_T$ is not lognormal, if the asset price is lognormal, and pricing is difficult because the joint law of $A_T$ and $S_T$ is needed. This is in contrast to the Asian option, which required only the law of the average. Perhaps because of this increased complexity, or their lesser popularity in practice, fewer methods exist for the pricing of average strike options. Just as for the Asian option, there are no closed-form solutions for the price of the average strike option.

Many of the methods that we discuss here for pricing are similar to those used to price the Asian option. An early technique to give an approximate price for the average strike option was to replace the arithmetic average $A_T$ with the geometric average $G_T$. Since $G_T$ has a lognormal distribution, the (approximate) pricing problem becomes (for a call)

\[ e^{-r(T-t)}\,\mathbb{E}\big[(S_T - G_T)^+ \mid \mathcal{F}_t\big] \tag{8} \]

We recognize that this is exactly an exchange option (see Exchange Options), which can be priced via a change of measure, as in [9]. Levy and Turnbull [8] mentioned this connection to exchange options, but it was Conze and Viswanathan [3] who presented the results of this computation.

Other analytical approximations can be obtained by approximating the true joint distribution of the arithmetic average and asset price using an approximate distribution, usually jointly lognormal with appropriate parameters. Chung et al. [2] extended the linear approximations of Bouaziz et al. [1], Levy [7], and Ritchken et al. [10] (approximating distribution of $\{A_T, S_T\}$ by joint lognormal) to include quadratic terms. Their approximation is no longer based on a geometric-type approximation.

Recently, symmetries of a similar style to that of the put–call symmetry have been found between fixed strike Asian options and average strike options. For forward starting average strike options, Henderson et al. [4] gave a symmetry with a starting Asian option. If the average strike option is starting, the special case of Henderson and Wojakowski [5] is recovered. If the average strike option is in progress, it cannot be rewritten as an Asian option, and Henderson et al. [4] derived an upper bound for the price of the average strike option. This bound is in terms of an Asian option with fixed strike and a vanilla option. The method gives an "exact" bound for forward starting and starting options and when expiry is reached.

Numerical methods can be used to price the average strike option. The discussion of Monte Carlo simulation in Asian Options is also relevant here, as simulation is often used as a benchmark price. Ingersoll [6] was the first to recognize that it is possible to reduce the dimension of the pricing problem for the average strike option using a transformation of variables. Despite the value of the average strike option at t depending on the current asset price, current value of the average, and time to expiry, a one-dimensional partial differential equation (PDE) can be derived by using Ingersoll's reduction of variables. However, the drawback is that the Dirac delta function appears as a coefficient of the PDE, making it prone to instabilities. Vecer's [12] PDE method for Asian options with fixed strike also applies to average strike options and gives a stable one-dimensional PDE. Some testing of this method for the average strike option is given in [11].

To conclude, research into pricing the average strike option is ongoing, with current PDE and bound methods being very efficient.

References

[1] Bouaziz, L., Briys, E. & Crouhy, M. (1994). The pricing of forward starting Asian options, Journal of Banking and Finance 18(5), 823–839.
[2] Chung, S., Shackleton, M. & Wojakowski, R. (2003). Efficient quadratic approximation of floating strike Asian option values, Finance 24(1), 49–62.
[3] Conze, A. & Viswanathan, R. (1991). European path dependent options: the case of geometric averages, Finance 12(1), 7–22.
[4] Henderson, V., Hobson, D., Shaw, W. & Wojakowski, R. (2007). Bounds for in-progress floating-strike Asian options using symmetry, Annals of Operations Research 151, 81–98.
[5] Henderson, V. & Wojakowski, R. (2002). On the equivalence of fixed and floating-strike Asian options, Journal of Applied Probability 39(2), 391–394.
[6] Ingersoll, J. (1987). Theory of Financial Decision Making, Rowman and Littlefield Publishers, New Jersey.
[7] Levy, E. (1992). Pricing European average rate currency options, Journal of International Money and Finance 11(5), 474–491.
[8] Levy, E. & Turnbull, S. (1992). Average intelligence, Risk 5, 2.
[9] Margrabe, W. (1978). The value of an option to exchange one asset for another, Journal of Finance 33, 177–186.
[10] Ritchken, P., Sankarasubramanian, L. & Vijh, A.M. (1993). The valuation of path-dependent contracts on the average, Management Science 39(10), 1202–1213.
[11] Shiuan, Y.J. (2001). Pricing Floating-Strike Asian Options. MSc dissertation, University of Warwick.
[12] Vecer, J. (2001). A new PDE approach for pricing arithmetic average Asian options, Journal of Computational Finance 4(4), 105–113.

Related Articles

Asian Options; Black–Scholes Formula; Exchange Options; Lookback Options; Risk-neutral Pricing.

VICKY HENDERSON
Foreign Exchange Markets

The foreign exchange (FX) market has two major functionalities, one related to hedging and the other to investment.

In the age of globalization, it is essential for corporates and multinationals to hedge their FX exposure due to export/import activities. In addition, institutional fund managers need to hedge their FX risk in stocks or bonds if the stocks/bonds are quoted in a foreign currency. With hedging instruments, the FX exposure can be reduced and one can even benefit from certain market scenarios. This kind of participation brings us to the important class of investor-oriented products where the coupon depends on an FX rate or, at maturity, the pay-off (amount, currency) will be determined by an FX rate. This kind of product can be issued as a note, certificate, or bond.

For the major currencies such as USD, EUR, JPY, GBP, CHF, AUD, CAD, and NZD, the market has become more transparent over the last few years. For plain vanilla options, market data, especially volatilities for maturities below 1 year, are published by brokers or banks and are shown on Reuters pages (e.g., TTKLINDEX10, ICAPFXOP, GFIVOLS). For exotic products, new pricing tools, such as Superderivatives, LPA, Bloomberg, ICY, Fenics, and so on, are available for users, but the premium of the option will depend on the pricing model and the adjustments used.

For the emerging market currencies such as PLN (Polish zloty), HUF (Hungarian forint), ZAR (South African rand), and so on, which are freely tradable but less liquid, the market data are less transparent.

Currencies that are not freely tradable (the currency cannot be cash-settled offshore) such as BRL (Brazilian real) or CNY (Chinese yuan renminbi) can be traded as a nondeliverable forward (NDF) or as a nondeliverable option (NDO) against a tradable currency. The NDF is a cash-settled product without exchange of notionals, which means that the intrinsic value at maturity will be paid in the freely tradable currency based on a fixing source. The underlying of an NDO is the NDF, meaning that exercising the NDO results in an NDF, which will also be cash-settled.

Another class of currencies is that of the fully cash-settled pegged ones, which means that their exchange rate is 100% correlated to a major currency, mostly the USD. If one expects that this peg will continue, hedges should be done in the correlated major currency. In the case of SAR (Saudi riyal) or AED (United Arab Emirates dirham), discussion has been ongoing about depegging the currencies. Should this happen, interest could grow in SAR- or AED-linked investments that participate in the depegging. For the "more exotic currencies" such as the GHC (Ghanaian cedi), there is no options market.

Quotation

The exchange rate can be defined as the amount of domestic currency one gets if one sells one unit of foreign currency. If we take a look at an example of the EUR/USD exchange rate, the default quotation is EUR-USD, where USD is the domestic currency and EUR is the foreign currency. The terms domestic and foreign are not related to the location of the trader or any country; they are purely a matter of definition. Domestic and base are synonyms, as are foreign and underlying. The common way is to denote the currency pair with a slash (/) and the quotation with a dash (-). The slash (/) does not mean a division.

For example, the currency pair EUR/USD can be quoted either in EUR-USD, which means how many USD one gets for selling one EUR, or in USD-EUR, which then means how many EUR one gets for selling one USD. There are certain market standard quotations; some of them are listed in Table 1.

Table 1 Market convention of some major currency pairs with sample spot price

Currency pair   Quotation   Quote
EUR/USD         EUR-USD     1.4400
GBP/USD         GBP-USD     1.9800
USD/JPY         USD-JPY     114.00
USD/CHF         USD-CHF     1.1500
EUR/CHF         EUR-CHF     1.6600
EUR/JPY         EUR-JPY     165.00
EUR/GBP         EUR-GBP     0.7300
USD/CAD         USD-CAD     0.9800

In the FX market, two currencies are involved, which means that one needs to specify on which currency a particular call or put option is written. For
instance, in the currency pair EUR/USD, there can be a EUR call, which is equivalent to a USD put, or a EUR put, which is equivalent to a USD call.

FX Terminology

In the FX market, a million is called a buck and a billion a yard. This is because the word "billion" has different meanings in different languages. In French and German, it represents $10^{12}$ and in English it stands for $10^{9}$.

Certain currency pairs have their own names in the market. For instance, GBP/USD is called a cable, because the exchange rate information used to be sent between England and America through a telephone cable in the Atlantic Ocean. EUR/JPY is called the cross, because it is the cross rate of the more liquidly traded USD/JPY and EUR/USD.

Some currency pairs also have their own names to make them short and unique in communication. The New Zealand dollar, which is NZD/USD, is called Kiwi, and the Australian dollar, which is AUD/USD, is called Aussie. Among the Scandinavian currencies, NOK (Norwegian krone) is called Noki, SEK (Swedish krona) is called Stoki, and in combination with DKK (Danish krone) the three are called Scandies.

The exchange rates are usually quoted in five relevant figures, for example, in EUR-USD we would get a quote of 1.4567. Sometimes one can get a quote up to six figures, but for the time being we focus on five figures. The last digit "7" is called the pip and the middle digit "5" is called the big figure, because the interbank spot trading tools show this digit in bigger size since it is the most important information. The figure to the left of the big figure is known anyway and the pips to the right of the big figure are sometimes "negligible". For example, a rise of EUR-JPY 165.00 by 40 pips is 165.40, and a rise by 3 big figures would be 168.00.

Quotation of Option Prices

Plain vanilla option prices are usually quoted in terms of implied volatility. If an option is priced in volatility, a delta exchange is necessary. The advantage is that the volatility does not usually move as quickly as the spot rate and one has the chance to compare the prices, especially in the broker market. On the basis of the spot rate on which the delta exchange is done, the premium of the plain vanilla option is calculated via the Black–Scholes formula. For exotic options, a price in volatility is not possible because each bank has its own pricing model for these.

The premium, value, or price of an option can be quoted in six different ways (Table 2). The Black–Scholes formula quotes in domestic pips per one unit of foreign notional. The others can be retrieved in the following manner:

\[ \text{d pips} \;\xrightarrow{\;\times \frac{1}{SK}\;}\; \text{f pips} \;\xrightarrow{\;\times K\;}\; \%\text{f} \;\xrightarrow{\;\times \frac{S}{K}\;}\; \%\text{d} \tag{1} \]

Table 2 Standard market quotation types for option premiums

Symbol   Description of symbol       Result of example
d pips   Domestic per unit foreign   208.42 USD pips per EUR
f pips   Foreign per unit domestic   97.17 EUR pips per USD
%f       Foreign per unit foreign    1.4575% EUR
%d       Domestic per unit domestic  1.3895% USD
d        Domestic amount             20 842 USD
f        Foreign amount              14 575 EUR

Foreign = EUR, domestic = USD, $S_0$ = 1.4300, $r_d$ = 5.0%, $r_f$ = 4.5%, volatility = 8.0%, K = 1.5000, T = 365 days, EUR call USD put, notional = 1 000 000 EUR = 1 500 000 USD

Delta and Premium Convention

The spot delta of a plain vanilla option can be retrieved in a straightforward way by using the Black–Scholes formula. It is called the raw spot delta, $\delta_{\text{raw}}$. One retrieves it in percentage of the foreign currency, but the delta in the second involved currency, $\delta_{\text{raw}}^{\text{opposite}}$, can be computed in the following manner:

\[ \delta_{\text{raw}}^{\text{opposite}} = -\,\delta_{\text{raw}}\,\frac{S}{K} \tag{2} \]

The delta multiplied with the corresponding notional determines the amount that has to be bought
or sold to hedge the spot risk of the option up to the first order.

An important question is whether the premium of the option needs to be included in the delta. Consider EUR-USD as an example. In this quotation, USD is the domestic currency and EUR is the foreign one. The Black–Scholes formula calculates the premium in domestic per 1 unit foreign currency, which in our example is in USD per 1 EUR. This premium is denoted by p. If the premium is paid in EUR, which means in the foreign currency, it includes an FX risk. The premium p in USD is equivalent to p/S EUR, which means that the amount of EUR that has to be bought to hedge the option needs to be reduced by this EUR premium and is given as

\[ \delta_{\text{raw}} - \frac{p}{S} \ \text{EUR} \tag{3} \]

We denoted USD as domestic currency and EUR as foreign currency, but do all banks or trading places have this notion? What is the notional currency of the option and what is the premium currency? In the interbank market, there exists a fixed notion of the delta of the currency pair. Normally, it is the LHS delta in Fenics (see End Note a) if the option is traded in the LHS premium, which is mostly used, for example, for EUR/USD, USD/JPY, and EUR/JPY, and the RHS delta if it is the RHS premium, for example, for GBP/USD and AUD/USD. Most of the options traded in the market are out-of-the-money; therefore, the premium does not create a critical FX risk for the trader.

For the banks where the base currency is considered the risk-free currency, the market value of the option is in the base currency, and if the premium is in the risky currency, the premium needs to be included in the hedge. If the premium is in the risk-free (or the base) currency, the premium will be offset by the market value of the option. In the opposite case, where the risk-free currency is the underlying currency, if the premium is in the risky currency, the premium will be offset by the market value of the option. Only in the case of premium in risk-free currency, the amount needs to be included in the hedge. Therefore, the delta hedge is invariant with respect to the risky currency notion of the bank; for example, for two banks, one based in USD and the other in EUR, the delta is the same.

Examples

To see the different deltas used in practice, consider the two examples in Tables 3 and 4.

Table 3 One-year EUR call USD put, the strike is 1.4300 for a EUR-based bank

Delta currency   Premium currency   Fenics     Hedge               Delta
%EUR             EUR                LHS        δraw − P            48.35
%EUR             USD                RHS        δraw                51.64
%USD             EUR                RHS + F4   −(δraw − P) S/K     −48.35
%USD             USD                LHS + F4   −δraw S/K           −51.64

S = 1.4300, $r_d$ = 5.0%, $r_f$ = 4.5%, volatility = 8.0%, K = 1.4300

Table 4 One-year EUR call USD put, the strike is 1.5000 for a EUR-based bank

Delta currency   Premium currency   Fenics     Hedge               Delta
%EUR             EUR                LHS        δraw − P            28.22
%EUR             USD                RHS        δraw                29.69
%USD             EUR                RHS + F4   −(δraw − P) S/K     −26.91
%USD             USD                LHS + F4   −δraw S/K           −28.30

S = 1.4300, $r_d$ = 5.0%, $r_f$ = 4.5%, volatility = 8.0%, K = 1.5000

Implied Volatility and Delta for a Given Strike

Implied volatility is not constant across strikes (see Foreign Exchange Smiles). The volatility σ depends on the corresponding delta of the option, but the delta depends on the price of the option and therefore on the volatility used. How can we retrieve the correct volatility for a given strike? This is necessarily an iterative process. Initially, one uses the at-the-money (ATM) volatility $\sigma_0$ and calculates the delta $\Delta_1$. On the basis of $\Delta_1$, a new volatility $\sigma_1$ can be retrieved from the volatility matrix. This new volatility leads to a new delta and so on. Now one can define a convergence criterion to stop the iteration. In practice, a fixed number of iterations is used, usually five steps.
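The iteration between volatility and delta can be sketched as follows. This is a minimal illustration, not a market implementation: it assumes the delta in question is the raw Black–Scholes spot delta of a call, and the smile pillars in `SMILE_PILLARS`, the linear interpolation in delta, and all function names are hypothetical choices made only for the example. The market data are those used in Table 4.

```python
from math import exp, log, sqrt
from statistics import NormalDist

N = NormalDist()  # standard normal distribution


def call_spot_delta(spot, strike, t, rd, rf, vol):
    """Raw Black-Scholes spot delta of a call on the foreign currency."""
    d1 = (log(spot / strike) + (rd - rf + 0.5 * vol * vol) * t) / (vol * sqrt(t))
    return exp(-rf * t) * N.cdf(d1)


# Hypothetical one-year smile, quoted as (call delta, volatility) pillars.
SMILE_PILLARS = [(0.10, 0.0880), (0.25, 0.0830), (0.50, 0.0800),
                 (0.75, 0.0835), (0.90, 0.0900)]


def smile_vol(delta, pillars=SMILE_PILLARS):
    """Linear interpolation in delta, flat beyond the outer pillars (assumption)."""
    if delta <= pillars[0][0]:
        return pillars[0][1]
    for (x0, y0), (x1, y1) in zip(pillars, pillars[1:]):
        if delta <= x1:
            return y0 + (y1 - y0) * (delta - x0) / (x1 - x0)
    return pillars[-1][1]


def vol_for_strike(spot, strike, t, rd, rf, atm_vol, steps=5):
    """Start from the ATM volatility and alternate delta -> volatility lookups."""
    vol = atm_vol
    for _ in range(steps):
        delta = call_spot_delta(spot, strike, t, rd, rf, vol)
        vol = smile_vol(delta)
    return vol, call_spot_delta(spot, strike, t, rd, rf, vol)


vol, delta = vol_for_strike(1.4300, 1.5000, 1.0, 0.05, 0.045, 0.08)
print(f"volatility = {vol:.4%}, delta = {delta:.4f}")
```

With a smile this flat, each pass shrinks the change in volatility considerably, which is consistent with a small fixed number of steps being enough in practice; a steeper smile or a premium-adjusted delta may call for an explicit convergence check.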
Table 5 Vega matrix for standard maturities and delta values, expressed in percent of foreign notional

Mat/Δ   50%    45%    40%    35%    30%    25%    20%    15%    10%    5%
O/N     0.02   0.02   0.02   0.02   0.02   0.02   0.02   0.01   0.01   0.01
1W      0.06   0.06   0.05   0.05   0.05   0.04   0.04   0.03   0.02   0.01
1M      0.11   0.11   0.11   0.10   0.10   0.09   0.08   0.07   0.05   0.03
2M      0.16   0.16   0.15   0.15   0.14   0.13   0.11   0.09   0.07   0.04
3M      0.20   0.20   0.19   0.19   0.17   0.16   0.14   0.12   0.09   0.05
6M      0.28   0.28   0.27   0.26   0.25   0.23   0.20   0.17   0.13   0.07
9M      0.33   0.33   0.33   0.32   0.30   0.28   0.24   0.20   0.15   0.09
1Y      0.38   0.38   0.38   0.37   0.35   0.32   0.28   0.24   0.18   0.10
2Y      0.51   0.51   0.51   0.50   0.48   0.44   0.40   0.33   0.25   0.15
3Y      0.60   0.60   0.60   0.60   0.57   0.54   0.48   0.40   0.31   0.18

The matrix shows, for example, that a 2Y EUR call USD put with 35 delta can be hedged with two times a 6M EUR call USD put with 30 delta.
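Why a vega matrix of this kind can be tabulated without reference to volatility or the domestic rate can be illustrated with a short calculation. The sketch below assumes that the delta is the raw Black–Scholes spot delta of a call, $\Delta = e^{-r_f T} N(d_1)$, so that $d_1$ follows from delta, $r_f$, and maturity alone, and that vega in units of foreign notional is $e^{-r_f T}\sqrt{T}\,n(d_1)$. The function name and the year fractions are illustrative choices.

```python
from math import exp, sqrt
from statistics import NormalDist

N = NormalDist()  # standard normal distribution


def vega_pct_of_foreign_notional(delta, t, rf):
    """Vega per one volatility point, in percent of foreign notional.

    Assumes the raw spot delta of a call, delta = exp(-rf * t) * N(d1).
    Neither the volatility nor the domestic rate enters the calculation.
    """
    d1 = N.inv_cdf(delta * exp(rf * t))
    # Vega per unit of volatility as a fraction of notional; scaling to one
    # volatility point (x 0.01) and to percent (x 100) leaves the number unchanged.
    return exp(-rf * t) * sqrt(t) * N.pdf(d1)


for label, t in [("1M", 1.0 / 12.0), ("1Y", 1.0)]:
    row = [vega_pct_of_foreign_notional(d, t, 0.045) for d in (0.50, 0.25)]
    print(label, " ".join(f"{v:.2f}" for v in row))
```

Under these assumptions the 1Y entry at 50% delta comes out at about 0.38, in line with Table 5. The delta and day-count conventions behind the table are not stated in the text, so differences in the last digit of other entries should be expected.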

Mapping of Delta on Vega

From the Black–Scholes formula, it is clear that for a fixed delta, the vega $P_\sigma$ does not depend on volatility or $r_d$, and $P_\sigma$ is therefore a function of only $r_f$, maturity, and delta. This gives the trader the advantage of a moderately stable vega matrix. Such a matrix is shown in Table 5, with $r_f$ = 4.5%.

FX Smile

The FX smile surface has a different setup or construction in comparison to equity.

Plain vanilla options with different maturities have different implied volatilities. This is called the term structure of a currency pair.

Plain vanilla options are quoted in terms of volatility for a given delta. The smile curve is usually set up on some fixed pillars and the points between these pillars are interpolated to get a smooth surface. In the direction of the term structure, the easiest way is to interpolate linearly in the variance. In addition, weights are introduced to highlight or lower the importance of some dates, for example, the release of nonfarm payrolls, a local holiday, or a day before or after a weekend. In the direction of the moneyness or delta, one method of interpolation is the cubic spline. The pillars in that direction are 10-delta put, 25-delta put, ATM, 25-delta call, and 10-delta call. Sometimes the 35-delta put and 35-delta call are also used. Unlike equity, in the FX market, the smile surface is decomposed into the symmetric part by using butterflies or strangles and the skew part by using the risk reversals for the fixed deltas. This is because of the liquidity of these products in the FX market.

Risk Reversal (RR)

For instance, a 25-delta risk reversal is a combination of buying a 25-delta call and selling a 25-delta put. The payout profile is shown in Figure 1.

Figure 1 Payout profile of a risk reversal (payoff against delta, with 25%, ATM, and 25% marked on the delta axis)

Butterfly (BF)

A 25-delta butterfly is the combination of buying a 25-delta put, buying a 25-delta call, selling an ATM call, and selling an ATM put (the 25-delta put and the 25-delta call together form a 25-delta strangle). Payout profiles are shown in Figure 2.

The decomposition of a smile curve inspired by these products is shown in Figure 3.
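The payout profiles of the two products can be written down directly from their constituents. The sketch below ignores premiums and reads the butterfly as a long 25-delta strangle against a short ATM straddle; the three strikes are hypothetical placeholders for the 25-delta put, ATM, and 25-delta call strikes, which in practice depend on the volatility and the delta convention.

```python
def call_payoff(spot, strike):
    """Payoff at expiry of a long call, per unit of notional."""
    return max(spot - strike, 0.0)


def put_payoff(spot, strike):
    """Payoff at expiry of a long put, per unit of notional."""
    return max(strike - spot, 0.0)


def risk_reversal_payoff(spot, k_put, k_call):
    """Long call struck at k_call, short put struck at k_put."""
    return call_payoff(spot, k_call) - put_payoff(spot, k_put)


def butterfly_payoff(spot, k_put, k_atm, k_call):
    """Long strangle (k_put, k_call), short straddle struck at k_atm."""
    strangle = put_payoff(spot, k_put) + call_payoff(spot, k_call)
    straddle = put_payoff(spot, k_atm) + call_payoff(spot, k_atm)
    return strangle - straddle


# Hypothetical strikes for illustration only.
K_PUT, K_ATM, K_CALL = 1.37, 1.44, 1.51

for spot in (1.30, 1.37, 1.44, 1.51, 1.60):
    rr = risk_reversal_payoff(spot, K_PUT, K_CALL)
    bf = butterfly_payoff(spot, K_PUT, K_ATM, K_CALL)
    print(f"spot {spot:.2f}: risk reversal {rr:+.4f}, butterfly {bf:+.4f}")
```

The risk reversal is flat between the two strikes and linear outside them, which is why it captures the skew; the butterfly payoff is symmetric around the ATM strike when the wing strikes are equidistant, which is why it captures the convexity.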
Figure 2 Payout profile of a butterfly (payoff against delta, with 25%, ATM, and 25% marked on the delta axis)

Figure 3 Decomposition of a smile curve (volatility against put delta and call delta, from −25% through ATM to +25%, with RR and BF indicated)

If we denote the ATM volatility by $\sigma_0$, the 25-delta put volatility by $\sigma_-$, and the 25-delta call volatility by $\sigma_+$, we get the following relationships:

\[ RR = \sigma_+ - \sigma_- \tag{4} \]
\[ BF = \tfrac{1}{2}(\sigma_+ + \sigma_-) - \sigma_0 \tag{5} \]
\[ \sigma_+ = \sigma_0 + BF + \tfrac{1}{2} RR \tag{6} \]
\[ \sigma_- = \sigma_0 + BF - \tfrac{1}{2} RR \tag{7} \]

It should be noted that the values RR and BF given above have nothing to do with the prices of actual risk reversal and butterfly contracts: rather they provide a convenient representation of the implied volatility smile in terms of its level ($\sigma_0$), convexity (BF), and skewness (RR).

Table 6 shows the 25-delta risk reversal in EUR/USD on different trading dates and the corresponding butterflies are listed in Table 7.

Table 6 EUR/USD 25-delta risk reversal (in %)

Date           1 month   3 months   1 year
Dec 3, 2007    −0.6      −0.6       −0.6
Dec 4, 2007    −0.525    −0.55      −0.525
Dec 5, 2007    −0.525    −0.55      −0.525
Dec 6, 2007    −0.525    −0.55      −0.525
Dec 7, 2007    −0.6      −0.6       −0.6
Dec 10, 2007   −0.6      −0.6       −0.6

Table 7 EUR/USD 25-delta butterfly (in %)

Date           1 month   3 months   1 year
Dec 3, 2007    0.225     0.425      0.460
Dec 4, 2007    0.225     0.425      0.460
Dec 5, 2007    0.225     0.425      0.460
Dec 6, 2007    0.225     0.425      0.460
Dec 7, 2007    0.227     0.425      0.463
Dec 10, 2007   0.227     0.425      0.463

For the setup of the smile surface in a risk management or pricing tool, it is important to know which convention is used for a certain currency pair in the option market to define the notion of ATM.

At-the-money Definition

There exist several definitions of ATM:

• ATM spot: Strike is equal to the spot.
• ATM forward: Strike is equal to the forward.
• Delta parity: The absolute value of the call delta is equal to the absolute value of the put delta.
• Fifty delta: Put delta is 50% and the call delta is 50%.
• Value parity: The premium of the put is equal to the premium of the call.

The most widely used one in the interbank market is the delta parity up to 1 year for the most liquid currencies. In emerging markets, at-the-money forward (ATMF) is used. For long-term options such as USD/JPY 15 years, the ATMF convention is used, but since this results in a delta position, a forward delta exchange will be done.
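Setting up the three central smile pillars from market quotes combines equations (6) and (7) with a choice of ATM convention. The sketch below is illustrative only. The risk reversal and butterfly are the 1-month quotes of Dec 3, 2007 from Tables 6 and 7, while the ATM volatility of 8.0% is an assumed value that does not come from those tables. The delta-parity strike uses the closed form $K = F e^{\sigma_0^2 T / 2}$, which holds when deltas are not premium-adjusted (the condition $d_1 = 0$); with premium-adjusted deltas the strike differs.

```python
from math import exp


def wing_vols(atm_vol, rr, bf):
    """Equations (6) and (7): 25-delta call and put volatilities."""
    call_vol = atm_vol + bf + 0.5 * rr
    put_vol = atm_vol + bf - 0.5 * rr
    return call_vol, put_vol


def atm_strikes(spot, t, rd, rf, atm_vol):
    """ATM strike under three of the conventions listed above."""
    forward = spot * exp((rd - rf) * t)
    return {
        "ATM spot": spot,
        "ATM forward": forward,
        # d1 = 0 equates |call delta| and |put delta| when deltas are
        # not premium-adjusted (assumption of this sketch).
        "delta parity": forward * exp(0.5 * atm_vol ** 2 * t),
    }


# Quotes in volatility points; the ATM level of 8.0 is an assumed input.
call_vol, put_vol = wing_vols(8.0, -0.6, 0.225)
print(f"25-delta call vol {call_vol:.3f}, 25-delta put vol {put_vol:.3f}")

strikes = atm_strikes(1.4300, 1.0, 0.05, 0.045, 0.08)
for name, strike in strikes.items():
    print(f"{name}: {strike:.4f}")
```

A negative risk reversal places the put volatility above the call volatility, and the positive butterfly lifts both wings above the ATM level, which matches the reading of RR as skewness and BF as convexity.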
End Notes

a. Fenics is an FX option pricing tool owned by the broker GFI and used in the interbank market (www.fenics.com).

Further Reading

Hakala, J. & Wystup, U. (2002). Foreign Exchange Risk, Risk Publications, London.
Reiswich, D. & Wystup, U. (2009). FX Volatility Smile Construction, Research Report, Frankfurt School of Finance & Management, September 2009.
Wystup, U. (2006). FX Options and Structured Products, Wiley Finance.

Related Articles

Black–Scholes Formula; Exchange Options; Foreign Exchange Options; Foreign Exchange Options: Delta- and At-the-money Conventions; Foreign Exchange Smiles; Foreign Exchange Smile Interpolation.

MICHAEL BRAUN
Foreign Exchange Options

Market Overview

The importance of foreign exchange (FX) options for risk management and directional trades is gaining more and more recognition from companies and investors. Various banks have been adapting their products to this situation during the past years. Different risk and profit profiles can be generated with plain vanilla or exotic options as individual products, as well as in combination with various products such as structured products. Financial engineers call this playing with Lego bricks. Linear combinations of basic products are used to build structured products. To price plain vanilla or exotic options and show their risk, many professional trading systems have been introduced and are being continuously developed. With these systems, the traders are able to evaluate the positions in the individual currency pair or in currency portfolios at any time. In the FX options market, options trading systems such as Fenics, Murex, or SuperDerivatives are used. Owing to the very rapid development in this sector, some banks started developing and using systems of their own.

To comply with various customer requests, a successful trading desk in the interbank market is essential. This is mostly plain volatility trade. The market risk of a short-term FX options trading desk consists of changes in spot, volatility, and interest rates. Since spot risk is easily eliminated by delta hedging and the effect of rates is small compared to the risk of changing volatility in the short term up to two years, managing volatility risk is the main task of the trader. Since the relationship of volatility and price of call or put options is monotone, it is equivalent to quote the price of an option either by the price itself or by the volatility implied by the Black–Scholes formula. The established market standard is quoting this implied volatility, which is why it is often viewed as a traded quantity. In the case of plain vanilla options, a vega long position is given when buying (call or put); conversely, a vega short position is given when selling. The volatility difference between a call and a put with the same expiry and the same delta is called a risk reversal. If the risk reversal is positive, the market is willing to pay more for calls than for puts; if the risk reversal is negative, the market favors put options. The butterfly measures the convexity of the "smile" of the volatility, that is, the volatility for the out-of-the-money and the in-the-money options (see Foreign Exchange Markets for details).

If the delta hedge is done with an interbank partner at the same time the option is traded, the trader can focus on the vega position in his book. The delta hedge neutralizes the change of the option price caused by changes of the underlying. For long-term options with an expiry longer than two years or options with high interest rate sensitivity, the delta hedge should be replaced by a forward hedge, as the risk of interest rate sensitivity is mostly higher than the volatility risk in this case. This means that instead of neutralizing spot risk by trading in the spot market, one would trade a forward contract with maturities matching those of the cash flows of the option. This would simultaneously take care of the spot and the rate risk.

First-generation Exotic Options

First-generation exotic options are all options beyond plain vanilla options that started trading in the 1990s, in particular, barrier options, digital and touch products, average rate or Asian options, and lookback and compound options. There is no strict separation between first- and second-generation exotics as the viewpoint on what is first and second varies by the person in charge. Exotic options are traded live in currency trading as opposed to plain vanilla options, which mostly trade through automated systems. Trading exotic options is done by quoting the bid and ask price of the product rather than the corresponding volatility, because the monotone relationship of volatility and price is often not guaranteed. When asking for a quote, the spot reference level is agreed upon at which the option is calculated and priced. This allows comparing quotes of the exotic options and is also the basis of the delta hedge. To keep the vega risk low when fixing a deal, a vega hedge can be done with the partner. In this case, plain vanilla options (calls/puts, at-the-money (ATM) straddles) are traded to offset the vega of the exotic option. The default vega hedge is done with an ATM straddle (a call and a put with the same at-the-money strike), the reason being that this product does not have any delta, so one offsets the vega position without touching the delta position. Normally, during the lifetime of the option, the risk is hedged dynamically across the entire option book.

The quoting bank (market maker) is the calculation agent. It stipulates the regulations under which predefined triggers are reached or how often the underlying is traded in certain predefined ranges. The market maker informs the market user about the trigger event.

Barrier Options

Barrier options are vanilla put and call options with additional barriers. In case of a knockout, the option expires worthless if the spot ever trades at or beyond the prespecified barrier. In case of a knockin option, the option is only activated if the spot ever trades at or beyond the prespecified barrier. The barrier is valid at all times between inception of the trade and the maturity time of the option.

One can further distinguish regular barrier options, where the barrier is out-of-the-money, and reverse-barrier options, where the barrier is in-the-money. A regular knockout barrier option can basically be priced and semistatically hedged by a risk reversal (Lego-brick principle).

Figure 1 illustrates the example: EUR-USD spot 1.4600, expiry six months, strike 1.5000, EUR call with regular knockout trigger at 1.4300.

Figure 1 (a) Payoff profile of a EUR-USD knockout option that is not knocked out during its lifetime; (b) payoff profile of the EUR-USD risk reversal at expiry. Both panels plot P/L against spot at expiry, for spot between 1.25 and 1.65.

Hedging a short regular knockout EUR call, we can go long a vanilla EUR call with the same strike and the same expiry and go short a vanilla EUR put
with a strike such that the value of the hedge portfolio is zero if the spot is at the barrier. The long call and short put is called a risk reversal and its market price can be used as a proxy for the price of the regular knockout call. In our example, it would be a 1.3650 EUR put. If the trigger is not reached, then the put expires worthless and the call offsets the knockout call payoff. If the trigger is reached, the risk reversal can be canceled with approximately zero value. The delta of a knockout option is higher than the delta of the corresponding plain vanilla option, and the higher it is, the closer the trigger is to the underlying spot.

Reverse knockout and reverse knockin are more difficult to price and hedge as the risk profile of these options is difficult to replicate with other options. In this case, the trigger is in the money. The volatility risk of first and second order arising from these options can be hedged dynamically with risk reversals and butterflies (see Vanna–Volga Pricing). However, all sensitivities take extreme values when getting closer to the trigger and closer to maturity. Delta positions can be a multiple of the notional amount. Therefore, it is difficult for the trader to perform dynamic hedging strategies. To manage these risks, short-term reverse knockout barrier options are often removed from the global books and are matched as individual positions, or are closed two to three weeks before expiry. The risk surcharge paid in this case is often smaller than the cost of keeping to such positions and hedging them individually.

Modifications and Extensions of Barrier Options

Standard extensions of barrier options are double-barrier options, where there is a barrier above and below the current spot. A double knockout option expires worthless if any of the two barriers are ever touched or crossed. A double knockin option only becomes a vanilla option if at least one of the two barriers is touched or crossed in the underlying.

A further modification of barrier options is called the knockin/knockout (KIKO) option. This option can knockout at any time; however, it must knockin to become alive. A short KIKO option can be statically hedged with a long knockout option and a short double knockout option, if the spot value is between the triggers, and with a long knockout option and a short knockout option, if the spot value is above both triggers (Lego-brick principle).

Window barriers (partial barriers) are additional modifications of barrier options. In case of a window-barrier option, the trigger is valid only within a certain period of time. Commonly, this period of time is from inception of the trade until a specific date (early ending) or from a specific date during validity until expiry date of the option (deferred start). Arbitrary time intervals are possible.

For European barrier options, the triggers are only valid at maturity. They can be statically hedged with plain vanilla options and European digital options (Lego-brick principle).

Binary Options/Digital Options

Digital or binary options pay a fixed amount in a currency to be specified if the spot trades at or beyond a prespecified barrier or trigger. For European digitals, the trigger is valid only at maturity, whereas for American digitals, the trigger is valid during the entire lifetime of the trade. In FX-interbank trade American digitals are also called one-touch (if the fixed amount is paid at maturity) or instant one-touch (if the fixed amount is paid at first hitting time) options. Further touch options are the so-called no-touch options, double no-touch options, and double one-touch options. A no-touch pays only if the spot never touches or crosses the prespecified trigger. A double no-touch pays only if neither the upper trigger nor the lower trigger is ever touched or crossed during the lifetime of the contract. A double one-touch pays only if at least one of the upper or the lower triggers is touched. When buying a double no-touch option, a vega short position is generated. This means that double no-touch options are cheap in phases of high volatility.

European digital options can be replicated with bull or bear spreads with large amounts. Their market price can thus be approximated by liquid vanilla options. However, this type of option is difficult to hedge as the delta hedge close to expiry is zero almost everywhere.

General Features When Pricing Exotic Options

Most commercial software packages calculate the "theoretical value (TV)" of the exotic options, which
is the value of the product in a Black–Scholes model with constant parameters.

Knowing the TV is important for trading partners as it serves as a checksum to ensure that both parties talk about the same product. The market value, however, often deviates from this value because of so-called overhedge costs, which arise when hedging the exotic option. Every trader must be aware of the risk arising from these options and should be able to control this risk dynamically in his books via the Greeks (price sensitivities with respect to market and model parameters). If a gain is generated by performing this hedge, the market price of the exotic option should be below the TV. Conversely, if the hedge leads to a loss, the market price of the exotic option should be above the TV.

A very important issue when trading exotic options is placing automatic spot orders at spot levels that could lead to a knockout or expiry of the option. Such an order eliminates the delta hedge of the option automatically when the trigger is reached. This explains the occasional very heavy spot movements during specific trigger events in the market.

The following vega structure is often found in options books as it stems from most of the structured products offered today in the FX range: ATM vega long and wing vega short. This is the reason for a long phase of low volatility and high butterflies over the past years. See also Foreign Exchange Smiles.

Second-generation Exotic Options

We consider every exotic option as second generation if it is not a vanilla and not a first-generation product. Some of the common examples in FX markets are range accruals and faders.

A range accrual is a sum of digital call spreads and pays an amount of a prespecified currency that depends on the number of currency fixings that fall inside a prespecified range. A fader is any basic option product, like a vanilla or barrier option, whose notional amount depends on the number of currency fixings that fall inside a prespecified range. We distinguish fade-in products, where the notional grows with each fixing inside the range, and fade-out products, where the notional decreases with each fixing inside the range.

Further extensions are target redemption products, whose notional amount increases until a certain gain is reached. A common example is a target redemption forward (TRF). We provide a description and an example here: We consider a TRF in which a counterpart sells EUR and buys USD at a much higher rate than current spot or forward rates. The key feature of this product is that the counterpart has a total target profit that, once hit, knocks out all future settlements (in the example below, all weekly settlements), locking in the gains registered until then.

The idea is to place the strike over 5.5 big figures above spot to allow the counterpart to quickly accumulate profits and have the trade knocked out after five or six weeks. The counterpart will start losing money if EUR-USD starts fixing above the strike. On a spot reference of 1.4760, consider a one-year TRF, in which the counterpart sells EUR 1 million per week at 1.5335, subject to a knockout condition: if the sum of the counterpart profits reaches the target, all future settlements are canceled. We let the target be 0.30 (i.e., 30 big figures), measured weekly as Profit = Max(0, 1.5335 − EUR-USD Spot Fixing). As usual, this type of forward is also traded at zero cost:

Week 1 Fixing = 1.4800 Profit = Max(1.5335 − 1.4800, 0) = 0.0535
Week 2 Fixing = 1.4750 Profit = 0.0585 Accumulated profit = 0.1120
Week 3 Fixing = 1.4825 Profit = 0.0510 Accumulated profit = 0.1630
Week 4 Fixing = 1.4900 Profit = 0.0435 Accumulated profit = 0.2065
Week 5 Fixing = 1.4775 Profit = 0.0560 Accumulated profit = 0.2625
Week 6 Fixing = 1.4850 Profit = 0.0485 Accumulated profit = 0.3110

The profit is capped at 0.30, so in week 6 the counterpart only accumulates the last 3.75 big figures (0.3000 − 0.2625 = 0.0375) and the trade knocks out.

Each forward will be settled physically every week until the trade knocks out (if the target is reached).

Another popular FX product is the time option, which is essentially a forward contract of American style, that is, the buyer is entitled and obliged to trade a prespecified amount at a prespecified strike, but can choose the time within a prespecified time interval.
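The accumulation and knockout logic of the target redemption forward example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `trf_schedule` and its row layout are hypothetical, the seventh fixing is invented to show that settlements stop after knockout, and the sketch tracks only the profit measure Max(0, strike − fixing) used for the target, not the physical settlement or any losses above the strike.

```python
def trf_schedule(strike, fixings, target):
    """Weekly rows (week, fixing, profit, credited, accumulated) up to knockout.

    Hypothetical helper: profit is Max(0, strike - fixing); the amount credited
    in the final week is capped so that the accumulated profit never exceeds
    the target.
    """
    rows = []
    accumulated = 0.0
    for week, fixing in enumerate(fixings, start=1):
        profit = max(0.0, strike - fixing)
        credited = min(profit, target - accumulated)  # cap at remaining target
        accumulated += credited
        rows.append((week, fixing, round(profit, 4),
                     round(credited, 4), round(accumulated, 4)))
        if accumulated >= target - 1e-12:  # target reached: knock out
            break
    return rows


# Fixings of the example, plus one invented extra week that is never reached.
example_fixings = [1.4800, 1.4750, 1.4825, 1.4900, 1.4775, 1.4850, 1.4800]
for row in trf_schedule(1.5335, example_fixings, 0.30):
    print(row)
```

With the fixings of the example, the loop stops in week 6, where only the remaining 0.0375 is credited.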

The market is likely to continue to develop fast. Besides Bermudan style options, where early exercise is allowed at certain prespecified times, basket options and the corresponding structures are very much in demand in the market. Hybrid structures are exotic options whose payoff depends on underlying spots across different market sectors. We refer the reader to [1].

References

[1] Wystup, U. (2006). FX Options and Structured Products, John Wiley & Sons.

MARKUS CEKAN, ARMIN WENDEL & UWE WYSTUP
Currency Forward Contracts

Executive Summary

Structured forwards use combinations of options to replicate forward-like payout profiles. The main use of structured forwards is for corporate and institutional clients trying to hedge their foreign exchange exposures. While standard forwards lock in a fixed exchange rate, structured forwards give the user the possibility of an improved exchange rate, while still guaranteeing a worst-case rate. Like standard forwards, structured forwards usually have no upfront premium requirements (zero-cost strategies). Having the chance of an improved exchange rate for no upfront premium suggests that structured forwards must have a guaranteed worst-case exchange rate that is worse than the prevailing forward rate. This is the risk involved when entering into structured forward transactions.

Forward Contract

A foreign exchange forward transaction involves two parties, who enter into a contract, whereby one counterparty agrees to sell a specified amount of a currency A in exchange for a specified amount of another currency B on a specified date. The other counterparty agrees to buy the specified amount of currency A in exchange for the other currency B.

The Characteristics of Forward Contracts

Both counterparties have the obligation to fulfill the contract (as opposed to an option transaction where only one of the parties, the option seller, has the obligation, while the other counterparty, the option buyer, has the right, but no obligation). As both currency amounts are fixed on the day the contract is entered into, the exchange rate between the two currencies is fixed. Hence, the parties to the contract know from the beginning at what exchange rate they are obliged to buy or sell the specified currency.

For corporate and institutional clients, this is useful information, as they can use this exchange rate to calculate the cost of production of a given product or service. Another positive feature of forwards is that there is no upfront premium to be paid by either party. As both parties to the contract have an obligation to deliver and the contract is struck at the prevailing market forward rate, the transaction is by definition a zero-cost strategy. This is because the market forward rate is defined as the future exchange rate of two currencies that demands no upfront payment from either party.

How can one calculate this market forward rate and what are the influencing factors?

Calculating the Market Forward Rate

The following example helps to determine the forward exchange rate of a given currency pair.

Market Information.

• Company X: London-based manufacturer exporting to the United States.
• The importing company: New York-based company importing from the United Kingdom.
• The bank: it is the other counterparty to the forward transaction.
• Company X sells its goods to the importing company. The sale is agreed in USD, and the payment of USD 100 000 is expected six months after the contract is signed. Therefore, the London-based company X has a foreign exchange exposure, as a change in the foreign exchange rate has an effect on its income in GBP.
• Current GBP/USD exchange rate: 2.0000. This means that 1 GBP is worth 2 USD.
• Current GBP interest rate: 6% per annum. This is the interest rate at which company X can borrow and lend in GBP.
• Current USD interest rate: 3% per annum. This is the interest rate at which company X can borrow and lend in USD.

What can company X do to eliminate its foreign exchange exposure? We know that company X will receive USD 100 000 in six months' time. In fact, they could already sell this USD 100 000 at the prevailing market rate (spot rate), but they do not have it yet. The solution is to go to the bank and borrow the USD 100 000. To be precise, they need to borrow less than USD 100 000, because they need to pay interest on the loan to the bank. So the exact amount to borrow
is the net present value (NPV) of USD 100 000. To calculate this, we use the following formula:

\[ \mathrm{NPV} = \frac{N}{1 + r \cdot d/dc} \tag{1} \]

where

• N is the amount for which one wants to calculate the NPV. In this example, it is USD 100 000.
• r is the interest rate per annum for the currency in which N is denominated (entered as a decimal, e.g., 0.03 for 3%).
• d is the duration of the deposit or loan in days. In this example, it is 180 days (i.e., six months).
• dc is the day-count basis, that is, the number of days per year under the day-count convention. This is usually 360, except for GBP deposits or loans where it is 365.

We are now able to calculate the amount company X has to borrow: USD 100 000/(1 + 0.03 × 180/360) = USD 98 522.17. If they borrow this money, they have to pay back exactly USD 100 000 in six months' time including the interest charge. This is the amount company X is due to receive in six months' time from the sale of its goods to the importing company.

If company X now sells the borrowed USD in the spot market and buys GBP, they receive GBP 49 261.08. This is calculated by dividing the borrowed USD amount by the current GBP/USD exchange rate (2.0000 in this example).

Company X now has the GBP and has eliminated the foreign exchange exposure. They can take the GBP and deposit it with their bank at the current interest rate (6% in this example). The amount they get back after six months is equal to GBP 50 718.67, calculated as follows: GBP 49 261.08 × (1 + 0.06 × 180/365).

After this series of transactions, company X is left with no cash position at the beginning of the transaction. They receive GBP 50 718.67 after six months and have to pay USD 100 000 in exchange. The exchange rate that is implied by these two amounts is 1.9717 (calculated as USD 100 000 divided by GBP 50 718.67).

What happens when a forward transaction is entered into? Exactly the same:

• At the beginning of the transaction:
– company X has no cash position;
– company X agrees to sell USD 100 000 for GBP at the market forward rate.
• At the end of the transaction:
– company X pays USD 100 000 for the GBP amount exchanged at the agreed forward rate.

Because the two approaches described earlier have the same outcome, the GBP amount received for the USD 100 000 has to be the same; otherwise there would be an arbitrage opportunity. Therefore, the market forward rate in this example has to be 1.9717.

Generally, the forward rate can be calculated with a single formula:

\[ F_{\mathrm{GBP/USD}} = S_{\mathrm{GBP/USD}} \cdot \frac{1 + r_{\mathrm{USD}} \cdot d/dc_{\mathrm{USD}}}{1 + r_{\mathrm{GBP}} \cdot d/dc_{\mathrm{GBP}}} \tag{2} \]

where $F_{\mathrm{GBP/USD}}$ is the forward rate for GBP/USD; $S_{\mathrm{GBP/USD}}$ the spot exchange rate for GBP/USD; $r_{\mathrm{USD}}$ the USD interest rate per annum; $r_{\mathrm{GBP}}$ the GBP interest rate per annum; $d$ the duration of the deposit or loan in days; $dc_{\mathrm{USD}}$ the day-count basis for USD (360); and $dc_{\mathrm{GBP}}$ the day-count basis for GBP (365). In the example, 2.0000 × (1 + 0.03 × 180/360)/(1 + 0.06 × 180/365) = 1.9717.

As the formula suggests, the market forward rate is a function of only the current (spot) exchange rate and the interest rates of the two currencies for the specified forward period. Hence, neither market expectations nor any other factor determines the arbitrage-free forward rate.

Structured Forwards

The previous section helped us to understand how a foreign exchange exposure resulting from a cross-border transaction can be eliminated and hedged through a forward transaction. It showed that the forward exchange rate was fixed right at the beginning of the contract and hence the uncertainty about exchange rate movements was turned into a known rate with which companies can calculate their cost of production. The example also demonstrated that there is no cash flow at the beginning of a forward transaction and there is no premium or any other fee associated with it. A forward transaction is by definition a zero-cost strategy.

The Difference between Forwards and Structured Forwards

The disadvantage of forwards is that favorable exchange rate moves are also eliminated when the
exchange rate is fixed. In the previous example, the forward rate was calculated to be 1.9717. This is the rate at which company X has to buy GBP and sell the USD. If in six months' time the GBP/USD exchange rate falls below 1.9717, company X would be better off without hedging the GBP purchase through a forward.

Structured forwards allow just this. They are more flexible, because favorable exchange rate moves and, in fact, any market view can be incorporated into the transaction to enhance the rate at which a currency is exchanged for another.

As with forwards, structured forwards offer a worst-case exchange rate. This rate is fixed at the beginning of the contract and, similar to a regular forward, it offers the benefit of certainty about the exchange rate that can be used for financial planning.

Similar to standard forward contracts, most structured forward contracts are zero-cost strategies, that is, no upfront premium is required.

We all know that there is no such thing as a “free lunch”. Therefore, to have the benefit of an improved exchange rate, a fixed worst-case rate, and a zero-cost strategy, the company entering into a structured forward transaction needs to take on certain risks. This risk is usually structured so that the guaranteed worst-case exchange rate is set at a rate that is worse than the prevailing market forward rate. The hedging counterparty accepts this worse guaranteed rate for the chance of receiving a better rate, in case a predefined condition is met. As the examples in the following section demonstrate, these predefined conditions can take many forms and may incorporate the market view of the counterparty entering into the structured forward transaction.

Examples of Structured Forwards

As mentioned in the previous section, structured forwards offer the possibility to incorporate one's market view into a forward transaction. This view might be the appreciation or depreciation of a currency or even the view that a currency pair remains in a certain range over a given period of time. The following examples demonstrate how these different market views can be expressed with currency options that can be structured into the forward transaction. As a reminder: all examples follow the basic assumptions that the structured forward has a worst-case buying (or selling) rate and that no upfront premium enters into the transaction.

Forward Plus. The forward plus is the simplest of all structured forwards. It offers the possibility to take advantage of favorable market movements up to a certain point, while still having a certain worst-case hedged rate.

How does it work? By accepting a worst-case hedge rate that is less favorable than the prevailing market forward rate, we create excess cash. Remember, trading at the market forward rate is zero cost by definition. If one trades at a rate that is worse than the market rate, one can expect some compensation. The cash generated is used to buy an option that pays out if the underlying currency pair moves favorably. To make this a zero-cost strategy, we need to introduce a barrier, or knockout. This has the effect that options cease to exist (are knocked out) if the barrier is reached. For our strategy, it means that we can participate in a favorable market move, but only up to a certain point, namely, the predefined barrier level. If the barrier is reached, we are locked into a forward transaction with a rate equal to the worst-case rate.

Let us continue the previous example with company X: We calculated the market forward rate to purchase GBP against USD in six months' time to be 1.9717. A forward plus could have a worst-case buying rate of 1.9850. This rate is 0.0133 worse than the market forward rate. As compensation for accepting this hedge rate, company X has the opportunity to buy GBP at the prevailing spot rate in six months' time as long as the barrier of 1.8875 is not reached or breached during the life of the contract. As the barrier is observed continuously during the entire life of the transaction, we call this barrier an American style barrier (this is not to be confused with an American style option that is exercisable during the life of the option). So what does this right to buy the GBP at the prevailing market spot rate in six months' time give to company X? Imagine that the barrier was never reached and the spot rate in six months' time is at 1.9000. In this case, company X may buy the GBP at 1.9000 and it will outperform the forward transaction that would have forced it to buy the GBP at 1.9717. However, if the spot rate ever trades at or below the barrier of 1.8875, company X has to buy the GBP at the worst-case rate of 1.9850.

Table 1 and Figure 1 demonstrate possible scenarios with assumed spot rates after six months.

Table 1 Forward plus scenario analysis


Forward plus buying rate

Spot rate in six months time Barrier never reached Barrier reached Market forward rate
2.0200 1.9850 1.9850 1.9717
2.0100 1.9850 1.9850 1.9717
2.0000 1.9850 1.9850 1.9717
1.9900 1.9850 1.9850 1.9717
1.9850 1.9850 1.9850 1.9717
1.9750 1.9750 1.9850 1.9717
1.9700 1.9700 1.9850 1.9717
1.9650 1.9650 1.9850 1.9717
1.9600 1.9600 1.9850 1.9717
1.9550 1.9550 1.9850 1.9717
1.9500 1.9500 1.9850 1.9717
1.9450 1.9450 1.9850 1.9717
1.9400 1.9400 1.9850 1.9717
1.9350 1.9350 1.9850 1.9717
1.9300 1.9300 1.9850 1.9717
1.9250 1.9250 1.9850 1.9717
1.9200 1.9200 1.9850 1.9717
1.9150 1.9150 1.9850 1.9717
1.9100 1.9100 1.9850 1.9717
1.9050 1.9050 1.9850 1.9717
1.9000 1.9000 1.9850 1.9717
1.8950 1.8950 1.9850 1.9717
1.8876 1.8876 1.9850 1.9717
1.8875 1.9850 1.9850 1.9717
1.8800 1.9850 1.9850 1.9717
1.8750 1.9850 1.9850 1.9717
1.8700 1.9850 1.9850 1.9717
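The market forward rate column and the forward plus columns of Table 1 can be reproduced with a short Python sketch. The function names `forward_rate` and `forward_plus_rate` and the boolean flag `barrier_hit` are hypothetical and introduced only for illustration; the sketch assumes simple (money-market) interest with the day-count bases of the example, and it takes the information whether the continuously observed barrier was touched during the life of the contract as an input rather than simulating a spot path.

```python
def forward_rate(spot, r_usd, r_gbp, days, basis_usd=360, basis_gbp=365):
    """Arbitrage-free GBP/USD forward rate from spot and simple interest rates."""
    return spot * (1 + r_usd * days / basis_usd) / (1 + r_gbp * days / basis_gbp)


def forward_plus_rate(spot_at_maturity, worst_case, barrier, barrier_hit):
    """GBP buying rate of the forward plus for one scenario.

    barrier_hit: whether spot traded at or below the barrier at any time
    during the life of the contract (an input to this sketch).
    """
    if barrier_hit or spot_at_maturity <= barrier:
        return worst_case                     # locked into the worst-case rate
    return min(spot_at_maturity, worst_case)  # buy at spot, never worse than worst case


market_forward = forward_rate(2.0000, 0.03, 0.06, 180)
print(round(market_forward, 4))

for spot in (2.0200, 1.9750, 1.9000, 1.8876, 1.8875):
    print(spot,
          forward_plus_rate(spot, 1.9850, 1.8875, barrier_hit=False),
          forward_plus_rate(spot, 1.9850, 1.8875, barrier_hit=True))
```

Each printed row corresponds to one line of Table 1: the rate if the barrier was never reached, followed by the rate if it was reached.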

Figure 1 Forward plus scenario analysis. Horizontal axis: GBP/USD spot rate at maturity; vertical axis: GBP purchasing rate. Series: forward plus (barrier not reached), forward plus (barrier reached), and market forward rate.


As Figure 1 demonstrates, the forward plus outperforms the market forward rate if the barrier is never reached and the GBP/USD spot rate at maturity is below 1.9717.

If we set the worst-case rate even higher than 1.9850, we can set the barrier further down. Taking advantage of this flexibility, each company entering into a forward plus can create a product that suits its risk appetite.

Range Forward. The following example uses another market view to try to outperform the forward rate. In this case, we expect the underlying currency pair to trade within a predefined range during the life of the contract.

As with the forward plus (and with nearly all other structured forwards), the worst-case hedge rate is less favorable than the prevailing market forward rate. The generated excess cash is spent on an option that pays out if the range holds. The payout of the option is then used to improve the worst-case rate.

Here is an example: we calculated the market forward rate to purchase GBP against USD in six months' time to be 1.9717. A range forward could have a worst-case buying rate of 1.9850. This rate is 0.0133 worse than the market forward rate. In compensation for accepting this hedge rate, company X can buy GBP at 1.8850 (0.0867 better than the forward rate) if the GBP/USD exchange rate remains within the 1.9400–2.0700 range during the entire six-month period. If at any time during the life of the contract the underlying currency pair trades at or outside the limits of the range, company X has to buy the GBP at the worst-case rate of 1.9850.

Table 2 and Figure 2 demonstrate possible scenarios with assumed spot rates after six months.

As Figure 2 demonstrates, the range forward outperforms the market forward rate if the range holds, even if the spot rate closes above the forward rate.

Table 2 Range forward scenario analysis


Range forward buying rate

Spot rate in six months time Barriers never reached Barrier reached Market forward rate
2.1000 1.9850 1.9850 1.9717
2.0700 1.9850 1.9850 1.9717
2.0699 1.8850 1.9850 1.9717
2.0500 1.8850 1.9850 1.9717
2.0300 1.8850 1.9850 1.9717
2.0250 1.8850 1.9850 1.9717
2.0200 1.8850 1.9850 1.9717
2.0150 1.8850 1.9850 1.9717
2.0100 1.8850 1.9850 1.9717
2.0050 1.8850 1.9850 1.9717
2.0000 1.8850 1.9850 1.9717
1.9950 1.8850 1.9850 1.9717
1.9900 1.8850 1.9850 1.9717
1.9850 1.8850 1.9850 1.9717
1.9800 1.8850 1.9850 1.9717
1.9750 1.8850 1.9850 1.9717
1.9700 1.8850 1.9850 1.9717
1.9650 1.8850 1.9850 1.9717
1.9600 1.8850 1.9850 1.9717
1.9550 1.8850 1.9850 1.9717
1.9500 1.8850 1.9850 1.9717
1.9450 1.8850 1.9850 1.9717
1.9401 1.8850 1.9850 1.9717
1.9400 1.9850 1.9850 1.9717
1.9300 1.9850 1.9850 1.9717
1.9250 1.9850 1.9850 1.9717
1.9200 1.9850 1.9850 1.9717
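The range forward rule behind Table 2 can be sketched as follows. This is an illustration only: the function name `range_forward_rate` is hypothetical, and the contract's continuous observation of the range is approximated here by a finite list of observed spot rates, which is an assumption of the sketch. Touching either limit counts as leaving the range, in line with the rows for 2.0700 and 1.9400 in Table 2.

```python
def range_forward_rate(observed_spots, lower, upper, best_case, worst_case):
    """GBP buying rate of the range forward for one observed spot path.

    observed_spots: discrete spot observations over the life of the contract
    (a stand-in for continuous monitoring).
    """
    range_held = all(lower < spot < upper for spot in observed_spots)
    return best_case if range_held else worst_case


# Range holds: company X buys at the improved rate of 1.8850.
print(range_forward_rate([2.0000, 2.0500, 1.9401], 1.9400, 2.0700, 1.8850, 1.9850))

# Lower limit touched once: company X buys at the worst-case rate of 1.9850.
print(range_forward_rate([2.0000, 1.9400, 2.0100], 1.9400, 2.0700, 1.8850, 1.9850))
```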
Figure 2 Range forward scenario analysis. Horizontal axis: GBP/USD spot rate at maturity; vertical axis: GBP purchasing rate. Series: range forward (barrier not reached), range forward (barrier reached), and market forward rate.

If we set the worst-case rate even higher than 1.9850, we can widen the range or improve the best-case buying rate. Taking advantage of this flexibility, each company entering into a range forward can create a product that suits its risk appetite.

Related Articles

Barrier Options; Forwards and Futures; Pricing Formulae for Foreign Exchange Options.

TAMÁS KORCHMÁROS
Pricing Formulae for Foreign Exchange Options

The foreign exchange options market is highly competitive, even for products beyond vanilla call and put options. This means that pricing and risk management systems always need to have the fastest possible method to compute values and sensitivities for all the products in the book. Only then can a trader or risk manager know the current position and risk of his book. The ideal solution is to use pricing formulae in closed form. However, this is often only possible in the Black–Scholes model.

General Model Assumptions and Abbreviations

Throughout this article, we denote the current value of the spot $S_t$ by $x$ and use the abbreviations listed in Table 1.

Table 1 Abbreviations used for the pricing formulae of FX options

$\tau = T - t$
$\theta_\pm = \frac{r_d - r_f}{\sigma} \pm \frac{\sigma}{2}$
$D_d = e^{-r_d \tau}$
$D_f = e^{-r_f \tau}$
$n(t) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}t^2}$
$N(x) = \int_{-\infty}^{x} n(t)\,dt$
$d_\pm = \frac{\ln\frac{x}{K} + \sigma\theta_\pm\tau}{\sigma\sqrt{\tau}}$
$x_\pm = \frac{\ln\frac{x}{B} + \sigma\theta_\pm\tau}{\sigma\sqrt{\tau}}$
$z_\pm = \frac{\ln\frac{B^2}{xK} + \sigma\theta_\pm\tau}{\sigma\sqrt{\tau}}$
$y_\pm = \frac{\ln\frac{B}{x} + \sigma\theta_\pm\tau}{\sigma\sqrt{\tau}}$
$\phi = +1$ for call options, $\phi = -1$ for put options
$t$: current time; $T$: maturity time
$K$: strike; $B$, $L$, $H$: barriers

The pricing follows the usual procedures of Arbitrage pricing theory and the Fundamental theorem of asset pricing. In a foreign exchange market, this means that we model the underlying exchange rate by a geometric Brownian motion

\[ dS_t = (r_d - r_f) S_t\,dt + \sigma S_t\,dW_t \tag{1} \]

where $r_d$ denotes the domestic interest rate, $\sigma$ the volatility, and $W_t$ the standard Brownian motion; see Foreign Exchange Symmetries for details. Most importantly, we note that there is a foreign interest rate $r_f$. As in Option Pricing: General Principles, one can compute closed-form solutions for many options types with payoff $F(S_T)$ at maturity $T$ directly via

\[ v(t, x) = e^{-r_d \tau}\,\mathbb{E}[F(S_T) \mid S_t = x] = e^{-r_d \tau}\,\mathbb{E}\left[F\left(x e^{(r_d - r_f - \frac{1}{2}\sigma^2)\tau + \sigma\sqrt{\tau} Z}\right)\right] \tag{2} \]

where $v(t, x)$ denotes the value of the derivative with payoff $F$ at time $t$ if the spot is at $x$. The random variable $Z$ represents the continuous returns, which are modeled as standard normal in the Black–Scholes model. In this model, we can proceed as

\[ v(t, x) = e^{-r_d \tau} \int_{-\infty}^{+\infty} F\left(x e^{(r_d - r_f - \frac{1}{2}\sigma^2)\tau + \sigma\sqrt{\tau} z}\right) n(z)\,dz = D_d \int_{-\infty}^{+\infty} F\left(x e^{\sigma\theta_-\tau + \sigma\sqrt{\tau} z}\right) n(z)\,dz \tag{3} \]

The rest is working out the integration. In other models, one would replace the normal density by another density function such as a t-density. However, in many other models densities are not explicitly known, or even if they are, the integration becomes cumbersome.

For the resulting pricing formulae, there are many sources, for example, [7, 11, 17]. Many general books on Option Pricing also contain formulae in a context outside the foreign exchange, for example, [8, 18]. Obviously, we cannot cover all possible formulae in this section. We give an overview of several relevant examples and refer to Foreign Exchange Basket Options; Margrabe Formula; Quanto Options for more. FX vanilla options are covered in Foreign Exchange Symmetries.

Barrier Options

We consider the payoff for single-barrier knock-out options

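\[ [\phi(S_T - K)]^+\,\mathbb{I}_{\{\eta S_t > \eta B,\ 0 \le t \le T\}} = [\phi(S_T - K)]^+\,\mathbb{I}_{\{\min_{t \in [0,T]}(\eta S_t) > \eta B\}} \tag{4} \]

where the binary variable $\eta$ takes the value $+1$ if the barrier $B$ is approached from above (down-and-out) and $-1$ if the barrier is approached from below (up-and-out).

To price knock-in options paying

\[ [\phi(S_T - K)]^+\,\mathbb{I}_{\{\min_{t \in [0,T]}(\eta S_t) \le \eta B\}} \tag{5} \]

we use the fact that

\[ \text{knock-in} + \text{knock-out} = \text{vanilla} \tag{6} \]

Computing the value of a barrier option in the Black–Scholes model boils down to knowing the joint density $f(x, y)$ for a Brownian motion with drift and its running extremum ($\eta = +1$ for a minimum and $\eta = -1$ for a maximum),

\[ \left( W(T) + \theta_- T,\ \eta \min_{0 \le t \le T} \eta\,(W(t) + \theta_- t) \right) \tag{7} \]

which is derived, for example, in [15], and can be written as

\[ f(x, y) = -\eta\, e^{\theta_- x - \frac{1}{2}\theta_-^2 T}\, \frac{2(2y - x)}{T\sqrt{2\pi T}} \exp\left\{ -\frac{(2y - x)^2}{2T} \right\}, \quad \eta y \le \min(0, \eta x) \tag{8} \]

Using the density (8), the value of a barrier option can be written as the following integral

\[ \mathrm{barrier}(S_0, \sigma, r_d, r_f, K, B, T) = e^{-r_d T}\,\mathbb{E}\left[ [\phi(S_T - K)]^+\,\mathbb{I}_{\{\eta S_t > \eta B,\ 0 \le t \le T\}} \right] \tag{9} \]

\[ = e^{-r_d T} \int_{x=-\infty}^{x=+\infty} \int_{\eta y \le \min(0, \eta x)} [\phi(S_0 e^{\sigma x} - K)]^+\,\mathbb{I}_{\{\eta y > \eta \frac{1}{\sigma}\ln\frac{B}{S_0}\}}\, f(x, y)\,dy\,dx \tag{10} \]

Further details on how to evaluate this integral can be found in [15]. It results in four terms. We provide the four terms and summarize, in Table 2, how they are used to find the value function (see also [13] or [14]).

\[ A_1 = \phi x D_f N(\phi d_+) - \phi K D_d N(\phi d_-) \tag{11} \]

\[ A_2 = \phi x D_f N(\phi x_+) - \phi K D_d N(\phi x_-) \tag{12} \]

\[ A_3 = \phi \left(\frac{B}{x}\right)^{\frac{2\theta_-}{\sigma}} \left[ x D_f \left(\frac{B}{x}\right)^2 N(\eta z_+) - K D_d N(\eta z_-) \right] \tag{13} \]

\[ A_4 = \phi \left(\frac{B}{x}\right)^{\frac{2\theta_-}{\sigma}} \left[ x D_f \left(\frac{B}{x}\right)^2 N(\eta y_+) - K D_d N(\eta y_-) \right] \tag{14} \]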

Table 2 The summands for the value of single barrier options


Option type φ η In/Out Reverse Combination
Standard up-and-in call +1 −1 in K >B A1
Reverse up-and-in call +1 −1 in K ≤B A2 − A3 + A4
Reverse up-and-in put −1 −1 in K >B A1 − A2 + A4
Standard up-and-in put −1 −1 in K ≤B A3
Standard down-and-in call +1 +1 in K >B A3
Reverse down-and-in call +1 +1 in K ≤B A1 − A2 + A4
Reverse down-and-in put −1 +1 in K >B A2 − A3 + A4
Standard down-and-in put −1 +1 in K ≤B A1
Standard up-and-out call +1 −1 out K >B 0
Reverse up-and-out call +1 −1 out K ≤B A1 − A2 + A3 − A4
Reverse up-and-out put −1 −1 out K >B A2 − A4
Standard up-and-out put −1 −1 out K ≤B A1 − A3
Standard down-and-out call +1 +1 out K >B A1 − A3
Reverse down-and-out call +1 +1 out K ≤B A2 − A4
Reverse down-and-out put −1 +1 out K >B A1 − A2 + A3 − A4
Standard down-and-out put −1 +1 out K ≤B 0
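The four summands and the knock-out rows of Table 2 can be turned into a small Python sketch. The function names `barrier_terms`, `knock_out_value`, and `knock_in_value` and the lookup table `KNOCK_OUT_COMBINATION` are hypothetical; the sketch assumes constant Black–Scholes parameters, continuous barrier monitoring, no rebate, and a spot on the live side of the barrier. Knock-in values are obtained from the knock-out values through the identity knock-in + knock-out = vanilla, where the vanilla value is the first summand.

```python
from math import exp, log, sqrt
from statistics import NormalDist

N = NormalDist().cdf  # standard normal cumulative distribution function


def barrier_terms(x, K, B, tau, sigma, rd, rf, phi, eta):
    """The four summands A1..A4 built from the abbreviations of Table 1."""
    theta_plus = (rd - rf) / sigma + sigma / 2
    theta_minus = (rd - rf) / sigma - sigma / 2
    Dd, Df = exp(-rd * tau), exp(-rf * tau)
    vol = sigma * sqrt(tau)

    def pair(ratio):
        # returns the (+, -) versions of (ln(ratio) + sigma*theta*tau) / (sigma*sqrt(tau))
        return ((log(ratio) + sigma * theta_plus * tau) / vol,
                (log(ratio) + sigma * theta_minus * tau) / vol)

    d_p, d_m = pair(x / K)
    x_p, x_m = pair(x / B)
    z_p, z_m = pair(B * B / (x * K))
    y_p, y_m = pair(B / x)
    power = (B / x) ** (2 * theta_minus / sigma)
    A1 = phi * x * Df * N(phi * d_p) - phi * K * Dd * N(phi * d_m)
    A2 = phi * x * Df * N(phi * x_p) - phi * K * Dd * N(phi * x_m)
    A3 = phi * power * (x * Df * (B / x) ** 2 * N(eta * z_p) - K * Dd * N(eta * z_m))
    A4 = phi * power * (x * Df * (B / x) ** 2 * N(eta * y_p) - K * Dd * N(eta * y_m))
    return A1, A2, A3, A4


# Coefficients of (A1, A2, A3, A4) for the knock-out rows of Table 2,
# keyed by (phi, eta, K > B).
KNOCK_OUT_COMBINATION = {
    (+1, -1, True): (0, 0, 0, 0),     # standard up-and-out call
    (+1, -1, False): (1, -1, 1, -1),  # reverse up-and-out call
    (-1, -1, True): (0, 1, 0, -1),    # reverse up-and-out put
    (-1, -1, False): (1, 0, -1, 0),   # standard up-and-out put
    (+1, +1, True): (1, 0, -1, 0),    # standard down-and-out call
    (+1, +1, False): (0, 1, 0, -1),   # reverse down-and-out call
    (-1, +1, True): (1, -1, 1, -1),   # reverse down-and-out put
    (-1, +1, False): (0, 0, 0, 0),    # standard down-and-out put
}


def knock_out_value(x, K, B, tau, sigma, rd, rf, phi, eta):
    terms = barrier_terms(x, K, B, tau, sigma, rd, rf, phi, eta)
    coefficients = KNOCK_OUT_COMBINATION[(phi, eta, K > B)]
    return float(sum(c * a for c, a in zip(coefficients, terms)))


def knock_in_value(x, K, B, tau, sigma, rd, rf, phi, eta):
    vanilla = barrier_terms(x, K, B, tau, sigma, rd, rf, phi, eta)[0]
    return vanilla - knock_out_value(x, K, B, tau, sigma, rd, rf, phi, eta)
```

For example, `knock_out_value(1.20, 1.20, 1.10, tau=0.5, sigma=0.10, rd=0.03, rf=0.01, phi=1, eta=1)` evaluates a standard down-and-out call; the market data in this call are invented for illustration.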
Digital and Touch Options

Digital Options

Digital options have a payoff

\[ v(T, S_T) = \mathbb{I}_{\{\phi S_T \ge \phi K\}} \quad \text{domestic paying} \tag{15} \]

\[ w(T, S_T) = S_T\,\mathbb{I}_{\{\phi S_T \ge \phi K\}} \quad \text{foreign paying} \tag{16} \]

In the domestic paying case, the payment of the fixed amount is in domestic currency, whereas in the foreign paying case the payment is in foreign currency. We obtain for the value functions

\[ v(t, x) = D_d N(\phi d_-) \tag{17} \]

\[ w(t, x) = x D_f N(\phi d_+) \tag{18} \]

of the digital options paying one unit of domestic and paying one unit of foreign currency, respectively.

One-touch Options

The payoff of a one-touch is given by

\[ R\,\mathbb{I}_{\{\tau_B \le T\}} \tag{19} \]

\[ \tau_B = \inf\{t \ge 0 : \eta S_t \le \eta B\} \tag{20} \]

This type of option pays a domestic cash amount $R$ if a barrier $B$ is hit any time before the expiration time. We use the binary variable $\eta$ to describe whether $B$ is a lower barrier ($\eta = 1$) or an upper barrier ($\eta = -1$). The stopping time $\tau_B$ is called the first hitting time. In FX markets, an option with this payoff is usually called a one-touch (option), one-touch-digital, or hit option. The modified payoff of a no-touch (option), $R\,\mathbb{I}_{\{\tau_B > T\}}$, describes a rebate, which is paid if a knock-in option is not knocked in by the time it expires and can be valued similarly by exploiting the identity

\[ R\,\mathbb{I}_{\{\tau_B \le T\}} + R\,\mathbb{I}_{\{\tau_B > T\}} = R \tag{21} \]

Furthermore, we distinguish the time at which the rebate is paid and let

\[ \omega = 0, \text{ if the rebate is paid at first hitting time } \tau_B \tag{22} \]

\[ \omega = 1, \text{ if the rebate is paid at maturity time } T \tag{23} \]

where the former is also called instant one-touch and the latter is the default in FX options markets. It is important to mention that the payoff is one unit of the domestic currency. For a payment in the foreign currency (e.g., EUR), one needs to exchange $r_d$ and $r_f$, replace $x$ and $B$ by their reciprocal values, and change the sign of $\eta$; see Foreign Exchange Symmetries.

For the one-touch, we use the abbreviations

\[ \vartheta_- = \sqrt{\theta_-^2 + 2(1 - \omega) r_d} \quad \text{and} \quad e_\pm = \frac{\pm \ln\frac{x}{B} - \sigma\vartheta_-\tau}{\sigma\sqrt{\tau}} \tag{24} \]

The theoretical value of the one-touch turns out to be

\[ v(t, x) = R e^{-\omega r_d \tau} \left[ \left(\frac{B}{x}\right)^{\frac{\theta_- + \vartheta_-}{\sigma}} N(-\eta e_+) + \left(\frac{B}{x}\right)^{\frac{\theta_- - \vartheta_-}{\sigma}} N(\eta e_-) \right] \tag{25} \]

Note that $\vartheta_- = |\theta_-|$ for rebates paid-at-end ($\omega = 1$).

The risk-neutral probability of knocking out is given by

\[ \mathbb{P}[\tau_B \le T] = \mathbb{E}\left[\mathbb{I}_{\{\tau_B \le T\}}\right] = \frac{1}{R} e^{r_d T} v(0, S_0) \tag{26} \]

Properties of the First Hitting Time $\tau_B$. As derived, for example, in [15], the first hitting time

\[ \tilde{\tau} = \inf\{t \ge 0 : \theta t + W(t) = x\} \tag{27} \]

of a Brownian motion with drift $\theta$ and hit level $x > 0$ has the density

\[ \mathbb{P}[\tilde{\tau} \in dt] = \frac{x}{t\sqrt{2\pi t}} \exp\left\{ -\frac{(x - \theta t)^2}{2t} \right\} dt, \quad t > 0 \tag{28} \]

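the cumulative distribution function

$$\mathbb{P}[\tilde\tau \le t] = N\!\left(\frac{\theta t - x}{\sqrt{t}}\right) + e^{2\theta x} N\!\left(\frac{-\theta t - x}{\sqrt{t}}\right), \quad t > 0 \tag{29}$$

the Laplace transform

$$\mathbb{E}\, e^{-\alpha\tilde\tau} = \exp\left\{x\theta - x\sqrt{2\alpha + \theta^2}\right\}, \quad \alpha > 0,\ x > 0 \tag{30}$$

and the property

$$\mathbb{P}[\tilde\tau < \infty] = \begin{cases} 1 & \text{if } \theta \ge 0 \\ e^{2\theta x} & \text{if } \theta < 0 \end{cases} \tag{31}$$

For upper barriers $B > S_0$, we can now rewrite the first passage time $\tau_B$ as

$$\tau_B = \inf\{t \ge 0 : S_t = B\} = \inf\left\{t \ge 0 : W_t + \theta_- t = \frac{1}{\sigma}\ln\frac{B}{S_0}\right\} \tag{32}$$

The density of $\tau_B$ is hence

$$\mathbb{P}[\tau_B \in dt] = \frac{\frac{1}{\sigma}\ln\frac{B}{S_0}}{t\sqrt{2\pi t}} \exp\left\{-\frac{\left(\frac{1}{\sigma}\ln\frac{B}{S_0} - \theta_- t\right)^2}{2t}\right\} dt, \quad t > 0 \tag{33}$$

Derivation of the value function. Using the density (33), the value of the paid-at-end ($\omega = 1$) upper rebate ($\eta = -1$) option can be written as the following integral:

$$v(T, S_0) = R e^{-r_d T}\, \mathbb{E}\left[\mathbb{1}_{\{\tau_B \le T\}}\right] = R e^{-r_d T} \int_0^T \frac{\frac{1}{\sigma}\ln\frac{B}{S_0}}{t\sqrt{2\pi t}} \exp\left\{-\frac{\left(\frac{1}{\sigma}\ln\frac{B}{S_0} - \theta_- t\right)^2}{2t}\right\} dt \tag{34}$$

To evaluate this integral, we introduce the notation

$$e_\pm(t) = \frac{\pm\ln\frac{S_0}{B} - \sigma\theta_- t}{\sigma\sqrt{t}} \tag{35}$$

and list the properties

$$e_-(t) - e_+(t) = \frac{2}{\sqrt{t}}\,\frac{1}{\sigma}\ln\frac{B}{S_0} \tag{36}$$

$$n(e_+(t)) = \left(\frac{B}{S_0}\right)^{-\frac{2\theta_-}{\sigma}} n(e_-(t)) \tag{37}$$

$$\frac{\partial e_\pm(t)}{\partial t} = \frac{e_\mp(t)}{2t} \tag{38}$$

We evaluate the integral in equation (34) by rewriting the integrand in such a way that the coefficients of the exponentials are the inner derivatives of the exponentials using properties (36)–(38).

$$
\begin{aligned}
\int_0^T \frac{\frac{1}{\sigma}\ln\frac{B}{S_0}}{t\sqrt{2\pi t}} \exp\left\{-\frac{\left(\frac{1}{\sigma}\ln\frac{B}{S_0} - \theta_- t\right)^2}{2t}\right\} dt
&= \frac{1}{\sigma}\ln\frac{B}{S_0} \int_0^T \frac{1}{t^{3/2}}\, n(e_-(t))\, dt \\
&= \int_0^T \frac{1}{2t}\, n(e_-(t)) \left[e_-(t) - e_+(t)\right] dt \\
&= \int_0^T \left[ -\frac{e_+(t)}{2t}\, n(e_-(t)) + \left(\frac{B}{S_0}\right)^{\frac{2\theta_-}{\sigma}} \frac{e_-(t)}{2t}\, n(e_+(t)) \right] dt \\
&= \left(\frac{B}{S_0}\right)^{\frac{2\theta_-}{\sigma}} N(e_+(T)) + N(-e_-(T))
\end{aligned}
\tag{39}
$$

The computation for lower barriers ($\eta = 1$) is similar.

Double-no-touch Options

A double-no-touch with payoff function

$$\mathbb{1}_{\{L < \min_{t\in[0,T]} S_t \le \max_{t\in[0,T]} S_t < H\}} \tag{40}$$

pays one unit of domestic currency at maturity $T$, if the spot never touches any of the two barriers, where the lower barrier is denoted by $L$, the higher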

barrier by $H$. A double-one-touch pays one unit of domestic currency at maturity, if the spot either touches or crosses any of the lower or higher barrier at least once between inception of the trade and maturity. This means that a portfolio of a double-one-touch and a double-no-touch is equivalent to a certain payment of one unit of domestic currency at maturity.

To compute the value, let us introduce the stopping time

$$\tau_{L,H} = \min\left\{\inf\{t \in [0, T] \mid S_t = L \text{ or } S_t = H\},\ T\right\} \tag{41}$$

and the notation

$$\tilde h = \frac{1}{\sigma}\ln\frac{H}{S_t} \tag{42}$$

$$\tilde l = \frac{1}{\sigma}\ln\frac{L}{S_t} \tag{43}$$

$$\tilde\theta_\pm = \theta_\pm\sqrt{\tau} \tag{44}$$

$$h = \tilde h/\sqrt{\tau} \tag{45}$$

$$l = \tilde l/\sqrt{\tau} \tag{46}$$

$$\epsilon_\pm = \epsilon_\pm(j) = 2j(h - l) - \tilde\theta_\pm \tag{47}$$

$$n_T(x) = \frac{1}{\sqrt{2\pi T}} \exp\left\{-\frac{x^2}{2T}\right\} \tag{48}$$

At any time $t < \tau_{L,H}$, the value of the double-no-touch is

$$v(t) = \mathbb{E}^t\left[D_d\, \mathbb{1}_{\{L < \min_{t\in[0,T]} S_t \le \max_{t\in[0,T]} S_t < H\}}\right] \tag{49}$$

and for $t \in [\tau_{L,H}, T]$,

$$v(t) = D_d\, \mathbb{1}_{\{L < \min_{t\in[0,T]} S_t \le \max_{t\in[0,T]} S_t < H\}} \tag{50}$$

The joint distribution of the maximum and the minimum of a Brownian motion can be taken from [12] and is given by

$$\mathbb{P}\left[\tilde l < \min_{[0,T]} W_t \le \max_{[0,T]} W_t < \tilde h\right] = \int_{\tilde l}^{\tilde h} k_T(x)\, dx \tag{51}$$

with

$$k_T(x) = \sum_{j=-\infty}^{\infty} \left[ n_T\!\left(x + 2j(\tilde h - \tilde l)\right) - n_T\!\left(x - 2\tilde h + 2j(\tilde h - \tilde l)\right) \right] \tag{52}$$

One can use Girsanov's theorem (see Equivalent Martingale Measures) to deduce that the joint density of the maximum and the minimum of a Brownian motion with drift $\theta$, $W_t^\theta = W_t + \theta t$, is then given by

$$k_T^\theta(x) = k_T(x) \exp\left\{\theta x - \tfrac{1}{2}\theta^2 T\right\} \tag{53}$$

We obtain for the value of the double-no-touch at any time $t < \tau_{L,H}$

$$
\begin{aligned}
v(t, S_t) &= D_d\, \mathbb{E}\, \mathbb{1}_{\{L < \min_{u\in[t,T]} S_u \le \max_{u\in[t,T]} S_u < H\}} \\
&= D_d\, \mathbb{E}\, \mathbb{1}_{\{\tilde l < \min_{u\in[t,T]} W_u^{\theta_-} \le \max_{u\in[t,T]} W_u^{\theta_-} < \tilde h\}} \\
&= D_d \int_{\tilde l}^{\tilde h} k_{(T-t)}^{\theta_-}(x)\, dx \\
&= D_d \cdot \sum_{j=-\infty}^{\infty} \Big[ e^{-2j\tilde\theta_-(h-l)} \left\{ N(h + \epsilon_-) - N(l + \epsilon_-) \right\} \\
&\qquad\qquad - e^{-2j\tilde\theta_-(h-l) + 2\tilde\theta_- h} \left\{ N(h - 2h + \epsilon_-) - N(l - 2h + \epsilon_-) \right\} \Big]
\end{aligned}
\tag{54}
$$

and for $t \in [\tau_{L,H}, T]$

$$v(t, S_t) = D_d\, \mathbb{1}_{\{L < \min_{u\in[t,T]} S_u \le \max_{u\in[t,T]} S_u < H\}} \tag{55}$$

Of course, the value of the double-one-touch is given by

$$D_d - v(t, S_t) \tag{56}$$

To obtain a formula for a double-no-touch paying foreign currency, see Foreign Exchange Symmetries.

Lookback Options

Lookback options are path dependent. At expiration, the holder of the option can "look back" over the lifetime of the option and exercise based upon the optimal underlying value (extremum) achieved during

that period. Thus, lookback options (like Asian options) avoid the problem of European options that the underlying performed favorably throughout most of the option's lifetime but moves into a nonfavorable direction toward maturity. Moreover, (unlike American Options) lookback options optimize the market timing, because the investor gets, by definition, the most favorable underlying price. As summarized in Table 3, lookback options can be structured in two different types with the extremum representing either the strike price or the underlying value. Figure 1 shows the development of the payoff of lookback options depending on a sample price path. In detail, we define

$$M_T = \max_{0 \le u \le T} S(u) \quad \text{and} \quad m_T = \min_{0 \le u \le T} S(u) \tag{57}$$

Table 3 Types of lookback options. The contract parameters $T$ and $X$ are the time to maturity and the strike price, respectively, and $S_T$ denotes the spot price at expiration time. Fixed strike lookback options are also called hindsight options

| Payoff | Lookback type | Parameter used below in valuation |
|---|---|---|
| $M_T - S_T$ | Floating strike put | $\phi = -1$, $\bar\eta = +1$ |
| $S_T - m_T$ | Floating strike call | $\phi = +1$, $\bar\eta = +1$ |
| $(M_T - X)^+$ | Fixed strike call | $\phi = +1$, $\bar\eta = -1$ |
| $(X - m_T)^+$ | Fixed strike put | $\phi = -1$, $\bar\eta = -1$ |

Variations of lookback options include partial lookback options, where the monitoring period for the underlying is shorter than the lifetime of the option. Conze and Viswanathan [2] present further variations like limited risk and American lookback options.

In theory, Garman pointed out in [4], that lookback options can also add value for risk managers, because floating (fixed) strike lookback options are good means to solve the timing problem of market entries (exits) (see [9]). For instance, a minimum strike call is suitable for avoiding missing the best exchange rate in currency-linked security issues. However, this right is very expensive. Since one buys a guarantee for the best possible exchange rate ever, lookback options are generally too expensive and hardly ever trade. Exceptions are performance notes, where lookback and average features are mixed, for example, performance notes paying say 50% of the best of 36 monthly average gold price returns.

Valuation

We consider the example of the floating strike lookback call. Again, the value of the option is given by

$$v(0, S_0) = \mathbb{E}\left[e^{-r_d T}(S_T - m_T)\right] = S_0 e^{-r_f T} - e^{-r_d T}\, \mathbb{E}[m_T] \tag{58}$$

[Figure 1: line chart over trading days 1 to 45. Left axis: option payoff (0 to 0.25). Right axis: price of the underlying (0 to 1.2). Series: fixed strike lookback call (K = 1.00), floating strike lookback call, plain vanilla call (K = 1.00), underlying asset.]

Figure 1 Payoff profile of lookback calls (sample underlying price path, m = 20 trading days)
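The payoffs of Table 3 and the extrema in (57) can be illustrated on a discretely observed path, in the spirit of Figure 1. The following is only a minimal sketch: the function name `lookback_payoffs`, the sample fixings, and the strike are hypothetical, and the maximum and minimum over a list of fixings stand in for the continuous extrema.

```python
def lookback_payoffs(path, strike):
    """Payoffs of the four lookback types of Table 3 for one observed path."""
    s_T = path[-1]      # spot at expiration
    M_T = max(path)     # discrete stand-in for the maximum in (57)
    m_T = min(path)     # discrete stand-in for the minimum in (57)
    return {
        "floating_strike_put": M_T - s_T,
        "floating_strike_call": s_T - m_T,
        "fixed_strike_call": max(M_T - strike, 0.0),
        "fixed_strike_put": max(strike - m_T, 0.0),
    }


# Hypothetical spot fixings, not taken from Figure 1.
sample_path = [1.00, 1.03, 0.97, 1.05, 1.02]
payoffs = lookback_payoffs(sample_path, strike=1.00)
```

The floating strike payoffs are nonnegative by construction, since the final spot always lies between the running minimum and maximum.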

In the standard Black–Scholes model (1), the value can be derived using the reflection principle and results in

$$
\begin{aligned}
v(t, x) = \phi \Big\{ & x D_f N(\phi d_+) - K D_d N(\phi d_-) + \frac{1 - \eta}{2}\, \phi D_d \left[\phi(R - X)\right]^+ \\
& + \eta x D_d \frac{1}{h} \left[ \left(\frac{x}{K}\right)^{-h} N\!\left(-\eta\phi(d_+ - h\sigma\sqrt{\tau})\right) - e^{(r_d - r_f)\tau} N(-\eta\phi d_+) \right] \Big\}
\end{aligned}
\tag{59}
$$

This value function has a removable discontinuity at $h = 0$ where it turns out to be

$$
\begin{aligned}
v(t, x) = \phi \Big\{ & x D_f N(\phi d_+) - K D_d N(\phi d_-) + \frac{1 - \eta}{2}\, \phi D_d \left[\phi(R - X)\right]^+ \\
& + \eta x D_d \sigma\sqrt{\tau} \left[ -d_+ N(-\eta\phi d_+) + \eta\phi\, n(d_+) \right] \Big\}
\end{aligned}
\tag{60}
$$

The abbreviations we use are

$$h = \frac{2(r_d - r_f)}{\sigma^2} \tag{61}$$

$$R = \text{running extremum: extremum observed until valuation time} \tag{62}$$

$$K = \begin{cases} R & \text{floating strike lookback} \\ -\phi\min(-\phi X, -\phi R) & \text{fixed strike lookback} \end{cases} \tag{63}$$

$$\eta = \begin{cases} +1 & \text{floating strike lookback} \\ -1 & \text{fixed strike lookback} \end{cases} \tag{64}$$

Note that this formula basically consists of that for a call option (the first two terms) plus another term. Conze and Viswanathan also show closed-form solutions for fixed strike lookback options and the variations mentioned above in [2]. Heynen and Kat develop equations for partial fixed and floating strike lookback options in [10]. We list some sample results in Table 4.

Table 4 Sample values for lookback options. For the input data, we used spot $S_0 = 0.9800$, $r_d = 3\%$, $r_f = 6\%$, $\sigma = 10\%$, $\tau = 1/12$, running min $R = 0.9500$, running max $R = 0.9900$, and number of equidistant fixings $m = 22$

| Payoff | Discretely sampled, Equations (67) and (68) | Continuously sampled, Equations (59) or (60) |
|---|---|---|
| $M_T - S_T$ | 0.0231 | 0.0255 |
| $S_T - m_T$ | 0.0310 | 0.0320 |
| $(M_T - 0.99)^+$ | 0.0107 | 0.0131 |
| $(0.97 - m_T)^+$ | 0.0235 | 0.0246 |

Discrete Sampling

In practice, one cannot take the extremum over a continuum of exchange rates. The standard is to specify a fixing calendar and take only a finite number of fixings into account. Suppose there are $m$ equidistant sample points left until expiration at which we evaluate the extremum. In this case, the value function $v_m$ can be determined by an approximation described by Broadie et al. [1]. We set

$$\beta = -\zeta(1/2)/\sqrt{2\pi} = 0.5826 \quad (\zeta \text{ being Riemann's } \zeta\text{-function}) \tag{65}$$

$$\alpha = e^{\phi\beta\sigma\sqrt{\tau/m}} \tag{66}$$

and obtain for fixed strike lookback options

$$v_m(t, x, r_d, r_f, \sigma, R, X, \phi, \eta) = v(t, x, r_d, r_f, \sigma, \alpha R, \alpha X, \phi, \eta)/\alpha \tag{67}$$

and for floating strike lookback options

$$v_m(t, x, r_d, r_f, \sigma, R, X, \phi, \eta) = \alpha\, v(t, x, r_d, r_f, \sigma, R/\alpha, X, \phi, \eta) - \phi(\alpha - 1) x D_f \tag{68}$$
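Formulas (59) to (68) can be cross-checked against Table 4 with a few lines of code. The sketch below rests on some assumptions: the function names are hypothetical, $d_\pm$ is taken as $(\ln(x/K) + \sigma\theta_\pm\tau)/(\sigma\sqrt{\tau})$ with $K$ from (63), the numerical value of $\zeta(1/2)$ is hard-coded, and the unused strike of a floating strike lookback is passed as 0.

```python
import math


def N(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def n(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def lookback(x, rd, rf, sigma, tau, R, X, phi, eta):
    """Continuously sampled lookback value, equations (59) and (60)."""
    K = R if eta == 1 else -phi * min(-phi * X, -phi * R)          # (63)
    Dd, Df = math.exp(-rd * tau), math.exp(-rf * tau)
    vol = sigma * math.sqrt(tau)
    d_plus = (math.log(x / K) + (rd - rf + 0.5 * sigma ** 2) * tau) / vol
    d_minus = d_plus - vol
    h = 2.0 * (rd - rf) / sigma ** 2                                # (61)
    value = x * Df * N(phi * d_plus) - K * Dd * N(phi * d_minus)
    value += 0.5 * (1 - eta) * phi * Dd * max(phi * (R - X), 0.0)
    if abs(h) > 1e-12:                                              # (59)
        value += eta * x * Dd / h * (
            (x / K) ** (-h) * N(-eta * phi * (d_plus - h * vol))
            - math.exp((rd - rf) * tau) * N(-eta * phi * d_plus))
    else:                                                           # (60)
        value += eta * x * Dd * vol * (
            -d_plus * N(-eta * phi * d_plus) + eta * phi * n(d_plus))
    return phi * value


# -zeta(1/2) / sqrt(2*pi), equation (65); zeta(1/2) is hard-coded here.
BETA = 1.4603545088095868 / math.sqrt(2.0 * math.pi)


def lookback_discrete(x, rd, rf, sigma, tau, R, X, phi, eta, m):
    """Discretely sampled approximation, equations (66) to (68)."""
    alpha = math.exp(phi * BETA * sigma * math.sqrt(tau / m))       # (66)
    if eta == -1:                                                   # (67)
        return lookback(x, rd, rf, sigma, tau, alpha * R, alpha * X, phi, eta) / alpha
    Df = math.exp(-rf * tau)                                        # (68)
    return (alpha * lookback(x, rd, rf, sigma, tau, R / alpha, X, phi, eta)
            - phi * (alpha - 1.0) * x * Df)


# Inputs of Table 4.
args = dict(x=0.98, rd=0.03, rf=0.06, sigma=0.10, tau=1.0 / 12.0)
floating_call = lookback(R=0.95, X=0.0, phi=+1, eta=+1, **args)
floating_put = lookback(R=0.99, X=0.0, phi=-1, eta=+1, **args)
fixed_call = lookback(R=0.99, X=0.99, phi=+1, eta=-1, **args)
floating_call_m = lookback_discrete(R=0.95, X=0.0, phi=+1, eta=+1, m=22, **args)
fixed_call_m = lookback_discrete(R=0.99, X=0.99, phi=+1, eta=-1, m=22, **args)
```

Under these assumptions the results should land close to the corresponding Table 4 entries; this is meant as a consistency check of the formulas, not as a reference implementation.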

Forward Start Options

Product Definition

A forward start vanilla option is just like a vanilla option, except that the strike is fixed on some future date $t_f \in (0, T)$, specified in the contract. The strike is fixed as $\alpha S_{t_f}$, where $\alpha > 0$ is some contractually defined factor (very commonly one) and $S_{t_f}$ is the spot at time $t_f$. It pays off

$$\left[\phi(S_T - \alpha S_{t_f})\right]^+ \tag{69}$$

The Value of Forward Start Options

Using the abbreviations

$$d_\pm(x) = \frac{\ln\frac{x}{K} + \sigma\theta_\pm(T - t_f)}{\sigma\sqrt{T - t_f}} \tag{70}$$

$$d_\pm^\alpha = \frac{-\ln\alpha + \sigma\theta_\pm(T - t_f)}{\sigma\sqrt{T - t_f}} \tag{71}$$

we recall the value of a vanilla option with strike $K$ at time $t_f$ as

$$\text{vanilla}(t_f, x; K, T, \phi) = \phi\left[x e^{-r_f(T - t_f)} N(\phi d_+(x)) - K e^{-r_d(T - t_f)} N(\phi d_-(x))\right] \tag{72}$$

For the value of a forward start vanilla option, we obtain

$$
\begin{aligned}
v(0, S_0) &= e^{-r_d t_f}\, \mathbb{E}\left[\text{vanilla}(t_f, S_{t_f}; K = \alpha S_{t_f}, T, \phi)\right] \\
&= \phi S_0 \left[e^{-r_f T} N(\phi d_+^\alpha) - \alpha e^{(r_d - r_f) t_f} e^{-r_d T} N(\phi d_-^\alpha)\right]
\end{aligned}
\tag{73}
$$

Noticeably, the value computation is easy here, because the strike $K$ is set as a multiple of the future spot. If we were to choose to set the strike as a constant difference of the future spot, the integration would not work in closed form, and we would have to use numerical integration.

Example

We consider an example in Table 5.

Table 5 Value of a forward start vanilla in USD on EUR/USD: spot of 0.9000, $\alpha = 99\%$, $\sigma = 12\%$, $r_d = 2\%$, $r_f = 3\%$, maturity $T = 186$ days, and strike set at $t_f = 90$ days

| | Call | Put |
|---|---|---|
| Value | 0.0251 | 0.0185 |

Compound and Instalment Options

An instalment call option allows the holder to pay the premium of the call option in instalments spread over time. A first payment is made at the inception of the trade. On the following payment days, the holder of the instalment call can decide to prolong the contract, in which case, he has to pay the second instalment of the premium, or to terminate the contract by simply not paying any more. After the last instalment payment, the contract turns into a plain vanilla call.

Valuation in the Black–Scholes Model

The intention of this section is to obtain a closed-form formula for the $n$-variate instalment option in the Black–Scholes model. For the cases $n = 1$ and $n = 2$, the Black–Scholes formula and Geske's compound option formula (see [5]) are well-known special cases.

Let $t_0 = 0$ be the instalment option inception date and $t_1, t_2, \ldots, t_n = T$ a schedule of decision dates in the contract on which the option holder has to pay the premiums $k_1, k_2, \ldots, k_{n-1}$ to keep the option alive. To compute the price of the instalment option, which is the upfront payment $V_0$ at $t_0$ to enter into the contract, we begin with the option payoff at maturity $T$

$$V_n(s) = \left[\phi_n(s - k_n)\right]^+ = \max\left[\phi_n(s - k_n), 0\right] \tag{74}$$

where $s = S_T$ is the price of the underlying asset at $T$ and as usual $\phi_n = +1$ for a call option, $\phi_n = -1$ for a put option.
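As a numerical cross-check of the forward start value (73) in the preceding section against Table 5, the formula can be evaluated directly. This is a sketch under assumptions: the function name `forward_start` is hypothetical, and the day counts of Table 5 are converted to years with an assumed ACT/365 convention, which the source does not specify.

```python
import math


def N(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def forward_start(S0, alpha, sigma, rd, rf, T, tf, phi):
    """Forward start vanilla value, equation (73)."""
    dt = T - tf
    vol = sigma * math.sqrt(dt)
    # d_plus and d_minus are the abbreviations of (71),
    # using sigma * theta_pm = rd - rf +/- sigma^2 / 2.
    d_plus = (-math.log(alpha) + (rd - rf + 0.5 * sigma ** 2) * dt) / vol
    d_minus = d_plus - vol
    return phi * S0 * (
        math.exp(-rf * T) * N(phi * d_plus)
        - alpha * math.exp((rd - rf) * tf) * math.exp(-rd * T) * N(phi * d_minus))


# Inputs of Table 5; ACT/365 is an assumption made for this sketch.
T, tf = 186.0 / 365.0, 90.0 / 365.0
call = forward_start(0.9000, 0.99, 0.12, 0.02, 0.03, T, tf, +1)
put = forward_start(0.9000, 0.99, 0.12, 0.02, 0.03, T, tf, -1)
```

With these inputs both values should come out close to the Table 5 entries; small differences in the last digit can stem from the day count assumption.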

At time $t_i$, the option holder can either terminate the contract or pay $k_i$ to continue. This means that the instalment option can be viewed as an option with strike $k_1$ on an option with strike $k_2$ $\cdots$ on an option with strike $k_n$. Therefore, by the risk-neutral pricing theory, the holding value is

$$e^{-r_d(t_{i+1} - t_i)}\, \mathbb{E}\left[V_{i+1}(S_{t_{i+1}}) \mid S_{t_i} = s\right], \quad \text{for } i = 0, \ldots, n - 1 \tag{75}$$

where

$$V_i(s) = \begin{cases} \left[ e^{-r_d(t_{i+1} - t_i)}\, \mathbb{E}\left[V_{i+1}(S_{t_{i+1}}) \mid S_{t_i} = s\right] - k_i \right]^+ & \text{for } i = 1, \ldots, n - 1 \\ V_n(s) & \text{for } i = n \end{cases} \tag{76}$$

Then the unique arbitrage-free time-zero value is

$$P = V_0(s) = e^{-r_d(t_1 - t_0)}\, \mathbb{E}\left[V_1(S_{t_1}) \mid S_{t_0} = s\right] \tag{77}$$

Figure 2 illustrates this context.

One way of pricing this instalment option is to evaluate the nested expectations through multiple numerical integration of the payoff functions via backward iteration. Alternatively, one can derive a solution in closed form in terms of the $n$-variate cumulative normal, which is described in the following.

The Curnow and Dunnett Integral Reduction Technique

Denote the $n$-dimensional multivariate normal integral with upper limits $h_1, \ldots, h_n$ and correlation matrix $R_n = (\rho_{ij})_{i,j=1,\ldots,n}$ by $N_n(h_1, \ldots, h_n; R_n)$. Let the correlation matrix be nonsingular and $\rho_{11} = 1$.

Under these conditions, Curnow and Dunnett [3] derived the following reduction formula for multivariate normal integrals:

$$
\begin{aligned}
N_n(h_1, \ldots, h_n; R_n) &= \int_{-\infty}^{h_1} N_{n-1}\!\left( \frac{h_2 - \rho_{21} y}{(1 - \rho_{21}^2)^{1/2}}, \ldots, \frac{h_n - \rho_{n1} y}{(1 - \rho_{n1}^2)^{1/2}};\ R_{n-1}^* \right) n(y)\, dy, \\
R_{n-1}^* &= (\rho_{ij}^*)_{i,j=2,\ldots,n}, \\
\rho_{ij}^* &= \frac{\rho_{ij} - \rho_{i1}\rho_{j1}}{(1 - \rho_{i1}^2)^{1/2} (1 - \rho_{j1}^2)^{1/2}}
\end{aligned}
\tag{78}
$$

For example, to go from dimension 1 to dimension 2, this takes the form

$$\int_{-\infty}^{x} N(az + B)\, n(z)\, dz = N_2\!\left(x, \frac{B}{\sqrt{1 + a^2}};\ \frac{-a}{\sqrt{1 + a^2}}\right) \tag{79}$$

or more generally,

$$\int_{-\infty}^{x} e^{Az} N(az + B)\, n(z)\, dz = e^{\frac{A^2}{2}} N_2\!\left(x - A, \frac{aA + B}{\sqrt{1 + a^2}};\ \frac{-a}{\sqrt{1 + a^2}}\right) \tag{80}$$
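The special cases (79) and (80) are easy to check numerically. The sketch below compares a one-dimensional Simpson rule for the left-hand side with a brute-force two-dimensional midpoint rule on the bivariate normal density for the right-hand side. All function names, grid sizes, truncation bounds, and the sample arguments are assumptions chosen for illustration; a production implementation would use a dedicated bivariate normal routine instead.

```python
import math


def n(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def N(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lhs(x, a, B, A=0.0, lower=-10.0, steps=2000):
    """Composite Simpson rule for the left-hand side of (80); A = 0 gives (79)."""
    h = (x - lower) / steps

    def f(z):
        return math.exp(A * z) * N(a * z + B) * n(z)

    total = f(lower) + f(x)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * f(lower + i * h)
    return total * h / 3.0


def N2(x, y, rho, lower=-8.0, steps=400):
    """Bivariate normal CDF by a midpoint rule on the joint density (brute force)."""
    hx, hy = (x - lower) / steps, (y - lower) / steps
    c = 1.0 / (2.0 * math.pi * math.sqrt(1.0 - rho * rho))
    q = 2.0 * (1.0 - rho * rho)
    total = 0.0
    for i in range(steps):
        u = lower + (i + 0.5) * hx
        for j in range(steps):
            v = lower + (j + 0.5) * hy
            total += math.exp(-(u * u - 2.0 * rho * u * v + v * v) / q)
    return c * total * hx * hy


def rhs(x, a, B, A=0.0):
    """Right-hand side of (80); A = 0 gives (79)."""
    s = math.sqrt(1.0 + a * a)
    return math.exp(0.5 * A * A) * N2(x - A, (a * A + B) / s, -a / s)


# Hypothetical sample arguments.
gap_79 = abs(lhs(0.3, 0.5, 0.2) - rhs(0.3, 0.5, 0.2))
gap_80 = abs(lhs(0.3, 0.5, 0.2, A=0.4) - rhs(0.3, 0.5, 0.2, A=0.4))
```

Both gaps should be small and limited by the coarse two-dimensional grid, which is the point of the reduction: each application of (78) trades one dimension of integration for a lower-dimensional normal integral.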

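[Figure 2: timeline from $t_0 = 0$ over the decision dates $t_1, t_2, \ldots, t_{n-1}$ to $t_n = T$, with $P = V_0(t_0)$ at inception and the payments $k_1, k_2, \ldots, k_{n-1}, k_n$ marked at the dates $t_1, \ldots, t_n$. Brackets indicate the lifetimes of $V_0(t)$, $V_1(t)$, the compound option $V_{n-2}(t)$, and the standard option $V_{n-1}(t)$.]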

Figure 2 Lifetime of the options Vi



A Closed-form Solution for the Value of an Instalment Option

Heuristically, the formula given in Theorem 1 has the structure of the Black–Scholes formula in higher dimensions, namely, $S_0 N_n(\cdot) - k_n N_n(\cdot)$ minus the later premium payments $k_i N_i(\cdot)$ ($i = 1, \ldots, n - 1$). This structure is a result of the integration of the vanilla option payoff, which is again integrated minus the next instalment, which in turn is integrated with the following instalment and so forth. By this iteration, the vanilla payoff is integrated with respect to the normal density function $n$ times and the $i$th payment is integrated $i$ times for $i = 1, \ldots, n - 1$.

The correlation coefficients $\rho_{ij}$ of these normal distribution functions contained in the formula arise from the overlapping increments of the Brownian motion, which models the price process of the underlying $S_t$ at the particular exercise dates $t_i$ and $t_j$.

Theorem 1 Let $k = (k_1, \ldots, k_n)$ be the strike price vector, $t = (t_1, \ldots, t_n)$ the vector of the exercise dates of an $n$-variate instalment option and $\phi = (\phi_1, \ldots, \phi_n)$ the vector of the put/call-indicators of these $n$ options.

The value function of an $n$-variate instalment option is given by

$$
\begin{aligned}
V_n(S_0, k, t, \phi) ={}& e^{-r_f t_n} S_0\, \phi_1 \cdots \phi_n\, N_n\!\left[ \phi_1 \frac{\ln\frac{S_0}{S_1^*} + \sigma\theta_+ t_1}{\sigma\sqrt{t_1}},\ \phi_2 \frac{\ln\frac{S_0}{S_2^*} + \sigma\theta_+ t_2}{\sigma\sqrt{t_2}},\ \ldots,\ \phi_n \frac{\ln\frac{S_0}{S_n^*} + \sigma\theta_+ t_n}{\sigma\sqrt{t_n}};\ R_n \right] \\
& - e^{-r_d t_n} k_n\, \phi_1 \cdots \phi_n\, N_n\!\left[ \phi_1 \frac{\ln\frac{S_0}{S_1^*} + \sigma\theta_- t_1}{\sigma\sqrt{t_1}},\ \phi_2 \frac{\ln\frac{S_0}{S_2^*} + \sigma\theta_- t_2}{\sigma\sqrt{t_2}},\ \ldots,\ \phi_n \frac{\ln\frac{S_0}{S_n^*} + \sigma\theta_- t_n}{\sigma\sqrt{t_n}};\ R_n \right] \\
& - e^{-r_d t_{n-1}} k_{n-1}\, \phi_1 \cdots \phi_{n-1}\, N_{n-1}\!\left[ \phi_1 \frac{\ln\frac{S_0}{S_1^*} + \sigma\theta_- t_1}{\sigma\sqrt{t_1}},\ \ldots,\ \phi_{n-1} \frac{\ln\frac{S_0}{S_{n-1}^*} + \sigma\theta_- t_{n-1}}{\sigma\sqrt{t_{n-1}}};\ R_{n-1} \right] \\
& \ \vdots \\
& - e^{-r_d t_2} k_2\, \phi_1 \phi_2\, N_2\!\left[ \phi_1 \frac{\ln\frac{S_0}{S_1^*} + \sigma\theta_- t_1}{\sigma\sqrt{t_1}},\ \phi_2 \frac{\ln\frac{S_0}{S_2^*} + \sigma\theta_- t_2}{\sigma\sqrt{t_2}};\ \rho_{12} \right] \\
& - e^{-r_d t_1} k_1\, \phi_1\, N\!\left[ \phi_1 \frac{\ln\frac{S_0}{S_1^*} + \sigma\theta_- t_1}{\sigma\sqrt{t_1}} \right]
\end{aligned}
\tag{81}
$$

where $S_i^*$ ($i = 1, \ldots, n$) is to be determined as the spot price $S_t$ for which the payoff of the corresponding $i$-variate instalment option ($i = 1, \ldots, n$) is equal to 0, that is, $V_i(S_i^*, k, t, \phi) = 0$. This has to be done numerically by a zero search.

The correlation coefficients in $R_i$ of the $i$-variate normal distribution function can be expressed through the exercise dates $t_i$,

$$\rho_{ij} = \sqrt{t_i/t_j} \quad \text{for } i, j = 1, \ldots, n \text{ and } i < j \tag{82}$$

The proof is established with equation (78). Formula (81) has been independently derived by Thomassen and van Wouve in [16] and Griebsch et al. in [6].
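The correlation structure (82) reflects the fact that a Brownian motion satisfies $\mathrm{Cov}(W_{t_i}, W_{t_j}) = \min(t_i, t_j)$, so that the correlation of $W_{t_i}$ and $W_{t_j}$ is $\sqrt{t_i/t_j}$ for $t_i < t_j$. A minimal sketch for building the matrix from a schedule of exercise dates follows; the function name and the sample dates are hypothetical, and the entries for $i > j$ are filled in by symmetry.

```python
import math


def instalment_correlation(dates):
    """Correlation matrix with rho_ij = sqrt(t_i / t_j) for t_i <= t_j, as in (82)."""
    return [[math.sqrt(min(ti, tj) / max(ti, tj)) for tj in dates] for ti in dates]


# Hypothetical exercise dates in years.
R3 = instalment_correlation([0.25, 0.5, 1.0])
```

The diagonal entries equal 1 and the matrix is symmetric, as required for the argument $R_n$ of $N_n$ in (81).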

References

[1] Broadie, M., Glasserman, P. & Kou, S.G. (1999). Connecting discrete and continuous path-dependent options, Finance and Stochastics 3(1), 55–82.
[2] Conze, A. & Viswanathan, R. (1991). Path dependent options: the case of lookback options, The Journal of Finance XLVI(5), 1893–1907.
[3] Curnow, R.N. & Dunnett, C.W. (1962). The numerical evaluation of certain multivariate normal integrals, Annals of Mathematical Statistics 33, 571–579.
[4] Garman, M. (1989). Recollection in tranquillity, re-edited version, in From Black Scholes to Black Holes, Originally in Risk Publications, London, pp. 171–175.
[5] Geske, R. (1979). The valuation of compound options, Journal of Financial Economics 7, 63–81.
[6] Griebsch, S.A., Kühn, C. & Wystup, U. (2008). Instalment options: a closed-form solution and the limiting case, in Contribution to Mathematical Control Theory and Finance, A. Sarychev, A. Shiryaev, M. Guerra, & M.R. Grossinho, eds, Springer, pp. 211–229.
[7] Hakala, J. & Wystup, U. (2002). Foreign Exchange Risk, Risk Publications, London.
[8] Haug, E.G. (1997). Option Pricing Formulas, McGraw Hill.
[9] Heynen, R. & Kat, H. (1994). Selective memory: reducing the expense of lookback options by limiting their memory, re-edited version, in Over the Rainbow: Developments in Exotic Options and Complex Swaps, Risk Publications, London.
[10] Heynen, R. & Kat, H. (1994). Crossing barriers, Risk 7(6), 46–51.
[11] Lipton, A. (2001). Mathematical Methods for Foreign Exchange, World Scientific, Singapore.
[12] Revuz, D. & Yor, M. (1995). Continuous Martingales and Brownian Motion, 2nd Edition, Springer.
[13] Rich, D. (1994). The mathematical foundations of barrier option pricing theory, Advances in Futures and Options Research 7, 267–371.
[14] Reiner, E. & Rubinstein, M. (1991). Breaking down the barriers, Risk 4(8), 28–35.
[15] Shreve, S.E. (2004). Stochastic Calculus for Finance I+II, Springer.
[16] Thomassen, L. & van Wouve, M. (2002). A Sensitivity Analysis for the N-fold Compound Option, Research Paper, Faculty of Applied Economics, University of Antwerpen.
[17] Wystup, U. (2006). FX Options and Structured Products, Wiley Finance Series.
[18] Zhang, P.G. (1998). Exotic Options, 2nd Edition, World Scientific, London.

Related Articles

Barrier Options; Black–Scholes Formula; Discretely Monitored Options; Foreign Exchange Markets; Foreign Exchange Options; Foreign Exchange Symmetries; Lookback Options.

ANDREAS WEBER & UWE WYSTUP
Foreign Exchange Symmetries

Motivation

The symmetries of the foreign exchange (FX) market are the key features that distinguish this market from all others. With an EUR-USD exchange rate of 1.2500 USD per EUR, there is an equivalent USD-EUR exchange rate of 0.8000 EUR per USD, which is just the reciprocal. Any model $S_t$ for an exchange rate at time $t$ should guarantee that $1/S_t$ is within the same model class. This is satisfied by the Black–Scholes model or local volatility models, but not by many stochastic volatility models.

A further symmetry is that both currencies pay interest, which we can assume to be continuously paid. The FX market is the only market where this really works.

In the EUR-USD market, any EUR call is equivalent to a USD put. Any premium in USD can also be paid in EUR, any delta hedge can be specified in the amount of USD to sell or, alternatively, the amount of EUR to buy.

Furthermore, if $S_1$ is a model for EUR-USD and $S_2$ is a model for USD–JPY, then $S_3 = S_1 \cdot S_2$ should be a model for EUR-JPY. Therefore, besides the reciprocals, the products of two modeling quantities should also remain within the same model class.

Finally, the smile of an FX options market is summarized by Risk Reversals and Butterflies, the skew-symmetric part and the symmetric part of a smile curve.

In no other market are symmetries so prominent and so heavily used as in FX. It is this special feature that makes it hard for many newcomers to capture the way FX options market participants think.

Geometric Brownian Motion Model for the Spot

We consider the model geometric Brownian motion

$$dS_t = (r_d - r_f) S_t\, dt + \sigma S_t\, dW_t \tag{1}$$

for the underlying exchange rate quoted in foreign–domestic (FOR–DOM), which means that 1 unit of the foreign currency costs FOR–DOM units of the domestic currency. In the case of EUR-USD with a spot of 1.2000, this means that the price of 1 EUR is 1.2000 USD. The notions of foreign and domestic do not refer to the location of the trading entity, but only to this quotation convention. We denote the (continuous) foreign interest rate by $r_f$ and the (continuous) domestic interest rate by $r_d$. In an equity scenario, $r_f$ would represent a continuous dividend rate. The volatility is denoted by $\sigma$, and $W_t$ is a standard Brownian motion.

We consider this standard model, not because it reflects the statistical properties of the exchange rate (in fact, it does not), but because it is widely used in practice and front office systems and mainly serves as a tool to communicate prices in FX options. These prices are generally quoted in terms of volatility in the sense of this model.

Vanilla Options

The payoff for a vanilla option (European put or call) is given by

$$F = \left[\phi(S_T - K)\right]^+ \tag{2}$$

where the contractual parameters are the strike $K$, the expiration time $T$ and the type $\phi$, a binary variable, which takes the value $+1$ in the case of a call and $-1$ in the case of a put. The symbol $x^+$ denotes the positive part of $x$, that is, $x^+ = \max(0, x) = 0 \vee x$.

Value

In the Black–Scholes model, the value of the payoff $F$ at time $t$ if the spot is at $S$ is denoted by $v(t, S)$. The result is the Black–Scholes formula

$$v(S, K, T, t, \sigma, r_d, r_f, \phi) = \phi e^{-r_d \tau} \left[f N(\phi d_+) - K N(\phi d_-)\right] \tag{3}$$

We abbreviate

• $S$: current price of the underlying
• $\tau = T - t$: time to maturity
• $f = \mathbb{E}[S_T \mid S_t = S] = S e^{(r_d - r_f)\tau}$: forward price of the underlying
• $\theta_\pm = \frac{r_d - r_f}{\sigma} \pm \frac{\sigma}{2}$

• $d_\pm = \frac{\ln\frac{S}{K} + \sigma\theta_\pm\tau}{\sigma\sqrt{\tau}} = \frac{\ln\frac{f}{K} \pm \frac{\sigma^2}{2}\tau}{\sigma\sqrt{\tau}}$
• $n(t) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}t^2} = n(-t)$
• $N(x) = \int_{-\infty}^{x} n(t)\, dt = 1 - N(-x)$

A Note on the Forward

The forward price $f$ is the strike that makes the time zero value of the forward contract

$$F = S_T - f \tag{4}$$

equal to zero. It follows that $f = \mathbb{E}[S_T] = S e^{(r_d - r_f) \cdot T}$, that is, the forward price is the expected price of the underlying at time $T$ in a risk-neutral setup (drift of the geometric Brownian motion is equal to the cost of carry $r_d - r_f$).

Greeks

Greeks are derivatives of the value function with respect to model and contract parameters. They are important information for traders and have become standard information provided by front-office systems. More details on Greeks and the relations among Greeks are presented in [5] or [6]. We now list some of them for vanilla options:

• Spot delta. The spot delta shows how many units of FOR in the spot must be traded to hedge an option with 1 unit of FOR notional.

$$\frac{\partial v}{\partial S} = \phi e^{-r_f \tau} N(\phi d_+) \tag{5}$$

• Forward delta. The forward delta shows how many units of FOR of a forward contract must be traded to hedge an option with 1 unit of FOR notional.

$$\phi N(\phi d_+) = \phi\, \mathbb{P}\left[\phi S_T \le \phi \frac{f^2}{K}\right] \tag{6}$$

It is also equal to the risk-neutral exercise probability or in-the-money probability of the symmetric put (see Section 1.4.3).

• Gamma.

$$\frac{\partial^2 v}{\partial S^2} = e^{-r_f \tau} \frac{n(d_+)}{S\sigma\sqrt{\tau}} \tag{7}$$

• Vega.

$$\frac{\partial v}{\partial \sigma} = S e^{-r_f \tau} \sqrt{\tau}\, n(d_+) \tag{8}$$

• Dual delta.

$$\frac{\partial v}{\partial K} = -\phi e^{-r_d \tau} N(\phi d_-) \tag{9}$$

The forward dual delta

$$N(\phi d_-) = \mathbb{P}[\phi S_T \ge \phi K] \tag{10}$$

can also be viewed as the risk-neutral exercise probability.

• Dual gamma.

$$\frac{\partial^2 v}{\partial K^2} = e^{-r_d \tau} \frac{n(d_-)}{K\sigma\sqrt{\tau}} \tag{11}$$

Identities

Computations with vanilla options often rely on the symmetry identities

$$\frac{\partial d_\pm}{\partial \sigma} = -\frac{d_\mp}{\sigma} \tag{12}$$

$$\frac{\partial d_\pm}{\partial r_d} = \frac{\sqrt{\tau}}{\sigma} \tag{13}$$

$$\frac{\partial d_\pm}{\partial r_f} = -\frac{\sqrt{\tau}}{\sigma} \tag{14}$$

$$S e^{-r_f \tau} n(d_+) = K e^{-r_d \tau} n(d_-) \tag{15}$$

Put–Call Parity

The put–call parity is the relationship,

$$v(S, K, T, t, \sigma, r_d, r_f, +1) - v(S, K, T, t, \sigma, r_d, r_f, -1) = S e^{-r_f \tau} - K e^{-r_d \tau} \tag{16}$$

which is just a more complicated way to write the trivial equation $x = x^+ - x^-$. Taking the strike $K$ to be equal to the forward price $f$, we see that the call and put have the same value. The forward is the center of symmetry for vanilla call and put values. However, this does not imply that the deltas are also symmetric about the forward price.

The put–call delta parity is

$$\frac{\partial v(S, K, T, t, \sigma, r_d, r_f, +1)}{\partial S} - \frac{\partial v(S, K, T, t, \sigma, r_d, r_f, -1)}{\partial S} = e^{-r_f \tau} \tag{17}$$
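The value formula (3), the spot delta (5), and the identities (15) to (17) lend themselves to a quick numerical check. The following sketch uses hypothetical function names and sample market data; it works with the time to maturity $\tau$ directly instead of the pair $(T, t)$.

```python
import math


def n(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def N(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def d_pm(S, K, tau, sigma, rd, rf):
    """Return (d_plus, d_minus) in terms of the forward f = S * exp((rd - rf) * tau)."""
    f = S * math.exp((rd - rf) * tau)
    vol = sigma * math.sqrt(tau)
    d_plus = (math.log(f / K) + 0.5 * sigma ** 2 * tau) / vol
    return d_plus, d_plus - vol


def vanilla(S, K, tau, sigma, rd, rf, phi):
    """Black-Scholes value (3); phi = +1 for a call, -1 for a put."""
    f = S * math.exp((rd - rf) * tau)
    d_plus, d_minus = d_pm(S, K, tau, sigma, rd, rf)
    return phi * math.exp(-rd * tau) * (f * N(phi * d_plus) - K * N(phi * d_minus))


def spot_delta(S, K, tau, sigma, rd, rf, phi):
    """Spot delta (5)."""
    d_plus, _ = d_pm(S, K, tau, sigma, rd, rf)
    return phi * math.exp(-rf * tau) * N(phi * d_plus)


# Hypothetical EUR-USD style inputs.
S, K, tau, sigma, rd, rf = 1.2000, 1.2500, 0.5, 0.10, 0.03, 0.02
d_plus, d_minus = d_pm(S, K, tau, sigma, rd, rf)

parity_gap = (vanilla(S, K, tau, sigma, rd, rf, +1) - vanilla(S, K, tau, sigma, rd, rf, -1)
              - (S * math.exp(-rf * tau) - K * math.exp(-rd * tau)))           # (16)
delta_parity_gap = (spot_delta(S, K, tau, sigma, rd, rf, +1)
                    - spot_delta(S, K, tau, sigma, rd, rf, -1)
                    - math.exp(-rf * tau))                                       # (17)
identity_gap = (S * math.exp(-rf * tau) * n(d_plus)
                - K * math.exp(-rd * tau) * n(d_minus))                          # (15)
```

All three gaps should vanish up to floating point rounding, which makes such identities convenient regression checks for a pricing library.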

In particular, we learn that the absolute value of a put delta and a call delta do not exactly add up to 1, but only to a positive number $e^{-r_f \tau}$. They add up to 1 approximately if either the time to expiration $\tau$ is short or if the foreign interest rate $r_f$ is close to 0. For this reason, traders often prefer to work with forward deltas, because these are symmetric in the sense that a 25-delta call is a 75-delta put.

Although the choice $K = f$ produces identical values for call and put, we seek the delta-symmetric strike $\check K$, which produces absolutely identical deltas (spot, forward, or driftless). This condition implies $d_+ = 0$ and thus

$$\check K = f e^{\frac{\sigma^2}{2}\tau} \tag{18}$$

in which case the absolute delta is $e^{-r_f \tau}/2$. In particular, we learn that always $\check K > f$, that is, there cannot be a put and a call with identical values and deltas. This is natural as the payoffs of calls and puts are not symmetric to start with: the call has unlimited upside potential, whereas the put payoff is always bounded by the strike. Note that the strike $\check K$ is usually chosen as the middle strike when trading a straddle or a butterfly. Similarly the dual-delta-symmetric strike $\hat K = f e^{-\frac{\sigma^2}{2}\tau}$ can be derived from the condition $d_- = 0$.

Note that the delta-symmetric strike $\check K$ also maximizes gamma and vega of a vanilla option and is thus often considered as a center of symmetry.

Homogeneity-based Relationships

Space Homogeneity

We may wish to measure the value of the underlying in a different unit. This will obviously affect the option pricing formula as follows:

$$a\, v(S, K, T, t, \sigma, r_d, r_f, \phi) = v(aS, aK, T, t, \sigma, r_d, r_f, \phi) \quad \text{for all } a > 0 \tag{19}$$

Differentiating both sides with respect to $a$ and then setting $a = 1$ yields

$$v = S v_S + K v_K \tag{20}$$

Comparing the coefficients of $S$ and $K$ in equations (3) and (20) leads to suggestive results for the delta $v_S$ and dual delta $v_K$. This space-homogeneity is the reason behind the simplicity of the delta formulas, whose tedious computation can be saved this way.

Time Homogeneity

We can perform a similar computation for the time-affected parameters and obtain the obvious equation

$$v(S, K, T, t, \sigma, r_d, r_f, \phi) = v\!\left(S, K, \frac{T}{a}, \frac{t}{a}, \sqrt{a}\,\sigma, a r_d, a r_f, \phi\right) \quad \text{for all } a > 0 \tag{21}$$

Differentiating both sides with respect to $a$ and then setting $a = 1$ yields

$$0 = \tau v_t + \frac{1}{2}\sigma v_\sigma + r_d v_{r_d} + r_f v_{r_f} \tag{22}$$

Of course, this can also be verified by direct computation. The overall use of such equations is to generate double checking benchmarks when computing Greeks. These homogeneity methods can easily be extended to other more complex options.

Put–Call Symmetry

By put–call symmetry, we understand the relationship (see [1–4])

$$v(S, K, T, t, \sigma, r_d, r_f, +1) = \frac{K}{f}\, v\!\left(S, \frac{f^2}{K}, T, t, \sigma, r_d, r_f, -1\right) \tag{23}$$

The geometric mean of the strike of the put $\frac{f^2}{K}$ and the strike of the call $K$ is equal to $\sqrt{\frac{f^2}{K} \cdot K} = f$, the outright forward rate. Therefore, the outright forward rate can be interpreted as a geometric mirror reflecting a call into a certain number of puts. Note that for at-the-money (forward) options ($K = f$) the put–call symmetry coincides with the special case of

the put–call parity where the call and the put have the same value.

Rates Symmetry

Direct computation shows that the rates symmetry

$$\frac{\partial v}{\partial r_d} + \frac{\partial v}{\partial r_f} = -\tau v \tag{24}$$

holds for vanilla options. This relationship, in fact, holds for all European options and a wide class of path-dependent options as shown in [6].

Foreign–Domestic Symmetry

One can directly verify the FOR–DOM symmetry

$$\frac{1}{S}\, v(S, K, T, t, \sigma, r_d, r_f, \phi) = K\, v\!\left(\frac{1}{S}, \frac{1}{K}, T, t, \sigma, r_f, r_d, -\phi\right) \tag{25}$$

This equality can be viewed as one of the faces of put–call symmetry. The reason is that the value of an option can be computed both in a domestic as well as in a foreign scenario. We consider the example of $S_t$ modeling the exchange rate of EUR/USD. In New York, the call option $(S_T - K)^+$ costs $v(S, K, T, t, \sigma, r_{usd}, r_{eur}, 1)$ USD and hence $v(S, K, T, t, \sigma, r_{usd}, r_{eur}, 1)/S$ EUR. This EUR call option can also be viewed as a USD put option with payoff $K\left(\frac{1}{K} - \frac{1}{S_T}\right)^+$. This option costs $K\, v\!\left(\frac{1}{S}, \frac{1}{K}, T, t, \sigma, r_{eur}, r_{usd}, -1\right)$ EUR in Frankfurt, because $S_t$ and $\frac{1}{S_t}$ have the same volatility. Of course, the New York value and the Frankfurt value must agree, which leads to equation (25). This can also be seen as a change of measure to the foreign discount bond as numeraire (see, e.g., in [7]).

Exotic Options

In FX markets, one can use many symmetry relationships for exotic options.

Digital Options

For example, let us define the payoff of digital options by

$$\text{DOM paying:} \quad \mathbb{1}_{\{\phi S_T \ge \phi K\}} \tag{26}$$

$$\text{FOR paying:} \quad S_T\, \mathbb{1}_{\{\phi S_T \ge \phi K\}} \tag{27}$$

where the contractual parameters are the strike $K$, the expiration time $T$, and the type $\phi$, a binary variable, which takes the value $+1$ in the case of a call and $-1$ in the case of a put. Then we observe that a DOM-paying digital call in the currency pair FOR–DOM with a value of $v_d$ units of domestic currency must be worth the same as a FOR-paying digital put in the currency pair DOM–FOR with a value of $v_f$ units of foreign currency. And since we are looking at the same product, we conclude that $v_d = v_f \cdot S$, where $S$ is the initial spot of FOR–DOM.

Touch Options

This key idea generalizes from the path-independent digitals to touch products. Consider the value function for one-touch in EUR-USD paying 1 USD. If we want to find the value function of a one-touch in EUR-USD paying 1 EUR, we can price the one-touch in USD-EUR paying 1 EUR using the known value function with rates $r_d$ and $r_f$ exchanged, volatility unchanged, using the formula for a one-touch in EUR-USD paying 1 USD. We also note that an upper one-touch in EUR-USD becomes a lower one-touch in USD-EUR. The result we get is in domestic currency, which is EUR in USD-EUR notation. To convert it into a USD price, we just multiply by the EUR-USD spot $S$.

Barrier Options

For a standard knockout barrier option, we let the value function be

$$v(S, r_d, r_f, \sigma, K, B, T, t, \phi, \eta) \tag{28}$$

where $B$ denotes the barrier and the variable $\eta$ takes the value $+1$ for a lower barrier and $-1$ for an upper barrier. With this notation at hand, we can state our FOR–DOM symmetry as

$$v(S, r_d, r_f, \sigma, K, B, T, t, \phi, \eta) = v\!\left(\frac{1}{S}, r_f, r_d, \sigma, \frac{1}{K}, \frac{1}{B}, T, t, -\phi, -\eta\right) S K \tag{29}$$

Note that the rates $r_d$ and $r_f$ have been interchanged on purpose. This implies that if we know how to price

barrier contracts with upper barriers, we can derive the formulas for lower barriers.

Quotation

Quotation of the Underlying Exchange Rate

Equation (1) is a model for the exchange rate. The quotation is a permanently confusing issue, so let us clarify this here. The exchange rate means how much of the domestic currency is needed to buy 1 unit of foreign currency. For example, if we take EUR/USD as an exchange rate, then the default quotation is EUR-USD, where USD is the domestic currency and EUR is the foreign currency. The term domestic is in no way related to the location of the trader or any country. It merely means the numeraire currency. The terms domestic, numeraire, or base currency are synonyms as are foreign and underlying. Commonly, we denote with the slash (/) the currency pair and with a dash (-) the quotation. The slash (/) does not mean a division. For instance, EUR/USD can also be quoted in either EUR-USD, which then means how many USD are needed to buy one EUR, or in USD-EUR, which then means how many EUR are needed to buy 1 USD. There are certain market standard quotations listed in Table 1.

Table 1 Standard market quotation of major currency pairs with sample spot prices

| Currency pair | Default quotation | Sample quote |
|---|---|---|
| GBP/USD | GBP-USD | 1.8000 |
| GBP/CHF | GBP-CHF | 2.2500 |
| EUR/USD | EUR-USD | 1.2000 |
| EUR/GBP | EUR-GBP | 0.6900 |
| EUR/JPY | EUR-JPY | 135.00 |
| EUR/CHF | EUR-CHF | 1.5500 |
| USD/JPY | USD-JPY | 108.00 |
| USD/CHF | USD-CHF | 1.2800 |

Quotation of Option Prices

Values and prices of vanilla options may be quoted in the six ways explained in Table 2.

The Black–Scholes formula quotes an option

pips per foreign, in short, d pips. The others can be computed using the following instruction:

$$\text{d pips} \xrightarrow{\ \times \frac{1}{S}\ } \%\text{f} \xrightarrow{\ \times \frac{S}{K}\ } \%\text{d} \xrightarrow{\ \times \frac{1}{S}\ } \text{f pips} \xrightarrow{\ \times SK\ } \text{d pips} \tag{30}$$

Delta and Premium Convention

The spot delta of a European option without premium-adjustment is well known. It will be called raw spot delta $\delta_{raw}$ now and denotes the amount of FOR to buy when selling an option with 1 unit of FOR notional. However, the same option can also be viewed as an option with $K$ units of DOM notional. The delta that goes with the same option, but 1 unit of DOM notional and tells how many units of DOM currency must be sold for the delta hedge is denoted by $\delta_{raw}^{reverse}$. In the market, both deltas can be quoted in either of the two currencies involved. The relationship is

$$\delta_{raw}^{reverse} = -\delta_{raw} \frac{S}{K} \tag{31}$$
K
value in units of domestic currency per unit of foreign
notional. Since this is usually a small number, it is The delta is used to buy or sell spot in the cor-
often multiplied with 10 000 and quoted as domestic responding amount in order to hedge the option up

Table 2 Standard market quotation types for option values. In the example, we take FOR = EUR, DOM = USD,
S = 1.2000, rd = 3.0%, rf = 2.5%, σ = 10%, K = 1.2500, T = 1 year, φ = +1 (call), notional = 1 000 000 EUR =
1 250 000 USD. For the pips, the quotation 291.48 USD pips per EUR is also sometimes stated as 2.9148% USD per
1 EUR. Similarly, the 194.32 EUR pips per USD can also be quoted as 1.9432% EUR per 1 USD
Name Symbol Value in units of Example
Domestic cash d DOM 29 148 USD
Foreign cash f FOR 24 290 EUR
% domestic %d DOM per unit of DOM 2.3318% USD
% foreign %f FOR per unit of FOR 2.4290% EUR
Domestic pips d pips DOM per unit of FOR 291.48 USD pips per EUR
Foreign pips f pips FOR per unit of DOM 194.32 EUR pips per USD
6 Foreign Exchange Symmetries

to first order. To interpret this relationship, note that the option, and for risky premium this premium must
the minus sign refers to selling DOM instead of be included. In the opposite case, the risky premium
buying FOR, and the multiplication by S adjusts and the market value must be taken into account for
the amounts. Furthermore, we divide by the strike, the base currency premium, such that these offset
because a call on 1 EUR corresponds to K USD each other. And for premium in underlying currency
puts. More details on delta conventions are contained of the contract, the market value needs to be taken
in Foreign Exchange Options: Delta- and At-the- into account. In this way, the delta hedge is invariant
money Conventions with respect to the risky currency notion of the bank,
For consistency, the premium needs to be incorpo- for example, the delta is the same for a USD-based
rated into the delta hedge, since a premium in foreign bank and an EUR-based bank.
currency will already hedge part of the option’s delta
risk. To make this clear, let us consider EUR-USD.
In the standard arbitrage theory, v(S) denotes the Example
value or premium in USD of an option with 1 EUR
We consider two examples in Tables 3 and 4 to
notional, if the spot is at S, and the raw delta vS
compare the various versions of deltas that are used
denotes the number of EUR to buy for the delta
in practice.
hedge. Therefore, SvS is the number of USD to sell.
If now the premium is paid in EUR rather than in
USD, then we already have Sv EUR, and the number
of EUR to buy has to be reduced by this amount, that
Greeks in Terms of Deltas
is, if EUR is the premium currency, we need to buy In FX markets, the moneyness of vanilla options
vS − v/S EUR for the delta hedge or equivalently is always expressed in terms of deltas and prices
sell SvS − v USD.
To quote an FX option, we need to first sort
out which currency is domestic, which is for- Table 3 1 y EUR call USD put strike K = 0.9090 for a
eign, what is the notional currency of the option, EUR-based bank. Market data: spot S = 0.9090, volatility
and what is the premium currency. Unfortunately, σ = 12%, EUR rate rf = 3.96%, USD rate rd = 3.57%.
The raw delta is 49.15%EUR and the value is 4.427%EUR
this is not symmetric, since the counterparty might
have another notion of domestic currency for a Delta Prem
given currency pair. Hence, in the professional currency currency Fenics Formula Delta
interbank market, there is one notion of delta % EUR EUR lhs δraw − P 44.72
per currency pair. Normally, it is the left-hand % EUR USD rhs δraw 49.15
side delta of the Fenicsa screen if the option is % USD EUR rhs −(δraw − −44.72
traded in left-hand side premium, which is nor- [flip F4] P )S/K
mally the standard and right-hand side delta if it % USD USD lhs −(δraw )S/K −49.15
is traded with right-hand-side premium, for exam- [flip F4]
ple, EUR/USD lhs, USD/JPY lhs, EUR/JPY lhs,
AUD/USD rhs, and so on. Since OTM options
are traded most of the time, the difference is Table 4 1 y call EUR call USD put strike K = 0.7000 for
a EUR-based bank. Market data: spot S = 0.9090, volatility
not huge and hence does not create a huge spot
σ = 12%, EUR rate rf = 3.96%, USD rate rd = 3.57%.
risk. The raw delta is 94.82%EUR and the value is 21.88%EUR
Additionally, the standard delta per currency pair
(left-hand-side delta in Fenics for most cases) is used Delta Prem
to quote options in volatility. This has to be specified currency currency Fenics Formula Delta
by currency pair. % EUR EUR lhs δraw − P 72.94
This standard interbank notion must be adapted % EUR USD rhs δraw 94.82
to the real delta risk of the bank for an automated % USD EUR rhs −(δraw − −94.72
trading system. For currency pairs where the risk- [flip F4] P )S/K
% USD USD lhs −δraw S/K −123.13
free currency of the bank is the domestic or base [flip F4]
currency, it is clear that the delta is the raw delta of
are quoted in terms of volatility. This makes a 10-delta call a financial object as such independent of spot and strike. This method and the quotation in volatility make objects and prices transparent in a very intelligent and user-friendly way. At this point, we list the Greeks in terms of deltas instead of spot and strike. Let us introduce the quantities

$$\Delta_+ = \phi e^{-r_f \tau} N(\phi d_+) \quad \text{(spot delta)} \tag{32}$$

$$\Delta_- = -\phi e^{-r_d \tau} N(\phi d_-) \quad \text{(dual delta)} \tag{33}$$

which we assume to be given. From these we can retrieve

$$d_+ = \phi N^{-1}(\phi e^{r_f \tau} \Delta_+) \tag{34}$$

$$d_- = \phi N^{-1}(-\phi e^{r_d \tau} \Delta_-) \tag{35}$$

Interpretation of Dual Delta

The dual delta introduced in equation (9) as the sensitivity with respect to strike has another, more practical, interpretation in an FX setup. Recall from equation (25) that for vanilla options the domestic value

$$v(S, K, \tau, \sigma, r_d, r_f, \phi) \tag{36}$$

corresponds to a foreign value

$$v\left(\frac{1}{S}, \frac{1}{K}, \tau, \sigma, r_f, r_d, -\phi\right) \tag{37}$$

up to an adjustment of the nominal amount by the factor $SK$. From a foreign point of view, the delta is thus given by

$$-\phi e^{-r_d \tau} N\left(-\phi \frac{\ln\frac{1/S}{1/K} + \left(r_f - r_d + \frac{1}{2}\sigma^2\right)\tau}{\sigma\sqrt{\tau}}\right) = -\phi e^{-r_d \tau} N\left(\phi \frac{\ln\frac{S}{K} + \left(r_d - r_f - \frac{1}{2}\sigma^2\right)\tau}{\sigma\sqrt{\tau}}\right) = -\phi e^{-r_d \tau} N(\phi d_-) = \Delta_- \tag{38}$$

which means that the dual delta is the delta from the foreign point of view.

Now, we list value, delta, and vega in terms of $S$, $\Delta_+$, $\Delta_-$, $r_d$, $r_f$, $\tau$, and $\phi$.

• Value.

$$v(S, \Delta_+, \Delta_-, r_d, r_f, \tau, \phi) = S\Delta_+ + S\Delta_- \frac{e^{-r_f \tau} n(d_+)}{e^{-r_d \tau} n(d_-)} \tag{39}$$

• Spot delta.

$$\frac{\partial v}{\partial S} = \Delta_+ \tag{40}$$

• Vega.

$$\frac{\partial v}{\partial \sigma} = S e^{-r_f \tau} \sqrt{\tau}\, n(d_+) \tag{41}$$

Notice that vega does not require knowing the dual delta.
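The relations (32) to (41) can be checked numerically. The following is a minimal Python sketch, not the author's implementation: it computes the two deltas from spot and strike with the standard Black–Scholes formula, then rebuilds value and vega from the deltas alone. The function names are illustrative assumptions; the sample inputs are the market data quoted in the caption of Table 2.

```python
from math import exp, log, sqrt
from statistics import NormalDist

_N = NormalDist()  # standard normal: cdf = N, pdf = n, inv_cdf = N^{-1}


def bs_value(S, K, tau, sigma, rd, rf, phi):
    """Black-Scholes value in DOM per unit of FOR notional (phi=+1 call, -1 put)."""
    d_plus = (log(S / K) + (rd - rf + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d_minus = d_plus - sigma * sqrt(tau)
    return phi * (S * exp(-rf * tau) * _N.cdf(phi * d_plus)
                  - K * exp(-rd * tau) * _N.cdf(phi * d_minus))


def bs_deltas(S, K, tau, sigma, rd, rf, phi):
    """Spot delta, equation (32), and dual delta, equation (33)."""
    d_plus = (log(S / K) + (rd - rf + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d_minus = d_plus - sigma * sqrt(tau)
    delta_plus = phi * exp(-rf * tau) * _N.cdf(phi * d_plus)
    delta_minus = -phi * exp(-rd * tau) * _N.cdf(phi * d_minus)
    return delta_plus, delta_minus


def greeks_from_deltas(S, delta_plus, delta_minus, rd, rf, tau, phi):
    """Value (39), spot delta (40) and vega (41) without knowing K or sigma."""
    d_plus = phi * _N.inv_cdf(phi * exp(rf * tau) * delta_plus)      # (34)
    d_minus = phi * _N.inv_cdf(-phi * exp(rd * tau) * delta_minus)   # (35)
    ratio = (exp(-rf * tau) * _N.pdf(d_plus)) / (exp(-rd * tau) * _N.pdf(d_minus))
    value = S * delta_plus + S * delta_minus * ratio                 # (39)
    vega = S * exp(-rf * tau) * sqrt(tau) * _N.pdf(d_plus)           # (41)
    return value, delta_plus, vega


if __name__ == "__main__":
    # Inputs taken from the caption of Table 2 (EUR call USD put).
    args = dict(S=1.2, K=1.25, tau=1.0, sigma=0.10, rd=0.03, rf=0.025, phi=1)
    dp, dm = bs_deltas(**args)
    value, delta, vega = greeks_from_deltas(1.2, dp, dm, 0.03, 0.025, 1.0, 1)
    print(f"value from deltas: {value:.6f}, direct: {bs_value(**args):.6f}")
    print(f"vega from deltas:  {vega:.6f}")
```

The design point is that `greeks_from_deltas` never sees the strike or the volatility: the ratio of the two normal densities stands in for $K/S$, which is exactly what equation (39) expresses.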

Table 5 Vega in terms of delta for the standard maturity labels and various deltas. It shows that one can neutralize a vega position of a long 9M 35 delta call with 4 short 1M 20 delta puts. This offsetting, however, is not a static, but only a momentary hedge

Maturity / Δ | 50% | 45% | 40% | 35% | 30% | 25% | 20% | 15% | 10% | 5%
1D | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1
1W | 6 | 5 | 5 | 5 | 5 | 4 | 4 | 3 | 2 | 1
2W | 8 | 8 | 8 | 7 | 7 | 6 | 5 | 5 | 3 | 2
1M | 11 | 11 | 11 | 11 | 10 | 9 | 8 | 7 | 5 | 3
2M | 16 | 16 | 16 | 15 | 14 | 13 | 11 | 9 | 7 | 4
3M | 20 | 20 | 19 | 18 | 17 | 16 | 14 | 12 | 9 | 5
6M | 28 | 28 | 27 | 26 | 24 | 22 | 20 | 16 | 12 | 7
9M | 34 | 34 | 33 | 32 | 30 | 27 | 24 | 20 | 15 | 9
1Y | 39 | 39 | 38 | 36 | 34 | 31 | 28 | 23 | 17 | 10
2Y | 53 | 53 | 52 | 50 | 48 | 44 | 39 | 32 | 24 | 14
3Y | 63 | 63 | 62 | 60 | 57 | 53 | 47 | 39 | 30 | 18
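Entries of Table 5 can be approximated from the delta-to-vega mapping of equation (42). The sketch below is a hedged illustration: the year fractions assigned to the maturity labels and the scaling by 100 (vega in % foreign with spot removed) are assumptions inferred from the table, not conventions stated in the text, and the function names are illustrative.

```python
from math import exp, sqrt
from statistics import NormalDist

_N = NormalDist()

# Year fractions for the maturity labels: an assumption of this sketch.
TENORS = {"1M": 1 / 12, "3M": 0.25, "6M": 0.5, "9M": 0.75, "1Y": 1.0, "2Y": 2.0}


def vega_from_delta(delta, tau, rf, spot=1.0):
    """Equation (42): vega as a function of the absolute spot delta.

    With spot=1.0 the result is vega in % foreign, which removes the
    spot dependence. By symmetry of the normal density the same number
    applies to a call and to a put of equal absolute delta.
    """
    d_plus = _N.inv_cdf(exp(rf * tau) * delta)
    return spot * exp(-rf * tau) * sqrt(tau) * _N.pdf(d_plus)


def vega_matrix(deltas, rf=0.03, scale=100.0):
    """Rounded vega matrix for the tenors above; scale=100 is assumed."""
    return {
        label: [round(scale * vega_from_delta(d, tau, rf)) for d in deltas]
        for label, tau in TENORS.items()
    }


if __name__ == "__main__":
    deltas = [0.50, 0.35, 0.20, 0.05]
    for label, row in vega_matrix(deltas).items():
        print(label, row)
```

Under these assumptions the momentary hedge described in the caption can be read off directly: four times the 1M 20-delta entry is roughly the 9M 35-delta entry.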
Vega in Terms of Delta

The mapping

$$\Delta \mapsto v_\sigma = S e^{-r_f \tau} \sqrt{\tau}\, n(N^{-1}(e^{r_f \tau} \Delta)) \tag{42}$$

is important for trading vanilla options. Observe that this function does not depend on $r_d$ or $\sigma$, just on $r_f$. Quoting vega in % foreign will additionally remove the spot dependence. This means that for a moderately stable foreign term structure curve, traders will be able to use a moderately stable vega matrix. For $r_f = 3\%$, the vega matrix is presented in Table 5.

The most important result of this paragraph is the fact that vega can be written in terms of delta, which is the main reason why the FX market uses implied volatility quotation based on deltas in the first place.

End Notes

a. Fenics is one of the standard tools for FX option pricing (see http://www.fenics.com/).

References

[1] Bates, D. (1988). Crashes, Options and International Asset Substitutability. PhD Dissertation, Economics Department, Princeton University.
[2] Bates, D. (1991). The crash of 1987: was it expected? The evidence from options markets, The Journal of Finance 46, 1009–1044.
[3] Bowie, J. & Carr, P. (1994). Static simplicity, Risk Magazine (7), 45–49. http://www.riskpublications.com
[4] Carr, P. (1994). European Put Call Symmetry, Cornell University Working Paper.
[5] Hakala, J. & Wystup, U. (2002). Foreign Exchange Risk, Risk Publications, London. http://www.mathfinance.com/FXRiskBook/.
[6] Reiss, O. & Wystup, U. (2001). Efficient computation of option price sensitivities using homogeneity and other tricks, The Journal of Derivatives 9(2), 41–53.
[7] Shreve, S.E. (2004). Stochastic Calculus for Finance II. Springer.

Further Reading

Wystup, U. (2006). FX Options and Structured Products, Wiley Finance Series, Wiley. http://fxoptions.mathfinance.com/.

Related Articles

Black–Scholes Formula; Foreign Exchange Options: Delta- and At-the-money Conventions; Foreign Exchange Markets; Put–Call Parity.

UWE WYSTUP
Quanto Options

A quanto option can be any cash-settled option whose payoff is converted into a third currency at maturity at a prespecified rate, called the quanto factor. There can be quanto plain vanilla, quanto barriers, quanto forward starts, quanto corridors, and so on. The arbitrage pricing theory and the fundamental theorem of asset pricing, also covered for example in [3] and [2], allow the computation of option values. Other references include Options: Basic Definitions; Option Pricing: General Principles; Foreign Exchange Markets.

Foreign Exchange Quanto Drift Adjustment

We take the example of a gold contract with underlying XAU/USD in XAU–USD quotation that is quantoed into EUR. Since the payoff is in EUR, we let EUR be the numeraire or domestic or base currency and consider a Black–Scholes model

$$\text{XAU–EUR: } dS_t^{(3)} = (r_{EUR} - r_{XAU}) S_t^{(3)}\,dt + \sigma_3 S_t^{(3)}\,dW_t^{(3)} \tag{1}$$

$$\text{USD–EUR: } dS_t^{(2)} = (r_{EUR} - r_{USD}) S_t^{(2)}\,dt + \sigma_2 S_t^{(2)}\,dW_t^{(2)} \tag{2}$$

$$dW_t^{(3)}\,dW_t^{(2)} = -\rho_{23}\,dt \tag{3}$$

where we use a minus sign in front of the correlation, because both $S^{(3)}$ and $S^{(2)}$ have the same base currency (DOM), which is EUR in this case. The scenario is displayed in Figure 1. The actual underlying is then

$$\text{XAU–USD: } S_t^{(1)} = \frac{S_t^{(3)}}{S_t^{(2)}} \tag{4}$$

Using Itô's formula, we first obtain

$$d\frac{1}{S_t^{(2)}} = -\frac{1}{(S_t^{(2)})^2}\,dS_t^{(2)} + \frac{1}{2} \cdot 2 \cdot \frac{1}{(S_t^{(2)})^3}\,(dS_t^{(2)})^2 = (r_{USD} - r_{EUR} + \sigma_2^2)\frac{1}{S_t^{(2)}}\,dt - \sigma_2 \frac{1}{S_t^{(2)}}\,dW_t^{(2)} \tag{5}$$

and hence

$$\begin{aligned}
dS_t^{(1)} &= \frac{1}{S_t^{(2)}}\,dS_t^{(3)} + S_t^{(3)}\,d\frac{1}{S_t^{(2)}} + dS_t^{(3)}\,d\frac{1}{S_t^{(2)}} \\
&= \frac{S_t^{(3)}}{S_t^{(2)}}(r_{EUR} - r_{XAU})\,dt + \frac{S_t^{(3)}}{S_t^{(2)}}\sigma_3\,dW_t^{(3)} + \frac{S_t^{(3)}}{S_t^{(2)}}(r_{USD} - r_{EUR} + \sigma_2^2)\,dt - \frac{S_t^{(3)}}{S_t^{(2)}}\sigma_2\,dW_t^{(2)} + \frac{S_t^{(3)}}{S_t^{(2)}}\rho_{23}\sigma_2\sigma_3\,dt \\
&= (r_{USD} - r_{XAU} + \sigma_2^2 + \rho_{23}\sigma_2\sigma_3) S_t^{(1)}\,dt + S_t^{(1)}(\sigma_3\,dW_t^{(3)} - \sigma_2\,dW_t^{(2)})
\end{aligned} \tag{6}$$

Since $S_t^{(1)}$ is a geometric Brownian motion with volatility $\sigma_1$, we introduce a new Brownian motion $W_t^{(1)}$ and find

$$dS_t^{(1)} = (r_{USD} - r_{XAU} + \sigma_2^2 + \rho_{23}\sigma_2\sigma_3) S_t^{(1)}\,dt + \sigma_1 S_t^{(1)}\,dW_t^{(1)} \tag{7}$$

Now Figure 1 and the law of cosine imply

$$\sigma_3^2 = \sigma_1^2 + \sigma_2^2 - 2\rho_{12}\sigma_1\sigma_2 \tag{8}$$

$$\sigma_1^2 = \sigma_2^2 + \sigma_3^2 + 2\rho_{23}\sigma_2\sigma_3 \tag{9}$$

which yields

$$\sigma_2^2 + \rho_{23}\sigma_2\sigma_3 = \rho_{12}\sigma_1\sigma_2 \tag{10}$$

As explained in the currency triangle in Figure 1, $\rho_{12}$ is the correlation between XAU–USD and USD–EUR, whence $\rho = -\rho_{12}$ is the correlation between XAU–USD and EUR–USD. Inserting this into equation (7), we obtain the usual formula for the drift adjustment

$$dS_t^{(1)} = (r_{USD} - r_{XAU} - \rho\sigma_1\sigma_2) S_t^{(1)}\,dt + \sigma_1 S_t^{(1)}\,dW_t^{(1)} \tag{11}$$

This is the risk neutral pricing process that can be used for the valuation of any derivative depending on $S_t^{(1)}$, which is quantoed into EUR.
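The step from equation (7) to equation (11) rests on the triangle relations (8) to (10). As a hedged numerical check, the Python sketch below builds the triangle from $\sigma_1$, $\sigma_2$, and $\rho$ and confirms that the two drift expressions agree. The function names are illustrative assumptions; the sample volatilities, correlation, and rates are those of data set 1 in Table 2 of this article.

```python
from math import sqrt


def quanto_triangle(sigma1, sigma2, rho):
    """Complete the volatility triangle.

    sigma1: XAU-USD volatility, sigma2: USD-EUR volatility,
    rho: correlation between XAU-USD and EUR-USD.
    Returns (sigma3, rho12, rho23), where sigma3 is the XAU-EUR volatility.
    """
    rho12 = -rho                                                        # rho = -rho12
    sigma3 = sqrt(sigma1**2 + sigma2**2 - 2 * rho12 * sigma1 * sigma2)  # (8)
    rho23 = (sigma1**2 - sigma2**2 - sigma3**2) / (2 * sigma2 * sigma3)  # (9) solved for rho23
    return sigma3, rho12, rho23


def drift_before_substitution(r_usd, r_xau, sigma2, sigma3, rho23):
    """Drift of S^(1) as it appears in equation (7)."""
    return r_usd - r_xau + sigma2**2 + rho23 * sigma2 * sigma3


def quanto_drift(r_usd, r_xau, sigma1, sigma2, rho):
    """Drift of S^(1) after the substitution, equation (11)."""
    return r_usd - r_xau - rho * sigma1 * sigma2


if __name__ == "__main__":
    sigma1, sigma2, rho = 0.10, 0.12, 0.25
    sigma3, rho12, rho23 = quanto_triangle(sigma1, sigma2, rho)
    print(f"sigma3 = {sigma3:.6f}, rho23 = {rho23:.6f}")
    print(drift_before_substitution(0.02, 0.005, sigma2, sigma3, rho23))
    print(quanto_drift(0.02, 0.005, sigma1, sigma2, rho))
```

The third volatility produced for these inputs corresponds to the "Vol FOR–QUANTO" row of Table 2, which gives a second, independent consistency check on the triangle.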
[Figure: triangle with vertices XAU, EUR, and USD; edges $\sigma_1$, $\sigma_2$, $\sigma_3$; angles $\phi_{12}$, $\phi_{23}$, $\pi - \phi_{12}$, $\pi - \phi_{23}$]

Figure 1 XAU–USD–EUR FX quanto triangle. The arrows point in the direction of the respective base currencies. The length of the edges represents the volatility. The cosine of the angles $\cos\phi_{ij} = \rho_{ij}$ represents the correlation of the currency pairs $S^{(i)}$ and $S^{(j)}$, if the base currency (DOM) of $S^{(i)}$ is the underlying currency (FOR) of $S^{(j)}$. If both $S^{(i)}$ and $S^{(j)}$ have the same base currency (DOM), then the correlation is denoted by $-\rho_{ij} = \cos(\pi - \phi_{ij})$

Extensions to Other Models

The previous derivation can be extended to the case of term-structure of volatility and correlation. However, introduction of volatility smile would distort the relationships. Nevertheless, accounting for smile effects is important in real-market scenarios. See Foreign Exchange Smiles and Foreign Exchange Smile Interpolation for details. To do this, one could, for example, capture the smile for a multicurrency model with a weighted Monte Carlo technique as described in [1]. This would still allow us to use the previous result.

Quanto Vanilla

Common among foreign exchange options is a quanto plain vanilla paying

$$Q[\phi(S_T - K)]^+ \tag{12}$$

where $K$ denotes the strike, $T$ the expiration time, $\phi$ the usual put–call indicator taking the value $+1$ for a call and $-1$ for a put, $S$ the underlying in FOR–DOM quotation, and $Q$ the quanto factor from the domestic currency into the quanto currency. We let

$$\tilde{\mu} = r_d - r_f - \rho\sigma\tilde{\sigma} \tag{13}$$

be the adjusted drift, where $r_d$ and $r_f$ denote the risk-free rates of the domestic and foreign underlying currency pair, respectively, $\sigma = \sigma_1$ the volatility of this currency pair, $\tilde{\sigma} = \sigma_2$ the volatility of the currency pair DOM–QUANTO, and

$$\rho = \frac{\sigma_3^2 - \sigma^2 - \tilde{\sigma}^2}{2\sigma\tilde{\sigma}} \tag{14}$$

the correlation between the currency pairs FOR–DOM and DOM–QUANTO in this quotation. Furthermore, we let $r_Q$ be the risk-free rate of the quanto currency. With the same principles as in pricing formulae for foreign exchange options, we can derive the formula for the value as

$$v = Q e^{-r_Q T} \phi\left[S_0 e^{\tilde{\mu} T} N(\phi d_+) - K N(\phi d_-)\right] \tag{15}$$

$$d_\pm = \frac{\ln\frac{S_0}{K} + \left(\tilde{\mu} \pm \frac{1}{2}\sigma^2\right) T}{\sigma\sqrt{T}} \tag{16}$$

where $N$ denotes the cumulative standard normal distribution function and $n$ its density.

Quanto Forward

Similarly, we can easily determine the value of a quanto forward paying

$$Q[\phi(S_T - K)] \tag{17}$$

where $K$ denotes the strike, $T$ the expiration time, $\phi$ the usual long–short indicator, $S$ the underlying in FOR–DOM quotation, and $Q$ the quanto factor from the domestic currency into the quanto currency. Then the formula for the value can be written as

$$v = Q e^{-r_Q T} \phi\left[S_0 e^{\tilde{\mu} T} - K\right] \tag{18}$$

This follows from the vanilla quanto value formula by taking both the normal probabilities to be 1. These normal probabilities are exercise probabilities under some measure. Since a forward contract is always exercised, both these probabilities must be equal to 1.

Quanto Digital

A European-style quanto digital pays

$$Q\,\mathbf{1}_{\{\phi S_T \ge \phi K\}} \tag{19}$$
where $K$ denotes the strike, $S_T$ is the spot of the currency pair FOR–DOM at maturity $T$, $\phi$ takes the values $+1$ for a digital call and $-1$ for a digital put, and $Q$ is the prespecified conversion rate from the domestic to the quanto currency. The valuation of the European-style quanto digitals follows the same principle as in the quanto vanilla option case. The value is

$$v = Q e^{-r_Q T} N(\phi d_-) \tag{20}$$

We provide an example of a European-style digital put in USD/JPY quantoed into EUR in Table 1.

Table 1 Example of a quanto digital put. The buyer receives 100 000 EUR if at maturity, the European Central Bank fixing for USD–JPY (computed via EUR–JPY and EUR–USD) is below 108.65. Terms were created on January 12, 2004 with the following market data: USD–JPY spot reference 106.60, USD–JPY at-the-money volatility 8.55%, EUR–JPY at-the-money volatility 6.69%, EUR–USD at-the-money volatility 10.99% (corresponding to a correlation of −27.89% for USD–JPY against JPY–EUR), USD rate 2.5%, JPY rate 0.1%, and EUR rate 4%

Notional | 100 000 EUR
Maturity | 3 months (92 days)
European-style barrier | 108.65 USD–JPY
Theoretical value | 71 555 EUR
Fixing source | European Central Bank

Hedging of Quanto Options

Hedging of quanto options can be done by running a multicurrency options book. All the usual Greeks can be hedged. Delta hedging is done by trading in the underlying spot market. An exception is the correlation risk, which can only be hedged with other derivatives depending on the same correlation. This is often difficult to do in practice. In FX, the correlation risk can be translated into vega positions as shown in [4, 5] or in Foreign Exchange Basket Options. We now illustrate this approach for quanto plain vanilla options.

Vega Positions of Quanto Plain Vanilla Options

Starting from equation (15), we obtain the sensitivities

$$\begin{aligned}
\frac{\partial v}{\partial \sigma} &= Q S_0 e^{(\tilde{\mu} - r_Q)T}\left[n(d_+)\sqrt{T} - \phi N(\phi d_+)\rho\tilde{\sigma} T\right], \\
\frac{\partial v}{\partial \tilde{\sigma}} &= -Q S_0 e^{(\tilde{\mu} - r_Q)T} \phi N(\phi d_+)\rho\sigma T, \\
\frac{\partial v}{\partial \rho} &= -Q S_0 e^{(\tilde{\mu} - r_Q)T} \phi N(\phi d_+)\sigma\tilde{\sigma} T, \\
\frac{\partial v}{\partial \sigma_3} &= \frac{\partial v}{\partial \rho}\frac{\partial \rho}{\partial \sigma_3} = \frac{\partial v}{\partial \rho}\frac{\sigma_3}{\sigma\tilde{\sigma}} \\
&= -Q S_0 e^{(\tilde{\mu} - r_Q)T} \phi N(\phi d_+)\sigma\tilde{\sigma} T \frac{\sigma_3}{\sigma\tilde{\sigma}} \\
&= -Q S_0 e^{(\tilde{\mu} - r_Q)T} \phi N(\phi d_+)\sigma_3 T \\
&= -Q S_0 e^{(\tilde{\mu} - r_Q)T} \phi N(\phi d_+)\sqrt{\sigma^2 + \tilde{\sigma}^2 + 2\rho\sigma\tilde{\sigma}}\, T
\end{aligned} \tag{21}$$

Note that the computation is standard calculus and repeatedly uses the identity

$$S_0 e^{\tilde{\mu} T} n(\phi d_+) = K n(\phi d_-) \tag{22}$$

The understanding of these greeks is that $\sigma$ and $\tilde{\sigma}$ are both risky parameters, independent of each other. The third independent risk is either $\sigma_3$ or $\rho$, depending on what is more likely to be known.

This shows exactly how the three vega positions can be hedged with plain vanilla options in all the three legs, provided there is a liquid vanilla options market in all the three legs. In the example with XAU–USD–EUR, the currency pairs XAU–USD and EUR–USD are traded; however, there is no liquid vanilla market in XAU–EUR. Therefore, the correlation risk remains unhedgeable. Similar statements would apply for quantoed stocks or stock indices. However, in FX, there are situations with all the legs being hedgeable, for instance, EUR–USD–JPY.

The signs of the vega positions are not uniquely determined in all the legs. The FOR–DOM vega is smaller than the corresponding vanilla vega in the case of a call and positive correlation or put and negative correlation, and larger in case of a put and positive correlation or call and negative correlation. By equation (21) and Table 2, the DOM–QUANTO vega takes the opposite sign of the correlation in case of a call and the sign of the correlation in case of a put. The FOR–QUANTO vega takes the opposite sign of the put–call indicator $\phi$.

We provide an example of pricing and vega hedging scenario in Table 2, where we notice that the dominating vega risk comes from the FOR–DOM pair, whence most of the risk can be hedged.
Table 2 Example of a quanto plain vanilla

Item | Reference | Data set 1 | Data set 2 | Data set 3
FX pair | FOR–DOM | XAU–USD | XAU–USD | XAU–USD
Spot | FOR–DOM | 800.00 | 800.00 | 800.00
Strike | FOR–DOM | 810.00 | 810.00 | 810.00
Quanto | DOM–QUANTO | 1.0000 | 1.0000 | 1.0000
Volatility | FOR–DOM | 10.00% | 10.00% | 10.00%
Quanto volatility | DOM–QUANTO | 12.00% | 12.00% | 12.00%
Correlation | FOR–DOM–DOM–QUANTO | 25.00% | 25.00% | −75.00%
Domestic interest rate | DOM | 2.0000% | 2.0000% | 2.0000%
Foreign interest rate | FOR | 0.5000% | 0.5000% | 0.5000%
Quanto currency rate | Q | 4.0000% | 4.0000% | 4.0000%
Time in years | T | 1 | 1 | 1
1 = call, −1 = put | FOR | 1 | −1 | 1
Quanto vanilla option | Value | 30.81329 | 31.28625 | 35.90062
Quanto vanilla option | Vega FOR–DOM | 298.14188 | 321.49308 | 350.14600
Quanto vanilla option | Vega DOM–QUANTO | −10.07056 | 9.38877 | 33.38797
Quanto vanilla option | Vega FOR–QUANTO | −70.23447 | 65.47953 | −35.61383
Quanto vanilla option | Correlation risk | −4.83387 | 4.50661 | −5.34207
Quanto vanilla option | Vol FOR–QUANTO | 17.4356% | 17.4356% | 8.0000%
Vanilla option | Value | 32.6657 | 30.7635 | 32.6657
Vanilla option | Vega | 316.6994 | 316.6994 | 316.6994
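The quanto vanilla value (15) and the sensitivities (21) translate directly into code. The sketch below is a minimal Python illustration under stated assumptions: continuously compounded rates and $T = 1$ as an exact year fraction, with illustrative function and key names. Because Table 2 does not state its rate and day-count conventions, the output should be close to the table entries without necessarily matching every digit.

```python
from math import exp, log, sqrt
from statistics import NormalDist

_N = NormalDist()


def quanto_vanilla(S0, K, T, sigma, sigma_q, rho, rd, rf, rq, phi, Q=1.0):
    """Value and vega positions of a quanto plain vanilla.

    sigma: FOR-DOM volatility, sigma_q: DOM-QUANTO volatility,
    rho: correlation between FOR-DOM and DOM-QUANTO,
    rq: rate of the quanto currency, phi: +1 call, -1 put.
    """
    mu = rd - rf - rho * sigma * sigma_q                               # (13)
    d_plus = (log(S0 / K) + (mu + 0.5 * sigma**2) * T) / (sigma * sqrt(T))  # (16)
    d_minus = d_plus - sigma * sqrt(T)
    value = Q * exp(-rq * T) * phi * (
        S0 * exp(mu * T) * _N.cdf(phi * d_plus) - K * _N.cdf(phi * d_minus)
    )                                                                  # (15)
    scale = Q * S0 * exp((mu - rq) * T)
    prob = _N.cdf(phi * d_plus)
    sigma3 = sqrt(sigma**2 + sigma_q**2 + 2 * rho * sigma * sigma_q)   # FOR-QUANTO vol
    return {                                                           # (21)
        "value": value,
        "vega_for_dom": scale * (_N.pdf(d_plus) * sqrt(T) - phi * prob * rho * sigma_q * T),
        "vega_dom_quanto": -scale * phi * prob * rho * sigma * T,
        "vega_for_quanto": -scale * phi * prob * sigma3 * T,
        "correlation_risk": -scale * phi * prob * sigma * sigma_q * T,
        "vol_for_quanto": sigma3,
    }


if __name__ == "__main__":
    # Data set 1 of Table 2 (XAU-USD call quantoed at factor 1.0000).
    result = quanto_vanilla(800.0, 810.0, 1.0, 0.10, 0.12, 0.25,
                            0.02, 0.005, 0.04, +1)
    for name, number in result.items():
        print(f"{name:18s} {number: .5f}")
```

A central finite difference of `value` with respect to `sigma`, `sigma_q`, or `rho` is a simple way to confirm the closed-form sensitivities, since each of the three parameters is bumped with the other two held fixed.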

Applications

The standard applications are performance-linked deposits or performance notes as in [6]. Any time the performance of an underlying asset needs to be converted into the notional currency invested, and the exchange rate risk is with the seller, we need a quanto product. Naturally, an underlying like gold, which is quoted in USD, would be a default candidate for a quanto product, when the investment is in a currency other than USD.

Performance-linked Deposits

A performance-linked deposit is a deposit with a participation in an underlying market. The standard is that a GBP investor waives her coupon that the money market would pay and instead buys a EUR–GBP call with the same maturity date as the coupon, strike $K$ and notional $N$ in EUR. These parameters have to be chosen in such a way that the offer price of the EUR call equals the money market interest rate plus the sales margin. The strike is often chosen to be the current spot. The notional is often a percentage $p$ of the deposit amount $A$, such as 50 or 25%. The annual coupon paid to the investor is then a predefined minimum coupon plus the participation

$$p \cdot \frac{\max[S_T - S_0, 0]}{S_0} \tag{23}$$

which is the return of the exchange rate viewed as an asset, where the investor is protected against negative returns. So, obviously, the investor buys a EUR call GBP put with strike $K = S_0$ and notional $N = pA$ GBP or $N = pA/S_0$ EUR. Thus, if the EUR goes up by 10% against the GBP, the investor gets a coupon of $p \cdot 10\%$ per annum in addition to the minimum coupon.

Example 1 We consider the example shown in Table 3. In this case, if the EUR–GBP spot fixing is 0.7200, the additional coupon would be 0.8571% per annum. The breakeven point is at 0.7467, so this product is advisable for a very strong EUR bullish view. For a weakly bullish view, an alternative would
be to buy an up-and-out call with barrier at 0.7400 and 75% participation, where we would find the best case to be 0.7399 with an additional coupon of 4.275% per annum, which would lead to a total coupon of 6.275% per annum.

Table 3 Example of a performance-linked deposit, where the investor is paid 30% of the EUR–GBP return. Note that in GBP the day count convention in the money market is act(a)/365 rather than act/360

Notional | 5 000 000 GBP
Start date | 3 June 2005
Maturity | 2 September 2005 (91 days)
Number of days (act) | 91
Money market reference rate | 4.00% act/365
EUR–GBP spot reference | 0.7000
Minimum rate | 2.00% act/365
Additional coupon | $30\% \cdot 100 \cdot \frac{\max[S_T - 0.7000, 0]}{0.7000}$ act/365
$S_T$ | EUR–GBP fixing on 31 August 2005 (88 days)
Fixing source | ECB

(a) act = actual number of days

Composition

• From the money market we get 49 863.01 GBP at the maturity date.
• The investor buys a EUR call GBP put with strike 0.7000 and with notional 1.5 million GBP.
• The offer price of the call is 26 220.73 GBP, assuming a volatility of 8.0% and a EUR rate of 2.50%.
• The deferred premium is 24 677.11 GBP.
• The investor receives a minimum payment of 24 931.51 GBP.
• Subtracting the deferred premium and the minimum payment from the money market leaves a sales margin of 254.40 GBP (which is extremely poor).
• Note that the option the investor is buying must be cash-settled.

Variations. There are many variations of the performance-linked notes. Of course, one can think of the European style knock-out calls or window-barrier calls. For a participation in a downward trend, the investor can buy puts. One of the frequent issues in foreign exchange, however, is the deposit currency being different from the domestic currency of the exchange rate, which is quoted in FOR–DOM (foreign–domestic), meaning how many units of domestic currency are required to buy one unit of foreign currency. So, if we have a EUR investor who wishes to participate in a EUR–USD movement, we need to quanto the domestic payoff currency (USD) into the foreign currency (EUR). The payoff of the EUR call USD put

$$[S_T - K]^+ \tag{24}$$

is in domestic currency (USD). Of course, this payoff can be converted into the foreign currency (EUR) at maturity, but the question is, at what rate? If we convert at rate $S_T$, which is what we could do in the spot market at no cost, then the investor buys a vanilla EUR call. But here, the investor receives a coupon given by

$$p \cdot \frac{\max[S_T - S_0, 0]}{S_T} \tag{25}$$

If the investor wishes to have performance of equation (23) rather than equation (25), then the payoff at maturity is converted at a rate of 1.0000 into EUR, and this rate is set at the beginning of the trade. This is the quanto factor, and the vanilla is actually a self-quanto vanilla, that is, a EUR call USD put, cash settled in EUR, where the payoff in USD is converted into EUR at a rate of 1.0000. This self-quanto vanilla can be valued by inverting the exchange rate, that is, looking at USD–EUR. This way the valuation can incorporate the smile of EUR–USD.

Similar considerations need to be taken into account if the currency pair to participate in does not contain the deposit currency at all. A typical situation is a EUR investor, who wishes to participate in the gold price, which is measured in USD, so the investor needs to buy a XAU call USD put quantoed into EUR. So the investor is promised a coupon as in equation (23) for a XAU–USD underlying, where the coupon is paid in EUR; this implicitly means that we must use a quanto plain vanilla with a quanto factor of 1.0000.

References

[1] Avellaneda, M., Buff, R., Friedman, C., Grandechamp, N., Kruk, L. & Newman, J. (2001). Weighted Monte Carlo: a new technique for calibrating asset-pricing models, International Journal of Theoretical and Applied Finance 4(1), 91–119.
[2] Hakala, J. & Wystup, U. (2002). Foreign Exchange Risk, Risk Publications, London.
[3] Shreve, S.E. (2004). Stochastic Calculus for Finance I+II, Springer.
[4] Wystup, U. (2001). How the Greeks would have hedged correlation risk of foreign exchange options, Wilmott Research Report, August 2001.
[5] Wystup, U. (2002). How the Greeks would have hedged correlation risk of foreign exchange options, in Foreign Exchange Risk, Risk Publications, London.
[6] Wystup, U. (2006). FX Options and Structured Products, Wiley Finance Series.

Related Articles

Black–Scholes Formula; Foreign Exchange Markets; Foreign Exchange Options.

UWE WYSTUP
Vanna–Volga Pricing

The vanna–volga method, also called the traders' rule of thumb, is an empirical procedure that can be used to infer an implied-volatility smile from three available quotes for a given maturity. It is based on the construction of locally replicating portfolios whose associated hedging costs are added to corresponding Black–Scholes prices to produce smile-consistent values. Besides being intuitive and easy to implement, this procedure has a clear financial interpretation, which further supports its use in practice. In fact, SuperDerivatives has implemented a type of this method in their pricing platform, as one can read in the patent that SuperDerivatives has filed.

The vanna–volga method is commonly used in foreign exchange options markets, where three main volatility quotes are typically available for a given market maturity: the delta-neutral straddle, referred to as at-the-money (ATM); the risk reversal (RR) for 25 delta call and put; and the (vega-weighted) butterfly (BF) with 25 delta wings. The application of vanna–volga pricing allows us to derive implied volatilities for any option's delta, in particular for those outside the basic range set by the 25 delta put and call quotes. The notion of risk reversals and butterflies is explained in the article on foreign exchange (FX) market terminology (see Foreign Exchange Markets).

In the financial literature, the vanna–volga approach was introduced by Lipton and McGhee in [2], who compare different approaches to the pricing of double-no-touch (DNT) options, and by Wystup in [5], who describes its application to the valuation of one-touch (OT) options. The vanna–volga procedure is reviewed in more detail and some important results concerning the tractability of the method and its robustness are derived by Castagna and Mercurio in [1].

The following is based on the section Traders' Rule of Thumb by Wystup in [6].

The traders' rule of thumb is a method of traders to determine the cost of risk managing the volatility risk of exotic options with vanilla options. This cost is then added to the theoretical value (TV) in the Black–Scholes model and is called the overhedge. We explain the rule and then consider an example of a one-touch option.

Delta and vega are the most relevant sensitivity parameters for FX options maturing within one year. A delta-neutral position can be achieved by trading the spot. Changes in the spot are explicitly allowed in the Black–Scholes model. Therefore, model and practical trading have very good control over spot change risk. The more sensitive part is the vega position. This is not taken care of in the Black–Scholes model. Market participants need to trade other options to obtain a vega-neutral position. However, even a vega-neutral position is subject to changes of spot and volatility. For this reason, the sensitivity parameters vanna (change of vega due to change of spot) and volga (change of vega due to change of volatility) are of special interest. Vanna is also called d vega/d spot, volga is also called d vega/d vol. The plots for vanna and volga for a vanilla option are displayed in Figures 1 and 2. In this section, we outline how the cost of such a vanna and volga exposure can be used to obtain prices for options that are closer to the market than their theoretical Black–Scholes value.

Cost of Vanna and Volga

We fix the rates $r_d$ and $r_f$, the time to maturity $T$, and the spot $x$ and define

$$\text{cost of vanna} = \text{exotic vanna ratio} \times \text{value of RR} \tag{1}$$

$$\text{cost of volga} = \text{exotic volga ratio} \times \text{value of BF} \tag{2}$$

$$\text{exotic vanna ratio} = B_{\sigma x} / RR_{\sigma x} \tag{3}$$

$$\text{exotic volga ratio} = B_{\sigma\sigma} / BF_{\sigma\sigma} \tag{4}$$

$$\text{value of RR} = [RR(\sigma) - RR(\sigma_0)] \tag{5}$$

$$\text{value of BF} = [BF(\sigma) - BF(\sigma_0)] \tag{6}$$

where $\sigma_0$ denotes the ATM (forward) volatility and $\sigma$ denotes the wing volatility at the delta pillar $\Delta$, and $B$ denotes the value function of a given exotic option. The values of risk reversals and butterflies are defined by
[Figure: surface plot of vanilla vanna; axes: spot (0.70 to 1.00) and time to expiration (days)]

Figure 1 Vanna of a vanilla option as a function of spot and time to expiration, showing the skew symmetry about the at-the-money line

[Figure: surface plot of vanilla volga; axes: spot (0.70 to 1.00) and time to expiration (days)]

Figure 2 Volga of a vanilla option as a function of spot and time to expiration, showing the symmetry about the at-the-money line


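$$\mathrm{RR}(\sigma) = \mathrm{call}(x, \Delta, \sigma, r_d, r_f, T) - \mathrm{put}(x, \Delta, \sigma, r_d, r_f, T) \tag{7}$$

$$\mathrm{BF}(\sigma) = \frac{\mathrm{call}(x, \Delta, \sigma, r_d, r_f, T) + \mathrm{put}(x, \Delta, \sigma, r_d, r_f, T)}{2} - \frac{\mathrm{call}(x, \Delta_0, \sigma_0, r_d, r_f, T) + \mathrm{put}(x, \Delta_0, \sigma_0, r_d, r_f, T)}{2} \tag{8}$$

where $\mathrm{vanilla}(x, \Delta, \sigma, r_d, r_f, T)$ means $\mathrm{vanilla}(x, K, \sigma, r_d, r_f, T)$ for a strike $K$ chosen to imply $|\mathrm{vanilla}_x(x, K, \sigma, r_d, r_f, T)| = \Delta$ and $\Delta_0$ is the delta that produces the ATM strike. To summarize, we abbreviate

$$c(\sigma_\Delta^+) = \mathrm{call}(x, \Delta^+, \sigma_\Delta^+, r_d, r_f, T) \tag{9}$$

$$p(\sigma_\Delta^-) = \mathrm{put}(x, \Delta^-, \sigma_\Delta^-, r_d, r_f, T) \tag{10}$$

and obtain

$$\text{cost of vanna} = \frac{B_{\sigma x}}{c_{\sigma x}(\sigma_\Delta^+) - p_{\sigma x}(\sigma_\Delta^-)} \times \left[ c(\sigma_\Delta^+) - c(\sigma_0) - p(\sigma_\Delta^-) + p(\sigma_0) \right] \tag{11}$$

$$\text{cost of volga} = \frac{2 B_{\sigma\sigma}}{c_{\sigma\sigma}(\sigma_\Delta^+) + p_{\sigma\sigma}(\sigma_\Delta^-)} \times \left[ \frac{c(\sigma_\Delta^+) - c(\sigma_0) + p(\sigma_\Delta^-) - p(\sigma_0)}{2} \right] \tag{12}$$

where we note that volga of the butterfly should actually be

$$\frac{1}{2} \left[ c_{\sigma\sigma}(\sigma_\Delta^+) + p_{\sigma\sigma}(\sigma_\Delta^-) - c_{\sigma\sigma}(\sigma_0) - p_{\sigma\sigma}(\sigma_0) \right] \tag{13}$$

but the last two summands are close to zero. The vanna–volga adjusted value of the exotic is then

$$B(\sigma_0) + p \times [\text{cost of vanna} + \text{cost of volga}] \tag{14}$$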
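As a minimal sketch of how equations (11), (12), and (14) combine, the following Python fragment takes the prices and Greeks of the hedge vanillas as given inputs. The names `HedgeLeg`, `cost_of_vanna`, `cost_of_volga`, and `vanna_volga_value` are illustrative assumptions and do not come from the source; how the inputs are computed (strikes, vanilla prices, vanna, and volga) is left to whatever pricing routine is in use.

```python
from dataclasses import dataclass


@dataclass
class HedgeLeg:
    """One hedge vanilla (the call c or the put p); all fields are inputs."""
    value_wing: float  # price at the wing volatility, e.g. c(sigma_Delta^+)
    value_atm: float   # price at the ATM volatility, e.g. c(sigma_0)
    vanna: float       # second derivative w.r.t. volatility and spot, at the wing vol
    volga: float       # second derivative w.r.t. volatility, at the wing vol


def cost_of_vanna(b_vanna: float, call: HedgeLeg, put: HedgeLeg) -> float:
    """Equation (11): exotic vanna ratio times the value of the risk reversal."""
    vanna_ratio = b_vanna / (call.vanna - put.vanna)
    value_rr = (call.value_wing - call.value_atm) - (put.value_wing - put.value_atm)
    return vanna_ratio * value_rr


def cost_of_volga(b_volga: float, call: HedgeLeg, put: HedgeLeg) -> float:
    """Equation (12): exotic volga ratio times the value of the butterfly."""
    volga_ratio = 2.0 * b_volga / (call.volga + put.volga)
    value_bf = 0.5 * ((call.value_wing - call.value_atm)
                      + (put.value_wing - put.value_atm))
    return volga_ratio * value_bf


def vanna_volga_value(tv: float, b_vanna: float, b_volga: float,
                      call: HedgeLeg, put: HedgeLeg, p: float = 1.0) -> float:
    """Equation (14): theoretical value plus the p-weighted overhedge."""
    overhedge = cost_of_vanna(b_vanna, call, put) + cost_of_volga(b_volga, call, put)
    return tv + p * overhedge
```

A quick sanity check under these assumptions: passing `call.vanna - put.vanna` as the exotic's vanna makes the vanna ratio equal to one, so `cost_of_vanna` returns exactly the value of the risk reversal.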
A division by the spot x converts everything into the usual quotation of the price in per cent of the underlying currency. The cost of vanna and volga is commonly adjusted by a number p ∈ [0, 1], which is often taken to be the risk-neutral no-touch (NT) probability. The reason is that in the case of options that can knock out, the hedge is not needed anymore once the option has knocked out. The exact choice of p depends on the product to be priced; see Table 1. Taking p = 1 as the default value would lead to overestimated overhedges for DNT options as pointed out in [2].

The values of risk reversals and butterflies in equations (11) and (12) can be approximated by a first-order expansion as follows. For a risk reversal, we take the difference of the call with correct implied volatility and the call with ATM volatility minus the difference of the put with correct implied volatility and the put with ATM volatility. It is easy to see that this can be well-approximated by the vega of the ATM vanilla times the risk reversal in terms of volatility. Similarly, the cost of the butterfly can be approximated by the vega of the ATM vanilla times the butterfly in terms of volatility. In formulae, this is

$$\begin{aligned}
c(\sigma_\Delta^+) - c(\sigma_0) - p(\sigma_\Delta^-) + p(\sigma_0)
&\approx c_\sigma(\sigma_0)(\sigma_\Delta^+ - \sigma_0) - p_\sigma(\sigma_0)(\sigma_\Delta^- - \sigma_0) \\
&= \sigma_0 [p_\sigma(\sigma_0) - c_\sigma(\sigma_0)] + c_\sigma(\sigma_0)[\sigma_\Delta^+ - \sigma_\Delta^-] \\
&= c_\sigma(\sigma_0)\,\mathrm{RR}
\end{aligned} \tag{15}$$

and, similarly,

$$\frac{c(\sigma_\Delta^+) - c(\sigma_0) + p(\sigma_\Delta^-) - p(\sigma_0)}{2} \approx c_\sigma(\sigma_0)\,\mathrm{BF} \tag{16}$$

Both steps use that the ATM call and put have the same vega, $c_\sigma(\sigma_0) = p_\sigma(\sigma_0)$; here $\mathrm{RR} = \sigma_\Delta^+ - \sigma_\Delta^-$ and $\mathrm{BF} = (\sigma_\Delta^+ + \sigma_\Delta^-)/2 - \sigma_0$ are the risk reversal and the butterfly in terms of volatility.

With these approximations, we obtain the formulae

$$\text{cost of vanna} \approx \frac{B_{\sigma x}}{c_{\sigma x}(\sigma_\Delta^+) - p_{\sigma x}(\sigma_\Delta^-)}\, c_\sigma(\sigma_0)\,\mathrm{RR} \tag{17}$$

$$\text{cost of volga} \approx \frac{2 B_{\sigma\sigma}}{c_{\sigma\sigma}(\sigma_\Delta^+) + p_{\sigma\sigma}(\sigma_\Delta^-)}\, c_\sigma(\sigma_0)\,\mathrm{BF} \tag{18}$$

Table 1 Adjustment factors for the overhedge for first-generation exotics

Option   p
KO       No-touch probability
RKO      No-touch probability
DKO      No-touch probability
OT       0.9 × no-touch probability − 0.5 × bid–offer-spread × (TV − 33%)/66%
DNT      0.5

KO, knock out; RKO, reverse knockout; DKO, double knockout; OT, one touch; DNT, double no touch

Observations

1. The price supplements are linear in butterflies and risk reversals. In particular, there is no cost of vanna supplement if the risk reversal is zero and no cost of volga supplement if the butterfly is zero.
2. The price supplements are linear in the ATM vanilla vega. This means supplements grow with growing volatility change risk of the hedge instruments.
3. The price supplements are linear in vanna and volga of the given exotic option.
4. We have not observed any relevant difference between the exact method and its first-order approximation. Since the computation time for the approximation is shorter, we recommend using the approximation.
5. It is not clear up front which target delta to use for the butterflies and risk reversals. We take a delta of 25% merely on the basis of its liquidity.
6. The prices for vanilla options are consistent with the input volatilities as shown in Figures 3, 4, and 5.
7. The method assumes a zero volga of risk reversals and a zero vanna of butterflies. This way the two sources of risk can be decomposed and hedged with risk reversals and butterflies. However, the assumption is actually not exact. For this reason, the method should be used with a lot of care. It causes traders and financial engineers to keep adding exceptions to the standard method.

Consistency Check

A minimum requirement for the vanna–volga pricing to be correct is the consistency of the method with vanilla options. We show in Figures 3, 4, and 5 that
the method does, in fact, yield a typical foreign exchange smile shape and produces the correct input volatilities ATM and at the delta pillars. We will now prove the consistency in the following way. Since the input consists only of three volatilities (ATM and two delta pillars), it would be too much to expect that the method produces a correct representation of the entire volatility matrix. We can only check if the values for ATM and target-$\Delta$ puts and calls are reproduced correctly. To verify this, we check if the values for an ATM call, a risk reversal, and a butterfly are priced correctly. Of course, we only expect approximately correct results. Note that the number p is taken to be 1, which agrees with the risk-neutral NT probability for vanilla options.

Figure 3 Consistency check of vanna–volga pricing. Vanilla option smile for a one-month maturity EUR/USD call, spot = 0.9060, $r_d$ = 5.07%, $r_f$ = 4.70%, $\sigma_0$ = 13.35%, $\sigma_\Delta^+$ = 13.475%, $\sigma_\Delta^-$ = 13.825%. The plot compares the vanna–volga-pricing implied volatility with the given volatility (in %) against the one-month call delta (in %).

Figure 4 Consistency check of vanna–volga pricing. Vanilla option smile for a one-year maturity EUR/USD call, spot = 0.9060, $r_d$ = 5.07%, $r_f$ = 4.70%, $\sigma_0$ = 13.20%, $\sigma_\Delta^+$ = 13.425%, $\sigma_\Delta^-$ = 13.575%. The plot compares the vanna–volga-pricing implied volatility with the given volatility (in %) against the one-year call delta (in %).

Figure 5 Consistency check of vanna–volga pricing. Vanilla option smile for a one-year maturity EUR/USD call, spot = 0.9060, $r_d$ = 5.07%, $r_f$ = 4.70%, $\sigma_0$ = 13.20%, $\sigma_\Delta^+$ = 13.425%, $\sigma_\Delta^-$ = 13.00%. The plot compares the vanna–volga-pricing implied volatility with the given volatility (in %) against the one-year call delta (in %).

For an ATM call, vanna and volga are approximately zero, and hence there are no supplements due to vanna or volga cost.

For a target-$\Delta$ risk reversal,

$$c(\sigma_\Delta^+) - p(\sigma_\Delta^-) \tag{19}$$

we obtain

$$\begin{aligned}
\text{cost of vanna} &= \frac{c_{\sigma x}(\sigma_\Delta^+) - p_{\sigma x}(\sigma_\Delta^-)}{c_{\sigma x}(\sigma_\Delta^+) - p_{\sigma x}(\sigma_\Delta^-)} \times \left[ c(\sigma_\Delta^+) - c(\sigma_0) - p(\sigma_\Delta^-) + p(\sigma_0) \right] \\
&= c(\sigma_\Delta^+) - c(\sigma_0) - p(\sigma_\Delta^-) + p(\sigma_0)
\end{aligned} \tag{20}$$
$$\text{cost of volga} = \frac{2 [c_{\sigma\sigma}(\sigma_\Delta^+) - p_{\sigma\sigma}(\sigma_\Delta^-)]}{c_{\sigma\sigma}(\sigma_\Delta^+) + p_{\sigma\sigma}(\sigma_\Delta^-)} \times \left[ \frac{c(\sigma_\Delta^+) - c(\sigma_0) + p(\sigma_\Delta^-) - p(\sigma_0)}{2} \right] \tag{21}$$

and observe that the cost of vanna yields a perfect fit and the cost of volga is small, because in the first fraction we divide the difference of two quantities by the sum of the quantities, which are all of the same order.

For a target-$\Delta$ butterfly

$$\frac{c(\sigma_\Delta^+) + p(\sigma_\Delta^-)}{2} - \frac{c(\sigma_0) + p(\sigma_0)}{2} \tag{22}$$

we analogously obtain a perfect fit for the cost of volga and

$$\text{cost of vanna} = \frac{c_{\sigma x}(\sigma_\Delta^+) - p_{\sigma x}(\sigma_0) - [c_{\sigma x}(\sigma_0) - p_{\sigma x}(\sigma_\Delta^-)]}{c_{\sigma x}(\sigma_\Delta^+) - p_{\sigma x}(\sigma_0) + [c_{\sigma x}(\sigma_0) - p_{\sigma x}(\sigma_\Delta^-)]} \times \left[ c(\sigma_\Delta^+) - c(\sigma_0) - p(\sigma_\Delta^-) + p(\sigma_0) \right] \tag{23}$$

which is again small.

The consistency can actually fail for certain parameter scenarios. This is one of the reasons that the traders’ rule of thumb has been criticized repeatedly by a number of traders and researchers.

We use the following abbreviations for first-generation exotics: KO, knock-out; KI, knock-in; RKO, reverse knock-out; RKI, reverse knock-in; DKO, double knock-out; OT, one-touch; NT, no-touch; DOT, double one-touch; DNT, double no-touch.

Adjustment Factor

The factor p has to be chosen in a suitable fashion. Since there is no mathematical justification or indication, there is a lot of dispute in the market about this choice. Moreover, the choices may also vary over time. An example of one of many possible choices of p is presented in Table 1.

For options with strike K, barrier B and type φ = 1 for a call and φ = −1 for a put, we use the following pricing rules, which are based on no-arbitrage conditions:

• Knock in (KI) is priced via KI = vanilla − KO.
• Reverse knock in (RKI) is priced via RKI = vanilla − RKO.
• Reverse knockout (RKO) is priced via RKO(φ, K, B) = KO(−φ, K, B) − KO(−φ, B, B) + φ(B − K)NT(B).
• Double one touch (DOT) is priced via DNT.
• NT is priced via OT.

Volatility for Risk Reversals, Butterflies and Theoretical Value

To determine the volatility and the vanna and volga for the risk reversal and butterfly, the convention is the same as for the building of the smile curve. Hence the 25% delta risk reversal retrieves the strike for 25% delta call and put with the spot delta and calculates the vanna and volga of these options using the corresponding volatilities from the smile.

The TV of the exotics is calculated using the ATM volatility, retrieving it with the same convention that was used to build the smile.

Pricing Barrier Options

Ideally, one would be in a situation to hedge all barrier contracts with a portfolio of vanilla options or simple barrier building blocks. In the Black–Scholes model, there are exact rules on how to statically hedge many barrier contracts. A state-of-the-art reference is given in [3]. However, in practice, most of these hedges fail, because volatility is not constant.

For regular KO options, one can refine the method to incorporate more information about the global shape of the vega surface through time.

We choose M future points in time as $0 < a_1\% < a_2\% < \cdots < a_M\%$ of the time to expiration. Using the same cost of vanna and volga, we calculate the overhedge for the regular KO with a reduced time to expiration. The factor for the cost is the probability not to touch the barrier within the remaining times to expiration $1 > 1 - a_1\% > 1 - a_2\% > \cdots > 1 - a_M\%$ of the total time to expiration. Some desks believe that for ATM strikes, the long time to maturity should be weighted higher and for low-delta strikes the short time to maturity should be weighted higher. The weighting can be chosen (rather arbitrarily) as

$$w = \tanh[\gamma(|\delta - 50\%| - 25\%)] \tag{24}$$
with a suitable positive γ. For M = 3, the total overhedge is given by

$$\mathrm{OH} = \frac{\mathrm{OH}(1 - a_1\%) \times w + \mathrm{OH}(1 - a_2\%) + \mathrm{OH}(1 - a_3\%) \times (1 - w)}{3} \tag{25}$$

Which values to use for M, γ, and the $a_i$, whether to apply a weighting and what kind, varies for different trading desks.

An additional term can be used for single-barrier options to account for glitches in the stop loss of the barrier. The theoretical value of the barrier option is determined with a barrier that is moved by four basis points and 50% of that adjustment is added to the price if it is positive. If it is negative, it is omitted altogether. The theoretical foundation for such a method is explained in [4].

Pricing Double-barrier Options

Double-barrier options behave similarly to vanilla options for a spot far away from the barrier and more like OT options for a spot close to the barrier. Therefore, it appears reasonable to use the traders’ rule of thumb for the corresponding regular KO to determine the overhedge for a spot closer to the strike and for the corresponding OT option for a spot closer to the barrier. This adjustment is the intrinsic value of the RKO times the overhedge of the corresponding OT option. The border is the arithmetic mean between strike and the in-the-money barrier.

Pricing Double-no-touch Options

For DNT options with lower barrier L and higher barrier H at spot S, one can use the overhedge

$$\mathrm{OH} = \max\{\text{vanna–volga-OH};\ \delta(S - L) - TV - 0.5\%;\ \delta(H - S) - TV - 0.5\%\} \tag{26}$$

where δ denotes the delta of the DNT option.

Pricing European-style Options

Digital Options

Digital options are priced using the overhedge of the call/put spread with the corresponding volatilities.

European Barrier Options

European barrier options (EKO) are priced using the prices of European and digital options and the relationship

$$\mathrm{EKO}(\phi, K, B) = \mathrm{vanilla}(\phi, K) - \mathrm{vanilla}(\phi, B) - \mathrm{digital}(B)\,\phi(B - K) \tag{27}$$

No-touch Probability

The NT probability is obviously equal to the nondiscounted value of the corresponding NT option paying at maturity (under the risk-neutral measure). Note that the price of the OT option is calculated using an iteration for the touch probability. This means that the price of the OT option used to compute the NT probability is itself based on the traders’ rule of thumb. This is an iterative process that requires a stopping criterion. One can use a standard approach that ends either after 100 iterations or as soon as the difference of two successive iteration results is less than $10^{-6}$. However, the method is so crude that it actually does not make much sense to use such precision at just this point. Therefore, to speed up the computation, we suggest that this procedure is omitted and no iterations are taken, which means using the nondiscounted TV of the no-touch option as a proxy for the NT probability.

The Cost of Trading and Its Implication on the Market Price of One-touch Options

Now let us take a look at an example of the traders’ rule of thumb in its simple version. We consider OT options, which hardly ever trade at TV. The tradable price is the sum of the TV and the overhedge. Typical examples are shown in Figure 6, one for an upper touch level in EUR/USD, and one for a lower touch level.
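The weighting rule of equations (24) and (25) and the double-no-touch rule of equation (26) are simple enough to transcribe directly. The following Python sketch does so under stated assumptions: the function names are hypothetical, deltas and percentages are passed as fractions (25% as 0.25), and all terms of equation (26) are assumed to be expressed in the same quotation units. The per-horizon overhedges themselves are inputs, not computed here.

```python
from math import tanh


def delta_weight(delta: float, gamma: float) -> float:
    """Equation (24): w = tanh(gamma * (|delta - 50%| - 25%)), delta as a fraction."""
    return tanh(gamma * (abs(delta - 0.5) - 0.25))


def weighted_overhedge(oh_by_horizon, w: float) -> float:
    """Equation (25) for M = 3.

    oh_by_horizon holds the three overhedges computed with remaining times
    (1 - a1%), (1 - a2%), (1 - a3%) of the total time to expiration.
    """
    oh1, oh2, oh3 = oh_by_horizon
    return (oh1 * w + oh2 + oh3 * (1.0 - w)) / 3.0


def dnt_overhedge(vv_overhedge: float, delta: float, spot: float,
                  lower: float, upper: float, tv: float,
                  margin: float = 0.005) -> float:
    """Equation (26): the largest of the vanna-volga overhedge and two delta-based terms."""
    return max(
        vv_overhedge,
        delta * (spot - lower) - tv - margin,
        delta * (upper - spot) - tv - margin,
    )
```

For a 25-delta strike the weight of equation (24) is tanh(0) = 0, so equation (25) reduces to the sum of the second and third overhedges divided by three.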
Figure 6 Overhedge of a one-touch option in EUR/USD for (a) an upper touch level and (b) a lower touch level, based on the traders’ rule of thumb. Both panels plot the overhedge (%) against the theoretical value (%).

Clearly, there is no overhedge for OT options with a TV of 0% or 100%, but it is worth noting that low-TV OT options can be twice as expensive as their TV, sometimes even more. The overhedge arises from the cost of risk managing the OT option. In the Black–Scholes model, the only source of risk is the underlying exchange rate, whereas the volatility and interest rates are assumed constant. However, volatility and rates are themselves changing, whence the trader of options is exposed to unstable vega and rho (change of the value with respect to volatility and rates). For short-dated options, the interest rate risk is negligible compared to the volatility risk as shown in Figure 7. Hence the overhedge of an OT option is a reflection of a trader’s cost occurring because of the risk management of his vega exposure.

Example

We consider a one-year OT option in USD/JPY with payoff in USD. As market parameters, we assume a spot of 117.00 JPY per USD, JPY interest rate 0.10%, USD interest rate 2.10%, volatility 8.80%, 25-delta risk reversal −0.45% (end note a), and 25-delta butterfly 0.37% (end note b). The touch level is 127.00, and the TV is at 28.8%.

If we now only hedge the vega exposure, then we need to consider two main risk factors, namely,

1. the change of vega as the spot changes, often called vanna;
2. the change of vega as the volatility changes, often called volga or volgamma or vomma.

To hedge this exposure, we treat the two effects separately. The vanna of the OT option is 0.16%,
and the vanna of the risk reversal is 0.04%. So we need to buy 4 (= 0.16/0.04) risk reversals, and, for each of them, we need to pay 0.14% of the USD amount, which causes an overhedge of −0.6%. The volga of the OT is −0.53%, and the volga of the butterfly is 0.03%. So we need to sell 18 (= −0.53/0.03) butterflies, each of which pays us 0.23% of the USD amount, which causes an overhedge of −4.1%. Therefore, the overhedge is −4.7%. However, we will get to the touch level with a risk-neutral probability of 28.8%, in which case we would have to pay to unwind the hedge. Therefore, the total overhedge is −71.2% × 4.7% = −3.4%. This leads to a midmarket price of 25.4%. Bid and offer could be 24.25%–26.75%. There are different beliefs among market participants about the unwinding cost. Other observed prices for OT options can be due to different existing vega profiles of the trader’s portfolio, a marketing campaign, a hidden additional sales margin, or even the overall view of the trader in charge.

Figure 7 Comparison of interest rate and volatility risk for a vanilla option (option sensitivity, vega and rho, against the maturity of a vanilla option in years). The volatility risk behaves like a square-root function, whereas the interest rate risk is close to linear. Therefore, short-dated FX options have higher volatility risk than interest rate risk

Further Applications

The method illustrated above shows how important the current smile of the vanilla options market is for the pricing of simple exotics. Similar types of approaches are commonly used to price other exotic options. For long-dated options, the interest rate risk will take over the lead in comparison to short-dated options where the volatility risk is dominant.

End Notes

a. This means that a 25-delta USD call is 0.45% cheaper than a 25-delta USD put in terms of implied volatility.
b. This means that a 25-delta USD call and 25-delta USD put is, on average, 0.37% more expensive than an ATM option in terms of volatility.

References

[1] Castagna, A. & Mercurio, F. (2007). The Vanna-Volga method for implied volatilities, Risk, Jan., 106–111.
[2] Lipton, A. & McGhee, W. (2002). Universal Barriers, Risk, 15(5), 81–85.
[3] Poulsen, R. (2006). Barrier options and their static hedges: simple derivations and extensions, Quantitative Finance, 6(4), 327–335.
[4] Schmock, U., Shreve, S.E. & Wystup, U. (2002). Dealing with dangerous digitals, in Foreign Exchange Risk, Risk Publications, London, http://www.mathfinance.com/FXRiskBook/
[5] Wystup, U. (2003). The market price of one-touch options in foreign exchange markets, Derivatives Week, London, XII(13), 8–9.
[6] Wystup, U. (2006). FX Options and Structured Products, Wiley Finance Series.

Related Articles

Barrier Options; Foreign Exchange Markets.

UWE WYSTUP
Foreign Exchange Smiles

Smile Regularities for Foreign Exchange Options

One trend in the empirical investigation of implied volatilities has been to concentrate on understanding the behavior of implied volatilities across strike prices and time to expiration [see 10]. This line of research assumes implicitly that these divergences provide information about the dynamics of the options markets. Another approach [3, 5, 6, 14] suggests that the divergences of implied volatilities across strike prices may provide information about the expected dispersion process for underlying asset prices. These papers assume that asset return volatility is a (locally) deterministic function of the asset price and time and that this information can be used to enhance the traditional Black–Scholes–Merton (BSM) option-pricing approach (see also Dupire Equation; Local Volatility Model). All these papers examine implied volatility patterns at a single point in time and assume that option prices provide an indication of the deterministic volatility function. However, Dumas et al. [4] (1998) tested for the existence of a deterministic implied volatility function and rejected the hypothesis that the inclusion of such a model in option pricing was an improvement in terms of predictive or hedging performance compared with BSM. Their research examined whether at a single point in time, implied volatility surfaces provide predictions of implied volatilities at some future date (one week hence).

Tompkins [15] looked at this problem in a slightly different way. The approach of Dumas et al. [4] assumes that the deterministic volatility function provides both a prediction of the future levels of implied volatility and the relative shapes of implied volatilities across strike prices and time. If the future levels of implied volatilities cannot be predicted, this does not mean that the relative shapes of implied volatilities cannot be predicted. Tompkins [15] examined the relative implied volatility bias rather than the absolute implied volatility bias. When the volatilities of each strike price were standardized by dividing by the level of the at-the-money (ATM) volatility, regularities in the volatility function were found. He further found that these standardized smile patterns were dependent upon the term to expiration of the option. For a large sample of option expiration cycles, the smile patterns were almost identical for all options with the same time to expiration.

For currency options, Tompkins [15] examined options on futures for US dollar/Deutsche mark, US dollar/British pound, US dollar/Japanese yen, and US dollar/Swiss franc for a time period from 1985 to 2000. To determine relative shapes, the implied volatilities for each currency pair were standardized by

$$VSI = \left( \frac{\sigma_x}{\sigma_{ATM}} \right) \cdot 100 \tag{1}$$

where VSI is the volatility smile index, $\sigma_x$ is the volatility of an option with strike price x, and $\sigma_{ATM}$ is the volatility of the ATM option. The ATM volatility was determined using a simple linear interpolation for the two implied volatilities of the strike prices that bracketed the underlying asset price. This relative volatility measure will facilitate comparisons of biases (in percentage terms) within and between markets.

The strike prices were standardized to allow intra- and intermarket comparisons to be drawn. The standardized strike prices can be expressed as

$$\frac{\ln(X_\tau / F_\tau)}{\sigma \sqrt{\tau / 365}} \tag{2}$$

where X is the strike price of the option, F is the underlying futures price and the square root of time factor reflects the percentage in a year of the remaining time until the expiration of the option. The sigma (σ) is the level of the ATM volatility.

As the analysis was restricted to the actively traded quarterly expiration schedule of March, June, September, and December maturities, implied volatility surfaces with a maximum term to expiration of approximately 90 days were obtained. Data were further pruned by restricting the analysis to 18 time points from (the date nearest to) 90 calendar days to expiration to (the date nearest) 5 calendar days to expiration in 5-day increments. Finally, the analysis of the implied volatilities was limited to those strike prices in the range ±3.5 standard deviations away from the underlying futures price. Figure 1 displays the aggregated patterns for the 15-year period.

A logical starting point for an appropriate functional form to fit an implied volatility surface is the approach suggested by Dumas et al. [4] (1996), who
tested a number of arbitrary models based upon a polynomial expansion across strike price (x) and time (t). Tompkins [15] extended the polynomial expansion to degree three and included additional factors, which might also influence the behaviors of volatility surfaces.

Figure 1 Actual implied volatility surfaces of option prices for four foreign exchange futures standardized to the level of the ATM volatility (1985–2000): (a) D-mark futures, (b) yen futures, (c) B-pound futures, (d) S-franc futures. Each panel plots the standardized implied volatility against the strike price (in standard deviation terms) and the time to expiry.

For all four foreign exchange options markets, a parsimonious model explains the vast majority of the variance in the standardized implied volatility surfaces. The analysis allowed strike price effects to be separated into a first-order effect (the skew), a second-order effect (the smile), and higher order effects. For the skew effect, the results suggested that an asymmetrical smile pattern is a function of the level of the foreign exchange rate. The evidence suggests that when futures prices are low (high), the implied volatility pattern becomes more negatively (positively) skewed.

For the second-order “curved” pattern, all four markets display a convex pattern that becomes more extreme as the options expiration date is approached. Furthermore, a significant negative relationship is found between the degree of curvature and the level of the ATM implied volatility. Curved patterns are independent of the level of the exchange rate. Finally, Tompkins [15] reports a significant third-order strike price effect for all four foreign exchange option markets. Tompkins [15] shows that the high degree of explanatory power is invariant to the time period of analysis and that the model provides accurate smile predictions outside of the estimation sample period. Under these assumptions, we conclude that regularities in implied volatility surfaces exist and are similar for the four currency markets. Furthermore, the regularities are time period invariant. These general results provide means to test alternative models, which could potentially explain why implied volatility surfaces exist. This is discussed in the following section.

Empirical Regularities for Currency Option Smiles

From [15], the following general conclusions can be drawn for the behaviors of implied volatility surfaces
for options on foreign exchange:

1. Implied volatility patterns are symmetrical on average for options on currencies.
2. For three of the four markets, the skew effect is related to the level of the underlying futures price. The only exception is for the British pound/US dollar. The level of the futures price impacts the skewness in an inverse manner to the pure skewness effect. This suggests that for low futures prices a negative skew occurs and at higher futures prices the skew flattens and can become positive.
3. The skew effect for currency options is relatively invariant to the time to expiration of the options. It is solely due to extreme levels of the underlying exchange rate or to some market shock.
4. For two of the four markets, the level of the skew effect is inversely related to the level of the ATM implied volatility. For the Deutsche mark and Swiss franc, the higher (lower) the level of the ATM implied volatility, the more negative (positive) the level of the skew.
5. Shocks change the degree and sign of the skew effect. For the Deutsche mark and Swiss franc, the concerted intervention in the currency markets by the Group of Seven (G7) caused a negative skew to occur. The 1987 stock crash had minimal impact on the currency markets, with only a slightly negative skew impact for the Deutsche mark. For the second shock, the only currency option skew affected was the Japanese yen. This occurred in January 1988 and appears to have been associated with international capital flows out of the US dollar into yen.
6. All implied volatility patterns display some degree of curvature and the degree of curvature is inversely related to the option’s term to expiration. The longer the term to expiration, the less extreme the degree of curvature in the smile.
7. Shocks change the degree of curvature of the implied volatility pattern. However, the effect is not systematic and often shocks reduce the degree of curvature. For the G7 intervention in 1985, there was a reduction in the degree of smile curvature for both the Deutsche mark and Swiss franc, while for the Japanese yen, this event caused greater curvature for the smiles. For the second shock, both the British pound and Japanese yen displayed greater smile curvature thereafter.
8. For all four markets, the degree of curvature of the implied volatility pattern is inversely related to the level of the ATM implied volatility. Thus, the higher the level of ATM implied volatility, the less pronounced the degree of curvature in the smile.
9. For three of the four currency markets, the degree of curvature is independent of the level of the underlying futures price. The only exception is for the Japanese yen, where the higher the level of the exchange rate, the lesser the curvature (however, this impact is small).
10. For all four markets, the degree of curvature of the implied volatility pattern is asymmetrical. For the Deutsche mark, Japanese yen, and Swiss franc, the degree of asymmetry is negative. This suggests that the curvature is more extreme for options with strike prices below the current level of the underlying futures. For the British pound, the relationship is positive, indicating that the curvature is more extreme for options with strike prices above the current level of the underlying futures.

Using these 10 stylized facts as clues, we now examine alternative explanations for the existence of implied volatility smiles. It is crucial that any coherent explanation must conform to all of these facts simultaneously. If selected models are internally inconsistent with these facts, it is grounds for rejection.

A nontrivial problem is that the statistical testing of any option-pricing model has to be a joint hypothesis that the option-pricing model is correct and that the markets are efficient. Given that smiles do exist, we can reject the hypothesis that actual option values conform to the Black [2] model. However, we are uncertain as to why this occurs. Consider two possible reasons for the existence of smiles: the underlying asset may follow an alternative price process or the Black [2] model is correct but market imperfections exist. The next sections discuss both possibilities to better understand the regularities in implied volatility surfaces presented in [15].
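Before turning to the candidate models, it may help to see how the standardized surfaces behind these stylized facts are built. The following Python sketch transcribes the volatility smile index of equation (1), the standardized strike of equation (2), and the linear interpolation of the ATM volatility between the two bracketing strikes. The function names and argument conventions (volatilities as fractions, time to expiration in calendar days) are assumptions made for illustration and are not taken from [15].

```python
from math import log, sqrt


def interpolate_atm_vol(futures: float, k_low: float, vol_low: float,
                        k_high: float, vol_high: float) -> float:
    """Linear interpolation of implied volatility between the two strikes
    that bracket the underlying futures price."""
    weight = (futures - k_low) / (k_high - k_low)
    return vol_low + weight * (vol_high - vol_low)


def volatility_smile_index(sigma_x: float, sigma_atm: float) -> float:
    """Equation (1): implied volatility at strike x relative to the ATM
    volatility, scaled so that the ATM point equals 100."""
    return sigma_x / sigma_atm * 100.0


def standardized_strike(strike: float, futures: float,
                        sigma_atm: float, days_to_expiry: float) -> float:
    """Equation (2): log-moneyness in units of ATM standard deviations
    over the remaining life of the option."""
    return log(strike / futures) / (sigma_atm * sqrt(days_to_expiry / 365.0))
```

With these conventions, a strike equal to the futures price maps to a standardized strike of zero, and an option quoted at 12% volatility against a 10% ATM volatility has a VSI of 120.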
4 Foreign Exchange Smiles

Models with Alternative Price and Volatility Processes

Consider first that some alternative price (and volatility) process is at work instead of geometric Brownian motion with constant variance. Following the general approach of Jarrow and Rudd [11], we consider alternative true terminal distributions for the underlying asset. Consider the following models that include stochastic volatility ($\hat{\sigma}$) and alternative price processes. For the sake of convenience, the volatility processes will be evaluated in terms of a stochastic variance process ($\sqrt{V}$). Given that our previous results examined options on futures, the notation indicates that the underlying asset is a futures price ($F$). The first model, which will be considered, is a stochastic volatility model: the square root process model proposed by Heston [7] (see also Stochastic Volatility Models: Foreign Exchange; Heston Model). This choice is due to the ability of this model to allow correlated underlying and volatility processes. This will be defined as

Model 1

$$dF(t) = \mu F(t)\,dt + \hat{\sigma} F(t)\,dZ_1(t) \qquad (3)$$

with the variance process defined by

$$dV(t) = \kappa(\theta - V(t))\,dt + \xi \sqrt{V(t)}\,dZ_2(t) \qquad (4)$$

where $Z_1$ and $Z_2$ are standard Wiener processes with correlation $\rho$. The term $\kappa$ indicates the rate of mean reversion of the variance, $\theta$ is the long-term variance, and $\xi$ indicates the volatility of the variance. The terms $V$ and $\sqrt{V}$ represent the variance and the volatility of the process, respectively.

The second model that we consider is the jump-diffusion model proposed by Merton [13] (see also Jump-diffusion Models). Using his notation, this can be expressed as

Model 2

$$dF(t) = F(\alpha - \lambda\kappa)\,dt + F\sigma(t)\,dZ(t) + dq(t) \qquad (5)$$

Here, $\alpha$ is the instantaneous expected return on the futures contract, $\sigma(t)$ is the instantaneous volatility of the futures contract, conditional on no arrivals of important new information (no jumps), $dZ$ is a standard Wiener process, and $q(t)$ is the independent Poisson process, which captures the jumps. The term $\lambda$ is the mean number of arrivals per unit time and $\kappa$ represents the jump size (which can also be a random variable).

Bates [1], Ho et al. [8], and Jiang [12] assumed that the volatility process is subordinated in a nonnormal price process; this provides the inspiration for the third model (see [1] for tests of these models). In this spirit, the third proposed model is a variant of the Heston [7] model, proposed by Tompkins [16, 17], which includes jumps (as captured by a normal inverse Gaussian (NIG) process) in the underlying price process.

Model 3

$$dF(t) = \mu F(t-)\,dt + \hat{\sigma}(t) F(t-)\,dN(t) \qquad (6)$$

with the variance process defined by

$$dV(t) = k(\theta - V(t))\,dt + \xi \sqrt{V(t)}\,dZ(t) \qquad (7)$$

where $N(t)$ is a purely discontinuous martingale corresponding to log returns driven by an NIG Lévy process (see Normal Inverse Gaussian Model). This model will be referred to as normal inverse Gaussian stochastic volatility (NIGSV) for the sake of convenience.

Smile Patterns Associated with the Proposed Models

Tompkins [17] discussed how parameters for each of these models could be estimated (under the physical measure) and the change of measure to allow risk neutral pricing. Of more interest to this article is the resulting smile behavior of each model. This can be seen in Figure 2 (restricted solely to the Deutsche mark/US dollar). Figure 2(a) shows the empirical smile patterns for Deutsche mark/US dollar from 1985 to 2000. Figure 2(b) shows the smile surface associated with the Heston [7] model. Figure 2(c) shows the smile surface associated with the jump-diffusion model of Merton [13]. Figure 2(d) represents the combination of stochastic volatility and jump processes (NIGSV model).
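As an illustration of how terminal distributions under Model 1 might be generated, the following is a minimal sketch that simulates equations (3) and (4) with an Euler scheme. Several things here are assumptions and not taken from the article: the discretization (Euler in the log price with "full truncation" of negative variance), the identification $\hat{\sigma} = \sqrt{V}$, the function name `simulate_heston_futures`, and all parameter values, which are purely hypothetical.

```python
import numpy as np


def simulate_heston_futures(f0, v0, mu, kappa, theta, xi, rho,
                            horizon, n_steps, n_paths, seed=0):
    """Euler sketch of Model 1: futures price (3) with square-root variance (4).

    Assumption: negative variance values produced by the discretization are
    floored at zero inside the drift and diffusion terms (full truncation).
    """
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    log_f = np.full(n_paths, np.log(f0))
    v = np.full(n_paths, float(v0))
    for _ in range(n_steps):
        z1 = rng.standard_normal(n_paths)
        # Second normal draw with correlation rho to the first.
        z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_paths)
        v_pos = np.maximum(v, 0.0)
        # Log-price step: volatility is the square root of the variance.
        log_f += (mu - 0.5 * v_pos) * dt + np.sqrt(v_pos * dt) * z1
        # Variance step: mean reversion to theta with vol-of-variance xi.
        v += kappa * (theta - v_pos) * dt + xi * np.sqrt(v_pos * dt) * z2
    return np.exp(log_f), np.maximum(v, 0.0)


# Hypothetical inputs: 10% initial and long-run volatility, three-month horizon.
f_T, v_T = simulate_heston_futures(f0=1.0, v0=0.01, mu=0.0, kappa=2.0,
                                   theta=0.01, xi=0.2, rho=0.0,
                                   horizon=0.25, n_steps=63, n_paths=20000)
```

The simulated terminal prices `f_T` could then be used to price options at several strikes and back out implied volatilities, which is one way smile surfaces such as those in Figure 2(b) might be produced; the article itself does not state which numerical method was used.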

[Figure 2: four surface plots of standardized implied volatility (vertical axis, 80 to 160) against strike price in standard deviation terms (−3.5 to 3.5) and time to expiry. Panels: (a) Empirical implied volatility smiles: Dmark/US Dollar; (b) Simulated implied volatility smiles: Heston (1993); (c) Simulated implied volatility smiles: Merton (1976); (d) Simulated implied volatility smiles: Tompkins (2007).]

Figure 2 Simulated implied volatility smiles for options on Deutsche mark/US dollar

Smile Patterns Associated with Stochastic Volatility

As one can see in Figure 2(b), the Heston [7] model does generate a symmetrically curved smile function consistent with point #1, but the smiles are flat as the option expiration approaches and become more curved, the longer the term to expiration (which is inconsistent with point #6). This is exactly the opposite of what is observed for currency smiles empirically. The Heston [7] model can generate a skewed implied volatility pattern from a nonzero correlation between the volatility and underlying processes (see equations 3 and 4). However, the longer the term to expiration, the more extreme the skew pattern would be. This is inconsistent with point #3, that skewed patterns for currency options are time invariant and are only associated with the levels of the ATM implied volatility or the underlying currency exchange rate. However, this model is consistent with fact #5 that shocks could change the degree of skewness. The model could still be valid under a regime of stochastic correlations. However, it seems inconsistent from an economic standpoint; if shocks change the degree of asymmetry in the expected terminal distribution of the underlying asset, it is not clear why in half of the instances the degree of curvature (fact #8) is reduced. This model is also inconsistent with fact #8, that the higher the level of expected variance (ATM volatility), the flatter the degree of curvature. Given that this model would produce effects that are contradictory to both first and second strike price effects observed empirically, we must reject it. An alternative explanation is that the jump-diffusion model of Merton [13] may be more appropriate.

Smile Patterns Associated with Jump Diffusion

According to Hull [9], this model could produce a curved implied volatility surface and this curve would be consistent with fact #6, that curves exist and become more extreme the shorter the time to expiration of the option. This can be seen in Figure 2,

where the degree of curvature is most extreme closest to expiration. However, as the Poisson process in equation (5) is independent and identically distributed (i.i.d.), this will converge over time to a normal distribution and thus, the implied volatility surface would flatten, which is what occurs in Figure 2. It could also hold under a regime associated with fact #7, that shocks do change the degree of curvature. It could be that the inflow of new information changes the expectations of market agents regarding the degree and magnitude of future jumps. However, the model, as it stands, would not be able to explain the first-order strike price effects. One alternative would be to allow the shocks to be asymmetric. This would allow a skewed implied volatility pattern to exist. However, if the jumps follow some i.i.d. process, the central limit theorem would imply that the degree of skewness would be highest when the options are closest to expiration and would flatten as the term to expiration is lengthened. This is at variance with fact #3 that for currency options the skew effects are time invariant. Therefore, we can also reject a jump-diffusion model as being inconsistent with the empirical record.

Smile Patterns Associated with the NIGSV Model

This model assumes a symmetrical jump-diffusion process with a subordinated stochastic volatility process with nonzero correlations between the two processes. The simulated implied volatility smiles appear in Figure 2(d) and seem to resemble most closely the actual smiles for Deutsche mark/US dollar options in Figure 2(a). As can be seen, there is curvature in the smile patterns for both short term and longer term options. The shorter term curvature is associated with the jump process, while the longer term curvature is associated with stochastic volatility. This is consistent with both fact #1 and fact #6, that the average smile pattern is symmetrical and the degree of curvature is inversely related to time. Dynamics of the skew relationship can be explained with variations of the correlation between the two processes. Finally, the asymmetry of the smile shapes can be explained by the jump process. While this model appears to display many of the dynamics of empirical smiles, the degree of curvature is not as extreme as is observed for the actual smiles. The reason for this is that the parameters for the model were estimated using the underlying Deutsche mark/US dollar currency futures (see [17] for details). While a feasible measure change was used to price options (that omitted arbitrage), it is unlikely that this measure change is unique as nontraded sources of risk have been introduced into the state space. These include jumps and stochastic volatility. Given this, we should expect that option prices will also contain some risk premium above and beyond the values associated with the underlying asset.

Conclusions and Implications

In this article, we have examined currency option smiles. Previous research by Tompkins [15] suggests that when implied volatility patterns are standardized, regularities are observed both across markets and across time. He concludes that this may suggest that market participants have developed some consistent algorithm to vary option prices in a consistent manner away from Black [2] values.

To better understand the nature of this algorithm, 10 stylized results are identified from his results for the four currency option markets. With these 10 results we test whether alternative models, which have been proposed to explain the existence of implied volatility surfaces, can generate the same dynamics as these empirical results. Initially, models were examined that suggest an alternative price process may better define the underlying price and volatility processes. We reject both the Heston [7] and the Merton [13] models as appropriate models, as they cannot produce all the empirical dynamics for actual smiles. The only model that could explain all the dynamics is a model that combines stochastic volatility and nonnormal innovations for currency returns. When appropriate parameters are input into this model and a feasible change of measure is made, option prices can be determined. The smiles associated with this model match the dynamics observed for actual currency option smiles. However, the model smiles do not display the same extreme degree of curvature as the empirical smiles. Following Tompkins [17], this suggests that a substantial risk premium exists for currency options and that the hypothesis that the existence of implied volatility surfaces is due solely to an alternative price process is rejected.

Alternatively, market imperfections may be the reason for the existence of implied volatility surfaces. Given that existing research has previously rejected this, we tend to concur that market imperfections alone are also probably not sufficient to explain the existence of implied volatility smiles. However, it is possible that both alternative price processes and market imperfections jointly contribute to the existence of implied volatility smiles.

References

[1] Bates, D.S. (1996). Jumps and stochastic volatility: exchange rate process implicit in Deutsche Mark options, Review of Financial Studies 9, 69–107.
[2] Black, F. (1976). The pricing of commodity contracts, Journal of Financial Economics 3, 167–179.
[3] Derman, E. & Kani, I. (1994). Riding on the smile, Risk 7, 32–39.
[4] Dumas, B., Fleming, J. & Whaley, R.E. (1998). Implied volatility functions: empirical tests, The Journal of Finance 53, 2059–2106.
[5] Dupire, B. (1992). Arbitrage Pricing with Stochastic Volatility, Working Paper, Société Générale Options Division.
[6] Dupire, B. (1994). Pricing with a smile, Risk 7, 18–20.
[7] Heston, S.L. (1993). A closed-form solution for options with stochastic volatility with applications to bond and currency options, Review of Financial Studies 6, 327–343.
[8] Ho, M.S., Perraudin, W.R.M. & Sørensen, B.E. (1996). A continuous-time arbitrage-pricing model with stochastic volatility and jumps, Journal of Business & Economic Statistics 14, 31–43.
[9] Hull, J. (1997). Options, Futures and other Derivative Securities, 3rd Edition, Prentice Hall, Upper Saddle River.
[10] Jackwerth, J.C. & Rubinstein, M. (1996). Recovering probability distributions from option prices, The Journal of Finance 51, 1611–1631.
[11] Jarrow, R. & Rudd, A. (1982). Approximate option valuation for arbitrary stochastic processes, Journal of Financial Economics 10, 347–369.
[12] Jiang, G. (1999). Stochastic volatility and jump-diffusion: implications on option pricing, International Journal of Theoretical and Applied Finance 2(4), 409–440.
[13] Merton, R. (1976). Option pricing when underlying stock returns are discontinuous, Journal of Financial Economics 3, 125–144.
[14] Rubinstein, M. (1994). Implied binomial trees, The Journal of Finance 49, 771–818.
[15] Tompkins, R.G. (2001). Implied volatility surfaces: uncovering regularities for options of financial futures, The European Journal of Finance 7, 198–230.
[16] Tompkins, R.G. (2003). Options on bond futures: isolating the risk premium, Journal of Futures Markets 23(2), 169–215.
[17] Tompkins, R.G. (2006). Why smiles exist in foreign exchange options: isolating components of the risk neutral process, The European Journal of Finance 12, 583–604.

Further Reading

Balyeat, R.B. (2002). The economic significance of risk premiums in the S&P 500 options market, Journal of Futures Markets 22, 1145–1178.
Garman, M. & Kohlhagen, S. (1983). Foreign currency option values, Journal of International Money and Finance 2, 231–237.
Henker, T. & Kazemi, H.B. (1998). The impact of deviations from random walk, in Security Prices on Option Prices, Working Paper, University of Massachusetts, Amherst.

Related Articles

Foreign Exchange Smile Interpolation; Implied Volatility Surface; Stochastic Volatility Models: Foreign Exchange.

ROBERT G. TOMPKINS
Foreign Exchange Smile Interpolation

This article provides a short introduction to the handling of FX-implied volatility market data, especially their inter- and extrapolation across delta space and time. We discuss a low-dimensional Gaussian kernel approach as the method of choice, showing several advantages over usual smile interpolation methods such as cubic splines.

FX-implied Volatility

Implied volatilities for FX vanilla options are normally quoted against Black–Scholes deltas

$$\Delta_{BS} = e^{-r_f T} N\left(\frac{\ln(S/K) + \left(r_d - r_f + \sigma^2(\Delta)/2\right) T}{\sigma(\Delta)\sqrt{T}}\right) \qquad (1)$$

Note that these deltas are dependent on $\sigma(\Delta)$, that is, the market-given volatility should be quoted. Thus, when retrieving a volatility for a given strike, an iterative process is needed. However, under normal circumstances, the mapping from a delta-volatility to a strike-volatility coordinate system works via a quickly converging fixed point iteration.

Proposition 1 (Delta–Strike fixed point iteration). Let $\Delta_n : A \to A$, $A \subset (0, 1)$ be a mapping, defined by

$$\sigma_0 = \sigma_{ATM}, \qquad \Delta_0 = \Delta(K_{Call}, \sigma_{ATM})$$

$$\Delta_{n+1} = e^{-r_f (T-t)} N(d_1(\Delta_n)) \qquad (2)$$

$$\phantom{\Delta_{n+1}} = e^{-r_f (T-t)} N\left(\frac{\ln(S/K) + \left(r_d - r_f + \sigma^2(\Delta_n)/2\right)(T-t)}{\sigma(\Delta_n)\sqrt{T-t}}\right) \qquad (3)$$

For sufficiently large $\sigma(\Delta_n)$ and a smooth, differentiable volatility smile, the sequence converges for $n \to \infty$ to the unique fixed point $\Delta^* \in A$ with $\sigma^* = \sigma(\Delta^*)$, corresponding to strike $K$.

The usual FX smiles normally satisfy the above-mentioned regularity conditions. More details concerning this proposition can be found in [5]. However, note that smoothness is already demanded here, which directly leads to the issue of an appropriate smile interpolation.

Interpolation

Before the discussion of specific interpolation methods, let us take a step backward and remember Rebonato's well-known statement of implied volatility as the wrong number in the wrong formula to obtain the right price [3]. Therefore, the explanatory power of implied volatilities for the dynamics of a stochastic process remains limited. Implied volatilities give a lattice on which marginal distributions can be constructed. However, even using many data points to generate marginal distributions, forward distributions and extremal distributions, which determine the prices of some products such as compound and barrier products, cannot be uniquely defined by implied volatilities (see [4] for a discussion of this). The attempt to capture FX smile features can lead to two different general approaches.

Parametrization

One possibility to express smile or skew patterns is just to capture it as the calibration parameter set of an arbitrary stochastic volatility or jump diffusion model that generates the observed market implied volatilities. However, as spreads are rather narrow in liquid FX options markets, it is preferred to exactly fit the given input volatilities. This automatically leads to an interpolation approach.

Pure Interpolation

As an introduction, we would like to pose four requirements for an acceptable volatility surface interpolation:

1. Smoothness in the sense of continuous differentiability. Especially with respect to the possible

application of Dupire-style local volatility models, it is crucial to construct an interpolation that is at least $C^2$ in strike and at least $C^1$ in time direction. This becomes obvious when considering the expression for the local volatility in this context:

$$\sigma_{loc}^2(S, K) = \frac{\partial C(K,T)/\partial T + r_f C(K,T) + K(r_d - r_f)\,\partial C(K,T)/\partial K}{\tfrac{1}{2} K^2\,\partial^2 C(K,T)/\partial K^2} = \frac{\dfrac{\sigma_i}{T} + 2(r_d - r_f) K \dfrac{\partial \sigma_i}{\partial K} + 2 \dfrac{\partial \sigma_i}{\partial T}}{K^2 \left[\dfrac{1}{\sigma_i}\left(\dfrac{1}{K\sqrt{T}} + d_+ \dfrac{\partial \sigma_i}{\partial K}\right)^2 + \dfrac{\partial^2 \sigma_i}{\partial K^2} - d_+ \sqrt{T} \left(\dfrac{\partial \sigma_i}{\partial K}\right)^2\right]} \qquad (4)$$

where $C(K, T)$ denotes the Black–Scholes prices of a call option with strike $K$, $\sigma_i$ its corresponding implied volatility, and

$$d_+ = \frac{\ln(S/K) + \left(r_d - r_f + \sigma^2(\Delta)/2\right) T}{\sigma(\Delta)\sqrt{T}} \qquad (5)$$

Note in addition that local volatilities can directly be extracted from delta-based FX volatility surfaces, that is, the Dupire formula can alternatively be expressed in terms of delta. See [2] for details.
2. Absence of oscillations, which is guaranteed if the sign of the curvature of the surface does not change over different strike or delta levels.
3. Absence of arbitrage possibilities on single smiles of the surface as well as absence of calendar arbitrage.
4. A reasonable extrapolation available for the interpolation method.

A widely used classical interpolation method is cubic splines. They attempt to fit surfaces by fitting piecewise cubic polynomials to given data points. They are specified by matching their second derivatives at each intersection. Although this ensures the required smoothness by construction, it does not prevent oscillations, which directly leads to the danger of arbitrage possibilities, nor does it define how to extrapolate the smile. We, therefore, introduce the concept of a slice kernel volatility surface as an alternative:

Definition 1 (Slice Kernel). Let $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$ be $N$ given points and $g : \mathbb{R} \to \mathbb{R}$ a smooth function which fulfills

$$g(x_i) = y_i, \quad i = 1, \ldots, N \qquad (6)$$

A smooth interpolation is then given by

$$g(x) := \frac{1}{\Gamma_\lambda(x)} \sum_{i=1}^{N} \alpha_i K_\lambda(x - x_i) \qquad (7)$$

where

$$\Gamma_\lambda(x) := \sum_{i=1}^{N} K_\lambda(x - x_i) \qquad (8)$$

and

$$K_\lambda(u) := \exp\left(-\frac{u^2}{2\lambda^2}\right) \qquad (9)$$

The described kernel is also called Gaussian Kernel. The interpolation reduces to determining the $\alpha_i$, which is straightforward via solving a linear equation system. Note that $\lambda$ remains as a free smoothing parameter, which also affects the condition of the equation system. At the same time, it can be used to fine-tune the extrapolation behavior of the kernel.

Generally, the slice kernel produces reasonable output smiles based on a maximum of seven delta-volatility points. Then it fulfills all the above-mentioned requirements. It is $C^\infty$, does not create oscillations, passes typical no-arbitrage conditions as they are, for example, posed by Gatheral [1], and finally has an inherent extrapolation method.

In time direction, one might connect different slice kernels by linear interpolation of the variances for same deltas. This also normally ensures the absence of calendar arbitrage, for which a necessary condition

[Figure 1 plot, titled "Kernel interpolation of FX volatility surface": implied volatility (vertical axis, 0.08 to 0.14) against percent delta (horizontal axis, 0 to 100).]

Figure 1 Kernel interpolation of an FX volatility surface

is a nondecreasing variance for constant moneyness $F/K$ (see also [1] for a discussion of this).

Figure 1 displays the shape of a slice kernel applied to a typical FX volatility surface constructed from 10 and 25 delta volatilities, and the ATM volatility (in this example $\lambda = 0.25$ was chosen).

References

[1] Gatheral, J. (2004). A Parsimonious Arbitrage-free Implied Volatility Parameterization with Application to the Valuation of Volatility Derivatives, Workshop Presentation, Madrid.
[2] Hakala, J. & Wystup, U. (2002). Local volatility surfaces: tackling the smile, Foreign Exchange Risk, Risk Books.
[3] Rebonato, R. (1999). Volatility and Correlation, John Wiley & Sons.
[4] Tistaert, J., Schoutens, W. & Simons, E. (2004). A perfect calibration! Now what? Wilmott Magazine (March), 66–78.
[5] Wystup, U. (2006). FX Options and Structured Products, John Wiley & Sons.

Related Articles

Foreign Exchange Markets; Foreign Exchange Options: Delta- and At-the-money Conventions.

UWE WYSTUP
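As an illustration of the slice kernel of Definition 1 and the delta–strike iteration of Proposition 1 in the article above, here is a minimal sketch. The function names, the five quoted delta/volatility pairs, and the market inputs (spot, strike, rates, maturity) are all hypothetical and chosen only for demonstration; the smile is assumed to be parametrized by the spot call delta of equation (1), and the stopping rule of the iteration is an assumption, not part of the proposition.

```python
import math
import numpy as np


def gaussian_kernel(u, lam):
    # K_lambda(u) of equation (9)
    return np.exp(-u**2 / (2.0 * lam**2))


def fit_slice_kernel(x, y, lam):
    """Return a callable g with g(x[i]) = y[i], built from equations (7)-(9)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = gaussian_kernel(x[:, None] - x[None, :], lam)   # K_lambda(x_j - x_i)
    design = k / k.sum(axis=1, keepdims=True)            # row j divided by Gamma_lambda(x_j)
    alpha = np.linalg.solve(design, y)                   # linear system for the alpha_i

    def g(x_new):
        w = gaussian_kernel(np.atleast_1d(x_new)[:, None] - x[None, :], lam)
        return (w @ alpha) / w.sum(axis=1)
    return g


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def call_delta(spot, strike, r_dom, r_for, tau, sigma):
    # Black-Scholes spot delta of a call, as in equation (1)
    d_plus = (math.log(spot / strike)
              + (r_dom - r_for + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    return math.exp(-r_for * tau) * norm_cdf(d_plus)


def vol_for_strike(smile, sigma_atm, spot, strike, r_dom, r_for, tau,
                   tol=1e-12, max_iter=100):
    """Fixed point iteration of Proposition 1: start from the ATM volatility."""
    delta = call_delta(spot, strike, r_dom, r_for, tau, sigma_atm)   # Delta_0
    for _ in range(max_iter):
        sigma = float(smile(delta)[0])
        new_delta = call_delta(spot, strike, r_dom, r_for, tau, sigma)
        converged = abs(new_delta - delta) < tol
        delta = new_delta
        if converged:
            break
    return float(smile(delta)[0]), delta


# Hypothetical quotes: call deltas 0.10 to 0.90 with smile-shaped volatilities.
deltas = [0.10, 0.25, 0.50, 0.75, 0.90]
vols = [0.115, 0.100, 0.095, 0.105, 0.125]
smile = fit_slice_kernel(deltas, vols, lam=0.25)
sigma_k, delta_k = vol_for_strike(smile, sigma_atm=0.095, spot=1.30, strike=1.35,
                                  r_dom=0.03, r_for=0.01, tau=0.5)
```

Normalizing each row of the kernel matrix by $\Gamma_\lambda(x_j)$ before solving mirrors equation (7) directly; a smaller `lam` makes the system better conditioned but the smile less smooth, which is the trade-off the article attributes to $\lambda$.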
Margrabe Formula

An exchange option gives its owner the right, but not the obligation, to exchange $b$ units of one asset for $a$ units of another asset at a specific point in time, that is, it is a claim that pays off $(aS_1(T) - bS_2(T))^+$ at time $T$. Outperformance option or Margrabe option are alternative names for the same payoff.

Let us assume that the interest rate is constant ($r$) and that the underlying assets follow correlated ($dW_1\,dW_2 = \rho\,dt$) geometric Brownian motions under the risk-neutral measure,

$$dS_i = \mu_i S_i\,dt + \sigma_i S_i\,dW_i \quad \text{for } i = 1, 2 \qquad (1)$$

Note that allowing $\mu_i$'s that are different from $r$ enables us to use the resulting valuation formula for the exchange option directly in cases with nontrivial carrying costs on the underlying. This could be for futures (where the drift rate is 0), currencies (where the drift rate is the difference between domestic and foreign interest rates, see Foreign Exchange Options), stocks with dividends (where the drift rate is $r$ less the dividend yield), or nontraded quantities with convenience yields.

The value of the exchange option at time $t$ is

$$\pi^{EO}(t) = EO(T - t, aS_1(t), bS_2(t)) \qquad (2)$$

where the function $EO$ is given by

$$EO(\tau, S_1, S_2) = S_1 e^{(\mu_1 - r)\tau} N(d_+) - S_2 e^{(\mu_2 - r)\tau} N(d_-) \qquad (3)$$

with

$$d_\pm = \frac{\ln(S_1/S_2) + (\mu_1 - \mu_2 \pm \sigma^2/2)\tau}{\sigma\sqrt{\tau}} \qquad (4)$$

where $\sigma = \sqrt{\sigma_1^2 + \sigma_2^2 - 2\sigma_1\sigma_2\rho}$, $N$ denotes the standard normal distribution function and $\tau = T - t$. The formula was derived independently by Margrabe [12] and Fischer [6], but despite the two papers being published side by side in the Journal of Finance, the formula commonly bears only the former author's name. The result is most easily proven by using a change of numeraire (see Change of Numeraire), writing

$$\pi^{EO}(t) = S_2(t)\,E_t^{Q^{S_2}}\left((aS_1(T)/S_2(T) - b)^+\right) \qquad (5)$$

noting that $S_1/S_2$ follows a geometric Brownian motion, and reusing the Black–Scholes calculation for the mean of a truncated lognormal variable.

If the underlying asset prices are multiplied by a positive factor, then the exchange option's value changes by that same factor. This means that we can use Euler's homogeneous function theorem to read off the partial derivatives of the option value with respect to the underlying assets (the deltas) directly from the Margrabe formula (see [15] for more such tricks), specifically

$$\frac{dEO}{dS_1} = e^{(\mu_1 - r)\tau} N(d_+) \qquad (6)$$

and similarly for $S_2$. If the $S$ assets are traded, then a portfolio with these holdings (scaled by $a$ and $b$) that is made self-financing with the risk-free asset replicates the exchange option, and the Margrabe formula gives the only no-arbitrage price.

If the underlying assets do not pay dividends during the life of the exchange option (so that the risk-neutral drift rates are $\mu_1 = \mu_2 = r$), then early exercise is never optimal, and the Margrabe formula holds for American options too. With nontrivial carrying costs, this is not true, but as noted by [2], a change of numeraire reduces the dimensionality of the problem so that standard one-dimensional methods for American option pricing can be used.

The Margrabe formula is still valid with stochastic interest rates, provided the factors that drive interest rates are independent of those driving the $S$ assets.

Exchange options are most common in over-the-counter foreign exchange markets, but exchange features are embedded in many other financial contexts; mergers and acquisitions (see [12]) and indexed executive stock options (see [9]) to give just two examples.

Variations and Extensions

Some variations of exchange options can be valued in closed form. In [10], a formula for a so-called traffic light option that pays

$$(S_1(T) - K_1)^+ (S_2(T) - K_2)^+ \qquad (7)$$

is derived, and [4] gives a formula for the value of a compound exchange option, that is, a contract that pays

$$(\pi^{EO}(T_C) - S_2(T_C))^+ \text{ at time } T_C < T \qquad (8)$$

Both formulas involve the bivariate normal distribution function, and in the case of the compound exchange option a nonlinear but well-behaved equation that must be solved numerically.

For knock-in and knockout exchange options whose barriers are expressed in terms of the ratio of the two underlying assets, [7] show that the reflection-principle-based closed-form solutions (see [14]) from the Black–Scholes model carry over; this means that barrier option values can be expressed solely through the $EO$-function evaluated at appropriate points.

However, there are not always easy answers; in the simple case of a spread option

$$(S_1(T) - S_2(T) - K)^+ \qquad (9)$$

there is no commonly accepted closed-form solution. The reason for this is that a sum of lognormal variables is not lognormal. More generally, many financial valuation problems can be cast as follows: calculate the expected value of

$$\left(\sum_{i=1}^{n} \alpha_{i,n} X_{i,n} - K\right)^+ \qquad (10)$$

where the $X_{i,n}$'s are lognormally distributed. One can use generic techniques such as direct integration, numerical solution of partial differential equations, or Monte Carlo simulation, but there is an extensive literature on other approximation methods. These include

• moment approximation, where the moments of $\sum_{i=1}^{n} \alpha_{i,n} X_{i,n}$ are calculated, the variable then treated as lognormal, and the option priced by a Black–Scholes-like formula; an application to Asian options is given in [11].
• integration by Fourier transform techniques, which extends beyond lognormal models and works well if $n$ is not too large (say 2–4); an application to spread options is given in [1].
• limiting results for $n \to \infty$ as obtained in [5] and [13]; the relation to the reciprocal gamma distribution has been used for Asian and basket options.
• changing to Gaussian processes as suggested in [3]; this may be suitable for commodity markets where spread contracts are popular, and it allows for the inclusion of mean reversion.
• if the $\alpha_{i,n} X_{i,n}$'s depend monotonically on a common random variable, then Jamshidian's approach from [8] can be used to decompose an option on a portfolio into a portfolio of simpler options. This is used to value options on coupon-bearing bonds in one-factor interest-rate models.

References

[1] Alexander, C. & Scourse, A. (2004). Bivariate normal mixture spread option valuation, Quantitative Finance 4, 637–648.
[2] Bjerksund, P. & Stensland, G. (1993). American exchange options and a put-call transformation: a note, Journal of Business, Finance and Accounting 20, 761–764.
[3] Carmona, R. & Durrleman, V. (2003). Pricing and hedging spread options, SIAM Review 45, 627–685.
[4] Carr, P. (1988). The valuation of sequential exchange opportunities, Journal of Finance 43, 1235–1256.
[5] Dufresne, D. (2004). The log-normal approximation in financial and other computations, Advances in Applied Probability 36, 747–773.
[6] Fischer, S. (1978). Call option pricing when the exercise price is uncertain, and the valuation of index bonds, Journal of Finance 33, 169–176.
[7] Haug, E.G. & Haug, J. (2002). Knock-in/out Margrabe, Wilmott Magazine 1, 38–41.
[8] Jamshidian, F. (1989). An exact bond option formula, Journal of Finance 44, 205–209.
[9] Johnson, S.A. & Tian, Y.S. (2001). Indexed executive stock options, Journal of Financial Economics 57, 35–64.
[10] Jørgensen, P.L. (2007). Traffic light options, Journal of Banking and Finance 31, 3698–3719.
[11] Levy, E. (1992). Pricing European average rate currency options, Journal of International Money and Finance 11(5), 474–491.
[12] Margrabe, W. (1978). The value of an option to exchange one asset for another, Journal of Finance 33, 177–186.
[13] Milevsky, M.A. & Posner, S.E. (1998). Asian options, the sum of lognormals, and the reciprocal gamma distribution, Journal of Financial and Quantitative Analysis 33, 409–422.

[14] Poulsen, R. (2006). Barrier options and their static hedges: simple derivations and extensions, Quantitative Finance 6, 327–335.
[15] Reiss, O. & Wystup, U. (2001). Efficient computation of options price sensitivities using homogeneity and other tricks, Journal of Derivatives 9, 41–53.

Related Articles

Black–Scholes Formula; Change of Numeraire; Exchange Options; Foreign Exchange Options.

ROLF POULSEN
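As an illustration of the Margrabe formula in the article above, the following minimal sketch evaluates equations (3) and (4) and returns the two deltas read off via homogeneity, as in equation (6). The function name `margrabe` and all numerical inputs are hypothetical; the delta with respect to $S_2$ is written out under the assumption that "similarly for $S_2$" means $-e^{(\mu_2 - r)\tau} N(d_-)$, which is what Euler's theorem gives for formula (3).

```python
import math


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def margrabe(tau, s1, s2, mu1, mu2, r, sigma1, sigma2, rho):
    """Value EO(tau, S1, S2) of equation (3) and its deltas w.r.t. S1 and S2."""
    # Volatility of the ratio S1/S2.
    sigma = math.sqrt(sigma1**2 + sigma2**2 - 2.0 * sigma1 * sigma2 * rho)
    d_plus = (math.log(s1 / s2)
              + (mu1 - mu2 + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d_minus = d_plus - sigma * math.sqrt(tau)
    delta1 = math.exp((mu1 - r) * tau) * norm_cdf(d_plus)     # equation (6)
    delta2 = -math.exp((mu2 - r) * tau) * norm_cdf(d_minus)   # assumed analogue for S2
    # Homogeneity of degree one: value = S1 * delta1 + S2 * delta2.
    return s1 * delta1 + s2 * delta2, delta1, delta2


# Hypothetical inputs: no carrying costs, so both drifts equal the interest rate.
value, delta1, delta2 = margrabe(tau=1.0, s1=100.0, s2=95.0,
                                 mu1=0.03, mu2=0.03, r=0.03,
                                 sigma1=0.20, sigma2=0.25, rho=0.4)
```

For the general payoff $(aS_1(T) - bS_2(T))^+$, equation (2) suggests calling the function with `s1 = a * S1` and `s2 = b * S2`.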
Foreign Exchange Options: Delta- and At-the-money Conventions

In financial markets, the value of a plain-vanilla European option is generally quoted in terms of its implied volatility, that is, the volatility that, when plugged into the Black–Scholes formula, gives the correct market price. Observed market prices show, however, that the implied volatility is a function of the option's strike, thus giving rise to the so-called volatility smile.

In foreign exchange (FX) markets, it is common practice to quote volatilities for FX call and put options in terms of their delta sensitivities rather than in terms of their strikes or their moneyness. Volatilities and deltas are quoted by means of a table, the volatility smile table, consisting of rows for each FX option expiry date and columns for a number of delta values, as well as a column for the at-the-money (ATM) volatilities.

The definition and usage of a volatility smile table is complicated by the fact that FX markets have established various delta and ATM conventions. In this article, we summarize these conventions and highlight their intuition. For each delta convention, we give formulas and methods for the conversion of deltas to strikes and vice versa. We describe how to retrieve volatilities from the table for an arbitrary FX option that is to be priced in accordance with the information contained therein. We point out some mathematical problems and pitfalls when trying to do so and give criteria under which these problems surface.

Definitions

FX Rate

Before discussing the various delta conventions, we summarize some basic terms and definitions that we use in this article.

• FX spot rate $S(t)$: The FX spot rate $S(t)$ is the current exchange rate at the present time $t$ ("today") between the domestic and the foreign currency. It is specified as the number of units of domestic currency that an investor gets in exchange for one unit of foreign currency,

$$S(t) := \frac{\text{number of units of domestic currency}}{\text{one unit of foreign currency}} \qquad (1)$$

• FX forward rate $F(t, T)$: The FX forward rate $F(t, T)$ is the exchange rate between the domestic and the foreign currency at some future point of time $T$ as observed at the present time $t$ ($t < T$). It is again specified as the number of units of domestic currency that an investor gets in exchange for one unit of foreign currency at time $T$.

Using arbitrage arguments, spot and forward FX rates are related by (see, for instance, [3]):

$$F(t, T) = S(t) \cdot \frac{D_{for}(t, T)}{D_{dom}(t, T)} \qquad (2)$$

where $D_{for} := D_{for}(t, T)$ is the foreign discount factor for time $T$ (observed at time $t$) and $D_{dom} := D_{dom}(t, T)$ is the domestic discount factor for time $T$ (observed at time $t$).

Note that the terminology in FX transactions is always confusing. In this article, we refer to the "domestic" currency in the sense of a base currency in relation to which "foreign" amounts of money are measured (see also [4]). By definition (1), an amount $x$ in foreign currency, for example, is equivalent to $x \cdot S(t)$ units of domestic currency at time $t$.

In the markets, FX rates are usually quoted in a standard manner. For example, the USD–JPY exchange rate is usually quoted as the number of Japanese yen an investor receives in exchange for 1 USD. For a Japanese investor, the exchange rate would fit the earlier definition, while a US investor would either need to look at the reverse exchange rate $1/S(t)$ or think of Japanese yen as the "domestic" currency.

Value of FX Forward Contracts

When two parties agree on an FX forward contract at time $s$, they agree on the exchange of an amount of money in foreign currency at an agreed exchange rate
2 Foreign Exchange Options: Delta- and At-the-money Conventions

K against an amount of money in domestic currency at time T > s. When choosing K = F(s, T), the FX forward contract has no value to either of the parties at time s.

As, in general, the forward exchange rate changes over time, at some time t (s < t < T), the FX forward contract will have a nonzero value (in domestic currency) given by

$$v_f(t,T) = D_{\mathrm{dom}}\,\bigl(F(t,T) - K\bigr) = S(t)\,D_{\mathrm{for}} - K\,D_{\mathrm{dom}} \tag{3}$$

Value of FX Options

Upon deal inception, the holder of an FX option obtains the right to exchange a specified amount of money in domestic currency against a specified amount of money in foreign currency at an agreed exchange rate K. Assuming nonstochastic interest rates and the standard lognormal dynamics for the spot exchange rate, at time t, the domestic currency values of plain-vanilla European call and put FX options with strike K and expiry date T are given by their respective Black–Scholes formulas:

Call option:

$$v_c(t,T) = D_{\mathrm{dom}}\,F(t,T)\,N(d_+) - D_{\mathrm{dom}}\,K\,N(d_-) \tag{4}$$

Put option:

$$v_p(t,T) = (-1)\cdot\bigl[D_{\mathrm{dom}}\,F(t,T)\,N(-d_+) - D_{\mathrm{dom}}\,K\,N(-d_-)\bigr] \tag{5}$$

$$\phantom{v_p(t,T)} = D_{\mathrm{dom}}\,F(t,T)\,\bigl[N(d_+) - 1\bigr] - D_{\mathrm{dom}}\,K\,\bigl[N(d_-) - 1\bigr] \tag{6}$$

where

$$d_\pm = \frac{\ln\bigl(F(t,T)/K\bigr) \pm \tfrac{1}{2}\sigma^2\tau}{\sigma\sqrt{\tau}} \tag{7}$$

K: strike of the FX option,
σ: Black–Scholes volatility,
τ = T − t: time to expiry of the FX option, and
N(x): cumulative normal distribution function.

Note that $v_c(t,T)$ and $v_p(t,T)$ as given earlier are measured in domestic currency. The option position, however, may also be held in foreign currency. We call the currency in which an option’s value is measured its premium currency.

Also note that the present values of call and put options are related by the put–call parity:

$$v_c - v_p = v_f = D_{\mathrm{dom}}\,\bigl(F(t,T) - K\bigr) \tag{8}$$

Definition of Delta Types

This section summarizes the delta conventions used in FX markets and gives some of their properties. We outline the correspondence of each delta sensitivity with a particular delta hedge strategy the holder of an FX option chooses. FX options are peculiar in that the underlying coincides with the exchange rate. While, in general, it makes no sense to measure the value of an option in units of its underlying (e.g., in the number of shares of a company), the FX option position can be held either in domestic or in foreign currency. This gives rise to the premium-adjusted deltas.

For the ease of notation, we drop the time dependency of S(t) and F(t, T) in the following and denote the spot exchange rate as of time t by S and the forward exchange rate for time T as observed in t by F.

Unadjusted Deltas

Spot Delta.

Definition 1 For FX options, the spot delta is defined as the derivative of the option price $v_{c/p}$ with respect to the FX spot rate S:

$$\Delta_S^{c/p} := \frac{\partial v_{c/p}}{\partial S} \tag{9}$$

Interpretation Spot delta is the “usual” delta sensitivity that follows from the Black–Scholes equation. It can be derived by considering an FX option position that is held and hedged in domestic currency (see, for instance, [2, 3]).

Note that $\Delta_S$ is an amount in units of foreign currency. This makes sense from the hedging perspective: an amount of money in foreign currency is needed to make up for changes in the domestic currency value of an FX option (held in domestic currency), due to changes of the exchange rate. If,

for example, an investor is long an FX call option, a decrease in the exchange rate will lead to a decrease in the value of his option position. By having shorted $\Delta_S$ units of foreign currency, the investor will make a hedge profit in domestic currency balancing his losses in the option position to first order.

Properties

Call option:

$$\Delta_S^{c} = D_{\mathrm{for}}\,N(d_+) \tag{10}$$

Put option:

$$\Delta_S^{p} = D_{\mathrm{for}}\,\bigl[N(d_+) - 1\bigr] = -D_{\mathrm{for}}\,N(-d_+) \tag{11}$$

Put–call delta parity:

$$\Delta_S^{c} - \Delta_S^{p} = D_{\mathrm{for}} \tag{12}$$

Forward Delta

Definition 2 The forward delta $\Delta_F$ (also called driftless delta [4]) of an FX option is defined as the ratio of the option’s spot delta and the delta of a long forward contract on the FX rate (where the forward price of the FX forward contract equals the strike of the FX option):

$$\Delta_F^{c/p} := \frac{\partial v_{c/p}/\partial S}{\partial v_f/\partial S} = \frac{\Delta_S^{c/p}}{\partial v_f/\partial S} \tag{13}$$

Interpretation The forward delta is not simply the derivative of the option price formula with respect to the forward FX rate F. The rationale for the above-mentioned definition follows from the construction of a hedge portfolio using FX forward contracts as hedge instruments for the FX option position (both held in domestic currency). The forward delta gives the number of forward contracts that an investor needs to enter into to completely delta hedge his/her FX option position; $\Delta_F$, therefore, is simply a number without units.

Properties

Call option:

$$\Delta_F^{c} = \frac{D_{\mathrm{for}}\cdot N(d_+)}{D_{\mathrm{for}}} = N(d_+) \tag{14}$$

Put option:

$$\Delta_F^{p} = \frac{D_{\mathrm{for}}\cdot\bigl[N(d_+) - 1\bigr]}{D_{\mathrm{for}}} = N(d_+) - 1 = -N(-d_+) \tag{15}$$

Put–call delta parity:

$$\Delta_F^{c} - \Delta_F^{p} = 1 \tag{16}$$

Premium-adjusted Deltas

Spot Delta Premium Adjusted.

Definition 3 The premium-adjusted spot delta is defined as

$$\Delta_{S,\mathrm{pa}}^{c/p} := S\cdot\frac{\partial}{\partial S}\left(\frac{v_{c/p}}{S}\right) = \Delta_S^{c/p} - \frac{v_{c/p}}{S} \tag{17}$$

Interpretation The definition of the premium-adjusted spot delta follows from an FX option position that is held in foreign currency, while being hedged in domestic currency.

While v is the option’s value in domestic currency, v/S(t) is the option’s value converted to foreign currency (i.e., its premium currency) at time t. The term ∂(v/S(t))/∂S · dS, thus, gives the change of the option value (measured in foreign currency) with the underlying exchange rate. To complete a delta hedge in domestic currency, the derivative needs to be multiplied by S(t), from which the defining equation (17) for the premium-adjusted spot delta follows.

Note that the delta sensitivity is equal to spot delta, adjusted for the premium effect v/S(t). This is easily interpreted as a delta that is corrected by a premium amount already paid in foreign currency. Also note that $\Delta_{S,\mathrm{pa}}$ itself is denominated in units of foreign currency.

Properties

Call option:

$$\Delta_{S,\mathrm{pa}}^{c} = D_{\mathrm{for}}\,\frac{K}{F}\,N(d_-) \tag{18}$$

Put option:

$$\Delta_{S,\mathrm{pa}}^{p} = D_{\mathrm{for}}\,\frac{K}{F}\,\bigl[N(d_-) - 1\bigr] = -D_{\mathrm{for}}\,\frac{K}{F}\,N(-d_-) \tag{19}$$

Put–call delta parity:

$$\Delta_{S,\mathrm{pa}}^{c} + \Delta_{S,\mathrm{pa}}^{p} = D_{\mathrm{for}}\,\frac{K}{F}\,\bigl[N(d_-) - N(-d_-)\bigr] = 2\,\Delta_{S,\mathrm{pa}}^{c} - D_{\mathrm{for}}\,\frac{K}{F} \tag{20}$$

$$\Delta_{S,\mathrm{pa}}^{c} - \Delta_{S,\mathrm{pa}}^{p} = D_{\mathrm{for}}\,\frac{K}{F} \tag{21}$$

The defining equations for premium-adjusted deltas have interesting consequences: while put deltas are unbounded and strictly monotonic functions of K, call deltas are bounded (i.e., $\Delta_S^{c} \in [0; \Delta_{\max}]$ with $\Delta_{\max} < 1$) and are not monotonic functions of K. Thus, the relationship between call deltas and strikes K is not one to one.

Forward Delta Premium Adjusted.

Definition 4 The premium-adjusted forward delta is defined in analogy to the unadjusted delta:

$$\Delta_{F,\mathrm{pa}}^{c/p} = \frac{S\cdot\partial/\partial S\,\bigl(v_{c/p}/S\bigr)}{\partial v_f/\partial S} = \frac{\Delta_{S,\mathrm{pa}}^{c/p}}{\partial v_f/\partial S} \tag{22}$$

Interpretation The intuition behind the definition of the premium-adjusted forward delta follows from an FX option position that is held in foreign currency and hedged by forward FX contracts in domestic currency. The premium-adjusted forward delta gives the number of forward contracts that are needed for the delta hedge in domestic currency of an FX option held in foreign currency. The derivation of the defining equation (22) is similar to the one for spot delta premium adjusted (cf. the section Spot Delta Premium Adjusted). Note that $\Delta_{F,\mathrm{pa}}$ is a pure number without units.

Properties

Call option:

$$\Delta_{F,\mathrm{pa}}^{c} = \frac{K}{F}\,N(d_-) \tag{23}$$

Put option:

$$\Delta_{F,\mathrm{pa}}^{p} = \frac{K}{F}\cdot\bigl[N(d_-) - 1\bigr] = -\frac{K}{F}\,N(-d_-) \tag{24}$$

Put–call delta parity:

$$\Delta_{F,\mathrm{pa}}^{c} + \Delta_{F,\mathrm{pa}}^{p} = \frac{K}{F}\,\bigl[N(d_-) - N(-d_-)\bigr] = 2\,\Delta_{F,\mathrm{pa}}^{c} - \frac{K}{F} \tag{25}$$

$$\Delta_{F,\mathrm{pa}}^{c} - \Delta_{F,\mathrm{pa}}^{p} = \frac{K}{F} \tag{26}$$

Also note the important remarks in the previous section on the domain, the range of values, and the relationship between call delta and option strike. They apply likewise to premium-adjusted forward deltas.

Definition of At-the-money Types

In this section, we summarize the various ATM definitions, comment on their financial interpretation, and give the relations between all relevant quantities in Table 1.

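Table 1 Strike values and delta values at the ATM point for the different FX delta conventions(a)

Convention | Δ-neutral | $K_{\mathrm{atm}} = F$ | Vega/gamma = max | Δ = 50%

ATM strike values
Spot delta | $F e^{\frac{1}{2}\sigma^2\tau}$ | $F$ | $F e^{\frac{1}{2}\sigma^2\tau}$ | -
Fwd delta | $F e^{\frac{1}{2}\sigma^2\tau}$ | $F$ | $F e^{\frac{1}{2}\sigma^2\tau}$ | $F e^{\frac{1}{2}\sigma^2\tau}$
Spot delta p.a. | $F e^{-\frac{1}{2}\sigma^2\tau}$ | $F$ | $F e^{\frac{1}{2}\sigma^2\tau}$ | -
Fwd delta p.a. | $F e^{-\frac{1}{2}\sigma^2\tau}$ | $F$ | $F e^{\frac{1}{2}\sigma^2\tau}$ | -

ATM delta values
Spot delta | $D_{\mathrm{for}} N(0)$ | $D_{\mathrm{for}} N(\tfrac{1}{2}\sigma\sqrt{\tau})$ | $D_{\mathrm{for}} N(0)$ | -
Fwd delta | $N(0)$ | $N(\tfrac{1}{2}\sigma\sqrt{\tau})$ | $N(0)$ | $N(0)$
Spot delta p.a. | $D_{\mathrm{for}}\, e^{-\frac{1}{2}\sigma^2\tau} N(0)$ | $D_{\mathrm{for}} N(-\tfrac{1}{2}\sigma\sqrt{\tau})$ | $D_{\mathrm{for}}\, e^{+\frac{1}{2}\sigma^2\tau} N(-\sigma\sqrt{\tau})$ | -
Fwd delta p.a. | $e^{-\frac{1}{2}\sigma^2\tau} N(0)$ | $N(-\tfrac{1}{2}\sigma\sqrt{\tau})$ | $e^{+\frac{1}{2}\sigma^2\tau} N(-\sigma\sqrt{\tau})$ | -

(a) Note that N(0) = 1/2. The delta values are given for call options; the corresponding values for put options can be obtained by replacing N(x) with (N(x) − 1).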
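The delta formulas (10) to (26) and the Δ-neutral column of Table 1 can be cross-checked numerically. The following is a minimal sketch only: the function names and the market data (F, K, σ, τ, D_for) are illustrative assumptions and do not come from the article.

```python
from math import exp, log, sqrt
from statistics import NormalDist

N = NormalDist().cdf  # cumulative standard normal distribution N(x)


def d_plus_minus(F, K, sigma, tau):
    """d+ and d- as defined in equation (7)."""
    s = sigma * sqrt(tau)
    d_plus = (log(F / K) + 0.5 * s * s) / s
    return d_plus, d_plus - s


def deltas(F, K, sigma, tau, D_for, call=True):
    """All four delta conventions for a call (or put) option."""
    d_plus, d_minus = d_plus_minus(F, K, sigma, tau)
    shift = 0.0 if call else 1.0              # puts: N(x) -> N(x) - 1
    fwd = N(d_plus) - shift                   # equations (14), (15)
    fwd_pa = K / F * (N(d_minus) - shift)     # equations (23), (24)
    return {
        "spot": D_for * fwd,                  # equations (10), (11)
        "fwd": fwd,
        "spot_pa": D_for * fwd_pa,            # equations (18), (19)
        "fwd_pa": fwd_pa,
    }


# Hypothetical market data, chosen only for illustration
F, K, sigma, tau, D_for = 1.30, 1.40, 0.12, 2.0, 0.95

call = deltas(F, K, sigma, tau, D_for, call=True)
put = deltas(F, K, sigma, tau, D_for, call=False)
parity = {name: call[name] - put[name] for name in call}

# Delta-neutral ATM strikes from Table 1: the straddle delta should vanish
K_dn = F * exp(0.5 * sigma**2 * tau)       # unadjusted conventions
K_dn_pa = F * exp(-0.5 * sigma**2 * tau)   # premium-adjusted conventions
straddle_fwd = (deltas(F, K_dn, sigma, tau, D_for, True)["fwd"]
                + deltas(F, K_dn, sigma, tau, D_for, False)["fwd"])
straddle_fwd_pa = (deltas(F, K_dn_pa, sigma, tau, D_for, True)["fwd_pa"]
                   + deltas(F, K_dn_pa, sigma, tau, D_for, False)["fwd_pa"])

print(parity)
print(straddle_fwd, straddle_fwd_pa)
```

Under these assumptions, the four entries of `parity` should reproduce the put–call delta parities (12), (16), (21), and (26), and both straddle deltas should be zero up to rounding.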

ATM Definition “Delta Neutral”

Definition 5 The ATM point is defined as the strike $K_{\mathrm{atm}}$ for which the deltas of a call and a put option add up to zero:

$$\Delta_x^{c}(K_{\mathrm{atm}}, \sigma_{\mathrm{atm}}) + \Delta_x^{p}(K_{\mathrm{atm}}, \sigma_{\mathrm{atm}}) = 0 \tag{27}$$

Here, x represents any of the delta conventions defined in the section Definition of Delta Types.

Interpretation The definition follows directly from a “straddle” position where a long call and a long put option with the same strike are combined. If the strike is chosen appropriately, the changes in value of the call and the put option compensate (to first order) when the underlying FX rate changes. The straddle position’s value, thus, is insensitive (“delta neutral”) to changes in the underlying FX rate. The reason for this choice is that traders can use straddles to hedge the vega of their position without upsetting the delta.

Properties The ATM definition mentioned earlier for delta neutral FX options is equivalent to $N(d_+) = 1/2$ in case of the unadjusted delta conventions and $N(d_-) = 1/2$ in case of the premium-adjusted delta conventions. From this, the relationships of Table 1 follow in a straightforward manner.

ATM Definition via Forward

Definition 6 The ATM point is defined as the strike equaling the forward exchange rate:

$$K_{\mathrm{atm}} := F \tag{28}$$

Interpretation This definition reflects the view that (given the information at deal inception) an option is ATM when its strike is chosen equal to the expected exchange rate at option expiry. If the spot exchange rate, indeed, approached F as t → T (as would be the case in a fully deterministic world by arbitrage arguments, cf. equation (2)), then the ATM strike would mark the dividing point between options that expire in-the-money (ITM) and out-of-the-money (OTM). From the put–call parity (8), we see that this is also the strike at which put and call options have the same value. Thus, this ATM definition is also called value parity [4].

Properties The relationships of Table 1 can again be derived in a straightforward manner from the definitions.

ATM Definition “vega = max”

Definition 7 The ATM point is defined as the strike $K_{\mathrm{atm}}$ for which vega of the FX option is at its maximum. Vega is the sensitivity of the FX option with respect to the implied volatility of the underlying exchange rate. It is given by (cf. [4])

$$\mathrm{vega}_{c/p} = \frac{\partial v_{c/p}}{\partial\sigma} = S\,D_{\mathrm{for}}\,\sqrt{\tau}\;n(d_+) \tag{29}$$

where n(x) is the standard normal density function.

The ATM strike can be derived from ∂vega/∂K = 0 as

$$K_{\mathrm{atm}} = F\,e^{\frac{1}{2}\sigma^2\tau} \tag{30}$$

Properties Table 1 again summarizes the relevant quantities for this ATM definition. Note that in case of unadjusted deltas, this ATM definition is equivalent to the delta neutral ATM definition. This is, however, not the case for adjusted deltas.

ATM Definition “γ = max”

Definition 8 The ATM point is defined as the strike $K_{\mathrm{atm}}$ for which the gamma sensitivity of the FX option is at its maximum.

We restrict the discussion to the case of gamma spot,

$$\gamma_S^{c/p} := \frac{\partial\Delta_S^{c/p}}{\partial S} = D_{\mathrm{for}}\,\frac{n(d_+)}{S\,\sigma\sqrt{\tau}} \tag{31}$$

From ∂γ/∂K = 0, the ATM strike can be derived as

$$K_{\mathrm{atm}} = F\,e^{\frac{1}{2}\sigma^2\tau} \tag{32}$$

thus revealing the equivalence to the ATM definition “vega = max” (see End Notes, a).

ATM Definition “50%”

Definition 9 According to this convention, the ATM point is defined by

$$\Delta^{c} = 0.5 \quad\text{and}\quad |\Delta^{p}| = 0.5 \tag{33}$$

This condition can only be true for the forward delta convention and, thus, does not apply to any of the other delta conventions.

Properties of ATM Definitions

Table 1 summarizes the properties of the ATM point for all possible combinations of ATM definitions and delta conventions.

In this context, it is interesting to note that, beside their financial interpretation, mathematically, the various definitions of the ATM point lead to three characteristic relationships between the strike K and the forward exchange rate F:

$$K = F \;\Rightarrow\; d_\pm = \pm\tfrac{1}{2}\sigma\sqrt{\tau} \tag{34}$$

$$K = F\,e^{\frac{1}{2}\sigma^2\tau} \;\Rightarrow\; d_+ = 0,\quad d_- = -\sigma\sqrt{\tau} \tag{35}$$

$$K = F\,e^{-\frac{1}{2}\sigma^2\tau} \;\Rightarrow\; d_+ = \sigma\sqrt{\tau},\quad d_- = 0 \tag{36}$$

Converting Deltas to Strikes and Vice Versa

Quoting volatilities as a function of the options’ deltas rather than as a function of the options’ strikes brings about a problem when it comes to pricing FX options. Consider the case that we want to price a vanilla European option for a given strike K. To price this option, we have to find the correct volatility. As the volatility is given in terms of delta and delta itself is a function of volatility and strike, we have to solve an implicit problem, which, in general, has to be done numerically.

The following sections outline the algorithms that can be used to that end for the various delta conventions and directions of conversion. As the spot and forward deltas differ only by constant discount factors, we restrict the presentation to the forward versions of the adjusted and unadjusted deltas.

Forward Delta

For unadjusted deltas, there are simple one-to-one relationships between put and call deltas, the put–call delta parities (12) and (16). Put deltas can therefore easily be translated into the corresponding call deltas and it is sufficient to perform all the calculations for call deltas here.

Conversion of Forward Delta to Strike. If volatilities are given as a function of forward delta, the strike corresponding to a given forward delta $\Delta_F^{c}$ can be calculated analytically. Let $\sigma(\Delta_F^{c})$ denote the volatility associated with $\Delta_F^{c}$; $\sigma(\Delta_F^{c})$ may either be quoted or interpolated from the volatility smile table. With $\Delta_F^{c}$ and $\sigma(\Delta_F^{c})$ given, we can directly solve equation (14),

$$\Delta_F^{c} = N(d_+) = N\!\left(\frac{\ln(F/K) + \tfrac{1}{2}\sigma(\Delta_F^{c})^2\,T}{\sigma(\Delta_F^{c})\sqrt{T}}\right) \tag{37}$$

for the strike K. We get

$$K = F\exp\!\left(-\sigma(\Delta_F^{c})\sqrt{T}\;N^{-1}(\Delta_F^{c}) + \tfrac{1}{2}\sigma(\Delta_F^{c})^2\,T\right) \tag{38}$$

Conversion of Strike to Forward Delta. The reverse conversion from strikes to forward deltas is more difficult and can only be achieved numerically. The following algorithm can be shown to converge [1] and has empirically proven to be very efficient.

In the first step, calculate a zero-order guess $\Delta_0$ by using the ATM volatility $\sigma_{\mathrm{atm}}$ in equation (14):

$$\Delta_0 = \Delta_F^{c}(K, \sigma_{\mathrm{atm}}) = N\!\left(\frac{\ln(F/K) + \tfrac{1}{2}\sigma_{\mathrm{atm}}^2\,T}{\sigma_{\mathrm{atm}}\sqrt{T}}\right) \tag{39}$$

In the second step, use this zero-order guess for delta to derive a first-order guess $\sigma_1$ for the volatility by interpolating the curve σ(Δ):

$$\sigma_1 = \sigma(\Delta_0) \tag{40}$$

Finally, calculate the corresponding first-order guess for $\Delta_F^{c}$ in the third step:

$$\Delta_1 = \Delta_F^{c}(K, \sigma_1) = N\!\left(\frac{\ln(F/K) + \tfrac{1}{2}\sigma_1^2\,T}{\sigma_1\sqrt{T}}\right) \tag{41}$$

and repeat steps two and three until the changes in Δ from one iteration to the next are below the specified accuracy.
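Both directions of the unadjusted forward-delta conversion can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the function `smile` is a hypothetical stand-in for interpolation in the volatility smile table, and all names and numbers are made up for the example.

```python
from math import exp, log, sqrt
from statistics import NormalDist

_std = NormalDist()  # standard normal: cdf is N, inv_cdf is N^{-1}


def smile(delta_call, sigma_atm=0.10):
    """Hypothetical sigma(Delta) curve; a real one would interpolate the smile table."""
    return sigma_atm + 0.08 * (delta_call - 0.5) ** 2


def forward_delta(F, K, sigma, T):
    """Unadjusted forward call delta N(d+), equation (14)."""
    return _std.cdf((log(F / K) + 0.5 * sigma**2 * T) / (sigma * sqrt(T)))


def strike_from_forward_delta(F, delta_call, T, sigma_of_delta=smile):
    """Closed-form conversion from forward call delta to strike, equation (38)."""
    sigma = sigma_of_delta(delta_call)
    return F * exp(-sigma * sqrt(T) * _std.inv_cdf(delta_call) + 0.5 * sigma**2 * T)


def forward_delta_from_strike(F, K, T, sigma_atm, sigma_of_delta=smile,
                              tol=1e-12, max_iter=100):
    """Iteration of equations (39) to (41); returns (delta, volatility)."""
    delta = forward_delta(F, K, sigma_atm, T)          # zero-order guess, (39)
    for _ in range(max_iter):
        sigma = sigma_of_delta(delta)                  # volatility update, (40)
        new_delta = forward_delta(F, K, sigma, T)      # delta update, (41)
        if abs(new_delta - delta) < tol:
            return new_delta, sigma
        delta = new_delta
    raise RuntimeError("delta iteration did not converge")


# Round trip with made-up inputs: 25-delta call -> strike -> delta
F, T = 1.30, 1.0
K_25 = strike_from_forward_delta(F, 0.25, T)
delta_back, sigma_back = forward_delta_from_strike(F, K_25, T, sigma_atm=0.10)
print(K_25, delta_back, sigma_back)
```

For this toy smile the iteration recovers the 25-delta point it started from. Convergence for an arbitrary interpolated smile is not guaranteed by this sketch; the article refers to [1] for that.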

Forward Delta Premium Adjusted

For the sake of clarity, we again restrict the discussion below to the case of call deltas. The algorithms for put deltas work analogously.

Conversion of Forward Delta Premium Adjusted to Strike. The conversion from forward delta premium adjusted to an absolute strike is more complicated than in the case of an unadjusted delta and cannot be formulated in a closed-form expression. The reason is that in the equation for the premium-adjusted call delta (23),

$$\Delta_{F,\mathrm{pa}}^{c} = \frac{K}{F}\,N(d_-) = \frac{K}{F}\,N\!\left(\frac{\ln(F/K) - \tfrac{1}{2}\sigma(\Delta_{F,\mathrm{pa}}^{c})^2\,T}{\sigma(\Delta_{F,\mathrm{pa}}^{c})\sqrt{T}}\right) \tag{42}$$

the strike K appears inside and outside the cumulative normal distribution function so that one cannot solve directly for K. Even though both $\Delta_{F,\mathrm{pa}}^{c}$ and $\sigma(\Delta_{F,\mathrm{pa}}^{c})$ are given, the problem has to be solved numerically.

So when converting a given call delta $\Delta_{F,\mathrm{pa}}^{c}$, a root finder has to be used to solve for the corresponding strike K. This could, for example, be a simple bisection method where, for call deltas, the ATM strike $K_{\mathrm{atm}}$ is the lower bound and some high (i.e., quasi-infinite) value such as $100\,K_{\mathrm{atm}}$ can be used as upper bound. Of course, a more elaborate root finder to solve this problem could (and should!) be used, but a discussion of the various methods lies beyond the scope of this article.

Note, however, that all these methods require that Δ is a strictly monotonic function of K. We will see in section Ambiguities in the Conversion from Δ to Strike for Premium-adjusted Deltas that this is not always the case.

Conversion of Strike to Forward Delta Premium Adjusted. The conversion of an absolute strike to forward delta premium adjusted can be done analogously to the conversion into an unadjusted forward delta as described in section Conversion of Strike to Forward Delta.

First, use the previous guess $\Delta_{i-1}$ to obtain an improved guess $\sigma_i$ for the volatility by interpolating in the σ(Δ) curve.

$$\sigma_i = \sigma(\Delta_{i-1}) \tag{43}$$

In the second step, calculate the corresponding guess $\Delta_i$ by

$$\Delta_i = \Delta_{F,\mathrm{pa}}^{c}(K, \sigma_i) = \frac{K}{F}\,N\!\left(\frac{\ln(F/K) - \tfrac{1}{2}\sigma_i^2\,T}{\sigma_i\sqrt{T}}\right) \tag{44}$$

Iterate these two steps until the change in Δ from one step to the next is below the specified accuracy. A good initial guess for the volatility is, of course, again the ATM volatility $\sigma_{\mathrm{atm}}$.

Using the Volatility Smile Table

In FX markets, linear combinations of plain-vanilla FX options, such as “strangles”, “risk reversals”, and “butterflies”, are liquidly traded. These instruments are composed of ATM and OTM plain-vanilla put and call options at specific values of delta (typically, 0.25 or 0.1). When aggregating this market information, one obtains a scheme called the volatility smile table consisting of rows for each FX option expiry date, a column for ATM volatilities, and two distinct sets of columns for volatilities of OTM put and call options. We call these sets the put and call sides of the table.

Thus, OTM options can be priced by retrieving volatilities from the respective side of the volatility smile table, for example, OTM calls are priced using volatilities from the call side. By virtue of the put–call parity, ITM options can be priced using volatilities from the opposite side of the table, that is, ITM calls are priced using volatilities from the put side.

To exemplify this, consider the case of an option with arbitrary time to maturity and strike. The typical procedure to retrieve this option’s volatility from the smile table would be the following.

1. Determine the volatilities of the ATM point and the call and put sides at the option’s expiry. For this, the volatilities of each delta column will, in general, have to be interpolated in time.
2. Decide which side of the table to use depending on the option’s strike K: options with K >

$K_{\mathrm{atm}}$ are either OTM calls or ITM puts and are therefore priced using volatilities from the call side. Accordingly, options with K < $K_{\mathrm{atm}}$ are priced using volatilities from the put side (cf. equation (4)).
3. Convert the option’s strike to delta. This depends on the side of the table chosen in the previous step: convert to a call delta if K > $K_{\mathrm{atm}}$, and to a put delta if K < $K_{\mathrm{atm}}$. See the section Converting Deltas to Strikes and Vice Versa for the details of these conversions.
4. Retrieve the volatility from the table by interpolating volatilities in delta.

Alternatively, one could also translate the full volatility smile table from deltas to strikes. The conversion of deltas to strikes would then be necessary for the grid points of the table only, thus, making steps 2 and 3 of the earlier listed procedure obsolete. The interpolation in step 4 would be done in strikes. However, keep in mind that the strike grid points would vary from row to row.

It is important to note that the earlier procedure is based on the assumption that delta is a strictly monotonic function of the option’s strike K: only in this case are the option’s delta and strike equivalent measures of the option’s moneyness, that is, only in this case are we guaranteed the equivalence

$$K > K_{\mathrm{atm}} \iff \Delta^{c} < \Delta^{c}_{\mathrm{atm}} \tag{45}$$

In the following section, we will show that the assumption of monotonicity is not always true and derive the conditions under which it is violated.

Problems and Pitfalls

Interpolation in Time Dimension when Delta Conventions Change

In FX markets, it is common to switch delta conventions with an option’s time to maturity. For example, volatilities for options with less than two years to maturity are often quoted in terms of spot deltas, whereas volatilities for longer expiries are usually quoted in terms of one of the forward delta conventions.

When conventions change from one expiry $T_k$ to the next one, it is a priori unclear how an interpolation in time on a delta-volatility table should be performed. Which of the two conventions shall be used for options with expiries t within the range $T_k < t < T_{k+1}$?

Some possibilities are as follows: convert the grid points at $T_k$ into the convention used for $T_{k+1}$ and do interpolation in the long-term convention or, conversely, convert the grid points at $T_{k+1}$ into the convention used for $T_k$ and interpolate in the short-term convention. Another possibility would be to translate the delta grid into a strike grid and do the interpolation on strikes.

None of these approaches is a priori superior to the others. In real life, however, a choice has to be made. Even though the differences may be small, one should be aware that the choice is arbitrary.

ATM Delta Falling in the OTM Range

For long times to maturity, the delta of an ATM FX option may become smaller than the delta of the closest OTM option. In a sense, the ATM delta “crosses” the nearest OTM delta. In this case, it is unclear how the interpolation of volatilities (in delta) should be done or whether the ATM point or the crossed delta point should possibly be ignored.

The conditions under which this problem occurs vary with the delta and ATM types. In the following, we outline the derivation of these conditions for one example combination (premium-adjusted spot deltas and forward ATM definition), summarize the results for other combinations, and finally discuss a typical numerical example for long-dated FX options.

Exemplary Derivation. We start from equation (17) for the ATM delta using $K_{\mathrm{atm}} = F$:

$$\Delta_{S,\mathrm{pa}}^{c}(F, \sigma_{\mathrm{atm}}) = D_{\mathrm{for}}\,N\!\left(-\frac{\sigma_{\mathrm{atm}}}{2}\sqrt{T}\right) \tag{46}$$

Obviously, the ATM delta will decrease with decreasing values of $D_{\mathrm{for}}$ and increasing values of $\sigma_{\mathrm{atm}}\sqrt{T}$. Small values of $D_{\mathrm{for}}$ and/or large values of $\sigma_{\mathrm{atm}}\sqrt{T}$ will therefore lead to ATM deltas that are smaller than the nearest OTM point $\Delta_1^{c}$ (which is usually $\Delta_1^{c} = 0.25$).

From the expression mentioned earlier, it follows immediately that the ATM delta is larger than the first

OTM point $\Delta_1^{c}$ only if

$$\sigma_{\mathrm{atm}}\sqrt{T} < -2\,N^{-1}\!\left(\frac{\Delta_1^{c}}{D_{\mathrm{for}}}\right) = 2\,N^{-1}\!\left(1 - \frac{\Delta_1^{c}}{D_{\mathrm{for}}}\right) \tag{47}$$

This inequality has meaningful solutions only if the right-hand side is positive, that is, only if $1 - \Delta_1^{c}/D_{\mathrm{for}} > 1/2$. We therefore have the additional constraint:

$$D_{\mathrm{for}} > 2\,\Delta_1^{c} \tag{48}$$

The derivations for all other possible combinations are analogous. Their results are summarized in Table 2.

Table 2 Restrictions on discount factors and ATM volatilities for various combinations of delta and ATM types(a)

Delta type | ATM-type forward | ATM-type delta neutral
Spot delta | $D_{\mathrm{for}} > -2\Delta_1^{p}$ and $\sigma_{\mathrm{atm}}\sqrt{T} < 2N^{-1}\bigl(1 + \Delta_1^{p}/D_{\mathrm{for}}\bigr)$ | $D_{\mathrm{for}} > 2\Delta_1^{c}$
Forward delta | $\sigma_{\mathrm{atm}}\sqrt{T} < 2N^{-1}\bigl(1 + \Delta_1^{p}\bigr)$ | No constraint
Spot delta p.a. | $D_{\mathrm{for}} > 2\Delta_1^{c}$ and $\sigma_{\mathrm{atm}}\sqrt{T} < 2N^{-1}\bigl(1 - \Delta_1^{c}/D_{\mathrm{for}}\bigr)$ | $D_{\mathrm{for}} > 2\Delta_1^{c}$ and $\sigma_{\mathrm{atm}}\sqrt{T} < \sqrt{2\ln\bigl(D_{\mathrm{for}}/(2\Delta_1^{c})\bigr)}$
Forward delta p.a. | $\sigma_{\mathrm{atm}}\sqrt{T} < 2N^{-1}\bigl(1 - \Delta_1^{c}\bigr)$ | $\sigma_{\mathrm{atm}}\sqrt{T} < \sqrt{2\ln\bigl(1/(2\Delta_1^{c})\bigr)}$

(a) If these conditions are violated, the ATM point will cross the nearest OTM point. Note that the put delta $\Delta_1^{p}$ is negative.

Numerical Example. To gain some intuition for these constraints, let us consider a numerical example for a relevant case. A typical combination of delta and ATM conventions for long-dated FX options is the ATM-type delta neutral for premium-adjusted forward deltas. Usually, the first OTM point is quoted at $\Delta_1^{c} = 0.25$. Inserting this value into the respective formula (see Table 2) yields the condition

$$\sigma_{\mathrm{atm}}\sqrt{T} < \sqrt{2\ln\!\left(\frac{1}{2\Delta_1^{c}}\right)} = \sqrt{2\ln(2)} \approx 1.1774 \tag{49}$$

For a 30-year option, this condition restricts the ATM volatility to a maximum of 21.5%, a value that, a priori, does not seem unattainable.

Ambiguities in the Conversion from Δ to Strike for Premium-adjusted Deltas

Premium-adjusted deltas can cause further complications. Recall from the section Forward Delta Premium Adjusted

$$\Delta_{F,\mathrm{pa}}^{c} = \frac{K}{F}\,N(d_-) \quad\text{where}\quad d_- = \frac{\ln(F/K) - \tfrac{1}{2}\sigma^2\tau}{\sigma\sqrt{\tau}} \tag{50}$$

and note that $\Delta_{F,\mathrm{pa}}^{c}$ is not a monotonic function of the strike K and therefore not invertible for all strikes, as illustrated by Figure 1(a). Thus, while $\Delta_{F,\mathrm{pa}}^{c}$ can be calculated for any strike K, the reverse is not true: for any $\Delta_{F,\mathrm{pa}}^{c} < \Delta_{\max}$, there are two strikes $K_1 \neq K_2$ with $\Delta_{F,\mathrm{pa}}^{c}(K_1) = \Delta_{F,\mathrm{pa}}^{c}(K_2)$. In the following, $K_{\max}$ denotes the strike at which the call delta attains its maximum $\Delta_{\max}$.

In case $K_{\max} < K_{\mathrm{atm}}$, there is no problem. $\Delta_{F,\mathrm{pa}}^{c}(K)$ is a monotonic function of K for all K > $K_{\mathrm{atm}}$ and is therefore directly related to the option’s moneyness: the smaller a call option’s delta value, the higher its strike and the deeper it is OTM. If desired, the volatility smile table defined in terms of delta can be translated uniquely into a smile table in strikes. Besides, when retrieving volatilities from the call side of the volatility

smile table, this can be done by direct interpolation of the entries, possibly including the ATM point.

[Figure 1: panel (a) plots the premium-adjusted forward call delta and the market quotes against strike, with $K_{\mathrm{atm}}$, $K_{\max}$, and K(Δ = 0.25) marked on the strike axis and $\Delta_{\max}$, $\Delta_{\mathrm{atm}}$, and 0.25 on the delta axis; panel (b) plots $f_\alpha(x)$ against x.]

Figure 1 (a) Premium-adjusted (forward) call delta as a function of strike. (b) The function $f_\alpha(x)$ as defined in equation (56) for α = 0.7 (broken line) and α = 1.1 (solid line)

In case $K_{\max} > K_{\mathrm{atm}}$, however, the situation is more complicated. $\Delta_{F,\mathrm{pa}}^{c}(K)$ is no longer a monotonic function of K and the conversion of deltas to strikes is no longer unique for all OTM options. Thus, when translating a smile table in deltas into a smile table in strikes, particular care has to be taken. In addition, when retrieving the volatility for options with strikes $K \in (K_{\mathrm{atm}}; K_{\max})$ from the volatility smile table, one has to extrapolate volatilities in delta beyond the ATM point, which seems odd.

Note that the counterintuitive extrapolation in delta beyond the ATM point does not occur in case the volatility smile table in deltas is translated to a table in strikes on the grid points first (see the section Using the Volatility Smile Table) and the interpolation is done in strikes. This is possible as long as there is no crossing of the ATM point and the closest delta grid point as discussed in the section ATM Delta Falling in the OTM Range.

In the following discussion, we show that the case $K_{\max} > K_{\mathrm{atm}}$ can, indeed, occur. We restrict ourselves to premium-adjusted forward call deltas and ATM-type delta neutral. The conditions for the ATM-type forward can be derived analogously and we summarize the results at the end of this section. Similar arguments hold for premium-adjusted spot deltas; however, it is reasonable to assume that the problems outlined in the section ATM Delta Falling in the OTM Range surface beforehand.

Starting from the expressions for the ATM strike and the call delta, see Table 1,

$$K_{\mathrm{atm}} = F\,e^{-\frac{1}{2}\sigma_{\mathrm{atm}}^2 T} \tag{51}$$

$$\Delta^{c}_{\mathrm{atm}} = \Delta_{F,\mathrm{pa}}^{c}\bigl(K_{\mathrm{atm}}, \sigma_{\mathrm{atm}}\bigr) = \tfrac{1}{2}\,e^{-\frac{1}{2}\sigma_{\mathrm{atm}}^2 T} \tag{52}$$

we need to find conditions under which there is a second solution with strike K > $K_{\mathrm{atm}}$ that solves

$$\Delta^{c}_{\mathrm{atm}} = \Delta_{F,\mathrm{pa}}^{c}\bigl(K, \sigma(\Delta_{F,\mathrm{pa}}^{c})\bigr) \tag{53}$$

This may seem a difficult problem at first glance since the delta on the right-hand side depends on the volatility, which itself is given in terms of delta. It is, however, simplified considerably by the following argument: while we do not know the strike for the point that solves equation (53), we do know the delta there: it is the ATM delta. As the volatility is given in terms of delta, we also know the volatility on the right-hand side of equation (53): it must be the ATM volatility $\sigma_{\mathrm{atm}}$.

Therefore, the problem can be reformulated as follows: when does

$$\Delta^{c}_{\mathrm{atm}} = \Delta_{F,\mathrm{pa}}^{c}(K, \sigma_{\mathrm{atm}}) \tag{54}$$

have a second solution with K > $K_{\mathrm{atm}}$ (besides the trivial one at K = $K_{\mathrm{atm}}$)?

Inserting the expressions for the ATM delta and the premium-adjusted forward call delta into equation

(54) we get In other words, for a 30-year option, with ATM


volatilities larger than 14.6%, the conversion between
 
1 −1/2σ 2 T K ln (F /K) − 1/2σ atm
2
T F, pa and strikes becomes ambiguous.
e atm = N √ (55) A similar condition can be derived for the forward
2 F σatm T ATM type. The major difference is that the function
With the definitions fα (x) takes on a different form that ultimately yields
the following expression for the turnover point α at
√ which the conversion becomes ambiguous:
α := σatm T ,
K 1/2σ 2 T K α 2 /2  α 1  α
x := e atm = e N − = N − (60)
F F 2 α 2
the problem is equivalent to finding the roots of This equation can be solved numerically and
yields the constraint

1 ln(x)
fα (x) := e −α 2 /2
− xN − (56) √
2 α α = σatm T < 1.224 (61)
The condition K > Katm in the previous equation so that for a 30-year option, ATM volatilities above
corresponds to x > 1. Plotting fα (x) for various 22.3% will lead to ambiguities.
values of α, see Figure 1(b), we make the following
observation: for small values of α the function
End Notes
increases monotonically, taking on only positive
values for x > 1. For larger values of α, the function a.
It should be noted that equation (32) was obtained under
decreases at first, thus, taking on negative values in the assumption that the slope of the volatility σ as a
a certain range, and then increases again, eventually function of the strike K in equation (31) is 0 ATM. This is
reaching a second zero. necessary as otherwise this ATM definition would become
Therefore, the question reduces further to this: impracticable. In general, however, the volatility smile
under which conditions does the function fα (x) have implies a nonzero slope and the maximum value of γ will
a negative slope at x = 1? The first derivative of fα not be found at the strike given by equation (32).
can easily be calculated,
References

ln(x) 1 ln(x)
fα (x) = − e−α
2
/2
N − − N − [1] Borowski, B. (2005). Hedgingverfahren für Foreign
α α α
Exchange Barrieroptionen, Diploma Thesis, Technical
(57) University of Munich.
[2] Carr, P. & Bandyopadhyay, A. (2000). How to derive the
and the slope of fα at x = 1 is now readily obtained Black–Scholes Equation Correctly? http://faculty.chica-
as gogsb.edu/akash.bandyopadhyay/research/ (accessed Mar

2000).
1 [3] Hull, J.C. (1997). Options, Futures and Other Derivative
fα (x = 1) = − e−α /2 N (0) − N (0)
2
Securities, 3rd Edition, Prentice Hall, NJ.
α

[4] Wystup, U. (2006). FX Options and Structured Products,
1 1 1 Wiley.
= − e−α /2
2
− √ (58)
2 α 2π
The constraint that it has to be positive yields the Related Articles
condition
Foreign Exchange Markets; Foreign Exchange

√ 2 Symmetries; Foreign Exchange Smile Interpol-
α = σatm T < (59) ation.
π
which restricts the ATM volatility for a 30-year option to only 14.6%.

CLAUS CHRISTIAN BEIER & CHRISTOPH RENNER
Stochastic Volatility Models: Foreign Exchange

In finance, the term volatility has a broad meaning. There exists a definition in the statistical sense, which states that volatility is the standard deviation per unit time (usually per year) of the (logarithmic) asset returns. However, empirical evidence about derivatives markets shows that refinements of this definition are necessary.

First, one can observe a dependency of Black–Scholes-implied volatility on at least the strike price and time to maturity implicit in option prices. This dependency defines the implied volatility surface. One possibility to incorporate this dependency into a model is by using a deterministic function for the instantaneous volatility, that is, the volatility governing the changes in spot returns in infinitesimal time steps. This function of spot price and time is called local volatility (see Local Volatility Model).

In addition, empirical evidence indicates that the local volatility surface is not constant over time, but is subject to changes. This is not surprising since the expectations of the market participants with respect to the future instantaneous volatility might change over time. Furthermore, for some derivative products, the dynamics of the volatility surface is crucial for a reliable valuation. A prominent example is the product class of cliquet options, which are basically a collection of forward start options with increasing forward start time. The payoff depends on the absolute or the relative performance of the underlying during the lifetime of the options. It is intuitive that an option with a forward start time of one year, for instance, depends substantially on the one-year forward volatility surface. Since this volatility is uncertain, it seems advisable to model this risk by an additional stochastic factor.

Stochastic Volatility FX Models

Now, we review some common procedures for incorporating a stochastic behavior of volatility into foreign exchange (FX) rate models. The first approach is to model the FX rate $X_t$ with a volatility process $v_t$ by a system of stochastic differential equations, like

$$
\begin{aligned}
dX_t &= \mu_t X_t\,dt + \sigma_t X_t\,dW_t^X \\
dv_t &= \eta(v_t)\,dt + \zeta(v_t)\,dW_t^v
\end{aligned}
\tag{1}
$$

where $v_t = f(\sigma_t)$ for a function $f$ and the increments of the Brownian motions $W^X$ and $W^v$ are possibly correlated. Table 1 gives an overview of some common stochastic volatility models.

A general class of stochastic volatility models is formed by the affine jump diffusion models. They have been studied by Duffie et al. [5]. The Heston model is a special case of this kind of model.

The second approach is inspired by the observation that higher volatility comes along with an increased trading activity and vice versa. This is realized by a time change in the FX rate process. For instance, consider a standard Brownian motion $\{W_t\}_{t\ge 0}$ with variance $t$ for $W_t$, that is, the value of the process after $t$ units of physical time. Now, if the economic time elapses twice as fast as the physical time due to market activity, the process could be expressed by the deterministically time-changed process $\{W_{2t}\}_{t\ge 0}$. Hence, the variance for the values of the process after $t$ units of physical time would then be $2t$. This idea can be generalized by representing the economic time as a stochastic process $Y_t$, which is named stochastic clock. For every realization of the process $\{Y_t\}_{t\ge 0}$, economic time must be a monotone function of physical time, that is, the process is a subordinator. Recently proposed models use a normal inverse Gaussian or variance gamma Lévy process for representing the exchange rate process and an integrated Cox–Ingersoll–Ross or Gamma–Ornstein–Uhlenbeck process for stochastic time. All these models have the common feature that the characteristic function of the logarithm of the time-changed exchange rate, $\ln X_{Y_t}$, can be expressed in closed form.

Besides the models mentioned here, there exist other modeling approaches. For a general overview of stochastic volatility models, see [6, 11] and, for the FX context in particular, see [12].

Heston’s Stochastic Volatility Model

In the following, we discuss the stochastic volatility model of Heston and its option valuation applied to

an FX setting. The model is characterized by the stochastic differential equations:

$$
\begin{aligned}
dX_t &= (r_d - r_f) X_t\,dt + \sqrt{v_t}\,X_t\,dW_t^X \\
dv_t &= \kappa(\theta - v_t)\,dt + \sigma\sqrt{v_t}\,dW_t^v
\end{aligned}
\tag{2}
$$

with $\mathrm{Cov}\left[dW_t^X, dW_t^v\right] = \rho\,dt$. Here, the FX rate process $\{X_t\}_{t\ge 0}$ is modeled by a process, similar to the geometric Brownian motion, but with a nonconstant instantaneous variance $v_t$. The variance process $\{v_t\}_{t\ge 0}$ is driven by a mean-reverting stochastic square-root process. The increments of the two Wiener processes $\{W_t^X\}_{t\ge 0}$ and $\{W_t^v\}_{t\ge 0}$ are assumed to be correlated with rate $\rho$. In an FX setting, the risk-neutral drift term of the underlying process is the difference between the domestic and the foreign interest rates $r_d - r_f$. The quantities $\kappa \ge 0$ and $\theta \ge 0$ denote the rate of mean reversion and the long-term variance. The parameter $\sigma$ is often called vol of vol, but it should be called volatility of instantaneous variance.

The term $\sqrt{v_t}$ in equation (2) ensures a nonnegative volatility in the FX rate process. It is known that the distribution of values of $\{v_t\}_{t\ge 0}$ is given by a noncentral chi-squared distribution. Hence, the probability that the variance takes a negative value is equal to zero. Thus, if the process touches the zero bound, the stochastic part of the volatility process turns zero and the deterministic part will ensure a nonnegative volatility because of the positivity of $\kappa$ and $\theta$.

The Heston model is often not capable of fitting complicated structures of implied volatility surfaces. In particular, this is true if the term structure exhibits a nonmonotone form or the sign of the skew changes with increasing maturity. For a discussion of the implied volatility surface generated by this model, see [7]. One approach to tackle this limitation is to extend the original Heston model by time-dependent parameters [3, 14].

Table 1 Some common stochastic volatility models

f(σ_t)      η(v_t)              ζ(v_t)         Reference
σ_t^2       κ(θ − v_t)          ξ v_t          GARCH similar diffusion model [11]
σ_t^2       κ(θ − v_t)          ξ √v_t         Heston model [9]
σ_t^2       κ(θ v_t − v_t^2)    ξ v_t^(3/2)    3/2 model [11]
ln σ_t^2    κ(θ − v_t)          ξ              Log-volatility Ornstein–Uhlenbeck [11]

Valuation of Options in the Heston Model

For the valuation of options in the Heston model, we consider the value function of a general contingent claim $V(t, v, X)$. As shown in [8], applying Itô’s lemma, the self-financing condition, and the possibility to trade in the underlying exchange rate, money market, and another option, which is dependent on time, volatility, and $X$, we arrive at Garman’s partial differential equation:

$$
\frac{\partial V}{\partial t} + \kappa(\theta - v)\frac{\partial V}{\partial v} + (r_d - r_f)X\frac{\partial V}{\partial X} + \frac{1}{2}\sigma^2 v\frac{\partial^2 V}{\partial v^2} + \frac{1}{2}vX^2\frac{\partial^2 V}{\partial X^2} + \rho\sigma vX\frac{\partial^2 V}{\partial v\,\partial X} - r_d V = 0
\tag{3}
$$

A solution to the above equation can be obtained by specifying appropriate boundary and exercise conditions, which depend on the contract specifications. In the case of European vanilla options, Heston [9] provided a closed-form solution, namely,

$$
\text{Vanilla} = \phi\left[e^{-r_f\tau}X_t P_1 - K e^{-r_d\tau}P_2\right]
\tag{4}
$$

where $\tau = T - t$ is the time to maturity, $\phi = \pm 1$ is the call–put indicator and $K$ is the strike price. The quantities $P_1$ and $P_2$ define the probability that the exchange rate $X$ at maturity is greater than $K$ under the spot and the risk-neutral measure, respectively. The spot delta of the European vanilla option is equal to $\phi e^{-r_f\tau}P_1$.

Assuming that the distribution of $\ln X_T$ at time $t$ under the two different measures is determined uniquely by its characteristic function $\varphi_j$, for $j = 1, 2$, it is shown, in [15], that $P_1$ and $P_2$ can be expressed in terms of the inverse Fourier transformation

$$
P_j = \frac{1}{2} + \phi\,\frac{1}{\pi}\int_0^\infty \Re\left[\frac{\exp(-iu\ln K)\,\varphi_j(u)}{iu}\right]du
\tag{5}
$$

The integration in equation (5) can be done using numerical integration methods such as Gauss–Laguerre integration or fast Fourier transform approximation. In [10], it is shown that the computational time of the fast Fourier transform approach to compute vanilla option prices is higher

compared to a numerical integration method with certain caching techniques.

The characteristic function is exponentially affine and available in closed form as

$$
\varphi_2(u) = \exp\left(B(u) + A(u)\,v_t + iu\ln X_t\right)
\tag{6}
$$

The functions $A$ and $B$ arise as the solution of the so-called Riccati differential equations as shown in [8]. They are defined as follows:

$$
A(u) = -iu(1 - iu)\,\frac{1 - e^{-d(u)\tau}}{\gamma(u)}
\tag{7}
$$

$$
B(u) = iu(r_d - r_f)\tau + \frac{\kappa\theta}{\sigma^2}\left(\kappa - iu\rho\sigma - d(u)\right)\tau + \frac{2\kappa\theta}{\sigma^2}\ln\frac{2d(u)}{\gamma(u)}
\tag{8}
$$

with $d(u) = \sqrt{(\rho\sigma iu - \kappa)^2 + iu\sigma^2(1 - iu)}$ and $\gamma(u) = d(u)\left(1 + e^{-d(u)\tau}\right) + (\kappa - iu\rho\sigma)\left(1 - e^{-d(u)\tau}\right)$. The characteristic function $\varphi_1$ has the same form as the function $\varphi_2$, but with $u$ replaced by $u - i$ and multiplied by a factor $\exp(-(r_d - r_f)\tau - \ln X_0)$, so that $\varphi_1(0) = 1$. This is due to the change from the spot to the risk-neutral measure in the derivation of $\varphi_1$.

There exist several different representations of the characteristic function $\varphi$. In some formulations of $\varphi$, the characteristic function can become discontinuous if the multivalued complex logarithm contained in the integrand is restricted to the calculation of its principal branch, as is the case in many implementations. Wrong results of the value $P_j$ may occur unless a rotation count algorithm is employed. For other representations of $\varphi$, stability for all choices of model parameters can be proved. Details can be found in [1].

Besides vanilla options, closed-form solutions for exotic options have been found for the volatility option, the correlation option, the exchange option, the forward start option, the American option, the discrete barrier option, and others. Numerical pricing of exotic options in the Heston model can be carried out by using conventional numerical methods such as Monte Carlo simulation [2, 13], finite differences [8], or an exact simulation method [4].

Calibration of Heston’s Model

We realize the estimation by fitting the Heston model parameters to the smile of the current vanilla option market. Thereby, the choice of the loss function to minimize the differences between the model and market Black–Scholes-implied volatilities is crucial. Here, we decide to do a least-squared error fit over absolute values of volatilities, rather than minimizing over relative volatilities or option values.

For a fixed time to maturity $\tau$, given market-implied volatilities $\sigma_1^{\text{market}}, \ldots, \sigma_n^{\text{market}}$ and corresponding spot delta (premium unadjusted) values $\Delta_1, \ldots, \Delta_n$, the calibration is set up as follows:

1. Before starting the optimization, we determine the strikes $K_i$ corresponding to $\sigma_i^{\text{market}}$ with

$$
K_i = X_0\exp\left\{-\phi N^{-1}\!\left(\phi e^{r_f\tau}\Delta_i\right)\sigma_i^{\text{market}}\sqrt{\tau} + \left(r_d - r_f + \frac{1}{2}\left(\sigma_i^{\text{market}}\right)^2\right)\tau\right\}, \quad 1 \le i \le n
\tag{9}
$$

   which requires the inversion of the cumulative normal distribution function $N$.
2. The aim is to minimize the objective function $M$ defined below. We repeat the steps (a)–(c) until a certain accuracy in the optimization routine is achieved.
   a. We use the analytic formula in equation (4) to calculate the vanilla option values in the Heston model for the strikes $K_1, \ldots, K_n$:

$$
H_i(\kappa, \sigma, \theta, \rho, v_0) = \text{Vanilla}(\kappa, \sigma, \theta, \rho, v_0, \text{market data}, K_i, \phi)
\tag{10}
$$

   b. For $i = 1, \ldots, n$, we compute all option values $H_i$ in terms of Black–Scholes-implied volatilities $\sigma_i^{\text{model}}(\kappa, \sigma, \theta, \rho, v_0)$ by applying a root search.

   c. The objective function is given as

$$
M(\kappa, \sigma, \theta, \rho, v_0) = \sum_{i=1}^{n} w_i\left(\sigma_i^{\text{market}} - \sigma_i^{\text{model}}(\kappa, \sigma, \theta, \rho, v_0)\right)^2 + \text{penalty}
\tag{11}
$$

The implementation of a penalty function penalty and some weights $w_i$ may give the calibration routine some additional stability. There exist various choices for the penalty function. For example, in [14], it is suggested to penalize the retraction from the initial set of model parameters, but we may also use the penalty to introduce further constraints such as the condition $2\kappa\theta - \sigma^2 > 0$ to ensure that in subsequent simulations the volatility process cannot sojourn in zero. In addition, we could use the weights $w_i$ to favor at-the-money (ATM) or out-of-the-money fits.

For the minimization, a great variety of either local or global optimizers in multidimensions could be used. Algorithms of Levenberg–Marquardt type are frequently used, because they utilize the least-squares form of the objective function. Since the objective function is usually not convex, there may exist many local extrema and the use of a computationally more expensive global (stochastic) algorithm, such as simulated annealing or differential evolution, in the calibration routine may be considered. From a practical point of view, taking the value of a short-dated implied volatility as an initial value for $v_0$ is a good start for the calibration. In light of parameter stability, the result of the previous (the day before) calibration could be used as an initial guess for the remaining parameters. Furthermore, to enhance the speed of calibration, it is suggested in [8] to fix the model parameter $\kappa$ and run the calibration in only four dimensions, since the influence of the mean reversion is often compensated by a stronger volatility of variance $\sigma$. To ensure that the correlation parameter $\rho$ attains values in $[-1, 1]$, we reparametrize with the function $2\arctan(\rho)/\pi$.

Hedging

If volatility is introduced as a stochastic factor but cannot be traded, the market is incomplete. By the introduction of a tradable market instrument $U(t, v, X)$, which depends on the volatility, the market can be completed and volatility risk can be hedged dynamically. In the Heston model, to make a portfolio containing the contingent claim $V(t, v, X)$ instantaneously risk free, the hedge portfolio has to consist of $\Delta_X$ units of the foreign currency and $\Delta_U$ units of the contingent claim $U(t, v, X)$, with

$$
\Delta_X = \frac{\partial V}{\partial X} - \Delta_U\frac{\partial U}{\partial X} \quad\text{and}\quad \Delta_U = \frac{\partial V/\partial v}{\partial U/\partial v}
\tag{12}
$$

Common FX market instruments applicable for market completion and hedging are, on the one hand, ATM forward plain vanilla options for different maturities. On the other hand, for most of the FX markets risk reversals (RR) and butterflies (BF) are traded for certain maturities and strikes. These instruments are defined as follows (see end note a):

$$
RR(T, \Delta) = \text{Call}(T, K_\Delta) - \text{Put}(T, K_{-\Delta})
\tag{13}
$$

$$
BF(T, \Delta) = \frac{1}{2}\left[\text{Call}(T, K_\Delta) + \text{Put}(T, K_{-\Delta})\right] - \text{Call}(T, K_{ATM})
\tag{14}
$$

where $K_\Delta$ is the strike as given in equation (9), such that the corresponding plain vanilla option has a Black–Scholes delta of $\Delta$. $K_{ATM}$ denotes the ATM strike, which is often taken to be the strike generating a zero delta for a straddle in FX markets. Risk reversals and butterflies are quoted in Black–Scholes-implied volatilities instead of prices, that is, if $\nu_\Delta$ denotes the implied volatility of a call with strike $K_\Delta$ (analogously, $\nu_{-\Delta}$ for puts and $\nu_{ATM}$), the FX smile quotes are

$$
\nu_{RR} = \nu_\Delta - \nu_{-\Delta} \quad\text{and}\quad \nu_{BF} = \frac{1}{2}\left(\nu_\Delta + \nu_{-\Delta}\right) - \nu_{ATM}
\tag{15}
$$

Since individual ATM options are liquidly traded, and therefore $\nu_{ATM}$ is known, the volatilities $\nu_\Delta$ and $\nu_{-\Delta}$ can be calculated from $\nu_{RR}$ and $\nu_{BF}$.

The quantities $\nu_{RR}$ and $\nu_{BF}$ relate to the skew and smile, respectively, of the implied volatility surface. This is schematically illustrated for $\Delta = 0.25$ in Figure 1.
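Step (a) of the calibration evaluates the closed-form value of equation (4) many times. The following is a minimal sketch of that evaluation, assuming the representation of the characteristic function given in equations (6)–(8) and a plain midpoint rule for the integral in equation (5). The function names, the truncation point u_max, and the number of integration steps n are illustrative choices, not part of the original text. A production implementation would rather use one of the integration schemes mentioned above. The Garman–Kohlhagen function is included only as a Black–Scholes benchmark.

```python
import cmath
import math
from statistics import NormalDist


def heston_cf(u, x0, v0, kappa, theta, sigma, rho, rd, rf, tau):
    """Characteristic function phi_2(u) of ln X_T, equations (6)-(8); u may be complex."""
    iu = 1j * u
    beta = kappa - iu * rho * sigma
    d = cmath.sqrt(beta * beta + iu * sigma**2 * (1 - iu))  # principal root, Re(d) >= 0
    decay = cmath.exp(-d * tau)
    gamma = d * (1 + decay) + beta * (1 - decay)
    a = -iu * (1 - iu) * (1 - decay) / gamma
    b = (iu * (rd - rf) * tau
         + kappa * theta / sigma**2 * (beta - d) * tau
         + 2 * kappa * theta / sigma**2 * cmath.log(2 * d / gamma))
    return cmath.exp(b + a * v0 + iu * math.log(x0))


def heston_vanilla(phi, x0, strike, v0, kappa, theta, sigma, rho, rd, rf, tau,
                   u_max=200.0, n=4000):
    """Vanilla value from equations (4)-(5); phi = +1 for a call, -1 for a put."""
    args = (x0, v0, kappa, theta, sigma, rho, rd, rf, tau)
    forward = x0 * math.exp((rd - rf) * tau)  # equals phi_2(-i), normalizes phi_1
    log_k = math.log(strike)
    h = u_max / n
    int1 = int2 = 0.0
    for j in range(n):
        u = (j + 0.5) * h  # midpoint rule avoids the removable singularity at u = 0
        weight = cmath.exp(-1j * u * log_k) / (1j * u)
        int1 += (weight * heston_cf(u - 1j, *args) / forward).real
        int2 += (weight * heston_cf(u, *args)).real
    p1 = 0.5 + phi * int1 * h / math.pi
    p2 = 0.5 + phi * int2 * h / math.pi
    return phi * (math.exp(-rf * tau) * x0 * p1 - strike * math.exp(-rd * tau) * p2)


def garman_kohlhagen(phi, x0, strike, vol, rd, rf, tau):
    """Black-Scholes value of an FX vanilla option, used here only as a benchmark."""
    cdf = NormalDist().cdf
    d1 = (math.log(x0 / strike) + (rd - rf + 0.5 * vol**2) * tau) / (vol * math.sqrt(tau))
    d2 = d1 - vol * math.sqrt(tau)
    return phi * (math.exp(-rf * tau) * x0 * cdf(phi * d1)
                  - strike * math.exp(-rd * tau) * cdf(phi * d2))
```

With a very small volatility of variance and $v_0 = \theta$, the Heston value should be close to the Black–Scholes value at volatility $\sqrt{\theta}$, which gives a simple plausibility check. A call such as heston_vanilla(-1, 94.43, 94.43, 0.0228, 0.2170, 0.0444, 0.3410, -0.5927, 0.0065, 0.0139, 1.5) uses the calibrated parameters of the Example section, under the assumption that the ATM strike there equals the spot.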

[Figure 1: implied volatility plotted against delta (put delta −25%, ATM, call delta 25%), with $\nu_{RR}$ and $\nu_{BF}$ marked]

Figure 1 The meaning of $\nu_{RR}$ and $\nu_{BF}$ in the context of the implied volatility curve for a fixed maturity

Example

In this section, we give an example of the difference between Heston and Black–Scholes option prices. Thereby, we consider discrete down-and-out put options written on the USD/JPY exchange rate. The option holder receives a vanilla put option payoff at maturity in 18 months as long as the FX rate does not fall below a given barrier $B$ at the barrier fixing times in 6, 12, and 18 months. Otherwise, the option expires worthless. We compare ATM option prices for barriers of 10%, 50%, 60%, 70%, 80%, and 90% of the spot price.

For the valuation in the Heston model, we calibrate to European plain vanilla options with maturities of 1, 2, 3, 6, 9, 12, and 24 months and strikes with respect to 10% put, 25% put, ATM, 10% call, and 25% call for May 21, 2009 (Figure 2). The weights are set to 1 for ATM strikes, to 0.75 for 25% strikes, and to 0.25 for 10% strikes, since usually options with strikes far from the ATM forward are less liquidly traded. Figure 2(a) demonstrates that the market-implied volatilities (dots) are adequately matched by the calibrated model volatilities (circles connected by lines). Figure 2(b) shows the term structure of market-implied volatilities (dots) and calibrated implied volatilities (circles) for strikes with respect to 25% call, ATM, and 25% put (from

[Figure 2: (a) "USDJPY volatility surface", with axes delta (−25D, ATM, 25D), time to maturity, and volatility; (b) "Volatility term structures", with axes time to maturity and volatility]

Figure 2 Implied volatilities of the Heston model fitted to market volatilities for USD/JPY with maturities of 1, 2, 3, 6, 9, 12, and 24 months and strikes for 10% and 25% put, ATM, and 10% and 25% call. The dots show the market volatilities and the circles the calibrated volatilities. (a) The whole volatility surface. (b) The implied volatility term structure for strikes 25% call, ATM, and 25% put (from bottom to top)

Table 2 Down-and-out put values with at-the-money strike and discrete monitoring at 6, 12, and 18 months

Barrier (% of spot)    90        80        70        60        50        10
Black–Scholes          0.8752    3.7068    5.7670    6.2663    6.3000    6.3012
Heston                 0.6309    1.8723    3.1830    4.3376    5.2202    6.3023

The value of the corresponding plain vanilla put in the Heston model is given by 6.3060.

bottom to top). The resulting model parameters are given by $\kappa = 0.2170$, $\theta = 0.0444$, $\sigma = 0.3410$, $\rho = -0.5927$, and $v_0 = 0.0228$.

As mentioned in the section Valuation of Options in the Heston Model, there exists a value function in (semi-) closed form for discrete barrier options in the Heston model. The distribution of the random variables $\ln X_{t_i}$ at the barrier fixing times $t_i$ can be determined uniquely by the derivation of their joint characteristic function and with the application of Shephard’s theorem given in [15] the required knock-out probabilities can be computed.

For the valuation in the Black–Scholes model, we use the interpolated ATM forward-implied volatility for plain vanilla options with a maturity of 18 months, which is $\sigma_{BS} = 0.1270$ in our example.

Finally, the FX spot trades at $X_0 = 94.43$, and the domestic and foreign interest rates for 18 months are given by $r_d = 0.0065$ and $r_f = 0.0139$. The resulting prices for the described options are shown in Table 2.

Comparing the prices, we can observe two effects. First, the Heston prices are lower than the corresponding Black–Scholes prices. This behavior might be in major part due to the fact that the Black–Scholes valuation uses a flat volatility, whereas the Heston model incorporates the whole volatility smile. Since the volatilities below the ATM strikes increase substantially (Figure 2), the knock-out probabilities also increase and the option prices drop. Second, the prices of Heston and Black–Scholes converge with decreasing barrier level. This appears reasonable, since the likelihood of a knock-out decreases more and more and the valuation finally results in put prices, which should be equal for both models in the case of a good calibration fit.

End Notes

a. There exist different definitions for risk reversals and butterflies in the literature with respect to sign and coefficients.

References

[1] Albrecher, H., Mayer, P., Schoutens, W. & Tistaert, J. (2007). The Little Heston Trap, Wilmott No. 1, pp. 83–92.
[2] Andersen, L. (2007). Efficient Simulation of the Heston Stochastic Volatility Model. Working paper. Available at SSRN: http://ssrn.com/abstract=946405
[3] Benhamou, E., Gobet, E. & Miri, M. (2009). Time Dependent Heston Model. Working paper. Available at SSRN: http://ssrn.com/abstract=1367955
[4] Broadie, M. & Kaya, O. (2006). Exact simulation of stochastic volatility and other affine jump diffusion models, Operations Research 54(2), 217–231.
[5] Duffie, D., Singleton, K. & Pan, J. (2000). Transform analysis and asset pricing for affine jump-diffusions, Econometrica 68, 1343–1376.
[6] Fouque, J.P., Papanicolaou, G. & Sircar, K.R. (2000). Derivatives in Financial Markets with Stochastic Volatility, Cambridge University Press.
[7] Gatheral, J. (2006). The Volatility Surface, Wiley.
[8] Hakala, J. & Wystup, U. (2002). Foreign Exchange Risk, Risk Publications.
[9] Heston, S.L. (1993). A closed-form solution for options with stochastic volatility with applications to bond and currency options, Review of Financial Studies 6, 327–343.
[10] Kilin, F. (2007). Accelerating the Calibration of Stochastic Volatility Models. Working paper. Available at SSRN: http://ssrn.com/abstract=965248
[11] Lewis, A.L. (2000). Option Valuation under Stochastic Volatility, Finance Press.
[12] Lipton, A. (2001). Mathematical Methods for Foreign Exchange, World Scientific.
[13] Lord, R., Koekkoek, R. & van Dijk, D. (2006). A Comparison of Biased Simulation Schemes for Stochastic Volatility Models. Working paper. Available at SSRN: http://ssrn.com/abstract=903116
[14] Nögel, U. & Mikhailov, S. (2003). Heston’s Stochastic Volatility Model. Implementation, Calibration and some Extensions, Wilmott, July, 74–79.
[15] Shephard, N.G. (1991). From characteristic function to distribution function, Econometric Theory 7(4), 519–529. Cambridge University Press.

Related Articles

Foreign Exchange Options; Foreign Exchange Smiles; Heston Model; Implied Volatility Surface; Model Calibration; Simulation of Square-root Processes; Stochastic Volatility Models.

SUSANNE A. GRIEBSCH & KAY F. PILZ
Foreign Exchange Basket Options

Quite often, corporate and institutional currency managers are faced with an exposure in more than one currency. Generally, these exposures would be hedged using individual strategies for each currency. These strategies are composed of spot transactions, forwards, and, in many cases, options on a single currency. Nevertheless, there are instruments that include several currencies, and these can be used to build a multicurrency strategy that is almost always cheaper than the portfolio of the individual strategies. As a leading example, we explain basket options in detail (see end note a).

Basket Options

Basket options are derivatives based on a common base currency, say EUR, and several other risky currencies. The option is actually written on the basket of risky currencies. Basket options are European options paying the difference between the basket value and the strike, if positive, for a basket call, or the difference between strike and basket value, if positive, for a basket put, respectively, at maturity. The risky currencies have different weights in the basket to reflect the details of the exposure.

For example, a basket call on two currencies USD and JPY pays off

$$
\max\left[a_1\frac{S_1(T)}{S_1(0)} + a_2\frac{S_2(T)}{S_2(0)} - K;\ 0\right]
\tag{1}
$$

at maturity $T$, where $S_1(t)$ denotes the exchange rate of EUR/USD and $S_2(t)$ denotes the exchange rate of EUR/JPY at time $t$, $a_i$ the corresponding weights, and $K$ the strike.

A basket option protects against a drop in both currencies at the same time. Individual options on each currency cover some cases, which are not protected by a basket option (shaded triangular areas in Figure 1) and that is why they cost more than a basket.

The ellipsoids connect the points that are reached with the same probability assuming that the forward prices are at the center.

Pricing Basket Options

Basket options should be priced in a consistent way with plain vanilla options. Hence the basic model assumption is a lognormal process for the individual correlated basket components. A decomposition into uncorrelated components of the exchange rate processes

$$
dS_i = \mu_i S_i\,dt + S_i\sum_{j=1}^{N}\Omega_{ij}\,dW_j
\tag{2}
$$

is the basis for pricing. Here $\mu_i$ denotes the difference between the foreign and the domestic interest rate of the $i$th currency pair and $dW_j$ the $j$th component of independent Brownian increments. The covariance matrix is given by $C_{ij} = (\Omega\Omega^T)_{ij} = \rho_{ij}\sigma_i\sigma_j$. Here $\sigma_i$ denotes the volatility of the $i$th currency pair and $\rho_{ij}$ the correlation coefficients.

Exact Method

Starting with the uncorrelated components, the pricing problem is reduced to the $N$-dimensional integration of the payoff. This method is accurate but rather slow for more than two or three basket components.

A Simple Approximation

A simple approximation method assumes that the basket spot itself is a lognormal process with drift $\mu$ and volatility $\sigma$ driven by a Wiener process $W(t)$:

$$
dS(t) = S(t)\left[\mu\,dt + \sigma\,dW(t)\right]
\tag{3}
$$

with solution

$$
S(T) = S(t)\,e^{\sigma W(T-t) + \left(\mu - \frac{1}{2}\sigma^2\right)(T-t)}
\tag{4}
$$

given we know the spot $S(t)$ at time $t$. It is a fact that the sum of lognormal processes is not lognormal itself, but as a crude approximation, it is certainly a quick method that is easy to implement. To price the basket call, the drift and the volatility of the basket spot need to be determined. This is done by matching the first and second moment of the basket spot with the first and second moment of the lognormal model for the basket spot. The moments of lognormal spot are

$$
\begin{aligned}
E(S(T)) &= S(t)\,e^{\mu(T-t)} \\
E(S(T)^2) &= S(t)^2\,e^{(2\mu+\sigma^2)(T-t)}
\end{aligned}
\tag{5}
$$
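The moment matching can be illustrated with a short sketch. It computes the first two moments of a basket of correlated lognormal components, using the covariance $C_{ij} = \rho_{ij}\sigma_i\sigma_j$ defined above and the standard cross moment $E(S_i(T)S_j(T)) = S_i(t)S_j(t)\,e^{(\mu_i+\mu_j+C_{ij})(T-t)}$ of two correlated lognormal variables, and then inverts the two relations in (5) for the drift and volatility of the approximating lognormal. The function and argument names are illustrative and not part of the original text. The basket spot is assumed to be the weighted sum of the component spots.

```python
import math


def basket_moments(weights, spots, drifts, vols, corr, tau):
    """First and second moment of the basket sum_i w_i * S_i(T) for lognormal components."""
    n = len(weights)
    m1 = sum(weights[i] * spots[i] * math.exp(drifts[i] * tau) for i in range(n))
    m2 = sum(
        weights[i] * weights[j] * spots[i] * spots[j]
        * math.exp((drifts[i] + drifts[j] + corr[i][j] * vols[i] * vols[j]) * tau)
        for i in range(n)
        for j in range(n)
    )
    return m1, m2


def match_lognormal(basket_spot, m1, m2, tau):
    """Invert the two moment equations in (5) for the drift mu and volatility sigma."""
    mu = math.log(m1 / basket_spot) / tau
    sigma = math.sqrt(math.log(m2 / m1**2) / tau)
    return mu, sigma
```

For a single component the inversion returns the component's own drift and volatility. For two equally weighted components with equal volatilities, the matched volatility falls below the component volatility as soon as the correlation is below one, which is the diversification effect that makes the basket cheaper than the individual options.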

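[Figure 1: basket payoff region in the plane of $S_1(T)$ and $S_2(T)$, with contour lines for probabilities]

Figure 1 Basket-payoff and contour lines for probabilities

We solve these equations for the drift and volatility:

$$
\begin{aligned}
\mu &= \frac{1}{T-t}\ln\left(\frac{E(S(T))}{S(t)}\right) \\
\sigma &= \sqrt{\frac{1}{T-t}\ln\left(\frac{E(S(T)^2)}{E(S(T))^2}\right)}
\end{aligned}
\tag{6}
$$

In these formulae we now use the moments for the basket spot:

$$
\begin{aligned}
E(S(T)) &= \sum_{i=1}^{N}\alpha_i S_i(t)\,e^{\mu_i(T-t)} \\
E(S(T)^2) &= \sum_{i,j=1}^{N}\alpha_i\alpha_j S_i(t)S_j(t)\,e^{\left(\mu_i+\mu_j+\sum_{k=1}^{N}\Omega_{ki}\Omega_{jk}\right)(T-t)}
\end{aligned}
\tag{7}
$$

The pricing formula is the well-known Black–Scholes–Merton formula for plain vanilla call options:

$$
\begin{aligned}
\upsilon(0) &= e^{-r_d T}\left(F N(d_+) - K N(d_-)\right) \\
F &= S(0)\,e^{\mu T} \\
d_\pm &= \frac{1}{\sigma\sqrt{T}}\left[\ln\left(\frac{F}{K}\right) \pm \frac{1}{2}\sigma^2 T\right]
\end{aligned}
\tag{8}
$$

Here $N$ denotes the cumulative normal distribution function and $r_d$ the domestic interest rate.

A More Accurate and Equally Fast Approximation

The previous approach can be taken one step further by introducing one more term in the Itô–Taylor expansion of the basket spot, which results in

$$
\begin{aligned}
\upsilon(0) &= e^{-r_d T}\left(F N(d_1) - K N(d_2)\right) \\
F &= \frac{S(0)}{\sqrt{\eta}}\,e^{\left(\mu - \frac{\lambda}{2} + \frac{\lambda\sigma^2}{2\eta}\right)T} \\
d_2 &= \frac{\sigma - \sqrt{\sigma^2 + \lambda\left(\left(1+\frac{\lambda}{\eta}\right)\sigma^2 T - 2\ln\frac{F\sqrt{\eta}}{K}\right)}}{\lambda\sqrt{T}} \\
d_1 &= \sqrt{\eta}\,d_2 + \frac{\sigma\sqrt{T}}{\sqrt{\eta}}
\end{aligned}
\tag{9}
$$

where $\eta = 1 - \lambda T$. The new parameter $\lambda$ is determined by matching the third moment of the basket spot and the model spot. For details, see [1].

Most remarkably, this major improvement in the accuracy only requires a marginal additional computation effort.

Correlation Risk

Correlation coefficients between market instruments are usually not obtained easily. Either historical data analysis or implied calibrations need to be done. However, in the foreign exchange (FX) market, the cross instrument is traded as well. For instance, in the example above, USD/JPY spot and options are traded, and the correlation can be determined from this contract. In fact, denoting the volatilities as in the tetrahedron (Figure 2), we obtain formulae for the correlation coefficients in terms of known market implied volatilities:

$$
\begin{aligned}
\rho_{12} &= \frac{\sigma_3^2 - \sigma_1^2 - \sigma_2^2}{2\sigma_1\sigma_2} \\
\rho_{34} &= \frac{\sigma_1^2 + \sigma_6^2 - \sigma_2^2 - \sigma_5^2}{2\sigma_3\sigma_4}
\end{aligned}
\tag{10}
$$

[Figure 2: tetrahedron with edges labeled $\sigma_1, \ldots, \sigma_6$ and angles labeled $\phi_{12}$, $\phi_{13}$, $\phi_{23}$]

Figure 2 Currency tetrahedron including cross contracts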
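The first formula in equation (10) can be tried on quoted volatilities with a few lines of code. The sketch below assumes that $\sigma_1$ and $\sigma_2$ belong to two pairs whose product is the cross rate (for example GBP/USD and USD/JPY with cross GBP/JPY). If the two pairs instead share the same base currency, so that the cross rate is their quotient (for example EUR/USD and EUR/JPY with cross USD/JPY), the sign of the numerator flips. Both function names are illustrative.

```python
def implied_corr_product(vol_1, vol_2, vol_cross):
    """Correlation of two pairs whose product is the cross, as in equation (10)."""
    return (vol_cross**2 - vol_1**2 - vol_2**2) / (2.0 * vol_1 * vol_2)


def implied_corr_common_base(vol_1, vol_2, vol_cross):
    """Correlation of two pairs with a common base currency; the cross is their quotient."""
    return (vol_1**2 + vol_2**2 - vol_cross**2) / (2.0 * vol_1 * vol_2)


# Three-month ATM volatilities as quoted in Table 1
rho_gbpusd_usdjpy = implied_corr_product(0.089, 0.101, 0.098)
rho_eurusd_eurjpy = implied_corr_common_base(0.105, 0.100, 0.101)
```

Rounded to whole percent, the two results are −47% and 52%, which agrees with the corresponding entries of Table 2.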

This method also allows hedging correlation risk by trading FX implied volatility. For details see [1].

Practical Example

To find out how much one can save using a basket option, we take EUR as a base currency and consider a basket of three currencies USD, GBP, and JPY. For the volatilities, we take the values in Table 1. The resulting correlation coefficients are given in Table 2. The amount of option premium one can save using a basket call rather than three individual call options is illustrated in Table 3.

The amount of premium saved essentially depends on the correlation of the currency pairs (Figure 3). In Figure 3, we take the parameters of the previous scenario, but restrict ourselves to the currencies USD and JPY.

Table 1 FX implied volatilities for three-month at-the-money vanilla options as of November 23, 2001. Source: Reuters

GBP/USD     8.9%
USD/JPY    10.1%
GBP/JPY     9.8%
EUR/USD    10.5%
EUR/GBP     7.5%
EUR/JPY    10.0%

Table 2 FX implied three-month correlation coefficients as of Nov 23, 2001

           GBP/USD (%)  USD/JPY (%)  GBP/JPY (%)  EUR/USD (%)  EUR/GBP (%)  EUR/JPY (%)
GBP/USD        100          −47           42           71          −19           27
USD/JPY        −47          100           60          −53          −18           45
GBP/JPY         42           60          100           10          −36           71
EUR/USD         71          −53           10          100           55           52
EUR/GBP        −19          −18          −36           55          100           40
EUR/JPY         27           45           71           52           40          100

Table 3 Comparison of a basket call with three currencies for a maturity of three months versus the cost of three individual call options

Base currency         EUR        Interest rate    4.0%
Nominal in EUR        39 007     Strike K         1

Currencies            USD        JPY        GBP
Nominals              29%        30%        41%
1/spot                1.1429     0.00919    1.6091
Spot                  0.8750     108.81     0.6215
Strikes (in EUR)      1.1432     0.00927    1.5985
Volatilities          10.5%      10.0%      7.5%
Interest rates        4.0%       0.5%       7.0%
BS-values (in EUR)    235        227        233

Basket value          563
Sum of individuals    695

[Figure 3: "Basket option vs. two vanilla options", showing the value of the two vanilla calls and of the basket call plotted against correlation (%) from −90 to 100, with the premium saved between the two curves]

Figure 3 Premium of basket option versus premium of option strategy depending on the correlation

Upper Bound by Vanilla Options

It is actually clear that the price of the two vanilla options in the previous example is an upper bound of the basket option price. It seems intuitively clear that for a correlation of 100% the price is the same. Surprisingly, this is just the case if a specific relation between the strike of the individual options and their volatilities is satisfied. The basket strike has to satisfy

$$
K = a_1\frac{K_1}{S_1(0)} + a_2\frac{K_2}{S_2(0)}
\tag{11}
$$

which leads to the natural choice

$$K_i = \frac{K\,S_i(0)}{a_1 + a_2} \tag{12}$$

Each strike $K_i$ satisfies the above constraint by choosing

$$K_i = S_i(0)\,e^{(\mu_i + \frac{1}{2}\sigma_i^2)T + \chi\,\sigma_i\sqrt{T}} \tag{13}$$

for some arbitrary, but common $\chi$ for all basket components.

Smile Adjustment

The pricing method described so far does not take the smile into account. Given the volatility smile for vanilla options, $\sigma_i(K, T)$, with the same maturity as the basket option, the implied density $P$ for each currency pair in the basket can be derived from vanilla prices $V$:

$$P(K, T) = e^{rT}\,\partial_{KK} V(K, \sigma_i(K, T)) \tag{14}$$

A mapping $\phi(w)$ can be derived that maps the Gaussian random numbers to smile-adjusted random numbers for each currency pair. The implicit construction chooses the mapping so that the probability of the mapped Brownian motion equals the smile-implied probability.

When vanilla options are priced by Monte Carlo simulation using the mapping, it can be shown that, in the limit, the derived prices are perfectly in line with the smile. The formula for the Monte Carlo simulation for a realization $w$ of a Brownian motion is given by

$$S_i(0, w) = S_i(0)\,e^{(\mu_i + \frac{1}{2}\sigma_i^2)T + \phi(w)\,\sigma_i\sqrt{T}} \tag{15}$$

To price the basket option using the smile in Monte Carlo, a sequence of independent random numbers is used. These random numbers are correlated using the square-root matrix introduced above and are then fed into the individual mappings, which generates the simulated spot at the basket maturity. Evaluating the payoff and averaging generates a smile-adjusted price (see Table 4). Black–Scholes prices and smile-adjusted prices are shown next to each other for a direct comparison.

Table 4

Base currency: EUR
Nominal in EUR: 39 007

                         USD       JPY       GBP
Nominals                 29%       30%       41%
RR 25d                 −0.25%    −4.30%     1.10%
Fly 25d                 0.30%     0.17%     0.25%
BS-values (in EUR)       235       227       233
Smile values (in EUR)    233       168       278

Basket smile value            554
Sum of individuals (smile)    680
Basket value                  563
Sum of individuals            695

Conclusion

Many corporate portfolios are exposed to multicurrency risk. One way to turn this fact into an advantage is to use multicurrency hedge instruments. We have shown that basket options are convenient instruments protecting against exchange rates of most of the basket components changing in the same direction. A rather unlikely market move of half of the currencies' exchange rates in opposite directions is not protected by basket options, but when taking this residual risk into account, the hedging cost is reduced substantially. The smile impact on the basket value can be calculated rather easily without referring to a specific model, because the product is path-independent.

End Notes

a. This article is an extension of "Hakala, J. & Wystup, U. (2002) Making the most out of Multiple Currency Exposure: Protection with Basket Options, The Euromoney Foreign Exchange and Treasury Management Handbook 2002, Adrian Hornbrook."

References

[1] Hakala, J. & Wystup, U. (2001). Foreign Exchange Risk, Risk Publications, London.

Further Reading

Wystup, U. (2006). FX Options and Structured Products, Wiley.

Related Articles

Basket Options.

JÜRGEN HAKALA & UWE WYSTUP
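As an illustration of the correlation step in the Monte Carlo procedure described under Smile Adjustment, the sketch below builds a square-root matrix and uses it to correlate independent Gaussian draws. It is only a sketch under stated assumptions: the Cholesky factor is taken as one possible choice of square-root matrix (the article's own construction is not reproduced here), the three correlations are the EUR/USD, EUR/GBP and EUR/JPY entries of the implied correlation table, and the function names `cholesky`, `correlate`, and `sample_correlation` are hypothetical.

```python
import math
import random

# Correlations between EUR/USD, EUR/GBP and EUR/JPY, read from the
# implied correlation table (55%, 52%, 40%).
CORR = [[1.00, 0.55, 0.52],
        [0.55, 1.00, 0.40],
        [0.52, 0.40, 1.00]]


def cholesky(matrix):
    """Return lower-triangular L with L * L^T == matrix.

    Assumption: the Cholesky factor is used as the square-root matrix;
    the input must be symmetric positive definite.
    """
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def correlate(lower, z):
    """Turn independent standard normals z into correlated normals L z."""
    return [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(len(z))]


def sample_correlation(n_paths=20000, seed=1):
    """Estimate E[w_i w_j] from simulated correlated draws (zero mean, unit variance)."""
    rng = random.Random(seed)
    lower = cholesky(CORR)
    sums = [[0.0] * 3 for _ in range(3)]
    for _ in range(n_paths):
        w = correlate(lower, [rng.gauss(0.0, 1.0) for _ in range(3)])
        for i in range(3):
            for j in range(3):
                sums[i][j] += w[i] * w[j]
    return [[sums[i][j] / n_paths for j in range(3)] for i in range(3)]
```

In a full pricer, each correlated draw would then be passed through the currency pair's own smile mapping before the basket payoff is evaluated; that mapping is deliberately left out here because it depends on the implied density of each pair.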
Call Options

Call options appeared as rights to buy an underlying traded asset for a prespecified price, named the option strike or the exercise price, at a prespecified future date named the option expiry or the maturity. Put options are analogous rights to sell an underlying asset. For strike $K$ and maturity $T$ with the underlying asset trading at maturity for $S$, the call expires unexercised if $S$ is below $K$ while the put expires unexercised if $S$ is above $K$. On exercise, the value of the call option is $S - K$ while that of the put option is $K - S$. Hence, one may write the payoffs at maturity to the call and put options as $(S - K)^+$ and $(K - S)^+$, respectively. More generally, one may define a call or put payoff for any underlying random variable, which need not be a traded asset, for which the realized value at maturity is known to be $A$, as $(A - K)^+$ and $(K - A)^+$, respectively.

When call and put options trade before the maturity, on an underlying uncertainty resolved at maturity for various strikes $K$, with prices determined in markets at time $t < T$ as $c(t, K, T)$, $p(t, K, T)$, respectively, we have an options market for the underlying risk. Such markets provide a rich source of opportunities for holding the underlying asset or risk while simultaneously providing information on the prices of these risks. With regard to the opportunities, they make it possible to hold any function $f(A)$ of the underlying risk via a portfolio of put and call options. This fact is easily demonstrated as follows [2]. Let $f(A)$ be the function we wish to hold. We note that

$$\begin{aligned}
f(A) &= f(a) + 1_{A>a}\int_a^A f'(u)\,du - 1_{a>A}\int_A^a f'(u)\,du \\
&= f(a) + 1_{A>a}\int_a^A\Bigl[f'(a) + \int_a^u f''(v)\,dv\Bigr]du - 1_{a>A}\int_A^a\Bigl[f'(a) - \int_u^a f''(v)\,dv\Bigr]du \\
&= f(a) + f'(a)(A-a) + 1_{A>a}\int_a^A dv\,f''(v)(A-v) + 1_{a>A}\int_A^a dv\,f''(v)(v-A) \\
&= f(a) + f'(a)(A-a) + 1_{A>a}\int_a^\infty dv\,f''(v)(A-v)^+ + 1_{a>A}\int_0^a dv\,f''(v)(v-A)^+
\end{aligned} \tag{1}$$

On the right-hand side, we have a position in a bond with face value given by the constant term, a position of $f'(a)$ in the underlying risk, and positions in puts struck below $a$ and calls struck above $a$, with a position of $f''(v)$ at each strike $v$.

With regard to the information content of the market prices, we consider Breeden and Litzenberger [1], who showed how one may extract the pricing density at time $t < T$, $p(t, A)$, for the underlying risk from market option prices. By definition, we have

$$c(t, K, T) = e^{-r(T-t)}\int_K^\infty (A - K)\,p(t, A)\,dA \tag{2}$$

where $r$ is the interest rate prevailing at time $t$ for the maturity $(T - t)$. We may differentiate twice with respect to the strike to get

$$p(t, K) = e^{r(T-t)}\,\frac{\partial^2 c(t, K, T)}{\partial K^2} \tag{3}$$

In the case when the underlying risk is an asset price with a specific dynamics with exposure to a Brownian motion with a space–time deterministic volatility (see Local Volatility Model) as postulated by Dupire [6] plus a compensated jump martingale with a space–time deterministic arrival rate of jumps and a fixed dependence of the arrival rate on the jump size, one may extract information on the dynamics from market prices. Here, we follow Carr et al. [4]. Let $(S(t), t > 0)$ denote the path of the stock price, where $r$ is the interest rate, $\eta$ the dividend yield, $\sigma(S, t)$ the deterministic space–time volatility function, $(W(t), t > 0)$ a Brownian motion, $m(dx, ds)$ the integer-valued counting measure associated with the jumps in the logarithm of the stock price, $a(S, t)$ the deterministic space–time jump arrival rate, and $k(x)$ the Lévy density across jump sizes $x$. The dynamics for the stock price may be written as

$$\begin{aligned}
S(t) = S(0) &+ \int_0^t S(u_-)(r - \eta)\,du + \int_0^t S(u_-)\,\sigma(S(u_-), u)\,dW(u) \\
&+ \int_0^t\!\int_{-\infty}^{\infty} S(u_-)\,(e^x - 1)\,\bigl(m(dx, du) - a(S(u_-), u)\,k(x)\,dx\,du\bigr)
\end{aligned} \tag{4}$$

We now apply a generalization of Itô's lemma to convex functions known as the Meyer–Tanaka formula (see, e.g., [5, 7, 8] for the specific formulation below) to the call option payoff at maturity to obtain

$$\begin{aligned}
(S(T) - K)^+ = (S(0) - K)^+ &+ \int_0^T 1_{S(u_-)>K}\,dS(u) + \frac{1}{2}\int_0^T \delta_K(S(u_-))\,\sigma^2(S(u), u)\,S^2(u)\,du \\
&+ \sum_{u \le T}\Bigl[1_{S(u_-)>K}\,(K - S(u))^+ + 1_{S(u_-)<K}\,(S(u) - K)^+\Bigr]
\end{aligned} \tag{5}$$

The second integral denotes the value at $K$ of the continuous local time $L_T^a$, $a \in \mathbb{R}$, which is globally defined for every bounded Borel function $f$ as $\int_{-\infty}^{\infty} f(a)\,L_T^a\,da = \int_0^T f(S(u_-))\,d\langle S^c\rangle_u$, where $d\langle S^c\rangle_u = \sigma^2(S(u), u)\,S^2(u)\,du$, and is applied here formally to the Dirac measure $f(a) = \delta_K(a)$. The last term, which is the discontinuous component of local time at level $K$, is made up of just the crossovers, whereby one receives $S(u) - K$ on crossing the strike into the money, whereas one receives $(K - S(u))$ on crossing the strike out of the money.

Computing expectations on both sides of equation (5) and introducing $q(Y, u)$, the transition density that the stock price is $Y$ at time $u$ given that at time 0 it is at $S(0)$, we may write the call price function at time zero as

$$\begin{aligned}
e^{rT} C(K, T) = (S(0) - K)^+ &+ \int_0^T\!\int_K^\infty dY\,q(Y, u)\,Y\,(r - \eta)\,du + \frac{1}{2}\int_0^T q(K, u)\,\sigma^2(K, u)\,K^2\,du \\
&+ \int_0^T\!\int_K^\infty dY\,q(Y, u)\,a(Y, u)\int_{-\infty}^{\ln(K/Y)} (K - Y e^x)\,k(x)\,dx\,du \\
&+ \int_0^T\!\int_0^K dY\,q(Y, u)\,a(Y, u)\int_{\ln(K/Y)}^{\infty} (Y e^x - K)\,k(x)\,dx\,du
\end{aligned} \tag{6}$$

Now differentiating equation (6) with respect to $T$, we get

$$\begin{aligned}
r e^{rT} C + e^{rT} C_T = (r - \eta)\int_K^\infty q(Y, T)\,Y\,dY &+ \frac{\sigma^2(K, T)\,K^2}{2}\,q(K, T) \\
&+ \int_K^\infty dY\,Y\,q(Y, T)\,a(Y, T)\int_{-\infty}^{\ln(K/Y)} \bigl(e^{\ln(K/Y)} - e^x\bigr)\,k(x)\,dx \\
&+ \int_0^K dY\,Y\,q(Y, T)\,a(Y, T)\int_{\ln(K/Y)}^{\infty} \bigl(e^x - e^{\ln(K/Y)}\bigr)\,k(x)\,dx
\end{aligned} \tag{7}$$

We now isolate $C_T$ on the left, using some elementary properties of the relationship between call prices and the pricing density. In particular, we note

$$e^{-rT}\int_K^\infty Y\,q(Y, T)\,dY = C - K C_K \tag{8}$$

$$e^{-rT}\,q(K, T) = C_{KK} \tag{9}$$

and obtain

$$\begin{aligned}
C_T = -\eta C - (r - \eta)\,K\,C_K &+ \frac{\sigma^2(K, T)\,K^2}{2}\,C_{KK} \\
&+ \int_K^\infty dY\,Y\,C_{YY}\,a(Y, T)\int_{-\infty}^{\ln(K/Y)} \bigl(e^{\ln(K/Y)} - e^x\bigr)\,k(x)\,dx \\
&+ \int_0^K dY\,Y\,C_{YY}\,a(Y, T)\int_{\ln(K/Y)}^{\infty} \bigl(e^x - e^{\ln(K/Y)}\bigr)\,k(x)\,dx
\end{aligned} \tag{10}$$
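Relation (9), which is the Breeden and Litzenberger result (3) evaluated at time zero, can be checked numerically. The following is a minimal sketch, assuming (purely for the check) that call prices come from the Black–Scholes formula with a continuous dividend yield, so that the exact pricing density is lognormal. The function names, the central-difference step `dk`, and the parameter values are illustrative choices and are not part of the article.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s0, k, t, r, eta, sigma):
    """Black-Scholes call price with dividend yield eta (assumed model)."""
    d1 = (math.log(s0 / k) + (r - eta + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s0 * math.exp(-eta * t) * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def implied_density(call, k, t, r, dk=0.01):
    """Pricing density e^{rT} * C_KK from a central second difference in strike."""
    c_kk = (call(k - dk) - 2.0 * call(k) + call(k + dk)) / dk ** 2
    return math.exp(r * t) * c_kk


def lognormal_density(y, s0, t, r, eta, sigma):
    """Exact risk-neutral density of S(T) under the assumed lognormal model."""
    mean = math.log(s0) + (r - eta - 0.5 * sigma ** 2) * t
    std = sigma * math.sqrt(t)
    z = (math.log(y) - mean) / std
    return math.exp(-0.5 * z * z) / (y * std * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    price = lambda k: bs_call(100.0, k, 1.0, 0.03, 0.01, 0.2)
    print(implied_density(price, 105.0, 1.0, 0.03))
    print(lognormal_density(105.0, 100.0, 1.0, 0.03, 0.01, 0.2))
```

With market quotes instead of model prices, the second difference would be taken across neighbouring quoted strikes, and its accuracy would depend on how densely and cleanly the strikes are quoted.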
We may define the function

$$\psi(x) = 1_{x<0}\int_{-\infty}^{x} (e^x - e^s)\,k(s)\,ds + 1_{x>0}\int_x^{\infty} (e^s - e^x)\,k(s)\,ds \tag{11}$$

and then write

$$C_T = -\eta C - (r - \eta)\,K\,C_K + \frac{\sigma^2(K, T)\,K^2}{2}\,C_{KK} + \int_0^\infty C_{YY}\,Y\,a(Y, T)\,\psi\!\left(\ln\frac{K}{Y}\right)dY \tag{12}$$

When there are no jumps in the process for $X$ and $\psi \equiv 0$, equation (12) is identical to the formula in [6] for local volatility (see Dupire Equation). In the opposite case, when there is no continuous martingale component, we have the following result:

$$C_T + \eta C + (r - \eta)\,K\,C_K = \int_0^\infty C_{YY}\,Y\,a(Y, T)\,\psi\!\left(\ln\frac{K}{Y}\right)dY \tag{13}$$

It is now useful to rewrite equation (13) in terms of $k = \ln(K)$, $y = \ln(Y)$, and $c(k, T) = C(e^k, T)$. With this substitution, we may rewrite equation (12) as

$$c_T + \eta c + \left(r - \eta + \frac{\sigma^2(e^k, T)}{2}\right)c_k - \frac{\sigma^2(e^k, T)}{2}\,c_{kk} = \int_{-\infty}^{\infty} b(y, T)\,\psi(k - y)\,dy \tag{14}$$

where $b(y, T) = e^{2y}\,C_{YY}\,a(e^y, T)$. The forward speed function, $a(Y, T)$, may be identified as

$$a(Y, T) = \frac{b(\ln(Y), T)}{Y^2\,C_{YY}} \tag{15}$$

For specific Lévy measures, the convolution equation (14) may be solved in closed form to yield explicit solutions for the Markov process from data on option prices (see Fourier Methods in Options Pricing; Partial Integro-differential Equations (PIDEs)).

Apart from spanning all functions of the underlying risk and providing us with information on the possible dynamic movements of the stock price consistent with market option prices, we have the question of understanding the absence of arbitrage between option prices. This question was addressed in [3], where it was shown that if call spread, butterfly spread, and calendar spread arbitrages are excluded then the option quotes are free of static arbitrage. It is, therefore, important to document the three arbitrages that need to be checked. For a call spread, we have the inequality for two strikes $K_1 < K_2$ for a fixed maturity $T$:

$$\frac{c(K_1, T) - c(K_2, T)}{K_2 - K_1} < 1$$

For a butterfly spread, we have three strikes $K_1 < K_2 < K_3$ and a fixed maturity $T$ for which we must have

$$c(K_1, T) - \frac{K_3 - K_1}{K_3 - K_2}\,c(K_2, T) + \frac{K_2 - K_1}{K_3 - K_2}\,c(K_3, T) \ge 0 \tag{16}$$

Finally, the calendar spread inequality requires that for two maturities $T_1 < T_2$ and strike $K$

$$c(K, T_2) \ge c\bigl(K e^{-r(T_2 - T_1)}, T_1\bigr) \tag{17}$$

Similar results hold for put options via put–call parity, which asserts in the case of a stock that

$$p(K, T) = c(K, T) + K e^{-rT} - S(0)\,e^{-\eta T} \tag{18}$$

The call spread inequality approximates the probability that the stock exceeds $K_1$ when we take $K_2$ close to and above $K_1$. The butterfly spread inequality guarantees the existence of a positive pricing density and the calendar spread inequality arranges these densities to be increasing in the convex order with respect to the maturity. When the underlying risk is not a traded asset as occurs, for example, for options on the VIX index where the underlying is the price of the one-month variance swap, we lose the calendar spread inequality and the requisite densities are not increasing in the convex order. One can check that on most days VIX call option prices, when deflated by the forward VIX, are increasing in maturity for given strikes and we have an empirical increase in the convex order, but there are days when this monotonicity is lost. The conditions for VIX option surfaces to be free of arbitrage are, therefore, not as clear as they are for an underlying stock or a stock index.

References

[1] Breeden, D. & Litzenberger, R.L. (1978). Pricing of state-contingent claims implicit in option prices, Journal of Business 51, 621–651.
[2] Carr, P. & Madan, D.B. (2001). Optimal positioning in derivatives, Quantitative Finance 1, 19–37.
[3] Carr, P. & Madan, D.B. (2005). A note on sufficient conditions for no arbitrage, Finance Research Letters 2, 125–130.
[4] Carr, P., Geman, H., Madan, D. & Yor, M. (2005). From local volatility to local Lévy models, Quantitative Finance 4, 581–588.
[5] Dellacherie, C. & Meyer, P. (1980). Probabilités et Potentiel, Theorie des Martingales, Hermann, Paris.
[6] Dupire, B. (1994). Pricing with a smile, Risk 7, 18–20.
[7] Meyer, P. (1976). Un Cours sur les Intégrales stochastiques, in Séminaire de Probabilités X, Lecture Notes in Mathematics, Springer-Verlag, Berlin, Vol. 511.
[8] Yor, M. (1978). Rappels et Préliminaires Généraux, in Temps Locaux, Société Mathématique de France, Astérisque, pp. 17–22, 52–53.

Related Articles

Dupire Equation; Local Volatility Model; Put–Call Parity; Static Hedging; Variance Swap.

DILIP B. MADAN
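The three static-arbitrage conditions documented in the Call Options entry (the call spread inequality, the butterfly inequality (16), and the calendar inequality (17)) translate directly into checks on quoted call prices. The sketch below is one possible way to code them. The helper names and tolerances are hypothetical, and the Black–Scholes prices without dividends are used only to generate quotes that are known to be arbitrage free for the demonstration.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s0, k, t, r, sigma):
    """Black-Scholes call without dividends; used only to create test quotes."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s0 * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def call_spread_ok(c1, c2, k1, k2):
    """Call spread check for strikes k1 < k2 at one maturity."""
    return (c1 - c2) / (k2 - k1) < 1.0


def butterfly_ok(c1, c2, c3, k1, k2, k3, tol=1e-12):
    """Butterfly check (16) for strikes k1 < k2 < k3 at one maturity."""
    value = c1 - (k3 - k1) / (k3 - k2) * c2 + (k2 - k1) / (k3 - k2) * c3
    return value >= -tol


def calendar_ok(c_far, c_near_at_discounted_strike, tol=1e-12):
    """Calendar check (17): c(K, T2) against c(K e^{-r (T2 - T1)}, T1)."""
    return c_far >= c_near_at_discounted_strike - tol


if __name__ == "__main__":
    s0, r, sigma = 100.0, 0.03, 0.2
    c90, c100, c110 = (bs_call(s0, k, 1.0, r, sigma) for k in (90.0, 100.0, 110.0))
    print(call_spread_ok(c90, c100, 90.0, 100.0))
    print(butterfly_ok(c90, c100, c110, 90.0, 100.0, 110.0))
    near = bs_call(s0, 100.0 * math.exp(-r * 0.5), 0.5, r, sigma)
    print(calendar_ok(c100, near))
```

On real quotes the checks would be run over every adjacent strike pair and triple and every pair of maturities, using bid and ask prices as appropriate; that bookkeeping is omitted here.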
Barrier Options

Barrier options are vanilla options with path-dependent payoffs, that is, the payoff is not only a function of the stock level relative to the option strike but also depends on whether or not the stock reaches a certain prespecified barrier level before maturity. An example will illustrate the idea. Suppose an investor is long an up-and-in at-the-money call option on the S&P 500 index with the barrier level at 110% of the initial S&P 500 index level. Before maturity, if the index never reaches 110% of the initial index level, the option never gets knocked in. The investor receives nothing at maturity. However, if the index level reaches 110% at some point before maturity, the investor receives a payoff identical to a vanilla at-the-money call option at maturity. In the latter scenario, the option is "knocked in" on the day when the index reaches the 110% level.

There are many types of barrier options. We discuss two common ones, that is, knock-out and knock-in barrier options. For knock-out barrier options, the option will be knocked out and become worthless if the underlying asset crosses a prespecified barrier level. For knock-in barrier options, the barrier option will be knocked in and become a vanilla option only if the underlying asset crosses the prespecified level before maturity. The example used earlier is a knock-in barrier option.

Depending upon the barrier level relative to the initial underlying asset level, we can have an "up" barrier or a "down" barrier. If the barrier is above the initial underlying asset level, it is called an up barrier. If the barrier is below the initial underlying asset level, it is called a down barrier. Together, we can have four different variations of barrier options, that is, up-and-in, up-and-out, down-and-in, and down-and-out options. Table 1 shows these four variations schematically.

Table 1 Common barrier types

                Knock in            Knock out
Up barrier      Up-and-in call      Up-and-out call
                Up-and-in put       Up-and-out put
Down barrier    Down-and-in call    Down-and-out call
                Down-and-in put     Down-and-out put

Basic Features

Up-and-in Call/Up-and-in Put

This is the first kind of knock-in barrier option. The up-and-in barrier option has a knock-in barrier level that is higher than the initial underlying asset level. Before maturity, if the underlying asset goes above the barrier level, the barrier option will be knocked in and become a vanilla option. Otherwise, the barrier option will expire worthless at maturity. Up-and-in calls are more common. This is because, when the underlying asset increases to the knock-in barrier level, it would most likely stay above the initial underlying asset level. Therefore, call options will be more likely to be in the money at maturity than put options. Bullish investors can buy up-and-in call options and pay a lower premium than that on the vanilla call options. This makes up-and-in calls more leveraged than vanilla calls.

Down-and-in Call/Down-and-in Put

The down-and-in barrier option has a knock-in barrier level that is below the initial underlying asset level. Before maturity, if the underlying asset goes below the barrier level, the barrier option will be knocked in and become a vanilla option. Otherwise, the barrier option will expire worthless at maturity. Down-and-in puts are more common in this case. Bearish investors can buy down-and-in puts and pay a lower premium than that on the vanilla put options.

Up-and-out Call/Up-and-out Put

This is the first kind of knock-out barrier option. The up-and-out barrier option has a knock-out barrier level above the initial underlying asset level. Before maturity, if the underlying asset crosses the barrier level, the option will be knocked out and become worthless. Otherwise, the barrier option will just be a vanilla option. A bearish investor would buy up-and-out puts to achieve more leverage by paying a lower premium than that on vanilla puts.

Down-and-out Call/Down-and-out Put

The down-and-out barrier option has a knock-out barrier level below the initial underlying asset level. Before maturity, if the underlying asset goes below the barrier level, the option will be knocked out and become worthless. A bullish investor would buy down-and-out calls to achieve more leverage by paying a lower premium than that on vanilla calls.

Some Variations

With the increased popularity of barrier options and the growth of their market, some other features are being introduced to better meet investors' needs. In the following, we discuss briefly some of the most popular variations to basic knock-in/knock-out barrier options.

Rebates

It is fairly common that a knock-out barrier option pays a rebate on a knock-out event to compensate the investor for the loss of the option. The rebate is typically a small amount and is paid either immediately or deferred to the maturity. For knock-in barrier options, a rebate will be paid out at maturity if the barrier is never knocked in.

Double Barrier

A double barrier option is another variation that has two barriers, typically one up barrier and the other a down barrier. For example, investors seeking high leverage would consider double knock-out barrier options if they believe changes in the underlying asset level would be within a narrow range. The double knock-out barrier option would become worthless if the underlying asset drops below the down barrier or rises above the up barrier before maturity. Because of that, the double knock-out barrier option costs less than the single knock-out barrier option and thus provides more leverage.

Another popular variation of the double barrier is the knock-in option with a knock-out barrier. The knock-in barrier option can be knocked out either before or after the option is knocked in. If the option can be knocked out even before the option is knocked in, this is called a knock-out dominant barrier option. Again, this type of double barrier option provides more leverage than a single barrier option because it costs less than the single knock-in barrier option.

Discrete Barrier

The barrier specification varies from contract to contract. Many barrier options have the so-called continuous barrier or intraday barrier. This means that the barrier event can be triggered any time during intraday trading hours. However, some barrier options only allow the barrier event to be triggered by the end-of-day closing price. This is called a discrete barrier. More generally, a discrete barrier is defined as any barrier type other than the continuous barrier.

American Style Exercise

Although it is not very common, there are a few types of barrier options that have an American exercise feature. One example would be a six-month at-the-money call option with an installment premium, that is, the premium would be paid monthly instead of as a lump sum upfront. The knock-out barrier is at 90% of the initial underlying asset level. The option will be knocked out if the underlying asset drops below the knock-out barrier level before maturity. In addition, the option would be terminated automatically if the investor stops paying the installment premium on monthly reset dates. This installment feature provides more flexible financing for investors because they pay the premium over a period of time instead of paying the lump sum up front. It also allows investors to re-evaluate the market condition and decide if it is optimal to continue the knock-out option or terminate it.

Valuation of Barrier Options

Barrier options are less expensive than vanilla options because of the possibility that the barrier option would be knocked out or never knocked in. There are many publications on how to price barrier options (see [2, 7] for a good summary). In general, the valuation of barrier options needs to take into account the stock-level dependency of volatility dynamics, that is, the local volatility surface. This requires using numerical methods like partial differential equations (see Finite Difference Methods for Barrier Options) or Monte Carlo simulation. In certain situations, the barrier options can be priced by using the static replication method (see [3, 4]).
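To make the Monte Carlo route concrete, the sketch below prices an up-and-in and an up-and-out call on the same simulated paths. It is a simplified illustration under assumptions that go beyond the text: constant volatility rather than a local volatility surface, geometric Brownian motion under the risk-neutral measure, no rebate, and a discrete barrier observed at each simulation step. The function name and all parameter values are hypothetical.

```python
import math
import random


def simulate_up_barrier_calls(s0, strike, barrier, t, r, sigma,
                              n_steps=50, n_paths=5000, seed=7):
    """Return discounted (up-and-in, up-and-out, vanilla) call estimates.

    Assumptions: constant-volatility GBM, barrier checked only at the
    n_steps simulation dates, no rebate.
    """
    rng = random.Random(seed)
    dt = t / n_steps
    drift = (r - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    knock_in = knock_out = vanilla = 0.0
    for _ in range(n_paths):
        s = s0
        hit = False
        for _ in range(n_steps):
            s *= math.exp(drift + vol * rng.gauss(0.0, 1.0))
            if s >= barrier:
                hit = True  # barrier event observed on this monitoring date
        payoff = max(s - strike, 0.0)
        vanilla += payoff
        if hit:
            knock_in += payoff   # knocked in: pays like the vanilla call
        else:
            knock_out += payoff  # never knocked out: pays like the vanilla call
    scale = math.exp(-r * t) / n_paths
    return knock_in * scale, knock_out * scale, vanilla * scale


if __name__ == "__main__":
    ki, ko, van = simulate_up_barrier_calls(100.0, 100.0, 110.0, 1.0, 0.03, 0.2)
    print(ki, ko, van)
```

Because every path either touches the barrier or does not, the two barrier estimates add up to the vanilla estimate on the same set of paths, and each is cheaper than the vanilla call.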
In–Out Parity

Similar to put–call parity for the vanilla options, there is an interesting relationship between knock-in and knock-out barrier options. Using call options as an example, an up-and-out call is complementary to an up-and-in call if both have the same strike and barrier level. A portfolio of one up-and-in call and one up-and-out call will have the same payoff at maturity as a vanilla call option. The reason is simple. If the underlying asset stays below the barrier level before maturity, the up-and-in call will become worthless and the up-and-out call will become a vanilla call. On the other hand, if the underlying asset rises above the barrier level, the up-and-out call will become worthless and the up-and-in call will become a vanilla call. Therefore, for any given scenario of the underlying asset path before maturity, the portfolio will always have the same payoff as a vanilla call. The sum of an up-and-in call and an up-and-out call is the same as a vanilla call. This is the so-called in–out parity for barrier options and applies to both the put and call options in the absence of rebates.

Constant Volatility

In the limit of constant volatility, Merton [5] provided the first analytical formula for a down-and-out call option. Later on, Reiner and Rubinstein [6] extended the formula for all eight combinations of barriers (see Pricing Formulae for Foreign Exchange Options).

Adjustment for Discrete Barrier

Often, the barrier option has a discrete barrier schedule but the exact valuation is only available for the case of a continuous barrier. In the constant volatility case, Broadie et al. [1] proposed that, by adjusting the barrier level in the continuous barrier option valuation, one would obtain a good approximation of the discrete barrier option value. The adjustment to the barrier level is prescribed by the following formula:

$$X_{\mathrm{adj}} = X\,e^{\alpha\sigma\sqrt{T/m}} \tag{1}$$

where $\alpha$ is 0.5826 for an "up" barrier and −0.5826 for a "down" barrier, $m$ is the number of discrete samples of the underlying asset price over the term $T$ of the barrier option, and $\sigma$ is the constant volatility.

References

[1] Broadie, M., Glasserman, P. & Kou, S.G. (1997). A continuity correction for discrete barrier options, Journal of Mathematical Finance 7, 325–349.
[2] Briys, E., Bellalah, M., Mai, H.M. & de Varenne, F. (1998). Options, Futures and Exotic Derivatives—Theory, Application and Practice, Wiley Frontiers in Finance.
[3] Carr, P. & Chou, A. (1997). Breaking barriers, Risk Magazine 10, 139–144.
[4] Derman, E., Ergener, D. & Kani, I. (1997). Static options replication, in Frontiers in Derivatives, Irwin Professional Publishing.
[5] Merton, R. (1971). Theory of rational option pricing, Bell Journal of Economics and Management 4, 141–183.
[6] Reiner, E. & Rubinstein, M. (1991). Breaking down barriers, Risk Magazine 4, 28–35.
[7] Wilmott, P. (1998). Derivatives—The Theory and Practice of Financial Engineering, John Wiley & Sons.

Related Articles

Finite Difference Methods for Barrier Options; Pricing Formulae for Foreign Exchange Options.

MICHAEL QIAN
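The barrier adjustment in equation (1) of the Barrier Options entry is a one-line computation. The sketch below applies it to an up barrier and a down barrier; the function name `adjusted_barrier` and the sample inputs (daily monitoring taken as 250 observations over one year, 20% volatility) are illustrative assumptions rather than values from the article.

```python
import math

ALPHA = 0.5826  # magnitude of the constant quoted in the adjustment formula


def adjusted_barrier(barrier, sigma, t, m, is_up):
    """Shifted barrier X * exp(alpha * sigma * sqrt(T / m)).

    alpha is +0.5826 for an up barrier and -0.5826 for a down barrier,
    so the shifted barrier always lies further away from the spot.
    """
    alpha = ALPHA if is_up else -ALPHA
    return barrier * math.exp(alpha * sigma * math.sqrt(t / m))


if __name__ == "__main__":
    # Hypothetical inputs: one-year option, 250 monitoring dates, 20% volatility.
    print(adjusted_barrier(110.0, 0.2, 1.0, 250, is_up=True))
    print(adjusted_barrier(90.0, 0.2, 1.0, 250, is_up=False))
```

The shifted level would then be used in place of the contractual barrier in a continuous-barrier valuation formula; as the number of samples m grows, the shift shrinks toward zero.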
Corridor Options

Occupation time derivatives are debt securities that came into existence in 1993 and have attracted some attention from investors and researchers. A defining characteristic of these contracts is a payoff that depends on the time spent by the underlying asset in some predetermined region. Typical specifications consist of interest payments that are proportional to the time in which a reference index rate (most commonly the LIBOR rate) lies inside a given range. In return for the drawback that no interest will be paid for the time the corridor is left, they offer higher rates than comparable standard products, like floating rate notes. Various claims with features of this type have been studied and have been popularized under different names such as corridor bond or option, range note, range floater, range accrual note, LIBOR range note, fairway bond,[a] and hamster option.[b] The most common underlyings are stock indices, foreign currencies, and interest rates, such as LIBOR or swap rates. Spread range notes are also common. Spread range notes pay a coupon proportional to the number of days in which the difference between two interest rates (e.g., 10-year swap rate versus 30-year swap rate) is positive. Thus, while the value of an interest range note depends on the volatility of the level of the term structure, the value of a spread range note depends on the volatility of the slope of the term structure. This makes it important to model the correlation of interest rates with different maturities accurately.

One of the most popular structures is the accrual note, typically of one- or two-year maturity, which offers a coupon calculated by the number of days in which three-month dollar LIBOR, for example, falls inside a predefined range. On these days, the note will typically offer a preassigned spread over the relevant treasury bond. When LIBOR is outside the range, the payout is zero. The note effectively provides a way for investors to sell volatility, but the binary structure protects them from the unlimited downside that would accompany similar strategies, such as writing a floor and cap on the range. Corridor products offer investors enhanced yield if they have a strong view that rates will stay within a range, and often they are structured to reflect an investor's view that is contrary to a particular forward-rate curve. In its simplest form, the payoff of the corridor bond can be seen as the sum of the payoffs of digital options expiring on successive days.

Payoff

Let $t$ be the current time (measured in years) and let $0 = T_0 < t < T_1, \ldots, T_j, \ldots, T_n$ be the payment dates of the coupons $\phi(T_j)$ ($T_0$ is the last date at which a payment occurred). Let $D(T_j, T_{j+1})$ be the number of days in the coupon period and let $T_{j,i}$ be the date corresponding to $i$ days after date $T_j$, that is, $T_{j,i} = T_j + i/365$, $i = 1, \ldots, D(T_j, T_{j+1})$. Let $x_l$ and $x_u$ be the lower and upper bound of the range (the prespecified range for each observed date can vary daily or across different compounding periods). Finally, let $X(t)$ be the value of the reference index at time $t$. Sometimes, $X$ represents a stock index or a foreign currency rate, and sometimes it is taken to be a LIBOR rate of a preassigned tenor (in this case $X(t) = L(t, t + \theta)$, where $L$ is the LIBOR rate and $\theta$ is its tenor) or the spread of two swap rates with different maturities.

The range note can have a fixed and a floating version. For a range note, the value of the coupon paid at time $T_{j+1}$ is equal to

$$\phi(T_{j+1}) = C_{j+1} \times \frac{H(T_j, T_{j+1})}{D(T_j, T_{j+1})} \tag{1}$$

where $C_{j+1}$ represents the annual coupon rate for the $(j+1)$th compounding period and is given by

$$C_{j+1} = \begin{cases} C & \text{fixed range note,} \\ X(T_j) + \mathrm{spread} & \text{delayed floating range note,} \end{cases} \tag{2}$$

where $C$ is a constant and the spread is the margin to be added to the reference rate, and

$$H(T_j, T_{j+1}) = \sum_{i=1}^{D(T_j, T_{j+1})} 1_{[x_l, x_u]}\bigl(X(T_{j,i})\bigr) \tag{3}$$

where $1_A(x)$ is the indicator function of the set $A$, that is, $1_A(x) = 1$ if $x \in A$, otherwise $1_A(x) = 0$. Here the term delayed refers to the fact that the option maturity (or reset date) $T_j$ anticipates the payment date $T_{j+1}$. Sometimes, instead of having the delayed floating range note we have the in-arrears floating range note where the coupon is set equal to

$$C_{j+1} = X(T_{j+1}) + \mathrm{spread} \tag{4}$$
that is, the coupon payment depends on the level of the reference rate at the current coupon payment date (maturity and payment date coincide).

Sometimes, a minimum coupon clause is also included, so that the coupon amounts to

$$\phi(T_{j+1}) = \max\!\left(C_{j+1} \times \frac{H(T_j, T_{j+1})}{D(T_j, T_{j+1})},\; K\right) \tag{5}$$

The standard contract is the fixed rate range note if the underlying is a stock index or a foreign currency and the delayed floating range note if the underlying is a LIBOR rate.

Pricing

In this section, we discuss the pricing problem of the corridor contracts described earlier.

Underlying is a Stock Index

In this case, we model the underlying according to the Geometric Brownian Motion (GBM) model, that is,

$$dX(t) = (r - q)\,X(t)\,dt + \sigma X(t)\,dW(t), \qquad X(t) = x \tag{6}$$

where

r: instantaneous risk-free rate,
q: instantaneous dividend yield,
σ: percentage volatility,
x: initial underlying price.

An analytical formula for pricing fixed range notes is readily available, using the fact that

$$\mathbb{E}_t\bigl[1_{[x_l, x_u]}(X(T))\bigr] = \mathbb{E}_t\bigl[1_{(-\infty, x_u]}(X(T)) - 1_{(-\infty, x_l]}(X(T))\bigr] = N\bigl(d^{x_u}_{T-t}\bigr) - N\bigl(d^{x_l}_{T-t}\bigr)$$

where

$$d^{w}_{T-t} = \frac{\ln\frac{w}{x} - (r - q)(T - t) + \frac{1}{2}\sigma^2 (T - t)}{\sigma\sqrt{T - t}}, \qquad N(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-u^2/2}\,du \tag{7}$$

Therefore, for a fixed-rate range note, we have

$$\mathbb{E}_t\bigl[\phi(T_{j+1})\bigr] = \frac{C_{j+1}}{D(T_j, T_{j+1})}\sum_{i=1}^{D(T_j, T_{j+1})}\Bigl[N\bigl(d^{x_u}_{T_{j,i}-t}\bigr) - N\bigl(d^{x_l}_{T_{j,i}-t}\bigr)\Bigr] \tag{8}$$

and the price of the fixed range note is

$$\sum_{j=0}^{n-1} P(t, T_{j+1})\,\mathbb{E}_t\bigl[\phi(T_{j+1})\bigr] + P(t, T_n) \tag{9}$$

where $P(t, T)$ refers to the price of a zero-coupon bond expiring in $T$.

From a pricing perspective, the case of the fixed range note with a minimum coupon provision appears more interesting. Indeed, in this case, the pricing formula requires the distribution of the occupation time of the range $[x_l, x_u]$. If the random variable $H(T_j, T_{j+1})/D(T_j, T_{j+1})$ in equation (3) is replaced by its continuously monitored version

$$h(T_j, T_{j+1}; x_l, x_u) = \int_{T_j}^{T_{j+1}} 1_{[x_l, x_u]}(X(s))\,ds \tag{10}$$

few analytical results are available. In particular, the distribution of $h(T_j, T_{j+1}; x_l, x_u)$ when $x_l = -\infty$ (or $x_u = \infty$) is obtained in [1], exploiting the Feynman–Kac formula. Related results are presented in [4, 5, 7, 10, 12, 16, 17]. Owing to the stationarity of the Brownian motion, we observe that the distribution of $h(T_j, T_{j+1})$ is the same as the distribution of $h(0, T_{j+1} - T_j; x_l, x_u)$, and this law (when $x_l = -\infty$) is strictly related to the law of the occupation time in the time interval $[0, \tau]$ as

$$Y_{0,x}(u, \tau) = \int_0^{\tau} 1_{(-\infty, u]}\bigl(\delta s + W(s)\bigr)\,ds, \qquad W(0) = x \tag{11}$$

where $\delta = (r - q - \sigma^2/2)/\sigma$. Fusai [8] provides the pricing formula in the case of finite values of $x_l$ and $x_u$, obtaining the Laplace transform in time of the characteristic function of $Y_{0,x}(u, \tau)$.

The real-life case of discrete monitoring is discussed in [9] using Monte Carlo simulation and finite difference methods. In particular, the authors stress that the price of the contract with continuous time

monitoring is the highest (lowest) when the index is inside (outside) the band. This is due to the nature of the contract: if the index is inside the band and we assume continuous-time monitoring, then the passage of time increases the value of the contract until the moment at which the index crosses the barriers. If instead we assume discrete-time monitoring, we cannot exploit the passage of time completely: we register the position of the index only at discrete dates, and if these are far apart (e.g., a month), it is possible that the index will have moved outside the band by the reset date, so that the occupation time does not increase. In this case, the time between the two monitoring dates is entirely lost. Vice versa, in the discrete case, if we are outside the band and the index moves inside the band between two monitoring dates, then the occupation time increases by the entire time distance between monitoring dates. With continuous-time monitoring, instead, we miss every instant before the process crosses the barrier. So the continuous-time formula will overvalue (undervalue) the discrete-time formula when the index is inside (outside) the band.

We have not discussed here the pricing of floating range notes with a stock index as underlying because they are not very common. However, using the stock as numéraire and the tower property (to take into account the delay between maturity date and payment date), analytical formulas are readily available. In the case of a minimum coupon, we need the joint law of Brownian motion and its occupation times. Useful formulas for this case are provided in [2, 10, 12].

Underlying is an Interest Rate

In this case, the underlying variable X(t) of the range note is a simply compounded interest rate with tenor θ, defined according to the formula

$$ 1 + X(t)\,\theta = \frac{1}{P(t, t+\theta)} \tag{12} $$

The pricing of range notes with an interest rate as underlying began with Turnbull [18], who assumes that the interest rate dynamics are well represented by a one-factor Gaussian Heath–Jarrow–Morton (HJM) model, that is, the dynamics of the price of a zero-coupon bond P(t, T) are given by

$$ \frac{dP(t,T)}{P(t,T)} = r(t)\,dt + \sigma(t,T)\,dW(t) \tag{13} $$

where σ(t, T) is deterministic.

In particular, Turnbull [18] considers the Ho and Lee model ($\sigma(t,T) = \sigma$, constant) and the Hull and White model ($\sigma(t,T) = \sigma e^{-\lambda(T-t)}$, $\lambda > 0$) and obtains a closed-form formula for the range note (fixed rate and delayed floating case). A simpler and more intuitive derivation than that in [18], valid for a more general (albeit still deterministic) volatility function, is obtained in [14] using the change of numéraire technique.

The multifactor Gaussian HJM model

$$ \frac{dP(t,T)}{P(t,T)} = r(t)\,dt + \sigma(t,T)\cdot dW(t) \tag{14} $$

where · denotes the inner product in $\mathbb{R}^n$ and $W(t) \in \mathbb{R}^n$ is an n-dimensional standard Brownian motion, is considered in [15]. Such an extension is important because it enhances the calibration of the term structure model to the interest rate covariance matrix “observed” in the market, which, along with the term structure of interest rates, will ultimately determine the price of the range notes under analysis. In fact, in order to price and hedge range notes consistently with the market prices of related plain-vanilla interest rate options (such as caps and/or European swaptions), it is essential to use an interest rate model that is analytically tractable and that provides a good fit to the term structures of interest rates, volatilities, and correlations observed in the market.

Reference [6] further extends these results to a multivariate Lévy term structure model, an extension of a Gaussian HJM model with jump processes.

The main limitation of these models is that the rate X(t) can attain negative values with positive probability, which may cause some pricing error in many cases.

An extension to the class of affine term structure models, which encompasses both Gaussian and non-Gaussian models such as Cox–Ingersoll–Ross (CIR) square-root models, is introduced in [11]. The extension is useful in ensuring the positivity of the interest rate; moreover, Jangy and Yoon [11] also consider the pricing of spread range notes. The

limit of their analysis is that they consider as underlying asset a continuously compounded interest rate of a given tenor rather than the corresponding simply compounded interest rate. On the one hand, this allows one to get (i) analytical formulas for a more general class of multifactor models and (ii) positive interest rates. On the other hand, the pricing formulas are not immediately adaptable to real-life contracts because the spread is usually between swap rates of different tenors.

A Libor Market Model is instead adopted in [19]. This has the advantage of enhancing the model calibration to the interest rate covariance matrix observed in the market and, in addition, it guarantees the positivity of interest rates. However, in this model no analytical formulas are available for floating range notes, which need to be priced by Monte Carlo simulation. Nevertheless, by freezing the drift of forward rates in order to obtain dynamics of the LIBOR rate with deterministic drift and volatility, Wu and Chen [19] are able to provide approximate pricing formulae.

Finally, we remark that, with an interest rate as underlying, no pricing formula has so far been made available in the literature for a range note with a minimum coupon provision.

Related Payoff

Exotic options related to corridor options are quantile options, introduced in [13] and studied in more detail in [1, 4]. Step options, studied in [12], are contracts related to range notes and are an alternative to standard barrier options. Barrier options have the drawback of losing all value at the first touch of the barrier. Step options lose value more gradually: the option value decreases as the underlying asset spends more time at lower levels. Another example is the Parisian option (see Constant Maturity Swap). A Parisian out option with window D, barrier L, and maturity date T will lose all value if the underlying price has an excursion of duration D above or below the barrier L during the option’s life. If the loss of value is prompted by an excursion above (below) the barrier, the option is said to be an up-and-out (down-and-out) Parisian option. Parisian contractual forms were introduced and studied by Chesney et al. [3]. Contracts of this type are more robust to possible price manipulations. The pricing formulas in [3, 12] involve inverse Laplace transforms.

End Notes

a. The fairway in golf is like the index or interest rate range. The outlook is positive if the ball lands on the fairway. If, however, a ball lands in the rough, the outlook is negative. Source: http://www.investopedia.com/terms/f/fairwaybond.asp as of January 2009.

b. The German noun hamster has the same meaning as the English noun hamster: it is the name of a small rodent. But HAMSTER is also an acronym standing for Hoffnung Auf MarktSTabilitaet in Einer Range (literally: Hope on market stability in a given range). It really is a pun as in German the verb ‘hamstern’ has the meaning of ‘to hoard’. HAMSTER options hoard the fixed amount one gets for each day; the underlying stays in the prespecified range. What is earned cannot be lost any more. Source: http://www.margrabe.com/Dictionary/DictionaryGJ.html#sectH as of January 2009.

References

[1] Akahori, J. (1995). Some formulae for a new type of path-dependent options, Annals of Applied Probability 5, 383–388.
[2] Borodin, A.N. & Salminen, P. (1996). Handbook of Brownian Motion - Facts and Formulae, Birkhauser.
[3] Chesney, M., Jeanblanc-Picqué, M. & Yor, M. (1997). Brownian excursions and Parisian barrier options, Advances in Applied Probability 29(1), 165–184.
[4] Dassios, A. (1995). The distribution of the quantile of a Brownian motion with drift and the pricing of related path-dependent options, Annals of Applied Probability 4(2), 719–740.
[5] Douady, R. (1999). Closed form formulas for exotic options and their lifetime distribution, International Journal of Theoretical and Applied Finance 2(1), 17–42.
[6] Eberlein, E. & Kluge, W. (2006). Valuation of floating range notes in Lévy term-structure models, Mathematical Finance 16(2), 237–254.
[7] Embrechts, P., Rogers, L.C.G. & Yor, M. (1995). A proof of Dassios’ representation of the α-quantile of Brownian motion with drift, Annals of Applied Probability 5(3), 757–767.
[8] Fusai, G. (2000). Corridor options and Arc-Sine law, Annals of Applied Probability 10(2), 634–663.
[9] Fusai, G. & Tagliani, A. (2001). Pricing of occupation time derivatives: continuous and discrete monitoring, Journal of Computational Finance 5(1), 1–37.
[10] Hugonnier, J. (1999). The Feynman-Kac formula and pricing occupation time derivatives, International Journal of Theoretical and Applied Finance 2(2), 153–178.
[11] Jangy, B.G. & Hee Yoon, J. (2008). Valuation of Range Notes Under Affine Term Structure Models, http://ssrn.com/abstract=1291703.

[12] Linetsky, V. (1999). Step options: The Feynman-Kac approach to occupation time derivatives, Mathematical Finance 9, 55–96.
[13] Miura, R. (1992). A note on lookback options based on order statistics, Hitotsubashi Journal of Commerce and Management 27, 15–28.
[14] Navatte, P. & Quittard-Pinon, F. (1999). The valuation of interest rate digital options and range notes revisited, European Financial Management 5(3), 425–440.
[15] Nunes, J.P.V. (2004). Multi-factor valuation of floating range notes, Mathematical Finance 14(1), 79–97.
[16] Pechtl, A. (1995). Classified information, in Over the Rainbow, J. Robert, ed, Risk Publications, pp. 71–74.
[17] Takàcs, L. (1996). On a generalization of the arc-sine law, Annals of Applied Probability 6(3), 1035–1040.
[18] Turnbull, S.M. (1995). Interest rate digital options and range notes, Journal of Derivatives 3, 92–101.
[19] Wu, T.P. & Chen, S.N. (2008). Valuation of floating range notes in a LIBOR market model, Journal of Futures Markets 28(7), 697–710.

Further Reading

Tucker, A.L. & Wei, J.Z. (1997). The latest range, Advances in Futures and Options Research 9, 287–296.

Related Articles

Barrier Options; Corridor Variance Swap; Discretely Monitored Options; Parisian Option.

GIANLUCA FUSAI

Lookback Options

Lookback options are path-dependent options, first introduced in [25] and [26], whose settlement is based on the minimum or the maximum value of an underlying index as registered during the lifetime of the option. At maturity, the holder can “look back” and select the most convenient price of the underlying that occurred during this period: they therefore offer investors the opportunity (at a price, of course) of buying a stock at its lowest price and selling a stock at its highest price. Since this scheme guarantees the best possible result for the option holder, he or she will never regret the option payoff. As a consequence, a lookback option is more expensive than a vanilla option with a similar payoff function. However, these options do not offer a natural hedge for typical businesses and are used mainly by speculators. To mitigate their cost, the lookback feature is sometimes mixed with an average feature: for example, the payoff is the best or the worst of past average prices, and they are offered as investment products under names such as Everest, Napoleon, and Altiplano.

In the section Payoff Function, we describe the payoff of lookback options. In the section Pricing, we illustrate the pricing of these options in the Black–Scholes setting and some results on the hedging problem. Thereafter, in the section Non-Gaussian Models, we consider the pricing problem under non-Gaussian models. Finally, in the section Related Payoff, we present payoffs related to lookback options.

Payoff Function

A lookback option can be structured as a put or a call. The strike can be either fixed or floating. We now consider two lookback options written on the minimum value achieved by the underlying index during a fixed time window:

• A fixed strike lookback put. The payoff is given by the difference, if positive, between the strike price and the minimum price over the monitoring period. Therefore, the buyer of this option can sell the asset at the minimum price receiving the strike K.
• A floating strike lookback call. The payoff is given by the difference between the asset price at the option maturity and the minimum price over the monitoring period, which represents the floating strike. Therefore, the buyer of this option can buy the underlying asset paying the minimum price.

Notice that floating strike options will always be exercised. Formulae for the payoffs are provided in Table 1, as well as versions involving the maximum and variants called partial price or partial time.

Table 1 Lookback option payoff function

Generic lookbacks
          Minimum                    Maximum                    Range
          m(T) = min_{0≤u≤T} S(u)    M(T) = max_{0≤u≤T} S(u)    M(T) − m(T)

Standard lookbacks
          Floating strike      Fixed strike        Reverse strike
Call      (S(T) − m(T))+       (M(T) − K)+         (m(T) − K)+
Put       (M(T) − S(T))+       (K − m(T))+         (K − M(T))+

Nonstandard lookbacks
          Partial price        Partial time        Conditions
Call      (S(T) − λm(T))+      (S(T2) − m(T1))+    λ ≥ 1, T1 < T2
Put       (γM(T) − S(T))+      (M(T1) − S(T2))+    γ ≤ 1, T1 < T2

Pricing

In this section, we discuss the pricing problem under the geometric Brownian motion (GBM) assumption, that is,

$$ dS(t) = (r - q)\,S(t)\,dt + \sigma S(t)\,dW(t), \qquad S(0) = S_0 \tag{1} $$

where r is the instantaneous risk-free rate; q is the instantaneous dividend yield; σ is the percentage volatility; $S_0$ is the initial underlying price; and we are interested in the distribution of the minimum m(T) and maximum M(T)

$$ M(T) = \max_{0 \le u \le T} S(u), \qquad m(T) = \min_{0 \le u \le T} S(u) \tag{2} $$

Figure 1 illustrates a simulated path of the underlying asset according to the dynamics in equation (1) and the corresponding trajectories for the maximum and minimum price.

Figure 1 Geometric Brownian motion and its maximum and minimum to date (plot of the stock price, the running minimum, and the running maximum against time from 0 to 1)

Analytical Solution

Under the GBM assumption, the distribution law of m(T) (as well as the joint density of m(T) and S(T)) is known in closed form. This allows one to obtain an analytical solution for standard lookback options as the expected value of the discounted payoff; see, for example, [25, 26], and [15]. Therefore, we obtain the following pricing formula for the floating strike lookback call:

$$
\begin{aligned}
\mathbb{E}_{t,S_t}\!\left[e^{-r(T-t)}\,(S(T) - m(T))\right]
&= S_t e^{-q(T-t)} - e^{-r(T-t)}\,\mathbb{E}_{t,S_t}\!\left[m(T)\right] \\
&= S_t e^{-q(T-t)} N(d_2) - e^{-r(T-t)}\, m_t\, N\!\left(d_2 - \sigma\sqrt{T-t}\right) \\
&\quad + \frac{\sigma^2}{2r}\, S_t \left[ e^{-r(T-t)} \left(\frac{S_t}{m_t}\right)^{-2r/\sigma^2} N(d_3) - e^{-q(T-t)} N(-d_2) \right]
\end{aligned}
\tag{3}
$$

with

$$
d_2 = \frac{1}{\sigma\sqrt{T-t}} \left[ \ln\frac{S_t}{m_t} + (r-q)(T-t) + \frac{1}{2}\sigma^2 (T-t) \right], \qquad
d_3 = -d_2 + \frac{2(r-q)}{\sigma}\sqrt{T-t}, \qquad
N(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\,du
\tag{4}
$$

Notice that the formula above admits a simplification when t = 0: indeed, in this case, we have S(0) = m(0). Formula (3) suggests that the lookback call value is given by the sum of the premium of a plain vanilla call with strike equal to the current minimum and the premium of a so-called strike bonus option. This is the expected value of the cash flows necessary to exchange the initial option position for options with successively more favorable strikes. In other words, it measures the potential decrease in the price at which the option allows its holder to buy the security, if and when the security price attains a new minimum.

For pricing the fixed strike put option, we have

$$
\begin{aligned}
\mathbb{E}_{t,S_t}\!\left[e^{-r(T-t)}\,(K - m(T))^+\right]
&= 1_{(K < m_t)} \Bigg[ -S_t N(-d) + e^{-r(T-t)} K\, N\!\left(-d + \sigma\sqrt{T-t}\right) \\
&\qquad\quad + \frac{\sigma^2}{2r}\, S_t \left( e^{-r(T-t)} \left(\frac{S_t}{K}\right)^{-2r/\sigma^2} N(d_1) - N(-d) \right) \Bigg] \\
&\quad + 1_{(K \ge m_t)} \Bigg[ e^{-r(T-t)} (K - m_t) - S_t N(-d_2) + e^{-r(T-t)}\, m_t\, N\!\left(-d_2 + \sigma\sqrt{T-t}\right) \\
&\qquad\quad + \frac{\sigma^2}{2r}\, S_t \left( e^{-r(T-t)} \left(\frac{S_t}{m_t}\right)^{-2r/\sigma^2} N(d_3) - N(-d_2) \right) \Bigg]
\end{aligned}
\tag{5}
$$

with

$$
d = \frac{1}{\sigma\sqrt{T-t}} \left[ \ln\frac{S_t}{K} + (r-q)(T-t) + \frac{1}{2}\sigma^2 (T-t) \right], \qquad
d_1 = -d + \frac{2(r-q)}{\sigma}\sqrt{T-t}
\tag{6}
$$

Lookback options on the maximum can be priced by exploiting the relation between maximum and minimum operators.

An important feature of lookbacks is the frequency of observation of the underlying asset for the purpose of identifying the best possible value for the holder. For example, the above expressions are consistent with the assumption that the underlying asset is monitored continuously. Instead, discrete monitoring refers to updating the maximum/minimum price at fixed times (e.g., daily, weekly, or monthly). In this case, we have to replace M(T) and m(T) by $\tilde{M}(T)$ and $\tilde{m}(T)$, defined as

$$ \tilde{M}(T) = \max_{0 \le i \le n} S(i\Delta), \qquad \tilde{m}(T) = \min_{0 \le i \le n} S(i\Delta) \tag{7} $$

where n is the number of monitoring dates and Δ is the time distance between monitoring dates, with nΔ = T. Nearly all closed-form expressions available for pricing path-dependent options are based on continuous-time paths, but many traded options are based on discrete price fixings. In general, a higher maximum/lower minimum occurs as the number n of monitoring dates increases. As noted by Heynen and Kat [29] and Aitsahlia and Leung [1], the difference between continuous and discrete monitoring of the reference index may have a significant effect on the prices of lookback options, but does not introduce new hedging problems. Indeed, the slow convergence of the discrete scheme to the continuous one as the number n of monitoring dates increases is well known: the discrepancy is of order approximately $1/\sqrt{n}$.

For example, setting r = 0.1, q = 0, σ = 0.2, T = 1, $S_0$ = 100, the continuous formula returns 19.6456. Assuming a year consists of 250 days and that 10 monitoring dates are available (i.e., monitoring occurs approximately once a month), the discrete formula gives 17.0007: a percentage difference of about 15% with respect to the continuous case. Using 10 000 monitoring dates (i.e., monitoring occurs once every 36 minutes), the discrete formula returns 19.5523, a small but still appreciable difference with respect to the continuous case.

Few papers have investigated the analytical pricing of discretely monitored lookback options. A correction to the continuously monitored formula based on the Riemann zeta function is given in [12]. The Riemann zeta function indeed enters the computation of the moments of the discrete maximum/minimum; see, for example, [23, 27, 31, 32, 34], for different derivations and improvements of the above correction. New results for discrete options have recently been obtained by exploiting the Wiener–Hopf factorization and Spitzer’s identity. For example, [5] and [27] present an exact analytical formula showing how to cast the pricing problem in terms of an integral equation of the Wiener–Hopf type. This equation can be solved in closed form in the Gaussian case. The computational cost is linear in the number of monitoring dates. A related approach based on Spitzer’s identity (the probabilistic interpretation of the solution of a Wiener–Hopf equation) has been advanced by Borovkov and Novikov [9] and by Petrella and Kou [39]. They propose an algorithm with a computational cost that is quadratic in the number of monitoring dates but has the advantage of being easily adapted to non-Gaussian models, provided that they have independent identically distributed (i.i.d.) increments and that pricing formulas for plain vanilla calls and puts are available.

Other approaches are mainly numerical and are briefly detailed in the following subsections.

Finite Difference Method

The numerical solution of the partial differential equation (PDE) satisfied by the lookback option price is discussed, for example, in [42], and a detailed treatment for the discrete case is given in [3]. Since lookback options are path-dependent options, their value V does not depend only on the current spot

price and time, but also on the current realized minimum or maximum, and we can write V = V(S, m, t) for options on the minimum (a similar discussion holds for lookbacks on the maximum). Applying Itô’s lemma and equating the expected return on the option to the return on a risk-free investment, it can be shown that V solves the following PDE:

$$ \frac{\partial V}{\partial t} + (r - q)\,S\,\frac{\partial V}{\partial S} + \frac{\sigma^2}{2}\,S^2\,\frac{\partial^2 V}{\partial S^2} = rV \tag{8} $$

which has to be solved for S ≥ m and T ≥ t ≥ 0. The above PDE is the standard Black–Scholes PDE, with the domain changed from S ≥ 0 to S ≥ m. Here m appears as a parameter delimiting the domain of the spot price. This implies that a boundary condition at S = m is needed. The important point is the observation that when the spot price is near the running minimum, the probability that at expiry the minimum will be equal to the current minimum m is zero, and therefore changes in m do not affect the option value. This allows one to set the boundary condition at S = m:

$$ \frac{\partial V(S, m, t)}{\partial m} = 0 \quad \text{when } S = m \tag{9} $$

Together with the payoff condition at t = T, equations (8) and (9) allow us to fully characterize the lookback option premium. For discretely monitored lookback options, with monitoring at dates $t_i = i\Delta$, the PDE (8) remains unchanged, while the boundary condition (9) no longer applies. Indeed, between monitoring dates, the spot price can freely move in (0, +∞) and at monitoring dates, the solution is updated according to the rule

$$ V(S, m, t_i^+) = V(S, \min(S, m), t_i^-) \tag{10} $$

Equation (8) can be solved numerically using an appropriate numerical scheme such as the Crank–Nicolson one (see Crank–Nicolson Scheme). However, exploiting a change of numeraire, the PDE (8) can be simplified to a single state variable [3].

Monte Carlo Simulation

Discretely monitored lookback options can be easily priced by standard Monte Carlo (MC) simulation. The underlying price is simulated at all monitoring dates exploiting the exact solution of the stochastic differential equation (1):

$$ S^{(j)}((i+1)\Delta) = S^{(j)}(i\Delta)\, \exp\!\left( (r - q - 0.5\sigma^2)\Delta + \sigma\sqrt{\Delta}\,\epsilon_i^{(j)} \right) \tag{11} $$

where $\epsilon_i^{(j)}$ is a standard normal random variate and $S^{(j)}(i\Delta)$ is the spot price at time $t_i = i\Delta$ as sampled in the jth simulation.

The corresponding minimum price $\tilde{m}^{(j)}(i\Delta)$ over the time interval $[t_{i-1}, t_i]$ is updated at each monitoring date according to the rule

$$ \tilde{m}^{(j)}(i\Delta) = \min\!\left( S^{(j)}(i\Delta),\ \tilde{m}^{(j)}((i-1)\Delta) \right) \tag{12} $$

with a starting condition $\tilde{m}_0^{(j)} = S_0$. The MC price for a lookback is given by the average of the discounted payoff computed over J simulated sample paths. For a lookback option with fixed strike K, the MC price is

$$ e^{-r t_n}\, \frac{1}{J} \sum_{j=1}^{J} \left( K - \tilde{m}^{(j)}(n\Delta) \right)^+ \tag{13} $$

Similarly, for a floating strike lookback option, the MC price is

$$ e^{-r t_n}\, \frac{1}{J} \sum_{j=1}^{J} \left( S^{(j)}(n\Delta) - \tilde{m}^{(j)}(n\Delta) \right) \tag{14} $$

Unfortunately, the procedure cannot be immediately applied to continuously monitored options by shrinking the time step Δ. Indeed, owing to the fact that we can only sample at discrete times, we lose information about the parts of the continuous-time path that lie between the sampling dates. This procedure will be systematically biased in the sense that the continuous minimum (maximum) will always be overestimated (underestimated). Andersen and Brotherton-Ratcliffe [4] show that for a one-year lookback with 256 discrete monitoring points this bias is around 5% of the option price and suggest a procedure to correct it.

Binomial and Tree Methods

As for PDEs, the implementation of a tree for path-dependent options involves two state variables, and the need to keep track of current extreme values will cause the number of calculations to grow substantially faster than the number of nodes. However, under the GBM assumption and exploiting

a change of numeraire, the pricing of lookback options can be reduced to a one-state binomial model (having a reflecting barrier at 0); see, for instance, [6] and [14], as well as Tree Methods. Using such models cuts the amount of computation remarkably and also makes it straightforward to deal with the early exercise feature.

Hedging

We conclude this section by mentioning the hedging problem of lookback options. When $r = q + \sigma^2/2$, lookback options can be exactly replicated using a self-financing strategy based on straddles (an ordinary put plus an ordinary call) with an exercise price equal to the initial extremum (maximum or minimum) [25]. If, over the life of the option, the stock price never rises above or falls below its initial value, the initial straddle would exactly satisfy the writer’s terminal obligation. When the stock price is at an extremum and then achieves a new maximum or minimum, the straddle should be sold and a new portfolio established, a straddle with exercise price equal to the new maximum (minimum). This strategy is self-financed and replicates the lookback option. However, in general, the replicating strategy requires the computation of the delta (Δ) coefficient, that is, the units of stock needed to replicate the contingent claim, by taking the derivative of the pricing formula with respect to the current spot price. An alternative method is put forward by the use of the Malliavin calculus approach, and the main component for this method to work is a sort of representation theorem, called the Clark–Ocone formula. This formula allows us to identify a formal expression for the replicating portfolio of basically any contingent claim. This has been exploited in [8] to obtain the replicating portfolio.

A different approach is taken in [30], which finds bounds on the prices of the lookback option in terms of the (market) prices of call options. This is achieved without making explicit assumptions about the dynamics of the price process of the underlying asset, but rather by inferring information about the potential distribution of asset prices from the call prices. Thus the bounds and the associated hedging strategies are model independent and represent limits on the possible price of the lookback, which are necessary for the absence of arbitrage.

Non-Gaussian Models

The GBM model in equation (1) is one of the most successful and widely used models in financial economics. Unfortunately, deviations between the model and empirical evidence are well known in the literature and therefore offer new opportunities for the development of more realistic models and pricing formulas for exotic options. With reference to lookback options, extensions have been obtained by replacing the GBM model with the constant elasticity of variance (CEV) model (see Constant Elasticity of Variance (CEV) Diffusion Model) or with an exponential Lévy model (see Exponential Lévy Models). In the following sections we discuss such extensions.

CEV

The study of lookback options in the CEV model started with [10] and [11]. Their approach consists in approximating the CEV process by a trinomial lattice and using it to value barrier and lookback options numerically. A binomial tree is also used in [16]. While these approaches are purely numerical, in [20] closed-form solutions for the Laplace transforms of the probability distributions of the maximum and minimum are obtained. Lookback prices are recovered by inverting the Laplace transforms and integrating against the option payoff. An analytical inversion of the Laplace transforms for lookback options in terms of spectral expansions associated with the infinitesimal generator of the CEV diffusion is given in [36]. All these studies point out that the differences in prices of these exotic options under the CEV and geometric Brownian motion assumptions can be far more significant than the differences for standard European options.

Lévy Processes

Lookback options have also been priced under Lévy processes. For example, in [13] a very powerful algorithm is proposed that can be adapted for pricing discrete lookbacks under a jump-diffusion model. For more general Lévy processes, very efficient algorithms have been proposed in [22, 39] and [24]. An analytical expression in terms of the Laplace transform is found for continuously monitored options, under a double-exponential model, in [35].
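The remark that Spitzer-type methods carry over to any model with i.i.d. increments can be made concrete with a small sketch. Assuming the log-returns over monitoring intervals are i.i.d., Spitzer’s identity yields a recursion for $a_n = \mathbb{E}[\tilde{m}(T)]/S_0$: $a_0 = 1$ and $a_n = \frac{1}{n}\sum_{k=1}^{n} c_k\, a_{n-k}$, where $c_k = \mathbb{E}[\exp(\min(X_k, 0))]$ and $X_k$ is the k-period log-return. The code below is only an illustration of this recursion, not the algorithm of [9] or [39]. It is written for the Gaussian case so that it can be compared with the figures quoted in the Pricing section, read here as floating strike lookback call values at inception (an interpretation, not a statement of the source). All function names are hypothetical, and the closed form uses r − q where formula (3) prints r, which coincides with (3) when q = 0.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function N(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def c_gaussian(k, r, q, sigma, dt):
    """c_k = E[exp(min(X_k, 0))] for a Gaussian k-period log-return X_k.

    Assumption: X_k ~ N((r - q - sigma^2/2) k dt, sigma^2 k dt), as under GBM.
    For a Levy model only this function would need to be replaced.
    """
    mean = (r - q - 0.5 * sigma ** 2) * k * dt
    std = sigma * math.sqrt(k * dt)
    prob_up = norm_cdf(mean / std)  # P(X_k > 0)
    # E[exp(X_k); X_k <= 0] for a normal variable
    part_down = math.exp(mean + 0.5 * std ** 2) * norm_cdf(-(mean + std ** 2) / std)
    return prob_up + part_down


def expected_min_ratio(n, r, q, sigma, T):
    """a_n = E[min_{0<=i<=n} S(i dt)] / S0 via the Spitzer-type recursion.

    The double loop makes the cost quadratic in the number of dates n.
    """
    dt = T / n
    c = [0.0] + [c_gaussian(k, r, q, sigma, dt) for k in range(1, n + 1)]
    a = [1.0]
    for m in range(1, n + 1):
        a.append(sum(c[k] * a[m - k] for k in range(1, m + 1)) / m)
    return a[n]


def discrete_floating_call(S0, r, q, sigma, T, n):
    """Floating strike lookback call at inception, n monitoring intervals."""
    a_n = expected_min_ratio(n, r, q, sigma, T)
    return S0 * math.exp(-q * T) - math.exp(-r * T) * S0 * a_n


def continuous_floating_call(S0, r, q, sigma, T):
    """Continuous-monitoring counterpart at inception (m_0 = S_0).

    Assumption: b = r - q replaces r in the strike bonus term; requires b != 0.
    """
    b = r - q
    vol = sigma * math.sqrt(T)
    d2 = (b + 0.5 * sigma ** 2) * T / vol
    d3 = -d2 + 2.0 * b * math.sqrt(T) / sigma
    bonus = sigma ** 2 / (2.0 * b) * S0 * (
        math.exp(-r * T) * norm_cdf(d3) - math.exp(-q * T) * norm_cdf(-d2)
    )
    return (S0 * math.exp(-q * T) * norm_cdf(d2)
            - math.exp(-r * T) * S0 * norm_cdf(d2 - vol)
            + bonus)


if __name__ == "__main__":
    args = (100.0, 0.1, 0.0, 0.2, 1.0)
    print("continuous:", continuous_floating_call(*args))
    print("10 dates:  ", discrete_floating_call(*args, 10))
```

With r = 0.1, q = 0, σ = 0.2, T = 1 and $S_0$ = 100, the two functions can be compared with the quoted values 19.6456 and 17.0007. Because the recursion is quadratic in n, a pure-Python run with 10 000 dates is slow; the sketch is meant for small n.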
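The Monte Carlo update rule (12) only needs the log-return over each monitoring interval, so the same loop applies to an exponential Lévy model once the increment sampler is changed. The sketch below assumes a Merton-type jump-diffusion (normally distributed jump sizes arriving at Poisson rate `lam`), which is a choice made here for illustration and is not a model singled out in the text. The function names, the parameters `lam`, `mu_j`, `delta_j`, and the fixed seed are all hypothetical. Setting `lam = 0` recovers the GBM scheme of equations (11), (12), and (14).

```python
import math
import random


def poisson(rng, mean):
    """Sample a Poisson count by Knuth's multiplication method (small means)."""
    limit = math.exp(-mean)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def mc_floating_lookback_call(S0, r, q, sigma, T, n, paths,
                              lam=0.0, mu_j=0.0, delta_j=0.0, seed=12345):
    """MC price of a discretely monitored floating strike lookback call.

    Assumption: log-return over dt is a Gaussian diffusion part plus a
    compound Poisson sum of N(mu_j, delta_j^2) jumps, with the drift
    compensated so that the discounted asset price is a martingale.
    """
    rng = random.Random(seed)
    dt = T / n
    kappa = math.exp(mu_j + 0.5 * delta_j ** 2) - 1.0  # mean relative jump
    drift = (r - q - 0.5 * sigma ** 2 - lam * kappa) * dt
    vol = sigma * math.sqrt(dt)
    total = 0.0
    for _path in range(paths):
        spot = running_min = S0          # starting condition m_0 = S_0
        for _step in range(n):
            log_ret = drift + vol * rng.gauss(0.0, 1.0)
            for _jump in range(poisson(rng, lam * dt)):
                log_ret += rng.gauss(mu_j, delta_j)
            spot *= math.exp(log_ret)
            running_min = min(spot, running_min)   # update rule (12)
        total += spot - running_min                # payoff in (14)
    return math.exp(-r * T) * total / paths


if __name__ == "__main__":
    print(mc_floating_lookback_call(100.0, 0.1, 0.0, 0.2, 1.0, 10, 40000))
    print(mc_floating_lookback_call(100.0, 0.1, 0.0, 0.2, 1.0, 10, 5000,
                                    lam=1.0, mu_j=-0.1, delta_j=0.15))
```

With `lam = 0` and 10 monitoring dates, the estimate can be compared with the value 17.0007 quoted in the Pricing section, up to sampling error. As noted in the Monte Carlo Simulation section, shrinking the time step does not remove the bias relative to continuous monitoring.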

Related Payoff

Several exotic options can be thought of as modifications of lookback options. We mention a few of them here.

In partial lookback options, the lookback feature is limited to only the first (for entry timing) or the last (for exit timing) part of the option’s life. This product, even with a relatively short lookback period, appears to offer a good solution to most timing problems at a reasonable price. Kat and Heynen [33] provide closed-form pricing formulas for such options. Their analysis shows how the prices of such “partial lookback options” respond to a change in the monitoring period.

Quanto lookback refers to a payoff structure where the terminal payoff of the quanto option depends on the realized extreme value of a stock denominated in a foreign currency but the payoff is paid in a domestic currency. These contracts have been studied in [18].

Double lookbacks or range options include calls and puts with the underlying being the difference between the maximum and minimum prices of one asset over a certain period, and calls or puts with the underlying being the difference between the maximum prices of two correlated assets over a certain period. Analytical expressions of the joint probability distribution of the maximum and minimum values of two correlated geometric Brownian motions are derived in [28] and used in the valuation of double lookbacks. An option on the spread between the maximum and minimum price of a single stock over a given interval of time captures the idea of an option on price volatility; see, for example, the related literature on financial econometrics devoted to estimating volatility using range estimators. Therefore, such an option might be of interest to traders who want to bet on price volatility or hedge an existing position that is sensitive to price volatility.

Quantile options have a payoff at maturity depending on the order statistics of the underlying asset price. The j-quantile process q(n, j) is defined as the level at which the return process stays below for j periods out of n. In particular, we have $q(n, n) = \tilde{M}(n\Delta)$ and $q(n, 0) = \tilde{m}(n\Delta)$. The quantile payoff is obtained by replacing the extreme appearing in Table 1 by the quantile q(n, j). These options can be priced by making use of what is known as the Dassios–Port–Wendel identity [19], which allows one to write the quantile as the sum

$$ q(n, j) \overset{d}{=} \tilde{m}((n-j)\Delta) + \tilde{M}(j\Delta) \tag{15} $$

where $\overset{d}{=}$ means equality in distribution and $\tilde{m}((n-j)\Delta)$ and $\tilde{M}(j\Delta)$ are independent processes. From this identity, it follows that the density of the quantile is the convolution of the densities of $\tilde{m}((n-j)\Delta)$ and $\tilde{M}(j\Delta)$ and, at the issue date of the contract, the quantile price can be obtained as the expected discounted payoff under the risk-neutral measure; see [5] and [19] for details. Another way of exploiting the Dassios–Port–Wendel identity consists in conditioning on the minimum and representing the quantile option price as an average of lookback option prices written on the maximum and with random strike. The average is taken with respect to the density of the discrete minimum. Finally, we remark that pricing after the inception date is not straightforward, because the quantile process is non-Markovian. A discussion can be found in [19]. Other relevant references are [2, 38], and [7].

A drawdown (drawup) option is based on the drop (increase) of the asset price from its running maximum (minimum), D(t) = M(t) − S(t) (U(t) = S(t) − m(t)). Maximum drawdown MDD(t) is defined as the maximal drop of the asset price from its running maximum over a given period of time:

$$ MDD(t) = \max_{0 \le s \le t} D(s) \tag{16} $$

In a similar manner, we can define the maximum drawup. Maximum drawdown measures the worst loss of an investor who enters the market at a certain point and leaves it at some following point within a given time period; this means that he or she buys the asset at a local maximum and sells it at the subsequent lowest point, and this drop is the largest in the given time period. A derivative contract on the maximum drawdown, introduced in [41], can serve as an important risk measure indicator: when the market is in a bubble, it is reasonable to expect that the prices of drawdown contracts would be significantly higher. On the other hand, when the market is stable, or when it exhibits mean reversion behavior, drawdown contracts would become cheaper. When the market experiences a crash, the lookback option may expire close to worthless if the final asset value is near its running maximum. Momentum traders believe

that the realized maximum drawdown (maximum drawup) will be larger than expected, and thus they are natural buyers of this contract. On the other hand, selling the (unhedged) contract is equivalent to taking the opposite strategy, namely, buying the asset when it is setting its new low. This is known as contrarian trading. Contrarian traders believe that the realized maximum drawdown (maximum drawup or range) will be smaller than expected, and they are natural sellers of this contract. The distribution of the maximum drawdown of Brownian motion is studied in [37].

Russian options are perpetual American options with lookback payoff, introduced in [40]. Russian options can be regarded as a kind of perpetual American fixed strike lookback option with zero strike price, and their pricing can be derived by using a probabilistic approach [21] or a PDE approach [17].

Finally, we mention the class of structures named mountain options, having names such as Himalaya, Everest, Altiplano, and so on (see Atlas Option; Himalayan Option; Altiplano Option). Here, the extremum of a given asset over a given period is replaced by the best or the worst performer over different periods of assets in a given basket. Sometimes, a global floor on the return of the product is also introduced. It is clear that MC simulation is needed to price this type of product, which, in general, is very sensitive to the cross-correlation of the assets. In addition, the Greeks of these contracts can change markedly as the trade progresses.

References

[7] Ballotta, L. & Kyprianou, A. (2001). A note on the Alpha-Quantile option, Applied Mathematical Finance 8, 137–144.
[8] Bermin, H.-P. (2000). Hedging lookback and partial lookback options using Malliavin calculus, Applied Mathematical Finance 39, 75–100.
[9] Borovkov, K. & Novikov, A. (2002). On a new approach for option pricing, Journal of Applied Probability 39, 1–7.
[10] Boyle, P.P. & Tian, Y. (1999). Pricing lookback and barrier options under the CEV process, Journal of Financial and Quantitative Analysis 34, 241–264.
[11] Boyle, P.P., Tian, Y. & Imai, J. (1999). Lookback options under the CEV process: a correction, Journal of Finance and Quantitative Analysis web site http://www.jfqa.org/. In: Notes, Comments, and Corrections.
[12] Broadie, M., Glasserman, P. & Kou, S. (1999). Connecting discrete and continuous path-dependent options, Finance and Stochastics 3, 55–82.
[13] Broadie, M. & Yamamoto, Y. (2005). A double-exponential fast Gauss transform algorithm for pricing discrete path-dependent options, Operations Research 53(5), 764–779.
[14] Cheuck, T.H.F. & Vorst, T.C.F. (1997). Currency lookback options and observation frequency: a binomial approach, Journal of International Money and Finance 16(2), 173–187.
[15] Conze, A. & Vishwanathan, R. (1991). Path-dependent options: the case of lookback options, Journal of Finance 46, 1893–1907.
[16] Costabile, M. (2006). On pricing lookback options under the CEV process, Decisions in Economics and Finance 29, 139–153.
[17] Dai, M. (2000). A closed-form solution for perpetual American floating strike lookback options, Journal of Computational Finance 4(2), 63–68.
[18] Dai, M., Kwok, Y.K. & Wong, H.Y. (2004). Quanto lookback options, Mathematical Finance 14(3), 445–467.
[1] Aitsahlia, F. & Leung, L.T. (1998). Random walk duality [19] Dassios, A. (1995). The distribution of the quantile of
and the valuation of discrete lookback options, Applied a Brownian motion with drift & the pricing of related
Mathematical Finance 5(3/4), 227–240. path-dependent options, Annals of Applied Probability 5,
[2] Akahori, J. (1995). Some formulae for a new type of 389–398.
path-dependent option, Annals of Applied Probability 5, [20] Davydov, D. & Linetsky, V. (2001). The valuation and
383–388. hedging of barrier and lookback options under the CEV
[3] Andreasen, J. (1998). The pricing of discretely sampled process, Management Science 47, 949–965.
Asian and lookback options: a change of numeraire [21] Duffie, D. & Harrison, J.M. (1993). Arbitrage pricing
approach, Journal of Computational Finance 2(1), 5–30. of Russian options and perpetual lookback options, The
[4] Andersen, L. & Brotherton-Ratcliffe, R. (1996). Exact Annals of Applied Probability 3(3), 641–651.
exotics, Risk Magazine 9, 85–89. [22] Feng, L. & Linetsky, V. (2009). Computing exponential
[5] Atkinson, C. & Fusai, G. (2007). Discrete extrema of moments of the discrete maximum of a Levy process and
Brownian motion and pricing of exotic options, Journal lookback options, Finance and Stochastics, available at
of Computational Finance 10(3), 1–43. SSRN:http://ssrn.com/abstract=1260934.
[6] Babbs, S. (2000). Binomial valuation of lookback [23] Fusai, G., Abrahams, I.D. & Sgarra, C. (2006). An exact
options, Journal of Economic Dynamics and Control analytical solution for discrete barrier options, Finance
24(11–12), 1499–1525. and Stochastics 10(1), 1–26.
[24] Fusai, G., Marazzina, D., Marena, M. & Ng, M. (2008). Maturity Randomization and Option Pricing. w.p. SEMeQ.
[25] Goldman, M.B., Sosin, H.B. & Gatto, M.A. (1979). Path-dependent options: buy at the low, sell at the high, Journal of Finance 34, 1111–1127.
[26] Goldman, M.B., Sosin, H.B. & Shepp, L. (1979). On contingent claims that insure ex-post optimal stock market timing, Journal of Finance 34, 401–413.
[27] Green, R., Fusai, G. & Abrahams, I.D. (2009). The Wiener-Hopf technique & discretely monitored path dependent option pricing, Mathematical Finance, to appear.
[28] He, H., Keirstead, W. & Rebholz, J. (1998). Double lookbacks, Mathematical Finance 8, 201–228.
[29] Heynen, R.C. & Kat, H.M. (1995). Lookback options with discrete and partial monitoring of the underlying price, Applied Mathematical Finance 2, 273–284.
[30] Hobson, D.G. (1998). Robust hedging of the lookback option, Finance and Stochastics 2(4), 329–347.
[31] Hörfelt, P. (2003). Extension of the corrected barrier approximation by Broadie, Glasserman, and Kou, Finance and Stochastics 7(2), 231–243.
[32] Howison, S. & Steinberg, M. (2007). A matched asymptotic expansions approach to continuity corrections for discretely sampled options. Part 1: barrier options, Applied Mathematical Finance 14, 63–89.
[33] Kat, H.M. & Heynen, R.C. (1994). Selective memory, Risk Magazine 7(11), 73–76.
[34] Kou, S.G. (2003). On pricing of discrete barrier options, Statistica Sinica 13, 955–964.
[35] Kou, S.G. & Wang, H. (2003). First passage times of a jump diffusion process, Advances in Applied Probability 35, 504–531.
[36] Linetsky, V. (2004). Lookback options and diffusion hitting time: a spectral expansion approach, Finance and Stochastics 8, 373–398.
[37] Magdon-Ismail, M., Atiya, A., Pratap, A. & Abu-Mostafa, Y. (2004). On the maximum drawdown of a Brownian motion, Journal of Applied Probability 41(1), 147–161.
[38] Miura, R. (1992). A note on lookback options based on order statistics, Hitotsubashi Journal of Commerce & Management 27, 15–28.
[39] Petrella, G. & Kou, S.G. (2004). Numerical pricing of discrete barrier and lookback options via Laplace transforms, Journal of Computational Finance 8, 1–37.
[40] Shepp, L. & Shiryaev, A.N. (1993). The Russian option: reduced regret, Annals of Applied Probability 3, 631–640.
[41] Vecer, J. (2006). Maximum drawdown and directional trading, Risk Magazine 19(12), 88–92.
[42] Wilmott, P., Dewynne, J.N. & Howison, S. (1993). Option Pricing: Mathematical Models and Computation, Oxford Financial Press.

Related Articles

Barrier Options; Corridor Options; Discretely Monitored Options; Parisian Option.

GIANLUCA FUSAI
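The drawdown contracts discussed in this article depend on the realized maximum drawdown and maximum drawup of the price path. As a rough illustration, the sketch below computes both quantities on a discretely observed path. It assumes absolute (not percentage) drawdowns, and the function names are hypothetical, chosen only for this example.

```python
def max_drawdown(path):
    """Largest drop from a running maximum to a later price (absolute terms)."""
    peak = path[0]
    worst = 0.0
    for price in path:
        peak = max(peak, price)            # running maximum so far
        worst = max(worst, peak - price)   # deepest fall below that maximum
    return worst


def max_drawup(path):
    """Largest rise from a running minimum to a later price (absolute terms)."""
    trough = path[0]
    best = 0.0
    for price in path:
        trough = min(trough, price)        # running minimum so far
        best = max(best, price - trough)   # highest climb above that minimum
    return best


if __name__ == "__main__":
    observed = [100.0, 120.0, 90.0, 110.0, 80.0, 130.0]
    print(max_drawdown(observed), max_drawup(observed))
```

A momentum trader in the sense above expects these realized values to exceed what the contract price implies; a contrarian trader expects the opposite.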
Parisian Option

Parisian options are barrier options that are activated or canceled, depending on the type of option, if the underlying asset has traded continuously above or below the barrier level for long enough. A down-and-out Parisian option is a contract that expires worthless if the underlying asset reaches a prespecified level $L$ and remains continuously below this level for a time interval longer than a fixed number $D$, called the window. Its price (for a call option) at time 0 is given by

\[ \phi(T, K) := e^{-rT}\, \mathbb{E}\left[ (S_T - K)^+ \, \mathbf{1}_{\{T_L^{D,-}(S) > T\}} \right] \tag{1} \]

where $T_L^{D,-}(S)$ is the first time the asset $S$ makes an excursion longer than $D$ below $L$. Parisian-style options are mostly encountered in convertible bonds with a "soft-call" provision for conversion. For example, the bond's specifications may be such that conversion will be allowed if and only if the share price remains above a theoretical price for a given amount of time, for example, 20 business days prior to the conversion date (this is a Parisian option). Other covenants stipulate that the average share price trades for n days above the trigger level. While the latter does not correspond sensu stricto to a Parisian option, the motivation is similar: to render the conversion rule more stable, and less prone to manipulation, by basing it on the behavior of the stock price over a window of time as opposed to basing it on the (more volatile) spot price. The pioneering paper on the topic is due to Chesney et al. [5]. Pricing Parisian options is a challenging issue and several methods have been proposed in the literature: Monte Carlo simulations, Laplace transforms, lattices, and partial differential equations.

Monte Carlo Method

As for standard barrier options, using simulations leads to a biased estimate, owing to the choice of the discretization time step in the Monte Carlo algorithm. Baldi et al. [3] have developed a method based on sharp large deviation estimates, which improves the usual Monte Carlo procedure. It consists in providing an approximation of $p^{\varepsilon}$, the probability that a Brownian bridge reaches a time-dependent barrier over a time of length $\varepsilon$, by studying its asymptotic behavior when $\varepsilon$ tends to 0. They derive precise estimates of $g_{L,t}^{S} := \sup\{u \le t \mid S_u = L\}$, which is, for a down-and-out Parisian option, related to $T_L^{D,-}(S)$ by the following formula: $T_L^{D,-}(S) := \inf\{t > 0 : (t - g_{L,t}^{S})\, \mathbf{1}_{\{S_t < L\}} > D\}$. This procedure still works when the asset follows a diffusion process with general coefficients.

Laplace Transforms

The idea of using Laplace transforms for pricing Parisian options is due to Chesney et al. [5]. By using Brownian excursion theory, they obtain closed formulas for

\[ \int_0^{\infty} dt \, e^{-\lambda t} \phi(t, K) \tag{2} \]

the Laplace transform of the price with respect to the maturity time.

For models with constant parameters, when considering a down-and-in call option, one rewrites $\phi(T, K)$ as

\[ e^{-\left(r + \frac{m^2}{2}\right) T}\, \mathbb{E}\left[ \mathbf{1}_{\{T_b^{D,-} < T\}} \left( x e^{\sigma Z_T} - K \right)^+ e^{m Z_T} \right] \tag{3} \]

where $Z$ is a Brownian motion under the measure of the expectation, $m$ depends on $r$ and $\sigma$, and $T_b^{D,-} := T_b^{D,-}(Z)$ is the first time $Z$ makes an excursion below $b := \frac{1}{\sigma} \log(L/S_0)$ longer than $D$. By using Brownian excursion theory, notably the Azéma martingale and the Brownian meander, the density of $Z_{T_b^{D,-}}$ can be obtained and it can be shown that $T_b^{D,-}$ and $Z_{T_b^{D,-}}$ are independent. There is no explicit formula for the density of $T_b^{D,-}$; only its Laplace transform is known. The strong Markov property makes it possible to introduce $Z_{T_b^{D,-}}$ in equation (3). We rewrite equation (3) as

\[ \int_{-\infty}^{\infty} \mathbb{E}\left( \mathbf{1}_{\{T_b^{D,-} < T\}} \, P_{T - T_b^{D,-}}(f_x)(z) \right) \nu^{-}(dz) \tag{4} \]

where $\nu^{-}$ denotes the law of $Z_{T_b^{D,-}}$, $f_x(z) = e^{-(r + m^2/2)T} e^{mz} (x e^{\sigma z} - K)^+$, and $P_t(f_x)(z) = \frac{1}{\sqrt{2\pi t}} \int_{-\infty}^{\infty} f_x(u) \exp\left(-\frac{(u - z)^2}{2t}\right) du$. It remains to compute the Laplace transform of equation (4) with respect to the maturity. A change of variables
introduces the Laplace transform of $T_b^{D,-}$, which is explicitly known. This leads to a closed formula. We refer the reader to [1] for the description of a fast and accurate numerical inversion of the Laplace transforms. By studying the regularity of the Parisian option prices with respect to the maturity time, Labart and Lelong [9] justify the accuracy of the numerical inversion. Except for particular values of the barrier, the prices are of class $C^{\infty}$. Their study relies on the existence and the regularity of a density for the Parisian time $T_b^{D,-}$.

This algorithm is implemented in [4] and is compared to a procedure for approximating a general Laplace transform with one that can be easily inverted. The Laplace transform approach is very specific to the problem, but in practice the method's lack of flexibility is compensated by its accuracy and computational speed.

Lattices

Costabile [6] presents a discrete time algorithm to evaluate Parisian options. The evaluation method is based on a combinatorial approach used to count the number of trajectories of a particle which, moving in a binomial lattice, remains constantly above an upper barrier for time intervals strictly smaller than a prespecified window period. Once this number has been computed, it can be used to derive a binomial algorithm, based on the Cox–Ross–Rubinstein (CRR) model (see Binomial Tree or Tree Methods). It makes it possible to evaluate Parisian options with a constant or an exponential barrier. Avellaneda and Wu [2] model and price Parisian-style options by a trinomial lattice method, which changes with the value of the asset with respect to the barrier.

Partial Differential Equations

Pricing of Parisian options can be done using partial differential equations. Let $\tau$ denote the time the underlying asset has continuously spent in the excursion. For a down Parisian option, $\tau := t - \sup\{t' \le t \mid S_{t'} \ge L\}$. The dynamics of $\tau$ are

\[ d\tau_t = \begin{cases} dt & \text{if } S_t < L, \\ -\tau_{t^-} & \text{if } S_t = L, \\ 0 & \text{if } S_t > L \end{cases} \tag{5} \]

The new state variable $\tau$ can be viewed as a clock that starts ticking as soon as the share price crosses the barrier level and is immediately reset when the share price returns above $L$. We assume that the asset follows a lognormal Brownian motion given by $dS_t = \mu S_t \, dt + \sigma S_t \, dW_t$. The option price is a function of $S$, $t$, $\tau$. If $S \ge L$, the governing equation is the standard Black–Scholes equation:

\[ \frac{\partial V}{\partial t} + \frac{1}{2} \sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + rS \frac{\partial V}{\partial S} - rV = 0 \tag{6} \]

If $S < L$, $\tau$ is ticking. The new governing equation is

\[ \frac{\partial V}{\partial t} + \frac{1}{2} \sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + rS \frac{\partial V}{\partial S} + \frac{\partial V}{\partial \tau} - rV = 0 \tag{7} \]

The boundary conditions are the following: the pathwise continuity of $V$ at $S = L$ leads to $V(L, t, \tau) = V(L, t, 0)$ for all $t$, and

\[ V(S, T, \tau) = (S - K)^+ \ \text{if } \tau < D, \qquad V(S, T, \tau) = 0 \ \text{otherwise} \tag{8} \]

In the study of Haber et al. [8], the numerical solution to equations (6) and (7) is implemented using an explicit finite difference scheme. In the case of discrete monitoring of the contract, Vetzal and Forsyth [7] develop an algorithm based on the numerical solution of a system of one-dimensional PDEs. It is assumed that $\tau$ changes only at observation dates, according to the value of $S$ with respect to the barrier. Away from observation dates, the PDE satisfied by $V$ does not depend on $\tau$. The pricing problem then consists of a small number of one-dimensional PDEs, which exchange information only at observation dates (we impose the continuity of $V$).

These methods have one major benefit: they are flexible enough to be easily modified to price more general options, such as cumulative Parisian options (i.e., when the recorded duration is cumulative rather than continuous).

Double Parisian

There exists a double barrier version of the standard Parisian options. Double Parisian options are barrier options that are activated or canceled if the underlying asset continuously remains outside a range
$[L_1, L_2]$ long enough. The price of a double Parisian out call at time 0 is given by

\[ e^{-rT}\, \mathbb{E}\left[ (S_T - K)^+ \, \mathbf{1}_{\{T_{L_1}^{D,-}(S) > T\}} \, \mathbf{1}_{\{T_{L_2}^{D,+}(S) > T\}} \right] \tag{9} \]

These double Parisian options can be priced using the Monte Carlo procedure improved with the sharp large deviation method proposed by Baldi, Caramellino, and Iovino [3]. Labart and Lelong [9] give analytical formulas for the Laplace transforms of the prices with respect to the maturity time.

References

[1] Abate, J., Choudhury, L.G. & Whitt, G. (1999). An introduction to numerical transform inversion and its application to probability models, in Computational Probability, W. Grassman, ed., Kluwer, Boston, pp. 257–323.
[2] Avellaneda, M. & Wu, L. (1999). Pricing Parisian-style options with a lattice method, International Journal of Theoretical and Applied Finance 2(1), 1–16.
[3] Baldi, P., Caramellino, L. & Iovino, M.G. (2000). Pricing complex barrier options with general features using sharp large deviation estimates, in Monte Carlo and Quasi-Monte Carlo Methods 1998 (Claremont, CA), Springer, Berlin, pp. 149–162.
[4] Bernard, C., LeCourtois, O. & Quittard-Pinon, F. (2005). A new procedure for pricing Parisian options, The Journal of Derivatives 12(4), 45–53.
[5] Chesney, M., Jeanblanc-Picqué, M. & Yor, M. (1997). Brownian excursions and Parisian barrier options, Advances in Applied Probability 29(1), 165–184.
[6] Costabile, M. (2002). A combinatorial approach for pricing Parisian options, Decisions in Economics and Finance 25(2), 111–125.
[7] Forsyth, P.A. & Vetzal, K.R. (1999). Discrete Parisian and delayed barrier options: A general numerical approach, Advances in Futures Options Research 10, 1–16.
[8] Haber, R.J., Schonbucher, P.J. & Wilmott, P. (1999). Pricing Parisian options, Journal of Derivatives 6(3), 71–79.
[9] Labart, C. & Lelong, J. Pricing Double Parisian options using Laplace transforms, International Journal of Theoretical and Applied Finance (to appear), http://hal.archives-ouvertes.fr/hal-00220470/fr/.

Related Articles

Barrier Options; Discretely Monitored Options; Finite Difference Methods for Barrier Options; Lattice Methods for Path-dependent Options; Partial Differential Equations.

CÉLINE LABART
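To make the excursion clock of the Parisian option article concrete, the following is a minimal sketch of the plain (uncorrected) Monte Carlo estimator for a down-and-out Parisian call. It is not the sharp large deviation method of Baldi et al. [3]: the clock is checked only on the simulation grid, so the estimate carries the discretization bias mentioned in the Monte Carlo section. The function name, parameter names, and default grid sizes are assumptions made for this illustration, and the asset is simulated under the risk-neutral drift $r$.

```python
import math
import random


def parisian_down_and_out_call_mc(s0, strike, barrier, window, maturity,
                                  rate, sigma, n_steps=250, n_paths=2000,
                                  seed=0):
    """Crude Monte Carlo price of a down-and-out Parisian call (grid-monitored)."""
    rng = random.Random(seed)
    dt = maturity / n_steps
    drift = (rate - 0.5 * sigma ** 2) * dt   # risk-neutral log-drift per step
    vol = sigma * math.sqrt(dt)
    payoff_sum = 0.0
    for _ in range(n_paths):
        s = s0
        clock = 0.0       # time spent continuously below the barrier
        alive = True
        for _ in range(n_steps):
            s *= math.exp(drift + vol * rng.gauss(0.0, 1.0))
            # The clock ticks below the barrier and resets otherwise.
            clock = clock + dt if s < barrier else 0.0
            if clock > window:
                alive = False   # excursion longer than the window: knocked out
            # No early exit, so every call with the same seed sees the same paths.
        if alive:
            payoff_sum += max(s - strike, 0.0)
    return math.exp(-rate * maturity) * payoff_sum / n_paths


if __name__ == "__main__":
    print(parisian_down_and_out_call_mc(100.0, 100.0, 90.0, 0.05, 1.0, 0.05, 0.2))
```

Keeping the loop running after a knock-out wastes some work but makes runs with different windows directly comparable path by path; refining `n_steps` reduces, but does not remove, the monitoring bias.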
Cliquet Options

Cliquet options can be broadly characterized as contracts whose economic value depends on a series of periodic settlement values. Each settlement period has an associated strike whose value is set at the beginning of the period. This periodic resetting of the strike allows the cliquet option to remain economically sensitive across wide changes in market levels.

The Market for Cliquet Options

The early market in cliquet options featured vanilla contracts that were simply a series of forward-starting at-the-money options. Rubinstein [4] provided pricing formulae for forward-start options in a Black–Scholes framework, resulting in Black–Scholes pricing for vanilla cliquets. Cliquet products now trade on exchanges, and the forerunners of these listings were reset warrants, whose first public listings in the United States appeared in 1993 [5] and 1996 [1, 2]. Cliquet options are equally effective in capturing bullish (call) and bearish (put) market sentiments.

The current market for cliquet options accommodates a rich variety of features, which are sometimes best illuminated in discussions of pricing methods [6, 7]. The most actively traded cliquets are return-based products that accumulate periodic settlement values and pay a cash flow at maturity. The return characteristics and the price appeal of a cliquet can be tailored by adding caps and floors to the period returns and by introducing a strike moneyness factor different from one. Define the ith settlement value, $R_i$, in a call-style cliquet by

\[ R_i = \operatorname{Max}\left[ \text{floor}_i, \ \operatorname{Min}\left( \frac{S_i}{S_i^0} - k_i, \ \text{cap}_i \right) \right] \tag{1} \]

where $\text{floor}_i$ is the one-period (local) return floor for period i; $\text{cap}_i$ is the one-period (local) return cap for period i; $S_i$ is the market level on the settlement date for period i; $S_i^0$ is the market level on the strike setting date for period i; and $k_i$ is a strike moneyness factor for period i.

The payoff at maturity is given by

\[ \text{payoff} = \text{Notional} \cdot \operatorname{Max}\left[ GF, \ \operatorname{Min}\left( \sum_{i}^{n} R_i, \ GC \right) \right] \tag{2} \]

where GF is a global floor; GC is a global cap; and notional is the principal amount of the investment. The investor forgoes returns above the local cap and is protected against returns below the local floor. For the same investment cost, investors can participate in more of the upside return by raising the local cap at the expense of a lowered local floor and increased exposure to downside returns.

Applications of Cliquet Options

The periodic strike setting feature of a cliquet enables an investor to implement a strategy consistent with rolling options positions but without exposure to volatility movements. For example, an investor could buy a cliquet to implement a rolling three-month put strategy and be immunized against the future increase in options premiums that would accompany increases in volatility throughout the life of the strategy. Hence a cliquet provides cost certainty, whereas the rolling put strategy does not.

Cliquet products are often embedded in principal-protected notes, which combine certain aspects of fixed-income investing with equity investing. These notes guarantee the return of principal at maturity, with the investment upside provided by the cliquet return. Retail notes would generally base investment gains on a broad market index such as the S&P 500 index. Principal-protected notes may further guarantee a minimum investment yield, which compounds to the value of the global floor at maturity. The guaranteed yield may be considered as part of the equity return, as it is in equation (2), or it can be considered as part of the fixed-income return. In the latter case, the equity payoff in equation (2) would be modified as in equation (3):

\[ \text{payoff} = \text{Notional} \cdot \operatorname{Max}\left[ 0, \ \operatorname{Min}\left( \sum_{i}^{n} R_i, \ GC \right) - GF \right] \tag{3} \]

where the global floor now sets a strike on the sum of periodic returns.

Summary

We have discussed the general characteristics of cliquet options and illustrated the payoff for one commonly traded type of cliquet. Numerous variations
exist and can be tailored to give very different risk-reward profiles. Some are distinguished in the market by specific names, for example reverse cliquets [3]. The customizability of cliquet options likely means we will continue to see product innovation in this area in the future.

References

[1] Conran, A. (1996). IFC Issues S&P 500 Index Bear Market Warrants, November 26, 1996 Press Release, http://www.ifc.org/ifcext/media.nsf/Content/PressReleases.
[2] Gray, S.F. & Whaley, R.E. (1997). Valuing S&P 500 bear market warrants with a periodic reset, Journal of Derivatives 5(1), 99–106.
[3] Jeffrey, C. (2004). Reverse cliquets: end of the road? RISK 17(2), 20–22.
[4] Rubinstein, M. (1991). Pay now, choose later, RISK 4, 13.
[5] Walmsley, J. (1998). New Financial Instruments, 2nd edition, John Wiley & Sons, New York.
[6] Wilmott, P. (2002). Cliquet options and volatility models, Wilmott Magazine, 6.
[7] Windcliff, H., Forsyth, P.A. & Vetzal, K.R. (2006). Numerical methods and volatility models for valuing cliquet options, Applied Mathematical Finance 13, 353.

RICK L. SHYPIT
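The settlement value in equation (1) and the maturity payoff in equation (2) of the cliquet article can be sketched in a few lines. The example below is illustrative only. It assumes, as a simplification not stated in the article, that each period's strike-setting level is the previous period's settlement level and that the same local floor, local cap, and moneyness factor apply to every period; all function and parameter names are hypothetical.

```python
def settlement_value(s_end, s_start, k, local_floor, local_cap):
    """Equation (1): capped and floored one-period return of a call-style cliquet."""
    return max(local_floor, min(s_end / s_start - k, local_cap))


def cliquet_payoff(levels, notional, k=1.0, local_floor=0.0, local_cap=0.05,
                   global_floor=0.0, global_cap=float("inf")):
    """Equation (2): notional times the globally capped and floored sum of returns.

    `levels` lists the market level on each reset date, so period i runs
    from levels[i] (strike setting) to levels[i + 1] (settlement).
    """
    returns = [
        settlement_value(levels[i + 1], levels[i], k, local_floor, local_cap)
        for i in range(len(levels) - 1)
    ]
    return notional * max(global_floor, min(sum(returns), global_cap))


if __name__ == "__main__":
    # Raw period returns of +10%, -10%, +2% with a -3% local floor and 5% local cap.
    print(cliquet_payoff([100.0, 110.0, 99.0, 100.98], 1000.0,
                         local_floor=-0.03, local_cap=0.05))
```

In this example the first return is cut to the 5% cap and the second is lifted to the -3% floor, which shows the trade-off described above: the investor gives up upside above the local cap in exchange for protection below the local floor.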
Basket Options

Equity basket options are derivative contracts that have as underlying asset a basket of stocks. This category may include (broadly speaking) options on indices, options on exchange-traded funds (ETFs), and options on bespoke baskets. The latter are generally traded over the counter, often as part of, or embedded in, structured equity derivatives.

Options on broad market ETFs, such as the Nasdaq 100 Index Trust (QQQQ) and the S&P 500 Index Trust (SPY), are the most widely traded contracts in the US markets. As of this writing, their daily volumes far exceed those of options on most individual stocks. Owing to this wide acceptance, QQQQ and ETF options have recently been given quarterly expirations in addition to the standard expirations for equity options. Options on sector ETFs, such as the S&P Financials Index (XLF) or the Merrill Lynch HOLDR (SMH), are also highly liquid.

If we denote by $B$ the value of the basket of stocks at the expiration date of the option, a basket call has payoff given by $\max(B - K, 0)$ and a basket put has payoff $\max(K - B, 0)$, where $K$ is the strike price. Most exchange-traded ETF options are physically settled. Index options tend to be cash settled. Over-the-counter basket options, especially those embedded in structured notes, are cash settled.

The fair value price of a (bespoke) basket option is determined by the joint risk-neutral distribution of the underlying stocks. If we write the value of the basket as

\[ B = \sum_{i=1}^{n} w_i S_i \tag{1} \]

where $w_i$, $S_i$ denote respectively the number of shares of the ith stock and its price, the returns satisfy

\[ \frac{dB}{B} = \sum_{i=1}^{n} \frac{w_i S_i}{B} \frac{dS_i}{S_i} = \sum_{i=1}^{n} p_i \frac{dS_i}{S_i}, \quad \text{with} \quad p_i \equiv \frac{w_i S_i}{B} \tag{2} \]

Here, $p_i$ represents the instantaneous capitalization weight of the ith stock in the basket, that is, the percentage of the total dollar amount of the basket associated with each stock. If we assume that these weights are approximately constant, which is reasonable, it follows that the volatility of the basket and the volatilities of the stocks satisfy the relation

\[ \sigma_B^2 = \sum_{i,j=1}^{n} p_i p_j \sigma_i \sigma_j \rho_{ij} \tag{3} \]

where $\sigma_B$ is the volatility of the basket, $\sigma_i$ are the volatilities of the stocks, and $\rho_{ij}$ is the correlation matrix of stock returns. If we assume lognormal returns for the individual stocks, then the probability distribution for the price of the basket is not lognormal. Nevertheless, the distribution is well approximated by a lognormal, and equation (3) represents the natural approximation for the implied volatility of the basket in this case.

The notion of implied correlation is sometimes used to quote basket option prices. The market convention is to assume (for quoting purposes) that $\rho_{ij} \equiv \rho$, a constant, for $i \ne j$. It then follows from equation (3) that the implied correlation of a basket option is

\[ \rho \equiv \frac{\sigma_B^2 - \sum_{i=1}^{n} p_i^2 \sigma_i^2}{\sum_{i \ne j} p_i p_j \sigma_i \sigma_j} = \frac{\sigma_B^2 - \sum_{i=1}^{n} p_i^2 \sigma_i^2}{\left( \sum_{i=1}^{n} p_i \sigma_i \right)^2 - \sum_{i=1}^{n} p_i^2 \sigma_i^2} \approx \frac{\sigma_B^2}{\left( \sum_{i=1}^{n} p_i \sigma_i \right)^2} \tag{4} \]

Implied correlation is the market convention for quoting the implied volatility of a basket option as a fraction of the weighted average of implied volatilities of the components.

For example, if the average implied volatility for the components of the QQQQ for the December at-the-money options is 25% and the corresponding QQQQ option is trading at an implied volatility of 19%, the implied correlation is $\rho \approx (19/25)^2 = 58\%$. This convention is sometimes applied to options that are not at the money as well. In this case, in the calculation of implied correlation for the basket option, the implied volatilities for the component stocks are usually taken to have the same moneyness as the index in percentage terms. Other
conventions for choosing the volatilities of the components, such as equal-delta or "beta-adjusted" moneyness, are sometimes used as well. Since the corresponding implied correlations can vary with strike price, market participants sometimes talk about the implied correlation skew of a series of basket options.

Further Reading

Avellaneda, M., Boyer-Olson, D., Busca, J. & Friz, P. (2002). Reconstructing volatility, Risk 15(10).
Haug, E.G. (1998). The Complete Guide to Option Pricing Formulas, McGraw-Hill.
Hull, J. (1993). Options Futures and Other Derivative Securities, Prentice Hall Inc., Toronto.

Related Articles

Correlation Swap; Exchange-traded Funds (ETFs).

MARCO AVELLANEDA
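The implied correlation convention in equations (3) and (4) of the basket options article is easy to check numerically. The sketch below, with hypothetical function names, computes the exact constant-correlation solution and the quoting approximation, and rebuilds the basket volatility from a given correlation as a consistency check. The 100-name equally weighted basket is an assumption made purely for illustration.

```python
import math


def basket_vol(weights, vols, rho):
    """Equation (3) with a single off-diagonal correlation rho."""
    diag = sum((p * s) ** 2 for p, s in zip(weights, vols))
    weighted = sum(p * s for p, s in zip(weights, vols))
    return math.sqrt(diag + rho * (weighted ** 2 - diag))


def implied_correlation(sigma_b, weights, vols):
    """Exact expression in equation (4)."""
    diag = sum((p * s) ** 2 for p, s in zip(weights, vols))
    weighted = sum(p * s for p, s in zip(weights, vols))
    return (sigma_b ** 2 - diag) / (weighted ** 2 - diag)


def implied_correlation_approx(sigma_b, weights, vols):
    """Quoting approximation in equation (4): squared ratio of volatilities."""
    weighted = sum(p * s for p, s in zip(weights, vols))
    return (sigma_b / weighted) ** 2


if __name__ == "__main__":
    n = 100                                  # illustrative basket size
    weights = [1.0 / n] * n
    vols = [0.25] * n                        # 25% average component volatility
    print(implied_correlation_approx(0.19, weights, vols))
    print(implied_correlation(0.19, weights, vols))
```

For the 19% and 25% volatilities of the QQQQ example, the approximation gives the 58% quoted above, and the exact expression is close to it when the basket has many small components, because the diagonal terms $\sum p_i^2 \sigma_i^2$ are then small.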
Call Spread

A call spread is an option strategy with limited upside and limited downside that uses call options of two different strikes but the same maturity on the same underlying. More details and pricing models can be found in [1]. Market considerations can be found in [3–5]. The call spread produces a structure that at maturity pays off only in scenarios where the price of the underlying is above the lower strike. One can think of this strategy as buying a low-strike call option and financing part of the upfront cost by selling a higher strike call option. The effect of selling the higher strike option is to limit the upside potential but reduce the cost of the structure. It should be used for expressing a bullish view that the underlying will rise in price above the lower strike. As with all options, choosing the strike and maturity will depend on one's view of how much the underlying will move and how quickly it will move there. An example is shown, in detail, in Figure 1.

In the example shown in Figure 1, we look at a 790/810 call spread on the S&P 500 index, SPX. With the underlying SPX index at 770 and with three months to expiration, a 790 strike call price is 49.44 and an 810 strike call price is 41.79. The spread cost is 49.44 − 41.79 = 7.65. Thus, the cost of the call spread is significantly lower than the outright cost of the 790 strike call. This upfront cost is the most one can lose in a call spread. We subtract this initial investment from all other valuations, as shown in Figure 1, to get a total value. On the other hand, if both options expire in the money, one will earn 20 = 810 − 790 on the call spread. The maximum profit is then the spread minus the initial cost, or 20 − 7.65 = 12.35.

Note that with three months to expiration, the call spread value is fairly insensitive to the underlying price. However, as the option gets closer to expiration, the sensitivity of the price of the call spread becomes greater, especially in the range of the spread itself. This sensitivity to the underlying price, or delta (see Delta Hedging), is illustrated in Figure 2.

The delta of the call spread stays relatively flat until relatively close to expiration of the option. When the call spread is close to expiration, the delta is very unstable around the two strikes.

Another important consideration is the volatility implied by the market (see Implied Volatility: Market Models). When the strikes are out of the money, the call spread price increases when volatility increases because the probability of finishing in the money increases. On the other hand, when the strikes are in the money, the call spread price decreases when volatility increases because the probability of finishing out of the money increases. This is illustrated in Figure 3.

The Relationship with Digital Options and Skew

The value of a European call spread structure can be written in terms of the difference of two call options. If we let $p(S)$ denote the probability distribution of the underlying at the time of option expiry, then we have

\[ \text{CallSpread} = e^{-r\tau} \int_{K_1}^{\infty} (S - K_1)\, p(S)\, dS - e^{-r\tau} \int_{K_2}^{\infty} (S - K_2)\, p(S)\, dS \tag{1} \]

\[ \text{CallSpread} = e^{-r\tau} \int_{K_2}^{\infty} (K_2 - K_1)\, p(S)\, dS + e^{-r\tau} \int_{K_1}^{K_2} (S - K_1)\, p(S)\, dS \tag{2} \]

If we now take the strikes very close to each other, the second term becomes insignificant. Next, if we lever up by $1/(K_2 - K_1)$, the payoff approximates the payoff of a digital option, which pays one if the underlying at termination is greater than the strike and zero otherwise. In this case,

\[ \text{DigitalOption} = e^{-r\tau} \int_{K}^{\infty} p(S)\, dS = e^{-r\tau} \left( 1 - \Phi(K) \right) \tag{3} \]

where $\Phi(K)$ is the cumulative probability distribution at termination of the underlying. For the original paper, see [6]. Also see [2, 7, 8] for more details.

We can state equation (3) in words as follows. The
probability distribution function of the underlying at termination can be inferred from market prices as the derivative of digital option prices with respect to the strike. As these digital option prices come from call spreads with close strikes, we can conclude that the probability distribution function can be inferred from vanilla option prices.

Equation (2) shows that for close strikes or long expiries, the value of a call spread is approximately the strike difference times the probability that the underlying finishes above the spread:

\[ \text{CallSpread} \approx e^{-r\tau} (K_2 - K_1) \cdot \left( 1 - \Phi(K_2) \right) \tag{4} \]

This can be used as a crude first-order estimate for the value of a call spread. The second term in equation (2) can be approximated (similarly to the area of a triangle) as

\[ e^{-r\tau} \int_{K_1}^{K_2} (S - K_1)\, p(S)\, dS \approx e^{-r\tau} \frac{(K_2 - K_1)}{2} \times \left( \Phi(K_2) - \Phi(K_1) \right) \tag{5} \]

This gives a better approximation

\[ \text{CallSpread} \approx e^{-r\tau} (K_2 - K_1) \times \left( 1 - \frac{\Phi(K_1) + \Phi(K_2)}{2} \right) \tag{6} \]

This is a very intuitive formula, as it is just the payoff of the call spread times the "average" probability that the call spread finishes in the money.

[Figure 1: The value of a call spread at various times before expiration. Call spread total value against underlying price (700 to 900), with curves for 3 months, 2 weeks, 3 days, and expiration.]

[Figure 2: The delta of a call spread at various times before expiration. Delta against underlying price (700 to 900), with curves for 3 months, 2 weeks, 3 days, and expiration.]

[Figure 3: The vega of a call spread at various times before expiration. Vega against underlying price (700 to 900), with curves for 3 months, 2 weeks, and 3 days.]

References

[1] Hull, J. (2003). Options, Futures, and Other Derivatives, 5th Edition, Prentice Hall.
[2] Lehman Brothers (2008). Listed Binary Options, available at http://www.cboe.com/Institutional/pdf/ListedBinaryOptions.pdf
[3] The Options Industry Council (2007). Option Strategies in a Bull Market, available at www.888options.com.
[4] The Options Industry Council (2007). Option Strategies in a Bear Market, available at www.888options.com.
[5] The Options Industry Council (2007). The Equity Options Strategy Guide, January 2007, available at www.888options.com
[6] Reiner, E. & Rubinstein, M. (1991). Breaking down the barriers, Risk Magazine 4, 28–35.
[7] Taleb, N.N. (1997). Dynamic Hedging: Managing Vanilla and Exotic Options, Wiley Finance.
Call Spread 3

[8] Wikipedia (undated). Binary Option, available at http://en. Related Articles


wikipedia.org/wiki/Binary option

Call Options.
Further Reading
ERIC LIVERANCE
Haug, E.G. (2007). Option Pricing Formulas, 2nd Edition,
McGraw Hill.
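As a rough numerical illustration of the two call spread estimates discussed in this article (the crude estimate of equation (4) and the averaged estimate of equation (6)), the sketch below compares them with an exact Black–Scholes call spread. The lognormal model, the helper names, and the market inputs (spot 770, strikes 780/800, 25% volatility, 2% rate, three months) are assumptions made for the example only; the cumulative distribution function is taken to be the risk-neutral one implied by that model.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(spot, strike, rate, vol, tau):
    """Black-Scholes price of a European call (no dividends)."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol**2) * tau) / (vol * math.sqrt(tau))
    d2 = d1 - vol * math.sqrt(tau)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * tau) * norm_cdf(d2)

def terminal_cdf(level, spot, rate, vol, tau):
    """Risk-neutral probability that the underlying finishes at or below `level`."""
    d2 = (math.log(spot / level) + (rate - 0.5 * vol**2) * tau) / (vol * math.sqrt(tau))
    return norm_cdf(-d2)

def call_spread_exact(spot, k1, k2, rate, vol, tau):
    """Long the K1 call, short the K2 call."""
    return bs_call(spot, k1, rate, vol, tau) - bs_call(spot, k2, rate, vol, tau)

def call_spread_crude(spot, k1, k2, rate, vol, tau):
    """Equation (4): strike difference times P(underlying finishes above K2)."""
    prob_above = 1.0 - terminal_cdf(k2, spot, rate, vol, tau)
    return math.exp(-rate * tau) * (k2 - k1) * prob_above

def call_spread_average(spot, k1, k2, rate, vol, tau):
    """Equation (6): strike difference times the average in-the-money probability."""
    avg_cdf = 0.5 * (terminal_cdf(k1, spot, rate, vol, tau)
                     + terminal_cdf(k2, spot, rate, vol, tau))
    return math.exp(-rate * tau) * (k2 - k1) * (1.0 - avg_cdf)

# Hypothetical market inputs, chosen only for illustration.
args = (770.0, 780.0, 800.0, 0.02, 0.25, 0.25)
exact = call_spread_exact(*args)
crude = call_spread_crude(*args)
average = call_spread_average(*args)
print(f"exact={exact:.4f} crude={crude:.4f} average={average:.4f}")
```

Under these assumptions the crude estimate sits below the exact value, because the probability of finishing above a level falls as the level rises, while the averaged estimate is a trapezoid rule over the strike interval and lands much closer.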
Butterfly

A butterfly spread is an option strategy with limited upside and limited downside that uses call options of three different strikes but the same maturity on the same underlying. Specifically, a butterfly is a structure that is a long position in 1 low-strike call, a short position in 2 midstrike calls, and a long position in 1 high-strike call. More details and pricing models can be found in [2]. Market considerations can be found in [4–6]. The butterfly spread produces a structure that at maturity pays off only in scenarios where the price of the underlying is between the lowest and highest strikes. One can think of this strategy as buying an option on the underlying being in a range. The butterfly has limited upside potential, but a significantly reduced cost compared to that of an outright call option. It should be used for expressing a bullish view that the underlying will trade in a range. As with all options, choosing the strike and maturity will depend on one’s view of how much the underlying will move and how quickly it will move there. An example is shown in detail in Figure 1.

In the example shown in Figure 1, we look at a 780/800/820 butterfly on the S&P 500 index, SPX. With the underlying SPX index at 770 and with three months to expiration, the butterfly cost is close to 1.00. The call option with the 800 strike is 70.14; thus, the cost for a butterfly is significantly reduced from the outright cost of a call option with the same strike. This upfront cost for the butterfly is the maximum that this butterfly position can lose. We subtract this initial investment from all other valuations, as shown in Figure 1, to get a total value. If the underlying is exactly 800 at expiration, the position will earn 20 on the butterfly from the low-strike option. The maximum position profit then is the strike spread minus the initial cost, or 20 − 1.00 = 19.00.

Note that with three months to expiration, the butterfly value is fairly insensitive to the underlying price and is difficult to distinguish from the x-axis. However, as the option gets closer to expiration, the sensitivity of the price of the butterfly becomes greater, especially in the range of the butterfly strikes. This sensitivity to underlying price or delta (see Delta Hedging) is illustrated in Figure 2.

The delta of the butterfly stays relatively flat until the option is close to expiration. When the butterfly is close to expiration, the delta is very unstable around the three strikes.

Another important consideration is the volatility implied by the market (see Implied Volatility: Market Models). The vega profile of a butterfly is shown in Figure 3. When the underlying is close to the strikes, the vega is negative because when volatility increases the probability that the underlying expires out of the money increases. For this reason, it is common to use a butterfly with relatively long expiries and with strikes centered around at-the-money to take a view that implied volatility will decline while still holding a position with relatively small delta (insensitive to changes in the underlying). When the underlying is away from the money, the butterfly is long vega because when volatility increases, the probability that the underlying finishes in the money increases.

The Relationship with Distribution of the Underlying

A butterfly can be thought of as a long call spread plus a short call spread, with overlapping strikes and the same strike spread. An approximation for the value of a call spread can be found in Call Spread:

$$\text{Call Spread} \approx e^{-r\tau}\,(K_2 - K_1) \cdot \left(1 - \frac{\Phi(K_1) + \Phi(K_2)}{2}\right) \tag{1}$$

where $\Phi(x)$ is the cumulative distribution function of the underlying. Applying equation (1) to a butterfly, we have

$$\text{Butterfly} \approx e^{-r\tau}\,(K_2 - K_1)^2\,\frac{\Phi(K_3) - \Phi(K_1)}{K_3 - K_1} \approx e^{-r\tau}\,(K_2 - K_1)^2\,p(K_2) \tag{2}$$

where $p(x)$ is the probability distribution function of the underlying at option expiration and is the derivative of $\Phi(x)$ (Figure 4).

We can apply this formula in the following way. We convert the triangle in the lower part of Figure 4 to a square. Then we let the value of the payoff of the
Figure 1 The value of a butterfly at various times before expiration (x-axis: underlying price; y-axis: butterfly total value)

Figure 2 The delta of a butterfly at various times before expiration (x-axis: underlying price; y-axis: delta)

Figure 3 The vega of a butterfly at various times before expiration (x-axis: underlying price)

Figure 4 Using a butterfly to infer the underlying probability distribution (butterfly payout and its approximation against the change in underlying between Ka and Kb)

butterfly to be represented as the probability times the area of the square, as in equation (2). Then turning around equation (2), we have

$$\text{prob}(K_a < S < K_b) \approx e^{r\tau}\,\frac{\text{Butterfly}}{K_2 - K_1} \tag{3}$$

The relationship between option prices and the distribution of the underlying was first pointed out in [1], but see also [3, 7]. The use of call spreads and butterflies to impute the market-implied underlying probability distribution can be related to taking derivatives with respect to the strike of the call price. A call spread is like a first derivative and a butterfly is like a second derivative. Formally, we have

$$\text{Call} = e^{-r\tau} \int_K^{\infty} (S - K)\,p(S)\,dS \tag{4}$$

$$\frac{\partial}{\partial K}\,\text{Call} = -e^{-r\tau} \int_K^{\infty} p(S)\,dS \tag{5}$$

$$\frac{\partial^2}{\partial K^2}\,\text{Call} = e^{-r\tau}\,p(K) \tag{6}$$

References

[1] Breeden, D. & Litzenberger, R. (1978). Prices of state-contingent claims implicit in option prices, Journal of Business 51, 621–651.
[2] Hull, J. (2003). Options, Futures, and Other Derivatives, 5th Edition, Prentice Hall.
[3] Jackwerth, J.C. (1999). Option-implied risk-neutral distributions and implied binomial trees: a literature review, The Journal of Derivatives 7, 66–82.
[4] The Options Industry Council (2007). Option Strategies in a Bull Market, available at www.888options.com.
[5] The Options Industry Council (2007). Option Strategies in a Bear Market, available at www.888options.com.
[6] The Options Industry Council (2007). The Equity Options Strategy Guide, January 2007, available at www.888options.com.
[7] Rubinstein, M. (1994). Implied binomial trees, The Journal of Finance 49, 771–818.

Related Articles

Corridor Options; Risk-neutral Pricing; Variance Swap.

ERIC LIVERANCE
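To make the butterfly article's payoff arithmetic and its density approximation (equation (2)) concrete, here is a minimal sketch. The 780/800/820 strikes and the 1.00 upfront cost come from the example in the text; the Black–Scholes lognormal model, the helper names, and the volatility and rate inputs are assumptions introduced only for illustration, so the computed price is not expected to match the 1.00 quoted in the article.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(spot, strike, rate, vol, tau):
    """Black-Scholes price of a European call (no dividends)."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol**2) * tau) / (vol * math.sqrt(tau))
    d2 = d1 - vol * math.sqrt(tau)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * tau) * norm_cdf(d2)

def lognormal_pdf(x, spot, rate, vol, tau):
    """Risk-neutral density of the underlying at expiry under Black-Scholes."""
    mean = math.log(spot) + (rate - 0.5 * vol**2) * tau
    std = vol * math.sqrt(tau)
    z = (math.log(x) - mean) / std
    return math.exp(-0.5 * z * z) / (x * std * math.sqrt(2.0 * math.pi))

def butterfly_payoff(s_t, k1, k2, k3):
    """Long 1 low-strike call, short 2 midstrike calls, long 1 high-strike call."""
    return max(s_t - k1, 0.0) - 2.0 * max(s_t - k2, 0.0) + max(s_t - k3, 0.0)

def max_profit(k1, k2, upfront_cost):
    """Strike spread minus the initial cost, reached when the underlying ends at K2."""
    return (k2 - k1) - upfront_cost

def butterfly_price(spot, k1, k2, k3, rate, vol, tau):
    """Model price of the butterfly as a combination of three calls."""
    return (bs_call(spot, k1, rate, vol, tau)
            - 2.0 * bs_call(spot, k2, rate, vol, tau)
            + bs_call(spot, k3, rate, vol, tau))

# Strikes from the text; spot, rate, vol and tau are hypothetical inputs.
spot, rate, vol, tau = 770.0, 0.02, 0.25, 0.25
k1, k2, k3 = 780.0, 800.0, 820.0

price = butterfly_price(spot, k1, k2, k3, rate, vol, tau)
# Equation (2): discounted squared strike spread times the density at K2.
approx = math.exp(-rate * tau) * (k2 - k1) ** 2 * lognormal_pdf(k2, spot, rate, vol, tau)
# Equation (3): probability of finishing in a band of width K2 - K1 around K2.
implied_prob = math.exp(rate * tau) * price / (k2 - k1)

print(f"price={price:.4f} approx={approx:.4f} implied_prob={implied_prob:.4f}")
print("max profit with cost 1.00:", max_profit(k1, k2, 1.00))
```

With strikes this close relative to the width of the distribution, the model price and the density approximation should agree closely, which is the sense in which a butterfly behaves like a second derivative of the call price with respect to the strike.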
Gamma Hedging

Why Hedging Gamma?

Gamma is defined as the second derivative of a derivative product with respect to the underlying price. To understand why gamma hedging is not just the issue of annihilating a second-order term in the Taylor expansion of a portfolio, we review the profit and loss (P&L; see End Note a) explanation of a delta-hedged self-financing portfolio for a monounderlying option and its link to the gamma.

Let us consider an economy described by the Black and Scholes framework, with a riskless interest rate $r$, a stock $S$ with no repo or dividend whose volatility is $\sigma$, and an option $O$ written on that stock. Let $\Pi$ be a self-financing portfolio composed at $t$ of

• the option $O_t$;
• its delta hedge: $-\Delta_t S_t$ with $\Delta_t = \partial O/\partial S$; and
• the corresponding financing cash amount $-O_t + \Delta_t S_t$.

We denote by $\delta\Pi$ the P&L of the portfolio between $t$ and $t + \delta t$ and we set $\delta S = S_{t+\delta t} - S_t$. Directly, we have that the delta part of the portfolio P&L is $-\Delta_t\,\delta S$ and that the P&L of the financing part is $(-O_t + \Delta_t S_t)\,r\,\delta t$. Regarding the option P&L, $\delta O$, we have, by a second-order expansion,

$$\delta O \approx \frac{\partial O}{\partial t}\,\delta t + \frac{\partial O}{\partial S}\,\delta S + \frac{1}{2}\,\frac{\partial^2 O}{\partial S^2}\,(\delta S)^2 \tag{1}$$

Furthermore, the option satisfies the Black and Scholes equation (see Black–Scholes Formula):

$$\frac{\partial O}{\partial t} + rS\,\frac{\partial O}{\partial S} + \frac{1}{2}\,\sigma^2 S^2\,\frac{\partial^2 O}{\partial S^2} = rO \tag{2}$$

Combining these two equations and writing the P&L of the portfolio as the sum of the three terms, we get

$$\delta\Pi \approx \frac{1}{2}\,S^2\,\frac{\partial^2 O}{\partial S^2}\left(\left(\frac{\delta S}{S}\right)^2 - \sigma^2\,\delta t\right) \tag{3}$$

where $\partial^2 O/\partial S^2$ is the gamma of the option part of the portfolio (in terms of definition, $S^2\,(\partial^2 O/\partial S^2)$ is called the cash gamma because it is expressed in currency and can be summed over several stock positions, whereas the direct gamma cannot).

As no condition was put on the relation of the volatility to time and space, equation (3) is easily extended to a local volatility setting (see Local Volatility Model). Practitioners call this equation the breakeven relation and $\sigma\sqrt{\delta t}$ the breakeven, for it represents the move in performance the stock has to make in the time $\delta t$ to ensure a flat P&L (e.g., if we consider that a year is composed of 256 open days, a stock having an annualized volatility of 16% needs to make a move of 1%, at which the delta is rebalanced, to ensure a flat P&L between two consecutive days). Figure 1 shows the portfolio P&L for a position composed of an option with a positive gamma.

Equation (3) leads to two important remarks. First, it is a local relation, both in time and space, and the fact that the gamma is gearing the breakeven relation implies that the global P&L of a positive gamma position, hedged according to the Black and Scholes self-financing strategy, can very well be negative if a stock makes large moves in a region where the gamma is small and makes small moves in a region where the gamma is maximum, even if the realized variance of the stock is higher than the pricing variance $\sigma^2$. Secondly, in the long run, the realized variance is usually smaller than the implied variance, which can lead practitioners to build negative gamma positions. Yet, Figure 1 shows that a positive gamma position is of finite loss and possibly infinite gain, whereas it is the opposite for a negative gamma position. Practically, this is why traders tend naturally to a gamma neutral position.

A specific aspect of the equity market is the presence of dividends. One can wonder if, on the date the stock drops by the dividend amount, a positive gamma position is easier to carry than a negative gamma position. It is, of course, linked to the dividend representation chosen in the stock modeling. It can be shown that the only consistent way of representing the dividends is the one proposed in Dividend Modeling, where the stock is modeled as in Black and Scholes between two consecutive dividend dates. It is the only representation in which equation (3) stands (on the dividend date, the P&L term coming from the cash dividend part is offset by a term arising from the adapted Black and Scholes equation). In others, either the gamma carries a dividend part (dividend yield models) that leads to a
Figure 1 The P&L of a self-financing portfolio composed of an option with a positive gamma in the interval δt (x-axis: δS/S; y-axis: P&L; the breakeven points are marked)

false breakeven on the dividend date or equation (3) is not associated with the stock but with the variable that is stochastic (model in which the stock is described as a capitalized exponential martingale minus a capitalized dividend term, for example). This is why practitioners use the model proposed in Dividend Modeling rather than any other. This is also why it is, indeed, a general framework we put ourselves in by excluding dividends and repo (which is usually represented by a drift term whose P&L impact is also offset by a term arising from the adapted Black and Scholes equation) in our analysis.

Practical Gamma Hedging

We have seen why traders usually try to build a gamma-neutral portfolio. Yet, there is no pure gamma instrument in the market, and neutralizing the gamma exposure always brings a vega exposure to the portfolio. Without trying to be exhaustive, we briefly review here some natural gamma hedging instruments.

Hedging Gamma with Vanilla Options

European calls and puts have the same gamma (and the same vega). Hence, they are equivalent hedging instruments. Figure 2 shows the gamma of a European option for two different maturities and Figure 3 shows the compared evolutions with respect to the maturity of the gamma and of the vega.

These two figures show that, to efficiently hedge his or her gamma exposure, a trader would rather use a short-term option to avoid bringing too much vega to his or her position. Moreover, the gamma of an “at-the-money” option is increasing as one gets closer to the maturity, whereas the gamma of an “out-of-the-money” option is decreasing.

The Put Ratio Temptation

As equation (3) shows, the gamma and the theta (first derivative of a derivative product with respect to time) of a portfolio are of opposite signs. Moreover, in the equity market, the implied volatility is usually described by a skew, meaning that if we consider two puts $P_1$ and $P_2$ for the same maturity $T$, having two strikes $K_1$ and $K_2$ with $K_1 < K_2$, we classically have $\sigma_{K_1} > \sigma_{K_2}$. If we now build a self-financing portfolio $\Pi$ that is composed of $P_2 - \alpha P_1$ with $\alpha = \Gamma_2/\Gamma_1$, the ratio of the two gammas, we get from equation (3) that $\delta\Pi \approx \frac{1}{2}\,S^2\,\Gamma_2\,(\sigma_{K_1}^2 - \sigma_{K_2}^2)\,\delta t > 0$, because the two squared-move terms cancel and only the difference in implied variances remains.

This result is not in contradiction with arbitrage theory; it only demonstrates that equation (3) is strictly a local relation. As shown by Figure 2, to keep this relation through time, the trader would have to continuously sell the put $P_1$, as $\alpha$ increases as time to maturity decreases, and, in case of a market drop, he or she would end up in a massive negative gamma situation. Still, practitioners commonly use put ratios to improve the breakeven of their position.
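The breakeven relation of equation (3) and the put ratio carry discussed above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names, the spot of 100, and the gamma and volatility values are hypothetical, while the 16% volatility and the 256-day year come from the example in the text.

```python
import math

def breakeven_move(vol, dt):
    """Relative move sigma * sqrt(dt) at which the delta-hedged P&L is flat."""
    return vol * math.sqrt(dt)

def gamma_pnl(spot, gamma, rel_move, vol, dt):
    """Equation (3): 0.5 * S^2 * Gamma * ((dS/S)^2 - sigma^2 * dt)."""
    return 0.5 * spot**2 * gamma * (rel_move**2 - vol**2 * dt)

def put_ratio_carry(spot, gamma2, vol_k1, vol_k2, dt):
    """P&L of P2 - alpha * P1 with alpha = Gamma2 / Gamma1, from equation (3)."""
    return 0.5 * spot**2 * gamma2 * (vol_k1**2 - vol_k2**2) * dt

dt = 1.0 / 256.0                      # one open day, as in the text
move = breakeven_move(0.16, dt)       # about 1% for a 16% annualized volatility
print(f"daily breakeven move: {move:.4%}")

# Hypothetical long-gamma position: gains beyond the breakeven, loses inside it.
print("P&L on a 2% move:  ", gamma_pnl(100.0, 0.05, 0.02, 0.16, dt))
print("P&L on a 0.5% move:", gamma_pnl(100.0, 0.05, 0.005, 0.16, dt))

# Hypothetical put ratio: the low-strike put carries the higher implied volatility.
gamma1, gamma2 = 0.01, 0.03
alpha = gamma2 / gamma1
carry = put_ratio_carry(100.0, gamma2, 0.25, 0.20, dt)
combined = (gamma_pnl(100.0, gamma2, 0.013, 0.20, dt)
            - alpha * gamma_pnl(100.0, gamma1, 0.013, 0.25, dt))
print("put ratio carry:", carry, "sum of the two legs:", combined)
```

The last two numbers coincide whatever move is plugged in, which is exactly why the put ratio looks attractive locally even though the relation does not hold through time without rebalancing.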
Figure 2 Gamma of a European call as a function of the spot for two maturities (strike is equal to 100); curves: three-month call gamma and one-year call gamma; x-axis: spot

Figure 3 Compared evolution of the gamma and vega of an at-the-money European call as a function of maturity (scales are different); curves: call vega and call gamma; x-axis: time

Hedging Gamma with a Variance Swap or a Gamma Swap

As explained in Variance Swap, a variance swap is equivalent to a log contract. Hence, its cash gamma is constant. It is therefore an efficient gamma hedging instrument for a portfolio whose gamma is not particularly localized (as opposed to a portfolio of vanilla options whose gamma is locally described by Figure 2). Gamma swaps (see Gamma Swap) have the same behavior. Their specificity is to have a constant gamma.

Extending the Definition of Gamma

In the market, the implied volatility changes with the spot moves (see Implied Volatility Surface). It is in contradiction with the use of a Black and Scholes model whose volatility is constant, but, to avoid the multiplicity of risk sources, and to keep them observable, traders tend to rely on that model, nonetheless (and therefore hedge their vega exposure). Nevertheless, to take this dynamics into consideration, some traders incorporate a “shadow” term into their sensitivities. The shadow gamma [1]
is defined as

$$\frac{\partial^2 O}{\partial S^2} + \frac{\partial}{\partial S}\left(\frac{\partial O}{\partial \sigma}\,\frac{\partial \sigma}{\partial S}\right) \tag{4}$$

The second term, the shadow term, depends on the chosen dynamics of the implied volatility.

The problem with the shadow approach is that we cannot rely anymore on a self-financing strategy in the Black and Scholes framework to define the breakeven. One solution, in order to build a self-financing strategy that incorporates volatility surface into the dynamics, is to use a stochastic volatility model (see Heston Model) instead of a Black and Scholes model. For example, one can use the following model:

$$\begin{aligned} dS &= rS\,dt + \sigma S\,dW_t^1 \\ d\sigma &= \mu\,dt + \nu\,dW_t^2 \\ d\langle W^1, W^2\rangle_t &= \rho\,dt \end{aligned} \tag{5}$$

Using the same arguments as in the Black and Scholes framework, the P&L of a delta-hedged self-financing portfolio (now with a first-order hedge for the volatility factor using a volatility instrument like a straddle, for example) in this model is

$$\begin{aligned} \delta\Pi \approx{} & \frac{1}{2}\,S^2\,\frac{\partial^2 O}{\partial S^2}\left(\left(\frac{\delta S}{S}\right)^2 - \sigma^2\,\delta t\right) \\ & + \frac{1}{2}\,\frac{\partial^2 O}{\partial \sigma^2}\left((\delta\sigma)^2 - \nu^2\,\delta t\right) \\ & + S\,\frac{\partial^2 O}{\partial S\,\partial \sigma}\left(\frac{\delta S}{S}\,\delta\sigma - \rho\,\sigma\,\nu\,\delta t\right) \end{aligned} \tag{6}$$

Two other “gamma” terms appear in this equation, which proves that incorporating the dynamics of the volatility is not as simple as the addition of a shadow term in the Black and Scholes breakeven relation. It also shows that controlling the P&L leads to a more complex gamma hedge, as it is now necessary to annihilate two more terms (the second and third ones, for which natural hedging instruments are strangles and risk reversals).

Another popular way of integrating the volatility surface dynamics in the model is to use Lévy processes (see Exponential Lévy Models). We do not give the P&L explanation in that case, but, like in the stochastic volatility framework, it is the sum of the term presented in equation (3), for the Brownian part, and of a term coming from the pure jump part. The hedge of the latter is very complex because it is not localized in space (one needs to use a strip of gap options, e.g., to control it).

Finally, a possible way of controlling the volatility surface dynamics is to make no assumption on the volatility except that it is bounded. This framework is known as uncertain volatility modeling and is presented in Uncertain Volatility Model. The analysis leads to the conclusion that instead of one breakeven volatility, there are, in fact, two: the upper bound for positive gamma regions and the lower bound for negative gamma ones. In that case, and supposing that the effective realized volatility stays locally between these two bounds, gamma hedging is not necessary, as the P&L of the delta-hedged self-financing portfolio is naturally systematically positive.

Multiunderlying Derivatives

We consider a multidimensional Black and Scholes model of $N$ stocks $S_i$ with volatility $\sigma_i$. $\rho_{ij}$ represents the correlation between the Brownian motions controlling the evolution of $S_i$ and $S_j$. We do not discuss the issue of multicurrency (see Quanto Options) and, using the same mechanism as in the monounderlying framework, we can express the P&L of a delta-hedged self-financing portfolio as

$$\begin{aligned} \delta\Pi \approx{} & \frac{1}{2}\sum_{i=1}^{N} S_i^2\,\frac{\partial^2 O}{\partial S_i^2}\left(\left(\frac{\delta S_i}{S_i}\right)^2 - \sigma_i^2\,\delta t\right) \\ & + \sum_{i<j} S_i S_j\,\frac{\partial^2 O}{\partial S_i\,\partial S_j}\left(\frac{\delta S_i}{S_i}\,\frac{\delta S_j}{S_j} - \rho_{ij}\,\sigma_i\,\sigma_j\,\delta t\right) \end{aligned} \tag{7}$$

The first term can be controlled by the hedging instruments we have previously reviewed. The cross ones, which incorporate the “cross gammas”, can also be controlled, using so-called correlation swaps (typically, a basket option minus the sum of the individual options).

Conclusion

Controlling the gamma exposure of a position is one of the main concerns of traders. Hedging instruments
are common options but it is not possible to simply hedge the gamma without modifying the vega exposure of the position. Also, integrating the volatility surface dynamics in the model leads to a more complex gamma-hedging issue than in a Black and Scholes model, but it still can be addressed. Moreover, we remark that although we have considered the equity market in our study, this analysis can easily be extended to other complete markets in which there is no arbitrage and where the price process is modeled by a Brownian motion of any dimension.

End Notes

a. P&L stands for “profit and loss” and represents the evolution of the portfolio value between two dates due to time and to the market activity between these dates.

Reference

[1] Taleb, N. (1996). Dynamic Hedging: Managing Vanilla and Exotic Options, John Wiley & Sons, pp. 138–146.

Related Articles

Correlation Swap; Delta Hedging; Exponential Lévy Models; Gamma Swap; Heston Model; Uncertain Volatility Model; Variance Swap.

CHARLES-HENRI ROUBINET
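The Gamma Hedging article states that a log contract has a constant cash gamma, whereas the gamma of a vanilla option is localized around its strike. A small finite-difference sketch can illustrate the contrast. The normalization of the log contract as -2 ln(S), the helper names, and the option parameters are assumptions made for this example.

```python
import math

def cash_gamma(value_fn, spot, rel_bump=1e-3):
    """S^2 times the second derivative of value_fn, by central finite differences."""
    h = spot * rel_bump
    second = (value_fn(spot + h) - 2.0 * value_fn(spot) + value_fn(spot - h)) / h**2
    return spot**2 * second

def log_contract(spot):
    """Log contract value profile, normalized here (an assumption) as -2 * ln(S)."""
    return -2.0 * math.log(spot)

def bs_call_gamma(spot, strike, rate, vol, tau):
    """Black-Scholes gamma of a European call (no dividends)."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol**2) * tau) / (vol * math.sqrt(tau))
    density = math.exp(-0.5 * d1 * d1) / math.sqrt(2.0 * math.pi)
    return density / (spot * vol * math.sqrt(tau))

def bs_call_cash_gamma(spot, strike, rate, vol, tau):
    """Cash gamma S^2 * Gamma of the same call."""
    return spot**2 * bs_call_gamma(spot, strike, rate, vol, tau)

for spot in (50.0, 100.0, 200.0):
    log_cg = cash_gamma(log_contract, spot)
    call_cg = bs_call_cash_gamma(spot, 100.0, 0.02, 0.20, 0.25)
    print(f"spot={spot:6.1f} log contract cash gamma={log_cg:.4f} call cash gamma={call_cg:.4f}")
```

The log contract column stays at the same level for every spot, while the call's cash gamma collapses away from the strike, which is the sense in which the variance swap suits portfolios whose gamma is not localized.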
Delta Hedging

The delta of an asset or a portfolio broadly means net market exposure to an asset class. This may be for a single asset, like an option, or for a portfolio, like an S&P 500 benchmarked mutual fund. Delta hedging is the process of reducing the size of this exposure to a target level to reduce the amount of risk exposure due to the delta present in the portfolio.

Delta hedging is a term used in two broad categories: delta hedging of investment portfolios and delta hedging of financial derivatives.

Delta Hedging of Investment Portfolios

The delta of a portfolio is determined based on the assets in the portfolio. For equity portfolios, it may mean net equity exposure, for fixed income portfolios it may mean portfolio duration, while for commodity portfolios it could be exposure to the underlying commodity such as bushels of corn or barrels of oil. To delta hedge a portfolio, the portfolio manager determines what target net delta level is desired for the portfolio and uses a broad market instrument like futures or swaps to achieve that desired level of delta. In addition, some portfolio managers might be more precise by hedging sensitivity to one factor (beta hedging) or multiple factors.

Delta Hedging of Financial Derivatives

Financial derivatives provide linear or nonlinear exposure to an underlying asset price level. While it is possible that both the buyer and the seller of a specific derivative are interested in identically opposite exposures, it is more common that one of the parties to the transaction is a financial intermediary or market maker that plans to hedge or reduce some of the risks of the transaction.

Delta hedging is the simplest form of hedging financial derivatives. This hedging aims to neutralize the direct exposure to the underlying asset, or delta, while maintaining second-order exposures to convexity, volatility, and time. In this discussion, we assume that a market maker enters into a financial derivative transaction and decides to hedge the delta arising from the transaction. If the net delta exposure between the financial derivative and the hedge is zero, the position is said to be delta neutral.

The Process of Delta Hedging

The process of delta hedging incorporates some or all of the following steps.

Calculation of Delta

Typically, this is based on risk-neutral valuation of the product. As a result, the delta may vary depending on the underlying model used in the valuation. This variation may be small for simple products but may result in material differences for complex products. For example, a vanilla equity call option at-the-money (ATM) might have a similar delta based on Black–Scholes [1] inputs or a local volatility model, while a knock-out put might have materially different deltas if measured by the two models above.

Determining the Appropriate Hedging Instrument

Entering into a hedge has various costs, the key aspects of which are mentioned below:

1. Liquidity of the hedging instrument
It is ideal that the hedging instrument is easily tradable with a low bid/ask spread. For derivatives, where the underlying is less liquid or there might be some market impact in executing the hedge, the price of the financial derivative is based on the average realized execution price level of the hedge.

2. Basis risk in the hedging instrument
In practice, some of the hedging instruments may have a small basis risk versus the actual underlying, which needs to be hedged. This usually happens when the actual underlying has significant transaction or liquidity costs and the basis tracking risk is low.

3. Financing costs
There might be costs associated with going long or short an asset as a hedge or entering into a financial swap transaction (e.g., the cost of posted collateral).

4. Stability of the hedge
If the hedge involves going long an asset or entering into a long-term linear over the counter (OTC)
derivative, the hedge is considered stable. However, if the market maker needs to borrow the hedging asset to go short (e.g., short stocks or bonds), then the hedge may be subject to a lack of availability to borrow in the marketplace.

Incorporating the Cost of Hedging into the Price of the Financial Derivative

1. Market makers who seek to hedge a position may incorporate the cost of hedging using adjustments to the financing rates, expected loss owing to basis risk, or future rollover costs of short-term hedges (like futures) into the pricing of the derivative.
2. In addition, the market maker may seek contractual provisions to ensure that if he is unable to hedge the contract, he has a right to unwind the derivative at fair market price.

Examples of Delta Hedging

Below are some examples of delta hedging.

Example 1. Listed put option on S&P 500 index.
Market Maker Derivatives Inc. (MMDI) sells an exchange-listed put on the S&P 500 (SPX) Index on a $100mm notional with a strike price of $1350 and one year expiration to a client when the SPX is trading at $1400. The Black–Scholes model calculates the delta of the position at 38.5%. To delta hedge and neutralize the position to P&L swings from changes in the level of the SPX, MMDI needs to sell $38.5mm of SPX Index. MMDI can do one of the following to achieve this:

1. Sell $38.5mm in SPX index futures (110 futures at $1400 with a 250 multiplier). Since these futures expire every three months, the market maker needs to roll this exposure into a new contract every three months.
2. Sell $38.5mm of a one-year swap on the SPX Index with another counterparty or enter into an OTC put/call combo transaction (buying a put and selling a call with the same strike price to replicate a forward using put–call parity) for $38.5mm notional. This is done in the OTC marketplace, and appropriate International Swaps and Derivatives Association, Inc. (ISDA) (http://www.isda.org/) documents and collateral agreements need to be in place before this is done.
3. Sell $38.5mm of a basket of stocks that comprises the S&P 500 index, paying a borrowing fee and executing stock orders on the exchange.

Example 2. Long total return swap on XYZ commodity index.
MMDI sold a five-year total return swap on the XYZ commodity index to a client for $10mm notional, which has a delta of $10mm. To hedge the position, MMDI can do one of the following:

1. Buy an equal and offsetting swap with another client or market counterparty, or a basket of commodity swaps for each of the components of the XYZ index from another counterparty, which is the perfect hedge (assuming no counterparty risk).
2. Hedge the XYZ index with a much more actively traded index like the CRB index, taking into account the tracking risk and weighting differences of the components between the two indices.
3. Maintain a portfolio of long futures on the commodities that comprise the XYZ index, rolling the futures positions for the next five years. In rolling the futures positions, the market maker assumes the risk of changes in the shape of the commodity forward curve.

Rehedging Delta with Time, Spot Moves

The delta of a financial product may change with time or the levels of the different market parameters such as volatility, underlying price, interest rates, or skew. It is then necessary to periodically adjust the size of the hedge to maintain the delta at a preset level. The rate of change of the delta is proportional to the gamma (see Gamma Hedging).

Reference

[1] Black, F. & Scholes, M. (1973). The pricing of options and corporate liabilities, Journal of Political Economy 81(May-June), 637–659.

Related Articles

Hedging; Hedging of Interest Rate Derivatives; Option Pricing: General Principles.

VIJU JOSEPH
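The sizing arithmetic in Example 1 of the Delta Hedging article can be sketched as follows. The notional, delta, index level, and futures multiplier are the figures quoted in the example; the function names and the later delta of 45% used to illustrate rehedging are hypothetical.

```python
def hedge_notional(option_notional, delta):
    """Notional of the underlying to trade so that the net delta is zero."""
    return option_notional * delta

def futures_contracts(notional_to_hedge, index_level, multiplier):
    """Number of index futures whose notional matches the hedge notional."""
    return notional_to_hedge / (index_level * multiplier)

def rehedge_trade(current_hedge, option_notional, new_delta):
    """Extra notional to sell (positive) or buy back (negative) after delta changes."""
    return hedge_notional(option_notional, new_delta) - current_hedge

# Figures from Example 1: $100mm notional, 38.5% delta, SPX at 1400, multiplier 250.
to_sell = hedge_notional(100e6, 0.385)
contracts = futures_contracts(to_sell, 1400.0, 250.0)
print(f"hedge notional: ${to_sell:,.0f}, futures to sell: {contracts:.0f}")

# Hypothetical later delta of 45%: the hedge has to be topped up.
print(f"additional notional to sell: ${rehedge_trade(to_sell, 100e6, 0.45):,.0f}")
```

Each future covers 1400 × 250 = $350,000 of index exposure, so a $38.5mm hedge corresponds to 110 contracts.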
Dispersion Trading

Dispersion trading refers to the practice of selling index variance while buying variance of its constituents at the same time. The reverse strategy (buying index variance while selling constituents variance) can also be employed, but it is not as popular.

To understand dispersion trading, consider the index as a basket of stocks:

$$S_I = \sum_{i=1}^{n} w_i S_i \tag{1}$$

where $w_i$ is the weight of $S_i$ in the basket.

The variance of the index is related to that of the individual stocks by

$$\sigma_I^2 = \sum_{i=1}^{n} w_i^2 \sigma_i^2 + 2 \sum_{i=1}^{n} \sum_{j>i} \rho_{ij}\, w_i w_j \sigma_i \sigma_j \tag{2}$$

The variance is defined as

$$\sigma_i^2 = \frac{1}{T} \sum_{t=1}^{T} \left(S_i^t - \bar{S}_i\right)^2 \tag{3}$$

with $\bar{S}_i = \frac{1}{T} \sum_{t=1}^{T} S_i^t$, and the correlation $\rho_{ij}$ is defined as

$$\rho_{ij} = \frac{1}{T} \sum_{t=1}^{T} \left(S_i^t - \bar{S}_i\right)\left(S_j^t - \bar{S}_j\right) / (\sigma_i \sigma_j) \tag{4}$$

If we hold the realized variances of every component stock constant, the maximum for the index variance is reached when the correlation between all the components is 100%. If the correlation between stocks is not perfectly 100%, the index variance is lower. The more “dispersed” the stocks are, the lower is the index variance.

A measure of dispersion, the dispersion spread, can be defined as

$$D = \sqrt{\left(\sum_{i=1}^{n} w_i \sigma_i\right)^2 - \sigma_I^2} \tag{5}$$

or, alternatively, it has also been defined as

$$D = \sum_{i=1}^{n} w_i \sigma_i - \sigma_I \tag{6}$$

$D = 0$ corresponds to the case when there is no dispersion: all correlations are 100%.

So, being long dispersion is equivalent to being short correlation and vice versa.

To characterize the correlation between the constituents, one can define the average correlation as if the correlation were the same between every pair of stocks in the basket:

$$\bar{\rho} = \frac{\sigma_I^2 - \sum_{i=1}^{n} w_i^2 \sigma_i^2}{2 \sum_{i=1}^{n} \sum_{j>i} w_i w_j \sigma_i \sigma_j} \tag{7}$$

However, sometimes, it is easier to calculate the less accurate “correlation proxy,” which is defined as

$$\tilde{\rho} = \frac{\sigma_I^2}{\left(\sum_{i=1}^{n} w_i \sigma_i\right)^2} \tag{8}$$

This correlation proxy can be interpreted as the “average” of all correlations between all pairs of stocks in the index including a stock with itself (which we know should be 100%). When the number of stocks in the index $n$ is high, it can be seen that $\sum_{i=1}^{n} w_i^2 \sigma_i^2$ is much smaller than the retained terms:

$$\sum_{i=1}^{n} w_i^2 \sigma_i^2 \ll \sigma_I^2 \tag{9}$$

$$\sigma_I^2 \approx \sigma_I^2 - \sum_{i=1}^{n} w_i^2 \sigma_i^2 \tag{10}$$

$$2 \sum_{i=1}^{n} \sum_{j>i} w_i w_j \sigma_i \sigma_j \approx 2 \sum_{i=1}^{n} \sum_{j>i} w_i w_j \sigma_i \sigma_j + \sum_{i=1}^{n} w_i^2 \sigma_i^2 \tag{11}$$

The correlation proxy and correlation are very close to each other. The implied correlation can be simply inferred from the ratio of average volatilities. Sometimes it is also convenient to calculate the mean variance ratio that is more directly related to the trade profit/loss (P/L).
By definition, realized correlation is the correlation calculated using realized volatilities, and implied correlation is the correlation calculated using implied volatilities. Implied volatilities decide the price of the traded instruments like vanilla options and variance swaps.

The success of dispersion trades relies on the fact that statistically the realized correlation tends to be below the implied correlation. Historically, if one were long dispersion, on average, one made more money than the amount one lost. There are many different reasons for this phenomenon. For example, one may argue that there is more market demand for index volatility than for that of the individual stocks, which means usually there is more premium for index volatility. More importantly, correlation jumps to a very high level when extreme market conditions exist, namely, global recession and market crash, while it stays low in a normal and uneventful market.

To be long the volatility of each component stock and short the index volatility, one can trade either vanilla options or variance swaps. The variance swaps provide direct exposure to variance without the unnecessary cost and hassle of hedging against daily stock movements.

One issue in a dispersion trade is to decide the relative weight for index and constituents variances. There is no single “correct” relative weight to use. For example, vega neutral weights aim to make the sum of constituents vega and index vega zero, so that the trade is hedged against fluctuations in the level of volatility. “Premium neutral” weights make the initial premium of buying constituents and selling index cancel each other.

In reality, it is impractical to trade all constituents. Often, a selection of names in the index (or even those not in the index) is used. This is called a proxy basket. One can build the proxy basket by selecting, for example, the names that have the largest weights in the index, or the names that are judged relatively “cheap”, or the names that are most likely to “disperse” against each other, or simply by the stock fundamentals.

Related Articles

Basket Options; Correlation Swap.

YONG REN
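The quantities defined in the Dispersion Trading article (index variance in equation (2), dispersion spread in equation (6), average correlation in equation (7), and correlation proxy in equation (8)) can be computed directly from weights, volatilities, and a correlation matrix. The sketch below uses a hypothetical three-stock basket with a flat 40% pairwise correlation; all numbers and function names are assumptions for illustration.

```python
import math

def index_variance(weights, vols, corr):
    """Equation (2): index variance from constituent vols and pairwise correlations."""
    n = len(weights)
    diag = sum(weights[i] ** 2 * vols[i] ** 2 for i in range(n))
    cross = sum(corr[i][j] * weights[i] * weights[j] * vols[i] * vols[j]
                for i in range(n) for j in range(i + 1, n))
    return diag + 2.0 * cross

def average_correlation(weights, vols, index_var):
    """Equation (7): single correlation that reproduces the index variance."""
    n = len(weights)
    diag = sum(weights[i] ** 2 * vols[i] ** 2 for i in range(n))
    cross = sum(weights[i] * weights[j] * vols[i] * vols[j]
                for i in range(n) for j in range(i + 1, n))
    return (index_var - diag) / (2.0 * cross)

def correlation_proxy(weights, vols, index_var):
    """Equation (8): index variance over the squared weighted-average volatility."""
    avg_vol = sum(w * v for w, v in zip(weights, vols))
    return index_var / avg_vol ** 2

def dispersion_spread(weights, vols, index_var):
    """Equation (6): weighted-average volatility minus index volatility."""
    return sum(w * v for w, v in zip(weights, vols)) - math.sqrt(index_var)

# Hypothetical basket.
weights = [0.5, 0.3, 0.2]
vols = [0.20, 0.30, 0.40]
flat = [[1.0 if i == j else 0.4 for j in range(3)] for i in range(3)]
perfect = [[1.0] * 3 for _ in range(3)]

var_flat = index_variance(weights, vols, flat)
var_perfect = index_variance(weights, vols, perfect)
print("average correlation:", average_correlation(weights, vols, var_flat))
print("correlation proxy:  ", correlation_proxy(weights, vols, var_flat))
print("dispersion spread:  ", dispersion_spread(weights, vols, var_flat))
print("spread at 100% correlation:", dispersion_spread(weights, vols, var_perfect))
```

With only three names the proxy sits visibly above the average correlation, because the own-variance terms it includes are not negligible; the two converge as the number of constituents grows, as argued in equations (9) to (11).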
Correlation Swap

A correlation swap is a type of exotic derivative security that pays off the observed statistical correlation between the returns of several underlying assets, against a preagreed price. At the time of writing, it is traded over-the-counter (OTC) on equity and foreign exchange derivatives markets. This article focuses on equity correlation swaps, which appeared in the early 2000s, as a means to hedge the parametric risk exposure of exotic trading desks to changes in correlation.

Payoff

Similar to variance swaps, the correlation swap payoff involves a notional (the amount to be paid/received per correlation point; see End Note a), a realized correlation component (the formula used to calculate the level of observed statistical correlation between the underlying assets), and a strike price:

$$\text{Correlation swap payoff} = \text{notional} \times (\text{realized correlation} - \text{strike})$$

For example, a one-year correlation swap contract on the constituents of the Dow Jones Euro Stoxx 50 index would include the following terms:

• underlying assets: each of the 50 constituent stocks, denoted by $S_1, \ldots, S_N$ ($N = 50$);
• notional: €100 000 per correlation point;
• realized correlation: $\frac{2}{N(N-1)} \sum_{1 \le i < j \le N} \rho_{i,j}$, where $\rho_{i,j} = \frac{\mathrm{Cov}(X_i, X_j)}{\sqrt{\mathrm{Var}(X_i) \times \mathrm{Var}(X_j)}} \times 100$ is the familiar pairwise coefficient of correlation between the time series $X_i$ and $X_j$ of daily log returns observed in the year following the trade date;
• strike: 52.0 correlation points.

Thus, if after one year the arithmetic average of pairwise correlation coefficients between the 50 underlying assets is equal to 58.3 correlation points, the swap seller will pay a net cash flow of €630 000 to the swap buyer.

Realized Correlation

There are mainly two types of realized correlation formulas currently found on over-the-counter (OTC) markets:

• equally weighted realized correlation: the formula used in the above example;
• weighted realized correlation: $\frac{\sum_{1 \le i < j \le N} w_i w_j \rho_{i,j}}{\sum_{1 \le i < j \le N} w_i w_j}$, where $w_1, \ldots, w_N$ are preagreed positive weights summing to 1. In the above example, one would typically take the "index weights" as of the trade date, that is, the stock quantities that a portfolio manager would invest in to track one unit of the Dow Jones Euro Stoxx 50 index.

Several technical reports have investigated how the above weighted realized correlation (WRC) formula relates to other proxy formulas that are popular in econometrics, when the underlying assets and weights correspond to an equity index. Tierens–Anadu [4] give empirical evidence that in the case of the S&P 500 index,

$$\mathrm{WRC} \approx \frac{\sum_{i<j} w_i w_j \,\mathrm{Cov}(X_i, X_j)}{\sum_{i<j} w_i w_j \sqrt{\mathrm{Var}(X_i)\,\mathrm{Var}(X_j)}} \tag{1}$$

In addition, Bossu [1, 2] derives the following limit-case proxy formula, subject to some conditions on the weights:

$$\frac{\sum_{i<j} w_i w_j \,\mathrm{Cov}(X_i, X_j)}{\sum_{i<j} w_i w_j \sqrt{\mathrm{Var}(X_i)\,\mathrm{Var}(X_j)}} \;\underset{N \to \infty}{\sim}\; \frac{\mathrm{Var}\left(\sum_{i=1}^{N} w_i X_i\right)}{\left(\sum_{i=1}^{N} w_i \sqrt{\mathrm{Var}(X_i)}\right)^2} \tag{2}$$

The proxy formula on the right-hand side is remarkable because we can interpret the numerator as "index variance" and the denominator as "average constituent variance", which is more straightforward than the average of $N \times (N - 1)/2$ pairwise correlation coefficients.
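The realized correlation formulas and the payoff above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, sample covariances use the usual n − 1 divisor (the article does not specify one), and correlations are expressed in correlation points. With no weights supplied, the weighted formula reduces to the equally weighted average, since equal weights cancel between numerator and denominator.

```python
import math
from itertools import combinations


def covariance(x, y):
    """Sample covariance of two equally long return series (n - 1 divisor, an assumption)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)


def correlation_points(x, y):
    """Pairwise correlation coefficient, scaled to correlation points (x 100)."""
    return 100.0 * covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


def realized_correlation(returns, weights=None):
    """Weighted realized correlation; equal weights give the arithmetic average."""
    n = len(returns)
    if weights is None:
        weights = [1.0 / n] * n
    numerator = denominator = 0.0
    for i, j in combinations(range(n), 2):
        pair_weight = weights[i] * weights[j]
        numerator += pair_weight * correlation_points(returns[i], returns[j])
        denominator += pair_weight
    return numerator / denominator


def proxy_correlation(returns, weights):
    """Right-hand side of proxy formula (2): index variance over squared average volatility."""
    index = [sum(w * r[t] for w, r in zip(weights, returns)) for t in range(len(returns[0]))]
    average_vol = sum(w * math.sqrt(covariance(r, r)) for w, r in zip(weights, returns))
    return 100.0 * covariance(index, index) / average_vol ** 2


def swap_payoff(notional_per_point, realized, strike):
    """Correlation swap payoff = notional x (realized correlation - strike)."""
    return notional_per_point * (realized - strike)


# The contract example from the text: 58.3 realized against a 52.0 strike.
print(swap_payoff(100_000, 58.3, 52.0))
```

The proxy in `proxy_correlation` is a limit-case approximation for large N, so for a handful of assets it generally differs from the weighted realized correlation; the two coincide in special cases such as perfectly correlated series.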
Fair Value

At the time of writing, little is known about the "fair value" of correlation swaps. Owing to the typically large number of underlyings, the popular Monte Carlo engine with or without local volatility surfaces requires an N × N correlation matrix as additional input parameter. There are two problems with this approach: a practical one and a theoretical one. The practical problem is that individual correlation coefficients cannot be implied from listed option markets (see End Note b). The theoretical problem is that, even if one could come up with a sensible implied correlation matrix, a meaningful dynamic replication strategy for the correlation swap payoff would still be missing (see End Note c).

Ongoing research (see, e.g., the working papers of Bossu [1] and Jacquier [3]) aims to identify the linkages between dispersion trading and the dynamic hedging of correlation swaps, especially when the underlying assets are the constituent stocks of an equity index. This approach exploits the proxy formulas above to rewrite the correlation swap payoff as a function of tradable variance swaps; in this framework the correlation swap becomes a multiasset volatility derivative (see Realized Volatility Options), rather than a classical multiasset derivative.

End Notes

a. In market jargon, a correlation point is equal to 0.01. With this convention, the value of a correlation coefficient lies between −100 and +100 correlation points.
b. To imply the value for $\rho_{i,j}$, one needs three option prices: a vanilla option on $S_i$, a vanilla option on $S_j$ and, for example, a vanilla option on a portfolio made of 50% $S_i$ and 50% $S_j$. The former two options are listed, the latter is not.
c. For example, in a two-asset extension of the Black–Scholes model with instantaneous correlation $(d\ln S_t^1)(d\ln S_t^2) = \rho\,dt$, the forward value of an equally weighted correlation swap is simply ρ − strike, which would be hedged purely with cash!

References

[1] Bossu, S. (2007). A New Approach For Modelling and Pricing Correlation Swaps, Dresdner Kleinwort Equity Derivatives report (working paper). Available at http://math.uchicago.edu/~sbossu/CorrelationSwaps7.pdf
[2] Bossu, S. & Gu, Y. (2004). Fundamental Relationship Between an Index's Volatility and the Average Volatility and Correlation of its Components, JPMorgan Equity Derivatives report (working paper). Available at http://math.uchicago.edu/~sbossu/CorrelFundamentals.pdf
[3] Jacquier, A. (2007). Variance Dispersion and Correlation Swaps, Working paper. Available at SSRN: http://ssrn.com/abstract=998924
[4] Tierens, I. & Anadu, M. (2004). Does it Matter Which Methodology you use to Measure Average Correlation Across Stocks? Goldman Sachs Equity Derivatives Strategy: Quantitative Insights, 13 April 2004.

Related Articles

Basket Options; Correlation Risk; Dispersion Trading; Variance Swap.

SÉBASTIEN BOSSU
Stock Pinning

Stock pinning, or simply pinning, is formally the occurrence of a closing stock print, on option expiration day, which exactly matches the denominated value of a strike price. As an example, let stock XYZ have strikes 30, 32.5, 35, and 40. If on Friday, May 16, 2008 at 4:00 pm EDT (USA), the third Friday of the month and thus an expiration day for listed options, the closing print of XYZ is $35, then stock XYZ is said to pin. Prices of $34.27, $31.60, or even $32.48, are said not to have pinned. Figure 1 is a tick price graph of KO (Coca Cola Corporation) showing the last several days prior to a pinning expiration.

As a practical matter, it may be useful experimentally to consider pinning to have occurred if the stock expires within a certain interval of a strike price. There are several reasons for this looser definition. Empirically there may be several "closing prints"

Figure 1 KO (Coca Cola) tick data for a pinning expiration, October 17, 2003. Panel title: "KO: October 15, 16, 17". Axes: price (44.00 to 46.00) against tick number.

Figure 2 All optionable stocks in 2002 divided into quartiles by pinning strength, β. As predicted, the probability of pinning increases with β. (Pinning criteria $0.15, courtesy Bart Rothwell.) Panel title: "Pinning % by (open interest / volatility)". Axes: % pinned against adjusted open interest, 0.001 × (OI/volatility), with bin averages 0.36, 2.19, 7.40, and 46.41.

Figure 3 Cumulative distribution function of pinning of stocks, which are within $1 of a strike with 1 week to go to expiration as a function of the parameter, β. (Courtesy Tom MacFarland.) Panel title: "Pinning history (June 2003 – October 2004)". Axes: % that pinned ($0.25) against OI-ratio.

Figure 4 The percentage of days KO closed within $0.15 of a strike in a 10-year period January 1, 1996 to January 1, 2005. 0 is expiration day, negative integers are days prior to expiration, positive integers are days following expiration. (Courtesy Bart Rothwell.) Panel title: "Pinning % by date, KO". Axes: % pinned against relative trading date from option expiration date (−5 to 5).

making a choice of the closing price arbitrary. Tick data shows that a stock may be effectively pinned over the last several minutes before expiration but then have a closing print just off the strike. In the first example, this might happen if the last quote were $34.98 bid, at $35.01, and the closing price stayed in the interval but was not precisely $35.

Two additional reasons for a looser definition are the automatic exercise conditions mandated by the OCC (Options Clearing Corporation) and the consequent pin risk, which attends expiring short positions on the (nearly) pinned strike. The OCC has traditionally fixed an interval about a strike outside of which in-the-money puts and calls would be automatically exercised by the clearing process; options within the interval would require exercise notice by the holder. Over time, the OCC has reduced the interval to the current $0.01 (from $0.05 before June 2008 expiration); traders will declare a stock to have pinned if it falls within the OCC interval. Pin risk attends to any short position inside this interval because an uncertain number of options may be assigned and thus an uncertain postexpiration stock position exists in the positions of those short the expiring at-the-money options (see End Note a).
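The strict and the looser definitions of a single pinning instance can be written down as a small check. The sketch below is illustrative only: the function names and the `tolerance` parameter are assumptions, with `tolerance=0.0` standing for the strict definition and values such as 0.01 or 0.15 standing for the OCC interval or an experimental criterion.

```python
def nearest_strike(close, strikes):
    """Strike closest to the closing print."""
    return min(strikes, key=lambda strike: abs(close - strike))


def is_pinned(close, strikes, tolerance=0.0):
    """True if the close lies within `tolerance` dollars of some strike."""
    return abs(close - nearest_strike(close, strikes)) <= tolerance


def pinning_frequency(expiration_closes, strikes, tolerance):
    """Fraction of expiration-day closes that pinned under the given tolerance."""
    hits = sum(is_pinned(close, strikes, tolerance) for close in expiration_closes)
    return hits / len(expiration_closes)


strikes = [30.0, 32.5, 35.0, 40.0]           # the XYZ strikes from the example
print(is_pinned(35.00, strikes))             # strict definition
print(is_pinned(32.48, strikes))             # just off the strike: not pinned
print(is_pinned(32.48, strikes, 0.15))       # pinned under a $0.15 criterion
print(pinning_frequency([35.00, 34.27, 31.60, 32.48], strikes, 0.15))
```

A close of $32.48 fails the strict test but passes a $0.15 criterion, which shows how sensitive measured pinning frequencies are to the choice of interval.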
Figure 5 All stocks, January 1996 to September 2002 [Reproduced with permission from Stock price clustering on option expiration dates, Ni et al., Journal of Financial Econometrics, © Elsevier 2005.] Panel title: "Percentage of optionable stocks closing within $0.125 of a strike price". Axes: % against relative trading date from option expiration date (−10 to 10).

Figure 6 Nonoptionable stocks do not pin, January 1996 to September 2002 [Reproduced with permission from Stock price clustering on option expiration dates, Ni et al., Journal of Financial Econometrics, © Elsevier 2005.] Panel title: "Percentage of nonoptionable stocks closing within $0.125 of an integer multiple of $5". Axes: % against relative trading date from option expiration date (−10 to 10).

So far we have defined a single instance of pinning. Complementing the notion of an individual instance of stock pinning is an ensemble assertion of stock pinning. In this perspective, a stock, or stocks, is said to pin if, no matter how small an interval one chooses about a strike price, there
Figure 7 All optionable stocks, January 1996 to September 2002. (Pinning criterion $0.125): pinning when professional traders are (a) long and (b) short the expiring at-the-money strike. Panel (a) title: "Marketmakers + firm proprietary traders net long. Percentage of stocks close within strike price ± 0.125, market and firm net long 43329". Panel (b) title: "Marketmakers net short. Percentage of stocks close within strike price ± 0.125, marketmaker net short 26706". Horizontal axis in both panels: relative trading date from option expiration date (−10 to 10).

is a finite probability of finding closing prints within the interval. To compute this limit, expressed mathematically as

$$\lim_{\varepsilon \to 0} P(|S - K| \le \varepsilon) > 0 \tag{1}$$

where ε is an interval about (any) strike K, S is the stock price at expiry, and P is the probability among all expiration closes, one can do empirical experiments or theoretical calculations. It is important to note that standard models of option pricing such as Black–Scholes, Heston, SABR, and stochastic volatility models, in general, cannot exhibit pinning mathematically.

Although traders had long believed pinning to be a real phenomenon, little theoretical or experimental effort was made to examine the subject through the 1990s. Krishnan and Nelken [3] looked at the data set of MSFT (Microsoft Corporation) expirations and found evidence of pinning. They proposed a model that combined a Gaussian random walk with a Brownian bridge process in order to force pinning to a strike. The model perforce guaranteed pinning, but suffered from many obvious weaknesses: stocks do not always pin; they can pin at many possible strikes; and the choice of the amount of Brownian bridge component was exogenously (and arbitrarily) imposed.

Then Avellaneda and Lipkin [1], and nearly simultaneously, Ni et al. [4], produced theoretical and experimental arguments for pinning. In the former work, Avellaneda and Lipkin proposed an asymmetric hedging strategy for professional traders: aggressive hedging of long gamma positions and weak hedging of short gamma positions. This hedging strategy coupled with a stock impact function (simplistically assumed to be linear; see End Note b) led directly to pinning (with nonzero probability), which depended naturally and endogenously on the option open interest, the intrinsic stock volatility, the (logarithmic) distance to the strike and the time to expiration. In dimensionless form, the strength parameter, β, is proportional to the open interest and inversely proportional to the volatility. Figures 2 and 3 show, experimentally, the monotonic growth of pinning probability with β.

Ni et al. used the IVY and CBOE databases to check pinning frequencies. Figures 4 and 5, typical Ni et al. graphs, indicate the excess clustering of stock prices near a strike on expiration days for KO and for the entire market over extended periods. Figure 6 demonstrates the absence of pinning in nonoptionable stocks. Finally, Ni et al. lent support for the hedging assumptions of Avellaneda and Lipkin. Figure 7(a,b) shows the difference between pinning when professional traders are long (a) and short (b) the expiring at-the-money strike.

Since 2004, other research groups, for example, Jeannin et al. [2], have continued to explore the details of pinning.

End Notes

a. Practitioners define pin risk as the uncertain deltas which an otherwise balanced position might have postexpiration due to the assignment of calls or puts on a near pinning strike. For example, a position long 50 calls and short 50 puts on the $25 strike, for a stock which expires near $25, may be assigned from 0 to 5000 shares of stock due to the uncertain number of puts which may have been exercised. This amount of stock thus assigned is independent of the number of calls the trader chooses to exercise himself.
b. Following the initial 2003 work, Gennady Kasyan (with Avellaneda and Lipkin, unpublished) showed that any impact function stronger than square-root would result in pinning. This suggests that weaker impact functions may be contradicted by the extensive market evidence of stock pinning.

References

[1] Avellaneda, M. & Lipkin, M.D. (2003). A market-induced mechanism for stock pinning, Quantitative Finance 3, 417–425.
[2] Jeannin, M., Iori, G. & Samuel, D. (2008). The pinning effect: theory and a simulated microstructure model, Quantitative Finance 8, 823–831.
[3] Krishnan, H. & Nelken, I. (2001). The effect of stock pinning upon option prices, Risk December.
[4] Ni, S., Pearson, N. & Poteshman, A. (2004). Stock price clustering on option expiration dates, SSRN August 27.

Related Articles

Price Impact.

MIKE LIPKIN
Variance Swap

Definition

A variance swap is a volatility derivative that pays off on realized volatility of some underlying:

$$\text{Payoff} = (\sigma_R^2 - K_{\mathrm{var}}) \times N \tag{1}$$

where $\sigma_R^2$ is the realized variance, $K_{\mathrm{var}}$ is the fair value of the realized variance at inception, and N is the notional amount, a leverage factor. The realized variance may be defined differently in different markets depending not only on the "default model" but also on specific contract specifications. One standard way for stocks [9] is to define realized variance as

$$\sigma_R^2 = \frac{252}{n} \sum_{i=1}^{n} u_i^2, \qquad u_i = \ln\left(\frac{S_i}{S_{i-1}}\right) \tag{2}$$

This requires n + 1 observations of the daily closing stock price $S_i$. The factor 252 is the approximate number of business days in a year, which gives an annualized variance. Also note that this formula assumes the mean of the returns to be zero. This means that it is distinct from the statistical variance of the returns. The mean of returns is typically small and setting it to zero makes returns on variance swaps additive over time. Using log returns to measure variance makes this formula compatible with the standard Black–Scholes option pricing formula.

The long position in a variance swap receives N dollars for every point by which the stock's realized variance $\sigma_R^2$ has exceeded the inception fair value $K_{\mathrm{var}}$. See [6] for one of the earliest, but most comprehensive, references on variance swaps. In this reference, the authors discuss replication problems due to strike spacing and gapping of the underlying, for example.

It is market practice to define the variance notional in volatility terms:

$$\text{Variance Notional} = \frac{\text{Vega Notional}}{2 \times \sigma_{\mathrm{strike}}} \tag{3}$$

where the strike volatility $\sigma_{\mathrm{strike}}$ is equal to the square root of the strike variance $K_{\mathrm{var}}$. With this adjustment, if the realized volatility is 1 percentage point above $\sigma_{\mathrm{strike}}$ at maturity, the payoff is approximately equal to the vega notional.

Uses of variance swaps include trading the level of volatility (variance swaps are a more pure way to do this compared to straddles), trading the realized/implied vol spread, hedging of volatility exposures, or trading volatility on a forward basis via forward variance swaps. See [1] for more details about variance swaps, market practices, and their use in vol spread trading and correlation trading.

Fair Value and Skew

In [6], it is shown that a variance swap can be statically replicated with calls, puts, and a forward contract. The payoff for the variance swap then comes from delta hedging of this portfolio of options with the underlying. The fair value is determined by the cost of the replicating portfolio of options. The payoff in terms of realized volatility is achieved by delta hedging with the underlying. If the realized vol is exactly equal to the expected vol at inception, the hedging profits will be exactly equal to the cost of the option portfolio and the payout will be zero. As the portfolio of options (in theory) includes option of every strike, the fair value cost is affected significantly by the skew. The skew is defined to be the way the implied volatility changes as the strike changes, all else being equal.

The fair value can be derived using the volatility formula discussed in [4]. In this reference, the authors show the remarkable formula that the expected value of any smooth payoff function $f(S_T)$ in terms of the terminal stock price $S_T$ can be written in terms of stock and option prices.

$$E_t[f(S_T)] = f(\kappa) B_0 + f'(\kappa)\,[C_t(\kappa) - P_t(\kappa)] + \int_{-\infty}^{\kappa} f''(s) P_t(s)\,ds + \int_{\kappa}^{\infty} f''(s) C_t(s)\,ds \tag{4}$$

An application of Ito's lemma to the usual Black diffusion yields the variance differential:

$$2\left(\frac{dS_t}{S_t} - d(\log S_t)\right) = \sigma^2\,dt \tag{5}$$

Applying equation (4) to the log contract [11] in equation (5) shows that the replicating portfolio
consists of out-of-the-money calls and puts of all strikes K, where each option has the weight $1/K^2$ plus a dynamic delta hedge using a forward contract on the stock. The fair value of the variance strike is

$$K_{\mathrm{var}} = \frac{2}{T}\left[ rT - \left(\frac{S_0}{S_*} e^{rT} - 1\right) - \log\frac{S_*}{S_0} + e^{rT}\int_0^{S_*} \frac{1}{K^2} P(K)\,dK + e^{rT}\int_{S_*}^{\infty} \frac{1}{K^2} C(K)\,dK \right] \tag{6}$$

Here, C(K) and P(K) are the prices of the call and put, respectively, with strike K.

In [6], we find the following formulas for fair value in terms of simple skew models. For the skew that is linear in the strike,

$$\sigma(K) = \sigma_{\mathrm{ATM}} - b\,\frac{K - S_F}{S_F} \tag{7}$$

the fair value is

$$K_{\mathrm{var}} = \sigma_{\mathrm{ATM}}^2 \left(1 + 3Tb^2 + \cdots\right) \tag{8}$$

For the skew that is linear in the delta (here $\Delta_p$ is the put delta),

$$\sigma(\Delta_p) = \sigma_{\mathrm{ATM}} + b\left(\Delta_p + \frac{1}{2}\right) \tag{9}$$

the fair value is

$$K_{\mathrm{var}} = \sigma_{\mathrm{ATM}}^2 \left(1 + \frac{1}{\sqrt{\pi}}\, b\sqrt{T} + \frac{1}{12}\,\frac{b^2}{\sigma_{\mathrm{ATM}}^2} + \cdots\right) \tag{10}$$

In [7], we have a particularly elegant formula for the fair value given directly in terms of the skew. Denote

$$z(k) = d_2 = -\frac{k}{\sigma_{\mathrm{BS}}(k)\sqrt{T}} + \frac{\sigma_{\mathrm{BS}}(k)\sqrt{T}}{2} \tag{11}$$

Intuitively, z measures the log-moneyness of an option in implied standard deviations. Then

$$K_{\mathrm{var}} = \int_{-\infty}^{\infty} dz \cdot N'(z)\,\sigma_{\mathrm{BS}}^2(z)\,T \tag{12}$$

Market Risk

In [5], the authors derive general results on market risk for variance swaps. Because variance is additive, a variance swap partway through its life is valued partly by realized vol (already observed) and partly by unrealized/implied vol, which is yet to be observed.

$$V(t_0, T)(T - t_0) = \underbrace{V(t_0, t)(t - t_0)}_{\text{Realized}} + \underbrace{V(t, T)(T - t)}_{\text{Unrealized}} \tag{13}$$

It follows from this that at time t a variance swap with notional N has value

$$M(t) = N e^{-rT}\left[\lambda\,(V(t_0, t) - K_0) + (1 - \lambda)(K_t - K_0)\right], \qquad \lambda = \frac{t - t_0}{T - t_0} \tag{14}$$

The first piece is just the time-weighted value of realized variance against the strike. The second piece is the time-weighted difference of fair value variance strikes. The same formula can be used to decompose the daily price change into three risk components: gamma, vega, and theta:

$$\Delta M(t) = M(t + \Delta t) - M(t) = N\Big[\underbrace{\tfrac{1}{\tau} V(t, t + \Delta t)\,\Delta t}_{\text{Gamma}} + \underbrace{(1 - \lambda_{t + \Delta t}) \cdot \Delta K}_{\text{Vega}} - \underbrace{\tfrac{1}{\tau} K_t\,\Delta t}_{\text{Theta}}\Big] \tag{15}$$

The biggest independent risk is in the fair value of the strike, which we have put in the vega component. This component also entails the skew risk.

Volatility Swaps

Although variance swaps can be statically replicated, volatility swaps (see Volatility Swaps) cannot. In [3], the authors show that there is an approximate dynamic replicating strategy for volatility
swaps. Before this, volatility swap valuation had been thought to be highly model dependent [2]. See also [8, 10], where the authors give a closed formula for valuing vol swaps using a GARCH model.

References

[1] Bossu, S., Strasser, E. & Guichard, R. (2005). Just What You Need to Know About Variance Swaps, JP Morgan Equity Derivatives Research Publication.
[2] Brockhaus, O. & Long, D. (2000). Volatility Swaps Made Simple, RISK Magazine, pp. 92–95.
[3] Carr, P. & Lee, R. (2008). Robust replication of volatility derivatives, Mathematics in Finance Working Paper #2008-3. Courant Institute of Mathematical Sciences.
[4] Carr, P. & Madan, D. (1998). Towards a theory of volatility trading, in Volatility ed., R.A. Jarrow, Risk Books.
[5] Chriss, N. & Morokoff, W. (1999). Market Risk for Volatility and Variance Swaps, Risk Magazine.
[6] Demeterfi, K., Derman, E., Kamal, M. & Zou, J. (1999). A guide to volatility and variance swaps, Journal of Derivatives 6, 9–32.
[7] Gatheral, J. (2006). The Volatility Surface: A Practitioner's Guide, Wiley Finance.
[8] Haug, E.G. (2007). Option Pricing Formulas, 2nd Edition, McGraw Hill.
[9] Hull, J. (2003). Options, Futures, and other Derivatives, 5th Edition, Prentice Hall.
[10] Javaheri, A., Wilmott, P. & Haug, E.G. (2002). Garch and Volatility Swaps, published on Wilmott.com.
[11] Neuberger, A. (1994). The log contract, The Journal of Portfolio Management 20, 74–80.

Related Articles

Correlation Swap; Corridor Variance Swap; Gamma Swap; Realized Volatility and Multipower Variation; Realized Volatility Options; Volatility; Volatility Index Options; Volatility Swaps; Weighted Variance Swap.

ERIC LIVERANCE
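As a worked illustration of the Definition section of the Variance Swap article, the following sketch computes the zero-mean realized variance of equation (2) and the payoff of equation (1) with the vega-notional convention of equation (3). The function names are hypothetical, and quoting volatility in points (20 for 20%) in the payoff example is an assumption about units; equation (2) itself returns a decimal variance.

```python
import math


def realized_variance(closes, periods_per_year=252):
    """Annualized zero-mean realized variance from n + 1 closing prices, equation (2)."""
    log_returns = [math.log(later / earlier) for earlier, later in zip(closes, closes[1:])]
    return periods_per_year / len(log_returns) * sum(u * u for u in log_returns)


def variance_notional(vega_notional, strike_vol):
    """Equation (3): variance notional implied by a vega notional and a strike volatility."""
    return vega_notional / (2.0 * strike_vol)


def variance_swap_payoff(realized_var, strike_var, var_notional):
    """Equation (1): (realized variance - strike variance) x notional."""
    return (realized_var - strike_var) * var_notional


# Eleven closes growing by 1% in log terms each day: every u_i equals 0.01.
closes = [100.0 * math.exp(0.01 * day) for day in range(11)]
print(realized_variance(closes))            # 252 * 0.01**2 = 0.0252, up to rounding

# Volatility in points: strike 20, realized 21, vega notional 100 000.
notional = variance_notional(100_000.0, 20.0)
print(variance_swap_payoff(21.0 ** 2, 20.0 ** 2, notional))
```

With realized volatility one point above the strike, the payoff comes out slightly above the vega notional, which is the approximation described in the text; the excess is the convexity of variance in volatility.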
Volatility Swaps

Volatility swaps are very similar to variance swaps, both in concept and in application. Like variance swaps, volatility swaps can be used by hedge funds to speculate on volatility movements or by portfolio managers to hedge other products against volatility fluctuations. Since their introduction in 1998, they have seen rapid growth and there is currently a sizable market in both the equity and foreign exchange markets. Volatility swaps are also traded in interest rate and commodity markets.

Technically speaking, a volatility swap is not really a swap, but a forward contract on the realized volatility. At maturity, the buyer receives from the seller the difference between the realized volatility and the fixed strike amount, multiplied by the dollar notional (quoted in dollar per volatility point):

$$\text{Volatility swap} = \text{Notional} \times (\sigma_{\mathrm{realized}} - K) \tag{1}$$

while

$$\text{Variance swap} = \text{Notional} \times (\sigma_{\mathrm{realized}}^2 - K^2) \tag{2}$$

The fixed strike payment is usually referred to as the fixed leg of the swap and the realized volatility is referred to as the floating leg. The swap contract contains the detailed specifications on how the volatility is calculated. Typically, the floating leg is calculated as

$$\sqrt{\text{Annualization} \times \sum_{i=1}^{N} \left(\ln\frac{S_i}{S_{i-1}}\right)^2} \tag{3}$$

where $S_i$ should be adjusted by discrete dividends across dividend payment days. In the most accepted convention, the floating leg is reset and computed daily.

Besides variance swaps, another product closely related to volatility swaps is the VIX contract, which is written as the square root of the sum of expected future variances.

Because variance is the square of volatility, the payoff of a variance swap is convex in volatility while the payoff of a volatility swap is linear in volatility. A volatility swap is thus cheaper than the corresponding variance swap. More specifically, the fair strike for the volatility swap is slightly lower than that of a variance swap. The difference between the two is known as the convexity adjustment and gets larger as volatility of volatility gets larger. The convexity adjustment can be calculated, for example, in the Heston model.

The variance swap is preferred in the equity market due to the fact that it can be replicated with a linear combination of vanilla options and a dynamic position in futures (see Variance Swap; [2]). In other markets, the volatility swap is actually more liquid than the variance swap. Although a position in a variance swap can be replicated, a position in a volatility swap cannot. This means that different models that correctly calibrate to the vanilla option surface will give the same price for variance swaps but not for volatility swaps. In other words, the price of a variance swap is model independent, but the price of a volatility swap is not. In practice, the volatility swap and variance swap admit almost equal pricing for short-term maturities. Recent research also suggests that the model dependence is not as large as is commonly believed and that volatility swaps can also be approximately replicated by trading vanilla options (see Volatility Index Options; Realized Volatility Options; [1]). Newer models have been developed that can price volatility derivatives including volatility swaps, while remaining consistent with the entire volatility surface [3].

References

[1] Carr, P. & Lee, R. (2007). Realised volatility and variance: options via swaps, Risk 20(5), 76–83.
[2] Demeterfi, K., Derman, E., Kamal, M. & Zou, J. (1999). More than You Ever Wanted to Know About Volatility Swaps, Goldman Sachs Quantitative Strategies Research Notes.
[3] Ren, Y., Madan, D. & Qian, M. (2007). Calibrating and pricing with embedded local volatility models, Risk 20(9), 138–143.

Related Articles

Corridor Variance Swap; Realized Volatility Options; Variance Swap; Volatility Index Options; Weighted Variance Swap.

YONG REN
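The convexity adjustment discussed in the Volatility Swaps article can be illustrated with a toy calculation. The sketch below assumes a hypothetical two-scenario pricing distribution for realized volatility and ignores discounting; it is not the Heston calculation mentioned in the text. It compares the expected volatility (the fair volatility-swap strike) with the square root of the expected variance (the fair variance-swap strike quoted as a volatility).

```python
import math


def fair_strikes(vol_scenarios, probabilities):
    """Return (fair volatility strike, fair variance strike quoted as a volatility)."""
    expected_vol = sum(p * v for p, v in zip(probabilities, vol_scenarios))
    expected_var = sum(p * v * v for p, v in zip(probabilities, vol_scenarios))
    return expected_vol, math.sqrt(expected_var)


# Same mean volatility (20 points), low then high volatility of volatility.
for scenarios in ([18.0, 22.0], [10.0, 30.0]):
    k_vol, k_var = fair_strikes(scenarios, [0.5, 0.5])
    print(scenarios, k_vol, k_var, k_var - k_vol)   # last value: convexity adjustment
```

In both cases the volatility strike sits below the variance strike, and the gap widens as the scenarios spread out, which is the qualitative behaviour the article describes.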
Static Hedging

Liquid traded put and call options can be used as hedge instruments for over-the-counter traded products. Barrier options are the most common exotic options, and, for these contracts, static hedging works out particularly well. In the Black–Scholes model there are simple methods (conceptually straightforward and/or closed form) for constructing replicating portfolios that do not require dynamic trading; they are set up at initiation of the barrier option, and liquidated at either knockout or expiry. They are, thus, static hedges. Inspired by Allen and Padovani [1], we describe how to find static hedges for barrier options in the Black–Scholes model in a way that encompasses both Derman's [6] intuitive calendar-spread algorithm and Carr's [4] strike-spread hedges stemming from put–call symmetry.

Construction of Static Hedges

Unless we explicitly say otherwise, we consider a Black–Scholes model throughout this article. This means that the interest rate is constant, and all options are written on some underlying asset S that follows a geometric Brownian motion. A zero-rebate, knockout barrier option is a contract that pays off as a plain vanilla option if S stays within a specified barrier for the whole life of the barrier option, but becomes worthless if the barrier is hit or crossed (see also Barrier Options). Recurrent examples are the down-and-out call and the up-and-out call. The value of a still-alive barrier option is of the form $F(S_t, t)$, where the function F solves the Black–Scholes partial differential equation with 0 as boundary condition along the barrier (see Finite Difference Methods for Barrier Options). This is illustrated in Figure 1, which is useful to keep in mind when the method for constructing static hedges is described in the following.

Construction of Static Hedges

A portfolio of puts and calls that (approximately) replicates the barrier option can be found as the solution to a linear system of equations, and constructing it does not require knowledge/implementation of barrier option valuation formulas. The idea is to match the barrier option's value at expiry and along the barrier.

To illustrate, consider a down-and-out call with strike K, expiry T, and barrier B. (An up-and-out call is treated similarly, except strike-above-the-barrier calls are used as hedge instruments.) Let Put, Call (spot, time | strike, expiry) denote put and call values. Suppose that we have specified a grid of time points $0 = t_0 < t_1 < \ldots < t_n = T$, and n puts with strikes $K_j \le B$ and expiries $T_j \le T$. Find the solution α to

$$A\alpha + u = 0 \tag{1}$$

where A is an n × n matrix with entries $A_{i,j} = \mathrm{Put}(B, t_i \mid K_j, T_j)$ and u is an n-vector with entries $\mathrm{Call}(B, t_i \mid K, T)$. A portfolio with the (K, T)-call and $\alpha_j$ in the $(K_j, T_j)$-put then matches the barrier option's zero value at the $t_i$-points along the barrier, and its expiry payoff above the barrier. So the barrier option is (to a good approximation when the match points, the $t_i$'s, are close) replicated by buying this portfolio at time 0 and selling it either when the barrier option is knocked out (because sample paths are continuous, this can only happen if the barrier is actually hit) or when it expires. In other words, this represents a static hedge. There is freedom of choice regarding strikes and expiries of the hedge instruments. Derman [6] suggests calendar-spread hedging with strikes along the barrier, that is, using $T_j = t_{j-1}$ and $K_j = B$. This makes the A-matrix triangular so that we can solve for the $\alpha_j$'s in one easy-to-explain backward-working pass. Another choice (closely related to Carr's work [4]) is to use strike spreads, that is, $T_j = T$ for all j and $K_j$'s that are different and below the barrier.

Example 1. Table 1 gives a numerical comparison of the performance of different hedge portfolios for three-month barrier options; a typical lifetime of a barrier option in foreign exchange markets. Looking at the results in Table 1 for the down-and-out call, we see the appeal of using options as hedge instruments; very few options are needed in the static hedges to achieve a hedge quality that is several orders of magnitude better than usual dynamic Δ-hedging. The numbers for the up-and-out call demonstrate one problem that static hedging does not immediately solve: the up-and-out call is a reverse or live-out option meaning that the underlying call is in the money when the barrier option knocks out. This
Figure 1 The PDEs for (a) down-and-out and (b) up-and-out call options. In both panels the horizontal axis is time (0 to T) and the vertical axis is the asset level, with the strike K and barrier B marked. F solves the Black–Scholes PDE in the still-alive region (above B in panel (a), below B in panel (b)), with F(B, t) = 0 along the barrier, F(x, T) = (x − K)+ at expiry, and F ≡ 0 in the knocked-out region.

Table 1 Performance of dynamic and static hedge strategies in the Black–Scholes model with 15% volatility and zero interest rate and dividends. The columns show the initial price of the hedge portfolio and the standard deviation of the benchmarked discounted hedge error, that is, the value of hedge portfolio at liquidation minus barrier option payoff relative to the initial value of the barrier option. All static hedges use three options besides the (K, T)-call. The time points for value matching, the ti's, and the expiries for the calendar spreads are from the list (0, 1/12, 2/12, 3/12). The strike spreads use calls with strikes (110, 112, 114) for the up-and-out case, and puts with strikes (90.25 = B²/K, 88.25, 86.25) for the down-and-out case. The Δ-hedge is adjusted daily and all portfolios are continuously monitored.

                            Down-and-out call                  Up-and-out call
                            K = 100, T = 1/4, B = 95           K = 100, T = 1/4, B = 110
Hedge method                Cost      Standard deviation (%)   Cost      Standard deviation (%)
Dynamic; Δ                  2.6964    11                       1.0358    81
Static; strike spreads      2.6964    0                        1.0674    19
Static; calendar spreads    2.6704    1.0                      1.3468    94

discontinuity creates a large gap risk, and hedge quality deteriorates. To alleviate this, a number of regularization techniques have been suggested, for instance [10] using singular value decomposition when solving equation (1).

Beyond Black–Scholes Dynamics

Constructing static hedges by solving linear equations like equation (1) goes well beyond the Black–Scholes model. For constant elasticity of variance (asset volatility $\sigma S_t^{\gamma - 1}$) and local volatility (asset volatility of the form $\sigma(S_t, t)$) models, the system carries over verbatim; the entries of the A-matrix are just calculated with a different formula/method. For jump-diffusion models [2], one needs to extend the grid of match points to space points beyond the barrier, and for stochastic volatility models [8], an extra dimension is needed to match different volatility levels at knockout. By using both strike and calendar spreads, asymptotically perfect static hedges can be found in these two cases. It should be stressed that the static hedges are model and parameter dependent, but experimental and empirical evidence [7, 9] suggests a high degree of robustness to model risk.

Put–Call Symmetry and Static Hedges

In a number of papers [3–5], Peter Carr and coauthors have derived put–call symmetries and shown how they can be used to create static hedges for barrier options. In its basic form [4, page 1167], the put–call symmetry states that in the zero-dividend, zero-interest rate Black–Scholes

model, we have

$$\mathrm{Call}(S_t, t \mid K, T) = (K/S_t) \times \mathrm{Put}(S_t, t \mid S_t^2/K, T) \quad \text{for all } S_t,\ t,\ K, \text{ and } T \tag{2}$$

So a down-and-out call is replicated by buying one strike-$K$ call, selling $K/B$ puts with strike $B^2/K$, liquidating this position the first time that $S_t = B$, and if that does not happen, holding it until the options expire. More general symmetry relations enable one to find static hedges for such contracts as up-and-out calls, barrier options with rebates, lookback options, and double barrier options ([11] is a survey). Those static hedges will typically involve a continuum of plain vanilla options. Put–call symmetries also exist in models with nonzero interest rates and dividends, and more general dynamics than geometric Brownian motion (see [5]). Note that the strike-spread approach from the previous section finds the symmetry-based static hedges without explicit knowledge of closed-form results, and that the perfect replication of the down-and-out call in Table 1, where the strike-$B^2/K$ put is included as a hedge instrument, demonstrates the basic put–call symmetry.

References

[1] Allen, S. & Padovani, O. (2002). Risk management using quasi-static hedging, Economic Notes 31, 277–336.
[2] Andersen, L., Andreasen, J. & Eliezer, D. (2002). Static replication of barrier options: some general results, Journal of Computational Finance 5, 1–25.
[3] Bowie, J. & Carr, P. (1994). Static simplicity, Risk Magazine 7(8), 44–50.
[4] Carr, P., Ellis, K. & Gupta, V. (1998). Static hedging of exotic options, Journal of Finance 53, 1165–1190.
[5] Carr, P. & Lee, R. (2008). Put-call symmetry: extensions and applications, Mathematical Finance forthcoming.
[6] Derman, E., Ergener, D. & Kani, I. (1995). Static options replication, Journal of Derivatives 2, 78–95.
[7] Engelmann, B., Fengler, M., Nalholm, M. & Schwendner, P. (2007). Static versus dynamic hedges: an empirical comparison for barrier options, Review of Derivatives Research 9, 239–264.
[8] Fink, J. (2003). An examination of the effectiveness of static hedging in the presence of stochastic volatility, Journal of Futures Markets 23, 859–890.
[9] Nalholm, M. & Poulsen, R. (2006). Static hedging and model risk for barrier options, Journal of Futures Markets 26, 449–463.
[10] Nalholm, M. & Poulsen, R. (2006). Static hedging of barrier options under general asset dynamics: unification and application, Journal of Derivatives 13, 46–60.
[11] Poulsen, R. (2006). Barrier options and their static hedges: simple derivations and extensions, Quantitative Finance 6, 327–335.

Related Articles

Barrier Options; Finite Difference Methods for Barrier Options; Hedging; Put–Call Parity;

ROLF POULSEN
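The put–call symmetry in equation (2) and the resulting down-and-out call hedge can be checked numerically. The following is a minimal sketch, not the author's implementation: it assumes the zero-rate, zero-dividend Black–Scholes model with the Table 1 parameters (K = 100, B = 95, T = 1/4, 15% volatility) and, as a further assumption not stated in the text, an initial spot of 100. The function names are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(s, k, tau, sigma):
    """Black-Scholes call price with zero interest rate and dividends."""
    if tau <= 0.0:
        return max(s - k, 0.0)
    v = sigma * math.sqrt(tau)
    d1 = math.log(s / k) / v + 0.5 * v
    return s * norm_cdf(d1) - k * norm_cdf(d1 - v)

def bs_put(s, k, tau, sigma):
    """Put price from put-call parity with zero rates: P = C - S + K."""
    return bs_call(s, k, tau, sigma) - s + k

def hedge_value(s, tau, K, B, sigma):
    """Long one strike-K call, short K/B puts with strike B^2/K."""
    return bs_call(s, K, tau, sigma) - (K / B) * bs_put(s, B * B / K, tau, sigma)

K, B, T, sigma = 100.0, 95.0, 0.25, 0.15
S0 = 100.0  # assumed initial spot; not given in the text

cost = hedge_value(S0, T, K, B, sigma)
# Value of the hedge when the spot sits on the barrier, for several times to expiry.
barrier_values = [hedge_value(B, tau, K, B, sigma) for tau in (0.25, 2 / 12, 1 / 12, 0.01)]

print("initial cost of static hedge:", round(cost, 4))
print("hedge value on the barrier:", barrier_values)
```

Under these assumptions the hedge is worth zero (up to rounding) whenever the spot is on the barrier, which is why liquidating at the first hitting time replicates the knock-out, and the initial cost comes out close to the 2.6964 reported for the strike-spread hedge in Table 1.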
Corridor Variance Swap

A corridor variance swap, with corridor $C$, on an underlying $Y$ is a weighted variance swap on $X := \log Y$ (unless otherwise specified), with weight given by the corridor's indicator function:

$$w(y) := \mathbf{1}_{y \in C} \tag{1}$$

For example, one may define an up-variance swap by taking $C = (H, \infty)$ and a down-variance swap by taking $C = (0, H)$, for some agreed $H$.

In practice, the corridor variance swap monitors $Y$ discretely, typically daily, for some number of periods $N$, annualizes by a factor such as $252/N$, and multiplies by notional, for a total payoff

$$\text{Notional} \times \text{Annualization} \times \sum_{n=1}^{N} \mathbf{1}_{Y_n \in C} \left( \log \frac{Y_n}{Y_{n-1}} \right)^2 \tag{2}$$

If the contract makes dividend adjustments (as typical for contracts on single stocks but not on indices), then the term inside the parentheses becomes $\log((Y_n + D_n)/Y_{n-1})$, where $D_n$ denotes the dividend payment, if any, of the $n$th period.

Corridor variance swaps accumulate only the variance that occurs while price is in the corridor. The buyer therefore pays less than the cost of a full variance swap. Among the possible motivations for a volatility investor to accept this trade-off and to buy up (or down) variance are the following. First, the investor may be bullish (bearish) on $Y$. Second, the investor may have the view that the market's downward volatility skew is too steep (flat), making down-variance expensive (cheap) relative to up-variance. Third, the investor may be seeking to hedge a short volatility position that worsens as $Y$ increases (decreases).

Model-free Replication and Valuation

The continuously monitored corridor variance swap admits model-free replication by a static position in options and dynamic trading of shares, under conditions specified in Weighted Variance Swap, which include all positive continuous semimartingale share prices $Y$ under deterministic interest rates and proportional dividends.

Explicitly, one replicates using equation (7) of that article, with $\lambda$ derived in [3]:

$$\lambda(y) = \int_{K \in C} \frac{2}{K^2}\, \mathrm{Van}(y, K)\, \mathrm{d}K \tag{3}$$

where $\mathrm{Van}(y, K) := (K - y)^+ \mathbf{1}_{K < \kappa} + (y - K)^+ \mathbf{1}_{K > \kappa}$ for an arbitrary put/call separator $\kappa$.

Therefore, in the case that the interest rate equals the dividend yield (otherwise, see Weighted Variance Swap), a replicating portfolio statically holds $2/K^2\, \mathrm{d}K$ vanilla calls or puts at each strike $K$ in the corridor $C$. The corridor variance swap model-independently has the same initial value as a claim on the time-$T$ payoff $\lambda(Y_T) - \lambda(Y_0)$. Additionally, the replication strategy trades shares dynamically according to a "zero-vol" delta-hedge, meaning that its share holding equals the negative of what would be the European portfolio's delta under zero volatility.

For corridors of the type $C = (0, H)$ or $C = (H, \infty)$ where $H > 0$, taking $\kappa := H$ in equation (3) yields

$$\lambda(y) = \left( -2 \log(y/H) + 2y/H - 2 \right) \mathbf{1}_{y \in C} \tag{4}$$

This $\lambda$, with $H$ chosen arbitrarily, is also valid for the variance swap $C = (0, \infty)$.

Further Properties

1. For a small interval $C = (a, b)$, the corridor variance swap approximates a contract on local time, in the following sense. Corridor variance satisfies

$$V_T^{(a,b)} := \int_0^T \mathbf{1}_{X_t \in (\log a, \log b)}\, \mathrm{d}[X]_t = \int_{\log a}^{\log b} L_T^x\, \mathrm{d}x \tag{5}$$

by the occupation time formula, where $L_T^x$ denotes (an $x$-cadlag modification of) the local time of $X$. Therefore, at any point $a$,

$$\frac{1}{\log b - \log a}\, V_T^{(a,b)} \longrightarrow L_T^a \quad \text{as } b \downarrow a \tag{6}$$

2. Corridor variance can arise from imperfect replication of variance. The replicating portfolio for

a standard variance swap holds options at all strikes $K \in (0, \infty)$. In practice, not all of those strikes actually trade. If we truncate the portfolio to hold only the strikes in some interval $C$, then the resulting value does not price a full variance swap but rather a $C$-corridor variance swap. (Moreover, in practice not even an interval of strikes actually trade, but rather a finite set, which can replicate instead a strike-to-strike notion of corridor variance, as shown in [1].)

3. In the case $C = (H, \infty)$, where $H > 0$, we rewrite equation (4) as

$$\lambda(y) = \frac{2}{H} (y - H)^+ - 2 (\log y - \log H)^+ \tag{7}$$

Thus, the replicating portfolio is long calls on $Y_T$ and short calls on $\log Y_T$.

Let $F_{X_T}$ be the characteristic function of $X_T = \log Y_T$. Then techniques in [4, 5] price the calls on $Y_T$ and $\log Y_T$, respectively. Specifically, assuming zero interest rates and dividends, we have the following semiexplicit formula for the corridor variance swap's fair strike:

$$\mathbb{E}\lambda(Y_T) - \lambda(Y_0) = \frac{2}{H\pi} \int_{0 - \alpha i}^{\infty - \alpha i} \mathrm{Re}\left[ \frac{F_{X_T}(z - i)}{iz - z^2}\, e^{-iz \log H} \right] \mathrm{d}z + \frac{2}{\pi} \int_{0 - \beta i}^{\infty - \beta i} \mathrm{Re}\left[ \frac{F_{X_T}(z)}{z^2}\, e^{-iz \log H} \right] \mathrm{d}z - \lambda(Y_0) \tag{8}$$

for arbitrary positive $\alpha, \beta$ such that $\alpha + 1, \beta < \sup\{p : \mathbb{E} Y_T^p < \infty\}$, where $\mathbb{E}$ denotes expectation with respect to martingale measure.

In the case $C = (0, \infty)$, equation (4) implies the fair strike formula

$$\mathbb{E}\lambda(Y_T) - \lambda(Y_0) = -2\, \mathbb{E} \log(Y_T / Y_0) = 2i F_{X_T}'(0) + 2 \log Y_0 \tag{9}$$

In the case $C = (H_1, H_2)$, where $0 \le H_1 < H_2$, subtract the formula for $C = (H_2, \infty)$ from the formula for $C = (H_1, \infty)$.

In the case of nonzero interest rates or dividends, add to equation (8) a correction involving payoffs at all expiries in $(0, T)$, as specified in equation (7a) in Weighted Variance Swap, and in equation (9) replace $Y_0$ by the forward price.

4. With discrete monitoring, the question arises, how to define up-variance and down-variance, and in particular how much variance to recognize, given a discrete move that takes $Y$ across $H$. Definition (2) recognizes the full square of each move that ends in the corridor. Alternatively, the contract specifications in [2] treat the movements of $Y$ across $H$ by recognizing a fraction of the squared move. The fraction is defined in a way that admits approximate discrete hedging, in the sense that the time-discretized implementation of the continuous replication strategy has in each period a hedging error of only third order in that period's return.

References

[1] Carr, P. & Lee, R. (2008). From Hyper Options to Variance Swaps, Bloomberg LP, University of Chicago.
[2] Carr, P. & Lewis, K. (2004). Corridor variance swaps, Risk 17(2), 67–72.
[3] Carr, P. & Madan, D. (1998). Towards a theory of volatility trading, in Volatility, R. Jarrow, ed., Risk Publications, pp. 417–427.
[4] Carr, P. & Madan, D. (1999). Option valuation using the fast Fourier transform, Journal of Computational Finance 3, 463–520.
[5] Lee, R. (2004). Option pricing by transform methods: extensions, unification, and error control, Journal of Computational Finance 7(3), 51–86.

Related Articles

Delta Hedging; Gamma Swap; Realized Volatility Options; Variance Swap; Volatility Swaps; Weighted Variance Swap.

ROGER LEE
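The discretely monitored payoff in definition (2) and the option-strip representation of $\lambda$ can be illustrated with a short sketch. It is an illustration under stated assumptions rather than a contract specification: the function names, the sample path, and the midpoint-rule integration are all hypothetical choices, and dividend adjustments are ignored. For the up-corridor $C = (H, \infty)$ with separator $\kappa = H$, the strip in equation (3) contains only calls, and its integral should agree with the closed form in equation (4).

```python
import math

def corridor_variance_payoff(path, lower, upper, notional=1.0, periods_per_year=252):
    """Discrete corridor variance payoff: a squared log return counts
    only when the end-of-period price lies inside (lower, upper)."""
    n_periods = len(path) - 1
    annualization = periods_per_year / n_periods
    total = 0.0
    for prev, cur in zip(path[:-1], path[1:]):
        if lower < cur < upper:
            total += math.log(cur / prev) ** 2
    return notional * annualization * total

def lam_up_closed_form(y, H):
    """lambda(y) for the corridor (H, infinity), closed form."""
    if y <= H:
        return 0.0
    return -2.0 * math.log(y / H) + 2.0 * y / H - 2.0

def lam_up_from_options(y, H, n_steps=20000):
    """Midpoint-rule integral over K > H of (2/K^2) * (y - K)^+ dK.
    The integrand vanishes for K >= y, so integrate over (H, y) only."""
    if y <= H:
        return 0.0
    dk = (y - H) / n_steps
    total = 0.0
    for j in range(n_steps):
        k = H + (j + 0.5) * dk
        total += 2.0 / (k * k) * (y - k) * dk
    return total

sample_path = [100.0, 105.0, 95.0, 110.0]  # hypothetical daily closes
up_variance = corridor_variance_payoff(sample_path, 100.0, math.inf)
print("up-variance payoff:", up_variance)
print("lambda closed form:", lam_up_closed_form(120.0, 100.0))
print("lambda from strip: ", lam_up_from_options(120.0, 100.0))
```

In the sample path only the first and third moves end above $H = 100$, so only those two squared log returns are accumulated; the move down to 95 contributes nothing.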
Gamma Swap

A gamma swap on an underlying $Y$ is a weighted variance swap (see Weighted Variance Swap) on $\log Y$, with weight function

$$w(y) := y / Y_0 \tag{1}$$

In practice, the gamma swap monitors $Y$ discretely, typically daily, for some number of periods $N$, annualizes by a factor such as $252/N$, and multiplies by notional, for a total payoff

$$\text{Notional} \times \text{Annualization} \times \sum_{n=1}^{N} \frac{Y_n}{Y_0} \left( \log \frac{Y_n}{Y_{n-1}} \right)^2 \tag{2}$$

If the contract makes dividend adjustments (as typical for single-stock gamma swaps but not index gamma swaps), then the term inside the parentheses becomes $\log((Y_n + D_n)/Y_{n-1})$, where $D_n$ denotes the dividend payment, if any, of the $n$th period.

Gamma swaps allow investors to acquire variance exposures proportional to the underlying level. One application is dispersion trading of a basket's volatility against its components' single-name volatilities; as a component's value increases, its proportion of the total basket value also increases and, hence, so does the desired volatility exposure of the single-name contract. This variable exposure to volatility is provided by gamma swaps, according to point 1 of the "Further Properties" below. A second application is to trade the volatility skew; for example, to express a view that the skew slopes too steeply downward, the investor can go long a gamma swap and short a variance swap, to create a weighting $y/Y_0 - 1$, which is short downside variance and long upside variance. A third application is to trade single-stock variance without the caps often embedded in variance swaps to protect the seller from crash risk; in a gamma swap, the weighting inherently dampens the downside variance, so caps are typically regarded as unnecessary.

Model-free Replication and Valuation

The continuously monitored gamma swap admits model-free replication by a static position in options and dynamic trading of shares, under conditions specified in Weighted Variance Swap, which include all positive continuous semimartingale share prices $Y$ under deterministic interest rates and proportional dividends.

Explicitly, one replicates by using equation (7) of the above article, with

$$\lambda(y) = \frac{2}{Y_0} \left[ y \log(y/\kappa) - y + \kappa \right] = \int_0^\infty \frac{2}{Y_0 K}\, \mathrm{Van}(y, K)\, \mathrm{d}K \tag{3}$$

where $\mathrm{Van}(y, K) := (K - y)^+ \mathbf{1}_{K < \kappa} + (y - K)^+ \mathbf{1}_{K > \kappa}$ for an arbitrary put/call separator $\kappa$. Forms of this payoff were derived in, for instance, [2, 3].

Therefore, in the case that the interest rate equals the dividend yield (otherwise, see the weighted variance swap article), a replicating portfolio statically holds $2/(Y_0 K)\, \mathrm{d}K$ vanilla calls or puts at each strike $K$. The gamma swap model-independently has the same initial value as a claim on the time-$T$ payoff $\lambda(Y_T) - \lambda(Y_0)$. Additionally, the replication strategy trades shares dynamically according to a "zero-vol" delta-hedge, meaning that its share holding equals the negative of what would be the European portfolio's delta under zero volatility.

Further Properties

Points 2–5 follow from equation (3). Point 1 uses only the definition (1).

1. For an index $Y_t := \sum_{j=1}^{J} \theta_j Y_{j,t}$, let $\alpha_{j,t} := \theta_j Y_{j,t} / Y_t$ be the fraction of total index value due to the quantity $\theta_j$ of the $j$th component $Y_{j,t}$. Define the cumulative dispersion $D_t$ by

$$\mathrm{d}D_t = \sum_{j=1}^{J} \alpha_{j,t}\, \mathrm{d}[\log Y_j]_t - \mathrm{d}[\log Y]_t \tag{4}$$

Going long $\alpha_{j,0}$ gamma swaps (non-dividend-adjusted) on each $Y_j$ and short a gamma swap on $Y$ creates the payoff

$$\sum_{j=1}^{J} \alpha_{j,0} \int_0^T \frac{Y_{j,t}}{Y_{j,0}}\, \mathrm{d}[\log Y_j]_t - \int_0^T \frac{Y_t}{Y_0}\, \mathrm{d}[\log Y]_t = \int_0^T \frac{Y_t}{Y_0}\, \mathrm{d}D_t \tag{5}$$

as noted in [2]. Hence, a static combination of gamma swaps produces cumulative index-weighted dispersion.

2. By Corollary 2.7 in [1], if the implied volatility smile is symmetric in log-moneyness, and the dividend yield equals the interest rate ($q_t = r_t$), and there are no discrete dividends, then a gamma swap has the same value as a variance swap.

3. Assuming that $Y_T = Y_t R_{t,T}$ for all $t$, where the time-$t$ conditional distribution of each $R_{t,T}$ does not depend on $Y_t$, the gamma swap has time-$t$ gamma equal to a discounting/dividend-dependent factor times

$$\frac{2}{Y_0} \left. \frac{\partial^2}{\partial y^2} \right|_{y = Y_t} \mathbb{E}_t\, y R_{t,T} \log(y R_{t,T}) = \frac{2\, \mathbb{E}_t R_{t,T}}{Y_0 Y_t} \tag{6}$$

where $\mathbb{E}$ denotes expectation with respect to martingale measure. Therefore, the share gamma, defined to be $Y_t$ times the gamma, does not depend on $Y_t$. This property motivates the term gamma swap.

4. Within the family of weight functions proportional to $w(y) = y^n$, the gamma swap takes $n = 1$. In that sense, the gamma swap is intermediate between the usual logarithmic variance swap (which takes $n = 0$) and an arithmetic variance swap (which, in effect, takes $n = 2$). Expressed in terms of put and call holdings, the replicating portfolios in these three cases hold, at each strike $K$, a quantity proportional to $K^{n-2}$. The gamma swap $O(1/K)$ is intermediate between logarithmic variance $O(1/K^2)$ and arithmetic variance $O(1)$.

5. Let $F$ be the characteristic function of $\log Y_T$. If $\mathbb{E} Y_T^p < \infty$ for some $p > 1$ then

$$\mathbb{E}\, Y_T \log Y_T = -i F'(-i) \tag{7}$$

Gamma swap valuations are therefore directly computable in continuous models for which $F$ is known, such as the Heston model (see Heston Model).

References

[1] Carr, P. & Lee, R. Put–call symmetry: extensions and applications, Mathematical Finance, Forthcoming.
[2] Mougeot, N. (2005). Variance Swaps and Beyond, BNP Paribas.
[3] Overhaus, M., Bermúdez, A., Buehler, H., Ferraris, A., Jordinson, C. & Lamnouar, A. (2007). Equity Hybrid Derivatives, John Wiley & Sons.

Related Articles

Corridor Variance Swap; Delta Hedging; Realized Volatility Options; Variance Swap; Volatility Swaps; Weighted Variance Swap.

ROGER LEE
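The discrete payoff (2) and the characteristic-function identity (7) lend themselves to a small numerical check. The sketch below is illustrative only and rests on assumptions that are not part of the article: a Black–Scholes model with zero rates and dividends (so that $\log Y_T$ is normal), $\kappa = Y_0$ in equation (3), a central finite difference for $F'$, and hypothetical function names. Under those assumptions the claim on $\lambda(Y_T) - \lambda(Y_0)$ is worth $(2/Y_0)(\mathbb{E}\, Y_T \log Y_T - Y_0 \log Y_0)$, which reduces to $\sigma^2 T$.

```python
import cmath
import math

def gamma_swap_payoff(path, notional=1.0, periods_per_year=252):
    """Discrete gamma swap payoff: squared log returns weighted by Y_n / Y_0."""
    y0 = path[0]
    n_periods = len(path) - 1
    total = sum((cur / y0) * math.log(cur / prev) ** 2
                for prev, cur in zip(path[:-1], path[1:]))
    return notional * (periods_per_year / n_periods) * total

def char_fn_normal(z, mean, var):
    """E exp(i z X) for X normal with the given mean and variance."""
    return cmath.exp(1j * z * mean - 0.5 * z * z * var)

def expected_y_log_y(char_fn, h=1e-5):
    """E[Y_T log Y_T] = -i F'(-i), with F' from a central difference."""
    z0 = -1j
    deriv = (char_fn(z0 + h) - char_fn(z0 - h)) / (2.0 * h)
    return (-1j * deriv).real

# Assumed Black-Scholes inputs with zero rates: X_T = log Y_T is normal.
Y0, sigma, T = 100.0, 0.2, 1.0
mean = math.log(Y0) - 0.5 * sigma ** 2 * T
var = sigma ** 2 * T

ey_log_y = expected_y_log_y(lambda z: char_fn_normal(z, mean, var))
closed_form = (mean + var) * math.exp(mean + 0.5 * var)

# Value of the claim lambda(Y_T) - lambda(Y_0) with kappa = Y0 and E[Y_T] = Y0.
fair_value = (2.0 / Y0) * (ey_log_y - Y0 * math.log(Y0))

sample_payoff = gamma_swap_payoff([100.0, 110.0, 99.0])  # hypothetical path
print(ey_log_y, closed_form, fair_value, sample_payoff)
```

The same `expected_y_log_y` helper accepts any characteristic function, so a model with known $F$ could be substituted for the normal case used here.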
Atlas Option

In the late 1990s, Societe Generale introduced a series of options on baskets of assets which are now commonly referred to as "Mountain Range" options [2]. They were introduced in part to replicate certain portfolio strategies and in part to extend single-name options to portfolios. What these options share is a strong dependence on the correlation structure of the assets, brought about by their nonlinear and path-dependent payoffs. But beyond this similarity, each type has its own distinct payoff tailored to its own risk profile and usage, making each deserving of a study of its own. In a series of three articles, we look at the three commonly traded types of Mountain Range options: the Atlas option, the Himalayan option, and the Altiplano option.

We start with the Atlas option, which, being the only non-path-dependent option in this group, is somewhat easier to analyze than the other two. This article is organized as follows. We first provide a description of the Atlas option and discuss the financial motivations for and strategies of its usage. We then discuss modeling, valuation, and risk issues that Atlas options share with all Mountain Range options, and conclude with a brief analysis of the risk profile that is unique to it. We remark that although the following discussion holds for a wide class of assets, such as foreign exchange (FX) and commodities, these options are traded mostly on baskets of stocks.

Contract Description

The payoff of the Atlas option is simply a call (or a put) option on the performance of a portfolio at maturity with the best and worst performing names removed. More precisely, given a portfolio, or basket of $n$ stocks, let $S_i(t)$ be the price of the stock $i = 1, \ldots, n$ at time $t$, with 0 being the start of the option and $T$ its maturity. Furthermore, assume that the indices are such that $S_1(T)/S_1(0), \ldots, S_n(T)/S_n(0)$, that is, the performance of the stocks, is in increasing order. Given a strike $K$, the number of underperforming assets $w$ and outperforming ones $b$ to be removed, the payoff of the Atlas option is

$$\max\left( \frac{1}{n - (w + b)} \sum_{i = 1 + w}^{n - b} \frac{S_i(T)}{S_i(0)} - K,\ 0 \right) \tag{1}$$

with the obvious condition that $b + w < n$.

If $b < w$, with $b$ possibly 0, or if $b > w$ with $w$ possibly 0, then this option becomes the best of or worst of option, respectively. On the other hand, if $b = w$, with equal number of underperforming and outperforming stocks removed, this option becomes, in effect, a "middle of the road" or an "average of averages" option. By removing the outliers, we are removing extreme risk and lowering the premium, while making it more favorable to risk-averse customers. For example, it can provide protection against defaults for the price of missing out on top performers.

Modeling

In the simplest of implementations, as in single-name options, the asset price processes are modeled as lognormal processes but with a correlation matrix. In more advanced implementations, to account for the volatility smile, some versions of stochastic volatility models are often used. One may even model some form of default component. However, in these more complex models, the modeling of correlation and its estimation become more complex as well.

Valuation and Risk

The number of assets in Mountain Range options generally ranges from a low of 4 or 5 to a high of about 20. Owing to their complex payoff and path-dependency, idiosyncratic characteristics of each asset need to be taken into account. Hence one cannot assume homogeneity of assets for either small or large baskets, making any closed-form approximation (especially in light of path dependence) intractable. Consequently, Mountain Range options, even the non-path-dependent Atlas options, are calculated using Monte Carlo simulation [1]. Monte Carlo methods, especially for high-dimensional payoffs with a large number of assets, are slow to converge, and usually one or more variance-reduction techniques are employed. This problem is exacerbated further when calculating first- and second-order Greeks.

But even in the simple lognormal model, the sheer size of the correlation matrix can become a challenge. Since for $n$ assets there can be $n(n - 1)/2$ distinct correlations, even for a modest basket of 10 assets, 45 different correlations are possible. Moreover, it is not clear how one can obtain the correlation numbers themselves. If, theoretically speaking, there existed $n(n - 1)/2$ traded spread options, one on each pair, their implied correlations could be used with the spread options as hedges. However, it is unlikely that every pair of assets in a basket would have a traded spread option. Even if they did, their sheer number would make transaction costs prohibitive, even for moderate bid–ask spreads. Hence historical correlations are more often used, even though, as with all historical estimates, they are hard to hedge and can change with macro- and microeconomic shifts. When all assets belong to the same sector, a single correlation number is commonly used. This high amount of asset interdependence makes cross-gammas (see Gamma Hedging) important, adding further to the hedging complexity.

Risk Profile

When $w = b = 0$, the Atlas option is simply a call option on the average performance of a basket. As in a vanilla call on a single stock, the higher the volatility, the higher the price. But the analysis gets more interesting when we start removing good and bad performers at maturity.

For simplicity, we look at a homogeneous portfolio with identical pairwise correlations given by a single number in a simple lognormal model. For very high correlations, the basket behaves as a single asset, so removing assets has a small effect on the option payoff. Since it behaves as a single asset, as in single-asset calls, the option's payoff generally increases with volatility.

For low correlations, on the other hand, the basket has a high dispersion at maturity (on average, a few stocks will have a high price and the rest low), and the higher the volatility, the higher the dispersion. Since the expectation of the sum of asset prices at maturity is independent of both volatility and correlation, having a few high asset prices implies that many others are very low in most paths. But it is precisely these high-contribution assets that are removed from the basket, leaving the basket with low-priced assets, thus reducing the price of the option. So for low correlation, with $b \neq 0$, increasing volatility does not necessarily increase the price. We remind the reader that this simple analysis applies to a homogeneous basket. Individual volatilities, dividends, and a nonconstant correlation can affect the payoff in ways not always easily explained.

References

[1] Glasserman, P. (2004). Monte Carlo Methods in Financial Engineering, Springer, New York.
[2] Mountain Range Options, document downloadable from global-derivatives.com

Related Articles

Altiplano Option; Basket Options; Correlation Risk; Himalayan Option.

REZA K. GHARAVI
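The Atlas payoff and its Monte Carlo valuation in the homogeneous lognormal setting of the risk-profile discussion can be sketched as follows. This is a rough illustration, not a production pricer: it assumes zero interest rates and dividends, a single volatility, and a single pairwise correlation generated by a one-factor Gaussian construction; it uses no variance reduction; and all function and parameter names are hypothetical.

```python
import math
import random

def atlas_payoff(performances, strike, n_worst, n_best):
    """Call on the average performance after dropping the n_worst lowest
    and n_best highest performances S_i(T)/S_i(0)."""
    n = len(performances)
    if n_worst + n_best >= n:
        raise ValueError("need n_worst + n_best < number of assets")
    kept = sorted(performances)[n_worst:n - n_best]
    return max(sum(kept) / len(kept) - strike, 0.0)

def atlas_mc_price(n_assets, vol, rho, maturity, strike,
                   n_worst, n_best, n_paths=20000, seed=0):
    """Plain Monte Carlo price under zero rates. Each asset's log return is
    driven by a common factor and an idiosyncratic factor, which gives
    every pair of assets the same correlation rho (0 <= rho <= 1)."""
    rng = random.Random(seed)
    drift = -0.5 * vol * vol * maturity
    scale = vol * math.sqrt(maturity)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    total = 0.0
    for _ in range(n_paths):
        common = rng.gauss(0.0, 1.0)
        perfs = [math.exp(drift + scale * (a * common + b * rng.gauss(0.0, 1.0)))
                 for _ in range(n_assets)]
        total += atlas_payoff(perfs, strike, n_worst, n_best)
    return total / n_paths

# Full basket versus basket with the single best performer removed.
full_basket = atlas_mc_price(5, 0.2, 0.5, 1.0, 1.0, 0, 0)
best_removed = atlas_mc_price(5, 0.2, 0.5, 1.0, 1.0, 0, 1)
print("w = 0, b = 0:", full_basket)
print("w = 0, b = 1:", best_removed)
```

Because both calls use the same seed, they are evaluated on identical simulated paths, so the comparison between the two prices is not blurred by sampling noise. Re-running with different `vol` and `rho` values is one way to explore the volatility and correlation effects described above.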
Himalayan Option

In the late 1990s, Societe Generale introduced a series of options on baskets of assets that are now commonly referred to as Mountain Range options [2]. They were introduced in part to replicate certain portfolio strategies and in part to extend single-name options to portfolios. What these options share is a strong dependence on the correlation structure of the assets, brought about by their nonlinear and path-dependent payoffs. But beyond this similarity, each type has its own distinct payoff tailored to its own risk profile and usage, making each deserving of a study of its own. In this second of three articles on Mountain Range options, we look at the Himalayan option. Unlike Atlas options, Himalayans are path dependent, and are usually longer dated than Atlas options.

This article is organized as follows. We first provide a description of the Himalayan option and discuss the financial motivations for and strategies of its usage. We then discuss modeling, valuation, and risk issues that Himalayan options share with all Mountain Range options, and conclude with a brief analysis of the risk profile that is unique to it. We remark that although the following discussion holds for a wide class of assets, such as foreign exchange (FX) and commodities, these options are traded mostly on baskets of stocks.

Contract Description

The payoff of the Himalayan option is best described in words. Given $n$ assets and contractual times $T_1, \ldots, T_n$ (usually yearly), at the first time $T_1$ we take the best performing stock, record its performance, and then remove it from the basket. We continue until at maturity $T_n$ we are left with the last stock. The payoff of the option is the sum of the performances on these $n$ contractual times. In some variations, the top two or three are removed at each time.

What this option emulates is the greedy strategy of liquidating a portfolio: the best performing stocks are sold first. Since this is a derivative product, Himalayan options may offer regulatory and tax benefits compared to actual holding of a portfolio.

Modeling

As in single-name options, in the simplest of implementations, the asset price processes are modeled with lognormal processes endowed with a correlation matrix. In more sophisticated implementations, to account for the volatility smile, some versions of stochastic volatility models are often used. One may even include some form of default component in the asset price. However, in these more complex models, the modeling of correlation and its estimation become more complex as well.

Valuation and Risk

The number of assets in Mountain Range options generally ranges from a low of 4 or 5 to a high of about 20. Owing to their complex payoff and path dependency, idiosyncratic characteristics of each asset need to be taken into account. Hence one cannot assume homogeneity of assets for either small or large baskets, making any closed-form approximation (especially in light of path dependence) intractable. As a result, Mountain Range options, especially the path-dependent varieties such as the Himalayan, are calculated using Monte Carlo simulation [1]. In light of their high-dimensional payoffs due to the large number of assets and time points, to speed up convergence, especially for first- and second-order Greeks, usually one or more variance-reduction techniques are employed.

The other challenge posed by these options is the correlation. Even in simple lognormal models, the sheer size of the correlation matrix can become a challenge. Since for $n$ assets there can be $n(n - 1)/2$ distinct correlations, even for a modest basket of 10 assets, 45 distinct correlations are possible. Moreover, obtaining the pairwise correlations themselves is not straightforward. If, theoretically speaking, there existed $n(n - 1)/2$ traded spread options, one on each pair, their implied correlations could be used with the spread options as hedges. However, it is unlikely that every pair of assets in a basket would have a traded spread option. Even if they did, their sheer number would make transaction costs prohibitive, even for moderate bid–ask spreads. Hence historical correlations are more often used, even though, as with all historical estimates, they are hard to hedge

and can change with macro- and microeconomic shifts. When all assets belong to the same sector, a single correlation number is commonly used. This high amount of asset interdependence makes cross-gammas important, adding further to the hedging complexity.

Risk Profile

Payoffs for Himalayan options can be surprising. For simplicity, we look at a homogeneous portfolio with identical pairwise correlations given by a single number in the simple lognormal model. We look at the effects of removing the best performing assets compared to removing assets in a predetermined order. For high correlations, since basket assets move together, the effect of the "greediness" is small: removing the best performing assets leaves the portfolio with similarly performing assets. So compared to removal by some predetermined order, its effects are small. Increasing the volatility increases the dispersion, and as we next see for the case of low-correlation baskets, it may adversely impact the value of the option.

Assets with low correlation, on the other hand, become more dispersed as time passes. Since the expected sum of assets at termination of the option is independent of correlation, removing the best performing asset would in most scenarios leave the worse performing assets, and since no asset is given a chance to grow (it would get removed in the next cut), we are left with increasingly worse performing assets. Thus greediness would actually be a bad strategy compared to some predetermined asset-removal order. Increasing volatility increases this dispersion, further reducing the payoff of the Himalayan. We remind the reader that this simple analysis applies to a homogeneous basket. Individual volatilities, dividends, and a nonconstant correlation can affect the payoff in ways not always easily explained.

References

[1] Glasserman, P. (2004). Monte Carlo Methods in Financial Engineering, Springer, New York.
[2] Mountain Range Options, document downloadable from global-derivatives.com.

Related Articles

Altiplano Option; Atlas Option; Basket Options; Correlation Risk.

REZA K. GHARAVI
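The verbal payoff description, and the comparison between greedy removal and removal in a predetermined order, can be made concrete with a small sketch. The conventions here are assumptions made for illustration only: performance is taken to be the ratio $S_i(T_k)/S_i(0)$, exactly one asset is removed per date, the number of dates equals the number of assets, and the function names and sample numbers are hypothetical.

```python
def himalayan_payoff(performance):
    """performance[k][i] is the performance of asset i at date T_{k+1}.
    At each date the best remaining asset is recorded and removed."""
    remaining = set(range(len(performance[0])))
    total = 0.0
    for row in performance:
        best = max(remaining, key=lambda i: row[i])
        total += row[best]
        remaining.remove(best)
    return total

def fixed_order_payoff(performance, order):
    """Same bookkeeping, but asset order[k] is removed at date T_{k+1}
    regardless of how it has performed."""
    return sum(row[i] for row, i in zip(performance, order))

# Hypothetical performances of three assets on three dates.
sample = [
    [1.2, 0.9, 1.0],   # T1: asset 0 is best and is removed
    [1.5, 1.1, 0.8],   # T2: asset 1 is the best of the remaining two
    [2.0, 1.0, 1.3],   # T3: only asset 2 is left
]

greedy = himalayan_payoff(sample)
fixed = fixed_order_payoff(sample, [2, 1, 0])
print("greedy removal:       ", greedy)
print("predetermined removal:", fixed)
```

In this example the greedy rule sells asset 0 early and forgoes its later growth, so the predetermined order ends with the larger sum, in line with the low-correlation discussion above. Feeding simulated performance matrices into `himalayan_payoff` and averaging would give a plain Monte Carlo estimate.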
Altiplano Option

In the late 1990s, Societe Generale introduced a series of options on baskets of assets that are now commonly referred to as Mountain Range options [2]. They were introduced in part to replicate certain portfolio strategies and in part to extend single-name options to portfolios. What these options share is a strong dependence on the correlation structure of the assets, brought about by their nonlinear and path-dependent payoffs. But beyond this similarity, each type has its own distinct payoff tailored to its own risk profile and use, making each deserving of its own study. In this last of three articles on Mountain Range options, we look at the Altiplano option, which can be thought of as an extension of barrier options to baskets.

This article is organized as follows. We first provide a description of the Altiplano option and discuss the financial motivations for and strategies of its usage. We then discuss modeling, valuation, and risk issues that Altiplano options share with all Mountain Range options, and conclude with a brief analysis of the risk profile that is unique to it. We remark that although the following discussion holds for a wide class of assets, such as foreign exchange (FX) and commodities, these options are traded mostly on baskets of stocks.

Contract Description

Like single-name barrier options, Altiplano options have two components: a vanilla-type payoff if a barrier event occurs, and a coupon payoff if it does not. Usually the barrier event is to have at least one stock reach a predetermined barrier.

More precisely, given a portfolio, or a basket of n stocks, let S_i(t) be the price of stock i = 1, ..., n at time t, with 0 being the start of the option and T its maturity. The payoff is

$$\max\left(\sum_{i=1}^{n} \frac{S_i(T)}{S_i(0)} - K,\ 0\right) I_B + (1 - I_B)\,C \qquad (1)$$

where I_B is 1 if a barrier event in relation to barrier level B occurs, and is 0 if it does not, in which case the option would pay a coupon C. Here we have used a call option, but, in principle, we could use any option. As in single-name barriers, there are many possible wrinkles for the barrier event, such as the Parisian type (see Parisian Option). But unlike a simple extension from the single-name case where the barrier is triggered by the sum of the portfolio, individual assets can trigger the barriers by themselves. In the example above, all it takes is for one asset to activate a barrier, independently of the level of other assets at the time. This makes the Altiplano sensitive to individual asset moves, rather than the collective sum.

As in single-name barriers, since I_B is at most 1, the payoff, and thus the risk, are lower than for standard options on baskets, which makes their premiums lower as well (assuming C is small or zero). The lower premium makes it more attractive when used as a hedge, for example.

Modeling

In the simplest of implementations, as in single-name options, the asset price processes are modeled as lognormal processes but with a correlation matrix. In more advanced implementations, to account for the volatility smile, some versions of stochastic volatility models are often used. One may even include some form of default component. However, in these more complex models, the modeling of correlation and its estimation become more complex as well.

Valuation and Risk

The number of assets in Mountain Range options generally ranges from a low of 4 or 5 to a high of about 20. Owing to their complex payoff and path-dependency, idiosyncratic characteristics of each asset need to be taken into account. Hence one cannot assume homogeneity of assets for either small or large baskets, making any closed-form approximation (especially in light of path dependence) intractable. As a result, Mountain Range options, especially the path-dependent varieties such as the Altiplano, are valued using Monte Carlo simulation [1]. Monte Carlo methods, especially for high-dimensional payoffs with a large number of assets and time points, are slow to converge, and usually one or more variance-reduction techniques are employed. Additionally, since the barrier event is binary, the number of simulation paths needed is even greater than those with
continuous payoffs, making first- and second-order Greeks calculations even more noisy. This makes use of variance-reduction methods even more critical.

The other challenge posed by these options is the correlation. Even in the simple lognormal model, the sheer size of the correlation matrix can become a challenge. Since for n assets, there can be n(n − 1)/2 distinct correlations, even for a modest basket of 10 assets, 45 distinct correlations are possible. Moreover, it is not clear how one can obtain the pairwise correlations themselves. If, theoretically speaking, there existed n(n − 1)/2 traded spread options, one on each pair, their implied correlations could be used with the spread options as hedges. However, it is unlikely that every pair of assets in a basket would have a traded spread option. Even if they did, their sheer number would make transaction costs prohibitive, even for moderate bid–ask spreads. Hence historical correlations are more often used, even though, as with all historical estimates, they are hard to hedge and can change with macro- and microeconomic shifts. When all assets belong to the same sector, a single correlation number is commonly used. This high amount of asset interdependence makes cross-gammas important, adding further to the hedging complexity.

Risk Profile

As in single-asset barrier options, the payoff of an Altiplano option is determined by two competing terms: the option term and the barrier term. Again, for ease of analysis, we look at a homogeneous portfolio with identical pairwise correlations given by a single number in the simple lognormal model with no coupon payout. For a simple call option on a basket, it is known that high correlation and volatility increase its price. For the barrier event, it depends on the type. If it takes only one asset to hit a barrier, then low correlation and high volatility increase the probability of hitting it. Moreover, depending on the barrier, the paths that lead to higher option prices may make the barrier event more or less likely, leading to possible nonmonotonic behavior. So even in this simple homogeneous case with a single correlation, the behavior is rather complex.

References

[1] Glasserman, P. (2004). Monte Carlo Methods in Financial Engineering, Springer, New York.
[2] Mountain Range Options, document downloadable from global-derivatives.com.

Related Articles

Atlas Option; Basket Options; Correlation Risk; Himalayan Option.

REZA K. GHARAVI
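As a rough illustration of the Altiplano Option article's equation (1) and its Monte Carlo valuation, the following sketch prices the payoff in the simple homogeneous lognormal model with a single pairwise correlation. Several choices here are assumptions made only for illustration and are not taken from the article: the barrier is an up-barrier expressed as a multiple of each stock's initial price, it is monitored only at the simulated dates, the correlation is generated by one shared Gaussian factor (which requires 0 ≤ rho ≤ 1), and all function names and parameter values are hypothetical. No variance reduction is applied.

```python
import math
import random


def altiplano_payoff(perf, strike, barrier_hit, coupon):
    """Equation (1), with perf[i] = S_i(T) / S_i(0)."""
    if barrier_hit:
        return max(sum(perf) - strike, 0.0)
    return coupon


def price_altiplano(n_assets=5, strike=5.0, barrier=1.3, coupon=0.1,
                    vol=0.25, rho=0.3, rate=0.03, maturity=1.0,
                    n_steps=52, n_paths=5000, seed=0):
    """Plain Monte Carlo estimate; every default value is illustrative."""
    rng = random.Random(seed)
    dt = maturity / n_steps
    drift = (rate - 0.5 * vol * vol) * dt
    diffusion = vol * math.sqrt(dt)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    total = 0.0
    for _ in range(n_paths):
        perf = [1.0] * n_assets
        hit = False
        for _ in range(n_steps):
            common = rng.gauss(0.0, 1.0)  # shared factor gives pairwise correlation rho
            for i in range(n_assets):
                z = a * common + b * rng.gauss(0.0, 1.0)
                perf[i] *= math.exp(drift + diffusion * z)
                if perf[i] >= barrier:  # assumed: one asset alone triggers the event
                    hit = True
        total += altiplano_payoff(perf, strike, hit, coupon)
    return math.exp(-rate * maturity) * total / n_paths
```

Calling `price_altiplano(rho=0.1)` and `price_altiplano(rho=0.9)` with the same seed gives a quick, noisy feel for the competing correlation effects discussed under Risk Profile; the binary barrier term means many paths are needed before such comparisons stabilize.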
Constant Proportion Portfolio Insurance

Portfolio insurance is a dynamic management technique that aims at giving the investor the ability to limit the downside risk while allowing some participation in the upside market. Option-based portfolio insurance combines a position in the risky asset with a put option on the asset to achieve this goal. In many cases, options on a portfolio may not be available. Constant proportion portfolio insurance (CPPI) is an alternative to that approach. CPPI was first introduced by Perold [4] for fixed-income instruments and by Black and Jones [2] for equity instruments.

CPPI utilizes a rule-based strategy to allocate assets dynamically over time. It involves maintaining a dynamic mix of a “riskless” asset (usually treasury bills or liquid money market instruments) and the risky asset, usually a market index. In the case of having more than one risky asset, an index is formed, which would be treated as a single risky asset. The weights in the index do not change during the life of the trade.

The strategy is based on the notion of cushion, which is the difference between the current portfolio value and the guaranteed level, called the bond floor. Obviously, the initial floor F_0 is less than the initial portfolio value V_0. Define C_t to be the time-t value of the cushion, that is

$$C_t = V_t - F_t \qquad (1)$$

The final payoff at maturity is the maximum of these two quantities: (i) the value of the portfolio at maturity and (ii) the guaranteed level. In a nutshell, CPPI is a path-dependent, self-financing capital guarantee structured product that has a final payoff linked to the performance of a pool of assets.

Throughout the existence of the contract, an amount of wealth is invested into the risky asset. This amount, called the exposure, is proportional to the cushion and is calculated by multiplying the cushion by a predetermined multiple m:

$$e_t = m C_t \qquad (2)$$

The remainder of wealth is allocated into the riskless asset. Trivially, the higher the multiple, the more the holder will participate in rising markets and the faster the portfolio value approaches the bond floor in downturn markets.

Both the bond floor and the multiple are specified in the contract and indicate the investor’s appetite for risk. Assuming a Black–Scholes framework for the risky asset S_t

$$dS_t = \mu S_t\,dt + \sigma S_t\,dW_t \qquad (3)$$

and assuming the floor evolves as follows:

$$dF_t = r F_t\,dt \qquad (4)$$

the value of the portfolio at time t, as shown in [5] and [3], is

$$V_t(S_t, m) = F_0 \exp(rt) + \alpha_t S_t^m \qquad (5)$$

where α_t = (C_0 / S_0^m) exp(βt) and β is given by

$$\beta = r - m\left(r - \frac{1}{2}\sigma^2\right) - m^2\,\frac{\sigma^2}{2} \qquad (6)$$

In periods of negative performance, a specified amount of the risky asset, according to a predetermined asset allocation formula, is liquidated and used to purchase riskless assets. On the other hand, when the market goes up, a specified amount of riskless assets, according to the formula, is liquidated and the proceeds are used to purchase the risky asset. The provider undertakes the risk of managing the pool of assets (both risk-free and risky assets).

For risk managing of the pool in CPPI, the provider receives an annual fee. The fee is specified in the contract in one of the following ways: (i) as a fixed percentage of the initial notional per annum, (ii) as a fixed percentage of the value of the portfolio per annum (path dependent), and (iii) as a fixed percentage of the value of the equity held in the pool per annum (path dependent). The first two are more common than the third one.

The provider also receives the potential value of dividends on the equity. This amount is also path dependent.

If there is more than one risky asset in the pool, they would be treated like a basket in a basket option, so that in the event of rebalancing the collection of risky assets is treated like a single underlying.

Rebalancing Procedure

The asset allocation formula is a part of the contract. The terms in the formula are negotiated between the
provider and the counterparty before entering into the transaction.

A feature is built into the allocation formula in order to avoid constant rebalancing. Rebalancing occurs where the difference between the theoretical equity exposure from the formula and the actual equity allocation is greater than a predefined number of percentage points specified in the contract.

In the case that it is triggered, the provider sells/purchases an amount of equity and purchases/sells risk-free assets to make the actual ratio equal to the number generated by the formula. The timetable for rebalancing is up to the investor, with monthly or quarterly rebalancing being often cited.

Gap Risk

The risk that the value of the portfolio is less than or equal to the bond floor is called the gap risk.

If there is no drastic jump in the value of the risky asset for the life of the trade, then there is no need for injection of money for rebalancing of the pool. Therefore the downfall is lower than the gap risk and the value of the pool would be above the bond floor. In that case, hedging the CPPI trade would be risk free.

There are cases in which the value of the pool may go under the bond floor. Either the provider cannot liquidate the risky asset due to illiquidity or the value of the equity asset has dropped so much that proceeds are not sufficient to maintain the value of the portfolio above the floor; that is, the market simply drops by more than the gap risk before a rebalancing can be undertaken. In either case, the provider must make up the shortfalls. Whenever the portfolio value reaches a given floor, the investor receives a given amount. At this point, the entire pool consists of 100% exposure to the riskless asset. The gap risk is presented as basis points per annum.

Modeling Gap Risk

Modeling the gap risk is the main concern in CPPI. It is analogous to a string of one-day out-of-the-money put options.

Using the standard Black–Scholes formula would not be ideal for the following two reasons: (i) lack of volatility information on such deep out-of-the-money options and (ii) fat-tailed behaviors observed in stock returns distributions.

In [1], the authors apply extreme value theory to determine the multiple. A quantile hedging approach is introduced, which provides an upper bound on the multiple. This bound is statistically estimated from the behavior of extreme variations in rates of asset returns. The authors also introduce the distributions of interarrival times of these extreme movements and show their impacts on CPPI.

In [5] the authors analyze the cost of the guarantee and the performance of a portfolio based on such a strategy. They provide two extensions. The first is based on Lévy processes, which introduce jumps into the dynamics of the underlying asset. The second deals with insurance against all hitting times of a modified floor.

In [3], Cont and Tankov study the behavior of CPPI strategies in models where the price of the underlying portfolio may experience downward jumps. That allows them to quantify the gap risk while maintaining the analytical tractability of the continuous-time framework. With respect to the work done in [5], in [3] the authors consider various risk measures for the loss and provide an analytical method to compute them.

CPPI techniques have also been applied to credit portfolios (see Credit Portfolio Insurance).

References

[1] Bertrand, P. & Prigent, J.-L. (2002). Portfolio insurance: the extreme value approach to the CPPI method, Finance 23(2), 69–86.
[2] Black, F. & Jones, R. (1987). Simplifying portfolio insurance, Journal of Portfolio Management 14(1), 48–51.
[3] Cont, R. & Tankov, P. (2009). Constant proportion portfolio insurance in presence of jumps in asset prices, Mathematical Finance 19(3), 379–401.
[4] Perold, A.R. (1986). Constant Proportion Portfolio Insurance, Harvard Business School, Working Paper.
[5] Prigent, J.-L. & Tahar, F. (2005). CPPI with Cushion Insurance, University of Cergy-Pontoise, working paper.

Related Articles

Credit Portfolio Insurance.

ALI HIRSA
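To make the CPPI article's equations (1) to (6) concrete, the sketch below evaluates the closed-form portfolio value of equations (5) and (6) and compares it with a discretely rebalanced portfolio along one simulated Black–Scholes path. The function names, the parameter values, and the choice to rebalance at every time step with no fees or trading band are assumptions made for illustration only; with very fine steps the two values should agree closely, which is a sanity check on the formula rather than a statement about any real contract.

```python
import math
import random


def cppi_closed_form(s_t, t, v0, f0, s0, m, r, sigma):
    """Portfolio value from equations (5)-(6), with C_0 = V_0 - F_0."""
    beta = r - m * (r - 0.5 * sigma ** 2) - m ** 2 * sigma ** 2 / 2.0
    alpha_t = (v0 - f0) / s0 ** m * math.exp(beta * t)
    return f0 * math.exp(r * t) + alpha_t * s_t ** m


def cppi_discrete(v0, f0, s0, m, r, mu, sigma, horizon, n_steps, seed=0):
    """Reset the exposure to m * cushion at every step along one GBM path."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    value, floor, price = v0, f0, s0
    for _ in range(n_steps):
        exposure = m * (value - floor)  # equation (2)
        growth = math.exp((mu - 0.5 * sigma ** 2) * dt
                          + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
        value = exposure * growth + (value - exposure) * math.exp(r * dt)
        floor *= math.exp(r * dt)       # equation (4)
        price *= growth
    return value, price


if __name__ == "__main__":
    v_disc, s_end = cppi_discrete(100.0, 90.0, 100.0, 3, 0.03, 0.07, 0.2, 1.0, 20000)
    v_formula = cppi_closed_form(s_end, 1.0, 100.0, 90.0, 100.0, 3, 0.03, 0.2)
    print(v_disc, v_formula)
```

Coarsening `n_steps` (for example to 12 for monthly rebalancing) shows how the discretely rebalanced value drifts away from the continuous-time formula, which is one way to see where gap risk enters.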
Equity Default Swaps

Equity default swaps (EDS) are equity derivatives that are structured as far out-of-the-money American-style binary puts with periodic swap payments rather than an up-front premium. The structure of EDS is similar to credit default swaps (CDS) except that the default event is defined in terms of a decline in the share price of the reference entity rather than a credit event experienced by the reference entity. Thus, similar to a CDS, an EDS can be seen in terms of the protection buyer and protection seller counterparties. In the case of an EDS, protection buyers are hedging themselves against a large decline in the share price.

More specifically, the protection buyer in an EDS makes periodic fixed payments to the protection seller. The size of the periodic payments is called the EDS spread. In return for the periodic payments, a default payment from the protection seller to the protection buyer is made if the share price of the reference entity declines by a prespecified amount from the share price at initiation of the EDS contract. If this equity default event occurs and the default payment is made, then the contract terminates with the protection buyer ceasing to make further payments. If the equity default event never occurs before the maturity of the EDS contract, then no payment is ever made from the protection seller to the protection buyer.

Typically, the prespecified fall in the share price is a 70% decline, so that the equity default event is defined as the first time that the shares of the reference entity trade at 30% of the share price when the EDS contract is entered. The amount of the default payment is fixed when the EDS contract is initiated and computed as

$$N \times (1 - R) \qquad (1)$$

where N is the notional value of the contract and R is the recovery rate, which is predetermined and typically set to be 50%. The formulation of the default payment in terms of a recovery rate is to further the analogy to CDS. EDS are usually medium-term contracts with maturities of five years being the most common.

The first EDS were issued in 2004 as a means to allow protection sellers to receive a higher return than from CDS. A credit event almost certainly implies that an equity default event will occur, but, conversely, an equity default event can occur without a credit event having occurred. This implies that for the same reference entity, the EDS spread must be greater than the CDS spread. In addition to protection sellers receiving a higher return for EDS, the default event for EDS is easier to define than for CDS. Whether or not the share price has reached a predetermined level is unambiguous, but the various credit events can sometimes cause confusion for counterparties and have led to legal proceedings.

One difference between EDS and most equity derivatives is the swap feature. Not only is there no up-front premium but also upon an equity default event, no further swap payments are made. Gil-Bazo [4] finds the fraction of the EDS spread that is due to this swap feature under the Black–Scholes model, given plausible parameter values.

Applications of EDS

Besides the obvious application of EDS as portfolio protection on long positions in shares, EDS are often used in ways that exploit their similarity to CDS. Two examples are given here: relative value trades in EDS–CDS carry trades and as yield-enhancing replacements for CDS in collateralized debt obligations (CDO).

In EDS–CDS carry trades, an investor sells protection with an EDS and buys protection with a CDS. The larger EDS spread is received while the smaller CDS spread is paid, so that the investor simply collects the positive carry if neither a credit nor an equity default event occurs. In the case that both default events occur, the investor is partially hedged, with the effectiveness of the hedge depending on the relative timing of the default events as well as the recovery rate on the credit event. There is a risk of large losses if an equity default occurs without a credit event occurring.

Some CDO were constructed with the reference portfolio including some EDS in addition to CDS. While this increases the risk to the CDO investors, there can be significant yield enhancement. There have even been CDO where the reference portfolio was exclusively composed of EDS.
These are termed equity collateralized obligations (ECO).

EDS as Credit–Equity Hybrids

Since a large fall in the equity price is needed to trigger the payout of an EDS, there are often accompanying credit implications to the firm, leading many to refer to EDS as a credit–equity hybrid instrument, despite the default event being defined exclusively in terms of the share price.

There is empirical evidence that EDS should not be considered simply as an equity derivative. de Servigny and Jobst [8] assess the relative weighting of debt and equity factors for equity default probabilities. They find that for the typical definition of equity default as 30% of the initial share price, debt factors are more important than equity factors. In addition, Jobst and de Servigny [5] study EDS correlation and EDS–CDS correlation and find that multivariate analyses commonly used for credit are the most appropriate.

EDS Pricing

Consider the problem of finding the fair spread for an EDS that is to be initiated now at time t = 0, with the current stock price as S_0. The default payment in an EDS is made the first time that the share price trades at the prespecified level L, where L < S_0. In addition to this, the swap payments are made contingent on the share price not having traded at or below the prespecified level. Thus, to price an EDS, the probability distribution of the first passage time of the level L is required. Here, the first passage time of L is defined as

$$\tau_L = \inf\{t > 0;\ S_t \le L\} \qquad (2)$$

Then, if an EDS with $1 notional requires a periodic payment of C at times T_1, ..., T_n, the price of the EDS for a protection buyer is

$$\mathrm{EDS}(S_0; C, R, L) = -C \sum_{i=1}^{n} D_{T_i}\, P(\tau_L > T_i \mid S_0) + (1 - R)\, E\left[D_{\tau_L} \mathbf{1}_{\{\tau_L \le T_n\}} \mid S_0\right] \qquad (3)$$

where R is the recovery rate, D_t is the discount factor to time t, and P and E are probabilities and expectations under the risk-neutral measure. Then, the fair spread for an EDS is the value of C that makes EDS(S_0; C, R, L) = 0.

The problem of deriving the first passage time distribution has been solved in several special cases. Albanese and Chen [1] compute the EDS spread under the assumption that the stock price follows a constant elasticity of variance (CEV) process (see Constant Elasticity of Variance (CEV) Diffusion Model) and Asmussen et al. [2] use the Wiener–Hopf factorization (see Wiener–Hopf Decomposition) to compute the EDS spread under the assumption that the stock price follows a Carr–Geman–Madan–Yor (CGMY) Lévy process (see Tempered Stable Process). For models where credit considerations are more explicitly addressed, EDS spreads have been priced using different methods: Albanese and Chen [1] use a credit barrier model with a credit to equity mapping; Campi et al. [3] extend the CEV case to include jump to default; Medova and Smith [6] use a structural model of credit risk (see Structural Default Risk Models) where the firm’s asset value follows a geometric Brownian motion; and Sepp [7] makes use of an extended structural model where the firm’s asset value can have stochastic volatility (see Heston Model) or be a double exponential jump diffusion, with the default barrier being deterministic or stochastic.

References

[1] Albanese, C. & Chen, O. (2005). Pricing equity default swaps, Risk 18, 83–87.
[2] Asmussen, S., Madan, D. & Pistorius, M.R. (2008). Pricing equity default swaps under an approximation to the CGMY Lévy Model, Journal of Computational Finance 11, 79–93.
[3] Campi, L., Polbennikov, S. & Sbuelz, A. (2009). Systematic equity-based credit risk: a CEV model with jump to default, Journal of Economic Dynamics and Control 33, 93–108.
[4] Gil-Bazo, J. (2006). The value of the ‘swap’ feature in equity default swaps, Quantitative Finance 6, 67–74.
[5] Jobst, N. & de Servigny, A. (2006). An empirical analysis of equity default swaps (II): multivariate insights, Risk 19, 97–103.
[6] Medova, E. & Smith, R. (2006). A structural approach to EDS pricing, Risk 19, 84–88.
[7] Sepp, A. (2006). Extended CreditGrades Model with stochastic volatility and jumps, Wilmott Magazine September, 50–62.
[8] de Servigny, A. & Jobst, N. (2005). An empirical analysis of equity default swaps (I): univariate insights, Risk 18, 84–89.

Related Articles

Constant Proportion Portfolio Insurance; Credit Default Swaps; Total Return Swap; Variance Swap.

OLIVER CHEN
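As a minimal sketch of how equation (3) of the Equity Default Swaps article yields a fair spread, the code below works in the plain Black–Scholes setting, which is only one of the special cases the article mentions. The assumptions are mine and purely illustrative: the share price is a geometric Brownian motion with constant rate r and volatility sigma, there are no dividends, the barrier is monitored continuously, and the discount factor is D_t = exp(-rt). Under these assumptions the first passage probability uses the standard reflection-principle formula for Brownian motion with drift, and the protection leg is integrated numerically on a time grid. All function names are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def hit_prob(t, s0, level, r, sigma):
    """P(tau_L <= t) for geometric Brownian motion with risk-neutral drift r."""
    if t <= 0.0:
        return 0.0
    b = math.log(level / s0)  # negative because L < S_0
    nu = r - 0.5 * sigma ** 2
    sd = sigma * math.sqrt(t)
    return (norm_cdf((b - nu * t) / sd)
            + math.exp(2.0 * nu * b / sigma ** 2) * norm_cdf((b + nu * t) / sd))


def eds_legs(s0, level, recovery, r, sigma, pay_times, grid=2000):
    """Return (annuity, protection) so that EDS = -C * annuity + protection."""
    annuity = sum(math.exp(-r * t) * (1.0 - hit_prob(t, s0, level, r, sigma))
                  for t in pay_times)
    dt = pay_times[-1] / grid
    protection = 0.0
    for k in range(grid):  # midpoint rule for E[D_tau 1{tau <= T_n}]
        t0, t1 = k * dt, (k + 1) * dt
        dp = hit_prob(t1, s0, level, r, sigma) - hit_prob(t0, s0, level, r, sigma)
        protection += math.exp(-r * 0.5 * (t0 + t1)) * dp
    return annuity, (1.0 - recovery) * protection


def fair_spread(s0, level, recovery, r, sigma, pay_times):
    """Periodic payment C per $1 notional that sets equation (3) to zero."""
    annuity, protection = eds_legs(s0, level, recovery, r, sigma, pay_times)
    return protection / annuity


if __name__ == "__main__":
    quarterly = [0.25 * k for k in range(1, 21)]  # five-year contract
    print(fair_spread(100.0, 30.0, 0.5, 0.03, 0.4, quarterly))
```

The returned C is the payment per payment date, not an annualized rate; a lognormal model with a flat volatility is known to be a crude description of far out-of-the-money behavior, which is why the article points to CEV, Lévy, and structural models.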
Exchange-traded Funds (ETFs)

Exchange-traded funds (ETFs) are indexed products and trade like equities on the main exchanges. Technically, an ETF holder possesses certificates that state legal right of ownership over a portion of a basket of individual stock certificates. It may be tempting to think of ETFs as mutual funds. However, ETFs differ from mutual funds in several ways. One key distinction is that mutual funds only trade at the end of each day at their calculated net asset values (NAVs) while ETFs trade throughout the day at ever-changing prices. Another is that ETFs, unlike mutual funds, can be sold short.

A number of financial entities are required to create and maintain ETFs. Initially, the fund manager submits a detailed plan to the SEC as to the ETF’s constitution and how it will function. Once the plan is approved, the fund manager enters into an agreement with a market maker or specialist known as the authorized participant. The authorized participant, typically by borrowing, begins to assemble a basket of instruments that compose the index the ETF is meant to replicate. Once assembled, the instruments are placed in trust at a custodial bank that, in turn, uses them to form what are known as creation units. Each creation unit represents a subset of the basket of instruments. The custodial bank then divides the creation units into (typically) 10 000 to 600 000 ETF shares [5], which are legal claims on the aforementioned instruments, and forwards the shares to the authorized participant. It should be mentioned that this is an in-kind trade (i.e., ETF shares are exchanged for the basket of financial instruments) with no tax implications. The authorized participant subsequently sells the shares in the open market just like shares of stock. The ETF shares then continue to be sold and resold by investors. The instruments underlying the creation units, and therefore the ETF shares, remain in trust with the custodian, who is responsible for paying any cash flows (e.g., dividends and coupons) from the instruments to the ETF holders and providing administrative oversight of the creation units themselves.

A long position in an ETF can be unwound two ways. The first is simply to sell the share in the open market. The second is to purchase enough ETF shares to form a creation unit and then exchange the creation unit for the securities that comprise it. As with the creation of the ETF shares, this second option has no tax implications but is generally only available to large institutional investors.

The financial firms mentioned above are motivated to participate in the ETF space by different profit opportunities. Fund managers and custodial banks each collect a small portion of the fund’s annual assets. Investors who lend instruments that compose the aforementioned baskets receive interest fees, while market makers seek to earn both arbitrage (i.e., the difference in price between the ETF and the basket of instruments) and bid/ask-spread profits.

According to a July 2008 Morningstar article, the average ETF charged 54 bps in annual fees [3]. This number was up from 41 bps a year earlier [3]. The average has been raised owing to recently formed exotic, narrowly focused ETFs. However, there are still many broad-market ETFs with annual fees on the order of 10 bps. Furthermore, an ETF’s management fee is usually lower than the (approximately) 80 bps charged by the typical mutual fund [2]. Finally, over 90% of US ETFs have bid/ask spreads of less than 50 bps (with over half having spreads of less than 20 bps) [1], while the expense ratio (which includes management fees, administrative costs, 12b-1 distribution fees, and other operating expenses) for an average mutual fund is 150 bps [4]. Assuming one must transact at the bid (offer) when selling (buying), even ETFs with higher bid/ask spreads have, on average, substantially lower frictional costs than mutual funds (104 bps vs. 150 bps).

There are close to 2000 ETFs trading today, tracking numerous broad-market composites as well as a multitude of sector and geographic indexes. These products cover approximately 40 different investment categories (e.g., Utilities Sector Equity, High Yield Fixed Income, US Real Estate Equity, and European Mid/Small Cap Equity) and fall into several investment styles (e.g., equity, fixed income, and alternatives). Furthermore, these funds are offered by a large number of investment banks and investment management companies. Consequently, the ETF market gives investors numerous choices both in terms of the nature of investment and the fund manager.
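The 104 bps figure quoted in the cost comparison is not derived in the text; it appears to combine the 54 bps average annual fee with a 50 bps bid/ask spread. The short sketch below spells out that reading. It assumes, hypothetically, a one-year holding period in which the investor buys at the offer and sells at the bid (paying one full spread in total) and treats the 150 bps mutual fund expense ratio as that fund's entire frictional cost.

```python
def round_trip_cost_bps(annual_fee_bps, bid_ask_spread_bps):
    """One-year frictional cost: annual fee plus one full bid/ask spread."""
    return annual_fee_bps + bid_ask_spread_bps


etf_cost = round_trip_cost_bps(54, 50)  # average ETF fee, 50 bps spread
mutual_fund_cost = 150                  # average mutual fund expense ratio
print(etf_cost, mutual_fund_cost - etf_cost)  # prints: 104 46
```

Under this reading, the spread is paid once while the fee recurs, so the ETF's relative advantage would widen for holding periods longer than one year.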
In conclusion, a number of financial entities are necessary to create and maintain ETFs, indexed products that trade like stocks on major bourses. Furthermore, ETFs differ from mutual funds in that they trade throughout the day, can be shorted, and generally carry lower fees. Finally, the close to 2000 ETFs trading today cover a plethora of different investment categories, offering investors an inexpensive means to construct well-diversified portfolios.

References

[1] Amery, P. (2008). European ETF Secondary Market Dealing Spreads, Index Universe, Retrieved on July 25, 2008 from http://www.indexuniverse.com/sections/features/12/4294-european-etf-secondary-market-dealing-spreads.html
[2] Kinnel, R. (2007). Fund Fees are Coming Down, Morningstar, Retrieved on July 25, 2008 from http://ibd.morningstar.com/article/article.asp?CN=aol828&id=194298
[3] Marquardt, K. (2008). Surprise: ETF Fees are Going Up, U.S. News, Retrieved on July 25, 2008 from http://www.usnews.com/blogs/new-money/2008/7/9/surprise-etf-fees-are-going-up.html
[4] McKeever, C. (2007). A Cost Comparison: The Real Cost of Mutual Funds v ETF’s, Chance Favors, Retrieved on July 28, 2008 from http://chancefavors.com/2007/10/cost-comparison-mutual-funds-vs-etfs/
[5] McWhinney, J. (2005). An Inside Look at ETF Construction, Investopedia, Retrieved on July 25, 2008 from http://www.investopedia.com/articles/mutualfund/05/062705.asp

MICHAEL J. TARI
Volume-weighted Average Price (VWAP)

The volume-weighted average price (VWAP) and its close cousin, the time-weighted average price (TWAP), are commonly used measures of the average price of a security over a period of time. VWAP and TWAP are used by traders and other investment professionals as reference prices, an indication of the average transaction price over an interval of time. So, for example, if the TWAP of a security is $10 on a given day and a trader had bought a sizeable block of shares at $9.50, we might conclude that the trader had added value in that he or she obtained a better price than a naïve program that mechanically sends out orders in the market at a steady rate throughout the day.

Mathematical Definition

More formally, the VWAP of a security over a specified trading horizon (e.g., from market open to close) is defined as the ratio of the total transaction value in that security (i.e., the sum, over all trades in the specified horizon, of the product of each trade’s share volume and the corresponding price) to the total volume of shares traded (i.e., the sum of all shares traded in the trading horizon). While the trading horizon is typically a trading day, intraday or multiday VWAP measures are also computed. A related concept is the TWAP, defined as the average price over a particular time interval with no explicit volume weighting. Traders use TWAP over VWAP for securities where the temporal pattern of volume exhibits considerable variation, for example, in less-active securities.

Formally, given N trades in the relevant interval, let S_1, ..., S_N be the shares transacted with corresponding prices P_1, ..., P_N. Then, we have

$$\mathrm{VWAP} = \frac{\sum_{i=1}^{N} P_i S_i}{\sum_{i=1}^{N} S_i} \qquad (1)$$

$$\mathrm{TWAP} = \frac{\sum_{i=1}^{N} P_i}{N} \qquad (2)$$

Subtleties in the computation of VWAP/TWAP include (i) the choice of volume definition (e.g., primary market volume or composite volume), (ii) the treatment of certain trades (e.g., block trades that might be negotiated off market), and (iii) the decision whether to include volumes at the open and close of the market.

Uses

VWAP is commonly used as an approximation to the price that could be realized by a trader who passively participates in trading activity. As such, the performance of traders can be measured by their ability to execute orders at prices better than the VWAP benchmark prevailing over the trading horizon.

The computational simplicity of the VWAP is a major factor in its popularity in measuring trade execution, especially in markets where detailed trade-level data is difficult or expensive to obtain. VWAP can be misleading as a benchmark in certain situations where the trader’s objective is to control the slippage from a given strike or decision price, or where the strategy is not passive. In such cases, for example, if the trader has short-term alpha, the mechanical application of a VWAP strategy (i.e., trading in parallel to historical volume patterns) can lead to significant opportunity costs in terms of slippage. VWAP is not appropriate when the trader’s executions are large relative to market volumes. In this case, VWAP might conceal a large price impact because the trader’s own trades constitute the bulk of the reported volume. Finally, if traders have discretion over whether to execute or not, the VWAP benchmark can be gamed by selectively timing executions.

An important application is to so-called VWAP strategies, typically algorithmic trading strategies that automatically break up an order and send trades to the market to match the historical volume pattern or profile (see, e.g., [1]) of a security. See, for example, [2] for a discussion of the uses of VWAP in trading strategies and algorithms. The goal of a VWAP strategy is to obtain an execution price close to the VWAP for the day. Some brokers also guarantee VWAP execution, essentially taking on the execution risk for a fee.
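The two definitions in equations (1) and (2) translate directly into code. The sketch below is only an illustration: the function names and the sample trades are hypothetical, and it ignores the subtleties noted in the article (volume definition, off-market blocks, open and close volumes).

```python
def vwap(prices, shares):
    """Equation (1): total traded value divided by total shares traded."""
    if not prices or len(prices) != len(shares):
        raise ValueError("prices and shares must be non-empty and equal in length")
    total_shares = sum(shares)
    if total_shares <= 0:
        raise ValueError("total share volume must be positive")
    return sum(p * s for p, s in zip(prices, shares)) / total_shares


def twap(prices):
    """Equation (2): simple average of the trade prices, no volume weighting."""
    if not prices:
        raise ValueError("prices must be non-empty")
    return sum(prices) / len(prices)


if __name__ == "__main__":
    trade_prices = [10.0, 10.2, 9.8]   # hypothetical trades
    trade_shares = [100, 300, 600]
    print(vwap(trade_prices, trade_shares))  # about 9.94, pulled toward the large trade
    print(twap(trade_prices))                # about 10.0
```

In this example the largest trade occurs at the lowest price, so the VWAP sits below the TWAP; a buyer filled at 9.95 would beat the TWAP benchmark but not the VWAP benchmark.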
References

[1] Hobson, D. (2006). VWAP and volume profiles, Journal of Trading, 1(2), Spring, 38–42.
[2] Madhavan, A. (2002). VWAP Strategies, in Transaction Performance: The Changing Face of Trading, Handbook Series in Finance, B. Bruce, ed, Institutional Investor Inc.

Related Articles

Automated Trading; Execution Costs; Price Impact.

ANANTH N. MADHAVAN
Equity Swaps

A swap contract is a bilateral agreement between two parties, known as counterparties, to exchange cash flows at regularly scheduled dates in the future. In an equity swap, some of the cash flows are determined by the return on a stock or an equity index. Typically, one of the parties pays to the other the total return of a stock or an equity index. In exchange, he or she receives from the other a cash flow determined by a fixed or floating rate or the return of another stock or equity index. Equity swaps are also known as equity-linked swaps and equity-indexed swaps.

Equity swaps are not traded on an exchange but are privately negotiated. They are referred to as over-the-counter (OTC) contracts. One of the first-known equity swap agreements was offered by the Bankers Trust in 1989. Since then, the market for equity swaps and other equity-linked derivatives has grown rapidly. There are no exact figures on the size of the market. However, the Bank for International Settlements (BIS) provides market size estimates. According to BIS, the estimate of the worldwide total notional amounts outstanding of equity swaps and equity forwards was over $300 trillion as of December 2007.

Equity swaps provide means to get exposure to the underlying stock or index without making a direct investment. Because equity swaps are OTC contracts, they can be tailor-made to specific needs. The contracts have been used to circumvent barriers for direct investments in particular markets, bypass various taxes, and minimize transaction costs.

Defining Equity Swaps

Let $\{T_0, T_1, \ldots, T_M\}$ be a sequence of dates. This is the tenor structure and we denote it by $\mathcal{T}$. For a given day-count convention, we specify a sequence of year fractions (see end note a) $\{\delta_1, \delta_2, \ldots, \delta_M\}$ to $\mathcal{T}$. We denote the counterparties by A and B.

Definition 1 A generic equity swap
An equity swap with tenor structure $\mathcal{T}$ is a contract that starts at time $T_0$ and has payment dates $T_1, T_2, \ldots, T_M$. At each payment date $T_i$ for $i = 1, 2, \ldots, M$, the two counterparties A and B exchange payments. At least one of the payments will be based on the return of a stock or an equity index over the period $[T_{i-1}, T_i]$.

In general, the cash flows are specified in such a way that the initial value, at time $T_0$, of the swap equals zero. Usually the equity swap pays out the total return of the underlying stock or equity index including dividends. However, there are also variants where the dividend is excluded.

A swap contract has a notional principal (see end note b). It is a currency amount specified in the swap contract that determines the size of the payments expressed in currency units. While the notional principal of a bond, for instance, is paid out at maturity, the notional principal of a swap contract is, in general, never exchanged. Equity swaps can be classified into two categories depending on whether the notional principal is constant or varies over the lifetime of the swap. We focus on the former case, which is considered in the next section.

Contracts with Fixed Notional Principal

Let $N$ denote the fixed notional principal. Let $\{Z(t)\}$ denote the price process of a stock or an equity index. Define the period return $R(T_i, T_{i+1})$ over the interval $[T_i, T_{i+1}]$ for asset $Z$ by

$$R(T_i, T_{i+1}) = \frac{Z(T_{i+1})}{Z(T_i)} - 1 \tag{1}$$

Definition 2 A generic equity-for-fixed-rate swap
An equity-for-fixed-rate swap, with tenor structure $\mathcal{T}$, which is written on the equity $Z$ will have a predetermined swap rate $K$ and will give rise to the following payments between the counterparties A and B at each payment date $T_i$:

• A pays to B the amount: $N R(T_{i-1}, T_i)$.
• B pays to A the amount: $N \delta_i K$.

In general, the swap rate is chosen such that the initial value of the swap at time $T_0$ equals zero.

In its most simple form, this contract is referred to as a plain vanilla equity swap. The period return is then determined by a domestic asset or index and the nominal amount is expressed in units of the domestic currency. Examples can be found in [5] and [8].

Some equity swaps are structured so that instead of a fixed swap rate they pay a floating interest rate,
usually a LIBOR rate. Let $L(T_i, T_{i+1})$ denote the simple spot rate over the period $[T_i, T_{i+1}]$.

Definition 3 A generic equity-for-floating-rate swap
An equity-for-floating-rate swap, with tenor structure $\mathcal{T}$, which is written on the equity $Z$ will give rise to the following payments between the counterparties A and B at each payment date $T_i$:

• A pays to B the amount: $N R(T_{i-1}, T_i)$.
• B pays to A the amount: $N \delta_i (L(T_{i-1}, T_i) + s)$.

where $s$ is a constant rate such that the initial value of the swap at time $T_0$ equals zero.

An equity-for-floating swap can be decomposed into an equity-for-fixed swap and a suitably chosen interest rate swap (see LIBOR Rate and [2]).

Let $R_1$ and $R_2$ denote the return of assets $Z_1$ and $Z_2$, respectively.

Definition 4 A generic equity-for-equity swap
An equity-for-equity swap, with tenor structure $\mathcal{T}$, which is written on the equities $Z_1$ and $Z_2$, will give rise to the following payments between the counterparties A and B at each payment date $T_i$:

• A pays to B the amount: $N R_1(T_{i-1}, T_i)$.
• B pays to A the amount: $N (R_2(T_{i-1}, T_i) + s\delta_i)$.

where $s$ is the constant rate such that the initial value of the swap at time $T_0$ equals zero.

The equity-for-equity swap is also referred to as a two-way equity swap. The simplest contract of this type is a domestic equity-for-equity swap where both returns are based on domestic indices or assets.

So far, we have only considered domestic equity indices and assets. However, all of the three equity swaps mentioned above have versions where one or both cash flows are based on a foreign equity return or interest rate. These are so-called cross-currency swaps.

To illustrate a cross-currency equity swap, suppose that the United States is the domestic market. Let the notional principal be expressed in US dollars. Let $Z_1$ be a foreign equity index such as, for instance, the NIKKEI, while $Z_2$ is a domestic equity index such as the S&P 500. The period return $R_1$ is based on a foreign equity index, while the nominal amount is in domestic units. There is a currency mismatch in the cash flow that A pays (but none in the cash flow that B pays). This type of contract is referred to as a quanto swap (see also Quanto Options). Quanto swaps are more complicated to price than other swaps. Quanto contracts have been considered in [3, 4] and [7].

From a pricing and hedging perspective, the simplest cross-currency swaps are the ones that are currency adjusted. Consider a cross-currency equity-for-equity swap with currency-adjusted returns. Let $Z_1$ be a foreign equity, while $Z_2$ is a domestic equity. Let $X(t)$ denote the exchange rate expressed as the number of domestic currency units per foreign currency unit. Then the currency-adjusted period return over the interval $[T_i, T_{i+1}]$ for the asset $Z_1$ is

$$R_1(T_i, T_{i+1}) = \frac{X(T_{i+1}) Z_1(T_{i+1})}{X(T_i) Z_1(T_i)} - 1 \tag{2}$$

While the unit of $Z_1(t)$ is foreign currency, the unit of $Z_1(t)X(t)$ is domestic currency. Regarding the underlying index as the foreign asset times the exchange rate, $R_1$ can be treated as the return on a domestic index. A cross-currency equity-for-equity swap that is currency adjusted is, from a valuation point of view, equal to a domestic equity-for-equity swap.

Contracts with Variable Notional Principal

Some equity swaps are constructed with a variable notional principal. A variable notional principal changes over time according to changes in the referenced equity index.

Consider an equity-for-fixed-rate swap. It can essentially be regarded as a leveraged position in the underlying equity. If the notional principal is constant, the realized returns from the equity index are withdrawn in each period, resulting in a position that is rebalanced periodically. If the notional principal is variable, the realized returns in each period are reinvested.

Let $N_i$ denote the variable notional principal, which determines the size of the payments at time $T_i$ for $i = 1, 2, \ldots, M$. Let $N_1 = 1$ and $N_i = Z(T_{i-1})/Z(T_0)$ for $i = 2, 3, \ldots, M$. Thus, for instance, at the third payment date $T_3$

• A pays to B the amount: $\frac{Z(T_2)}{Z(T_0)} R(T_2, T_3)$.
• B pays to A the amount: $\frac{Z(T_2)}{Z(T_0)} \delta_3 K$.
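The payment rules for the equity-for-fixed-rate swap, with either a fixed or a variable notional principal, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the index path, the year fractions, and the swap rate are all hypothetical, and no valuation or discounting is attempted.

```python
from typing import List, Tuple


def period_return(z_start: float, z_end: float) -> float:
    """Period return R = Z(end) / Z(start) - 1."""
    return z_end / z_start - 1.0


def equity_for_fixed_payments(
    prices: List[float],
    year_fractions: List[float],
    swap_rate: float,
    notional: float = 1.0,
    variable_notional: bool = False,
) -> List[Tuple[float, float]]:
    """Return (A pays B, B pays A) for each payment date T_1, ..., T_M.

    prices holds Z(T_0), ..., Z(T_M); year_fractions holds delta_1, ..., delta_M.
    With variable_notional=True the notional is N_i = Z(T_{i-1}) / Z(T_0),
    so N_1 = 1, and the `notional` argument is ignored.
    """
    if len(prices) != len(year_fractions) + 1:
        raise ValueError("need exactly one more price than year fractions")
    payments = []
    for i in range(1, len(prices)):
        n_i = prices[i - 1] / prices[0] if variable_notional else notional
        equity_leg = n_i * period_return(prices[i - 1], prices[i])  # A pays B
        fixed_leg = n_i * year_fractions[i - 1] * swap_rate         # B pays A
        payments.append((equity_leg, fixed_leg))
    return payments


# Hypothetical index path Z(T_0..T_3), semiannual year fractions, K = 4%.
index_path = [100.0, 110.0, 99.0, 108.9]
deltas = [0.5, 0.5, 0.5]
fixed = equity_for_fixed_payments(index_path, deltas, swap_rate=0.04)
variable = equity_for_fixed_payments(
    index_path, deltas, swap_rate=0.04, variable_notional=True
)
for i, ((ef, ff), (ev, fv)) in enumerate(zip(fixed, variable), start=1):
    print(f"T_{i}: fixed N -> A {ef:+.4f}, B {ff:.4f}; "
          f"variable N -> A {ev:+.4f}, B {fv:.4f}")
```

A negative equity leg means the payment effectively flows from B to A. In the variable case both legs scale with the index level at the start of each period, which reflects the reinvestment of realized returns described above.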
Equity swaps with variable notional principals are treated in [2, 6] and [9].

More Equity Swaps & Strategies

There can be many variations of the equity swaps listed so far. For instance, there can be more than one tenor structure, that is, the payments made by A and B can have different periodicity. It is also possible to make forward agreements to enter into a swap contract in the future. Such contracts are known as forward swaps or deferred swaps. There are also equity swaps with option features like capped equity swaps and barrier equity swaps. Further examples are blended index swaps and outperformance swaps (see [2, 6] and [8]).

We conclude by providing an example of how equity swaps were used in the United States during the 1990s to circumvent taxes. The executive equity swap strategies were developed for large single stock shareholders, for instance, a founder of a company. The swap was constructed so that the shareholder made payments based on the return of the stock to the swap contractor. In exchange, the shareholder received either a fixed interest rate or the return of a large equity index such as S&P 500. By entering into such a contract, the stockholder could keep the stocks and the voting rights, but still reduce the risk of the total portfolio and avoid capital gains taxes. As a result, the tax regulation was changed. The new regulation states that taxpayers should recognize that transactions that are essentially equivalent to a sale should be treated as such and thus be taxed. For a more detailed description on this topic, see [1] and [8].

End Notes

a. For instance, if the convention “Actual/365” is used, $\delta_1$ is equal to the number of days between the dates $T_0$ and $T_1$, divided by 365.
b. Sometimes called face value.

References

[1] Bolster, P., Chance, D. & Rich, D. (1996). Executive equity swaps and corporate insider holdings, Financial Management 25(2), 14–24.
[2] Chance, D. & Rich, D. (1998). The pricing of equity swaps and swaptions, The Journal of Derivatives 5, 19–31.
[3] Chung, S. & Yang, H. (2005). Pricing quanto equity swaps in a stochastic interest rate economy, Applied Mathematical Finance 12(2), 121–146.
[4] Hinnerich, M. (2007). Derivatives Pricing and Term Structure Modeling. PhD Thesis, Stockholm School of Economics, EFI, The Economic Research Institute, Stockholm.
[5] Jarrow, R. & Turnbull, S. (1996). Derivative Securities, South-Western Publishing, Cincinnati.
[6] Kijima, M. & Muromachi, Y. (2001). Pricing equity swaps in a stochastic interest rate economy, The Journal of Derivatives 8, 19–35.
[7] Liao, M. & Wang, M. (2003). Pricing models of equity swaps, The Journal of Futures Markets 23(8), 751–772.
[8] Marshall, J. & Yuyuenyonwatana, R. (2000). Equity swaps: structures, uses, and pricing, in Handbook of Equity Derivatives, Jack C. Francis, William W. Toy, & J.G. Whittaker, eds, Wiley, New York.
[9] Wu, T. & Chen, S. (2007). Equity swaps in a LIBOR market model, The Journal of Futures Markets 27(9), 893–920.

Further Reading

Chance, D. (2004). Equity swaps and equity investing, The Journal of Alternative Investing 7, 75–97.

Related Articles

Equity Default Swaps; Forwards and Futures; LIBOR Rate; Quanto Options; Total Return Swap.

MIA HINNERICH
Volatility Index Options

Volatility index options are options on a volatility index. Volatility index options enable one to take a pure volatility exposure without the need to take positions in the index itself and without the need to delta-hedge. These features make these options very interesting for pure volatility trades and bets. They also allow one to trade the spread between realized and implied volatility, or to hedge the volatility exposure of other positions or businesses, without being contaminated by the index price dependence like in the standard index options. This explains their popularity among traders and hedge funds.

Originally, volatility index options were traded over the counter (OTC), and a large part is still traded OTC, through volatility and variance swaps (see Variance Swap). In February 2006, the Chicago Board Options Exchange (CBOE) realized the interest in the financial community for exchange-traded standardized volatility index options and started offering standardized options first on the VIX (which is the CBOE volatility index) and later on the Russell 2000 volatility index. In Europe, although the major exchanges have already developed volatility indexes on the major indexes (and futures on these) like the VDAX, FTSE 100 volatility index, VSTOXX, VCAC, or VSMI, they still do not offer any options on these indexes.

Volatility Index

The underlying volatility index is computed from option prices to capture information from the options market by some means. The overall idea of index volatility is to provide a good estimate of the so-called implied volatility extracted from the options market, as opposed to the historical volatility. More precisely, the aim is to estimate the risk-neutral market’s expectation of the future volatility. In the specific case of the VIX, the formula is public and weights the various options as follows:

$$\sigma^2 = \frac{2}{T_i} \sum_i \frac{\Delta K_i}{K_i^2} e^{RT} Q(K_i) - \frac{1}{T_i} \left( \frac{F}{K_0} - 1 \right)^2 \tag{1}$$

where $\sigma$ is the VIX/100, or equivalently, VIX $= 100\sigma$, $T_i$ the time to expiration of the $i$th option, $F$ the forward index level derived from option prices, $K_0$ the highest strike just below the forward index level $F$, $K_i$ the strike price of the $i$th option (a call if $K_i > K_0$, a put if $K_i < K_0$, and both call and put for $K_i = K_0$), $\Delta K_i$ the interval between strike prices, $R$ the risk-free rate to expiration, and $Q(K_i)$ the midpoint of the bid–ask spread for each option with strike $K_i$.

The CBOE calculates and publishes the VIX minute by minute using real-time bid–ask market quotes of options on the S&P 500 index (SPX) with nearby and second nearby maturities and applying the multiplier of $100. Overall, the VIX reflects the market’s view of future short-term volatility. A high value of the index indicates a more volatile market, while a low value indicates a less volatile environment. Often referred to as the fear index, it represents one measure of the market’s expectation of volatility over the next 30-day period. In the short term, bias can explain some key differences between the VIX and the overall market sentiment. This is particularly true at times when the most liquid options are in the range of 2–6 months to expiration. Therefore, the VIX tries to quantify market volatility, mainly focusing on the short term, and is unable to completely explain market volatility, which is a complex concept.

It is worth noting that VIX is computed to be the square root of the par variance swap rate (see Variance Swap), and not the volatility swap rate (see Volatility Swaps). This is because a variance swap can be perfectly, statically replicated through vanilla puts and calls, whereas a volatility swap requires dynamic hedging [1]. We will discuss this point later in the section on pricing.

VIX options are European call and put options on the VIX index, with strikes ranging from 10 to 65 (with intervals of 1 and 2.5 points for liquid strikes and 5 points for less liquid ones), while maturities are up to 6 months. Like many other listed options, VIX options are quoted with a multiplier of $100. The expiration date is, roughly speaking, the third Wednesday of the expiry month. More details can be found on the CBOE website [3].

Pricing Models

In terms of pricing, roughly speaking there are two methods to price index volatility options:
• a model-dependent approach that assumes a price model for the index volatility diffusion and provides a closed-form formula of call and put options;
• a model-free approach that computes the cost of the static hedge to replicate the volatility index option.

Historically, the model-dependent approach has been the first to emerge. Successively, among others, Whaley [11], Grunbichler and Longstaff [6], Howison et al. [7], and Elliott et al. [5], and lately Sepp [9, 10], presented the model-dependent approaches to price volatility index options. They all assume an underlying stochastic process for the index volatility (or the index volatility futures) and explicitly compute the price of the call and put options.

The first model-dependent approach presented by Whaley [11] assumed a log normal diffusion for the VIX cash index and the VIX futures leading to a standard Black–Scholes formula for the VIX call options as follows:

$$C(T, F, K) = D_T \left[ F N(d_+) - K N(d_-) \right] \tag{2}$$

$$d_\pm = \frac{\ln \frac{F}{K} \pm \frac{1}{2} T \sigma^2}{\sigma \sqrt{T}} \tag{3}$$

where $C(T, F, K)$ stands for the value of the call option with expiry time $T$, strike $K$, and forward $F$, and where $\sigma$ is the volatility of the futures price, $D_T$ is the discount factor expiring at time $T$, and $N(x)$ is the cumulative normal distribution up to $x$. A straightforward drawback of the Whaley approach was the strong log normal underlying assumption. This motivated further research and led to many works. Grunbichler and Longstaff [6] proposed a mean-reverting square root process for the volatility process. Following the popularity of stochastic volatility, Howison et al. [7] and Elliott et al. [5] suggested using a stochastic volatility model for the index volatility to capture the risk of volatility for the index volatility. Moreover, because it is well known that the index volatility smile is upward sloping, the stochastic volatility approach, which can cope with this important feature, was an appealing modeling choice. Figure 1, for instance, gives the VIX smile for 27 July 2009.

[Figure 1 VIX Smile: implied volatilities (vertical axis, 40 to 140) against strikes (horizontal axis, 20 to 70) for VIX options expiring Aug 2009, Sep 2009, Oct 2009, Nov 2009, Dec 2009, and Jan 2010.]

Lately, Sepp [9, 10] argued in favor of adding jumps to the stochastic volatility model to get a more realistic diffusion for the index volatility. This was supported by the econometric works that confirmed the evidence of jumps for the volatility index.

To sum up, the model-dependent approach aims at modeling the index volatility evolution as accurately as possible and providing a very consistent framework for pricing any type of option on index volatility. The strength is the flexibility in terms of pricing as there is no limitation for the types of options. The
weakness is the strong assumption of a specific model and distribution for the index volatility.

Another approach initiated by Neuberger [8], Demeterfi et al. [4], Carr and Lee [1], and Carr and Wu [2] is to exhibit a static hedge and compute in a model-free way the price of this hedge using call and put options on the index itself. In a very insightful paper, Demeterfi et al. [4] showed that a static portfolio of call and put options on the underlying index can replicate a variance swap. Lately, Carr and Lee [1] and Carr and Wu [2] extended the closed-form formula to the case of both the variance and the volatility swap. The starting point is to assume a pure diffusion given as follows:

$$dS_t = \mu_t S_t \, dt + \sigma(t, \ldots) S_t \, dW_t \tag{4}$$

where $\mu$ is the drift term and $\sigma(t, \ldots)$ is a very general volatility function (that can be assumed to be a local volatility for the clarity of the explanation). A trivial application of the Itô lemma on $\log(S_t)$ provides the main intuition and leads to the fact that one can relate the volatility to a log contract as follows:

$$\frac{dS_t}{S_t} - d\log(S_t) = \frac{1}{2} \sigma^2(t, \ldots) \, dt \tag{5}$$

Like any function of the underlying asset, the log contract can be replicated by a series of call and put options. This leads, in particular, to the par swap rate of a variance swap as follows (referred to in the literature as the replication variance swap price):

$$K_{\mathrm{Var}} = \frac{2}{T} \left[ rT - \left( \frac{S_0}{S_*} e^{rT} - 1 \right) - \log \frac{S_*}{S_0} + e^{rT} \int_0^{S_*} \frac{1}{K^2} P(K) \, dK + e^{rT} \int_{S_*}^{\infty} \frac{1}{K^2} C(K) \, dK \right] \tag{6}$$

where $P(K)$ and $C(K)$, respectively, denote the current fair value of a put and call option of strike $K$, $r$ is the risk-free rate, $T$ is the maturity of the variance swap, $S_0$ is the initial spot value of the underlying asset, and $S_*$ is an arbitrary point to do the split between the liquid call and put options. It is often chosen to be the forward value.

Unlike the previous model-dependent approaches, the model-free approach replicates the variance and the volatility swaps using market prices of volatility options as inputs. The resulting pricing consists in numerically computing the cost of the hedging strategy with the series of options as shown by the integrals of equation (6).

The obvious strength of this approach is to avoid any assumption on the underlying distribution of the index volatility. Primarily, the weaknesses are that the replication methods do not work for very specific index volatility options and that the discretization bias due to the lack of reliable liquid quotes for call and put options at any strikes can be of the same order of magnitude as the misspecification of the index volatility distribution.

References

[1] Carr, P. & Lee, R. (2007). Realized volatility and variance: Options via swaps, Risk May, 76–83.
[2] Carr, P. & Wu, L. (2009). Variance risk premiums, Review of Financial Studies 22, 1311–1341.
[3] CBOE (2009). Vix Options CBOE. www.cboe.com.
[4] Demeterfi, K., Derman, E., Kamal, M. & Zou, J. (1999). More than You Ever wanted to know about Volatility Swaps, Goldman Sachs Quantitative Strategies, March 1999.
[5] Elliott, R., Siu, T. & Chan, L. (2004). Pricing volatility swaps under Heston’s stochastic volatility model with regime switching, Applied Mathematical Finance 14(1), 41–62.
[6] Grunbichler, A. & Longstaff, F. (1996). Valuing futures and options on volatility, Journal of Banking and Finance 20, 985–1001.
[7] Howison, S., Rafailidis, A. & Rasmussen, H. (2004). On the pricing and hedging of volatility derivatives, Applied Mathematical Finance 11(4), 317–346.
[8] Neuberger, A. (1994). The log contract: a new instrument to hedge volatility, Journal of Portfolio Management 20(2), 74–80.
[9] Sepp, A. (2008). Pricing options on realized variance in the Heston model with jumps in returns and volatility, Journal of Computational Finance 11(4), 33–70.
[10] Sepp, A. (2008). Vix option pricing in a jump-diffusion model, Risk April, 84–89.
[11] Whaley, R. (1993). Derivatives on market volatility: hedging tools long overdue, Journal of Derivatives 1, 71–84.

Further Reading

Bergomi, L. (2008). Dynamic properties of smile models, in R. Cont ed. Frontiers in Quantitative Finance: Volatility and Credit Risk Modeling, Wiley, Chapter 3.
Cont, R. and Kokholm, T. (2009). A Consistent Pricing Model for Index Options and Volatility Derivatives. Available at SSRN: http://ssrn.com/abstract=1474691.

Related Articles

Call Options; Corridor Variance Swap; Gamma Swap; Heston Model; Implied Volatility Surface; Realized Volatility and Multipower Variation; Realized Volatility Options; Stochastic Volatility Models; Variance Swap; Weighted Variance Swap.

ERIC BENHAMOU & MARIAN CIUCA
Realized Volatility Options

Let the underlying process $Y$ be a positive semimartingale, and let $X_t := \log(Y_t/Y_0)$.

Define realized variance to be $[X]$, where $[\cdot]$ denotes the quadratic variation (but see the section “Contract Specifications in Practice”).

Define a realized variance option on $Y$ with variance strike $Q$ and expiry $T$ to pay

$$([X]_T - Q)^+ \quad \text{for a realized variance call}$$
$$(Q - [X]_T)^+ \quad \text{for a realized variance put}$$

and define a realized volatility option on $Y$ with volatility strike $Q^{1/2}$ and expiry $T$ to pay

$$([X]_T^{1/2} - Q^{1/2})^+ \quad \text{for a realized volatility call}$$
$$(Q^{1/2} - [X]_T^{1/2})^+ \quad \text{for a realized volatility put}$$

In some places, we restrict attention to puts. Call prices follow by put–call parity: for realized variance options, a long-call short-put combination pays $[X]_T - Q$, equal to a $Q$-strike variance swap, and for realized volatility options, a long-call short-put combination pays $[X]_T^{1/2} - Q^{1/2}$, equal to a $Q^{1/2}$-strike volatility swap.

Unlike variance swaps (see Variance Swap; Weighted Variance Swap), which admit exact model-free (assuming only continuity of $Y$) hedging and pricing in terms of Europeans, variance and volatility options have a range of values consistent with the given prices of Europeans. With no further assumptions, there exist sub/superreplication strategies and lower/upper pricing bounds (in the section “Pricing Bounds by Model-free Use of Europeans”). Under an independence condition, there exist exact pricing formulas in terms of Europeans (in the section “Pricing by Use of Europeans, Under an Independence Condition”). Under specific models, there exist exact pricing formulas in terms of model parameters (in the section “Pricing by Modeling the Underlying Process”).

Unless otherwise noted, all prices are denominated in units of a $T$-maturity discount bond. The results apply to dollar-denominated prices, provided that interest rates vary deterministically, because if $\tilde{Y}$ is a dollar-denominated share price and $Y$ is that share’s bond-denominated price, then $\log Y - \log \tilde{Y}$ has finite variation; hence, $[\log Y] = [\log \tilde{Y}]$.

Expectations $\mathbb{E}$ will be with respect to martingale measure $\mathbb{P}$.

Transform Analysis

Some of the methods surveyed here (in the sections “Pricing by Modeling the Underlying Process” and “Pricing via Transform”) will price variance/volatility options by integrating prices of payoffs of the form $e^{z[X]_T}$. Transform analysis relates the former to the latter, by the following pricing formulas, proved in [5].

Assume that the continuous payoff function $h : \mathbb{R} \to \mathbb{R}$ satisfies

$$\int_{-\infty}^{\infty} e^{-\alpha q} h(q) \, dq < \infty \tag{1}$$

for some $\alpha \in \mathbb{R}$. For all $z \in \alpha + i\mathbb{R} := \{z \in \mathbb{C} : \operatorname{Re} z = \alpha\}$, define the bilateral Laplace transform

$$H(z) := \int_{-\infty}^{\infty} e^{-zq} h(q) \, dq \tag{2}$$

If $|H|$ is integrable along $\alpha + i\mathbb{R}$ for some $\alpha \le 0$, then by Bromwich and Fubini, the $h([X]_T)$ payoff has price

$$\mathbb{E} h([X]_T) = \frac{1}{2\pi i} \int_{\alpha - \infty i}^{\alpha + \infty i} H(z) \, \mathbb{E} e^{z[X]_T} \, dz \tag{3}$$

For a variance put, let $h(q) = (Q - q)^+$. Then for all $\alpha < 0$, formula (3) holds with

$$H(z) = \frac{e^{-Qz}}{z^2} \tag{4}$$

For a volatility put, let $h(q) = (\sqrt{Q} - \sqrt{q^+})^+$. Then for all $\alpha < 0$, formula (3) holds with

$$H(z) = -\frac{\sqrt{\pi} \, \operatorname{Erf}(\sqrt{zQ})}{2 z^{3/2}} \tag{5}$$

To price variance and volatility calls by put–call parity, we have the variance swap value

$$\mathbb{E}[X]_T = \left. \frac{\partial}{\partial z} \right|_{z=0} \mathbb{E} e^{z[X]_T} \tag{6}$$
and the volatility swap value

$$\mathbb{E}[X]_T^{1/2} = \frac{1}{2\sqrt{\pi}} \int_0^{\infty} \frac{1 - \mathbb{E} e^{-z[X]_T}}{z^{3/2}} \, dz \tag{7}$$

if $\mathbb{E} e^{z[X]_T}$ is analytic in a neighborhood of $z = 0$.

Pricing by Modeling the Underlying Process

Under Heston and under Lévy models, we give formulas for the transform $\mathbb{E} e^{z[X]_T}$, where $\operatorname{Re} z \le 0$. Hence, formula (3) prices the variance put and volatility put, using equations (4) and (5), respectively.

Example: Heston Dynamics

Under the Heston model for instantaneous variance (see Heston Model),

$$dV_t = (a - \kappa V_t) \, dt + \beta \sqrt{V_t} \, dW_t \tag{8}$$

and the transform of $[X]_T = \int_0^T V_t \, dt$ is

$$\mathbb{E} e^{z[X]_T} = e^{A(z) + B(z) V_0} \tag{9}$$

where

$$A(z) := \frac{a}{\beta^2} \left[ (\kappa - \gamma) T - 2 \log \left( 1 + \frac{\kappa - \gamma}{2\gamma} \left( 1 - e^{-\gamma T} \right) \right) \right] \tag{10}$$

$$B(z) := \frac{2z(e^{\gamma T} - 1)}{2\gamma + (\gamma + \kappa)(e^{\gamma T} - 1)}, \qquad \gamma := \sqrt{\kappa^2 - 2\beta^2 z} \tag{11}$$

by [6]. Other affine models also have explicit formulas for $\mathbb{E} e^{z[X]_T}$.

Example: Lévy Dynamics

If $X$ is a Lévy process (see Lévy Processes) with Gaussian variance $\sigma^2$ and Lévy measure $\nu$, then $[X]$ has transform

$$\mathbb{E} e^{z[X]_T} = \exp \left( T \sigma^2 z + T \int_{\mathbb{R}} \left( e^{z x^2} - 1 \right) \nu(dx) \right) \tag{12}$$

For variance option pricing under pure-jump processes with independent increments, but without assuming stationary increments, see [2].

Pricing by Use of Europeans, Under an Independence Condition

In this section, let $Y$ be a share price that follows general stochastic volatility dynamics

$$dY_t = \sigma_t Y_t \, dW_t \tag{13}$$

where $\sigma$ and the Brownian motion $W$ are independent. Although all three subsections use this assumption, the schemes in the sections “Pricing via Transform” and “Pricing and Hedging via Uniform or $L^2$ Payoff Approximation” are immunized, to first order, against violations of the independence condition.

Pricing via Transform

The transform of $[X]_T = \int_0^T \sigma_t^2 \, dt$ satisfies [5]

$$\mathbb{E} e^{z[X]_T} = \mathbb{E} \left[ \theta_+ (Y_T/Y_0)^{1/2 + \sqrt{1/4 + 2z}} + \theta_- (Y_T/Y_0)^{1/2 - \sqrt{1/4 + 2z}} \right] \tag{14}$$

provided that the expectations are finite. Here, $\theta_\pm := (1 \mp 1/\sqrt{1 + 8z})/2$. The right-hand side (RHS) of equation (14) is in principle observable from $T$-expiry Europeans, which allows variance/volatility put option pricing by the formulas (3–5). In this context, equation (6) can be replaced by the log-contract value $-2\mathbb{E} X_T$, and equation (7) can be replaced by the synthetic volatility swap value (see Volatility Swaps).

Moreover, source [5] shows that equation (14) still holds approximately in the presence of correlation between $\sigma$ and $W$, in the sense that the RHS is constructed to have zero sensitivity to first-order correlation effects.

Pricing and Hedging via Uniform or $L^2$ Payoff Approximation

For continuous payoffs, $h : [0, \infty) \to \mathbb{R}$ with finite limit at $\infty$, such as the variance put or volatility put,
consider an $n$th-order approximation to $h(q)$

$$A_n(q) := a_{n,n} e^{-cnq} + a_{n,n-1} e^{-c(n-1)q} + \cdots + a_{n,0} \tag{15}$$

where $c > 0$ is an arbitrary constant.

To choose $A_n$ by uniform approximation, $a_{n,k}$ may be determined as the coefficients of the $n$th Bernstein polynomial approximation to the function $x \mapsto h(-(1/c) \log x)$ on $[0, 1]$. Then source [5] shows that

$$\mathbb{E} h([X]_T) = \lim_{n \to \infty} \mathbb{E} \sum_{k=0}^{n} a_{n,k} \left[ \theta_+ (Y_T/Y_0)^{1/2 + \sqrt{1/4 - 2ck}} + \theta_- (Y_T/Y_0)^{1/2 - \sqrt{1/4 - 2ck}} \right] \tag{16}$$

where $\theta_\pm := (1 \mp 1/\sqrt{1 - 8ck})/2$. The RHS of equation (16) is, in principle, observable from $T$-expiry Europeans and is moreover designed to have zero sensitivity to first-order correlation effects.

Alternatively, to choose $A_n$ by $L^2$ approximation, the $a_{n,k}$ may be determined by $L^2(\mu)$ projection of $h$ onto $\operatorname{span}\{1, e^{-cq}, \ldots, e^{-cnq}\}$, where the “prior” $\mu$ is a finite measure on $[0, \infty)$. In practice, $a_{n,k}$ may be computed by weighted least squares regression of $h(q)$ on the regressors $\{q \mapsto e^{-ckq} : k = 0, \ldots, n\}$, with weights given by $\mu$. Then source [5] shows that equation (16) still holds, regardless of the choice of the prior $\mu$, provided that $dP/d\mu$ exists in $L^2(\mu)$, where $P$ denotes the $\mathbb{P}$-distribution of $[X]_T$.

For hedging purposes, the summation in the RHS of equation (16) provides a European-style payoff that, in conjunction with share trading, replicates the volatility payoff $h([X]_T)$ to arbitrary accuracy.

Pricing via Variance Distribution Inference

Given the prices $c \in \mathbb{R}^{N \times 1}$ of vanilla options at strikes $K_1, \ldots, K_N$, a scheme in [8] discretizes into $\{v_1, \ldots, v_J\}$ the possible values of $[X]_T$, and proposes to infer the discretized variance distribution $p \in \mathbb{R}^{J \times 1}$ where $p_j := \mathbb{P}([X]_T = v_j)$, by solving approximately for $p$ in

$$Bp = c \tag{17}$$

where $B \in \mathbb{R}^{N \times J}$ is given by $B_{nj} := C^{BS}(K_n, v_j)$, the Black–Scholes formula for strike $K_n$ and squared unannualized volatility $v_j$. The approximate solution is chosen to minimize $\|Bp - c\|^2$ plus a convex penalty term. The contract paying $h([X]_T)$ is then priced as $\sum_j p_j h(v_j)$.

Pricing by Use of Variance or Volatility Swaps

With sufficient liquidity, variance and/or volatility swap quotes can be taken as inputs. For example, an approximation in [8] prices variance options by fitting a lognormal variance distribution to variance and volatility swaps of the same expiry. An approximation in [4] prices and hedges variance and volatility options by fitting a displaced lognormal to variance and volatility swaps.

The variance curve models in [1] apply a different approach to using variance swaps; they take as inputs the variance swap quotes at multiple expiries, and they model the dynamics of the term structure of forward variance. Applications include pricing and hedging of realized variance options.

Pricing Bounds by Model-free Use of Europeans

In this section, consider variance options on, more generally, any continuous share price $Y$.

Given European options of the same expiry $T$, there exist model-free sub/superreplication strategies, and hence lower/upper pricing bounds, for the variance options. Here model-free means that, aside from continuity and positivity, we make no assumptions on $Y$.

Subreplication and Lower Bounds

The following subreplication strategy is due to [7]; this exposition also draws from [3].

Let $\lambda : (0, \infty) \to \mathbb{R}$ be convex, let $\lambda_y$ denote its left-hand derivative, and assume that its second derivative in the distributional sense has a density, denoted $\lambda_{yy}$, which satisfies for all $y \in \mathbb{R}_+$

$$\lambda_{yy}(y) \le 2/y^2 \tag{18}$$
4 Realized Volatility Options


Define for $y > 0$ and $v > 0$

$$BS(y, v; \lambda) := \int_{-\infty}^{\infty} \lambda(y e^{z})\,\frac{1}{\sqrt{2\pi v}}\,e^{-(z+v/2)^2/(2v)}\,dz \tag{19}$$

and define $BS(y, 0; \lambda) := \lambda(y)$, and let $BS_y$ denote its $y$-derivative. Let $\tau_Q := \inf\{t \ge 0 : [X]_t \ge Q\}$. Then the following trading strategy subreplicates the variance call payoff: hold statically a claim that pays at time $T$

$$\lambda(Y_T) - BS(Y_0, Q; \lambda) \tag{20}$$

and trade shares dynamically, holding at each time $t \in (0, T)$

$$\begin{cases} -BS_y(Y_t, Q - [X]_t; \lambda) \text{ shares} & \text{if } t \le \tau_Q \\ -\lambda_y(Y_t) \text{ shares} & \text{if } t > \tau_Q \end{cases} \tag{21}$$

and a bond position that finances the shares and accumulates the trading gains or losses. Therefore, the time-0 value of the contract paying (20) provides a lower bound on the variance call value.
The lower bound from equation (20) is optimized by $\lambda$ consisting of $2/K^2\,dK$ out-of-the-money vanilla payoffs at all $K$ where $I_0(K, T)$, the squared unannualized Black–Scholes implied volatility, exceeds $Q$:

$$\lambda(y) = \int_{\{K :\, I_0(K,T) > Q\}} \frac{2}{K^2}\,\mathrm{van}_K(y)\,dK \tag{22}$$

See [3] for generalization to forward-starting variance options.

Superreplication and Upper Bounds

The following superreplication strategy is due to [3]. Choose any $b_d \in (0, Y_0]$ and $b_u \in [Y_0, \infty)$. Let

$$BP(y, q) := \int_{-\infty-\alpha i}^{\infty-\alpha i} \frac{\sqrt{y/b_u}\,\sinh\big(\log(b_d/y)\,f(z)\big) - \sqrt{y/b_d}\,\sinh\big(\log(b_u/y)\,f(z)\big)}{2\pi z^2\,e^{i(Q-q)z}\,\sinh\big(\log(b_u/b_d)\,f(z)\big)}\,dz \tag{23}$$

where $f(z) := \sqrt{1/4 - 2iz}$ and where $\alpha > 0$ is arbitrary. For $y > 0$ and $b_d \ne b_u$, define

$$L(y; b_d, b_u) := -2\log(y/b_u) + 2\,\frac{\log(b_u/b_d)}{b_u - b_d}\,(y - b_u) \tag{24}$$

and define $L(y; Y_0, Y_0) := -2\log(y/Y_0) + 2y/Y_0 - 2$. Let

$$L^*(y) := \begin{cases} L(y) & \text{if } y \notin (b_d, b_u) \\ -BP(y, 0) & \text{if } y \in (b_d, b_u) \end{cases} \tag{25}$$

Let $BP_y$ and $L_y$ denote the $y$-derivatives, and let $\tau_b := \inf\{t \ge 0 : Y_t \notin (b_d, b_u)\}$.
Then, the following strategy superreplicates the variance call payoff $([X]_T - Q)^+$. Hold statically a claim that pays at time $T$

$$L^*(Y_T) - L^*(Y_0) \tag{26}$$

and trade shares dynamically, holding at each time $t \in (0, T)$

$$\begin{cases} BP_y(Y_t, [X]_t - [X]_0) \text{ shares} & \text{if } 0 \le t \le \tau_b \\ -L_y(Y_t) \text{ shares} & \text{if } t > \tau_b \end{cases} \tag{27}$$

and a bond position that finances the shares and accumulates the trading gains or losses.
Therefore, the time-0 value of the contract paying (26) provides an upper bound on the variance call value. Given $T$-expiry European options data, the upper bound from equation (26) may be optimized over all choices of $(b_d, b_u)$.

Connection to the Skorokhod Problem

Whereas the sections "Subreplication and Lower Bounds" and "Superreplication and Upper Bounds" presented explicit hedging strategies, which imply pricing bounds, this section presents (a logarithmic version of) the result in [7], which showed that stopping-time analysis also implies pricing bounds.
Denote by $\nu$ the $\mathbb{P}$-distribution of $Y_T$, which is revealed by the prices of $T$-expiry options on $Y$. Suppose that $\tilde{Y}$ is a continuous $\mathcal{F}$-martingale with $\tilde{Y}_T \sim \nu$, and $[\tilde{X}]_T$ has finite expectation, where $\tilde{X} := \log \tilde{Y}$. Then Dambis–Dubins–Schwartz implies that

$\tilde{Y}_t = G_{[\tilde{X}]_t}$, where $G$ is a driftless unit-volatility geometric $\mathcal{G}$-Brownian motion (on an enlarged probability space if needed) with $G_0 = Y_0$, and $[\tilde{X}]_t$ are $\mathcal{G}$-stopping times, where $\mathcal{G}_s := \mathcal{F}_{\inf\{t :\, [\tilde{X}]_t > s\}}$. Thus $G_{[\tilde{X}]_T} \sim \nu$; and hence $[\tilde{X}]_T$ solves a Skorokhod problem (see Skorokhod Embedding): it is a finite-expectation stopping time that embeds the distribution $\nu$ in $G$. Conversely, if some finite-expectation $\tau$ embeds $\nu$ in a driftless unit-volatility geometric Brownian motion $G$, then $\tilde{Y}_t := G_{\tau \wedge (t/(T-t))}$ defines a continuous martingale with $\tilde{Y}_T \sim \nu$ and $[\log \tilde{Y}]_T = \tau$.
Therefore, distributions of stopping times solving the Skorokhod problem are identical to distributions of realized variance consistent with the given price distribution $\nu$. Skorokhod solutions that have optimality properties, therefore, imply bounds on prices of variance/volatility options. In particular, Root's solution is known [9] to minimize the expectations of convex functions of the stopping time; the minimized expectation is, in that sense, a sharp lower bound on the price of a variance option (see also Skorokhod Embedding).

Contract Specifications in Practice

In practice, the realized variance in the payoff specification is defined by replacing quadratic variation $[X]_T$ with an annualized discretization that monitors $Y$, typically daily, for $N$ periods, resulting in a specification

$$\text{Annualization} \times \sum_{n=1}^{N} \left(\log\frac{Y_n}{Y_{n-1}}\right)^2 \tag{28}$$

If the contract adjusts for dividends (as typical for single-stock dividends but not index dividends) then the term inside the parentheses becomes $\log((Y_n + D_n)/Y_{n-1})$, where $D_n$ denotes the discrete dividend payment, if any, of the $n$th period.

References

[1] Buehler, H. (2006). Consistent variance curve models, Finance and Stochastics 10(2), 178–203.
[2] Carr, P., Geman, H., Madan, D. & Yor, M. (2005). Pricing options on realized variance, Finance and Stochastics 9(4), 453–475.
[3] Carr, P. & Lee, R. Hedging variance options on continuous semimartingales, Finance and Stochastics, forthcoming.
[4] Carr, P. & Lee, R. (2007). Realized volatility and variance: options via swaps, Risk 20(5), 76–83.
[5] Carr, P. & Lee, R. (2008). Robust Replication of Volatility Derivatives, Bloomberg LP and University of Chicago.
[6] Cox, J., Ingersoll, J. & Ross, S. (1985). A theory of the term structure of interest rates, Econometrica 53(2), 385–407.
[7] Dupire, B. (2005). Volatility Derivatives Modeling, Bloomberg LP.
[8] Friz, P. & Gatheral, J. (2005). Valuation of volatility derivatives as an inverse problem, Quantitative Finance 5(6), 531–542.
[9] Rost, H. (1976). Skorokhod stopping times of minimal variance, Séminaire de Probabilités (Strasbourg), Springer-Verlag, Vol. 10, pp. 194–208.

Related Articles

Exponential Lévy Models; Heston Model; Lévy Processes; Skorokhod Embedding; Variance Swap; Volatility Swaps; Volatility Index Options; Weighted Variance Swap.

ROGER LEE
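The discretized contract specification in equation (28), including the optional dividend adjustment, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the price series, and the choice of annualization factor (252 divided by the number of periods) are hypothetical and are not taken from any contract or from the article.

```python
import math


def realized_variance(prices, annualization, dividends=None):
    """Annualized discretely monitored realized variance, following equation (28).

    prices        -- observations Y_0, Y_1, ..., Y_N of the share price
    annualization -- the contract's annualization factor (assumed supplied by the caller)
    dividends     -- optional D_1, ..., D_N; D_n is the discrete dividend of period n
    """
    n_periods = len(prices) - 1
    if n_periods < 1:
        raise ValueError("need at least two price observations")
    if dividends is None:
        dividends = [0.0] * n_periods  # no dividend adjustment
    if len(dividends) != n_periods:
        raise ValueError("need exactly one dividend entry per period")
    total = 0.0
    for n in range(1, n_periods + 1):
        # Dividend-adjusted log return log((Y_n + D_n) / Y_{n-1})
        log_return = math.log((prices[n] + dividends[n - 1]) / prices[n - 1])
        total += log_return ** 2
    return annualization * total


def variance_call_payoff(realized, strike):
    """Payoff (realized - Q)^+ of a call on realized variance with strike Q."""
    return max(realized - strike, 0.0)


if __name__ == "__main__":
    closes = [100.0, 101.5, 99.0, 100.2, 98.7]  # hypothetical daily closes
    factor = 252 / (len(closes) - 1)            # assumed annualization convention
    rv = realized_variance(closes, factor)
    print(rv, variance_call_payoff(rv, 0.04))
```

Passing a dividend list changes only the numerator of each log return, which mirrors the adjustment described for single-stock contracts.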
Put–Call Parity

Put–call parity means that one may switch between call and put positions by selling or buying the underlying forward: "long call, short put is long forward contract" or c − p ≡ f. In other words, one may replicate a put contract by buying a call of identical characteristics (underlying asset, strike, maturity) and selling the underlying asset forward (p ≡ c − f), and one may replicate a call by buying a put and the underlying forward (c ≡ p + f). This is shown in the three payoff diagrams (Figures 1–3).
A logical proof of the third instance (c ≡ p + f) is as follows: a rational investor will exercise a call option whenever the asset price S at maturity is above the strike K; this is equivalent to promising to buy the asset at K and having the option to sell it at that level, which a rational investor will exercise whenever S falls below K.
Put–call parity is often referred to as option synthetics by practitioners and holds only for European options (see End Note a). It does not require any assumption other than the ability to buy or sell the asset forward, but it is worth noting that this may not always be the case: to sell forward, either a futures market must exist or one must be able to short-sell the asset.
Put–call parity must not be confused with "put–call symmetry" (see Foreign Exchange Symmetries) in foreign exchange, which states that a call struck at K on a given exchange rate S (e.g., dollars per 1 euro) is identical to a put struck at 1/K on the reverse rate 1/S (euros per 1 dollar), after the ad hoc numeraire conversions: c(S, K)/S ≡ K p(1/S, 1/K).

Price Relationship

Assuming no arbitrage, the synthetic relationship immediately translates into the well-known price relationship: "call minus put equals forward" or $c_t - p_t = f_t$. Note that here $f_t$ denotes the price of a forward contract struck at $K$, that is, the present value (p.v.) of the gap between the forward price $F_t$ and the strike price $K$ (see Forwards and Futures). Denoting the price of the zero-coupon bond maturing at $T$ by $B_t$ and rearranging terms, we have

call + p.v. of strike price = put + p.v. of forward price    (1)

or

$$c_t + K \cdot B_t = p_t + F_t \cdot B_t \tag{2}$$

For all investment assets where short selling is feasible, the forward price can be further expressed as a function of the spot price $S_t$ and the revenue or cost of carry until maturity $T$ (see Forwards and Futures). For example, the forward price of a stock with continuous dividend rate $q$ satisfies $F_t = S_t/B_t \cdot \exp(-q(T - t))$, and put–call parity simplifies to

$$c_t + K \cdot B_t = p_t + S_t \cdot e^{-q(T-t)} \tag{3}$$

In practice, Kamara and Miller [5] give empirical evidence that while put–call parity has many small violations, almost half of the arbitrages would result in a loss when execution delays are accounted for.

Basic Implications

• For trading purposes, puts and calls are identical instruments (up to a directional position in the underlying asset).
• At-the-money-forward calls and puts must have the same value. (An at-the-money-forward option has its strike set at the forward price of the underlying asset.)
• In the absence of revenue or cost of carry, the deltas (see Black–Scholes Formula; Delta Hedging) of a call and put must add up to 1 (in absolute value).
• Puts and calls must have the same gamma and vega (see Black–Scholes Formula; Gamma Hedging).

In volatility modeling, put–call parity implies that calls and puts of identical characteristics must have the same implied volatility.
In exotic option pricing, Carr and Lee [1] put forward the idea of a generalized American option that may be indefinitely exercised until maturity to lock in the intrinsic value and switch between call and put styles. The authors show that this option may be replicated by holding onto a European

vanilla call and subsequently selling and buying the forward contract at every exercise. This strategy is a straightforward illustration of how put–call parity may be exploited to alternate between call and put positions by only trading in the forward contract.

[Figure 1: c − p ≡ f. Payoff versus S with strike K; curves labeled Call, Short put, Forward.]
[Figure 2: p ≡ c − f. Payoff versus S with strike K; curves labeled Call, Short forward, Put.]
[Figure 3: c ≡ p + f. Payoff versus S with strike K; curves labeled Put, Forward, Call.]

History

Haug [3] traces put–call parity as far back as the seventeenth century, but its formulation was then "diffuse". According to the author, an early formulation of put–call parity "as we know it" can be found in the work by Higgins [4], who wrote in 1902:

    It can be shown that the adroit dealer in options can convert a 'put' into a 'call', a 'call' into a 'put' [. . .] by dealing against it in the stock.

Derman and Taleb [2] argue that the Black–Scholes–Merton formulas could have been established earlier than 1973 via put–call parity instead of the dynamic replication argument. Specifically, the authors cite similar formulas published in the 1960s, all of which "involved unknown risk premiums that would have been determined to be zero had [. . .] the put–call replication argument" been used.
Put–call parity can fail when there are restrictions on short selling, when the underlying asset is hard to borrow or illiquid, or in the case of corporate events such as leveraged buyouts.

End Notes

a. The reason put–call parity fails with American options is best seen in the first instance (c − p ≡ f), whereby an

agent attempts to replicate a forward contract by buying a call and selling a put. If the put is American, it may be exercised against the agent before maturity, thus breaking the replication strategy.

References

[1] Carr, P. & Lee, R. (2002). Hyper Options. Working paper, Courant Institute and Stanford University, December 2002.
[2] Derman, E. & Taleb, N.N. (2005). The illusions of dynamic replication, Quantitative Finance 5(4), 323–326.
[3] Haug, E. (2007). Derivatives: Models on Models. Wiley.
[4] Higgins, L.R. (1902). The Put-and-Call. E. Wilson, London.
[5] Kamara, A. & Miller, T.W. (1995). Daily and intradaily tests of European put-call parity, Journal of Financial and Quantitative Analysis 30, 519–539.

Related Articles

Black–Scholes Formula; Call Options; Forwards and Futures; Option Pricing: General Principles; Options: Basic Definitions.

SÉBASTIEN BOSSU
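The price relationship in equation (3) of the put–call parity article can be checked numerically. The sketch below is an illustration only: it assumes Black–Scholes prices with a constant interest rate (so that the bond price is exp(−r(T − t))) and a continuous dividend rate q, and all function names and input values are hypothetical. Parity itself does not depend on the Black–Scholes model; the model is used here merely to generate a consistent call and put.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_price(spot, strike, rate, div_yield, vol, tau, is_call):
    """Black-Scholes European price with continuous dividend rate (assumed model)."""
    d1 = (math.log(spot / strike) + (rate - div_yield + 0.5 * vol ** 2) * tau) / (vol * math.sqrt(tau))
    d2 = d1 - vol * math.sqrt(tau)
    disc_spot = spot * math.exp(-div_yield * tau)    # S_t * e^{-q (T - t)}
    disc_strike = strike * math.exp(-rate * tau)     # K * B_t
    if is_call:
        return disc_spot * norm_cdf(d1) - disc_strike * norm_cdf(d2)
    return disc_strike * norm_cdf(-d2) - disc_spot * norm_cdf(-d1)


def parity_gap(call, put, spot, strike, bond, div_yield, tau):
    """Left side minus right side of equation (3); zero when parity holds."""
    return (call + strike * bond) - (put + spot * math.exp(-div_yield * tau))


def synthetic_put(call, spot, strike, bond, div_yield, tau):
    """Put replicated from a call and a short forward position (p = c - f)."""
    return call + strike * bond - spot * math.exp(-div_yield * tau)


if __name__ == "__main__":
    S, K, r, q, vol, tau = 100.0, 95.0, 0.03, 0.01, 0.25, 0.5  # hypothetical inputs
    c = bs_price(S, K, r, q, vol, tau, True)
    p = bs_price(S, K, r, q, vol, tau, False)
    print(parity_gap(c, p, S, K, math.exp(-r * tau), q, tau))
```

A nonzero gap computed from market quotes would indicate an apparent violation, subject to the execution and short-selling caveats discussed in the article.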
Discretely Monitored Options

Traditional pricing models for path-dependent options rely on continuously monitoring the underlying, often resulting in closed-form or analytic formulas. References include [14, 19, 20, 21] for barrier options, [6, 12, 13] for look-back options, and [11, 18] for Asian or average options. However, in practice, monitoring is performed over discrete dates (e.g., monthly, weekly, or daily), while the underlying is still assumed to follow a continuous model. In contrast to continuous monitoring, discrete monitoring rarely, if ever, leads to similarly tractable solutions, and using continuous monitoring as an approximation for discrete monitoring often leads to significant mispricing (cf. [5, 15, 16]). As a consequence, various approaches have been followed to arrive at practically useful computational schemes.
For illustration, we focus on a down-and-out call option, where a standard call option with strike $K$ is canceled if the underlying falls below a barrier prior to expiry $T$. We first assume the traditional Black–Scholes–Merton setup with the price $\{S_t\}$ of the underlying following a geometric Brownian motion

$$S_t = S_0 e^{B_t} \tag{1}$$

where $\{B_t\}$ is a Brownian motion with drift $r - \sigma^2/2$ and standard deviation $\sigma$. Here the parameters $r$ and $\sigma$ represent the prevailing risk-free rate and the return volatility of the underlying asset, respectively. Let $H > 0$ be a given constant (barrier) and assume $H < S_0$. With monitoring effected over a set of $m$ dates $n\Delta t$ ($n = 1, \ldots, m$) such that $\Delta t = T/m$, let $U_n = X_1 + X_2 + \cdots + X_n$, where the $X_i$ are independent normal random variables with mean $\mu = (r - \sigma^2/2)\Delta t$ and standard deviation $\tilde{\sigma} = \sigma\sqrt{\Delta t}$. Then the call is knocked out the first (random) time $\tau \in \{1, 2, \ldots, m\}$ such that $H \ge S_\tau$, and the time-0 price of such a call is

$$V_m(H) = e^{-rT}\,E\big[(S_0 e^{U_m} - K)^+\,\mathbf{1}_{\{\tau > m\}}\big] \tag{2}$$

where $\tau = \inf\{n : U_n \le \log(H/S_0)\}$. The main difficulty in evaluating the above expectation is the computational complexity associated with an $m$-variate normal distribution for even moderate values of $m$. For example, Monte Carlo or tree-based algorithms may take several hours or even days for common values of $m$ [4]. In their paper, Broadie et al. [3] (see also [17]) opt to circumvent this hurdle by linking $V_m(H)$ to the price of a continuously monitored option with a barrier shifted away from the original. More precisely, they show that

$$V_m(H) = V\big(H e^{\pm\beta\sigma\sqrt{\Delta t}}\big) + o\big(1/\sqrt{m}\big) \tag{3}$$

where $V(\tilde{H})$ is the price of a continuously monitored barrier option with threshold $\tilde{H}$ and $\beta \approx 0.5826$, with $+$ for an up option and $-$ for a down option.
Although this approach works very well, it appears to be inaccurate when the barrier is near the initial price of the underlying. Under such circumstances, one can opt to use the recursive method of AitSahlia and Lai [1], which consists in reducing an $m$-dimensional integration problem to successively evaluating $m$ one-dimensional integrals. Specifically, with the discount factor of equation (2) made explicit, they show that

$$V_m(H) = e^{-rT}\int_{\log(H/S_0)}^{\infty} \big(S_0 e^{x} - K\big)^+ f_m(x)\,dx \tag{4}$$

where, for $1 \le n \le m$, $f_n(x)\,dx = P\{\tau > n,\ U_n \in dx\}$ for $x > \log(H/S_0)$, with $f_n$ defined recursively for each $n$ according to the following:

$$f_1(x) = \psi(x), \qquad f_n(x) = \int_{\log(H/S_0)}^{\infty} f_{n-1}(y)\,\psi(x - y)\,dy \quad \text{for } 2 \le n \le m \tag{5}$$

Here $\psi(x) = \tilde{\sigma}^{-1}\varphi((x - \mu)/\tilde{\sigma})$, with $\varphi$ being the density of a standard normal distribution, and $f_n(x) = 0$ for $x \le \log(H/S_0)$ and $1 \le n \le m$. This approach is very accurate and efficient for generally moderate values of $m$, using as few as 20 integration points.
Both the continuity correction and recursive integration methods can also be similarly applied to discretely monitored look-back options (cf. [2] and [4]). Alternatives to the above abound as well. Fusai et al. [10] use a Wiener–Hopf machinery to also compute hedge parameters. In the

context of a GARCH model, Duan et al. [7] propose a Markov chain technique that can also handle American-style exercise. Partial differential equations are used in [9, 22, 23, 24] to price average and barrier options, including when volatility is stochastic and exercise is of American style. Finally, [8] contains an approach that ultimately relies on Hilbert and Fourier transform techniques to address the situation when the underlying follows a Lévy process.

References

[1] AitSahlia, F. & Lai, T. (1997). Valuation of discrete barrier and hindsight options, Journal of Financial Engineering 6, 169–177.
[2] AitSahlia, F. & Lai, T. (1998). Random walk duality and the valuation of discrete lookback options, Applied Mathematical Finance 5, 227–240.
[3] Broadie, M., Glasserman, P. & Kou, S. (1997). A continuity correction for discrete barrier options, Mathematical Finance 7, 325–349.
[4] Broadie, M., Glasserman, P. & Kou, S. (1999). Connecting discrete and continuous path-dependent options, Finance and Stochastics 3, 55–82.
[5] Chance, D. (1994). The pricing and hedging of limited exercise caps and spreads, Journal of Financial Research 17, 561–583.
[6] Conze, A. & Viswanathan, R. (1991). Path dependent options: the case of lookback options, Journal of Finance 46, 1893–1907.
[7] Duan, J.C., Dudley, E., Gauthier, G. & Simonato, J.G. (2003). Pricing discretely monitored barrier options by a Markov Chain, Journal of Derivatives 10, 9–31.
[8] Feng, L. & Linetsky, V. (2008). Pricing discretely monitored barrier options and defaultable bonds in Lévy process models: a fast Hilbert transform approach, Mathematical Finance 18, 337–384.
[9] Forsyth, P.A., Vetzal, K. & Zvan, R. (1999). A finite element approach to the pricing of discrete lookbacks with stochastic volatility, Applied Mathematical Finance 6, 87–106.
[10] Fusai, G., Abrahams, D. & Sgarra, C. (2006). An exact analytical solution for discrete barrier options, Finance and Stochastics 10, 1–26.
[11] Geman, H. & Yor, M. (1993). Bessel processes, Asian options and perpetuities, Mathematical Finance 3, 349–375.
[12] Goldman, M., Sosin, H. & Gatto, M. (1979). Path dependent options: 'Buy at the low, sell at the high', Journal of Finance 34, 1111–1127.
[13] Goldman, M., Sosin, H. & Shepp, L. (1979). On contingent claims that insure ex-post optimal stock market timing, Journal of Finance 34, 401–414.
[14] Heynen, R.C. & Kat, H.M. (1994). Partial barrier options, Journal of Financial Engineering 3, 253–274.
[15] Heynen, R.C. & Kat, H.M. (1994). Lookback options with discrete and partial monitoring of the underlying price, Applied Mathematical Finance 2, 273–284.
[16] Kat, H. & Verdonk, L. (1995). Tree surgery, Risk 8, 53–56.
[17] Kou, S. (2003). On pricing of discrete barrier options, Statistica Sinica 13, 955–964.
[18] Linetsky, V. (2004). Spectral expansions for Asian (average price) options, Operations Research 52, 856–867.
[19] Merton, R.C. (1973). Theory of rational option pricing, Bell Journal of Economics and Management Science 4, 141–183.
[20] Rich, D. (1994). The mathematical foundations of barrier option pricing theory, Advances in Futures and Options Research 7, 267–312.
[21] Rubinstein, M. & Reiner, E. (1991). Breaking down the barriers, Risk 4, 28–35.
[22] Vetzal, K. & Forsyth, P.A. (1999). Discrete Parisian and delayed barrier options: a general numerical approach, Advanced Futures Options Research 10, 1–16.
[23] Zvan, R., Forsyth, P.A. & Vetzal, K. (1999). Discrete Asian barrier options, Journal of Computational Finance 3, 41–68.
[24] Zvan, R., Vetzal, K. & Forsyth, P.A. (2000). PDE methods for pricing barrier options, Journal of Economic Dynamics and Control 24, 1563–1590.

FARID AITSAHLIA
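The two methods discussed in the article on discretely monitored options, the barrier shift of equation (3) and the recursion of equations (4) and (5), can be compared on a down-and-out call with a short sketch. Several pieces below are assumptions rather than the authors' implementations: the integrals are truncated to a finite grid and evaluated with a plain trapezoidal rule (not the quadrature used in [1]), the continuously monitored price uses the standard reflection formula (valid here for a barrier at or below the strike, with no dividends), and all names and parameter values are hypothetical.

```python
import math
import numpy as np

BETA = 0.5826  # constant quoted with equation (3)


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(spot, strike, r, sigma, t):
    """Black-Scholes call price without dividends."""
    d1 = (math.log(spot / strike) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return spot * norm_cdf(d1) - strike * math.exp(-r * t) * norm_cdf(d2)


def continuous_down_and_out_call(s0, strike, barrier, r, sigma, t):
    """Continuously monitored price via reflection; assumes barrier <= strike, barrier < s0."""
    power = 2.0 * r / sigma ** 2 - 1.0
    mirror = bs_call(barrier ** 2 / s0, strike, r, sigma, t)
    return bs_call(s0, strike, r, sigma, t) - (barrier / s0) ** power * mirror


def corrected_down_and_out_call(s0, strike, barrier, r, sigma, t, m):
    """Equation (3): shift the barrier down by exp(-beta * sigma * sqrt(dt))."""
    shifted = barrier * math.exp(-BETA * sigma * math.sqrt(t / m))
    return continuous_down_and_out_call(s0, strike, shifted, r, sigma, t)


def recursive_down_and_out_call(s0, strike, barrier, r, sigma, t, m, n_grid=1501, width=8.0):
    """Equations (4)-(5) on a truncated grid with trapezoidal weights (a simplification)."""
    dt = t / m
    mu = (r - 0.5 * sigma ** 2) * dt
    sd = sigma * math.sqrt(dt)
    lower = math.log(barrier / s0)
    upper = abs(mu) * m + width * sigma * math.sqrt(t)  # truncation point, assumed wide enough
    x = np.linspace(lower, upper, n_grid)
    dx = x[1] - x[0]
    w = np.full(n_grid, dx)
    w[0] = w[-1] = 0.5 * dx

    def psi(z):
        # Density of one log-return increment X_i
        return np.exp(-0.5 * ((z - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

    f = psi(x)                              # f_1
    kernel = psi(x[:, None] - x[None, :])   # kernel[i, j] = psi(x_i - y_j)
    for _ in range(m - 1):
        f = kernel @ (w * f)                # f_n from f_{n-1}
    payoff = np.maximum(s0 * np.exp(x) - strike, 0.0)
    return math.exp(-r * t) * float(np.sum(w * payoff * f))


if __name__ == "__main__":
    args = (100.0, 100.0, 90.0, 0.10, 0.30, 0.2)  # hypothetical S0, K, H, r, sigma, T
    print(recursive_down_and_out_call(*args, m=50), corrected_down_and_out_call(*args, m=50))
```

With the barrier well below the spot, the two numbers should be close, and both should lie between the continuously monitored price and the vanilla call price. The grid size and truncation width trade accuracy for cost and would need tuning for other inputs.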
Weighted Variance Swap

Let the underlying process $Y$ be a semimartingale taking values in an interval $I$. Let $\varphi : I \to \mathbb{R}$ be a difference of convex functions, and let $X := \varphi(Y)$. A typical application takes $Y$ to be a positive price process and $\varphi(y) = \log y$ for $y \in I = (0, \infty)$.
Then (the floating leg of) a forward-starting weighted variance swap or generalized variance swap on $\varphi(Y)$ (shortened to "on $Y$" if the $\varphi$ is understood), with weight process $w_t$, forward-start time $\theta$, and expiry $T$, is defined to pay, at a fixed time $T_{pay} \ge T > \theta \ge 0$,

$$\int_\theta^T w_t\,d[X]_t \tag{1}$$

where $[\cdot]$ denotes quadratic variation. In the case that $\theta = 0$, the trade date, we have a spot-starting weighted variance swap. The basic cases of weights take the form $w_t = w(Y_t)$, for a measurable function $w : I \to [0, \infty)$, such as the following.

1. The weight $w(y) = 1$ defines a variance swap (see Variance Swap).
2. The weight $w(y) = \mathbf{1}_{y \in C}$, the indicator function of some interval $C$, defines a corridor variance swap (see Corridor Variance Swap) with corridor $C$. For example, a corridor of the form $C = (0, H)$ produces a down variance swap.
3. The weight $w(y) = y/Y_0$ defines a gamma swap (see Gamma Swap).

Model-free Replication and Valuation

Assuming a deterministic interest rate $r_t$, let $Z_t$ be the time-$t$ price of a bond that pays 1 at time $T_{pay}$. Assume that $Y$ is the continuous price process of a share that pays continuously a deterministic proportional dividend $q_t$. Let

$$Z_t = \exp\Big(-\int_t^{T_{pay}} r_u\,du\Big) \quad\text{and}\quad Q_t := \exp\Big(\int_0^t q_u\,du\Big) \tag{2}$$

so the share price with reinvested dividends is $Y_t Q_t$. Then the payoff

$$\int_\theta^T w(Y_t)\,d[X]_t \tag{3}$$

admits a model-independent replication strategy, which holds European options statically and trades the underlying shares dynamically. Indeed, let $\lambda : I \to \mathbb{R}$ be a difference of convex functions, let $\lambda_y$ denote its left-hand derivative, and assume that its second derivative in the distributional sense has a signed density, denoted $\lambda_{yy}$, which satisfies for all $y \in I$

$$\lambda_{yy}(y) = 2\varphi_y^2(y)\,w(y) \tag{4}$$

where $\varphi_y$ denotes the left-hand derivative of $\varphi$. Then

$$\int_\theta^T w(Y_t)\,d[X]_t = \lambda(Y_T) - \lambda(Y_\theta) - \int_\theta^T \lambda_y(Y_t)\,dY_t \tag{5}$$

$$= \lambda(Y_T) - \lambda(Y_\theta) + \int_\theta^T (q_t - r_t)\,\lambda_y(Y_t)\,Y_t\,dt - \int_\theta^T \frac{Z_t}{Q_t}\,\lambda_y(Y_t)\,d(Y_t Q_t/Z_t) \tag{6}$$

where equation (5) is by a proposition in [2] that slightly extends [1], and equation (6) is by Ito's rule. So the following self-financing strategy replicates (and hence prices) the payoff (3). Hold statically a claim that pays at time $T_{pay}$

$$\lambda(Y_T) - \lambda(Y_\theta) + \int_\theta^T (q_\tau - r_\tau)\,\lambda_y(Y_\tau)\,Y_\tau\,d\tau \tag{7a}$$

and trade shares dynamically, holding at each time $t \in (\theta, T)$

$$-\lambda_y(Y_t)\,Z_t \text{ shares} \tag{7b}$$

and a bond position that finances the shares and accumulates the trading gains or losses. Hence, the payoff (3) has time-0 value equal to that of the replicating claim (7a), which is synthesizable from Europeans with expiries in $[\theta, T]$. Indeed, for a put/call separator $\kappa$ (such as $\kappa = Y_0$), if $\lambda(\kappa) = \lambda_y(\kappa) = 0$, then

each $\lambda$ claim decomposes into puts/calls at all strikes $K$, with quantities $2\varphi_y^2(K)\,w(K)\,dK$:

$$\lambda(y) = \int_I 2\varphi_y^2(K)\,w(K)\,\mathrm{Van}(y, K)\,dK \tag{8}$$

where $\mathrm{Van}(y, K) := (K - y)^+\,\mathbf{1}_{K<\kappa} + (y - K)^+\,\mathbf{1}_{K>\kappa}$ denotes the vanilla put or call payoff. For put/call decompositions of general European payoffs, see [1].

Futures-dependent Weights

In equation (3), the weight is a function of spot $Y_t$. The alternative payoff specification

$$\int_\theta^T w(Y_t Q_t/Z_t)\,d[X]_t \tag{9}$$

makes $w_t$ a function of the futures price (a constant times $Y_t Q_t/Z_t$).
In the case $\varphi = \log$, we have $[X] = [\log Y] = [\log(YQ/Z)]$; hence

$$\int_\theta^T w(Y_t Q_t/Z_t)\,d[X]_t = \lambda\Big(\frac{Y_T Q_T}{Z_T}\Big) - \lambda\Big(\frac{Y_\theta Q_\theta}{Z_\theta}\Big) - \int_\theta^T \lambda_y(Y_t Q_t/Z_t)\,d(Y_t Q_t/Z_t)$$

for $\lambda$ satisfying equation (4). So the alternative payoff (9) admits replication as follows: hold statically a claim that pays at time $T_{pay}$

$$\lambda(Y_T Q_T/Z_T) - \lambda(Y_\theta Q_\theta/Z_\theta) \tag{10a}$$

and trade shares dynamically, holding at each time $t \in (\theta, T)$

$$-\lambda_y(Y_t Q_t/Z_t)\,Q_t \text{ shares} \tag{10b}$$

and a bond position that finances the shares and accumulates the trading gains or losses. Thus, the payoffs (9) and (10a) have equal values at time 0.
In special cases (such as $w = 1$ or $r = q = 0$), the spot-dependent (3) and futures-dependent (9) weight specifications are equivalent. In general, the spot-dependent weighting is harder to replicate, as it requires a continuum of expiries in equation (7a), unlike equation (10a). The spot-dependent weighting is, however, the more common specification and is assumed in the remainder of this article.

Examples

Returning to the previously specified examples of weights $w(Y_t)$, we express the replication payoff $\lambda$ in a compact formula, and also expanded in terms of vanilla payoffs according to equation (8). We take $\varphi(y) = \log y$ unless otherwise stated.

• Variance swap: Equation (4) has solution
$$\lambda(y) = -2\log(y/\kappa) + 2y/\kappa - 2 = \int_0^\infty \frac{2}{K^2}\,\mathrm{Van}(y, K)\,dK \tag{11}$$
• Arithmetic variance swap: For $\varphi(y) = y$, equation (4) has solution
$$\lambda(y) = (y - \kappa)^2 = \int_0^\infty 2\,\mathrm{Van}(y, K)\,dK \tag{12}$$
• Corridor variance swap: Equation (4) has solution
$$\lambda(y) = \int_{K \in C} \frac{2}{K^2}\,\mathrm{Van}(y, K)\,dK \tag{13}$$
• Gamma swap: Equation (4) has solution
$$\lambda(y) = \frac{2}{Y_0}\big(y\log(y/\kappa) - y + \kappa\big) = \int_0^\infty \frac{2}{Y_0 K}\,\mathrm{Van}(y, K)\,dK \tag{14}$$

In all cases, the strategy (7) replicates the desired contract. In the case of a variance swap, the strategy (10) also replicates it, because $w(Y) = 1 = w(YQ/Z)$.

Discrete Dividends

Assume that at the fixed times $t_m$ where $\theta = t_0 < t_1 < \cdots < t_M = T$, the share price jumps to $Y_{t_m} = Y_{t_m-} - \delta_m(Y_{t_m-})$, where each discrete dividend is given by a function $\delta_m$ of prejump price. In this case, the dividend-adjusted weighted variance swap can be defined to pay at time $T_{pay}$

$$\sum_{m=1}^{M} \int_{t_{m-1}+}^{t_m-} w(Y_t)\,d[X]_t \tag{15}$$
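Returning briefly to the Examples section, the closed-form payoffs in equations (11), (12), and (14) can be compared numerically with their vanilla decompositions from equation (8). The sketch below is illustrative only: the strike strip is truncated to a finite range and discretized with a midpoint rule, and the function names, the grid, and the values of κ and Y0 are assumptions chosen for the example.

```python
import math
import numpy as np


def vanilla(y, strikes, kappa):
    """Van(y, K): put payoff for strikes below kappa, call payoff for strikes above."""
    puts = np.where(strikes < kappa, np.maximum(strikes - y, 0.0), 0.0)
    calls = np.where(strikes > kappa, np.maximum(y - strikes, 0.0), 0.0)
    return puts + calls


def strip_value(y, kappa, quantity, k_max=400.0, n=400_000):
    """Midpoint-rule approximation of the strike integral in equation (8).

    quantity -- function K -> number of options held per unit strike
    k_max, n -- truncation and resolution of the strike grid (assumed values)
    """
    edges = np.linspace(0.0, k_max, n + 1)
    mids = 0.5 * (edges[:-1] + edges[1:])
    dk = edges[1] - edges[0]
    return float(np.sum(quantity(mids) * vanilla(y, mids, kappa)) * dk)


def lambda_variance(y, kappa):
    return -2.0 * math.log(y / kappa) + 2.0 * y / kappa - 2.0          # equation (11)


def lambda_arithmetic(y, kappa):
    return (y - kappa) ** 2                                             # equation (12)


def lambda_gamma(y, kappa, y0):
    return 2.0 / y0 * (y * math.log(y / kappa) - y + kappa)             # equation (14)


if __name__ == "__main__":
    kappa = y0 = 100.0  # hypothetical put/call separator and initial price
    for y in (80.0, 120.0):
        print(y,
              lambda_variance(y, kappa), strip_value(y, kappa, lambda k: 2.0 / k ** 2),
              lambda_gamma(y, kappa, y0), strip_value(y, kappa, lambda k: 2.0 / (y0 * k)))
```

Only strikes between the terminal price and κ contribute, so a finite strike range suffices whenever it covers both; with listed strikes the integral would be replaced by a sum over the available options.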

If the function $y \mapsto y - \delta_m(y)$ has an inverse $f_m : I \to I$, and if $Y$ is still continuous on each $[t_{m-1}, t_m)$, then each term in equation (15) can be constructed via equation (7), together with the relation $\lambda(Y_{t_m-}) = \lambda(f_m(Y_{t_m}))$. Specifically, the $m$th term admits replication by holding statically a claim that pays at time $T_{pay}$

$$\lambda(f_m(Y_{t_m})) - \lambda(Y_{t_{m-1}}) + \int_{t_{m-1}}^{t_m} (q_\tau - r_\tau)\,\lambda_y(Y_\tau)\,Y_\tau\,d\tau \tag{16}$$

and holding dynamically $-\lambda_y(Y_t)\,Z_t$ shares, at each time $t \in (t_{m-1}, t_m)$.

Contract Specifications in Practice

In practice, weighted variance swap transactions are forward settled; no payment occurs at time 0, and at time $T_{pay}$ the party long the swap receives the total payment

$$\text{Notional} \times \big(\text{Floating} - \text{Fixed}\big) \tag{17}$$

where "fixed" (also known as the strike), expressed in units of annualized variance, is the price contracted at time 0 for time-$T_{pay}$ delivery of "floating", an annualized discretization of equation (15) that monitors $Y$, typically daily, for $N$ periods. In the usual case of $\varphi = \log$, this results in a specification

$$\text{Floating} := \text{Annualization} \times \sum_{n=1}^{N} w(Y_n)\left(\log\frac{Y_n + D_n}{Y_{n-1}}\right)^2 \tag{18}$$

where $D_n$ denotes the discrete dividend payment, if any, of the $n$th period. Both here and in the theoretical form (15), no adjustment is made for any dividends deemed to be continuous (for example, index variance contracts typically do not adjust for index dividends; see [3]).
In some contracts, for example single-stock (down-)variance, the risk to the variance seller that $Y$ crashes is limited by imposing a cap on the payoff. Hence,

$$\text{Notional} \times \big(\min(\text{Floating}, \text{Cap} \times \text{Fixed}) - \text{Fixed}\big) \tag{19}$$

replaces equation (17), where "Cap" is an agreed constant, such as the square of 2.5.

References

[1] Carr, P. & Madan, D. (1998). Towards a theory of volatility trading, in Volatility, R. Jarrow, ed, Risk Publications, pp. 417–427.
[2] Carr, P. & Lee, R. (2009). Hedging Variance Options on Continuous Semimartingales, Forthcoming in Finance and Stochastics.
[3] Overhaus, M., Bermúdez, A., Buehler, H., Ferraris, A., Jordinson, C. & Lamnouar, A. (2007). Equity Hybrid Derivatives, John Wiley & Sons.

Related Articles

Corridor Variance Swap; Gamma Swap; Variance Swap.

ROGER LEE
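The settlement formulas (17) to (19) of the weighted variance swap article translate directly into code. The following is a minimal sketch under stated assumptions: the function names, the price path, the annualization factor, and the corridor level are hypothetical, and the weight is evaluated at the end-of-period price exactly as written in equation (18).

```python
import math


def floating_leg(prices, dividends, weight, annualization):
    """Floating leg of equation (18) for phi = log.

    prices    -- observations Y_0, ..., Y_N
    dividends -- D_1, ..., D_N (use zeros when no adjustment applies)
    weight    -- function w(y) applied to Y_n
    """
    if len(dividends) != len(prices) - 1:
        raise ValueError("need exactly one dividend entry per period")
    total = 0.0
    for n in range(1, len(prices)):
        log_return = math.log((prices[n] + dividends[n - 1]) / prices[n - 1])
        total += weight(prices[n]) * log_return ** 2
    return annualization * total


def swap_payoff(notional, floating, fixed, cap=None):
    """Equation (17), or the capped form (19) when a cap multiple is given."""
    if cap is not None:
        floating = min(floating, cap * fixed)
    return notional * (floating - fixed)


if __name__ == "__main__":
    path = [100.0, 97.0, 101.0, 94.0, 96.0]   # hypothetical closes
    divs = [0.0, 0.0, 1.0, 0.0]                # hypothetical discrete dividends
    factor = 252 / (len(path) - 1)             # assumed annualization convention

    variance = floating_leg(path, divs, lambda y: 1.0, factor)
    gamma = floating_leg(path, divs, lambda y: y / path[0], factor)
    down = floating_leg(path, divs, lambda y: 1.0 if y < 98.0 else 0.0, factor)
    print(variance, gamma, down)
    print(swap_payoff(1_000_000, variance, 0.04, cap=2.5 ** 2))
```

The three weights correspond to the variance, gamma, and down-corridor examples listed at the start of the article; other weights only require a different `weight` function.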
Model Calibration

The fundamental theorem of asset pricing (see Fundamental Theorem of Asset Pricing) shows that, in an arbitrage-free market, market prices can be represented as (conditional) expectations with respect to a martingale measure $\mathbb{Q}$: a probability measure $\mathbb{Q}$ on the set $\Omega$ of possible trajectories $(S_t)_{t\in[0,T]}$ of the underlying asset such that the asset price $S_t/N_t$ discounted by the numeraire $N_t$ is a martingale. The value $V_t(H_T)$ of a (discounted) terminal payoff $H_T$ at $T$ is then given by

$$V_t(H_T) = E^{\mathbb{Q}}[B(t, T)\,H_T \mid \mathcal{F}_t] \tag{1}$$

where $B(t, T) = N_t/N_T$ is the discount factor. For example, the value under the pricing rule $\mathbb{Q}$ of a call option with strike $K$ and maturity $T$ is given by $E^{\mathbb{Q}}[B(t, T)(S_T - K)^+ \mid \mathcal{F}_t]$. However, this result does not say how to construct the pricing measure $\mathbb{Q}$. Given that data sets of option prices have become increasingly available, a common approach for selecting a pricing model $\mathbb{Q}$ is to choose, given a set of liquidly traded derivatives with (discounted) terminal payoffs $(H_i)_{i\in I}$ and market prices $(C_i)_{i\in I}$, a pricing measure $\mathbb{Q}$ compatible with the observed market prices:

Problem 1 [Calibration Problem] Given market prices $(C_i)_{i\in I}$ (say at date $t = 0$) for a set of options with discounted terminal payoffs $(H_i)_{i\in I}$, construct a probability measure $\mathbb{Q}$ on $\Omega$ such that

• the (discounted) asset price $(S_t)_{t\in[0,T]}$ is a martingale under $\mathbb{Q}$:
$$T \ge t \ge u \ge 0 \;\Rightarrow\; E^{\mathbb{Q}}[S_t \mid \mathcal{F}_u] = S_u \tag{2}$$
• the pricing rule implied by $\mathbb{Q}$ is consistent with market prices:
$$\forall i \in I, \quad E^{\mathbb{Q}}[H_i] = C_i \tag{3}$$

where, for ease of notation, we have set discount factors to 1 (prices are discounted) and $E[\cdot]$ denotes the conditional expectation given initial information $\mathcal{F}_0$. Thus, a pricing rule $\mathbb{Q}$ is said to be calibrated to the benchmark instruments $H_i$ if the value of these instruments, computed in the model, corresponds to their market prices $C_i$.
Because option prices are evaluated as expectations, this inverse problem can also be interpreted as a (generalized) moment problem for the law $\mathbb{Q}$ of the risk-neutral process. Given a finite number of option prices, it is typically an ill-posed problem and can have many solutions. However, the number of observed options can be large (100–200 for index options), and finding even a single solution is not obvious and requires efficient numerical algorithms.
In the Black–Scholes model (see Black–Scholes Formula), calibration amounts to picking the volatility parameter to be equal to the implied volatility of a traded option. However, if more than one option is traded, the Black–Scholes model cannot be calibrated to market prices, since in most options markets implied volatility varies across strikes and maturities; this is the volatility smile phenomenon. Therefore, to solve the calibration problem, we need more flexible models, some examples of which are given here.

Example 1 [Diffusion Model (see Local Volatility Model)] If an asset price is modeled as a diffusion process

$$dS_t = S_t[\mu\,dt + \sigma(t, S_t)\,dW_t] \tag{4}$$

parameterized by a local volatility function

$$\sigma : (t, S) \to \sigma(t, S) \tag{5}$$

then the values of call options can be computed by solving the Dupire equation (see Implied Volatility Surface)

$$\frac{\partial C_0}{\partial T} + Kr\,\frac{\partial C_0}{\partial K} - \frac{K^2\sigma^2(T, K)}{2}\,\frac{\partial^2 C_0}{\partial K^2} = 0, \qquad \forall K \ge 0,\ C_0(T = 0, K) = (S - K)^+ \tag{6}$$

The corresponding inverse problem is to find a (smooth) volatility function $\sigma : [0, T] \times \mathbb{R}_+ \to \mathbb{R}_+$ such that $C^\sigma(T_i, K_i) = C^*(T_i, K_i)$, where $C^\sigma$ is the solution of equation (6) and $C^*(T_i, K_i)$ are the market prices of call options.

Example 2 In an exponential-Lévy model $S_t = \exp X_t$, where $X_t$ is a Lévy process (see Exponential Lévy Models) with diffusion coefficient $\sigma > 0$ and Lévy measure $\nu$, call prices $C^{\sigma,\nu}(t_0, S_0; T_i, K_i)$ are easily computed using Fourier-based methods (see

Fourier Methods in Options Pricing). The calibration problem is to find $\sigma, \nu$ such that

$$\forall i \in I,\quad C^{\sigma,\nu}(t_0,S_0;T_i,K_i) = C^{*}(T_i,K_i) \tag{7}$$

This is an example of a nonlinear inverse problem where the parameter lies in a space of measures.

Example 3 In the LIBOR market model, a set of N interest rates (LIBOR rates) is modeled as a diffusion process $L_t = (L^i_t)_{i=1..N}$ with constant covariance matrix $\Sigma = {}^t\sigma\,\sigma \in \mathrm{Sym}^+(n\times n)$:

$$dL^i_t = \mu^i_t\,dt + L^i_t\,\sigma_i \cdot dW_t \tag{8}$$

This model can then be used to analytically price caps, floors, and swaptions (using a lognormal approximation), whose prices depend on the entries of the covariance matrix $\Sigma$. The calibration problem is to find a symmetric positive semidefinite matrix $\Sigma \in \mathrm{Sym}^+(n\times n)$ such that the model prices $C^{\Sigma}$ match market prices

$$\forall i \in I,\quad C^{\Sigma}(T_i,K_i) = C^{*}(T_i,K_i) \tag{9}$$

This problem can be recast as a semidefinite programming problem [3].

Other examples include the construction of yield curves from bond prices (see Bond Options), calibration of term structure models (see Term Structure Models) to bond prices, recovering the distribution of volatility from option prices [28], calibration to American options in diffusion models [1], and recovery of portfolio default rates from market quotes of credit derivatives [16, 18].

These problems are typically ill-posed in the sense that either solutions may not exist (the model class is too narrow to reproduce observations) or solutions are not unique (if data is finite or sparse). In practice, existence of a solution is restored by formulating the problem as an optimization problem

$$\inf_{\theta\in E} F(C^{\theta} - C) \tag{10}$$

where E is the parameter space and F is a loss function applied to the discrepancy $C^{\theta} - C$ between market prices and model prices. An algorithm is then used to retrieve one solution, and the main issue is the stability of this reconstructed solution as a function of inputs (market prices).

Inversion Formulas

In the theoretical situation where prices of European options are available for all strikes and maturities, the calibration problem can sometimes be explicitly solved using an inversion formula.

For the diffusion model in Example 1, the Dupire formula [25] (see Dupire Equation):

$$\sigma(T,K) = \sqrt{\frac{\dfrac{\partial C_0}{\partial T} + Kr\,\dfrac{\partial C_0}{\partial K}}{\dfrac{K^2}{2}\,\dfrac{\partial^2 C_0}{\partial K^2}}} \tag{11}$$

allows one to invert the volatility function from call option prices. Similar formulas can be obtained in credit derivative pricing models, for inverting portfolio default rates from collateralized debt obligation (CDO) tranche spreads [16], and in pure jump models with state-dependent jump intensity ("local Lévy" model) [12]. No such inversion formula is available in the case of American options (see American Options). The Dupire formula (11) has been widely used by practitioners for recovering the local volatility function from call/put option prices by interpolating in strike and maturity and applying equation (11). However, since equation (11) involves differentiating the inputs, it suffers from instability and sensitivity to small changes in inputs, as shown in Figure 1. This instability deters one from using inversion formulas such as equation (11) even in the rare cases where they exist.

Least-squares Formulation

Typically, if the model is misspecified, the observed option prices may not lie within the range of prices attainable by the model. Also, option prices are defined up to a bid–ask spread: a model may generate prices compatible with the market but may not exactly fit the mid-market prices for any given $\theta \in E$. For these reasons, one often reformulates calibration as a least-squares problem

$$\inf_{\theta\in E} J_0(\theta), \qquad J_0(\theta) = \sum_{i=1}^{I} w_i\,|C_i(\theta) - C_i|^2 \tag{12}$$

where $C_i$ are mid-market quotes and $w_i > 0$ are a set of weights, often chosen inversely proportional to the (squared) bid–ask spread of $C_i$.
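The least-squares formulation (12) can be illustrated with a minimal sketch. It rests on assumptions that are not part of the article: the model is a one-parameter Black–Scholes model (so θ is a single volatility), the quotes are synthetic, and the minimization is a plain grid search. The names `bs_call`, `pricing_error`, and `calibrate_vol` are hypothetical helpers introduced only for this example.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(spot, strike, maturity, vol, rate=0.0):
    """Black-Scholes call price, standing in for the model price C_i(theta)."""
    sd = vol * math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / sd
    d2 = d1 - sd
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)

def pricing_error(vol, quotes, spot):
    """J0(theta) = sum_i w_i |C_i(theta) - C_i|^2 with w_i = 1 / spread_i^2."""
    return sum(
        (bs_call(spot, strike, maturity, vol) - mid) ** 2 / spread ** 2
        for strike, maturity, mid, spread in quotes
    )

def calibrate_vol(quotes, spot):
    """Grid search over theta in E = [0.01, 1.00] (an assumed parameter space)."""
    grid = [0.01 + 0.001 * k for k in range(991)]
    return min(grid, key=lambda vol: pricing_error(vol, quotes, spot))

# Synthetic quotes (strike, maturity, mid price, bid-ask spread): generated from
# a volatility of 0.25 and shifted slightly, as mid quotes inside a spread would be.
SPOT = 100.0
quotes = [
    (90.0, 0.5, bs_call(SPOT, 90.0, 0.5, 0.25) + 0.010, 0.05),
    (100.0, 0.5, bs_call(SPOT, 100.0, 0.5, 0.25) - 0.010, 0.02),
    (110.0, 0.5, bs_call(SPOT, 110.0, 0.5, 0.25) + 0.005, 0.05),
]
calibrated = calibrate_vol(quotes, SPOT)
```

Weighting by the inverse squared spread makes the tightly quoted at-the-money option dominate the fit, which is the effect the weights in (12) are meant to achieve. A grid search is used here only because it makes no convexity assumption about the objective.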
[Figure 1 panels: u1(t,x) and u2(t,x) (call price functions, left); s1(t,x) and s2(t,x) (local volatilities, right); axes t and x.]

Figure 1 Extreme sensitivity of the Dupire formula to noise in the data. Two examples of call price functions (left) and their corresponding local volatilities (right). The prices differ through IID noise ∼ UNIF(0, 0.001), representing a bid–ask spread
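The sensitivity shown in Figure 1 can be reproduced in a small, hedged sketch. The assumptions here are not from the article: call prices come from a Black–Scholes model with constant volatility 0.2 and rate 0.02, the derivatives in the Dupire formula are replaced by central finite differences, and the "noise" is a single deterministic bump of 0.001 on one price rather than IID noise. The function names are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(spot, strike, maturity, vol, rate):
    """Black-Scholes call price used to generate a smooth price surface."""
    sd = vol * math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / sd
    d2 = d1 - sd
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)

def dupire_local_vol(price, maturity, strike, rate, d_t=0.01, d_k=1.0):
    """Dupire inversion formula with central finite differences.

    `price(T, K)` is any callable returning the call price surface.
    """
    c_t = (price(maturity + d_t, strike) - price(maturity - d_t, strike)) / (2 * d_t)
    c_k = (price(maturity, strike + d_k) - price(maturity, strike - d_k)) / (2 * d_k)
    c_kk = (price(maturity, strike + d_k) - 2 * price(maturity, strike)
            + price(maturity, strike - d_k)) / d_k ** 2
    return math.sqrt((c_t + rate * strike * c_k) / (0.5 * strike ** 2 * c_kk))

SPOT, RATE, VOL = 100.0, 0.02, 0.2

def clean_price(maturity, strike):
    return bs_call(SPOT, strike, maturity, VOL, RATE)

def bumped_price(maturity, strike):
    # One quote moved by 0.001, well inside a typical bid-ask spread.
    bump = 0.001 if (maturity, strike) == (1.0, 100.0) else 0.0
    return clean_price(maturity, strike) + bump

clean_vol = dupire_local_vol(clean_price, 1.0, 100.0, RATE)
bumped_vol = dupire_local_vol(bumped_price, 1.0, 100.0, RATE)
```

On clean prices the formula recovers the input volatility closely, while a price change of one thousandth moves the recovered volatility by roughly a full volatility point, because the second difference in strike divides the bump by the squared strike spacing. A finer strike grid makes this amplification worse, not better.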

In most models, the call prices are computed numerically via Fourier transform (see Fourier Methods in Options Pricing) or by solving a partial differential equation (PDE) (see Partial Differential Equations). However, in many situations (short or long maturity, small vol–vol, etc.) approximation formulae for implied volatilities $\Sigma(T_i,K_i)$ of call options are available [5, 10, 11, 30] in terms of model parameters (see Implied Volatility in Stochastic Volatility Models; Implied Volatility: Volvol Expansion; Implied Volatility: Long Maturity Behavior; SABR Model). In these situations, parameters are calibrated by a least-squares fit to the approximate formula:

$$\inf_{\theta\in E} \sum_{i=1}^{I} w_i\,|\Sigma(T_i,K_i;\theta) - \Sigma^{*}(T_i,K_i)|^2 \tag{13}$$

An example is the SABR model (see SABR Model), whose popularity is almost entirely due to its ease of calibration using the Hagan formula [30].

In most cases, option prices $C_i(\theta)$ depend continuously on $\theta$ and E is a subset of a finite dimensional space (i.e., there are a finite number of bounded parameters), so the least-squares formulation always admits a solution. However, the solution of equation (12) need not be unique: $J_0$ may, in fact, have several global minima when the observed option prices do not uniquely identify the model. Figures 2 and 3 show examples of the function $J_0$ for some popular parametric option pricing models, computed using a data set of DAX index options prices on May 11, 2001. The pricing error in the Heston stochastic volatility model (see Heston Model), shown in Figure 2 as a function of the "volatility of volatility" and the mean reversion rate, displays a line of local minima. The pricing error for the variance gamma model (see Variance-gamma Model) in Figure 3 displays a nonconvex profile, with two distinct minima in the range
[Figure 2 surface plot, titled "Pricing error in Heston model: SP500 options data, 2000."; vertical axis: log error; horizontal axes: mean reversion parameter and volatility of volatility.]

Figure 2 Error surface for the Heston stochastic volatility model, DAX options

[Figure 3 surface plot; vertical scale × 10^5; horizontal axis labels extracted as k and s.]

Figure 3 Error surface for variance gamma (pure jump) model, DAX options

of observed values. These examples show that, even if the number of observations (option prices) is much higher than the number of parameters, this does not imply identifiability of parameters.

Regularization methods can be used to overcome this problem [27]. A common method is to add a convex penalty term R, called the regularization term, to the pricing error and solve the auxiliary problem:

$$\inf_{\theta\in E} J_\alpha(\theta) \tag{14}$$

where

$$J_\alpha(\theta) = J_0(\theta) + \alpha R(\theta) \tag{15}$$
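The effect of the penalty in (14)–(15) can be seen on a toy problem. This is a sketch under assumptions that are not in the article: the pricing error is the artificial function J0(θ) = (θ1 + θ2 − 1)², which has a whole line of minima (a crude stand-in for a non-identifiable model), the penalty is the squared distance to a prior parameter, and the minimizer is plain gradient descent with a fixed step.

```python
def pricing_error_gradient(theta):
    """Gradient of the toy error J0(theta) = (theta[0] + theta[1] - 1)^2."""
    residual = theta[0] + theta[1] - 1.0
    return [2.0 * residual, 2.0 * residual]

def minimize_regularized(alpha, start, prior=(0.0, 0.0), step=0.1, iterations=20000):
    """Gradient descent on J_alpha = J0 + alpha * |theta - prior|^2."""
    theta = list(start)
    for _ in range(iterations):
        grad = pricing_error_gradient(theta)
        theta = [
            value - step * (g + 2.0 * alpha * (value - center))
            for value, g, center in zip(theta, grad, prior)
        ]
    return theta

# Without regularization the answer depends on the starting point.
plain_a = minimize_regularized(0.0, (3.0, -1.0))
plain_b = minimize_regularized(0.0, (0.0, 0.0))

# With a small penalty both starting points reach the same solution.
regularized_a = minimize_regularized(0.01, (3.0, -1.0))
regularized_b = minimize_regularized(0.01, (0.0, 0.0))
```

For this toy objective the regularized minimizer can be worked out by hand as θ1 = θ2 = 1/(2 + α), so the penalty selects the point on the line of minima closest to the prior, at the cost of a small bias in the fit.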
The functional (15) consists of two parts: the regularization term $\alpha R(\theta)$, which is convex in its argument, and the quadratic pricing error, which measures the precision of calibration. The coefficient $\alpha$, called the regularization parameter, defines the relative importance of the two terms: it characterizes the trade-off between prior knowledge and the information contained in option prices. $J_\alpha(\cdot)$ is usually minimized by gradient-based methods, where the crux of the algorithm is an efficient computation of the gradient $\nabla_\theta J$.

When the parameter is a function (such as the local volatility function), the regularization term is often chosen to be a smoothness (e.g., Sobolev) norm. This method, called Tikhonov regularization (see Tikhonov Regularization), has been applied to diffusion models [1, 2, 13, 23, 26] and to exponential-Lévy models [19].

Another popular choice of regularization term is the relative entropy (see Entropy-based Estimation) $R(\theta) = H(\mathbb{Q}_\theta \mid \mathbb{P})$ with respect to a prior probability measure $\mathbb{P}$. In continuous-time models, relative entropy can be used as a regularization criterion only if the prior possesses a nonempty class of equivalent martingale measures, that is, it corresponds to an incomplete market model (see Complete Markets). From a calibration perspective, market incompleteness (i.e., the nonuniqueness of the equivalent martingale measure) is therefore an advantage: it allows one to reconcile compatibility with option prices and equivalence with respect to a reference probability measure. Examples are provided by jump processes (see Jump Processes; Exponential Lévy Models) or reduced-form credit risk models (see Reduced Form Credit Risk Models): one can modify the jump size distribution (Lévy measure) or the default intensity while preserving equivalence (see Equivalence of Probability Measures) of measures [18, 20]. For Lévy processes (see Exponential Lévy Models), the relative entropy term $H(\nu)$ is computable in terms of the Lévy measure $\nu$ [21]. The calibration problem then takes the following form:

Problem 2 Given a prior Lévy process with law $\mathbb{P}_0$ and characteristics $(\sigma_0, \nu_0)$, find a Lévy measure $\nu$ which minimizes

$$J_\alpha(\nu) = \alpha H(\nu) + \sum_{i=1}^{N} w_i\,\big(C_0^{\nu}(T_i,K_i) - C_0(T_i,K_i)\big)^2 \tag{16}$$

This regularized formulation has the advantage that its solution exhibits continuous dependence on market prices and with respect to the choice of the prior model [21, 22].

Simpler regularization methods can be used in settings where prices are computed using analytical transform methods. Belomestny & Reiss [8] propose a spectral regularization method for calibrating exponential-Lévy models. d'Aspremont [3] formulates the calibration of LIBOR market models (Example 3) as semidefinite programming problems under constraints.

Different regularization terms select different solutions: Tikhonov regularization approximates the least-squares solution with smallest norm [27], while entropy-based regularization selects the minimum-entropy least-squares solution [22].

Entropy Minimization Under Calibration Constraints

An alternative approach to regularization is to select a pricing model $\mathbb{Q}$ by minimizing the relative entropy (see Entropy-based Estimation) of the probability measure $\mathbb{Q}$ with respect to a prior, under calibration constraints

$$\inf_{\mathbb{Q}\sim\mathbb{P}} H(\mathbb{Q} \mid \mathbb{P}) \quad\text{under}\quad C_i = E^{\mathbb{Q}}[H_i] \text{ for } i \in I \tag{17}$$

Relative entropy being strictly convex, any solution of equation (17) is unique and can be computed in a stable manner using Lagrange multiplier (dual) methods [24] (see Convex Duality).

Application of these ideas to a set of scenarios leads to the weighted Monte Carlo algorithm (see Weighted Monte Carlo) [6]: one first simulates N sample paths $\Omega_N = \{\omega_1, \dots, \omega_N\}$ from a prior model $\mathbb{P}$ and then solves the above problem (17) using as prior the uniform distribution $\mathbb{P}_N$ on $\Omega_N$. The idea is to weight the paths in order to verify the calibration constraints. The weights $(\mathbb{Q}_N(\omega_i),\ i = 1..N)$ are constructed by minimizing relative entropy under calibration constraints

$$\inf_{\mathbb{Q}_N \in \mathcal{P}(\Omega_N)} \sum_{i=1}^{N} \mathbb{Q}_N(\omega_i)\,\ln\frac{\mathbb{Q}_N(\omega_i)}{\mathbb{P}_N(\omega_i)} \quad\text{under}$$
$$\sum_{i=1}^{N} \mathbb{Q}_N(\omega_i)\,G_j(\omega_i) = C_j \tag{18}$$

This constrained optimization problem is solved by duality [6, 24]: the dual has an explicit solution, in the form of a Gibbs–Boltzmann measure [4, 6] (see Entropy-based Estimation). A (discounted) payoff X is then priced using the same set of simulated paths via

$$E^{\mathbb{Q}_N}[X] = \sum_{i=1}^{N} \mathbb{Q}_N(\omega_i)\,X(\omega_i) = \frac{1}{N}\sum_{i=1}^{N} \frac{\mathbb{Q}_N(\omega_i)}{\mathbb{P}_N(\omega_i)}\,X(\omega_i) \tag{19}$$

The benchmark payoffs (calibration instruments) play the role of biased control variates, leading to variance reduction [29]:

$$E^{\mathbb{Q}_N}[X] = E^{\mathbb{Q}_N}\Big[X - \sum_{i=1}^{I} \alpha_i H_i\Big] + \sum_{i=1}^{I} \alpha_i C_i \tag{20}$$

This method yields, as a by-product, a static hedge portfolio $\alpha_i^{*}$, which minimizes the variance in equation (20) [3, 6, 17].

A drawback is that the martingale property is lost in this process since it would correspond to an infinite number of constraints. As a result, derivative prices computed with the weighted Monte Carlo algorithm may fail to verify arbitrage relations across maturities (e.g., calendar spread relations), especially when applied to forward-starting contracts.

These arbitrage constraints can be restored by representing $\mathbb{Q}$ as a random mixture of martingales, the law of the random mixture being chosen via relative entropy minimization under calibration constraints [17]. This results in an arbitrage-free version of the weighted Monte Carlo approach, which is applied to recovering covariance matrices implied by index options in [15].

Stochastic Control Methods

In certain continuous-time models, the relative entropy minimization approach can be mapped, via a duality argument, into a stochastic control problem, which can then be solved using dynamic programming techniques. Consider a Markovian model where the state variable $X_t$ (asset price, interest rate, ...) follows a stochastic differential equation

$$dX_t = \mu_\theta(t)\,dt + \sigma_\theta(t,X_t)\,dW_t + \int \gamma_\theta(t,X_{t-})\,\mu(dt\,dz) \tag{21}$$

where W is a Wiener process and $\mu$ a compensated Poisson random measure with intensity $\nu_\theta(dz)\,\lambda_\theta(t)\,dt$. The coefficients of the model are parameterized by some parameter $\theta \in E$; in a nonparametric setting, $\theta$ is just the coefficient itself and E is a functional space. Denote the law of the solution by $\mathbb{Q}_\theta$. Consider now the case where the calibration criterion $J(\cdot)$ can be expressed as an expected value $J(\theta) = E^{\mathbb{Q}_\theta}\big[\int_0^T \varphi(X_t)\,dt\big]$ with a strictly convex function $\varphi(\cdot)$. A classical approach to solve the calibration problem

$$\inf_{\theta\in E} J(\theta), \quad\text{under}\quad E^{\mathbb{Q}_\theta}[H_i] = C_i \tag{22}$$

is to introduce the Lagrangian functional

$$L(\theta,\lambda) = J(\theta) - \sum_{i\in I} \lambda_i\,\big(E^{\mathbb{Q}_\theta}[H_i] - C_i\big) = E^{\mathbb{Q}_\theta}\Big[\int_0^T \varphi(X_t)\,dt - \sum_{i\in I} \lambda_i\,(H_i - C_i)\Big] \tag{23}$$

where $\lambda_i$ is the Lagrange multiplier associated with the calibration constraint for payoff $H_i$. The dual problem associated with the constrained minimization problem (22) is given by

$$\inf_{\theta\in E} L(\theta,\lambda) = \inf_{\theta\in E} E^{\mathbb{Q}_\theta}\Big[\int_0^T \varphi(X_t)\,dt - \sum_{i\in I} \lambda_i\,(H_i - C_i)\Big] \tag{24}$$

It can be viewed as a stochastic control problem (see Stochastic Control) with running cost $\varphi(X_t)$ and a terminal cost determined by the term $\sum_{i\in I}\lambda_i(H_i - C_i)$ in (24).

This original formulation of the calibration problem was first presented by Avellaneda et al. [7] in the
context of a diffusion model with unknown volatility

$$dS_t = S_t\,\sigma(t,S_t)\,dW_t \tag{25}$$

The calibration criterion in [7] was chosen to be

$$J(\sigma) = E^{\mathbb{Q}_\sigma}\Big[\int_0^T dt\;\eta\big(\sigma^2(t,X^{\sigma}_t)\big)\Big] \tag{26}$$

where $\eta$ is a strictly convex function. Duality between (22) and (24) is not obvious in this case since the Lagrangian is not convex with respect to its argument [31]. The stochastic control approach can also be applied in the context of model calibration by relative entropy minimization for classes of models where absolute continuity is preserved under a change of parameters, such as models with jumps. Cont and Minca [18] use this approach for retrieving the default rate in a portfolio from CDO tranche spreads indexed on the portfolio.

Stochastic Algorithms

Objective functions used in calibration (with the exception of entropy-based methods) are typically nonconvex, even after regularization, leading to multiple minima and lack of convergence in gradient-based methods. Stochastic algorithms known as evolutionary algorithms, which contain simulated annealing as a special case and have been widely used for global nonconvex optimization, are natural candidates for solving such problems [9].

Suppose, for instance, we want to minimize the pricing error

$$J_0(\theta) = \sum_{i=1}^{I} w_i\,|C_i^{\theta} - C_i|, \qquad \theta \in E \tag{27}$$

where $C_i^{\theta}$ are model prices and $C_i$ are observed (transaction or mid-market) prices for the benchmark options. Now define the a priori error level $\delta$ as

$$\delta = \sum_{i=1}^{I} w_i\,|C_i^{bid} - C_i^{ask}| \tag{28}$$

Given the uncertainty on option values due to bid–ask spreads, one cannot meaningfully distinguish a "perfect" fit $J_0(\theta) = 0$ from any other fit with $J_0(\theta) \le \delta$. Therefore, all parameter values in the level set $G_\delta = \{\theta \in E,\ J_0(\theta) \le \delta\}$ correspond to models that are compatible with the market data $(C_i^{bid}, C_i^{ask})_{i=1..I}$. An evolutionary algorithm simulates an inhomogeneous Markov chain $(X_n)_{n\ge 1}$ in $E^N$ which undergoes mutation–selection cycles [9] designed such that, as the number of iterations n grows, the components $(\theta^n_1, \dots, \theta^n_N)$ of $X_n$ converge to the level set $G_\delta$, yielding a population of points $(\theta_k)$ which converges to a sample of model parameters compatible with the market data $(C_i^{bid}, C_i^{ask})_{i=1..I}$ in the sense that $J_0(\theta_k) \le \delta$. We thus obtain a population of N model parameters calibrated to market data, which can be different, especially if the initial problem has multiple solutions.

Figure 4 shows a sample of local volatility functions obtained using this approach [9]. These examples illustrate that precise reconstruction of local volatility from call option prices is at best illusory; the parameter uncertainty is too important to be ignored, especially for short maturities where it does not affect the prices very much; short-term volatility hovers anywhere between 15% and 30%. These observations cast doubt on the information content of very short-term options in terms of volatility and question whether one can rely solely on short maturity asymptotics (see SABR Model) in model calibration.

Parameter Uncertainty

Model calibration is usually the first step in a procedure whose ultimate purpose is the pricing and hedging of (exotic) options. Once the model parameter $\theta$ is calibrated to market prices, it is used to compute a model-dependent quantity $f(\theta)$ (the price of an exotic option or a hedge ratio) using a numerical procedure. Given the ill-posedness of the calibration problem and the resulting uncertainty on the solution $\theta$, one question is the impact of this uncertainty on such model-dependent quantities. This aspect is often neglected in practice and many users of pricing models view the calibrated parameter as fixed, equating calibration with a curve-fitting exercise.

Particle methods yield, as a by-product, a way to analyze model uncertainty. While calibration algorithms based on deterministic optimization yield a point estimate for model parameters, particle methods yield a population $\mathcal{Q} = \{\theta_1, \dots, \theta_k\}$ of pricing models, all of which price the benchmark options with equivalent precision $E^{\mathbb{Q}}(H_i) \in [C_i^{bid}, C_i^{ask}]$. The
[Figure 4 surface plot, titled "Confidence intervals for local volatility: DAX options."; horizontal axes: S/S0 and t; vertical axis: local volatility.]

Figure 4 A sample of local volatility surfaces calibrated to DAX options
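The idea of using a whole population of calibrated models, rather than a single point estimate, to bracket the price of a non-benchmark payoff can be sketched as follows. Every specific here is an assumption made for illustration: the models are Black–Scholes models indexed by a flat volatility, the benchmark is a single at-the-money call with a made-up bid and ask, the candidate population is a fixed grid rather than the output of an evolutionary algorithm, and an out-of-the-money call stands in for the exotic payoff.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(spot, strike, maturity, vol):
    """Black-Scholes call price with zero interest rate."""
    sd = vol * math.sqrt(maturity)
    d1 = (math.log(spot / strike) + 0.5 * vol ** 2 * maturity) / sd
    return spot * norm_cdf(d1) - strike * norm_cdf(d1 - sd)

SPOT, MATURITY = 100.0, 1.0
BENCH_STRIKE, BENCH_BID, BENCH_ASK = 100.0, 7.7, 8.3  # hypothetical quote

# Candidate models: flat volatilities from 15% to 30% in steps of 0.5%.
candidates = [round(0.15 + 0.005 * k, 3) for k in range(31)]

# Keep every model whose benchmark price falls inside the bid-ask interval.
population = [
    vol for vol in candidates
    if BENCH_BID <= bs_call(SPOT, BENCH_STRIKE, MATURITY, vol) <= BENCH_ASK
]

def price_interval(payoff_price, models):
    """Smallest and largest price of a payoff across the calibrated models."""
    prices = [payoff_price(model) for model in models]
    return min(prices), max(prices)

low, high = price_interval(
    lambda vol: bs_call(SPOT, 120.0, MATURITY, vol), population
)
```

The width `high - low` is then a crude measure of how much the benchmark quotes leave the other payoff undetermined. In a realistic setting the population would come from a stochastic search over a much richer parameter space, and the payoff would be priced numerically in each model.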

heterogeneity of this population reflects the uncertainty in model parameters, which are left undetermined by the benchmark options. This idea can be exploited to produce a quantitative measure of model uncertainty compatible with observed market prices of benchmark instruments [14], by considering the interval of prices

$$\Big[\inf_{\mathbb{Q}\in\mathcal{Q}} E^{\mathbb{Q}}[X],\ \sup_{\mathbb{Q}\in\mathcal{Q}} E^{\mathbb{Q}}[X]\Big] \tag{29}$$

for a payoff X in the various calibrated models.

Another approach is to calibrate several different models to the same data and compare the value of the exotic option across models [14, 32]. Model uncertainty in derivative pricing is further discussed in [14].

Relation with Pricing and Hedging

Calibrating a model to market prices simply ensures that model prices of benchmark instruments reflect current "mark-to-market" values. It also ensures that the cost of a static hedge (see Static Hedging) using these benchmark instruments is correctly reflected in model prices: if a payoff H can be statically hedged with a portfolio containing $\alpha_i$ units of benchmark instrument $H_i$,

$$H = \alpha_0 + \sum_{i\in I} \alpha_i H_i \tag{30}$$

the cost $\alpha_0 + \sum_{i\in I} \alpha_i C_i$ of setting up the hedge is automatically equal to the model price $E^{\mathbb{Q}}[H]$.

Calibration does not entail that prices, hedge ratios, or risk parameters generated by the model are "correct" in any sense. This requires a correct model specification with realistic dynamics for risk factors. Indeed, many different models may be calibrated to the same prices of, say, a set of call options but lead to very different prices or hedge ratios for exotics [14, 32]. For example, any equity volatility smile can be reproduced by a one-factor diffusion model (see Example 1) via an appropriate specification of the local volatility surface, but there is ample evidence that volatility itself should be modeled as a risk factor (see Stochastic Volatility Models) and a one-factor diffusion may lead to an underestimation of volatility risk and unrealistic dynamics [30].

However, a model that is not calibrated to market prices of liquidly traded derivatives is typically not easy to use. For example, even if a payoff can be statically hedged with traded derivatives using an initial capital $V_0$, the model price will not be
equal to $V_0$. Thus, model prices will, in general, be inconsistent with hedging costs if the model is not calibrated. Thus, calibration seems a necessary but not sufficient condition for choosing a model for pricing and hedging.

References

[1] Achdou, Y. (2005). An inverse problem for a parabolic variational inequality arising in volatility calibration with American options, SIAM Journal on Control and Optimization 43, 1583–1615.
[2] Achdou, Y. & Pironneau, O. (2002). Volatility smile by multilevel least square, International Journal of Theoretical and Applied Finance 5(2), 619–643.
[3] d'Aspremont, A. (2005). Risk-management methods for the Libor market model using semidefinite programming, Journal of Computational Finance 8(4), 77–99.
[4] Avellaneda, M. (1998). The minimum-entropy algorithm and related methods for calibrating asset-pricing models, Proceedings of the International Congress of Mathematicians, Documenta Mathematica, Berlin, Vol. III, pp. 545–563.
[5] Avellaneda, M., Boyer-Olson, D., Busca, J. & Friz, P. (2002). Reconstructing the smile, Risk Magazine October.
[6] Avellaneda, M., Buff, R., Friedman, C., Grandchamp, N., Kruk, L. & Newman, J. (2001). Weighted Monte Carlo: a new technique for calibrating asset-pricing models, International Journal of Theoretical and Applied Finance 4, 91–119.
[7] Avellaneda, M., Friedman, C., Holmes, R. & Samperi, D. (1997). Calibrating volatility surfaces via relative entropy minimization, Applied Mathematical Finance 4, 37–64.
[8] Belomestny, D. & Reiss, M. (2006). Spectral calibration of exponential Lévy Models, Finance and Stochastics 10(4), 449–474.
[9] Ben Hamida, S. & Cont, R. (2004). Recovering volatility from option prices by evolutionary optimization, Journal of Computational Finance 8(3), 43–76.
[10] Berestycki, H., Busca, J. & Florent, I. (2004). Computing the implied volatility in stochastic volatility models, Communications on Pure and Applied Mathematics 57(10), 1352–1373.
[11] Bouchouev, I., Isakov, V. & Valdivia, N. (2002). Recovering a volatility coefficient by linearization, Quantitative Finance 2, 257–263.
[12] Carr, P., Geman, H., Madan, D.B. & Yor, M. (2004). From local volatility to local Lévy models, Quantitative Finance 4(5), 581–588.
[13] Coleman, T., Li, Y. & Verma, A. (1999). Reconstructing the unknown volatility function, Journal of Computational Finance 2(3), 77–102.
[14] Cont, R. (2006). Model uncertainty and its impact on the pricing of derivative instruments, Mathematical Finance 16(3), 519–547.
[15] Cont, R. & Deguest, R. (2009). What do index options imply about the dependence among stock returns? Columbia University Financial Engineering Report 2009-06, www.ssrn.com.
[16] Cont, R., Deguest, R. & Kan, Y.H. (2009). Default Intensities Implied by CDO Spreads: Inversion Formula and Model Calibration. Columbia University Financial Engineering Report 2009-04, www.ssrn.com.
[17] Cont, R. & Léonard, Ch. (2008). A Probabilistic Approach to Inverse Problems in Option Pricing. Working Paper.
[18] Cont, R. & Minca, A. (2008). Recovering Portfolio Default Intensities Implied by CDO Tranches. Financial Engineering Report 2008-01, Columbia University.
[19] Cont, R. & Rouis, M. (2006). Recovering Lévy Processes from Option Prices by Tikhonov Regularization. Working Paper.
[20] Cont, R. & Tankov, P. (2004). Financial Modelling with Jump Processes, Chapman and Hall/CRC Press, Boca Raton.
[21] Cont, R. & Tankov, P. (2004). Nonparametric calibration of jump-diffusion option pricing models, Journal of Computational Finance 7(3), 1–49.
[22] Cont, R. & Tankov, P. (2005). Recovering Lévy processes from option prices: regularization of an ill-posed inverse problem, SIAM Journal on Control and Optimization 45(1), 1–25.
[23] Crépey, S. (2003). Calibration of the local volatility in a trinomial tree using Tikhonov regularization, Inverse Problems 19, 91–127.
[24] Csiszár, I. (1975). I-divergence geometry of probability distributions and minimization problems, The Annals of Probability 3, 146–158.
[25] Dupire, B. (1994). Pricing with a smile, Risk 7, 18–20.
[26] Engl, H. & Egger, H. (2005). Tikhonov regularization applied to the inverse problem of option pricing: convergence analysis and rates, Inverse Problems 21, 1027–1045.
[27] Engl, H.W., Hanke, M. & Neubauer, A. (1996). Regularization of Inverse Problems, Mathematics and its Applications, Kluwer Academic Publishers, Dordrecht, The Netherlands, Vol. 375.
[28] Friz, P. & Gatheral, J. (2005). Valuing Volatility Derivatives as an Inverse Problem, Quantitative Finance, December 2005.
[29] Glasserman, P. & Yu, B. (2005). Large sample properties of weighted Monte Carlo estimators, Operations Research 53(2), 298–312.
[30] Hagan, P., Kumar, D., Lesniewski, A.S. & Woodward, D.E. Managing smile risk, Wilmott Magazine September, 84–108.
[31] Samperi, D. (2002). Calibrating a diffusion model with uncertain volatility, Mathematical Finance 12, 71–87.
[32] Schoutens, W., Simons, E. & Tistaert, J. (2004). A perfect calibration! Now what? Wilmott Magazine March.

Further Reading

Biagini, S. & Cont, R. (2006). Model-free representation of pricing rules as conditional expectations, in Stochastic Processes and Applications to Mathematical Finance, J. Akahori, S. Ogawa and S. Watanabe, eds, World Scientific, Singapore, pp. 53–66.
Harrison, J.M. & Pliska, S.R. (1981). Martingales and stochastic integrals in the theory of continuous trading, Stochastic Processes and their Applications 11, 215–260.

Related Articles

Black–Scholes Formula; Convex Duality; Dupire Equation; Entropy-based Estimation; Exponential Lévy Models; Implied Volatility in Stochastic Volatility Models; Implied Volatility: Large Strike Asymptotics; Jump Processes; Local Volatility Model; Markov Functional Models; SABR Model; Stochastic Volatility Models; Weighted Monte Carlo; Yield Curve Construction.

RAMA CONT
Dupire Equation

The Dupire equation is a partial differential equation (PDE) that links the contemporaneous prices of European call options of all strikes and maturities to the instantaneous volatility of the price process, assumed to be a function of price and time only. The main application of the equation is to compute (i.e., invert) local volatilities from market option prices to build a local volatility model, which many major banks currently use for option pricing.

If we assume that the price process S follows the stochastic differential equation

$$\frac{dS_t}{S_t} = \mu_t\,dt + \sigma(S_t,t)\,dW_t \tag{1}$$

then, if $C(S,t,K,T)$ denotes the price at time t, for an underlying price of S, of the European call of strike K and maturity T that pays $(S_T-K)^+$ at time T, C satisfies, for a fixed $(S,t)$, the Dupire equation

$$\frac{\partial C}{\partial T} = \frac{\sigma^2(K,T)}{2}\,K^2\,\frac{\partial^2 C}{\partial K^2} - (r-q)\,K\,\frac{\partial C}{\partial K} - qC \tag{2}$$

where r is the interest rate, q is the dividend yield (or foreign interest rate in the case of a currency), and the boundary condition at $T = t$ is given by $C(S,t,K,t) = (S-K)^+$.

This can be established by a variety of methods, including double integration of the Fokker–Planck equation, the Tanaka formula, and a replication strategy.

It is commonly named the forward equation, as it indicates how current call prices are affected by an increase in maturity. This can be contrasted with the classical backward Black–Scholes PDE that applies to a European call of fixed strike and maturity:

$$\frac{\partial C}{\partial t} = -\frac{\sigma^2(S,t)}{2}\,S^2\,\frac{\partial^2 C}{\partial S^2} - (r-q)\,S\,\frac{\partial C}{\partial S} + rC \tag{3}$$

Interpretation

The backward Black–Scholes equation applies to a given call option and relates its time derivative to its convexity. It is a heat equation that defines the price at a given time as the discounted expectation of the call price an instant later and the Jensen convexity bias θ depends on.

According to the forward Dupire equation, the cost of extending the maturity of a call depends on the probability of being at the strike at maturity and on the level of volatility there. It can be seen as relating the price of a calendar spread to the price of a butterfly spread.

Uses

The equation

$$\frac{\partial C}{\partial T} = \frac{\sigma^2(K,T)}{2}\,K^2\,\frac{\partial^2 C}{\partial K^2} - (r-q)\,K\,\frac{\partial C}{\partial K} - qC \tag{4}$$

can be used in the following two ways:

1. If the local volatility $\sigma(S,t)$ is known, the PDE can be used to compute the price today of all call options in a single sweep, starting from the boundary condition $C(S,t,K,t) = (S-K)^+$. In contrast, the Black–Scholes backward equation requires one PDE for each strike and maturity.
   In the case of calibrating a parametric form of $\sigma(S,t)$ to a set of market option prices, one needs to compute the model price of all these options, and the forward equation can accelerate the computation by a factor of 100.
2. If the call prices are known today, one can compute their derivatives and extract the local volatility by the following formula:

$$\sigma(K,T) = \sqrt{2\,\frac{\dfrac{\partial C}{\partial T} + (r-q)\,K\,\dfrac{\partial C}{\partial K} + qC}{K^2\,\dfrac{\partial^2 C}{\partial K^2}}} \tag{5}$$

   This equation is also known as the stripping formula.

Starting from a finite set of listed option prices, a good interpolation in strike and maturities provides a continuum of option prices and we can apply the stripping formula to get the local volatilities. Here is an example on the NASDAQ, where interpolation/extrapolation is performed by first fitting a stochastic volatility Heston model to the listed option
[Figure 1 surface plot; only axis tick values survive extraction.]

Figure 1 Implied volatility surface of the NASDAQ

[Figure 3 diagram with labels: European prices, Calibration, Local volatilities, Pricing, Exotic prices.]

Figure 3 Local volatilities give a way to price exotic options from European options
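The first use of the forward equation described in this article, pricing calls of all strikes in a single sweep when the local volatility is known, can be sketched with an explicit finite-difference scheme. This is only an illustration under stated assumptions: zero interest rate and dividend yield (so only the diffusion term remains), a uniform strike grid truncated at a hypothetical upper bound, and an explicit Euler step in maturity chosen small enough to be stable for the volatility used below. The function name and its parameters are made up for the example.

```python
import math

def forward_dupire_calls(spot, local_vol, k_max=300.0, n_k=150,
                         maturity=1.0, n_t=2000):
    """March dC/dT = 0.5 * sigma(K, T)^2 * K^2 * d2C/dK2 forward in maturity.

    Assumes r = q = 0. Returns the strike grid and the call prices at `maturity`.
    """
    d_k = k_max / n_k
    d_t = maturity / n_t
    strikes = [i * d_k for i in range(n_k + 1)]
    # Initial condition at zero time to maturity: C = (S - K)^+ for every strike.
    calls = [max(spot - strike, 0.0) for strike in strikes]
    for step in range(n_t):
        t = step * d_t
        updated = calls[:]  # boundary values C(K=0) = S and C(K=k_max) = 0 are kept
        for i in range(1, n_k):
            strike = strikes[i]
            c_kk = (calls[i + 1] - 2.0 * calls[i] + calls[i - 1]) / d_k ** 2
            updated[i] = calls[i] + d_t * 0.5 * local_vol(strike, t) ** 2 * strike ** 2 * c_kk
        calls = updated
    return strikes, calls

# Constant local volatility, so the result can be checked against Black-Scholes.
strikes, calls = forward_dupire_calls(100.0, lambda strike, t: 0.2)
atm_price = calls[50]  # strike 100 on this grid

# Black-Scholes at-the-money call with r = 0: S * (2 N(sigma sqrt(T) / 2) - 1).
reference = 100.0 * math.erf(0.1 / math.sqrt(2.0))
```

A single pass over the maturity grid fills in every strike at once, whereas the backward equation would need one solve per strike and maturity. An explicit scheme is used only for brevity; its time step must shrink with the square of the strike spacing, so an implicit scheme would be the more usual choice in practice.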

[Figure 2 surface plot; only axis tick values survive extraction.]

Figure 2 Local volatility surface of the NASDAQ

[Figure 4 chart, titled "Up-and-out call price (S = 100, K = 110, H = 130)"; series: BS(vol) price and LVM price; vertical axis: premium; horizontal axis: implied volatility.]

Figure 4 Comparison of an up-and-out call option in the local volatility model and in Black–Scholes model with various volatilities

prices and then applying a nonparametric interpolation to the residuals.

Figure 1 displays the implied volatility surface of NASDAQ, and the associated local volatility surface is shown in Figure 2.

Once the local volatilities are obtained, one can price nonvanilla instruments with this calibrated local volatility model (Figure 3).

Properly accounting for the market skew can have a massive impact on the price of exotics. For instance, an up-and-out call option has a positive gamma close to the strike and a negative gamma close to the barrier. A typical equity negative skew corresponds to high local volatilities close to the strike, which adds value to the option due to the positive gamma, and low local volatilities close to the barrier, which is also beneficial to the option holder as the gamma is negative there. The combined effect is that the local volatility price of the up-and-out call may exceed its price in any Black–Scholes model, irrespective of the volatility input used (Figure 4).

Local Volatilities as Forward Volatilities

The most common interpretation of local volatility is that it is the instantaneous volatility as a certain function of spot price and time that fits market prices. It gives the simplest model calibrated to the market but assumes a deterministic behavior of instantaneous
volatility, a fact crudely belied by the market. As such, the local volatility model is an important step away from the Black–Scholes model, which assumes constant volatility, though it may not necessarily provide the most realistic dynamics for the price.

The second interpretation, as forward volatilities, is far more patent. More precisely, the square of the local volatility, the local variance, is the instantaneous forward variance conditional on the spot price being equal to the strike at maturity:

$$\sigma^2(K,T) = E[\sigma_T^2 \mid S_T = K] \tag{6}$$

This means that in a frictionless market where all strikes and maturities are available, it is possible to combine options into a portfolio that will lock these forward values. In other words, the local variance is not only a function calibrated to the market that allows one to retrieve market prices, but it is also the fair value of the fixed leg of a swap with a floating leg equal to the instantaneous variance at time T, with the exchange taking place only if the price at maturity is K. It can be seen as an infinitesimal forward corridor variance swap.

time in the future, conditioned on a price level, has to equal the local variance, which is dictated by the current market prices of calls and puts. Fitting to today's market strongly constrains future dynamics and, for instance, the backbone, defined as the behavior of the at-the-money volatility as a function of the underlying price, cannot be independently specified.

Once we get a perfect fit of option prices using equation (5), we can perturb the volatility surface, recalibrate, and conduct a sensitivity analysis. This provides a decomposition of the volatility risk of any structured product (or portfolio of products) across strikes and maturities, because seeing the price as a function of the whole volatility surface provides, through perturbation analysis, the sensitivity to all volatilities.

Extensions

There are numerous extensions of the forward PDE, with stochastic rates and dividends, stochastic volatility, jumps, to the Greeks (sensitivities) and to other products than European options, such as barrier
By way of consequence, if one disagrees with the options, compound options, Asian options, and basket
forward variance, one can put on a trade (in essence options. However, until now, there is no satisfactory
calendar spread against butterfly spread) aligned with counterpart for American options.
this view. Conversely, if one has no view but finds
someone who disagrees with the forward view and Further Reading
accepts to trade at a different level, one can lock the
difference. Derman, E. & Kani, I. (1994). Riding on a smile, Risk 7(2),
Another important consequence of this relation- 32–39, 139–145.
ship is that a stochastic volatility model (with no Dupire, B. (1993). Model art, Risk 6(9), 118–124.
jumps) will be calibrated to the market if and only if Dupire, B. (1994). Pricing with a smile, Risk 7, 18–20.
the conditional expectation of the instantaneous vari- Dupire, B. (1997). Pricing and hedging with smiles, in Math-
ematics of Derivative Securities, M.A.H. Dempster & S.R.
ance is the local variance computed from the market Pliska, eds, Cambridge University Press.
prices. In essence, it means that a calibrated stochastic Dupire, B. (2004). A unified theory of volatility, working
volatility model is a noisy version of the local volatil- paper Paribas capital markets 1996, reprinted in Derivatives
ity model, which is centered on it. In this sense, the Pricing: The Classic Collection, P. Carr, ed., Risk Books,
local volatility model plays a central role. London.
Beyond the fit to the current market prices, these
results have dynamic consequences. For example, Related Articles
they imply that in the absence of jumps, the at-
the-money (ATM) implied volatility converges to the Implied Volatility Surface; Local Times; Local
instantaneous volatility when the maturity shrinks to Volatility Model; Markov Processes; Model
0. The same relation indicates that for any stochastic Calibration.
volatility model calibrated to the market, the average
level of the short-term ATM implied variance at any BRUNO DUPIRE
Implied Volatility Surface

The widespread practice of quoting option prices in terms of their Black–Scholes implied volatilities (IVs) in no way implies that market participants believe underlying returns to be lognormal. On the contrary, the variation of IVs across option strike and term to maturity, which is widely referred to as the volatility surface, can be substantial. In this article, we highlight some empirical observations that are most relevant for the construction and validation of realistic models of the volatility surface for equity indices.

The Shape of the Volatility Surface

Ever since the 1987 stock market crash, volatility surfaces for global indices have been characterized by the volatility skew: For a given expiration date, IVs increase as strike price decreases for strikes below the current stock price (spot) or current forward price. This tendency can be seen clearly in the S&P500 volatility surface shown in Figure 1. For short-dated expirations, the cross section of IVs as a function of strike is roughly V-shaped, but has a rounded vertex and is slightly tilted. Generally, this V-shape softens and becomes flatter for longer dated expirations, but the vertex itself may rise or fall depending on whether the term structure (TS) of at-the-money (ATM) volatility is upward or downward sloping.

Conventional explanations for the volatility skew include the following:

• The leverage effect: Stocks tend to be more volatile at lower prices than at higher prices.
• Volatility moves and spot moves are anticorrelated.
• Big jumps in spot tend to be downward rather than upward.
• The risk of default: There is a nonzero probability for the price of a stock to collapse if the issuer defaults.
• Supply and demand: Investors are net long of stock and so tend to be net buyers of downside puts and sellers of upside calls.

The volatility skew probably reflects all of these factors.

Conventional stochastic volatility (SV) models imply a relationship between the assumed dynamics of the instantaneous volatility and the volatility skew (see Chapter 8 of [8]). Empirically, volatility is well known to be roughly lognormally distributed [1, 4] and in this case, the derivative of IV with respect to log-strike in an SV model is approximately independent of volatility [8]. This motivates a simple measure of skew: For a given term to expiration, the “95–105” skew is simply the difference between the IVs at strikes of 95% and 105% of the forward price. Figure 2 shows the historical variation of this measure as a function of term to expiration as calculated from end-of-day SPX volatility surfaces generated from listed options prices between January 2, 2001 and February 6, 2009. To fairly compare across different dates and over all volatility levels, all volatilities for a given date are scaled uniformly to ensure that the one-year at-the-money-forward (ATMF) volatility equals its historical median value over this period (18.80%). The skews for all listed expirations are binned by their term to expiration; the median value for each five-day bin is plotted along with fits to both $1/\sqrt{T}$ and the best-fitting power-law dependence on $T$.

The important conclusion to draw here is that the TS of skew is approximately consistent with square-root (or at least power-law) decay. Moreover, this rough relationship continues to hold for the longer expirations that are typically traded over the counter (OTC).

Significantly, this empirically observed TS of the volatility skew is inconsistent with the $1/T$ dependence for longer expirations typical of popular one-factor SV models (see Chapter 7 of [8] for example). Jumps affect only short-term volatility skews, so adding jumps does not resolve this disagreement between theory and observation. Introducing more volatility factors with different timescales [3] does help but does not entirely eliminate the problem. Market models of IV (see Implied Volatility: Market Models) obviously fit the TS of skew by construction, but such models are, in general, time inhomogeneous and, in any case, have not so far proven to be tractable. In summary, fitting the TS of skew remains an important and elusive benchmark by which to gauge models of the volatility surface.
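The power-law fit of skew against term to expiration described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `skew_95_105` and `fit_power_law` are hypothetical, the fit is an ordinary least-squares line in log-log coordinates (the article does not say which fitting procedure was used), and the data are synthetic values generated from an exact power law, not market observations.

```python
import math

def skew_95_105(iv_95, iv_105):
    """95-105 skew: IV at a strike of 95% of the forward minus IV at 105%."""
    return iv_95 - iv_105

def fit_power_law(terms, skews):
    """Fit skew = c * T**alpha by least squares on log(skew) vs log(T).

    Assumption: all terms and skews are strictly positive.
    Returns the pair (c, alpha).
    """
    xs = [math.log(t) for t in terms]
    ys = [math.log(s) for s in skews]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    c = math.exp(y_bar - alpha * x_bar)
    return c, alpha

# Synthetic, illustrative term structure (NOT market data): an exact
# power law with exponent -0.39 and a hypothetical one-year skew of 0.02.
terms_in_years = [d / 365.0 for d in (30, 91, 182, 273, 365, 547)]
synthetic_skews = [0.02 * t ** -0.39 for t in terms_in_years]

c_hat, alpha_hat = fit_power_law(terms_in_years, synthetic_skews)
print(f"fitted exponent: {alpha_hat:.3f}, fitted one-year skew: {c_hat:.4f}")
```

On real data the binned median skews would replace `synthetic_skews`; a fitted exponent near −0.5 would correspond to the square-root decay mentioned in the text, whereas one-factor SV models would suggest an exponent near −1 at long expirations.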
Figure 1 Graph of the S&P500-implied volatility surface as of the close on September 15, 2005, the day before triple witching. Axes: log-strike k and time to expiry

Figure 2 Decay of S&P500 95–105% skew with respect to term to expiration. Dots show the median value for each five-day bin and lines represent best fit to $1/\sqrt{T}$ (dashed) and to power-law behavior: $T^{-0.39}$ (solid). Axes: days to expiration and skew

Volatility Surface Dynamics

Volatility surfaces cannot have arbitrary shape; they are constrained by no-arbitrage conditions (such as the convexity of price with respect to strike). In practice, these restrictions are not onerous and generally are met provided there are no large gradients anywhere on the surface. This observation, together with the fact that index options in most markets trade actively over a wide range of strikes and expiration dates, might lead one to expect the dynamics of the volatility surface to be quite complicated. On the contrary, several principal component analysis (PCA) studies have found that an overwhelming fraction of the total daily variation in volatility surfaces is explained by just a few factors, typically three.

Table 1 makes clear that a level mode, one where the entire volatility surface shifts upward or downward in tandem, accounts for the vast majority of variation; this result holds across different markets, historical periods, sampling frequencies, and statistical methodologies. Although the details vary, generally this mode is not quite flat; short-term volatilities tend to move more than longer ones, as evidenced by the slightly upward tilt in Figure 3(a).

Table 1 PCA studies of the volatility surface. GS, Goldman Sachs study [9] and ML, Merrill Lynch proprietary data

Source          Market                        Top 3 modes              Var. explained by   Var. explained by   Correlation of 3
                                                                       first mode (%)      top 3 modes (%)     modes with spot
GS              S&P500, weekly, 1994–1997     Level, TS, skew          81.6                90.7                −0.61, −0.07, 0.07
GS              Nikkei, daily, 1994–1997      Level, TS, skew          85.6                95.9                −0.67, −0.05, 0.04
Cont et al.     S&P500, daily, 2000–2001      Level, skew, curvature   94                  97.8                −0.66, ∼0, 0.27
Cont et al.     FTSE100, daily, 1999–2001     Level, skew, curvature   96                  98.8                −0.70, 0.08, 0.7
Daglish et al.  S&P500, monthly, 1998–2002    Level, TS, skew          92.6                99.3                n.a.
ML              S&P500, daily, 2001–2009      Level, TS, skew          95.3                98.2                −0.87, −0.11, ∼0

In most of the studies, a TS mode is the next most important mode: Here, short-term volatilities move in the opposite direction from longer term ones, with little variation across strikes. In the Merrill Lynch data, which are sampled at 30, 91, 182, 273, 365, and 547 days to expiration, the pivot point is close to the 91-day term (see Figure 3(b)). In all studies where TS is the second most important mode, the third mode is always a skew mode: one where strikes below the spot (or forward) move in the opposite direction
from those above and where the overall magnitude is attenuated as term increases (Figure 3c).

Figure 3 PCA modes for Merrill Lynch S&P500 volatility surfaces: (a) level; (b) term structure; and (c) skew. Axes in each panel: term (days), normalized strike, and vol

It is also worth noting that the two studies [5, 9] that looked at two different markets during comparable periods found very similar patterns of variation; the modes and their relative importance were very similar, suggesting strong global correlation across index volatility markets.

In the study by Cont and da Fonseca [5], a TS mode does not figure in the top three modes; instead a skew mode and another strike-related mode related to the curvature emerge as number two and three. This likely reflects the atypically low variation in TS over the historical sample period and is not due to any methodological differences with the other studies. As in the other studies, the patterns of variation were very similar across markets (S&P500 and FTSE100).

Changes in Spot and Volatility are Negatively Correlated

Perhaps the sturdiest empirical observation of all is simply that changes in spot and changes in volatility (by pretty much any measure) are negatively and strongly correlated. From the results that we have surveyed here, this can be inferred from the high $R^2$ obtained in the regressions of daily ATMF volatility changes shown in Table 2, as well as directly from the correlations between spot return and PCA modes shown in Table 1. It is striking that the correlation between the level mode and spot return is consistently high across studies, ranging from −0.61 to −0.87 in Table 1. Correlation between the spot return and the other modes is significantly weaker and less stable.

Table 2 Historical estimates of $\beta_T$

T (days)   $\beta_T$ (standard error)   $R^2$
30         1.55 (0.02)                  0.774
91         1.50 (0.02)                  0.825
182        1.48 (0.02)                  0.818
365        1.49 (0.02)                  0.791

This high correlation between spot returns and changes in volatility persists even in the most extreme market conditions. For example, during the turbulent period following the collapse of Lehman Brothers in September 2008, which was characterized by both high volatility and high volatility of volatility, spot-volatility correlation remained at historically high levels: −0.92 for daily changes between September 15, 2008 and December 31, 2008. On the other hand, the skew mode, which is essentially uncorrelated with spot return in the full historical period (see Table 1), did exhibit stronger correlation in this period (−0.55), while the TS mode did not. These observations underscore the robustness of the level-spot correlation as well as the time-varying nature of correlations between spot returns and the other modes of fluctuation of the volatility surface.

Other studies have also commented on the robustness of the spot-volatility correlation. For example, using a maximum-likelihood technique, Aït-Sahalia and Kimmel [1] carefully estimated the parameters of the Heston, CEV, and GARCH models from S&P500 and VIX data between January 2, 1990 and September 30, 2003; the correlation between spot and volatility changes varied little between these models and various estimation techniques, and all estimates were around −0.76 for the period studied.

A related question that was studied by Bouchaud et al. [4] is whether spot changes drive realized
volatility changes or vice versa. By computing the correlations of leading and lagging returns and squared-returns, they find that for both stocks and stock indices, price changes lead to volatility changes. In particular, there is no volatility feedback effect, whereby changes in volatility affect future stock prices. Moreover, unlike the decay of the IV correlation function itself, which is power-law with an exponent of around 0.3 for SPX, the decay of the spot-volatility correlation function is exponential with a short half-life of a few days. Supposing the general level of IV (the variation of which accounts for most of the variation of the volatility surface) to be highly correlated with realized volatility, these results also apply to the dynamics of the IV surface. Under diffusion assumptions, the relationship between implied and realized volatility is even more direct: Instantaneous volatility is given by the IV of the ATM option with zero time to expiration.

Skew Relates Statics to Dynamics

Volatility changes are related to changes in spot: as mentioned earlier, volatility and spot tend to move in opposite directions, and large moves in volatility tend to follow large moves in the spot.

It is reasonable to expect skew to play a role in relating the magnitudes of these changes. For example, if all the variation in ATMF volatility were explained simply by movement along a surface that is unchanged as a function of strike when spot changes, then we would expect

\[ \Delta\sigma_{\mathrm{ATMF}}(T) = \beta_T \, \frac{d\sigma}{d(\log K)} \, \frac{\Delta S}{S} \tag{1} \]

with $\beta_T = 1$ for all terms to expiration ($T$).

The empirical estimates of $\beta_T$ shown in Table 2 are based on the daily changes in S&P500 ATMF volatilities from January 2, 2001 to February 6, 2009 (volatilities tied to fixed expiration dates are interpolated to arrive at volatilities for a fixed number of days to expiration). Two important conclusions may be drawn: (i) $\beta_T$ is not 1.0, rather it is closer to 1.5, and (ii) remarkably, $\beta_T$ does not change appreciably by expiration. In other words, although the volatility skew systematically underestimates the daily change in volatility, it does so by roughly the same factor for all maturities. It is also worth noting that the hypothesis $\beta_T = 1$ would be rejected even if the regression were restricted to spot returns of smaller magnitude, as suggested visually by the scatterplots of Figure 4.

Figure 4 Regression of 91-day volatility changes versus spot returns. A zero-intercept least squares fit to model (1) leads to $\beta_{91} = 1.50$ (solid lines). The $\beta = 1$ (“sticky-strike”) prediction (dashed line) clearly does not fit. Axes: Skew*dS/S (horizontal) and dVol (vertical)

Although empirical relationships between changes in ATMF volatility and changes in spot are clearly relevant to volatility trading and risk management, the magnitude of $\beta_T$ itself has direct implications for volatility modeling as well. In both local and SV models, $\beta_T \to 2$ in the short-expiration limit. Under SV, $\beta_T$ is typically a decreasing function of $T$, whereas under the local volatility assumption, where the local volatility surface is fixed with respect to a given level of the underlying, $\beta_T$ is typically an increasing function of $T$.

Market participants often adopt a phenomenological approach and characterize surface dynamics as following one of these rules: “sticky strike,” “sticky delta,” or “local volatility”; each rule has an associated value of $\beta_T$. Under the sticky-strike assumption, $\beta_T = 1$ and the volatility surface is fixed by strike; under the sticky-delta assumption, $\beta_T = 0$ and the volatility surface is a fixed function of $K/S$; and under the local volatility assumption, as mentioned earlier, $\beta_T = 2$ for short expirations.

Neither “sticky-strike” nor “sticky-delta” rules imply reasonable dynamics [2]: In a sticky-delta model, the log of the spot has independent increments, and the only arbitrage-free sticky-strike model is Black–Scholes (where there is no smile).
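The zero-intercept least squares fit to model (1) that underlies the estimates in Table 2 can be sketched as follows. This is a minimal illustration, not the authors' procedure in detail: the function name `estimate_beta` is hypothetical, the skew is held at an assumed constant value of −0.4, and the returns and volatility changes are simulated with a fixed seed rather than taken from market data.

```python
import random

def estimate_beta(skews, spot_returns, vol_changes):
    """Zero-intercept least squares for dVol = beta * skew * dS/S.

    skews        : d(sigma)/d(log K) on each day (negative for equity indices)
    spot_returns : dS/S on each day
    vol_changes  : change in ATMF volatility on each day
    """
    xs = [k * r for k, r in zip(skews, spot_returns)]
    sxy = sum(x * y for x, y in zip(xs, vol_changes))
    sxx = sum(x * x for x in xs)
    return sxy / sxx

# Simulated, illustrative data (NOT market data).
random.seed(0)
TRUE_BETA = 1.5   # assumed value, chosen to mimic the magnitude in Table 2
SKEW = -0.4       # assumed constant d(sigma)/d(log K)
N_DAYS = 500

returns = [random.gauss(0.0, 0.01) for _ in range(N_DAYS)]
vol_changes = [TRUE_BETA * SKEW * r + random.gauss(0.0, 0.002) for r in returns]

beta_hat = estimate_beta([SKEW] * N_DAYS, returns, vol_changes)
print(f"estimated beta: {beta_hat:.2f}")
```

Reading the estimate against the rules of thumb in the text: a value near 0 would point to sticky delta, near 1 to sticky strike, and near 2 (for short expirations) to local volatility dynamics.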
Although the estimates of $\beta_T$ in Table 2 are all around 1.5, consistent with SV, this does not exclude the possibility that there may be periods where $\beta_T$ may substantially depart from these average values. Derman [7] identified seven distinct regimes for S&P500 daily volatility changes between September 1997 and November 1998, finding evidence for all three of the alternatives listed above. A subsequent study [6] looked at S&P500 monthly data between June 1998 and April 2002 (47 points) and found that for that period, the data were much more consistent with the sticky-delta rule than with the sticky-strike rule.

References

[1] Aït-Sahalia, Y. & Kimmel, R. (2007). Maximum likelihood estimation of stochastic volatility models, Journal of Financial Economics 83, 413–452.
[2] Balland, P. (2002). Deterministic implied volatility models, Quantitative Finance 2, 31–44.
[3] Bergomi, L. (2008). Smile dynamics III, Risk 21, 90–96.
[4] Bouchaud, J.-P. & Potters, M. (2003). Theory of Financial Risk and Derivative Pricing: From Statistical Physics to Risk Management, Cambridge University Press, Cambridge.
[5] Cont, R. & da Fonseca, J. (2002). Dynamics of implied volatility surfaces, Quantitative Finance 2, 45–60.
[6] Daglish, T., Hull, J. & Suo, W. (2007). Volatility surfaces: theory, rules of thumb, and empirical evidence, Quantitative Finance 7, 507–524.
[7] Derman, E. (1999). Regimes of volatility, Risk 12, 55–59.
[8] Gatheral, J. (2006). The Volatility Surface, John Wiley & Sons, Hoboken.
[9] Kamal, M. & Derman, E. (1997). The patterns of change in implied index volatilities, in Goldman Sachs Quantitative Research Notes, Goldman Sachs, New York.

Related Articles

Black–Scholes Formula; Implied Volatility in Stochastic Volatility Models; Implied Volatility: Large Strike Asymptotics; Implied Volatility: Long Maturity Behavior; Implied Volatility: Market Models; SABR Model.

MICHAEL KAMAL & JIM GATHERAL
Moment Explosions

Let $(S_t, V_t)_{t\ge 0}$ be a Markov process, representing a (not necessarily purely continuous) stochastic volatility model. $(S_t)_{t\ge 0}$ is the (discounted) price of a traded asset, such as a stock, and $(V_t)_{t\ge 0}$ represents a latent factor, such as stochastic volatility, stochastic variance, or the stochastic arrival rate of jumps. A moment explosion takes place if the moment $E[S_t^u]$ of some given order $u \in \mathbb{R}$ becomes infinite (“explodes”) after some finite time $T_*(u)$. This time is called the time of moment explosion and is formally defined by

\[ T_*(u) = \sup\left\{ t \ge 0 : E[S_t^u] < \infty \right\} \tag{1} \]

We say that no moment explosion takes place for some given order $u$ if $T_*(u) = \infty$.

Moment explosions can be considered both under the physical and the pricing measure, with most applications belonging to the latter. If $(S_t)_{t\ge 0}$ is a martingale, then Jensen’s inequality implies that moment explosions can only occur for moments of order $u \in \mathbb{R} \setminus [0, 1]$.

Conceptually, the notion of a moment explosion has to be distinguished from an explosion of the process itself, which refers to the situation that the process $(S_t)_{t\ge 0}$, not one of its moments, becomes infinite with some positive probability.

Applications

In equity and foreign exchange models, where $(S_t)_{t\ge 0}$ represents a stock price or an exchange rate, moment explosions are closely related to the shape of the implied volatility surface, and can be used to obtain approximations for the implied volatility of deep in-the-money and out-of-the-money options (see Implied Volatility: Large Strike Asymptotics, and the references therein). According to [5, 14], the asymptotic shape of the implied volatility surface for some fixed maturity $T$ is determined by the smallest and largest moment of $S_T$ that is still finite. These critical moments $u_-(T)$ and $u_+(T)$ are the piecewise inverse functions^a of the moment explosion time. Often the explosion time is easier to calculate, so a feasible approach is to first calculate explosion times, and then to invert to obtain the critical moments. Let us note that finite critical moments of the underlying $S_T$ correspond, in essence, to exponential tails of $\log(S_T)$. There is evidence that refined knowledge of how moment explosion occurs (or the asymptotic behavior of $u \mapsto E[S_T^u]$ in the case of nonexplosion) can lead to refined results about implied volatility; see [6, 11] for some examples of stochastic alpha beta rho (SABR) type.

In fixed-income markets $(S_t)_{t\ge 0}$ might represent a forward LIBOR rate or swap rate. Andersen and Piterbarg [2] give examples of derivatives with superlinear payoff, whose pricing involves calculation of the second moment of $S_T$. It is clear that an explosion of the second moment will lead to infinite prices of such derivatives.

For numerical procedures, such as discretization schemes for stochastic differential equations (SDEs), error estimates that depend on higher order moments of the approximated process may break down if moment explosions occur [1]. Moment explosions may also lead to infinite expected utility in utility maximization problems [12].

Moment Explosions in the Black–Scholes and Exponential Lévy Models

In the Black–Scholes model, moment explosions never occur, since moments of all orders exist for all times. In an exponential Lévy model (see Exponential Lévy Models), $S_t$ is given by $S_t = S_0 \exp(X_t)$, where $X_t$ is a Lévy process. It holds that $E[S_t^u] = e^{t\kappa(u)}$, where $\kappa(u)$ is the cumulant-generating function (cgf) of $X_1$. Thus in an exponential Lévy model, the time of moment explosion is given by

\[ T_*(u) = \begin{cases} +\infty & \kappa(u) < \infty \\ 0 & \kappa(u) = \infty \end{cases} \tag{2} \]

Let us remark that, from Theorem 25.3 in [16], $\kappa(u) < \infty$ iff $\int e^{ux} 1_{|x|>1}\, \nu(dx) < \infty$, where $\nu(dx)$ denotes the Lévy measure of $X$.

Moment Explosions in the Heston Model

The situation becomes more interesting in a stochastic volatility model, like the Heston model (see Heston
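Model):

\[ dS_t = S_t \sqrt{V_t}\, dW_t^1, \quad S_0 = s \]
\[ dV_t = -\lambda (V_t - \theta)\, dt + \eta \sqrt{V_t}\, dW_t^2, \quad V_0 = v, \quad \langle dW_t^1, dW_t^2 \rangle = \rho\, dt \tag{3} \]

We now discuss how to compute the moments of $S_t$ (equivalently, the moment-generating function of $X_t = \log S_t/S_0$). The joint process $(X_t, V_t)_{t\ge 0}$ is a (time-homogenous) diffusion, started at $(0, v)$, with generator

\[ L = \frac{v}{2}\left( \frac{\partial^2}{\partial x^2} - \frac{\partial}{\partial x} \right) + \frac{v}{2}\eta^2 \frac{\partial^2}{\partial v^2} + \lambda(\theta - v)\frac{\partial}{\partial v} + \rho\eta v \frac{\partial^2}{\partial x\,\partial v} \tag{4} \]

Note that $(X_t, V_t)_{t\ge 0}$ has affine structure in the sense that the coefficients of $L$ are affine linear in the state variables.^b

Now

\[ E\left[ e^{uX_T} \mid X_t = x, V_t = v \right] = e^{ux}\, E\left[ e^{uX_T} \mid X_t = 0, V_t = v \right] \tag{5} \]

satisfies, as a function of $(t, x, v)$, the backward equation $\partial_t + L = 0$ with terminal data $e^{ux}$, and after replacing $T - t$ with $t$ we can rewrite this as an initial value problem. Indeed, setting $f = f(t, v; u) := E\left[ e^{uX_t} \mid X_0 = 0, V_0 = v \right]$, and noting that

\[ \left( \frac{\partial^2}{\partial x^2} - \frac{\partial}{\partial x} \right) e^{ux} f = e^{ux}\left( u^2 - u \right) f \quad \text{and} \quad \frac{\partial^2}{\partial x\,\partial v}\, e^{ux} f = e^{ux}\, u\, \frac{\partial}{\partial v} f \tag{6} \]

we see that $f$ satisfies a parabolic partial differential equation (PDE).

\[ \partial_t f = A f := \frac{v}{2}\eta^2 \frac{\partial^2 f}{\partial v^2} + \left[ \lambda(\theta - v) + \rho\eta u v \right] \frac{\partial f}{\partial v} + \frac{v}{2}\left( u^2 - u \right) f \tag{7} \]

with initial condition $f(0, \cdot\,; u) = 1$, in which (again) all coefficients depend in an affine-linear way on $v$. The exponentially affine ansatz $f(t, v; u) = \exp(\phi(t, u) + v\psi(t, u))$ then immediately reduces this PDE to a system of ordinary differential equations (ODEs) for $\phi(t, u)$ and $\psi(t, u)$:

\[ \frac{\partial}{\partial t}\phi(t, u) = F(u, \psi(t, u)), \quad \phi(0, u) = 0 \tag{8} \]
\[ \frac{\partial}{\partial t}\psi(t, u) = R(u, \psi(t, u)), \quad \psi(0, u) = 0 \tag{9} \]

where $F(u, w) = \lambda\theta w$ and $R(u, w) = \frac{w^2}{2}\eta^2 + (\rho\eta u - \lambda) w + \frac{1}{2}(u^2 - u)$. Equation (9) is a Riccati differential equation, whose solution blows up at finite time, corresponding to the moment explosion of $S_t$. Explicit calculations ([2], for instance) yield^c

\[ T_*^{\mathrm{Heston}}(u) = \begin{cases} +\infty & \text{if } \Delta(u) \ge 0,\ \chi(u) < 0 \\[4pt] \dfrac{1}{\sqrt{\Delta(u)}} \log\left( \dfrac{\chi(u) + \sqrt{\Delta(u)}}{\chi(u) - \sqrt{\Delta(u)}} \right) & \text{if } \Delta(u) \ge 0,\ \chi(u) > 0 \\[4pt] \dfrac{2}{\sqrt{-\Delta(u)}} \left( \arctan\dfrac{\sqrt{-\Delta(u)}}{\chi(u)} + \pi 1_{\{\chi(u) < 0\}} \right) & \text{if } \Delta(u) < 0 \end{cases} \tag{10} \]

where $\chi(u) = \rho\eta u - \lambda$ and $\Delta(u) = \chi(u)^2 - \eta^2 (u^2 - u)$. A simple analysis of this condition (cf. [2]) then allows one to express the no-explosion condition in terms of the correlation parameter $\rho$. With focus on positive moments of the underlying, $u \ge 1$, we have

\[ T_*^{\mathrm{Heston}}(u) = +\infty \iff \rho \le -\sqrt{\frac{u - 1}{u}} + \frac{\lambda}{\eta u} \tag{11} \]

Similar results for a class of nonaffine stochastic volatility models are discussed below.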
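Equations (10) and (11) are straightforward to evaluate numerically. The sketch below is illustrative only: the function names are hypothetical, and three choices go beyond the text and should be read as assumptions: orders $u \in [0, 1]$ are treated as non-exploding (using the martingale remark made earlier in this article), the arctangent branch of equation (10) is written with the two-argument `atan2`, which equals $\arctan(\sqrt{-\Delta}/\chi) + \pi 1_{\{\chi<0\}}$ when $\sqrt{-\Delta} > 0$, and the boundary case $\Delta(u) = 0$, $\chi(u) > 0$ is filled in with the limit $2/\chi(u)$.

```python
import math

def heston_explosion_time(u, lam, eta, rho):
    """Time of moment explosion T*(u) in the Heston model, equation (10).

    lam = mean-reversion speed, eta = vol of variance, rho = correlation.
    Assumption (not in equation (10)): for 0 <= u <= 1 no explosion occurs.
    """
    if 0.0 <= u <= 1.0:
        return math.inf
    chi = rho * eta * u - lam
    delta = chi * chi - eta * eta * (u * u - u)
    if delta >= 0.0:
        if chi <= 0.0:
            return math.inf
        root = math.sqrt(delta)
        if root == 0.0:
            # Limit of the logarithmic branch as delta -> 0 (assumption).
            return 2.0 / chi
        return math.log((chi + root) / (chi - root)) / root
    root = math.sqrt(-delta)
    # atan2(root, chi) = arctan(root / chi) + pi * 1{chi < 0} for root > 0.
    return 2.0 / root * math.atan2(root, chi)

def no_explosion_by_correlation(u, lam, eta, rho):
    """No-explosion criterion of equation (11), stated for u >= 1."""
    return rho <= -math.sqrt((u - 1.0) / u) + lam / (eta * u)

# Illustrative parameter values (hypothetical, not calibrated to any market).
for rho in (-0.5, 0.0, 0.5):
    t_star = heston_explosion_time(2.0, 1.0, 1.0, rho)
    print(f"rho = {rho:+.1f}: T*(2) = {t_star}")
```

A possible use, in the spirit of the Applications section: for a fixed maturity $T$, scan $u > 1$ until `heston_explosion_time(u, ...)` drops below $T$ to locate the critical moment $u_+(T)$.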
Moment Explosions in Time-changed Exponential Lévy Models

Stochastic volatility can also be introduced in the sense of running time at a stochastic “business” clock. For instance, when $\rho = 0$ the (log-price) in the Heston model is a Brownian motion with drift, $W_t - t/2$, run at a Cox–Ingersoll–Ross^d (CIR) clock $\tau(t, \omega) = \tau_t$ where

\[ dV_t = -\lambda (V_t - \theta)\, dt + \eta \sqrt{V_t}\, dW_t, \quad V_0 = v \tag{12} \]
\[ d\tau_t = V\, dt, \quad \tau_0 = 0 \tag{13} \]

Since $(V, \tau)$ has an affine structure, there is a tractable moment-generating/characteristic function in the form

\[ E\left( \exp(u\tau_T) \right) = E\left[ \exp\left( u \int_0^T V(t, \omega)\, dt \right) \right] = \exp\left( A(u, T) + v B(u, T) \right) \tag{14} \]

where^e

\[ A(u, t) = \lambda^2 \theta t/\eta^2 - \frac{2\lambda\theta}{\eta^2} \log\left[ \sinh(\gamma t/2) \cdot \left( \coth(\gamma t/2) + \frac{\lambda}{\gamma} \right) \right] \tag{15} \]
\[ B(u, t) = 2u/(\lambda + \gamma \coth(\gamma t/2)), \quad \gamma = \sqrt{\lambda^2 - 2\eta^2 u} \tag{16} \]

We can replace $W_t - t/2$ above by a general Lévy process $L = L_t$ and run it again at some independent clock $\tau = \tau(t, \omega)$, assuming only knowledge of the cgf $\kappa_\tau(u) = \log E(\exp(u\tau_T))$. If we also set $\kappa_L(u) = \log E[\exp(uL_1)]$, a simple conditioning argument shows that the moment-generating function of $L_\tau$ is given by

\[ M(u) = E\left[ E\left[ e^{uL_\tau} \mid \tau \right] \right] = E\left[ e^{\kappa_L(u)\tau} \right] = \exp\left[ \kappa_\tau(\kappa_L(u)) \right] \tag{17} \]

From here on, moment explosions of $L_\tau$ can be investigated analytically, provided $\kappa_\tau$, $\kappa_L$ are known in sufficiently explicit form. For some computations in this context, also with regard to the asymptotic behavior of the implied volatility smile, see [5].

Moment Explosions in Non-affine Diffusion Models

Both [2] and [15] study existence of $u$th moments, $u \ge 1$, for (not necessarily affine) diffusion models of the type

\[ dS_t = V_t^{\delta} S_t^{\beta}\, dW_t^1, \quad S_0 = s \tag{18} \]
\[ dV_t = \eta V_t^{\gamma}\, dW_t^2 + b(V_t)\, dt, \quad V_0 = v, \quad \langle dW_t^1, dW_t^2 \rangle = \rho\, dt \tag{19} \]

where $\delta, \gamma > 0$, $\beta \in [0, 1]$ and the function $b(v)$ are subject to suitable conditions that ensure a unique solution. For instance, the SABR model falls into this class. Lions and Musiela [15] first show that if $\beta < 1$, no moment explosions occur. For $\beta = 1$, the same reasoning as in the Heston model shows that $f(t, v; u) = E[(S_t/s)^u]$ satisfies the PDE^f

\[ \frac{\partial f}{\partial t} = A f := \frac{v^{2\gamma}}{2}\eta^2 \frac{\partial^2 f}{\partial v^2} + \left[ b(v) + \eta\rho u v^{\delta+\gamma} \right] \frac{\partial f}{\partial v} + \frac{v^{2\delta}}{2}\left( u^2 - u \right) f \tag{20} \]

with initial condition $f(0, \cdot\,; u) \equiv 1$. Note that the Heston model is recovered as the special case $\beta = 1$, $\delta = \gamma = 1/2$, $b(v) = -\lambda(v - \theta)$. Using the (exponentially-affine in $v^q$) ansatz $f(t, v; u) = \exp(\phi(t, u) + v^q \psi(t, u))$, with suitably chosen $q$, $\phi$, and $\psi$, Lions and Musiela [15] construct supersolutions of equation (20), leading to lower bounds for $T_*(u)$, and then subsolutions, leading to matching upper bounds.^g We report the following results from [15]:

1. $\beta < 1$: no moment explosion occurs, that is, $E[S_t^u] < \infty$ for all $u \ge 1$, $t \ge 0$;
2. $\beta = 1$, $\gamma + \delta < 1$: as in 1. no moment explosion occurs;
3. $\beta = 1$, $\gamma + \delta = 1$: If $\gamma = \delta = \frac{1}{2}$, then this choice of parameters yields a Heston-type model, where
the mean-reversion term $-\lambda(V_t - \theta)\,dt$ has been replaced by the more general $b(V_t)\,dt$. With $\lambda$ replaced by $\lim_{v\to\infty} -b(v)/v$ the formula (10) remains valid. If $\gamma \ne \delta$, then the model can be transformed into a Heston-like model by the change of variables $\tilde{V}_t := V_t^{2\delta}$. The time of moment explosion $T_*(u)$ can be related to the expression in equation (10), by

\[ T_*(u) = \frac{1}{2\delta}\, T_*^{\mathrm{Heston}}(u) \tag{21} \]

4. $\beta = 1$, $\gamma + \delta > 1$: Let $b_\infty = \lim_{v\to\infty} b(v)/v^{\delta+\gamma}$, and $\rho^*(u) = -\sqrt{(u - 1)/u} - b_\infty/(\eta u)$, then

\[ T_*(u) = \begin{cases} +\infty & \rho < \rho^*(u) \\ 0 & \rho > \rho^*(u) \end{cases} \tag{22} \]

The borderline case $\rho = \rho^*(u)$ is delicate and we refer to [15, page 13]. Observe that the condition $\rho < \rho^*(u)$ is consistent with the Heston model (11), upon setting $\gamma = \delta = 1/2$, $b_\infty = -\lambda$, whereas the behavior for $\rho > \rho^*(u)$ is different in the sense that there is no immediate moment explosion in the Heston model.

Moment Explosions in Affine Models with Jumps

Recall that in the Heston model

\[ E\left[ e^{uX_t} \mid X_0 = x, V_0 = v \right] = e^{ux} \exp\left( \phi(t, u) + v\psi(t, u) \right) \tag{23} \]

and it was this form of exponentially affine dependence on $x$, $v$ that allowed an analytical treatment via Riccati equations. Assuming validity only of equation (23), for all $u \in \mathbb{R}$ for which the expectation exists, and that $(X_t, V_t)_{t\ge 0}$ is a (stochastically continuous, time-homogenous) Markov process on $\mathbb{R} \times \mathbb{R}_{\ge 0}$, puts us in the framework of affine processes [8], which, in fact, includes the bulk of analytically tractable stochastic volatility models with and without jumps.

The infinitesimal generator $L$ of the process $(X_t, V_t)_{t\ge 0}$ now includes integral terms corresponding to the jump effects and thus is a partial integro-differential operator. Nevertheless, the exponentially affine ansatz $f(t, v; u) = \exp(\phi(t, u) + v\psi(t, u))$ still reduces the Kolmogorov equation to ordinary differential equations of the type equation (8). The functions $F(u, w)$ and $R(u, w)$ are no longer quadratic polynomials, but of Lévy–Khintchine form (see Infinite Divisibility). The time of moment explosion can be determined by calculating the blow-up time for the solutions of these generalized Riccati equations. This approach can be applied to a Heston model with an additional jump term:

\[ dX_t = \left( c(V_t) - \frac{V_t}{2} \right) dt + \sqrt{V_t}\, dW_t^1 + dJ_t(V_t), \quad X_0 = 0 \tag{24} \]
\[ dV_t = -\lambda(V_t - \theta)\, dt + \eta \sqrt{V_t}\, dW_t^2, \quad V_0 = v, \quad \langle dW_t^1, dW_t^2 \rangle = \rho\, dt \tag{25} \]

The process $J_t(V_t)$ is a pure-jump process based on a fixed Lévy measure $\nu(dx)$. More precisely, writing $\mu$ for the uncompensated and $\tilde{\mu}$ for the compensated Poisson random measure, independent of $(W_t^1, W_t^2)$, with intensity $\nu(dx) \otimes dt$, we assume that

\[ dJ_t(V_t) = \begin{cases} \displaystyle\int_{|x|<1} x\, \tilde{\mu}(dx, dt) + \int_{|x|\ge 1} x\, \mu(dx, dt) & \text{case (a)} \\[8pt] \displaystyle\int_{|x|<1} V_t\, x\, \tilde{\mu}(dx, dt) + \int_{|x|\ge 1} V_t\, x\, \mu(dx, dt) & \text{case (b)} \end{cases} \tag{26} \]

In case (a), the process $J_t$ is a genuine (pure-jump) Lévy process; in case (b) jumps are amplified linearly with the variance level, as proposed by Bates [4]. We focus first on case (a). Assuming $E[e^{J_t}] < \infty$, or equivalently $\int e^{x} 1_{|x|\ge 1}\, \nu(dx) < \infty$, so that $e^{\tilde{J}_t} := e^{J_t + ct}$ is a martingale for suitable drift, $c = -\log E[e^{J_1}]$, we have^h

\[ E\left[ e^{uX_t} \right] = E\left[ e^{uX_t^{\mathrm{Heston}}} \right] E\left[ e^{u\tilde{J}_t} \right] = E\left[ e^{uX_t^{\mathrm{Heston}}} \right] e^{t\tilde{\kappa}(u)} \tag{27} \]

Here, $\tilde\kappa(u) = \int \left(e^{ux} - 1 - u(e^x - 1)\right) \nu(dx)$ is well defined with values in $(-\infty, \infty]$, and finiteness of $\tilde\kappa(u)$ is tantamount to $\int e^{ux} 1_{|x|\ge1}\, \nu(dx) < \infty$. Hence in case (a), we can link the time of moment explosion $T_*(u)$ to $T_*^{\mathrm{Heston}}(u)$, given by equation (10), and have

$$T_*(u) = \begin{cases} T_*^{\mathrm{Heston}}(u), & \tilde\kappa(u) < \infty \\ 0, & \tilde\kappa(u) = \infty \end{cases} \tag{28}$$

In the case (b), the jump process $J_t(V_t)$ depends on $V_t$ and the above argument cannot be used. A direct analysis of the (generalized) Riccati equations [13] shows that in the case $\tilde\kappa(u) < \infty$ the time of moment explosion is given by formula (10), only now $\Delta(u) = \chi(u)^2 - \eta^2\left(2\tilde\kappa(u) + u^2 - u\right)$, and immediate moment explosion happens in the case $\tilde\kappa(u) = \infty$.

Also the model introduced by Barndorff-Nielsen and Shephard [3] (see Barndorff-Nielsen and Shephard (BNS) Models), which features simultaneous jumps in price and variance, falls into the class of affine models. It is given by

$$dX_t = \left(c - \frac{V_t}{2}\right) dt + \sqrt{V_t}\, dW_t + \rho\, dJ_{\lambda t}, \qquad X_0 = 0 \tag{29}$$

$$dV_t = -\lambda V_t\, dt + dJ_{\lambda t}, \qquad V_0 = v \tag{30}$$

where $\lambda > 0$, $\rho < 0$ and $(J_t)_{t\ge0}$ is a pure-jump Lévy process with positive jumps only, and with Lévy measure $\nu(dx)$. The drift parameter $c$ is determined by the martingale condition for $(S_t)_{t\ge0}$. The time of moment explosion can be calculated [13] and is given by

$$T_*(u) = -\frac{1}{\lambda} \log \max\left(0,\; 1 - \frac{2\lambda \max(0, \kappa_+ - \rho u)}{u(u-1)}\right) \tag{31}$$

where $\kappa_+ := \sup\{u > 0 : \kappa(u) < \infty\}$ and $\kappa(u) = \int_0^\infty (e^{ux} - 1)\, \nu(dx) \in [0, \infty]$.

Moment Explosions in Affine Diffusion Models of Dai–Singleton Type

For affine diffusion models with an arbitrary number of stochastic factors, the analysis of moment explosions through the Riccati equations has been studied by Glasserman and Kim [10]. Without structural restrictions, this approach will lead to multiple coupled Riccati differential equations, whose blow-up behavior is tedious to analyze in full generality. However, for concrete specifications, this approach can still lead to explicit results. Glasserman and Kim [10] consider affine models (see Affine Models), of Dai–Singleton type [7], which are given by a diffusion process

$$dY_t = -A(\Theta - Y_t)\, dt + \sqrt{\mathrm{diag}(b + B^\top Y_t)}\, dW_t \tag{32}$$

evolving on the state space $\mathbb{R}^m_{\ge0} \times \mathbb{R}^{n-m}$. The state vector $Y$ is partitioned correspondingly, into components $(Y^v, Y^d)$, called volatility factors and dependent factors. The vector $b \in \mathbb{R}^n$ and matrices $A, B \in \mathbb{R}^{n\times n}$ are subject to the following structural constraints:

(C1) $A = \begin{pmatrix} A^v & A^c \\ 0 & A^d \end{pmatrix}$, with real and strictly negative eigenvalues.
(C2) The off-diagonal entries of $A^v$ are nonnegative.
(C3) The vector $\Theta = (\Theta^v, \Theta^d)$ has $\Theta^d = 0$, $\Theta^v \ge 0$, and $(-A^v \Theta^v) \gg 0$ (see end note i).
(C4) $B = \begin{pmatrix} I & B^c \\ 0 & 0 \end{pmatrix}$, and $b = (b^v, b^d)$ with $b^v = 0$ and $b^d = (1, \ldots, 1)$.

Note that condition C1 assumes strict mean reversion in all components, which is a typical assumption for interest rate models. Most equity pricing models, however, will not satisfy this condition in the strict sense: The Heston model, for example, is of the form (32), but has an eigenvalue of 0 in the matrix $A$, and thus does not satisfy C1. Nevertheless, relaxing this condition is in general not a problem, see for example [9]. Glasserman and Kim [10] show that the moments
of $Y_t$ are represented by the transform formula

$$\mathbb{E}[\exp(2u \cdot Y_t)] = \exp\left(-2\int_0^t \Theta^\top A\, x(s)\, ds + 2\int_0^t |x^d(s)|^2\, ds + 2x(t) \cdot Y_0\right) \tag{33}$$

where $x(t)$ is a solution to the coupled system of Riccati equations, given by

$$\begin{pmatrix} \dot x_1(t) \\ \vdots \\ \dot x_n(t) \end{pmatrix} = \begin{pmatrix} A^v & A^c \\ 0 & A^d \end{pmatrix} \cdot \begin{pmatrix} x_1(t) \\ \vdots \\ x_n(t) \end{pmatrix} + \begin{pmatrix} I & B^c \\ 0 & 0 \end{pmatrix} \cdot \begin{pmatrix} x_1^2(t) \\ \vdots \\ x_n^2(t) \end{pmatrix} \tag{34}$$

with initial condition $x(0) = u$. Equation (33) holds in the sense that if either side is well defined and finite, the other one is also finite, and equality holds. Thus, moment explosions can again be linked to the blow-up time of the ODE (34). Glasserman and Kim [10] consider two concrete specifications of the above model, with one volatility factor and one dependent factor in each case. Owing to conditions C1–C4, the model parameters are of the form $A = \begin{pmatrix} p & q \\ 0 & r \end{pmatrix}$, $B = \begin{pmatrix} 1 & s \\ 0 & 0 \end{pmatrix}$, and $\Theta = (\theta_1, 0)$, with $p < 0$, $q \ge 0$, $r < 0$, $s \ge 0$, and $\theta_1 \ge 0$.

• $q = s = 0$: This specification decouples the system (34) fully, which can then easily be solved explicitly. In this case, the moment explosion time is given by

$$T_*(u_1, u_2) = \begin{cases} +\infty, & u_1 \le -p \\ \frac{1}{p} \log\left(1 + \frac{p}{u_1}\right), & u_1 > -p \end{cases} \tag{35}$$

Note that the moment explosion time does not depend on $u_2$.

• $s > 0$, $q = 0$, $r = p < 0$: In this case, the system (34) decouples only partially; the equation for the second component becomes $\dot x_2 = p x_2$, with the solution $x_2(t) = u_2 e^{pt}$. Substituting into the equation for the first component yields $\dot x_1(t) = p x_1 + x_1^2 + s u_2^2 e^{2pt}$, a nonautonomous Riccati equation. After the transformation $\xi(t) = e^{-pt} x_1(t)$ it can be solved explicitly, and the moment explosion time is determined as

$$T_*(u_1, u_2) = \frac{1}{p} \log \max\left(0,\; \frac{p}{|u_2|\sqrt{s}}\, \mathrm{arccot}\left(\frac{u_1}{|u_2|\sqrt{s}}\right) + 1\right) \tag{36}$$

End Notes

a. On the intervals $(-\infty, 0)$ and $(1, \infty)$, respectively.
b. In fact, it does not even depend on $x$, which implies the homogeneity properties in equation (5).
c. Only $u \notin [0, 1]$ needs to be discussed; in this case, $\chi(u) = 0 \Rightarrow \Delta(u) < 0$.
d. When $u = 1$, equation (14) is precisely the Cox–Ingersoll–Ross bond pricing formula.
e. For $u < u^*$ since equation (14) explodes as $u \uparrow u^*$, where $u^* > 0$ is determined by $I(u^*) \equiv \lambda + \gamma(u) \coth(\gamma(u)t/2) = 0$.
f. Care is necessary since $f$ can be $+\infty$; see [15] for a proper discussion via localization.
g. A supersolution $f$ of equation (20) satisfies $Af - \frac{\partial f}{\partial t} \le 0$, a subsolution $f$ satisfies $Af - \frac{\partial f}{\partial t} \ge 0$.
h. $X_t^{\mathrm{Heston}}$ denotes the usual log-price process in the classical Heston model, that is, with $J \equiv 0$.
i. Following the notation of [7], "$\gg$" denotes strict inequality, simultaneously in all components of the vectors.

References

[1] Alfonsi, A. (2008). High Order Discretization Schemes for the CIR Process: Application to Affine Term Structure and Heston Models. Preprint.
[2] Andersen, L.B.G. & Piterbarg, V.V. (2007). Moment explosions in stochastic volatility models, Finance and Stochastics 11, 29–50.
[3] Barndorff-Nielsen, O.E. & Shephard, N. (2001). Non-Gaussian Ornstein–Uhlenbeck-based models and some of their uses in financial economics, Journal of the Royal Statistical Society B 63, 167–241.
[4] Bates, D.S. (2000). Post-'87 crash fears in the S&P 500 futures option market, Journal of Econometrics 94, 181–238.
[5] Benaim, S. & Friz, P. (2008). Smile asymptotics II: models with known moment generating functions, Journal of Applied Probability 45(1), 16–32.
[6] Benaim, S., Friz, P. & Lee, R. (2008). The Black Scholes implied volatility at extreme strikes, in Frontiers in Quantitative Finance: Volatility and Credit Risk Modeling, R. Cont, ed, Wiley, Chapter 2.
[7] Dai, Q. & Singleton, K.J. (2000). Specification analysis of affine term structure models, The Journal of Finance 55, 1943–1977.
[8] Duffie, D., Filipović, D. & Schachermayer, W. (2003). Affine processes and applications in finance, The Annals of Applied Probability 13(3), 984–1053.
[9] Filipović, D. & Mayerhofer, E. (2009). Affine Diffusion Processes: Theory and Applications, Preprint, arXiv:0901.4003.
[10] Glasserman, P. & Kim, K.-K. (2009). Moment explosions and stationary distributions in affine diffusion models, Mathematical Finance, Forthcoming, available at SSRN: http://ssrn.com/abstract=1280428.
[11] Gulisashvili, A. & Stein, E. (2009). Implied volatility in the Hull-White model, Mathematical Finance, to appear.
[12] Kallsen, J. & Muhle-Karbe, J. (2008). Utility Maximization in Affine Stochastic Volatility Models, Preprint.
[13] Keller-Ressel, M. (2008). Moment explosions and long-term behavior of affine stochastic volatility models, arXiv:0802.1823, forthcoming in Mathematical Finance.
[14] Lee, R. (2004). The moment formula for implied volatility at extreme strikes, Mathematical Finance 14(3), 469–480.
[15] Lions, P.-L. & Musiela, M. (2007). Correlations and bounds for stochastic volatility models, Annales de l'Institut Henri Poincaré 24, 1–16.
[16] Sato, K.-I. (1999). Lévy Processes and Infinitely Divisible Distributions, Cambridge University Press.

PETER K. FRIZ & MARTIN KELLER-RESSEL
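
As a numerical companion to the fully decoupled Dai–Singleton specification ($q = s = 0$), the sketch below integrates the scalar Riccati equation $\dot x_1 = p x_1 + x_1^2$, $x_1(0) = u_1$, and compares the observed blow-up time with formula (35). It is only an illustration under stated assumptions: the function names, the fixed-step Runge–Kutta integrator, the step size, and the threshold `cap` used as a proxy for blow-up are all choices made here, not part of the cited works.

```python
import math

def explosion_time_formula(u1, p):
    """Closed-form blow-up time of equation (35); assumes p < 0."""
    if u1 <= -p:
        return math.inf
    return math.log(1.0 + p / u1) / p

def explosion_time_numeric(u1, p, dt=1e-4, t_max=20.0, cap=1e6):
    """Integrate x' = p*x + x*x, x(0) = u1, with classical RK4 and return
    the first time the solution exceeds `cap` (an illustrative proxy for
    blow-up). Returns math.inf if no blow-up is seen before t_max."""
    def f(x):
        return p * x + x * x

    x, t = u1, 0.0
    while t < t_max:
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        x += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += dt
        # Overflow yields inf or nan rather than an exception; treat both as blow-up.
        if not math.isfinite(x) or x > cap:
            return t
    return math.inf

if __name__ == "__main__":
    p, u1 = -1.0, 2.0   # hypothetical parameters with u1 > -p
    print(explosion_time_formula(u1, p), explosion_time_numeric(u1, p))
```

With $p = -1$ and $u_1 = 2$ formula (35) gives $\log 2$, and the numerical blow-up time should agree with it up to a few step sizes. For $u_1 \le -p$ the solution decays instead of exploding.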

Implied Volatility in Stochastic Volatility Models

Given the geometric Brownian motion (hence constant volatility) dynamics of an underlying share price, the Black–Scholes formula finds the no-arbitrage prices of call or put options. Given, on the other hand, the price of a call or put, the Black–Scholes implied volatility is by definition the unique volatility parameter such that the Black–Scholes formula recovers the given option price.

If the share price truly follows geometric Brownian motion, then the Black–Scholes implied volatility matches the constant realized volatility of the shares. Empirically, however, stock prices do not exhibit constant volatility, which explains the description in [23] of implied volatility as "the wrong number to put in the wrong formula to obtain the right price".

Nonetheless, the Black–Scholes implied volatility remains, at the very least, a language/scale/metric by which option prices may be quoted and compared across strikes, expiries, underliers, and observation times, as noted in [17].

Moreover, even under stochastic volatility dynamics, Black–Scholes implied volatility is not only a language but indeed carries meaningful information about realized volatility. This article surveys those relationships between implied and realized stochastic volatility, in particular, the following:

• Expected realized variance equals the weighted average of implied variance across strikes, with "implied normal" weights.
• Implied volatility of an option is the break-even realized volatility for "business-time delta hedging" of that option.
• Implied volatility at-the-money approximates expected realized volatility, under an independence condition.

Aside from Black–Scholes implied volatility, alternative notions of options-implied volatility have robust relationships to realized volatility. We define and discuss two notions of model-free implied volatility (MFIV):

• "VIX-style" MFIV equals the square root of expected variance.
• "Synthetic volatility swap (SVS) style" MFIV equals expected volatility under an independence condition, and approximates expected volatility under perturbations of that condition.

Unless otherwise noted, the only assumptions on the underlying price process are positivity and continuity. Specifically, on a filtered probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}, \mathbb{P})$, let $S$ be a positive continuous martingale. Regard $S$ as the share price of an underlying tradable asset, and $\mathbb{P}$ as risk-neutral measure, with respect to a bond having price 1 at all times. Extensions to arbitrary deterministic interest rates are straightforward. Let $\mathbb{E}_t$ denote $\mathcal{F}_t$-conditional expectation, with respect to $\mathbb{P}$. Let

$$X_t := \log(S_t/S_0) \tag{1}$$

denote the log returns process, and let $\langle X \rangle_t$ denote its quadratic variation process, which may be regarded as the unannualized running total of the squared realized returns of $S$ continuously monitored on $[0, t]$.

Fixing a time horizon $T > 0$, define realized variance to be $\langle X \rangle_T$ and define realized volatility to be the square root $\sqrt{\langle X \rangle_T}$ of realized variance. For example, if $S$ has dynamics

$$dS_t = \sigma_t S_t\, dW_t \tag{2}$$

with respect to Brownian motion $W$, then $\langle X \rangle_T = \int_0^T \sigma_t^2\, dt$.

Black–Scholes Implied Volatility

Fix a time horizon $T > 0$. Define the Black–Scholes [3] function, for $S$, $K$, and $\sigma$ positive, by

$$C^{bs}(\sigma, S, K) := S\, N\!\left(\frac{\log(S/K)}{\sigma} + \frac{\sigma}{2}\right) - K\, N\!\left(\frac{\log(S/K)}{\sigma} - \frac{\sigma}{2}\right) \tag{3}$$

where $N$ is the standard normal cdf. Define $C^{bs}(0, S, K) := (S - K)^+$.
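
To make the dimensionless convention of equation (3) concrete, here is a minimal sketch of $C^{bs}$ using only the standard library. The function names `norm_cdf` and `bs_call` are chosen here for illustration; `sigma` is the total (unannualized) volatility over $[0, T]$ and interest rates are zero, as assumed in the text.

```python
import math

def norm_cdf(x):
    """Standard normal cdf N(x), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(sigma, S, K):
    """Dimensionless Black-Scholes call value C^bs(sigma, S, K), equation (3).
    For sigma == 0 the value is defined as the intrinsic value (S - K)^+."""
    if sigma == 0.0:
        return max(S - K, 0.0)
    m = math.log(S / K) / sigma
    return S * norm_cdf(m + 0.5 * sigma) - K * norm_cdf(m - 0.5 * sigma)

if __name__ == "__main__":
    # At the money the formula collapses to S * (2 N(sigma/2) - 1), about 7.97 here.
    print(bs_call(0.2, 100.0, 100.0))
```

An annualized volatility $\bar\sigma$ quoted for expiry $T$ would enter this function as `sigma = sigma_bar * math.sqrt(T)`.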

For each $K > 0$, define the time-0 dimensionless Black–Scholes implied volatility $IV_0(K)$ to be the unique solution of

$$C^{bs}(IV_0(K), S_0, K) = \mathbb{E}_0 (S_T - K)^+ =: C(K) \tag{4}$$

Dividing $IV_0(K)$ by $\sqrt{T}$ produces the usual annualized implied volatility.

Often, it is more convenient to regard the Black–Scholes formula as a function of dimensionless variance instead of dimensionless volatility, so define

$$C^{BS}(V, S, K) := C^{bs}(\sqrt{V}, S, K) \tag{5}$$

Moreover, it may be convenient to regard the Black–Scholes implied volatility as a function of log strike instead of strike, so define

$$\mathbf{IV}_0(k) := IV_0(S_0 e^k) \tag{6}$$

We survey how the realized volatility $\sqrt{\langle X \rangle_T}$ (or realized variance $\langle X \rangle_T$) relates to the time-0 implied volatility (or its square, the implied variance).

Expected Realized Variance Equals Weighted Average Implied Variance Across Strikes

Black–Scholes implied variance at one strike does not determine the risk-neutral expectation of realized variance, but the weighted average of implied variance at all strikes does so. This result facilitates, for instance, analysis [14] of how the implied volatility skew's slope and convexity relate to expected variance.

The "implied normal" weights are given by the standard normal distribution, applied to the log strike standardized by "implied standard deviations". Specifically, assuming $\mathbf{IV}_0(k) > 0$, define the standardized log strike by

$$z(k) := -d_2(k) := \frac{k}{\mathbf{IV}_0(k)} + \frac{\mathbf{IV}_0(k)}{2} \tag{7}$$

The result

$$\mathbb{E}_0 \langle X \rangle_T = \int_{-\infty}^{\infty} \mathbf{IV}_0^2(k)\, dN(z(k)) \tag{8}$$

follows from relating each side to the value of a log contract.

Expected Realized Variance Equals Log Contract Value. Realized variance admits replication by holding a log contract and dynamically trading shares, via the strategy developed in [7, 10, 21], and [9]. Specifically,

$$dX_t = d \log S_t = \frac{1}{S_t}\, dS_t - \frac{1}{2S_t^2}\, d\langle S \rangle_t = \frac{1}{S_t}\, dS_t - \frac{1}{2}\, d\langle X \rangle_t \tag{9}$$

hence,

$$\langle X \rangle_T = -2 \log(S_T/S_0) + \int_0^T \frac{2}{S_t}\, dS_t \tag{10}$$

Therefore, the log contract payoff $-2\log(S_T/S_0)$, plus the profit/loss from a dynamic position long $2/S_t$ shares, replicates $\langle X \rangle_T$. A corollary is that

$$\mathbb{E}_0 \langle X \rangle_T = \mathbb{E}_0 (-2X_T) \tag{11}$$

if they are finite.

Log Contract Value Equals Weighted Average Implied Variance. In turn, the expectation of $-2X_T$ equals weighted average implied variance. Proofs appear in [19] and [22]. The following is due to [22]. Let $P(K) := \mathbb{E}_0 (K - S_T)^+$. Assuming differentiability of $\mathbf{IV}_0$,

$$\mathbb{E}_0(-2X_T) = \int_0^{S_0} \frac{2}{K^2} P(K)\, dK + \int_{S_0}^{\infty} \frac{2}{K^2} C(K)\, dK \tag{12}$$

$$= \int_0^{S_0} \frac{2}{K} P'(K)\, dK + \int_{S_0}^{\infty} \frac{2}{K} C'(K)\, dK \tag{13}$$

$$= 2\int_{-\infty}^{0} \left[ N'(d_2)\, \mathbf{IV}_0' + N(-d_2) \right] dk + 2\int_0^{\infty} \left[ N'(d_2)\, \mathbf{IV}_0' - N(d_2) \right] dk \tag{14}$$

$$= 2\int_{-\infty}^{\infty} k\, N'(d_2)\, d_2'\, dk + 2\int_{-\infty}^{\infty} N'(d_2)\, \mathbf{IV}_0'\, dk \tag{15}$$

$$= 2\int_{-\infty}^{\infty} \left[ k\, N'(d_2)\, d_2' + N'(d_2)\, d_2\, d_2'\, \mathbf{IV}_0 \right] dk \tag{16}$$

$$= \int_{-\infty}^{\infty} \mathbf{IV}_0^2(k)\, N'(z(k))\, z'(k)\, dk \tag{17}$$

where $'$ denotes derivative (unambiguously, as $C$, $P$, $d_2$, $\mathbf{IV}_0$, $N$ are defined as single-variable functions).

For brevity, we suppress the argument $(k)$ of $d_2$ and $\mathbf{IV}_0$ and their derivatives.

To justify the integration by parts in equations (15, 16), it suffices to assume the existence of $\varepsilon > 0$ such that $\mathbb{E} S_T^{1+\varepsilon} < \infty$ and $\mathbb{E} S_T^{-\varepsilon} < \infty$. Then the moment formula [18] implies that for some $\beta < 2$ and all $|k|$ sufficiently large, we have $\mathbf{IV}_0^2(k) < \beta |k|$; hence

$$k\, N(-d_2) \Big|_{-\infty}^{0} - k\, N(d_2) \Big|_{0}^{\infty} = 0 \quad \text{and} \quad N'(d_2)\, \mathbf{IV}_0 \Big|_{-\infty}^{\infty} = 0 \tag{18}$$

Combining equations (11) and (17) gives the conclusion in equation (8).

Implied Volatility Equals Break-even Realized Volatility

Suppose that we buy at time 0 a $T$-expiry $K$-strike call or put; to be definite, let us say a call. We pay a premium of $C_0 := C^{BS}(IV_0^2(K), S_0, K)$.

Dynamically delta hedging this option using shares, we have, in principle, a position that is delta neutral and "long vega". Indeed, the implied volatility is the option's break-even realized volatility in the following sense: There exists a model-independent share trading strategy $N_t$, such that

$$P\&L := -C_0 + \int_0^T N_t\, dS_t + (S_T - K)^+ < 0 \quad \text{in the event } \sqrt{\langle X \rangle_T} < IV_0(K) \tag{19}$$

and $P\&L \ge 0$ in the event $\sqrt{\langle X \rangle_T} \ge IV_0(K)$.

In other words, total profit/loss (from the time-0 option purchase, the trading in shares, and the time-$T$ option payout) is negative if and only if volatility realizes to less than the initial implied volatility.

Implied Volatility is Break-even Realized Volatility for Business-time Delta Hedging. Define the business-time delta hedging strategy by letting

$$\tau := \inf\{t : \langle X \rangle_t = IV_0^2(K)\} \tag{20}$$

and holding $N_t$ shares at each time $t \in [0, T]$, where

$$N_t := -\frac{\partial C^{BS}}{\partial S}\left(IV_0^2(K) - \langle X \rangle_t, S_t, K\right), \qquad t \in [0, \tau \wedge T] \tag{21}$$

$$N_t := -\mathbf{1}(S_\tau > K), \qquad t \in (\tau \wedge T, T] \tag{22}$$

The break-even property follows from applying Ito's rule to the process

$$C_t := C^{BS}\left(IV_0^2(K) - \langle X \rangle_t, S_t, K\right), \qquad t \in [0, \tau \wedge T] \tag{23}$$

to obtain

$$dC_t = -\frac{\partial C^{BS}}{\partial V}\, d\langle X \rangle_t + \frac{\partial C^{BS}}{\partial S}\, dS_t + \frac{1}{2} \frac{\partial^2 C^{BS}}{\partial S^2}\, d\langle S \rangle_t = \left(\frac{1}{2} S_t^2 \frac{\partial^2 C^{BS}}{\partial S^2} - \frac{\partial C^{BS}}{\partial V}\right) d\langle X \rangle_t + \frac{\partial C^{BS}}{\partial S}\, dS_t = \frac{\partial C^{BS}}{\partial S}\, dS_t \tag{24}$$

where the partials of $C^{BS}$ are evaluated at $(IV_0^2(K) - \langle X \rangle_t, S_t, K)$. Therefore,

$$-C_{\tau \wedge T} = -C_0 - \int_0^{\tau \wedge T} \frac{\partial C^{BS}}{\partial S}\, dS_t \tag{25}$$

as shown in [2, 11, 20].

In the event $\langle X \rangle_T < IV_0^2(K)$, hence $T < \tau$, we have

$$P\&L = (S_T - K)^+ - C_T = (S_T - K)^+ - C^{BS}\left(IV_0^2(K) - \langle X \rangle_T, S_T, K\right) < 0 \tag{26}$$

and in the event $\langle X \rangle_T \ge IV_0^2(K)$, hence $\tau \le T$, we have

$$P\&L = (S_T - K)^+ - C_\tau - \int_\tau^T \mathbf{1}(S_\tau > K)\, dS_t = (S_T - K)^+ - (S_\tau - K)^+ - \mathbf{1}(S_\tau > K)(S_T - S_\tau) \ge 0 \tag{27}$$

as claimed. This break-even result is a special case of a proposition in [6].

Implied Volatility is Not Break-even Realized Volatility for Standard Delta Hedging. The break-even property of the previous section does not extend
to standard "calendar time" delta hedging, defined by share holdings

$$-\frac{\partial C^{BS}}{\partial S}\left((T - t)\, \overline{IV}_0^2, S_t, K\right), \qquad t \in [0, T] \tag{28}$$

where $\overline{IV}_0^2 := IV_0^2(K)/T$ denotes the time-0 annualized implied variance.

This strategy guarantees neither a profit in the event that $\langle X \rangle_T > IV_0^2(K)$ nor a loss in the opposite event. To see this, under the dynamics (2), let

$$Y_t = C^{BS}\left((T - t)\, \overline{IV}_0^2, S_t, K\right) \tag{29}$$

and apply Ito's rule to obtain

$$dY_t = -\frac{\partial C^{BS}}{\partial V}\, \overline{IV}_0^2\, dt + \frac{\partial C^{BS}}{\partial S}\, dS_t + \frac{1}{2} \frac{\partial^2 C^{BS}}{\partial S^2}\, d\langle S \rangle_t = -\frac{1}{2}\, \overline{IV}_0^2\, S_t^2 \frac{\partial^2 C^{BS}}{\partial S^2}\, dt + \frac{\partial C^{BS}}{\partial S}\, dS_t + \frac{1}{2}\, \sigma_t^2 S_t^2 \frac{\partial^2 C^{BS}}{\partial S^2}\, dt \tag{30}$$

where the partial derivatives of $C^{BS}$ are evaluated at $((T - t)\, \overline{IV}_0^2, S_t, K)$. Hence,

$$P\&L = Y_T - Y_0 - \int_0^T \frac{\partial C^{BS}}{\partial S}\, dS_t = \int_0^T \frac{1}{2} \left(\sigma_t^2 - \overline{IV}_0^2\right) S_t^2 \frac{\partial^2 C^{BS}}{\partial S^2}\, dt \tag{31}$$

which is half the time-integrated cash-gamma-weighted difference of instantaneous variance $\sigma_t^2$ and implied variance $\overline{IV}_0^2$, as shown in [7] and [12]. So if, along some trajectory, $\sigma_t > \overline{IV}_0$ at points where gamma is low, but $\sigma_t < \overline{IV}_0$ at points where gamma is high, then it can occur that realized variance $\int_0^T \sigma_t^2\, dt$ exceeds implied variance $IV_0^2$, yet this long-vega strategy incurs a loss.

In conclusion, implied volatility is the option's break-even realized volatility for business-time delta hedging, but not for calendar-time delta hedging.

Implied Volatility ATM Approximates Expected Realized Volatility, Under an Independence Condition

In this section, we specialize to dynamics (2) such that $\sigma$ and $W$ are independent.

Let $K_{atm} := S_0$ be the at-the-money (ATM) strike. Then

$$\mathbb{E}(S_T - K_{atm})^+ = \mathbb{E}\, C^{bs}\left(\sqrt{\langle X \rangle_T}, S_0, K_{atm}\right) \tag{32}$$

$$\le C^{bs}\left(\mathbb{E}\sqrt{\langle X \rangle_T}, S_0, K_{atm}\right) \tag{33}$$

by the conditioning argument of [15], independence, and the concavity of

$$v \mapsto C^{bs}(v, S_0, K_{atm}) \tag{34}$$

It follows that

$$IV_0(K_{atm}) \le \mathbb{E}\sqrt{\langle X \rangle_T} \tag{35}$$

The function (34), while concave, is nearly linear for small $v$; indeed, its second derivative vanishes at $v = 0$, as observed in [4]. Therefore, the inequalities (33) and (35) are nearly equalities, as shown in [13]. In that sense,

$$IV_0(K_{atm}) \approx \mathbb{E}\sqrt{\langle X \rangle_T} \tag{36}$$

assuming the independence of $\sigma$ and $W$.

Model-free Implied Volatility (MFIV)

Inverting Black–Scholes is not the only way to extract an implied volatility from option prices. While the ATM Black–Scholes implied volatility approximates expected volatility under the independence assumption, alternative definitions of MFIV use call/put data at all strikes, in order to reflect the expected variance or volatility under more general conditions.

VIX-style MFIV Equals the Square Root of Expected Realized Variance

Motivated by equation (11), define the VIX-style model-free implied volatility by

$$VIXIV_0 := \sqrt{\mathbb{E}_0[-2X_T]} = \sqrt{\mathbb{E}_0\left[-2\log(S_T/S_0) + 2(S_T/S_0) - 2\right]} \tag{37}$$
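
A small worked example may help fix the size of the effects in (35), (36), and (37). The sketch below assumes a hypothetical two-point distribution for realized variance $\langle X \rangle_T$, independent of $W$, so that the ATM call price is the mixture in (32). It then inverts the ATM Black–Scholes formula by bisection and compares the result with expected realized volatility and with the square root of expected variance, which by (11) is the value defined in (37). All names and numbers are illustrative assumptions.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def atm_call(sigma, S0=1.0):
    """C^bs(sigma, S0, S0) from equation (3), which reduces to S0*(2N(sigma/2)-1)."""
    return S0 * (2.0 * norm_cdf(0.5 * sigma) - 1.0)

def atm_implied_vol(price, S0=1.0, lo=0.0, hi=5.0, tol=1e-12):
    """Invert atm_call by bisection; atm_call is increasing in sigma."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if atm_call(mid, S0) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical law of realized variance <X>_T: 0.01 or 0.09 with equal probability.
variances = [0.01, 0.09]
probs = [0.5, 0.5]

# Mixture price, equation (32), then the ATM implied volatility it induces.
price = sum(p * atm_call(math.sqrt(v)) for p, v in zip(probs, variances))
iv_atm = atm_implied_vol(price)

exp_vol = sum(p * math.sqrt(v) for p, v in zip(probs, variances))   # E sqrt(<X>_T)
vix_iv = math.sqrt(sum(p * v for p, v in zip(probs, variances)))    # sqrt(E <X>_T)

print(iv_atm, exp_vol, vix_iv)
```

In this example the ATM implied volatility falls short of expected volatility (0.2) by only a few basis points, consistent with the near-linearity behind (36), while the square root of expected variance is noticeably larger.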

$VIXIV_0$ is an observable function of option prices, specifically the square root of the time-0 value of the portfolio

$$\frac{2}{K^2}\, dK \text{ calls at strikes } K > S_0, \qquad \frac{2}{K^2}\, dK \text{ puts at strikes } K < S_0 \tag{38}$$

Indeed in 2003, the Chicago Board Options Exchange (CBOE) [8] adopted an implementation of equation (38) to define the VIX volatility index (but due to the availability of only finitely many strikes in practice, the CBOE VIX is not precisely identical to $VIXIV_0$; see [16]).

By equation (11), the square of VIX-style MFIV equals expected realized variance:

$$VIXIV_0 = \sqrt{\mathbb{E}_0 \langle X \rangle_T} \tag{39}$$

However, by Jensen's inequality, for random $\langle X \rangle_T$,

$$VIXIV_0 > \mathbb{E}_0 \sqrt{\langle X \rangle_T} \tag{40}$$

thus, VIX-style MFIV differs from expected realized volatility, due to convexity.

SVS-style MFIV Equals Expected Realized Volatility

For nonconstant $\langle X \rangle_T$, the VIX-style MFIV never equals $\mathbb{E}_0 \sqrt{\langle X \rangle_T}$; in contrast, the SVS-style MFIV will equal $\mathbb{E}_0 \sqrt{\langle X \rangle_T}$ exactly under an independence condition, and approximately under perturbations of that condition. Define SVS-style model-free implied volatility (where SVS stands for "synthetic volatility swap") by

$$SVSIV_0 := \mathbb{E}_0\left[\sqrt{\frac{\pi}{2}}\, e^{X_T/2} \left( X_T\, I_0(X_T/2) - X_T\, I_1(X_T/2) \right)\right] \tag{41}$$

where $X_T := \log(S_T/S_0)$ and $I_\nu$ is the modified Bessel function of order $\nu$.

$SVSIV_0$ is observable from option prices, as the time-0 value of the portfolio

$$\begin{aligned} &\sqrt{\pi/2}\,/\,S_0 \text{ straddles at strike } K = S_0, \\ &\sqrt{\frac{\pi}{8K^3 S_0}} \left[ I_1\!\left(\log\sqrt{K/S_0}\right) - I_0\!\left(\log\sqrt{K/S_0}\right) \right] dK \text{ calls at strikes } K > S_0, \\ &\sqrt{\frac{\pi}{8K^3 S_0}} \left[ I_0\!\left(\log\sqrt{K/S_0}\right) - I_1\!\left(\log\sqrt{K/S_0}\right) \right] dK \text{ puts at strikes } K < S_0 \end{aligned} \tag{42}$$

Under the dynamics (2) with $\sigma$ and $W$ independent, the exact equality

$$SVSIV_0 = \mathbb{E}_0 \sqrt{\langle X \rangle_T} \tag{43}$$

is proved in [5]. Moreover, it still holds approximately, under perturbations of the independence assumption. To be precise, consider a family of processes $S^{[\rho]}$, indexed by parameters $\rho \in [-1, 1]$, and defined by

$$dS_t^{[\rho]} = \sqrt{1 - \rho^2}\, \sigma_t S_t^{[\rho]}\, dW_{1t} + \rho\, \sigma_t S_t^{[\rho]}\, dW_{2t}, \qquad S_0^{[\rho]} = S_0 \tag{44}$$

where $W_1$ and $W_2$ are $\mathcal{F}_t$-Brownian motions, and $\sigma$ and $W_2$ are adapted to some filtration $\mathcal{H}_t \subseteq \mathcal{F}_t$, where $\mathcal{H}_T$ and $\mathcal{F}_T^{W_1}$ are independent. This includes all the standard stochastic volatility models of the form $d\sigma_t = \alpha(\sigma_t)\, dt + \beta(\sigma_t)\, dW_{2t}$.

Changing the $\rho$ parameter does not affect the $\sigma$ dynamics, and hence cannot affect $\mathbb{E}_0 \sqrt{\langle X \rangle_T}$. However, changing $\rho$ does change the $S$ dynamics, and hence may change option prices, $IV_0(K_{atm})$, and $SVSIV_0$. Thus, the relationships (36, 43) below

$$IV_0(K_{atm}) \approx \mathbb{E}_0 \sqrt{\langle X \rangle_T}, \qquad SVSIV_0 = \mathbb{E}_0 \sqrt{\langle X \rangle_T} \tag{45}$$

that are valid for the uncorrelated case $S = S^{[0]}$, may not hold for $S = S^{[\rho]}$ where $\rho \neq 0$. Unlike $IV_0(K_{atm})$, the $SVSIV_0$ has the robustness property of being immunized against perturbations of $\rho$ around $\rho = 0$, meaning that

$$\frac{\partial}{\partial \rho}\bigg|_{\rho=0} SVSIV_0 = 0 \tag{46}$$

can be verified. This suggests that SVS-style implied volatility $SVSIV_0$ should outperform Black–Scholes implied volatility $IV_0$, as an approximation to the expected realized volatility, at least for $\rho$ not too large.

This is confirmed in [5] for Heston dynamics with parameters from [1], and $T = 0.5$. Across essentially all correlation assumptions, the SVS notion of implied volatility exhibited the smallest bias, relative to the true expected annualized volatility. For example, in the case $\rho = -0.64$, the VIX-style implied volatility had bias +98 bp, the Black–Scholes implied volatility had bias −30 bp, and the SVS-style implied volatility had the smallest bias, −6 bp.

Acknowledgments

This article benefited from the comments of Peter Carr.

References

[1] Bakshi, G., Cao, C. & Chen, Z. (1997). Empirical performance of alternative option pricing models, Journal of Finance 52, 2003–2049.
[2] Bick, A. (1995). Quadratic-variation-based dynamic strategies, Management Science 41, 722–732.
[3] Black, F. & Scholes, M. (1973). The pricing of options and corporate liabilities, Journal of Political Economy 81, 637–659.
[4] Brenner, M. & Subrahmanyam, M. (1988). A simple formula to compute the implied standard deviation, Financial Analysts Journal 44, 80–83.
[5] Carr, P. & Lee, R. (2008). Robust Replication of Volatility Derivatives, Bloomberg LP, University of Chicago.
[6] Carr, P. & Lee, R. (2008). Hedging Variance Options on Continuous Semimartingales, Forthcoming in Finance and Stochastics.
[7] Carr, P. & Madan, D. (1998). Towards a theory of volatility trading, in Volatility, R. Jarrow, ed, Risk Publications, pp. 417–427.
[8] CBOE. (2003). The VIX White Paper, Chicago Board Options Exchange.
[9] Derman, E., Demeterfi, K., Kamal, M. & Zou, J. (1999). A guide to volatility and variance swaps, Journal of Derivatives 6, 9–32.
[10] Dupire, B. (1992). Arbitrage pricing with stochastic volatility, Société Générale.
[11] Dupire, B. (2005). Volatility Derivatives Modeling, Bloomberg LP.
[12] El Karoui, N., Jeanblanc-Picqué, M. & Shreve, S. (1998). Robustness of the Black and Scholes formula, Mathematical Finance 8, 93–126.
[13] Feinstein, S.P. (1989). The Black–Scholes Formula is Nearly Linear in Sigma for At-the-Money Options: Therefore Implied Volatilities from At-the-Money Options are Virtually Unbiased, Federal Reserve Bank of Atlanta.
[14] Gatheral, J. (2006). The Volatility Surface: A Practitioner's Guide, John Wiley & Sons.
[15] Hull, J. & White, A. (1987). The pricing of options on assets with stochastic volatilities, Journal of Finance 42, 281–300.
[16] Jiang, G.J. & Tian, Y.S. (2005). The model-free implied volatility and its information content, Review of Financial Studies 18, 1305–1342.
[17] Lee, R. (2004). Implied volatility: statics, dynamics, and probabilistic interpretation, Recent Advances in Applied Probability, Springer, pp. 241–268.
[18] Lee, R. (2004). The moment formula for implied volatility at extreme strikes, Mathematical Finance 14, 469–480.
[19] Matytsin, A. (2000). Perturbative Analysis of Volatility Smiles, Merrill Lynch.
[20] Mykland, P. (2000). Conservative delta hedging, Annals of Applied Probability 10, 664–683.
[21] Neuberger, A. (1994). The log contract, Journal of Portfolio Management 20, 74–80.
[22] Polishchuk, A. (2007). Variance swap valuation, Bloomberg LP.
[23] Rebonato, R. (1999). Volatility and Correlation in the Pricing of Equity, FX and Interest Rate Options, John Wiley & Sons.

PETER CARR & ROGER LEE

Local Volatility Model

The most important input for pricing equity derivatives comes from vanilla call and put options on an equity index or a single stock. The market convention for these options follows the classic Black–Scholes–Merton (BSM) model [3, 15]: the price of each option can be represented by a single number called the implied volatility, which is the unknown volatility parameter required in the BSM model to reproduce the price. The implied volatilities for different maturities and strikes are often significantly different, and they collectively form the implied volatility surface of the underlying. A fundamental modeling problem is to explain the implied volatility surface accurately using logical assumptions. Many interesting applications follow directly from the solution to this problem. The impact from different parts of the implied volatility surface on a product can be assessed, leading to a deeper understanding of the product, its risks, and the associated hedging strategy. Moreover, different derivatives products, including those not available from the market, may be priced and analyzed under assumptions consistent with the vanilla options.

This article discusses the local volatility surface approach for analyzing equity implied volatility surfaces and examines a common framework in which different modeling assumptions can be incorporated. Local volatility models were first developed by Dupire [11], Derman and Kani [9], and Rubinstein [19] in the last decade and have since become one of the most popular approaches in equity derivatives quantitative research [1, 2, 7, 8, 10, 13, 16, 18]. We present the model from a practitioner's perspective, discussing calibration techniques with extension to dividends and interest rate modeling, with emphasis on the ease of application to real-world problems.

Basic Model

The basic local volatility model is an extension of BSM to the case where the diffusion volatility becomes a deterministic function of time and the spot price. In the absence of dividends, the stock dynamics can be represented by the following stochastic differential equation:

$$\frac{dS_t}{S_t} = g_t\, dt + \sigma(t, S_t)\, dW_t \tag{1}$$

where $S_t$ is the stock price at time $t$, $g_t = r_t - b_t$ is the known growth rate of the stock ($r_t$ is the interest rate and $b_t$ is the effective stock borrowing cost) at $t$, $\sigma(t, S)$ is the local volatility function for given time $t$ and stock price $S$, and $W_t$ is a Brownian motion representing the uncertainty in the stock price. Dynamics (1) can also be viewed as the effective representation of a general stochastic volatility model where $\sigma^2(t, S)$ is the expectation of the instantaneous diffusion coefficient conditioning on $S_t = S$ [13, 17]. If we use $C(t, S; T, K) = E[\max(S_T - K, 0) \mid S_t = S]$ to represent the undiscounted price for a European call option with maturity $T$ and strike $K$ when the stock price at time $t \le T$ is $S$, then equation (1) leads to the well-known Dupire equation for $C$:

$$\frac{\partial C}{\partial T} = \frac{\sigma^2(T, K) K^2}{2} \frac{\partial^2 C}{\partial K^2} + g_T \left( C - K \frac{\partial C}{\partial K} \right) \tag{2}$$

Equation (2) gives the relationship between the call option price $C$ and the local volatility function $\sigma(t, S)$. In theory, if arbitrage-free prices of $C(T, K)$ were known for arbitrary $T$ and $K$, $\sigma(t, S)$ could be recovered by inverting equation (2) with differentials of $C$. In practice, the market option prices are only directly available on a few maturities and strikes. Schemes for interpolating and extrapolating implied volatilities are often adopted in practice to arrive at a smooth function $C(T, K)$. Such schemes, however, typically lack explicit controls on the various derivative terms in equation (2), and the local volatility directly inverted from equation (2) can exhibit strange shapes and sometimes attain nonphysical values for reasonable implied volatility input.

Instead of assuming the implied volatilities perfectly known for all maturities and strikes and inverting equation (2), one can model the local volatility function $\sigma(t, S)$ directly as a parametric function. Solving the forward partial differential equation (2) numerically with the initial conditions

$$C(t, S, T = t, K) = \max(S - K, 0) \tag{3}$$

yields call option prices for all maturities $T$ and strikes $K$, from which the implied volatility surface can be derived. The parameters of the local volatility function can then be determined by matching the implied volatility surface generated from the model

to that from the market. With a careful design of the local volatility function, this so-called calibration process can be implemented very efficiently for practical use. This methodology has the advantage that the knowledge of a perfect implied volatility surface is not required and the model is arbitrage free by construction. In addition, a great amount of analytical flexibility is available, which allows tailor-made designs of different models for specific purposes.

Volatility Surface Design and Calibration

The key to the success of a volatility model lies in an understanding of how the implied volatility surface is used in practice. Empirically, option traders often refer to the implied volatility surface and its shape deformation with intuitive descriptions such as level, slope, and curvature, effectively approximating the shape as simple quadratic functions. In addition, for strikes away from the at-the-money (ATM) region, sometimes the ability to modify the out-of-the-money (OTM) surface independently of the central shape is desired, which traders intuitively speak of as changing the put wing or the call wing. Thus there exist several degrees of freedom on the volatility surface that a good model should be able to accommodate, and we can design the local volatility function so that each mode is captured by a distinct parameter.

To facilitate comparison across different modeling techniques, we standardize the model specification in terms of the BSM implied volatilities on a small number of strikes per maturity, typically three or five. For example, volatilities on three strikes in the ATM region can be used to provide a precise definition of the traders’ level, slope, and curvature parameters. Similarly, fixing volatilities at one downside strike and one upside strike in the OTM region allows the model to agree on a five-parameter specification of level, slope, curvature, put wing, and call wing. These calibration strikes on each maturity are chosen to cover the range of practical interest, usually one to two standard deviations of diffusion at the stock’s typical volatility. In the absence of fine structures such as sharp jumps in the underlying, we expect that one standard deviation in the strike range provides a natural length scale over which the stock price distribution varies smoothly. Thus the implied volatility should have a very smooth shape over the range thus defined, and matching the implied volatilities at the calibration strikes should produce a very good match for all the implied volatilities between them.

The preceding discussions give a straightforward strategy for building the local volatility model: we specify a small number of strikes, and tune the local volatility function with the same number of parameters as the number of strikes for each maturity in a bootstrapping process. The local volatility parameters are then solved through a root-finding routine so that the implied volatilities at the specified strikes on each maturity are reproduced. As each local volatility parameter is designed to capture a distinct aspect of the surface shape, the root-finding system is well behaved and converges quickly to the solution in practice. More importantly, such a process produces much less numerical noise than a typical optimization process, giving rise to much more stable calibration results. This is essential in ensuring robust Greeks and scenario outputs from the model.

Discrete Dividend Models

Dividend modeling is an important problem in equity derivatives. It can be shown [15] that with nonzero dividends, the original BSM model only works when the payment amount is proportional to the stock price immediately before the ex-dividend date (ex-date), through incorporating the dividend yields in $g_t$ of equation (1). However, many market participants tend to view future dividends as absolute cash amounts, and this is especially true now that trading in index dividend swaps has become liquid. Existing literature [4, 5, 12, 14] suggests that even in the case of a constant volatility, cash dividend equity models (also known as discrete dividend models) are much less tractable than proportional dividend ones. Recently, Overhaus et al. [16] proposed a theory to strip future cash dividends from the stock price to arrive at a pure stock process, on which one can apply the Dupire equation. This theory calls for the changes in future dividends to have a global impact, especially for maturities before their ex-dates, a feature that certain traders find somewhat counterintuitive.

Nontrivial dividend specifications can be naturally introduced in the framework here. We note that between ex-dates, equations (1) and (2) continue to
hold without modification. Across an ex-date $T_i$, a simple model for the stock price is

$$ S_{T_i^+} = (1 - Y_i)\, S_{T_i^-} - D_i \tag{4} $$

where $Y_i$ and $D_i$ are respectively the dividend yield and cash dividend amount for the ex-date $T_i$. This is the mixed-dividend model, which includes proportional and cash dividend models as special cases. $Y_i$ and $D_i$ can be determined from a nominal dividend schedule specifying the ex-dates and the dividend payment amount, as well as a mixing schedule specifying the portion of dividends that should remain as cash, the rest being converted into proportional yield. Typically, cash dividends can be specified for the first few years to reflect the certainty on expected dividends, gradually switching to all proportional in the long term. Theoretically, equation (4) has the disadvantage of allowing negative ex-dividend stock prices. In practice, if the mixing ratio is switched to all proportional after a few years, this does not pose a serious problem. According to dividend model (4), the forward equation across the ex-date becomes

$$ C(T_i^+, K) = (1 - Y_i)\, C\!\left(T_i^-, \frac{K + D_i}{1 - Y_i}\right) \tag{5} $$

and can be implemented in the same way as standard jump conditions. With equation (5) incorporated, the calibration strategy in the section Volatility Surface Design and Calibration can be applied in exactly the same way. We note that it is straightforward to extend the local volatility model here to handle more interesting dividend models, in which the dividend amount can be made a function of the spot immediately before the ex-date. As long as such a function becomes small enough when the stock price goes to zero, the issue of negative ex-dividend stock prices can be theoretically eliminated.

Stochastic Interest Rate Models

Local volatility models can be extended to cases where stochastic interest rate needs to be considered [16]. The interest rate can be modeled through classic short-rate models, and the equity process is then specified as a diffusion with stochastic growth rate. Following Brigo and Mercurio [6], we have

$$ \frac{dS_t}{S_t} = (r_t - b_t)\, dt + \sigma(t, S_t)\, dW_t \tag{6a} $$

$$ r_t = u_t + y_t \tag{6b} $$

$$ dy_t = \alpha(\phi - y_t)\, dt + \sigma_t\, y_t^{\beta}\, dB_t \tag{6c} $$

$$ dW_t\, dB_t = \rho\, dt \tag{6d} $$

where $u_t$ is a function of time describing the deterministic part of the interest rate, $y_t$ is a diffusion process modeling the stochastic part of the interest rate, and $B_t$ is a Brownian motion describing the interest rate uncertainty, correlated with $W_t$ with coefficient ρ. In equation (6c), α, β, φ, and $\sigma_t$ are parameters describing the short-rate process. For example, when φ = 0 and β = 0, equation (6c) is equivalent to the Hull–White model. With nonzero φ and β = 1/2, the shifted Cox–Ingersoll–Ross (CIR++) model is obtained. Both models admit closed-form pricing formulas for zero-coupon bonds, interest rate caps, and swaptions, which can be used for calibration to interest rate derivatives market observables. For a given short-rate model and its parameters, the local volatility function σ(t, S) needs to be recovered from equity derivatives market information. This can be achieved by considering the transition density for the joint evolution of the stock price $S_t$ and short rate $r_t$ under stochastic discount factor, that is,

$$ p(t, S, y; T, K, Y) = E\left[ \exp\left(-\int_t^T r_\tau\, d\tau\right) \delta(S_T - K)\, \delta(y_T - Y) \,\middle|\, S_t = S,\ y_t = y \right] \tag{7} $$

The Fokker–Planck equation for such a quantity can be written down as

$$
\begin{aligned}
\frac{\partial p}{\partial T} ={}& \frac{\partial^2 \left(K^2 \sigma^2(T, K)\, p\right)}{2\, \partial K^2} + \frac{\sigma_T^2}{2} \frac{\partial^2 \left(Y^{2\beta} p\right)}{\partial Y^2} + \rho\, \sigma_T \frac{\partial^2 \left(K \sigma(T, K)\, Y^{\beta} p\right)}{\partial K\, \partial Y} \\
& - (u_T + Y - b_T) \frac{\partial (K p)}{\partial K} - \alpha\phi \frac{\partial p}{\partial Y} + \alpha \frac{\partial (Y p)}{\partial Y} - (u_T + Y)\, p
\end{aligned}
\tag{8}
$$

By solving equation (8) subject to vanishing boundary conditions on K and Y as well as delta-function initial condition at T = t, one can recover the European option prices as

$$ C(t, S_t; T, K) = \int_0^{\infty} dS\, \max(S - K, 0) \int_{-\infty}^{\infty} p(t, S_t, y_0; T, S, Y)\, dY \tag{9} $$

and hence derive the implied volatility surface from the hybrid model (6). The strategy discussed in the section Volatility Surface Design and Calibration can once again be invoked. In practice, since the two-factor model takes significantly more time in calculation than the basic model, it is very effective to use the basic model solution as a starting point for the hybrid calibration.
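As a rough illustration of the hybrid dynamics (6a) to (6d), the sketch below prices a European call by Monte Carlo with a Hull–White-type short rate (φ = 0, β = 0). It is not the Fokker–Planck approach of equation (8); simulation is swapped in here only because it fits in a few lines. The function `local_vol`, the flat values chosen for $u_t$ and $b_t$, and all parameter defaults are assumptions made for the example, not calibrated quantities.

```python
import numpy as np


def local_vol(t, s):
    # Hypothetical local volatility function sigma(t, S); an assumption
    # for illustration only, not a calibrated surface.
    return 0.2 + 0.1 * np.exp(-s / 100.0)


def hybrid_call_mc(s0, strike, maturity, u=0.03, b=0.0, alpha=0.1,
                   sigma_r=0.01, rho=0.3, y0=0.0,
                   n_steps=200, n_paths=20000, seed=0):
    """Euler Monte Carlo for (6a)-(6d) with phi = 0 and beta = 0.

    u and b are taken as constants (assumption); the payoff is discounted
    path by path with exp(-integral of r), as in the expectation (7).
    """
    rng = np.random.default_rng(seed)
    dt = maturity / n_steps
    log_s = np.full(n_paths, np.log(s0))
    y = np.full(n_paths, y0)
    int_r = np.zeros(n_paths)
    for k in range(n_steps):
        t = k * dt
        z1 = rng.standard_normal(n_paths)
        z2 = rng.standard_normal(n_paths)
        dw = np.sqrt(dt) * z1
        # dB is correlated with dW with coefficient rho, equation (6d).
        db = np.sqrt(dt) * (rho * z1 + np.sqrt(1.0 - rho ** 2) * z2)
        r = u + y                                 # equation (6b)
        vol = local_vol(t, np.exp(log_s))
        # Log-Euler step for the equity, equation (6a).
        log_s += (r - b - 0.5 * vol ** 2) * dt + vol * dw
        int_r += r * dt
        # Mean-reverting short-rate factor, equation (6c) with phi = beta = 0.
        y += -alpha * y * dt + sigma_r * db
    payoff = np.maximum(np.exp(log_s) - strike, 0.0)
    return float(np.mean(np.exp(-int_r) * payoff))


if __name__ == "__main__":
    print(hybrid_call_mc(100.0, 100.0, 1.0))
```

Because the same rate is used for the drift and for the discounting within each step, the discounted stock price is a martingale in this scheme when b = 0, which gives a simple sanity check on the simulation.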

[Figure 1: four surface plots, panels (a) to (d). Horizontal axes: strike/spot and time to maturity (years). Vertical axis: volatility in (a) and (b), volatility difference in (c) and (d).]

Figure 1 The implied and local volatility surface on the S&P 500 Index in November 2007. (a) The implied volatility
as a function of time to maturity and strike price (expressed as a percentage of spot price). (b) The local volatility surface
calibrated under the basic model. (c) Changes in the local volatility surface when cash dividends are assumed for the first
five years, gradually transitioning to proportional dividends in 10 years. (d) Changes in the local volatility surface where
the interest rate is assumed to follow the Hull–White model calibrated to ATM caps with correlation ρ = 30%. In both
(c) and (d) the new local volatility is smaller than in (b)
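The cash dividend case in panel (c) rests on the mixed-dividend jump (4) and the forward jump condition (5). A minimal sketch of both rules follows, assuming call prices are stored on an increasing strike grid and that linear interpolation (with flat clamping beyond the grid ends) is accurate enough; the function names and the grid are hypothetical.

```python
import numpy as np


def spot_after_exdate(spot_before, yield_part, cash_part):
    """Equation (4): S(T+) = (1 - Y) * S(T-) - D."""
    return (1.0 - yield_part) * spot_before - cash_part


def call_after_exdate(strikes, calls_before, yield_part, cash_part):
    """Equation (5): C(T+, K) = (1 - Y) * C(T-, (K + D) / (1 - Y)).

    calls_before holds C(T-, .) on the increasing grid `strikes`.
    np.interp interpolates linearly and clamps outside the grid
    (an assumption that is harmless when the grid is wide enough).
    """
    strikes = np.asarray(strikes, dtype=float)
    shifted = (strikes + cash_part) / (1.0 - yield_part)
    return (1.0 - yield_part) * np.interp(shifted, strikes, calls_before)


if __name__ == "__main__":
    strikes = np.linspace(0.0, 300.0, 601)
    # Calls expiring immediately before the ex-date, with spot 100.
    calls_before = np.maximum(100.0 - strikes, 0.0)
    calls_after = call_after_exdate(strikes, calls_before, 0.01, 2.0)
    print(spot_after_exdate(100.0, 0.01, 2.0), calls_after[:3])
```

With a 1% yield part and a cash part of 2, the spot moves from 100 to 97 across the ex-date, and the transformed call values coincide with the intrinsic values max(97 − K, 0), which is what equation (5) is meant to guarantee.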
Examples

We use data from the S&P 500 index market as examples to illustrate the preceding discussions. Figure 1(a) and (b) show a typical implied volatility surface and the calibrated local volatility surface under the basic model, that is, with proportional dividend and deterministic interest rate assumptions. The implied volatility surface is given by option traders. Normally it is retrieved from data in both the listed and OTC options market, interpolated, and extrapolated with trader-specified functions. The local volatility surface is built by simply calibrating to five strikes on each marked maturity, with the Libor-Swap curve and the full index dividend schedule. Excellent calibration quality can be obtained: the option price differences computed using the input implied volatility surface and the calibrated local volatility surface are less than one basis point of the spot price for most liquid strikes and below 10 basis points across all strikes and maturities. This accuracy is sufficient for most practical purposes.

Figure 1(c) and (d) display the changes in the local volatility surface when we include the effects of cash dividends or stochastic interest rate in the model. We have assumed the Hull–White model for the interest rate in these calculations. The dividend and interest rate specifications are seen to have a significant impact on the local volatility surface and hence can be important in derivatives pricing. Cash

[Figure 2: four line plots, panels (a) to (d). Panels (a) and (b): change in fair strike (%) for variance swaps with maturities of 1 to 5 years. Panels (c) and (d): change in PV (vega) for the call on maximum and the put on minimum. Horizontal axis: cash dividend proportion in (a) and (c), equity-interest rate correlation in (b) and (d).]

Figure 2 Impact of discrete dividends and stochastic interest rate on derivative pricing. (a) Changes to the fair strike of
the variance swaps with different dividend assumptions. (b) Changes to the fair strike of the variance swaps under stochastic
interest rate with different correlation. The labels indicate the maturity of the variance swaps. (c) Changes to the PV of
the lookback options with different dividend assumption. (d) Changes to the PV of the lookback options under stochastic
interest rate with different correlation. The numbers in (c) and (d) are in units of vega of Table 2
dividends introduce additional deterministic, nonproportional jump structures in the equity dynamics, and to maintain the same implied volatility surface the local volatility needs to become smaller. This effect depends on the dividend size relative to future spot prices, and thus becomes more pronounced for smaller strikes and longer maturities, producing a skewed shape in the difference. On the other hand, stochastic interest rate introduces volatility in discount bond prices and with positive correlation also reduces the equity local volatility. This effect does not depend on spot levels explicitly and is instead related to the volatility ratio between the interest rate and the equity and their correlation. Since the interest rate usually has a small volatility compared to the equity, to the leading order the effect of stochastic rates can sometimes be approximated by a parallel shift on the local volatility surface.

We can apply these local volatility models to price exotic derivatives not directly available from the vanilla market. One example is variance swaps, which are popular OTC products offered to capitalize on the discrepancy between implied and realized volatility. Another example is lookback options, which provide payoffs on the maximum/minimum index prices over a set of observation dates and can be appealing hedges to insurance companies who have sold policies with similar exposure. Tables 1 and 2 display the pricing results for these structures using the basic model.

Figure 2 shows the pricing impact on these structures when the effects of cash dividends and stochastic interest rates are considered. As the payout for the variance swap is directly linked to the equity’s average local volatility, the pricing is strongly affected by the assumption of cash dividends and stochastic interest rate. For lookback options, one needs to look at the joint distribution among equity prices across different observation dates. Cash dividends generally reduce the local volatility and hence decrease the correlation between the equity prices at different dates, leading to lower lookback prices. With stochastic interest rate, the effect of modified equity diffusion volatility can either reinforce (e.g., call on maximum) or partly cancel (e.g., put on minimum) the effect of stochastic discounting.

The numerical impact of different modeling assumptions can be comparable to a full percentage difference in volatility. Hence, it may be important to take these into account when accurate and competitive pricing of exotic equity derivatives is required. An extensive and detailed discussion of the impact of stochastic interest rate on popular hybrid products can be found in [16].

Table 1 Pricing of variance swaps with the basic model

Maturity (years)    Fair strike (%)
1                   27.59
2                   28.18
3                   28.42
4                   29.14
5                   30.00

The payoff for strike K at maturity is $\frac{252}{N} \sum_{i=0}^{N-1} \left(\ln \frac{S_{i+1}}{S_i}\right)^2 - K^2$, where $S_i$ is the index closing price on the ith business day from the current date (i = N corresponds to the maturity). The fair strike is the value K such that the contract costs nothing to enter.

Table 2 Pricing of five-year lookback options with the basic model

Option type        Payout formula                                                  PV (%)    Vega (%)
Call on maximum    $\max_{0 \le i \le 5} \frac{S_i}{S_0} - \frac{S_5}{S_0}$        25.29     1.17
Put on minimum     $1 - \min_{0 \le i \le 5} \frac{S_i}{S_0}$                      22.35     0.85

$S_i$ (i = 0, 1, . . . , 5) is the index price at annual observation dates on year i from the current date. The PV is the calculated present value according to the payout formula at maturity. The Vega is the change in PV when a parallel shift of 1% is applied to the implied volatility surface.

References

[1] Andersen, L. & Brotherton-Ratcliffe, R. (1997). The equity option volatility smile: an implicit finite-difference approach, Journal of Computational Finance 1, 5–38.
[2] Berestycki, H., Busca, J. & Florent, I. (2002). Asymptotics and calibrations of local volatility models, Quantitative Finance 2, 61–69.
[3] Black, F. & Scholes, M. (1973). The pricing of options and corporate liabilities, Journal of Political Economy 81, 631–659.
[4] Bos, M. & Vandermark, S. (2002). Finessing fixed dividends, Risk Magazine 15(9), 157–158.
[5] Bos, R., Gairat, A. & Shepeleva, A. (2003). Dealing with discrete dividends, Risk Magazine 16(1), 109–112.
[6] Brigo, D. & Mercurio, F. (2006). Interest Rate Models – Theory and Practice with Smile, Inflation and Credit, 2nd Edition, Springer Finance.
[7] Brown, G. & Randall, C. (1999). If the skew fits, Risk Magazine 12(4), 62–65.
[8] Coleman, T.F., Li, Y. & Verma, A. (1999). Reconstructing the unknown volatility function, Journal of Computational Finance 2, 77–102.
[9] Derman, E. & Kani, I. (1994). Riding on a smile, Risk Magazine 7(2), 32–39.
[10] Dumas, B., Fleming, J. & Whaley, R.E. (1998). Implied volatility functions: empirical tests, Journal of Finance 53, 2059–2106.
[11] Dupire, B. (1994). Pricing with a smile, Risk Magazine 7(1), 18–20.
[12] Frishling, V. (2002). A discrete question, Risk Magazine 15(1), 115–116.
[13] Gatheral, J. (2006). The Volatility Surface: A Practitioner’s Guide, Wiley, Hoboken, New Jersey.
[14] Haug, E., Haug, J. & Lewis, A. (2003). Back to basics: a new approach to the discrete dividend problem, Wilmott Magazine 5, 37–47.
[15] Merton, R.C. (1973). Theory of rational option pricing, The Bell Journal of Economics and Management Science 4, 141–183.
[16] Overhaus, M., Bermúdez, A., Buehler, H., Ferraris, A., Jordinson, C. & Lamnouar, A. (2007). Equity Hybrid Derivatives, Wiley, Hoboken, New Jersey.
[17] Piterbarg, V. (2007). Markovian projection method for volatility calibration, Risk Magazine 20(4), 84–89.
[18] Rebonato, R. (2004). Volatility and Correlation, 2nd Edition, Wiley, Chichester, West Sussex.
[19] Rubinstein, M. (1994). Implied binomial trees, Journal of Finance 49, 771–818.

Related Articles

Corridor Variance Swap; Dividend Modeling; Dupire Equation; Lookback Options; Model Calibration; Optimization Methods; Stochastic Volatility Interest Rate Models; Tikhonov Regularization; Variance Swap; Yield Curve Construction.

CHIYAN LUO & XINMING LIU
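The variance swap payoff stated under Table 1 of the article above can be made concrete with a short sketch. It assumes the strike is quoted as a decimal volatility (so the 27.59% fair strike would be passed as 0.2759) and that the closing prices include both the current date and the maturity date; the function name is hypothetical.

```python
import numpy as np


def variance_swap_payoff(closes, strike):
    """Payoff (252 / N) * sum_i ln(S_{i+1} / S_i)^2 - K^2 per unit notional.

    closes : daily closing prices S_0, ..., S_N (N + 1 values).
    strike : volatility strike K as a decimal (assumption).
    """
    closes = np.asarray(closes, dtype=float)
    log_returns = np.log(closes[1:] / closes[:-1])
    n = log_returns.size
    realized_variance = 252.0 / n * np.sum(log_returns ** 2)
    return realized_variance - strike ** 2


if __name__ == "__main__":
    # A path whose daily log return is constant at 0.2 / sqrt(252)
    # realizes exactly 20% annualized volatility.
    daily = 0.2 / np.sqrt(252.0)
    closes = 100.0 * np.exp(daily * np.arange(253))
    print(variance_swap_payoff(closes, 0.2))
```

For the synthetic path in the usage example the realized variance equals 0.04, so a contract struck at 0.2 pays nothing up to rounding error.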
Dividend Modeling

A dividend is a portion of a company’s earnings paid to its shareholders. In the process of dividend payment, the following stages are distinguished: (i) declaration date, when the dividend size and the ex-dividend date are announced; (ii) ex-dividend date, when the share starts trading net of dividend; (iii) record date, when holders eligible to dividend payment are identified; and (iv) payment date, when delivery is made. At the ex-dividend date, the stock price drops by an amount proportional to the size of the dividend; the proportionality factor depends on the tax regulations. There are a lot of issues, research streams, and approaches in dividend modeling; here the issue is considered mainly in the context of option pricing theory.

The usual way to price derivatives on dividend-paying stocks is to take a model for non-dividend-paying stocks and extend it to take the dividends into account. The dividends then are commonly modeled as (i) continuously paid dividend yield, (ii) proportional dividends (known fractions of the stock price) paid at known discrete times, or (iii) fixed dividends (known amounts), paid at known discrete times. It is also possible to model the dividend amounts and the dividend dates stochastically (though there is evidence that this has a negligible impact on vanilla options [10]). In fact, there is an alternative approach where the stochastic dividends are the primary quantities and the stock followed by option price are derived from these, which was pioneered in [9]. As usual, one has to choose the complexity of the model depending on dividend exposure of the derivative to be priced.

In practice, one comes across the notion of implied dividends: the value of the dividends (independent of how they are modeled) can be inverted from the synthetic forward or future contract; the fact that one can get quite different (from analyst predictions) numbers reflects various uncertainties. Among them are the sundry tax regulations in different countries for various market players, timing, and value of the dividends, just to name a few.

The impact of dividends can be illustrated, starting simply by adding a continuous dividend yield to the drift. For the sake of the simplicity of notations, it is written in lognormal terms:

$$ dS_t = (r - q)\, S_t\, dt + \sigma S_t\, dW_t \tag{1} $$

This approach is especially popular when modeling options on indexes, where dividend payments are numerous and spread through time. Another choice, of proportional amounts $d_i = f_i S_{t_i}$ paid at ex-dividend dates $t_1 < t_2 < \cdots$ for single shares, can be justified by the fact that dividends tend to increase when a company is doing well, which is correlated with a high share price:

$$ dS_t = (r - Q_t)\, S_t\, dt + \sigma S_t\, dW_t \quad \text{with} \quad Q_t = \sum_i \delta(t - t_i)\, f_i \tag{2} $$

In both these cases, the stock price at each time still has a lognormal distribution, so the prices of European options are given by straightforward modifications of the Black–Scholes (BS) pricing formula. This is no longer true, however, for discrete cash dividends:

$$ dS_t = (r S_t - D_t)\, dt + \sigma S_t\, dW_t \quad \text{with} \quad D_t = \sum_i \delta(t - t_i)\, d_i \tag{3} $$

The stock price $S_t$ jumps down with the amount of dividend $d_i$ paid at time $t_i$ and between the dividends it follows a geometric Brownian motion. In this setting, the stock price can become negative, but this is usually so unlikely that, in practice, it is not a problem. Still, one might want to use a more robust dividend policy in the model, such as capping the dividend at the stock price. Obviously, different dividend policies result in different option prices [7].

Impact on Option Pricing

To compute an option price under equation (3) the standard collection of numerical methods can be employed: finite difference (FD) method with jump conditions across ex-dividend date [11], Monte Carlo simulations, or nonrecombining trees [8]. There is no real closed-form solution with multiple dividends for the European option under equation (3); however, several approximations are available. All of them are
based on bootstrapping, that is, repeatedly computing the convolution of the option value at one dividend date with the density kernel from that date to the previous dividend date and applying the jump condition at the dividend dates, starting from the payoff at maturity. One can use a piecewise linear or a more sophisticated approximation of the option value at each convolution step and enjoy having a finite sum of closed-form solutions. On the basis of the fact that diffusion preserves monotonicity and convexity, it can be shown that the result converges to the true value (unpublished work of Amaro de Matos et al.). Another choice of parameterization was made in [7]: at each step of the integration the option value is approximated by a BS-like function where strike and volatility are adjusted to obtain the best fit. Such methods can be used for any underlying process where one can compute the density kernel (Green function, propagator) for the convolution, though it will probably be not much faster or more accurate than employing the standard finite difference method, especially in the case of multiple dividends. For the handling of American options, one can find an extensive list of references in [4] and the relation between early exercise and dividends is explained in American Options; Finite Difference Methods for Early Exercise Options or [8].

A Common Approach

As already pointed out in the discrete cash dividend model (3), the price of a European option has no closed-form solution and trees do not recombine. In order to remedy this, traders often split the stock price into a risky net-dividend part and a deterministic dividend part:

$$ d\tilde{S}_t = r \tilde{S}_t\, dt + \tilde{\sigma} \tilde{S}_t\, dW_t, \qquad S_t = \tilde{S}_t + \tilde{D}_t \tag{4} $$

$$ \tilde{D}_t = \tilde{D}_t^{\mathrm{fut}} - \tilde{D}_t^{\mathrm{past}}, \qquad \tilde{D}_t^{\mathrm{fut}} = \sum_{t < t_i \le T} \alpha_i\, d_i\, e^{-r(t_i - t)}, \qquad \tilde{D}_t^{\mathrm{past}} = \sum_{0 < t_i \le t} (1 - \alpha_i)\, d_i\, e^{-r(t_i - t)} \tag{5} $$

Note that the dependence on the option maturity T in the notation $\tilde{D}_t^{\mathrm{fut}}$ is suppressed. The most common choices used by practitioners [3, 5] are

$$ \alpha_i = 1 \quad \text{so} \quad \tilde{D}_t = \sum_{t < t_i \le T} d_i\, e^{-r(t_i - t)} \tag{6} $$

$$ \alpha_i = 0 \quad \text{so} \quad \tilde{D}_t = -\sum_{0 < t_i \le t} d_i\, e^{-r(t_i - t)} \tag{7} $$

$$ \alpha_i = 1 - \frac{t_i}{T} \tag{8} $$

In this approach, the tree for $\tilde{S}_t$ is recombining (see Binomial Tree or [8]), and the price of a European option is again given by a BS type formula, where the spot and the strike are adjusted as $S_0 \to S_0 - \tilde{D}_0^{\mathrm{fut}}$ and $K \to K + \tilde{D}_T^{\mathrm{past}}$. Needless to say, however, the volatility for each of these processes will be different. Namely, choice (6) underestimates and choice (7) overestimates the volatility compared to the “true” model (3); the weighted choice (8) aims to minimize this effect.

[Figure 1: line plot of price (vertical axis, roughly 10 to 18) against maturity (horizontal axis, 1 to 5).]

Figure 1 Price of an American call as a function of the time to maturity T for the following models: (3) (solid line), (6) (dotted line), and (6) with the volatility adjustment (10) (dashed line). The parameters are $S_0 = 100$, K = 100, r = 0.05, σ = 0.3, $d_i = 8$, and $t_i = i - \frac{1}{2}$

Arbitrage Opportunities

In reference [1], it was shown that arbitrage opportunities exist in the most standard approach (6) if the volatility surface is continuously interpolated around ex-dividend dates. The authors apply a rough volatility adjustment to prevent the arbitrage opportunities. The following example demonstrates that the continuous interpolation of volatility around ex-dividend dates can lead to significant mispricing. Figure 1 shows
the price of an American call as a function of the time to maturity T for different models. Given a flat volatility, the prices under equation (6) jump down after an ex-dividend date, which is not realistic, because a risk-free profit can be locked in by selling an option maturing just before the ex-dividend date and by buying a similar option maturing just after the ex-dividend date. Although the mispricing under equation (6) is most evident for American options, it is equally present in the valuation of European options. Note that equation (4) produces continuous prices around dividend payments and the price differences between the two models increase dramatically with maturity (see [1, 5] for similar results). Another primitive pitfall with equation (6) is to use constant extrapolation of the implied volatility to longer maturities. For example, if the present value of dividends of a long-term contract is half of the current stock price, then with simple flat term structure extrapolation one will underestimate the volatility by a factor of at least one and a half [2].

Volatility Adjustments

To understand the difference in terms of volatility for the specified models, consider the local volatility model. Substitution of equation (4) into equation (3) yields the result that $\tilde{S}$ follows the process with local volatility

$$ \tilde{\sigma}(\tilde{S}, t) = \left(1 + \frac{\tilde{D}_t}{\tilde{S}}\right) \sigma \tag{9} $$

It is handy to translate this result into implied terms [6]. Following the line of reasoning presented in [2], a slightly generalized result for the implied volatility can be derived:

$$ \tilde{\sigma}^2 \approx \sigma^2 + \sigma \sqrt{\frac{\pi}{2T}}\, A \tag{10} $$

where

$$
\begin{aligned}
A ={}& 4 e^{\frac{b_1^2}{2} - s} \sum_i \tilde{d}_i \Big[ \omega_i \big( N(b_1) - N(b_1 - \xi_i) \big) + \tilde{\omega}_i \big( N(b_2) - N(b_1 - \xi_i) \big) \Big] \\
& + e^{\frac{c_1^2}{2} - 2s} \sum_{i,j} \tilde{d}_i \tilde{d}_j \Big[ \omega_i \omega_j \big( N(c_1) - N(c_1 - \psi_{ij}) \big) - \tilde{\omega}_i \tilde{\omega}_j \big( N(c_2) - N(c_1 - \Psi_{ij}) \big) \\
& \qquad\qquad + \Omega_{ij} \big( N(c_1 - \Psi_{ij}) - N(c_1 - \psi_{ij}) \big) \Big]
\end{aligned}
\tag{11}
$$

with $s = \ln(S_0 - \tilde{D}_0^{\mathrm{fut}})$, $k = \ln(K + \tilde{D}_T^{\mathrm{past}}) - rT$, $x_t = \left(s + (k - s)\frac{t}{T}\right) + rt$, $y_t = \sqrt{\frac{t(T - t)}{T}}$, $\tilde{\omega}_i = 1 - \omega_i$, $\tilde{d}_i = d_i e^{-r t_i}$, $a = \frac{s - k}{\sigma\sqrt{T}}$, $b_{1/2} = a \pm \frac{\sigma\sqrt{T}}{2}$, $c_{1/2} = a \pm \sigma\sqrt{T}$, $\xi_i = \frac{\sigma t_i}{\sqrt{T}}$, $\psi_{ij} = 2\min(\xi_i, \xi_j)$, $\Psi_{ij} = 2\max(\xi_i, \xi_j)$, $\Omega_{ij} = \omega_i \tilde{\omega}_j 1_{t_i > t_j} + \omega_j \tilde{\omega}_i 1_{t_j > t_i}$.

This gives a very simple and quick way of switching between models for trading practice as well as for understanding the essence of the feature. Moreover, it can be used independently of the option type. To give an idea of the quantitative value of presented methods, some numerical results with and without volatility adjustment are summarized in Table 1. All results are compared to the numerical solution calculated with the FD method. Bootstrapping with the piecewise linear interpolation converges to FD with sufficiently many points; the HHL approximation of [7] differs a bit, probably due to the fact that the BS-like formula cannot exactly fit the shape of the integrand. Clearly, even in approximate form, equation (10) gives a fair correction in all cases, performing especially well for the weighted model (8).

Table 1 European call prices with parameter set of Figure 1 for different strikes. HHL refers to the approximation of [7]; FD to finite difference and BS to closed-form solution

Strike    FD for (3)    HHL for (3)    BS for (6)    BS for (6) with (10)    BS for (8)    BS for (8) with (10)
50        33.509        33.641         29.908        33.312                  33.547        33.497
80        22.482        22.559         17.846        22.414                  22.304        22.473
100       17.393        17.428         12.772        17.404                  17.102        17.388
120       13.573        13.575         9.250         13.644                  13.209        13.573
150       9.511         9.479          5.836         9.635                   9.099         9.515

When pricing equity derivatives, one should be aware that these instruments can be sensitive to
dividends. Examples are exotic options on stocks and derivatives involving realized volatility, such as variance swaps (see Variance Swap), volatility swaps (see Volatility Swaps), correlation swaps (see Correlation Swap), and gamma swaps (see Gamma Swap). This sensitivity determines the required sophistication in dividend modeling. Adding dividends to a stock price process may seem trivial at first glance, but one has to be careful in setting the model parameters. The resulting model can then be solved by the usual methods. For the plain vanilla option with dividends, a number of numerical approximations have been developed.

References

[1] Beneder, R. & Vorst, T. (2001). Options on dividends paying stocks, in Recent Developments in Mathematical Finance, World Scientific Printers, Shanghai, pp. 204–217.
[2] Bos, R., Gairat, A. & Shepeleva, A. (2003). Dealing with discrete dividends, Risk 16, 109–112.
[3] Bos, M. & Vandermark, S. (2002). Finessing fixed dividends, Risk 15, 157–158.
[4] Cassimon, D., Engelen, P.J., Thomassen, L. & Van Wouwe, M. (2007). Closed-form valuation of American call options on stocks paying multiple dividends, Finance Research Letters 4, 34–48.
[5] Frishling, V. (2002). A discrete question, Risk 15, 115–116.
[6] Gatheral, J. (2006). The Volatility Surface, John Wiley & Sons, Hoboken, pp. 13–14.
[7] Haug, E., Haug, J. & Lewis, A. (2003). Back to basics: a new approach to the discrete dividend problem, Wilmott Magazine (September), 37–47.
[8] Hull, J.C. (2006). Options, Futures and Other Derivatives, 6th Edition, Prentice-Hall, Upper Saddle River.
[9] Korn, R. & Rogers, L.C.G. (2005). Stocks paying discrete dividends: modeling and option pricing, Journal of Derivatives 13(2), 44–48.
[10] Kruchen, S. (2005). Dividend Risk, thesis, Uni/ETH, Zürich.
[11] Tavella, D. & Randall, C. (2000). Pricing Financial Instruments. The Finite Difference Method, John Wiley & Sons, New York.

Related Articles

American Options; Finite Difference Methods for Early Exercise Options; Local Volatility Model; Monte Carlo Simulation.

ANNA SHEPELEVA & ALAIN VERBERKMOES
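The spot and strike adjustment described under A Common Approach in the article above can be checked numerically. The sketch below prices a European call with the Black–Scholes formula after shifting the spot by the future part and the strike by the past part of the dividends, as in equation (5). The five-year maturity is an assumption: the text gives the parameter set of Figure 1 but not the maturity used for Table 1. With that choice the strike-100 values come out close to the "BS for (6)" and "BS for (8)" entries of Table 1. Function names are hypothetical.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call(spot, strike, rate, vol, maturity):
    """Black-Scholes price of a European call without dividends."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol * vol) * maturity) / (vol * sqrt(maturity))
    d2 = d1 - vol * sqrt(maturity)
    return spot * norm_cdf(d1) - strike * exp(-rate * maturity) * norm_cdf(d2)


def adjusted_call(spot, strike, rate, vol, maturity, div_times, div_amounts, alphas):
    """BS call with S0 -> S0 - D_fut(0) and K -> K + D_past(T), equation (5).

    alphas[i] is the share of dividend i assigned to the spot adjustment;
    the remaining share is compounded to maturity and added to the strike.
    """
    fut = 0.0
    past = 0.0
    for t_i, d_i, a_i in zip(div_times, div_amounts, alphas):
        if 0.0 < t_i <= maturity:
            fut += a_i * d_i * exp(-rate * t_i)
            past += (1.0 - a_i) * d_i * exp(-rate * (t_i - maturity))
    return bs_call(spot - fut, strike + past, rate, vol, maturity)


if __name__ == "__main__":
    maturity = 5.0                                    # assumed maturity
    times = [i - 0.5 for i in range(1, 6)]            # t_i = i - 1/2
    amounts = [8.0] * 5                               # d_i = 8
    model_6 = adjusted_call(100.0, 100.0, 0.05, 0.3, maturity, times, amounts, [1.0] * 5)
    weights = [1.0 - t / maturity for t in times]     # choice (8)
    model_8 = adjusted_call(100.0, 100.0, 0.05, 0.3, maturity, times, amounts, weights)
    print(round(model_6, 3), round(model_8, 3))
```

The gap between the two outputs illustrates how strongly the choice of weights in (6) to (8) affects the price when the same volatility is used for each model.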
Implied Volatility: Volvol Expansion

In order to calibrate stochastic volatility models, it is convenient to have an accurate analytical formula or approximation for call options. However, deriving such a formula is not always an easy task. In the Heston model, the most popular technique involves numerical integration, which is necessarily time consuming. The main idea is to apply a perturbation method to the volvol parameter, calculating the first and second order of the difference between a stochastic volatility model and a Black–Scholes model. In the general case, we can reduce the integration of the exact formula to some simpler integration.

Consider the following two-factor stochastic volatility model:

$$ dS = (r - d)\, S\, dt + \sigma S\, dB_t \tag{1} $$

$$ dV = b(V)\, dt + \varepsilon\, \nu(V)\, dW \tag{2} $$

$$ dB\, dW = \rho(V)\, dt \tag{3} $$

where r is the short rate, d is the dividend yield, ε is a constant, and b(V) and ν(V) are independent of ε. We assume parameters r and d to be constant for the sake of simplicity. The series expansion consists in writing the option price formula as a series in ε.

Fourier methods (see Fourier Methods in Options Pricing) tell us that the call option price is given by

$$ C(S, V, T) = S e^{-dT} - \frac{K e^{-rT}}{2\pi} \int_{i/2 - \infty}^{i/2 + \infty} \exp(-iuX)\, \frac{f(u, V, T)}{u(u - i)}\, du \tag{4} $$

where X = ln(S/K) + (r − d)T.

$$ \frac{\partial f}{\partial \tau} = \frac{1}{2} \varepsilon^2 \nu^2(V) \frac{\partial^2 f}{\partial V^2} + \left( b(V) - iu\, \rho(V)\, \varepsilon\, \nu(V) \sqrt{V} \right) \frac{\partial f}{\partial V} - \frac{u(u - i)}{2}\, V f \tag{5} $$

We look for solutions which can be written as a power series of ε.

$$ f(u, V, T) = f^{(0)}(u, V, T) + \varepsilon f^{(1)}(u, V, T) + \varepsilon^2 f^{(2)}(u, V, T) \tag{6} $$

Thus, we can obtain the power series of the call price using either equation (4) directly

$$ C(u, V, T) = C^{(0)}(u, V, T) + \varepsilon C^{(1)}(u, V, T) + \varepsilon^2 C^{(2)}(u, V, T) \tag{7} $$

or expanding first the implied volatility

$$ V_{\mathrm{imp}}(u, V, T) = V_{\mathrm{imp}}^{(0)}(u, V, T) + \varepsilon V_{\mathrm{imp}}^{(1)}(u, V, T) + \varepsilon^2 V_{\mathrm{imp}}^{(2)}(u, V, T) \tag{8} $$

and then plugging it into the option price with the Black–Scholes formula.

As a matter of fact, these two methods differ significantly. The former, denoted by series A in the remainder of this article, gives the call price first and then the implied volatility, while the latter, denoted by series B, gives the implied volatility first, with the call price obtained afterward. Nevertheless, in most cases, there is only a slight numerical difference between the two series. However, regarding far out-of-the-money options, the two series give different results as shown in Figure 1. Empirical evidence shows that series B is very often better than series A. Even though this is not a general rule, series B should usually be preferred over series A.

We now explicitly compute the two series for the following model. In particular, this model encompasses the Heston model for the special case φ = 0.5.

$$ dS = (r - d)\, S\, dt + \sigma S\, dB_t \tag{9} $$
[Figure 1 plot: curves labeled Exact, Series A, and Series B]
Figure 1 Series A (expansion on price), series B (expansion on implied volatility), and exact volatility

$$dV = (\omega - \theta V)\,dt + \epsilon V^{\varphi}\,dW \tag{10}$$

$$dB\,dW = \rho(V)\,dt \tag{11}$$

First, the expansion is on the fundamental transform of the closed formula, which is represented by $f(u, V, T)$ in equation (4). The idea is that we can expand this function into a simpler form, so that the integration in equation (4) can be reduced to analytic form. Here, we do not discuss the detailed derivation but only give the result. Interested readers can refer to [2]. For series A, the expansion on price is

$$C(S, V, \tau) = c(S, v, \tau) + \epsilon\,\tau^{-1} J^{(1)} \tilde{R}^{(1,1)} c_V(S, v, \tau) + \epsilon^2 \left[ \tau^{-1} J^{(2)} + \tau^{-2} J^{(3)} \tilde{R}^{(2,0)} + \tau^{-1} J^{(4)} \tilde{R}^{(1,2)} + \frac{\tau^{-2}}{2} \left(J^{(1)}\right)^2 \tilde{R}^{(2,2)} \right] c_V(S, v, \tau) + O(\epsilon^3) \tag{12}$$

For series B, the expansion on implied volatility is

$$V_{\mathrm{imp}} = v + \epsilon\,\tau^{-1} J^{(1)} \tilde{R}^{(1,1)} + \epsilon^2 \left[ \tau^{-1} J^{(2)} + \tau^{-2} J^{(3)} \tilde{R}^{(2,0)} + \tau^{-1} J^{(4)} \tilde{R}^{(1,2)} + \frac{\tau^{-2}}{2} \left(J^{(1)}\right)^2 \left( \tilde{R}^{(2,2)} - \left(\tilde{R}^{(1,1)}\right)^2 \tilde{R}^{(2,0)} \right) \right] + O(\epsilon^3) \tag{13}$$

In the above formulae, the term $c(S, v, \tau)$ represents the corresponding Black–Scholes price. When the volvol $\epsilon = 0$, the stochastic model reduces to a Black–Scholes model. Here $v$ is the equivalent variance for Black–Scholes, which is basically the integration of the variance from 0 to $\tau$.
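As a small illustration of the equivalent variance, the sketch below integrates the variance path of model (10) in the limit $\epsilon = 0$, where $dV = (\omega - \theta V)\,dt$ is deterministic. The function names and the sample parameter values are assumptions made for this example only; whether $v$ is normalized by $\tau$ is a convention that the text does not fix, so the code returns the plain integral and the time average can be obtained by dividing by `tau`.

```python
import math


def deterministic_variance(V0, omega, theta, s):
    """Variance at time s when epsilon = 0, i.e. the solution of dV = (omega - theta*V) ds."""
    level = omega / theta  # long-run level of the variance
    return level + math.exp(-theta * s) * (V0 - level)


def integrated_variance(V0, omega, theta, tau):
    """Closed-form integral of the deterministic variance over [0, tau] (requires theta > 0)."""
    level = omega / theta
    return level * tau + (V0 - level) * (1.0 - math.exp(-theta * tau)) / theta


def integrated_variance_numeric(V0, omega, theta, tau, n=10_000):
    """Midpoint-rule approximation of the same integral, used only as a cross-check."""
    h = tau / n
    return h * sum(
        deterministic_variance(V0, omega, theta, (i + 0.5) * h) for i in range(n)
    )


if __name__ == "__main__":
    # Hypothetical parameters: initial variance 4%, long-run level omega/theta = 6%.
    total = integrated_variance(0.04, 0.18, 3.0, 1.0)
    print("integrated variance:", total)
    print("time-averaged variance:", total / 1.0)
```

The bracket $\omega/\theta + e^{-\theta s}(V - \omega/\theta)$ computed by `deterministic_variance` is the same expression that appears inside the $J^{(s)}$ integrals.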
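The functions $\tilde{R}^{(p,q)}$ and $J^{(s)}$ are the derivative ratios and integrals, respectively. Their expressions are listed here for the interest of academics and practitioners.

$$\tilde{R}^{(2,0)} = \tau \left[ \frac{1}{2}\frac{X^2}{Z^2} - \frac{1}{2Z} - \frac{1}{8} \right], \qquad \tilde{R}^{(1,1)} = -\frac{X}{Z} + \frac{1}{2}$$

$$\tilde{R}^{(1,2)} = \frac{X^2}{Z^2} - \frac{X}{Z} - \frac{1}{4Z}(4 - Z), \qquad \tilde{R}^{(2,2)} = \tau \left[ \frac{1}{2}\frac{X^4}{Z^4} - \frac{1}{2}\frac{X^3}{Z^3} - 3\frac{X^2}{Z^3} + \frac{1}{8}\frac{X}{Z^2}(12 + Z) + \frac{1}{32}\frac{1}{Z^2}(48 - Z^2) \right] \tag{14}$$

with $Z = V\tau$ and

$$J^{(1)}(V, \tau) = \frac{\rho}{\theta} \int_0^{\tau} \left(1 - e^{-\theta(\tau - s)}\right) \left[ \frac{\omega}{\theta} + e^{-\theta s}\left(V - \frac{\omega}{\theta}\right) \right]^{\varphi + \frac{1}{2}} ds, \qquad J^{(2)}(V, \tau) = 0 \tag{15}$$

$$J^{(3)}(V, \tau) = \frac{1}{2\theta^2} \int_0^{\tau} \left(1 - e^{-\theta(\tau - s)}\right)^2 \left[ \frac{\omega}{\theta} + e^{-\theta s}\left(V - \frac{\omega}{\theta}\right) \right]^{2\varphi} ds \tag{16}$$

$$J^{(4)}(V, \tau) = \left(\varphi + \frac{1}{2}\right) \int_0^{\tau} \left[ \frac{\omega}{\theta} + e^{-\theta(\tau - s)}\left(V - \frac{\omega}{\theta}\right) \right]^{\varphi + \frac{1}{2}} J^{(6)}(V, \tau)\, ds \tag{17}$$

$$\text{with } J^{(6)}(V, \tau) = \int_0^{\tau} \left(e^{-\theta(\tau - s)} - e^{-\theta s}\right) \left[ \frac{\omega}{\theta} + e^{-\theta(\tau - u)}\left(V - \frac{\omega}{\theta}\right) \right]^{\varphi - \frac{1}{2}} du \tag{18}$$

Example of Volvol Expansion: Heston Model

We will now show the expansion of volvol for the Heston model (see Heston Model). By the asymptotic expansion, we can finally obtain an approximate analytic formula for the European call option. This work comes from the result of Benhamou et al. [1].

Consider a Heston model

$$dX_t = \sqrt{v_t}\,dW_t - \frac{v_t}{2}\,dt, \qquad X_0 = x_0 \tag{19}$$

$$dv_t = \kappa(\theta_t - v_t)\,dt + \xi_t \sqrt{v_t}\,dB_t, \qquad v_0 \tag{20}$$

$$d\langle W, B \rangle_t = \rho_t\,dt \tag{21}$$

Here, we have adjusted to the risk-neutral probability; as a consequence, the change of probability introduces the drift term in the spot process.

To expand the model, we add $\epsilon$ in the model.

$$dX_t^{\epsilon} = \sqrt{v_t^{\epsilon}}\,dW_t - \frac{v_t^{\epsilon}}{2}\,dt, \qquad X_0^{\epsilon} = x_0 \tag{22}$$

$$dv_t^{\epsilon} = \kappa(\theta_t - v_t^{\epsilon})\,dt + \epsilon\,\xi_t \sqrt{v_t^{\epsilon}}\,dB_t, \qquad v_0^{\epsilon} = v_0 \tag{23}$$

Now, we will expand the European call option price formula with respect to $\epsilon$. Note that, when $\epsilon = 0$, we have a Black–Scholes model; while $\epsilon = 1$, we have a Heston model. We already have the closed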
formula of Black–Scholes for $\epsilon = 0$. We expand at $\epsilon = 0$, and let $\epsilon = 1$ to obtain the approximate formula. In mathematical language, this can be written as follows:

$$P_{\mathrm{Heston}} = P_{BS} + \mathbb{E}\left[\frac{\partial P_{BS}}{\partial \epsilon}\right] + \frac{1}{2}\,\mathbb{E}\left[\frac{\partial^2 P_{BS}}{\partial \epsilon^2}\right] + E \tag{24}$$

Here, we make a further approximation and express the partial derivatives in the above equation as a linear combination of the Black–Scholes Greeks. The idea is that the chain rule gives $\frac{\partial P_{BS}}{\partial \epsilon} = \frac{\partial P_{BS}}{\partial S}\frac{\partial S}{\partial \epsilon} + \frac{\partial P_{BS}}{\partial \sigma}\frac{\partial \sigma}{\partial \epsilon}$. The same idea holds for the second derivative.

$$P_{\mathrm{Heston}} = P_{BS}(x_0, \mathrm{var}_T) + \sum_{i=1}^{2} a_{i,T}\, \frac{\partial^{i+1} P_{BS}}{\partial x^i\, \partial y}(x_0, \mathrm{var}_T) + \sum_{i=0}^{1} b_{2i,T}\, \frac{\partial^{2i+2} P_{BS}}{\partial x^{2i}\, \partial y^2}(x_0, \mathrm{var}_T) + E \tag{25}$$

We refer to [1] for proofs and intermediate derivation. The parameters in the formula are as follows:

$$\mathrm{var}_T = m_0 v_0 + m_1 \theta, \qquad a_{1,T} = \rho\xi\,(p_0 v_0 + p_1 \theta)$$

$$a_{2,T} = (\rho\xi)^2 (q_0 v_0 + q_1 \theta), \qquad b_{0,T} = \xi^2 (r_0 v_0 + r_1 \theta) \tag{26}$$

$$\mathrm{var}_T = \int_0^T v_{0,t}\, dt \tag{27}$$

$$m_0 = \frac{e^{-\kappa T}\left(-1 + e^{\kappa T}\right)}{\kappa}, \qquad m_1 = T - \frac{e^{-\kappa T}\left(-1 + e^{\kappa T}\right)}{\kappa},$$

$$p_0 = \frac{e^{-\kappa T}\left(-\kappa T + e^{\kappa T} - 1\right)}{\kappa^2}, \qquad p_1 = \frac{e^{-\kappa T}\left(\kappa T + e^{\kappa T}(\kappa T - 2) + 2\right)}{\kappa^2},$$

$$q_0 = \frac{e^{-\kappa T}\left(-\kappa T(\kappa T + 2) + 2e^{\kappa T} - 2\right)}{2\kappa^3}, \qquad q_1 = \frac{e^{-\kappa T}\left(2e^{\kappa T}(\kappa T - 3) + \kappa T(\kappa T + 4) + 6\right)}{2\kappa^3},$$

$$r_0 = \frac{e^{-2\kappa T}\left(-4e^{\kappa T}\kappa T + 2e^{2\kappa T} - 2\right)}{4\kappa^3}, \qquad r_1 = \frac{e^{-2\kappa T}\left(4e^{\kappa T}(\kappa T + 1) + e^{2\kappa T}(2\kappa T - 5) + 1\right)}{4\kappa^3} \tag{28}$$

The advantage is that there is no integration in the approximate formula, so the calculations are much faster than with the exact formula. We will discuss this point in the section Numerical Results.

The error in the approximation is estimated as $E = O\left(\left[\xi_{\mathrm{Sup}}\sqrt{T}\right]^3 \sqrt{T}\right)$.

Numerical Results

We test the approximate formula with the following strikes. We take strikes from 70% to 130% for short maturity and 10% to 730% for long maturity. Implied Black–Scholes volatilities of the closed formula, of the approximation formula, and related errors (in bp), are expressed as a function of maturities in fractions of years and relative strikes. The values of the parameters are as follows: $\theta = 6\%$, $\kappa = 3$, $\xi = 30\%$, and $\rho = 0\%$. Except for short maturity combined with very small strikes, where we observe the largest difference (18.01 bp), the difference is less than 5 bp (1 bp = 0.01%) in almost all other cases. With regard to the speed of calculation, the approximate formula is about 100 times quicker than the exact formula (with the optimization in the integral).

References

[1] Benhamou, E., Gobet, E. & Miri, M. (2009). Time dependent Heston model, SIAM Journal on Financial Mathematics.
[2] Lewis, A.L. (2000). Option Valuation Under Stochastic Volatility: With Mathematica Code. February 2000.

Related Articles

Heavy Tails; Heston Model; Hull–White Stochastic Volatility Model; Implied Volatility in Stochastic Volatility Models; Implied Volatility Surface; Model Calibration; Partial Differential Equations; SABR Model; Stochastic Volatility Models; Stylized Properties of Asset Returns.

ZAIZHI WANG & MOHAMMED MIRI
Implied Volatility: Long Maturity Behavior

We discuss some properties of implied volatility surfaces (see Implied Volatility Surface) for large times to maturity. The key result is that the implied volatility smile flattens at long maturities, independently of the model of the underlying asset price, so long as there exists an equivalent martingale measure. An asymptotic formula for the long implied volatility is given and illustrated by examples from stochastic volatility models. The long implied volatility is shown to be almost surely nondecreasing as a function of calendar time, in complete analogy with the Dybvig et al. [2] theorem for long interest rates. For a more in-depth treatment of these issues, consult [3, 4], and [7].

Flattening of the Smile

To set up notation and present the main result, we consider in this section a market in which the riskless interest rate is zero, and the underlying stock pays no dividends. The results described here are valid for price processes with either continuous or discontinuous sample paths, or even discrete-time models.

We let $S = (S_t)_{t \ge 0}$ be a nonnegative martingale with $S_0 > 0$ modeling the price of a given stock under a fixed risk-neutral measure. All calculations will be performed with respect to this measure, so we do not include it in the notation.

Introduce a function $C_{BS} : \mathbb{R} \times [0, \infty) \to [0, 1)$ by

$$C_{BS}(k, v) = \begin{cases} \Phi\left(-\dfrac{k}{\sqrt{v}} + \dfrac{\sqrt{v}}{2}\right) - e^k\, \Phi\left(-\dfrac{k}{\sqrt{v}} - \dfrac{\sqrt{v}}{2}\right) & \text{if } v > 0 \\[2mm] (1 - e^k)^+ & \text{if } v = 0 \end{cases} \tag{1}$$

where $a^+ = \max\{a, 0\}$ denotes the positive part of the real number $a$ as usual. Since $v \mapsto C_{BS}(k, v)$ is strictly increasing for each $k \in \mathbb{R}$, we now define the Black–Scholes implied volatility $\Sigma(k, T)$ for log moneyness $k$ and maturity $T$ as the unique nonnegative solution to the equation

$$\mathbb{E}\left[\left(\frac{S_T}{S_0} - e^k\right)^+\right] = C_{BS}(k, T\,\Sigma(k, T)^2) \tag{2}$$

Note that we are using the convention that the log moneyness $k$ corresponds to the strike $K = S_0 e^k$.

The main result is that the implied volatility smile flattens at long maturities:

Theorem 1 For any $M > 0$, we have

$$\lim_{T \to \infty}\ \sup_{k_1, k_2 \in [-M, M]} |\Sigma(k_2, T) - \Sigma(k_1, T)| = 0 \tag{3}$$

That the implied volatility smile flattens at long maturities seems to be a folk theorem, and has been verified for various models for which the long implied volatility can be calculated explicitly. The result is sometimes attributed erroneously to the central limit theorem, but it is true in complete generality without any notion of mean reversion of the spot volatility process. Indeed, Theorem 1 contains no assumption on the dynamics of the stock price other than that it is a nonnegative martingale. Also note that we have not even assumed that $\lim_{T \to \infty} \Sigma(k, T)$ exists for any $k$.

A proof of the flattening of the implied volatility smile under some mild regularity assumptions appears in [1]. A proof of Theorem 1 in the form that it appears here can be found in [9].

It turns out that the rate of flattening can be precisely bounded:

Theorem 2

1. For any $0 \le k_1 < k_2$, we have

$$\frac{\Sigma(k_2, T)^2 - \Sigma(k_1, T)^2}{k_2 - k_1} \le \frac{4}{T} \tag{4}$$

2. For any $k_1 < k_2 \le 0$, we have

$$\frac{\Sigma(k_2, T)^2 - \Sigma(k_1, T)^2}{k_2 - k_1} \ge -\frac{4}{T} \tag{5}$$

3. If $S_t \to 0$ in probability as $t \to \infty$, for any $M > 0$, we have

$$\limsup_{T \to \infty}\ \sup_{\substack{k_1, k_2 \in [-M, M] \\ k_1 \ne k_2}} T \left| \frac{\Sigma(k_2, T)^2 - \Sigma(k_1, T)^2}{k_2 - k_1} \right| \le 4 \tag{6}$$
The inequality in part 3 of Theorem 2 is sharp, as there exists a martingale $(S_t)_{t \ge 0}$ such that $S_t \to 0$ in probability and such that

$$T\, \frac{\partial}{\partial k} \Sigma(k, T)^2 \to -4 \tag{7}$$

as $T \to \infty$ uniformly for $k \in [-M, M]$. A proof of Theorem 2 is in [9].

Remark The condition $S_t \to 0$ in probability appearing in part 3 of Theorem 2 has a natural financial interpretation. Indeed, we have $S_t \to 0$ in probability (equivalently, almost surely) if and only if $C(K, T) \to S_0$ as $T \to \infty$ for some $K > 0$ (equivalently, for all $K > 0$) where

$$C(K, T) = \mathbb{E}[(S_T - K)^+] \tag{8}$$

is the price of a European call option. Since the long maturity call prices converge to the stock price in many models of interest (including, of course, the Black–Scholes model), we see that the assumption $S_t \to 0$ is not particularly onerous.

In fact, since $(S_t)_{t \ge 0}$ is a nonnegative martingale, it must converge almost surely to some random variable $S_\infty$ by the martingale convergence theorem. If $S_\infty > 0$ with positive probability, then $\lim_{T \to \infty} \sqrt{T}\, \Sigma(k, T)$ exists and is finite for each $k$, and hence $\lim_{T \to \infty} \Sigma(k, T) = 0$.

A Representation Formula

Now that we know that the volatility smile flattens in the limit as the maturity goes to infinity, we can study the behavior of the long implied volatility.

Theorem 3 For any $M > 0$, we have

$$\lim_{T \to \infty}\ \sup_{k \in [-M, M]} \left| \Sigma(k, T) - \left( -\frac{8}{T} \log \mathbb{E}[S_T \wedge 1] \right)^{1/2} \right| = 0 \tag{9}$$

where $a \wedge b = \min\{a, b\}$ as usual. In particular, we have the following representation formula

$$\Sigma(\infty) = \lim_{T \to \infty} \left( -\frac{8}{T} \log \mathbb{E}[S_T \wedge 1] \right)^{1/2} \tag{10}$$

whenever the limit exists.

Formula (10) of Theorem 3 can be found, for instance, in [8]. It can be used to calculate the long implied volatility for some examples.

Example (Exponential Lévy Models (see Exponential Lévy Models)). The simple inequality

$$\mathbb{1}_{[1, \infty)}(x) \le x \wedge 1 \le x^p \tag{11}$$

which holds for all $0 \le p \le 1$ and $x \ge 0$, gives the bound

$$\limsup_{T \to \infty}\ \sup_{p \in [0, 1]} \left( -\frac{8}{T} \log \mathbb{E}[S_T^p] \right)^{1/2} \le \Sigma(\infty) \le \liminf_{T \to \infty} \left( -\frac{8}{T} \log \mathbb{P}[S_T \ge 1] \right)^{1/2} \tag{12}$$

If $(\log(S_t))_{t \ge 0}$ has independent identically distributed increments, then the above bounds hold with equality by the large deviation principle. Indeed, let $(L_t)_{t \ge 0}$ be a Lévy process with cumulant-generating function

$$\Lambda(p) = \log \mathbb{E}(e^{p L_1}) \tag{13}$$

such that $\Lambda(1) < \infty$, and model the stock price by the martingale $S_t = e^{L_t - t\Lambda(1)}$. Then the long implied volatility satisfies

$$\Sigma(\infty)^2 = 8 \sup_{p \in [0, 1]} \{ p\Lambda(1) - \Lambda(p) \} \tag{14}$$

which is eight times the Legendre transform of the cumulant generating function evaluated at $\Lambda(1)$.

Example (Stochastic volatility model.) Lewis [8] has proposed a saddle-point approximation method for calculating long implied volatility in stochastic volatility models. For instance, suppose that the asset price satisfies the following system of stochastic differential equations:

$$dS_t = S_t \sqrt{V_t}\, dW_t \tag{15}$$

$$dV_t = \kappa(\theta - V_t)\, dt + \eta \sqrt{V_t}\, dZ_t \tag{16}$$

where $\kappa$, $\theta$, and $\eta$ are real constants, and $(W_t)_{t \ge 0}$ and $(Z_t)_{t \ge 0}$ are correlated standard Wiener processes with $\langle W, Z \rangle_t = \rho t$. Lewis [8] has shown that the long implied volatility is given by the following formula:

$$\Sigma(\infty)^2 = \frac{4\kappa\theta}{(1 - \rho^2)\eta^2} \left[ \sqrt{(2\kappa - \rho\eta)^2 + (1 - \rho^2)\eta^2} - (2\kappa - \rho\eta) \right] \tag{17}$$
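The two examples above can be evaluated numerically. The following is a minimal sketch, not taken from the cited references: `long_var_levy` approximates the supremum in equation (14) on a grid of values of $p$, and `long_var_heston` evaluates formula (17) directly. The function names, the grid size, and the sample parameters are assumptions made for illustration.

```python
import math


def long_var_levy(cgf, grid=10_001):
    """Equation (14): 8 * sup over p in [0, 1] of p*cgf(1) - cgf(p), taken on a uniform grid."""
    c1 = cgf(1.0)
    candidates = (i / (grid - 1) for i in range(grid))
    return 8.0 * max(p * c1 - cgf(p) for p in candidates)


def long_var_heston(kappa, theta, eta, rho):
    """Equation (17): squared long implied volatility in the stochastic volatility example."""
    a = 2.0 * kappa - rho * eta
    b = (1.0 - rho ** 2) * eta ** 2
    return 4.0 * kappa * theta / b * (math.sqrt(a * a + b) - a)


if __name__ == "__main__":
    sigma = 0.2
    # Brownian motion with volatility sigma has cumulant-generating function sigma^2 p^2 / 2,
    # so the long implied variance should reproduce sigma^2.
    print(long_var_levy(lambda p: 0.5 * sigma ** 2 * p ** 2))
    # Hypothetical stochastic volatility parameters.
    print(long_var_heston(kappa=3.0, theta=0.06, eta=0.3, rho=-0.5))
```

In the Brownian case the supremum is attained at $p = 1/2$, giving $8 \cdot \sigma^2/8 = \sigma^2$, as expected for a Black–Scholes model. For small $\eta$ and $\rho = 0$, formula (17) approaches $\theta$, which offers a second sanity check.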
See [5] for further asymptotics of stochastic volatility models based on this method, and see [4] for asymptotics based on perturbation methods.

Long Implied Volatility Cannot Fall

In many models of interest, the long implied volatility, if it exists, is constant as a function of the calendar time. However, the long implied volatility need not be a constant in general. In this section, we consider the dynamics of the long implied volatility, and in fact, we will see that the long implied volatility can never fall. In this section, we also assume that the stock price is strictly positive, rather than merely nonnegative. We define the implied volatility $\Sigma_t(k, \tau)$ for log moneyness $k$ and time to maturity $\tau$ as the unique nonnegative $\mathcal{F}_t$-measurable random variable that satisfies

$$\mathbb{E}\left[\left(\frac{S_{t+\tau}}{S_t} - e^k\right)^+ \,\Big|\, \mathcal{F}_t\right] = C_{BS}(k, \tau\,\Sigma_t(k, \tau)^2) \tag{18}$$

The following theorem was proved in [9].

Theorem 4 For all $k_1$, $k_2$ and $0 \le s \le t$ we have

$$\limsup_{\tau \to \infty}\ \Sigma_t(k_1, \tau) - \Sigma_s(k_2, \tau) \ge 0 \tag{19}$$

almost surely.

This result is an exact analog of the Dybvig–Ingersoll–Ross theorem that long zero-coupon rates never fall. See [6] for a nice proof of this fact.

Extensions

The previous discussion has considered the case where the stock pays no dividend and the risk-free interest rate is zero. In the general case, a stock pays a dividend and there is a cost to borrow money. The situation is usually modeled as follows. Let $S_t$ be the stock price, let $D_t$ be the cumulative dividends, and let $B_t$ be the price of a numéraire asset such as a bank account at time $t$. There is no arbitrage if there exists a probability measure such that the process

$$\frac{S_t}{B_t} + \int_0^t \frac{dD_s}{B_s} \tag{20}$$

is a martingale. In the case of proportional continuous dividends $dD_t = \delta S_t\, dt$ and constant interest rate $dB_t = r B_t\, dt$, there is no arbitrage if $\tilde{S}_t = S_t e^{(\delta - r)t}$ defines a martingale. In this case, everything from above applies if we define the implied volatility by

$$\mathbb{E}\left[\left(\frac{\tilde{S}_T}{\tilde{S}_0} - e^k\right)^+\right] = C_{BS}(k, T\,\Sigma(k, T)^2) \tag{21}$$

where the log-moneyness parameter $k$ now corresponds to the strike $K = S_0 e^{k + (r - \delta)T}$.

However, it is unclear which of the above results can be suitably extended to the general case with arbitrary increasing adapted processes $(D_t)_{t \ge 0}$ and $(B_t)_{t \ge 0}$.

References

[1] Carr, P. & Wu, L. (2003). The finite moment log stable process and option pricing, Journal of Finance 58(2), 753–778.
[2] Dybvig, P., Ingersoll, J. & Ross, S. (1996). Long forward and zero-coupon rates can never fall, Journal of Business 60, 1–25.
[3] Gatheral, J. (2006). The Volatility Surface: A Practitioner’s Guide, John Wiley & Sons, Hoboken, NJ.
[4] Fouque, J.-P., Papanicolaou, G. & Sircar, K.R. (2000). Derivatives in Financial Markets with Stochastic Volatility, Cambridge University Press.
[5] Jacquier, A. (2007). Asymptotic Skew Under Stochastic Volatility, Pre-print, Birkbeck College, University of London.
[6] Hubalek, F., Klein, I. & Teichmann, J. (2002). A general proof of the Dybvig-Ingersoll-Ross theorem: long forward rates can never fall, Mathematical Finance 12(4), 447–451.
[7] Lee, R. (2004). Implied volatility: statics, dynamics, and probabilistic interpretation, in Recent Advances in Applied Probability, R. Baeza-Yates, et al., eds, Springer-Verlag, Springer, New York, 241–268.
[8] Lewis, A. (2000). Option Valuation Under Stochastic Volatility, Finance Press, Newport Beach.
[9] Rogers, L.C.G. & Tehranchi, M.R. (2008). Can the Implied Volatility Surface Move by Parallel Shifts? Pre-print, University of Cambridge.

Related Articles

Exponential Lévy Models; Heston Model; Implied Volatility Surface; Moment Explosions.

MICHAEL R. TEHRANCHI
SABR Model

The SABR model [4] is a stochastic volatility (see Stochastic Volatility Models) model in which the forward asset price follows the dynamics in a forward measure $\mathbb{P}^T$:

$$df_t = a_t C(f_t)\, dW_t \tag{1}$$

$$da_t = \nu a_t\, dZ_t, \qquad \alpha \equiv a_0 \tag{2}$$

$$dW_t\, dZ_t = \rho\, dt, \qquad C(f) \equiv f^{\beta}, \quad \beta \in [0, 1) \tag{3}$$

The stochastic volatility $a_t$ is described by a geometric Brownian motion. The model depends on four parameters: $\alpha$, $\nu$, $\rho$, and $\beta$. By using singular perturbation techniques, Hagan et al. [4] obtained a closed-form formula for the implied volatility $\sigma_{BS}(\tau, K)$, at the first-order in the maturity $\tau$. Here, we display a corrected version of the formula [8]:

$$\sigma_{BS}(\tau, K) = \frac{\nu \ln \frac{f_0}{K}}{\hat{x}(\zeta)} \left(1 + \sigma_1(f_{av})\,\tau\right) \tag{4}$$

with

$$\hat{x}(\zeta) = \ln\left( \frac{\sqrt{1 - 2\rho\zeta + \zeta^2} - \rho + \zeta}{1 - \rho} \right), \qquad \zeta = \frac{\nu}{\alpha} \int_K^{f_0} \frac{dx}{C(x)}, \qquad f_{av} = \sqrt{f_0 K}$$

$$\sigma_1(f) = \frac{\alpha\nu\,\partial_f C(f)\,\rho}{4} + \frac{2 - 3\rho^2}{24}\,\nu^2 + \frac{(\alpha C(f))^2}{24} \left[ \frac{1}{f^2} + \frac{2\,\partial_{ff} C(f)}{C(f)} - \left( \frac{\partial_f C(f)}{C(f)} \right)^2 \right] \tag{5}$$

Though this formula is popular, volatility does not mean-revert in the underlying SABR model, so for given $\alpha$, $\nu$, $\beta$, and $\rho$, the SABR formula cannot simultaneously calibrate to the implied volatility smile at more than one expiry.

By mapping Hagan et al.’s computations into a geometrical framework based on the heat kernel expansion, approximate implied volatility formulae may be derived for more general stochastic volatility models (SVMs), in particular for the models where volatility mean-reverts. The use of geometrical methods in quantitative finance originates from [1, 2] and was investigated in detail in [5, 6, 7].

A More General Stochastic Process

In the following, we will assume arbitrary local volatility functions $C(\cdot)$ and a general time-homogeneous one-dimensional stochastic differential equation (SDE) for the stochastic volatility process

$$da_t = b(a_t)\, dt + \sigma(a_t)\, dZ_t \tag{6}$$

In principle, $b(\cdot)$ and $\sigma(\cdot)$ could depend on the forward $f$ as well, but the models we are interested in here do not exhibit this additional dependence. This strategy for computing the short-time implied volatility asymptotics induced by the SVM involves two main steps:

• Derive the short-time limit of the effective local volatility function. The computation involves the use of the heat kernel expansion.
• Derive an approximate expression for the implied volatility corresponding to this effective local volatility function.

Effective Local Volatility Model

The square of the Dupire effective local volatility function (see Model Calibration) [3] is equal to the mean of the square of the stochastic volatility when the forward is fixed to the strike

$$\sigma_{loc}(t, K)^2 = C(K)^2\, \mathbb{E}[a_t^2 \mid f_t = K] = C(K)^2\, \frac{\int_{-\infty}^{\infty} a^2\, p(t, K, a \mid f_0, \alpha)\, da}{\int_{-\infty}^{\infty} p(t, K, a \mid f_0, \alpha)\, da} \tag{7}$$

where $p(t, K, a \mid f, \alpha)$ is the conditional probability density for the forward and the volatility at time $t$. As we now proceed to explain, $p(t, K, a \mid f, \alpha)$ is the fundamental solution of a heat kernel equation depending on two important geometrical quantities: first a metric tensor in equation (9), which is the inverse of the local covariance matrix and second an
Abelian connection in equation (10) which depends on the drift $b(a)$.

Heat Kernel Expansion

A short-time expansion of the density for a multidimensional Itô diffusion process can be obtained using the heat kernel expansion: the Kolmogorov equation is rewritten as a heat kernel equation on an $n$-dimensional Riemannian manifold endowed with an Abelian connection as explained in [7].

Suppose the stochastic equations are written as

$$dx^{\mu} = b^{\mu}(x)\, dt + \sigma^{\mu}(x)\, dW^{\mu} \tag{8}$$

with $dW^{\mu}\, dW^{\nu} = \rho_{\mu\nu}\, dt$. The associated metric $g_{\mu\nu}$ depends only on the diffusion terms $\sigma_{\mu}$, while the connection $A_{\mu}(x)$ involves drift terms $b^{\mu}$ as well:

$$g_{\mu\nu}(x) = 2\, \frac{\rho^{\mu\nu}}{\sigma_{\mu}(x)\sigma_{\nu}(x)}, \qquad \rho^{\mu\nu} \equiv [\rho^{-1}]^{\mu\nu}, \quad \mu, \nu = 1 \cdots n$$

$$A^{\mu}(x) = \frac{1}{2} \left( b^{\mu}(x) - g^{-1/2}\, \partial_{\nu}\left( g^{1/2} g^{\mu\nu}(x) \right) \right) \tag{9}$$

with

$$g(x) \equiv \det[g_{\mu\nu}(x)] \tag{10}$$

Here, we have used the Einstein convention meaning that two repeated indices are implicitly summed. We set

$$A_{\mu}(x) = g_{\mu\nu}(x) A^{\nu}(x) \tag{11}$$

The asymptotic solution to the Kolmogorov equation in the short-time limit is given by

$$p(t, x \mid x^0) = \frac{\sqrt{g(x)}}{(4\pi t)^{n/2}} \sqrt{\Delta(x)}\, \mathcal{P}(x^0, x)\, e^{-\frac{d(x)^2}{4t}} \sum_{n=0}^{\infty} a_n(x)\, t^n \tag{12}$$

• $d(x)$ is the geodesic distance between $x$ and $x^0$ measured in the metric $g_{\mu\nu}$. $d(x)$ is defined as the minimizer of

$$d(x)^2 = \min_{\mathcal{C}} \int_0^1 g_{\mu\nu}\, \frac{dx^{\mu}}{d\lambda} \frac{dx^{\nu}}{d\lambda}\, d\lambda \tag{13}$$

where $\lambda$ parameterizes the curve $\mathcal{C}(x^0, x)$ joining $x(\lambda = 0) \equiv x^0$ and $x(\lambda = 1) \equiv x$.

• $\Delta(x)$ is the so-called Van Vleck–Morette determinant:

$$\Delta(x) = g(x)^{-1/2}\, \det\left( -\frac{\partial^2 d(x)^2}{2\, \partial x\, \partial x^0} \right) g(x^0)^{-1/2} \tag{14}$$

• $\mathcal{P}(x^0, x)$ is the parallel transport of the Abelian connection along the geodesic $\mathcal{C}(x^0, x)$ from the point $x^0$ to $x$:

$$\mathcal{P}(x^0, x) = e^{-\int_{\mathcal{C}(x^0, x)} A_{\mu}(x)\, dx^{\mu}} \tag{15}$$

• The $a_i(x)$ coefficients ($a_0(x) = 1$) are smooth functions and depend on geometric invariants such as the scalar curvature. More details can be found in [7].

The Short-time Limit

Plugging the general short-time limit for $p$ at the first-order in time as given by equation (12) in equation (7) and using a saddle-point approximation for the integration over $a$, we obtain the short-time limit of the effective local volatility function.

Getting implied volatility from the effective local volatility function boils down to calculating the geodesic distance between any two given points in the metric defined by the SVM. While this is generally a nontrivial task, the geodesic distance is known analytically in the special case of the geometry associated with the SVM defined by equations (1) and (6). Details are given in [7].

Asymptotic Implied Volatility

Applying these techniques, we find that the general asymptotic implied volatility at the first order for any

time-homogeneous SVM, depending implicitly on the with


metric gij (9) and the connection Ai (10) is given by
1
σ (f ) =
 ν   
ln fK0 g ff τ
σBS (τ, K) =  1+ qν + αρ + α 2 + q 2 ν 2 + 2qανρ
K df 
√ 12 × ln
f 0 2g ff
α(1 + ρ)
   f
 2 dx
3 ∂f g ff ∂f2 g ff 1 q=
− + + f C(x)
4 g ff g ff fav2 0
amin (f ) = α 2 + 2ανρq + ν 2 q 2
g ff τ

 
−1 −qνρ − αρ + amin (f )
2
+ d(f ) = cosh
2g φ  (amin )
ff
α(1 − ρ 2 )
(19)
 

2  φ  (amin ) g ff
ln(
g P ) (amin ) −  + ff 
φ (amin ) g The original SABR formula (6) can be reproduced
by approximating amin for strikes near the money by
(16)

amin  α + qρν
with amin the volatility a, which minimizes the
geodesic distance d(a, fav |α, f0 ). The g ff are the and
ff -components of the inverse metric evaluated at
amin .
is the Van Vleck–Morette determinant as in
sinh(d(amin ))
equation (14), g is the determinant of the metric, and 1 (20)
P is the parallel gauge transport as in equation (15). d(amin )
The prime symbol  indicates a derivative according An asymptotic formula for a SABR model with
to a. This formula in equation (16) is particularly a mean-reversion term, called λ-SABR, has been
useful as we can use it to rapidly calibrate any given obtained similarly in [7].
SVM. In the following, we apply it to the SABR
model with an arbitrary local volatility C(·).
Calibration of the Short-term Smile
Improved SABR Formula Moreover, by inverting equation (17) to lowest order
in τ , we see that for any values α, ρ, and ν, a given
The asymptotic implied volatility in the SABR model short-term smile σBS (f ) is calibrated by construction
with arbitrary local volatility C(·) is then given by if the local volatility function is chosen as

ln fK0 C(f ) =
σBS (τ, K) = (1 + σ1 (fav ) τ ) (17) 
σ (K) σBS (f ) −1
f σBS (f ) 1 − f ln ff0 σBS (f )
 f  
with  ln −1 √ 1
α 1 − ρ cosh |ρ| ν σBS (f ) + cosh
2 ρ f0

1−ρ 2
ανρ sinh(d(f )) (C(f )amin )2 (21)
σ1 (f ) = ∂f C(f ) +
4 d(f ) 24

1 2∂ff (C(f )amin ) References
2
+
f C(f )amin
   [1] Avellaneda, M., Boyer-Olson, D., Busca, J. & Friz, P.
∂f (C(f )amin ) 2
− (18) (2002). Reconstructing the smile, Risk Magazine October
C(f )amin 91–95.
[2] Berestycki, H., Busca, J. & Florent, I. (2004). Computing the implied volatility in stochastic volatility models, Communications on Pure and Applied Mathematics 57(10), 1352–1373.
[3] Dupire, B. (2004). A unified theory of volatility, in Derivatives Pricing: The Classic Collection, P. Carr, ed., Risk Publications.
[4] Hagan, P., Kumar, D., Lesniewski, A.S. & Woodward, D.E. (2002). Managing smile risk, Wilmott Magazine September, 84–108.
[5] Henry-Labordère, P. (2007). Combining the SABR and BGM models, Risk Magazine October 102–107.
[6] Henry-Labordère, P. (2008). A geometric approach to the asymptotics of implied volatility, in R. Cont, ed., Frontiers in Quantitative Finance: Volatility and Credit Risk Modeling, Wiley, Chapter 4.
[7] Henry-Labordère, P. (2008). Analysis, Geometry and Modeling in Finance: Advanced Methods in Option Pricing, Financial Mathematics Series, Chapman & Hall/CRC 102–104.
[8] Obłój, J. (2008). Fine-tune your smile, Wilmott Magazine May.

Further Reading

Benaim, S., Friz, P., Lee, R. (2008). On the Black-Scholes implied volatility at extreme strikes, in Frontiers in Quantitative Finance: Volatility and Credit Risk Modeling, R. Cont, ed., Wiley, Chapter 3.
Lee, R. (2004). The moment formula for implied volatility at extreme strikes, Mathematical Finance 14(3), July 469–480.

PIERRE HENRY-LABORDÈRE
Implied Volatility: Large Strike Asymptotics

Let $S_t$ be the price of a risky asset at time $t \in [0, T]$ and $B_t = B(t, T)$ the time-$t$ value of one monetary unit received at time $T$. Assuming suitable no-arbitrage conditions, there exists a probability measure $\mathbb{P} = \mathbb{P}^T$, called the ($T$-forward) pricing measure, under which the $B_t$-discounted asset price

$$F_t = F(t, T) = S(t)/B(t, T) \tag{1}$$

is a martingale and so are $B_t$-discounted time-$t$ option prices, such as $C_t/B(t, T)$, where $C_t$ denotes the time-$t$ value of a European call option with maturity $T$ and payoff $(S_T - K)^+$. With focus on $t = 0$ and writing $C$ instead of $C_0$, we have

$$C = B(0, T)\, \mathbb{E}\left[ C_T / B(T, T) \right] \tag{2}$$

$$= B(0, T)\, \mathbb{E}\left[ (F_T - K)^+ \right] = S_0\, \mathbb{E}\left[ \left( \frac{F_T}{F_0} - \frac{K}{F_0} \right)^+ \right] \tag{3}$$

Let us remark that in the case of deterministic interest rates $r(\cdot)$, one can rewrite this as$^{a}$

$$C = \mathbb{E}\left[ \exp\left( -\int_0^T r(u)\, du \right) (S_T - K)^+ \right] \tag{4}$$

If we now make the assumption that there exists $\sigma > 0$, the Black–Scholes volatility, such that $F_t$ satisfies

$$dF_t = \sigma F_t\, dW \tag{5}$$

where $W$ is a Brownian motion under $\mathbb{P}$, then we have normal returns $X_{BS} := \log(F_T/F_0)$. More precisely,

$$X_{BS} \sim \mathrm{Normal}\left( -V^2/2,\ V^2 \right) \quad \text{with } V \equiv \sigma\sqrt{T} \tag{6}$$

and an elementary integration of equation (3) yields the classical Black–Scholes formula,$^{b}$

$$C_{BS} = S_0 \left( \Phi(d_1) - \frac{K}{F_0}\, \Phi(d_2) \right) \tag{7}$$

with $d_{1,2} = -\log(K/F_0)/V \pm V/2$. It follows that we can express the normalized Black–Scholes call price

$$c_{BS} := C_{BS}/S_0 \tag{8}$$

as a function of two variables: log-strike $k := \log(K/F_0)$ and (scaled) Black–Scholes volatility $V = \sigma\sqrt{T}$,

$$c_{BS}(k, V) = \Phi(d_1) - e^k\, \Phi(d_2) \quad \text{with } d_{1,2} = -k/V \pm V/2 \tag{9}$$

Let us now return to the general setting and just assume that, for fixed $T$, the returns

$$X := \log \frac{F_T}{F_0} \tag{10}$$

have a known distribution, fully specified by the probability distribution function

$$F(x) := \mathbb{P}[X \le x] \tag{11}$$

From equation (2), the value of a normalized call price $c = C/S_0$ is then given by

$$\mathbb{E}\left[ \left( \frac{F_T}{F_0} - e^k \right)^+ \right] = \int_{-\infty}^{\infty} \left( e^x - e^k \right)^+ dF(x) =: c(k) \tag{12}$$

Definition 1 (Implied volatility). Let $T > 0$ be a fixed maturity and assume $F$ is the distribution function of the returns $\log(F_T/F_0)$ under the pricing measure. Then, the scaled (Black–Scholes) implied volatility is the unique value $V(k)$ such that

$$c(k) = c_{BS}(k, V(k)) \quad \text{for all } k \in \mathbb{R} \tag{13}$$

We also write $\sigma(k, T) := V(k)/\sqrt{T}$ for the (annualized, Black–Scholes) implied volatility.

By the very definition, the volatility smile $V(\cdot, T)$ is flat, namely constant equal to $V = \sigma\sqrt{T}$, in the Black–Scholes model. To see existence/uniqueness of implied volatility, in general, it suffices to note that $c_{BS}(k, \cdot)$ is strictly increasing in the volatility parameter and that

$$c_{BS}(k, V = 0) = \left( 1 - e^k \right)^+ \le \mathbb{E}\left[ \left( F_T/F_0 - e^k \right)^+ \right] = c(k) \tag{14}$$

$$c(k) \le \mathbb{E}\left[ F_T/F_0 \right] = 1 = c_{BS}(k, V = +\infty) \tag{15}$$
2 Implied Volatility: Large Strike Asymptotics

It is clear from the afore mentioned monotonicity of and one is led to Lee’s moment formula
cBS (k, ·) that the fatness of the tail of the returns, for


example, the behavior of V (k)2 /k ∼ ψ p ∗


F̄ (k) = 1 − F (k) =  [X > k] as k → ∞ (16) V (−k)2 /k ∼ ψ q ∗ (26)

is related to the shape of the “wing” of the implied volatility (smile) for far-out-of-the-money calls, V(k) as k → ∞, and similarly, for F(k), V(k) as k → −∞. Surprisingly, perhaps, this link can be made very explicit. Let us agree that if F admits a density, it is denoted by f = F′. Let us also adopt the common convention that

$$ g(k) \sim h(k) \quad \text{means} \quad g(k)/h(k) \to 1 \ \text{as}\ k \to \infty \tag{17} $$

The (meta) result is the following tail-wing formula: as k → ∞ we have

$$ V(k)^2/k \sim \psi\left[-1 - \log \bar{F}(k)/k\right] \sim \psi\left[-1 - \log f(k)/k\right] \tag{18} $$

$$ V(-k)^2/k \sim \psi\left[-\log F(-k)/k\right] \sim \psi\left[-\log f(-k)/k\right] \tag{19} $$

where

$$ \psi(x) \equiv 2 - 4\left[\sqrt{x^2 + x} - x\right] \tag{20} $$

An interesting special case arises when either

$$ p^* = \sup\{p \in \mathbb{R} : M(1+p) < \infty\} \tag{21} $$

$$ q^* = \sup\{q \in \mathbb{R} : M(-q) < \infty\} \tag{22} $$

is finite, where M is the moment generating function of F, and this is equivalent to moment explosion of the underlying since

$$ M(u) := \int e^{ux}\, dF(x) = \frac{1}{F_0^u}\,\mathbb{E}\left[F_T^u\right] = \frac{1}{F_0^u}\,\mathbb{E}\left[S_T^u\right] \tag{23} $$

In this case, one expects an exponential tail so that

$$ -\log \bar{F}(k) \sim (p^* + 1)\,k \tag{24} $$

$$ -\log F(-k) \sim q^*\,k \tag{25} $$

Recall that g(k) ∼ h(k) stands for the precise mathematical statement that g(k)/h(k) → 1 as k → ∞. In the same spirit, let us agree that

$$ g(k) \lesssim h(k) \quad \text{means} \quad \limsup\, g(k)/h(k) = 1 \ \text{as}\ k \to \infty \tag{27} $$

Proposition 1 (Lee’s Moment Formula; [3, 8]). Assume $\mathbb{E}[e^X] = \mathbb{E}[S_T]/F_0 < \infty$. The moment formula then holds in complete generality in “limsup” form. More precisely, as k → ∞,

$$ V(k)^2/k \lesssim \psi(p^*) \tag{28} $$

$$ V(-k)^2/k \lesssim \psi(q^*) \tag{29} $$

The power of the moment formula comes from the fact that the critical exponents p*, q* can often be obtained by sheer inspection of a moment generating function known in closed form. One can also make use of the recent literature on moment explosions to obtain such critical exponents in various stochastic volatility models; see Moment Explosions and the references therein. Let us note that it is possible [4] to construct (pathological) examples to see that one cannot hope for a genuine limit form of the above moment formula, as was suggested in (26). Another remark is that the moment formula provides little information in the absence of moment explosion. For instance, p* = +∞ only implies V(k)² = o(k) but gives no further information about the behavior of V(k) for large k. Both issues are dealt with by the tail-wing formula. The key assumption is a certain well behavedness of F, but only on a crude logarithmic scale and, therefore, rather easy to check in many examples.

Definition 2 (Regular Variation; [5]). A positive, real-valued function f, defined at least on [M, ∞) for some large M, is said to be regularly varying of index α if for all λ > 0

$$ f(\lambda k) \sim \lambda^{\alpha} f(k) \quad \text{as}\ k \to \infty \quad \forall \lambda > 0 \tag{30} $$

and in this case we write $f \in R_\alpha$.
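As a numerical illustration of the function ψ in equation (20) and of the wing slope ψ(p*) that bounds V(k)²/k in the moment formula (28), the following Python sketch may help. The function names `psi` and `wing_slope` are illustrative choices and do not come from the text; the treatment of p* = +∞ as slope zero reflects the remark that V(k)² = o(k) in that case.

```python
import math


def psi(x):
    """psi(x) = 2 - 4 * (sqrt(x^2 + x) - x), as in equation (20); assumes x >= 0."""
    return 2.0 - 4.0 * (math.sqrt(x * x + x) - x)


def wing_slope(critical_exponent):
    """Limsup slope of V(k)^2 / k for a given critical exponent p* (or q*).

    An infinite exponent (no moment explosion) is mapped to slope 0,
    matching V(k)^2 = o(k).
    """
    if math.isinf(critical_exponent):
        return 0.0
    return psi(critical_exponent)


if __name__ == "__main__":
    # psi decreases from 2 (immediate moment explosion, p* = 0) towards 0.
    for p_star in (0.0, 0.5, 1.0, 5.0, 50.0):
        print(p_star, wing_slope(p_star))
    # For large x, psi(x) behaves like 1 / (2x).
    print(psi(1e4) * 2e4)
```

The value ψ(0) = 2 corresponds to the steepest admissible wing, and the product ψ(x)·2x approaching one for large x is the asymptotic relation ψ(x) ∼ 1/(2x).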


Implied Volatility: Large Strike Asymptotics 3

Theorem 1 (Right-hand Tail-wing Formula; [2]). Assume $\exists \varepsilon > 0 : \mathbb{E}[e^{(1+\varepsilon)X}] = \mathbb{E}[S_T^{1+\varepsilon}]/F_0^{1+\varepsilon} < \infty$. Let also α > 0 and set

$$ \psi(x) \equiv 2 - 4\left[\sqrt{x^2 + x} - x\right] \tag{31} $$

Then (i) ⇒ (ii) ⇒ (iii) ⇒ (iv) where

(i) $-\log f(k) \in R_\alpha$;
(ii) $-\log \bar{F}(k) \in R_\alpha$;
(iii) $-\log c(k) \in R_\alpha$;
and
(iv) $V(k)^2/k \sim \psi\left[-\log c(k)/k\right]$.

If (ii) holds, then $-\log c(k) \sim -k - \log \bar{F}$ and

(iv′) $V(k)^2/k \sim \psi\left[-1 - \log \bar{F}(k)/k\right]$,

if (i) holds, then $-\log f \sim -\log \bar{F}$ and

(iv″) $V(k)^2/k \sim \psi\left[-1 - \log f(k)/k\right]$.

Of course, there is a similar left-hand result, which we state such as to involve far-out-of-the-money (normalized) European puts,

$$ p(k) := \int_{-\infty}^{\infty} \left(e^k - e^x\right)^+ dF(x) \tag{32} $$

Theorem 2 (Left-hand Tail-wing Formula). Assume $\exists \varepsilon > 0 : \mathbb{E}[e^{-\varepsilon X}] < \infty$. Then (i) ⇒ (ii) ⇒ (iii) ⇒ (iv) where

(i) $-\log f(-k) \in R_\alpha$;
(ii) $-\log F(-k) \in R_\alpha$;
(iii) $-\log p(-k) \in R_\alpha$;
and
(iv) $V(-k)^2/k \sim \psi\left[-1 - \log p(-k)/k\right]$.

If (ii) holds, then $-\log p(-k) \sim k - \log F(-k)$ and

(iv′) $V(-k)^2/k \sim \psi\left[-\log F(-k)/k\right]$,

if (i) holds, then $-\log f(-k) \sim -\log F(-k)$ and

(iv″) $V(-k)^2/k \sim \psi\left[-\log f(-k)/k\right]$.

With focus on the right-hand tail-wing, let us single out two cases of particular importance in applications.

1. (Asymptotically Linear Regime) If $-1 - \log f(k)/k$ or $-1 - \log \bar{F}(k)/k$ converges to p* ∈ (0, ∞) then

$$ V(k)^2 \sim \psi(p^*) \times k \tag{33} $$

and the implied variance, defined as the square of implied volatility, is asymptotically linear with slope ψ(p*). One can, in fact, check this from the moment generating function of X. Indeed, it is shown in [3] that if

$$ s \mapsto M(1 + p^* - 1/s) \tag{34} $$

is regularly varying then −log f(k)/k → p* + 1. In other words, to ensure equation (26), that is, a genuine limit in Lee’s moment formula, one needs some well behavedness of M as its argument approaches the critical exponent 1 + p*. Similar conditions can be given with M replaced by M′ or log M and these conditions are, indeed, easy to check in a number of familiar exponential Lévy models (including Barndorff–Nielsen’s Normal Inverse Gaussian model, Carr–Madan’s Variance Gamma model, or Kou’s Double Exponential model) and various time changes of these models (see [3] for details).

2. (Asymptotically Sublinear Regime) If −log f(k)/k → ∞, we can use ψ(x) ∼ 1/(2x) as x → ∞ to see that

$$ V(k)^2 \sim \frac{1}{-2\log f(k)/k} \times k = \frac{k^2}{-2\log f(k)} \tag{35} $$

so that the implied variance is asymptotically sublinear. As a sanity check, consider the Black–Scholes model where f is the density of the (normally distributed) returns with variance V² ≡ σ²T, as given in (6); then −log f(k) ∼ k²/(2V²) and it follows that V(k) ∼ V, in trivial agreement with the flat smile in the Black–Scholes model. Following [2], other examples are given by Merton’s jump diffusion as a borderline example

in which the sublinear behavior comes from a subtle logarithmic correction term, and Carr–Wu’s Finite Moment Logstable model. The tail behavior of the latter, as noted in [2], can be derived from the growth of the (nonexplosive) moment generating function by means of Kasahara’s Tauberian theorem [5]. Another example where this methodology works is the SABR model

$$ dF = \sigma F^{\beta}\, dW, \qquad d\sigma = \eta\sigma\, dZ \tag{36} $$

with σ, η > 0, β < 1 and two Brownian motions W, Z assumed (here) to be independent. Using standard stochastic calculus [4], one can give good enough estimates on $\mathbb{E}[|F_T/F_0|^u]$, from above and below, to see that

$$ \log \mathbb{E}\left[|F_T/F_0|^u\right] = \log \mathbb{E}\left[\exp(uX)\right] \sim \frac{\eta^2 T u^2}{2(1-\beta)^2} \quad \text{as}\ u \to \infty \tag{37} $$

From this, Kasahara’s theorem allows one to deduce the tail behavior of X, namely

$$ -\log \mathbb{P}[X > x] \sim \frac{(1-\beta)^2 x^2}{2\eta^2 T} \tag{38} $$

and the (right hand) tail-wing formula reveals that the implied volatility in the SABR model is asymptotically flat, σ(k, T) ∼ η/(1 − β) as k → ∞.

Early contributions in the study of smile asymptotics are [1, 6]. The moment formula appears in [8], the tail-wing formula in [2] with some additional criteria in [3]. A survey on the topic, together with some new examples (including CEV and SABR) is found in [4]. Further developments in the field include the refined asymptotic results of Gulisashvili and Stein [7]; in a simple log-normal stochastic volatility model of the form dF = σF dW, dσ = ησ dZ, with two independent Brownian motions W, Z they find

$$ \sigma(k, T)\sqrt{T} = \sqrt{2k} - \frac{\log k + \log\log k}{2\eta\sqrt{T}} + O(1) \tag{39} $$

The leading order term $\sqrt{2k}$ says that implied variance grows linearly with slope 2, as one expects in a model with immediate moment explosion.

Acknowledgments

Financial support from the Cambridge Endowment of Research in Finance is gratefully acknowledged.

End Notes

a. Equation (4) is valid in a nondeterministic interest rate setting, provided the expectation is taken with respect to the risk-neutral measure (which is equivalent but, in general, not identical to T).
b. Φ denotes the distribution function of normal (0, 1).

References

[1] Avellaneda, M. & Zhu, Y. (1998). A risk-neutral stochastic volatility model, International Journal of Theoretical and Applied Finance 1(2), 289–310.
[2] Benaim, S. & Friz, P.K. (2009). Regular variation and smile asymptotics, Mathematical Finance 19(1), 1–12, eprint arXiv:math/0603146.
[3] Benaim, S. & Friz, P.K. (2008). Smile asymptotics II: models with known MGF, Journal of Applied Probability 45(1), 16–32.
[4] Benaim, S., Friz, P.K. & Lee, R. (2008). The Black–Scholes implied volatility at extreme strikes, in Frontiers in Quantitative Finance: Volatility and Credit Risk Modeling, Chapter 2, Wiley.
[5] Bingham, N.H., Goldie, C.M. & Teugels, J.L. (1987). Regular Variation, CUP.
[6] Gatheral, J. (2000). Rational shapes of the Volatility Surface, Presentation, RISK Conference.
[7] Gulisashvili, A. & Stein, E. Implied volatility in the Hull–White model, Mathematical Finance, to appear.
[8] Lee, R. (2004). The moment formula for implied volatility at extreme strikes, Mathematical Finance 14(3), 469–480.

Further Reading

Gatheral, J. (2006). The Volatility Surface, A Practitioner’s Guide, Wiley.

PETER K. FRIZ
Constant Elasticity of Variance (CEV) Diffusion Model

The CEV Process

The constant elasticity of variance (CEV) model is a one-dimensional diffusion process that solves a stochastic differential equation (SDE)

$$ dS_t = \mu S_t\, dt + a S_t^{\beta+1}\, dB_t \tag{1} $$

with the instantaneous volatility σ(S) = aS^β specified to be a power function of the underlying spot price. The model was introduced by Cox [7] as one of the early alternative processes to the geometric Brownian motion to model asset prices. Here β is the elasticity parameter of the local volatility, dσ/dS = βσ/S, and a is the volatility scale parameter. For β = 0, the CEV model reduces to the constant volatility geometric Brownian motion process employed in the Black, Scholes, and Merton model. When β = −1, the volatility specification is that of Bachelier (the asset price has the constant diffusion coefficient, while the logarithm of the asset price has the a/S volatility). For β = −1/2 the model reduces to the square-root model of Cox and Ross [8].

Cox [7] originally studied the case β < 0 for which the volatility is a decreasing function of the asset price. This specification captures the leverage effect in the equity markets: the stock price volatility increases as the stock price declines. The result of this inverse relationship between the price and volatility is the implied volatility skew exhibited by options prices in the CEV model with negative elasticity. The elasticity parameter β controls the steepness of the skew (the larger the |β|, the steeper the skew), while the scale parameter a fixes the at-the-money volatility level. This ability to capture the skew has made the CEV model popular in equity options markets.

Emanuel and MacBeth [14] extended Cox’s analysis to the positive elasticity case β > 0, where the asset price volatility is an increasing function of the asset price. The driftless process with µ = 0 and with positive β is a strict local martingale. It has been applied to modeling commodity prices that exhibit increasing implied volatility skews with the volatility increasing with the strike price, but care should be taken when working with this model (see the discussion below).

The CEV diffusion has the following boundary characterization (see, e.g., [4] for Feller’s boundary classification for one-dimensional diffusions). For −1/2 ≤ β < 0, the origin is an exit boundary, and the process is killed the first time it hits the origin. For β < −1/2, the origin is a regular boundary point. The SDE (1) does not uniquely specify the diffusion process, and a boundary condition is needed at the origin. In the CEV model, it is specified as a killing boundary. Thus, the CEV process with β < 0 naturally incorporates the possibility of bankruptcy: the stock price can hit zero with positive probability, at which time the bankruptcy occurs. For β ≥ 0, the origin is an inaccessible natural boundary.

Reduction to Bessel Processes, Transition Density, and Probability of Default

The CEV process is analytically tractable. Its transition probability density and cumulative distribution function are known in closed form.a It is closely related to Bessel processes and inherits their analytical tractability. The CEV process with drift (µ ≠ 0) is obtained from the process without drift (µ = 0) via a scale and time change:

$$ S_t^{(\mu)} = e^{\mu t} S^{(0)}_{\tau(t)}, \qquad \tau(t) = \frac{e^{2\mu\beta t} - 1}{2\mu\beta} \tag{2} $$

Let $\{R_t^{(\nu)}, t \ge 0\}$ be a Bessel process of index ν. Recall that for ν ≥ 0, zero is an unattainable entrance boundary. For ν ≤ −1, zero is an exit boundary. For ν ∈ (−1, 0), zero is a regular boundary. In our application, we specify zero as a killing boundary to kill the process at the first hitting time of zero (see, e.g., [4, pp. 133–134], for a summary of Bessel processes). Before the first hitting time of zero, the CEV process without drift can be represented as a power of a Bessel process:

$$ S_t^{(0)} = \left(a|\beta|\, R_t^{(\nu)}\right)^{-1/\beta} \tag{3} $$

where ν = 1/(2β).

The CEV transition density is obtained from the well-known expression for the transition density of

the Bessel process (see [4, p. 115, 21, p. 446]). For the driftless process, it is given by

$$ p^{(0)}(S_0, S_t; t) = \frac{S_t^{-2\beta-3/2}\, S_0^{1/2}}{a^2|\beta|\,t}\; I_{|\nu|}\!\left(\frac{S_0^{-\beta} S_t^{-\beta}}{a^2\beta^2 t}\right) \exp\!\left(-\frac{S_0^{-2\beta} + S_t^{-2\beta}}{2a^2\beta^2 t}\right) \tag{4} $$

where $I_\nu$ is the modified Bessel function of the first kind of order ν. From equation (2), the transition density with drift is obtained from the density equation (4) according to

$$ p^{(\mu)}(S_0, S_t; t) = e^{-\mu t}\, p^{(0)}\!\left(S_0, e^{-\mu t} S_t; \tau(t)\right) \tag{5} $$

The density (5) was originally obtained by Cox [7] for β < 0 and by Emanuel and MacBeth [14] for β > 0 on the basis of the result due to Feller [15].

For β < 0, in addition to the continuous transition density, we also have a positive probability for the process started at $S_0$ at time zero to hit zero by time t ≥ 0 (probability of default or bankruptcy) that is given explicitly by

$$ G\!\left(|\nu|,\ \frac{\mu S_0^{-2\beta}}{a^2\beta\left(e^{2\mu\beta t} - 1\right)}\right) \tag{6} $$

where $G(\nu, x) = (1/\Gamma(\nu)) \int_x^{\infty} u^{\nu-1} e^{-u}\, du$ is the complementary Gamma distribution function. This expression can be obtained by integrating the continuous density (5) from zero to infinity and observing that the result is less than one, that is, the density is defective. The defect is equal to the probability mass at zero, equation (6).

While killing the process at zero is desirable for stock price modeling, it may be undesirable in other contexts, where one would prefer the process that stays strictly positive (e.g., in stock index models). A regularized version of the CEV process that never hits zero has been constructed by Andersen and Andreasen [1] (see also [9]). The positive probability of hitting zero comes from the explosion of instantaneous volatility as the process falls toward zero. The regularized version of the CEV process fixes a small value ε > 0. For S > ε, the volatility is according to the CEV specification. For S ≤ ε, the volatility is fixed at the constant level aε^β. We thus have a sequence of regularized strictly positive processes indexed by ε that converge to the CEV process in the limit ε → 0.

The CEV process with β > 0 can similarly be regularized to prevent the volatility explosion as the process tends to infinity by picking a large value E > 0 and fixing the volatility above E to equal aE^β. The regularized processes with µ = 0 are true martingales, as opposed to the failure of the martingale property for the driftless CEV process with β > 0 and µ = 0, which is only a strict local martingale. The failure of the martingale property for the nonregularized process with β > 0 can be explicitly illustrated by computing the expectation (using the transition density (5)):

$$ \mathbb{E}[S_t] = e^{\mu t} S_0 \left[1 - G\!\left(\nu,\ \frac{\mu S_0^{-2\beta}}{a^2\beta\left(e^{2\mu\beta t} - 1\right)}\right)\right] \tag{7} $$

CEV Options Pricing

The CEV call option pricing formula with strike K, time to expiration T, and the initial asset price S can be obtained in closed form by integrating the call payoff with the risk-neutral CEV density (5) with the risk-neutral drift µ = r − q (r is the risk-free interest rate and q is the dividend yield). The result can be expressed in terms of the complementary noncentral chi-square distribution function Q(z; v, k) ([7] for β < 0, [14] for β > 0; see also [11, 22]):

$$ C(S; K, T) = e^{-rT}\, \mathbb{E}\left[(S_T - K)^+\right] = \begin{cases} e^{-qT} S\, Q(\xi; 2\nu, y_0) - e^{-rT} K \left(1 - Q(y_0; 2(1+\nu), \xi)\right), & \beta > 0 \\ e^{-qT} S\, Q(y_0; 2(1+|\nu|), \xi) - e^{-rT} K \left(1 - Q(\xi; 2|\nu|, y_0)\right), & \beta < 0 \end{cases} \tag{8} $$

where

$$ \xi = \frac{2\mu S^{-2\beta}}{a^2\beta\left(e^{2\mu\beta T} - 1\right)}, \qquad y_0 = \frac{2\mu K^{-2\beta}}{a^2\beta\left(1 - e^{-2\mu\beta T}\right)} \tag{9} $$

and $S = S_0$ is the initial asset price at time zero. The price of the put option is obtained from the put–call parity relationship:

$$ P(S; K, T) = C(S; K, T) + K e^{-rT} - S e^{-qT} \tag{10} $$
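The β < 0 branch of formula (8), together with (9) and the parity relation (10), can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a production pricer: the function names (`gamma_sf`, `ncx2_sf`, `cev_call`, `cev_put`) are hypothetical, the drift µ = r − q is assumed nonzero, and Q is evaluated with the standard Poisson-weighted series of complementary Gamma functions using only the standard library, which is adequate for moderate arguments only.

```python
import math


def gamma_sf(s, x):
    """Complementary Gamma distribution function G(s, x) = Gamma(s, x) / Gamma(s)."""
    if x <= 0.0:
        return 1.0
    # Lower part: P(s, x) = e^{-x} x^s * sum_{n>=0} x^n / Gamma(s + n + 1).
    # For very large x the first term underflows; this sketch does not handle that.
    term = math.exp(-x + s * math.log(x) - math.lgamma(s + 1.0))
    total = term
    n = 0
    while term > 1e-17 * total and n < 100000:
        n += 1
        term *= x / (s + n)
        total += term
    return min(1.0, max(0.0, 1.0 - total))


def ncx2_sf(z, v, k):
    """Complementary noncentral chi-square Q(z; v, k): v degrees of freedom, noncentrality k."""
    half_k = 0.5 * k
    weight = math.exp(-half_k)  # Poisson(half_k) probability of n = 0
    total = 0.0
    n = 0
    while n < 5000:
        total += weight * gamma_sf(n + 0.5 * v, 0.5 * z)
        n += 1
        weight *= half_k / n
        if n > half_k and weight < 1e-17:
            break  # remaining Poisson weights are negligible
    return total


def cev_call(S, K, T, r, q, a, beta):
    """CEV call price for beta < 0 (second branch of formula (8))."""
    mu = r - q
    if beta >= 0.0 or mu == 0.0:
        raise ValueError("this sketch covers beta < 0 and r != q only")
    nu = 1.0 / (2.0 * beta)
    xi = 2.0 * mu * S ** (-2.0 * beta) / (a * a * beta * (math.exp(2.0 * mu * beta * T) - 1.0))
    y0 = 2.0 * mu * K ** (-2.0 * beta) / (a * a * beta * (1.0 - math.exp(-2.0 * mu * beta * T)))
    first = math.exp(-q * T) * S * ncx2_sf(y0, 2.0 * (1.0 + abs(nu)), xi)
    second = math.exp(-r * T) * K * (1.0 - ncx2_sf(xi, 2.0 * abs(nu), y0))
    return first - second


def cev_put(S, K, T, r, q, a, beta):
    """CEV put price from put-call parity, formula (10)."""
    return cev_call(S, K, T, r, q, a, beta) + K * math.exp(-r * T) - S * math.exp(-q * T)


if __name__ == "__main__":
    # Illustrative inputs: S = K = 50, T = 1, r = 5%, q = 0, beta = -1,
    # and a = 10 so that the local volatility a * S**beta equals 20% at S = 50.
    print(cev_call(50.0, 50.0, 1.0, 0.05, 0.0, 10.0, -1.0))
    print(cev_put(50.0, 50.0, 1.0, 0.05, 0.0, 10.0, -1.0))
```

For β = −1 the process is Gaussian up to the (here negligible) absorption at zero, so the at-the-money price can be cross-checked against a Bachelier-type formula with the same drift.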

The complementary noncentral chi-square distribution function can be expressed as the series of complementary Gamma distribution functions ([22, p. 214]):

$$ Q(z; v, k) = \sum_{n=0}^{\infty} e^{-k/2}\, \frac{(k/2)^n}{\Gamma(n+1)}\; G\!\left(n + \frac{v}{2},\ \frac{z}{2}\right) \tag{11} $$

for k, z > 0. Further efficient numerical methods to compute the noncentral chi-square cumulative distribution function (CDF) can be found in [3, 12, 13, 22].

The first passage time problem for the CEV diffusion can be solved analytically and, hence, barrier and lookback options can be priced analytically under the CEV process. Davydov and Linetsky [9, 10] obtained the analytical expressions for the Laplace transforms of single- and double-barrier and lookback options pricing formulas in time to expiration. Davydov and Linetsky [10] and Linetsky [18] inverted the Laplace transforms for barrier options and lookback options in terms of eigenfunction expansions, respectively.

Other types of options under the CEV process, such as American options, require numerical treatment. The pricing partial differential equation (PDE) for European options reads as follows:

$$ \frac{a^2}{2} S^{2\beta+2}\, \frac{\partial^2 V}{\partial S^2} + (r - q)\, S\, \frac{\partial V}{\partial S} + \frac{\partial V}{\partial t} = rV \tag{12} $$

The early exercise can be dealt with in the same way as for other diffusion models via dynamic programming, free boundary PDE formulations, or variational inequality formulations.

Jump-to-Default Extended CEV Model

While the CEV process can hit zero and, as a result, the CEV equity model includes the positive probability of bankruptcy, the term structure of credit spreads in the CEV model is such that the instantaneous credit spread vanishes. There is no element of surprise: the event of default is a hitting time. Moreover, the probability of default is too small for practical applications of modeling stocks of firms other than the highest rated investment grades. Carr and Linetsky [6] extend the CEV model by allowing a jump to default to occur from a positive stock price. They introduce a default intensity that is an affine function of the instantaneous variance:

$$ \lambda(S) = b + c\,\sigma^2(S) = b + c\,a^2 S^{2\beta} \tag{13} $$

where b ≥ 0 is the constant part of the default intensity and c ≥ 0 is the sensitivity of the default intensity to the instantaneous variance. The predefault stock price follows a diffusion process solving the SDE:

$$ dS_t = \left[\mu + \lambda(S_t)\right] S_t\, dt + a S_t^{\beta+1}\, dB_t \tag{14} $$

The addition of the default intensity in the drift compensates for the jump to default and makes the process with µ = 0 a martingale. The diffusion process with the modified drift (14) and killed at the rate (13) is called the jump-to-default extended constant elasticity of variance (JDCEV) process. In the JDCEV model, the stock price evolves according to equation (14) until a jump to default arrives, at which time the stock price drops to zero and equity becomes worthless. The jump to default time has the intensity (13).

The JDCEV model can be reduced to Bessel processes similar to the standard CEV model. Consequently, it is also analytically tractable. Closed-form pricing formulas for call and put options and the probability of default can be found in [6]. The first passage time problem for the JDCEV process and the related problem of pricing equity default swaps are solved in [20]. Atlan and Leblanc [2] and Campi et al. [5] investigate related applications of the CEV model to hybrid credit–equity modeling.

Volatility Skews and Credit Spreads

Figure 1(a) illustrates the shapes of the term structure of zero-coupon credit spreads in the CEV and JDCEV models, assuming zero recovery. The credit spread curves start at the instantaneous credit spread equal to the default intensity $b + c\sigma_*^2$ ($\sigma_*$ is the volatility at a reference level $S^*$).b The instantaneous credit spreads for the CEV model vanish, while they are positive for the JDCEV model. Figure 1(b) plots the Black–Scholes implied volatility against the strike price in the CEV and JDCEV models (we calculate the implied volatility by equating the price of an option under the Black–Scholes model to the corresponding option price under the (JD)CEV model). One can observe the decreasing and convex implied

[Figure 1. Panel (a): credit spread (percent) against time to maturity (years, 0 to 30) for the JDCEV and CEV models with β = −1/2, −1, −2, −3. Panel (b): implied volatility (%) against strike (30 to 55) for JDCEV with T = 0.25, 0.5, 1, 5 and for CEV with β = −1 and β = −2, each with T = 0.25 and T = 5.]

Figure 1 (a) Term structures of credit spreads. Parameter values: S = S ∗ = 50, σ∗ = 0.2, β = −1/2, −1, −2, −3,
r = 0.05, q = 0. JDCEV: b = 0.02 and c = 1/2. CEV: b = 0 and c = 0. (b) Implied volatility skews. Parameter values:
S = S ∗ = 50, σ∗ = 0.2, r = 0.05, q = 0. For JDCEV model: b = 0.02, c = 1/2 and β = −1, the times to expiration
are T = 0.25, 0.5, 1, 5 years. For CEV model: b = c = 0 , β = −1, −2 and times to expiration are T = 0.25, 5. Implied
volatilities are plotted against the strike price

volatility skew with implied volatilities increasing for lower strikes, as the local volatility and the default intensity both increase as the stock price declines. The volatility elasticity β controls the slope of the skew in the CEV model. The slope of the skew in the JDCEV model is steeper and is controlled by β, as well as the default intensity parameters b and c.
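To make the dependence of the default intensity on the stock price concrete, the following Python sketch evaluates the local volatility in the reference-level parameterization σ(S) = σ∗(S/S∗)^β and the affine intensity b + cσ²(S) of equation (13). The function names are illustrative, and the inputs are the parameter values quoted in the caption of Figure 1 (S∗ = 50, σ∗ = 0.2, b = 0.02, c = 1/2) with β = −1.

```python
def local_vol(S, sigma_ref, s_ref, beta):
    """Local volatility sigma(S) = sigma_ref * (S / s_ref) ** beta."""
    return sigma_ref * (S / s_ref) ** beta


def default_intensity(S, b, c, sigma_ref, s_ref, beta):
    """JDCEV default intensity lambda(S) = b + c * sigma(S)^2, equation (13)."""
    return b + c * local_vol(S, sigma_ref, s_ref, beta) ** 2


if __name__ == "__main__":
    params = dict(b=0.02, c=0.5, sigma_ref=0.2, s_ref=50.0, beta=-1.0)
    # At the reference level the intensity is b + c * sigma_ref^2, about 4% per year,
    # which is where the JDCEV credit spread curves start.
    print(default_intensity(50.0, **params))
    # After the stock price halves, the local volatility doubles and the intensity rises.
    print(default_intensity(25.0, **params))
    # Setting b = c = 0 recovers the plain CEV model with zero instantaneous spread.
    print(default_intensity(50.0, b=0.0, c=0.0, sigma_ref=0.2, s_ref=50.0, beta=-1.0))
```

With negative β both the volatility and the intensity increase as S falls, which is the mechanism behind the steeper JDCEV skew.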

Implied Volatility and the SABR model

By using singular perturbation techniques, Hagan and Woodward [17] obtained explicit asymptotic formulas for the Black–Scholes implied volatility $\sigma_{BS}$ of European calls and puts on an asset whose forward price F(t) follows the CEV dynamics, that is, $dF_t = a F_t^{\beta+1}\, dB_t$,

$$ \sigma_{BS} = a f_{av}^{\beta} \left[1 - \frac{\beta(\beta+3)}{24} \left(\frac{F_0 - K}{f_{av}}\right)^2 + \frac{\beta^2}{24}\, a^2 \tau f_{av}^{2\beta} + \cdots \right] \tag{15} $$

where τ is time to expiration, $f_{av} = (F_0 + K)/2$ and $F_0$ is today’s forward price (Hagan and Woodward’s β is equal to our β + 1). This asymptotic formula approximates the exact CEV-implied volatilities well when the ratio $F_0/K$ is not too far from one and when K and $F_0$ are far away from zero. The accuracy tends to deteriorate when the values are close to zero since this asymptotic approximation does not take into account the killing boundary condition at zero.

Hagan et al. [16] introduced the SABR model, which is a CEV model with stochastic volatility. More precisely, the volatility scale parameter a is made stochastic, so that the forward asset price follows the dynamics:

$$ dF_t = a_t F_t^{\beta+1}\, dB_t^{(1)} \quad \text{and} \quad da_t = \eta a_t\, dB_t^{(2)} \tag{16} $$

where $\langle dB_t^{(1)}, dB_t^{(2)} \rangle = \rho\, dt$. Hagan et al. derive the asymptotic expression for the implied volatility in the SABR model.

Introducing Jumps and Stochastic Volatility into the CEV Model

Mendoza et al. [19] introduce jumps and stochastic volatility into the JDCEV model by time changing the JDCEV process. Lévy subordinator time changes introduce state-dependent jumps into the process, while absolutely continuous time changes introduce stochastic volatility. The result is a flexible family of models that exhibit the leverage effect, default intensity linked to the stock price volatility, jumps, and stochastic volatility. These models inherit the analytical tractability of the CEV and JDCEV models as long as the Laplace transform of the time-change process is analytically tractable. The stochastic volatility version of the CEV model obtained in this approach is different from the SABR model in two respects. The advantage of the time-change approach is that it preserves the analytical tractability for more realistic choices for the stochastic volatility process, such as the Cox–Ingersoll–Ross (CIR) process with mean-reversion. Another advantage is that jumps, including the jump to default, can also be incorporated. The weakness is that it is hard to incorporate the correlation between the price and volatility.

End Notes

a. In this article we present the results for the CEV model with constant parameters. We note that the process remains analytically tractable when µ and a are taken to be deterministic functions of time [6].
b. It is convenient to parameterize the local volatility function as $\sigma(S) = aS^{\beta} = \sigma_*(S/S^*)^{\beta}$ so that at some reference spot price level $S = S^*$ (e.g., the at-the-money level at the time of model calibration) the volatility takes the reference value, $\sigma(S^*) = \sigma_*$. In the example presented here, the reference level is taken to equal the initial spot price level, $S^* = S_0$, and the volatility scale parameter is $a = \sigma_*/S_0^{\beta}$.

Acknowledgments

This research was supported by the National Science Foundation under grant DMS-0802720.

References

[1] Andersen, L. & Andreasen, J. (2000). Volatility skew and extensions of the LIBOR market model, Applied Mathematical Finance 7, 1–32.
[2] Atlan, M. & Leblanc, B. (2005). Hybrid equity-credit modelling, Risk Magazine 18, 8.
[3] Benton, D. & Krishnamoorthy, K. (2003). Computing discrete mixtures of continuous distributions: noncentral chi-square, noncentral t and the distribution of the square of the sample multiple correlation coefficient, Computational Statistics and Data Analysis 43, 249–267.

[4] Borodin, A. & Salminen, P. (2002). Handbook of Brownian Motion: Facts and Formulae, Probability and Its Applications, 2nd rev Edition, Birkhauser Verlag AG.
[5] Campi, L., Sbuelz, A. & Polbennikov, S. (2008). Systematic equity-based credit risk: A CEV model with jump to default, Journal of Economic Dynamics and Control 33, 93–108.
[6] Carr, P. & Linetsky, V. (2006). A jump to default extended CEV model: an application of Bessel processes, Finance and Stochastics 10, 303–330.
[7] Cox, J.C. (1975, 1996). Notes on option pricing I: constant elasticity of variance diffusions, Reprinted in The Journal of Portfolio Management 23, 15–17.
[8] Cox, J.C. & Ross, S.A. (1976). The valuation of options for alternative stochastic processes, Journal of Financial Economics 3, 145–166.
[9] Davydov, D. & Linetsky, V. (2001). Pricing and hedging path-dependent options under the CEV process, Management Science 47, 949–965.
[10] Davydov, D. & Linetsky, V. (2003). Pricing options on scalar diffusions: an eigenfunction expansion approach, Operations Research 51, 185–209.
[11] Delbaen, F. & Shirakawa, H. (2002). A note on option pricing for constant elasticity of variance model, Asia-Pacific Financial Markets 9, 85–99.
[12] Ding, C.G. (1992). Computing the non-central χ² distribution function, Applied Statistics 41, 478–482.
[13] Dyrting, S. (2004). Evaluating the noncentral chi-square distribution for the Cox-Ingersoll-Ross process, Computational Economics 24, 35–50.
[14] Emanuel, D.C. & MacBeth, J.D. (1982). Further results on the constant elasticity of variance call option pricing model, The Journal of Financial and Quantitative Analysis 17, 533–554.
[15] Feller, W. (1951). Two singular diffusion problems, The Annals of Mathematics 54, 173–182.
[16] Hagan, P.S., Kumar, D., Lesniewski, A.S. & Woodward, D.E. (2002). Managing smile risk, Wilmott Magazine 1, 84–108.
[17] Hagan, P. & Woodward, D. (1999). Equivalent Black volatilities, Applied Mathematical Finance 6, 147–157.
[18] Linetsky, V. (2004). Lookback options and diffusion hitting times: a spectral expansion approach, Finance and Stochastics 8, 343–371.
[19] Mendoza, R., Carr, P. & Linetsky, V. (2007). Time Changed Markov Processes in Credit-Equity Modeling, Mathematical Finance, to appear.
[20] Mendoza, R. & Linetsky, V. (2008). Equity Default Swaps under the Jump-to-Default Extended CEV Model. Working paper.
[21] Revuz, D. & Yor, M. (1999). Continuous Martingales and Brownian Motion, Grundlehren Der Mathematischen Wissenschaften, Springer.
[22] Schroder, M. (1989). Computing the constant elasticity of variance option pricing formula, The Journal of Finance 44, 211–219.

VADIM LINETSKY & RAFAEL MENDOZA
Bates Model

The Bates [3] and Scott [13] option pricing models were designed to capture two features of the asset returns: the fact that conditional volatility evolves over time in a stochastic but mean-reverting fashion, and the presence of occasional substantial outliers in the asset returns. The two models combined the Heston [9] model of stochastic volatility (see Heston Model) with the Merton [11] model of independent normally distributed jumps in the log asset price (see Jump-diffusion Models). The Bates model ignores interest rate risk, while the Scott model allows interest rates to be stochastic. Both models evaluate European option prices numerically, using the Fourier inversion approach of Heston (see also Fourier Transform and Fourier Methods in Options Pricing for a general discussion of Fourier transform methods in finance). The Bates model also includes an approximation for pricing American options (see American Options). The two models were historically important in showing that the tractable class of affine option pricing models includes jump processes as well as diffusion processes.

All option pricing models rely upon a risk-neutral representation of the data generating process that includes appropriate compensation for the various risks. In the Bates and Scott models, the risk-neutral processes for the underlying asset price $S_t$ and instantaneous variance $V_t$ are assumed to be of the form

$$ \begin{aligned} dS_t/S_t &= (b - \lambda^* \bar{k}^*)\, dt + \sqrt{V_t}\, dZ_t + k^*\, dq_t \\ dV_t &= (\alpha - \beta^* V_t)\, dt + \sigma_v \sqrt{V_t}\, dZ_{vt} \end{aligned} \tag{1} $$

where b is the cost of carry; $Z_t$ and $Z_{vt}$ are Wiener processes with correlation ρ; $q_t$ is an integer-valued Poisson counter with risk-neutral intensity λ* that counts the occurrence of jumps; and k* is the random percentage jumpsize, with a Gaussian distribution $\ln(1 + k^*) \sim N\left(\ln(1 + \bar{k}^*) - \tfrac{1}{2}\delta^2,\ \delta^2\right)$ conditional upon the occurrence of a jump. The Bates model assumes b is constant, while the Scott model assumes it is a linear combination of $V_t$ and an additional state variable that follows an independent square-root process. Bates [3] examines foreign currency options, for which b is the domestic/foreign interest differential, while Scott’s application [13] to nondividend paying stock options implies the cost of carry is equal to the risk-free interest rate.

The postulated process has an associated conditional characteristic function that is exponentially affine in the state variables. For the Bates model, the characteristic function is

$$ \varphi(i\Phi) = E_0^*\left[e^{i\Phi \ln S_T} \mid S_0, V_0, T\right] = \exp\left[i\Phi \ln S_0 + C(T; i\Phi) + D(T; i\Phi)\, V_0 + \lambda^* T\, E(i\Phi)\right] \tag{2} $$

where $E_0^*[\cdot]$ is the risk-neutral expectational operator associated with equation (1), and

$$ \gamma(z) = \sqrt{(\rho\sigma_v z - \beta^*)^2 - \sigma_v^2 (z^2 - z)} \tag{3} $$

$$ C(T; z) = bTz - \frac{\alpha T}{\sigma_v^2}\left[\rho\sigma_v z - \beta^* - \gamma(z)\right] - \frac{2\alpha}{\sigma_v^2} \ln\left[1 + \left[\rho\sigma_v z - \beta^* - \gamma(z)\right] \frac{1 - e^{\gamma(z)T}}{2\gamma(z)}\right] \tag{4} $$

$$ D(T; z) = \frac{z^2 - z}{\gamma(z)\, \dfrac{e^{\gamma(z)T} + 1}{e^{\gamma(z)T} - 1} + \beta^* - \rho\sigma_v z} \tag{5} $$

$$ E(z) = (1 + \bar{k}^*)^z\, e^{\frac{1}{2}\delta^2 (z^2 - z)} - 1 - \bar{k}^* z \tag{6} $$

The terms C(·) and D(·) are identical to those in the Heston [9] stochastic volatility model, while E(·) captures the additional distributional impact of jumps. Scott’s generalization to stochastic interest rates uses an extended Fourier transform of the form

$$ \varphi^*(z) = E_0^*\left[\exp\left(-\int_0^T r_t\, dt + z \ln S_T\right) \Bigm| S_0, r_0, V_0, T\right] \tag{7} $$

which has an analytical solution for complex-valued z that is also exponentially affine in the state variables $S_0$, $r_0$, and $V_0$.

European call option prices take the form $c = B(F P_1 - X P_2)$, where B is the price of a discount bond of maturity T, F is the forward price on the underlying asset, X is the option’s exercise price, and

$P_1$ and $P_2$ are upper tail probability measures derivable from the characteristic function. The papers of Bates [3] and Scott [13] present Fourier inversion methods for evaluating $P_1$ and $P_2$ numerically. However, faster methods were subsequently developed for directly evaluating European call options, using a single numerical integration of the form

$$ c = BF - BX \times \left(\frac{1}{2} + \frac{1}{\pi} \int_0^{\infty} \mathrm{Re}\left[\frac{f(i\Phi)\, e^{-i\Phi \ln X}}{i\Phi\,(1 - i\Phi)}\right] d\Phi\right) \tag{8} $$

where Re[z] is the real component of a complex variable z (see Fourier Methods in Options Pricing). For the Bates model, f(iΦ) = ϕ(iΦ); for the Scott model, f(iΦ) = ϕ*(iΦ)/B. European put options can be evaluated from European call option prices using the put–call parity relationship p = c + B(X − F) (see Put–Call Parity for details on put–call parity).

Evaluating equation (8) typically involves integration of a dampened oscillatory function. While there exist canned programs for integration over a semi-infinite domain, most papers use various forms of integration over a truncated domain. Bates [3] uses Gauss–Kronrod quadrature (see Quadrature Methods). Fast Fourier transform approaches have also been proposed, but these involve substantially more functional evaluations. The integration is typically well behaved, but there do exist extreme parameter values (e.g., |ρ| near 1) for which the path of integration crosses the branch cut of the log function. As all contemporaneous option prices of a given maturity use the same values of f(iΦ) regardless of the strike price X, evaluating options jointly greatly increases numerical efficiency.

Related Models

Related affine models can be categorized along four lines:

1. alternate specifications of jump processes;
2. the Bates [5] extension to stochastic-intensity jump processes;
3. models in which the underlying volatility can also jump; and
4. multifactor specifications.

Alternate jump specifications (including Lévy processes) with independent and identically distributed jumps involve modification of the functional form of E(·), and are discussed in other articles (Tempered Stable Process; Normal Inverse Gaussian Model; Variance-gamma Model; Kou Model; Exponential Lévy Models). The Bates [5] model with (risk-neutral) stochastic jump intensities of the form $\lambda^* + \lambda_1^* V_t$ involves modifying γ(·) and D(·):

$$ \gamma(z) = \sqrt{(\rho\sigma_v z - \beta^*)^2 - \sigma_v^2 \left[z^2 - z + 2\lambda_1^* E(z)\right]} \tag{9} $$

$$ D(T; z) = \frac{z^2 - z + 2\lambda_1^* E(z)}{\gamma(z)\, \dfrac{e^{\gamma(z)T} + 1}{e^{\gamma(z)T} - 1} + \beta^* - \rho\sigma_v z} \tag{10} $$

See also Time-changed Lévy Process for other stochastic-intensity jump models.

Bates [5] also contains multifactor specifications for the instantaneous variance and jump intensity. The general class of affine jump-diffusion models is presented in [8], including the volatility-jump option pricing model. Scott’s extended Fourier transform approach for stochastic interest rates was subsequently also used by Bakshi and Madan [2] and Duffie et al. [8].

Further Reference Material

Bates [7, pp. 943–4] presents a simple derivation of equation (8), and cites earlier papers that develop the single-integration approach. Numerical integration issues are discussed by Lee [10]. Bates [3] and Bakshi et al. [1] estimate and test the Bates and Scott models, respectively, while Pan [12] provides additional estimates and tests of the Bates [5] stochastic-intensity model. Bates [4, 6] surveys empirical option pricing research.

References

[1] Bakshi, G., Cao, C. & Chen, Z. (1997). Empirical performance of alternative option pricing models, Journal of Finance 52, 2003–2049.
[2] Bakshi, G. & Madan, D.B. (2000). Spanning and derivative-security valuation, Journal of Financial Economics 55, 205–238.
[3] Bates, D.S. (1996). Jumps and stochastic volatility: exchange rate processes implicit in PHLX deutschemark options, Review of Financial Studies 9, 69–107.
[4] Bates, D.S. (1996). Testing option pricing models, in Handbook of Statistics, G.S. Maddala & C.R. Rao, eds, (Statistical Methods in Finance), Elsevier, Amsterdam, Vol. 14, pp. 567–611.
[5] Bates, D.S. (2000). Post-'87 crash fears in the S&P 500 futures option market, Journal of Econometrics 94, 181–238.
[6] Bates, D.S. (2003). Empirical option pricing: a retrospection, Journal of Econometrics 116, 387–404.
[7] Bates, D.S. (2006). Maximum likelihood estimation of latent affine processes, Review of Financial Studies 19, 909–965.
[8] Duffie, D., Pan, J. & Singleton, K.J. (2000). Transform analysis and asset pricing for affine jump-diffusions, Econometrica 68, 1343–1376.
[9] Heston, S.L. (1993). A closed-form solution for options with stochastic volatility with applications to bond and currency options, Review of Financial Studies 6, 327–344.
[10] Lee, R.W. (2004). Option pricing by transform methods: extensions, unification and error control, Journal of Computational Finance 7, 51–86.
[11] Merton, R.C. (1976). Option pricing when underlying stock returns are discontinuous, Journal of Financial Economics 3, 125–144.
[12] Pan, J. (2002). The jump-risk premia implicit in options: evidence from an integrated time-series study, Journal of Financial Economics 63, 3–50.
[13] Scott, L.O. (1997). Pricing stock options in a jump-diffusion model with stochastic volatility and interest rates: applications of Fourier inversion methods, Mathematical Finance 7, 413–426.

Related Articles

Barndorff-Nielsen and Shephard (BNS) Models; Heston Model; Jump-diffusion Models; Stochastic Volatility Models: Foreign Exchange; Time-changed Lévy Process.

DAVID S. BATES
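
As a closing illustration of the single-integration formula (8) in the Bates Model article, the following Python sketch evaluates the integral with a plain midpoint rule over a truncated domain. It is a minimal sketch under stated assumptions, not the article's implementation: it assumes that f(z) is the risk-neutral moment generating function E[exp(z ln F_T)], so that f(iΦ) is its value at z = iΦ; the midpoint rule stands in for the Gauss–Kronrod quadrature used by Bates [3]; and the function names, the truncation point `upper`, and the lognormal test case are hypothetical choices made only for this example.

```python
import cmath
import math


def call_single_integration(f, F, X, B, upper=100.0, n=10000):
    """Call price c = B*F - B*X*(1/2 + (1/pi) * integral), cf. formula (8).

    f     : callable z -> E[exp(z * ln F_T)] (assumed meaning of f)
    F, X  : forward price and exercise price
    B     : price of the discount bond maturing at T
    upper : truncation point of the semi-infinite integral (assumption)
    n     : number of midpoint-rule nodes (assumption)
    """
    h = upper / n
    ln_x = math.log(X)
    total = 0.0
    for j in range(n):
        # Midpoints never hit Phi = 0, where the integrand has a removable point.
        z = 1j * (j + 0.5) * h          # z = i * Phi
        total += (f(z) * cmath.exp(-z * ln_x) / (z * (1.0 - z))).real
    return B * F - B * X * (0.5 + total * h / math.pi)


def lognormal_mgf(F, sigma, T):
    """Test case only: ln F_T ~ N(ln F - sigma^2 T / 2, sigma^2 T)."""
    w = sigma * sigma * T
    return lambda z: cmath.exp(z * (math.log(F) - 0.5 * w) + 0.5 * z * z * w)


def black76_call(F, X, B, sigma, T):
    """Closed-form benchmark for the lognormal test case."""
    s = sigma * math.sqrt(T)
    d1 = (math.log(F / X) + 0.5 * s * s) / s
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return B * (F * cdf(d1) - X * cdf(d1 - s))


price_fourier = call_single_integration(lognormal_mgf(100.0, 0.2, 1.0), 100.0, 110.0, 0.95)
price_closed = black76_call(100.0, 110.0, 0.95, 0.2, 1.0)
```

Because the values f(iΦ) do not depend on the strike, a production version would compute them once per maturity and reuse them across all strikes, as the article recommends.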
Barndorff-Nielsen and Shephard (BNS) Models

Stochastic volatility models based on non-Gaussian Ornstein–Uhlenbeck (OU)-type processes were introduced in [3]. The motivation was to construct a mathematically tractable model that provides an adequate description of price fluctuations on various timescales. The main idea is to model the volatility with a non-Gaussian OU process, that is, the solution of a linear stochastic differential equation (SDE) with Lévy increments. The non-Gaussian increments make it possible to build a process that is both positive and linear, so that many computations are very simple.

Non-Gaussian Ornstein–Uhlenbeck-type Processes

The OU-type process (see [8, 11, 12] for the original introduction or [2, 4, 10] for a more modern treatment) is defined as the solution of the stochastic differential equation

$$dy_t = -\lambda y_t\,dt + dZ_t \tag{1}$$

where $Z$ is a Lévy process. It can be written explicitly as

$$y_t = y_0 e^{-\lambda t} + \int_0^t e^{-\lambda(t-s)}\,dZ_s \tag{2}$$

At any time $t$, the distribution of $y_t$ is infinitely divisible. If the characteristic triplet of $Z$ is $(A, \nu, \gamma)$, the characteristics of $y_t$ are given by

$$A_t^y = \frac{A}{2\lambda}\left\{1 - e^{-2\lambda t}\right\}, \qquad \gamma_t^y = \frac{\gamma}{\lambda}\left\{1 - e^{-\lambda t}\right\} + y_0 e^{-\lambda t}, \qquad \nu_t^y(B) = \int_1^{e^{\lambda t}} \nu(\xi B)\,\frac{d\xi}{\lambda\xi} \quad \forall B \in \mathcal{B}(\mathbb{R}) \tag{3}$$

and the characteristic function of $y_t$ is

$$E[e^{iuy_t}] = \exp\left\{ iuy_0 e^{-\lambda t} + \int_0^t \psi\left(u e^{\lambda(s-t)}\right) ds \right\} \tag{4}$$

with $\psi(u) = \ln E[e^{iuZ_1}]$. Under the integrability condition $\int_{|x|>1} \ln|x|\,\nu(dx) < \infty$, the process $(y_t)$ has a stationary distribution with characteristics

$$A^y = \frac{A}{2\lambda}, \qquad \gamma^y = \frac{\gamma}{\lambda}, \qquad \nu^y(B) = \int_1^\infty \nu(\xi B)\,\frac{d\xi}{\lambda\xi} \quad \forall B \in \mathcal{B}(\mathbb{R}) \tag{5}$$

In the stationary case, an OU-type process has an exponential (short memory) autocorrelation structure:

$$\operatorname{Cov}(y_t, y_{t+s}) = e^{-\lambda s} \operatorname{Var} y_t \tag{6}$$

To obtain more interesting correlation structures, one can add up several OU-type processes [1]: if $y$ and $\tilde y$ are independent stationary OU-type processes with parameters $\lambda$ and $\tilde\lambda$, then

$$\operatorname{Cov}(y_t + \tilde y_t,\; y_{t+s} + \tilde y_{t+s}) = e^{-\lambda s} \operatorname{Var} y_t + e^{-\tilde\lambda s} \operatorname{Var} \tilde y_t \tag{7}$$

The price to be paid is an increased model dimension: the two-dimensional process $(y, \tilde y)$ is Markov but the sum $y + \tilde y$ is not. Superpositions of OU-type processes can also be used to construct finite-dimensional approximations to non-Markov (e.g., long memory) processes.

Positive OU-type processes

Positive OU-type processes can be used as linear models for stationary financial time series such as volatility (discussed below) or commodity prices (see [6]). An OU-type process is positive if the driving Lévy process $Z$ is a positive Lévy process, also known as a subordinator. In this case, the trajectory consists of a series of positive jumps with exponential decay between them, as in Figure 1.

Model Specification and Examples

The “econometric” (as opposed to risk-neutral) version of the Barndorff-Nielsen and Shephard (BNS) stochastic volatility model has the form

$$\begin{aligned} S_t &= S_0 \exp(X_t) \\ dX_t &= (\mu + \beta\sigma_t^2)\,dt + \sigma_t\,dW_t + \rho\,dZ_t, \qquad \rho \le 0 \\ d\sigma_t^2 &= -\lambda\sigma_t^2\,dt + dZ_t, \qquad \sigma_0^2 > 0 \end{aligned} \tag{8}$$
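
A positive OU-type process of the kind used for the variance in (8) can be simulated exactly when the subordinator is a compound Poisson process, because between jumps the solution (2) is pure exponential decay. The Python sketch below illustrates this under assumptions that are not taken from the article at this point: the driver Z is assumed to be compound Poisson with jump intensity `jump_rate` and exponentially distributed jump sizes with mean `jump_mean`, and all names and numerical values are hypothetical. The Monte Carlo mean is compared with the expectation implied by (2), namely y_0 e^{-λt} + E[Z_1](1 - e^{-λt})/λ.

```python
import math
import random


def simulate_positive_ou(y0, lam, jump_rate, jump_mean, horizon, rng):
    """Exact draw of y_T for dy = -lam * y dt + dZ.

    Z is assumed to be a compound Poisson subordinator: jumps arrive at
    rate `jump_rate` and have exponential sizes with mean `jump_mean`.
    """
    t, y = 0.0, y0
    while True:
        wait = rng.expovariate(jump_rate)            # time to the next jump of Z
        if t + wait > horizon:
            # No further jump before the horizon: pure exponential decay.
            return y * math.exp(-lam * (horizon - t))
        # Decay up to the jump time, then add the positive jump.
        y = y * math.exp(-lam * wait) + rng.expovariate(1.0 / jump_mean)
        t += wait


# Illustrative (hypothetical) parameter values.
y0, lam, jump_rate, jump_mean, horizon = 0.3, 2.0, 5.0, 0.1, 1.0
rng = random.Random(0)
samples = [simulate_positive_ou(y0, lam, jump_rate, jump_mean, horizon, rng)
           for _ in range(20000)]

mc_mean = sum(samples) / len(samples)
# E[Z_1] = jump_rate * jump_mean for a compound Poisson process.
exact_mean = (y0 * math.exp(-lam * horizon)
              + jump_rate * jump_mean * (1.0 - math.exp(-lam * horizon)) / lam)
```

Every simulated value is strictly positive, which is the property that makes such a process usable as a variance.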
Figure 1 Sample trajectory of a positive OU-type process

The log stock price is a stochastic volatility process with downward jumps ($Z$ has only positive jumps) and the volatility is a positive OU-type process. Introducing $Z$ into the equation for the log price with a negative coefficient accounts for the leverage effect: volatility jumps up when price jumps down.

Nicolato and Venardos [9] have shown that model (8) is arbitrage-free, that is, one can always find an equivalent martingale measure. Under a martingale measure, the model takes the form

$$\begin{aligned} S_t &= S_0 \exp(X_t) \\ dX_t &= \left(r - l(\rho) - \tfrac{1}{2}\sigma_t^2\right) dt + \sigma_t\,dW_t + \rho\,dZ_t \\ d\sigma_t^2 &= -\lambda\sigma_t^2\,dt + dZ_t, \qquad \sigma_0^2 > 0 \end{aligned} \tag{9}$$

where $r$ is the interest rate and $l(u) := \ln E[e^{uZ_1}]$.

As an example of a concrete specification, suppose that the stationary distribution $\mu$ of the squared volatility process $\sigma_t^2$ is the gamma distribution with density $\mu(x) = \frac{\alpha^c}{\Gamma(c)} x^{c-1} e^{-\alpha x} 1_{x \ge 0}$ (this is the same as the stationary distribution of the volatility in Heston's stochastic volatility model). In this case, $(Z_t)$ has zero drift and a Lévy measure with density $\nu(x) = \alpha\lambda c\,e^{-\alpha x}$, that is, it is a compound Poisson process with exponential jump size distribution. The Laplace exponent of $Z$ is $l(u) = \lambda c u/(\alpha - u)$.

Option Pricing and Hedging

The BNS models are a subclass of affine processes (see Affine Models) [7]: there is an explicit expression for the characteristic function of the log stock price $X$. Under the risk-neutral probability,

$$\phi_t(u) = E[e^{iuX_t}] = \exp\left\{ iu\,(r - l(\rho))\,t - \frac{u^2 + iu}{2}\,\sigma_0^2\,\varepsilon(\lambda, t) + \int_0^t l\left( i\rho u - \frac{u^2 + iu}{2}\,\varepsilon(\lambda, t - s) \right) ds \right\} \tag{10}$$

with $\varepsilon(\lambda, t) := (1 - e^{-\lambda t})/\lambda$. This means that if the risk-neutral parameters are known, European options can be priced by Fourier inversion (see Exponential Lévy Models). The expected integrated variance is

$$E\left[\int_0^t \sigma_s^2\,ds\right] = \sigma_0^2\,\frac{1 - e^{-\lambda t}}{\lambda} + E[Z_1]\,\frac{e^{-\lambda t} - 1 + \lambda t}{\lambda^2} \tag{11}$$

leading to a simple explicit formula for the fair rate of a variance swap.

The market being generally incomplete, there exist many risk-neutral probabilities and the prices of contingent claims are not unique. The solution is to select the risk-neutral probability implied by the market, calibrating the model parameters to a set of quoted option prices. Nicolato and Venardos [9] carry out the calibration exercise by the usual technique of nonlinear least squares:

$$\min_\theta \sum_{i=1}^N \left( C_i^M - C^\theta(T_i, K_i) \right)^2 \tag{12}$$

where $N$ is the total number of observations, $C_i^M$ is the observed price of the option with strike $K_i$ and time to maturity $T_i$, and $C^\theta(T_i, K_i)$ is the price of this option evaluated in a model with parameter vector $\theta$. This method appears to work well in [9], but in other situations two problems may arise:

• Lack of flexibility: in BNS models, the same parameter $\rho$ determines the size of jumps in the price process (and hence the short-maturity skew or asymmetry of the implied volatility smile) and the correlation between the price process and the volatility (the long-dated skew). For this reason, the model may be difficult to calibrate in markets
with pronounced skew changes from short to long maturities such as FX markets.
• Lack of stability: since the calibration functional (12) is not convex and the number of model parameters may be large, the calibration algorithm may be caught in a local minimum, which leads to instabilities in the calibration procedure. Usual remedies for this problem include the use of global minimization algorithms such as simulated annealing or adding a convex penalty term to the functional (12) to make the problem well posed.

The minimal variance hedging in BNS models is discussed in [5]. Let the option price at time $t$ be given by $C(t, S_t, \sigma_t^2)$ (this can be computed by Fourier transform). The hedging strategy minimizing the variance of the residual hedging error under the risk-neutral probability is then given by

$$\phi_t = \left\{ \sigma_{t-}^2\,\frac{\partial C}{\partial S} + \frac{1}{S_{t-}} \int \nu(dz)\,(e^{\rho z} - 1)\left[ C(t, S_{t-} e^{\rho z}, \sigma_{t-}^2 + z) - C(t, S_{t-}, \sigma_{t-}^2) \right] \right\} \times \left\{ \sigma_{t-}^2 + \int (e^{\rho z} - 1)^2\,\nu(dz) \right\}^{-1} \tag{13}$$

When there are no jumps in the stock price ($\rho = 0$), the optimal hedging strategy is just delta-hedging: $\phi_t = \partial C/\partial S$; even though there are jumps in the option price, they cannot be hedged using stock only, because the stock does not jump.

References

[1] Barndorff-Nielsen, O.E. (2001). Superposition of Ornstein–Uhlenbeck type processes, Theory of Probability and Its Applications 45, 175–194.
[2] Barndorff-Nielsen, O.E., Jensen, J.L. & Sørensen, M. (1998). Some stationary processes in discrete and continuous time, Advances in Applied Probability 30, 989–1007.
[3] Barndorff-Nielsen, O.E. & Shephard, N. (2001). Non-Gaussian Ornstein–Uhlenbeck based models and some of their uses in financial econometrics, Journal of the Royal Statistical Society: Series B 63, 167–241.
[4] Cont, R. & Tankov, P. (2004). Financial Modelling with Jump Processes, Chapman & Hall/CRC Press.
[5] Cont, R., Tankov, P. & Voltchkova, E. (2007). Hedging with options in models with jumps, in Stochastic Analysis and Applications: The Abel Symposium 2005 in Honor of Kiyosi Ito, F.E. Benth, G. Di Nunno, T. Lindstrom, B. Øksendal & T. Zhang, eds, Springer, pp. 197–218.
[6] Deng, S.-J. & Jiang, W. (2005). Lévy process-driven mean-reverting electricity price model: the marginal distribution analysis, Decision Support Systems 40, 483–494.
[7] Duffie, D., Filipovic, D. & Schachermayer, W. (2003). Affine processes and applications in finance, Annals of Applied Probability 13, 984–1053.
[8] Jurek, Z.J. & Vervaat, W. (1983). An integral representation for self-decomposable Banach space valued random variables, Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 62(2), 247–262.
[9] Nicolato, E. & Venardos, E. (2003). Option pricing in stochastic volatility models of Ornstein–Uhlenbeck type, Mathematical Finance 13, 445–466.
[10] Sato, K. (1999). Lévy Processes and Infinitely Divisible Distributions, Cambridge University Press, Cambridge.
[11] Sato, K. & Yamazato, M. (1983). Stationary processes of Ornstein–Uhlenbeck type, in Probability Theory and Mathematical Statistics, Fourth USSR–Japan Symposium, K. Itô & V. Prokhorov, eds, Lecture Notes in Mathematics, Vol. 1021, Springer, Berlin.
[12] Wolfe, S.J. (1982). On a continuous analogue of the stochastic difference equation $x_n = \rho x_{n-1} + b_n$, Stochastic Processes and Applications 12(3), 301–312.

Related Articles

Exponential Lévy Models; Lévy Processes; Ornstein–Uhlenbeck Processes.

PETER TANKOV
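
As a closing illustration for the BNS article, the Python sketch below evaluates the risk-neutral characteristic function (10) of the log stock price for the gamma example, where the Laplace exponent is l(u) = λcu/(α − u), reading ε(λ, t) as (1 − e^{−λt})/λ in line with (11). It is only a sketch under stated assumptions: the time integral in (10) is approximated with a midpoint rule, and the function name, the number of nodes, and the parameter values are hypothetical. Two identities that follow directly from (10) serve as sanity checks: φ_t(0) = 1, and φ_t(−i) = e^{rt}, which expresses that the discounted stock price is a martingale.

```python
import cmath
import math


def bns_log_price_cf(u, t, r, rho, lam, sigma0_sq, c, alpha, n=2000):
    """Characteristic function phi_t(u) of formula (10), gamma example.

    Uses l(w) = lam * c * w / (alpha - w) and eps(s) = (1 - exp(-lam*s)) / lam.
    The integral over [0, t] is approximated with n midpoint nodes (assumption).
    """
    def laplace_exponent(w):
        return lam * c * w / (alpha - w)

    def eps(s):
        return (1.0 - math.exp(-lam * s)) / lam

    half = 0.5 * (u * u + 1j * u)                  # (u^2 + i u) / 2
    h = t / n
    integral = 0.0
    for j in range(n):
        s = (j + 0.5) * h
        integral += laplace_exponent(1j * rho * u - half * eps(t - s)) * h
    drift = 1j * u * (r - laplace_exponent(rho)) * t
    return cmath.exp(drift - half * sigma0_sq * eps(t) + integral)


# Illustrative (hypothetical) parameter values.
params = dict(t=1.0, r=0.03, rho=-1.0, lam=2.0, sigma0_sq=0.04, c=1.0, alpha=20.0)
cf_at_zero = bns_log_price_cf(0.0, **params)
cf_at_minus_i = bns_log_price_cf(-1j, **params)
cf_at_one = bns_log_price_cf(1.0, **params)
```

A function of this kind could be passed to any Fourier pricing routine; the choice of quadrature for the inner integral is a design decision that trades accuracy for speed and is not prescribed by the article.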
Heston Model

In the class of stochastic volatility models (see Stochastic Volatility Models), Heston's is probably the best-known model. The model was published in 1993 by S. Heston in his seminal article, the title of which readily reveals much of its popularity: A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options. It is probably the only stochastic volatility model for equities which both allows very efficient computing of European option prices and fits reasonably well to market data in very different conditions.

In fact, the model was used successfully (in the sense explained below) during the boom of the end of the 1990s, in the brief recession of 2001, in the very low volatility regime until 2007, and it still performed well during the very volatile period of late 2008. However, the model also has several questionable properties: critics point out that its inherent structure as a square-root diffusion does not reflect statistical properties seen in real market data. For example, typical calibrated parameters allow the instantaneous volatility of the stock to become zero with a positive probability. From a practical point of view, the most challenging property of Heston's model is the interdependence of its parameters and the resulting inability to give these parameters a real idiosyncratic meaning. One example is the fact that moving the term structure of volatility has an impact on the shape of the implied volatility skew. This means that traders who use this model will have to have a very good understanding of the dynamics of the model and the interplay between its parameters.

Other stochastic volatility models with efficient pricing methods for European options are: SABR, Schöbel–Zhu or Hull–White model (see Hull–White Stochastic Volatility Model) and Lewis' “3/2-model” presented in Lewis' book [13]. The n-dimensional extension of Heston's model is the class of affine models [9]. Related are Lévy-based models that can also be computed efficiently (see Time-changed Lévy Process). The most natural model that is used frequently but which actually does not allow efficient pricing of Europeans is a lognormal model for instantaneous volatility.

Model Description

If we assume a prevailing instantaneous interest rate of $r = (r_t)_{t \ge 0}$ and a yield from holding a stock of $\mu = (\mu_t)_{t \ge 0}$, then Heston's model is given as the unique strong solution $Z = (S_t, v_t)_{t \ge 0}$ of the following stochastic differential equation (SDE):

$$\begin{aligned} dv_t &= \kappa(\theta - v_t)\,dt + \sigma\sqrt{v_t}\,dW_t \\ dS_t &= S_t (r_t - \mu_t)\,dt + S_t \sqrt{v_t}\,dB_t \end{aligned} \tag{1}$$

with starting values spot $S_0 > 0$ and “Short Vol” $\sqrt{v_0} > 0$. In this equation, $W$ and $B$ are two standard Brownian motions with a Correlation of $\rho \in (-1, +1)$. The model is usually specified directly under a risk-neutral measure.

This Correlation together with the “Vol Of Vol” $\sigma \ge 0$ can be thought of as being responsible for the skew. This is illustrated in Figure 1: Vol Of Vol controls the volume of the smile and Correlation its “tilt”. A negative Correlation produces the desired downward skew of implied volatility. It is usually calibrated to a value around −70%.

The other parameters control the term structure of the model: in Figure 2, the impact of changing “Short Vol” $\sqrt{v_0} \ge 0$, “Long Vol” $\sqrt{\theta} \ge 0$, and “Reversion Speed” $\kappa > 0$ on the term structure of at-the-money (ATM) implied volatility is illustrated. It can be seen that Short Vol lives up to its name and controls the level of the short-date implied volatilities, whereas Long Vol controls the long end. Reversion Speed controls the skewness or “decay” of the curve from the Short Vol level to the Long Vol level.

This inherent mean-reversion property of Heston's stochastic volatility around a long-term mean $\sqrt{\theta}$ is one of the important properties of the model. Real market data are often mean-reverting, and it also makes economic sense to assume that volatility is not unbounded in its growth as, for example, a stock price process is. In historic data, the “natural” level of mean-reversion is often seen to be itself a mean-reverting process as Fouque et al. [10] have shown. Some extensions of Heston in this direction are discussed below.

Parameter Interdependence

Before we proceed, a note of caution: the above distinction of the parameters by their effect on
term structure and strike structure, was made for illustration purposes only: in particular, $\kappa$ and $\sigma$ are strongly interdependent if the model is used in the form (1).

Figure 1 Stylized effects of changing Vol Of Vol and Correlation in Heston's model on the one-year implied volatility. The Heston parameters are $v_0 = 15\%^2$, $\theta = 20\%^2$, $\kappa = 1$, $\rho = -70\%$, and $\sigma = 35\%$
[Axes: implied volatility against strike/forward (in log scale, 60.7% to 164.9%). Curves: Black–Scholes, Heston with zero correlation, Heston.]

Figure 2 The effects of changing Short Vol (a), Long Vol (b), and Reversion Speed (c) on the ATM term structure of implied volatilities. Each graph shows the volatility term structure for 12 years. The reference Heston parameters are $v_0 = 15\%^2$, $\theta = 20\%^2$, $\kappa = 1$, $\rho = -70\%$, and $\sigma = 35\%$
[Panel (a): Short Vol 10, 15, 20. Panel (b): Long Vol 15, 20, 25. Panel (c): Reversion Speed 0.5, 1, 1.5.]

This is one of the most serious drawbacks of Heston's model since it means that a trader who uses it to risk-manage a position cannot independently control the risk with the five available parameters, but has to understand very well their interdependency. For example, to hedge, say, convexity risk in strike direction of the implied volatility surface, the trader will also have to deal with the skew risk at the same time since in Heston, there is no one parameter to control either: convexity is mainly controlled
by Vol Of Vol, but the effect of Correlation on skew depends on the level of Vol Of Vol, too. Moreover, changes to the short end volatility skew will always affect the long-term skew. A similar strong codependency exists between Vol Of Vol and Reversion Speed; as pointed out in [14], some of the strong interdependence between Vol Of Vol and Reversion Speed can be alleviated by using the alternative formulation

$$dv_t = (\theta - v_t)\,\kappa\,dt + \tilde\sigma \sqrt{v_t}\,\sqrt{\kappa}\,dW_t \tag{2}$$

In this parametrization, the new Vol Of Vol and Reversion Speed are much less interdependent, which stabilizes results of daily calibration to market data substantially. Mathematically, this parametrization much more naturally defines $\kappa$ as the “speed” of the equation.

Such complications are a general issue with stochastic volatility models: since such models attempt to describe an unobservable, rather theoretical quantity (instantaneous variance), they do not produce very intuitive behavior when looked at through the lens of the observable measure of “implied volatility”. That said, implied volatility itself or, rather, its interpolations are also moving on a daily basis. This indicates that natural parameters such as convexity and skew of implied volatility might be a valuable tool for feeding a stochastic volatility model, but it is unreasonable to keep them as constant parameters inside the model.

Pricing European Options

Heston's popularity is probably mainly derived from the fact that it is possible to price European options on the stock price $S$ using semi-closed-form Fourier transformation, which in turn allows rapid calibration of the model parameters to market data. “Calibration” here means to infer values for the five unobservable parameters $\sqrt{v_0}$, $\sqrt{\theta}$, $\kappa$, $\sigma$, $\rho$ from market data by minimizing the distance between the model's European option prices and observed market prices.

We focus on the call prices. Following Carr and Madan [7], we price them via Fourier inversion (endnote a). The call price for a relative strike $K$ at maturity $T$ is given as

$$C(T, K) := DF(T)\,\mathbb{E}\left[(S_T - K F_T)^+\right] \tag{3}$$

where $DF(T)$ represents the discount factor and $F_T$ is the forward of the stock. Since the call price itself is not an $L^2$-function in $K$, we define a “dampened call”

$$c(T, k) := \frac{e^{\alpha k}}{DF(T)\,F_T}\,C(T, e^k) \tag{4}$$

for an $\alpha > 0$ (endnote b), for which its characteristic function $\psi_t(z; k) := \int_{\mathbb{R}} e^{ikz} c(t, k)\,dk$ is well defined and given as

$$\psi_t(z; k) = \frac{\varphi_t(k - i(\alpha + 1))}{(ik + \alpha)(ik + \alpha + 1)} \tag{5}$$

The function $\varphi_t(z) := \mathbb{E}[\exp\{iz \log S_t/F_t\}]$ is the characteristic function of $X_t := \log S_t/F_t$. Since Heston belongs to the affine model class, its characteristic function has the form

$$\varphi_t(z) = e^{-v_0 A_t - m B_t} \tag{6}$$

with (cf. [14])

$$A_t := \frac{\alpha + a e^{\gamma t}}{\beta + b e^{\gamma t}} \quad \text{and} \quad B_t := \tilde\kappa\,\frac{\alpha b \gamma t + (a\beta - \alpha b) \log \dfrac{\beta + b e^{\gamma t}}{\beta + b}}{\beta b \gamma} \tag{7}$$

where $\mu := (iz + z^2)/2$, $\tilde\kappa := \kappa - \rho i z \sigma$, $\gamma := -\sqrt{2\mu\sigma^2 + \tilde\kappa^2}$, $a := -2\mu$, $\alpha := 2\mu$, $b := -\tilde\kappa + \gamma$ and $\beta := \tilde\kappa + \gamma$.

We can then price a call on $X$ using

$$C(T, K) = DF(T)\,F_T\,\frac{e^{-\alpha \ln(K)}}{\pi} \int_0^\infty e^{-iz \ln(K)}\,\psi_t(z; \ln(K))\,dz \tag{8}$$

The method also lends itself to Fast Fourier Transform if a range of option prices for a single maturity is required.

Similarly, various other payoffs can be computed very efficiently with the Fourier approach, for example, forward started vanilla options, options on integrated short variance, and digital options.

Time-Dependent Parameters

Moreover, for most of these products, and most importantly for plain European options, it is very straightforward to extend the model to time-dependent, piecewise constant parameters. This is briefly
discussed in [14]. It improves the fit of the model to the market prices markedly, cf. Figure 3.

Figure 3 Heston (a) without and (b) with time-dependent parameters fitted to STOXX50E for maturities from 1m to 5y. The introduction of time dependency clearly improves the fit
[Both panels plot values between −0.20% and 0.20% against strike/spot (75% to 125%) for maturities 1m, 3m, 1y, 3y, and 5y.]

However, it should be noted that by introducing piecewise constant time-dependent parameters, we lose much of the model's structure. It is turned from a time-homogeneous model which “takes a view” on the actual evolution of the volatility via its SDE into a kind of arbitrage-free interpolation of market data: if calibrated without additional constraints to ensure smoothness of the parameters over time, this is reflected in large discrepancies of the parameter values for distinct periods. For example, the excellent fit of the time-dependent Heston model in Figure 3 is achieved with the following parameter values (short volatility $\sqrt{\zeta_0}$ was 15.0%):

                     6m        1y        3y        ∞
Long Vol θ           20.7%     23.6%     36.1%     46.5%
Reversion Speed κ    5.0       3.2       0.4       0.3
Correlation ρ        −55.2%    −70.9%    −80.1%    −69.4%
Vol Of Vol σ         78.7%     81.5%     35.3%     60.0

The increased number of parameters also makes it more difficult to hedge in such a model in practice; even though both Heston and the time-dependent Heston models create complete markets, we will always need to additionally protect our position against moves in the parameter values of our model. Just as for Vega in Black and Scholes, this is typically done by computing “parameter greeks” and neutralizing the respective sensitivities. Clearly, the more parameters are involved and the less stable they are, the less reliable this “parameter hedge” becomes.

Mathematical Drawbacks

The underlying mathematical reason for the relative tractability of Heston's model is that $v$ is a squared Bessel process, which is well understood and reasonably tractable. In fact, a statistical estimation on S&P 500 by Aït-Sahalia and Kimmel [1] of $\alpha \in [1/2, 2]$ in the extended model

$$dv_t = \kappa(\theta - v_t)\,dt + \sigma v_t^{\alpha}\,dW_t^1 \tag{9}$$

has shown that, depending on the observation frequency, a value around 0.7 would probably be more adequate (see Econometrics of Diffusion Models). What is more, the square-root volatility terms mean that unless

$$2\kappa\theta \ge \sigma^2 \tag{10}$$

the process $v$ can reach zero with nonzero probability. The crux is that this condition is regularly violated if the model is calibrated freely to observed market data. Although a vanishing short variance is not a problem in itself (after all, a variance of zero simply means absence of trading activity), it makes
numerical approximations more complicated. In a Monte Carlo simulation, for example, we have to take the event of $v$ being negative into account. The same problem appears in a partial differential equation (PDE) solver: Heston's PDE becomes degenerate if Short Vol hits zero. A violation of Equation (10) also implies that the distribution of short variance $v_t$ at some later time $t$ is very wide, cf. Figure 4.

Figure 4 This graph shows the density of $v_t$ for one, three and six months for the case where condition (10) is satisfied (above) or not (below). Apart from Vol Of Vol, the parameters were $v_0 = 15\%^2$, $\theta = 20\%^2$ and $\kappa = 1$
[Panel titles: “Probability density of Heston's Short Vol for a 20%” (above) and “Probability density of Heston's Short Vol for a 40%” (below). Horizontal axis: short volatility level, −5 to 40. Curves: 1m, 3m, 6m.]

Additionally, if Equation (10) does not hold, then the stock price $S$ may fail to have a second moment if the Correlation is not negative enough in the sense detailed in proposition 3.1 in [2] (see Moment Explosions for more details). Again, this is not a problem from a purely mathematical point of view, but it makes numerical schemes less efficient. In particular, Monte Carlo simulations perform much worse: although an Euler scheme will still converge to the desired value, the speed of convergence deteriorates. Moreover, we cannot safely use control variates anymore if the payoff is not bounded.
Pricing Methods

Once we have calibrated the model using the aforementioned semi-closed-form solution for the European options, the question is how to evaluate complex products. At our disposal are PDEs and Monte Carlo schemes.

Since the conditional transition density of the entire process is not known, we have to revert to solving a discretization of the SDE (1) if we want to use a Monte Carlo scheme (see Monte Carlo Simulation for Stochastic Differential Equations for an overview of Monte Carlo concepts). To this end, assume that we are given fixing dates $0 = t_0 < \cdots < t_N = T$ and let $\Delta t_i := t_{i+1} - t_i$ for $i = 0, \ldots, N - 1$. Moreover, we denote by $\Delta W_i$ for $i = 0, \ldots, N - 1$ a sequence of independent normal variables with variance $\Delta t_i$, and by $\Delta B_i$ a corresponding sequence where $\Delta B_i$ and $\Delta W_i$ have Correlation $\rho$.

When using a straightforward Euler scheme, we will face the problem that $v$ can become negative. It works well simply to reduce the volatility term of the variance to the positive part of the variance, that is, to simulate

$$v_{t_{i+1}} = v_{t_i} + \kappa(\theta - v_{t_i})\,\Delta t_i + \sigma\sqrt{v_{t_i}^+}\,\Delta W_i \tag{11}$$

A flaw of this scheme is that it is biased. This is overcome by using the moment-matching scheme

$$v_{t_{i+1}} = \theta\,\Delta t_i + \left(\theta - v_{t_i}\right) e^{-\kappa\Delta t_i} + \sigma\sqrt{v_{t_i}^+ \left(1 - e^{-2\kappa\Delta t_i}\right)}\,\Delta W_i \tag{12}$$

which works well in practice. To compute the stock price, we approximate the integrated variance over $[t_i, t_{i+1}]$ as

$$\Delta_i V := \theta\,\Delta t_i + \left(v_{t_i} - \theta\right) \frac{1 - e^{-\kappa\Delta t_i}}{\kappa} \tag{13}$$

and set

$$S_{t_k} := F_{t_k} \exp\left\{ \sum_{i=1}^{k-1} \left( \sqrt{\Delta_i V}\,\Delta B_i - \frac{1}{2}\Delta_i V \right) \right\} \tag{14}$$

Note that this scheme is unbiased in the sense that $\mathbb{E}[S_T] = F_T$.

Heston's PDE

It is straightforward to derive the PDE for the previous model. Let

$$P_t(v, S) := DF_t(T)\,\mathbb{E}\left[F(S_T) \mid S_t = S, v_t = v\right] \tag{15}$$

be the price of a derivative with maturity $T$ at time $t$. It satisfies

$$0 = r_t P_t + \partial_S P_t\,(r_t - \mu_t) S_t + \partial_v P_t\,\kappa(m - v_t) + \frac{1}{2}\partial^2_{SS} P_t\,S_t^2 v_t + \frac{1}{2}\partial^2_{vv} P_t\,\sigma^2 v_t + \partial^2_{vS} P_t\,\rho v_t S_t \tag{16}$$

with boundary condition $P_T(S, v) = F(S_T)$. To solve this two-factor PDE with a potentially degenerate diffusion term in $\partial^2_{vv} P_t$, it is recommended to use a stabilized alternating direction implicit (ADI) scheme such as the one described by Craig and Sneyd [8] (see Alternating Direction Implicit (ADI) Method for a discussion on ADI).

Risk Management

Provided that we consider not only the stock price itself but also a second liquid instrument $V$ such as a listed option as hedging instrument, stochastic volatility models are complete, that is, in theory every contingent claim $P$ can be replicated in the sense that there are hedging strategies $(\Delta_t, \Delta_t^V)_t$ such that

$$dP_t - r_t P_t\,dt = \Delta_t \left(dS_t - S_t (r_t - \mu_t)\,dt\right) + \Delta_t^V \left(dV_t - r_t V_t\,dt\right) \tag{17}$$

(see Complete Markets for a discussion on complete markets). In Heston's model, we can write the price process of both the derivative we want to hedge and the hedging instrument as a function of current spot level and short variance, that is, $P_t \equiv P_t(S_t, v_t)$ and $V_t \equiv V_t(S_t, v_t)$. Then, the correct hedging ratios are

$$\Delta_t^V = \frac{\partial_v P}{\partial_v V} \quad \text{and} \quad \Delta_t = \partial_S P_t - \frac{\partial_v P}{\partial_v V}\,\partial_S V_t \tag{18}$$

This is the equivalent of delta hedging in Black and Scholes (see Delta Hedging). However, as for the latter, plain theoretical hedging will not work since the other parameters in our model, Reversion Speed, Vol Of Vol, Long Vol, and potentially Correlation, will not remain constant if we calibrate our model on a
Heston Model 7

daily basis. This is the effect of a change in volatility for Black and Scholes: a change of this parameter is not anticipated by the model itself and must be taken care of "outside the model".

As a result, one way to control this risk is to engage in additional parameter hedging, that is, the desk also displays sensitivities with respect to the other model parameters including, potentially, second-order exposures. Those can then be monitored on a book level and managed explicitly. The drawback of this method is that to reduce risk with respect to those parameters, a portfolio of vanilla options has to be bought whose composition can change quickly if implemented blindly (see end note c).

A second variant is to try to map standard risks of the desk such as implied volatility convexity, skewness, and so on into stochastic volatility risk by "recalibration". The idea here is that, say, the convexity parameter of the implied volatility is modified, then Heston's model is calibrated to this new implied volatility surface and the option priced off this model. The resulting change in model price is then considered the sensitivity of the option to convexity in implied volatility. This approach suffers from the fact that typical "implied vol risks" are very different from typical movements in the Heston model. For example, the standard Heston model is homogeneous so it cannot easily accommodate changes in short-term skew only.

Related Models

Owing to its numerical efficiency, Heston's model is the base for many extensions. The first notable extension is Bates' addition of jumps to the diffusion process in his article [3] (see Bates Model). Jumps are commonly seen as a necessary feature of any risk management model, even though the actual handling of the jump risk part is far from clear.

Bates' approach can be written as follows: let $X$ be given by

$$dv_t = \kappa(\theta - v_t)\,dt + \sigma\sqrt{v_t}\,dW_t$$
$$dX_t = X_t\sqrt{v_t}\,dB_t \tag{19}$$

and let

$$S_t = F_t X_t \exp\Big(\sum_{j=1}^{N_t}\xi_j - \lambda m t\Big) \tag{20}$$

where $N_t$ is a Poisson process with intensity $\lambda$ (see Poisson Process) and where $(\xi_j)_j$ are the normal jumps of the returns of $S$ with mean $\mu$ and volatility $\nu$. To make sure that $S_t/F_t$ is a martingale we stipulate that $m = e^{\mu + \frac{1}{2}\nu^2} - 1$, which is the expected relative jump size $\mathbb{E}[e^{\xi_j}] - 1$.

Since the process $X$ is independent of the jumps, the characteristic function of the log-stock process is the product of the separate characteristic functions. In other words, Bates' model can be evaluated using the same approach as above and is equally efficient while allowing for a very pronounced short-term skew due to the jump part (see end note d). Figure 5 shows the improvement of time-dependent Bates over time-dependent Heston.

The model has been further enhanced by Knudsen and Nguyen-Ngoc [12] who also added exponentially distributed jumps to the variance process.

Multifactor Models

Structurally, Heston's model is a member of the class of "affine models" as introduced by Duffie et al. [9]. As such, it can easily be extended by mixing in further independent square-root processes. One obvious approach presented in [14] is simply to multiply several independent Heston processes. For the two-factor case, this means setting $S_t := F_t X_t^1 X_t^2$ where both $X^1$ and $X^2$ have the form (19). Jumps can be added, but to make the Fourier integration work efficiently, the processes $X^1$ and $X^2$ must remain independent.

The stochastic variance of the joint stock price is then simply the sum of the two separate variances, $v^1$ and $v^2$, and it is intuitively assumed that one is a "short-term", fast mean-reverting process whereas the other is mean reverting slowly. Such a structure is supported by statistical evidence, cf. [10]. However, the independence of the two processes makes it very difficult to impose enough skew into this model since the effective correlation between instantaneous variance and stock price weakens. In practice, this model is used only rarely.

A related model, "Double Heston", has been mentioned by Buehler [6]. It is obtained by modeling the mean variance level $\theta$ in Heston itself as a square-root diffusion, that is,

$$dv_t = \kappa(\theta_t - v_t)\,dt + \sigma\sqrt{v_t}\,dW_t$$
$$d\theta_t = c(m - \theta_t)\,dt + \nu\sqrt{\theta_t}\,dW_t^\theta$$
$$dS_t = S_t(r_t - \mu_t)\,dt + S_t\sqrt{v_t}\,dB_t \tag{21}$$

where $W^\theta$ is independent of $W$ and $B$. While this model has a reasonably tractable characteristic function, it also suffers from the problem that long-term skew becomes too symmetric, contrary to what is observed in the market. Such a model, however, may have applications when pricing options on variance, where the skew counts less and it is more important to be able to account for some dynamics of the term structure of variance. Refer to [6] for an extensive discussion on this.

Figure 5 Heston (a) and Bates (b) with time-dependent parameters fitted to STOXX50E for maturities from 1m to 5y. (Both panels: horizontal axis strike/spot from 75% to 125%, vertical axis from −0.20% to 0.20%, one curve per maturity 1m, 3m, 1y, 3y, and 5y.)

Fitted Heston

A particular class of derivatives that has gained reasonable popularity in recent years are "Options on Variance", that is, structures whose terminal payoff depends on the realized variance of the returns of the stock over a set of business days $0 = t_0 < \cdots < t_n = T$,

$$\mathrm{RV}(T) := \sum_{i=1}^{n}\Big(\log\frac{S_{t_i}}{S_{t_{i-1}}}\Big)^2 \tag{22}$$

The most standard of such products is a "variance swap" (see Variance Swap), which essentially pays the actual realized annualized variance over the period in exchange for a previously agreed fair strike. This strike is usually quoted in volatility terms, that is, a variance swap with maturity $T$ and strike $\Sigma(T)$ pays

$$\frac{252}{n}\,\mathrm{RV}(T) - \Sigma^2(T) \tag{23}$$

From this product, a market with options on realized variance has evolved naturally; these include capped variance swaps (mainly traded on single stocks), outright straddles on realized variance swaps, and also VIX futures and options (see Realized Volatility Options). Although there are several discussions around how best to approach the risk management of such products, a particularly useful variant of Heston's model is the "Fitted Heston" approach introduced by Buehler [4].

The main idea here is that to price an option on realized variance in a given model, it is crucial to price correctly a variance swap itself, that is, to make sure that

$$\mathbb{E}\big[\mathrm{RV}(T)\big] = \frac{n}{252}\,\Sigma^2(T) \tag{24}$$

The idea of "fitting to the market", say, Heston's model (2) is now simply to force the model to satisfy this equation. First, assume that we have the term structure of the market's expected realized variance, $M(T) = n\Sigma^2(T)/252 = \Sigma^2(T)\,T$, and define $m(t) := \partial_T|_{T=t} M(T)$. Take the original short variance of the model,

$$dv_t = (\theta - v_t)\kappa\,dt + \sigma\sqrt{v_t}\sqrt{\kappa}\,dW_t \tag{25}$$

and define the new "fitted" process as

$$w_t := m(t)\,\frac{v_t}{\mathbb{E}[v_t]} \tag{26}$$

with the stock price as

$$dS_t = S_t(r_t - \mu_t)\,dt + S_t\sqrt{w_t}\,dB_t \tag{27}$$

This now reprices all variance swaps automatically in the sense of (24). Note that this method does not at all depend on using Heston's model and can be applied to any stochastic volatility model as long as the expectation of instantaneous variance is known.

As pointed out in [6], this model is naturally very attractive from a risk-management point of view if the input $M$ is computed on the fly within the risk management system. In this case, the risk embedded in the variance swap level (called VarSwapDelta) is automatically reflected back in the standard implied volatility risk, and the underlying stochastic volatility model is used purely to control skew and convexity around the variance swap backbone (see end note e). Further practical considerations and the impact of jumps are discussed by Buehler in [5].

End Notes

a. In his original paper [11], Heston suggested a numerically more expensive approach via numerical integration that is twice as slow but still much faster than the same computation for most other models. The approach to price with Fourier inversion is due to Carr and Madan [7]; the interested reader finds more details on the subject in Lewis's book [13].
b. See [7] for a discussion on the choice of α.
c. Bermudez et al. discuss one approach to find such portfolios [14].
d. In practice, calibrating all parameters (stochastic volatility plus jumps) together is relatively unstable since the two parts play similar roles for the short-term options. It is therefore customary to fix the jump parameters themselves or to calibrate them separately to very short-term options.
e. Usually, the parameters v0 and θ are fixed to some "usual level" such as 20%. Then, they do not need to be calibrated anymore and, in addition, σ retains some comparability to the standard Heston model.

References

[1] Aït-Sahalia, Y. & Kimmel, R. (2004). Maximum Likelihood Estimation of Stochastic Volatility Models, NBER Working Paper No. 10579, June 2004.
[2] Andersen, L. & Piterbarg, V. (2007). Moment explosions in stochastic volatility models, Finance and Stochastics 11(1), 20–50.
[3] Bates, D. (1996). Jumps and stochastic volatility: exchange rate process implicit in the Deutschemark options, Review of Financial Studies 9(1), 69–107.
[4] Buehler, H. (2006). Consistent variance curve models, Finance and Stochastics 10(2), 178–203.
[5] Buehler, H. (2006). Options on variance: pricing and hedging, Presentation, IQPC Volatility Trading Conference, London, November 28th, 2006, http://www.quantitative-research.de/dl/IQPC2006-2.pdf
[6] Buehler, H. (2006). Volatility Markets: Consistent Modeling, Hedging and Practical Implementation, PhD thesis, TU Berlin, http://www.quantitative-research.de/dl/HansBuehlerDiss.pdf
[7] Carr, P. & Madan, D. (1999). Option pricing and the Fast Fourier Transform, Journal of Computational Finance 2(4), 61–73. Summer.
[8] Craig, I.J.D. & Sneyd, A.D. (1988). An alternating-direction implicit scheme for parabolic equations with mixed derivatives, Computers and Mathematics with Applications 16(4), 341–350.
[9] Duffie, D., Pan, J. & Singleton, K. (2000). Transform analysis and asset pricing for affine jump-diffusions, Econometrica 68, 1343–1376.
[10] Fouque, J.-P., Papanicolaou, G. & Sircar, K. (2000). Derivatives in Financial Markets with Stochastic Volatility, Cambridge Press.
[11] Heston, S. (1993). A closed-form solution for options with stochastic volatility with applications to bond and currency options, Review of Financial Studies 6(2), 327–343.
[12] Knudsen, T. & Nguyen-Ngoc, L. (2000). The Heston model steps further, Deutsche Bank Quantessence 1(7), https://www.dbconvertibles.com/dbquant/quantessence/Vol1Issue7External.pdf
[13] Lewis, A. (2000). Option Valuation under Stochastic Volatility, Finance Press.
[14] Overhaus, M., Bermudez, A., Buehler, H., Ferraris, A., Jordinson, C. & Lamnouar, A. (2006). Equity Hybrid Derivatives, Wiley.

Related Articles

Alternating Direction Implicit (ADI) Method; Bates Model; Complete Markets; Cliquet Options; Econometrics of Diffusion Models; Hedging; Hull–White Stochastic Volatility Model; Model Calibration; Monte Carlo Simulation for Stochastic Differential Equations; Moment Explosions; Realized Volatility Options; Variance Swap.

HANS BUEHLER
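As a supplement to the Fitted Heston section of the article above, the realized variance (22), the variance swap payoff (23), and the rescaling (26) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the article: all function names are hypothetical, and the mean of the short variance is taken to be the standard closed form E[v_t] = θ + (v_0 − θ)e^{−κt} for a square-root process with drift κ(θ − v_t), as in (25).

```python
import math

def realized_variance(prices):
    """Sum of squared log returns over consecutive observations, as in (22)."""
    return sum(math.log(b / a) ** 2 for a, b in zip(prices, prices[1:]))

def variance_swap_payoff(prices, strike_vol):
    """Annualized realized variance minus the squared volatility strike, as in (23)."""
    n = len(prices) - 1  # number of return observations
    return 252.0 / n * realized_variance(prices) - strike_vol ** 2

def heston_mean_variance(t, v0, theta, kappa):
    """E[v_t] for a square-root process with drift kappa * (theta - v_t) (assumed closed form)."""
    return theta + (v0 - theta) * math.exp(-kappa * t)

def fitted_variance(v_t, t, m_t, v0, theta, kappa):
    """Rescaled variance w_t = m(t) * v_t / E[v_t], as in (26)."""
    return m_t * v_t / heston_mean_variance(t, v0, theta, kappa)
```

By construction the expectation of `fitted_variance` equals `m_t` whatever the model parameters are, which is the sense in which the fitted model reprices variance swaps.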
Hull–White Stochastic Volatility Model

Even before practitioners started using the Black–Scholes formula extensively, the assumption of constant volatility had been identified as unrealistic. Empirical observation of the equity vanilla option market shows, indeed, that the implied volatility level depends on the strike. This feature, commonly known as the volatility smile, violates the constant volatility assumption. This essential remark motivated the birth of stochastic volatility models (see Stochastic Volatility Models).

Among the first authors to tackle this issue, Hull and White proposed in 1987 a simple extension of the Black–Scholes model [1]. This article aims at presenting a sound introduction to the Hull–White stochastic volatility model and at indicating its implications in terms of volatility behavior and correlation.

Hull and White describe the variance $V = \sigma^2$ as a geometric Brownian motion. Therefore, the asset and variance satisfy the following stochastic differential equations:

$$dS = \phi S\,dt + \sigma S\,dw_t \tag{1}$$
$$dV = \mu V\,dt + \xi V\,dz_t \tag{2}$$
$$\langle dw_t, dz_t\rangle = \rho\,dt \tag{3}$$

In its general formulation, the parameter $\phi$ may depend on $S$, $\sigma$, and $t$, while parameters $\mu$ and $\xi$ may depend on $\sigma$ and $t$. One may find that many models fall under these dynamics, including the Heston model for $\mu = \kappa(\theta/V - 1)$ and $\xi = \nu/\sqrt{V}$. As in [1], we will restrict to the constant parameters case (Hull and White studied the mean-reverting variance case in [2]).

Option Pricing

Let $f$ be the price of a security which depends on the stock price. $f$ satisfies the partial differential equation (PDE)

$$\frac{\partial f}{\partial t} + \frac{1}{2}\left[\sigma^2 S^2\frac{\partial^2 f}{\partial S^2} + 2\rho\sigma^3\xi S\frac{\partial^2 f}{\partial S\,\partial V} + \xi^2 V^2\frac{\partial^2 f}{\partial V^2}\right] - rf = -rS\frac{\partial f}{\partial S} - \mu\sigma^2\frac{\partial f}{\partial V} \tag{4}$$

Assuming volatility and stock price are uncorrelated, we derive an analytic solution to equation (4) through the risk-neutral valuation procedure

$$f(S_t, \sigma_t^2, t) = e^{-r(T-t)}\int_0^\infty f(S_T, \sigma_T^2, T)\,p(S_T \mid S_t, \sigma_t^2)\,dS_T \tag{5}$$

where $T$ is the option maturity, $S_t$ is the security price at pricing time $t$, $\sigma_t$ is the instantaneous volatility at time $t$, and $p(S_T \mid S_t, \sigma_t^2)$ is the conditional distribution of $S_T$ given the security price and variance at time $t$.

Introducing the mean variance over the option life $\bar{V}$,

$$\bar{V} = \frac{1}{T-t}\int_t^T \sigma_\tau^2\,d\tau \tag{6}$$

we express the call option value as

$$C_{HW}(S_t, \sigma_t^2, t) = \int_0^\infty C_{BS}(\bar{V})\,h(\bar{V} \mid \sigma_t^2)\,d\bar{V} \tag{7}$$

where $C_{BS}$ is the Black–Scholes call option value and $h(\cdot \mid \sigma_t^2)$ is the conditional density of $\bar{V}$ given $\sigma_t^2$. In some particular cases, such as when $\mu$ and $\xi$ are constant, this expression admits an explicit Taylor expansion which converges quickly for small values of $\xi^2(T - t)$.

Behavior of Volatility

Assuming that parameters $\mu$ and $\xi$ are constant, the first two moments of the volatility process are given by

$$\mathbb{E}[\sigma(t)] = \sigma(0)\,e^{\frac{1}{2}\left(\mu - \frac{1}{4}\xi^2\right)t} \tag{8}$$

$$\mathbb{V}[\sigma(t)] = \sigma(0)^2\,e^{\mu t}\left(1 - e^{-\frac{1}{4}\xi^2 t}\right) \tag{9}$$

For $\mu < \frac{1}{4}\xi^2$, the expectation of volatility converges to zero, whereas for $\mu > \frac{1}{4}\xi^2$, it diverges. Regarding the variance of volatility, it increases unbounded if $\mu > 0$.
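The two moments of volatility can be checked numerically. The sketch below is an illustration rather than part of the original article: the function names and the sample parameters are hypothetical. It uses the fact that a geometric Brownian motion variance $V_t$ is lognormal, so $\sigma_t = \sqrt{V_t}$ can be sampled exactly and compared with the closed forms $\sigma(0)e^{(\mu/2 - \xi^2/8)t}$ and $\sigma(0)^2 e^{\mu t}(1 - e^{-\xi^2 t/4})$.

```python
import math
import random

def hw_vol_moments(sigma0, mu, xi, t):
    """Closed-form mean and variance of sigma_t when V = sigma^2 is a geometric Brownian motion."""
    mean = sigma0 * math.exp((0.5 * mu - xi ** 2 / 8.0) * t)
    var = sigma0 ** 2 * math.exp(mu * t) * (1.0 - math.exp(-0.25 * xi ** 2 * t))
    return mean, var

def hw_vol_samples(sigma0, mu, xi, t, n, seed=0):
    """Exact samples of sigma_t = sqrt(V_t), with V_t drawn from its lognormal law."""
    rng = random.Random(seed)
    v0 = sigma0 ** 2
    samples = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        v_t = v0 * math.exp((mu - 0.5 * xi ** 2) * t + xi * math.sqrt(t) * z)
        samples.append(math.sqrt(v_t))
    return samples
```

Comparing the sample mean and sample variance of `hw_vol_samples` with `hw_vol_moments` for a few values of `mu` and `xi` makes the convergence and divergence regimes described above easy to see.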

Figure 1 Implied volatility as a function of strike. (Horizontal axis: moneyness from 0.70 to 1.30; vertical axis: implied volatility from 14.5% to 17.0%.)

Figure 2 Volatility smile for various correlation levels. (Horizontal axis: moneyness from 0.70 to 1.30; vertical axis: implied volatility from 10% to 20%; curves for correlations of 30%, 0%, −30%, and −70%.)

When calibrating the model to the market, we are very likely to find µ ≤ 0, since the variance of volatility is bounded. Hence, the expectation of volatility converges to either zero or its initial value.

Implied Volatility Smile

In Figure 1, we show implied volatility as a function of moneyness (strike divided by forward). The call option price has been computed using the Taylor expansion of equation (7) with σ0 = 0.15, µ = 0, ξ = 0, and r = 0. Compared to the Hull–White model, the Black–Scholes model overprices at-the-money options and underprices in- and out-of-the-money options.

Correlation between Stock Returns and Changes in Volatility

By introducing correlation between stock and variance Gaussian increments, Hull and White incorporate explicitly a cause of the volatility skew: the leverage effect. Even if they do not provide any analytic formula in the correlated case, one can still analyze the impact of correlation through numerical simulation.

As shown in Figure 2, this correlation has a huge impact since it transforms the smile into a skew. In order to fit market data, it is crucial to correctly set the correlation parameter.

One drawback of the Hull–White model is the lack of mean-reverting behavior in the volatility process (see Stochastic Volatility Models).

References

[1] Hull, J. & White, A. (1987). The pricing of options on assets with stochastic volatilities, The Journal of Finance 42(2), 281–300.
[2] Hull, J. & White, A. (1988). An analysis of the bias in option pricing caused by a stochastic volatility, Advances in Futures and Options Research 3, 27–61.

Related Articles

Heavy Tails; Heston Model; Implied Volatility in Stochastic Volatility Models; Implied Volatility Surface; Partial Differential Equations; Stochastic Volatility Models; Stylized Properties of Asset Returns.

PIERRE GAUTHIER & PIERRE-YVES H. RIVAILLE
Tempered Stable Process

A tempered stable process is a pure-jump Lévy process (see Lévy Processes) with infinite activity (see Exponential Lévy Models) whose small jumps behave like a stable process, while the large jumps are "tempered" so that the tail of the density decays exponentially. Tempered stable processes can be constructed from stable processes by exponential tilting (see Esscher Transform) of the Lévy measure. Tempered stable processes were introduced in [8] and brought into financial modeling by Cont et al. [4] under the name truncated stable process, where it was noted that tempered stable processes have a short-time behavior similar to stable Lévy processes while retaining finite variance and finite exponential moments. Option pricing with tempered stable processes was studied in [1], [2], and [5].

The best known example of tempered stable processes is the CGMY process introduced in [2], which is a pure-jump Lévy process with Lévy density given by

$$k_{CGMY}(x) = C\,\frac{\exp(-G|x|)}{|x|^{1+Y}}\,\mathbf{1}_{\{x<0\}} + C\,\frac{\exp(-M|x|)}{|x|^{1+Y}}\,\mathbf{1}_{\{x>0\}} \tag{1}$$

The model parameters of equation (1) fulfill $C > 0$, $G, M \ge 0$, and $Y \in (-\infty, 2)$. The restriction on the parameter $Y$ ensures that the measure is a Lévy measure.

For a given stochastic process $X(t)$, its characteristic function is given by $\varphi(u, t) = \mathbb{E}[\exp(iuX(t))]$ (see Fourier Transform; Fourier Methods in Options Pricing). For the CGMY model, it is derived in [2] and it is given by

$$\varphi(u, t) = \exp\Big(tC\,\Gamma(-Y)\big((M - iu)^Y - M^Y + (G + iu)^Y - G^Y\big)\Big) \tag{2}$$

On this basis, Fourier-transform methods (see Fourier Transform; Fourier Methods in Options Pricing) can be applied to option pricing. Carr et al. show that the CGMY process has completely monotone Lévy density for $Y > -1$ and is of infinite activity for $Y > 0$. The drift parameter is chosen to make $S(t)$ into a martingale and can be determined by using equation (2), which leads to $\omega = -\ln(\varphi(-i))$. Further methods to compute an equivalent martingale measure are discussed in [7].

Tempered Stable

The CGMY process is a special case of the tempered stable process considered in Boyarchenko and Levendorskii [1], Cont and Tankov [5] or Rosinski [12]. The latter process has Lévy measure with density given by

$$k_{TS}(x) = C_-\,\frac{\exp(-G|x|)}{|x|^{1+Y_-}}\,\mathbf{1}_{\{x<0\}} + C_+\,\frac{\exp(-M|x|)}{|x|^{1+Y_+}}\,\mathbf{1}_{\{x>0\}} \tag{3}$$

The parameters of equation (3) fulfill $G, M > 0$, $C_\pm > 0$, and $Y_\pm \in (-\infty, 2)$. The characteristic function is available in closed form and hence option pricing and calibration can be performed using Fourier-transform methods. Choosing $C_- = C_+$ and $Y_- = Y_+$ leads to the CGMY process and $Y_- = Y_+ = 0$ leads to a variance gamma process (see Variance-gamma Model).

Interpretation of the Parameters

In order to show the impact of the model parameters on asset returns, we consider the properties of the process $X(t)$. Increasing $C$ makes the density more peaked while decreasing $C$ flattens it. $C$ controls the frequency of jumps, and it enters the probability of jumps larger than a certain level. The parameter $Y$ governs the fine structure of the process and the choice affects the overall properties of the process as explained in the previous section. It determines if the process is of finite or infinite activity.

The parameters $G$ and $M$ control the rate of exponential decay, that is, the tail behavior, on the left and the right of $k_{CGMY}$, respectively. We consider three cases: $G = M$ leads to a symmetric Lévy measure, $G < M$ makes the left tail heavier than the right one, and vice versa for the case $G > M$. The last two cases lead to a skewed distribution.

This behavior is illustrated in Figure 1.

Figure 1 Illustration of the effect of changing the CGMY model parameters C, G, M, and Y on the probability density function. (Four panels over the range −1 to 1: "Changing G", "Changing C", and "Changing M" each compare the base case with the doubled and halved parameter; "Changing Y" compares the base case with Y increased and decreased by 0.5.)

If variance, skewness, and kurtosis exist, they can be computed by

$$\text{Variance} = C\,\Gamma(2 - Y)\left(\frac{1}{M^{2-Y}} + \frac{1}{G^{2-Y}}\right) \tag{4}$$

$$\text{Skewness} = \frac{C\,\Gamma(3 - Y)\left(\dfrac{1}{M^{3-Y}} - \dfrac{1}{G^{3-Y}}\right)}{V^{3/2}} \tag{5}$$

$$\text{Kurtosis} = \frac{C\,\Gamma(4 - Y)\left(\dfrac{1}{M^{4-Y}} + \dfrac{1}{G^{4-Y}}\right)}{V^{2}} \tag{6}$$

where $V$ denotes the variance from equation (4). The sign in equation (5) makes the skewness vanish for $G = M$ and negative for $G < M$, in line with the tail behavior described above.

The equations for the higher moments suggest that the parameter $C$ controls the overall size of the moments. This has already been verified by the expression for the density. In the case $\int_{\mathbb{R}} k(x)\,dx < +\infty$, it can be interpreted as a measure for the overall level of activity. In the case of finite activity, the process has a finite number of jumps on every compact interval.

Pricing and Calibration

We can use the characteristic function of the log-price $X(t)$ from equation (2) to apply Fourier methods described in Carr and Madan [3] or Eberlein et al. [6] to price European and path-dependent options (see Fourier Methods in Options Pricing).

Options may also be priced by Monte Carlo simulation [10, 11] using the representation of the tempered stable process as a subordinated Brownian motion [5, Proposition 4.1].

In contrast to the diffusion processes, for the pure-jump processes, the change in measure can be computed from the statistical measure, that is, using parameters computed from time-series data, and the risk-neutral measure, that is, using parameters obtained from quoted option prices. It holds that $\tilde{k}(x) = Y(x)k(x)$; see [2] for details. Let us call the corresponding parameter sets $P = \{C, G, M, Y, \mu\}$ and $\tilde{P} = \{\tilde{C}, \tilde{G}, \tilde{M}, \tilde{Y}, r\}$, where $r$ denotes the riskless rate, and the corresponding measures $\mathbb{P}$ and $\tilde{\mathbb{P}}$, respectively. If the characteristic functions are denoted by $\varphi$ and
$\tilde{\varphi}$, then the results in [2] state that $\tilde{\mathbb{P}}$ is an equivalent martingale measure to $\mathbb{P}$ if and only if $C = \tilde{C}$, $Y = \tilde{Y}$, and $r - \ln\tilde{\varphi}(-i) = \mu - \ln\varphi(-i)$. The constraints on the parameters $G$, $M$, $\tilde{G}$, and $\tilde{M}$ are implicit in the last equality.

Path Properties

Path properties of the model affect the prices of exotic path-dependent options. We considered path variation when we gave the interpretation for the model parameters. Other concepts like hitting points, creeping, or regularity of the half-line are considered in [9]. We briefly introduce hitting points. The process $X_t$ can hit a point $x \in \mathbb{R}$ if $\mathbb{P}(X_t = x \text{ for at least one } t > 0) > 0$. We denote the set of all points the process can hit by

$$H = \{x \in \mathbb{R} \mid \mathbb{P}(X_t = x \text{ for at least one } t > 0) > 0\}.$$

See [9] for details.

References

[1] Boyarchenko, S.I. & Levendorskii, S.Z. (2002). Non-Gaussian Merton-Black-Scholes theory, Advanced Series on Statistical Science and Applied Probability, World Scientific, River Edge, NJ, Vol. 9.
[2] Carr, P., Geman, H., Madan, D. & Yor, M. (2002). The fine structure of asset returns: an empirical investigation, Journal of Business 75(2), 305–332.
[3] Carr, P. & Madan, D. (1999). Option valuation using the fast Fourier transform, Journal of Computational Finance 2(4), 61–73.
[4] Cont, R., Potters, M. & Bouchaud, J.P. (1997). Scaling in stock market data: stable laws and beyond, in Scale Invariance and Beyond, B. Dubrulle, F. Graner & D. Sornette, eds, Springer.
[5] Cont, R. & Tankov, P. (2003). Financial Modelling with Jump Processes, Chapman and Hall / CRC Press.
[6] Eberlein, E., Glau, K. & Papapantoleon, A. (2008). Analysis of Valuation Formulae and Applications to Exotic Options. Preprint Uni Freiburg, www.stochastic.uni-freiburg.de/∼eberlein/papers/Eberlein-glau.Papapan.pdf
[7] Kim, Y.S. & Lee, J.H. (2007). The relative entropy in CGMY processes and its applications to finance, Mathematical Methods of Operations Research 66(2), 327–338.
[8] Koponen, I. (1995). Analytic approach to the problem of convergence of truncated Lévy flights towards the Gaussian stochastic process, Physical Review E 52, 1197–1199.
[9] Kyprianou, A.E. & Loeffen, T.L. (2005). Lévy processes in finance distinguished by their coarse and fine path properties, in Exotic Option Pricing and Advanced Lévy Models, A.E. Kyprianou, W. Schoutens & P. Wilmott, eds, Wiley, Chichester.
[10] Madan, D. & Yor, M. (2005). CGMY and Meixner Subordinators are Absolutely Continuous with Respect to One Sided Stable Subordinators. Prépublication du Laboratoire de Probabilités et Modèles Aléatoires.
[11] Poirot, J. & Tankov, P. (2006). Monte Carlo option pricing for tempered stable (CGMY) processes, Asia Pacific Financial Markets 13(4), 327–344.
[12] Rosinski, J. (2007). Tempering stable processes, Stochastic Processes and their Applications 117(6), 677–707.

Related Articles

Exponential Lévy Models; Fourier Methods in Options Pricing; Fourier Transform; Lévy Processes; Time-changed Lévy Process.

JÖRG KIENITZ
Lognormal Mixture Diffusion Model

Let us denote the time-$t$ price of a given financial asset by $S(t)$, equivalently $S_t$. We say that $S$ evolves according to a local-volatility model (see also Local Volatility Model) if, under the risk-neutral measure,

$$dS(t) = \mu S(t)\,dt + \sigma(t, S(t))\,S(t)\,dW(t), \qquad S(0) = S_0 \tag{1}$$

where $S_0$ is a positive constant, $W$ is a standard Brownian motion, $\sigma$ is a well-behaved deterministic function, and $\mu$ is the risk-neutral drift rate, which is assumed to be constant. For instance, in case of a stock paying a continuous dividend yield $q$, $\mu = r - q$, where $r$ is the (assumed constant) continuously compounded risk-free rate.

Brigo and Mercurio [1–3] find an explicit expression for the function $\sigma$ such that the resulting process has a density that, at each time, is given by a mixture of lognormal densities. Their result is briefly reviewed in the following.

Let us consider $N$ functions $\sigma_i$ that are deterministic and bounded from above and below by positive constants, and corresponding lognormal densities

$$p_t^i(y) = \frac{1}{y V_i(t)\sqrt{2\pi}}\exp\left\{-\frac{1}{2V_i^2(t)}\left[\ln\frac{y}{S_0} - \mu t + \frac{1}{2}V_i^2(t)\right]^2\right\} \tag{2}$$

$$V_i(t) := \sqrt{\int_0^t \sigma_i^2(u)\,du} \tag{3}$$

Proposition 1 Let us assume that each $\sigma_i$ is also continuous and that there exists an $\varepsilon > 0$ such that $\sigma_i(t) = \sigma_0 > 0$, for each $t$ in $[0, \varepsilon]$ and $i = 1, \ldots, N$. Then, if we set

$$\nu(t, y) = \sqrt{\frac{\displaystyle\sum_{i=1}^{N}\lambda_i\,\sigma_i^2(t)\,\frac{1}{V_i(t)}\exp\left\{-\frac{1}{2V_i^2(t)}\left[\ln\frac{y}{S_0} - \mu t + \frac{1}{2}V_i^2(t)\right]^2\right\}}{\displaystyle\sum_{i=1}^{N}\lambda_i\,\frac{1}{V_i(t)}\exp\left\{-\frac{1}{2V_i^2(t)}\left[\ln\frac{y}{S_0} - \mu t + \frac{1}{2}V_i^2(t)\right]^2\right\}}} \tag{4}$$

for $(t, y) > (0, 0)$ and $\nu(t, y) = \sigma_0$ for $(t, y) = (0, S_0)$, the SDE

$$dS_t = \mu S_t\,dt + \nu(t, S_t)\,S_t\,dW_t \tag{5}$$

has a unique strong solution whose marginal density is given by the mixture of lognormals

$$p_t(y) = \sum_{i=1}^{N}\lambda_i\,\frac{1}{y V_i(t)\sqrt{2\pi}}\exp\left\{-\frac{1}{2V_i^2(t)}\left[\ln\frac{y}{S_0} - \mu t + \frac{1}{2}V_i^2(t)\right]^2\right\} \tag{6}$$

Moreover, for $(t, y) > (0, 0)$, we can write $\nu^2(t, y) = \sum_{i=1}^{N}\Lambda_i(t, y)\,\sigma_i^2(t)$, where, for each $(t, y)$ and $i$, $\Lambda_i(t, y) \ge 0$ and $\sum_{i=1}^{N}\Lambda_i(t, y) = 1$. As a consequence, for each $t, y > 0$,

$$0 < \tilde{\sigma} := \inf_{t \ge 0}\,\min_{i=1,\ldots,N}\sigma_i(t) \le \nu(t, y) \le \hat{\sigma} := \sup_{t \ge 0}\,\max_{i=1,\ldots,N}\sigma_i(t) < +\infty \tag{7}$$

A proof of this proposition can be found in [2], and more formally in [5].

The pricing of European options under the lognormal-mixture local-volatility model is quite straightforward (see also Risk-neutral Pricing; Black–Scholes Formula).

Proposition 2 Consider a European option with maturity $T$, strike $K$, and written on the asset. The option value at the initial time $t = 0$ is given by the following convex combination of Black–Scholes prices:

$$\pi(K, T) = \omega P(0, T)\sum_{i=1}^{N}\lambda_i\left[S_0 e^{\mu T}\,\Phi\!\left(\omega\,\frac{\ln\frac{S_0}{K} + \left(\mu + \frac{1}{2}\eta_i^2\right)T}{\eta_i\sqrt{T}}\right) - K\,\Phi\!\left(\omega\,\frac{\ln\frac{S_0}{K} + \left(\mu - \frac{1}{2}\eta_i^2\right)T}{\eta_i\sqrt{T}}\right)\right] \tag{8}$$

where $P(0, T)$ is the discount factor for maturity $T$, $\Phi$ is the normal cumulative distribution function, $\omega = 1$ for a call and $\omega = -1$ for a put, and

$$\eta_i := \frac{V_i(T)}{\sqrt{T}} = \sqrt{\frac{\int_0^T \sigma_i^2(t)\,dt}{T}} \tag{9}$$

The main advantage of the lognormal-mixture local-volatility model is its tractability (explicit marginal density and option prices). This model can be successfully used in practice to calibrate smile-shaped implied volatility structures. Extensions allowing for nonzero slopes at the at-the-money level are introduced in [4].

References

[1] Brigo, D. & Mercurio, F. (2000). A mixed-up smile, Risk September, 123–126.
[2] Brigo, D. & Mercurio, F. (2001). Displaced and mixture diffusions for analytically-tractable smile models, in Mathematical Finance—Bachelier Congress 2000, H. Geman, D.B. Madan, S.R. Pliska & A.C.F. Vorst, eds, Springer Finance, Springer, Berlin, Heidelberg, New York.
[3] Brigo, D. & Mercurio, F. (2002). Lognormal-mixture dynamics and calibration to market volatility smiles, International Journal of Theoretical and Applied Finance 5(4), 427–446.
[4] Brigo, D., Mercurio, F. & Sartorelli, G. (2003). Alternative asset-price dynamics and volatility smile, Quantitative Finance 3(3), 173–183.
[5] Sartorelli, G. (2004). Density Mixture Ito Processes. PhD thesis, Scuola Normale Superiore di Pisa.

FABIO MERCURIO
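As a supplement to Proposition 2 of the lognormal mixture article above, the pricing formula (8) can be sketched as a weighted sum of Black–Scholes-type terms. This is a minimal illustration, not code from the cited papers: the function names are hypothetical, the mixture weights are assumed to be positive and to sum to one, and each `eta` is the average volatility defined in (9).

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mixture_price(S0, K, T, mu, discount, weights, etas, omega=1):
    """Option value from equation (8); omega = 1 for a call, -1 for a put."""
    total = 0.0
    for lam, eta in zip(weights, etas):
        d1 = (math.log(S0 / K) + (mu + 0.5 * eta ** 2) * T) / (eta * math.sqrt(T))
        d2 = d1 - eta * math.sqrt(T)  # same numerator with mu - eta^2 / 2
        total += lam * (S0 * math.exp(mu * T) * norm_cdf(omega * d1)
                        - K * norm_cdf(omega * d2))
    return omega * discount * total
```

Because every component shares the same forward, call and put values from `mixture_price` satisfy put-call parity, which gives a quick sanity check when experimenting with different weights and volatilities.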
Normal Inverse Gaussian Model

The normal inverse Gaussian (NIG) process is an example of a Lévy process (see Lévy Processes) with no Brownian component.

We first discuss the NIG distribution and its main properties. The NIG process can be constructed either as a process with NIG increments or, alternatively, via a random time change of Brownian motion using the inverse Gaussian process to determine time. Further, we present the NIG market model and show how one can price European options under this model. Option pricing can be done using the NIG density function, the NIG Lévy characteristics, or the NIG characteristic function.

The Normal Inverse Gaussian Distribution

The NIG distribution with parameters $\alpha > 0$, $-\alpha < \beta < \alpha$, and $\delta > 0$ has characteristic function (see [1])

$$\phi(u; \alpha, \beta, \delta) = \exp\left\{-\delta\left(\sqrt{\alpha^2 - (\beta + iu)^2} - \sqrt{\alpha^2 - \beta^2}\right)\right\} \tag{1}$$

We shall denote this distribution by NIG$(\alpha, \beta, \delta)$. The distribution is so named due to the fact that NIG$(\alpha, \beta, \delta)$ is a variance–mean mixture of a normal distribution with the inverse Gaussian as the mixing distribution. It follows immediately from expression (1) that this distribution is infinitely divisible. The distribution is defined on the whole real line and has the density function

$$f(x; \alpha, \beta, \delta) = \frac{\alpha\delta}{\pi}\exp\left\{\delta\sqrt{\alpha^2 - \beta^2} + \beta x\right\}\frac{K_1\!\left(\alpha\sqrt{\delta^2 + x^2}\right)}{\sqrt{\delta^2 + x^2}}, \qquad x \in \mathbb{R} \tag{2}$$

where $K_1$ is the modified Bessel function of the third kind with index 1. If a random variable $X$ follows an NIG$(\alpha, \beta, \delta)$ distribution and $c > 0$, then $cX$ is NIG$(\alpha/c, \beta/c, c\delta)$-distributed. Further, if $X \sim$ NIG$(\alpha, \beta, \delta_1)$ is independent of $Y \sim$ NIG$(\alpha, \beta, \delta_2)$, then $X + Y \sim$ NIG$(\alpha, \beta, \delta_1 + \delta_2)$. If $\beta = 0$, the distribution is symmetric. This can easily be seen from the characteristics of the NIG distribution given in Table 1.

Note that the NIG distribution is a special case of the generalized hyperbolic distribution, and it can approximate most hyperbolic distributions very closely. In modeling, the NIG distribution can describe observations with considerably heavier tail behavior than the log linear rate of decrease that characterizes the hyperbolic shape (see [2]). The NIG distribution has semiheavy tails (see [3])

$$f(x; \alpha, \beta, \delta) \sim \text{const.}\,|x|^{-3/2}\exp\{-\alpha|x| + \beta x\}, \qquad x \to \pm\infty$$

The Normal Inverse Gaussian Process

We define the NIG process

$$X^{(NIG)} = \{X_t^{(NIG)},\ t \ge 0\} \tag{3}$$

as the Lévy process with stationary and independent NIG-distributed increments, where $X_0^{(NIG)} = 0$ with probability 1. To be precise, $X_t^{(NIG)}$ follows a NIG$(\alpha, \beta, \delta t)$ law.

The Lévy measure of the NIG process is given by

$$\nu_{NIG}(dx) = \delta\alpha\pi^{-1}\exp\{\beta x\}\,K_1(\alpha|x|)\,|x|^{-1}\,dx \tag{4}$$

An NIG process has no Brownian component and its Lévy triplet is given by $[\gamma, 0, \nu_{NIG}(dx)]$, where

$$\gamma = 2\delta\alpha\pi^{-1}\int_0^1 \sinh(\beta x)\,K_1(\alpha x)\,dx \tag{5}$$

The NIG Lévy process may alternatively be represented via a random time change of Brownian motion, using the inverse Gaussian (IG) process to determine time, as

$$X_t^{NIG} = \beta\delta^2 I_t + \delta W_{I_t} \tag{6}$$

where $W = \{W_t,\ t \ge 0\}$ is a standard Brownian motion and $I = \{I_t,\ t \ge 0\}$ is an IG process with parameters $a = 1$ and $b = \delta\sqrt{\alpha^2 - \beta^2}$.
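The time-change representation (6) translates directly into a sampling recipe: draw the inverse Gaussian time $I_t$, then draw a normal increment with variance $I_t$. The sketch below is illustrative and rests on labeled assumptions: the function names are hypothetical, the IG process with parameters $a$ and $b$ is assumed to satisfy $I_t \sim$ IG$(at, b)$ with mean $at/b$ and variance $at/b^3$, and the inverse Gaussian draw uses the transformation-with-rejection method of Michael, Schucany and Haas, which is a choice made here rather than one prescribed by the article.

```python
import math
import random

def sample_ig(a, b, rng):
    """One draw from IG(a, b) with mean a/b and variance a/b^3 (assumed parametrization)."""
    mean, shape = a / b, a * a
    y = rng.gauss(0.0, 1.0) ** 2
    x = (mean + mean * mean * y / (2.0 * shape)
         - mean / (2.0 * shape) * math.sqrt(4.0 * mean * shape * y + (mean * y) ** 2))
    # Keep the smaller root with probability mean / (mean + x), else take the larger root.
    if rng.random() <= mean / (mean + x):
        return x
    return mean * mean / x

def sample_nig(alpha, beta, delta, t, rng):
    """One draw of X_t = beta * delta^2 * I_t + delta * W_{I_t}, as in equation (6)."""
    b = delta * math.sqrt(alpha ** 2 - beta ** 2)
    i_t = sample_ig(t, b, rng)  # I_t ~ IG(a * t, b) with a = 1
    return beta * delta ** 2 * i_t + delta * math.sqrt(i_t) * rng.gauss(0.0, 1.0)
```

A sample path on a time grid can be built by summing independent draws of `sample_nig` with `t` set to the grid spacing, since the increments of a Lévy process are stationary and independent.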

Table 1 Mean, variance, skewness, and kurtosis of the normal inverse Gaussian distribution NIG$(\alpha, \beta, \delta)$

Mean: $\delta\beta\,(\alpha^2 - \beta^2)^{-1/2}$
Variance: $\alpha^2\delta\,(\alpha^2 - \beta^2)^{-3/2}$
Skewness: $3\beta\alpha^{-1}\delta^{-1/2}(\alpha^2 - \beta^2)^{-1/4}$
Kurtosis: $3\left(1 + (\alpha^2 + 4\beta^2)\,\delta^{-1}\alpha^{-2}(\alpha^2 - \beta^2)^{-1/2}\right)$

The NIG Model

The NIG model belongs to the class of exponential Lévy models (see Time-changed Lévy Process). Consider a market with a riskless asset (the bond), with a price process given by $b_t = \exp\{rt\}$, and one risky asset (the stock or index). The model for the risky asset is

$$S_t = S_0\exp\{X_t^{(NIG)}\} \tag{7}$$

where the log returns $\log(S_{t+s}/S_t)$ follow the NIG$(\alpha, \beta, \delta s)$ distribution (i.e., the distribution of increments of length $s$ of the NIG process).

Equivalent Martingale Measure

Pricing financial derivatives requires that we work under an equivalent martingale measure. We present here two ways to attain equivalent martingale measures for the discounted price process $\{\exp(-(r - q)t)S_t,\ t \ge 0\}$, where $r$ is the risk-free continuously compounded interest rate and $q$ is the continuously compounded dividend yield.

One can find at least one equivalent martingale measure $Q$ using the Esscher transform (see [6]). For the NIG model, the Esscher transform equivalent martingale measure follows an NIG$(\alpha, \theta^* + \beta, \delta)$ law (see [12]), where $\theta^*$ is the solution of the equation

$$r - q = \delta\left(\sqrt{\alpha^2 - (\beta + \theta)^2} - \sqrt{\alpha^2 - (\beta + \theta + 1)^2}\right) \tag{8}$$

Another way to obtain an equivalent martingale measure $Q$ is by mean-correcting the exponential of the NIG$(\alpha, \beta, \delta)$ process, that is, introducing the distribution NIG$(\alpha, \beta, \delta, m)$ with the characteristic function

$$\tilde{\phi}(u; \alpha, \beta, \delta, m) = \phi(u; \alpha, \beta, \delta)\exp\{ium\} \tag{9}$$

where

$$m = r - q + \delta\left(\sqrt{\alpha^2 - (\beta + 1)^2} - \sqrt{\alpha^2 - \beta^2}\right) \tag{10}$$

and $\phi(u; \alpha, \beta, \delta)$ is defined by expression (1).

Pricing of European Options

Given our NIG market model, we focus now on the pricing of European options whose payoffs are functions of the terminal asset value only. Denote the payoff of the option at its time of expiry $T$ by $G(S_T)$ and let $F(X_T) = G(S_T)$.

Pricing through Density Function

For a European call option with strike price $K$ and time to expiration $T$, the value $V_0$ at time 0 is given by the expectation of the payoff under the martingale measure $Q$. If we take for $Q$ the Esscher transform equivalent martingale measure, the value at time 0 is given by

$$V_0 = \exp\{-qT\}\,S_0\int_c^\infty f_T^{(\theta^* + 1)}(x)\,dx - \exp\{-rT\}\,K\int_c^\infty f_T^{(\theta^*)}(x)\,dx \tag{11}$$

where $c = \ln(K/S_0)$ and $f_T^{(\theta^*)}(x)$ is the density function of the NIG$(\alpha, \theta^* + \beta, \delta)$ distribution. Similar formulas can be derived for other derivatives with a payoff function that depends only on the terminal value at time $t = T$.

Pricing through the Lévy Characteristics

Another way to find the value $V_t = V(t, X_t)$ at time $t$ is by solving a partial integro-differential
equation (see Partial Integro-differential Equations (PIDEs)). If $V(t, x) \in C^{1,2}$ then the function V(t, x) solves

$$rV(t,x) = \gamma\frac{\partial}{\partial x}V(t,x) + \frac{\partial}{\partial t}V(t,x) + \int_{-\infty}^{+\infty}\left(V(t,x+y) - V(t,x) - y\frac{\partial}{\partial x}V(t,x)\right)\nu^Q(dy)$$

$$V(T,x) = F(x) \tag{12}$$

where $[\gamma, 0, \nu^Q(dy)]$ is the Lévy triplet of the NIG process under the risk-neutral measure Q.

Pricing through the Characteristic Functions

Pricing can also be done by using the characteristic function [4] (see Fourier Transform). Let α be a positive constant such that the αth moment of the stock price exists, then the value of the option is given by

$$V_0 = \frac{\exp\{-\alpha\log(K)\}}{\pi}\int_0^{+\infty}\exp\{-iv\log(K)\}\,\varrho(v)\,dv \tag{13}$$

where

$$\varrho(v) = \frac{\exp\{-rT\}\,E\left[\exp\{i(v - (\alpha+1)i)\log(S_T)\}\right]}{\alpha^2 + \alpha - v^2 + i(2\alpha+1)v} = \frac{\exp\{-rT\}\,\phi(v - (\alpha+1)i)}{\alpha^2 + \alpha - v^2 + i(2\alpha+1)v} \tag{14}$$

and

$$\phi(u) = E\left[\exp\{iu\log(S_T)\}\right] \tag{15}$$

Other methods for the valuation of European options by applying characteristic functions can be found in [7] and [8].

Monte Carlo Simulations

One can make use of the representation (6) to simulate an NIG process. In such a way, a sample path of the NIG is obtained by sampling a standard Brownian motion and an IG process. We refer to [5] for the details on the generation of an IG random number.

Origin

The NIG distribution was introduced in [1]. The potential applicability of the NIG distribution and Lévy process for the modeling and analysis of statistical data from turbulence and finance is discussed in [2] and [3]. See also [9–11] for the application of the NIG distribution in modeling logarithmic asset returns.

References

[1] Barndorff-Nielsen, O.E. (1995). Normal Inverse Gaussian Distributions and the Modelling of Stock Returns, Research Report No. 300, Department of Theoretical Statistics, Aarhus University.
[2] Barndorff-Nielsen, O.E. (1997). Normal inverse Gaussian distributions and stochastic volatility modelling, Scandinavian Journal of Statistics 24(1), 1–13.
[3] Barndorff-Nielsen, O.E. (1998). Processes of normal inverse Gaussian type, Finance and Stochastics 2, 41–68.
[4] Carr, P. & Madan, D. (1998). Option valuation using the fast Fourier transform, Journal of Computational Finance 2, 61–73.
[5] Devroye, L. (1986). Non-Uniform Random Variate Generation, Springer.
[6] Gerber, H.U. & Shiu, E.S.W. (1994). Option pricing by Esscher-transforms, Transactions of the Society of Actuaries 46, 99–191.
[7] Lee, R.W. (2004). Option pricing by transform methods: extensions, unification, and error control, Journal of Computational Finance 7(3), 50–86.
[8] Raible, S. (2000). Lévy Processes in Finance: Theory, Numerics, and Empirical Facts. PhD thesis, University of Freiburg, Freiburg.
[9] Rydberg, T. (1996). The Normal Inverse Gaussian Lévy Process: Simulations and Approximation, Research Report No. 344, Department of Theoretical Statistics, Aarhus University.
[10] Rydberg, T. (1996). Generalized Hyperbolic Diffusions with Applications Towards Finance. Research Report No. 342, Department of Theoretical Statistics, Aarhus University.
[11] Rydberg, T. (1997). A note on the existence of unique equivalent martingale measures in a Markovian setting, Finance and Stochastics 1, 251–257.
[12] Schoutens, W. (2003). Lévy Processes in Finance: Pricing Financial Derivatives, John Wiley & Sons, Chichester.

Related Articles

Exponential Lévy Models; Fourier Transform; Partial Integro-differential Equations (PIDEs).

HENRIK JÖNSSON, VIKTORIYA MASOL & WIM SCHOUTENS
Generalized Hyperbolic Models

Generalized hyperbolic (GH) Lévy motions constitute a subclass of Lévy processes that are generated by GH distributions. GH distributions were introduced in Barndorff-Nielsen [1] in connection with a project with geologists. The Lebesgue density of this five-parameter class can be given in the following form:

$$d_{GH(\lambda,\alpha,\beta,\delta,\mu)}(x) = a(\lambda,\alpha,\beta,\delta,\mu)\,\left(\delta^2 + (x-\mu)^2\right)^{(\lambda - \frac{1}{2})/2}\,K_{\lambda-\frac{1}{2}}\!\left(\alpha\sqrt{\delta^2 + (x-\mu)^2}\right)\exp(\beta(x-\mu)) \tag{1}$$

with the norming constant

$$a(\lambda,\alpha,\beta,\delta,\mu) = \frac{(\alpha^2-\beta^2)^{\lambda/2}}{\sqrt{2\pi}\,\alpha^{\lambda-\frac{1}{2}}\,\delta^{\lambda}\,K_{\lambda}\!\left(\delta\sqrt{\alpha^2-\beta^2}\right)} \tag{2}$$

$K_\nu$ denotes the modified Bessel function of the third kind with index ν. The parameters can be interpreted as follows: α > 0 determines the shape, β with 0 ≤ |β| < α the skewness and µ ∈ ℝ the location. δ > 0 serves for scaling, and λ ∈ ℝ characterizes subclasses. It is essentially the weight in the tails that changes with λ. There are two alternative parameterizations that are scale- and location-invariant, that is, they do not change under affine transformations Y = aX + b for a ≠ 0, namely, $\zeta = \delta\sqrt{\alpha^2-\beta^2}$, ρ = β/α and $\xi = (1+\zeta)^{-1/2}$, χ = ξρ. Since 0 ≤ |χ| < ξ < 1, for a fixed λ the distributions parameterized by χ and ξ can be represented by the points of a triangle, the so-called shape triangle.

GH distributions arise in a natural way as variance–mean mixtures of normal distributions. Let $d_{GIG}$ denote the density of a generalized inverse Gaussian distribution (see Normal Inverse Gaussian Model) with parameters δ > 0, γ > 0, and λ ∈ ℝ, that is,

$$d_{GIG(\lambda,\delta,\gamma)}(x) = \left(\frac{\gamma}{\delta}\right)^{\lambda}\frac{1}{2K_{\lambda}(\delta\gamma)}\,x^{\lambda-1}\exp\left(-\frac{1}{2}\left(\frac{\delta^2}{x} + \gamma^2 x\right)\right)\mathbf{1}_{\{x>0\}} \tag{3}$$

Then if N(µ + βy, y) denotes a normal distribution with mean µ + βy and variance y, one can easily verify that

$$d_{GH(\lambda,\alpha,\beta,\delta,\mu)}(x) = \int_0^{\infty} d_{N(\mu+\beta y,\,y)}(x)\; d_{GIG(\lambda,\delta,\sqrt{\alpha^2-\beta^2})}(y)\,dy \tag{4}$$

Using maximum likelihood estimation, one can fit GH distributions to empirical return distributions from financial time series such as the daily stock or index prices. Figure 1 shows a fit to the daily closing prices of Telekom over a period of seven years. Figure 2 shows the same densities on a log scale in order to make the fit in the tails visible. One recognizes the hyperbolic shape of the GH density in comparison to the parabolic shape of the normal density. The characteristic function of the GH distribution is

$$\phi_{GH}(u) = e^{iu\mu}\left(\frac{\alpha^2-\beta^2}{\alpha^2-(\beta+iu)^2}\right)^{\lambda/2}\frac{K_{\lambda}\!\left(\delta\sqrt{\alpha^2-(\beta+iu)^2}\right)}{K_{\lambda}\!\left(\delta\sqrt{\alpha^2-\beta^2}\right)} \tag{5}$$

and expectation and variance are

$$E[GH] = \mu + \frac{\beta\delta^2}{\zeta}\frac{K_{\lambda+1}(\zeta)}{K_{\lambda}(\zeta)} \tag{6}$$

$$\operatorname{Var}(GH) = \frac{\delta^2}{\zeta}\frac{K_{\lambda+1}(\zeta)}{K_{\lambda}(\zeta)} + \frac{\beta^2\delta^4}{\zeta^2}\left(\frac{K_{\lambda+2}(\zeta)}{K_{\lambda}(\zeta)} - \frac{K_{\lambda+1}^2(\zeta)}{K_{\lambda}^2(\zeta)}\right) \tag{7}$$
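As a quick consistency check on (6) and (7), one can specialize to λ = −1/2, the index that the article identifies further on as the NIG subclass. The closed forms for Bessel functions of half-integer index used below are standard facts and are an assumption in the sense that they are not stated in this article. With µ = 0 the result should agree with the mean and variance listed in Table 1 of the Normal Inverse Gaussian Model article.

```latex
% Standard closed forms for half-integer indices (not taken from the article):
K_{-1/2}(z) = K_{1/2}(z) = \sqrt{\frac{\pi}{2z}}\, e^{-z}, \qquad
K_{3/2}(z) = \sqrt{\frac{\pi}{2z}}\, e^{-z}\left(1 + \frac{1}{z}\right)

% Hence, for \lambda = -1/2 and \zeta = \delta\sqrt{\alpha^2-\beta^2}:
\frac{K_{\lambda+1}(\zeta)}{K_{\lambda}(\zeta)} = 1, \qquad
\frac{K_{\lambda+2}(\zeta)}{K_{\lambda}(\zeta)} = 1 + \frac{1}{\zeta}

% Substituting into (6):
E[GH] = \mu + \frac{\beta\delta^2}{\zeta}
      = \mu + \frac{\beta\delta}{\sqrt{\alpha^2-\beta^2}}

% Substituting into (7):
\operatorname{Var}(GH)
  = \frac{\delta^2}{\zeta} + \frac{\beta^2\delta^4}{\zeta^2}\cdot\frac{1}{\zeta}
  = \frac{\delta}{\sqrt{\alpha^2-\beta^2}} + \frac{\beta^2\delta}{(\alpha^2-\beta^2)^{3/2}}
  = \frac{\alpha^2\delta}{(\alpha^2-\beta^2)^{3/2}}
```

Both expressions match the NIG entries δβ(α² − β²)^(−1/2) and α²δ(α² − β²)^(−3/2) when µ = 0.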
Figure 1  GH and normal density fitted to the daily Telekom returns (plot omitted; horizontal axis x, vertical axis densities, curves GH and Norm)

Figure 2  Fitted densities on a log scale (plot omitted; horizontal axis x, vertical axis log densities, curves GH and Norm)

The moment-generating function exists for all u such that −α − β < u < α − β. Therefore, moments of all orders are finite.

There are two important subclasses. For λ = 1, one gets the class of hyperbolic distributions with density

$$d_{H(\alpha,\beta,\delta,\mu)}(x) = \frac{\sqrt{\alpha^2-\beta^2}}{2\alpha\delta K_1\!\left(\delta\sqrt{\alpha^2-\beta^2}\right)}\exp\left(-\alpha\sqrt{\delta^2+(x-\mu)^2} + \beta(x-\mu)\right) \tag{8}$$

whereas for λ = −1/2 one gets the class of normal inverse Gaussian (NIG) distributions with density

$$d_{NIG(\alpha,\beta,\delta,\mu)}(x) = \frac{\alpha}{\pi}\exp\left(\delta\sqrt{\alpha^2-\beta^2} + \beta(x-\mu)\right)\frac{K_1\!\left(\alpha\delta\sqrt{1+\left(\frac{x-\mu}{\delta}\right)^2}\right)}{\sqrt{1+\left(\frac{x-\mu}{\delta}\right)^2}} \tag{9}$$

The latter one has a particularly simple characteristic function:

$$\phi_{NIG}(u) = e^{iu\mu}\,\frac{\exp\left(\delta\sqrt{\alpha^2-\beta^2}\right)}{\exp\left(\delta\sqrt{\alpha^2-(\beta+iu)^2}\right)} \tag{10}$$

Many well-known distributions are limit cases of the class of GH distributions. For λ > 0 and δ → 0, one gets a variance-gamma distribution; in the special case of λ = 1 the result is a skewed and shifted Laplace distribution. Other limit cases are the Cauchy and the Student-t distribution as well as the gamma, the reciprocal gamma, and the normal distributions [4].

Exponential Lévy Models

GH distributions are infinitely divisible and therefore generate a Lévy process $L = (L_t)_{t\ge 0}$ such that the distribution of $L_1$, $\mathcal{L}(L_1)$, is the given GH distribution. Analyzing the characteristic function in the Lévy–Khintchine form, one sees that the Lévy measure has an explicit density. There is no Gaussian component. Consequently the generated Lévy process is a process with purely discontinuous paths. The paths have infinite activity, which means that there are infinitely many jumps in any finite time interval (see Jump Processes; Exponential Lévy Models).

As a model for asset prices such as stock prices, indices, or foreign exchange rates, we take the exponential of the Lévy process L

$$S_t = S_0\exp L_t \tag{11}$$

For hyperbolic Lévy motions, this model was introduced in [6], NIG Lévy processes were considered in [2], and the extension to GH Lévy motions
appeared in [3, 8]. The log returns from this model taken along time intervals of length 1 are $L_t - L_{t-1}$ and therefore they have exactly the GH distribution that generates the Lévy process. It was shown in [7] that the model (11) is successful in producing empirically correct distributions on other time horizons as well. This time consistency property can, for example, be used to derive the correct VaR estimates on a two-week horizon according to the Basel II rules. Equation (11) can be expressed by the following stochastic differential equation:

$$dS_t = S_{t-}\left(dL_t + e^{\Delta L_t} - 1 - \Delta L_t\right) \tag{12}$$

The price of a European option with payoff $f(S_T)$ is

$$V = e^{-rT}E\left[f(S_T)\right] \tag{13}$$

where r is the interest rate and expectation is taken with respect to a risk-neutral (martingale) measure. As shown in [5], there are many equivalent martingale measures due to the rich structure of the driving process L. The simplest choice is the so-called Esscher transform, which was used in [6]. For the process L to be again a GH Lévy motion under an equivalent martingale measure (see Equivalent Martingale Measures), the parameters δ and µ have to be kept fixed [9]. Since the density of the distribution of $S_T$ can be derived via inversion of the characteristic function, the expectation in equation (13) can be computed directly. A numerically much more efficient method based on two-sided Laplace transforms, which is applicable to a wide variety of options, has been developed in [9]. Assume that $e^{-Rx}f(e^{-x})$ is bounded and integrable for some R such that the moment-generating function of $L_T$ is finite at −R. Write $g(x) = f(e^{-x})$ and $\psi_g(z) = \int_{\mathbb{R}} e^{-zx}g(x)\,dx$ for the bilateral Laplace transform of g. If $\zeta := -\log S_0$, then the option price V can be expressed in the following form:

$$V(\zeta) = \frac{e^{\zeta R - rT}}{2\pi}\int_{\mathbb{R}} e^{iu\zeta}\,\psi_g(R+iu)\,\phi_{L_T}(iR-u)\,du \tag{14}$$

whenever the integral exists. $\phi_{L_T}$ denotes the characteristic function of the distribution of $L_T$.

References

[1] Barndorff-Nielsen, O.E. (1977). Exponentially decreasing distributions for the logarithm of particle size, Proceedings of the Royal Society of London A 353, 401–419.
[2] Barndorff-Nielsen, O.E. (1998). Processes of normal inverse Gaussian type, Finance and Stochastics 2(1), 41–68.
[3] Eberlein, E. (2001). Application of generalized hyperbolic Lévy motions to finance, in Lévy Processes. Theory and Applications, O.E. Barndorff-Nielsen, T. Mikosch & S. Resnick, eds, Birkhäuser, pp. 319–336.
[4] Eberlein, E. & von Hammerstein, E.A. (2004). Generalized hyperbolic and inverse Gaussian distributions: limiting cases and approximation of processes, in Seminar on Stochastic Analysis, Random Fields and Applications IV, R.C. Dalang, M. Dozzi & F. Russo, eds, Progress in Probability, Birkhäuser, Vol. 58, 221–264.
[5] Eberlein, E. & Jacod, J. (1997). On the range of options prices, Finance and Stochastics 1, 131–140.
[6] Eberlein, E. & Keller, U. (1995). Hyperbolic distributions in finance, Bernoulli 1(3), 281–299.
[7] Eberlein, E. & Özkan, F. (2003). Time consistency of Lévy models, Quantitative Finance 3, 40–50.
[8] Eberlein, E. & Prause, K. (2002). The generalized hyperbolic model: financial derivatives and risk measures, in Mathematical Finance – Bachelier Congress, 2000, H. Geman, D. Madan, S. Pliska & T. Vorst, eds, Springer, Paris, pp. 245–267.
[9] Raible, S. (2000). Lévy Processes in Finance: Theory, Numerics, and Empirical Facts. Ph.D. thesis, University of Freiburg.

Related Articles

Exponential Lévy Models; Fourier Methods in Options Pricing; Heavy Tails; Implied Volatility Surface; Jump-diffusion Models; Normal Inverse Gaussian Model; Partial Integro-differential Equations (PIDEs); Stochastic Exponential; Stylized Properties of Asset Returns.

ERNST EBERLEIN
Regime-switching Models

Many financial time series exhibit sudden changes in the structure of the data-generating process. Examples include financial crises, exchange rate swings, and jumps in the volatility. Sometimes, this sudden switch is due to a change in policy, for example, when moving from a fixed to a floating exchange rate regime. In other cases, the behavior of the series is influenced by an exogenous fundamental variable, such as the current position in the business or credit cycle.

Regime-switching models attempt to capture this behavior by allowing the data-generating process to change in time, depending on an underlying, discrete but unobserved state variable. Typically, the functional form of the data-generating process remains the same across the different regimes with only the parameter values being state-dependent, as, for example, in a random walk equity return model where the drift and volatility change with the regime. However, it is feasible to set up models where the data-generating process itself changes, for example, moving from a deterministic fixed exchange rate to a stochastic floating one.

From a statistical point of view, regime-switching models will produce mixtures of distributions (see Mixture of Distribution Hypothesis), offering a very stylized and intuitive way of accommodating features such as fat tails, skewness, and volatility clustering (see Stylized Properties of Asset Returns). Regime-switching models are straightforward to calibrate on historical data using maximum likelihood techniques, implementing what can be thought of as a discrete version of the Kalman filter (see Filtering). Virtually all economic and financial time series have been analyzed under the regime-switching framework, including interest and exchange rates, equity returns, commodity prices, energy prices, and credit spreads.

Derivative prices can be computed for regime-switching models by using transform methods (see Hazard Rate). The characteristic function of a regime-switching process can be computed in closed form if the characteristic functions conditional on each regime are available. This makes such processes a viable alternative to the stochastic volatility models, where one has to also resort to transform methods for pricing.

The Regime-switching Framework

A regime-switching model can be cast in either a discrete or continuous time setting. The model is built conditional to a Markov chain s(t), the realization of which is not directly observed by economic agents. The chain can take a discrete set of values, and here we label them as s(t) ∈ {1, 2, . . . , N} = S. The Markov chain is determined by its transition probability matrix in discrete time or its rate matrix in continuous time. In particular, in a discrete time setting, we write the transition probabilities $p_{i,j}$

$$P[s(t+1) = j \mid s(t) = i] = p_{i,j} \tag{1}$$

and we collect the elements $\{p_{i,j}\}$ in the transition probability matrix P, placing $p_{i,j}$ in row j and column i so that the prediction step ξ(t + 1|t) = Pξ(t|t) used in the estimation section holds. With this arrangement, the columns of P must sum up to 1, and all transition probabilities must be nonnegative. For a continuous time process, we define the transition rates $q_{i,j}$:

$$P[s(t+dt) = j \mid s(t) = i] = 1(i = j) + q_{i,j}\,dt \tag{2}$$

where 1 is the indicator function. The elements $\{q_{i,j}\}$ are collected in the rate matrix Q. Note that by this definition, the columns of the rate matrix must sum up to 0, and the diagonal elements will be negative. The infinitesimal transition probability matrix is given as

$$P(dt) = I + Q\,dt \tag{3}$$

where I is the unit (N × N) matrix.

At each point in time, the data-generating process will vary, according to the regime s(t) that prevails at that time. Thus, for a discrete time process, we can write

$$x(t) = g[t, s(t), y(t), \varepsilon(t); \beta] \tag{4}$$

In the above expression, y(t) includes variables known at time t, including exogenous variables and lagged values of x(t), and ε(t) represents the error term. In continuous time, we can write the stochastic differential equation:

$$dx(t) = \mu[t, s(t), y(t); \beta]\,dt + \sigma[t, s(t), y(t); \beta]\,dB(t) \tag{5}$$

Again, y(t) can include exogenous variables and the history of x(t), while now B(t) is a standard Brownian motion. The above equation can be extended to a multidimensional setting and can be generalized to include jumps or Lévy processes.
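To make the link between the rate matrix Q and finite-horizon transition probabilities concrete, the sketch below compounds the infinitesimal matrix I + Q dt from (3) over many small steps. This is only an illustration under stated assumptions: the function names are hypothetical, regimes are indexed from 0 rather than 1, the column convention of the text is used (entry in row j, column i refers to a move from regime i to regime j), and the product is an approximation to the exact transition matrix rather than a closed form.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(Q, t, n_doublings=20):
    """Approximate the transition matrix over horizon t from the rate matrix Q.

    Column convention: Q[j][i] is the rate of moving from regime i to regime j,
    so every column of Q sums to 0. The horizon is split into 2**n_doublings
    steps of length dt, and (I + Q*dt) is raised to that power by squaring.
    """
    n = len(Q)
    for i in range(n):
        if abs(sum(Q[j][i] for j in range(n))) > 1e-12:
            raise ValueError("each column of Q must sum to 0")
    dt = t / 2 ** n_doublings
    # Infinitesimal transition matrix P(dt) = I + Q dt, as in (3).
    step = [[(1.0 if j == i else 0.0) + Q[j][i] * dt for i in range(n)]
            for j in range(n)]
    for _ in range(n_doublings):
        step = mat_mul(step, step)
    return step
```

For a two-regime chain with rate 0.5 from regime 0 to 1 and rate 2.0 back, `transition_matrix([[-0.5, 2.0], [0.5, -2.0]], 1.0)` returns a matrix whose columns sum to 1, consistent with the column convention used for P.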
A standard simple example is a regime-dependent random walk process, where

$$x(t) = \mu[s(t)] + \sigma[s(t)]\,\varepsilon(t) \quad \text{in discrete time}$$

$$dx(t) = \mu[s(t)]\,dt + \sigma[s(t)]\,dB(t) \quad \text{in continuous time} \tag{6}$$

The parameter set in this example is β = {µ(1), σ(1), µ(2), σ(2), . . . , µ(N), σ(N)}.

Estimation from Historical Data

Given a set of historical values, the parameter vector β can be calibrated using maximum likelihood. As data are available over a discrete time grid, we focus on the calibration of the discrete time model. We give the switching random walk model with Gaussian noise as an example, but it should be straightforward to generalize this to more complex structures.

We denote the conditional density of x(t) by $f_t(x|j) = P[x(t) \in dx \mid s(t) = j]$, given that the underlying Markov chain is in state j. In our example, this is a Gaussian density, with mean µ(j) and variance σ²(j). In addition, for future reference, we define the vector of conditional probabilities ξ(t|t′), with elements

$$\xi(t|t') = \{P[s(t) = j \mid \mathcal{F}(t')]\}_{j\in S} \tag{7}$$

An important component of the calibration procedure is the vector of filtered probabilities ξ(t|t). Given the parameter set β, the filtered probabilities can be computed based on the following recursion:

• Assume that the filtered probabilities are available up to time t.
• Compute the forecast probabilities ξ(t + 1|t) = Pξ(t|t).
• The Bayes theorem yields the filtered probability:

$$P[s(t+1) = j \mid \mathcal{F}(t+1)] = P[s(t+1) = j \mid x(t+1), \mathcal{F}(t)]$$

$$= P[x(t+1) \in dx \mid s(t+1) = j, \mathcal{F}(t)] \cdot \frac{P[s(t+1) = j \mid \mathcal{F}(t)]}{P[x(t+1) \in dx \mid \mathcal{F}(t)]}$$

$$= \frac{P[x(t+1) \in dx \mid s(t+1) = j, \mathcal{F}(t)] \cdot P[s(t+1) = j \mid \mathcal{F}(t)]}{\sum_{\ell\in S} P[x(t+1) \in dx \mid s(t+1) = \ell, \mathcal{F}(t)] \cdot P[s(t+1) = \ell \mid \mathcal{F}(t)]} \tag{8}$$

• The numerator of the above expression is the product of the conditional density with the forecast probability for each state.
• The denominator, which is the sum of all numerators computed in the previous step, is also the conditional density $P[x(t+1) \in dx \mid \mathcal{F}(t)]$. This is the likelihood function of the observation x(t + 1).

In the Kalman filtering terminology, the above computation can be compactly written in two steps:

$$\text{Prediction:}\quad \xi(t+1|t) = P\,\xi(t|t) \tag{9}$$

$$\text{Correction:}\quad \xi(t+1|t+1) = \frac{f(t+1) \odot \xi(t+1|t)}{\iota' \cdot \left(f(t+1) \odot \xi(t+1|t)\right)} \tag{10}$$

The vector f(t) above collects the conditional distributions, across all possible regimes. In the Gaussian regime-switching model, f(t) would have the following elements:

$$f(t) = \left\{\frac{1}{\sqrt{2\pi}\,\sigma(j)}\exp\left(-\frac{(x(t) - \mu(j))^2}{2\sigma^2(j)}\right)\right\}_{j\in S} \tag{11}$$

The symbol "⊙" denotes element-by-element multiplication, and ι is an (N × 1) vector of ones.

More details on estimation methods can be found in [18]. A small sample of the vast number of empirical applications that utilize the regime-switching framework is provided in [3, 15, 17, 19, 23, 27]. Generalizations include time-varying transition probabilities that depend on explanatory variables [14], switching generalized autoregressive conditionally heteroskedastic (GARCH)-type processes [16, 20], and models that resemble a multifractal setting [6]. A Bayesian alternative to maximum likelihood is pursued in [1].
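The prediction and correction steps (9) and (10), together with the Gaussian densities (11), can be sketched in a few lines of Python. This is an illustrative sketch rather than a reference implementation: the function names are hypothetical, regimes are indexed from 0, and P follows the column convention implied by (9), so that the entry in row j and column i is the probability of moving from regime i to regime j. The running sum of the logged denominators gives the log-likelihood that a maximum likelihood routine would maximize over β.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma**2) at x, the regime-conditional density in (11)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)


def hamilton_filter(xs, P, mu, sigma, xi0):
    """Filtered regime probabilities and log-likelihood for Gaussian regimes.

    xs    : observations x(1), ..., x(T)
    P     : transition matrix with P[j][i] = prob. of moving from i to j
    mu    : regime means, sigma: regime standard deviations
    xi0   : initial filtered probabilities xi(0|0)
    """
    n = len(mu)
    xi = list(xi0)
    filtered = []
    loglik = 0.0
    for x in xs:
        # Prediction step (9): xi(t+1|t) = P xi(t|t)
        pred = [sum(P[j][i] * xi[i] for i in range(n)) for j in range(n)]
        # Regime-conditional densities of the new observation, as in (11)
        dens = [gaussian_pdf(x, mu[j], sigma[j]) for j in range(n)]
        # Correction step (10): element-wise product, normalized by its sum
        joint = [dens[j] * pred[j] for j in range(n)]
        lik = sum(joint)  # conditional density of x(t+1) given the past
        xi = [v / lik for v in joint]
        filtered.append(xi)
        loglik += math.log(lik)
    return filtered, loglik
```

A call such as `hamilton_filter([0.1, 2.9, 3.2], [[0.9, 0.2], [0.1, 0.8]], [0.0, 3.0], [1.0, 1.0], [0.5, 0.5])` returns one probability vector per observation, each summing to one, plus the accumulated log-likelihood.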
Derivative Pricing under Regime Switching

Derivative pricing is typically carried out in a continuous time setting (see end note a). For a vanilla payoff with maturity T, say z(T) = h(x(T)), the time-zero price is given by the risk neutral expectation:

$$z(0) = E^Q[D(T)z(T)] \tag{12}$$

where D(t) is the discount factor.

In the regime-switching framework, pricing is routinely carried out using the Fourier inversion techniques (see Fourier Transform) outlined in [7]. In particular, if the log asset price x(T) follows a regime-switching Brownian motion

$$dx(t) = \mu(s(t))\,dt + \sigma(s(t))\,dB(t) \tag{13}$$

then the characteristic function φ(u; T) = E exp(iux(T)) is given by the matrix exponential

$$\varphi(u; T) = \iota'\exp(TA(u))\,\xi(0|0) \tag{14}$$

where the (N × N) matrix A(u) has the following form:

$$a_{i,j}(u) = \begin{cases} q_{i,i} + g(u; i) & \text{if } i = j \\ q_{i,j} & \text{if } i \ne j \end{cases} \tag{15}$$

for $g(u; i) = iu\mu(i) - \frac{1}{2}u^2\sigma^2(i)$.

The first implementation that prices options under a two-regime process is that of [26]. In a more general setting with N regimes, vanilla call option prices can be easily retrieved using the Fast Fourier Transform (FFT) approach of Carr and Madan [7] or the fractional variant that allows explicit control of the discretization grids [10].

The above prototypical process can be extended in two directions. Rather than having switching Brownian motions that generate the conditional paths, one can consider switching Lévy processes (see Lévy Processes; Exponential Lévy Models) between the regimes (see [24], for the special two-regime case, and [11], for a more general setting). In that case, the function g(u; i) in A(u) is replaced by the characteristic exponent of the Lévy process that is active in the ith regime. In addition, to introduce a correlation structure between the regime changes and the log-price changes, a jump in the log-price is introduced when the Markov chain switches. In that case, the off-diagonal elements of A(u) are multiplied by the characteristic function of the jump size.

Pricing of American options can be done by setting up the continuation region [5] or by employing a variant of Carr's randomization procedure [4]. More exotic products can be handled by setting up a system of partial (integro-)differential equations (see Partial Integro-differential Equations (PIDEs)) or by explicitly using Fourier methods as in [22]. As the conditional distribution can be recovered from the characteristic function numerically, the density-based approach of [2] (see Quadrature Methods) can be a viable alternative.

Regime Switching as an Approximation

Rather than serving as the fundamental latent process, the Markov chain can serve as an approximation to more complex jump-diffusive dynamics. Then, one can use the regime-switching framework to tackle problems in a nonaffine (see Affine Models) setting, both in terms of calibration and derivative pricing. To achieve that, the number of regimes must be large, but the transition rates and conditional dynamics will be functions of a small number of parameters. The book by Kushner and Dupuis [25] outlines the convergence conditions for the approximation of generic diffusions and shows how one can implement the Markov chain approximation in practice.

Following this approach, many stochastic volatility problems can be cast as regime-switching ones. Chourdakis [8] shows how a generic stochastic volatility process can be approximated in that way, whereas Chourdakis [9] extends this method to produce the counterpart of the Heston [21] stochastic volatility model (see Heston Model) in a regime-switching framework where the equity is driven by a Lévy noise.

End Notes

a. The treatments in References [12, 13] are exceptions to this.

References

[1] Albert, J. & Chib, S. (1993). Bayes inference via Gibbs sampling of autoregressive time series subject to Markov mean and variance shifts, Journal of Business and Economic Statistics 11, 1–15.
[2] Andricopoulos, A.D., Widdicks, M., Duck, P.W. & Newton, D.P. (2003). Universal option valuation using quadrature methods, Journal of Financial Economics 67, 447–471.
[3] Ang, A. & Bekaert, G. (2002). Regime switches in interest rates, Journal of Business and Economic Statistics 20(2), 163–182.
[4] Boyarchenko, S.I. & Levendorski, S.Z. (2006). American Options in Regime-switching Models. Manuscript available online at SSRN: 929215.
[5] Buffington, J. & Elliott, R.J. (2002). American options with regime switching, International Journal of Theoretical and Applied Finance 5, 497–514.
[6] Calvet, L. & Fisher, A. (2004). How to forecast long-run volatility: regime-switching and the estimation of multifractal processes, Journal of Financial Econometrics 2, 49–83.
[7] Carr, P. & Madan, D. (1999). Option valuation using the Fast Fourier Transform, Journal of Computational Finance 3, 463–520.
[8] Chourdakis, K. (2004). Non-affine option pricing, Journal of Derivatives 11(3), 10–25.
[9] Chourdakis, K. (2005). Lévy processes driven by stochastic volatility, Asia-Pacific Financial Markets 12, 333–352.
[10] Chourdakis, K. (2005b). Option pricing using the Fractional FFT, Journal of Computational Finance 8(2), 1–18.
[11] Chourdakis, K. (2005c). Switching Levy Models in Continuous Time: Finite Distributions and Option Pricing. Manuscript available online at SSRN: 838924.
[12] Chourdakis, K. & Tzavalis, E. (2000). Option Pricing Under Discrete Shifts in Stock Returns. Manuscript available online at SSRN: 252307.
[13] Duan, J.-C., Popova, I. & Ritchken, P. (1999). Option Pricing under Regime Switching. Technical report, Hong Kong University of Science and Technology.
[14] Filardo, A.J. (1994). Business-cycle phases and their transitional dynamics, Journal of Business and Economic Statistics 12, 299–308.
[15] Garcia, R., Luger, R. & Renault, E. (2003). Empirical assessment of an intertemporal option pricing model with latent variables, Journal of Econometrics 116, 49–83.
[16] Gray, S. (1996). Modeling the conditional distribution of interest rates as a regime-switching process, Journal of Financial Economics 42, 27–62.
[17] Hamilton, J.D. (1989). A new approach to the economic analysis of nonstationary time series and the business cycle, Econometrica 57, 357–384.
[18] Hamilton, J.D. (1994). Time Series Analysis, Princeton University Press, Princeton, NJ.
[19] Hamilton, J.D. (2005). What's real about the business cycle? Federal Reserve Bank of St. Louis Review 87(4), 435–452.
[20] Hamilton, J.D. & Susmel, R. (1994). Autoregressive conditional heteroscedasticity and changes in regime, Journal of Econometrics 64, 307–333.
[21] Heston, S.L. (1993). A closed-form solution for options with stochastic volatility with applications to bond and currency options, Review of Financial Studies 6, 327–344.
[22] Jackson, K.R., Jaimungal, S. & Surkov, V. (2007). Fourier Space Time-stepping for Option Pricing with Lévy Models. Manuscript available online at SSRN: 1020209.
[23] Jeanne, O. & Masson, P. (2000). Currency crises, sunspots, and Markov-switching regimes, Journal of International Economics 50, 327–350.
[24] Konikov, M. & Madan, D. (2002). Option pricing using Variance Gamma Markov chains, Review of Derivatives Research 5(1), 81–115.
[25] Kushner, H.J. & Dupuis, P.G. (2001). Numerical Methods for Stochastic Control Problems in Continuous Time, 2nd Edition, Applications of Mathematics, Springer Verlag, New York, NY, Vol. 24.
[26] Naik, V. (1993). Option valuation and hedging strategies with jumps in the volatility of asset returns, The Journal of Finance 48, 1969–1984.
[27] Weron, R., Bierbauer, M. & Trück, S. (2004). Modeling electricity prices: jump diffusion and regime switching, Physica A 336, 39–48.

Related Articles

Exponential Lévy Models; Filtering; Fourier Methods in Options Pricing; Fourier Transform; Monte Carlo Simulation; Stochastic Volatility Models; Stylized Properties of Asset Returns; Variance-gamma Model.

KYRIAKOS CHOURDAKIS
Variance-gamma Model reveal a kurtosis well in excess of 3, suggesting
that the modeling of returns should be done by a
symmetric distribution with heavier tails than the
normal (see Stylized Properties of Asset Returns;
The variance-gamma (VG) process is a stochas- Heavy Tails). For example, in 1972, Praetz [20]
tic process with independent stationary increments, argued in favor of variance-dilation of the nor-
which allows for flexible parameterization of skew- mal through variance-mixing and found that mix-
ness and kurtosis of increments. It has gained pop- ing according to X|W ∼ N(µ, σ 2 W ), where W has
ularity, especially in option pricing, because of its reciprocal (inverse) gamma PDF with ƐW = 1, gives
analytical tractability. It is an example of a pure-jump the scaled t-distribution symmetric about µ for the
Lévy process (see Lévy Processes). returns. This is a slight generalization of the classical
The VG model is derived from the (symmet- Student’s t-distribution, in that fractional degrees of
ric) VG probability distribution, which is so named freedom are permitted.
because it is the distribution of a random vari- Influenced by Praetz’s work, Madan and Seneta
able X that results from mixing a normal variable [16] took the distribution of the mixing variable W
on its variance by a gamma distribution. Specifically, the conditional distribution of $X$ is given by $X \mid W \sim N(\mu, \sigma^2 W)$, $\mu \in \mathbb{R}$, $\sigma > 0$, where $W \sim \Gamma(\alpha, \alpha)$, $\alpha > 0$. The symbol “$\sim$” stands for “is distributed as”. The symbol $\Gamma(\alpha, \lambda)$ indicates a gamma probability distribution, with probability density function (PDF), for $\alpha, \lambda > 0$,

\[ f(w; \alpha, \lambda) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, w^{\alpha-1} e^{-\lambda w}, \quad w > 0; \qquad = 0 \text{ elsewhere} \tag{1} \]

The choice $\lambda = \alpha$ implies that $\mathbb{E}(W) = 1$ and $\mathrm{Var}(W) = 1/\alpha$. Thus, $X$ is a random variable symmetrically distributed about its mean $\mathbb{E}(X) = \mu$, with $\mathrm{Var}(X) = \sigma^2$ and a simple characteristic function (CF), so that when $X$ is mean-corrected to $Y = X - \mu$, the CF is

\[ \mathbb{E}(e^{iuY}) = \left(1 + \frac{\sigma^2 u^2}{2\alpha}\right)^{-\alpha} \tag{2} \]

A random variable $X$ having (symmetric) VG distribution may also be viewed as

\[ X \overset{D}{=} \mu + \sigma W^{1/2} Z, \qquad \mathbb{E}(W) = 1 \tag{3} \]

where the symbol $\overset{D}{=}$ means “has the same distribution as”. Here $Z \sim N(0, 1)$ and $W$ is a positive nondegenerate random variable distributed independently of $Z$. In the case of the VG distribution, $W \sim \Gamma(\alpha, \alpha)$.

Log returns of financial assets

\[ X_t = \log P_t - \log P_{t-1}, \quad t = 1, 2, \ldots, N \tag{4} \]

itself to be gamma (rather than reciprocal gamma). This resulted in a continuous-time model, which is now known as the (symmetric) variance-gamma model.

The VG model may be placed within the context of a more general subordinator (see Time Change) model [10], which gives the price $P_t$ of a risky asset over continuous time $t \ge 0$ as

\[ P_t = P_0 \exp\{\mu t + \sigma B(T_t)\} \tag{5} \]

where $\mu$ and $\sigma$ ($> 0$) are real constants. $\{T_t\}$, the (market) activity time, is a positive, increasing random process (with stationary differences $W_t = T_t - T_{t-1}$, $t = 1, 2, \ldots$), which is independent of the standard Brownian motion $\{B(t)\}$. The corresponding returns are then given as

\[ X_t = \log P_t - \log P_{t-1} = \mu + \sigma\,(B(T_t) - B(T_{t-1})) \tag{6} \]

We assume that $\mathbb{E}(W_t) < \infty$, and so without loss of generality that

\[ \mathbb{E}(W_t) = 1 \tag{7} \]

to make the expected activity time change over unit calendar time equal to one unit, the scaling change in time being absorbed into $\sigma$, while noting that

\[ X_t \overset{D}{=} \mu + \sigma W_t^{1/2} B(1) \tag{8} \]

which is of form (3). The case $T_t = t$ of the model (5) is the classical geometric Brownian motion (GBM) model for the process $\{P_t\}$, with corresponding returns being independently $N(\mu, \sigma^2)$ distributed.
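The mixture representation (3) suggests a direct way to draw symmetric VG returns: sample a gamma mixing variable with unit mean, then scale a standard normal draw by its square root. The following is a minimal sketch under that reading, using only the Python standard library; the function names and the parameter values are illustrative assumptions, not part of the original text.

```python
import math
import random


def sample_symmetric_vg(mu, sigma, alpha, n, seed=0):
    """Draw n values of X = mu + sigma * sqrt(W) * Z, following representation (3)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        # random.gammavariate takes (shape, scale); scale 1/alpha means rate alpha,
        # so W has mean 1 and variance 1/alpha.
        w = rng.gammavariate(alpha, 1.0 / alpha)
        z = rng.gauss(0.0, 1.0)
        draws.append(mu + sigma * math.sqrt(w) * z)
    return draws


def sample_mean_and_variance(values):
    """Return the sample mean and the unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, variance


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    returns = sample_symmetric_vg(mu=0.1, sigma=0.2, alpha=2.0, n=100_000)
    mean, variance = sample_mean_and_variance(returns)
    print("sample mean:", mean, "sample variance:", variance)
```

With a large sample, the sample mean and variance should settle near $\mu$ and $\sigma^2$, which is the property stated after equation (1).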
In the VG model, $\{T_t\}$ for $t \ge 0$, is the gamma process, a process of stationary-independent increments. The distribution of an increment over a time interval of length $t$ is $\Gamma(\alpha t, \alpha)$. It is a remarkable feature that the distributional form for any $t$ is the same; this is inherited by the VG model for $\{\log(P_t/P_0)\}$, $t \ge 0$, which is a process of stationary-independent increments, with the distribution of an increment over any time period $t$ having CF

\[ e^{i\mu u t}\left(1 + \frac{\sigma^2 u^2}{2\alpha}\right)^{-\alpha t} \tag{9} \]

The corresponding distribution is also called a (symmetric) VG distribution. Its mean and variance are given by $\mathbb{E}\log(P_t/P_0) = \mu t$ and $\mathrm{Var}(\log(P_t/P_0)) = \sigma^2 t$, respectively. The whole structure is redolent of Brownian motion, to which the VG process reduces in the limit as $\alpha \to \infty$.

An important consequence of the VG distributional form of an increment over any time interval of length $t$ is that, irrespective of the size of unit of time between successive data readings, returns have a VG distribution.

The CF (2), clearly the CF of an infinitely divisible distribution, is also the CF of a difference of two independently and identically distributed (i.i.d.) gamma random variables, which reflects the fact shown in [16] that the process $\{\log(P_t/P_0) - t\mu\}$, $t \ge 0$, is the difference of two i.i.d. gamma processes.

The VG model is a pure-jump process [16] (see Jump Processes) reflecting this feature of a gamma process. This is seen from the Lévy–Khinchin representation (see Lévy Processes).

The analytical simplicity of the VG model and its pure-jump nature make it a leading candidate for modeling historical financial data. Further, the VG distribution’s PDF has explicit structural form (see below), which is tractable for maximum-likelihood estimation of parameters from returns data.

Returns $\{X_t\}$, $t = 0, 1, 2, \ldots$, considered in isolation, need not be taken to be i.i.d. as in the preceding discussion, but to form (more generally) a strictly stationary sequence, to which moment estimation methods, for example, will still apply [25]. The symmetric scaled t-distribution continues to enjoy favor as a model for the distribution of returns because of its power-law (Pareto-type) probability tails, a property manifested (in contrast to the VG) in the nonexistence of higher moments. For some data sets, empirical investigations [10, 12] suggest the nonexistence of higher moments in a model for returns (see Heavy Tails; Stylized Properties of Asset Returns) and hence the scaled t-distribution.

On the other hand, other investigations [9] suggest that it is virtually impossible to distinguish between the symmetric scaled t and VG distributions in regard to distributional tail structure by taking compatible parameter values in the two distributions. In fact, the PDFs of the two distributions reveal that the concentration of probability near the point of symmetry $\mu$ and in the middle range of the distributions is qualitatively and quantitatively different. The VG distribution tends to increase the probability near $\mu$ and in the tails, at the expense of the middle range. The different natures in regard to shape are most significantly revealed by the Cauchy distribution as a special case of the t-distribution and the Laplace (two-sided exponential) distribution as a special case of the VG distribution.

The first monograph to include a study of the VG model was [4]. Since then it has found a place in monographs such as [22], where it is treated in the general context of Lévy processes (see Lévy Processes).

Both the VG distribution and the scaled t-distribution are extreme cases of the generalized hyperbolic (see Generalized Hyperbolic Models) distribution [23, 25].

Allowing for Skewness

A generalized normal mean–variance-mixing distribution is the distribution of $X$, where the conditional distribution of $X$ is given as

\[ X \mid W \sim N(\mu + \theta W,\ \sigma^2 W + d^2) \tag{10} \]

Here, $\mu$, $\theta$, $d$, and $\sigma$ ($> 0$) are real numbers, and $W$ is a nondegenerate positive random variable. The distribution is skew if $\theta \ne 0$, and it is symmetric otherwise. Press [21] studied a continuous-time model with this distribution for returns, where $W \sim \mathrm{Poisson}(\lambda)$. This is a process of stationary-independent increments, resulting from adding a compound Poisson process of normal shocks to a Brownian motion, and has both continuous and jump components. Some special cases of equation (10) as a returns distribution, with focus on the estimation of parameters by the method of moments, are considered in [25].
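The conditional specification (10) can be explored numerically by drawing $W$ first and then a normal variate with the stated conditional mean and variance. The sketch below is one way to do this with the Python standard library. The helper names are hypothetical, the gamma mixing law with unit mean is just one admissible choice of $W$, and the moment formulas quoted in the comments follow from the laws of total expectation and total variance rather than from the original text.

```python
import math
import random


def gamma_unit_mean(alpha):
    """Return a sampler for W ~ Gamma(shape alpha, rate alpha), so that E(W) = 1."""
    return lambda rng: rng.gammavariate(alpha, 1.0 / alpha)


def sample_mean_variance_mixture(mu, theta, sigma, d, draw_w, n, seed=0):
    """Draw n values of X with X | W ~ N(mu + theta*W, sigma^2*W + d^2), as in (10)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        w = draw_w(rng)
        conditional_sd = math.sqrt(sigma ** 2 * w + d ** 2)
        draws.append(rng.gauss(mu + theta * w, conditional_sd))
    return draws


def central_moments(values):
    """Return the sample mean and the second and third central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return mean, m2, m3


if __name__ == "__main__":
    # Illustrative values only: a negative theta should give negative skewness.
    alpha, mu, theta, sigma, d = 2.0, 0.0, -0.1, 0.2, 0.0
    xs = sample_mean_variance_mixture(mu, theta, sigma, d, gamma_unit_mean(alpha), 100_000)
    mean, m2, m3 = central_moments(xs)
    # Total expectation: E(X) = mu + theta * E(W).
    # Total variance:    Var(X) = sigma^2 * E(W) + d^2 + theta^2 * Var(W).
    print("mean:", mean, "expected:", mu + theta)
    print("variance:", m2, "expected:", sigma ** 2 + d ** 2 + theta ** 2 / alpha)
    print("third central moment:", m3)
```

Setting `theta = 0` and `d = 0` recovers the symmetric case, while swapping `gamma_unit_mean` for another positive sampler changes only the mixing law.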
A random variable $X$ is said to have a normal variance–mean mixture (NVM) distribution [1] if equation (10) holds with $d = 0$.

The symmetric VG and scaled t-distributions are instances of equation (10), with $d = \theta = 0$.

The skew VG distribution, as introduced in [15], is the case of NVM where $W$ is described by equation (1) with $\mathbb{E}W = 1$ as in the symmetric case. (The skewed scaled t-distribution is defined analogously by taking $W$ to have a reciprocal gamma distribution.) The skew VG distribution has PDF

\[ f_{VG}(x) = \sqrt{\frac{2}{\pi}}\, \frac{\alpha^{\alpha} e^{(x-\mu)\theta/\sigma^2}}{\sigma\,\Gamma(\alpha)} \left(\frac{|x-\mu|}{\sqrt{\theta^2 + 2\alpha\sigma^2}}\right)^{\alpha-\frac{1}{2}} K_{\alpha-\frac{1}{2}}\!\left(\frac{|x-\mu|\sqrt{\theta^2 + 2\alpha\sigma^2}}{\sigma^2}\right), \quad x \in \mathbb{R} \tag{11} \]

and CF

\[ \varphi_{VG}(u) = \mathbb{E}(e^{iuX}) = e^{i\mu u}\left(1 - \frac{1}{\alpha}\left(iu\theta - \frac{\sigma^2 u^2}{2}\right)\right)^{-\alpha} \tag{12} \]

$K_{\eta}(\omega)$ for $\eta \in \mathbb{R}$ and $\omega > 0$, given as

\[ K_{\eta}(\omega) = \frac{1}{2}\int_0^{\infty} z^{\eta-1} e^{-\frac{\omega}{2}\left(z + \frac{1}{z}\right)}\, dz \tag{13} \]

is a modified Bessel function of the third kind with index $\eta$ ($K_{\eta}(\omega)$ is referred to as a modified Bessel function of the second kind in some texts).

An equivalent representation is

\[ X \overset{D}{=} \mu + \theta W + \sigma W^{1/2} Z, \qquad \mathbb{E}W = 1 \tag{14} \]

where $Z$ and $W$ are independently distributed, $Z \sim N(0, 1)$, $W \sim \Gamma(\alpha, \alpha)$ as mentioned before. This distributional structure is consistent with the continuous-time model for prices

\[ P_t = P_0 \exp\{\mu t + \theta T_t + \sigma B(T_t)\} \tag{15} \]

where $\{T_t\}$, $t \ge 0$, is a gamma process, exactly as before. The process of independent stationary increments $\{\log(P_t/P_0)\}$, $t \ge 0$, with the distribution of returns described by equations (11)–(13), is also called the variance-gamma model. Its properties are extensively studied in [15].

Dependence and Estimation

The VG model described above is a Lévy process (see Lévy Processes), that is, a stochastic process in continuous time with stationary independent increments, whose increments are independent and VG distributed. To discuss dependence, we consider the model for returns:

\[ X_t = \log P_t - \log P_{t-1} = \mu + \theta W_t + \sigma W_t^{1/2} Z_t, \quad t = 1, 2, \ldots \tag{16} \]

where $Z_t$, $t = 1, 2, \ldots$, are identically distributed $N(0, 1)$ random variables, independent of the strictly stationary process $\{W_t\}$, $t = 1, 2, \ldots$. Here $\theta$, $\sigma$ ($> 0$) are constants as before.

When $\theta = 0$, this discrete-time model is equivalent in distribution to that described by the subordinator model of Heyde [10] given by equations (5)–(8). Note that $\mathrm{Cov}(X_t, X_{t+k}) = 0$, while $\mathrm{Cov}(X_t^2, X_{t+k}^2) \ne 0$, $k = 1, 2, \ldots$. This is an important feature inasmuch as many asset returns display a sample autocorrelation function plot characteristic of white noise, but no longer do so in a sample autocorrelation plot of squared returns and of absolute values of returns [10, 12, 17].

McLeish [17] considered the distribution of individual $W_t \sim \Gamma(\alpha, \lambda)$, which gives the distribution of individual $X_t$ as (symmetric) VG, which he regarded as a robust alternative to the normal. He suggested a number of ways of introducing the dependence in the process $\{W_t\}$, $t = 1, 2, \ldots$.

The continuous-time subordinator model was expanded in [11] to allow for scaled t-distributed returns. Their specification of the activity time process in continuous time $\{T_t\}$ incorporated self-similarity (a scaling property) and long-range dependence (LRD) in the stochastic process of squared returns. (LRD in the Allen sense is expressed as divergence of the sequence of ultimately nonnegative autocorrelations of a discrete stationary process.)

The general form of the continuous-time model for prices over continuous time $t \ge 0$ as

\[ P_t = P_0 \exp\{\mu t + \theta T_t + \sigma B(T_t)\} \tag{17} \]
for which the returns are equivalent in distribution to equation (16) was given in [23] as a generalization of the subordinator model that allows for skewness in the distribution of returns in the same way as in [15], but the returns inherit the postulated strict stationarity of the sequence $\{W_t\}$, $t = 1, 2, \ldots$. Following on from [11], Finlay and Seneta [5, 6] studied in detail and in parallel the continuous time structure of the skew VG model and the skew t-model, with focus on skewness, asymptotic self-similarity, and LRD.

Maximum-likelihood estimation for independent readings from a symmetric VG distribution is discussed in [17] and in [23], which however proposes moment estimation in the presence of dependence. Moment estimation, allowing for dependence, is further developed in [25], along with goodness of fit of various models for several sets of asset data. A method of simulating data from long-range dependent processes with skew VG or t-distributed increments is described in [7], and various estimation procedures (method of moments, product-density maximum likelihood, and nonstandard minimum $\chi^2$ in particular) are tested on the data to assess their performance. In the simulations considered, the product-density maximum-likelihood method performs favorably. The conclusion, within the limited testing carried out, indicates then that, in practice, ordinary product density maximum-likelihood estimation is satisfactory even in the presence of LRD. This is tantamount to saying that one may treat such data on returns as i.i.d. This entails an enormous simplification in estimation procedures in fitting the skew VG and skew t-distributions.

Option Pricing Applications

Our discussion is based on the difference of gamma (DG) models for real-world (historical) data:

\[ P_t = P_0\, e^{\mu t + G_1(t; a, b) - G_2(t; c, d)} \tag{18} \]

where $\{G(t; \alpha, \beta)\}$ is a gamma process, and so $G(t; \alpha, \beta) \sim \Gamma(t\alpha, \beta)$ for any given $t$, and the two gamma processes are independent of each other. For each $t$, the returns (4) for $P_t$ have the following CF:

\[ \varphi_{DG}(u; \mu, a, b, c, d) = e^{i\mu u}\left(1 - \frac{iu}{b}\right)^{-a}\left(1 + \frac{iu}{d}\right)^{-c} \tag{19} \]

Comparing equations (12) and (19), it is clear that choosing $c = a$ results in a (skew) VG distribution, with PDF (11) parameters $\alpha = a$, $\theta = a\left(\frac{1}{b} - \frac{1}{d}\right)$, and $\sigma^2 = \frac{2a}{bd}$. The further simplification $b = d$ results in the symmetric VG process for returns.

Using this model for option pricing requires imposing parameter restrictions to ensure that $\{e^{-rt}P_t\}$ is a martingale, where $r$ is the interest rate. This amounts to ensuring that $\mathbb{E}(e^{-rt}P_t \mid \mathcal{F}_s) = e^{-rs}P_s$, where $\mathcal{F}_s$ represents information available to time $s \le t$. In the case of the DG process,

\[ \mathbb{E}(e^{-rt}P_t \mid \mathcal{F}_s) = e^{-rs}P_s \times e^{(\mu-r)(t-s)}\left(\frac{b}{b-1}\right)^{a(t-s)}\left(\frac{d}{d+1}\right)^{c(t-s)} \tag{20} \]

so that imposing the restriction

\[ \mu = r - a\log\left(\frac{b}{b-1}\right) - c\log\left(\frac{d}{d+1}\right) \tag{21} \]

with $b > 1$ results in $\{e^{-rt}P_t\}$, which is a martingale with four free parameters: $a$, $b$, $c$, and $d$. We label it MDG.

The (skew) VG special case is obtained by choosing $c = a$. Relabeling the parameters as above, $\alpha = a$, $\theta = a\left(\frac{1}{b} - \frac{1}{d}\right)$, and $\sigma^2 = \frac{2a}{bd}$, results in a martingale that is a (skew) VG process. The mean constraint (21) now translates to

\[ \mu = r + \alpha\log\left(1 - \frac{\theta + \frac{1}{2}\sigma^2}{\alpha}\right) \tag{22} \]

where we take $\alpha > \theta + \frac{1}{2}\sigma^2$. This martingale now has only three free parameters. We label it MVG. This corresponds to the labeling “VG” in [22] and is the martingale used in [15].

Both MDG and MVG are, in the terminology of the work by Schoutens [22], “mean-correcting martingales”, since the restriction (21), (22) is on the mean ($\mu$) to produce a martingale.

Another way of producing a martingale from a (skew) VG process is to begin by noting from [5] and equation (17) that irrespective of the distribution of $T_t$,
\[ \mathbb{E}(e^{-rt}P_t \mid \mathcal{F}_s) = e^{-rs}P_s \times e^{(\mu-r)(t-s)}\, \mathbb{E}\left(e^{(\theta + \frac{1}{2}\sigma^2)(T_t - T_s)} \,\Big|\, \mathcal{F}_s\right) \tag{23} \]

where the sequence $\{W_t\}$, where $W_t = T_t - T_{t-1}$, is strictly stationary.

Thus, if we take $\mu = r$ and $\theta = -\frac{1}{2}\sigma^2$ in (23), the right-hand side of equation (23) becomes $e^{-rs}P_s$, and we have a martingale. This construction of a martingale, simple and quite general, is slightly restrictive, however, in that two parameters, $\mu$ and $\theta$, are constrained. We shall refer to this construction as a skew-correcting martingale, since $\theta$, the parameter that determines skewness, is constrained. We denote this martingale model by MSK. Out of the “external” parameters $\mu$, $\theta$, and $\sigma$, the only parameter that is retained is $\sigma$, which is called the historical volatility in the Black–Scholes (BS) context, which is a special case when $T_t = t$. Any additional parameters in the martingale (risk-neutral) process will be those emanating from the nature of $\{T_t\}$, which will need to be specified for any examination of estimation and goodness of fit.

When the CF of the risk-neutral distribution of price is of the closed form, option prices may be calculated using Fourier methods (see Fourier Methods in Options Pricing) as in [3, 14]. Specifically, for $C(\Upsilon, k)$, the price of a European call option with time to maturity $\Upsilon$ and strike price $K$ and $k = \log(K)$, let $q_{\Upsilon}(p)$ be the risk-neutral density of $\log(P_{\Upsilon})$, with CF $\varphi_{\Upsilon}(u)$, at time $\Upsilon$. Thus,

\[ C(\Upsilon, k) = e^{-r\Upsilon}\, \mathbb{E}(P_{\Upsilon} - K)^{+} = e^{-r\Upsilon}\int_k^{\infty} (e^{p} - e^{k})\, q_{\Upsilon}(p)\, dp \tag{24} \]

Define the modified call price as

\[ c(\Upsilon, k) = e^{\gamma k}\, C(\Upsilon, k) \tag{25} \]

for some $\gamma$ such that $\mathbb{E}(P_{\Upsilon}^{\gamma+1}) < \infty$. The Fourier transform of $c(\Upsilon, k)$ is then given as

\[ \psi_{\Upsilon}(x) = \int_{-\infty}^{\infty} e^{ixk}\, c(\Upsilon, k)\, dk = \frac{e^{-r\Upsilon}\, \varphi_{\Upsilon}(x - (\gamma+1)i)}{\gamma^2 + \gamma - x^2 + ix(2\gamma+1)} \tag{26} \]

Taking the inverse Fourier transform and using the fact that $c(\Upsilon, k)$ is real, and using equation (25) gives

\[ C(\Upsilon, k) = \frac{e^{-\gamma k}}{2\pi}\int_{-\infty}^{\infty} e^{-ixk}\, \psi_{\Upsilon}(x)\, dx = \frac{e^{-\gamma k}}{\pi}\int_0^{\infty} \mathrm{Re}\{e^{-ixk}\, \psi_{\Upsilon}(x)\}\, dx \tag{27} \]

In fact, we shall use a modified version of equation (27) suggested in [14]:

\[ C(\Upsilon, k) = R_{\gamma} + \frac{e^{-\gamma k}}{\pi}\int_0^{\infty} \mathrm{Re}\{e^{-ixk}\, \psi_{\Upsilon}(x)\}\, dx \tag{28} \]

where $R_{\gamma} = \varphi_{\Upsilon}(-i)$ for $-1 < \gamma < 0$. The choice of $\gamma$ generally impacts on the error generated by the numerical approximation of equation (28). Finally, the option price (28) is computed via numerical integration.

The option price in this procedure is given simply by the sum of a number of function evaluations. Lee [14] shows that with a judicious choice of tuning parameters, one can calculate the option price up to 99.99% accuracy with less than 100, and in some cases less than 10, function evaluations. This CF-based pricing method lends itself easily to the fast Fourier transform, which allows for a very fast calculation of a range of option prices.

To numerically illustrate the method and the empirical performance of MVG against some competitors, we [8] use the data set in [22], Appendix C, which contains 77 call option prices on the S&P 500 index at the close of market on April 18, 2002. Fundamentally, each data point consists of the triple: strike, option price, and expiry date.

Fitting models involves estimating model parameters. To do this, we follow [22], p. 7, by minimizing with respect to the model parameters, the root-mean-square error (RMSE):

\[ \mathrm{RMSE} = \sqrt{\sum_{\text{options}} \frac{(\text{market price} - \text{model price})^2}{\text{number of options}}} \tag{29} \]

and then comparing the values of the minimized errors between models. If a model perfectly described
the asset price process, the RMSE value would be zero, with all model prices matching market prices, given the single true set of parameters.

The estimates of model parameters produced for a given model correspond to the current market status of that model. The procedure is thus, for a given model, a calibration procedure. No historical data are used in this procedure. The use of this data set for comparison of several different models in this way as already done in [22] allows for easy comparison of goodness of fit. We used the tuning parameter value $\gamma = -\frac{1}{2}$; the other nonmodel constants, $q$, $r$, were as in [22], Appendix C.

The RMSE surface for the MDG was reported in [8] to be quite flat, with a number of different parameter values giving essentially the same RMSE value of 2.24. The parameter values that gave the lowest value by 0.001 were as follows: $a = 4.35$, $b = 240.86$, $c = 9.79 \times 10^{-6}$, $d = 2.65 \times 10^{-7}$. The four-parameter MDG model thus did better than the four-parameter CGMY and GH models reported in [22], p. 83, and shown in Table 1.

The VG (skew) model fit reported in [22], p. 83, corresponded to the parameter values $\alpha\,(= a = c) = 5.4296 \times 10^{-3}$, $b = 14.2699$, $d = 5.8704$. The recalculation of RMSE with these parameter values reported in [8] gave the value 3.57. Optimization of RMSE reported in [8] with starting values $a = c = 0.01$, $b = d = 10$ resulted in the parameter estimates and RMSE as in [22]. Thus in Table 1, the RMSE values reported under VG and MVG are the same, 3.56.

As expected, this three-parameter model (MVG/VG) does not perform quite as well as the four-parameter models. The VG model is a special case of the GH model, so this is not unexpected.

Finlay and Seneta [8] discuss fitting an MSK martingale model, which allows for LRD in the historical data. This introduces two parameters in addition to the “historical volatility” parameter $\sigma$, namely, a parameter $\alpha$ corresponding to the gamma distribution with mean 1, as before, and a “Hurst” parameter $H$ associated with dependence. The fit of MSK produces an RMSE of 6.35 and an estimate of $\sigma = 0.012$. There is almost no improvement on the BS situation reported in Table 1, which is the standard BS martingale model; its RMSE and $\sigma$-estimate values are reported in [22], pp. 40–41, and are 6.73 and 0.011, respectively. This apparent insensitivity of the MSK model to departure from BS, possibly due to the skewness parameter being constrained in the martingale construction, is overcome, as reported in [8], by a four-parameter model (a “lack of static arbitrage” model [2]), which is termed C3. This model, though not a martingale model, gives an RMSE = 0.76, and its parameter estimates conform with estimates from historical data for an LRD VG model [7].

Table 1  Fit of models to Schoutens [22] option data

Model   MDG    CGMY   GH     VG     MVG    BS
RMSE    2.24   2.76   2.88   3.56   3.56   6.73

Thus, for a given maturity, the three-parameter MVG model and its associated (skew) VG model for historical data perform reasonably well in fitting option prices. If a four-parameter martingale model is to be used, the parent model of MVG that should be used is the MDG, in which the gamma process continues to play a fundamental role.

Historical Notes

In the case where $\sigma^2 = 2\alpha$ in the CF (2), the corresponding PDF (11) (with $\mu = \theta = 0$) already appears in [18], p. 184, equation (xlii), and is the theme of [19], where it is shown to be the distribution of difference of two i.i.d. gamma random variables, an idea clarified in [13]. The definition of the Bessel function $K_{\eta}(\omega)$ used differs from equation (13). Teichroew [24] obtained the PDF (11) (with $\mu = \theta = 0$), in terms of a Hankel function, from the normal variance-mixing structure of the distribution of $X$, using form (1) for the PDF of the mixing variable $W$. These themes are taken up by McLeish [17] as a starting point.

The skew VG distribution with $\alpha = 2n$, where $n$ is a positive integer, and $-1 < \theta/\sigma^2 < 1$ appears in [26], a paper generalizing [19], which was also published in 1932.

Acknowledgments

Many thanks are due to Richard Finlay for his help.

References

[1] Barndorff-Nielsen, O.E., Kent, J. & Sørensen, M. (1982). Normal variance-mean mixtures and z distributions, International Statistical Review 50, 145–159.
[2] Carr, P., Geman, H., Madan, D. & Yor, M. (2003). Stochastic volatility for Lévy processes, Mathematical Finance 13, 345–382.
[3] Carr, P. & Madan, D. (1999). Option valuation using the fast Fourier transform, Journal of Computational Finance 2, 61–73.
[4] Epps, T.W. (2000). Pricing Derivative Securities, World Scientific, Singapore.
[5] Finlay, R. & Seneta, E. (2006). Stationary-increment Student and Variance-Gamma processes, Journal of Applied Probability 43, 441–453.
[6] Finlay, R. & Seneta, E. (2007). A gamma activity time process with noninteger parameter and self-similar limit, Journal of Applied Probability 44, 950–959.
[7] Finlay, R. & Seneta, E. (2008a). Stationary-increment Variance-Gamma and t-models: simulation and parameter estimation, International Statistical Review 76, 167–186.
[8] Finlay, R. & Seneta, E. (2008b). Option pricing with VG-like models, International Journal of Theoretical and Applied Finance 11, 943–955.
[9] Fung, T. & Seneta, E. (2007). Tailweight, quantiles and kurtosis. A study of competing distributions, Operations Research Letters 35, 448–454.
[10] Heyde, C.C. (1999). A risky asset model with strong dependence through fractal activity time, Journal of Applied Probability 36, 1234–1239.
[11] Heyde, C.C. & Leonenko, N.N. (2005). Student processes, Advances in Applied Probability 37, 342–365.
[12] Heyde, C.C. & Liu, S. (2001). Empirical realities for a minimal description risky asset model. The need for fractal features, Journal of the Korean Mathematical Society 38, 1047–1059.
[13] Kullback, S. (1936). The distribution laws of the difference and quotient of variables independently distributed in Pearson type III laws, Annals of Mathematical Statistics 7, 51–53.
[14] Lee, R. (2004). Option pricing by transform methods: extensions, unification and error control, Journal of Computational Finance 7, 51–86.
[15] Madan, D.B., Carr, P.P. & Chang, E.C. (1998). The Variance-Gamma process and option pricing, European Finance Review 2, 79–105.
[16] Madan, D.B. & Seneta, E. (1990). The Variance-Gamma (V.G.) model for share market returns, Journal of Business 63, 511–524.
[17] McLeish, D.L. (1982). A robust alternative to the normal distribution, Canadian Journal of Statistics 10, 89–102.
[18] Pearson, K., Jeffery, G.B. & Elderton, E.M. (1929). On the distribution of the first product-moment coefficient in samples drawn from an indefinitely large normal population, Biometrika 21, 164–201.
[19] Pearson, K., Stouffer, S.A. & David, F.N. (1932). Further applications in statistics of the Tm(x) Bessel function, Biometrika 24, 316–343.
[20] Praetz, P.D. (1972). The distribution of share price changes, Journal of Business 45, 49–55.
[21] Press, S.J. (1967). A compound events model for security prices, Journal of Business 40, 317–335.
[22] Schoutens, W. (2003). Lévy Processes in Finance. Pricing Financial Derivatives, Wiley, Chichester.
[23] Seneta, E. (2004). Fitting the Variance-Gamma model to financial data, in Stochastic Methods and Their Applications (C.C. Heyde Festschrift), J. Gani & E. Seneta, eds, Journal of Applied Probability, Vol. 41A, pp. 177–187.
[24] Teichroew, D. (1957). The mixture of normal distributions with different variances, Annals of Mathematical Statistics 28, 510–512.
[25] Tjetjep, A. & Seneta, E. (2006). Skewed normal variance-mean models for asset pricing and the method of moments, International Statistical Review 74, 109–126.
[26] Wishart, J. & Bartlett, M.S. (1932). The distribution of second order moment statistics in a normal system, Proceedings of the Cambridge Philosophical Society 28, 455–459.

Further Reading

Seneta, E. (2007). The early years of the Variance-Gamma process, in Advances in Mathematical Finance (Dilip B. Madan Festschrift), M.C. Fu, R.A. Jarrow, J.-Y.J. Yen, & R.J. Elliott, eds, Birkhäuser, Boston, pp. 3–19.

Related Articles

Exponential Lévy Models; Generalized Hyperbolic Models; Hazard Rate; Heavy Tails; Lévy Processes; Stylized Properties of Asset Returns; Tempered Stable Process.

EUGENE SENETA
Jump-diffusion Models

Jump-diffusion (JD) option pricing models are particular cases of exponential Lévy models (see Exponential Lévy Models) in which the frequency of jumps is finite. They can be considered as prototypes for a large class of more complex models such as the stochastic volatility plus jumps model of Bates (see Bates Model).

Consider a market with a riskless asset (the bond) and one risky asset (the stock) whose price at time $t$ is denoted by $S_t$. In a JD model, the SDE for the stock price is given as

\[ dS_t = \mu\, S_{t-}\, dt + \sigma\, S_{t-}\, dZ_t + S_{t-}\, dJ_t \tag{1} \]

where $Z_t$ is a Brownian motion and

\[ J_t = \sum_{i=1}^{N_t} Y_i \tag{2} \]

is a compound Poisson process where the jump sizes $Y_i$ are independent and identically distributed with distribution $F$ and the number of jumps $N_t$ is a Poisson process with intensity $\lambda$. The asset price $S_t$ thus follows geometric Brownian motion between jumps. Monte Carlo simulation of the process can be carried out by first simulating the number of jumps $N_t$, the jump times, and then simulating geometric Brownian motion on intervals between jump times.

The SDE (1) has the exact solution:

\[ S_t = S_0 \exp\{\mu t + \sigma Z_t - \sigma^2 t/2 + J_t\} \tag{3} \]

Merton [5] considers the case where the jump sizes $Y_i$ are normally distributed.

Risk-neutral Drift

If the above model is used as a pricing model, the drift $\mu$ in equation (1) is given by the risk-neutral drift $\hat{\mu}$, which contains a jump compensator $\mu_J$:

\[ \mu = \hat{\mu} + \mu_J \tag{4} \]

To identify $\mu_J$, taking expectations of equation (1) and from the definition of $\hat{\mu}$,

\[ \mathbb{E}[dS_t] = \hat{\mu}\, S_t\, dt = \mu\, S_t\, dt + \lambda \left[\int \xi\, F(d\xi)\right] S_t\, dt \tag{5} \]

where $F$ is the risk-neutral jump distribution. The jump compensator is then given as

\[ \mu_J = -\lambda \int \xi\, F(d\xi) \tag{6} \]

To simplify the presentation, we henceforth assume zero dividends so that $\hat{\mu} = r$, the risk-free rate.

Characteristic Function

Define the forward price $F := S_0\, e^{rt}$. If $x_t := \log S_t/F$ is a Lévy process (see Lévy Processes), its characteristic function $\varphi_T(u) := \mathbb{E}\left[e^{iux_T}\right]$ has the Lévy–Khintchine representation

\[ \varphi_T(u) = \exp\left\{ iu\,(\mu_J - \sigma^2/2)\,T - u^2\sigma^2 T/2 + T \int \left[e^{iu\xi} - 1\right] \nu(\xi)\, d\xi \right\} \tag{7} \]

Typical assumptions for the distribution of jump sizes are as follows: normal as in the original paper by Merton [5] and double exponential as in [3] (see Kou Model). In the Merton model, the Lévy density $\nu(\cdot)$ is given as

\[ \nu(\xi) = \frac{\lambda}{\sqrt{2\pi}\,\delta} \exp\left\{ -\frac{(\xi - \alpha)^2}{2\delta^2} \right\} \tag{8} \]

where $\alpha$ is the mean of the log-jump size $\log J$ and $\delta$ the standard deviation of jumps. This leads to the explicit characteristic function

\[ \varphi_T(u) = \exp\left\{ iu\,\omega\,T - \tfrac{1}{2}u^2\sigma^2 T + \lambda T\left(e^{iu\alpha - u^2\delta^2/2} - 1\right) \right\} \tag{9} \]

with

\[ \omega = -\tfrac{1}{2}\sigma^2 - \lambda\left(e^{\alpha + \delta^2/2} - 1\right) \tag{10} \]
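The Merton characteristic function (9) with drift correction (10) is easy to check numerically: evaluating it at $u = -i$ gives $\mathbb{E}[S_T/F]$, which should equal 1 when $\omega$ is chosen as in (10). The sketch below, which assumes nothing beyond the Python standard library, evaluates the CF and also draws the terminal log-return by first sampling the number of jumps, in the spirit of the Monte Carlo recipe described earlier. Function names and parameter values are illustrative assumptions, and only the terminal value is simulated, so the jump times themselves are not stored.

```python
import cmath
import math
import random


def merton_omega(sigma, lam, alpha, delta):
    """Drift correction omega from equation (10)."""
    return -0.5 * sigma ** 2 - lam * (math.exp(alpha + 0.5 * delta ** 2) - 1.0)


def merton_cf(u, T, sigma, lam, alpha, delta):
    """Characteristic function (9) of x_T = log(S_T / F); u may be complex."""
    omega = merton_omega(sigma, lam, alpha, delta)
    jump_term = lam * T * (cmath.exp(1j * u * alpha - 0.5 * u * u * delta ** 2) - 1.0)
    return cmath.exp(1j * u * omega * T - 0.5 * u * u * sigma ** 2 * T + jump_term)


def simulate_x_T(T, sigma, lam, alpha, delta, rng):
    """Draw one terminal value x_T: drift + diffusion + sum of normal log-jumps."""
    # Count Poisson arrivals on [0, T] by accumulating exponential waiting times.
    n_jumps = 0
    if lam > 0.0:
        t = rng.expovariate(lam)
        while t <= T:
            n_jumps += 1
            t += rng.expovariate(lam)
    diffusion = sigma * math.sqrt(T) * rng.gauss(0.0, 1.0)
    jumps = sum(rng.gauss(alpha, delta) for _ in range(n_jumps))
    return merton_omega(sigma, lam, alpha, delta) * T + diffusion + jumps


if __name__ == "__main__":
    # Hypothetical parameters for illustration only.
    T, sigma, lam, alpha, delta = 1.0, 0.2, 0.5, -0.1, 0.15
    print("phi(-i):", merton_cf(-1j, T, sigma, lam, alpha, delta))
    rng = random.Random(0)
    xs = [simulate_x_T(T, sigma, lam, alpha, delta, rng) for _ in range(50_000)]
    print("Monte Carlo E[exp(x_T)]:", sum(math.exp(x) for x in xs) / len(xs))
```

Both printed quantities should be close to 1; the Monte Carlo estimate carries sampling error that shrinks with the number of draws.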
In the double-exponential case (see Kou Model)

\[ \nu(\xi) = \lambda\left\{ p\,\alpha_+\, e^{-\alpha_+ \xi}\, 1_{\xi \ge 0} + (1 - p)\,\alpha_-\, e^{-\alpha_- |\xi|}\, 1_{\xi < 0} \right\} \tag{11} \]

where $\alpha_+$ and $\alpha_-$ are the expected positive and negative jump sizes, respectively, and $p$ is the relative probability of a positive jump. This gives the explicit characteristic function:

\[ \varphi_T(u) = \exp\left\{ iu\,\omega\,T - \frac{1}{2}u^2\sigma^2 T + \lambda T \times \left( \frac{p}{\alpha_+ - iu} - \frac{1 - p}{\alpha_- + iu} \right) \right\} \tag{12} \]

with

\[ \omega = -\frac{1}{2}\sigma^2 - \lambda\left( \frac{p}{\alpha_+ + 1} - \frac{1 - p}{\alpha_- - 1} \right) \tag{13} \]

Pricing of European Options

Given a characteristic function, European call options can be priced using Fourier methods (see Fourier Methods in Options Pricing), as in [4]:

\[ C(S, K, T) = e^{-rT}\left\{ F - \frac{\sqrt{FK}}{\pi} \int_0^{\infty} \frac{du}{u^2 + \frac{1}{4}}\, \mathrm{Re}\left[ e^{-iuk}\, \varphi_T(u - i/2) \right] \right\} \tag{14} \]

where the log-strike $k := \log(K/F)$.

Valuation Equation

Assuming the process (1) and a constant risk-free rate $r$ and further supposing that the market is complete, the value $V(S, t)$ of a European-style option satisfies

\[ \frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + rS\frac{\partial V}{\partial S} - rV + \lambda \int F(d\xi) \left\{ V(S e^{\xi}, t) - V(S, t) - \left(e^{\xi} - 1\right) S \frac{\partial V}{\partial S} \right\} = 0 \tag{15} \]

where for ease of notation, $V$ denotes $V(S, t)$. Equation (15) is a partial integro-differential equation (PIDE) (see Partial Integro-differential Equations (PIDEs)), which can be solved using finite-difference methods [1].

A Valuation Formula for European Options

Merton [5] derived an exact solution of the valuation equation (15) for a European-style call option with strike $K$ and time to expiration $T$, which has the form of an infinite sum of Black–Scholes-like terms:

\[ C(S, K, T) = \sum_{n=0}^{\infty} \frac{e^{-\lambda T} (\lambda T)^n}{n!} \int F_n(d\xi)\, C_{BS}(S\, e^{\xi}\, e^{\mu_J T}, K, r, \sigma, T) \tag{16} \]

where $F_n$ is the distribution of the sum of $n$ independent jumps and $C_{BS}(\cdot)$ denotes the Black–Scholes solution, which is given as

\[ C_{BS}(S, K, r, \sigma, T) = S\, N(d_1) - K\, e^{-rT}\, N(d_2) \tag{17} \]

with

\[ d_1 = \frac{\log(S/K) + rT}{\sigma\sqrt{T}} + \frac{\sigma\sqrt{T}}{2}, \qquad d_2 = \frac{\log(S/K) + rT}{\sigma\sqrt{T}} - \frac{\sigma\sqrt{T}}{2} \tag{18} \]

Jump to Ruin

In the case where $e^{Y_i} = 0$ with probability 1, $\mu_J = \lambda$ and equation (16) simplifies to

\[ C(S, K, T) = e^{-\lambda T}\, C_{BS}(S\, e^{\lambda T}, K, r, \sigma, T) \tag{19} \]

which is the Black–Scholes formula with a shifted interest rate $r \to r + \lambda$. This special case of the JD model where the stock price jumps to zero (or ruin) whenever there is a jump is the simplest possible model of default. Equation (19) for the option price in the jump-to-ruin model may also be derived from a Black–Scholes style replication argument using stock and bonds of the issuer of the stock; upon default of the issuer, both stock and bonds jump to zero. The
cost of funding stock with bonds of the issuer is $r + \lambda$ in this picture, which explains the simple form (19) of the solution.

Local Volatility

There is a particularly simple expression for Dupire local volatility in the jump-to-ruin model. It is given by Gatheral [2]:

\[ \sigma_{loc}^2(K, T; S) = \sigma^2 + 2\lambda\sigma\sqrt{T}\, \frac{N(d_2)}{N'(d_2)} \tag{20} \]

with

\[ d_2 = \frac{\log(S/K) + \lambda T}{\sigma\sqrt{T}} - \frac{\sigma\sqrt{T}}{2} \tag{21} \]

As $K \to \infty$, $d_2 \to -\infty$ and the correction term vanishes, and as $K \to 0$, the correction term explodes. In addition, as the hazard rate $\lambda$ increases, so does $d_2$, increasing the local volatility for low strikes $K$ relative to high strikes.

Normally Distributed Jumps

Merton [5] also shows that if jumps are normally distributed with $Y_i \sim N(\alpha, \delta)$, equation (16) again simplifies considerably to give

\[ C(S, K, T) = \sum_{n=0}^{\infty} \frac{e^{-\hat{\lambda} T} (\hat{\lambda} T)^n}{n!}\, C_{BS}(S, K, r_n, \sigma_n, T) \tag{22} \]

with

\[ \hat{\lambda} = \lambda\, e^{\alpha + \delta^2/2} \tag{23} \]

\[ \sigma_n^2\, T = \sigma^2 T + n\delta^2 \tag{24} \]

\[ r_n\, T = (r + \mu_J)\, T + n\left(\alpha + \delta^2/2\right) \tag{25} \]

Each term $C_{BS}(S, K, r_n, \sigma_n, T)$ in equation (22) is the value of the option conditional on there being exactly $n$ jumps during its life.

Implied Volatility Smile

If there were no jumps in this model, the implied volatility smile would be flat. Jumps in the JD stock price process induce an implied volatility smile whose short time limit (see, e.g., [2]) is given as

\[ K \frac{\partial}{\partial K}\, \sigma_{BS}^2(K, T) \to -2\mu_J \quad \text{as } T \to 0 \tag{26} \]

The greater the $\mu_J$, because jumps are either more frequent or more negatively skewed, the more negative is the implied volatility skew.

References

[1] Cont, R. & Voltchkova, E. (2005). A finite difference scheme for option pricing in jump-diffusion and exponential Lévy models, SIAM Journal on Numerical Analysis 43(4), 1596–1626.
[2] Gatheral, J. (2006). The Volatility Surface, John Wiley & Sons, Hoboken, Chapter 5.
[3] Kou, S. (2002). A jump-diffusion model for option pricing, Management Science 48, 1086–1101.
[4] Lewis, A.L. (2000). Option Valuation under Stochastic Volatility with Mathematica Code, Finance Press, Newport Beach, CA.
[5] Merton, R.C. (1976). Option pricing when underlying stock returns are discontinuous, Journal of Financial Economics 3, 125–144.

Related Articles

Bates Model; Exponential Lévy Models; Fourier Methods in Options Pricing; Implied Volatility Surface; Kou Model; Partial Integro-differential Equations (PIDEs).

JIM GATHERAL
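As an illustration of the series (22) with the adjusted quantities (23)–(25) from the jump-diffusion article above, the following sketch sums Poisson-weighted Black–Scholes prices using only the Python standard library. It assumes that $\delta$ is the standard deviation of the log-jump (as equation (24) suggests) and that the compensator is $\mu_J = -\lambda(e^{\alpha+\delta^2/2} - 1)$, consistent with the drift correction in equation (10). The function names, the truncation at `n_terms`, and the parameter values are illustrative choices, not part of the original text.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(S, K, r, sigma, T):
    """Black-Scholes call price, equations (17)-(18)."""
    sqrt_T = math.sqrt(T)
    d1 = (math.log(S / K) + r * T) / (sigma * sqrt_T) + 0.5 * sigma * sqrt_T
    d2 = d1 - sigma * sqrt_T
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


def merton_call(S, K, r, sigma, T, lam, alpha, delta, n_terms=60):
    """Truncated version of the series (22) with normally distributed log-jumps."""
    k = math.exp(alpha + 0.5 * delta ** 2) - 1.0  # expected relative jump size
    mu_J = -lam * k                               # assumed jump compensator
    lam_hat = lam * (1.0 + k)                     # equation (23)
    price = 0.0
    weight = math.exp(-lam_hat * T)               # Poisson weight for n = 0
    for n in range(n_terms):
        sigma_n = math.sqrt(sigma ** 2 + n * delta ** 2 / T)    # equation (24)
        r_n = r + mu_J + n * (alpha + 0.5 * delta ** 2) / T     # equation (25)
        price += weight * black_scholes_call(S, K, r_n, sigma_n, T)
        weight *= lam_hat * T / (n + 1)           # update to the weight for n + 1
    return price


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    args = dict(S=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0)
    print("Black-Scholes:", black_scholes_call(**args))
    print("Merton:", merton_call(lam=0.5, alpha=-0.1, delta=0.15, **args))
```

With `lam=0.0` only the $n = 0$ term survives and the function reduces to the Black–Scholes price; with jumps switched on, the at-the-money price should be higher because the compensated jumps add dispersion without changing the forward.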
Time-changed Lévy Process

If $L$ is a Lévy process (see Lévy Processes) and $(T_t)_{t \ge 0}$ is a positive increasing process, $X_t = L(T_t)$ is called a time-changed Lévy process with time change $(T_t)_{t \ge 0}$. When the time change is independent of $L$, many properties of $X$ can be derived from those of $L$.

Many Lévy processes have appeared as time-changed Brownian motions (see Time Change), so one might ask why one should time change them yet again. In this regard, we note that Lévy processes are by construction processes of independent identically distributed increments and hence all distributional parameters such as variances, skewness, kurtosis, and possibly correlations are constant. Yet all these and possibly more entities are stochastic in actual economies, and this randomness may be important for particular questions of interest. These considerations led in the first instance to the construction of processes displaying stochastic volatility with the local innovations of a Lévy process. Analytical tractability of characteristic functions motivated models with exponential affine characteristic functions (see Affine Models), and through the work of Duffie et al. [4] it became well known that this would be the case if the infinitesimal generator of the resulting Markov process was linear in the state variables. The recipe for constructing these models was, therefore, clear.

A number of models in this direction were presented by Carr et al. [2]. Three Lévy processes were selected for time changing: the normal inverse Gaussian (NIG) (see Normal Inverse Gaussian Model), the variance gamma (VG) (see Variance-gamma Model), and the CGMY model (see Tempered Stable Process). The first two were already known to be time changes of Brownian motion with drift. Cont and Tankov [3, Prop. 4.1] show that the CGMY also has such a representation. Characteristic exponents $\psi(u; \cdot)$ are critical to the development of the exponential affine representations (see Affine Models) involved; they are given here by the logarithm of the characteristic functions taken at unit time. For these three models, the characteristic exponents are given as (subscript indicates the model name)

$$\psi_{NIG}(u; \sigma, \nu, \theta) = \sigma \left( \frac{\nu}{\sigma} - \sqrt{\frac{\nu^2}{\sigma^2} - \frac{2 \theta i u}{\sigma^2} + u^2} \right), \quad \sigma, \nu > 0,\ \theta \in \mathbb{R} \tag{1}$$

$$\psi_{VG}(u; \sigma, \nu, \theta) = -\frac{1}{\nu} \log \left( 1 - i u \theta \nu + \frac{\sigma^2 \nu}{2} u^2 \right), \quad \sigma, \theta, \nu \in \mathbb{R} \tag{2}$$

$$\psi_{CGMY}(u; C, G, M, Y) = C\, \Gamma(-Y) \left( (M - iu)^Y - M^Y + (G + iu)^Y - G^Y \right), \quad C, G, M > 0,\ 0 < Y < 2 \tag{3}$$

The details of the associated Lévy processes are given next, as they assist in various applications. The NIG and VG processes can be written as Brownian motion with drift $\theta$ and volatility $\sigma$ time changed by an inverse Gaussian process and a gamma process, respectively. The inverse Gaussian process $T_t^{\nu}$ is the time taken by an independent Brownian motion with drift $\nu$ to reach the level $t$, while the gamma process $G_t^{\nu}$ is an increasing process with independent identically distributed increments where the increments over unit time have a gamma distribution with unit mean and variance $\nu$. Both the NIG and VG are pure jump processes with Lévy measures $k_{NIG}(x)\,dx$, $k_{VG}(x)\,dx$ defined as

$$k_{NIG}(x) = \sqrt{\frac{2}{\pi}}\, \sigma \alpha^2\, \frac{e^{\beta x} K_1(|x|)}{|x|} \tag{4}$$

$$\beta = \frac{\theta}{\sigma^2}, \quad \alpha^2 = \frac{\nu^2}{\sigma^2} + \frac{\theta^2}{\sigma^4} \tag{5}$$

$$k_{VG}(x) = \frac{C}{|x|} \exp \left( \frac{G - M}{2} x \right) \exp \left( -\frac{G + M}{2} |x| \right) \tag{6}$$

$$C = \frac{1}{\nu}; \quad G = \left( \sqrt{\frac{\theta^2 \nu^2}{4} + \frac{\sigma^2 \nu}{2}} - \frac{\theta \nu}{2} \right)^{-1}, \quad M = \left( \sqrt{\frac{\theta^2 \nu^2}{4} + \frac{\sigma^2 \nu}{2}} + \frac{\theta \nu}{2} \right)^{-1} \tag{7}$$

where $K_{\alpha}(x)$ is the Bessel K function.
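As a rough illustration of how the three characteristic exponents can be evaluated, here is a minimal Python sketch. The function names are hypothetical. The NIG exponent is written in the form obtained by applying the inverse Gaussian Laplace transform to a Brownian motion with drift $\theta$ and volatility $\sigma$, which is an assumption of this sketch about the parameterization; `psi_vg_cgm` uses the standard $(C, G, M)$ form of the VG exponent, with the parameter map of equation (7).

```python
import cmath
import math


def psi_nig(u, sigma, nu, theta):
    # Brownian motion (drift theta, volatility sigma) run on the first-passage
    # time of a drift-nu Brownian motion; assumed parameterization.
    return nu - cmath.sqrt(nu**2 - 2j * u * theta + sigma**2 * u**2)


def psi_vg(u, sigma, nu, theta):
    # VG exponent in the (sigma, nu, theta) parameterization.
    return -cmath.log(1 - 1j * u * theta * nu + 0.5 * sigma**2 * nu * u**2) / nu


def vg_to_cgm(sigma, nu, theta):
    # Parameter map of equation (7).
    root = math.sqrt(theta**2 * nu**2 / 4 + sigma**2 * nu / 2)
    return 1 / nu, 1 / (root - theta * nu / 2), 1 / (root + theta * nu / 2)


def psi_vg_cgm(u, C, G, M):
    # The same VG exponent written with (C, G, M).
    return C * cmath.log(G * M / ((M - 1j * u) * (G + 1j * u)))


def psi_cgmy(u, C, G, M, Y):
    # CGMY exponent; math.gamma(-Y) is undefined at Y = 1, so keep Y != 1.
    return C * math.gamma(-Y) * (
        (M - 1j * u) ** Y - M**Y + (G + 1j * u) ** Y - G**Y
    )
```

Every exponent vanishes at $u = 0$, and the two VG forms agree up to floating-point error, which gives a quick consistency check on the parameter map.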

The CGMY process was defined in terms of its Lévy measure $k_{CGMY}(x)\,dx$ with

$$k_{CGMY}(x) = \frac{C}{|x|^{1+Y}} \exp \left( \frac{G - M}{2} x \right) \exp \left( -\frac{G + M}{2} |x| \right) \tag{8}$$

It was shown in [3, Proposition 4.1] (see also [6]) that the CGMY process can be represented as Brownian motion with drift $(G - M)/2$ time changed by a shaved stable $Y/2$ subordinator with shaving function

$$f(y) = e^{-\frac{(B^2 - A^2) y}{2}}\, E \left[ e^{-\frac{B^2 y}{2} \frac{\gamma_{Y/2}}{\gamma_{1/2}}} \right], \quad A = \frac{G - M}{2}, \quad B = \frac{G + M}{2} \tag{9}$$

where $\gamma_{Y/2}$, $\gamma_{1/2}$ are independent gamma variates. One may explicitly evaluate the expectation in terms of Hermite functions:

$$E \left[ e^{-\frac{B^2 y}{2} \frac{\gamma_{Y/2}}{\gamma_{1/2}}} \right] = \frac{\Gamma(Y)}{\Gamma\!\left(\frac{Y}{2}\right) 2^{\frac{Y}{2} - 1}}\, h_{-Y}(B \sqrt{y}) \tag{10}$$

where

$$h_{\nu}(z) = \frac{1}{\Gamma(-\nu)} \int_0^{\infty} e^{-y^2/2 - yz}\, y^{-\nu - 1}\, dy \quad (\nu < 0) \tag{11}$$

A Continuous Time Change

We can introduce stochastic volatility along with a clustering of volatility by time changing these Lévy processes by the integral of the square root process $y(t)$, where

$$dy = \kappa (\eta - y)\, dt + \lambda \sqrt{y}\, dW \tag{12}$$

for an independent Brownian motion $(W(t), t > 0)$. For a candidate Lévy process $X(t)$, we consider as a model for the uncertainty driving the stock the composite process

$$Z(t) = X(Y(t)) \tag{13}$$

where

$$Y(t) = \int_0^t y(u)\, du \tag{14}$$

The characteristic function for the composite process is easily derived from the characteristic function of $Y(t)$, which is

$$E \left[ e^{iuY(t)} \right] = \phi(u, t, y(0), \kappa, \eta, \lambda) = A(t, u) \exp \left( B(t, u)\, y(0) \right) \tag{15}$$

$$A(t, u) = \frac{\exp \left( \frac{\kappa^2 \eta t}{\lambda^2} \right)}{\left( \cosh \frac{\gamma t}{2} + \frac{\kappa}{\gamma} \sinh \frac{\gamma t}{2} \right)^{2 \kappa \eta / \lambda^2}} \tag{16}$$

$$B(t, u) = \frac{2iu}{\kappa + \gamma \coth \frac{\gamma t}{2}} \tag{17}$$

$$\gamma = \sqrt{\kappa^2 - 2 \lambda^2 i u} \tag{18}$$

It follows that

$$E \left[ \exp(iuZ(t)) \right] = \phi(-i \psi_X(u), t, y(0), \kappa, \eta, \lambda) \tag{19}$$

The Stock Price Model

There are two approaches to modeling the stock price $S(t)$. The first approach takes the exponential of the composite process, corrected to get the correct forward price, whereby we define

$$S_1(t) = S(0)\, e^{(r - q)t}\, \frac{\exp(Z(t))}{E \left[ \exp(Z(t)) \right]} \tag{20}$$

In this case, the stock price has the right forward and the resulting option prices are free of static arbitrage. However, there may be the possibility of dynamic arbitrage in the model, and this is an issue if the model is being used continuously to quote on options with constant parameters through time. To exclude dynamic arbitrage in the model, one could form a martingale model for the forward stock price by modeling it as the stochastic exponential of the martingale:

$$n(t) = Z(t) - \int_0^t \int_{-\infty}^{\infty} x\, y(s)\, k_X(x)\, dx\, ds \tag{21}$$
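The characteristic function of the integrated square root process in equations (15) to (18), and its use in equation (19), can be sketched numerically as follows. This is a minimal illustration; the function names are hypothetical, and `mean_integrated_rate` uses the standard mean of the integrated square root process, which is not stated in the text and serves only as a sanity check.

```python
import cmath
import math


def cir_integral_cf(u, t, y0, kappa, eta, lam):
    # phi(u, t, y(0), kappa, eta, lam) = E[exp(i u Y(t))]; u may be complex.
    gamma = cmath.sqrt(kappa**2 - 2j * lam**2 * u)
    ch = cmath.cosh(gamma * t / 2)
    sh = cmath.sinh(gamma * t / 2)
    power = 2 * kappa * eta / lam**2
    A = cmath.exp(kappa**2 * eta * t / lam**2) / (ch + kappa / gamma * sh) ** power
    B = 2j * u / (kappa + gamma * ch / sh)  # coth = cosh / sinh
    return A * cmath.exp(B * y0)


def composite_cf(u, t, y0, kappa, eta, lam, psi_x):
    # E[exp(i u Z(t))] for Z(t) = X(Y(t)): evaluate phi at -i * psi_X(u).
    return cir_integral_cf(-1j * psi_x(u), t, y0, kappa, eta, lam)


def mean_integrated_rate(t, y0, kappa, eta):
    # E[Y(t)] for the mean-reverting rate y (standard result, used as a check).
    return eta * t + (y0 - eta) * (1 - math.exp(-kappa * t)) / kappa
```

With $\psi_X(u) = -u^2/2$ (a standard Brownian motion as the candidate process), `composite_cf` returns $E[\exp(-u^2 Y(t)/2)]$, a real number between 0 and 1.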

In the second approach, one writes the stock price process $S_2(t)$ as

$$S_2(t) = S(0) \exp((r - q)t) \exp \left( Z(t) - Y(t)\, \psi_X(-i) \right) \tag{22}$$

For the first approach, the characteristic function for the logarithm of the stock price is given as

$$E \left[ \exp(iu \log S_1(t)) \right] = \exp \left( iu (\log S(0) + (r - q)t) \right) \times \frac{\phi(-i \psi_X(u), t, y(0); \kappa, \eta, \lambda)}{\phi(-i \psi_X(-i), t, y(0); \kappa, \eta, \lambda)^{iu}} \tag{23}$$

The second approach leads to the following characteristic function:

$$E \left[ \exp(iu \log S_2(t)) \right] = \exp \left( iu (\log S(0) + (r - q)t) \right) \times \phi(-i \psi_X(u) - u \psi_X(-i), t, y(0); \kappa, \eta, \lambda) \tag{24}$$

The models of the first approach are termed NIGSA, VGSA, and CGMYSA for NIG, VG, and CGMY with a stochastic arrival rate of jumps adapted to the level of the process $y(t)$. The models of the second approach are martingale models and are termed NIGSAM, VGSAM, and CGMYSAM, respectively. It is observed in calibrations that the first approach generally fits the option price data better.

Some Discontinuous Time Changes

One can replace the continuous stochastic process for the arrival rate of jump activity $y(t)$ by a discontinuous process that now only has upward jumps. We call this process $y^J(t)$ for discontinuous jump arrival rates. Given a background driving Lévy process (BDLP) $U(t)$ with only positive jumps, we define

$$dy^J(t) = -\kappa\, y^J(t)\, dt + dU(t) \tag{25}$$

The composite process now permits some direct dependence between arrival rate jumps and the underlying uncertainty:

$$Z^J(t) = X(Y^J(t)) + \rho\, U(t) \tag{26}$$

$$Y^J(t) = \int_0^t y^J(s)\, ds \tag{27}$$

We suppose that the background driving Lévy process has the following characteristic function:

$$E \left[ \exp(iuU(t)) \right] = \exp(t\, \psi_U(u)) \tag{28}$$

The characteristic function of the composite process $Z^J(t)$ may be developed in terms of the joint characteristic function of $Y^J(t)$, $U(t)$:

$$\Phi_t(a, b) = E \left[ \exp \left( ia\, Y^J(t) + ib\, U(t) \right) \right] \tag{29}$$

We may show that

$$E \left[ \exp \left( iu Z^J(t) \right) \right] = \Phi_t(-i \psi_X(u), u \rho) \tag{30}$$

We have that

$$\Phi_t(a, b) = \exp \left( ia\, y(0)\, \frac{1 - e^{-\kappa t}}{\kappa} \right) \times \exp \left( \int_L^U \frac{\psi_U(v)}{a + \kappa b - \kappa v}\, dv \right) \tag{31}$$

$$L = b \tag{32}$$

$$U = b + a\, \frac{1 - e^{-\kappa t}}{\kappa} \tag{33}$$

The characteristic functions for the logarithm of the stock price for the exponential model are now

$$E \left[ \exp \left( iu \log S_1^J(t) \right) \right] = \exp \left( iu (\log S(0) + (r - q)t) \right) \times \Phi_t(-i \psi_X(u), \rho u) \times \exp \left( -iu \log \Phi_t(-i \psi_X(-i), -i \rho) \right) \tag{34}$$

For the stochastic exponential, the result is given as

$$E \left[ \exp \left( iu \log S_2^J(t) \right) \right] = \exp \left( iu (\log S(0) + (r - q)t - \psi_U(-i \rho)\, t) \right) \times \Phi_t(-i \psi_X(u) - u \psi_X(-i), \rho u) \tag{35}$$

Some explicit examples for $\psi_U(u)$, for which we may obtain exact expressions for $\Phi_t(a, b)$, remain to be determined.

Examples for $\psi_U(u)$ and $\Phi_t(a, b)$

Three explicit models for $\psi_U$ were developed. These are SG for stationary gamma, IG for inverse Gaussian, and SIG for stationary inverse Gaussian.
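The joint characteristic function of the discontinuous time change and the BDLP can also be evaluated by plain quadrature when no closed form is at hand. The sketch below is an illustration under stated assumptions: the symbol `Phi` and all function names are hypothetical, arguments `a` and `b` are real with `a` nonzero, and the time-integral form is derived here by solving the linear equation (25) for the rate, not taken from the text. The stationary gamma exponent $\psi_U(u) = iu\lambda/(1/\zeta - iu)$ from the SG case is used as the example.

```python
import cmath
import math


def simpson(f, lo, hi, n=2000):
    # Composite Simpson rule for a complex-valued integrand (n must be even).
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3


def psi_sg(u, lam, zeta):
    # Log characteristic function of the stationary gamma BDLP.
    return 1j * u * lam / (1.0 / zeta - 1j * u)


def joint_cf_time_integral(a, b, t, y0, kappa, psi_u):
    # Integrates psi_U(b + a (1 - e^{-kappa s}) / kappa) over s in [0, t].
    decay = (1 - math.exp(-kappa * t)) / kappa
    integral = simpson(
        lambda s: psi_u(b + a * (1 - math.exp(-kappa * s)) / kappa), 0.0, t
    )
    return cmath.exp(1j * a * y0 * decay + integral)


def joint_cf_change_of_variable(a, b, t, y0, kappa, psi_u):
    # Same quantity after substituting v = b + a (1 - e^{-kappa s}) / kappa,
    # which turns ds into dv / (a + kappa b - kappa v).
    decay = (1 - math.exp(-kappa * t)) / kappa
    lower, upper = b, b + a * decay
    integral = simpson(
        lambda v: psi_u(v) / (a + kappa * b - kappa * v), lower, upper
    )
    return cmath.exp(1j * a * y0 * decay + integral)
```

The two routines should agree to quadrature accuracy, which is a useful check before trusting any closed-form primitive of the integrand.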

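The SG Case

In this case, the Lévy density for jumps in the process $U(t)$ is

$$k_U(x) = \frac{\lambda}{\zeta}\, e^{-x/\zeta} \tag{36}$$

The log characteristic function of the BDLP is

$$\psi_U(u) = \frac{iu \lambda}{1/\zeta - iu} \tag{37}$$

The IG Case

The Laplace transform for inverse Gaussian (see Normal Inverse Gaussian Model) time with drift $\nu$ for the Brownian motion is

$$E \left[ \exp \left( -\lambda T_1^{\nu} \right) \right] = \exp \left( \nu - \sqrt{\nu^2 + 2 \lambda} \right) \tag{38}$$

and the log characteristic function is

$$\psi_U(u) = \nu - \sqrt{\nu^2 - 2iu} \tag{39}$$

The SIG Case

For this case, Barndorff-Nielsen and Shephard [1] show that the Lévy density is

$$k_U(x) = \frac{1}{2 \sqrt{2 \pi}}\, x^{-3/2} (1 + \nu^2 x) \exp \left( -\frac{\nu^2 x}{2} \right) \tag{40}$$

The log characteristic function is

$$\psi_U(u) = \frac{iu}{\sqrt{\nu^2 - 2iu}} \tag{41}$$

For these three cases, the construction of $\Phi_t(a, b)$ is completed on determining the integral

$$\int_L^U \frac{\psi_U(v)}{a + \kappa b - \kappa v}\, dv = \Psi(U, a, b) - \Psi(L, a, b) \tag{42}$$

and we have analytic expressions for $\Psi(x, a, b)$ in the SG, IG, and SIG cases that are as follows:

$$\Psi_{SG}(x, a, b; \kappa, \lambda, \zeta) = \log \left[ \left( x + \frac{i}{\zeta} \right)^{\frac{\lambda}{\kappa - i \zeta (a + \kappa b)}} \times (a + \kappa b - \kappa x)^{\frac{\lambda (a + \kappa b) \zeta}{\kappa ((a + \kappa b) \zeta + i \kappa)}} \right] \tag{43}$$

$$\Psi_{IG}(x, a, b; \kappa, \nu) = \frac{2 \sqrt{\nu^2 - 2ix}}{\kappa} - \frac{2 \sqrt{\nu^2 \kappa - 2i(a + \kappa b)}}{\kappa^{3/2}} \times \operatorname{arctanh} \left( \frac{\sqrt{\kappa} \sqrt{\nu^2 - 2ix}}{\sqrt{\nu^2 \kappa - 2i(a + \kappa b)}} \right) - \frac{\nu \log(a + \kappa b - \kappa x)}{\kappa} \tag{44}$$

$$\Psi_{SIG}(x, a, b; \kappa, \nu) = \frac{\sqrt{\nu^2 - 2ix}}{\kappa} + \frac{2i(a + \kappa b)}{\kappa^{3/2} \sqrt{\nu^2 \kappa - 2i(a + \kappa b)}} \times \operatorname{arctanh} \left( \frac{\sqrt{\kappa} \sqrt{\nu^2 - 2ix}}{\sqrt{\nu^2 \kappa - 2i(a + \kappa b)}} \right) \tag{45}$$

Correlation in VGSA or VGCSA

We consider the introduction of correlation in VGSA along the following lines. We define the correlated uncertainty as

$$Z^C(t) = X(Y(t)) + \rho\, y(t) \tag{46}$$

The characteristic function now follows from the joint characteristic function of $Y(t)$, $y(t)$:

$$E \left[ e^{iuZ^C(t)} \right] = E \left[ e^{Y(t) \psi_X(u) + iu \rho\, y(t)} \right] \tag{47}$$

Let

$$\Phi_t^C(a, b, x) = E \left[ \exp \left( ia Y(t) + ib\, y(t) \right) \mid y(0) = x \right] \tag{48}$$

We have

$$\phi_{Z^C}(u) = \Phi_t^C(-i \psi_X(u), \rho u, y(0)) \tag{49}$$

We recall the solution for $\Phi_t^C(a, b, x)$ from [5, 7] as

$$\Phi_t^C(a, b, x) = A^C(t, a, b) \exp \left( B^C(t, a, b)\, x \right) \tag{50}$$

$$A^C(t, a, b) = \frac{\exp \left( \frac{\kappa^2 \eta t}{\lambda^2} \right)}{\left( \cosh \frac{\gamma t}{2} + \frac{\kappa - ib \lambda^2}{\gamma} \sinh \frac{\gamma t}{2} \right)^{2 \kappa \eta / \lambda^2}} \tag{51}$$

$$B^C(t, a, b) = \frac{ib \left( \gamma \cosh \frac{\gamma t}{2} - \kappa \sinh \frac{\gamma t}{2} \right) + 2ia \sinh \frac{\gamma t}{2}}{\gamma \cosh \frac{\gamma t}{2} + \left( \kappa - ib \lambda^2 \right) \sinh \frac{\gamma t}{2}} \tag{52}$$

$$\gamma = \sqrt{\kappa^2 - 2 \lambda^2 i a} \tag{53}$$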
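The joint characteristic function of the integrated rate and the rate itself, as laid out in equations (50) to (53), can be sketched and sanity-checked as follows. The function names are hypothetical, and the two mean formulas are standard results for the square root process that are assumed here only to test the implementation.

```python
import cmath
import math


def joint_cf_cir(a, b, t, x, kappa, eta, lam):
    # E[exp(i a Y(t) + i b y(t)) | y(0) = x] in exponential affine form.
    gamma = cmath.sqrt(kappa**2 - 2j * lam**2 * a)
    ch = cmath.cosh(gamma * t / 2)
    sh = cmath.sinh(gamma * t / 2)
    shifted = kappa - 1j * b * lam**2
    power = 2 * kappa * eta / lam**2
    A = cmath.exp(kappa**2 * eta * t / lam**2) / (ch + shifted / gamma * sh) ** power
    B = (1j * b * (gamma * ch - kappa * sh) + 2j * a * sh) / (gamma * ch + shifted * sh)
    return A * cmath.exp(B * x)


def mean_rate(t, x, kappa, eta):
    # E[y(t) | y(0) = x] for the mean-reverting square root process.
    return eta + (x - eta) * math.exp(-kappa * t)


def mean_integrated_rate(t, x, kappa, eta):
    # E[Y(t) | y(0) = x], the time integral of mean_rate.
    return eta * t + (x - eta) * (1 - math.exp(-kappa * t)) / kappa
```

Setting `b = 0` recovers the characteristic function of the integrated rate alone, and the first derivatives in `a` and `b` at the origin return `1j` times the two means, which is how the test values were chosen.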

We get the characteristic function for the model VGCSA, where the letter C denotes correlated stochastic arrival, by exponentiation as

$$E \left[ \exp \left( iu \log S_1^C(t) \right) \right] = \exp \left( iu (\log S(0) + (r - q)t) \right) \times \Phi_t^C(-i \psi_X(u), \rho u, y(0)) \times \exp \left( -iu \log \Phi_t^C(-i \psi_X(-i), -i \rho, y(0)) \right) \tag{54}$$

Exciting the Jumps by the Level of the Activity that Is Also a Heston Type of Correlated Volatility

In this class of models, we introduce stochastic volatility and allow jump arrival rates to respond to the volatility on each side with separate sensitivities. This will give rise to stochastic skewness as well as to volatility. The model for the logarithm of the stock price $H(t) = \log(S(t))$ is now as follows:

$$H(t) = H(0) + (r - q)t - \int_0^t \frac{y(u)}{2}\, du - \left( c_p t + s_p Y(t) \right) \int_0^{\infty} (e^x - 1 - x)\, k_p(x)\, dx - \left( c_n t + s_n Y(t) \right) \int_{-\infty}^0 (e^x - 1 - x)\, k_n(x)\, dx + x * (\mu - \nu) + \int_0^t \sqrt{y(u)}\, dW_S(u) \tag{55}$$

where

$$Y(t) = \int_0^t y(u)\, du \tag{56}$$

$$dy = \kappa (\eta - y)\, dt + \lambda \sqrt{y}\, dW_y(t) \tag{57}$$

$$dW_y\, dW_S = \rho\, dt \tag{58}$$

$$\nu(dx, dt) = \left( c_p + s_p y(t) \right) k_p(x)\, 1_{x > 0}\, dx\, dt + \left( c_n + s_n y(t) \right) k_n(x)\, 1_{x < 0}\, dx\, dt \tag{59}$$

The growth rate of the stock price is at the risk neutral level of $(r - q)$. The coefficients $c_p$, $c_n$ are Lévy jump response components. The sensitivities of jumps to volatility are captured by the two slope coefficients $s_p$, $s_n$ for the positive and the negative sides. The logarithm of the stock price is a continuous martingale with stochastic volatility plus a compensated jump martingale that has jumps responding to volatility, with the log price drift set to fix the stock drift at $r - q$.

The joint characteristic function of the log of the stock price, the level of the terminal variance, and the remaining integrated variance is

$$\Phi(t, H(t), y(t)) = E_t \left[ \exp \left( ia H(T) + ib\, y(T) + ic \int_t^T y(u)\, du \right) \right] \tag{60}$$

We have a closed form for $\Phi$ in this model, with $\tau$ the remaining time $T - t$, given as

$$\Phi(t, H(t), y(t)) = A(\tau) \exp \left( ia H(t) + \gamma(\tau)\, y(t) \right) \tag{61}$$

$$A(\tau) = \exp \left( \left[ ia(r - q) + c_p u_p + c_n u_n + (\kappa - \lambda \rho i a) \frac{\kappa \eta}{\lambda^2} \right] \tau \right) \left( \frac{\cosh(D)}{\cosh \left( D - \xi \frac{\tau}{2} \right)} \right)^{2 \kappa \eta / \lambda^2} \tag{62}$$

$$\gamma(\tau) = \frac{\kappa - \lambda \rho i a}{\lambda^2} + \frac{\xi}{\lambda^2} \tanh \left( D - \frac{\xi}{2} \tau \right) \tag{63}$$

$$\xi = \sqrt{(\kappa - \lambda \rho i a)^2 + \lambda^2 \left( a^2 + i(a - 2c) - 2(s_p u_p + s_n u_n) \right)} \tag{64}$$

$$u_p = \int_0^{\infty} (e^x - 1 - iax)\, k_p(x)\, dx - ia \int_0^{\infty} (e^x - 1 - x)\, k_p(x)\, dx \tag{65}$$

$$u_n = \int_{-\infty}^0 (e^x - 1 - iax)\, k_n(x)\, dx - ia \int_{-\infty}^0 (e^x - 1 - x)\, k_n(x)\, dx \tag{66}$$

$$D = \tanh^{-1} \left( \frac{ib \lambda^2}{\xi} - \frac{\kappa - \lambda \rho i a}{\xi} \right) \tag{67}$$

On setting $b = c = 0$, we obtain the characteristic function of the log of the final stock price, and this yields the models SVADNE, SVAVG, and SVACCGMYY.

We note that for DNE

$$u_p = \frac{1}{\beta_p - 1} (1 - ia) \tag{68}$$

$$u_n = -\frac{1}{\beta_n + 1} (1 - ia) \tag{69}$$

The corresponding calculations for VG in the CGM parameterization are

$$u_p = \log \frac{M}{M - 1} \tag{70}$$

$$u_n = \log \frac{G}{G + 1} \tag{71}$$

For CCGMYY, we have the following result:

$$u_p = \Gamma(-y_p) \left( (M - 1)^{y_p} - M^{y_p} \right) \tag{72}$$

$$u_n = \Gamma(-y_n) \left( (G + 1)^{y_n} - G^{y_n} \right) \tag{73}$$

References

[1] Barndorff-Nielsen, O.E. (1998). Processes of normal inverse Gaussian type, Finance and Stochastics 2, 41–68.
[2] Carr, P., Geman, H., Madan, D. & Yor, M. (2003). Stochastic volatility for Lévy processes, Mathematical Finance 13, 345–382.
[3] Cont, R. & Tankov, P. (2004). Financial Modelling with Jump Processes, Series in Financial Mathematics, CRC Press.
[4] Duffie, D., Filipovic, D. & Schachermayer, W. (2003). Affine processes and applications in finance, Annals of Applied Probability 13, 984–1053.
[5] Lamberton, D. & Lapeyre, B. (1996). Introduction to Stochastic Calculus Applied to Finance, Chapman and Hall, New York.
[6] Madan, D. & Yor, M. (2008). Representing the CGMY and Meixner Lévy processes as time changed Brownian motions, Journal of Computational Finance Fall, 27–47.
[7] Pitman, J. & Yor, M. (1982). A decomposition of Bessel Bridges, Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 59, 425–457.

Further Reading

Carr, P., Geman, H., Madan, D. & Yor, M. (2002). The fine structure of asset returns: an empirical investigation, Journal of Business 75(2), 305–332.
Madan, D., Carr, P. & Chang, E. (1998). The variance gamma process and option pricing, European Finance Review 2, 79–105.
Madan, D.B. & Seneta, E. (1990). The Variance Gamma (VG) model for share market returns, Journal of Business 63, 511–524.

Related Articles

Affine Models; Barndorff-Nielsen and Shephard (BNS) Models; Exponential Lévy Models; Heston Model; Lévy Processes; Normal Inverse Gaussian Model; Squared Bessel Processes; Stochastic Exponential; Tempered Stable Process; Time Change; Variance-gamma Model.

DILIP B. MADAN
Kou Model

It is well known that empirically asset return distributions have heavier tails (see Heavy Tails) than those of normal distributions, in contrast to the classical Black–Scholes model (see Black–Scholes Formula). Jump-diffusion models are among the most popular alternative models proposed to address this issue, and they are especially useful to price options with short maturities (see Exponential Lévy Models). However, analytical tractability is one of the challenges faced by many alternative models. More precisely, although many alternative models can lead to analytical solutions for European call and put options, unlike the Black–Scholes model, it is difficult to do so for path-dependent options such as lookback (see Lookback Options), barrier (see Barrier Options), and American options, which are treated using numerical methods (see Partial Integro-differential Equations (PIDEs)). For example, the convergence rates of binomial trees and Monte Carlo simulation for path-dependent options are typically much slower than those for call and put options; see Boyle et al. [3].

The double exponential jump-diffusion model is a jump-diffusion model in which the jump size distribution follows a two-sided exponential distribution. It was introduced to further extend the analytical tractability of models with jumps.

In jump-diffusion models under the physical probability measure $P$, the asset price, $S(t)$, is modeled as

$$\frac{dS(t)}{S(t-)} = \mu\, dt + \sigma\, dW(t) + d \left( \sum_{i=1}^{N(t)} (V_i - 1) \right) \tag{1}$$

where $W(t)$ is a standard Brownian motion, $N(t)$ is a Poisson process with rate $\lambda$, and $\{V_i\}$ is a sequence of independent identically distributed (i.i.d.) nonnegative random variables. All the sources of randomness, $N(t)$, $W(t)$, and the $V$'s, are assumed to be independent. Solving the stochastic differential equation (1) gives the dynamics of the asset price:

$$S(t) = S(0) \exp \left\{ \left( \mu - \frac{1}{2} \sigma^2 \right) t + \sigma W(t) \right\} \prod_{i=1}^{N(t)} V_i \tag{2}$$

In the Merton model [28], $Y = \log(V)$ has a normal distribution. In the double exponential jump-diffusion model [23], $Y = \log(V)$ has an asymmetric double exponential distribution with the density

$$f_Y(y) = p \cdot \eta_1 e^{-\eta_1 y}\, 1_{\{y \ge 0\}} + q \cdot \eta_2 e^{\eta_2 y}\, 1_{\{y < 0\}}, \quad \eta_1 > 1,\ \eta_2 > 0 \tag{3}$$

where $p, q \ge 0$, $p + q = 1$, represent the probabilities of upward and downward jumps. The requirement $\eta_1 > 1$ is needed to ensure that $E(V) < \infty$ and $E(S(t)) < \infty$; it essentially means that the average upward jump cannot exceed 100%, which is quite reasonable [2].

As pointed out in [23], the jump part of the double exponential jump-diffusion model can be interpreted as the market response to outside developments; and the heavier tail and higher peak (in comparison to the standard normal distribution) of the double exponential distribution attempt to model market overreaction and underreaction, respectively. Ramezani and Zeng [29] independently proposed the double exponential jump-diffusion model from an econometric viewpoint as a way of improving the empirical fit of Merton's normal jump-diffusion model to stock price data.

Such models lead to incomplete markets in which the replication of an option payoff is impossible. The monograph by Cont and Tankov [10] discusses hedging issues for jump-diffusion models and resulting pricing measures. Alternatively, one can use the rational expectations in [27] and [32] to choose a risk-neutral measure to price derivatives as in [23].

The double exponential jump-diffusion model belongs to the class of exponential Lévy models (see Exponential Lévy Models). There is a large literature on Lévy processes in finance, including several excellent books, for example, the books by Cont and Tankov [10] and Kijima [22].

Analytical Tractability

The main advantage of the double exponential jump-diffusion model is that it offers a rare case where we can derive the analytical solution of the joint distribution of the first passage time and $X(t) = \log(S(t)/S(0))$, thereby making it possible to price path-dependent options such as lookback, barrier, and perpetual American options. An intuitive explanation for this follows.

Figure 1 A simulated sample path with the overshoot problem

To price lookback, barrier, and perpetual American options, it is pivotal to study the first passage times $\tau_b$ when the process crosses a flat boundary with a level $b$. Without loss of generality, assume that $b > 0$. When a jump-diffusion process crosses the boundary, sometimes it hits the boundary exactly and sometimes it incurs an "overshoot", $X_{\tau_b} - b$, over the boundary as shown in Figure 1. The overshoot presents several problems if one wants to compute the distribution of the first passage time analytically. First, one needs the exact distribution of the overshoot, $X_{\tau_b} - b$; particularly, $P(X_{\tau_b} - b = 0)$ and $P(X_{\tau_b} - b > x)$, $x > 0$. Second, one needs to know the dependence structure between the overshoot, $X_{\tau_b} - b$, and the first passage time $\tau_b$.

These difficulties may be resolved under the assumption that the jump size $Y$ has a double exponential distribution. Mathematically, this is because the exponential function has some very nice properties: the product of exponential functions is still an exponential function, and the derivatives of exponential functions are still exponential functions. These nice properties enable us to solve related ordinary integro-differential equations (OIDE) explicitly, leading to analytical solutions for the marginal and joint distributions of the first passage times, and ultimately, analytical tractability for pricing lookback, barrier, and perpetual American options. More precisely, the infinitesimal generator of the return process $X(t)$ is given by

$$Lu(x) = \frac{1}{2} \sigma^2 u''(x) + \tilde{\mu}\, u'(x) + \lambda \int_{-\infty}^{\infty} \left[ u(x + y) - u(x) \right] f_Y(y)\, dy \tag{4}$$

for all twice continuously differentiable functions $u(x)$. When studying the first passage time, we encounter an OIDE with discontinuous regions as follows:

$$\begin{cases} (Lu)(x) = \alpha u(x), & x < x_0 \\ u(x) = g(x), & x \ge x_0 \end{cases} \tag{5}$$

where $\alpha > 0$ and $g(x)$ is a given function. Many times $x_0$ is a fixed number, but in the case of American options, $x_0$ is a parameter that needs to be determined by solving a free boundary problem. Note that $u(x)$ solves the OIDE not for all $x \in \mathbb{R}$ but only for $x < x_0$. However, $u(x)$ does involve the information on $x > x_0$, as the integral inside the generator (4) depends on the function $g(x)$, which makes the problem more complicated. This OIDE can be solved explicitly under the double exponential jump-diffusion model, thereby leading to an analytical solution of the joint distribution of the first passage time $\tau_b$ and $X_t$; see [25, 26], and [24].

In addition to pricing options related to the first passage times, the double exponential jump-diffusion models have been studied in many papers. What is detailed below is only a snapshot of some interesting results.

1. In terms of computational issues, see [11] and [12] for numerical methods via solving partial integro-differential equations (see Partial Integro-differential Equations (PIDEs)); Feng and Linetsky [17] and Feng et al. [16] showed how to price path-dependent options numerically via extrapolation and variational methods.
2. In terms of applications, see the references in [18] for applications in fixed income derivatives and term structure models, and the references in [9] for applications in credit risk and credit derivatives.
3. Double-barrier options (with both upper and lower barriers) are studied in [30] and [4].
4. Statistical inference and econometric analysis for Lévy processes are discussed in [31].

Volatility Clustering Effect

In addition to the leptokurtic feature, returns distributions also have an interesting dependence structure, called the volatility clustering effect; see [14]. More precisely, the volatilities of returns (which are related to the squared returns) are correlated, but asset returns themselves have almost no autocorrelation. In other words, a large movement in asset prices, either upward or downward, tends to generate large movements in the future asset prices, although the direction of the movements is unpredictable.

In particular, any model for stock returns with independent increments (such as Lévy processes) cannot incorporate the volatility clustering effect. However, one can combine jump-diffusion processes with other processes [1, 13] or consider time-changed Brownian motion and Lévy processes (see Time Change) to incorporate the volatility clustering effect. More precisely, if $\tau(t)$ contains a diffusion component (i.e., not a subordinator), then $W(\tau(t))$ and $X(\tau(t))$ may have dependent increments and no longer be Lévy processes; see [6, 7], and [8].

Hyper-Exponential Jumps

Although the main empirical motivation for using Lévy processes in finance comes from the fact that asset return distributions tend to have tails heavier than those of the normal distribution, it is not clear how heavy the tail distributions are, as some people favor power-type distributions and others exponential-type distributions, although, as pointed out in [23, p. 1090], the power-type right tails cannot be used in models with continuous compounding as they lead to infinite expectation for the asset price. We stress that, quite surprisingly, it is very difficult to distinguish power-type tails from exponential-type tails from empirical data unless one has an extremely large sample size, perhaps in the order of tens of thousands or even hundreds of thousands [19]. Therefore, it is very difficult to choose a good model based on the limited empirical data alone.

A good intuition may be obtained by simply looking at the quantiles for both standardized Laplace (with a symmetric density $f(x) = \frac{1}{2} e^{-x} I_{[x > 0]} + \frac{1}{2} e^{x} I_{[x < 0]}$) and standardized t distributions with mean 0 and variance 1. The right quantiles for the Laplace and normalized t densities with degrees of freedom (DOF) from 3 to 7 are given in Table 1.

Table 1 The right quantiles of the Laplace and normalized t-distributions

Prob.     Laplace   t7      t6      t5      t4      t3
1%        2.77      2.53    2.57    2.61    2.65    2.62
0.1%      4.39      4.04    4.25    4.57    5.07    5.90
0.01%     6.02      5.97    6.55    7.50    9.22    12.82
0.001%    7.65      8.54    9.82    12.04   16.50   27.67

This table shows that the Laplace distributions may have higher tail probabilities than t distributions, even if asymptotically the Laplace distributions should have lighter tails than t distributions. For example, regardless of the sample size, the Laplace distribution may appear to be heavier tailed than a t-distribution with DOF 6 or 7, up to the 99.9th percentile. To distinguish the distributions it is necessary to use quantiles with very low p values and correspondingly large sample sizes for statistical inference. If the true quantiles have to be estimated from data, then the problem is even worse, as the sample standard deviations need to be considered, resulting in sample sizes typically in the tens of thousands or even hundreds of thousands necessary to distinguish power-type tails from exponential-type tails. For further discussion, see [20], which also discusses the implications in terms of risk measures.

The difficulty in distinguishing tail behavior motivated Cai and Kou [5] to extend the double exponential jump-diffusion model to a hyperexponential jump-diffusion model, in which the jump size $\{Y_i := \log(V_i) : i = 1, 2, \ldots\}$ is a sequence of i.i.d. hyperexponential random variables with density

$$f_Y(x) = \sum_{i=1}^{m} p_i \eta_i e^{-\eta_i x} I_{\{x \ge 0\}} + \sum_{j=1}^{n} q_j \theta_j e^{\theta_j x} I_{\{x < 0\}} \tag{6}$$

where $p_i > 0$ and $\eta_i > 1$ for all $i = 1, \ldots, m$, $q_j > 0$ and $\theta_j > 0$ for all $j = 1, \ldots, n$, and $\sum_{i=1}^{m} p_i + \sum_{j=1}^{n} q_j = 1$. Here the condition that $\eta_i > 1$, for all $i = 1, \ldots, m$, is imposed to ensure that the stock price $S_t$ has a finite expectation.

The hyperexponential distribution is general enough to provide a link between various heavy-tail distributions, no matter which ones we prefer. In particular, any completely monotone distribution, that is, one with a density $f(x)$ satisfying the condition that all derivatives of $f(x)$ exist and $(-1)^n f^{(n)}(x) \ge 0$ for all $x$ and $n \ge 1$, can be approximated by hyperexponential distributions as closely as possible in the sense of weak convergence. Many distributions with tails heavier than those of the normal distribution are completely monotone. Here are some examples of completely monotone distributions frequently used in finance:

1. Gamma distribution. The density of Gamma$(\alpha, \beta)$ is proportional to $x^{\alpha - 1} e^{-\beta x}$, where $\alpha, \beta > 0$. When $\alpha < 1$, the distribution is completely monotone.
2. Weibull distribution. The cumulative distribution function of Weibull$(c, d)$ is given by $1 - e^{-(x/d)^c}$, where $c, d > 0$. When $c < 2$, it has heavier tails than the normal distribution.
3. Pareto distribution. The distribution of Pareto$(a, b)$ is given by $1 - (1 + bx)^{-a}$, where $a, b > 0$.
4. Pareto mixture of exponential distribution (PME). The density of PME$(a, b)$ is given by $\int_0^{+\infty} f_{a,b}(y)\, y^{-1} e^{-x/y}\, dy$, where $f_{a,b}$ is the density of the Pareto$(a, b)$.

In summary, many heavy-tail distributions used in finance can be approximated arbitrarily closely by the hyperexponential distribution. Feldmann and Whitt [15] develop a numerical algorithm to approximate completely monotone distributions by the hyperexponential distribution.

Cai and Kou [5] show that the hyperexponential jump-diffusion model can lead to analytical solutions for popular path-dependent options, such as lookback, barrier, and perpetual American options. These analytical solutions are made possible mainly because several high-order integro-differential equations related to first passage time problems and optimal stopping problems can be solved explicitly. Solving the high-order integro-differential equations is the main technical contribution of [5], which is achieved by discovering a connection between integro-differential equations and homogeneous ordinary differential equations in the case of the hyperexponential jump-diffusion generator.

Multivariate Version

A significant drawback of most of the Lévy processes discussed in the literature is that they are one dimensional, whereas many options traded in markets have several underlying assets. To overcome this, Huang and Kou [21] introduced a multivariate jump-diffusion model in which, under the physical measure $P$, the following stochastic differential equation is proposed to model the asset prices $S(t)$:

$$\frac{dS(t)}{S(t-)} = \mu\, dt + \sigma\, dW(t) + d \left( \sum_{i=1}^{N(t)} (V_i - 1) \right) \tag{7}$$

where $W(t)$ is an $n$-dimensional standard Brownian motion and $\sigma \in R^{n \times n}$, with the covariance matrix $\Sigma = \sigma \sigma^{T}$. The rate of the Poisson process $N(t)$ is $\lambda = \lambda_c + \sum_{k=1}^{n} \lambda_k$; in other words, there are two types of jumps, common jumps for all assets with jump rate $\lambda_c$ and individual jumps with rate $\lambda_k$, $1 \le k \le n$, only for the $k$th asset.

The logarithms of the common jumps have an $n$-dimensional asymmetric Laplace distribution $AL_n(m_c, J_c)$, where $m_c = (m_{1,c}, \ldots, m_{n,c}) \in R^n$ and $J_c \in R^{n \times n}$ is positive definite. For the individual jumps of the $k$th asset, the logarithms of the jump sizes follow a one-dimensional asymmetric Laplace distribution, $AL_1(m_k, v_k^2)$. In summary,

$$Y = \log(V) \sim \begin{cases} AL_n(m_c, J_c), & \text{with prob. } \lambda_c / \lambda \\ (\underbrace{0, \ldots, 0}_{k-1}, AL_1(m_k, v_k^2), \underbrace{0, \ldots, 0}_{n-k}), & \text{with prob. } \lambda_k / \lambda,\ 1 \le k \le n \end{cases} \tag{8}$$
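The mixture in the display above says that each arrival of the Poisson process is a common jump with probability $\lambda_c/\lambda$ and an individual jump of asset $k$ with probability $\lambda_k/\lambda$. A minimal sketch of that selection step follows; the function name and return convention are hypothetical, and the draw of the jump size itself from the asymmetric Laplace law is deliberately left out.

```python
import random


def sample_jump_component(rng, lam_common, lam_individual):
    # Returns "common" for a jump shared by all assets, or the 1-based
    # index k of the single asset that jumps on its own.
    labels = ["common"] + list(range(1, len(lam_individual) + 1))
    weights = [lam_common] + list(lam_individual)
    return rng.choices(labels, weights=weights, k=1)[0]


def component_probabilities(lam_common, lam_individual):
    # Mixture weights lambda_c / lambda and lambda_k / lambda.
    total = lam_common + sum(lam_individual)
    return [lam_common / total] + [lam_k / total for lam_k in lam_individual]
```

For example, with `lam_common = 2.0` and `lam_individual = [1.0, 1.0]`, about half of the arrivals are common jumps and a quarter belong to each asset.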

The sources of randomness, $N(t)$, $W(t)$, are assumed to be independent of the jump sizes $V_i$. Jumps at different times are assumed to be independent. Note that in the univariate case, the above model degenerates to the double exponential jump-diffusion model [23] but with $p \eta_1 = q \eta_2$.

In the special case of a two-dimensional model, the two-dimensional jump-diffusion return process $(X_1(t), X_2(t))$, with $X_i(t) = \log(S_i(t)/S_i(0))$, is given by

$$X_1(t) = \mu_1 t + \sigma_1 W_1(t) + \sum_{i=1}^{N(t)} Y_i^{(1)}$$

$$X_2(t) = \mu_2 t + \sigma_2 \left[ \rho W_1(t) + \sqrt{1 - \rho^2}\, W_2(t) \right] + \sum_{i=1}^{N(t)} Y_i^{(2)} \tag{9}$$

Here all the parameters are risk-neutral parameters; $W_1(t)$ and $W_2(t)$ are two independent standard Brownian motions; and $N(t)$ is a Poisson process with rate $\lambda = \lambda_c + \lambda_1 + \lambda_2$. The distribution of the logarithm of the jump sizes $Y_i$ is given by

$$Y_i = (Y_i^{(1)}, Y_i^{(2)}) \sim \begin{cases} AL_2(m_c, J_c), & \text{with prob. } \lambda_c / \lambda \\ (AL_1(m_1, v_1^2), 0), & \text{with prob. } \lambda_1 / \lambda \\ (0, AL_1(m_2, v_2^2)), & \text{with prob. } \lambda_2 / \lambda \end{cases} \tag{10}$$

where the parameters for the common jumps are

$$m_c = \begin{pmatrix} m_{1,c} \\ m_{2,c} \end{pmatrix} \quad \text{and} \quad J_c = \begin{pmatrix} v_{1,c}^2 & c\, v_{1,c} v_{2,c} \\ c\, v_{1,c} v_{2,c} & v_{2,c}^2 \end{pmatrix} \tag{11}$$

The infinitesimal generator of $\{X_1(t), X_2(t)\}$ is given by

$$\begin{aligned} Lu = {} & \mu_1 \frac{\partial u}{\partial x_1} + \mu_2 \frac{\partial u}{\partial x_2} + \frac{1}{2} \sigma_1^2 \frac{\partial^2 u}{\partial x_1^2} + \frac{1}{2} \sigma_2^2 \frac{\partial^2 u}{\partial x_2^2} + \rho \sigma_1 \sigma_2 \frac{\partial^2 u}{\partial x_1 \partial x_2} \\ & + \lambda_c \int_{y_2 = -\infty}^{\infty} \int_{y_1 = -\infty}^{\infty} \left[ u(x_1 + y_1, x_2 + y_2) - u(x_1, x_2) \right] f^c_{(Y^{(1)}, Y^{(2)})}(y_1, y_2)\, dy_1\, dy_2 \\ & + \lambda_1 \int_{y_1 = -\infty}^{\infty} \left[ u(x_1 + y_1, x_2) - u(x_1, x_2) \right] f_{Y^{(1)}}(y_1)\, dy_1 \\ & + \lambda_2 \int_{y_2 = -\infty}^{\infty} \left[ u(x_1, x_2 + y_2) - u(x_1, x_2) \right] f_{Y^{(2)}}(y_2)\, dy_2 \end{aligned} \tag{12}$$

for all twice continuously differentiable functions $u(x_1, x_2)$, where $f^c_{(Y^{(1)}, Y^{(2)})}(y_1, y_2)$ is the joint density of correlated common jumps $AL_2(m_c, J_c)$, and $f_{Y^{(i)}}(y_i)$ is the individual jump density of $AL_1(m_i, v_i^2)$, $i = 1, 2$.

One difficulty in studying the generator is that the joint density of the asymmetric Laplace distribution has no analytical expression. Therefore, the calculation related to the joint density and generator becomes complicated. See [21] for change of measures from a physical measure to a risk-neutral measure, analytical solutions for the first passage times, and pricing formulae for barrier and exchange options.

References

[1] Barndorff-Nielsen, O.E. & Shephard, N. (2001). Non-Gaussian Ornstein-Uhlenbeck based models and some of their uses in financial economics (with discussion), Journal of Royal Statistical Society, Series B 63, 167–241.
[2] Boyarchenko, S. & Levendorskii, S. (2002). Non-Gaussian Merton-Black-Scholes Theory, World Scientific, Singapore.

[3] Boyle, P., Broadie, M. & Glasserman, P. (1997). Monte Carlo methods for security pricing, Journal of Economic Dynamics and Control 21(8–9), 1267–1321.
[4] Cai, N., Chen, N. & Wan, X. (2008). Pricing Double Barrier Options Under a Flexible Jump Diffusion Model, Hong Kong University of Science and Technology. Preprint.
[5] Cai, N. & Kou, S.G. (2008). Option Pricing Under a HyperExponential Jump Diffusion Model, Columbia University. Preprint.
[6] Carr, P., Geman, H., Madan, D. & Yor, M. (2002). The fine structure of asset returns: an empirical investigation, Journal of Business 75, 305–332.
[7] Carr, P., Geman, H., Madan, D. & Yor, M. (2003). Stochastic volatility for Lévy processes, Mathematical Finance 13, 345–382.
[8] Carr, P. & Wu, L. (2004). Time-changed Lévy processes and option pricing, Journal of Financial Economics 71, 113–141.
[9] Chen, N. & Kou, S.G. (2005). Credit spreads, optimal capital structure, and implied volatility with endogenous default and jump risk, Mathematical Finance Preprint, Columbia University. To appear.
[10] Cont, R. & Tankov, P. (2004). Financial Modelling with Jump Processes, 2nd Printing, Chapman & Hall/CRC Press, London.
[11] Cont, R. & Voltchkova, E. (2005). Finite difference methods for option pricing in jump-diffusion and exponential Lévy models, SIAM Journal of Numerical Analysis 43, 1596–1626.
[12] d’Halluin, Y., Forsyth, P.A. & Vetzal, K.R. (2003). Robust Numerical Methods for Contingent Claims under Jump-diffusion Processes, Working paper, University of Waterloo.
[13] Duffie, D., Pan, J. & Singleton, K. (2000). Transform analysis and asset pricing for affine jump-diffusions, Econometrica 68, 1343–1376.
[14] Engle, R. (1995). ARCH: Selected Readings, Oxford University Press.
[15] Feldmann, A. & Whitt, W. (1998). Fitting mixtures of exponentials to long-tail distributions to analyze network performance models, Performance Evaluation 31, 245–279.
[16] Feng, L., Kovalov, P., Linetsky, V. & Marcozzi, M. (2007). Variational methods in derivatives pricing, in Handbook of Financial Engineering, J. Birge & V. Linetsky, eds, Elsevier, Amsterdam.
[17] Feng, L. & Linetsky, V. (2008). Pricing options in jump-diffusion models: an extrapolation approach, Operations Research 52, 304–325.
[18] Glasserman, P. & Kou, S.G. (2003). The term structure of simple forward rates with jump risk, Mathematical Finance 13, 383–410.
[19] Heyde, C.C. & Kou, S.G. (2004). On the controversy over tailweight of distributions, Operations Research Letters 32, 399–408.
[20] Heyde, C.C., Kou, S.G. & Peng, X.H. (2008). What is a Good Risk Measure: Bridging the Gaps Between Robustness, Subadditivity, Prospect Theory, and Insurance Risk Measures, Columbia University. Preprint.
[21] Huang, Z. & Kou, S.G. (2006). First Passage Times and Analytical Solutions for Options on Two Assets with Jump Risk, Columbia University. Preprint.
[22] Kijima, M. (2002). Stochastic Processes with Applications to Finance, Chapman & Hall, London.
[23] Kou, S.G. (2002). A jump-diffusion model for option pricing, Management Science 48, 1086–1101.
[24] Kou, S.G., Petrella, G. & Wang, H. (2005). Pricing path-dependent options with jump risk via Laplace transforms, Kyoto Economic Review 74, 1–23.
[25] Kou, S.G. & Wang, H. (2003). First passage time of a jump diffusion process, Advances in Applied Probability 35, 504–531.
[26] Kou, S.G. & Wang, H. (2004). Option pricing under a double exponential jump-diffusion model, Management Science 50, 1178–1192.
[27] Lucas, R.E. (1978). Asset prices in an exchange economy, Econometrica 46, 1429–1445.
[28] Merton, R.C. (1976). Option pricing when underlying stock returns are discontinuous, Journal of Financial Economics 3, 125–144.
[29] Ramezani, C.A. & Zeng, Y. (2002). Maximum Likelihood Estimation of Asymmetric Jump-Diffusion Process: Application to Security Prices, Working Paper, Department of Mathematics and Statistics, University of Missouri, Kansas City.
[30] Sepp, A. (2004). Analytical pricing of double-barrier options under a double exponential jump diffusion process: applications of Laplace transform, International Journal of Theoretical and Applied Finance 7, 151–175.
[31] Singleton, K. (2006). Empirical Dynamic Asset Pricing, Princeton University Press.
[32] Stokey, N.L. & Lucas, R.E. (1989). Recursive Methods in Economic Dynamics, Harvard University Press.

Further Reading

Hull, J. (2005). Options, Futures, and Other Derivatives, Prentice Hall.

Related Articles

Barrier Options; Exponential Lévy Models; Jump Processes; Lookback Options; Partial Integro-differential Equations (PIDEs); Wiener–Hopf Decomposition.

STEVEN KOU
Exponential Lévy Models

Exponential Lévy models generalize the classical Black and Scholes model by allowing the stock prices to jump while preserving the independence and stationarity of returns. There are ample reasons for introducing jumps in financial modeling. First, asset prices exhibit jumps, and the associated risks cannot be handled within continuous-path models. Second, the well-documented phenomenon of implied volatility smile in option markets shows that the risk-neutral returns are non-Gaussian and leptokurtic, all the more so for short maturities, a clear indication of the presence of jumps. In continuous-path models, the law of returns for shorter maturities becomes closer to the Gaussian law, whereas in reality and in models with jumps, returns actually become less Gaussian as the horizon becomes shorter. Finally, jump processes correspond to genuinely incomplete markets, whereas all continuous-path models are either complete or can be made so with a small number of additional assets. This fundamental incompleteness makes it possible to carry out a rigorous analysis of the hedging errors in discontinuous models and find ways to improve the hedging performance using additional instruments such as liquid European options.

Lévy Processes

Lévy processes (see Fundamental Theorem of Asset Pricing) [1, 3, 17] are stochastic processes with stationary and independent increments. The only Lévy process with continuous trajectories is the Brownian motion with drift; all others have paths with discontinuities in finite or (countably) infinite number. The simplest example of a Lévy process is the Poisson process (see Poisson Process): the increasing piecewise constant process with jumps of size 1 only and exponential waiting times between jumps. If $(\tau_i)$ is a sequence of independent exponential random variables with intensity $\lambda$ and $T_k := \sum_{i=1}^{k} \tau_i$, then the process

$$Z_t := \sum_{i} 1_{T_i \le t} \tag{1}$$

is called a Poisson process with intensity $\lambda$. A piecewise constant Lévy process with arbitrary jump sizes is called compound Poisson and can be written as

$$X_t = \sum_{i=1}^{Z_t} Y_i \tag{2}$$

where $Z$ is a Poisson process and $(Y_i)$ is an i.i.d. sequence of random variables. In general, the number of jumps of a Lévy process in a given interval need not be finite, and the process can be represented as a sum of a Brownian motion with drift and a limit of processes of the form in equation (2):

$$X_t = \gamma t + B_t + N_t + \lim_{\varepsilon \downarrow 0} M_t^{\varepsilon} \tag{3}$$

where $B$ is a $d$-dimensional Brownian motion, $\gamma \in \mathbb{R}^d$, $N$ is a compound Poisson process that includes the jumps of $X$ with $|\Delta X_t| > 1$, and $M_t^{\varepsilon}$ is a compensated compound Poisson process (compound Poisson minus its expectation) that includes the jumps of $X$ with $\varepsilon < |\Delta X_t| \le 1$. The law of a Lévy process is completely identified by its characteristic triplet: the positive definite matrix $A$ (unit covariance of $B$), the vector $\gamma$ (drift), and the measure $\nu$ on $\mathbb{R}^d$, called the Lévy measure, which determines the intensity of jumps of different sizes. For a set $A$ of jump sizes, $\nu(A)$ is the expected number of jumps on the time interval $[0, 1]$ whose sizes fall in $A$. The Lévy measure satisfies the integrability condition

$$\int_{\mathbb{R}^d} \left(1 \wedge |x|^2\right) \nu(dx) < \infty \tag{4}$$

and $\nu(\mathbb{R}^d) < \infty$ if the process has finite jump intensity. The law of $X_t$ at all times $t$ is determined by the triplet and, in particular, the Lévy–Khintchine formula gives the characteristic function $E[e^{i\langle u, X_t\rangle}] = \exp[t\psi(u)]$ with

$$\psi(u) = i\langle \gamma, u\rangle - \frac{1}{2}\langle Au, u\rangle + \int_{\mathbb{R}^d} \left(e^{i\langle u, x\rangle} - 1 - i\langle u, x\rangle 1_{|x| \le 1}\right) \nu(dx) \tag{5}$$

Conversely, any infinitely divisible law (see Infinite Divisibility) has a Lévy–Khintchine representation as above, so modeling with Lévy processes allows one to pick any infinitely divisible distribution for the law (say, at time $t = 1$) of the process.
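The construction in equations (1) and (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `compound_poisson`, the Gaussian jump law, and all parameter values are hypothetical choices made for the example and are not taken from the article.

```python
import random


def compound_poisson(t, lam, jump_sampler, rng):
    """Return (X_t, Z_t) with X_t = sum_{i=1}^{Z_t} Y_i.

    Jump times T_k are partial sums of independent exponential waiting
    times with intensity lam, as in equation (1); the jump sizes Y_i are
    i.i.d. draws from jump_sampler, as in equation (2).
    """
    clock = rng.expovariate(lam)      # first jump time T_1
    count, value = 0, 0.0
    while clock <= t:
        count += 1
        value += jump_sampler(rng)
        clock += rng.expovariate(lam)  # T_{k+1} = T_k + tau_{k+1}
    return value, count


if __name__ == "__main__":
    rng = random.Random(0)
    lam, t, mean_jump = 3.0, 1.0, -0.1            # illustrative values only
    sampler = lambda r: r.gauss(mean_jump, 0.2)   # assumed Gaussian jump sizes
    draws = [compound_poisson(t, lam, sampler, rng) for _ in range(20000)]
    print("mean Z_t:", sum(c for _, c in draws) / len(draws))  # compare with lam * t
    print("mean X_t:", sum(x for x, _ in draws) / len(draws))  # compare with lam * t * E[Y]
```

In this finite-activity case the Lévy measure is $\lambda$ times the law of $Y_i$, so the sample averages printed above can be compared with $\lambda t$ and $\lambda t\, E[Y_i]$.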
Exponential Lévy Models

The Black–Scholes model

$$\frac{dS_t}{S_t} = \mu\, dt + \sigma\, dW_t \tag{6}$$

can be equivalently rewritten in the exponential form $S_t = S_0 e^{(\mu - \sigma^2/2)t + \sigma W_t}$. This gives us two possibilities to construct an exponential Lévy model starting from a (one-dimensional) Lévy process $X$, using the stochastic differential equation

$$\frac{dS_t}{S_{t-}} = dX_t \tag{7}$$

or using the ordinary exponential $S_t = S_0 e^{X_t}$. The solution to equation (7) with initial condition $S_0 = 1$ is called the stochastic exponential of $X$. It can become negative if the process $X$ has a big negative jump: $\Delta X_s < -1$ for $s \le t$. However, if $X$ does not have jumps of size smaller than $-1$, then its stochastic exponential is positive, and the stochastic and the ordinary exponential yield the same class of positive processes. Given this result and the fact that ordinary exponentials are more tractable (in particular, we have the Lévy–Khintchine representation), they are more often used for modeling financial time series than the stochastic ones. In the rest of this article, we focus on the exponential Lévy model

$$S_t = S_0 e^{rt + X_t} \tag{8}$$

where $X$ is a one-dimensional Lévy process with characteristic triplet $(\sigma^2, \nu, \gamma)$ and $r$ denotes the interest rate.

Examples

Exponential Lévy models fall into two categories. In the first category, called jump-diffusion models, the “normal” evolution of prices is given by a diffusion process, punctuated by jumps at random intervals. Here the jumps represent rare events: crashes and large drawdowns. Such an evolution can be represented by a Lévy process with a nonzero Gaussian component and a jump part with finitely many jumps:

$$X_t = \gamma t + \sigma W_t + \sum_{i=1}^{N_t} Y_i \tag{9}$$

In the Merton model (see Jump-diffusion Models) [16], which is the first model of this type suggested in the literature, jumps in the log price $X$ are assumed to have a Gaussian distribution: $Y_i \sim N(\mu, \delta^2)$. In the risk-neutral version (i.e., with the choice of drift such that $e^{X}$ becomes a martingale), the characteristic exponent of the log stock price takes the following form:

$$\psi(u) = -\frac{\sigma^2 u^2}{2} + \lambda\left\{e^{-\delta^2 u^2/2 + i\mu u} - 1\right\} - iu\left(\frac{\sigma^2}{2} + \lambda\left(e^{\delta^2/2 + \mu} - 1\right)\right) \tag{10}$$

In the Kou model (see Kou Model) [13], jump sizes are distributed according to an asymmetric Laplace law with a density of the form

$$\nu_0(x) = \left[p\lambda_+ e^{-\lambda_+ x} 1_{x>0} + (1-p)\lambda_- e^{-\lambda_- |x|} 1_{x<0}\right] \tag{11}$$

with $\lambda_+ > 0$, $\lambda_- > 0$ governing the decay of the tails for the distribution of positive and negative jump sizes and $p \in [0, 1]$ representing the probability of an upward jump. The probability distribution of returns in this model has semiheavy (exponential) tails.

The second category consists of models with an infinite number of jumps in every interval, which we call infinite activity or infinite intensity models. In these models, one does not need to introduce a Brownian component since the dynamics of jumps is already rich enough to generate nontrivial small time behavior [4].

There are several ways to define a parametric Lévy process with infinite jump intensity. The first approach is to obtain a Lévy process by subordinating a Brownian motion with an independent increasing Lévy process (called subordinator). Two examples of models from this class are the variance gamma process and the normal inverse Gaussian process. The variance gamma process (see Variance-gamma Model) [5, 15] is obtained by time changing a Brownian motion with a gamma subordinator and has the characteristic exponent of the form

$$\psi(u) = -\frac{1}{\kappa} \log\left(1 + \frac{u^2 \sigma^2 \kappa}{2} - i\theta\kappa u\right) \tag{12}$$
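The subordination construction behind equation (12) can be checked numerically. The sketch below is an illustration, not the article's own procedure: it assumes the gamma subordinator is normalised to have mean $t$ and variance $\kappa t$ at time $t$, and the names `vg_exponent` and `sample_vg` as well as the parameter values are hypothetical. It draws $X_t = \theta G_t + \sigma W_{G_t}$ and compares the empirical characteristic function with $\exp[t\psi(u)]$.

```python
import cmath
import math
import random


def vg_exponent(u, sigma, theta, kappa):
    """Characteristic exponent psi(u) of equation (12)."""
    inner = 1.0 + 0.5 * u * u * sigma**2 * kappa - 1j * theta * kappa * u
    return -cmath.log(inner) / kappa


def sample_vg(t, sigma, theta, kappa, rng):
    """Draw X_t = theta * G_t + sigma * W_{G_t} for a gamma time change G_t.

    Assumption: G_t has shape t / kappa and scale kappa, so that
    E[G_t] = t and Var[G_t] = kappa * t.
    """
    g = rng.gammavariate(t / kappa, kappa)
    return theta * g + sigma * math.sqrt(g) * rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    t, sigma, theta, kappa, u = 1.0, 0.2, -0.1, 0.3, 2.0   # illustrative values
    n = 50000
    empirical = sum(cmath.exp(1j * u * sample_vg(t, sigma, theta, kappa, rng))
                    for _ in range(n)) / n
    print("empirical E[exp(iuX_t)]:", empirical)
    print("exp(t * psi(u))        :", cmath.exp(t * vg_exponent(u, sigma, theta, kappa)))
```

The two printed values should agree up to Monte Carlo noise; letting $\kappa \to 0$ with $\theta = 0$ recovers the Gaussian exponent $-\sigma^2 u^2/2$.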
The density of the Lévy measure of the variance gamma process is given by

$$\nu(x) = \frac{c}{|x|} e^{-\lambda_- |x|} 1_{x<0} + \frac{c}{x} e^{-\lambda_+ x} 1_{x>0} \tag{13}$$

where $c = 1/\kappa$, $\lambda_+ = \dfrac{\sqrt{\theta^2 + 2\sigma^2/\kappa}}{\sigma^2} - \dfrac{\theta}{\sigma^2}$ and $\lambda_- = \dfrac{\sqrt{\theta^2 + 2\sigma^2/\kappa}}{\sigma^2} + \dfrac{\theta}{\sigma^2}$.

The normal inverse Gaussian process (see Normal Inverse Gaussian Model) [2] is the result of time changing a Brownian motion with the inverse Gaussian subordinator and has the characteristic exponent

$$\psi(u) = \frac{1}{\kappa} - \frac{1}{\kappa}\sqrt{1 + u^2\sigma^2\kappa - 2i\theta u\kappa} \tag{14}$$

The second approach is to specify the Lévy measure directly. The main example of this category is the tempered stable process (see Tempered Stable Process), introduced by Koponen [12] and also known under the name of CGMY model [4]. This process has a Lévy measure with density of the form

$$\nu(x) = \frac{c_-}{|x|^{1+\alpha_-}} e^{-\lambda_- |x|} 1_{x<0} + \frac{c_+}{x^{1+\alpha_+}} e^{-\lambda_+ x} 1_{x>0} \tag{15}$$

with $\alpha_+ < 2$ and $\alpha_- < 2$.

The third approach is to specify the density of increments of the process at a given time scale, say $\Delta$, by taking an arbitrary infinitely divisible distribution. Generalized hyperbolic processes (see Generalized Hyperbolic Models) [10] can be constructed in this way. In this approach, it is easy to simulate the increments of the process at the same time scale and to estimate parameters of the distribution if data are sampled with the same period $\Delta$, but, unless this distribution belongs to some parametric class closed under convolution, we do not know the law of the increments at other time scales.

Market Incompleteness and Option Pricing

The exponential Lévy models correspond, in general, to arbitrage-free incomplete markets, meaning that options cannot be replicated exactly and, consequently, their price is not uniquely determined by the law of the underlying. This is good news: this means that the pricing model can be adjusted to take into account both the historical dynamics of the underlying and the market-quoted prices of European call and put options, a procedure known as model calibration (see Model Calibration). Once the risk-neutral measure $Q$ is calibrated, one can price an exotic option with payoff $H_T$ at time $T$ by taking the discounted expectation

$$P_0 = e^{-rT} E^{Q}[H_T] \tag{16}$$

Fourier Transform Methods for Option Pricing and Model Calibration

In exponential Lévy models, and in all models where the characteristic function of the log stock price $\Phi_t(u) = E[e^{iuX_t}]$ is known explicitly, Fourier inversion provides a very efficient algorithm for pricing European options. This method was introduced in [5] and later improved and generalized in [14].

Consider a financial model of the form $S_t = S_0 e^{rt + X_t}$, where $X$ is a stochastic process whose characteristic function is known explicitly. To compute the price of a call option,

$$C(k) = S_0 E\left[(e^{X_T} - e^{k})^+\right] \tag{17}$$

where $k = \log(K/S_0) - rT$ is the log forward moneyness, we would like to express its Fourier transform in terms of the characteristic function of $X_T$ and then find the prices for a range of strikes by Fourier inversion. However, the Fourier transform of $C(k)$ is not well defined because this function is not integrable, so we subtract the Black–Scholes call price with nonzero volatility $\Sigma$ to obtain a function that is both integrable and smooth:

$$z_T(k) = C(k) - C_{BS}^{\Sigma}(k) \tag{18}$$

If $X$ is a stochastic process such that $E[e^{X_T}] = 1$ and $E[e^{(1+\alpha)X_T}] < \infty$ for some $\alpha > 0$, then the Fourier transform of $z_T(k)$ is given by

$$\zeta_T(v) = S_0\, \frac{\Phi_T(v - i) - \Phi_T^{\Sigma}(v - i)}{iv(1 + iv)} \tag{19}$$

where $\Phi_T^{\Sigma}(v) = \exp\left(-\frac{\Sigma^2 T}{2}(v^2 + iv)\right)$ is the characteristic function of log stock price in the Black–
Scholes model with volatility $\Sigma$. The exact value of $\Sigma$ is not very important, and one can take, for example, $\Sigma = 0.2$ for practical calculations.

Option prices are computed by evaluating numerically the inverse Fourier transform of $\zeta_T$:

$$z_T(k) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{-ivk} \zeta_T(v)\, dv \tag{20}$$

This integral can be efficiently computed for a range of strikes using the fast Fourier transform algorithm.

The Fourier-based fast deterministic algorithms for European option pricing can be used to calibrate exponential Lévy models to market-quoted option prices by penalized least squares as in [7]. Exponential Lévy models perform well for calibrating market option prices for a range of strikes and a single maturity, but fail to calibrate the entire implied volatility surface containing many maturities. This is due to the fact that the law of a Lévy process is completely determined by its distribution at a given date, so that if we know option prices for many strikes and a single maturity, we can readily reconstruct the law of the process at all dates, which may be incompatible with the other observations we may have. In particular, the implied volatility smile in exponential Lévy models flattens too fast for long-dated options (see Figure 1). Usually, a jump component can be included in a model to calibrate the short-maturity prices, and a stochastic volatility component is used to calibrate the skew at longer maturities.

PIDE Methods for Exotic Options

For contracts with barriers or American-style exercise, partial integro-differential equation (PIDE) methods provide an efficient alternative to Monte Carlo simulation. In diffusion models, the price of an option with payoff $h(S_T)$ at time $T$ solves the Black–Scholes partial differential equation (PDE)

$$\frac{\partial P}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 P}{\partial S^2} = rP - rS\frac{\partial P}{\partial S}, \qquad P(T, S) = h(S) \tag{21}$$

In an exponential Lévy model, there is a similar equation for the option price

$$P(t, S) = e^{-r(T-t)} E^{Q}[h(S_T) \mid S_t = S] \tag{22}$$

but due to the presence of jumps, an integral term appears in addition to the partial derivatives (see Partial Integro-differential Equations (PIDEs)):

$$\frac{\partial P}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 P}{\partial S^2} - rP + rS\frac{\partial P}{\partial S} + \int_{\mathbb{R}} \nu(dz)\left[P(t, Se^{z}) - P(t, S) - S(e^{z} - 1)\frac{\partial P}{\partial S}(t, S)\right] = 0, \qquad P(T, S) = h(S) \tag{23}$$

[Figure 1: surface plot. Vertical axis: implied volatility (0.20 to 0.55); horizontal axes: maturity T (0.0 to 1.0) and strike K (70 to 140).]

Figure 1 Implied volatility surface in the Kou model with diffusion volatility σ = 0.2 and only negative jumps with intensity λ = 10 and average size 1/λ− = 0.05
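As a numerical illustration of the Fourier pricing formulas (17)–(20), the sketch below prices a call in the Merton model using the risk-neutral exponent (10). It is a hedged example rather than the algorithm of [5] or [14]: the inverse transform (20) is evaluated by a plain midpoint rule instead of the fast Fourier transform, and the function names, the truncation level `v_max`, the grid size `n`, and the parameter values are all assumptions made for the illustration.

```python
import cmath
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def merton_phi(u, T, sigma, lam, mu, delta):
    """Phi_T(u) = exp(T * psi(u)) with the risk-neutral exponent (10)."""
    comp = 0.5 * sigma**2 + lam * (math.exp(0.5 * delta**2 + mu) - 1.0)
    psi = (-0.5 * sigma**2 * u * u
           + lam * (cmath.exp(-0.5 * delta**2 * u * u + 1j * mu * u) - 1.0)
           - 1j * u * comp)
    return cmath.exp(T * psi)


def bs_phi(u, T, vol):
    """Black-Scholes characteristic function with volatility Sigma = vol."""
    return cmath.exp(-0.5 * vol**2 * T * (u * u + 1j * u))


def bs_call(k, S0, T, vol):
    """Black-Scholes call price as a function of log forward moneyness k."""
    s = vol * math.sqrt(T)
    d1 = (-k + 0.5 * s * s) / s
    return S0 * (norm_cdf(d1) - math.exp(k) * norm_cdf(d1 - s))


def fourier_call(k, S0, T, phi, vol=0.2, v_max=100.0, n=10000):
    """C(k) = C_BS(k) + z_T(k), with z_T obtained from equations (19)-(20).

    phi is a callable u -> Phi_T(u). Midpoint nodes avoid the removable
    singularity of zeta_T at v = 0.
    """
    dv = v_max / n
    total = 0.0
    for j in range(n):
        v = (j + 0.5) * dv
        zeta = S0 * (phi(v - 1j) - bs_phi(v - 1j, T, vol)) / (1j * v * (1.0 + 1j * v))
        total += (cmath.exp(-1j * v * k) * zeta).real
    # z_T is real, so the integral over the real line equals twice the
    # real part of the integral over v > 0.
    return bs_call(k, S0, T, vol) + total * dv / math.pi


if __name__ == "__main__":
    S0, T, r = 100.0, 1.0, 0.03                       # illustrative values
    sigma, lam, mu, delta = 0.2, 1.0, -0.1, 0.15
    phi = lambda u: merton_phi(u, T, sigma, lam, mu, delta)
    for K in (80.0, 100.0, 120.0):
        k = math.log(K / S0) - r * T                  # log forward moneyness
        print(K, fourier_call(k, S0, T, phi))
```

A direct quadrature is used here only to keep the example short; for a whole grid of strikes the fast Fourier transform mentioned in the text is the natural replacement. The truncation at `v_max` relies on the Gaussian decay provided by a nonzero diffusion component and would need revisiting for pure-jump models.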
Different path-dependent characteristics of the payoff are translated into the boundary conditions of the equation: for example, for a down-and-out option with barrier $B$, we would impose $P(t, S) = 0$ for $S \le B$ and all $t$. This equation and its numerical solution using finite differences are discussed in detail in [9] (see Partial Integro-differential Equations (PIDEs)).

Hedging

In the Black–Scholes model, delta hedging is known to completely eliminate the risk of an option position. In the presence of jumps, delta hedging is no longer optimal: to hedge a jump of a given size, one should use the sensitivity to fluctuations of this particular size rather than the sensitivity to infinitesimal movements. Since the jump size is not known in advance, the risk associated with jumps cannot be hedged away completely. The model given by equation (8) therefore corresponds to an incomplete market except for the following two cases:

• no jumps in the stock price ($\nu \equiv 0$, the Black–Scholes case) and
• no diffusion component ($\sigma = 0$) and only one possible jump size ($\nu = \delta_{z_0}(z)$). In this case, the optimal hedging strategy is

$$\phi_t = \frac{P(S_t e^{z_0}) - P(S_t)}{S_t(e^{z_0} - 1)} \tag{24}$$

In all other cases, the hedging becomes an approximation problem: instead of replicating an option, one tries to minimize the residual hedging error. Many authors (see, e.g. [8, 11]) studied the quadratic hedging, where the optimal strategy is obtained by minimizing the expected squared hedging error. A particularly simple situation is when this error is computed under the martingale probability. The optimal hedge is then a weighted sum of the sensitivity of option price to infinitesimal stock movements, and the average sensitivity to jumps:

$$\phi^*(t, S_t) = \frac{\sigma^2 \dfrac{\partial P}{\partial S} + \dfrac{1}{S_t}\displaystyle\int \nu(dz)\,(e^{z} - 1)\left(P(t, S_t e^{z}) - P(t, S_t)\right)}{\sigma^2 + \displaystyle\int (e^{z} - 1)^2\, \nu(dz)} \tag{25}$$

For jump diffusions, if jumps are small, the Taylor decomposition of this formula gives

$$\phi_t \approx \frac{\partial P}{\partial S} + \frac{S_t}{2\Sigma^2}\frac{\partial^2 P}{\partial S^2}\int \nu(dz)\,(e^{z} - 1)^3, \qquad \Sigma^2 = \sigma^2 + \int (e^{z} - 1)^2\, \nu(dz) \tag{26}$$

Therefore, the optimal strategy can be seen as a small and typically negative (since the jumps are mostly negative) correction to delta hedging. For pure-jump processes such as variance gamma, $\partial^2 P/\partial S^2$ may not be defined and the correction may be big.

Numerical studies of the performance of hedging strategies in the presence of jumps show that

• If the jumps are small, delta hedging works well and its performance is close to optimal.
• In the presence of a strong jump component, the optimal strategy is superior to delta hedging both in terms of hedge stability and residual error.
• If jumps are strong, the residual hedging error can be further reduced by adding options to the hedging portfolio.

To eliminate the remaining hedging error, a possible solution is to use liquid options as hedging instruments. Optimal quadratic hedge ratios in the case when the hedging portfolio may contain options can be found in [8].

Additional Reading

For a more in-depth treatment, the reader may refer to the monographs [6, 18].

References

[1] Applebaum, D. (2004). Lévy Processes and Stochastic Calculus, Cambridge University Press.
[2] Barndorff-Nielsen, O. (1998). Processes of normal inverse Gaussian type, Finance and Stochastics 2, 41–68.
[3] Bertoin, J. (1996). Lévy Processes, Cambridge University Press, Cambridge.
[4] Carr, P., Geman, H., Madan, D. & Yor, M. (2002). The fine structure of asset returns: an empirical investigation, Journal of Business 75, 305–332.
[5] Carr, P. & Madan, D. (1998). Option valuation using the fast Fourier transform, Journal of Computational Finance 2, 61–73.
[6] Cont, R. & Tankov, P. (2004). Financial Modelling with Jump Processes, Chapman & Hall/CRC Press.
[7] Cont, R. & Tankov, P. (2006). Retrieving Lévy processes from option prices: regularization of an ill-posed inverse problem, SIAM Journal on Control and Optimization 45, 1–25.
[8] Cont, R., Tankov, P. & Voltchkova, E. (2007). Hedging with options in models with jumps. Proceedings of the 2005 Abel Symposium in Honor of Kiyosi Itô, F.E. Benth, G. Di Nunno, T. Lindstrom, B. Øksendal & T. Zhang, eds, Springer, pp. 197–218.
[9] Cont, R. & Voltchkova, E. (2005). A finite difference scheme for option pricing in jump-diffusion and exponential Lévy models, SIAM Journal on Numerical Analysis 43, 1596–1626.
[10] Eberlein, E. (2001). Applications of generalized hyperbolic Lévy motion to Finance, in Lévy Processes: Theory and Applications, O. Barndorff-Nielsen, T. Mikosch & S. Resnick, eds, Birkhäuser, Boston, pp. 319–336.
[11] Kallsen, J., Hubalek, F. & Krawczyk, L. (2006). Variance-optimal hedging for processes with stationary independent increments, The Annals of Applied Probability 16, 853–885.
[12] Koponen, I. (1995). Analytic approach to the problem of convergence of truncated Lévy flights towards the Gaussian stochastic process, Physical Review E 52, 1197–1199.
[13] Kou, S. (2002). A jump-diffusion model for option pricing, Management Science 48, 1086–1101.
[14] Lee, R.W. (2004). Option pricing by transform methods: extensions, unification and error control, Journal of Computational Finance 7, 51–86.
[15] Madan, D., Carr, P. & Chang, E. (1998). The variance gamma process and option pricing, European Finance Review 2, 79–105.
[16] Merton, R. (1976). Option pricing when underlying stock returns are discontinuous, Journal of Financial Economics 3, 125–144.
[17] Sato, K. (1999). Lévy Processes and Infinitely Divisible Distributions, Cambridge University Press, Cambridge.
[18] Schoutens, W. (2003). Lévy Processes in Finance: Pricing Financial Derivatives, Wiley, New York.

Related Articles

Barndorff-Nielsen and Shephard (BNS) Models; Fourier Transform; Infinite Divisibility; Jump Processes; Jump-diffusion Models; Kou Model; Partial Integro-differential Equations (PIDEs); Tempered Stable Process; Time-changed Lévy Process.

PETER TANKOV
Uncertain Volatility Model

Black–Scholes and Realized Volatility

What happens when a trader uses the Black–Scholes (BS in the sequel) formula to dynamically hedge a call option at a given constant volatility while the realized volatility is not constant?

It is not difficult to show that the answer is the following: if the realized volatility is lower than the managing volatility, the corresponding profit and loss (P&L) will be nonnegative. Indeed, a simple yet clever application of Itô’s formula shows us that the instantaneous P&L of being short a delta-hedged option reads

$$\mathrm{P\&L}_t = \frac{1}{2}\,\Gamma S_t^2 \left[\sigma_t^2\, dt - \left(\frac{dS_t}{S_t}\right)^2\right] \tag{1}$$

where $\Gamma$ is the gamma of the option (the second derivative with respect to the underlying, which is positive for a call option), $\sigma_t$ is the spot volatility, that is, the volatility at which the option was sold, and $\left(\frac{dS_t}{S_t}\right)^2$ represents the realized variance over the period $[t, t + dt]$. Note that this holds without any assumption on the realized volatility, which will certainly turn out to be nonconstant. This result is fundamental in practice: it allows traders to work with neither exact knowledge of the behavior of the volatility nor a more complex toolbox than the plain BS formula; an upper bound of the realized volatility is enough to grant a profit (conversely, a lower bound for option buyers). This way of handling the realized volatility with the BS formula is of historical importance in the option market. El Karoui, Jeanblanc, and Shreve have formalized it masterfully in [5].

Superhedging and the Uncertain Volatility Model (UVM)

The UVM Framework

Assume that you perform the previous strategy. You are certainly not alone in the market, and you wish to have the lowest possible selling price compatible with your risk aversion. In practice, on the derivatives desk (this is a big difference with the insurance world where the risk is distributed among a large enough number of buyers), the risk aversion is total, meaning that your managing policy will aim at yielding a nonnegative P&L whatever the realized path. This approach is what is called the superhedging strategy (or superstrategy) approach to derivative pricing. Of course, the larger the set of the underlying scenarios (or paths) for which you want to have the superhedging property (see Superhedging), the higher the initial selling price. The first set that comes to mind is the set of paths associated with an unknown volatility, say between two boundary values $\sigma_{\min}$ and $\sigma_{\max}$. In other words, we look for the cheapest price at which we can sell and manage an option without any assumption on the volatility except that it lies in the $[\sigma_{\min}, \sigma_{\max}]$ range. This framework is the uncertain volatility model (UVM) introduced by Avellaneda et al. [2].

If you take a call option (or more generally a European option with convex payoff), the BS price at volatility $\sigma_{\max}$ is a good candidate. Indeed, it yields a superhedging strategy by result (1). And should the realized volatility be constantly $\sigma_{\max}$, then your P&L will be 0. It is easy to conclude from this that the BS $\sigma_{\max}$ price is the UVM selling price for an option with a convex payoff.

Now very often traders use strategies (butterflies, call spreads, etc.) whose payoffs are no longer convex. It is not at all easy to find a superstrategy in such cases. There is one exception: if you hedge at the selling time and do not rebalance your hedge before maturity, the cheapest price associated with such a strategy will be the value at the initial underlying value of the concave envelope of the payoff function. It is easy to see that this value corresponds to the total uncertainty case, or to the $[0, \infty]$ case in the UVM model. For a call option it will be the value of the underlying.

Black–Scholes–Barenblatt Equation

This is where the seminal work [2] and, independently, [7] come into play. Going back to equation (1), we are looking for a model with the property that the managing volatility is $\sigma_{\max}$ when the gamma is nonnegative, and $\sigma_{\min}$ in the converse situation. Should such a model exist, it will yield an optimal solution to the superhedging problem.

An easy way to approximate the optimal solution is to consider a tree (a trinomial tree, for instance) where the dependence upon the volatility lies in
the node probabilities and not in the tree grid. In the classical backward pricing scheme one can then choose the managing volatility according to the local convexity (since it is a trinomial tree, each node has three offshoots and hence carries convexity information) of the immediately forward price. Of course, it is not the convexity of the current price since we are calculating it, but the related error of replacing the current convexity by the forward one will certainly go to zero when the time step goes to zero.

The related continuous-time object is the Black–Scholes partial differential equation (PDE) where the second-order term is replaced by the following nonlinear one

$$\frac{1}{2} S_t^2 \left(\sigma_{\max}^2 \Gamma^+ - \sigma_{\min}^2 \Gamma^-\right)$$

where, as usual, $x^+$ and $x^-$ denote the positive and negative parts. This PDE has been named Black–Scholes–Barenblatt since it looks like the Barenblatt PDE occurring in porosity theory. More precisely, in case of no arbitrage, assume that the stock price dynamics satisfy $dS_t = S_t(r\, dt + \sigma_t\, dW_t)$, where $W_t$ is a standard Brownian motion and $r$ is the risk-free interest rate. This is valid under the class $\mathcal{P}$ of all the probability measures such that $\sigma_{\min} \le \sigma_t \le \sigma_{\max}$. Let $\Pi_t$ denote the value of a derivative at time $t$ written on $S_t$ with maturity $T$ and final payoff $\Phi(S_T)$; then at any time $0 \le t \le T$, we must have $W^-(t, S_t) \le \Pi_t \le W^+(t, S_t)$ where

$$W^-(t, S_t) = \inf_{P \in \mathcal{P}} E_t^{P}\left[e^{-r(T-t)} \Phi(S_T)\right], \qquad W^+(t, S_t) = \sup_{P \in \mathcal{P}} E_t^{P}\left[e^{-r(T-t)} \Phi(S_T)\right] \tag{2}$$

The two bounds satisfy the following nonlinear PDE, called the Black–Scholes–Barenblatt equation (which reduces to the classical BS one in the case $\sigma_{\min} = \sigma_t = \sigma_{\max}$):

$$\frac{\partial W^{\pm}}{\partial t} + r\left(S\frac{\partial W^{\pm}}{\partial S} - W^{\pm}\right) + \frac{1}{2}\,\Sigma^{\pm}\!\left[\frac{\partial^2 W^{\pm}}{\partial S^2}\right] S^2 \frac{\partial^2 W^{\pm}}{\partial S^2} = 0 \tag{3}$$

with the terminal condition

$$W^{\pm}(T, S_T) = \Phi(S_T) \tag{4}$$

where

$$\Sigma^{+}\!\left[\frac{\partial^2 W^{+}}{\partial S^2}\right] = \begin{cases} \sigma_{\max}^2 & \text{if } \dfrac{\partial^2 W^{+}}{\partial S^2} \ge 0 \\[2mm] \sigma_{\min}^2 & \text{if } \dfrac{\partial^2 W^{+}}{\partial S^2} < 0 \end{cases} \tag{5}$$

and

$$\Sigma^{-}\!\left[\frac{\partial^2 W^{-}}{\partial S^2}\right] = \begin{cases} \sigma_{\max}^2 & \text{if } \dfrac{\partial^2 W^{-}}{\partial S^2} \le 0 \\[2mm] \sigma_{\min}^2 & \text{if } \dfrac{\partial^2 W^{-}}{\partial S^2} > 0 \end{cases} \tag{6}$$

Observe that in case $\Phi$ is convex, the BS price at volatility $\sigma_{\max}$ is convex for any time $t$, so that it solves the Black–Scholes–Barenblatt equation for $W^+$. Conversely, if $\Phi$ is concave, so is its BS price at volatility $\sigma_{\min}$ for any time $t$, which yields the unique solution to the Black–Scholes–Barenblatt equation for $W^+$.

Superstrategies and Stochastic Control

Note that this PDE is also a classical Hamilton–Jacobi–Bellman equation occurring in stochastic control theory. Indeed a related object of interest is the supremum of the risk-neutral prices over all the dynamics of volatility that satisfy the range property:

$$\sup_{P \in \mathcal{P}} E^{P}[f]$$

where $\mathcal{P}$ is the set of risk-neutral probabilities, each of which corresponds to a volatility process with value at each time in $[\sigma_{\min}, \sigma_{\max}]$. In fact, such an object is not that easy to define in the classical probabilistic modeling framework, since two different volatility processes will typically yield mutually singular probability measures on the set of possible paths. A convenient framework is the stochastic control framework. In such a framework, the managing volatility being interpreted as a control, one tries to optimize a given expectation, namely the risk-neutral price in this case. It turns out that stochastic optimal control will yield the optimal superstrategy price.

Nevertheless, the connection between the superstrategy problem and stochastic control is not that obvious, and it needs to be spelled out carefully. Recall that the stochastic control problem is the maximization of an expectation over a set of processes, whereas the superstrategy problem is
   
the almost sure domination of the option payoff at payoffs F1 ST1 , . . . , Fm STm with maybe different
maturity by a hedging strategy. strikes and maturities are available for hedging; let
Note that even in the UVM case, there are still f1 , . . . , fm be their respective market prices at the
plenty of open questions. In fact, a neat formulation time of the valuation t ≤ min (T , T1 , . . . , Tm ). Con-
of the superhedging problem is not a piece of cake. sider now an agent who buys quantities λ1 , . . . , λm
The issue is avoided in [2], handled partially in [7], of each option. His total cost of hedging then reads
and more formally in [8], where the model uncer-
tainty is specified as a set of martingale probabilities
 (t, St , λ1 , . . . , λm )
on the canonical space, and also in [6]. Once this  
is done, a natural theoretical problem, given such a 
m
 
−r(T −t) −r(Ti −t)
“model set”, is to find out a formula for the cheapest = sup e  (ST ) − λi e Fi STi
P ∈P i=1
superhedging price. The supremum of the risk-neutral
prices over all the probabilities of the set will in gen- 
m

eral be strictly smaller than the cheapest price, even if + λ i fi (7)


they match in the UVM setting. The precise property i=1
of the “model set” that makes this equality remains
where the supremum (sup) is calculated within the
to be clarified. Some partial results in this direction,
UVM framework as presented above, and we must
with progresses towards a general theorem, are avail-
specify a range + − ±
i ≤ λi ≤ i (i represent the
able in [4], where the case of path-dependent payoffs
quantities available on the market). The optimal
in the UVM framework is also solved.
hedge is then defined as the solution to the problem

Lagrangian UVM ∗ (t, St ) = inf  (t, St , λ1 , . . . , λm ) (8)


λ1 ,...,λm

In practice, the UVM approach is easy to imple- 


In fact, the first-order conditions read ∂λ ∂
= m i=1 fi
ment for standard options by using the tree scheme ∗    i

described above, for example. It can be extended in −ƐP e−r(Ti −t) Fi STi = 0, where P ∗ realizes the
the same way for path-dependent options. Neverthe- sup above. These conditions exactly fit the model to
less, when the price pops up, the usual reaction of observed market prices. The convexity of (t, St , λ1 ,
the trader or risk officer is that the price is too high, . . . , λm ) with respect to λi ensures that if a minimum
especially too high to explain the observed market exists, then it is unique.
price. This approach is very attractive from a theoretical
The fact that the price is high is a direct conse- point of view, but it is much harder to implement.
quence of the total aversion approach in the super- The consistency of observed vanilla prices is a crucial
strategy formulation, and also of the fact that the step that is rarely met in practice. Even if numerous
price corresponds to the worst-case scenario where robust algorithms exist to handle the dual problem,
the gamma changes signs exactly when the volatility their implementation is quite tricky. In fact, this
switches regimes. This is a highly unlikely situation. constrained formulation implies a calibration property
To lower the price and fit in the traditional setting of the model, and the design of a stable and robust
where one wants to fit the observed market price of calibration algorithm is one of the greatest challenges
liquid European calls and puts (so-called vanillas), in the field of financial derivatives.
Avellaneda, Levy, and Paras propose a constrained
extension of the UVM model where the price of the The Curse of Nonlinearity
complex products of the trader is handled within the
UVM framework with the additional constraint of fit- Another issue for a practitioner is the inherent non-
ting the vanilla prices. By duality, this reduces to linearity of the UVM formulation. Most traditional
computing the UVM price for a portfolio parameter- models like BS, Heston, or Lévy-based models are
ized by a Lagrangian multiplier and then minimizing linear models. The fact that an option price should
the dual value function over the Lagrangian parame- depend on the whole portfolio of the trader is a no-
ter. Mathematically speaking, let us consider an asset brainer for risk officers, but this nonlinearity is a
St and a payoff  (ST ). m European options with challenge for the modularity and the flexibility of
pricing systems. This is very often a no-go feature in practice.

The complexity of evaluating a portfolio in the UVM framework is real, as studied thoroughly by Avellaneda and Buff in [1]. Following [1], let us consider a portfolio with $n$ options with payoffs $f_1, \ldots, f_n$ and maturities $t_1, \ldots, t_n$. The computational problem becomes tricky when the portfolio consists of barrier options. Indeed, this means that, at any time step, the portfolio we are trying to value might be different (in case the stock price has reached the barrier of any option) from the one at the previous time step. Because of the nonlinearity, a PDE specific to this portfolio has to be solved in this case. Avellaneda and Buff [1] addressed this very issue: a naive implementation would require solving the $2^n - 1$ nonlinear PDEs, each representing a subportfolio. They provide an algorithm to build the minimal number $N_n$ of subportfolios (i.e., of nonlinear PDEs to solve) and show the following:

• If the initial portfolio consists of barrier (single or double) and vanilla options, then $N_n \le n(n+1)/2$.
• If the initial portfolio only consists of single barrier options ($n_u$ up-and-out ones and $n_d = n - n_u$ down-and-out ones), then $N_n = n_d + n_u + n_d n_u$. This assumes that all the barriers are different. If some are identical, then the number of required computations decreases.

Numerically speaking, the finite-difference pricing is done on a lattice, matching almost exactly all the barriers. Nevertheless in [3], an optimal construction of the lattice to solve the PDEs is provided.

References

[1] Avellaneda, M. & Buff, R. (1999). Combinatorial implications of nonlinear uncertain volatility models: the case of barrier options, Applied Mathematical Finance 1, 1–18.
[2] Avellaneda, M., Levy, A. & Paras, A. (1995). Pricing and hedging derivative securities in markets with uncertain volatilities, Applied Mathematical Finance 2, 73–88.
[3] Avellaneda, M. & Paras, A. (1996). Managing the volatility risk of portfolios of derivative securities: the Lagrangian uncertain volatility model, Applied Mathematical Finance 3, 21–52.
[4] Denis, L. & Martini, C. (2006). A theoretical framework for the pricing of contingent claims in the presence of model uncertainty, Annals of Applied Probability 16(2), 827–852.
[5] El Karoui, N., Jeanblanc, M. & Shreve, S. (1998). Robustness of the Black and Scholes formula, Mathematical Finance 8(2), 92–126.
[6] Frey, R. (2000). Superreplication in stochastic volatility models and optimal stopping, Finance and Stochastics 4(2), 161–187.
[7] Lyons, T.J. (1995). Uncertain volatility and the risk-free synthesis of derivatives, Applied Mathematical Finance 2, 117–133.
[8] Martini, C. (1997). Superreplications and stochastic control, IIIrd Italian Conference on Mathematical Finance, Trento.

Related Articles

Black–Scholes Formula; Models; Stochastic Control.

CLAUDE MARTINI & ANTOINE JACQUIER
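As a rough illustration of the subportfolio counts quoted in the section "The Curse of Nonlinearity", the following sketch simply evaluates the three expressions ($2^n - 1$, $n(n+1)/2$ and $n_d + n_u + n_d n_u$) for a given portfolio size. The function names are hypothetical and chosen here for readability; this is not the construction algorithm of [1], only the arithmetic of its stated results.

```python
def naive_pde_count(n: int) -> int:
    """Number of non-empty subportfolios of n options: 2^n - 1."""
    return 2 ** n - 1


def mixed_portfolio_bound(n: int) -> int:
    """Upper bound n(n+1)/2 quoted for barrier-plus-vanilla portfolios."""
    return n * (n + 1) // 2


def single_barrier_count(n_up: int, n_down: int) -> int:
    """N_n = n_d + n_u + n_d * n_u, assuming all barriers are distinct."""
    return n_down + n_up + n_down * n_up


if __name__ == "__main__":
    for n in (4, 10, 20):
        n_up = n // 2
        n_down = n - n_up
        print(
            n,
            naive_pde_count(n),
            mixed_portfolio_bound(n),
            single_barrier_count(n_up, n_down),
        )
```

Even for a modest book of 20 options, the naive count exceeds one million nonlinear PDEs, whereas the quoted bounds stay in the low hundreds.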
Implied Volatility: Market Models

The market model approach for implied volatility consists in taking implied volatilities as the quantities one wishes to model. In many options exchanges and over-the-counter markets, implied volatility is the way an option is quoted and hence plays the role of a price.

The market model approach for implied volatilities is inspired by the corresponding market model approach for interest rates, the so-called Heath–Jarrow–Morton (HJM) approach to interest rate modeling (see Heath–Jarrow–Morton Approach). In interest rate modeling, it is simple to characterize the dynamics of the entire family of instantaneous forward rates in such a way that the corresponding family of bond prices is arbitrage free. It is also simple to give examples of such dynamics and to price interest rate sensitive contingent claims in such models.

Correspondingly, the market model approach to implied volatility seeks to characterize the dynamics of the entire implied volatility surface in such a way that the corresponding family of option prices is arbitrage free. It also seeks simple examples of such dynamics and ideally practical means of computing prices of exotic options on the underlying in such models.

Despite numerous attempts and recent progress in this area, it is fair to say that this approach has unfortunately not delivered the same elegant and useful results as the HJM approach has in interest rate modeling. The market approach to implied volatility can be traced back to the works [11, 12, 20].

As in the HJM approach, the no-arbitrage condition for implied volatilities takes the form of a drift restriction. In other words, the drift must be constrained for option prices to be local martingales under the pricing measure.

Drift Restriction

To continue the discussion, we need the following definitions. Let $(\mathbf{W}_t)_{t \ge 0}$ be an $n$-dimensional Wiener process that models the uncertainty in the economy. We shall use boldface letters for vectors, $\cdot$ for the usual scalar product, and $|\cdot|$ for the Euclidean norm. We assume that the probability measure is risk neutral, that is, discounted price processes are local martingales. We assume for simplicity that interest rates and dividends are zero.

The traded asset on which options are written is denoted by $S_t$ and its volatility vector by $\boldsymbol{\sigma}_t$, that is,

$$\frac{dS_t}{S_t} = \boldsymbol{\sigma}_t \cdot d\mathbf{W}_t \tag{1}$$

The no-arbitrage constraint for the implied volatility $\Sigma_t(T, K)$ for the option with strike $K$ and maturity $T$ implies the following drift restriction in its dynamics

$$\Sigma_t(T, K) = \Sigma_0(T, K) - \int_0^t \left( \frac{|\boldsymbol{\sigma}_s - \ln(S_s/K)\,\boldsymbol{\xi}_s|^2 - \Sigma_s^2}{2\,\Sigma_s\,(T - s)} + \frac{1}{2}\,\Sigma_s\,\boldsymbol{\sigma}_s \cdot \boldsymbol{\xi}_s - \frac{1}{8}\,\Sigma_s^3\,(T - s)\,|\boldsymbol{\xi}_s|^2 \right)\!(T, K)\, ds + \int_0^t \Sigma_s(T, K)\,\boldsymbol{\xi}_s(T, K) \cdot d\mathbf{W}_s \tag{2}$$

where $\boldsymbol{\xi}_t(T, K)$ is the implied volatility’s volatility vector. The corresponding call option price dynamics is

$$C_t(T, K) = C_0(T, K) + \int_0^t S_s \left[ \Phi(d_1)\,\boldsymbol{\sigma}_s + \Sigma_s \sqrt{T - s}\;\phi(d_1)\,\boldsymbol{\xi}_s(T, K) \right] \cdot d\mathbf{W}_s \tag{3}$$

where, as usual, $d_1 = \dfrac{\ln(S_t/K)}{\Sigma_t(T, K)\sqrt{T - t}} + \dfrac{1}{2}\,\Sigma_t(T, K)\sqrt{T - t}$, and $\Phi$ and $\phi$ denote, respectively, the cumulative distribution function and the probability density function of a standard Gaussian random variable.

The above equation (2) is the equivalent of the HJM equation for implied volatilities. However, unlike the HJM equation, the drift does not solely involve the volatility vector $\boldsymbol{\xi}_t$ but also depends on $S_t$ and $\boldsymbol{\sigma}_t$.

The Spot Volatility Specification

Equation (2) has interesting properties. In particular, when we consider the infinite system of equation (2)
for a fixed $K$ and all $T > t$, it alone specifies the spot volatility $\boldsymbol{\sigma}_t$. This phenomenon is directly related to the convergence of option prices to the option payoff at expiry. Equivalently, the solution to equation (2) should not blow up too fast near expiry. It is called no-bubble restriction in [17], whereas [5] calls it the feedback condition and traces it back to [8]. It is also called the volatility specification in [19] and [6]. It reads

$$\Sigma_t(t, K) = \left| \boldsymbol{\sigma}_t - \ln\!\left(\frac{S_t}{K}\right) \boldsymbol{\xi}_t(t, K) \right| \tag{4}$$

For a proof under proper assumptions, see [13]. The case where we let $K = S_t$ in equation (4) says $\Sigma_t(t, S_t) = |\boldsymbol{\sigma}_t|$. In other words, the current value of the spot volatility can be exactly recovered from the implied volatility smile. This very much parallels the fact that the instantaneous forward rate with infinitely small tenor is the short rate in the HJM approach to interest rates.

It is shown in [14] that the relation $\Sigma_t(t, S_t) = |\boldsymbol{\sigma}_t|$ holds in great generality even when jumps in the spot and/or its volatility are present. It turns out to be a consequence of the central limit theorem for martingales.

Equation (4) has an interesting connection to the work of Berestycki et al. [3]. In a time homogeneous stochastic volatility model, [3] shows that the implied volatility in the short maturity limit can be expressed using the geodesic distance associated with the generator of the bivariate diffusion $(x_t, y_t)$, where $x_t$ is the log-moneyness and $y_t$ is the spot volatility ($|\boldsymbol{\sigma}_t|$ in our notation). Keeping their notation, we denote by $d(x, y)$ the signed geodesic distance from $(x, y)$ to $(0, y)$, and obtain

$$\Sigma_t(t, K) = \frac{\ln(S_t/K)}{d(\ln(S_t/K), |\boldsymbol{\sigma}_t|)} \tag{5}$$

By comparing equations (4) and (5), it becomes clear that the geodesic distance associated with the generator of the stochastic volatility model and the implied volatility’s volatility vector are strongly related.

The Case of a Single Strike

We first deal with the problem studied by [1, 4, 16, 17, 19] where only a single option is considered. The goal is to set up conditions under which the infinite system of equation (2) for a fixed $K$ and all $T > t$ together with equation (1) admits a unique solution. The best results were obtained by [4] and [19]. Without loss of generality, one can assume that equation (1) is driven by the first Wiener process only and that $\boldsymbol{\sigma}_t = (\sigma_t, 0, \ldots, 0)$. Assume that $\boldsymbol{\xi}_t$ has the functional form,

$$\boldsymbol{\xi}_t(T, K) = \frac{1}{2} \int_t^T \frac{X_t(u, K)}{X_t(T, K)}\, \mathbf{V}_t(u, K)\, du \tag{6}$$

where $X_t(T, K) = \dfrac{\partial}{\partial T}\left(\Sigma_t(T, K)^2\,(T - t)\right)$ is the square of the forward implied volatility and where $\mathbf{V}$ has the form

$$\mathbf{V}_t(T, K) = \mathbf{V}(t, T, K, \Sigma_t(T, K), \Sigma_t(t, K), S_t) \tag{7}$$

for a deterministic function $\mathbf{V}$ satisfying technical positivity, growth, and Lipschitz conditions [19]. Assume also that the spot volatility has the functional form $\sigma_t = \sigma(t, K, \Sigma_t(t, K), S_t)$, where the deterministic function $\sigma$ is determined by equation (4). Then, the infinite system of equation (2) for a fixed $K$ and all $T > t$ together with equation (1) admits a unique solution.

The Case of Several Strikes

The infinite system of equation (2) for all $K$ and all $T > t$ together with equation (1) is more complicated and conditions on $\boldsymbol{\xi}_t$ under which it admits a unique solution are still poorly understood. One advantage of dealing with all strikes $K$ at once is that one can remove the dependence on $S$ in equation (2) by changing the parameterization of the surface from $K$ to moneyness $K/S_t$. The dynamics of the implied volatility surface in these coordinates are obtained by applying the Itô–Wentzell formula as in [5]. One of the difficulties of the multistrike case is that the solution to the infinite system in equation (2) must satisfy some shape restrictions at each time $t$. These are consequences of the well-known static arbitrage restrictions that we now recall.

Static Arbitrage Restrictions

Static arbitrage relations lead to constraints on the shape of the implied volatility surface. The fact that calendar spreads have positive values leads to

$$\frac{\partial \Sigma_t}{\partial T} + \frac{\Sigma_t}{2(T - t)} \ge 0 \tag{8}$$
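The calendar-spread restriction (8) can be checked numerically on a quoted term structure. Since $\frac{\partial}{\partial T}\left(\Sigma_t^2 (T - t)\right) = 2\,\Sigma_t\,(T - t)\left(\frac{\partial \Sigma_t}{\partial T} + \frac{\Sigma_t}{2(T - t)}\right)$, condition (8) is equivalent, for positive implied volatilities, to the total implied variance $\Sigma_t^2 (T - t)$ being nondecreasing in $T$ at a fixed strike. The sketch below is a minimal illustration under the article's zero rates and dividends assumption; the function names and the sample quotes are hypothetical.

```python
def total_variance(vol: float, tau: float) -> float:
    """Total implied variance Sigma^2 * (T - t) for time to maturity tau."""
    return vol * vol * tau


def calendar_violations(taus, vols, tol=1e-12):
    """Return pairs (i, i + 1), indexed in maturity-sorted order, where the
    total implied variance at a fixed strike decreases with maturity."""
    pairs = sorted(zip(taus, vols))
    w = [total_variance(v, tau) for tau, v in pairs]
    return [(i, i + 1) for i in range(len(w) - 1) if w[i + 1] < w[i] - tol]


def calendar_lhs(vol: float, dvol_dT: float, tau: float) -> float:
    """Left-hand side of (8): dSigma/dT + Sigma / (2 * (T - t))."""
    return dvol_dT + vol / (2.0 * tau)


if __name__ == "__main__":
    # Hypothetical quotes at one strike: (time to maturity, implied vol).
    print(calendar_violations([0.25, 0.5, 1.0], [0.20, 0.20, 0.20]))
    print(calendar_violations([0.25, 1.0], [0.50, 0.20]))
```

A flat term structure passes the check, whereas a short-dated volatility of 50% followed by a one-year volatility of 20% implies a decreasing total variance and is flagged.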
The fact that call values are a decreasing function of the strike leads to

$$\frac{-\Phi(-d_1)}{\phi(d_1)\sqrt{T - t}} \le K\,\frac{\partial \Sigma_t}{\partial K} \le \frac{\Phi(d_2)}{\phi(d_2)\sqrt{T - t}} \tag{9}$$

Finally, the fact that butterfly spreads have positive values, or that calls are convex functions of the strike leads to

$$\left(1 - \frac{\ln(K/S_t)}{\Sigma_t}\, K\,\frac{\partial \Sigma_t}{\partial K}\right)^{2} - \frac{(T - t)^2\,\Sigma_t^2}{4} \left(K\,\frac{\partial \Sigma_t}{\partial K}\right)^{2} + (T - t)\,\Sigma_t \left(K\,\frac{\partial \Sigma_t}{\partial K} + K^2\,\frac{\partial^2 \Sigma_t}{\partial K^2}\right) \ge 0 \tag{10}$$

where $d_2 = d_1 - \Sigma_t(T, K)\sqrt{T - t}$. These restrictions must hold at each time $t$ and at each point $(T, K)$ of the implied volatility surface.

Deterministic Models

Practitioners [10] have proposed two simple models for implied volatility surfaces movements: the sticky strike model and the sticky delta model. The sticky strike model supposes that between date $s$ and $t \ge s$, the implied volatility surface evolves as

$$\Sigma_t(T, K) = \Sigma_s(T, K) \tag{11}$$

whereas the sticky delta model supposes that

$$\Sigma_t(T, K) = \Sigma_s\!\left(T, K\,\frac{S_s}{S_t}\right) \tag{12}$$

In a sticky strike model, an option with a given strike has constant implied volatility. This contrasts with a sticky delta model where options with same moneyness have same implied volatilities. In other words, the implied volatility surface moves in perfect sync with the spot. In reality, implied volatilities move in a more complicated fashion but these two extreme cases are useful stylized benchmarks.

The sticky strike and sticky delta models, in fact, imply strong restrictions on the possible spot dynamics. Balland [2] showed that a sticky delta occurs if and only if the underlying asset price is the exponential of a process with independent increments (i.e., a Lévy process, see Exponential Lévy Models) under the pricing measure, and that a sticky strike situation occurs in the Black–Scholes model only!

Empirical Models

To overcome the obvious shortcomings of the sticky strike and sticky delta models, Cont and da Fonseca [9] have proposed to write down a model for the future evolution of the surface as an infinite system where each point of the surface is driven by a few common factors. These dynamics allow for easy calibration using principal component analysis [9] and can be useful for risk management and scenarios simulation. It is difficult, however, to check whether such specifications satisfy arbitrage restrictions, which prevents them from being used to price exotic options.

The Spot Volatility Dynamics from the Implied Volatility Surface

In the HJM approach (see Heath–Jarrow–Morton Approach) to interest rate modeling, the HJM equation can be used to write down the short rate dynamics starting from the forward rate dynamics. The parallel result in the case of implied volatility was obtained in [13]. The statement is the following: there exists a scalar Wiener process $W^{\perp}$ adapted to the filtration generated by $(\mathbf{W}_t)_{t \ge 0}$ such that

$$|\boldsymbol{\sigma}_t|^2 = |\boldsymbol{\sigma}_0|^2 + \int_0^t \left( 4\,|\boldsymbol{\sigma}_s|\,\frac{\partial \Sigma_s}{\partial T}(s, S_s) + 6\,|\boldsymbol{\sigma}_s|^2 \left( S_s\,\frac{\partial \Sigma_s}{\partial K}(s, S_s) \right)^{2} + 2\,|\boldsymbol{\sigma}_s|^3\, S_s^2\,\frac{\partial^2 \Sigma_s}{\partial K^2}(s, S_s) \right) ds + \int_0^t 4\,|\boldsymbol{\sigma}_s|\,\frac{\partial \Sigma_s}{\partial K}(s, S_s)\, dS_s + \int_0^t 2\,|\boldsymbol{\sigma}_s|^2\, \xi_s^{\perp}\, dW_s^{\perp} \tag{13}$$

where

 ⊥ 2 2 d ∂t
ξt = − St , (t, St )
|σt | dt ∂K
 2
∂t ∂t
+ 2 St (t, St ) − |σt | St (t, St )
∂K ∂K
∂ 2 t
−3|σt | St2 (t, St )
∂K 2
Moreover, the two local martingales appearing in the decomposition are orthogonal in the sense that

$$\left\langle \int_0^t 4\,|\boldsymbol{\sigma}_s|\,\frac{\partial \Sigma_s}{\partial K}(s, S_s)\, dS_s \;;\; \int_0^t 2\,|\boldsymbol{\sigma}_s|^2\, \xi_s^{\perp}\, dW_s^{\perp} \right\rangle = 0 \tag{14}$$

This result actually has a converse, which allows one to get a very precise idea of the implied volatility of a given spot model. Indeed, it allows one to compute the first terms of the Taylor expansion of the implied volatility surface for short maturity and around the money [13].

Other Approaches

Modeling implied volatilities is equivalent to modeling option prices; as seen in equation (3), it is merely a parameterization of the options’ volatilities. The difficulties in modeling implied volatilities have led researchers to look for other and possibly more tractable parameterizations. We mention them here, although these approaches depart from the strict study of implied volatilities.

First, following a program started in [7, 10], one can model option prices by modeling Dupire local volatility as a random field. The authors are able to find explicit drift conditions as well as some examples of such dynamics. The Dupire local volatility surface also specifies the spot volatility in the short maturity limit but does not have complicated static arbitrage restrictions like equations (8)–(10).

Another way of parameterizing option prices consists in modeling its intrinsic value, that is, the difference between the option price and the payoff if the option was exercised today. This is the approach taken by [15] in a very general semimartingale framework. Exactly as with implied volatilities, this approach yields a spot specification when options are close to maturity.

Finally, let us mention the recent work [18], where the authors introduce new quantities: the “local implied volatilities” and “price level” to parameterize option prices. These have nicer dynamics and naturally satisfy the static arbitrage conditions. They derive existence results for the infinite system of equations driving these quantities.

References

[1] Babbar, K. (2001). Aspects of Stochastic Implied Volatility in Financial Markets. PhD thesis, Imperial College, London.
[2] Balland, P. (2002). Deterministic implied volatility models, Quantitative Finance 2(2), 31–44.
[3] Berestycki, H., Busca, J. & Florent, I. (2004). Computing the implied volatility in stochastic volatility models, Communications on Pure and Applied Mathematics 57(10), 1352–1373.
[4] Brace, A., Fabbri, G. & Goldys, B. (2007). An Hilbert Space Approach for A Class of Arbitrage Free Implied Volatilities Models, Technical report, Department of Statistics, University of New South Wales, at http://arxiv.org/abs/0712.1343.
[5] Brace, A., Goldys, B., Klebaner, F. & Womersley, R. (2001). Market Model for Stochastic Implied Volatility with Application to the BGM Model, Technical report, Department of Statistics, University of New South Wales.
[6] Carmona, R. (2007). HJM: a unified approach to dynamic models for fixed income, credit and equity markets, in Paris-Princeton Lectures on Mathematical Finance 2004, Lecture Notes in Mathematics, Springer, Vol. 1919.
[7] Carmona, R. & Nadtochiy, S. (2009). Local volatility dynamic models, Finance and Stochastics 13(1), 1–48.
[8] Carr, P. (2000). A Survey of Preference Free Option Valuation with Stochastic Volatility, Risk’s 5th annual European derivatives and risk management congress, Paris.
[9] Cont, R. & Da Fonseca, J. (2002). Dynamics of implied volatility surfaces, Quantitative Finance 2(2), 45–60.
[10] Derman, E. (1999). Regimes of volatility, Risk (4), 55–59.
[11] Derman, E. & Kani, I. (1998). Stochastic implied trees: arbitrage pricing with stochastic term and strike structure of volatility, International Journal of Theoretical Applied Finance 1(1), 61–110.
[12] Dupire, B. (1993). Model art, Risk 6(9), 118–124.
[13] Durrleman, V. (2004). From Implied to Spot Volatilities, PhD thesis, Department of Operations Research & Financial Engineering, Princeton University, at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1162425, to appear in Finance and Stochastics.
[14] Durrleman, V. (2008). Convergence of at-the-money implied volatilities to the spot volatility, Journal of Applied Probability 45, 542–550.
[15] Jacod, J. & Protter, P. (2006). Risk Neutral Compatibility with Option Prices, Technical report, Université Paris VI and Cornell University, at http://people.orie.cornell.edu/~protter/WebPapers/JP-OptionPrice.pdf.
[16] Lyons, T. (1995). Uncertain volatility and the risk-free synthesis of derivatives, Applied Mathematical Finance 2, 117–133.
[17] Schönbucher, P. (1999). A market model for stochastic implied volatility, Philosophical Transactions of the Royal Society of London. Series A: Mathematical and Physical Sciences 357(1758), 2071–2092.
[18] Schweizer, M. & Wissel, J. (2008). Arbitrage-free market models for option prices: the multi-strike case, Finance and Stochastics 12(4), 469–505.
[19] Schweizer, M. & Wissel, J. (2008). Term structures of implied volatilities: absence of arbitrage and existence results, Mathematical Finance 18, 77–114.
[20] Zhu, Y. & Avellaneda, M. (1998). A risk-neutral stochastic volatility model, International Journal of Theoretical and Applied Finance 1(2), 289–310.

Further Reading

Gatheral, J. (2006). The Volatility Surface: A Practitioner’s Guide, Wiley Finance.
Heath, D., Jarrow, R. & Morton, A. Bond pricing and the term structure of interest rates: a new methodology for contingent claims valuation, Econometrica 60(1), 77–105.

Related Articles

Black–Scholes Formula; Dividend Modeling; Exponential Lévy Models; Heath–Jarrow–Morton Approach; Implied Volatility: Long Maturity Behavior; Implied Volatility: Large Strike Asymptotics; Implied Volatility: Volvol Expansion; Implied Volatility Surface; Implied Volatility in Stochastic Volatility Models; Local Volatility Model; Moment Explosions; SABR Model.

VALDO DURRLEMAN
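The two stylized rules from the section "Deterministic Models" can be illustrated with a short sketch. It applies the sticky strike rule (11) and the sticky delta rule (12) to a smile observed at date $s$ for one fixed maturity. The smile function, its parameters and all function names are hypothetical and chosen only for illustration.

```python
import math


def sticky_strike_vol(smile_s, strike, spot_s, spot_t):
    """Rule (11): the implied vol at a fixed strike is unchanged by a spot move."""
    return smile_s(strike)


def sticky_delta_vol(smile_s, strike, spot_s, spot_t):
    """Rule (12): Sigma_t(T, K) = Sigma_s(T, K * S_s / S_t)."""
    return smile_s(strike * spot_s / spot_t)


def example_smile(spot):
    """Hypothetical date-s smile: linear skew in log-moneyness around `spot`."""
    def smile(strike):
        return 0.20 - 0.10 * math.log(strike / spot)
    return smile


if __name__ == "__main__":
    spot_s, spot_t = 100.0, 110.0
    smile = example_smile(spot_s)
    new_atm_strike = spot_t
    print(sticky_strike_vol(smile, new_atm_strike, spot_s, spot_t))
    print(sticky_delta_vol(smile, new_atm_strike, spot_s, spot_t))
```

Under sticky delta the new at-the-money option inherits the old at-the-money volatility, whereas under sticky strike it is read off the old smile at the new strike, so with a downward-sloping skew it is lower after a rally.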
Rating Transition Matrices

Rating transition matrices play an important role in credit risk management both as a method for summarizing the empirical behavior of a rating system and as a tool for computing probabilities of rating migrations in, for example, a portfolio of risky loans. Analysis of statistical properties of rating transition matrices is intimately linked with Markov chains. Even if rating processes in general are not Markovian, statistical analysis of rating systems often focuses on assessing a particular deviation from Markovian behavior. Furthermore, the tractability of the Markovian setting can be preserved in some simple extensions.

Discrete-time Markov Chains

Let the rating process $\eta = (\eta_0, \eta_1, \ldots)$ be a discrete-time stochastic process taking values in a finite state space $\{1, \ldots, K\}$. If the rating process is a Markov chain, the probability of making a particular transition between time $t$ and time $t + 1$ does not depend on the history before time $t$, and one-step transition probabilities of the form

$$p_{ij}(t; t + 1) = \Pr(\eta_{t+1} = j \mid \eta_t = i) \tag{1}$$

describe the evolution of the chain. If the one-step transition probabilities are independent of time, we call the chain time homogeneous and write

$$p_{ij} = \Pr(\eta_{t+1} = j \mid \eta_t = i) \tag{2}$$

The one-period transition matrix of the chain is then given as

$$P = \begin{pmatrix} p_{11} & \cdots & p_{1K} \\ \vdots & \ddots & \vdots \\ p_{K1} & \cdots & p_{KK} \end{pmatrix} \tag{3}$$

where $\sum_{j=1}^{K} p_{ij} = 1$ for all $i$.

Consider a sample of $N$ firms whose transitions between different states are observed at discrete dates $t = 0, \ldots, T$. Now introduce the following notation:

• $n_i(t)$ = number of firms in state $i$ at date $t$.
• $n_{ij}(t)$ = number of firms that went from $i$ at date $t - 1$ to $j$ at date $t$.
• $N_i(T) = \sum_{t=0}^{T-1} n_i(t)$ = number of firm exposures recorded at the beginning of transition periods.
• $N_{ij}(T) = \sum_{t=1}^{T} n_{ij}(t)$ = total number of transitions observed from $i$ to $j$ over the entire period.

If we do not assume time homogeneity, we can estimate each element of the one-step transition probability matrix using the maximum-likelihood estimator

$$\hat{p}_{ij}(t - 1; t) = \frac{n_{ij}(t)}{n_i(t - 1)} \tag{4}$$

which is simply the number of firms that made the transition divided by the number of firms that could have made the transition.

Assuming time homogeneity, the maximum-likelihood estimator of the transition probabilities matrix is

$$\hat{p}_{ij} = \frac{N_{ij}(T)}{N_i(T)} \tag{5}$$

for all $i, j \in \{1, \ldots, K\}$. This estimator is different from the estimator obtained by estimating a sequence of 1-year transition matrices and then computing the average of each element at a time. The latter method will weigh years with few observations as heavily as years with many observations. If the viewpoint is that there is variation in 1-year transition probabilities over time due to, for example, business cycle fluctuations, the averaging can be justified as a way of obtaining an unconditional 1-year default probability over the cycle.

Rating agencies often form a cohort of firms at a particular date, say January 1, 1980, and record transition frequencies over a fixed time horizon, say 5 years. This can be done in a straightforward way using only information on the initial rating and final rating after 5 years, assuming that all companies that are in the cohort, to begin with, stay in the sample. In practice, rating withdrawals occur, that is, firms or debt issues cease to have a rating. According to [4], the vast majority of withdrawals are due to debt maturing, being redeemed or called. It is traditional in the rating literature to view these events as “noninformative” censoring. One way to deal with withdrawals is to eliminate the firms from the sample and in essence use only those firms that do not have their rating withdrawn in the 5-year period. Another way is to estimate a sequence of
1-year transition probability matrices using the 1-year estimator and then estimate the 5-year matrix as the product of 1-year matrices. In this case, information of a firm whose rating is withdrawn is used for the years where it is still present in the sample. Both methods rely on the assumption of withdrawals being noninformative.

Continuous-time Markov Chains

When one has access to full rating histories and therefore knows the exact dates of transitions, the continuous-time formulation offers significant advantages in terms of tractability. Recall that the family of transition matrices for a time-homogeneous Markov chain in continuous time on a finite state space can be described by an associated generator matrix, that is, a $K \times K$ matrix $\Lambda$, whose elements satisfy

$$\lambda_{ij} \ge 0 \ \text{ for } i \ne j, \qquad \lambda_{ii} = -\sum_{j \ne i} \lambda_{ij} \tag{6}$$

Let $P(t)$ denote the $K \times K$ matrix of transition probabilities, that is, $p_{ij}(t) = P(\eta_t = j \mid \eta_0 = i)$. Then

$$P(t) = \exp(\Lambda t) \tag{7}$$

where the right hand side is the matrix exponential of the matrix $\Lambda t$ obtained by multiplying all entries of $\Lambda$ by $t$.

In case a row consists of all zeros, the chain is absorbed in that state when it hits it. It is convenient to work with the default states as absorbing states even if firms in practice may recover and leave the default state. If we ask what the probability is that a firm will default before time $T$ then this can be read from the transition matrix $P(T)$ when we have defined default to be an absorbing state. If the state is not absorbing, but $P$ allows the chain to jump back into the nondefault rating categories, then the transition probability matrix for time $T$ will only give the probability of being in default at time $T$ and this (smaller) probability is typically not the one we are interested in for risk management purposes.

Assume that we have observed a collection of firms between time 0 and time $T$. The maximum-likelihood estimator for the off-diagonal elements of the generator matrix is given by

$$\hat{\lambda}_{ij} = \frac{N_{ij}(T)}{\int_0^T Y_i(s)\, ds} \tag{8}$$

where $Y_i(s)$ is the number of firms in rating class $i$ at time $s$ and $N_{ij}(T)$ is the total number of direct transitions over the period from $i$ to $j$, where $i \ne j$. The denominator counts the number of “firm-years” spent in state $i$.

Any period a firm spends in a state will be picked up through the denominator. In this sense all information is being used. Note also how (noninformative) censoring is handled automatically: When a firm leaves the sample, it simply stops contributing to the denominator. Also, this method will produce estimates of transition probabilities for “rare transitions”, even if the rare transitions have not been observed in the sample. For more on this, see [9].

Nonhomogeneous Chains

For statistical specifications and applications to pricing, the concept of a nonhomogeneous chain is useful. In complete analogy with the discrete-time case, the definition of the Markov property does not change when we drop the assumption of time homogeneity, but the description of the family of transition matrices requires that we keep track of calendar dates instead of just time lengths.

For each pair of states $i, j$ with $i \ne j$, let $A_{ij}$ be a nondecreasing right-continuous (and with left limits) function, which is zero at time zero. Let

$$A_{ii}(t) = -\sum_{j \ne i} A_{ij}(t) \tag{9}$$

and assume that

$$\Delta A_{ii}(t) \ge -1 \tag{10}$$

Then there exists a Markov process with state space $1, \ldots, K$ whose transition matrix is given by

$$P(s, t) = \prod_{[s,t]} (I + dA) \equiv \lim_{\max |t_i - t_{i-1}| \to 0} \prod_i \left(I + A(t_i) - A(t_{i-1})\right) \tag{11}$$
Rating Transition Matrices 3

where $s \le t_1 \le \cdots \le t_n \le t$. One can think of the probabilistic behavior as follows: Given that the chain is in state i at time s, the probability that it remains in that state at least until t (assuming that $\Delta A_{ii}(u) > -1$ for $u \le t$) is given by

\[ P(\eta_u = i \text{ for } s < u \le t \mid \eta_s = i) = \exp\bigl(-(A_{ii}(t) - A_{ii}(s))\bigr) \qquad (12) \]

We are interested in testing assumptions on the intensity measure when it can be represented through integrated intensities, that is, we assume that there exist integrable functions (or transition intensities) $\lambda_{ij}(\cdot)$ such that

\[ A_{ij}(t) = \int_0^t \lambda_{ij}(s)\, ds \qquad (13) \]

for every pair of states i, j with $i \ne j$.
In this case, given that the chain jumps away from i at date t, the probability that it jumps to state j is given by $\lambda_{ij}(t) / \sum_{k \ne i} \lambda_{ik}(t)$.
A homogeneous Markov chain with intensity matrix $\Lambda$ has $A_{ij}(t) = \lambda_{ij} t$ and in this special case we can write $P(s,t) = \exp(\Lambda (t - s))$.

a primary concern of through-the-cycle rating is the correct ranking of the firm’s default probabilities (or expected loss) over a longer time horizon, whereas a point-in-time rating is more concerned with following actual, shorter-term default probabilities seeking to maintain a constant meaning of riskiness associated with each rating category.
The degree to which transition probabilities depend on the previous rating history, business cycle variables, and the sector or country to which the rated companies belong has been investigated, for example, in papers [1, 9, 10]. A good entry into the literature is in the special journal issue introduced by Cantor [3].
Rating agencies have a system of modifiers that effectively enlarge the state space. For example, Moody’s operates with a watchlist and long-term outlooks. Being on a watchlist signals a high likelihood of rating action in a particular direction in the near future, and outlooks signal longer term likely rating directions. Hamilton and Cantor [7] investigate the performance of ratings when the state space is enlarged with these modifiers and conclude that they go a long way in reducing dependence on rating history.
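For the homogeneous case, where the transition matrix over a horizon is the matrix exponential of the intensity matrix times the horizon, a minimal standard-library sketch might look as follows. The three-state generator `LAMBDA` (two rating classes plus an absorbing default state), the function names, and the choice of a scaled-and-squared Taylor series are all assumptions made for illustration; they are not taken from the article.

```python
# Hypothetical intensity matrix: states 0 and 1 are rating classes,
# state 2 is absorbing default. Each row sums to zero.
LAMBDA = [
    [-0.10, 0.08, 0.02],
    [0.05, -0.15, 0.10],
    [0.00, 0.00, 0.00],
]


def mat_mul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(generator, horizon, squarings=12, terms=12):
    """Approximate exp(generator * horizon) by scaling and squaring."""
    n = len(generator)
    scale = horizon / 2 ** squarings
    small = [[generator[i][j] * scale for j in range(n)] for i in range(n)]
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]
    term = [row[:] for row in identity]
    # Truncated Taylor series of exp(small): sum over k of small^k / k!
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, small)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    # Undo the scaling: exp(A) = (exp(A / 2^m))^(2^m)
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


P_ONE_YEAR = transition_matrix(LAMBDA, 1.0)
```

Because the default state is absorbing in the generator, the last row of `P_ONE_YEAR` stays at (0, 0, 1), and multiplying the one-year matrix by itself reproduces the two-year matrix up to rounding.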
For a method for estimating the continuous-time transition probabilities nonparametrically using the so-called Aalen–Johansen estimator, see, for example, [2]. The specification of individual transition intensities allows us to use hazard regressions on specific rating transitions. For an example of nonparametric techniques, see [5]. A Cox regression approach can be found in [9].

Empirical Observations

There is a large literature on the statistical properties of the observed rating transitions, mainly for firms rated by Moody’s and Standard & Poor’s. It has been acknowledged for a long time that the observed processes are not time homogeneous and not Markov. This is consistent with stated objectives of rating agencies of trying to avoid rating reversals and seeking to change ratings only when the change in credit quality is seen as enduring, a property sometimes referred to as “rating through the cycle”. This is in contrast to “point-in-time” rating. The distinction between the two approaches is not rigorous, but a rough indication of the difference is that

Correlated Transitions

In risk management, the risk of loan portfolios and exposures to different counterparties in derivatives contracts depends critically on the extent to which the credit ratings of different loans and counterparties are correlated.
We finish by briefly outlining two ways of incorporating dependence into rating migrations. For the first approach, see, for example, [6]; we map rating probabilities into thresholds. The idea is easily illustrated through an example. If firm 1 is currently rated i and we know the (say) 1-year transition probabilities $p_{i1}, \ldots, p_{iK}$, then we can model the transition to the various categories using a standard Gaussian random variable $\varepsilon_1$ and defining thresholds $a_1 > a_2 > \ldots > a_{K-1}$ such that

\[ p_{iK} = P(\varepsilon_1 \le a_{K-1}) = \Phi(a_{K-1}) \qquad (14) \]
\[ p_{i,K-1} = P(a_{K-1} \le \varepsilon_1 \le a_{K-2}) = \Phi(a_{K-2}) - \Phi(a_{K-1}) \qquad (15) \]
\[ \vdots \]
\[ p_{i1} = P(a_1 \le \varepsilon_1) = 1 - \Phi(a_1) \qquad (16) \]
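The threshold construction in (14)–(16) can be sketched with the Python standard library. The row of transition probabilities `ROW_I` and both function names are hypothetical, and the sketch assumes every cumulative probability lies strictly between 0 and 1 so that the inverse normal distribution function is defined.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Hypothetical 1-year transition probabilities p_i1, ..., p_iK for K = 4,
# with category K as the worst one; illustrative numbers only.
ROW_I = [0.05, 0.85, 0.08, 0.02]


def migration_thresholds(row):
    """Return a_1 > ... > a_{K-1} solving equations (14)-(16)."""
    thresholds = []
    upper_tail = 0.0
    for prob in row[:-1]:
        upper_tail += prob  # P(eps >= a_k) = p_i1 + ... + p_ik
        thresholds.append(STD_NORMAL.inv_cdf(1.0 - upper_tail))
    return thresholds


def implied_probabilities(thresholds):
    """Recover p_i1, ..., p_iK from the thresholds."""
    cdf = [STD_NORMAL.cdf(a) for a in thresholds]
    probs = [1.0 - cdf[0]]                                      # (16)
    probs += [cdf[k - 1] - cdf[k] for k in range(1, len(cdf))]  # (15)
    probs.append(cdf[-1])                                       # (14)
    return probs


A_THRESHOLDS = migration_thresholds(ROW_I)
```

Feeding `A_THRESHOLDS` back into `implied_probabilities` returns the original row up to rounding, which is a quick consistency check on the ordering of the thresholds.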

Similarly, for firm 2, we can define thresholds $b_1, \ldots, b_{K-1}$ and a standard normal random variable $\varepsilon_2$ so that the transition probabilities are matched as earlier. Letting $\varepsilon_1$ and $\varepsilon_2$ be correlated with correlation coefficient ρ induces correlation into the migration patterns of the two firms. This can be extended to a large collection of firms using a full correlation matrix obtained, for example, by looking at equity return correlations.
A second approach, which makes it possible to link up rating dynamics with continuous-time pricing models, is proposed in [8]. The idea here is to model the “conditional generator” of a Markov process as the product of a constant generator $\Lambda$ and a strictly positive affine process µ, that is, conditionally on a realization of the process µ, the Markov chain is time non-homogeneous with the transition intensity $\lambda_{ij}(s) = \mu(s)\lambda_{ij}$. This framework allows for closed form computation of transition probabilities in a setting where rating migrations are correlated through dependence on state variables.

References

[1] Altman, E. & Kao, D.L. (1992). The implications of corporate bond rating drift, Financial Analysts Journal 48(3), 64–75.
[2] Andersen, P.K., Borgan, O., Gill, R. & Keiding, N. (1993). Statistical Models Based on Counting Processes, Springer, New York.
[3] Cantor, R. (2004). An introduction to recent research on credit ratings, Journal of Banking and Finance 28, 2565–2573.
[4] Cantor, R. (2008). Moody’s Guidelines for the Withdrawal of Ratings, Rating Methodology, Moody’s Investors Service, New York.
[5] Fledelius, P., Lando, D. & Nielsen, J. (2004). Non-parametric analysis of rating transition and default data, Journal of Investment Management 2(2), 71–85.
[6] Gupton, G., Finger, C. & Bhatia, M. (1997). CreditMetrics–Technical Document, Morgan Guaranty Trust Company.
[7] Hamilton, D. & Cantor, R. (2004). Rating Transitions and Defaults Conditional on Watchlists, Outlook and Rating History, Special comment, Moody’s Investors Service, New York.
[8] Lando, D. (1998). On Cox processes and credit risky securities, Review of Derivatives Research 2, 99–120.
[9] Lando, D. & Skødeberg, T. (2002). Analyzing rating transitions and rating drift with continuous observations, The Journal of Banking and Finance 26, 423–444.
[10] Nickell, P., Perraudin, W. & Varotto, S. (2000). Stability of ratings transitions, Journal of Banking and Finance 24, 203–227.

DAVID LANDO
Credit Migration Models

It is nowadays widely recognized that portfolio models are an essential tool for a proper and effective management of credit portfolios, be it from the perspective of a corporate bank, a mortgage bank, a consumer finance provider, or a fixed-income asset manager. Traditional credit management was, to a large extent, focused on the stand-alone analysis and monitoring of the credit quality of obligors or counterparties. Frequently, the credit process also included ad hoc exposure-based limit-setting policies that were devised in order to prevent excessive risk concentrations. This approach was scrutinized in the 1990s, when the financial industry started to realize that univariate models for obligor default had to be extended to a portfolio context. It was recognized that credit rating and loss recovery models, although a crucial element in the assessment of credit risk, fail to explain some of the important stylized facts of credit loss distributions, if the stochastic dependence of obligor defaults is neglected. From a statistical point of view, not only the skewness and the relatively heavy upper tails of credit portfolio loss distributions, but also the historically observed variation of default rates and the clustering of bankruptcies in single sectors are clearly inconsistent with stochastic independence of defaults. From an economics point of view, it is plausible that default rates are connected to the intrinsic fluctuations of business cycles; relationships between default rates and the economic environment have indeed been established in numerous empirical studies [5]. All these insights supported the quest for tractable credit portfolio models that reflect these stylized facts.
Apart from an accurate statistical description of credit losses, a portfolio model can serve many more purposes. In contrast to a univariate approach, a credit portfolio framework makes it possible to quantify the diversification effects between credit instruments. This makes it, for example, possible to evaluate the impact on the total risk when securities are added or removed from a portfolio. In the same vein, the risk numbers produced by a portfolio model help to identify possible hedges. Ultimately, the use of a portfolio model facilitates the active management of credit portfolios and the efficient allocation of capital. Less of a pure risk management matter is the use of portfolio models for risk-adjusted pricing (see Loan Valuation) or performance measurement. The total portfolio risk is commonly considered as a capital which the lender should hold in order to buffer large losses. For nontraded assets, such as loans or mortgages, the costs for holding this risk capital are typically transferred to the borrowers by means of a surcharge on interest rates. Calculating these surcharges necessitates that the total portfolio risk capital is broken down to borrower (or instrument) level risk contributions. Only in a portfolio model framework, where the dependence between obligors and the resulting diversification benefits are correctly captured, can this risk contribution be determined in an economically rational and fair fashion. We mention that risk contributions can also be applied in order to determine the ex post (historical) risk-adjusted performance of instruments or subportfolios. Credit portfolio models also play an important role in the pricing of credit derivatives or structured products, such as credit default swaps or CDSs. For the correct pricing of many of these credit instruments, it is crucial that the dependence between obligor default times is well modeled.

Overview of Credit Migration-based Models

This article gives a survey on migration-based portfolio models, that is, models that describe the joint evolution of credit ratings. The ancestor of all such models is CreditMetrics (note a), which was introduced by the US investment bank J.P. Morgan. In 1997, J.P. Morgan and cosponsors from the financial industry published a comprehensive technical document [13] on CreditMetrics, in an effort to set industry standards and to create more transparency in credit risk management. This publication attracted a lot of attention and proved to stimulate research in credit risk. To this date, CreditMetrics or derivations thereof have been implemented by a large number of financial institutions. Before we turn to a detailed description of CreditMetrics, it is worth mentioning two related models. CreditPortfolioView by McKinsey & Co is credit-migration-based as well. However, in contrast to CreditMetrics, which assumes temporally constant transition matrices, it is endowed with an estimator of credit migration probabilities based on macroeconomic observables. A separate article is dedicated to its discussion.

The second link concerns the longer standing KMV model (note b). An outline of the KMV methodology can be extracted from an article by Kealhofer and Bohn [16]. In both CreditMetrics and KMV, the obligor correlation is generated in a similar fashion, that is, with a dependence structure following a Gaussian copula. The main differences concern the number of credit states and the source of probabilities of default (PDs). The KMV model operates on a continuum of states, namely, the so-called expected default frequencies (Moody’s KMV EDF, note c), basically estimated PDs, whereas CreditMetrics is restricted to a finite number of credit rating states. For this reason, KMV is, strictly speaking, not a credit-migration-based model and is therefore only touched upon in this article. As remarked by McNeil et al. [19], a discretization of EDF would translate KMV to a model which, apart from parametrization, is structurally equivalent to CreditMetrics. Secondly, while for CreditMetrics rating transition matrices are the required exogenous inputs, the KMV counterparts, EDF of listed companies, are estimated through a proprietary method, which is basically an extension of the celebrated Merton model [20] for firm default. Inputs to the EDF model are historical time-series of equity prices together with company debt information, with which the unobserved asset value processes are reconstructed and a quantity called distance to default (DD) is calculated for every firm. This DD is used as a predictor of EDF; the relationship is determined by a nonlinear regression of historical default data against historical DD values. It is beyond the scope of this article to provide more details and so we refer to [2] or [17] for an account of the EDF methodology.

The CreditMetrics Model

CreditMetrics models the distribution of the credit portfolio value at a future time, from which risk measures can be derived. The changes of portfolio value are caused by credit migrations of the underlying instruments. In the following, we describe the rationale of the main building blocks of CreditMetrics.

Timescale

CreditMetrics was conceived as a discrete time model. It has a user-specified time horizon T that is reached in one step from the analysis time 0; typically the time horizon is 1 year. It is assumed that the portfolio is static, that is, its composition is not altered during the time period (0, T).

Risk Factors and Valuation

In the case of CreditMetrics, the basic assumption is that each instrument is tied to one or several obligors. The user furnishes obligors with a rating from a rating system with a finite number of classes and an absorbing default state. The obligor ratings are the main risk drivers. We index the obligors by i = 1, . . . , n and assume a rating system with rating classes {1, . . . , K} that are ordered with respect to the credit quality, and a default class 0. At time 0, the obligor i has the (known) initial rating $S_i^{\mathrm{init}}$, which then becomes $S_i^{\mathrm{new}}$ at time T. The change from $S_i^{\mathrm{init}}$ to $S_i^{\mathrm{new}}$ happens in a random fashion, according to the so-called credit migration probabilities. These probabilities are assumed to be identical for obligors in the same rating class and can therefore be represented by a so-called credit migration (or rating transition) matrix $M = (m_{jk})_{j,k \in \{0,\ldots,K\}}$. Clearly,

\[ \mathbb{P}(S_i^{\mathrm{new}} = k \mid S_i^{\mathrm{init}} = j) = m_{jk} \qquad (1) \]

The credit migration matrix is an important input to CreditMetrics. In practice, one often uses rating systems supplied by agencies such as Moody’s or Standard & Poor’s. The model also allows working in parallel with several rating systems, depending on the obligor. If public ratings are not available, financial institutions can resort to internal ratings; see Credit Rating; Internal-ratings-based Approach; Credit Scoring.
To treat specific positions, CreditMetrics must estimate values for the position contingent on the position’s obligor being in each possible future rating state. This is equivalent to estimating the loss (or gain) on the position contingent on each possible rating transition. In the case of default, the recovery rate $\delta_i$ determines the proportion of the position’s principal that is paid back by the obligor (note d).
For the nondefault states, the standard implementation of the model is to value positions based on market factors: the risk-free interest rate curve and a spread curve corresponding to the rating state. For this reason, CreditMetrics is commonly referred to as a mark-to-market model. Importantly, the mark-to-market approach incorporates a maturity effect into

the model: other things being equal, a downward credit migration will have a greater impact on a long maturity bond than a short one, given the long bond’s higher sensitivity (duration) to the spread widening that is assumed to accompany the migration. However, this approach does require relevant spread curves for positions of all possible rating states.
For positions where there is little market information, or where the mark-to-market approach is inconsistent with an institution’s accounting scheme, it is possible to utilize policy-driven rather than market-driven valuation. For example, if an institution has a reserves policy whereby loss reserves are determined by credit rating and maturity, then the change in required reserves can serve as a proxy for the loss on a position, contingent on a particular rating move. In this way, the model can still incorporate a maturity effect, even where a mark-to-market approach is not practical.

Risk Factor Dynamics and Obligor Dependence Structure

In the original formulation of CreditMetrics, foreign exchange (FX) rates and interest rate and spread curves are assumed to be deterministic since one focuses on the rating as the main risk driver. In principle, this assumption could be relaxed.
The migration matrix in the CreditMetrics model specifies the rating dynamics of a single obligor, but it does not provide any information about the joint obligor credit migrations. In order to capture the obligor dependence structure, CreditMetrics borrows ideas from the Merton structural model for firm default, which links default to the obligor asset value falling short of its liabilities. The assumption of CreditMetrics is that the obligor rating transition is caused by changes of the obligor’s asset value, or equivalently, the asset value return. The lower this random return, the lower the new rating; if the asset value return drops below a certain threshold, default occurs. Mathematically, this amounts to defining return buckets for each obligor; the thresholds bounding these buckets depend on the initial obligor rating, the transition probabilities, and the return distribution. The rating of an obligor is determined by the bucket its return falls into. Obviously, the bucket probabilities must coincide with the transition probabilities. Models of this type are also called threshold models.
More formally, if $R_i$ denotes the asset return of obligor i over (0, T], then the rating at time T is determined by

\[ S_i^{\mathrm{new}} = j \iff d_j^{(i)} < R_i \le d_{j+1}^{(i)} \qquad (2) \]

The increasing thresholds $d_j^{(i)}$ are picked such that the resulting migration probabilities coincide with the ones prescribed by the credit migration matrix. Consequently,

\[ d_0^{(i)} = -\infty \quad\text{and}\quad d_{K+1}^{(i)} = +\infty \]
\[ G_i(d_{j+1}^{(i)}) - G_i(d_j^{(i)}) = \mathbb{P}(S_i^{\mathrm{new}} = j \mid S_i^{\mathrm{init}}) \qquad (3) \]

where $G_i$ is the cumulative distribution function of $R_i$. We illustrate the rating transition mechanism in Figure 1, which shows the return distribution and the thresholds for an obligor with an initial rating 2 in a hypothetical rating system with four nondefault classes.
The dependence between obligor ratings stems from the dependence of the asset returns. CreditMetrics assumes that these returns follow a linear factor model with multivariate normal factors and independent Gaussian innovations. This means that

\[ R_i = \alpha_i + \sum_{\ell=1}^{p} \beta_{i\ell} F_\ell + \sigma_i \varepsilon_i \qquad (4) \]

where the common factors $F = (F_1, \ldots, F_p) \sim N_p(\mu, \Sigma)$ are multivariate Gaussian and the $\varepsilon_i$’s are independent and identically distributed (i.i.d.) standard normal variables independent of the factors. The numbers $\beta_{i\ell}$ are also called factor exposures or loadings, $\sum_{\ell=1}^{p} \beta_{i\ell} F_\ell$ is the systematic return, $\sigma_i$ is the volatility of idiosyncratic (or specific) return $\sigma_i \varepsilon_i$ of obligor i, and the real parameter $\alpha_i$ is referred to as alpha. The dependence between the returns, and consequently the dependence between future ratings, is caused by the exposure of the obligors to the common factors. Usually one normalizes the returns $R_i$ to unit variance; this does not alter the joint distribution of $S^{\mathrm{new}} = (S_1^{\mathrm{new}}, \ldots, S_n^{\mathrm{new}})$ and leads to adjusted thresholds that are simpler. Not explicitly distinguishing between returns and normalized returns in our notation, equation (4) then reads as

\[ R_i = \psi_i(F) + \sqrt{1 - \mathrm{Var}(\psi_i(F))}\, \varepsilon_i \sim N(0, 1) \qquad (5) \]

Figure 1 Asset return distribution, thresholds and rating classes (horizontal axis: asset value return, with thresholds d1, d2, d3, d4 separating the rating classes 0 to 4; vertical axis: density)

for appropriate affine linear functions $\psi_i$. The adjusted thresholds are given by

\[ d_j^{(i)} = d_j(S_i^{\mathrm{init}}) \quad\text{with}\quad d_j(s) = \Phi^{-1}\Bigl(\sum_{k=0}^{j-1} m_{sk}\Bigr), \quad 0 < j \le K \qquad (6) \]

where $\Phi$ is the standard normal distribution function.
As regards the recovery rates $\delta_i$, they are assumed to be independent of each other and of the obligor returns. One stipulates that they are beta-distributed with parameters $(a_i, b_i)$.

where C, the copula associated with R, is the distribution function of a random vector with standard uniform marginal distributions. From standard arguments (see e.g., Copulas: Estimation; Copulas in Econometrics; or Copulas in Insurance and references therein),

\[ \mathbb{P}(S_1^{\mathrm{new}} = s_1, \ldots, S_n^{\mathrm{new}} = s_n) \]
\[ = \mathbb{P}\bigl(d_{s_1}^{(1)} < R_1 \le d_{s_1+1}^{(1)}, \ldots, d_{s_n}^{(n)} < R_n \le d_{s_n+1}^{(n)}\bigr) \]
\[ = \int_{d_{s_1}^{(1)}}^{d_{s_1+1}^{(1)}} \cdots \int_{d_{s_n}^{(n)}}^{d_{s_n+1}^{(n)}} dG(r_1, \ldots, r_n) \]
\[ = \int_{G_1(d_{s_1}^{(1)})}^{G_1(d_{s_1+1}^{(1)})} \cdots \int_{G_n(d_{s_n}^{(n)})}^{G_n(d_{s_n+1}^{(n)})} dC(u_1, \ldots, u_n) \qquad (8) \]
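The adjusted thresholds in (6) and the joint migration probabilities in (8) can be explored by simulation. The sketch below assumes a Gaussian copula generated by a single common factor with equal loadings, so that the two normalized returns have correlation `rho`; this is a special case of (5), not the general multifactor model. The migration row `ROW` and all function names are hypothetical, and the cumulative probabilities are assumed to lie strictly between 0 and 1.

```python
import random
from bisect import bisect_left
from statistics import NormalDist

ND = NormalDist()

# Hypothetical migration probabilities m_{s,0}, ..., m_{s,4} for one initial
# rating s (class 0 = default); illustrative numbers only.
ROW = [0.05, 0.10, 0.70, 0.10, 0.05]


def adjusted_thresholds(row):
    """Return [d_0, ..., d_{K+1}] with d_j = Phi^{-1}(m_{s,0}+...+m_{s,j-1})."""
    thresholds = [float("-inf")]
    cumulative = 0.0
    for prob in row[:-1]:
        cumulative += prob
        thresholds.append(ND.inv_cdf(cumulative))  # needs 0 < cumulative < 1
    thresholds.append(float("inf"))
    return thresholds


def new_rating(ret, thresholds):
    """Rating class j such that d_j < ret <= d_{j+1}, as in equation (2)."""
    return bisect_left(thresholds, ret) - 1


def simulate_joint(row1, row2, rho, n_sims=50_000, seed=0):
    """Monte Carlo estimate of the joint distribution of two new ratings."""
    rng = random.Random(seed)
    d1, d2 = adjusted_thresholds(row1), adjusted_thresholds(row2)
    counts = [[0] * len(row2) for _ in row1]
    a, b = rho ** 0.5, (1.0 - rho) ** 0.5
    for _ in range(n_sims):
        factor = rng.gauss(0.0, 1.0)              # common factor F
        r1 = a * factor + b * rng.gauss(0.0, 1.0)  # normalized return, obligor 1
        r2 = a * factor + b * rng.gauss(0.0, 1.0)  # normalized return, obligor 2
        counts[new_rating(r1, d1)][new_rating(r2, d2)] += 1
    return [[c / n_sims for c in line] for line in counts]
```

With a positive `rho`, the estimated probability that both obligors end in class 0 should exceed the product of the two marginal default probabilities, while each marginal distribution still matches its migration row.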
Obligor Dependence Structure from Copulas

As was first recognized by Frey et al. [9] (see also [8]), copulas provide an elegant means for describing the obligor dependence structure in credit portfolio models. By virtue of Sklar’s theorem, the joint distribution of the random vector $R = (R_1, \ldots, R_n)$ can be factorized as

\[ G(r_1, \ldots, r_n) = \mathbb{P}(R_1 \le r_1, \ldots, R_n \le r_n) = C(G_1(r_1), \ldots, G_n(r_n)) \qquad (7) \]

Note that the integration limits $G_i(d_{s_i}^{(i)}) = \sum_{k=0}^{s_i - 1} m_{S_i^{\mathrm{init}},k}$ do not depend on $G_i$. This implies that the joint distribution of the ratings vector $S^{\mathrm{new}}$ is determined by the initial obligor ratings, the credit migration matrix M and the copula C; the marginal distributions $G_i$ do not matter.
This result helps to categorize threshold credit portfolio models; models using the same families of copulas can be considered as structurally equivalent. The copula associated with a Gaussian random vector

is called Gaussian copula and depends on the correlation matrix only. Since the returns R are multivariate Gaussian, the original CreditMetrics model [13] is referred to as having a Gaussian copula. Replacing the Gaussian copula family by other families gives different models. Frey et al. [9] study the CreditMetrics model with Student-t copulas and find that the tail of the loss distribution is considerably fatter as compared with the Gaussian copula with identical correlation parameters.

CreditMetrics as a Mixture Model

We now interpret the CreditMetrics model in a conditional fashion in order to better understand the meaning of the factors F. To this end, we look at the vector of default indicators $D = (D_1, \ldots, D_n)$, where $D_i = I_{\{S_i^{\mathrm{new}} = 0\}}$. We denote the default probabilities by $\bar{p}_i = \mathbb{P}(D_i = 1)$. Then conditional on the factors F, the vector D consists of independent Bernoulli random variables with success probabilities

\[ p_i(F) = \mathbb{P}(D_i = 1 \mid F) = \Phi\left( \frac{\Phi^{-1}(\bar{p}_i) - \psi_i(F)}{\sqrt{1 - \mathrm{Var}(\psi_i(F))}} \right) \qquad (9) \]

this allows one to make a link to CreditRisk+ (see [3, 8, 11] for details).

Asymptotic Behavior of CreditMetrics

In CreditMetrics, risk measures such as VaR (Value-at-Risk) or ES (expected shortfall) cannot be expressed in terms of simple closed formulas. For their estimation, one has to resort to Monte-Carlo (MC) simulation or other numerical methods (see Credit Portfolio Simulation). Although the topic of approximations in credit portfolio models is covered in Large Pool Approximations; Saddlepoint Approximation, we provide a brief discussion because the asymptotic results provide important qualitative insights.
Concerning approximation, research has dealt with strong limits of the relative portfolio loss and the tail behavior of the loss distribution when the number of obligors tends to infinity. While the derivation of strong limits consists of straightforward applications of the strong law of large numbers, the analysis of the tail behavior of the loss is more involved. For the tail behavior, we refer to [18] and a recent article by Glasserman et al. [10].
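The conditional default probability in (9) and the conditional independence of the default indicators can be illustrated with a small sketch. It assumes the one-factor special case in which the systematic part is the square root of `rho` times a standard normal factor, so that its variance equals `rho`; the function names and parameter values are hypothetical.

```python
import random
from statistics import NormalDist

ND = NormalDist()


def conditional_pd(p_bar, rho, factor):
    """Equation (9) with psi_i(F) = sqrt(rho) * F and Var(psi_i(F)) = rho."""
    return ND.cdf((ND.inv_cdf(p_bar) - rho ** 0.5 * factor) / (1.0 - rho) ** 0.5)


def unconditional_pd(p_bar, rho, lo=-8.0, hi=8.0, steps=1600):
    """Trapezoidal integration of p(F) against the standard normal density."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        f = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * conditional_pd(p_bar, rho, f) * ND.pdf(f)
    return total * h


def simulate_default_counts(p_bars, rho, n_sims=1000, seed=0):
    """Draw F first, then independent Bernoulli defaults given F."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_sims):
        factor = rng.gauss(0.0, 1.0)
        probs = [conditional_pd(p, rho, factor) for p in p_bars]
        counts.append(sum(rng.random() < p for p in probs))
    return counts
```

Averaging the conditional probability over the factor should return the unconditional default probability, which `unconditional_pd` checks numerically; low factor values (a weak economy) raise every obligor's conditional default probability at once.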
We deduce that one can simulate D by first drawing F from a multivariate Gaussian distribution and then generating independent Bernoulli random variables with success probabilities $p_i(F)$. From this angle, F represents the state of the economy that determines the obligor default probabilities. The distribution of D, which is obtained from “mixing” the conditional distributions of D by F, is a so-called Bernoulli mixture:

\[ \mathbb{P}(D_1 = d_1, \ldots, D_n = d_n) = \mathbb{E}\bigl[\mathbb{P}(D_1 = d_1, \ldots, D_n = d_n \mid F)\bigr] = \mathbb{E}\Bigl[\prod_{i=1}^{n} p_i(F)^{d_i} (1 - p_i(F))^{1 - d_i}\Bigr] \qquad (10) \]

The conditional view offers computational advantages. Finger [6] exploits it for the determination of credit portfolio distributions. Using the Poisson approximation for sums of independent Bernoulli variables, various authors have shown that CreditMetrics is approximated by a Poisson mixture model;

We next present the idea of the so-called large pool approximation. To this end, we work in a simplified framework that is adapted to loan portfolios. We assume that recovery rates, spreads, and interest rates are all equal to zero. Every obligor i has an outstanding loan of size $e_i$. Then the loss in the period (0, T] is given by

\[ L_n = \sum_{i=1}^{n} e_i D_i \qquad (11) \]

We define total exposure by $e = \sum_{i=1}^{n} e_i$ and set $\bar{L}_n = L_n / e$ for the relative loss of the portfolio. We want to verify to which extent the specific risk caused by the obligor-specific returns $\varepsilon_i$ is diversified away when the number of obligors grows. To this end, we decompose the relative loss into a systematic and an obligor-specific component:

\[ \bar{L}_n = \mathbb{E}(\bar{L}_n \mid F) + \varepsilon_n \qquad (12) \]

It is straightforward to show that obligor-specific variance tends to zero as n → ∞, provided the

Herfindahl index, which measures exposure concentration, converges to zero:

\[ \mathrm{Var}(\varepsilon_n) \to 0 \quad\text{if}\quad H_n = \frac{1}{e^2} \sum_{i=1}^{n} e_i^2 \to 0 \qquad (13) \]

The stronger property that $\varepsilon_n$ converges almost surely to zero holds true (see e.g. [8] for a proof). This justifies the following approximation for large portfolios with no severe exposure concentration:

\[ \bar{L}_n \approx \mathbb{E}(\bar{L}_n \mid F) = \frac{1}{e} \sum_{i=1}^{n} e_i\, p_i(F) \qquad (14) \]

equal to $\bar{p}$, the limit $\mathbb{E}(\bar{L}_n \mid F)$ is a so-called probit-normal random variable:

\[ \bar{L}_n \approx p(F) = \Phi\left( \frac{\Phi^{-1}(\bar{p}) - \sqrt{\rho}\, F}{\sqrt{1 - \rho}} \right) \qquad (15) \]

This result goes back to [23, 24]. Studying the properties of the probit-normal distribution reveals that the tail behavior of $\bar{L}_n$ is essentially governed by the correlation; the larger the ρ, the thicker the tail of the distribution of $\bar{L}_n$. Along the same lines one can analyze the tails of credit-migration-based models with non-Gaussian copulas, as was done by Lucas et al. [18].
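The quantities in (13) and (15) are easy to evaluate numerically. The sketch below computes the Herfindahl index of an exposure vector and the quantile of the probit-normal limit; the quantile uses the fact that the limiting loss is a decreasing function of the factor, so a high loss quantile corresponds to a low factor quantile. Function names and parameter values are hypothetical.

```python
from statistics import NormalDist

ND = NormalDist()


def herfindahl(exposures):
    """H_n = sum of squared exposures divided by squared total exposure."""
    total = sum(exposures)
    return sum(e * e for e in exposures) / (total * total)


def large_pool_loss(p_bar, rho, factor):
    """Equation (15): relative loss of an infinitely fine-grained pool given F."""
    return ND.cdf((ND.inv_cdf(p_bar) - rho ** 0.5 * factor) / (1.0 - rho) ** 0.5)


def large_pool_quantile(p_bar, rho, level):
    """Loss quantile at `level`; p(F) decreases in F, so use F's (1 - level) quantile."""
    return large_pool_loss(p_bar, rho, ND.inv_cdf(1.0 - level))


# With equal exposures the Herfindahl index is 1/n and vanishes as n grows.
H_EQUAL_100 = herfindahl([1.0] * 100)
```

Comparing `large_pool_quantile(0.01, 0.1, 0.999)` with `large_pool_quantile(0.01, 0.3, 0.999)` shows the effect described above: at the same default probability, the higher correlation gives a markedly larger loss quantile.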

This means that for n large enough, the relative loss virtually coincides with its systematic component, or in other words, the idiosyncratic risks are diversified away and one is left with systematic risk (caused by F) only. The systematic risk cannot be diversified away other than by hedges that impact $\mathbb{E}(\bar{L}_n \mid F)$.
Despite this positive result, one should be aware that the portfolio R-squared is typically significantly lower than the R-squared in long-only equity portfolios with a similar number of securities. This indicates that in credit portfolios the idiosyncratic risks are comparatively more important than in equity portfolios, or in other words, more securities are necessary to diversify a credit portfolio. This feature is due to the fact that the default correlations $\mathrm{Corr}(D_i, D_{i'})$ are very low (less than 5%) for typical values of asset correlations (10–30%) and default probabilities (note e).
The large pool approximation (14) lies at the core of the capital requirement in the Basel II internal ratings-based (IRB) approach. The IRB approach imposes a constant correlation Gaussian one-factor model, that is, $\psi_i(F) = \sqrt{\rho}\, F$ in (5), and $F \sim N(0, 1)$ is a latent (unobserved) factor. IRB assumes that the portfolio is infinitely fine-grained, that is, it neglects $\varepsilon_n$ and calculates risk directly from $\mathbb{E}(\bar{L}_n \mid F)$. Since the latter is the sum of comonotonic (note f) (or perfectly positive dependent) obligor contributions, portfolio VaR is just the sum of the obligor VaR contributions. For an explanation of the additional multiplicative adjustments for maturities and rules for the choice of $\rho_i$ applied by the IRB approach, we refer to Internal-ratings-based Approach, [7, 12].
If the default probabilities in the constant correlation Gaussian one-factor model are constant and

Model Estimation

Statistical inference poses a big challenge in CreditMetrics and related models. No estimation method that we know of is truly flawless because assumptions that are difficult to verify are always involved. As we have seen, the model (2) is fully determined by the credit migration matrix and the matrix of asset correlations. In what follows, we take the existence of a credit migration matrix for granted (see Rating Transition Matrices) and restrict our discussion to the estimation of asset correlations. See also Portfolio Credit Risk: Statistical Methods for a general discussion of statistical methods for the calibration of credit portfolio models. We distinguish between direct and indirect approaches. The direct approaches start by estimating the exposures and variances in the factor model (4), typically by regressions against certain predefined factors; the asset correlations are then easily derived. In the indirect approaches, asset correlations are inferred from historical default data.
CreditMetrics uses a direct approach. A major difficulty is that firm asset values cannot be directly observed, even for companies that have publicly traded equity. CreditMetrics circumvents this problem by assuming that asset return correlations are proxied by equity return correlations, or in other words, it regards the $R_i$’s in equation (4) as equity returns. The CreditMetrics technical document [13] suggests using MSCI industry and country index returns as explanatory factors. In principle, one could work with any other set of factors. A nice benefit of letting equity returns drive the rating transitions is the

fact that CreditMetrics can be naturally embedded into market risk models of the RiskMetrics (note g) type because the CreditMetrics risk factors are already part of the RiskMetrics factor universe. This allows the aggregation of credit and market risks in a straightforward fashion.
So far we have explained the case of obligors with publicly traded equity. For private firms, the betas are mostly set by resorting to economic arguments. Often one uses the obligor’s country and industry or sector affiliations. Say, a firm has two equally large lines of business, which belong to the US Information Technology and US Consumer Discretionary sectors, respectively. Then the betas with respect to the MSCI US Information Technology and MSCI US Consumer Discretionary sector index returns would both be set to 0.5, and all other betas to zero. Another piece to specify is the variance of the idiosyncratic term in equation (4), or equivalently, the R-squared. Here CreditMetrics stipulates that R-squareds obey $R^2 = 1/(1 + A^{\gamma} \exp(\lambda))$, where A is the book value of the firm’s total assets and γ and λ are fixed parameters estimated from a cross-section of traded stocks; we refer to [14] for a critical appraisal of this method. Alternatively, R-squared can be set by the experienced user.
We mention that KMV uses a direct approach as well, but in contrast to CreditMetrics it reconstructs the asset values from the equity price history using the Merton model framework; see [17] for details about this reconstruction and [16] for a description of the structure of the KMV factor model.
The indirect approaches apply statistical inference to time-series of count data of defaults. These methods are constricted by the sparsity of clean data. Since defaults rarely happen and the time-series are short, one creates groups of obligors and assumes that asset correlations in these groups are constant. There exist several studies following an indirect approach. Bluhm et al. [1] back out asset correlations in rating classes from standard deviations of historical default rates. De Servigny and Renault [22] infer default correlation from observed joint defaults. Demey et al. [4] use maximum-likelihood estimation and reduce the number of parameters by assuming that both asset correlations between and within rating classes are constants. Hamerle and Rösch [15] also apply maximum-likelihood estimation, but advocate the use of lagged macroeconomic variables as additional predictors.

End Notes

a. RiskMetrics and CreditMetrics are registered trademarks of RiskMetrics Group, Inc. and its affiliates.
b. Since the original development of the model, the KMV company was acquired by Moody’s, Inc. and is now part of Moody’s KMV.
c. Moody’s KMV EDF, which we refer to as simply EDF, is a trademark of Moody’s KMV. See www.moodyskmv.com.
d. Equivalently, the loss is determined by the loss given default (LGD) specified for the position.
e. The picture is similar when R-squared is replaced by squared ratios of other risk measures.
f. See e.g., McNeil et al. [19] for formal definition of comonotonicity.
g. See Mina and Xiaao [21] for an introduction to RiskMetrics.

References

[1] Bluhm, C., Overbeck, L. & Wagner, C. (2003). An Introduction to Credit Risk Modeling, John Wiley & Sons.
[2] Crosbie, P. & Bohn, J.R. (2003). Modeling Default Risk. Available at www.moodyskmv.com.
[3] Crouhy, M., Galai, D. & Mark, R. (2000). A comparative analysis of current credit risk models, Journal of Banking and Finance 24, 59–117.
[4] Demey, P., Jouanin, J.-F., Roget, C. & Roncalli, T. (2004). Maximum likelihood estimate of default correlations, Risk, November, 104–108.
[5] Duffie, D. & Singleton, K.J. (2003). Credit Risk: Pricing, Measurement, and Management, Princeton Series in Finance, Princeton University Press.
[6] Finger, C.C. (1999). Conditional approaches for CreditMetrics portfolio distributions, CreditMetrics Monitor April, 14–33.
[7] Finger, C.C. (2001). The one-factor CreditMetrics model in the new Basel Capital Accord, RiskMetrics Journal 9–18.
[8] Frey, R. & McNeil, A.J. (2003). Dependent defaults in models of portfolio credit risk, Journal of Risk 6(1), 59–92.
[9] Frey, R., McNeil, A.J. & Nyfeler, M. (2002). Copulas and credit models, Risk, October, 111–114.
[10] Glasserman, P., Kang, W. & Shahabuddin, P. (2007). Large deviations in multifactor portfolio credit risk, Mathematical Finance 17(3), 345–379.
[11] Gordy, M.B. (2000). A comparative anatomy of credit risk models, Journal of Banking and Finance 24, 119–149.
[12] Gordy, M.B. (2003). A risk-factor model foundation for ratings-based bank capital rules, Journal of Financial Intermediation 12, 199–232.

[13] Gupton, G.M., Finger, C.C. & Bhatia, M. (1997). CreditMetrics–Technical Document, J.P. Morgan & Co. Incorporated.
[14] Hahnenstein, L. (2004). Calibrating the CreditMetrics correlation concept – empirical evidence from Germany, Financial Markets and Portfoliomanagement 18, 358–381.
[15] Hamerle, A. & Rösch, D. (2006). Parameterizing credit risk models, Journal of Credit Risk 2(4), 101–122.
[16] Kealhofer, S. & Bohn, J.R. (2001). Portfolio Management of Default Risk. Available at www.moodyskmv.com
[17] Lando, D. (2004). Credit Risk Modeling, Princeton Series in Finance, Princeton University Press.
[18] Lucas, A., Klaassen, P., Spreij, P. & Straetmans, S. (2001). An analytical approach to credit risk in large corporate bond and loan portfolios, Journal of Banking and Finance 2, 1635–1664.
[19] McNeil, A.J., Frey, R. & Embrechts, P. (2005). Quantitative Risk Management, Princeton Series in Finance, Princeton University Press.
[20] Merton, R.C. (1974). On the pricing of corporate debt: the risk structure of interest rates, Journal of Finance 29, 449–470.
[21] Mina, J. & Xiao, J.Y. (2001). Return to RiskMetrics: The Evolution of a Standard, RiskMetrics Group.
[22] de Servigny, A. & Renault, O. (2003). Correlation evidence, Risk, July, 90–94.
[23] Vasicek, O.A. (1987). Probability of Loss on Loan Portfolio. Available at www.moodyskmv.com
[24] Vasicek, O.A. (2002). Loan portfolio value, Risk, December, 160–162.

Related Articles

Exposure to Default and Loss Given Default; Gaussian Copula Model; Large Pool Approximations; Structural Default Risk Models; Rating Transition Matrices.

DANIEL STRAUMANN & CHRISTOPHER C. FINGER
Structural Default Risk Models

Structural models of default risk for individual firms originate from the seminal work of Merton [25]. Default is linked to the economic fundamentals of the considered firm via the assumption that default occurs if the value of the firm’s assets, modeled as a geometric Brownian motion, falls below some default threshold (the firm’s liabilities) at some future point in time (the maturity of a zero-coupon bond). A significant extension of this methodology was proposed by Black and Cox [7], who continuously test for default. Hence, in their model, the time of default is a first-passage time. Further generalizations address stochastic interest rates, more general assumptions on the default threshold, the definition of the default event, and discontinuous processes as models for the firm’s assets [10, 12, 20–22, 34]. On a high level, these innovations aim at making the model-induced term structure of default probabilities flexible enough to allow for a precise fit of the model to observed bond prices and credit default swap (CDS) spreads.

The growing popularity of derivatives on credit portfolios, for example, collateralized debt obligations (CDOs) and $n$th-to-default baskets, and advanced demands on risk-management solutions produced a need for portfolio models that simultaneously explain the credit quality of multiple firms. Since corporate defaults in a globalized economy are not independent, a multivariate default model has to explain univariate default probabilities and the dependence among the default events. A natural assumption for a multivariate structural-default model is to introduce dependence by assuming correlated asset values, leading to dependent default events. Zhou [33] motivates this approach by the observation: “The fortunes of individual companies are linked together via industry-specific and/or general economic conditions.” The first portfolio model of this class was formulated by Vasicek [31] and can be classified as a multivariate generalization of the work by Merton [25]. This model is discussed in some detail in the section “The Model of Vasicek”, as it constitutes the basis for most of today’s generalizations and is used to assess the regulatory capital for loan portfolios within the Basel II framework.

A major advantage of (multivariate) structural-default models is the appealing economic interpretation of the definition of default. Additionally, comovements of the individual firm-value processes might also be interpreted as being the result of common risk factors. Moreover, modeling the evolution of the firms’ values as some multivariate stochastic process naturally implies a dynamic model, which is highly desirable in risk-management and pricing applications. The downside of this class of models is the mathematical challenge of computing the portfolio-loss distribution or even bivariate default correlations. Hence, most of the proposed models rely on simplifying assumptions or can be solved only via a Monte Carlo simulation.

Merton-type Models

The Model of Vasicek

In his short memo [31], Vasicek considers a portfolio of $n$ loans with unit nominal and maturity $T$. Each individual firm-value process is modeled as a geometric Brownian motion defined by the stochastic differential equation:

$$dV_t^i = V_t^i\left(\gamma^i\,dt + \sigma^i\,dW_t^i\right), \quad V_0^i > 0,\ i \in \{1, \dots, n\} \tag{1}$$

The first simplification, often referred to as the homogeneous portfolio assumption, is to assume identical default probabilities for all firms. In the current setup, this assumption corresponds to identical parameters $V_0 \equiv V_0^i$, $\sigma \equiv \sigma^i$, and $\gamma \equiv \gamma^i$. Moreover, an identical correlation across all bivariate pairs of Brownian motions $W^i$ and $W^j$ is assumed. Using Itô’s formula and replacing the growth rate $\gamma$ by the risk-free interest rate $r$, we find

$$V_T^i \overset{d}{=} V_0^i \exp\left((r - 0.5\,\sigma^2)T + \sigma\sqrt{T}\,X^i\right) \tag{2}$$

where $X^i := W_T^i/\sqrt{T}$ follows a standard normal distribution. Given some default threshold $d_T \equiv d_T^i$, one can immediately compute the probability of default at time $T$, since the distribution of the firm value at time $T$ is known explicitly. Moreover, since default can only happen at maturity, only the distribution of $V_T^i$ is of importance and not the dynamic model leading to it. By scaling the

original default threshold, default can alternatively be expressed in terms of the standard normally distributed variable $X^i$. More precisely, assuming the default probability of firm $i$ at time $T$ is given by $p^i$, the default threshold with respect to $X^i$ is $K^i = \Phi^{-1}(p^i)$, where $\Phi^{-1}$ is the quantile function of the standard normal distribution.

To incorporate correlation among the companies, one explains $X^i$ by a common market factor $M$ and an idiosyncratic risk factor $\epsilon^i$, that is,

$$X^i \overset{d}{=} \sqrt{\rho}\,M + \sqrt{1-\rho}\,\epsilon^i, \quad \rho \in (0, 1) \tag{3}$$

where $M, \{\epsilon^i\}_{i=1}^n$ are independent standard normally distributed random variables. Consequently, $\mathrm{Cor}(X^i, X^j) = \rho$ for $i \neq j$, and each $X^i$ is again distributed according to the standard normal law. By conditioning on the common market factor $M$, the firms’ values and default events are independent. The result is the so-called conditionally independent model. We denote this conditional default probability by $p^i(M)$ and obtain

$$p^i(m) = \mathbb{P}(X^i < K^i \mid M = m) = \Phi\left(\frac{K^i - \sqrt{\rho}\,m}{\sqrt{1-\rho}}\right) \tag{4}$$

Furthermore, all companies are assumed to have identical default probabilities $p \equiv p^i$. Now $\Omega$ is defined as the random variable that describes the fraction of defaults in the portfolio up to time $T$. The distribution of $\Omega$ depends on two parameters: the individual default probability $p$ and the correlation $\rho$. In what follows, this distribution is denoted by $F^{(n)}_{p,\rho}(x) = \mathbb{P}(\Omega \le x)$. It is crucial that firms are independent given $M$, since the probability that exactly $k$ firms default can be derived by integrating out the market factor:

$$\mathbb{P}(n\Omega = k) = \int_{-\infty}^{\infty} \mathbb{P}(n\Omega = k \mid M = m)\,\varphi(m)\,dm \tag{5}$$

$$= \int_{-\infty}^{\infty} \binom{n}{k}\, p(m)^k\,(1 - p(m))^{n-k}\,\varphi(m)\,dm \tag{6}$$

where $\varphi$ is the density of a standard normal distribution and $k \in \{0, \dots, n\}$. For large portfolios, evaluating the binomial coefficient is numerically critical and can be avoided by applying the law of large numbers; this approach is called large portfolio approximation. The key observation is that $\mathbb{P}(\Omega \le x \mid M = m) \to 1_{\{p(m) \le x\}}$ for $n \to \infty$. A straightforward calculation, see, for example, [29] for details, establishes

$$F^{(n)}_{p,\rho}(x) \to F^{\infty}_{p,\rho}(x) := \Phi\left(\frac{\sqrt{1-\rho}\,\Phi^{-1}(x) - \Phi^{-1}(p)}{\sqrt{\rho}}\right), \quad (n \to \infty) \tag{7}$$

This approximation is continuous and strictly increasing in $x$. As it further maps the unit interval onto itself, it is a distribution function, too. It is also worth mentioning that the quality of this approximation is typically good; see [29] for a discussion.

The IRB Approach in Basel II

Vasicek’s asymptotic loss distribution or, more precisely, its quantile function

$$K_{p,\rho}(y) := \Phi\left(\frac{\Phi^{-1}(y)\sqrt{\rho} + \Phi^{-1}(p)}{\sqrt{1-\rho}}\right) \tag{8}$$

plays a major role in today’s regulatory world. The core of the first pillar of the Basel II accord [4] is the internal rating-based (IRB) approach for calculating capital requirements for loan portfolios. Within this framework, banks classify their loans by asset class and credit quality into homogeneous buckets and use their own internal rating systems to estimate risk characteristics such as loss-given default (LGD), the expected exposure at default (EAD), and the one-year default probability (PD), that is, $PD = p_1$. It is worth mentioning that estimating LGD and PD independently contradicts the empirical observation that recovery rates and default rates are inversely related; see, for example, [2] and [11]. Still, banks are free to choose a certain internal rating system, as long as they can demonstrate its accuracy and meet certain data requirements. In the second step, these credit characteristics are used in the IRB formula to assess the minimum capital requirements for the unexpected loss via the factor

$$K_{IRB} = LGD \cdot \left(K_{PD,\rho}(99.9\%) - PD\right) \cdot MA \tag{9}$$
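The formulas of this section can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function names are hypothetical, the integral in equation (6) is approximated with a simple midpoint rule over a truncated range of the market factor (a choice made here for brevity), and the maturity adjustment `ma` of equation (9) is passed in as a plain number with a default of 1.0.

```python
from math import comb, floor, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf, inv_cdf, pdf


def conditional_pd(p: float, rho: float, m: float) -> float:
    """Conditional default probability p(m) of equation (4)."""
    k = STD_NORMAL.inv_cdf(p)  # default threshold K = Phi^{-1}(p)
    return STD_NORMAL.cdf((k - sqrt(rho) * m) / sqrt(1.0 - rho))


def finite_portfolio_cdf(x: float, n: int, p: float, rho: float,
                         steps: int = 400, m_max: float = 8.0) -> float:
    """P(fraction of defaults <= x) for n firms, summing equation (6) over k.

    Midpoint rule on [-m_max, m_max]; only suitable for moderate n, because
    the binomial coefficient grows too large to convert to a float.
    """
    k_max = floor(x * n + 1e-9)
    h = 2.0 * m_max / steps
    total = 0.0
    for j in range(steps):
        m = -m_max + (j + 0.5) * h
        pm = conditional_pd(p, rho, m)
        binom_cdf = sum(comb(n, k) * pm**k * (1.0 - pm)**(n - k)
                        for k in range(k_max + 1))
        total += binom_cdf * STD_NORMAL.pdf(m) * h
    return total


def large_portfolio_cdf(x: float, p: float, rho: float) -> float:
    """Large portfolio approximation of equation (7)."""
    z = (sqrt(1.0 - rho) * STD_NORMAL.inv_cdf(x) - STD_NORMAL.inv_cdf(p))
    return STD_NORMAL.cdf(z / sqrt(rho))


def large_portfolio_quantile(y: float, p: float, rho: float) -> float:
    """Quantile function of equation (8)."""
    z = (sqrt(rho) * STD_NORMAL.inv_cdf(y) + STD_NORMAL.inv_cdf(p))
    return STD_NORMAL.cdf(z / sqrt(1.0 - rho))


def irb_capital_factor(lgd: float, pd: float, rho: float,
                       ma: float = 1.0) -> float:
    """Capital factor of equation (9); ma is supplied by the caller."""
    return lgd * (large_portfolio_quantile(0.999, pd, rho) - pd) * ma


if __name__ == "__main__":
    print(finite_portfolio_cdf(0.10, 200, 0.05, 0.2))
    print(large_portfolio_cdf(0.10, 0.05, 0.2))
    print(irb_capital_factor(lgd=0.45, pd=0.01, rho=0.2))
```

Comparing the first two printed values for a portfolio of 200 firms gives a rough impression of how close the finite-portfolio distribution already is to its limit; the parameter values are arbitrary examples.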

The risk-weighted assets (RWA) are then obtained by

$$RWA = \frac{K_{IRB} \cdot EAD}{0.08} = 12.5 \cdot K_{IRB} \cdot EAD \tag{10}$$

where 0.08 corresponds to the 8% minimum capital ratio. The very conservative one-year 99.9%-quantile in equation (9) is part of the Basel II accord and might be interpreted as some cushion regarding the underlying simplifications in Vasicek’s model. The factor $MA$ is the maturity adjustment and is calculated via (some exceptions apply)

$$MA = \frac{1 + (M - 2.5) \cdot b(PD)}{1 - 1.5 \cdot b(PD)}, \quad M = \min\left\{\frac{\sum_t t \cdot CF_t}{\sum_t CF_t},\ 5\right\} \tag{11}$$

and $b(PD) = (0.11852 - 0.05478 \cdot \log PD)^2$, where $CF_t$ denotes the expected cash-flow at time $t$. $M$ accounts for the fact that loans with longer (shorter) maturity than one year require a higher (lower) capital charge. Finally, the crucial correlation parameter needs to be specified. Basel II uses a convex combination between some lower $\rho^l$ and upper $\rho^u$ correlation whose weights depend on the default probability of the respective loan, that is,

$$\rho = \rho^l\,a(PD) + \rho^u\,(1 - a(PD)), \quad a(x) = \frac{1 - e^{-50x}}{1 - e^{-50}} \tag{12}$$

For corporate credits, the correlation-adjustment factor

$$SM_{ad}(S) = -0.04 \cdot \left(1 - \frac{\max\{5, S\} - 5}{45}\right) \times 1_{\{S \le 50\}} \tag{13}$$

is added to $\rho$ for borrowers with reported annual sales $S \le 50$, measured in millions of Euros. The specific form of $a(x)$ and the adjustment factor $SM_{ad}(S)$ being negative stem from the empirical observation [23] that large firms that bear more systemic risk are more correlated compared to small firms that are more likely to default due to idiosyncratic reasons. $(\rho^l, \rho^u)$ depend on the type of loan and are specified as (0.12, 0.24) for sovereign, corporate, and bank loans; (0.12, 0.30) for highly volatile commercial real estate loans; $\rho^l = \rho^u = 0.15$ for residential mortgages; $\rho^l = \rho^u = 0.04$ for revolving retail loans such as credit cards; and finally (0.03, 0.16) for other retail exposures, where in this case the weight function $a(x)$ is computed with exponents −35 instead of −50.

The IRB approach is sometimes criticized for the strong assumptions that are required to derive Vasicek’s distribution. However, one should recognize the IRB approach as a compromise that provides a common language for regulators, banks, and investors to communicate and establishes comparable risk estimates across banks. The IRB formula is discussed in depth in [5, 30].

Generalizations Using Other Distributions

It is well known that the model [31] does not yield a satisfactory fit to market quotes of tranches of CDOs. More precisely, an implied correlation smile is present when the model is inverted for the correlation parameter tranche by tranche. Especially tail events with multiple defaults are underrepresented in a Gaussian world, making a precise fit to senior tranches of a CDO impossible. To overcome this shortcoming, a natural assumption is to give up normality in equation (3) and consider other heavier tailed distributions. For the derivation leading to $F^{\infty}_{p,\rho}$ in equation (7), the stability of the normal distribution under convolutions is essential in equation (3). Hence, natural choices for generalizations are other infinitely divisible distributions, which are connected to Lévy processes; see, for example, [8]. These generalizations add flexibility to the model and can additionally imply a dependence structure with tail dependence, making multiple defaults more likely. Specific models in this spirit include, for example, the NIG model of Kalemanova et al. [17], the VG model of Moosbrucker [27], and the BVG model of Baxter [6]. Following [1], we now derive a large homogeneous portfolio approximation in a general Lévy framework.

Let $X = \{X_t\}_{t \in [0,1]}$ be a Lévy process (see Lévy Processes) with $X_1 \sim H_1$ for some infinitely divisible distribution $H_1$. Assume $X_1$ to be standardized to zero mean and unit variance. Given a correlation

$\rho \in (0, 1)$, define, in analogy to equation (3), for independent copies $\{X^i\}_{i=1}^n$ of $X$ the random variables $V^i$ by

$$V^i := X_{\rho} + X^i_{1-\rho}, \quad i \in \{1, \dots, n\} \tag{14}$$

Here, the common market factor is represented by $X_{\rho}$, and the idiosyncratic risk of firm $i$ is captured in $X^i_{1-\rho}$. Using the Lévy properties of $X$, each $V^i$ is again distributed according to $H_1$ and $\mathrm{Cor}(V^i, V^j) = \rho$ for $i \neq j$. In what follows, we denote by $H_t^{-1}$ the inverse of the distribution function of $X_t$. The homogeneous portfolio assumption in the present setup translates to identical univariate default probabilities up to time $T$, abbreviated as $p \equiv p^i$, identical threshold levels $K_T = H_1^{-1}(p) \equiv K_T^i$, and unit notional of each firm. The probability of exactly $k$ defaults in the portfolio is then again obtained as

$$\mathbb{P}(n\Omega = k) = \int_{-\infty}^{\infty} \mathbb{P}(n\Omega = k \mid X_{\rho} = m)\,dH_{\rho}(m), \quad k \in \{0, \dots, n\} \tag{15}$$

Similar to Vasicek’s model, the conditional distribution of the number of defaults given $X_{\rho} = m$ is a binomial distribution with $n$ trials and success probability $p(m) = \mathbb{P}(V^i \le K_T \mid X_{\rho} = m) = H_{1-\rho}(K_T - m)$. The large portfolio assumption, that is, letting the number of firms $n$ tend to infinity, then gives

$$F^{\infty}_{p,\rho}(x) = 1 - H_{\rho}\left(H_1^{-1}(p) - H_{1-\rho}^{-1}(x)\right) \tag{16}$$

as distribution function of the fractional loss in an infinite granular portfolio; see [1] for a complete proof. Let us finally remark that evaluating $H_t$ and $H_t^{-1}$ requires numerical routines for most choices of $X_1 \sim H_1$.

The Model of Willemann

The starting point for Willemann [32] is the univariate jump-diffusion model of Zhou [34]. This model assumes a discontinuous firm-value process of the form

$$dV_t = V_t\left((\gamma - \lambda\nu)\,dt + \sigma\,dW_t + (\Pi - 1)\,dN_t\right), \quad V_0 > 0 \tag{17}$$

where $N_t$ is a Poisson process with intensity $\lambda > 0$ and the jumps $\Pi$ are log-normally distributed with expected jump size $\nu = \mathbb{E}[\Pi - 1]$. The advantage of supporting negative jumps on a univariate level is that default events are no longer predictable, which translates to positive short-term credit spreads. Willemann [32] incorporates dependence to the individual firm-value processes by the classical decomposition of each Brownian motion into a market factor and an idiosyncratic component. Moreover, it is assumed that all firm-value processes jump together, that is, all processes are driven by the same Poisson process $N_t$. Consequently, this construction allows for two layers of correlation: diffusion and jump correlation; the latter being the main innovation of this setup.

The default threshold of firm $i$ is set to $K_t^i = e^{-\phi^i t} K_0^i$ for some positive constants $\phi^i$ and $K_0^i$. This declining form is chosen to increase short-term spreads, but might also imply that the fit to individual CDS gets worse as time increases. To achieve semianalytical results for the portfolio-loss distribution, default is tested on a grid. The advantage of this simplification is that only the distribution of each firm-value process at the grid points is required, instead of functionals such as $\inf_{s \in [0,t]} V_s$. Individual default probabilities up to time $t$ can then be computed conditional on the number of jumps up to time $t$, which is a Poisson-distributed random variable. Since the specific choice of jump-size distribution is compatible with the Brownian motion of the model, this leads to an infinite sum of normally distributed random variables. Moreover, all default events are independent conditional on the market factor and the number of jumps. Hence, the portfolio-loss distribution can be found by integrating out these common factors and using a recursion technique similar to [3, 16]. Willemann [32] demonstrates quite successfully how the model is simultaneously fitted (in seconds) to individual CDS spreads and the tranches of a CDO.

A Remark on Asset and Default Correlation

Modeling asset values as correlated stochastic processes introduces dependence to the resulting default times. Still, this relation is not trivial and deserves some caution, especially when it comes to estimating the model’s asset-correlation parameter. We follow [24] in defining the default correlation of two firms (up to time $t$) as

$$\rho_t^D := \mathrm{Cor}\left(1_{\{\tau^1 \le t\}}, 1_{\{\tau^2 \le t\}}\right) = \frac{\mathbb{P}(P_t^1, P_t^2) - \mathbb{P}(P_t^1)\,\mathbb{P}(P_t^2)}{\sqrt{\mathbb{P}(P_t^1)\,(1 - \mathbb{P}(P_t^1))}\,\sqrt{\mathbb{P}(P_t^2)\,(1 - \mathbb{P}(P_t^2))}} \tag{18}$$

where $P_t^i := \{\tau^i \le t\}$, $i \in \{1, 2\}$. Most structural-default models share the commonality that evaluating $\mathbb{P}(P_t^1, P_t^2)$, the probability of a joint default of both firms up to time $t$, is quite difficult; an exception being the case of two companies with Gaussian factors coupled as described in equation (3). This example is, therefore, used to illustrate the nonlinear relation of asset and default correlation. A joint default in this setup corresponds to a simultaneous drop of both factors $X^1$ and $X^2$ below their respective default threshold $K^i = \Phi^{-1}(p_t^i)$, $i \in \{1, 2\}$. Since the vector $(X^1, X^2)$ follows a two-dimensional normal distribution with mean vector $(0, 0)$ and the asset correlation $\rho$ as correlation parameter, we obtain

$$\mathbb{P}(P_t^1, P_t^2) = \Phi_2(K^1, K^2; \rho) \tag{19}$$

which is used to produce Figure 1. This example illustrates that small asset correlations induce only a negligible default correlation.

Being able to convert default to asset correlations (and vice versa) opens the possibility of estimating the model’s asset correlation using historical default correlations (and vice versa); see, for example, [14]. This approach is relevant since asset values are not directly observable, making an estimation of asset correlations delicate. It is an ongoing debate whether indirectly observed changes in asset values, computed from changes in the respective firm’s equity, or observed defaults are the better source of data for the estimation of the model’s correlation parameter. In both cases, pointing out the respective limitations is much simpler than providing theoretical evidence for the methodology. Empirically estimating default correlations (based on groups of firms with similar characteristics) requires a large set of observations, since corporate defaults are rare events. This makes the approach vulnerable to structural changes such as new bankruptcy rules. On the other hand, daily equity prices are readily available for most firms. When this latter source of data is used, the difficulty lies in transforming equity to asset returns, see, for example, [9], from which the correlation might be estimated. In addition, one should be aware that equity prices might change for reasons that are not related to credit risk.
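The mapping from asset correlation to default correlation in equations (18) and (19) can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, and instead of calling a bivariate normal library routine, the joint default probability is obtained by integrating out the common factor of equation (3) with a midpoint rule, which is valid for asset correlations in [0, 1] only.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf, inv_cdf, pdf


def joint_default_prob(p1: float, p2: float, rho: float,
                       steps: int = 2000, m_max: float = 8.0) -> float:
    """Joint default probability of equation (19) for rho in [0, 1].

    Uses the one-factor representation of equation (3): conditional on the
    market factor m, the two defaults are independent, so the joint
    probability is the integral of the product of conditional probabilities.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("this sketch only covers rho in [0, 1]")
    if rho == 0.0:
        return p1 * p2          # independent factors
    if rho == 1.0:
        return min(p1, p2)      # identical factors
    k1, k2 = STD_NORMAL.inv_cdf(p1), STD_NORMAL.inv_cdf(p2)
    h = 2.0 * m_max / steps
    total = 0.0
    for j in range(steps):
        m = -m_max + (j + 0.5) * h
        c1 = STD_NORMAL.cdf((k1 - sqrt(rho) * m) / sqrt(1.0 - rho))
        c2 = STD_NORMAL.cdf((k2 - sqrt(rho) * m) / sqrt(1.0 - rho))
        total += c1 * c2 * STD_NORMAL.pdf(m) * h
    return total


def default_correlation(p1: float, p2: float, rho: float) -> float:
    """Default correlation of equation (18) implied by asset correlation rho."""
    joint = joint_default_prob(p1, p2, rho)
    return (joint - p1 * p2) / sqrt(p1 * (1.0 - p1) * p2 * (1.0 - p2))


if __name__ == "__main__":
    # Same setting as Figure 1: both default probabilities equal to 0.05.
    for rho in (0.0, 0.1, 0.2, 0.4, 0.6, 0.8, 1.0):
        print(rho, default_correlation(0.05, 0.05, rho))
```

Evaluating the function on a grid of asset correlations yields a curve of the kind described for Figure 1: the default correlation stays well below the asset correlation for small values of the latter and rises steeply only as the asset correlation approaches one.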
[Figure 1: plot of default correlation (vertical axis, 0.0 to 1.0) against asset correlation (horizontal axis, 0.0 to 1.0)]

Figure 1 Default correlation $\rho_t^D$ as a function of asset correlation $\rho \in [0, 1]$, with $\mathbb{P}(P_t^1) = \mathbb{P}(P_t^2) = 0.05$ and $t = 1$

First-passage Time Models

The starting point for most multivariate first-passage time models is equation (1). Compared to models in the spirit of the work by Merton [25], the time of default is now defined as suggested in [7], that is,

$$\tau^i := \inf\{t \ge 0 : V_t^i \le d_t^i\}, \quad i \in \{1, \dots, n\} \tag{20}$$

where $d_t^i$ is the default threshold of firm $i$ at time $t$. From a modeling perspective, this definition overcomes the unrealistic assumption of default being restricted to maturity. This observation is even more crucial in a portfolio environment when bonds with different maturities are monitored simultaneously. More precisely, a first-passage model naturally induces a dynamic model for the default correlation (since the firm-value processes evolve dynamically over time) and allows the computation of consistent default correlations over any time horizon.

However, the main drawback of this model class is its computational intractability. This stems from

the fact that the joint distribution of the minima of several firm-value processes is required, which is already a challenging problem for univariate marginals. The following section collects models where analytical results or numerical routines are available to overcome this problem.

The Model of Zhou

Zhou [33] studies a portfolio of two firms whose asset-value processes are modeled as in equation (1) with correlated Brownian motions. The default thresholds are assumed to be exponential, that is, $d_t^i = e^{\lambda^i t} K^i$ for $i \in \{1, 2\}$. The degree of dependence of both firms is measured in terms of their default correlation up to time $t$, that is, as $\mathrm{Cor}(1_{\{\tau^1 \le t\}}, 1_{\{\tau^2 \le t\}})$. The key observation is that results of Rebholz [28] can be applied to give an analytical representation of the default correlation in terms of an infinite sum of indefinite integrals over modified Bessel functions. Sensitivity analysis of the model parameters indicates that the model-induced default correlations for short maturities are close to zero. This observation needs to be considered when portfolio derivatives with short maturities are priced within such a framework.

The Model of Giesecke

Giesecke [13] considers a portfolio of $n$ firms whose value processes evolve according to some vector-valued stochastic process $(V^1, \dots, V^n)$, where default is again defined as in equation (20). The key innovation is to replace the vector of default thresholds by an initially unobservable random vector $(d^1, \dots, d^n)$ whose dependence structure is represented by some copula. It is shown that the model-induced copula of default times is a function of the copula of default thresholds and the copula of the vector of historical lows of the firm-value processes. On a univariate level, the assumption of an unobservable random threshold overcomes the predictability of individual defaults, which is responsible for vanishing credit spreads for short maturities; see [10] for a related model. Short-term spreads [13] are positive as long as the respective firm-value process is close to its historical low. The consequence of this construction on a portfolio level is also remarkable. Observing a corporate default $\tau^i$ reveals the respective default threshold $d^i$ to all investors. This piece of information allows investors to update their knowledge of all other default thresholds, leading to contagious jumps in credit spreads of the remaining firms. Giesecke [13] also presents an explicit example of two firms with independent value processes modeled as geometric Brownian motions and default thresholds coupled via a Clayton copula. While this simplified example illustrates the desired contagion effect of the model, it also highlights the challenge of finding analytic results in a realistic framework.

Models Relying on Monte Carlo Simulations

This section briefly presents two first-passage time models that rely on Monte Carlo simulations for the pricing of CDOs.

In the model of Hull et al. [15], the $n$ firm-value processes are defined as in equation (1); the model can therefore be considered as a generalization of Zhou’s [33] bivariate model to larger portfolios. The default thresholds are rewritten in terms of the driving Brownian motions. Asset correlation is introduced by $n_F$ risk factors, that is, the Brownian motion of firm $i$ is replaced by

$$dW_t^i := \sum_{j=1}^{n_F} \alpha_{i,j}\,dF_t^j + \left(1 - \sum_{j=1}^{n_F} \alpha_{i,j}^2\right)^{1/2} dU_t^i, \quad i \in \{1, \dots, n\} \tag{21}$$

where $\alpha_{i,j}$ is the sensitivity of firm $i$ to changes of the risk factor $F^j$ and $U^i$ is the idiosyncratic risk of this firm. All processes $F^j$ and $U^i$ are independent Brownian motions. Hull et al. [15] also consider extensions to stochastic correlations, stochastic recovery rates, and stochastic volatilities and compare these in terms of their fitting capability to CDO tranches. An interesting conclusion that also applies to similar first-passage time models is drawn when the model is compared to a copula model. It is argued that the default environment in a copula model is static for the whole life of the model, while the dynamic nature of equation (21) allows for bad default environments in one year, followed by good environments later. Hence, the use of one or more common risk factors implies a sound economic model for cyclical correlation.

Kiesel and Scherer [18] present another multivariate extension of the work by Zhou [34]. They model the firm-value process of the company $i$ as

the exponential of a jump-diffusion process with two-sided exponentially distributed jumps $Y_{ij}$, that is,

$$V_t^i = V_0^i \exp(X_t^i), \quad X_t^i = \gamma^i t + \sigma^i W_t^i + \sum_{j=1}^{N_t(b^i)} Y_{ij}, \quad V_0^i > 0 \tag{22}$$

where the Brownian motions of different firms are again correlated via a factor decomposition. The novelty in their approach is the use of a Poisson process $N_t$ as ticker for jumps in the market that is thinned out with probability $(1 - b^i)$ to induce jumps in $V^i$. Consequently, some but not necessarily all firms jump (and possibly default) together. As a result of common jumps, the model allows for default clusters that extend the cyclical correlation induced by common continuous factors. For this choice of jump distribution, the marginals of the model can be calibrated to CDS quotes using the Laplace transform of first-passage times of $X^i$, which is derived in [19]. The multivariate model is solved via a Brownian-bridge Monte Carlo simulation in the spirit of the work by Metwally and Atiya [26].

Conclusion

Structural-default models allow for an appealing interpretation of corporate default: companies operate as long as they have sufficient assets. A clear economic interpretation also holds for the way dependence is introduced to a portfolio of companies: comovements of the firm-value processes might be seen as the result of common risk factors, to which economic interpretations might also apply. This rationale can also be used to empirically estimate the correlation structure of the model from market data. Summarizing, the dependence structure and univariate marginals are simultaneously explained. Moreover, since each company is modeled explicitly, the so-called bottom-up approach, it is also possible to price portfolio derivatives and individual risk consistently; a major advantage over top-down models that purely focus on the portfolio-loss process. In addition, the current asset level might be mapped to some credit rating, implying a dynamic model of rating changes including default. Finally, the dynamic nature of the modeled firm-value processes translates to a dynamic model for the default correlation and the portfolio-loss process; a desired property in risk-management solutions and for the pricing of (exotic) credit-portfolio derivatives.

The downside of multivariate structural-default models lies in the difficulty of translating the model to analytical formulas for default correlations and the portfolio-loss distribution. This becomes especially apparent when the simplifying assumption in [31] and its generalizations are reconsidered; the bottom-up nature of structural-default models is entirely given up in order to compute the portfolio-loss distribution in closed form. The price to pay for a more realistic framework typically is a Monte Carlo simulation. However, if such a simulation is efficiently implemented, a realistic dynamic model for a portfolio of credit-risky assets is available.

Acknowledgments

Research support by Daniela Neykova, Technische Universität München, is gratefully acknowledged.

References

[1] Albrecher, H., Ladoucette, S. & Schoutens, W. (2007). A generic one-factor Lévy model for pricing synthetic CDOs, in Advances in Mathematical Finance, M.C. Fu, R.A. Jarrow, J.J. Yen & R.J. Elliott, eds, Birkhaeuser.
[2] Altman, E., Resti, A. & Sironi, A. (2004). Default recovery rates in credit risk modeling: a review of the literature and empirical evidence, Economic Notes 33(2), 183–208.
[3] Andersen, L. & Sidenius, J. (2004). Extensions of the Gaussian copula: random recovery and random factor loadings, Journal of Credit Risk 1(1), 29–70.
[4] Basel Committee on Banking Supervision (2004). International Convergence of Capital Measurement and Capital Standards–A Revised Framework, retrieved from http://www.bis.org/publ/bcbs107.pdf.
[5] Basel Committee on Banking Supervision (2005). An Explanatory Note on the Basel II IRB Risk Weight Functions, retrieved from http://www.bis.org/bcbs/irbriskweight.pdf.
[6] Baxter, M. (2006). Dynamic Modelling of Single-name Credits and CDO Tranches. Working paper, Nomura Fixed Income Quant Group.
[7] Black, F. & Cox, J. (1976). Valuing corporate securities: some effects of bond indenture provisions, Journal of Finance 31(2), 351–367.
[8] Cont, R. & Tankov, P. (2004). Financial Modelling with Jump Processes, Financial Mathematics Series, Chapman and Hall/CRC.

[9] Crosbie, P. & Bohn, J. Modeling Default Risk, KMV Corporation, retrieved from http://www.moodyskmv.com/research/files/wp/ModelingDefaultRisk.pdf.
[10] Duffie, D. & Lando, D. (2001). The term structure of credit spreads with incomplete accounting information, Econometrica 69, 633–664.
[11] Frye, J. (2000). Depressing recoveries, Risk 13(11), 106–111.
[12] Geske, R. (1977). The valuation of corporate liabilities as compound options, Journal of Financial and Quantitative Analysis 12(4), 541–552.
[13] Giesecke, K. (2004). Correlated default with incomplete information, Journal of Banking and Finance 28(7), 1521–1545.
[14] Gordy, M. (2000). A comparative anatomy of credit risk models, Journal of Banking and Finance 24(1), 119–149.
[15] Hull, J., Predescu, M. & White, A. (2005). The Valuation of Correlation-dependent Credit Derivatives using a Structural Model. Working paper, retrieved from http://www.rotman.utoronto.ca/hull/DownloadablePublications/StructuralModel.pdf
[16] Hull, J. & White, A. (2004). Valuation of a CDO and an n-th to default CDS without a Monte Carlo simulation, Journal of Derivatives 12(2), 8–23.
[17] Kalemanova, A., Schmid, B. & Werner, R. (2007). The normal inverse Gaussian distribution for synthetic CDO pricing, Journal of Derivatives 14(3), 80–93.
[18] Kiesel, R. & Scherer, M. (2007). Dynamic Credit Portfolio Modelling in Structural Models with Jumps. Working paper, retrieved from http://www.uni-ulm.de/fileadmin/website uni ulm/mawi.inst.050/people/kiesel/publications/Kiesel Scherer Dec07.pdf.
[19] Kou, S. & Wang, H. (2003). First passage times of a jump diffusion process, Advances in Applied Probability 35, 504–531.
[20] Leland, H. (1994). Corporate debt value, bond covenants, and optimal capital structure, Journal of Finance 49(4), 1213–1252.
[21] Leland, H. & Toft, K. (1996). Optimal capital structure, endogenous bankruptcy, and the term structure of credit spreads, Journal of Finance 51(3), 987–1019.
[22] Longstaff, F. & Schwartz, E. (1995). A simple approach to valuing risky fixed and floating rate debt, Journal of Finance 50(3), 789–819.
[23] Lopez, J. (2004). The empirical relationship between average asset correlation, firm probability of default, and asset size, Journal of Financial Intermediation 13(2), 265–283.
[24] Lucas, D. (1995). Default correlation and credit analysis, Journal of Fixed Income 4(4), 76–87.
[25] Merton, R. (1974). On the pricing of corporate debt: the risk structure of interest rates, Journal of Finance 29, 449–470. Reprinted as Chapter 12 in Merton, R. (1990) Continuous-time Finance, Blackwell.
[26] Metwally, S. & Atiya, A. (2002). Using Brownian bridge for fast simulation of jump-diffusion processes and barrier options, The Journal of Derivatives 10(1), 43–54.
[27] Moosbrucker, T. (2006). Pricing CDOs with Correlated Variance Gamma Distributions. Research report, Department of Banking, University of Cologne.
[28] Rebholz, J. (1994). Planar Diffusions with Applications to Mathematical Finance, PhD thesis, University of California, Berkeley.
[29] Schönbucher, P. (2003). Credit Derivatives Pricing Models: Models, Pricing, Implementation, Wiley Finance.
[30] Thomas, H. & Wang, Z. (2005). Interpreting the internal ratings-based capital requirements in Basel II, Journal of Banking Regulation 6, 274–289.
[31] Vasicek, O. (1987). Probability of Loss on Loan Portfolio, KMV Corporation, retrieved from http://www.moodyskmv.com/research/whitepaper/Probability of Loss on Loan Portfolio.pdf
[32] Willemann, S. (2007). Fitting the CDO correlation skew: a tractable structural jump-diffusion model, The Journal of Credit Risk 3(1), 63–90.
[33] Zhou, C. (2001). An analysis of default correlations and multiple defaults, Review of Financial Studies 14, 555–576.
[34] Zhou, C. (2001). The term structure of credit spreads with jump risk, Journal of Banking and Finance 25, 2015–2040.

Further Reading

Lipton, A. (2002). Assets with jumps, Risk 15(9), 149–153.
Lipton, A. & Sepp, A. (2009). Credit value adjustment for credit default swaps via the structural default model, The Journal of Credit Risk 5(2), 125.

Related Articles

Default Barrier Models; Modeling Correlation of Structured Instruments in a Portfolio Setting; Gaussian Copula Model; Internal-ratings-based Approach; Reduced Form Credit Risk Models.

RÜDIGER KIESEL & MATTHIAS A. SCHERER

CreditRisk+

CreditRisk+ is a portfolio credit risk model developed by the bank Credit Suisse, who published the methodology in 1997 [2].

A portfolio credit risk model is a means of estimating the statistical distribution of the aggregate loss from defaults in a portfolio of loans or other credit-risky instruments over a period of time. More generally, changes in credit quality other than default can be considered, but CreditRisk+ in its original form is focused only on default. The most widely used portfolio credit risk models are undoubtedly the so-called structural models, including models based on the Gaussian copula framework (see Structural Default Risk Models). CreditRisk+ performs its calculation in a different way to these models, but it is recognized that CreditRisk+ and Gaussian copula models have a similar conceptual basis. A detailed discussion can be found in [4, 7].

Financial institutions use portfolio credit risk models to estimate aggregate credit losses at high percentiles, corresponding to very bad outcomes (often known as the tail of the loss distribution). These estimates are then used in setting and allocating economic capital (see Economic Capital) and determining portfolio performance measures such as risk-adjusted return on capital (see Risk-adjusted Return on Capital (RAROC)).

Portfolio credit risk models have two elements. The first is a set of statistical assumptions about the effect of economic influences on the likelihood of individual borrowers defaulting, and about how much the individual losses might be when they default. The second element is an algorithm for calculating the resulting loss distribution under these assumptions for a specific portfolio. Unlike most portfolio credit risk models, CreditRisk+ calculates the loss distribution using a numerical technique that avoids Monte Carlo simulation. The other distinction of CreditRisk+ is that it was presented as a methodology rather than as a software implementation. Practitioners and institutions have developed their own implementations, leading to a number of significant variants and improvements of the original model. The model has also been used by regulators and central banks: CreditRisk+ played a role in the early formulation of the Basel accord (see [5]) and has been used by central banks to analyze country-wide panel data on defaults (an example is reported in [1]).

For these reasons, since its introduction in 1997, CreditRisk+ has consistently attracted the interest of practitioners, financial regulators, and academics, who have generated a significant body of literature on the model. An account of CreditRisk+ and its subsequent developments can be found in [6].

The CreditRisk+ Algorithm

The function of CreditRisk+ is to transform data about the creditworthiness of individual borrowers into a portfolio-level assessment of risk. In most portfolio credit risk models, this step requires Monte Carlo simulation (see Credit Portfolio Simulation). However, CreditRisk+ avoids simulation by using an efficient numerical algorithm, as outlined below. The approach confers advantages in terms of speed of computation and enhanced understanding of the drivers of the resulting distribution: many useful statistics, such as the moments of the loss distribution, are given by simple formulae in CreditRisk+, whose relationship to the risk management features of the situation is transparent. On the other hand, owing to its analytic nature, CreditRisk+ is a relatively inflexible portfolio model, and as such has tended to find application where transparency and ease of calculation are more important than flexible parameterization.

To understand the CreditRisk+ calculation, we consider a portfolio containing $N$ loans, where we wish to assess the loss distribution over a one-year time horizon. (The model can be applied to bonds or derivatives counterparties, but the main features of the calculation are the same.) To run CreditRisk+, a number $R$ of economic factors must be chosen. This can be the number of distinct economic influences on the portfolio that are considered to exist (say, the number of geographical regions or industries significantly represented in the portfolio), but it is often assumed in practice that $R = 1$, in which case the model is said to be in “one-factor” mode. CreditRisk+ with one factor gives an assessment of risk that ignores subtle industry or geographic diversification, but can capture the correct overall amount of economic and concentration risk present
in the portfolio, and is sufficient for many purposes. In any event, typically $R$ is much less than $N$, the number of loans, reflecting the fact that all the significant influences on the portfolio affect many borrowers at once.

For each loan $i$, where $1 \le i \le N$, the model needs the following input data:

1. Long-term average probability of default $p_i$: This is the probability that the obligor will default over the year, typically estimated from the credit rating (see Credit Rating).
2. Loss on default $E_i$: This is typically estimated as the loan notional less an estimated recovery amount (see Recovery Rate).
3. Economic factor loadings: These are given by $\theta_{i,j}$, for $1 \le j \le R$, where $R$ is the number of factors introduced above. $\theta_{i,j}$ must be nonnegative numbers satisfying $\sum_{j=1}^{R} \theta_{i,j} = 1$ for each $i$.

The factor loadings $\theta_{i,j}$ require some further explanation: they represent the sensitivity of the obligor $i$ to each of the $R$ economic factors assumed to influence the portfolio. In general, determining suitable values for $\theta_{i,j}$ is one of the main difficulties of using CreditRisk+, and analogous difficulties exist for all portfolio models. Note, however, that if $R$ is chosen to be 1 (“one-factor mode” as described above), then we must have $\theta_{i,1} = 1$ for all $i$, and there is no information requirement. This reflects the fact that one-factor mode ignores the subtle industry or geographic diversification effects in the portfolio, but is, nevertheless, a popular mode of use of the model due to the simpler parameter requirements.

To understand how CreditRisk+ processes this data, let $X_1, \ldots, X_R$ be random variables, each with mean $E(X_j) = 1$. The variable $X_j$ represents the economic influence of sector $j$ over the year. In common with most portfolio credit risk models, CreditRisk+ does not incorporate economic prediction. Instead, uncertainty about the economy is reflected by representing economic factors as random variables in this way. CreditRisk+ then assumes that the realized probability of default $P_i$ for loan $i$ is given by the following critical relationship:

$$P_i = p_i \left(\theta_{i,1} X_1 + \ldots + \theta_{i,R} X_R\right) \qquad (1)$$

The realized default probability $P_i$ depends not only on the long-term average probability of default $p_i$ but also on the random variables $X_1, \ldots, X_R$. Note that because $E(X_j) = 1$ and $\sum_{j=1}^{R} \theta_{i,j} = 1$, we have

$$E(P_i) = p_i \left(\theta_{i,1} + \ldots + \theta_{i,R}\right) = p_i \qquad (2)$$

so that the long-term average default probability (or equivalently, the average of the default probabilities across all states of the economy) is $p_i$ as required. In a particular year, however, $P_i$ will differ from its long-term average. If the borrower $i$ is sensitive to a factor $j$ (i.e., $\theta_{i,j} > 0$), and if a large value is drawn for $X_j$, then this represents a poor economy with a negative impact on the obligor $i$, and we will tend to have $P_i > p_i$, meaning that the obligor $i$ is more likely to default in this particular year than on average. Because the same will be true of other obligors $i'$ with $\theta_{i',j} > 0$, the economic influence represented by factor $j$ can affect a large number of obligors at once. This mechanism incorporates systematic risk, which affects many obligors at once and so cannot be diversified away. The same mechanism in various forms is present in all commonly used portfolio credit risk models.

Two technical assumptions are now made in CreditRisk+:

1. The random variables $X_j$, $1 \le j \le R$, are independent, and each has a Gamma distribution with mean 1 and variance $\beta_j$.
2. For each loan $i$, $1 \le i \le N$, the loss given default $E_i$ is a positive integer.

The first assumption is made to facilitate the CreditRisk+ numerical algorithm. In other credit risk models, notably the Gaussian copula models, the variables that play the role of the $X_j$ are assumed to be normally distributed. Although these assumptions seem very different, in fact for many applications they have little effect on the final risk estimate. Assumption (1) can, however, lead to difficulties in parameterizing CreditRisk+.

The second assumption, known as bucketing of exposures, also requires some further explanation. Without this assumption, $E_i$ could be any positive amounts, all expressed in units of a common reference currency. An insight of CreditRisk+ is that the precise values of $E_i$ are not critical: $E_i$ can be rounded to whole numbers without significantly affecting the aggregate risk assessment (a simple way of estimating the resulting error is given in Section A4.2 of [2]). The amount of rounding depends on
how $E_i$ are expressed before rounding; for example, it is common to express $E_i$ in millions, so that a loss on default of say 24.35, meaning 24.35 million units of the reference currency, would be rounded to 25. After bucketing of exposures, the aggregate loss from the portfolio must itself be a whole number (in the example above, this would mean a whole number of millions of the reference currency). The loss distribution can therefore be summarized in terms of its probability generating function

$$G(z) = \sum_{n=0}^{\infty} A_n z^n \qquad (3)$$

where $A_n$ denotes the probability that the aggregate loss is exactly $n$. To obtain the loss distribution, we need the numerical value of $A_n$, for $n = 0$ (corresponding to no loss), $1, 2, \ldots$ up to a desired point. For CreditRisk+, with the inputs described above, it can be shown that the probability generating function (3) is given explicitly as

$$G(z) = \prod_{j=1}^{R} \left(1 - \beta_j \sum_{i=1}^{N} \theta_{i,j}\, p_i \left(z^{E_i} - 1\right)\right)^{-1/\beta_j} \qquad (4)$$

For the derivation of this equation, see, for example [2], Section A9 or [6], Chapter 2. The derivation involves a further approximation, known as the Poisson approximation, which can roughly be described as assuming that the default probabilities $p_i$ are small enough that their squares can be neglected. CreditRisk+ then uses an approach related to the so-called Panjer algorithm, which was developed originally for use in actuarial aggregate claim estimation. This relies on the fact that there exist polynomials $P(z)$ and $Q(z)$, whose coefficients can be computed explicitly from the input data via equation (4), and which satisfy

$$P(z)\, \frac{dG(z)}{dz} = Q(z)\, G(z) \qquad (5)$$

Equating the coefficients of $z^n$ on each side of this identity, for each $n \ge 0$, leads finally to a simple recurrence relationship between $A_n$ in equation (3). The recurrence relationship expresses the value of $A_n$ for each $n$, in terms of the earlier coefficients $A_0, \ldots, A_{n-1}$. The calculation is started by calculating $A_0$, which is the probability of no loss, by setting $z = 0$ in equation (4) to give the explicit formula

$$A_0 = G(0) = \prod_{j=1}^{R} \left(1 + \beta_j \sum_{i=1}^{N} \theta_{i,j}\, p_i\right)^{-1/\beta_j} \qquad (6)$$

and the recurrence relation then allows efficient calculation of $A_n$ up to any desired level. For a complete treatment of this algorithm, see, for example, [6], Chapter 2.

Later Developments of CreditRisk+

Many enhancements to CreditRisk+ have been proposed by various authors (see the introduction to [6] for a discussion of some of the drawbacks of the original model). Developments have fallen into the following broad themes:

1. alternative calculation algorithms, such as saddlepoint approximation, Fourier inversion, and the method of Giese [3];
2. improved capital allocation methods, notably the method of Haaf and Tasche;
3. inclusion of additional risks, such as migration risk and uncertain recovery rates;
4. improved methods for determining inputs, particularly the economic factor loadings $\theta_{i,j}$;
5. application to novel situations such as default probability estimation [8]; and
6. asymptotic formulae, notably the application of the “granularity adjustment” [5].

The reader is also referred to [6] for details on many of these developments.

References

[1] Balzarotti, V., Castro, C. & Powell, A. (2004). Reforming Capital Requirements in Emerging Countries: Calibrating Basel II using Historical Argentine Credit Bureau Data and CreditRisk+. Working Paper, Universidad Torcuato Di Tella, Centro de Investigación en Finanzas.
[2] Credit Suisse Financial Products (1997). CreditRisk+, a Credit Risk Management Framework, Credit Suisse Financial Products, London.
[3] Giese, G. (2003). Enhancing CreditRisk+, Risk 16(4), 73–77.
[4] Gordy, M. (2000). A comparative anatomy of Credit Risk Models, Journal of Banking and Finance 24, 119–149.
[5] Gordy, M. (2004). Granularity adjustment in portfolio Credit Risk Measurement, in Risk Measures for the 21st Century, G. Szego, ed., John Wiley & Sons, Heidelberg.
[6] Gundlach, M. & Lehrbass, F. (eds) (2004). CreditRisk+ in the Banking Industry, Springer Finance.
[7] Koyluoglu, H.U. & Hickman, A. (1998). Reconcilable differences, Risk 11(10), 56–62.
[8] Wilde, T. & Jackson, L. (2006). Low default portfolios without simulation, Risk 19(8), 60–63.

Related Articles

Credit Risk; Gaussian Copula Model; Structural Default Risk Models.

TOM WILDE

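As a rough illustration of the calculation described in the CreditRisk+ Algorithm section, the following Python sketch computes the coefficients $A_n$ of equation (3) from the closed form (4). It is not claimed to be the recurrence of [2]: it assumes that each factor's term in (4) is expanded separately with a Panjer-type recurrence of the form (5), taking $P(z) = 1 + \beta_j \mu_j - \beta_j S_j(z)$ and $Q(z) = S_j'(z)$ with $S_j(z) = \sum_i \theta_{i,j} p_i z^{E_i}$ and $\mu_j = S_j(1)$, and that the $R$ resulting power series are then multiplied by truncated convolution. The function name, argument names, and sample portfolio are hypothetical.

```python
import numpy as np


def creditrisk_plus_loss_distribution(p, E, theta, beta, n_max):
    """Return A_0..A_{n_max}: probabilities that the aggregate loss equals n,
    obtained by expanding equation (4) as a power series in z.

    p     : long-term default probabilities p_i, length N
    E     : bucketed losses on default E_i (positive integers), length N
    theta : factor loadings theta_{i,j}, shape (N, R), rows summing to 1
    beta  : variances beta_j of the Gamma factors, length R
    """
    p = np.asarray(p, dtype=float)
    E = np.asarray(E, dtype=int)
    theta = np.asarray(theta, dtype=float)
    beta = np.asarray(beta, dtype=float)

    A = np.zeros(n_max + 1)
    A[0] = 1.0  # power series of the constant 1, before multiplying in factors
    for j in range(theta.shape[1]):
        # s[e] = sum of theta_{i,j} * p_i over loans whose exposure is E_i = e
        s = np.bincount(E, weights=theta[:, j] * p, minlength=n_max + 1)
        c = 1.0 + beta[j] * s.sum()  # constant term of P(z) for factor j
        g = np.zeros(n_max + 1)
        g[0] = c ** (-1.0 / beta[j])  # factor j's contribution to equation (6)
        for m in range(1, n_max + 1):
            e = np.arange(1, m + 1)
            # coefficient of z^(m-1) in P(z) g'(z) = Q(z) g(z), solved for g[m]
            g[m] = np.sum(s[e] * (e + beta[j] * (m - e)) * g[m - e]) / (c * m)
        # multiply the series of the factors, keeping terms up to z^n_max
        A = np.convolve(A, g)[: n_max + 1]
    return A


if __name__ == "__main__":
    # Hypothetical three-loan portfolio in one-factor mode (theta_{i,1} = 1).
    probs = creditrisk_plus_loss_distribution(
        p=[0.01, 0.02, 0.03], E=[1, 2, 5], theta=[[1.0], [1.0], [1.0]],
        beta=[0.5], n_max=50,
    )
    print("P(no loss) =", probs[0])
    print("P(loss <= 5) =", probs[:6].sum())
```

All terms in the recurrence are nonnegative, so the sketch avoids cancellation in this simple setting; a production implementation would need to consider the numerical issues discussed in [6].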
Large Pool Approximations

The loss distribution of a large credit portfolio can be valued by Monte Carlo methods. This is perhaps the most common approach used by practitioners today. The problem is that Monte Carlo methods are computationally intensive, usually taking a significant amount of time to achieve the required accuracy. Therefore, although such methods may lend themselves to pricing and structuring of credit derivatives, they are not appropriate for risk management where simulation and stress testing are required. In fact, nesting a second level of simulation, for pricing, within the risk management simulation represents a performance challenge.

Analytical approximations of losses of large portfolios represent an efficient alternative to Monte Carlo simulation. The following methods can be applied for approximation of a large portfolio’s loss distribution: the law of large numbers (LLN), the central limit theorem (CLT), and large deviation theory.

The analytical methods for approximation of credit portfolio losses are usually applied in an additive scheme: the portfolio losses due to default, $L$, over some fixed time horizon (single step) are represented as

$$L = \sum_{k=1}^{K} L_k \qquad (1)$$

where $L_k$ is the loss of the $k$th name in the portfolio and $K$ is the number of names. Application of limit theorems for stochastic processes becomes quite natural as $K$ increases. The main technical difficulties are related to dependency of default events and losses of the counterparties.

The analytical methods for portfolio losses are applied in the conditional independence framework pioneered in [14] (see also [7, 9]), based on the assumption that there is a random vector, $X$, such that conditional on the values of $X$, the default events are independent. Usually, $X$ is interpreted as a vector of credit drivers describing the state of the economy or a sector of the economy, at the end of the time horizon [3, 5, 8, 12]. In multistep models, $X$ can be a random process describing the dynamics of the credit drivers [7]. In this case, computation of conditional default and migration probabilities requires efficient numerical quadratures for multidimensional integrals [7]. The multistep portfolio modeling is applied when it is necessary to incorporate the effect of stochastic portfolio exposure, as in the integrated market and credit risk framework in [7].

The notation $\mathbb{P}_x$ denotes the regular conditional probability measure, conditional on $X = x$; $\mathbb{E}_x$ is the corresponding conditional expectation operator.

A general approach to approximate the distribution of the random variable $L$ can be described as follows:

1. Choose a sufficiently rich family of distributions, $F_\theta$, such that $\theta \mapsto F_\theta$ is a Borel measurable mapping of a vector of parameters $\theta$.
2. Fix a value of the variable, $X = x$, and compute parameters, $\theta(x)$, of the approximating family of distributions, $F_{\theta(x)}(\ell)$, such that the conditional distribution, $\mathbb{P}_x(L \le \ell) = \mathbb{P}(L \le \ell \mid X = x)$, is approximated by $F_{\theta(x)}(\ell)$. (It is assumed that $x \mapsto \theta(x)$ is Borel measurable, so that $x \mapsto F_{\theta(x)}(\ell)$ is also measurable, for each $\ell$.)
3. Find the unconditional approximating distribution by integration over the distribution, $G_X$, of the variable $X$:

$$F^*(\ell) = \int F_{\theta(x)}(\ell)\, dG_X(x) \qquad (2)$$

Law of Large Numbers: Vasicek Approximation

The first key result was obtained in [14, 13] for homogeneous portfolios. The $K$ random variables, $L_k$, can be expressed as $L_k = N \cdot I_k$, where $I_k$ is the indicator of default of the $k$th name and $N$ is the constant loss given default. The random variables $I_k$ are identically distributed and their sum, $\nu = \sum_{k=1}^{K} I_k$, is the number of names in default. The portfolio losses are $L = N\nu$.

The variable, $X$, in the Vasicek model is latent and has a standard normal distribution, $\Phi(x)$. Conditional on $X = x$, the default events are independent and $\nu$ has a binomial distribution with parameter $p(x) = \mathbb{P}(I_k = 1 \mid X = x)$ so that

$$\int_{-\infty}^{\infty} p(x)\, d\Phi(x) = \pi_* \qquad (3)$$

where $\pi_*$ is the common unconditional probability of default. The unconditional distribution of $\nu$ is then a
generalized binomial distribution and

$$\mathbb{P}(L = mN) = \mathbb{P}(\nu = m) = \binom{K}{m} \int_{-\infty}^{\infty} p^m(x)\, q^{K-m}(x)\, d\Phi(x), \quad m = 0, 1, \ldots, K \qquad (4)$$

where $q(x) = 1 - p(x)$.

The following specification^a of the conditional default probability is widely used in the literature [6, 14], and so on.

$$p(x) = \Phi\left(\frac{H - \beta x}{\sigma}\right), \quad \beta^2 + \sigma^2 = 1 \qquad (5)$$

where $H = \Phi^{-1}(\pi_*)$, and $\beta$ is a parameter that determines the correlation between default events.

Consider the ratio $\nu_K = \nu/K$ determining the portfolio losses. If $\beta = 0$, then $p(x) \equiv \pi_*$ and

$$\lim_{K \to \infty} \nu_K = \pi_* \quad \text{almost surely} \qquad (6)$$

in accordance with the strong law of large numbers. If $\beta \ne 0$ the limit in equation (6) is in distribution, to a random variable with the same distribution as $\xi = p(X)$. Thus, one obtains

$$\lim_{K \to \infty} \mathbb{P}(\nu_K \le \ell) = \Phi\left(\frac{\sigma \Phi^{-1}(\ell) - H}{\beta}\right), \quad 0 \le \ell \le 1 \qquad (7)$$

It follows from equation (7) that the quantile approximation, $\ell_q^*$, corresponding to the probability $q$, is

$$\ell_q^* = N\, \Phi\left(\frac{\beta \Phi^{-1}(q) + H}{\sigma}\right) \qquad (8)$$

In terms of the general approach, one has $\theta = \mu$ with $F_\theta(\ell) = \mathbf{1}_{[\mu,\infty)}(\ell)$ and $\theta(x) \equiv \mu(x) = KNp(x)$.

Central Limit Theorem

The heterogeneous case is treated at the outset, as it is no more difficult than the homogeneous case, which is described as a special case at the end. Once again, $X$ is univariate and latent. Denote by $N_k$, the loss given default of the $k$th name in the portfolio, and by $p_k(x)$, the conditional default probability of the $k$th name. Then the conditional mean, $\mu(x)$, and the conditional variance, $\sigma^2(x)$, of the portfolio losses are

$$\mu(x) = \sum_{k=1}^{K} N_k\, p_k(x), \qquad \sigma^2(x) = \sum_{k=1}^{K} N_k^2\, p_k(x) \cdot \left(1 - p_k(x)\right) \qquad (9)$$

Under mild conditions on the notionals, $N_k$ (which are vacuous in the homogeneous case), the conditional distribution of the portfolio losses satisfies

$$\mathbb{P}_x\left(\frac{L - \mu(x)}{\sigma(x)} \le \ell\right) \to \Phi(\ell) \quad \text{as } K \to \infty \qquad (10)$$

Let a probability, $q$, $0 < q < 1$, be fixed and consider the equation

$$q = \mathbb{P}(L \le \ell_q) \qquad (11)$$

for the quantile of the distribution of the random variable $L$. One has

$$\mathbb{P}(L \le \ell_q) = \int_{-\infty}^{\infty} \mathbb{P}_x(L \le \ell_q)\, d\Phi(x) \doteq \int_{-\infty}^{\infty} \Phi\left(\frac{\ell_q - \mu(x)}{\sigma(x)}\right) d\Phi(x) \qquad (12)$$

Therefore the quantile approximation, $\ell_q^*$, is the solution of the equation

$$q = \int_{-\infty}^{\infty} \Phi\left(\frac{\ell_q^* - \mu(x)}{\sigma(x)}\right) d\Phi(x) \qquad (13)$$

In terms of the general approach, one has $\theta = (\mu, \sigma)$ with $F_\theta(\ell) = \Phi\left(\frac{\ell - \mu}{\sigma}\right)$ and $\theta(x) = (\mu(x), \sigma(x))$.

In the case of a homogeneous portfolio, considered in [14, 13] one has the simplifications

$$\mu(x) = KNp(x), \qquad \sigma^2(x) = KN^2\left(p(x) - p^2(x)\right) \qquad (14)$$
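To make the homogeneous formulas concrete, the following Python sketch evaluates the conditional default probability (5), the default-fraction quantile obtained by inverting (7), and the CLT quantile defined by (13) with the simplifications (14). It relies only on the standard library. The trapezoidal grid on $[-8, 8]$, the bisection search, the function names, and the sample parameters are illustrative assumptions and do not come from the cited papers.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: Phi = cdf, Phi^{-1} = inv_cdf


def conditional_pd(x, pi_star, beta):
    """Equation (5): p(x) = Phi((H - beta*x)/sigma), with beta^2 + sigma^2 = 1."""
    sigma = sqrt(1.0 - beta * beta)
    H = STD_NORMAL.inv_cdf(pi_star)
    return STD_NORMAL.cdf((H - beta * x) / sigma)


def lln_fraction_quantile(q, pi_star, beta):
    """q-quantile of the limiting default fraction, from inverting equation (7)."""
    sigma = sqrt(1.0 - beta * beta)
    H = STD_NORMAL.inv_cdf(pi_star)
    return STD_NORMAL.cdf((beta * STD_NORMAL.inv_cdf(q) + H) / sigma)


def clt_loss_cdf(loss, K, N, pi_star, beta, n_grid=2001, x_max=8.0):
    """Right-hand side of (12) with mu(x), sigma(x) from (14), integrated
    over x by the trapezoidal rule on [-x_max, x_max] (an assumed scheme)."""
    h = 2.0 * x_max / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        x = -x_max + i * h
        p = conditional_pd(x, pi_star, beta)
        mu = K * N * p
        sd = N * sqrt(K * p * (1.0 - p))
        # degenerate conditional distribution if p rounds to 0 or 1
        inner = STD_NORMAL.cdf((loss - mu) / sd) if sd > 0 else float(loss >= mu)
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        total += weight * inner * STD_NORMAL.pdf(x) * h
    return total


def clt_loss_quantile(q, K, N, pi_star, beta, tol=1e-6):
    """Solve equation (13) for the loss level by bisection on [0, K*N]."""
    lo, hi = 0.0, float(K * N)
    while hi - lo > tol * K * N:
        mid = 0.5 * (lo + hi)
        if clt_loss_cdf(mid, K, N, pi_star, beta) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical pool: 1000 names, unit loss given default, 2% default
    # probability, beta = 0.4.
    q, K, N, pi_star, beta = 0.999, 1000, 1.0, 0.02, 0.4
    print("LLN default-fraction quantile:", lln_fraction_quantile(q, pi_star, beta))
    print("CLT loss quantile / (K*N):", clt_loss_quantile(q, K, N, pi_star, beta) / (K * N))
```

Comparing the two printed numbers for different values of $K$ gives a feel for how much of the tail loss is attributable to the finite size of the pool rather than to the systematic driver $X$.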
The normal approximation is just the classical central limit theorem (CLT). The equation for the quantile approximation simplifies to

$$q = \int_{-\infty}^{\infty} \Phi\left(\frac{\ell_q^*/N - Kp(x)}{\sqrt{Kp(x)(1 - p(x))}}\right) d\Phi(x) \qquad (15)$$

Generalized Poisson Approximation

Consider a homogeneous portfolio, for which the number, $K$, of obligors is moderately large but not very large. If also the conditional mean number of default events in the portfolio, $K \cdot p(x)$, takes moderate values, the conditional distribution of $\nu$ might be better approximated by a Poisson distribution

$$\mathbb{P}_x(\nu = m) \doteq \exp(-\lambda(x))\, \frac{\lambda^m(x)}{m!}, \quad m = 0, 1, 2, \ldots \qquad (16)$$

than by a normal distribution, where $\lambda(x) = Kp(x)$. In this case, the (unconditional) portfolio losses can be approximated by the generalized Poisson distribution: for moderately large $K$,

$$\mathbb{P}(L = mN) \doteq \int e^{-\lambda(x)}\, \frac{\lambda^m(x)}{m!}\, dG_X(x), \quad m = 0, 1, 2, \ldots \qquad (17)$$

In terms of the general approach, one has $F_\theta$ being the Poisson distribution function with mean $\theta$ and $\theta(x) = \lambda(x)$.

In particular, for the quantile approximation, one obtains

$$q = \int \sum_{m=0}^{\ell_q/N} e^{-\lambda(x)}\, \frac{\lambda^m(x)}{m!}\, dG_X(x) \qquad (18)$$

Compound Poisson Approximation

In order to extend the result of the previous section to heterogeneous portfolios, one needs to consider compound Poisson distributed random variables. The compound Poisson distribution is a well known approximation in insurance models [11]. In risk management of credit derivatives, the approach was used in [2] and [6] for synthetic collateralized debt obligation (CDO) pricing. The same approach is applicable for approximation of portfolio losses.

In the case of a heterogeneous portfolio, it is not sufficient to approximate the distribution of the number of losses suffered. One must keep track of who defaults or at least the sizes of the individual potential losses because, given only the number of defaults, one cannot infer the losses incurred. To see how this added complexity is handled and how the compound Poisson distribution arises quite naturally, the simplest heterogeneous case is analyzed first; namely, when there are only two distinct recovery-adjusted notional values among the obligors in the portfolio.

Denote by $N_{(1)}$ and $N_{(2)}$, the two distinct values of the recovery-adjusted notionals in the pool. The portfolio then divides into two groups: one with obligors having the common recovery-adjusted notional equaling $N_{(1)}$; the other having common recovery-adjusted notional equaling $N_{(2)}$. Denote the number of defaults in each of the two groups, by $\nu_1$ and $\nu_2$, respectively. Conditionally, their distributions are independent and can be approximated by a Poisson distribution with conditional mean $\lambda_i(x) = \sum_{k:N_k = N_{(i)}} p_k(x)$, $i = 1, 2$, provided both group sizes are moderately large. (This assumption on the group sizes, is only being made in the context of this example.) The total number of defaults in the portfolio, $\nu = \nu_1 + \nu_2$, is conditionally Poisson with conditional mean $\lambda(x) = \lambda_1(x) + \lambda_2(x)$. The total portfolio loss is the sum of the losses of the first and second groups:

$$L = \nu_1 N_{(1)} + \nu_2 N_{(2)} \qquad (19)$$

As a positive linear combination of conditionally independent Poisson random variables, $L$ is conditionally a compound Poisson random variable with the same distribution as that of

$$\tilde{L} := \sum_{j=1}^{\nu} N^{(j)} \qquad (20)$$

where $N^{(j)}$ is a conditionally independent and identically distributed (i.i.d.) sequence of random variables, each taking two values, $N_{(1)}$ and $N_{(2)}$, with corresponding conditional probabilities $\lambda_1(x)/\lambda(x)$ and $\lambda_2(x)/\lambda(x)$ and conditionally independent of $\nu$. (This is an elementary
calculation using the conditional characteristic functions of $L$ and $\tilde{L}$.) More formally, the conditional distribution of $N^{(j)}$ is

$$f(N; x) \equiv \mathbb{P}_x\left(N^{(j)} = N\right) = \begin{cases} \lambda_1(x)/\lambda(x), & N = N_{(1)} \\ \lambda_2(x)/\lambda(x), & N = N_{(2)} \end{cases} \qquad (21)$$

In the general case where the recovery-adjusted notionals take more than two values, the conditional distribution of the random variable, $N^{(j)}$, is

$$f(N; x) = \sum_{k:N_k = N} p_k(x)/\lambda(x) \qquad (22)$$

where $\lambda(x) = \sum_{k=1}^{K} p_k(x)$ and $N$ represents a possible individual loss.

In the special case where $p_k$ does not depend on $k$, $f$ is simply the relative frequency of the notional values and does not depend on $x$:

$$f(N) = [\#k \in \{1, 2, \ldots, K\} : N_k = N]/K \qquad (23)$$

In general, the function $f(N; x)$ is a probability mass function with respect to $N$, which approximates the conditional probability that the portfolio loss is of size $N$, given that there has been only one default. More generally, it can be shown that

$$\mathbb{P}_x(L = N \mid \nu = m) \approx f^{*m}(N; x) \qquad (24)$$

where $f^{*m}$ denotes the $m$-fold convolution of $f$ with itself, as a probability mass function (for notational convenience, $f^{*1} \equiv f$ and $f^{*0}(N; x) = 1$ if and only if $N = 0$). Given that there have been exactly $m$ defaults, the pool loss amounts to a sum of $m$ notional amounts but, as one does not know who defaulted, in the heterogeneous case there is still some randomness left; that randomness is captured (approximately) by $f^{*m}$.

Assuming that a monetary unit has been chosen and that all recovery-adjusted notionals are expressed as integers (that is, integer multiples of the monetary unit), one has the following result [6]:

Theorem 1 In the limiting case of a large portfolio ($K$ large), the following approximate equality holds in distribution under $\mathbb{P}_x$ (i.e., conditional on $X = x$): for fixed $i = 1, 2, \ldots, n$,

$$L \overset{D}{\approx} \sum_{m=1}^{\nu} N^{(m)} \qquad (25)$$

where $(N^{(m)})_{m=1}^{K}$ is an i.i.d. sequence of random variables with common probability mass function $f$ and independent of $\nu$, the number of defaults in the pool, which is approximately Poisson distributed under $\mathbb{P}_x$

$$\nu \overset{D}{\approx} \mathrm{Pois}(\lambda(x)) \qquad (26)$$

More precisely,

$$\max_{N} \left| \mathbb{P}_x(L = N) - \mathbb{P}_x\left(\sum_{m=1}^{\nu} N^{(m)} = N\right) \right| = O\left(\sum_{k=1}^{K} (p_k(x))^2\right) \qquad (27)$$

For the unconditional loss distribution,

$$\mathbb{P}(L \le \ell) \doteq \int \sum_{N \le \ell} \sum_{m=0}^{\infty} \frac{e^{-\lambda(x)} \lambda^m(x)}{m!}\, f^{*m}(N; x)\, dG_X(x) \qquad (28)$$

In terms of the general approach, one has $F_\theta$ being the compound Poisson distribution function with parameter $\theta = (\theta_1, \theta_2, \ldots, \theta_K) \in [0, 1]^K$ and $\theta(x) = (p_1(x), p_2(x), \ldots, p_K(x))$; $F_\theta$ is defined as

$$F_\theta(\ell) = \sum_{N \le \ell} \sum_{m=0}^{\infty} \frac{e^{-\lambda} \lambda^m}{m!}\, f^{*m}(N) \qquad (29)$$

where $\lambda := \sum_{k=1}^{K} \theta_k$, $f(N) := \lambda^{-1} \sum_{k:N_k = N} \theta_k$. In practice, the convolutions would be calculated recursively using the fast Fourier transform.

Large Deviations

Approximations based on large deviation theory usually lead to exponential approximations of the tail of the conditional portfolio loss distribution. These approximations are derived using the saddlepoint
method for the characteristic function of the portfolio losses,

$$\varphi_L(s) = \mathbb{E}[\exp(isL)] = \int \prod_{k=1}^{K} \left(1 - p_k(x) + p_k(x)\, e^{isN_k}\right) dG_X(x) \qquad (30)$$

The technical details can be found in [1] (see also Saddlepoint Approximation).

Other Methods

There are some methods of approximation that deal only with quantiles of the loss distribution directly, focusing on quantiles with high quantile probability, which is the case of interest for credit risk. The large deviation approximations are examples of such methods.

Another one of these methods is due to Pykhtin [12] who, building on the work of Martin and Wilde [10], adapted the tools of an earlier investigation [4] in market-risk sensitivity to position sizes, to the credit risk setting. Note that this method is a direct, analytical approximation to the quantile of the unconditional loss distribution using an approximate model, unlike the other semianalytic methods described so far, which calculate the quantile by making analytical approximations to the conditional loss distribution (conditional on a systemic credit scenario). It is also worth noting that the result is in closed form, a qualitative description of which is given here.

Pykhtin’s approach can be described at a high level as follows. It consists of a three-stage series of approximations:

1. A single-factor model, which is an approximation based on an LLN type of loss function; that is, it is a Vasicek type of model.
   a) The single factor is built as a weighted sum of the portfolio’s counterparties’ credit drivers.
   b) The weights are chosen to maximize the single factor’s correlation with the drivers.
2. An analytic adjustment (approximation) to a full multifactor model that is still based on an LLN type of loss function. This adjustment is called a multifactor adjustment.
3. An analytic adjustment, bridging the LLN-type loss function of the second stage to the usual Merton-type one with full specific risk. This adjustment is called a granularity adjustment.

The reason behind the terminology for the two adjustments is that for a single-factor model, the multifactor adjustment vanishes, whereas for an infinitely granular portfolio (i.e., a very large, homogeneous one), the granularity adjustment vanishes.

The approximations, in both the second and third stages, are based on a single formula for quantile approximation, due originally to Gourieroux et al. [4]. The formula is a second-order Taylor expansion, for the quantile, in a small parameter that is used to express the full loss model as a perturbation of the single-factor model. The first-order Taylor coefficient is the difference between the single-factor (conditional) loss and the conditional expected loss of the full model, conditional on the single factor. The single factor is constructed so that the first-order Taylor term vanishes.

The second-order Taylor coefficient is related to the conditional variance of the full loss, conditional on the single factor. The well-known conditional variance decomposition from statistics is used to split the Taylor coefficient into two terms, which are the approximations in the second and third stages.

The end result for the entire adjustment to the single-factor quantile is expressed as a sum of four quadratic forms in the recovery-adjusted exposures, with coefficients involving the bivariate and univariate normal cumulative distribution functions, evaluated in terms of the input statistical parameters of the model. The result is thus in closed form. The reader is referred to [12] for the quantitative details of the construction, the formulae for the terms in the quantile approximation, and a study of the scope of applicability of the method.

End Notes
c) The weights use the counterparties’ loss
characteristics such as default probabilities a.
This specification is a partial case of the famous Gaussian
and losses given default. copula model [9].
References

[1] Dembo, A., Deuschel, J.-D. & Duffie, D. (2004). Large portfolio losses, Finance and Stochastics 8(1), 3–16.
[2] De Prisco, B., Iscoe, I. & Kreinin, A. (2005). Loss in translation, Risk 18(6), 77–82.
[3] Gordy, M. (2003). A risk-factor model foundation for ratings-based bank capital rules, Journal of Financial Intermediation 12(3), 199–232.
[4] Gourieroux, C., Laurent, J.-P. & Scaillet, O. (2000). Sensitivity analysis of values at risk, Journal of Empirical Finance 7, 225–245.
[5] Huang, X., Oosterlee, C. & Mesters, M. (2007). Computation of VaR and VaR contribution in the Vasicek portfolio credit loss model: a comparative study, The Journal of Credit Risk 3(3), 75–96.
[6] Iscoe, I. & Kreinin, A. (2007). Valuation of synthetic CDOs, Journal of Banking and Finance 31, 3357–3376.
[7] Iscoe, I., Kreinin, A. & Rosen, D. (1999). Integrated market and credit risk portfolio model, Algorithmics Research Quarterly 2(3), 21–38.
[8] Koyluoglu, H.U. & Hickman, A. (1998). A Generalized Framework for Credit Risk Portfolio Models, Working paper, CSFP Capital.
[9] Li, D. (1999). On Default Correlation: A Copula Function Approach, The RiskMetrics group, Working paper, 99-07.
[10] Martin, R. & Wilde, T. (2002). Unsystematic credit risk, Risk 15(11), 123–128.
[11] Panjer, H. & Willmot, G. (1992). Insurance Risk Models, Society of Actuaries, Schaumburg.
[12] Pykhtin, M. (2004). Multi-factor adjustment, Risk March, 85–90.
[13] Vasicek, O. (1987). Probability of Loss on Loan Portfolio, KMV, available at www.kmv.com
[14] Vasicek, O. (2002). Loan portfolio value, Risk, December.

Further Reading

Emmer, S. & Tasche, D. (2003). Calculating Credit Risk Capital Charges with the One-Factor Model, Working Paper, September 2003.
Gordy, M. (2002). Saddlepoint approximations of credit risk, Journal of Banking and Finance 26, 1335–1353.
Gordy, M. & Jones, D. (2003). Random tranches, Risk March, 78–83.
Gregory, J. & Laurent, J.-P. (2003). I will survive, Journal of Risk 16(6), 103–108.
Hull, J. & White, A. (2003). Valuation of a CDO and an nth to default CDS without Monte Carlo simulation, Journal of Derivatives 12(2), 8–23.
Laurent, J.-P. & Gregory, J. (2003). Basket default swaps, CDO’s and factor copulas, Presentation at the Conference Quant’03, London, September 2003, p. 21, www.defaultrisk.com
Schönbucher, P. (2003). Credit Derivatives Pricing Models, John Wiley & Sons.

IAN ISCOE & ALEX KREININ
Saddlepoint Approximation

The classical method known variously as the saddlepoint approximation, the method of steepest descents, the method of stationary phase, or the Laplace method, applies to contour integrals that can be written in the form

\[ I(s) = \int_C e^{sf(\zeta)}\,d\zeta \tag{1} \]

where $f$, an analytic function, has a real part that goes to minus infinity at both ends of the contour $C$. The fundamental idea is that the value of the integral when $s > 0$ is large should be dominated by contributions from the neighborhoods of points where the real part of $f$ has a saddlepoint. Early use was made of the method by Debye to produce asymptotics of Bessel functions, as reviewed in, for example, [8]. Daniels [3] wrote a definitive work on the saddlepoint approximation in statistics. Later, these ideas evolved into the theory of large deviations, initiated by Varadhan in [7], which seeks to determine rigorous asymptotics for the probability of rare events.

If we write $\zeta = x + iy$, elementary complex analysis implies that the surface over the $(x, y)$ plane with graph $\Re f$ has zero mean curvature, so any critical point $\zeta^*$ (a point where $f' = 0$) will be a saddlepoint of the modulus $|e^{sf(\zeta)}|$. The level curves of $\Re f$ and $\Im f$ form families of orthogonal trajectories: the curves of steepest descent of $\Re f$ are the level curves of $\Im f$, and vice versa. Thus the curve of steepest descent of the function $\Re f$ through $\zeta^*$ is also a curve on which $\Im f$ is constant. In other words, it is a curve of “stationary phase”. On such a curve, the modulus of $e^{sf(\zeta)}$ will have a sharp maximum at $\zeta^*$. If the contour $C$ can be deformed to follow the curve of steepest descent through a unique critical point $\zeta^*$, and the modulus of $e^{sf(\zeta)}$ is negligible elsewhere, the dominant contribution to the integral for large $s$ can be computed by a local computation in the neighborhood of $\zeta^*$. In more complex applications, several critical points may need to be accounted for.

The tangent line to the steepest descent curve at $\zeta^*$ can be parameterized by $w \in \mathbb{R}$ by the equation

\[ (s f^{(2)}(\zeta^*))^{1/2} (\zeta - \zeta^*) = iw \tag{2} \]

(care is needed here to select the correct sign of the complex square root), and on this line, the Taylor expansion of $f$ about $\zeta^*$ implies

\[ f(\zeta) = f(\zeta^*) + \sum_{n \ge 2} \frac{1}{n!} f^{(n)}(\zeta^*) \left( \frac{iw}{(s f^{(2)}(\zeta^*))^{1/2}} \right)^{n} \tag{3} \]

One can write the integrand in the form

\[ e^{sf(\zeta)} \sim e^{sf(\zeta^*) - w^2/2} \left[ 1 - i s^{-1/2} w^3 \frac{f^{(3)}(\zeta^*)}{3!\,(f^{(2)}(\zeta^*))^{3/2}} + s^{-1} \left( w^4 \frac{f^{(4)}(\zeta^*)}{4!\,(f^{(2)}(\zeta^*))^{2}} - w^6 \frac{(f^{(3)}(\zeta^*))^{2}}{2!\,(3!)^2 (f^{(2)}(\zeta^*))^{3}} \right) + \cdots \right] \tag{4} \]

Now approximating the integral over $C$ by the integral over the tangent line parameterized by $w$ leads to a series of Gaussian integrals, each of which can be computed explicitly. The terms with an odd power of $w$ all vanish, leading to the result

\[ I(s) \sim i \left( \frac{2\pi}{s f^{(2)}(\zeta^*)} \right)^{1/2} e^{sf(\zeta^*)} \left[ 1 + s^{-1} \left( \frac{3 f^{(4)}(\zeta^*)}{4!\,(f^{(2)}(\zeta^*))^{2}} - \frac{5 \cdot 3 \cdot (f^{(3)}(\zeta^*))^{2}}{2!\,(3!)^2 (f^{(2)}(\zeta^*))^{3}} \right) + \dots \right] \tag{5} \]

Daniels’ Application to Statistics

Daniels [3] presented an asymptotic expansion for the probability density function (pdf) $f_n(x)$ of the mean $\bar{X}_n$ of $n$ i.i.d. copies of a continuous random variable $X$ with cumulative probability function $F(x)$ and pdf $f(x) = F'(x)$. Assuming that the moment generating function

\[ M(\tau) = e^{\Psi(\tau)} = \int_{-\infty}^{\infty} e^{\tau x} f(x)\,dx \tag{6} \]

is finite for $\tau$ in an open interval $(-c_1, c_2)$ containing the origin, the Fourier inversion theorem implies that

\[ f_n(x) = \frac{n}{2\pi i} \int_{\alpha - i\infty}^{\alpha + i\infty} e^{n(\Psi(\tau) - \tau x)}\,d\tau \tag{7} \]
for any real $\alpha \in (-c_1, c_2)$. This integral is now amenable to a saddlepoint treatment as follows. For each $x$ in the support of $f$, one can show that the saddlepoint condition

\[ \Psi'(\tau) - x = 0 \tag{8} \]

has a unique real solution $\tau^* = \tau^*(x)$. One now evaluates the integral given by equation (7) with $\alpha = \tau^*$, and uses Taylor expansion and the substitution $w = -i\sqrt{n\Psi''(\tau^*)}\,(\tau - \tau^*)$ to write

\[ f_n(x) \sim \frac{\sqrt{n}}{2\pi\sqrt{\Psi''(\tau^*)}} \int_{-\infty}^{\infty} e^{n(\Psi(\tau^*) - \tau^* x) - w^2/2} \left[ 1 - i n^{-1/2} (\Psi''(\tau^*))^{-3/2} \Psi^{(3)}(\tau^*)\,w^3/3! + n^{-1} (\Psi''(\tau^*))^{-2} \Psi^{(4)}(\tau^*)\,w^4/4! + \dots \right] dw \tag{9} \]

Each term in this expansion is a Gaussian integral that can be evaluated in closed form. The odd terms all vanish, leaving an expansion in powers of $n^{-1}$:

\[ f_n(x) \sim g_n(x) \left[ 1 + n^{-1} \left( \frac{\Psi^{(4)}(\tau^*)}{8(\Psi''(\tau^*))^{2}} - \frac{5(\Psi^{(3)}(\tau^*))^{2}}{24(\Psi''(\tau^*))^{3}} \right) + O(n^{-2}) \right] \tag{10} \]

where the leading term (called the saddlepoint approximation) is given by

\[ g_n(x) = \left( \frac{n}{2\pi\Psi''(\tau^*)} \right)^{1/2} e^{n(\Psi(\tau^*) - \tau^* x)} \tag{11} \]

The function $I(x) = \sup_\tau \{\tau x - \Psi(\tau)\} = \tau^* x - \Psi(\tau^*)$ that appears in this expression is the Legendre transform of the cumulant generating function $\Psi$, and is known as the rate function or Cramér function of the random variable $X$. The large deviation principle

\[ \lim_{n \to \infty} \frac{1}{n} \log P(\bar{X}_n > x) = -I(x) \quad \text{for } x > E[X] \tag{12} \]

holds for very general $X$. Another observation is that the Edgeworth expansion of statistics comes out in a similar way, but takes 0 instead of $\tau^*$ as the center of the Taylor expansion.

One can show, using a lemma due to Watson [8], that equation (10) is an asymptotic expansion, which means roughly that when truncated at any order of $n^{-1}$, the remainder is of the same magnitude as the first omitted term. A more precise statement of the magnitude of the remainder is difficult to establish: the lack of a general error analysis is an acknowledged deficiency of the saddlepoint method.

Applications to Portfolio Credit Risk

The problem of portfolio credit risk measures and the problem of evaluating arbitrage-free pricing of collateralized debt obligations (CDOs) both boil down to computation of the probability distribution of the portfolio loss at a set of times, and can be amenable to a saddlepoint treatment. To illustrate this fact, we consider a simple portfolio of credit risky instruments (e.g., corporate loans or credit default swaps), and investigate the properties of the losses caused by default of the obligors. Let $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$ be a filtered probability space that contains all of the random elements: $P$ may be either the physical or the risk-neutral probability measure. The portfolio is defined by the following basic quantities:

• $M$ reference obligors with notional amounts $N_j$, $j = 1, 2, \dots, M$;
• the default time $\tau_j$ of the $j$th credit, an $\mathcal{F}_t$ stopping time;
• the fractional recovery $R_j$ after default of the $j$th obligor;
• the loss $l_j = (1 - R_j)N_j/N$ caused by default of the $j$th obligor as a fraction of the total notional $N = \sum_j N_j$;
• the cumulative portfolio loss $L(t) = \sum_j l_j I(\tau_j \le t)$ up to time $t$ as a fraction of the total notional.

For simplicity, we make the following assumptions:

1. The discount factor is $v(t) = e^{-rt}$ for a constant interest rate $r \ge 0$.
2. The fractional recovery values $R_j$ and hence $l_j$ are deterministic constants.
3. There is a sub σ-algebra $\mathcal{H} \subset \mathcal{F}$ generated by a $d$-dimensional random variable $Y$, the “condition”, such that the default times $\tau_j$ are mutually conditionally independent under $\mathcal{H}$. The marginal distribution of $Y$ is denoted by $P_Y$ and has pdf $\rho_Y(y)$, $y \in \mathbb{R}^d$.
The most important consequence of these assumptions is that, conditioned on $\mathcal{H}$, the fractional loss $L(t)$ is a sum of independent (but not identical) Bernoulli random variables. For fixed values of the time $t$ and conditioning random variable $Y$, we note that $\hat{L} := L(t)\,|\,Y \sim \sum_j l_j X_j$ where $X_j \sim \mathrm{Bern}(p_j(t, y))$, $p_j = \mathrm{Prob}(\tau_j \le t \mid Y = y)$. The following functions are associated with the random variable $\hat{L}$:

1. the pdf $\rho(x) := F^{(-1)}(x)$ (in our simple example, it is a sum of delta functions supported on the interval [0, 1]);
2. the cumulative distribution function (CDF) $F^{(0)}(x) = E[I(\hat{L} \le x)]$;
3. the higher conditional moment functions $F^{(m)}(x) = (m!)^{-1} E[((x - \hat{L})^+)^m]$, $m = 1, 2, \dots$;
4. the cumulant generating function (CGF) $\Psi(u) = \log(E[e^{u\hat{L}}])$.

When we need to make explicit the dependence on $t, y$ we write $F^{(m)}(x|t, y)$. The unconditional versions of these functions are given by

\[ F^{(m)}(x|t) = E[F^{(m)}(x|t, Y)] = \int_{\mathbb{R}^d} F^{(m)}(x|t, y)\,\rho_Y(y)\,dy, \quad m = -1, 0, \dots \tag{13} \]

According to these definitions, for all $m = 0, 1, \dots$ we have the integration formula

\[ F^{(m)}(x) = \int_0^x F^{(m-1)}(z)\,dz \tag{14} \]

Credit Risk Measures

In risk management, the key quantities that determine the economic capital requirement for such a credit risky portfolio are the Value at Risk (VaR) and Conditional Value at Risk (CVaR) for a fixed time horizon $T$ and a fixed confidence level $\alpha < 1$. These are defined as follows:

\[ \mathrm{VaR}_\alpha(L_T) = \inf\{x \mid F^{(0)}(x|T) > \alpha\} \tag{15} \]

\[ \mathrm{CVaR}_\alpha(L_T) = \frac{E[(L_T - x)^+]}{1 - \alpha} = \frac{F^{(1)}(x|T) + E[L_T] - x}{1 - \alpha} \tag{16} \]

Here, we need to take $P$ to be the physical measure.

CDO Pricing

CDOs are portfolio credit swaps that can be schematically decomposed into two types of basic contingent claims whose cash flows depend on the portfolio loss $L_t$. These cash flows are analogous to insurance and premium payments paid periodically (typically, quarterly) on dates $t_k$, $k = 1, \dots, K$, to cover default losses within a “tranche” that occurred during that period.

The writer (the insurer) of one unit of a default leg for a tranche with attachment levels $0 \le a < b \le 1$ pays the holder (the buyer of insurance) at each date $t_k$ all default losses within the interval $[a, b]$ that occurred over $[t_{k-1}, t_k]$. Since the amount remaining in the tranche at time $t$ is $(b - L_t)^+ - (a - L_t)^+$, the loss over a period is the decrease in this amount, and the time 0 arbitrage price of such a contract is

\[ W_{a,b} = \sum_k e^{-rt_k} E\left[ (b - L_{t_{k-1}})^+ - (b - L_{t_k})^+ - (a - L_{t_{k-1}})^+ + (a - L_{t_k})^+ \right] \tag{17} \]

where $E$ is now the expectation with respect to some risk-neutral measure. The writer of one unit of a premium leg for a tranche with attachment levels $a < b$ (the insured) pays the holder (the insurer) on each date $t_k$ an amount jointly proportional to the year fraction $t_k - t_{k-1}$ and the amount remaining in the tranche. We ignore a possible “accrual term” that accounts for defaults between payment dates. The time 0 arbitrage price of such a contract is

\[ V_{a,b} = \sum_k e^{-rt_k} (t_k - t_{k-1})\, E\left[ (b - L_{t_k})^+ - (a - L_{t_k})^+ \right] \tag{18} \]

The CDO rate $s_{a,b}$ for this contract at time 0 is the number of units of the premium leg that has the same value as one unit of the default leg, that is, $s_{a,b} = W_{a,b}/V_{a,b}$.

Saddlepoint Approximations for $F^{(m)}$

We see that the credit risk management problem and the CDO pricing problem both boil down to finding an efficient method to compute $E[F^{(m)}(x|t, y)]$ for $m = 0, 1$ and a large but finite set of values $(x, t, y)$. For the conditional loss $\hat{L} = L_t \mid Y = y$, the CGF is
explicit

\[ \Psi(u) = \sum_{j=1}^{M} \log\left[ 1 - p_j + p_j e^{u l_j} \right] \tag{19} \]

We suppose that the conditional default probabilities $p_j = p_j(t, y)$ are known. A number of different strategies can be used to compute this distribution accurately:

1. In the fully homogeneous case when $p_j = p$, $l_j = l$, the distribution is binomial.
2. When $l_j = l$, but $p_j$ are variable (the homogeneous notional case), these probabilities can be computed highly efficiently by a recursive algorithm in [1, 5].
3. When both $l_j$, $p_j$ are variable, it has been noted in [2, 4, 6, 9] that a saddlepoint treatment of these problems offers superior performance over a naive Edgeworth expansion.

We now consider the fully nonhomogeneous case and begin by using the Laplace inversion theorem to write

\[ \rho(x) = F^{(-1)}(x) = \frac{1}{2\pi i} \int_{\alpha - i\infty}^{\alpha + i\infty} e^{\Psi(\tau) - \tau x}\,d\tau \tag{20} \]

Since $\rho$ is a sum of delta functions, this formula must be understood in the distributional sense, and holds for any real $\alpha$. When $\alpha < 0$,

\[ F^{(0)}(x) = \frac{1}{2\pi i} \int_{\alpha - i\infty}^{\alpha + i\infty} \frac{1 - e^{-\tau x}}{\tau}\, e^{\Psi(\tau)}\,d\tau = -\frac{1}{2\pi i} \int_{\alpha - i\infty}^{\alpha + i\infty} \tau^{-1} e^{\Psi(\tau) - \tau x}\,d\tau \tag{21} \]

In the last step in this argument, one term is zero because $e^{\Psi(\tau)}$ is analytic and decays rapidly as $\tau \to -\infty$. Similarly, for $m = 1, 2, \dots$ one can show that

\[ F^{(m)}(x) = (-1)^{m+1} \frac{1}{2\pi i} \int_{\alpha - i\infty}^{\alpha + i\infty} \tau^{-m-1} e^{\Psi(\tau) - \tau x}\,d\tau \tag{22} \]

provided $\alpha < 0$. It is also useful to consider the functions

\[ G^{(m)}(x) := (-1)^{m+1} \frac{1}{2\pi i} \int_{\alpha - i\infty}^{\alpha + i\infty} \tau^{-m-1} e^{\Psi(\tau) - \tau x}\,d\tau \tag{23} \]

defined when $\alpha > 0$. One can show by evaluating the residue at $\tau = 0$ that

\[ F^{(0)}(x) = G^{(0)}(x) + 1 \tag{24} \]

\[ F^{(1)}(x) = G^{(1)}(x) - E[L] + x \tag{25} \]

with similar formulas relating $F^{(m)}$ and $G^{(m)}$ for $m = 2, 3, \dots$.

Since the conditional portfolio loss is a sum of similar, but not identical, independent random variables, we can follow the argument of Daniels to produce an expansion for the functions $F^{(m)}$. Some extra features are involved: the cumulant generating function is not $n$ times a single function, but rather a sum of $M$ (easily computed) terms; we must deal with the factor $\tau^{-m-1}$; we must deal with the fact that critical points of the exponent in these integrals may be on the positive or negative real axis and there is a pole at $\tau = 0$. To treat the most general case, we move the factor $\tau^{-m-1}$ into the exponent and consider the saddlepoint condition

\[ \Psi'(\tau) - (m + 1)/\tau - x = 0 \tag{26} \]

Proposition 5.1 from [9] shows that a choice of two real saddlepoints solving this equation is typically available:

Proposition 1 Suppose that $p_j, l_j > 0$ for all $j$. Then

1. There is a solution $\tau^*$, unique if it exists, of $\Psi'(\tau) - x = 0$ if and only if $0 < x < \sum_j l_j$. Since $\Psi'$ is increasing with $\Psi'(0) = E[\hat{L}]$, if $0 < x < E[\hat{L}]$, then $\tau^* < 0$ and if $E[\hat{L}] < x < \sum_j l_j$, then $\tau^* > 0$.
2. For each $m \ge 0$, there is exactly one solution $\tau_m^-$ of equation (26) on $(-\infty, 0)$, if $x > 0$ and no solution on $(-\infty, 0)$, if $x \le 0$. Moreover, when $x > 0$, the sequence $\{\tau_m^-\}_{m \ge 0}$ is monotonically decreasing in $m$.
3. For each $m \ge 0$, there is exactly one solution $\tau_m^+$ of equation (26) on $(0, \infty)$, if $x < \sum_j l_j$ and no solution on $(0, \infty)$, if $x \ge \sum_j l_j$. Moreover, when $x < \sum_j l_j$ the sequence $\{\tau_m^+\}_{m \ge 0}$ is monotonically increasing in $m$.

At this point, the methods in [2] and [9] differ. We consider first the method in [9] for computing $F^{(m)}$, $m = 0, 1$. The argument of Daniels is applied directly, but with the following strategy for choosing the saddlepoint. Whenever $x < E[\hat{L}]$, $\tau_m^-$ is chosen as the center of the Taylor expansion for the integral in equation (22). Whenever $x > E[\hat{L}]$, instead, $\tau_m^+$ is chosen as the center of the Taylor expansion for the integral in equation (23), and either of equations (24)
or (25) is used. Thus for example, when $x > E[L]$, the approximation for $m = 1$ is

\[ F^{(1)}(x) \sim x - E[L] + \frac{e^{-\tau_1^+ x + \Psi(\tau_1^+)}}{\sqrt{2\pi\Psi^{(2)}(\tau_1^+)}} \left[ 1 + \frac{\Psi^{(4)}(\tau_1^+)}{8(\Psi^{(2)}(\tau_1^+))^{2}} - \frac{5(\Psi^{(3)}(\tau_1^+))^{2}}{24(\Psi^{(2)}(\tau_1^+))^{3}} + \cdots \right] \tag{27} \]

In [2], the $m = -1$ solution $\tau^*$, suggested by large deviation theory, is chosen as the center of the Taylor expansion, even for $m \ne -1$. The factor $\tau^{-m-1}$ is then included with the other nonexponentiated terms, leading to an asymptotic expansion with terms of the form

\[ \int_{-\infty}^{\infty} e^{-w^2/2} (w + w_0)^{-m-1} w^k\,dw, \qquad w_0 = \tau^* \sqrt{\Psi^{(2)}(\tau^*)} \tag{28} \]

These integrals can be evaluated in closed form, but are somewhat complicated, and more terms are needed for a given order of accuracy.

Numerical implementation of the saddlepoint method for portfolio credit problems thus boils down to efficient computation of the appropriate solutions of the saddlepoint condition given by equation (26). This is a relatively straightforward application of one-dimensional Newton–Raphson iteration, but must be done for a large number of values of $(x, t, y)$. For typical parameter values and up to 210 obligors, [9] report that saddlepoints were usually found in under 10 iterations, which suggests that a saddlepoint expansion will run no more than about 10 times slower than the Edgeworth expansion with the same number of terms. However, both [2] and [9] observe that the accuracy of the saddlepoint expansion is often far greater.

Acknowledgments

Research underlying this article was supported by the Natural Sciences and Engineering Research Council of Canada and MITACS, Canada.

References

[1] Andersen, L., Sidenius, J. & Basu, S. (2003). All your hedges in one basket, Risk 16, 67–72.
[2] Antonov, A., Mechkov, S. & Misirpashaev, T. (2005). Analytical Techniques for Synthetic CDOs and Credit Default Risk Measures, Numerix Preprint http://www.defaultrisk.com/pp_crdrv_77.htm.
[3] Daniels, H.E. (1954). Saddlepoint approximations in statistics, Annals of Mathematical Statistics 25, 631–650.
[4] Gordy, M. (2002). Saddlepoint approximation of credit risk, Journal of Banking and Finance 26(2), 1335–1353.
[5] Hull, J. & White, A. (2004). Valuation of a CDO and an nth to default CDS without Monte Carlo simulation, Journal of Derivatives 2, 8–23.
[6] Martin, R., Thompson, K. & Browne, C. (2003). Taking to the saddle, in Credit Risk Modelling: The Cutting-edge Collection, M. Gordy, ed, Riskbooks, London.
[7] Varadhan, S.R.S. (1966). Asymptotic probabilities and differential equations, Communications on Pure and Applied Mathematics 19, 261–286.
[8] Watson, G.N. (1995). A Treatise on the Theory of Bessel Functions, 2nd Edition, Cambridge University Press, Cambridge, reprint of the second (1944) edition.
[9] Yang, J.P., Hurd, T.R. & Zhang, X.P. (2006). Saddlepoint approximation method for pricing CDOs, Journal of Computational Finance 10, 1–20.

THOMAS R. HURD
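As an illustration of the numerical step described in the article, the following sketch solves the saddlepoint condition (26) on the positive axis for the Bernoulli-sum CGF of equation (19). It is a minimal sketch under stated assumptions, not the implementation used in [9]: the function names `cgf_derivs` and `positive_saddlepoint` are hypothetical, the bisection safeguard around the Newton–Raphson step is an addition that the article does not mention, and the code is restricted to $m \ge 0$ and $0 < x < \sum_j l_j$.

```python
import math


def cgf_derivs(u, p, l):
    """First and second derivatives of Psi(u) = sum_j log(1 - p_j + p_j e^{u l_j}).

    Written with exp(-u * l_j), so it is numerically safe for u >= 0
    (the only case this sketch uses).
    """
    d1 = d2 = 0.0
    for pj, lj in zip(p, l):
        # tilted default probability q_j = p_j e^{u l_j} / (1 - p_j + p_j e^{u l_j})
        q = 1.0 / (1.0 + (1.0 - pj) / pj * math.exp(-u * lj))
        d1 += lj * q
        d2 += lj * lj * q * (1.0 - q)
    return d1, d2


def positive_saddlepoint(x, m, p, l, tol=1e-12, max_iter=100):
    """Root of h(tau) = Psi'(tau) - (m + 1)/tau - x on (0, inf), for m >= 0.

    Returns (tau, iterations). Newton-Raphson with a bisection fallback
    (the fallback is an assumption of this sketch, added for robustness).
    """
    if m < 0 or not (0.0 < x < sum(l)):
        raise ValueError("sketch assumes m >= 0 and 0 < x < sum(l)")

    def h(tau):
        d1, d2 = cgf_derivs(tau, p, l)
        return d1 - (m + 1) / tau - x, d2 + (m + 1) / tau ** 2

    # h increases from -inf (tau -> 0+) to sum(l) - x > 0, so doubling brackets the root.
    lo, hi = 0.0, 1.0
    while h(hi)[0] < 0.0:
        lo, hi = hi, 2.0 * hi

    tau = 0.5 * (lo + hi)
    for it in range(1, max_iter + 1):
        val, slope = h(tau)
        if val > 0.0:
            hi = tau
        else:
            lo = tau
        step = tau - val / slope          # Newton-Raphson update
        if not (lo < step < hi):          # fall back to bisection if it leaves the bracket
            step = 0.5 * (lo + hi)
        if abs(step - tau) < tol * max(1.0, abs(tau)):
            return step, it
        tau = step
    return tau, max_iter


# Made-up homogeneous test portfolio: 50 obligors, p_j = 2%, l_j = 1/50.
p = [0.02] * 50
l = [1.0 / 50] * 50
print(positive_saddlepoint(0.1, 0, p, l))
print(positive_saddlepoint(0.1, 1, p, l))
```

In a full calculation this root-finding would be repeated for each $(x, t, y)$, which is why the iteration count matters for the cost comparison with the Edgeworth expansion.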
Credit Scoring

Credit scoring models play a fundamental role in the risk management practice at most banks. Commercial banks’ primary business activity is related to extending credit to borrowers and generating loans and credit assets. A significant component of a bank’s risk, therefore, lies in the quality of its assets, which needs to be in line with the bank’s risk appetite.[a] To manage risk efficiently, quantifying it with the most appropriate and advanced tools is an extremely important factor in determining the bank’s success.

Credit risk models are used to quantify credit risk at counterparty or transaction level and they differ significantly by the nature of the counterparty (e.g., corporate, small business, private individual). Rating models have a long-term view (through the cycle) and have always been associated with corporate clients, financial institutions, and the public sector (see Credit Rating; Counterparty Credit Risk). Scoring models, instead, focus more on the short term (point in time) and have been mainly applied to private individuals and, more recently, extended to small- and medium-sized enterprises (SMEs).[b] In this article, we focus on credit scoring models, giving an overview of their assessment, implementation, and usage.

Since the 1960s, larger organizations have been utilizing credit scoring to quickly and accurately assess the risk level of their prospects, applicants, and existing customers, mainly in the consumer-lending business. Increasingly, midsize and smaller organizations are appreciating the benefits of credit scoring as well. The credit score is reflected in a number or letter(s) that summarizes the overall risk utilizing available information on the customer. Credit scoring models predict the probability that an applicant or existing borrower will default or become delinquent over a fixed time horizon.[c] The credit score empowers users to make quick decisions or even to automate decisions, and this is extremely desirable when banks are dealing with large volumes of clients and relatively small profit margins at individual transaction level.

Credit scoring models can be classified into three main categories: application, behavioral, and collection models, depending on the stage of the consumer credit cycle in which they are used. The main difference between them lies in the set of variables that are available to estimate the client’s creditworthiness, that is, the earlier the stage in the credit cycle, the less client-specific information is available to the bank. This generally means that application models have a lower prediction power than behavioral and collection models.

Over the last 50 years, several statistical methodologies have been used to build credit scoring models. The very simplistic univariate analysis applied at the beginning (late 1950s) was replaced as soon as academic research started to focus on credit scoring modeling techniques (late 1960s). The seminal works in this field of Beaver [10] and Altman [1] introduced the multivariate discriminant analysis (MDA) that became the most popular statistical methodology used to estimate credit scoring models until Ohlson [26], for the first time, applied the conditional logit model to the study of default prediction. Since Ohlson’s research (early 1980s), several other statistical techniques have been utilized to improve the prediction power of credit scoring models (e.g., linear regression, probit analysis, Bayesian methods, neural network, etc.), but logistic regression still remains the most popular method.

Lately, credit scoring has gained new importance with the new Basel Capital Accord. The so-called Basel II replaces the current 1988 capital accord and focuses on techniques that allow banks and supervisors to properly evaluate the various risks that banks face (see Internal-ratings-based Approach; Regulatory Capital). Since credit scoring contributes broadly to the internal risk assessment process of an institution, regulators have enforced stricter rules about model development, implementation, and validation to be followed by banks that wish to use their internal models in order to estimate capital requirements.

The remainder of the article is structured as follows. In the second section, we review some of the most relevant research related to credit scoring modeling methodologies. In the third section, following the model lifecycle structure, we analyze the main steps related to the model assessment, implementation, and validation process.

The statistical techniques used for credit scoring are based on the idea of discrimination between several groups in a data sample. These procedures originated in the 1930s and 1940s of the previous century [18]. At that time, some of the finance houses and mail order firms were having difficulties with their credit management. Decision whether to give loans or send merchandise to the applicants was
made judgmentally by credit analysts. The decision procedure was nonuniform, subjective, and opaque; it depended on the rules of each financial house and on the personal and empirical knowledge of each single clerk. With the rising number of people applying for a credit card, it was impossible to rely only on credit analysts; an automated system was necessary. The first consultancy was formed in San Francisco by Bill Fair and Earl Isaac in the late 1950s.

After the first empirical solutions, academic interest in the topic rose and, given the lack of consumer-lending figures, researchers focused their attention on small business clients. The seminal works in this field were Beaver [10] and Altman [1], who developed univariate and multivariate models, applying an MDA technique to predict business failures using a set of financial ratios.[d]

For many years thereafter, MDA was the prevalent statistical technique applied to the default prediction models and it was used by many authors [2, 3, 13, 15, 16, 24, 29]. However, in most of these studies, authors pointed out that two basic assumptions of MDA are often violated when applied to the default prediction problems.[e] Moreover, in MDA models, the standardized coefficients cannot be interpreted like the slopes of a regression equation and, hence, do not indicate the relative importance of the different variables. Considering these problems of MDA, Ohlson [26], for the first time, applied the conditional logit model to the study of default prediction.[f] The practical benefits of the logit methodology are that it does not require the restrictive assumptions of MDA and allows working with disproportional samples. The performance of his models, in terms of classification accuracy, was lower than the one reported in the previous studies based on MDA, but he pointed out some reasons to prefer the logistic analysis.

From a statistical point of view, logit regression seems to fit well the characteristics of the default prediction problem, where the dependent variable is binary (default/nondefault) and the groups are discrete, nonoverlapping, and identifiable. The logit model yields a score between 0 and 1, which conveniently can be transformed into the probability of default (PD) of the client. Lastly, the estimated coefficients can be interpreted separately as the importance or significance of each of the independent variables in the explanation of the estimated PD. After the work of Ohlson [26], most of the academic literature [5, 11, 19, 27, 30] used logit models to predict default.

Several other statistical techniques have been tested to improve the prediction accuracy of credit scoring models (e.g., linear regression, probit analysis, Bayesian methods, neural network, etc.), but the empirical results have never shown really significant benefits.

Credit Scoring Models Lifecycle

As already mentioned, banks that want to implement the most advanced approach to calculate their minimum capital requirements (i.e., advanced internal rating based approach, A-IRB) are subject to stricter and common rules regarding how their internal models should be developed, implemented, and validated.[g] A standard model lifecycle has been designed to be followed by the financial institutions that want to implement the A-IRB approach. The lifecycle of every model is divided into several phases (assessment, implementation, validation) and regulators have published specific requirements for each one of them. In this section, we describe the key aspects of each model’s lifecycle phase.

Model Assessment

Credit scoring models are used to risk rank new or existing clients on the basis of the assumption that the future will be similar to the past. If an applicant or an existing client had a certain behavior in the past (e.g., paid back his debt or not), it is likely that a new applicant or client, with similar characteristics, will show the same behavior. As such, to develop a credit scoring model, we need a sample of past applicants’ or clients’ data related to the same product as the one we want to use our scoring model for. If historical data from the bank are available, an empirical model can be developed. When banks do not have data or do not have a sufficient amount of data to develop an empirical model, an expert or a generic model is the most popular solution.[h]

When a data sample covering the time horizon necessary for the statistical analysis (usually at least 1 year) is available, the performance of the clients inside the sample can be observed. We define performance as the default or nondefault event associated with each client.[i] This binary variable is the dependent variable used to run the regression analysis. The
Credit Scoring 3

characteristics of the client at the beginning of the selected period are the predictors.

Following the literature discussed in the second section, a conditional probability model, the logit model, is commonly used by most banks to estimate the 1-year score through a range of variables by maximizing the log-likelihood function. This procedure is used to obtain the estimates of the parameters of the following logit model [20, 21]:

$$P_1(X_i) = \frac{1}{1 + e^{-(B_0 + B_1 X_{i1} + B_2 X_{i2} + \cdots + B_n X_{in})}} = \frac{1}{1 + e^{-D_i}} \qquad (1)$$

where $P_1(X_i)$ is the score given the vector of attributes $X_i$; $B_j$ is the coefficient of attribute $j$ (with $j = 1, \ldots, n$); $B_0$ is the intercept; $X_{ij}$ is the value of attribute $j$ (with $j = 1, \ldots, n$) for customer $i$; and $D_i$ is the logit for customer $i$.

The logistic function implies that the logit score $P_1$ takes a value in the interval [0, 1] and is increasing in $D_i$. If $D_i$ approaches minus infinity, $P_1$ tends to zero, and if $D_i$ approaches plus infinity, $P_1$ tends to one.

The set of attributes that are used in the regression depends on the type of model that is going to be developed. Application models, employed to decide whether to accept or reject an applicant, typically rely only on personal information about the applicant, given the fact that this is usually the only information available to the bank at that stage.[j] Behavioral and collection models include variables describing the status of the relationship between the client and the bank that may add significant prediction power to the model.[k]

Once the model is developed, it needs to be tested on a test sample to confirm the soundness of its results. When enough data are available, part of the development sample (hold-out sample) is usually kept for the final test of the model. However, an optimal test of the model would require investigating its performance also on an out-of-time and out-of-universe sample.

Model Implementation

The main advantage of scoring models is to allow banks to implement automated decision systems to manage their retail clients (private individuals and SMEs). When a large number of applicants or clients are manually referred to credit analysts to check their information and apply policy rules, most of the benefits associated with the use of scoring models are lost. On the other hand, any scoring model has a “gray” area where it is not able to separate with an acceptable level of confidence between expected “good” clients and expected “bad” ones.[l] The main challenge for credit risk managers is to define the most appropriate and efficient thresholds (cutoffs) for each scoring model.

In order to maximize the benefits of a scoring model, the optimal cutoff should be set taking into account the misclassification costs related to the type I and type II error rates, as Altman et al. [2], Taffler [29], and Koh [23] point out. Moreover, we believe that the optimum cutoff value cannot be found without a careful consideration of each bank's particular characteristics (e.g., tolerance for risk, profit–loss objectives, recovery process costs and efficiency, possible marketing strategies). Today, the most advanced banks set cutoffs using profitability analyses at account level.

The availability of sophisticated IT systems has significantly broadened the number of strategies that can be implemented using credit scoring models. The most efficient banks are able to follow the lifecycle of any client, from the application to the end of the relationship, with monthly updated scores calculated by different scorecards related to the phase of the credit cycle where the client is located (e.g., origination, account maintenance, collection, write off). Marketing campaigns (e.g., cross-selling, up-selling), automated limit changes, early collection strategies, and shadow limit management are some of the activities that are fully driven by the output of scoring models in most banks.

Model Validation

Banks that have adopted or are willing to adopt the Basel II IRB-advanced approach are required to put in place a regular cycle of model validation that should include at least monitoring of the model performance and stability, reviewing of the model relationships, and testing of model outputs against outcomes (i.e., backtesting).[m]

Considering the relatively short lifecycle of credit scoring models due to the high volatility of retail markets, their validation has always been completed
by banks. Basel II has only given it a more official shape, prescribing that the validation should be undertaken by a team independent from the one that has developed the models.

Stability and performance (i.e., prediction accuracy) are extremely important indicators of the quality of the scoring models. As such, they should be tracked and analyzed at least monthly by banks, regardless of the validation exercise. As we have discussed above, scoring models are often used to generate a considerable number of automated decisions that may have a significant impact on the banking business. Even small changes in the population’s characteristics can substantially affect the quality of the models, creating undesired selection bias.

In the literature, we have found several indexes that have been used to assess the performance of the models. The simple type I and type II error rates that quantify the accuracy of each model in correctly classifying defaulted and nondefaulted observations have been the first measures to be applied to scoring models. More recently, the accuracy ratio (AR) and the Gini index have become the most popular measures (see [17] for further details).

Backtesting and benchmarking are an essential part of the scoring models’ validation. With backtesting, we evaluate the calibration and discrimination of a scoring model. Calibration refers to the mapping of a score to a quantitative risk measure (e.g., PD). A scoring model is considered well calibrated if the (ex ante) estimated risk measures (PD) deviate only marginally from what has been observed ex post (actual default rate per score band). Discrimination measures how well the scoring model provides an ordinal ranking of the risk profile of the observations in the sample; for example, in the credit risk context, discrimination measures to what extent defaulters were assigned low scores and nondefaulters high scores.

Benchmarking is another quantitative validation method that aims at assessing the consistency of the estimated scoring models with those obtained using other estimation techniques and potentially using other data sources. This analysis may be quite difficult to perform for retail portfolios, given the lack of generic benchmarks in the market.[n]

Lastly, we would like to point out that Basel II specifically requires senior management to be fully involved and aware of the quality and performance of all the scoring models utilized in the daily business (see [9], par. 438, 439, 660, 718 (LXXVI), 728).

End Notes

a. Risk appetite is defined as the maximum risk the bank is willing to accept in executing its chosen business strategy, to protect itself against events that may have an adverse impact on its profitability, the capital base, or share price (see Economic Capital Allocation; Economic Capital).
b. Recently, several studies [4, 12] have shown the importance for banks of classifying SMEs as retail clients and applying credit scoring models developed specifically for them.
c. The default definition may be significantly different by bank and type of client. The new Basel Capital Accord [9] (par. 452) has given a common definition of default (i.e., 90 days past due over 1-year horizon) that is consistently used by most banks today.
d. The original Z-score model of Altman [1] used five ratios: working capital/total assets, retained earnings/total assets, EBIT/total assets, market value equity/BV of total debt, and sales/total assets.
e. MDA is based on two restrictive assumptions: (i) the independent variables included in the model are multivariate normally distributed and (ii) the group dispersion matrices (or variance–covariance matrices) are equal across the failing and the nonfailing group. See [6, 22, 25] for further discussions about this topic.
f. Zmijewski [31] was the pioneer in applying probit analysis to predict default, but, until now, logit analysis has given better results in this field.
g. The new Basel Capital Accord offers financial institutions the possibility to choose between the standardized and the advanced approach to calculate their capital requirements. Only the latter requires banks to use their own internal risk assessment tools to quantify the inputs of the capital requirements formulas (i.e., PD and loss given default).
h. Expert scorecards are based on subjective weights assigned by an analyst, whereas generic scorecards are developed on pooled data from other banks operating in the same market. For a more detailed analysis of the possible solutions that banks can consider when not enough historical data is available, see [28].
i. See end note (b).
j. The most common application variables used are sociodemographic information about the applicants (e.g., marital status, residence type, time at current address, type of work, time at current work, flag phone, number of children, installment on income, etc.). When a credit bureau is available in the market, the information that can be obtained related to the behavior of the applicant with other financial institutions is an extremely powerful variable to be used in application models.
k. Variables used in behavioral and collection scoring models are calculated and updated at least monthly. As such, the
correlation between these variables and the default event is significantly high. Examples of behavioral variables are as follows: the number of missed installments (current, max last 3/6/12 months, or ever), number of days in excess (current, max last 3/6/12 months, or ever), outstanding on limit, and so on. Behavioral score can be calculated at facility and customer level (when several facilities are related to the same client).
l. Depending on the chosen binary-dependent variable, “good” and “bad” will have different meanings. For credit risk models, these terms are usually associated with nondefaulted and defaulted clients, respectively.
m. See par. 417 and 718 (XCix) of the new Basel Capital Accord [7–9] (see also Model Validation; Backtesting).
n. Recently, rating agencies (e.g., Standard & Poor’s and Moody’s) and credit bureau providers (e.g., Fair Isaac and Experian) have started to offer services of benchmarking for retail scoring models. For more details about backtesting and benchmarking techniques, see [14].

References

[1] Altman, E.I. (1968). Financial ratios, discriminant analysis and the prediction of corporate bankruptcy, Journal of Finance 23(4), 589–611.
[2] Altman, E.I., Haldeman, R.G. & Narayanan, P. (1977). Zeta-analysis. A new model to identify bankruptcy risk of corporations, Journal of Banking and Finance 1, 29–54.
[3] Altman, E.I., Hartzell, J. & Peck, M. (1995). A Scoring System for Emerging Market Corporate Debt. Salomon Brothers Emerging Markets Bond Research, May 15.
[4] Altman, E.I. & Sabato, G. (2005). Effects of the new Basel capital accord on bank capital requirements for SMEs, Journal of Financial Services Research 28(1/3), 15–42.
[5] Aziz, A., Emanuel, D.C. & Lawson, G.H. (1988). Bankruptcy prediction – an investigation of cash flow based models, Journal of Management Studies 25(5), 419–437.
[6] Barnes, P. (1982). Methodological implications of non-normally distributed financial ratios, Journal of Business Finance and Accounting 9(1), 51–62.
[7] Basel Committee on Banking Supervision (2005). Studies on the Validation of Internal Rating Systems. Working paper 14, www.bis.org.
[8] Basel Committee on Banking Supervision (2005). Update on Work of the Accord Implementation Group Related to Validation Under the Basel II Framework. Newsletter 4, www.bis.org.
[9] Basel Committee on Banking Supervision (2006). International Convergence of Capital Measurement and Capital Standards. www.bis.org.
[10] Beaver, W. (1967). Financial ratios as predictors of failure, Journal of Accounting Research 4, 71–111.
[11] Becchetti, L. & Sierra, J. (2003). Bankruptcy risk and productive efficiency in manufacturing firms, Journal of Banking and Finance 27(11), 2099–2120.
[12] Berger, A.N. & Frame, S.W. (2007). Small business credit scoring and credit availability, Journal of Small Business Management 45(1), 5–22.
[13] Blum, M. (1974). Failing company discriminant analysis, Journal of Accounting Research 12(1), 1–25.
[14] Castermans, G., Martens, D., Van Gestel, T., Hamers, B. & Baesens, B. (2007). An overview and framework for PD backtesting and benchmarking, Proceedings of Credit Scoring and Credit Control X, Edinburgh, Scotland.
[15] Deakin, E. (1972). A discriminant analysis of predictors of business failure, Journal of Accounting Research 10(1), 167–179.
[16] Edmister, R. (1972). An empirical test of financial ratio analysis for small business failure prediction, Journal of Financial and Quantitative Analysis 7(2), 1477–1493.
[17] Engelmann, B., Hayden, E. & Tasche, D. (2003). Testing rating accuracy, Risk 16(1), 82–86.
[18] Fisher, R.A. (1936). The use of multiple measurements in taxonomic problems, Annals of Eugenics 7, 179–188.
[19] Gentry, J.A., Newbold, P. & Whitford, D.T. (1985). Classifying bankrupt firms with funds flow components, Journal of Accounting Research 23(1), 146–160.
[20] Gujarati, N.D. (2003). Basic Econometrics, 4th Edition, McGraw-Hill, London.
[21] Hosmer, D.W. & Lemeshow, S. (2000). Applied Logistic Regression, 2nd Edition, John Wiley & Sons, New York.
[22] Karels, G.V. & Prakash, A.J. (1987). Multivariate normality and forecasting of business bankruptcy, Journal of Business Finance & Accounting 14(4), 573–593.
[23] Koh, H.C. (1992). The sensitivity of optimal cutoff points to misclassification costs of Type I and Type II errors in the going-concern prediction context, Journal of Business Finance & Accounting 19(2), 187–197.
[24] Lussier, R.N. (1995). A non-financial business success versus failure prediction model for young firms, Journal of Small Business Management 33(1), 8–20.
[25] Mc Leay, S. & Omar, A. (2000). The sensitivity of prediction models to the non-normality of bounded and unbounded financial ratios, British Accounting Review 32, 213–230.
[26] Ohlson, J. (1980). Financial ratios and the probabilistic prediction of bankruptcy, Journal of Accounting Research 18(1), 109–131.
[27] Platt, H.D. & Platt, M.B. (1990). Development of a class of stable predictive variables: the case of bankruptcy prediction, Journal of Business Finance & Accounting 17(1), 31–51.
[28] Sabato, G. (2008). Managing credit risk for retail low-default portfolios, in Credit Risk: Models, Derivatives and Management, N. Wagner, ed., Financial Mathematics Series, Chapman & Hall/CRC.
[29] Taffler, R.J. & Tisshaw, H. (1977). Going, going, gone – four factors which predict, Accountancy 88(1083), 50–54.
[30] Zavgren, C. (1983). The prediction of corporate failure: the state of the art, Journal of Accounting Literature 2, 1–37.
[31] Zmijewski, M.E. (1984). Methodological issues related to the estimation of financial distress prediction models, Journal of Accounting Research 22, 59–86.

Further Reading

Taffler, R.J. (1982). Forecasting company failure in the UK using discriminant analysis and financial ratio data, Journal of the Royal Statistical Society 145(3), 342–358.

Related Articles

Backtesting; Credit Rating; Credit Risk; Internal-ratings-based Approach; Model Validation.

GABRIELE SABATO
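
As a worked illustration for the Credit Scoring entry above, the logit score of its equation (1) and the cost-based choice of cutoff discussed under Model Implementation can be sketched in Python. This is only a minimal sketch under stated assumptions: the coefficients, attribute values, default flags, misclassification costs, and the function names `logit_score`, `misclassification_cost`, and `best_cutoff` are all hypothetical and are not taken from the entry or from any bank's model. The score is read as a PD, so clients at or above the cutoff are rejected.

```python
import math

def logit_score(intercept, coefs, attributes):
    """Equation (1): P1 = 1 / (1 + exp(-D)), where D = B0 + sum_j Bj * Xij."""
    d = intercept + sum(b * x for b, x in zip(coefs, attributes))
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)  # algebraically the same value; avoids overflow for very negative d
    return e / (1.0 + e)

def misclassification_cost(cutoff, scores, defaulted, cost_accept_bad, cost_reject_good):
    """Total cost when every client with score >= cutoff is rejected."""
    cost = 0.0
    for score, bad in zip(scores, defaulted):
        rejected = score >= cutoff
        if bad and not rejected:
            cost += cost_accept_bad    # a defaulter was accepted
        elif rejected and not bad:
            cost += cost_reject_good   # a nondefaulter was rejected
    return cost

def best_cutoff(scores, defaulted, cost_accept_bad, cost_reject_good):
    """Return (cutoff, cost) with the lowest total cost; ties go to the lowest cutoff."""
    candidates = sorted(set(scores)) + [math.inf]  # math.inf means "accept everyone"
    costed = [
        (misclassification_cost(c, scores, defaulted, cost_accept_bad, cost_reject_good), c)
        for c in candidates
    ]
    cost, cutoff = min(costed)
    return cutoff, cost

# Hypothetical model and sample (illustrative values only).
B0, B = -3.0, [0.8, 1.5]
clients = [(0, 0), (1, 0), (0, 1), (2, 1), (1, 2), (3, 2)]
defaulted = [False, False, True, False, True, True]

scores = [logit_score(B0, B, x) for x in clients]
cutoff, cost = best_cutoff(scores, defaulted, cost_accept_bad=5.0, cost_reject_good=1.0)
print("scores:", [round(s, 4) for s in scores])
print("cutoff:", round(cutoff, 4), "total cost:", cost)
```

The search is restricted to the observed scores because the total cost can only change when the cutoff crosses one of them. A profitability analysis at account level, as mentioned in the entry, would replace the two flat costs with account-specific figures.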
Credit Rating

The cornerstone of credit risk measurement and management for a financial institution is the credit rating, whether supplied by an external credit rating agency (CRA) or generated by an internal credit model. A credit rating represents an overall assessment of the creditworthiness of a borrower, obligor, or counterparty, and is thus meant to reflect only credit or default risk. That obligor may be either a firm or an individual. In this way the rating is a forecast, and like all forecasts it is noisy. For that reason, credit rating agencies make use of discrete ratings. The convention used by the three largest rating agencies, namely, Fitch, Moody’s, and Standard & Poor’s (S&P), is to have seven credit grades. They are, from best to worst, AAA, AA, A, BBB, BB, B, and CCC using S&P and Fitch’s nomenclature, and Aaa, Aa, A, Baa, Ba, B, and Caa using Moody’s nomenclature.[b] As a firm migrates from a higher to a lower credit rating, that is, it is downgraded, it simply moves closer to default.

The market for credit ratings in the United States is dominated by two players: S&P and Moody’s Investors Service; of the smaller rating agencies, only Fitch plays a significant role in the United States (although it has a more substantial presence elsewhere).[c] The combined market share of Moody’s and S&P is 80%, and once the market share of Fitch is added, the total exceeds 95% [20].

To be sure, it is not the obligor but the instrument issued by the obligor that receives a credit rating, though an obligor rating typically corresponds to the credit risk of a senior unsecured debenture issued by that firm. The distinction is not that relevant for corporate bonds, where the obligor rating is commensurate with the rating on a senior unsecured instrument, but is quite relevant for structured credit products such as asset-backed securities (ABS). Nonetheless, as stated in a recent S&P document, “[o]ur ratings represent a uniform measure of credit quality globally and across all types of debt instruments. In other words, an ‘AAA’ rated corporate bond should exhibit the same degree of credit quality as an ‘AAA’ rated securitized issue.” (44, p. 4).

This stated intent implies that an investor can assume that, say, a double-A rated instrument is the same in the United States as in Belgium or Singapore, regardless of whether that instrument is a standard corporate bond or a structured product such as a tranche on a collateralized debt obligation (CDO) (see Collateralized Debt Obligations (CDO)); see also [31]. The actual behavior of rated obligors or instruments may turn out to have more heterogeneity across countries, industries, and product types, and there is substantial supporting evidence. See [37] for evidence of variation across countries of domicile and industries for corporate bond ratings, and [17] for differences between corporate bonds and structured products.

The rating agencies differ about what exactly is assessed. Fitch and S&P evaluate an obligor’s overall capacity to meet its financial obligation, so their rating is best thought of as an estimate of probability of default, whereas Moody’s assessment incorporates some judgment of recovery in the event of loss. In the argot of credit risk management, S&P measures PD (probability of default), whereas Moody’s measure is somewhat closer to EL (expected loss) [9].[d] These differences seem to remain for structured products. In describing their ratings criteria and methodology for structured products, S&P states the following: “[w]e base our ratings framework on the likelihood of default rather than expected loss or loss given default. In other words, our ratings at the rated instrument level don’t incorporate any analysis or opinion on post-default recovery prospects.” (44, p. 3). Since 2005, Fitch, followed soon after by S&P and Moody’s, have started publishing recovery ratings (each on a six-point scale). Market sector coverage has been different, but expanding, across the agencies. Also, application differs between corporate and structured products.[e]

Credit ratings issued by the agencies typically represent an unconditional view, sometimes also called cycle-neutral or through the cycle: the rating agencies’ own descriptions of their rating methodologies broadly support this view.

(33, pp. 6–7): . . . [O]ne of Moody’s goals is to achieve stable expected [italics in original] default rates across rating categories and time. . . . Moody’s believes that giving only a modest weight to cyclical conditions best serves the interests of the bulk of investors.[f]

(43, p. 41): Standard & Poor’s credit ratings are meant to be forward looking; . . . Accordingly, the anticipated ups and downs of business cycles, whether industry-specific or related to the
general economy, should be factored into the credit rating all along. . . . The ideal is to rate ‘through the cycle’.

This unconditional or firm-specific view of credit risk stands in contrast to risk measures such as EDFs (expected default frequencies) from Moody’s KMV. An EDF has two principal inputs: firm leverage and asset volatility, where the latter is derived from equity (stock price) volatility; see [24] for a description. As a result, EDFs can change frequently and significantly since they reflect the stock market’s view of risk for that firm at a given point in time, a view which incorporates both systematic and idiosyncratic risk.

Unfortunately, there is substantial evidence that credit rating changes, including changes to default, exhibit procyclical or systematic variation [7, 27, 37], especially for speculative grades [23].

Although this article’s focus is on credit ratings for corporate entities, including special purpose entities such as structured credit, individuals also receive credit ratings or scores. These are important for obtaining either unsecured credit like a credit card, or even a mobile phone, as well as secured credit such as an auto loan or lease or a mortgage. They have received considerable attention of late in the context of subprime mortgages and their securitization. Nonetheless, because the individual credit exposures are typically small, even considering mortgages, banks have tried to automate the credit assessment of retail exposures as much as possible, often with the help of outside firms called credit bureaus. See [22] for a discussion with application to credit cards, and [1] for a broader survey on retail credit (see also Credit Scoring for a review of credit scoring models).

How to Generate a Credit Rating

One of the earliest studies of predicting firm bankruptcy, perhaps the most obvious form of borrower default, is [2]. Altman constructed a balanced sample of 33 defaulted and 33 nondefaulted firms to build a bankruptcy prediction model using multiple-discriminant analysis. His choice of condition variables (ratios reflecting firm leverage and profitability) has influenced default models to this day, and therefore it is worth showing the final model here:

$$Z = 0.012X_1 + 0.014X_2 + 0.033X_3 + 0.006X_4 + 0.999X_5 \qquad (1)$$

where
$X_1$ = working capital/total assets,
$X_2$ = retained earnings/total assets,
$X_3$ = earnings before interest and taxes/total assets,
$X_4$ = market value of equity/book value of total debt, and
$X_5$ = sales/total assets.

A large value of Z indicates high credit quality; the firm is far from its default threshold. The simplicity of this model makes it deceptively easy to criticize: Have the coefficients remained the same? Is it applicable to all industries (financial firms are typically much more highly leveraged, that is, $X_4$ is small, than nonfinancial firms) or all countries? Is the relationship really linear in the conditioning variables? And why are nonfinancial variables such as firm age or some measure of management quality not considered? However, the Altman Z-score endures to this day; it can be found for most publicly traded firms on any Bloomberg terminal.

The next important innovation in credit modeling is arguably Merton’s [32] option-based default model (see Default Barrier Models). Merton recognized that a lender is effectively writing a put option on the assets of the borrowing firm; owners and owner-managers (i.e., shareholders) hold the call option. Thus a firm is expected to default when the value of its assets falls below a threshold value determined by its liabilities. To this day all credit models owe an intellectual debt to Merton’s insights. The best-known commercial application of the Merton model is the Moody’s KMV EDF (expected default frequency) model.

Clearly, quantitative information obtained from (public) accounting data, metrics such as leverage, profitability, debt service coverage, and liquidity, is important for arriving at a credit assessment of a firm. In addition, a rating agency, because it has access to private information about the firm, can and does include qualitative information such as the quality of management [3, 28]. Indeed this is partly what makes credit rating agencies unique and important: they aggregate public and private
information into a single measure of creditworthiness (riskiness) and then make that summary statistic (the credit rating) public, essentially providing a public good [47]. By contrast a Moody’s KMV EDF makes use only of public information, although it transforms this information using a proprietary methodology.[g]

In fact, rating agencies are in the business of not just information production but, in the words of Boot et al. [12], they also act as “information equalizers” [quotes in the original]. In this way, they serve as a coordinating mechanism or focal point to the financial markets.

Model Performance

All credit scoring or rating models map a set of financial and nonfinancial variables into the unit interval: the objective is to generate a probability of default, to separate the defaulters from the nondefaulters. Unsurprisingly, there is a plethora of modeling choices as documented, for instance, by Resti and Sironi [41]. However, in the horse race of default prediction models, the hazard approach as shown in [16, 42] seems to be emerging as the winner. See [11] for a recent overview. While we can say little about the performance of bank internal credit scoring models, since they are proprietary, we can examine the empirical default experience of firms with a rating from a credit rating agency.

Highly rated firms default quite rarely. For example, Moody’s reports that the 1-year investment grade default rate over the period 1983–2007 was 0.069% or 6.9 bp [35]. This is an average over four letter grade ratings: Aaa through Baa. Thus in a pool of 10 000 investment grade obligors or instruments we would expect seven defaults over the course of 1 year. But what if only four default? What about 11? Higher than expected default could be the result of either bad luck or a bad model, and it is very hard to distinguish between the two, especially for small probabilities (see also [29] and Backtesting). Indeed the use of the regulatory color scheme (green, amber, red), which is behind the 1996 Market Risk Amendment to Basel I, was motivated precisely by this recognition, and in that case the probability to be validated is comparatively large, 1% (for 99% VaR) [8], with daily data.

Although rating agencies insist that their ratings scale reflects an ordinal ranking of credit risk, they also publish default rates for different horizons by rating. Thus we would expect default rates or probabilities to be monotonically increasing as one descends the credit spectrum. Using S&P rating histories, Hanson and Schuermann [23] show formally that monotonicity is violated frequently for most notch-level investment grade 1-year estimated default probabilities. The precision of the PD point estimates is quite low; there have been no defaults over 1 year for triple-A or AA+ (Aa1) rated firms, yet surely we do not believe that the 1-year probability of default is identically equal to zero. The new Basel Capital Accord (see Regulatory Capital), perhaps with this in mind, has set a lower bound of 3 bp for any PD estimate (10, §285), commensurate with about a single-A rating. Trück and Rachev [46] show the economic impact resulting from such uncertainty using bank internal ratings and a corresponding loan portfolio. Pluto and Tasche [40] propose a conservative approach to generating PD estimates for low-default portfolios.

Despite this lack of statistical precision, Kliger and Sarig [26] show that bond ratings contain price-relevant information by taking advantage of a natural experiment. On April 26, 1982, Moody’s introduced overnight modifiers to their rating system, much like the notching used by S&P and Fitch, effectively introducing finer credit rating information about their issuer base without any change in the firm fundamentals. They find that bond prices indeed adjust to the new information, as do stock prices, and that any gains enjoyed by bondholders are offset by losses suffered by stockholders.

Although the 1-year horizon is typical in credit analysis (and is also the horizon used in Basel II), most traded credit instruments have longer maturity. For example, the typical CDS contract (see Credit Default Swaps) is five years, and over that horizon there are positive empirical default rates for Aaa and Aa, which Moody’s reports to be 7.8 bp and 18.3 bp, respectively [35].

The preceding discussion highlights the difficulty of accurately forecasting such small PDs. Empirical estimates of PDs using credit rating histories can be quite noisy, even with over 25 years of data. Under the new Basel Capital Accord (Basel II), US regulators would require banks to have a minimum of seven nondefault rating categories [21].
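
The question raised earlier in this section (what if only four, or as many as 11, of 10 000 investment grade obligors default?) can be made concrete with a small calculation. The sketch below is an illustration only and rests on a strong simplifying assumption that the article does not make: defaults are treated as independent events sharing the same 1-year PD of 6.9 bp, so that the default count is binomial. In practice defaults are correlated, which widens the range of plausible outcomes further. The function names `binom_pmf` and `binom_cdf` are hypothetical helpers written for this example.

```python
import math

def binom_pmf(k, n, p):
    """P(exactly k defaults among n obligors), for 0 < p < 1, computed in logs."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_comb + k * math.log(p) + (n - k) * math.log1p(-p))

def binom_cdf(k, n, p):
    """P(at most k defaults among n obligors)."""
    return sum(binom_pmf(j, n, p) for j in range(k + 1))

n_obligors = 10_000
pd_1y = 0.00069  # 6.9 bp, the 1-year investment grade default rate quoted above

expected_defaults = n_obligors * pd_1y                    # about seven
p_at_most_4 = binom_cdf(4, n_obligors, pd_1y)             # "only four default"
p_at_least_11 = 1.0 - binom_cdf(10, n_obligors, pd_1y)    # "what about 11?"

print(f"expected defaults: {expected_defaults:.1f}")
print(f"P(defaults <= 4):  {p_at_most_4:.3f}")
print(f"P(defaults >= 11): {p_at_least_11:.3f}")
```

Under these assumptions the two tail probabilities come out at roughly 18% and 9%, so neither outcome would by itself be strong evidence against the PD estimate, which is the bad-luck-versus-bad-model difficulty described above.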
Internal Ratings

With the roll-out of the New Basel Capital Accord, internal credit ratings will become widespread. For banks seeking to qualify for the internal-ratings-based (IRB) approach (see Internal-ratings-based Approach), which allows a bank to use its own internal credit rating, the accord provides the following rather nonspecific guidance (10, §461):

Banks must use information and techniques that take appropriate account of the long-run experience when estimating the average PD for each rating grade. For example, banks may use one or more of the three specific techniques set out below: internal default experience, mapping to external data, and statistical default models.

Since bank internal ratings are proprietary, not much is known (publicly) about their exact construction or about their performance. Carey and Hrycay’s [15] study of internal ratings in US banks suggests that such rating systems rely at least to some degree on external ratings themselves, either by using them directly, when available, or calibrating internal credit scores to external ratings. Several books by practitioners and academics with practitioner experience, for example, [18, 30, 38, 41], indicate that the methods used are, perhaps unsurprisingly, much along the lines of the models covered above: statistical approaches (discriminant models in the manner of Altman’s Z-score, logistic regression, neural networks, decision trees, and so on) that make use of firm financials, augmented with a judgmental overlay that incorporates qualitative information. This more intangible information is especially important when lending to small and young firms with no footprint in the capital markets, as shown, for instance, by Peterson and Rajan [39]. Ashcraft [5] documents that the failure of even healthy banks is followed by largely permanent declines in real activity, which is attributed to the destruction of private information about informationally opaque borrowers. There are now some papers emerging that attempt to formalize the incorporation of qualitative and subjective information, for example, from a loan officer, into bank internal credit scores or ratings. See, for instance, [25, 45].

Ratings for Structured Credit Products

Corporate bond (obligor) ratings are largely based on firm-specific risk characteristics. Since ABS structures represent claims on cash flows from a portfolio of underlying assets, the rating of a structured credit product must take into account systematic risk. It is correlated losses that matter especially for the more senior (higher rated) tranches, and loss correlation arises through dependence on shared or common (or systematic) risk factors.[h] For ABS deals that have a large number of underlying assets, for instance, mortgage-backed securities (MBS), the portfolio is large enough such that all idiosyncratic risk is diversified away, leaving only systematic exposure to the risk factors particular to that product class (here, mortgages). By contrast, a substantial amount of idiosyncratic risk may remain in ABS transactions with smaller asset pools, for instance, CDOs [4, 17].

Because these deals are portfolios, the effect of correlation is not the same for all tranches (see CDO Tranches: Impact on Economic Capital): equity tranche holders prefer higher correlation, whereas senior tranches prefer lower correlation (tail losses are driven by loss correlation). As correlation increases, so does portfolio loss volatility. The payoff function for the equity tranche is, true to its name, like a call option. Indeed equity itself is a call option on the assets of the underlying firm, and the value of a call option is increasing in volatility. If the equity tranche is long a call option, then the senior tranche is short a call option, so that their payoffs behave in an opposite manner. The impact of increased correlation on the value of mezzanine tranches is ambiguous and depends on the structure of a particular deal [19]. By contrast, correlation with systematic risk factors should not matter for corporate ratings (see also Base Correlation and Modeling Correlation of Structured Instruments in a Portfolio Setting for details on default correlation modeling).

As a result of the portfolio nature of the rated products, the ratings migration behavior may also be different than for ordinary obligor ratings. Moody’s Investors Service [34] reports that rating changes are much more common for corporate bond than for structured product ratings, but the magnitude of changes (number of notches up- or downgraded) was nearly double for the structured products.[i] There are potentially two reasons for this difference: model error or greater sensitivity of performance to systemic factors.

The modeling approach for rating structured credit products is in flux as of this writing, driven by the poor performance during the turmoil in the credit

markets in 2007 and 2008. Moody’s, for instance, has recently proposed adding two new risk measures for structured finance transactions [36]. First, an “assumption volatility score” (or “V Score”) that would rate the uncertainty of a rating and the potential for future ratings volatility on a scale of 1–5 (low to high). Second, a “loss sensitivity” that would estimate the number of notches a tranche would be downgraded if the expected loss of the collateral pool were increased to the 95th percentile of the original loss distribution. Moody’s decided to develop these risk measures in addition to rather than in substitution for the standard credit ratings, which are on the same scale as their corporate ratings, precisely to allow investors to make a baseline comparison to other rated securities.

End Notes

a. Any views expressed represent those of the authors only and not necessarily those of the Federal Reserve Bank of New York or the Federal Reserve System.

b. For no reason other than convenience and expediency, we make use of the Fitch and S&P nomenclature for the remainder of the article.

c. As of this writing, there are ten “accredited” rating agencies in the United States (see [6] for a discussion on what it means to be “accredited”): A.M. Best, Dominion Bond Rating Service (DBRS), Egan-Jones Rating Company, Fitch Inc., Japan Credit Rating Agency Ltd., Moody’s Investors Services, Inc., LACE Financial Corp., Rating and Investment Information, Inc, Realpoint LLC, Standard & Poor’s Ratings Services. There are several other firms that provide credit opinions; one such example is Eagan–Jones Rating. An extensive list of rating agencies across the globe can be found at http://www.defaultrisk.com/rating agencies.htm. For an exposition of the history of the credit rating industry, see [47]. For a detailed institutional discussion, see [14].

d. Specifically, EL = PD × LGD, where LGD is loss given default. However, given the paucity of LGD data, little variation in EL that exists at the obligor (as opposed to instrument) level can be attributed to variation in LGD, making the distinction between the agencies modest at best.

e. See http://www.fitchratings.com/corporate/fitchResources.cfm?detail=1&rd file=intro#rtng actn.

f. This view was recently reinforced in [13]; both authors work for Moody’s.

g. In particular, the mapping from distance-to-default to EDF is proprietary.

h. Note that correlation includes more than just economic conditions, as it includes (i) model risk by the agencies, (ii) originator and arranger effects, and (iii) servicer effects.

i. The recent rash of downgrades for structured credit products in the wake of the subprime credit crisis may change this stylized fact.

References

[1] Allen, L., DeLong, G. & Saunders, A. (2004). Issues in the credit risk modeling of retail markets, Journal of Banking and Finance 28, 727–752.
[2] Altman, E.I. (1968). Financial ratios, discriminant analysis and the prediction of corporate bankruptcy, Journal of Finance 20, 589–609.
[3] Altman, E.I. & Rijken, H.A. (2004). How rating agencies achieve rating stability, Journal of Banking and Finance 28, 2679–2714.
[4] Amato, J.D. & Remolona, E.M. (2005). The Pricing of Unexpected Credit Losses. BIS Working Paper No. 190.
[5] Ashcraft, A. (2005). Are banks really special? New evidence from the FDIC-induced failure of healthy banks, American Economic Review 95, 1712–1730.
[6] Ashcraft, A. & Schuermann, T. (2008). Understanding the Securitization of Subprime Mortgage Credit. Foundations and Trends in Finance 2, 191–309.
[7] Bangia, A., Diebold, F.X., Kronimus, A., Schagen, C. & Schuermann, T. (2002). Ratings migration and the business cycle, with applications to credit portfolio stress testing, Journal of Banking and Finance 26(2/3), 445–474.
[8] Basel Committee on Banking Supervision (1996). Amendment to the Capital Accord to Incorporate Market Risks. Basel Committee Publication No. 24. Available: www.bis.org/publ/bcbs24.pdf.
[9] Basel Committee on Banking Supervision (2000). Credit Ratings and Complementary Sources of Credit Quality Information. BCBS Working Paper No. 3, available at http://www.bis.org/publ/bcbs wp3.htm.
[10] Basel Committee on Banking Supervision (2005). International Convergence of Capital Measurement and Capital Standards: A Revised Framework. Available at http://www.bis.org/publ/bcbs118.htm.
[11] Bharath, S.T. & Shumway, T. (2008). Forecasting default with the Merton distance to default model, Review of Financial Studies 21, 1339–1369.
[12] Boot, A.W.A., Milbourn, T.T. & Schmeits, A. (2006). Credit ratings as coordination mechanisms, Review of Financial Studies 19, 81–118.
[13] Cantor, R. & Mann, C. (2007). Analyzing the tradeoff between ratings accuracy and stability, Journal of Fixed Income 16(4), 60–68.
[14] Cantor, R. & Packer, F. (1995). The credit rating industry, Journal of Fixed Income 5(3), 10–34.
[15] Carey, M. & Hrycay, M. (2001). Parameterizing credit risk models with rating data, Journal of Banking and Finance 25, 197–270.
[16] Chava, S. & Jarrow, R.A. (2004). Bankruptcy prediction with industry effects, Review of Finance 8, 537–569.

[17] Committee on the Global Financial System (2005). The Role of Ratings in Structured Finance: Issues and Implications. Available at http://www.bis.org/publ/cgfs23.htm, January.
[18] Crouhy, M., Galai, D. & Mark, R. (2000). Risk Management, McGraw-Hill, New York.
[19] Duffie, Darrell. (2007). Innovations in Credit Risk Transfer: Implications for Financial Stability. Stanford University GSB Working Paper, available at http://www.stanford.edu/~duffie/BIS.pdf.
[20] The Economist (2007). Measuring the Measurers. May 31.
[21] Federal Reserve Board (2003). Supervisory Guidance on Internal Ratings-Based Systems for Corporate Credit. Attachment 2 in http://www.federalreserve.gov/boarddocs/meetings/2003/20030711/attachment.pdf.
[22] Gross, D. & Souleles, N. (2002). An empirical analysis of personal bankruptcy and delinquency, Review of Financial Studies 15, 319–347.
[23] Hanson, S.G. & Schuermann, T. (2006). Confidence intervals for probabilities of default, Journal of Banking and Finance 30(8), 2281–2301.
[24] Kealhofer, S. & Kurbat, M. (2002). Predictive Merton models, Risk February, 67–71.
[25] Kiefer, N. (2007). The probability approach to default estimation, Risk July, 146–150.
[26] Kliger, D. & Sarig, O. (2000). The information value of bond ratings, Journal of Finance 55(6), 2879–2902.
[27] Lando, D. & Skødeberg, T. (2002). Analyzing ratings transitions and rating drift with continuous observations, Journal of Banking and Finance 26(2/3), 423–444.
[28] Löffler, G. (2004). Ratings versus market-based measures of default risk in portfolio governance, Journal of Banking and Finance 28, 2715–2746.
[29] Lopez, J.A. & Saidenberg, M. (2000). Evaluating credit risk models, Journal of Banking and Finance 24, 151–165.
[30] Marrison, C. (2002). Fundamentals of Risk Measurement, McGraw Hill, New York.
[31] Mason, J.R. & Rosner, J. (2007). Where Did the Risk Go? How Misapplied Bond Ratings Cause Mortgage Backed Securities and Collateralized Debt Obligation Market Disruptions. Hudson Institute Working Paper.
[32] Merton, R.C. (1974). On the pricing of corporate debt: the risk structure of interest rates, Journal of Finance 29, 449–470.
[33] Moody’s Investors Services (1999). Rating Methodology: The Evolving Meanings of Moody’s Bond Ratings, Moody’s Global Credit Research, New York.
[34] Moody’s Investors Services (2007). Structured Finance Rating Transitions: 1983–2006. Special Comment, Moody’s Global Credit Research, New York.
[35] Moody’s Investors Services (2008). Corporate Default and Recovery Rates: 1920–2007. Special Comment, Moody’s Global Credit Research, New York.
[36] Moody’s Investors Services (2008). Introducing Assumption Volatility Scores and Loss Sensitivities for Structured Finance Securities, Moody’s Global Credit Policy, New York.
[37] Nickell, P., Perraudin, W. & Varotto, S. (2000). Stability of rating transitions, Journal of Banking and Finance 24, 203–227.
[38] Ong, M.K. (1999). Internal Credit Risk Models: Capital Allocation and Performance Measurement, Risk Books, London.
[39] Peterson, M.A. & Rajan, R. (2002). Does distance still matter: the information revolution in small business lending, Journal of Finance 57, 2533–2570.
[40] Pluto, K. & Tasche, D. (2005). Thinking positively, Risk August, 76–82.
[41] Resti, A. & Sironi, A. (2007). Risk Management and Shareholders’ Value in Banking, John Wiley & Sons, New York.
[42] Shumway, T. (2001). Forecasting bankruptcy more accurately: a simple hazard model, Journal of Business 74, 101–124.
[43] Standard and Poor’s (2001). Rating Methodology: Evaluating the Issuer, Standard & Poor’s Credit Ratings, New York.
[44] Standard and Poor’s (2007). Principles-Based Rating Methodology for Global Structured Finance Securities, Standard & Poor’s RatingsDirect Research, New York.
[45] Stefanescu, C., Tunaru, R. & Turnbull, S. (2008). The Credit Rating Process and Estimation of Transition Probabilities: A Bayesian Approach. London Business School working paper.
[46] Trück, S. & Rachev, S.T. (2005). Credit portfolio risk and PD confidence sets through the business cycle, Journal of Credit Risk 1(4), 61–88.
[47] White, L. (2002). The credit rating industry: an industrial organization analysis, in Ratings, Rating Agencies and the Global Financial System, R.M. Levich, C. Reinhart & G. Majnoni, eds, Kluwer, Amsterdam, NL. pp. 41–64.

Related Articles

Collateralized Debt Obligations (CDO); Credit Migration Models; Credit Risk; CreditRisk+; Credit Scoring; Internal-ratings-based Approach; Rating Transition Matrices; Structured Finance Rating Methodologies.

ADAM ASHCRAFT & TIL SCHUERMANN
Portfolio Credit Risk: Statistical Methods

This article gives a brief overview of statistical methods for estimating the parameters of credit portfolio models from default data. The focus is on models for default probability and correlations; for recovery rates, see Recovery Rate. First, a rather general model setting is introduced along the lines of the models of McNeil and Wendin [10] and others, who depict portfolio models as generalized linear mixed models (GLMMs). Then, we describe the most common estimation techniques, which are the method of moments and maximum likelihood. An excellent reference for other estimation techniques is [10], in particular for Bayes estimation.

A Single Obligor’s Default Risk

Let $D_{ti}$ denote an indicator variable for obligor $i$’s default in time period $t$ such that

$$D_{ti} = \begin{cases} 1 & \text{borrower } i \text{ defaults in } t \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

for $i \in \Omega_t$, $t = 1, \ldots, T$, where $\Omega_t$ is the set of firms under consideration at the beginning of time period $t$ and $n_t = |\Omega_t|$ is its cardinality. The default indicator variable can be motivated in terms of a threshold approach wherein default is said to occur when a continuous variable falls below a threshold. This approach is based on the asset value model due to Merton [11], where the firm declares bankruptcy when the value of its assets is below the principal value of its debt at maturity. Let $V_{ti}$ denote the asset value return of firm $i$ at time $t$ ($i \in \Omega_t$, $t = 1, \ldots, T$), or more generally a variable representing an obligor’s credit quality. Then, the obligor defaults when $V_{ti}$ falls below a threshold $c_{ti}$, that is, $D_{ti} = 1 \Leftrightarrow V_{ti} < c_{ti}$. Crucial parts in credit risk management now are the modeling of $V_{ti}$, the parameterization of $c_{ti}$, and finally the estimation of the parameters. In most industry credit risk models, such as CreditMetrics, CreditRisk+, and CreditPortfolioView, the triggering variable $V_{ti}$ is driven by two sources of random factors. The first is a systematic random factor $F_t$ following a distribution $G$, which affects all firms jointly and therefore cannot be diversified away. The second is an idiosyncratic part $\varepsilon_{ti}$ with variance $\sigma^2$, which is specific to each firm and independent across firms and of the systematic factor. The default threshold $c_{ti}$ is mostly modeled via credit ratings that reflect an aggregated summary of a firm’s risk characteristics. In a simple case, both variables may be expressed as linear functions of the respective risk drivers, such that $V_{ti} = -\omega F_t + \varepsilon_{ti}$ and $c_{ti} = \alpha + \beta' x_{ti}$, where $\alpha$, $\beta$, and $\omega$ are unknown parameters, and $x_{ti}$ is a design vector consisting of observable covariates for obligor $i$, which may be time- and obligor-specific (such as balance sheet ratios) or only time-specific (such as macroeconomic variables). Then the firm defaults if $V_{ti} < \alpha + \beta' x_{ti}$. As shown in [7], the aforementioned credit risk models mainly differ in the distributional assumptions regarding the common and the idiosyncratic random factors driving the firm value, as discussed below.

The probability of the firm’s default conditional on the random factor $f_t$ can be expressed as

$$\begin{aligned} CPD_{ti}(f_t) &= P(D_{ti} = 1 \mid x_{ti}, f_t) \\ &= P(V_{ti} < \alpha + \beta' x_{ti} \mid x_{ti}, f_t) \\ &= P(\varepsilon_{ti} < \alpha + \beta' x_{ti} + \omega f_t) \\ &= f(\alpha + \beta' x_{ti} + \omega f_t) \end{aligned} \tag{2}$$

which in statistical terms is a GLMM; see [10] and the references cited therein. Here $f_t$ is a realization of the systematic factor $F_t$, which is called the time-specific random effect, and $f(\cdot): \mathbb{R} \to (0, 1)$ denotes a response or link function given by the distribution of the idiosyncratic random error $\varepsilon_{ti}$. In the CreditMetrics model, the idiosyncratic errors are standard normally distributed ($\sigma^2 = 1$), whereas in the CreditPortfolioView model the idiosyncratic error follows a logistic distribution ($\sigma^2 = \pi^2/3$). This leads to the common link functions of the probit $f(y) = \Phi(y)$ or the logit function $f(y) = 1/(1 + \exp(-y))$. In the CreditRisk+ approach, the systematic and the unsystematic factors are linked multiplicatively rather than linearly, and their distributions are Gamma and Exponential, respectively. For details, we refer to [7].

The probability of default (PD) that is unconditional with respect to the random effect is the expectation of the conditional PD over the distribution $G$ of the random factor.

Written as an integral over the link function, the unconditional PD is the expectation

$$PD_{ti} = P(D_{ti} = 1 \mid x_{ti}) = \int f(\alpha + \beta' x_{ti} + \omega f_t)\, dG(f_t) \tag{3}$$

which depends on the distribution of the random factor. For example, in the CreditMetrics model, the random effect is assumed to follow a standard normal distribution. Then, in the probit model, the simple expression for the unconditional PD

$$PD_{ti} = P(D_{ti} = 1 \mid x_{ti}) = \Phi(\tilde{\alpha} + \tilde{\beta}' x_{ti}) = \Phi(\tilde{c}_{ti}) \tag{4}$$

results, where $\tilde{c}_{ti} = \tilde{\alpha} + \tilde{\beta}' x_{ti}$, $\tilde{\alpha} = \alpha/\sqrt{1 + \omega^2}$ and $\tilde{\beta} = \beta/\sqrt{1 + \omega^2}$. The correlation between the latent variables of obligors $i$ and $j$, $i \neq j$, is given by $\rho_{ij} \equiv \rho = \omega^2/(1 + \omega^2)$, which is sometimes referred to as asset correlation since the latent variables are interpreted as asset value returns. See [5] for a detailed description of correlations. For the aforementioned distributions in the CreditPortfolioView and the CreditRisk+ approach and the empirical estimation, compare [8].

Portfolio Default Risk

The vector of default indicators of the portfolio is denoted by $\mathbf{D}_t = (D_{t1}, \ldots, D_{tn_t})'$. Conditional on the systematic random factor and given the $x_{ti}$, the defaults are independent. Then the joint distribution of defaults conditional on the systematic factor is given by

$$P(\mathbf{D}_t = \mathbf{d}_t \mid x_{ti}, f_t) = \prod_{i \in \Omega_t} P(D_{ti} = 1 \mid x_{ti}, f_t)^{d_{ti}} \times (1 - P(D_{ti} = 1 \mid x_{ti}, f_t))^{1 - d_{ti}} \tag{5}$$

which is also known as a Bernoulli mixture model [6]. The unconditional distribution (unconditional with respect to the random effect) is obtained as

$$P(\mathbf{D}_t = \mathbf{d}_t \mid x_{ti}) = \int P(\mathbf{D}_t = \mathbf{d}_t \mid x_{ti}, f_t)\, dG(f_t) \tag{6}$$

Extensions of model (5) include more than one random effect and are, therefore, called multifactor models.

A special case of this model results if the obligors are homogeneous in $x_{ti}$, that is, $x_{ti} = x_t$ and thus $c_{ti} = c_t$ for all $i$. Then, all obligors exhibit the same conditional PD and the Bernoulli mixture distribution (5) reduces to the binomial mixture distribution. Let $D_t = \sum_{i \in \Omega_t} D_{ti}$ be the number of defaults. Then,

$$P(D_t = d_t \mid x_t, f_t) = \binom{n_t}{d_t} P(D_{ti} = 1 \mid x_t, f_t)^{d_t} \times (1 - P(D_{ti} = 1 \mid x_t, f_t))^{n_t - d_t} \tag{7}$$

with the unconditional distribution analogous to equation (6). Define the default rate $p_t = d_t/n_t$ as the ratio of defaulting obligors to the total number of obligors. As shown in [15, 16], the distribution of the default rate converges to the “Vasicek” distribution if the random effect is standard normally distributed and the number of obligors goes to infinity. The density is then given as

$$f(p_t) = \sqrt{\frac{1 - \rho}{\rho}} \cdot \exp\left(\frac{1}{2}\left(\Phi^{-1}(p_t)\right)^2 - \frac{1}{2\rho}\left(\tilde{c}_t - \sqrt{1 - \rho} \cdot \Phi^{-1}(p_t)\right)^2\right) \tag{8}$$

where $\tilde{c}_t = \Phi^{-1}(PD_t)$ is the scaled threshold of equation (4) in the homogeneous case. For a thorough description of large pool approximations, see [9].

Estimation Techniques

There are basically two ways of estimating the unknown model parameters. As the first way, one can use asset values and asset value returns as in the KMV approach [1]. Given the level of liabilities, the default probabilities can be derived. Correlation estimates are obtained by calculating historical correlations from asset value returns. As the crucial part of these methods is deriving the asset values and the capital structure of the firm rather than the statistical procedures, they are not discussed here in detail. Instead, we refer to [3] and the references cited therein. As the second way, the parameters can be estimated using time series $d_1, \ldots, d_T$ of observed default events.
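The closed-form results for the probit model with a standard normal random effect can be checked numerically. The sketch below, which assumes a homogeneous portfolio without covariates and uses hypothetical parameter values, compares the integral in equation (3) with the closed form in equation (4), computes the asset correlation, and verifies that the limiting default-rate density integrates to one with mean equal to the PD. The midpoint rule is an arbitrary choice made to keep the example dependency-free.

```python
from math import exp, sqrt
from statistics import NormalDist

N = NormalDist()  # standard normal distribution


def unconditional_pd_closed_form(alpha, omega):
    """Equation (4) without covariates: Phi(alpha / sqrt(1 + omega^2))."""
    return N.cdf(alpha / sqrt(1.0 + omega ** 2))


def unconditional_pd_numeric(alpha, omega, lo=-8.0, hi=8.0, steps=4000):
    """Equation (3) for the probit link, integrated by the midpoint rule."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        f = lo + (k + 0.5) * h
        total += N.cdf(alpha + omega * f) * N.pdf(f)
    return total * h


def asset_correlation(omega):
    """rho = omega^2 / (1 + omega^2)."""
    return omega ** 2 / (1.0 + omega ** 2)


def vasicek_density(p, pd, rho):
    """Limiting density of the default rate, with scaled threshold Phi^-1(pd)."""
    c, q = N.inv_cdf(pd), N.inv_cdf(p)
    return sqrt((1.0 - rho) / rho) * exp(
        0.5 * q * q - (c - sqrt(1.0 - rho) * q) ** 2 / (2.0 * rho))


def density_mass_and_mean(pd, rho, steps=20000):
    """Integrate the density and p * density over (0, 1) by the midpoint rule."""
    h = 1.0 / steps
    mass = mean = 0.0
    for k in range(steps):
        p = (k + 0.5) * h
        d = vasicek_density(p, pd, rho)
        mass += d * h
        mean += p * d * h
    return mass, mean


if __name__ == "__main__":
    alpha, omega = -2.0, 0.5  # hypothetical values
    print(unconditional_pd_closed_form(alpha, omega),
          unconditional_pd_numeric(alpha, omega))
    print(asset_correlation(omega))
    print(density_mass_and_mean(pd=0.05, rho=0.2))
```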

The simplest methods can be employed for the case of the homogeneous portfolio with time-constant parameters, where closed-form solutions for the estimators exist. In the GLMM model, more advanced numerical techniques have to be used. Here, we briefly describe the method of moments and the maximum-likelihood method. For Bayes estimation, we refer to [10] and the references cited therein.

Method of Moments

If the obligors in the portfolio or the segment are homogeneous and the parameters are constant, only two parameters are to be estimated, namely, the PD and the correlation. Gordy [7] applies the method of moments estimator to the probit model. He shows that expectation and variance of the conditional default probability are

$$E(CPD(F_t)) = PD \tag{9}$$

and

$$\mathrm{Var}(CPD(F_t)) = \Phi_2\left(\Phi^{-1}(PD), \Phi^{-1}(PD), \rho\right) - PD^2 \tag{10}$$

where $\Phi_2(\cdot)$ is the bivariate normal cumulative distribution function for two random variates, each with expectation zero and variance one, and with correlation $\rho$. An unbiased estimator for the unconditional PD is given by the average default rate:

$$\bar{p} = \frac{1}{T} \sum_{t=1}^{T} p_t \tag{11}$$

The left-hand side of equation (10) can be estimated by the sample variance of the default rate:

$$s_p^2 = \frac{1}{T - 1} \sum_{t=1}^{T} (p_t - \bar{p})^2 \tag{12}$$

Given the two estimates, the asset correlation $\rho$ can be backed out numerically from equation (10). Gordy [7] also provides a finite sample adjustment for the estimator. However, this modified estimator turns out to perform similarly to the simple estimator [3].

Maximum-likelihood Method

In the limiting case (8), asymptotic maximum-likelihood estimators of the (homogeneous) PD and the (homogeneous) asset correlation can be derived as

$$\hat{\rho} = \frac{m_2/T - m_1^2/T^2}{1 + m_2/T - m_1^2/T^2} \tag{13}$$

$$\widehat{PD} = \Phi\left(T^{-1} \sqrt{1 - \hat{\rho}}\; m_1\right) \tag{14}$$

where $m_1 = \sum_{t=1}^{T} \Phi^{-1}(p_t)$ and $m_2 = \sum_{t=1}^{T} \left(\Phi^{-1}(p_t)\right)^2$ are sums over the probit-transformed default rates [4].

In the general case of the GLMM where obligors are heterogeneous, the log-likelihood is given via equation (6) as

$$l = \sum_{t=1}^{T} \ln \int \prod_{i \in \Omega_t} P(D_{ti} = 1 \mid x_{ti}, f_t)^{d_{ti}} \left(1 - P(D_{ti} = 1 \mid x_{ti}, f_t)\right)^{1 - d_{ti}} dG(f_t) \tag{15}$$

As the log-likelihood function involves several integrals, it is numerically optimized with respect to the unknown parameters, for which several algorithms, such as the Newton–Raphson method, exist and are implemented in many statistical software packages. The integral approximation can be conducted by, for example, the adaptive Gaussian quadrature as described in [12]. Under usual regularity conditions, the resulting estimators asymptotically exist, are consistent, and are asymptotically normal. See [2], p. 243, for a detailed discussion. Applications and estimation results can, for instance, be found in [6, 8, 13, 14]. For the extension to higher dimensional random effects, there are also some approximation methods that can be used, particularly penalized quasi-likelihood (PQL) and marginal quasi-likelihood (MQL) [10].

Bayes Estimation

Finally, Bayes estimation can be used, as thoroughly shown in [10]. The joint prior distribution of $F_t$, $\beta$ (including a constant) and some hyperparameters $\theta$ can be given as

$$p(\beta, F_t, \theta) = p(F_t \mid \theta) \cdot p(\theta) \cdot p(\beta) \tag{16}$$

where a priori independence between $\beta$ and $\theta$ is assumed. Mostly, Markov chain Monte Carlo methods are applied, which can deal with even more complex models than shown here, such as autocorrelated random effects or multifactor models. For a detailed description, we refer to [10].
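For the homogeneous case, the closed-form estimators (13) and (14) and the method-of-moments estimator based on equations (10) to (12) can be sketched as follows. The example rests on several assumptions: default rates are simulated in the large-pool limit from hypothetical values of PD and $\rho$, the bivariate normal term in equation (10) is evaluated in the equivalent form $E[CPD(F_t)^2]$ by midpoint integration to avoid external dependencies, and $\rho$ is backed out by bisection. All function names are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist

N = NormalDist()  # standard normal distribution


def simulate_default_rates(pd, rho, periods, seed=0):
    """Large-pool default rates p_t = CPD(f_t) with f_t drawn from N(0, 1)."""
    rng = random.Random(seed)
    c = N.inv_cdf(pd)
    return [N.cdf((c + sqrt(rho) * rng.gauss(0.0, 1.0)) / sqrt(1.0 - rho))
            for _ in range(periods)]


def ml_estimates(rates):
    """Asymptotic ML estimators of equations (13) and (14)."""
    T = len(rates)
    q = [N.inv_cdf(p) for p in rates]  # probit-transformed default rates
    m1, m2 = sum(q), sum(v * v for v in q)
    var = m2 / T - m1 ** 2 / T ** 2
    rho_hat = var / (1.0 + var)
    pd_hat = N.cdf(sqrt(1.0 - rho_hat) * m1 / T)
    return pd_hat, rho_hat


def joint_default_prob(pd, rho, steps=2000):
    """E[CPD(F)^2] by the midpoint rule; equals Phi_2(c, c, rho) in (10)."""
    c = N.inv_cdf(pd)
    lo, hi = -8.0, 8.0
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        f = lo + (k + 0.5) * h
        cpd = N.cdf((c + sqrt(rho) * f) / sqrt(1.0 - rho))
        total += cpd * cpd * N.pdf(f)
    return total * h


def mom_estimates(rates, tol=1e-8):
    """Method of moments: equations (11) and (12), then solve (10) for rho."""
    T = len(rates)
    p_bar = sum(rates) / T
    s2 = sum((p - p_bar) ** 2 for p in rates) / (T - 1)
    lo, hi = 1e-4, 0.99  # the variance in (10) increases with rho
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if joint_default_prob(p_bar, mid) - p_bar ** 2 < s2:
            lo = mid
        else:
            hi = mid
    return p_bar, 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical true parameters: PD = 2%, asset correlation = 15%.
    rates = simulate_default_rates(pd=0.02, rho=0.15, periods=5000, seed=1)
    print("ML :", ml_estimates(rates))
    print("MoM:", mom_estimates(rates))
```

With a long simulated history both estimators should land near the values used in the simulation; with the short default-rate series available in practice, the sampling error of the correlation estimate is considerably larger.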

References

[1] Bohn, J. & Crosbie, P. (2003). Modeling Default Risk, KMV Corporation.
[2] Davidson, R. & MacKinnon, J.G. (1993). Estimation and Inference in Econometrics, Oxford University Press, New York.
[3] Duellmann, K., Küll, J. & Kunisch, M. (2008). Estimating Asset Correlations from Stock Prices or Default Rates: which Method is Superior? Deutsche Bundesbank Discussion Paper, Series 2: Banking and Finance, Deutsche Bundesbank, Vol 04.
[4] Duellmann, K. & Trapp, M. (2005). Systematic risk in recovery rates: an empirical analysis of U.S. corporate credit exposures, in Recovery Risk: The Next Challenge in Credit Risk Management, E.I. Altman, A. Resti & A. Sironi, eds. Deutsche Bundesbank.
[5] Frey, R. (2009). Default correlation and asset correlation, Encyclopedia of Quantitative Finance 10, 038.
[6] Frey, R. & McNeil, A. (2003). Dependent defaults in models of portfolio credit risk, Journal of Risk 6, 59–92.
[7] Gordy, M.B. (2000). A comparative anatomy of credit risk models, Journal of Banking and Finance 24, 119–149.
[8] Hamerle, A. & Roesch, D. (2006). Parameterizing credit risk models, Journal of Credit Risk 3, 101–122.
[9] Kreinin, A. (2009). Large pool approximations for credit loss, Encyclopedia of Quantitative Finance 09, 017.
[10] McNeil, A.J. & Wendin, J.P. (2007). Bayesian inference for generalized linear mixed models of portfolio credit risk, Journal of Empirical Finance 14, 131–149.
[11] Merton, R.C. (1974). On the pricing of corporate debt: the risk structure of interest rates, Journal of Finance 29, 449–470.
[12] Pinheiro, J.C. & Bates, D.M. (1995). Approximations to the log-likelihood function in the nonlinear mixed-effects model, Journal of Computational and Graphical Statistics 4, 12–35.
[13] Roesch, D. (2005). An empirical comparison of default risk forecasts from alternative credit rating philosophies, International Journal of Forecasting 25, 37–51.
[14] Roesch, D. & Scheule, H. (2005). A multi-factor approach for systematic default and recovery risk, Journal of Fixed Income 15, 63–75.
[15] Vasicek, O.A. (1987). Probability of Loss on Loan Portfolio. Working paper, KMV Corporation.
[16] Vasicek, O.A. (1991). Limiting Loan Loss Distribution. Working paper, KMV Corporation.

DANIEL RÖSCH
Recovery Rate

A recovery rate (RR) is the fraction of an obligor’s debt that a creditor stands to recover in the event of default. Recovery rates are usually expressed as a percentage of the par value of the claim (RP). Alternatively, recovery rates can be expressed as a percentage of the market value of the claim prior to default (RMV), or as a percentage of an equivalent treasury bond (RT). Recovery rates are closely related to the concept of loss-given-default (LGD), where LGD = 1 − RR. Recovery rates are not known prior to default and can vary between 0 (full loss) and 1 (full recovery). Recovery rate risk in credit portfolios exists because of the uncertainty regarding recovery rates in the event of default.

Along with the probability of default, recovery rates are important parameters in determining the loss distribution of a credit portfolio. For this purpose, the Basel II Accords expressly recommend that the calculation of regulatory capital for banking institutions include the estimated recovery rates on their credit portfolios. The most widespread methodologies for estimating recovery rates use historical averages that are conditioned on the type of credit instrument, seniority (priority of repayment), and collateral [3]. However, these estimation methods do not account for the fact that recovery rates are known to be negatively correlated with the probability of default [1, 2]. The correlation between recovery rates and default probabilities is important because it exacerbates potential losses on credit portfolios. To this effect, recent credit models have attempted to capture the endogenous nature of recovery rates [2, 4, 9, 11]. Furthermore, recent products in the credit derivatives market have enabled the extraction of recovery rates either directly [5] or indirectly [10, 14]. In addition, since 2003, major credit rating agencies have been offering recovery rate ratings based on proprietary models [8].

Historical Recovery Rates

Historical recovery rates for different types of credit securities are considered important parameters in many credit risk models. There are various ways to estimate historical recovery rates. The most common are value-weighted mean recovery rates, issuer-weighted mean recovery rates, and issuer-weighted median recovery rates. The value-weighted mean recovery rate is the average recovery rate on all defaulted issuers weighted by the face value of those issues. Issuer-weighted mean recovery rates and the issuer-weighted median recovery rates are the average and median, respectively, of the recovery rates on each issuer. Varma et al. [17] report historical recovery rates from 1982 to 2003 internationally. Globally, the value-weighted mean recovery rate for all bonds over that period was 33.8%, whereas the issuer-weighted mean and median recovery rates were 35.4% and 30.9%, respectively. In the United States, the value-weighted mean recovery rate for all bonds over that period was 35.4%, whereas the issuer-weighted mean and median recovery rates were 35.4% and 31.6%, respectively. For sovereign bonds, the value-weighted mean recovery rate for all bonds over that period was 31.2%, whereas the issuer-weighted mean and median recovery rates were 34.4% and 39.8%, respectively. Furthermore, recovery rates will differ depending on seniority and collateral of the bond. For instance, senior secured corporate bonds have a value-weighted mean recovery rate of 50.3%, compared to 22.9% for junior subordinated bonds, over 1982–2003. Carayon et al. [7] find that recovery rates on European bonds tend to be smaller. For instance, over 1987–2007, they find that senior secured bonds in Europe recover (issuer weighted) 61% compared to 70.6% in North America. In the Asia-Pacific (excluding Japan) region, Tennant et al. [16] find lower recovery rates of 35.61% on senior secured bonds over the 1990–2007 period.

Recovery Rates and Default Risk

The major problem for credit risk models is that there is a large body of empirical evidence suggesting that recovery rates are negatively correlated with default probabilities. Periods of high default are associated with low recovery rates and vice versa. The correlation between default probabilities and recovery rates may be ascribed to at least two nonmutually exclusive reasons. First, economic downturns can simultaneously cause increases in the probability of default and lower the value of recovered assets. Second, the price at which recovered assets are sold will depend on the financial condition of peer firms [15]. Under the latter argument, in periods of high default, recovered assets are forced to be sold at “fire-sale” prices.
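To make concrete why a negative relationship between default rates and recovery rates exacerbates portfolio losses, the following minimal sketch compares the expected loss rate in two stylized settings. The two equally likely scenarios and all default and recovery figures are hypothetical numbers chosen for illustration; they are not taken from the studies cited here.

```python
def expected_loss(scenarios):
    """Probability-weighted loss rate: sum of prob * default_rate * (1 - recovery)."""
    return sum(prob * default_rate * (1.0 - recovery)
               for prob, default_rate, recovery in scenarios)


# (probability, default rate, recovery rate) per scenario; hypothetical values.
# Recovery falls when defaults rise.
linked = [(0.5, 0.01, 0.50),   # benign period
          (0.5, 0.05, 0.30)]   # downturn

# Same default rates, but recovery fixed at its average of 40%.
constant = [(0.5, 0.01, 0.40),
            (0.5, 0.05, 0.40)]

if __name__ == "__main__":
    print("recovery linked to defaults:", expected_loss(linked))
    print("constant recovery          :", expected_loss(constant))
```

Both settings share the same average default rate (3%) and the same average recovery rate (40%), yet the expected loss rate is higher when low recoveries coincide with high defaults, because losses are concentrated in the downturn scenario.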

Acharya et al. [1] find both theories to be at work in explaining recovery rates. Altman et al. [2] empirically estimate the relationship between recovery rates ($y$) and default rates ($x$) using one linear and three nonlinear specifications:

$$\begin{aligned} y &= 0.51 - 2.61x; & R^2 &= 0.51 \\ y &= 0.002 - 0.113 \ln(x); & R^2 &= 0.63 \\ y &= 0.61 - 8.72x + 54.8x^2; & R^2 &= 0.65 \\ y &= \frac{0.138}{x^{0.29}}; & R^2 &= 0.65 \end{aligned} \tag{1}$$

All these specifications show a strong negative relationship between default rates and recovery rates.

Economic Features of Recovery Rates

There are several economic features of recovery rates that are important:

1. As described above, recovery rates are negatively correlated with default rates. This is the case when the data is examined historically, as shown in [2], as well as when implied from the data, as in [10].
2. Recovery rates are highly variable and depend on regime (see [12]). They vary within rating and seniority class as well.
3. Seniority and industry are statistically significant determinants of recovery rates, as shown by Acharya et al. [1]. These authors also find that, in industries with high asset-specificity, recovery rates are lower.

Implied Recovery Rates

Recovery rates can also be implied from prices of certain credit derivatives. One then speaks of “implied” (or risk-neutral) recovery rates, which may not coincide with historically observed recovery rates. Recovery rate swaps are agreements to exchange a fixed recovery rate for the realized recovery rate, allowing the market’s expected recovery rate to be directly recovered [5]. Digital credit default swaps (DDS) are credit default swaps (CDSs) where the recovery rates on default are prespecified, irrespective of the final recovery rate. Berd and Kapoor [6] show how recovery rates can be recovered using a DDS and a CDS. Finally, Pan and Singleton [14] and Das and Hanouna [10] use CDS with different maturities to extract default probabilities and recovery rates.

Approximately, if credit spreads are known, we may write the spread $s$ as a function of default probability ($\lambda$) and recovery rate ($\phi$): $s \approx \lambda(1 - \phi)$, implying that recovery may be written in a reduced-form setting as follows:

$$\phi = 1 - \frac{s}{\lambda} \tag{2}$$

More formalized and exact versions of this approximate relation may be derived from a CDS pricing model or a bond pricing model. Recovery may also be derived in the class of Merton [13] models. The expression for recovery rate is

$$E[\phi] = E\left[\frac{V_T}{D} \,\middle|\, V_T < D\right] = \frac{1}{D} E[V_T \mid V_T < D] = \frac{V_0}{D} e^{rT} \{1 - N(d_1)\}$$

$$d_1 = \frac{\ln(V_0/D) + \left(r + \frac{1}{2}\sigma^2\right)T}{\sigma\sqrt{T}} \tag{3}$$

where $\{V_0, \sigma\}$ are the initial value and volatility of the firm, $D$ is the face value of debt with maturity $T$, and $r$ is the risk-free interest rate. $N(\cdot)$ is the normal distribution function.

References

[1] Acharya, V., Bharath, S. & Srinivasan, A. (2007). Does industry-wide distress affect defaulted firms? Evidence from creditor recoveries, Journal of Financial Economics 85(3), 787–821.
[2] Altman, E., Brady, B., Resti, A. & Sironi, A. (2004). The link between default and recovery rates: theory, empirical evidence and implications, Journal of Business 76(6), 2203–2227.
[3] Altman, E., Resti, A. & Sironi, A. (2003). Default Recovery Rates in Credit Risk Modeling: A Review of the Literature and Empirical Evidence, working paper, New York University.
[4] Bakshi, G., Madan, D. & Zhang, F. (2001). Recovery in Default Risk Modeling: Theoretical Foundations and Empirical Applications, working paper, University of Maryland.
[5] Berd, A.M. (2005). Recovery swaps, Journal of Credit Risk 1(3), 1–10.
[6] Berd, A. & Kapoor, V. (2002). Digital premium, Journal of Derivatives 10(3), 66.
[7] Carayon, J.-M., West, M., Emery, K. & Cantor, R. (2008). European Corporate Default and Recovery Rates, 1985–2007, Moody’s Investors Service.
[8] Chew, W.H. & Kerr, S.S. (2005). Recovery ratings: a new window on recovery risk, in Standard and Poor’s: A Guide to the Loan Market, Standard and Poor’s.
[9] Christensen, J. (2005). Joint Estimation of Default and Recovery Risk: A Simulation Study, working paper, Copenhagen Business School.
[10] Das, S.R. & Hanouna, P. (2009). Implied Recovery, forthcoming, Journal of Economic Dynamics and Control.
[11] Guo, X., Jarrow, R. & Zeng, Y. (2005). Modeling the Recovery Rate in a Reduced Form Model, working paper, Cornell University.
[12] Hu, W. (2004). Applying the MLE Analysis on the Recovery Rate Modeling of US Corporate Bonds, Master’s Thesis in Financial Engineering, University of California, Berkeley.
[13] Merton, R.C. (1974). On the pricing of corporate debt: the risk structure of interest rates, The Journal of Finance 29, 449–470.
[14] Pan, J. & Singleton, K. (2008). Default and recovery implicit in the term structure of sovereign CDS spreads, Journal of Finance 63, 2345–2384.
[15] Shleifer, A. & Vishny, R. (1992). Liquidation values and debt capacity: a market equilibrium approach, Journal of Finance 47, 1343–1366.
[16] Tennant, J., Emery, K., Cantor, R., Elliott, J. & Cahill, B. (2007). Default and Recovery Rates of Asia-Pacific Corporate Bond and Loan Issuers, Excluding Japan, 1990–1H200, Moody’s Investors Service.
[17] Varma, P., Cantor, R. & Hamilton, D. (2003). Recovery Rates on Defaulted Corporate Bonds and Preferred Stocks, 1982–2003, Moody’s Investors Service.

Related Articles

Credit Default Swaps; Credit Risk; Exposure to Default and Loss Given Default; Recovery Swap.

SANJIV R. DAS & PAUL HANOUNA
Internal-ratings-based Approach

Within the new Basel capital rules for banks (see Regulatory Capital), the internal-ratings-based approach (IRBA) represents perhaps the most important innovation for regulatory minimum capital requirements. For the first time, subject to supervisory approval, banks are allowed to use their own risk assessments of credit exposures in order to determine the capital to be held against them. Within the IRBA, banks estimate the riskiness of each exposure on a stand-alone basis. The risk estimates serve as input for a supervisory credit risk model (implicitly given by risk weight functions) that provides a value for capital that is deemed sufficient to cover against the credit risk of the exposure, given the assumed portfolio diversification. In order to obtain supervisory approval for the IRBA, banks must apply for IRBA and fulfill a set of minimum requirements. Until approval is granted for the entire book or specific portfolios, banks must apply the simpler and less risk-sensitive standardized approach for credit risk, where minimum capital requirements are determined in dependence on asset class (sovereign, bank, corporate, or retail exposure) only and, if applicable, ratings by external credit assessment agencies like rating agencies or export credit agencies.

The Conception of Internal Rating Systems in Basel II

Bank internal rating systems are, in the most general sense, risk assessment procedures, which are used for the assignment of internally defined borrower and exposure categories. A rating system is based on a set of predefined criteria to be evaluated for each borrower or exposure subject to the system, and results in a final “score” or “rating grade” for the borrower or exposure. The choice and weighting of the criteria can be manifold; there are no rules or guidance on which criteria to include or exclude. The main requirement on IRBA systems is that their rating grades or scores do indeed discriminate borrowers according to credit default risk.

In practice, rating systems are often designed as purely or partly statistical tools, for example, by identifying criteria (typically financial ratios) with good discriminatory power and combining them by means of statistical regression or other mathematical methods.

However, in order to use such tools, there must be sufficient historical data, both on defaulted and surviving borrowers and exposures, for determining the discrimination criteria and calibrating their weightings. In practice, obtaining such data often proves to be more difficult than the statistical analysis as such, either because historically borrower and exposure characteristics were not stored in a readily usable manner, or simply because for some portfolios there is not sufficient default data. In general, rating systems may include a set of quantitative and some qualitative criteria. The weighting of these criteria may also be determined by expert opinion rather than by statistical tools. In the extreme, for example, in international project finance where certain criteria are deal breakers for loan arrangements (e.g., the existence of sovereign risk coverage via export insurance for projects in regions with high political risk), there might be no predetermined weighting scheme at all.

Notions appear to be not entirely uniform in practice. Often, but not always, the notion of a “scoring system” or a “score card” is used for a purely statistical rating system or the statistical part of a mixed quantitative and qualitative rating system. Moreover, the notion of scores tends to be more often used for retail and small business portfolios, while for corporate, bank, and sovereign portfolios, the literature tends to speak of rating systems. From an IRBA perspective, there are no conceptual differences between these notions: they all depict different forms of IRBA systems. Likewise, there is no IRBA requirement for the number of rating systems a bank should apply. Usually, one would expect different systems for retail, small businesses and self-employed borrowers, corporates, specialized lending portfolios, sovereigns, and banks. Many of these asset classes might again see different rating systems, depending on, for example, product type (very common for retail portfolios, but not constrained to them) or sales volume and region (both common for corporate portfolios), because the different borrower and exposure categories might call for different sets of rating criteria. Within a large, internationally active and well-diversified bank, one might expect to see a large number of different rating systems.
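
As a rough illustration of a purely statistical rating system of the kind described in this section, the following Python sketch combines a few financial ratios through a logistic regression formula and maps the resulting PD estimate to a rating grade. Everything specific in it is an assumption made for illustration: the ratio names, the intercept and weights, and the grade boundaries are hypothetical and would in practice be estimated from a bank's own historical default data.

```python
from math import exp

# Hypothetical regression coefficients (illustration only, not calibrated).
INTERCEPT = -3.0
WEIGHTS = {"leverage": 2.0, "interest_coverage": -0.3, "profit_margin": -4.0}

# Hypothetical upper PD bounds for seven performing grades (grade 1 = best).
GRADE_UPPER_BOUNDS = [0.0005, 0.002, 0.005, 0.01, 0.03, 0.10, 1.0]


def estimated_pd(ratios):
    """Logistic score: combine the financial ratios into a PD estimate."""
    z = INTERCEPT + sum(WEIGHTS[name] * ratios[name] for name in WEIGHTS)
    return 1.0 / (1.0 + exp(-z))


def rating_grade(pd_estimate):
    """Return the first grade whose upper PD bound covers the estimate."""
    for grade, bound in enumerate(GRADE_UPPER_BOUNDS, start=1):
        if pd_estimate <= bound:
            return grade
    return len(GRADE_UPPER_BOUNDS)


if __name__ == "__main__":
    strong = {"leverage": 0.2, "interest_coverage": 8.0, "profit_margin": 0.15}
    weak = {"leverage": 0.9, "interest_coverage": 1.0, "profit_margin": -0.05}
    for name, borrower in (("strong", strong), ("weak", weak)):
        pd_estimate = estimated_pd(borrower)
        print(name, round(pd_estimate, 4), rating_grade(pd_estimate))
```

The only property the sketch is meant to show is the one the IRBA requires: borrowers with weaker ratios receive a higher score-implied PD and therefore a worse grade. An expert-based system would replace the regression weights with judgmental ones but could keep the same grade mapping.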
IRBA Risk Parameters

Credit risk per rating grade is quantified by probabilities of default (PDs), which give the probabilities of borrowers to default on their obligations, with regard to the Basel default definition, within 1 year’s time. The PD per rating grade is usually estimated by the use of bank internal historical default data, which may be supplemented by external default data. A specific problem within the IRBA comes from the fact that not all institutions have readily available default data according to the Basel definition. In this case, adjustments to the estimates must be made.

PDs may be estimated just for the next year (point in time (PIT)) or as long-term average PDs (through the cycle (TTC)). PIT estimates take into account the current state of the economy (as a consequence, PDs per rating grade might change over time), while TTC estimates do not. The Basel Accord seems to imply TTC estimates. Nonetheless, many supervisors might be prepared to accept PD estimates that are more of PIT type because eliminating all cyclical effects from rating systems and PD estimates might be difficult to achieve in practice.

In the Basel sense, an IRBA rating system contains two additional dimensions: an exposure at default (EAD) dimension, assessing the expected exposure at the point in time when the borrower defaults, and a loss given default (LGD) dimension, measuring the expected percentage of exposure that is lost in case a borrower defaults. The EAD dimension is mainly driven by product characteristics, for example, how easily lines can be drawn by the borrower or reduced by the bank prior to default, while the LGD dimension is heavily dependent on collateral, guarantees, and other risk mitigants. Here again, Basel notions slightly differ from literature and practice: in industry, the notion of a rating system often refers to the PD dimension only, while for Basel it includes all three dimensions. Within the IRBA, banks must provide a PD, LGD, and EAD for all their exposures. While the PD must always be estimated by the bank itself, banks can choose whether they want to use supervisory LGD and EAD estimates for given product and collateral types (thus applying the so-called foundation IRBA) or whether they want to estimate these values themselves, too (thus using the so-called advanced IRBA).

The Basel II Risk Weight Functions

In order to assess the overall risk of a bank portfolio, credit portfolio risk models have to evaluate the portfolio composition and its diversification. Within the Basel II IRBA, banks are not allowed to use their own credit portfolio risk models and diversification estimates for minimum regulatory capital ratios. Rather, they must input the risk parameters PD, LGD, and EAD into a supervisory credit portfolio risk model. This model can be roughly described as Vasicek’s [6] multivariate extension of Merton’s [5] model of the default of a firm. In statistical terms, the model could be characterized as a one-factor probit model where the events to be predicted are the borrowers’ defaults and the single factor reflects the state of the global economy. Moreover, the Basel model assumes an infinitely granular portfolio, such that all idiosyncratic single name risk is diversified away. In this sense, the Basel model is an asymptotic single risk factor (ASRF) model. For further details of the model, see [1, 4].

The assumptions of a single risk factor and of infinite granularity lead to the following characteristics of the Basel credit risk model:

1. The capital charge per exposure can be described in closed form by risk weight functions (cf. [3], paragraphs 272 and 273 for corporate, sovereign, and bank exposures and paragraphs 328, 329, and 330 for retail exposures). The specifications of the risk weight function for the different exposure classes can be derived from the following generic formula for the capital requirement K per dollar of exposure:

$$
K = \mathrm{LGD} \left[ N\!\left( \frac{G(\mathrm{PD})}{\sqrt{1-R}} + \sqrt{\frac{R}{1-R}}\; G(0.999) \right) - \mathrm{PD} \right] \frac{1 + (M - 2.5)\, b}{1 - 1.5\, b} \tag{1}
$$

In this equation,

• the probability of default PD and the loss given default LGD are measured as decimals;
• the maturity adjustment b is given by (0.11852 − 0.05478 ln(PD))^2;
• N denotes the standard normal distribution function;
• G denotes the inverse standard normal distribution function;
• the effective maturity M was fixed at 1 year for retail exposures, and assumes values between 0 and 5 years for other exposures as described in detail in paragraph 320 of [3].

The risk weight functions for the different exposure classes differ mostly in the specification of the asset correlation R. For the retail mortgage exposure class, R was fixed at 15%. For revolving retail credit, R is 4%. In contrast, in the corporate, sovereign, and bank exposure classes, R depends on PD by

$$
R = 0.12\, \frac{1 - e^{-50\,\mathrm{PD}}}{1 - e^{-50}} + 0.24 \left( 1 - \frac{1 - e^{-50\,\mathrm{PD}}}{1 - e^{-50}} \right) \tag{2}
$$

and also for other retail exposures, R is given as a function of PD by

$$
R = 0.03\, \frac{1 - e^{-35\,\mathrm{PD}}}{1 - e^{-35}} + 0.16 \left( 1 - \frac{1 - e^{-35\,\mathrm{PD}}}{1 - e^{-35}} \right) \tag{3}
$$

2. The capital charge per exposure depends only on the risk parameters PD, LGD, and EAD of the exposure, but not on the portfolio composition. Thus, the capital charge for each exposure is exactly the same, no matter which portfolio it is added to (portfolio invariance). From a supervisory point of view, portfolio invariance was an important characteristic in developing the Basel risk weight functions, as it ensures computational simplicity and the preservation of a level playing field between well diversified and specialized banks. The capital charge for the entire portfolio is the sum of the capital charges for individual exposures. The downside of portfolio invariance is that the Basel formula cannot account for risk concentrations. If it did, the capital charge for an exposure would again have to depend on the portfolio to which it is added. If banks are concerned about concentration effects or want to measure the diversification benefits of their portfolio, they need to develop their own, fully fledged credit risk models.
3. The capital charge for each exposure, given its risk parameters, depends on the correlation with the single systematic risk factor and the so-called confidence level. The confidence level for minimum capital requirements was set by the Basel Committee to be 99.9%. As a consequence, the probability that the bank will suffer losses from the credit portfolio that exceed the capital requirements should be of an order of magnitude like 0.1%. The correlations were estimated from supervisory data bases and are assumed to decrease with decreasing creditworthiness.
4. The ASRF takes only the default event as stochastic and treats the loss in case of default as deterministic. As in practice loss amounts are stochastic as well, and potentially correlated with the drivers of the default events, banks are supposed to take account of this effect in their LGD estimates, by estimating the downturn LGDs instead of average LGDs.
5. Lastly, the ASRF is a default mode (DM) model that only accounts for losses due to defaults within a given time horizon (1 year) but not for losses due to rating migrations and future losses after 1 year. This simplification does not comply with modern accounting practice. It was therefore adjusted by introducing the maturity adjustments, which can be seen as an extension of the model toward a marked-to-market (MtM) mode.

Minimum Requirements

In order to apply the IRBA, banks must have explicit approval from their supervisors. Approval is subject to a set of minimum requirements aimed to ensure the integrity of the rating model, rating process, and thus of the risk parameters and capital charges. The minimum requirements ([3], Part 2, Section III. H) hence assemble around the following themes:

• Rating system design. As mentioned before, there are no regulatory requirements with regard to the rating criteria. Rating grades must, in a sensible way, discriminate for credit risk, and the onus of proof is with the bank. Moreover, there must be at least seven rating grades for performing and one grade for nonperforming
exposures in the PD dimension. No minimum grade numbers are given for the LGD and EAD dimension. Also, there is no requirement of a common master scale across all rating systems, although many banks develop such a scale for internal risk management and communication purposes.
• Rating system operations. By this set of minimum requirements, banks are asked to ensure the integrity of the rating process. Most notably, the rating assignments must be independent from any business units gaining from credit approval (e.g., the sales department). Moreover, there should be no “cherry picking” between rated and nonrated exposures (the latter being treated in the less risk-sensitive standardized approach), although a temporary partial use of IRBA, coupled with a supervisory approved implementation (roll-out plan) for bankwide IRBA use, and a permanent partial use for insignificant portfolios are allowed. Another important aspect is the integration of the ratings into day-to-day credit processes, including IT systems and input data availability.
• Corporate governance and oversight. This set of criteria requires banks to embed their rating systems into the overall governance structure of the bank. Most notably, senior management is supposed to buy into the systems and formally approve them for wide use within the bank, such that the systems become accepted risk management tools at all levels in the organization. Also, the role of internal audit in regular rating audits is defined.
• Use of internal ratings. Banks will only receive IRBA approval if they use their ratings for a wide range of bank internal applications. Examples include credit approval, limit systems, risk-sensitive pricing and loss provisioning. Rating systems solely developed for regulatory purposes will not be recognized, as only the deep rooting into day-to-day credit risk management actions will ensure their integrity.
• Risk quantification. Banks need to quantify the risk parameters PD, LGD, and EAD, based on their rating grades and on the Basel default and loss definitions ([3], paragraphs 452, 453, and 460). In doing so, they should employ a variety of data sources: preferentially internal data, but enhanced with external data sources and expert judgment if needed.
• Validation of internal estimates. The PD, LGD, and EAD estimates must be validated against actually observed default rates and losses. Owing to relatively short time series for the latter, validation remains one of the more difficult issues within the IRBA. For available statistical techniques see, for example, [2]. Where statistical validation is not reliable, banks should use more qualitative validation techniques, like ensuring good rating process governance, integrity of the input data, and so on.
• Disclosure requirements. Banks that use the IRBA must base their capital and risk disclosure requirements (the Third Pillar of Basel II) on their IRBA figures.

In practice, compliance with the minimum requirements often seems to prove much more difficult and costly than the development of the rating systems as such. The most difficult issues seem to be data availability, IT system implementation, and data feed, the actual rating of entire portfolios, which often requires large amounts of data for all exposures to be fed into the systems (in the worst case manually, as often data not consistent with the rating criteria have been stored in the past) and, connected to this, the buy-in of senior management and the entire credit business into the more risk-sensitive and more transparent IRBA.

Implications for the Bank Internal Use of IRBA Figures

Risk quantification via IRBA can be of great use for the bank internal credit risk measurement and management. However, there are some limitations. The most important of these is surely that due to the asymptotic single risk factor model, the IRBA provides no measure of risk concentrations, be they single name, industry, or regional concentrations. If banks are concerned about concentration effects or (as the other side of the same coin) want to measure the diversification benefits of their portfolio, they need to go further and develop their own, full-fledged credit risk models with more than one risk factor and their own correlation estimates. Likewise, the asymptotic assumption needs to be given up in order to capture idiosyncratic single name risk.
The most significant benefit of the IRBA for bank internal risk management lies in the standardized assessment and measurement of stand-alone borrower and exposure credit risk. Credit risk becomes much more transparent within the organization, and there is “one common currency” for risk, expressed by the risk parameters PD, LGD, and EAD and the regulatory capital charges based on them.

References

[1] BCBS (2004). An Explanatory Note on the Basel II IRB Risk Weight Functions. Basel Committee on Banking Supervision.
[2] BCBS (2005). Studies on the Validation of Internal Rating Systems. Basel Committee on Banking Supervision, Working Paper No. 14.
[3] BCBS (2006). International Convergence of Capital Measurement and Capital Standards. A Revised Framework, Comprehensive Version. Basel Committee on Banking Supervision.
[4] Gordy, M. (2003). A risk-factor model foundation for ratings-based bank capital rules, Journal of Financial Intermediation 12(3), 199–232.
[5] Merton, R.C. (1974). On the pricing of corporate debt: the risk structure of interest rates, Journal of Finance 29(2), 449–470.
[6] Vasicek, O.A. (2002). The distribution of loan portfolio value, Risk 15, 160–162.

Related Articles

Credit Rating; Credit Risk; Credit Scoring; Economic Capital; Exposure to Default and Loss Given Default; Large Pool Approximations; Regulatory Capital.

KATJA PLUTO & DIRK TASCHE
Exposure to Default and Loss Given Default

In the study of credit risk, the most relevant factor has traditionally been the borrower’s probability of default (or intensity of default), expressing default risk and, indirectly, migration risk. However, there are other risk profiles that significantly affect the loss experienced by the lender upon the occurrence of a default: exposure at default and loss given default. The uncertainty surrounding these variables gives rise, respectively, to exposure risk and recovery risk. These risks (captured through parameters like EAD, LGD, and RR, as explained below) have become increasingly popular thanks to the preliminary drafts of the new accord on bank capital requirements (“Basel II”) that were circulated by the Basel Committee after 1999 and led to a new regulatory text in 2004 [12].

Exposure at Default and Exposure Risk

In the simplest forms of credit exposure, the amount due to the lender in the event of a default (that is, the exposure at default (EAD)) is known with certainty. This is the case, for example, of zero-coupon bonds or fixed-term loans, where the balance outstanding is predetermined in advance and cannot be modified without a formal credit restructuring.

However, the amount outstanding in the event of a default might also be uncertain, basically due to the following reasons:

1. changes in the value of the contract to which the defaulted party had committed itself (typically, an OTC derivative affected by a number of underlying variables);
2. the presence of a revolving credit line (e.g., a loan commitment) where the borrower could increase his/her credit usage before default.

While case 1, known as counterparty risk, can be considered as a sort of intersection between credit and market risk, case 2 represents a typical example of exposure risk. Here, the borrower’s current exposure (that is, the drawn part of the credit line, DP) can increase to a larger EAD, with the increase (E) potentially as large as the current unused portion (UP) of the credit line. To account for exposure risk, banks compute credit conversion factors (CCF) as CCF ≡ E/UP.[a] Once a set of CCFs, associated with different types of borrowers and exposures, has been estimated, a bank can forecast EAD as

$$
\mathrm{EAD} = \mathrm{DP} + \mathrm{CCF} \cdot \mathrm{UP} \tag{1}
$$

CCFs are usually calibrated through a statistical analysis of past defaults (see, e.g., [9, 11, 25, 27]), where the CCF is explained through the characteristics of the borrower, the exposure, and the economic environment. When past events are analyzed, the UPs must be recorded some time before the default: this can be a fixed interval (“fixed time horizon,” e.g., 12 months before the default) or a fixed moment in time for all defaults that occurred in the same period (“cohort approach,” e.g., January 1 for all exposures defaulted in a given year); multiple UPs can also be recorded at several different instants in time (“variable time horizon,” e.g., 6, 12, 24 months before default) to assess the impact of time-to-default on exposure risk.[b]

In fact, CCFs can be expected to increase with time-to-default: a study based on some 400 borrowers in the period 1995–2000 [9] has shown that one-year CCFs average 32%, while five-year CCFs average 72%; this may be due to a rating migration effect and a greater opportunity to draw down. CCFs also seem to be driven by the percent usage ratio of the credit line (DP/(DP + UP)): lower usage rates are usually associated with higher CCFs and with better ratings [9].

A well-known relationship also exists between ratings and CCFs: indeed, the latter have often been found to increase for borrowers with better ratings.[c] In other words, exposure risk is especially significant when default risk is comparatively low. This is an expected result, given that firms with investment-grade ratings can get funds from the commercial paper market or by negotiating better terms with their suppliers, and hence tend to use a small portion of the available credit lines (which are comparatively more expensive); however, as their financial shape deteriorates and default gets closer, firms quickly resort to bank credit lines, as other sources of funds dry up.

Besides focusing on loan behavior at default, one can assess exposure risk by monitoring credit usage throughout the life of a facility, including both defaulted and performing exposures. These “usage
ratios” have been found to behave very differently for firms that eventually default, even several years later, as opposed to nondefaulting obligors. For example, a sample of about 770 000 committed credit lines recorded in the Spanish central credit register[d] shows that defaulting exposures have a median usage ratio of 50%, in contrast to 43% for nondefaulting facilities; this median usage ratio was found to increase (71%) in the last year before default. Usage ratios are instead lower, all other things being equal, for “seasoned” credit lines (i.e., credit lines that have been in place for a number of years); this suggests that relationship banking may play a role in preventing usage peaks in credit lines.

Other borrower characteristics may also help explain exposure risk: for example, usage ratios have been found to be higher for younger, smaller, and less profitable firms (as age, size, and profitability tend to be inversely related to PD, this is consistent with poorly rated companies being more dependent on bank credit lines).[e] Other important explanatory variables are the borrower’s leverage, liquidity, and debt cushion; also, exposure risk tends to be higher for larger companies and for those having a larger share of bank debt in their liabilities mix [25]. However, generally, firm characteristics tend to have a comparatively limited impact on CCFs and usage statistics.

Exposure risk also seems to be affected by the macroeconomic cycle. For example, the gross domestic product (GDP) growth rate has been found [27] to be inversely related to credit line usage, and such a link is especially meaningful in the case of a slowdown or recession. This makes sense, as credit lines are often used to provide a liquidity buffer for borrowers in times of financial strain.

Other measures have been proposed as an alternative to CCFs: these are the EAD factor, EADF = EAD/(DP + UP), and the exposure multiplier, EM = EAD/DP. The former can be considered as a special case of the usage ratio, recorded at the time of default; the latter cannot be computed when a credit line was totally undrawn before the borrower’s default.[f]

CCFs can usually be expected to lie between 0 (if the UP is still unused at default) and 1 (if the whole UP gives rise to an extra exposure). However, the E and hence the CCF could also be negative; this is likely to be the case if the credit line is revocable or has some covenant entitling the bank to claim its money back before a proper default takes place. Curiously, however, Basel II states that CCFs cannot be set below zero, regardless of any empirical evidence that a bank may produce to its supervisors.

Apart from OTC derivatives and credit lines, exposure risk can also arise from the issuance of guarantees and other off-balance-sheet items (e.g., letters of credit, bid bonds, and performance bonds) that might be used by third parties to get relief after the default of the guaranteed entity (leading to monetary outflow for the guarantor, that is, to an EAD). In this case, the EAD can be anywhere between zero and the amount of the off-balance-sheet item (OBS), and CCFs can be computed as CCF ≡ EAD/OBS. CCF estimates associated with different types of guarantees and OBS can then be used to forecast the EAD as EAD = CCF · OBS.

Loss Given Default and Recovery Risk

The loss rate given default, or simply loss given default (LGD[g]), is the loss rate experienced by a lender on a credit exposure if the borrower defaults. It is given by 1 minus the recovery rate (RR) (see equation (3)) and can take any value between 0 and 100%. Formally,

$$
\mathrm{LGD} = 1 - \mathrm{RR} \tag{2}
$$

LGD is never known when a new loan is issued, although a reasonable estimate can be produced when the default occurs, at least if there is a secondary market where the defaulted exposure can be traded. In fact, RRs can be computed based on several approaches [8, 33]:

1. The market LGD approach uses prices of defaulted exposures as an estimate of the RR. In practice, if a defaulted bond trades at 30 cents a euro, one can infer that the market is estimating a 30% RR (hence, a 70% LGD). This approach can be used only for exposures traded on a secondary market.
A variation of this approach (emergence LGD approach) estimates the RR on the basis of the market value of the new financial instruments (usually, shares or long-term bonds) that are offered to lenders in exchange for their defaulted claims. These are usually issued only when the restructuring process is over and the company
emerges from default; their market price must, therefore, be discounted back in time to the moment when the default took place, using an adequate discount rate.

A third version of market LGD involves the use of spreads on performing bonds as a source of information; in fact, spreads on corporate bonds depend on both the borrower’s PD and the expected RR. Assuming the PD can be estimated otherwise, one can then work out the LGD implied by market spreads (implicit market LGD); alternatively, by assuming that some relationship exists between PD and LGD (see below), PD and LGD can be derived jointly [13]. Note that implicit market LGD makes it possible to use a considerably larger dataset, including performing exposures, and not only defaulted ones. However, LGDs derived from market prices are often risk-neutral quantities; therefore, some assumption on the relationship between them and real-world LGDs is needed if implicit market LGDs are to be used.

2. When market data are not available (as for most traditional banking loans, where no secondary market exists) one must turn to the workout approach. This is based on the actual recoveries (and recovery costs) experienced by the lender in the months (years) after the default took place. It therefore requires setting up a database, where all recoveries on defaulted exposures are filed. According to this approach, the RR (also known as ultimate recovery) can be computed using the following equation:

$$RR = \frac{\sum_i R_i \cdot (1 + r)^{-T_i}}{EAD} \tag{3}$$

where $R_i$ is the $i$th recovery flow associated with the defaulted exposure (negative $R_i$ values denote recovery costs), $r$ is the appropriate discount rate (end note h), and $T_i$ is the time elapsed between the default and the $i$th recovery. Note that, based on equation (3), RR can be negative (hence LGD can exceed 100%) if recoveries do not offset recovery costs.

The determinants of RRs have been extensively investigated, mainly based on the market LGD approach, although some examples of workout LGDs exist (mainly for bank loans). Indeed, one of the first studies estimating RRs in the 1970s [6] was based on a survey carried out among the workout departments of a number of large banks in the period 1971–1975; the average recovery on unsecured loans (based on the face value of cash flows on defaulted exposures, recorded in the first three years after default and not discounted) was found to be about 30%.

In the following years, recoveries on bank loans have been found (end note i) to be affected by many factors, including the size of the loan and different collateral types. More generally, the four main drivers of RRs and LGDs can be summarized as follows:

Exposure characteristics
These include the presence of any collateral (be it represented by financial assets or other goods, such as plants, real estate, inventories) and its degree of effectiveness (that is, how easily it can be seized and liquidated); the priority level of the exposure, which can be senior or subordinated to other exposures; any guarantees provided by third parties (like banks, holding companies or public sector entities). An important driver of recoveries is also the exposure’s “debt cushion”, that is, the amount of the liabilities in the borrower’s balance sheet that are junior to the one being evaluated; as the volume of such junior securities increases, so does the RR on the senior exposure, as its holders are more likely to find an adequate volume of assets to be liquidated and used as a source of cash [28, 34].

Borrower characteristics
These include the industry where the company operates, which may affect the liquidation process, that is, the ease with which the firm’s assets can be sold and turned into cash for the creditors (end note j); the country of the obligor, which affects the speed and effectiveness of the bankruptcy procedures; some financial ratios, like the leverage (namely, the ratio between total assets and liabilities, which shows how many euros of assets are reported in the balance sheet for each euro of debt to be paid back) and the ratio of EBITDA (earnings before interest, taxes, depreciation, and amortization) to total turnover (which indicates whether the defaulted company is still capable of generating an adequate level of cash flow for its would-be borrowers). Another interesting variable affecting LGD is the borrower’s original rating: indeed, “fallen angels” (i.e., investment-class obligors that were downgraded to junk) appear to behave
differently from straight speculative-grade issuers, and have been found to recover significantly more than bonds of the same seniority that were rated as speculative-grade at issuance (end note k).

Lender (e.g., bank) characteristics
These may include the efficiency levels of the department that takes care of the recovery process (workout department) or the frequency with which out-of-court settlements are reached with the borrowers, or nonperforming loans are spun off and sold to third parties; in fact, sales of nonperforming loans and out-of-court settlements, while reducing the face value of the recovery (compared to what could be obtained by the bank on the basis of a formal bankruptcy procedure), also significantly shorten the duration of the recovery process. The financial effect of this shorter recovery time usually more than offsets the lower recovered amount.

Macroeconomic variables
These mainly include the level of the interest rates (higher rates reduce the present value of recoveries) and the state of the economic cycle (if the economy is in recession, the value at which the company’s assets can be liquidated is likely to be lower).

In recent years, an important stream of research has addressed the relationship between PD and LGD. From a theoretical point of view, the same macroeconomic background variables that affect the default probability of the borrowers (and cause default rates to rise) may drive down the liquidation value of assets and increase LGD (so that the distribution of LGDs is different in high-default and low-default periods) (end note l). This intuition has prompted a number of models (end note m) generalizing the “classic” single-factor model in [17] and [22] to the case where recoveries and defaults are driven by a common component (usually systemic in nature).

From an empirical point of view, several pieces of evidence indicate that LGDs and default rates tend to increase together when the overall economic cycle deteriorates. For example, using data on US corporate bonds (Moody’s Default Risk Service database) for 1982–1997, one finds that in a severe economic downturn (when defaults are more frequent), recoveries can be expected to drop by 20–25% compared with their unconditional average [20]. Similar results are found using Standard and Poor’s Credit Pro database (bond and loan defaults) for 1982–1999 [15], as well as junk bond data for 1982–2000 (end note n). Evidence of a strong relationship between LGD and the state of the economy, including default frequencies, is also found by Moody’s KMV in its LossCalc model [23], estimated on a dataset of over 3000 recoveries on loans, bonds, and preferred stock.

The correlation between economic cycle and recoveries appears stronger if estimated at the industry level [1]. In fact, if the sector where the borrower used to operate is undergoing a recession, the lender will find it more difficult to find a buyer for the defaulted company or its assets (as competitors are likely to suffer from excess production capacity) and recoveries will be lower than expected. As recessions may occur at the industry level when the economy as a whole is doing reasonably well, moving from economy-wide to industry-specific conditions can make the empirical link between default rates and recoveries much easier to detect.

The PD/LGD correlation has wide-ranging implications for credit risk models. First, the expected loss rate can no longer be considered as the product of the expected LGD times the borrower’s unconditional PD, since a second, positive addendum must be factored in, accounting for covariance. Second, unexpected loss and Value at Risk prove to be considerably higher than they are if independence is assumed, as shown by [7]; in other words, if systematic risk plays an important role for RRs, estimates of economic capital that ignore it turn out to be downward biased (end note o).

While most RR studies focus on mean or median values, it is also important to understand the whole probability distribution of recoveries, if extreme scenarios are to be fully understood and managed. In the case of bank loans, the probability distribution of workout LGDs is usually strongly bimodal, with peaks at 0% and 100%. In the case of bonds, unimodal distributions may be sensible, but still it is strongly advisable to use flexible distributions, such as the beta (which can be either uni- or bimodal depending on the estimated parameters, and can easily be fit to the data by the generalized method of moments) (end note p).

Finally, it is worth emphasizing that, as with all other risks, recovery risk may also produce profits. Indeed, the price performance of defaulted bonds (estimated by comparing market LGDs to emergence LGDs) can be very strong, although this is not always the case: while senior bonds (both secured and unsecured) have been found to perform
very well in the postdefault period (with per annum returns of 20–30%), junior bonds often show negative returns [3].

Acknowledgments

Part of this article, especially the LGD section, draws on previous work carried out with Andrea Sironi, to whom I wish to express my gratitude.

End Notes

a. CCFs are sometimes also known as loan equivalents (LEQs).
b. See [30] for further details on fixed time horizon, cohort approach, and variable time horizon.
c. See, for example, [11], where a sample of loan commitments in 1987–93 is analyzed, [25], based on 3281 defaulted exposures issued by 720 borrowers in 1985–2006, or [9].
d. These are all loan commitments above €6000 issued by Spanish banks after 1984. See [27] for further details.
e. See again [27], based on a subset of about 86 000 companies.
f. The EM is sometimes referred to as the CCF (in which case, what we called CCF is indicated as LEQ). Note that, given the important role played by bank capital regulation in shaping credit risk measurement techniques and jargon, we chose to use the word CCF in a way that is consistent with the terminology of the new Basel accord.
g. In principle, one should indicate the loss rate given default as LGDR (LGD rate) and use LGD for the absolute LGD (in euros or dollars). However, “LGD” is used by most practitioners (and by the new Basel accord on bank capital) to indicate the loss rate, while the absolute loss is usually indicated as LGD · EAD.
h. The choice of a suitable risk-adjusted r is far from trivial, and basically depends on the amount of systemic risk of the defaulted exposure. See [29].
i. See, for example, [10], based on 24 years of data compiled by Citibank, or [14], using a sample of 371 loans issued by Portugal’s largest private bank during 1985–2000; both studies are based on the workout approach. A study on bank loans (large syndicated loans traded on the secondary market) based on the market LGD approach is, for example, [16].
j. See [1], based on market LGDs observed in the United States during 1982–1999. See also [5] and the literature survey in [33].
k. See [4], based on a sample of corporate bonds stratified by original rating and seniority: in the case of senior-secured exposures, for example, the median RR for fallen angels was 50.5% versus 33.5%.
l. A somewhat different approach has been proposed by Peura and Jovivuolle [31]. Using an option-pricing framework à la Merton, they present a model where collateral value is correlated with the value of the borrower’s assets and hence with his/her PD. This leads to an inverse relationship between default rates and RRs.
m. See [19–21]. Jarrow [26] presents a model where, as in Frye’s works, RRs and PDs are correlated and depend on the state of the economy; however, his methodology explicitly incorporates equity prices in the estimation procedure, allowing the separate identification of RRs and PDs and the use of a larger dataset. Furthermore, he explicitly incorporates a liquidity premium to account for the high variability in the spreads on US corporate debt. Models that account for the dependence of recoveries on systematic risk by extending Gordy’s single-factor model are also proposed in [32] and [15].
n. See [2]. Note, however, that this study finds that a single systematic risk factor (that is, the performance of the economy as a whole) is less predictive than theoretical models would suggest, while a key role is played by the supply of defaulted bonds.
o. See also the empirical results in [15].
p. For a more flexible approach, see [24], where a variation of the Gaussian kernel, known as Beta kernel, is used to fit the distribution of RRs of a sample of defaulted bonds from the period 1981–1999. See also [18] for an interesting utility-based approach to the estimation of the conditional probability distribution of RRs.

References

[1] Acharya, V., Bharath, S. & Srinivasan, A. (2007). Does industry-wide distress affect defaulted firms: evidence from creditor recoveries, Journal of Financial Economics 85, 787–821.
[2] Altman, E.I., Brady, B., Resti, A. & Sironi, A. (2005). The link between default and recovery rates: theory, empirical evidence and implications, Journal of Business 78(6), 2203–2228.
[3] Altman, E.I. & Eberhart, A. (1994). Do seniority provisions protect bondholders’ investments? Journal of Portfolio Management (Summer), 67–75.
[4] Altman, E.I. & Fanjul, G. (2004). Defaults and returns in the high-yield bond market: the year 2003 in review and market outlook, in Credit Risk: Models and Management, D. Shimko, ed, Risk Books, London.
[5] Altman, E.I. & Kishore, V.M. (1996). Almost everything you wanted to know about recoveries on defaulted bonds, Financial Analysts Journal 52(6), 57–64.
[6] Altman, E.R.H. & Narayanan, P. (1977). ZETA analysis: a new model to identify bankruptcy risk of corporations, Journal of Banking & Finance 1(1), 29–54.
[7] Altman, E.I., Resti, A. & Sironi, A. (2005). Recovery Risk: The Next Challenge in Credit Risk Management, Risk Books, London.
[8] Altman, E., Resti, A. & Sironi, A. (2005). The PD/LGD link: implications for credit risk modelling, in Recovery
Risk: The Next Challenge in Credit Risk Management, E. Altman, A. Resti & A. Sironi, eds, Risk Books, London, pp. 253–266.
[9] Araten, M. & Jacobs, M.J. (2001). Loan equivalents for revolving credit and advised lines, The RMA Journal 83(8), 34–39.
[10] Asarnow, E. & Edwards, D. (1995). Measuring loss on defaulted bank loans: a 24 year study, Journal of Commercial Bank Lending 77(7), 11–23.
[11] Asarnow, E. & Marker, J. (1995). Historical performance of the US corporate loan market: 1988–1993, Journal of Commercial Bank Lending (Spring), 13–32.
[12] Basel Committee on Banking Supervision (2006). International Convergence of Capital Measurement and Capital Standards: A Revised Framework, Comprehensive Version, Bank for International Settlements, Basel.
[13] Das, S.R. & Hanouna, P.E. (2007). Implied Recovery, available at SSRN, http://ssrn.com/abstract=1028612.
[14] Dermine, J. & Neto de Carvalho, C. (2005). How to measure recoveries and provisions on bank lending: methodology and empirical evidence, in Recovery Risk: The Next Challenge in Credit Risk Management, E. Altman, A. Resti & A. Sironi, eds, Risk Books, London, pp. 101–120.
[15] Duellmann, K. & Trapp, M. (2005). Systematic risk in recovery rates of US corporate credit exposures, in Recovery Risk: The Next Challenge in Credit Risk Management, E. Altman, A. Resti & A. Sironi, eds, Risk Books, London, pp. 235–252.
[16] Emery, K. (2003). Moody’s Loan Default Database as of November 2003, Moody’s Investors Service, New York.
[17] Finger, C. (2001). The one-factor creditmetrics model in the new Basel capital accord, RiskMetrics Journal 2(1), 9–18.
[18] Friedman, C. & Sandow, S. (2003). Ultimate recoveries, Risk (August), 69–73.
[19] Frye, J. (2000). Collateral damage, Risk (April), 91–94.
[20] Frye, J. (2000). Collateral Damage Detected, Federal Reserve Bank of Chicago, Chicago.
[21] Frye, J. (2000). Depressing recoveries, Risk (November), 108–111.
[22] Gordy, M.B. (2003). A risk-factor model foundation for ratings-based bank capital rules, Journal of Financial Intermediation 12, 199–232.
[23] Gupton, G.M. & Stein, R.M. (2002). LossCalc: Moody’s Model for Predicting Loss Given Default (LGD), Moody’s Investors Service, New York.
[24] Hagmann, M., Renault, O. & Scaillet, O. (2005). Estimation of recovery rate densities: non-parametric and semi-parametric approaches versus industry practice, in Recovery Risk: The Next Challenge in Credit Risk Management, E. Altman, A. Resti & A. Sironi, eds, Risk Books, London.
[25] Jacobs, M. (2007). An Empirical Study of Exposure at Default, mimeo, Office of the Comptroller of the Currency, Washington, DC.
[26] Jarrow, R. (2001). Default parameter estimation using market prices, Financial Analysts Journal 57(5), 75–92.
[27] Jiménez, G., Lopez, J.A. & Saurina, J. (2007). Empirical Analysis of Corporate Credit Lines, working paper 2007–14, Federal Reserve Bank of San Francisco, San Francisco.
[28] Keisman, D. (2003). Loss Stats, Standard & Poor’s, New York.
[29] Maclachlan, I. (2005). Choosing the discount factor for estimating economic LGD, in Recovery Risk: The Next Challenge in Credit Risk Management, E. Altman, A. Resti & A. Sironi, eds, Risk Books, London, pp. 285–306.
[30] Moral, G. (2006). EAD estimates for facilities with explicit limits, in The Basel II Risk Parameters: Estimation, Validation and Stress Testing, E. Bernd & R. Robert, eds, Springer Verlag, Berlin.
[31] Peura, S. & Jovivuolle, E. (2005). LGD in a structural model of default, in Recovery Risk: The Next Challenge in Credit Risk Management, E.I. Altman, A. Resti & A. Sironi, eds, Risk Books, London, pp. 201–216.
[32] Pykhtin, M. (2003). Unexpected recovery risk, Risk 16(8), 74–78.
[33] Schuermann, T. (2005). What do we know about Loss Given Default? in Recovery Risk: The Next Challenge in Credit Risk Management, A. Resti & E.I. Altman, eds, Risk Books, London.
[34] Van de Castle, K. & Keisman, D. (1999). Recovering Your Money: Insights Into Losses from Defaults, Standard & Poor’s, New York.

Further Reading

Schleifer, A. & Vishny, R. (1992). Liquidation values and debt capacity: a market equilibrium approach, Journal of Finance 47, 1343–1366.

Related Articles

Counterparty Credit Risk; Recovery Rate; Value-at-Risk.

ANDREA RESTI
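
As a supplementary illustration of the workout approach, equation (3) of the article above can be sketched in a few lines of Python. This is only a minimal sketch: the function name `workout_recovery_rate`, the sample flows, and the 10% discount rate are hypothetical, and LGD is taken as 1 − RR, in line with the remark that a negative RR implies an LGD above 100%.

```python
def workout_recovery_rate(flows, discount_rate, ead):
    """Equation (3): present value of recovery flows divided by EAD.

    flows: iterable of (amount, years_since_default) pairs;
           negative amounts denote recovery costs.
    """
    if ead <= 0:
        raise ValueError("EAD must be positive")
    present_value = sum(
        amount * (1.0 + discount_rate) ** (-years) for amount, years in flows
    )
    return present_value / ead


# Hypothetical defaulted loan: EAD of 100, two recoveries and one legal cost.
sample_flows = [(40.0, 1.0), (30.0, 2.0), (-5.0, 0.5)]
rr = workout_recovery_rate(sample_flows, discount_rate=0.10, ead=100.0)
lgd = 1.0 - rr  # assumption: LGD expressed as a rate, LGD = 1 - RR
print(f"RR = {rr:.4f}, LGD = {lgd:.4f}")
```

Because costs enter as negative flows, a workout in which costs exceed recoveries returns a negative RR, so the implied LGD exceeds 100%.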
Credit Portfolio Simulation (end note a)

Portfolio Modeling

In risk management, quantitative techniques are mainly used for measuring the risk in a portfolio of assets rather than computing the prices of individual securities. The quantification of portfolio risk is traditionally split into separate calculations for market and credit risk, which are performed in different types of portfolio models. This article focuses on credit risk, more precisely on simulation techniques in structural credit portfolio models. We refer to [4] for a comprehensive exposition of Monte Carlo (MC) methods in quantitative finance including applications in market risk models.

In a typical bank, risk capital for credit risk far outweighs capital requirements for any other risk class. Key drivers of credit risk are concentrations in a bank’s credit portfolio. Depending on their formulation, credit portfolio models can be divided into reduced-form models and structural (or firm-value) models (see Reduced Form Credit Risk Models; Structural Default Risk Models). The progenitor of all structural models is the model of Merton [13], which links the default of a firm to the relationship between its assets and the liabilities that it faces at the end of a given time period $[0, T]$. More precisely, in a structural credit portfolio model, the $i$th counterparty defaults if its ability-to-pay variable $A_i$ falls below a default threshold $D_i$: the default event at time $T$ is defined as $\{A_i \le D_i\} \subseteq \Omega$, where $A_i$ is a real-valued random variable on the probability space $(\Omega, \mathcal{A}, \mathbb{P})$ and $D_i \in \mathbb{R}$. The portfolio loss variable is defined by

$$L := \sum_{i=1}^{n} l_i \cdot 1_{\{A_i \le D_i\}} \tag{1}$$

where $n$ denotes the number of counterparties and $l_i$ is the loss-at-default of the $i$th counterparty. To reflect risk concentrations, each $A_i$ is decomposed into a sum of systematic factors $X_1, \ldots, X_m$, which are often identified with geographic regions or industries, and an idiosyncratic (or firm-specific) factor $Z_i$, that is, $A_i$ has the representation

$$A_i = \sqrt{R_i^2} \sum_{j=1}^{m} w_{ij} X_j + \sqrt{1 - R_i^2} \, Z_i \tag{2}$$

The idiosyncratic factors are independent of each other as well as independent of the systematic factors. It is usually assumed that the factors follow a multivariate Gaussian distribution. We refer to this class of models as Gaussian multifactor models (end note b). The impact of the risk factors on $A_i$ is determined by $R_i^2 \in [0, 1]$ and the factor weights $w_{ij} \in \mathbb{R}$.

To quantify portfolio risk, measures of risk are applied to the portfolio loss distribution (1). The most widely used risk measures in banking are Value-at-Risk and expected shortfall: Value-at-Risk $\mathrm{VaR}_\alpha(L)$ of $L$ at level $\alpha \in (0, 1)$ is simply an $\alpha$-quantile of $L$, whereas expected shortfall of $L$ at level $\alpha$ is defined by

$$ES_\alpha(L) := (1 - \alpha)^{-1} \int_\alpha^1 \mathrm{VaR}_u(L) \, du$$

For most practical applications, the average of all losses above the $\alpha$-quantile is a good approximation of $ES_\alpha(L)$: for $c := \mathrm{VaR}_\alpha(L)$ we have

$$ES_\alpha(L) \approx E(L \mid L > c) = (1 - \alpha)^{-1} \int L \cdot 1_{\{L > c\}} \, d\mathbb{P} \tag{3}$$

This approximation is an exact equality unless the distribution of $L$ has an atom at $c$, a situation that very rarely arises in practice.

Simulation Techniques

Since the portfolio loss distribution (1) does not have an analytic form, the actual calculation and allocation of portfolio risk is a challenging problem. Saddlepoint techniques have been successfully applied to certain types of portfolios; see, for example, [10] or Saddlepoint Approximation. The most flexible approach, however, is based on MC simulation of the portfolio loss distribution. The following are the main steps in generating one MC sample:

1. calculation of a sample $(x_1, \ldots, x_m)$ of the correlated systematic factors and a sample $(z_1, \ldots, z_n)$ of the independent idiosyncratic factors;
2. calculation of the corresponding values $(a_1, \ldots, a_n)$ of the ability-to-pay variables using equation (2);
3. calculation of the set of defaulted counterparties defined by $\mathrm{Def} := \{i \in \{1, \ldots, n\} \mid a_i \le D_i\}$;
4. calculation of the portfolio loss: the sum $\sum_{i \in \mathrm{Def}} l_i$ is a sample of the portfolio loss distribution.

The MC scenarios of the portfolio loss distribution are used as input for the calculation of risk measures. As an example, we compute expected shortfall with respect to the $\alpha = 99.9\%$ quantile based on $k = 100\,000$ MC samples $s_1 \ge s_2 \ge \ldots \ge s_k$ of the portfolio loss $L$. Then $ES_\alpha(L)$ becomes

$$(1 - \alpha)^{-1} \int L \cdot 1_{\{L > c\}} \, d\mathbb{P} \approx \sum_{i=1}^{100} s_i / 100 \tag{4}$$

Since $ES_\alpha(L)$ is calculated as the average of 100 samples only, the MC estimate is subject to large statistical fluctuations and is numerically unstable. This holds even more strongly for expected shortfall contributions of individual transactions. A significantly higher number of samples has to be computed, which makes straightforward MC simulation impracticable for large credit portfolios.

Different techniques have been developed that reduce the variance of MC simulations and, as a consequence, the number of samples required for stable results. We refer to [4] for a general introduction to variance reduction techniques including control variates, antithetic variables, stratified sampling, moment matching, and importance sampling. Recent research [3, 5–9, 12, 14] has shown that importance sampling is particularly efficient for stabilizing MC simulation in Gaussian multifactor models. Importance sampling attempts to reduce variance by changing the probability measure used for generating MC samples. In the above setting, the integral in equation (3) is replaced by the equivalent integral on the right-hand side of the equation

$$\int L \cdot 1_{\{L > c\}} \, d\mathbb{P} = \int L \cdot 1_{\{L > c\}} \cdot f \, d\bar{\mathbb{P}} \tag{5}$$

where $\mathbb{P}$ is absolutely continuous with respect to the probability measure $\bar{\mathbb{P}}$ and has (Radon–Nikodym) density $f$. This change of measure results in the MC estimate

$$ES_\alpha(L)_{k, \bar{\mathbb{P}}} := \frac{1}{k} \sum_{i=1}^{k} L_{\bar{\mathbb{P}}}(i) \cdot 1_{\{L_{\bar{\mathbb{P}}}(i) > c\}} \cdot f(i) \tag{6}$$

where $L_{\bar{\mathbb{P}}}(i)$ is a realization of the portfolio loss $L$ under the probability measure $\bar{\mathbb{P}}$ and $f(i)$ is the corresponding value of the density function. The objective is to choose the probability measure $\bar{\mathbb{P}}$ in such a way that the variance of the MC estimate for the integral (5) is minimal under $\bar{\mathbb{P}}$. A general formula for the optimal importance sampling measure $\bar{\mathbb{P}}$ is given in [15], which transforms equation (6) into a zero-variance estimator. However, since the construction requires knowledge of the integral (3) itself, the optimal measure cannot be used in the actual calculation. Nevertheless, it provides guidance on the design of an effective importance sampling strategy. Another technique for measure transformation, called exponential tilting, applies exponential families of distributions, which are specified by cumulant generating functions [1, 4]. As a general rule, detailed knowledge about the model (often in the form of asymptotic approximations) is indispensable for the construction of importance sampling algorithms. It is precisely this feature of importance sampling that makes the practical application more difficult but, on the other hand, increases the effectiveness of the methodology.

Importance sampling in Gaussian multifactor models utilizes the conditional independence of ability-to-pay variables by splitting the simulation of the portfolio loss distribution into two steps (compare to [11] in the more general context of mixture models). In a first step, importance sampling is used to simulate the systematic factors, and then the independence of the ability-to-pay variables conditional on systematic scenarios is exploited, for example, by another application of importance sampling or by limit theorems [7, 8].

A natural importance sampling measure $\bar{\mathbb{P}}$ for the systematic factors is a negative shift, that is, the systematic factors have a negative mean under $\bar{\mathbb{P}}$, which enforces a higher number of defaults and therefore increases the stability of the MC estimate. For calculating the shift, Glasserman and Li [7] minimize an upper bound on the second moment of the importance sampling estimator of the tail probability. Furthermore, they show that the corresponding importance sampling scheme is asymptotically optimal. The approach in [8, 9] utilizes the infinite
granularity approximation of the portfolio loss distribution (compare to [16]). More precisely, the original portfolio $P$ is approximated by a homogeneous and infinitely granular portfolio $\bar{P}$. The loss distribution of $\bar{P}$ can be specified by a Gaussian one-factor model. The calculation of the shift of the systematic factors is now done in two steps: in the first step, the optimal mean is calculated in the one-factor setting and then the scalar mean is lifted to a mean vector for the systematic factors in the original multifactor model. Other importance sampling techniques [3, 6] are based on the Robbins–Monro stochastic approximation method or use large deviation analysis to calculate multiple mean shifts.

The efficiency of the proposed variance reduction schemes heavily depends on the portfolio characteristics. For example, the technique proposed in [8, 9] is tailored to large and well-diversified portfolios. For those portfolios the analytic loss distribution of the infinitely granular portfolio provides an excellent fit, which typically reduces the variance, and therefore the number of required MC scenarios, by a factor of more than 100. Smaller portfolios with low dependence on systematic factors, on the other hand, are dominated by idiosyncratic risk, which increases the relative importance of variance reduction techniques on idiosyncratic factors [7, 8], for example, importance sampling based on exponential tilting.

End Notes

a. The views expressed in this article are those of the author and do not necessarily reflect the position of Deutsche Bank AG.
b. A survey on credit portfolio modeling can be found in [2, 11].

References

[1] Barndorff-Nielsen, O. (1978). Information and Exponential Families, Wiley.
[2] Bluhm, C., Overbeck, L. & Wagner, C. (2002). An Introduction to Credit Risk Modeling, CRC Press/Chapman & Hall.
[3] Egloff, D., Leippold, M., Jöhri, S. & Dalbert, C. (2005). Optimal Importance Sampling for Credit Portfolios with Stochastic Approximations. Working paper, Zürcher Kantonalbank, Zurich.
[4] Glasserman, P. (2004). Monte Carlo Methods in Financial Engineering, Springer.
[5] Glasserman, P. (2005). Measuring marginal risk contributions in credit portfolios, Journal of Computational Finance 9, 1–41.
[6] Glasserman, P., Kang, W. & Shahabuddin, P. (2007). Fast Simulation of Multifactor Portfolio Credit Risk. Working paper, Columbia University, New York.
[7] Glasserman, P. & Li, J. (2005). Importance sampling for portfolio credit risk, Management Science 51, 1643–1656.
[8] Kalkbrener, M., Kennedy, A. & Popp, M. (2007). Efficient calculation of expected shortfall contributions in large credit portfolios, Journal of Computational Finance 11, 45–77.
[9] Kalkbrener, M., Lotter, H. & Overbeck, L. (2004). Sensible and efficient capital allocation for credit portfolios, Risk 17(1), S19–S24.
[10] Martin, R., Thompson, K. & Browne, C. (2001). Taking to the saddle, Risk 14(6), 91–94.
[11] McNeil, A.J., Frey, R. & Embrechts, P. (2005). Quantitative Risk Management: Concepts, Techniques, and Tools, Princeton University Press.
[12] Merino, S. & Nyfeler, M. (2004). Applying importance sampling for estimating coherent credit risk contributions, Quantitative Finance 4, 199–207.
[13] Merton, R. (1974). On the pricing of corporate debt: the risk structure of interest rates, Journal of Finance 29, 449–470.
[14] Morokoff, W.J. (2004). An importance sampling method for portfolios of credit risky assets, Proceedings of the 2004 Winter Simulation Conference, IEEE Press, pp. 1668–1676.
[15] Rubinstein, R.Y. (1981). Simulation and the Monte Carlo Method, Wiley.
[16] Vasicek, O. (2002). Loan portfolio value, Risk 15(12), 160–162.

Related Articles

Large Pool Approximations; Monte Carlo Simulation; Structural Default Risk Models; Saddlepoint Approximation; Variance Reduction.

MICHAEL KALKBRENER
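
As a supplementary illustration of the four Monte Carlo steps and the expected shortfall estimate (4) described in the article above, the following Python sketch simulates a small Gaussian multifactor portfolio without any variance reduction. It is a sketch under stated assumptions, not the author's implementation: the function names, the 50-name portfolio, and all parameter values are hypothetical, and the systematic factors are drawn as independent standard normals with weights normalized to unit length (correlated factors would need an extra transformation, for example via a Cholesky factor).

```python
import math
import random


def simulate_losses(exposures, thresholds, r2, weights, k, seed=0):
    """Draw k samples of L = sum_i l_i * 1{A_i <= D_i} (equations (1) and (2))."""
    rng = random.Random(seed)
    n, m = len(exposures), len(weights[0])
    losses = []
    for _ in range(k):
        # Step 1: sample systematic factors x and idiosyncratic factors z.
        # Assumption: independent standard normal systematic factors.
        x = [rng.gauss(0.0, 1.0) for _ in range(m)]
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        loss = 0.0
        for i in range(n):
            # Step 2: ability-to-pay variable a_i from equation (2).
            systematic = sum(w * xj for w, xj in zip(weights[i], x))
            a_i = math.sqrt(r2[i]) * systematic + math.sqrt(1.0 - r2[i]) * z[i]
            # Steps 3 and 4: sum the loss-at-default of defaulted names.
            if a_i <= thresholds[i]:
                loss += exposures[i]
        losses.append(loss)
    return losses


def expected_shortfall(losses, alpha):
    """Average of the largest (1 - alpha) share of the samples, as in (4)."""
    ordered = sorted(losses, reverse=True)
    tail = max(1, round((1.0 - alpha) * len(ordered)))
    return sum(ordered[:tail]) / tail


# Hypothetical homogeneous portfolio: 50 names, unit loss-at-default,
# threshold -2.0, R^2 = 0.2, two factors with weights (0.6, 0.8).
n_names = 50
samples = simulate_losses(
    exposures=[1.0] * n_names,
    thresholds=[-2.0] * n_names,
    r2=[0.2] * n_names,
    weights=[[0.6, 0.8]] * n_names,
    k=2000,
)
print("ES at 99%:", expected_shortfall(samples, alpha=0.99))
```

With 2000 samples the 99% estimate averages only 20 tail scenarios, which is the instability that motivates the importance sampling techniques discussed in the article.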
Counterparty Credit Risk (end note a)

Counterparty credit risk (CCR) is the risk that a counterparty in a financial contract will default prior to the expiration of the contract and will fail to make all the payments required by the contract. Only the contracts privately negotiated between the counterparties, namely over-the-counter (OTC) derivatives and securities financing transactions (SFT), bear CCR. Exchange-traded derivatives are not subject to CCR because all contractual payments promised by these derivatives are guaranteed by the exchange.

CCR is similar to other forms of credit risk (such as lending risk) in that the source of economic loss is an obligor’s default. However, CCR has two unique features that set it apart from lending risk:

• Uncertainty of credit exposure. Credit exposure of one counterparty to the other is determined by the market value of all the contracts between these counterparties. While one can obtain the current exposure from the current contract values, the future exposure is uncertain because the future contract values are not known at present.
• Bilateral nature of credit exposure. Since both counterparties can default and the value of many financial contracts (such as swaps) can change sign, the direction of future credit exposure is uncertain. Counterparty A may be exposed to default of counterparty B under one set of future market scenarios, while counterparty B may be exposed to default of counterparty A under another set of scenarios.

The uncertainty of future credit exposure makes managing and modeling CCR of the trading book challenging. For a comprehensive introduction to CCR, see [1, 5, 17].

Managing and Mitigating Counterparty Credit Risk

One of the most conventional techniques of managing credit risk is setting counterparty-level credit limits. If a new transaction with the counterparty would result in the counterparty-level exposure exceeding the limit, the transaction is not allowed. The limits usually depend on the counterparty’s credit quality: higher rated counterparties have higher limits. To compare uncertain future exposure with a deterministic limit, potential future exposure (PFE) profiles are calculated from exposure probability distributions at future time points. PFE profiles are obtained by calculating a quantile of exposure at a high confidence level (typically, above 90%). Some institutions use different exposure measures, such as expected exposure (EE) profiles, for comparing with the credit limit. It is important to understand that a given credit limit amount is meaningful only in the context of a given exposure measure (e.g., 95%-level quantile).

Future credit exposure can be greatly reduced by means of risk-mitigating agreements between two counterparties, which include netting agreements, margin agreements, and early termination agreements. A netting agreement is a legally binding contract between two counterparties that, in the event of default of one of them, allows aggregation of transactions between these counterparties. Instead of each trade between the counterparties being settled separately, the entire portfolio covered by the netting agreement is settled as a single trade whose value equals the net value of the portfolio. Margin agreements limit the potential exposure of one counterparty to the other by means of requiring collateral should the unsecured exposure exceed a predefined threshold. The threshold value depends primarily on the credit quality of the counterparty: the higher the credit quality, the higher the threshold.

There are two types of early termination agreements: termination clauses and downgrade provisions. A termination clause is specified at the trade level. A unilateral (bilateral) termination clause gives one (both) of the counterparties the right to terminate the trade at the fair market value at a predefined set of dates. A downgrade provision is specified for the entire portfolio between two counterparties. Under a unilateral (bilateral) downgrade provision, the portfolio is settled at its fair market value the first time the credit rating of one (either) of the counterparties falls below a predefined level.

Contract-level Exposure

Let us consider a financial institution (we will call it a bank for brevity) that has a single derivative

contract with a counterparty. The bank's exposure to the counterparty at a given future time is given by the bank's economic loss in the event of the counterparty's default at that time. If the counterparty defaults, the bank must close out its position with the counterparty. To determine the loss arising from the counterparty's default, it is convenient to assume that the bank enters into a similar contract with another counterparty in order to maintain its market position. Since the bank's market position is unchanged after replacing the contract, the loss is determined by the contract's replacement cost at the time of default.

If the trade value at the time of default is negative for the bank, the bank receives this amount when it replaces the trade, but has to forward the money to the defaulting counterparty, so that the net loss is zero. If the trade value at the time of default is positive for the bank, the bank pays this amount when replacing the trade, but receives nothing (assuming no recovery) from the defaulting counterparty, so that the net loss is equal to the trade value.

Summarizing this, we can write the bank's credit exposure to the counterparty at future time $t$ as

$$E_i(t) = \max\{V_i(t), 0\} \qquad (1)$$

where $V_i(t)$ is the value of trade $i$ with the counterparty at time $t$ from the bank's point of view and $E_i(t)$ is the bank's contract-level exposure to the counterparty created by trade $i$ at time $t$.

Since the contract value changes unpredictably over time as the market moves, only the current exposure is known with certainty, while the future exposure is uncertain.

Counterparty-level Exposure and Netting Agreements

If the bank has more than one trade with the counterparty and counterparty risk is not mitigated in any way, the bank's exposure to the counterparty is equal to the sum of the contract-level exposures:

$$E_c(t) = \sum_i E_i(t) = \sum_i \max\{V_i(t), 0\} \qquad (2)$$

where the subscript 'c' stands for 'counterparty'.

Netting agreements allow for a significant reduction of the credit exposure. There may be several netting agreements between the bank and the counterparty, as well as some trades that are not covered by any of the netting agreements. Counterparty-level exposure in this most general case is given by

$$E_c(t) = \sum_k \max\Big[\sum_{i \in NA_k} V_i(t),\, 0\Big] + \sum_{i \notin \{NA\}} \max[V_i(t), 0] \qquad (3)$$

The inner summation in the first term of equation (3) aggregates the values of all trades covered by the $k$th netting agreement (hence the notation $i \in NA_k$), while the outer summation aggregates exposures across all netting agreements. The second term in equation (3) is simply the sum of the contract-level exposures of all trades that do not belong to any netting agreement (hence the notation $i \notin \{NA\}$).

Margin Agreements and Collateral Modeling

Margin agreements can further reduce credit exposure. Margin agreements can be either unilateral or bilateral. Under a unilateral agreement, only one of the counterparties has to post collateral. If the agreement is bilateral, both counterparties have to post collateral.

Usually a margin agreement covers one or more netting agreements. We can generalize equation (3) by specifying collateral amount $C_k(t)$ available to the bank under netting agreement $NA_k$ at time $t$, with the convention that this amount is positive when the bank holds collateral and negative when the bank has posted collateral:

$$E_c(t) = \sum_k \max\Big[\sum_{i \in NA_k} V_i(t) - C_k(t),\, 0\Big] + \sum_{i \notin \{NA\}} \max[V_i(t), 0] \qquad (4)$$

For netting agreements that are not covered by a margin agreement, collateral is identically zero.

Unilateral Margin Agreements

Let us consider a single unilateral (in the bank's favor) margin agreement with the threshold $H_c \ge 0$

and minimum transfer amount (MTA). When the portfolio value exceeds the threshold, the counterparty must post collateral to keep the bank's exposure from rising above the threshold. As the exposure drops below the threshold, the bank returns collateral to the counterparty. MTA limits the frequency of collateral exchange. It is difficult to model collateral subject to MTA exactly because that would require daily simulation time points, which is not feasible given the long-term nature of exposure modeling. In practice, the actual threshold $H_c$ is often replaced by the effective threshold defined as $H_c^{(e)} = H_c + \mathrm{MTA}$. After this replacement, the margin agreement is treated as if it had zero MTA.

The simplest approach to modeling collateral is to limit the future exposure from above by the threshold (i.e., for all scenarios with portfolio value above the threshold, set the exposure equal to the threshold). However, this approach is too simplistic because it ignores the time lag between the last delivery of collateral and the time when the loss is realized. This time lag is known as the margin period of risk (MPR), which we will denote by $\delta t$. While the MPR is not known with certainty, it is typically assumed to be a deterministic number that is defined at the margin agreement level. Its value depends on the contractual margin call frequency and the liquidity of the portfolio. For example, $\delta t = 2$ weeks is usually assumed for portfolios of liquid contracts and daily margin call frequency.

Applying the rules of posting collateral under the assumption of the effective threshold with zero MTA and taking into account the MPR, the collateral $C(t)$ available to the bank at time $t$ is given by

$$C(t) = \max\{V(t - \delta t) - H_c^{(e)},\, 0\} \qquad (5)$$

where $V(t)$ is the portfolio value from the bank's point of view at time $t$.

Bilateral Margin Agreements

Under a bilateral margin agreement, both the counterparty and the bank have to post collateral: the counterparty posts collateral when the bank's exposure to the counterparty exceeds the counterparty's threshold, while the bank posts collateral when the counterparty's exposure to the bank exceeds the bank's threshold. Since we are doing our analysis from the point of view of the bank, we will keep the counterparty's threshold $H_c$ nonnegative, but will specify the bank's threshold $H_b$ as nonpositive. Then, the bank posts collateral when the portfolio value (defined from the bank's point of view) is below the bank's threshold. The MTA is the same for the bank and the counterparty and $\mathrm{MTA} > 0$.

Similar to the unilateral case, we will create effective thresholds for the bank and for the counterparty. The effective threshold for the counterparty, $H_c^{(e)}$, remains unchanged. From the counterparty's point of view, the effective threshold for the bank must be defined in exactly the same way. After taking into account that we do not switch our point of view and $H_b \le 0$, the definition of the effective threshold for the bank will be $H_b^{(e)} = H_b - \mathrm{MTA}$. Now the bilateral agreement can be treated as if it had zero MTA.

Collateral available to the bank at time $t$ under the bilateral agreement is modeled as

$$C(t) = \max\{V(t - \delta t) - H_c^{(e)},\, 0\} + \min\{V(t - \delta t) - H_b^{(e)},\, 0\} \qquad (6)$$

The first term in the right-hand side of equation (6) describes the scenarios when the bank receives collateral (i.e., $C(t) > 0$), while the second term describes the scenarios when the bank posts collateral (i.e., $C(t) < 0$).

For more details on collateral modeling, see [9, 15, 16].

Simulating Credit Exposure

Because of the complex nature of banks' portfolios, exposure distribution at future time points is usually obtained via a Monte Carlo simulation process. This process typically consists of three major steps:

• Scenario generation. The dynamics of market risk factors (e.g., interest rates, foreign exchange (FX) rates, etc.) are specified via relatively simple stochastic processes (e.g., geometric Brownian motion). These processes are calibrated either to historical data or to market implied data. Future values of the market risk factors are simulated for a fixed set of future time points.
• Instrument valuation. For each simulation time point and for each realization of the underlying market risk factors, valuation is performed for each trade in the counterparty portfolio.

• Aggregation. For each simulation time point and for each realization of the underlying market risk factors, counterparty-level exposure is obtained by applying the necessary netting and collateral rules, conceptually described by equations (3) and (4).

The outcome of this process is a set of realizations of the counterparty-level exposure (each realization corresponds to one market scenario) at each simulation time point.

Because of the computational intensity required to calculate counterparty exposures, especially for a bank with a large portfolio, certain compromises between the accuracy and the speed of the calculation are usually made: a relatively small number of market scenarios (typically, a few thousand) and simulation time points (typically, in the 50–200 range), simplified valuation methods, and so on.

For more details on simulating credit exposure, see [7, 17].

Pricing Counterparty Risk: Unilateral Approach

Let us assume that the bank is default-risk-free. Then, when pricing transactions with a counterparty, the bank should require a risk premium to be compensated for the risk of the counterparty defaulting. The market value of this risk premium, defined for the entire portfolio of trades with the counterparty, is known as unilateral credit valuation adjustment (CVA).

A risk-neutral valuation framework (see Risk-neutral Pricing) is used for pricing CCR. The bank's economic loss arising from the counterparty's default and discounted to today is given by

$$L_{u\text{-}l} = 1_{\{\tau_c \le T\}}\,(1 - R_c)\,\frac{B_0}{B_{\tau_c}}\,E_c(\tau_c) \qquad (7)$$

where $\tau_c$ is the time of default of the counterparty; $1_{\{A\}}$ is the indicator function that assumes the value 1 if Boolean variable $A$ is TRUE and the value 0 otherwise; $E_c(t)$ is the bank's exposure to the counterparty's default at time $t$; $R_c$ is the counterparty recovery rate (i.e., the percentage of the bank's exposure to the counterparty that the bank will be able to recover in the event of the counterparty's default); and $B_t$ is the value of the money-market account at time $t$.

One should keep in mind that counterparty-level exposure $E_c(t)$ incorporates all netting and margin agreements between the bank and the counterparty, as discussed above.

Unilateral CVA is obtained by taking the risk-neutral expectation of the loss in equation (7). Under the assumption that the recovery rate is independent of the market factors and the time of default, this results in

$$\mathrm{CVA}_{u\text{-}l} = (1 - \bar{R}_c) \int_0^T EE_c^*(t)\, dPD_c(t) \qquad (8)$$

where $EE_c^*(t)$ is the risk-neutral discounted EE at time $t$, conditional on the counterparty defaulting at time $t$, given by

$$EE_c^*(t) = E^Q\Big[\frac{B_0}{B_t}\,E_c(t) \,\Big|\, \tau_c = t\Big] \qquad (9)$$

and $\bar{R}_c$ is the expected recovery rate; $PD_c(t)$ is the counterparty's cumulative risk-neutral probability of default (PD) from today to time $t$, estimated today; and $T$ is the maturity of the longest trade in the portfolio.

The term structure of the risk-neutral PDs is obtained from the credit default swap (CDS) spreads quoted in the market [19].

We would like to emphasize that the expectation of the discounted exposure at time $t$ in equation (9) is conditional on the counterparty's default occurring at time $t$. This conditioning is material when there is a significant dependence between the exposure and the counterparty credit quality. This dependence, known as right/wrong-way risk, was first considered in [8] and [12]. To account for it, the counterparty's credit quality must be modeled jointly with the market risk factors. For more details on modeling right/wrong-way risk, see [4, 10, 18].

In practice, the dependence between exposure and the counterparty's credit quality is often ignored and conditioning on default in equation (9) is removed. Discounted EE is calculated for a set of simulation time points $\{t_k\}$ under the exposure simulation framework outlined above. Then, CVA is calculated by approximating the integral in equation (8) by a sum, in which each term weights the discounted EE by the increment of the cumulative PD over the interval $(t_{k-1}, t_k]$:

$$\mathrm{CVA}_{u\text{-}l} \approx (1 - \bar{R}_c) \sum_k EE_c^*(t_k)\,\big[PD_c(t_k) - PD_c(t_{k-1})\big] \qquad (10)$$
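The sum in equation (10) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production method: the function names `expected_exposure` and `unilateral_cva`, the argument layout, and the sample numbers are introduced here for illustration only. Scenario exposures are assumed to be already discounted, the dependence between exposure and credit quality is ignored (so EE is unconditional), and each term weights the discounted EE at $t_k$ by the probability of default in the interval $(t_{k-1}, t_k]$, taken as the increase in cumulative PD over that interval.

```python
from typing import List, Sequence


def expected_exposure(scenario_exposures: Sequence[Sequence[float]]) -> List[float]:
    """Average discounted exposure over scenarios, one value per time point.

    scenario_exposures[k] holds the simulated discounted counterparty-level
    exposures at time point t_{k+1}, one entry per market scenario
    (hypothetical layout chosen for this sketch).
    """
    return [sum(values) / len(values) for values in scenario_exposures]


def unilateral_cva(discounted_ee: Sequence[float],
                   cumulative_pd: Sequence[float],
                   recovery: float) -> float:
    """Discretised unilateral CVA in the spirit of equation (10).

    discounted_ee[k-1] : discounted EE at t_k, for k = 1..n
    cumulative_pd[k]   : cumulative PD from today to t_k, for k = 0..n
                         (so it has one more entry than discounted_ee)
    recovery           : expected recovery rate, between 0 and 1
    """
    if len(cumulative_pd) != len(discounted_ee) + 1:
        raise ValueError("cumulative_pd needs one more point than discounted_ee")
    total = 0.0
    for k, ee in enumerate(discounted_ee, start=1):
        # Probability of default inside (t_{k-1}, t_k]
        marginal_pd = cumulative_pd[k] - cumulative_pd[k - 1]
        total += ee * marginal_pd
    return (1.0 - recovery) * total


if __name__ == "__main__":
    # Two time points, two scenarios each (made-up numbers)
    ee = expected_exposure([[8.0, 12.0], [15.0, 25.0]])
    print(unilateral_cva(ee, [0.0, 0.01, 0.03], recovery=0.4))
```

With these made-up inputs the EE profile is 10 and 20, the interval default probabilities are 1% and 2%, and the result is roughly 0.3 in the units of the exposure.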

Since the exposure expectation in equation (10) is risk neutral, scenario models for all market risk factors should be arbitrage free. This is achieved by appropriate calibration of drifts. Moreover, risk factor volatilities should be calibrated to the available market prices of options on the risk factors.

For more details on unilateral CVA, see [3].

Pricing Counterparty Risk: Bilateral Approach

In reality, banks are not default-risk-free. Because of the bilateral nature of credit exposure, the bank and the counterparty will never agree on the fair price of CCR if they apply the unilateral pricing outlined above: each of them will demand a risk premium from the other. The bilateral approach specifies a single quantity, known as bilateral CVA, that accounts both for the bank's loss caused by the counterparty's default and the counterparty's loss caused by the bank's default.

Bilateral loss of the bank is given by

$$L_{b\text{-}l} = 1_{\{\tau_c \le T\}}\,1_{\{\tau_c < \tau_b\}}\,(1 - R_c)\,\frac{B_0}{B_{\tau_c}}\,E_c(\tau_c) \;-\; 1_{\{\tau_b \le T\}}\,1_{\{\tau_b < \tau_c\}}\,(1 - R_b)\,\frac{B_0}{B_{\tau_b}}\,E_b(\tau_b) \qquad (11)$$

where $\tau_b$ is the time of default of the bank; $E_b(t)$ is the counterparty's exposure to the bank's default at time $t$; and $R_b$ is the bank recovery rate (i.e., the percentage of the counterparty's exposure to the bank that the counterparty will be able to recover in the event of the bank's default).

The first term in equation (11) describes the bank's loss when the counterparty defaults, but the bank does not default. The second term describes the loss of the counterparty in the event of the bank's default and the counterparty's survival. From the bank's point of view, the counterparty's loss is a gain arising from the bank's option not to pay the counterparty when the bank defaults, so this term is subtracted from the bank's loss. Equation (11) is completely symmetric: if we change the sign of the right-hand side, we will obtain the bilateral loss of the counterparty.

Bilateral CVA is obtained by taking the risk-neutral expectation of equation (11):

$$\mathrm{CVA}_{b\text{-}l} = (1 - \bar{R}_c) \int_0^T EE_c^*(t)\,\Pr[\tau_b > t \mid \tau_c = t]\, dPD_c(t) \;-\; (1 - \bar{R}_b) \int_0^T EE_b^*(t)\,\Pr[\tau_c > t \mid \tau_b = t]\, dPD_b(t) \qquad (12)$$

where $EE_c^*(t)$ is the discounted EE of the counterparty to the bank at time $t$, conditional on the counterparty defaulting at time $t$, defined in equation (9), and $EE_b^*(t)$ is the discounted EE of the bank to the counterparty at time $t$, conditional on the bank defaulting at time $t$, defined as

$$EE_b^*(t) = E^Q\Big[\frac{B_0}{B_t}\,E_b(t) \,\Big|\, \tau_b = t\Big] \qquad (13)$$

If the dependence between credit exposure and the credit quality of the counterparty and of the bank can be ignored, the conditional expectations in equations (9) and (13) should be replaced with the unconditional ones. As expected, equation (12) is symmetric between the bank and the counterparty, so that the bank and the counterparty will always agree on the price of CCR for their portfolio.

One can use copulas (see Default Time Copulas; Gaussian Copula Model; Copulas: Estimation) to express the conditional probabilities in equation (12) as functions of the counterparty's and the bank's risk-neutral PDs. For example, if the normal copula model [13] is used to describe the dependence between $\tau_c$ and $\tau_b$, the conditional probabilities in equation (12) take the form

$$\Pr[\tau_b > t \mid \tau_c = t] = 1 - \Phi\!\left(\frac{\Phi^{-1}[PD_b(t)] - \rho\,\Phi^{-1}[PD_c(t)]}{\sqrt{1 - \rho^2}}\right) \qquad (14)$$

and

$$\Pr[\tau_c > t \mid \tau_b = t] = 1 - \Phi\!\left(\frac{\Phi^{-1}[PD_c(t)] - \rho\,\Phi^{-1}[PD_b(t)]}{\sqrt{1 - \rho^2}}\right) \qquad (15)$$

where $\rho$ is the normal copula correlation and $\Phi(\cdot)$ is the standard normal cumulative distribution function.

Portfolio Loss and Economic Capital

Until now we have discussed modeling credit exposure and losses at the counterparty level. However, the distribution of the credit loss of the bank's entire trading book provides more complete information about the risk a bank is taking. Portfolio loss distribution is needed for such risk management tasks as calculation and allocation of economic capital (EC). For a comprehensive introduction to EC for CCR, see [14].

Portfolio credit loss $L(T)$ for a time horizon $T$ can be expressed as the sum of the counterparty-level losses over all counterparties:

$$L(T) = \sum_j 1_{\{\tau_j \le T\}}\,(1 - R^{(j)})\,\frac{B_0}{B_{\tau_j}}\,E^{(j)}(\tau_j) \qquad (16)$$

where $\tau_j$ is the time of default of counterparty $j$; $R^{(j)}$ is the recovery rate for counterparty $j$; and $E^{(j)}(t)$ is the bank's counterparty-level exposure at time $t$ created by all trades that the bank has with counterparty $j$.

The economic capital $EC_q(T)$ for time horizon $T$ and confidence level $q$ is given by

$$EC_q(T) = Q_q[L(T)] - E[L(T)] \qquad (17)$$

where $Q_q[X]$ is the quantile of random variable $X$ at confidence level $q$ (in risk management, this quantity is often referred to as Value-at-Risk (VaR)). The distribution of portfolio loss $L(T)$ can be obtained from equation (16) via joint Monte Carlo simulation of trade values for the entire bank portfolio and of default times of individual counterparties.

However, the joint simulation process is very expensive computationally and is often replaced by a simplified approach, where simulation of counterparty defaults is completely separate from exposure simulation. The simplified approach is a two-stage process. During the first stage, exposure simulation is performed and a deterministic loan equivalent exposure (LEQ) is calculated from the exposure distribution for each counterparty. The second stage is a simulation of counterparty default events according to one of the credit risk portfolio models (see Credit Migration Models; Structural Default Risk Models; CreditRisk+) that are used for loan portfolios. Portfolio credit loss is calculated as

$$L(T) = \sum_j 1_{\{\tau_j \le T\}}\,(1 - R^{(j)})\,LEQ^{(j)}(T) \qquad (18)$$

where $LEQ^{(j)}(T)$ is the LEQ of counterparty $j$ for time horizon $T$.

Note that many loan portfolio models do not produce the time of default explicitly. Instead, they only distinguish between two events: "default has happened prior to the horizon (i.e., $\tau_j \le T$)" and "default has not happened prior to the horizon (i.e., $\tau_j > T$)". Note also that, because the time of default is not known, discounting to present is not applied in equation (18).

For an infinitely fine-grained portfolio with independent exposures, it has been shown [6, 20] that LEQ is given by the EE averaged from zero to $T$. This quantity is often referred to as expected positive exposure (EPE):

$$EPE^{(j)}(T) \equiv \frac{1}{T} \int_0^T EE^{(j)}(t)\, dt \qquad (19)$$

If one uses LEQ given by equation (19) for a real portfolio, the EC will be understated because both exposure volatility and correlation between exposures are ignored. However, this understated EC can be used in defining a scaling parameter commonly known as alpha:

$$\alpha_q(T) = \frac{EC_q^{(\mathrm{Real})}(T)}{EC_q^{(\mathrm{EPE})}(T)} \qquad (20)$$

where $EC_q^{(\mathrm{Real})}(T)$ is the EC of the real portfolio with stochastic exposures, and $EC_q^{(\mathrm{EPE})}(T)$ is the EC of the fictitious portfolio with stochastic exposures replaced by EPE.
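The two-stage simplified approach of equations (17) to (19) can be sketched as follows. This is an illustrative sketch only: the function names, the trapezoid rule for the time average, the empirical quantile convention, and above all the use of independent Bernoulli defaults are assumptions made here for brevity. A real credit portfolio model would introduce dependence between counterparty defaults.

```python
import random
from typing import List, Sequence


def epe(times: Sequence[float], ee: Sequence[float]) -> float:
    """Time-averaged EE over [times[0], times[-1]] by the trapezoid rule,
    a discrete stand-in for equation (19)."""
    area = 0.0
    for k in range(1, len(times)):
        area += 0.5 * (ee[k] + ee[k - 1]) * (times[k] - times[k - 1])
    return area / (times[-1] - times[0])


def simulate_losses(leq: Sequence[float],
                    pd: Sequence[float],
                    recovery: Sequence[float],
                    n_scenarios: int,
                    seed: int = 0) -> List[float]:
    """Portfolio loss per scenario as in equation (18).

    Defaults before the horizon are drawn independently with probability
    pd[j] for counterparty j (simplifying assumption of this sketch).
    """
    rng = random.Random(seed)
    losses = []
    for _ in range(n_scenarios):
        loss = 0.0
        for leq_j, pd_j, r_j in zip(leq, pd, recovery):
            if rng.random() < pd_j:  # default happened prior to the horizon
                loss += (1.0 - r_j) * leq_j
        losses.append(loss)
    return losses


def economic_capital(losses: Sequence[float], q: float) -> float:
    """Empirical quantile of the loss minus its mean, as in equation (17)."""
    ordered = sorted(losses)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index] - sum(ordered) / len(ordered)


if __name__ == "__main__":
    # Made-up EE profile for one counterparty over a one-year horizon
    leq_1 = epe([0.0, 0.5, 1.0], [0.0, 10.0, 20.0])
    losses = simulate_losses([leq_1, 5.0], [0.02, 0.05], [0.4, 0.4], 10_000)
    print(economic_capital(losses, 0.99))
```

Running the same loss simulation once with stochastic exposures and once with EPE as the deterministic exposure would give the two EC figures whose ratio defines alpha in equation (20); that joint simulation is outside the scope of this sketch.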

If alpha of a real portfolio can be estimated, its LEQ can be defined according to

$$LEQ^{(j)}(T) = \alpha_q(T)\,EPE^{(j)}(T) \qquad (21)$$

Because the EC of a portfolio with deterministic exposures is a homogeneous function of the exposures, using the LEQ defined in equation (21) will produce the correct $EC_q^{(\mathrm{Real})}(T)$. The caveat of this approach is that one has to run a joint simulation of trade values and counterparties' defaults to calculate alpha.

Several estimates of typical values of alpha for a large dealer portfolio and the time horizon $T = 1$ year are available. An International Swaps and Derivatives Association (ISDA) survey [11] has reported alpha calculated by four large banks for their actual portfolios to be in the 1.07–1.10 range. Theoretical estimates of alpha under a set of simplifying assumptions [6, 20] are 1.1 when market-credit correlations are ignored, and 1.2 when they are not.

The framework described above has found its place in the regulatory capital calculations under Basel II (see Regulatory Capital): a slightly modified version of equation (21) is used to calculate exposure at default (EAD) under the internal models method for CCR [2]. Basel fixes alpha at 1.4, but it allows banks to calculate their own alpha, subject to supervisory approval and a floor of 1.2.

End Notes

a. The opinions expressed here are those of the author and do not necessarily reflect the views or policies of the author's employer.

References

[1] Arvanitis, A. & Gregory, J. (2001). Credit: The Complete Guide to Pricing, Hedging and Risk Management, Risk Books.
[2] Basel Committee on Banking Supervision (2006). International Convergence of Capital Measurement and Capital Standards, A Revised Framework.
[3] Brigo, D. & Masetti, M. (2005). Risk neutral pricing of counterparty credit risk, in Counterparty Credit Risk Modelling, M. Pykhtin, ed., Risk Books.
[4] Brigo, D. & Pallavicini, A. (2008). Counterparty risk and contingent CDS under correlation, Risk February, 84–88.
[5] Canabarro, E. & Duffie, D. (2003). Measuring and marking counterparty risk, in Asset/Liability Management for Financial Institutions, L. Tilman, ed., Institutional Investor Books.
[6] Canabarro, E., Picoult, E. & Wilde, T. (2003). Analysing counterparty risk, Risk September, 117–122.
[7] De Prisco, B. & Rosen, D. (2005). Modeling stochastic counterparty credit exposures for derivatives portfolios, in Counterparty Credit Risk Modelling, M. Pykhtin, ed., Risk Books.
[8] Finger, C. (2000). Toward a better estimation of wrong-way credit exposure, Journal of Risk Finance 1(3), 43–51.
[9] Gibson, M. (2005). Measuring counterparty credit exposure to a margined counterparty, in Counterparty Credit Risk Modelling, M. Pykhtin, ed., Risk Books.
[10] Hille, C., Ring, J. & Shimamoto, H. (2005). Modelling counterparty credit exposure for credit default swaps, Risk May, 65–69.
[11] ISDA-TBMA-LIBA (2003). Counterparty Risk Treatment of OTC Derivatives and Securities Financing Transactions, June.
[12] Levin, R. & Levy, A. (1999). Wrong way exposure: are firms underestimating their credit risk? Risk July, 52–55.
[13] Li, D. (2000). On default correlation: a copula approach, Journal of Fixed Income 9, 43–54.
[14] Picoult, E. (2005). Calculating and hedging exposure, credit value adjustment and economic capital for counterparty credit risk, in Counterparty Credit Risk Modelling, M. Pykhtin, ed., Risk Books.
[15] Pykhtin, M. (2009). Modeling credit exposure for collateralized counterparties, Journal of Credit Risk; to be published.
[16] Pykhtin, M. & Zhu, S. (2006). Measuring counterparty credit risk for trading products under Basel II, in Basel Handbook, 2nd Edition, M. Ong, ed., Risk Books.
[17] Pykhtin, M. & Zhu, S. (2007). A guide to modeling counterparty credit risk, GARP Risk Review July/August, 16–22.
[18] Redon, C. (2006). Wrong way risk modelling, Risk April, 90–95.
[19] Schonbucher, P. (2003). Credit Derivatives Pricing Models, Wiley.
[20] Wilde, T. (2005). Analytic methods for portfolio counterparty risk, in Counterparty Credit Risk Modelling, M. Pykhtin, ed., Risk Books.

Related Articles

Default Time Copulas; Economic Capital; Exposure to Default and Loss Given Default; Monte Carlo Simulation; Risk-neutral Pricing.

MICHAEL PYKHTIN
Loan Valuation

A loan is an agreement in which one party, called a lender, provides the use of property, the principal, to another party, the borrower. The borrower customarily promises to return the principal after a specified period along with payment for its use, called interest [3]. When the property loaned is cash, the documentation of the agreement between borrower and lender is called a promissory note.

Although cash loans can take many forms, traditionally, banks and other financial institutions are the primary lenders of cash, and businesses, organizations, and individuals are the borrowers. Most loans to corporations share a common set of structural characteristics [2, 5].

1. Interest on loans is typically paid quarterly at a rate specified relative to some reference rate such as LIBOR (e.g., L + 250 bp).ᵃ Thus, loans have floating-rate coupons whose absolute values are not known with certainty except over the next quarter.
2. Often the firm's assets or receivables are pledged against the borrowed principal. Because of this, their recovery rates are generally higher than those of corporate bonds, which are most commonly unsecured.
3. Most loans are prepayable on any coupon date at par, although some agreements contain a prepayment penalty or have a noncall period. The loan prepayment feature ensures that loan prices rarely exceed several points above par.
4. Finally, unlike bonds, which are public securities, loans are private credit agreements. Thus, access to firm fundamentals and loan terms may be limited and loan contracts are less standardized. It is not uncommon to find "nonstandard" covenants or other structural features catering to specific needs of borrowers or investors.

Loan valuation concerns the amount of interest that a lender requires for use of the property or an investor will charge for purchasing the loan agreement. That valuation depends on several factors, such as

1. the likelihood of failure to receive timely payments of principal, called risk of default;
2. the residual value of the loan in the event of default, called its recovery value;
3. the time by which the principal of the loan must be repaid, the maturity;
4. the current market rate of interest for the obligor's likelihood of default, called the market credit spread;
5. the likelihood of the event that a borrower will have repaid the principal at any particular date prior to maturity.

Although the bulk of the loans outstanding are rated investment-grade or better, these loans trade very infrequently because of their high credit quality and lack of price differentiation. In fact, most loans that trade after origination are those made by banks to borrowers having speculative-grade credit ratings. These loans, made to high-yield firms, are typically referred to as leveraged loans, though the exact definition varies slightly among market participants.ᵇ

The types of loan facilities commonly traded in secondary markets include the following:

1. Amortizing term loans. Usually called "term loan A", the periodic payments from these loans include partial payment of principal, similar to what a mortgage loan does. These loans are usually held by banks and are becoming less popular.
2. Institutional term loans. These loans are structured to have bullet or close-to-bullet payment schedules and are targeted for institutional investors. They are referred to as "term loan B", "term loan C" and so on. Institutional term loans constitute the bulk of the leveraged loan market.
3. Revolving credit lines. These are unfunded or partially funded commitments by lenders that can be drawn at the discretion of the borrowers. The facility is analogous to a corporate credit card. It can be drawn and repaid multiple times during the term of the commitment. These commitments are traded in the secondary market. They are also known as revolvers.
4. Second-lien term loans. They have a cash-flow schedule similar to that of institutional term loans, except that their claims on borrowers' assets are behind first-lien loan holders in the event of default.
5. Covenant-lite loans. These are borrower-friendly versions of institutional term loans that have fewer than the typical stringent covenants

that restrict use of the principal or subsequent borrowing activities of the firm.

Loan Pricing

Like bonds, loans contain risk of default; an obligor may fail to make timely payments of interest and/or principal. Thus, the notion of a credit spread to LIBOR has been used to characterize the riskiness of loans, where the credit spread, $s$, to LIBOR is calculated as

$$V = \sum_{t=1}^{4n} \frac{c_t/4}{\left(1 + \dfrac{r_t + s}{4}\right)^{t}} + \frac{F}{\left(1 + \dfrac{r_{4n} + s}{4}\right)^{4n}} \qquad (1)$$

where $V$ is the market value of the loan, $c_t$ is the coupon (LIBOR + contractual spread), $r_t$ is the LIBOR spot rate for maturity $t$, and $F$ is the face value of the loan to be repaid at maturity. Loan coupons are generally paid quarterly and then reset relative to LIBOR, and this is reflected in equation 1. Using equation 1, we can calculate a credit spread for any loan whose market price is known.

One problem with equation 1 for loan valuation is that it fails to account for the fact that loans, unlike bonds, are typically prepayable at par on any given coupon date. The loan prepayment option creates uncertainty in the expected pattern of cash flows and complicates comparisons of value among loans based on their credit spreads. Pricing the prepayment option has proved difficult because of its dependence on the evolution of an obligor's credit state and the changing market costs of borrowing. For example, if a firm's credit improves or the loan rate over LIBOR decreases, the likelihood of prepayment increases; the borrower can refinance at a lower rate. Conversely, if a borrower's credit deteriorates or lending rates increase, it will not be advantageous for the borrower to refinance.

To account for the prepayment option, we price the loans using a credit-state-dependent backward induction method.ᶜ To illustrate, consider pricing a term loan with face value $F$, intermediate floating-rate coupon payments of $c_t$, and a maturity at time $T$, to a borrower of known credit quality, $J$. Specifically, Figure 1 displays pricing lattices for a five-year loan to a double-B rated (i.e., $J$ = BB) obligor having a coupon of LIBOR + 3%,ᵈ and face value of 100 at maturity.ᵉ Figure 1(a) shows how the obligor's credit state evolves over time. In the lattice, probabilities are assigned reflecting transitions from each node at time $t$ to all nodes at $t + 1$. Thus, the probability of being at a given node will be conditional upon all the previous transitions. In practice, ratings transition probabilities are based on historical data from credit rating agencies,ᶠ and these are typically modified by the current market price of risk to produce risk-neutral ratings transition matrices.ᵍ,ʰ

[Figure 1: two lattice panels, (a) and (b), each plotting credit rating (AAA, AA, A, BBB, BB, B, CCC, D) against time in years (0 to 5). Panel (a) is annotated "Risk neutral credit transitions from structural model and CDS curve" and shows transition probabilities; panel (b) shows node values, with 105 at maturity for non-defaulted states and 75 in default.]

Figure 1 Credit-dependent backward induction method. (a) Credit transitions for a double-B rated obligor, derived from historical data and incorporating market risk premiums, are used to specify the likelihood of being in any credit state at future times to maturity. (b) Calculation of node values using backward induction, whereby values at each non-defaulted node are the coupon value at that node plus the sum of the conditional cash flows from the later date, discounted one period at forward LIBOR. In the example in (b), we assume a refinancing penalty of 0.5% of the principal.

Having calculated transition probabilities between all future nodes, we then apply the backward induction method. At maturity $T$, the borrower pays the principal plus coupon, $F + c_T$, or the recovery
Loan Valuation 3

value in default R ∗ F . Those cash flows are dis- Let the conditional probability of prepayment at
counted back to each node at the previous period time be qi ,j then the discounted cash flow is given by
using forward LIBOR at T − 1. In other words, for
each node at time i < T and credit state j , and 
T 
i−1
VJ = Di ∗ (1 − qj ) ∗ [(qi ∗ Ki )
j = (AAA, AA, . . . , CCC) we calculate an induced
i=1 j =1
value, vi,j , as
  + ((1 − qi ) ∗ CFi )] (3)


 
AAA

1 where CF i = ci /4 for i < T ; CFi = (ci /4 + F ) for
v(i,j ) = min     (Pj,k,i ∗ vi+1,k )


 f i = T , and the discount margin Di is given by
 1+
i+1,i
k=D
4
 
i
1
 Di =   (4)

 fj,j −1 + ŝ
j =0 1 +
+ ci , Ki , (2) 4



The credit spread, s, is determined by iteratively
changing the parameter ŝ and recalculating the dis-
where Pj,k,i is the probability of migrating from state counted value of the cash flows, VJ , until VJ con-
j to state k from time i to i + 1, fi is the forward verges to P , the market price.
LIBOR rate from time i to i + 1, Ki is the terminal Revolving lines of credit are priced by assuming
value of the loan at time i,i and vT ,j = F + cT . Thus, that the fraction of the loan drawn at a particular
at each node i, j we compute the induced value, time, called the usage, is directly related changes
compare it with the terminal value, Ki , and set the to the obligor’s credit quality. In other words, if
value at that node, vi,j to the lesser of the two. In a borrower’s credit rating improves, it can access
other words, if the induced value exceeds the terminal credit more cheaply and is also less likely to draw
value, the loan is effectively repaid and terminates on existing lines of credit. Conversely, a borrower
at i, j . Also, if the loan defaults at time i, the with deteriorating credit will likely draw on the
loan terminates with a value vi,D = R ∗ F for all i. credit lines it obtained when more highly rated.
Finally, the value of vi,j at time 0 (in this example, In this framework, usage can be interpreted as
at v0,BB ) is the model price of the loan. credit-dependent face value. Thus, in the equations
Although equation 2 is useful for calculating above the face value is modified by F → Uj ∗ F
prices of illiquid loans and for estimating the coupon where j is the credit state and usage Uj ranges
premiums to charge for new loans, it is less useful from 0 to 1.
for evaluating relative value among existing loans,
which are better assessed using credit spreads. In
fact, we can calculate the credit spread for a loan End Notes
by discounting its expected nondefault cash flows by
a.
a constant amount over the LIBOR curve such that LIBOR stands for London interbank offered rate, which
the discounted value matches its current market price. roughly corresponds to the interest rate charged between
For all nondefault cash flows at a given time, the bor- banks when lending large amounts of US dollars outside
the United States. The coupon rate for a given quarter is set
rower will either prepay the principal and terminate, at the beginning of the period. For example, the L + 250 bp
or pay a coupon and continue. The prepayment region coupon in the text indicates that the borrower will pay one-
in the time-and-credit-state lattice can be determined quarter of 250 bp (0.625%) plus the current three-month
using the values of vi,j in equation 2. The probability LIBOR rate on the next coupon date.
b.
of prepaying at period i is the sum of the probabil- Although some people define leveraged loans on the basis
ities of reaching nodes whose value of vi,,j equal of their balance sheet leverage ratio, it is more common to
those capped at the terminal values Ki . Given the use credit ratings (i.e., below BBB-) or credit spread to
LIBOR above some maximum.
probability transition matrix and the set ω of all pre- c.
Several versions of the backward induction method have
payment nodes, we can calculate the probability of been proposed over the years [1, 6, 7, 9]. The version
prepayment at time i conditional on no prepayment presented in equations (1–3) embodies elements that are
before time i. common to most of these methodologies.

d. Loan spreads are typically quoted in basis points such as LIBOR + 300 bp, where 1% = 100 bp.
e. For convenience, we assume LIBOR is constant at 2%, thereby generating a constant 5% coupon, and that the loan pays annually, rather than the typical quarterly coupon payment.
f. The most well-known credit rating agencies are Fitch, Moody’s, and Standard & Poor’s.
g. Ratings transition matrices are published regularly by the major agencies [4, 8].
h. Most models specify adjustment of physical credit transitions so that the default probabilities at each time, \(i\), match the risk-neutral probabilities of default as implied by the bond and loan markets. For example, the risk-neutral default probability for a single risky cash flow at time \(t\) is given as

\[
P_t^Q = \frac{1 - e^{-ts}}{1 - R} \quad \text{and} \quad P_t^Q = N\left(N^{-1}(P_t) + \beta \lambda \sqrt{t}\right)
\]

where \(P_t^Q\) is the cumulative risk-neutral default probability to time \(t\), \(s\) is the market credit spread, and \(R\) is the recovery rate in default. On the right, we calculate \(P_t^Q\) from \(P_t\), the physical default probability, by adding a term related to the volatility of the credit relative to the market, the market price of risk, and the time to receipt of the cash flow. (For an elaboration and discussion of the derivation of this relation, see Bohn [1]. Zeng and Wen [9] describe its application to loan pricing.)
i. It is common to add a refinancing premium to the principal plus coupon when defining the terminal value for evaluating prepayment as there are costs and/or penalties associated with the refinancing process.
j. The probability of prepayment at time 1 from the initial state \(J\) is given by \(q_1 = \sum_{k \in \omega} P_{J,k,0}\). For time \(i > 1\), we must add the condition that the loan was not prepaid before time \(i\); thus, \(q_i = \prod_{m=1}^{i-1} (1 - q_m) * \sum_{k \in \omega,\, l \notin \omega} P_{l,k,i-1}\).

References

[1] Bohn, J. (2000). A Survey of Contingent-Claims Approaches to Risky Debt Valuation, Institutional Investor.
[2] Deitrick, W. (2006). Leveraged Loan Handbook, Citi Markets and Banking.
[3] Downs, J. & Goodman, J.E. (1991). Dictionary of Finance and Investment Terms, Barron’s, Hauppauge, New York.
[4] Emery, K., Ou, S., Tennant, J., Kim, F. & Cantor, R. (2008). Corporate Default and Recovery Rates, Moody’s Global Corporate Finance. 1920-2007, Special Comment.
[5] Miller, S. & William, C. (2007). A Guide to the Loan Market, Standard & Poor’s.
[6] Rizk, H. (1993). GMPM Valuation Methodology: An Overview, Citi Markets and Banking.
[7] Rosen, D. Does Structure Matter? (2002) Advanced Methods for Pricing and Managing the Risk of Loan Portfolios, Algorithmics Inc.
[8] Vazza, D., Aurora, D., Kraemer, N., Kesh, S., Torres, J. & Erturk, E. (2007). Annual 2006 Global Corporate Default Study and Rating Transition, Standard and Poor’s Global Fixed Income Research.
[9] Zeng, B. & Wen, K. (2006). CreditMark Valuation Methodology, Moody’s K.M.V.

Further Reading

Aguais, S., Forest, L. & Rosen, D. (2000). Building a Credit Risk Valuation Framework for Loan Instruments, Algo Research Quarterly.

TERRY BENZSCHAWEL, JULIO DAGRACA & HENRY FOK
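As an illustration of how equation 1 of the Loan Valuation entry can be inverted for the credit spread once a market price is known, the following is a minimal Python sketch. The function names, the bisection bracket, and the flat 2% LIBOR curve are assumptions made for illustration; they are not part of the entry, and prepayment is ignored, exactly as in equation 1.

```python
def loan_value(coupons, spot_rates, spread, face):
    """Equation 1: present value of quarterly coupons plus principal.

    coupons[t-1] and spot_rates[t-1] are annualized figures for quarter t.
    """
    periods = len(coupons)
    value = 0.0
    for t in range(1, periods + 1):
        discount = (1.0 + (spot_rates[t - 1] + spread) / 4.0) ** t
        value += (coupons[t - 1] / 4.0) / discount
    value += face / (1.0 + (spot_rates[-1] + spread) / 4.0) ** periods
    return value


def implied_spread(price, coupons, spot_rates, face, lo=-0.01, hi=1.0, tol=1e-12):
    """Solve loan_value(spread) = price by bisection (value falls as spread rises)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if loan_value(coupons, spot_rates, mid, face) > price:
            lo = mid  # model value too high, so the spread must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical five-year loan: flat 2% LIBOR, coupon of LIBOR + 3% on face 100.
quarters = 20
rates = [0.02] * quarters
coupons = [100.0 * (0.02 + 0.03)] * quarters
par_spread = implied_spread(100.0, coupons, rates, 100.0)
print(round(par_spread, 6))
```

A loan priced at par recovers its contractual spread in this setting, while a price below par maps to a wider spread. Bisection is used only because it needs no derivative; any one-dimensional root finder would serve.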
Credit Risk

Credit risk is the risk of an economic loss from the failure of a counterparty (end note a) to fulfill its contractual obligations. For example, credit risk in the loan portfolio of a bank materializes when a borrower fails to make a payment, either the periodic interest charge or the periodic reimbursement of principal on the loan he contracted with the bank. Credit risk can be further decomposed into four main types: default risk, bankruptcy risk, deterioration in creditworthiness (or downgrading) risk, and settlement risk.

Default risk corresponds to the debtor’s incapacity or refusal to meet his/her debt obligations, whether interest or principal payments on the loan contracted, by more than a reasonable relief period from the due date, which is usually 60 days in the banking industry.

Bankruptcy risk is the risk of actually taking over the collateralized, or escrowed, assets of a defaulted borrower or counterparty, and liquidating them.

Creditworthiness risk is the risk that the perceived creditworthiness of the borrower or counterparty might deteriorate. In general, deteriorated creditworthiness translates into a downgrade action by the rating agencies, such as Standard and Poor’s (S&P) or Moody’s, and an increase in the risk premium, or credit spread, of the borrower. A major deterioration in the creditworthiness of a borrower might be the precursor of default.

Settlement risk is the risk due to the exchange of cash flows when a transaction is settled. Failure to perform on settlement can be caused by a counterparty defaulting, liquidity constraints, or operational issues. This risk is greatest when payments occur in different time zones, especially for foreign exchange transactions, such as currency swaps, where notional amounts are exchanged in different currencies (end note b).

Credit risk is only an issue when the position is an asset, that is, when it exhibits a positive replacement value. In that situation, if the counterparty defaults, the firm loses either all of the market value of the position or, more commonly, the part of the value that it cannot recover following the credit event. The value it is likely to recover is called the recovery value, or recovery rate when expressed as a percentage; the amount it is expected to lose is called the loss given default (see Recovery Rate).

Unlike the potential loss given default on coupon bonds or loans, the one on derivative positions is usually much lower than the nominal amount of the deal, and in many cases is only a fraction of this amount. This is because the economic value of a derivative instrument is related to its replacement, or market value, rather than its nominal or face value. However, the credit exposures induced by the replacement values of derivative instruments are dynamic: they can be negative at one point in time and yet become positive at a later point in time after market conditions have changed. Therefore, firms must examine not only the current exposure, measured by the current replacement value, but also the profile of potential future exposures up to the termination of the deal.

Credit Risk at the Portfolio Level

The first factor affecting the amount of credit risk in a portfolio is clearly the credit standing of specific obligors (see Rating Transition Matrices; Credit Rating). The critical issue, then, is to charge the appropriate interest rate, or spread, to each borrower so that the lender is compensated for the risk he/she undertakes and to set the right amount of risk capital aside (see Economic Capital).

The second factor is “concentration risk” or the extent to which the obligors are diversified in terms of number, geography, and industry.

This leads us to the third important factor that affects the risk of the portfolio: the state of the economy. During an economic boom, the frequency of default falls sharply compared with the periods of recession. Conversely, the default rate rises again as the economy enters a downturn. Downturns in the credit cycle often uncover the hidden tendency of customers to default together, with banks being affected to the degree that they have allowed their portfolios to become concentrated in various ways (e.g., customer, region, and industry concentrations) [1].

Credit portfolio models are an attempt to discover the degree of correlation/concentration risk in a bank portfolio (see Portfolio Credit Risk: Statistical Methods).

The quality of the portfolio can also be affected by the maturities of the loans, as longer loans are generally considered more risky than short-term loans. Banks that build portfolios that are not concentrated in particular maturities (“time diversification”) can

reduce this kind of portfolio maturity risk. This also helps reduce liquidity risk, or the risk that the bank will run into difficulties when it tries to refinance large amounts of its assets at the same time.

Credit Derivatives and the ISDA Definition of a Credit Event

With the spectacular growth of the market for credit default swaps (CDSs) (see Credit Default Swaps), it has become necessary to be specific about what constitutes a credit event. A credit event, usually a default, triggers the payment on a CDS. This event, then, should be clearly defined to avoid any litigation when the contract is settled. CDSs normally contain a “materiality clause” requiring that the change in credit status be validated by third-party evidence.

The new CDS market has struggled to define the kind of credit event that should trigger a payout under a credit derivatives contract. Major credit events as stipulated in CDS documentations and as formalized by the International Swaps and Derivatives Association (ISDA) are the following.

• Bankruptcy, insolvency, or payment default.
• Obligation/cross default, which means the occurrence of a default (other than failure to make a payment) on any other similar obligation.
• Obligation acceleration, which refers to the situation where debt becomes due and repayable prior to maturity. This event is subject to a materiality threshold of $10 million unless otherwise stated.
• Stipulated fall in the price of the underlying asset.
• Downgrade in the rating of the issuer of the underlying asset.
• Restructuring: this is probably the most controversial credit event.
• Repudiation/moratorium: this can occur in two situations. First, the reference entity (the obligor of the underlying bond or loan issue) refuses to honor its obligations. Second, a company could be prevented from making a payment because of a sovereign debt moratorium (City of Moscow in 1998).

One of the most controversial aspects of the debate is whether the restructuring of a loan (which can include changes such as an agreed reduction in interest and principal, postponement of payments, or change in the currencies of payment) should count as a credit event. The Conseco case famously highlighted the problems that restructuring can cause. In October 2000, a group of banks led by Bank of America and Chase granted to Conseco a three-month extension of the maturity of approximately $2.8 billion of short-term loans, while simultaneously increasing the coupon and enhancing the covenant protection. The extension of credit might have helped to prevent an immediate bankruptcy, but as a significant credit event it also triggered potential payouts on as much as $2 billion of CDS.

The original sellers of the CDS were not happy and were annoyed further when the CDS buyers seemed to play the “cheapest to deliver” game by delivering long-dated bonds instead of the restructured loans; at the time, these bonds were trading significantly lower than the restructured bank loans. (The restructured loans traded at a higher price in the secondary market due to the new credit-mitigation features.)

In May 2001, following this episode, ISDA issued a restructuring supplement to its 1999 definitions concerning credit derivative contractual terminology. Among other things, this document requires that to qualify as a credit event, a restructuring event must occur to an obligation that has at least three holders, and that at least two-thirds of the holders must agree to the restructuring. The ISDA document also imposes a maturity limitation on deliverables (the protection buyer can only deliver securities with a maturity of less than 30 months following the restructuring date or the extended maturity of the restructured loan) and it requires that the delivered security be fully transferable. Some key players in the market have now dropped restructuring from their list of credit events.

End Notes

a. In the following, we use the terms borrower and counterparty interchangeably for a debtor. In practice, we refer to issuer risk, or borrower risk, when credit risk involves a funded transaction such as a bond or a bank loan. In derivatives markets, counterparty risk is the credit risk of a counterparty for an unfunded derivatives transaction such as a swap or an option.
b. Settlement failures due to operational problems result only in payment delays and have only minor economic consequences. In some cases, however, the loss can be quite

substantial and amount to the full amount of the payment due. A famous example of settlement risk is the 1974 failure of Herstatt Bank, a small regional German bank. The day it went bankrupt, Herstatt had received payments in Deutsche Mark from a number of counterparties but defaulted before payments were made in US dollars on the other legs of maturing spot and forward transactions.
Bilateral netting is one of the mechanisms that reduce settlement risk. In a netting agreement, only the net balance outstanding in each currency is paid instead of making payments on the gross amounts to each other. Currently, 55% of the foreign exchange (FX) transactions are settled through the CLS Bank, which provides a payment-versus-payment (PVP) service that virtually eliminates the principal risk associated with settling FX trades [2].

References

[1] Basel Committee on Payment and Settlement Systems (2008). Progress in Reducing Foreign Exchange Settlement Risk, Bank for International Settlements, Basel, Switzerland, May 2008.
[2] Caouette, J., Altman, E., Narayanan P. & Nimmo, R. (2008). Managing Credit Risk: The Great Challenge for Global Financial Markets, Wiley.

MICHEL CROUHY
Credit Default Swaps

A credit default swap (CDS) is a contract between two parties, the protection buyer and a protection seller, whereby the protection buyer is compensated for the loss generated by a credit event in a reference instrument (see Figure 1). The credit event can be the default of the reference entity, lack of payment of coupon, or other corporate events defined in the contract. In return, the protection buyer pays a premium, equal to an annual percentage X of the notional, to the protection seller. The premium X, quoted in basis points or percentage points of the notional, is called the CDS spread. This spread is paid (semi)annually or quarterly in arrears until either maturity is reached or default occurs.

There are various methods for settlement at default. In a cash settlement, the protection seller pays the protection buyer the face value of the reference asset minus its postdefault market value. In a physical settlement, the protection buyer receives the initial price of the reference minus the postdefault market value, but, in turn, must make physical delivery of the reference asset or a bond from a pool of eligible assets to the protection seller in exchange for par. In both cases, the postdefault market value of the reference is typically determined by a dealer poll. The contract may also stipulate a fixed or “digital” cash payment at default, representing a fixed percentage of the notional value.

Example A protection buyer purchases 5-year protection on an issuer with notional $10 million at an annual premium (spread) of 300 basis points or 3%. Suppose the reference issuer defaults 4 months after inception and that the reference obligation has a recovery rate (see Recovery Rate) of 45%. Thus, 3 months after inception, the protection buyer makes the first spread payment, roughly equal to $10 million × 0.03 × 0.25 = $75 000. At default, the protection seller compensates the buyer for the loss by paying $10 million × (100% − 45%) = $5.5 million, assuming the contract is settled in cash. At the same time, the protection buyer pays to the seller the premium accrued since the last payment date, roughly equal to $10 million × 0.03 × 1/12 = $25 000. The payments are netted. With these cash flows the swap expires; there are no further obligations in the contract.

CDS contracts are usually documented according to International Swaps and Derivatives Association (ISDA) standards and specify the following:

• A reference entity, whose default (the “credit event”) triggers the default payments in the CDS.
• A reference obligation, which can be a loan, a bond issued by a corporation or a sovereign nation, or any other debt instrument.
• A maturity, the common maturities being 1, 3, 5, 7, and 10 years although the majority of standardized CDSs are 5-year swaps.
• A calculation agent, responsible for computing the payouts related to the transaction. The calculation agent can be one of the counterparties of the CDS or a third party.
• A set of deliverable obligations, in case of physical settlement.

CDSs were introduced in 1997 by JPMorgan and subsequently became the most common form of credit derivative, amounting to a notional value of USD 64 trillion in 2008. With the onset of the financial crisis, this notional volume has gone down to around USD 38 trillion in the first half of 2009, but it remains large.

CDSs are over-the-counter (OTC) derivatives and are not yet exchange traded. The CDS market is a dealer market where a dozen major institutions control an overwhelming proportion of the volume and post quotes for protection premiums on various reference entities.

Uses of Credit Default Swaps

To gain exposure to the credit risk of a firm, an investor can purchase a bond issued by the corporation by paying the face value (or current price) of the bond and collect the interest paid by the issuer. Alternatively, he/she could sell protection in a credit swap referenced on the issuer’s bond. Relative to buying the reference security directly, the CDS position has the advantage of leading to the same exposure but not requiring capital at inception. Also, if the reference entity is a foreign or sovereign entity, a CDS with a domestic counterparty might greatly simplify the legal structure of the transaction.

The protection buyer is short the credit risk associated with the reference obligation. If the buyer actually owns the reference security, then the CDS

acts as a hedge against default. For a bank hedging its loans, this can lead to economic and regulatory capital relief. If the buyer does not have exposure to the reference security, the CDS enables him/her to take a speculative short position that benefits from a deterioration of the issuer’s creditworthiness.

[Figure 1 diagram: the protection buyer and the protection seller are linked by a premium leg and a default leg, both tied to the reference obligation.]

Figure 1 Structure of a credit default swap (CDS)

CDSs are often used to hedge against losses in the event of a default. Thus, CDSs can be viewed as insurance contracts against default or, more generally, as insurance against credit events. However, it is important to note that, unlike the case of insurance contracts, the protection buyer does not need to own the underlying security or have any exposure to it. In fact, an investor can speculate on the default of an entity by buying protection on a reference entity. Thus, they are more like deep out-of-the-money equity puts rather than insurance contracts.

The sheer volume of the CDS market indicates that a large portion of contracts are speculative since, in many cases, the outstanding notional of CDSs is (much) larger than the total debt of the reference entity. For example, when it filed for bankruptcy on September 14, 2008, Lehman Brothers had $155 billion of outstanding debt, but more than $400 billion notional value of CDS contracts had been written with Lehman as reference entity [8].

Also, unlike insurance companies, which are required to hold reserves in accordance with their issued insurance claims, a protection seller in a CDS is not required to maintain any reserves to pay off buyers. An important case is the event where a protection seller has insufficient funds to cover the default payment, thereby defaulting on its CDS payment. A famous example is the downfall of AIG, in which CDSs sold by its Financial Products subsidiary (AIGFP) played a major role.

CDSs, like many other credit derivatives, are unfunded and typically do not appear as a liability on the balance sheet of the protection seller. This off-balance sheet nature makes them attractive to many investors, allowing them to take a synthetic exposure to a reference entity without directly investing in it. However, it can also lead to a lack of transparency and generate large exposures, which are not readily visible to regulators and market participants, and not subject to adequate capital requirements.

Valuation

A basic question is to determine the fair swap spread, or the premium, at inception. The CDS spread must equate the present value at inception of the premium payments (premium leg) and the present value of the payments at default. After inception, the swap must be marked to the market. Arbitrage-free valuation of credit default swaps can be done by using the risk-neutral pricing principle (see Risk-neutral Pricing): we assume a pricing measure \(\mathbb{Q}\) such that the present value at \(t\) of any payout \(H\) at \(T > t\) is \(E^{\mathbb{Q}}[B(t, T)H]\), where \(B(t, T)\) is the (risk-free) discount factor.

Consider a CDS with the notional \(N\) and payment dates \(T_1, T_2, \ldots, T_n = T\). Denote the (random) date of the underlying credit event as \(\tau\). A key role is played by the conditional risk-neutral survival probability \(S(t, T) = \mathbb{Q}(\tau > T \mid \mathcal{F}_t)\), where \(\mathcal{F}_t\) represents information available at date \(t\). We denote \(S(T) = S(0, T)\), its value at the inception of the contract. Denote the recovery rate by \(R\), and \(\bar{R} = E^{\mathbb{Q}}[R]\), the “implied” recovery rate (see Recovery Swap).

The premium leg pays a fixed annual percentage \(X\) on the notional \(N\) at dates \(T_i\) until default: the cash flow at \(T_i\) is therefore

\[
X N (T_i - T_{i-1}) 1_{\tau > T_i} \tag{1}
\]

The value at inception \(t = 0\) of this stream of cash flows is therefore

\[
\begin{aligned}
\sum_{i=1}^{n} X N (T_i - T_{i-1}) B(0, T_i) E^{\mathbb{Q}}[1_{\tau > T_i} \mid \mathcal{F}_0]
&= X N \sum_{i=1}^{n} (T_i - T_{i-1}) B(0, T_i) S(0, T_i) \\
&= X N \sum_{i=1}^{n} (T_i - T_{i-1}) D(0, T_i)
\end{aligned} \tag{2}
\]

where \(D(0, T) = B(0, T)S(T)\) is the risky discount factor and we have assumed independence of default times, recovery rates, and interest rates.

The protection leg (or default leg) can be modeled as a lump payment \(N(1 - R)\) at \(T_i\) if default occurs between \(T_{i-1}\) and \(T_i\) (alternatively, one can consider other payment schemes such as payment at default [3, 4]). This can be represented as a stream of cash flows \(N(1 - R) 1_{T_{i-1} \le \tau \le T_i}\) paid at \(T_i\). The value at inception (\(t = 0\)) of this cash-flow stream is

\[
\begin{aligned}
N \sum_{i=1}^{n} B(0, T_i) E^{\mathbb{Q}}[(1 - R) 1_{T_{i-1} \le \tau \le T_i} \mid \mathcal{F}_0]
&= N (1 - \bar{R}) \sum_{i=1}^{n} B(0, T_i)\, \mathbb{Q}(T_{i-1} \le \tau \le T_i) \\
&= N (1 - \bar{R}) \sum_{i=1}^{n} B(0, T_i) \left(S(T_{i-1}) - S(T_i)\right)
\end{aligned} \tag{3}
\]

If payments are made at dates other than \(T_i\), then accrued interest must be added. If payment dates are frequent (e.g., quarterly) the correction is small.

The fair spread for maturity \(T_n\) (or contracted spread or par spread) is defined as the spread that equalizes at inception the values of the fixed and protection legs:

\[
X N \sum_{i=1}^{n} (T_i - T_{i-1}) D(0, T_i) = N (1 - \bar{R}) \sum_{i=1}^{n} B(0, T_i) \left[S(T_{i-1}) - S(T_i)\right] \tag{4}
\]

which yields

\[
X = CDS(T_n) = \frac{(1 - \bar{R}) \sum_{i=1}^{n} B(0, T_i) \left(S(T_{i-1}) - S(T_i)\right)}{\sum_{i=1}^{n} (T_i - T_{i-1}) B(0, T_i) S(T_i)} \tag{5}
\]

Figure 2 shows the term structure of CDS spreads written on Lehman Brothers in September 2008. To derive this formula, we have assumed that the firm’s default time and recovery rate are independent, that interest rate movements are independent from default times, and that the protection seller has negligible default probability (no counterparty risk; see Counterparty Credit Risk). All these assumptions can be relaxed, especially in the context of reduced-form (see Reduced Form Credit Risk Models; Intensity-based Credit Risk Models) pricing models [3, 5–7]. Hull and White [6] discuss the incorporation of counterparty risk in CDSs. We note that

• CDS spreads depend on the term structure of default probabilities and on the term structure of interest rates, but only through payment dates \(T_1, \ldots, T_n\): two models that agree on the term structure of default probabilities will agree on CDS spreads.
• The CDS spread depends on the recovery rate only through its expectation \(\bar{R}\) under the pricing measure \(\mathbb{Q}\). In market quotes, \(\bar{R}\) has usually been chosen to be 40% for corporates, although this convention is subject to change.

[Figure 2 plot: CDS spreads (bps), vertical axis from 250 to 750, against years, horizontal axis from 0 to 10.]

Figure 2 Term structure of CDS spreads on Lehman Brothers on September 8, 2008

Implied Default Probability

Given an estimate for the expected recovery rate \(\bar{R}\) and the term structure of discount factors, one can solve equation (5) for the term structure of default probabilities given the CDS spreads \(CDS(T_1), \ldots, CDS(T_n)\). The solution \(S(T_i)\) is called the implied survival probability and \(1 - S(T_i)\) is the implied default probability or the “risk-neutral” default probability implied by CDS quotes.

This procedure of inverting survival probabilities from CDS spreads is analogous to the procedure of stripping discount factors/zero coupon bond prices from bond yields (see Yield Curve Construction). Note that, as for yield curve construction, there are, in general, many more dates \(T_i\) (quarterly payments) than CDS maturities; hence, reconstructing \(S(T)\) from CDS spreads requires interpolation or extra assumptions on survival probabilities. For example, survival probabilities are commonly parameterized as

\[
S(t, T) = \exp\left(-\int_t^T h(t, u)\, du\right) \tag{6}
\]

where \(h(t, T) = -\partial_T S(t, T)/S(t, T)\) is the forward hazard rate (defined analogously to the forward interest rate; see Heath–Jarrow–Morton Approach) (end note a). Reduced-form models (see Reduced Form Credit Risk Models; Intensity-based Credit Risk Models) lead to parametric functional forms for \(h(t, \cdot)\), which can then be used to calibrate parameters to the observed CDS spreads.
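The inversion of equation (5) for implied survival probabilities can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions that are not in the text: a flat continuously compounded risk-free rate, quarterly payment dates with \(T_0 = 0\), and a hazard rate that is constant between quoted maturities (one of many possible interpolation choices). The function names and the sample quotes are hypothetical.

```python
import math


def survival(t, hazards, knots):
    """S(0, t) from equation (6) with a hazard rate constant on each (knots[m-1], knots[m]]."""
    integral, start = 0.0, 0.0
    for h, end in zip(hazards, knots):
        if t <= start:
            break
        integral += h * (min(t, end) - start)
        start = end
    return math.exp(-integral)


def par_spread(maturity, hazards, knots, rate, recovery, dt=0.25):
    """Equation (5) on a regular payment grid T_i = i * dt, with T_0 = 0."""
    protection = premium = 0.0
    for i in range(1, round(maturity / dt) + 1):
        s_prev = survival((i - 1) * dt, hazards, knots)
        s_now = survival(i * dt, hazards, knots)
        b = math.exp(-rate * i * dt)  # flat-rate discount factor B(0, T_i)
        protection += b * (s_prev - s_now)
        premium += dt * b * s_now
    return (1.0 - recovery) * protection / premium


def bootstrap_hazards(maturities, spreads, rate, recovery):
    """Fit one hazard level per quoted maturity, shortest maturity first."""
    hazards = []
    for m, (maturity, quote) in enumerate(zip(maturities, spreads)):
        lo, hi = 0.0, 5.0
        for _ in range(100):  # bisection: the par spread rises with the hazard rate
            mid = 0.5 * (lo + hi)
            trial = par_spread(maturity, hazards + [mid], maturities[: m + 1], rate, recovery)
            if trial < quote:
                lo = mid
            else:
                hi = mid
        hazards.append(0.5 * (lo + hi))
    return hazards


# Hypothetical quotes (as decimals) for 1-, 3- and 5-year protection.
maturities = [1.0, 3.0, 5.0]
quotes = [0.010, 0.015, 0.020]
hazards = bootstrap_hazards(maturities, quotes, rate=0.03, recovery=0.40)
implied_survival = [survival(T, hazards, maturities) for T in maturities]
print(hazards, implied_survival)
```

Each quoted maturity pins down one hazard level, so the curve is built sequentially in the same spirit as stripping discount factors from bond yields. A different interpolation assumption would change the implied probabilities between quoted maturities.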

[Figure 3 plot: survival probability, vertical axis from 0.65 to 1, against years, horizontal axis from 0 to 10.]

Figure 3 Risk-neutral survival probabilities implied by CDS spreads on Lehman Brothers on September 8, 2008

[Figure 4 plot: implied hazard rate, vertical axis from 0 to 0.12, against years, horizontal axis from 0 to 10.]

Figure 4 Hazard rates implied by CDS spreads on Lehman Brothers on September 8, 2008
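Equation (6) can also be read in the other direction, from survival probabilities to forward hazard rates. The Python sketch below assumes, purely for simplicity, that the hazard rate is constant on each grid interval (a cruder choice than a piecewise linear one), that the grid starts at \(T_0 = 0\) with \(S(0) = 1\), and it uses a made-up survival curve rather than any market data. The function name is hypothetical.

```python
import math


def forward_hazards(times, survival_probs):
    """Hazard rate on each interval (T_{i-1}, T_i], assuming it is constant there.

    From equation (6): S(T_i) / S(T_{i-1}) = exp(-h_i * (T_i - T_{i-1})).
    The grid is assumed to start at T_0 = 0 with S(0) = 1.
    """
    hazards = []
    prev_t, prev_s = 0.0, 1.0
    for t, s in zip(times, survival_probs):
        hazards.append(-math.log(s / prev_s) / (t - prev_t))
        prev_t, prev_s = t, s
    return hazards


# Hypothetical survival curve on an annual grid (not the Lehman data).
times = [1.0, 2.0, 3.0]
probs = [0.95, 0.88, 0.80]
print(forward_hazards(times, probs))
```

A survival curve that falls faster over later intervals produces rising forward hazard rates, which is the qualitative pattern a hazard-rate plot is meant to expose.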

Figure 3 shows survival probabilities for Lehman Brothers implied from CDS quotes on September 8, 2008, shortly before Lehman’s default. Assuming that the hazard rate \(h(t, T)\) is piecewise linear in \(T\), we obtain the forward (annual) hazard rates shown in Figure 4.

This example might serve as a warning: such implied or “risk-neutral” default probabilities do not necessarily convey any information about the actual likelihood of the default of the reference entity, but they simply convey a market consensus on the premium for default protection at various maturities. Note also that the implied default probabilities and hazard rates depend on the assumption used for recovery rates.

Mark-to-Market Value of a Credit Default Swap (CDS) Position

At inception (say, \(t = 0\)) the mark-to-market value of a CDS position is zero for both counterparties. At a later date \(t > 0\), this value is no longer zero: the mark-to-market value for the protection seller is the difference between the values of the fixed and protection legs:

\[
CDS(T_n)\, N \sum_{T_i > t} (T_i - T_{i-1}) B(t, T_i) S(t, T_i) - N (1 - \bar{R}) \sum_{T_i > t} B(t, T_i) \left(S(t, T_{i-1}) - S(t, T_i)\right) \tag{7}
\]

where the sum runs over the remaining payments and the survival probabilities are now computed at time \(t\). This quantity can be positive or negative, just as in an interest rate swap.

The mark-to-market value of the protection seller’s position is the negative of the buyer’s value (7). Note that the mark-to-market value (7) can be negative. This occurs when the credit quality of the reference name has improved since inception, and default protection is cheaper at current conditions, that is, available for a lower spread than that agreed upon at inception.

Triangle Formula

Consider now the simple case where the default time is described by a constant hazard rate \(\lambda\) (see Hazard
6 Credit Default Swaps

Rate):    Risk Management of Credit


T
S(0, T ) = exp − λ dt = exp(−λT ) (8) Default Swaps
0
Various factors affect the mark-to-market value of
If payments are assumed frequent, Ti − Ti−1 = a CDS position. On a day-to-day basis the main
T  T , we can approximate the terms in concern is spread volatility: the value of a CDS
equation (5) as position is primarily affected by changes in the CDS
spread. Fluctuations in CDS spreads tend to exhibit
heavy tails and strong asymmetry (upward moves in
S(Ti−1 ) − S(Ti ) spreads have a heavier tail than downward moves)
at daily and weekly frequencies. Figure 5 shows the
= −S  (Ti )(Ti − Ti−1 ) + o(Ti − Ti−1 ) daily returns in the CDS spread of CIGNA Corp.
from 2005 to 2009: note the large amplitude of daily
= λS(Ti )T + o(T ) (9)
returns, which can attain 20% or 30%, especially on
the upside. These tails are exacerbated by the relative
illiquidity of many single-name CDS contracts.
So Another concern is obviously the occurrence of
the underlying credit event, which results in large

n payouts, whose magnitude is linked to the recovery
B(0, Ti )(S(Ti−1 ) − S(Ti )) rate and is difficult to determine in advance.
i=1 To provision for these risks, typically one or
both parties to a CDS contract must post collateral

n
and there can be margin calls requiring the posting
= B(0, Ti )S(Ti ) λ(Ti )T
 
of additional collateral during the lifetime of the
i=1 D(0,Ti ) contract, if the quoted spread of the CDS contract
 T
or the credit rating of one of the parties changes.
T →0 Additionally, as with other OTC derivatives, CDSs
+ o(1) → dtλ D(0, t) (10)
0 are exposed to counterparty risk. The counterparty
risk exposure can be particularly large in a scenario
and where the protection seller and the underlying entity
default together. This can happen, for example, if

n  T
the protection seller has insufficient reserves to cover
T →0 CDS payments. Counterparty risk affects the CDS
(Ti − Ti−1 )D(0, Ti ) → dtD(0, t) (11)
i=1 0 spread if the default of the protection seller and the
reference entity are perceived to be correlated [6].
The AIG fiasco in 2008 and the default of Lehman
Substituting in equation (5) we obtain the “trian- exacerbated the market perception of counterparty
gle” relation risk and has since distorted the level of CDS spreads,
making it imperative to account for counterparty risk
in the risk management of CDS portfolios.
CDS(T ) = (1 − R)λ To mitigate counterparty risk in the CDS market,
it has been proposed by various market participants
CDS spread = (1 − recovery rate) × hazard rate and regulators to clear CDS trades in clearinghouses.
(12) In a clearinghouse, the central counterparty acts as
the buyer to every seller and seller to every buyer,
thereby isolating each participant from the default of
The assumption of a flat term structure of hazard other participants. Participants post collateral with the
rates is rather crude, but this formula is very useful central counterparty and are subject to daily mar-
in practice to get an order of magnitude of the risk- gin calls. The introduction of a CDS clearinghouse
neutral default rate λ from CDS quotes. can also reduce systemic risk resulting from CDS

[Plot: daily log-increments of the CDS spread (vertical axis, −0.3 to 0.4) against time (horizontal axis, 2005 to 2010)]

Figure 5 Daily (log-)increments of CDS spreads for CIGNA (CI), 2005–2009

transactions [1]. In the United States the first CDS clearinghouse, ICE Trust, began operating in March 2009. Other proposals to clear credit default swaps have been made by CME, NYSE Euronext, Eurex AG, and LCH Clearnet.

Credit Default Swap (CDS) Basis

An asset swap is a transaction between two parties in which the asset swap buyer purchases a bond from the other party and simultaneously enters into an interest rate swap transaction, usually with the same counterparty, to exchange the coupon on the bond for Libor plus a spread. The spread is called the asset swap spread. A common asset swap is the par asset swap where the buyer pays par at the inception of the deal. Unlike a CDS, an asset swap continues following bond default.

The CDS-Bond basis is the difference between the CDS spread and the asset swap spread on the same bond. It is an indicator of relative value of CDS versus the cash bond [2]. For example, when the CDS spread is higher than the asset swap spread, that is, the basis is positive, the CDS is generally considered to be more attractive than the bond. The reverse is true if the basis is negative. Negative CDS basis has been frequently observed during the recent financial crisis.

Changes in Conventions

Since 2009, the CDS market has been evolving in the direction of trading standardized single-name contracts with an upfront payment and a fixed coupon of either 100 or 500 bp and a common set of coupon payment dates (see www.cdsmodel.com). Standard maturity dates are March/June/September/December 20. Coupon payment dates are like standard maturity dates, but are adjusted to fall on the following business day. Each coupon is equal to (annual coupon/360) × (number of days in accrual period). This simplifies processing and computation of coupons and cash flows. For example, every $10 mm 100 bp standard CDS contract will pay the same 2Q09 coupon, $26 111, on Monday, June 22, 2009, regardless of trade date, maturity, or reference entity.

The upfront payment is then set at the inception such that the buyer and seller positions have the same present value. In this convention, the dealer will quote not a spread (which is fixed) but an upfront payment. This convention applies to standardized CDS contracts on names contained in CDX and ITRAXX indices and may set the example for all other CDS contracts in the future.
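The standardized coupon rule described under Changes in Conventions can be sketched in a few lines. The example below is only an illustration: it assumes that the 2Q09 accrual period runs from March 20, 2009 to the adjusted payment date of June 22, 2009 (an assumption that reproduces the quoted $26 111), it treats weekends as the only non-business days, and the function names are hypothetical.

```python
from datetime import date, timedelta


def following_business_day(d):
    """Roll a date forward to the next weekday.

    Assumption: only Saturdays and Sundays are non-business days;
    public holidays are ignored in this sketch.
    """
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d += timedelta(days=1)
    return d


def standard_coupon(notional, coupon_bp, accrual_start, accrual_end):
    """Coupon = notional x (annual coupon / 360) x (days in accrual period)."""
    days = (accrual_end - accrual_start).days
    return notional * coupon_bp / 10_000 * days / 360


if __name__ == "__main__":
    # Standard date June 20, 2009 falls on a Saturday and rolls to Monday.
    pay_date = following_business_day(date(2009, 6, 20))
    coupon = standard_coupon(10_000_000, 100, date(2009, 3, 20), pay_date)
    print(pay_date.isoformat(), round(coupon))
```

Because the coupon depends only on the notional, the fixed coupon rate, and the common accrual dates, every contract with the same notional and coupon pays the same amount, which is the point of the standardization.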

End Notes

a. Not to be confused with the (instantaneous) hazard rate or the default intensity (see Hazard Rate).

References

[1] Cont, R. & Minca, A. (2009). Credit Default Swaps and Systemic Risk. Financial Engineering Report, Columbia University.
[2] Davies, M. & Pugachevsky, D. (2005). Bond spreads as a proxy for credit default swap spreads, Risk.
[3] Duffie, D. (1999). Credit swap valuation, Financial Analyst's Journal 54(1), 73–87.
[4] Duffie, D. & Singleton, K.J. (1999). Modeling term structures of defaultable bonds, Review of Financial Studies 12, 687–720.
[5] Hull, J. & White, A. (2000). Valuing credit default swaps I: no counterparty default risk, Journal of Derivatives 8, 29–40.
[6] Hull, J. & White, A. (2000). Valuing credit default swaps II: modeling default correlations, Journal of Derivatives 8, 897–907.
[7] Schönbucher, P. (1998). Term structure modeling of defaultable bonds, Review of Derivatives Research 2, 161–192.
[8] VanDuyn, A. & Weitzman, H. (2008). Fed to hold CDS clearance talks, Financial Times (Oct 7).

Related Articles

Basket Default Swaps; Counterparty Credit Risk; Credit Default Swaption; Equity–Credit Problem; Exposure to Default and Loss Given Default; Hazard Rate; Intensity-based Credit Risk Models; Recovery Rate; Recovery Swap; Reduced Form Credit Risk Models.

RAMA CONT
Total Return Swap

A total return swap (TRS) is a financial contract between two counterparties to synthetically replicate the economic returns of an underlying asset. The principal mechanism and interaction are shown in Figure 1.

The reference asset still belongs to the TRS payer, who is buying protection from the TRS receiver. This reference asset typically carries a fixed interest payment and is exposed to a certain credit risk, against which protection is sought. The TRS payer transfers any payment made by the reference asset to the TRS receiver, who conversely pays a variable payment (typically the London interbank offered rate (LIBOR)) and a positive (or negative) spread as risk premium. Additionally, settlements for price depreciation and appreciation of the reference asset are made between the counterparties.

The TRS payer thus sells the market and credit risk of the reference asset to the TRS receiver without selling the reference asset itself. In the case of a credit event, the TRS receiver pays the difference between the value of the reference asset and the recovery value to the TRS payer. The TRS payer acquires the counterparty risk of the TRS receiver instead.

Note that payments are not made continuously but rather at discrete times, that is, at given and specified reset periods. Occasionally, the reference asset consists of a whole portfolio of assets.

Reasons for Investing in a Total Return Swap

The TRS receiver explores the possibility of investing in the risk profile of the reference asset without owning it legally. Thus, insurance companies, hedge funds, and so on, count among the typical investors. They aim to work on a leveraged basis, diversify their portfolio, and achieve higher yields by taking on risk exposure. They can explore a synthetic way to make loans without having the costs and administrative burden of originating credit. Sometimes, for certain investors with capital constraints, TRS may be an effective way to leverage the use of capital.

TRS payers are typically lenders and investors who want to reduce their respective exposure to the given borrower and potentially diversify a concentrated portfolio without removing the asset itself from their balance sheet, while maintaining the relationship with the borrower. However, TRS payers do not have to hold the asset itself on their balance sheets. If a TRS payer is taking an outright position, that is, without holding the asset itself on the balance sheet, a TRS is an efficient way to short the asset synthetically.

A TRS can help to activate comparative advantages of financing, depending on whether a certain market plays a role in a certain part of the market. Typically, a TRS is an off-balance-sheet deal.

Comparison with an Outright Investment in the Bond

The most striking difference from an outright investment in a bond or a loan is that with a TRS, price changes become cash flows at the predefined reset periods, at which settlements are made. For a bond, they are only accounting profits or losses and become effective at maturity or when the position is unwound. Thus, the TRS resembles a futures contract whereas the direct investment is more similar to a forward one (see, e.g., Schönbucher [3]).

Valuation and Risk Management

Schönbucher [3] gives an indication about the payoff streams of a TRS from the point of view of the TRS receiver to be counted for valuation purposes:

• Initially the TRS is closed at a fair value; hence, no cash flow takes place.
• If the bond does not default, the TRS receiver pays a variable coupon plus (or minus) a spread at every predefined reset point; the receiver is paid the interest from the bond, and the difference in market value of the bond since the last reset is exchanged.
• If the bond defaults, the TRS receiver pays for a last time the variable coupon plus (or minus) a spread and the difference between the last market value of the bond and its recovery.

Thus, several risk factors influence the value of the TRS: the interest rate risk driven by the changing yield curve and the default probability of the

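[Diagram: the TRS payer (protection buyer) receives "Interest etc." from the reference asset and passes "Interest, dividends etc." to the TRS receiver (protection seller); "Price depreciation/appreciation" and "LIBOR +/− spread" are exchanged between the TRS receiver and the TRS payer.]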

Figure 1 Mechanism of a total return swap (see Martin et al. [2])

reference asset (we neglect, for instance, the counterparty risk). Typical valuation models include the Duffie–Singleton model, hazard rates, and forward measure. The credit risk is reflected in the fair spread (fair means that initially there has to be no cash flow) (see also Anson et al. [1]).

References

[1] Anson, M.J.P., Fabozzi, F.J., Choudry, M. & Chen, R.R. (2004). Credit Derivatives: Instruments, Applications, and Pricing. John Wiley & Sons.
[2] Martin, M.R.W., Reitz, S. & Wehn, C.S. (2006). Kreditderivate und Kreditrisikomodelle: Eine mathematische Einführung, Vieweg Verlag. (in German).
[3] Schönbucher, P. (2003). Credit Derivatives Pricing Models: Models, Pricing and Implementation, Wiley.

Further Reading

Kasapi, A. (1999). Mastering Credit Derivatives: A Step-by-Step Guide to Credit Derivatives and their Application. Prentice Hall.
Tavakoli, J.M. (1998). Credit Derivatives. A Guide to Instruments and Applications, Wiley.

CARSTEN S. WEHN
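The payoff streams listed under Valuation and Risk Management in the Total Return Swap article can be made concrete with a small sketch of the TRS receiver's net cash flow at one reset. Everything specific here is an assumption for illustration: prices are quoted as fractions of par, rates are annualized and scaled by an accrual fraction, counterparty risk is ignored, and the function names are hypothetical.

```python
def receiver_cash_flow(notional, bond_coupon, floating_rate, spread,
                       accrual, price_prev, price_now):
    """Net cash flow to the TRS receiver at a reset date without default.

    The receiver is paid the bond interest and the change in market value
    since the last reset, and pays the floating rate plus the spread.
    """
    received = notional * bond_coupon * accrual + notional * (price_now - price_prev)
    paid = notional * (floating_rate + spread) * accrual
    return received - paid


def receiver_cash_flow_on_default(notional, floating_rate, spread,
                                  accrual, price_prev, recovery):
    """Net cash flow to the TRS receiver when the bond defaults.

    The receiver pays the floating coupon plus spread a last time and the
    difference between the last market value and the recovery value.
    """
    final_coupon = notional * (floating_rate + spread) * accrual
    loss = notional * (price_prev - recovery)
    return -(final_coupon + loss)


if __name__ == "__main__":
    # Semiannual reset: 6% bond coupon, 3% floating rate, 1% spread,
    # bond price falling from 100% to 98% of par.
    print(receiver_cash_flow(100.0, 0.06, 0.03, 0.01, 0.5, 1.00, 0.98))
    # Default with 40% recovery after a last market value of 98% of par.
    print(receiver_cash_flow_on_default(100.0, 0.03, 0.01, 0.5, 0.98, 0.40))
```

The price change entering the reset cash flow is what makes the TRS resemble a futures contract rather than a forward, as noted in the comparison with an outright bond investment.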
Recovery Swap

A recovery swap (RS), also called a recovery lock or a recovery default swap (RDS), is an agreement to exchange a fixed recovery rate $R_S$ for the realized recovery rate φ, the latter being determined under prespecified contractual terms. The fixed recovery rate may be specified in terms of a recovery of par amount (RP), or as the recovery percentage of an equivalent Treasury bond, known as recovery of Treasury (RT), or as a fraction of the market value of the bond prior to default, also known as recovery of market value (RMV).

A recovery swap is no different from a forward contract at rate $R_S$ on the underlying recovery rate φ. The maturity of the contract is denoted as T. If the reference credit underlying the recovery swap does not default before T, the swap expires worthless. There are no intermediate or periodic cash flows in a recovery swap. In a liquid market for recovery swaps, the quoted rate $R_S$ is the best forecast of the expected recovery rate for default at time T. This recovery rate may then be used to price credit default swaps (CDSs).

We assume that the buyer of the recovery swap will receive $R_S$ and pay φ. Hence, the buyer gains when the realized recovery rate is lower than the strike rate $R_S$. The net payoff to the contract is ($R_S$ − φ). Recovery swaps are quoted in terms of the "strike" rate $R_S$. For example, a dealer might quote a recovery swap in GM at 37/40. This means the dealer is prepared to sell a recovery swap with $R_S$ = 37% and buy at $R_S$ = 40%.

Replication and No-arbitrage

A recovery swap may be synthesized by selling a fixed recovery CDS (also known as a digital default swap or DDS) at a predetermined recovery rate $R_D$ and buying a standard CDS. When the reference name defaults, the seller of the DDS pays the loss amount on default (1 − $R_D$) and receives (1 − φ) on the CDS, thereby generating cash flow ($R_D$ − φ). There is a triangular arbitrage restriction between the three securities: RS, DDS, and CDS. If we hold 1 unit each of the RS, CDS, and DDS, then we would need that $R_S = R_D$.

In order to state the triangular arbitrage relation more generally, consider the case when $R_S \neq R_D$, that is, when the strike recovery rates of the recovery swap and DDS are not the same. Let the premium on the CDS be $c_1$ and the premium on the DDS be $c_2$. In order to replicate the RS, we will hold x units of the CDS and y units of the DDS. The replication has two conditions:

1. The cashflows at default must be equal for the RS and the replicating portfolio of CDS and DDS. In other words,

$$x \cdot (1 - \phi) - y \cdot (1 - R_D) = R_S - \phi \qquad (1)$$

2. The premiums of the replicating portfolio must be net zero as the recovery swap does not have any intermediate cash flows. Hence the following equation must hold:

$$y \cdot c_2 - x \cdot c_1 = 0 \qquad (2)$$

Set x = 1 in equation (1) so as to eliminate dependence on φ in the equation. Then we have that

$$x = 1 \quad \text{implies} \quad y = \frac{1 - R_S}{1 - R_D} \qquad (3)$$

Substituting this result for x, y in equation (2) results in the following:

$$\frac{c_1}{c_2} = \frac{1 - R_S}{1 - R_D} = \frac{L_S}{L_D} \qquad (4)$$

where L denotes the loss rate. We note the following:

• The no-arbitrage condition in equation (4) between the three securities implies that the ratio of the premium on the CDS to the premium on the DDS is equal to the ratio of loss rates on the recovery swap and the digital CDS. This is because the quote on the recovery swap $R_S$ is the expected recovery rate on the CDS contract.
• It follows immediately that if $R_D$ is specified, then equation (4) mandates a precise relation between the quotations on the three types of contracts, that is, rate $R_S$ for the recovery swap, premium $c_1$ on the CDS, and premium $c_2$ on the DDS. Given the quotes on any two of these securities, the quote on the third security is immediately obtained.

• These no-arbitrage based results do not depend in any way on the underlying process for default or that of recovery. This makes the relationships in equation (4) very general and easy to apply in practice, as well as easy to assess empirically for academic purposes.

Applications and Uses of Recovery Swaps

Recovery swaps were first developed by BNP Paribas in early 2004 [10]. In response to market demand, banks started issuing fixed-rate recovery collateralized debt obligations (CDOs) and as a consequence were bearing recovery rate risk. In order to hedge against this recovery rate risk, market participants started selling recovery swaps.

Recovery swap markets are predominantly traded on reference entities with a high risk of default or of declining credit quality. For this reason, the largest activity in the recovery swaps market is in the auto parts and auto manufacturing sectors and geographically on North American entities [7]. Trading volumes in recovery swaps, although still small relative to the overall credit derivatives market, increased in 2005 with the defaults of Delphi Corporation and the Collins & Aikman Corporation [7]. Still, the market remains largely undeveloped: the International Swaps and Derivatives Association (ISDA) issued a template for the documentation on recovery swaps in May 2006, but the full documentation remains to be completed at this time [13].

There are two primary uses of recovery swaps. The first is to isolate the probability of default from the recovery rate. Traders may have in-house expertise in determining default probabilities but not in determining recovery and thus may wish to hedge their recovery risks through recovery swaps. The second use of recovery swaps is to eliminate recovery basis risk. Recovery basis risk occurs because of different settlement procedures between CDSs and CDOs. CDSs are often settled physically, meaning that when default occurs the seller of protection receives the defaulted bonds, whereas CDOs are almost always cash settled. The difference in settlement procedures is the source of recovery basis risk. For instance, an investor might hold a CDS Index that includes a given reference entity and have an offsetting position by selling the single-name CDS of the same entity. The investor is hedged against the default risk of this entity, but because of differences in settlement at default between the CDS Index and the single-name CDS, the investor might get different recovery rates on the two instruments (recovery basis risk). Hence, recovery swaps can hedge against recovery basis risk by locking in recovery rates.

Furthermore, in the case where the CDSs specify a physical settlement, it is possible that the underlying bonds might be scarce compared to the notional amount of CDS traded on the bond. This causes a "delivery squeeze" where the price, and therefore the recovery of the bond, is artificially increased because the buyers of CDS need to buy the bonds for delivery to their counterparty. For instance, in October 2005, Delphi Corporation had $27.1 billion of outstanding CDSs against notional outstanding bonds of just $2 billion, causing the price of the defaulted bonds to surge by as much as 24% [9]. The consequence of this delivery squeeze is to reduce the profits accruing to buyers of CDSs, and recovery swaps provide a hedge against this by locking in the recovery rate ahead of time. More recently though, most CDSs are being settled in cash, thereby circumventing this problem.

Recovery Risk

There is a growing literature on recovery risk. Berd [3] provides a nice introduction and analysis of recovery swaps. DDSs are analyzed in [4]. Altman et al. [2] present a detailed study showing how recovery rates depend on default rates, positing and finding an inverse relationship. Chan-Lau [6] presents a method to obtain the upper bound on recovery on emerging market debt. Das and Hanouna [8] develop a methodology for identifying implied recovery rates and default probabilities from CDS spreads and data on stock prices and volatilities. Acharya et al. [1] provide empirical evidence that recovery rates depend on the industry, state of the economy, and specificity of assets to the industry in which the firm operates. Carey and Gordy [5, 14] show that recovery has systematic risk. Guo et al. [11] look at recoveries in reduced form models by explicitly modeling the postbankruptcy process of recoveries. The well-known loss given default model of Gupton and Stein [12] is well liked and used. Absolute priority rule (APR) violations are modeled in [15]. For a nice overview, see [16].
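The replication argument of equations (1)–(4) in the Replication and No-arbitrage section can be checked numerically. The sketch below is an illustration only: the function names are hypothetical, and the premiums and recovery rates in the example are made-up inputs, not market quotes.

```python
def dds_units(r_s, r_d):
    """Units y of the DDS sold per unit of CDS bought, from equation (3)."""
    return (1.0 - r_s) / (1.0 - r_d)


def replicating_payoff(phi, r_s, r_d):
    """Default payoff of long 1 CDS and short y DDS, left side of equation (1)."""
    y = dds_units(r_s, r_d)
    return (1.0 - phi) - y * (1.0 - r_d)


def implied_recovery_strike(c1, c2, r_d):
    """Recovery swap strike R_S implied by equation (4): c1/c2 = (1 - R_S)/(1 - R_D)."""
    return 1.0 - (c1 / c2) * (1.0 - r_d)


if __name__ == "__main__":
    r_s, r_d = 0.40, 0.0  # hypothetical strike and a zero-recovery digital contract
    for phi in (0.1, 0.4, 0.7):
        # The portfolio pays R_S - phi for every realized recovery phi.
        print(phi, replicating_payoff(phi, r_s, r_d), r_s - phi)
    # Hypothetical premiums: 300 bp on the CDS, 500 bp on the DDS.
    print(implied_recovery_strike(0.03, 0.05, r_d))
```

The loop illustrates the point of setting x = 1: the payoff matches $R_S$ − φ whatever the realized recovery, so no model of default or recovery is needed.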

References

[1] Acharya, V., Bharath, S. & Srinivasan, A. (2007). Does industry-wide distress affect defaulted firms? – Evidence from creditor recoveries, Journal of Financial Economics 85(3), 787–821.
[2] Altman, E., Brady, B., Resti, A. & Sironi, A. (2005). The link between default and recovery rates: theory, empirical evidence and implications, Journal of Business 78, 2203–2228.
[3] Berd, A. (2005). Recovery swaps, Journal of Credit Risk 1(3), 1–10.
[4] Berd, A. & Kapoor, V. (2002). Digital premium, Journal of Derivatives 10(3), 66.
[5] Carey, M. & Gordy, M. (2004). Measuring Systematic Risk in Recovery on Defaulted Debt I: Firm-Level Ultimate LGDs. Working paper, Federal Reserve Board.
[6] Chan-Lau, J.A. (2003). Anticipating Credit Events using Credit Default Swaps, with an Application to Sovereign Debt Crisis. IMF working paper.
[7] Creditflux Ltd (2006). Jump-to-default Hedging Spurs Recovery-swap Surge. January 1, 2006.
[8] Das, S. & Hanouna, P. (2007). Implied Recovery. Working paper, Santa Clara University.
[9] Euromoney Magazine (2006). Why CDS Investors need to Lock in Recovery Rates Now. May 1, 2006.
[10] Financial Times (2004). Capital Markets & Commodities: Investors Welcome Recovery Swap Tool. June 18, 2004.
[11] Guo, X., Jarrow, R. & Zeng, Y. (2005). Modeling the Recovery Rate in a Reduced Form Model. Working paper, Cornell University.
[12] Gupton, G. & Stein, R. (2005). LossCalc v2: Dynamic Prediction of LGD. Working paper, Moodys.
[13] Investment Dealers' Digest (2006). New ISDA Documentation Boosts Recovery Swaps. May 22, 2006.
[14] Levy, A. & Hu, Z. (2006). Incorporating Systematic Risk in Recovery: Theory and Evidence. Working paper, Moodys KMV.
[15] Madan, D., Guntay, L. & Unal, H. (2003). Pricing the risk of recovery in default with APR violation, Journal of Banking and Finance 27(6), 1001–1218.
[16] Schuermann, T. (2004). What do we know about Loss Given Default? 2nd Edition, Working paper, Federal Reserve Bank of New York, forthcoming in Credit Risk Models and Management.

Related Articles

Credit Default Swaps; Exposure to Default and Loss Given Default; Recovery Rate.

SANJIV R. DAS & PAUL HANOUNA
Constant Maturity Credit Default Swap

A constant maturity credit default swap (CM CDS) is a credit derivative with payments linked to periodic fixings of a standard CDS rate with a fixed tenor (e.g., 5 years) on a particular credit entity. In business practice, CM CDS is usually presented as an elaboration of a plain CDS suitable for use in trading strategies expressing certain views on the steepening or flattening of credit spreads. From the point of view of quantitative modeling, CM CDS is best understood as a simple representative of a family of structured credit exotics, more complicated members of which would depend on a nonlinear combination of CDS spreads of more than one maturity and/or refer to more than one credit entity. Both business practice and quantitative modeling of CM CDS are based on ideas and techniques originally developed for CMS-linked fixed income exotics and adapted to credit modeling.

Instrument Structure

A CM CDS trade usually consists of two legs, one of which is a CM CDS leg, which is a sequence of payments $P_1, P_2, \ldots, P_n$ made on a schedule of payment dates $\tilde T_1, \tilde T_2, \ldots, \tilde T_n$ (which is typically set following conventions of a standard CDS schedule) and is computed by the following formula:

$$P_i = \Delta_i \min[C,\; a \cdot S(T_i, T_{i+M})] \qquad (1)$$

Here $S(T_i, T_{i+M})$ is the rate of a CDS spanning M payment periods and observed at the fixing date $T_i$ associated with the payment date $\tilde T_i$, $\Delta_i$ is the daycount fraction for the accrual period i, a is the multiplier called participation rate, and C is the fixed rate reset cap (see End Note a). Notional amount of the trade is set to 1. The fixing normally happens at the beginning of the accrual period, so that $\tilde T_i = T_{i+1}$ (see End Note b). In the case of a default of the reference credit, the fraction of the last payment accrued before the default is paid and further payments stop.

The second leg of a CM CDS is normally a standard CDS protection leg; however, structures where the second leg is either a standard CDS premium leg or another CM CDS leg corresponding to a different tenor are also possible. The quote for a CM CDS is usually given in terms of the participation rate, a, of the CM CDS leg. With the second leg being the standard CDS protection leg, a participation rate of less than 100% reflects the expectation of rising credit spreads, whereas a participation rate of more than 100% corresponds to the expectation of decreasing credit spreads.

Trading Aspects

Descriptions of CM CDS structures started circulating in research communications of securities firms in early 2004 [3, 5, 6]. On November 21, 2005, ISDA provided a publication [4] setting a standard for the terms and conditions of the CM CDS leg, including a mechanism for the determination of CDS rate resets. Establishing undisputable resets for CDS rates presents a problem because no standard source of information on CDS rates similar to Telerate pages of interest rates has emerged.

The primary mechanism stipulated by ISDA compels the seller of protection in a CM CDS to make a binding bid for the fixed rate premium that the seller is willing to pay in exchange for protection in a standard CDS on the same credit entity. This bid will be used as a CDS rate reset and is expected to be a good proxy for the true CDS rate because the seller has no incentive to quote a value that is too high (in which case the seller can be forced to buy overpriced CDS protection) or too low (in which case the seller will lose on the coming CM CDS payment). A fallback mechanism, which comes into effect if the receiver fails to provide a bid, puts on the buyer the burden of obtaining CDS quotes from a set of fallback dealers, subsequently using the highest quote as the rate reset.

At the time of this study, the volume of CM CDS transactions remains limited, taming optimistic predictions for the development of multilayered structures, such as tranches of CM CDS portfolios, but not destroying such prospects completely. We also note that more exotic structures with payoffs linked to nonlinear combinations of CDS rates of different maturities were observed in the market and have potential for further growth.

Quantitative Modeling

The present value of a CM CDS leg is given by the sum of expectations of the payments (1) discounted


under risk-neutral measure:

$$V = \sum_i \Delta_i\, E\bigl[\min[C,\; a \cdot S_{T_i}(T_i, T_{i+M})]\; P_{\tilde T_i}\, D(0, \tilde T_i)\bigr] + A \qquad (2)$$

$$A = \sum_i \sum_{\tilde T_{i-1} < \tau_k \le \tilde T_i} \Delta(\tilde T_{i-1}, \tau_k)\; E\bigl[\min[C,\; a \cdot S_{T_i}(T_i, T_{i+M})]\; (P_{\tau_{k-1}} - P_{\tau_k})\, D(0, \tau_k)\bigr] \qquad (3)$$

Here $P_{\tilde T_i}$ is the indicator of survival of the underlying name until time $\tilde T_i$, and $D(0, \tilde T_i)$ is the stochastic discount factor from 0 to $\tilde T_i$. We equipped the notation $S_{T_i}(T_i, T_{i+M})$ for the CDS rate reset with a subscript $T_i$ to indicate that this rate is modeled as a value at time $T_i$ of a certain stochastic process. The contribution A of the interest accrued within the period of time $\Delta(\tilde T_{i-1}, \tau_k)$ between the last coupon payment and the time of default is written in a discretized form using a sufficiently frequent subdivision $\{\tau_k\}$ of the segment $[\tilde T_{i-1}, \tilde T_i]$.

We consider major types of modeling approaches in the order of increasing sophistication, including the nonstochastic approximation, convexity adjustments, instantaneous hazard rate modeling, and forward credit spread modeling.

Nonstochastic Approximation

We can obtain a simple approximation assuming that the CDS rates are nonstochastic. In this approximation, each process $S_t(T_i, T_{i+M})$ of the expectation of the time $T_i$ CDS rate reset conditional on the information accumulated until time t is frozen at t = 0. As a result, the relevant CM CDS rate at date $T_i$ is equal to the forward CDS rate $F(T_i, T_{i+M})$, which can be expressed in terms of the survival probability function, $p_s(t) = E[P_t]$, and the deterministic discount function, $D_0(t) = E[D(0, t)]$,

$$F(T_i, T_{i+M}) = \frac{(1 - R) \sum_{T_i < \tau_k \le T_{i+M}} \bigl(p_s(\tau_{k-1}) - p_s(\tau_k)\bigr)\, D_0(\tau_k)}{\sum_{j=i+1}^{i+M} \bigl(\Delta_j\, p_s(T_j)\, D_0(T_j) + A_j\bigr)} \qquad (4)$$

$$A_j = \sum_{T_{j-1} < \tau_k \le T_j} \Delta(T_{j-1}, \tau_k)\, \bigl(p_s(\tau_{k-1}) - p_s(\tau_k)\bigr)\, D_0(\tau_k) \qquad (5)$$

Here R is the recovery rate and $\{\tau_k\}$ is a subdivision of the segment $[T_i, T_{i+M}]$, sufficiently frequent to enable an accurate calculation of the default leg and the contributions $A_j$ of the premium accrued in the time $\Delta(T_{j-1}, \tau_k)$ between the last CDS coupon date and the event of default (see End Note c).

The value of a CM CDS leg obtained by setting $S_{T_i}(T_i, T_{i+M}) \to F(T_i, T_{i+M})$, $P_{\tilde T_i} \to p_s(\tilde T_i)$, and $D(0, \tilde T_i) \to D_0(\tilde T_i)$ in equations (2–3) gives a quick estimate sufficient for a qualitative understanding of the relationship between the term structure of CDS spreads and the CM CDS participation rate, but it misses the essential effects of the dynamics of credit spreads. Indeed, even apart from the obvious problem of not handling the cap condition, the expectation $E[S_{T_i}(T_i, T_{i+M})\, P_{\tilde T_i}\, D(0, \tilde T_i)]$ can be very different from $F(T_i, T_{i+M})\, p_s(\tilde T_i)\, D_0(\tilde T_i)$ when credit spread volatility is taken into account.

Convexity Adjustments

This section discusses volatility-dependent corrections to the results obtained in the nonstochastic approximation. Such a correction can be either derived from a fully consistent model capable of computing $E[S_{T_i}(T_i, T_{i+M})\, P_{\tilde T_i}\, D(0, \tilde T_i)]$ or introduced in an ad hoc manner. We begin by looking at the convexity adjustments in the more limited sense of instrument-specific adjustments without building a full-fledged model. In the remainder of this article, we omit the discussion of the accrued interest term A and focus on the main term in equation (2) (see End Note d).

The first step is to switch to a new measure in which the process $S_t(T_i, T_{i+M})$ is a martingale. The numeraire $N_t(T_i, T_{i+M})$ of the desired measure is known as the risky basis point value (RBPV) and is given as

$$N_t(T_i, T_{i+M}) = P_t \sum_{j=i+1}^{i+M} \Bigl[\Delta_j\, B_t(T_j) + \sum_{T_{j-1} < \tau_k \le T_j} \Delta(T_{j-1}, \tau_k)\, H_t(\tau_{k-1}, \tau_k)\Bigr] \qquad (6)$$

where $B_t(T_j)$ is the time t value of a risky unit payment at $T_j$ and $H_t(\tau_{k-1}, \tau_k)$ is the time t value of a unit payment at $\tau_k$ conditional on a default event in the interval $(\tau_{k-1}, \tau_k]$. We used a discretized form of the accrued interest consistent with equation (3).

The existence of the required measure follows from a representation of the CDS rate as a ratio of two tradable assets:

$$S_t(T_i, T_{i+M}) = L_t(T_i, T_{i+M}) / N_t(T_i, T_{i+M}) \qquad (7)$$

where $L_t(T_i, T_{i+M})$ is the time t expectation of the CDS default leg,

$$L_t(T_i, T_{i+M}) = (1 - R)\, P_t \sum_{T_i < \tau_k \le T_{i+M}} H_t(\tau_{k-1}, \tau_k) \qquad (8)$$

(For a rigorous discussion of the mathematics of measure change involving risky basis point value as a numeraire, see [8].) After the measure change, the contribution of each individual coupon payment to the CM CDS leg can be written as

Instantaneous Hazard Rate Modeling

A more systematic modeling of CM CDS is possible in the framework of stochastic instantaneous hazard rates. This approach starts with postulating a stochastic differential equation (SDE) for the stochastic default intensity λ(t). A reasonable choice is a lognormal process (similar to the Black–Karasinski model of interest rates) or an affine process (similar to the Cox–Ingersoll–Ross model of interest rates). A normal process (similar to the Hull–White model of interest rates) was also used despite a conceptual problem posed by positive probabilities of negative hazard rates. Multifactor models for joint stochastic evolution of hazard rates and instantaneous interest rates are also possible.

An exact analytical solution for a CM CDS is not available in any of these models because of a two-layered structure involving inner expectations for CDS rate fixings conditional on the state achieved on the fixing dates. The machinery of trees, lattices, or partial differential equation (PDE) solvers, however, can be adapted to handle CM CDS structures. The key element is a construction of a slice of values of CM CDS rate fixings on the set of model states achieved on the fixing date. This is done using a representation of the CDS rate in terms of
 
BTi (Ti ) conditional expectations of elementary instruments
i N0 (Ti , Ti+M )E f (STi (Ti , Ti+M )) Bt (T ) and Ht (T1 , T2 ) provided by equations (7), (6),
NTi (Ti , Ti+M )
and (8). We refer to Chapter 7 of the book [7] for
(9) the details of a possible realization of a tree-based
construction.
where f (X) = min(C, aX). The next step is to An advantage of hazard rate modeling is its
assume that St follows a lognormal martingale, consistency that allows to price a wide range of credit
F exp(σ Wt − 0.5σ 2 t), and to replace the true value instruments of different maturities, including CDS
of the ratio BTi (Ti )/NTi (Ti , Ti+M ) by the value options, asset swaps, bond options, and credit linked
of a suitable increasing function g(S) at S = STi . notes using the same model. A notable disadvantage
Imposing the condition N0 (Ti , Ti+M )g(F (Ti , Ti+M ) is the difficulty of calibration.
= ps (Ti )D0 (Ti ) ensures that the calculation of
the average (9) for σ = 0 brings us back to the Forward Credit Spread Modeling
nonstochastic valuation. A nonzero volatility σ > 0
leads to a positive correction due to the convexity As drawbacks of short-rate models of interest rates
of the product g(S)f (S) in the region of values led to the invention and development of swap and
of S close to F (Ti , Ti+M ) and distant from the LIBOR market models, similar progression is taking
cap C. place in the space of structured credit models. We
This approach has an advantage of relative sim- refer to the work [1] and Chapter 23 of [2] for details
plicity and potential ability of calibrating the model of a model in which the CDS rates St (Ti , Ti+M ) are
volatility σ to CDS options. The disadvantage is an chosen as primary variables.
uncontrollable assumption in the choice of the func- An advantage of this approach is the ease of
tion g(S). calibration and ability to derive efficient analytical
approximations under minimal additional assumptions. At present, the disadvantage is the paucity of relevant market data, leaving a large freedom in specifying the structure of volatilities and correlations. A full payback from this level of model sophistication cannot be expected until the market for structured products develops enough to provide liquid quotes for CDS option volatilities for a dense set of maturities, similarly to caplet and swaption volatility matrices in the interest rate markets.

End Notes

a. The structure can obviously be extended to admit a fixed rate reset floor, which, however, is not included in the standard ISDA template.
b. The actual payment dates T_i usually have a delay of at least one business day and are rolled forward or backward to fall on a valid business day in accordance with currency-dependent conventions. In the practice of quantitative modeling, proper care is taken to make sure that correct discount factors reflecting the actual payment dates are used.
c. These expressions are often written in terms of integrals obtained in the limit of an infinitely frequent discretization. The same remark applies to equation (3).
d. A rigorous calculation of the convexity correction to accrued interest term is technically involved and can be avoided by using a proportionally adjusted correction to the main term.

References

[1] Brigo, D. (2006). CMCDS valuation with market models, Risk June, 78–83.
[2] Brigo, D. & Mercurio, F. (2007). Interest Rate Models–Theory and Practice, with Smile, Inflation, and Credit, 2nd Edition, Springer.
[3] Calamaro, J.-P. & Nassar, T. (2004). CMCDS: The Path to Floating Credit Spread Products, Deutsche Bank, Global Markets Research.
[4] ISDA (2005). Additional Provisions for Constant Maturity Credit Default Swaps, International Swaps and Derivatives Association, November 21, 2005. Available at www.isda.org.
[5] Pedersen, C. & Sen, S. (2004). Valuation of Constant Maturity Default Swaps, Lehman Brothers, Quantitative Research Quarterly.
[6] Renault, O. & Ratul, R. (2007). Constant maturity credit default swaps, in The Structured Credit Handbook, A. Rajan, G. McDermott & R. Ratul, eds, Wiley Finance, pp. 57–77.
[7] Schönbucher, P. (2003). Credit Derivatives Pricing Models, Wiley Finance.
[8] Schönbucher, P. (2004). Measure of survival, Risk August, 79–85.

Related Articles

Constant Maturity Swap; Convexity Adjustments; Credit Default Swaps; Credit Default Swaption; Forward and Swap Measures; Hazard Rate; Intensity-based Credit Risk Models; Swap Market Models; Term Structure Models.

TIMUR S. MISIRPASHAEV

Credit Default Swaption

Credit default swap (CDS) options, also known as single name credit default swaptions, allow an investor to buy protection on a reference name by entering a CDS (see Credit Default Swaps) at a previously set CDS spread. CDS options may knock out if the reference entity defaults before the exercise date. Usually, the buyer of protection will also receive the default payment in that case. Such a cash flow simply corresponds to the default leg of a CDS with maturity equal to the exercise date. We will hereafter assume cancellation of the contract if the underlying name defaults before the exercise date. We refer to Credit Default Swap Index Options for the extensions to the portfolio case.

Denote the default time associated with the underlying name by τ. This is usually the date of a credit event; we refer to the ISDA master agreement for further details, given that this notion may vary through time and differ across geographical regions. For t ≤ s, we denote by B̃(t, s) the time t price of a defaultable discount bond, paying one at time s if τ > s and zero otherwise. Clearly, B̃(t, s) collapses to zero at the default of the underlying name if it occurs before s, since no payment will eventually be received by the discount bond holder. T_1, ..., T_N denote the payment dates on the premium leg of the underlying CDS. For simplicity, we will further neglect effects of accrued premiums. T is the exercise date of the European credit spread call option and p the strike. We can write the payoff at time T as (p_T − p)^+ Σ_{T_k>T} B̃(T, T_k), where p_T is the CDS par spread at time T. There is also usually a multiplicative adjustment to take into account the premium payment frequency, which is quarterly in most cases and which is not dealt with here for notational simplicity. For the same reason, we do not account for a possible up-front payment associated with the CDS, which is likely to be applied after the implementation of the “big bang” CDS protocol. That will result in some small adjustments to the payoff function and thus to the pricing formulas, which will be neglected hereafter. Clearly, p_T is not defined if the option has already cancelled out, that is, if τ < T, but the option payoff is equal to zero in that case.

Pricing Approaches

With some adaptations due to the cancellation feature of the CDS option, the pricing methodology parallels the well-known approaches in interest rate swaptions. One may consider a suitable distribution of the forward CDS spread under an appropriate risk-neutral measure. This readily leads to a Black-type pricing formula and is dealt with first. In another approach, one can rather model the “instantaneous” CDS spread, which is related to the “intensity” of the default time.

Black Formula for Credit Default Swap Options

As usual r denotes the default-free short rate and Q is the usual risk-neutral probability associated with the savings account. We consider G = (G_t), a filtration such that τ is a stopping time and r is an adapted process.

The idea of using “survival measures” was introduced in [9] and further developed in [4, 6, 7, 10], among others (see also Credit Default Swap Index Options). We will denote by Σ_{T_k>T} B̃(t, T_k) the “risky level” which corresponds to the time t price of a unitary premium leg associated with the forward CDS starting at T. We consider the probability measure Q̂ associated with the previous risky level numéraire (see also Change of Numeraire about change of numéraire techniques) defined by

\[
\frac{d\hat{Q}}{dQ} = \frac{\sum_{T_k > T} \tilde{B}(T, T_k)}{\sum_{T_k > T} \tilde{B}(0, T_k)} \exp\left( -\int_0^T r(u) \, du \right) \tag{1}
\]

Let us remark that dQ̂/dQ = 0 on the set {τ ≤ T}. Thus Q̂ is absolutely continuous with respect to Q but not equivalent to Q. Moreover, Q̂(τ > T) = 1, which leads to the terminology “survival measure”. We can then readily express the value of the credit spread option at t = 0 as

\[
\Bigl( \sum_{T_k > T} \tilde{B}(0, T_k) \Bigr) E^{\hat{Q}}\bigl[ (p_T - p)^+ \bigr] \tag{2}
\]

We can get around the issue of the CDS premium p_T not being defined after default by considering p_T 1_{τ>T}, which we assume to be a random variable
with respect to G_T. This does not change the computation in the previous equation since Q̂(τ > T) = 1. Let us first remark that for the forward CDS to be correctly priced, we must have E^Q̂[p_T 1_{τ>T}] = p_{0,T} where p_{0,T} denotes the forward CDS premium. In the case where p_T 1_{τ>T} is lognormal under Q̂, with volatility parameter σ, we readily get a Black formula for the price of the CDS option:

\[
\Bigl( \sum_{T_k > T} \tilde{B}(0, T_k) \Bigr) \times \bigl( p_{0,T} N(d_1) - p N(d_2) \bigr) \tag{3}
\]

where

\[
d_1 = \frac{\ln\left( \frac{p_{0,T}}{p} \right) + \frac{\sigma^2 T}{2}}{\sigma \sqrt{T}}, \qquad d_2 = d_1 - \sigma \sqrt{T}.
\]

Intensity Approaches

Another approach consists in specifying the intensity of the default time. This is the path followed in [2–4]. To circumvent the difficulty with default intensity dropping down to zero after default and the various mathematical issues related to enlargement of filtrations, the easiest way is to model the default time through a Cox process. We thus define the default time τ associated with the underlying name as

\[
\tau = \inf\left\{ t, \ \int_0^t \lambda(s) \, ds \ge -\ln U \right\} \tag{4}
\]

where λ is a positive process adapted to some filtration F = (F_t) and U is a standard uniform random variable independent of F. For simplicity, we will further assume that (F, Q) is a Brownian filtration. Following [1] or [8], we define as H = (H_t) the filtration generated by the counting process N_t = 1_{τ≤t} and we denote by G_t = F_t ∨ H_t the relevant information at time t, incorporating knowledge about occurrence of default prior to t and current and past values of financial variables such as interest rates or credit spreads of the reference entity (see Filtrations for mathematical details about filtrations in finance). Up to default time, λ(t) is the default intensity of τ (we refer to Point Processes regarding point processes and to Compensators about compensators and intensities). While the default intensity drops to zero after τ, we can remark that λ(t) is still well defined, thanks to the above Cox modeling framework. For instance, one can consider shifted Cox–Ingersoll–Ross (CIR) processes for the short rate r and the pseudodefault intensity λ (see Cox–Ingersoll–Ross (CIR) Model).

p_{t,T} will further denote the time t forward CDS premium. Though p_{t,T} has a financial meaning only on the set {τ > t}, its computation can be extended to the complete set of events in the previous Cox modeling framework (see [4] for further discussion). p_{t,T} solves the following equation where, once again, we do not take into account accrued premium or up-front payment effects:

\[
p_{t,T} \times \sum_{T_k > T} E\left[ \exp\left( -\int_t^{T_k} (r + \lambda)(u) \, du \right) \Big| \mathcal{F}_t \right] = \int_T^{T_N} E\left[ \exp\left( -\int_t^s (r + \lambda)(u) \, du \right) \times (1 - \delta) \lambda(s) \Big| \mathcal{F}_t \right] ds \tag{5}
\]

Prior to default, the left-hand term corresponds to the value at time t of the premium leg of the underlying forward default swap, while the right-hand term is associated with the default leg. Clearly p_{t,T} is F_t-measurable and we can prove that it is both a (F, Q̂) and a (G, Q̂) martingale. Thus, the forward default swap premium shares the properties of a “true” price. It can be checked that p_{T,T} = p_T.

Using an extended version of Girsanov theorem (see Equivalence of Probability Measures) for point processes (see Point Processes), it can be shown that

\[
\frac{dp_{t,T}}{p_{t,T}} = \sigma \, d\hat{W}_t \tag{6}
\]

where Ŵ is a (F, Q̂) Brownian motion.

Let us also assume that there exists some specification of r and λ such that the volatility σ is constant. Then, the forward CDS spread has lognormal dynamics under Q̂. This readily leads to the already stated Black formula for the price of the CDS option. The most obvious advantage is the simplicity of the outcome. The drawbacks are also rather obvious. The lognormal assumption for the forward spreads is questionable since jumps are often included in the dynamics of λ as in the affine specification within [5].
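
The Black formula (3) is straightforward to evaluate numerically. The following is a minimal Python sketch rather than a reference implementation: the function name `black_cds_option` and its argument names are illustrative assumptions, with `risky_level` standing for Σ_{T_k>T} B̃(0, T_k), `fwd_spread` for p_{0,T}, `strike` for p, `sigma` for σ, and `expiry` for T. As in the text, accrued premium, the payment frequency adjustment, and up-front payments are ignored.

```python
from math import log, sqrt
from statistics import NormalDist


def black_cds_option(risky_level, fwd_spread, strike, sigma, expiry):
    """Value of a knock-out payer CDS option following equation (3).

    All argument names are illustrative assumptions, not the article's API.
    """
    # Degenerate case: with no volatility the option is worth its intrinsic value.
    if sigma <= 0.0 or expiry <= 0.0:
        return risky_level * max(fwd_spread - strike, 0.0)
    total_vol = sigma * sqrt(expiry)
    d1 = (log(fwd_spread / strike) + 0.5 * total_vol ** 2) / total_vol
    d2 = d1 - total_vol
    cdf = NormalDist().cdf  # standard normal distribution function N(.)
    return risky_level * (fwd_spread * cdf(d1) - strike * cdf(d2))
```

For instance, `black_cds_option(4.0, 0.01, 0.01, 0.6, 1.0)` values a one-year at-the-money option on a forward spread of 100 bp with a risky level of 4 and a 60% spread volatility. Because the risky level only scales the result, the spread volatility is the single free parameter once the forward spread is known.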
The intensity approach is easy to understand and is consistent across strikes, maturity of the option, and maturity of the CDS. However, it entails dealing with extra parameters and is numerically more involved. In the more general setting involving correlation between r and λ, Monte Carlo simulation is usually required. In special cases, such as deterministic default-free rates, analytical formulas can be derived. Fortunately enough, in most examples, the correlation parameter has little impact on option prices and analytical approximations of the implied volatility in the Black formula can be derived. Let us remark that in these approximations σ depends on the exercise date and the maturity of the underlying CDS.

Acknowledgments

The author thanks A. Cousin, L. Cousot, A. Godet and C. Pedersen and the editors for helpful remarks. The usual disclaimer applies.

References

[1] Bielecki, T.R. & Rutkowski, M. (2002). Credit Risk: Modeling, Valuation and Hedging, Springer.
[2] Brigo, D. & Alfonsi, A. (2005). Credit default swap calibration and derivatives pricing with the SSRD stochastic intensity model, Finance and Stochastics 9(1), 29–42.
[3] Brigo, D. & Cousot, L. (2006). The stochastic intensity SSRD implied volatility patterns for credit default swap options and the impact of correlation, International Journal of Theoretical and Applied Finance 9(3), 315–339.
[4] Brigo, D. & Matteotti, C. (2005). Candidate Market Models and the Calibrated CIR++ Stochastic Intensity Model for Credit Default Swap Options and Callable Floaters. Working paper, Credit Models, Banca IMI.
[5] Duffie, D. & Gârleanu, N. (2001). Risk and valuation of collateralized debt obligations, Financial Analysts Journal 57(1), 41–59.
[6] Hull, J. & White, A. (2003). The valuation of credit default swap options, Journal of Derivatives 10(3), 40–50.
[7] Jamshidian, F. (2004). Valuation of credit default swaps and swaptions, Finance and Stochastics 8(3), 343–371.
[8] Jeanblanc, M. & Rutkowski, M. (2000). Modelling of default risk: an overview, in Mathematical Finance: Theory and Practice, J. Yong & R. Cont, eds, Higher Education Press, Beijing, pp. 171–269.
[9] Schönbucher, P.J. (2000). A Libor Market Model with Default Risk. Working paper, University of Bonn.
[10] Schönbucher, P.J. (2003). A Note on Survival Measures and the Pricing of Options on Credit Default Swaps. Working paper, ETH Zurich.

Related Articles

Change of Numeraire; Compensators; Cox–Ingersoll–Ross (CIR) Model; Credit Default Swap Index Options; Credit Default Swaps; Filtrations; Point Processes.

JEAN-PAUL LAURENT

Credit Default Swap (CDS) Indices

Credit markets have shown tremendous growth in the last 10 years. In particular, the telecom bubble and corporate scandals of the early 2000s increased the interest of market participants in products such as credit default swaps (CDS) (see Credit Default Swaps), which provide protection against credit events. In response to this demand for credit protection, credit indices were introduced in 2003, increasing the liquidity of CDS markets. These indices are, in essence, standardized baskets of CDS written on investment-grade and high-yield corporate issuers, or emerging-market governments. Table 1 shows the basic composition criteria of the main indices (more stringent criteria apply too, in particular, those concerning liquidity of the individual CDS). The specific constituents for each index are posted at www.markit.com.

In most indices, issuers are equally weighted. A new series of a given index is issued semiannually, excluding from the basket those issuers who no longer match selection criteria (e.g., downgraded issuers) and adding new ones. In case of a default event, the defaulting issuer is removed from the basket, but the weights remain and the index continues to trade. The reduced basket is referred to as a new version of the same series. The loss payment for a default event is determined through the same settlement auction as for single-name CDS (see Credit Default Swaps).

Credit indices are commonly issued with initial maturities of 3–10 years. Similar to CDS, a credit index is a contract which entails that the protection buyer pays a spread (or coupon) at a regular frequency (usually quarterly according to International Swaps and Derivatives Association (ISDA) dates) in return for default protection on some notional amount. In case of a default of one of the referenced issuers, the protection seller pays the non-recovered part of the protected notional times the weight of the issuer in the index. The contract does not terminate, but the protected notional is reduced accordingly. Importantly, the index trades with a fixed spread for each series; changes in market pricing are reflected in the upfront payment required to enter the contract. In contrast, in a standard CDS (for all but distressed credits), the spread is set on any given day such that no upfront payment is required (see end note a).

A standard market practice is to roll index positions so as to maintain a position in the on-the-run (i.e., most recent) series and version, in order to guarantee maximum liquidity. From an investor’s point of view, in addition to enabling credit diversification, credit indices introduce the possibility of leverage without significant liquidity concerns, as several derivatives on these indices exist today (see Collateralized Debt Obligations (CDO); Credit Default Swap Index Options).

Pricing Framework

Credit indices are routinely priced through the standard CDS model. Indeed, though the index contracts trade with a fixed spread, the convention is to quote a theoretical fair spread (i.e., the coupon that the index would need to pay in theory in order to require no upfront payment) and use the CDS model to convert this fair spread to an upfront payment for the index. The issuers in the basket are assumed to be homogeneous in credit quality and recovery rate. When deriving the common hazard credit curve for the issuers, the convention is to assume a flat curve (see Hazard Rate). The expected losses are computed from the credit curve, assuming that losses are paid at the end of a coupon period, and given a particular recovery rate. The present value for the index contract is then the difference between the discounted expected losses and the discounted spread payments weighted by the survival probability (since premium is only paid on the remaining protected notional).

The contract can alternatively be valued by using information on the individual constituents, thus relaxing the homogeneity assumption. We can theoretically replicate the index by considering a basket of individual CDS that pay the same spread. We compute the expected losses on the index by aggregating the individual-constituent expected losses, each of which is derived from the full-term structures of credit spreads for the constituent. Similarly, the payment side aggregates survival probabilities over all issuers. It is worth noting that the dependence structure between the issuers does not play a role here as the whole basket is considered.
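
The conversion from a quoted fair spread to an upfront payment under the homogeneous, flat-curve convention can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the market-standard implementation: the function names, the flat continuously compounded discount rate `rate`, the default recovery of 40%, and the omission of accrued premium and day-count details are all assumptions of this sketch. Losses are paid at the end of each coupon period and coupons are paid on the surviving notional, as described above.

```python
from math import exp, log


def index_legs(hazard, maturity, recovery=0.4, rate=0.03, freq=4):
    """Return (discounted expected loss, risky annuity) per unit notional."""
    dt = 1.0 / freq
    loss_pv, annuity = 0.0, 0.0
    for k in range(1, round(maturity * freq) + 1):
        t = k * dt
        df = exp(-rate * t)  # flat discount curve (assumption of this sketch)
        surv_start = exp(-hazard * (t - dt))
        surv_end = exp(-hazard * t)
        # Defaults during the period, with the loss paid at period end.
        loss_pv += (1.0 - recovery) * (surv_start - surv_end) * df
        # Coupon is paid only on the surviving (still protected) notional.
        annuity += dt * surv_end * df
    return loss_pv, annuity


def flat_hazard(fair_spread, recovery=0.4, freq=4):
    """Flat hazard rate that reproduces the quoted fair spread.

    This closed form holds only under the conventions of index_legs above.
    """
    dt = 1.0 / freq
    return log(1.0 + fair_spread * dt / (1.0 - recovery)) / dt


def index_upfront(fair_spread, coupon, maturity, recovery=0.4, rate=0.03, freq=4):
    """Upfront owed by the protection buyer: expected losses minus coupons."""
    hazard = flat_hazard(fair_spread, recovery, freq)
    loss_pv, annuity = index_legs(hazard, maturity, recovery, rate, freq)
    return loss_pv - coupon * annuity
```

By construction the upfront vanishes when the fixed coupon equals the quoted fair spread, and it equals the spread difference times the risky annuity otherwise. A production model would replace the flat discount rate with a full discount curve and add accrued premium.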
Table 1 Main credit indices

Index name            Number of constituents   Region          Credit quality
CDX.NA.IG             125                      North America   Investment grades
CDX.NA.IG.HVOL        30                       North America   Low-quality investment grades
CDX.NA.HY             100                      North America   Noninvestment grades
CDX.NA.HY.B                                    North America   B rated
iTraxx Europe         125                      Europe          Investment grades
iTraxx Europe HiVol   30                       Europe          Low-quality investment grades

Imperfect Replication

Pricing the index through the constituents is appealing, but it is not surprising to observe significant differences with the quoted index prices. The replicating strategy is not perfect, as the mechanics behind credit indices are slightly different from those for CDS. We mentioned earlier that an index trades with a floating upfront and a fixed spread, whereas most CDS trade with a floating spread and no upfront. This implies that we cannot, in general, enter into a basket of CDS contracts that pay the same spread as the index. And while the basket can be composed without initial capital, the index requires a nonzero upfront investment. After a default event, new differences between the credit index and the basket appear. On the index, the reduction in spread payments is independent of the defaulting issuer; the spread is fixed and only the protected notional changes. On the other hand, the spread reduction for the basket is proportional to the spread on the CDS for the specific defaulting issuer. The index and the basket consequently exhibit different behaviors through time, and offer different sensitivities to interest rates.

Fair Spread Decomposition

As contracts in their own right, credit indices are subject to specific demand and supply effects, and have their own distinct risk profile. A simple, standard way of analyzing their risk is to observe the quoted index fair spread. This approach, though, cannot distinguish the risk due to specific issuers from the risk due to demand for the index as a whole.

A useful decomposition is to break the index fair spread into three components: the average fair CDS spread across the constituent issuers, the nonlinear component, and the basis. The first two components constitute the theoretical fair spread of the index, as replicated through a basket of (market-traded) issuer CDS. The nonlinear portion of this fair spread accounts for the heterogeneity in credit quality among the issuers, and increases both with the level of the average fair spread and the dispersion of the individual fair spreads. The nonlinear component is very sensitive to an increase in default likelihood of a single issuer. The basis, defined as the difference between the observed fair spread and the theoretical fair spread, contains a risk premium rewarding the index dealer for the small portion of risk that cannot be perfectly hedged through the replicating basket, and embeds a liquidity premium as well.

End Notes

a. Note that changes to the conventional CDS protocol were instituted in early 2009. Among other things, the new protocol stipulates that single-name CDS trade with a fixed coupon of 100 or 500 bp, and settle via an upfront payment (see Credit Default Swaps for further discussion).

Further Reading

Couderc, F. (2006). Measuring risk on credit indices: on the use of the basis, Risk Metrics Journal Winter 2007, 61–87.
Zhang, H. (2005). Instant default, upfront concession and CDS index basis, Journal of Credit Risk 1(2), 79–89.

Related Articles

Basket Default Swaps; Collateralized Debt Obligations (CDO); Credit Default Swap Index Options; Credit Default Swaps.

FABIEN COUDERC & CHRISTOPHER C. FINGER

Basket Default Swaps

Basket default derivatives or swaps are more sophisticated credit derivatives that are linked to several underlying credits. The standard product is an insurance contract that offers protection against the event of the kth default on a basket of n, n ≥ k, underlying names. It is similar to a plain credit default swap (CDS) but the credit event to insure against is the event of the kth default, and it is not tied to a particular name in the basket. A premium, or spread, s is paid as an insurance fee until maturity or the event of kth default. We denote by s^{kth} the fair spread in a kth-to-default swap, that is, the spread making the value of this swap equal to zero at inception. For the basic product description, we refer, for example, to [2, 3, 12].

If the n underlying credits in the basket default swap are independent, the fair spread s^{1st} of a first-to-default swap (FtD) is expected to be close to the sum of the fair individual default swap spreads s_i over all underlying credits i = 1, ..., n. For exponential waiting times, this follows since the minimum of independent exponentially distributed waiting times has itself an exponential distribution with an intensity that equals the sum of the intensities of the individual waiting times. If, on the other hand, the underlying credits are in some sense “totally” dependent, the first default will be the one with the worst spread; therefore s^{1st} = max_i(s_i).

For the exact determination of the fair spread of basket default swaps, multivariate modeling of the default times of the credits in the basket is necessary. This dependency modeling can be classified into three different approaches, which are also used in collateralized debt obligation (CDO) modeling (see Collateralized Debt Obligations (CDO)):

• Copula approach (see Default Time Copulas; Gaussian Copula Model; Copulas: Estimation)
• Asset-value approach (see Structural Default Risk Models; also Merton, Robert C.)
• Reduced-form, spread-based approach (see Multiname Reduced Form Models; Hazard Rate; Intensity-based Credit Risk Models; Duffie–Singleton Model; Jarrow–Lando–Turnbull Model)

Modeling Approaches

Copula Approach

As contingent payments are only triggered in case of default, copula modeling focuses on the multivariate distribution of default times.

The copula approach was first applied in this context in [13, 14]. The challenge is to specify a function C such that, with given marginal distributions F_i, we have that

\[
\mathrm{Prob}\{\tau_1 \le t_1, \ldots, \tau_n \le t_n\} = F(t_1, \ldots, t_n) = C(F_1(t_1), \ldots, F_n(t_n)) \tag{1}
\]

Basically, the set of copula functions coincides with the set of all multivariate distribution functions whose marginal distributions are uniform distributions on [0, 1], since under certain regularity assumptions

\[
C(u_1, \ldots, u_n) = F(F_1^{-1}(u_1), \ldots, F_n^{-1}(u_n)) \tag{2}
\]

One of the most elementary copula functions is the normal copula (or Gauss copula), which is derived by this approach from the multivariate normal distribution (see Gaussian Copula Model). Clearly, there are various different copulas generating all kinds of dependencies, for example, in [3, 12]. The advantage of the normal copula, however, is that it relates to the one period version of certain asset-value models used in credit portfolio risk modeling. But note that since the asset-value approach can only model defaults up to a single time horizon T, the calibration between the two models can only be done for one fixed horizon. Dynamic extensions of this in the asset-value context are exit time models.

Asset-value Models

In asset-value models we are looking for stochastic processes (Y_t^i) called ability-to-pay processes and (nonstochastic) barriers K_i(t) such that the default time τ_i for credit i can be modeled as the first hitting time of the barrier K_i(t) by the process (Y_t^i):

\[
\tau_i = \inf\{t \ge 0 : Y_t^i \le K_i(t)\} \tag{3}
\]
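
Equation (3) can be illustrated with a simple time-discretized simulation. The Python sketch below is only an illustration under stated assumptions: the ability-to-pay process is taken to be an arithmetic Brownian motion with constant drift and volatility, the barrier is passed as a function of time, and the names `first_hitting_time` and `default_probability` are hypothetical. Because the path is monitored only on a grid, the simulated hitting time is biased upward relative to continuous monitoring.

```python
import random
from math import sqrt


def first_hitting_time(y0, drift, vol, barrier, horizon, steps=1000, rng=None):
    """Discretized version of equation (3) for a single credit.

    Returns the first grid time t with Y_t <= barrier(t), or None if the
    barrier is not hit before `horizon`.
    """
    if rng is None:
        rng = random.Random()
    if y0 <= barrier(0.0):
        return 0.0  # already at or below the barrier at t = 0
    dt = horizon / steps
    y = y0
    for k in range(1, steps + 1):
        # Euler step of an arithmetic Brownian motion with drift (assumption).
        y += drift * dt + vol * sqrt(dt) * rng.gauss(0.0, 1.0)
        t = k * dt
        if y <= barrier(t):
            return t
    return None


def default_probability(n_paths, seed=0, **model):
    """Monte Carlo estimate of the probability of default before the horizon."""
    rng = random.Random(seed)
    hits = sum(
        first_hitting_time(rng=rng, **model) is not None for _ in range(n_paths)
    )
    return hits / n_paths
```

A call such as `default_probability(10000, y0=1.0, drift=0.0, vol=0.5, barrier=lambda t: 0.0, horizon=5.0)` estimates a five-year default probability for a flat barrier. Dependence between several names would enter through correlated Gaussian increments, which this single-name sketch does not cover.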
The first successful models of this class are obtained when the Y^i are either Brownian motions with drift or time changed Brownian motions; see [9, 15], where some numerical calibration results are also shown. Exit times of more general stochastic processes, including stochastic volatility models, are applied to default modeling in [8].

Reduced-form Modeling

Here we start from the classical single-name CDS approach, where the default time is the first jump time of a doubly stochastic Poisson process (or Cox process); see Hazard Rate; Multiname Reduced Form Models and [5, 6, 11]. In this approach, it is assumed that conditional on a realization of a path of the default intensity, the default time is distributed like the time of the first jump of a time-inhomogeneous Poisson process with this intensity. Typically, the dynamics of resulting credit spreads are closely tied to the dynamics of the default intensity in this approach.

The main challenge here is the incorporation of default dependence. One either has to model common jumps in the spread processes or apply the copula approach exogenously to the default times given from the spread and hazard rates [4, 17]. Recently, an even more reduced approach was developed [1, 7, 18, 19] in which the accumulated losses (L_t)_{t≥0} are modeled directly as a stochastic process. The most general construction (as, e.g., in [7]) is to view L as an increasing càdlàg pure jump process with absolutely continuous compensator ν(dt, dx) = g(t, dx)dt; see, for example, [10] for the underlying stochastic analysis. This is particularly useful if one considers options on the spread s^{kth} of a basket swap. Here, the modeling effort is on L and the single-name modeling is not considered.

Pricing

In order to price basket default swaps, we need the distribution F_{(k:n)}(t) of the time τ^{kth} of the kth default. The kth default time is, in fact, the order statistic τ_{(k:n)}, k ≤ n, and, in general, we can derive the distribution of the kth order statistic from the multivariate distribution functions [3]. For pricing we also need the survival function:

\[
S_{(k:n)}(t) = 1 - F_{(k:n)}(t) \tag{4}
\]

The fair spread s^{kth} for maturity T_m is then given by

\[
s^{kth} \sum_{i=1}^{m} \Delta_i B(T_0, T_i) S_{(k:n)}(T_i) = \sum_{i=1}^{n} (1 - REC_i) \int_{T_0}^{T_m} B(T_0, u) \, F_{(k:n)}^{kth=i}(du) \tag{5}
\]

[Figure 1: plot against the correlation (x-axis, 0 to 1); y-axis labeled “(std,min,max)/mean”.]

Figure 1 kth-to-default spread versus correlation for a basket with three underlyings: (solid) s^{1st}, (dashed) s^{2nd}, (dashed-dotted) s^{3rd}
Basket Default Swaps 3

The first part is the present value of the spread payments, which stop at $\tau^{k\text{th}}$. The second part is the present value of the payment at the time of the kth default. Since the recovery rates might be different for the n underlying names, we have to sum over all names and weight with the probability that the kth default happens around u and that the kth defaulted name is just i (we assume that there are no joint defaults at exactly the same time). So $F^{k\text{th}=i}_{(k:n)}$ is the joint probability distribution of the kth order statistic of the default times and the event that the kth defaulted name is i. Figure 1 [3] shows the kth-to-default spreads for a basket of three underlyings with fair spreads $s_1 = 0.009$, $s_2 = 0.010$, and $s_3 = 0.011$, and pairwise equal normal copula correlation on the x-axis. In [16], it was already observed that the sum of the kth-to-default swap spreads is greater than the sum of the individual spreads, that is, $\sum_{k=1}^{n} s^{k\text{th}} > \sum_{i=1}^{n} s_i$. Both sides insure exactly the same risk, so this discrepancy is due to a windfall effect of the first-to-default swap. At the time of the first default, one stops paying the huge spread $s^{1\text{st}}$ on the one side, but on the plain-vanilla side one stops paying only the spread $s_i$ of the first defaulted obligor i.

References

[1] Bennani, N. (2005). The Forward Loss Model: A Dynamic Term Structure Approach for the Pricing of Portfolio Credit Derivatives. Working paper.
[2] Bluhm, C., Overbeck, L. & Wagner, C. (2002). An Introduction to Credit Risk Modeling, CRC Press/Chapman & Hall.
[3] Bluhm, C. & Overbeck, L. (2006). Structured Credit Portfolio Analysis, Baskets and CDOs, CRC Press/Chapman & Hall.
[4] Duffie, D. & Gârleanu, N. (2001). Risk and valuation of collateralized debt obligations, Financial Analysts Journal 57, 41–59.
[5] Duffie, D. & Singleton, K. (1998). Simulating Correlated Defaults. Working paper, Graduate School of Business, Stanford University.
[6] Duffie, D. & Singleton, K. (1999). Modeling term structures of defaultable bonds, Review of Financial Studies 12, 687–720.
[7] Filipovic, D., Overbeck, L. & Schmidt, T. (2008). Dynamic Term Structure of CDO-losses. Working paper.
[8] Fouque, J.P., Wignall, B.C. & Zhou, X. (2008). Modeling correlated defaults: first passage model under stochastic volatility, Journal of Computational Finance 11(3), 43–78.
[9] Hull, J. & White, A. (2001). Valuing credit default swaps II: modeling default correlations, The Journal of Derivatives Spring, 12–21.
[10] Jacod, J. & Shiryaev, A.N. (1987). Limit Theorems for Stochastic Processes, Springer.
[11] Jarrow, R.A., Lando, D. & Turnbull, S.M. (1997). A Markov model for the term structure of credit risk spreads, Review of Financial Studies 10, 481–523.
[12] Laurent, J. & Gregory, J. (2005). Basket default swaps, CDOs and factor copulas, Journal of Risk 7, 103–122.
[13] Li, D.X. (1999). The valuation of basket credit derivatives, CreditMetrics™ Monitor April, 34–50.
[14] Li, D.X. (2000). On default correlation: a copula function approach, Journal of Fixed Income 6, 43–54.
[15] Overbeck, L. & Schmidt, W. (2005). Modeling default dependence with threshold models, Journal of Derivatives 12(4), 10–19.
[16] Schmidt, W. & Ward, I. (2002). Pricing default baskets, Risk 15(1), 111–114.
[17] Schoenbucher, P. (2003). Credit Derivatives Pricing Models: Models, Pricing, Implementation, Wiley Finance.
[18] Schönbucher, P. (2005). Portfolio Losses and the Term Structure of Loss Transition Rates: A New Methodology for the Pricing of Portfolio Credit Derivatives. Working paper.
[19] Sidenius, J., Piterbarg, V. & Andersen, L. (2005). A New Framework for Dynamic Credit Portfolio Loss Modelling. Working paper.

Related Articles

Collateralized Debt Obligations (CDO); Copulas: Estimation; Copulas in Insurance; Credit Default Swaps; Credit Default Swap (CDS) Indices; Default Time Copulas; Duffie–Singleton Model; Gaussian Copula Model; Hazard Rate; Jarrow–Lando–Turnbull Model; Multiname Reduced Form Models; Intensity-based Credit Risk Models; Reduced Form Credit Risk Models; Structural Default Risk Models.

LUDGER OVERBECK
Collateralized Debt Obligations (CDO)

Collateralized debt obligations (CDOs) can be generically defined as structured products using tranching[a] and securitization technology to repackage and redistribute credit risks. Figure 1 symbolically depicts the mechanics of a CDO.

The first forms of CDOs appeared during the 1980s with the repackaging of high-yield bonds such as collateralized bond obligations (CBOs), following hot on the heels of the first collateralized mortgage obligations (CMOs) pioneered by First Boston in the United States in 1983. This technique was later extended to other asset classes such as bank loans (especially leveraged loans). With the advent of the credit derivatives market and the surge in credit default swap (CDS) trading at the beginning of the decade, CDOs became one of the fastest growing segments of the credit market (the so-called structured credit market) and a crucible of financial innovation. In the limited time frame of a few years (2001–2007), CDOs and structured credit arguably became the hottest areas in capital markets and among the greatest fee and trading income generators for investment banks, asset managers, and hedge funds, until the 2007 subprime crisis marked the (temporary?) end of the party.

This article first provides definitions and a typology of CDOs, based on their main characteristics. The second section deals with the main modeling techniques for CDOs. We then dwell upon the impact of the 2007 subprime crisis on the CDO business and look at the evolution of the market and structures in the aftermath of this watershed. Our concluding remarks deal with the future for CDOs in a post–credit crisis world.

Definitions and Typology of CDOs

CDOs cover a large variety of products and structures. The following parameters can be used to define the different types of CDOs.

The Nature of Collateral Assets

The common denominator of CDO transactions was, until 2002–2003, the application of securitization techniques to (credit) assets sourced in the financial markets, such as bonds (CBOs), or from financial institution balance sheets, such as bank loans (collateralized loan obligations (CLOs)). Theoretically, any asset generating recurrent cash flows can be securitized and therefore be used as a collateral to a CDO transaction. What distinguishes CDOs from securitization transactions (asset-backed securities (ABSs)),[b] which deal with extended pools of small credit exposures, is that CDO underlying assets can be construed as unitary credit risks and analyzed as such (each CDO underlying asset usually carries an individual credit rating).

In this decade, the range of instruments used as CDO collateral has considerably increased, including securitization issues (CDOs of ABS), other CDOs (CDOs of CDOs), trust-preferred securities (TRUPs), and going as far as hedge funds[c] or private equity participations.

In parallel, the rise of credit derivatives (CDSs) has led to the emergence of a new type of products, the synthetic CDOs.[d] Instead of “cash” securities, these instruments reference a pool of CDSs, which replicate the risk and cash-flow profile of a bond portfolio. The credit risk is transferred to the special-purpose vehicle (SPV) using CDS technology, which then issues securities backed by this “synthetic” portfolio. What makes synthetic CDOs attractive to structurers and managers is that they avoid the logistics and financial risk of buying in and warehousing securities while a CDO is being constructed and sold to investors. The use of CDSs as reference “assets” for CDOs opened the door to innovative structures and management techniques, which led part of the structured credit business away from traditional securitization and closer to exotic derivative trading as discussed later.

Risk Transfer Mechanism

One must first distinguish credit risk transfer from the collateral portfolio to SPV and, second, from the SPV to capital market investors.

Credit risk transfer from the collateral portfolio to the SPV may happen via the following:

Figure 1 Mechanics of a CDO (Bruyere et al. 2005). [Diagram: underlying assets/collateral (investment grade bonds; high yield, emerging market bonds; CDS; leveraged loans; MBS/ABS) pass by real (“cash”) or synthetic (“derivative”) asset transfer to the issuer (SPV), which issues tranches (AAA, AA, BBB, mezzanine, equity) in order of increasing tranche risk. Debt tranches are rated by independent rating agencies (excluding the equity tranche). An asset/collateral manager actively manages the underlying portfolio of assets. The collateralized debt obligation as a whole involves credit risk analysis, modeling, tranching, stress testing, and pricing.]

• “real” asset acquisition (true sale): “cash CDO” or
• credit derivative technology (or other, e.g., insurance): “synthetic CDO” or collateralized synthetic obligation (CSO).

Risk transfer from the SPV to capital market investors can take the following forms:

• SPV credit-linked note issuance: “funded CDO”;
• credit derivatives (CDSs) sold by the investor to the SPV: “unfunded CDO”; and
• a combination of the above-mentioned: “partially funded CDO”. Most whole capital structure CDOs fall into that category.

Objective of the Transaction

Most CDOs are structured for arbitrage purposes. Arbitrage CDOs are tailor-made investment products, using cash or synthetic technology, created for the benefit of capital market investors. In these transactions, collateral assets are usually sourced in the fixed-income cash or credit derivative markets.

However, a significant part of the CDO market was also driven with the purpose of bank balance sheet management. In such a transaction, the objective for the sponsor bank is to obtain regulatory or economic capital relief using CDO technology to transfer credit risk to investors. In these transactions, assets or credit risk exposures are typically sourced from the sponsor bank’s own balance sheet.

Static or Managed CDOs

“Static CDOs” are characterized by the fact that the composition of the reference portfolio does not change over the life of the transaction (but for substitutions in a limited number of cases).

At the opposite end of the spectrum, “managed CDOs” (see Managed CDO) allow for the dynamic management of the portfolio of collateral assets within a predetermined set of constraints. CDOs are usually managed by a third-party asset manager with credit management expertise. In a managed arbitrage CDO, the asset manager’s objective may be the following:

• to avoid default and ensure timely payment of interest and repayment of principal (“cash-flow CDO”) or
• to optimize the market value of the underlying collateral pool through active management (“market-value CDO”).

“Self-managed CDOs” enable investors themselves to manage the reference portfolio of the CDO they have underwritten.

The following section provides an analysis of the main CDO modeling techniques.

Analysis of CDO Modelling Techniques

Cash-flow CDOs

On the basis of securitization techniques, cash-flow CDOs usually aim at exploiting an arbitrage opportunity between the yield generated by a portfolio of credit assets and that required by investors on the securitized debt, the great majority of which (80–90%) is rated investment grade due to the various credit enhancement mechanisms:

• Tranching and waterfall
The creation of several layers of risk (“tranches”) and the sequential allocation of income generated by the collateral portfolio in order of tranche seniority.
• Subordination
Losses are absorbed by all tranches junior to a given tranche, thus providing a protection “cushion” (when the CDO is liquidated, the senior creditors have priority over the mezzanine investors, who have priority over the equity holders).
• Overcollateralization (O/C) and interest cover (I/C) tests
These act as CDO covenants, leading to the diversion of cash flows toward the early repayment of the most senior tranche if they are breached, thus strengthening the level of subordination.
• Diversification
Reference portfolios are diversified in terms of obligor geography and sector, thus limiting the risk of correlated defaults.

Risks and sources of performance in cash-flow CDOs include the following:

• Default risk
Underperformance of the underlying portfolio (defaults) leads to a decrease in the amount of assets (and therefore the amount of capital, the equivalent of a write-off in accounting terms) and in future income streams (since the coupon is no longer being paid on the asset in default) and therefore in the dividend amounts ultimately paid to the equity tranche investors.
• Portfolio management
Active trading by the CDO manager may generate losses (which have the same impact as a default) or gains (which are then paid out in dividends or incorporated into the CDO capital, thereby increasing the subordination level). Generally, the CDO manager is only able to modify the portfolio for a given period (5–7 years, the so-called reinvestment period). He/she must comply with a set of criteria (quality of the portfolio, sector diversification, maturity profile, maximum annual trading allowance, etc.) defined in accordance with the rating agencies.
• Ramp-up risk
When a cash CDO is launched, the underlying portfolio cannot be immediately constituted by the manager (essentially to avoid disturbing market liquidity). The portfolio is, therefore, built up over 3–6 months (the ramp-up period). During that time, asset prices may go up and the initial average coupon target for the portfolio might not be attained. In addition, the bank arranging the transaction carries the credit risk of the collateral during the ramp-up period (the so-called “warehousing” risk). To avoid taking too much risk on their balance sheets and having to allocate capital, banks have been using off-balance sheet vehicles (such as conduits and structured investment vehicles (SIVs)) to park the assets during the ramp-up period. However, as witnessed during the 2007 credit crisis, these defense structures backfired as liquidity dried up and banks were forced to reconsolidate the vehicles and the security warehouses on their balance sheets.
• Reinvestment risk
During the life of the transaction, the manager is regularly led to replace assets and therefore to reinvest part of the portfolio. Market conditions may change and the average coupon level might not be attained. To manage this risk, the manager and the other equity investors usually have an early termination option on the CDO.

Synthetic CDOs: Correlation Products

In the synthetic space, tranching and securitization techniques can also be applied to a portfolio of

CDSs (the so-called whole capital structure (WCS) synthetic CDOs).

However, a watershed appeared with the creation of single-tranche technology, fed by the rise in CDS trading liquidity and advances in credit-modeling expertise. For an investor, any CDO tranche can be considered as a “put spread”[e] on the losses of the reference portfolio (the attachment and the detachment points of the CDO tranche being equivalent to the two strike prices of this option combination). Thus, the pricing of a CDO tranche (x% to y%) can be deduced from the value of the portfolio (i.e., the losses from 0 to 100%) from which the value of the equity tranche (0% to x%) and that of the senior tranche (y% to 100%) are subtracted. These techniques led to the development of the exotic credit market, which trades on the basis of “correlation”, much as the equity derivative market trades on volatility.

In a bespoke single-tranche CDO, the arranger usually retains the unsold tranches on its books and dynamically manages their credit risk by selling a fragment (delta) of the notional amount determined for each reference credit entity in the portfolio. This delta must then be readjusted dynamically depending on the changes in the credit spreads. The objective of delta hedging is to neutralize the price variations in the tranche that are linked to changes in the spread of the entities in the underlying portfolio. The delta of a tranche depends upon its seniority and residual maturity. Since deltas are determined using marginal spread variations, a significant change in spreads will lead to a profit or loss depending upon the convexity of the tranche price (“gamma” in option language). Synthetic CDO arrangers, therefore, not only manage first-order risk levels but also need to monitor their convexity positions.

These hedging mechanisms, however, are not perfect, since they do not deal with second-order risks:

• Recovery rate in the event of default
This parameter cannot be inferred from market data. Thus, it is necessary for the dealers to set aside appropriate levels of reserves[f] to cover this risk.
• P&L in the event of default
Tranche convexity properties are magnified in the event of default and bank positions must be managed accordingly.
• Correlation (“rho”)
The pricing and risk management of CDOs are based on correlation rate assumptions. Correlation is determined on the basis of a smile (or skew), which depends mainly on the subordination of the tranche considered. Different correlation rates can thus be given to the attachment and detachment points x and y. This approach by “correlation pairs” is commonly referred to as base correlation.
• With the rise of the credit index market,[g] CDO arrangers have benefited from new methods for managing their correlation books. Standard tranches are now traded on the main indices in the interbank market, thus providing a benchmark level for the correlation parameters. Until the 2007 credit crisis, liquidity had significantly increased in the CDO tranche market, enabling arrangers to rebalance their books with credit hedge funds and other sophisticated investors.

The Impact of the Subprime Crisis on the Evolution of the CDO Market

With the 2007 subprime and credit crisis, CDOs have come to epitomize the evil of financial innovation. The credit-risk-dispatching mechanism implicit in CDO structures has been broadly accused of fostering the wide spread of poorly understood risks among mainstream capital market investors lured by attractive yields in a low interest-rate environment. To what extent does that charge stand?

ABS CDOs and Subprime Crisis: How Did It Happen?

A key driver for the subprime residential mortgage-backed securities (RMBS) market was the strong demand for subordinated bonds (aka mezzanine tranches, in particular, BBB and BB) from ABS CDO managers.

The reason these bonds were so attractive is that the rating agencies assumed that a portfolio of subordinate mezzanine bonds from various ABS issues would not be highly correlated (much as they assume that corporate bonds from various industries are not highly correlated). Because of this low-correlation assumption, pooling subprime mezzanine bonds into

a CDO structure enabled the CDO manager to create, in essence, new AAA-rated CDO bonds, using only BBB subprime RMBS.

The assumed diversification benefit drove the capital structure of the CDO and explains a large part of the enormous “misrating” of subprime CDO risk by rating agencies (let alone the rating of the underlying subprime RMBS risk itself).

ABS CDO: A Key Driver of the “Subprime” Demand. The demand from ABS CDOs allowed RMBS originators to lay off a significant portion of the risk. We estimate that $70 billion of mezzanine subprime RMBS were issued in 2005–2007 versus $200 billion of mezzanine ABS CDOs over the same period. Such a notional amount of mezzanine ABS CDOs roughly represents an implied capacity of $90 billion for mezzanine subprime RMBS investments (over the vintages 2005–2007).[h]

This excess demand was filled by synthetic risk (CDS) buckets. The creation of the ABS CDS market multiplied credit risk in the system, allowing for the creation of far more CDOs than the available cash “CDOable” assets. For example, one tranche of a subprime RMBS securitization (nominal $15.2 million) was referenced in at least 31 mezzanine ABS CDOs (total notional of $240.5 million).

High-grade ABS CDOs also need to be taken into account. Although the “subprime” demand from these CDOs (roughly $85 billion) was lower than the nominal of high-grade subprime actually issued ($230 billion), they fueled the issuance of mezzanine ABS CDOs through the feature of the “inner CDO bucket”. Such a bucket typically had an average size of 20%, allowing CDO arrangers to channel a significant portion of ABS CDO risk. Such “resecuritization” was also facilitated by the existence of CDS on CDOs, further multiplying the credit risk in the system: one tranche of a mezzanine ABS CDO ($7.5 million nominal) was referenced in at least 17 high-grade ABS CDOs ($154 million total notional).

At first sight, it would, therefore, be fair to conclude that, since 2005, ABS CDOs have globally absorbed almost every cash-subordinated bond created in the subprime world (and have sold significant protection in synthetic form as well), while traditional cash buyers were largely absent. However, does this mean that the credit risk was effectively transferred to “mainstream” capital market investors?

Anatomy of the ABS CDO Market: Where Did It All Go? About $430 billion of ABS CDOs were issued between 2005 and 2007. However, the amount of risk transferred outside the banking system was actually limited because of the following factors:

• investment banks retaining a significant part of super-senior risk, either directly ($85 billion for the most affected: Citigroup, UBS, Merrill Lynch, Morgan Stanley) or indirectly (by taking on counterparty risk on monoline insurers; $120 billion notional amounts);
• resecuritization effect through the CDO bucket ($40 billion notional);
• off-balance sheet vehicles, for which banks retained all potential losses (conduits) or part of the losses (SIV, ∼$15 billion of ABS CDO investments); and
• “quasi-”off-balance sheet vehicles, such as money market funds that were subsequently supported by bank capital.

Outside the main banking sector, the most notable “CDO” casualties were either sophisticated insurers (such as AIG) or medium-sized banks (IKB, SachsenLB, and other German Landesbanken).

As a result, it appears that CDOs were primarily a repackaging tool. The main roots of the “subprime” demand stem from abusive off-balance sheet structures (SIVs, conduits) and regulatory capital arbitrages (negative basis trades, long/short badly captured by Value-at-Risk (VaR) models, etc.), both of which resulted in maintaining most of the risk within the banking system while “masking” its true price/value.

One could argue that there was no “real” CDO market for RMBS where rational investors could have sent earlier warning signals (by reducing demand, refusing incestuous features such as CDO buckets within ABS CDOs) and acted as stabilization agents (long-term demand, different investor base than in the underlying RMBS market).

In addition, the derivative market did not perform up to its objectives, as it was created too late (the ABX index, which effectively introduced a greater price transparency) and actually magnified the effects of the mispricing/misrating of RMBS risk.

In conclusion, if the ABS CDO market effectively drove the demand for mezzanine subprime RMBS, its impact on mainstream investors has been limited. In

that respect, it is worth noting that the vast majority of RMBS risk (approximately 82 cents on the dollar) ended up being rated AAA and acquired not by CDOs but by institutions taking advantage of very cheap funding.

How did Other CDO Markets Fare?

Leveraged Loan CLOs. CLOs have suffered from pressure on both the asset and the liabilities sides. Prices of leveraged loans fell in line with the overall credit market, due to technical factors (significant loan overhang resulting from warehouses at the major investment banks) and fundamental fears (increase in default rates, weakly structured leveraged buyout (LBO) deals). On the liability side, we estimate that negative basis buyers represented 50% of the AAA CLO buyer base, while banks and SIVs/CDOs accounted for 25% and 15%, respectively. The CLO market suffered from the disappearance of such “cheap funding”.

Even though we witnessed an LBO “bubble” (private equity houses taking advantage of the strong CLO bid), the impact of the burst has not been as significant as for the ABS CDO market:

• CLOs were not the sole buyer of leveraged loans.
• They did not suffer from misrating.
• New AAA CLO buyers stepped in (Asian institutions, unaffected banks, insurance companies).

Most of the CLO deals issued in 2008 have been balance sheet driven (cleaning up of warehouses), with simple two-tier structures (AAA and equity), where the AAA tranche (or the equity) is retained by the originating bank.

As the full capital structure execution is challenging and as the sourcing of cash assets is difficult (illiquidity, no warehouse providers for ramp up), the development of single-tranche synthetic CLOs, supported by the growth of the Loan CDS market (ISDA documentation, launch of LCDX and LevX indices), is a key feature of the forthcoming years.

Corporate Synthetic CDOs. With the huge growth in synthetic CDOs, what is commonly referred to as the structured bid became a dominant driver of credit spreads. While a combination of mark-to-market losses, rating downgrade risk, and headline risk could have caused investors to unwind positions in synthetic CDOs, this market segment actually held up well, in line with the underlying asset quality (corporate earnings) and further supported by the liquidity provided by banks (correlation desks).

Even though the market avoided the “great unwind”, the buying base for these products has essentially gone away, and while some prop desks and hedge funds are still active, the institutional money that provided the liquidity backbone has vanished.

Conclusion: Where Next for CDOs?

The postcrisis CDO market will probably be characterized by a convergence trend toward the mechanics of the corporate synthetic market, which has proved more efficient and resilient for the distribution of credit risk:

• the development of index and index tranches (transparent and traded correlation) fueling liquidity;
• less reliance on rating agencies and more in-house due diligence on assets; and
• a return to balance-sheet-driven transactions.

The main challenges for the CDO market include the following:

• restoring investor confidence in the benefit of structured products by providing better transparency and liquidity;
• addressing the AAA funding issue (now that SIVs and conduits have been dissolved); and
• overcoming the discrepancies in accounting treatment.[i]

Once the dust has settled, we expect securitization and CDO transactions to come back on the basis of more transparent and rational fundamentals.

End Notes

a. Tranching is the operation by which the cash flows from a portfolio of assets are allocated by order of priority to create various layers (“tranches”), from the least risky (“senior” tranche) to the most risky (“first loss” or “equity” tranche). Tranching technology is usually performed using rating agency guidelines in order to ensure that the senior tranche attracts the most favorable rating (triple-A).

b. Asset-backed securities are securities representing a securitization issue. The ABS market covers mortgage-backed securities (residential and commercial), consumer (credit card, student loans, auto loans), and commercial loans (trade receivables, leases, small business loans, etc.).
c. Collateralized fund obligations.
d. “Synthetic” in as far as the mechanism for transferring risk is synthetic, using a derivative.
e. Combination of two put options on the same underlying asset, at two different strike prices.
f. Usually in the form of bid–ask spreads.
g. iTraxx for the European market and CDX.NA for the US market.
h. On the basis of the following assumptions: 50% of the portfolio allocated to subprime, of which 60% to the precedent vintage.
i. While a cash CDO (or any cash bond) can be accounted for as “available for sale” by banks and insurers (meaning that its price volatility will directly impact the equity base of the investor), the valuation of an equivalent synthetic product impacts the income (P&L) of the investor.

Reference

[1] Bruyere, R., Cont, R., Copinot, R., Jaeck, Ch., Fery, L. & Spitz, T. (2005). Credit Derivatives and Structured Credit: A Guide for Investors, Wiley.

Related Articles

Base Correlation; Basket Default Swaps; CDO Square; CDO Tranches: Impact on Economic Capital; Collateralized Debt Obligation (CDO) Options; Credit Default Swaps; Default Barrier Models; Forward-starting CDO Tranche; Managed CDO; Multiname Reduced Form Models; Nested Simulation; Random Factor Loading Model (for Portfolio Credit); Reduced Form Credit Risk Models; Special-purpose Vehicle (SPV); Total Return Swap.

RICHARD BRUYERE & CHRISTOPHE JAECK
Forward-starting CDO Tranche

At the core of any CDO pricing model is a mechanism for generating dependent defaults. If a simple factor structure is used to join their marginal distributions, the default times of the underlying credits are independent conditionally on the realization of the common factor(s). This conditional independence of defaults is very useful because it allows one to use quasi-analytical algorithms to compute the term structure of expected tranche losses, which is the fundamental ingredient for the valuation of a synthetic CDO.

Because of their analytical tractability, conditionally independent models have become a standard in the synthetic CDO market. In the next section, we review the one-factor Gaussian-copula model, which has played a dominant role since the early days of single-tranche trading.

The Gaussian-copula Model

In the one-factor Gaussian-copula framework, the dependence of the default times is Gaussian, and is therefore completely specified by their correlations. In this model, given a particular realization of a normally distributed common factor $Y$, the probability that the jth credit defaults by time t is equal to

$$\pi_{j,t}(Y) = N\!\left(\frac{D_{j,t} - \beta_j \cdot Y}{\sqrt{1 - \beta_j^2}}\right), \qquad j = 1, 2, \ldots, M \tag{1}$$

where $N(\cdot)$ denotes the standard Gaussian distribution function, the vector $\{\beta_j\}$ determines the correlations of the default times, $\{D_{j,t}\}$ are free parameters chosen to satisfy

$$p_{j,t} = \int_Y \pi_{j,t}(Y)\, dN(Y) \tag{2}$$

and $p_{j,t}$ are the (unconditional) probabilities that name j defaults by time t. Importantly, for the CDO model to price the underlying credit default swap (CDS) correctly, $p_{j,t}$ must be backed out from the term structure of observable CDS spreads.

Given a realization of the Gaussian factor $Y$, the M individual credits are independent, and a simple recursive procedure [2] can then be employed to recover the conditional loss distribution of the underlying portfolio, as well as the loss distribution of any particular tranche of interest. Once we know how to compute the loss distribution of a tranche for a given realization of the common factor, it is straightforward to take a probability-weighted average across all possible realizations of $Y$ and thus recover the unconditional loss distribution of the tranche.

Repeating this procedure for a grid of horizon dates and interpreting the expected percentage loss up to time t as a “cumulative default probability”, we can price the tranche using exactly the same analytics that we would use for pricing a CDS. More precisely, we can define the “tranche curve” as the term structure of expected surviving percentage notionals of the tranche, that is,

$$Q(t) = 1 - E\!\left[\frac{[L_t - U]^+ - [L_t - (U + V)]^+}{V}\right] \tag{3}$$

where $L_t$ is the number of loss units experienced by the reference portfolio by time t, $U$ is the number of loss units that the tranche can withstand (attachment), and $V$ is the number of loss units protected by the tranche investor. Then the two legs of the swap can be priced using

$$\text{Premium} = cN \sum_{i=1}^{T} \Delta_i\, Q(t_i)\, B(t_i) \tag{4}$$

$$\text{Protection} = N \sum_{i=1}^{T} B(t_i)\,\bigl(Q(t_{i-1}) - Q(t_i)\bigr) \tag{5}$$

where $c$ is the annual coupon paid on the tranche, $N$ is the notional of the tranche, $t_i$, $i = 1, 2, \ldots, T$ are the coupon dates, $\Delta_i$, $i = 1, 2, \ldots, T$ are accrual factors, and $B(t)$ is the risk-free discount factor for time t. Notice that, for ease of notation, we have used the coupon dates $t_i$, $i = 1, 2, \ldots, T$ to discretize the timeline for the valuation of the protection leg.

Pricing of Reset Tranches

Let us define a reset tranche as a path-dependent tranche whose attachment and/or width are reset at
a predetermined time (the reset date) as deterministic functions of the random amount of losses incurred by the reference portfolio up to that time. Notice that forward-starting tranches and tranches whose attachment point resets at a future date both belong to this class.

Pricing a Reset Tranche

Let $t_s$ denote the reset date, $\lambda_j$, $j = 1, 2, \ldots, M$, the number of loss units produced by the default of the $j$th name, $\lambda = \sum_{j} \lambda_j$ the maximum number of loss units that the portfolio can suffer, $p(\omega)$ the probability today that the reference portfolio incurs exactly $\omega$ loss units by the reset date $t_s$.
A reset tranche can be defined by the vector $\{t_T, t_s, U, V, U(\omega), V(\omega)\}$ where $U(\omega) \ge \omega$ is the attachment point of the tranche (in loss units) after the reset date, and $V(\omega)$ is the number of loss units protected by the tranche investor after the reset date. We can price the two legs of this swap as follows:

$$\text{Premium} = cN \sum_{\omega=0}^{\lambda} p(\omega) \sum_{i=1}^{T} \Delta_i\, Q(t_i; \omega)\, B(t_i) \tag{6}$$

$$\text{Protection} = N \sum_{\omega=0}^{\lambda} p(\omega) \sum_{i=1}^{T} B(t_i)\,\bigl(Q(t_{i-1}; \omega) - Q(t_i; \omega)\bigr) \tag{7}$$

where we have defined the conditional tranche curve $Q(t; \omega)$, $t_0 \le t \le t_T$, as

$$
\begin{aligned}
Q(t;\omega) &= T(t,\omega)\cdot\left(1 - E\left[\left.\frac{[L_t - U(t;\omega)]^+ - [L_t - (U(t;\omega) + V(t;\omega))]^+}{V(t;\omega)}\,\right|\, L_{t_s} = \omega\right]\right),\\
T(t,\omega) &= 1 - 1_{\{t > t_s\}}\,\frac{[\omega - U]^+ - [\omega - (U + V)]^+}{V},\\
U(t;\omega) &= \begin{cases} U, & t \le t_s\\ U(\omega), & t > t_s \end{cases},\\
V(t;\omega) &= \begin{cases} V, & t \le t_s\\ V(\omega), & t > t_s \end{cases}
\end{aligned}
\tag{8}
$$

In words, the conditional tranche curve $Q(t; \omega)$ represents the (risk-neutral) expected percentage surviving notional of the tranche at time $t$, conditional on the event that the reference portfolio experiences a cumulative loss of $\omega$ units up to the reset date.
Equally, we can write down the valuation in terms of the unconditional tranche curve

$$Q(t) = \sum_{\omega=0}^{\lambda} p(\omega) \cdot Q(t; \omega) \tag{9}$$

and thus obtain the familiar equations

$$\text{Premium} = cN \sum_{i=1}^{T} \Delta_i\, Q(t_i)\, B(t_i) \tag{10}$$

$$\text{Protection} = N \sum_{i=1}^{T} B(t_i)\,\bigl(Q(t_{i-1}) - Q(t_i)\bigr) \tag{11}$$

However, while the unconditional tranche curve for $t_0 \le t \le t_s$ reduces to the standard tranche curve defined in the section The Gaussian-copula Model,

$$
\begin{aligned}
Q(t) &= \sum_{\omega=0}^{\lambda} p(\omega) \cdot Q(t; \omega)
= 1 - \sum_{\omega=0}^{\lambda} p(\omega) \times E\left[\left.\frac{[L_t - U]^+ - [L_t - (U + V)]^+}{V}\,\right|\, L_{t_s} = \omega\right]\\
&= 1 - E\left[\frac{[L_t - U]^+ - [L_t - (U + V)]^+}{V}\right]
\end{aligned}
\tag{12}
$$
the unconditional tranche curve for $t_s < t \le t_T$

$$
Q(t) = \sum_{\omega=0}^{\lambda} p(\omega) \cdot Q(t; \omega)
= \sum_{\omega=0}^{\lambda} p(\omega) \cdot T(t,\omega) \cdot \left(1 - E\left[\left.\frac{[L_t - U(\omega)]^+ - [L_t - (U(\omega) + V(\omega))]^+}{V(\omega)}\,\right|\, L_{t_s} = \omega\right]\right)
\tag{13}
$$

incorporates the added complexity of the path-dependent valuation.

Deriving the Conditional Tranche Curve

Our discussion so far leaves open the problem of constructing the conditional tranche curve. From the previous discussion, it should be clear that to achieve this goal we need to be able to compute conditional expectations of the form $E\left[f(L_{t_u}, \omega) \mid L_{t_s} = \omega\right]$ for some function $f$. In this section, we present a two-dimensional recursive algorithm for computing the joint distribution of cumulative losses at two different horizons, which in turn allows us to compute the conditional expectations that we need. The methodology is conceptually similar to the one introduced by Baheti et al. [3] for pricing “squared” products.
As anticipated, we assume that the underlying default model exhibits the property of conditional independence. We exploit this by conditioning our procedure on a particular realization of a common factor $Y$. We first discretize losses in the event of default by associating each credit with the number of loss units that its default would produce: we indicate by $\lambda_j$ the integer number of loss units that would result from the default of name $j$. Next, we construct a square matrix $\{Z_{v_1,v_2}\}$ whose sides consist of all possible loss levels for the reference portfolio, that is, $(0, 1, \ldots, \lambda)$. In this matrix, we store the joint probabilities that the reference portfolio incurs $v_1$ loss units up to time $t_s$ and $v_2$ loss units up to time $t_u$, with $t_u \ge t_s$. By definition of cumulative loss, the matrix must be upper triangular, that is,

$$Z_{v_1,v_2} = 0 \quad \text{if } v_2 < v_1 \tag{14}$$

For the nontrivial elements where $v_2 \ge v_1$, we set up the following recursion. We first initiate each state (recursion step $j = 0$) by setting

$$
Z^{0}_{v_1,v_2} = \begin{cases} 1, & \text{if } v_1 = 0 \text{ and } v_2 = 0\\ 0, & \text{otherwise} \end{cases}
\tag{15}
$$

We preserve the notation adopted during our description of the Gaussian-copula model and denote by $\pi_{j,t}(Y)$ the probability that name $j$ defaults by time $t$, conditional on the market factor taking value $Y$. Now we feed one credit at a time into the recursion and update each element according to the following:
If $v_1 \ge \lambda_j$, then

$$Z^{j}_{v_1,v_2} = (1 - \pi_{j,u}(Y)) \cdot Z^{j-1}_{v_1,v_2} + \pi_{j,s}(Y) \cdot Z^{j-1}_{(v_1-\lambda_j),(v_2-\lambda_j)} + (\pi_{j,u}(Y) - \pi_{j,s}(Y)) \cdot Z^{j-1}_{(v_1),(v_2-\lambda_j)} \tag{16}$$

If $v_2 < \lambda_j$, then

$$Z^{j}_{v_1,v_2} = (1 - \pi_{j,u}(Y)) \cdot Z^{j-1}_{v_1,v_2} \tag{17}$$

If $v_1 < \lambda_j \le v_2$, then

$$Z^{j}_{v_1,v_2} = (1 - \pi_{j,u}(Y)) \cdot Z^{j-1}_{v_1,v_2} + (\pi_{j,u}(Y) - \pi_{j,s}(Y)) \cdot Z^{j-1}_{(v_1),(v_2-\lambda_j)} \tag{18}$$

After including all the issuers, we set

$$\{Z_{v_1,v_2}\} = \{Z^{M}_{v_1,v_2}\} \tag{19}$$

The matrix $\{Z_{v_1,v_2}\}$ now holds the joint loss distribution of the reference portfolio at the two horizon dates $t_s$ and $t_u$, conditional on the realization of the market factor $Y$, and we can numerically integrate over the common factor to recover the unconditional joint loss distribution. Using the joint distribution of losses at different horizons, it is then straightforward, for any function $f(\cdot)$, to compute conditional expectations of the form $E\left[f(L_{t_u}, \omega) \mid L_{t_s} = \omega\right]$, which is how we construct the conditional tranche curve.

Comments

We have presented a simple methodology for quasi-analytically pricing a class of default-path-dependent
tranches. The proposed methodology is general in the sense that it can be easily applied to any model with conditionally independent defaults, including “implied copula” models fitted to liquidly traded tranches as in the Hull–White [4] model. The algorithm is useful because fast pricing of reset tranches allows one to obtain a variety of Greeks that are essential for effective risk management.
As observed by Andersen [1], however, some caution is necessary when pricing instruments whose valuation is sensitive to the joint distribution of cumulative losses at different horizons. Liquidly traded tranches only contain information about marginal loss distributions and tell us nothing about their dependence. Implying a default time copula from these prices, therefore, implicitly contains an arbitrary assumption about intertemporal dependencies, and it is easy to verify that different implied copulae that fit observable prices equally well may produce significantly different valuations for path-dependent instruments.

References

[1] Andersen, L. (2006). Portfolio losses in factor models: term structures and intertemporal loss dependence, Journal of Credit Risk 4, 71–78.
[2] Andersen, L., Sidenius, J. & Basu, S. (2003). All your hedges in one basket, Risk November, 67–72.
[3] Baheti, P., Mashal, R., Naldi, M. & Schloegl, L. (2005). Squaring factor copula models, Risk June, 73–76.
[4] Hull, J. & White, A. (2006). The Perfect Copula. Working Paper, University of Toronto.

PRASUN BAHETI, ROY MASHAL & MARCO NALDI
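
The two-dimensional recursion of the section Deriving the Conditional Tranche Curve can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the common factor is integrated out with Gauss–Hermite quadrature (the article only says "numerically integrate"), and the thresholds are set to $D_{j,t} = N^{-1}(p_{j,t})$, which is the choice that satisfies equation (2). The three cases (16)–(18) are applied at once through array slicing: a shifted term is simply absent wherever the shifted index would be negative.

```python
from statistics import NormalDist
import numpy as np

STD_NORMAL = NormalDist()

def conditional_default_prob(p, beta, y):
    """pi_{j,t}(Y) from equation (1), with D_{j,t} = N^{-1}(p_{j,t})."""
    d = STD_NORMAL.inv_cdf(p)
    return STD_NORMAL.cdf((d - beta * y) / (1.0 - beta * beta) ** 0.5)

def joint_loss_given_factor(pi_s, pi_u, units):
    """Z[v1, v2] = P(L_ts = v1, L_tu = v2 | Y), built one credit at a time."""
    lam = sum(units)
    z = np.zeros((lam + 1, lam + 1))
    z[0, 0] = 1.0  # recursion step j = 0: no losses at either horizon
    for ps, pu, k in zip(pi_s, pi_u, units):
        new = (1.0 - pu) * z                                  # name survives to t_u
        new[k:, k:] += ps * z[: lam + 1 - k, : lam + 1 - k]    # default by t_s
        new[:, k:] += (pu - ps) * z[:, : lam + 1 - k]          # default in (t_s, t_u]
        z = new
    return z

def joint_loss(p_s, p_u, betas, units, n_nodes=40):
    """Integrate the conditional joint distribution over the common factor Y."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    weights = weights / weights.sum()  # rescale to standard normal probabilities
    lam = sum(units)
    total = np.zeros((lam + 1, lam + 1))
    for y, w in zip(nodes, weights):
        pi_s = [conditional_default_prob(p, b, y) for p, b in zip(p_s, betas)]
        pi_u = [conditional_default_prob(p, b, y) for p, b in zip(p_u, betas)]
        total += w * joint_loss_given_factor(pi_s, pi_u, units)
    return total

def tranche_loss_fraction(loss, attach, width):
    """([L - U]^+ - [L - (U + V)]^+) / V."""
    return (np.maximum(loss - attach, 0.0)
            - np.maximum(loss - attach - width, 0.0)) / width

def conditional_tranche_curve(joint, attach, width, attach_reset, width_reset):
    """p(omega) and Q(t_u; omega) of equation (8) for a single date t_u > t_s."""
    levels = np.arange(joint.shape[0], dtype=float)
    p_omega = joint.sum(axis=1)
    q = np.full(levels.shape, np.nan)  # NaN where L_ts = omega has zero probability
    for omega in range(joint.shape[0]):
        if p_omega[omega] <= 0.0:
            continue
        cond = joint[omega] / p_omega[omega]  # law of L_tu given L_ts = omega
        u, v = attach_reset(omega), width_reset(omega)
        surviving_at_reset = 1.0 - tranche_loss_fraction(float(omega), attach, width)
        q[omega] = surviving_at_reset * (1.0 - cond @ tranche_loss_fraction(levels, u, v))
    return p_omega, q

# Illustrative inputs only: five names, loss units and probabilities made up.
units = [1, 1, 2, 1, 2]
betas = [0.5] * 5
p_s = [0.02, 0.03, 0.01, 0.04, 0.02]   # default probabilities to t_s
p_u = [0.05, 0.06, 0.03, 0.08, 0.05]   # default probabilities to t_u
joint = joint_loss(p_s, p_u, betas, units)
p_omega, q = conditional_tranche_curve(
    joint, attach=1, width=2,
    attach_reset=lambda omega: omega + 1,  # hypothetical reset rule U(omega)
    width_reset=lambda omega: 2,           # hypothetical reset rule V(omega)
)
q_unconditional = np.nansum(p_omega * q)   # equation (9) at t = t_u
```

Repeating the last step for every coupon date after the reset date gives the conditional tranche curve that enters equations (6) and (7).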
CDO Square

A CDO-of-CDO, or CDO square (CDO²), is a type of collateralized debt obligation (CDO) that has CDO tranches as reference assets. The CDO² market is a natural extension of the CDO market. The concept of CDO² was pioneered by the ZAIS group, when they launched ZING I in 1999, focusing on Euro CDO assets [2]. Recognizable growth of the CDO² market, particularly in the United States, was fueled by the excessive volume growth of the CDO market in the new millennium. In 2004, the situation of tightening credit spreads and a stable credit outlook shifted investor and dealer interest toward more structured credit basket products in the search for yield and tailored risk-return profiles. The repackaging of CDO tranches via the CDO² technology allows the dealer to manage the risk and capital cost of (residual) trading book positions. As in the case of CDOs, the dealer can exploit the rating alchemy, that is, the difference between traded and historical default probabilities as well as default correlations, to generate positive carry strategies. The investor will benefit from a more diversified reference portfolio, generally higher yield than similarly rated corporate debt, and the double-layer subordination effect.
We distinguish three main CDO² transaction types: cash CDO², synthetic CDO², and hybrid CDO². All transaction types can either refer to a static portfolio of CDOs or can be combined with an active management of the reference CDO portfolio. In a cash or cash-flow CDO², the reference assets are existing cash CDOs, which typically provide the funds to pay CDO² investors. In a synthetic CDO², the credit risk is generated synthetically, for example, via unfunded CDO tranche swaps. Hybrid CDO²s appeared with the rise of the structured finance CDO market and comprise both elements. Typically, in such transactions, the major portion (ca. 80–90%) of the reference portfolio is cash asset-backed security (ABS) exposure and the remainder is synthetic CDOs. It is quite common to use a special purpose vehicle (SPV) overlay for cash CDO² and hybrid CDO².

Mechanics of a Synthetic CDO²

A synthetic CDO² tranche follows the same mechanics as an ordinary CDO tranche (see Collateralized Debt Obligations (CDO)), with the only difference that its reference portfolio is made up of CDO tranches. This portfolio is called the outer or master portfolio. CDO tranches are determined by their corresponding reference portfolio plus an attachment (or subordination) level and detachment (or exhaustion) level with regard to aggregated credit losses. Hence, we refer to the loss attachment and detachment of the outer portfolio as outer attachment and outer detachment. Similarly, each CDO tranche of the outer portfolio is described by a corresponding inner reference portfolio and an inner attachment and detachment level (compare with Figure 1 for a schematic description). The inner reference portfolios often overlap and include some of the same reference assets.
Inner attachment and detachment levels as well as the reference notional of assets in inner portfolios are quite often of comparable size. Typically, we find ca. 50–150 assets per inner portfolio and ca. 5–10 inner CDO tranches, which generally translate into a total of ca. 250–500 different reference assets.
CDO² investors benefit from two layers of subordination. First, we must have a considerable number of default events with associated loss rates to exceed the subordination of at least one inner CDO tranche. This will trigger losses on the outer portfolio. However, CDO² losses are recognized only once the subordination of the outer CDO tranche is exhausted. The mathematical description of the aggregated CDO² loss $L_{\text{out}}(t)$ for any future date $t$ during the contract term reflects the double-layer effect.
First, the inner portfolio losses $L_j(t)$ have to be determined via

$$L_j(t) = \sum_{i=1}^{N} N_{ij} \cdot (1 - R_i) \cdot 1_{\{\tau_i \le t\}} \tag{1}$$

where $N_{ij}$ is the notional of assets $i = 1, \ldots, N$ in the inner reference portfolios $j = 1, \ldots, M$, $R_i$ denotes the asset-specific recovery rate and $1_{\{\tau_i \le t\}}$ is the (stochastic) default indicator function for the default time $\tau_i$ of reference asset $i$. Second, the inner portfolio losses $L_j(t)$ have to be transformed into
inner CDO tranche losses $L_{\text{inn},j}(t)$ (see Collateralized Debt Obligations (CDO))

$$L_{\text{inn},j}(t) = \min\left[D_j - A_j,\; \max\left[L_j(t) - A_j,\; 0\right]\right] \tag{2}$$

where $A_j$ and $D_j$ denote the inner attachment and exhaustion level of the corresponding inner reference portfolio $j$. Third, the outer tranche or CDO² tranche loss can be computed as

$$L_{\text{out}}(t) = \min\left[D_{\text{out}} - A_{\text{out}},\; \max\left[L_{\text{tot}}(t) - A_{\text{out}},\; 0\right]\right] \tag{3}$$

where $A_{\text{out}}$ and $D_{\text{out}}$ denote the attachment and exhaustion points of the outer tranche and $L_{\text{tot}}(t) = \sum_{j=1}^{M} L_{\text{inn},j}(t)$ the sum of inner tranche losses.

[Figure 1: Schematic CDO² description. Panels: total of reference entities (e.g., corporate assets); inner reference portfolios (light grey shaded) and tranches (dark grey shaded); outer reference portfolio comprising the inner tranches, and outer tranches.]

Risk Analysis and Pricing

The limited universe of liquid and actively traded reference assets naturally yields overlaps in inner reference portfolios; in other words, reference assets tend to occur in more than one real-life inner reference portfolio. This causes the CDO² loss distribution to display fatter tails on both ends, since the (non)occurrence of an isolated default event might simultaneously affect several inner reference portfolios, thereby displaying a leveraged effect [1]. This impact is even more pronounced in case of thin tranches and understood as cliff risk of CDO²s. Moreover, the double-layer tranche technology generally amplifies correlation sensitivities: an increase in the asset correlation yields a higher increase of correlation between affected inner CDO tranches. In summary, overlap and correlation are the main risk drivers of a CDO² tranche. In addition, the described effects considerably increase the impact of other risk drivers such as changing credit spreads (respectively, changing default probabilities) and changing recovery rates.
The key ingredient to pricing is the stochastic evaluation of the accumulated CDO² tranche loss $L_{\text{out}}(t)$ as determined in the previous paragraph. This requires the consistent use of a multivariate credit (default) model. Since no market standard has been developed yet owing to the lack of truly observable correlation information, the necessity and benefit of appropriate scenario models are highlighted in this article. The rating agencies Moody’s, Standard & Poor’s, and Fitch have consistently adapted their
CDO rating technology to the CDO² case. In particular, the rating technology comes with a look-through capability to underlying assets of inner reference portfolios. However, the look-through capacity stops with ABS-type assets that are modeled as a single asset.

References

[1] Kakodkar, A., Galiani, S., Jonsson, J.G. & Gallo, A. (2006). Credit Derivatives Handbook 2006, Vol. 2: A Guide to the Exotics Credit Derivatives Market, Credit Derivatives Strategy, Merrill Lynch, New York.
[2] Smith, D. (2003). CDOs of CDOs: art eating itself? in Credit Derivatives: The Definitive Guide, J. Gregory, ed, Risk Books, London, pp. 257–279.

Related Articles

Collateralized Debt Obligations (CDO); Managed CDO; Structured Finance Rating Methodologies.

HANS-JÜRGEN BRASCH
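
The three-step loss calculation of the section Mechanics of a Synthetic CDO² (inner portfolio losses, inner tranche losses, outer tranche loss) can be sketched for a single default scenario as follows. This is only an illustration: the function names, the array layout, and all numbers are assumptions made here, and a pricing model would still need to average the result over scenarios drawn from a multivariate default model.

```python
import numpy as np

def tranche_loss(loss, attach, detach):
    """min[D - A, max[L - A, 0]], the tranche loss used in equations (2) and (3)."""
    return np.minimum(detach - attach, np.maximum(loss - attach, 0.0))

def cdo_squared_loss(notionals, recoveries, defaulted,
                     inner_attach, inner_detach, outer_attach, outer_detach):
    """Return (L_j, L_inn_j, L_out) for one default scenario.

    notionals : array of shape (N assets, M inner portfolios), the N_ij
    recoveries: array of shape (N,), the R_i
    defaulted : array of shape (N,), 1 if tau_i <= t and 0 otherwise
    """
    notionals = np.asarray(notionals, dtype=float)
    loss_given_default = (1.0 - np.asarray(recoveries, dtype=float)) \
        * np.asarray(defaulted, dtype=float)
    inner_portfolio = loss_given_default @ notionals              # equation (1)
    inner_tranche = tranche_loss(inner_portfolio,
                                 np.asarray(inner_attach, dtype=float),
                                 np.asarray(inner_detach, dtype=float))  # equation (2)
    outer = tranche_loss(inner_tranche.sum(), outer_attach, outer_detach)  # equation (3)
    return inner_portfolio, inner_tranche, outer

# Made-up example: four assets, two inner portfolios that overlap in
# the first and last asset, both of which have defaulted.
notionals = [[10, 10],
             [10, 0],
             [0, 10],
             [10, 10]]
inner_portfolio, inner_tranche, outer = cdo_squared_loss(
    notionals,
    recoveries=[0.4, 0.4, 0.4, 0.4],
    defaulted=[1, 0, 0, 1],
    inner_attach=[3, 6], inner_detach=[9, 15],
    outer_attach=4, outer_detach=10,
)
```

Because the two defaulted names sit in both inner portfolios, each default hits both inner tranches at once, which is the overlap effect discussed in the section Risk Analysis and Pricing.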
Leveraged Super-senior Tranche

A leveraged super-senior (LSS) note is a structure that allows investors to take a leveraged exposure to the super-senior (SS) part of a collateralized debt obligation (CDO). This provides an enhanced level of return while typically maintaining a AAA rating. Leverage is achieved by posting an initial collateral amount that is less than the notional of the underlying SS tranche. All credit losses to the investor are capped at the collateral amount, but the coupon is paid on the full notional. Early unwind clauses are typically included in the trade to mitigate the risk to the issuer that losses exceed the collateral amount. Compared to a standard SS tranche the investor is exposed to mark-to-market (MTM) risk as well as credit risk, that is, for certain market moves his/her principal could be reduced even if no credit losses have occurred owing to a forced unwind. The issuer faces a so-called gap risk, the risk that the MTM of the tranche will fall below the initially posted collateral amount before a trade unwind can take place.

Super-senior Swap

In an (unleveraged) SS swap transaction an investor (protection seller) will take exposure to credit losses on the SS tranche of a CDO (see Collateralized Debt Obligations (CDO)). This means that in return for a regular fee (or coupon), the investor will make good all losses on the underlying reference portfolio that exceed the attachment amount, but are below the detachment amount. For an SS tranche the attachment point will be higher than what is required for achieving a AAA rating (see Credit Rating). The spread of the tranche is the fee that values the swap at 0.

Leveraged Super-senior (LSS) Swap

Since the risk of experiencing any losses on an SS tranche is remote, the spread for a standard SS tranche is low. In particular, it is lower than for other AAA-rated securities, which limits the attractiveness of the transaction to investors. LSS structures became popular in 2005, when spreads on the 22–100%, 5-year, iTraxx index (see Credit Default Swap (CDS) Indices) tranche were around 5 bps.
Issuers of LSS notes are typically investment banks. When arranging a CDO transaction the issuer needs to be able to sell the entire capital structure, as otherwise he/she will be left with the remaining risk. For the reasons mentioned above, it can be harder to sell SS risk than risk on mezzanine and senior tranches. The LSS transaction allows the issuer to repackage the SS risk in a way that is attractive to the investor.
In an LSS transaction, the investor will invest in an SS tranche with notional $N$ (referred to as the reference tranche). However, he/she will only post an initial collateral amount $X$ (also referred to as the participation amount) that is less than the notional amount of the tranche. Any credit losses to the investor will reduce the collateral and losses are thus capped at the collateral amount, $X$. However, the investor still receives a coupon on the full notional, $N$, of the reference SS tranche. The ratio, $N/X$, is referred to as the leverage. If losses reach the collateral amount, the trade will terminate without any further cash flows. The typical structure of the LSS trade can be seen in Figure 1. We compare credit losses on the LSS and a standard SS in Figure 2.
The LSS structure provides an increased coupon to the investor compared to investing the collateral amount $X$ in a standard SS tranche. The reason for this is that the investor takes on MTM risk over and above his/her credit risk as is described below.
The issuer of an LSS note (protection buyer) is only covered for losses up to the collateral amount $X$. However, he/she will typically hedge his/her position by selling protection on the corresponding standard SS tranche (cf. the section Hedging and Risks) where he/she is liable for all the losses on the tranche up to the full notional amount $N$. Thus, the issuer needs to mitigate the risk that losses exceed the collateral level. This is done via the inclusion of a trigger mechanism: As soon as a predefined trigger level is reached the trade will unwind at the MTM of the reference tranche capped at $X$. The trigger should thus be set such that the MTM on unwind will be less than $X$.
We note that in some transactions the investor has the option of posting more collateral upon trigger to avoid an unwind. In this case, the investor will continue to be paid the coupon of the original
transaction. This means that it is never optimal for the investor to deliver since after posting more collateral he/she will be liable for more losses without receiving a higher coupon in compensation. It is more favorable to reinvest in a new LSS transaction.

[Figure 1: This figure shows the standard structure of cash flows in an LSS transaction. Labels: issuer (protection buyer), special purpose vehicle (SPV), investor (protection seller), collateral $X$; flows: coupons $c$, interest $r$, $c + r$, losses, $X$ at $t$ (inception), $X$ minus losses at $T$ (maturity).]

[Figure 2: This figure shows the credit losses to the investor and the risky fee notional, $N(t)$, as a function of the portfolio losses. The coupon amount paid to the investor at a coupon date $t_i$ is $s \cdot N(t_i)$, where $s$ is the spread. For comparison we include both the behavior of the LSS and the reference unleveraged super-senior (SS). Horizontal axis: portfolio losses, marked at the attachment point $a$, $a + X$, and the detachment point $b$. Vertical axis: risky fee notional $N(t)$ and credit losses, marked at $X$ and $N$. Curves: risky notional for LSS, risky notional for SS, credit losses on LSS, credit losses for SS.]

We describe the three main types of trigger mechanisms below. There is a trade-off between how well the trigger can approximate the MTM of the trade and how easy it is to objectively assess whether the trigger has been breached.

• Loss Trigger
A loss trigger is breached when the amount of portfolio notional lost owing to defaults exceeds the trigger level. This is the easiest trigger to monitor as the loss amounts can be objectively determined. However, the loss provides an imperfect approximation for the MTM of the tranche. In particular, if spreads widen, the value of the LSS can drop severely from the point of view of the investor even in the absence of any defaults. This poses a risk to the issuer since at the time of trigger the MTM of the tranche could have dropped below the collateral amount.

• Spread Trigger
Spread triggers are based on the average spread of the underlying portfolio. Trigger levels can be defined as a function of the time to maturity and the level of losses in the portfolio. This provides a much better proxy to the MTM of the tranche than the loss trigger. For some standard portfolios, for example, iTraxx or CDX (see Credit Default Swap (CDS) Indices), the value of the average spread can also be assessed using publicly available information and is hence unambiguous. Often, however, the LSS is based on bespoke portfolios. In this case, the valuation of the
SS spreads will have to rely on models, for which there is no universally agreed methodology.

• Mark-to-market Trigger
The MTM trigger is based on the MTM of the reference (unleveraged) SS tranche. Clearly, if the MTM trigger is set below the collateral level the issuer ensures that the collateral will cover the unwind payment (up to gap risk, cf. the section Hedging and Risks). The disadvantage is that the MTM trigger is the hardest to assess objectively. Typically, the MTM for a tranche is not quoted, and hence one has to rely entirely on (complex) models for valuation.

Hedging and Risks

If the trigger mechanism guaranteed that upon unwind the issuer will receive the full MTM of the reference swap then this swap would provide a perfect hedge for the LSS. The coupon amount the investor would receive would be the same as if he/she had invested the full notional amount $N$ in an SS swap transaction.
However, there are two reasons why the trade can unwind without recovering the full MTM of the hedge:

• Typically, there will be a delay between a trigger breach and the actual unwind of the hedge. In this period, there is the risk that the MTM will drop below the collateral amount. The issuer then has to make good the difference $(\text{MTM} - X)$. This is the so-called gap risk: The issuer is exposed to large and sudden increases in the value of SS protection or equivalently increases in the spread.
• Even in the absence of a trigger breach, the LSS will unwind if the SS tranche losses have wiped out the collateral. Since the collateral is 0 in this case, the issuer will have to pay the full MTM of the hedge to unwind his/her position. However, this scenario is unlikely, since the trigger should be set so that a trigger event occurs before the collateral has been reduced to 0.

The investor in an LSS transaction faces MTM risks as well as credit risks associated with SS tranche losses. In the case of a trigger event, the investor will be forced to unwind his/her position and will lose part or all of his/her principal as he/she realizes his/her MTM losses (unless he/she posts more collateral). Unless dealing with a loss trigger, a trigger breach can happen even if the investor has not incurred any actual credit losses, for example, if there is a dramatic rise in spreads.

Valuation

The valuation of LSS transactions poses additional challenges to that of pricing a standard SS tranche. This is because the unwind feature means that we need to be able to value the risk of possible MTM losses to the investor and the issuer. Hence, we need to model the joint behavior of MTM and the portfolio losses. This is a dynamic problem that requires more than knowledge of the marginal loss distributions needed for standard tranche pricing.
There are two main candidates for dynamic credit models that can, in principle, be used to value an LSS transaction:

• low-dimensional models of the portfolio loss process;
• dynamic models of all single name spreads in the portfolio (see Duffie–Singleton Model; Multiname Reduced Form Models).

Modeling and valuation of the LSS product is not only important for the issuer and the investor but also for assessing the rating of the note. This depends not only on the probability of experiencing credit losses but also on the probability of having a trigger event. Rating agencies (see Credit Rating) use in-house models for this as described in, for example, [1, 2].

Model-independent Bounds

Some model-independent bounds for the value of the LSS can be derived. We discuss this from the perspective of the issuer who is long protection. Let us denote the spread of a standard tranche with attachment point $a$ and detachment point $b$ by $S_{a,b}$. The spread of the corresponding leveraged tranche with collateral amount $x$ will be denoted by $S_{a,b,x}$. Note that the leverage amount $\alpha$ is given by $\alpha = (b - a)/x$. The most basic bound we can then write down is

$$S_{a,b} \le S_{a,b,x} \le \alpha \cdot S_{a,b} \tag{1}$$
This means the following:

• The spread of the leveraged tranche is less than the leverage amount times the spread of the unleveraged tranche. This is because the issuer has additional unwind and gap risk. The difference

$$\alpha \cdot S_{a,b} - S_{a,b,x} \tag{2}$$

is the gap charge.
• The spread of the leveraged tranche is greater than that of the unleveraged tranche. This is because of the trigger mechanism, which allows the issuer to recover the MTM of the unleveraged tranche up to the collateral amount.

We can also give a more stringent floor for the LSS value. To this end, we introduce the fee leg value $F_{a,b}$ and contingent (or loss) leg value $C_{a,b}$ of a tranche (see Collateralized Debt Obligations (CDO)). The (positive) value of the fee leg is the expected value of all coupon payments paid on the risky notional. The contingent leg is the (positive) expected value of any loss payments. We now have the following bounds:

$$C_{a,a+x} \le C_{a,b,x} \tag{3}$$

This is because on the contingent leg, the issuer will at least recover losses up to $a + x$. He/she can effectively recover more on unwind since he/she receives the MTM of the unleveraged tranche.
We also have

$$F_{a,b} \ge F_{a,b,x} \tag{4}$$

This corresponds to the fact that on the fee leg the issuer will at most pay the fees of the unleveraged reference tranche. He/she might effectively pay less if there is an unwind not due to the trigger such that no MTM exchange takes place.
For a more rigorous discussion we refer to [3].

References

[1] Chandler, C., Guadagnuolo, L. & Jobst, N. (2005). CDO spotlight: Approach to rating super senior CDO notes, in Standard & Poor’s Structured Finance, Standard & Poor’s a Division of the McGraw-Hill Companies, Inc., New York.
[2] Osako, C., Perkins, W. & Kissina, I. (2005). Leveraged super-senior credit default swaps, Fitch Ratings Structured Finance July.
[3] Gregory, J. (2008). A trick of the credit tail, Risk 21(3), 88–92.

Related Articles

Forward-starting CDO Tranche.

MATTHIAS ARNSDORF
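
The capped credit loss of an LSS investor and the spread bound (1) of the section Model-independent Bounds can be illustrated with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, losses and tranche points are expressed as fractions of portfolio notional, unwind and trigger effects are ignored in the loss functions, and the collateral amount and LSS spread in the example are invented for illustration (only the 22–100% tranche and the 5 bps spread level come from the article).

```python
def ss_credit_loss(portfolio_loss, a, b):
    """Credit loss on a standard SS tranche with attachment a and detachment b."""
    return min(b - a, max(portfolio_loss - a, 0.0))

def lss_credit_loss(portfolio_loss, a, b, x):
    """Credit loss to the LSS investor: the SS tranche loss capped at collateral x."""
    return min(x, ss_credit_loss(portfolio_loss, a, b))

def leverage(a, b, x):
    """Leverage amount alpha = (b - a) / x."""
    return (b - a) / x

def spread_bounds(s_ab, a, b, x):
    """Lower and upper bound on the LSS spread from bound (1)."""
    return s_ab, leverage(a, b, x) * s_ab

def gap_charge(s_ab, s_abx, a, b, x):
    """alpha * S_{a,b} - S_{a,b,x}, the quantity in expression (2)."""
    return leverage(a, b, x) * s_ab - s_abx

# Example: 22-100% tranche at 5 bps; the collateral of 7.8% of portfolio
# notional (leverage of about 10) and the 40 bps LSS spread are assumptions.
a, b, x = 0.22, 1.00, 0.078
lower, upper = spread_bounds(5.0, a, b, x)      # in basis points
charge = gap_charge(5.0, 40.0, a, b, x)         # in basis points
loss_small = lss_credit_loss(0.25, a, b, x)     # portfolio loss below a + x
loss_large = lss_credit_loss(0.50, a, b, x)     # portfolio loss above a + x
```

For portfolio losses beyond $a + x$ the LSS loss stays at the collateral amount while the standard SS loss keeps growing, which is the comparison drawn in Figure 2.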
Managed CDO

General Definition

A managed (or “active”) collateralized debt obligation (CDO) is a large-scale securitization transaction that is actively arranged and administered to unbundle, transform, and diversify financial risks from a dynamic reference portfolio of one or more credit-sensitive asset classes (associated with different creditors, countries, and/or industry sectors). Although the type(s) of asset(s) in the reference portfolio are known and fixed through the life of the CDO, the underlying collateral of a managed CDO is variable.

Types of Managed CDOs

In general, CDO managers operate under investment guidelines that are defined in the governing documents of the CDO transaction. Managers adjust the investment exposure over time to meet a prespecified risk–return profile and/or achieve a certain degree of diversification in response to changes in risk sensitivity, market sentiment, and/or timing preferences. These guidelines specify parameters for the initial portfolio (during the “ramp-up phase”, see below) but not the exact composition, for example, a minimum average rating, a minimum average yield, a maximum average maturity, and a minimum degree of diversification. As opposed to a static CDO, managers monitor and, if necessary, trade assets within the reference portfolio in order to inform decisions about asset purchases and sales that protect the collateral value from impairment due to deterioration in credit quality [6]. For further references, see [1–3, 5, 7, 8]. “Lightly managed” reference portfolios allow for some substitution of assets in the context of a defensive management strategy, while a “fully managed” CDO suggests a more active role of managers subject to limits and investment guidelines that are determined by the issuers, rating agencies, and different levels of risk tolerance of investors at inception. In the event of the issuer’s insolvency or default, managers are charged with maximizing recoveries on behalf of investors. However, investors in managed CDOs do not know what specific assets CDO managers will invest in, recognizing that those assets will change over time as managers alter the composition of the reference portfolio. Thus, investors face both credit risk and the risk of poor management.
The majority of CDOs are managed and, in many instances, involve compounded structured finance claims. While standard CDOs use the same off-balance sheet structuring technology as asset-backed securities (ABSs) (e.g., securities that are themselves repackaged obligations on mortgages, consumer loans, home equity lines of credit, and credit card receivables), their reference portfolios typically include a wider and more diverse range of assets, such as senior secured bank loans, high-yield bonds, and credit default swaps (CDSs). In particular, the variable portfolio structure of managed CDOs is amenable to refinancing ABSs, emerging market bonds, or even other CDOs (to produce CDOs of CDOs, also called CDO²), as collateral assets in the so-called “pools-of-pools” structures.

Managed CDOs Have an Arbitrage Proposition

Managed CDOs are structured for arbitrage purposes. As opposed to balance sheet CDOs, where issuers unload defined asset exposure to third parties in order to change their balance sheet composition or debt maturity structure, in arbitrage transactions, the ability to trade a dynamic reference portfolio helps managers focus on the pool’s prospects for appreciation with the view of realizing economic gains while limiting downside risks. These gains result from the pricing mismatch between investment returns from reference assets (in the case of a cash flow structure) or credit protection premia on exposures (in the case of a synthetic structure) and lower financing cost of generally higher rated liabilities in the form of issued CDO securities. While cash flow CDOs, the most common type of CDOs, pay off liabilities with the cash generated from interest and principal payments, synthetic CDOs sell credit protection (together with various third-party guarantees) to create partially funded and highly leveraged investment on the performance of designated credit exposures (without actually purchasing the reference assets).

The Life of a Managed CDO

The life of a managed CDO can be divided into three distinct phases (Figure 1). During the “ramp-up
phase” (which lasts about one year), asset managers invest the proceeds from CDO placement (possibly after an initial warehousing period when the sponsor finances the buildup of the asset portfolio before securitizing). During the subsequent “reinvestment phase” (up to five years or longer), managers reinvest cash flows as well as trade the reference portfolio within the prescribed guidelines. Cash flows generated by the assets are used to pay back investors generally in sequential order from the senior investors, who hold the highest rated (typically “AAA”-rated) securities, to the “equity investors” who bear the first-loss risk and generally hold unrated securities. In transactions with revolving pools, portfolio assets can be replaced (e.g., credit card and trade receivables, corporate bonds) and balances are adjustable up to maximum limits without an amortization schedule of principal. In contrast, managers of substituting pools incorporate new assets (within defined credit parameters) as original liabilities are paid down (e.g., corporate bonds, some residential mortgages, and consumer loans), but balances remain fixed. In the “amortization phase”, the reference portfolio matures (or is prepaid/sold) and investors receive some or all of their principal investment back according to the seniority of their claim.

[Figure 1 The phases in the life of a managed CDO. Time axis markers: year 1, years 4–5, years 8–10, year 12. Phase 1: ramp-up; Phase 2: reinvestment; Phase 3: amortization. Events: closing; redemption right in year 3; auction call in years 8–10; final legal maturity in year 12.]

Lessons from the Credit Crisis

Although rating agencies have developed stress tests to evaluate the resilience of dynamic portfolio structures, the 2007 subprime mortgage crisis demonstrated that managed CDOs might create incentive problems [4]. Existing quality and coverage tests on the underlying collateral are designed to trigger amortization scenarios if asset performance deteriorates. However, CDO managers can manipulate these tests to avoid early amortization. In response to a general repricing of risk, dwindling investor demand increased risk premia and curtailed the capacity of CDO managers to offset higher funding costs. Faced with rising liability pressures and without real buyers available, managers of “blind pools” could “double up” by opting for riskier positions and greater leverage to preserve their own arbitrage gains within predefined investment guidelines, which were gradually undermined by the disassociation of ratings and structured asset performance. In principle, if transaction costs are ignored, risk-neutral managers would not benefit from dynamic asset allocation by substituting badly performing assets. Under worsening credit conditions, better asset performance comes at a premium, making it more expensive to weed out distressed assets. Therefore, CDO managers are no better off than before once they divert funds to safer but more costly assets (or accept higher hedging costs).

References

[1] Cousseran, O. & Rahmouni, I. (2005). The CDO market – functioning and implications in terms of financial stability, Banque de France Financial Stability Review June (6), 43–62.
[2] Duffie, D. & Gârleanu, N. (2001). Risk and valuation of collateralized debt obligations, Financial Analysts Journal 57(1), 41–59.
[3] Goodman, L.S. & Fabozzi, F.J. (2002). Collateralized Debt Obligations: Structures and Analysis, John Wiley & Sons Inc., Hoboken, NJ.
[4] Jobst, A. (2005). Risk management of CDOs during times of stress, Derivatives Week, Euromoney, London. (28 November), pp. 8–10.
[5] Jobst, A. (2007). A primer on structured finance, Journal of Derivatives and Hedge Funds 13(3), 199–213.
[6] Jobst, A. (2008). What is securitization? in Finance and Development, Vol. 47(3), (September), p. 48f.
[7] Punjabi, S. & Tierney, J.F. (1999). Synthetic CLOs and their Role in Bank Balance Sheet Management, Deutsche Bank Research, Fixed Income Research.
[8] Schorin, C. & Weinreich, S. (1998). Collateralized Debt Obligation Handbook. Working Paper, Fixed Income Research, Morgan Stanley Dean Witter.

Related Articles

Collateralized Debt Obligations (CDO); CDO Tranches: Impact on Economic Capital; Forward-starting CDO Tranche; Special-purpose Vehicle (SPV).

ANDREAS A. JOBST
Collateralized Debt Obligation (CDO) Options

The synthetic collateralized debt obligation (CDO) tranche activity is still a relatively new business, which started in late 2000. Initially seen as an eccentricity in the securitization market, it finally got a life of its own. Contrary to the rest of the securitization market, it grew as an arbitrage business, where the bank will not gain from structuring fees (like for cash CDOs) but from an arbitrage between two different markets: single-name credit default swaps versus synthetic single-tranche CDO. Its evolution was then marked by multiple borrowings from the equity derivatives market, using its terminology (single tranche viewed as call spread on credit losses) and its technology (correlation smiles and the type of derivatives on it). This helps explain why the article focuses exclusively on synthetic CDO tranches, because to our knowledge there do not exist derivatives based on cash CDO notes.^a

For a more comprehensive CDO framework, see Collateralized Debt Obligations (CDO), and for an introduction to CDO tranche pricing, see Intensity-based Credit Risk Models. However, we introduce two important notions that are used in this article:

• A credit default swap (CDS) is a bilateral contract where the protection buyer will pay a quarterly premium (expressed as a proportion of the CDS notional) and receive from the protection seller, if specific events take place (related to the default or bankruptcy of a specific corporate entity), a payment corresponding to one minus the price of a bond of the defaulted entity, that payment being called the severity of default or the credit loss.
• A synthetic CDO tranche is a bilateral contract where the protection buyer will pay a quarterly premium (the premium leg of the swap) and receive from the protection seller the increment in loss on the tranche (the loss leg of the swap), where the loss on the tranche is contractually defined as a function of the sum of credit losses on a portfolio of single-name CDS, or more accurately as $\min(d - a, \max(L_t - a, 0))$ with the following:
  • $L_t$ represents cumulative credit losses on that portfolio up to date $t$, that is, the sum of the severities of the defaulted entities that are in that portfolio;
  • $a$ is defined as the attachment point of the CDO tranche in proportion of the initial pool size; and
  • $d$ is defined as its detachment point.

Definitions

Here, we list the different derivatives that we have seen in the synthetic CDO markets created between 2001 and 2007. Those derivatives can be categorized in two groups: derivatives on CDO tranches where the behavior of the derivative is conditioned by losses realization, or default-path-dependent derivatives, and derivatives conditioned by a spread/market value evolution. Some derivatives like leveraged supersenior (LSS) (see Leveraged Super-senior Tranche) can be in the two categories depending on their variations.

The first category of default-path-dependent structure is also known as reset tranches (as defined in [4]): those are CDO tranches where the attachment point and/or the width of the tranche are modified at a future reset date as a predetermined function of the portfolio losses up to that date.

• Forward-starting CDO (see Forward-starting CDO Tranche and [2]): this is a CDO tranche where the contract becomes effective at a future date, where any entities defaulting between the entry into the forward and its effective date will be considered to have a recovery rate at 100%, or in other words at the effective date $t_1$, the CDO tranche will have an attachment point equal to the sum of the cumulative losses up to that date and the pre-fixed initial attachment point (i.e., $a + L_{t_1}$), its width being unchanged in dollar amount (i.e., $d - a$). There is also a variation on that contract [7] where the forward CDO is the obligation to enter into a CDO tranche at a future date, taking into consideration the erosion of subordination due to losses up to the effective date, and the decrease in the width, but not the losses on the tranche, thus a subordination of $\max(0, a - L_{t_1})$ and width of $\min(d, \max(a, L_{t_1}))$.
• Subordination step-up: This is a standard CDO tranche except that at the reset date, if losses have
not started to touch the tranche the subordination will be increased by a fixed amount. Multiple variations of those contracts exist with several reset dates or increase in subordination linked to losses being in a specific band.
• Leveraged supersenior (see Leveraged Super-senior Tranche): This is a synthetic CDO tranche, with a large attachment point, thus its supersenior nature, which is initially partially collateralized by the protection seller. Owing to that partial collateralization, when a loss trigger or mark-to-market (MtM) trigger is breached, the protection seller has the obligation of either providing additional collateralization or unwinding its contract at market value. When the trigger is based on loss level, this can be viewed as a reset tranche.

The second category encompasses all derivatives in the classical sense, that is, derivatives based on the market value of the underlying asset.

• Call on CDO tranches: This is an option giving the option holder the possibility to buy protection on a synthetic CDO tranche at a predetermined spread on one or several future dates, being either European for one single date or Bermudan for a set of future dates. The strike is defined as a spread level (and not as a value of the tranche). The synthetic CDO tranche, that is, portfolio composition, attachment/detachment points, and maturity, is defined initially, and akin to the differentiation done on forward-starting CDO, losses up to the exercise date of the option may or may not affect the attachment point.
• Put on CDO tranches: Contrary to the call option, this gives the option holder the possibility to sell protection on a CDO tranche at a predetermined spread.
• Callable structure: This is an option that gives the protection seller (or the protection buyer) the right to terminate the transaction at no additional costs during its life. If the option is for the protection seller, this is, in fact, a Bermudan call on the CDO tranche itself with a strike equal to its initial spread level. Here the attachment point of the underlying synthetic CDO tranche will be eroded by losses up to the exercise date of the option.
• Rating guarantee: We know of one investment bank that worked on the possibility to issue a guarantee on the CDO tranche rating, giving the option to investors to put their CDO tranche at par to the issuer of such guarantee if the rating was downgraded below a prespecified threshold. This is in effect a callable structure conditional on the tranche being downgraded by a rating agency.

Purpose and Market

The purpose of those innovations is, in most cases, related to issues encountered by the bank’s desk working on synthetic CDO tranches. The innovations in that market were always caused not by a need of the investors but by potential arbitrages to exploit. Synthetic CDO tranches from 2001 to 2007 were a booming market with several success stories. However, this was a very competitive market, where the competitive advantage was due to the endless creation of new structural features. Indeed, as soon as an innovation was introduced to the market by one player, several others tried to imitate it, soon depleting the potential gains evidenced by such innovation (Figure 1).

Each innovation was triggered by either an arbitrage to exploit or a specific problem encountered by the desk:

• Forward-starting CDO: Those products were created to exploit discrepancies in the term structure of spreads as the five-year maturity spread was depleted because of the wave of five-year synthetic CDO tranches. Synthetic CDO tranches were starting to be structured at 10 years or even as forward starting 5–10 years to benefit from the tightening at 5 years. Indeed, a 5–10 years forward starting CDO can be seen as a combination of a 10 years synthetic tranche and a 5 years synthetic tranche, selling protection for 10 years but buying protection for the first 5 years.
• Leveraged supersenior: When the correlation desk sold synthetic CDO tranches, they sold mainly equity and mezzanine tranches, and thus either delta-hedged them or kept the most senior tranches on their book. The supersenior exposures were very hard to sell due to their low spreads compared to their notional amounts, that is, the amount of cash needed to invest in those tranches. The creation of LSS allowed those desks to buy protection on supersenior synthetic CDO tranches by broadening the investor base outside of its
initial clients (monolines or (re)insurance companies), the LSS having a higher spread for an “assumed” low credit risk.

[Figure 1 Evolution of the notional of the credit derivatives market with several innovations (source: ISDA). Vertical axis: notional, from $0 to $70 000; horizontal axis: date, from mid-2001 to mid-2008. Innovations marked on the chart: managed CDOs, single tranche CDOs, CDS indices, index tranches, capital structure arb., CDO2, CDO option, constant maturity default swaps, recovery swaps, equity default swaps, leveraged super-senior tranches, CPDO and CPPI.]

Valuation

The valuation of a CDO tranche, whether initially or during its life, relies on the knowledge of the loss distribution of the underlying portfolio through time or, in other words, on the law governing the random path $L_t$ representing the cumulative losses up to $t$. The knowledge of the loss distribution at different future dates (thus a loss distribution surface) is required^b to price a CDO tranche, that is, to value the two legs of that tranche swap: $P[L_T \le l \mid \mathcal{F}_t]$, the probability that losses up to time $T$ will not exceed threshold $l$ knowing the losses at time $t$.^c If the existing information in the market consists of the credit index tranches prices, in the arbitrage pricing theory framework, from those prices we will extract constraints on the “spot” loss distribution surface being $P[L_T \le l \mid \mathcal{F}_{t_0}]$.

Some CDO derivatives can be valued with that “spot” loss distribution surface: the forward-starting CDO as described in [7] (the second variation as described above) can be understood as a long–short position on maturities: long, the CDO tranche at the longest maturity and short, the same CDO tranche at the effective date. Indeed, compare the two positions, a $t_1/t_2$ forward-starting CDO tranche versus a long CDO tranche at $t_2$ and a short CDO tranche at $t_1$ with the same attachment/detachment points:

• if losses are always below the attachment point, no CDO tranches will be touched;
• if $L_{t_1} < a$ and $L_{t_2} > a$, the forward-starting CDO will lose $\min(d - a, L_{t_2} - a)$ and the long CDO tranche will lose the same amount; and
• if $L_{t_1} > a$ and $L_{t_2} > a$, the forward-starting CDO will lose $\min(d - L_{t_1}, L_{t_2} - L_{t_1})$ and the long–short CDO tranches will lose/gain $\min(d - a, L_{t_2} - a)$ / $\min(d - a, L_{t_1} - a)$, which gives the same aggregate amount.^d

However, to value the other reset tranches, we need additional information: the intertemporal dependence of losses, that is, the dependence between losses at different dates. For a forward-starting CDO (first variation), we need the law of $(L_{t_1}, L_{t_2})$ to be able to price it.

In addition, for options on tranche, dependent on future spreads, the knowledge of the “spot” loss
distribution surface is not sufficient to value those options. An additional assumption related to spread volatility is needed: this can be an ad hoc assumption directly on the volatility [7] or can be embedded into a stochastic deformation of the loss distribution surface $P[L_T \le l \mid \mathcal{F}_t]$ through time. This leads researchers to introduce the class of models known as dynamic losses models. The dynamic losses models defined so far rely on the standard CDO models, which are classified according to two broad categories [8]:

• Top-down models: The top-down approach will only look at the evolution of the losses on the portfolio and model its dynamics. The seminal paper describing a general framework for such dynamics of the “forward” loss distribution surface is [11], where the distribution of losses in the portfolio is represented as a Markov chain with stochastic transition rates. Andersen et al. [3] explore the same road in a less general manner. Those approaches are tractable and flexible, but they do not capture information from the single-name CDS market.
• Bottom-up models: The approach starts with a representation of the credit risk of the underlying single names in order to build a loss distribution surface. Starting from the modeling of individual defaults, they use classical credit modeling:
  • Structural models (see Default Barrier Models): A structural model computes the default as the breaching by a random process of a barrier (in the initial Merton seminal article the first represents the assets of a company and the second its indebtedness). That class of model incorporates a dynamic for the probabilities of losses naturally, introducing default dependencies through the random process, with linear combination of random processes (Brownian motion or Gamma process, see [9]). A related class of models looks at a discrete evolution of creditworthiness, generally with a Markov chain, where the use of stochastic transition rates can also be applied (a related example is in [1]).
  • Reduced-form models (see Intensity-based Credit Risk Models): A reduced-form model uses hazard rates to represent the risk of default of individual companies; a portfolio can be analyzed with correlated hazard rates. A natural extension of those models to address dynamic losses is to use a stochastic hazard rate for each company, which may be linked through common jumps (as first introduced by Duffie and Gârleanu [5]), correlated Brownian motion, or even introduction of a stochastic time process mapping calendar time to business time [10].

Following the financial crisis of 2008, the landscape for synthetic CDO tranches has seen a change in paradigm. The default of Lehman Brothers in September 2008 and the demise of the investment bank business model have exposed the shaky foundations of the CDO market: liquidity drying up in stress periods and lack of acknowledgment of the counterparty risk in the CDS market. However, the initiatives that are currently discussed (standardization of that market, central clearing house) will, in the long term, expand the scope of that market and ultimately be beneficial for the development of those instruments.

End Notes

a. Apart from rare guarantees offered by structuring desks or call optionality for the equity tranche of cash CDOs.
b. In reality as pointed out in [6], the knowledge of the expected loss on the CDO tranche is sufficient to price it.
c. The filtration $\mathcal{F}_t$ may embed more information than the cumulative losses up to that time.
d. Taking into account even the timing of payment of losses, the two positions are the same.

References

[1] Albanese, C., Chen, O., Dalessandro, A. & Vidler, A. (2005). Dynamic Credit Correlation Modeling, Working Paper, Imperial College.
[2] Andersen, L. (2006). Portfolio Losses in Factor Models: Term Structures and Intertemporal Loss Dependence.
[3] Andersen, L., Piterbarg, V. & Sidenius, J. (2005). A New Framework for Dynamic Credit Portfolio Loss Modelling. Working Paper, November.
[4] Baheti, P., Mashal, R. & Naldi, M. (2006). Step it Up or Start it Forward: Fast Pricing of Reset Tranches, Lehman Brothers Quantitative Credit Research, Vol. 2006-Q1.
[5] Duffie, D. & Gârleanu, N. (2001). Risk and the valuation of collateralized debt obligations, Financial Analysts Journal 57, 41–59.
[6] Hull, J. & White, A. (2006). Valuing credit derivatives using an implied copula approach, Journal of Derivatives 14, 8–28.
[7] Hull, J. & White, A. (2007). Forward and European options on CDO tranches, Journal of Credit Risk 3, 63–73.
[8] Hull, J. & White, A. (2008). Dynamic models of portfolio credit risk: a simplified approach, Journal of Derivatives 15, 9–28.
[9] Jäckel, P. (2008). The Discrete Gamma Pool Model. Working Paper, August.
[10] Joshi, M. & Stacey, A. (2006). Intensity Gamma, Risk 19, 78–83.
[11] Schonbucher, P.J. (2006). Portfolio Losses and the Term Structure of Loss Transition Rates: A New Methodology for the Pricing of Portfolio Credit Derivatives. Working Paper, ETZH.

Related Articles

Collateralized Debt Obligations (CDO); Default Barrier Models; Forward-starting CDO Tranche; Intensity-based Credit Risk Models; Leveraged Super-senior Tranche.

OLIVIER TOUTAIN
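
As an illustration of the tranche loss definition and of the long–short argument used in the Valuation section of the article on CDO options, the following Python sketch may help. It is only a minimal sketch under stated assumptions: all function names are hypothetical, losses and attachment points are expressed as fractions of the initial pool, and the remaining-width rule used for the second forward-starting variation (original width minus the losses already absorbed by the tranche at the effective date) is an assumed reading of the erosion mechanism rather than a formula taken from the references.

```python
def tranche_loss(cum_loss, attach, detach):
    """Loss on a tranche [attach, detach]: min(d - a, max(L_t - a, 0))."""
    return min(detach - attach, max(cum_loss - attach, 0.0))


def forward_start_first_variation(loss_t1, loss_t2, attach, detach):
    """First variation: the attachment point is reset to a + L_t1 at the
    effective date and the width d - a is unchanged."""
    reset_attach = attach + loss_t1
    return min(detach - attach, max(loss_t2 - reset_attach, 0.0))


def forward_start_second_variation(loss_t1, loss_t2, attach, detach):
    """Second variation: subordination is eroded to max(0, a - L_t1).
    Assumption: the width is reduced by the losses that would already have
    hit the tranche by t1, and those losses are not charged to the holder."""
    subordination = max(0.0, attach - loss_t1)
    width = (detach - attach) - tranche_loss(loss_t1, attach, detach)
    new_losses = loss_t2 - loss_t1
    return min(width, max(new_losses - subordination, 0.0))


def long_short(loss_t1, loss_t2, attach, detach):
    """Long the tranche to t2, short the same tranche to t1."""
    return (tranche_loss(loss_t2, attach, detach)
            - tranche_loss(loss_t1, attach, detach))


if __name__ == "__main__":
    a, d = 0.03, 0.06
    scenarios = [(0.01, 0.02), (0.02, 0.05), (0.04, 0.05), (0.04, 0.09)]
    for l1, l2 in scenarios:
        fwd = forward_start_second_variation(l1, l2, a, d)
        pair = long_short(l1, l2, a, d)
        print(f"L_t1={l1:.2f} L_t2={l2:.2f} forward={fwd:.4f} long-short={pair:.4f}")
```

In each scenario the second variation and the long–short position lose the same amount, which is why only the “spot” loss distribution is needed for that contract. The first variation generally differs, since its payoff depends jointly on the losses at both dates.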
Credit Default Swap Index Options

Portfolio credit default swaps (CDSs) referencing indices such as CDX and iTraxx are the most liquid instruments in today’s credit market and options on these have become mainstream. A CDS index option (also called a portfolio swaption) is an option to enter into a portfolio swap as a protection buyer or a protection seller. A portfolio swap (also called a CDS index swap) is similar to a portfolio of single-name CDS all with the same coupon (for details, see Credit Default Swap (CDS) Indices).

Both portfolio swaps and swaptions are traded over the counter but are standardized. The conventions for how portfolio swaps are quoted and traded are important for properly valuing portfolio swaptions.

In this article, we outline the basic conventions and terminology for portfolio swaptions, explain the standard model used by most market participants, and briefly discuss other models and approaches.

Conventions and Terminology

A portfolio swaption is an option to enter into a portfolio swap as a protection buyer (payer swaption) or a protection seller (receiver swaption). The swaption is defined by the underlying portfolio swap, for example, 5-year CDX.IG.11, the expiration date, and the strike spread or strike price. For investment grade portfolios, it is a convention to specify a strike spread, whereas for high yield portfolios a strike price is usually specified. Trading is primarily in options on 5-year portfolio swaps. Option maturities are less than 1 year, with most liquidity in 1–3 months maturities. The standard option expiration dates are the 20th of each month.

The strike, whether it is specified as a spread or as a price, must be converted by a simple calculation to determine the cash amount to be exchanged between the swaption counterparties upon exercise. The calculation is easiest when a strike price is specified:

$$\text{Cash amount} = \text{Notional} \cdot (\text{Strike price} - 100\%) - \text{Accrued coupon} \qquad (1)$$

The cash amount is paid by the protection buyer. The accrued coupon enters the calculation because portfolio swaps, by convention, trade with accrued coupon, similar to the way bonds trade with accrued interest. To simplify the exposition, we ignore accrued coupon in the remainder of the article.

When a strike spread is specified, the cash amount is calculated using the standard CDS valuation model, for example, as implemented in the Bloomberg CDSW screen:

$$\text{Cash amount} = \text{Notional} \cdot \text{PV01} \cdot (\text{Strike spread} - \text{Coupon}) \qquad (2)$$

The coupon is the fixed premium rate for the underlying portfolio swap. When valuing a portfolio swaption, it is important to respect the exact market convention for calculating the PV01 such as the flat spread curve convention (see Credit Default Swap (CDS) Indices).

Another important market convention is that if the swaption is exercised, the option holder will buy or sell protection on all names in the portfolio including those that may have defaulted before option expiration.

The Standard Model

Now, suppose that $V$ is the value at option expiration of owning protection on all names in the portfolio including those that have already defaulted. Option payoff at exercise is then

$$\text{Payer swaption payoff at exercise} = \max\{V - \text{Cash amount}, 0\}$$
$$\text{Receiver swaption payoff at exercise} = \max\{\text{Cash amount} - V, 0\} \qquad (3)$$

where the cash amount is calculated from the strike price as in equation (1) or from the strike spread using a CDS valuation model as in equation (2). The cash amount is not affected by defaults. In fact, if a strike price is specified, the cash amount is known with certainty before option expiration. If a strike spread is specified, the only uncertainty about the cash amount derives from uncertainty about the interest rate curve. However, when pricing portfolio swaptions, it is standard to assume that forward rates are realized.
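
The cash amount conventions in equations (1) and (2) and the exercise payoffs in equation (3) can be sketched in a few lines of Python. This is a minimal illustration, not a market implementation: the function names are hypothetical, the strike price is passed as a fraction of par (so 100% is 1.0), spreads and coupons are decimals per annum, and the PV01 is taken as a given input rather than computed with the flat spread curve convention.

```python
def cash_amount_from_price(notional, strike_price, accrued_coupon=0.0):
    """Equation (1) as written in the text:
    Notional * (Strike price - 100%) - Accrued coupon."""
    return notional * (strike_price - 1.0) - accrued_coupon


def cash_amount_from_spread(notional, pv01, strike_spread, coupon):
    """Equation (2): Notional * PV01 * (Strike spread - Coupon).
    pv01 is assumed to come from a CDS valuation model."""
    return notional * pv01 * (strike_spread - coupon)


def payer_payoff(v, cash_amount):
    """Equation (3), payer swaption: max{V - Cash amount, 0}."""
    return max(v - cash_amount, 0.0)


def receiver_payoff(v, cash_amount):
    """Equation (3), receiver swaption: max{Cash amount - V, 0}."""
    return max(cash_amount - v, 0.0)


if __name__ == "__main__":
    # Hypothetical inputs: 10m notional, PV01 of 4.5, strike 120bp, coupon 100bp.
    cash = cash_amount_from_spread(10_000_000, 4.5, 0.0120, 0.0100)
    v = 120_000.0  # assumed value of protection at expiration
    print(cash, payer_payoff(v, cash), receiver_payoff(v, cash))
```

Note that a long payer combined with a short receiver at the same strike always pays $V$ minus the cash amount, which is the synthetic forward the standard model is required to price correctly.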
To price a swaption, we must specify a stochastic model for $V$. In addition to assuming that risk-neutral valuation is proper [1, 3], the standard model is based on two minimal assumptions that are clarified further:

1. the spread of the underlying portfolio swap is lognormally distributed and
2. the model correctly prices a synthetic forward contract constructed by combining a long payer and a short receiver with the same strikes.

The standard model assumes that $V$ is a function, $V(X)$, of a hypothetical spread:

$$X = E(X)\,\exp\left(-0.5\,\sigma^2 T + \sigma\,\text{Normal}(0, T)\right) \qquad (4)$$

where $\text{Normal}(0, T)$ is a random normal variable with mean 0 and variance $T$ (the time, in years, to option expiration), and $\sigma$ is the free parameter that we interpret as the spread volatility. $E(X)$ is the expected value of $X$. The function $V(X)$ is the one found in equation (2) when the cash amount is seen as a function of the strike spread.

The swaptions are priced by discounting their expected terminal payoff (risk-neutral valuation). To understand where $E(X)$ comes from, consider a payer and a receiver swaption both with a strike price of 100% or equivalently strike spreads equal to the coupon in the underlying portfolio swap. In this case, the cash amount in equation (1) or (2) is zero and the terminal payoff from a position that is long the payer and short the receiver is $V$. The value of this position is therefore

$$V_0 = D(T)\,E(V(X)) \qquad (5)$$

where $D(T)$ is the discount factor to time $T$ (option expiration).

The value of a position that pays $V$, that is, $V_0$, can also be determined from the credit curve of the underlying portfolio (potentially using the credit curves of all the names in the portfolio) since it is simply the value of owning protection on all names in the portfolio but only having to pay premium from option expiration onward. Once we have a value for $V_0$, $E(V(X))$ can be found as $V_0/D(T)$ and $E(X)$ can be implied from this value. We can then price the swaptions using $\sigma$ as the only additional parameter.

It is recommended to solve the model numerically to get the most accurate pricing. However, by making a few simple approximations (such as simplifying the expression for the PV01 in equation (2)) it is possible to derive approximate closed-form solutions that look like Black formulas.

See [2] for details on the model outlined above.

Other Models and Approaches

The standard model is a simple approach to what could be a very complicated problem. Instead of trying to model the credit curves and default of each of the names in the portfolio, the approach in the standard model is to model the hypothetical spread on the aggregate portfolio that also includes defaulted names. Thereby the model has only one free parameter, the aggregate spread volatility, and the approach becomes similar to using Black–Scholes for S&P 500 options. This analogy to the equity world suggests paths to the next generation of models such as introducing stochastic volatility and jumps or creating a model that starts from the individual credits by modeling their default, spread volatility, and spread correlation.

References

[1] Morini, M. & Brigo, D. (2007). Arbitrage-free Pricing of Credit Index Options, Working Paper, Bocconi University.
[2] Pedersen, C. (2003). Valuation of Portfolio Credit Default Swaptions, Lehman Brothers Quantitative Credit Research Quarterly, 2003-Q4, pp. 71–81.
[3] Rutkowski, M. & Armstrong, A. (2008). Valuation of Credit Default Swaptions and Credit Default Index Swaptions, Working Paper, University of New South Wales.

Related Articles

Credit Default Swaps; Credit Default Swap (CDS) Indices; Credit Default Swaption; Hazard Rate.

CLAUS M. PEDERSEN
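
To make the pricing recipe of the standard model for CDS index options concrete, here is a minimal Python sketch of equations (4) and (5). It relies on one simplifying assumption that the article only mentions as an approximation: the PV01 in equation (2) is treated as a constant, so that $V(X) = \text{Notional} \cdot \text{PV01} \cdot (X - \text{Coupon})$ is linear in the hypothetical spread and the numerical price can be compared with a Black-type formula. All function names, parameter names, and input values are hypothetical, and the integration is a plain midpoint rule chosen for readability rather than speed.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def implied_mean_spread(v0, discount, notional, pv01, coupon):
    """Back out E(X) from V0 = D(T) * E(V(X)), assuming a constant PV01
    so that V(X) = notional * pv01 * (X - coupon)."""
    expected_v = v0 / discount
    return coupon + expected_v / (notional * pv01)


def swaption_price_numeric(mean_spread, strike_spread, sigma, t, discount,
                           notional, pv01, payer=True, steps=20001, width=8.0):
    """Discounted expected payoff under equation (4), integrating over a
    standard normal z with Normal(0, T) = sqrt(T) * z."""
    dz = 2.0 * width / steps
    total = 0.0
    for i in range(steps):
        z = -width + (i + 0.5) * dz
        x = mean_spread * math.exp(-0.5 * sigma ** 2 * t + sigma * math.sqrt(t) * z)
        # V(X) minus the cash amount of equation (2); the coupon cancels.
        v_minus_cash = notional * pv01 * (x - strike_spread)
        payoff = max(v_minus_cash, 0.0) if payer else max(-v_minus_cash, 0.0)
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += payoff * density * dz
    return discount * total


def payer_price_black(mean_spread, strike_spread, sigma, t, discount, notional, pv01):
    """Black-type closed form that results from the constant-PV01 assumption."""
    vol = sigma * math.sqrt(t)
    d1 = (math.log(mean_spread / strike_spread) + 0.5 * vol ** 2) / vol
    d2 = d1 - vol
    return discount * notional * pv01 * (
        mean_spread * norm_cdf(d1) - strike_spread * norm_cdf(d2))


if __name__ == "__main__":
    notional, pv01, coupon = 10_000_000, 4.5, 0.0100
    discount, t, sigma, strike = 0.99, 0.25, 0.40, 0.0120
    mean = implied_mean_spread(100_000.0, discount, notional, pv01, coupon)
    payer = swaption_price_numeric(mean, strike, sigma, t, discount, notional, pv01)
    black = payer_price_black(mean, strike, sigma, t, discount, notional, pv01)
    print(f"E(X)={mean:.6f} numeric payer={payer:.2f} Black payer={black:.2f}")
```

With a spread-dependent PV01, as recommended in the article for accurate pricing, only the line computing `v_minus_cash` would need to change; the closed form would then no longer be exact.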
Hazard Rate

Consider a credit default swap (CDS) (see Credit Default Swaps), where the premium payments are periodic and the terminal payment is a digital cash settlement of recovery rate $1 - \bar{\delta}$. For simplicity, we assume that the current time is normalized to $t = 0$, the risk-free rate $r$ is constant throughout the maturity of the contract, and spreads are already given at standard interperiod rates (allowing us to ignore day-count fractions and division by period length). The cash flows of a CDS can be decomposed into the default leg and the premium leg. The default leg is a single lump sum compensation for the loss $\bar{\delta}$ on the face value of the reference asset made at the default time by the protection seller to the protection buyer, given that the default is before the expiration date $T$ of the contract. The premium leg consists of the fees, called the CDS spread, paid by the protection buyer at dates $t_m$ (assumed to be equidistant, e.g., quarterly) until the default event or $T$, whichever is first. The spread $S$ is given as a fraction of the unit notional.

A concise mathematical expression for both legs can be obtained via a point process representation. Suppose we have a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ satisfying the usual conditions. We model the default time as a random time $\tau$ in $[0, \infty]$ with an associated single jump point process

$$N_t = 1_{\{\tau \le t\}} = \begin{cases} 1 & \text{if } \tau \le t \\ 0 & \text{if } \tau > t \end{cases} \qquad (1)$$

The default leg $D_0$ and premium leg $P_0(S)$ can now be expressed in terms of $N$ as

$$D_0 = \mathbb{E}_0\left[\bar{\delta}\, e^{-r\tau} N_T\right] = \mathbb{E}_0\left[\int_0^T \bar{\delta}\, e^{-rs}\, dN_s\right] \qquad (2)$$

$$P_0(S) = \mathbb{E}_0\left[S \sum_{t_m} e^{-r t_m} (1 - N_{t_m})\right] \qquad (3)$$

where $\mathbb{E}_t[\cdot] = \mathbb{E}[\cdot \mid \mathcal{F}_t]$ is the conditional expectation with respect to time $t$ information $\mathcal{F}_t$, and the integral of equation (2) is defined in the Stieltjes sense. Finally, by an application of Fubini’s theorem and integration by parts, equations (2) and (3) can be expressed as

$$D_0 = \bar{\delta}\, e^{-rT}\, \mathbb{P}_0[\tau \le T] + r \bar{\delta} \int_0^T e^{-rs}\, \mathbb{P}_0[\tau \le s]\, ds \qquad (4)$$

$$P_0(S) = S \sum_{t_m} e^{-r t_m}\, \mathbb{P}_0[\tau > t_m] \qquad (5)$$

The fair spread is the spread $S^*$ for which $P_0(S^*) = D_0$, making the value of the contract at initiation 0. This simple expositional formulation shows that the modeling of survival probabilities under the pricing measure of the form $\mathbb{P}_0[\tau > s]$ is the essence of CDS pricing. These quantities can be modeled in a unified way using the concept of hazard rate.

Hazard Rate and Default Intensity

Suppose that we have a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ satisfying the usual conditions and that the default time of a firm is modeled by a random time $\tau$, where $\mathbb{P}[\tau = 0] = 0$ and $\mathbb{P}[\tau > t] > 0$ for all $t \in \mathbb{R}_+$. We start under the assumption, which will be relaxed later, that the evolution of information only involves observations of whether or not default has occurred up to time $t$. In other words, we are dealing with the natural filtration $\mathcal{F}_t = \mathcal{N}_t = \sigma(N_s, s \le t)$ of the right continuous, increasing process $N$, introduced earlier, completed to include the $\mathbb{P}$-negligible sets. Let $F(t) = \mathbb{P}[\tau \le t]$ be the cumulative distribution function of $\tau$. Then, the hazard function of $\tau$ is defined by the increasing function $H : \mathbb{R}_+ \to \mathbb{R}_+$ given as

$$H(t) = -\ln(1 - F(t)) \quad \forall t \in \mathbb{R}_+ \qquad (6)$$

Suppose, furthermore, that $F$ is absolutely continuous, admitting a density representation of $F(t) = \int_0^t f(s)\, ds$. The hazard rate of $\tau$ is defined by the nonnegative function $h : \mathbb{R}_+ \to \mathbb{R}_+$ given as

$$h(t) = \frac{f(t)}{1 - F(t)} \qquad (7)$$

under which we have

$$F(t) = 1 - e^{-H(t)} = 1 - e^{-\int_0^t h(s)\, ds} \quad \forall t \in \mathbb{R}_+ \qquad (8)$$

Naturally, the component probabilities of equations (4) and (5) can be expressed in terms of the
hazard rate as

$$\mathbb{P}[\tau > s \mid \mathcal{N}_t] = 1_{\{\tau > t\}}\, e^{H(t) - H(s)} = 1_{\{\tau > t\}}\, e^{-\int_t^s h(u)\, du} \tag{9}$$

$$\mathbb{P}[t < \tau \le s \mid \mathcal{N}_t] = 1_{\{\tau > t\}} \left(1 - e^{H(t) - H(s)}\right) = 1_{\{\tau > t\}} \left(1 - e^{-\int_t^s h(u)\, du}\right) \tag{10}$$

where $s \ge t$. Note that for a continuous $h$, $h(t)\Delta$ represents the first-order approximation of the probability of default between $t$ and $t + \Delta$, given survival up to $t$. The term hazard rate stems from the fact that $h(t)$ can be thought of as the instantaneous ($\Delta \to 0$) rate of failure (in our case default arrival) at time $t$ conditional on survival up to time $t$. Because of this conceptualization, the hazard rate is often referred to as the forward default rate in financial literature. While the term default intensity is also frequently used interchangeably with hazard rate [3], some authors [5] elect to distinguish between the two terms, where intensity is used to refer to the arrival rate conditioned on all observable information, and not only on survival. If survival is the only observable information, as in our current setting, the two terms are equivalent even under this distinction.

A useful alternate characterization of the hazard rate is possible using the martingale theory of point processes highlighted in [2]. As an increasing process, $N$ has an obvious upward trend. The conditional probability of default by time $s \ge t$ is always greater than or equal to $N_t$ itself, and hence $N$ is a submartingale. It follows that $N$ admits the Doob–Meyer decomposition [4] $N = M + A$, where $M$ is an $\mathbb{F}$-martingale and $A$ is a right-continuous, $\mathbb{F}$-predictable, increasing process starting from 0, both unique up to indistinguishability. $A$ compensates for the upward trend such that $N - A$ is an $\mathbb{F}$-martingale, and hence the popular terminology ($\mathbb{F}$-)compensator (see Compensators). Compensators are interesting constructs in and of themselves, as their analytical properties correspond to probabilistic properties of the underlying random time. For instance, the almost sure continuity of sample paths of $A$ is equivalent to the total inaccessibility of $\tau$. Giesecke [6] outlines these properties and provides a direct compensator-based pricing application. As shown in [7], the connection between the compensator $A$ and the hazard function $H$ is that $A(t) = H(t \wedge \tau)$ if and only if the cumulative distribution function $F(t)$ is continuous. Furthermore, if $F(t)$ is absolutely continuous we have $A(t) = \int_0^{t \wedge \tau} h(s)\, ds$, where $h$ is the hazard rate. Therefore, under the continuity (absolute continuity) of $F(t)$, we can say that the hazard function $H$ (hazard rate $h$) is the unique function such that $N(t) - H(t \wedge \tau) = N(t) - \int_0^{t \wedge \tau} h(s)\, ds$ is an $\mathbb{F}$-martingale. However, the martingale/compensator characterization and the standard hazard function definition no longer coincide when $F(t)$ is not continuous.

In financial literature, we are also interested in cases where the current information filtration models not only survival but the observation of other processes as well. Let the total flow of information be modeled by $\mathbb{F} = \mathbb{G} \vee \mathbb{N}$, where $\mathbb{N}$ is once again the natural filtration of $N$ (and all filtrations considered are right-continuous completions). Under certain conditions the previous concepts can be extended in a straightforward fashion. Let $F(t) = \mathbb{P}[\tau \le t \mid \mathcal{G}_t]$. First, we assume that $\tau$ is not a $\mathbb{G}$-stopping time (while it is trivially an $\mathbb{F}$-stopping time) such that the $\mathbb{G}$-hazard process $H(t) = -\ln(1 - F(t))$ is well-defined. If $H$ is absolutely continuous, admitting the $\mathbb{G}$-progressive density representation $H(t) = \int_0^t h(s)\, ds$, then the process

$$M(t) := N(t) - H(t \wedge \tau) = N(t) - \int_0^{t \wedge \tau} h(s)\, ds \tag{11}$$

is an $\mathbb{F}$-martingale, and the analogs of equations (9) and (10) are given by

$$\mathbb{P}[\tau > s \mid \mathcal{F}_t] = 1_{\{\tau > t\}}\, \mathbb{E}\!\left[e^{H(t) - H(s)} \mid \mathcal{G}_t\right] = 1_{\{\tau > t\}}\, \mathbb{E}\!\left[e^{-\int_t^s h(u)\, du} \mid \mathcal{G}_t\right] \tag{12}$$

$$\mathbb{P}[t < \tau \le s \mid \mathcal{F}_t] = 1_{\{\tau > t\}}\, \mathbb{E}\!\left[1 - e^{H(t) - H(s)} \mid \mathcal{G}_t\right] = 1_{\{\tau > t\}}\, \mathbb{E}\!\left[1 - e^{-\int_t^s h(u)\, du} \mid \mathcal{G}_t\right] \tag{13}$$

In this setting, $h$ is deemed the $\mathbb{G}$-intensity or $\mathbb{G}$-hazard rate. Even in the case where $\tau$ is a $\mathbb{G}$-stopping time ($\mathbb{N} \subset \mathbb{G}$) and thus the $\mathbb{G}$-hazard function is not well defined, similar results can be obtained under certain conditions. Under certain restrictions on the
distributional properties of $\tau$, we can still use point process martingale theory (see Point Processes) to find an increasing $\mathbb{G}$-predictable process $\Lambda$ for which the conditional survival probabilities are given by

$$\mathbb{P}[\tau > s \mid \mathcal{F}_t] = 1_{\{\tau > t\}}\, \mathbb{E}\!\left[e^{\Lambda(t) - \Lambda(s)} \mid \mathcal{G}_t\right] \tag{14}$$

Routinely, if $\Lambda$ is absolutely continuous, then $\Lambda(t) - \Lambda(s)$ in equation (14) can be replaced by its density representation $-\int_t^s \lambda(u)\, du$. The details of such conditions and results, as well as a general theory of hazard processes, are summarized in [1, 7].

Reduced-form Modeling and Other Issues

The importance of the concept of hazard rates (or intensities) lies in the fact that their direct modeling and parametrization is the prevalent industry practice in evaluating credit derivatives. Now that the CDS market has grown to one of great volume and liquidity, the realm of CDS spread modeling has become less of a pricing issue and more of a calibration one. Reduced-form modeling (see Intensity-based Credit Risk Models) refers to valuation methods in which one exogenously specifies the dynamics of an intensity model, much like we would for spot rates, and then calibrates the model parameters to fit the market spread data via a pricing formulation such as equations (4)–(5). A full-fledged model could incorporate features such as premium accrual, dependence of the intensity on stochastic spot rates and the loss rate, and interaction/contagion effects with other names, which were ignored in our expositional formulation.

The assumptions, the underlying informational assumptions in particular, implied by the mere existence of a hazard rate are a nontrivial issue. Not all models admit an intensity process in their given information filtrations. For instance, in the classical first passage structural model under perfect information (see Default Barrier Models) the forward default rate (hazard rate with survival information only) exists, but the intensity process (hazard rate with all available information, i.e., the firm value process) does not. Conceptually, the existence of a positive instantaneous default arrival rate implies a certain imperfection in the observable information, modeled either explicitly through a noisy filtration or implicitly through a totally inaccessible stopping time in the complete filtration. This underlines one of the many issues surrounding the divergence of opinions and efforts for convergence in the reduced-form versus structural literature. Duffie and Singleton [5] and Lando [8] both provide a comprehensive overview of different credit models, while Giesecke [6] specifically outlines the different informational assumptions and their implications in intensity formulations.

References

[1] Bielecki, T.R. & Rutkowski, M. (2001). Credit Risk: Modeling, Valuation and Hedging, Springer.
[2] Bremaud, P. (1981). Point Processes and Queues, Martingale Dynamics, Springer-Verlag.
[3] Brigo, D. & Mercurio, F. (2007). Interest Rate Models - Theory and Practice, With Smile, Inflation and Credit, 2nd Edition, Springer.
[4] Dellacherie, C. & Meyer, P.A. (1982). Probabilities and Potential, North Holland, Amsterdam.
[5] Duffie, D. & Singleton, K. (2003). Credit Risk: Pricing, Measurement and Management, Princeton University Press.
[6] Giesecke, K. (2006). Default and information, Journal of Economic Dynamics and Control 30, 2281–2303.
[7] Jeanblanc, M. & Rutkowski, M. (2000). Modelling of default risk: an overview, in Mathematical Finance: Theory and Practice, Higher Education Press, Beijing, pp. 171–269.
[8] Lando, D. (2004). Credit Risk Modeling: Theory and Applications, Princeton University Press.

Further Reading

International Swaps and Derivatives Association (1997). Confirmation of OTC Credit Swap Transaction Single Reference Entity Non-Sovereign.
International Swaps and Derivatives Association (2002). 2002 Master Agreement.
Tavakoli, J. (2001). Credit Derivatives and Synthetic Structures, A Guide to Instruments and Structures, 2nd Edition, John Wiley & Sons.

Related Articles

Compensators; Credit Default Swaps; Default Barrier Models; Duffie–Singleton Model; Intensity-based Credit Risk Models; Jarrow–Lando–Turnbull Model; Point Processes; Reduced Form Credit Risk Models.

JUNE HO KIM
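As a numerical illustration of equations (8)–(10) of the Hazard Rate article, the following Python sketch evaluates the hazard function and the survival probabilities for a piecewise-constant hazard rate. The piecewise-constant form, the function names, and the knot and rate values are assumptions chosen for illustration only; the article itself does not prescribe any particular parametrization.

```python
import math


def cumulative_hazard(t, knots, rates):
    """Hazard function H(t) = integral_0^t h(u) du.

    Assumes (for illustration) a piecewise-constant hazard rate:
    rates[i] applies on [knots[i], knots[i + 1]); the last rate
    extends to infinity. knots[0] is expected to be 0.
    """
    total = 0.0
    for i, rate in enumerate(rates):
        start = knots[i]
        end = knots[i + 1] if i + 1 < len(knots) else math.inf
        if t <= start:
            break
        total += rate * (min(t, end) - start)
    return total


def survival(t, knots, rates):
    """P[tau > t] = 1 - F(t) = exp(-H(t)), as in equation (8)."""
    return math.exp(-cumulative_hazard(t, knots, rates))


def conditional_survival(t, s, knots, rates):
    """P[tau > s | tau > t] = exp(H(t) - H(s)) for s >= t, as in equation (9)."""
    h_t = cumulative_hazard(t, knots, rates)
    h_s = cumulative_hazard(s, knots, rates)
    return math.exp(h_t - h_s)


if __name__ == "__main__":
    # Hypothetical term structure: 1% up to year 1, 2% up to year 3, 3% after.
    knots = [0.0, 1.0, 3.0]
    rates = [0.01, 0.02, 0.03]
    print("P[tau > 5]           =", survival(5.0, knots, rates))
    print("P[tau > 5 | tau > 2] =", conditional_survival(2.0, 5.0, knots, rates))
    # Default probability between t and s given survival to t, equation (10).
    print("P[2 < tau <= 5 | tau > 2] =",
          1.0 - conditional_survival(2.0, 5.0, knots, rates))
```

For a short horizon, `1 - conditional_survival(t, t + dt, ...)` is close to `h(t) * dt`, which is the first-order interpretation of the hazard rate given in the text.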
Duffie–Singleton Model

The credit risk modeling approach of Duffie and Singleton [8, 9] falls into the class of reduced-form (see Reduced Form Credit Risk Models; Intensity-based Credit Risk Models) or intensity-based models in the sense that default is directly modeled as being triggered by a point process, as opposed to structural models (see Structural Default Risk Models) attempting to explain default through the dynamics of the firm’s capital structure, and the intensity of this process under a risk-neutral probability measure is related to an appropriately defined instantaneous credit spread. In its original construction, it is set out as an econometric model, that is, a model the parameters of which are estimated from the time series of market data, such as the weekly data of swap yields used in [8]. To this end, the model is driven by a set of state variables following a Markov process under the risk-neutral measure, and defaultable zero-coupon bond prices are exponentially affine functions of the state variables along the lines of the results derived by Duffie and Kan [6] for default-free models of the term structure of interest rates (see Affine Models). Duffie and Singleton [9] show that the model framework can be made specific in a way that also allows default intensities and default-free interest rates to be negatively correlated in a manner that is more consistent theoretically than in prior attempts in the literature.

A key assumption of Duffie–Singleton is the modeling of recovery in the event of default as an exogenously given fraction of the market value of the defaultable claim immediately prior to default. Under this assumption, the possibility of default on a claim can be priced by default-adjusting the interest rate with which the future cash flow (or payoff) from the claim is discounted. That is to say that today’s ($t = 0$) value $V_0$ of a claim with the (possibly random) payoff $X$ at time $t = T$ can be calculated as the expectation under the spot risk-neutral measure $Q$,

$$V_0 = E_0^Q\!\left[\exp\!\left(-\int_0^T R_t\, dt\right) X\right] \tag{1}$$

where the discounting is given in terms of the default-adjusted short-rate process $R_t = r_t + h_t L_t$, with $r_t$ the default-free continuously compounded short rate, $h_t$ the default hazard rate, and $L_t$ the fraction of market value lost in the event of default. $\lambda_t = h_t L_t$ can be interpreted as a “risk-neutral mean-loss rate of the instrument due to default.” As a consequence, credit spread data alone (be it corporate bond yields, swap to treasury spreads, or credit default swap spreads) are insufficient to separate the “risk-neutral mean-loss rate” $\lambda_t$ into its hazard rate $h_t$ and loss fraction $L_t$.

The representation (1) lends the model considerable tractability, particularly for applications that do not require the separation of $R_t$ into its components $r_t$, $h_t$, and $L_t$, since $R_t$ could then be modeled directly as a function $\rho(Y_t)$ of a state variable process $Y$ that is Markovian under $Q$. If the payoff of the claim is also Markovian in $Y$, say $X = g(Y_T)$, then the value of the claim at any time $t$ (assuming that default has not occurred by time $t$) can be written as the conditional expectation

$$V_t = E^Q\!\left[\exp\!\left(-\int_t^T \rho(Y_s)\, ds\right) g(Y_T) \,\Big|\, Y_t\right] \tag{2}$$

$\rho(Y_s)$ can be modeled analogously to any one of a number of tractable default-free interest rate term structure models. One possible choice of making the Markovian model specific is along the lines of a multifactor affine term structure model as studied by Dai and Singleton [3], in which $r_t$ and $\lambda_t$ are affine functions of the vector $Y_t$,

$$r_t = \delta_0 + \sum_{i=1}^{N} \delta_i Y_t^{(i)} = \delta_0 + \delta_Y^{\top} Y_t \tag{3}$$

$$\lambda_t = \gamma_0 + \sum_{i=1}^{N} \gamma_i Y_t^{(i)} = \gamma_0 + \gamma_Y^{\top} Y_t \tag{4}$$

and $Y_t$ follows an “affine diffusion”

$$dY_t = K(\Theta - Y_t)\, dt + \Sigma \sqrt{S(t)}\, dW(t) \tag{5}$$

where $W$ is an $N$-dimensional standard Brownian motion under $Q$, $K$ and $\Sigma$ are $N \times N$ matrices (which may, in general, be nondiagonal and asymmetric), and $S(t)$ is a diagonal matrix with the $i$th diagonal element given by

$$[S(t)]_{ii} = \alpha_i + \beta_i^{\top} Y_t \tag{6}$$

If certain admissibility conditions on the model parameters are satisfied [3], it follows from [6] that
default-free and defaultable zero-coupon bond prices are exponential affine functions of the state variables. Duffie and Singleton [9] highlight that modeling $Y$ as a vector of independent components following [2] “square-root diffusions” constrains the joint conditional distribution of $r_t$ and $\lambda_t$ in a manner inconsistent with empirical findings. In particular, the [3] conditions on admissible model parameters imply that such a model cannot produce negative correlation between the default-free interest rate and the default hazard rate. Duffie–Singleton instead propose to use a more flexible specification, which does not suffer from this disadvantage. In its three-factor form, it is given by

$$\alpha = \begin{pmatrix} 0 \\ 0 \\ \alpha_3 \end{pmatrix} \quad \beta_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \quad \beta_2 = \begin{pmatrix} 0 \\ \beta_{22} \\ 0 \end{pmatrix} \quad \beta_3 = \begin{pmatrix} \beta_{31} \\ \beta_{32} \\ 0 \end{pmatrix} \quad \delta_Y = \begin{pmatrix} \delta_1 \\ 1 \\ 1 \end{pmatrix} \quad \gamma_Y = \begin{pmatrix} \gamma_1 \\ \gamma_2 \\ 0 \end{pmatrix} \tag{7}$$

with all coefficients (including $\delta_0$ and $\gamma_0$ in equations (3) and (4)) strictly positive. Furthermore,

$$K = \begin{pmatrix} \kappa_{11} & \kappa_{12} & 0 \\ \kappa_{21} & \kappa_{22} & 0 \\ 0 & 0 & \kappa_{33} \end{pmatrix} \quad \Sigma = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ \sigma_{31} & \sigma_{32} & 1 \end{pmatrix} \tag{8}$$

with the off-diagonal elements of $K$ being nonpositive. This specification ensures strictly positive credit spreads $\lambda_t$ and can represent negative correlation between the increments of $r$ and $\lambda$.

The “recovery-of-market-value” assumption at the core of the Duffie–Singleton framework is in line with market practice for defaultable derivative financial instruments such as swaps. For defaultable bonds, it is arguably more realistic to model the loss in the event of default as a fraction of the par value. However, Duffie and Singleton [9] provide evidence that par yield spreads implied by reduced-form models are relatively robust with respect to different recovery assumptions, and suggest that for bonds trading substantially away from par, pricing differences due to different recovery assumptions can be largely compensated by changes in the recovery parameters. The computational tractability gained through the “recovery-of-market-value” assumption may thus justify accepting its slight inconsistency with legal and market practice.

The parallels of equation (1) to the valuation of contingent claims in default-free interest rate term structure models also extend to the methodology of Heath et al. [10] (HJM). Defining a term structure of “defaultable instantaneous forward rates” $\bar{f}(t, T)$ in terms of defaultable zero-coupon bond prices $\bar{B}(t, T)$ (i.e., the time $t$ price of a bond maturing in $T$) by

$$\bar{B}(t, T) = \exp\!\left(-\int_t^T \bar{f}(t, u)\, du\right) \tag{9}$$

the model can be written in terms of the dynamics of the $\bar{f}(t, T)$, the drift of which under the risk-neutral measure must obey the no-arbitrage restrictions, derived by Heath, Jarrow, and Morton (HJM) in the default-free case. Note that the $\bar{f}$ are “forward rates” only in the sense that equation (9) is analogous to the definition of instantaneous forward rates in the default-free case and their relationship to forward bond prices is less straightforward than for default-free forward rates. That is to say that typically for the forward price $\bar{F}(t, T_1, T_2) = \bar{B}(t, T_2)/B(t, T_1)$ (where $B(t, T)$ is a default-free zero-coupon bond), one has

$$\bar{F}(t, T_1, T_2) \ne \frac{\bar{B}(t, T_2)}{\bar{B}(t, T_1)} = \exp\!\left(-\int_{T_1}^{T_2} \bar{f}(t, u)\, du\right) \tag{10}$$

For the continuously compounded defaultable short rate $\bar{r}(t) = \bar{f}(t, t)$, the no-arbitrage restrictions imply

$$\bar{f}(t, t) = r_t + h_t L_t = R_t \tag{11}$$

which is equal to the default-adjusted short rate given in equation (1). In this sense, the risk-neutral mean-loss rate $h_t L_t$ is equal to the instantaneous credit spread $\bar{r}(t) - r_t$.

Cast in terms of HJM, the model is automatically calibrated to an initial term structure of defaultable discount factors $\bar{B}(t, T)$. This type of straightforward “cross-sectional” calibration makes the model useful not only for the econometric estimation followed by Duffie and Singleton [8] and others such as Duffee [4] and Collin-Dufresne and Solnik [1] but also for the relative pricing of credit derivatives.

The model can be extended in a number of directions, several of which are discussed in [9]. “Liquidity” effects can be modeled by defining a fractional
carrying cost of defaultable instruments, in which case the relevant discount rate $R_t = r_t + h_t L_t + \ell_t$ is adjusted for default and liquidity. The assumption of exogenous default intensity and recovery rate can be lifted, as in [5], by allowing intensities/recovery rates to differ for the counterparties in an over-the-counter (OTC) derivative transaction, with the intensity/recovery rate relevant for discounting determined by which counterparty is in the money. Jumps in the default-adjusted rate can be introduced along the lines of [6] while preserving the tractability of an affine term structure model. The model of single-obligor default considered by Duffie and Singleton [8, 9] can also be extended to the portfolio level using the copula function approach of Schönbucher and Schubert [11], since introducing default correlation through correlated diffusive dynamics of the default intensities $h_t$ for different obligors is typically insufficient, resulting only in very mild correlation of defaults.

Historically, reduced-form models like Duffie–Singleton have been considered to be following a different paradigm than the more fundamental structural models where default is triggered when the value of the firm falls below a barrier taken to represent the firm’s liabilities. However, the two approaches have been reconciled by Duffie and Lando [7], who show that models based on a default intensity can be underpinned by a structural model in which bondholders are imperfectly informed about the firm’s value.

References

[1] Collin-Dufresne, P. & Solnik, B. (2001). On the term structure of default premia in the swap and LIBOR markets, Journal of Finance 56(3), 1095–1115.
[2] Cox, J.C., Ingersoll, J.E. & Ross, S.A. (1985). A theory of the term structure of interest rates, Econometrica 53(2), 385–407.
[3] Dai, Q. & Singleton, K.J. (2000). Specification analysis of affine term structure models, The Journal of Finance 55(5), 1943–1978.
[4] Duffee, G. (1999). Estimating the price of default risk, Review of Financial Studies 12(1), 197–226.
[5] Duffie, D. & Huang, M. (1996). Swap rates and credit quality, Journal of Finance 51(3), 921–949.
[6] Duffie, J.D. & Kan, R. (1996). A yield factor model of interest rates, Mathematical Finance 6(4), 379–406.
[7] Duffie, D. & Lando, D. (2001). Term structures of credit spreads with incomplete accounting information, Econometrica 69(3), 633–664.
[8] Duffie, D. & Singleton, K.J. (1997). An econometric model of the term structure of interest-rate swap yields, The Journal of Finance 52(4), 1287–1322.
[9] Duffie, D. & Singleton, K. (1999). Modeling term structures of defaultable bonds, Review of Financial Studies 12, 687–720.
[10] Heath, D., Jarrow, R. & Morton, A. (1992). Bond pricing and the term structure of interest rates: a new methodology for contingent claims valuation, Econometrica 60(1), 77–105.
[11] Schönbucher, P. & Schubert, D. (2001). Copula Dependent Default Risk in Intensity Models, University of Bonn. Working paper.

Related Articles

Affine Models; Constant Maturity Credit Default Swap; Intensity-based Credit Risk Models; Jarrow–Lando–Turnbull Model; Markov Processes; Multiname Reduced Form Models; Point Processes; Reduced Form Credit Risk Models.

ERIK SCHLÖGL & LUTZ SCHLÖGL
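The default-adjusted discounting of equation (1) in the Duffie–Singleton article can be illustrated with a small Python sketch. It covers only the special case in which the short rate, hazard rate, and loss fraction are deterministic functions of time, so that the expectation in equation (1) reduces to a plain integral; this simplification, the function names, and the numerical inputs are assumptions made for illustration and are not part of the model as published.

```python
import math


def defaultable_zcb_price(r, h, L, T, steps=1000):
    """Price of a defaultable zero-coupon bond paying 1 at T.

    Special case of equation (1) with deterministic inputs (an
    illustrative assumption): exp(-integral_0^T (r(t) + h(t) L(t)) dt).
    r, h, L are callables of time; the integral uses the trapezoid rule.
    """
    dt = T / steps
    integral = 0.0
    for k in range(steps):
        t0, t1 = k * dt, (k + 1) * dt
        rate0 = r(t0) + h(t0) * L(t0)  # default-adjusted short rate R_t
        rate1 = r(t1) + h(t1) * L(t1)
        integral += 0.5 * (rate0 + rate1) * dt
    return math.exp(-integral)


def implied_spread(defaultable_price, default_free_price, T):
    """Continuously compounded yield spread over the default-free bond."""
    return -math.log(defaultable_price / default_free_price) / T


if __name__ == "__main__":
    T = 5.0
    r = lambda t: 0.03
    # Two hypothetical (hazard rate, loss fraction) pairs with the same product.
    price_a = defaultable_zcb_price(r, lambda t: 0.02, lambda t: 0.6, T)
    price_b = defaultable_zcb_price(r, lambda t: 0.04, lambda t: 0.3, T)
    default_free = defaultable_zcb_price(r, lambda t: 0.0, lambda t: 0.0, T)
    print("price with h=2%, L=60%:", price_a)
    print("price with h=4%, L=30%:", price_b)
    print("implied spread:", implied_spread(price_a, default_free, T))
```

Both parameter pairs give the same price because only the product of hazard rate and loss fraction enters the discount rate, which is the identification issue the article describes for credit spread data.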
Jarrow–Lando–Turnbull Model

The credit-risk model of Jarrow, Lando, and Turnbull is based on a Markov chain with finite state space, modeled in discrete or continuous time. Economically, it relies on the appealing interpretation of using different rating classes, which are represented by the states of the Markov chain. Presumably, it is the first credit-risk model that incorporates rating information into the valuation of defaultable bonds and credit derivatives. An advantage of modeling the credit-rating process is that the resulting bond prices explicitly depend on the issuer’s initial rating and possible rating transitions in the future. Moreover, the model allows the pricing of derivatives whose payoffs depend on the credit rating of some reference bond, an application that is not straightforward in intensity-based models or structural-default models.

Technically, the model is formulated on a filtered probability space with a money-market account $B = \{B(t)\}_{0 \le t \le T}$ as numéraire. The state space of the underlying Markov chain is denoted by $S = \{1, \ldots, K\}$, where state $K$ represents default. The other states are identified with rating classes that are ordered according to increasing default risk, that is, state 1 represents the best rating. Transition probabilities from one state to another are specified via a probability matrix $Q$ in discrete time and using a generator matrix $\Lambda$ in continuous time. Multiple defaults are excluded by making the default state absorbing, which corresponds to specific choices of the last rows of $Q$ and $\Lambda$, respectively. The original model achieves

describe the discrete-time case. Denoting the matrix of risk premiums at time $t$ by the $K \times K$-dimensional diagonal matrix $\Pi(t) = \mathrm{diag}(\pi_1(t), \ldots, \pi_{K-1}(t), 1)$, it is assumed that

$$\tilde{Q}(t, t + 1) - I = \Pi(t)(Q - I) \tag{1}$$

where $I$ denotes the $K$-dimensional identity matrix and with assumptions ensuring that $\tilde{Q}(t, t + 1)$ is a probability matrix with absorbing state $K$. It is well known that the $n$-step transition matrix at time $t$ under the martingale measure is given by

$$\tilde{Q}(t, t + n) = \prod_{i=0}^{n-1} \tilde{Q}(t + i, t + i + 1), \quad \forall n \in \mathbb{N} \tag{2}$$

Let $\tau$ denote the random default time and $C(T)$ the random payoff at time $T$ of a credit-risky claim. Then the value $C(t)$ at time $t$ of this contingent claim is given by

$$C(t) = B(t) \cdot \tilde{\mathbb{E}}_t\!\left[C(T)/B(T)\right] \tag{3}$$

with $\tilde{\mathbb{E}}_t$ denoting the conditional expectation with respect to the information at time $t$ under the martingale measure. Under these assumptions, the price $P(t, T)$ of a default-free zero-coupon bond at time $t$, maturing at time $T$, is given by

$$P(t, T) = B(t) \cdot \tilde{\mathbb{E}}_t\!\left[1/B(T)\right] \tag{4}$$

The corresponding price $P_i^d(t, T)$ of a defaultable zero-coupon bond rated $i$ at time $t$ is given by
