TT18 Dissertation Vu 0
Kellogg College
University of Oxford
Trinity 2018
This thesis is dedicated to my parents.
Abstract
We study and implement the equivalent risk contribution portfolio using expected shortfall as the risk measure and a Gaussian mixture to model the asset returns, as proposed by Roncalli et al. in [15]. To estimate the model's parameters, we study the constrained Gaussian mixture model framework from a separate paper by Ari [1]. We compare this model with more traditional approaches, including the equivalent risk contribution portfolio using volatility as the risk measure and the well-known mean-variance portfolio. All algorithms and backtests are implemented in Python and the code is listed in Appendix C.
Contents

1 Introduction  1
  1.1 Objective and Contributions  1
  1.2 Organization of the Thesis  2
  4.2.2 Expected shortfall risk measure  31
  4.2.3 Existence and uniqueness of the portfolio  33
  4.3 Results analysis  37
  4.3.1 Introduction  37
  4.3.2 Stressed regime parameters calibration  38
  4.3.3 Filtering algorithm  39
  4.3.4 Backtesting results  40
  4.3.5 Risk premia portfolio  41
A Mathematics Supplementary  51
  A.1 Convex Optimization  51
  A.2 Generalized Exponential Family of Distributions  52
  A.3 Multinomial Distribution  54
  A.4 Gaussian Distribution  57
C Code Listing  63
Bibliography  80
List of Figures

2.1 The skewness coefficient γ1(X + Y) when the random vector (X, Y) is log-normal. σX = 0.5, σY = 0.5, γ1(X) = 1.8, γ1(Y) = 1.8  7
2.2 The skewness coefficient γ1(X + Y) when the random vector (X, Y) is log-normal. σX = 1.0, σY = 1.0, γ1(X) = 6.2, γ1(Y) = 6.2  8
2.3 The skewness coefficient γ1(X + Y) when the random vector (X, Y) is log-normal. σX = 1.0, σY = 0.5, γ1(X) = 6.2, γ1(Y) = 1.8  8
2.4 The skewness coefficient γ1(X + Y) when the random vector (X, Y) is log-normal. σX = 1.0, σY = 1.5, γ1(X) = 6.2, γ1(Y) = 33.5  9
Chapter 1
Introduction
with two states: a normal state and a stressed state in which the jumps occur. The expected shortfall and the marginal risk contributions in this model exist in analytical form. Once skewness is taken into account, another problem arises: the aggregation of skewness risk premia, which we also study. In particular, investors are exposed to a risk of large drawdowns when they combine strategies with high skewness. Linear correlation is no longer a reliable statistical tool in this setting, so this risk is difficult to mitigate through volatility diversification.
The main contributions of this thesis are as follows. First, we study the skewness-based risk parity model proposed by Roncalli et al. [15]. In that paper, the authors use the Expectation-Maximisation (EM) algorithm to estimate the parameters of the Gaussian mixture model. A difficulty arises in implementing this algorithm: when various constraints are introduced on the parameters, the authors do not specify how to estimate the model's parameters under these constraints. Hence the next contribution is the study and implementation of a constrained Gaussian mixture model framework proposed by Ari in [1]. The novelty lies in using this framework to implement and study how the asset allocation algorithm of [15] behaves under different parameter constraints. Finally, we backtest various portfolios and compare the results with the traditional volatility-based risk parity approach and the mean-variance approach.
Chapter 2
where R is the risk measure of portfolios X and Y. If we use volatility, σ, as the risk measure:

σ(X + Y) = √(σ²(X) + σ²(Y) + 2ρ(X, Y)σ(X)σ(Y)) ≤ σ(X) + σ(Y)  (2.2)
where ρ(X, Y) is the correlation between X and Y. Hence, one way to minimize volatility is to select assets that have low or negative correlations. However, correlation as a dependence measure is only sensible under the assumption that asset returns are Gaussian, which is precisely the challenge when constructing portfolios of alternative risk premia, given the highly negative skewness of those strategies' returns.
and

µ3(X + Y) = µ3(X) + µ3(Y) + 3(cov(X, X, Y) + cov(X, Y, Y)) .  (2.4)
1 We refer the reader to Appendix B for the formal definition of a risk measure.
In addition, we denote cov(X, Y, Z) = E[(X − E[X])(Y − E[Y])(Z − E[Z])]. We can then define the skewness γ1(X + Y) and the coskewness γ1(X, Y, Z) as:

γ1(X + Y) = µ3(X + Y) / µ2(X + Y)^{3/2}  (2.5)

and

γ1(X, Y, Z) = cov(X, Y, Z) / (σ(X)σ(Y)σ(Z)) .  (2.6)
Proposition 1. The skewness coefficient of X + Y is close to the skewness of X if σ(X) ≫ σ(Y), given that X is independent of Y. In the case of dependent random variables, the coskewness coefficients are functions of the correlations ρ(X², Y) and ρ(X, Y²).
Proof. When X and Y are independent, we have:

µ2(X + Y) = σ²(X) + σ²(Y)  (2.7)

and

µ3(X + Y) = µ3(X) + µ3(Y) .  (2.8)

Hence:

γ1(X + Y) = (µ3(X) + µ3(Y)) / (σ²(X) + σ²(Y))^{3/2}
          = γ1(X) σ³(X)/(σ²(X) + σ²(Y))^{3/2} + γ1(Y) σ³(Y)/(σ²(X) + σ²(Y))^{3/2} .  (2.9)
In the general case of dependent variables, we have:

γ1(X + Y) = γ1(X) σ³(X)/σ³(X + Y) + γ1(Y) σ³(Y)/σ³(X + Y) + 3(cov(X, X, Y) + cov(X, Y, Y))/σ³(X + Y) .  (2.10)

Expressing the last term with the coskewness coefficients:

γ1(X + Y) = γ1(X) σ³(X)/σ³(X + Y) + γ1(Y) σ³(Y)/σ³(X + Y) + 3γ1(X, X, Y) σ²(X)σ(Y)/σ³(X + Y) + 3γ1(X, Y, Y) σ(X)σ²(Y)/σ³(X + Y) .  (2.11)
Hence, the skewness of the sum is a weighted average of skewness and coskewness coefficients.
Furthermore, if E[X] = E[Y] = 0, we can obtain that:

γ1(X, X, Y) = cov(X, X, Y) / (σ²(X)σ(Y))
            = E[X²Y − X²E[Y] − 2XY E[X] + 2X E[X]E[Y] + E[X]²Y − E[X]²E[Y]] / (σ²(X)σ(Y))
            = (cov(X², Y) − 2 cov(X, Y)E[X]) / (σ²(X)σ(Y))
            = ρ(X², Y) .  (2.12)
A similar result can be obtained for γ1(X, Y, Y). This shows that the coskewness coefficients are functions of the correlations ρ(X², Y) and ρ(X, Y²).

However, skewness aggregation is complex because we have not accounted for the interdependence among ρ(X, Y), ρ(X², Y) and ρ(X, Y²). To illustrate this, we study an example where an analytical solution exists for γ1(X + Y) [9].
Proposition 2. Let (X, Y) be a random vector that follows a bivariate log-normal distribution, i.e., ln X ∼ N(µX, σX²) and ln Y ∼ N(µY, σY²). The skewness of the sum X + Y is γ1(X + Y) = µ3(X + Y)/µ2(X + Y)^{3/2} and

ρX,Y = (exp(ρσXσY) − 1) / (√(exp(σX²) − 1) √(exp(σY²) − 1))  (2.13)
cov(X, X, Y) = exp(2µX + σX² + µY + σY²/2) × (exp(ρσXσY) − 1) × (exp(σX² + ρσXσY) + exp(σX²) − 2)  (2.14)

cov(X, Y, Y) = exp(2µY + σY² + µX + σX²/2) × (exp(ρσXσY) − 1) × (exp(σY² + ρσXσY) + exp(σY²) − 2)  (2.15)
where ρ denotes the correlation between ln X and ln Y.
and

E[e^Z] = exp(µZ + σZ²/2)  (2.19)

where Z = n ln X + m ln Y, so that

µZ = nµX + mµY  (2.20)

and

σZ² = n²σX² + m²σY² + 2mnρσXσY .  (2.21)

We obtain that:

E[X^n Y^m] = E[e^Z] = exp(nµX + mµY + (n²σX² + m²σY² + 2mnρσXσY)/2) .  (2.22)
It follows that:
cov(X, Y )
ρX,Y =
σX σY
E[XY ] − E[X]E[Y ]
=q q
2 ) × (exp(σ 2 ) − 1) exp(2µ + σ 2 ) × (exp(σ 2 ) − 1
exp(2µX + σX X Y Y Y
2 2
σX σY (2.23)
exp(µX + µY + 2 +q 2 ) × (exp(ρσq
X σY ) − 1)
= 2 2
σX σY 2 2
exp(µX + µY + 2 + 2 ) exp(σX ) − 1 exp(σY ) −1
exp(ρσX σY ) − 1
=q q .
exp(σX2 ) − 1 exp(σ 2 ) − 1
Y
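The correlation formula (2.13) can be sanity-checked by Monte Carlo; a sketch with arbitrary parameter values (µX = µY = 0, σX = σY = 0.5, ρ = 0.5):

```python
import numpy as np

rng = np.random.default_rng(0)
mu_x, mu_y, s_x, s_y, rho = 0.0, 0.0, 0.5, 0.5, 0.5

# closed-form correlation of (X, Y) from eq. (2.13)
rho_xy = (np.exp(rho * s_x * s_y) - 1.0) / np.sqrt(
    (np.exp(s_x**2) - 1.0) * (np.exp(s_y**2) - 1.0))

# Monte Carlo: draw (ln X, ln Y) from a bivariate Gaussian, then exponentiate
cov = [[s_x**2, rho * s_x * s_y], [rho * s_x * s_y, s_y**2]]
z = rng.multivariate_normal([mu_x, mu_y], cov, size=1_000_000)
rho_mc = np.corrcoef(np.exp(z[:, 0]), np.exp(z[:, 1]))[0, 1]

assert abs(rho_xy - rho_mc) < 1e-2
```

Note the attenuation: the correlation of the logs is 0.5, while the correlation of the log-normal pair itself is somewhat lower.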
Figure 2.1: The skewness coefficient γ1(X + Y) when the random vector (X, Y) is log-normal. σX = 0.5, σY = 0.5, γ1(X) = 1.8, γ1(Y) = 1.8
2.3 Conclusion

If we take volatility as the risk measure, there is a monotonically increasing relationship between the correlation parameter and the portfolio volatility. However, there is no such monotonic relationship between the skewness risk measure and the correlation parameter ρX,Y. We can see this in figures 2.1 and 2.2, where the skewness first decreases with the correlation and then increases. Moreover, when one skewness is dominated by the other, as in figures 2.3 and 2.4, the skewness decreases monotonically as the correlation increases. This again contrasts with the behaviour of volatility as the risk measure. Hence, from this simple example, we can see that the problem of skewness aggregation is difficult when the variables are correlated. We can thus conclude that any portfolio optimization technique relying solely on correlation and volatility as the risk measure is in danger of being totally blind to the skewness as well as the convexity
Figure 2.2: The skewness coefficient γ1(X + Y) when the random vector (X, Y) is log-normal. σX = 1.0, σY = 1.0, γ1(X) = 6.2, γ1(Y) = 6.2
Figure 2.3: The skewness coefficient γ1(X + Y) when the random vector (X, Y) is log-normal. σX = 1.0, σY = 0.5, γ1(X) = 6.2, γ1(Y) = 1.8
Figure 2.4: The skewness coefficient γ1(X + Y) when the random vector (X, Y) is log-normal. σX = 1.0, σY = 1.5, γ1(X) = 6.2, γ1(Y) = 33.5
and concavity of asset returns [14]. This motivates us to study the expected shortfall risk measure in the asset allocation model in Chapter 4.
Chapter 3
3.1 Introduction

As mentioned in Chapter 1, we study a risk parity asset allocation model in which asset returns are modelled by a mixture of two Gaussians: one for the normal regime and one for the stressed regime. The Expectation-Maximization (EM) algorithm is a very popular method for estimating the parameters of Gaussian mixture models. It is based on the maximum likelihood principle and converges to a local maximum. However, EM on its own is not sufficient in our case, since it does not allow us to impose constraints on the parameters. In particular, we would like to incorporate prior information into our model in the form of constraints. For example, we may wish to constrain the jump intensity, or to specify the returns and covariance matrices of assets in the stressed regime as affine transformations of those in the normal regime. The original paper by Roncalli et al. [15] does not specify how the constrained maximum likelihood problem is solved when estimating the model's parameters. Hence, in this chapter, we study the recent research paper by Ari [1] and apply its results to our asset allocation model in the next chapter.
where φ0(xn; µ, Σ) is the probability density function (pdf) of the normal distribution with parameters (µ, Σ), xn ∈ R^d, and 1 − λ and λ are the probabilities of xn belonging to the first and the second Gaussian respectively. We would like to use maximum likelihood estimation to estimate our parameters. Let us define:

θ = (λ, µ1, Σ1, µ2, Σ2) .  (3.2)

To find the estimate θ̂, we would like to maximize the log-likelihood function:

ℓ(θ) = Σ_{n=1}^N ln( Σ_{k=1}^2 λk φ(xn; µk, Σk) )  (3.3)

where λ1 = 1 − λ and λ2 = λ.
g(Σk^{-1}) = 1/((2π)^{d/2} |Σk|^{1/2}) exp(−(1/2)(xn − µk)^T Σk^{-1} (xn − µk))
           = |Σk|^{-1/2}/(2π)^{d/2} exp(−(1/2) trace(Σk^{-1} (xn − µk)(xn − µk)^T)) .  (3.8)
Hence we can deduce:

∂g(Σk^{-1})/∂Σk^{-1} = (1/2) |Σk^{-1}|^{-1/2} |Σk^{-1}| Σk (2π)^{-d/2} exp(−(1/2) trace(Σk^{-1}(xn − µk)(xn − µk)^T))
  − (1/2) (xn − µk)(xn − µk)^T |Σk|^{-1/2} (2π)^{-d/2} exp(−(1/2) trace(Σk^{-1}(xn − µk)(xn − µk)^T))
  = ((Σk − (xn − µk)(xn − µk)^T)/2) · 1/((2π)^{d/2}|Σk|^{1/2}) exp(−(1/2)(xn − µk)^T Σk^{-1}(xn − µk)) .  (3.9)
It follows:

∂ℓ(θ)/∂Σk^{-1} = (1/2) Σ_{n=1}^N [λk φ(xn; µk, Σk) / Σ_{s=1}^2 λs φ(xn; µs, Σs)] (Σk − (xn − µk)(xn − µk)^T) .  (3.10)
We obtain that:

Σ̂k = Σ_{n=1}^N λk,n (xn − µ̂k)(xn − µ̂k)^T / Σ_{n=1}^N λk,n .  (3.12)
Next, let c be the Lagrange multiplier ensuring that the mixture probabilities λk sum to unity. The first-order condition implies that:

∂ℓ(θ)/∂λk − c = 0 .  (3.13)

It follows:

Σ_{n=1}^N φ(xn; µk, Σk) / Σ_{s=1}^2 λs φ(xn; µs, Σs) = c .  (3.14)
2. Using the current parameter guesses, calculate the weights λk,n (E-step):

λk,n^{(t)} = λk^{(t)} φ(xn; µk^{(t)}, Σk^{(t)}) / Σ_{s=1}^2 λs^{(t)} φ(xn; µs^{(t)}, Σs^{(t)}) .  (3.16)

3. Using the current weights, maximize the weighted likelihood to get the new parameter estimates (M-step):

λk^{(t+1)} = (1/N) Σ_{n=1}^N λk,n^{(t)} ,  (3.17)

µk^{(t+1)} = Σ_{n=1}^N λk,n^{(t)} xn / Σ_{n=1}^N λk,n^{(t)} ,  (3.18)

Σk^{(t+1)} = Σ_{n=1}^N λk,n^{(t)} (xn − µk^{(t)})(xn − µk^{(t)})^T / Σ_{n=1}^N λk,n^{(t)} .  (3.19)
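The E- and M-steps above can be sketched in NumPy for the two-component case; the function names and the quantile-based initialization are our own choices, not from [15]:

```python
import numpy as np

def gaussian_pdf(x, mu, cov):
    """Multivariate normal density evaluated row-wise on x (shape N x d)."""
    d = x.shape[1]
    diff = x - mu
    quad = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * quad) / norm

def em_gmm2(x, n_iter=200):
    """Unconstrained EM for a two-component Gaussian mixture, eqs (3.16)-(3.19)."""
    n, d = x.shape
    lam = np.array([0.5, 0.5])                    # mixture probabilities
    mu = np.stack([np.quantile(x, 0.25, axis=0),  # crude deterministic init
                   np.quantile(x, 0.75, axis=0)])
    cov = np.stack([np.cov(x.T) + 1e-6 * np.eye(d)] * 2)
    for _ in range(n_iter):
        # E-step, eq. (3.16): responsibilities lambda_{k,n}
        dens = np.stack([lam[k] * gaussian_pdf(x, mu[k], cov[k]) for k in range(2)])
        resp = dens / dens.sum(axis=0)
        # M-step, eqs (3.17)-(3.19): weighted maximum likelihood updates
        nk = resp.sum(axis=1)
        lam = nk / n
        mu = np.stack([resp[k] @ x / nk[k] for k in range(2)])
        for k in range(2):
            diff = x - mu[k]
            cov[k] = (resp[k][:, None] * diff).T @ diff / nk[k] + 1e-9 * np.eye(d)
    return lam, mu, cov
```

On data simulated from two well-separated regimes, the estimated mixture probabilities recover the simulation weights up to label ordering.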
To see this, let us first consider the marginal distribution of y, P(y|θy). Since y follows the multinomial distribution, from section A.3 we know that it can be expressed in exponential family form as
where θx|y=k ∈ R^{d+d²} are the natural parameters, Tx : R^d → R^{d+d²} is the sufficient statistics function and A(θx|y=k) is the log-partition function. Moreover, the conditional distribution P(x|y, θx|y) can also be expressed as:
P(x|y, θx|y) = Π_{k=1}^{K−1} P(x|θx|y=k)^{δ(y=k)} · P(x|θx|y=K)^{1 − Σ_{i=1}^{K−1} δ(y=i)}
            = Π_{k=1}^{K} P(x|θx|y=k)^{δyk}
            = exp( log Π_{k=1}^{K} P(x|θx|y=k)^{δyk} )
            = exp( Σ_{k=1}^{K} δyk log P(x|θx|y=k) )  (3.23)

where δyk is the kth element of the vector δy of delta functions defined as:

δy = (δ(y = 1), . . . , δ(y = K − 1), 1 − Σ_{i=1}^{K−1} δ(y = i))^T .  (3.24)
Now, using eq. (3.22), we can rewrite eq. (3.23) in exponential family form:

P(x|y, θx|y) = exp( Σ_{k=1}^K δyk (θx|y=k^T Tx(x) − A(θx|y=k)) )
            = exp( Σ_{k=1}^K θx|y=k^T (δyk Tx(x)) − Σ_{k=1}^K δyk A(θx|y=k) )
            = exp( Σ_{k=1}^K θx|y=k^T Tx|y=k(x, y) − Σ_{k=1}^K δyk A(θx|y=k) )
            = exp( θx|y^T Tx|y(x, y) − Σ_{k=1}^K δyk A(θx|y=k) )  (3.25)

where

Tx|y(x, y) = (δy1 Tx(x), . . . , δyK Tx(x))
           = (Tx|y=1(x, y), . . . , Tx|y=K(x, y)) ,  (3.27)

A(θx|y, y) = Σ_{k=1}^K δyk A(θx|y=k) .  (3.28)
Finally, using Bayes' rule and the results above, the joint distribution P(x, y|θ) can be written in exponential family form as:
To formulate the Maximization step as a convex optimization problem with a convex constraint set, we need to understand the theoretical underpinnings of the EM algorithm. Let us assume we are given a data set X = {x1, . . . , xN} of N independent and identically distributed (i.i.d.) normally distributed random vectors and the corresponding random variables Y = {y1, . . . , yN} that follow the multinomial distribution. The goal is to maximize the log-likelihood ℓ(θ) = log P(X|θ), where θ are the natural parameters. In continuous form, ℓ(θ) can be written as:

ℓ(θ) = log ∫ P(X, Y|θ) dY = log ∫ Q(Y) (P(X, Y|θ)/Q(Y)) dY ≥ ∫ Q(Y) log (P(X, Y|θ)/Q(Y)) dY = F(Q, θ) .

In the last equation, we have introduced an arbitrary distribution Q(Y) to obtain a lower bound F(Q, θ) on the log-likelihood using Jensen's inequality.
If we consider discrete distributions, then F(Q, θ), the lower bound on the log-likelihood ℓ(θ|xn), is also a function of the log-likelihood ℓ(θ|xn, yn) of the joint distribution of the observed variables X and hidden variables Y, with the distributions Q = {q(y1), . . . , q(yN)} over Y. Hence, for n = 1, . . . , N, we have:

F(q(yn), θ) = Σ_{k=1}^K q(yn = k) log [ P(xn, yn = k|θ) / q(yn = k) ]
            = Σ_{k=1}^K q(yn = k) log [ P(yn = k|xn, θ) P(xn|θ) / q(yn = k) ]
            = Σ_{k=1}^K q(yn = k) log P(xn|θ) + Σ_{k=1}^K q(yn = k) log [ P(yn = k|xn, θ) / q(yn = k) ]  (3.32)
We note that ℓN(θ^{t−1}|X) is not a function of Q. This means that for fixed θ^{t−1}, F(Q, θ^{t−1}) is bounded above by ℓN(θ^{t−1}|X) and achieves that bound when KLN[Q||P(Y|X, θ^{t−1})] is minimized, i.e., we find Q^t such that:

From the expression of the KL divergence term in eq. (3.32), we can see that if the distribution q(yn) is equal to the posterior distribution P(yn|xn, θ^{t−1}), this minimizes the KL divergence and makes it zero. Thus we obtain:

Hence after the E-step, the lower bound function F(P(Y|X, θ^{t−1}), θ^{t−1}) and the log-likelihood function ℓN(θ^{t−1}|X) are equal.
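This E-step argument can be checked numerically on a single observation from a scalar two-component mixture (the observation and parameter values below are arbitrary): every distribution q gives a lower bound F(q, θ) ≤ ℓ(θ|x), and the posterior attains it exactly.

```python
import numpy as np
from scipy.stats import norm

# One observation x from a two-component scalar Gaussian mixture.
x, lam = 0.7, np.array([0.6, 0.4])
mu, sig = np.array([0.0, 2.0]), np.array([1.0, 0.5])

joint = lam * norm.pdf(x, mu, sig)   # P(x, y = k | theta), k = 1, 2
loglik = np.log(joint.sum())         # l(theta | x)

def lower_bound(q):
    """F(q, theta) = sum_k q_k log(P(x, y = k | theta) / q_k), as in eq. (3.32)."""
    return np.sum(q * (np.log(joint) - np.log(q)))

# Any distribution q over the hidden variable gives a lower bound (Jensen) ...
assert lower_bound(np.array([0.5, 0.5])) <= loglik + 1e-12

# ... and the posterior q_k = P(y = k | x, theta) makes the bound tight.
posterior = joint / joint.sum()
assert abs(lower_bound(posterior) - loglik) < 1e-12
```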
3.4.3 Primal problem for the Maximization step

where:

EQ[log P(X, Y|θ)] = (1/N) Σ_{n=1}^N E_{q(yn)}[log P(xn, yn|θ)]  (3.41)

and

HN(Q) = (1/N) Σ_{n=1}^N H[q(yn)] .  (3.42)

The bound function that we try to maximize in the M-step, with respect to θ and for fixed distributions of the hidden variables Q^{(t)}, is:

Since HN(Q^{(t)}) does not depend on θ, we can say that the M-step maximizes the sum of the expected joint log-likelihoods of the observed and hidden variables X, Y:
Since the Gaussian mixture distribution belongs to the exponential family, from eq. (3.29) we can express the joint log-likelihoods as:
λsk for k = 1, . . . , K − 1 and, finally, the expected empirical conditional moments of x|y as νs x|y=k = (1/(λsk N)) Σ_{n=1}^N q(yn = k) Tx(xn) for k = 1, . . . , K. Thus the optimization problem for the M-step can now be written as:

minimize  A(θy) + Σ_{k=1}^K λsk A(θx|y=k) − Σ_{k=1}^{K−1} θy=k νsy=k − Σ_{k=1}^K λsk θx|y=k^T νs x|y=k  (3.47)

where θ ∈ R^n is the optimization variable. From Definition 7, we know that θ, as the argument of the log-partition function A(θ), must belong to a convex set of parameters Cθ on which A(θ) is well defined.
Proof. From Proposition 12, we know that A(θy), A(θx|y=1), . . . , A(θx|y=K) are convex in θ. The sum of convex functions A(θy) + Σ_{k=1}^K λsk A(θx|y=k), defined over the convex set Cθ, is convex in θ. In addition, the expression −Σ_{k=1}^{K−1} θy=k νsy=k − Σ_{k=1}^K λsk θx|y=k^T νs x|y=k is a linear function of θ. The sum of a convex function and a linear function is convex [12]. Hence, using Definition 2, the minimization problem eq. (3.47) is a convex optimisation problem in θ.
As shown above, the M-step is a convex optimization problem with the natural parameters θ as the optimization variable. We will see that a Lagrange dual optimization problem in the moment parameters ν can be formulated.

The Lagrange dual function of the problem in eq. (3.47) is the constant p*, where

p* = inf_{θ ∈ dom A} ( A(θy) + Σ_{k=1}^K λsk A(θx|y=k) − Σ_{k=1}^{K−1} θy=k νsy=k − Σ_{k=1}^K λsk θx|y=k^T νs x|y=k ) .  (3.48)

Consider the equivalent constrained problem

minimize  A(θy) + Σ_{k=1}^K λsk A(θx|y=k) − Σ_{k=1}^{K−1} θ̄y=k νsy=k − Σ_{k=1}^K λsk θ̄x|y=k^T νs x|y=k
subject to  θ̄y = θy ,  λsk θ̄x|y=k = λsk θx|y=k for k = 1, . . . , K  (3.49)

where we have introduced the equality constraints θ̄y = θy and λsk θ̄x|y=k = λsk θx|y=k with the new variable θ̄ ∈ R^n. We can see that the problems in eq. (3.47) and eq. (3.49) are equivalent.
From Definition 5, we can see that the Lagrangian L : R^n × R^n × R^n → R of eq. (3.49) is:

L(θ, θ̄, ν) = A(θy) + Σ_{k=1}^K λsk A(θx|y=k) − Σ_{k=1}^{K−1} θ̄y=k νsy=k − Σ_{k=1}^K λsk θ̄x|y=k^T νs x|y=k + νy^T(θ̄y − θy) + Σ_{k=1}^K νx|y=k^T(λsk θ̄x|y=k − λsk θx|y=k)  (3.50)
where ν = (νy^T, νx|y=1^T, . . . , νx|y=K^T)^T ∈ R^n are the Lagrange multipliers. Using Definition 6, we can define the Lagrange dual function g : R^n → R as:

g(ν) = inf_{θ ∈ dom A, θ̄ ∈ R^n} L(θ, θ̄, ν)
     = inf_{θ ∈ dom A, θ̄ ∈ R^n} ( A(θy) + Σ_{k=1}^K λsk A(θx|y=k) − Σ_{k=1}^{K−1} θ̄y=k νsy=k − Σ_{k=1}^K λsk θ̄x|y=k^T νs x|y=k + νy^T(θ̄y − θy) + Σ_{k=1}^K νx|y=k^T(λsk θ̄x|y=k − λsk θx|y=k) ) .  (3.51)

We can separate θ and θ̄. In addition, in Proposition 13 we have proved the relationship between the log-partition functions A(θy), A(θx|y=1), . . . , A(θx|y=K) and the entropy functions H(νy), H(νx|y=1), . . . , H(νx|y=K):
Since the Lagrangian L is linear in θ̄y, θ̄x|y=1, . . . , θ̄x|y=K, we have g(ν) = −∞ unless νy − νsy = 0 and νx|y=k − νs x|y=k = 0 for k = 1, . . . , K. Hence, the dual function g(ν) is:

g(ν) = { H(νy) + Σ_{k=1}^K λsk H(νx|y=k) ,  if ν = νs
       { −∞ ,  otherwise  (3.54)

where νs is the vector of expected empirical moments νs = (1/N) Σ_{n=1}^N T(xn). We notice that, due to the equality constraint ν = νs, we cannot add new constraints on the moment parameters ν in eq. (3.54). It turns out that eq. (3.54) can be written as an unconstrained optimization problem that is equivalent to eq. (3.47), as shown below.
Proposition 4. The solution of the maximum likelihood problem in eq. (3.47) is the same as that of the unconstrained dual problem:

maximize  H(νy) + Σ_{k=1}^K λsk H(νx|y=k) + νy^T θsy + Σ_{k=1}^K λsk νx|y=k^T θs x|y=k .  (3.55)
Proof. Since both problems are unconstrained optimization problems, we can obtain the solutions by setting the derivatives of the objective functions with respect to the optimization variables to zero. Let us take the first derivative of the maximum likelihood problem eq. (3.47) and set it to zero:

∇θ ( A(θy) + Σ_{k=1}^K λsk A(θx|y=k) − θy^T νsy − Σ_{k=1}^K λsk θx|y=k^T νs x|y=k ) = 0 ,

i.e., component-wise,

∇θy (A(θy) − θy^T νsy) = 0 ,  ∇θx|y=k (λsk A(θx|y=k) − λsk θx|y=k^T νs x|y=k) = 0 for k = 1, . . . , K ,  (3.56)

which gives

∇θy A(θy) = νsy ,  ∇θx|y=k A(θx|y=k) = νs x|y=k for k = 1, . . . , K .  (3.57)

Recall from Proposition 11 that the first derivative of the log-partition function with respect to the natural parameters is equal to the moment parameters: ∇θ A(θ) = ν. We have ∇θy A(θy) = νy and ∇θx|y=k A(θx|y=k) = νx|y=k. Hence,

νy = νsy ,  νx|y=k = νs x|y=k for k = 1, . . . , K .  (3.58)

Hence the derivative is zero when νy = νsy and νx|y=k = νs x|y=k for k = 1, . . . , K. Now let us take the derivative of eq. (3.55):
∇ν ( H(νy) + Σ_{k=1}^K λsk H(νx|y=k) + νy^T θsy + Σ_{k=1}^K λsk νx|y=k^T θs x|y=k ) = 0 ,

i.e., component-wise,

∇νy (H(νy) + νy^T θsy) = 0 ,  ∇νx|y=k (λsk H(νx|y=k) + λsk νx|y=k^T θs x|y=k) = 0 for k = 1, . . . , K ,  (3.59)

which gives

∇νy H(νy) = −θsy ,  ∇νx|y=k H(νx|y=k) = −θs x|y=k for k = 1, . . . , K .  (3.60)

From what is shown in Proposition 13, we know that −∇ν H(ν) = θ. Hence ∇νy H(νy) = −θy and ∇νx|y=k H(νx|y=k) = −θx|y=k. We obtain:

θy = θsy ,  θx|y=k = θs x|y=k for k = 1, . . . , K .  (3.61)

We can see that the derivative is zero when θy = θsy and θx|y=k = θs x|y=k for k = 1, . . . , K. From the relations between the optimal parameters θsy, θs x|y=1, . . . , θs x|y=K and νsy, νs x|y=1, . . . , νs x|y=K via the log-partition functions A and the entropy functions H, we can see that the optimization problem eq. (3.47) and the unconstrained dual problem eq. (3.55) have the same optimal solutions.
Let us denote by P(x, y|θ) the probability density function of the Gaussian mixture model with K Gaussian components, a d-dimensional random vector x ∈ R^d and a random variable y ∈ {1, . . . , K} following a multinomial distribution. In addition, P(x, y|θ) is parameterized by the information parameters θ = {η, m1, S1, . . . , mK, SK} where η ∈ R^{K−1}, mk ∈ R^d, Sk ∈ S^d_+. Next, let us assume that N random vectors of observations X = {x1, . . . , xN} are independent, identically distributed (i.i.d.) and generated from the marginal distribution P(x|θ) = Σ_{k=1}^K P(x, y = k|θ). In contrast to the unconstrained Gaussian mixture model estimation, here we assume that we have a set of constraints denoted by C, which are either convex constraints defined over the information parameters θ = {η, m1, S1, . . . , mK, SK} or over the source parameters ν = {λ, µ1, Σ1, . . . , µK, ΣK} where λ ∈ R^{K−1}, µk ∈ R^d, Σk ∈ S^d_+ for k = 1, . . . , K. Given these constraints, our objective is to find the estimator θ̂ that maximizes the likelihood, i.e., we need to find:

θ̂ = arg max_{θ ∈ C} (1/N) Σ_{n=1}^N ℓ(θ|xn) .  (3.62)
3.5.2 Primal parameterizations for the Maximization step

Using the parameterization of the Gaussian and multinomial distributions in the form of information parameters, we can express the primal problem for the M-step as a convex optimization problem with the information parameters as optimization variables.

From the relationships shown in eq. (A.33) and eq. (A.21), the natural parameters θ can be expressed in terms of the information parameters η, m1, S1, . . . , mK, SK as θy = η and θx|y=k = (mk^T, −vec((1/2)Sk)^T)^T, where we have defined the vector η as
As shown in eq. (A.38) and eq. (A.24), we can express the expected empirical moment parameters νsy, νs x|y=1, . . . , νs x|y=K using the source parameters λs, µs1, Σs1, . . . , ΣsK as νsy = λs and νs x|y=k = (µsk^T, vec(Σsk + µsk µsk^T)^T)^T. In addition, from eq. (A.22) and eq. (A.34), we can write the log-partition functions as:

A(θy) = log(1 + Σ_{k=1}^{K−1} exp ηk) ,  (3.64)

A(θx|y=k) = −(1/2) log |Sk| + (1/2) mk^T Sk^{-1} mk + (d/2) log 2π .  (3.65)
The inner product terms in eq. (3.47) become:

θy^T νsy = η^T λs = Σ_{k=1}^{K−1} ηk λsk ,  (3.66)

θx|y=k^T νs x|y=k = (mk^T, −vec((1/2)Sk)^T) (µsk^T, vec(Σsk + µsk µsk^T)^T)^T
                 = mk^T µsk − (1/2) tr(Sk (Σsk + µsk µsk^T)) .  (3.67)
As a result, the convex optimization problem in eq. (3.47) can be parameterized using the information parameters as an unconstrained optimization problem:

minimize  log(1 + Σ_{k=1}^{K−1} exp ηk) + Σ_{k=1}^K λsk (−(1/2) log |Sk| + (1/2) mk^T Sk^{-1} mk + (d/2) log 2π)
          − Σ_{k=1}^{K−1} ηk λsk − Σ_{k=1}^K λsk (mk^T µsk − (1/2) tr(Sk (Σsk + µsk µsk^T)))  (3.68)
where η ∈ R^{K−1}, mk ∈ R^d, Sk ∈ S^d_+ for k = 1, . . . , K are the optimization variables. The optimization problem depends on the expected sufficient statistics, which are computed a priori after the E-step as the expected probabilities λsk, the expected means µsk and the expected covariance matrices Σsk for k = 1, . . . , K:
λsk = (1/N) Σ_{n=1}^N q(yn = k) ,  (3.69)

µsk = (1/(λsk N)) Σ_{n=1}^N q(yn = k) xn ,  (3.70)

Σsk = (1/(λsk N)) Σ_{n=1}^N q(yn = k)(xn − µsk)(xn − µsk)^T .  (3.71)
Using the definition in eq. (3.62), we can add constraints over the information parameters θ = {η, m1, S1, . . . , mK, SK} to the optimization problem eq. (3.68), requiring θ ∈ Cθ with Cθ defined as the convex constraint set of convex inequality and affine equality constraints on θ.
We can express the dual problem for the M-step using the source parameters. Using the relationships shown in eq. (A.38) and eq. (A.24), we can express the moment parameters ν using the source parameters λ, µ1, Σ1, . . . , µK, ΣK as νy = λ and νx|y=k = (µk^T, vec(Σk + µk µk^T)^T)^T.

The expected natural parameters can be written in terms of the information parameters ηs, ms1, Ss1, . . . , msK, SsK as θsy = ηs and θs x|y=k = (msk^T, −vec((1/2)Ssk)^T)^T. Hence, as shown in eq. (A.39) and eq. (A.28), the entropy functions can be written as:
H(νy) = −Σ_{k=1}^{K−1} λk log λk − (1 − Σ_{k=1}^{K−1} λk) log(1 − Σ_{k=1}^{K−1} λk) ,  (3.72)

H(νx|y=k) = (1/2) log |Σk| + (d/2) log(2πe) .  (3.73)
The inner product terms in eq. (3.55) become:

νy^T θsy = λ^T ηs = Σ_{k=1}^{K−1} λk ηsk ,  (3.74)

νx|y=k^T θs x|y=k = (µk^T, vec(Σk + µk µk^T)^T) (msk^T, −vec((1/2)Ssk)^T)^T
                 = µk^T msk + tr((Σk + µk µk^T)(−(1/2)Ssk))
                 = µk^T msk − (1/2) tr(Σk Ssk) − (1/2) tr(µk µk^T Ssk)
                 = µk^T msk − (1/2) tr(Σk Ssk) − (1/2) µk^T Ssk µk .  (3.75)
As a result, the convex optimization problem in eq. (3.55) can be parameterized using the source parameters as an unconstrained optimization problem:

maximize  −Σ_{k=1}^{K−1} λk log λk − (1 − Σ_{k=1}^{K−1} λk) log(1 − Σ_{k=1}^{K−1} λk)
          + Σ_{k=1}^K λsk ((1/2) log |Σk| + (d/2) log(2πe)) + Σ_{k=1}^{K−1} λk ηsk
          + Σ_{k=1}^K λsk (µk^T msk − (1/2) tr(Σk Ssk) − (1/2) µk^T Ssk µk)  (3.76)
where λ ∈ R^{K−1}, µk ∈ R^d, Σk ∈ S^d_+ for k = 1, . . . , K are the optimization variables. The problem depends on the expected natural parameters

msk = Σsk^{-1} µsk for k = 1, . . . , K ,  (3.78)

Ssk = Σsk^{-1} for k = 1, . . . , K ,  (3.79)
which are calculated after the E-step using the expected probabilities λsk, the expected means µsk and the expected covariance matrices Σsk for k = 1, . . . , K:

λsk = (1/N) Σ_{n=1}^N q^t(yn = k) for k = 1, . . . , K ,  (3.80)

µsk = (1/(λsk N)) Σ_{n=1}^N q^t(yn = k) xn for k = 1, . . . , K ,  (3.81)

Σsk = (1/(λsk N)) Σ_{n=1}^N q^t(yn = k)(xn − µsk)(xn − µsk)^T for k = 1, . . . , K .  (3.82)
Similarly to the primal constrained optimization problem, using the definition in eq. (3.62), we can add constraints over the moment parameters ν = {λ, µ1, Σ1, . . . , µK, ΣK} to the optimization problem eq. (3.76), requiring ν ∈ Cν with Cν defined as the convex constraint set of convex inequality and affine equality constraints on ν. Some example constraints that are frequently used in practice [16] and can be formulated as affine equality or convex inequality constraints on the moment parameters are:
• Constraint relating the covariance matrix Σk to a known covariance matrix Σ̃k via diagonal scaling variables a1, . . . , ad. This can be formulated as linear equality and convex inequality constraints in the variables Σk and ai as

Σk^{i,i} = ai Σ̃k^{i,i} ,
Σk^{i,j} = 0 for i ≠ j ,  (3.83)
Σk ⪰ 0 ,
ai ≥ 0 for i = 1, . . . , d .
• Constraint on the relationship between the mean vector µk ∈ R^d and another known vector µ̃k ∈ R^m via the affine transformation A ∈ R^{d×m}, b ∈ R^d. This can be formulated as linear equality constraints in the variables µk, A, b:

µk = Aµ̃k + b .  (3.84)
• Constraint on the relationship between the covariance matrix Σk and another known covariance matrix Σ̃k ∈ S^m_+ via the transformation A ∈ R^{d×m}.
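To illustrate how such a constraint enters the Maximization step, consider the dual problem (3.76) with the means tied by µ2 = µ1 + b for a known offset b (a two-regime jump-in-mean restriction). Holding the covariances fixed and differentiating the mean terms of (3.76) after the substitution gives a closed-form update. The sketch below is our own illustration, not code from [1] or [15]:

```python
import numpy as np

def constrained_means(lam_s, mu_s, cov_s, b):
    """M-step means under the affine constraint mu_2 = mu_1 + b.

    Maximizes the mean terms of the dual objective (3.76),
    sum_k lam_sk (mu_k' m_sk - (1/2) mu_k' S_sk mu_k) with S_sk = inv(Sigma_sk)
    and m_sk = S_sk mu_sk, after substituting mu_2 = mu_1 + b.
    Setting the gradient in mu_1 to zero gives a linear system.
    """
    S1, S2 = np.linalg.inv(cov_s[0]), np.linalg.inv(cov_s[1])
    lhs = lam_s[0] * S1 + lam_s[1] * S2
    rhs = lam_s[0] * S1 @ mu_s[0] + lam_s[1] * S2 @ (mu_s[1] - b)
    mu1 = np.linalg.solve(lhs, rhs)
    return mu1, mu1 + b

# Expected sufficient statistics after an E-step (illustrative numbers).
lam_s = np.array([0.8, 0.2])
mu_s = [np.array([0.05, 0.02]), np.array([-0.10, -0.06])]
cov_s = [np.eye(2) * 0.01, np.eye(2) * 0.04]

# If b equals the unconstrained gap mu_s2 - mu_s1, the constraint does not
# bind and we recover the unconstrained solution mu_k = mu_sk.
mu1, mu2 = constrained_means(lam_s, mu_s, cov_s, mu_s[1] - mu_s[0])
assert np.allclose(mu1, mu_s[0]) and np.allclose(mu2, mu_s[1])
```

For a binding offset (e.g. b = 0, forcing equal regime means), the update becomes a precision-weighted average of the two expected means.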
3.6 Conclusion

The generalized Expectation-Maximization framework of section 3.4 has allowed the derivation and parameterization of the Maximization step as a convex optimization problem with a convex constraint set, as presented in section 3.5. The novelty of this thesis is that we apply the result in eq. (3.76) to study the expected shortfall parity model presented in the next chapter.
Chapter 4
We refer the reader to Appendix B for an introduction to risk measures, the principle of risk allocation and the notation used in this chapter. The main idea of risk allocation is to achieve diversification by allocating assets based on their risk contributions to the whole portfolio.
Definition 1. Let bi be the proportion of the portfolio-wide risk R(x) that we want the risk contribution of the i-th asset to equal. We can define the solution x* of the risk budgeting portfolio as in [11]:

x* = {x ∈ [0, 1]^n : Σ_{i=1}^n xi = 1, RCi = xi ∂R(x)/∂xi = bi R(x)} .  (4.1)
Proposition 5. For strictly positive risk budgets and a convex risk measure R, the solution of eq. (4.1) in Definition 1 satisfies the following optimisation problem:
Proof. The Lagrange function is:

L(y; ζ, ζc) = R(y) − ζ^T y − ζc (Σ_{i=1}^n bi ln yi − c)  (4.3)

where ζ ∈ R^n and ζc ∈ R are the Lagrange multipliers. Hence, the solution y* has to satisfy the condition:

∂L(y; ζ, ζc)/∂yi = ∂R(y)/∂yi − ζi − ζc bi/yi = 0 .  (4.4)

The complementary slackness property implies that we must have:

ζi yi = 0 ,
ζc (Σ_{i=1}^n bi ln yi − c) = 0 .  (4.5)

Since we know that yi cannot be zero, we must have yi > 0 and ζi = 0. This also implies that we must have ζc > 0 and hence Σ_{i=1}^n bi ln yi = c. Eq. (4.4) becomes:

yi ∂R(y)/∂yi = ζc bi .  (4.6)

Hence, from the definition of the risk contribution, we can verify that:

RCi = ζc bi .  (4.7)

We can conclude that the optimisation problem has a solution and that it is unique.
1 For the equivalent risk contribution (ERC) portfolio, we would have the risk budgets bi = 1/n and the normalized weights xi* = yi* / Σ_{j=1}^n yj*.
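Proposition 5 suggests a direct numerical recipe for the ERC portfolio: minimize R(y) penalized by the logarithmic barrier Σi bi ln yi, then normalize. A sketch for the volatility risk measure using scipy (the solver choice and the covariance numbers are ours for illustration, not necessarily those of Appendix C); for two uncorrelated assets the ERC weights are proportional to the inverse volatilities, which gives a quick sanity check:

```python
import numpy as np
from scipy.optimize import minimize

def erc_weights(cov, budgets=None):
    """Risk-budgeting weights for the volatility risk measure.

    Solves min_y sqrt(y' cov y) - sum_i b_i ln(y_i), the logarithmic-barrier
    formulation of Proposition 5 (with zeta_c = 1), then normalizes y.
    """
    n = cov.shape[0]
    b = np.full(n, 1.0 / n) if budgets is None else np.asarray(budgets)

    def objective(y):
        return np.sqrt(y @ cov @ y) - b @ np.log(y)

    res = minimize(objective, np.full(n, 1.0 / n),
                   bounds=[(1e-8, None)] * n, method="L-BFGS-B",
                   options={"ftol": 1e-14, "gtol": 1e-10})
    y = res.x
    return y / y.sum()

# Two uncorrelated assets with 20% and 10% volatility: ERC weights are
# proportional to 1/sigma_i, i.e. (1/3, 2/3).
w = erc_weights(np.diag([0.20**2, 0.10**2]))
assert np.allclose(w, [1 / 3, 2 / 3], atol=1e-3)
```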
Before considering the relationship between the jump risk and the skewness risk of our model, we first define the model of the asset returns using a Gaussian mixture as in [15]. Let R(x) be the return of the portfolio.

Let µ and Σ respectively be the vector of asset returns and the covariance matrix in the normal regime. The normal regime is modelled by Y1 ∼ N(µ1(x), σ1²(x)) where µ1(x) = x^T µ and σ1²(x) = x^T Σ x, and the stressed regime, with jump probability λ, is modelled by Y2 ∼ N(µ2(x), σ2²(x)) with µ2(x) = x^T (µ + µ̃) and σ2²(x) = x^T (Σ + Σ̃) x. Thus we have the following density function for R(x):

f(y) = (1 − λ) f1(y) + λ f2(y)
     = (1 − λ) φ0(y; µ1(x), σ1(x)) + λ φ0(y; µ2(x), σ2(x))  (4.9)
     = (1 − λ) (1/σ1(x)) φ((y − µ1(x))/σ1(x)) + λ (1/σ2(x)) φ((y − µ2(x))/σ2(x))

where we have used φ0(z; µ, σ) to denote the probability density function of the normal distribution with parameters (µ, σ) and φ(z) to denote the standard normal probability density function.
Proposition 6. The skewness of the asset returns model in eq. (4.8) is given by:

γ1(Y) = λ(1 − λ) [ (2λ − 1)(µ1 − µ2)³ + 3(µ1 − µ2)(σ1² − σ2²) ] / [ (1 − λ)σ1² + λσ2² + λ(1 − λ)(µ1 − µ2)² ]^{3/2} .  (4.10)
Recall that for normally distributed variables, we have E[Yi ] = µi , E[Yi2 ] = µ2i + σi2 and
E[Yi3 ] = µ3i + 3µi σi2 . We obtain:
= E[Y 3 ] − 3E[Y ]σ 2 (Y ) − E3 [Y ]
= (1 − λ)(µ31 + 3µ1 σ12 ) + λ(µ32 + 3µ2 σ22 )
(4.14)
− 3 (1 − λ)µ1 + λµ2 (1 − λ)σ12 + λσ22 + (1 − λ)λ(µ1 − µ2 )2
3
− (1 − λ)µ1 + λµ2
= (1 − λ)λ(2λ − 1)(µ1 − µ2 )3 + 3λ(1 − λ)(µ1 − µ2 )(σ12 − σ22 ) .
Thus we obtain:
Y − E[Y ] 3
γ1 (Y ) = E
σ(Y )
3 2 2
(1 − λ)λ (2λ − 1)(µ1 − µ2 ) + 3(µ1 − µ2 )(σ1 − σ2 ) (4.15)
= 3 .
2
(1 − λ)σ12 + λσ22 + λ(1 − λ)(µ1 − µ2 )2
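The closed-form skewness can be checked against a Monte Carlo simulation of the two-regime mixture; the parameter values below are illustrative:

```python
# Monte Carlo check of the closed-form skewness in eq. (4.15); the regime
# parameters below are illustrative.
import numpy as np

def mixture_skewness(lam, m1, s1, m2, s2):
    num = lam * (1 - lam) * ((2 * lam - 1) * (m1 - m2) ** 3
                             + 3 * (m1 - m2) * (s1 ** 2 - s2 ** 2))
    var = (1 - lam) * s1 ** 2 + lam * s2 ** 2 + lam * (1 - lam) * (m1 - m2) ** 2
    return num / var ** 1.5

rng = np.random.default_rng(0)
lam, m1, s1, m2, s2 = 0.1, 0.05, 0.1, -0.2, 0.3
stressed = rng.random(1_000_000) < lam
y = np.where(stressed, rng.normal(m2, s2, stressed.size),
             rng.normal(m1, s1, stressed.size))
mc_skew = np.mean((y - y.mean()) ** 3) / y.std() ** 3
```

With a stressed regime centred below the normal one, both the formula and the simulation produce a negative skewness.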
Figure 4.1: Skewness coefficient γ_1(Y) as a function of the normal regime volatility.

Figure 4.2: Skewness coefficient γ_1(Y) as a function of the jump regime volatility.
In figure 4.1, we plot the skewness coefficient against the volatility in the normal regime while keeping the other parameters constant. As we can see, γ_1(Y) increases with σ_1. This is because, as the normal volatility increases, it becomes harder to distinguish jumps from normal movements of the returns; in other words, jumps have little influence on the distribution of the returns. On the other hand, the absolute value of γ_1(Y) is highest when the portfolio volatility is low, because then jumps can dramatically change the distribution of the returns. These conclusions are also confirmed in [9] and [15]. From figure 4.2 we can also observe that the magnitude of γ_1(Y) increases with the jump regime volatility.
Proposition 7. The expected shortfall for the return model in eq. (4.8) has the form:

ES_α(x) = (1−λ) ϕ(VaR_α(x), µ_1(x), σ_1(x)) + λ ϕ(VaR_α(x), µ_2(x), σ_2(x))   (4.16)

where

ϕ(a, b, c) = c/(1−α) φ((a+b)/c) − b/(1−α) Φ(−(a+b)/c) .   (4.17)

Proof. Let Y ∼ N(µ, σ²). Consider:

E[1{Y ≥ a} Y] = ∫_a^∞ y (1/σ) φ((y−µ)/σ) dy
= ∫_{σ⁻¹(a−µ)}^∞ (µ + σt) φ(t) dt
= µ [Φ(t)]_{σ⁻¹(a−µ)}^∞ + (σ/√(2π)) ∫_{σ⁻¹(a−µ)}^∞ t exp(−t²/2) dt
= µ (1 − Φ((a−µ)/σ)) + (σ/√(2π)) [−exp(−t²/2)]_{σ⁻¹(a−µ)}^∞
= µ Φ(−(a−µ)/σ) + σ φ((a−µ)/σ)   (4.18)

where we have used the change of variable t = (y−µ)/σ. From eq. (4.8), we can see that the density function of the portfolio loss L(x) = −R(x) is

g(y) = (1−λ)/σ_1(x) φ((y + µ_1(x))/σ_1(x)) + λ/σ_2(x) φ((y + µ_2(x))/σ_2(x))   (4.19)

i.e. a mixture whose regime losses are N(−µ_1(x), σ_1²(x)) and N(−µ_2(x), σ_2²(x)). Thus, applying the result from eq. (4.18) to each regime, we obtain eq. (4.16).
It follows that the value-at-risk satisfies:

∫_{−∞}^{VaR_α(x)} g(y) dy = α .   (4.22)

Thus VaR_α(x) can be found with a bisection algorithm by solving the equation

(1−λ) Φ((VaR_α(x) + µ_1(x))/σ_1(x)) + λ Φ((VaR_α(x) + µ_2(x))/σ_2(x)) = α .   (4.23)
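A minimal sketch of this root-finding step (illustrative regime parameters; scipy's `brentq` is used here as the bracketing solver in place of a hand-written bisection loop):

```python
# Sketch of the root-finding step in eq. (4.23); the regime parameters in
# the test values are illustrative.
from scipy.optimize import brentq
from scipy.stats import norm

def mixture_var(alpha, lam, m1, s1, m2, s2):
    # CDF of the loss L = -R, whose regime losses are N(-m_i, s_i^2)
    loss_cdf = lambda v: ((1 - lam) * norm.cdf((v + m1) / s1)
                          + lam * norm.cdf((v + m2) / s2))
    return brentq(lambda v: loss_cdf(v) - alpha, -10.0, 10.0)
```

For λ = 0 this reduces to the single-regime Gaussian value-at-risk −µ_1(x) + Φ⁻¹(α)σ_1(x) of eq. (B.4).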
It can also be shown that there is an analytical expression for the marginal risk contribution under the expected shortfall measure. Let us define

h_i(x) = (VaR_α(x) + µ_i(x)) / σ_i(x) .   (4.24)

Hence, using the result from eq. (B.11) and differentiating eq. (4.23) with respect to x, we obtain the gradient of the value-at-risk:

∂_x VaR_α(x) = − Σ_i ω̄_i(x) (∂_x µ_i(x) − h_i(x) ∂_x σ_i(x)) / Σ_i ω̄_i(x)

where

ω̄_i(x) = π_i φ(h_i(x)) / σ_i(x) ,   (4.30)

with π_1 = 1−λ and π_2 = λ. Finally, we can deduce the marginal risk contribution ∂_x ES_α(x) by differentiating the above expression of ES_α(x) with respect to x:

∂_x ES_α(x) = (1−λ)/(1−α) [∂_x σ_1(x) φ(h_1(x)) − σ_1(x) h_1(x) φ(h_1(x)) ∂_x h_1(x)]
            − (1−λ)/(1−α) [∂_x µ_1(x) Φ(−h_1(x)) − µ_1(x) φ(h_1(x)) ∂_x h_1(x)]
            + λ/(1−α) [∂_x σ_2(x) φ(h_2(x)) − σ_2(x) h_2(x) φ(h_2(x)) ∂_x h_2(x)]
            − λ/(1−α) [∂_x µ_2(x) Φ(−h_2(x)) − µ_2(x) φ(h_2(x)) ∂_x h_2(x)] .   (4.32)
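The closed-form expected shortfall of eqs. (4.16)–(4.17) can be sketched and sanity-checked against a direct numerical integral of the loss density (illustrative parameters):

```python
# Sketch of the closed-form expected shortfall of eqs. (4.16)-(4.17),
# reusing the loss CDF root for VaR (illustrative parameters in the checks).
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def mixture_es(alpha, lam, m1, s1, m2, s2):
    loss_cdf = lambda v: ((1 - lam) * norm.cdf((v + m1) / s1)
                          + lam * norm.cdf((v + m2) / s2))
    var = brentq(lambda v: loss_cdf(v) - alpha, -10.0, 10.0)
    # phi(a, b, c) from eq. (4.17)
    phi = lambda a, b, c: (c * norm.pdf((a + b) / c)
                           - b * norm.cdf(-(a + b) / c)) / (1 - alpha)
    return (1 - lam) * phi(var, m1, s1) + lam * phi(var, m2, s2)
```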
Recall that if we use the volatility risk measure, then since R(y) ≥ 0, the risk parity portfolio always exists, is unique, and is the solution of the optimization problem (4.2). The existence of a solution is more complex when we take a standard deviation-based risk measure into account, since we may have lim_{y→∞} R(y) = −∞. When a standard deviation-based risk measure is used, we must add another constraint to ensure the existence of the solution:

R(x) ≥ 0 .   (4.33)

This is equivalent to requiring the scaling factor c to be greater than the maximum Sharpe ratio, i.e.

c > max( sup_{x∈[0,1]^n} SR(x), 0 ) .   (4.34)
To understand this, we study the relationship between the risk contribution, the performance contribution and the volatility contribution, as in [13]. Let us define the risk contribution RC_i of asset i as:

RC_i = −µ_i(x) + c σ_i(x) .   (4.35)

Proposition 8. The risk contribution of asset i is the weighted average of the performance contribution and the volatility contribution:

RC_i* = (1−ω) PC_i* + ω VC_i*   (4.39)

where ω = cσ(x) / (−µ(x) + cσ(x)).
Proof. We have:

RC_i* = (1−ω) PC_i* + ω VC_i*
= [−µ(x) / (−µ(x) + cσ(x))] · (µ_i x_i / Σ_{j=1}^n x_j µ_j) + [cσ(x) / (−µ(x) + cσ(x))] · (x_i (Σx)_i / xᵀΣx)
= −µ_i x_i / (−µ(x) + cσ(x)) + c x_i (Σx)_i / (σ(x)(−µ(x) + cσ(x)))   (4.40)
= (1 / R(x)) (−µ_i x_i + c x_i (Σx)_i / σ(x))
= (−µ_i(x) + c σ_i(x)) / R(x) .

Furthermore, we can compute ∂ω/∂c:

∂ω/∂c = [σ(x)(−µ(x) + cσ(x)) − cσ(x)σ(x)] / (−µ(x) + cσ(x))²
      = −µ(x)σ(x) / (−µ(x) + cσ(x))² .   (4.41)
As a result, we can see that if c = 0 then ω = 0, and ω is a decreasing function of c up to the value c* = µ(x)/σ(x), which is the Sharpe ratio of the portfolio. When c > c*, ω is positive and approaches 1 as c approaches ∞. When c is lower than the Sharpe ratio of the portfolio, the risk contribution is return-based and hence can be negative. To guarantee that the solution of problem (4.2) exists, we must have c > c*, i.e. the risk contribution is volatility-based and always positive.
To study the existence and uniqueness of the solution for our model, we need to write the expected shortfall in the form of a standard deviation-based risk measure and find the scaling factor c. We begin by finding the lower bound of the value-at-risk as in [15]. We assume the normal case where the confidence level α is greater than the jump probability λ, since in practice α is higher than 50% while λ is lower than 50%.

Proposition 9. Assuming that α ≥ λ, the lower bound VaR⁻ of the value-at-risk is:

VaR⁻ = −µ_1(x) + Φ⁻¹((α−λ)/(1−λ)) σ_1(x) .   (4.42)
Proof. Let L_1(x) ∼ N(−µ_1(x), σ_1²(x)) and L_2(x) ∼ N(−µ_2(x), σ_2²(x)) be the losses of the two regimes, and g_1(y), g_2(y) the density functions of these losses respectively. The value-at-risk at confidence level α is defined by:

∫_{−∞}^{VaR_α(x)} g(y) dy = α   (4.43)

where g(y) = (1−λ)g_1(y) + λg_2(y). Hence, we can also deduce:

∫_{VaR_α(x)}^∞ [(1−λ)g_1(y) + λg_2(y)] dy = 1−α .   (4.44)

It follows that:

∫_{VaR_α(x)}^∞ g_1(y) dy ≤ (1−α)/(1−λ) .   (4.46)

Let 1−α′ = (1−α)/(1−λ). Since we make the assumption that α ≥ λ, we can deduce that:

∫_{VaR_α(x)}^∞ g_1(y) dy ≤ 1−α′ = ∫_{VaR¹_{α′}(x)}^∞ g_1(y) dy   (4.47)

where VaR¹_{α′} is the value-at-risk at confidence level α′ of the portfolio under the first regime. Since ∫_a^∞ g_1(y) dy is a decreasing function of a, we obtain VaR_α(x) ≥ VaR¹_{α′}(x). This means that VaR¹_{α′} is a lower bound of the value-at-risk. Using the result from eq. (B.4), we can write this lower bound as VaR⁻ = −µ_1(x) + Φ⁻¹((α−λ)/(1−λ)) σ_1(x).
Then, we can use this result to find the lower bound of the expected shortfall.

Proposition 10. The lower bound of the expected shortfall, denoted ES⁻, is given by:

ES⁻ = −(1+λ)µ_1(x) + [ (1−λ)/(1−α) φ(Φ⁻¹((α−λ)/(1−λ))) + λ Φ⁻¹((α−λ)/(1−λ)) ] σ_1(x) .   (4.49)
Proof. The expected shortfall is defined as:

ES_α(x) = 1/(1−α) ∫_{VaR_α(x)}^∞ y g(y) dy
        = (1−λ)/(1−α) ∫_{VaR_α(x)}^∞ y g_1(y) dy + λ/(1−α) ∫_{VaR_α(x)}^∞ y g_2(y) dy .   (4.50)

Assuming the worst-case scenario where g_2(y) = δ(y − VaR_α(x)), we can deduce that:

∫_{VaR_α(x)}^∞ y g_2(y) dy ≥ (1−α) VaR_α(x) ≥ (1−α) VaR¹_{α′} .   (4.51)

Since the expected shortfall is an increasing function of the value-at-risk and of the confidence level, we also have:

∫_{VaR_α(x)}^∞ y g_1(y) dy ≥ ∫_{VaR¹_{α′}(x)}^∞ y g_1(y) dy = (1−α′) ES¹_{α′}   (4.52)

where ES¹_{α′} is the expected shortfall at confidence level α′ under the first regime. Combining the results from eq. (4.51) and eq. (4.52), we obtain:

ES_α(x) ≥ (1−λ)/(1−α) (1−α′) ES¹_{α′}(x) + λ/(1−α) (1−α) VaR¹_{α′}(x) .   (4.53)

Since 1−α′ = (1−α)/(1−λ), this simplifies to:

ES_α(x) ≥ ES¹_{α′}(x) + λ VaR¹_{α′}(x) .   (4.54)

Using the standard deviation forms of the expected shortfall and the value-at-risk from eq. (B.5) and eq. (B.4), the lower bound ES⁻ can be written as:

ES⁻ = [−µ_1(x) + (1−λ)/(1−α) φ(Φ⁻¹((α−λ)/(1−λ))) σ_1(x)] + λ [−µ_1(x) + Φ⁻¹((α−λ)/(1−λ)) σ_1(x)]
    = −(1+λ)µ_1(x) + [ (1−λ)/(1−α) φ(Φ⁻¹((α−λ)/(1−λ))) + λ Φ⁻¹((α−λ)/(1−λ)) ] σ_1(x) .   (4.55)
As a result, using the discussion after proposition 8, we know that the RB portfolio exists and is unique if

c ≥ SR⁺ = max( sup_{x∈[0,1]^n} SR(x), 0 ) .   (4.56)

Since the expected shortfall is an increasing function of α, in our case this means that:

α ≥ max(α⁻, 0) .   (4.57)
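The lower bounds of propositions 9 and 10 can be sketched directly; as a sanity check, for λ → 0 both reduce to the single-regime Gaussian VaR and ES of eqs. (B.4)–(B.5) (the numerical values used below are illustrative):

```python
# Sketch of the lower bounds in Propositions 9-10; for lam -> 0 they reduce
# to the single-regime Gaussian VaR and ES of eqs. (B.4)-(B.5).
from scipy.stats import norm

def var_lower(alpha, lam, m1, s1):
    return -m1 + norm.ppf((alpha - lam) / (1 - lam)) * s1

def es_lower(alpha, lam, m1, s1):
    q = norm.ppf((alpha - lam) / (1 - lam))
    return -(1 + lam) * m1 + ((1 - lam) / (1 - alpha) * norm.pdf(q) + lam * q) * s1
```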
In this section we study the ERC portfolio using expected shortfall (ES) and a Gaussian Mixture Model (GMM) for asset returns, as presented in the previous section. We start with a portfolio of 3 underlyings representing 3 asset classes. The first asset is a US volatility carry strategy, which tries to capture the risk premium of implied volatility being higher than realized volatility most of the time, by selling delta-hedged at-the-money straddles on the S&P 500 Index. The second asset is US equity, where we use the S&P 500 Total Return Index as the proxy. The third asset is US bonds, where we use the performance of the 10-year US Treasury Note. The time series of these 3 assets are plotted in figure 4.3. We can observe from the volatility carry time series that there are irregular jumps caused by spikes in realized volatility in stressed market conditions. We evaluate this algorithm by comparing its performance, risk management and weight turnover with the default ERC portfolio using volatility as the risk measure and the Markowitz mean-variance algorithm.

To evaluate these algorithms, we first assume that the portfolio weights are re-balanced monthly on the first business date of the month. In addition, let N_rw be the rolling window of 250 business dates over which we calculate the weekly returns of the underlyings. For the ERC portfolio using ES with GMM, to set up the portfolio weight optimization as in eq. (4.2) with the risk measure R(x) = ES_α(x) given by the expression in eq. (4.31), we need to estimate the parameters (µ̂_n, Σ̂_n) on each re-balancing date n, i.e.

(µ̂_n, Σ̂_n) = arg max_{(µ_n, Σ_n)} Σ_{s=1}^{N_rw} ln[ (1−λ) φ_0(R_{n−s}; µ_n, Σ_n) + λ φ_0(R_{n−s}; µ_n + µ̃, Σ_n + Σ̃) ] .   (4.59)
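The objective in eq. (4.59) is a plain mixture log-likelihood; a minimal sketch (the helper name and the data in the check are illustrative assumptions) is:

```python
# Sketch of the objective in eq. (4.59): the mixture log-likelihood over a
# rolling window of multivariate returns, with (mu_t, cov_t) and lam fixed.
# Function name and test data are illustrative.
import numpy as np
from scipy.stats import multivariate_normal

def mixture_loglik(returns, mu, cov, mu_t, cov_t, lam):
    f1 = multivariate_normal.pdf(returns, mu, cov)                 # normal regime
    f2 = multivariate_normal.pdf(returns, mu + mu_t, cov + cov_t)  # stressed regime
    return np.sum(np.log((1 - lam) * f1 + lam * f2))
```

In the thesis this objective is maximized over (µ_n, Σ_n) on each re-balancing date, using the constrained EM machinery of chapter 3.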
Figure 4.3: Cumulative PnL of bonds, equities and volatility carry strategies
To solve the problem in eq. (4.59), we first need to calibrate the parameters (µ̃, Σ̃) from historical data using maximum likelihood estimation. The default unconstrained parameter estimation of section 3.2 using the EM algorithm only reaches a local maximum and does not allow us to control the jump probability. In addition, since we have a bond asset in our portfolio, it makes more sense to impose the same return and volatility in the normal and stressed market regimes for the US 10-year Note. As a result, utilizing the constrained mixture of Gaussians framework that we studied in chapter 3, we can estimate (µ̃, Σ̃) by maximum likelihood, solving the convex optimization problem eq. (3.76) with the following constraints:

λ = 0.02 ,   (4.60)
Period                      µ                      µ + µ̃
[2003-01-01, 2004-01-01]    (0.18, 0.26, 0.03)     (−0.10, −0.15, 0.03)
[2005-01-01, 2009-01-01]    (0.10, 0.15, 0.03)     (−0.06, −0.09, 0.03)
[2003-01-01, 2018-01-01]    (0.09, 0.16, 0.03)     (−0.05, −0.08, 0.03)

Table 4.1: Parameter estimates of µ̃ under different historical periods with λ = 0.02
Period                      ρ (rows of lower triangle)           ρ̃ (rows of lower triangle)
[2005-01-01, 2006-01-01]    (1), (0.19, 1), (−0.06, −0.35, 1)    (1), (0.25, 1), (−0.02, −0.23, 1)
[2005-01-01, 2009-01-01]    (1), (0.39, 1), (−0.31, −0.33, 1)    (1), (0.51, 1), (−0.03, −0.08, 1)
[2005-01-01, 2018-01-01]    (1), (0.40, 1), (−0.25, −0.35, 1)    (1), (0.44, 1), (−0.06, −0.11, 1)

Table 4.2: Correlation matrices under the normal and stressed regimes for different historical periods with λ = 0.02
From this example, we can see that the estimated values of (µ̃, Σ̃) are stable when we use a long enough period that contains a stressed event, e.g. the 2008 financial crisis.
On each re-balancing date n, given λ, µ̃ and Σ̃, we would like to estimate the parameters µ̂_n and Σ̂_n. Let N_rw be the length of the rolling window. We say that a jump is detected at observation n−s, for s = 1, …, N_rw, if the filtering probability λ̂_{n−s} is larger than a given threshold λ*. Note that the λ̂_{n−s} are based on the estimates µ̂_{n−1} and Σ̂_{n−1} calculated at time n−1. The covariance estimate then uses only the observations without a detected jump:

Σ̂_n = (1/n̂) Σ_{s=1}^{N_rw} 1{λ̂_{n−s} ≤ λ*} (R_{n−s} − µ̂_n)(R_{n−s} − µ̂_n)ᵀ   (4.66)

where n̂ is

n̂ = Σ_{s=1}^{N_rw} 1{λ̂_{n−s} ≤ λ*} .   (4.67)

These estimates can then be used to calculate λ̂_{n+1−s} on the next re-balancing date n+1.
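A sketch of this filtering step follows. The filtering probability λ̂ is computed here as the posterior probability of the stressed regime given an observation, which is our reading of the definition elided above; the covariance update follows eq. (4.66):

```python
# Sketch of the filtering step. lambda_hat is computed as the posterior
# probability of the stressed regime given an observation (our reading of
# the filtering probability); the covariance update follows eq. (4.66).
import numpy as np
from scipy.stats import multivariate_normal

def jump_probability(r, mu, cov, mu_t, cov_t, lam):
    f1 = multivariate_normal.pdf(r, mu, cov)
    f2 = multivariate_normal.pdf(r, mu + mu_t, cov + cov_t)
    return lam * f2 / ((1 - lam) * f1 + lam * f2)

def filtered_covariance(returns, jump_probs, mu_hat, lam_star):
    keep = jump_probs <= lam_star              # 1{lambda_hat <= lambda*}
    demeaned = returns[keep] - mu_hat
    return demeaned.T @ demeaned / keep.sum()  # eq. (4.66)
```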
Using the parameter estimate Σ̂_n and the calibrated (µ̃, Σ̃) obtained from the calibration step and the filtering algorithm, we can set up the optimization problem as described in eq. (4.2). Note that even though we estimated µ̂_n in the filtering algorithm, we only keep the covariance structure of the returns and set µ̂_n to the zero vector. In this way, we remove the bias of the return estimation and make our algorithm comparable to the ERC portfolio using volatility as the risk measure, as discussed in section 4.2.3.

Before comparing these two approaches, we recall that the ERC portfolio using R(x) = σ(x) as the risk measure is one of the most popular asset allocation algorithms in the asset management industry because of its simplicity and robustness compared to the Markowitz mean-variance approach: it removes the instability of expected return estimation and tries to increase diversification by distributing the risk contributions of the assets equally across the portfolio. However, it suffers from several drawbacks, as noted in [15]. Firstly, using the standard deviation as the risk measure does not capture the non-normality of asset returns. Secondly, the weight allocation is not smooth. Whenever there is a jump, the weight of the volatility carry strategy decreases sharply, as we can see from figure 4.5. This sharp decrease is followed by a sharp increase in the weight allocation when the jump in the asset returns exits the rolling window used to estimate the covariance matrix. As a result, we
obtain a high weight turnover. As a consequence, the weight is generally at its maximum just before a jump occurs, and it is generally too late to reduce the allocation after the jump occurrence because jumps are infrequent and uncorrelated.

We present the weight allocation of the ERC portfolio with the expected shortfall (ES) risk measure and the Gaussian Mixture Model (GMM) in figure 4.6, with the base case parameter λ = 0.02. We observe that the weight allocation is now much smoother compared to the ERC portfolio using volatility. As a result, we obtain a better average annualized turnover, as shown in figure 4.7. The worst turnover comes from the mean-variance algorithm, with more than 10 times the turnover of the ERC portfolio with ES and GMM. This is due to the model's instability in estimating the expected return, which leads to instability in the weight allocation, as we can see from figure 4.4. In terms of risk management metrics, we again see from table 4.3 that the ERC portfolio with ES and GMM has the lowest maximum drawdown, the lowest skewness coefficient in magnitude and the best Sortino ratio, which better captures the risk-adjusted return when the portfolio returns exhibit skew in their distribution.

Next we study the behaviour of the new model with respect to the jump probability parameter λ. As presented in table 4.4, as the jump probability decreases and approaches zero, the algorithm behaves as if we used a single Gaussian model. The turnover increases sharply as the single spot covariance matrix does not capture any possibility of jump risk.
Table 4.3: Performance metrics comparison between different asset allocation algorithms
Having studied our model on the toy portfolio with three underlyings in the previous section, we turn our focus to the practical application of our asset allocation model to a portfolio of risk premia strategies. We pick some well-known risk premia strategies and include them in our portfolio:
• G10 FX Carry Strategy. Bloomberg ticker is UISFC1UE Index. The strategy goes
Figure 4.4: Markowitz maximum Sharpe portfolio weights with constraint σ ≤ 0.05
Figure 4.6: ERC portfolio weights using expected shortfall (ES) with GMM
Figure 4.7: Average annualized turnover by year.
Figure 4.8: Cumulative PnL comparison of the Vol ERC, GMM/ES ERC and mean-variance portfolios.

Figure 4.9: Year-on-year return comparison of the Vol ERC and GMM/ES ERC portfolios.
Metrics                       λ = 0.04    λ = 0.02    λ = 0.001
Annualized return             4.74%       4.95%       5.11%
Annualized volatility         4.14%       4.22%       4.53%
Sharpe ratio                  1.15        1.17        1.13
Sortino ratio                 1.47        1.41        1.21
Maximum drawdown              11.19%      12.01%      13.90%
Skewness                      −0.54       −0.65       −0.70
Average annualized turnover   17%         19%         39%

Table 4.4: Performance metrics of the ERC portfolio with ES and GMM for different jump probabilities λ
long and short G10 FX based on the one month carry signal.
• G10 FX Value Strategy. Bloomberg ticker is UISFV1UE Index. The strategy goes
long and short G10 FX based on the relationship between the current and historical
spot FX prices.
• MSCI World Beta Neutral Low Volatility Strategy. Bloomberg ticker is UISELGSE
Index. The strategy goes long stocks with lowest 12 month volatility and short the
MSCI World Index.
• MSCI World Beta Neutral Value Strategy. Bloomberg ticker is UISEVGSE Index.
The strategy goes long stocks with best value score based on various financial ratios
and short the MSCI World Index.
• MSCI World Beta Neutral Quality Strategy. Bloomberg ticker is UISEQGSE Index.
The strategy goes long stocks with best quality score based on various financial ratios
and short the MSCI World Index.
Figure 4.10: Markowitz maximum Sharpe portfolio weights with constraint σ ≤ 0.09 for the risk premia portfolio
In this test, we remove the constraint that the total weight must sum to 1. In addition, we introduce a convex inequality constraint on the variance of the portfolio such that the volatility of the portfolio is targeted at 9%. This allows the algorithm to control the leverage of the portfolio and is a technique commonly used in practice. We report the weight allocations of the different algorithms in figures 4.10, 4.12 and 4.11. In figure 4.12, we see that our algorithm's leverage is very stable. On the other hand, we can again observe sharp increases and decreases in the allocation of the volatility-based risk parity portfolio, which leads to instability of the leverage. The instability is worst for the mean-variance portfolio, whose positions in the different underlyings move completely in and out during the backtest period. The performance metrics are reported in table 4.5. The results show that our asset allocation is stable, with realized volatility closest to the targeted volatility of 9%. The algorithm also achieves the best risk-adjusted return in terms of the Sharpe and Sortino ratios, has the lowest maximum drawdown and, as expected, the lowest annualized turnover. This shows that our algorithm is highly implementable in practice, especially under liquidity constraints.
Figure 4.11: ERC portfolio weights using volatility for the risk premia portfolio
Figure 4.12: ERC portfolio weights using expected shortfall (ES) with GMM for the risk
premia portfolio
Figure 4.13: Annualized turnover by year of the Vol ERC and GMM/ES ERC risk premia portfolios.
Figure 4.14: Cumulative PnL comparison of risk premia portfolio with different asset allo-
cation models
Figure 4.15: Year on year return for risk premia portfolio comparison
Table 4.5: Performance metrics of risk premia portfolio between different asset allocation
algorithms
Chapter 5
The thesis starts with a study showing that we cannot diversify skewness by relying on the correlation parameter, as in the case of volatility diversification. In particular, we can optimize the portfolio to have low volatility, but this can also result in high stress risk. Therefore, to take skewness risk into account and not rely on volatility risk alone, we study and implement the paper by Roncalli et al. [15], where expected shortfall is used in the risk parity framework together with a mixture distribution model for asset returns. The novelty comes from applying the theory of the constrained Gaussian Mixture Model by Ari [1] to estimate the parameters of this model. The idea is that prior information can be formulated in the form of convex constraints on either the source or the information parameters, and these constraints can be handled by solving constrained convex optimization problems in the Maximization step of the EM algorithm. The results presented in this thesis show that our allocation algorithm overcomes the shortcomings of volatility-based risk parity. It achieves stable leverage with better risk-adjusted returns, a lower maximum drawdown and lower weight turnover. This implies that our algorithm is highly applicable in practice, especially under liquidity constraints.
Despite the brilliance of Markowitz's theory, the mean-variance framework suffers from several drawbacks that prevent it from performing in practice. In particular, it often concentrates the allocation on a few assets only, which inevitably leads to disastrous out-of-sample risks. In addition, the solution portfolio is very sensitive to small changes in the return forecast [4]. This has led to the risk allocation approach and the birth of volatility-based risk parity. In this thesis, we have only studied one way in which volatility-based risk parity can be improved. Hence, future work could focus on comparing this approach with other recent advances in risk parity, such as hierarchical risk parity by Prado [8] and minimum-torsion bets by Meucci [6].
Appendix A
Mathematics Supplementary
The domain of the conjugate function consists of the values ν ∈ Rⁿ for which the supremum is finite. In addition, when f is differentiable, the conjugate function f* is the Legendre transform of f:

f*(ν) = [θᵀν − f(θ)]_{ν=∇_θ f(θ)} .   (A.2)
Definition 5. Let f_0(x) be the objective function we try to minimize with x ∈ Rⁿ, subject to the equality constraints h_i(x) = 0 for i = 1, …, p and the inequality constraints f_i(x) ≤ b_i for i = 1, …, m. The Lagrangian L : Rⁿ × Rᵐ × Rᵖ → R is defined as:

L(x, ζ, ν) = f_0(x) + Σ_{i=1}^m ζ_i (f_i(x) − b_i) + Σ_{i=1}^p ν_i h_i(x) .   (A.4)
Definition 6. The Lagrange dual function g : Rᵐ × Rᵖ → R is defined as the infimum of the Lagrangian eq. (A.4) over x ∈ dom P for any ζ and ν, where dom P = (∩_{i=0}^m dom f_i) ∩ (∩_{i=1}^p dom h_i). We notice that g(ζ, ν) is a concave function of ζ and ν because it is defined as the infimum of affine functions of ζ and ν.
where θ ∈ Rⁿ are called the natural parameters, T : Ω_x → Rⁿ the sufficient statistics and A(θ) the log partition function defined as:

A(θ) = log ∫_{x∈Ω_x} exp(θᵀ T(x)) dx ,   (A.7)

which ensures that the integral of P(x|θ) over x is 1. Let us denote the set of admissible parameters θ by C_θ = {θ ∈ Rⁿ | A(θ) < ∞}. For exponential family distributions, the set of parameters C_θ is an open convex set in Rⁿ.
The moment parameter ν ∈ Rⁿ is defined as the expected value of the sufficient statistics:

ν = E_{P(x|θ)}[T(x)] .   (A.8)

The relation between the moment parameters ν ∈ C_ν and the natural parameters θ ∈ C_θ can be seen via the moment-generating property of the log partition function A(θ).

Proposition 11. The gradient of the log partition function A(θ) with respect to the natural parameters θ is equal to the moment parameters ν.

Proof.

∇_θ A(θ) = ∫_{x∈Ω_x} T(x) exp(θᵀT(x)) dx / ∫_{x∈Ω_x} exp(θᵀT(x)) dx = E_{P(x|θ)}[T(x)] = ν .
In addition, when maximum likelihood estimation is used, the most crucial property of the log partition function A(θ) is that it is convex in the parameter θ.

Proposition 12. The log partition function A(θ) is a convex function of the natural parameters θ, i.e. A(λθ_1 + (1−λ)θ_2) ≤ λA(θ_1) + (1−λ)A(θ_2).

Proof.

A(λθ_1 + (1−λ)θ_2) = log ∫_{x∈Ω_x} exp((λθ_1 + (1−λ)θ_2)ᵀ T(x)) dx
= log ∫_{x∈Ω_x} exp(λθ_1ᵀT(x) + (1−λ)θ_2ᵀT(x)) dx
= log ∫_{x∈Ω_x} exp(λθ_1ᵀT(x)) exp((1−λ)θ_2ᵀT(x)) dx
≤ log [ (∫_{x∈Ω_x} exp(θ_1ᵀT(x)) dx)^λ (∫_{x∈Ω_x} exp(θ_2ᵀT(x)) dx)^{1−λ} ]   (by Hölder's inequality with p = 1/λ, q = 1/(1−λ))
= λ log ∫_{x∈Ω_x} exp(θ_1ᵀT(x)) dx + (1−λ) log ∫_{x∈Ω_x} exp(θ_2ᵀT(x)) dx
= λ A(θ_1) + (1−λ) A(θ_2) .   (A.10)
Definition 8. The entropy of a probability distribution P(x) defined on the sample space Ω_x is:

H(P(x)) = − ∫_{Ω_x} P(x) log P(x) dx .   (A.11)

The entropy function and the log partition function are Fenchel conjugate functions.
Proposition 13. The entropy function and the log partition function are Fenchel conjugate functions:

−H(ν) = sup_{θ∈dom A} (θᵀν − A(θ)) .   (A.12)

As a result,

H(ν) = inf_{θ∈dom A} (A(θ) − θᵀν) .   (A.13)

Proof. The entropy of an exponential family distribution, as a function of the moment parameters ν, can be expressed as −H(ν) = [θᵀν − A(θ)]_{ν=∇_θ A(θ)}. This corresponds to the Fenchel–Young inequality in definition 4 with equality. In addition, from proposition 11 and definition 3, we know that θᵀν − A(θ) achieves the supremum when ν = ∇_θ A(θ).
where δ(y = k) is the delta function. To express the multinomial distribution in exponential form

P(y|θ_y) = exp(θ_yᵀ T_y(y) − A(θ_y))   (A.17)

where θ_y ∈ R^{K−1}, T_y : Ω_y → R^{K−1} and A(θ_y) are the natural parameters, sufficient statistics function and log partition function respectively, we need to parameterize the probability density function P(y|λ) using the K−1 free components of λ [1]:

P(y|λ) = ∏_{k=1}^K λ_k^{δ(y=k)}
       = ∏_{k=1}^{K−1} λ_k^{δ(y=k)} · λ_K^{δ(y=K)}
       = ∏_{k=1}^{K−1} λ_k^{δ(y=k)} · (1 − Σ_{i=1}^{K−1} λ_i)^{1 − Σ_{i=1}^{K−1} δ(y=i)}
       = P(y|λ̂) .   (A.18)
We obtain that:

T_y(y) = (δ(y=1), …, δ(y=K−1))ᵀ .   (A.20)
Hence, P(y|θ_y) has the general exponential form:

P(y|θ_y) = exp( Σ_{k=1}^{K−1} θ_{y=k} δ(y=k) − log(1 + Σ_{k=1}^{K−1} exp θ_{y=k}) ) .   (A.23)

Using the moment-generating property of the log partition function, we have:

∂A(θ_y)/∂θ_{y=k} = exp θ_{y=k} / (1 + Σ_{i=1}^{K−1} exp θ_{y=i}) = λ_k = ν_{y=k} .   (A.24)

As a result,

θ_{y=k} = log ν_{y=k} − log(1 + Σ_{i=1}^{K−1} exp θ_{y=i})^{−1} .   (A.26)

Hence, using the relation from eq. (A.22) and applying it to eq. (A.24), we have:

θ_{y=k} = log( ν_{y=k} / (1 − Σ_{i=1}^{K−1} ν_{y=i}) ) .   (A.27)
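The maps between natural and moment parameters in eqs. (A.24)–(A.27) can be verified numerically (the probabilities below are illustrative, with K = 4):

```python
# Numerical check of the natural/moment parameter maps in eqs. (A.24)-(A.27)
# for the multinomial distribution (illustrative probabilities, K = 4).
import numpy as np

lam_probs = np.array([0.2, 0.3, 0.1])               # lambda_1..lambda_{K-1}
theta = np.log(lam_probs / (1 - lam_probs.sum()))   # eq. (A.27)
A = np.log(1 + np.exp(theta).sum())                 # log partition function
nu = np.exp(theta) / (1 + np.exp(theta).sum())      # gradient of A, eq. (A.24)
```

Round-tripping through the natural parameters recovers the moment parameters ν = λ exactly, and A collapses to −log(1 − Σλ).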
Finally, using Fenchel duality as shown in proposition 13, we can obtain the relationship between the log partition function and the entropy function H(ν_y):

H(ν_y) = [A(θ_y) − θ_yᵀν_y]_{θ_{y=k} = log(ν_{y=k}/(1 − Σ_{i=1}^{K−1} ν_{y=i}))}
= log(1 + Σ_{i=1}^{K−1} exp θ_{y=i}) − Σ_{k=1}^{K−1} θ_{y=k} ν_{y=k}
= log(1 − Σ_{i=1}^{K−1} ν_{y=i})^{−1} − Σ_{k=1}^{K−1} ν_{y=k} log( ν_{y=k} / (1 − Σ_{i=1}^{K−1} ν_{y=i}) )
= − Σ_{k=1}^{K−1} ν_{y=k} log ν_{y=k} + (1 − Σ_{k=1}^{K−1} ν_{y=k}) log(1 − Σ_{i=1}^{K−1} ν_{y=i})^{−1}
= − Σ_{k=1}^{K−1} ν_{y=k} log ν_{y=k} − (1 − Σ_{k=1}^{K−1} ν_{y=k}) log(1 − Σ_{k=1}^{K−1} ν_{y=k}) .   (A.28)
where µ ∈ R^d and Σ ∈ S₊^d are the mean and covariance matrix of the random vector x with sample space Ω_x = R^d, and S₊^d denotes the set of symmetric positive semidefinite matrices. The information form of the Gaussian density, with information parameters m and S, is:

P(x|m, S) = exp( mᵀx + tr(−½ S x xᵀ) + ½ log|S| − ½ mᵀS⁻¹m − (d/2) log 2π ) .   (A.30)

We can write the information form P(x|m, S) in exponential family form as:

P(x|m, S) = exp( mᵀx + tr(−½ S x xᵀ) − (−½ log|S| + ½ mᵀS⁻¹m + (d/2) log 2π) )
          = exp(θ_xᵀ T(x) − A(θ_x))   (A.31)

with natural parameters

θ_x = (mᵀ, vec(−½ S)ᵀ)ᵀ   (A.33)

and log partition function

A(θ_x) = −½ log|S| + ½ mᵀS⁻¹m + (d/2) log 2π .   (A.34)
The relationship between the source parameters µ, Σ and the information parameters m, S can be seen from:

P(x|µ, Σ) = (2π)^{−d/2} |Σ|^{−1/2} exp(−½ (x−µ)ᵀΣ⁻¹(x−µ))
= exp( −½ xᵀΣ⁻¹x + xᵀΣ⁻¹µ − ½ µᵀΣ⁻¹µ + ½ log|Σ⁻¹| − (d/2) log 2π )
= exp( (Σ⁻¹µ)ᵀx + tr(−½ Σ⁻¹ x xᵀ) + ½ log|Σ⁻¹| − ½ (Σ⁻¹µ)ᵀΣ(Σ⁻¹µ) − (d/2) log 2π )   (A.35)
= exp( mᵀx + tr(−½ S x xᵀ) + ½ log|S| − ½ mᵀS⁻¹m − (d/2) log 2π ) .

Hence, we can see that m = Σ⁻¹µ and S = Σ⁻¹, and conversely µ = S⁻¹m and Σ = S⁻¹. Similarly, using the property of the log partition function ∇_{θ_x} A(θ_x) = ν_x, we can obtain:

∇_m A(m, S) = S⁻¹m = µ ,   (A.36)

∇_{−½S} A(m, S) = S⁻¹ + S⁻¹mmᵀS⁻¹ = Σ + µµᵀ .   (A.37)

Finally, using Fenchel duality as shown in proposition 13, we can obtain the relationship between the log partition function and the entropy function H(ν_x).
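A small numerical sketch of the source/information parameter conversion and the moment identities (A.36)–(A.37), on an illustrative 2×2 example:

```python
# Sketch of the conversion between source parameters (mu, Sigma) and
# information parameters (m = Sigma^{-1} mu, S = Sigma^{-1}), checking the
# moment identities of eqs. (A.36)-(A.37) on an illustrative 2x2 example.
import numpy as np

sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
mu = np.array([0.5, -1.0])

S = np.linalg.inv(sigma)                 # information matrix
m = S @ mu                               # information mean
mu_back = np.linalg.solve(S, m)          # S^{-1} m = mu            (A.36)
second_moment = np.linalg.inv(S) + np.outer(mu_back, mu_back)  # Sigma + mu mu^T (A.37)
```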
Appendix B
• Value-at-risk with confidence level α: R.(x) = VaRα (x) = inf{l : P (L(x) ≤ l) ≥ α}.
1
R1
• Expected shortfall: R(x) = 1−α α VaRu (x)du.
If we assume that R ∼ N (µ, Σ), we can rewrite the above risk measures into generic form
of standard deviation based risk measure:
√
SDc (x) = −µ(x) + c xT Σ x . (B.1)
Take an example of value-at risk, we have P r R(x) ≤ V aRa (x) = 1 − α. Thus,
R(x) − µ(x) −V aRα (x) − µ(x)
P √ = √ = 1 − α. (B.2)
xT Σ x xT Σ x
Hence, we can easily see that:
−V aRα (x) − µ(x)
√ = Φ−1 (1 − α) (B.3)
T
x Σx
where Φ is the cumulative distribution function of normal random variable. We finally have:
√
SDc (x) = V aRα (x) = −µ(x) + Φ−1 (α) xT Σ x (B.4)
where c = Φ−1 (α). It can also be shown in [11] that the standard deviation form of expected
shortfall (ES) can be expressed as:
√
xT Σ x
ESα (x) = −µ(x) + φ(Φ−1 (α)) . (B.5)
(1 − α)
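The standard-deviation forms (B.4)–(B.5) can be sketched directly; the portfolio mean and volatility in the check below are illustrative:

```python
# Sketch of the standard-deviation forms (B.4)-(B.5) for a Gaussian
# portfolio return; mu_p and sigma_p in the checks are illustrative.
from scipy.stats import norm

def gaussian_var(alpha, mu_p, sigma_p):
    return -mu_p + norm.ppf(alpha) * sigma_p                           # (B.4)

def gaussian_es(alpha, mu_p, sigma_p):
    return -mu_p + norm.pdf(norm.ppf(alpha)) / (1 - alpha) * sigma_p   # (B.5)
```

Both are of the form SD_c(x) = −µ(x) + c σ(x), with c = Φ⁻¹(α) for the VaR and c = φ(Φ⁻¹(α))/(1−α) for the ES.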
B.2 Risk contribution and Euler allocation principle

After the first step of measuring the risk of the portfolio, we also need to decompose the portfolio risk into a sum of risk contributions by asset. This process is referred to as risk allocation [10]. Risk contributions can be defined using the Euler principle in [11] as follows. Let Υ_i be the profit and loss of asset i in the portfolio; the profit and loss of the whole portfolio with n assets is then:

Υ = Σ_{i=1}^n Υ_i .   (B.6)

The risk measure R(Υ) is the portfolio-wide risk and R(Υ_i | Υ) is the risk contribution of the i-th asset to the portfolio-wide risk. Tasche in [2] defines the risk-adjusted performance measurement (RAPM) as:

RAPM(Υ) = E[Υ] / R(Υ)   (B.7)

and also:

RAPM(Υ_i | Υ) = E[Υ_i] / R(Υ_i | Υ) .   (B.8)

Then we can state the two desirable risk contribution properties as in [2]:

1. The full allocation property: Σ_{i=1}^n R(Υ_i | Υ) = R(Υ).

2. The RAPM compatibility property: if there exists ε_i > 0 such that RAPM(Υ_i | Υ) > RAPM(Υ), then RAPM(Υ + hΥ_i) > RAPM(Υ) for all 0 < h < ε_i.

If these two conditions are satisfied, then it is shown in [2] that R(Υ_i | Υ) can be uniquely defined as R(Υ_i | Υ) = d/dh R(Υ + hΥ_i) |_{h=0}. Hence, based on this framework, if we consider the risk measure R(x) defined in terms of the weights, the risk contribution of asset i can be uniquely defined as:

RC_i = x_i ∂R(x)/∂x_i   (B.9)

and hence the Euler decomposition is satisfied:

R(x) = Σ_{i=1}^n x_i ∂R(x)/∂x_i = Σ_{i=1}^n RC_i .   (B.10)
This definition plays a key role in risk budgeting portfolios. For example, consider the case of Gaussian asset returns R ∼ N(µ, Σ) with volatility as the risk measure, R(x) = σ(x) = √(xᵀΣx); we can verify that this satisfies the full allocation property. First, we have the marginal volatility:

∂σ(x)/∂x = Σx / √(xᵀΣx) .   (B.11)

Then we can compute the risk contribution of the i-th asset:

RC_i = x_i (Σx)_i / √(xᵀΣx) .   (B.12)

Hence, the full allocation property is satisfied:

Σ_{i=1}^n RC_i = Σ_{i=1}^n x_i (Σx)_i / √(xᵀΣx) = xᵀΣx / √(xᵀΣx) = σ(x) .   (B.13)
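The full allocation property (B.13) is easy to verify numerically; the covariance matrix and weights below are illustrative:

```python
# Numerical check of the full-allocation property (B.13): the volatility
# risk contributions sum to sigma(x) (illustrative covariance and weights).
import numpy as np

sigma = np.array([[0.04, 0.012, 0.002],
                  [0.012, 0.09, 0.03],
                  [0.002, 0.03, 0.16]])
x = np.array([0.5, 0.3, 0.2])

vol = np.sqrt(x @ sigma @ x)
rc = x * (sigma @ x) / vol       # RC_i = x_i (Sigma x)_i / sigma(x)
```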
Appendix C
Code Listing
# Minimum variance objective function
def min_var_obj_func(wgtvec, covarmat):
    wgtvec = np.matrix(wgtvec)
    return np.array(np.dot(wgtvec, np.dot(covarmat, wgtvec.T)))[0][0]

def minimum_variance(data,
                     schedule,
                     signal,
                     params={},
                     **kwargs):
    """
    Simple minimum/mean variance optimizer.
    Minimizes the portfolio volatility with the constraint that the
    weights add up to 100%. An additional constraint on a minimum
    portfolio return can be added.

    :math:`portfolio volatility = \\sqrt{W^T * C * W}`

    :math:`portfolio return = W^T * R`
    """
    optim_cons = ({'type': 'eq', 'fun': lambda x: sum(x) - 1},)

    target_return = params.get('target_return', None)
    mean_variance = target_return is not None

    if mean_variance:
        # date_rebal is provided by the surrounding re-balancing loop
        # (elided in this excerpt)
        return_matrix = signal[date_rebal]['underlying_return']
        optim_cons_return = ({'type': 'eq',
                              'fun': lambda x, r, mu: np.prod(1 + np.dot(r, x)) - 1 - mu,
                              'args': (return_matrix, target_return,)},)
        optim_cons = optim_cons + optim_cons_return
import pandas as pd
import numpy as np
import scipy
from scipy.stats import multivariate_normal
from scipy import optimize as spo
import copy
# ... (enclosing function definitions elided in the extraction)
    return d_es

# ...
    h_2 = (bisection_opt(lower_a, higher_b, wgt_vector, coefs[0], coefs[1],
                         coefs[2], coefs[3], coefs[4], coefs[6])
           + neg_ret_func(wgt_vector, coefs[3])) / vol_func(wgt_vector, coefs[1])
    w_1 = (coefs[5] * scipy.stats.norm.pdf(h_1)) / vol_func(wgt_vector, coefs[0])
    w_2 = (coefs[6] * scipy.stats.norm.pdf(h_2)) / vol_func(wgt_vector, coefs[1])
    d_var = (w_1 * (h_1 / vol_func(wgt_vector, coefs[0])
                    * np.dot(coefs[0], wgt_vector) - coefs[2])
             + w_2 * (h_2 / vol_func(wgt_vector, coefs[1])
                      * np.dot(coefs[1], wgt_vector) - coefs[3])) / (w_1 + w_2)
    delta_1 = (1 + h_1 / vol_func(wgt_vector, coefs[0])
               * bisection_opt(lower_a, higher_b, wgt_vector, coefs[0], coefs[1],
                               coefs[2], coefs[3], coefs[4], coefs[6])) \
              * np.dot(coefs[0], wgt_vector) \
              - bisection_opt(lower_a, higher_b, wgt_vector, coefs[0], coefs[1],
                              coefs[2], coefs[3], coefs[4], coefs[6]) * (d_var + coefs[2])
    delta_2 = (1 + h_2 / vol_func(wgt_vector, coefs[1])
               * bisection_opt(lower_a, higher_b, wgt_vector, coefs[0], coefs[1],
                               coefs[2], coefs[3], coefs[4], coefs[6])) \
              * np.dot(coefs[1], wgt_vector) \
              - bisection_opt(lower_a, higher_b, wgt_vector, coefs[0], coefs[1],
                              coefs[2], coefs[3], coefs[4], coefs[6]) * (d_var + coefs[3])
    d_es = w_1 / (1 - coefs[4]) * delta_1 + w_2 / (1 - coefs[4]) * delta_2 \
           - (1 / (1 - coefs[4])) * (coefs[5] * coefs[2] * scipy.stats.norm.cdf(-h_1)
                                     + coefs[6] * coefs[3] * scipy.stats.norm.cdf(-h_2))
    return d_es + risk_parity_jac(wgt_vector, coefs[7])

def risk_parity_func(wgtvec, coefs):
    return -1 * np.sum(np.abs(coefs) * np.log(np.abs(wgtvec)))

def risk_parity_jac(wgtvec, coefs):
    jac_val = [-1.0 * np.abs(coefs)[k] / wgtvec[k] for k in range(len(wgtvec))]
    return jac_val
# ... (beginning of the expected shortfall expression elided in the extraction)
            ... wgt_vector, coefs[3])) / vol_func(wgt_vector, coefs[1])) - \
            neg_ret_func(wgt_vector, coefs[3]) * scipy.stats.norm.cdf(
                -((bisection_opt(lower_a, higher_b, wgt_vector, coefs[0], coefs[1],
                                 coefs[2], coefs[3], coefs[4], coefs[6])
                   + neg_ret_func(wgt_vector, coefs[3]))
                  / vol_func(wgt_vector, coefs[1])))))
    return es + risk_parity_func(wgt_vector, coefs[7])
lower_a = -100
higher_b = 100

def bisection_opt(a, b, wgt_vector, covar_1, covar_2, mu_vt_1, mu_vt_2, alpha,
                  lambda_jump, tol=1e-6):
    if var_function_opt(a, wgt_vector, covar_1, covar_2, mu_vt_1, mu_vt_2,
                        alpha, lambda_jump) * \
       var_function_opt(b, wgt_vector, covar_1, covar_2, mu_vt_1, mu_vt_2,
                        alpha, lambda_jump) > 0:
        print(var_function_opt(a, wgt_vector, covar_1, covar_2, mu_vt_1,
                               mu_vt_2, alpha, lambda_jump))
        print(var_function_opt(b, wgt_vector, covar_1, covar_2, mu_vt_1,
                               mu_vt_2, alpha, lambda_jump))
        print("No root found.")
    else:
        while (b - a) / 2.0 > tol:
            midpoint = (a + b) / 2.0
            if var_function_opt(midpoint, wgt_vector, covar_1, covar_2,
                                mu_vt_1, mu_vt_2, alpha, lambda_jump) == 0:
                return midpoint  # The midpoint is the x-intercept/root.
            elif var_function_opt(a, wgt_vector, covar_1, covar_2, mu_vt_1,
                                  mu_vt_2, alpha, lambda_jump) * \
                 var_function_opt(midpoint, wgt_vector, covar_1, covar_2,
                                  mu_vt_1, mu_vt_2, alpha, lambda_jump) < 0:
                # Increasing but below 0 case
                b = midpoint
            else:
                a = midpoint
        return midpoint

def var_function_opt(var_a, wgt_vector, covar_1, covar_2, mu_vt_1, mu_vt_2,
                     alpha, lambda_jump):
    var_result = (1 - lambda_jump) * scipy.stats.norm.cdf(
        (var_a + neg_ret_func(wgt_vector, mu_vt_1)) / vol_func(wgt_vector, covar_1)) + \
        lambda_jump * scipy.stats.norm.cdf(
        (var_a + neg_ret_func(wgt_vector, mu_vt_2)) / vol_func(wgt_vector, covar_2)) - alpha
    return var_result
Listing C.2: Objective functions and Jacobian functions for volatility-based risk parity and expected-shortfall-based risk parity with Gaussian mixture models
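The root search performed by `bisection_opt` can be sanity-checked on a one-dimensional two-regime Gaussian mixture. The sketch below is an illustration only: `lam`, `mu1`, `s1`, `mu2`, `s2` are hypothetical regime parameters (not the thesis calibration), and SciPy's own bisection replaces the hand-rolled loop; the root of the mixture CDF minus the confidence level is the mixture value-at-risk:

```python
import numpy as np
from scipy import optimize, stats

lam = 0.2                # stressed-regime probability (role of lambda_jump)
mu1, s1 = 0.001, 0.01    # calm-regime mean return and volatility (illustrative)
mu2, s2 = -0.02, 0.05    # stressed-regime mean return and volatility (illustrative)
alpha = 0.95             # value-at-risk confidence level

def mixture_cdf_minus_alpha(v):
    # P(portfolio loss <= v) under the two-regime mixture, minus alpha;
    # the loss in regime k is Gaussian with mean -mu_k and volatility s_k.
    return ((1 - lam) * stats.norm.cdf((v + mu1) / s1)
            + lam * stats.norm.cdf((v + mu2) / s2) - alpha)

var_level = optimize.bisect(mixture_cdf_minus_alpha, -1.0, 1.0, xtol=1e-10)
```

With a positive stressed-regime weight, the mixture value-at-risk sits well above the calm-regime quantile, which is exactly the effect the expected-shortfall risk parity portfolio is designed to capture.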
import pandas as pd
import numpy as np
from scipy.stats import multivariate_normal
from scipy import optimize as spo
import copy

h_undl_list = [
    'UISELGSE Index',
    'UISEMGSE Index',
    'UISEQGSE Index',
    'UISERUAE Index',
    'UISEVGSE Index',
    'UISFC1UE Index',
    'UISFV1UE Index',
    'UISRCX8E Index',
    'UISRTLGE Index',
    'UISXTXUE Index'
]

undl_list = [adm.retrieve(h_undl) for h_undl in h_undl_list]

def get_return(timeseries, days=5):
    shifted_ts = timeseries.shift(days)
    return timeseries / shifted_ts - 1

un_ret_list = []
for undl in undl_list:
    un_ret = get_return(undl)
    un_ret_list.append(un_ret)

ret_df = pd.concat(un_ret_list, axis=1, join='inner')
ret_df = ret_df.dropna()
cov_matrix = ret_df.cov()
corr_matrix = ret_df.corr()
ret_mean = ret_df.mean().values
scaled_levels_df = 100 * (1 + ret_df).cumprod()

def is_pos_def(x):
    print(np.linalg.eigvals(x))
    return np.all(np.linalg.eigvals(x) >= 0)

# ...
print('weights')
print(clf.weights)
# ...
iter_t = 3
N = ret_df_period.shape[0]
K = 2

for t_iter in range(iter_t):
    q_k = [alpha_k.value, 1 - alpha_k.value]
    mu_sk_list = [np.array(mu_k_1.value).flatten(),
                  np.array(mu_k_2.value).flatten()]
    covar_sk_list = [covar_k_1.value, covar_k_2.value]
    q_posterior_list = []
    q_posterior_map = {}

    # marginal probability
    mar_prob_list = np.zeros((K, N))
    for k_idx_temp in range(0, K):
        for j_temp in range(0, N):
            mar_prob_list[k_idx_temp, j_temp] = \
                (q_k[k_idx_temp] * multivariate_normal.pdf(
                    ret_df_period.iloc[j_temp].values,
                    mean=mu_sk_list[k_idx_temp],
                    cov=covar_sk_list[k_idx_temp]))

    for k_idx in range(0, K):
        for j in range(0, N):
            joint_prob_k = q_k[k_idx] * multivariate_normal.pdf(
                ret_df_period.iloc[j],
                mean=mu_sk_list[k_idx], cov=covar_sk_list[k_idx])
            q_posterior = joint_prob_k / np.sum(mar_prob_list[:, j])
            q_posterior_list.append(q_posterior)
##### a priori after E step
# empirical probability

# ...

# mean
mu_sk_list = []
weighted_xs_map = {}
for k_idx in range(0, K):
    weighted_xs_map[k_idx] = np.empty([int(len(h_undl_list)), N])

for k_idx in range(0, K):
    mu_sk = 0
    for j in range(0, N):
        weighted_x = (q_posterior_map[k_idx][j]) * ret_df_period.iloc[j].values
        mu_sk += weighted_x
        weighted_xs_map[k_idx][:, j] = weighted_x / alpha_sk_list[k_idx]
    mu_sk_list.append(mu_sk / N / alpha_sk_list[k_idx])
covar_sk_list = []
for k_idx in range(0, K):
    sum_covar = 0
    for j in range(0, N):
        w0_temp = (ret_df_period.iloc[j] - mu_sk_list[k_idx]).values
        w0_covar = np.outer(w0_temp, w0_temp)
        weighted_covar = (q_posterior_map[k_idx][j]) * w0_covar
        sum_covar += weighted_covar
    sum_covar = sum_covar / (alpha_sk_list[k_idx] * N)
    covar_sk_list.append(sum_covar)

#####
# information params
n_sk_list = []
for k_idx in range(0, K - 1):
    n_sk = np.log(alpha_sk_list[k_idx] / (1 - (np.sum(alpha_sk_list[:-1]))))
    n_sk_list.append(n_sk)
m_sk_list = []
for k_idx in range(0, K):
    m_sk = np.dot(np.linalg.inv(covar_sk_list[k_idx]), mu_sk_list[k_idx])
    m_sk_list.append(m_sk)

S_sk_list = []
for k_idx in range(0, K):
    S_sk = np.linalg.inv(covar_sk_list[k_idx])
    S_sk_list.append(S_sk)

alpha_sk_1 = Parameter(sign='positive')
alpha_sk_2 = Parameter(sign='positive')
n_sk_pr = Parameter(1)
dim = Parameter(sign='positive')
m_sk_1 = Parameter(int(len(h_undl_list)))
m_sk_2 = Parameter(int(len(h_undl_list)))
alpha_sk_1.value = alpha_sk_list[0]
alpha_sk_2.value = alpha_sk_list[1]
n_sk_pr.value = n_sk_list[0]
m_sk_1.value = m_sk_list[0]
m_sk_2.value = m_sk_list[1]
dim.value = int(len(h_undl_list))
covar_k_1 = Semidef(int(len(h_undl_list)))
covar_k_tilde = Semidef(int(len(h_undl_list)))

dual_func = entr(alpha_k) + entr(1 - alpha_k) \
    + alpha_sk_1 * 0.5 * log_det(covar_k_1) \
    + alpha_sk_1 * dim * 0.5 * log(2 * np.e * np.pi) \
    + alpha_sk_2 * 0.5 * log_det(covar_k_1 + covar_k_tilde) \
    + alpha_sk_2 * dim * 0.5 * log(2 * np.e * np.pi) \
    + alpha_k * n_sk_pr + alpha_sk_1 * mu_k_1.T * m_sk_1 \
    - alpha_sk_1 * 0.5 * trace(covar_k_1 * S_sk_list[0]) \
    - alpha_sk_1 * 0.5 * quad_form(mu_k_1, S_sk_list[0]) \
    + alpha_sk_2 * (mu_k_1.T) * m_sk_2 \
    - alpha_sk_2 * 0.5 * trace((covar_k_1 + covar_k_tilde) * S_sk_list[1]) \
    - alpha_sk_2 * 0.5 * quad_form(mu_k_1, S_sk_list[1])
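For reference, the E-step responsibilities that the nested loops above build element by element can be computed in vectorized form. The sketch below uses synthetic data and illustrative mixture parameters (`X`, `q_k`, `mus`, `covs` are stand-ins, not the thesis calibration):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))          # N observations of d asset returns
q_k = np.array([0.7, 0.3])                 # component probabilities
mus = [np.zeros(3), 0.5 * np.ones(3)]      # component means
covs = [np.eye(3), 2.0 * np.eye(3)]        # component covariances

# Weighted density of every observation under every component: shape (K, N).
dens = np.array([q * multivariate_normal.pdf(X, mean=m, cov=c)
                 for q, m, c in zip(q_k, mus, covs)])
# Posterior responsibilities: normalize over components.
resp = dens / dens.sum(axis=0)

print(resp.shape)  # (2, 100); each column sums to one
```

`multivariate_normal.pdf` evaluates all N observations in one call, so the K·N Python-level loop collapses to K calls; the per-observation normalization is then a single broadcast division.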
quoting_calendar = basket_timeseries.index.values
inceptionDate = np.datetime64(kwargs.get('inceptionDate'))
rebal_start_date = max(inceptionDate, basket_weights.index.values[0])
calibration_dates = kwargs['rebalancing_day_map']
signal_calendar = kwargs['signal_calendar']
signal_calendar_date = np.array(signal_calendar.date_list)
trading_dates = basket_timeseries.index.values

last_trade_date_i = np.searchsorted(trading_dates, asOf)
if trading_dates[last_trade_date_i] > asOf:
    last_trade_date_i -= 1
last_trade_date = trading_dates[last_trade_date_i]

undl_list = basket_weights.columns
basket_timeseries = basket_timeseries[undl_list]
basket_timeseries_values = basket_timeseries[undl_list].values
## Retrieve FX data
fx_ref = params.get('fx_map', None)
fx_hedged = fx_ref is not None
if fx_hedged:
    all_fx = []
    fx_discriminator = params.get('fx_discriminator', None)
    for undl in undl_list:
        if undl in fx_ref.keys():
            if fx_ref[undl][0] is not None:
                fx_ts = strategy_cache.retrieve_data(fx_ref[undl][0],
                                                     discriminator=fx_discriminator)
                fx_ts = fx_ts ** (fx_ref[undl][1])
            else:
                fx_ts = pd.Series(1.0, index=basket_timeseries.index)
        else:
            fx_ts = pd.Series(1.0, index=basket_timeseries.index)
        all_fx += [fx_ts]

# ...

## dates
rebal_dates = basket_weights.index.values
rebal_weights_idx = np.searchsorted(rebal_dates, [rebal_start_date, last_trade_date])
if rebal_weights_idx[1] != len(rebal_dates) and \
        rebal_dates[rebal_weights_idx[1]] > last_trade_date:
    rebal_weights_idx[1] -= 1
roll_dates = rebal_dates[rebal_weights_idx[0]:rebal_weights_idx[1] + 1]
roll_dates = np.append(roll_dates, last_trade_date)
rebal_weights = basket_weights.values[rebal_weights_idx[0]:rebal_weights_idx[1] + 1]
rebal_dates_smoothing = int(params.get('smoothing', 1))
carry_cost_day_convention = params.get('carry_cost_day_convention', 365)

# extract costs
carry_cost = params.get('carry_cost', 0.0)
rebal_cost = params.get('rebal_cost', 0.0)
if not isinstance(carry_cost, dict):
    carry_cost = {i: carry_cost for i in undl_list}
if not isinstance(rebal_cost, dict):
    rebal_cost = {i: rebal_cost for i in undl_list}

carry_cost_df = pd.Series(carry_cost)[undl_list].values
rebal_cost_df = pd.Series(rebal_cost)[undl_list].values
index_dates = []
index_levels = []

# For weight decomposition
total_cost_series = []
no_cost_ret_series = []
no_cost_ret_df = []
index_ref = kwargs.get('inceptionValue', 100.0)
last_unit_level = np.array([])
include_cost_day_one = params.get('day_one_cost', True)
unit_rounding = params.get('rounding_target', None)

if inceptionDate < rebal_start_date:
    pre_start_date = trading_dates[(trading_dates < rebal_start_date) *
                                   (trading_dates >= inceptionDate)]
    index_levels.append([index_ref] * len(pre_start_date))
    index_dates = [pre_start_date] * len(pre_start_date)
    total_cost_series.append([0] * len(pre_start_date))
    no_cost_ret_series.append([0] * len(pre_start_date))
    no_cost_ret_df.append([[0] * len(undl_list)] * len(pre_start_date))

# ...

# roll dates are from weight allocator weights
rd_i = np.searchsorted(trading_dates, roll_dates)
scd_i = np.searchsorted(signal_calendar_date, roll_dates)
num_rolls = len(roll_dates)
timeseries_start_np = np.datetime64(timeseries_start)
determination_dts = [max(calibration_dates[x], timeseries_start_np)
                     if x in calibration_dates else asOf for x in roll_dates]
undl_ref_price_idx = np.searchsorted(trading_dates, determination_dts)

# ...

wgt = rebal_weights[k]
roll_td = trading_dates[rd_i[k]:rd_i[k + 1] + 1]
undl_ref_price = basket_timeseries_values[undl_ref_price_idx[k]]

# ...

days_rebal = signal_calendar_date[scd_i[k]:scd_i[k] + rebal_dates_smoothing]
units_schedule = np.array([traded_units * (i + 1)
                           for i in range(0, len(days_rebal))])
if len(last_unit_level) > 0:
    units_schedule = units_schedule + last_unit_level

all_unit_schedule_2d_array = np.vstack(
    [all_unit_schedule_2d_array, units_schedule]) \
    if all_unit_schedule_2d_array is not None else units_schedule
days_rebal_list.append(days_rebal)

# ...

sub_rolls = np.append(days_ref, nrd)
sr_di = np.searchsorted(roll_td, sub_rolls)
b_sr_di = np.searchsorted(trading_dates, sub_rolls)
for l in range(len(sub_rolls) - 1):
    roll_td_sub_roll = roll_td[sr_di[l]:sr_di[l + 1] + 1]
    wgt_sr = units_schedule[l]
    carry_cost_sr = carry_cost_ref[l]

    # big improvement vs using roll_td_sub_roll
    undl_ts = basket_timeseries_values[b_sr_di[l]:b_sr_di[l + 1] + 1]
    level_ref = undl_ts[0]
    undl_ret = undl_ts - level_ref

    # ...

    if fx_hedged:
        fx_ts = fx_df_values[b_sr_di[l]:b_sr_di[l + 1] + 1]
        undl_ret *= fx_ts
        daily_ret *= fx_ts[1:]

    # ...

    carry_cost_total = np.array([np.sum(x) for x in
                                 carry_cost_2d_np_array * (fx_ts if fx_hedged else 1)])

    # ...

    logger.debug(msg)

    cost_rebal = level_ref * cost_rebal_ref * include_cost_day_one

    if fx_hedged:
        cost_rebal *= fx_ts[0]
    rc_total = cost_rebal.sum()

    # ...

    # for weight decomposition
    total_cost_roll = carry_cost_total + rc_total

    # for creating dataframe of rebal cost
    rebal_cost_ts_list.append(cost_rebal)
    rebal_cost_date_list.append(roll_td_sub_roll[0])
    # of no-cost portfolio return
    no_cost_daily_ret.append(daily_ret * wgt_sr)

    # append everything except last index roll value
    if next_determination_date < roll_td_sub_roll[-1] and \
            next_determination_date in roll_td_sub_roll:
        index_rebal_ref = index_roll[np.searchsorted(roll_td_sub_roll,
                                                     next_determination_date)]
    index_dates.append(roll_td_sub_roll[:-1])
    index_levels.append(index_roll[:-1])

    # cost for decomposition
    total_cost_series.append(total_cost_roll[:-1])

# ...

# last iteration, append last index roll value
index_dates.append(roll_td_sub_roll[-1:])
index_dates = np.concatenate(index_dates)
index_levels.append(index_roll[-1:])
index_level = pd.Series(np.concatenate(index_levels), index=index_dates)

# cost for decomposition
total_cost_series.append(total_cost_roll[-1:])
total_cost_series = pd.Series(np.concatenate(total_cost_series), index=index_dates)
no_cost_ret_series = pd.Series(np.concatenate(no_cost_ret_series),
                               index=index_dates[:-1])
no_cost_ret_df = pd.DataFrame(np.concatenate(no_cost_ret_df),
                              index=index_dates[:-1], columns=undl_list)

# make dataframe for intermediate results:
carry_cost_df = pd.DataFrame(data=np.concatenate(carry_cost_2d_array_list),
                             columns=undl_list)

trading_dates_from = trading_dates[trading_dates >= roll_dates[0]]
second_td = trading_dates_from[1]

# ...

# if fx hedged then readjust units!
if fx_hedged:
    try:
        # find fx rate on determination date
        det_date_wgt_next = calibration_dates[wgt_next.index.values[-1]]
        target_units /= fx_df.loc[det_date_wgt_next]
    except IndexError:
        target_units /= fx_df.iloc[-1]

# rounding for units after asOf date
target_units = np.round(target_units, params['rounding_target']) \
    if 'rounding_target' in params else target_units

# ...

units_schedule = np.array([traded_units.values * (i + 1)
                           for i in range(0, len(days_ref))])
if len(last_unit_level) > 0:
    units_schedule = units_schedule + last_unit_level

all_unit_schedule_2d_array = np.vstack(
    [all_unit_schedule_2d_array, units_schedule]) \
    if all_unit_schedule_2d_array is not None else units_schedule
days_rebal_list.append(days_ref)

# ...

top_drag_fee = None
if 'drag_fee' in params:
    index_level = si.drag_fee(drag_rate=params['drag_fee'],
                              timeseries=index_level)
    top_drag_fee = params['drag_fee']
Bibliography
[1] C. Ari. Maximum likelihood estimation of robust constrained Gaussian Mixture Mod-
els. 2013.
[2] D. Tasche. Capital Allocation to Business Units and Sub-Portfolios: The Euler Prin-
ciple. The New Accord: The Challenge of Economic Capital, 2008.
[3] J. Jacod and Y. Ait-Sahalia. Analyzing the Spectrum of Asset Returns: Jump and
Volatility Components in High Frequency Data. Journal of Economic Literature,
50:1007–1050, 2012.
[4] J.P. Bouchaud, M. Potters, R. Benichou and Y. Lemperiere. Agnostic Risk Parity:
Taming Known and Unknown-Unknowns. 2016.
[5] M. Wainwright and M. Jordan. Graphical models, exponential families and variational
inference. 2008.
[6] A. Meucci. Risk Budgeting and Diversification Based on Optimized Uncorrelated Fac-
tors. 2015.
[7] R. Michaud. Efficient asset allocation: A practical guide to stock portfolio optimization
and asset allocation. 1998.
[10] R. Litterman. Hot Spots and Hedges. Goldman Sachs Risk Management Series, 1996.
[13] T. Roncalli. Introducing Expected Returns into Risk Parity Portfolios: A New Frame-
work for Asset Allocation. 2013.
[14] T. Roncalli. Keep Up The Momentum. 2017.
[15] T. Roncalli, N. Kostyuchyk and B. Bruder. Risk Parity Portfolios with Skewness Risk.
2016.