
10 August 2021

Diversification and the Distribution of Portfolio Variance


Part 3: Polynomial optimisation for asset allocation

Brian J. W. Fleming
brian.fleming@dimensionless.uk

Diversification is a fundamental topic for all investors but there remains little agreement on how to
measure it. Often it is defined ambiguously through risk-based portfolio construction techniques.
Recently it has been suggested to connect maximising diversification with minimising risk instability,
via kurtosis, which presents practical optimisation challenges. In particular, minimising kurtosis is a
non-convex problem that is typically solved using deterministic Branch-and-Bound methods, which
do not scale well, or stochastic methods that provide limited guarantees on finding minima. We
thus apply a deterministic hierarchical polynomial optimisation framework that allows realistic
asset allocation problems to be readily solved and also provides a numerical certificate of optimality.

Measuring portfolio diversification directly has been a topic of increasing interest for some time. In
our view this is for two key reasons. Firstly, the 2008 financial crisis highlighted the loss of
diversification that can occur between risky assets and, secondly, the growth of risk-based strategies
such as risk parity has perpetuated the use of heuristic methods that intuitively spread or negate risk.
However, such strategies leave open a number of questions, including why they should be superior
under any particular covariance-based risk model. This leads to back-testing providing the primary
justification for their use. It is argued in Fleming and Kroeske (2020) that any measurement of
diversification should be linked to properties of the distribution of portfolio returns. Therein it is
suggested that the normalised variance-of-variance (NVOV) provides a coherent extension of the
ubiquitous mean-variance framework. Taking variance as the measure of risk, NVOV drives the
instability of realised risk, which should be important to investors. Given a target level of risk, the goal
of diversification is therefore to reduce NVOV to improve the stability of realised risk around the target.

For a fixed number of realisations, the only driver of NVOV is the kurtosis of the return distribution.
Minimising NVOV is thus achieved by minimising kurtosis. In Fleming et al. (2020) a Branch-and-Bound
algorithm was used to minimise kurtosis for small-scale pathological examples incorporating only four
assets. The restriction to these sub-scale examples was due to the limitations of using a Branch-and-
Bound approach that suffers from the curse of dimensionality, which ultimately made it impractical.

We extend that work here by introducing a polynomial optimisation approach that allows the
minimisation of kurtosis to be readily solved for typical asset allocation problems. This provides a
practical framework for investors to augment their traditional portfolio analyses with an
understanding of the impact of the distribution tails on risk stability and associated asset allocations.

Diversification and portfolio stability

In this paper we take the view that diversification concerns the distribution of capital to improve the
stability of some chosen realised portfolio statistic. For example, in a mean-variance setting the
distribution of single-period returns is defined by the expected mean and variance. However, we can
alter this view by considering the mean return as a primary metric, and the volatility as the lever by
which we control the stability of the realised mean return over any number of periods. The critical
point is that even if a risk model is believed to be adequately representative of reality, investors never
experience the mathematical expectations that define it. They experience one realisation of actual
returns over a subsequent 𝑡 periods, and diversification allows us some control over these realisations.

Electronic copy available at: https://ssrn.com/abstract=3902755

For example, the normalised variability of the realised mean, $\mu_t$, for a given portfolio can be captured
as the relative standard error:

$$\frac{SE(\mu_t)}{\mu} = \frac{1}{\sqrt{t}}\,\frac{\sigma}{\mu} = \frac{1}{\sqrt{t}}\,CV$$

where the model mean $\mu = E(\mu_t)$, $\sigma$ is the model volatility, and $CV$ is the coefficient of variation or
dispersion. The latter quantity is mentioned in Markowitz's original paper on portfolio choice (1952)
and minimising the coefficient of variation for excess returns is equivalent to maximising the Sharpe
Ratio (SR). For a target level of return, diversifying through the adjustment of portfolio capital
allocations to minimise the volatility therefore improves the stability of the realised mean return $\mu_t$.
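As a small illustration of the relative standard error of the realised mean, the following sketch evaluates it for made-up inputs. The function name and the per-period mean, volatility and horizon are hypothetical values chosen only for the example; they do not come from the paper.

```python
import math

def relative_se_of_mean(mu: float, sigma: float, t: int) -> float:
    """Relative standard error of the realised mean: (1 / sqrt(t)) * (sigma / mu)."""
    cv = sigma / mu  # coefficient of variation of single-period returns
    return cv / math.sqrt(t)

# Hypothetical monthly inputs: 0.5% mean, 4% volatility, 120 periods
rel_se = relative_se_of_mean(mu=0.005, sigma=0.04, t=120)

# Halving the volatility at the same mean halves the relative standard error
rel_se_half_vol = relative_se_of_mean(mu=0.005, sigma=0.02, t=120)
```

For a fixed target mean, the only quantity the investor can change in this expression is the volatility, which is why it acts as the stability lever.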

We note that maximising the Diversification Ratio of Choueifaty and Coignard (2013) is also equivalent
to maximising the portfolio SR, under the constraint that each individual asset has the same SR. This
may be intuitive from the perspective of assuming constant long-term excess returns per unit risk;
however, it is not necessarily supported numerically as pointed out by Clarke, de Silva and Thorley
(2013). In general it is believed that most investors' inability to use leverage at the individual security
level forces them to pay a premium for higher-risk stocks to access higher returns. This in turn comes
at the cost of a lower SR and provides an explanation as to why Minimum-Variance (MV) approaches
can perform better, being populated with lower risk stocks. This is also consistent with MV producing
a portfolio that is equivalent to the maximum SR under the constraint that all asset returns are equal.

Describing the above strategies as risk-based could be viewed as something of a misnomer. While
expected returns are not required, both methodologies do maximise the SR under some assumed
structure of asset returns. Furthermore, Risk Parity (RP) as the most popular strategy under the risk-
based banner has been shown to be equivalent to an MV portfolio with an additional constraint
ensuring that there are no zero weights (see Roncalli 2014), thus embedding the same return structure.
If one does not accept this view, then it makes little sense to optimise a portfolio whose risk is driven
only by variance because any level of risk can be achieved in many ways using assets, cash and leverage.

Here we argue that a pure risk-based strategy should be independent of any particular view on the
structure of expected returns, be it explicit or implicit. Instead we propose that the primary metric of
interest could be the realised variance and the purpose of diversification is to control the stability of
that realised variance. Following Bouchaud and Potters (2003) the NVOV is defined as:

$$\frac{SE^2(\sigma_t^2)}{\sigma_t^4} = \frac{K-1}{t}$$

where $K = \mu_4/\sigma^4$ is the population kurtosis, and $SE(\sigma_t^2)$ is the standard error of realised variance.
In essence this represents a switch from trading off mean and variance, to trading off variance and the
fourth moment. However, it is often more natural to work with portfolio volatility and so we also
define the Relative Volatility of Volatility (RVOV) as follows:

$$\frac{SE(\sigma_t)}{\sigma_t} = \frac{1}{\sqrt{t}}\,\frac{\sqrt{K-1}}{2}$$

Comparing this expression to that above for the realised mean and the coefficient of variation, one
might consider the reciprocal of $\sqrt{K-1}/2$ as equivalent to a Sharpe Ratio for this pure risk-based
framework. Rewriting $K - 1 = (\mu_4 - \sigma^4)/\sigma^4$, we thus obtain $2/\sqrt{K-1} = \sigma^2/(0.5\sqrt{\mu_4 - \sigma^4})$.

As argued in Fleming et al. (2020), the aforementioned approach is not meaningful under the
traditional multivariate Gaussian model of returns as the kurtosis of all portfolios is constant. It only
becomes meaningful when we extend our models to include the excess kurtosis that captures fat tails.

To understand the practical implication of these formulas we show the distribution of realised
volatility in Fig. 1 for a number of different return distributions that have unit variance. As RVOV is
leverage invariant, scaling to unit variance simplifies RVOV to $SE(\sigma_t)$ without loss of generality.
250,000 volatilities are calculated for each histogram and distribution in Fig. 1, and each volatility is
calculated using 250 data points, which is approximately equivalent to one year of trading day returns.

Comparing Normal and Laplace distributions, the relative size of RVOV is simply $\sqrt{(6-1)/(3-1)} = 1.58$,
so the standard error of volatility is 58% larger when increasing the kurtosis from 3 to 6. This
takes us from the Normal $SE$ of 4.5% to 7%. From the plot we can see that most observations lie in
the range 0.9 to 1.1 for the Normal distribution and 0.8 to 1.2 for the Laplace. We note the distribution
of realised volatility is positively skewed in general; however, the Central Limit Theorem (CLT) applies
so that more Normal looking distributions are obtained when summing 250 data points for these cases.
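The quoted figures can be checked directly from the RVOV expression, taking it as the square root of kurtosis minus one, divided by twice the square root of the number of observations. The sketch below is only that arithmetic; the function name is an illustrative choice.

```python
import math

def rvov(kurtosis: float, t: int) -> float:
    """Relative volatility of volatility: sqrt(K - 1) / (2 * sqrt(t))."""
    return math.sqrt(kurtosis - 1.0) / (2.0 * math.sqrt(t))

t = 250  # roughly one year of daily returns

rvov_normal = rvov(3.0, t)    # Normal distribution, kurtosis 3
rvov_laplace = rvov(6.0, t)   # Laplace distribution, kurtosis 6
ratio = rvov_laplace / rvov_normal  # equals sqrt(5 / 2)

print(f"Normal: {rvov_normal:.1%}, Laplace: {rvov_laplace:.1%}, ratio: {ratio:.2f}")
```

With unit variance these values are the standard errors of realised volatility themselves, which is how the 4.5% and 7% figures arise.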

In relation to symmetry and the CLT, the story starts to change somewhat for the Student distribution.
In Fig. 1 we see the realised volatility histograms for this distribution with 6 and 4.5 degrees of freedom,
corresponding to a kurtosis of 6 (same as the Laplace) and 15 respectively. Although the histogram for
Student(6) appears similar to that of Laplace, it is more positively skewed and has higher kurtosis.
These characteristics become even more pronounced for Student(4.5), where we observe strong
positive skew and a long tail up to 1.5, representing a difference of up to 50% from the unit expectation.
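The kurtosis values quoted for the two Student cases follow from the standard result that a Student t distribution with more than four degrees of freedom has kurtosis 3 + 6/(dof − 4). A minimal sketch, with a function name chosen for illustration:

```python
def student_t_kurtosis(dof: float) -> float:
    """Kurtosis of a Student t distribution; finite only for dof > 4."""
    if dof <= 4:
        raise ValueError("kurtosis is not finite for dof <= 4")
    return 3.0 + 6.0 / (dof - 4.0)

kurtosis_6 = student_t_kurtosis(6.0)     # 6.0, the same as the Laplace distribution
kurtosis_4_5 = student_t_kurtosis(4.5)   # 15.0
```

The kurtosis grows without bound as the degrees of freedom approach four from above, which is consistent with the much heavier right tail seen for Student(4.5).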

We note that although the CLT does apply to the Student distributions here, it only holds in the limit
as the number of summands tends to infinity. As reviewed extensively in Taleb (2020), the rate of
convergence of distributions towards Normality can be very slow, particularly when the distribution
has power-law tails, such as in the case of the Student distribution. The Laplace distribution does not
have such tails and so in general is better behaved. As we return to later, this also leads to lower data
requirements for estimating values numerically such as the kurtosis and higher-order co-moments. In
fact, one might consider the Laplace distribution as entry-level in terms of its fat tails (see Taleb 2020).

Fig. 1. Distribution of realised volatility for Normal, Laplace and Student distributions using 250,000
different realisations of volatility. Each realised volatility is calculated from a separately generated 𝑡 =
250 data points. All utilised distributions generating data have expected zero mean and unit variance.

Diversification and dimensionality

There is a natural desire to disentangle asset correlations and discern an effective number of equally
dominant and independent drivers of portfolio volatility. Although Risk Parity does not offer such a
measure, there have been a number of proposals in the literature. Meucci et al. (2009, 2013) use
uncorrelated factors to define two forms of an effective number of bets and an alternative mean-
diversification optimisation. However, in the case of identically-distributed asset returns and a
homogeneous correlation matrix, the metrics are unintuitively independent of the level of correlation.

In Choueifaty and Coignard (2013) an effective number of independent factors or degrees of freedom
is introduced as a transformation of the Diversification Ratio. Recalling the underlying assumption that
each asset has the same SR, the degrees of freedom is therein shown to be equivalent to the number
of assets with independent and identically distributed (IID) returns that would produce the same
portfolio SR. This highlights the important role that the assumption of a constant SR at the asset level
plays in interpreting the Diversification Ratio, which may be easily overlooked given its implicit nature.

In a similar fashion to the degrees of freedom described above, Fleming et al. (2020) introduced
portfolio dimensionality, $D_p$, as a transformation of the portfolio kurtosis for a pure risk-based strategy.
Assuming a Gaussian copula with marginal asset return distributions that are identical up to their
variances, and with finite kurtosis $\kappa > 3$, $D_p$ is given by:

$$D_p = \frac{\kappa - 3}{K_p - 3}$$

where $K_p$ is the portfolio kurtosis. The critical point of interest is that for an equally weighted portfolio
of $n$ assets described by IID returns, the dimensionality is exactly equal to $n$. This permits an elegant
connection to the Central Limit Theorem (CLT) as increasing $D_p$ is associated with decreasing portfolio
kurtosis and a distribution of portfolio returns that is closer to Normal. We note that this is
also effectively a transformation of NVOV and RVOV and offers an absolute comparison of portfolios.
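The IID property can be verified with a few lines of arithmetic. The sketch below relies on the standard result that the excess kurtosis of an equally weighted sum of n IID returns is the marginal excess kurtosis divided by n; the function names are illustrative only.

```python
def equal_weight_iid_kurtosis(marginal_kurtosis: float, n: int) -> float:
    """Kurtosis of an equally weighted portfolio of n IID assets."""
    return 3.0 + (marginal_kurtosis - 3.0) / n

def dimensionality(marginal_kurtosis: float, portfolio_kurtosis: float) -> float:
    """Portfolio dimensionality: marginal excess kurtosis over portfolio excess kurtosis."""
    if marginal_kurtosis <= 3.0 or portfolio_kurtosis <= 3.0:
        raise ValueError("both kurtoses must exceed 3")
    return (marginal_kurtosis - 3.0) / (portfolio_kurtosis - 3.0)

# Two IID Laplace assets (marginal kurtosis 6) give portfolio kurtosis 4.5 and dimensionality 2
k_two_assets = equal_weight_iid_kurtosis(6.0, 2)
d_two_assets = dimensionality(6.0, k_two_assets)

# More generally the dimensionality recovers n, up to floating-point rounding
d_values = [dimensionality(6.0, equal_weight_iid_kurtosis(6.0, n)) for n in range(1, 11)]
```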

Fig. 2. Kurtosis of a two-asset portfolio under the given correlation with dependence defined by a
Gaussian copula. Identical marginals are assumed with results presented for Laplace and Student(6),
each having kurtosis 6. Data requirements however differ notably with $t_{\text{Laplace}} = 10^8$ and $t_{\text{Student}} = 10^{11}$.

In Fig. 2 we show the kurtosis of a two-asset portfolio under an assumed model of zero-mean unit-
variance identically-distributed asset returns and a dependence structure defined by a Gaussian
copula. Under Laplace marginals we see smoothly varying levels of kurtosis under non-negative
correlations with a minimum kurtosis of 4.5 achieved for zero correlation, corresponding to $D_p = 2$.
However, under negative correlation, we see quite different behaviour as kurtosis is seen to increase
immediately from the boundaries. In the case of correlation -0.9, the kurtosis peaks above 6.5 before
falling towards 4.5. In fact the portfolio kurtosis does not benefit from such a correlation except in a
narrow window centred around $w_1 = 50\%$. A similar behaviour is seen for a correlation of -0.5, though
it now becomes possible to achieve kurtosis of less than 4.5 with a corresponding $D_p$ of 2.2. Note that
portfolios with kurtoses greater than that of the marginal distribution have a dimensionality of less than 1.

Fig. 2 also shows the same analysis using a Student marginal with kurtosis equal to 6. We observe a
more extreme outcome again with kurtosis peaking above 7 for a correlation of -0.9 and a narrower
window where the portfolio kurtosis gets below 6, though still remains above 5.5. Notably, under the
same correlation, we find that for a Student marginal with kurtosis 7, no split of capital achieves a
kurtosis less than 7, so the minimum kurtosis is 7 and is achieved with 100% invested in either asset.
This shows that even with relatively low levels of excess kurtosis, strong negative correlations that
lead to significant reductions in volatility may not reduce kurtosis at all. In Fleming et al. (2020) a four-
asset example is examined, using a Student(6) marginal, which produces Minimum Variance, Risk
Parity and Maximum Diversification Ratio (MDR) portfolios that all have kurtoses of around 14. Such
portfolios exhibit an NVOV and RVOV of around 600% and 250%, respectively, of their values under Normality.

We see two key implications of the above. Firstly, it is common with RP to achieve very low levels of
volatility due to heavier weightings in fixed income assets that have negative correlation to traditional
risky asset classes such as equities. One consequence of this is that RP portfolios are typically
leveraged to avoid low volatilities that may come with commensurately low levels of excess returns.
Our analysis highlights the potential instability in realised volatility that may arise due to leveraging
portfolios that have, by construction, relatively high kurtosis. Secondly, for long-short equity
portfolios, a similar instability might also be expected as highly positively-correlated stocks are
grouped in opposition to negate volatility in the presence of leverage. The significant sensitivity of
portfolio kurtosis to appropriate asset weightings also points to the importance of regularly
rebalancing long and short positions that fall out of line from a volatility-weighted perspective.

We advocate the use of Laplace marginals to initially understand dependence between assets. As a
distribution with a simple functional form, non-negligible excess kurtosis and tails that do not follow
a power-law, Fig. 2 shows that Laplace marginals provide more bounded values. Furthermore, the
simpler functional form and less extreme tails allow relatively efficient estimation with lower data
requirements. In contrast, the Student(6) results in Fig. 2 required 1000 times more data than the
Laplace distribution to achieve similar accuracy in terms of estimating kurtosis, i.e. $t = 10^{11}$ vs $10^8$.
Finally, we note that the use of large data sets is to avoid the variability of realised kurtosis itself.

Kurtosis minimisation as constrained polynomial optimisation

In Fleming et al. (2020) artificial asset allocation problems were used to compare and contrast
minimum kurtosis (MK) portfolios with MV, RP and MDR allocations. A Branch-and-Bound (BB)
algorithm introduced by Barkhagen et al. (2019) was used to find the MK allocation, which provided a
global minimum to what is a non-convex problem. However, BB algorithms do not scale well
computationally and so analysis thus far has been limited to only 𝑛 = 4 or 5 assets. This is a lower
number than would be required for practical application. Here we introduce the application of
polynomial optimisation to reduce computation time and increase the manageable assets to over 10.

Lasserre (2015) presents key theoretical results and a framework for solving polynomial optimisations.
The approach sits at the intersection of optimisation, probability theory and real algebraic geometry
and has a number of remarkable properties. In particular, it becomes possible to find global (not local)
optima for convex and non-convex polynomial functions over regions defined by polynomials that can
also be non-convex and possibly disconnected. The global optima may be multiple in nature and the
existence of the optima can be numerically certified. Critical to the framework is exploiting a duality
between nonnegative polynomials that have representations in terms of sums-of-squares (SOS) of
polynomials, and finite moment sequences with their associated positive semidefinite (PSD) matrices.

A polynomial $f(\mathbf{x}) = \sum_{\alpha} f_\alpha \mathbf{x}^\alpha$ of degree $2d$ can be rewritten as a sum-of-squares (SOS) of
polynomials in the variables of interest, if and only if it has a representation $f(\mathbf{x}) = \mathbf{v}(\mathbf{x})'\mathbf{Q}\mathbf{v}(\mathbf{x})$,
where $\mathbf{Q}$ is a positive semi-definite matrix and $\mathbf{v}(\mathbf{x})$ a vector of monomials $\mathbf{x}^\alpha$ that have degree $\le d$.
If such a representation exists, then $\mathbf{Q}$ can be found using a semi-definite programme (SDP), while its
existence is checked by the feasibility of the same SDP. Naturally all SOS polynomials are nonnegative;
however, powerful results from real algebraic geometry allow us to connect all polynomials with linear
combinations of SOS polynomials over semi-algebraic sets. This in turn allows us to construct a
hierarchy of SDP relaxations that generically converge to the minima of any polynomial over such sets.
The practical appeal of such an approach is that, although polynomial optimisation is in general NP-hard,
SDP problems are convex and can be solved efficiently using free and commercially available software.
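To make the Gram-matrix representation concrete, the sketch below checks it for a hand-picked univariate example, f(x) = (x − 1)^4, with monomial vector v(x) = (1, x, x^2). The matrix Q is written down by inspection as a rank-one outer product, which is positive semi-definite by construction; it is not found by an SDP solver, and the example is not taken from the paper.

```python
def f(x: int) -> int:
    """f(x) = x^4 - 4x^3 + 6x^2 - 4x + 1 = (x - 1)^4, a polynomial of degree 2d with d = 2."""
    return x**4 - 4 * x**3 + 6 * x**2 - 4 * x + 1

# f(x) = (1 - 2x + x^2)^2, so Q = q q' with q the coefficients of the squared polynomial
q = [1, -2, 1]
Q = [[qi * qj for qj in q] for qi in q]

def gram_form(x: int) -> int:
    """Evaluate v(x)' Q v(x) with v(x) = (1, x, x^2)."""
    v = [1, x, x**2]
    return sum(v[i] * Q[i][j] * v[j] for i in range(3) for j in range(3))

# Integer arithmetic makes the comparison exact at every test point
matches = all(f(x) == gram_form(x) for x in range(-5, 6))
```

In general Q is not unique, and an SDP solver searches over all symmetric matrices that reproduce the coefficients of f for one that is positive semi-definite.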

Semi-algebraic sets are of the form $\mathbf{K} = \{\mathbf{x} \in \mathbb{R}^n : g_j(\mathbf{x}) \ge 0,\ j = 1, \dots, m\}$, where $g_j(\mathbf{x}) \in \mathbb{R}[\mathbf{x}]$ and
$\mathbb{R}[\mathbf{x}]$ is the set of real polynomials over $\mathbb{R}^n$. Putinar's Positivstellensatz provides the surprising
relationship between polynomials that are strictly positive on $\mathbf{K}$ and SOS polynomials. Specifically, any
$f \in \mathbb{R}[\mathbf{x}]$ that is strictly positive over $\mathbf{K}$ may be written in the form $f(\mathbf{x}) = \sigma_0 + \sum_{j=1}^{m} \sigma_j g_j(\mathbf{x})$, where
each $\sigma_0, \sigma_j$ is an SOS polynomial. We note that $\mathbf{K}$ must actually satisfy some compactness conditions
that are not too restrictive, and that strict positivity vs nonnegativity has no impact on our approach.
To make this more concrete, consider the following optimisation problem:

$$f^* := \min_{\mathbf{x}} \{ f(\mathbf{x}) : \mathbf{x} \in \mathbf{K} \},$$

which can be reformulated simply in the following way:

$$f^* := \max_{\lambda} \{ \lambda : f(\mathbf{x}) - \lambda \ge 0,\ \forall \mathbf{x} \in \mathbf{K} \}.$$

Setting $g_0(\mathbf{x}) = 1$, when $f \in \mathbb{R}[\mathbf{x}]$ we can use Putinar's Positivstellensatz to write:

$$f^* := \max_{\lambda, \sigma_j} \Big\{ \lambda : f - \lambda = \sum_{j=0}^{m} \sigma_j g_j,\ j = 0, \dots, m \Big\}.$$

A hierarchy of SDPs then arises from the practical need to specify an upper bound on the degree of
the $\sigma_j$, which is not given by the Positivstellensatz and cannot be deduced directly from $f$. Letting
$v_j = \lceil (\deg g_j)/2 \rceil$ and $v = \max(v_1, \dots, v_m)$, we fix $d \ge d_0 = \max\{\lceil (\deg f)/2 \rceil, v\}$ and define a single SDP:

$$f_d^* := \max_{\lambda, \sigma_j} \Big\{ \lambda : f - \lambda = \sum_{j=0}^{m} \sigma_j g_j,\ \sigma_j \in \Sigma[\mathbf{x}]_{d - v_j},\ j = 0, \dots, m \Big\},$$

where $\Sigma[\mathbf{x}]_d$ generically denotes the set of SOS polynomials that have degree at most $2d$. It can
be shown that for any $d \ge d_0$, $f_d^* \le f_{d+1}^*$ and $f_d^* \to f^*$ as $d \to \infty$, so the definition above in fact
defines a hierarchy of SDPs that converge to the minimum of the original problem, $f^*$. Notice that,
similar to showing a polynomial is SOS and hence nonnegative, $f - \lambda = \sum_{j=0}^{m} \sigma_j g_j$ provides a
certificate of positivity for $f - \lambda$ on the set $\mathbf{K}$ for some SOS polynomials $\sigma_j$ of degree up to $2(d - v_j)$.

Although the above hierarchy allows us to reach the optimal value $f^*$ using a finite series of SDPs with
increasing $d$, in practice we use a dual facet of positive polynomials to enable us to more easily extract
the set of points $\{\mathbf{x}^*\}$ that yield $f^*$. The dual converts one SDP hierarchy to another, instead solving
for finite moment sequences constrained by PSD matrices that play a similar role to the elements $\sigma_j g_j$.

We proceed by considering a finite Borel probability measure $\mu$ and the expectation of $f$ as follows:

$$E[f(\mathbf{x})] = \int \Big( \sum_{\alpha} f_\alpha \mathbf{x}^\alpha \Big)\, d\mu = \sum_{\alpha} f_\alpha y_\alpha$$

where $y_\alpha$ represents the moments associated with the expectations of the $\mathbf{x}^\alpha$. By finding the measure
$\mu^*$ that minimises $\sum_{\alpha} f_\alpha y_\alpha$, we are thus able to extract the minimiser $\mathbf{x}^*$ of $f(\mathbf{x})$. To see this,
consider a unique global minimum $f^*$ located at point $\mathbf{x}^*$. Using the notation $\delta_{\mathbf{x}}$ to represent the
point-like Dirac delta function at any point $\mathbf{x}$, we can define $\mu^* = \delta_{\mathbf{x}^*}$ to be our optimising
measure such that $E[f(\mathbf{x})] = f^*$. This may also be extended to multiple global minima and their
associated locations, $\mathbf{x}_i^*$, such that $i = 1, \dots, s$ and $f(\mathbf{x}_i^*) = f^*$, where our optimal measure now
becomes $\mu^* = \sum_i \lambda_i \delta_{\mathbf{x}_i^*}$ for positive $\lambda_i$. In this form $\mu^*$ is termed an $s$-atomic measure consisting of $s$
atoms that are, in our application, located at the minima. Again, $E[f(\mathbf{x})] = f^*$ holds under $\mu^*$.
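The idea that an atomic measure placed on the minimisers reproduces the minimum can be illustrated with a toy univariate polynomial. The example below, f(x) = (x^2 − 1)^2 with global minimisers at x = −1 and x = +1, and the atom weights of 0.3 and 0.7, are hypothetical choices made for illustration. Exact rational arithmetic is used so that the equality is not blurred by rounding.

```python
from fractions import Fraction

# Coefficients of f(x) = x^4 - 2x^2 + 1 = (x^2 - 1)^2, keyed by the power of x
f_coeffs = {0: Fraction(1), 2: Fraction(-2), 4: Fraction(1)}

# Hypothetical 2-atomic probability measure on the two global minimisers
atoms = [Fraction(-1), Fraction(1)]
weights = [Fraction(3, 10), Fraction(7, 10)]  # positive and summing to one

def moment(k: int) -> Fraction:
    """k-th moment of the atomic measure: sum of weight * atom^k."""
    return sum(lam * x**k for lam, x in zip(weights, atoms))

# The objective is linear in the moments: sum over powers of coefficient * moment
objective = sum(c * moment(k) for k, c in f_coeffs.items())

# The minimum of f, evaluated directly at the atoms
f_star = min(sum(c * x**k for k, c in f_coeffs.items()) for x in atoms)
```

Any other choice of positive weights summing to one gives the same objective value, since every atom sits at a global minimiser.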

Notice that our objective function $E[f(\mathbf{x})] = \sum_{\alpha} f_\alpha y_\alpha$ is now a linear function of the finite moment
sequence $\mathbf{y} = \{y_\alpha\}$; however, some restrictions are needed to ensure that its associated measure $\mu$
exists only on our set $\mathbf{K}$ where we are searching for our minima. Finding a measure $\mu$ that has a given
moment sequence $\mathbf{y}$ is the well-known moment problem. Essentially our original polynomial
optimisation problem can be transformed into minimising a weighted sum of moments, $\sum_{\alpha} f_\alpha y_\alpha$,
subject to ensuring the existence of measure $\mu$ on $\mathbf{K}$. As described in Lasserre (2015), existence is
ensured through the positive semi-definiteness of symmetric moment matrices $M_d(\mathbf{y})$ and
$M_{d - v_j}(g_j \mathbf{y})$, for $j = 1, \dots, m$. We note that $M_d(\mathbf{y})$ and $M_{d - v_j}(g_j \mathbf{y})$ contain an ordering of moments
$y_\alpha$ up to degree $2d$ and $2(d - v_j) + \deg(g_j)$ respectively. Similar to the use of Putinar's Positivstellensatz
above, a sufficient value of $d$ that ensures the existence of $\mu^*$ is not in general known in advance and
so a hierarchy of SDPs that converges to $f^*$ from below is defined as follows:

$$\phi_d^* := \min_{\mathbf{y}} \Big\{ \sum_{\alpha} f_\alpha y_\alpha : M_d(\mathbf{y}) \succeq 0;\ M_{d - v_j}(g_j \mathbf{y}) \succeq 0,\ j = 1, \dots, m;\ y_{\mathbf{0}} = \int d\mu = 1 \Big\}.$$

Using this hierarchy, we can therefore find $\phi_d^*$ for some $d \ge d_0$. One can then check whether
$\phi_d^* = f^*$ using a simple rank comparison. In particular, if $\mathrm{rank}(M_d(\mathbf{y}^*)) = \mathrm{rank}(M_{d - v}(\mathbf{y}^*))$, then $\phi_d^* = f^*$.
If this condition does not hold, then we may increment $d$ by 1 and try again, or settle for $\phi_d^*$ as a
lower bound on $f^*$. The primary limitation to continually increasing $d$ is the size of moment matrices,
which become unmanageable for problems with greater than around 10 to 20 variables. However, the
remarkable property of the moment-problem formulation is that if the rank condition is satisfied, then
we can say that our $s$-atomic measure $\mu^*$ has at least $s = \mathrm{rank}(M_d(\mathbf{y}^*))$ atoms corresponding to
solutions $\mathbf{x}_i^* \in \mathbf{K}$, that are extracted numerically from $M_d(\mathbf{y}^*)$ using some well-known linear algebra.
Additionally, in contrast to the Positivstellensatz, the requirement for $\mathbf{K}$ to be compact can be relaxed.
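The rank comparison can be seen on a toy univariate case, where the moment matrix of order d is the Hankel matrix with entries equal to the moments of order i + j. The sketch below uses a hypothetical two-atom measure (atoms at −1 and +1 with weights 0.3 and 0.7) and compares the ranks of the order 1 and order 2 moment matrices, taking v = 1 as an assumption for this unconstrained illustration. The rank routine is a plain Gaussian elimination written for the example, and says nothing about how GLOPTIPOLY3 performs its own check.

```python
from fractions import Fraction

atoms = [Fraction(-1), Fraction(1)]
weights = [Fraction(3, 10), Fraction(7, 10)]  # hypothetical weights summing to one

def moment(k):
    """k-th moment of the two-atom measure."""
    return sum(lam * x**k for lam, x in zip(weights, atoms))

def moment_matrix(d):
    """Univariate moment (Hankel) matrix of order d, of size (d + 1) x (d + 1)."""
    return [[moment(i + j) for j in range(d + 1)] for i in range(d + 1)]

def matrix_rank(rows):
    """Rank by exact Gaussian elimination over the rationals."""
    m = [list(r) for r in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

rank_m1 = matrix_rank(moment_matrix(1))
rank_m2 = matrix_rank(moment_matrix(2))
rank_condition_holds = rank_m1 == rank_m2  # both ranks equal the number of atoms
```

Here the rank stops growing once it reaches the number of atoms, which is the behaviour the rank condition looks for.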

While this final formulation is relatively simple to write down, setting up the moment matrices and
extracting the solutions requires careful implementation. Helpfully, Lasserre (2015) is accompanied
by GLOPTIPOLY3, a MATLAB-based software package that sets up the chosen SDP, allows the user to
specify an SDP solver, checks the rank condition, then extracts the global minimisers as appropriate.
We use GLOPTIPOLY3 in the following in conjunction with the publicly-available SeDuMi SDP solver.

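The minimum kurtosis problem we wish to solve is easily converted into the above moment form.
Following Fleming et al. (2020), we consider the long-only minimum kurtosis portfolio problem based
on the notation $K_p = \mu_4/\sigma^4 = \mathbf{w}^\top \mathbf{M}_4 (\mathbf{w} \otimes \mathbf{w} \otimes \mathbf{w}) / (\mathbf{w}^\top \boldsymbol{\Sigma} \mathbf{w})^2$:

$$K_p^* := \min_{\mathbf{w}} \Big\{ \frac{\mathbf{w}^\top \mathbf{M}_4 (\mathbf{w} \otimes \mathbf{w} \otimes \mathbf{w})}{(\mathbf{w}^\top \boldsymbol{\Sigma} \mathbf{w})^2} : \mathbf{w}^\top \mathbf{1} = 1 \text{ and } \mathbf{w} \ge \mathbf{0} \Big\},$$

for portfolio weights $\mathbf{w} \in \mathbb{R}^n$, where $\mathbf{M}_4$ is an $n \times n^3$ matrix of fourth-order asset co-moments, $\otimes$
is the tensor product and $\boldsymbol{\Sigma}$ is the asset covariance matrix. In light of our polynomial optimisation
approach, we re-write this problem as a simple polynomial optimisation followed by a rescaling:

$$\mu_4^* := \min_{\mathbf{x}} \{ \mathbf{x}^\top \mathbf{M}_4 (\mathbf{x} \otimes \mathbf{x} \otimes \mathbf{x}) : \mathbf{x}^\top \boldsymbol{\Sigma} \mathbf{x} = 1 \text{ and } \mathbf{x} \ge \mathbf{0} \},$$

$$\mathbf{w}^* = \mathbf{x}^* / (\mathbf{x}^{*\top} \mathbf{1}),$$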

which represents a minimisation of the fourth moment followed by normalisation of the solution, so
that the resulting weights satisfy $\mathbf{w}^\top \mathbf{1} = 1$. We note that the rescaling leaves the kurtosis unchanged;
however, $\mu_4^*$ is usefully equal to the optimal $K_p^*$ under the constraint $\mathbf{x}^\top \boldsymbol{\Sigma} \mathbf{x} = 1$. Equality constraints
such as $\mathbf{x}^\top \boldsymbol{\Sigma} \mathbf{x} = 1$, which makes our problem non-convex, can be constructed in the usual way using
two inequalities, though GLOPTIPOLY3 conveniently allows equality constraints to be defined directly.
In fact, new insights into polynomial optimisations can be derived by treating equality constraints
separately from inequalities, as reviewed in Lasserre (2015). For simplicity we do not expand on this.
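The rescaling argument can be checked numerically on synthetic data. The sketch below builds a sample covariance matrix and an n by n^3 fourth-order co-moment matrix from hypothetical, independently generated Laplace-type returns (no copula or correlation structure is imposed), evaluates the fourth moment through the tensor-product form, and confirms that rescaling the weights to unit variance leaves the kurtosis unchanged. The weights, sample size and seed are arbitrary choices for illustration.

```python
import math
import random

random.seed(0)
n, T = 3, 500

# Hypothetical fat-tailed returns: the difference of two exponentials is Laplace distributed
raw = [[random.expovariate(1.0) - random.expovariate(1.0) for _ in range(n)] for _ in range(T)]
means = [sum(row[i] for row in raw) / T for i in range(n)]
R = [[row[i] - means[i] for i in range(n)] for row in raw]  # demeaned returns

# Sample covariance (n x n) and fourth-order co-moment matrix (n x n^3)
Sigma = [[sum(r[i] * r[j] for r in R) / T for j in range(n)] for i in range(n)]
M4 = [[sum(r[i] * r[j] * r[k] * r[l] for r in R) / T
       for j in range(n) for k in range(n) for l in range(n)]
      for i in range(n)]

def variance(x):
    """Quadratic form x' Sigma x."""
    return sum(x[i] * Sigma[i][j] * x[j] for i in range(n) for j in range(n))

def fourth_moment(x):
    """Quartic form x' M4 (x kron x kron x)."""
    kron3 = [x[j] * x[k] * x[l] for j in range(n) for k in range(n) for l in range(n)]
    return sum(x[i] * sum(m * z for m, z in zip(M4[i], kron3)) for i in range(n))

def kurtosis(x):
    return fourth_moment(x) / variance(x) ** 2

w = [0.5, 0.3, 0.2]                             # long-only weights summing to one
x = [wi / math.sqrt(variance(w)) for wi in w]   # rescaled so that x' Sigma x = 1
w_back = [xi / sum(x) for xi in x]              # normalisation recovers the weights
```

On the unit-variance surface the fourth moment and the kurtosis coincide, which is why the ratio in the original problem can be replaced by a polynomial objective with a polynomial equality constraint.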

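More formally, we solve the following pre-normalisation optimisation for some $d$:

$$\phi_d^* := \min_{\mathbf{y}} \Big\{ \sum_{\alpha} f_\alpha y_\alpha : M_d(\mathbf{y}) \succeq 0;\ M_{d - v_j}(g_j \mathbf{y}) \succeq 0,\ j = 1, \dots, n + 2;\ y_{\mathbf{0}} = 1 \Big\},$$

where:
• $f = \mathbf{x}^\top \mathbf{M}_4 (\mathbf{x} \otimes \mathbf{x} \otimes \mathbf{x}) = \sum_{\alpha} f_\alpha \mathbf{x}^\alpha$
• $f_\alpha$ are co-moment values from $\mathbf{M}_4$
• $y_\alpha = \int \mathbf{x}^\alpha \, d\mu$ are expectations over weights
• $g_j(\mathbf{x}) = x_j \ge 0$, $j = 1, \dots, n$
• $g_{n+1}(\mathbf{x}) = \mathbf{x}^\top \boldsymbol{\Sigma} \mathbf{x} - 1 \ge 0$
• $g_{n+2}(\mathbf{x}) = 1 - \mathbf{x}^\top \boldsymbol{\Sigma} \mathbf{x} \ge 0$
• $v_j = \lceil (\deg g_j)/2 \rceil$
• $\mathbf{K} = \{\mathbf{x} \in \mathbb{R}^n : g_j(\mathbf{x}) \ge 0,\ j = 1, \dots, n + 2\}$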
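The degree bookkeeping in the list above fixes the smallest admissible relaxation order. As a sketch, assuming the ceiling-based definitions given earlier in this section, the following computes the half-degrees of the constraints and the minimal order for a five-asset kurtosis problem; the helper name is illustrative.

```python
import math

def minimal_relaxation_order(deg_f: int, constraint_degrees: list[int]) -> tuple[int, list[int]]:
    """Return the minimal order d_0 and the list of v_j = ceil(deg g_j / 2)."""
    v = [math.ceil(deg / 2) for deg in constraint_degrees]
    d0 = max(math.ceil(deg_f / 2), max(v, default=0))
    return d0, v

n = 5
# n linear long-only constraints plus two quadratic inequalities encoding the variance equality
constraint_degrees = [1] * n + [2, 2]
d0, v = minimal_relaxation_order(deg_f=4, constraint_degrees=constraint_degrees)
```

The quartic objective alone forces the order to be at least two, and none of the constraints raises it further.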

Finally, we highlight that although kurtosis is a ratio of functions of two portfolio moments, they are
not from the same family of moments $y_\alpha$. In particular, the elements of $\mathbf{M}_4$ and $\boldsymbol{\Sigma}$ are coefficients in
our defining polynomials, $f$ and $g_j$, and are based on distributions of asset returns, while the $y_\alpha$ are
expectations of products of portfolio weights $\mathbf{x} \in \mathbb{R}^n$, defined by a measure over the weight region $\mathbf{K}$.

Numerical examples using polynomial optimisation

The power of polynomial optimisation can be easily seen from an example with multiple solutions. We
consider an asset universe with unit variance and a homogeneous correlation matrix defined by a
single parameter 𝜚 . In the case of 𝜚 ≥ 0, the minimum kurtosis portfolio is unique and equally
weighted; however, for 𝜚 < 0 this is not always the case. For a five asset universe with 𝜚 = −0.2,
Barkhagen et al. (2019) use a stochastic optimisation approach to show that an optimal portfolio is in
fact an equally weighted portfolio incorporating only four assets. By symmetry, it therefore follows
that there are five such portfolios, where each asset has zero weight in turn. Some intuition for this
result is provided by Fig. 2, where we can see that adding a new negatively correlated asset to a
portfolio can initially increase kurtosis before enabling a new minimum. However, this process only
works four times in our example as the new minimum kurtosis of the existing portfolio reduces with
each new asset. Adding a fifth does not help. This result is specific to the correlation and the chosen
marginal distribution, which was a Normal-Inverse Gaussian (NIG) with zero skew and kurtosis six.

Remarkably, polynomial optimisation allows us to extract all five solutions from a single SDP. Following
on from the previous section we must choose $d \ge d_0 = 2$. For speed it is preferable to choose $d$ to
be as small as possible, so we start with $d = 2$ and attempt to solve for $\phi_2^*$. We note that some care
is required in our handling of $\mathbf{M}_4$ and $\boldsymbol{\Sigma}$ as estimation errors can create small differences that render
our five global minima unequal. Following a similar scheme to that described in Fleming et al. (2020),
we generate correlated time series using a Gaussian copula and a chosen marginal distribution to
estimate $\mathbf{M}_4$ and $\boldsymbol{\Sigma}$. In this example we generate 𝑡 = 10U data points and use a Laplace marginal for
simplicity, speed, and having equal kurtosis to the NIG example. Estimation errors are removed by
rounding elements of $\mathbf{M}_4$ and $\boldsymbol{\Sigma}$ to two significant figures so e.g. $\boldsymbol{\Sigma}$ contains exact values 1 and −0.2.

Using GLOPTIPOLY3 to solve for $\phi_2^*$, we obtain a minimum kurtosis of 3.8121, which corresponds to
dimensionality of around 3.7. However, this solution does not pass the rank condition as $\mathrm{rank}(M_1) = 5$
and $\mathrm{rank}(M_2) = 10$. Therefore 3.8121 only represents a lower bound on our true solution.
Increasing $d$ to 3 and solving for $\phi_3^*$ produces a minimum of 3.825, which is very close to $\phi_2^*$, and
the rank condition is now satisfied. GLOPTIPOLY3 automatically extracts the five solutions, each with
one different asset having a weight of 0.0% and the others having equal weights of 79.1%. The
remaining step is our normalisation so that the non-zero weights become 25.0% and now sum to one.
We note that rounding elements of $\mathbf{M}_4$ and $\boldsymbol{\Sigma}$ is very crude but allows us to demonstrate the approach.
Using more considered methods to reduce distortion, $d = 2$ is adequate to find our solutions. In doing
so we achieve a lower minimum kurtosis of around 3.65, which now increases dimensionality over 4.
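The pre-normalisation weight of 79.1% follows from the unit-variance constraint alone. The sketch below rebuilds the five-asset homogeneous correlation matrix with correlation −0.2, places equal weight on four assets, scales to unit portfolio variance and then normalises; the choice of which asset is excluded is arbitrary and made only for illustration.

```python
import math

n, rho = 5, -0.2
# Unit variances with a homogeneous correlation of -0.2
Sigma = [[1.0 if i == j else rho for j in range(n)] for i in range(n)]

def quad_form(x):
    """Portfolio variance x' Sigma x."""
    return sum(x[i] * Sigma[i][j] * x[j] for i in range(n) for j in range(n))

active = [1.0, 1.0, 1.0, 1.0, 0.0]           # equal weights on four assets, fifth excluded
scale = 1.0 / math.sqrt(quad_form(active))   # enforce unit portfolio variance
x = [scale * a for a in active]              # pre-normalisation solution
w = [xi / sum(x) for xi in x]                # normalised portfolio weights

print([f"{xi:.1%}" for xi in x], [f"{wi:.1%}" for wi in w])
```

The unscaled four-asset portfolio has variance 4 + 12 × (−0.2) = 1.6, so each active weight is 1/√1.6, or about 0.791, before normalisation.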

In more practical settings we believe that non-identical assets and small numerical differences will
typically ensure there is only one solution to a minimum kurtosis problem. Also, in almost all examples
we have explored, d = 2 is found to be sufficient, thus requiring only a single optimisation. This
optimisation is also relatively fast: setting up the SDP, solving it and extracting the multiple solutions
takes less than 1 second for the above 5 asset problem on a 2019 16” 2.4GHz Intel i9 8-core MacBook
Pro. In fact estimating $\mathbf{M}_4$ and $\boldsymbol{\Sigma}$ takes the most time at almost 1 minute. In Barkhagen et al. (2019), the
fastest Branch and Bound (BB) algorithm takes 175 seconds on a 24-core machine for the same 5 asset
problem, while a similar 6 asset problem quickly becomes more demanding at 49,680 seconds. For
this reason a Stochastic Gradient Langevin Dynamics (SGLD) algorithm was introduced to solve larger
problems. Applied to the problem above, SGLD took more time than BB, requiring 2,476 seconds;
however, SGLD scaled more favourably to, for example, a 15-asset problem that took 8,322 seconds.

In comparison, using a number of artificially constructed examples, we found that GLOPTIPOLY3
solved problems with: (1) 10 assets in under 5 seconds, (2) 15 assets in under 120 seconds and (3) 20
assets in under 2,300 seconds. The PO approach is therefore dominant for typical asset allocation
problems. We note that while GLOPTIPOLY3 is not specifically optimised for parallel processing, there
remains a natural challenge to scaling this formulation of PO further as the moment matrices in the
SDP become large. One solution to this has been proposed by Lasserre et al. (2017), which replaces
the SOS-based hierarchy of PO relaxations implemented in GLOPTIPOLY3 with a Bounded SOS (BSOS)
hierarchy that combines linear and SOS relaxations. This allows the maximum size of any required SDP
to be fixed in advance. Therein, the BSOS approach is used to solve 40-variable problems with strong
similarities to our problem and quadratic problems with 100 variables that have polyhedral constraints.
Both of these larger problems are solved in 𝒪(10³) seconds. Finally, we confirmed that the minimum
kurtosis problems solved in Fleming et al. (2020), all with 𝑛 = 4 and using a BB algorithm, can be solved in
a fraction of a second using the rescaled solution of the PO approach implemented in GLOPTIPOLY3.
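
To give a feel for why the moment matrices become large, the sketch below counts the monomials of degree at most 𝑑 in 𝑛 variables, which is the side length of the order-𝑑 moment matrix in the standard dense hierarchy. It assumes one decision variable per asset. The SDP actually assembled by GLOPTIPOLY3 for the rescaled problem may differ in detail, and the function name is hypothetical.

```python
from math import comb


def moment_matrix_size(n_vars, d):
    """Number of monomials of degree <= d in n_vars variables."""
    return comb(n_vars + d, d)


# Side length of the moment matrix for relaxation orders d = 2 and d = 3.
for n in (5, 10, 15, 20):
    print(n, moment_matrix_size(n, 2), moment_matrix_size(n, 3))
```

The side length grows polynomially in 𝑛 for fixed 𝑑, but the number of matrix entries grows with its square, which is the scaling pressure that a bounded-size hierarchy is designed to avoid.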

Electronic copy available at: https://ssrn.com/abstract=3902755


To demonstrate the benefits of our approach we re-examine an asset allocation problem analysed by
Chaves et al. (2012) in the context of algorithms for calculating Risk Parity weights. This considers the
10 asset classes listed below along with their reproduced statistics for dates between 1991 and 2012:

In Fig. 3 we show the asset allocations obtained using six portfolio construction methodologies with
Laplace marginals. In addition to MV, RP, MK and DR, we add two more: (1) Minimum risk, based on the
fourth moment (MF) and (2) Risk parity based on the fourth moment (FP). We find both of interest as
we focus on the ratio of fourth moment to variance squared in the form of kurtosis. The MV and MF
portfolios are very similar here with the usual high concentration. As MV and MF portfolios are
identical in the case of Gaussian distributed returns, the similarity is perhaps not too surprising with a
relatively low level of marginal excess kurtosis. The RP and FP portfolios are strikingly similar and have
the most balanced allocations. The MK and DR portfolios are also very close to each other sitting
between the MV and RP portfolios in terms of the number of non-zero weights. The most interesting
aspect of the MK and DR portfolios is the focus on less correlated assets. For example, fixed income
is represented by a pure factor-like exposure to duration through Long Treasuries, and credit spreads
through low-duration HY bonds, while MSCI EAFE loses out due to its higher equity correlation profile.

Method   Volatility (%)   Kurtosis   𝐷$
MV       3.6              5.5        1.2
MF       3.6              5.1        1.5
RP       6.0              4.2        2.5
FP       6.0              4.2        2.5
MK       6.9              3.9        3.2
DR       7.0              4.0        3.2

Fig. 3. Asset allocations and portfolio statistics for six portfolio construction methodologies. Laplace
marginals are assumed with 𝑡 = 10( . Calculation time for MK optimisation is around 4 - 5 seconds.

Method   Volatility (%)   Kurtosis   𝐷$
MV       3.6              5.5        1.2
MF       3.6              5.0        1.5
RP       6.0              4.0        3.1
FP       6.1              4.0        3.1
MK       6.2              3.8        3.6
DR       7.0              3.9        3.4

Fig. 4. Comparison asset allocation results to Fig. 3, using Student(6) marginals with 𝑡 = 10,, .

We note that while the correlation matrix has negative entries, the values are still relatively close to
zero. In Fleming et al. (2020) we saw very similar RP, MK and DR allocations for small problems with
nonnegative (or close to) correlation matrices and different choices of marginals. Here we see that for
larger problems and a more realistic correlation matrix, allocations can differ significantly due to many
weightings being zero; however, the level of kurtosis remains quite consistent across the
methodologies. We believe this makes sense in the case of DR because the strong deviation from MK
primarily comes in the presence of strong negative correlations, as was seen in Fleming et al. (2020).
Although RP is constrained to be invested in every asset, our experience is that it is also typically a
strong performer in terms of minimising kurtosis when correlations are nonnegative. Some intuition
for this can be seen from the near-identical RP and FP allocations. Assuming the allocations are
identical, then we have one set of parity weights 𝒘* that are nonnegative and sum to 1. Additionally,
writing 𝛫 = 𝜇₄/𝜎⁴ = 𝑢/𝑣, the FP and RP allocations satisfy 𝑤ᵢ* 𝑢ᵢ′/𝑢 = 𝑤ᵢ* 𝑣ᵢ′/𝑣 = 1/𝑛 ∀ 𝑖. Together
these conditions satisfy the first-order conditions for minimising 𝛫 under long-only constraints;
however, these are necessary and not sufficient conditions for a minimum. In the case of nonnegative
correlations 𝑤ᵢ* 𝑢ᵢ′/𝑢 and 𝑤ᵢ* 𝑣ᵢ′/𝑣 must be positive, but need not be equal to 1/𝑛. In the presence of
strong negative correlations we often see significant 𝑢ᵢ′/𝑢 < 0 and 𝑣ᵢ′/𝑣 < 0, for some 𝑖, for the MK
portfolio. Lastly, as RP/FP are not duplication invariant, there is always further potential divergence.
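
The parity conditions above can be checked numerically for any candidate weight vector. The sketch below is a minimal illustration using sample returns. It takes 𝑢 as the portfolio fourth central moment and the variance as the second-moment quantity, and it normalises each asset's contribution by the degree of homogeneity (Euler's theorem) so that contributions sum to one. That normalisation, the function name and the simulated data are assumptions made for illustration, not the paper's exact notation or procedure.

```python
import numpy as np


def risk_contributions(weights, returns):
    """Normalised variance and fourth-moment contributions per asset."""
    w = np.asarray(weights, dtype=float)
    r = np.asarray(returns, dtype=float)
    r = r - r.mean(axis=0)        # centre each asset's returns
    p = r @ w                     # portfolio return per observation
    var = np.mean(p ** 2)
    m4 = np.mean(p ** 4)
    # Euler's theorem: w . grad(var) = 2 var and w . grad(m4) = 4 m4,
    # so dividing by the degree makes each set of contributions sum to one.
    var_contrib = w * (r.T @ p) / len(p) / var
    m4_contrib = w * (r.T @ p ** 3) / len(p) / m4
    return var_contrib, m4_contrib


rng = np.random.default_rng(0)
sample = rng.laplace(size=(50_000, 4))   # illustrative i.i.d. Laplace returns
vc, fc = risk_contributions(np.full(4, 0.25), sample)
print(vc.round(3), fc.round(3))          # expected to be close to 1/4 each
```

For exchangeable assets with equal weights both sets of contributions are close to 1/𝑛, so the RP and FP conditions hold together. With negatively correlated assets, individual contributions can turn negative.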

In Fig. 4 we show the impact of switching to a marginal distribution with power-law tails in the form
of a Student distribution with kurtosis of 6, equal to that of the Laplace. As seen in Fleming et al. (2020),
the introduction of power-law tails can cause quite sharp changes in allocations for MK portfolios,
often away from more negatively correlated assets. In this case we see a reduction in the large
allocation to Long Treasuries of around 20% that largely goes into BarCap Agg, which has correlations
to the non-Fixed Income assets that are closer to zero than Long Treasuries. BarCap Agg has the lowest
volatility of all the assets, so this results in a noticeable drop in volatility for the MK portfolio. There is
a small increase in allocation to MSCI EAFE too, but little change in other statistics across different
methodologies involving the fourth moment. Finally, we note that the assumption of homogeneous
marginals can naturally be questioned, particularly in relation to BarCap Agg, as it includes all
components of Fixed Income. However, our approach offers some practical insight and can be
extended to heterogeneous marginals with different dependence models too.
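
As a small worked example of matching a Student marginal to a target kurtosis, the sketch below inverts the standard Student-t kurtosis formula 3 + 6/(𝜈 − 4), which is valid for 𝜈 > 4. The function name is hypothetical; the kurtosis levels are those used in Figs. 4 and 5.

```python
def student_dof_for_kurtosis(kurtosis):
    """Degrees of freedom nu such that 3 + 6 / (nu - 4) equals the given kurtosis."""
    if kurtosis <= 3.0:
        raise ValueError("a finite Student-t kurtosis is always greater than 3")
    return 4.0 + 6.0 / (kurtosis - 3.0)


# A kurtosis of 6 (equal to that of the Laplace) gives six degrees of freedom.
for k in (4, 6, 8, 10):
    print(k, student_dof_for_kurtosis(k))
```

Higher target kurtosis pushes 𝜈 towards 4 from above, where the fourth moment ceases to exist.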

Marginal      Volatility (%)   Kurtosis   𝐷$
Laplace       6.9              3.9        3.2
St. 𝜅 = 4     7.0              3.3        3.4
St. 𝜅 = 6     6.2              3.8        3.6
St. 𝜅 = 8     6.1              4.2        4.0
St. 𝜅 = 10    6.2              4.6        4.5
RP (𝜅 = 10)   6.0              4.7        4.1

Fig. 5. Comparison results to Fig. 3 using range of Student marginals and RP for largest kurtosis of 10.

Finally, in Fig. 5 we show MK results for a broader range of marginals by adding Student-t results with
kurtoses 4, 8 and 10. As in Fleming et al. (2020), allocation results for kurtosis of 4 are very similar to
those using Laplace marginals, while the results for kurtoses 8 and 10 show an increasingly broader
distribution of capital, with allocations starting for asset 5 when kurtosis is 8. This suggests broader
capital allocations may be more appropriate for investors who believe in higher levels of kurtosis and
are concerned with the stability of realised risks. It may also provide a reason why Risk Parity can
perform well in the case of largely nonnegative correlations, having positive weights in every asset.

Conclusions

In relation to optimising diversification by minimising kurtosis, we have shown that Polynomial
Optimisation provides an efficient framework for minimising portfolio kurtosis in the context of
realistic asset allocation problems. As a deterministic approach that numerically certifies global
optima, it is preferable to deterministic Branch-and-Bound methods that struggle to scale up to
practical problem sizes, and multi-start stochastic methods that provide only limited guarantees on
optimality. For problems with more than 20 variables, Bounded Sum-of-Squares methods offer the
potential to go further still, up to 40 variables, and provide an interesting avenue for future research.

While polynomial optimisation itself is relatively fast, one area for improvement in setting up the
problem is the estimation of the matrix 𝑴₄, which can take hours for the largest problems. Techniques
that allow fast estimation of 𝑴₄ for different dependence structures and marginals would be of great
benefit. However, further consideration must be given to the large data requirements of estimation
to fully explore and back-test portfolio construction techniques that target the stability of realised risk.
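
To illustrate why the fourth-moment matrix dominates the set-up cost, the sketch below builds a plain sample estimate of the fourth co-moment matrix and uses it to evaluate portfolio kurtosis as a ratio of two polynomials in the weights. This is not the estimation procedure used in the paper. The 𝑛 × 𝑛³ layout, the function names and the simulated data are assumptions made for illustration.

```python
import numpy as np


def cokurtosis_matrix(returns):
    """Sample fourth co-moment matrix of shape (n, n**3) from a (T, n) array."""
    r = np.asarray(returns, dtype=float)
    r = r - r.mean(axis=0)
    t, n = r.shape
    # tensor[i, j, k, l] is the sample mean of r_i * r_j * r_k * r_l
    tensor = np.einsum("ti,tj,tk,tl->ijkl", r, r, r, r) / t
    return tensor.reshape(n, n ** 3)


def portfolio_kurtosis(weights, returns):
    """Kurtosis as (quartic form in w) / (quadratic form in w) squared."""
    w = np.asarray(weights, dtype=float)
    r = np.asarray(returns, dtype=float)
    m4 = cokurtosis_matrix(r)
    sigma = np.cov(r, rowvar=False, bias=True)
    fourth = w @ m4 @ np.kron(w, np.kron(w, w))
    variance = w @ sigma @ w
    return fourth / variance ** 2


rng = np.random.default_rng(1)
sample = rng.laplace(size=(20_000, 4))   # illustrative i.i.d. Laplace returns
print(portfolio_kurtosis(np.full(4, 0.25), sample))
```

The matrix has 𝑛⁴ entries, so both storage and estimation cost grow quickly with the number of assets.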

Future work could entail extending our diversification and optimisation framework to alternative and
more robust measures of the fat tails of distributions. For example, Cascon and Shadwick (2006)
present the ratio of the standard deviation to the mean absolute deviation (MAD) and the ratio of the
second L-moment to the MAD as alternative ways to characterise the tails of families of distributions.
As well as being more robust, these metrics can also be finite for, respectively, distributions of
undefined kurtosis and variance. This creates interesting new challenges in optimisation, as these non-
convex functions no longer fit into the polynomial optimisation framework presented in this paper.
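
As a minimal illustration of the first of these alternative tail measures, the sketch below estimates the ratio of the standard deviation to the mean absolute deviation from simulated samples. The function name and sample sizes are illustrative assumptions. The theoretical values noted in the comments are the standard results for the Gaussian and Laplace distributions.

```python
import numpy as np


def sd_to_mad(x):
    """Ratio of standard deviation to mean absolute deviation about the mean."""
    x = np.asarray(x, dtype=float)
    centred = x - x.mean()
    return centred.std() / np.abs(centred).mean()


rng = np.random.default_rng(0)
print(sd_to_mad(rng.normal(size=200_000)))   # theory: sqrt(pi / 2), about 1.2533
print(sd_to_mad(rng.laplace(size=200_000)))  # theory: sqrt(2), about 1.4142
```

The ratio involves an absolute value, so it is not a polynomial in the portfolio weights, which is why it falls outside the framework used here.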

References

[1] Fleming, B., and J. Kroeske. Diversification and the distribution of portfolio variance. Part 2:
Volatility Stability as a Measure of Diversification. SSRN, July (2020).

[2] Markowitz, H. Portfolio Selection. The Journal of Finance 7(1), 77-91 (1952).

[3] Choueifaty, Y., T. Froidure, and J. Reynier. Properties of the most diversified portfolio. Journal of
Investing 2, 49-70 (2013).

[4] Clarke, R., H. de Silva, and S. Thorley. Risk Parity, Maximum Diversification, and Minimum Variance:
An Analytic Perspective. Journal of Portfolio Management, Spring (2013).

[5] Roncalli, T. Introduction to Risk Parity and Budgeting. Taylor and Francis Group, Boca Raton, FL
(2014).

[6] Bouchaud, J.-P., and M. Potters. Theory of Financial Risk and Derivative Pricing (2nd ed.). Cambridge
University Press, NY (2003).

[7] Taleb, N. Statistical Consequences of Fat Tails. Stem Academic Press (2020).

[8] Meucci, A. Managing diversification. Risk, May (2009).

[9] Meucci, A., A. Santangelo, and R. Deguest. Risk budgeting and diversification based on optimized
uncorrelated factors. SSRN, August (2013).

[10] Barkhagen, M., B. Fleming, S. Garcia, J. Gondzio, J. Kalcsics, J. Kroeske, S. Sabanis, and A. Staal.
Optimising portfolio diversification and dimensionality. Working paper, Mathematics Department,
University of Edinburgh (2019).

[11] Lasserre, J. B. An introduction to polynomial and semi-algebraic optimization. Cambridge
University Press, UK (2015).

[12] Lasserre, J. B., K.-C. Toh, and S. Yang. A bounded degree SOS hierarchy for polynomial optimization.
EURO Journal on Computational Optimization 5, 87-117 (2017).

[13] Chaves, D., J. Hsu, F. Li, and O. Shakernia. Efficient algorithms for computing risk parity
portfolio weights. Journal of Investing 21(3), 150-163 (2012).

[14] Cascon, A., and W. F. Shadwick. The CS character and the limitations of the Sharpe Ratio. The
Journal of Investment Consulting 8(1), Summer (2006).
