Beyond Correlation
Abstract
We derive conditional means from partial moment quadrants of the joint distribution. Restricting quadrants enables scenario analysis without the need for an underlying correlation assumption. Weighting these conditional means permits more generalized scenarios with embedded dependence structures. The resulting analysis simultaneously considers multiple correlation assumptions and demonstrates that correlation is not necessary to derive expected values; rather, it is merely a probability of that expected value for a given condition. Extending the analysis to mean / variance optimization identifies a major philosophical inconsistency in its treatment of correlation, and offers an alternative to the use of correlation in constructing portfolios.
1 Introduction
This article addresses correlation concerns for multivariate scenario analysis.
Partitioning the joint distribution into its partial moment quadrants enables
dependence analysis as well as stress testing scenarios. The restriction of the
joint distribution to specific quadrants alleviates the need to assign correlation
coefficients in scenario analysis.
The goal of this paper is to:
(i) demonstrate how multiple correlation scenarios can be aggregated and analyzed simultaneously as an artifact of generating conditional means.
2 Correlation
Correlation is imperative in scenario analysis and features prominently in portfolio theory. How should one consider correlation when generating a scenario analysis? Do you assume full correlation between securities and thus the worst case scenario? Do you assume independence and a zero correlation as a more likely scenario? Do you simulate every possible correlation on the spectrum? Correlation is not stable, and with three different correlations we can have three different outcomes for a multivariate relationship when testing an outcome on one of those variables. Equation 1 is the standard Pearson product moment correlation coefficient.
\rho_{xy} = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y} \tag{1}
The nonstationarity observed is due primarily to the normalization constant (σ_x σ_y); the literature is rife with problems in assuming a static correlation coefficient, be it linear or nonlinear. The denominator of equation 1 also calls into question the existence of a true correlation coefficient some strive to estimate.
Figure 1 below illustrates the behavior of the components of a Pearson correlation coefficient on two random normal variables consisting of 500 observations. The median absolute deviation (MAD) is taken of both the covariance Cov(X, Y) and the normalization constant (σ_x σ_y) for observations 200:500. Using the last 300 observations ignores any small sample size instabilities. We then generated 1,000 random seeds, taking the MAD for each seed.
The covariance Cov(X, Y) has a median absolute deviation of 0.007286368 while the normalization constant (σ_x σ_y) has a median absolute deviation of 0.008126352. The normalization constant is less stable than the covariance.
The following MAD output was generated with the supplemental R code available in Appendix B.
[1] 0.007286368
[1] 0.008126352
[1] 1.115282
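The Figure 1 experiment also translates outside of R. The following stdlib-Python sketch (illustrative names; it uses running sums for the expanding-window statistics and R's default 1.4826 scaling inside `mad`, and will not reproduce the exact figures above since those come from R's random number generator) compares the stability of the two components across independent seeds:

```python
import random
import statistics

def mad(values):
    """Median absolute deviation with R's default 1.4826 scaling."""
    med = statistics.median(values)
    return 1.4826 * statistics.median(abs(v - med) for v in values)

def component_paths(x, y, start=200):
    """Expanding-window Cov(X,Y) and sigma(X)*sigma(Y) from observation `start`."""
    sx = sy = sxy = sxx = syy = 0.0
    covs, sigs = [], []
    for i, (a, b) in enumerate(zip(x, y), 1):
        sx += a; sy += b; sxy += a * b; sxx += a * a; syy += b * b
        if i >= start:
            cov = (sxy - sx * sy / i) / (i - 1)
            var_x = (sxx - sx * sx / i) / (i - 1)
            var_y = (syy - sy * sy / i) / (i - 1)
            covs.append(cov)
            sigs.append((var_x * var_y) ** 0.5)
    return covs, sigs

# MAD of each component path, one pair per random seed
mad_cov, mad_sig = [], []
for seed in range(100):
    random.seed(seed)
    x = [random.gauss(0, 1) for _ in range(500)]
    y = [random.gauss(0, 1) for _ in range(500)]
    covs, sigs = component_paths(x, y)
    mad_cov.append(mad(covs))
    mad_sig.append(mad(sigs))
```

Comparing the medians of `mad_cov` and `mad_sig` over many seeds mirrors the comparison reported above.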
[Figure 1: Cov(X,Y) vs. (Sigma(X)*Sigma(Y)) stability; the median absolute deviation of each component plotted per random seed.]
3 Partial Moment Quadrants
A quick recap of the partial moment quadrants (which are the unnormalized
correlation elements) as defined in [4] is in order.
Co-Partial Moments:

CLPM(n, h_x|h_y, X|Y) = \frac{1}{T}\left[\sum_{t=1}^{T} \max(0, h_x - X_t)^n \, \max(0, h_y - Y_t)^n\right] \tag{2}

CUPM(q, l_x|l_y, X|Y) = \frac{1}{T}\left[\sum_{t=1}^{T} \max(0, X_t - l_x)^q \, \max(0, Y_t - l_y)^q\right] \tag{3}
where X_t represents the observation for variable X at time t, Y_t represents the observation for variable Y at time t, n is the degree of the LPM, q is the degree of the UPM, h_x is the target for computing below target observations for X, and l_x is the target for computing above target observations for X. For simplicity we assume that h_x = l_x, and per Pearson the mean is used as a target such that h_x = \bar{x}.
Divergent-Partial Moments:

DLPM(n, q, h_x|h_y, X|Y) = \frac{1}{T}\left[\sum_{t=1}^{T} \max(0, X_t - h_x)^q \, \max(0, h_y - Y_t)^n\right] \tag{4}

DUPM(n, q, h_x|h_y, X|Y) = \frac{1}{T}\left[\sum_{t=1}^{T} \max(0, h_x - X_t)^n \, \max(0, Y_t - h_y)^q\right] \tag{5}
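For reference, the four quadrant measures can be sketched in a few lines of stdlib Python (the paper's own code uses the NNS package in R; the function names below are illustrative, and degree 0 is treated as an indicator, matching the quadrant-probability interpretation used later):

```python
import random
import statistics

def _term(deviation, degree):
    """Partial moment term: 0 on the wrong side of the target,
    otherwise deviation**degree (an indicator when degree = 0)."""
    return deviation ** degree if deviation > 0 else 0.0

def co_lpm(n, x, y, hx, hy):
    """Equation 2: joint below-target co-movement of degree n."""
    return sum(_term(hx - a, n) * _term(hy - b, n) for a, b in zip(x, y)) / len(x)

def co_upm(q, x, y, lx, ly):
    """Equation 3: joint above-target co-movement of degree q."""
    return sum(_term(a - lx, q) * _term(b - ly, q) for a, b in zip(x, y)) / len(x)

def d_lpm(n, q, x, y, hx, hy):
    """Equation 4: X above target while Y is below target."""
    return sum(_term(a - hx, q) * _term(hy - b, n) for a, b in zip(x, y)) / len(x)

def d_upm(n, q, x, y, hx, hy):
    """Equation 5: X below target while Y is above target."""
    return sum(_term(hx - a, n) * _term(b - hy, q) for a, b in zip(x, y)) / len(x)

# Degree 0 with mean targets: the four quadrant probabilities partition the sample.
random.seed(1)
x = [random.gauss(0, 1) for _ in range(10000)]
y = [random.gauss(0, 1) for _ in range(10000)]
mx, my = statistics.fmean(x), statistics.fmean(y)
quads = [co_lpm(0, x, y, mx, my), co_upm(0, x, y, mx, my),
         d_lpm(0, 0, x, y, mx, my), d_upm(0, 0, x, y, mx, my)]
```

For independent variables each degree 0 quadrant measure sits near 0.25, and the four always sum to 1.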
[Figure: the four partial moment quadrants illustrated as scatter plots of Y against X; top panels labeled CLPM Quadrant and DUPM Quadrant.]
3.1 Quadrant Stability
Partial moments are inherently more stable than summary statistics such as the mean and variance. This is due to the robust property of partial moments whereby out of quadrant outliers are not compensated in each quadrant's measure.
Looking at the degree 0 CLPM versus a Pearson correlation coefficient (both measures normalized to the interval [0,1]) for a rolling 250 period window on top and a 20 period window on the bottom, the stability of the CLPM is evident.
4 Stress Scenario
A stress or panic scenario is different from a more generalized loss scenario specifically due to the extra condition of simultaneous losses on all of the variables. This reduces the subset of concern to the CLPM quadrant, as there are no negative correlation possibilities. By extension, all positive correlations within [0,1] for losses are considered.
Starting with a perfectly correlated set of assets (X = Y, a straight line with a slope equal to 1), the stress scenario is simple to identify as there are no divergent returns to consider. Unfortunately, assets do not share this perfect correlation property and often there are divergent returns to factor.
Extending the CLPM with a line from the origin (X̄, Ȳ) through (X̄_CLPM, Ȳ_CLPM) will generate E[Y⁻|X⁻] for any embedded conditional dependence structure. These quadrant means are also used as nonlinear regression points in [2] and [3].
E[Y^-|X^-] = \frac{(\bar{Y}_{CLPM} - \bar{Y})}{(\bar{X}_{CLPM} - \bar{X})} \, X \tag{6}
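A short stdlib-Python sketch makes equation 6 concrete (the paper's examples use R; the names here are illustrative). With perfectly correlated data (Y = X) the dependence slope is exactly 1, so a stress level of X maps to the same level of Y:

```python
import random
import statistics

random.seed(7)
x = [random.gauss(0, 1) for _ in range(100000)]
y = x[:]                      # perfectly correlated pair, for illustration

mx, my = statistics.fmean(x), statistics.fmean(y)

# CLPM quadrant means: averages over the joint below-mean observations
quad = [(a, b) for a, b in zip(x, y) if a < mx and b < my]
x_clpm = statistics.fmean(a for a, _ in quad)
y_clpm = statistics.fmean(b for _, b in quad)

# Equation 6: dependence slope through (mean(X), mean(Y)) and the quadrant mean
slope = (y_clpm - my) / (x_clpm - mx)

def e_y_given_x(x_level):
    """Equation 6: conditional mean of Y at a below-mean X level.
    Treats the mean point as the origin (sample means are ~0 here)."""
    return slope * x_level
```

With the mean point as origin, `e_y_given_x(-0.10)` returns the same -10% level for Y.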
[Figure: rolling 250 period CLPM vs. Pearson correlation (top panel) and rolling 20 period CLPM vs. Pearson correlation (bottom panel).]
Figure 4: Top panels show 100 observations of two random normal samples and their perfectly correlated CLPM means. Extending to 100,000 observations in the bottom panels and reducing correlation, we note the convergence of CLPM means as the observations increase to this limit condition.
[Figure 5: E[Y] for CLPM; the dependence line extended through the CLPM quadrant mean.]
Assuming alternating random periods of greater dependence and lesser dependence will on balance yield a slope equal to 1, if no prevailing insight exists as to either scenario. Regime switching between weakly and strongly correlated periods will always be reactionary when modeled, and may increase errors due to the position of the prior correlation assumption. For instance, assuming a greater dependence expands the range of errors when lesser dependence events occur. Revisit the volatility of the 20 period correlations in Figure 3 above.
An investigation into the underlying assumptions is warranted at this point. One assumption is that the relationship of X < X̄ and Y < Ȳ will hold as the number of observations increases. Given the stability of the means and of the CLPM quadrant as displayed in Figure 6 below, this is not an illogical assumption. Remember, every observation is used in the CLPM measure, aiding in its stability. Furthermore, if one uses the square root of time rule to extend the observed range, it would apply equally to all observations in that CLPM area, thus keeping the dependence slope intact.
With two identical assets, if X = −10% then E[Y⁻|X⁻] = −10%. Moving from this rigid condition to an i.i.d. assumption, by simply defining a stress scenario as the CLPM, if X = −10%, then we still have E[Y⁻|X⁻] = −10%. We realize this conclusion without explicitly stating a correlation for this scenario, simply by reducing the subset of observations to the CLPM quadrant and extending the dependence slope.
This is different from a simple linear regression on the subset of observations. For instance, the intercept in a linear regression is an afterthought, with no relevance to correlation scenarios. And least squares differences will overcompensate outliers versus a standard average of observations. Seeing as no observation can be defined as an outlier for variables of infinite variance, the equal treatment of those observations seems most appropriate. Figure 7 below illustrates the difference from a linear regression as well as the first principal component (PC1). The following output for a linear regression on the CLPM observations and PC1 exemplifies the inability of these techniques to serve as a dependence proxy.
> set.seed(123);x=rnorm(100000);y=rnorm(100000)
> lm(y[x<mean(x)&y<mean(y)]~x[x<mean(x)&y<mean(y)])
Call:
lm(formula = y[x < mean(x) & y < mean(y)] ~ x[x < mean(x) & y <
mean(y)])
Coefficients:
(Intercept) x[x < mean(x) & y < mean(y)]
-0.800355 -0.003094
[Figure 6: stability of the CLPM and of its quadrant means (mean of X in CLPM, mean of Y in CLPM) as observations accumulate.]
Figure 7: CLPM mean slope (green) vs. linear regression (yellow) of the CLPM quadrant and first principal component (blue).
5 Generalized Loss Scenario
By utilizing the weighted means of the CLPM (E[Y⁻|X⁻]) and DUPM (E[Y⁺|X⁻]) quadrants, we will simultaneously consider three correlation scenarios and generate a conditional mean E[Y|X⁻] for all X < X̄. For example, extrapolating the 100,000 observations of independent normally distributed variables to the DUPM quadrant will generate another mean of Y for the instances when X < X̄. The symmetrical, equally weighted means will average Ȳ.
Figure 8 below illustrates how using the weighted CLPM mean and DUPM mean as intersection points for a line through the origin (X̄, Ȳ) will generate E[Y⁺|X⁻] + E[Y⁻|X⁻] = 0 for any loss of X when X and Y are i.i.d.
One way to raise E[Y|X⁻] is to maximize the DUPM quadrant (E[Y⁺|X⁻]) and minimize the CLPM quadrant (E[Y⁻|X⁻]).
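The weighting above can be checked directly: weighting the CLPM and DUPM quadrant means by their degree 0 quadrant probabilities reproduces the plain conditional mean of Y given X below its mean. A stdlib-Python sketch (illustrative names; the paper's code uses R and the NNS package):

```python
import random
import statistics

random.seed(42)
x = [random.gauss(0, 1) for _ in range(100000)]
y = [random.gauss(0, 1) for _ in range(100000)]
mx, my = statistics.fmean(x), statistics.fmean(y)

# Y observations in the two left-hand quadrants (X below its mean)
clpm_y = [b for a, b in zip(x, y) if a < mx and b < my]   # joint losses
dupm_y = [b for a, b in zip(x, y) if a < mx and b > my]   # divergent gains

p_clpm = len(clpm_y) / len(x)   # degree 0 CLPM (quadrant probability)
p_dupm = len(dupm_y) / len(x)   # degree 0 DUPM

# Probability-weighted quadrant means vs. the direct conditional mean
weighted = (p_clpm * statistics.fmean(clpm_y) +
            p_dupm * statistics.fmean(dupm_y)) / (p_clpm + p_dupm)
direct = statistics.fmean(b for a, b in zip(x, y) if a < mx)
```

For i.i.d. variables the two symmetric quadrant means offset, so the weighted result sits near the unconditional mean of Y.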
[Figure 8: E[Y] for CLPM and DUPM; the weighted quadrant mean lines for X < X̄.]
[Figure 9: E[Y] for CUPM and DLPM; the corresponding quadrant mean lines for X > X̄.]
Testing equation 7 on two random normal variables, we have:
> set.seed(123);x=rnorm(10000);y=rnorm(10000)
> mean(y)
[1] -0.009106453
> D.UPM(0,0,x,y,mean(x),mean(y))*mean(y[x<mean(x)&y>mean(y)])+
Co.LPM(0,0,x,y,mean(x),mean(y))*mean(y[x<mean(x)&y<mean(y)])+
Co.UPM(0,0,x,y,mean(x),mean(y))*mean(y[x>mean(x)&y>mean(y)])+
D.LPM(0,0,x,y,mean(x),mean(y))*mean(y[x>mean(x)&y<mean(y)])
[1] -0.009106453
Our means are equal and we now have a much more granular definition to work with. This enhanced definition is critical in representing investor preferences.
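The same identity holds outside of R; an equivalent stdlib-Python check of the equation 7 decomposition (illustrative code, not the NNS implementation) recovers the unconditional mean of Y from the four probability-weighted quadrant means:

```python
import random
import statistics

random.seed(123)
x = [random.gauss(0, 1) for _ in range(10000)]
y = [random.gauss(0, 1) for _ in range(10000)]
mx, my = statistics.fmean(x), statistics.fmean(y)

def prob_and_mean(cond):
    """Degree 0 partial moment (quadrant probability) and mean of Y within it."""
    ys = [b for a, b in zip(x, y) if cond(a, b)]
    return len(ys) / len(y), statistics.fmean(ys)

p_dupm, m_dupm = prob_and_mean(lambda a, b: a < mx and b > my)
p_clpm, m_clpm = prob_and_mean(lambda a, b: a < mx and b < my)
p_cupm, m_cupm = prob_and_mean(lambda a, b: a > mx and b > my)
p_dlpm, m_dlpm = prob_and_mean(lambda a, b: a > mx and b < my)

# Equation 7: probability-weighted quadrant means recover mean(Y)
recovered = (p_dupm * m_dupm + p_clpm * m_clpm +
             p_cupm * m_cupm + p_dlpm * m_dlpm)
```

Because every observation lands in exactly one quadrant, the recovery is exact up to floating point error.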
7 Probability of E[Y⁻|X⁻]
When analyzing a stress loss level, one aspect yet to be discussed is the probability of this occurrence. The degree 0 CLPM measure as identified in the coefficients above in equation 7 (25% for i.i.d. variables) generates P(E[Y⁻|X⁻]) for the generalized loss scenario and aggregated conditional mean. The probability of a specific point for a given value of X is related to the correlation (2nd order) within that quadrant. So what is P(E[Y⁻|X⁻])? The correlation.¹ Obviously the higher the correlation, the less variance about E[Y⁻|X⁻], regardless of the slope. We have officially gone circular with this argument! Well, not really... What we have done is narrow down the fact that ρ_xy is not necessary to derive E[Y|X], while it is necessary for P(E[Y|X]).
¹When the variables are fully dependent, the absolute values of the 2nd order partial moment correlations are all equal to 1.
Figure 10: Zero correlation and full correlation of the CLPM quadrant. The area of the full correlation CLPM quadrant is larger than the area under no correlation.
> Co.LPM(1, 1, x, y, mean(x), mean(y))
[1] 0.1585697
[1] 0.4979909
Also, when extending the range and stressing an event such as a default, this hypothetical scenario needs to be further qualified by when. Surely a loss of 50% is far more probable in 10 years than it is tomorrow. Approximating the when with a square root of time extrapolation is a reasonable attempt if one is forced to do so, with the added benefit of increasing joint default probabilities (CLPM areas from threshold targets) as the horizon increases. Figure 11 visualizes the range expansion of a normal distribution as the number of observations tends towards 10,000.
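The range expansion itself is easy to see with a running-range sketch in stdlib Python (illustrative; the paper's Figure 11 is produced with R): the observed range of a normal sample can only widen as observations accumulate, which is what drives the larger CLPM areas at longer horizons.

```python
import random

random.seed(0)
xs = [random.gauss(0, 1) for _ in range(10000)]

def prefix_range(n):
    """Observed range (max minus min) of the first n observations."""
    return max(xs[:n]) - min(xs[:n])

# The range is monotonically nondecreasing in the sample size
ranges = {n: prefix_range(n) for n in (100, 1000, 10000)}
```

Because each prefix is contained in the next, the range can never shrink as the sample grows.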
[Figure 11: Range of X against sample size.]
increases the correlation of assets and probability of the expected value, yet this
goal is accomplished with uncorrelated assets.
This inconsistency masks the true intent.
9 Conclusions
Avoiding a correlation assumption is critical, due to the severe instability of the
normalization constant. Partial moment correlations greatly reduce this insta-
bility. Using the partial moment means generates conditional expected values
of Y . The implied dependence from the conditional means remains stationary
when using a square root of time extension to the data.
However, when using techniques involving point estimates for finance (maximizing the ratio CUPM/CLPM), the futility of point estimates for undefinable distributions is quickly realized. Past performance is no guarantee of future results. The variance of the conditional expected value is still not zero (and never known with certainty) and is reflected in the correlation of that subset. This is where the argument gets circular. In order to avoid a circular argument on correlation, minimizing the CLPM quadrant will serve to diminish the probability and level of the conditional expected value for losses. Maximizing the CUPM quadrant will serve to augment the probability and level of the conditional expected value for gains.
If the reader is reluctant to abandon the old and tired methods, at least derive the aggregated conditional measures such that any asymmetries can be identified and exploited, rather than losing that vital information with incapable techniques that have long outstayed their welcome.
References
[1] D J Lucas. Default correlation and credit analysis. Journal of Fixed Income, 4(4):76-87, 1995.
[2] H D Vinod and F Viole. Clustering and curve fitting by line segments. SSRN
eLibrary, 2016.
[3] H D Vinod and F Viole. New nonparametric curve-fitting using partitioning,
regression and partial derivative estimation. SSRN eLibrary, 2016.
[4] F Viole and D Nawrocki. Deriving Nonlinear Correlation Coefficients from
Partial Moments. SSRN eLibrary, 2012.
Appendix A
DU P M and DLP M Transpose
Below we show the inverse asymmetrical nature of the DUPM and DLPM matrices with variable relationships. Recall the relationship matrices from Section 3:
DUPM = \begin{pmatrix} A \to A \; (0) & A \to B \\ B \to A & B \to B \; (0) \end{pmatrix} \qquad DLPM = \begin{pmatrix} A \to A \; (0) & A \to B \\ B \to A & B \to B \; (0) \end{pmatrix}

DUPM^{T} = DLPM
" T #
1 X q q q
CU P Mxyz = max(0, Xt lx ) max(0, Yt ly ) max(0, Zt lz )
T t=1
(10)
Figure 12: CLPM and CUPM represented for 3 variables under independence, red and green respectively.
Figure 13: CLPM and CUPM represented for 3 variables under full correlation, red and green respectively.
Thus, the multivariate correlation derived from the Co-Partial Moments is
simply,
Appendix B: Supplemental R-Code
For motivated readers, we provide all of the R code used to produce the above
examples and plots. The following commands were executed with the NNS pack-
age available on CRAN:
cran.r-project.org/web/packages/NNS/
Figure 1:
> require(NNS)
> mad.cov.xy=numeric();mad.sig.xy=numeric()
> for(j in 1:1000){
set.seed(123*j);x=rnorm(500);y=rnorm(500)
cov.xy=numeric();sig.xy=numeric()
for(i in 200:length(x)){
cov.xy[i-199]=cov(x[1:i],y[1:i])
sig.xy[i-199]=sd(x[1:i])*sd(y[1:i])
}
mad.cov.xy[j]=mad(cov.xy)
mad.sig.xy[j]=mad(sig.xy)
}
> plot(mad.cov.xy,type = 'l',lwd=3,
main="Cov(X,Y) vs. (Sigma(X)*Sigma(Y)) Stability",
ylab='Median Average Deviation')
> lines(mad.sig.xy,type = 'l',col='grey',lwd=3)
> legend('topright',legend = c("Cov(X,Y)","Sigma(X)*Sigma(Y)"),
fill=c('black','grey'),border=NA)
Figure 2:
> par(mfrow = c(2, 1))
> set.seed(123)
> x = rnorm(1000)
> y = rnorm(1000)
> CLPM.250 = numeric()
> cor.250 = numeric()
> for (i in 1:(length(x) - 250)) {
rolling.x = x[i:(i + 250)]
rolling.y = y[i:(i + 250)]
CLPM.250[i] = Co.LPM(0, 0, rolling.x, rolling.y, mean(rolling.x),
mean(rolling.y))
cor.250[i] = cor(rolling.x, rolling.y)
}
> plot(CLPM.250, type = "l", lwd = 3, ylim = c(min(CLPM.250, cor.250),
max(CLPM.250, cor.250)), main = "Rolling 250 period CLPM vs. Pearson correlation",
ylab = "")
> lines(cor.250, type = "l", lwd = 3, col = "grey")
> legend("center", horiz = T, legend = c("CLPM", "Pearson"), fill = c("black",
"grey"), border = NA, bty = "n")
> CLPM.20 = numeric()
> cor.20 = numeric()
> for (i in 1:(length(x) - 20)) {
rolling.x = x[i:(i + 20)]
rolling.y = y[i:(i + 20)]
CLPM.20[i] = Co.LPM(0, 0, rolling.x, rolling.y, mean(rolling.x),
mean(rolling.y))
cor.20[i] = cor(rolling.x, rolling.y)
}
> plot(CLPM.20, type = "l", lwd = 3, ylim = c(min(CLPM.20, cor.20),
max(CLPM.20, cor.20)), main = "Rolling 20 period CLPM vs. Pearson correlation",
ylab = "")
> lines(cor.20, type = "l", lwd = 3, col = "grey")
> legend("topleft", horiz = T, legend = c("CLPM"), fill = c("black"),
border = NA, bty = "n")
> legend("bottomleft", horiz = T, legend = c("Pearson"), fill = c("grey"),
border = NA, bty = "n")
Figure 4:
> text(0.8 * max(x), mean(y[y < mean(y)]), format(mean(y[y < mean(y)]),
digits = 4), col = "red", pos = 1)
> abline(v = mean(x[x < mean(x)]), col = "red")
> text(mean(x[x < mean(x)]), 0.9 * max(y), format(mean(x[x < mean(x)]),
digits = 4), col = "red", pos = 4)
> plot(x, x, main = "100 Observations", xlab = "X", ylab = "Y",
col = ifelse(x <= mean(x) & y <= mean(y), "black", "grey"))
> abline(h = mean(x[x < mean(x)]), col = "red")
> text(0.8 * max(x), mean(x[x < mean(x)]), format(mean(x[x < mean(x)]),
digits = 4), col = "red", pos = 1)
> abline(v = mean(x[x < mean(x)]), col = "red")
> text(mean(x[x < mean(x)]), 0.9 * max(x), format(mean(x[x < mean(x)]),
digits = 4), col = "red", pos = 4)
> set.seed(123)
> x = rnorm(1e+05)
> y = rnorm(1e+05)
> plot(x, y, main = "100,000 Observations", xlab = "X", ylab = "Y",
col = ifelse(x <= mean(x) & y <= mean(y), "black", "grey"))
> abline(h = mean(y[y < mean(y)]), col = "red")
> text(0.8 * max(x), mean(y[y < mean(y)]), format(mean(y[y < mean(y)]),
digits = 3), col = "red", pos = 1)
> abline(v = mean(x[x < mean(x)]), col = "red")
> text(mean(x[x < mean(x)]), 0.9 * max(y), format(mean(x[x < mean(x)]),
digits = 3), col = "red", pos = 4)
> plot(x, x, main = "100,000 Observations", xlab = "X", ylab = "Y",
col = ifelse(x <= mean(x) & x <= mean(x), "black", "grey"))
> abline(h = mean(x[x < mean(x)]), col = "red")
> text(0.8 * max(x), mean(x[x < mean(x)]), format(mean(x[x < mean(x)]),
digits = 3), col = "red", pos = 1)
> abline(v = mean(x[x < mean(x)]), col = "red")
> text(mean(x[x < mean(x)]), 0.9 * max(x), format(mean(x[x < mean(x)]),
digits = 3), col = "red", pos = 4)
Figure 5:
> text(mean(x[x < mean(x) & y < mean(y)]), 0.9 * min(y[x < mean(x) &
y < mean(y)]), format(mean(x[x < mean(x) & y < mean(y)]),
digits = 4), col = "red", pos = 2)
> slope = (mean(y[x < mean(x) & y < mean(y)]) - mean(y))/(mean(x[x <
mean(x) & y < mean(y)]) - mean(x))
> segments(mean(x), mean(y), mean(x[x < mean(x) & y < mean(y)]),
mean(y[x < mean(x) & y < mean(y)]), col = "green", lwd = 3)
> segments(mean(x[x < mean(x) & y < mean(y)]), mean(y[x < mean(x) &
y < mean(y)]), (min(x[x < mean(x) & y < mean(y)])), slope *
(min(x[x < mean(x) & y < mean(y)])), col = "green", lwd = 3)
Figure 6:
Figure 7:
> abline(v = mean(x[x < mean(x) & y < mean(y)]), col = "red")
> text(mean(x[x < mean(x) & y < mean(y)]), 0.9 * min(y[x < mean(x) &
y < mean(y)]), format(mean(x[x < mean(x) & y < mean(y)]),
digits = 4), col = "red", pos = 2)
> slope = (mean(y[x < mean(x) & y < mean(y)]) - mean(y))/(mean(x[x <
mean(x) & y < mean(y)]) - mean(x))
> segments(mean(x), mean(y), mean(x[x < mean(x) & y < mean(y)]),
mean(y[x < mean(x) & y < mean(y)]), col = "green", lwd = 3)
> segments(mean(x[x < mean(x) & y < mean(y)]), mean(y[x < mean(x) &
y < mean(y)]), (min(x[x < mean(x) & y < mean(y)])), slope *
(min(x[x < mean(x) & y < mean(y)])), col = "green", lwd = 3)
> abline(lm(y[x < mean(x) & y < mean(y)] ~ x[x < mean(x) & y <
mean(y)]), col = "yellow", lwd = 3)
> new.x = x[x < mean(x) & y < mean(y)]
> new.y = y[x < mean(x) & y < mean(y)]
> xyNorm <- cbind(x = new.x - mean(new.x), y = new.y - mean(new.y))
> xyCov <- cov(xyNorm)
> eigenValues <- eigen(xyCov)$values
> eigenVectors <- eigen(xyCov)$vectors
> lines(new.x, (eigenVectors[2, 1]/eigenVectors[1, 1] * xyNorm[,
1]) + mean(new.y), col = "blue", lwd = 2)
Figure 8:
> segments(mean(x), mean(y), mean(x[x < mean(x) & y < mean(y)]),
mean(y[x < mean(x) & y < mean(y)]), col = "green", lwd = 3)
> segments(mean(x[x < mean(x) & y < mean(y)]), mean(y[x < mean(x) &
y < mean(y)]), (min(x[x < mean(x) & y < mean(y)])), slope.clpm *
(min(x[x < mean(x) & y < mean(y)])), col = "green", lwd = 3)
> segments(mean(x), mean(y), mean(x[x < mean(x) & y > mean(y)]),
mean(y[x < mean(x) & y > mean(y)]), col = "green", lwd = 3)
> segments(mean(x[x < mean(x) & y > mean(y)]), mean(y[x < mean(x) &
y > mean(y)]), (min(x[x < mean(x) & y > mean(y)])), slope.dupm *
(min(x[x < mean(x) & y > mean(y)])), col = "green", lwd = 3)
> abline(h = mean(y[x < mean(x)]), lwd = 3, lty = 3, col = "green")
Figure 9:
> abline(h = mean(y[x > mean(x)]), lwd = 3, lty = 3, col = "green")
Figure 12:
> par(mfrow = c(1, 1))
> set.seed(123)
> x = rnorm(10000)
> y = rnorm(10000)
> z = rnorm(10000)
> NNS.cor.hd(cbind(x, y, z), plot = T)
Figure 13:
> set.seed(123)
> x = rnorm(10000)
> y = x
> z = x
> NNS.cor.hd(cbind(x, y, z), plot = T)