Professional Documents
Culture Documents
Distance Correlation
Distance Correlation
In statistics and in probability theory, distance correlation or distance covariance is a measure of dependence
between two paired random vectors of arbitrary, not necessarily equal, dimension. The population distance
correlation coefficient is zero if and only if the random vectors are independent. Thus, distance correlation
measures both linear and nonlinear association between two random variables or random vectors. This is in
contrast to Pearson's correlation, which can only detect linear association between two random variables.
Distance correlation can be used to perform a statistical test of dependence with a permutation test. One first
computes the distance correlation (involving the re-centering of Euclidean distance matrices) between two
random vectors, and then compares this value to the distance correlations of many shuffles of the data.
Background
The classical measure of dependence, the
Pearson correlation coefficient,[1] is mainly
sensitive to a linear relationship between
two variables. Distance correlation was
introduced in 2005 by Gábor J. Székely in
several lectures to address this deficiency of
Pearson's correlation, namely that it can
easily be zero for dependent variables.
Correlation = 0 (uncorrelatedness) does not
imply independence while distance
correlation = 0 does imply independence.
The first results on distance correlation were Several sets of (x, y) points, with the distance correlation
published in 2007 and 2009.[2][3] It was coefficient of x and y for each set. Compare to the graph on
proved that distance covariance is the same correlation
as the Brownian covariance.[3] These
measures are examples of energy distances.
The distance correlation is derived from a number of other quantities that are used in its specification,
specifically: distance variance, distance standard deviation, and distance covariance. These quantities take
the same roles as the ordinary moments with corresponding names in the specification of the Pearson product-
moment correlation coefficient.
Definitions
Distance covariance
Let us start with the definition of the sample distance covariance. Let (Xk, Yk), k = 1, 2, ..., n be a statistical
sample from a pair of real valued or vector valued random variables (X, Y). First, compute the n by n distance
matrices (aj, k) and (bj, k) containing all pairwise distances
where ||⋅ ||denotes Euclidean norm. Then take all doubly centered distances
where is the j-th row mean, is the k-th column mean, and is the grand mean of the distance matrix of
the X sample. The notation is similar for the b values. (In the matrices of centered distances (Aj, k) and (Bj,k) all
rows and all columns sum to zero.) The squared sample distance covariance (a scalar) is simply the arithmetic
average of the products Aj, k Bj, k:
The statistic Tn = n dCov2 n (X, Y) determines a consistent multivariate test of independence of random vectors in
arbitrary dimensions. For an implementation see dcov.test function in the energy package for R.[4]
The population value of distance covariance can be defined along the same lines. Let X be a random variable
that takes values in a p-dimensional Euclidean space with probability distribution μ and let Y be a random
variable that takes values in a q-dimensional Euclidean space with probability distribution ν, and suppose that X
and Y have finite expectations. Write
where E denotes expected value, and and are independent and identically
distributed. The primed random variables and denote independent and identically
[5]
distributed (iid) copies of the variables and and are similarly iid. Distance covariance can be expressed in
terms of the classical Pearson's covariance, cov, as follows:
This identity shows that the distance covariance is not the same as the covariance of distances,
cov(‖X − X' ‖, ‖Y − Y' ‖). This can be zero even if X and Y are not independent.
Alternatively, the distance covariance can be defined as the weighted L2 norm of the distance between the joint
characteristic function of the random variables and the product of their marginal characteristic functions:[6]
where , , and are the characteristic functions of (X, Y), X, and Y, respectively, p, q
denote the Euclidean dimension of X and Y, and thus of s and t, and cp , cq are constants. The weight function
is chosen to produce a scale equivariant and rotation invariant measure that doesn't go to
zero for dependent variables.[6][7] One interpretation of the characteristic function definition is that the variables
eisX and eitY are cyclic representations of X and Y with different periods given by s and t, and the expression
ϕX, Y(s, t) − ϕX(s) ϕY(t) in the numerator of the characteristic function definition of distance covariance is simply
the classical covariance of eisX and eitY. The characteristic function definition clearly shows that dCov2 (X, Y) = 0
if and only if X and Y are independent.
The distance variance is a special case of distance covariance when the two variables are identical. The
population value of distance variance is the square root of
where , , and are independent and identically distributed random variables, denotes the expected
value, and for function , e.g., .
which is a relative of Corrado Gini's mean difference introduced in 1912 (but Gini did not work with centered
distances).[8]
The distance standard deviation is the square root of the distance variance.
Distance correlation
The distance correlation [2][3] of two random variables is obtained by dividing their distance covariance by the
product of their distance standard deviations. The distance correlation is the square root of
and the sample distance correlation is defined by substituting the sample distance covariance and distance
variances for the population coefficients above.
For easy computation of sample distance correlation see the dcor function in the energy package for R.[4]
Properties
Distance correlation
i. and ; this is in contrast to Pearson's correlation,
which can be negative.
ii. if and only if X and Y are independent.
iii. implies that dimensions of the linear subspaces spanned by X and Y samples
respectively are almost surely equal and if we assume that these subspaces are equal, then in
this subspace for some vector A, scalar b , and orthonormal matrix .
Distance covariance
i. and ;
ii. for all constant vectors ,
scalars , and orthonormal matrices .
iii. If the random vectors and are independent then
Equality holds if and only if and are both constants, or and are both constants, or
are mutually independent.
iv. if and only if X and Y are independent.
This last property is the most important effect of working with centered distances.
Distance variance
i. if and only if almost surely.
ii. if and only if every sample observation is identical.
iii. for all constant vectors A, scalars b , and orthonormal matrices
.
iv. If X and Y are independent then .
Equality holds in (iv) if and only if one of the random variables X or Y is a constant.
Generalization
Distance covariance can be generalized to include powers of Euclidean distance. Define
One can extend to metric-space-valued random variables and : If has law in a metric space with
metric , then define , , and (provided is finite, i.e., has finite
first moment), . Then if has law (in a possibly different
metric space with finite first moment), define
This is non-negative for all such iff both metric spaces have negative type.[11] Here, a metric space
has negative type if is isometric to a subset of a Hilbert space.[12] If both metric spaces have
strong negative type, then iff are independent.[11]
Alternately, one could define distance covariance to be the square of the energy distance: In
this case, the distance standard deviation of is measured in the same units as distance, and there exists an
unbiased estimator for the population distance covariance. [10]
Under these alternate definitions, the distance correlation is also defined as the square , rather than
the square root.
where E denotes the expected value and the prime denotes independent and identically distributed copies. We
need the following generalization of this formula. If U(s), V(t) are arbitrary random processes defined for all real
s and t then define the U-centered version of X by
whenever the subtracted conditional expected value exists and denote by YV the V-centered version of
Y.[3][13][14] The (U,V) covariance of (X,Y) is defined as the nonnegative number whose square is
whenever the right-hand side is nonnegative and finite. The most important example is when U and V are two-
sided independent Brownian motions /Wiener processes with expectation zero and covariance
|s| + |t| − |s − t| = 2 min(s,t) (for nonnegative s, t only). (This is twice the covariance of the standard Wiener
process; here the factor 2 simplifies the computations.) In this case the (U,V) covariance is called Brownian
covariance and is denoted by
There is a surprising coincidence: The Brownian covariance is the same as the distance covariance:
On the other hand, if we replace the Brownian motion with the deterministic identity function id then Covid (X,Y)
is simply the absolute value of the classical Pearson covariance,
Related metrics
Other correlational metrics, including kernel-based correlational metrics (such as the Hilbert-Schmidt
Independence Criterion or HSIC) can also detect linear and nonlinear interactions. Both distance correlation and
kernel-based metrics can be used in methods such as canonical correlation analysis and independent component
analysis to yield stronger statistical power.
See also
RV coefficient
For a related third-order statistic, see Distance skewness.
Notes
1. Pearson 1895a, 1895b 8. Gini 1912.
2. Székely, Rizzo & Bakirov 2007. 9. Székely & Rizzo 2009b.
3. Székely & Rizzo 2009a. 10. Székely & Rizzo 2014.
4. Rizzo & Székely 2021. 11. Lyons 2014.
5. Székely & Rizzo 2014, p. 11. 12. Klebanov 2005, p. .
6. Székely & Rizzo 2009a, p. 1249, Theorem 7, 13. Bickel & Xu 2009.
(3.7). 14. Kosorok 2009.
7. Székely & Rizzo 2012.
References
Bickel, Peter J.; Xu, Ying (2009). "Discussion of: Brownian distance covariance" (http://projecteuc
lid.org/download/pdfview_1/euclid.aoas/1267453934). The Annals of Applied Statistics. 3 (4):
1266–1269. doi:10.1214/09-AOAS312A (https://doi.org/10.1214%2F09-AOAS312A).
Gini, C. (1912). Variabilità e Mutabilità. Bologna: Tipografia di Paolo Cuppini.
Bibcode:1912vamu.book.....G (https://ui.adsabs.harvard.edu/abs/1912vamu.book.....G).
Klebanov, L. B. (2005). N-distances and their applications. Prague: Karolinum Press, Charles
University. ISBN 9788024611525.
Kosorok, Michael R. (2009). "Discussion of: Brownian distance covariance". The Annals of
Applied Statistics. 3 (4): 1270–1278. arXiv:1010.0822 (https://arxiv.org/abs/1010.0822).
doi:10.1214/09-AOAS312B (https://doi.org/10.1214%2F09-AOAS312B). S2CID 88518490 (http
s://api.semanticscholar.org/CorpusID:88518490).
Lyons, Russell (2014). "Distance covariance in metric spaces". The Annals of Probability. 41 (5):
3284–3305. arXiv:1106.5758 (https://arxiv.org/abs/1106.5758). doi:10.1214/12-AOP803 (https://d
oi.org/10.1214%2F12-AOP803). S2CID 73677891 (https://api.semanticscholar.org/CorpusID:736
77891).
Pearson, K. (1895a). "Note on regression and inheritance in the case of two parents".
Proceedings of the Royal Society. 58: 240–242. Bibcode:1895RSPS...58..240P (https://ui.adsab
s.harvard.edu/abs/1895RSPS...58..240P).
Pearson, K. (1895b). "Notes on the history of correlation" (https://zenodo.org/record/1431597).
Biometrika. 13: 25–45. doi:10.1093/biomet/13.1.25 (https://doi.org/10.1093%2Fbiomet%2F13.1.2
5).
Rizzo, Maria; Székely, Gábor (2021-02-22). "energy: E-Statistics: Multivariate Inference via the
Energy of Data" (https://cran.r-project.org/web/packages/energy/index.html). Version: 1.7-8.
Retrieved 2021-10-31.
Székely, Gábor J.; Rizzo, Maria L.; Bakirov, Nail K. (2007). "Measuring and testing independence
by correlation of distances". The Annals of Statistics. 35 (6): 2769–2794. arXiv:0803.4101 (https://
arxiv.org/abs/0803.4101). doi:10.1214/009053607000000505 (https://doi.org/10.1214%2F00905
3607000000505). S2CID 5661488 (https://api.semanticscholar.org/CorpusID:5661488).
Székely, Gábor J.; Rizzo, Maria L. (2009a). "Brownian distance covariance" (http://projecteuclid.o
rg/download/pdfview_1/euclid.aoas/1267453933). The Annals of Applied Statistics. 3 (4): 1236–
1265. doi:10.1214/09-AOAS312 (https://doi.org/10.1214%2F09-AOAS312). PMC 2889501 (http
s://www.ncbi.nlm.nih.gov/pmc/articles/PMC2889501). PMID 20574547 (https://pubmed.ncbi.nlm.
nih.gov/20574547).
Székely, Gábor J.; Rizzo, Maria L. (2009b). "Rejoinder: Brownian distance covariance" (https://do
i.org/10.1214%2F09-AOAS312REJ). The Annals of Applied Statistics. 3 (4): 1303–1308.
doi:10.1214/09-AOAS312REJ (https://doi.org/10.1214%2F09-AOAS312REJ).
Székely, Gábor J.; Rizzo, Maria L. (2012). "On the uniqueness of distance covariance". Statistics
& Probability Letters. 82 (12): 2278–2282. doi:10.1016/j.spl.2012.08.007 (https://doi.org/10.101
6%2Fj.spl.2012.08.007).
Székely, Gabor J.; Rizzo, Maria L. (2014). "Partial Distance Correlation with Methods for
Dissimilarities". The Annals of Statistics. 42 (6): 2382–2412. arXiv:1310.2926 (https://arxiv.org/ab
s/1310.2926). Bibcode:2014arXiv1310.2926S (https://ui.adsabs.harvard.edu/abs/2014arXiv131
0.2926S). doi:10.1214/14-AOS1255 (https://doi.org/10.1214%2F14-AOS1255). S2CID 55801702
(https://api.semanticscholar.org/CorpusID:55801702).
External links
E-statistics (energy statistics) (http://personal.bgsu.edu/~mrizzo/energy.htm)