Confidence Intervals For The Mahalanobis Distance: Communication in Statistics-Simulation and Computation March 2001



Confidence intervals for the Mahalanobis distance

Article  in  Communication in Statistics- Simulation and Computation · March 2001


DOI: 10.1081/SAC-100001856



CONFIDENCE INTERVALS FOR THE MAHALANOBIS DISTANCE

Benjamin Reiser

Department of Statistics
University of Haifa
Haifa 31905, Israel

Key Words: generalized ROC criterion; non-central F; optimal error rate;
overlapping coefficient.

ABSTRACT

Mahalanobis distances appear, often in a disguised form, in many statistical
problems dealing with comparing two multivariate normal populations.
Assuming a common covariance matrix the overlapping coefficient (Bradley,
1985), optimal error rates (Rao and Dorvlo, 1985) and the generalized ROC
criterion (Reiser and Faraggi, 1997) are all monotonic functions of the
Mahalanobis distance. Approximate confidence intervals for all of these have
appeared in the literature on an ad-hoc basis. In this paper we provide a unified
approach to obtaining an effectively exact confidence interval for the
Mahalanobis distance and all the above measures.

1. INTRODUCTION

For two multivariate normal populations $\pi_1$ and $\pi_2$ with mean vectors $\mu_x$
and $\mu_y$ and common covariance matrix $\Sigma$, the squared Mahalanobis distance is
defined to be

$$\delta^2 = (\mu_x - \mu_y)' \, \Sigma^{-1} (\mu_x - \mu_y). \tag{1}$$

$\delta^2$ provides a measure of the distance between the two populations. In many
statistical problems $\delta^2$ appears in a disguised form. In discriminant analysis
(Lachenbruch, 1975) the optimal error rates are the misclassification
probabilities which would be incurred by the best allocation rule calculated with
known population parameters. The optimal error rates in both $\pi_1$ and $\pi_2$ are

$$\mathrm{OER} = \Phi(-\delta/2) \tag{2}$$

where $\Phi$ denotes the cumulative standard normal distribution function. We use
$\delta$ to denote the positive square root of $\delta^2$. There is a substantial literature on
the point estimation of OER. Rao and Dorvlo (1985) provide a review while
Dorvlo (1992) discusses a jackknife-based procedure to obtain an approximate
confidence interval for (2).
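
As a small illustration of (1) and (2), the following Python sketch computes the squared Mahalanobis distance and the optimal error rate from known population parameters. The function names `mahalanobis_sq` and `optimal_error_rate` and the numerical inputs are assumptions made for this example and do not come from the paper.

```python
from math import sqrt
from statistics import NormalDist

import numpy as np


def mahalanobis_sq(mu_x, mu_y, sigma):
    """Squared Mahalanobis distance (1) for a common covariance matrix."""
    diff = np.asarray(mu_x, dtype=float) - np.asarray(mu_y, dtype=float)
    # Solve sigma * z = diff instead of forming the inverse explicitly.
    z = np.linalg.solve(np.asarray(sigma, dtype=float), diff)
    return float(diff @ z)


def optimal_error_rate(delta2):
    """Optimal error rate (2): Phi(-delta / 2) with delta = sqrt(delta2)."""
    return NormalDist().cdf(-sqrt(delta2) / 2.0)


# Hypothetical population parameters, chosen only for illustration.
mu_x = [1.0, 0.0]
mu_y = [0.0, 0.0]
sigma = np.eye(2)

delta2 = mahalanobis_sq(mu_x, mu_y, sigma)
oer = optimal_error_rate(delta2)
print(delta2, oer)
```

With an identity covariance matrix the distance reduces to the squared Euclidean distance between the means, which makes the result easy to check by hand.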
The overlapping coefficient (OVL) which is defined (Bradley, 1985) to be the
common area under two probability density functions is often used as a measure
of agreement between two populations. This coefficient has recently been
suggested (Rom and Hwang, 1996) as an appropriate measure of
bioequivalence. For two multivariate normal distributions with common
covariance matrix it can be shown that

$$\mathrm{OVL} = 2\,\Phi(-\delta/2). \tag{3}$$

For the univariate normal case Inman and Bradley (1994), Rom and Hwang
(1996) and Reiser and Faraggi (1999) discuss statistical inference. The more
general multivariate case does not appear to have been discussed to date in the
literature.
Su and Liu (1993) extend the use of receiver operating characteristic (ROC)
curves in the assessment of diagnostic markers to deal with multiple markers.
They define a generalized ROC criterion, A, which provides a measure of how
effective the "best" linear combination of the markers is. Under homogeneous
multivariate normality

$$A = \Phi(\delta/\sqrt{2}). \tag{4}$$

Reiser and Faraggi (1997) develop a confidence interval for A.


If a confidence interval for $\delta^2$ could be obtained, confidence intervals for
OVL, OER and A would automatically follow as these are all monotone
transformations of $\delta^2$.
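
The mapping from an interval for the squared distance to intervals for the derived measures can be sketched as follows. This is a minimal illustration for OER in (2) and OVL in (3); the helper name `transform_interval` is hypothetical. Because both measures decrease as the distance grows, the endpoints swap; an increasing measure such as A would keep their order.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cumulative distribution function


def transform_interval(lower, upper):
    """Map an interval (lower, upper) for delta^2 to intervals for OER and OVL."""
    d_lo, d_hi = sqrt(lower), sqrt(upper)
    return {
        # OER = Phi(-delta/2) is decreasing in delta, so endpoints swap.
        "OER": (PHI(-d_hi / 2.0), PHI(-d_lo / 2.0)),
        # OVL = 2 * Phi(-delta/2) is decreasing in delta as well.
        "OVL": (2.0 * PHI(-d_hi / 2.0), 2.0 * PHI(-d_lo / 2.0)),
    }


# Hypothetical interval for delta^2, used only to exercise the function.
result = transform_interval(0.25, 1.0)
print(result)
```

The same pattern applies to a one-sided bound: a lower bound on the squared distance becomes an upper bound on OER and OVL.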
Madansky and Olkin (1969) provide an approximate method based on the
asymptotic distribution of the likelihood ratio statistic, which could be applied to
provide an approximate confidence interval for (1). A Bayesian approach for
inference on (1) (Radhakrishnan, 1984) is possible but we will not consider it in
this paper.
In Section 2 we review the approach Reiser and Faraggi (1997) use to obtain
a confidence interval for A and point out that it provides an almost exact
solution for the confidence interval of $\delta^2$ and consequently for OER, OVL and
A. In Section 3 we further examine the properties of this solution by means of a
simulation study and provide an example. Some concluding comments are
provided in Section 4.

2. CONFIDENCE INTERVALS FOR THE MAHALANOBIS DISTANCE

Let $X_i$, $i=1,\ldots,m$, and $Y_j$, $j=1,\ldots,n$, be p-dimensional vectors sampled from the
normal populations $\pi_1$ and $\pi_2$ respectively. Denote the sample mean vectors
and sample sum of squares matrices by $\bar{x}$, $\bar{y}$, $S_x$ and $S_y$ respectively. Set
$\mu = \mu_x - \mu_y$, $\hat{\mu} = \bar{x} - \bar{y}$, $S_p = (S_x + S_y)/(m+n-2)$ and
$\hat{\delta}^2 = \hat{\mu}' S_p^{-1} \hat{\mu}$.

Following Reiser and Faraggi (1997),

$$D^2 = \frac{mn}{m+n}\,\frac{(m+n-p-1)}{(m+n-2)\,p}\,\hat{\delta}^2 \sim F_{p,\,m+n-p-1}(\lambda) \tag{5}$$

with $\lambda = \frac{mn}{m+n}\,\delta^2$, where $F_{v_1,v_2}(\lambda)$ denotes a non-central F variate with $v_1$ and
$v_2$ degrees of freedom and non-centrality parameter $\lambda$.
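
The statistic in (5) can be computed directly from two samples. The sketch below assumes the samples are stored as NumPy arrays with one observation per row; the function name `d2_statistic`, the seed and the simulated data are illustrative assumptions only.

```python
import numpy as np


def d2_statistic(X, Y):
    """Return (delta2_hat, D2) as in (5) for samples X (m x p) and Y (n x p)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    m, p = X.shape
    n = Y.shape[0]
    mu_hat = X.mean(axis=0) - Y.mean(axis=0)
    # Sums of squares and cross products about each sample mean.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    S_p = (Xc.T @ Xc + Yc.T @ Yc) / (m + n - 2)
    delta2_hat = float(mu_hat @ np.linalg.solve(S_p, mu_hat))
    scale = (m * n / (m + n)) * (m + n - p - 1) / ((m + n - 2) * p)
    return delta2_hat, scale * delta2_hat


rng = np.random.default_rng(0)  # arbitrary seed, for illustration only
X = rng.normal(loc=[1.0, 0.0, 0.0], size=(10, 3))
Y = rng.normal(loc=[0.0, 0.0, 0.0], size=(10, 3))
delta2_hat, D2 = d2_statistic(X, Y)
print(delta2_hat, D2)
```

Note that the pooled matrix divides the summed sums of squares by m + n - 2, so `S_p` is the usual pooled sample covariance matrix.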

Consequently one can immediately obtain a confidence interval for $\delta^2$ by
numerically solving the equations

$$\mathrm{Prob}\left( F_{p,\,m+n-p-1}\left( \frac{mn}{m+n}\,\underline{\delta}^2 \right) \le D^2 \right) = 1 - \frac{\alpha}{2} \tag{6}$$

$$\mathrm{Prob}\left( F_{p,\,m+n-p-1}\left( \frac{mn}{m+n}\,\overline{\delta}^2 \right) \le D^2 \right) = \frac{\alpha}{2} \tag{7}$$

for $\underline{\delta}^2$ and $\overline{\delta}^2$ respectively. The interval $(\underline{\delta}^2, \overline{\delta}^2)$ provides a $1-\alpha$ confidence
interval for $\delta^2$. If $\mathrm{Prob}(F_{p,\,m+n-p-1}(0) \le D^2)$ is less than $1-\alpha/2$ [$\alpha/2$] then no
solution is obtained for (6) [(7)] and the bound $\underline{\delta}^2$ [$\overline{\delta}^2$] is taken to be zero. If
not for this restriction the interval obtained from solving (6) and (7) would have
exactly $1-\alpha$ coverage. By solving only one of (6) or (7) the corresponding
one-sided confidence bounds are obtained. Frequently when dealing with
distances or functionals such as OER one-sided bounds are of greater interest
than two sided intervals.
We numerically investigate the coverage of this procedure in Section 3. Lam
(1987) provides graphs which give solutions of (6) and (7) for certain values of
$\alpha$ and of the degrees of freedom. We had no difficulty in finding numerical
solutions using the GAUSS Nonlinear Equations program.
If the true value of $\delta^2$ is large, (6) and (7) will generally be solvable.
Solutions will tend not to be available for small $\delta^2$. Denote by $F^{\gamma}_{p,\,m+n-p-1}$ the $\gamma$
percentile point of the central F distribution with appropriate degrees of
freedom. The lower bound $\underline{\delta}^2$ will be taken to be 0 if and only if

$$D^2 \le F^{1-\alpha/2}_{p,\,m+n-p-1} \tag{8}$$

while the upper bound $\overline{\delta}^2$ will be taken to be zero if and only if

$$D^2 \le F^{\alpha/2}_{p,\,m+n-p-1}. \tag{9}$$
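
Conditions (8) and (9) only involve central F percentiles, so they can be checked before any root finding is attempted. The following sketch uses SciPy's `f.ppf`; the function name `zero_bounds` and the example inputs are hypothetical.

```python
from scipy.stats import f


def zero_bounds(D2, p, m, n, alpha=0.05):
    """Return (lower_is_zero, upper_is_zero) according to (8) and (9)."""
    dfd = m + n - p - 1
    lower_is_zero = D2 <= f.ppf(1.0 - alpha / 2.0, p, dfd)  # condition (8)
    upper_is_zero = D2 <= f.ppf(alpha / 2.0, p, dfd)        # condition (9)
    return bool(lower_is_zero), bool(upper_is_zero)


# Hypothetical values of D2 for p = 2 and m = n = 10.
print(zero_bounds(0.001, 2, 10, 10))  # very small D2
print(zero_bounds(1.0, 2, 10, 10))    # moderate D2
print(zero_bounds(10.0, 2, 10, 10))   # large D2
```

Since the upper percentile exceeds the lower one, condition (9) can hold only when condition (8) also holds, which is why an interval with a positive lower bound and a zero upper bound cannot occur.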


Note that in the extreme case of the true value of $\delta^2$ being 0, events (8) and (9)
occur with probability $1-\alpha/2$ and $\alpha/2$ respectively. Consequently for $\delta^2 = 0$ the
resulting confidence interval can take the following three forms:
(i) (0,0) with probability $\alpha/2$
(ii) (0,a), a>0, with probability $1-\alpha$
(iii) (b,c), c>b>0, with probability $\alpha/2$
Thus the nominal $1-\alpha$ confidence interval will actually cover $\delta^2 = 0$ in cases (i)
and (ii) with total probability $1-\alpha/2$, i.e. conservative coverage. However for
$\delta^2$ quite small, but not zero, the actual coverage will tend to be close to the
nominal value of $1-\alpha$ since the probabilities in (i), (ii) and (iii) will not change
very much but coverage will tend to occur only in case (ii). Reiser and Faraggi
(1999) have made similar remarks for the univariate (p=1) problem.
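
The coverage at zero distance can be written out from the three cases listed above; the numerical values are simply the two nominal levels used later in the simulation study.

```latex
\begin{aligned}
P(\text{interval covers } 0 \mid \delta^2 = 0)
  &= P(\text{case (i)}) + P(\text{case (ii)}) \\
  &= \frac{\alpha}{2} + (1 - \alpha)
   = 1 - \frac{\alpha}{2}, \\
\alpha = 0.05 &\;\Rightarrow\; 1 - \frac{\alpha}{2} = 0.975, \\
\alpha = 0.10 &\;\Rightarrow\; 1 - \frac{\alpha}{2} = 0.950 .
\end{aligned}
```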

3. SIMULATION STUDY AND AN EXAMPLE

For the simulation study we used a certain standardization to ease the
computation burden. For $\Sigma$ positive definite, there exists a nonsingular matrix C
such that

$$C \Sigma C' = I.$$

Clearly inference on $\delta^2$ based on the $X_i$ and $Y_j$ is equivalent to inference
based on $X_i^* = C X_i$ and $Y_j^* = C Y_j$ where $X_i^* \sim N(\mu_x^*, I)$ and
$Y_j^* \sim N(\mu_y^*, I)$ with $\mu_x^* = C \mu_x$ and $\mu_y^* = C \mu_y$. Consequently we carry
out the simulation using two multivariate normal populations with common
covariance matrix taken to be the identity matrix and choose the population
means to provide a range of values for $\delta^2$.
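
The invariance behind this standardization can be checked numerically. The sketch below takes C to be the inverse of the Cholesky factor of the covariance matrix, which is one valid choice among many; the covariance matrix, mean vectors and seed are arbitrary values made up for the check.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, for illustration only
p = 3
B = rng.normal(size=(p, p))
sigma = B @ B.T + p * np.eye(p)  # a hypothetical positive definite Sigma
mu_x = rng.normal(size=p)
mu_y = rng.normal(size=p)

# One valid choice of C: the inverse of L, where Sigma = L L'.
L = np.linalg.cholesky(sigma)
C = np.linalg.inv(L)

# Squared distance on the original scale.
diff = mu_x - mu_y
delta2_original = float(diff @ np.linalg.solve(sigma, diff))

# Squared distance after standardization, where the covariance is the identity.
diff_star = C @ mu_x - C @ mu_y
delta2_standardized = float(diff_star @ diff_star)

print(delta2_original, delta2_standardized)
```

The two printed values agree up to floating point error, so simulating with an identity covariance matrix loses no generality.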
An extensive simulation study was carried out for $\delta^2$ = 0, 0.0001, 0.125,
0.25, 0.5, 1 and 2; $\alpha$ = 0.05 and 0.10; m=n=10, 20, 50; p=2, 3 and 5. 10,000
simulations were carried out for every combination of the parameter values
given above. Confidence intervals for $\delta^2$ were computed as described in
Section 2. The observed percentage of cases in which the confidence intervals
contained the true value of $\delta^2$ was noted and is denoted by CP in Table I. For
the sake of brevity we provide results only on some of the $\delta^2$ values. The
simulations were programmed in GAUSS (1994). In addition the Table presents
the proportion of cases falling below (above) the lower (upper) confidence
bounds. We denote these proportions by LT and RT respectively. These
measure the adequacy of the coverage of one-sided confidence bounds. An
estimated probability marked with bold face indicates that the 95% confidence
interval for that probability (based on a binomial sample of 10,000 simulated
data sets) does not contain the targeted nominal value.
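
The bold-face rule can be sketched as a simple check on each estimated probability. The paper does not say which binomial interval was used, so the normal-approximation (Wald) interval below is an assumption, as is the function name `outside_nominal`.

```python
from math import sqrt


def outside_nominal(p_hat, nominal, n_sims=10_000, z=1.96):
    """True if a 95% Wald interval around p_hat excludes the nominal value."""
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n_sims)
    return abs(p_hat - nominal) > half_width


# Two estimated coverage probabilities compared with a nominal level of 0.95.
print(outside_nominal(0.949, 0.95))
print(outside_nominal(0.940, 0.95))
```

With 10,000 simulated data sets the half-width near 0.95 is roughly 0.004, so an estimate has to miss the nominal level by more than about that amount to be flagged.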
The confidence interval gives simulated coverages close to their nominal
values except for the case of $\delta^2 = 0$. For this case the coverage is conservative
and is in fact $1-\alpha/2$ instead of $1-\alpha$. This interval is asymmetric with all cases
falling outside the interval falling below the lower bound. This is as expected
based on our theoretical explanation in Section 2. For all other cases LT is close
to RT.
One should note that for $\delta^2 \neq 0$ even in the cases (bold) in which the
approximate confidence intervals for the probabilities do not include the nominal
value, the estimated probabilities are quite close to their nominal values. The
differences between the estimated and nominal probabilities have little practical
importance. The performance of this procedure does not depend on p.
Our simulation study indicates that the confidence interval for the
Mahalanobis distance (and consequently for OVL, OER and A) presented in this
paper performs well even for small sample sizes and all $\delta^2$ except for $\delta^2$
exactly zero, which does not really occur in practice.


TABLE I. Coverage Probabilities for the Mahalanobis Distance based on 10,000 Simulations.

α = 1 - 0.95
                m=n=10              m=n=20              m=n=50
p   $\delta^2$  CP    LT    RT      CP    LT    RT      CP    LT    RT
2   0           .975  .025  0       .977  .023  0       .975  .025  0
3   0           .972  .028  0       .972  .028  0       .976  .024  0
5   0           .974  .026  0       .966  .034  0       .974  .026  0
2   .0001       .949  .025  .026    .949  .025  .026    .953  .023  .024
3   .0001       .948  .027  .025    .948  .027  .025    .953  .025  .022
5   .0001       .949  .026  .025    .943  .032  .025    .946  .028  .026
2   .125        .952  .025  .023    .952  .025  .023    .955  .023  .022
3   .125        .954  .023  .023    .954  .023  .023    .953  .024  .023
5   .125        .948  .027  .025    .940  .033  .027    .947  .027  .026
2   1.0         .950  .025  .025    .950  .025  .025    .950  .025  .025
3   1.0         .953  .023  .024    .953  .023  .024    .949  .025  .026
5   1.0         .945  .029  .026    .942  .033  .025    .952  .024  .024

α = 1 - 0.90
                m=n=10              m=n=20              m=n=50
p   $\delta^2$  CP    LT    RT      CP    LT    RT      CP    LT    RT
2   0           .951  .049  0       .953  .047  0       .949  .051  0
3   0           .947  .053  0       .947  .053  0       .953  .047  0
5   0           .943  .057  0       .937  .063  0       .951  .049  0
2   .0001       .900  .047  .053    .900  .047  .053    .903  .049  .048
3   .0001       .896  .055  .049    .896  .055  .049    .903  .050  .047
5   .0001       .900  .051  .049    .890  .061  .049    .895  .051  .054
2   .125        .899  .050  .051    .899  .050  .051    .904  .049  .047
3   .125        .903  .050  .047    .903  .050  .047    .903  .049  .048
5   .125        .898  .053  .049    .889  .059  .052    .896  .052  .052
2   1.0         .897  .053  .050    .897  .053  .050    .898  .050  .052
3   1.0         .906  .045  .049    .906  .045  .049    .897  .051  .052
5   1.0         .894  .055  .051    .888  .059  .053    .900  .048  .052

Values in bold are the estimated coverage probabilities whose 95% confidence interval (based on a binomial sample of
10,000 simulated data sets) does not include the targeted nominal probability.


TABLE II. Coverage Probabilities for OER based on 10,000 Simulations using the Jackknife.

α = 1 - 0.95
                      m=n=10              m=n=20              m=n=50
p   OER   $\delta^2$  CP    LT    RT      CP    LT    RT      CP    LT    RT
2   .500  0           .821  .039  .141    .845  .037  .118    .867  .029  .104
3   .500  0           .738  .019  .243    .801  .021  .178    .836  .017  .147
5   .500  0           .550  .007  .443    .674  .008  .319    .739  .008  .253
2   .498  .0001       .816  .041  .143    .854  .038  .108    .874  .032  .094
3   .498  .0001       .746  .023  .231    .810  .019  .171    .846  .022  .132
5   .498  .0001       .562  .008  .430    .685  .009  .306    .757  .010  .233
2   .430  .125        .865  .050  .084    .897  .049  .055    .919  .040  .041
3   .430  .125        .828  .034  .138    .884  .038  .078    .912  .040  .048
5   .430  .125        .682  .015  .303    .833  .022  .146    .893  .031  .076
2   .301  1.0         .863  .027  .110    .904  .021  .075    .921  .024  .055
3   .301  1.0         .820  .026  .154    .885  .026  .089    .910  .023  .067
5   .301  1.0         .707  .021  .272    .832  .024  .144    .898  .027  .075

α = 1 - 0.90
                      m=n=10              m=n=20              m=n=50
p   OER   $\delta^2$  CP    LT    RT      CP    LT    RT      CP    LT    RT
2   .500  0           .739  .049  .212    .767  .046  .187    .789  .037  .174
3   .500  0           .644  .028  .327    .709  .029  .263    .741  .023  .236
5   .500  0           .459  .010  .531    .572  .011  .417    .625  .012  .363
2   .498  .0001       .736  .052  .213    .770  .049  .181    .801  .040  .159
3   .498  .0001       .651  .031  .318    .718  .026  .256    .757  .031  .212
5   .498  .0001       .461  .013  .526    .580  .014  .406    .655  .015  .330
2   .430  .125        .804  .066  .131    .846  .065  .088    .866  .063  .071
3   .430  .125        .759  .047  .194    .883  .059  .119    .856  .062  .082
5   .430  .125        .600  .024  .376    .757  .036  .208    .833  .049  .118
2   .301  1.0         .803  .048  .149    .845  .045  .110    .862  .048  .090
3   .301  1.0         .754  .049  .198    .822  .048  .130    .854  .046  .100
5   .301  1.0         .630  .036  .333    .764  .044  .192    .834  .050  .116

As an example of this methodology consider the data discussed by Srivastava
and Carter (1983, p. 236-7) in which three psychological tests are administered
to 114 patients suffering from anxiety and 33 patients suffering from hysteria.
The estimate of the Mahalanobis distance is $\hat{\delta}^2 = 0.359$ and the resulting 95%
confidence interval obtained from (6) and (7) is (0, 0.875). Depending on the
particular focus of scientific interest, the corresponding confidence intervals for
any of the measures described in Section 1 can be obtained. For example the
95% confidence interval for OER is (0.32, 0.5), indicating, in the context of
discriminant analysis, a high probability of misclassifying patients.
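
The arithmetic behind this example can be retraced as follows, taking m = 114, n = 33 and p = 3 from the description of the data and the reported estimate 0.359; the statistic of (5) is evaluated first, and the reported interval (0, 0.875) is then mapped through (2).

```latex
\begin{aligned}
D^2 &= \frac{mn}{m+n}\,\frac{m+n-p-1}{(m+n-2)\,p}\,\hat{\delta}^2
     = \frac{114 \cdot 33}{147}\cdot\frac{143}{145 \cdot 3}\cdot 0.359
     \approx 3.02, \\
\text{upper OER limit} &= \Phi\!\left(-\sqrt{0}/2\right) = \Phi(0) = 0.5, \\
\text{lower OER limit} &= \Phi\!\left(-\sqrt{0.875}/2\right) = \Phi(-0.468) \approx 0.32 .
\end{aligned}
```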
An alternative confidence interval for the OER can be obtained using the
jackknife based methodology of Dorvlo (1992). In order to compare this
procedure with that based on solving (6) and (7), we repeated the simulation
study summarized in Table I for the jackknife based method. The results are
presented in Table II. Due to the one-to-one relationship in (2) between the
Mahalanobis distance and OER the two tables are directly comparable. It is clear
that the jackknife provides coverage substantially less than the nominal. The
coverage increases with a larger sample size but is still inadequate for m=n=50
and decreases with increasing p. We also compared the average length of the
confidence intervals for the two methods. In the majority of cases the average
length of the jackknife based confidence intervals was substantially greater than
that of the non-central F based procedure. In the few cases where it was shorter
the differences were quite small. It is important to note that due to the clear
inadequacy in the coverage of the jackknife procedure there is not much value in
comparing these lengths. For OER, interest is often focused on an upper
confidence bound. Comparing the columns headed RT in Tables I and II shows
that the jackknife does particularly poorly on the upper bound while the
non-central F based method performs well.


4. CONCLUDING REMARKS

We have shown that bounds obtained by solving equations involving the
non-central F distribution provide a unified approach to computing confidence
intervals for certain monotonic functions of the Mahalanobis distance, which
arise in various statistical problems. This method provides either one or
two-sided intervals as needed. Although the recommended procedure will be
theoretically conservative in terms of coverage probability, we have argued
heuristically and confirmed by simulation that for non-zero Mahalanobis
distance, the coverage will be essentially exact.

ACKNOWLEDGEMENTS

The author thanks the referee for comments which led to an improved
presentation.


BIBLIOGRAPHY

Bradley, E. L. Overlapping Coefficient. Encyclopedia of Statistical Sciences,
1985.

Dorvlo, A.S.S. An Interval Estimator of the Probability of Misclassification.
Journal of Mathematical Analysis and Applications, 1992, 171, 389-394.

Gauss. Nonlinear Equation Gauss Applications, 1994, Aptech Systems, Inc.

Inman, H. F. and Bradley, E. L. Hypothesis Tests and Confidence Interval
Estimates for the Overlap of Two Normal Distributions with Equal Variances.
Environmetrics, 1994, 5, 167-189.

Lachenbruch, P.A. Discriminant Analysis, 1975, Hafner: New York.

Lam, Y. M. Confidence Limits for Noncentrality Parameters of Noncentral
Chi-Squared and F Distributions. ASA Proceedings of Statistical Computing
Section, 1987, 441-443.

Madansky, A. and Olkin, I. Approximate Confidence Region for Constraint
Parameters. Multivariate Analysis II. P. R. Krishnaiah, Ed. 1969, 261-286.

Radhakrishnan, R. Estimating Mahalanobis's Distance Using Bayesian Analysis.
1984, 13, 2583-2600.

Rao, P.S.R.S. and Dorvlo, A.S.S. The Jackknife Procedure for the Probabilities
of Misclassification.
Computation, 1985, 14, 774-790.

Reiser, B. and Faraggi, D. Confidence Intervals for the Generalized ROC
Criterion. Biometrics, 1997, 53, 644-652.

Reiser, B. and Faraggi, D. Confidence Intervals for the Overlapping Coefficient;
The Normal Equal Variance Case. The Statistician, 1999, 48, 413-418.

Rom, D. M. and Hwang, E. Testing for Individual and Population Equivalence
Based on the Proportion of Similar Response. Statistics in Medicine, 1996, 15,
1489-1505.

Srivastava, M. S. and Carter, E. M. An Introduction to Applied Multivariate
Statistics, 1983, North Holland: New York.

Su, J. Q. and Liu, J. S. Linear Combinations of Multiple Diagnostic Markers.
Journal of the American Statistical Association, 1993, 88, 1350-1351.


