Journal of Multivariate Analysis 93 (2005) 58–80

A new test for multivariate normality

Gábor J. Székely (a, 1) and Maria L. Rizzo (b)

(a) Department of Mathematics and Statistics, Bowling Green State University, Bowling Green, OH 43403-0221, USA
(b) Department of Mathematics, Ohio University, Athens, OH 45701, USA

Received 14 June 2002

Abstract

We propose a new class of rotation invariant and consistent goodness-of-fit tests for multivariate distributions based on Euclidean distance between sample elements. The proposed test applies to any multivariate distribution with finite second moments. In this article we apply the new method for testing multivariate normality when parameters are estimated. The resulting test is affine invariant and consistent against all fixed alternatives. A comparative Monte Carlo study suggests that our test is a powerful competitor to existing tests, and is very sensitive against heavy tailed alternatives. © 2003 Elsevier Inc. All rights reserved.

AMS 2000 subject classifications: primary 62G10; secondary 62H15

Keywords: Goodness-of-fit; Strictly negative definite; BHEP test; Henze-Zirkler test; Multivariate skewness; Multivariate kurtosis; Projection pursuit

1. Introduction

We propose a new class of consistent tests for comparing multivariate distributions based on Euclidean distance between sample elements. Applications include one-sample goodness-of-fit tests for discrete or continuous multivariate distributions in arbitrary dimension $d \ge 1$. The new tests can be applied to assess distributional assumptions for many classical procedures in multivariate analysis. In this article we present the general form of the multivariate goodness-of-fit statistic,


and show that the corresponding tests are consistent against all fixed alternatives. We implement the proposed test of multivariate normality and prove consistency under the composite hypothesis when population parameters are estimated from the sample. A comparative Monte Carlo power study is presented to assess the empirical power performance of the new test of multivariate normality.

Recent new approaches to testing multivariate normality have been proposed by Baringhaus and Henze [2], Bowman and Foster [5], Henze and Wagner [11], Henze and Zirkler [12], Romeu and Ozturk [19]. The most widely applied tests of multivariate normality are based on Mardia's multivariate generalization of skewness and kurtosis [16]. Malkovich and Afifi [15] proposed tests of multivariate normality based on one dimensional skewness and kurtosis statistics. Projection pursuit approaches to testing multinormality have applied the same idea, and also apply this idea with Cramér-von Mises and Kolmogorov-Smirnov goodness-of-fit statistics, and sample entropy [6,28,29].

The family of $d$-variate normal distributions is closed with respect to all nonsingular linear transformations, called affine transformations, so it is desirable that a goodness-of-fit test of multinormality is invariant under such transformations. A test statistic $T_n$ defined on a $d$-dimensional sample $X_1,\dots,X_n$ is affine invariant if $T_n(AX_1,\dots,AX_n) = T_n(X_1,\dots,X_n)$ for every affine transformation $A$ of $\mathbb{R}^d$. A goodness-of-fit test of $H_0: F \in \mathcal{F}$ vs. $H_1: F \notin \mathcal{F}$ is consistent against all fixed alternatives if the probability of rejecting the null hypothesis approaches one as sample size tends to infinity, whenever the distribution of the sampled population is not a member of $\mathcal{F}$.

Mardia [16] proposed multivariate generalizations of skewness and kurtosis measures. The class of elliptically symmetric distributions, including the $d$-variate normal, has population skewness zero. Baringhaus and Henze [3] obtained the nonnull limiting distribution of Mardia's skewness statistic and showed that Mardia's skewness test is consistent if and only if the sampled population is not elliptically symmetric. The kurtosis of a $d$-variate normal distribution as defined by Mardia is $d(d+2)$. Henze [10] showed that Mardia's kurtosis test of $d$-variate normality is consistent if and only if the kurtosis of the sampled population differs from $d(d+2)$.

Affine invariance and consistency are desirable properties for a test of multivariate normality, and certainly it is desirable that such a test is applicable for arbitrary dimension and sample size. Although there are several affine invariant tests of multivariate normality in the literature, not many of these tests are known to satisfy all these properties. The BHEP class of tests [2,7,11], including the Henze-Zirkler test [12], are based on the empirical characteristic function. The BHEP tests of multivariate normality are affine invariant and consistent against all fixed alternatives.

Our proposed test of multivariate normality has all the desirable properties mentioned above, and it is practical to apply. Results of a Monte Carlo power study suggest that the new test we propose is a powerful competitor to other existing affine invariant tests, including Mardia's multivariate skewness and kurtosis tests, and the BHEP class of tests.


Section 2 presents the theoretical foundation of the test, and the corresponding goodness-of-fit statistics are developed in Section 3. Consistency is proved in Section 4. Section 5 presents a Monte Carlo comparative power study. Asymptotic behavior of the test is discussed in Section 6, followed by a Summary in Section 7.

2. The basic inequality

Suppose $X_1,\dots,X_n$ is a $d$-dimensional random sample, $d \ge 1$. Let $X'$ denote an independent copy of the random variable $X$. Consider the $V$-statistic with kernel
\[ h(x,y) = E\|x - Y\| + E\|y - Y\| - E\|Y - Y'\| - \|x - y\|, \tag{1} \]
where $\|x\| = (x^T x)^{1/2}$ is the Euclidean norm of $x$, and $Y$ is a $d$-dimensional random vector. Define
\[ \mathcal{E}_n = \frac{1}{n}\sum_{j,k=1}^{n} h(X_j, X_k) = n\left(\frac{1}{n^2}\sum_{j,k=1}^{n} h(X_j, X_k)\right). \]
Then $\mathcal{E}_n/n$ is a $V$-statistic with kernel $h(x,y)$. Since $E[h(x,X)] = 0$ for every $x$, $\mathcal{E}_n/n$ is a degenerate kernel $V$-statistic. The existence of the limiting distribution of $n(\mathcal{E}_n/n) = \mathcal{E}_n$ follows from $U$-statistics limit theorems of Hoeffding [13], provided $E[h^2(X,X')] < \infty$ and $X \overset{d}{=} Y$, where $\overset{d}{=}$ means $X$ and $Y$ are identically distributed (see Section 6).

We will show that if $X, X', Y, Y'$ are independent, and $E\|X\|$ and $E\|Y\|$ are finite, then
\[ E[h(X,Y)] = 2E\|X - Y\| - E\|X - X'\| - E\|Y - Y'\| \ge 0 \tag{2} \]

with equality if and only if $X$ and $Y$ are identically distributed. An elementary proof of inequality (2) follows from the property of strict negative definiteness, defined below, of Euclidean distance.

Let $S$ be an arbitrary nonempty set. A symmetric, real valued function $g(x,y)$ defined on $S \times S$ is negative definite if for every positive integer $n$,
\[ \sum_{j,k=1}^{n} g(x_j, x_k)\, r_j r_k \le 0 \tag{3} \]
for every set $\{(x_j, x_k) \in S \times S,\ j,k = 1,\dots,n\}$, and every $n$-tuple of real numbers $r_1,\dots,r_n$ such that $\sum_{j=1}^{n} r_j = 0$. The function $g(x,y)$ is strictly negative definite if the above inequalities (3) are strict whenever $x_1,\dots,x_n$ are distinct and at least one of the $r_1,\dots,r_n$ does not vanish. In case $S$ is a metric space and $g$ is continuous on $S \times S$ we have the following equivalent definition. The function $g(x,y)$ is strictly negative definite if
\[ \int\!\!\int g(x,y)\, r(x)\, r(y)\, dQ(x)\, dQ(y) \le 0 \]


whenever $\int_S r(x)\, dQ(x) = 0$ for some probability measure $Q$, and equality holds if and only if $r(x) = 0$ a.s. $Q$.

Proposition 1 (Strict negative definiteness of $\|x - y\|$). Let $g(x,y): \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ be defined as the Euclidean distance between vectors $x$ and $y$. Then $g(x,y) = \|x - y\|$ is strictly negative definite.

Proposition 1 is proved in the appendix.

Theorem 1. Let $S$ be an arbitrary metric space, and let $g(x,y)$ be a continuous, symmetric, real-valued function on $S \times S$. Suppose $X$, $X'$, $Y$, and $Y'$ are independent $S$-valued random variables, $X$ and $X'$ are identically distributed, and $Y$ and $Y'$ are identically distributed. Suppose $g(X,X')$, $g(Y,Y')$, and $g(X,Y)$ have finite expected values. Then
\[ 2E\,g(X,Y) - E\,g(X,X') - E\,g(Y,Y') \ge 0 \tag{4} \]

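if and only if $g$ is negative definite. If $g$ is strictly negative definite then equality holds in (4) if and only if $X$ and $Y$ are identically distributed.

Proof. Let $\mu$ and $\nu$ denote the distributions of $X$ and $Y$ respectively, let $Q$ be an arbitrary probability measure such that $Q$ dominates both $\mu$ and $\nu$, and define
\[ r(x) = \frac{d\mu}{dQ}(x) - \frac{d\nu}{dQ}(x). \]
Then $\int_S r(x)\, dQ(x) = 0$, and by negative definiteness of $g(x,y)$ we have
\[
\begin{aligned}
2E\,g(X,Y) &- E\,g(X,X') - E\,g(Y,Y') \\
&= -\int_S\!\int_S g(x,y)\, d\mu(x)\, d\mu(y) + \int_S\!\int_S g(x,y)\, d\mu(x)\, d\nu(y) \\
&\quad + \int_S\!\int_S g(x,y)\, d\mu(y)\, d\nu(x) - \int_S\!\int_S g(x,y)\, d\nu(x)\, d\nu(y) \\
&= -\int_S\!\int_S g(x,y)\, r(x)\, r(y)\, dQ(x)\, dQ(y) \ge 0.
\end{aligned}
\]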


If $g(x,y)$ is strictly negative definite, equality holds if and only if $r(x) = 0$ a.s. $Q$, that is, if and only if $X$ and $Y$ are identically distributed. $\square$

Corollary 1. Suppose $X$, $X'$, $Y$, and $Y'$ are independent random vectors in $\mathbb{R}^d$. If $X$ and $X'$ are identically distributed, $Y$ and $Y'$ are identically distributed, and $E\|X\|$ and $E\|Y\|$ are finite, then
\[ 2E\|X - Y\| - E\|X - X'\| - E\|Y - Y'\| \ge 0, \tag{5} \]
and equality holds if and only if $X$ and $Y$ are identically distributed.


The following special case of Corollary 1 is proved by Morgenstern [17, p. 347]: for equal numbers of black and white points in Euclidean space, the sum of the pairwise distances between points of equal color is less than or equal to the sum of the pairwise distances between points of different color. Equality holds only in the case when black and white points coincide. The special case $Y = -X$ (in distribution) of Corollary 1 is proved in [23, pp. 458-459]. This special case was also associated with testing for diagonal symmetry in [24,25].
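The black and white points statement is the sample version of inequality (5), so it can be checked numerically. The following is a minimal sketch, not taken from the paper: the point sets, the seed, and the function names `dist` and `color_gap` are illustrative assumptions.

```python
import math
import random


def dist(u, v):
    """Euclidean distance between two points given as equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def color_gap(black, white):
    """Between-color distance sum minus within-color distance sum.

    Within-color sums run over unordered pairs; the between-color sum
    runs over every (black, white) pair.
    """
    between = sum(dist(b, w) for b in black for w in white)
    within = 0.0
    for pts in (black, white):
        for i in range(len(pts)):
            for k in range(i + 1, len(pts)):
                within += dist(pts[i], pts[k])
    return between - within


rng = random.Random(0)  # fixed seed, arbitrary choice
black = [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(6)]
white = [(rng.gauss(1, 2), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(6)]

gap_random = color_gap(black, white)  # nonnegative by the inequality
gap_same = color_gap(black, black)    # zero (up to rounding) when the sets coincide
```

With equal numbers of points in each color, `gap_random` is $n^2/2$ times the left-hand side of (5) evaluated at the two empirical distributions, which is why it cannot be negative.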

3. Goodness-of-fit tests

Suppose $X_1,\dots,X_n$ is a random sample from a $d$-variate population with distribution $F$, and $x_1,\dots,x_n$ are the observed values of the random sample. The proposed statistic for testing $H_0: F = F_0$ vs. $H_1: F \ne F_0$ is
\[ \mathcal{E}_{n,d} = n\left(\frac{2}{n}\sum_{j=1}^{n} E\|x_j - X\| - E\|X - X'\| - \frac{1}{n^2}\sum_{j,k=1}^{n}\|x_j - x_k\|\right), \]
where $X$ and $X'$ are independent and identically distributed (iid) with distribution $F_0$.

If the hypothesized distribution is $d$-variate normal with mean vector $\mu$ and nonsingular covariance matrix $\Sigma$, denoted $N_d(\mu,\Sigma)$, consider the transformed sample $y_j = \Sigma^{-1/2}(x_j - \mu)$, $j = 1,\dots,n$. The test statistic for $d$-variate normality is
\[ \mathcal{E}_{n,d} = n\left(\frac{2}{n}\sum_{j=1}^{n} E\|y_j - Z\| - E\|Z - Z'\| - \frac{1}{n^2}\sum_{j,k=1}^{n}\|y_j - y_k\|\right), \tag{6} \]

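where $Z$ and $Z'$ denote iid $N_d(0,I)$ random variables, and $I$ is the $d \times d$ identity matrix.

The first component of the test statistic involves computing $E\|a - Z\|$ where $a \in \mathbb{R}^d$ is fixed. In the univariate case
\[ E|a - Z| = 2a\Phi(a) + 2\phi(a) - a, \]
where $\Phi(x)$ and $\phi(x)$ are the $N(0,1)$ cumulative distribution and density functions. This mean can also be expressed as a series. Substituting the Taylor expansion of $\Phi(x)$,
\[
\begin{aligned}
E|a - Z| - E|Z| &= -a + 2\int_0^a \Phi(x)\, dx \\
&= -a + 2\int_0^a \left[\frac{1}{2} + \frac{1}{\sqrt{2\pi}}\sum_{k=0}^{\infty}\frac{(-1)^k x^{2k+1}}{2^k\, k!\,(2k+1)}\right] dx \\
&= \sqrt{\frac{2}{\pi}}\sum_{k=0}^{\infty}\frac{(-1)^k\, |a|^{2k+2}}{2^k\, k!\,(2k+1)(2k+2)}.
\end{aligned}
\]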
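The closed form and the series for $E|a - Z|$ can be compared numerically. The sketch below is illustrative only; the function names and the truncation at 60 terms are assumptions, and the series is added to $E|Z| = \sqrt{2/\pi}$ to recover $E|a - Z|$.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mean_abs_closed(a):
    """E|a - Z| from the closed form 2a*Phi(a) + 2*phi(a) - a."""
    return 2.0 * a * Phi(a) + 2.0 * phi(a) - a


def mean_abs_series(a, terms=60):
    """E|a - Z| from E|Z| = sqrt(2/pi) plus the alternating series in |a|."""
    total = 0.0
    for k in range(terms):
        denom = 2.0 ** k * math.factorial(k) * (2 * k + 1) * (2 * k + 2)
        total += (-1.0) ** k * abs(a) ** (2 * k + 2) / denom
    return math.sqrt(2.0 / math.pi) * (1.0 + total)
```

The series alternates in sign, so for large $|a|$ the partial sums suffer from cancellation; for moderate $|a|$, as in standardized data, the two functions should agree to many digits.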


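For $d \ge 1$ the following formula holds:
\[ E\|a - Z\| - E\|Z\| = \sqrt{\frac{2}{\pi}}\sum_{k=0}^{\infty}\frac{(-1)^k}{k!\,2^k}\,\frac{\|a\|^{2k+2}}{(2k+1)(2k+2)}\,\frac{\Gamma\!\left(\frac{d+1}{2}\right)\Gamma\!\left(k+\frac{3}{2}\right)}{\Gamma\!\left(k+\frac{d}{2}+1\right)}. \]
Then since
\[ E\|Z - Z'\| = \sqrt{2}\,E\|Z\| = 2\,\frac{\Gamma\!\left(\frac{d+1}{2}\right)}{\Gamma\!\left(\frac{d}{2}\right)}, \]
the computing formula for the $d$-variate normality test statistic is given by
\[ \mathcal{E}_{n,d} = n\left(\frac{2}{n}\sum_{j=1}^{n} E\|y_j - Z\| - 2\,\frac{\Gamma\!\left(\frac{d+1}{2}\right)}{\Gamma\!\left(\frac{d}{2}\right)} - \frac{1}{n^2}\sum_{j,k=1}^{n}\|y_j - y_k\|\right), \tag{8} \]
where
\[ E\|a - Z\| = \sqrt{2}\,\frac{\Gamma\!\left(\frac{d+1}{2}\right)}{\Gamma\!\left(\frac{d}{2}\right)} + \sqrt{\frac{2}{\pi}}\sum_{k=0}^{\infty}\frac{(-1)^k}{k!\,2^k}\,\frac{\|a\|^{2k+2}}{(2k+1)(2k+2)}\,\frac{\Gamma\!\left(\frac{d+1}{2}\right)\Gamma\!\left(k+\frac{3}{2}\right)}{\Gamma\!\left(k+\frac{d}{2}+1\right)}. \]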
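A direct transcription of the computing formula might look as follows. This is a sketch under stated assumptions rather than the authors' implementation: the sample `y` is taken to be already standardized, the series is truncated at a fixed number of terms (80 here, an arbitrary choice), and the function names are hypothetical.

```python
import math


def mean_norm_shifted(a, terms=80):
    """E||a - Z|| for Z ~ N_d(0, I), via the series in ||a||."""
    d = len(a)
    r2 = sum(x * x for x in a)  # ||a||^2
    # E||Z|| = sqrt(2) * Gamma((d+1)/2) / Gamma(d/2)
    base = math.sqrt(2.0) * math.exp(math.lgamma((d + 1) / 2) - math.lgamma(d / 2))
    total = 0.0
    for k in range(terms):
        log_gam = (math.lgamma((d + 1) / 2) + math.lgamma(k + 1.5)
                   - math.lgamma(k + d / 2 + 1))
        denom = math.factorial(k) * 2.0 ** k * (2 * k + 1) * (2 * k + 2)
        total += (-1.0) ** k * math.exp(log_gam) * r2 ** (k + 1) / denom
    return base + math.sqrt(2.0 / math.pi) * total


def energy_stat_simple(y):
    """Statistic for the simple hypothesis, y a list of standardized d-vectors."""
    n, d = len(y), len(y[0])
    mean_yz = sum(mean_norm_shifted(v) for v in y) / n
    mean_zz = 2.0 * math.exp(math.lgamma((d + 1) / 2) - math.lgamma(d / 2))
    mean_yy = sum(math.dist(u, v) for u in y for v in y) / n ** 2
    return n * (2.0 * mean_yz - mean_zz - mean_yy)


demo = [[0.1, -0.3], [1.2, 0.4], [-0.8, 0.9], [-0.5, -1.0]]  # made-up points
stat_demo = energy_stat_simple(demo)
```

Working with `math.lgamma` keeps the gamma ratios stable for larger $d$ and $k$. Because the series alternates, points with very large $\|a\|$ would need more care than this sketch takes.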

A test of the simple hypothesis of $d$-variate normality, $d \ge 1$, rejects the null hypothesis for large values of $\mathcal{E}_{n,d}$. The test is affine invariant, and consistency is an immediate consequence of Theorem 1.

In practice, however, the parameters of the hypothesized normal distribution are usually unknown. If $N_d(\mu,\Sigma)$ denotes the family of $d$-variate normal distributions with mean vector $\mu$ and covariance matrix $\Sigma > 0$, and $F_X$ is the distribution of a $d$-dimensional random vector $X$, the problem is to test if $F_X \in N_d(\mu,\Sigma)$. In this case, parameters $\mu$ and $\Sigma$ are estimated from the sample mean vector $\bar{X}$ and sample covariance matrix $S = (n-1)^{-1}\sum_{j=1}^{n}(X_j - \bar{X})(X_j - \bar{X})^T$, and the standardized sample vectors are $Y_j = S^{-1/2}(X_j - \bar{X})$, $j = 1,\dots,n$. The joint distribution of $Y_1,\dots,Y_n$ does not depend on unknown parameters, but $Y_1,\dots,Y_n$ are dependent. As a first step, we will ignore the dependence of the standardized sample, and use the computing formula given for testing the simple hypothesis. Denote by $\hat{\mathcal{E}}_{n,d}$ the version of the test statistic (6) and computing formula (8) obtained by standardizing the sample with estimated parameters. The test rejects the hypothesis of multivariate normality for large values of $\hat{\mathcal{E}}_{n,d}$. Percentiles of the finite sample null distribution of $\hat{\mathcal{E}}_{n,d}$ can be estimated by simulation (see Table 1). The test based on $\hat{\mathcal{E}}_{n,d}$ is clearly affine invariant, but since the standardized sample is dependent, the hypotheses of Theorem 1 do not hold.

Table 1
Empirical percentiles of $\hat{\mathcal{E}}_{n,d}$

          n = 25           n = 50           n = 100
  d     0.90    0.95     0.90    0.95     0.90    0.95
  1    0.686   0.819    0.695   0.832    0.701   0.840
  2    0.856   0.944    0.872   0.960    0.879   0.969
  3    0.983   1.047    1.002   1.066    1.011   1.077
  4    1.098   1.147    1.119   1.167    1.124   1.174
  5    1.204   1.241    1.223   1.263    1.234   1.277
 10    1.644   1.659    1.671   1.690    1.685   1.705
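Percentiles like those in Table 1 can be approximated by simulation. The sketch below covers only the univariate case ($d = 1$), standardizes with the sample mean and the $(n-1)$-denominator standard deviation, and uses 2000 replicates and a fixed seed; these choices and the function names are assumptions made for illustration, so the output need not match the table to the third decimal.

```python
import math
import random


def energy_stat_univariate(x):
    """Univariate statistic with estimated mean and variance.

    Uses the order-statistic identity
    sum_{j,k} |y_j - y_k| = 2 * sum_j (2j - 1 - n) * y_(j).
    """
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    y = sorted((v - mean) / sd for v in x)
    first = 0.0
    for v in y:
        cdf = 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))
        pdf = math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)
        first += 2.0 * v * cdf + 2.0 * pdf - v  # E|v - Z|
    pair = sum((2 * j - 1 - n) * v for j, v in enumerate(y, start=1))
    return n * (2.0 * first / n - 2.0 / math.sqrt(math.pi) - 2.0 * pair / n ** 2)


def null_percentiles(n, reps, probs, seed=1):
    """Empirical percentiles of the statistic under N(0, 1) sampling."""
    rng = random.Random(seed)
    stats = sorted(
        energy_stat_univariate([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(reps)
    )
    return [stats[min(reps - 1, int(p * reps))] for p in probs]


q90, q95 = null_percentiles(n=25, reps=2000, probs=(0.90, 0.95))
```

Because the statistic is location and scale invariant after standardization, sampling from $N(0,1)$ is enough to represent the whole normal family.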

4. Consistency

In this section a proof of consistency is presented for testing the composite hypothesis of multivariate normality using the test statistic $\hat{\mathcal{E}}_{n,d}$.

Let $c_{\alpha,n,d}$ denote the constant satisfying $P(\hat{\mathcal{E}}_{n,d} \ge c_{\alpha,n,d}) = \alpha$. The test is consistent against all fixed alternatives if, whenever the sampled population is nonnormal and fixed, $\lim_{n\to\infty} P(\hat{\mathcal{E}}_{n,d} \ge c_{\alpha,n,d}) = 1$. To prove consistency it is necessary that the sequence of critical values $c_{\alpha,n,d}$ for any fixed $\alpha$ and $d$ is bounded above by a finite constant $k_{\alpha,d}$ that does not depend on $n$. The existence of $k_{\alpha,d}$ follows from the existence of a finite upper bound for the expected value of $\hat{\mathcal{E}}_{n,d}$ that does not depend on $n$. The following inequalities are used to prove that such an upper bound exists.

Proposition 2. If $Z_1,\dots,Z_n$ is a univariate standard normal random sample, and $Y_1,\dots,Y_n$ is the standardized sample, then
\[ E|Y_1 - Y_2| > E|Z_1 - Z_2|. \tag{9} \]

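Proposition 3. Let
\[ E_S\!\left[\frac{1}{n}\sum_{j=1}^{n} E|Y_j - Z|\right] \]
denote the expected value of $\frac{1}{n}\sum_{j=1}^{n} E|y_j - Z|$ with respect to all standardized univariate normal samples $Y_1,\dots,Y_n$, where $E|a - Z| = \int_{-\infty}^{\infty} |a - z|\,\phi(z)\,dz$. Then for $X_1,\dots,X_n$ and $Z$ iid $N(0,1)$,
\[ E_S\!\left[\frac{1}{n}\sum_{j=1}^{n} E|Y_j - Z|\right] \le E\!\left[\frac{1}{n}\sum_{j=1}^{n} E|X_j - Z|\right]. \tag{10} \]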


Proofs of Propositions 2 and 3 are given in the appendix.

Proposition 4. Under the hypothesis of normality, for every fixed integer $d \ge 1$,
1. $E[\hat{\mathcal{E}}_{n,d}]$ is bounded above by a finite constant that depends only on $d$.
2. For every fixed $\alpha \in (0,1)$, the sequence of critical values $c_{\alpha,n,d}$ of $\hat{\mathcal{E}}_{n,d}$ is bounded above by a finite constant $k_{\alpha,d}$ that depends only on $\alpha$ and $d$.

Proof. Inequalities (9) and (10) imply that under univariate normality
\[
\begin{aligned}
E[\hat{\mathcal{E}}_{n,1}] &= E_S\!\left[n\left(\frac{2}{n}\sum_{j=1}^{n} E|Y_j - Z| - E|Z - Z'| - \frac{1}{n^2}\sum_{j,k=1}^{n}|Y_j - Y_k|\right)\right] \\
&< 2nE|X - Z| - nE|Z - Z'| - (n-1)E|Z_1 - Z_2| = E|Z - Z'| = \frac{2}{\sqrt{\pi}}.
\end{aligned}
\]
It follows that the sequence of critical values $c_{\alpha,n}$ of $\hat{\mathcal{E}}_{n,1}$ for any fixed $\alpha$ is bounded above by some finite constant $k_\alpha$. In the multivariate case, the same relationship holds: $E[\hat{\mathcal{E}}_{n,d}]$ is at most $d$ times the corresponding univariate expected value, so $E[\hat{\mathcal{E}}_{n,d}]$ and the critical values of $\hat{\mathcal{E}}_{n,d}$ are bounded. $\square$

Theorem 2 (Consistency). The test of multivariate normality based on $\hat{\mathcal{E}}_{n,d}$ is consistent against all nonnormal fixed alternatives.

Proof. Suppose $X$ is nonnormal with $E[X] = \mu = (\mu_k)$, $|\mu_k| < \infty$, $k = 1,\dots,d$, and $\mathrm{Cov}(X) = \Sigma = (\sigma_{jk})$, $|\sigma_{jk}| < \infty$, $j,k = 1,\dots,d$. Without loss of generality, assume that $\mu = 0$ and $\Sigma = I$. Let $X_1,\dots,X_n$ be a random sample from the distribution of $X$ and $Y_j = S^{-1/2}(X_j - \bar{X})$, $j = 1,\dots,n$. If $S$ is singular, define $Y_j = X_j - \bar{X}$. Then
\[ \hat{\mathcal{E}}_{n,d} = n\left(\frac{2}{n}\sum_{j=1}^{n} E\|y_j - Z\| - E\|Z - Z'\| - \frac{1}{n^2}\sum_{j,k=1}^{n}\|y_j - y_k\|\right) = \sum_{j=1}^{n} R_j, \]
where $R_1,\dots,R_n$ are identically distributed as
\[ R_1 = 2E\|y_1 - Z\| - E\|Z - Z'\| - \frac{1}{n}\sum_{j=1}^{n}\|Y_1 - Y_j\| \]
and $Z$, $Z'$ are iid $N_d(0,I)$. From (A.7) in the appendix
\[ \lim_{n\to\infty} E_S\!\left[\frac{1}{n}\sum_{j=1}^{n} E|Y_j - Z|\right] = E\!\left[\frac{1}{n}\sum_{j=1}^{n} E|X_j - Z|\right]. \]




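\[ \lim_{n\to\infty}\frac{1}{n}\sum_{j=1}^{n} R_j = 2E\|X - Z\| - E\|Z - Z'\| - E\|X - X'\| \ge 0, \]
with equality if and only if $X \overset{d}{=} Z$, by Theorem 1. Since $X$ is nonnormal, and $\lim_{n\to\infty} c_{\alpha,n,d} \le k_{\alpha,d}$, there exists $\delta > 0$ such that
\[
\begin{aligned}
\lim_{n\to\infty} P(\hat{\mathcal{E}}_{n,d} > c_{\alpha,n,d}) &\ge \lim_{n\to\infty} P(\hat{\mathcal{E}}_{n,d} \ge k_{\alpha,d}) \\
&= \lim_{n\to\infty} P\!\left(\sum_{j=1}^{n} R_j \ge k_{\alpha,d}\right) \ge \lim_{n\to\infty} P(n\delta > k_{\alpha,d}) = 1.
\end{aligned}
\]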

This proves consistency for all fixed nonnormal alternatives $X$ with finite covariance. In the general case, consider the truncated distributions. The details of the truncation argument are given in the appendix. $\square$

5. Empirical results

In order to assess the performance of the new test, we performed a comparative Monte Carlo power study of eight affine invariant tests of multivariate normality. We chose two categories of tests for comparison: tests based on skewness and kurtosis, and the BHEP tests based on the empirical characteristic function. Projection pursuit tests based on five univariate goodness-of-fit tests of normality were also compared.

5.1. Multivariate tests

The multivariate skewness test proposed by Mardia [16] is based on the sample skewness statistic defined by
\[ b_{1,d} = \frac{1}{n^2}\sum_{j,k=1}^{n}\left((X_j - \bar{X})^T \hat{\Sigma}^{-1}(X_k - \bar{X})\right)^3, \tag{11} \]
where $\hat{\Sigma} = n^{-1}\sum_{j=1}^{n}(X_j - \bar{X})(X_j - \bar{X})^T$ denotes the maximum likelihood estimator of population covariance. Normality is rejected for large values of $b_{1,d}$. Mardia's multivariate kurtosis test is based on the sample kurtosis
\[ b_{2,d} = \frac{1}{n}\sum_{j=1}^{n}\left((X_j - \bar{X})^T \hat{\Sigma}^{-1}(X_j - \bar{X})\right)^2. \tag{12} \]
Large values of $|d(d+2) - b_{2,d}|$ are significant.

The BHEP statistics are defined as follows. Assume that the sample covariance matrix is nonsingular and $Y_k = \hat{\Sigma}^{-1/2}(X_k - \bar{X})$, $k = 1,\dots,n$, is the standardized sample. The statistic $T_{n,d}(\beta)$ is the weighted integral of the squared difference


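between the multivariate normal characteristic function and the empirical characteristic function $\Psi_n(t) = \frac{1}{n}\sum_{k=1}^{n} e^{i t^T Y_k}$, where $i$ is the complex unit. The test statistic is defined (see [11]) by
\[ T_{n,d}(\beta) = \int \left|\Psi_n(t) - e^{-\|t\|^2/2}\right|^2 \varphi_\beta(t)\, dt, \]
where $\varphi_\beta(t)$ is a weighting function. When the weighting function is
\[ \varphi_\beta(t) = (2\pi\beta^2)^{-d/2}\, e^{-\|t\|^2/(2\beta^2)} \]
for a fixed parameter $\beta > 0$, the BHEP test statistic is
\[ T_{n,d}(\beta) = \frac{1}{n^2}\sum_{j,k=1}^{n} e^{-\frac{\beta^2}{2}\|Y_j - Y_k\|^2} - 2(1+\beta^2)^{-d/2}\,\frac{1}{n}\sum_{j=1}^{n} e^{-\frac{\beta^2\|Y_j\|^2}{2(1+\beta^2)}} + (1+2\beta^2)^{-d/2}. \tag{13} \]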
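The closed form of the BHEP statistic is a pair of sums of Gaussian kernels, which makes it simple to evaluate. The following sketch assumes the sample has already been standardized; the function name `bhep_stat`, the demo points, and the choice $\beta = 1$ are illustrative only.

```python
import math


def bhep_stat(y, beta):
    """BHEP statistic for a standardized sample y (list of d-vectors)."""
    n, d = len(y), len(y[0])
    b2 = beta * beta
    pair_sum = 0.0
    for u in y:
        for v in y:
            sq = sum((a - b) ** 2 for a, b in zip(u, v))  # ||Y_j - Y_k||^2
            pair_sum += math.exp(-0.5 * b2 * sq)
    single_sum = 0.0
    for u in y:
        sq = sum(a * a for a in u)  # ||Y_j||^2
        single_sum += math.exp(-b2 * sq / (2.0 * (1.0 + b2)))
    return (pair_sum / n ** 2
            - 2.0 * (1.0 + b2) ** (-d / 2) * single_sum / n
            + (1.0 + 2.0 * b2) ** (-d / 2))


y_demo = [[0.1, -0.3], [1.2, 0.4], [-0.8, 0.9], [-0.5, -1.0]]  # made-up points
t_demo = bhep_stat(y_demo, beta=1.0)
```

Since the statistic is an integral of a squared modulus against a positive weight, its value should never be negative beyond rounding error, which gives a quick sanity check on any implementation.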


Normality is rejected for large values of $T_{n,d}(\beta)$. Four versions of $T_{n,d}(\beta)$ are included in the power comparison, where $\beta = 0.1, 0.5, 1, 3$. Henze and Zirkler [12] show that setting
\[ \beta = \frac{1}{\sqrt{2}}\left(\frac{(2d+1)\,n}{4}\right)^{1/(d+4)} \tag{14} \]
corresponds to choosing optimal bandwidth for a multivariate nonparametric kernel density estimator with Gaussian kernel. This choice of $\beta$ in (13) defines the Henze-Zirkler test, denoted $HZ$.

5.2. Projection pursuit tests

The projection pursuit (PP) approach to testing multivariate normality is based on the well known fact that a $d$-dimensional random variable $X$ with mean vector $\mu$ and covariance matrix $\Sigma$ has a multivariate normal $N_d(\mu,\Sigma)$ distribution if and only if the distribution of $a^T X$ is (univariate) $N(a^T\mu, a^T\Sigma a)$ for all vectors $a \in \mathbb{R}^d$. The PP method tests the worst one dimensional projection of the multivariate data according to a univariate goodness-of-fit index. Suppose $C_n$ is a statistic for testing univariate normality that rejects normality for large $C_n$. For a $d$-dimensional random sample $X_1,\dots,X_n$, define
\[ C_n^*(X_1,\dots,X_n) = \sup_{a \in \mathbb{R}^d,\ \|a\| = 1}\{C_n(a^T X_1,\dots,a^T X_n)\}. \]
The PP test based on the index $C_n$ rejects multinormality for large values of $C_n^*$. To implement the PP test based on $C_n$, we approximate $C_n^*$ by finding the maximum of $C_n$ over a finite set of projections determined by a suitable uniformly scattered net of vectors in $\mathbb{R}^d$. In $d = 2$ a suitable net of $K$ points in $\mathbb{R}^2$ is given by $\{a_k = (\cos\theta_k, \sin\theta_k),\ k = 1,\dots,K\}$, where $\theta_k = 2\pi(2k-1)/(2K)$. See Fang and Wang [8, Chapters 1 and 4] for details of implementation in $d > 2$.
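The finite-net approximation in the plane can be sketched as follows. The index used here, absolute sample skewness, is only a stand-in chosen because it is already location and scale invariant, so no separate standardization step is shown; the function names, the default $K = 360$, and the index itself are assumptions for illustration.

```python
import math


def direction_net(K):
    """K unit vectors in the plane at angles 2*pi*(2k - 1) / (2K), k = 1..K."""
    return [(math.cos(math.pi * (2 * k - 1) / K),
             math.sin(math.pi * (2 * k - 1) / K))
            for k in range(1, K + 1)]


def abs_skewness(z):
    """Absolute sample skewness of a univariate sample (illustrative index)."""
    n = len(z)
    m = sum(z) / n
    s2 = sum((v - m) ** 2 for v in z) / n
    return abs(sum((v - m) ** 3 for v in z) / n) / s2 ** 1.5


def pp_statistic(x, index, K=360):
    """Maximum of a univariate index over a finite net of projections (d = 2)."""
    best = -math.inf
    for a in direction_net(K):
        projected = [a[0] * p[0] + a[1] * p[1] for p in x]
        best = max(best, index(projected))
    return best
```

Any univariate statistic that rejects for large values can be passed as `index`. The cost grows linearly in $K$, which is the extra computational burden of PP tests relative to a single multivariate statistic.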


We apply PP tests based on five univariate tests of normality: univariate statistic $\mathcal{E}$, modified Cramér-von Mises $W^2$, Kolmogorov-Smirnov $D$, univariate skewness $\sqrt{b_1}$, and univariate kurtosis $b_2$ (kurtosis is a two-tailed test). If $x_1,\dots,x_n$ is the observed $d$-dimensional random sample, and $a \in \mathbb{R}^d$, let $y_{aj} = a^T S^{-1/2}(x_j - \bar{x})$, $j = 1,\dots,n$. Let $y_{a(1)},\dots,y_{a(n)}$ denote the ordered standardized univariate sample corresponding to $a$. The univariate $\mathcal{E}$ statistic has a very simple form, implemented as
\[ \mathcal{E}_a = n\left(\frac{2}{n}\sum_{j=1}^{n}\left[2y_{aj}\Phi(y_{aj}) + 2\phi(y_{aj})\right] - \frac{2}{\sqrt{\pi}} - \frac{2}{n^2}\sum_{j=1}^{n}(2j - 1 - n)\,y_{a(j)}\right). \]
Let $F_n(j) = n^{-1}\sum_{k=1}^{n} 1\{x_k \le x_j\}$, where $1\{\cdot\}$ is the indicator function, denote the empirical distribution function (EDF) of the sample, and let $F_n^a$ denote the EDF of the projected sample $y_{a1},\dots,y_{an}$. The statistics $W^2$, $D$, $\sqrt{b_1}$, and $b_2$ corresponding to $a$ are
\[ W_a^2 = n\sum_{j=1}^{n}\left(F_n^a(y_{aj}) - \Phi(y_{aj})\right)^2, \qquad D_a = \sqrt{n}\max_{1\le j\le n}\left|F_n^a(y_{aj}) - \Phi(y_{aj})\right|, \]
\[ \sqrt{b_1^a} = n^{-1}\sum_{j=1}^{n} y_{aj}^3, \qquad b_2^a = n^{-1}\sum_{j=1}^{n} y_{aj}^4. \]
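The single-sum form of the univariate statistic can be checked against the defining double sum. In the sketch below, `e_order_stat` drops the $-y$ term of $E|y - Z| = 2y\Phi(y) + 2\phi(y) - y$ because a standardized sample sums to zero, and replaces the double sum by the order-statistic identity; the function names and the demo data are illustrative assumptions.

```python
import math


def _Phi(v):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def _phi(v):
    """Standard normal density."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def standardize(x):
    """Center and scale a univariate sample (n - 1 in the variance)."""
    n = len(x)
    m = sum(x) / n
    s = math.sqrt(sum((v - m) ** 2 for v in x) / (n - 1))
    return [(v - m) / s for v in x]


def e_order_stat(y):
    """Computing form: single sums over the standardized sample y."""
    n = len(y)
    first = sum(2.0 * v * _Phi(v) + 2.0 * _phi(v) for v in y)
    second = sum((2 * j - 1 - n) * v for j, v in enumerate(sorted(y), start=1))
    return n * (2.0 * first / n - 2.0 / math.sqrt(math.pi) - 2.0 * second / n ** 2)


def e_double_sum(y):
    """Definition form: E|y_j - Z| terms and the full double sum of |y_j - y_k|."""
    n = len(y)
    first = sum(2.0 * v * _Phi(v) + 2.0 * _phi(v) - v for v in y)
    second = sum(abs(u - v) for u in y for v in y)
    return n * (2.0 * first / n - 2.0 / math.sqrt(math.pi) - second / n ** 2)


y_demo = standardize([2.1, 0.3, -1.7, 4.4, 0.9, 1.2, -0.6])  # made-up data
```

The sorted form costs $O(n \log n)$ instead of $O(n^2)$, which matters when the statistic is recomputed for thousands of projection directions.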

5.3. Simulation design

Since our sample sizes are not large, we used empirical critical values for all of the test statistics in our power study. The critical values of all multivariate and PP tests compared were estimated by Monte Carlo simulation of 20,000 standardized $N_d(0,I)$ samples (40,000 for $n = 25$). Approximate percentiles for $\hat{\mathcal{E}}_{n,d}$ are given in Table 1 for selected values of $d$ and $n$.

Our Monte Carlo power study for multivariate normality compared $\hat{\mathcal{E}}_{n,d}$ with the seven multivariate tests described above for $d = 2, 3, 5, 10$; $n = 25, 50, 100$, at significance level $\alpha = 0.05$. The five PP tests were compared for $d = 2$; $n = 25, 50, 100$, at significance level $\alpha = 0.05$. Power was estimated from a simulation of 2000 random samples from the alternative distribution. The PP tests were implemented by projecting each standardized sample in 15,000 directions: $\{a_k = (\cos\theta_k, \sin\theta_k),\ k = 1,\dots,15{,}000\}$, where $\theta_k = 2\pi(2k-1)/30{,}000$.

A normal mixture is denoted $pN_d(\mu_1,\Sigma_1) + (1-p)N_d(\mu_2,\Sigma_2)$, where the sampled population is $N_d(\mu_1,\Sigma_1)$ with probability $p$, and $N_d(\mu_2,\Sigma_2)$ with probability $1-p$. As the mixing parameter $p$ and other parameters are varied, the multivariate normal mixtures have a wide variety of types of departures from normality. A 50% normal location mixture is symmetric with light tails, and a 90% normal location mixture is skewed with heavy tails. A normal location mixture with $p = 1 - \frac{1}{2}\left(1 - \frac{\sqrt{3}}{3}\right) = 0.7887$ provides an example of a skewed distribution with normal kurtosis [10]. The scale mixtures in the comparison are symmetric with heavier tails than normal.
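The mixing weight 0.7887 can be motivated in the univariate case. A location mixture $pN(0,1) + (1-p)N(m,1)$ is the sum of a standard normal and $m$ times an independent Bernoulli($1-p$) indicator, so its fourth cumulant is $m^4 p(1-p)(1 - 6p(1-p))$, which vanishes when $p(1-p) = 1/6$. The sketch below checks this and draws from a location mixture; the function names and the sampler are illustrative assumptions, not the authors' simulation code.

```python
import math
import random

# Mixing weight quoted in the text for the skewed mixture with normal kurtosis.
p_normal_kurtosis = 1.0 - 0.5 * (1.0 - math.sqrt(3.0) / 3.0)


def bernoulli_fourth_cumulant(p):
    """p(1-p)(1 - 6p(1-p)): fourth cumulant of a Bernoulli(1 - p) indicator."""
    q = 1.0 - p
    return p * q * (1.0 - 6.0 * p * q)


def sample_location_mixture(n, d, p, shift, rng):
    """Draw n points from p*N_d(0, I) + (1 - p)*N_d(shift*1, I)."""
    out = []
    for _ in range(n):
        mu = 0.0 if rng.random() < p else shift
        out.append([rng.gauss(mu, 1.0) for _ in range(d)])
    return out


rng = random.Random(2)  # fixed seed, arbitrary choice
mixture_demo = sample_location_mixture(n=50, d=2, p=0.9, shift=3.0, rng=rng)
```

Here `shift` plays the role of the common mean component written as 3 in the tables. The scale mixtures with covariance matrix $B$ would need a matrix square root, which this sketch leaves out.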

Table 2
Percentage of significant tests of bivariate normality of 2000 Monte Carlo samples at $\alpha = 0.05$

Alternative                       n     E    b1    b2   T0.1  T0.5    T1    T3    HZ
0.5N_d(0,I) + 0.5N_d(3,I)        25    26     2    19     2     5    24    39    35
0.79N_d(0,I) + 0.21N_d(3,I)      25    50    24     9    28    39    54    34    53
0.9N_d(0,I) + 0.1N_d(3,I)        25    44    45    20    46    52    47    19    41
0.5N_d(0,B) + 0.5N_d(0,I)        25    19    19    14    19    19    18    14    18
0.9N_d(0,B) + 0.1N_d(0,I)        25    22    29    26    29    27    21    10    18
0.5N_d(0,I) + 0.5N_d(3,I)        50    71     1    34     2     6    64    79    82
0.79N_d(0,I) + 0.21N_d(3,I)      50    92    51     8    58    83    94    77    92
0.9N_d(0,I) + 0.1N_d(3,I)        50    81    83    36    86    89    82    48    74
0.5N_d(0,B) + 0.5N_d(0,I)        50    32    21    24    20    25    31    21    30
0.9N_d(0,B) + 0.1N_d(0,I)        50    39    46    51    46    48    38    15    29
0.5N_d(0,I) + 0.5N_d(3,I)       100    99     1    62     2    17    98    99   100
0.79N_d(0,I) + 0.21N_d(3,I)     100   100    91     7    93   100   100    99   100
0.9N_d(0,I) + 0.1N_d(3,I)       100    98   100    53   100    99    98    84    96
0.5N_d(0,B) + 0.5N_d(0,I)       100    65    23    40    23    43    62    50    64
0.9N_d(0,B) + 0.1N_d(0,I)       100    61    55    75    57    70    61    28    47

$\mu = 3$ denotes the mean vector $(3,3)^T$; $\Sigma = B$ denotes 1 on diagonal and 0.9 off diagonal.

5.4. Results of simulations

Empirical results summarized in Table 2 for tests of bivariate normality suggest that the $\hat{\mathcal{E}}_{n,d}$ test is more powerful than Mardia's skewness or kurtosis tests against the symmetric mixtures with light or normal tails, and more powerful than the kurtosis test against the skewed heavy tailed mixtures. The Henze-Zirkler test is very sensitive against the alternatives with light or normal tails, but less powerful in this comparison against the heavy tail alternatives.

Monte Carlo results in dimensions $d = 3$, 5, and 10 for $n = 50$ in Table 3 suggest that the relative performance of the eight tests is similar to the bivariate case, and that all eight tests are practical and effective in higher dimensions.

Fig. 1 shows empirical power across sample sizes $n = 25, 50, 100$ in $d = 5$ for a 90% normal location mixture, where the $\hat{\mathcal{E}}_{n,d}$ test was superior to the multivariate skewness, kurtosis, and HZ tests. In Fig. 2 power is compared in $d = 5$ for a 79% normal location mixture, which suggests that $\hat{\mathcal{E}}_{n,d}$ and HZ have comparable performance superior to the skewness and kurtosis tests. Results shown in Fig. 3 suggest that $\hat{\mathcal{E}}_{n,d}$ is at least as powerful as the HZ and skewness tests against this 50% normal location-scale mixture.

Empirical power performance of the multivariate skewness, kurtosis, and BHEP tests in our study was consistent with results of similar power studies of Henze and Zirkler [12], Romeu and Ozturk [19], Horswell and Looney [14], and others. The multivariate skewness test was one of the most sensitive tests against skewed heavy tailed alternatives, but relatively very poor at detecting nonnormality in symmetric

Table 3
Percentage of significant tests of multivariate normality of 2000 Monte Carlo samples at $\alpha = 0.05$, $n = 50$

Alternative                       d     E    b1    b2   T0.1  T0.5    T1    T3    HZ
0.5N_d(0,I) + 0.5N_d(3,I)         3    58     2    21     3     6    62    70    80
0.79N_d(0,I) + 0.21N_d(3,I)       3    98    42     8    49    78    96    66    95
0.9N_d(0,I) + 0.1N_d(3,I)         3    91    89    35    90    94    88    40    81
0.5N_d(0,B) + 0.5N_d(0,I)         3    71    40    52    37    53    67    47    68
0.9N_d(0,B) + 0.1N_d(0,I)         3    65    71    76    71    72    58    23    48
0.5N_d(0,I) + 0.5N_d(3,I)         5    20     3    11     4     7    39    23    50
0.79N_d(0,I) + 0.21N_d(3,I)       5    79    28     7    36    66    85    25    82
0.9N_d(0,I) + 0.1N_d(3,I)         5    93    82    33    86    95    84    16    72
0.5N_d(0,B) + 0.5N_d(0,I)         5    99    79    95    75    92    99    76    99
0.9N_d(0,B) + 0.1N_d(0,I)         5    89    95    95    95    92    76    22    66
0.5N_d(0,I) + 0.5N_d(3,I)        10     5     3     6     3     5    13     8    13
0.79N_d(0,I) + 0.21N_d(3,I)      10    27    11     6    15    29    26    10    24
0.9N_d(0,I) + 0.1N_d(3,I)        10    66    48    20    57    68    28     9    25
0.5N_d(0,B) + 0.5N_d(0,I)        10   100   100   100   100   100   100    92   100
0.9N_d(0,B) + 0.1N_d(0,I)        10    97    99    98    99    96    70    17    66

$\mu = 3$ denotes the $d \times 1$ mean vector $(3,\dots,3)^T$; $\Sigma = B$ denotes 1 on diagonal and 0.9 off diagonal.

[Figure 1: empirical power (percent, vertical axis) versus sample size (25, 50, 100; horizontal axis) for the E, HZ, skewness, and kurtosis tests.]

Fig. 1. Empirical power of tests of multivariate normality ($d = 5$, $n = 50$) against normal location mixture $0.9N_5(0,I) + 0.1N_5(2,I)$: percent of significant tests of 2000 Monte Carlo samples at $\alpha = 0.05$. E denotes $\hat{\mathcal{E}}_{n,5}$ and HZ denotes the Henze-Zirkler test.

light tailed distributions. The multivariate kurtosis test had good performance against symmetric distributions with nonnormal kurtosis.

Performance of fixed $\beta$ BHEP tests against different classes of alternatives depends on the parameter $\beta$. Henze and Wagner [11] give a graphical illustration and discussion of this dependence. Empirical results agree with conclusions of Henze and Zirkler [12] and Henze and Wagner [11] that small $\beta$ is a good choice against symmetric heavy tailed alternatives, while large $\beta$ is better against the symmetric light tailed

[Figure 2: empirical power (percent, vertical axis) versus sample size (25, 50, 100; horizontal axis) for the E, HZ, skewness, and kurtosis tests.]

Fig. 2. Empirical power of tests of multivariate normality ($d = 5$, $n = 50$) against normal location mixture $0.7887N_5(0,I) + 0.2113N_5(2,I)$: percent of significant tests of 2000 Monte Carlo samples at $\alpha = 0.05$. E denotes $\hat{\mathcal{E}}_{n,5}$ and HZ denotes the Henze-Zirkler test.

[Figure 3: empirical power (percent, vertical axis) versus sample size (horizontal axis) for the E, HZ, skewness, and kurtosis tests.]

Fig. 3. Empirical power of tests of multivariate normality ($d = 5$, $n = 50$) against normal mixture $0.5N_5(0,C) + 0.5N_5(2,I)$: percent of significant tests of 2000 Monte Carlo samples at $\alpha = 0.05$. Covariance matrix $C$ has 1 on diagonal and 0.5 off diagonal. E denotes $\hat{\mathcal{E}}_{n,5}$ and HZ denotes the Henze-Zirkler test.

alternatives. The parameter $\beta$ in the Henze-Zirkler test is between 1 and 2, and although not the optimal choice of $\beta$ for most alternatives, provides a test that is effective against a wide class of alternatives.

Empirical results comparing projection pursuit tests are presented in Table 4. The PP-$\mathcal{E}$ test was comparable to or better than the multivariate $\mathcal{E}$ depending on the alternative. Overall, the PP-$\mathcal{E}$ test was more powerful than the PP EDF tests $W^2$ and $D$; however, for sample sizes $n \ge 50$ there was little or no difference in power between PP-$\mathcal{E}$ and PP-$W^2$. The multivariate skewness test was more powerful than the PP-

Table 4
Percentage of significant tests of bivariate normality, by projection pursuit method, of 2000 Monte Carlo samples at $\alpha = 0.05$

Alternative                       n     E  sqrt(b1)   b2   W^2     D
0.5N_d(0,I) + 0.5N_d(3,I)        25    36       3     48    28    22
0.79N_d(0,I) + 0.21N_d(3,I)      25    60      24     10    60    48
0.9N_d(0,I) + 0.1N_d(3,I)        25    47      49     21    46    36
0.5N_d(0,B) + 0.5N_d(0,I)        25    20      17     15    19    15
0.9N_d(0,B) + 0.1N_d(0,I)        25    20      18     14    20    14
0.5N_d(0,I) + 0.5N_d(3,I)        50    88       2     88    84    61
0.79N_d(0,I) + 0.21N_d(3,I)      50    96      55      8    96    87
0.9N_d(0,I) + 0.1N_d(3,I)        50    82      87     35    80    67
0.5N_d(0,B) + 0.5N_d(0,I)        50    40      22     26    36    26
0.9N_d(0,B) + 0.1N_d(0,I)        50    40      20     25    37    25
0.5N_d(0,I) + 0.5N_d(3,I)       100   100       2    100   100    97
0.79N_d(0,I) + 0.21N_d(3,I)     100   100      94      7   100   100
0.9N_d(0,I) + 0.1N_d(3,I)       100    99     100     62    98    94
0.5N_d(0,B) + 0.5N_d(0,I)       100    75      24     53    73    54
0.9N_d(0,B) + 0.1N_d(0,I)       100    77      24     52    73    53

$\mu = 3$ denotes the mean vector $(3,3)^T$; $\Sigma = B$ denotes 1 on diagonal and 0.9 off diagonal.

skewness test against the 90% scale mixture alternative, but otherwise both versions of skewness tests were comparable in power. Against some alternatives, such as the symmetric location mixture, the PP-kurtosis test was considerably more sensitive than the multivariate version. Against other alternatives such as the 90% scale mixture, the PP-kurtosis was considerably weaker than the multivariate test. Based on these results, it does not appear that PP-skewness or PP-kurtosis offers a clear advantage over Mardia's tests, which are computationally much less intensive. The extra computational burden does, however, seem to correspond to increased power for the PP-$\mathcal{E}$ statistic against some alternatives, without sacrificing power against others.

The purpose of this power study was to assess the performance of the new test based on $\hat{\mathcal{E}}_{n,d}$. The $\hat{\mathcal{E}}_{n,d}$ statistic had impressive performance against all the symmetric and skewed heavy tailed alternatives considered.

6. On the asymptotic behavior of the test

We have shown that under the hypotheses of Corollary 1, the $V$-statistics $\mathcal{E}_n/n$ based on the kernel $h(x,y)$ defined in (1) satisfy $E[h(X,Y)] \ge 0$ with equality if and only if $X \overset{d}{=} Y$. If $V_n$ is a degenerate $V$-statistic generated by square integrable kernel


$h$, then $nV_n$ converges in distribution to
\[ \sum_{k=0}^{\infty} \lambda_k Z_k^2, \]
where $\{Z_k\}$ is a sequence of independent standard normal random variables, and $\{\lambda_k\}$ are the eigenvalues determined by the integral equation
\[ \int \psi(x)\, h(x,y)\, dF_X(x) = \lambda\,\psi(y). \tag{15} \]
See [21] or [27] for asymptotic distribution theory of degenerate kernel $V$-statistics. Since $\mathcal{E}_n/n$ is a degenerate kernel $V$-statistic, the distribution of $\mathcal{E}_n$ under $X \overset{d}{=} Y$ converges to a proper limiting distribution (the quadratic form $\sum_{k=0}^{\infty}\lambda_k Z_k^2$) provided the second moments of $X$ are finite. If $X$ and $Y$ are not identically distributed, then $\lim_{n\to\infty}\mathcal{E}_n = \infty$ with probability 1. Hence the statistics $\mathcal{E}_n$ determine consistent goodness-of-fit tests for any multivariate distributions with finite second moments.

In the univariate case, there is an interesting connection between
\[ E[h(X,Y)] = 2E|X - Y| - E|X - X'| - E|Y - Y'| \]
and the Cramér-von Mises distance given by
\[ \int_{-\infty}^{\infty} (G(t) - F(t))^2\, \psi(t)\, dF(t), \tag{16} \]
where $F$ and $G$ are the distribution functions of $X$ and $Y$, respectively, and $\psi(t)$ is a weight function. It is easy to prove that in one dimension
\[ 2E|X - Y| - E|X - X'| - E|Y - Y'| = 2\int_{-\infty}^{\infty} (G(t) - F(t))^2\, dt. \tag{17} \]
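Identity (17) holds in particular when $F$ and $G$ are empirical distribution functions, which gives an exact finite check. The sketch below computes both sides for two small made-up samples; the function names and the data are illustrative assumptions.

```python
def energy_distance_1d(x, y):
    """2E|X-Y| - E|X-X'| - E|Y-Y'| for the empirical distributions of x and y."""
    n, m = len(x), len(y)
    xy = sum(abs(a - b) for a in x for b in y) / (n * m)
    xx = sum(abs(a - b) for a in x for b in x) / n ** 2
    yy = sum(abs(a - b) for a in y for b in y) / m ** 2
    return 2.0 * xy - xx - yy


def twice_l2_cdf_distance(x, y):
    """2 * integral of (G(t) - F(t))^2 dt for the two empirical CDFs."""
    n, m = len(x), len(y)
    pts = sorted(set(x) | set(y))
    total = 0.0
    for left, right in zip(pts, pts[1:]):
        # Both step functions are constant on [left, right).
        F = sum(1 for a in x if a <= left) / n
        G = sum(1 for b in y if b <= left) / m
        total += (G - F) ** 2 * (right - left)
    return 2.0 * total


x_demo = [0.0, 1.5, 2.0, 4.0]
y_demo = [0.5, 0.7, 3.0]
lhs_demo = energy_distance_1d(x_demo, y_demo)
rhs_demo = twice_l2_cdf_distance(x_demo, y_demo)
```

Outside the range of the pooled points both empirical distribution functions agree (0 on the left, 1 on the right), so the integral only needs the finitely many intervals between consecutive pooled points.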

(Equality (17) cannot hold in higher dimensions since the right-hand side is not rotation invariant.) Our test is a rotation invariant multivariate version of a Cramér-von Mises type test, and the power of our univariate test is similar to the power of a suitable Cramér-von Mises type test. From the right-hand side of (17) we get the classical distribution free Cramér-von Mises goodness-of-fit formula if $dt$ is replaced by $dF(t)$ (and $G$ is replaced by the empirical distribution $F_n$). When the weight function in (16) is $\psi(t) = [F(t)(1 - F(t))]^{-1}$, we have the Anderson-Darling distance [1]. In case of standard normal null $F$, the shape of the curve $\psi(t)^{-1} = F(t)(1 - F(t))$ is similar to the shape of the density $F'(t)$: their ratio is close to a constant $c$ (empirically $c \approx 0.67$). That is, if we replace $dF(t)/[F(t)(1 - F(t))]$ by $c^{-1}\,dt$, we can see that our univariate test hardly differs from the powerful Anderson-Darling test of univariate normality. In fact our simulations show (see Table 5) that in one dimension the power of our test, even for small sample sizes, is almost the same as that of Anderson-Darling, which has extremely good 0.96 Bahadur local index for Gaussian null and location alternatives [18, p. 80].

Integral equations of the form (15) typically do not have nice analytical solutions. A remarkable exception is the Cramér-von Mises case when $F$ is linear in [0,1] and

74 kely, M.L. Rizzo / Journal of Multivariate Analysis 93 (2005) 5880 G.J. Sze Table 5 Empirical power of E test of univariate normality compared with the AndersonDarling A2 test: percentage of signicant tests of 5000 Monte Carlo samples, at a 0:05 n 10 Alternative Student t4 Student t10 Uniform(0,1) Logistic(0,1) Laplace w2 5 w2 10 0:5N 0; 1 0:5N 3; 1 0:79N 0; 1 0:21N 3; 1 0:9N 0; 1 0:1N 3; 1 Binomial(10,0.5) Binomial(20,0.5) Poisson(4) Poisson(8) E 13 6 8 8 16 19 11 8 13 13 20 11 16 10 A2 14 6 8 8 16 19 11 8 13 13 20 11 16 9 n 25 E 26 9 23 12 31 45 25 19 34 28 51 19 37 16 A2 27 9 23 12 31 47 26 19 34 29 50 19 37 16 n 50 E 40 11 56 16 55 78 46 45 65 49 99 50 86 33 A2 41 12 58 16 55 80 48 44 65 50 98 48 86 33 n 100 E 64 15 92 22 83 99 78 81 93 80 100 100 100 86 A2 65 16 94 23 83 99 80 81 93 82 100 100 100 84

ct is identically 1. For other exceptions see [26]. So far we have not tried numerical approximations because (i) our test does not apply the concrete form of the limit distribution, only its existence, and (ii) according to Go tze [9] and Bickel et al. [4], the rate of convergence is on1 for degenerate kernel U -statistics (and V -statistics), compared with On1=2 rate of convergence to the normal limit in the nondegenerate case. For this reason as Henze and Wagner [11] observed for their degenerate kernel BHEP V -statistics, the test is practically sample size independent. We observed the same behavior for our degenerate kernel; see Table 1. This fact decreases the practical importance of large sample analysis.
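Since the test uses only the existence of the limit distribution, one way to obtain critical values in practice is simulation under the null hypothesis. The sketch below illustrates this for the univariate case. It is not the authors' implementation: the function names, the number of replicates `reps`, the `seed`, and the choice of the $(n-1)$-denominator standard deviation for standardization are all assumptions made for illustration. It relies on two standard facts for $Z, Z'$ iid $N(0,1)$: $E|a - Z| = a(2\Phi(a) - 1) + 2\phi(a)$ and $E|Z - Z'| = 2/\sqrt{\pi}$.

```python
import math
import random

def mean_abs_dev_from_normal(a):
    """E|a - Z| for Z ~ N(0, 1): a * (2 * Phi(a) - 1) + 2 * phi(a)."""
    Phi = 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))
    phi = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    return a * (2.0 * Phi - 1.0) + 2.0 * phi

def energy_stat_normal(y):
    """n * (2/n sum_j E|y_j - Z| - E|Z - Z'| - 1/n^2 sum_{j,k} |y_j - y_k|)."""
    n = len(y)
    first = 2.0 * sum(mean_abs_dev_from_normal(v) for v in y) / n
    second = 2.0 / math.sqrt(math.pi)  # E|Z - Z'| for iid N(0, 1)
    third = sum(abs(a - b) for a in y for b in y) / (n * n)
    return n * (first - second - third)

def standardize(x):
    """Center and scale by the sample mean and (n - 1)-denominator std (assumed)."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    return [(v - mean) / sd for v in x]

def simulated_critical_value(n, alpha=0.05, reps=500, seed=1):
    """Empirical upper-alpha quantile of the statistic for standardized normal samples."""
    rng = random.Random(seed)
    stats = sorted(
        energy_stat_normal(standardize([rng.gauss(0.0, 1.0) for _ in range(n)]))
        for _ in range(reps)
    )
    return stats[min(reps - 1, math.ceil((1.0 - alpha) * reps) - 1)]

# Illustrative use: reject normality when the observed statistic exceeds the simulated value.
rng = random.Random(2)
sample = [rng.expovariate(1.0) for _ in range(25)]
observed = energy_stat_normal(standardize(sample))
print(observed, simulated_critical_value(25))
```

The pairwise sum makes each evaluation quadratic in $n$, which is acceptable for a sketch; the number of replicates controls the accuracy of the simulated quantile.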

7. Summary

We have proposed a new class of tests for comparing multivariate distributions, and applied a new test of multivariate normality that is affine invariant and consistent against all fixed alternatives. The new test of multivariate normality is practical to apply for arbitrary dimension and sample size, and our Monte Carlo power comparisons suggest that it is a powerful competitor to two important categories of affine invariant tests of multivariate normality, Mardia's skewness and kurtosis tests and the BHEP tests.

These empirical results illustrate that none of the tests are universally superior, but some general aspects of power performance are evident. Overall, the relative performance of $\hat{\mathcal{E}}_{n,d}$ was impressive against heavy tailed alternatives, and better than a skewness test against symmetric light tailed alternatives. Among the tests compared, $\hat{\mathcal{E}}_{n,d}$ was the only test that never ranked below all other tests in power. We conclude that $\hat{\mathcal{E}}_{n,d}$ provides a powerful omnibus test of multivariate normality, in the sense that it is consistent against all alternatives and has relatively good power against general alternatives compared with other tests. When applied as the index of fit in the projection pursuit test, univariate $\hat{\mathcal{E}}$ had impressive empirical power relative to other univariate measures of fit, and PP-$\hat{\mathcal{E}}$ outperformed multivariate $\hat{\mathcal{E}}_{n,d}$ against certain alternatives. Moreover, in view of the close relation between $\hat{\mathcal{E}}$ and Anderson–Darling, multivariate $\hat{\mathcal{E}}_{n,d}$ can be viewed as a computationally simple way to lift a powerful EDF test to arbitrarily high dimension.

Acknowledgments

The authors thank the referees for their constructive comments.

Appendix. Proofs of statements

Proof of Proposition 1. Let $x_1,\dots,x_n$ be arbitrary distinct points in $\mathbb{R}^d$. To prove that

$$\sum_{j,k=1}^{n} \|x_j - x_k\|\, r_j r_k \le 0 \quad\text{whenever}\quad \sum_{j=1}^{n} r_j = 0,$$

with equality if and only if $r_1 = \cdots = r_n = 0$, it is equivalent to prove the following: if $p_1,\dots,p_n$ and $q_1,\dots,q_n$ are probability distributions, and $x_1,\dots,x_n$ are distinct arbitrary points in $\mathbb{R}^d$, then

$$\sum_{j,k=1}^{n} \|x_j - x_k\|\,(2p_j q_k - p_j p_k - q_j q_k) \ge 0, \tag{A.1}$$

and equality holds if and only if $p_1,\dots,p_n$ and $q_1,\dots,q_n$ are identical distributions.

Fix the points $x_1,\dots,x_n$ in $\mathbb{R}^d$ and a hyperplane $H$, and suppose that there are $m$ points on one side of $H$, $0 < m < n$. Select two points at random from $x_1,\dots,x_n$ according to the $p$ or $q$ distribution and connect them if the points are on opposite sides of $H$. The connected pairs are called homogeneous pairs if both are selected by the same distribution, and called mixed pairs otherwise. If $p = \sum_{j=1}^{m} p_j$ and $q = \sum_{j=1}^{m} q_j$, then the expected number of mixed pairs minus the expected number of homogeneous pairs is

$$p(1-q) + q(1-p) - p(1-p) - q(1-q) = (p-q)^2 \ge 0, \tag{A.2}$$

and equality holds if and only if $p = q$.
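The inequality for $r_j$ with zero sum and the algebraic identity in (A.2) can both be spot-checked numerically. The sketch below does so for random points in $\mathbb{R}^3$; the helper names, the dimension, and the number of points are illustrative assumptions, and a numerical check of this kind is of course not a substitute for the proof.

```python
import math
import random

def distance_quadratic_form(points, r):
    """Sum over j, k of ||x_j - x_k|| * r_j * r_k."""
    return sum(
        math.dist(xj, xk) * rj * rk
        for xj, rj in zip(points, r)
        for xk, rk in zip(points, r)
    )

def mixed_minus_homogeneous(p, q):
    """Left-hand side of (A.2) for the one-sided masses p and q."""
    return p * (1 - q) + q * (1 - p) - p * (1 - p) - q * (1 - q)

def random_distribution(rng, n):
    """A random probability vector of length n (normalized uniform weights)."""
    w = [rng.random() for _ in range(n)]
    total = sum(w)
    return [v / total for v in w]

rng = random.Random(0)
points = [tuple(rng.gauss(0.0, 1.0) for _ in range(3)) for _ in range(8)]
p = random_distribution(rng, 8)
q = random_distribution(rng, 8)
r = [a - b for a, b in zip(p, q)]  # coefficients with zero sum

print(distance_quadratic_form(points, r))   # expected to be negative
print(mixed_minus_homogeneous(0.3, 0.7))    # equals (0.3 - 0.7)**2 up to rounding
```

Taking $r = p - q$ connects the two formulations: the quadratic form in $r$ is the negative of the sum in (A.1).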


By a suitable randomization of $H$ (see below for details), the sum in inequality (A.1) is determined by the difference of expected length of line segments connecting mixed pairs and expected length of line segments connecting homogeneous pairs. Suppose $H$ is chosen randomly, and consider all possible partitions $P_m = \{S_{m1}, S_{m2}\}$ of the points $x_1,\dots,x_n$ determined by $H$. Then each partition defines probability distributions $\{p^m, 1-p^m\}$ and $\{q^m, 1-q^m\}$, such that $p^m = \sum_{j \in S_{m1}} p_j$ and $q^m = \sum_{j \in S_{m1}} q_j$.

For each of the $r$ possible partitions $P_1,\dots,P_r$ let $a_m$ denote the probability that $H$ determines partition $P_m$. Then the expected number of line segments between mixed pairs minus the expected number of line segments between homogeneous pairs is

$$\sum_{m=1}^{r} a_m\left[p^m(1-q^m) + q^m(1-p^m) - p^m(1-p^m) - q^m(1-q^m)\right] = \sum_{m=1}^{r} a_m\,(p^m - q^m)^2 \ge 0.$$

The hyperplane $H$ is randomized as follows. Select a ball $B$ in $\mathbb{R}^d$ with center $O$ that contains the points $x_1,\dots,x_n$, a point $P$ uniformly distributed on the surface of $B$, and a point $Q$ uniformly distributed on radius $OP$. Choose the hyperplane $H$ such that $H$ contains $Q$ and $H$ is perpendicular to $OP$. Then for each pair $x_j, x_k$, the probability that $H$ intersects the segment between $x_j$ and $x_k$ is proportional to $\|x_j - x_k\|$. This is a known fact in integral geometry; see e.g. [20] or [22]. If $H$ is chosen in this way, the expected length of line segments between mixed pairs minus the expected length of line segments between homogeneous pairs is proportional to

$$\sum_{j,k=1}^{n} \|x_j - x_k\|\,(2p_j q_k - p_j p_k - q_j q_k).$$

This sum is nonnegative and equals zero if and only if $(p^m - q^m)^2 = 0$ for all $m$. For each $j$, there is an $m$ such that $p^m = p_j$ and $q^m = q_j$, so equality holds in (A.1) if and only if $p_j = q_j$, $j = 1, 2, \dots, n$. □

Proof of Proposition 2. $Y$ is an ancillary statistic and independent of $(\bar{Z}, S)$ by Basu's Theorem, so

$$E|Z_1 - Z_2| = E\bigl[E[\,|Z_1 - Z_2| \mid S\,]\bigr] = E\bigl[E[\,S\,|Y_1 - Y_2| \mid S\,]\bigr] = E[S]\,E|Y_1 - Y_2|.$$

Hence

$$E|Y_1 - Y_2| = \frac{E|Z_1 - Z_2|}{E[S]} = \frac{2}{\sqrt{\pi}}\,\frac{1}{E[S]} = \sqrt{\frac{n-1}{2}}\;\frac{2}{\sqrt{\pi}}\;\frac{\Gamma\!\left(\frac{n-1}{2}\right)}{\Gamma\!\left(\frac{n}{2}\right)} > \sqrt{\frac{n-1}{2}}\;\frac{2}{\sqrt{\pi}}\;\sqrt{\frac{2}{n-1}} = E|Z_1 - Z_2|.$$
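The closed form in the proof of Proposition 2 is easy to evaluate. The sketch below computes $E[S]$ and $E|Y_1 - Y_2|$ for a standard normal sample, assuming $S$ is the sample standard deviation with the $(n-1)$ denominator (the convention consistent with the Gamma-function expression above). The function names are illustrative. For $n = 2$ the standardized values are $\pm 1/\sqrt{2}$, so $|Y_1 - Y_2| = \sqrt{2}$ exactly, which provides a check on the formula.

```python
import math

def expected_abs_diff_normal():
    """E|Z1 - Z2| for iid N(0, 1) variables, equal to 2 / sqrt(pi)."""
    return 2.0 / math.sqrt(math.pi)

def expected_sample_sd(n):
    """E[S] for an N(0, 1) sample of size n, S using the (n - 1) denominator."""
    # lgamma avoids overflow of the Gamma function for large n.
    log_ratio = math.lgamma(n / 2.0) - math.lgamma((n - 1) / 2.0)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)

def expected_abs_diff_standardized(n):
    """E|Y1 - Y2| = E|Z1 - Z2| / E[S] for the standardized sample."""
    return expected_abs_diff_normal() / expected_sample_sd(n)

for n in (2, 5, 10, 50, 1000):
    print(n, expected_abs_diff_standardized(n))
```

The printed values decrease toward $2/\sqrt{\pi}$ as $n$ grows, while staying strictly above it, in line with the inequality in the proof.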


Proof of Proposition 3. Suppose $Y_1,\dots,Y_n$ is a standardized normal sample, and let $X_1,\dots,X_m$ and $Z_1,\dots,Z_n$ be iid $N(0,1)$, independent of $Y_1,\dots,Y_n$. Denote the combined ordered sample $X_1,\dots,X_m,Y_1,\dots,Y_n$ by $U_{(1)},\dots,U_{(m+n)}$, and denote the combined ordered sample $X_1,\dots,X_m,Z_1,\dots,Z_n$ by $W_{(1)},\dots,W_{(m+n)}$. If $y_j$ is fixed, then the sample mean $m^{-1}\sum_{k=1}^{m}|y_j - X_k|$ is an unbiased estimator of

$$E|y_j - Z| = \int_{-\infty}^{\infty} |y_j - z|\,\phi(z)\,dz.$$

If $y_1,\dots,y_n$ are the observed values of a standardized normal random sample, an unbiased estimator of the expected value of $n^{-1}\sum_{j=1}^{n} E|y_j - Z|$ is

$$\frac{1}{mn}\sum_{k=1}^{m}\sum_{j=1}^{n} |y_j - X_k|, \tag{A.3}$$

and hence

$$E\left[\frac{1}{n}\sum_{j=1}^{n} E|Y_j - Z|\right] = E\left[\frac{1}{mn}\sum_{j=1}^{n}\sum_{k=1}^{m} |Y_j - X_k|\right].$$

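An L-statistic is a linear combination of the ordered sample. If $X_{(1)},\dots,X_{(n)}$ denotes the ordered sample $X_1,\dots,X_n$, then the following L-statistics identities hold.

$$\sum_{j=1}^{n}\sum_{k=1}^{n} |X_j - X_k| = 2\sum_{k=1}^{n} (2k - 1 - n)\,X_{(k)}, \tag{A.4}$$

$$\sum_{j=1}^{n}\sum_{k=1}^{m} |X_j - Y_k| = \sum_{k=1}^{n+m} (2k - 1 - n - m)\,U_{(k)} - \sum_{k=1}^{n} (2k - 1 - n)\,X_{(k)} - \sum_{k=1}^{m} (2k - 1 - m)\,Y_{(k)}. \tag{A.5}$$

An unbiased estimator of the expected value of $n^{-1}\sum_{j=1}^{n} E|x_j - Z|$ is

$$\frac{1}{mn}\sum_{k=1}^{m}\sum_{j=1}^{n} |Z_j - X_k|. \tag{A.6}$$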
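The L-statistic identities (A.4) and (A.5) replace double sums of absolute differences by single sums over ordered samples, which reduces the cost from quadratic to that of a sort. A minimal numerical check is sketched below; the function names are illustrative, and in (A.5) the pooled ordered sample $U_{(1)},\dots,U_{(n+m)}$ is taken to be the sorted concatenation of the two samples.

```python
import random

def sum_abs_pairwise(x):
    """Double sum over j, k of |x_j - x_k| within one sample."""
    return sum(abs(a - b) for a in x for b in x)

def sum_abs_cross(x, y):
    """Double sum over j, k of |x_j - y_k| between two samples."""
    return sum(abs(a - b) for a in x for b in y)

def l_statistic(sample):
    """Sum over k = 1..n of (2k - 1 - n) * X_(k) for the ordered sample."""
    n = len(sample)
    return sum((2 * k - 1 - n) * v for k, v in enumerate(sorted(sample), start=1))

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(12)]
y = [rng.gauss(0.5, 2.0) for _ in range(7)]

# (A.4): within-sample double sum versus twice the L-statistic.
print(sum_abs_pairwise(x), 2 * l_statistic(x))
# (A.5): between-sample double sum versus pooled minus within-sample L-statistics.
print(sum_abs_cross(x, y), l_statistic(x + y) - l_statistic(x) - l_statistic(y))
```

Each printed pair should agree up to floating-point rounding.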


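By identity (A.5) the difference in the Monte Carlo estimates (A.3) and (A.6) is $(mn)^{-1}$ times

$$\sum_{k=1}^{m}\sum_{j=1}^{n} |X_k - Y_j| - \sum_{k=1}^{m}\sum_{j=1}^{n} |X_k - Z_j| = \sum_{k=1}^{n+m} (2k - 1 - n - m)\,(U_{(k)} - W_{(k)}) - \sum_{k=1}^{n} (2k - 1 - n)\,(Y_{(k)} - Z_{(k)}). \tag{A.7}$$

From identity (A.4) and inequality (9),

$$E\left[\sum_{k=1}^{n} (2k - 1 - n)\,(Y_{(k)} - Z_{(k)})\right] = \frac{n(n-1)}{2}\,\bigl(E|Y_1 - Y_2| - E|Z_1 - Z_2|\bigr) > 0.$$

Since $\sum_{k=1}^{n+m} (2k - 1 - n - m)\,(U_{(k)} - W_{(k)})$ has negative expected value,

$$\lim_{m\to\infty} \frac{1}{mn}\sum_{k=1}^{m}\sum_{j=1}^{n} |Y_j - X_k| \;-\; \lim_{m\to\infty} \frac{1}{mn}\sum_{k=1}^{m}\sum_{j=1}^{n} |Z_j - X_k|$$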

cannot be positive for any finite sample size $n$. □

Proof of Theorem 2. In Section 4, proof of consistency of the test of multivariate normality that is based on $\hat{\mathcal{E}}_{n,d}$ is given for the case when the alternative distribution has finite second moments. In the general case, we apply consistency in the special case of finite second moments to the truncated distribution. For a fixed constant $M$ define $Y^{*} = S^{*-1/2}(X^{*} - \bar{X}^{*})$, where

$$X^{*} = \begin{cases} X & \text{if } \|X\| < M, \\[4pt] M\,\dfrac{X}{\|X\|} & \text{if } \|X\| \ge M, \end{cases}$$

$\bar{X}^{*}$ and $S^{*}$ denote the sample mean vector and sample covariance matrix of $X^{*}$, and let

$$\hat{\mathcal{E}}^{M}_{n,d} = n\left(\frac{2}{n}\sum_{k=1}^{n} E\|y^{*}_k - Z\| - E\|Z - Z'\| - \frac{1}{n^2}\sum_{j,k=1}^{n} \|y^{*}_j - y^{*}_k\|\right).$$

To find a suitable truncation point $M$, first define a constant $M_0$ as follows. For arbitrary $\mu \in \mathbb{R}^d$ and symmetric positive definite $\Sigma_{d\times d}$, define

$$m(\mu, \Sigma) = \sup\{m \ge 0 : X \sim N_d(\mu, \Sigma) \text{ a.e. on } B(0, m)\},$$

where $B(0, m)$ denotes the ball of radius $m$ centered at 0, and let $M_0 = \sup_{\mu,\Sigma}\{m(\mu, \Sigma)\}$. Then $M_0 < \infty$ since $X$ is nonnormal. If $X$ is truncated at $M > M_0$, $X^{*}$ is nonnormal with finite covariance, so for any $c \in \mathbb{R}$, $\lim_{n\to\infty} P(\hat{\mathcal{E}}^{M}_{n,d} > c) = 1$.

Let $\{p_n\} \subset (0, 1)$ be a sequence such that $\lim_{n\to\infty} p_n^n = 1$. For each $n$, choose $M_n > M_0$ such that $P(X = X^{*}) > p_n$ if $X$ is truncated at $M_n$, and $\lim_{n\to\infty} M_n = \infty$. Then $\|y_j - y_k\| \ge \|y^{*}_j - y^{*}_k\|$ and

$$\lim_{n\to\infty} P\left(\sum_{j,k=1}^{n} \|y_j - y_k\| > \sum_{j,k=1}^{n} \|y^{*}_j - y^{*}_k\|\right) = \lim_{n\to\infty} P(\|x_j\| \ge M_n \text{ for some } j) = \lim_{n\to\infty} (1 - p_n^n) = 0.$$

Similarly, $E\|y_j - Z\| \ge E\|y^{*}_j - Z\|$ and

$$\lim_{n\to\infty} P\left(\sum_{j=1}^{n} E\|y_j - Z\| > \sum_{j=1}^{n} E\|y^{*}_j - Z\|\right) = \lim_{n\to\infty} P(\|x_j\| \ge M_n \text{ for some } j) = \lim_{n\to\infty} (1 - p_n^n) = 0.$$

Therefore $\lim_{n\to\infty} P(\hat{\mathcal{E}}_{n,d} > c) = \lim_{n\to\infty} P(\hat{\mathcal{E}}^{M_n}_{n,d} > c) = 1$. □

References


