Principal Components and Orthogonal Regression Based on Robust Scales

Ricardo MARONNA
Mathematics Department, University of La Plata
C.C. 172, La Plata 1900, Argentina
(rmaronna@mail.retina.ar)

Both principal components analysis (PCA) and orthogonal regression deal with finding a p-dimensional linear manifold minimizing a scale of the orthogonal distances of the m-dimensional data points to the manifold. The main conceptual difference is that in PCA p is estimated from the data, to attain a small proportion of unexplained variability, whereas in orthogonal regression p equals m − 1. The two main approaches to robust PCA are using the eigenvectors of a robust covariance matrix and searching for the projections that maximize or minimize a robust (univariate) dispersion measure. This article is more akin to the second approach. But rather than finding the components one by one, we directly undertake the problem of finding, for a given p, a p-dimensional linear manifold minimizing a robust scale of the orthogonal distances of the data points to the manifold. The scale may be either a smooth M-scale or a "trimmed" scale. An iterative algorithm is developed that is shown to converge to a local minimum. A strategy based on random search is used to approximate a global minimum. The procedure is shown to be faster than other high-breakdown-point competitors, especially for large m. The case p = m − 1 yields orthogonal regression. For PCA, a computationally efficient method to choose p is given. Comparisons based on both simulated and real data show that the proposed procedure is more robust than its competitors.

KEY WORDS: High breakdown point; M-scale.

1. INTRODUCTION

Principal components analysis (PCA) is a widely used multivariate data-analytic tool that aims to find a small number p of linear combinations of the m observed variables, which explain most of the variability in the data. Geometrically, this is equivalent to finding a p-dimensional linear manifold minimizing the mean squared orthogonal distances of the data points to the manifold. A related problem is orthogonal regression, usually used for estimation in errors-in-variables models, in which a hyperplane is fit to the data, the main conceptual difference being that here p equals m − 1 (instead of depending on the data), and the coefficients of the hyperplane are meaningful parameters of interest.

PCA proceeds by finding directions of maximum or minimum variability in the data space. The classical approach measures variability through the variance, and the corresponding directions are the eigenvectors of the sample covariance matrix, which is well known to be very sensitive to outliers.

The simplest and probably oldest approach to robust PCA consists of replacing the covariance matrix by a robust scatter matrix. Devlin, Gnanadesikan, and Kettenring (1981) and Campbell (1980) used M estimates (Maronna 1976) for this purpose. The asymptotic behavior of this procedure was studied by Boente (1987). In view of the low breakdown point of M estimates for high dimensions, more robust estimates, such as the minimum volume ellipsoid (MVE) estimator (Rousseeuw 1985) and S estimates (Davies 1987), were proposed by Naga and Antille (1990) and Croux and Haesbroeck (2000). Locantore, Marron, Simpson, Tripoli, Zhang, and Cohen (1999) proposed a very simple and fast procedure based on an orthogonal-equivariant matrix.

Another approach consists of finding directions that maximize or minimize a measure of variability, which is not the variance but rather a robust dispersion estimate. Such an approach is generically called "projection pursuit." Li and Chen (1985) developed such a method using an M estimate of scale and showed its consistency at elliptical models.

Li and Chen searched for the optimal directions by means of a general projection-pursuit algorithm, which resulted in a complicated computer-intensive method. Despite this drawback, their approach has been applied by Filzmoser (1999), Boente and Orellana (2000), and Gather, Hilker, and Becker (1998). Croux and Ruiz-Gazen (2000) derived the influence functions of the estimates and their asymptotic variances and proposed a simple and explicit approximate version of the estimate. Xie, Wang, Liang, Sun, Song, and Yu (1993) used simulated annealing to find the optimal directions. Hubert, Rousseeuw, and Verboven (2003) proposed a hybrid procedure combining projection pursuit and a robust scatter matrix.

One advantage of the "projection pursuit" approach is that the estimate is well defined even when the data are collinear, particularly when there are more variables than observations, a situation common in chemometrics. Robust scatter matrices such as M estimates and the MVE are not defined when the data are collinear. Robust approaches to orthogonal regression have been treated by Zamar (1989, 1992).

This article extends the "projection pursuit" approach. Its basic idea is that, because the final result sought for in PCA is a low-dimensional fit, it may be more advantageous, from the standpoints of both computational efficiency and statistical performance, to search for it directly rather than one component at a time. A similar approach was also adopted by Croux and Laine (2003).

© 2005 American Statistical Association and the American Society for Quality
TECHNOMETRICS, AUGUST 2005, VOL. 47, NO. 3, DOI 10.1198/004017005000000166

In this spirit, an algorithm is developed for finding a p-dimensional linear manifold that "optimally fits" a dataset in the sense of minimizing a smooth robust scale of the orthogonal distances. The scale may be either a scale-M estimate or a "trimmed squares" scale. The basic iterative algorithm finds local minima, and a random search is performed to approximate a global minimum. Its implementation is fast, even for very high dimensionality. For p = m − 1, we have orthogonal regression. For general PCA, a method for choosing an adequate p is given.

The next section describes the proposed procedure. Section 3 contains a more detailed review of robust PCA estimates. Section 4 compares the computing times of the different estimates. Section 5 compares the performance of the estimates through a simulation study. Section 6 deals with the analysis of a real dataset. Section 7 treats the application of the procedure to orthogonal regression. An Appendix provides proofs.

2. THE PROPOSED ALGORITHM

Let x_i, i = 1, ..., n, be an m-dimensional dataset. It is assumed henceforth that the data are normalized in such a way that the Euclidean norm ||·|| represents an adequate distance. Given p < m, our purpose will be to find a linear p-dimensional manifold that in some sense minimizes the orthogonal distances between data points and the manifold. Let q = m − p. Then a linear manifold has equation {Bx = a}, with B a (q × m)-matrix and a a q-vector. It can and will henceforth be assumed that the rows of B are orthonormal, that is, BB' = I_q. Then the distance of x to the manifold is ||Bx − a||. Call

    r_i(B, a) = ||Bx_i − a||^2    (1)

the "residuals" (actually, the squared distances to the hyperplane). For r = (r_1, ..., r_n), let σ(r) be a scale statistic. We represent our goal as

    σ(r(B, a)) = min   with (B, a) ∈ B_{q,m},    (2)

where B_{q,m} is the parameter space,

    B_{q,m} = {(B, a) : a ∈ R^q, B ∈ R^{q×m}, BB' = I_q}.    (3)

In the classical case, σ(r) = Σ_i r_i, and the solution is the matrix whose rows are the eigenvectors corresponding to the q smallest eigenvalues of the covariance matrix. We treat (2) when σ is a smooth robust scale. The resulting estimates are called "S estimates" after Zamar (1989), who proposed this approach for orthogonal regression (see Sec. 7).

2.1 Minimizing an M-Scale

An M-scale σ(r) for a nonnegative r is defined as the solution σ of

    (1/n) Σ_{i=1}^n χ(r_i/σ) = δ,    (4)

where δ ∈ (0, 1) and the function χ: R+ → [0, 1] is nondecreasing, with χ(0) = 0 and χ(∞) = 1, and differentiable with derivative W.

For simplicity, first consider q = 1. Then B has a single row b of norm 1 and a is a scalar. To find local minima of (2), we have to minimize σ(r(B, a)) + α(b'b − 1), where α is a Lagrange multiplier. Differentiating (4) with r_i = r_i(B, a) yields the derivatives of σ with respect to B and a, and a straightforward calculation yields

    [Σ_{i=1}^n w_i (x_i − µ)(x_i − µ)'] b = λb    (5)

for some scalar λ, where

    w_i = W(r_i/σ)    (6)

and

    µ = Σ_{i=1}^n w_i x_i / Σ_{i=1}^n w_i,    (7)

and

    a = b'µ.    (8)

Hence b is an eigenvector of the positive semidefinite matrix

    C = C(B, a) = Σ_{i=1}^n w_i (x_i − µ)(x_i − µ)'.    (9)

For general q, the same reasoning shows that at local extrema, the rows b_1, ..., b_q of B verify (5), and a is given by

    a = Bµ.    (10)

It follows from lemma 2 of Boente (1983) that if W is a decreasing function, then at a local minimum the b_j's are the eigenvectors corresponding to the q smallest eigenvalues of C. This approach leads naturally to an iterative fixed-point algorithm for minimizing σ(B, a).

Algorithm 1. The algorithm depends on the initial B, the dimension q, and three parameters, N1, N2, and tol:

1. Set it ← 1, σ_0 ← ∞.
2. Do until it = N1 + N2 or Δ ≤ tol:
   a. If it = 1, set a equal to the coordinatewise median of Bx_1, ..., Bx_n.
   b. Compute r_i(B, a), i = 1, ..., n, from (1).
   c. Compute σ = σ(r(B, a)) from (4).
   d. Set Δ ← 1 − σ/σ_0 and σ_0 ← σ.
   e. Compute the w_i from (6) and µ from (7).
   f. If it > N1:
      (1) Compute C in (9) and the eigenvectors b_1, ..., b_q corresponding to its q smallest eigenvalues.
      (2) Set B as the matrix with rows b_j, j = 1, ..., q.
   g. Compute a from (10).
   h. Set it ← it + 1.
3. End do.

In Section A.1 it is shown that σ decreases at each iteration if χ is concave.
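To make the M-scale and the fixed-point update concrete, here is a minimal NumPy sketch. It is not from the paper (the author's implementation was written in Gauss); the function names, the bisection solver, and other details are assumptions. It solves (4) for the bisquare χ of (11) and performs steps 2(b)-2(g) of Algorithm 1 for one iteration with it > N1.

```python
import numpy as np

def bisquare_chi(t):
    # chi(r) = min{1, 1 - (1 - r)^3}; argument t = r / sigma
    return 1.0 - (1.0 - np.minimum(t, 1.0))**3

def bisquare_W(t):
    # derivative of chi: W(t) = 3(1 - t)^2 for t < 1, and 0 otherwise
    return np.where(t < 1.0, 3.0 * (1.0 - t)**2, 0.0)

def m_scale(r, delta, n_iter=60):
    """Solve (1/n) sum chi(r_i / sigma) = delta for sigma by bisection (eq. 4)."""
    lo, hi = 1e-12, np.max(r) + 1e-12
    while np.mean(bisquare_chi(r / hi)) > delta:   # enlarge bracket if needed
        hi *= 2.0
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if np.mean(bisquare_chi(r / mid)) > delta:
            lo = mid          # sigma too small: average of chi too large
        else:
            hi = mid
    return 0.5 * (lo + hi)

def algorithm1_step(X, B, a, delta):
    """One iteration of Algorithm 1 (steps 2b-2g), assuming it > N1."""
    r = np.sum((X @ B.T - a)**2, axis=1)          # squared distances, eq. (1)
    sigma = m_scale(r, delta)
    w = bisquare_W(r / sigma)                     # weights, eq. (6)
    mu = (w[:, None] * X).sum(axis=0) / w.sum()   # weighted mean, eq. (7)
    Xc = X - mu
    C = (w[:, None] * Xc).T @ Xc                  # weighted scatter matrix, eq. (9)
    evals, evecs = np.linalg.eigh(C)              # eigenvalues in ascending order
    q = B.shape[0]
    B_new = evecs[:, :q].T                        # rows = eigenvectors of q smallest
    return B_new, B_new @ mu, sigma               # a = B mu, eq. (10)
```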

The branching 2(f) means that for the first N1 iterations, B remains fixed. The reason for this is to make the initial a minimize σ(r(B, a)) (for the initial B). This choice ensures the orthogonal invariance of the resulting estimates and gives much better results than using just the coordinatewise median. If an initial a is available (see Sec. 2.3), then the algorithm is run with N1 = 0, overriding step 2(a). Note that the resulting B corresponds to the principal components of C, which is a weighted covariance matrix with weights w_i.

In the experiments in this article, χ was chosen as the bisquare function (recall that r represents the squared distances),

    χ(r) = min{1, 1 − (1 − r)^3},    (11)

which is concave. Intuitively, the constant δ in (4) should be near 1/2; it was chosen as

    δ = (n − m + q − 1) / (2n),    (12)

based on breakdown considerations that are explained in Section A.2.

It may seem that the same approach could be used to find directions of maximum scale, but this does not happen; the algorithm oscillates wildly, and the proofs in Section A.1 cease to hold. When m ≥ n, although the algorithm does work, it is more efficient to project the data on an (n − 1)-dimensional hyperplane.

2.2 Minimizing an L-Scale

Now we consider the problem (2) when σ is an L-scale, that is, one based on order statistics, namely the "trimmed sum of squares,"

    σ(r) = Σ_{i=1}^h r_(i),    (13)

where r_(1) ≤ ··· ≤ r_(n) are the ordered residuals and h < n. The proposed algorithm is inspired by the "concentration step" used by Rousseeuw and van Driessen (1999) for the minimization of a trimmed scale of Mahalanobis distances, which is the basis for their fast version of the minimum covariance determinant (MCD) multivariate estimate. The algorithm is similar to the one given in Section 2.1, but at each step, µ and C are the mean vector and covariance matrix of the observations with the h smallest residuals.

Algorithm 2. Proceed as in Algorithm 1, but at step 2(e), compute the weights as

    w_i = 1 if r_i is among the h smallest r's, and w_i = 0 otherwise.

It is shown in Section A.1 that σ decreases at each step. Actually, because the problem is combinatorial, the convergence is exact and occurs very rapidly. Intuitively, h should be near n/2. It is chosen as

    h = [(n + m − q + 2) / 2],    (14)

based on breakdown considerations explained in Section A.2. Croux and Laine (2003) treated the asymptotic behavior of estimators of this type.

2.3 Global Minimization

The strategy for finding the overall minimum of (2) is similar to the one used by Rousseeuw and van Driessen (1999). For each of Nran random initial B's, run the algorithm with a small number N1 and N2 of iterations, keep the Nkeep candidates with the smallest σ's, and then iterate them to convergence.

The Nran initial B's are random orthogonal matrices, each obtained by orthogonalizing a matrix of uniform random numbers. For each of them, the algorithm is run with parameters N1, N2, and tol, and both the resulting B and a are stored for the Nkeep lowest σ's. For each of these, the algorithm is run with parameters N1, N2, and tol, starting with the stored B and a (i.e., with N1 = 0).

Extensive experimenting led to the conclusion that Nran = 50, Nkeep = 10, N1 = 3, and N2 = 2 sufficed for all practical purposes (here tol is irrelevant), even for dimension m = 50. The final iterations were run with N2 = 10 and tol = .001.

Other plausible variants were considered but discarded after some experiments:

• Rather than purely random B's, one could generate random subsamples of size m + 1 from X and take for B the eigenvectors corresponding to the smallest q eigenvalues of the subsample covariance matrix. But even with a large Nran, the results were poor in comparison to the proposed algorithm; the reasons for this remain unclear.
• Instead of centering the projected data Bx_i through a, one could center the x_i from the start by means of a robust orthogonal equivariant location vector and drop a, but the results here were also poor.
• It seems that it would not be harmful to include "cheap" estimates like the covariance matrix and the "spherical PCA" estimate (see Sec. 3) as initial estimates. But although this sometimes yielded a smaller σ, it often led to "bad" local minima.
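As a concrete illustration of the random-search strategy of Section 2.3, the following sketch (not from the paper) wraps the hypothetical m_scale, bisquare_W, and algorithm1_step helpers from the earlier sketch in a simple driver, using the stated defaults Nran = 50, Nkeep = 10, N1 = 3, and N2 = 2, with final iterations run with N2 = 10 and tol = .001.

```python
import numpy as np

def random_orthonormal_rows(q, m, rng):
    # Orthogonalize a matrix of uniform random numbers and keep q rows.
    Q, _ = np.linalg.qr(rng.uniform(size=(m, m)))
    return Q[:, :q].T

def run_algorithm1(X, B, delta, N1, N2, tol, a=None):
    """Minimal driver for Algorithm 1 (a sketch; names are hypothetical)."""
    if a is None:
        a = np.median(X @ B.T, axis=0)           # step 2(a): coordinatewise median
    sigma_old = np.inf
    for it in range(1, N1 + N2 + 1):
        r = np.sum((X @ B.T - a)**2, axis=1)
        sigma = m_scale(r, delta)
        if 1.0 - sigma / sigma_old <= tol:       # step 2(d): relative decrease
            break
        sigma_old = sigma
        if it > N1:                              # step 2(f): update B and a
            B, a, _ = algorithm1_step(X, B, a, delta)
        else:                                    # keep B fixed, refit a only
            w = bisquare_W(r / sigma)
            a = (w[:, None] * (X @ B.T)).sum(axis=0) / w.sum()
    return B, a, sigma

def global_search(X, q, delta, Nran=50, Nkeep=10, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    n, m = X.shape
    trials = [run_algorithm1(X, random_orthonormal_rows(q, m, rng), delta,
                             N1=3, N2=2, tol=0.0) for _ in range(Nran)]
    best = sorted(trials, key=lambda t: t[2])[:Nkeep]    # keep the smallest scales
    finals = [run_algorithm1(X, B, delta, N1=0, N2=10, tol=1e-3, a=a)
              for B, a, _ in best]
    return min(finals, key=lambda t: t[2])               # (B, a, sigma)
```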

2.4 Choosing the Number of Components

The number p of components in PCA is usually chosen on the basis of the "proportion of unexplained variance" (although in some chemical applications, p is chosen so that the coordinates of Bx_i − a have the size of the a priori known measurement error). The "optimal" proportion of unexplained variance for p = m − q components is defined as

    u_q^opt = Σ_{j=1}^q λ_j / Σ_{j=1}^m λ_j,    (15)

where the λ_j's are the eigenvalues of the underlying covariance matrix in ascending order. For both the estimates based on a covariance matrix and those of the Li–Chen type, which yield estimated eigenvalues λ̂ = (λ̂_1, ..., λ̂_m), the observed proportion is

    û_q = u_q(λ̂) = Σ_{j=1}^q λ̂_j / Σ_{j=1}^m λ̂_j,    (16)

which is consistent for u_q^opt if the underlying distribution is elliptic.

For the proposed estimates (2), the observed proportion is

    û_q = û_q(B, a) = σ_q / σ_0,    (17)

where σ_q is the optimal error scale for p = m − q components and σ_0 is the same for p = m. (In this case we take B = I, because all B's are equivalent.) Croux and Laine (2003) showed that for elliptical distributions, û_q is consistent in the case of an L-scale, and asserted (personal communication) that a similar approach may be used to show consistency in the case of an M-scale.

Assume now that we are willing to accept a maximum number p_0 of predictors and a maximum allowed unexplained proportion of variance u_max. Let q_0 = m − p_0. Then our goal is to find the largest q ≥ q_0 such that û_q ≤ u_max. This goal could be attained simply by running the algorithm for q = q_0, q_0 + 1, and so on, but this may be too time-consuming. We describe a more economic strategy, based on the fact that generally p_0 is much smaller than m.

First, run the algorithm with q = q_0, with output B_{q_0}, a_{q_0}, and σ_{q_0} and σ_0 described after (17). If û_{q_0} is approximately equal to u_max, then we are done; if it is larger, then we have to revise our goals or to give them up. But it generally will be smaller, meaning that a larger q is needed. Let q > q_0. Then we solve (2) with the restriction that

    (∗) the first q_0 rows of B and a are given by B_{q_0} and a_{q_0}.

Let D be a p_0 × m matrix with orthonormal rows that are orthogonal to those of B_{q_0}, and put

    y_i = B_{q_0} x_i ∈ R^{q_0},   z_i = D x_i ∈ R^{p_0},

and

    x̃_i = (y_i', z_i')',

so that x̃_i is x_i after an orthogonal change of coordinates. For simplicity, write

    r_{0i} = r_i(B_{q_0}, a_{q_0}) = ||y_i − a_{q_0}||^2,   i = 1, ..., n.

The residuals in the new coordinate system are ||B x̃_i − a||^2. The restriction (∗) means that in this system,

    B = [ I_{q_0}   0 ]          a = [ a_{q_0} ]
        [    0      B̃ ]   and       [    ã    ],    (18)

with (B̃, ã) ∈ B_{q−q_0, p_0}. Then

    ||B x̃_i − a||^2 = ||y_i − a_{q_0}||^2 + ||B̃ z_i − ã||^2.

Call r(B̃, ã) the vector with elements

    r_i(B̃, ã) = r_{0i} + ||B̃ z_i − ã||^2,   i = 1, ..., n.

Then our problem is now

    σ(r(B̃, ã)) = min   with (B̃, ã) ∈ B_{q−q_0, p_0}.

The problem is solved by Algorithm 1 or 2 (Secs. 2.1 and 2.2), but now basing the weights on r(B̃, ã). Recall that the r_{0i}'s remain fixed. Because computations are based on the p_0-dimensional vectors z_i (rather than on x_i) and p_0 is much smaller than m, this problem is solved in a fraction of the time required to solve the initial one, which took place in dimension m.

For each q, we thus obtain σ_q and hence û_q. Although σ_q is larger than the value obtained by running the algorithm on the complete data, it actually turns out that the difference is negligible for practical purposes. The procedure is easily automated, increasing q until û_q > u_max. The method is demonstrated with a real dataset in Section 6.

Note that a very simple method would be to choose q on the basis of the eigenvalues of the matrix C in (9) computed for q = q_0. But this approach is unreliable, because a point may be an outlier for some q > q_0 but not for q = q_0.

3. REVIEW OF SOME ROBUST PRINCIPAL COMPONENTS ANALYSIS ESTIMATES

In this section we review some estimates apt for PCA with high-dimensional data, which we use for comparison:

1. The fast minimum covariance determinant (FMCD) estimate (Rousseeuw and van Driessen 1999) computes an approximation to the MCD estimate of multivariate location and scatter (Rousseeuw 1985) by a combination of random starts and descent steps, and requires only a moderate number of random subsamples to yield reliable results. In the experiments in this article, the parameters of the algorithm were chosen as recommended by Rousseeuw and van Driessen, namely 500 subsamples and 2.5% trimming.

2. The spherical PCA (SPC) procedure was proposed by Locantore et al. (1999) in the realm of functional data analysis. Let µ be a robust orthogonal equivariant location vector; they chose the space L2 median, that is,

    µ = arg min_µ Σ_i ||x_i − µ||.    (19)

Let y_i = (x_i − µ)/||x_i − µ||. The procedure consists of using the eigenvectors b_1, ..., b_m of the covariance matrix of the y_i, which, as pointed out by Boente and Fraiman (1999), are consistent for the principal components of the x_i if the data are elliptically distributed. Because the eigenvalues are in general not consistent, we substitute them with

    λ̂_j = S(b_j'x_1, ..., b_j'x_n)^2,    (20)

where S is any robust scale. We chose the median absolute deviation (MAD) for simplicity. The procedure is extremely fast. "Elliptical PCA" consists of normalizing each coordinate by means of a robust scale such as the MAD, and then applying SPC. But, as pointed out by Boente and Fraiman (1999), this procedure then ceases to be consistent.

3. The orthogonalized Gnanadesikan–Kettenring (OGK) estimate of multivariate location and scatter proposed by Maronna and Zamar (2002) is a fast procedure based on a modification of the coordinatewise estimate proposed by Gnanadesikan and Kettenring (1972). It is only scale-equivariant, but experiments by Maronna and Zamar showed it to be approximately affine-equivariant and to give reliable results even under extremely high collinearity. The version used in this article is the same as that used by Maronna and Zamar (2002), that is, with two iterations and 10% trimming.

4. Croux and Ruiz-Gazen's (1996, 2000) approximate version of the Li–Chen estimate is based on the idea that the directions of largest variability should roughly correspond to the data points most distant to the data center. Let S(·) be a robust dispersion estimate. Let y_i = x_i − µ and d_i = y_i/||y_i||, where µ is the space median. The direction b_m of maximum variability is found by maximizing S(b'y_1, ..., b'y_n) over the finite set b ∈ {d_1, ..., d_n}. The corresponding maximum S^2 is the "largest eigenvalue" λ̂_m. Then the data are projected onto the subspace orthogonal to b_m and the procedure is repeated, yielding b_{m−1}, λ̂_{m−1}, and so on. One S used by Croux and Ruiz-Gazen is a modified MAD; for a vector z = (z_1, ..., z_n), S(z) is the hth order statistic of |z_i − median(z)|, where h = [(n + m + 1)/2] [i.e., as (14) with q = 1]. In this article we use this scale and also the M-scale of |z_i − median(z)| defined in Section 2.1, with bisquare χ and δ given by (12) with q = 1. The respective procedures are denoted by PPMD and PPME (for Projection Pursuit with MAD scale and with M-estimate of scale, resp.). From the computational standpoint, one advantage of this approach for large m is that it is not necessary to compute all m components, but rather to compute only a number p yielding a desired "proportion of explained variance," which can be estimated as (λ̂_m + ··· + λ̂_{m−p+1})/v, where v is a measure of total variability, which can be roughly estimated as the sum of the squared scales of the coordinates. On the other hand, it requires O(pmn^2) operations [because each projection involves O(nm) operations, and there are n of them], which may be a drawback if n is large. One could also consider using this approach "upward" to find directions of minimum scale; but this does not work, the reason being that although directions of high variability may be expected to correspond roughly to some of the y_i's, the same does not hold for those with lowest variability (e.g., if the data are concentrated on a hyperplane).

Remark. Each of these procedures yields a location vector µ and a set of directions b_1, ..., b_m (ordered from smallest to largest variability). Given the number of principal components p, call B and D the matrices whose rows are b_1, ..., b_q and b_{q+1}, ..., b_m. Then the p-dimensional manifold of best fit has equation Bx = a with a = Bµ. A p-dimensional representation of the data is given by z_i = D(x_i − µ), from which the data can be approximately reconstructed by x̂_i = D'z_i + µ.
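The spherical PCA estimate (item 2 above) is simple enough to sketch directly. The following minimal NumPy illustration is not from the paper; the Weiszfeld iteration for the space median and all names are assumptions. It computes the space median of (19), the eigenvectors of the covariance matrix of the sphered data, and the MAD-based eigenvalue substitutes of (20).

```python
import numpy as np

def space_median(X, n_iter=100, eps=1e-8):
    """L2 (space) median via Weiszfeld's iterative reweighting, eq. (19)."""
    mu = np.median(X, axis=0)                      # starting value
    for _ in range(n_iter):
        d = np.linalg.norm(X - mu, axis=1)
        w = 1.0 / np.maximum(d, eps)               # guard against zero distances
        mu_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(mu_new - mu) < eps:
            break
        mu = mu_new
    return mu

def mad(z):
    # Median absolute deviation (unnormalized), used here as the robust scale S
    return np.median(np.abs(z - np.median(z)))

def spherical_pca(X):
    """Spherical PCA: eigenvectors of the covariance of the data projected onto
    the unit sphere around the space median; eigenvalues replaced per (20)."""
    mu = space_median(X)
    Z = X - mu
    Y = Z / np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)
    _, evecs = np.linalg.eigh(np.cov(Y, rowvar=False))
    B = evecs[:, ::-1].T                           # rows = directions, largest first
    lam = np.array([mad(X @ b)**2 for b in B])     # eq. (20) with S = MAD
    return B, lam, mu
```

Per the Remark above, a p-dimensional representation would then use the p rows with the largest substitutes λ̂_j as D, with z_i = D(x_i − µ) and reconstruction x̂_i = D'z_i + µ.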

4. RUNNING TIMES

The estimates were implemented in Gauss (version 5.0) and run on a PC with a 550-MHz Intel Pentium processor and 128 Mb RAM. For each combination of n and m, five random samples with 20% contamination were generated as described in Section 5, and the running times were averaged. The estimates defined in Sections 2.1 and 2.2 are denoted by S–M and S–L. For their computation and for both variants of PP we used q = [.8m]. The running times of S–M and S–L for a given m do not seem to depend much on q, but of course those of PPMD and PPME do. SPC was not included, because it is at least 30 times faster than the other estimates.

Table 1. Running Times of Estimates

                n = 200                          n = 500
m =      10    20    30    40    50       10    20    30    40    50
FMCD    8.6  13.6  22.2  33.2  64.6     30.5  41.5    70   102   192
PPMD     .6   1.2   1.9   2.4   3.3      3.5   7.0    12    17    22
PPME    2.1   4.0   5.6   8.1  11.5     14.2  24.6    35    47    68
OGK      .5   1.8   4.2   7.1  11.1      1.2   4.8    10    17    27
S–L     1.8   4.1   6.8  10.2  14.4      5.5  11.2    16    23    32
S–M     2.5   3.9   5.6   9.4  13.3      7.3  11.4    15    19    27

It is seen that FMCD is always the slowest and PPMD is the fastest in most cases (although this would change with larger n). S–L and S–M are competitive with the other estimates. It is surprising that OGK and S–M have similar running times for m = 50. It must be noted that matrix languages like Gauss or Matlab are very fast for built-in linear-algebraic functions but slow for DO loops, and hence the relative running times of the estimates may be very different using an elementwise language like Fortran or C.

5. SIMULATION

To compare the performances of the different estimates under simulated data, we need a situation where it is clear "what is being estimated." This is chosen as m-variate normal data with covariance matrix Σ = diag(λ_1, ..., λ_m) with λ_1 < ··· < λ_m. For a given number p of desired components and contamination proportion ε, the contaminated data are a mixture (1 − ε)N_m(0, Σ) + εN_m(kx_0, ν^2 I). Here ν is a small positive scalar; x_0 belongs to the subspace of the smallest eigenvalues, where the effect of contamination will presumably be most harmful; and k ranges over a grid of values in search of the least favorable cases. Proceed as follows to generate a sample X:

1. Generate x_i, i = 1, ..., n, as N_m(0, I).
2. Let x_0 have coordinates x_{0j} = 1 for j ≤ q = m − p and 0 otherwise. For i ≤ [nε] transform x_i to

       x_i ← νx_i + kx_0.    (21)

   (In all cases, we took ν = .5.)
3. Finally, for all i and j, transform x_{ij} ← √λ_j x_{ij}.

To assess the performance of an estimate B = B(X), a predictive approach is used. Let x be an N_m(0, Σ) vector independent of X. Then, conditionally on X,

    E||Bx||^2 = trace(BΣB'),

and the "prediction proportion of unexplained variance" is

    u_q^pred(B) = trace(BΣB') / trace(Σ).    (22)

This measure of prediction error is to be compared with the "optimal" proportion (15). This criterion seems more realistic than treating the estimation of individual eigenvalues and eigenvectors.
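A minimal sketch of the simulation design (not from the paper; the argument names and the use of NumPy are assumptions) follows the three generation steps and evaluates the prediction criterion (22) as reconstructed above, trace(BΣB')/trace(Σ).

```python
import numpy as np

def simulate_sample(n, m, p, eps, k, lam, nu=0.5, rng=None):
    """Generate one contaminated sample following steps 1-3 of Section 5
    (a sketch; names are assumptions). lam is the vector of eigenvalues
    lambda_1 < ... < lambda_m."""
    rng = np.random.default_rng() if rng is None else rng
    q = m - p
    X = rng.standard_normal((n, m))              # step 1: N_m(0, I)
    x0 = np.zeros(m)
    x0[:q] = 1.0                                 # step 2: x0_j = 1 for j <= q
    n_out = int(np.floor(n * eps))
    X[:n_out] = nu * X[:n_out] + k * x0          # eq. (21)
    return X * np.sqrt(np.asarray(lam))          # step 3: x_ij <- sqrt(lambda_j) x_ij

def u_pred(B, Sigma):
    """Prediction proportion of unexplained variance, eq. (22)."""
    return np.trace(B @ Sigma @ B.T) / np.trace(Sigma)
```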

To choose the number of components, it is necessary to estimate u_q^pred. This is done with the observed proportion of variance û defined in (16) or (17).

Two performance measures are defined for each estimate: a measure of relative prediction error, epred, and a measure eest of the relative estimation error of û. We have to take into account that small values of u_q^pred are always better than large ones, but both too small and too large values of û are undesirable, because they would lead to underestimating or overestimating the number of components. Hence we define

    epred = u_q^pred / u_q^opt − 1   and   eest = max{û / u_q^pred, u_q^pred / û} − 1.    (23)

The simulations were run with m = 10 and n = 100. The performance of the estimates depends on the configuration of the eigenvalues and on the location of contamination. Because it would be impossible to explore all scenarios, two were selected, representing an abrupt and a smooth increase of the λ_j:

(a) λ_j = 1 + .1j for j ≤ q and λ_j = 20(1 + .5(j − q)) otherwise.
(b) λ_j = 2^{j−1} for 1 ≤ j ≤ m.

The values of u_q^opt for q = 8 in cases (a) and (b) are .214 and .249.

Because OGK is not orthogonal equivariant, it was applied to data subjected to a random orthogonal transformation. For each sample X, a random (m × m)-orthogonal matrix T was generated, the data were transformed to Tx_i, i = 1, ..., n, and the OGK scatter matrix V was computed based on the transformed data. From it the matrix B was obtained, and then back-transformed to BT.

Table 2 gives the mean values of the prediction error epred corresponding to 200 replications. Their coefficients of variation are about 3%. The medians were also computed, and yield similar results. The first column refers to the eigenvalue configuration. "Cov" denotes the sample covariance matrix. The values of k [the contamination location in (21)] run between .5 and 20; only those are displayed at which some estimate attains its maximum error. For both configurations, the complete results for ε = .20 as a function of k are displayed in Figures 1 and 2. To avoid cluttering the plot, the estimates are displayed in two panels with the same vertical scale to allow for comparisons (PPMD and S–L were omitted).

Table 2. Simulation: Mean Prediction Errors epred

Configuration   ε     k     Cov   FMCD  PPMD  PPME   OGK   SPC   S–L   S–M
(a)             0           .02    .05   .31   .22   .03   .04   .07   .02
                .1    2     .03    .13   .41   .33   .04   .04   .07   .03
                      5    1.39    .05   .64   .60   .03   .05   .07   .03
                      20   1.41    .05   .55   .50   .03   .05   .07   .03
                .2    2     .07    .96   .97  1.01   .40   .34   .76   .20
                      3    1.28   1.45  1.18  1.15  1.38   .49   .06   .37
                      20   1.41    .04   .49   .43   .04   .41   .06   .03
(b)             0           .03    .07   .24   .19   .04   .06   .13   .04
                .1    1     .03    .13   .30   .26   .07   .11   .25   .07
                      2     .11    .32   .42   .40   .25   .18   .15   .09
                      4     .66    .06   .47   .45   .06   .21   .13   .04
                      20    .69    .07   .37   .33   .05   .20   .13   .04
                .2    2     .47    .68   .69   .68   .62   .56   .70   .49
                      3     .66    .74   .78   .73   .70   .62   .13   .58
                      20    .70    .06   .36   .30   .06   .40   .12   .04

Now we discuss the results.

Figure 1. Simulation Mean Prediction Errors as a Function of k for Eigenvalue Configuration (a) With ε = .2.

Configuration (a). For ε = 0, both PP estimates are surprisingly inefficient, whereas S–M and OGK are very efficient; for ε = .10, S–M and OGK are the best, and both PP estimates are the worst; for ε = .20, FMCD and OGK are as bad as Cov, whereas S–M is the best. In view of its simplicity, SPC shows a rather good behavior.

Figure 2. Simulation Mean Prediction Errors as a Function of k for Eigenvalue Configuration (b) With ε = .2.

Configuration (b). For ε = 0, S–M and OGK are the most efficient; for ε = .10, S–M is clearly the best. The case ε = .20 is clearly a hard one; the maximum errors are similar for all estimates, but Figure 2 shows that in general S–M is better than the other estimates. SPC again shows a comparatively good behavior.

Table 3 gives the maxima over k of the Monte Carlo averages of eest, which measures the relative error in the estimation of u_q^pred [see (23)].

Table 3. Simulation: Maxima Over k of Mean Estimation Errors eest

Configuration   ε      Cov    FMCD  PPMD  PPME   OGK   SPC   S–L   S–M
(a)             0       .09    .14   .09   .10   .10   .14   .22   .11
                .1     9.10    .20   .15   .15   .16   .18   .32   .25
                .2    17.54   1.96   .23   .24   .96   .30   .56   .45
(b)             0       .10    .17   .12   .13   .12   .16   .25   .09
                .1    10.40    .24   .12   .13   .14   .21   .25   .12
                .2    20.02   2.18   .40   .42  1.10   .51  1.01   .82

Both versions of PP show the best overall behavior at estimating their own prediction errors. FMCD is rather poor for ε = .20. S–M is very efficient for ε = 0 and does an acceptably good job for ε = .10 and .20. SPC again shows rather good behavior.

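For completeness, the two performance measures of (23) translate directly into code; a small sketch (not from the paper), taking the quantities of (15), (16)/(17), and (22) as inputs:

```python
def e_pred(u_pred_q, u_opt_q):
    """Relative prediction error, first part of (23)."""
    return u_pred_q / u_opt_q - 1.0

def e_est(u_hat_q, u_pred_q):
    """Relative error of the observed proportion u_hat as an estimate of
    the prediction proportion u_pred, second part of (23)."""
    return max(u_hat_q / u_pred_q, u_pred_q / u_hat_q) - 1.0
```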

6. ANALYSIS OF REAL DATA

We compare the performance of the different estimates on a dataset taken from the "data repository" of Bay (1999), which has been used by Sigillito, Wing, Hutton, and Baker (1989) and also analyzed by Maronna and Zamar (2002). It consists of 351 radar measurements on 34 continuous characteristics, two for each (complex) pulse number. The measurements are classified as "good" radar returns (those showing evidence of some type of structure in the ionosphere) or "bad" ones. We analyze the n = 225 "good" ones. Variables 1, 2, and 27 have MAD = 0. Because this would cause the FMCD to fail, they were omitted, so that here m = 31.

PCA can be used to yield a low-dimensional representation of the data, as described in the Remark at the end of Section 3. Table 4 displays for each estimate the observed proportion of unexplained variance û_q defined in (16) and (17), as a function of the number of components p = m − q. It is seen that p = 3 or 4 suffices in all cases for û_q about 10%.

Table 4. Ionospheric Data: Proportion of Unexplained Variance for p Components

p     Cov   FMCD  PPMD  PPME   OGK   SPC   S–L   S–M
1     .46    .39   .32   .38   .37   .31
2     .24    .14   .16   .15   .13   .18
3     .13    .05   .13   .12   .06   .12   .13   .13
4     .08    .02   .10   .09   .03   .08   .08   .07
5     .06    .01   .07   .07   .02   .06   .07   .05
6     .05    .01   .06   .06   .02   .05   .05   .04
7     .05    .01   .04   .04   .01   .04   .05   .03
8     .04    .01   .03   .04   .01   .03   .04   .03

The last two columns (read upward) describe the results of applying the procedure described in Section 2.4 with p_0 = 8 and u_max = .10. Here û_{q_0} = .04 and .03 for S–L and S–M. In both cases, when p = 3, û_q = .13 > u_max, and the procedure stops. Computing S–M required 7 seconds for p_0 = 8 and an additional 1.8 seconds for the set of p between 7 and 3. As a check, the values of the last two columns were recomputed for p between 3 and 7 using the full algorithm [i.e., without the restriction (18)]. The resulting relative decrease in û_q was only about 1%.

We now compare the approximate reconstructions given by the estimators with p = 4 components. Each data point x_i can be visualized by plotting x_{ij} against the index j for j = 1, ..., m. Visualization is much improved by placing first the odd-numbered and then the even-numbered j's (the real and imaginary parts of the signal). As an example, Figure 3 shows data points (rows) 21, 78, 174, and 172, together with their reconstructions from S–M with p = 4 (solid lines). Figures 4 and 5 show the same for SPC and PPME. It is seen that the reconstructions from S–M are more precise. In general, detailed visual inspection shows that S–M yields better reconstructions than the other estimates for most observations.

Figure 3. Ionospheric Data: Observations 21, 78, 174, and 172 as Functions of Coordinate Number (+), and Their Reconstructions (solid lines) Using S–M With p = 4 Components.

Figure 4. Ionospheric Data: The Same Observations (+) and Their Reconstructions (solid lines) Using SPC With p = 4 Components.

Figure 5. Ionospheric Data: The Same Observations (+) and Their Reconstructions (solid lines) Using PPME With p = 4 Components.

Table 5 gives the α-quantiles of the residuals r_i for p = 4. Maronna and Zamar (2002) noted that 170 of the 225 observations respond to some "typical" patterns, 39 look like very noisy mixtures of these patterns, and 26 do not resemble any of the former. The latter generally have the largest residuals. The fact that S–M has the smallest quantiles up to α = .80 can be interpreted as implying that it fits the "typical" points better than the other estimators.

Table 5. Ionospheric Data: α-Quantiles of Residuals for p = 4 Components

α      Cov   FMCD  PPMD  PPME   OGK   SPC   S–L   S–M
.50    .23    .23   .51   .50   .19   .28   .18   .16
.60    .37    .37   .85   .76   .31   .44   .38   .28
.70    .52    .54  1.56  1.22   .52   .66   .74   .50
.80    .77    .88  3.00  2.45   .86   .98  1.40   .85
.90   1.18   1.25  5.63  3.71  1.96  1.82  3.04  1.95
.95   1.75   1.93  8.15  5.86  3.10  2.35  6.70  2.80

7. ORTHOGONAL REGRESSION

In the errors-in-variables model (Fuller 1987), the observations x_i are assumed to respond to the model x_i = z_i + e_i, where the z_i's are iid unobservable vectors such that

    β'z_i = α,   i = 1, ..., n,

for some α ∈ R and β ∈ R^m with ||β|| = 1, and the "errors" e_i are iid, independent of the z_i, and have a spherically symmetric distribution. Under the classical assumption that e_i ~ N_m(0, τ^2 I), the maximum likelihood estimate of β is the vector b corresponding to the hyperplane {b'x = a} that minimizes the sum of the squared orthogonal distances to the data points, and thus coincides with PCA with p = m − 1. Thus orthogonal regression deals with the directions corresponding to the smallest eigenvalues, unlike PCA, which deals essentially with the largest ones.

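Because orthogonal regression is simply the case q = 1 of (2), a hedged sketch (not from the paper) can reuse the hypothetical global_search routine from the earlier illustration to fit the hyperplane {b'x = a} and return the estimated β and α:

```python
import numpy as np

def orthogonal_regression(X, delta=None):
    """Robust orthogonal regression as the q = 1 case of (2): fit the hyperplane
    {b'x = a} minimizing the M-scale of squared orthogonal distances.
    Reuses the hypothetical global_search sketch defined earlier."""
    n, m = X.shape
    if delta is None:
        delta = (n - m) / (2 * n)            # eq. (12) with q = 1
    B, a, sigma = global_search(X, q=1, delta=delta)
    b = B[0]                                 # unit normal vector of the hyperplane
    return b, float(a[0]), sigma             # estimates of beta, alpha, and the scale
```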



Early robust proposals were made by Brown (1982) and Carroll and Gallo (1982), who pointed out the sensitivity of the classical estimate to outliers. Zamar (1989) defined S and M estimates for this problem. The former minimize an M scale of the residuals and thus correspond to (2) with q = 1. The numerical procedure that Zamar proposed finds an initial b through a grid search, then performs an iterative gradient search. The grid search makes the algorithm suited only for very small m.

Zamar (1992) studied the maximum asymptotic bias and the breakdown point of M estimates under normality, and found the "optimal" M estimate in the sense of minimizing the maximum bias under a given contamination rate. Define the error measure for an estimate b as

    err(b) = 1 − |b'β|.    (24)

Then the asymptotic bias is defined as the asymptotic value of err(b), and the asymptotic breakdown point ε* is defined as the maximum contamination rate such that err(b) is bounded away from 1. If the z_i's are multivariate normal, then x_i is also normal. Let λ_1 < λ_2 < ··· < λ_m be the eigenvalues of its covariance matrix, and let e_1, ..., e_m be the respective eigenvectors. Then ε* is shown to depend in a complicated way on the "signal to noise ratio" λ_2/λ_1 − 1.

Zamar (1992) showed that from a bias standpoint, the least favorable configurations for point-mass contamination are, as can be expected, those with the contamination located on the subspace spanned by e_1 and e_2. We assume without loss of generality that e_1, ..., e_m are the canonical basis vectors and β = (1, 0, ..., 0). To compare the different estimates, a simulation was run with m = 5 and 10, n = 10m, and eigenvalues λ_1 = 1 and λ_j = 10 + 5(j − 2) for j = 2, ..., m. The signal to noise ratio is 3, and according to table 2 of Zamar (1992), ε* is about .3. The situations are normal contaminated observations generated as in Section 5. Write x_0 = (k_s k_l, k_l, 0, ..., 0). Here k_l and k_s play a role similar to the leverage and slope of contamination in ordinary regression. k_l was assigned the values 1, 2, and 5, and k_s ranged between .5 and 50. Because the Monte Carlo distribution of err(b) turned out to be very heavy tailed, the estimates were evaluated through the median of err(b) rather than its average. Table 6 shows for each k_l the maxima of the criterion over k_s, multiplied by 100 to improve readability.

Table 6. Orthogonal Regression: Maxima Over k_s of Monte Carlo Median Errors (×100)

m     ε     k_l    Cov   FMCD  PPMD  PPME   OGK   SPC   S–L   S–M
5     0            .25   1.51  4.53  2.45   .51   .45   .96   .80
      .1    1     94.2    4.2  22.6  22.6   1.7  13.9   1.3   1.1
            2     94.3    3.1  16.8  16.6   2.5  17.3   1.3   1.5
            5     94.2    1.1   8.5   6.8    .6  19.2   1.3   1.2
      .2    1     94.5   31.0  69.8  70.1  73.7  91.2   3.2   3.1
            2     94.4   12.9  47.8  49.0  54.4  90.2   3.0   3.1
            5     94.3    1.0  18.4  14.9  12.3  79.8   4.3   5.0
10    0            .20    .68  5.66  3.74   .28   .29   .72   .75
      .1    1     93.9   11.0  73.3  70.7  64.7  91.3   1.2    .9
            2     94.1    6.9  57.7  55.4  46.1  91.0   1.9   2.1
            5     94.1     .7  34.3  30.6  11.7  82.9   1.1    .9
      .2    1     93.8   75.8  90.1  88.8  82.8  89.1  69.0  65.8
            2     94.1   57.3  85.2  86.6  67.6  78.1  18.2  37.9
            5     94.1   26.2  69.2  67.8  26.1  67.0   4.9   5.0

For ε = 0, OGK and SPC are the most efficient estimates; the others are rather inefficient. But for ε = .2, SPC is as bad as Cov, and OGK is not much better. Both versions of PP fail, as would be expected according to the reasons given at the end of Section 3. S–L and S–M clearly show the best overall behavior.

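A small sketch (not from the paper; the names and contamination details are assumptions based on Sections 5 and 7) of the error measure (24) and of the contaminated samples with x_0 = (k_s k_l, k_l, 0, ..., 0):

```python
import numpy as np

def err(b, beta):
    """Error measure (24) for an estimated direction b: err(b) = 1 - |b' beta|."""
    return 1.0 - abs(b @ beta)

def contaminated_regression_sample(n, m, eps, kl, ks, lam, nu=0.5, rng=None):
    """Contaminated normal sample for the orthogonal-regression simulation
    (a sketch; lam is the vector of eigenvalues with lam[0] = 1)."""
    rng = np.random.default_rng() if rng is None else rng
    X = rng.standard_normal((n, m))
    x0 = np.zeros(m)
    x0[0], x0[1] = ks * kl, kl                 # x_0 = (ks*kl, kl, 0, ..., 0)
    n_out = int(n * eps)
    X[:n_out] = nu * X[:n_out] + x0            # near point-mass contamination
    return X * np.sqrt(np.asarray(lam))        # scale coordinates to the eigenvalues
```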



APPENDIX: PROOFS AND BREAKDOWN POINT

A.1 Proofs

We prove that the scale descends at each iteration of Algorithms 1 and 2.

Theorem A.1. Let χ be concave. Then σ decreases at each iteration of Algorithm 1.

Proof. Let B_0 and a_0 be the current values, and let B_1 and a_1 be the values corresponding to the next iteration. For j = 0, 1, put

    r_{ji} = ||B_j x_i − a_j||^2,   i = 1, ..., n,

and define σ_j as the solution to

    (1/n) Σ_{i=1}^n χ(r_{ji}/σ_j) = δ.

Then we have to prove that σ_1 ≤ σ_0. Without loss of generality, we may assume that a_0 = 0 and σ_0 = 1. Because χ is nondecreasing, we have to show that

    Σ_{i=1}^n χ(r_{1i}) ≤ Σ_{i=1}^n χ(r_{0i}).    (A.1)

Let w_i = W(r_{0i}) and

    µ = Σ_{i=1}^n w_i x_i / Σ_{i=1}^n w_i,   C = Σ_{i=1}^n w_i (x_i − µ)(x_i − µ)'.

Then the rows of B_1 are the eigenvectors corresponding to the q smallest eigenvalues of C, and a_1 = B_1 µ. The definition of B_1 implies that trace(B_1 C B_1') ≤ trace(B_0 C B_0'), which implies that

    Σ_{i=1}^n w_i r_{1i} ≤ Σ_{i=1}^n w_i ||B_0(x_i − µ)||^2.    (A.2)

Because µ is the mean of the x_i with weights w_i, we have

    Σ_{i=1}^n w_i ||B_0(x_i − µ)||^2 ≤ Σ_{i=1}^n w_i r_{0i}.    (A.3)

The concavity of χ implies that χ(t) − χ(s) ≤ W(s)(t − s), and hence

    Σ_{i=1}^n {χ(r_{1i}) − χ(r_{0i})} ≤ Σ_{i=1}^n w_i (r_{1i} − ||B_0(x_i − µ)||^2) + Σ_{i=1}^n w_i (||B_0(x_i − µ)||^2 − r_{0i}),

and because both terms are not greater than 0 by (A.2) and (A.3), this proves (A.1).

Theorem A.2. σ decreases at each iteration of Algorithm 2.

The proof is similar to that of Theorem A.1, taking into account that

    Σ_{i=1}^h r_(i) ≤ Σ_{i=1}^h r_i.

A.2 Breakdown Point

We first deal with the finite-sample breakdown point of the scale. Roughly speaking, it is desirable that in the presence of contamination, σ remains bounded away from infinity, and also away from 0 if the "good data" have no collinearity. Let X = {x_1, ..., x_n}. For k ∈ {0, 1, ..., n}, call X_k the set of all datasets of size n having at least n − k elements in common with X. For Y = {y_1, ..., y_n}, call σ(B, a, Y) the scale [either (4) or (13)] corresponding to the vector with elements ||By_i − a||^2, i = 1, ..., n. We define the breakdown point of σ at X as k*/n, where k* is the maximum of the k ≥ 0 such that

    ∀(B, a) ∈ B_{q,m}:  0 < inf{σ(B, a, Y) : Y ∈ X_k}   and   sup{σ(B, a, Y) : Y ∈ X_k} < ∞.

Define X to be in q-general position if no (m − q)-dimensional linear manifold {Bx = a} [where B is a (q × m)-matrix of rank q] contains more than m − q + 1 data points.

This is a reasonable assumption if the data are generated by a continuous distribution. Then standard reasoning shows that the breakdown point for (4) is maximized by taking δ as in (12), and that for (13) is maximized by taking h as in (14).

We now prove the assertion on δ. We first prove that

    k* = min{nδ, n(1 − δ) − m + q − 1}.    (A.4)

Assume that k ≤ k*. Let B and a be fixed. Then we can make x_i, i = 1, ..., k, tend to infinity in such a way that r_i(B, a) → ∞, i = 1, ..., k, and because χ(∞) = 1, (4) implies that k ≤ nδ. Assume now that Bx_i = a for i = 1, ..., m − q + 1. For i = m − q + 2, ..., m − q + k + 1, modify x_i so that also Bx_i = a. Then there are m − q + 1 + k null residuals, and hence (4) implies that nδ ≤ n − (m − q + k + 1). This shows that k* is not larger than the right side of (A.4). The reverse inequality is proved likewise. The maximum of (A.4) as a function of δ occurs when both elements of the minimum coincide, that is, when nδ = n(1 − δ) − m + q − 1, which gives δ = (n − m + q − 1)/(2n), that is, (12).

The assertion on h is proved in the same manner.

More generally, it would be useful to define the breakdown point for the estimate (2) itself. To this end, define the "prediction bias" of B as

    pbias(B) = u_q^pred(B) − u_q^opt,

where u_q^pred and u_q^opt are defined in (22) and (15). The breakdown point of B may be defined as the maximum contamination rate for which the bias is bounded away from sup_B pbias(B). But the calculations seem intractable, even for q = 1.

[Received December 2002. Revised July 2004.]

REFERENCES

Bay, S. D. (1999), The UCI KDD Archive, University of California, Dept. of Information and Computer Science, available at http://kdd.ics.uci.edu.
Boente, G. (1983), "Robust Methods for Principal Components," unpublished doctoral thesis, University of Buenos Aires [in Spanish].
Boente, G. (1987), "Asymptotic Theory for Robust Principal Components," Journal of Multivariate Analysis, 21, 67-78.
Boente, G., and Fraiman, R. (1999), Discussion of "Robust Principal Components for Functional Data," by Locantore et al., Test, 8, 28-35.
Boente, G., and Orellana, L. (2000), "A Robust Approach to Common Principal Components," working paper, University of Buenos Aires.
Brown, M. (1982), "Robust Line Estimation With Errors in Both Variables," Journal of the American Statistical Association, 77, 71-79.
Campbell, N. A. (1980), "Robust Procedures in Multivariate Analysis I: Robust Covariance Estimation," Applied Statistics, 29, 231-237.
Carroll, R., and Gallo, P. (1982), "Some Aspects of Robustness in the Functional Errors-in-Variables Model," Communications in Statistics, Part A, Theory and Methods, 11, 2573-2585.
Croux, C., and Haesbroeck, G. (2000), "Principal Component Analysis Based on Robust Estimators of the Covariance or Correlation Matrix: Influence Functions and Efficiencies," Biometrika, 87, 603-618.
Croux, C., and Laine, B. (2003), "Optimal Subspace Estimation Based on Trimmed Square Loss," unpublished manuscript.
Croux, C., and Ruiz-Gazen, A. (1996), "A Fast Algorithm for Robust Principal Components Based on Projection Pursuit," in Compstat: Proceedings in Computational Statistics, Heidelberg: Physica-Verlag, pp. 211-216.
Croux, C., and Ruiz-Gazen, A. (2000), "High Breakdown Estimators for Principal Components: The Projection-Pursuit Approach Revisited," technical report, available at http://www.econ.kuleuven.ac.be.
Davies, L. (1987), "Asymptotic Behavior of S-Estimators of Multivariate Location Estimators and Dispersion Matrices," The Annals of Statistics, 15, 1269-1292.
Devlin, S. J., Gnanadesikan, R., and Kettenring, J. R. (1981), "Robust Estimation of Dispersion Matrices and Principal Components," Journal of the American Statistical Association, 76, 354-362.
Filzmoser, P. (1999), "Robust Principal Component and Factor Analysis in the Geostatistical Treatment of Environmental Data," Environmetrics, 10, 363-375.
Fuller, W. A. (1987), Measurement Error Models, New York: Wiley.
Gather, U., Hilker, T., and Becker, C. (1998), "Robust Sliced Inverse Regression Procedures," Technical Report 22/98, University of Dortmund.
Gnanadesikan, R., and Kettenring, J. R. (1972), "Robust Estimates, Residuals, and Outlier Detection With Multiresponse Data," Biometrics, 28, 81-124.
Hubert, M., Rousseeuw, P. J., and Verboven, S. (2003), "Robust PCA for High-Dimensional Data," in Developments in Robust Statistics, eds. R. Dutter, P. Filzmoser, U. Gather, and P. J. Rousseeuw, Heidelberg: Physica-Verlag, pp. 169-179.
Li, G., and Chen, Z. (1985), "Projection-Pursuit Approach to Robust Dispersion Matrices and Principal Components: Primary Theory and Monte Carlo," Journal of the American Statistical Association, 80, 759-766.
Locantore, N., Marron, J. S., Simpson, D. G., Tripoli, N., Zhang, J. T., and Cohen, K. L. (1999), "Robust Principal Components for Functional Data," Test, 8, 1-28.
Maronna, R. A. (1976), "Robust M-Estimates of Multivariate Location and Scatter," The Annals of Statistics, 4, 51-56.
Maronna, R. A., and Zamar, R. H. (2002), "Robust Estimation of Location and Dispersion for High-Dimensional Datasets," Technometrics, 44, 307-317.
Naga, R., and Antille, G. (1990), "Stability of Robust and Non-Robust Principal Component Analysis," Computational Statistics & Data Analysis, 10, 169-174.
Rousseeuw, P. J. (1985), "Multivariate Estimation With High Breakdown Point," in Mathematical Statistics and Applications, Vol. B, eds. W. Grossmann, G. Pflug, I. Vincze, and W. Wertz, Dordrecht: Reidel, pp. 283-297.
Rousseeuw, P. J., and van Driessen, K. (1999), "A Fast Algorithm for the Minimum Covariance Determinant Estimator," Technometrics, 41, 212-223.
Sigillito, V. G., Wing, S. P., Hutton, L. V., and Baker, K. B. (1989), "Classification of Radar Returns From the Ionosphere Using Neural Networks," Johns Hopkins APL Technical Digest, 10, 262-266.
Xie, Y., Wang, J., Liang, Y., Sun, L., Song, X., and Yu, R. (1993), "Robust Principal Components Analysis by Projection Pursuit," Journal of Chemometrics, 7, 527-541.
Zamar, R. H. (1989), "Robust Estimation in Errors-in-Variables Models," Biometrika, 76, 149-160.
Zamar, R. H. (1992), "Bias-Robust Estimation in Orthogonal Regression," The Annals of Statistics, 20, 1875-1888.
