Principal Components and Orthogonal
Regression Based on Robust Scales
Ricardo MARONNA
Mathematics Department
University of La Plata
C.C. 172, La Plata 1900, Argentina
(rmaronna@mail.retina.ar)
Both principal components analysis (PCA) and orthogonal regression deal with finding a p-dimensional linear manifold minimizing a scale of the orthogonal distances of the m-dimensional data points to the manifold. The main conceptual difference is that in PCA, p is estimated from the data to attain a small proportion of unexplained variability, whereas in orthogonal regression p equals m − 1. The two main approaches to robust PCA use the eigenvectors of a robust covariance matrix or search for the projections that maximize or minimize a robust (univariate) dispersion measure. This article is more akin to the second approach. But rather than finding the components one by one, we directly undertake the problem of finding, for a given p, a p-dimensional linear manifold minimizing a robust scale of the orthogonal distances of the data points to the manifold. The scale may be either a smooth M-scale or a "trimmed" scale. An iterative algorithm is developed and shown to converge to a local minimum. A strategy based on random search is used to approximate a global minimum. The procedure is shown to be faster than other high-breakdown-point competitors, especially for large m. The case p = m − 1 yields orthogonal regression. For PCA, a computationally efficient method to choose p is given. Comparisons based on both simulated and real data show that the proposed procedure is more robust than its competitors.
In this spirit, an algorithm is developed for finding a p-dimensional linear manifold that "optimally fits" a dataset in the sense of minimizing a smooth robust scale of the orthogonal distances. The scale may be either an M-scale or a "trimmed squares" scale. The basic iterative algorithm finds local minima, and a random search is performed to approximate a global minimum. Its implementation is fast, even for very high dimensionality. For p = m − 1, we have orthogonal regression. For general PCA, a method for choosing an adequate p is given.

The next section describes the proposed procedure. Section 3 contains a more detailed review of robust PCA estimates. Section 4 compares the computing times of the different estimates. Section 5 compares the performance of the estimates through a simulation study. Section 6 deals with the analysis of a real dataset. Section 7 treats the application of the procedure to orthogonal regression. An Appendix provides proofs.

… of (2), we have to minimize σ(r(B, a)) + α(b'b − 1), where α is a Lagrange multiplier. Differentiating (4) with ri = ri(B, a) yields the derivatives of σ with respect to B and a, and a straightforward calculation yields

$$\left[\sum_{i=1}^{n} w_i (x_i - \mu)(x_i - \mu)'\right] b = \lambda b \qquad (5)$$

for some scalar λ, where

$$w_i = W\!\left(\frac{r_i}{\sigma}\right) \qquad (6)$$

and

$$\mu = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}, \qquad (7)$$

and

$$a = b'\mu. \qquad (8)$$
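To make the fixed point concrete, here is a minimal numpy sketch of the reweighting step suggested by (5)-(8): reweight, recenter, and refit the q directions of smallest weighted scatter. The function name, the weight function W, and the handling of σ are illustrative assumptions; the paper's full Algorithm 1 is not reproduced in this excerpt.

```python
import numpy as np

def iteration_step(X, B, a, sigma, q, W):
    """One reweighting step in the spirit of (5)-(8).

    X: (n, m) data; B: (q, m) with orthonormal rows; a: (q,) offsets;
    W: assumed weight function (derivative of chi, up to a constant).
    """
    r = np.sum((X @ B.T - a) ** 2, axis=1)       # squared orthogonal distances r_i
    w = W(r / sigma)                             # weights w_i = W(r_i / sigma), cf. (6)
    mu = (w[:, None] * X).sum(axis=0) / w.sum()  # weighted mean, cf. (7)
    Xc = X - mu
    C = (w[:, None] * Xc).T @ Xc / w.sum()       # weighted covariance matrix C
    evals, evecs = np.linalg.eigh(C)             # eigenvalues in ascending order
    B_new = evecs[:, :q].T                       # eigenvectors of the q smallest, cf. (5)
    a_new = B_new @ mu                           # a = B mu, cf. (8)
    return B_new, a_new
```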
… initial a is available (see Sec. 2.3), then the algorithm is run with N1 = 0, overriding step 2(a). Note that the resulting B corresponds to the principal components of C, which is a weighted covariance matrix with weights wi.

In the experiments in this article, χ was chosen as the bisquare function (recall that r represents the squared distances),

$$\chi(r) = \min\{1,\ 1 - (1 - r)^3\}, \qquad (11)$$

which is concave. Intuitively, the constant δ in (4) should be near 1/2; it was chosen as

$$\delta = \frac{n - m + q - 1}{2n}, \qquad (12)$$

based on breakdown considerations that are explained in Section A.2.
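For concreteness, here is a sketch of computing the M-scale σ of (4) with the bisquare χ of (11) and δ of (12). The solver is not specified in this excerpt, so the bisection below is an assumption.

```python
import numpy as np

def chi_bisquare(r):
    # chi(r) = min{1, 1 - (1 - r)^3}; r is a squared (nonnegative) distance, cf. (11)
    return np.where(r >= 1.0, 1.0, 1.0 - (1.0 - r) ** 3)

def m_scale(r, delta, tol=1e-8):
    """Solve (1/n) sum_i chi(r_i / sigma) = delta for sigma. The left side
    decreases in sigma, so bisection applies (solver choice is an assumption)."""
    lo, hi = 1e-12, max(float(r.max()), 1e-12)
    while np.mean(chi_bisquare(r / hi)) > delta:
        hi *= 2.0                                # enlarge until the mean drops below delta
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if np.mean(chi_bisquare(r / mid)) > delta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# delta as in (12): delta = (n - m + q - 1) / (2 * n)
```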
It may seem that the same approach could be used to find directions of maximum scale, but this does not happen; the algorithm oscillates wildly, and the proofs in Section A.1 cease to hold. When m ≥ n, although the algorithm does work, it is more efficient to project the data on an (n − 1)-dimensional hyperplane.

2.2 Minimizing an L-Scale

Now we consider the problem (2) when σ is an L-scale, that is, one based on order statistics, namely the "trimmed sum of squares,"

$$\sigma(\mathbf{r}) = \sum_{i=1}^{h} r_{(i)}, \qquad (13)$$

where r(1) ≤ ··· ≤ r(n) are the ordered residuals and h < n. The proposed algorithm is inspired by the "concentration step" used by Rousseeuw and van Driessen (1999) for the minimization of a trimmed scale of Mahalanobis distances, which is the basis for their fast version of the minimum covariance determinant (MCD) multivariate estimate. The algorithm is similar to the one given in Section 2.1, but at each step, µ and C are the mean vector and covariance matrix of the observations with the h smallest residuals.
Algorithm 2. Proceed as in Algorithm 1, but at step 2(e), compute the weights as

$$w_i = \begin{cases} 1 & \text{if } r_i \text{ is among the } h \text{ smallest } r\text{'s} \\ 0 & \text{otherwise.} \end{cases}$$

It is shown in Section A.1 that σ decreases at each step. Actually, because the problem is combinatorial, the convergence is exact and occurs very rapidly. Intuitively, h should be near n/2. It is chosen as

$$h = \frac{n + m - q + 2}{2}, \qquad (14)$$

based on breakdown considerations explained in Section A.2. Croux and Laine (2003) treated the asymptotic behavior of estimators of this type.
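The weight rule of Algorithm 2 is a hard-trimming step that can be transcribed directly; the helper name below is ours.

```python
import numpy as np

def trimming_weights(r, h):
    """w_i = 1 if r_i is among the h smallest residuals, 0 otherwise."""
    w = np.zeros_like(r, dtype=float)
    w[np.argsort(r)[:h]] = 1.0
    return w

# h as in (14): h = (n + m - q + 2) // 2
```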
2.3 Global Minimization

The strategy for finding the overall minimum of (2) is similar to the one used by Rousseeuw and van Driessen (1999). For each of Nran random initial B's, run the algorithm with a small number N1 and N2 of iterations, keep the Nkeep candidates with the smallest σ's, and then iterate them to convergence.

The Nran initial B's are random orthogonal matrices, each obtained by orthogonalizing a matrix of uniform random numbers. For each of them, the algorithm is run with parameters N1, N2, and tol, and both the resulting B and a are stored for the Nkeep lowest σ's. For each of these, the algorithm is run with parameters N1, N2, and tol, starting with the stored B and a (i.e., with N1 = 0).

Extensive experimenting led to the conclusion that Nran = 50, Nkeep = 10, N1 = 3, and N2 = 2 sufficed for all practical purposes (here tol is irrelevant), even for dimension m = 50. The final iterations were run with N2 = 10 and tol = .001.

Other plausible variants were considered but discarded after some experiments (a sketch of the retained random-search strategy follows this list):

• Rather than purely random B's, one could generate random subsamples of size m + 1 from X and take for B the eigenvectors corresponding to the smallest q eigenvalues of the subsample covariance matrix. But even with a large Nran, the results were poor in comparison with the proposed algorithm; the reasons for this remain unclear.
• Instead of centering the projected data Bxi through a, one could center the xi from the start by means of a robust orthogonally equivariant location vector and drop a, but the results here were also poor.
• It might seem harmless to include "cheap" estimates such as the covariance matrix and the "spherical PCA" estimate (see Sec. 3) as initial estimates. But although this sometimes yielded a smaller σ, it often led to "bad" local minima.
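A minimal sketch of the retained random-search strategy, assuming a routine run_algorithm(X, B0, a0, n1, n2, tol) returning (B, a, sigma); that signature is our illustration, not the paper's interface.

```python
import numpy as np

def random_orthogonal(m, q, rng):
    # Orthogonalize a matrix of uniform random numbers; keep q columns as rows of B.
    Q, _ = np.linalg.qr(rng.uniform(size=(m, m)))
    return Q[:, :q].T

def global_search(X, q, run_algorithm, Nran=50, Nkeep=10, seed=0):
    """Random search: many short runs, then full iterations on the best candidates."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    trials = [run_algorithm(X, random_orthogonal(m, q, rng), None, 3, 2, None)
              for _ in range(Nran)]
    trials.sort(key=lambda t: t[2])                # keep candidates with smallest sigma
    finals = [run_algorithm(X, B, a, 0, 10, 1e-3)  # iterate the best ones to convergence
              for B, a, _ in trials[:Nkeep]]
    return min(finals, key=lambda t: t[2])
```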
2.4 Choosing the Number of Components

The number p of components in PCA is usually chosen on the basis of the "proportion of unexplained variance" (although in some chemical applications, p is chosen so that the coordinates of Bxi − a have the size of the a priori known measurement error). The "optimal" proportion of unexplained variance for p = m − q components is defined as

$$u_q^{\mathrm{opt}} = \frac{\sum_{j=1}^{q} \lambda_j}{\sum_{j=1}^{m} \lambda_j}, \qquad (15)$$

where the λj's are the eigenvalues of the underlying covariance matrix in ascending order. For both the estimates based on a covariance matrix and those of the Li–Chen type, which yield estimated eigenvalues λ̂ = (λ̂1, ..., λ̂m), the observed proportion is

$$\hat{u}_q = u_q(\hat{\lambda}) = \frac{\sum_{j=1}^{q} \hat{\lambda}_j}{\sum_{j=1}^{m} \hat{\lambda}_j}, \qquad (16)$$

which is consistent for $u_q^{\mathrm{opt}}$ if the underlying distribution is elliptical.
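The observed proportion (16) is a one-line computation from estimated eigenvalues (helper name ours):

```python
import numpy as np

def u_q_from_eigenvalues(lam_hat, q):
    """Observed proportion of unexplained variance (16); eigenvalues sorted ascending."""
    lam = np.sort(np.asarray(lam_hat, dtype=float))
    return lam[:q].sum() / lam.sum()
```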
For the proposed estimates (2), the observed proportion is

$$\hat{u}_q = u_q(B, a) = \frac{\sigma_q}{\sigma_0}, \qquad (17)$$

where σq is the optimal error scale for p = m − q components and σ0 is the same for p = m. (In this case we take B = I, because all B's are equivalent.) Croux and Laine (2003) showed that for elliptical distributions, ûq is consistent in the case of an L-scale, and asserted (personal communication) that a similar approach may be used to show consistency in the case of an M-scale.

Assume now that we are willing to accept a maximum number p0 of predictors and a maximum allowed unexplained proportion of variance umax. Let q0 = m − p0. Then our goal is to find the largest q ≥ q0 such that ûq ≤ umax. This goal could be attained simply by running the algorithm for q = q0, q0 + 1, and so on, but this may be too time-consuming. We describe a more economical procedure. … For each q, we thus obtain σ̂q and hence ûq. Although σ̂q is larger than the value obtained by running the algorithm on the complete data, it actually turns out that the difference is negligible for practical purposes. The procedure is easily automated, increasing q until ûq > umax. The method is demonstrated with a real dataset in Section 6.

Note that a very simple method would be to choose q on the basis of the eigenvalues of the matrix C in (9) computed for q = q0. But this approach is unreliable, because a point may be an outlier for some q > q0 but not for q = q0.
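The automated choice just described can be sketched as follows, assuming a callable u_hat(q) that fits m − q components and returns the observed proportion (17); the names and stopping details are our assumptions.

```python
def choose_q(u_hat, q0, u_max, m):
    """Largest q >= q0 with u_hat(q) <= u_max, increasing q until u_hat(q) > u_max."""
    if u_hat(q0) > u_max:
        return None                    # even q0 leaves too much unexplained variance
    q = q0
    while q + 1 < m and u_hat(q + 1) <= u_max:
        q += 1
    return q
```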
3. REVIEW OF SOME ROBUST PRINCIPAL COMPONENTS ANALYSIS ESTIMATES

In this section we review some estimates apt for PCA with high-dimensional data, which we use for comparison:
… pursuit with MAD scale and with M-estimate of scale, resp.). From the computational standpoint, one advantage of this approach for large m is that it is not necessary to compute all m components, but only a number p yielding a desired "proportion of explained variance," which can be estimated as (λ̂m + ··· + λ̂m−p+1)/v̂, where v̂ is a measure of total variability that can be roughly estimated as the sum of the squared scales of the coordinates. On the other hand, it requires O(pmn²) operations [because each projection involves O(nm) operations, and there are n of them], which may be a drawback if n is large. One could also consider using this approach "upward" to find directions of minimum scale, but this does not work; the reason is that although directions of high variability may be expected to correspond roughly to some of the yi's, the same does not hold for those of lowest variability (e.g., if the data are concentrated on a hyperplane).

Remark. Each of these procedures yields a location vector µ̂ and a set of directions b1, ..., bm (ordered from smallest to largest variability). Given the number of principal components p, call B and D the matrices whose rows are b1, ..., bq and bq+1, ..., bm, respectively. Then the p-dimensional manifold of best fit has equation Bx = a with a = Bµ̂. A p-dimensional representation of the data is given by zi = D(xi − µ̂), from which the data can be approximately reconstructed as x̂i = D'zi + µ̂.
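A sketch of the representation and reconstruction in the Remark (variable names ours):

```python
import numpy as np

def represent_and_reconstruct(X, mu, b_rows, p):
    """z_i = D (x_i - mu) and x_hat_i = D' z_i + mu, where the rows of D are the
    p directions of largest variability (the last p rows of b_rows)."""
    D = b_rows[-p:]                  # b_{q+1}, ..., b_m
    Z = (X - mu) @ D.T               # p-dimensional representation
    X_hat = Z @ D + mu               # approximate reconstruction
    return Z, X_hat
```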
4. RUNNING TIMES

The estimates were implemented in Gauss (version 5.0) and run on a PC with a 550-MHz Intel Pentium processor and 128 MB RAM. For each combination of n and m, five random samples with 20% contamination were generated as described in Section 5, and the running times were averaged. The estimates defined in Sections 2.1 and 2.2 are denoted by S–M and S–L. For their computation and for both variants of PP, we used q = [.8n]. The running times of S–M and S–L for a given m do not seem to depend much on q, but of course those of PPMD and PPME do. SPC was not included, because it is at least 30 times faster than the other estimates.

It is seen that FMCD is always the slowest and PPMD the fastest in most cases (although this would change with larger n). S–L and S–M are competitive with the other estimates.

5. SIMULATION

To compare the performances of the different estimates on simulated data, we need a situation where it is clear "what is being estimated." This is chosen as m-variate normal data with covariance matrix Σ = diag(λ1, ..., λm) with λ1 < ··· < λm. For a given number p of desired components and contamination proportion ε, the contaminated data are a mixture (1 − ε)Nm(0, Σ) + εNm(kx0, ν²I). Here ν is a small positive scalar; x0 belongs to the subspace of the smallest eigenvalues, where the effect of contamination will presumably be most harmful; and k ranges over a grid of values in search of the least favorable cases. Proceed as follows to generate a sample X:

1. Generate xi, i = 1, ..., n, as Nm(0, I).
2. Let x0 have coordinates x0j = 1 for j ≤ q = m − p and 0 otherwise. For i ≤ [nε], transform xi to
   xi ← νxi + kx0. (21)
   (In all cases, we took ν = .5.)
3. Finally, for all i and j, transform xij ← √λj xij.
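Steps 1-3 can be transcribed directly; the square root in step 3 follows from Σ = diag(λ1, ..., λm), and the function name and random-generator handling are our assumptions.

```python
import numpy as np

def generate_sample(n, m, p, eps, k, lam, nu=0.5, seed=0):
    """Contaminated sample following steps 1-3 and (21)."""
    rng = np.random.default_rng(seed)
    q = m - p
    X = rng.standard_normal((n, m))           # step 1: N_m(0, I)
    x0 = np.zeros(m)
    x0[:q] = 1.0                              # step 2: x0 in the small-eigenvalue subspace
    n_out = int(n * eps)
    X[:n_out] = nu * X[:n_out] + k * x0       # (21), applied to i <= [n eps]
    return X * np.sqrt(np.asarray(lam))       # step 3: x_ij <- sqrt(lambda_j) x_ij
```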
To assess the performance of an estimate B = B(X), a predictive approach is used. Let x be an Nm(0, Σ) vector independent of X. Then, conditionally on X,

$$E\|Bx\|^2 = \mathrm{trace}(B \Sigma B'),$$

and the "prediction proportion of unexplained variance" is

$$u_q^{\mathrm{pred}}(B) = \frac{\mathrm{trace}(B \Sigma B')}{\mathrm{trace}(\Sigma)}. \qquad (22)$$

This measure of prediction error is to be compared with the "optimal" proportion (15). This criterion seems more realistic than treating the estimation of individual eigenvalues and eigenvectors.

To choose the number of components, it is necessary to estimate $u_q^{\mathrm{pred}}$. This is done with the observed proportion of variance û defined in (16) or (17). Two performance measures are defined for each estimate: a measure of relative prediction error, epred, and a measure eest of the relative estimation error of û. We have to take into account that small values of $u_q^{\mathrm{pred}}$ are always better than large
ones, but both too small and too large values of û are undesirable, because they would lead to underestimating or overestimating the number of components. Hence we define

$$e^{\mathrm{pred}} = \frac{u_q^{\mathrm{pred}}}{u_q^{\mathrm{opt}}} - 1 \quad \text{and} \quad e^{\mathrm{est}} = \max\!\left(\frac{\hat{u}}{u_q^{\mathrm{pred}}},\ \frac{u_q^{\mathrm{pred}}}{\hat{u}}\right) - 1. \qquad (23)$$
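The two measures in (23), written out (function name ours):

```python
def error_measures(u_pred, u_opt, u_hat):
    """e_pred and e_est as in (23)."""
    e_pred = u_pred / u_opt - 1.0
    e_est = max(u_hat / u_pred, u_pred / u_hat) - 1.0
    return e_pred, e_est
```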
Figure 3. Ionospheric Data: Observations 21, 78, 174, and 172 as Functions of Coordinate Number (+), and Their Reconstructions (—) Using S–M With p = 4 Components.

Figure 5. Ionospheric Data: The Same Observations (+) and Their Reconstructions (—) Using PPME With p = 4 Components.
the residuals and thus correspond to (2) with q = 1. The numerical procedure that Zamar proposed finds an initial b through a grid search and then performs an iterative gradient search. The grid search makes the algorithm suited only to very small m.

Zamar (1992) studied the maximum asymptotic bias and the breakdown point of M estimates under normality, and found the "optimal" M estimate in the sense of minimizing the maximum bias under a given contamination rate. Define the error measure for an estimate b̂ as

$$\mathrm{err}(\hat{b}) = 1 - |\hat{b}'\beta|. \qquad (24)$$

Then the asymptotic bias is defined as the asymptotic value of err(b̂), and the asymptotic breakdown point ε* is defined as the maximum contamination rate such that err(b̂) is bounded away from 1. If the zi's are multivariate normal, then xi is also normal. Let λ1 < λ2 < ··· < λm be the eigenvalues of its covariance matrix, and let e1, ..., em be the respective eigenvectors. Then ε* is shown to depend in a complicated way on the "signal-to-noise ratio" √(λ2/λ1 − 1).

Zamar (1992) showed that from a bias standpoint, the least favorable configurations for point-mass contamination are, as can be expected, those with the contamination located on the subspace spanned by e1 and e2. We assume without loss of generality that e1, ..., em are the canonical basis vectors and β = (1, 0, ..., 0). To compare the different estimates, a simulation was run with m = 5 and 10, n = 10m, and eigenvalues λ1 = 1 and λj = 10 + 5(j − 2) for j = 2, ..., m. The signal-to-noise ratio is 3, and according to table 2 of Zamar (1992), ε* is about .3. The contaminated observations are normal, generated as in Section 5. Write x0 = (ks kl, kl, 0, ..., 0). Here kl and ks play roles similar to the leverage and slope of the contamination in ordinary regression. kl was assigned the values 1, 2, and 5, and ks ranged between .5 and 50. Because the Monte Carlo distribution of err(b̂) turned out to be very heavy tailed, the estimates were evaluated through the median of err(b̂) rather than its average. Table 6 shows, for each kl, the maxima of the criterion over ks, multiplied by 100 to improve readability.

For ε = 0, OGK and SPC are the most efficient estimates; the others are rather inefficient. But for ε = .2, SPC is as bad as Cov, and OGK is not much better. Both versions of PP fail, as would be expected from the reasons given at the end of Section 3. S–L and S–M clearly show the best overall behavior.
Table 6. Orthogonal Regression: Maxima Over ks of Monte Carlo Median Errors (×100)
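The error measure (24) and its Monte Carlo summary by medians can be sketched as follows (names ours):

```python
import numpy as np

def err(b_hat, beta):
    """err(b) = 1 - |b' beta|, cf. (24); both arguments are unit vectors."""
    return 1.0 - abs(float(np.dot(b_hat, beta)))

# e.g., the median over Monte Carlo replications, as used for Table 6:
# np.median([err(b, beta) for b in fitted_directions])
```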
APPENDIX: PROOFS AND BREAKDOWN POINT

A.1 Proofs

We prove that the scale descends at each iteration of Algorithms 1 and 2.

Theorem A.1. Let χ be concave. Then σ decreases at each iteration of Algorithm 1.

Let

$$r_{ji} = \|B_j x_i - a_j\|^2, \qquad i = 1, \ldots, n,$$

and define σj as the solution of

$$\frac{1}{n} \sum_{i=1}^{n} \chi\!\left(\frac{r_{ji}}{\sigma_j}\right) = \delta.$$

… The concavity of χ implies that χ(t) − χ(s) ≤ W(s)(t − s), and hence

$$\sum_{i=1}^{n} \{\chi(r_{1i}) - \chi(r_{0i})\} \le \sum_{i=1}^{n} w_i \left( r_{1i} - \|B_0(x_i - \mu)\|^2 \right),$$

and because both terms are less than 0 by (A.2) and (A.3), this proves (A.1).

Theorem A.2. σ decreases at each iteration of Algorithm 2.

The proof is similar to that of Theorem A.1, taking into account that …
A.2 Breakdown Point

… a reasonable assumption if the data are generated by a continuous distribution. Then standard reasoning shows that the breakdown point for (4) is maximized by taking δ as in (12), and that for (13) is maximized by taking h as in (14).

We now prove the assertion on δ. We first prove that

$$k^* = \min\{n\delta,\ n(1 - \delta) - m + q - 1\}. \qquad (A.4)$$

Assume that k ≤ k*. Let B and a be fixed. Then we can make xi, i = 1, ..., k, tend to infinity in such a way that ri(B, a) → ∞, i = 1, ..., k, and because χ(∞) = 1, (4) implies that k ≤ nδ. Assume now that Bxi = a for i = 1, ..., m − q + 1. For i = m − q + 2, ..., m − q + k + 1, modify xi so that also Bxi = a. Then there are m − q + 1 + k null residuals, and hence (4) implies that nδ ≤ n − (m − q + k + 1). This shows that k* is not larger than the right side of (A.4). The reverse inequality is proved likewise. The maximum of (A.4) as a function of δ occurs when both elements of the minimum coincide, and this gives (12). The assertion on h is proved in the same manner.
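A small numeric check of this argument (helper name ours): (A.4) as a function of δ, whose maximum is attained where the two terms coincide, which gives (12).

```python
def k_star(delta, n, m, q):
    """Breakdown count (A.4): min{n delta, n(1 - delta) - m + q - 1}."""
    return min(n * delta, n * (1 - delta) - m + q - 1)

# The two terms coincide at delta = (n - m + q - 1) / (2 * n), i.e., (12):
# n, m, q = 100, 10, 2; delta = (n - m + q - 1) / (2 * n); print(k_star(delta, n, m, q))
```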
More generally, it would be useful to define the breakdown point for the estimate (2) itself. To this end, define the "prediction bias" of B as

$$\mathrm{pbias}(B) = u_q^{\mathrm{pred}}(B) - u_q^{\mathrm{opt}},$$

where $u_q^{\mathrm{pred}}$ and $u_q^{\mathrm{opt}}$ are defined in (22) and (15). The breakdown point of B may be defined as the maximum contamination rate for which the bias is bounded away from sup_B pbias(B). But the calculations seem intractable, even for q = 1.

[Received December 2002. Revised July 2004.]

REFERENCES

Bay, S. D. (1999), The UCI KDD Archive, University of California, Dept. of Information and Computer Science, available at http://kdd.ics.uci.edu.
Boente, G. (1983), "Robust Methods for Principal Components," unpublished doctoral thesis, University of Buenos Aires [in Spanish].
Boente, G. (1987), "Asymptotic Theory for Robust Principal Components," Journal of Multivariate Analysis, 21, 67–78.
Boente, G., and Fraiman, R. (1999), Discussion of "Robust Principal Components for Functional Data," by Locantore et al., Test, 8, 28–35.
Boente, G., and Orellana, L. (2000), "A Robust Approach to Common Principal Components," working paper, University of Buenos Aires.
Brown, M. (1982), "Robust Line Estimation With Errors in Both Variables," Journal of the American Statistical Association, 77, 71–79.
Campbell, N. A. (1980), "Robust Procedures in Multivariate Analysis I: Robust Covariance Estimation," Applied Statistics, 29, 231–237.
Carroll, R., and Gallo, P. (1982), "Some Aspects of Robustness in the Functional Errors-in-Variables Model," Communications in Statistics, Part A, Theory and Methods, 11, 2573–2585.
Croux, C., and Haesbroeck, G. (2000), "Principal Component Analysis Based on Robust Estimators of the Covariance or Correlation Matrix: Influence Functions and Efficiencies," Biometrika, 87, 603–618.
Croux, C., and Laine, B. (2003), "Optimal Subspace Estimation Based on Trimmed Square Loss," unpublished manuscript.
Croux, C., and Ruiz-Gazen, A. (1996), "A Fast Algorithm for Robust Principal Components Based on Projection Pursuit," in Compstat: Proceedings in Computational Statistics, Heidelberg: Physica-Verlag, pp. 211–216.
Croux, C., and Ruiz-Gazen, A. (2000), "High Breakdown Estimators for Principal Components: The Projection-Pursuit Approach Revisited," technical report, available at http://www.econ.kuleuven.ac.be.
Davies, L. (1987), "Asymptotic Behavior of S-Estimators of Multivariate Location Estimators and Dispersion Matrices," The Annals of Statistics, 15, 1269–1292.
Devlin, S. J., Gnanadesikan, R., and Kettenring, J. R. (1981), "Robust Estimation of Dispersion Matrices and Principal Components," Journal of the American Statistical Association, 76, 354–362.
Filzmoser, P. (1999), "Robust Principal Component and Factor Analysis in the Geostatistical Treatment of Environmental Data," Environmetrics, 10, 363–375.
Fuller, W. A. (1987), Measurement Error Models, New York: Wiley.
Gather, U., Hilker, T., and Becker, C. (1998), "Robust Sliced Inverse Regression Procedures," Technical Report 22/98, University of Dortmund.
Gnanadesikan, R., and Kettenring, J. R. (1972), "Robust Estimates, Residuals, and Outlier Detection With Multiresponse Data," Biometrics, 28, 81–124.
Hubert, M., Rousseeuw, P. J., and Verboven, S. (2003), "Robust PCA for High-Dimensional Data," in Developments in Robust Statistics, eds. R. Dutter, P. Filzmoser, U. Gather, and P. J. Rousseeuw, Heidelberg: Physica-Verlag, pp. 169–179.
Li, G., and Chen, Z. (1985), "Projection-Pursuit Approach to Robust Dispersion Matrices and Principal Components: Primary Theory and Monte Carlo," Journal of the American Statistical Association, 80, 759–766.
Locantore, N., Marron, J. S., Simpson, D. G., Tripoli, N., Zhang, J. T., and Cohen, K. L. (1999), "Robust Principal Components for Functional Data," Test, 8, 1–28.
Maronna, R. A. (1976), "Robust M-Estimates of Multivariate Location and Scatter," The Annals of Statistics, 4, 51–56.
Maronna, R. A., and Zamar, R. H. (2002), "Robust Estimation of Location and Dispersion for High-Dimensional Datasets," Technometrics, 44, 307–317.
Naga, R., and Antille, G. (1990), "Stability of Robust and Non-Robust Principal Component Analysis," Computational Statistics & Data Analysis, 10, 169–174.
Rousseeuw, P. J. (1985), "Multivariate Estimation With High Breakdown Point," in Mathematical Statistics and Applications, Vol. B, eds. W. Grossmann, G. Pflug, I. Vincze, and W. Wertz, Dordrecht: Reidel, pp. 283–297.
Rousseeuw, P. J., and van Driessen, K. (1999), "A Fast Algorithm for the Minimum Covariance Determinant Estimator," Technometrics, 41, 212–223.
Sigillito, V. G., Wing, S. P., Hutton, L. V., and Baker, K. B. (1989), "Classification of Radar Returns From the Ionosphere Using Neural Networks," Johns Hopkins APL Technical Digest, 10, 262–266.
Xie, Y., Wang, J., Liang, Y., Sun, L., Song, X., and Yu, R. (1993), "Robust Principal Components Analysis by Projection Pursuit," Journal of Chemometrics, 7, 527–541.
Zamar, R. H. (1989), "Robust Estimation in Errors-in-Variables Models," Biometrika, 76, 149–160.
Zamar, R. H. (1992), "Bias-Robust Estimation in Orthogonal Regression," The Annals of Statistics, 20, 1875–1888.