4 High-breakdown estimators of multivariate location and scatter
4.1 Introduction
Peter Rousseeuw
KU Leuven, Department of Mathematics, Celestijnenlaan 200B, BE-3001 Heverlee, e-mail:
peter.rousseeuw@wis.kuleuven.be
Mia Hubert
KU Leuven, Department of Mathematics, Celestijnenlaan 200B, BE-3001 Heverlee, e-mail:
mia.hubert@wis.kuleuven.be
f(x) = h(d²(x, μ, Σ)) / √|Σ|    (4.1)

with d²(x, μ, Σ) = (x − μ)ᵀ Σ⁻¹ (x − μ) the squared statistical distance of x.
The matrix Σ is often called the scatter matrix. The multivariate normal distribution N(μ, Σ) is a specific example of (4.1) with h(t) = (2π)^(−p/2) e^(−t/2). Another example is the elliptical p-variate Student distribution with ν degrees of freedom (0 < ν < ∞), for which h(t) = c (t + ν)^(−(p+ν)/2) with c a constant. The case ν = 1 is called the multivariate Cauchy distribution.
Elliptical distributions have the following properties:
1. The contours of constant density are ellipsoids.
2. If the mean and variances of X exist, then E F [X] = μ and CovF [X] = cΣ with c
a constant. (For normal distributions c = 1.)
3. For any nonsingular matrix A and vector b the distribution of AX + b is also elliptical, with mean Aμ + b and scatter matrix AΣAᵀ.
4. The random variable X can be written as
X = AZ + μ (4.3)
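The stochastic representation (4.3) gives a direct recipe for simulating elliptical data: draw a spherically distributed Z and apply an affine transformation. A minimal sketch for the normal special case (the values of μ and A below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Target location and scatter (Sigma = A A^T); illustrative values only.
mu = np.array([1.0, -2.0])
A = np.array([[2.0, 0.0],
              [1.5, 1.0]])

# Spherical Z: a standard normal gives the multivariate normal special case.
Z = rng.standard_normal((100_000, 2))
X = Z @ A.T + mu          # X = A Z + mu, applied row-wise

# The sample mean and covariance should approach mu and Sigma = A A^T.
Sigma = A @ A.T
print(np.round(X.mean(axis=0), 2))
print(np.round(np.cov(X, rowvar=False), 2))
```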
Fig. 4.1: Scatterplot of the logarithm of body and brain weight of 28 animals.
x̄ = (1/n) ∑_{i=1}^{n} xᵢ   and   C(Xₙ) = (1/n) ∑_{i=1}^{n} (xᵢ − x̄)(xᵢ − x̄)ᵀ.
μ̂(AX + b) = Aμ̂(X) + b   and   Σ̂(AX + b) = AΣ̂(X)Aᵀ.    (4.4)
Affine equivariance implies that the estimator transforms well under any nonsingular reparametrization of the space of the xᵢ. The data might for instance be rotated, translated, or rescaled (for example through a change of the measurement units). Note that a transformation AX of the variables (X₁, ..., X_p) corresponds to the matrix product XₙAᵀ.
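The equivariance property (4.4) of the sample mean and covariance matrix can be verified numerically; a small sketch with an arbitrary, made-up nonsingular A and shift b:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))

# An arbitrary nonsingular A and shift b (hypothetical values).
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 0.5],
              [0.3, 0.0, 1.0]])
b = np.array([5.0, -1.0, 2.0])

Y = X @ A.T + b   # the transformed sample A x_i + b, row-wise

mean_Y = Y.mean(axis=0)
cov_Y = np.cov(Y, rowvar=False)

# Equivariance (4.4): mean(AX+b) = A mean(X) + b, cov(AX+b) = A cov(X) A^T.
assert np.allclose(mean_Y, A @ X.mean(axis=0) + b)
assert np.allclose(cov_Y, A @ np.cov(X, rowvar=False) @ A.T)
```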
At the normal model the sample mean and the sample covariance matrix are con-
sistent and asymptotically normal with maximal efficiency, but they lack robustness:
• Their influence function (Hampel et al., 1986) is unbounded. Let T(F) be the statistical functional mapping a distribution F to its mean μ_F = E_F[X] and let V(F) be the statistical functional mapping F to its covariance matrix Σ_F = Cov_F[X]. Then at any z ∈ Rᵖ

IF(z; T, F) = z − μ_F
IF(z; V, F) = (z − μ_F)(z − μ_F)ᵀ − Σ_F.
• The asymptotic breakdown value of the classical covariance matrix is zero too. Let us denote the eigenvalues of a p.s.d. matrix by λ₁ ≥ ... ≥ λ_p ≥ 0. Then we define the implosion breakdown value of a scatter functional V at F as

ε∗_imp(V, F) = inf{ε > 0 : inf_H λ_p(V(F_{ε,H})) = 0}
Any affine equivariant scatter estimator Σ̂ satisfies the sharp bound (Davies, 1987)

ε∗_n(Σ̂, Xₙ) ≤ ⌊(n − p + 1)/2⌋ / n    (4.7)

hence for affine equivariant estimators the asymptotic breakdown value is always at most 0.5. For recent discussions
about the breakdown value and equivariance see Davies and Gather (2005) and the
contribution by Müller, Chapter 5.
Example. For the animals data set mentioned before, the classical estimates are
x̄ = (3.77, 4.43)ᵀ and

S = ( 14.22   7.05 )
    (  7.05   5.76 )

which yields an estimated correlation of r = 7.05/√(14.22 · 5.76) = 0.78. To visualize
these estimates we can plot the resulting 97.5% tolerance ellipse in Figure 4.2. Its
boundary consists of points with constant Mahalanobis distance to the mean. In
general dimension p, the tolerance ellipsoid is defined as

{x ; MD(x) ≤ √χ²_{p,0.975}}    (4.8)

with MD(x) = d(x, x̄, S) following (4.2). We expect (for large n) that about 97.5%
of the observations belong to this ellipsoid. One could flag an observation as an
outlier if it does not belong to the classical tolerance ellipsoid, but in Figure 4.2 we
see that the three dinosaurs do not stand out relative to the ellipse. This is because
the outliers have attracted x̄x and, more importantly, have affected the matrix S in
such a way that the ellipse has been inflated in their direction.
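The flagging rule (4.8) and the masking effect just described can be sketched as follows. The data here are synthetic stand-ins (a clean correlated cluster plus a small, tight group of outliers, loosely mimicking the three dinosaurs; the numbers are made up), not the actual animals data:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)

# 25 'good' points plus 3 concentrated outliers (illustrative values only).
good = rng.multivariate_normal([3, 4], [[4, 3], [3, 4]], size=25)
bad = np.array([[9.0, -1.0], [9.5, -0.5], [10.0, -1.5]])
X = np.vstack([good, bad])

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)
Sinv = np.linalg.inv(S)

diff = X - xbar
md = np.sqrt(np.einsum('ij,jk,ik->i', diff, Sinv, diff))  # MD(x_i)
cutoff = np.sqrt(chi2.ppf(0.975, df=2))                   # sqrt(chi^2_{p,0.975})

flagged = np.where(md > cutoff)[0]
print(flagged)   # the planted outliers (indices 25-27) may be masked
```

Because the outliers attract x̄ and inflate S in their direction, some or all of them can fall inside the cutoff, exactly the masking seen in Figure 4.2.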
Fig. 4.2: The 97.5% tolerance ellipse of the classical estimates for the animals data set.
Ψ₁(x, θ) = W₁(d²)(x − μ),
Ψ₂(x, θ) = W₂(d²)(x − μ)(x − μ)ᵀ − Σ,

(1/n) ∑_{i=1}^{n} Ψ(xᵢ, θ̂) = 0.
Examples:
• If W1 (d 2 ) = W2 (d 2 ) = 1 we find the sample mean and sample covariance matrix.
• For an elliptical density (4.1), one finds that the MLE estimators μ̂ and Σ̂ minimize

n log|Σ̂| − 2 ∑_{i=1}^{n} log h(dᵢ²).
Some properties:
• Multivariate M-estimators are affine equivariant.
• For a monotone M-estimator equations (4.9) and (4.10) have a unique solution.
This solution can be found by a reweighting algorithm:
1. Start with initial values μ̂₀ and Σ̂₀, such as the coordinatewise median and the diagonal matrix with the squared coordinatewise MAD on the diagonal.
2. At iteration k, compute dᵢᵏ = d(xᵢ, μ̂ₖ, Σ̂ₖ) and update

μ̂ₖ₊₁ = ∑_{i} W₁((dᵢᵏ)²) xᵢ / ∑_{i} W₁((dᵢᵏ)²)   and   Σ̂ₖ₊₁ = (1/n) ∑_{i} W₂((dᵢᵏ)²)(xᵢ − μ̂ₖ₊₁)(xᵢ − μ̂ₖ₊₁)ᵀ.
For a monotone M-estimator this algorithm always converges to the unique solution, no matter the choice of the initial values. For a redescending M-estimator the algorithm can converge to a bad solution, so the initial values are more important in that case.
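The reweighting algorithm above can be sketched as follows. The Huber-type weight functions W₁ and W₂ are one illustrative monotone choice among many, not a recommendation from the text:

```python
import numpy as np

def m_estimate(X, W1, W2, tol=1e-8, max_iter=200):
    """Iteratively reweighted algorithm for a multivariate M-estimator.

    W1 and W2 map squared distances d^2 to weights, as in the
    estimating equations for location and scatter."""
    n, p = X.shape
    # Robust starting values: coordinatewise median and squared MADs.
    mu = np.median(X, axis=0)
    Sigma = np.diag(np.median(np.abs(X - mu), axis=0) ** 2 / 0.6745 ** 2)
    for _ in range(max_iter):
        diff = X - mu
        d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)
        w1, w2 = W1(d2), W2(d2)
        mu_new = (w1[:, None] * X).sum(axis=0) / w1.sum()
        diff = X - mu_new
        Sigma_new = (w2[:, None, None] *
                     np.einsum('ij,ik->ijk', diff, diff)).mean(axis=0)
        done = (np.abs(mu_new - mu).max() < tol and
                np.abs(Sigma_new - Sigma).max() < tol)
        mu, Sigma = mu_new, Sigma_new
        if done:
            break
    return mu, Sigma

# Huber-type monotone weights (an illustrative choice, k = 4 is arbitrary):
k = 4.0
W1 = lambda d2: np.minimum(1.0, k / np.sqrt(d2 + 1e-12))
W2 = lambda d2: np.minimum(1.0, k / (d2 + 1e-12))

rng = np.random.default_rng(3)
X = rng.multivariate_normal([0, 0], [[2, 1], [1, 2]], size=500)
mu_hat, Sigma_hat = m_estimate(X, W1, W2)
```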
• Under some regularity conditions on W 1 and W2 , multivariate M-estimators are
asymptotically normal.
• The influence function is bounded if √t W₁(t) and t W₂(t) are bounded.
• The asymptotic breakdown value of a monotone M-estimator satisfies

ε∗ ≤ 1/(p + 1).
Although monotone M-estimators attain the optimal value of 0.5 in the univariate
case, this is no longer true in higher dimensions.
This reveals the main weakness of M-estimators in high dimensions: monotone M-
estimators, while computationally attractive, have a rather low breakdown value.
Redescending M-estimators can have a larger breakdown value, but are impractical
to compute.
recommended that n > 5p. To obtain consistency at the normal distribution, we can use the consistency factor α / F_{χ²_{p+2}}(χ²_{p,α}) with α = h/n (see Croux and Haesbroeck (1999)). For small n we can multiply by an additional finite-sample correction factor given in Pison et al. (2002).
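The consistency factor α / F_{χ²_{p+2}}(χ²_{p,α}) is straightforward to compute, e.g. with scipy:

```python
import numpy as np
from scipy.stats import chi2

def mcd_consistency_factor(n, h, p):
    """Consistency factor alpha / F_{chi2_{p+2}}(chi2_{p,alpha}) with
    alpha = h/n, following Croux and Haesbroeck (1999)."""
    alpha = h / n
    q = chi2.ppf(alpha, df=p)            # the chi^2_{p,alpha} quantile
    return alpha / chi2.cdf(q, df=p + 2)

# Example: p = 2, n = 100, h = 75, i.e. alpha = 0.75.
c = mcd_consistency_factor(100, 75, 2)
print(round(c, 3))   # -> 1.859
```

The factor is larger than 1 (the raw MCD scatter is computed on the most concentrated half and therefore underestimates Σ) and tends to 1 as h approaches n.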
The parameter h controls the breakdown value. At samples in general position
ε ∗ = min(n − h + 1, h − p)/n. The maximal breakdown value (4.7) is achieved by
setting h = [(n + p + 1)/2]. The MCD estimator with h = [(n + p + 1)/2] is thus
very robust, but unfortunately suffers from a low efficiency at the normal model.
For example, the asymptotic relative efficiency of the diagonal elements of the MCD
scatter matrix with respect to the sample covariance matrix is only 6% when p = 2,
and 20.5% when p = 10. This efficiency can be increased by considering a larger h
such as h ≈ 0.75 n. This yields relative efficiencies of 26.2% for p = 2 and 45.9%
for p = 10. On the other hand, this choice of h decreases the robustness towards
possible outliers.
In order to increase the efficiency while retaining high robustness one can add
a weighting step (Rousseeuw and Leroy, 1987; Lopuhaä and Rousseeuw, 1991),
leading to the reweighted MCD estimator (RMCD):
3. Compute dᵢ = d(xᵢ, μ̂, Σ̂) with the raw MCD estimates μ̂ and Σ̂. Next let

μ̂_RMCD = ∑_{i} W(dᵢ²) xᵢ / ∑_{i} W(dᵢ²)   and   Σ̂_RMCD = ∑_{i} W(dᵢ²)(xᵢ − μ̂_RMCD)(xᵢ − μ̂_RMCD)ᵀ / ∑_{i} W(dᵢ²)

for some weight function W(t). This weight function should be nonincreasing, bounded, and equal to zero from a certain value of t on.
Note the similarity with (4.9) and (4.10). The RMCD estimator can be seen as a one-step M-estimator with the raw MCD estimates as initial starting values. RMCD combines the breakdown value of the original MCD with the better efficiency of M-estimators. A simple yet effective choice for W is to assume that the h selected observations are approximately normally distributed, hence the distribution of their dᵢ² is close to χ²_p, which leads to W(d²) = I(d² ≤ χ²_{p,β}). This is also the default choice in the CovMcd function in the R package rrcov (with β = 0.975). If we take h ≈ 0.5 n this reweighting step increases the efficiency to 45.5% for p = 2 and to 82% for p = 10.
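The reweighting step with the hard-rejection weight W(d²) = I(d² ≤ χ²_{p,β}) can be sketched as follows. To keep the example self-contained, the raw MCD output is replaced by simple stand-in estimates computed on the clean part of a synthetic sample:

```python
import numpy as np
from scipy.stats import chi2

def reweighted_estimates(X, mu_raw, Sigma_raw, beta=0.975):
    """One reweighting step: hard-rejection weights
    W(d^2) = I(d^2 <= chi2_{p,beta}) applied to raw (e.g. MCD) estimates."""
    n, p = X.shape
    diff = X - mu_raw
    d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma_raw), diff)
    w = (d2 <= chi2.ppf(beta, df=p)).astype(float)
    mu = (w[:, None] * X).sum(axis=0) / w.sum()
    diff = X - mu
    Sigma = (w[:, None, None] *
             np.einsum('ij,ik->ijk', diff, diff)).sum(axis=0) / w.sum()
    return mu, Sigma, w

rng = np.random.default_rng(4)
X = np.vstack([rng.multivariate_normal([0, 0], np.eye(2), size=95),
               rng.multivariate_normal([8, 8], 0.1 * np.eye(2), size=5)])

# Stand-ins for the raw MCD output (here simply the clean-part estimates).
mu0, S0 = X[:95].mean(axis=0), np.cov(X[:95], rowvar=False)
mu_rw, S_rw, w = reweighted_estimates(X, mu0, S0)
print(w[95:])   # the five planted outliers receive weight 0
```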
The MCD estimator is affine equivariant. This property follows from the fact that for each subset of size h, denoted as X_H, the determinant of the covariance matrix of the transformed data equals

|C(X_H Aᵀ)| = |A C(X_H) Aᵀ| = |A|² |C(X_H)|.

Hence, the optimal h-subset (which minimizes |C(X_H Aᵀ)|) remains the same as for the original data (which minimizes |C(X_H)|), and its covariance matrix is transformed appropriately. Afterward, the affine equivariance of the raw MCD location
estimator follows from the equivariance of the sample mean. Finally we note that the robust distances dᵢ = d(xᵢ, μ̂, Σ̂) are affine invariant, which implies that the reweighted estimator is equivariant.
The influence functions of the MCD location vector and scatter matrix are bounded (Croux and Haesbroeck, 1999; Cator and Lopuhaä, 2012). At the standard Gaussian distribution, the influence function of the MCD location estimator becomes zero for all x with ‖x‖² > χ²_{p,α}, hence large outliers do not influence the estimates. The same happens with the off-diagonal elements of the MCD scatter estimator. On the other hand, the influence function of the diagonal elements remains constant (different from zero) when ‖x‖² is sufficiently large. Therefore, the outliers still have a bounded influence on the estimator. All these influence functions are smooth, except at those x with ‖x‖² = χ²_{p,α}. The reweighted MCD estimator has an additional jump at ‖x‖² = χ²_{p,0.975} due to the discontinuity of the weight function.
Highly robust estimators such as MCD are often used to identify outliers, see e.g.
Rousseeuw and van Zomeren (1990) and Becker and Gather (1999, 2001).
Example. For the animals data, the RMCD estimates (for α = 0.5) are μ̂_RMCD = (3.03, 4.28)ᵀ and

Σ̂_RMCD = ( 18.86  14.16 )
          ( 14.16  11.03 )
yielding a robust correlation estimate of 0.98, which is higher and corresponds to the correlation of the ‘good’ (non-outlying) data. The corresponding robust tolerance ellipse, now defined as in (4.8) but with the Mahalanobis distances replaced by the robust distances dᵢ = d(xᵢ, μ̂_RMCD, Σ̂_RMCD), correctly flags the outliers in Figure 4.3.
In dimensions p > 3 we cannot draw a scatterplot or a tolerance ellipsoid. To
explore the differences between the classical and the robust analysis we can then
still draw a distance-distance plot which plots the robust distances versus the Ma-
halanobis distances as in Figure 4.4. This plot reveals the differences between both
methods at a glance, as the three dinosaurs lie far from the dotted line. Note that
observation 14, which is a little bit outlying, is the human species.
4.4.2 Computation
The exact MCD estimator is very hard to compute, as it requires the evaluation of all (n choose h) subsets of size h. Therefore one switches to an approximate algorithm such
as the FAST-MCD algorithm of Rousseeuw and Van Driessen (1999) which is quite
efficient. The key component of the algorithm is the so-called C-step:
Theorem 4.1. Take Xₙ = (x₁, ..., xₙ) and let H₁ ⊂ {1, ..., n} be a subset of size h. Put μ̂₁ and Σ̂₁ the empirical mean and covariance matrix of the data in H₁. If |Σ̂₁| ≠ 0, define the relative distances dᵢ¹ := d(xᵢ, μ̂₁, Σ̂₁) for all i = 1, ..., n. Now take H₂ such that {dᵢ¹ ; i ∈ H₂} := {(d¹)₁:ₙ, ..., (d¹)ₕ:ₙ} where (d¹)₁:ₙ ≤ (d¹)₂:ₙ ≤ ··· ≤ (d¹)ₙ:ₙ are the ordered distances, and compute μ̂₂ and Σ̂₂ based on
Fig. 4.3: Classical and robust tolerance ellipse of the animals data set.
H₂. Then

|Σ̂₂| ≤ |Σ̂₁|

with equality if and only if μ̂₂ = μ̂₁ and Σ̂₂ = Σ̂₁.
If |Σ̂₁| > 0, the C-step thus easily yields a new h-subset with lower covariance determinant. Note that the C stands for ‘concentration’ since Σ̂₂ is more concentrated (has a lower determinant) than Σ̂₁. The condition |Σ̂₁| ≠ 0 in the C-step theorem is no real restriction because if |Σ̂₁| = 0 the minimal objective value is already attained.
C-steps can be iterated until | Σ̂new | = 0 or |Σ̂new | = |Σ̂old |. The sequence of de-
terminants obtained in this way must converge in a finite number of steps because
there are only finitely many h-subsets. However, there is no guarantee that the final
value |Σ̂new | of the iteration process is the global minimum of the MCD objective
function. Therefore an approximate MCD solution can be obtained by taking many
initial choices of H1 , applying C-steps to each and keeping the solution with lowest
determinant.
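A minimal sketch of this strategy (C-steps iterated from several random starts, keeping the solution with the lowest determinant). For simplicity the starts here are random h-subsets rather than the (p + 1)-subsets of the full algorithm, so this illustrates the C-step itself, not FAST-MCD:

```python
import numpy as np

def c_step(X, H):
    """One concentration step: fit mean and covariance on H, then keep the
    h cases with the smallest relative distances (Theorem 4.1)."""
    h = len(H)
    mu = X[H].mean(axis=0)
    S = np.cov(X[H], rowvar=False)
    diff = X - mu
    d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(S), diff)
    H_new = np.sort(np.argsort(d2)[:h])
    return H_new, np.linalg.det(np.cov(X[H_new], rowvar=False))

rng = np.random.default_rng(5)
X = np.vstack([rng.standard_normal((45, 2)),
               rng.standard_normal((5, 2)) + 20])   # 5 planted far outliers
h = 26

best_det, best_H = np.inf, None
for _ in range(50):                                  # many random starts
    H = rng.choice(len(X), size=h, replace=False)
    old = np.inf
    while True:                                      # iterate until no decrease
        H, det = c_step(X, H)
        if det >= old - 1e-12:
            break
        old = det
    if det < best_det:
        best_det, best_H = det, H

print(best_H)   # the retained h-subset avoids the planted outliers
```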
To construct an initial subset H₁, a random (p + 1)-subset J is drawn and μ̂₀ := ave(J) and Σ̂₀ := C(J) are computed. [If |Σ̂₀| = 0 then J can be extended by adding observations until |Σ̂₀| > 0.] Then, for i = 1, ..., n the distances dᵢ⁰ := d(xᵢ, μ̂₀, Σ̂₀) are computed and sorted. The initial subset H₁ then consists of the h observations with smallest distance d⁰. This method yields better initial subsets than by drawing
Fig. 4.4: Distance-distance plot of the animals data set, plotting the robust distances versus the Mahalanobis distances.
some functions use α = 0.5 as default value, yielding a breakdown value of 50%,
whereas other implementations use α = 0.75.
The first affine equivariant estimator of location and scatter with 50% breakdown
value was the Stahel-Donoho estimator (Stahel, 1981; Donoho, 1982). It is con-
structed as follows. The Stahel-Donoho outlyingness of a univariate point x i is given
by
SDOᵢ = SDO⁽¹⁾(xᵢ, Xₙ) = |xᵢ − med(Xₙ)| / mad(Xₙ)
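In one dimension this outlyingness is easy to compute; the multivariate Stahel-Donoho outlyingness takes the supremum of this quantity over all projection directions, which the sketch below approximates with random directions:

```python
import numpy as np

def sdo_univariate(x, X):
    """SDO^(1): outlyingness of the scalar x within the univariate sample X."""
    med = np.median(X)
    mad = np.median(np.abs(X - med))
    return np.abs(x - med) / mad

def sdo(x, X, n_dir=500, seed=0):
    """Approximate multivariate Stahel-Donoho outlyingness: the supremum of
    the univariate SDO over directions, approximated by random directions."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n_dir, X.shape[1]))
    A /= np.linalg.norm(A, axis=1, keepdims=True)   # random unit directions
    return max(sdo_univariate(a @ x, X @ a) for a in A)

rng = np.random.default_rng(6)
X = rng.standard_normal((500, 2))
print(sdo(np.array([10.0, 10.0]), X) > 5)   # a far outlier gets a large SDO
```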
4.5.3 S-estimators
S-estimators of multivariate location and scatter (Rousseeuw and Leroy, 1987; Lopuhaä, 1989) are defined as the solution (μ̂, Σ̂) to the problem of

minimizing |Σ̂| subject to (1/n) ∑_{i=1}^{n} ρ(dᵢ) = δ
4.5.4 MM-estimators
The MM-estimates of location μ̂ and shape Γ̂ minimize

(1/n) ∑_{i=1}^{n} ρ₁( [(xᵢ − μ)ᵀ Γ⁻¹ (xᵢ − μ)]^{1/2} / σ̂ )

among all μ and all symmetric positive definite Γ with |Γ| = 1. The MM-estimator for the covariance matrix is then Σ̂ = σ̂² Γ̂.
The idea is to estimate the scale by means of a very robust S-estimator, and then
to estimate the location and shape using a different ρ function that yields a better
efficiency. MM-estimators have the following properties:
• The location and shape estimates inherit the breakdown value of the initial scale: if ρ₁(s) ≤ ρ₀(s) for all s > 0 and ρ₁(∞) = ρ₀(∞), then

ε∗_n(μ̂, Γ̂, Σ̂; Xₙ) ≥ ε∗_n(μ̃, Σ̃; Xₙ).
implementation uses a bisquare MM-estimator with 50% breakdown value and 95%
efficiency at the normal model.
If one is willing to give up the affine equivariance requirement, certain robust loca-
tion vectors and covariance matrices can be computed much faster.
This estimator has a 50% breakdown value and can be computed easily. But it is not affine equivariant, and it does not have to lie in the convex hull of the sample when p ≥ 3. Consider for example the three unit vectors (1, 0, 0)ᵀ, (0, 1, 0)ᵀ and (0, 0, 1)ᵀ, whose convex hull does not contain the coordinatewise median (0, 0, 0)ᵀ.
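This counterexample is easy to verify numerically:

```python
import numpy as np

# The three unit vectors from the text.
X = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

coord_med = np.median(X, axis=0)
print(coord_med)   # [0. 0. 0.]

# Every point of the convex hull has coordinates summing to 1,
# so (0, 0, 0) cannot lie in it.
print(coord_med.sum())   # 0.0
```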
It has a nice geometrical interpretation: take a point μ in Rᵖ and project all observations onto a sphere around μ. If the mean of these projections equals μ, then μ is the spatial median.
The breakdown value of the L₁-median is 50% and its influence function is bounded, but it is not affine equivariant (Lopuhaä and Rousseeuw, 1991). However, the L₁-median is orthogonal equivariant, i.e. it satisfies (4.4) with A any orthogonal matrix (Aᵀ = A⁻¹). This implies that the L₁-median transforms appropriately under rotations and translations of the data.
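The geometrical characterization (the mean of the projections onto a sphere around μ equals μ) rearranges into the classical Weiszfeld-type fixed-point iteration, a standard algorithm for the L₁-median though not one prescribed by the text; a sketch:

```python
import numpy as np

def l1_median(X, tol=1e-10, max_iter=1000):
    """Weiszfeld-type iteration for the spatial (L1) median: the point
    minimizing the sum of Euclidean distances to the observations."""
    mu = X.mean(axis=0)
    for _ in range(max_iter):
        d = np.linalg.norm(X - mu, axis=1)
        d = np.where(d < 1e-12, 1e-12, d)      # guard against division by zero
        w = 1.0 / d
        mu_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(mu_new - mu) < tol:
            return mu_new
        mu = mu_new
    return mu

rng = np.random.default_rng(7)
X = np.vstack([rng.standard_normal((95, 2)),
               np.full((5, 2), 50.0)])          # 5 gross outliers

print(np.round(l1_median(X), 1))   # stays near the origin despite the outliers
```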
Maronna and Zamar (2002) presented a method to obtain positive definite and ap-
proximately affine equivariant robust scatter matrices starting from any pairwise
robust scatter matrix. This method was applied to the robust covariance estimate
of Gnanadesikan and Kettenring (1972). The resulting multivariate location and
scatter estimates are called orthogonalized Gnanadesikan-Kettenring (OGK) esti-
mates and are calculated as follows:
1. Let m(·) and s(·) be robust univariate estimators of location and scale.
2. Construct zᵢ = D⁻¹xᵢ for i = 1, ..., n with D = diag(s(X₁), ..., s(X_p)).
3. Compute the ‘correlation matrix’ U of the variables of Z = (Z₁, ..., Z_p), given by u_jk = ¼ (s(Z_j + Z_k)² − s(Z_j − Z_k)²).
4. Compute the matrix E of eigenvectors of U and
   a. project the data on these eigenvectors, i.e. V = ZE;
   b. compute ‘robust variances’ of V = (V₁, ..., V_p), i.e. L = diag(s²(V₁), ..., s²(V_p));
   c. set the p × 1 vector μ̂(Z) = Em where m = (m(V₁), ..., m(V_p))ᵀ, and compute the positive definite matrix Ŝ(Z) = ELEᵀ.
5. Transform back to X, i.e. μ̂_RAWOGK = Dμ̂(Z) and Σ̂_RAWOGK = DŜ(Z)Dᵀ.
In the OGK algorithm m(.) is a weighted mean and s(.) is the τ -scale of Yohai and
Zamar (1988). Step 2 makes the estimate scale equivariant, whereas the following
steps are a kind of principal components that replace the eigenvalues of U (which
may be negative) by robust variances. As in the FAST-MCD algorithm the estimate
is improved by a weighting step, where the cutoff value in the weight function is now
taken as c = χ²_{p,0.9} med(d₁, ..., dₙ)/χ²_{p,0.5} with dᵢ = d(xᵢ, μ̂_RAWOGK, Σ̂_RAWOGK).
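The raw OGK steps 1-5 can be sketched compactly. Here m(·) is the median and s(·) a MAD-based scale, simpler stand-ins for the weighted mean and τ-scale used in the original proposal:

```python
import numpy as np

def mad_scale(v):
    """MAD-based scale, consistent at the normal distribution."""
    return np.median(np.abs(v - np.median(v))) / 0.6745

def ogk_raw(X, m=np.median, s=mad_scale):
    """Raw OGK location and scatter estimates (steps 1-5 of the text)."""
    n, p = X.shape
    d = np.array([s(X[:, j]) for j in range(p)])     # step 2: D = diag(s(X_j))
    Z = X / d                                        # z_i = D^{-1} x_i
    U = np.eye(p)                                    # diag = 1: s(Z_j) = 1 by scale equivariance
    for j in range(p):                               # step 3: robust 'correlations'
        for k in range(j + 1, p):
            U[j, k] = U[k, j] = 0.25 * (s(Z[:, j] + Z[:, k]) ** 2
                                        - s(Z[:, j] - Z[:, k]) ** 2)
    _, E = np.linalg.eigh(U)                         # step 4: eigenvectors of U
    V = Z @ E                                        # 4a: project on eigenvectors
    L = np.diag([s(V[:, j]) ** 2 for j in range(p)]) # 4b: robust variances
    mu_z = E @ np.array([m(V[:, j]) for j in range(p)])   # 4c
    S_z = E @ L @ E.T                                # positive definite by construction
    D = np.diag(d)
    return D @ mu_z, D @ S_z @ D.T                   # step 5: transform back

rng = np.random.default_rng(8)
X = rng.multivariate_normal([1, 2], [[4, 2], [2, 3]], size=2000)
mu_hat, Sigma_hat = ogk_raw(X)
```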
Note that even though the OGK and DetMCD methods are not affine equivariant,
it turns out that their deviation from affine equivariance is very small.
4.7 Conclusions
The assumption underlying this chapter is that the majority of the data can be mod-
eled by an elliptical distribution, whereas there is no such restriction on any outliers
that may occur. Unlike the classical mean and covariance matrix, robust estimators
can withstand the effect of such outliers. Moreover, we saw in the example how the
robust methods allow us to detect the outliers by means of their robust distances,
which can for instance be visualized in a distance-distance plot like Figure 4.4.
We advocate the use of robust estimators with a suitably high breakdown value,
as these are the least affected by outliers. Our recommendation is to use a high-
breakdown affine equivariant method such as MCD, S, or MM when the number
of dimensions p is rather small, say up to 10. For higher dimensions these meth-
ods become too computationally intensive, and then we recommend either OGK or
DetMCD. The latter methods are close to affine equivariant, and can be computed
faster and in higher dimensions.
References
Becker, C., Gather, U. (1999). The masking breakdown point of multivariate outlier identification
rules. J. Am. Stat. Assoc. 94, 947–955.
Becker, C., Gather, U. (2001). The largest nonidentifiable outlier: a comparison of multivariate simultaneous outlier identification rules. Comput. Stat. Data An. 36, 119–127.
Billor, N., Hadi, A., Velleman, P. (2000). BACON: blocked adaptive computationally efficient
outlier nominators. Comput. Stat. Data An. 34, 279–298.
Cator, E., Lopuhaä, H. (2012). Central limit theorem and influence function for the MCD estima-
tors at general multivariate distributions. Bernoulli 18, 520–551.
Croux, C., Haesbroeck, G. (1999). Influence function and efficiency of the Minimum Covariance
Determinant scatter matrix estimator. J. Multivariate An. 71, 161–190.
Davies, L. (1987). Asymptotic behavior of S-estimators of multivariate location parameters and
dispersion matrices. Ann. Stat. 15, 1269–1292.
Davies, P., Gather, U. (2005). Breakdown and groups (with discussion and rejoinder). Ann. Stat.
33, 977–1035.
Debruyne, M., Hubert, M. (2009). The influence function of the Stahel-Donoho covariance esti-
mator of smallest outlyingness. Stat. Probabil. Lett. 79, 275–282.
Donoho, D. (1982). Breakdown properties of multivariate location estimators. PhD thesis, Harvard
University, Boston.
Fritz, H., Filzmoser, P., Croux, C. (2012). A comparison of algorithms for the multivariate L1-median. Comput. Stat., to appear.
Gather, U., Hilker, T. (1997). A note on Tyler’s modification of the MAD for the Stahel-Donoho
estimator. Ann. Stat. 25, 2024–2026.
Gnanadesikan, R., Kettenring, J. (1972). Robust estimates, residuals, and outlier detection with
multiresponse data. Biometrics 28, 81–124.
Hampel, F., Ronchetti, E., Rousseeuw, P., Stahel, W. (1986). Robust Statistics: The Approach
Based on Influence Functions. Wiley, New York.
Hubert, M., Rousseeuw, P., Vanden Branden, K. (2005). ROBPCA: a new approach to robust prin-
cipal components analysis. Technometrics 47, 64–79.
Hubert, M., Rousseeuw, P., Verdonck, T. (2012). A deterministic algorithm for robust location and
scatter. J. Comput. Graph. Stat. 21, 618–637.
Lopuhaä, H. (1989). On the relation between S-estimators and M-estimators of multivariate loca-
tion and covariance. Ann. Stat. 17, 1662–1683.
Lopuhaä, H., Rousseeuw, P. (1991). Breakdown points of affine equivariant estimators of multi-
variate location and covariance matrices. Ann. Stat. 19, 229–248.
Maronna, R. (1976). Robust M-estimators of multivariate location and scatter. Ann. Stat. 4, 51–67.
Maronna, R., Martin, D., Yohai, V. (2006). Robust Statistics: Theory and Methods. Wiley, New
York.
Maronna, R., Yohai, V. (1995). The behavior of the Stahel-Donoho robust multivariate estimator.
J. Am. Stat. Assoc. 90, 330–341.
Maronna, R. and Zamar, R. (2002). Robust estimates of location and dispersion for high-
dimensional data sets. Technometrics 44, 307–317.
Pison, G., Van Aelst, S., Willems, G. (2002). Small sample corrections for LTS and MCD. Metrika
55, 111–123.
Rousseeuw, P. (1984). Least median of squares regression. J. Am. Stat. Assoc. 79, 871–880.
Rousseeuw, P. (1985). Multivariate estimation with high breakdown point. In: Grossmann, W.,
Pflug, G., Vincze, I., Wertz, W. (eds.) Mathematical Statistics and Applications, Vol. B, pp.
283–297. Reidel Publishing Company, Dordrecht.
Rousseeuw, P., Croux, C. (1993). Alternatives to the median absolute deviation. J. Am. Stat. Assoc.
88, 1273–1283.
Rousseeuw, P., Leroy, A. (1987). Robust Regression and Outlier Detection. Wiley-Interscience,
New York.
Rousseeuw, P., Van Driessen, K. (1999). A fast algorithm for the Minimum Covariance Determi-
nant estimator. Technometrics 41, 212–223.
Rousseeuw, P. and van Zomeren, B. (1990). Unmasking multivariate outliers and leverage points.
J. Am. Stat. Assoc. 85, 633–651.
Salibian-Barrera, M., Van Aelst, S., Willems, G. (2006). PCA based on multivariate MM-
estimators with fast and robust bootstrap. J. Am. Stat. Assoc. 101, 1198–1211.
Salibian-Barrera, M., Yohai, V.J. (2006). A fast algorithm for S-regression estimates. J. Comput.
Graph. Stat. 15, 414–427.
Stahel, W. (1981). Robuste Schätzungen: infinitesimale Optimalität und Schätzungen von Kovari-
anzmatrizen, PhD thesis, ETH Zürich.
Tatsuoka, K., Tyler, D. (2000). On the uniqueness of S-functionals and M-functionals under nonel-
liptical distributions. Ann. Stat. 28, 1219–1243.
Verboven, S., Hubert, M. (2005). LIBRA: a Matlab library for robust analysis. Chemometr. Intell.
Lab. 75, 127–136.
Visuri, S., Koivunen, V., Oja, H. (2000). Sign and rank covariance matrices. J. Stat. Plan. Infer. 91,
557–575.
Yohai, V., Zamar, R. (1988). High breakdown point estimates of regression by means of the mini-
mization of an efficient scale. J. Am. Stat. Assoc. 83, 406–413.