Professional Documents
Culture Documents
2017avril Cantoni Rlunch
2017avril Cantoni Rlunch
Robust regression in R
Eva Cantoni
Research Center for Statistics and Geneva School of
Economics and Management,
University of Geneva, Switzerland
2 Robust regression
3 R ressources
4 Examples
5 Bibliography
Robust statistics philosopy Robust regression R ressources Examples Bibliography References
Fε = (1 − ε)Fθ + εG , (1)
Inference Classical
G 0 << ε < 1 0 < ε << 1 ε=0
arbitrary ? ? Fθ
G = ∆z Fε Fε Fθ
G = F(θ,θ0 ) Fε Fε Fθ
G such that Fε = F(θ,θ0 ) F(θ,θ0 ) F(θ,θ0 ) Fθ
Robust
arbitrary ? Fθ Fθ
G = ∆z Fε Fθ Fθ
G = F(θ,θ0 ) Fε Fθ Fθ
G such that Fε = F(θ,θ0 ) F(θ,θ0 ) Fθ Fθ
Table: Models at which inference can be at best done.
Robust statistics philosopy Robust regression R ressources Examples Bibliography References
Robustness measures
yi = xiT β + i ,
with i ∼ N (0, σ 2 ).
Outliers....
6
Point vertical
Bon point levier
5
4
●
3
y
●
● ● ●
●●
● ● ●
●
●
●●●● ●●●
●
●
● ●● ●● ● ●
●●
●● ●●
●
● ●● ●● ● ●
● ●●● ●
2
●● ● ● ●● ● ●
● ● ●●● ●●●
●●●● ● ●
●
●
●● ● ●●●●
●
●● ● ●●
● ● ●●
●
●
●
●●
●● ●
1
●
●
● Mauvais point levier
0
−4 −2 0 2 4 6
x
Robust statistics philosopy Robust regression R ressources Examples Bibliography References
Robustness vs diagnostic
Masking.....
ML residuals
1.5
●
●
● ● ●
1.0
● ●
●
● ●
● ●
● ● ●●
●
● ●
0.5
● ● ● ●
● ● ●
● ●●
resid(olsmultiple)
● ● ● ●
●● ● ● ●
● ● ● ●
● ● ●
● ● ●
●
0.0
● ● ● ●
● ●
● ●
● ●
● ● ●
● ●
● ● ● ●
● ● ● ●
● ●
● ● ●
−0.5
● ●
● ●
● ●
● ●
●
● ●
●
●
● ● ●
−1.0
●
●
●
−1.5
●
●
●
●
0 20 40 60 80 100
Index
Robust statistics philosopy Robust regression R ressources Examples Bibliography References
Robustness vs diagnostic
Masking.....
Data and ML fit
6
5
4
●
3
● ●● ●
●●
● ●● ●
y
● ●●●
●●●
●
● ●● ●●● ●
●
●
●
●●
● ●
● ●●●●● ●●● ● ●
2
●
● ●● ●●●● ● ● ●
●●
●●● ●●●● ●
●
●●●● ●●●●●
●
●● ●●●
● ● ●●
●●
●●●● ●
1
●
0
●
●
●●
−1
−4 −2 0 2 4 6
x
Robust statistics philosopy Robust regression R ressources Examples Bibliography References
M and GM-estimation
The M-estimator β̂M solves
n n n
!
X yi − xiT β̂M X X
ψc xi = ψc (ri )xi = w̃c (ri ) ri xi = 0,
σ
i=1 i=1 i=1
ψc and ρc functions
Huber Tukey or bisquare
1.0
1.0
0.5
0.5
psi_c(r)
psi_c(r)
0.0
0.0
−0.5
−0.5
−1.0
−1.0
−4 −2 0 2 4 −4 −2 0 2 4
r r
ρH(u) ρB(u)
5
5
4
4
3
3
2
2
1
1
0
−4 −2 0 2 4 −4 −2 0 2 4
u u
Robust statistics philosopy Robust regression R ressources Examples Bibliography References
Comparison
6
OLS
M− Huber
GM−Huber
5
M−Tukey
4
●
3
● ●● ●
●●
● ●● ●
y
● ●●●
●
● ●●
● ●● ●●● ●
●
●
●
●●
● ●
● ●●●●● ●●● ● ●
2
●
● ●● ●●●● ● ● ●
●●
●●●● ●●●
● ●
●
●● ● ●●
● ●
●
●
●● ●●●
● ● ●●
●●
●●●● ●
1
●
0
●
●
●●
−1
−4 −2 0 2 4 6
x
Robust statistics philosopy Robust regression R ressources Examples Bibliography References
S-estimation
MM-estimation
Comparison
6
OLS
M− Huber
M−Tukey
5
MM
S
4
●
3
● ●● ●
●●
● ●● ●
y
● ●●●
●
● ●●
● ●● ●●● ●
●
●
●
●●
● ●
● ●●●●● ●●● ● ●
2
●
● ●● ●●●● ● ● ●
●●
●●●● ●●●
● ●
●
●● ● ●●
● ●
●
●
●● ●●●
● ● ●●
●●
●●●● ●
1
●
0
●
●
●●
−1
−4 −2 0 2 4 6
x
Robust statistics philosopy Robust regression R ressources Examples Bibliography References
g (µi ) = xiT β
where µ0i =
P∂µi /∂β = ∂µi /∂ηi xipand
a(β) = n1 ni=1 E [ψ(ri ; c)]w (xi )/ φvµi µ0i . The constant a(β) is
a correction term to ensure Fisher consistency.
Robust statistics philosopy Robust regression R ressources Examples Bibliography References
(G)M-estimation
• MASS: rlm() with method=’’M’’ (Huber, Tukey, Hampel)
Choice for the scale estimator: MAD, Huber Proposal 2
S-estimation
• robust: lmRob with estim=’’Initial’’
• robustbase: lmrob.S
MM-estimation
• MASS: rlm() with method=’’MM’’
• robust: lmRob (with estim=’’’Final’’, default)
• robustbase: lmrob()
Robust statistics philosopy Robust regression R ressources Examples Bibliography References
Coleman
Coleman
Call :
lm ( f o r m u l a = Y ˜ . , d a t a = c o l e m a n )
Residuals :
Min 1Q Median 3Q Max
−3.9497 −0.6174 0.0623 0.7343 5.0018
Coefficients :
E s t i m a t e Std . E r r o r t v a l u e Pr (>| t | )
( I n t e r c e p t ) 19.94857 13.62755 1.464 0.1653
salaryP −1.79333 1.23340 −1.454 0.1680
fatherWc 0.04360 0.05326 0.819 0.4267
sstatus 0.55576 0.09296 5 . 9 7 9 3 . 3 8 e−05 ∗∗∗
teacherSc 1.11017 0.43377 2.559 0.0227 ∗
motherLev −1.81092 2.02739 −0.893 0.3868
−−−
S i g n i f . codes : 0 Ä ò ∗∗∗ Ä ô 0 . 0 0 1 Ä ò ∗∗ Ä ô 0 . 0 1 Ä ò ∗ Ä ô 0 . 0 5 Ä ò . Ä ô 0 . 1 Ä ò Ä ô 1
’ ’ ’ ’ ’ ’ ’ ’ ’ ’
R e s i d u a l s t a n d a r d e r r o r : 2 . 0 7 4 on 14 d e g r e e s o f f r e e d o m
M u l t i p l e R−s q u a r e d : 0.9063 , A d j u s t e d R−s q u a r e d : 0.8728
F−s t a t i s t i c : 2 7 . 0 8 on 5 and 14 DF , p−v a l u e : 9 . 9 2 7 e−07
Robust statistics philosopy Robust regression R ressources Examples Bibliography References
Coleman
> require ( robustbase )
> summary (m. lmrob , s e t t i n g = ” KS2011 ” )
Call :
lmrob ( f o r m u l a = Y ˜ . , data = coleman )
\−−> method = ”MM”
Residuals :
Min 1Q Median 3Q Max
−4.16181 −0.39226 0 . 0 1 6 1 1 0.55619 7.22766
Coefficients :
E s t i m a t e Std . E r r o r t v a l u e Pr (>| t | )
( I n t e r c e p t ) 30.50232 6.71260 4 . 5 4 4 0 . 0 0 0 4 5 9 ∗∗∗
salaryP −1.66615 0.43129 −3.863 0 . 0 0 1 7 2 2 ∗∗
fatherWc 0.08425 0.01467 5 . 7 4 1 5 . 1 0 e−05 ∗∗∗
sstatus 0.66774 0.03385 1 9 . 7 2 6 1 . 3 0 e−11 ∗∗∗
teacherSc 1.16778 0.10983 1 0 . 6 3 2 4 . 3 5 e−08 ∗∗∗
motherLev −4.13657 0.92084 −4.492 0 . 0 0 0 5 0 7 ∗∗∗
−−−
S i g n i f . codes : 0 Ä ò ∗∗∗ Ä ô 0 . 0 0 1 Ä ò ∗∗ Ä ô 0 . 0 1 Ä ò ∗ Ä ô 0 . 0 5 Ä ò . Ä ô 0 . 1 Ä ò Ä ô 1
’ ’ ’ ’ ’ ’ ’ ’ ’ ’
Robust r e s i d u a l s t a n d a r d e r r o r : 1.134
M u l t i p l e R−s q u a r e d : 0.9814 , A d j u s t e d R−s q u a r e d : 0.9747
C o n v e r g e n c e i n 11 IRWLS i t e r a t i o n s
Robustness weights :
o b s e r v a t i o n 18 i s an o u t l i e r w i t h | w e i g h t | = 0 ( < 0 . 0 0 5 ) ;
The r e m a i n i n g 19 o n e s a r e s u m m a r i z e d a s
Min . 1 s t Qu . Median Mean 3 r d Qu . Max .
0.1491 0.9412 0.9847 0.9279 0.9947 0.9982
Robust statistics philosopy Robust regression R ressources Examples Bibliography References
Coleman
Ctn’d:
Algorithmic parameters :
tuning . chi bb tuning . psi refine . tol rel . tol
1 . 5 4 8 e+00 5 . 0 0 0 e−01 4 . 6 8 5 e+00 1 . 0 0 0 e−07 1 . 0 0 0 e−0
solve . tol eps . o u t l i e r e p s . x warn . l i m i t . r e j e c t warn . l i m i t . meanr
1 . 0 0 0 e−07 5 . 0 0 0 e−03 1 . 5 6 9 e−10 5 . 0 0 0 e−01 5 . 0 0 0 e−0
nR esample max . i t best . r . s k. fast . s k . max maxit . s c a l
500 50 2 1 200
200
trace . lev mts compute . r d f a s t . s . l a r g e . n
0 1000 0 2000
psi subsampling c o v compute . o u t l i e r . s t a t s
” bisquare ” ” nonsingular ” ” . vcov . avar1 ” ”SM”
seed : i n t (0)
Robust statistics philosopy Robust regression R ressources Examples Bibliography References
Coleman
Robustness weights
1.0
0.8
0.6
m.lmrob$rweights
0.4
0.2
0.0
5 10 15 20
Index
Robust statistics philosopy Robust regression R ressources Examples Bibliography References
Coleman
Coefficients :
Value Std . E r r o r t value
( I n t e r c e p t ) 27.3497 7.6808 3.5608
salaryP −1.6207 0 . 6 9 5 2 −2.3314
fatherWc 0.0752 0.0300 2.5045
sstatus 0.6401 0.0524 12.2182
teacherSc 1.1557 0.2445 4.7271
motherLev −3.5195 1 . 1 4 2 7 −3.0801
R e s i d u a l s t a n d a r d e r r o r : 0 . 7 4 6 1 on 14 d e g r e e s o f f r e e d o m
Robust statistics philosopy Robust regression R ressources Examples Bibliography References
Breastfeeding
Breastfeeding
Breastfeeding
Breastfeeding
> require ( robustbase )
> b r e a s t . glmrobWx=g l m r o b ( de cwhat ˜ h o w f e d f r+e t h n i c+e d u c a t+age+g r p+howfed+p a r t n e r+
smokenow+smokebf , f a m i l y=b i n o m i a l , w e i g h t s . on . x=”h a t ” , t c c = 1 . 5 , d a t a=b r e a s t )
> summary ( b r e a s t . glmrobWx )
Coefficients :
E s t i m a t e Std . E r r o r z v a l u e Pr (>| z | )
( Intercept ) −7.77423 3.35952 −2.314 0 . 0 2 0 6 6 ∗
howfedfrBreast 1.49177 0.68777 2.169 0.03008 ∗
e t h n i c N o n−w h i t e 2.68758 1.11582 2.409 0.01601 ∗
educat 0.37372 0.18486 2.022 0.04321 ∗
age 0.03116 0.05955 0.523 0.60082
grpBeginning −0.81318 0.69269 −1.174 0 . 2 4 0 4 2
howfedBreast 0.52823 0.70456 0.750 0.45342
partnerPartner 0.78295 0.81448 0.961 0.33641
smokenowYes −3.44560 1.12137 −3.073 0 . 0 0 2 1 2 ∗∗
smokebfYes 1.50733 1.09900 1.372 0.17020
−−−
S i g n i f . codes : 0 ∗∗∗ 0 . 0 0 1 ∗∗ 0 . 0 1 ∗ 0 . 0 5 . 0 . 1 1
Robustness weights w. r ∗ w. x :
Min . 1 s t Qu . Median Mean 3 r d Qu . Max .
0.04103 0.82460 0.86500 0.82890 0.89400 0.93790
Number o f o b s e r v a t i o n s : 135
F i t t e d by method Mqle ( i n 12 i t e r a t i o n s )
( D i s p e r s i o n p a r a m e t e r f o r b i n o m i a l f a m i l y t a k e n t o be 1 )
Robust statistics philosopy Robust regression R ressources Examples Bibliography References
Breastfeeding
Ctn’d:
No d e v i a n c e v a l u e s a v a i l a b l e
Algorithmic parameters :
acc tcc
0.0001 1.5000
maxit
50
t e s t . acc
” coef ”
Robust statistics philosopy Robust regression R ressources Examples Bibliography References
Breastfeeding
Robustness weights
0.95
w(x)
0.85
18
Index
1.0
53
3
0.6
w(r)
14 63
90 115
11
0.2
75
Index
Robust statistics philosopy Robust regression R ressources Examples Bibliography References
References