Notes 14
• The LMS (Least Median of Squares) criterion minimizes the median
squared deviation:
median{(Yi − (β0 + β1 Xi1 + β2 Xi2 + · · · + βk Xik ))2 , i = 1, 2, . . . , n}
• Comments
– Both the LAR (or LAD) and LMS estimates of β can be obtained only
via computer.
– Steven P. Ellis (1998, Statistical Science, Vol. 13, 337-350) pointed
out that LS is more stable than LAD and LMS, in the sense that
small changes to the data set can substantially change the LAD and
LMS fits.
♠ IRLS Robust Regression
• Steps
(1) Choose a weight function for weighting the cases.
(2) Obtain starting weights for all cases.
(3) Use the starting weights in weighted least squares and obtain the
residuals from the fitted regression function.
(4) Use the residuals in step 3 to obtain revised weights.
(5) Continue the iterations until convergence is obtained.
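The five steps can be sketched with numpy, using the Huber weight function and the MAD scaling defined in the following sections (a minimal illustration on hypothetical data, not production code):

```python
import numpy as np

def huber_weights(u, c=1.345):
    """Huber weight function: 1 for |u| <= c, c/|u| otherwise."""
    au = np.maximum(np.abs(u), 1e-12)        # guard against division by zero
    return np.where(au <= c, 1.0, c / au)

def irls_huber(y, X, max_iter=50, tol=1e-8):
    """IRLS robust regression following steps (1)-(5).

    Starting weights come from an ordinary least squares fit; at each
    iteration the residuals are rescaled by MAD and the Huber weights
    are revised.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]        # step (2): OLS start
    for _ in range(max_iter):
        resid = y - X @ beta
        mad = np.median(np.abs(resid - np.median(resid))) / 0.6745
        u = resid / max(mad, 1e-12)                    # scaled residuals
        w = huber_weights(u)                           # step (4): revise weights
        XtW = X.T * w                                  # step (3): weighted LS
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)
        if np.max(np.abs(beta_new - beta)) < tol:      # step (5): convergence
            return beta_new
        beta = beta_new
    return beta
```

On data contaminated by a few gross outliers, the revised weights shrink the outliers' influence, so the final fit stays close to the bulk of the data where ordinary least squares would be pulled away.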
• Weight Function
Two widely used weight functions are the Huber and bisquare
weight functions:
Huber:    w = 1             if |u| ≤ 1.345
          w = 1.345/|u|     if |u| > 1.345

Bisquare: w = [1 − (u/4.685)2 ]2    if |u| ≤ 4.685
          w = 0                     if |u| > 4.685,
where w denotes the weight and u denotes the scaled residual. 1.345
and 4.685 are called tuning constants. They were chosen to make the
IRLS robust procedure 95 percent efficient for data generated by the
normal error MLR model.
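The two weight functions transcribe directly into code (function names are illustrative):

```python
import numpy as np

def huber_w(u, c=1.345):
    """Huber weight: 1 for |u| <= c, c/|u| for |u| > c."""
    au = np.maximum(np.abs(u), 1e-12)   # avoid division by zero at u = 0
    return np.where(au <= c, 1.0, c / au)

def bisquare_w(u, c=4.685):
    """Bisquare weight: [1 - (u/c)^2]^2 for |u| <= c, 0 otherwise."""
    return np.where(np.abs(u) <= c, (1 - (u / c) ** 2) ** 2, 0.0)
```

Both functions give weight 1 near u = 0; the Huber weight decays like c/|u|, while the bisquare weight drops smoothly to exactly 0 at the tuning constant, so cases with |u| > 4.685 are excluded from the fit entirely.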
• Starting Values
– When the Huber weight function is employed, the initial residuals
may be those obtained from an ordinary least squares fit.
– The bisquare function calculations are more sensitive to the starting
values. To obtain good starting values for the bisquare weight
function, the Huber weight function is often used to obtain an initial
robust regression fit, and the residuals from this fit are then employed
as starting values for several iterations with the bisquare weight
function.
– Alternatively, LAR regression is used to obtain starting residuals for
the bisquare weight function.
• Scaled Residuals
The scaled residual ui is defined by

ui = ei /MAD,

where

MAD = (1/0.6745) median{|ei − median{ei }|},

and division by the constant 0.6745 makes MAD an approximately
unbiased estimate of σ for independent observations from a normal
distribution.
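The definition above can be computed in a few lines (function name illustrative):

```python
import numpy as np

def scaled_residuals(e):
    """Scale residuals by MAD = median{|ei - median{ei}|} / 0.6745."""
    e = np.asarray(e, dtype=float)
    mad = np.median(np.abs(e - np.median(e))) / 0.6745
    return e / mad
```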
Bootstrapping
Step 2. Draw a bootstrap sample from F̂ , denoted by
e∗ = (e∗1 , e∗2 , . . . , e∗n )′ . This consists of drawing each e∗i
independently, with replacement and with equal probability
1/n, from the observed residuals {e1 , e2 , . . . , en }.
Using the bootstrap estimates {b∗1 , b∗2 , . . . , b∗B } computed from the B
bootstrap samples, we can easily compute the standard errors and CI’s
for β.
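A minimal sketch of this fixed-X (residual-resampling) bootstrap, assuming each bootstrap response vector is formed as y∗ = Xb + e∗ and the model is refit by ordinary least squares (data and function name are hypothetical):

```python
import numpy as np

def residual_bootstrap(y, X, B=1000, seed=0):
    """Fixed-X (residual) bootstrap for regression coefficients.

    Fit the model, resample residuals with replacement (each with
    probability 1/n), form bootstrap responses y* = X b + e*, refit,
    and collect the bootstrap estimates b*_1, ..., b*_B.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta                                  # observed residuals
    boots = np.empty((B, len(beta)))
    for b in range(B):
        e_star = rng.choice(e, size=n, replace=True)  # draw from F-hat
        y_star = X @ beta + e_star
        boots[b] = np.linalg.lstsq(X, y_star, rcond=None)[0]
    se = boots.std(axis=0, ddof=1)                    # bootstrap standard errors
    ci = np.percentile(boots, [2.5, 97.5], axis=0)    # percentile 95% CIs
    return se, ci
```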
Loess Method
♠ Introduction
• The name loess stands for locally weighted regression scatter plot
smoothing.
• At each point in the data set a low-degree polynomial is fit to a
subset of the data, with explanatory variable values near the point
whose response is being estimated.
♦ Localized Subsets of Data
• The subsets of data used for each weighted least squares fit in
LOESS are determined by a nearest neighbors algorithm. A
user-specified input to the procedure, called the "bandwidth" or
"smoothing parameter" and denoted here by q, determines how much
of the data is used to fit each local polynomial.
• q is called the smoothing parameter because it controls the flexibility
of the LOESS regression function. Large values of q produce the
smoothest functions that wiggle the least in response to fluctuations
in the data. The smaller q is, the closer the regression function will
conform to the data. Using too small a value of the smoothing
parameter is not desirable, however, since the regression function
will eventually start to capture the random error in the data.
♦ Degree of Local Polynomials
• The local polynomials fit to each subset of the data are almost
always of first or second degree; that is, either locally linear (in the
straight line sense) or locally quadratic.
• Using a zero degree polynomial turns LOESS into a weighted
moving average. Such a simple local model might work well for some
situations, but may not always approximate the underlying function
well enough.
• Higher-degree polynomials would work in theory, but yield models
that are not really in the spirit of LOESS.
• LOESS is based on the ideas that any function can be well
approximated in a small neighborhood by a low-order polynomial
and that simple models can be fit to data easily.
• High-degree polynomials would tend to overfit the data in each
subset and are numerically unstable, making accurate computations
difficult.
♦ Distance Measure
• For the case of two predictors, suppose that we wish to obtain the
fitted value at (Xh1 , Xh2 ). The distance from case i to this point,
denoted by di , is defined as
di = [(Xi1 − Xh1 )2 + (Xi2 − Xh2 )2 ]1/2 .
♦ Weight Function
• The weight function gives the most weight to the data points nearest
the point of estimation and the least weight to the data points that
are furthest away.
• The use of the weights is based on the idea that points near each
other in the explanatory variable space are more likely to be related
to each other in a simple way than points that are further apart.
Following this logic, points that are likely to follow the local model
best influence the local model parameter estimates the most. Points
that are less likely to actually conform to the local model have less
influence on the local model parameter estimates.
• The traditional weight function used for LOESS is the tri-cube
weight function,
wi = [1 − (di /dq )3 ]3    if di < dq
wi = 0                     if di ≥ dq ,
where dq denotes the distance of the qth-nearest case from the point of
estimation.
♦ Local Fitting
Given the weights wi for the n cases, weighted least squares is used to fit
either the first-order polynomial model or the second-order polynomial
model. After the regression model is fitted by weighted least squares, the
fitted value Ŷh at (Xh1 , Xh2 ) then serves as the nonparametric estimate
of the mean response at these X levels.
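A minimal one-predictor sketch of the tri-cube weighting and local first-order fit described above (function names and data are illustrative, not from any LOESS package; with two predictors, d would be the Euclidean distance di defined earlier):

```python
import numpy as np

def tricube(d, d_q):
    """Tri-cube weights: [1 - (d/d_q)^3]^3 for d < d_q, 0 otherwise."""
    return np.where(d < d_q, (1 - (d / d_q) ** 3) ** 3, 0.0)

def loess_fit(x, y, x_h, q):
    """Fitted value at x_h from a local first-order (linear) fit
    to the q nearest neighbors, weighted by the tri-cube function."""
    d = np.abs(x - x_h)
    d_q = np.sort(d)[q - 1]                 # distance to the q-th nearest case
    w = tricube(d, d_q)
    Xl = np.column_stack([np.ones_like(x), x - x_h])   # centered at x_h
    XtW = Xl.T * w                          # weighted least squares
    beta = np.linalg.solve(XtW @ Xl, XtW @ y)
    return beta[0]                          # intercept = fitted value at x_h
```

Repeating the call over a grid of x_h values traces out the nonparametric regression curve; a smaller q lets the curve follow the data more closely, as discussed above.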
♠ Advantages of LOESS
• The biggest advantage LOESS has over many other methods is the
fact that it does not require the specification of a function to fit a
model to all of the data in the sample. Instead the analyst only has
to provide a smoothing parameter value and the degree of the local
polynomial.
• LOESS is very flexible, making it ideal for modeling complex
processes for which no theoretical models exist.
• These two advantages, combined with the simplicity of the method,
make LOESS one of the most attractive of the modern regression
methods for applications that fit the general framework of least
squares regression but which have a complex deterministic structure.
♠ Disadvantages of LOESS
♠ References