RoME Regression Smoothing Splines 2021
Tommaso Proietti (RoME Econometric Theory)
Time series regression: Regression and Smoothing Splines, Local Polynomial Regression and Averaging
AA 2020-21
Introduction
log(y_t)

∆ log y_t = log(y_t) − log(y_{t−1}) ≈ (y_t − y_{t−1}) / y_{t−1}

∆_s log y_t = log(y_t) − log(y_{t−s}) ≈ (y_t − y_{t−s}) / y_{t−s},   s = 4, 12
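The approximation above can be checked numerically; the figures below are purely illustrative.

```python
import math

# The log difference approximates the growth rate when changes are small:
# log(y_t) - log(y_{t-1})  ≈  (y_t - y_{t-1}) / y_{t-1}.
y_prev, y_curr = 100.0, 102.0
log_diff = math.log(y_curr) - math.log(y_prev)   # Delta log y_t
growth = (y_curr - y_prev) / y_prev              # exact growth rate
print(round(log_diff, 4), growth)  # 0.0198 0.02
```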
The log transformation is a special case of the Box-Cox transformation:
y_t(λ) = (y_t^λ − 1)/λ,   λ ≠ 0,
         log y_t,          λ = 0.
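The transformation can be transcribed directly; the values of λ below are illustrative, and the λ → 0 limit recovers the log case.

```python
import math

def box_cox(y, lam):
    """Box-Cox transformation: (y**lam - 1)/lam for lam != 0, log(y) at lam = 0."""
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

# As lam -> 0 the power case converges to the log case.
print(box_cox(2.0, 0.0))    # log(2) ≈ 0.6931
print(box_cox(2.0, 1e-8))   # ≈ log(2)
print(box_cox(2.0, 1.0))    # y - 1 = 1.0
```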
Figure: U.S. Quarterly Gross Domestic Product at chained-linked volumes, ref. year 2009
Figure: levels and changes of the series, 1960-2000, including Annual Changes ∆_12 log y_t.
Figure: Industrial production index (ip) and its seasonally adjusted version (ip_sa), with annual log changes D12Lip and D12Lip_sa, 1990-2015.
Figure: Consumer Price Index for All Urban Consumers, All Items, 1982-84 = 100
Time Series Regression
Here "global" means that the coefficients of the polynomial are constant across the sample span of t, so it is not possible to control the influence of individual observations on the fit.
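A minimal sketch of a global polynomial trend fit on simulated data (all names and values are illustrative): a single set of coefficients applies over the whole sample, so every observation influences the fit everywhere.

```python
import numpy as np

# Simulated series: linear trend plus a cycle plus noise.
rng = np.random.default_rng(0)
t = np.arange(100, dtype=float)
y = 0.05 * t + np.sin(t / 8.0) + rng.normal(scale=0.2, size=t.size)

# Fit a cubic polynomial of time by OLS; the coefficients are global.
coefs = np.polyfit(t, y, deg=3)
trend = np.polyval(coefs, t)
print(coefs.shape, trend.shape)  # (4,) (100,)
```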
Global polynomials are amenable to mathematical treatment, but they are not very flexible: they can provide poor local approximations and behave rather erratically at the extremes of the sample.

This point is illustrated by the first panel of Figure 1, which plots the original series (the industrial production index for the Italian automotive sector) together with the trend estimates obtained by fitting cubic and quintic polynomials in time.

In particular, it can be seen that a high order is needed to provide a reasonable fit (the cubic fit is very poor).
Figure 1: Industrial production index for the Italian automotive sector, with cubic and quintic polynomial trend fits.
which retains linearity in the regression coefficients and uses a suitable set of transformations of t, the functions h_m(t), called a basis.

In the case of polynomial splines, the idea is to add, to a global polynomial of degree p, polynomial pieces at given points, called knots, so that the sections are joined together while certain continuity properties are fulfilled.

Given the set of points ξ_1 < … < ξ_k < … < ξ_K, a polynomial spline function of degree p with K knots {ξ_k, k = 1, …, K} is a polynomial of degree p in each of the K + 1 intervals [ξ_k, ξ_{k+1}), with p − 1 continuous derivatives, whereas the p-th derivative has jumps at the knots.
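A sketch of the truncated power basis described above, on simulated data (the knots and degree are illustrative): the basis is {1, t, …, t^p, (t − ξ_k)_+^p}, giving K + p + 1 regressors.

```python
import numpy as np

def truncated_power_basis(t, knots, degree=3):
    """Columns 1, t, ..., t^degree, then (t - xi_k)_+^degree for each knot."""
    cols = [t ** j for j in range(degree + 1)]
    cols += [np.clip(t - xi, 0.0, None) ** degree for xi in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * t) + rng.normal(scale=0.2, size=t.size)

knots = [0.2, 0.4, 0.6, 0.8]            # K = 4 internal knots
X = truncated_power_basis(t, knots)      # shape (200, 8): K + 4 columns
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fit = X @ beta
print(X.shape)  # (200, 8)
```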
Cubic splines and natural boundary conditions
The model has K + 4 parameters, and the number of effective degrees of freedom is df = 4 + K.

A cubic spline (like any other high-order spline) behaves too erratically at the boundary of the t support, i.e. the variance of the fit is very high there.
Figure: scatterplot of y against x.
Comparison of cubic and linear spline fits with 4 internal equally spaced knots at 0.2, 0.4, 0.6, 0.8.
Cubic spline fit with different K and global cubic fit (knots are located automatically at quantiles of the distribution of time t).
This is the reason why it is preferable to impose the so-called natural boundary conditions, which constrain the spline to be linear outside the boundary knots.

The natural boundary conditions require that the second and third derivatives are zero for t ≤ ξ_1 and t ≥ ξ_K. This amounts to imposing 4 restrictions, which frees 4 df.

The complexity of the spline model is measured by its degrees of freedom, the trace of the hat matrix. This equals p + 1 + K for polynomial splines and K for a natural cubic spline.
f''(t) = 2β_2 + 6β_3 t + 6 Σ_{k=1}^{K} β_{k+3} (t − ξ_k)_+

f'''(t) = 6β_3 + 6 Σ_{k=1}^{K} β_{k+3} (t − ξ_k)_+^0.

The natural boundary conditions imply the following constraints on the coefficients:

β_2 = β_3 = 0;   Σ_{k=1}^{K} β_{k+3} = 0;   Σ_{k=1}^{K} ξ_k β_{k+3} = 0
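A numerical sanity check, with illustrative coefficients, that the constraints above make the second derivative vanish beyond the last knot (so the spline is linear there):

```python
import numpy as np

knots = np.array([0.2, 0.4, 0.6, 0.8])

# Pick beta_{k+3} satisfying sum(b) = 0 and sum(xi_k * b) = 0:
# fix the first two entries, solve for the last two.
b = np.zeros(4)
b[:2] = [1.0, -2.0]
A = np.array([[1.0, 1.0], [knots[2], knots[3]]])
rhs = -np.array([b[0] + b[1], knots[0] * b[0] + knots[1] * b[1]])
b[2:] = np.linalg.solve(A, rhs)

beta2 = beta3 = 0.0   # natural boundary conditions

def second_deriv(t):
    # f''(t) = 2*beta2 + 6*beta3*t + 6 * sum_k b_k (t - xi_k)_+
    return 2 * beta2 + 6 * beta3 * t + 6 * np.sum(b * np.clip(t - knots, 0.0, None))

print(second_deriv(0.9), second_deriv(1.5))  # both ~0 beyond the last knot
```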
Figure: Natural cubic splines. Comparison of fit with different K and global cubic fit.
How many knots

Model selection is carried out by the same methods considered for regression. It entails not only the selection of the number of knots, K, but also their location along the support of t.

Note: the truncated power basis is easily interpretable; however, for computational efficiency a linear transformation of this basis, the B-spline basis, is used for estimation (the regressors are less collinear).
Figure: (x − ξ_k)_+^3 (cubic spline) and smoothing spline basis functions.
Smoothing splines

A smoothing spline is a natural cubic spline with n knots, one placed at each observation t_i, i = 1, …, n. Hence each new observation carries "news". Obviously, such a model is overparameterized: there is one parameter for each observation.
Hence, we choose f(t) = Σ_{i=1}^{n} θ_i N_i(t), where the N_i(t) are the elements of the natural spline basis corresponding to the knots t_i. The coefficients θ_i are estimated by minimising the following penalised residual sum of squares function:

PRESS(λ) = min { Σ_{i=1}^{n} [y_i − f(t_i)]² + λ ∫ [f''(t)]² dt },   (2)
where N is the regression matrix of the natural spline basis and Ω has elements ∫ N_h''(t) N_k''(t) dt.

The solution is

θ̂ = (N′N + λΩ)^{−1} N′y

(notice the analogy with ridge regression).
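The penalised solve is directly analogous to ridge regression. A sketch with stand-in matrices (a real smoothing spline would build N from the natural spline basis and Ω from the integrals of products of its second derivatives; here both are illustrative placeholders):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
N = rng.normal(size=(n, n))   # stand-in basis matrix (not a real spline basis)
Omega = np.eye(n)             # stand-in penalty (ridge-like)
y = rng.normal(size=n)
lam = 5.0

# theta_hat = (N'N + lambda * Omega)^{-1} N'y
theta = np.linalg.solve(N.T @ N + lam * Omega, N.T @ y)
fitted = N @ theta

# Hat matrix H_lambda and effective degrees of freedom df(lambda) = tr(H_lambda).
H = N @ np.linalg.solve(N.T @ N + lam * Omega, N.T)
df = np.trace(H)
print(theta.shape, round(df, 2))
```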
The effective degrees of freedom of the fit are df(λ) = tr(H_λ).
Cross-validation
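The body of this slide was lost in extraction. For a linear smoother ŷ = H_λ y, leave-one-out cross-validation has a standard closed form, CV(λ) = (1/n) Σ_i [(y_i − ŷ_i)/(1 − {H_λ}_{ii})]², sketched here with stand-in basis and penalty matrices (illustrative, not the natural spline basis itself):

```python
import numpy as np

def loo_cv(N, Omega, y, lam):
    """Closed-form leave-one-out CV score for the penalised fit y_hat = H y."""
    H = N @ np.linalg.solve(N.T @ N + lam * Omega, N.T)
    resid = y - H @ y
    return np.mean((resid / (1.0 - np.diag(H))) ** 2)

rng = np.random.default_rng(3)
n = 40
N = rng.normal(size=(n, n))   # stand-in basis matrix
Omega = np.eye(n)             # stand-in penalty
y = rng.normal(size=n)

scores = {lam: loo_cv(N, Omega, y, lam) for lam in (0.1, 1.0, 10.0)}
print(min(scores, key=scores.get))   # lambda with the smallest CV score
```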
Figure: data y with the true function f(x).
Figure: smoothing spline fits with df = 2, 20, 50 and the cross-validated fit (CV).
Local polynomial regression
f̃(t) = β_0 + β_1(t − t_0) + · · · + β_p(t − t_0)^p.

Setting t = t_0 makes clear that β_0 = f(t_0), β_1 = f′(t_0), and in general β_j = f^(j)(t_0)/j! (the scaled j-th derivative of the function).
where K((t − t_0)/h) is a kernel function, depending on a bandwidth parameter h > 0, providing a set of weights that decline with the distance of t from t_0.
Denoting

X_0 = [ 1  (t_1 − t_0)  ⋯  (t_1 − t_0)^p
        1  (t_2 − t_0)  ⋯  (t_2 − t_0)^p
        ⋮       ⋮       ⋯        ⋮
        1  (t_n − t_0)  ⋯  (t_n − t_0)^p ],   β = (β_0, β_1, …, β_p)′,

K_0 = diag{ K((t_1 − t_0)/h), K((t_2 − t_0)/h), …, K((t_n − t_0)/h) },

the WLS estimator is

β̂ = (X_0′ K_0 X_0)^{−1} X_0′ K_0 y.

The fitted value is then ŷ_0 = β̂_0.
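The WLS estimator above can be sketched directly on simulated data (the Epanechnikov kernel, bandwidth, and data-generating process are illustrative choices):

```python
import numpy as np

def local_poly_fit(t, y, t0, h, p=1):
    """Local polynomial fit at t0: WLS with kernel weights, returns beta_hat_0."""
    u = (t - t0) / h
    w = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)   # Epanechnikov kernel
    X0 = np.vander(t - t0, N=p + 1, increasing=True)       # rows (1, (t_i-t0), ...)
    W = np.diag(w)
    beta = np.linalg.solve(X0.T @ W @ X0, X0.T @ W @ y)
    return beta[0]                                          # fitted value at t0

rng = np.random.default_rng(4)
t = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * t) + rng.normal(scale=0.1, size=t.size)

fit = local_poly_fit(t, y, t0=0.5, h=0.1, p=1)
print(fit)   # close to sin(pi) = 0
```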
Notice that f̂(t_0) = β̂_0 is a linear combination of the observations, say f̂(t_0) = Σ_j w_j y_j.

The linear combination yielding our trend estimate is often termed a (linear) filter, and the weights w_j constitute its impulse responses.

The latter are time invariant and carry essential information on the nature of the estimated signal.

Two important properties of the weights are symmetry (if the design is regular, i.e. observations are equally spaced) and the reproduction of p-th degree polynomials.
Kernels

A kernel is a symmetric and non-negative function with a maximum at zero. There is a large literature on kernels and their properties.

The uniform kernel is K(u) = I(|u| ≤ 1).

The Epanechnikov kernel has certain optimality properties and is defined as: K(u) = (3/4)(1 − u²) if |u| ≤ 1, and 0 otherwise.

The tricube kernel (used by Loess, a k-nearest-neighbour local polynomial method) is defined as: K(u) = (1 − |u|³)³ if |u| ≤ 1, and 0 otherwise.

The Gaussian kernel is such that h is the standard deviation and K(u) is the standard Normal density function.

In the k-nearest-neighbour approach h is set equal to the distance from the k-th closest t_i to t_0; hence, it varies with t_0.
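The four kernels can be transcribed directly, vectorised over u = (t − t_0)/h (the uniform kernel follows the indicator form given above):

```python
import numpy as np

def uniform(u):
    return np.where(np.abs(u) <= 1, 1.0, 0.0)           # K(u) = I(|u| <= 1)

def epanechnikov(u):
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def tricube(u):
    return np.where(np.abs(u) <= 1, (1 - np.abs(u)**3)**3, 0.0)

def gaussian(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)     # standard Normal density

u = np.array([-2.0, 0.0, 0.5])
print(epanechnikov(u))   # 0 outside [-1, 1], 0.75 at the origin
```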
Figure: the Uniform, Epanechnikov, Gaussian and Tricube kernels on [−2, 2].
Bias-Variance Trade-off
The degree p of the polynomial and the bandwidth h are crucial quantities: they regulate the complexity of the fit, which increases with p and decreases with h.

The mean square estimation error is MSE[f̂(t_0)] = Bias²[f̂(t_0)] + Var[f̂(t_0)].
Usually, fˆ(t0 ) is biased for f (t0 ), unless the true regression function is a
polynomial of order p.
The bias arises from neglecting higher order terms in the Taylor expansion.
The bias is inversely related to p and positively related to the bandwidth h.
As far as the variance is concerned, for a given p, Var[fˆ(t0 )] decreases as h
increases.
The shape of the kernel has much less impact on the bias-variance trade-off.
The optimal choice of (p, h) should minimise MSE[fˆ(t0 )], which can be
estimated using a variety of methods (see e.g. Fan and Gijbels, 1994).
It is usually most effective to choose a low-degree polynomial (e.g. p = 1 or p = 3) and concentrate effort on the selection of the bandwidth.
The optimal h can be obtained by cross-validation.
Special case: local averaging (p = 0) and kernel smoothing
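The body of this slide was lost in extraction. With p = 0 the local WLS estimator reduces to a kernel-weighted average of the observations, f̂(t_0) = Σ_i K((t_i − t_0)/h) y_i / Σ_i K((t_i − t_0)/h) (the Nadaraya-Watson estimator); a sketch on illustrative simulated data:

```python
import numpy as np

def kernel_smooth(t, y, t0, h):
    """Local averaging (p = 0): kernel-weighted mean of the y_i around t0."""
    u = (t - t0) / h
    w = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)   # Epanechnikov
    return np.sum(w * y) / np.sum(w)

rng = np.random.default_rng(5)
t = np.linspace(0.0, 1.0, 300)
y = t**2 + rng.normal(scale=0.05, size=t.size)

print(kernel_smooth(t, y, t0=0.5, h=0.1))   # near the true value 0.25
```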