Time series regression: Regression and Smoothing Splines, Local Polynomial Regression and Averaging

Tommaso Proietti

RoME Econometric Theory, AA 2020-21
Introduction

Time series and their characteristics

A time series is a collection of observations on an economic stock or flow variable indexed by time: {y_t, t = 1, 2, . . . , n}.
Time is discrete and observations are equally spaced.
If y is measured on a ratio scale, common transformations are

log(y_t),

∆ log y_t = log(y_t) − log(y_{t−1}) ≈ (y_t − y_{t−1}) / y_{t−1},

∆_s log y_t = log(y_t) − log(y_{t−s}) ≈ (y_t − y_{t−s}) / y_{t−s},   s = 4, 12.

The log transformation is a special case of the Box-Cox transformation:

y_t(λ) = (y_t^λ − 1) / λ for λ ≠ 0,   y_t(λ) = log y_t for λ = 0.
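As a minimal numerical sketch (Python/NumPy; the series y and the value λ = 0.5 are made up for illustration), the transformations above can be computed as follows:

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox transformation; reduces to the log for lam = 0."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if lam == 0 else (y**lam - 1.0) / lam

# Hypothetical quarterly series, so s = 4 for annual log changes
y = np.array([100.0, 102.0, 101.5, 103.0, 105.0, 106.2, 107.0, 108.5])
s = 4
dlog = np.diff(np.log(y))                 # Delta log y_t, approx. growth rate
dlog_s = np.log(y[s:]) - np.log(y[:-s])   # Delta_s log y_t, annual log change
print(box_cox(y, 0.5)[:3], dlog[:3], dlog_s[:2])
```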

We take a look at some economic time series first.


Some stylized facts emerge:
Some (most) series are seasonal.
Some (most) display trends.
The changes are strongly autocorrelated (we will explain this).

Figure: U.S. Quarterly Gross Domestic Product at chain-linked volumes, ref. year 2009. Panels: original series; logarithmic transformation; quarterly logarithmic changes; annual logarithmic changes.
Figure: U.S. Unemployment rate, monthly, seasonally adjusted. Panels: original series; logarithms; monthly changes; ∆ log y_t; annual changes; ∆_12 log y_t.
Figure: Italy, Index of Industrial Production, Manufacturing, 2010 = 100. Panels: series (ip, ip_sa); annual log changes (D12Lip, D12Lip_sa).
Figure: Consumer Price Index for All Urban Consumers, All Items, 1982-84 = 100. Panels: consumer price index; monthly inflation; yearly inflation.
Approaches to economic time series analysis

Objectives of time series analysis: interpretation and forecasting.

There are various approaches to TSA. A (rather fuzzy) classification:
1 Parametric models (ARIMA, state space models, etc.). A useful categorisation is due to Cox (1981):
  Observation-driven models: dynamic stochastic difference equations (the behaviour is explained in terms of the past values of the process and of the innovations): ARIMA models, VAR and VECM, GARCH, etc.
  Parameter-driven models: the series is generated by a combination of unobserved components: structural time series models, stochastic volatility models.
2 Semiparametric approach (splines, random fields).
3 Nonparametric approach (kernel methods, local polynomial regression, neural networks, wavelets, etc.).

Time Series Regression

Let us consider the regression model with a single explanatory variable t, representing time:

Y = f(t) + ε,

where f(t) is an unknown conditional mean function and, at least, E(ε|t) = 0, E(ε²|t) = σ².

f(t) is possibly nonlinear and non-additive.

In the linear regression framework we can consider the (global) polynomial approximation

f(t) = β_0 + β_1 t + β_2 t² + · · · + β_p t^p.

Here global means that the coefficients of the polynomial are constant across the sample span of t, and it is not possible to control the influence of the individual observations on the fit.
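A brief sketch of a global polynomial fit of this kind (Python/NumPy; the simulated trend is hypothetical). NumPy's Polynomial.fit performs the OLS fit on a rescaled time index:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
t = np.arange(1.0, n + 1)
y = 0.01 * t + np.log1p(t) + 0.2 * rng.standard_normal(n)  # made-up trend + noise

# Global cubic approximation f(t) = b0 + b1 t + b2 t^2 + b3 t^3, fitted by OLS
poly = np.polynomial.Polynomial.fit(t, y, deg=3)
trend = poly(t)   # same coefficients over the whole sample span of t
```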

Global polynomials are amenable to mathematical treatment, but are not very flexible: they can provide bad local approximations and behave rather weirdly at the extremes of the sample.
This point is illustrated by the first panel of the figure below, which plots the original series, representing the industrial production index for the Italian automotive sector, and the estimate of the trend arising from fitting cubic and quintic polynomials in time.
In particular, it can be seen that a high order is needed to provide a reasonable fit (the cubic fit being very poor).

Figure: Industrial Production Index, Manufacture and Assembly of Motor Vehicles, seasonally adjusted, Italy, January 1990 - October 2005. Panels: global polynomial fit (y_t with cubic and quintic fits); cubic spline (λ = 1600).
We will discuss the approximation

f(t) = Σ_{m=1}^{M} β_m h_m(t),

which retains linearity in the regression coefficients and uses a suitable set of transformations of t, the basis functions h_m(t).
In the case of polynomial splines the idea is to add to a global polynomial of degree p further polynomial pieces at given points, called knots, so that the sections are joined together ensuring that certain continuity properties are fulfilled.
Given the set of points ξ_1 < · · · < ξ_k < · · · < ξ_K, a polynomial spline function of degree p with K knots {ξ_k, k = 1, . . . , K} is a polynomial of degree p in each of the K + 1 intervals [ξ_k, ξ_{k+1}), with p − 1 continuous derivatives, whereas the p-th derivative has jumps at the knots.

The spline can be represented as follows:

f(t) = β_0 + β_1 t + · · · + β_p t^p + Σ_{k=1}^{K} β_{p+k} (t − ξ_k)_+^p,   (1)

where the set of functions

(t − ξ_k)_+^p = (t − ξ_k)^p if t − ξ_k ≥ 0, and 0 if t − ξ_k < 0,

defines what is usually called the truncated power basis of degree p.
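Representation (1) maps directly into a regression design matrix, so the spline can be fitted by OLS. A minimal sketch (Python/NumPy; data and knot locations are hypothetical):

```python
import numpy as np

def truncated_power_basis(t, knots, p=3):
    """Columns: 1, t, ..., t^p, (t - xi_1)_+^p, ..., (t - xi_K)_+^p."""
    t = np.asarray(t, dtype=float)
    cols = [t**j for j in range(p + 1)]
    cols += [np.where(t >= xi, (t - xi)**p, 0.0) for xi in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 120)
y = np.sin(6.0 * t) + 0.2 * rng.standard_normal(t.size)
X = truncated_power_basis(t, knots=[0.25, 0.5, 0.75], p=3)  # K = 3, df = 4 + K
beta, *_ = np.linalg.lstsq(X, y, rcond=None)                # OLS spline fit
fhat = X @ beta
```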

According to (1) the spline is a linear combination of polynomial pieces; at each knot a new polynomial piece, starting off at zero, is added so that the derivatives at that point are continuous up to order p − 1.
The truncated power representation has the advantage of representing the spline as a multivariate regression model.
The piecewise nature of the spline "reflects the occurrence of structural change" (Poirier, 1973). The knot ξ_k is the location of a structural break. The change is "smooth", since certain continuity conditions are ensured.
The coefficients β_{p+k} determine the size of the break.

Cubic splines and natural boundary conditions

The cubic spline model arises when p = 3:

f(t) = β_0 + β_1 t + β_2 t² + β_3 t³ + β_4 (t − ξ_1)_+³ + · · · + β_{K+3} (t − ξ_K)_+³.

The model has K + 4 parameters and the number of effective degrees of freedom is df = 4 + K.

A cubic spline (like any other high-order spline) behaves too erratically at the boundary of the t support, i.e. the variance of the fit is very high there.

Figure: Training sample (t_i, y_i) and true regression function f(t) = [exp(1.2t) + 1.5 sin(7t) − 1]/3.
Figure: Comparison of cubic and linear spline fits with 4 internal equally spaced knots at 0.2, 0.4, 0.6, 0.8.
Figure: Cubic spline fit with different K (K = 2, 5, 10, 20) and global cubic fit P3 (knots are located automatically at quantiles of the distribution of time t).
This is the reason why it is preferable to impose the so-called natural boundary conditions, which constrain the spline to be linear outside the boundary knots.

The natural boundary conditions require that the second and third derivatives are zero for t ≤ ξ_1 and t ≥ ξ_K. This amounts to imposing 4 restrictions, which frees 4 df.

The complexity of the spline model is measured by the degrees of freedom, i.e. the trace of the hat matrix. This is equal to p + 1 + K for polynomial splines and K for a natural cubic spline.

Natural boundary conditions

The 2nd and 3rd derivatives of f(t) = β_0 + β_1 t + β_2 t² + β_3 t³ + Σ_{k=1}^{K} β_{k+3} (t − ξ_k)_+³ are, respectively,

f″(t) = 2β_2 + 6β_3 t + 6 Σ_{k=1}^{K} β_{k+3} (t − ξ_k)_+,

f‴(t) = 6β_3 + 6 Σ_{k=1}^{K} β_{k+3} (t − ξ_k)_+⁰.

The natural boundary conditions imply the following constraints on the coefficients:

β_2 = β_3 = 0;   Σ_{k=1}^{K} β_{k+3} = 0;   Σ_{k=1}^{K} ξ_k β_{k+3} = 0.
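One way to impose these four constraints in estimation is to drop the t² and t³ columns and confine the knot coefficients to the null space of the two remaining restrictions. A sketch under these assumptions (Python/NumPy; practical implementations use a natural-spline or B-spline basis instead):

```python
import numpy as np

def natural_cubic_design(t, knots):
    """Truncated-power design with beta_2 = beta_3 = 0 and the knot
    coefficients confined to the null space of C = [[1,...,1], [xi_1,...,xi_K]]."""
    t = np.asarray(t, dtype=float)
    knots = np.asarray(knots, dtype=float)
    P = np.column_stack([np.ones_like(t), t])                      # 1, t
    T = np.where(t[:, None] >= knots, (t[:, None] - knots)**3, 0.0)
    C = np.vstack([np.ones_like(knots), knots])                    # 2 x K constraint matrix
    _, _, Vt = np.linalg.svd(C)
    Z = Vt[2:].T                                                   # K x (K - 2) null-space basis
    return np.column_stack([P, T @ Z])

t = np.linspace(0.0, 1.0, 100)
X = natural_cubic_design(t, knots=np.linspace(0.1, 0.9, 5))
print(X.shape)  # (100, 5): with K = 5 knots the fit has K degrees of freedom
```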

Figure: Natural cubic splines. Comparison of fit with different K (K = 2, 5, 10, 20) and global cubic fit P3.
Model selection: how many knots?

Model selection is carried out by the same methods considered for regression. It entails not only the selection of the number of knots, K, but also their location along the support of t.

The automatic option is to locate them at the 100k/(K + 1)-th percentiles of the distribution of t, k = 1, . . . , K. If there are K candidate knots, there are 2^K possible models to select from. Stepwise selection has been proposed. An alternative is to use a regularization approach, i.e. smoothing splines.

Model complexity is measured by the degrees of freedom, e.g. df = trace(H) = p + 1 + K for a polynomial spline.

Note: the truncated power basis is easily interpretable; however, for computational efficiency a linear transformation of this basis, the B-spline basis, is used for estimation (the regressors are less collinear).
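The automatic knot placement is a one-liner; a sketch (Python/NumPy, with hypothetical data):

```python
import numpy as np

def percentile_knots(t, K):
    """Interior knots at the 100k/(K+1)-th percentiles of t, k = 1, ..., K."""
    return np.percentile(t, 100.0 * np.arange(1, K + 1) / (K + 1))

t = np.sort(np.random.default_rng(2).uniform(0.0, 1.0, 150))
print(percentile_knots(t, K=4))  # roughly the 20th, 40th, 60th, 80th percentiles
```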

Figure: Truncated power basis for polynomial spline models. Panels: (x − ξ_k)_+⁰; (x − ξ_k)_+¹; (x − ξ_k)_+³ (cubic spline); smoothing spline.
Smoothing splines

A smoothing spline is a natural cubic spline with n knots placed at each observation t_i, i = 1, . . . , n. Hence each new observation carries "news". Obviously, such a model is overparameterized, as the number of parameters is n, i.e. there is one parameter for each observation.

Hence, we choose f(t) = Σ_{i=1}^{n} θ_i N_i(t), where the N_i(t) are the elements of the natural spline basis corresponding to the knots t_i. The coefficients θ_i are estimated by minimising the following penalised residual sum of squares function:

PRESS(λ) = min { Σ_{i=1}^{n} [y_i − f(t_i)]² + λ ∫ [f″(t)]² dt },   (2)

where λ ≥ 0 is the smoothness parameter, f″(t) is the second derivative of the function, and ∫ [f″(t)]² dt measures its curvature.

The parameter λ regulates the complexity of the model.

For λ = 0, the spline fits the observations perfectly (y_i = f̂(t_i)).
For λ → ∞ we obtain the linear fit f̂(t_i) = β̂_0 + β̂_1 t_i (the linear function has zero second derivative, so ∫ [f″(t)]² dt = 0).

In vector notation,

PRESS(λ) = (y − Nθ)′(y − Nθ) + λ θ′Ωθ,

where N is the regression matrix of the natural spline and Ω has elements ∫ N_h″(t) N_k″(t) dt.

The solution is

θ̂ = (N′N + λΩ)⁻¹ N′y

(notice the analogy with ridge regression).
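The ridge-type solution can be illustrated numerically. As a simplifying assumption, the sketch below (Python/NumPy) replaces the natural-spline basis N with the identity and the curvature matrix Ω with D′D, where D takes second differences: a discrete, Whittaker-type analogue that is valid for equally spaced t, not the exact natural-spline computation:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
t = np.linspace(0.0, 1.0, n)
y = np.sin(12.0 * (t + 0.2)) / (t + 0.2) + rng.standard_normal(n)

D = np.diff(np.eye(n), n=2, axis=0)   # (n-2) x n second-difference operator
Omega = D.T @ D                       # discrete analogue of the curvature penalty
lam = 5.0
H = np.linalg.solve(np.eye(n) + lam * Omega, np.eye(n))  # hat matrix H_lambda
fhat = H @ y              # theta_hat = (N'N + lam*Omega)^{-1} N'y with N = I
print("df(lambda) =", np.trace(H))    # effective degrees of freedom tr(H_lambda)
```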

The estimated regression function is f̂ = H_λ y, where H_λ = N(N′N + λΩ)⁻¹N′.

The effective degrees of freedom of the spline fit is

df(λ) = tr(H_λ).

When λ → ∞, df(λ) → 2 (minimal complexity), whereas as λ → 0, df(λ) → n (maximum complexity).

The estimation of λ is carried out either by minimizing an information criterion or by crossvalidation. Denoting

RSS(λ) = Σ_{i=1}^{n} [y_i − f̂(t_i)]²,

the AIC is AIC(λ) = ln[RSS(λ)] + 2 df(λ)/n.

Crossvalidation

We seek the value of λ which minimises

CV(λ) = Σ_{i=1}^{n} [(y_i − f̂(t_i)) / (1 − h_{i,λ})]²,

where h_{i,λ} is the i-th diagonal element of H_λ, or

GCV(λ) = RSS(λ) / [1 − n⁻¹ df(λ)]².
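A sketch of GCV selection over a grid of λ values, reusing the discrete penalized smoother assumed in the previous sketch (constant factors in GCV do not affect the minimiser):

```python
import numpy as np

def gcv(y, H):
    """GCV(lambda) = RSS(lambda) / [1 - df(lambda)/n]^2 for hat matrix H."""
    resid = y - H @ y
    df = np.trace(H)
    return (resid @ resid) / (1.0 - df / y.size) ** 2

rng = np.random.default_rng(4)
n = 100
t = np.linspace(0.0, 1.0, n)
y = np.sin(12.0 * (t + 0.2)) / (t + 0.2) + rng.standard_normal(n)
D = np.diff(np.eye(n), n=2, axis=0)
Omega = D.T @ D

grid = 10.0 ** np.arange(-3.0, 4.0)   # candidate smoothing parameters
scores = [gcv(y, np.linalg.solve(np.eye(n) + lam * Omega, np.eye(n)))
          for lam in grid]
print("GCV-chosen lambda:", grid[int(np.argmin(scores))])
```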

Figure: Simulated example: n = 100, f(t) = sin[12(t + 0.2)]/(t + 0.2), ε ∼ N(0, 1), Y = f(t) + ε. Scatter of the data together with the true f(x).
Figure: Simulated example, smoothing spline fit: the CV-selected fit and fits with df = 2, df = 20, df = 50.
Local polynomial regression

Consider the regression model Y = f(t) + ε, where f(t) is an unknown deterministic function of the input t, so that E(Y|t) = f(t).
A training sample of observations (y_i, t_i), i = 1, . . . , n, is available.
Suppose that we are interested in estimating the value of the function at a point t_0, f(t_0).
If f(t) is "smooth" (i.e. continuous and differentiable), it can be locally approximated around t_0 by a polynomial of degree p, using a Taylor expansion:

f̃(t) = β_0 + β_1 (t − t_0) + · · · + β_p (t − t_0)^p.

Setting t = t_0, it is clear that β_0 = f(t_0), β_1 = f′(t_0), and in general β_j = f^(j)(t_0)/j! (the scaled j-th derivative of the function).

As f̃(t) is valid only as a local approximation, the p + 1 unknown coefficients β_k, k = 0, . . . , p, are estimated by weighted least squares (WLS), giving more weight to the observations with values of t close to t_0.
The WLS objective function to be minimized is:

Σ_{i=1}^{n} K((t_i − t_0)/h) [y_i − β_0 − β_1 (t_i − t_0) − · · · − β_p (t_i − t_0)^p]²,

where K((t − t_0)/h) is a kernel function, depending on a bandwidth parameter h > 0, providing a set of weights that decline with the distance of t from t_0.

Denoting by X_0 the n × (p + 1) design matrix with i-th row

[1, (t_i − t_0), . . . , (t_i − t_0)^p],

by β = (β_0, β_1, . . . , β_p)′ the coefficient vector, and

K_0 = diag{K((t_1 − t_0)/h), K((t_2 − t_0)/h), . . . , K((t_n − t_0)/h)},

the WLS estimator is

β̂ = (X_0′ K_0 X_0)⁻¹ X_0′ K_0 y.

The fitted value is then ŷ_0 = β̂_0.
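A compact sketch of the WLS computation at a single point t_0 (Python/NumPy; the Gaussian kernel, the bandwidth and the local linear choice p = 1 are illustrative):

```python
import numpy as np

def local_poly(t, y, t0, h, p=1, kernel=lambda u: np.exp(-0.5 * u**2)):
    """beta_hat = (X0' K0 X0)^{-1} X0' K0 y; the fit at t0 is beta_hat[0]."""
    w = kernel((t - t0) / h)
    X0 = np.vander(t - t0, N=p + 1, increasing=True)  # 1, (t-t0), ..., (t-t0)^p
    XtW = X0.T * w                                    # X0' K0 without forming diag(w)
    beta = np.linalg.solve(XtW @ X0, XtW @ y)
    return beta[0]                                    # beta_hat[1] estimates f'(t0)

rng = np.random.default_rng(5)
t = np.sort(rng.uniform(0.0, 1.0, 150))
y = np.sin(12.0 * (t + 0.2)) / (t + 0.2) + 0.5 * rng.standard_normal(t.size)
grid = np.linspace(0.05, 0.95, 50)
fhat = np.array([local_poly(t, y, t0, h=0.08, p=1) for t0 in grid])  # local linear fit
```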

Notice that

f̂(t_0) = β̂_0 = w_h y_{t−h} + w_{h−1} y_{t−h+1} + · · · + w_1 y_{t−1} + w_0 y_t + w_{−1} y_{t+1} + · · · + w_{−h} y_{t+h}.

The linear combination yielding our trend estimate is often termed a (linear) filter, and the weights w_j constitute its impulse responses.
The latter are time invariant and carry essential information on the nature of the estimated signal.
Two important properties of the weights are symmetry (if the design is regular, i.e., observations are equally spaced) and reproduction of p-th degree polynomials.

The nature of the fit is determined by 3 quantities:


The polynomial order, p.
The kernel function.
The bandwidth parameter, h. The bandwidth determines the width of the
local neighbourhood.

Kernels

A kernel is a symmetric and non-negative function with a maximum at zero. There is a large literature on kernels and their properties.

The uniform kernel is K(u) = I(|u| ≤ 1).
The Epanechnikov kernel has certain optimality properties and is defined as follows: K(u) = (3/4)(1 − u²) if |u| ≤ 1, and 0 otherwise.
The tricube kernel (used by loess, a k-nearest-neighbour local polynomial method) is defined as follows: K(u) = (1 − |u|³)³ if |u| ≤ 1, and 0 otherwise.
The Gaussian kernel takes K(u) to be the standard normal density function, so that h plays the role of the standard deviation.
In the k-nearest-neighbour approach h is set equal to the distance of the k-th closest t_i from t_0. Hence, it varies with t_0.
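The kernels listed above translate directly into code; a sketch (Python/NumPy):

```python
import numpy as np

def uniform(u):      return (np.abs(u) <= 1).astype(float)
def epanechnikov(u): return np.where(np.abs(u) <= 1, 0.75 * (1.0 - u**2), 0.0)
def tricube(u):      return np.where(np.abs(u) <= 1, (1.0 - np.abs(u)**3)**3, 0.0)
def gaussian(u):     return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

u = np.linspace(-2.0, 2.0, 9)
print(epanechnikov(u))  # zero outside [-1, 1], maximum 0.75 at u = 0
```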
Figure: Kernel functions. Panels: uniform; Epanechnikov; Gaussian; tricube.
Bias-Variance Trade-off

The degree p of the polynomial and the bandwidth h are crucial quantities.
They regulate the complexity of the fit, which increases with p and decreases
with h.
The mean square estimation error is

MSE[f̂(t_0)] = Bias²[f̂(t_0)] + Var[f̂(t_0)].

Both p and h affect the MSE.


For a given h, by increasing p we reduce the bias (we improve the Taylor
approximation by including higher order terms), but we increase the variance
(there are more parameters to be estimated).
For a given p, a large h will capture more observations and the fit tends to be
the same as the global polynomial fit (with p + 1 df). The variance is small,
but the bias is high.

Usually, f̂(t_0) is biased for f(t_0), unless the true regression function is a polynomial of order p.
The bias arises from neglecting higher order terms in the Taylor expansion.
The bias is inversely related to p and positively related to the bandwidth h.
As far as the variance is concerned, for a given p, Var[f̂(t_0)] decreases as h increases.
The shape of the kernel has much less impact on the bias-variance trade-off.
The optimal choice of (p, h) should minimise MSE[f̂(t_0)], which can be estimated using a variety of methods (see e.g. Fan and Gijbels, 1994).
It is usually most effective to choose a low-degree polynomial (e.g. p = 1 or p = 3) and concentrate the effort on the selection of the bandwidth.
The optimal h can be obtained by cross-validation.

Special case: local averaging (p = 0) and kernel smoothing

The local constant fit (p = 0) is an important special case.

The local weighted average of Y,

f̂(t_0) = Σ_{i=1}^{n} K((t_i − t_0)/h) y_i / Σ_{i=1}^{n} K((t_i − t_0)/h),

is known as the Nadaraya-Watson average.

Denoting by N_k(t) the set of k points nearest to t, the k-nearest-neighbour average is

f̂(t_0) = (1/k) Σ_{t_i ∈ N_k(t_0)} y_i.

This corresponds to the use of a uniform kernel.
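Both averages are short computations; a sketch (Python/NumPy; kernel, bandwidth and k are illustrative choices):

```python
import numpy as np

def nadaraya_watson(t, y, t0, h, kernel=lambda u: np.exp(-0.5 * u**2)):
    """Kernel-weighted local average: sum_i K((t_i-t0)/h) y_i / sum_i K((t_i-t0)/h)."""
    w = kernel((t - t0) / h)
    return np.sum(w * y) / np.sum(w)

def knn_average(t, y, t0, k):
    """Mean of y over the k observations with t_i nearest to t0 (uniform kernel)."""
    idx = np.argsort(np.abs(t - t0))[:k]
    return y[idx].mean()

rng = np.random.default_rng(6)
t = np.sort(rng.uniform(0.0, 1.0, 200))
y = np.sin(12.0 * (t + 0.2)) / (t + 0.2) + 0.5 * rng.standard_normal(t.size)
print(nadaraya_watson(t, y, t0=0.5, h=0.05), knn_average(t, y, t0=0.5, k=15))
```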
