
Advanced Statistical Learning

Chapter 5: Excursion: B-Splines


Bernd Bischl, Julia Moosbauer, Andreas Groll

Department of Statistics – TU Dortmund


Winter term 2020/21
Splines

Bernd Bischl, Julia Moosbauer, Andreas Groll © Winter term 2020/21 Advanced Statistical Learning – 5 – 1 / 46
SPLINES
In this chapter, we examine a specific class of basis functions, the
so-called B-splines, in detail.

Special thanks to Prof. Brian Marx and Prof. Paul Eilers for providing
most of the slides!

For an in-depth discussion of P-splines, see their new book:


P.H.C. Eilers and B.D. Marx (2020). Practical Smoothing: The Joys of
P-splines. Cambridge University Press.

Basics of Bases

BASIC LINEAR REGRESSION

Scatterplot: pairs (xi, yi), i = 1, …, n

Assumption: a straight line fits the data well
Equation: µi = α0 + α1 · xi

BASIC LINEAR REGRESSION
Least squares: minimize
    S = Σ_{i=1}^n (yi − α0 − α1 xi)² = Σ_{i=1}^n (yi − µi)²

Matrix notation:

    X = [1 x1; 1 x2; … ; 1 xn]  (an n × 2 matrix),   α = (α0, α1)ᵀ,   µ = Xα

Minimize:

    ‖y − Xα‖²  ⇒  XᵀXα̂ = Xᵀy  ⇒  α̂ = (XᵀX)⁻¹Xᵀy
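As a quick illustration, the normal equations above can be solved directly. A minimal NumPy sketch with made-up toy data (the slides themselves use R; all names and values here are illustrative):

```python
import numpy as np

# Straight-line least squares via the normal equations X'X alpha = X'y.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=x.size)  # true line: 2 + 3x

X = np.column_stack([np.ones_like(x), x])   # design matrix with columns [1, x]
alpha = np.linalg.solve(X.T @ X, X.T @ y)   # alpha_hat = (X'X)^{-1} X'y
mu = X @ alpha                              # fitted values
```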

CURVED RELATIONSHIPS

Linear fit is not always OK


Judged by eye, or after studying residuals

CURVED RELATIONSHIPS
Linear fit too simple? Add higher powers of x:

µi = α0 + α1 xi + α2 xi2 + α3 xi3 + . . .

More columns in matrix X:

    X = [1 x1 x1² x1³ … x1^m ;
         1 x2 x2² x2³ … x2^m ;
         ⋮
         1 xn xn² xn³ … xn^m ]

Same regression equations:

    XᵀXα̂ = Xᵀy  ⇒  α̂ = (XᵀX)⁻¹Xᵀy
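The same machinery works with a polynomial basis: only the design matrix changes. A hedged NumPy sketch (degree and data are illustrative):

```python
import numpy as np

# Polynomial basis regression: more columns in X, same normal equations.
x = np.linspace(-1, 1, 40)
y = np.sin(3 * x)

degree = 5
X = np.vander(x, degree + 1, increasing=True)   # columns 1, x, x^2, ..., x^5
alpha, *_ = np.linalg.lstsq(X, y, rcond=None)   # lstsq is more stable than solve
mu = X @ alpha
```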

BASIS FUNCTIONS

Regression model µ = Xα
Columns of X are the basis functions (here: the polynomial basis)
With sorted x for a nice visual representation

Caution: Higher degree polynomials are numerically unstable!


EXAMPLE: MOTORCYCLE DATA

Simulated crash experiment


Acceleration of motorcycle helmets measured

EXAMPLE: MOTORCYCLE DATA
High degree needed for decent curve fit
Bad numerical condition (use orthogonal polynomials)

EXAMPLE: MOTORCYCLE DATA
Slight data changes can change the fit by a huge amount
Longer left part (near zero)
Notice wiggles

TROUBLE WITH POLYNOMIALS

High degree (10 or more) may be needed


Basis functions (powers of x) are global
Moving one end (vertically) moves other end too
Good fit at one end spoils things at other end
Unexpected wiggles
The higher the degree, the more sensitive the fit
Global polynomials are a dead end

=⇒ Use local instead of global functions!

B-Splines

B-SPLINE TYPES
Definition of B-splines: Basis functions are non-zero only over the
intervals between d + 2 adjacent knots, where d is the degree of the
basis (for example d = 3 for a cubic spline). To define an M-parameter
B-spline basis, we need M + d + 1 knots,
t1 < t2 < … < t(M+d+1), where the interval over which the spline is to
be evaluated lies within [t(d+1), t(M+1)] (so that the first and last d
knot locations are essentially arbitrary). A d-th degree spline can then
be represented as

    f(x) = Σ_{j=1}^M αj Bj(x; d),

where the αj are unknown spline coefficients.

B-SPLINE TYPES
The j-th B-spline basis function of degree d is defined recursively as

    Bj(x; d) = (x − tj)/(t(j+d) − tj) · Bj(x; d−1)
             + (t(j+d+1) − x)/(t(j+d+1) − t(j+1)) · B(j+1)(x; d−1),   j = 1, …, M,

where

    Bj(x; 0) = 1 if x ∈ [tj, t(j+1)), and 0 otherwise.

More detailed information about the B-spline basis can be found for
example in Eilers and Marx (1996).
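The recursion above can be sketched directly. A minimal Python implementation assuming equally spaced knots (function name, knot grid, and 0-based indexing are illustrative choices):

```python
import numpy as np

# Cox-de Boor recursion for a single B-spline basis function B_j(x; d).
def bspline_basis(x, knots, j, d):
    """Evaluate B_j(x; d) on the knot sequence `knots` (0-based j)."""
    if d == 0:
        # degree-0 B-spline: indicator of the half-open interval [t_j, t_{j+1})
        return np.where((knots[j] <= x) & (x < knots[j + 1]), 1.0, 0.0)
    left = (x - knots[j]) / (knots[j + d] - knots[j])
    right = (knots[j + d + 1] - x) / (knots[j + d + 1] - knots[j + 1])
    return (left * bspline_basis(x, knots, j, d - 1)
            + right * bspline_basis(x, knots, j + 1, d - 1))

knots = np.arange(-3.0, 8.0)           # equally spaced knots -3, -2, ..., 7
x = np.linspace(0.0, 4.0, 101)
B = bspline_basis(x, knots, 3, 3)      # one cubic B-spline, support [0, 4)
```

Inside the interval covered by the full basis, the B-splines are non-negative and sum to one at every point.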

B-SPLINE TYPES
Linear B-Spline:
Two pieces, each a straight line, rest zero
Nicely connected at knots (t1 to t3 ): same values
Slope jumps at knots

B-SPLINE TYPES
Quadratic B-Spline:
Three pieces, each a quadratic segment, rest zero
Nicely connected at knots (t1 to t4 ): same values and slopes
Shape similar to Gaussian

B-SPLINE TYPES
Cubic B-Spline:
Four pieces, each a cubic segment, rest zero
At knots (t1 to t5 ): same values, first & second derivatives
Shape more similar to Gaussian

B-SPLINES IN ACTION

Basis matrix B
Columns are B-splines:

    B1(x1)  B2(x1)  B3(x1)  …  BM(x1)
    B1(x2)  B2(x2)  B3(x2)  …  BM(x2)
      ⋮       ⋮       ⋮     ⋱    ⋮
    B1(xn)  B2(xn)  B3(xn)  …  BM(xn)

In each row only a few elements are non-zero (degree plus one)
Only a few basis functions contribute to µi = Σj bij αj = Bi·ᵀα

B-SPLINES IN ACTION
B-splines are local functions, look like Gaussian
B-splines are columns of basis matrix B
Scaling and summing gives fitted values: µ = Bα
The knots determine the B-spline basis
Polynomial pieces make up B-splines, join at knots
General patterns of knots are possible
We consider only equal-spacing
Number of knots determines width and number of B-splines
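The points above can be sketched end to end: build the basis matrix B on an equal-spaced knot grid and fit µ = Bα by ordinary least squares. This sketch uses SciPy's `BSpline.design_matrix` (available in SciPy ≥ 1.8); the data, knot grid, and padding are illustrative:

```python
import numpy as np
from scipy.interpolate import BSpline

# Build a cubic B-spline basis matrix and fit mu = B alpha by least squares.
x = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * x)

d = 3
knots = np.linspace(-0.4, 1.4, 19)            # equal-spaced, padded beyond [0, 1]
B = BSpline.design_matrix(x, knots, d).toarray()  # n x M basis matrix
alpha, *_ = np.linalg.lstsq(B, y, rcond=None)
mu = B @ alpha                                # scaled and summed B-splines
```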

P-Splines

SMOOTHNESS AND ROUGHNESS

More B-splines in basis: more detail is possible


But it is not necessary!
Perfectly smooth curves µ = Bα are possible
It all depends on α
P-spline idea: control smoothness of α
Introduce a penalty on roughness of α
While using a “rich” B-spline basis

SMOOTHNESS AND ROUGHNESS
The coefficients determine roughness
High roughness: α erratic
Little roughness: smoothly varying α
Simple numerical measure:

    R = Σ_{j=2}^M (αj − αj−1)²

Or the RMS "change to neighbor": r = √(R / (M − 1))
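The measures R and r can be computed with a single call to `np.diff`. A small sketch with illustrative coefficient vectors (one smooth, one erratic):

```python
import numpy as np

# Roughness R = sum of squared first differences, and RMS change r.
alpha_smooth = np.sin(np.linspace(0, np.pi, 20))          # smoothly varying
alpha_rough = np.random.default_rng(1).normal(size=20)    # erratic

def roughness(alpha):
    d = np.diff(alpha)                  # (alpha_j - alpha_{j-1}), j = 2..M
    R = np.sum(d ** 2)
    r = np.sqrt(R / (len(alpha) - 1))
    return R, r

R_s, r_s = roughness(alpha_smooth)
R_r, r_r = roughness(alpha_rough)      # much larger than R_s
```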

DIFFERENCES AND MATRICES

We are interested in ∆α = (α2 − α1, …, αM − αM−1)
A special matrix makes life easy: ∆α = Dα, e.g. for M = 4:

    D = ⎛−1   1   0   0⎞
        ⎜ 0  −1   1   0⎟
        ⎝ 0   0  −1   1⎠

D has M − 1 rows and M columns

ROUGHNESS WITH A MATRIX

Roughness measure R:

    R = ‖∆α‖² = ‖Dα‖² = αᵀDᵀDα

Matrix D is easily computed (in R: D <- diff(diag(M)))
So we have an easy tool to express roughness

PENALIZING LEAST SQUARES

We set a double goal:


good fit to the data: low S = ‖y − Bα‖²
smooth curve, i.e. low roughness: low R = ‖Dα‖²
Balance the two in one objective function (λ > 0):

    Q = S + λR = ‖y − Bα‖² + λ‖Dα‖²

Last term known as penalty


λ is smoothing (or penalty or regularization) parameter
User sets λ (for now, automatic choice later)
Penalized least squares

PENALIZING LEAST SQUARES
Minimize

    Q = S + λR = ‖y − Bα‖² + λ‖Dα‖²
      = yᵀy − 2αᵀBᵀy + αᵀ(BᵀB + λDᵀD)α

Set the derivative w.r.t. α to zero; result:

    (BᵀB + λDᵀD)α̂ = Bᵀy  ⇒  α̂ = (BᵀB + λDᵀD)⁻¹Bᵀy

A small modification of BᵀBα̂ = Bᵀy
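Putting the pieces together, the penalized normal equations can be solved directly. A NumPy/SciPy sketch (λ, knot grid, and data are illustrative choices; SciPy ≥ 1.8 assumed for `design_matrix`):

```python
import numpy as np
from scipy.interpolate import BSpline

# P-spline fit: solve (B'B + lambda D'D) alpha_hat = B'y.
rng = np.random.default_rng(2)
x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

d = 3
knots = np.linspace(-0.4, 1.4, 19)                # equal-spaced, padded knots
B = BSpline.design_matrix(x, knots, d).toarray()
M = B.shape[1]
D = np.diff(np.eye(M), n=2, axis=0)               # second-order difference matrix
lam = 1.0                                         # smoothing parameter, set by user
alpha = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
mu = B @ alpha                                    # smooth fitted curve
```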

HIGHER ORDER DIFFERENCES

First order: ∆α = (α2 − α1, …, αM − αM−1)
Second order: ∆(∆α) = ∆²α

    ∆²α = [(αj − αj−1) − (αj−1 − αj−2)]_{j=3,…,M}
        = [αj − 2αj−1 + αj−2]_{j=3,…,M}

Matrix in R: D <- diff(diag(M), diff = 2), e.g. for M = 5:

    D2 = ⎛1  −2   1   0   0⎞
         ⎜0   1  −2   1   0⎟
         ⎝0   0   1  −2   1⎠

HIGHER ORDER DIFFERENCES
Third order:

    ∆³α = [αj − 3αj−1 + 3αj−2 − αj−3]_{j=4,…,M}

Matrix, e.g. for M = 6:

    D3 = ⎛−1   3  −3   1   0   0⎞
         ⎜ 0  −1   3  −3   1   0⎟
         ⎝ 0   0  −1   3  −3   1⎠

In R: D <- diff(diag(M), diff = 3)
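For reference, a NumPy analogue of R's diff(diag(M), diff = d) for the difference matrices of orders 1 to 3 (M = 6 chosen to match the example above):

```python
import numpy as np

# Difference matrices via repeated row-wise differencing of the identity.
M = 6
D1 = np.diff(np.eye(M), n=1, axis=0)   # (M-1) x M, rows (-1, 1)
D2 = np.diff(np.eye(M), n=2, axis=0)   # (M-2) x M, rows (1, -2, 1)
D3 = np.diff(np.eye(M), n=3, axis=0)   # (M-3) x M, rows (-1, 3, -3, 1)
```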

INTERPOLATION WITHOUT A PENALTY

Interpolation with B-splines:
fit B-splines by regression: ŷ = Bα̂
compute B-splines at a new point x̃: Bj(x̃), j = 1, …, M
µ(x̃) = Σ_{j=1}^M Bj(x̃) α̂j
Works fine if you can estimate α
Regression may fail with large gaps in x
Then some B-splines have no data in their support
Singular system of equations

INTERPOLATION WITH A PENALTY

Penalty lets elements of α hold hands


They bridge the gap(s) automatically!

EXTRAPOLATION

Extrapolation works the same


Just choose the domain of x wide enough
Take a generous number of (cubic) B-splines
Penalty again bridges the gap
Smooth fit and neat extrapolation automatically
With differences of order d in penalty

INTER- AND EXTRAPOLATION

(Figures: P-spline inter- and extrapolation with difference penalties of order d = 1, d = 2, and d = 3)
STRONG SMOOTHING
Example (λ = 10⁶):

WRAP-UP

Basis with many B-splines allows detail


But smoothness depends on coefficients α in Bα
Difference penalty on α to tune smoothness
P-splines allow flexible interpolation and extrapolation

