
Regression & other methods for Functional Data

Fabian Scheipl
Institut für Statistik
Ludwig-Maximilians-Universität München

adidas - March 2019


Part I

Background: Functional Data

2 / 327
Introduction
Overview
From high-dimensional to functional data

Descriptive Statistics for Functional Data

Basis Representation of Functional Data

Summary

4 / 327
Introduction
Overview
Examples of functional data: Berkeley growth study

[Figure: Berkeley growth study — Height (cm) vs. Age (years)]
4 / 327
Introduction
Overview
Examples of functional data: Handwriting

[Figure: handwriting trajectory — y(t) vs. x(t)]

5 / 327
Introduction
Overview
Examples of functional data: Brain scan images

6 / 327
Introduction
Overview
Characteristics of functional data:
[Figures: Berkeley growth curves (Height vs. Age) and handwriting trajectories (y(t) vs. x(t))]

I Several measurements for the same statistical unit, often over time
I Sampling grid is not necessarily equally spaced; data may be sparse
I Smooth variation that could (in principle) be assessed as often as desired
I Noisy observations
I Many observations of the same data-generating process
(in contrast to time series analysis, where typically only one realization is observed)
J. Ramsay and Silverman 2005
7 / 327
Introduction
Overview

Aims of functional data analysis:


I Represent the data → interpolation, smoothing
I Display the data → registration, outlier detection
I Study sources of pattern and variation → functional principal
component analysis, canonical correlation analysis
I Explain variation in a dependent variable by using independent
variable information → functional regression models
I No forecasting / extrapolation ↔ time series analysis
[Figure: noisy, discretely observed curves x(t) over t ∈ [0, 1]]

J. Ramsay and Silverman 2005

8 / 327
Introduction
Overview

Aims of functional data analysis:


I Represent the data → interpolation, smoothing
I Display the data → registration, outlier detection
I Study sources of pattern and variation → functional principal
component analysis, canonical correlation analysis
I Explain variation in a dependent variable by using independent
variable information → functional regression models
I No forecasting / extrapolation ↔ time series analysis

Scalar-on-Function:     yi = µ + ∫ xi(s) β(s) ds + ε

Function-on-Scalar:     yi(t) = µ(t) + xi β(t) + ε(t)

Function-on-Function:   yi(t) = µ(t) + ∫ xi(s) β(s, t) ds + ε(t)
8 / 327
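A minimal R sketch (not from the slides) of how the scalar-on-function linear predictor can be computed from discretized curves: the integral ∫ xi(s)β(s)ds is approximated by a Riemann sum. All data, grid and coefficient values below are made up for illustration.

# toy scalar-on-function predictor via a Riemann sum (simulated placeholder data)
set.seed(1)
s      <- seq(0, 1, length.out = 100)                       # evaluation grid for the functional covariate
n      <- 50
X      <- t(replicate(n, cumsum(rnorm(length(s))) / 10))    # n x 100 matrix, X[i, j] = x_i(s_j)
beta_s <- sin(2 * pi * s)                                    # hypothetical coefficient function beta(s) on the grid
mu     <- 1
ds     <- diff(s)[1]                                         # grid spacing (equidistant grid assumed)

# y_i = mu + int x_i(s) beta(s) ds + eps_i, with the integral replaced by sum_j x_i(s_j) beta(s_j) * ds
y <- mu + as.vector(X %*% beta_s) * ds + rnorm(n, sd = 0.1)
head(y)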
Outline

Introduction
Overview
From high-dimensional to functional data

Descriptive Statistics for Functional Data

Basis Representation of Functional Data

Summary

9 / 327
Introduction
From high-dimensional to functional data

Standard setting in multivariate data analysis:

[Schematic: data matrix with n observations (rows) and p variables (columns)]

I Observations xi = (xi1, . . . , xip) for i = 1, . . . , n

I Model complexity increases with p (Curse of Dimensionality)

9 / 327
Introduction
From high-dimensional to functional data
Data with natural ordering:

    xi1, xi2, xi3, . . . , xip   observed at ordered time points t1 < t2 < · · · < tp

I Longitudinal data
I Ordering along time domain (one-dimensional)

Functional data:

    xi1, xi2, xi3, . . . , xip   observed at points t1, . . . , tp within a continuous domain T

I Basic idea: Model discretely observed data by functions on domain T


10 / 327
Introduction
From high-dimensional to functional data
Functional data:

I Observations xi (t), t ∈ T for i = 1, . . . , n


I Number of observable values xi (t1 ), . . . , xi (tp )
I in theory: p → ∞
I in practice: p < ∞
I Domain T
I Realizations x1 , . . . , xn of X are curves (d = 1), images (d = 2), 3D
arrays (d = 3), etc.

11 / 327
Introduction

Descriptive Statistics for Functional Data


Pointwise measures
Covariance and Correlation Functions

Basis Representation of Functional Data

Summary

12 / 327
Descriptive Statistics for Functional Data
Pointwise measures
Example: Growth curves of 54 girls

[Figure: growth curves of 54 girls — height (cm) vs. age (years)]

Summary Statistics:
I Based on observed functions x1 (t), . . . , xn (t)
I Characterize location, variability, dependence between time points, ...
12 / 327
Descriptive Statistics for Functional Data
Pointwise measures
Example: Growth curves of 54 girls
[Figure: growth curves (height in cm) and centered curves (deviation from mean height in cm), each vs. age (years)]

Sample mean function:           µ̂X(t) = (1/n) Σ_{i=1}^{n} xi(t)

Centered curves:                xi(t) − µ̂X(t)
I Pointwise calculation for each value t ∈ T
I Analogous to multivariate case 13 / 327
Descriptive Statistics for Functional Data
Pointwise measures
Example: Growth curves of 54 girls
[Figure: sample variance function (cm²) and standard deviation function (cm), each vs. age (years)]

Sample variance function:       σ̂²X(t) = 1/(n−1) Σ_{i=1}^{n} (xi(t) − µ̂X(t))²

Standard deviation function:    σ̂X(t) = √(σ̂²X(t))

14 / 327
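A small R sketch (not from the slides) of these pointwise summaries for regularly sampled curves stored row-wise in a matrix; the curve matrix and grid below are simulated placeholders.

# pointwise summary statistics for curves in an n x p matrix (rows = curves, columns = time points)
t_grid <- seq(1, 18, length.out = 31)                           # e.g. age in years
curves <- matrix(rnorm(54 * 31, mean = 130, sd = 10), nrow = 54)

mu_hat   <- colMeans(curves)                                    # sample mean function, computed pointwise
centered <- sweep(curves, 2, mu_hat)                            # centered curves x_i(t) - mu_hat(t)
var_hat  <- apply(curves, 2, var)                               # sample variance function (uses 1/(n-1))
sd_hat   <- sqrt(var_hat)                                       # standard deviation function

matplot(t_grid, t(centered), type = "l", lty = 1,
        xlab = "t", ylab = "centered curves")
lines(t_grid, sd_hat, lwd = 2)                                  # overlay pointwise SD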
Outline

Introduction

Descriptive Statistics for Functional Data


Pointwise measures
Covariance and Correlation Functions

Basis Representation of Functional Data

Summary

15 / 327
Descriptive Statistics for Functional Data
Covariance and Correlation Functions

Covariance / Correlation functions:


I Measure dependence between different (time) points s, t ∈ T
I Sample covariance function:

      v̂X(s, t) = 1/(n−1) Σ_{i=1}^{n} (xi(s) − µ̂X(s)) · (xi(t) − µ̂X(t))

I Sample correlation function:

      ĉX(s, t) = v̂X(s, t) / √(σ̂²X(s) σ̂²X(t))

15 / 327
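A small R sketch (not from the slides): for regularly sampled curves in an n x p matrix, the sample covariance and correlation functions evaluated on the grid are just the column-wise covariance and correlation matrices. The data below are simulated placeholders.

# sample covariance / correlation functions for curves in an n x p matrix
curves <- matrix(rnorm(54 * 31), nrow = 54)
t_pts  <- seq(1, 18, length.out = ncol(curves))

v_hat <- cov(curves)   # p x p matrix with entries vhat_X(t_j, t_l), uses the 1/(n-1) scaling
c_hat <- cor(curves)   # p x p matrix with entries chat_X(t_j, t_l)

persp(t_pts, t_pts, v_hat, theta = 30, phi = 30,
      xlab = "s", ylab = "t", zlab = "covariance")   # perspective plot of the covariance surface
image(t_pts, t_pts, c_hat, xlab = "s", ylab = "t")   # heat map of the correlation surface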
Descriptive Statistics for Functional Data
Covariance and Correlation Functions

Example: Growth curves of 54 girls

Sample covariance function

[Figure: perspective and contour plot of the sample covariance function v̂X(s, t) over age (years) × age (years)]

16 / 327
Descriptive Statistics for Functional Data
Covariance and Correlation Functions

Example: Growth curves of 54 girls

Sample correlation function

[Figure: perspective and contour plot of the sample correlation function ĉX(s, t) over age (years) × age (years)]

17 / 327
Introduction

Descriptive Statistics for Functional Data

Basis Representation of Functional Data


Regularly and irregularly sampled functional data
Basis functions
Basis representations for functional data
Most popular choices of basis functions
Smoothness and regularization
Other representations of functional data

Summary

18 / 327
Basis Representation of Functional Data
Regularly and irregularly sampled functional data

Example: bacterial growth curve — i-th growth curve xi(t)

[Figure: observed bacterial growth curve y vs. t]

Observed measurements: the pairs (tj, xi(tj)) for j = 1, . . . , p, i.e. a column of time
points t1, . . . , tp next to a column of values xi(t1), . . . , xi(tp).

18 / 327
Basis Representation of Functional Data
Regularly and irregularly sampled functional data
Example: bacterial growth curves — sample of curves x1(t), . . . , xN(t)

[Figure: observed bacterial growth curves y vs. t]

Observed measurements in 'wide format': one row per time point t1, . . . , tp and one
column per curve, holding the values x1(tj), . . . , xN(tj).

⇒ Regular functional data:


I functions observed on a common grid (often equidistant)
I simpler case
I to some extent, methods of multivariate statistics can be applied directly
19 / 327
Basis Representation of Functional Data
Regularly and irregularly sampled functional data

Example: bacterial growth curves — sample of curves x1(t), . . . , xN(t)

[Figure: observed bacterial growth curves y vs. t]

Observed measurements in 'long format': one row per single observation, stacking the
pairs (ti,j, xi(ti,j)) for j = 1, . . . , pi and i = 1, . . . , N.

⇒ Irregular functional data:
I functions observed at different time points
I sometimes only sparsely sampled
I more difficult, but often encountered in practice
20 / 327
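A small R sketch (not from the slides) of the two storage formats, with made-up values: the wide format needs a common grid, the long format also accommodates curve-specific grids.

# the same two toy curves in 'wide' and 'long' format
t_grid <- 0:10
wide <- cbind(t  = t_grid,
              x1 = sin(t_grid / 2),
              x2 = cos(t_grid / 2))            # regular data: one row per time point, one column per curve

# long format: one row per (curve, time point) pair; this also works for irregular data
# where each curve i has its own grid t_{i,1}, ..., t_{i,p_i}
long <- data.frame(id = rep(c("x1", "x2"), each = length(t_grid)),
                   t  = rep(t_grid, times = 2),
                   x  = c(wide[, "x1"], wide[, "x2"]))
head(long)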
Outline

Introduction

Descriptive Statistics for Functional Data

Basis Representation of Functional Data


Regularly and irregularly sampled functional data
Basis functions
Basis representations for functional data
Most popular choices of basis functions
Smoothness and regularization
Other representations of functional data

Summary

21 / 327
Basis Representation of Functional Data
Basis functions

Basis representation: construct functions as a weighted sum of basis functions
bk(t), k = 1, . . . , K:

      f(t) = Σ_{k=1}^{K} θk bk(t)

with basis coefficients θ1, . . . , θK.

[Figure: scaled basis functions θk bk(t) and their sum f(t) = Σk θk bk(t)]

21 / 327
Basis Representation of Functional Data
Basis functions

Basis representation: the functional shape is determined by the basis coefficients,
i.e. by the vector (θ1, . . . , θK) with one coefficient per basis function k = 1, . . . , K.

[Figure: scaled basis functions θk bk(t) and the resulting curve]

Function given by

      f(t) = Σ_{k=1}^{K} θk bk(t)

22 / 327
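A small R sketch (not from the slides) of evaluating f(t) = Σk θk bk(t) from a B-spline basis; the basis dimension and the coefficient values are made up for illustration.

# evaluate a function from a B-spline basis and a coefficient vector
library(splines)

t     <- seq(0, 50, length.out = 200)
B     <- bs(t, df = 8, degree = 3, intercept = TRUE)   # 200 x 8 matrix of basis evaluations b_k(t_j)
theta <- c(0, 1, 3, 4, 4.5, 4.7, 4.8, 4.8)             # one (made-up) coefficient per basis function

f_t <- as.vector(B %*% theta)                          # f(t_j) = sum_k theta_k b_k(t_j)
plot(t, f_t, type = "l", ylab = "f(t)")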
Outline

Introduction

Descriptive Statistics for Functional Data

Basis Representation of Functional Data


Regularly and irregularly sampled functional data
Basis functions
Basis representations for functional data
Most popular choices of basis functions
Smoothness and regularization
Other representations of functional data

Summary

23 / 327
Basis Representation of Functional Data
Basis representations for functional data
Basis representation: approximate the data with basis functions

[Figure: discrete observations of a curve and its basis-function approximation]

⇒ seek to specify θ̂i,1, . . . , θ̂i,K such that

      xi(t) ≈ Σ_{k=1}^{K} θ̂i,k bk(t).

⇒ Popular criterion:
Specify θ̂i,1, . . . , θ̂i,K such that the quadratic distance becomes minimal, i.e.

      Σ_j ( xi(tj) − Σ_{k=1}^{K} θi,k bk(tj) )²  →  min over θi,k

summing over the observed time points tj.
23 / 327
Basis Representation of Functional Data
Basis representations for functional data

Basis representation: sample of curves x1(t), . . . , xN(t)

[Figure: basis-representation fits of the sample curves]

Basis representations of the observed measurements: a K × N coefficient matrix, with
one column (θ̂i,1, . . . , θ̂i,K) per curve i = 1, . . . , N.

Functional observations represented as xi(t) ≈ Σ_{k=1}^{K} θ̂i,k bk(t).

24 / 327
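A small R sketch (not from the slides) of the unpenalized least-squares fit of basis coefficients for a single observed curve; the toy measurements below stand in for the (tj, xi(tj)) pairs.

# least-squares fit of basis coefficients for one curve
library(splines)

t_obs <- seq(0, 50, length.out = 60)
x_obs <- 4 / (1 + exp(-(t_obs - 20) / 4)) + rnorm(60, sd = 0.2)   # noisy toy growth curve

B         <- bs(t_obs, df = 10, intercept = TRUE)   # basis evaluations b_k(t_j)
theta_hat <- coef(lm(x_obs ~ B - 1))                # minimizes sum_j (x_i(t_j) - sum_k theta_k b_k(t_j))^2

x_fit <- as.vector(B %*% theta_hat)
plot(t_obs, x_obs); lines(t_obs, x_fit)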
Outline

Introduction

Descriptive Statistics for Functional Data

Basis Representation of Functional Data


Regularly and irregularly sampled functional data
Basis functions
Basis representations for functional data
Most popular choices of basis functions
Smoothness and regularization
Other representations of functional data

Summary

25 / 327
Basis Representation of Functional Data
Most popular choices of basis functions

Basis representation: B-spline bases

[Figure: B-spline basis functions of degree 1, 2 and 3]

I piecewise polynomials of degree d
I basis functions consist of (d − 1)-times differentiably connected polynomial pieces
I the pieces connect at knots, whose number determines the number of basis functions
I cheap to compute & numerically stable
I local support: sparse matrix of basis function evaluations

25 / 327
Basis Representation of Functional Data
Most popular choices of basis functions

Other popular bases:


I Fourier basis: containing harmonics with different frequencies
⇒ periodic functions
I Wavelets:
⇒ for peaked, ragged functions.
I Thin-plate splines
⇒ better theory, also for surfaces.

26 / 327
Outline

Introduction

Descriptive Statistics for Functional Data

Basis Representation of Functional Data


Regularly and irregularly sampled functional data
Basis functions
Basis representations for functional data
Most popular choices of basis functions
Smoothness and regularization
Other representations of functional data

Summary

27 / 327
Basis Representation of Functional Data
Smoothness and regularization

Basis representation

[Figure: basis functions θi bi(t) and the fitted sum Σi θi bi(t)]

I how many knots for the basis?
I trade-off between over-fitting and under-fitting

27 / 327
Basis Representation of Functional Data
Smoothness and regularization

Penalization:
I minimize the quadratic difference from the data plus a roughness penalty term.
Specify θ̂i,1, . . . , θ̂i,K to minimize

      Σ_{j=1}^{p} ( xi(tj) − Σ_{k=1}^{K} θi,k bk(tj) )²  +  λ · pen(θi)  →  min over θi,k

I with, e.g., a quadratic penalty on second-order differences, i.e.
  pen(θi) = Σ_{k=3}^{K} ((θi,k − θi,k−1) − (θi,k−1 − θi,k−2))², and λ > 0 a smoothing parameter

28 / 327
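A small R sketch (not from the slides): the penalized criterion above has the closed-form solution θ̂i = (B⊤B + λD⊤D)⁻¹ B⊤xi with D the second-order difference matrix, which can be computed directly. Data and λ below are made up.

# penalized least-squares fit with a second-order difference penalty
library(splines)

t_obs  <- seq(0, 50, length.out = 60)
x_obs  <- 4 / (1 + exp(-(t_obs - 20) / 4)) + rnorm(60, sd = 0.2)
B      <- bs(t_obs, df = 20, intercept = TRUE)       # deliberately rich basis
K      <- ncol(B)
D      <- diff(diag(K), differences = 2)             # second-order difference matrix
lambda <- 10                                         # smoothing parameter (fixed here; usually estimated)

theta_pen <- solve(crossprod(B) + lambda * crossprod(D), crossprod(B, x_obs))
plot(t_obs, x_obs); lines(t_obs, B %*% theta_pen)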
Basis Representation of Functional Data
Smoothness and regularization
[Figure: penalized basis fits with λ = 0, λ = 1 and λ = 1000]
I λ is typically estimated from the data, e.g. using cross validation

29 / 327
Outline

Introduction

Descriptive Statistics for Functional Data

Basis Representation of Functional Data


Regularly and irregularly sampled functional data
Basis functions
Basis representations for functional data
Most popular choices of basis functions
Smoothness and regularization
Other representations of functional data

Summary

30 / 327
Basis Representation of Functional Data
Other representations of functional data

I Functional principal components (Wang et al. 2016):
I basis representation learned from the observed data
I “optimal” (low-dimensional) basis
I more on this later
I Gaussian processes: x(t) ∼ GP(µX(t), σX(t, t′)) (Shi and Choi 2011)
I Gaussianity assumption
I σX(t, t′) from some parametric family
I µX, σX estimated from data
I Differential equations / dynamics (J. Ramsay and Hooker 2017):
I represent functional data in terms of differential equations describing their
behavior: d/dt x(t) = f(x(t))
I seems very useful for physical systems, motion data etc.
I (available literature uses spline representations internally)

30 / 327
Introduction

Descriptive Statistics for Functional Data

Basis Representation of Functional Data

Summary

31 / 327
Summary
Functional Data:
I Arises in many different contexts and in many applications (curves,
images,...)
I Observation unit represents the full curve, typically discretized, i.e.
observed on a grid
I Important analysis techniques:
I Smoothing and basis representation
I Functional principal component analysis
I Functional regression

Summary Statistics:
I Give insights into location, variability and time dependence in a
sample of curves
I Pointwise calculation, mostly analogous to multivariate case

31 / 327
Summary

Basis representation:
I Different types of raw functional data: regularly and irregularly
sampled
I (Approximate) representation via bases of functions
I ’true functional representation’
I smoothing / vector representation
I Represent a functional datum in terms of a global, fixed, known
dictionary of basis functions and an observation-specific coefficient
vector.
I Different types of basis functions for different purposes
I Obtaining desired ’smoothness’ via penalization

32 / 327
Part II

Background: Regression

33 / 327
Recap: Linear Models

Recap: Generalized Linear Models

Recap: Non-Linear Effects

Recap: Mixed Models and Random Effects

Recap: Additive Models and Penalization


Recap: Linear Models
Linear Model: Basics
Inference
Model Diagnostics
R-Implementation: LM

Recap: Generalized Linear Models

Recap: Non-Linear Effects

Recap: Mixed Models and Random Effects

Recap: Additive Models and Penalization

35 / 327
Data & Model

Data:
I (yi, xi1, . . . , xip); i = 1, . . . , n
I metric target variable y
I metric or categorical covariates x1, . . . , xp (categorical covariates in binary coding)
Model:
I yi = β0 + β1 xi1 + · · · + βp xip + εi;  i = 1, . . . , n
  ⇒ y = Xβ + ε;  X = [1, x1, . . . , xp]
I i.i.d. residuals/errors εi ∼ N(0, σ²);  i = 1, . . . , n
I estimates ŷi = β̂0 + β̂1 xi1 + · · · + β̂p xip

35 / 327
Interpreting the coefficients

Intercept:
β̂0: estimate for y if all metric x = 0 and all categorical x are in their reference category.
Metric covariates:
β̂m: estimated expected change in y if xm increases by 1 (ceteris paribus).
Categorical covariates (dummy/one-hot encoding):
β̂mc: estimated expected difference in y between observations in category c and the
reference category of xm (ceteris paribus).

36 / 327
Outline

Recap: Linear Models


Linear Model: Basics
Inference
Model Diagnostics
R-Implementation: LM

Recap: Generalized Linear Models

Recap: Non-Linear Effects

Recap: Mixed Models and Random Effects

Recap: Additive Models and Penalization

37 / 327
Linear Model Estimation

β̂ minimizes the sum of squared errors (OLS estimate):

      Σ_{i=1}^{n} (yi − xi⊤β)²  →  min over β,   or equivalently   (y − Xβ)⊤(y − Xβ)  →  min over β

      ⇒ β̂ = (X⊤X)⁻¹ X⊤y

Estimated error variance:

      σ̂² = 1/(n − p) Σ_{i=1}^{n} (yi − xi⊤β̂)² = 1/(n − p) ε̂⊤ε̂

37 / 327
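A short R sketch (not from the slides) computing these formulas directly on a simulated toy data set and checking the result against lm().

# OLS estimate and error variance by hand
set.seed(1)
n <- 100; x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 2 * x1 - x2 + rnorm(n)
X <- cbind(1, x1, x2)                                # design matrix [1, x1, x2]

beta_hat  <- solve(crossprod(X), crossprod(X, y))    # (X'X)^(-1) X'y
resid_hat <- y - X %*% beta_hat
sigma2    <- sum(resid_hat^2) / (n - ncol(X))        # 1/(n-p) * e'e

cbind(by_hand = beta_hat, lm = coef(lm(y ~ x1 + x2)))  # matches the lm() fit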
Properties of β̂

I unbiased: E(β̂) = β
I Cov(β̂) = σ²(X⊤X)⁻¹
I for Gaussian ε:  β̂ ∼ N(β, σ²(X⊤X)⁻¹)

38 / 327
Tests

Possible settings:
1. Testing for significance of a single coefficient:
   H0: βj = 0  vs.  HA: βj ≠ 0
2. Testing for significance of a subvector βt = (βt1, . . . , βtr)⊤:
   H0: βt = 0  vs.  HA: βt ≠ 0
3. Testing for equality: H0: βj − βr = 0  vs.  HA: βj − βr ≠ 0
General:
Testing linear hypotheses H0: Cβ = d

39 / 327
Tests
F-Test:
Compare the sum of squared errors (SSE) of the full model with the SSE under the
restriction H0:

      F = (n − p)/r · (SSE_H0 − SSE)/SSE
        = (Cβ̂ − d)⊤ [σ̂² C(X⊤X)⁻¹C⊤]⁻¹ (Cβ̂ − d) / r  ∼  F(r, n − p) under H0

t-Test:
Test significance of a single coefficient:

      t = β̂j / √(V̂ar(β̂j))  ∼  t(n − p) under H0

      F = t² = β̂j² / V̂ar(β̂j)  ∼  F(1, n − p) under H0

40 / 327
Outline

Recap: Linear Models


Linear Model: Basics
Inference
Model Diagnostics
R-Implementation: LM

Recap: Generalized Linear Models

Recap: Non-Linear Effects

Recap: Mixed Models and Random Effects

Recap: Additive Models and Penalization

41 / 327
Residuals in the linear model

The estimated residuals ε̂ are, in general, neither uncorrelated nor homoscedastic:

      ŷ = Xβ̂ = X(X⊤X)⁻¹X⊤ y = Hy,   with hat matrix H = X(X⊤X)⁻¹X⊤
      ⇒ ε̂ = y − ŷ = (I − H)y
      ⇒ Cov(ε̂) = σ²(I − H)

41 / 327
Types of Residuals

I ordinary residuals: ε̂ (not independent, no constant variance)

I standardized residuals: ri = ε̂i / (σ̂ √(1 − hii))  (constant variance)

I studentized residuals: ri* = ε̂i / (σ̂(−i) √(1 − hii)):
  use for anomaly / outlier detection.

I partial residuals: ε̂xj,i = ε̂i + β̂j xij:
  check linearity, additivity.

42 / 327
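A short R sketch (not from the slides): these residual types are available via built-in extractor functions, illustrated here on the built-in cars data.

# residual types for an lm fit
fit <- lm(dist ~ speed, data = cars)

e_ord  <- resid(fit)                            # ordinary residuals
r_std  <- rstandard(fit)                        # standardized residuals
r_stud <- rstudent(fit)                         # studentized residuals (leave-one-out sigma)
h_ii   <- hatvalues(fit)                        # leverages h_ii from the hat matrix

# standardized residuals by hand: e_i / (sigma_hat * sqrt(1 - h_ii))
all.equal(r_std, e_ord / (summary(fit)$sigma * sqrt(1 - h_ii)))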
Graphical model checks:

I model structure: ri vs ŷi


I linearity: ε̂xj ,i vs xj
I variance homogeneity: ri vs ŷi , xj
I autocorrelation: ri , ε̂i vs i (i = time, e.g.)

43 / 327
Outline

Recap: Linear Models


Linear Model: Basics
Inference
Model Diagnostics
R-Implementation: LM

Recap: Generalized Linear Models

Recap: Non-Linear Effects

Recap: Mixed Models and Random Effects

Recap: Additive Models and Penalization

44 / 327
Linear Model in R:

Linear Models in R: lm model specification:


I m <- lm(y ~ x1 + x2, data=XY)
interactions:
I lm(y ~ x1*x2) equivalent to lm(y ~ x1 + x2 + x1:x2)
methods for lm-objects:
I summary(),anova(),fitted(),predict(),resid()
I coef(), confint(), vcov(), influence()
I plot()
etc...

44 / 327
Example: Munich Rents 1999

I data: 3082 apartments


I target: net rent (DM/sqm)
I metric covariates: size, year of construction
I categorical covariates: area (normal/good/best), central heating
(yes/no), bathroom / kitchen fittings (normal/superior)

45 / 327
Model in R
no interaction:
y = β0 + β1 ∗ x1 + β2 ∗ x2.2 + β3 ∗ x2.3
miet1 <- lm(rentsqm ~ size + area)
(beta.miet1 <- coef(miet1))

## (Intercept) size areagood areabest


## 18.2429185 -0.0715132 0.9059416 3.4196824

with interaction:
y = β0 + β1 ∗ x1 + β2 ∗ x2.2 + β3 ∗ x2.3 + β4 x1 x2.2 + β5 x1 x2.3
miet2 <- lm(rentsqm ~ size * area)
(beta.miet2 <- coef(miet2))

## (Intercept) size areagood areabest


## 18.67890804 -0.07817872 0.11145940 0.87292650
## size:areagood size:areabest
## 0.01182596 0.03302475
Model Visualisation

[Figure: fitted regression lines for miet1 (no interaction) and miet2 (with interaction) — net rent (DM/sqm) vs. size (sqm), separately by area (normal/good/best)]


Tests

anova(update(miet2, . ~ -.), miet2)

## Analysis of Variance Table


##
## Model 1: rentsqm ~ 1
## Model 2: rentsqm ~ size * area
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 3081 69521
## 2 3076 60064 5 9457.3 96.866 < 2.2e-16 ***
## ---
## Signif. codes:
## 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Tests
round(summary(miet2)$coefficients, 3)

## Estimate Std. Error t value Pr(>|t|)


## (Intercept) 18.679 0.329 56.736 0.000
## size -0.078 0.005 -16.376 0.000
## areagood 0.111 0.494 0.226 0.821
## areabest 0.873 1.542 0.566 0.571
## size:areagood 0.012 0.007 1.716 0.086
## size:areabest 0.033 0.018 1.797 0.072

round(anova(miet2), 3)

## Analysis of Variance Table


##
## Response: rentsqm
## Df Sum Sq Mean Sq F value Pr(>F)
## size 1 8071 8071.3 413.346 <2e-16 ***
## area 2 1284 641.9 32.875 <2e-16 ***
## size:area 2 102 51.1 2.617 0.073 .
## Residuals 3076 60064 19.5
## ---
## Signif. codes:
## 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Model Comparison

anova(miet1, miet2)

## Analysis of Variance Table


##
## Model 1: rentsqm ~ size + area
## Model 2: rentsqm ~ size * area
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 3078 60166
## 2 3076 60064 2 102.19 2.6168 0.0732 .
## ---
## Signif. codes:
## 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

complex hypotheses / multiple testing: package multcomp


Model diagnostics: plot.lm()
par(mfrow = c(2, 2))
plot(miet2)

[Figure: plot.lm() diagnostics for miet2 — Residuals vs Fitted, Normal Q–Q, Scale–Location, and Residuals vs Leverage (with Cook's distance)]
Model criticism: Linear effect of size?
[Figure: residuals of the rent model plotted against size (sqm)]
Model criticism: Linear effect of size?
plot(size, res_size)
points(sort(unique(size)), tapply(res_size, size, mean), col = "red")
[Figure: residuals plotted against size (sqm); red points mark the mean residual at each observed size]
Alternative Representation of linear models

I y = Xβ + ε
I Gaussian errors: ε ∼ N(0, σ²I)
  ⇒ y ∼ N(Xβ, σ²I)
  ⇒ E(y) = Xβ;  Var(y) = σ²I

54 / 327
Recap: Linear Models

Recap: Generalized Linear Models


Motivation
GLMs: The General Approach
Inference

Recap: Non-Linear Effects

Recap: Mixed Models and Random Effects

Recap: Additive Models and Penalization

55 / 327
Binary Target: Naive Approach

Data:
I binary target y (0 or 1)
I metric and/or categorical covariates x1, . . . , xp
Naive estimates:
      ŷi = β̂0 + β̂1 xi1 + · · · + β̂p xip
I ŷi not binary
I could try to interpret ŷi as P̂(yi = 1)
I no variance homogeneity
I ŷi < 0? ŷi > 1? ⇒ ŷi must be between 0 and 1
Idea:
      P̂(yi = 1) = h(xi⊤β̂)  with  h: (−∞, +∞) → [0, 1]

55 / 327
Binary Target: GLM Approach

I yi ∼ B(1, πi)
I model for E(yi) = P(yi = 1) = πi
I use response function h:  π̂i = h(xi⊤β̂)
  or link function g:  g(π̂i) = xi⊤β̂,  where g = h⁻¹

Logit model:
      π̂i = h(xi⊤β̂) = exp(xi⊤β̂) / (1 + exp(xi⊤β̂))

56 / 327
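A short R sketch (not from the slides): the logistic response function applied to an arbitrary range of linear-predictor values, compared against the built-in logistic CDF.

# logit response function h(eta) = exp(eta) / (1 + exp(eta))
eta    <- seq(-5, 5, length.out = 101)           # values of the linear predictor x_i' beta
pi_hat <- exp(eta) / (1 + exp(eta))

all.equal(pi_hat, plogis(eta))                   # identical to the built-in logistic CDF
plot(eta, pi_hat, type = "l", xlab = "linear predictor", ylab = "P(y = 1)")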
Binary Target: Coefficients of the Logit Model

      πi = exp(xi⊤β) / (1 + exp(xi⊤β))   ⇔   log( πi / (1 − πi) ) = xi⊤β
                                         ⇔   πi / (1 − πi) = exp(β0) exp(β1 xi1) · · · exp(βp xip)

I linear model for the log-odds (logits)

⇒ exp(β̂r) is the factor by which the odds π̂i / (1 − π̂i) change if xir increases by 1.

      exp( (x − x̃)⊤β̂ ) = [ P̂(y = 1|x) / P̂(y = 0|x) ] / [ P̂(y = 1|x̃) / P̂(y = 0|x̃) ]

is the odds ratio between two observations with covariate vectors x and x̃.

57 / 327
Binary Target: Probit- & cloglog-Models

Probit model:
use the standard Gaussian CDF Φ as response function:

      π̂i = Φ(xi⊤β̂)

cloglog model:
response function:

      π̂i = 1 − exp(−exp(xi⊤β̂))

58 / 327
Binary Targets: Expectation and Variance

I no direct connection between expectation (x⊤β) and variance (σ²) in the linear model
I for binary y ∼ B(1, π):
  E(y) = π = P(y = 1) determines Var(y) = π(1 − π)

Overdispersion:
observed variability greater than the theory assumes, e.g. due to
I unobserved heterogeneity
I positively correlated observations
Solution: add a dispersion parameter φ: Var(y) = φ π(1 − π)

59 / 327
Example: Patent Oppositions

I Data: 4832 European patents (European Patent Office)

I Target: opposition against the patent (yes/no; einspruch)
I covariates (metric):
I year of the patent (0 = 1980; jahr)
I citations (azit)
I scope (no. of countries; aland)
I patent claims (ansp)
I covariates (categorical):
I sector (Biotech&Pharma, IT&Semiconductor; branche)
I US patent (uszw)
I patent holder origin (US, D/CH/GB, others; herkunft)

60 / 327
Binary Target: R-Implementation

## The following objects are masked from patent (pos = 4):


##
## aland, ansp, azit, branche, einspruch, herkunft,
## jahr, uszw

pat1 <- glm(einspruch ~ ., data = patent, family = binomial())


round(summary(pat1)$coefficients, 3)

## Estimate Std. Error z value Pr(>|z|)


## (Intercept) -0.771 0.134 -5.765 0.000
## uszwUSPatent -0.392 0.068 -5.795 0.000
## jahr -0.071 0.009 -8.194 0.000
## azit 0.118 0.014 8.297 0.000
## aland 0.084 0.011 7.915 0.000
## ansp 0.018 0.003 5.219 0.000
## brancheBioPharma 0.681 0.084 8.128 0.000
## herkunftD/CH/GB 0.323 0.083 3.897 0.000
## herkunftUS -0.152 0.076 -2.002 0.045
Binary Target: R-Implementation
round(exp(cbind(coef(pat1), confint(pat1))), 3)

## Waiting for profiling to be done...

## 2.5 % 97.5 %
## (Intercept) 0.462 0.355 0.601
## uszwUSPatent 0.676 0.592 0.772
## jahr 0.931 0.915 0.947
## azit 1.125 1.095 1.157
## aland 1.088 1.066 1.111
## ansp 1.018 1.011 1.025
## brancheBioPharma 1.975 1.676 2.328
## herkunftD/CH/GB 1.381 1.174 1.625
## herkunftUS 0.859 0.741 0.997

table(einspruch, estimated = round(fitted(pat1)))

## estimated
## einspruch 0 1
## nein 2223 624
## ja 925 1094
Count Data as Targets

Data:
I positive, whole-number target y (counts, frequencies)
I metric and/or categorical x1, . . . , xp

⇒ naive estimates Ê(yi) = xi⊤β̂ could become negative
⇒ model log(Ê(yi)), i.e.,

  Ê(yi) = exp( xi⊤β̂ ) = exp(β̂0) exp(β̂1 xi1) . . . exp(β̂p xip)

⇒ exponential-multiplicative covariate effects on target

63 / 327
Count Data as Targets: log-linear Model

Distributional assumption:
I yi | xi ∼ Po(λi) ; λi = exp(xi⊤β)
⇒ E(yi) = Var(yi) = λi

Overdispersion:
I frequently Var(yi) ≠ λi:
⇒ more flexible model with dispersion parameter φ: Var(yi) = φ λi
⇒ alternative distributions: Tweedie, Negative Binomial

64 / 327
Example: Patent Citations

pat2 <- glm(azit ~ ., family = poisson, data = patent)


pat3 <- MASS::glm.nb(azit ~ ., data = patent)
AIC(pat2, pat3)

## df AIC
## pat2 9 21021.23
## pat3 10 16341.48

round(cbind(
summary(pat2)$coefficients[2:5, -c(3, 4)],
summary(pat3)$coefficients[2:5, -c(3, 4)]
), 3)

## Estimate Std. Error Estimate Std. Error


## einspruchja 0.442 0.024 0.422 0.046
## uszwUSPatent -0.079 0.024 -0.047 0.046
## jahr -0.070 0.003 -0.079 0.006
## aland -0.026 0.004 -0.029 0.008

⇒ similar estimates, much bigger variability, better fit.


Outline

Recap: Linear Models

Recap: Generalized Linear Models


Motivation
GLMs: The General Approach
Inference

Recap: Non-Linear Effects

Recap: Mixed Models and Random Effects

Recap: Additive Models and Penalization

66 / 327
Definition: GLM

I Structural assumption: Connect conditional expectation and linear predictor Xβ via link/response function:

  E(yi | xi) = µi = h(xi⊤β)  ⇔  g(E(yi | xi)) = g(µi) = xi⊤β

  I logit regression: E(yi | xi) = P(yi = 1 | xi) = exp(xi⊤β) / (1 + exp(xi⊤β))
  I log-linear model: E(yi | xi) = exp(xi⊤β)

I Distributional assumption: Given independent (xi, yi) with exponential family density f(yi):

  ⇒ f(yi | θi) = exp( (yi θi − b(θi)) / φ · ωi − c(yi, φ, ωi) ) ; θi = θ(µi)
  I E(yi | xi) = µi = b′(θi) = h(xi⊤β)
  I Var(yi | xi) = φ b″(θi) / ωi ; ωi = ni

⇒ Connect mean structure and variance structure (and higher moments)

66 / 327
Simple Exponential Families

Distribution                 θ(µ)              b(θ)              φ
Normal N(µ, σ²)              µ                 θ²/2              σ²
Bernoulli B(1, µ)            log(µ/(1 − µ))    log(1 + exp(θ))   1
Poisson Po(µ)                log(µ)            exp(θ)            1
Gamma G(µ, ν)                −1/µ              −log(−θ)          1/ν
Inverse Gaussian IG(µ, σ²)   1/µ²              −√(−2θ)           σ²

67 / 327
Simple Exponential Families

Distribution        E(y) = b′(θ)               b″(θ)       Var(y) = b″(θ) φ/ω
Normal              µ = θ                      1           σ²/ω
Bernoulli           µ = exp(θ)/(1 + exp(θ))    µ(1 − µ)    µ(1 − µ)/ω
Poisson             µ = exp(θ)                 µ           µ/ω
Gamma               µ = −1/θ                   µ²          µ²/(νω)
Inverse Gaussian    µ = 1/√(−2θ)               µ³          µ³ σ²/ω
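R's family objects encode exactly these building blocks; a quick check of link, inverse link and variance function for the Bernoulli case (base R only):

fam <- binomial()    # logit link by default
fam$linkfun(0.5)     # g(mu) = log(mu / (1 - mu))      -> 0
fam$linkinv(0)       # h(eta) = exp(eta)/(1+exp(eta))  -> 0.5
fam$variance(0.5)    # variance function mu * (1 - mu) -> 0.25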

68 / 327
R-Implementation: glm()

glm(formula, family, data, ...)


I formula: as in lm
I family: specify distribution (binomial, gamma, etc.)
and link function g (µ) = Xβ
(family=binomial(link=’probit’)).

69 / 327
Advantages of GLM-Formulation

I unified approach for variety of data situations


⇒ unified methodology for
I estimation
I tests
I model choice and diagnostics
⇒ asymptotics
via Maximum Likelihood approach.

70 / 327
Recent Extensions:

GLM idea in combination with ML inference works similarly for many


other non-exponential family distributions, implemented in mgcv:
I t-distribution
I Tweedie
I Beta
I models for ordinal categorical responses
(Wood, Pya & Säfken, 2016)

71 / 327
Outline

Recap: Linear Models

Recap: Generalized Linear Models


Motivation
GLMs: The General Approach
Inference

Recap: Non-Linear Effects

Recap: Mixed Models and Random Effects

Recap: Additive Models and Penalization

72 / 327
ML Estimation: Idea

I OLS estimate in linear model: Σ_{i=1}^n (yi − xi⊤β)² → min

I density for y: Π_{i=1}^n f(yi | β, xi) = (√(2π) σ)^{−n} exp( − Σ_{i=1}^n (yi − xi⊤β)² / (2σ²) )

⇒ OLS estimate maximizes joint density of observed data over model parameters
⇒ Maximum Likelihood principle:
  maximize (log-)likelihood l(β) = Σ_{i=1}^n log(f(yi | β, xi))
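A small numerical check of this equivalence on simulated data; optim() maximizes the Gaussian log-likelihood and should reproduce the lm() coefficients up to optimizer tolerance:

set.seed(1)
x_sim <- runif(100)
y_sim <- 1 + 2 * x_sim + rnorm(100, sd = 0.5)

negloglik <- function(par) {              # par = (beta0, beta1, log(sigma))
  mu <- par[1] + par[2] * x_sim
  -sum(dnorm(y_sim, mean = mu, sd = exp(par[3]), log = TRUE))
}
opt <- optim(c(0, 0, 0), negloglik)
rbind(ML = opt$par[1:2], OLS = coef(lm(y_sim ~ x_sim)))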

72 / 327
ML Estimation: Procedure

I log-likelihood l(β) = Σ_{i=1}^n log(f(yi | β, xi))
I score function s(β) = ∂l(β)/∂β
I (iterative) solution of s(β) = 0 via Fisher scoring or IWLS

73 / 327
ML Estimation: Fisher-Scoring

I basically Newton's method:

  β^(k+1) = β^(k) − ( ∂s(β)/∂β⊤ )^{−1} s(β)

[Figure: Newton's method illustrated on the score function s(β), with iterates β^(0), β^(1), β^(2)]

74 / 327
ML Estimation: Fisher-Scoring & IWLS

I basically Newton's method:

  β̂^(k+1) = β̂^(k) − ( ∂s(β̂^(k))/∂β⊤ )^{−1} s(β̂^(k))

I observed information matrix H(β̂^(k)) = ∂s(β̂^(k))/∂β⊤ expensive to compute
⇒ use expected Fisher information F(β) = E(H(β)),
  very efficiently computable:
  represent in terms of iteratively re-weighted LS estimation (IWLS) with a diagonal weight matrix W^(k) and working observations ỹi^(k).
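A minimal IWLS sketch for the logit model (illustration only, not glm()'s actual implementation; X is assumed to be a model matrix including an intercept column, y a 0/1 response vector):

iwls_logit <- function(X, y, maxit = 25, tol = 1e-8) {
  beta <- rep(0, ncol(X))
  for (k in seq_len(maxit)) {
    eta <- drop(X %*% beta)            # linear predictor
    mu  <- plogis(eta)                 # response function h(eta)
    w   <- mu * (1 - mu)               # IWLS weights (variance function)
    z   <- eta + (y - mu) / w          # working observations
    beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (converged) break
  }
  beta
}
# should essentially agree with coef(glm(y ~ X - 1, family = binomial()))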

75 / 327
Properties of ML Estimators

β̂ML is consistent, efficient, asymptotically Gaussian:

  β̂ML ∼ N(β, F^{−1}(β))  (asymptotically)

76 / 327
Tests

Linear hypotheses H0 : Cβ = d vs HA : Cβ ≠ d
Estimate for β under restriction H0 : β̃

I LR test:
  lq = −2( l(β̃) − l(β̂) )

I Wald test:
  w = (Cβ̂ − d)⊤ ( C F^{−1}(β̂) C⊤ )^{−1} (Cβ̂ − d)

I Score test:
  u = s(β̃)⊤ F^{−1}(β̃) s(β̃)

under H0: lq, w, u ∼ χ²_r asymptotically, r = rank(C) (no. of restrictions)
⇒ reject H0 if lq, w, u > χ²_r(1 − α).
77 / 327
Tests in R

summary.glm uses √w ∼ N(0, 1) (asymptotically) for H0 : βj = 0:
round(summary(pat2)$coefficients[8:9, ], 3)

## Estimate Std. Error z value Pr(>|z|)


## herkunftD/CH/GB -0.236 0.031 -7.524 0.000
## herkunftUS 0.061 0.026 2.358 0.018

anova.glm(..., test=’Chisq’) for LR-Tests:


anova(update(pat2, . ~ . - herkunft), pat2, test = "Chisq")

## Analysis of Deviance Table


##
## Model 1: azit ~ einspruch + uszw + jahr + aland + ansp + branche
## Model 2: azit ~ einspruch + uszw + jahr + aland + ansp + branche + herkunft
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 4859 13954
## 2 4857 13859 2 95.155 < 2.2e-16 ***
## ---
## Signif. codes:
## 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Model Choice

Which probabilistic model offers the best trade-off between fidelity to training data (more complexity) and parsimony?
⇒ Information criteria:
I Akaike: AIC = −2l(β̂) + 2p → min (AIC())
I Bayes: BIC = −2l(β̂) + log(n)p → min
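BIC is available the same way; a one-line sketch for the two count-data fits from the patent example above:

BIC(pat2, pat3)  # same comparison as AIC(pat2, pat3), with a log(n) penalty per parameter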

79 / 327
Model Diagnostics: Residuals

I Pearson residuals (resid option: type='pearson')
  I riP = (yi − µ̂i) / √(v(µ̂i))
  I for grouped data approx. N(0, 1)
I deviance residuals (resid default)
  I riD = sgn(yi − µ̂i) √( 2(li(yi) − li(µ̂i)) )
  I for grouped data approx. N(0, 1) distributed
I partial residuals (type='partial')
I prediction errors yi − ŷi (type='response')
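All of these are accessible via residuals() on a fitted glm; a quick sketch with the Poisson patent model from above:

head(residuals(pat2, type = "pearson"))
head(residuals(pat2, type = "deviance"))  # the default
head(residuals(pat2, type = "response"))  # yi - fitted values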

80 / 327
Model validation: plot.glm()

par(mfrow = c(2, 2))


plot(pat2)

[Figure: the four glm diagnostic panels for pat2: Residuals vs Fitted, Normal Q−Q, Scale−Location, and Residuals vs Leverage with Cook's distance contours]


Recap: Linear Models

Recap: Generalized Linear Models

Recap: Non-Linear Effects


Transformation of Covariates
Polynomial Splines

Recap: Mixed Models and Random Effects

Recap: Additive Models and Penalization

82 / 327
Motivation
[Figure: residuals (res_size) plotted against the covariate size]
Motivation
par(mfrow = c(1, 2))
plot(size, res_size)
points(sort(unique(size)), tapply(res_size, size, mean), col = "red")
plot(log(size, base = 2), res_size)
points(log(sort(unique(size)), base = 2), tapply(res_size, size, mean), col = "red")
[Figure: res_size plotted against size (left) and against log2(size) (right), with the mean residual per size value marked in red]
Simple Transformation

I linearity is often too restrictive an assumption
I gain flexibility without complex models by using log or polynomials of x
⇒ replace y = βx + ε by y = β f(x) + ε; f(x) = log(x), x³, √x, etc.
I Issues:
  I interpretation of β
  I choice/selection of f(x)

84 / 327
Polynomial Transformation

I Polynomial model:
  I y = f(x) + ε = β0 + β1 x + β2 x² + · · · + βl x^l + ε
  I In R: use poly(x, degree) to avoid collinearity

85 / 327
Polynomial Transformation: Collinearity
x <- seq(0, 1, l = 200)
X <- outer(x, 1:5, "^")
X.c <- poly(x, 5)
round(cor(X), 2)

## [,1] [,2] [,3] [,4] [,5]


## [1,] 1.00 0.97 0.92 0.87 0.82
## [2,] 0.97 1.00 0.99 0.96 0.93
## [3,] 0.92 0.99 1.00 0.99 0.97
## [4,] 0.87 0.96 0.99 1.00 0.99
## [5,] 0.82 0.93 0.97 0.99 1.00

round(cor(X.c), 2)

## 1 2 3 4 5
## 1 1 0 0 0 0
## 2 0 1 0 0 0
## 3 0 0 1 0 0
## 4 0 0 0 1 0
## 5 0 0 0 0 1

⇒ use orthogonal polynomials


Polynomial Transformation: Synthetic example
x <- seq(0, 1, l = 300)
fx <- function(x) {
sin(2 * (4 * x - 2)) + 2 * exp(-16^2 * (x - 0.5)^2)
}
y <- fx(x) + rnorm(300, sd = .3)
X.c <- poly(x, 15)
m.poly3 <- lm(y ~ X.c[, 1:3])
m.poly7 <- lm(y ~ X.c[, 1:7])
m.poly11 <- lm(y ~ X.c[, 1:11])
m.poly15 <- lm(y ~ X.c)
plot(x, y, pch = 19, col = "grey")
lines(x, fx(x), col = 1, lwd = 2)


[Figure: simulated data (x, y) with the true function f(x) overlaid]

87 / 327
Polynomial Transformation: Synthetic example

[Figure: simulated data with the true f and fitted orthogonal-polynomial curves of degree 3, 7, 11 and 15]

88 / 327
Piecewise Polynomials

I polynomial transformations have problems:
  I choice of degree (= flexibility)
  I oscillations, boundary effects for higher degrees
⇒ piecewise polynomials:
  I decompose the range of x into sub-intervals
  I approximate f(x) by a low-degree polynomial in each sub-interval
⇒ removes oscillations, boundary effects

89 / 327
Piecewise Polynomials

[Figure: data with the true f, the degree-15 polynomial fit, and 5 piecewise quadratic polynomials]

⇒ f̂(x) for piecewise polynomials is not continuous


Outline

Recap: Linear Models

Recap: Generalized Linear Models

Recap: Non-Linear Effects


Transformation of Covariates
Polynomial Splines

Recap: Mixed Models and Random Effects

Recap: Additive Models and Penalization

91 / 327
Definition: Polynomial Splines

I better piecewise polynomials
I require continuous differentiability at subinterval boundaries
I formally:
  f : [a, b] → R is a polynomial spline of degree l ≥ 0 with knots a = κ1 < · · · < κm = b if
  1. f(x) is (l − 1)-times continuously differentiable
  2. f(x) is a polynomial of degree l on [κj, κj+1)
⇒ choice of degree l determines smoothness of the function
⇒ knot set κ defines flexibility/complexity of f
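In splines::bs() these two choices map directly onto the degree and knots arguments; a small sketch (kappa is an illustrative vector of interior knots, x as in the synthetic example above):

library("splines")
kappa <- seq(0.1, 0.9, by = 0.2)        # interior knots (assumption for illustration)
B <- bs(x, degree = 3, knots = kappa)   # cubic spline basis on these knots
dim(B)                                  # n rows, one column per basis function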

91 / 327
Polynomial Splines: Example

[Figure: polynomial spline fits of degree 0, 1, 2 and 3, each with 5+2, 20+2 and 50+2 knots]

92 / 327
Polynomial Splines: Discussion

I Standard: cubic splines:
  I visually smooth
  I twice continuously differentiable (i.e., curvature well defined)
I knot set:
  I size: trade-off between flexibility and overfitting
  I positioning: equidistant? quantile-based? domain knowledge?
→ more on this in the context of penalization

93 / 327
Truncated Polynomials

I simplest polynomial splines
I basis representation for degree l and knots κ = (κ1, . . . , κm):

  f(x) = γ1 + γ2 x + · · · + γ_{l+1} x^l
         + γ_{l+2} (x − κ2)_+^l + · · · + γ_{l+m−1} (x − κ_{m−1})_+^l

I first l + 1 coefficients determine a global polynomial of degree l
I the coefficient of the highest power can change at each knot κ
⇒ f is of degree l everywhere and continuously differentiable
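A minimal sketch of building such a truncated power basis as a design matrix (kappa holds the full knot vector including the boundary knots; illustration only):

trunc_power_basis <- function(x, kappa, l = 3) {
  global <- outer(x, 0:l, `^`)                             # 1, x, ..., x^l
  inner  <- kappa[2:(length(kappa) - 1)]                   # interior knots kappa_2, ..., kappa_{m-1}
  trunc  <- sapply(inner, function(k) pmax(x - k, 0)^l)    # (x - kappa_j)_+^l
  cbind(global, trunc)
}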

94 / 327
Truncated Polynomials: Example

[Figure: truncated-polynomial basis functions, the scaled basis functions, and their sum f(x) overlaid on the data]

95 / 327
Truncated Polynomials: Discussion

Numerical disadvantages:
I basis function values can become very large
I strong collinearity of basis functions
⇒ numerically preferable: B-spline basis functions

96 / 327
B-splines: Idea

I a B-spline basis function is itself a piecewise polynomial, connecting
  I (l + 1) polynomial fragments
  I of degree l
  I (l − 1)-times continuously differentiable at the connection points
⇒ a weighted sum of such basis functions is of degree l and (l − 1)-times continuously differentiable everywhere

97 / 327
B-Splines: Basis Functions

[Figure: B-spline basis functions of degree l = 0, 1, 2 and 3 over the knots κ1, . . . , κ11]

98 / 327
B-Splines: Properties

I local basis: basis functions ≠ 0 only between l + 2 knots
I bounded range
⇒ avoids the problems of truncated polynomials
I overlap with 2l adjacent basis functions

99 / 327
(B-)Splines as Linear Models

Model: y = f(x) + ε
How to estimate f(x)?
I define basis functions bk(x); k = 1, . . . , K
I f(x) ≈ Σ_{k=1}^K θk bk(x)
⇒ ŷ = f̂(x) = Σ_{k=1}^K θ̂k bk(x)
⇒ this is a linear model ŷ = Bθ̂ with design matrix

  B = [ b1(x1) . . . bK(x1) ]
      [  ...          ...   ]
      [ b1(xn) . . . bK(xn) ]

I analogously applicable to GLMs: g(µ̂) = Bθ̂

100 / 327
B-Splines: R-Implementation
bs in the splines package creates a B-spline design matrix B:
library("splines")
B <- bs(x, df = 12, intercept = T)
m_bspline <- lm(y ~ B - 1)
B_scaled <- t(t(B) * coef(m_bspline))
plot(x, y, pch = 19, cex = .5, col = "grey")
matlines(x, B, lty = 1, col = 1, lwd = 2)


[Figure: data with the B-spline basis functions overlaid]
B-Splines: R-Implementation
library("splines")
B <- bs(x, df = 12, intercept = T)
m_bspline <- lm(y ~ B - 1)
B_scaled <- t(t(B) * coef(m_bspline))
plot(x, y, pch = 19, cex = .5, col = "grey")
matlines(x, B, lty = 1, col = scales::alpha(1, .7), lwd = .5)
matlines(x, B_scaled, lty = 1, col = 2, lwd = 2)


[Figure: data with the B-spline basis functions (black) and the scaled basis functions (red)]
B-Splines: R-Implementation
library("splines")
B <- bs(x, df = 12, intercept = T)
m_bspline <- lm(y ~ B - 1)
B_scaled <- t(t(B) * coef(m_bspline))
plot(x, y, pch = 19, cex = .5, col = "grey")
matlines(x, B, lty = 1, col = scales::alpha(1, .7), lwd = .5)
matlines(x, B_scaled, lty = 1, col = scales::alpha(2, .7), lwd = 1)
lines(x, fitted(m_bspline), lty = 1, col = 3, lwd = 2)


[Figure: data with basis functions (black), scaled basis functions (red), and the fitted function (green)]
Splines: Summary

I basis function representation linearizes the problem of function estimation
I dimension of basis controls maximal complexity
I basis type determines properties of function estimate: continuity,
differentiability, periodicity, . . .

104 / 327
Recap: Linear Models

Recap: Generalized Linear Models

Recap: Non-Linear Effects

Recap: Mixed Models and Random Effects


Exemplary Longitudinal Study: Sleep Deprivation
Motivation: From LM to LMM
Advantages of a Mixed Models Representation
Linear Mixed Models
LMM Estimation
Generalized Linear Mixed Models
GLMM Estimation

Recap: Additive Models and Penalization

105 / 327
Example: Sleep Deprivation Data

I laboratory experiment to measure effect of sleep deprivation on


cognitive performance
I 18 subjects, restricted to 3 hours of sleep per night for 10 days
I operationalization of cognitive performance: reaction time

105 / 327
Example: Sleep Deprivation Data

data(sleepstudy, package = "lme4")


summary(sleepstudy)

## Reaction Days Subject


## Min. :194.3 Min. :0.0 308 : 10
## 1st Qu.:255.4 1st Qu.:2.0 309 : 10
## Median :288.7 Median :4.5 310 : 10
## Mean :298.5 Mean :4.5 330 : 10
## 3rd Qu.:336.8 3rd Qu.:7.0 331 : 10
## Max. :466.4 Max. :9.0 332 : 10
## (Other):120

106 / 327
Example: Sleep Deprivation Data

[Figure: average reaction time (ms) vs. days of sleep deprivation, one panel per subject]
Example: Sleep Deprivation Data
Model global trend: Reactionij ≈ β0 + β1 Daysij

m_sleep_global <- lm(Reaction ~ Days, data = sleepstudy)


summary(m_sleep_global)

##
## Call:
## lm(formula = Reaction ~ Days, data = sleepstudy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -110.848 -27.483 1.546 26.142 139.953
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 251.405 6.610 38.033 < 2e-16 ***
## Days 10.467 1.238 8.454 9.89e-15 ***
## ---
## Signif. codes:
## 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 47.71 on 178 degrees of freedom
## Multiple R-squared: 0.2865, Adjusted R-squared: 0.2825
## F-statistic: 71.46 on 1 and 178 DF, p-value: 9.894e-15
Example: Sleep Deprivation Data
With estimated global level and trend added:
[Figure: per-subject panels with the fitted global regression line added]
⇒ obviously inappropriate model
Example: Sleep Deprivation Data

I subjects obviously differ in level and trend for reaction time


I idea: model subject-specific levels and trends
Reactionij ≈ β0i + β1i Daysij
library("lme4")  # lmList() is provided by lme4 (the sleepstudy data was loaded from there above)
# similar: m_sleep_indiv <- lm(Reaction ~ 0 + Subject + Subject:Days, data = sleepstudy)
m_sleep_indiv <- lmList(Reaction ~ Days | Subject, data = sleepstudy)
head(coef(m_sleep_indiv))
## (Intercept) Days
## 308 244.1927 21.764702
## 309 205.0549 2.261785
## 310 203.4842 6.114899
## 330 289.6851 3.008073
## 331 285.7390 5.266019
## 332 264.2516 9.566768
Example: Sleep Deprivation Data
With estimated individual level and trend added:
[Figure: per-subject panels with the individually fitted regression lines added]
⇒ better fit
Outline
Recap: Linear Models

Recap: Generalized Linear Models

Recap: Non-Linear Effects

Recap: Mixed Models and Random Effects


Exemplary Longitudinal Study: Sleep Deprivation
Motivation: From LM to LMM
Advantages of a Mixed Models Representation
Linear Mixed Models
LMM Estimation
Generalized Linear Mixed Models
GLMM Estimation

Recap: Additive Models and Penalization


112 / 327
Motivation: From LM to LMM

I global model yij = β0 + β1 x1ij + εij :


I ignores within-subject correlation
⇒ variability of coefficients underestimated since correlated data
contain less information than independent data
⇒ invalid inference (tests, CIs)
⇒ complete pooling
I subject-specific models yij = β0i + β1i x1ij + εij :
I can be interpreted only with regard to the data in the sample
⇒ no generalization to “typical” subjects / population
I very many parameters to estimate
⇒ estimates may be unstable, imprecise
⇒ no pooling

112 / 327
Motivation: From LM to LMM

alternative representation of subject-specific models:

$$y_{ij} = \bar\beta_0 + (\beta_{0i} - \bar\beta_0) + \bar\beta_1 x_{1ij} + (\beta_{1i} - \bar\beta_1)\, x_{1ij} + \varepsilon_{ij}$$

with means of subject-specific parameters

$$\bar\beta_0 = \frac{1}{18}\sum_{i=1}^{18} \beta_{0i}, \qquad \bar\beta_1 = \frac{1}{18}\sum_{i=1}^{18} \beta_{1i}$$

113 / 327
Motivation: From LM to LMM
I idea of a random effect model
I β̄ is the population level effect β.
I express subject-specific deviations βi − β̄ as Gaussian random variables $b_i \sim N(0, \sigma_b^2)$.
I this yields
  $$y_{ij} = \beta_0 + \beta_1 x_{1ij} + b_{0i} + b_{1i} x_{1ij} + \varepsilon_{ij}$$
  with $\varepsilon_{ij} \sim N(0, \sigma^2)$ and $(b_{0i}, b_{1i})^\top \sim N_2(0, \Sigma)$.
I or alternatively:
  $$y_{ij} = b_{0i} + b_{1i} x_{1ij} + \varepsilon_{ij}$$
  with $\varepsilon_{ij} \sim N(0, \sigma^2)$ and $(b_{0i}, b_{1i})^\top \sim N_2\big((\beta_0, \beta_1)^\top, \Sigma\big)$.

114 / 327
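A minimal sketch of fitting exactly this random-intercept-and-slope model to the sleep deprivation data with lme4 (object names such as m_sleep_lmm are ours, not from the slides):

library(lme4)
data(sleepstudy, package = "lme4")
# random intercept and random slope for Days, correlated within Subject
m_sleep_lmm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
fixef(m_sleep_lmm)                 # population-level effects (beta_0, beta_1)
head(ranef(m_sleep_lmm)$Subject)   # subject-specific deviations (b_0i, b_1i)
VarCorr(m_sleep_lmm)               # estimated Sigma and residual variance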
Outline
Recap: Linear Models

Recap: Generalized Linear Models

Recap: Non-Linear Effects

Recap: Mixed Models and Random Effects


Exemplary Longitudinal Study: Sleep Deprivation
Motivation: From LM to LMM
Advantages of a Mixed Models Representation
Linear Mixed Models
LMM Estimation
Generalized Linear Mixed Models
GLMM Estimation

Recap: Additive Models and Penalization


115 / 327
Partial Pooling

I regression coefficients β0 , β1 , . . . in random effects models retain


their interpretation as population level parameters.
I subject-specific deviations from the population mean are modeled by
random effects – the implicit assumption is that subjects are a
random sample from the population of interest
⇒ partial pooling, with strength of pooling determined by random effect
variance.

115 / 327
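Partial pooling can be made visible by comparing the unpooled per-subject OLS fits with the subject-specific coefficients implied by the mixed model; a sketch with lme4 (shrinkage towards the population mean is strongest for extreme subjects):

library(lme4)
data(sleepstudy, package = "lme4")
m_nopool  <- lmList(Reaction ~ Days | Subject, data = sleepstudy)         # no pooling
m_partial <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)  # partial pooling
# per-subject intercepts and slopes under the two approaches
head(coef(m_nopool))
head(coef(m_partial)$Subject)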
Advantages of the Random Effects Approach
I decomposition of random variability in data into
I subject-specific deviations from population mean
I deviation of observations from subject means
⇒ more precise estimates of population trends
I some degree of protection against bias caused by drop-out
I random effects serve as surrogates for effects of unobserved
subject-level covariates
⇒ control for unobserved heterogeneity
I distributional assumption bi ∼ N stabilizes estimates b̂i (shrinkage
effect) compared to fixed subject-specific estimates β̂i without
distributional assumption
I intuition: estimates are stabilized by including prior knowledge in the
model, i.e., assuming that subjects from the population are mostly
similar to each other

116 / 327
Advantages of the Random Effects Approach

I random effects model the correlation structure between observations:

  $$y_{ij} = \beta_0 + b_i + \varepsilon_{ij} \quad\text{with}\quad b_i \overset{i.i.d.}{\sim} N(0, \sigma_b^2),\; \varepsilon_{ij} \overset{i.i.d.}{\sim} N(0, \sigma_\varepsilon^2)$$
  $$\Rightarrow \operatorname{Corr}(y_{ij}, y_{ij'}) = \frac{\operatorname{Cov}(b_i, b_i)}{\sqrt{\operatorname{Var}(y_{ij})\operatorname{Var}(y_{ij'})}} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_\varepsilon^2}$$
I independence between observations on different subjects is retained (for the kind of correlation structure we discuss here):
  $$\operatorname{Corr}(y_{ij}, y_{i'j}) = 0 \quad \text{for } i \neq i'.$$

117 / 327
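For a pure random intercept model, this within-subject correlation (the intraclass correlation) can be read off the estimated variance components; a small sketch using lme4 (object names ours):

library(lme4)
data(sleepstudy, package = "lme4")
m_ri <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
vc   <- as.data.frame(VarCorr(m_ri))   # rows: Subject (Intercept), Residual
icc  <- vc$vcov[1] / sum(vc$vcov)      # sigma_b^2 / (sigma_b^2 + sigma_eps^2)
icc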
Advantages of the Random Effects Approach
I random effects model the correlation structure between observations:

  $$y_{ij} = \beta_0 + b_{0i} + b_{1i} t_j + \varepsilon_{ij} \quad\text{with}\quad (b_{0i}, b_{1i})^\top \overset{i.i.d.}{\sim} N_2(0, \Sigma),\; \varepsilon_{ij} \overset{i.i.d.}{\sim} N(0, \sigma_\varepsilon^2)$$
  $$\Rightarrow \operatorname{Var}(y_{ij}) = \sigma_{b0}^2 + 2\sigma_{b01} t_j + \sigma_{b1}^2 t_j^2 + \sigma_\varepsilon^2$$
  $$\Rightarrow \operatorname{Cov}(y_{ij}, y_{ij'}) = \sigma_{b0}^2 + \sigma_{b01}(t_j + t_{j'}) + \sigma_{b1}^2 t_j t_{j'}$$
I independence between observations on different subjects is retained (for the kind of correlation structure we discuss here):
  $$\operatorname{Corr}(y_{ij}, y_{i'j}) = 0 \quad \text{for } i \neq i'.$$

117 / 327
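The implied marginal variance as a function of t_j can be computed directly from a fitted random-slope model; a sketch for the sleep deprivation data, with Days playing the role of t_j (object names ours):

library(lme4)
data(sleepstudy, package = "lme4")
m_rs  <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
Sigma <- VarCorr(m_rs)$Subject          # 2x2 covariance matrix of (b_0i, b_1i)
t_j   <- 0:9
# Var(y_ij) = sigma_b0^2 + 2 sigma_b01 t_j + sigma_b1^2 t_j^2 + sigma_eps^2
marg_var <- Sigma[1, 1] + 2 * Sigma[1, 2] * t_j + Sigma[2, 2] * t_j^2 + sigma(m_rs)^2
round(marg_var)                         # marginal variance grows with Days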
Outline
Recap: Linear Models

Recap: Generalized Linear Models

Recap: Non-Linear Effects

Recap: Mixed Models and Random Effects


Exemplary Longitudinal Study: Sleep Deprivation
Motivation: From LM to LMM
Advantages of a Mixed Models Representation
Linear Mixed Models
LMM Estimation
Generalized Linear Mixed Models
GLMM Estimation

Recap: Additive Models and Penalization


118 / 327
General Form of Linear Mixed Models

Linear Mixed Model:

y = Xβ + Ub + ε
b ∼ N(0, G)
ε ∼ N(0, R)

I U: design matrix for random effects


I independence between ε and b.
I entries in G, R determined by (co-)variance parameters ϑ
I we’ll focus on independent errors with R = σ 2 I

118 / 327
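The random-effects design matrix of this general form can be inspected for a fitted lme4 model; note that lme4 labels this matrix Z, while the slides call it U. A minimal sketch:

library(lme4)
data(sleepstudy, package = "lme4")
m <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
U <- getME(m, "Z")   # sparse random-effects design matrix (lme4's Z)
dim(U)               # 180 x 36: 18 subjects x 2 random effects (intercept, Days)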
Conditional and Marginal Perspective

Conditional perspective:

y|b ∼ N(Xβ + Ub, R); b ∼ N(0, G)

Interpretation:
random effects b are subject-specific effects that vary across the
population.
Hierarchical formulation:
expected response is a function of population-level effects (fixed effects)
and subject-level effects (random effects).

119 / 327
Conditional and Marginal Perspective

Marginal perspective:

y ∼ N(Xβ, V) V = Cov(y) = UGU> + R

Interpretation:
random effects b induce a correlation structure in y defined by U and G,
and thereby allow valid analyses of correlated data.
Marginal formulation:
model is concerned with the marginal expectation of y averaged over the
population as a function of population-level effects.

The marginal model is more general than the hierarchical model.

generalized estimating equations: geepack

120 / 327
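For a purely marginal analysis, the same population-level trend could be estimated with generalized estimating equations; a rough sketch with geepack (our choice of working correlation structure, not taken from the slides):

library(geepack)
data(sleepstudy, package = "lme4")
m_gee <- geeglm(Reaction ~ Days, id = Subject, data = sleepstudy,
                corstr = "exchangeable")
summary(m_gee)   # population-level coefficients with robust (sandwich) standard errors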
Linear Mixed Model for Longitudinal Data
For subjects i = 1, . . . , m, each with observations j = 1, . . . , ni :

$$y_{ij} = x_{ij}^\top \beta + u_{ij}^\top b_i + \varepsilon_{ij}, \qquad b_i \sim N_q(0, \Sigma)
\quad\Leftrightarrow\quad y = X\beta + Ub + \varepsilon$$

with
I $y = (y_1^\top, y_2^\top, \ldots, y_m^\top)^\top$ ($n = \sum_{i=1}^m n_i$ entries)
I $\varepsilon = (\varepsilon_1^\top, \varepsilon_2^\top, \ldots, \varepsilon_m^\top)^\top$ ($n$ entries)
I $\beta = (\beta_0, \beta_1, \ldots, \beta_p)^\top$
I $X = [1\ x_1\ \ldots\ x_p]$
I $b = (b_1^\top, b_2^\top, \ldots, b_m^\top)^\top$ of length $mq$, with $b \sim N_{mq}(0, G)$
I $G = \operatorname{diag}(\Sigma, \ldots, \Sigma)$
I $U = \operatorname{diag}(U_1, \ldots, U_m)$ with dimension $n \times mq$
I $U_i = [1\ u_{1i}\ \ldots\ u_{(q-1)i}]$ with dimension $n_i \times q$. Variables in $U_i$ are typically a subset of those in $X$.
121 / 327
Other Types of Mixed Models

I hierarchical/multi-level model:
  e.g., test score yijk of a pupil i in class j in school k:
  $$y_{ijk} = \beta_0 + x_{ijk}^\top \beta + b_{1j} + b_{2k} + \varepsilon_{ijk}$$
  with random intercepts for class ($b_{1j} \sim N(0, \sigma_1^2)$) and school ($b_{2k} \sim N(0, \sigma_2^2)$)
I crossed designs:
  e.g., score yij of a subject i on an item j:
  $$y_{ij} = \beta_0 + x_{ij}^\top \beta + b_{1i} + b_{2j} + \varepsilon_{ij}$$
  with random intercepts for subject ($b_{1i} \sim N(0, \sigma_1^2)$, subject ability) and item ($b_{2j} \sim N(0, \sigma_2^2)$, item difficulty); lme4 formula sketches for both designs follow below

122 / 327
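In lme4 formula notation, these two designs would look roughly as follows; the data sets and variable names here (pupils, responses, score, x, ...) are hypothetical placeholders, so the model calls are shown as comments:

library(lme4)
# hierarchical / nested: pupils within classes within schools
# m_nested  <- lmer(score ~ x + (1 | school / class), data = pupils)
# crossed: subjects and items are not nested in each other
# m_crossed <- lmer(score ~ x + (1 | subject) + (1 | item), data = responses)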
Outline
Recap: Linear Models

Recap: Generalized Linear Models

Recap: Non-Linear Effects

Recap: Mixed Models and Random Effects


Exemplary Longitudinal Study: Sleep Deprivation
Motivation: From LM to LMM
Advantages of a Mixed Models Representation
Linear Mixed Models
LMM Estimation
Generalized Linear Mixed Models
GLMM Estimation

Recap: Additive Models and Penalization


123 / 327
Likelihood-Based Estimation of Linear Mixed Models
ML-Estimation
I determine $\hat\vartheta_{ML}$ so that the profile likelihood of the marginal model is maximal:
  $$y \sim N(X\beta, V(\vartheta))$$
  $$l(\beta, \vartheta) = -\tfrac{1}{2}\left\{\log|V(\vartheta)| + (y - X\beta)^\top V(\vartheta)^{-1}(y - X\beta)\right\}$$
  $$\hat\beta(\vartheta) = \arg\max_\beta\, l(\beta, \vartheta) = \left(X^\top V(\vartheta)^{-1} X\right)^{-1} X^\top V(\vartheta)^{-1} y$$
  $$l_P(\vartheta) = -\tfrac{1}{2}\left\{\log|V(\vartheta)| + (y - X\hat\beta(\vartheta))^\top V(\vartheta)^{-1}(y - X\hat\beta(\vartheta))\right\} \to \max_\vartheta$$
I for given ϑ, closed form solutions for β̂ and b̂:
  simple generalized least squares: $\hat{b}(\hat\vartheta) = G U^\top V(\hat\vartheta)^{-1} (y - X\hat\beta(\hat\vartheta))$.
I $\widehat{\operatorname{Cov}}(\hat\beta)$ and $\widehat{\operatorname{Cov}}(\hat b)$ computable for tests, CIs.
123 / 327
Likelihood-Based Estimation of Linear Mixed Models
REML estimation:
I ML-estimates ϑ̂ are biased; unbiased variance component estimates come from the "marginal-marginal" (restricted) likelihood of ϑ, which also integrates β out:
  $$l_R(\vartheta) = \log\left(\int L(\beta, \vartheta)\, d\beta\right) \propto l_P(\vartheta) - \tfrac{1}{2} \log\left|X^\top V^{-1} X\right| \to \max_\vartheta$$
I closed form solutions for β̂ and b̂ and their covariances given ϑ still apply
I both are tricky optimization problems:
  I positivity constraints for most entries in ϑ
  I computationally expensive, numerically unstable log-determinants
I SOTA implementation for a large sub-class: mgcv

124 / 327
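In lme4, the criterion is selected via the REML argument; a small sketch comparing the estimated variance components under ML and REML (object names ours):

library(lme4)
data(sleepstudy, package = "lme4")
m_ml   <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy, REML = FALSE)
m_reml <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy, REML = TRUE)
# ML variance components tend to be smaller (biased downwards) than the REML ones
print(VarCorr(m_ml),   comp = "Variance")
print(VarCorr(m_reml), comp = "Variance")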
Outline
Recap: Linear Models

Recap: Generalized Linear Models

Recap: Non-Linear Effects

Recap: Mixed Models and Random Effects


Exemplary Longitudinal Study: Sleep Deprivation
Motivation: From LM to LMM
Advantages of a Mixed Models Representation
Linear Mixed Models
LMM Estimation
Generalized Linear Mixed Models
GLMM Estimation

Recap: Additive Models and Penalization


125 / 327
Generalized Linear Mixed Models

I GLM generalizes LM via addition of a link function


I mapping the linear predictor to a range appropriate for the response
distribution,
I and linking the variance to the expected value in a way appropriate for
the response distribution.
I carries over directly for a generalized linear mixed model (GLMM):
E(y|b) = h(Xβ + Ub)
with known response function h()
I BUT: estimation much harder problem than for LMMs or GLMs,
especially for binary responses (more later).
I BUT: GLMMs can only be interpreted in the conditional/hierarchical
perspective. Use GEEs for marginal models.

125 / 327
Generalized Linear Mixed Models

Model:

y | b :  yi | b ∼ Expo.fam.( E(yi | b) = h(Xβ + Ub)i , φ )
b | ϑ :  b | ϑ ∼ N(0, G(ϑ))

126 / 327
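A minimal GLMM example with lme4::glmer, using the cbpp data shipped with lme4 (binomial response, logit link, random intercept per herd); this is our illustrative choice, not an example from the slides:

library(lme4)
data(cbpp, package = "lme4")
# disease incidence out of herd size, per herd and period
m_cbpp <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
                data = cbpp, family = binomial)
summary(m_cbpp)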
Caveat: Effect Attenuation in GLMMs
[Figure: two panels, "LMM" and "Logit-GLMM", showing h(x_i β + b_0i) over x for varying random intercepts b_0i]

For random intercept logit-models: $\beta_{mar} \approx \dfrac{1}{\sqrt{1 + 0.346\,\sigma_b^2}}\, \beta_{cond}$
127 / 327
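A quick numerical illustration of the attenuation formula above (the values of beta_cond and sigma2_b are arbitrary, chosen only for illustration):

beta_cond <- 1.0                    # conditional (subject-specific) log-odds ratio
sigma2_b  <- c(0.5, 1, 2, 4)        # random-intercept variances
beta_mar  <- beta_cond / sqrt(1 + 0.346 * sigma2_b)
round(beta_mar, 2)                  # approx. 0.92 0.86 0.77 0.65: shrinks towards 0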
Outline
Recap: Linear Models

Recap: Generalized Linear Models

Recap: Non-Linear Effects

Recap: Mixed Models and Random Effects


Exemplary Longitudinal Study: Sleep Deprivation
Motivation: From LM to LMM
Advantages of a Mixed Models Representation
Linear Mixed Models
LMM Estimation
Generalized Linear Mixed Models
GLMM Estimation

Recap: Additive Models and Penalization


128 / 327
GLMM Estimation
LMM estimation exploits the analytically accessible marginal likelihood:
  $$L(\beta, \vartheta, \phi) = \int L(b, \beta, \phi, \vartheta)\, db,$$
which is the density of
  $$y \mid \beta, \phi, \vartheta \sim N\big(X\beta,\ U G(\vartheta) U^\top + R(\phi, \vartheta)\big).$$
For GLMMs:
  $$L(\beta, \vartheta, \phi) = \int \left(\prod_{i=1}^n f(y_i \mid \beta, \phi, b, \vartheta)\right) f(b \mid \vartheta)\, db$$
(. . . sucks: no closed form, and the integral over b is typically very high-dimensional)

128 / 327
GLMM Estimation Algorithms

I Laplace approximation based: iterate
  1. Compute b̂ = arg max_b L(β, φ, ϑ, b) for given β, φ, ϑ via a penalized IWLS algorithm (P-IRLS).
  2. Maximize a Laplace approximation L̃(β, φ, ϑ) of L(β, φ, ϑ) around b̂ (numerically, typically gradient based)
  (mgcv, with lots of tricksy tricks; lme4 for large b)
I (Gaussian) quadrature based methods: more accurate, much slower (lme4, gamm4); see the sketch below
I penalized quasi likelihood: replace the GLMM by an LMM with IWLS working responses and weights. Biased, not guaranteed to converge, fairly fast. (nlme, mgcv::gamm)
I do (full) Bayes: flexible choice of effect distributions, hyperpriors, likelihoods; very slow (Stan: rstanarm, brms)

129 / 327
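With a single scalar random effect per group, lme4::glmer can switch between the Laplace approximation (nAGQ = 1, the default) and adaptive Gauss-Hermite quadrature with more nodes; a sketch on the cbpp data (our example choice):

library(lme4)
data(cbpp, package = "lme4")
f <- cbind(incidence, size - incidence) ~ period + (1 | herd)
m_laplace <- glmer(f, data = cbpp, family = binomial)             # nAGQ = 1: Laplace
m_agq     <- glmer(f, data = cbpp, family = binomial, nAGQ = 25)  # adaptive quadrature
cbind(laplace = fixef(m_laplace), agq = fixef(m_agq))             # typically very close here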
Mixed Models in a Nutshell
I standard regression models can model only the structure of the
expected values of the response
I mixed models are regression models in which a subset of coefficients
are assumed to be random unknown quantities from a known
distribution instead of fixed unknowns, and this means we can
I model the covariance structure of the data (marginal perspective)
I estimate (a large number of) subject-level coefficients without too
much trouble (conditional perspective)
I random intercepts can be used to model subject-specific differences in
the level of the response
→ grouping variable as a special kind of nominal covariate
I a random slope for a covariate is like an interaction between the
grouping variable and that covariate
→ grouping variable as a special kind of effect modifier for that
covariate
I hard estimation problems: variance components difficult to optimize,
often very high-dim. b
130 / 327
Recap: Linear Models

Recap: Generalized Linear Models

Recap: Non-Linear Effects

Recap: Mixed Models and Random Effects

Recap: Additive Models and Penalization


Penalization: Controlling smoothness
Smoothing Parameter Optimization
Generalized Additive Models
Surface Estimation
Varying coefficients

131 / 327
Splines

I Splines
I piecewise polynomials with smoothness properties at knot locations
I can be embedded into (generalized) linear models (e.g. ML estimates)
I Problem: choice of optimal knot setting.
I Two-fold problem:
I how many knots?
I where to put them?
I two possible solutions, one of them good:
I adaptive knot choice: make no. of knots and their positioning part of
optimization procedure
I penalization: use large number of knots to guarantee sufficient model
capacity, but add a cost (penalty) for wiggliness / complexity to
optimization procedure

131 / 327
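A minimal penalized-spline example with mgcv, using the motorcycle impact data from MASS (our choice of example, not from the slides): a generous basis (k = 20) provides enough capacity, and the automatically tuned wiggliness penalty keeps the fit smooth.

library(mgcv)
data(mcycle, package = "MASS")
# penalized regression spline, smoothing parameter selected by REML
m_pen <- gam(accel ~ s(times, k = 20), data = mcycle, method = "REML")
summary(m_pen)$edf            # effective degrees of freedom clearly below k - 1
plot(m_pen, residuals = TRUE) # fitted smooth with partial residuals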
Function estimation example: climate reconstruction


[Figure: two panels of noisy climate reconstruction scatterplots over years 0 to 2000, values roughly between -1.0 and 0.5]
