
Regression & other methods for Functional Data

Fabian Scheipl
Institut für Statistik
Ludwig-Maximilians-Universität München

adidas - March 2019


Part I

Background: Functional Data

2 / 327
Introduction
Overview
From high-dimensional to functional data

Descriptive Statistics for Functional Data

Basis Representation of Functional Data

Summary

4 / 327
Introduction
Overview
Examples of functional data: Berkeley growth study

[Figure: Berkeley growth study — Height (cm) vs. Age (years)]
4 / 327
Introduction
Overview
Examples of functional data: Handwriting

[Figure: handwriting trajectory — y(t) vs. x(t)]

5 / 327
Introduction
Overview
Examples of functional data: Brain scan images

6 / 327
Introduction
Overview
Characteristics of functional data:
[Figures: Berkeley growth curves (Height vs. Age) and handwriting trajectories (y(t) vs. x(t))]

I Several measurements for the same statistical unit, often over time
I Sampling grid is not necessarily equally spaced; data may be sparse
I Smooth variation that could (in principle) be assessed as often as desired
I Noisy observations
I Many observations of the same data-generating process
(in contrast to time series analysis, where typically only one realization is observed)
J. Ramsay and Silverman 2005
7 / 327
Introduction
Overview

Aims of functional data analysis:


I Represent the data → interpolation, smoothing
I Display the data → registration, outlier detection
I Study sources of pattern and variation → functional principal
component analysis, canonical correlation analysis
I Explain variation in a dependent variable by using independent
variable information → functional regression models
I No forecasting / extrapolation ↔ time series analysis
[Figure: noisy, discretely observed curves x(t) over t ∈ [0, 1]]

J. Ramsay and Silverman 2005

8 / 327
Introduction
Overview

Aims of functional data analysis:


I Represent the data → interpolation, smoothing
I Display the data → registration, outlier detection
I Study sources of pattern and variation → functional principal
component analysis, canonical correlation analysis
I Explain variation in a dependent variable by using independent
variable information → functional regression models
I No forecasting / extrapolation ↔ time series analysis

Scalar-on-Function:     yi = µ + ∫ xi(s) β(s) ds + ε

Function-on-Scalar:     yi(t) = µ(t) + xi β(t) + ε(t)

Function-on-Function:   yi(t) = µ(t) + ∫ xi(s) β(s, t) ds + ε(t)
8 / 327
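A minimal R sketch (not from the slides) of how the scalar-on-function linear predictor can be computed from discretized curves: the integral ∫ xi(s)β(s)ds is approximated by a Riemann sum. All data, grid and coefficient values below are made up for illustration.

# toy scalar-on-function predictor via a Riemann sum (simulated placeholder data)
set.seed(1)
s      <- seq(0, 1, length.out = 100)                       # evaluation grid for the functional covariate
n      <- 50
X      <- t(replicate(n, cumsum(rnorm(length(s))) / 10))    # n x 100 matrix, X[i, j] = x_i(s_j)
beta_s <- sin(2 * pi * s)                                    # hypothetical coefficient function beta(s) on the grid
mu     <- 1
ds     <- diff(s)[1]                                         # grid spacing (equidistant grid assumed)

# y_i = mu + int x_i(s) beta(s) ds + eps_i, with the integral replaced by sum_j x_i(s_j) beta(s_j) * ds
y <- mu + as.vector(X %*% beta_s) * ds + rnorm(n, sd = 0.1)
head(y)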
Outline

Introduction
Overview
From high-dimensional to functional data

Descriptive Statistics for Functional Data

Basis Representation of Functional Data

Summary

9 / 327
Introduction
From high-dimensional to functional data

Standard setting in multivariate data analysis:

[Schematic: data matrix with n observations (rows) and p variables (columns)]

I Observations xi = (xi1, . . . , xip) for i = 1, . . . , n

I Model complexity increases with p (Curse of Dimensionality)

9 / 327
Introduction
From high-dimensional to functional data
Data with natural ordering:

    xi1, xi2, xi3, . . . , xip   observed at ordered time points t1 < t2 < · · · < tp

I Longitudinal data
I Ordering along time domain (one-dimensional)

Functional data:

    xi1, xi2, xi3, . . . , xip   observed at points t1, . . . , tp within a continuous domain T

I Basic idea: Model discretely observed data by functions on domain T


10 / 327
Introduction
From high-dimensional to functional data
Functional data:

I Observations xi (t), t ∈ T for i = 1, . . . , n


I Number of observable values xi (t1 ), . . . , xi (tp )
I in theory: p → ∞
I in practice: p < ∞
I Domain T
I Realizations x1 , . . . , xn of X are curves (d = 1), images (d = 2), 3D
arrays (d = 3), etc.

11 / 327
Introduction

Descriptive Statistics for Functional Data


Pointwise measures
Covariance and Correlation Functions

Basis Representation of Functional Data

Summary

12 / 327
Descriptive Statistics for Functional Data
Pointwise measures
Example: Growth curves of 54 girls

[Figure: growth curves of 54 girls — height (cm) vs. age (years)]

Summary Statistics:
I Based on observed functions x1 (t), . . . , xn (t)
I Characterize location, variability, dependence between time points, ...
12 / 327
Descriptive Statistics for Functional Data
Pointwise measures
Example: Growth curves of 54 girls
[Figure: growth curves (height in cm) and centered curves (deviation from mean height in cm), each vs. age (years)]

Sample mean function:           µ̂X(t) = (1/n) Σ_{i=1}^{n} xi(t)

Centered curves:                xi(t) − µ̂X(t)
I Pointwise calculation for each value t ∈ T
I Analogous to multivariate case 13 / 327
Descriptive Statistics for Functional Data
Pointwise measures
Example: Growth curves of 54 girls
[Figure: sample variance function (cm²) and standard deviation function (cm), each vs. age (years)]

Sample variance function:       σ̂²X(t) = 1/(n−1) Σ_{i=1}^{n} (xi(t) − µ̂X(t))²

Standard deviation function:    σ̂X(t) = √(σ̂²X(t))

14 / 327
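A small R sketch (not from the slides) of these pointwise summaries for regularly sampled curves stored row-wise in a matrix; the curve matrix and grid below are simulated placeholders.

# pointwise summary statistics for curves in an n x p matrix (rows = curves, columns = time points)
t_grid <- seq(1, 18, length.out = 31)                           # e.g. age in years
curves <- matrix(rnorm(54 * 31, mean = 130, sd = 10), nrow = 54)

mu_hat   <- colMeans(curves)                                    # sample mean function, computed pointwise
centered <- sweep(curves, 2, mu_hat)                            # centered curves x_i(t) - mu_hat(t)
var_hat  <- apply(curves, 2, var)                               # sample variance function (uses 1/(n-1))
sd_hat   <- sqrt(var_hat)                                       # standard deviation function

matplot(t_grid, t(centered), type = "l", lty = 1,
        xlab = "t", ylab = "centered curves")
lines(t_grid, sd_hat, lwd = 2)                                  # overlay pointwise SD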
Outline

Introduction

Descriptive Statistics for Functional Data


Pointwise measures
Covariance and Correlation Functions

Basis Representation of Functional Data

Summary

15 / 327
Descriptive Statistics for Functional Data
Covariance and Correlation Functions

Covariance / Correlation functions:


I Measure dependence between different (time) points s, t ∈ T
I Sample covariance function:

      v̂X(s, t) = 1/(n−1) Σ_{i=1}^{n} (xi(s) − µ̂X(s)) · (xi(t) − µ̂X(t))

I Sample correlation function:

      ĉX(s, t) = v̂X(s, t) / √(σ̂²X(s) σ̂²X(t))

15 / 327
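A small R sketch (not from the slides): for regularly sampled curves in an n x p matrix, the sample covariance and correlation functions evaluated on the grid are just the column-wise covariance and correlation matrices. The data below are simulated placeholders.

# sample covariance / correlation functions for curves in an n x p matrix
curves <- matrix(rnorm(54 * 31), nrow = 54)
t_pts  <- seq(1, 18, length.out = ncol(curves))

v_hat <- cov(curves)   # p x p matrix with entries vhat_X(t_j, t_l), uses the 1/(n-1) scaling
c_hat <- cor(curves)   # p x p matrix with entries chat_X(t_j, t_l)

persp(t_pts, t_pts, v_hat, theta = 30, phi = 30,
      xlab = "s", ylab = "t", zlab = "covariance")   # perspective plot of the covariance surface
image(t_pts, t_pts, c_hat, xlab = "s", ylab = "t")   # heat map of the correlation surface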
Descriptive Statistics for Functional Data
Covariance and Correlation Functions

Example: Growth curves of 54 girls

Sample covariance function

[Figure: perspective and contour plot of the sample covariance function v̂X(s, t) over age (years) × age (years)]

16 / 327
Descriptive Statistics for Functional Data
Covariance and Correlation Functions

Example: Growth curves of 54 girls

Sample correlation function

[Figure: perspective and contour plot of the sample correlation function ĉX(s, t) over age (years) × age (years)]

17 / 327
Introduction

Descriptive Statistics for Functional Data

Basis Representation of Functional Data


Regularly and irregularly sampled functional data
Basis functions
Basis representations for functional data
Most popular choices of basis functions
Smoothness and regularization
Other representations of functional data

Summary

18 / 327
Basis Representation of Functional Data
Regularly and irregularly sampled functional data

Example: bacterial growth curve — i-th growth curve xi(t)

[Figure: observed bacterial growth curve y vs. t]

Observed measurements: the pairs (tj, xi(tj)) for j = 1, . . . , p, i.e. a column of time
points t1, . . . , tp next to a column of values xi(t1), . . . , xi(tp).

18 / 327
Basis Representation of Functional Data
Regularly and irregularly sampled functional data
Example: bacterial growth curves — sample of curves x1(t), . . . , xN(t)

[Figure: observed bacterial growth curves y vs. t]

Observed measurements in 'wide format': one row per time point t1, . . . , tp and one
column per curve, holding the values x1(tj), . . . , xN(tj).

⇒ Regular functional data:


I functions observed on a common grid (often equidistant)
I simpler case
I to some extent, methods of multivariate statistics can be applied directly
19 / 327
Basis Representation of Functional Data
Regularly and irregularly sampled functional data

Example: bacterial growth curves — sample of curves x1(t), . . . , xN(t)

[Figure: observed bacterial growth curves y vs. t]

Observed measurements in 'long format': one row per single observation, stacking the
pairs (ti,j, xi(ti,j)) for j = 1, . . . , pi and i = 1, . . . , N.

⇒ Irregular functional data:
I functions observed at different time points
I sometimes only sparsely sampled
I more difficult, but often encountered in practice
20 / 327
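A small R sketch (not from the slides) of the two storage formats, with made-up values: the wide format needs a common grid, the long format also accommodates curve-specific grids.

# the same two toy curves in 'wide' and 'long' format
t_grid <- 0:10
wide <- cbind(t  = t_grid,
              x1 = sin(t_grid / 2),
              x2 = cos(t_grid / 2))            # regular data: one row per time point, one column per curve

# long format: one row per (curve, time point) pair; this also works for irregular data
# where each curve i has its own grid t_{i,1}, ..., t_{i,p_i}
long <- data.frame(id = rep(c("x1", "x2"), each = length(t_grid)),
                   t  = rep(t_grid, times = 2),
                   x  = c(wide[, "x1"], wide[, "x2"]))
head(long)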
Outline

Introduction

Descriptive Statistics for Functional Data

Basis Representation of Functional Data


Regularly and irregularly sampled functional data
Basis functions
Basis representations for functional data
Most popular choices of basis functions
Smoothness and regularization
Other representations of functional data

Summary

21 / 327
Basis Representation of Functional Data
Basis functions

Basis representation: construct functions as a weighted sum of basis functions
bk(t), k = 1, . . . , K:

      f(t) = Σ_{k=1}^{K} θk bk(t)

with basis coefficients θ1, . . . , θK.

[Figure: scaled basis functions θk bk(t) and their sum f(t) = Σk θk bk(t)]

21 / 327
Basis Representation of Functional Data
Basis functions

Basis representation: the functional shape is determined by the basis coefficients,
i.e. by the vector (θ1, . . . , θK) with one coefficient per basis function k = 1, . . . , K.

[Figure: scaled basis functions θk bk(t) and the resulting curve]

Function given by

      f(t) = Σ_{k=1}^{K} θk bk(t)

22 / 327
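A small R sketch (not from the slides) of evaluating f(t) = Σk θk bk(t) from a B-spline basis; the basis dimension and the coefficient values are made up for illustration.

# evaluate a function from a B-spline basis and a coefficient vector
library(splines)

t     <- seq(0, 50, length.out = 200)
B     <- bs(t, df = 8, degree = 3, intercept = TRUE)   # 200 x 8 matrix of basis evaluations b_k(t_j)
theta <- c(0, 1, 3, 4, 4.5, 4.7, 4.8, 4.8)             # one (made-up) coefficient per basis function

f_t <- as.vector(B %*% theta)                          # f(t_j) = sum_k theta_k b_k(t_j)
plot(t, f_t, type = "l", ylab = "f(t)")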
Outline

Introduction

Descriptive Statistics for Functional Data

Basis Representation of Functional Data


Regularly and irregularly sampled functional data
Basis functions
Basis representations for functional data
Most popular choices of basis functions
Smoothness and regularization
Other representations of functional data

Summary

23 / 327
Basis Representation of Functional Data
Basis representations for functional data
Basis representation: approximate the data with basis functions

[Figure: discrete observations of a curve and its basis-function approximation]

⇒ seek to specify θ̂i,1, . . . , θ̂i,K such that

      xi(t) ≈ Σ_{k=1}^{K} θ̂i,k bk(t).

⇒ Popular criterion:
Specify θ̂i,1, . . . , θ̂i,K such that the quadratic distance becomes minimal, i.e.

      Σ_j ( xi(tj) − Σ_{k=1}^{K} θi,k bk(tj) )²  →  min over θi,k

summing over the observed time points tj.
23 / 327
Basis Representation of Functional Data
Basis representations for functional data

Basis representation: sample of curves x1(t), . . . , xN(t)

[Figure: basis-representation fits of the sample curves]

Basis representations of the observed measurements: a K × N coefficient matrix, with
one column (θ̂i,1, . . . , θ̂i,K) per curve i = 1, . . . , N.

Functional observations represented as xi(t) ≈ Σ_{k=1}^{K} θ̂i,k bk(t).

24 / 327
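A small R sketch (not from the slides) of the unpenalized least-squares fit of basis coefficients for a single observed curve; the toy measurements below stand in for the (tj, xi(tj)) pairs.

# least-squares fit of basis coefficients for one curve
library(splines)

t_obs <- seq(0, 50, length.out = 60)
x_obs <- 4 / (1 + exp(-(t_obs - 20) / 4)) + rnorm(60, sd = 0.2)   # noisy toy growth curve

B         <- bs(t_obs, df = 10, intercept = TRUE)   # basis evaluations b_k(t_j)
theta_hat <- coef(lm(x_obs ~ B - 1))                # minimizes sum_j (x_i(t_j) - sum_k theta_k b_k(t_j))^2

x_fit <- as.vector(B %*% theta_hat)
plot(t_obs, x_obs); lines(t_obs, x_fit)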
Outline

Introduction

Descriptive Statistics for Functional Data

Basis Representation of Functional Data


Regularly and irregularly sampled functional data
Basis functions
Basis representations for functional data
Most popular choices of basis functions
Smoothness and regularization
Other representations of functional data

Summary

25 / 327
Basis Representation of Functional Data
Most popular choices of basis functions

Basis representation: B-spline bases

[Figure: B-spline basis functions of degree 1, 2 and 3]

I piecewise polynomials of degree d
I basis functions consist of (d − 1)-times differentiably connected polynomial pieces
I the pieces connect at knots, whose number determines the number of basis functions
I cheap to compute & numerically stable
I local support: sparse matrix of basis function evaluations

25 / 327
Basis Representation of Functional Data
Most popular choices of basis functions

Other popular bases:


I Fourier basis: containing harmonics with different frequencies
⇒ periodic functions
I Wavelets:
⇒ for peaked, ragged functions.
I Thin-plate splines
⇒ better theory, also for surfaces.

26 / 327
Outline

Introduction

Descriptive Statistics for Functional Data

Basis Representation of Functional Data


Regularly and irregularly sampled functional data
Basis functions
Basis representations for functional data
Most popular choices of basis functions
Smoothness and regularization
Other representations of functional data

Summary

27 / 327
Basis Representation of Functional Data
Smoothness and regularization

Basis representation

[Figure: basis functions θi bi(t) and the fitted sum Σi θi bi(t)]

I how many knots for the basis?
I trade-off between over-fitting and under-fitting

27 / 327
Basis Representation of Functional Data
Smoothness and regularization

Penalization:
I minimize the quadratic difference from the data plus a roughness penalty term.
Specify θ̂i,1, . . . , θ̂i,K to minimize

      Σ_{j=1}^{p} ( xi(tj) − Σ_{k=1}^{K} θi,k bk(tj) )²  +  λ · pen(θi)  →  min over θi,k

I with, e.g., a quadratic penalty on second-order differences, i.e.
  pen(θi) = Σ_{k=3}^{K} ((θi,k − θi,k−1) − (θi,k−1 − θi,k−2))², and λ > 0 a smoothing parameter

28 / 327
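A small R sketch (not from the slides): the penalized criterion above has the closed-form solution θ̂i = (B⊤B + λD⊤D)⁻¹ B⊤xi with D the second-order difference matrix, which can be computed directly. Data and λ below are made up.

# penalized least-squares fit with a second-order difference penalty
library(splines)

t_obs  <- seq(0, 50, length.out = 60)
x_obs  <- 4 / (1 + exp(-(t_obs - 20) / 4)) + rnorm(60, sd = 0.2)
B      <- bs(t_obs, df = 20, intercept = TRUE)       # deliberately rich basis
K      <- ncol(B)
D      <- diff(diag(K), differences = 2)             # second-order difference matrix
lambda <- 10                                         # smoothing parameter (fixed here; usually estimated)

theta_pen <- solve(crossprod(B) + lambda * crossprod(D), crossprod(B, x_obs))
plot(t_obs, x_obs); lines(t_obs, B %*% theta_pen)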
Basis Representation of Functional Data
Smoothness and regularization
[Figure: penalized basis fits with λ = 0, λ = 1 and λ = 1000]
I λ is typically estimated from the data, e.g. using cross validation

29 / 327
Outline

Introduction

Descriptive Statistics for Functional Data

Basis Representation of Functional Data


Regularly and irregularly sampled functional data
Basis functions
Basis representations for functional data
Most popular choices of basis functions
Smoothness and regularization
Other representations of functional data

Summary

30 / 327
Basis Representation of Functional Data
Other representations of functional data

I Functional principal components (Wang et al. 2016):
I basis representation learned from the observed data
I “optimal” (low-dimensional) basis
I more on this later
I Gaussian processes: x(t) ∼ GP(µX(t), σX(t, t′)) (Shi and Choi 2011)
I Gaussianity assumption
I σX(t, t′) from some parametric family
I µX, σX estimated from data
I Differential equations / dynamics (J. Ramsay and Hooker 2017):
I represent functional data in terms of differential equations describing their
behavior: d/dt x(t) = f(x(t))
I seems very useful for physical systems, motion data etc.
I (available literature uses spline representations internally)

30 / 327
Introduction

Descriptive Statistics for Functional Data

Basis Representation of Functional Data

Summary

31 / 327
Summary
Functional Data:
I Arises in many different contexts and in many applications (curves,
images,...)
I Observation unit represents the full curve, typically discretized, i.e.
observed on a grid
I Important analysis techniques:
I Smoothing and basis representation
I Functional principal component analysis
I Functional regression

Summary Statistics:
I Give insights into location, variability and time dependence in a
sample of curves
I Pointwise calculation, mostly analogous to multivariate case

31 / 327
Summary

Basis representation:
I Different types of raw functional data: regularly and irregularly
sampled
I (Approximate) representation via bases of functions
I ’true functional representation’
I smoothing / vector representation
I Represent a functional datum in terms of a global, fixed, known
dictionary of basis functions and an observation-specific coefficient
vector.
I Different types of basis functions for different purposes
I Obtaining desired ’smoothness’ via penalization

32 / 327
Part II

Background: Regression

33 / 327
Recap: Linear Models

Recap: Generalized Linear Models

Recap: Non-Linear Effects

Recap: Mixed Models and Random Effects

Recap: Additive Models and Penalization


Recap: Linear Models
Linear Model: Basics
Inference
Model Diagnostics
R-Implementation: LM

Recap: Generalized Linear Models

Recap: Non-Linear Effects

Recap: Mixed Models and Random Effects

Recap: Additive Models and Penalization

35 / 327
Data & Model

Data:
I (yi, xi1, . . . , xip); i = 1, . . . , n
I metric target variable y
I metric or categorical covariates x1, . . . , xp (categorical covariates in binary coding)
Model:
I yi = β0 + β1 xi1 + · · · + βp xip + εi;  i = 1, . . . , n
  ⇒ y = Xβ + ε;  X = [1, x1, . . . , xp]
I i.i.d. residuals/errors εi ∼ N(0, σ²);  i = 1, . . . , n
I estimates ŷi = β̂0 + β̂1 xi1 + · · · + β̂p xip

35 / 327
Interpreting the coefficients

Intercept:
β̂0: estimate for y if all metric x = 0 and all categorical x are in their reference category.
Metric covariates:
β̂m: estimated expected change in y if xm increases by 1 (ceteris paribus).
Categorical covariates (dummy/one-hot encoding):
β̂mc: estimated expected difference in y between observations in category c and the
reference category of xm (ceteris paribus).

36 / 327
Outline

Recap: Linear Models


Linear Model: Basics
Inference
Model Diagnostics
R-Implementation: LM

Recap: Generalized Linear Models

Recap: Non-Linear Effects

Recap: Mixed Models and Random Effects

Recap: Additive Models and Penalization

37 / 327
Linear Model Estimation

β̂ minimizes the sum of squared errors (OLS estimate):

      Σ_{i=1}^{n} (yi − xi⊤β)²  →  min over β,   or equivalently   (y − Xβ)⊤(y − Xβ)  →  min over β

      ⇒ β̂ = (X⊤X)⁻¹ X⊤y

Estimated error variance:

      σ̂² = 1/(n − p) Σ_{i=1}^{n} (yi − xi⊤β̂)² = 1/(n − p) ε̂⊤ε̂

37 / 327
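A short R sketch (not from the slides) computing these formulas directly on a simulated toy data set and checking the result against lm().

# OLS estimate and error variance by hand
set.seed(1)
n <- 100; x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 2 * x1 - x2 + rnorm(n)
X <- cbind(1, x1, x2)                                # design matrix [1, x1, x2]

beta_hat  <- solve(crossprod(X), crossprod(X, y))    # (X'X)^(-1) X'y
resid_hat <- y - X %*% beta_hat
sigma2    <- sum(resid_hat^2) / (n - ncol(X))        # 1/(n-p) * e'e

cbind(by_hand = beta_hat, lm = coef(lm(y ~ x1 + x2)))  # matches the lm() fit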
Properties of β̂

I unbiased: E(β̂) = β
I Cov(β̂) = σ²(X⊤X)⁻¹
I for Gaussian ε:  β̂ ∼ N(β, σ²(X⊤X)⁻¹)

38 / 327
Tests

Possible settings:
1. Testing for significance of a single coefficient:
   H0: βj = 0  vs.  HA: βj ≠ 0
2. Testing for significance of a subvector βt = (βt1, . . . , βtr)⊤:
   H0: βt = 0  vs.  HA: βt ≠ 0
3. Testing for equality: H0: βj − βr = 0  vs.  HA: βj − βr ≠ 0
General:
Testing linear hypotheses H0: Cβ = d

39 / 327
Tests
F-Test:
Compare the sum of squared errors (SSE) of the full model with the SSE under the
restriction H0:

      F = (n − p)/r · (SSE_H0 − SSE)/SSE
        = (Cβ̂ − d)⊤ [σ̂² C(X⊤X)⁻¹C⊤]⁻¹ (Cβ̂ − d) / r  ∼  F(r, n − p) under H0

t-Test:
Test significance of a single coefficient:

      t = β̂j / √(V̂ar(β̂j))  ∼  t(n − p) under H0

      F = t² = β̂j² / V̂ar(β̂j)  ∼  F(1, n − p) under H0

40 / 327
Outline

Recap: Linear Models


Linear Model: Basics
Inference
Model Diagnostics
R-Implementation: LM

Recap: Generalized Linear Models

Recap: Non-Linear Effects

Recap: Mixed Models and Random Effects

Recap: Additive Models and Penalization

41 / 327
Residuals in the linear model

The estimated residuals ε̂ are, in general, neither uncorrelated nor homoscedastic:

      ŷ = Xβ̂ = X(X⊤X)⁻¹X⊤ y = Hy,   with hat matrix H = X(X⊤X)⁻¹X⊤
      ⇒ ε̂ = y − ŷ = (I − H)y
      ⇒ Cov(ε̂) = σ²(I − H)

41 / 327
Types of Residuals

I ordinary residuals: ε̂ (not independent, no constant variance)

I standardized residuals: ri = ε̂i / (σ̂ √(1 − hii))  (constant variance)

I studentized residuals: ri* = ε̂i / (σ̂(−i) √(1 − hii)):
  use for anomaly / outlier detection.

I partial residuals: ε̂xj,i = ε̂i + β̂j xij:
  check linearity, additivity.

42 / 327
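A short R sketch (not from the slides): these residual types are available via built-in extractor functions, illustrated here on the built-in cars data.

# residual types for an lm fit
fit <- lm(dist ~ speed, data = cars)

e_ord  <- resid(fit)                            # ordinary residuals
r_std  <- rstandard(fit)                        # standardized residuals
r_stud <- rstudent(fit)                         # studentized residuals (leave-one-out sigma)
h_ii   <- hatvalues(fit)                        # leverages h_ii from the hat matrix

# standardized residuals by hand: e_i / (sigma_hat * sqrt(1 - h_ii))
all.equal(r_std, e_ord / (summary(fit)$sigma * sqrt(1 - h_ii)))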
Graphical model checks:

I model structure: ri vs ŷi


I linearity: ε̂xj ,i vs xj
I variance homogeneity: ri vs ŷi , xj
I autocorrelation: ri , ε̂i vs i (i = time, e.g.)

43 / 327
Outline

Recap: Linear Models


Linear Model: Basics
Inference
Model Diagnostics
R-Implementation: LM

Recap: Generalized Linear Models

Recap: Non-Linear Effects

Recap: Mixed Models and Random Effects

Recap: Additive Models and Penalization

44 / 327
Linear Model in R:

Linear Models in R: lm model specification:


I m <- lm(y ~ x1 + x2, data=XY)
interactions:
I lm(y ~ x1*x2) equivalent to lm(y ~ x1 + x2 + x1:x2)
methods for lm-objects:
I summary(),anova(),fitted(),predict(),resid()
I coef(), confint(), vcov(), influence()
I plot()
etc...

44 / 327
Example: Munich Rents 1999

I data: 3082 apartments


I target: net rent (DM/sqm)
I metric covariates: size, year of construction
I categorical covariates: area (normal/good/best), central heating
(yes/no), bathroom / kitchen fittings (normal/superior)

45 / 327
Model in R
no interaction:
y = β0 + β1 ∗ x1 + β2 ∗ x2.2 + β3 ∗ x2.3
miet1 <- lm(rentsqm ~ size + area)
(beta.miet1 <- coef(miet1))

## (Intercept) size areagood areabest


## 18.2429185 -0.0715132 0.9059416 3.4196824

with interaction:
y = β0 + β1 ∗ x1 + β2 ∗ x2.2 + β3 ∗ x2.3 + β4 x1 x2.2 + β5 x1 x2.3
miet2 <- lm(rentsqm ~ size * area)
(beta.miet2 <- coef(miet2))

## (Intercept) size areagood areabest


## 18.67890804 -0.07817872 0.11145940 0.87292650
## size:areagood size:areabest
## 0.01182596 0.03302475
Model Visualisation

[Figure: fitted regression lines for miet1 (no interaction) and miet2 (with interaction) — net rent (DM/sqm) vs. size (sqm), separately by area (normal/good/best)]


Tests

anova(update(miet2, . ~ -.), miet2)

## Analysis of Variance Table


##
## Model 1: rentsqm ~ 1
## Model 2: rentsqm ~ size * area
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 3081 69521
## 2 3076 60064 5 9457.3 96.866 < 2.2e-16 ***
## ---
## Signif. codes:
## 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Tests
round(summary(miet2)$coefficients, 3)

## Estimate Std. Error t value Pr(>|t|)


## (Intercept) 18.679 0.329 56.736 0.000
## size -0.078 0.005 -16.376 0.000
## areagood 0.111 0.494 0.226 0.821
## areabest 0.873 1.542 0.566 0.571
## size:areagood 0.012 0.007 1.716 0.086
## size:areabest 0.033 0.018 1.797 0.072

round(anova(miet2), 3)

## Analysis of Variance Table


##
## Response: rentsqm
## Df Sum Sq Mean Sq F value Pr(>F)
## size 1 8071 8071.3 413.346 <2e-16 ***
## area 2 1284 641.9 32.875 <2e-16 ***
## size:area 2 102 51.1 2.617 0.073 .
## Residuals 3076 60064 19.5
## ---
## Signif. codes:
## 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Model Comparison

anova(miet1, miet2)

## Analysis of Variance Table


##
## Model 1: rentsqm ~ size + area
## Model 2: rentsqm ~ size * area
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 3078 60166
## 2 3076 60064 2 102.19 2.6168 0.0732 .
## ---
## Signif. codes:
## 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

complex hypotheses / multiple testing: package multcomp


Model diagnostics: plot.lm()
par(mfrow = c(2, 2))
plot(miet2)

[Figure: plot.lm() diagnostics for miet2 — Residuals vs Fitted, Normal Q–Q, Scale–Location, and Residuals vs Leverage (with Cook's distance)]
Model criticism: Linear effect of size?
[Figure: residuals of the rent model plotted against size (sqm)]
Model criticism: Linear effect of size?
plot(size, res_size)
points(sort(unique(size)), tapply(res_size, size, mean), col = "red")
[Figure: residuals plotted against size (sqm); red points mark the mean residual at each observed size]
Alternative Representation of linear models

I y = Xβ + ε
I Gaussian errors: ε ∼ N(0, σ²I)
  ⇒ y ∼ N(Xβ, σ²I)
  ⇒ E(y) = Xβ;  Var(y) = σ²I

54 / 327
Recap: Linear Models

Recap: Generalized Linear Models


Motivation
GLMs: The General Approach
Inference

Recap: Non-Linear Effects

Recap: Mixed Models and Random Effects

Recap: Additive Models and Penalization

55 / 327
Binary Target: Naive Approach

Data:
I binary target y (0 or 1)
I metric and/or categorical covariates x1, . . . , xp
Naive estimates:
      ŷi = β̂0 + β̂1 xi1 + · · · + β̂p xip
I ŷi not binary
I could try to interpret ŷi as P̂(yi = 1)
I no variance homogeneity
I ŷi < 0? ŷi > 1? ⇒ ŷi must be between 0 and 1
Idea:
      P̂(yi = 1) = h(xi⊤β̂)  with  h: (−∞, +∞) → [0, 1]

55 / 327
Binary Target: GLM Approach

I yi ∼ B(1, πi)
I model for E(yi) = P(yi = 1) = πi
I use response function h:  π̂i = h(xi⊤β̂)
  or link function g:  g(π̂i) = xi⊤β̂,  where g = h⁻¹

Logit model:
      π̂i = h(xi⊤β̂) = exp(xi⊤β̂) / (1 + exp(xi⊤β̂))

56 / 327
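A short R sketch (not from the slides): the logistic response function applied to an arbitrary range of linear-predictor values, compared against the built-in logistic CDF.

# logit response function h(eta) = exp(eta) / (1 + exp(eta))
eta    <- seq(-5, 5, length.out = 101)           # values of the linear predictor x_i' beta
pi_hat <- exp(eta) / (1 + exp(eta))

all.equal(pi_hat, plogis(eta))                   # identical to the built-in logistic CDF
plot(eta, pi_hat, type = "l", xlab = "linear predictor", ylab = "P(y = 1)")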
Binary Target: Coefficients of the Logit Model

      πi = exp(xi⊤β) / (1 + exp(xi⊤β))   ⇔   log( πi / (1 − πi) ) = xi⊤β
                                         ⇔   πi / (1 − πi) = exp(β0) exp(β1 xi1) · · · exp(βp xip)

I linear model for the log-odds (logits)

⇒ exp(β̂r) is the factor by which the odds π̂i / (1 − π̂i) change if xir increases by 1.

      exp( (x − x̃)⊤β̂ ) = [ P̂(y = 1|x) / P̂(y = 0|x) ] / [ P̂(y = 1|x̃) / P̂(y = 0|x̃) ]

is the odds ratio between two observations with covariate vectors x and x̃.

57 / 327
Binary Target: Probit- & cloglog-Models

Probit model:
use the standard Gaussian CDF Φ as response function:

      π̂i = Φ(xi⊤β̂)

cloglog model:
response function:

      π̂i = 1 − exp(−exp(xi⊤β̂))

58 / 327
Binary Targets: Expectation and Variance

I no direct connection between expectation (x⊤β) and variance (σ²) in the linear model
I for binary y ∼ B(1, π):
  E(y) = π = P(y = 1) determines Var(y) = π(1 − π)

Overdispersion:
observed variability greater than the theory assumes, e.g. due to
I unobserved heterogeneity
I positively correlated observations
Solution: add a dispersion parameter φ: Var(y) = φ π(1 − π)

59 / 327
Example: Patent Oppositions

I Data: 4832 European patents (European Patent Office)

I Target: opposition against the patent (yes/no; einspruch)
I covariates (metric):
I year of the patent (0 = 1980; jahr)
I citations (azit)
I scope (no. of countries; aland)
I patent claims (ansp)
I covariates (categorical):
I sector (Biotech&Pharma, IT&Semiconductor; branche)
I US patent (uszw)
I patent holder origin (US, D/CH/GB, others; herkunft)

60 / 327
Binary Target: R-Implementation

## The following objects are masked from patent (pos = 4):


##
## aland, ansp, azit, branche, einspruch, herkunft,
## jahr, uszw

pat1 <- glm(einspruch ~ ., data = patent, family = binomial())


round(summary(pat1)$coefficients, 3)

## Estimate Std. Error z value Pr(>|z|)


## (Intercept) -0.771 0.134 -5.765 0.000
## uszwUSPatent -0.392 0.068 -5.795 0.000
## jahr -0.071 0.009 -8.194 0.000
## azit 0.118 0.014 8.297 0.000
## aland 0.084 0.011 7.915 0.000
## ansp 0.018 0.003 5.219 0.000
## brancheBioPharma 0.681 0.084 8.128 0.000
## herkunftD/CH/GB 0.323 0.083 3.897 0.000
## herkunftUS -0.152 0.076 -2.002 0.045
Binary Target: R-Implementation
round(exp(cbind(coef(pat1), confint(pat1))), 3)

## Waiting for profiling to be done...

## 2.5 % 97.5 %
## (Intercept) 0.462 0.355 0.601
## uszwUSPatent 0.676 0.592 0.772
## jahr 0.931 0.915 0.947
## azit 1.125 1.095 1.157
## aland 1.088 1.066 1.111
## ansp 1.018 1.011 1.025
## brancheBioPharma 1.975 1.676 2.328
## herkunftD/CH/GB 1.381 1.174 1.625
## herkunftUS 0.859 0.741 0.997

table(einspruch, estimated = round(fitted(pat1)))

## estimated
## einspruch 0 1
## nein 2223 624
## ja 925 1094
Count Data as Targets

Data:
I positive, whole-number target y (counts, frequencies)
I metric and/or categorical x1, . . . , xp

⇒ naive estimates Ê(yi) = xi⊤β̂ could become negative
⇒ model log(Ê(yi)), i.e.,

  Ê(yi) = exp( xi⊤β̂ ) = exp(β̂0) exp(β̂1 xi1) . . . exp(β̂p xip)

⇒ exponential-multiplicative covariate effects on target

63 / 327
Count Data as Targets: log-linear Model

Distributional assumption:
I yi | xi ∼ Po(λi) ; λi = exp(xi⊤β)
⇒ E(yi) = Var(yi) = λi

Overdispersion:
I frequently Var(yi) ≠ λi:
⇒ more flexible model with dispersion parameter φ: Var(yi) = φ λi
⇒ alternative distributions: Tweedie, Negative Binomial

64 / 327
Example: Patent Citations

pat2 <- glm(azit ~ ., family = poisson, data = patent)


pat3 <- MASS::glm.nb(azit ~ ., data = patent)
AIC(pat2, pat3)

## df AIC
## pat2 9 21021.23
## pat3 10 16341.48

round(cbind(
summary(pat2)$coefficients[2:5, -c(3, 4)],
summary(pat3)$coefficients[2:5, -c(3, 4)]
), 3)

## Estimate Std. Error Estimate Std. Error


## einspruchja 0.442 0.024 0.422 0.046
## uszwUSPatent -0.079 0.024 -0.047 0.046
## jahr -0.070 0.003 -0.079 0.006
## aland -0.026 0.004 -0.029 0.008

⇒ similar estimates, much bigger variability, better fit.


Outline

Recap: Linear Models

Recap: Generalized Linear Models


Motivation
GLMs: The General Approach
Inference

Recap: Non-Linear Effects

Recap: Mixed Models and Random Effects

Recap: Additive Models and Penalization

66 / 327
Definition: GLM

I Structural assumption: Connect conditional expectation and linear predictor Xβ via link/response function:

  E(yi | xi) = µi = h(xi⊤β)  ⇔  g(E(yi | xi)) = g(µi) = xi⊤β

  I logit regression: E(yi | xi) = P(yi = 1 | xi) = exp(xi⊤β) / (1 + exp(xi⊤β))
  I log-linear model: E(yi | xi) = exp(xi⊤β)

I Distributional assumption: Given independent (xi, yi) with exponential family density f(yi):

  ⇒ f(yi | θi) = exp( (yi θi − b(θi)) / φ · ωi − c(yi, φ, ωi) ) ; θi = θ(µi)
  I E(yi | xi) = µi = b′(θi) = h(xi⊤β)
  I Var(yi | xi) = φ b″(θi) / ωi ; ωi = ni

⇒ Connect mean structure and variance structure (and higher moments)

66 / 327
Simple Exponential Families

Distribution                 θ(µ)              b(θ)              φ
Normal N(µ, σ²)              µ                 θ²/2              σ²
Bernoulli B(1, µ)            log(µ/(1 − µ))    log(1 + exp(θ))   1
Poisson Po(µ)                log(µ)            exp(θ)            1
Gamma G(µ, ν)                −1/µ              −log(−θ)          1/ν
Inverse Gaussian IG(µ, σ²)   1/µ²              −√(−2θ)           σ²

67 / 327
Simple Exponential Families

Distribution        E(y) = b′(θ)               b″(θ)       Var(y) = b″(θ) φ/ω
Normal              µ = θ                      1           σ²/ω
Bernoulli           µ = exp(θ)/(1 + exp(θ))    µ(1 − µ)    µ(1 − µ)/ω
Poisson             µ = exp(θ)                 µ           µ/ω
Gamma               µ = −1/θ                   µ²          µ²/(νω)
Inverse Gaussian    µ = 1/√(−2θ)               µ³          µ³ σ²/ω
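R's family objects encode exactly these building blocks; a quick check of link, inverse link and variance function for the Bernoulli case (base R only):

fam <- binomial()    # logit link by default
fam$linkfun(0.5)     # g(mu) = log(mu / (1 - mu))      -> 0
fam$linkinv(0)       # h(eta) = exp(eta)/(1+exp(eta))  -> 0.5
fam$variance(0.5)    # variance function mu * (1 - mu) -> 0.25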

68 / 327
R-Implementation: glm()

glm(formula, family, data, ...)


I formula: as in lm
I family: specify distribution (binomial, gamma, etc.)
and link function g (µ) = Xβ
(family=binomial(link=’probit’)).

69 / 327
Advantages of GLM-Formulation

I unified approach for variety of data situations


⇒ unified methodology for
I estimation
I tests
I model choice and diagnostics
⇒ asymptotics
via Maximum Likelihood approach.

70 / 327
Recent Extensions:

GLM idea in combination with ML inference works similarly for many


other non-exponential family distributions, implemented in mgcv:
I t-distribution
I Tweedie
I Beta
I models for ordinal categorical responses
(Wood, Pya & Säfken, 2016)

71 / 327
Outline

Recap: Linear Models

Recap: Generalized Linear Models


Motivation
GLMs: The General Approach
Inference

Recap: Non-Linear Effects

Recap: Mixed Models and Random Effects

Recap: Additive Models and Penalization

72 / 327
ML Estimation: Idea

I OLS estimate in linear model: Σ_{i=1}^n (yi − xi⊤β)² → min

I density for y: Π_{i=1}^n f(yi | β, xi) = (√(2π) σ)^{−n} exp( − Σ_{i=1}^n (yi − xi⊤β)² / (2σ²) )

⇒ OLS estimate maximizes joint density of observed data over model parameters
⇒ Maximum Likelihood principle:
  maximize (log-)likelihood l(β) = Σ_{i=1}^n log(f(yi | β, xi))
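A small numerical check of this equivalence on simulated data; optim() maximizes the Gaussian log-likelihood and should reproduce the lm() coefficients up to optimizer tolerance:

set.seed(1)
x_sim <- runif(100)
y_sim <- 1 + 2 * x_sim + rnorm(100, sd = 0.5)

negloglik <- function(par) {              # par = (beta0, beta1, log(sigma))
  mu <- par[1] + par[2] * x_sim
  -sum(dnorm(y_sim, mean = mu, sd = exp(par[3]), log = TRUE))
}
opt <- optim(c(0, 0, 0), negloglik)
rbind(ML = opt$par[1:2], OLS = coef(lm(y_sim ~ x_sim)))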

72 / 327
ML Estimation: Procedure

I log-likelihood l(β) = Σ_{i=1}^n log(f(yi | β, xi))
I score function s(β) = ∂l(β)/∂β
I (iterative) solution of s(β) = 0 via Fisher scoring or IWLS

73 / 327
ML Estimation: Fisher-Scoring

I basically Newton's method:

  β^(k+1) = β^(k) − ( ∂s(β)/∂β⊤ )^{−1} s(β)

[Figure: Newton's method illustrated on the score function s(β), with iterates β^(0), β^(1), β^(2)]

74 / 327
ML Estimation: Fisher-Scoring & IWLS

I basically Newton's method:

  β̂^(k+1) = β̂^(k) − ( ∂s(β̂^(k))/∂β⊤ )^{−1} s(β̂^(k))

I observed information matrix H(β̂^(k)) = ∂s(β̂^(k))/∂β⊤ expensive to compute
⇒ use expected Fisher information F(β) = E(H(β)),
  very efficiently computable:
  represent in terms of iteratively re-weighted LS estimation (IWLS) with a diagonal weight matrix W^(k) and working observations ỹi^(k).
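A minimal IWLS sketch for the logit model (illustration only, not glm()'s actual implementation; X is assumed to be a model matrix including an intercept column, y a 0/1 response vector):

iwls_logit <- function(X, y, maxit = 25, tol = 1e-8) {
  beta <- rep(0, ncol(X))
  for (k in seq_len(maxit)) {
    eta <- drop(X %*% beta)            # linear predictor
    mu  <- plogis(eta)                 # response function h(eta)
    w   <- mu * (1 - mu)               # IWLS weights (variance function)
    z   <- eta + (y - mu) / w          # working observations
    beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (converged) break
  }
  beta
}
# should essentially agree with coef(glm(y ~ X - 1, family = binomial()))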

75 / 327
Properties of ML Estimators

β̂ML is consistent, efficient, asymptotically Gaussian:

  β̂ML ∼ N(β, F^{−1}(β))  (asymptotically)

76 / 327
Tests

Linear hypotheses H0 : Cβ = d vs HA : Cβ ≠ d
Estimate for β under restriction H0 : β̃

I LR test:
  lq = −2( l(β̃) − l(β̂) )

I Wald test:
  w = (Cβ̂ − d)⊤ ( C F^{−1}(β̂) C⊤ )^{−1} (Cβ̂ − d)

I Score test:
  u = s(β̃)⊤ F^{−1}(β̃) s(β̃)

under H0: lq, w, u ∼ χ²_r asymptotically, r = rank(C) (no. of restrictions)
⇒ reject H0 if lq, w, u > χ²_r(1 − α).
77 / 327
Tests in R

summary.glm uses √w ∼ N(0, 1) (asymptotically) for H0 : βj = 0:
round(summary(pat2)$coefficients[8:9, ], 3)

## Estimate Std. Error z value Pr(>|z|)


## herkunftD/CH/GB -0.236 0.031 -7.524 0.000
## herkunftUS 0.061 0.026 2.358 0.018

anova.glm(..., test=’Chisq’) for LR-Tests:


anova(update(pat2, . ~ . - herkunft), pat2, test = "Chisq")

## Analysis of Deviance Table


##
## Model 1: azit ~ einspruch + uszw + jahr + aland + ansp + branche
## Model 2: azit ~ einspruch + uszw + jahr + aland + ansp + branche + herkunft
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 4859 13954
## 2 4857 13859 2 95.155 < 2.2e-16 ***
## ---
## Signif. codes:
## 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Model Choice

Which probabilistic model offers the best trade-off between fidelity to training data (more complexity) and parsimony?
⇒ Information criteria:
I Akaike: AIC = −2l(β̂) + 2p → min (AIC())
I Bayes: BIC = −2l(β̂) + log(n)p → min
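BIC is available the same way; a one-line sketch for the two count-data fits from the patent example above:

BIC(pat2, pat3)  # same comparison as AIC(pat2, pat3), with a log(n) penalty per parameter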

79 / 327
Model Diagnostics: Residuals

I Pearson residuals (resid option: type='pearson')
  I riP = (yi − µ̂i) / √(v(µ̂i))
  I for grouped data approx. N(0, 1)
I deviance residuals (resid default)
  I riD = sgn(yi − µ̂i) √( 2(li(yi) − li(µ̂i)) )
  I for grouped data approx. N(0, 1) distributed
I partial residuals (type='partial')
I prediction errors yi − ŷi (type='response')
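All of these are accessible via residuals() on a fitted glm; a quick sketch with the Poisson patent model from above:

head(residuals(pat2, type = "pearson"))
head(residuals(pat2, type = "deviance"))  # the default
head(residuals(pat2, type = "response"))  # yi - fitted values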

80 / 327
Model validation: plot.glm()

par(mfrow = c(2, 2))


plot(pat2)

[Figure: the four glm diagnostic panels for pat2: Residuals vs Fitted, Normal Q−Q, Scale−Location, and Residuals vs Leverage with Cook's distance contours]


Recap: Linear Models

Recap: Generalized Linear Models

Recap: Non-Linear Effects


Transformation of Covariates
Polynomial Splines

Recap: Mixed Models and Random Effects

Recap: Additive Models and Penalization

82 / 327
Motivation
[Figure: residuals (res_size) plotted against the covariate size]
Motivation
par(mfrow = c(1, 2))
plot(size, res_size)
points(sort(unique(size)), tapply(res_size, size, mean), col = "red")
plot(log(size, base = 2), res_size)
points(log(sort(unique(size)), base = 2), tapply(res_size, size, mean), col = "red")
[Figure: res_size plotted against size (left) and against log2(size) (right), with the mean residual per size value marked in red]
Simple Transformation

I linearity is often too restrictive an assumption
I gain flexibility without complex models by using log or polynomials of x
⇒ replace y = βx + ε by y = β f(x) + ε; f(x) = log(x), x³, √x, etc.
I Issues:
  I interpretation of β
  I choice/selection of f(x)

84 / 327
Polynomial Transformation

I Polynomial model:
  I y = f(x) + ε = β0 + β1 x + β2 x² + · · · + βl x^l + ε
  I In R: use poly(x, degree) to avoid collinearity

85 / 327
Polynomial Transformation: Collinearity
x <- seq(0, 1, l = 200)
X <- outer(x, 1:5, "^")
X.c <- poly(x, 5)
round(cor(X), 2)

## [,1] [,2] [,3] [,4] [,5]


## [1,] 1.00 0.97 0.92 0.87 0.82
## [2,] 0.97 1.00 0.99 0.96 0.93
## [3,] 0.92 0.99 1.00 0.99 0.97
## [4,] 0.87 0.96 0.99 1.00 0.99
## [5,] 0.82 0.93 0.97 0.99 1.00

round(cor(X.c), 2)

## 1 2 3 4 5
## 1 1 0 0 0 0
## 2 0 1 0 0 0
## 3 0 0 1 0 0
## 4 0 0 0 1 0
## 5 0 0 0 0 1

⇒ use orthogonal polynomials


Polynomial Transformation: Synthetic example
x <- seq(0, 1, l = 300)
fx <- function(x) {
sin(2 * (4 * x - 2)) + 2 * exp(-16^2 * (x - 0.5)^2)
}
y <- fx(x) + rnorm(300, sd = .3)
X.c <- poly(x, 15)
m.poly3 <- lm(y ~ X.c[, 1:3])
m.poly7 <- lm(y ~ X.c[, 1:7])
m.poly11 <- lm(y ~ X.c[, 1:11])
m.poly15 <- lm(y ~ X.c)
plot(x, y, pch = 19, col = "grey")
lines(x, fx(x), col = 1, lwd = 2)


[Figure: simulated data (x, y) with the true function f(x) overlaid]

87 / 327
Polynomial Transformation: Synthetic example

[Figure: simulated data with the true f and fitted orthogonal-polynomial curves of degree 3, 7, 11 and 15]

88 / 327
Piecewise Polynomials

I polynomial transformations have problems:
  I choice of degree (= flexibility)
  I oscillations, boundary effects for higher degrees
⇒ piecewise polynomials:
  I decompose the range of x into sub-intervals
  I approximate f(x) by a low-degree polynomial in each sub-interval
⇒ removes oscillations, boundary effects

89 / 327
Piecewise Polynomials

[Figure: data with the true f, the degree-15 polynomial fit, and 5 piecewise quadratic polynomials]

⇒ f̂(x) for piecewise polynomials is not continuous


Outline

Recap: Linear Models

Recap: Generalized Linear Models

Recap: Non-Linear Effects


Transformation of Covariates
Polynomial Splines

Recap: Mixed Models and Random Effects

Recap: Additive Models and Penalization

91 / 327
Definition: Polynomial Splines

I better piecewise polynomials
I require continuous differentiability at subinterval boundaries
I formally:
  f : [a, b] → R is a polynomial spline of degree l ≥ 0 with knots a = κ1 < · · · < κm = b if
  1. f(x) is (l − 1)-times continuously differentiable
  2. f(x) is a polynomial of degree l on [κj, κj+1)
⇒ choice of degree l determines smoothness of the function
⇒ knot set κ defines flexibility/complexity of f
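In splines::bs() these two choices map directly onto the degree and knots arguments; a small sketch (kappa is an illustrative vector of interior knots, x as in the synthetic example above):

library("splines")
kappa <- seq(0.1, 0.9, by = 0.2)        # interior knots (assumption for illustration)
B <- bs(x, degree = 3, knots = kappa)   # cubic spline basis on these knots
dim(B)                                  # n rows, one column per basis function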

91 / 327
Polynomial Splines: Example

[Figure: polynomial spline fits of degree 0, 1, 2 and 3, each with 5+2, 20+2 and 50+2 knots]

92 / 327
Polynomial Splines: Discussion

I Standard: cubic splines:
  I visually smooth
  I twice continuously differentiable (i.e., curvature well defined)
I knot set:
  I size: trade-off between flexibility and overfitting
  I positioning: equidistant? quantile-based? domain knowledge?
→ more on this in the context of penalization

93 / 327
Truncated Polynomials

I simplest polynomial splines
I basis representation for degree l and knots κ = (κ1, . . . , κm):

  f(x) = γ1 + γ2 x + · · · + γ_{l+1} x^l
         + γ_{l+2} (x − κ2)_+^l + · · · + γ_{l+m−1} (x − κ_{m−1})_+^l

I first l + 1 coefficients determine a global polynomial of degree l
I the coefficient of the highest power can change at each knot κ
⇒ f is of degree l everywhere and continuously differentiable
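A minimal sketch of building such a truncated power basis as a design matrix (kappa holds the full knot vector including the boundary knots; illustration only):

trunc_power_basis <- function(x, kappa, l = 3) {
  global <- outer(x, 0:l, `^`)                             # 1, x, ..., x^l
  inner  <- kappa[2:(length(kappa) - 1)]                   # interior knots kappa_2, ..., kappa_{m-1}
  trunc  <- sapply(inner, function(k) pmax(x - k, 0)^l)    # (x - kappa_j)_+^l
  cbind(global, trunc)
}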

94 / 327
Truncated Polynomials: Example

[Figure: truncated-polynomial basis functions, the scaled basis functions, and their sum f(x) overlaid on the data]

95 / 327
Truncated Polynomials: Discussion

Numerical disadvantages:
I basis function values can become very large
I strong collinearity of basis functions
⇒ numerically preferable: B-spline basis functions

96 / 327
B-splines: Idea

I a B-spline basis function is itself a piecewise polynomial, connecting
  I (l + 1) polynomial fragments
  I of degree l
  I (l − 1)-times continuously differentiable at the connection points
⇒ a weighted sum of such basis functions is of degree l and (l − 1)-times continuously differentiable everywhere

97 / 327
B-Splines: Basis Functions

[Figure: B-spline basis functions of degree l = 0, 1, 2 and 3 over the knots κ1, . . . , κ11]

98 / 327
B-Splines: Properties

I local basis: basis functions ≠ 0 only between l + 2 knots
I bounded range
⇒ avoids the problems of truncated polynomials
I overlap with 2l adjacent basis functions

99 / 327
(B-)Splines as Linear Models

Model: y = f(x) + ε
How to estimate f(x)?
I define basis functions bk(x); k = 1, . . . , K
I f(x) ≈ Σ_{k=1}^K θk bk(x)
⇒ ŷ = f̂(x) = Σ_{k=1}^K θ̂k bk(x)
⇒ this is a linear model ŷ = Bθ̂ with design matrix

  B = [ b1(x1) . . . bK(x1) ]
      [  ...          ...   ]
      [ b1(xn) . . . bK(xn) ]

I analogously applicable to GLMs: g(µ̂) = Bθ̂

100 / 327
B-Splines: R-Implementation
bs in the splines package creates a B-spline design matrix B:
library("splines")
B <- bs(x, df = 12, intercept = T)
m_bspline <- lm(y ~ B - 1)
B_scaled <- t(t(B) * coef(m_bspline))
plot(x, y, pch = 19, cex = .5, col = "grey")
matlines(x, B, lty = 1, col = 1, lwd = 2)


[Figure: data with the B-spline basis functions overlaid]
B-Splines: R-Implementation
library("splines")
B <- bs(x, df = 12, intercept = T)
m_bspline <- lm(y ~ B - 1)
B_scaled <- t(t(B) * coef(m_bspline))
plot(x, y, pch = 19, cex = .5, col = "grey")
matlines(x, B, lty = 1, col = scales::alpha(1, .7), lwd = .5)
matlines(x, B_scaled, lty = 1, col = 2, lwd = 2)


[Figure: data with the B-spline basis functions (black) and the scaled basis functions (red)]
B-Splines: R-Implementation
library("splines")
B <- bs(x, df = 12, intercept = T)
m_bspline <- lm(y ~ B - 1)
B_scaled <- t(t(B) * coef(m_bspline))
plot(x, y, pch = 19, cex = .5, col = "grey")
matlines(x, B, lty = 1, col = scales::alpha(1, .7), lwd = .5)
matlines(x, B_scaled, lty = 1, col = scales::alpha(2, .7), lwd = 1)
lines(x, fitted(m_bspline), lty = 1, col = 3, lwd = 2)


[Figure: data with basis functions (black), scaled basis functions (red), and the fitted function (green)]
Splines: Summary

I basis function representation linearizes the problem of function estimation
I dimension of basis controls maximal complexity
I basis type determines properties of function estimate: continuity,
differentiability, periodicity, . . .

104 / 327
Recap: Linear Models

Recap: Generalized Linear Models

Recap: Non-Linear Effects

Recap: Mixed Models and Random Effects


Exemplary Longitudinal Study: Sleep Deprivation
Motivation: From LM to LMM
Advantages of a Mixed Models Representation
Linear Mixed Models
LMM Estimation
Generalized Linear Mixed Models
GLMM Estimation

Recap: Additive Models and Penalization

105 / 327
Example: Sleep Deprivation Data

I laboratory experiment to measure effect of sleep deprivation on


cognitive performance
I 18 subjects, restricted to 3 hours of sleep per night for 10 days
I operationalization of cognitive performance: reaction time

105 / 327
Example: Sleep Deprivation Data

data(sleepstudy, package = "lme4")


summary(sleepstudy)

## Reaction Days Subject


## Min. :194.3 Min. :0.0 308 : 10
## 1st Qu.:255.4 1st Qu.:2.0 309 : 10
## Median :288.7 Median :4.5 310 : 10
## Mean :298.5 Mean :4.5 330 : 10
## 3rd Qu.:336.8 3rd Qu.:7.0 331 : 10
## Max. :466.4 Max. :9.0 332 : 10
## (Other):120

106 / 327
Example: Sleep Deprivation Data

[Figure: average reaction time (ms) vs. days of sleep deprivation, one panel per subject]
Example: Sleep Deprivation Data
Model global trend: Reactionij ≈ β0 + β1 Daysij

m_sleep_global <- lm(Reaction ~ Days, data = sleepstudy)


summary(m_sleep_global)

##
## Call:
## lm(formula = Reaction ~ Days, data = sleepstudy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -110.848 -27.483 1.546 26.142 139.953
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 251.405 6.610 38.033 < 2e-16 ***
## Days 10.467 1.238 8.454 9.89e-15 ***
## ---
## Signif. codes:
## 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 47.71 on 178 degrees of freedom
## Multiple R-squared: 0.2865, Adjusted R-squared: 0.2825
## F-statistic: 71.46 on 1 and 178 DF, p-value: 9.894e-15
Example: Sleep Deprivation Data
With estimated global level and trend added:
[Figure: per-subject panels with the fitted global regression line added]
⇒ obviously inappropriate model
Example: Sleep Deprivation Data

I subjects obviously differ in level and trend for reaction time


I idea: model subject-specific levels and trends
Reactionij ≈ β0i + β1i Daysij
library("lme4")  # lmList() is provided by lme4 (the sleepstudy data was loaded from there above)
# similar: m_sleep_indiv <- lm(Reaction ~ 0 + Subject + Subject:Days, data = sleepstudy)
m_sleep_indiv <- lmList(Reaction ~ Days | Subject, data = sleepstudy)
head(coef(m_sleep_indiv))
## (Intercept) Days
## 308 244.1927 21.764702
## 309 205.0549 2.261785
## 310 203.4842 6.114899
## 330 289.6851 3.008073
## 331 285.7390 5.266019
## 332 264.2516 9.566768
Example: Sleep Deprivation Data
With estimated individual level and trend added:
[Figure: per-subject panels with the individually fitted regression lines added]
⇒ better fit
Outline
Recap: Linear Models

Recap: Generalized Linear Models

Recap: Non-Linear Effects

Recap: Mixed Models and Random Effects


Exemplary Longitudinal Study: Sleep Deprivation
Motivation: From LM to LMM
Advantages of a Mixed Models Representation
Linear Mixed Models
LMM Estimation
Generalized Linear Mixed Models
GLMM Estimation

Recap: Additive Models and Penalization


112 / 327
Motivation: From LM to LMM

I global model yij = β0 + β1 x1ij + εij :


I ignores within-subject correlation
⇒ variability of coefficients underestimated since correlated data
contain less information than independent data
⇒ invalid inference (tests, CIs)
⇒ complete pooling
I subject-specific models yij = β0i + β1i x1ij + εij :
I can be interpreted only with regard to the data in the sample
⇒ no generalization to “typical” subjects / population
I very many parameters to estimate
⇒ estimates may be unstable, imprecise
⇒ no pooling

112 / 327
Motivation: From LM to LMM

alternative representation of subject-specific models:

$$y_{ij} = \bar\beta_0 + (\beta_{0i} - \bar\beta_0) + \bar\beta_1 x_{1ij} + (\beta_{1i} - \bar\beta_1)\, x_{1ij} + \varepsilon_{ij}$$

with means of subject-specific parameters

$$\bar\beta_0 = \frac{1}{18}\sum_{i=1}^{18} \beta_{0i}, \qquad \bar\beta_1 = \frac{1}{18}\sum_{i=1}^{18} \beta_{1i}$$

113 / 327
Motivation: From LM to LMM
I idea of a random effect model
I β̄ is the population level effect β.
I express subject-specific deviations βi − β̄ as Gaussian random variables $b_i \sim N(0, \sigma_b^2)$.
I this yields
  $$y_{ij} = \beta_0 + \beta_1 x_{1ij} + b_{0i} + b_{1i} x_{1ij} + \varepsilon_{ij}$$
  with $\varepsilon_{ij} \sim N(0, \sigma^2)$ and $(b_{0i}, b_{1i})^\top \sim N_2(0, \Sigma)$.
I or alternatively:
  $$y_{ij} = b_{0i} + b_{1i} x_{1ij} + \varepsilon_{ij}$$
  with $\varepsilon_{ij} \sim N(0, \sigma^2)$ and $(b_{0i}, b_{1i})^\top \sim N_2\big((\beta_0, \beta_1)^\top, \Sigma\big)$.

114 / 327
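A minimal sketch of fitting exactly this random-intercept-and-slope model to the sleep deprivation data with lme4 (object names such as m_sleep_lmm are ours, not from the slides):

library(lme4)
data(sleepstudy, package = "lme4")
# random intercept and random slope for Days, correlated within Subject
m_sleep_lmm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
fixef(m_sleep_lmm)                 # population-level effects (beta_0, beta_1)
head(ranef(m_sleep_lmm)$Subject)   # subject-specific deviations (b_0i, b_1i)
VarCorr(m_sleep_lmm)               # estimated Sigma and residual variance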
Outline
Recap: Linear Models

Recap: Generalized Linear Models

Recap: Non-Linear Effects

Recap: Mixed Models and Random Effects


Exemplary Longitudinal Study: Sleep Deprivation
Motivation: From LM to LMM
Advantages of a Mixed Models Representation
Linear Mixed Models
LMM Estimation
Generalized Linear Mixed Models
GLMM Estimation

Recap: Additive Models and Penalization


115 / 327
Partial Pooling

I regression coefficients β0 , β1 , . . . in random effects models retain


their interpretation as population level parameters.
I subject-specific deviations from the population mean are modeled by
random effects – the implicit assumption is that subjects are a
random sample from the population of interest
⇒ partial pooling, with strength of pooling determined by random effect
variance.

115 / 327
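Partial pooling can be made visible by comparing the unpooled per-subject OLS fits with the subject-specific coefficients implied by the mixed model; a sketch with lme4 (shrinkage towards the population mean is strongest for extreme subjects):

library(lme4)
data(sleepstudy, package = "lme4")
m_nopool  <- lmList(Reaction ~ Days | Subject, data = sleepstudy)         # no pooling
m_partial <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)  # partial pooling
# per-subject intercepts and slopes under the two approaches
head(coef(m_nopool))
head(coef(m_partial)$Subject)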
Advantages of the Random Effects Approach
I decomposition of random variability in data into
I subject-specific deviations from population mean
I deviation of observations from subject means
⇒ more precise estimates of population trends
I some degree of protection against bias caused by drop-out
I random effects serve as surrogates for effects of unobserved
subject-level covariates
⇒ control for unobserved heterogeneity
I distributional assumption bi ∼ N stabilizes estimates b̂i (shrinkage
effect) compared to fixed subject-specific estimates β̂i without
distributional assumption
I intuition: estimates are stabilized by including prior knowledge in the
model, i.e., assuming that subjects from the population are mostly
similar to each other

116 / 327
Advantages of the Random Effects Approach

I random effects model the correlation structure between observations:

  $$y_{ij} = \beta_0 + b_i + \varepsilon_{ij} \quad\text{with}\quad b_i \overset{i.i.d.}{\sim} N(0, \sigma_b^2),\; \varepsilon_{ij} \overset{i.i.d.}{\sim} N(0, \sigma_\varepsilon^2)$$
  $$\Rightarrow \operatorname{Corr}(y_{ij}, y_{ij'}) = \frac{\operatorname{Cov}(b_i, b_i)}{\sqrt{\operatorname{Var}(y_{ij})\operatorname{Var}(y_{ij'})}} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_\varepsilon^2}$$
I independence between observations on different subjects is retained (for the kind of correlation structure we discuss here):
  $$\operatorname{Corr}(y_{ij}, y_{i'j}) = 0 \quad \text{for } i \neq i'.$$

117 / 327
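For a pure random intercept model, this within-subject correlation (the intraclass correlation) can be read off the estimated variance components; a small sketch using lme4 (object names ours):

library(lme4)
data(sleepstudy, package = "lme4")
m_ri <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
vc   <- as.data.frame(VarCorr(m_ri))   # rows: Subject (Intercept), Residual
icc  <- vc$vcov[1] / sum(vc$vcov)      # sigma_b^2 / (sigma_b^2 + sigma_eps^2)
icc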
Advantages of the Random Effects Approach
I random effects model the correlation structure between observations:

  $$y_{ij} = \beta_0 + b_{0i} + b_{1i} t_j + \varepsilon_{ij} \quad\text{with}\quad (b_{0i}, b_{1i})^\top \overset{i.i.d.}{\sim} N_2(0, \Sigma),\; \varepsilon_{ij} \overset{i.i.d.}{\sim} N(0, \sigma_\varepsilon^2)$$
  $$\Rightarrow \operatorname{Var}(y_{ij}) = \sigma_{b0}^2 + 2\sigma_{b01} t_j + \sigma_{b1}^2 t_j^2 + \sigma_\varepsilon^2$$
  $$\Rightarrow \operatorname{Cov}(y_{ij}, y_{ij'}) = \sigma_{b0}^2 + \sigma_{b01}(t_j + t_{j'}) + \sigma_{b1}^2 t_j t_{j'}$$
I independence between observations on different subjects is retained (for the kind of correlation structure we discuss here):
  $$\operatorname{Corr}(y_{ij}, y_{i'j}) = 0 \quad \text{for } i \neq i'.$$

117 / 327
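The implied marginal variance as a function of t_j can be computed directly from a fitted random-slope model; a sketch for the sleep deprivation data, with Days playing the role of t_j (object names ours):

library(lme4)
data(sleepstudy, package = "lme4")
m_rs  <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
Sigma <- VarCorr(m_rs)$Subject          # 2x2 covariance matrix of (b_0i, b_1i)
t_j   <- 0:9
# Var(y_ij) = sigma_b0^2 + 2 sigma_b01 t_j + sigma_b1^2 t_j^2 + sigma_eps^2
marg_var <- Sigma[1, 1] + 2 * Sigma[1, 2] * t_j + Sigma[2, 2] * t_j^2 + sigma(m_rs)^2
round(marg_var)                         # marginal variance grows with Days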
Outline
Recap: Linear Models

Recap: Generalized Linear Models

Recap: Non-Linear Effects

Recap: Mixed Models and Random Effects


Exemplary Longitudinal Study: Sleep Deprivation
Motivation: From LM to LMM
Advantages of a Mixed Models Representation
Linear Mixed Models
LMM Estimation
Generalized Linear Mixed Models
GLMM Estimation

Recap: Additive Models and Penalization


118 / 327
General Form of Linear Mixed Models

Linear Mixed Model:

y = Xβ + Ub + ε
b ∼ N(0, G)
ε ∼ N(0, R)

I U: design matrix for random effects


I independence between ε and b.
I entries in G, R determined by (co-)variance parameters ϑ
I we’ll focus on independent errors with R = σ 2 I

118 / 327
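The random-effects design matrix of this general form can be inspected for a fitted lme4 model; note that lme4 labels this matrix Z, while the slides call it U. A minimal sketch:

library(lme4)
data(sleepstudy, package = "lme4")
m <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
U <- getME(m, "Z")   # sparse random-effects design matrix (lme4's Z)
dim(U)               # 180 x 36: 18 subjects x 2 random effects (intercept, Days)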
Conditional and Marginal Perspective

Conditional perspective:

y|b ∼ N(Xβ + Ub, R); b ∼ N(0, G)

Interpretation:
random effects b are subject-specific effects that vary across the
population.
Hierarchical formulation:
expected response is a function of population-level effects (fixed effects)
and subject-level effects (random effects).

119 / 327
Conditional and Marginal Perspective

Marginal perspective:

y ∼ N(Xβ, V) V = Cov(y) = UGU> + R

Interpretation:
random effects b induce a correlation structure in y defined by U and G,
and thereby allow valid analyses of correlated data.
Marginal formulation:
model is concerned with the marginal expectation of y averaged over the
population as a function of population-level effects.

The marginal model is more general than the hierarchical model.

generalized estimating equations: geepack

120 / 327
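For a purely marginal analysis, the same population-level trend could be estimated with generalized estimating equations; a rough sketch with geepack (our choice of working correlation structure, not taken from the slides):

library(geepack)
data(sleepstudy, package = "lme4")
m_gee <- geeglm(Reaction ~ Days, id = Subject, data = sleepstudy,
                corstr = "exchangeable")
summary(m_gee)   # population-level coefficients with robust (sandwich) standard errors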
Linear Mixed Model for Longitudinal Data
For subjects i = 1, . . . , m, each with observations j = 1, . . . , ni :

$$y_{ij} = x_{ij}^\top \beta + u_{ij}^\top b_i + \varepsilon_{ij}, \qquad b_i \sim N_q(0, \Sigma)
\quad\Leftrightarrow\quad y = X\beta + Ub + \varepsilon$$

with
I $y = (y_1^\top, y_2^\top, \ldots, y_m^\top)^\top$ ($n = \sum_{i=1}^m n_i$ entries)
I $\varepsilon = (\varepsilon_1^\top, \varepsilon_2^\top, \ldots, \varepsilon_m^\top)^\top$ ($n$ entries)
I $\beta = (\beta_0, \beta_1, \ldots, \beta_p)^\top$
I $X = [1\ x_1\ \ldots\ x_p]$
I $b = (b_1^\top, b_2^\top, \ldots, b_m^\top)^\top$ of length $mq$, with $b \sim N_{mq}(0, G)$
I $G = \operatorname{diag}(\Sigma, \ldots, \Sigma)$
I $U = \operatorname{diag}(U_1, \ldots, U_m)$ with dimension $n \times mq$
I $U_i = [1\ u_{1i}\ \ldots\ u_{(q-1)i}]$ with dimension $n_i \times q$. Variables in $U_i$ are typically a subset of those in $X$.
121 / 327
Other Types of Mixed Models

I hierarchical/multi-level model:
  e.g., test score yijk of a pupil i in class j in school k:
  $$y_{ijk} = \beta_0 + x_{ijk}^\top \beta + b_{1j} + b_{2k} + \varepsilon_{ijk}$$
  with random intercepts for class ($b_{1j} \sim N(0, \sigma_1^2)$) and school ($b_{2k} \sim N(0, \sigma_2^2)$)
I crossed designs:
  e.g., score yij of a subject i on an item j:
  $$y_{ij} = \beta_0 + x_{ij}^\top \beta + b_{1i} + b_{2j} + \varepsilon_{ij}$$
  with random intercepts for subject ($b_{1i} \sim N(0, \sigma_1^2)$, subject ability) and item ($b_{2j} \sim N(0, \sigma_2^2)$, item difficulty); lme4 formula sketches for both designs follow below

122 / 327
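In lme4 formula notation, these two designs would look roughly as follows; the data sets and variable names here (pupils, responses, score, x, ...) are hypothetical placeholders, so the model calls are shown as comments:

library(lme4)
# hierarchical / nested: pupils within classes within schools
# m_nested  <- lmer(score ~ x + (1 | school / class), data = pupils)
# crossed: subjects and items are not nested in each other
# m_crossed <- lmer(score ~ x + (1 | subject) + (1 | item), data = responses)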
Outline
Recap: Linear Models

Recap: Generalized Linear Models

Recap: Non-Linear Effects

Recap: Mixed Models and Random Effects


Exemplary Longitudinal Study: Sleep Deprivation
Motivation: From LM to LMM
Advantages of a Mixed Models Representation
Linear Mixed Models
LMM Estimation
Generalized Linear Mixed Models
GLMM Estimation

Recap: Additive Models and Penalization


123 / 327
Likelihood-Based Estimation of Linear Mixed Models
ML-Estimation
I determine $\hat\vartheta_{ML}$ so that the profile likelihood of the marginal model is maximal:
  $$y \sim N(X\beta, V(\vartheta))$$
  $$l(\beta, \vartheta) = -\tfrac{1}{2}\left\{\log|V(\vartheta)| + (y - X\beta)^\top V(\vartheta)^{-1}(y - X\beta)\right\}$$
  $$\hat\beta(\vartheta) = \arg\max_\beta\, l(\beta, \vartheta) = \left(X^\top V(\vartheta)^{-1} X\right)^{-1} X^\top V(\vartheta)^{-1} y$$
  $$l_P(\vartheta) = -\tfrac{1}{2}\left\{\log|V(\vartheta)| + (y - X\hat\beta(\vartheta))^\top V(\vartheta)^{-1}(y - X\hat\beta(\vartheta))\right\} \to \max_\vartheta$$
I for given ϑ, closed form solutions for β̂ and b̂:
  simple generalized least squares: $\hat{b}(\hat\vartheta) = G U^\top V(\hat\vartheta)^{-1} (y - X\hat\beta(\hat\vartheta))$.
I $\widehat{\operatorname{Cov}}(\hat\beta)$ and $\widehat{\operatorname{Cov}}(\hat b)$ computable for tests, CIs.
123 / 327
Likelihood-Based Estimation of Linear Mixed Models
REML estimation:
I ML-estimates ϑ̂ are biased; unbiased variance component estimates come from the "marginal-marginal" (restricted) likelihood of ϑ, which also integrates β out:
  $$l_R(\vartheta) = \log\left(\int L(\beta, \vartheta)\, d\beta\right) \propto l_P(\vartheta) - \tfrac{1}{2} \log\left|X^\top V^{-1} X\right| \to \max_\vartheta$$
I closed form solutions for β̂ and b̂ and their covariances given ϑ still apply
I both are tricky optimization problems:
  I positivity constraints for most entries in ϑ
  I computationally expensive, numerically unstable log-determinants
I SOTA implementation for a large sub-class: mgcv

124 / 327
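In lme4, the criterion is selected via the REML argument; a small sketch comparing the estimated variance components under ML and REML (object names ours):

library(lme4)
data(sleepstudy, package = "lme4")
m_ml   <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy, REML = FALSE)
m_reml <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy, REML = TRUE)
# ML variance components tend to be smaller (biased downwards) than the REML ones
print(VarCorr(m_ml),   comp = "Variance")
print(VarCorr(m_reml), comp = "Variance")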
Outline
Recap: Linear Models

Recap: Generalized Linear Models

Recap: Non-Linear Effects

Recap: Mixed Models and Random Effects


Exemplary Longitudinal Study: Sleep Deprivation
Motivation: From LM to LMM
Advantages of a Mixed Models Representation
Linear Mixed Models
LMM Estimation
Generalized Linear Mixed Models
GLMM Estimation

Recap: Additive Models and Penalization


125 / 327
Generalized Linear Mixed Models

I GLM generalizes LM via addition of a link function


I mapping the linear predictor to a range appropriate for the response
distribution,
I and linking the variance to the expected value in a way appropriate for
the response distribution.
I carries over directly for a generalized linear mixed model (GLMM):
E(y|b) = h(Xβ + Ub)
with known response function h()
I BUT: estimation much harder problem than for LMMs or GLMs,
especially for binary responses (more later).
I BUT: GLMMs can only be interpreted in the conditional/hierarchical
perspective. Use GEEs for marginal models.

125 / 327
Generalized Linear Mixed Models

Model:

y | b :  yi | b ∼ Expo.fam.( E(yi | b) = h(Xβ + Ub)i , φ )
b | ϑ :  b | ϑ ∼ N(0, G(ϑ))

126 / 327
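A minimal GLMM example with lme4::glmer, using the cbpp data shipped with lme4 (binomial response, logit link, random intercept per herd); this is our illustrative choice, not an example from the slides:

library(lme4)
data(cbpp, package = "lme4")
# disease incidence out of herd size, per herd and period
m_cbpp <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
                data = cbpp, family = binomial)
summary(m_cbpp)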
Caveat: Effect Attenuation in GLMMs
[Figure: two panels, "LMM" and "Logit-GLMM", showing h(x_i β + b_0i) over x for varying random intercepts b_0i]

For random intercept logit-models: $\beta_{mar} \approx \dfrac{1}{\sqrt{1 + 0.346\,\sigma_b^2}}\, \beta_{cond}$
127 / 327
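A quick numerical illustration of the attenuation formula above (the values of beta_cond and sigma2_b are arbitrary, chosen only for illustration):

beta_cond <- 1.0                    # conditional (subject-specific) log-odds ratio
sigma2_b  <- c(0.5, 1, 2, 4)        # random-intercept variances
beta_mar  <- beta_cond / sqrt(1 + 0.346 * sigma2_b)
round(beta_mar, 2)                  # approx. 0.92 0.86 0.77 0.65: shrinks towards 0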
Outline
Recap: Linear Models

Recap: Generalized Linear Models

Recap: Non-Linear Effects

Recap: Mixed Models and Random Effects


Exemplary Longitudinal Study: Sleep Deprivation
Motivation: From LM to LMM
Advantages of a Mixed Models Representation
Linear Mixed Models
LMM Estimation
Generalized Linear Mixed Models
GLMM Estimation

Recap: Additive Models and Penalization


128 / 327
GLMM Estimation
LMM estimation exploits the analytically accessible marginal likelihood:
  $$L(\beta, \vartheta, \phi) = \int L(b, \beta, \phi, \vartheta)\, db,$$
which is the density of
  $$y \mid \beta, \phi, \vartheta \sim N\big(X\beta,\ U G(\vartheta) U^\top + R(\phi, \vartheta)\big).$$
For GLMMs:
  $$L(\beta, \vartheta, \phi) = \int \left(\prod_{i=1}^n f(y_i \mid \beta, \phi, b, \vartheta)\right) f(b \mid \vartheta)\, db$$
(. . . sucks: no closed form, and the integral over b is typically very high-dimensional)

128 / 327
GLMM Estimation Algorithms

I Laplace approximation based: iterate
  1. Compute b̂ = arg max_b L(β, φ, ϑ, b) for given β, φ, ϑ via a penalized IWLS algorithm (P-IRLS).
  2. Maximize a Laplace approximation L̃(β, φ, ϑ) of L(β, φ, ϑ) around b̂ (numerically, typically gradient based)
  (mgcv, with lots of tricksy tricks; lme4 for large b)
I (Gaussian) quadrature based methods: more accurate, much slower (lme4, gamm4); see the sketch below
I penalized quasi likelihood: replace the GLMM by an LMM with IWLS working responses and weights. Biased, not guaranteed to converge, fairly fast. (nlme, mgcv::gamm)
I do (full) Bayes: flexible choice of effect distributions, hyperpriors, likelihoods; very slow (Stan: rstanarm, brms)

129 / 327
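With a single scalar random effect per group, lme4::glmer can switch between the Laplace approximation (nAGQ = 1, the default) and adaptive Gauss-Hermite quadrature with more nodes; a sketch on the cbpp data (our example choice):

library(lme4)
data(cbpp, package = "lme4")
f <- cbind(incidence, size - incidence) ~ period + (1 | herd)
m_laplace <- glmer(f, data = cbpp, family = binomial)             # nAGQ = 1: Laplace
m_agq     <- glmer(f, data = cbpp, family = binomial, nAGQ = 25)  # adaptive quadrature
cbind(laplace = fixef(m_laplace), agq = fixef(m_agq))             # typically very close here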
Mixed Models in a Nutshell
I standard regression models can model only the structure of the
expected values of the response
I mixed models are regression models in which a subset of coefficients
are assumed to be random unknown quantities from a known
distribution instead of fixed unknowns, and this means we can
I model the covariance structure of the data (marginal perspective)
I estimate (a large number of) subject-level coefficients without too
much trouble (conditional perspective)
I random intercepts can be used to model subject-specific differences in
the level of the response
→ grouping variable as a special kind of nominal covariate
I a random slope for a covariate is like an interaction between the
grouping variable and that covariate
→ grouping variable as a special kind of effect modifier for that
covariate
I hard estimation problems: variance components difficult to optimize,
often very high-dim. b
130 / 327
Recap: Linear Models

Recap: Generalized Linear Models

Recap: Non-Linear Effects

Recap: Mixed Models and Random Effects

Recap: Additive Models and Penalization


Penalization: Controlling smoothness
Smoothing Parameter Optimization
Generalized Additive Models
Surface Estimation
Varying coefficients

131 / 327
Splines

I Splines
I piecewise polynomials with smoothness properties at knot locations
I can be embedded into (generalized) linear models (e.g. ML estimates)
I Problem: choice of optimal knot setting.
I Two-fold problem:
I how many knots?
I where to put them?
I two possible solutions, one of them good:
I adaptive knot choice: make no. of knots and their positioning part of
optimization procedure
I penalization: use large number of knots to guarantee sufficient model
capacity, but add a cost (penalty) for wiggliness / complexity to
optimization procedure

131 / 327
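A minimal penalized-spline example with mgcv, using the motorcycle impact data from MASS (our choice of example, not from the slides): a generous basis (k = 20) provides enough capacity, and the automatically tuned wiggliness penalty keeps the fit smooth.

library(mgcv)
data(mcycle, package = "MASS")
# penalized regression spline, smoothing parameter selected by REML
m_pen <- gam(accel ~ s(times, k = 20), data = mcycle, method = "REML")
summary(m_pen)$edf            # effective degrees of freedom clearly below k - 1
plot(m_pen, residuals = TRUE) # fitted smooth with partial residuals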
Function estimation example: climate reconstruction


[Figure: two panels of noisy climate reconstruction scatterplots over years 0 to 2000, values roughly between -1.0 and 0.5]
