Insurance Analytics: Prof. Julien Trufin
Tree-based methods : Bagging trees and random forests
Introduction
One issue with regression trees is their high variance.
→ The prediction $\hat{\mu}_D(x)$ varies strongly over the trees trained on all possible training sets $D$.
Bagging trees and random forests aim to reduce this variance without altering the bias too much. Indeed, the average model $E_D[\hat{\mu}_D(x)]$ has the same expectation as any individual model:
$$E_D[\hat{\mu}_D(x)] = E_D\big[E_D[\hat{\mu}_D(x)]\big].$$
Assume we can draw as many training sets as we want, so that we have $B$ training sets $D_1, D_2, \ldots, D_B$ available. An approximation of the average model is then given by
$$\widehat{E}_D[\hat{\mu}_D(x)] = \frac{1}{B}\sum_{b=1}^{B} \hat{\mu}_{D_b}(x).$$
In practice, the probability distribution from which the observations of the training set are drawn is usually unknown, so only one training set is available.
In this context, the bootstrap approach, used in both bagging trees and random forests, is particularly useful: it mimics the drawing of new training sets by resampling, with replacement, from the single available one.
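As a side illustration (not from the slides), drawing a bootstrap sample simply amounts to resampling the row indices of the available training set with replacement; a minimal R sketch, assuming a generic data frame training.set:

# Minimal sketch: drawing B bootstrap samples from a single training set.
# Each sample has the same size as the original data; rows are drawn with replacement.
set.seed(1)
B <- 3
n <- nrow(training.set)
bootstrap.samples <- lapply(seq_len(B), function(b) {
  training.set[sample(n, size = n, replace = TRUE), ]
})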
Bagging trees
Bagging is one of the first ensemble methods proposed in the literature.
Let $D^{*1}, D^{*2}, \ldots, D^{*B}$ be $B$ bootstrap samples of $D$.
→ For each $D^{*b}$, $b = 1, \ldots, B$, we fit our model, giving the prediction
$$\hat{\mu}_{D,\Theta_b}(x) = \hat{\mu}_{D^{*b}}(x),$$
where $\Theta = (\Theta_1, \ldots, \Theta_B)$.
→ The random vectors $\Theta_1, \ldots, \Theta_B$ fully capture the randomness of the training procedure.
Algorithm
For b = 1 to B do
  1. Generate a bootstrap sample $D^{*b}$ of $D$.
  2. Fit an unpruned tree on $D^{*b}$, which gives the prediction $\hat{\mu}_{D,\Theta_b}(x)$.
End for
Output: $\hat{\mu}^{\mathrm{bag}}_{D,\Theta}(x) = \frac{1}{B}\sum_{b=1}^{B} \hat{\mu}_{D,\Theta_b}(x)$.
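A minimal R sketch of this algorithm (not the course's code), using rpart to grow unpruned trees on a generic data frame df with numeric response y; the names and control settings are illustrative assumptions:

library(rpart)

# Bagging unpruned regression trees.
bag_trees <- function(df, B = 100) {
  lapply(seq_len(B), function(b) {
    idx <- sample(nrow(df), replace = TRUE)               # bootstrap sample D*b
    rpart(y ~ ., data = df[idx, ],
          control = rpart.control(cp = 0, minsplit = 2))  # grow fully, do not prune
  })
}

# Bagged prediction: average of the B individual tree predictions.
predict_bag <- function(trees, newdata) {
  rowMeans(sapply(trees, predict, newdata = newdata))
}

set.seed(87)
trees <- bag_trees(df, B = 50)
head(predict_bag(trees, newdata = df))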
Bias
The bias is the same as the one of the individual sampled models. Indeed,
$$\begin{aligned}
\mathrm{Bias}(x) &= \mu(x) - E_{D,\Theta}\Big[\hat{\mu}^{\mathrm{bag}}_{D,\Theta}(x)\Big] \\
&= \mu(x) - E_{D,\Theta_1,\ldots,\Theta_B}\Bigg[\frac{1}{B}\sum_{b=1}^{B}\hat{\mu}_{D,\Theta_b}(x)\Bigg] \\
&= \mu(x) - \frac{1}{B}\sum_{b=1}^{B} E_{D,\Theta_b}\big[\hat{\mu}_{D,\Theta_b}(x)\big] \\
&= \mu(x) - E_{D,\Theta_b}\big[\hat{\mu}_{D,\Theta_b}(x)\big],
\end{aligned}$$
since the predictions $\hat{\mu}_{D,\Theta_1}(x), \ldots, \hat{\mu}_{D,\Theta_B}(x)$ are identically distributed.
However, the bias of $\hat{\mu}_{D,\Theta_b}(x)$ is typically greater in absolute terms than the bias of $\hat{\mu}_D(x)$ fitted on $D$, since the reduced sample $D^{*b}$ imposes restrictions.
⇒ The improvements in the estimation obtained by bagging will therefore be a consequence of variance reduction.
Variance
The variance of $\hat{\mu}^{\mathrm{bag}}_{D,\Theta}(x)$ can be written as
$$\begin{aligned}
V_{D,\Theta}\Big[\hat{\mu}^{\mathrm{bag}}_{D,\Theta}(x)\Big]
&= V_{D,\Theta_1,\ldots,\Theta_B}\Bigg[\frac{1}{B}\sum_{b=1}^{B}\hat{\mu}_{D,\Theta_b}(x)\Bigg] \\
&= \frac{1}{B^2}\,V_{D,\Theta_1,\ldots,\Theta_B}\Bigg[\sum_{b=1}^{B}\hat{\mu}_{D,\Theta_b}(x)\Bigg] \\
&= \frac{1}{B^2}\Bigg\{ V_D\Bigg[E_{\Theta_1,\ldots,\Theta_B}\Bigg[\sum_{b=1}^{B}\hat{\mu}_{D,\Theta_b}(x)\,\Big|\,D\Bigg]\Bigg]
 + E_D\Bigg[V_{\Theta_1,\ldots,\Theta_B}\Bigg[\sum_{b=1}^{B}\hat{\mu}_{D,\Theta_b}(x)\,\Big|\,D\Bigg]\Bigg]\Bigg\} \\
&= V_D\Big[E_{\Theta_b}\big[\hat{\mu}_{D,\Theta_b}(x)\,|\,D\big]\Big]
 + \frac{1}{B}\,E_D\Big[V_{\Theta_b}\big[\hat{\mu}_{D,\Theta_b}(x)\,|\,D\big]\Big],
\end{aligned}$$
since, conditionally on $D$, $\hat{\mu}_{D,\Theta_1}(x), \ldots, \hat{\mu}_{D,\Theta_B}(x)$ are i.i.d.
→ The first term is the sampling variance of the bagging ensemble (a result of the sampling variability of $D$).
→ The second term is the within-$D$ variance (a result of the randomization due to the bootstrap sampling). As $B$ increases, this second term disappears.
Observation:
$$V_{D,\Theta}\Big[\hat{\mu}^{\mathrm{bag}}_{D,\Theta}(x)\Big] \le V_{D,\Theta_b}\big[\hat{\mu}_{D,\Theta_b}(x)\big],$$
since, conditionally on $D$, $\hat{\mu}_{D,\Theta_b}(x)$ and $\hat{\mu}_{D,\Theta_{b'}}(x)$ are i.i.d.
$\rho(x)$ becomes
$$\rho(x) = \frac{V_D\Big[E_{\Theta_b}\big[\hat{\mu}_{D,\Theta_b}(x)\,|\,D\big]\Big]}{V_{D,\Theta_b}\big[\hat{\mu}_{D,\Theta_b}(x)\big]}
= \frac{V_D\Big[E_{\Theta_b}\big[\hat{\mu}_{D,\Theta_b}(x)\,|\,D\big]\Big]}{V_D\Big[E_{\Theta_b}\big[\hat{\mu}_{D,\Theta_b}(x)\,|\,D\big]\Big] + E_D\Big[V_{\Theta_b}\big[\hat{\mu}_{D,\Theta_b}(x)\,|\,D\big]\Big]}.$$
Alternatively,
$$V_D\Big[E_{\Theta_b}\big[\hat{\mu}_{D,\Theta_b}(x)\,|\,D\big]\Big] = \rho(x)\,V_{D,\Theta_b}\big[\hat{\mu}_{D,\Theta_b}(x)\big]$$
and
$$E_D\Big[V_{\Theta_b}\big[\hat{\mu}_{D,\Theta_b}(x)\,|\,D\big]\Big] = \big(1-\rho(x)\big)\,V_{D,\Theta_b}\big[\hat{\mu}_{D,\Theta_b}(x)\big],$$
so that
$$\begin{aligned}
V_{D,\Theta}\Big[\hat{\mu}^{\mathrm{bag}}_{D,\Theta}(x)\Big]
&= V_D\Big[E_{\Theta_b}\big[\hat{\mu}_{D,\Theta_b}(x)\,|\,D\big]\Big] + \frac{1}{B}\,E_D\Big[V_{\Theta_b}\big[\hat{\mu}_{D,\Theta_b}(x)\,|\,D\big]\Big] \\
&= \rho(x)\,V_{D,\Theta_b}\big[\hat{\mu}_{D,\Theta_b}(x)\big] + \frac{1-\rho(x)}{B}\,V_{D,\Theta_b}\big[\hat{\mu}_{D,\Theta_b}(x)\big].
\end{aligned}$$
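As a quick numerical illustration (not on the slides): with $\rho(x) = 0.5$, $V_{D,\Theta_b}[\hat{\mu}_{D,\Theta_b}(x)] = 1$ and $B = 100$,
$$V_{D,\Theta}\Big[\hat{\mu}^{\mathrm{bag}}_{D,\Theta}(x)\Big] = 0.5 \times 1 + \frac{1 - 0.5}{100} \times 1 = 0.505,$$
so for large $B$ the ensemble variance is essentially $\rho(x)$ times the variance of a single sampled tree.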
Notice that
$$V_{D,\Theta_b}\big[\hat{\mu}_{D,\Theta_b}(x)\big] \ge V_D\big[\hat{\mu}_D(x)\big].$$
⇒ Bagging averages models with higher variances.
Nevertheless, $\hat{\mu}^{\mathrm{bag}}_{D,\Theta}(x)$ generally has a smaller variance than $\hat{\mu}_D(x)$.
→ Typically, $\rho(x)$ compensates for the variance increase
$$V_{D,\Theta_b}\big[\hat{\mu}_{D,\Theta_b}(x)\big] - V_D\big[\hat{\mu}_D(x)\big],$$
so that the combined effect of $\rho(x) < 1$ and $V_{D,\Theta_b}[\hat{\mu}_{D,\Theta_b}(x)] \ge V_D[\hat{\mu}_D(x)]$ often leads to a variance reduction
$$V_D\big[\hat{\mu}_D(x)\big] - \rho(x)\,V_{D,\Theta_b}\big[\hat{\mu}_{D,\Theta_b}(x)\big]$$
that is positive.
→ Because of their high variance, regression trees are very likely to benefit from the averaging procedure.
The bias remains unchanged while the variance decreases compared to the individual prediction $\hat{\mu}_{D,\Theta_b}(x)$, so that we get
$$\begin{aligned}
E_{D,\Theta}\Big[\mathrm{Err}\Big(\hat{\mu}^{\mathrm{bag}}_{D,\Theta}(x)\Big)\Big]
&= \mathrm{Err}\big(\mu(x)\big) + \Big(\mu(x) - E_{D,\Theta_b}\big[\hat{\mu}_{D,\Theta_b}(x)\big]\Big)^2 + V_{D,\Theta}\Big[\hat{\mu}^{\mathrm{bag}}_{D,\Theta}(x)\Big] \\
&\le \mathrm{Err}\big(\mu(x)\big) + \Big(\mu(x) - E_{D,\Theta_b}\big[\hat{\mu}_{D,\Theta_b}(x)\big]\Big)^2 + V_{D,\Theta_b}\big[\hat{\mu}_{D,\Theta_b}(x)\big] \\
&= E_{D,\Theta_b}\Big[\mathrm{Err}\big(\hat{\mu}_{D,\Theta_b}(x)\big)\Big].
\end{aligned}$$
with
$$E_{D,\Theta}\Big[E^{P}\Big(\hat{\mu}^{\mathrm{bag}}_{D,\Theta}(x)\Big)\Big]
= 2\mu(x)\Bigg( E_{D,\Theta}\Bigg[\frac{\hat{\mu}^{\mathrm{bag}}_{D,\Theta}(x)}{\mu(x)}\Bigg] - 1
- E_{D,\Theta}\Bigg[\ln\Bigg(\frac{\hat{\mu}^{\mathrm{bag}}_{D,\Theta}(x)}{\mu(x)}\Bigg)\Bigg]\Bigg).$$
Now,
$$E_{D,\Theta}\Bigg[\frac{\hat{\mu}^{\mathrm{bag}}_{D,\Theta}(x)}{\mu(x)}\Bigg] = E_{D,\Theta_b}\Bigg[\frac{\hat{\mu}_{D,\Theta_b}(x)}{\mu(x)}\Bigg],$$
so that
$$\begin{aligned}
E_{D,\Theta}\Big[E^{P}\Big(\hat{\mu}^{\mathrm{bag}}_{D,\Theta}(x)\Big)\Big]
&= 2\mu(x)\Bigg( E_{D,\Theta_b}\Bigg[\frac{\hat{\mu}_{D,\Theta_b}(x)}{\mu(x)}\Bigg] - 1
- E_{D,\Theta_b}\Bigg[\ln\frac{\hat{\mu}_{D,\Theta_b}(x)}{\mu(x)}\Bigg]\Bigg) \\
&\quad - 2\mu(x)\Bigg( E_{D,\Theta}\Bigg[\ln\frac{\hat{\mu}^{\mathrm{bag}}_{D,\Theta}(x)}{\mu(x)}\Bigg]
- E_{D,\Theta_b}\Bigg[\ln\frac{\hat{\mu}_{D,\Theta_b}(x)}{\mu(x)}\Bigg]\Bigg) \\
&= E_{D,\Theta_b}\Big[E^{P}\big(\hat{\mu}_{D,\Theta_b}(x)\big)\Big]
- 2\mu(x)\Big( E_{D,\Theta}\Big[\ln \hat{\mu}^{\mathrm{bag}}_{D,\Theta}(x)\Big]
- E_{D,\Theta_b}\Big[\ln \hat{\mu}_{D,\Theta_b}(x)\Big]\Big).
\end{aligned}$$
Since $\ln$ is concave, Jensen's inequality gives $E_{D,\Theta}\big[\ln \hat{\mu}^{\mathrm{bag}}_{D,\Theta}(x)\big] \ge E_{D,\Theta_b}\big[\ln \hat{\mu}_{D,\Theta_b}(x)\big]$, so that
$$E_{D,\Theta}\Big[E^{P}\Big(\hat{\mu}^{\mathrm{bag}}_{D,\Theta}(x)\Big)\Big] \le E_{D,\Theta_b}\Big[E^{P}\big(\hat{\mu}_{D,\Theta_b}(x)\big)\Big]$$
and hence
$$E_{D,\Theta}\Big[\mathrm{Err}\Big(\hat{\mu}^{\mathrm{bag}}_{D,\Theta}(x)\Big)\Big] \le E_{D,\Theta_b}\Big[\mathrm{Err}\big(\hat{\mu}_{D,\Theta_b}(x)\big)\Big].$$
Random forests
The procedure called random forests is a modification of bagging trees.
→ It produces a collection of trees that are more de-correlated.
The random forest prediction can be written as
$$\hat{\mu}^{\mathrm{rf}}_{D,\Theta}(x) = \frac{1}{B}\sum_{b=1}^{B} \hat{\mu}_{D,\Theta_b}(x),$$
where $\hat{\mu}_{D,\Theta_b}(x)$ denotes the prediction at point $x$ of the $b$th random forest tree.
→ $\Theta_1, \ldots, \Theta_B$ capture the randomness of the bootstrap sampling and the additional randomness due to the random selection of $m$ features before each split.
→ $m$ is a tuning parameter. A typical value of $m$ is $\lfloor p/3 \rfloor$.
Algorithm
For b = 1 to B do
  1. Generate a bootstrap sample $D^{*b}$ of $D$.
  2. Fit a tree on $D^{*b}$:
     For each node t do
       (2.1) Select m (≤ p) features at random from the p original features.
       (2.2) Pick the best feature among the m.
       (2.3) Split the node into two daughter nodes.
     End for
     This gives the prediction $\hat{\mu}_{D,\Theta_b}(x)$ (use typical tree stopping criteria, but do not prune).
End for
Output: $\hat{\mu}^{\mathrm{rf}}_{D,\Theta}(x) = \frac{1}{B}\sum_{b=1}^{B} \hat{\mu}_{D,\Theta_b}(x)$.
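For a standard regression response, this is what the randomForest package implements; a minimal sketch with illustrative settings on a generic data frame df with response y (the Poisson claim-count setting with an exposure offset is handled later with rfPoisson from rfCountData):

library(randomForest)

set.seed(87)
rf <- randomForest(y ~ ., data = df,
                   ntree = 500,        # B: number of bootstrapped trees
                   mtry = 3,           # m: features tried at each split
                   nodesize = 5,       # minimum terminal node size (no pruning)
                   importance = TRUE)
pred <- predict(rf, newdata = df)      # average of the 500 tree predictions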
Computational costs
Computational costs and memory requirements increase with the number of bootstrap samples.
→ However, this can be mitigated with parallel computing: each bootstrap sample and the corresponding tree is independent of every other sample and tree.
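A possible sketch of this idea with the foreach and doParallel packages (package choice and settings are illustrative, not prescribed by the course): each worker grows a share of the trees and the partial forests are merged with randomForest::combine.

library(randomForest)
library(foreach)
library(doParallel)

# Grow 500 trees as 4 independent chunks of 125 trees, one chunk per worker.
registerDoParallel(cores = 4)
rf_par <- foreach(ntree = rep(125, 4), .combine = randomForest::combine,
                  .packages = "randomForest") %dopar% {
  randomForest(y ~ ., data = df, ntree = ntree, mtry = 3, nodesize = 5)
}
stopImplicitCluster()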
Out-of-bag estimate
For each observation $(y_i, x_i)$ of the training set $D$, an out-of-bag prediction can be constructed by averaging only the trees corresponding to bootstrap samples $D^{*b}$ in which $(y_i, x_i)$ does not appear.
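With randomForest, these out-of-bag predictions are returned by predict() when no newdata is supplied; a minimal sketch reusing the rf object and data frame df assumed above:

oob_pred <- predict(rf)                # no newdata: out-of-bag predictions
oob_mse  <- mean((df$y - oob_pred)^2)  # out-of-bag estimate of the error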
Interpretability
For the $b$th tree, the error can be estimated on its out-of-bag observations $I\setminus I^{*b}$:
$$\widehat{\mathrm{Err}}\big(\hat{\mu}_{D,\Theta_b}\big) = \frac{1}{|I\setminus I^{*b}|}\sum_{i\in I\setminus I^{*b}} L\big(y_i, \hat{\mu}_{D,\Theta_b}(x_i)\big).$$
This per-tree out-of-bag error underlies the permutation-based variable importance (%IncLossFunction) reported in the MTPL example below.
Let $x_S$ be a subset of the features and $x_{\bar{S}}$ its complement, so that $x_S \cup x_{\bar{S}} = x$.
In principle, $\hat{\mu}(x)$ depends on the features in both $x_S$ and $x_{\bar{S}}$, so that we can write
$$\hat{\mu}(x) = \hat{\mu}(x_S, x_{\bar{S}}).$$
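The partial dependence of $\hat{\mu}$ on $x_S$ is then obtained by averaging $\hat{\mu}(x_S, x_{i,\bar{S}})$ over the observations of the training set. A minimal hand-rolled R sketch for a single feature (the MTPL example below relies on partialPlot instead; rf, df and the feature name are assumptions):

# Partial dependence of a fitted model on one feature: fix the feature at each
# grid value and average the predictions over the other features.
partial_dependence <- function(model, data, feature, grid) {
  sapply(grid, function(v) {
    data_v <- data
    data_v[[feature]] <- v                  # fix x_S = v for all observations
    mean(predict(model, newdata = data_v))  # average over the remaining features
  })
}

pd_age <- partial_dependence(rf, df, "AgePh", grid = 20:80)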
Example MTPL
Data set :
→ MTPL insurance portfolio of a Belgian insurance company observed during
one year.
→ Description of data set :
> str(data) # description of the dataset
’data.frame’: 160944 obs. of 10 variables:
$ AgePh : int 50 64 60 77 28 26 26 58 59 57 ...
$ AgeCar : int 12 3 10 15 7 12 8 14 6 10 ...
$ Fuel : Factor w/ 2 levels "Diesel","Gasoline": 2 2 1 2 2 2 2 2 2 2 ...
$ Split : Factor w/ 4 levels "Half-Yearly",..: 2 4 4 4 1 3 1 3 1 1 ...
$ Cover : Factor w/ 3 levels "Comprehensive",..: 3 2 3 3 3 3 1 3 2 2 ...
$ Gender : Factor w/ 2 levels "Female","Male": 1 2 1 1 2 1 1 2 2 2 ...
$ Use : Factor w/ 2 levels "Private","Professional": 1 1 1 1 1 1 1 1 1 1 ...
$ PowerCat: Factor w/ 5 levels "C1","C2","C3",..: 2 2 2 2 2 2 2 2 1 1 ...
$ ExpoR : num 1 1 1 1 0.0466 ...
$ Nclaim : int 1 0 0 0 1 0 1 0 0 0 ...
→ The data set comprises 160 944 insurance policies.
→ For each policy, we have 8 features :
- AgePh : policyholder’s age ;
- AgeCar : age of the car ;
- Fuel : fuel of the car, with two categories (gas or diesel) ;
- Split : splitting of the premium, with four categories (annually, semi-annually,
quarterly or monthly) ;
- Cover : extent of the coverage, with three categories (from compulsory
third-party liability cover to comprehensive) ;
- Gender : policyholder’s gender, with two categories (female or male) ;
- Use : use of the car, with two categories (private or professional) ;
- PowerCat : the engine’s power, with five categories.
→ For each policy, we have the number of claims (Nclaim), which is the response, and exposure information (exposure-to-risk ExpoR, expressed in years).
> head(data,10) # 10 first observations
AgePh AgeCar Fuel Split Cover Gender Use PowerCat ExpoR Nclaim
1 50 12 Gasoline Monthly TPL.Only Female Private C2 1.00000000 1
2 64 3 Gasoline Yearly Limited.MD Male Private C2 1.00000000 0
3 60 10 Diesel Yearly TPL.Only Female Private C2 1.00000000 0
4 77 15 Gasoline Yearly TPL.Only Female Private C2 1.00000000 0
5 28 7 Gasoline Half-Yearly TPL.Only Male Private C2 0.04657534 1
6 26 12 Gasoline Quarterly TPL.Only Female Private C2 1.00000000 0
7 26 8 Gasoline Half-Yearly Comprehensive Female Private C2 1.00000000 1
8 58 14 Gasoline Quarterly TPL.Only Male Private C2 0.40273973 0
9 59 6 Gasoline Half-Yearly Limited.MD Male Private C1 1.00000000 0
10 57 10 Gasoline Half-Yearly Limited.MD Male Private C1 1.00000000 0
Descriptive statistics of the data :
[Figure: number of policies (bar chart, categories 1 to 12).]
[Figures: total exposure and claim frequency by rating factor, including panels for PowerCat, AgeCar and AgePh.]
Training set and validation set :
→ Training set : 80% of the data set.
→ Validation set : 20% of the data set.
> library(caret)
> inValidation = createDataPartition(data$Nclaim, p=0.2, list=FALSE)
> validation.set = data[inValidation,]
> training.set = data[-inValidation,]
R packages :
→ ipred (for bagging).
→ randomForest.
→ rfCountData.
Tuning the model :
set.seed(87)
folds = createFolds(training.set$Nclaim, k = 5, list = TRUE)
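The grid search itself is not shown on the slides; a possible sketch of the 5-fold cross-validation over mtry and nodesize with rfPoisson (argument names follow the call shown further below; the deviance helper, the grid and the number of trees are illustrative assumptions, and the predict/offset usage mirrors the prediction code later in this example):

library(rfCountData)

# Mean Poisson deviance per observation (assumed convention).
poisson_deviance <- function(y, mu) {
  2 * mean(ifelse(y == 0, mu, y * log(y / mu) - (y - mu)))
}

features <- !names(training.set) %in% c("Nclaim", "ExpoR")
grid <- expand.grid(mtry = 1:8, nodesize = c(500, 1000, 5000, 10000))
grid$cv_deviance <- NA

for (g in seq_len(nrow(grid))) {
  dev_fold <- numeric(length(folds))
  for (k in seq_along(folds)) {
    val   <- training.set[folds[[k]], ]   # held-out fold
    train <- training.set[-folds[[k]], ]
    fit <- rfPoisson(x = train[, features], y = train$Nclaim,
                     offset = log(train$ExpoR),
                     ntree = 500, mtry = grid$mtry[g],
                     nodesize = grid$nodesize[g])
    mu_hat <- predict(fit, newdata = val[, features], offset = log(val$ExpoR))
    dev_fold[k] <- poisson_deviance(val$Nclaim, mu_hat)
  }
  grid$cv_deviance[g] <- mean(dev_fold)
}

best <- grid[which.min(grid$cv_deviance), ]
mtry_star <- best$mtry
nodesize_star <- best$nodesize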
[Figure: cross-validation results: deviance as a function of mtry (8 down to 1) for nodesize 500, 1 000, 5 000 and 10 000.]
Optimal RF :
> # fit optimal model
> optimal_rf = rfPoisson(x = training.set[,!names(training.set)
%in% c("Nclaim", "ExpoR")],
offset = log(training.set$ExpoR),
y = training.set$Nclaim,
xtest = validation.set[,!names(validation.set)
%in% c("Nclaim", "ExpoR")],
offsettest = log(validation.set$ExpoR),
ytest = validation.set$Nclaim,
ntree = 2000,
mtry = mtry_star,
nodesize = nodesize_star,
keep.forest = TRUE,
do.trace = TRUE,
importance=TRUE)
> print(optimal_rf)
Call:
rfPoisson(x = ...)
Type of random forest: regression
Number of trees: 2000
No. of variables tried at each split: 3
> # Optimal number of trees
> par(mfrow=c(1,2))
> plot(optimal_rf, xlim=c(0,2000), ylim=c(0.54,0.55))
> plot(optimal_rf, xlim=c(0,500), ylim=c(0.54,0.55), main="Zoom")
[Figure: error as a function of the number of trees (full range up to 2 000 trees, and zoom on the first 500 trees).]
> # relative importances
> imp <- importance(optimal_rf, type = 1)
> impvar <- rownames(imp)[order(imp[, 1], decreasing = TRUE)]
> par(mfrow=c(1, 1))
> varImpPlot(optimal_rf, sort = TRUE, type = 1)
[Figure: variable importance (%IncLossFunction), in decreasing order: AgePh, Split, Fuel, AgeCar, Cover, Gender, PowerCat, Use.]
> # partial dependences
> op <- par(mfrow=c(2, 4)) # for all features (here: 8 features)
> for (i in seq_along(impvar)) {
+   partialPlot(optimal_rf, training.set, x.var = impvar[i],
+               offset = log(training.set$ExpoR),
+               xlab = impvar[i], main = paste("Partial Dependence on", impvar[i]))
+ }
> par(op)
[Figure: partial dependence plots for the eight features.]
Predictions :
> data$pred = predict(optimal_rf, offset = log(data$ExpoR), newdata = data)
> head(data, 20)
AgePh AgeCar Fuel Split Cover Gender Use PowerCat Latitude Longitude ExpoR Nclaim pred
50 12 Gasoline Monthly TPL.Only Male Private C2 50.5 4.21 1.00000 1 0.16459
64 3 Gasoline Yearly Limited.MD Female Private C2 50.5 4.21 1.00000 0 0.10225
60 10 Diesel Yearly TPL.Only Male Private C2 50.5 4.21 1.00000 0 0.12587
77 15 Gasoline Yearly TPL.Only Male Private C2 50.5 4.21 1.00000 0 0.09129
28 7 Gasoline Half-Yearly TPL.Only Female Private C2 50.5 4.21 0.04658 1 0.01007
26 12 Gasoline Quarterly TPL.Only Male Private C2 50.5 4.21 1.00000 0 0.31131
26 8 Gasoline Half-Yearly Comprehensive Male Private C2 50.5 4.21 1.00000 1 0.21448
58 14 Gasoline Quarterly TPL.Only Female Private C2 50.5 4.21 0.40274 0 0.06025
59 6 Gasoline Half-Yearly Limited.MD Female Private C1 50.5 4.21 1.00000 0 0.09995
57 10 Gasoline Half-Yearly Limited.MD Female Private C1 50.5 4.21 1.00000 0 0.10124
62 5 Gasoline Yearly Limited.MD Male Private C1 50.5 4.21 1.00000 0 0.08188
57 15 Gasoline Yearly TPL.Only Male Private C2 50.5 4.21 1.00000 0 0.09640
30 10 Gasoline Monthly Limited.MD Male Private C2 50.5 4.21 1.00000 1 0.20159
47 14 Gasoline Monthly TPL.Only Female Private C1 50.5 4.21 1.00000 0 0.16334
67 8 Gasoline Yearly Limited.MD Male Private C2 50.5 4.21 1.00000 0 0.08522
62 7 Gasoline Quarterly Comprehensive Male Professional C2 50.5 4.21 1.00000 0 0.12341
82 10 Gasoline Yearly Limited.MD Male Private C2 50.5 4.21 0.73425 0 0.06909
33 15 Gasoline Half-Yearly TPL.Only Male Private C1 50.5 4.21 0.31507 0 0.04204
43 2 Diesel Half-Yearly Comprehensive Male Private C3 50.5 4.21 1.00000 0 0.12183
51 7 Gasoline Yearly Limited.MD Male Private C4 50.5 4.21 1.00000 0 0.10955
Comparison with a single regression tree :
→ Generalization error on the validation set:
$$\widehat{\mathrm{Err}}_{\mathrm{val}}\big(\hat{\mu}^{\mathrm{rf}}_{D,\Theta}\big) = 0.5440970.$$
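Up to the exact deviance convention, this value can be reproduced by scoring the validation set; a minimal sketch reusing the poisson_deviance helper and the features selector assumed in the tuning sketch above:

mu_val <- predict(optimal_rf,
                  newdata = validation.set[, features],
                  offset  = log(validation.set$ExpoR))
poisson_deviance(validation.set$Nclaim, mu_val)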
→ Model granularity (impact of the age) : regression tree :
[Figure: fitted regression tree. The root split is on AgePh >= 30, followed by splits on Split, AgePh, Fuel, Cover, Gender and AgeCar; terminal nodes show claim frequencies roughly between 0.08 and 0.21.]
→ Model granularity (impact of the age) : RF :
[Figure: partial dependence plots of the random forest.]