
Sample Project Report

Franklin Lam

January, 2023

Introduction
In this assignment, the mlr package is used to build an SVM model and a neural network model for the Credit
Card Default classification problem. This sample illustrates how to use R to perform exploratory
analysis and to develop and benchmark SVM and neural network models for the given problem. Only a brief
discussion is included as an illustration.
**NOTE: A more detailed analysis is required in the assignment, and marks will not be given for directly
copying this analysis.
First, we load the required libraries and disable the training output from the mlr package.
library(dplyr)
library(psych)
library(VIM)
library(ggplot2)
library(corrplot)
library(car)
library(mlr)
library(gridExtra)
# turn off training outputs of mlr
configureMlr(show.info=FALSE, show.learner.output=FALSE)

Exploratory analysis
The following code produces basic descriptive statistics of the raw data and plots the missing data pattern.
# Read the Data
df <- read.csv("default_credit_card.csv",header=TRUE) %>%
mutate(default=factor(Y, labels=c("No", "Yes"))) %>%
select(-Y)
str(df)

## 'data.frame': 30000 obs. of 24 variables:


## $ X1 : int 20000 120000 90000 50000 50000 50000 500000 100000 140000 20000 ...
## $ X2 : int 2 2 2 2 1 1 1 2 2 1 ...
## $ X3 : int 2 2 2 2 2 1 1 2 3 3 ...
## $ X4 : int 1 2 2 1 1 2 2 2 1 2 ...
## $ X5 : int 24 26 34 37 57 37 29 23 28 35 ...
## $ X6 : int 2 -1 0 0 -1 0 0 0 0 -2 ...
## $ X7 : int 2 2 0 0 0 0 0 -1 0 -2 ...
## $ X8 : int -1 0 0 0 -1 0 0 -1 2 -2 ...
## $ X9 : int -1 0 0 0 0 0 0 0 0 -2 ...
## $ X10 : int -2 0 0 0 0 0 0 0 0 -1 ...

## $ X11 : int -2 2 0 0 0 0 0 -1 0 -1 ...
## $ X12 : int 3913 2682 29239 46990 8617 64400 367965 11876 11285 0 ...
## $ X13 : int 3102 1725 14027 48233 5670 57069 412023 380 14096 0 ...
## $ X14 : int 689 2682 13559 49291 35835 57608 445007 601 12108 0 ...
## $ X15 : int 0 3272 14331 28314 20940 19394 542653 221 12211 0 ...
## $ X16 : int 0 3455 14948 28959 19146 19619 483003 -159 11793 13007 ...
## $ X17 : int 0 3261 15549 29547 19131 20024 473944 567 3719 13912 ...
## $ X18 : int 0 0 1518 2000 2000 2500 55000 380 3329 0 ...
## $ X19 : int 689 1000 1500 2019 36681 1815 40000 601 0 0 ...
## $ X20 : int 0 1000 1000 1200 10000 657 38000 0 432 0 ...
## $ X21 : int 0 1000 1000 1100 9000 1000 20239 581 1000 13007 ...
## $ X22 : int 0 0 1000 1069 689 1000 13750 1687 1000 1122 ...
## $ X23 : int 0 2000 5000 1000 679 800 13770 1542 1000 0 ...
## $ default: Factor w/ 2 levels "No","Yes": 2 2 1 1 1 1 1 1 1 1 ...
summary(df)

## X1 X2 X3 X4
## Min. : 10000 Min. :1.000 Min. :0.000 Min. :0.000
## 1st Qu.: 50000 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000
## Median : 140000 Median :2.000 Median :2.000 Median :2.000
## Mean : 167484 Mean :1.604 Mean :1.853 Mean :1.552
## 3rd Qu.: 240000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000
## Max. :1000000 Max. :2.000 Max. :6.000 Max. :3.000
## X5 X6 X7 X8
## Min. :21.00 Min. :-2.0000 Min. :-2.0000 Min. :-2.0000
## 1st Qu.:28.00 1st Qu.:-1.0000 1st Qu.:-1.0000 1st Qu.:-1.0000
## Median :34.00 Median : 0.0000 Median : 0.0000 Median : 0.0000
## Mean :35.49 Mean :-0.0167 Mean :-0.1338 Mean :-0.1662
## 3rd Qu.:41.00 3rd Qu.: 0.0000 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## Max. :79.00 Max. : 8.0000 Max. : 8.0000 Max. : 8.0000
## X9 X10 X11 X12
## Min. :-2.0000 Min. :-2.0000 Min. :-2.0000 Min. :-165580
## 1st Qu.:-1.0000 1st Qu.:-1.0000 1st Qu.:-1.0000 1st Qu.: 3559
## Median : 0.0000 Median : 0.0000 Median : 0.0000 Median : 22382
## Mean :-0.2207 Mean :-0.2662 Mean :-0.2911 Mean : 51223
## 3rd Qu.: 0.0000 3rd Qu.: 0.0000 3rd Qu.: 0.0000 3rd Qu.: 67091
## Max. : 8.0000 Max. : 8.0000 Max. : 8.0000 Max. : 964511
## X13 X14 X15 X16
## Min. :-69777 Min. :-157264 Min. :-170000 Min. :-81334
## 1st Qu.: 2985 1st Qu.: 2666 1st Qu.: 2327 1st Qu.: 1763
## Median : 21200 Median : 20089 Median : 19052 Median : 18105
## Mean : 49179 Mean : 47013 Mean : 43263 Mean : 40311
## 3rd Qu.: 64006 3rd Qu.: 60165 3rd Qu.: 54506 3rd Qu.: 50191
## Max. :983931 Max. :1664089 Max. : 891586 Max. :927171
## X17 X18 X19 X20
## Min. :-339603 Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 1256 1st Qu.: 1000 1st Qu.: 833 1st Qu.: 390
## Median : 17071 Median : 2100 Median : 2009 Median : 1800
## Mean : 38872 Mean : 5664 Mean : 5921 Mean : 5226
## 3rd Qu.: 49198 3rd Qu.: 5006 3rd Qu.: 5000 3rd Qu.: 4505
## Max. : 961664 Max. :873552 Max. :1684259 Max. :896040
## X21 X22 X23 default
## Min. : 0 Min. : 0.0 Min. : 0.0 No :23364
## 1st Qu.: 296 1st Qu.: 252.5 1st Qu.: 117.8 Yes: 6636

## Median : 1500 Median : 1500.0 Median : 1500.0
## Mean : 4826 Mean : 4799.4 Mean : 5215.5
## 3rd Qu.: 4013 3rd Qu.: 4031.5 3rd Qu.: 4000.0
## Max. :621000 Max. :426529.0 Max. :528666.0
# plot the missing data pattern
aggr(df[,1:23], prop=FALSE, numbers=TRUE)
[Figure: missing-data pattern from aggr() for X1-X23; no missing values in any variable]
From the descriptive statistics and graphs, there are no missing values in the dataset.
Descriptive statistics of the data are given below.
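The graphical check with aggr() can also be cross-checked numerically. The following is a minimal sketch on a toy data frame (not the assignment data) of how NA counts could be computed directly:

```r
# toy data frame with one missing value, used only to illustrate the check
toy <- data.frame(a = c(1, NA, 3), b = c(4, 5, 6))
colSums(is.na(toy))  # per-column counts of missing values
sum(is.na(toy))      # total number of missing values (1 here)
```

On the credit data, sum(is.na(df)) returning 0 would confirm what the aggr() plot shows.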
# count frequency and calculate percentage
freq=table(df$default)
percentage=round(prop.table(freq),3)
cbind(freq, percentage)

## freq percentage
## No 23364 0.779
## Yes 6636 0.221
# descriptive statistics
describe(df[,-24])

## vars n mean sd median trimmed mad min max


## X1 1 30000 167484.32 129747.66 140000.0 151607.40 133434.00 10000 1000000
## X2 2 30000 1.60 0.49 2.0 1.63 0.00 1 2
## X3 3 30000 1.85 0.79 2.0 1.78 1.48 0 6
## X4 4 30000 1.55 0.52 2.0 1.55 0.00 0 3
## X5 5 30000 35.49 9.22 34.0 34.69 8.90 21 79
## X6 6 30000 -0.02 1.12 0.0 -0.06 1.48 -2 8
## X7 7 30000 -0.13 1.20 0.0 -0.20 0.00 -2 8
## X8 8 30000 -0.17 1.20 0.0 -0.24 0.00 -2 8
## X9 9 30000 -0.22 1.17 0.0 -0.31 0.00 -2 8
## X10 10 30000 -0.27 1.13 0.0 -0.36 0.00 -2 8
## X11 11 30000 -0.29 1.15 0.0 -0.39 0.00 -2 8
## X12 12 30000 51223.33 73635.86 22381.5 35359.66 32321.42 -165580 964511
## X13 13 30000 49179.08 71173.77 21200.0 33836.10 30852.91 -69777 983931

## X14 14 30000 47013.15 69349.39 20088.5 32064.43 29219.82 -157264 1664089
## X15 15 30000 43262.95 64332.86 19052.0 29212.37 27659.39 -170000 891586
## X16 16 30000 40311.40 60797.16 18104.5 26920.95 26224.97 -81334 927171
## X17 17 30000 38871.76 59554.11 17071.0 25726.08 24840.96 -339603 961664
## X18 18 30000 5663.58 16563.28 2100.0 2997.22 2864.38 0 873552
## X19 19 30000 5921.16 23040.87 2009.0 2876.43 2951.86 0 1684259
## X20 20 30000 5225.68 17606.96 1800.0 2468.91 2661.27 0 896040
## X21 21 30000 4826.08 15666.16 1500.0 2199.23 2223.90 0 621000
## X22 22 30000 4799.39 15278.31 1500.0 2202.19 2223.90 0 426529
## X23 23 30000 5215.50 17777.47 1500.0 2165.33 2223.90 0 528666
## range skew kurtosis se
## X1 990000 0.99 0.54 749.10
## X2 1 -0.42 -1.82 0.00
## X3 6 0.97 2.08 0.00
## X4 3 -0.02 -1.36 0.00
## X5 58 0.73 0.04 0.05
## X6 10 0.73 2.72 0.01
## X7 10 0.79 1.57 0.01
## X8 10 0.84 2.08 0.01
## X9 10 1.00 3.50 0.01
## X10 10 1.01 3.99 0.01
## X11 10 0.95 3.43 0.01
## X12 1130091 2.66 9.80 425.14
## X13 1053708 2.70 10.30 410.92
## X14 1821353 3.09 19.78 400.39
## X15 1061586 2.82 11.31 371.43
## X16 1008505 2.88 12.30 351.01
## X17 1301267 2.85 12.27 343.84
## X18 873552 14.67 415.16 95.63
## X19 1684259 30.45 1641.25 133.03
## X20 896040 17.21 564.18 101.65
## X21 621000 12.90 277.27 90.45
## X22 426529 11.13 180.02 88.21
## X23 528666 10.64 167.12 102.64
Boxplots of the numerical data are given below.
# boxplot - only numerical data
# X1: credit
df %>% select(X1) %>%
reshape2::melt() %>%
ggplot(aes(x=variable, y=value, fill=variable)) +
geom_boxplot() +
labs(title="Predicting variables") +
theme(axis.text.x = element_text(angle=90, vjust=0.5),
plot.title=element_text(hjust=0.5))

[Figure: boxplot of X1 (credit amount)]

# X6-X11: history of past payment


df %>% select(X6:X11) %>%
reshape2::melt() %>%
ggplot(aes(x=variable, y=value, fill=variable)) +
geom_boxplot() +
labs(title="Predicting variables") +
theme(axis.text.x = element_text(angle=90, vjust=0.5),
plot.title=element_text(hjust=0.5))

[Figure: boxplots of X6-X11 (history of past payment)]

# X12-X17: bill
df %>% select(X12:X17) %>%
reshape2::melt() %>%
ggplot(aes(x=variable, y=value, fill=variable)) +
geom_boxplot() +
labs(title="Predicting variables") +
theme(axis.text.x = element_text(angle=90, vjust=0.5),
plot.title=element_text(hjust=0.5))

[Figure: boxplots of X12-X17 (bill amounts)]

# X18-X23: previous payment


df %>% select(X18:X23) %>%
reshape2::melt() %>%
ggplot(aes(x=variable, y=value, fill=variable)) +
geom_boxplot() +
labs(title="Predicting variables") +
theme(axis.text.x = element_text(angle=90, vjust=0.5),
plot.title=element_text(hjust=0.5))

[Figure: boxplots of X18-X23 (previous payments)]

Bar charts of the categorical variables are given below.


# X2-X5: demographic data

# predicting variables
# X1
df %>% select(X1, default) %>%
group_by(default) %>%

summarise_all(mean) %>%
reshape2::melt() %>%
ggplot(aes(x=variable, y=value, fill=default)) +
geom_col(position="dodge") +
labs(title = "The Means of Predicting Variables by Class") +
theme(axis.text.x = element_text(angle=90, vjust=0.5),
plot.title=element_text(hjust=0.5))

[Figure: mean of X1 by default class]

# X2:X5
df %>% select(X2, default) %>%
mutate(Sex=factor(X2, labels=c("male", "female"))) %>%
select(Sex, default) %>%
reshape2::melt() %>%
ggplot(aes(x=Sex, fill=default)) +
geom_bar(position="dodge") +
labs(title = "The Means of Predicting Variables by Class") +
theme(axis.text.x = element_text(angle=90, vjust=0.5),
plot.title=element_text(hjust=0.5))

[Figure: counts of Sex by default class]

df %>% select(X3, default) %>%
mutate(Education=factor(X3, labels=c("Unknown", "Graduate School",
"University", "High School", "Others",
"Unknown", "Unknown"))) %>%
select(Education, default) %>% reshape2::melt() %>%
ggplot(aes(x=Education, fill=default)) +
geom_bar(position="dodge") +
labs(title = "The Means of Predicting Variables by Class") +
theme(axis.text.x = element_text(angle=90, vjust=0.5),
plot.title=element_text(hjust=0.5))

[Figure: counts of Education by default class]

df %>% select(X4, default) %>%
mutate(Marital_status=factor(X4, labels=c("Unknown", "Married",
"Single", "Others"))) %>%
select(Marital_status, default) %>% reshape2::melt() %>%
ggplot(aes(x=Marital_status, fill=default)) +
geom_bar(position="dodge") +
labs(title = "The Means of Predicting Variables by Class") +
theme(axis.text.x = element_text(angle=90, vjust=0.5),
plot.title=element_text(hjust=0.5))

[Figure: counts of Marital_status by default class]

df %>% select(Age=X5, default) %>%
ggplot(aes(Age, fill=default)) +
geom_histogram(alpha=0.5, binwidth=2) +
facet_grid(.~ default)+
theme(axis.text.x = element_text(angle = 90, hjust = 1))

[Figure: histograms of Age (X5) faceted by default class]

# X6:X11
df %>% select(X6:X11, default) %>%
group_by(default) %>%
summarise_all(mean) %>%
reshape2::melt() %>%
ggplot(aes(x=variable, y=value, fill=default)) +

geom_col(position="dodge") +
labs(title = "The Means of Predicting Variables by Class") +
theme(axis.text.x = element_text(angle=90, vjust=0.5),
plot.title=element_text(hjust=0.5))

[Figure: means of X6-X11 by default class]

# X12:X17
df %>% select(X12:X17, default) %>%
group_by(default) %>%
summarise_all(mean) %>%
reshape2::melt() %>%
ggplot(aes(x=variable, y=value, fill=default)) +
geom_col(position="dodge") +
labs(title = "The Means of Predicting Variables by Class") +
theme(axis.text.x = element_text(angle=90, vjust=0.5),
plot.title=element_text(hjust=0.5))

[Figure: means of X12-X17 by default class]
# X18:X23
df %>% select(X18:X23, default) %>%
group_by(default) %>%
summarise_all(mean) %>%
reshape2::melt() %>%
ggplot(aes(x=variable, y=value, fill=default)) +
geom_col(position="dodge") +
labs(title = "The Means of Predicting Variables by Class") +
theme(axis.text.x = element_text(angle=90, vjust=0.5),
plot.title=element_text(hjust=0.5))

[Figure: means of X18-X23 by default class]

Histograms of the numerical variables are given below.


# histogram of data
opar <- par(mfrow=c(3,3))
for (i in c(1, 6:23)) {
hist(df[,i], main=colnames(df)[i], xlab="Value")
}

[Figure: histograms of X1 and X6-X23]

# restore the original plot setting
par(opar)


# correlation plot
df %>% select(X1, X6:X11) %>% cor() %>%
round(3) %>%
corrplot(method = "color", addCoef.col="white", type = "upper",
title="Correlation between response variables",
mar=c(0,0,2,0))

[Figure: correlation plot of X1 and X6-X11]
df %>% select(X1, X12:X17) %>% cor() %>%
round(3) %>%
corrplot(method = "color", addCoef.col="white", type = "upper",
title="Correlation between response variables",
mar=c(0,0,2,0))

[Figure: correlation plot of X1 and X12-X17]
df %>% select(X1, X18:X23) %>% cor() %>%
round(3) %>%
corrplot(method = "color", addCoef.col="white", type = "upper",
title="Correlation between response variables",
mar=c(0,0,2,0))

[Figure: correlation plot of X1 and X18-X23]
df %>% select(X6:X23) %>% cor() %>%
round(3) %>%
corrplot(method = "color", addCoef.col="white", type = "upper",
title="Correlation between response variables",
mar=c(0,0,1,0),
tl.cex=0.5, number.cex = 0.5)

[Figure: correlation plot of X6-X23]
The percentages of default (Yes) and non-default (No) cases are 0.221 and 0.779, respectively. The predictor
variables are all right-skewed. The default cases have extensive payment delays, with the means of X6 - X11
all positive. There is no significant difference in the mean bill amounts (X12 - X17) between default and
non-default cases, but the mean amounts of previous payments (X18 - X23) for default cases are significantly
lower than for non-default cases.
The history of payment (X6 - X11) and the bill statement amounts (X12 - X17) are strongly positively
correlated with the other variables in the same group, whereas the previous payment amounts (X18 - X23)
show only weak positive correlations within the group.
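The group-mean comparisons above could be formalized with a Welch two-sample t-test. A minimal sketch on simulated payment amounts (not the credit data; the group sizes and distribution parameters are made up for illustration) is:

```r
set.seed(1)
# simulated previous-payment amounts for two hypothetical groups
pay_no  <- rlnorm(1000, meanlog = 8.0, sdlog = 1)  # stand-in for non-default
pay_yes <- rlnorm(1000, meanlog = 7.5, sdlog = 1)  # stand-in for default
t.test(pay_no, pay_yes)  # Welch test; does not assume equal variances
```

On the real data, the analogous call would be t.test(X18 ~ default, data = df), repeated for each payment column.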
The following code further explores the relationships between the variables for default and non-default cases.
df %>% filter(default=="No") %>%
select(X6:X23) %>%
cor() %>%
round(3) %>%
corrplot(method = "color", addCoef.col="white", type = "upper",
title="Correlation between response variables (Non-default)",
mar=c(0,0,1,0),
tl.cex=0.5, number.cex = 0.5)

[Figure: correlation plot of X6-X23, non-default cases]
df %>% filter(default=="Yes") %>%
select(X6:X23) %>%
cor() %>%
round(3) %>%
corrplot(method = "color", addCoef.col="white", type = "upper",
title="Correlation between response variables (default)",
mar=c(0,0,1,0),
tl.cex=0.5, number.cex = 0.5)

[Figure: correlation plot of X6-X23, default cases]
There is no significant difference in the correlations between variables for default and non-default cases.
Given the high correlations within X6 - X11 and within X12 - X17, we may replace each group by its mean,
history and bill respectively, in order to reduce the dimension of the problem.
# derive variables
df_reduce <- df %>% mutate(history=(X6 + X7 + X8 + X9 + X10 + X11)/6,
bill=(X12 + X13 + X14 + X15 + X16 + X17)/6) %>%
select(X1:X5, history:bill, X18:X23, default)

describe(df_reduce)

## vars n mean sd median trimmed mad min
## X1 1 30000 167484.32 129747.66 140000.00 151607.40 133434.00 10000.00
## X2 2 30000 1.60 0.49 2.00 1.63 0.00 1.00
## X3 3 30000 1.85 0.79 2.00 1.78 1.48 0.00
## X4 4 30000 1.55 0.52 2.00 1.55 0.00 0.00
## X5 5 30000 35.49 9.22 34.00 34.69 8.90 21.00
## history 6 30000 -0.18 0.98 0.00 -0.22 0.74 -2.00
## bill 7 30000 44976.95 63260.72 21051.83 31425.04 28431.82 -56043.17
## X18 8 30000 5663.58 16563.28 2100.00 2997.21 2864.38 0.00
## X19 9 30000 5921.16 23040.87 2009.00 2876.43 2951.86 0.00
## X20 10 30000 5225.68 17606.96 1800.00 2468.91 2661.27 0.00
## X21 11 30000 4826.08 15666.16 1500.00 2199.23 2223.90 0.00
## X22 12 30000 4799.39 15278.31 1500.00 2202.19 2223.90 0.00
## X23 13 30000 5215.50 17777.47 1500.00 2165.33 2223.90 0.00
## default* 14 30000 1.22 0.42 1.00 1.15 0.00 1.00
## max range skew kurtosis se
## X1 1000000.0 990000 0.99 0.54 749.10
## X2 2.0 1 -0.42 -1.82 0.00
## X3 6.0 6 0.97 2.08 0.00
## X4 3.0 3 -0.02 -1.36 0.00
## X5 79.0 58 0.73 0.04 0.05
## history 6.0 8 0.69 2.69 0.01
## bill 877313.8 933357 2.73 10.52 365.24
## X18 873552.0 873552 14.67 415.16 95.63
## X19 1684259.0 1684259 30.45 1641.25 133.03
## X20 896040.0 896040 17.21 564.18 101.65
## X21 621000.0 621000 12.90 277.27 90.45
## X22 426529.0 426529 11.13 180.02 88.21
## X23 528666.0 528666 10.64 167.12 102.64
## default* 2.0 1 1.34 -0.20 0.00
df_reduce %>% select(history, default) %>%
reshape2::melt() %>%
ggplot(aes(x=variable, y=value, fill=default)) +
geom_boxplot() +
labs(title="Predicting variables") +
theme(axis.text.x = element_text(angle=90, vjust=0.5),
plot.title=element_text(hjust=0.5))

[Figure: boxplot of history by default class]

df_reduce %>% select(bill, default) %>%
reshape2::melt() %>%
ggplot(aes(x=variable, y=value, fill=default)) +
geom_boxplot() +
labs(title="Predicting variables") +
theme(axis.text.x = element_text(angle=90, vjust=0.5),
plot.title=element_text(hjust=0.5))

[Figure: boxplot of bill by default class]

df_reduce %>% select(X18:X23, default) %>%
reshape2::melt() %>%
ggplot(aes(x=variable, y=value, fill=default)) +
geom_boxplot() +
labs(title="Predicting variables") +
theme(axis.text.x = element_text(angle=90, vjust=0.5),
plot.title=element_text(hjust=0.5))

[Figure: boxplots of X18-X23 by default class]

df_reduce %>% select(X1, history:bill, X18:X23) %>% cor() %>%
round(3) %>%
corrplot(method = "color", addCoef.col="white", type = "upper",
title="Correlation between response variables",
mar=c(0,0,2,0),
tl.cex=0.5, number.cex = 0.5)

[Figure: correlation plot of X1, history, bill, X18-X23]
Variables X2 - X4 are categorical. SVM should not be applied directly to categorical variables because it
is based on Euclidean distances. For SVM, we can create dummy variables to represent the categorical
variables. The following code uses dummy_cols() from the fastDummies package to replace the categorical
variables with dummy variables.
# use encoding for categorical variables X2:X4
dfc <- df %>% mutate(X2=as.factor(X2), X3=as.factor(X3), X4=as.factor(X4)) %>%
fastDummies::dummy_cols() %>%
select(X1, X6:X23, X2_1:X4_3, default)

dfc_reduce <- df_reduce %>%
mutate(X2=as.factor(X2), X3=as.factor(X3), X4=as.factor(X4)) %>%
fastDummies::dummy_cols() %>%
select(X1, history:bill, X18:X23, X2_1:X4_3, default)
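As a toy illustration of what dummy_cols() produces (assuming fastDummies is installed; this toy column is not part of the assignment data), a single factor column expands into one 0/1 indicator column per level, named column_level:

```r
library(fastDummies)

# one factor column with levels 1 and 2, mimicking X2 (sex)
toy <- data.frame(X2 = factor(c(1, 2, 2, 1)))
dummy_cols(toy)  # adds indicator columns X2_1 and X2_2 alongside X2
```

This naming convention is why the selects above can refer to the range X2_1:X4_3.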

Preparing data
The following code splits the dataset into a training set (two-thirds) and a test set (one-third). The minimum
and maximum of each variable in the training set are used to normalize both sets to the [0,1] range.
# split data into training and test sets
set.seed(800)
index <- 1:nrow(df)
test_set_index <- sample(index, trunc(length(index)/3))
# (1) Full set
test_set <- df[test_set_index,]
train_set <- df[-test_set_index,]
# (2) Reduced set
test_set_r <- df_reduce[test_set_index,]

train_set_r <- df_reduce[-test_set_index,]
# (3) Full set - dummy encoding for categorical variables
test_setc <- dfc[test_set_index,]
train_setc <- dfc[-test_set_index,]
# (4) Reduced set - dummy encoding for categorical variables
test_set_rc <- dfc_reduce[test_set_index,]
train_set_rc <- dfc_reduce[-test_set_index,]

# determine the max/min from the training set
d_max <- sapply(train_set[,1:23], max)
d_min <- sapply(train_set[,1:23], min)
d_max_r <- sapply(train_set_r[,1:13], max)
d_min_r <- sapply(train_set_r[,1:13], min)
d_maxc <- sapply(train_setc[,1:19], max)
d_minc <- sapply(train_setc[,1:19], min)
d_max_rc <- sapply(train_set_rc[,1:9], max)
d_min_rc <- sapply(train_set_rc[,1:9], min)

# function to rescale the columns based on the training set max/min
rescale <- function(dat, d_min, d_max) {
c <- ncol(dat)
for (i in 1:c) {
dat[,i] <- sapply(dat[,i], function(x) (x - d_min[i])/(d_max[i] - d_min[i]))
}
return (dat)
}
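A quick sanity check of the min-max formula that rescale() applies per column (toy values, not the credit data):

```r
# min-max normalization maps the reference min to 0 and the reference max to 1
x     <- c(10, 20, 40)
x_min <- min(x)
x_max <- max(x)
(x - x_min) / (x_max - x_min)  # 0, 1/3, 1
```

Because the test set is scaled with the training set's min/max, test values outside the training range will fall outside [0,1]; that is expected under this scheme.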

# normalize the training/testing set
train_set[,1:23] <- rescale(train_set[,1:23], d_min, d_max)
test_set[,1:23] <- rescale(test_set[,1:23], d_min, d_max)
train_set_r[,1:13] <- rescale(train_set_r[,1:13], d_min_r, d_max_r)
test_set_r[,1:13] <- rescale(test_set_r[,1:13], d_min_r, d_max_r)
train_setc[,1:19] <- rescale(train_setc[,1:19], d_minc, d_maxc)
test_setc[,1:19] <- rescale(test_setc[,1:19], d_minc, d_maxc)
train_set_rc[,1:9] <- rescale(train_set_rc[,1:9], d_min_rc, d_max_rc)
test_set_rc[,1:9] <- rescale(test_set_rc[,1:9], d_min_rc, d_max_rc)

SVM
Develop basic model
The basic SVM classifier is constructed using a radial kernel. The following code develops the full and reduced
models. default = "Yes" is set as the positive class, and the models predict class probabilities as follows:
# (1) Full set
# create a classification task
task_f <- makeClassifTask(id = "Credit_Default_F", data = train_set,
target = "default",
positive = "Yes")
task_f

## Supervised task: Credit_Default_F


## Type: classif
## Target: default

## Observations: 20000
## Features:
## numerics factors ordered functionals
## 23 0 0 0
## Missings: FALSE
## Has weights: FALSE
## Has blocking: FALSE
## Has coordinates: FALSE
## Classes: 2
## No Yes
## 15590 4410
## Positive class: Yes
# create a svm learner - with predict.type = "prob" for plotting the training
svm_lrn_f <- makeLearner("classif.svm", id="svm_full",
kernel="radial", predict.type = "prob")
svm_lrn_f

## Learner svm_full from package e1071


## Type: classif
## Name: Support Vector Machines (libsvm); Short name: svm
## Class: classif.svm
## Properties: twoclass,multiclass,numerics,factors,prob,class.weights
## Predict-Type: prob
## Hyperparameters: kernel=radial
# train the svm classifier
set.seed(500)
svm_mod_f <- train(svm_lrn_f, task_f)
svm_mod_f$learner.model

##
## Call:
## svm.default(x = d$data, y = d$target, kernel = "radial", probability = .learner$predict.type ==
## "prob")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
##
## Number of Support Vectors: 8882
svm_mod_f$features

## [1] "X1" "X2" "X3" "X4" "X5" "X6" "X7" "X8" "X9" "X10" "X11" "X12"
## [13] "X13" "X14" "X15" "X16" "X17" "X18" "X19" "X20" "X21" "X22" "X23"
svm_mod_f$time

## [1] 339.06
train_svm_f <- predict(svm_mod_f, task_f); train_svm_f

## Prediction: 20000 observations


## predict.type: prob
## threshold: No=0.50,Yes=0.50

## time: 13.53
## id truth prob.Yes prob.No response
## 1 1 Yes 0.4350538 0.5649462 No
## 2 2 Yes 0.1618716 0.8381284 No
## 3 3 No 0.1555109 0.8444891 No
## 4 4 No 0.1576471 0.8423529 No
## 5 5 No 0.1553270 0.8446730 No
## 9 6 No 0.1530721 0.8469279 No
## ... (#rows: 20000, #cols: 5)
test_svm_f <- predict(svm_mod_f, newdata=test_set); test_svm_f

## Prediction: 10000 observations


## predict.type: prob
## threshold: No=0.50,Yes=0.50
## time: 6.87
## truth prob.Yes prob.No response
## 21299 No 0.1506065 0.8493935 No
## 8619 No 0.1576662 0.8423338 No
## 11730 No 0.1574794 0.8425206 No
## 3266 Yes 0.1627676 0.8372324 No
## 15740 No 0.1572782 0.8427218 No
## 19572 No 0.1573074 0.8426926 No
## ... (#rows: 10000, #cols: 4)
# (2) Reduced set
task_r <- makeClassifTask(id = "Credit_Default_R", data = train_set_r,
target = "default",
positive = "Yes")
task_r

## Supervised task: Credit_Default_R


## Type: classif
## Target: default
## Observations: 20000
## Features:
## numerics factors ordered functionals
## 13 0 0 0
## Missings: FALSE
## Has weights: FALSE
## Has blocking: FALSE
## Has coordinates: FALSE
## Classes: 2
## No Yes
## 15590 4410
## Positive class: Yes
# create a svm learner - with predict.type = "prob" for plotting the training
svm_lrn_r <- makeLearner("classif.svm", id="svm_reduce",
kernel="radial", predict.type = "prob")
svm_lrn_r

## Learner svm_reduce from package e1071


## Type: classif
## Name: Support Vector Machines (libsvm); Short name: svm
## Class: classif.svm

## Properties: twoclass,multiclass,numerics,factors,prob,class.weights
## Predict-Type: prob
## Hyperparameters: kernel=radial
# train the svm classifier
set.seed(500)
svm_mod_r <- train(svm_lrn_r, task_r)
svm_mod_r$learner.model

##
## Call:
## svm.default(x = d$data, y = d$target, kernel = "radial", probability = .learner$predict.type ==
## "prob")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
##
## Number of Support Vectors: 8963
svm_mod_r$features

## [1] "X1" "X2" "X3" "X4" "X5" "history" "bill"


## [8] "X18" "X19" "X20" "X21" "X22" "X23"
svm_mod_r$time

## [1] 178.28
train_svm_r <- predict(svm_mod_r, task_r); train_svm_r

## Prediction: 20000 observations


## predict.type: prob
## threshold: No=0.50,Yes=0.50
## time: 11.00
## id truth prob.Yes prob.No response
## 1 1 Yes 0.1663680 0.8336320 No
## 2 2 Yes 0.2160805 0.7839195 No
## 3 3 No 0.1683294 0.8316706 No
## 4 4 No 0.1721904 0.8278096 No
## 5 5 No 0.1735513 0.8264487 No
## 9 6 No 0.1927403 0.8072597 No
## ... (#rows: 20000, #cols: 5)
test_svm_r <- predict(svm_mod_r, newdata=test_set_r); test_svm_r

## Prediction: 10000 observations


## predict.type: prob
## threshold: No=0.50,Yes=0.50
## time: 5.44
## truth prob.Yes prob.No response
## 21299 No 0.1534315 0.8465685 No
## 8619 No 0.1676678 0.8323322 No
## 11730 No 0.1715108 0.8284892 No
## 3266 Yes 0.1799155 0.8200845 No
## 15740 No 0.1685218 0.8314782 No

## 19572 No 0.1701972 0.8298028 No
## ... (#rows: 10000, #cols: 4)
The performance of the classifiers (without dummy encoding) is evaluated below:
cat("(1) Full set\n")

## (1) Full set


# Relative confusion matrix (normalized by row/column)
calculateConfusionMatrix(train_svm_f, relative = TRUE, sums = TRUE)

## Relative confusion matrix (normalized by row/column):


## predicted
## true No Yes -err.- -n-
## No 0.97/0.83 0.03/0.28 0.03 18055
## Yes 0.68/0.17 0.32/0.72 0.68 1945
## -err.- 0.17 0.28 0.18 <NA>
## -n- 15590 4410 <NA> 20000
##
##
## Absolute confusion matrix:
## No Yes -err.- -n-
## No 15054 536 536 15590
## Yes 3001 1409 3001 4410
## -err.- 3001 536 3537 NA
## -n- 18055 1945 NA 20000
calculateConfusionMatrix(test_svm_f, relative = TRUE, sums = TRUE)

## Relative confusion matrix (normalized by row/column):


## predicted
## true No Yes -err.- -n-
## No 0.97/0.83 0.03/0.28 0.03 9045
## Yes 0.69/0.17 0.31/0.72 0.69 955
## -err.- 0.17 0.28 0.18 <NA>
## -n- 7774 2226 <NA> 10000
##
##
## Absolute confusion matrix:
## No Yes -err.- -n-
## No 7511 263 263 7774
## Yes 1534 692 1534 2226
## -err.- 1534 263 1797 NA
## -n- 9045 955 NA 10000
# performance
cat("\n")
cat("Training set accuracy: ", performance(train_svm_f, measures=acc), "\n")

## Training set accuracy: 0.82315


cat("Test set accuracy: ", performance(test_svm_f, measures=acc), "\n")

## Test set accuracy: 0.8203


cat("Test set auc: ", performance(test_svm_f, measures=auc), "\n")

## Test set auc: 0.7287499
# plot ROC and threshold
d = generateThreshVsPerfData(train_svm_f, measures = list(fpr, tpr, mmce))
plotROCCurves(d)

[Figure: ROC curve — true positive rate vs false positive rate]

plotThreshVsPerf(d)

[Figure: threshold vs performance — false positive rate, true positive rate, mean misclassification error]

cat("(2) Reduced set\n")

## (2) Reduced set


# Relative confusion matrix (normalized by row/column)
calculateConfusionMatrix(train_svm_r, relative = TRUE, sums = TRUE)

## Relative confusion matrix (normalized by row/column):


## predicted
## true No Yes -err.- -n-
## No 0.97/0.81 0.03/0.32 0.03 18718
## Yes 0.80/0.19 0.20/0.68 0.80 1282
## -err.- 0.19 0.32 0.20 <NA>
## -n- 15590 4410 <NA> 20000
##
##
## Absolute confusion matrix:
## No Yes -err.- -n-
## No 15180 410 410 15590
## Yes 3538 872 3538 4410
## -err.- 3538 410 3948 NA
## -n- 18718 1282 NA 20000
calculateConfusionMatrix(test_svm_r, relative = TRUE, sums = TRUE)

## Relative confusion matrix (normalized by row/column):


## predicted
## true No Yes -err.- -n-
## No 0.97/0.81 0.03/0.32 0.03 9319
## Yes 0.79/0.19 0.21/0.68 0.79 681
## -err.- 0.19 0.32 0.20 <NA>

## -n- 7774 2226 <NA> 20000
##
##
## Absolute confusion matrix:
## No Yes -err.- -n-
## No 7555 219 219 7774
## Yes 1764 462 1764 2226
## -err.- 1764 219 1983 NA
## -n- 9319 681 NA 20000
# performance
cat("\n")
cat("Training set accuracy: ", performance(train_svm_r, measures=acc), "\n")

## Training set accuracy: 0.8026


cat("Test set accuracy: ", performance(test_svm_r, measures=acc), "\n")

## Test set accuracy: 0.8017


cat("Test set auc: ", performance(test_svm_r, measures=auc), "\n")

## Test set auc: 0.7306912


# plot ROC and threshold
d = generateThreshVsPerfData(train_svm_r, measures = list(fpr, tpr, mmce))
plotROCCurves(d)

[Figure: ROC curve — true positive rate vs false positive rate]
plotThreshVsPerf(d)

[Figure: threshold vs performance — false positive rate, true positive rate, mean misclassification error]
The testing results are as follows:

* Full model - TP rate: 0.31, TN rate: 0.97, Accuracy: 0.8203, AUC: 0.7287
* Reduced model - TP rate: 0.21, TN rate: 0.97, Accuracy: 0.8017, AUC: 0.7307
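These rates can be read directly off the absolute confusion matrices above; as a quick check, the counts below are copied from the full model's test-set matrix:

```r
# rates derived from the full-model test confusion matrix
tp <- 692;  fn <- 1534   # "Yes" (defaulter) row
tn <- 7511; fp <- 263    # "No" row

tp / (tp + fn)                   # TP rate (sensitivity): about 0.31
tn / (tn + fp)                   # TN rate (specificity): about 0.97
(tp + tn) / (tp + tn + fp + fn)  # accuracy: 0.8203
```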
(Note: the TP rate and TN rate would depend on the cutoff threshold for making the prediction. By default,
the threshold is 0.5. If the threshold is reduced to 0.3, the result will be as follows:
cat("(1) Full set - threshold=0.3\n")

## (1) Full set - threshold=0.3


calculateConfusionMatrix(setThreshold(test_svm_f, threshold=0.3), relative = TRUE, sums = TRUE)

## Relative confusion matrix (normalized by row/column):


## predicted
## true No Yes -err.- -n-
## No 0.94/0.84 0.06/0.33 0.06 8690
## Yes 0.61/0.16 0.39/0.67 0.61 1310
## -err.- 0.16 0.33 0.18 <NA>
## -n- 7774 2226 <NA> 20000
##
##
## Absolute confusion matrix:
## No Yes -err.- -n-
## No 7340 434 434 7774
## Yes 1350 876 1350 2226
## -err.- 1350 434 1784 NA
## -n- 8690 1310 NA 20000
cat("Test set accuracy: ", performance(test_svm_f, measures=acc), "\n")

## Test set accuracy: 0.8203
cat("(2) Reduced set - threshold=0.3\n")

## (2) Reduced set - threshold=0.3


calculateConfusionMatrix(setThreshold(test_svm_r, threshold=0.3), relative = TRUE, sums = TRUE)

## Relative confusion matrix (normalized by row/column):


## predicted
## true No Yes -err.- -n-
## No 0.94/0.83 0.06/0.38 0.06 8849
## Yes 0.68/0.17 0.32/0.62 0.68 1151
## -err.- 0.17 0.38 0.19 <NA>
## -n- 7774 2226 <NA> 20000
##
##
## Absolute confusion matrix:
## No Yes -err.- -n-
## No 7340 434 434 7774
## Yes 1509 717 1509 2226
## -err.- 1509 434 1943 NA
## -n- 8849 1151 NA 20000
cat("Test set accuracy: ", performance(test_svm_r, measures=acc), "\n")

## Test set accuracy: 0.8017


The performance of the two models becomes closer: the TP rates are 0.39 and 0.32 for the full and reduced sets, while the TN rates are both 0.94. In general, decreasing the threshold increases the TP rate but reduces the TN rate. The choice of threshold therefore depends very much on the problem at hand.)
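Rather than fixing the threshold at 0.3 by inspection, mlr can optimize the threshold for a chosen measure on an existing prediction object. A minimal sketch, using f1 purely as an example criterion (any mlr measure could be substituted):

```r
# search for the probability threshold that optimizes F1 on the
# full-model test prediction
thr <- tuneThreshold(pred = test_svm_f, measure = f1)
thr$th    # optimized threshold
thr$perf  # F1 achieved at that threshold

# re-score the prediction at the optimized threshold
calculateConfusionMatrix(setThreshold(test_svm_f, threshold = thr$th))
```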
The following code develops the full and reduced models with dummy encoding for the categorical variables:
# (1) Full set - dummy encoding
task_fc <- makeClassifTask(id = "Credit_Default_FC", data = train_setc,
target = "default",
positive = "Yes")
task_fc

## Supervised task: Credit_Default_FC


## Type: classif
## Target: default
## Observations: 20000
## Features:
## numerics factors ordered functionals
## 32 0 0 0
## Missings: FALSE
## Has weights: FALSE
## Has blocking: FALSE
## Has coordinates: FALSE
## Classes: 2
## No Yes
## 15590 4410
## Positive class: Yes
# create a svm learner - with predict.type = "prob" for plotting the training
svm_lrn_fc <- makeLearner("classif.svm", id="svm_full_dummy",
                          kernel="radial", predict.type = "prob")
svm_lrn_fc

## Learner svm_full_dummy from package e1071


## Type: classif
## Name: Support Vector Machines (libsvm); Short name: svm
## Class: classif.svm
## Properties: twoclass,multiclass,numerics,factors,prob,class.weights
## Predict-Type: prob
## Hyperparameters: kernel=radial
# train the svm classifier
set.seed(500)
svm_mod_fc <- train(svm_lrn_fc, task_fc)
svm_mod_fc$learner.model

##
## Call:
## svm.default(x = d$data, y = d$target, kernel = "radial", probability = .learner$predict.type ==
## "prob")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
##
## Number of Support Vectors: 8777
svm_mod_fc$features

## [1] "X1" "X6" "X7" "X8" "X9" "X10" "X11" "X12" "X13" "X14"
## [11] "X15" "X16" "X17" "X18" "X19" "X20" "X21" "X22" "X23" "X2_1"
## [21] "X2_2" "X3_0" "X3_1" "X3_2" "X3_3" "X3_4" "X3_5" "X3_6" "X4_0" "X4_1"
## [31] "X4_2" "X4_3"
svm_mod_fc$time

## [1] 437.62
train_svm_fc <- predict(svm_mod_fc, task_fc); train_svm_fc

## Prediction: 20000 observations


## predict.type: prob
## threshold: No=0.50,Yes=0.50
## time: 18.89
## id truth prob.Yes prob.No response
## 1 1 Yes 0.4027366 0.5972634 No
## 2 2 Yes 0.1553188 0.8446812 No
## 3 3 No 0.1574478 0.8425522 No
## 4 4 No 0.1582327 0.8417673 No
## 5 5 No 0.1463163 0.8536837 No
## 9 6 No 0.1440734 0.8559266 No
## ... (#rows: 20000, #cols: 5)
test_svm_fc <- predict(svm_mod_fc, newdata=test_setc); test_svm_fc

## Prediction: 10000 observations

## predict.type: prob
## threshold: No=0.50,Yes=0.50
## time: 8.25
## truth prob.Yes prob.No response
## 21299 No 0.1494978 0.8505022 No
## 8619 No 0.1588606 0.8411394 No
## 11730 No 0.1580871 0.8419129 No
## 3266 Yes 0.1577301 0.8422699 No
## 15740 No 0.1570811 0.8429189 No
## 19572 No 0.1548008 0.8451992 No
## ... (#rows: 10000, #cols: 4)
# (2) Reduce set - dummy encoding
task_rc <- makeClassifTask(id = "Credit_Default_RC", data = train_set_rc,
target = "default",
positive = "Yes")
task_rc

## Supervised task: Credit_Default_RC


## Type: classif
## Target: default
## Observations: 20000
## Features:
## numerics factors ordered functionals
## 22 0 0 0
## Missings: FALSE
## Has weights: FALSE
## Has blocking: FALSE
## Has coordinates: FALSE
## Classes: 2
## No Yes
## 15590 4410
## Positive class: Yes
# create a svm learner - with predict.type = "prob" for plotting the training
svm_lrn_rc <- makeLearner("classif.svm", id="svm_reduce_dummy",
kernel="radial", predict.type = "prob")
svm_lrn_rc

## Learner svm_reduce_dummy from package e1071


## Type: classif
## Name: Support Vector Machines (libsvm); Short name: svm
## Class: classif.svm
## Properties: twoclass,multiclass,numerics,factors,prob,class.weights
## Predict-Type: prob
## Hyperparameters: kernel=radial
# train the svm classifier
set.seed(500)
svm_mod_rc <- train(svm_lrn_rc, task_rc)
svm_mod_rc$learner.model

##
## Call:
## svm.default(x = d$data, y = d$target, kernel = "radial", probability = .learner$predict.type ==
## "prob")

##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
##
## Number of Support Vectors: 8902
svm_mod_rc$features

## [1] "X1" "history" "bill" "X18" "X19" "X20" "X21"


## [8] "X22" "X23" "X2_1" "X2_2" "X3_0" "X3_1" "X3_2"
## [15] "X3_3" "X3_4" "X3_5" "X3_6" "X4_0" "X4_1" "X4_2"
## [22] "X4_3"
svm_mod_rc$time

## [1] 268.22
train_svm_rc <- predict(svm_mod_rc, task_rc); train_svm_rc

## Prediction: 20000 observations


## predict.type: prob
## threshold: No=0.50,Yes=0.50
## time: 12.94
## id truth prob.Yes prob.No response
## 1 1 Yes 0.1613304 0.8386696 No
## 2 2 Yes 0.2182082 0.7817918 No
## 3 3 No 0.1712494 0.8287506 No
## 4 4 No 0.1730996 0.8269004 No
## 5 5 No 0.1543305 0.8456695 No
## 9 6 No 0.2002395 0.7997605 No
## ... (#rows: 20000, #cols: 5)
test_svm_rc <- predict(svm_mod_rc, newdata=test_set_rc); test_svm_rc

## Prediction: 10000 observations


## predict.type: prob
## threshold: No=0.50,Yes=0.50
## time: 6.75
## truth prob.Yes prob.No response
## 21299 No 0.1499465 0.8500535 No
## 8619 No 0.1739002 0.8260998 No
## 11730 No 0.1725625 0.8274375 No
## 3266 Yes 0.1789634 0.8210366 No
## 15740 No 0.1616995 0.8383005 No
## 19572 No 0.1600336 0.8399664 No
## ... (#rows: 10000, #cols: 4)
The performance of the classifiers (with dummy encoding) is evaluated below:
cat("(1) Full set - dummy encoding\n")

## (1) Full set - dummy encoding


# Relative confusion matrix (normalized by row/column)
calculateConfusionMatrix(train_svm_fc, relative = TRUE, sums = TRUE)

## Relative confusion matrix (normalized by row/column):
## predicted
## true No Yes -err.- -n-
## No 0.97/0.83 0.03/0.28 0.03 18092
## Yes 0.69/0.17 0.31/0.72 0.69 1908
## -err.- 0.17 0.28 0.18 <NA>
## -n- 15590 4410 <NA> 20000
##
##
## Absolute confusion matrix:
## No Yes -err.- -n-
## No 15064 526 526 15590
## Yes 3028 1382 3028 4410
## -err.- 3028 526 3554 NA
## -n- 18092 1908 NA 20000
calculateConfusionMatrix(test_svm_fc, relative = TRUE, sums = TRUE)

## Relative confusion matrix (normalized by row/column):


## predicted
## true No Yes -err.- -n-
## No 0.97/0.83 0.03/0.28 0.03 9059
## Yes 0.69/0.17 0.31/0.72 0.69 941
## -err.- 0.17 0.28 0.18 <NA>
## -n- 7774 2226 <NA> 20000
##
##
## Absolute confusion matrix:
## No Yes -err.- -n-
## No 7515 259 259 7774
## Yes 1544 682 1544 2226
## -err.- 1544 259 1803 NA
## -n- 9059 941 NA 20000
# performance
cat("\n")
cat("Training set accuracy: ", performance(train_svm_fc, measures=acc), "\n")

## Training set accuracy: 0.8223


cat("Test set accuracy: ", performance(test_svm_fc, measures=acc), "\n")

## Test set accuracy: 0.8197


cat("Test set auc: ", performance(test_svm_fc, measures=auc), "\n")

## Test set auc: 0.7273231


# plot ROC and threshold
d = generateThreshVsPerfData(train_svm_fc, measures = list(fpr, tpr, mmce))
plotROCCurves(d)

[Figure: ROC curve — true positive rate vs false positive rate]

plotThreshVsPerf(d)

[Figure: threshold vs performance — false positive rate, true positive rate, mean misclassification error]

cat("(2) Reduced set - dummy encoding\n")

## (2) Reduced set - dummy encoding

# Relative confusion matrix (normalized by row/column)
calculateConfusionMatrix(train_svm_rc, relative = TRUE, sums = TRUE)

## Relative confusion matrix (normalized by row/column):


## predicted
## true No Yes -err.- -n-
## No 0.97/0.81 0.03/0.32 0.03 18724
## Yes 0.80/0.19 0.20/0.68 0.80 1276
## -err.- 0.19 0.32 0.20 <NA>
## -n- 15590 4410 <NA> 20000
##
##
## Absolute confusion matrix:
## No Yes -err.- -n-
## No 15178 412 412 15590
## Yes 3546 864 3546 4410
## -err.- 3546 412 3958 NA
## -n- 18724 1276 NA 20000
calculateConfusionMatrix(test_svm_rc, relative = TRUE, sums = TRUE)

## Relative confusion matrix (normalized by row/column):


## predicted
## true No Yes -err.- -n-
## No 0.97/0.81 0.03/0.31 0.03 9324
## Yes 0.79/0.19 0.21/0.69 0.79 676
## -err.- 0.19 0.31 0.20 <NA>
## -n- 7774 2226 <NA> 20000
##
##
## Absolute confusion matrix:
## No Yes -err.- -n-
## No 7564 210 210 7774
## Yes 1760 466 1760 2226
## -err.- 1760 210 1970 NA
## -n- 9324 676 NA 20000
# performance
cat("\n")
cat("Training set accuracy: ", performance(train_svm_rc, measures=acc), "\n")

## Training set accuracy: 0.8021


cat("Test set accuracy: ", performance(test_svm_rc, measures=acc), "\n")

## Test set accuracy: 0.803


cat("Test set auc: ", performance(test_svm_rc, measures=auc), "\n")

## Test set auc: 0.7207204


# plot ROC and threshold
d = generateThreshVsPerfData(train_svm_rc, measures = list(fpr, tpr, mmce))
plotROCCurves(d)

[Figure: ROC curve — true positive rate vs false positive rate]

plotThreshVsPerf(d)

[Figure: threshold vs performance — false positive rate, true positive rate, mean misclassification error]

The results for the dummy-encoding models are as follows:

* Full model - TP rate: 0.31, TN rate: 0.97, Accuracy: 0.8197, AUC: 0.7273
* Reduced model - TP rate: 0.21, TN rate: 0.97, Accuracy: 0.803, AUC: 0.7207

The performance of the dummy-encoding models is marginally worse than before, possibly because of the increased number of dummy variables. So, we will use the original variables instead of dummy encoding in the following analysis.
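For reference, dummy-encoded datasets such as those used above can be produced with mlr's createDummyFeatures(); a sketch under the assumption that the categorical columns X2–X4 were converted to factors beforehand (the actual data preparation is done earlier in the report):

```r
# one-hot ("1-of-n") encode the categorical predictors; column names such
# as X2_1, X3_0, ... in the feature lists above match this encoding
train_setc <- createDummyFeatures(train_set, cols = c("X2", "X3", "X4"))
test_setc  <- createDummyFeatures(test_set,  cols = c("X2", "X3", "X4"))
```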
The following code generates sample plots of the decision boundary of the SVM classifier.
# Visualizing the prediction
plotLearnerPrediction(svm_lrn_f, task = task_f,
features =c("X1", "X23"))

[Figure: SVM decision regions over X1 and X23 (kernel = radial; train mmce = 0.2205; CV mmce = 0.2205)]

plotLearnerPrediction(svm_lrn_r, task = task_rc,
                      features = c("X1", "history"))
[Figure: SVM decision regions over X1 and history (kernel = radial; train mmce = 0.1963; CV mmce = 0.1975)]

Tuning model
Due to the long model training time, we only perform a grid search for a better cost value for the reduced model, using 3-fold cross-validation.
## Define the resampling strategy
set.seed(800)
rdesc <- makeResampleDesc(method = "CV", iters=3)

# discrete parameter sets


discrete_ps <- makeParamSet(
makeDiscreteParam("cost", values = c(0.975, 1.0, 1.025, 1.05, 1.075)) #0.95, 1.0
)

ctrl_d <- makeTuneControlGrid()


res_svm <- tuneParams(svm_lrn_r, task = task_r, # tune the learner - cost=1, gamma=0.0769
resampling = rdesc,
par.set = discrete_ps,
control = ctrl_d,
measures = list(auc, acc, mmce))
res_svm

## Tune result:
## Op. pars: cost=1.075
## auc.test.mean=0.7179790,acc.test.mean=0.8006500,mmce.test.mean=0.1993500
res_svm$x

## $cost
## [1] 1.075

res_svm$y

## auc.test.mean acc.test.mean mmce.test.mean


## 0.717979 0.800650 0.199350
The tuning performance is shown below.
#investigating hyperparameter tuning effects
data <- generateHyperParsEffectData(res_svm)

# use ggplot2 function to change y-axis label and add title


p1 <- plotHyperParsEffect(data, x = "iteration", y = "auc.test.mean",
                          plot.type = "line")
p2 <- plotHyperParsEffect(data, x = "iteration", y = "acc.test.mean",
                          plot.type = "line")

# plot graphs
grid.arrange(p1 + ggtitle("Tuning performance") + ylab("AUC"),
p2 + ggtitle("Tuning performance") + ylab("Accuracy"), nrow=1)

[Figure: tuning performance over grid iterations — AUC (left) and accuracy (right)]

From the result, the selected cost (1.075) yields virtually the same cross-validated performance as the default cost of 1.0, so it seems that the model cannot be further improved by tuning.
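With more computing time, a broader joint search over cost and gamma (random search on a log scale) could be attempted; a sketch, with bounds chosen as plausible assumptions rather than recommendations:

```r
# random search over cost and gamma, both on a log10 scale
num_ps <- makeParamSet(
  makeNumericParam("cost",  lower = -2, upper = 2, trafo = function(x) 10^x),
  makeNumericParam("gamma", lower = -3, upper = 0, trafo = function(x) 10^x)
)
ctrl_rnd <- makeTuneControlRandom(maxit = 20L)
res_rnd <- tuneParams(svm_lrn_r, task = task_r, resampling = rdesc,
                      par.set = num_ps, control = ctrl_rnd,
                      measures = list(auc, acc, mmce))
```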
Nevertheless, the tuned cost is used to retrain the model, and the performance of the tuned SVM classifier (reduced model) is then evaluated as follows:
# update the hyperparameter of reduced model
svm_lrn_r_tuned <- setHyperPars(svm_lrn_r, par.vals = res_svm$x)
svm_lrn_r_tuned$id <- "svm_r_tuned"
svm_lrn_r_tuned$par.vals #after update

## $kernel
## [1] "radial"
##
## $cost
## [1] 1.075
svm_mod_r_tuned <- train(svm_lrn_r_tuned, task_r)
train_svm_r_tuned <- predict(svm_mod_r_tuned, task_r); train_svm_r_tuned

## Prediction: 20000 observations


## predict.type: prob
## threshold: No=0.50,Yes=0.50
## time: 10.69
## id truth prob.Yes prob.No response
## 1 1 Yes 0.1662110 0.8337890 No
## 2 2 Yes 0.2159087 0.7840913 No
## 3 3 No 0.1681991 0.8318009 No
## 4 4 No 0.1720694 0.8279306 No
## 5 5 No 0.1735963 0.8264037 No
## 9 6 No 0.1931731 0.8068269 No
## ... (#rows: 20000, #cols: 5)
test_svm_r_tuned <- predict(svm_mod_r_tuned, newdata=test_set_r); test_svm_r_tuned

## Prediction: 10000 observations


## predict.type: prob
## threshold: No=0.50,Yes=0.50
## time: 5.71
## truth prob.Yes prob.No response
## 21299 No 0.1525252 0.8474748 No
## 8619 No 0.1676155 0.8323845 No
## 11730 No 0.1713995 0.8286005 No
## 3266 Yes 0.1799892 0.8200108 No
## 15740 No 0.1686531 0.8313469 No
## 19572 No 0.1702622 0.8297378 No
## ... (#rows: 10000, #cols: 4)
# performance
cat("Performance of reduced model (after tuning)\n")

## Performance of reduced model (after tuning)


calculateConfusionMatrix(test_svm_r_tuned, relative=TRUE, sums=TRUE)

## Relative confusion matrix (normalized by row/column):


## predicted
## true No Yes -err.- -n-
## No 0.97/0.81 0.03/0.32 0.03 9316
## Yes 0.79/0.19 0.21/0.68 0.79 684
## -err.- 0.19 0.32 0.20 <NA>
## -n- 7774 2226 <NA> 20000
##
##
## Absolute confusion matrix:
## No Yes -err.- -n-
## No 7554 220 220 7774
## Yes 1762 464 1762 2226

## -err.- 1762 220 1982 NA
## -n- 9316 684 NA 20000
cat("Training set accuracy: ", performance(train_svm_r_tuned, measures=acc), "\n")

## Training set accuracy: 0.803


cat("Test set accuracy: ", performance(test_svm_r_tuned, measures=acc), "\n")

## Test set accuracy: 0.8018


cat("Test set auc: ", performance(test_svm_r_tuned, measures=auc), "\n")

## Test set auc: 0.7306253


# plot ROC and threshold
d = generateThreshVsPerfData(train_svm_r_tuned, measures = list(fpr, tpr, mmce))
plotROCCurves(d)

[Figure: ROC curve — true positive rate vs false positive rate]

plotThreshVsPerf(d)

[Figure: threshold vs performance — false positive rate, true positive rate, mean misclassification error]

The result shows that tuning the cost cannot further improve the accuracy of the SVM classifier, and the full model performed better than the reduced model. Although the out-of-sample accuracy of the full and reduced models is about 82% and 80% respectively, the TP rate at threshold = 0.5 is only about 0.31 and 0.21. From the ROC curve of the full model, the point where the TP and TN rates balance is below 0.7. Overall, the performance of the present SVM models is not very good; more intensive tuning would be required to further improve the accuracy and robustness of the model.
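One further lever for the low TP rate that was not explored here is class weighting: classif.svm exposes a class.weights hyperparameter (listed in the learner properties above), which penalizes misclassified defaulters more heavily. A sketch, where the 3.5:1 weight simply mirrors the training class ratio and is an assumption, not a tuned value:

```r
# weight the minority "Yes" class roughly by the inverse class frequency
# (15590 No vs 4410 Yes in the training task, about 3.5:1)
svm_lrn_w <- makeLearner("classif.svm", id = "svm_reduce_weighted",
                         kernel = "radial", predict.type = "prob",
                         class.weights = c(No = 1, Yes = 3.5))
svm_mod_w  <- train(svm_lrn_w, task_r)
test_svm_w <- predict(svm_mod_w, newdata = test_set_r)
calculateConfusionMatrix(test_svm_w, relative = TRUE)
```

This typically raises the TP rate at some cost in TN rate, much like lowering the threshold, but the trade-off is learned during training rather than applied afterwards.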

Neural Network
Basic neural network
A single-layer neural network with 5 neurons is built for the problem below. In addition, we test the effect of applying dummy encoding for the categorical variables on the performance of the neural network model. In view of the fast training of the neural network, only the full dataset is used.
# (1) Full set
# create a neural network learner - 5 neurons
nn_lrn_f <- makeLearner("classif.nnet", id="nn-full", size=5, maxit=10000L,
predict.type = "prob")
nn_lrn_f

## Learner nn-full from package nnet


## Type: classif
## Name: Neural Network; Short name: nnet
## Class: classif.nnet
## Properties: twoclass,multiclass,numerics,factors,prob,weights
## Predict-Type: prob
## Hyperparameters: size=5,maxit=10000

# train the neural network classifier
set.seed(500)
nn_mod_f <- train(nn_lrn_f, task_f)
nn_mod_f$learner.model

## a 23-5-1 network with 126 weights


## inputs: X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20 X21 X22 X23
## output(s): default
## options were - entropy fitting
nn_mod_f$features

## [1] "X1" "X2" "X3" "X4" "X5" "X6" "X7" "X8" "X9" "X10" "X11" "X12"
## [13] "X13" "X14" "X15" "X16" "X17" "X18" "X19" "X20" "X21" "X22" "X23"
nn_mod_f$time

## [1] 12.15
train_nn_f <- predict(nn_mod_f, task_f); train_nn_f

## Prediction: 20000 observations


## predict.type: prob
## threshold: No=0.50,Yes=0.50
## time: 0.05
## id truth prob.No prob.Yes response
## 1 1 Yes 0.1977947 0.80220527 Yes
## 2 2 Yes 0.5107891 0.48921089 No
## 3 3 No 0.9103262 0.08967381 No
## 4 4 No 0.8426738 0.15732616 No
## 5 5 No 0.8847521 0.11524788 No
## 9 6 No 0.7462812 0.25371879 No
## ... (#rows: 20000, #cols: 5)
test_nn_f <- predict(nn_mod_f, newdata=test_set); test_nn_f

## Prediction: 10000 observations


## predict.type: prob
## threshold: No=0.50,Yes=0.50
## time: 0.03
## truth prob.No prob.Yes response
## 21299 No 0.9436665 0.05633347 No
## 8619 No 0.9013622 0.09863783 No
## 11730 No 0.8607486 0.13925138 No
## 3266 Yes 0.6012576 0.39874237 No
## 15740 No 0.8075504 0.19244957 No
## 19572 No 0.7710533 0.22894667 No
## ... (#rows: 10000, #cols: 4)
# dummy encoding
# create a neural network learner - 5 neurons
nn_lrn_fc <- makeLearner("classif.nnet", id="nn-full-dummy", size=5, maxit=10000L,
predict.type = "prob")
nn_lrn_fc

## Learner nn-full-dummy from package nnet


## Type: classif
## Name: Neural Network; Short name: nnet

## Class: classif.nnet
## Properties: twoclass,multiclass,numerics,factors,prob,weights
## Predict-Type: prob
## Hyperparameters: size=5,maxit=10000
# train the neural network classifier
set.seed(500)
nn_mod_fc <- train(nn_lrn_f, task_fc)
nn_mod_fc$learner.model

## a 32-5-1 network with 171 weights


## inputs: X1 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20 X21 X22 X23 X2_1 X2_2 X3_0 X3_1 X3
## output(s): default
## options were - entropy fitting
nn_mod_fc$features

## [1] "X1" "X6" "X7" "X8" "X9" "X10" "X11" "X12" "X13" "X14"
## [11] "X15" "X16" "X17" "X18" "X19" "X20" "X21" "X22" "X23" "X2_1"
## [21] "X2_2" "X3_0" "X3_1" "X3_2" "X3_3" "X3_4" "X3_5" "X3_6" "X4_0" "X4_1"
## [31] "X4_2" "X4_3"
nn_mod_fc$time

## [1] 14.95
train_nn_fc <- predict(nn_mod_fc, task_fc); train_nn_fc

## Prediction: 20000 observations


## predict.type: prob
## threshold: No=0.50,Yes=0.50
## time: 0.06
## id truth prob.No prob.Yes response
## 1 1 Yes 0.3427545 0.65724553 Yes
## 2 2 Yes 0.7618579 0.23814212 No
## 3 3 No 0.9369623 0.06303767 No
## 4 4 No 0.8130752 0.18692479 No
## 5 5 No 0.8962573 0.10374268 No
## 9 6 No 0.8443427 0.15565725 No
## ... (#rows: 20000, #cols: 5)
test_nn_fc <- predict(nn_mod_fc, newdata=test_setc); test_nn_fc

## Prediction: 10000 observations


## predict.type: prob
## threshold: No=0.50,Yes=0.50
## time: 0.03
## truth prob.No prob.Yes response
## 21299 No 0.9430922 0.05690782 No
## 8619 No 0.8927502 0.10724976 No
## 11730 No 0.7963893 0.20361073 No
## 3266 Yes 0.8181570 0.18184302 No
## 15740 No 0.8449433 0.15505674 No
## 19572 No 0.8271063 0.17289370 No
## ... (#rows: 10000, #cols: 4)
The performance of the models is compared below.

# Relative confusion matrix (normalized by row/column)
calculateConfusionMatrix(train_nn_f, relative = TRUE, sums = TRUE)

## Relative confusion matrix (normalized by row/column):


## predicted
## true No Yes -err.- -n-
## No 0.96/0.84 0.04/0.31 0.04 17788
## Yes 0.66/0.16 0.34/0.69 0.66 2212
## -err.- 0.16 0.31 0.18 <NA>
## -n- 15590 4410 <NA> 20000
##
##
## Absolute confusion matrix:
## No Yes -err.- -n-
## No 14897 693 693 15590
## Yes 2891 1519 2891 4410
## -err.- 2891 693 3584 NA
## -n- 17788 2212 NA 20000
calculateConfusionMatrix(test_nn_f, relative = TRUE, sums = TRUE)

## Relative confusion matrix (normalized by row/column):


## predicted
## true No Yes -err.- -n-
## No 0.96/0.84 0.04/0.29 0.04 8903
## Yes 0.65/0.16 0.35/0.71 0.65 1097
## -err.- 0.16 0.29 0.18 <NA>
## -n- 7774 2226 <NA> 20000
##
##
## Absolute confusion matrix:
## No Yes -err.- -n-
## No 7455 319 319 7774
## Yes 1448 778 1448 2226
## -err.- 1448 319 1767 NA
## -n- 8903 1097 NA 20000
# performance
cat("Full set (without dummy encoding)\n")

## Full set (without dummy encoding)


cat("Training set accuracy: ", performance(train_nn_f, measures=acc), "\n")

## Training set accuracy: 0.8208


cat("Test set accuracy: ", performance(test_nn_f, measures=acc), "\n")

## Test set accuracy: 0.8233


cat("Test set auc: ", performance(test_nn_f, measures=auc), "\n")

## Test set auc: 0.7755196


# plot ROC and threshold
d = generateThreshVsPerfData(train_nn_f, measures = list(fpr, tpr, mmce))
plotROCCurves(d)

[Figure: ROC curve — true positive rate vs false positive rate]

plotThreshVsPerf(d)

[Figure: threshold vs performance — false positive rate, true positive rate, mean misclassification error]

# Relative confusion matrix (normalized by row/column)


calculateConfusionMatrix(train_nn_fc, relative = TRUE, sums = TRUE)

## Relative confusion matrix (normalized by row/column):
## predicted
## true No Yes -err.- -n-
## No 0.95/0.84 0.05/0.33 0.05 17539
## Yes 0.62/0.16 0.38/0.67 0.62 2461
## -err.- 0.16 0.33 0.18 <NA>
## -n- 15590 4410 <NA> 20000
##
##
## Absolute confusion matrix:
## No Yes -err.- -n-
## No 14783 807 807 15590
## Yes 2756 1654 2756 4410
## -err.- 2756 807 3563 NA
## -n- 17539 2461 NA 20000
calculateConfusionMatrix(test_nn_fc, relative = TRUE, sums = TRUE)

## Relative confusion matrix (normalized by row/column):


## predicted
## true No Yes -err.- -n-
## No 0.95/0.84 0.05/0.32 0.05 8772
## Yes 0.62/0.16 0.38/0.68 0.62 1228
## -err.- 0.16 0.32 0.18 <NA>
## -n- 7774 2226 <NA> 20000
##
##
## Absolute confusion matrix:
## No Yes -err.- -n-
## No 7382 392 392 7774
## Yes 1390 836 1390 2226
## -err.- 1390 392 1782 NA
## -n- 8772 1228 NA 20000
# performance
cat("Full set (with dummy encoding)\n")

## Full set (with dummy encoding)


cat("Training set accuracy: ", performance(train_nn_fc, measures=acc), "\n")

## Training set accuracy: 0.82185


cat("Test set accuracy: ", performance(test_nn_fc, measures=acc), "\n")

## Test set accuracy: 0.8218


cat("Test set auc: ", performance(test_nn_fc, measures=auc), "\n")

## Test set auc: 0.7667567


# plot ROC and threshold
d = generateThreshVsPerfData(train_nn_fc, measures = list(fpr, tpr, mmce))
plotROCCurves(d)

[Figure: ROC curve — true positive rate vs false positive rate]

plotThreshVsPerf(d)

[Figure: threshold vs performance — false positive rate, true positive rate, mean misclassification error]

The testing results are as follows:

* Full model - TP rate: 0.35, TN rate: 0.96, Accuracy: 0.8233, AUC: 0.7755
* Full model (dummy encoding) - TP rate: 0.38, TN rate: 0.95, Accuracy: 0.8218, AUC: 0.7668

Applying dummy encoding slightly increases the out-of-sample TP rate from 0.35 to 0.38, at the cost of marginally lower test accuracy (0.8218 vs 0.8233) and AUC. The ROC curves of both models are similar, and the out-of-sample accuracy of the basic neural network is slightly higher than that of the SVM model. To keep the comparison with the SVM model consistent, we approximate the categorical variables by numerical variables without using dummy encoding.

Tune neural network hyperparameter


The following code attempts to tune the number of neurons in the hidden layer to see whether further improvement can be obtained.
rdesc <- makeResampleDesc(method = "CV", iters=3)

# discrete parameter sets


discrete_ps <- makeParamSet(
makeDiscreteParam("size", values = c(3:10))
)
ctrl_d <- makeTuneControlGrid()
set.seed(150)
res_nn <- tuneParams(nn_lrn_f, task = task_f,
resampling = rdesc,
par.set = discrete_ps,
control = ctrl_d,
measures = list(auc, acc, mmce))
res_nn

## Tune result:
## Op. pars: size=4
## auc.test.mean=0.7649980,acc.test.mean=0.8153499,mmce.test.mean=0.1846501
res_nn$x

## $size
## [1] 4
res_nn$y

## auc.test.mean acc.test.mean mmce.test.mean


## 0.7649980 0.8153499 0.1846501
The optimal number of neurons is 4. The model is updated and retrained below.
# update the hyperparameter
nn_lrn_tuned <- setHyperPars(nn_lrn_f, par.vals = res_nn$x)
nn_lrn_tuned$id <- "nn_tuned" # rename the learner ID for benchmarking
nn_lrn_tuned

## Learner nn_tuned from package nnet


## Type: classif
## Name: Neural Network; Short name: nnet
## Class: classif.nnet
## Properties: twoclass,multiclass,numerics,factors,prob,weights
## Predict-Type: prob
## Hyperparameters: size=4,maxit=10000
# retrain the model
set.seed(500)
nn_mod_tuned <- train(nn_lrn_tuned, task_f)
nn_mod_tuned

## Model for learner.id=nn_tuned; learner.class=classif.nnet
## Trained on: task.id = Credit_Default_F; obs = 20000; features = 23
## Hyperparameters: size=4,maxit=10000
# prediction
train_nn_tuned <- predict(nn_mod_tuned, task_f)
test_nn_tuned <- predict(nn_mod_tuned, newdata = test_set)

The performance of the tuned model is evaluated below.


# Relative confusion matrix (normalized by row/column)
calculateConfusionMatrix(train_nn_tuned, relative = TRUE, sums = TRUE)

## Relative confusion matrix (normalized by row/column):


## predicted
## true No Yes -err.- -n-
## No 0.95/0.84 0.05/0.32 0.05 17682
## Yes 0.64/0.16 0.36/0.68 0.64 2318
## -err.- 0.16 0.32 0.18 <NA>
## -n- 15590 4410 <NA> 20000
##
##
## Absolute confusion matrix:
## No Yes -err.- -n-
## No 14839 751 751 15590
## Yes 2843 1567 2843 4410
## -err.- 2843 751 3594 NA
## -n- 17682 2318 NA 20000
calculateConfusionMatrix(test_nn_tuned, relative = TRUE, sums = TRUE)

## Relative confusion matrix (normalized by row/column):


## predicted
## true No Yes -err.- -n-
## No 0.95/0.84 0.05/0.32 0.05 8840
## Yes 0.65/0.16 0.35/0.68 0.65 1160
## -err.- 0.16 0.32 0.18 <NA>
## -n- 7774 2226 <NA> 20000
##
##
## Absolute confusion matrix:
## No Yes -err.- -n-
## No 7402 372 372 7774
## Yes 1438 788 1438 2226
## -err.- 1438 372 1810 NA
## -n- 8840 1160 NA 20000
# performance
cat("Tuned neural network model\n")

## Tuned neural network model


cat("Training set accuracy: ", performance(train_nn_tuned, measures=acc), "\n")

## Training set accuracy: 0.8203


cat("Test set accuracy: ", performance(test_nn_tuned, measures=acc), "\n")

## Test set accuracy: 0.819

cat("Test set auc: ", performance(test_nn_tuned, measures=auc), "\n")

## Test set auc: 0.7669821


# plot ROC and threshold
d = generateThreshVsPerfData(train_nn_tuned, measures = list(fpr, tpr, mmce))
plotROCCurves(d)

[Figure: ROC curve — true positive rate vs false positive rate]

plotThreshVsPerf(d)

[Figure: threshold vs performance — false positive rate, true positive rate, mean misclassification error]
The testing results are as follows:

* Full model (tuned) - TP rate: 0.35, TN rate: 0.95, Accuracy: 0.819, AUC: 0.7670

The tuned model does not further improve the accuracy of the neural network.
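Since tuning size alone does not help, one could additionally tune nnet's weight decay (L2 regularization) jointly with size, reusing the 3-fold resampling description from above; a sketch, where the candidate values are assumptions rather than recommendations:

```r
# grid over hidden-layer size and weight decay for classif.nnet
ps_nn2 <- makeParamSet(
  makeDiscreteParam("size",  values = 3:6),
  makeDiscreteParam("decay", values = c(0, 0.01, 0.1))
)
set.seed(150)
res_nn2 <- tuneParams(nn_lrn_f, task = task_f, resampling = rdesc,
                      par.set = ps_nn2, control = makeTuneControlGrid(),
                      measures = list(auc, acc, mmce))
```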

Compare the SVM and the neural network


Since tuning did not produce significant improvement in either model, the benchmark is based on the original basic models. The following code compares the SVM and the neural network models; 3-fold cross-validation is used in the benchmark experiment.
# create benchmark dataset
df_b <- df
d_max_b <- sapply(df_b[,1:23], max)
d_min_b <- sapply(df_b[,1:23], min)
df_b[,1:23] <- rescale(df_b[,1:23], d_min_b, d_max_b)

# create benchmark task
task <- makeClassifTask(id = "Credit_Default_F", data = df_b,
                        target = "default",
                        positive = "Yes")

# define the learners for benchmarking
lrns <- list(svm_lrn_f, nn_lrn_f)

# define the measures to benchmark
ms <- list(auc, acc, mmce, ber)

# 3-fold cross-validation
rdesc.outer <- makeResampleDesc("CV", iters = 3)

# perform benchmark
set.seed(500)
bmr <- benchmark(lrns, tasks = task, resampling = rdesc.outer, measures = ms)
bmr

## task.id learner.id auc.test.mean acc.test.mean mmce.test.mean
## 1 Credit_Default_F svm_full 0.7184379 0.8179667 0.1820333
## 2 Credit_Default_F nn-full 0.7718232 0.8177000 0.1823000
## ber.test.mean
## 1 0.3629554
## 2 0.3468246
# compare the results
# (use a new object name so the data frame df is not overwritten)
d_bmr <- generateThreshVsPerfData(bmr, measures = list(fpr, tpr, mmce))
plotROCCurves(d_bmr)

[Figure: ROC curves (true positive rate vs false positive rate) for the nn-full and svm_full learners]

plotThreshVsPerf(d_bmr) +
  theme(strip.text.x = element_text(size = 7))

[Figure: false positive rate, true positive rate and mean misclassification error against the classification threshold, for the nn-full and svm_full learners]

# density plots
perf <- getBMRPerformances(bmr, as.df=TRUE)
p1<-ggplot(perf, aes(acc, colour = learner.id)) +
geom_density() +
labs(title="Accuracy")
p2<-ggplot(perf, aes(mmce, colour = learner.id)) +
geom_density() +
labs(title="Mean misclassification rate")
p3<-ggplot(perf, aes(ber, colour = learner.id)) +
geom_density() +
labs(title="Balanced error rate")
grid.arrange(p1,p2,p3,ncol=1)

[Figure: density plots of accuracy, mean misclassification rate and balanced error rate over the cross-validation folds, for the nn-full and svm_full learners]

From the results of the benchmark experiment, the neural network performed slightly better than the SVM classifier on all measures. Overall, the results of both classifiers are not highly satisfactory, with balanced error rates of roughly 35%-36%. More effective machine learning algorithms are required for this credit card default problem.
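The gap between the reported accuracy (~82%) and the balanced error rate is a direct consequence of the class imbalance. A trivial base-R illustration: on a split like this one (roughly 78% non-defaulters), a classifier that always predicts "No" already looks fairly accurate while being useless on the minority class:

```r
# Majority-class baseline on a 78/22 class split
truth <- factor(rep(c("No", "Yes"), times = c(78, 22)))
pred  <- factor(rep("No", 100), levels = levels(truth))

acc <- mean(pred == truth)                           # 0.78
ber <- mean(c(mean(pred[truth == "No"]  != "No"),
              mean(pred[truth == "Yes"] != "Yes")))  # 0.5
```

This is why the balanced error rate is the more honest headline number here, and why techniques that address the imbalance (class weights, resampling, or threshold adjustment) would be natural next steps.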
