Daniel Rösch*
Harald Scheule†
* DANIEL RÖSCH is a professor of business and holds the chair of statistics and risk management at the University of Regensburg, Germany.
† HARALD SCHEULE is a professor of finance at the University of Technology Sydney, Australia.
Deep Credit Risk
Machine Learning with Python
Version 2.0, 2021
ISBN: 9798617590199
Imprint:
Copyright © 2021 Daniel Rösch, Harald Scheule
All rights reserved.
Limit of Liability/Disclaimer of Warranty: While the publisher and authors have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor the authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Contents
List of Keywords 12
2 Python Literacy 25
2.1 Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2 Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2.1 Anaconda and IDEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2.2 Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.2.3 Coding Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.3 First Look . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.3.1 Creating Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.3.2 Subsampling Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.3.3 Printing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.3.4 Chaining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.4 Describing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.5 Tabulating . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.6 Resetting Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.7 Calculating Mean Values by Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.8 Plotting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.9 Generating New Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.10 Transforming Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.11 Subsetting Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.12 Combining Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.12.1 Concatenating . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.12.2 Appending . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.12.3 Match Merging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.12.4 Joining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.13 Regression Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.14 numpy vs pandas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.14.1 Converting pandas dataframes to numpy arrays . . . . . . . . . . . . . . . . . . . 42
2.14.2 Converting numpy arrays to pandas dataframes . . . . . . . . . . . . . . . . . . . 42
2.15 Module dcr . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.16 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.16.1 versions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.16.2 dataprep . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.16.3 woe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.16.4 validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.16.5 resolutionbias . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3 Risk-Based Learning 47
3.1 Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.2 Maximum-Likelihood Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.2.1 Example for Default Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.2.2 Practical Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.3 Bayesian Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.3.1 Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.3.2 Parameter Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.3.3 Example for Default Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.3.4 Analytic Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.3.5 Markov-Chain-Monte-Carlo Simulation . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.4 Sandbox Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4 Machine Learning 68
4.1 Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.2 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.3 Cost Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.4 Information and Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.5 Optimization: Gradient Descent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.6 Learning and Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.6.1 Train vs Test Split . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.6.2 Bias-Variance Tradeoff . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.6.3 Crossvalidation and Tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.7 Practical Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.8 P-Value and ML Hacking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.9 Sandbox Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
8 Validation 171
8.1 Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
8.2 Qualitative Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
8.3 Quantitative Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
8.3.1 Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
8.3.2 Backtesting as Part of Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
8.3.3 Traffic Light Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
8.4 Metrics for Discriminatory Power . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
8.4.1 Confusion Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
8.4.2 Accuracy Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
8.4.3 Classification Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
8.4.4 ROC Curve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
8.4.5 Portfolio Dependence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
8.5 Metrics for Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
8.5.1 Brier Score . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
8.5.2 OLS R-Squared . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
8.5.3 scikit-learn R-Squared . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
8.5.4 Binomial Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
8.5.5 Jeffrey’s Prior Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
8.5.6 Calibration Curve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
8.5.7 Hosmer-Lemeshow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
8.6 Metrics for Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
8.7 Function validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
8.8 Other Outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
8.9 Validation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
8.9.1 Data Preparation and Feature Engineering . . . . . . . . . . . . . . . . . . . . . . 197
8.9.2 Fitting of Candidate Models and Validation . . . . . . . . . . . . . . . . . . . . . . 198
8.9.3 Comparing ROC Curves Out-of-Time . . . . . . . . . . . . . . . . . . . . . . . . . . 202
8.9.4 Model Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
8.9.5 Practical Recommendations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
8.10 Sandbox Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
17.5.1 Probability Density Function, Survival Probability and Hazard Rate . . . . . . . 402
17.5.2 Cross-Sectional Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402
17.5.3 Cox Proportional Hazard Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
17.6 Other Risk Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 410
17.7 Sandbox Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 410
20 Outlook 461
20.1 Where Do We Stand Today? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
20.2 Roles of Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
Bibliography 464
List of Keywords
Accuracy, 171, 177, 183, 200, 224, 249, 272, 306–308, 344
Activation functions, 323
Adaptive boosting, 347
Akaike information criterion, 418
Asset correlation, 108, 228, 230, 436, 437, 439, 440, 442–444, 446, 447, 450, 453, 460
Asymptotic single risk factor model, 217
Attrition, 250
AUC, 165, 180–183, 185, 196, 199, 200, 202, 216, 221, 226, 263, 269, 272, 280, 292, 297, 299, 301, 310, 312–314, 318, 321, 322, 333, 335, 337, 338, 342, 344, 349, 353, 392
AUROC, 165, 175, 180, 197
Backpropagation, 323, 324
Backstop, 428
Backtesting, 23, 173, 174
Bagged trees, 354, 375
Bagging, 341, 342
Base model, 295, 298, 310, 339, 342
Baseline hazard function, 404
Bayesian approaches, 57, 236, 242
Bernoulli, 184, 188, 324
Beta regression, 259, 269, 270, 272
Binary classification, 78, 328
Binomial test, 186, 188, 190, 191, 196
Boosted trees, 347, 348, 354, 375
Bootstrap aggregation, 339
Bootstrapping, 242, 339
Brier score, 186, 187, 196, 197, 199, 200, 221, 226, 263, 269, 272, 297, 301, 310, 312–314, 321, 333, 335, 337, 342, 344, 349, 359, 360, 362, 363, 365, 368, 371
Calibration, 23, 121, 174, 186, 193, 194, 196, 199, 200, 210, 253, 264, 266, 280, 317, 318, 342, 347, 351, 352, 359, 360, 368, 369, 371, 373, 375–378, 380, 384, 398, 401
CAP, 175, 183
Clustering, 17, 69, 135, 136, 139, 140, 142, 144
Co-integration, 417
Coefficient of determination, 187
Common equity tier 1, 412
Comprehensive capital analysis and review, 21, 236
Conditional probability of default, 228, 446, 458, 459
Confusion matrix, 175–177, 179, 206, 303
Conjugate prior, 61, 190
Contract rate, 106, 198, 219, 220, 253, 261, 262, 285, 290, 331, 340, 355, 413, 430
Cox proportional hazard, 401, 403, 408, 410
Credible interval, 57, 59, 67
Credit conversion, 95, 276, 278, 281–284, 286, 287
Crisis pds, 22, 209
Cross-tabulation, 32
Cumulative accuracy profile, 175
Cumulative default rate, 100, 394, 395, 410, 447, 450, 453
Cumulative density function, 123, 211, 228, 243, 252, 265, 278
Cumulative excess payment, 152
Cure, 92, 94, 392
Current expected credit loss, 21, 411
Cut-offs, 173
Decision trees, 289, 339, 347, 354
Default indicator, 44–46, 49, 94, 98, 208, 209, 354, 413
Default models, 208, 216, 387, 410
Deficiency judgment, 158, 262
Dickey-fuller test, 417
Discrete time hazard model, 254
Discrimination, 121, 174, 187, 196, 199, 216, 221, 253, 301, 317, 318, 342, 347, 351, 352, 392
Distance measure, 136, 142, 196, 221, 302
Double-trigger, 152
Downturn, 17, 19, 22–24, 31, 95, 107, 209, 227, 232, 235, 238, 240, 242, 246, 256, 257, 389, 412, 413, 432, 440
Dummy coding, 128, 129
Durbin watson statistic, 419
Elbow-criterion, 138
Estimated pds, 215, 248, 257, 400, 401, 424, 425, 431
European banking authority, 106, 108, 236, 428
Expected loss, 17, 23, 106, 107, 158, 209, 389, 411–413, 426, 427, 429, 434–436, 457–460
Expected shortfall, 17, 209, 446, 457, 459, 460
Exponential distribution, 297, 402
Exposures given default, 209
Extreme gradient boosting, 342
Farthest-neighbor, 139, 142
Federal reserve bank, 21, 236
Fitted pds, 214–216, 218, 251, 321, 389, 398, 408, 415, 445
Foreclosure law, 158, 262
Forward stage-wise additive modeling, 342
Fractional response regression, 259, 266, 267, 272
Goodness of fit, 263, 280, 285
Gradient boosting, 13
Gradient descent, 74, 77, 78, 80, 81, 324, 327
Granger causality, 417
Helicopter money, 22
Hidden layer, 323, 324, 327, 330, 332, 333, 338, 378
Hit rate, 173, 176
Hosmer-lemeshow, 186, 193
Hyperplanes, 320
Implied pd, 214, 218, 220, 222, 256
In-sample, 118, 132, 165, 172, 199, 218, 220, 222, 256, 272, 274, 291–293, 295, 296, 311, 312, 318, 355, 358, 359, 401, 412, 419, 444
In-time, 151, 198–200, 208, 232, 389, 412, 428
Incentive misalignments, 392
Infinitely granular portfolio, 435, 436, 440
Input layer, 324
Instantiate, 295, 299
Interaction term, 218
Inverse mills ratio, 257
Iteration, 74, 75, 140, 168, 327, 333, 345, 452, 453, 455, 457
Jeffrey's prior test, 190, 197
Johansen test, 417
K-nearest neighbor, 289
Kernel, 166, 168, 295, 320, 377
Knot, 127
Lasso regression, 362
Latent variable, 212, 257
Least absolute shrinkage, 167
Light gradient boosting machine, 348
Linear regression, 39, 68, 129, 172, 259, 260, 264–266, 269, 272, 275, 278, 279, 284, 354, 365, 366, 368, 369, 373, 375–377, 379, 382–385, 440, 444
Loan amortization, 158, 387
Loan to value ratio, 20, 21, 275
Log-likelihoods, 216
Logistic regression, 27, 68, 81, 88, 90, 129, 165, 168, 172, 183, 194, 198, 211, 217, 221–224, 289, 290, 297, 298, 314, 322, 323, 326, 327, 329, 333, 337–339, 341–344, 351, 353, 389
Naive bayes, 17, 289, 312, 313, 322
Nearest-neighbor, 139, 140
Neural network, 17, 23, 289, 323, 335, 337–339, 351, 354, 378, 382, 385
Noise, 83, 174, 337
Non-distressed exposures, 276, 281
Normalization, 70, 135
Notional lgd, 106–108, 260, 261, 354, 425
Ordinal rating class, 392
Ordinary least square, 259
Origination, 18–23, 28, 31, 38–40, 96, 106, 120, 122, 152–154, 157, 158, 206, 210, 249, 251, 322, 387, 402, 409, 410, 431, 462
Out-of-time, 121, 173, 174, 198–200, 291, 293, 297, 317, 318, 333, 349, 352
Outstanding balance, 20, 425
Paired assets, 427
Pairwise correlation, 170
Random forest, 17, 339, 344, 351, 354, 375
Rank correlation, 216
Rating migration, 392–394
Recall, 176–178
Receiver operating characteristic, 175, 179
Recourse lending, 152, 158
Recovery rate, 94, 102
Recursive feature elimination, 166
Reference coding, 127, 128
Refinance, 152, 250, 251, 253, 282
Regularization, 81, 87, 167, 168, 198, 200, 292, 294, 297, 299, 301, 321, 322, 324, 339, 359
Reliability diagram, 192
Repository, 27
Residential mortgage-backed securities, 251
Resolution bias, 45, 46, 92, 110, 114, 260, 278, 354
Resolution period, 20, 94, 101–103, 106, 110–112, 260, 399, 427
Revolving credit line, 254
Ridge regression, 359, 382
Right-censored, 402
Risk segment, 242, 440
Roll rate analysis, 387
Root mse, 260, 265, 279
Root node, 314
Scheduled balance, 152–154, 253, 276, 277
Second lien loan, 152, 260
Security, 38, 260
Seniority, 260
Sensitivity, 108, 175, 179–182
Separation margin, 320
Serial correlation, 419
Significant increase in credit risk, 427
Single-linkage, 140
Spearman correlation, 216
Specificity, 175, 179, 180, 182
Spectral measures, 457
Spherical decision boundary, 328
Splines, 84, 126, 127, 132, 158
Splitting point, 314
State dummies, 261, 262, 265
Stationarity, 417
Stationary point, 74
Stochastic gradient boosting, 347, 349
Stochastic gradient descent, 353
Stress-testing, 22, 23, 235, 236, 461
Sub-additivity property, 459
Subsample, 29, 36, 37, 90, 102, 112, 117, 137, 141, 143, 145, 238, 239, 267, 282, 284, 325, 347, 350
Summary statistics, 67, 103, 104, 114
Support vector machines, 17, 289, 320, 322, 354, 377
Term structure, 158, 210, 387, 389, 398–400, 410–412, 423, 424, 429, 431
Through-the-cycle pd, 21
Train-test split, 23, 88
Underwriting, 18
Unexpected losses, 209, 435
Unimpaired asset, 412
Validation, 17, 23, 40, 82, 87, 118, 121, 165, 171–175, 197, 212, 216, 224, 226, 238, 272, 274, 275, 280, 291, 293, 313, 324, 329, 344, 347, 358, 379, 383
Value-at-risk, 435, 436
Variance-covariance matrix, 242, 418
Verbosity, 295, 327, 330
Vintage, 23, 96, 97, 115, 118, 120, 157, 158
Weak learner, 342
Weight decay, 167
Weight-of-evidence, 126
Winsorize, 122, 158, 259, 263, 265, 355, 358, 379, 387, 441, 442, 444
Working capital ratio, 22
Workout, 18, 94, 101–103, 106, 110, 412, 427
Part I