
Agenda

• How are Credit Decisions Taken in the Financial Industry?
• How is Machine Learning Being Adopted in Taking These Decisions?



Why Credit Risk?
• Net Credit Loss: 25% of Revenues and Twice the Net Income in Q2 2011



Game One – Rank Them in Order
• Provided is the Financial Data on 10 Companies.
• You have to Rank-Order them in terms of the Companies' Future Prospects / Credit Quality
• Time Allotted is 30 Min
• Presentation
• Discussion



Example -Looking for
Key Attributes



What Did We Learn?
1. If we are a team of 4 – we have 5 Rank Orderings!
2. We can develop our own rating methodology based on our own inferences and logic
3. We can be Super-Successful if we Blend our Knowledge with Industry Wisdom



Agenda



PD Models – Statistical
Models in Credit Risk Measurement



How do you build a
Default Model for India?
• Defining Default – Some considerations
– It Should Work
– Easy to Define
– Easy to Apply and Measure
– Lends itself to Future Improvement



Pages in Indian ‘Default’ History
• Sick Industrial Companies Act (SICA, 1985)
– Registered for Five* Years
– Incurred Cash Loss for two consecutive years
– Net worth is negative
• Eligible Companies are Mandatorily referred to the BIFR
(Board for Industrial and Financial Reconstruction)



Pages in Indian
‘Default’ History (Continued)
• Sick Industry Report (1992)*
– Evaluated the Criteria Independently
– Recommended using 2 consecutive years of Cash Loss as the Criterion
– Rationale – Early Intervention gives Sick Companies a High Chance of Survival



Does this Default Definition Work?
• Year 2007: PAT of a company was -60.7 crores
• Year 2008: PAT of a company was -97.4 crores
• Shall we define this company as a Defaulter / Sick Company?
• The company's Net worth, though, was positive in these two years
• Name of this company: TATA Advanced Material Ltd
• TATA Advanced Material Ltd sprang back to action with positive profit in the later years



Proposed Definition
• Companies which have Negative Net worth for the First time in the Time window 1991–97
– Midway between the Two Definitions
– Is Measurable and can be Improved



PD Models – Linear Models
• Altman's Z Score Model (1968)
• Z (Public) = 1.2X(1) + 1.4X(2) + 3.3X(3) + 0.6X(4) + 1.0X(5)
– Where Z > 2.99 is healthy
– Z < 1.81 is unhealthy
– 1.81 < Z < 2.99 is indeterminate
• X(1) = Working Capital / Total Assets
• X(2) = Retained Earnings / Total Assets
• X(3) = Earnings Before Interest and Taxes (EBIT) / Total Assets
• X(4) = Market Value of Equity / Book Value of Total Liabilities
• X(5) = Net Sales / Total Assets
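A minimal sketch of this score as an R function, consistent with the R code used later in this deck (the example ratio values are hypothetical):

altman_z <- function(x1, x2, x3, x4, x5) {
  1.2*x1 + 1.4*x2 + 3.3*x3 + 0.6*x4 + 1.0*x5   # Altman (1968) weights
}
z <- altman_z(0.20, 0.10, 0.15, 0.80, 1.10)     # hypothetical company
if (z > 2.99) "healthy" else if (z < 1.81) "unhealthy" else "indeterminate"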



Steps to Build a Good Function
• Selection of the ‘Default’ Sample
• Creation of the Non-Default Sample
– Choice of Industry, Firm Size & Time Period
• Appropriate Treatment of the Data
– Choose the predictors carefully
• Choice of a Good Out-of-Sample Validation Dataset – Stress Test the Data



Deep Dive on the Variables
•Clear Separation in mean values of variables between
defaulters and non-defaulters



Discriminant Function
• India Z Score Model (2000)
• Z (Public) = 1.06 + 0.01 PBITINT – 3.12 TDTA + 0.48 QR + 4.61 NCATA
• PBITINT = Profits Before Interest & Taxes / Total Interest
• TDTA = Total Borrowings / Total Assets
• QR = (Current Assets – Inventories) / (Current Liabilities & Provisions)
• NCATA = (Profit After Tax + Depreciation) / Total Assets



Does it Work?



Challenges of Discriminant Models
• Challenges
– Zone of Indifference (Indeterminate)
– Sensitivity to Industry
– Dealing with New Companies
– Loss of Predictive Power across Time – use of Penalty Functions
– Multi-collinearity of Variables
• Some variables provide the same information and are highly correlated
– Assumption of Multivariate Normality of Variables



What if we use a
Logistic Framework?
• We can also Build a Logistic Model using 2010-11 Data
• Take all companies with negative Net worth in 2011
• Take the same number of companies with similar asset size and the same industry with positive Net worth in 2011
• QC the data carefully, removing all outliers
• Compute all the predictors as of 2010
• Run the logistic regression and get the equation



Logistic Model – The New Equation
• Logit(score) = -6.9965
  + 4.8879 × Total Borrowings / Total Assets
  - 0.455 × Net Working Capital / Current Liabilities
  - 6.8605 × Net Cash Accruals / Total Assets
  + 1.6317 × Current Liabilities / Current Assets
  + 0.1978 × Total Borrowings / Total Liabilities
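Since this is a logit, the implied probability of default comes from the inverse-logit transform. A minimal sketch in R (the ratio values plugged in are hypothetical):

logit <- -6.9965 + 4.8879*0.45 - 0.455*0.30 - 6.8605*0.05 + 1.6317*0.80 + 0.1978*0.50
pd <- 1 / (1 + exp(-logit))   # probability of default in (0, 1)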



Logistic Models –
Use in Credit Rating
•If we run the equation on N companies, we
can rank-order the companies based on the
score
•Create ratings based on the cutoffs chosen



How is Rating
Determined in CRAs?



Developing a CRA's
Rating Algorithm
• Developing Ordered-Logit Regressions with Market Information (incorporated market information into the logic)



Recap – What we Studied
• Discriminant Models, though Accurate, are ‘High Maintenance’
• Logit / Probit Regressions display greater stability than Discriminant Models
• The Strength of the Tool is a Function of the Quality of Data used for the Analysis
• Credit Rating tries to resolve Information Asymmetry in assigning the Right Price of Debt
• Either Tool can be used to develop an Independent Rating Framework



Total Recall – Market Risk
• Developed Two Methods of VaR: the Parametric and the Distribution Method
• Single Stock
• Portfolio
• Developed the Mean-Variance Portfolio Approach of Weight Optimization



Industrial Wisdom
• If you want to do something New – Know the Past – Spend 30% of the Allotted time
• Spend 40% of the Time getting the ‘Right’ Data
• Building the Analytical Solution is 10% of the Time
• Stress Test the Solution for the Remaining 20%
• Document the Key Learnings and Opportunities for Future Enhancements



Total Recall – Credit Risk
• Challenge to Define Default in the Context of India
• Choosing Logistic over Discriminant Models to predict Default
• Need to Find ‘independent’ attributes / features to predict Default
• Understanding and Cleaning of Data is Essential
• Dividing the Sample into Development, Validation and Out-of-Sample Validation is a Must



Total Recall – Credit Risk (Cont.)
•Machine Learning to be used on the
same Sample to Develop Alternate type
of Models.
•Having a good understanding of the
Problem to be solved and the underlying
data is of Paramount Importance.



Game 2: Building a
Smart Default Model
• Define Default
• Do a data quality check and remove outliers
• Divide the data into development and validation datasets
• Identify and define variables that have a high correlation with default
• Develop the Logistic function
• Develop the Classification table
• Test performance on out-of-sample data
• Rank-Order your 10 Companies – what do you Find?



Define Default
• Step 1: Define Default in your dataset
– All companies with negative Net worth should be defined as defaults
– If Net worth <= 0, then default = 1, else default = 0 (Add a new column in your dataset called Default which takes a value of 0/1; a sketch follows below)
– What is the default rate in your dataset?
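A minimal sketch in R, assuming the data frame is called data and has a Networth column:

data$Default <- ifelse(data$Networth <= 0, 1, 0)   # 1 = default, 0 = non-default
mean(data$Default)                                 # default rate of the dataset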



Quality Check
• Step 2 – Quality Check of the data
– Have a look at the data and remove the outliers
• These outliers will distort your equation if not removed
– After you are satisfied with the data, randomly divide it into a development dataset (70%) and a validation dataset (30%)
• Check the default rates of the development and validation datasets
– Since the split is random, the default rates should be similar
– You will now work on the development dataset to develop the model equation



Identification of Predictors
• Step 3 – Identify the predictors
– Look at the variables which give a good discrimination between the defaulters and non-defaulters in the dataset
• E.g. if you want to use variable X in your model, you should look at the mean value of variable X for non-defaulters, and the mean value for defaulters. If there is a good discrimination between the two mean values, then variable X should be used in your model (see the sketch below)
– Ratio variables MIGHT be more appropriate for the model equation!
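A sketch of this mean-separation check in R (the data frame and the column names X and Default are assumptions):

tapply(data$X, data$Default, mean, na.rm = TRUE)   # mean of X: non-defaulters vs defaulters
t.test(X ~ Default, data = data)                   # is the separation statistically significant?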



Workshop Recap
• Defined Default
• Divided the data into development and validation datasets
• Did a data quality check and removed outliers
• Identified and defined variables that had a high correlation with default



Develop Logistic Equation
• Step 4: Develop the Logistic Equation
– Transfer the data to the SPSS sheet
– Look for the Logistic Regression field in the SPSS sheet (it comes under the Analyze tab)
– The Dependent variable is the 0/1 Default Column
– The Independent variables are the predictors that you have chosen
– Remove the variables that have a high significance level (p-value)
– Fine-tune your final model by looking at the signs of the predictors
• The predictors' signs should make intuitive sense, e.g. a variable like CL/CA should have a positive sign
– Look at the efficiency rate of the model prediction



Out-of-Sample Validation
• Step 5: Your model is now ready
– Take the validation dataset
– Score each row with your equation – get the score for every company
– Count the number of defaulters in your dataset
• Let's assume that there are 30 defaulters in your dataset
• Based on your score, take the top 30 companies
• Look at the default rate of these top 30 companies – this gives you the efficiency of your model (see the sketch below)
• The higher the efficiency, the better the model
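A sketch of this top-N efficiency check in R, assuming a validation data frame valid with score and Default columns:

n_def <- sum(valid$Default)                        # e.g. 30 defaulters
top   <- valid[order(-valid$score), ][1:n_def, ]   # top-scored companies
mean(top$Default)                                  # efficiency of the model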



First Set of Lessons
1. Understanding Data is the First Step to a Successful Analytical Exercise
2. One Can Build a Great Model for Credit Rating using Any Technique provided you know its Limitations
3. We Can be Super-Successful if we Blend our Knowledge with Industry Wisdom



Machine Learning in Credit Risk



Learning from 2
MM Models – Kaggle
Great Lakes
•Data Type
–Structured Data
–Unstructured Data
•Step 1: Understand the Data Generation
Process
–Explore the Data



Learning from 2
MM Models – Kaggle
• Step 2: Feature Engineering
– Structured Data
– Rank Plot / Hypothesis Testing
– Synthetic Variables
• Feature Engineering
– Not Relevant for Unstructured Data



Learning from 2
MM Models – Kaggle
• Step 3: Structured Data – Fitting the Right Algorithm
– Random Forest
– Support Vector Machine
– Gradient Boosting Machine
• Unstructured Data – Deep Learning
– CNN or RNN (image vs sequence data)



Learning from 2
MM Models – Kaggle
• Caution
– Overfitting
– Use Cross-Validation to Test Model Performance
– Poor Performance in Out-of-Time Sample
• Participate in Kaggle
– Get the Real Experience
– Cloud-Based Kernels of Kaggle
LOGISTIC REGRESSION



Logistic Regression
Logistic Regression builds a non-linear equation to predict a dichotomous variable. In fact, despite its name, what it does is classification rather than regression!



Why not Linear?
• The Y variable is a binary variable – 1 or 0
•The relationship between the
dependent and independent
variables is non-linear
•The usual linear regression
generates values outside [0,1]
•A linear fit to a binary variable
becomes very sensitive to
extreme values
•Other statistical Complications!



Logistic function – a better fit!
So, we need a function that stays within the bounds of 0 & 1
and represents the data in a much better manner
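The standard logistic (sigmoid) function does exactly this, squashing any real-valued score into the interval (0, 1):

P(Y = 1 | X) = 1 / (1 + e^-(b0 + b1·X))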



How does logistic learn
the Coefficients – An Example
Can you predict whether a person will buy a house
with the given information?



How does logistic learn
the Coefficients – An Example



How does logistic learn
the Coefficients – An Example

Cost function:
When Y = 1: Cost = -log(Prediction)
When Y = 0: Cost = -log(1 - Prediction)



How does logistic learn
the Coefficients – An Example
Step 3: Adjust the coefficients and the predictions in an iterative fashion to move towards the global Cost minimum (a minimal sketch follows below)
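A minimal sketch of this iterative update (gradient descent on the log-loss cost) in R; the toy data and learning rate are assumptions:

sigmoid <- function(z) 1 / (1 + exp(-z))
x <- c(0.5, 1.2, 2.3, 3.1); y <- c(0, 0, 1, 1)   # toy data
b0 <- 0; b1 <- 0; lr <- 0.1                      # starting coefficients, learning rate
for (i in 1:1000) {
  p  <- sigmoid(b0 + b1 * x)                     # current predictions
  b0 <- b0 - lr * mean(p - y)                    # gradient of cost w.r.t. intercept
  b1 <- b1 - lr * mean((p - y) * x)              # gradient of cost w.r.t. slope
}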



RANDOM FOREST



Why is it called a Forest?
• Predictive model based on a branching series of Boolean tests
• Boolean tests are less complex than one-stage classifiers
• A “Forest”, or an ensemble of trees, is required to address over-fitting



But why Random?
• Bootstrap aggregating (or bagging) – the process through which samples are selected to build the trees; random samples are repetitively drawn (with or without replacement) from the training set (see the sketch below)
• Random feature selection – the split variable at every node of a tree is randomly selected from the full list of features
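A sketch of what one bagging draw looks like in R (the training data frame is an assumption):

n   <- nrow(training)
idx <- sample(n, size = n, replace = TRUE)   # bootstrap sample of row indices
bag <- training[idx, ]                       # sample used to grow one tree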



Let’s take an example



Building Decision Trees
Subsequent trees are generated in a similar manner on different samples



To summarize the model
building process…



How best to use Random Forest?
Parameter Tuning: Larger number of trees
Impact:
+ Less chance of over-fitting
– More complex solution
– Higher runtime



How best to use Random Forest?
Parameter Tuning: More randomly selected variables
Impact:
+ Significant variables show up
– Repetitive trees – all variables in the data not evaluated



How best to use Random Forest?
Parameter Tuning: Higher sampling ratio
Impact:
+ Enough data points to build trees
– Not enough data points to test the stability of trees



How best to use Random Forest?
Parameter Tuning: Sampling without replacement
Impact:
+ Trees covering different dimensions
– Limit on the maximum number of trees
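A hedged sketch of checking these trade-offs empirically with the randomForest package, via the out-of-bag (OOB) error (the formula and data frame are assumptions; DEF must be a factor for classification):

library(randomForest)
rf <- randomForest(DEF ~ ., data = training, ntree = 200, mtry = 3)
plot(rf)                        # OOB error as the number of trees grows
rf$err.rate[rf$ntree, "OOB"]    # final OOB error estimate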



GBM



Gradient Boosting!
Gradient boosting produces a strong prediction model by
ensembling many weak prediction models, typically decision
trees, built in a stage-wise fashion



RF Vs GBM



GBM Introduction
• GBM builds decision trees in a stage-wise manner
• The first set of predictions is initialized to a constant value and the first tree is built on the residual error from this constant value
• Successive trees use the residuals from the previous trees to reduce the prediction error
• The GBM score is a linear combination of the individual tree predictions



Let's Look at an Example
Can you predict the value of the home of any person with the given information?



Let's Look at an Example



Let's Look at an Example

Learning Rate: 10%



Let's Look at an Example
Step 6, 7, 8…: Repeat the process with the next trees until errors are minimized!

The number of trees you build, the depth of each tree and the learning rate will all decide how good a model you make! (A minimal sketch of this loop follows below.)
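A minimal sketch of this loop for a regression GBM in R, using rpart trees fitted on residuals (the data frame and column names y, x1, x2 are assumptions):

library(rpart)
pred <- rep(mean(training$y), nrow(training))     # initialize to a constant
lr   <- 0.10                                      # learning rate of 10%
for (i in 1:5) {                                  # n.trees = 5
  res  <- training$y - pred                       # residuals from current prediction
  tree <- rpart(res ~ x1 + x2, data = training)   # fit a tree to the residuals
  pred <- pred + lr * predict(tree, training)     # damped, stage-wise update
}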
Overview of GBM –
Regression and Classification



How best to use GBM?



Keep in Mind



k-NN algorithm



Finding Lookalikes
How do you find people similar to Sushil Kumar among a group of sportspersons?



Concept of Distance and Similarity
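The distance measures referenced later in this deck are the standard ones; for two points a and b in d dimensions:

Euclidean: d(a, b) = sqrt( Σ (a_i - b_i)² )
Manhattan: d(a, b) = Σ |a_i - b_i|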



k-NN
(k-Nearest Neighbors) Algorithm
An algorithm to find the k most similar people, i.e., the k nearest neighbors



Example using kNN
Can you predict whether Maitree has a car?



k-NN Algorithm:
Mathematical Formulation



k-NN Algorithm:
Mathematical Formulation



k-NN Algorithm:
Mathematical Formulation



k-NN Algorithm:
Mathematical Formulation



Parameters for kNN models

Distance Metric
– Should satisfy the triangle inequality. For example:
• Euclidean distance
• Chebyshev's distance
• Manhattan distance
• Mahalanobis distance

Dimensions
– Should be independent and identically distributed (IID). For example:
• Age
• Income

Value of k
– Typically selected through cross-validation.

Scoring Function
– Labels of the nearest neighbors weighed differently by:
• Distance to the point
• Rank of the neighbor



Steps to Build a ML Model – 1
•Step 1: Divide the Data in Development and
Validation
•Step 2: Appropriately Floor and Cap the
Variables
•Step 3: Execute the Different Models
•Step 4: Save the Results



Steps to Build a ML Model – 2
•Step 5: Score the Validation Data Sets
•Step 6: Keep the Relevant Variables
•Step 7: Compute the GINI of the Out of
Sample
•Step 8: Save the Data Sets with Predicted
Variables and MERGE Key
•Step 9: Compare the Results



Step 1a – Read the Data
• Read the Data in R
data <- read.csv("dev-data-1.csv")
• Check the number of observations
nrow(data)



Step 1b – Split the Data
into Development and Validation
• Do a 75-25 Split of the Data – Training (development) and Test (Validation)
splitdf <- function(dataframe, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)   # reproducible split
  index <- 1:nrow(dataframe)
  trainindex <- sample(index, trunc(length(index) * 0.75))
  trainset <- dataframe[trainindex, ]
  testset  <- dataframe[-trainindex, ]
  list(trainset = trainset, testset = testset)
}
splits <- splitdf(data, seed = nrow(data))
training_lg <- splits$trainset
testing_lg  <- splits$testset



Step 2 – Floor and Cap
the Variables
• Missing values have been Floored to “1”
training_lg$TA[is.na(training_lg$TA)] <- 1
training_lg$TI[is.na(training_lg$TI)] <- 1
training_lg$TE[is.na(training_lg$TE)] <- 1
training_lg$PAT[is.na(training_lg$PAT)] <- 1
• Alternate Flooring / Capping can also be used…



Step 2a – Alternate
Capping and Flooring of Variables
• Replacing Missing values with Means
training$TLIAB[is.na(training$TLIAB)] <- round(mean(training$TLIAB, na.rm = TRUE))
• Replacing with Percentile Values:
training$INVST1 <- ifelse(training$INVST <= 10, 10, training$INVST)                 # floor at the 1st percentile (10)
training$INVST1 <- ifelse(training$INVST1 >= 1222.485, 1222.485, training$INVST1)   # cap at the 99th percentile (1222.485)



Step 3 & 4 –Execute the
Different Models -Logistic
•Need to Load the GLM library –happens
automatically
library(glm2)
•Run the Relevant Equation
model<-glm( DEF ~
TA+TI+TE+PAT,data=training_lg,family="binomial")
summary(model)
•Play Around till you get the ‘Best’ Equation
Tips: P1 <-fitted(model)



Step 5 – Validate the
Model on the Validation Data
• Use the Model Equation to Come up with the Predicted Scores on the Validation Sample
predicted <- predict(model, newdata = testing_lg, type = "response")
• Transform the ‘predicted’ temp variable to a Variable in the File
d <- transform(predicted)
• Save the Temp d variable to the predlg variable in the testing_lg File
testing_lg$predlg <- d$X_data
• Check the results – head(testing_lg)



Step 6: Keep the
Relevant Variables
• Only Keep the Relevant Variables – 3 of them
testing1_lg <- subset(testing_lg, select = c(predlg, DEF, Num))
# predlg = predicted probability, DEF = default indicator, Num = merge key



Step 7 – Compute the GINI
• Load a new Library – library(Hmisc)
• Relevant Commands:
rcorr.cens(oot_lg$predlg, oot_lg$DEF)
rcorr.cens(testing_lg$predlg, testing_lg$DEF)
• Dxy = GINI
• C Index = concordance
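• For a binary default flag these two outputs are linked: Dxy = 2 × (C Index − 0.5), i.e. GINI = 2 × AUC − 1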



Step 8 –Save the Results in a File
•Save the Results in a CSV File
write.csv(oot1_lg,"oot_logit_pred.csv")
•Check the Data in EXCEL



Step 3 & 4: Algorithm –
Random Forest
•Load the Following Library: Random Forest
Library (randomForest)
•Run the Relevant Command
training$DEF<-factor(training$DEF)
library(randomForest)
set.seed(71)
RF<-randomForest(DEF ~ TA + TI + PAT + PBDITA + PBT + CPFT
+ PBTI + PATI + Sales + QR + CR + DE -Num, data= training ,
ntree= 50, mtry= 3, importance = TRUE, na.action= na.omit,
keep.forest= TRUE, do.trace= 10)



Step 3 & 4: Random
Forest –Model Diagnostics
•Result Summary –summary(RF)
•Variable Importance –importance (RF)
•Another Way of Variable Importance –
round(importance (RF), 2)
•Classification Metrics –print(RF)
•Printing one of the trees –getTree(RF,1) –
1stTree



Step 5 & Rest: Score
Validation Data -RF
•Scoring the Validation Dataset
predicted <-predict(RF, testing, type = "prob")
•Only Keep the ProbCorresponding to the
Second Column
prob_rf<-predicted[,2]
•Save it as variables
g <-transform(prob_rf)
testing$pred_rf<-g$X_data
Step 5 & Rest:
Random Forest
•Scoring the Validation Dataset
predicted <-predict(RF, testing, type = "prob")
•Only Keep the ProbCorresponding to the Second
Column
prob_rf<-predicted[,2]
•Save it as variables
g <-transform(prob_rf)
testing$pred_rf<-g$X_data
head(testing)

Jun 10, 2020 95


Step 5 & Rest:
Score Validation Data -RF
•Subset the Data for the Relevant Variables
testing1_rf <-subset(testing, select =
c(pred_rf,DEF,Num))
•Compute the GINI
library(Hmisc)
rcorr.cens(testing1_rf$pred,testing1_rf$DEF)



Step 5 & Rest:
Score Validation Data -RF
•Merge the Relevant Files
Scored_testing1 <-
merge(testing1_lg,testing1_rf ,by="Num")
•Save this as a Permanent Data
write.csv(Scored_testing1,"testing_lg_rf_pred.
csv")
•Repeat this Action as you Append Other
Probabilities using Different Algorithms



Step 3 & 4: Algorithm –
Gradient Boosting Machine
•Load the Following Library: GBM
Library (gbm)
•Run the Relevant Command
library(gbm)
gbm_model<-gbm(DEF ~
TA+TI+TE+PAT+PBDITA+CPFT+PBDITAI+Sales+SHF
+NWC+QR+CR+DE+EPS+TLIAB, training,
distribution = "bernoulli", n.trees= 5,shrinkage=
0.1, interaction.depth= 3)



Step 3 & 4: Random
Forest –Model Diagnostics
•Variable Importance
summary(gbm_model,
cBars=length(gbm_model$var.names),
n.trees=gbm_model$n.trees,
plotit=TRUE,
order=TRUE,
method=relative.influence,
normalize=TRUE)
Step 5 & Rest:
Score Validation Data -GBM
•Scoring the Validation Dataset
predict_gbm<-predict(gbm_model, testing,
n.trees=5, type="response")
summary(predict_gbm)
•Save it as variables
b <-transform(predict_gbm)
testing$predgbm<-b$X_data
head(testing)
Step 5 & Rest:
Score Validation Data -GBM
•Subset the Data for the Relevant Variables
testing1_gbm <-subset(testing, select =
c(predgbm,DEF,Num))
•Compute the GINI
library(Hmisc)
rcorr.cens(testing1_gbm$predgbm,testing1_g
bm$DEF)



Step 5 & Rest:
Score Validation Data -GBM
•Merge the Relevant Files
Scored_testing1 <-merge(testing1_lg,testing1_rf
,testing1_gbm by="Num")
•Save this as a Permanent Data
write.csv(Scored_testing1,"testing_lg_rf_gbm_pr
ed.csv")
•Repeat this Action as you Append Other
Probabilities using Different Algorithms



Step 2: Algorithm – KNN
• Load the Following Library: kknn
library(kknn)
• Standardize the Data (min-max scaling)
training$nTA <- (training$TA - min(training$TA)) / (max(training$TA) - min(training$TA))
training$nTI <- (training$TI - min(training$TI)) / (max(training$TI) - min(training$TI))
• Do the Same for the Validation Data as well (a sketch follows below)
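A sketch of that step, using the training min/max on the validation data to avoid information leakage (variable names follow the slides above):

testing$nTA <- (testing$TA - min(training$TA)) / (max(training$TA) - min(training$TA))
testing$nTI <- (testing$TI - min(training$TI)) / (max(training$TI) - min(training$TI))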



Step 3 & 4: Algorithm –KNN
•Run the Relevant Algorithm
library(kknn)
knn< -kknn(as.factor(DEF)~ nTA+ nTI+ nPAT+
nPBDITA+ nPBT+ nCPFT+ nPBTI+ nPATI+ nSales+
nQR+ nCR+ nDE,training, testing, k = 7, distance = 2)
K = number of points in the Neighbourhood
Distance = 2 (Minowski’sDistance = (|distance|)**(2)
Use Both the Training and Test in the same code



Step 5: Model Results &
Diagnostics: Algorithm – KNN
• Check the Results
summary(knn)
fit <- fitted(knn)
plot(fit)
• Save the Results as a Variable
b <- transform(fit)
testing$pred_knn <- b$X_data
table(testing$DEF, testing$pred_knn)   # confusion matrix
Step 6 and Beyond:
Algorithm – KNN
• Save the Relevant Variables and Merge with the Original Data
testing1_knn <- subset(testing, select = c(pred_knn, DEF, Num))
• Create the Final Dataset (merging two files at a time)
Scored_testing1 <- Reduce(function(x, y) merge(x, y, by = "Num"),
                          list(testing1_lg, testing1_gbm, testing1_rf, testing1_knn))
• Repeat this for other Algorithms



References
• Altman (1968), Financial Ratios, Discriminant Analysis and the Prediction of Corporate Bankruptcy, Journal of Finance, 23, No. 4, 589-609
• Altman (1993), Corporate Financial Distress and Bankruptcy: A Complete Guide to Predicting & Avoiding Distress and Profiting from Bankruptcy, John Wiley, Second Edition
• Anant T C, Gangopadhyay S and Goswami O (1992), Industrial Sickness in India: Characteristics, Determinants and History, 1970-90, Report 2, Government of India, Ministry of Industry, Office of Economic Advisors
• Raghunathan V and J Verma (1992), Crisil Ratings: When does AAA mean B?, Vikalpa, Vol 17, No 2, 35-42
• Emerging Market Score Model: http://pages.stern.nyu.edu/~ealtman/emerging_markets_review.pdf

