SFA - Group 10 - Assignment

Statistical Foundation for Analytics

Group Assignment:
Multiple Regression Analysis for Predicting Loan Default Probability

Submitted By: Group 10

1. Bhargav Gurram 2022EPGP015
2. Lathesh Kumar Polampalli 2022EPGP024
3. Ramakant Janjeere 2022EPGP040
4. Rohit Sanjay Satarke 2022EPGP042
5. Sucharita Das 2022EPGP051

Abstract:
This study builds a model for predicting the default rate on loans with respect to a few given
parameters. Loan defaults are a major risk that financial institutions take on when extending credit
to their customers. Predicting default rates is therefore essential, not only to mitigate that risk but
also to make the necessary provisions on the balance sheet.

Predictive analysis of default risk also helps a lender assess which parameters matter before
extending credit and identify its best target customers. In this study, multiple regression analysis
was used to predict the default probability of the loanee from the credit rating and the loan-to-value
(LTV) ratio. The regression model was developed through iterative data runs, keeping the best fit.

Developing The Model: Steps Involved


Assumptions
1. One-way causality in the data
2. The average chance cause is null, i.e., E(ε) = 0
3. V(ε) = σ², where ε is the error
4. The cause is non-stochastic, i.e., x is measured perfectly
5. ε follows a normal distribution
6. The dataset doesn't have interdependent entries

In this study, we took 10 samples of beneficiary data to predict loan default, with credit rating and
loan-to-value (LTV) ratio as the predictors. The credit ratings are A++ and A+, as given by credit
rating agencies, where A++ is the better rating. The LTV is the ratio of the loan availed to the value
of the asset.

The regression model was first developed with the Data Analysis-Regression tool of Excel, using only
the numerical part of the data set, which is the LTV data.

The categorical data has the values A++ and A+, to which we assigned the value “1” for A++ and “0”
for A+. One-way causality was checked, the correlation of the data was established, and the data was
also checked for multi-collinearity.
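
The dummy coding and a simple multi-collinearity check can be sketched as follows. This is a minimal illustration, not the Excel procedure used in the study: the report does not say how the check was performed, so the pairwise Pearson correlation and the variance inflation factor (VIF) are assumed here as one common approach. The `ltv` and `rating` lists are read off the x1 and x2 columns of the results table, and the helper name `pearson` is hypothetical.

```python
from math import sqrt

# LTV ratios and credit ratings for the 10 samples (x1 and x2 columns of the table)
ltv = [0.2, 0.6, 0.8, 0.3, 0.2, 0.7, 0.9, 0.8, 0.4, 0.6]
rating = ["A++", "A+", "A++", "A+", "A++", "A++", "A+", "A+", "A++", "A++"]

# Dummy coding stated in the report: A++ -> 1, A+ -> 0
rating_dummy = [1 if r == "A++" else 0 for r in rating]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


r = pearson(ltv, rating_dummy)
# With exactly two predictors, the VIF of either one is 1 / (1 - r^2)
vif = 1.0 / (1.0 - r ** 2)
print(f"correlation = {r:.3f}, VIF = {vif:.3f}")
```

A VIF close to 1 suggests the two predictors carry largely separate information; values above roughly 5 to 10 are often treated as a warning sign.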

In the second step, the regression model was developed using both the categorical and the numerical
data. As the data set had only 10 samples, we used the jack-knifing method to arrive at a better
regression model. We removed one sample at a time and repeated the regression run, 10 times in all.
We tabulated the p-values of the intercepts across these runs.

At the end of the process, the intercept p-value turned out to be insignificant in 9 out of 10 runs.
As a result, we dropped the intercept from further regression runs. After dropping the intercept, we
repeated the jack-knifing for 10 iterations and obtained the regression model. The resulting
comparative data was then tabulated.

We then tabulated the data along with the average slope coefficient of the first variable, X1 (the
LTV variable), and the average slope coefficient of the second variable, X2 (the credit-rating
variable).
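
The leave-one-out (jack-knife) runs without an intercept, followed by averaging of the slopes, can be sketched as below. This is an illustration under assumptions rather than the study's Excel workflow: the data lists are read off the results table, the fit solves the 2x2 normal equations directly, and the names `fit_no_intercept` and `drop` are hypothetical. The coefficients it produces need not match the tabulated Excel values exactly.

```python
# Data for the 10 samples, read off the report's table
x1 = [0.2, 0.6, 0.8, 0.3, 0.2, 0.7, 0.9, 0.8, 0.4, 0.6]           # LTV ratio
x2 = [1, 0, 1, 0, 1, 1, 0, 0, 1, 1]                                # 1 = A++, 0 = A+
y = [0.29, 0.30, 0.48, 0.18, 0.29, 0.49, 0.42, 0.48, 0.44, 0.45]   # default probability


def fit_no_intercept(x1, x2, y):
    """Least-squares fit of y = b1*x1 + b2*x2 via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


def drop(seq, i):
    """Return a copy of seq with the i-th element removed."""
    return seq[:i] + seq[i + 1:]


# One regression run per left-out sample
loo_coefs = [
    fit_no_intercept(drop(x1, i), drop(x2, i), drop(y, i))
    for i in range(len(y))
]

# Average the slope coefficients over the 10 runs
b1_avg = sum(b1 for b1, _ in loo_coefs) / len(loo_coefs)
b2_avg = sum(b2 for _, b2 in loo_coefs) / len(loo_coefs)
print(f"B1 avg = {b1_avg:.5f}, B2 avg = {b2_avg:.5f}")
```

Solving the normal equations by hand keeps the sketch dependency-free; with more predictors, a library least-squares routine would be the more practical choice.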

Using the averages of the two slope coefficients, we then calculated the sum of squares of errors
(SSE), the sum of squares total (SST), and the resulting R-square using the following formulae:

SSE = ∑(Y - Yhat)^2, SST = ∑(Y - Yavg)^2

R-square = 1 - (SSE/SST)

B1 avg value (LTV slope coefficient) = 0.54506

B2 avg value (Credit rating slope coefficient) = 0.16691

The R-square calculated this way came out better than the original R-square calculated without
jack-knifing.

Sl No    B1 - LTV slope  B2 - Credit rating  x1   x2  Y = B1X1 + B2X2  Yhat = x1·B1avg  Ya-Y     Default      (Y-Yavg)^2  (y-Yhat)^2
         coefficient     slope coefficient                             + x2·B2avg                Probability
1        0.49711         0.1693309           0.2  1   0.268752         0.2759           0.1238   0.29         0.008464    0.00020
2        0.49955         0.1502252           0.6  0   0.29973          0.3270           -0.2997  0.3          0.09        0.00073
3        0.45435         0.1870619           0.8  1   0.550546         0.6030           -0.5505  0.48         0.2304      0.01512
4        0.51304         0.1586957           0.3  0   0.153913         0.1635           -0.1539  0.18         0.0324      0.00027
5        0.49908         0.1704052           0.2  1   0.270221         0.2759           -0.2702  0.29         0.0841      0.00020
6        0.5             0.16                0.7  1   0.51             0.5485           -0.51    0.49         0.2401      0.00342
7        0.49182         0.1689556           0.9  0   0.442634         0.4906           -0.4426  0.42         0.1764      0.00498
8        0.51129         0.1772581           0.8  0   0.409032         0.4360           -0.409   0.48         0.2304      0.00193
9        0.98432         0.1671989           0.4  1   0.560927         0.3849           -0.5609  0.44         0.1936      0.00303
10       0.5             0.16                0.6  1   0.46             0.4939           -0.46    0.45         0.2025      0.00193
Average  0.54506         0.1669131                                                               0.382
Sum                                                                                                           1.488364    0.03181

From the table: R-square = 1 - (SSE/SST) = 1 - (0.03181/1.488364) = 0.9786
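
The arithmetic behind this figure can be retraced with a short sketch. It assumes the data columns and averaged coefficients exactly as printed in the report and takes the reported SST total as given. The centred SST at the end is recomputed from the stated formula for comparison only; it may differ from the reported total, because the table's SST column appears to square Y itself for samples 2 to 10, which is the uncentred total commonly used for a model without an intercept.

```python
# Inputs read off the report's table
x1 = [0.2, 0.6, 0.8, 0.3, 0.2, 0.7, 0.9, 0.8, 0.4, 0.6]           # LTV ratio
x2 = [1, 0, 1, 0, 1, 1, 0, 0, 1, 1]                                # 1 = A++, 0 = A+
y = [0.29, 0.30, 0.48, 0.18, 0.29, 0.49, 0.42, 0.48, 0.44, 0.45]   # default probability

B1_AVG, B2_AVG = 0.54506, 0.16691   # averaged slope coefficients from the report
SST_REPORTED = 1.488364             # SST total as printed in the table

# Fitted values from the averaged coefficients (no intercept)
y_hat = [B1_AVG * a + B2_AVG * b for a, b in zip(x1, x2)]

# SSE = sum of squared differences between observed and fitted values
sse = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))
r_square = 1 - sse / SST_REPORTED
print(f"SSE = {sse:.5f}, R-square (reported SST) = {r_square:.4f}")

# Comparison only: SST recomputed from the formula, centred on the mean of y
y_avg = sum(y) / len(y)
sst_centred = sum((obs - y_avg) ** 2 for obs in y)
print(f"Yavg = {y_avg:.3f}, centred SST = {sst_centred:.5f}")
```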

Interpretation:

The final model we developed is: Y = 0.545056063 * LTV variable + 0.16691 * Credit-rating variable.
It is supported by the improved R-square value of 0.9786. This implies that the risk of default is
directly correlated with the loan-to-value ratio and the credit rating.
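
As a usage illustration, the final model can be wrapped in a small function. This is a sketch: the coefficients are the ones stated above, while the function name `predict_default_probability` and the input check are assumptions added for the example.

```python
B1_LTV = 0.545056063   # LTV slope coefficient from the final model
B2_RATING = 0.16691    # credit-rating slope coefficient from the final model


def predict_default_probability(ltv, rating):
    """Predicted default probability for an LTV ratio and a rating of 'A++' or 'A+'."""
    if rating not in ("A++", "A+"):
        raise ValueError("rating must be 'A++' or 'A+'")
    rating_dummy = 1 if rating == "A++" else 0   # same coding as in the study
    return B1_LTV * ltv + B2_RATING * rating_dummy


# Example: a loan with LTV 0.6 to an A++ rated customer
p = predict_default_probability(0.6, "A++")
print(f"predicted default probability = {p:.4f}")
```

The model was fitted on LTV values between 0.2 and 0.9, so predictions outside that range are extrapolations.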
[Figure: Original probability vs. new probability model. Series: Original Default Probability and New Probability Model, plotted for samples 1 to 10 on a vertical axis from 0 to 0.7.]

Hence, with this model, a predictive analysis of the default risk of bank loans can be carried out
using multiple linear regression, which will benefit financial lending institutions. This was the
objective of the study.
