
AMR-2

Assignment-2

Submitted to: Prof. Kuldeep Baishya
Submitted by: Vipul Kumar Singh
Roll No: 133118

Step 1: We first fit a binary logistic regression, since the dependent variable (Loan_Status) has two possible outcomes (Y and N). All remaining factors are treated as independent variables, except Loan_ID (the row identifier) and Credit_History.

Variables in the Equation

                     B      S.E.   Wald     df   Sig.   Exp(B)
Step 0   Constant    .794   .088   81.997   1    .000   2.212

The constant is statistically significant (Sig. < .001). In this intercept-only model, Exp(B) = 2.212 is the odds of Y against N in the sample, so approved loans outnumber rejected loans by roughly 2.2 to 1.
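As a quick check on the Step 0 row, the constant of an intercept-only logistic model is the natural log of the sample odds. The sketch below assumes the class counts obtained by summing the rows of the Step 1 classification table (189 N and 418 Y); the variable names are illustrative and not part of the SPSS output.

```python
import math

# Observed class counts, taken by summing the rows of the Step 1
# classification table (N: 14 + 175, Y: 12 + 406).
n_no = 14 + 175   # 189 rejected loans
n_yes = 12 + 406  # 418 approved loans

# In an intercept-only model, Exp(B) is the sample odds of Y versus N
# and B is its natural logarithm.
odds = n_yes / n_no
b_constant = math.log(odds)

print(f"Exp(B) = {odds:.3f}")   # Exp(B) = 2.212
print(f"B = {b_constant:.3f}")  # B = 0.794
```

Both values agree with the Step 0 row of the table.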

Model Summary

Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      722.105a            .049                   .070

a. Estimation terminated at iteration number 20 because maximum iterations has been reached. Final solution cannot be found.

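The Nagelkerke R² statistic is the one usually reported because, unlike the Cox & Snell R², it is rescaled so that it can reach 1.0. It measures how well the independent variables explain the dependent variable. Here it is only .070, so the model explains little of the variation in Loan_Status.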
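The two pseudo R² values can be reproduced from the -2 log likelihood with the standard Cox & Snell and Nagelkerke formulas. This is a minimal sketch: the SPSS output does not print the null model's -2 log likelihood, so the code assumes it can be derived from the class counts in the classification table (189 N, 418 Y). The function name `pseudo_r2` is hypothetical.

```python
import math

def pseudo_r2(neg2ll_model, n_no, n_yes):
    """Return (Cox & Snell, Nagelkerke) R-squared for a binary logistic model."""
    n = n_no + n_yes
    # -2 log likelihood of the intercept-only (null) model.
    neg2ll_null = -2 * (n_yes * math.log(n_yes / n) + n_no * math.log(n_no / n))
    cox_snell = 1 - math.exp(-(neg2ll_null - neg2ll_model) / n)
    # Largest value Cox & Snell can reach for this sample.
    max_cox_snell = 1 - math.exp(-neg2ll_null / n)
    return cox_snell, cox_snell / max_cox_snell

cs, nk = pseudo_r2(722.105, n_no=189, n_yes=418)
print(f"Cox & Snell = {cs:.3f}, Nagelkerke = {nk:.3f}")
# Cox & Snell = 0.049, Nagelkerke = 0.070
```

Dividing by the maximum attainable Cox & Snell value is what lets the Nagelkerke statistic reach 1.0.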
Classification Table

                                Predicted Loan_Status
Observed                        N      Y      Percentage Correct
Step 1   Loan_Status   N        14     175    7.4
                       Y        12     406    97.1
         Overall Percentage                   69.2

a. The cut value is .500

The classification (or truth) table shows that the model predicts approved loans (Y) well, classifying 97.1% of them correctly, but identifies only 7.4% of the rejected loans (N). This is not ideal for a bank, which needs the model chiefly to predict which loans to reject.
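The percentages in the classification table follow directly from its four cell counts. The sketch below recomputes them, treating Y as the positive class; the function and argument names are illustrative.

```python
def classification_rates(tn, fp, fn, tp):
    """Per-class and overall percentage correct from a 2x2 classification table."""
    specificity = 100 * tn / (tn + fp)  # observed N predicted as N
    sensitivity = 100 * tp / (tp + fn)  # observed Y predicted as Y
    overall = 100 * (tn + tp) / (tn + fp + fn + tp)
    return specificity, sensitivity, overall

# Cell counts from the Step 1 classification table.
spec, sens, overall = classification_rates(tn=14, fp=175, fn=12, tp=406)
print(f"N correct: {spec:.1f}%  Y correct: {sens:.1f}%  overall: {overall:.1f}%")
# N correct: 7.4%  Y correct: 97.1%  overall: 69.2%
```

The overall figure of 69.2% is driven almost entirely by the Y class, which is why it hides the weak performance on rejected loans.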

Categorical Variables Coding

                                           Parameter coding
                              Frequency    (1)     (2)     (3)     (4)
Dependents       (blank)      1            1.000   .000    .000    .000
                 0            349          .000    1.000   .000    .000
                 1            103          .000    .000    1.000   .000
                 2            103          .000    .000    .000    1.000
                 3+           51           .000    .000    .000    .000
Property_Area    Rural        177          1.000   .000
                 Semiurba     231          .000    1.000
                 Urban        199          .000    .000
Education        Graduate     476          1.000
                 Not Grad     131          .000
Married          No           210          1.000
                 Yes          397          .000
Self_Employed    No           513          1.000
                 Yes          94           .000
Gender           Female       116          1.000
                 Male         491          .000
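The coding table appears to use indicator (dummy) coding with the last category as the reference, since the last row of each variable is all zeros. A minimal sketch of that scheme follows; the helper `indicator_coding` is hypothetical and only mirrors the pattern visible in the table.

```python
def indicator_coding(categories):
    """Map each category to dummy columns, using the last category as reference."""
    *coded, reference = categories
    mapping = {
        cat: tuple(1.0 if i == j else 0.0 for j in range(len(coded)))
        for i, cat in enumerate(coded)
    }
    # The reference category is coded as all zeros.
    mapping[reference] = tuple(0.0 for _ in coded)
    return mapping

codes = indicator_coding(["Rural", "Semiurban", "Urban"])
for category, row in codes.items():
    print(category, row)
```

Under this scheme, a coefficient such as Property_Area(2) compares its category (Semiurban) with the reference category (Urban).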

Variables in the Equation

                               B          S.E.        Wald     df   Sig.    Exp(B)
Step 1a   Gender(1)            -.013      .249        .003     1    .958    .987
          Married(1)           -.509      .214        5.633    1    .018    .601
          Dependents                                  2.559    4    .634
          Dependents(1)        -21.980    40192.970   .000     1    1.000   .000
          Dependents(2)        .305       .341        .798     1    .372    1.356
          Dependents(3)        -.025      .377        .004     1    .948    .976
          Dependents(4)        .362       .386        .882     1    .348    1.436
          Education(1)         .492       .218        5.084    1    .024    1.636
          Self_Employed(1)     -.076      .253        .089     1    .765    .927
          ApplicantIncome      .000       .000        .000     1    .997    1.000
          CoapplicantIncome    .000       .000        2.162    1    .141    1.000
          LoanAmount           -.001      .001        .592     1    .442    .999
          Loan_Amount_Term     -.001      .001        .648     1    .421    .999
          Property_Area                               11.448   2    .003
          Property_Area(1)     -.170      .222        .583     1    .445    .844
          Property_Area(2)     .558       .222        6.311    1    .012    1.748
          Constant             .939       .645        2.122    1    .145    2.558

a. Variable(s) entered on step 1: Gender, Married, Dependents, Education, Self_Employed, ApplicantIncome, CoapplicantIncome, LoanAmount, Loan_Amount_Term, Property_Area.

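The model shows the following significant independent variables (Sig. < .05):

• Married
• Education
• Property_Area (significant overall; the Semiurban category differs significantly from the Urban reference category, while the Rural category does not)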
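The Wald, Sig. and Exp(B) columns for these variables can be approximately recomputed from B and S.E. alone. The sketch below assumes the usual definitions (Wald = (B / S.E.)² with 1 degree of freedom, Exp(B) = e^B); because it starts from the rounded values printed in the table, its results match the SPSS figures only up to rounding. The function name `wald_summary` is hypothetical.

```python
import math

def wald_summary(b, se):
    """Return (Wald statistic, p-value, odds ratio) for one coefficient."""
    wald = (b / se) ** 2
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(wald / 2))
    return wald, p_value, math.exp(b)

# B and S.E. as printed in the Variables in the Equation table.
coefficients = [
    ("Married(1)", -0.509, 0.214),
    ("Education(1)", 0.492, 0.218),
    ("Property_Area(2)", 0.558, 0.222),
]
for name, b, se in coefficients:
    wald, p, odds_ratio = wald_summary(b, se)
    print(f"{name}: Wald={wald:.2f}, Sig.={p:.3f}, Exp(B)={odds_ratio:.3f}")
```

An Exp(B) below 1, as for Married(1), means lower odds of approval for the coded category (unmarried applicants) than for the reference category; a value above 1 means higher odds.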

Step 2: We now add the Credit_History independent factor alongside the above-identified significant factors to create a new model.

Categorical Variables Codings

                                           Parameter coding
                              Frequency    (1)     (2)
Credit_History   (blank)      2            1.000   .000
                 0            103          .000    1.000
                 1            509          .000    .000
Property_Area    Rural        179          1.000   .000
                 Semiurba     233          .000    1.000
                 Urban        202          .000    .000
Education        Graduate     480          1.000
                 Not Grad     134          .000
Married          No           214          1.000
                 Yes          400          .000

Model Summary

Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      585.044a            .251                   .354

a. Estimation terminated at iteration number 20 because maximum iterations has been reached. Final solution cannot be found.

It can be seen that the Nagelkerke R² statistic has improved substantially, from .070 to .354.

Classification Table

                                Predicted Loan_Status
Observed                        N      Y      Percentage Correct
Step 1   Loan_Status   N        87     105    45.3
                       Y        16     406    96.2
         Overall Percentage                   80.3

a. The cut value is .500

The share of rejected loans (N) that the model classifies correctly has also increased, from 7.4% to 45.3%, and the overall accuracy has risen from 69.2% to 80.3%.
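Seen from the bank's side, the same improvement can be expressed as the number of rejected loans that each model would wrongly approve. The sketch below uses only the cell counts from the two classification tables; the dictionary labels and the function name are illustrative.

```python
# Rows are observed N then observed Y; columns are predicted N then predicted Y.
tables = {
    "Step 1 (without Credit_History)": [[14, 175], [12, 406]],
    "Step 2 (with Credit_History)": [[87, 105], [16, 406]],
}

def wrongly_approved(table):
    """Count and share of observed-N loans that the model predicts as Y."""
    (tn, fp), _ = table
    return fp, 100 * fp / (tn + fp)

for label, table in tables.items():
    count, share = wrongly_approved(table)
    print(f"{label}: {count} rejected loans predicted as approved ({share:.1f}%)")
```

Even with Credit_History included, more than half of the rejected loans are still predicted as approved at the .500 cut value.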
