Assignment-2: Submitted By: Name: Vipul Kumar Singh Roll No: 133118 Submitted To: Prof. Kuldeep Baishya
Assignment-2
Step 1: Initially, we apply Binary Logistic Regression because the dependent variable (Loan_Status) has two possible outcomes (Y and N). We exclude the row identifier Loan_ID and, at this stage, Credit_History; all remaining factors are treated as independent variables.
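As a rough illustration of what a binary logistic model computes, the sketch below turns a linear combination of factors into a probability of Loan_Status = Y and then into a Y/N label. The coefficient values, the two example factors, and the 0.5 cutoff are assumptions made for illustration; they are not taken from the SPSS output of this assignment.

```python
import math

def predict_probability(intercept, coefficients, features):
    """Return P(Loan_Status = Y) under a binary logistic model."""
    # Linear predictor: b0 + b1*x1 + ... + bk*xk (the log-odds of approval).
    log_odds = intercept + sum(
        coefficients[name] * features[name] for name in coefficients
    )
    # The logistic function maps log-odds to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-log_odds))

def classify(probability, cutoff=0.5):
    """Label an applicant Y when the probability reaches the cutoff (assumed 0.5)."""
    return "Y" if probability >= cutoff else "N"

# Hypothetical coefficients and applicant, for illustration only.
example_coefficients = {"Married": 0.5, "Education": 0.4}
example_applicant = {"Married": 1, "Education": 0}

p = predict_probability(-0.2, example_coefficients, example_applicant)
print(round(p, 3), classify(p))
```

A lower cutoff makes the model approve more applicants, and a higher cutoff makes it reject more, which is relevant when rejections are the outcome of interest.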
The constant is statistically significant, indicating a clear difference between the Yes and No categories of the dependent variable Loan_Status.
Model Summary
a. Estimation terminated at iteration number 20 because maximum iterations has been reached. Final solution cannot be found.
The Nagelkerke R² statistic is the one usually reported (it is scaled so that its maximum is 1.0). It measures how well the independent variables explain the dependent variable.
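For reference, the Nagelkerke R² can be reproduced from the log-likelihoods of the intercept-only model and the fitted model. The sketch below uses the standard definition (the Cox and Snell R² divided by its maximum attainable value). The function names and the example log-likelihoods are hypothetical; SPSS reports -2 Log likelihood, so the log-likelihood is that figure divided by -2.

```python
import math

def cox_snell_r2(ll_null, ll_model, n):
    """Cox and Snell R2: 1 - (L_null / L_model) ** (2 / n), computed on the log scale."""
    return 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)

def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke R2: Cox and Snell R2 rescaled so that its upper bound is 1.0."""
    # Largest value Cox and Snell R2 can reach for this sample: 1 - L_null ** (2 / n).
    max_r2 = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell_r2(ll_null, ll_model, n) / max_r2

# Hypothetical log-likelihoods for a sample of 600 cases (illustration only).
print(round(nagelkerke_r2(ll_null=-370.0, ll_model=-355.0, n=600), 3))
```

A model no better than the intercept-only model gives 0, and a model that fits perfectly (log-likelihood of 0) gives exactly 1.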
Classification Table
                              Predicted Loan_Status
Observed                      N        Y        Percentage Correct
Step 1   Loan_Status   N      14       175      7.4
                       Y      12       406      97.1
The classification (or truth) table shows that the model is good at predicting True Positives (97.1% of observed Y cases are classified correctly) but poor at predicting True Negatives (only 7.4% of observed N cases are classified correctly). This is not ideal for a bank, because the main use of the model is to identify which loans to reject.
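The percentages in the classification table follow directly from the four cell counts. A minimal sketch, assuming Y is treated as the positive class (the function name is illustrative):

```python
def classification_rates(tn, fp, fn, tp):
    """Return (N correct %, Y correct %, overall correct %) from the four cell counts."""
    n_correct = 100.0 * tn / (tn + fp)   # observed N predicted as N
    y_correct = 100.0 * tp / (tp + fn)   # observed Y predicted as Y
    overall = 100.0 * (tn + tp) / (tn + fp + fn + tp)
    return n_correct, y_correct, overall

# Cell counts from the Step 1 classification table above.
n_correct, y_correct, overall = classification_rates(tn=14, fp=175, fn=12, tp=406)
print(f"N correct: {n_correct:.1f}%  Y correct: {y_correct:.1f}%  overall: {overall:.1f}%")
# prints: N correct: 7.4%  Y correct: 97.1%  overall: 69.2%
```

The overall figure hides the weak performance on the N class: 175 of the 189 observed N cases are predicted as Y.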
The factors found to be significant in this model are:
• Married
• Education
• Property Areas (Urban and Rural)
Step 2: We now add the Credit_History independent factor alongside the above-identified significant factors to create a new model.
Model Summary
The R² statistic has improved substantially, from 0.07 to 0.354.
Classification Table
                              Predicted Loan_Status
Observed                      N        Y        Percentage Correct
Step 1   Loan_Status   N      87       105      45.3
                       Y      16       406      96.2
The prediction accuracy for True Negatives has also increased from 7.4% to 45.3%.
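To put the two classification tables side by side, the sketch below computes the overall accuracy of each model and compares it with a naive rule that approves every loan. The "always Y" baseline is an added point of comparison, not part of the SPSS output. Note that the two tables cover 607 and 614 cases respectively.

```python
# Cell counts copied from the two classification tables (Y is the positive class).
tables = {
    "Model 1 (without Credit_History)": {"tn": 14, "fp": 175, "fn": 12, "tp": 406},
    "Model 2 (with Credit_History)": {"tn": 87, "fp": 105, "fn": 16, "tp": 406},
}

def summarize(tn, fp, fn, tp):
    """Overall accuracy, accuracy of an always-Y rule, and count of N cases predicted Y."""
    total = tn + fp + fn + tp
    return {
        "overall": 100.0 * (tn + tp) / total,
        "always_y": 100.0 * (fn + tp) / total,  # every observed Y counts as correct
        "n_predicted_as_y": fp,
    }

results = {name: summarize(**cells) for name, cells in tables.items()}
for name, r in results.items():
    print(f"{name}: overall {r['overall']:.1f}%, always-Y baseline {r['always_y']:.1f}%, "
          f"N predicted as Y: {r['n_predicted_as_y']}")
```

On these counts the first model is barely better than approving everyone (69.2% against 68.9%), while the second model reaches 80.3% against a 68.7% baseline and cuts the number of N cases predicted as Y from 175 to 105.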