day_no_46_date_no_06_07_2024_Logistic_Regression_Practical(pdf)
Saving creditcard.csv to creditcard.csv
In [ ]: import pandas as pd
data = pd.read_csv('creditcard.csv')
Out[ ]: V1 V2 V3 V4 V5 V6 V7 V8
5 rows × 30 columns
In [ ]: data.dtypes
file:///C:/Users/ER_SATENDRA_SHARMA/Downloads/day_no_46_date_no_07_07_2024_Logistic_Regression_Practical.html 1/8
7/7/24, 12:53 AM day_no_46_date_no_07_07_2024_Logistic_Regression_Practical
Out[ ]: V1 float64
V2 float64
V3 float64
V4 float64
V5 float64
V6 float64
V7 float64
V8 float64
V9 float64
V10 float64
V11 float64
V12 float64
V13 float64
V14 float64
V15 float64
V16 float64
V17 float64
V18 float64
V19 float64
V20 float64
V21 float64
V22 float64
V23 float64
V24 float64
V25 float64
V26 float64
V27 float64
V28 float64
V29 float64
Target int64
dtype: object
In [ ]: data.isnull().sum()
Out[ ]: V1 0
V2 0
V3 0
V4 0
V5 0
V6 0
V7 0
V8 0
V9 0
V10 0
V11 0
V12 0
V13 0
V14 0
V15 0
V16 0
V17 0
V18 0
V19 0
V20 0
V21 0
V22 0
V23 0
V24 0
V25 0
V26 0
V27 0
V28 0
V29 0
Target 0
dtype: int64
In [ ]: data.duplicated().sum()
Out[ ]: 675
Out[ ]: V1 V2 V3 V4 V5 V6 V7 V8
In [ ]: data.drop_duplicates(inplace=True)
In [ ]: data.duplicated().sum()
Out[ ]: 0
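The cells above count duplicate rows, drop them, and check again. A minimal sketch of the same pattern on a tiny hypothetical frame (the values below are made up for illustration), using the non-in-place form so the original frame stays available for comparison:

```python
import pandas as pd

# Hypothetical toy frame; row 2 repeats row 0 exactly.
toy = pd.DataFrame({"V1": [0.1, 0.5, 0.1], "Target": [0, 1, 0]})

n_dupes = int(toy.duplicated().sum())  # rows identical to an earlier row
deduped = toy.drop_duplicates().reset_index(drop=True)

print(n_dupes)        # number of duplicate rows found
print(deduped.shape)  # shape after dropping them
```

The reset_index(drop=True) call is optional. Without it the surviving rows keep their original index labels, so the largest label can exceed the row count.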
Out[ ]:
0
...
56957
56958
56959
56960
56961
Out[ ]: V1 V2 V3 V4 V5 V6 V7 V
Out[ ]: Target
0 56189
1 98
Name: count, dtype: int64
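The counts above have the shape of a value_counts() result on the Target column. The cell that produced them is not shown, so that is an assumption. A minimal sketch that rebuilds a column with the same class counts and also asks for proportions directly via normalize=True:

```python
import pandas as pd

# Rebuild a Target column with the class counts reported above (56189 and 98).
target = pd.Series([0] * 56189 + [1] * 98, name="Target")

counts = target.value_counts()                # absolute counts per class
shares = target.value_counts(normalize=True)  # fraction of rows per class

print(counts)
print(shares)
```

With normalize=True the class shares come out in one call, which avoids dividing the counts by hand.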
In [ ]: data.shape
In [ ]: 98 / 56287
Out[ ]: 0.0017410769804750653
In [ ]: 56189 / 56287
Out[ ]: 0.9982589230195249
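The next cell uses X_train and y_train, but the cell that creates them is not shown. A minimal sketch of a stratified split, run on synthetic stand-in data so it is self-contained. The test_size and random_state values are assumptions, and with the real frame the inputs would presumably be X = data.drop(columns='Target') and y = data['Target']:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 5000 rows, 2% positives.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 5))
y = np.zeros(5000, dtype=int)
y[:100] = 1

# stratify=y keeps the class ratio (nearly) identical in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
print(y_train.mean(), y_test.mean())
```

Stratifying matters with a positive class this rare: a plain random split can leave the test set with very few positive rows, which makes recall and precision unstable.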
In [ ]: from sklearn.ensemble import RandomForestClassifier
rf_classifier = RandomForestClassifier()
rf_classifier.fit(X_train, y_train)
Out[ ]: RandomForestClassifier()
In [ ]: y_pred = rf_classifier.predict(X_test)
In [ ]: from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
accuracy_score(y_test, y_pred)
Out[ ]: 0.9991117427607035
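Accuracy is a misleading metric for a classification model when the classes are imbalanced.
Here about 99.83% of the rows belong to class 0, so a model that always predicts 0 would
already score about 0.998 on accuracy without detecting a single positive case. For
imbalanced data, use roc_auc_score together with precision and recall instead.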
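To make the point about accuracy concrete, here is a small sketch of a "model" that always predicts the majority class, scored against labels with the same class counts as the deduplicated data (56189 and 98). The variable names are hypothetical:

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Labels with the same class counts as the deduplicated data set.
y_true = np.array([0] * 56189 + [1] * 98)
# A "model" that always predicts the majority class.
y_always_zero = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_always_zero)
rec = recall_score(y_true, y_always_zero, zero_division=0)

print(acc)  # equals 56189 / 56287
print(rec)  # no positive case is ever found
```

The accuracy is close to the random forest's score even though this baseline never flags a positive row, which is why accuracy alone says little here.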
In [ ]: roc_auc_score(y_test, y_pred)
Out[ ]: 0.7999110161950526
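One caveat about the call above: roc_auc_score is normally given scores or probabilities, not hard 0/1 predictions. With hard labels the ROC curve has a single interior point, and the area reduces to the mean of recall and specificity. That is consistent with the numbers in this notebook, where a recall of 0.6 and an AUC of about 0.7999 imply a specificity of about 0.9998. A hedged sketch on synthetic stand-in data (the data, split and model settings are assumptions) comparing both ways of calling it:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (roughly 4-5% positives).
rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=4000) > 1.9).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)               # hard 0/1 labels
y_score = model.predict_proba(X_test)[:, 1]  # estimated P(class 1)

auc_from_labels = roc_auc_score(y_test, y_pred)
auc_from_scores = roc_auc_score(y_test, y_score)

# With hard labels, the area equals the mean of recall and specificity.
recall = recall_score(y_test, y_pred)
specificity = recall_score(y_test, y_pred, pos_label=0)

print(auc_from_labels, (recall + specificity) / 2)
print(auc_from_scores)
```

Passing predict_proba(...)[:, 1] lets the metric evaluate the ranking over all thresholds rather than only the default 0.5 cut-off.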
In [ ]: precision_score(y_test, y_pred)
Out[ ]: 0.8571428571428571
In [ ]: recall_score(y_test, y_pred)
Out[ ]: 0.6
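Precision and recall are both ratios of confusion-matrix counts. The notebook does not show the counts themselves, so the sketch below uses hypothetical values, picked only because they reproduce the two scores above:

```python
# Hypothetical confusion-matrix counts, picked only because they match
# the reported scores; the real counts are not shown in the notebook.
tp, fp, fn = 12, 2, 8

precision = tp / (tp + fp)  # share of predicted positives that are real
recall = tp / (tp + fn)     # share of real positives that were caught
f1 = 2 * precision * recall / (precision + recall)

print(precision, recall, f1)
```

Read this way, a recall of 0.6 means that four in ten positive cases in the test set were missed, even though accuracy is above 0.999.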
In [ ]: importances
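The cell that defines importances is not shown. A plausible reading, stated here as an assumption, is that it holds the fitted forest's feature_importances_ attribute. A minimal sketch on synthetic stand-in data, including a ranking of the features with np.argsort:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data: only feature 0 carries signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = (X[:, 0] > 0).astype(int)

rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

importances = rf.feature_importances_    # one value per feature, sums to 1
ranking = np.argsort(importances)[::-1]  # feature indices, most important first

print(importances)
print(ranking)
```

With a DataFrame as input, the indices in ranking can be mapped back to column names, for example with X.columns[ranking].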
In [ ]: arr1
Out[ ]: array([54, 76, 79, 63, 85, 48, 60, 94, 52, 56, 87, 93, 54, 77, 46, 72, 47,
49, 62, 89, 62, 69, 56, 60, 56, 60, 85, 85, 53, 73])
In [ ]: print(np.sort(arr1))
print(np.argsort(arr1))
[46 47 48 49 52 53 54 54 56 56 56 60 60 60 62 62 63 69 72 73 76 77 79 85
85 85 87 89 93 94]
[14 16 5 17 8 28 12 0 9 24 22 6 25 23 20 18 3 21 15 29 1 13 2 4
26 27 10 19 11 7]
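np.sort returns the sorted values, while np.argsort returns the positions those values came from. In the output above, the two entries equal to 54 appear as index 12 and then index 0, because the default sort is not stable. A short sketch, using the same array, of a stable argsort and of picking the indices of the three largest values:

```python
import numpy as np

arr1 = np.array([54, 76, 79, 63, 85, 48, 60, 94, 52, 56, 87, 93, 54, 77, 46,
                 72, 47, 49, 62, 89, 62, 69, 56, 60, 56, 60, 85, 85, 53, 73])

order = np.argsort(arr1, kind="stable")  # ties keep their original order
top3 = np.argsort(arr1)[::-1][:3]        # indices of the three largest values

print(arr1[order])       # same values as np.sort(arr1)
print(top3, arr1[top3])
```

The reversed-argsort idiom is the usual way to turn an array of scores, such as feature importances, into a "top k" list of positions.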