Bank Customer Churn Analysis - Jupyter Notebook
Read dataset
In [2]:
import pandas as pd  # assumed to be imported in an earlier cell not shown here
df = pd.read_csv('Bank Churn_Modelling.csv')
df.head()
Out[2]:
RowNumber CustomerId Surname CreditScore Geography Gender Age Tenure Bala
In [3]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 RowNumber 10000 non-null int64
1 CustomerId 10000 non-null int64
2 Surname 10000 non-null object
3 CreditScore 10000 non-null int64
4 Geography 10000 non-null object
5 Gender 10000 non-null object
6 Age 10000 non-null int64
7 Tenure 10000 non-null int64
8 Balance 10000 non-null float64
9 NumOfProducts 10000 non-null int64
10 HasCrCard 10000 non-null int64
11 IsActiveMember 10000 non-null int64
12 EstimatedSalary 10000 non-null float64
13 Exited 10000 non-null int64
dtypes: float64(2), int64(9), object(3)
memory usage: 1.1+ MB
In [4]:
unnecessary_cols = ['RowNumber', 'CustomerId', 'Surname']
# Drop the identifier columns by label.
df = df.drop(columns=unnecessary_cols)
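The effect of the drop can be checked on a small frame. This is a minimal sketch, assuming a hypothetical three-row DataFrame `toy` with made-up values (not the bank dataset); only the column names come from the notebook.

```python
import pandas as pd

# Hypothetical toy frame standing in for the bank data (values are made up).
toy = pd.DataFrame({
    "RowNumber": [1, 2, 3],
    "CustomerId": [101, 102, 103],
    "Surname": ["A", "B", "C"],
    "CreditScore": [600, 650, 700],
    "Exited": [0, 1, 0],
})

unnecessary_cols = ["RowNumber", "CustomerId", "Surname"]

# drop(columns=...) takes the labels directly and returns a new DataFrame;
# the original frame is left unchanged.
trimmed = toy.drop(columns=unnecessary_cols)

remaining = list(trimmed.columns)
print(remaining)
```

Because `drop` returns a new object, the result has to be assigned back (as the notebook does with `df = ...`) for the change to persist.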
EDA
Categorical values
In [9]:
# Print the frequency of each level for every object-typed (categorical) column.
for i in df:
    if df[i].dtypes == object:
        print(df[i].value_counts(), "\n")
Geography
France 5014
Germany 2509
Spain 2477
Name: count, dtype: int64
Gender
Male 5457
Female 4543
Name: count, dtype: int64
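The raw counts are easier to compare as shares of the 10000 rows. A small sketch in plain Python, using only the counts printed above; the `counts` and `shares` names are illustrative.

```python
# Counts copied from the value_counts() output above.
counts = {
    "Geography": {"France": 5014, "Germany": 2509, "Spain": 2477},
    "Gender": {"Male": 5457, "Female": 4543},
}

# Convert each count into a fraction of its column total.
shares = {
    column: {level: n / sum(levels.values()) for level, n in levels.items()}
    for column, levels in counts.items()
}

for column, levels in shares.items():
    print(column)
    for level, share in levels.items():
        print(f"  {level}: {share:.2%}")
```

On the DataFrame itself, `df[col].value_counts(normalize=True)` should give the same proportions directly.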
In [11]:
# `le` is assumed to be a sklearn LabelEncoder created in a cell not shown here.
df['Geography'] = le.fit_transform(df['Geography'])
for i, j in enumerate(le.classes_):
    print(i, j)
0 France
1 Germany
2 Spain
In [12]:
# Refitting the same `le` replaces its classes_ with the Gender levels.
df['Gender'] = le.fit_transform(df['Gender'])
for i, j in enumerate(le.classes_):
    print(i, j)
0 Female
1 Male
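Since the two cells above reuse one encoder, the Geography mapping is no longer held by `le` once Gender has been fitted. One possible alternative, sketched here with made-up sample values and an illustrative `encoders` dict (neither is from the notebook), keeps a separate `LabelEncoder` per column so that codes can be mapped back later.

```python
from sklearn.preprocessing import LabelEncoder

# Small made-up samples; only the category names come from the output above.
samples = {
    "Geography": ["France", "Spain", "Germany", "France"],
    "Gender": ["Female", "Male", "Male", "Female"],
}

encoders = {}  # one fitted encoder per column
encoded = {}   # integer codes per column
for name, values in samples.items():
    enc = LabelEncoder()
    encoded[name] = enc.fit_transform(values)
    encoders[name] = enc

# classes_ is sorted, so the position of each label is its integer code.
geo_mapping = {str(label): code for code, label in enumerate(encoders["Geography"].classes_)}
gender_mapping = {str(label): code for code, label in enumerate(encoders["Gender"].classes_)}
print(geo_mapping, gender_mapping)

# The per-column encoder can still invert its own codes.
decoded = [str(x) for x in encoders["Geography"].inverse_transform(encoded["Geography"])]
print(decoded)
```

Note that integer codes for Geography imply an ordering (France < Germany < Spain) that some models may treat as meaningful; whether that matters depends on the model used.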
In [13]: df.head()
Out[13]:
   CreditScore  Geography  Gender  Age  Tenure    Balance  NumOfProducts  HasCrCard  IsA
0          619          0       0   42       2       0.00              1          1
1          608          2       0   41       1   83807.86              1          0
2          502          0       0   42       8  159660.80              3          1
3          699          0       0   39       1       0.00              2          0
4          850          2       0   43       2  125510.82              1          1
In [14]: df.isnull().sum()
Out[14]: CreditScore 0
Geography 0
Gender 0
Age 0
Tenure 0
Balance 0
NumOfProducts 0
HasCrCard 0
IsActiveMember 0
EstimatedSalary 0
Exited 0
dtype: int64
Models
In [17]:
from sklearn.metrics import classification_report, confusion_matrix
def print_metrics(model, X_train=X_train,y_train = y_train, X_test = X_
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(classification_report(y_test,y_pred))
    print(confusion_matrix(y_test,y_pred))
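One property of the function above is worth noting: Python evaluates default argument values once, when the `def` statement runs, so `X_train=X_train` binds whichever split exists at that moment. The sketch below passes the split explicitly instead. The name `evaluate`, the synthetic data from `make_classification`, and the choice of `LogisticRegression` are all assumptions for illustration, not the notebook's own setup.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split


def evaluate(model, X_train, y_train, X_test, y_test):
    """Fit the model on the training split and report metrics on the test split."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(classification_report(y_test, y_pred))
    cm = confusion_matrix(y_test, y_pred)
    print(cm)
    return cm


# Synthetic stand-in data, not the bank dataset.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

cm = evaluate(LogisticRegression(max_iter=1000), X_train, y_train, X_test, y_test)
```

Returning the matrix as well as printing it makes it possible to compare several models afterwards without re-fitting them.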
[[2354 62]
[ 536 48]]
[[2096 320]
[ 294 290]]
[[2299 117]
[ 305 279]]
[[2338 78]
[ 314 270]]
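The four matrices above all have the same row totals (2416 and 584), so they describe the same 3000-row test split. The sketch below derives accuracy, precision, and recall from them, assuming scikit-learn's layout (rows are true labels, columns are predictions) and treating `Exited = 1` as the positive class. The export does not show which model produced which matrix, so they are referred to by position only.

```python
# The four matrices printed above, in order of appearance.
matrices = [
    [[2354, 62], [536, 48]],
    [[2096, 320], [294, 290]],
    [[2299, 117], [305, 279]],
    [[2338, 78], [314, 270]],
]


def churn_metrics(cm):
    """Return accuracy, precision, and recall for the positive (churned) class."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


results = [churn_metrics(cm) for cm in matrices]
for position, metrics in enumerate(results, start=1):
    summary = ", ".join(f"{name}={value:.3f}" for name, value in metrics.items())
    print(f"matrix {position}: {summary}")
```

Recall on the churned class is the figure to watch here: the first matrix identifies only 48 of the 584 churners even though most of its predictions are correct overall.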