Bank Customer Churn Analysis - Jupyter Notebook


11/12/2023, 10:49 Bank Customer Churn Analysis - Jupyter Notebook

Importing necessary libraries


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Reading the dataset
In [2]:
df = pd.read_csv('Bank Churn_Modelling.csv')
df.head()

Out[2]:
   RowNumber  CustomerId   Surname  CreditScore Geography  Gender  Age  Tenure    Balance  ...
0          1    15634602  Hargrave          619    France  Female   42       2       0.00  ...
1          2    15647311      Hill          608     Spain  Female   41       1   83807.86  ...
2          3    15619304      Onio          502    France  Female   42       8  159660.80  ...
3          4    15701354      Boni          699    France  Female   39       1       0.00  ...
4          5    15737888  Mitchell          850     Spain  Female   43       2  125510.82  ...

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 RowNumber 10000 non-null int64
1 CustomerId 10000 non-null int64
2 Surname 10000 non-null object
3 CreditScore 10000 non-null int64
4 Geography 10000 non-null object
5 Gender 10000 non-null object
6 Age 10000 non-null int64
7 Tenure 10000 non-null int64
8 Balance 10000 non-null float64
9 NumOfProducts 10000 non-null int64
10 HasCrCard 10000 non-null int64
11 IsActiveMember 10000 non-null int64
12 EstimatedSalary 10000 non-null float64
13 Exited 10000 non-null int64
dtypes: float64(2), int64(9), object(3)
memory usage: 1.1+ MB

Dropping unwanted columns

localhost:8888/notebooks/KINGS LABS/4. Customer churn analysis/Bank Customer Churn Analysis.ipynb 1/11



In [4]:
# Identifier columns are not useful as model features, so drop them
unnecessary_cols = ['RowNumber', 'CustomerId', 'Surname']
df = df.drop(columns=unnecessary_cols)

EDA


In [5]:
# Pie chart for every low-cardinality column (fewer than 6 distinct values)
for column in df.columns:
    unique_values = df[column].value_counts()
    if df[column].nunique() < 6:
        plt.figure()
        plt.pie(unique_values, labels=unique_values.index, autopct='%1.1f%%')
        plt.title(f'Distribution of {column}')
        plt.axis('equal')

# Display all the pie charts
plt.show()
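The pie charts show each low-cardinality column's value shares visually. As a minimal sketch, the same shares can be printed as percentages with `value_counts(normalize=True)`; the helper name `low_cardinality_shares` and the toy frame are hypothetical stand-ins, and the threshold of 6 mirrors the loop above.

```python
import pandas as pd


def low_cardinality_shares(df: pd.DataFrame, max_unique: int = 6) -> dict:
    """Percentage share of each value, for columns with fewer than max_unique distinct values."""
    shares = {}
    for column in df.columns:
        if df[column].nunique() < max_unique:
            # normalize=True returns fractions; scale to percentages and round
            shares[column] = (df[column].value_counts(normalize=True) * 100).round(1)
    return shares


# Hypothetical toy frame standing in for the churn data
toy = pd.DataFrame({
    "Gender": ["Female", "Male", "Male", "Female", "Male", "Male"],
    "Exited": [1, 0, 0, 0, 1, 0],
    "Age": [42, 41, 39, 43, 30, 55],  # 6 distinct values, so it is skipped
})

shares = low_cardinality_shares(toy)
for name, share in shares.items():
    print(name)
    print(share, "\n")
```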


In [6]:
# Churn rate by Geography and Gender
plt.figure(figsize=(10, 6))

# Exited is 0/1, so its group mean is the fraction of customers who churned
churn_rate_geo_gender = df.groupby(['Geography', 'Gender'])['Exited'].mean().reset_index(name='Churn Rate')
sns.barplot(data=churn_rate_geo_gender, x='Geography', y='Churn Rate', hue='Gender')
plt.xlabel('Geography')
plt.ylabel('Churn Rate')
plt.title('Churn Rate by Geography & Gender')
plt.show()
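Because `Exited` is coded 0/1, the group mean above is the fraction of customers in each group who churned. A small sketch of the same calculation on a hypothetical toy frame, sorted so the highest-churn group comes first (the helper `churn_rate_by` is an illustrative name, not part of the notebook):

```python
import pandas as pd


def churn_rate_by(df: pd.DataFrame, keys: list) -> pd.DataFrame:
    """Mean of the 0/1 'Exited' flag per group, i.e. the share of customers who churned."""
    return (
        df.groupby(keys)["Exited"]
        .mean()
        .reset_index(name="Churn Rate")
        .sort_values("Churn Rate", ascending=False)
    )


# Hypothetical toy data, not the notebook's dataset
toy = pd.DataFrame({
    "Geography": ["France", "France", "Germany", "Germany", "Spain", "Spain"],
    "Gender": ["Female", "Male", "Female", "Male", "Female", "Male"],
    "Exited": [1, 0, 1, 1, 0, 0],
})

print(churn_rate_by(toy, ["Geography"]))
print(churn_rate_by(toy, ["Geography", "Gender"]))
```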


In [7]:
churn_counts = df['Exited'].value_counts()
colors = ['#7B68EE', '#483D8B']
plt.figure(figsize=(8, 6))
plt.bar(churn_counts.index, churn_counts.values, color=colors)
plt.xlabel('Churn (Exited)')
plt.ylabel('Count')
plt.xticks(churn_counts.index, labels=['Not Churned', 'Churned'])
plt.title('Count of Customers Churned vs. Not Churned')
plt.show()


In [8]:
plt.figure(figsize=(10, 6))
sns.countplot(data=df, x='IsActiveMember', hue='Exited', palette='Set1')
plt.xlabel('Active Membership')
plt.ylabel('Count')
plt.title('Active Membership Distribution by Churn')
plt.legend(['Not Churned', 'Churned'])
plt.xticks([0, 1], ['Inactive', 'Active'])
plt.show()
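The countplot shows raw counts, which can hide differences when the groups differ in size. A possible sketch for turning them into per-group churn rates uses `pd.crosstab` with `normalize='index'`; the toy data below is made up for illustration:

```python
import pandas as pd

# Hypothetical toy data, not the notebook's dataset
toy = pd.DataFrame({
    "IsActiveMember": [0, 0, 0, 1, 1, 1, 1],
    "Exited":         [1, 1, 0, 0, 0, 1, 0],
})

# normalize="index" divides each row by its total, so every row sums to 1
rates = pd.crosstab(toy["IsActiveMember"], toy["Exited"], normalize="index")
rates.index = ["Inactive", "Active"]
rates.columns = ["Not Churned", "Churned"]
print(rates.round(2))
```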

Categorical values
In [9]:
# Print value counts for every object (string) column
for i in df:
    if df[i].dtypes == object:
        print(df[i].value_counts(), "\n")

Geography
France 5014
Germany 2509
Spain 2477
Name: count, dtype: int64

Gender
Male 5457
Female 4543
Name: count, dtype: int64

Encoding categorical variables


In [10]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()

In [11]:
df['Geography'] = le.fit_transform(df['Geography'])
for i, j in enumerate(le.classes_):
    print(i, j)

0 France
1 Germany
2 Spain

In [12]:
df['Gender'] = le.fit_transform(df['Gender'])
for i, j in enumerate(le.classes_):
    print(i, j)

0 Female
1 Male
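`LabelEncoder` assigns integer codes in sorted order of the class labels, which is why France, Germany and Spain map to 0, 1 and 2. For a nominal column such as `Geography`, those codes imply an order that does not exist; tree-based models usually cope with this, but linear models such as logistic regression may not. A minimal sketch comparing the two encodings, assuming one-hot encoding via `pd.get_dummies` as the alternative:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of the Geography column
geo = pd.Series(["Spain", "France", "Germany", "France"])

le = LabelEncoder()
codes = le.fit_transform(geo)
print(le.classes_.tolist())  # classes are stored in sorted order
print(codes.tolist())        # each value replaced by its index in classes_

# One-hot alternative: one 0/1 column per country, no implied order
dummies = pd.get_dummies(geo, prefix="Geography", dtype=int)
print(dummies)
```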

In [13]:
df.head()

Out[13]:
   CreditScore  Geography  Gender  Age  Tenure    Balance  NumOfProducts  HasCrCard  ...
0          619          0       0   42       2       0.00              1          1  ...
1          608          2       0   41       1   83807.86              1          0  ...
2          502          0       0   42       8  159660.80              3          1  ...
3          699          0       0   39       1       0.00              2          0  ...
4          850          2       0   43       2  125510.82              1          1  ...

In [14]:
df.isnull().sum()

Out[14]: CreditScore 0
Geography 0
Gender 0
Age 0
Tenure 0
Balance 0
NumOfProducts 0
HasCrCard 0
IsActiveMember 0
EstimatedSalary 0
Exited 0
dtype: int64

Assigning dependent and independent variables
In [15]:
X = df.drop('Exited', axis=1)
y = df['Exited']


Splitting the dataset into training and testing sets

In [16]:
from sklearn.model_selection import train_test_split
# 70% of the rows for training, 30% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=.70)
X_train.shape, y_test.shape

Out[16]: ((7000, 10), (3000,))
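The test split contains 2416 non-churned and 584 churned customers, so the classes are imbalanced. One option, shown here as a sketch on hypothetical labels rather than the notebook's data, is to pass `stratify=y` so both splits keep the same class ratio; `random_state` is added only to make the split reproducible:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 80% class 0, 20% class 1
y = np.array([0] * 800 + [1] * 200)
X = np.arange(len(y)).reshape(-1, 1)

# stratify=y keeps the 80/20 ratio in both the training and the test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.70, stratify=y, random_state=42
)
print(X_train.shape, y_test.shape)
print(y_train.mean(), y_test.mean())  # share of class 1 in each split
```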

Models
In [17]:
from sklearn.metrics import classification_report, confusion_matrix

def print_metrics(model, X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test):
    # Fit on the training split, then report per-class metrics on the test split
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(classification_report(y_test, y_pred))
    print(confusion_matrix(y_test, y_pred))

In [18]:
from sklearn.linear_model import LogisticRegression
print_metrics(LogisticRegression())

              precision    recall  f1-score   support

           0       0.81      0.97      0.89      2416
           1       0.44      0.08      0.14       584

    accuracy                           0.80      3000
   macro avg       0.63      0.53      0.51      3000
weighted avg       0.74      0.80      0.74      3000

[[2354   62]
 [ 536   48]]
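The class-1 row of the report can be recomputed directly from the confusion matrix, where rows are the actual class and columns the predicted class. The arithmetic below uses the logistic-regression matrix printed above and shows why recall for churners is so low: only 48 of the 584 churners were predicted as churned.

```python
import numpy as np

# Confusion matrix reported above for LogisticRegression
# rows = actual class (0, 1), columns = predicted class (0, 1)
cm = np.array([[2354, 62],
               [536, 48]])

tn, fp, fn, tp = cm.ravel()
precision_1 = tp / (tp + fp)  # of customers predicted to churn, how many actually did
recall_1 = tp / (tp + fn)     # of customers who churned, how many were caught
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)
accuracy = (tp + tn) / cm.sum()

print(f"precision={precision_1:.2f} recall={recall_1:.2f} f1={f1_1:.2f} accuracy={accuracy:.2f}")
```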

In [19]:
from sklearn.tree import DecisionTreeClassifier
print_metrics(DecisionTreeClassifier())

              precision    recall  f1-score   support

           0       0.88      0.87      0.87      2416
           1       0.48      0.50      0.49       584

    accuracy                           0.80      3000
   macro avg       0.68      0.68      0.68      3000
weighted avg       0.80      0.80      0.80      3000

[[2096  320]
 [ 294  290]]


In [20]:
from xgboost import XGBClassifier
print_metrics(XGBClassifier())

              precision    recall  f1-score   support

           0       0.88      0.95      0.92      2416
           1       0.70      0.48      0.57       584

    accuracy                           0.86      3000
   macro avg       0.79      0.71      0.74      3000
weighted avg       0.85      0.86      0.85      3000

[[2299  117]
 [ 305  279]]

In [21]:
from sklearn.ensemble import RandomForestClassifier
print_metrics(RandomForestClassifier())

              precision    recall  f1-score   support

           0       0.88      0.97      0.92      2416
           1       0.78      0.46      0.58       584

    accuracy                           0.87      3000
   macro avg       0.83      0.72      0.75      3000
weighted avg       0.86      0.87      0.86      3000

[[2338   78]
 [ 314  270]]
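Overall accuracy (0.80 to 0.87) hides how differently the four models treat the churned class. A short sketch that recomputes class-1 precision, recall and F1 from the confusion matrices printed above makes the comparison explicit: by F1 the random forest edges out XGBoost, while the decision tree catches the most churners at the cost of many false alarms.

```python
# Confusion matrices as printed above (rows = actual class, columns = predicted class)
reported = {
    "LogisticRegression": [[2354, 62], [536, 48]],
    "DecisionTreeClassifier": [[2096, 320], [294, 290]],
    "XGBClassifier": [[2299, 117], [305, 279]],
    "RandomForestClassifier": [[2338, 78], [314, 270]],
}


def churn_class_scores(cm):
    """Precision, recall and F1 for class 1 (churned) from a 2x2 confusion matrix."""
    (tn, fp), (fn, tp) = cm
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)  # equivalent to the harmonic mean of precision and recall
    return precision, recall, f1


scores = {name: churn_class_scores(cm) for name, cm in reported.items()}
for name, (p, r, f) in scores.items():
    print(f"{name:<24} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```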

In [22]:
# Factors associated with higher churn (from the churn-rate plot by Geography & Gender):
# 1. Female (Gender)
# 2. Germany (Geography)
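The factors listed above come from the churn-rate plots. One possible model-based cross-check, sketched here on hypothetical synthetic data rather than the churn dataset, is the impurity-based `feature_importances_` of a fitted `RandomForestClassifier`. These importances measure how much a feature helps the trees split, not the direction of its effect, and they can favor features with many distinct values.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 1000

# Hypothetical stand-in data: the label depends on Age only, Noise is irrelevant
X = pd.DataFrame({
    "Age": rng.integers(18, 80, size=n),
    "Noise": rng.normal(size=n),
})
y = (X["Age"] > 50).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# Impurity-based importances, normalized to sum to 1 across features
importances = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances)
```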
