Bank Customer Churn Analysis - Jupyter Notebook

11/12/2023, 10:49 Bank Customer Churn Analysis - Jupyter Notebook
Importing necessary libraries

In [1]: import pandas as pd
        import numpy as np
        import matplotlib.pyplot as plt
        import seaborn as sns
Read dataset
In [2]: df = pd.read_csv('Bank Churn_Modelling.csv')
        df.head()
Out[2]:
   RowNumber  CustomerId   Surname  CreditScore Geography  Gender  Age  Tenure  Bala
0          1    15634602  Hargrave          619    France  Female   42       2
1          2    15647311      Hill          608     Spain  Female   41       1  8380
2          3    15619304      Onio          502    France  Female   42       8  15966
3          4    15701354      Boni          699    France  Female   39       1
4          5    15737888  Mitchell          850     Spain  Female   43       2  12551
In [3]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   RowNumber        10000 non-null  int64
 1   CustomerId       10000 non-null  int64
 2   Surname          10000 non-null  object
 3   CreditScore      10000 non-null  int64
 4   Geography        10000 non-null  object
 5   Gender           10000 non-null  object
 6   Age              10000 non-null  int64
 7   Tenure           10000 non-null  int64
 8   Balance          10000 non-null  float64
 9   NumOfProducts    10000 non-null  int64
 10  HasCrCard        10000 non-null  int64
 11  IsActiveMember   10000 non-null  int64
 12  EstimatedSalary  10000 non-null  float64
 13  Exited           10000 non-null  int64
dtypes: float64(2), int64(9), object(3)
memory usage: 1.1+ MB
Dropping unwanted columns
localhost:8888/notebooks/KINGS LABS/4. Customer churn analysis/Bank Customer Churn Analysis.ipynb 1/11

In [4]: # Identifier columns carry no predictive signal, so drop them by name
        unnecessary_cols = ['RowNumber', 'CustomerId', 'Surname']
        df = df.drop(columns=unnecessary_cols)
EDA

In [5]: for column in df.columns:
            unique_values = df[column].value_counts()
            if df[column].nunique() < 6:
                plt.figure()
                plt.pie(unique_values, labels=unique_values.index, autopct='%1.
                plt.title(f'Distribution of {column}')
                plt.axis('equal')

        # Display all the pie charts
        plt.show()
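Each pie chart above shows, for a column with fewer than six distinct values, the share of rows in each category. The same numbers can be inspected without plotting. This is a minimal sketch on made-up data: the `toy` frame and the `category_shares` helper are illustrative names, not part of the notebook.

```python
import pandas as pd

# Hypothetical toy data standing in for the churn DataFrame.
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Male", "Female", "Male", "Female", "Male"],
    "Exited": [0, 1, 0, 0, 1, 0, 0, 0],
    "Age": [42, 41, 39, 43, 44, 50, 27, 31],  # 8 distinct values, so it is skipped
})


def category_shares(frame, max_unique=6):
    """Return percentage shares for every column with fewer than max_unique values."""
    shares = {}
    for column in frame.columns:
        if frame[column].nunique() < max_unique:
            shares[column] = frame[column].value_counts(normalize=True) * 100
    return shares


shares = category_shares(toy)
for column, pct in shares.items():
    print(f"Distribution of {column}")
    print(pct.round(1).to_string(), "\n")
```

Here `Age` is left out because it has too many distinct values, which mirrors the `nunique() < 6` filter in the cell above.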



In [6]: # Churn rate by geography and gender
        plt.figure(figsize=(10, 6))

        churn_rate_geo_gender = df.groupby(['Geography','Gender'])['Exited'].me
        sns.barplot(data=churn_rate_geo_gender, x= 'Geography', y= 'Churn Rate'
        plt.xlabel('Geography')
        plt.ylabel('Churn Rate')
        plt.title('Churn Rate by Geography & Gender')
        plt.show()
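The `groupby` line in the cell above is cut off in this export, so the exact code is not recoverable. Because `Exited` is 0 or 1, its mean within a group is that group's churn rate. One way to build a table with a `Churn Rate` column, which the `sns.barplot` call expects, is sketched below on made-up data. The `reset_index(name=...)` step is an assumption about what the truncated line does.

```python
import pandas as pd

# Hypothetical toy data; the real notebook uses the full churn DataFrame.
toy = pd.DataFrame({
    "Geography": ["France", "France", "Germany", "Germany", "Germany", "Germany"],
    "Gender": ["Female", "Male", "Female", "Female", "Male", "Male"],
    "Exited": [1, 0, 1, 0, 0, 0],
})

# Mean of a 0/1 column per group = fraction of customers who churned.
churn_rate_geo_gender = (
    toy.groupby(["Geography", "Gender"])["Exited"]
    .mean()
    .reset_index(name="Churn Rate")  # assumed column name, matching the barplot's y
)

print(churn_rate_geo_gender)
```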

In [7]: churn_counts = df['Exited'].value_counts()
        colors = ['#7B68EE', '#483D8B']
        plt.figure(figsize=(8, 6))
        plt.bar(churn_counts.index, churn_counts.values, color=colors)
        plt.xlabel('Churn (Exited)')
        plt.ylabel('Count')
        plt.xticks(churn_counts.index, labels=['Not Churned', 'Churned'])
        plt.title('Count of Customers Churned vs. Not Churned')
        plt.show()

In [8]: plt.figure(figsize=(10, 6))
        sns.countplot(data=df, x='IsActiveMember', hue='Exited', palette='Set1'
        plt.xlabel('Active Membership')
        plt.ylabel('Count')
        plt.title('Active Membership Distribution by Churn')
        plt.legend(['Not Churned', 'Churned'])
        plt.xticks([0, 1], ['Inactive', 'Active'])
        plt.show()
Categorical values
In [9]: for i in df:
            if df[i].dtypes == object:
                print(df[i].value_counts(), "\n")
Geography
France     5014
Germany    2509
Spain      2477
Name: count, dtype: int64

Gender
Male      5457
Female    4543
Name: count, dtype: int64
Encoding categorical variables

In [10]: from sklearn.preprocessing import LabelEncoder
         le = LabelEncoder()
In [11]: df['Geography'] = le.fit_transform(df['Geography'])
         for i, j in enumerate(le.classes_):
             print(i, j)
0 France
1 Germany
2 Spain
In [12]: df['Gender'] = le.fit_transform(df['Gender'])
         for i, j in enumerate(le.classes_):
             print(i, j)
0 Female
1 Male
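`LabelEncoder` assigns integer codes in sorted order, which is why France, Germany and Spain become 0, 1 and 2. For a linear model such as logistic regression, those codes imply an ordering between countries that does not exist. One common alternative is one-hot encoding with `pd.get_dummies`, sketched here as an option and not as what the notebook does. The `toy` frame and the `geo_codes` mapping are made up for illustration.

```python
import pandas as pd

# Hypothetical toy column standing in for df['Geography'].
toy = pd.DataFrame({"Geography": ["France", "Spain", "Germany", "France"]})

# Reproduce the label codes printed above with an explicit, readable mapping.
geo_codes = {"France": 0, "Germany": 1, "Spain": 2}
label_encoded = toy["Geography"].map(geo_codes)

# One-hot alternative: one indicator column per country, no implied order.
one_hot = pd.get_dummies(toy["Geography"], prefix="Geography", dtype=int)

print(label_encoded.tolist())  # prints [0, 2, 1, 0]
print(one_hot)
```

Tree-based models such as the decision tree and random forest used later are less sensitive to this choice, since they split on thresholds and do not assume a linear effect.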
In [13]: df.head()
Out[13]:
   CreditScore  Geography  Gender  Age  Tenure    Balance  NumOfProducts  HasCrCard  IsA
0          619          0       0   42       2       0.00              1          1
1          608          2       0   41       1   83807.86              1          0
2          502          0       0   42       8  159660.80              3          1
3          699          0       0   39       1       0.00              2          0
4          850          2       0   43       2  125510.82              1          1
In [14]: df.isnull().sum()
Out[14]: CreditScore        0
         Geography          0
         Gender             0
         Age                0
         Tenure             0
         Balance            0
         NumOfProducts      0
         HasCrCard          0
         IsActiveMember     0
         EstimatedSalary    0
         Exited             0
         dtype: int64
Assigning dependent and independent variables
In [15]: X = df.drop('Exited', axis=1)
         y = df['Exited']

Splitting dataset to training and testing set
In [16]: from sklearn.model_selection import train_test_split
         X_train, X_test, y_train, y_test = train_test_split(X,y, train_size=.70
         X_train.shape, y_test.shape
Out[16]: ((7000, 10), (3000,))
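The `train_test_split` call above is truncated after `train_size=.70`, so any further arguments are unknown. Since churners are a minority (584 of the 3000 test rows in the reports that follow), it can help to pass `stratify=y` so both splits keep the same class ratio, and `random_state` so the split is repeatable. A sketch on synthetic data, where the arrays and the seed 42 are assumptions for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 1000 rows, 10 features, 20% positive class.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = np.array([1] * 200 + [0] * 800)

# stratify=y keeps the 80/20 class ratio in both the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.70, stratify=y, random_state=42
)

print(X_train.shape, y_test.shape)
print(y_train.mean(), y_test.mean())
```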
Models
In [17]: from sklearn.metrics import classification_report, confusion_matrix
         def print_metrics(model, X_train=X_train,y_train = y_train, X_test = X_
             model.fit(X_train, y_train)
             y_pred = model.predict(X_test)
             print(classification_report(y_test,y_pred))
             print(confusion_matrix(y_test,y_pred))
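The `def` line above is cut off, so the full signature is not recoverable. Note that default arguments such as `X_train=X_train` are evaluated once, when the function is defined, so a later re-split would not reach the helper unless this cell is run again. A hedged alternative that takes the data explicitly and returns its results is sketched below. `evaluate_model` and the synthetic data are hypothetical, not part of the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split


def evaluate_model(model, X_train, y_train, X_test, y_test):
    """Fit the model, then return the text report and the confusion matrix."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    report = classification_report(y_test, y_pred, zero_division=0)
    matrix = confusion_matrix(y_test, y_pred)
    return report, matrix


# Synthetic, imbalanced data standing in for the churn features (an assumption).
X, y = make_classification(
    n_samples=500, n_features=10, weights=[0.8, 0.2], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, train_size=0.70, stratify=y, random_state=0
)

report, matrix = evaluate_model(LogisticRegression(max_iter=1000), X_tr, y_tr, X_te, y_te)
print(report)
print(matrix)
```

Returning the report and matrix, instead of only printing them, makes it easier to compare several models side by side.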
In [18]: from sklearn.linear_model import LogisticRegression
         print_metrics(LogisticRegression())
              precision    recall  f1-score   support

           0       0.81      0.97      0.89      2416
           1       0.44      0.08      0.14       584

    accuracy                           0.80      3000
   macro avg       0.63      0.53      0.51      3000
weighted avg       0.74      0.80      0.74      3000

[[2354   62]
 [ 536   48]]
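The report and the confusion matrix above carry the same information. With rows as actual classes and columns as predicted classes (scikit-learn's convention), the churn-class figures can be recomputed by hand, which makes the weak recall easy to see: only 48 of 584 churners are caught.

```python
# Confusion matrix printed above for LogisticRegression:
# rows = actual class, columns = predicted class.
matrix = [[2354, 62],
          [536, 48]]

tn, fp = matrix[0]  # actual "not churned": correctly kept / wrongly flagged
fn, tp = matrix[1]  # actual "churned": missed / correctly flagged

precision = tp / (tp + fp)  # of predicted churners, how many really churned
recall = tp / (tp + fn)     # of actual churners, how many were found
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} accuracy={accuracy:.2f}")
# prints: precision=0.44 recall=0.08 f1=0.14 accuracy=0.80
```

An accuracy of 0.80 is about what predicting "not churned" for every customer would give (2416/3000 ≈ 0.81), so accuracy alone is a poor guide on this data; the recall for class 1 is the more telling number.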
In [19]: from sklearn.tree import DecisionTreeClassifier
         print_metrics(DecisionTreeClassifier())
              precision    recall  f1-score   support

           0       0.88      0.87      0.87      2416
           1       0.48      0.50      0.49       584

    accuracy                           0.80      3000
   macro avg       0.68      0.68      0.68      3000
weighted avg       0.80      0.80      0.80      3000

[[2096  320]
 [ 294  290]]

In [20]: from xgboost import XGBClassifier
         print_metrics(XGBClassifier())
              precision    recall  f1-score   support

           0       0.88      0.95      0.92      2416
           1       0.70      0.48      0.57       584

    accuracy                           0.86      3000
   macro avg       0.79      0.71      0.74      3000
weighted avg       0.85      0.86      0.85      3000

[[2299  117]
 [ 305  279]]
In [21]: from sklearn.ensemble import RandomForestClassifier
         print_metrics(RandomForestClassifier())
              precision    recall  f1-score   support

           0       0.88      0.97      0.92      2416
           1       0.78      0.46      0.58       584

    accuracy                           0.87      3000
   macro avg       0.83      0.72      0.75      3000
weighted avg       0.86      0.87      0.86      3000

[[2338   78]
 [ 314  270]]
In [22]: # Factors contributing to customer attrition:
         # 1. Female (Gender)
         # 2. Germany (Geography)
