Customer Churn Prediction

A Hands-on Project: Enhancing Customer Churn Prediction with Continuous Experiment Tracking in Machine Learning

You can use the Telco Customer Churn dataset from Kaggle. This dataset contains information
about telecom customers, including various features like contract type, monthly charges, and
whether the customer churned or not.

The goal of this project is to predict customer churn (whether a customer will leave the telecom
service) using a model stacking approach. Model stacking involves training multiple models and
combining their predictions using another model.

1. Import Libraries: Import necessary libraries and initialize Comet ML.

2. Load and Explore Data: Load dataset and perform exploratory data analysis (EDA).

3. Preprocessing: Preprocess data by encoding and scaling features.

4. Model Training: Train multiple machine learning models, including Logistic Regression,
Random Forest, Gradient Boosting, and Support Vector Machine.

5. Hyperparameter Tuning: Use Optuna to optimize hyperparameters for the models.

6. Ensemble Modeling: Create a stacking ensemble of models for improved predictions.

7. Optimization Results: Display the best hyperparameters and accuracy.

8. End Experiment: Conclude the Comet ML experiment.

This project will give you insights into dealing with classification problems, handling imbalanced
datasets (if applicable), and utilizing model stacking to enhance predictive performance.

0. Import Libraries
!pip install -q optuna comet_ml
import optuna
import comet_ml
from comet_ml import Experiment

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import (
    RandomForestClassifier,
    GradientBoostingClassifier,
    StackingClassifier,
)
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, log_loss, roc_auc_score
from sklearn.model_selection import GridSearchCV

from kaggle_secrets import UserSecretsClient

# Set display options to show all columns

pd.set_option('display.max_columns', None)

1. Initialize Comet ML
user_secrets = UserSecretsClient()
comet_api_key = user_secrets.get_secret("Comet API Key")

experiment = Experiment(
    api_key=comet_api_key,
    # ... any further arguments are cut off in the source
)

2. Load Data
# Load the dataset
data = pd.read_csv("/kaggle/input/telco-customer-churn/WA_Fn-UseC_-

   customerID  gender  SeniorCitizen Partner Dependents  tenure PhoneService  \
0  7590-VHVEG  Female              0     Yes         No       1
1  5575-GNVDE    Male              0      No         No      34
2  3668-QPYBK    Male              0      No         No       2
3  7795-CFOCW    Male              0      No         No      45
4  9237-HQITU  Female              0      No         No       2

      MultipleLines InternetService OnlineSecurity OnlineBackup  \
0  No phone service             DSL             No          Yes
1                No             DSL            Yes           No
2                No             DSL            Yes          Yes
3  No phone service             DSL            Yes           No
4                No     Fiber optic             No           No

  DeviceProtection TechSupport StreamingTV StreamingMovies   Contract  \
0               No          No          No              No  Month-to-
1              Yes          No          No              No        One
2               No          No          No              No  Month-to-
3              Yes         Yes          No              No        One
4               No          No          No              No  Month-to-

  PaperlessBilling              PaymentMethod  MonthlyCharges TotalCharges  \
0              Yes           Electronic check           29.85
1               No               Mailed check           56.95
2              Yes               Mailed check           53.85
3               No  Bank transfer (automatic)           42.30
4              Yes           Electronic check           70.70

  Churn
0    No
1    No
2   Yes
3    No
4   Yes

3. Perform EDA on the Dataset:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 customerID 7043 non-null object
1 gender 7043 non-null object
2 SeniorCitizen 7043 non-null int64
3 Partner 7043 non-null object
4 Dependents 7043 non-null object
5 tenure 7043 non-null int64
6 PhoneService 7043 non-null object
7 MultipleLines 7043 non-null object
8 InternetService 7043 non-null object
9 OnlineSecurity 7043 non-null object
10 OnlineBackup 7043 non-null object
11 DeviceProtection 7043 non-null object
12 TechSupport 7043 non-null object
13 StreamingTV 7043 non-null object
14 StreamingMovies 7043 non-null object
15 Contract 7043 non-null object
16 PaperlessBilling 7043 non-null object
17 PaymentMethod 7043 non-null object
18 MonthlyCharges 7043 non-null float64
19 TotalCharges 7043 non-null object
20 Churn 7043 non-null object
dtypes: float64(1), int64(2), object(18)
memory usage: 1.1+ MB

# Convert 'TotalCharges' column to numerical (unparseable entries become NaN)

data['TotalCharges'] = pd.to_numeric(data['TotalCharges'], errors='coerce')

# Drop rows with missing values

data = data.dropna()
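
The `info()` output above lists `TotalCharges` as `object`, which suggests that some entries are not clean numbers. The following is a minimal sketch on a toy frame (the values are made up for illustration, not read from the dataset) of how `pd.to_numeric` with `errors="coerce"` turns unparseable strings into NaN so that those rows can then be dropped:

```python
import pandas as pd

# Toy data (hypothetical values): the second entry is a blank string
toy = pd.DataFrame({"TotalCharges": ["29.85", " ", "1889.5"]})

# errors="coerce" replaces anything that cannot be parsed with NaN
toy["TotalCharges"] = pd.to_numeric(toy["TotalCharges"], errors="coerce")
n_missing = int(toy["TotalCharges"].isna().sum())

# Drop the rows that could not be converted
clean = toy.dropna()
print(toy["TotalCharges"].dtype, n_missing, len(clean))
```

Checking the number of NaN values before dropping is a cheap way to confirm that only a handful of rows are affected.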


3.1. Customer Churn Distribution

This plot shows the distribution of churn vs. non-churn customers. You can see the number of
customers who have churned (left the telecom service) and those who have not.

# Plot 1: Class Distribution (Churn vs. Non-Churn)

plt.figure(figsize=(6, 6))
ax = sns.countplot(data=data, x='Churn')
plt.title("Customer Churn Distribution")

# Adding data labels (rounded) to the bars

for p in ax.patches:
    ax.annotate(f'{int(round(p.get_height()))}',
                (p.get_x() + p.get_width() / 2., p.get_height()),
                ha='center', va='center', fontsize=12, color='black',
                xytext=(0, 5), textcoords='offset points')

# Log the plot to Comet

experiment.log_figure(figure=plt)
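
Since the overview mentions imbalanced data, it can help to quantify the class balance rather than only read it off the bar chart. The following is a minimal sketch on made-up labels (the 70/30 ratio is an assumption for illustration, not the dataset's actual ratio). `stratify` is a standard `train_test_split` argument that keeps the class proportions equal in both splits; the project code in this notebook does not use it:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical labels: 70 "No" and 30 "Yes"
y_toy = pd.Series(["No"] * 70 + ["Yes"] * 30, name="Churn")
X_toy = pd.DataFrame({"feature": range(len(y_toy))})

# Share of each class in the full data
print(y_toy.value_counts(normalize=True))

# stratify=y_toy preserves the class ratio in the train and validation sets
X_tr, X_va, y_tr, y_va = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)
train_share = (y_tr == "Yes").mean()
val_share = (y_va == "Yes").mean()
print(train_share, val_share)
```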



3.2. Numeric Feature Distribution:

These histograms show the distribution of numeric features (tenure, MonthlyCharges, and
TotalCharges) for the entire dataset.

# Plot 2: Numeric Feature Distribution

numerical_features = ['tenure', 'MonthlyCharges', 'TotalCharges']
plt.figure(figsize=(15, 5))
for i, feature in enumerate(numerical_features, 1):
    plt.subplot(1, 3, i)
    sns.histplot(data=data, x=feature, kde=True)
    plt.title(f'{feature} Distribution')

# Log the plot to Comet

experiment.log_figure(figure=plt)


3.3. Categorical Feature Distribution:

These plots show the distribution of categorical features (gender, SeniorCitizen, Partner,
Dependents, Contract, PaymentMethod) split by churn status.

These plots provide insights into how different categories of customers (e.g., seniors vs. non-
seniors, customers with partners vs. without) are distributed in terms of churn. You can identify
potential customer segments that are more likely to churn.

# Plot 3: Categorical Feature Distribution

categorical_features = ['gender', 'SeniorCitizen', 'Partner',
                        'Dependents', 'Contract', 'PaymentMethod']
plt.figure(figsize=(15, 10))
for i, feature in enumerate(categorical_features, 1):
    plt.subplot(2, 3, i)
    sns.countplot(data=data, x=feature, hue='Churn', palette='Set2')
    plt.title(f'{feature} Distribution by Churn')

# Log the plot to Comet

experiment.log_figure(figure=plt)
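
Count plots show absolute numbers; a churn rate per category makes segments of different sizes easier to compare. The following is a minimal sketch using `pd.crosstab` on a toy frame (the rows are invented for illustration and do not come from the Telco data):

```python
import pandas as pd

# Toy data (hypothetical): partner status and churn flag for eight customers
toy = pd.DataFrame({
    "Partner": ["Yes"] * 4 + ["No"] * 4,
    "Churn": ["No", "No", "No", "Yes", "Yes", "Yes", "No", "Yes"],
})

# normalize="index" turns each row of counts into proportions
churn_rate = pd.crosstab(toy["Partner"], toy["Churn"], normalize="index")
print(churn_rate)
```

The `Yes` column of the result is the churn rate within each category, so the same call can be repeated for every entry in `categorical_features`.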

3.4. Correlation Heatmap:

The heatmap displays the correlation between numeric features in the dataset.

Understanding feature correlations can help in feature selection. For instance, if
MonthlyCharges and TotalCharges are highly correlated, you might choose to keep only one of
them to avoid multicollinearity in your models. It also helps identify which features might be
more important in predicting churn.
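
As a rough illustration of this multicollinearity check, the sketch below builds a synthetic frame in which `TotalCharges` is roughly `tenure * MonthlyCharges` (an assumption made only for this example) and lists the feature pairs whose absolute Pearson correlation exceeds a threshold. The 0.8 cutoff is an arbitrary choice:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic data: total charges grow with tenure and monthly charges
tenure = rng.integers(1, 73, size=n)
monthly = rng.uniform(50, 100, size=n)
total = tenure * monthly + rng.normal(0, 50, size=n)
toy = pd.DataFrame(
    {"tenure": tenure, "MonthlyCharges": monthly, "TotalCharges": total}
)

corr = toy.corr(method="pearson")

# Collect pairs above the threshold (upper triangle only, no duplicates)
threshold = 0.8
cols = list(corr.columns)
high_pairs = [
    (a, b, round(float(corr.loc[a, b]), 3))
    for i, a in enumerate(cols)
    for b in cols[i + 1:]
    if abs(corr.loc[a, b]) > threshold
]
print(high_pairs)
```

On the real data, a pair that shows up in such a list is a candidate for dropping one of the two features, or for using a model that tolerates correlated inputs.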

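# Plot 4: Correlation Heatmap

plt.figure(figsize=(10, 8))
# numeric_only=True restricts the correlation to the numeric columns
correlation_matrix = data.corr(method='pearson', min_periods=1,
                               numeric_only=True)
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title("Correlation Heatmap")

# Log the plot to Comet

experiment.log_figure(figure=plt)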

3.5. Monthly Charges vs. Total Charges:

This scatterplot shows the relationship between Monthly Charges and Total Charges, with
points colored by churn status.

It appears that customers who have higher Total Charges are less likely to churn. This suggests
that long-term customers who spend more are more loyal. You can use this insight to focus on
retaining high-value, long-term customers by offering loyalty programs or incentives.

These business insights derived from EDA can guide feature engineering and model selection for
your churn prediction project. They help you understand the data's characteristics and make
informed decisions to optimize customer retention strategies.

# Plot 5: Monthly Charges vs. Total Charges

plt.figure(figsize=(8, 6))
sns.scatterplot(data=data, x='MonthlyCharges', y='TotalCharges',
hue='Churn', palette='Set2')
plt.title("Monthly Charges vs. Total Charges")
plt.xlabel("Monthly Charges")
plt.ylabel("Total Charges")
# Log the plot to Comet

experiment.log_figure(figure=plt)


4. Preprocessing
# Split data into features and target

X = data.drop("Churn", axis=1)
y = data["Churn"]

# Split data into train and validation sets

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2,
                                                  random_state=42)

# Encode categorical features, scale numerical features
# (sparse_output replaces the deprecated sparse argument in scikit-learn >= 1.2)

encoder = OneHotEncoder(handle_unknown="ignore", sparse_output=False)
scaler = StandardScaler()

# Fit on the training split only, then apply the same transform to validation

X_train_encoded = encoder.fit_transform(X_train[categorical_features])
X_val_encoded = encoder.transform(X_val[categorical_features])

X_train_scaled = scaler.fit_transform(X_train[numerical_features])
X_val_scaled = scaler.transform(X_val[numerical_features])

# Join the encoded and scaled blocks column-wise

X_train_processed = np.concatenate((X_train_encoded, X_train_scaled), axis=1)
X_val_processed = np.concatenate((X_val_encoded, X_val_scaled), axis=1)
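
The same encode-and-scale step can also be written with scikit-learn's `ColumnTransformer`, which applies each transformer to its own columns and joins the results, so no manual `np.concatenate` is needed. This is an alternative sketch, not the code used in this project; the toy frame is illustrative, and `sparse_threshold=0` is set so that the combined output is always a dense array:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy frame (illustrative values) with one categorical and two numeric columns
toy = pd.DataFrame({
    "Partner": ["Yes", "No", "No", "Yes"],
    "tenure": [1, 34, 2, 45],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30],
})

preprocessor = ColumnTransformer(
    [
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["Partner"]),
        ("num", StandardScaler(), ["tenure", "MonthlyCharges"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

# fit_transform learns the categories and scaling statistics, then transforms
processed = preprocessor.fit_transform(toy)
print(processed.shape)
```

With this approach, `preprocessor.fit_transform` would be called on the training split and `preprocessor.transform` on the validation split, mirroring the manual steps.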

5. Model Training and Hyperparameter Tuning:

Logistic Regression (logreg):

• Simple and interpretable model.
• Well-suited for binary classification tasks like churn prediction.
• Helps understand how features impact the chance of churn.

Random Forest Classifier (rf):

• Ensemble method combining multiple decision trees.
• Handles mixed feature types (categorical and numerical).
• Less prone to overfitting than a single decision tree, good for complex datasets.

Gradient Boosting Classifier (gb):

• Sequential ensemble in which each new tree corrects the errors of the previous ones.
• Captures complex relationships in data.
• Works well for various types of datasets.

Support Vector Machine (svm):

• Versatile model for linear and non-linear data.
• Can find complex decision boundaries.
• Useful when patterns between churn and non-churn are intricate.

Model Stacking

In this project, I stack a random forest, a gradient boosting classifier, and a support vector
machine, which have different characteristics and can capture different aspects of the customer
churn problem. A final estimator then learns how to combine their predictions. This ensemble
approach can help you achieve a more accurate and robust churn prediction model, ultimately
leading to better customer retention strategies and business outcomes.
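
To make the mechanics of stacking concrete, here is a minimal sketch on synthetic data from `make_classification` (not the Telco data). It builds the meta-features by hand from out-of-fold predictions, which is conceptually what scikit-learn's `StackingClassifier` does internally. The base models, fold count, and dataset size are assumptions chosen for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict, train_test_split

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25, random_state=0)

base_models = [
    RandomForestClassifier(n_estimators=50, random_state=0),
    GradientBoostingClassifier(n_estimators=50, random_state=0),
]

# Level 0: out-of-fold probabilities on the training set become meta-features,
# so the meta-model never sees predictions made on rows a base model was fit on
meta_train = np.column_stack([
    cross_val_predict(m, X_tr, y_tr, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# Refit each base model on all training data to score the validation set
meta_val = np.column_stack([
    m.fit(X_tr, y_tr).predict_proba(X_va)[:, 1] for m in base_models
])

# Level 1: a simple model learns how to combine the base predictions
meta_model = LogisticRegression().fit(meta_train, y_tr)
stacked_accuracy = accuracy_score(y_va, meta_model.predict(meta_val))
print(f"stacked accuracy: {stacked_accuracy:.3f}")
```

Using out-of-fold predictions matters: if the meta-model were trained on predictions the base models made for their own training rows, it would learn to trust them more than it should.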

5.1. Define an Optuna Objective Function:

def objective(trial):
    # Define hyperparameter search space for individual models.
    # Arguments shown as ... are cut off in the source and must be supplied.
    rf_params = {
        'n_estimators': trial.suggest_int('rf_n_estimators', 100, ...),
        'max_depth': trial.suggest_categorical('rf_max_depth', [None, 10, 20]),
        'min_samples_split': trial.suggest_int('rf_min_samples_split', 2, 10),
        'min_samples_leaf': trial.suggest_int('rf_min_samples_leaf', 1, 4),
    }

    gb_params = {
        'n_estimators': trial.suggest_int('gb_n_estimators', 100, ...),
        'learning_rate': trial.suggest_float('gb_learning_rate', 0.01, ...),
        'max_depth': trial.suggest_categorical('gb_max_depth', [3, 4, ...]),
    }

    svm_params = {
        'C': trial.suggest_categorical('svm_C', [0.1, 1, 10]),
        'kernel': trial.suggest_categorical('svm_kernel', ['linear', ...]),
    }

    # Create models with suggested hyperparameters
    rf = RandomForestClassifier(**rf_params)
    gb = GradientBoostingClassifier(**gb_params)
    svm = SVC(probability=True, **svm_params)

    # Train individual models
    rf.fit(X_train_processed, y_train)
    gb.fit(X_train_processed, y_train)
    svm.fit(X_train_processed, y_train)

    # Evaluate individual models on validation data
    rf_predictions = rf.predict(X_val_processed)
    gb_predictions = gb.predict(X_val_processed)
    svm_predictions = svm.predict(X_val_processed)

    # Calculate accuracy and ROC AUC for individual models
    rf_accuracy = accuracy_score(y_val, rf_predictions)
    gb_accuracy = accuracy_score(y_val, gb_predictions)
    svm_accuracy = accuracy_score(y_val, svm_predictions)

    rf_roc_auc = roc_auc_score(y_val, rf.predict_proba(X_val_processed)[:, 1])
    gb_roc_auc = roc_auc_score(y_val, gb.predict_proba(X_val_processed)[:, 1])
    svm_roc_auc = roc_auc_score(y_val, svm.predict_proba(X_val_processed)[:, 1])

    # Create a stacking ensemble with trained models
    estimators = [
        ('random_forest', rf),
        ('gradient_boosting', gb),
        ('svm', svm),
    ]

    # The logged final_estimator parameters match a logistic regression
    stacking_classifier = StackingClassifier(estimators=estimators,
                                             final_estimator=LogisticRegression())

    # Train the stacking ensemble
    stacking_classifier.fit(X_train_processed, y_train)

    # Evaluate the stacking ensemble on validation data
    stacking_predictions = stacking_classifier.predict(X_val_processed)
    stacking_accuracy = accuracy_score(y_val, stacking_predictions)
    stacking_roc_auc = roc_auc_score(
        y_val, stacking_classifier.predict_proba(X_val_processed)[:, 1])

    # Log parameters and metrics to Comet ML
    experiment.log_parameters({
        'rf_n_estimators': rf_params['n_estimators'],
        'rf_max_depth': rf_params['max_depth'],
        'rf_min_samples_split': rf_params['min_samples_split'],
        'rf_min_samples_leaf': rf_params['min_samples_leaf'],
        'gb_n_estimators': gb_params['n_estimators'],
        'gb_learning_rate': gb_params['learning_rate'],
        'gb_max_depth': gb_params['max_depth'],
        'svm_C': svm_params['C'],
        'svm_kernel': svm_params['kernel'],
    })

    experiment.log_metrics({
        'rf_accuracy': rf_accuracy,
        'gb_accuracy': gb_accuracy,
        'svm_accuracy': svm_accuracy,
        'rf_roc_auc': rf_roc_auc,
        'gb_roc_auc': gb_roc_auc,
        'svm_roc_auc': svm_roc_auc,
        'stacking_accuracy': stacking_accuracy,
        'stacking_roc_auc': stacking_roc_auc,
    })

    # Return the negative accuracy as Optuna aims to minimize the objective
    return -stacking_accuracy
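
The objective scores each model in two ways: accuracy uses the hard labels from `predict`, while ROC AUC uses the probability of the positive class from `predict_proba(...)[:, 1]`. A small sketch with hand-made values (not model output) shows the difference. It assumes string labels `"No"`/`"Yes"` as in the `Churn` column; scikit-learn sorts class labels, so column 1 corresponds to `"Yes"`:

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Hand-made example: true labels and the predicted probability of "Yes"
y_true = np.array(["No", "No", "Yes", "Yes", "No", "Yes"])
proba_yes = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.90])

# Accuracy needs hard labels, obtained here with a 0.5 cutoff
y_pred = np.where(proba_yes >= 0.5, "Yes", "No")
acc = accuracy_score(y_true, y_pred)

# ROC AUC ranks the scores and does not depend on any single cutoff
auc = roc_auc_score(y_true, proba_yes)

print(f"accuracy={acc:.3f}, roc_auc={auc:.3f}")
```

Because the two metrics measure different things, a model can rank customers well (high ROC AUC) and still have modest accuracy at the default cutoff, which is one reason to log both.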

5.2. Optuna Hyperparameter Optimization:

Now, you can use Optuna to optimize the hyperparameters of your models. Optuna will search
the hyperparameter space defined in the objective function and log the results to Comet ML.

A note on the optimization direction: the study below uses direction='minimize', and the
objective returns the negative stacking accuracy, so minimizing the objective is equivalent to
maximizing accuracy. If you prefer to return accuracy or ROC AUC directly, use
direction='maximize' instead.

from tabulate import tabulate

# Create and optimize the study

# Adjust direction based on your optimization goal
study = optuna.create_study(direction='minimize')
# You can adjust the number of trials
study.optimize(objective, n_trials=100)

# Get the best hyperparameters and results
# (best_params holds the rf, gb, and svm settings of the best trial)

best_params = study.best_params
best_accuracy = -study.best_value  # Convert back to positive accuracy

# Convert the dictionary to a list of key-value pairs for tabulation

param_table = [(key, value) for key, value in best_params.items()]

# Display the best parameters as a table
# (any further tabulate arguments are cut off in the source)

best_params_table = tabulate(param_table, headers=["Parameter", "Value"])

print(f"Best RF Hyperparameters:\n{best_params_table}")
print(f"Best Accuracy: {best_accuracy}")

Best RF Hyperparameters:
| Parameter | Value |
| rf_n_estimators | 300 |
| rf_max_depth | 20 |
| rf_min_samples_split | 8 |
| rf_min_samples_leaf | 2 |
| gb_n_estimators | 139 |
| gb_learning_rate | 0.09345289942291049 |
| gb_max_depth | 4 |
| svm_C | 1 |
| svm_kernel | rbf |
Best Accuracy: 0.7917555081734187


COMET INFO: Experiment Summary
COMET INFO: display_summary_level : 1
COMET INFO: Metrics [count] (min, max):
COMET INFO: gb_accuracy [100] : (0.7640369580668088,
COMET INFO: gb_roc_auc [100] : (0.8042899814154298,
COMET INFO: rf_accuracy [100] : (0.7604832977967306,
COMET INFO: rf_roc_auc [100] : (0.7839880209762333,
COMET INFO: stacking_accuracy [100] : (0.7782515991471215,
COMET INFO: stacking_roc_auc [100] : (0.8136417992348748,
COMET INFO: svm_accuracy [100] : (0.7732764747690121,
COMET INFO: svm_roc_auc [100] : (0.7664970414813818,
COMET INFO: Parameters:
COMET INFO: bootstrap : True
COMET INFO: break_ties : False
COMET INFO: cache_size : 200
COMET INFO: categories : auto
COMET INFO: ccp_alpha : 0.0
COMET INFO: class_weight : 1
COMET INFO: coef0 : 0.0
COMET INFO: constant : 1
COMET INFO: copy : True
COMET INFO: criterion :
COMET INFO: cv : 1
COMET INFO: decision_function_shape : ovr
COMET INFO: degree : 3
COMET INFO: drop : 1
COMET INFO: dtype : <class
COMET INFO: dual : False
COMET INFO: estimators :
[('random_forest', RandomForestClassifier(max_depth=20,
min_samples_leaf=2, min_samples_split=9,
n_estimators=295)), ('gradient_boosting',
n_estimators=113)), ('svm', SVC(C=1,
kernel='linear', probability=True))]
COMET INFO: final_estimator :
COMET INFO: final_estimator__C : 1.0
COMET INFO: final_estimator__class_weight : 1
COMET INFO: final_estimator__dual : False
COMET INFO: final_estimator__fit_intercept : True
COMET INFO: final_estimator__intercept_scaling : 1
COMET INFO: final_estimator__l1_ratio : 1
COMET INFO: final_estimator__max_iter : 100
COMET INFO: final_estimator__multi_class : auto
COMET INFO: final_estimator__n_jobs : 1
COMET INFO: final_estimator__penalty : l2
COMET INFO: final_estimator__random_state : 1
COMET INFO: final_estimator__solver : lbfgs
COMET INFO: final_estimator__tol : 0.0001
COMET INFO: final_estimator__verbose : 0
COMET INFO: final_estimator__warm_start : False
COMET INFO: fit_intercept : True
COMET INFO: gamma : scale
COMET INFO: gb_learning_rate :
COMET INFO: gb_max_depth : 4
COMET INFO: gb_n_estimators : 113
COMET INFO: gradient_boosting :
COMET INFO: gradient_boosting__ccp_alpha : 0.0
COMET INFO: gradient_boosting__criterion :
COMET INFO: gradient_boosting__init : 1
COMET INFO: gradient_boosting__learning_rate :
COMET INFO: gradient_boosting__loss : log_loss
COMET INFO: gradient_boosting__max_depth : 4
COMET INFO: gradient_boosting__max_features : 1
COMET INFO: gradient_boosting__max_leaf_nodes : 1
COMET INFO: gradient_boosting__min_impurity_decrease : 0.0
COMET INFO: gradient_boosting__min_samples_leaf : 1
COMET INFO: gradient_boosting__min_samples_split : 2
COMET INFO: gradient_boosting__min_weight_fraction_leaf : 0.0
COMET INFO: gradient_boosting__n_estimators : 113
COMET INFO: gradient_boosting__n_iter_no_change : 1
COMET INFO: gradient_boosting__random_state : 1
COMET INFO: gradient_boosting__subsample : 1.0
COMET INFO: gradient_boosting__tol : 0.0001
COMET INFO: gradient_boosting__validation_fraction : 0.1
COMET INFO: gradient_boosting__verbose : 0
COMET INFO: gradient_boosting__warm_start : False
COMET INFO: handle_unknown : ignore
COMET INFO: init : 1
COMET INFO: intercept_scaling : 1
COMET INFO: kernel : linear
COMET INFO: l1_ratio : 1
COMET INFO: learning_rate :
COMET INFO: loss : log_loss
COMET INFO: max_categories : 1
COMET INFO: max_depth : 4
COMET INFO: max_features : 600
COMET INFO: max_iter : 100
COMET INFO: max_leaf_nodes : 1
COMET INFO: max_samples : 1
COMET INFO: min_frequency : 1
COMET INFO: min_impurity_decrease : 0.0
COMET INFO: min_samples_leaf : 1
COMET INFO: min_samples_split : 2
COMET INFO: min_weight_fraction_leaf : 0.0
COMET INFO: multi_class : auto
COMET INFO: n_estimators : 113
COMET INFO: n_iter_no_change : 1
COMET INFO: n_jobs : 1
COMET INFO: oob_score : False
COMET INFO: passthrough : False
COMET INFO: penalty : l2
COMET INFO: probability : True
COMET INFO: random_forest :
RandomForestClassifier(max_depth=20, min_samples_leaf=2,
COMET INFO: random_forest__bootstrap : True
COMET INFO: random_forest__ccp_alpha : 0.0
COMET INFO: random_forest__class_weight : 1
COMET INFO: random_forest__criterion : gini
COMET INFO: random_forest__max_depth : 20
COMET INFO: random_forest__max_features : sqrt
COMET INFO: random_forest__max_leaf_nodes : 1
COMET INFO: random_forest__max_samples : 1
COMET INFO: random_forest__min_impurity_decrease : 0.0
COMET INFO: random_forest__min_samples_leaf : 2
COMET INFO: random_forest__min_samples_split : 9
COMET INFO: random_forest__min_weight_fraction_leaf : 0.0
COMET INFO: random_forest__n_estimators : 295
COMET INFO: random_forest__n_jobs : 1
COMET INFO: random_forest__oob_score : False
COMET INFO: random_forest__random_state : 1
COMET INFO: random_forest__verbose : 0
COMET INFO: random_forest__warm_start : False
COMET INFO: random_state : 181532
COMET INFO: rf_max_depth : 20
COMET INFO: rf_min_samples_leaf : 2
COMET INFO: rf_min_samples_split : 9
COMET INFO: rf_n_estimators : 295
COMET INFO: shrinking : True
COMET INFO: solver : lbfgs
COMET INFO: sparse : False
COMET INFO: sparse_output : False
COMET INFO: splitter : best
COMET INFO: stack_method : auto
COMET INFO: strategy : prior
COMET INFO: subsample : 1.0
COMET INFO: svm : SVC(C=1,
kernel='linear', probability=True)
COMET INFO: svm_C : 1
COMET INFO: svm__C : 1
COMET INFO: svm__break_ties : False
COMET INFO: svm__cache_size : 200
COMET INFO: svm__class_weight : 1
COMET INFO: svm__coef0 : 0.0
COMET INFO: svm__decision_function_shape : ovr
COMET INFO: svm__degree : 3
COMET INFO: svm__gamma : scale
COMET INFO: svm__kernel : linear
COMET INFO: svm__max_iter : -1
COMET INFO: svm__probability : True
COMET INFO: svm__random_state : 1
COMET INFO: svm__shrinking : True
COMET INFO: svm__tol : 0.001
COMET INFO: svm__verbose : False
COMET INFO: svm_kernel : linear
COMET INFO: tol : 0.0001
COMET INFO: validation_fraction : 0.1
COMET INFO: verbose : 0
COMET INFO: warm_start : False
COMET INFO: with_mean : True
COMET INFO: with_std : True
COMET INFO: Uploads:
COMET INFO: conda-environment-definition : 1
COMET INFO: conda-info : 1
COMET INFO: conda-specification : 1
COMET INFO: environment details : 1
COMET INFO: figures : 5
COMET INFO: filename : 1
COMET INFO: installed packages : 1
COMET INFO: notebook : 1
COMET INFO: os packages : 1
COMET INFO: source_code : 1
COMET INFO: Please wait for metadata to finish uploading (timeout is
3600 seconds)

6. Interpret the results

A detailed explanation of the results, and what they mean in business terms, is given in the
accompanying blog post.

• GitHub Repository
• Kaggle Project
• Medium Article

