Ads Phase 4

Customer Segmentation using Data Science
Team Leader: DINESH M
NM ID: au723721205014
PROJECT: CUSTOMER SEGMENTATION USING DATA SCIENCE
INTRODUCTION:
Customer segmentation is a marketing strategy that involves
dividing a company's customer base into distinct groups or segments based on
shared characteristics or behaviours. The problem is to implement data science
techniques to segment customers based on their behaviour, preferences, and
demographic attributes. The goal is to enable businesses to personalize
marketing strategies and enhance customer satisfaction. This project involves
data collection, data preprocessing, feature engineering, clustering algorithms,
visualization, and interpretation of results. In customer segmentation using data
science, defining the problem is a critical step in leveraging data-driven
techniques to divide a customer base into meaningful and actionable segments.
It involves specifying the objectives and goals of the segmentation project,
clarifying what you aim to achieve, and setting the context for how data science
will be used to solve specific business problems.
WORKS DONE IN PREVIOUS PHASES

DEFINITION PHASE
➢ Customer segmentation is the process of grouping customers
according to how and why they buy products.
➢ The problem is to implement data science techniques to
segment customers based on their behavior, preferences, and
demographic attributes.
➢ The goal is to enable businesses to personalize marketing
strategies and enhance customer satisfaction.
➢ This project involves data collection, data preprocessing,
feature engineering, clustering algorithms, visualization, and
interpretation of results.
➢ The main goal of customer segmentation using data science
is to divide the customer base into distinct groups based on
similar characteristics.
➢ These segments are useful for many business purposes.
INNOVATION PHASE:
In the context of customer segmentation using data science, the innovation
phase is a critical step in the process that involves developing and implementing
novel approaches, techniques, or technologies to enhance the accuracy,
effectiveness, and efficiency of customer segmentation. This phase is about
pushing the boundaries of traditional methods to extract deeper insights and
create more actionable segments. Here's an overview of the innovation phase in
customer segmentation using data science:
➢ Identifying Data Sources: To drive innovation, you need to consider
where you can obtain additional or unique data sources. These sources
might include IoT devices, social media, customer feedback, or external
data like weather patterns, economic indicators, or competitor
information.
➢ Advanced Analytics Techniques: Leveraging cutting-edge analytics
techniques can set your customer segmentation apart. This might involve:
o Machine Learning: Utilizing machine learning algorithms to
identify patterns and trends in the data that might not be apparent
through traditional statistical methods.
o Deep Learning: Employing neural networks for more complex and
nuanced customer segmentations, especially in cases involving
unstructured data like text and images.
o Natural Language Processing (NLP): Applying NLP to analyze
customer reviews, chat logs, and social media conversations to
gain a deeper understanding of customer sentiment and
preferences.
o Predictive Modeling: Creating predictive models to anticipate
future customer behavior and needs within each segment.
o Reinforcement Learning: Using reinforcement learning algorithms
to optimize marketing strategies for different customer segments.
➢ Data Enrichment: Innovations in data enrichment involve integrating
external data sources, such as third-party databases or publicly available
datasets, to supplement and enhance your existing customer data.
➢ Real-time Segmentation: Traditional segmentation is often based on static
data, but innovation may involve creating real-time or dynamic
segmentation models that adapt to changes in customer behavior
instantly.
➢ Personalization: Innovations in personalization can include the
development of recommendation systems that suggest products or content
based on individual customer preferences. Machine learning models can
continuously refine recommendations as customer behavior evolves.
➢ AI-Powered Automation: Implementing artificial intelligence (AI) to
automate the process of segmenting customers in real-time and making
personalized recommendations or decisions without human intervention.
➢ Behavioral Analysis: Innovations may involve a deeper dive into
understanding the motivations and behavior of customers through
advanced behavioral analytics tools, enabling you to predict future
behavior more accurately.
➢ Privacy and Ethics: Innovation must address the ethical considerations
and privacy concerns surrounding the collection and use of customer
data. This includes compliance with regulations like GDPR, data
security, and customer consent.
➢ Iterative Improvement: The innovation phase should be iterative, with
continuous monitoring, testing, and refining of segmentation models and
methods. This allows your organization to adapt to changing market
conditions and customer behavior.
➢ Cross-functional Collaboration: Encouraging collaboration between data
scientists, marketers, product developers, and other relevant teams is vital
in the innovation phase. Cross-functional teams can leverage different
perspectives and expertise to drive innovation in customer segmentation.
➢ Measuring Success: Develop metrics to measure the success of your
innovative customer segmentation strategies. Metrics might include
increased conversion rates, customer satisfaction, revenue growth, and
more.
DEVELOPMENT PHASE 1
In this phase we loaded the provided dataset and pre-processed it using
Python library packages and the necessary methods.
Dataset provided to us:
(https://www.kaggle.com/datasets/akram24/mall-customers)
PHASE 4
➢ In this phase we apply models to the pre-processed data and evaluate
those models.
➢ This is carried out through:
▪ Feature Engineering
▪ Clustering Algorithm
▪ Visualization
Feature Engineering:
✓ The main objective of the feature engineering stage is to create the
relevant features that capture customer behaviour and preferences.
✓ It also generates new features based on customer interactions and
demographics.
✓ It reduces the dimensionality of the data where necessary.
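The feature engineering goals above can be sketched as follows. This is a minimal illustration, not the exact steps used in the project: it uses the first five rows of the mall dataset (the same rows df.head() prints), and the column name IsFemale, the choice of StandardScaler, and the use of PCA with two components are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# First five rows of the mall dataset, typed in by hand for this sketch.
sample = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4, 5],
    "Genre": ["Male", "Male", "Female", "Female", "Female"],
    "Age": [19, 21, 20, 23, 31],
    "Annual Income (k$)": [15, 15, 16, 16, 17],
    "Spending Score (1-100)": [39, 81, 6, 77, 40],
})

# New feature from demographics: encode Genre as a 0/1 indicator
# ("IsFemale" is a hypothetical column name).
sample["IsFemale"] = (sample["Genre"] == "Female").astype(int)

# Put all numeric features on a common scale so that no single unit
# dominates the distance computations used by clustering.
numeric_columns = ["Age", "Annual Income (k$)", "Spending Score (1-100)", "IsFemale"]
scaled = StandardScaler().fit_transform(sample[numeric_columns])

# Optional dimensionality reduction: project the four features onto two components.
reduced = PCA(n_components=2).fit_transform(scaled)
print(scaled.shape, reduced.shape)
```

Scaling matters mainly when features with different units are clustered together; with only income and spending score, as used later in this phase, the raw values are already on comparable ranges.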
Import Libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

%matplotlib inline
from IPython.display import Image
Loading Data
df = pd.read_csv('/kaggle/input/mall-customers/Mall_Customers.csv')
df.head()
   CustomerID   Genre  Age  Annual Income (k$)  Spending Score (1-100)
0           1    Male   19                  15                      39
1           2    Male   21                  15                      81
2           3  Female   20                  16                       6
3           4  Female   23                  16                      77
4           5  Female   31                  17                      40
The mall now wants to segment its customers based on two features:
Annual Income and Spending Score.
This is a typical clustering problem, which we solve here with K-Means
clustering.
df.shape
(200, 5)
X = df.iloc[:,[3,4]].values
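The positional selection above depends on the column order of the CSV. As a hedged alternative, the same two features can be selected by name; the sketch below assumes the column names shown in the df.head() output and uses a hand-typed five-row sample instead of the full file.

```python
import pandas as pd

# Hand-typed copy of the first five rows, standing in for the full dataset.
sample = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4, 5],
    "Genre": ["Male", "Male", "Female", "Female", "Female"],
    "Age": [19, 21, 20, 23, 31],
    "Annual Income (k$)": [15, 15, 16, 16, 17],
    "Spending Score (1-100)": [39, 81, 6, 77, 40],
})

# Select the two clustering features by name rather than by position.
feature_columns = ["Annual Income (k$)", "Spending Score (1-100)"]
X_by_name = sample[feature_columns].to_numpy()

# Positional selection, as used in this report, gives the same array.
X_by_position = sample.iloc[:, [3, 4]].values
print(X_by_name.shape)
```

Selecting by name keeps working if columns are later added or reordered, at the cost of failing loudly when a column is renamed.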
TRAIN_TEST_SPLIT
Sklearn
➢ Scikit-learn, often abbreviated as sklearn, is a popular machine learning
library in Python.
➢ It provides a wide range of tools and algorithms for tasks related to data
analysis, machine learning, and data mining.
➢ Scikit-learn is open-source and built on top of other Python libraries such
as NumPy, SciPy, and Matplotlib.
Key features of sklearn
➢ Supervised Learning
➢ Unsupervised Learning
➢ Preprocessing
➢ Model Selection
➢ Model Evaluation
➢ Pipelines
➢ Feature Extraction
➢ Datasets
Clustering Algorithm:
In this step I used the K-Means clustering algorithm.
K-Means clustering is an unsupervised learning algorithm used to solve
clustering problems in machine learning and data science.
Here K is the number of clusters to create, chosen in advance: K=2 gives
two clusters, K=3 gives three clusters, and so on.
The working of the K-Means algorithm is explained in the steps below:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as the initial centroids. (They need not be
points from the input dataset.)
Step-3: Assign each data point to its closest centroid, which forms the
predefined K clusters.
Step-4: Compute the mean of the points in each cluster and place the new
centroid of that cluster there.
Step-5: Repeat Step-3, that is, reassign each data point to the new closest
centroid.
Step-6: If any reassignment occurs, go to Step-4; otherwise go to FINISH.
Step-7: The model is ready.
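The steps above can be sketched in plain NumPy. This is an illustrative implementation written for this report, not the scikit-learn code used later; the function name kmeans, the toy array X_demo, and the choice to start from K randomly picked data points are assumptions.

```python
import numpy as np

def kmeans(X, k, max_iter=300, seed=0):
    """Plain K-Means following Steps 1-7 (k is chosen by the caller in Step 1)."""
    rng = np.random.default_rng(seed)
    # Step 2: pick k distinct data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Steps 3 and 5: assign every point to its closest centroid.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        # Step 6: stop as soon as no point changes cluster.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    # Step 7: the labels and centroids are the fitted model.
    return labels, centroids

# Toy data (hypothetical, not the mall dataset): two well-separated groups.
X_demo = np.array(
    [[15, 39], [16, 41], [17, 40], [80, 90], [82, 88], [81, 92]], dtype=float
)
labels_demo, centroids_demo = kmeans(X_demo, k=2)
print(labels_demo)
print(centroids_demo)
```

The loop ends either when assignments stop changing or after max_iter passes, which mirrors the max_iter parameter passed to scikit-learn's KMeans in this report.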
Creating the Clusters

from sklearn.cluster import KMeans
Using the elbow method to find optimum number of clusters
wcss = []
for i in range(1,11):
    km = KMeans(n_clusters=i,init = 'k-means++',max_iter=300,n_init=10,random_state=0)
    km.fit(X)
    wcss.append(km.inertia_)
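The value collected from km.inertia_ is the within-cluster sum of squares (WCSS): the squared distance from each point to the centroid of its own cluster, summed over all points. The sketch below recomputes it by hand to show what the elbow curve measures; X_demo is synthetic data invented for the example, not the mall dataset.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical stand-in for X: three synthetic groups of 2-D points.
X_demo = np.vstack([
    rng.normal(loc=center, scale=1.0, size=(30, 2))
    for center in [(20, 20), (60, 50), (90, 80)]
])

km_demo = KMeans(n_clusters=3, init="k-means++", max_iter=300, n_init=10, random_state=0)
labels = km_demo.fit_predict(X_demo)

# Look up the centroid assigned to every point, then sum the squared distances.
assigned_centroids = km_demo.cluster_centers_[labels]
manual_wcss = float(((X_demo - assigned_centroids) ** 2).sum())

print(manual_wcss, km_demo.inertia_)
```

WCSS always shrinks as the number of clusters grows, which is why the elbow method looks for the point where the decrease flattens out instead of for the minimum.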
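What is the random initialization trap?

K-Means starts from randomly chosen centroids, and the final clusters depend on
that starting choice. Suppose a scatter plot shows three natural groups and we
choose K=3 clusters: we hope the random initialization leads us to those three
clusters, but an unlucky choice of starting centroids can converge to a
different, poorer grouping. Passing init='k-means++', as in the code above,
spreads the initial centroids apart and reduces this risk.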
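A small experiment can make the effect of initialization visible. The sketch below is an assumption-laden illustration on synthetic data (X_demo is invented, and five seeds is an arbitrary choice): it runs K-Means once per seed with purely random starting centroids and compares the resulting WCSS with a k-means++ run using 10 restarts.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Hypothetical data: three tight, well-separated groups.
X_demo = np.vstack([
    rng.normal(loc=center, scale=0.5, size=(50, 2))
    for center in [(0, 0), (10, 0), (5, 10)]
])

# One purely random initialization per seed: the final WCSS may vary by seed.
random_inertias = []
for seed in range(5):
    km_random = KMeans(n_clusters=3, init="random", n_init=1, random_state=seed)
    km_random.fit(X_demo)
    random_inertias.append(km_random.inertia_)

# k-means++ seeding with 10 restarts, the settings used in this report.
km_pp = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0)
km_pp.fit(X_demo)

print("random init:", [round(v, 1) for v in random_inertias])
print("k-means++  :", round(km_pp.inertia_, 1))
```

If any of the random runs reports a clearly larger WCSS than the k-means++ run, that run fell into the trap; on easy data like this all runs may also agree.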
sns.set_style('darkgrid')
plt.plot(range(1,11),wcss)
plt.title('The Elbow Method')
plt.xlabel('Number of Clusters')
plt.ylabel('Within Cluster Sum Of Squares');
Visualization
✓ Visualization is one of the key concepts in data science; it
gives a pictorial representation of the data.
✓ Several plots are used to visualize the customer segments.
✓ Plot examples:
1. Bar Chart
2. Scatter Plot
3. Pie Plot
4. Line Plot
5. Histogram
✓ In Python, the Matplotlib library is used for visualization.
✓ Using these charts we can clearly visualize our Mall dataset;
the bar chart in particular is popular for visualizing the
dataset.
✓ Syntax:
import matplotlib.pyplot as plt
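As a brief illustration of two of the plot types listed above, the sketch below draws a bar chart and a histogram. It uses only the Genre and Age values of the first five rows of the mall dataset, typed in by hand; the figure layout and titles are assumptions made for the example.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Genre and Age of the first five customers, as printed by df.head().
sample = pd.DataFrame({
    "Genre": ["Male", "Male", "Female", "Female", "Female"],
    "Age": [19, 21, 20, 23, 31],
})

# Count customers per category for the bar chart.
genre_counts = sample["Genre"].value_counts()

fig, (ax_bar, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))
ax_bar.bar(genre_counts.index, genre_counts.values)
ax_bar.set_title("Customers by Genre")
ax_bar.set_ylabel("Number of customers")
ax_hist.hist(sample["Age"], bins=5)
ax_hist.set_title("Age distribution")
ax_hist.set_xlabel("Age")
fig.tight_layout()
# In a notebook the figure is shown automatically; in a script, call plt.show().
```

A bar chart suits categorical columns such as Genre, while a histogram suits numeric columns such as Age.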
Applying K-means to mall dataset
# max_iter and n_init are set explicitly
km = KMeans(n_clusters=5,init = 'k-means++',max_iter=300,n_init=10,random_state=0)
y_means = km.fit_predict(X)
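Once every customer has a cluster label, a per-cluster summary helps interpret the segments. The following is a minimal sketch under stated assumptions: demo is randomly generated stand-in data with the two mall feature names, and the summary column names (customers, mean_income, mean_spending) are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical stand-in for the two mall features (not the real data).
demo = pd.DataFrame({
    "Annual Income (k$)": rng.integers(15, 140, size=100),
    "Spending Score (1-100)": rng.integers(1, 101, size=100),
})

km_demo = KMeans(n_clusters=5, init="k-means++", max_iter=300, n_init=10, random_state=0)
demo["Cluster"] = km_demo.fit_predict(demo.to_numpy())

# Size and average income/spending of every segment.
profile = demo.groupby("Cluster").agg(
    customers=("Annual Income (k$)", "size"),
    mean_income=("Annual Income (k$)", "mean"),
    mean_spending=("Spending Score (1-100)", "mean"),
)
print(profile)
```

On the real dataset, the same table would show which cluster index corresponds to, for example, high income and high spending, which is the information needed before giving clusters descriptive names.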
Visualizing the Clusters

# Scatter plot of the clusters with their centroids highlighted.
sns.set_style('whitegrid')
plt.figure(figsize=(10,6))
# X[y_means==0,0] gives the x-coordinates of cluster 1, X[y_means==0,1] its y-coordinates; s is the marker size
plt.scatter(X[y_means==0,0],X[y_means==0,1],s=50,c='red',label='Cluster 1',marker='*')
plt.scatter(X[y_means==1,0],X[y_means==1,1],s=50,c='blue',label='Cluster 2',marker='*')
plt.scatter(X[y_means==2,0],X[y_means==2,1],s=50,c='green',label='Cluster 3',marker='*')
plt.scatter(X[y_means==3,0],X[y_means==3,1],s=50,c='brown',label='Cluster 4',marker='*')
plt.scatter(X[y_means==4,0],X[y_means==4,1],s=50,c='magenta',label='Cluster 5',marker='*')
# Centroids are highlighted with a bigger marker size
plt.scatter(km.cluster_centers_[:,0],km.cluster_centers_[:,1],s=250,c='yellow',label='Centroids')
plt.title('Clusters of Clients')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend();
sns.set_style('darkgrid')
plt.figure(figsize=(10,6))
plt.scatter(X[y_means==0,0],X[y_means==0,1],s=50,c='red',label='Careful')
plt.scatter(X[y_means==1,0],X[y_means==1,1],s=50,c='blue',label='Standard')
plt.scatter(X[y_means==2,0],X[y_means==2,1],s=50,c='green',label='Target')
plt.scatter(X[y_means==3,0],X[y_means==3,1],s=50,c='brown',label='Careless')
plt.scatter(X[y_means==4,0],X[y_means==4,1],s=50,c='magenta',label='Sensible')
plt.scatter(km.cluster_centers_[:,0],km.cluster_centers_[:,1],s=250,c='yellow',label='Centroids')
plt.title('Clusters of Clients')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend();
Conclusion:
This project applies data science techniques to
enhance customer satisfaction and business revenue through customer
segmentation and personalized marketing. By systematically
following the outlined phases and goals, we can achieve a deeper
understanding of customer behaviour and preferences, resulting in
more effective marketing strategies.
