Ads Phase 4
Team Leader: DINESH M
NM ID: au723721205014
PROJECT: CUSTOMER SEGMENTATION USING DATA SCIENCE
INTRODUCTION:
Customer segmentation is a marketing strategy that involves
dividing a company's customer base into distinct groups or segments based on
shared characteristics or behaviours. The problem is to implement data science
techniques to segment customers based on their behaviour, preferences, and
demographic attributes. The goal is to enable businesses to personalize
marketing strategies and enhance customer satisfaction. This project involves
data collection, data preprocessing, feature engineering, clustering algorithms,
visualization, and interpretation of results. In customer segmentation using data
science, problem definition is a critical step in the process of leveraging
data-driven techniques to divide a customer base into meaningful and actionable
segments. It involves specifying the objectives and goals of the segmentation
project, clarifying what you aim to achieve, and setting the context for how data
science will be used to solve specific business problems.
INNOVATION PHASE:
In the context of customer segmentation using data science, the innovation
phase is a critical step in the process that involves developing and implementing
novel approaches, techniques, or technologies to enhance the accuracy,
effectiveness, and efficiency of customer segmentation. This phase is about
pushing the boundaries of traditional methods to extract deeper insights and
create more actionable segments.
DEVELOPMENT PHASE 1
In this phase we loaded the provided dataset and pre-processed it using Python
library packages and the necessary methods.
Provided dataset:
(https://www.kaggle.com/datasets/akram24/mall-customers)
PHASE 4
➢ In this phase we apply models to the data pre-processed earlier and
evaluate those models.
Feature Engineering:
✓ The main objective of the feature engineering stage is to create
relevant features that capture customer behaviour and preferences.
✓ It also generates new features based on customer interactions and
demographics.
✓ It reduces dimensionality where necessary.
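As a rough sketch of these three points, the snippet below derives one new feature, standardizes the columns, and reduces them to two principal components. The values are the five rows shown by df.head() in the next section; the short column names, the ScorePerIncome ratio, and the choice of StandardScaler and PCA are assumptions made for illustration, not steps taken from this project.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Five sample customers; the column names are assumed for illustration.
toy = pd.DataFrame({
    "Age": [19, 21, 20, 23, 31],
    "Income": [15, 15, 16, 16, 17],
    "Score": [39, 81, 6, 77, 40],
})

# New feature: spending score relative to income (an illustrative ratio).
toy["ScorePerIncome"] = toy["Score"] / toy["Income"]

# Standardize so every feature has zero mean and unit variance.
scaled = StandardScaler().fit_transform(toy)

# Reduce the four standardized features to two principal components.
reduced = PCA(n_components=2).fit_transform(scaled)

print(scaled.shape, reduced.shape)  # (5, 4) (5, 2)
```

Standardizing before PCA or K-means keeps a feature with a large numeric range from dominating the distance calculations.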
Import Libraries
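import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt  # used for all plots below
from sklearn.cluster import KMeans  # used in the clustering step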
Loading Data
df = pd.read_csv('/kaggle/input/mall-customers/Mall_Customers.csv')
df.head()
0 1 Male 19 15 39
1 2 Male 21 15 81
2 3 Female 20 16 6
3 4 Female 23 16 77
4 5 Female 31 17 40
The mall now wants to segment its customers based on two features:
Annual Income and Spending Score.
This is a typical clustering problem, which we address here with K-means
clustering.
df.shape
(200, 5)
X = df.iloc[:,[3,4]].values
TRAIN_TEST_SPLIT
Sklearn
➢ Scikit-learn, often abbreviated as sklearn, is a popular machine learning
library in Python.
➢ It provides a wide range of tools and algorithms for tasks related to data
analysis, machine learning, and data mining.
➢ Scikit-learn is open-source and built on top of other Python libraries such
as NumPy, SciPy, and Matplotlib.
Key features of sklearn
➢ Supervised Learning
➢ Unsupervised Learning
➢ Preprocessing
➢ Model Selection
➢ Model Evaluation
➢ Pipelines
➢ Feature Extraction
➢ Datasets
Clustering Algorithm:
In this step I used the K-Means clustering algorithm.
K-Means clustering is an unsupervised learning algorithm used to solve
clustering problems in machine learning and data science.
Here K defines the number of pre-defined clusters that need to be created in the
process: for example, if K=2 there will be two clusters, for K=3 there will be
three clusters, and so on.
Step-1: Choose the number of clusters K.
Step-2: Select K random points as the initial centroids. (They need not be
points from the input dataset.)
Step-3: Assign each data point to its closest centroid, which forms the
predefined K clusters.
Step-4: Compute the mean of the points in each cluster and place the new
centroid of that cluster there.
Step-5: Repeat the third step, that is, reassign each data point to the new
closest centroid, until the assignments no longer change.
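The steps above can be sketched in plain NumPy as follows. This is an illustrative implementation, not the one used in this project (which relies on sklearn's KMeans); the function name kmeans, its parameters, and the toy points are all assumptions.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Plain K-means following the steps above; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Step 2: pick K distinct data points as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 3: assign every point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 5: repeat until the centroids stop moving.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated toy blobs (made-up values).
pts = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
                [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]])
labels, centroids = kmeans(pts, k=2)
print(labels)
print(centroids)
```

Which blob receives label 0 depends on the random initial centroids, so only the grouping is meaningful, not the label numbers.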
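# WCSS for K = 1..10, assumed here to be the inertia_ of a fitted KMeans model
wcss = [KMeans(n_clusters=k, init='k-means++', n_init=10, random_state=0).fit(X).inertia_
        for k in range(1, 11)]
sns.set_style('darkgrid')
plt.plot(range(1, 11), wcss)
plt.title('The Elbow Method')
plt.xlabel('Number of Clusters')
plt.ylabel('Within Cluster Sum Of Squares');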
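The quantity on the y-axis of the elbow plot, the within-cluster sum of squares (WCSS), is the sum of squared distances from every point to the centroid of its own cluster. A small hand-checkable sketch, with made-up points and a hypothetical helper named wcss_value:

```python
import numpy as np

def wcss_value(points, labels, centroids):
    """Sum of squared distances from each point to its own cluster centroid."""
    diffs = points - centroids[labels]
    return float((diffs ** 2).sum())

pts = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]])

# K = 1: a single cluster centred on the overall mean (6, 0).
one = wcss_value(pts, np.zeros(4, dtype=int), pts.mean(axis=0, keepdims=True))

# K = 2: left pair and right pair, centred on (1, 0) and (11, 0).
two = wcss_value(pts, np.array([0, 0, 1, 1]), np.array([[1.0, 0.0], [11.0, 0.0]]))

print(one, two)  # 104.0 4.0
```

WCSS always falls as K grows; the elbow method looks for the K after which the drop becomes small, as it would here beyond K = 2.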
Visualization
✓ Visualization is one of the key concepts in data science; it
gives a pictorial or visual representation of the data.
✓ Several plots are used to visualize the customer segments.
✓ Plots example:
1. Bar Chart
2. Scatter Plot
3. Pie Plot
4. Line Plot
5. Histogram
✓ In Python, the Matplotlib library is used for visualization.
✓ Using these charts we can clearly visualize our Mall Dataset;
the bar chart in particular is popularly used to visualize the
dataset.
✓ Syntax,
import matplotlib.pyplot as plt
# max_iter and n_init are set explicitly
km = KMeans(n_clusters=5, init='k-means++', max_iter=300, n_init=10, random_state=0)
y_means = km.fit_predict(X)
sns.set_style('darkgrid')
plt.figure(figsize=(10,6))
# one scatter call per cluster, each with its own colour and segment label
plt.scatter(X[y_means==0,0],X[y_means==0,1],s=50,c='red',label='Careful')
plt.scatter(X[y_means==1,0],X[y_means==1,1],s=50,c='blue',label='Standard')
plt.scatter(X[y_means==2,0],X[y_means==2,1],s=50,c='green',label='Target')
plt.scatter(X[y_means==3,0],X[y_means==3,1],s=50,c='brown',label='Careless')
plt.scatter(X[y_means==4,0],X[y_means==4,1],s=50,c='magenta',label='Sensible')
# cluster centroids drawn larger, in yellow
plt.scatter(km.cluster_centers_[:,0],km.cluster_centers_[:,1],s=250,c='yellow',label='Centroids')
plt.title('Clusters of Clients')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend();
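To interpret the segments, it can help to summarize each cluster numerically alongside the scatter plot. The sketch below uses made-up customers and cluster labels standing in for X and y_means; the frame name demo and the summary column names are assumptions.

```python
import pandas as pd

# Made-up customers and cluster labels standing in for X and y_means.
demo = pd.DataFrame({
    "Annual Income (k$)": [15, 16, 60, 62, 120, 125],
    "Spending Score (1-100)": [39, 6, 50, 48, 90, 85],
    "cluster": [0, 0, 1, 1, 2, 2],
})

# Size and average income / spending score of each segment.
summary = demo.groupby("cluster").agg(
    customers=("Annual Income (k$)", "size"),
    mean_income=("Annual Income (k$)", "mean"),
    mean_score=("Spending Score (1-100)", "mean"),
)
print(summary)
```

A table like this makes it easier to justify segment names such as 'Target' (high income, high spending score) from the numbers rather than from the plot alone.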
Conclusion:
This project applies data science techniques to
enhance customer satisfaction and business revenue through customer
segmentation and personalized marketing. By systematically
following the outlined phases and goals, we can achieve a deeper
understanding of customer behaviour and preferences, resulting in
more effective marketing strategies.