RFM Model For Customer Purchase Behaviour Using K-Means Algorithm
CERTIFICATE
Certified that the Technical Seminar titled RFM model for Customer Purchase behaviour
Using K-Means Algorithm is carried out by Shubhankar (1RV16EC155)
and Siddhartha Bhaumik (1RV16EC156) who are bonafide students of RV College
of Engineering, Bengaluru, in partial fulfillment of the requirements for the degree of
Bachelor of Engineering in Electronics and Communication Engineering of the
Visvesvaraya Technological University, Belagavi during the year 2020-2021. It is certified
that all corrections/suggestions indicated for the Internal Assessment have been
incorporated in the Technical Seminar report deposited in the departmental library. The
Technical Seminar report has been approved as it satisfies the academic requirements in
respect of Technical Seminar work prescribed by the institution for the said degree.
External Viva
1.
2.
DECLARATION
We, Shubhankar and Siddhartha Bhaumik, students of the eighth semester B.E., Department
of Electronics and Communication Engineering, RV College of Engineering,
Bengaluru, hereby declare that the Technical Seminar titled ‘RFM model for Customer
Purchase behaviour Using K-Means Algorithm’ has been carried out by us
and submitted in partial fulfilment for the award of the degree of Bachelor of Engineering
in Electronics and Communication Engineering during the year 2020-2021.
Further, we declare that the content of the dissertation has not been submitted previously
by anybody for the award of any degree or diploma to any other university.
We also declare that any Intellectual Property Rights generated out of this project carried
out at RVCE will be the property of RV College of Engineering, Bengaluru and we will
be one of the authors of the same.
Place: Bengaluru
Date:
Name Signature
1. Shubhankar (1RV16EC155)
2. Siddhartha Bhaumik (1RV16EC156)
ACKNOWLEDGEMENT
We are indebted to our guide, Dr. Nagaraj Bhat, Assistant Professor, RV College
of Engineering, for the wholehearted support, suggestions and invaluable advice throughout
our Technical Seminar, and for the help in the preparation of this thesis.
We also express our gratitude to our examiner Dr. Kiran V., Associate Professor,
Department of Electronics and Communication Engineering for their valuable comments
and suggestions.
Our sincere thanks to Dr. K S Geetha, Professor and Head, Department of Electronics
and Communication Engineering, RVCE for the support and encouragement.
We thank all the teaching staff and technical staff of Electronics and Communication
Engineering department, RVCE for their help.
Lastly, we take this opportunity to thank our family members and friends who provided
all the backup support throughout the project work.
ABSTRACT
Clustering is the task of grouping a set of objects in such a way that objects in the
same group, called a cluster, are more similar to each other, in some sense, than to
objects in other groups or clusters. The main preparatory tasks for clustering in this
work are Univariate Analysis and Exploratory Data Analysis.
The project evaluates the performance of Customer Segmentation performed on a
set of data acquired from different places. The RFM analysis performed provides the
different scenarios needed for gaining insights and developing strategies in marketing
planning. The K-Means algorithm is mainly employed in this project, with both PCA and
Non-PCA processes used to obtain the most suitable K-Means model; the resulting clusters
group the data so that different means of retaining each group can be planned. Using
Python packages (Yellowbrick, scikit-learn, pandas and PCA), the dataset is turned into
distribution plots, which are then converted into clusters as required.
In this project, the main objective is to apply different marketing models through
intelligence to identify potential customers by providing relevant and timely data to
marketing entities in the retail marketing industry. One dataset is considered for
simulation, and it is processed to perform different analyses with the clustering
algorithms. The two main algorithms considered are K-Means and Hierarchical clustering,
which are used to find the number of clusters in a data group so as to provide the
required insight.
Segmenting customers based on their buying patterns, though strategically important,
is a difficult task. Customer retention is another major concern for both online and
physical enterprises. In this research work, the RFM model is implemented on synthetic
and real data sets to analyse different customer segmentation behaviour. Based on the
Silhouette Score, Sales Recency, Sales Frequency and Sales Monetary value are analysed,
and an optimal solution is found and used. The resulting clusters provide insight and
strategies for targeted marketing towards certain customers. Clusters make it possible
to create a targeted strategy for every single customer and to build scenarios for
future improvements to the strategies that need to be developed.
CONTENTS
Abstract
List of Figures
List of Tables
Abbreviations
2.3.5 Affinity propagation algorithm
LIST OF FIGURES
4.18 Silhouette Analysis for K=2 Clusters for Recency, Frequency and Monetary Plot in PCA method
4.19 Silhouette Analysis for K=3 Clusters for Recency, Frequency and Monetary Plot in PCA method
4.20 Silhouette Analysis for K=4 Clusters for Recency, Frequency and Monetary Plot in PCA method
4.21 Silhouette Analysis for K=5 Clusters for Recency, Frequency and Monetary Plot in PCA method
4.22 Hierarchical Clustering at K=3
4.23 Data Clusters Plotted Based on Non-PCA Method
4.24 Data Clusters Plotted Based on PCA Method
4.25 Final Data Clusters Plotted for our dataframe
4.26 KDE plot for each of the Clusters
LIST OF TABLES
ABBREVIATIONS
RFM Recency, Frequency, Monetary
RV College of Engineering ®, Bengaluru - 560059
Chapter 1
Introduction to RFM Analysis and
Customer Segmentation
Figure 1.1 shows the structure of the RFM framework on which the analysis is based.
To achieve the above objectives, customer clustering and segmentation are carried out
using the K-Means algorithm, based on RFM values for different regions. RFM analysis
can be defined as a customer segmentation technique that gives information not only on
how frequently a customer purchases, but also on how recently they purchased and how much
profit they bring (the monetary value).
Initially, the clusters are evaluated using Silhouette Analysis for Recency vs. Monetary
using K-Means with a varying number of clusters. This is followed by Silhouette Analysis
of Frequency vs. Monetary, again using K-Means with different numbers of clusters.
Silhouette Analysis is a method to evaluate or validate clusters, such as the prototype-based
(centroid) clusters produced by K-Means. Cluster validity can be measured by cohesion, by
separation, or by a combination of both. In the present work, the Silhouette Coefficient,
which combines both cohesion and separation, is used.
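For reference, the standard definition of the Silhouette Coefficient is sketched below. The notation is introduced here for illustration: d(i, j) is the distance between points i and j (Euclidean by default in scikit-learn's silhouette_score), C_I is the cluster containing point i, a(i) measures cohesion and b(i) measures separation. By convention s(i) = 0 when point i is alone in its cluster.

```latex
a(i) = \frac{1}{|C_I| - 1} \sum_{j \in C_I,\; j \neq i} d(i, j)
\qquad
b(i) = \min_{J \neq I} \frac{1}{|C_J|} \sum_{j \in C_J} d(i, j)

s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1

S = \frac{1}{N} \sum_{i=1}^{N} s(i)
```

Values of s(i) close to 1 indicate a point that sits well inside its own cluster, values near 0 indicate a point on the boundary between two clusters, and negative values suggest the point may have been assigned to the wrong cluster. The mean S over all N points is the silhouette score that is compared across different numbers of clusters.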
1.3 Motivation
There are several analytical methods available for Customer segmentation.
RFM analysis helps in the formation of customer segments. Customer segmentation is
1.4 Objectives
The objectives of the project are
1. To perform Univariate Analysis and data visualization on the dataset.
3. To determine the number of clusters by performing Silhouette Analysis at each level,
and to characterize the customer purchase behaviour.
Chapter 2 explains the theory and fundamentals of RFM analysis, with an understanding
of RFM with respect to Customer Segmentation. It also explains the fundamentals
of Customer Segmentation.
Chapter 4 presents the results obtained during the simulation of the data with respect
to Customer Analysis, and the clusters generated for Customer Segmentation.
Chapter 5 concludes the report, discussing the inferences drawn from the project
and outlining some future scope for the work.
Chapter 2
Fundamentals of RFM Analysis and
Customer Segmentation
2.2.1 Frequency
A high frequency score means a customer buys your brand frequently, and is likely to
be a loyalist of your brand. To calculate frequency, businesses need to analyse the total
number of purchases completed by customers in a fixed time period. Frequency can be
scored by grading on custom-built filters such as bought thrice in a year/bought once a
month and so on, depending on the nature of the business.
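As an illustration of counting purchases in a fixed time period, here is a minimal pandas sketch. The transaction log, its column names (customer_id, invoice_date) and the choice of calendar year 2020 as the period are all hypothetical assumptions, not taken from the dataset used in this report.

```python
import pandas as pd

# Hypothetical transaction log: one row per completed purchase (assumed schema).
transactions = pd.DataFrame({
    "customer_id": ["C1", "C1", "C2", "C3", "C3", "C3", "C3"],
    "invoice_date": pd.to_datetime([
        "2020-01-05", "2020-06-20", "2020-03-14",
        "2019-12-20", "2020-02-01", "2020-02-15", "2020-11-30",
    ]),
})

# Fixed time period (assumed here to be the calendar year 2020).
start, end = pd.Timestamp("2020-01-01"), pd.Timestamp("2020-12-31")
in_period = transactions["invoice_date"].between(start, end)

# Frequency = number of purchases each customer completed inside the period.
frequency = (
    transactions[in_period]
    .groupby("customer_id")
    .size()
    .rename("frequency")
)
print(frequency)
```

If the log holds one row per line item rather than per purchase, counting distinct invoice numbers (for example with `nunique`) would be the closer match to "number of purchases".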
2.2.2 Recency
A high recency score means a customer has positively considered your brand for a
purchase decision recently. Recency can be scored by grading on custom-built filters such
as bought in the last 7 days/1 month/3 months and so on, depending on the nature of
the business.
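A minimal sketch of recency grading with custom-built filters is shown below. The reference date, the schema and the bin edges (7 days, 1 month taken as 30 days, 3 months as 90 days, 1 year as 365 days) are illustrative assumptions; a business would choose its own thresholds.

```python
import pandas as pd

# Hypothetical purchases (assumed schema: customer_id, invoice_date).
transactions = pd.DataFrame({
    "customer_id": ["C1", "C1", "C2", "C3", "C4", "C5"],
    "invoice_date": pd.to_datetime([
        "2020-11-02", "2020-12-28", "2020-12-10",
        "2020-11-20", "2020-07-15", "2019-10-01",
    ]),
})
reference_date = pd.Timestamp("2020-12-31")  # assumed analysis date

# Recency in days: time since each customer's most recent purchase.
last_purchase = transactions.groupby("customer_id")["invoice_date"].max()
recency_days = (reference_date - last_purchase).dt.days

# Custom-built filters: last 7 days -> 5, 1 month -> 4, 3 months -> 3,
# 1 year -> 2, older -> 1 (thresholds are illustrative assumptions).
bins = [-1, 7, 30, 90, 365, float("inf")]
recency_score = pd.cut(recency_days, bins=bins, labels=[5, 4, 3, 2, 1]).astype(int)
print(pd.DataFrame({"recency_days": recency_days, "recency_score": recency_score}))
```

Only the latest purchase per customer matters for recency, which is why the sketch takes the maximum date per customer before subtracting it from the reference date.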
2.2.3 Monetary
A high monetary value score means a customer is one of the highest-spending customers
of your brand. The monetary value score can be graded on custom-built filters, such as
the total amount spent, depending on the nature of the business. All the above criteria
can be graded on a scale of 1 to 5, with 5 being the best score you could assign a
customer. It is also critical to specify an appropriate range for each grade, in order
to create groupings of customers with similar buying behaviour.
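One common way to grade all three criteria on a 1 to 5 scale is to split each feature into five equal-sized groups (quintiles). The sketch below assumes a small hypothetical per-customer RFM table; ranking before `pd.qcut` is a design choice that avoids duplicate bin edges when many customers share the same value. Recency is reversed because fewer days since the last purchase is better.

```python
import pandas as pd

# Hypothetical per-customer RFM table (recency in days, frequency in purchases,
# monetary in currency units).
rfm = pd.DataFrame({
    "recency":   [3, 21, 41, 169, 457, 12, 75, 200, 5, 30],
    "frequency": [12, 8, 5, 2, 1, 9, 4, 2, 15, 6],
    "monetary":  [950.0, 600.0, 320.0, 80.0, 20.0, 700.0, 250.0, 60.0, 1200.0, 400.0],
}, index=[f"C{i}" for i in range(1, 11)])


def quintile_score(values, higher_is_better=True):
    """Grade a feature from 1 to 5 by splitting its ranks into five equal groups."""
    labels = [1, 2, 3, 4, 5] if higher_is_better else [5, 4, 3, 2, 1]
    # rank(method="first") breaks ties by order of appearance, so bin edges stay unique.
    return pd.qcut(values.rank(method="first"), q=5, labels=labels).astype(int)


rfm["R"] = quintile_score(rfm["recency"], higher_is_better=False)  # fewer days is better
rfm["F"] = quintile_score(rfm["frequency"])
rfm["M"] = quintile_score(rfm["monetary"])
rfm["RFM_score"] = rfm["R"].astype(str) + rfm["F"].astype(str) + rfm["M"].astype(str)
print(rfm)
```

Customers with score "555" are the most recent, most frequent and highest-spending ones, while "111" marks the opposite end.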
Figure 2.2 shows the Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
algorithm as a clustering method.
Figure 2.3 shows clustering along the first (PC1) and second (PC2) principal components
obtained by PCA.
The machine learning algorithms discussed provide the steps required for clustering
the data. Customer Segmentation insights for forming different strategies have also
been discussed. These strategies guide the implementation required for the model.
Chapter 3
Implementation of RFM Analysis
and Customer Segmentation
Recency: It is the number of days between a customer's most recent transaction and the
given reference date. The lower the recency, the more recently the customer has visited.
This section explains the proposed process for customer value analysis. The process
consists of the four steps shown in Figure 3.2.
The analysis in this work follows this process step by step.
The proposed methodology provides the steps that are required in constructing a
strategy for generating the clusters required for customer segmentation. The
segmentation provides different insights into customer behaviour and purchase patterns.
Chapter 4
Results and Analysis
In 4.1, 4.2 and 4.3, the different interactions made by customers are shown.
Univariate Analysis shows that there are no missing or negative values to be removed.
From the data in Tables 4.1 and 4.2, a customer holds 5 credit cards and makes 2 visits
on average. Some customers make no visits to the bank, no online visits, or no calls.
Table 4.3 shows the correlation between the features, obtained through Exploratory Data
Analysis.
Recency is how recently a specific customer made his/her latest purchase. In the given
case, a threshold date is set, relative to which recency is calculated.
Tables 4.4 and 4.5 show the pre-processed data frame with the Recency, Inverse Recency,
Frequency and Monetary values.
Figures 4.4, 4.5, 4.6 and 4.7 show the data distribution plots for the RFM analysis of
our data frame.
For clustering, the values in each feature are rescaled. RFM analysis usually divides the
data into 5 equal parts and scores them from 1 to 5, but standardization of data is
In Figure 4.8, the pair plot shows one data point that lies very far from the rest of
the data. This point is dropped before the data is analysed through clustering.
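The rescaling and outlier removal described above can be sketched with scikit-learn's `StandardScaler`. The data is synthetic, and the rule "drop a row if any standardized value exceeds 3 in absolute value" is an assumed threshold chosen for illustration, not the criterion used to produce Figure 4.8.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical RFM table: 200 ordinary customers plus one extreme customer.
rfm = pd.DataFrame(rng.uniform(0, 1, size=(200, 3)),
                   columns=["recency", "frequency", "monetary"])
rfm.loc[200] = [0.5, 0.5, 100.0]  # e.g. a bulk buyer far away from all other points

# Standardize each feature to zero mean and unit variance (z-scores).
z = StandardScaler().fit_transform(rfm)

# Drop rows whose |z| exceeds the assumed threshold of 3 on any feature.
keep = (np.abs(z) <= 3).all(axis=1)
rfm_clean = rfm[keep]

# Re-fit the scaler on the cleaned data, since the outlier distorted the mean and std.
X = StandardScaler().fit_transform(rfm_clean)
print(len(rfm), "->", len(rfm_clean), "rows")
```

Re-fitting after the drop matters: a single extreme point inflates the standard deviation, which would otherwise squash every other customer towards zero before K-Means sees the data.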
4.4 Clustering
The simulation performs K-Means Clustering with two approaches, Non-PCA and PCA, to
generate the cluster plots from which different strategies are inferred and different
plans are made.
Figure 4.9 depicts the clustering analysis of Recency, Monetary and Frequency through
a pair plot.
Figure 4.10 plots inertia against k; the value of k is chosen at the point where the
decrease in inertia levels off.
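A minimal sketch of the inertia-versus-k computation follows. It uses synthetic blobs from `make_blobs` as a stand-in for the scaled RFM matrix (an assumption); `KMeans.inertia_` is the within-cluster sum of squared distances to the nearest centroid. Yellowbrick's `KElbowVisualizer` draws the same curve as a plot.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the scaled RFM matrix: 3 groups in 3 features (assumption).
X, _ = make_blobs(n_samples=300, n_features=3, centers=3,
                  cluster_std=0.6, random_state=42)

inertias = {}
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    # Within-cluster sum of squared distances of samples to their closest centroid.
    inertias[k] = km.inertia_

for k, value in inertias.items():
    print(k, round(value, 1))
```

Inertia always shrinks as k grows, so the smallest value is not the goal; the "elbow" is where adding another cluster stops giving a large reduction, which is why this curve is read together with the silhouette analysis.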
Figure 4.11: Silhouette Analysis for K=2 Clusters for Recency, Frequency and Monetary Plot
Figure 4.12: Silhouette Analysis for K=3 Clusters for Recency, Frequency and Monetary Plot
Figure 4.13: Silhouette Analysis for K=4 Clusters for Recency, Frequency and Monetary Plot
Figure 4.14: Silhouette Analysis for K=5 Clusters for Recency, Frequency and Monetary Plot
Table 4.6 gives the PCA values for the separate dimensions.
Figure 4.16 shows the clustering plot for PC1 vs PC2 in K-Means Clustering.
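The PCA step can be sketched as below: the standardized three-feature matrix is projected onto two principal components, the loadings show how Recency, Frequency and Monetary contribute to each component, and K-Means is then run in the (PC1, PC2) plane. The data is a synthetic stand-in and the column names are assumptions.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the (Recency, Frequency, Monetary) table (assumption).
X_raw, _ = make_blobs(n_samples=300, n_features=3, centers=3, random_state=7)
X = StandardScaler().fit_transform(X_raw)

# Project the three standardized features onto the first two principal components.
pca = PCA(n_components=2)
X_pc = pca.fit_transform(X)

# Loadings: contribution of each original feature to PC1 and PC2.
loadings = pd.DataFrame(pca.components_, index=["PC1", "PC2"],
                        columns=["recency", "frequency", "monetary"])
print(loadings.round(3))
print("explained variance ratio:", pca.explained_variance_ratio_.round(3))

# K-Means in the (PC1, PC2) plane; these labels would colour a PC1-vs-PC2 scatter plot.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_pc)
```

Standardizing before PCA keeps the feature with the largest raw units (usually Monetary) from dominating the components.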
Figure 4.17 plots inertia against k for the PCA method; the value of k is chosen at the
point where the decrease in inertia levels off.
Figure 4.18: Silhouette Analysis for K=2 Clusters for Recency, Frequency and Monetary Plot in PCA method
Figure 4.19: Silhouette Analysis for K=3 Clusters for Recency, Frequency and Monetary Plot in PCA method
Figure 4.20: Silhouette Analysis for K=4 Clusters for Recency, Frequency and Monetary Plot in PCA method
Figure 4.21: Silhouette Analysis for K=5 Clusters for Recency, Frequency and Monetary Plot in PCA method
Figures 4.18, 4.19, 4.20 and 4.21 show the silhouette analysis of the clustering plot
for K = 2, 3, 4 and 5, giving a silhouette plot of the clusters for each value of K.
The RFM segments are then depicted through visualization of the clustered data.
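The comparison of K = 2, 3, 4 and 5 in both feature spaces can be sketched with scikit-learn's `silhouette_score` (the mean coefficient) and `silhouette_samples` (the per-point values that a silhouette plot draws, also wrapped by Yellowbrick's `SilhouetteVisualizer`). Synthetic blobs stand in for the standardized RFM data, so the printed scores are not the report's results.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_samples, silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the standardized RFM features (assumption).
X_raw, _ = make_blobs(n_samples=300, n_features=3, centers=3, random_state=42)
X = StandardScaler().fit_transform(X_raw)      # Non-PCA feature space
X_pca = PCA(n_components=2).fit_transform(X)   # PCA feature space (PC1, PC2)

scores = {}
for name, data in (("non-pca", X), ("pca", X_pca)):
    for k in (2, 3, 4, 5):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)
        scores[(name, k)] = silhouette_score(data, labels)
        print(f"{name:8s} K={k}: mean silhouette = {scores[(name, k)]:.3f}")

# Per-point silhouette values for one chosen K (here K=3 in PCA space),
# i.e. the bars drawn in a silhouette plot.
labels_3 = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_pca)
per_point = silhouette_samples(X_pca, labels_3)
print(f"mean of per-point values: {np.mean(per_point):.3f}")
```

The K with the highest mean silhouette is a natural candidate, but the per-point plot is still worth checking: a cluster whose bars fall mostly below the mean, or below zero, signals a poor split even when the average looks good.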
Figures 4.23 and 4.24 show the data clusters plotted using the Non-PCA and PCA
methods respectively.
There was no significant difference in results between the Non-PCA and PCA methods.
However, since the PCA method yields a higher silhouette index, PCA is used for the
clustering.
After analysing the PCA pair plot, we notice that the first cluster contains the best
customers, who have low recency, high frequency and high monetary value. The second
cluster contains the loyal customers, who have low recency, low frequency and low
monetary value. The third cluster contains potential customers, who have high recency.
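Cluster descriptions like those above are typically read off a per-cluster profile of the mean Recency, Frequency and Monetary values. A minimal pandas sketch follows; the RFM table and its cluster labels are hypothetical, standing in for the output of K-Means.

```python
import pandas as pd

# Hypothetical RFM table whose "cluster" column was filled in by K-Means.
rfm = pd.DataFrame({
    "recency":   [4, 6, 8, 30, 40, 50, 180, 200, 220],
    "frequency": [18, 20, 22, 12, 14, 16, 1, 2, 3],
    "monetary":  [850, 900, 950, 250, 300, 350, 30, 40, 50],
    "cluster":   [0, 0, 0, 1, 1, 1, 2, 2, 2],
})

# Mean Recency, Frequency and Monetary value per cluster, plus cluster size.
profile = rfm.groupby("cluster").agg(
    recency=("recency", "mean"),
    frequency=("frequency", "mean"),
    monetary=("monetary", "mean"),
    customers=("recency", "count"),
)
print(profile)
```

Comparing each cluster's means with the overall means (low recency, high frequency and high monetary value being favourable) is what turns numeric cluster IDs into segments such as best, loyal or potential customers.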
Group 1 (Cluster 1): They are our long-standing customers, those who come last
in terms of recency, frequency and even monetary value; as we can see, they made
fewer transactions, with a low monetary value, a long time ago.
Strategy: We can design more specifically targeted communication that helps convert
them into more loyal customers with higher RFM values.
Group 2 (Cluster 2): They are our loyal customers: they come first in terms of
frequency, with large-value transactions. However, they are only the second most
recent purchasers, so we must take care not to lose them.
Strategy: We need more personalized offers, such as product recommendations based
on their past transactions, in order to increase engagement and the customer
retention rate.
Group 3 (Cluster 3): They are our new customer base: they are the most recent
purchasers and are slightly higher in monetary value than Group 1. However, they
purchase less frequently than Group 2, which makes sense since they have only
recently been introduced to the market.
Strategy: Triggered welcome emails can be used to ensure engagement, establish a
personal connection, and encourage further purchases with introductory offers.
Chapter 5
Conclusion and Future Scope