Determination of Customer Satisfaction Using Improved K-Means Algorithm

Soft Computing
https://doi.org/10.1007/s00500-020-04988-4 (0123456789().,-volV)(0123456789().
,- volV)
METHODOLOGIES AND APPLICATION
Determination of Customer Satisfaction using Improved K-means

algorithm
Hamed Zare1 • Sima Emadi1
Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract
Effective management of customer’s knowledge leads to efficient Customer Relationship Management (CRM). To
accurately predict customer’s behaviour, clustering, especially K-means, is one of the most important data mining tech-
niques used in customer relationship management marketing, with which it is possible to identify customers’ behavioural
patterns and, subsequently, to align marketing strategies with customer preferences so as to maintain the customers.
However, it has been observed in various studies on K-means clustering that customers with different behavioural indi-
cators in clustering may seem to be the same, implying that customer behavioural indicators do not play any significant role
in customer clustering. Therefore, if the level of customer participation depends on behavioural parameters such as their
satisfaction, it can have a negative effect on the K-means clusters and has no acceptable result. In this paper, customer
behavioural features—malicious feature—is considered in customer clustering, as well as a method for finding the optimal
number of clusters and the initial values of cluster centres to obtain more accurate results. Finally, according to the
organizations’ need to extract knowledge from customers’ views through ranking customers based on factors affecting
customer value, a method is proposed for modelling their behaviour and extracting knowledge for customer relationship
management. The results of the evaluation of the customers of Hamkaran System’s Company show that the improved K-
means method proposed in this paper outperforms K-means in terms of speed and accuracy.
Keywords Customer relationship management K-means Customer life cycle value Data mining Customer satisfaction
1 Introduction management and sales staff or organization services can adapt

their customers’ needs to their products, remind them of their
The interaction between organizations and customers has service needs, or perform the right-time management. In this
significantly changed so that there are no long-term guaran- regard, marketing department in each company should work
tees of business continuity with customers. However, orga- with an organized effort to collect and organize customer
nizations need to properly identify their customers, anticipate information. This information is valuable to sailing unit to
their needs and expectations, and increase their productivity contact the customer and provide good services, and it is also
using this type of information (Erdil and Öztürk 2016; Lau- valuable to a senior manager for large organization decisions.
don and Laudon 2015; Shatnawi et al. 2017). A simple Customer relationship management is a strategy used by
example of Customer Relationship Management (CRM) is organizations as a valuable source for increasing customer
customers’ information for an organization, where loyalty and satisfaction, aimed at supporting business
activities to achieve competitive ability (Erdil and Öztürk
2016; Gayathri and Mohanavalli 2011; Laudon and Laudon
Communicated by V. Loia. 2015; Prabha and Subramanian 2017; Shatnawi et al.
2017). It also includes a set of processes that enable
& Sima Emadi
emadi@iauyazd.ac.ir organizations to support business strategies to build long-
term, profitable relationships with specific customers (Ra-
Hamed Zare
ms.hzare@iauyazd.ac.ir jeh et al. 2014).
However, most of the CRM methods are based on infor-
1
Department of Computer Engineering, Yazd Branch, Islamic mation technology and decision-makers should inevitably
Azad University, Yazd, Iran
123
H. Zare, S. Emadi
manage customers’ knowledge to achieve effective customer 2.1 K-means optimization algorithms
relationship management (Dyche 2002). Managers should
determine the priority of different groups of customers so that Hierarchical clustering and partition clustering are two of
they can concentrate on main customers. This shows the the most extensively employed clustering methods (Jain
situations where customers may be lost as a result of not et al. 1999). K-means clustering algorithm, as the most
receiving appropriate services. The issue is extremely critical commonly used partition method (MacQueen 1967), is
as customers’ switching to a competitor company leaves aimed at minimizing the sum of squared Euclidean dis-
negative impacts on the company’s reputation, financial loss, tances from the mean value of each group (SSEDM)
and credibility (Ansari and Riasi 2016). Therefore, new (Celebi et al. 2013).
technologies, data mining in particular, can be applied to The K-means algorithm has been used independently or
discover the knowledge from the customers’ information, also in combination with other algorithms in many applications
to analyse them (Szulanski 1996). such as disease diagnosis (Riveros et al. 2019; Nithya et al.
However, one of the most important data mining tech- 2020; Qiao et al. 2019; Bablani et al. 2020), anomaly
niques is clustering which is used for developing CRM and detection (Karczmarek et al. 2020), task allocation strategy
customer management (He and Song 2009; Kafashpour (Sharma and Bala 2020), air pollution analysis (Govender
et al. 2012; Maryani and Riana 2017; Yuliari et al. 2015). and Sivakumar 2019) network (Fadaei and Khasteh 2019;
Clustering methods can partition customers’ assessments Žiberna 2020), sparse and high-dimensional data clustering
data into clusters. Then, managers will develop their CRM (Hussain and Haris 2019), outlier detection (Jones et al.
and provide assist for different customer clusters according 2020), and privacy preserving (Jiang et al. 2020; Lin 2020).
to their preferences. For this reason, a lot of research has been done to solve
Creating unsuitable clusters is due to the lack of con- problems and to improve the K-means algorithm, some of
sideration of malicious customers who can have a negative which are mentioned below.
impact on it. However, an improved K-means method is An iterative approach was introduced by Ismkhan
proposed in the present research to partition customers into (2018) to promote the quality of the solution produced by
clusters according to features such as length, recency, the K-means through removing one cluster (minus),
frequency, monetary, and malicious features. In the pro- dividing another one (plus), and performing re-clustering in
posed method, first the malicious customers, who have a each iteration. According to previous studies, I–K-
deductive effect on the result, are identified as outlier data means - ? outperforms K-means ??. Moreover, I-K-
and then removed from the data set. Then, a new method is means - ? has higher precision than other similar meth-
presented for determining the centres of the clusters based ods regarding SSEDM minimization.
on LRFMM2model in order to generate the best initial In the Maxmin (Katsavounidis et al. 1994) algorithm, a
clusters. In the proposed method, unlike K-means where random data point is selected as the first centre. Then, the
the number of clusters is not known from the beginning and data point with the largest distance from the previously
is achieved with several repetitions, clustering is started selected centre is considered as the ith centre. The algo-
with a cluster and then added to their number, so that the rithm in Redmond and Heneghan (2007) divides the data
number of optimum clusters is determined with more set space into several buckets via kd-trees, determining the
accuracy and in short time. density of each bucket. Each bucket is represented by a
The rest of the paper is organized as follows: in Sect. 2, point which is the mean of points in the bucket. The bucket
related works will be described. Then, Sect. 3 describes a with the greatest density value is selected as the first centre.
new method based on malicious customers for identifying The mean point of a bucket, which has the maximum value
outlier data, and improved K-means clustering is presented. of multiplication of its density by the distance from its
In Sect. 4, the evaluation of the proposed approach is nearest centre, is selected as the ith centre.
discussed and the results are expressed. Finally, Sect. 5 In K-means ?? (Arthur and Vassilvitskii 2007; Min
presents the conclusion. and Kai-fei 2015), the first centre is randomly selected
from among the data points of data set. It selects a data
2
point x 2 X as the ith centre with probability PDðxÞ 2 ,
2 Related work x2X
DðxÞ
where D(x) represents the shortest distance from a data
The related works in this research are divided into two point to the closest centre which has already been selected.
parts. The first section reviews K-means optimization The global K-means (GKM) (Likas et al. 2003) begins
algorithms, and the second part focuses on CRM with two centres, adding a centre at each time. To add the
improvement based on data mining algorithms. ith centre, each data point is considered by GKM as a
candidate for the ith centre. In this algorithm, the data point
123
Determination of Customer Satisfaction using Improved K-means algorithm
leading to the minimum value of bn ¼ and consistency of various views in a collaborative manner.
Pn n o The views and the characteristics in each view are allo-
j 2
j¼1 max 0; di1 xn xj is chosen as the ith centre. cated with weights that reflect their importance. An itera-
Modified GKM (MGKM) (Bagirov 2008) selects the tive optimization method has been introduced to optimize
starting point of the ith centre in a completely different the objective function and, subsequently, accomplish the
manner through minimizing another auxiliary cluster ultimate clustering result.
function. The experiments show that although the MGKM Bai et al. (2018) employed fuzzy K-means clustering
can minimize the SSEDM more effectively than the GKM, algorithm to find arbitrarily shaped clusters. This algorithm
it is slower than the GKM. Lai et al. (Lai and Huang 2010) inherits the scalability of fuzzy K-means and tackles its
intended to speed up GKM through the use of the cluster drawback concerning finding arbitrarily shaped clusters.
membership and geometrical information of a data point. To obtain the initial cluster centre, Chen et al. (2018)
The fast MGKM (FMGKM) (Bagirov et al. 2011) exploits proposed an optimized K-means algorithm upon mean shift
data collected in previous iterations of the incremental clustering. This algorithm selects the initial cluster centre
algorithm and cuts down on memory usage while accel- in the high-density area of the data set, being more in line
erating the MGKM. with the actual distribution of the data.
Erisoglu et al. (2011) proposed a mechanism to reduce Deng et al. (2018) presented a heuristic method to
computational time of selecting initial centres. Their pro- address the scalability problem of K-means, supported by
posed mechanism tries to select two dimensions which an approximate k-nearest neighbours graph when the data
describe the change in the data set at first in the best way. size and the cluster number are large.
Thereafter, these two dimensions are used in all compu- In another work, Dong et al. (2019) suggested a new
tations while selecting k centres to calculate the required intrusion detection algorithm DB-K-means in which the
distances between data points. Then, the mean of data advantages of K-means and DBSCAN are combined. Ini-
points is determined according to the selected dimensions. tially, DB-K-means employs the optimized DBSCAN to
The farthest data point to the mean is selected as the first remove noise points; then, it uses the optimized K-means
centre. A data point with the largest sum of distances from algorithm to divide the data into different clusters and mark
previous i - 1 centres is selected as the ith centre. This is it either as normal or abnormal.
iterated until all required centres are selected. Xiaofeng and Xiaohong (2017) proposed an optimized
Yu et al. (2018) proposed tri-level K-means algorithm K-means and multi-level SVM fusion of semi-supervised
and bi-layer K-means algorithm. K-means algorithm is learning network intrusion detection algorithm. The algo-
vulnerable to outliers and noisy data and also susceptible to rithm primarily applies the optimized K-means to divide
initial cluster centres. Tri-level K-means algorithm can the data into various clusters and mark it as normal or
tackle these shortcomings. While the data in a data set S are abnormal; then, it employs the multi-level SVM to classify
often changed, after a while, the trained cluster centres the clusters marked as abnormal in order to obtain the
cannot precisely describe the data in each cluster. There- ultimate goal of achieving improved detection efficiency.
fore, the cluster centres must be updated. When the data in Gu et al. (2019) proposed the semi-supervised K-means
a cluster are significantly different, only a cluster centre algorithm using hybrid feature selection (SKM-HFS) to
cannot precisely describe each datum in the cluster. Noisy detect distributed denial of service attacks. They intro-
data, outliers, and the data with different values in a same duced an optimized method for selecting initial cluster
cluster may slow down the performance of a pattern centres based on density in order to solve the issue of
matching system. Bi-layer K-means algorithm can over- outliers and local optimal of K-means clustering.
come these problems. Moreover, a genetic-based algorithm Hu et al. (2019) introduced an efficient multiple kernel
is provided to derive the fittest parameters used in tri-level K-means clustering approach called Consensus Multiple
K-means algorithm and bi-layer K-means algorithm, Kernel Clustering with Late Fusion Alignment and Matrix-
respectively. induced Regularization (CMKC-LFA-MR) with the aim of
Zhang et al. (2018) proposed multi-view clustering, considering the correlation between various clustering
which has been of considerable interest in recent years due partitions. Their proposed approach promotes clustering
to its ability to analyse data from multiple sources or views. performance.
They proposed a novel multi-view clustering approach Manxi et al. (2018) proposed an optimized K-means
termed Two-level Weighted Collaborative K-means (TW- algorithm through the incorporation of mutual information
Co- K-means). A new objective function is designed for into the K-means algorithm and modifying the iteration
multi-view clustering, exploiting the particular information conditions in a way as to find the community structure of
in each view while benefiting from the complementariness Bayesian networks. In their proposed algorithm, mutual
information is employed as a distance between nodes. The
123
H. Zare, S. Emadi
non-central points in the clustering subset are randomly Sohrabi and Hadavandi (2011) investigated the cus-
selected as the new clustering central points, with the tomer life cycle value of a private bank using K-means
central points being materialized. This can also cut down clustering to classify customers. Customers’ features were
on the sensitivity of the K-means algorithm to the isolated analysed by partitioning the customers into eight clusters
points. using the customer life cycle value and RFM (Recency,
Nguyen et al. (2019) introduced a kernel-based distance Frequency, and Monetary) model. The disadvantages of
metric learning approach called KDMLSC to enhance the using this method in customer relationship management are
practical use of K-means clustering. Obtaining an appro- based on the RFM model variables to analyse their fea-
priate distance metric accurately reflecting the (dis)simi- tures. According to Rinartzio Kumar, Chung and Tessa, the
larity between examples is a key factor in successfulness of RFM model is unable to identify customers with long-term/
K-means clustering. short-term communications. In the LRFM model (length,
Wang et al. (2019) developed two multiple kernel K- recency, frequency, and monetary), the length of customer
means algorithms with late fusion. Their proposed algo- relationship has a significant impact on customer loyalty
rithms integrate the various clustering indicator matrices and profitability so that increasing the length of customer
instead of the optimal kernel for clustering in the former relationship will improve customer loyalty.
multiple kernel framework. The clustering process and the Wang and Zhang (2010) examined the ability of the
optimization assignment can be linked to one optimization RFM model to classify customers in after-sales service
problem. In order to implement the proposed framework, companies. In this research, 5821 customers were divided
two novel average and adaptive approaches (average into eight clusters using K-means methods and the hierar-
multiple kernel K-means with late fusion (Average- chical analysis process (AHP) so that customer life cycle
MKKMLF)) and (adaptive multiple kernel K-means with value was determined in each cluster. However, inappro-
late fusion (Adaptive-MKKM-LF)) with carefully designed priate clusters may be created as there are customers with
deterministic fusion optimization goals have been intro- very little interaction with the organization but with a
duced to overcome the optimization problem in multiple significant role in customers clustering. As a result, these
kernel K-means clustering with proved convergence. types of customers, known as low-value customer groups,
should be eliminated from the data set and identified as an
2.2 CRM improvement based on data mining outlier. Li et al. (2011) used a two-stage clustering tech-
algorithm nique to analyse the characteristics of the customers of a
knitting factory based on extended RFM model. In this
Data mining techniques applied on CRMS include associ- approach, the optimum number of clusters was determined
ations, classification, clustering, predictive analysis, using the WARD index (Eszergár-Kiss and Caesar 2017),
sequence, or visualization (Chiang 2018; Shmueli et al. and the customers were divided by K-means method into
2017). five clusters. Since the studied company belongs to the
Also, numerous data mining methods have previously textile industry and the demand for this industry depends
been proposed (He and Song 2009; Maryani and Riana on the seasons, more accurate results can be obtained while
2017; Yuliari et al. 2015) based on clustering for pro- using the LRFM model. This is due to the fact that the gap
cessing the customer data. As an example, one can refer to between the first and last purchase in different seasons can
the application of data mining in customer partitioning, have a significant impact on the result and division of
behavioural modelling with the aim of predicting Customer customers.
Life cycle Value (CLV) and providing appropriate service Li et al. (2017) suggested a customer-based data mining
to potential customers in the direction of CRM imple- model based on a binary behavioural response and divided
mentation, corporate customer turnaround analysis, and the customers’ businesses into three stages as follows. The
forecasting and evaluating the organization’s performance first step is to collect samples of the feedback on marketing
(Sohrabi and Hadavandi 2011). Kafashpour et al. (2012) activities. In the second step, the decision tree algorithm is
determined the values of RFM model variables, including applied to use sample data to find out the rules of the
the exchange rate, the number of exchanges, and the communications of target customers. In the third step, new
monetary value of exchange in 260 customers, and feedback data are collected by the rules and are used to
weighted them using the hierarchical analysis process in create the target customer list. The authors do not identify
the East Tosh Business Company. They employed the noise and malicious customers in their approach; as a
adapted K-means for clustering customers as the life cycle result, inappropriate customer clusters may be created.
value pyramid to identify the valuable customers of the Pawar (2016) proposed a new technique for clustering and
company. classification of large numbers of customers so that orga-
nization can identify valuable customers. Mukhlas et al.
123
(2016), in support of the local cooperative community in in order to set marketing strategies. They used K-mean
customer partitioning, market analysis, and prototype algorithm and crisp method on real data from an Iranian
building, have investigated customers clustering based on state bank. The results show nine cluster patterns between
interactive and behavioural characteristics and have out- customers to propose the customer strategies (Alvandi et al.
lined a data mining method which is used to predict the 2012).
purchase of customer-interested products, increasing pro- Anitha and Patil(2019) applied business intelligence in
ductivity and profits for the organization. In this way, by identifying potential customers in the retail industry by
offering a trading business, the customer is eager on future using K-means algorithm based on the RFM (Recency,
purchases from the organization. In this research, more Frequency and Monetary) model. Finally, a variety of data
attention is paid to loyal customers and their behavioural set clusters are validated based on the calculation of Sil-
characteristics, and they are distinguished from other users, houette coefficient. Khalili et al. (2018) proposed a hybrid
and this can help to classify customers. However, they did method based on the methods like decision tree approach,
not notice customers with no loyalty to the organization clustering, and rule extraction to predict the new entities
and did not identify these customers in their community. for the customer centric companies and K-means for pre-
The inappropriate clustering is done, and the analyses of dicting future transactions based on the historical beha-
the clusters are not accurate. Zahrotun argues about clus- viour of the customers under various segmentations. They
tering process through applying Fuzzy C-means (FCM) used the hybrid feature selection for filtering making and
technique to customer’s data at tokodiapers.com online decision-making methods. Christy et al. (2018) used
shopping services. The proposed method in their study has unsupervised algorithms like K-means and fuzzy C–means
the ability of partitioning customer data on tokodiapers.- for the RFM analysis using transactional data set for
com through the use of fuzzy C-Means clustering method evaluating customers on their purchase behaviour. Carnein
(Zahrotun 2017). and Trautmann (2019) believed that because of high
In another study conducted by Maghfirah et al., cus- volatility of the market, customer segments change over a
tomer segmentation has been employed to help the banks period of time. They have introduced the concept of
with dividing the market into client groups according to ‘Stream Clustering’ to identify the new clusters or
their needs, characteristics, or behaviours, which may emerging clusters and replacing the older ones. Qadadeh
require a specific good or marketing mix (Maghfirah et al. and Abdallah (2018) employed K-means and self-orga-
2015). Alsac et al. (2017) analysed the data of a company nizing SOM algorithms to cluster the customer features via
operating in health sector in Turkey, through employing RFM model.
certain data mining algorithms such as naive Bayes which Fränti and Sieranoja (2019) studied the most important
is a classification rule algorithm, J48 which is an algorithm factors that deteriorate the performance of the K-means
for generating decision tree, and k-nearest neighbour IBk algorithm are cluster overlap. In case with high overlap, K-
algorithm. Accordingly, customers were categorized in means iterations perform satisfying irrespective of the
terms of product groups and risk factors. The aim of their initialization. Also, all initialization approaches exhibit
work was to develop a marketing strategy, also to create a poorer performance with increasing number of clusters. To
more effective sales campaign for the company considering perform successfully, the K-means relies directly on the
classification outcomes. number of clusters. Higher number of clusters means more
Kumar and Reddy (2017) presented an effective method possible errors before and after the iterations. They stated
for implementation of K-means. To do so, they firstly that dimensionality has no positive impact in this respect.
computed a kd-tree for the given data set. Secondly, they Although it has a slight impact on some initialization
used the kd-tree to select and separate initial cluster centres approaches, K-means iterations are typically independent
located at dense areas. Finally, the cluster centres were of the dimensions.
updated by recursively visiting the nodes in the kd-tree and
eliminating distance computations towards cluster centres
which are not nearest to at least one point in the node. 3 Proposed method
Peker et al. proposed a RFMP model for customer seg-
mentation in the grocery retail industry. They classified This research has proposed a methodology to modelling
customers into five different groups by K-means algorithm, customer behaviour and extracting knowledge for customer
and they are profiled based on LRFMP features. Finally, relationship management based on the CRISP methodol-
CRM and marketing strategies are recommended to them ogy, because CRISP methodology is flexible and can be
(Peker et al. 2017). Alvandi et al. purposed a model to customized easily (Khajvand and Tarokh 2011).
calculate Customer Lifetime Value (CLV) based on LRFM As Fig. 1 shows, the proposed method is implemented
customer relationship model. They classified the customers in the following two phases:
123
H. Zare, S. Emadi
Phase1
Scaling
Extracting data: LRFMM2
Transaction Data: LRFMM2 parameters parameters
Customer data
Outlier detection and cleaning
Phase2
Evaluating clusters by SSE

Customer classification
Extracting knowledge from
Determining initial data of each cluster
clusters
Clustering
Comparison with other algorithms
Fig. 1 Structure of research methodology
• Data preparing and pre-processing One of the most powerful and simplest models to
• Modelling and Evaluation implement CRM may be the RFM model—Recency, Fre-
quency, and Monetary value (Kafashpour et al. 2012;
Olson 2017):
3.1 Phase 1: data preparing and pre-processing
• The recent purchase indicates the duration of the last
Extracting data, scaling L, R, F, M, M2 parameters, cus- business interaction and the current time. As long as
tomer data and outlier detection, and cleaning are sub- this period is lower, the value of R is higher.
phases of data preparing and pre-processing phase. • The frequency of purchases indicates the number of
Information of collected data includes customer’s ID, purchases and transactions in a given range. Higher
number of purchased systems, price paid, purchase date, number of repetitions indicates the greater value of F.
and customer satisfaction. • The monetary value of a purchase represents the value
In extracting data sub-phase, there are transaction data of money spent by a customer on interactions in a
in detail which include date of transaction, number of particular time interval. Greater amount of money spent
purchased systems in each date, monetary value of trans- represents the higher value of M.
action, and customer satisfaction. For extracting L, R, F,
In RFM model, if R, F, and M are large, the likelihood
M, and M2 parameters of each customer in this research,
of a new transaction with the customer and customer
the distance between the customer’s first and final pur-
repurchase is higher. The basic assumption of the model is
chases was calculated as L feature. The distance between
that future patterns of exchange and customer purchasing
customer’s last purchase and the end of the time period is
are likely to be similar to those of the past and present. The
calculated as R feature. The number of purchased systems
RFM model selects customers who have recently created
between the customer’s first and last purchases is calcu-
high financial value for the company and in the short time
lated as F feature. The total monetary value between the
have a more frequent buying frequency than the average
customer’s first and last purchases over a given time period
purchase frequency among recurring buyers as valued
is calculated as M feature. In this research, the malicious-
customers, while the factor of the duration of communi-
ness of customers (M2 feature) is also considered as one of
cation with the company is ignored.
the influential parameters in CRM which is calculated from
The results show that the length of the periods in which
the customers’ satisfaction in each record of the data set
an individual can be identified as a customer by the orga-
and the amount of minimum and maximum of malicious-
nization has a positive relationship with the periods in
ness of customers is determined by experts and sales
which the customers wish to continue to communicate with
managers.
the organization. Hence, in the LRFM, customers are cat-
egorized based on four features, namely customer
123
relationship length, purchasing novelty, buying frequency, Therefore, customers whose values of the normalized
and purchasing value (Kumar et al. 2006). They found that parameter M2 (M20 ) are greater than 0.8 are outlier data.
an increase in the duration of customer relationship will 0
If M2 [ 0:8 ! noise data ð4Þ
improve customer loyalty, which defines the time interval
between the first and last customer purchases in the desired Condition 2 according to Eq. (5), customers whose mali-
range. cious range (M2) is greater than 5 (from the experts’ view)
In addition to LRFM and RFM models, some customers and DIS (Xi) is greater than p*AVE (DIS (x)) are consid-
may be against the organization, and pessimistic for some ered as the outlier data. The value of p = 3 is obtained
reason, and this pessimism can have a significant impact on based on multiple tests in various data sets.
customer relationship management. This negative view can 0
If M2 [ 5&ðDISðXi Þ [ ðp AVEðDISðxÞÞÞÞ ! noise data
be towards the organization or one of the members of the
organization. For this reason, this research by using a new ð5Þ
feature called Malicious (M2) can identify malicious cus-
Finally, after deleting outlier data, the customer life
tomers who are specific to this set and eliminate these
cycle value (CLV) for each customer is obtained based on
customers from their data set to determine their CLV. This
the LRFMM2 features according to Eq. 6:
new model is called LRFMM2. According to Eq. (1),
customer satisfaction rate is between maximum and mini- CLV ¼ L þ R þ F þ M þ M2 ð6Þ
mum, which is derived from experts’ opinions and
knowledge and customer surveys.
MIN\M2 \MAX ð1Þ
Since the proposed method performs clustering with 3.2 Phase 2: modelling
high precision in a noisy environment, in outlier detection
and cleaning sub-phase, first the data whose CLV and In the second phase, customers should be segmented based
LRFMM2 features further away from the rest of the data on LRFMM2 and CLV features. For clustering customers,
are detected and then these data are deleted. However, the there are different algorithms which can be used. Among
quality of the model depends on the amount determined for the existing clustering algorithms, one of the most com-
this distance (Baxter et al. 2002). monly used and simplest ones is the K-means algorithm
According to Eq. (2), the sum of the distance of each (Danesh et al. 2011).
datum from the rest of the data is calculated: K-means method partitions the data set into k clusters so
that each cluster has the following two properties:
X
k X
k
DISð xÞ ¼ j Xn X m j ð2Þ • Each cluster contains at least one data,
m¼1 n¼1 • Each datum in the data set belongs exactly to a cluster.
where x is a summation of the data distance, Xn and Xm When clusters have different sizes and densities and
represent the value of nth and mth set of data, respectively, non-spherical forms, K-means is faced with certain limi-
and k is the number of data. Then, in Eq. (3), the means of tations. Moreover, the algorithm will encounter problems
the data distance are shown. when the data set contains noise or outlier data. In general,
AVEðDISð xÞÞ ¼ ðDISð xÞÞ=k ð3Þ the algorithm has the following shortcomings (Liao et al.
2012):
According to Eq. (3), DIS (x) is the distance of x from
the rest of the data. The average distance is a suit- • The final answer depends on the selection of early
able measure for detecting outlier data; if it exceeds the clusters,
specified threshold, it is identified as outlier data and • There is no known procedure for calculating the centres
should be removed from the data set. of clusters,
In the following, the outlier data are deduced from the • If there are zero numbers of data belonging to a cluster
data set using the two parameters Malicious (M2) and the in one running of the algorithm, there is no way to
average of the data distance with the following two modify the method,
conditions: • It is assumed that in this method the number of clusters
Condition 1 the malicious range of customer is set to is determined from the beginning, while the number of
1 B M2 B 10 by an expert, and according to the experts clusters is usually unknown in many applications,
view and Eq. (4), a customer with a malicious range of • The ending of the K-means algorithm is guaranteed, but
greater than 8 is an unacceptable customer in the data set. the final result is not the same and the number of
clusters is not always optimal.
123
H. Zare, S. Emadi
Determining the optional number of clusters, determin- The value selected for custno depends on the number of
ing initial data of each cluster and clustering are sub-phases records in the data set, so for large data sets, n must be
of modelling phase. The proposed algorithm is presented to closer to 5000.
find the optimal number of clusters as well as the initial F
values of optimized clusters, also to eliminate noise data. p¼ þ3 ð9Þ
custno
Determining the Optimal Number of Clusters where F is the total number of data sets.
Selection of suitable cluster numbers is one of the most In the next step, according to Eq. (10), if the distance
important issues in clustering. The number of clusters is between each datum and the previous data is larger than p *
suitable if: AVE (Xm), then a large mutation is made at this point, in
• Samples in a cluster are as similar as possible; which case the number of clusters is added by 1. This
• Samples belonging to different clusters are as incom- operation will continue until the last record in the data set
mensurable as possible. and the final number of clusters (K) is equal to the number
of mutations that have occurred at this stage.
That is, the clusters should have the maximum possible
If ðDISðm; nÞ [ ðp AVEðXm ÞÞÞ ! Great Jump PointðK þ þÞ
compression and should be as far from each other as pos-
sible (Subbalakshmi et al. 2015). For a proper clustering, ð10Þ
both criteria must be met, since if only the compression In order to determine the number of clusters in the
criterion is held, then any data can be considered as a proposed algorithm, the distance of each datum is calcu-
cluster. In this case, the best clustering is to get the entire lated from the rest of the existing data and then the average
data into a cluster, so that the distance between each cluster of this data distance is obtained. This value is a measure of
is zero (Kanungo et al. 2002; Tzortzis and Likas 2014). the separation of data that has jumped in the distance. If
In this research, to determine the optimal number of according to Eq. (5) the data interval is more than (p *
clusters, at points where the increase in CLV has a sig- average distance) from its previous data, these data are
nificant mutation, the cluster should be broken and added identified as mutated data and added to the number of
to the number of optimum clusters. To do this, the data set clusters.
is firstly sorted in ascending order by the CLV of each Since the number of mutations may also increase as the
datum. Then, in accordance with Eq. (7), the total Eucli- number of customers increases, according to the expert
dean distance of each datum which is smaller than the view p is recalculated according to Eq. (9). Therefore,
original data with its previous data is calculated based on increasing or decreasing the number of clients will not have
the CLV. much effect on the number of clusters of the algorithm.
X m
DISðXm Þ ¼ jXn Xn1 j ð7Þ Determining initial data of each cluster
n¼2 It must be noticed that appropriate initial centres can
accelerate the convergence of the K-means. Therefore,
In the next step, according to Eq. (8), the average binary
various initialization techniques have been suggested for
distance of the data less than Xm is computed to determine
the K-means ((Ismkhan 2018; Arthur and Vassilvitskii
the mutation criterion and the number of clusters.
2007; Erisoglu et al. 2011; Celebi et al. 2013).
DISðXm Þ The global K-means (GKM) (Likas et al. 2003) starts
AVEðXm Þ ¼ ð8Þ
m1 with two centres, to which a centre is added at every
Based on the tests performed on the data, the condition iteration. To add the ith centre, GKM considers each data
of the mutation detection between the data was predicted point as a candidate for the ith centre. Modified GKM
with coefficient 3, based on the average binary distance of (MGKM) (Bagirov 2008) suggests a different approach to
the data. select the starting point of the ith centre, minimizing
However, if the number of data is increased, the distance another auxiliary cluster function. The experiments
between the data will be smaller with respect to the nor- described in Bagirov (2008) show that MGKM outper-
malization operation; thus, the threefold condition for these forms GKM in terms of minimizing the SSEDM (sum of
data in large sets will not result in the optimal number of the squared Euclidean distance from the mean), although
clusters. For this reason, according to Eq. (9), for every its performance is slower. Another version is fast MGKM
custno record added to the set, one unit is added to p in (FMGKM) (Bagirov et al. 2011) which exploits the data
Eq. (5), so that the condition for establishing this equation collected in previous iterations of the incremental algo-
with a higher coefficient is obtained more satisfactorily. rithm. Its advantages include decreasing memory usage and
The parameter custno has values between 2000 and 5000. acceleration of the MGKM.
123
In the proposed method, the first step in the algorithm is 4 Results and discussion
to determine the initial element of each cluster. This
method initially begins with a cluster containing an ele- The purpose of this research is to apply the proposed
ment, and in other steps, it calculates the distance of the method on the customers of a company to extract appro-
other elements from the centre of the specified clusters. priate strategies to each company’s part and to use the
However, the closest data to the data average are deter- acquired knowledge in the organization’s knowledge
mined as the only member of the first cluster. After management cycle.
assigning each element to a cluster, data average of that Two types of data sets are used in experiments. The first
cluster is updated based on CLV according to Eq. (11). data set is the customer database of Hamkaran System Co.
This equation is applied to each attribute of cluster centre with 7312 records. Firstly, empty/noisy features such as the
and new data: invalidity of the date field, the purchase price, or the system
datanew þ ðc Aveold Þ number need to be removed to ensure quality pre-pro-
Avenew ¼ ð11Þ cessing. Having removed empty features, 6319 customers
sizeold þ 1
were still available.
where datanew is the new input element to the cluster, aveold The second database is the customer database of Ham-
is the average of the previous elements in the cluster, and karan System Co. with 15,000 final records.
sizeold is the number of cluster elements before the entry of After the feature selection process, the final version of
the new element into the cluster. pre-processed data sets with the most relevant features was
In the following, the distance between all elements of obtained. The selected features involved the customer’s
the data set and centre of the cluster is calculated. Then, name, the number of purchased systems, the purchase date,
two elements are extracted; the first element which has the and the purchase price, which could be used for clustering.
smallest distance with the centre of the cluster is the new Below sections are the results of the evaluation on the
member of the first cluster, and the second element which first data set, and in the final section, comparisons with
has the greatest distance with the cluster centre is the first other algorithms are performed on both data sets.
element of the second cluster. The process of adding the
two elements to the clusters repeats so that each k cluster 4.1 Phase 1: Collecting and Pre-Processing
has at least one member. Finally, the distance between the of Input Variables
data and the centre of clusters is calculated and the data
with the greatest average of distance from the entire centre The selected features should be used to determine five
of cluster are determined as the first element of the new features, namely L, R, F, M, and M2, for customers clus-
cluster according to Eq. (12), and the data that have the tering (Table 1). In this research, the distance between the
smallest distance from the centre of each cluster are con- customer’s first and final purchases is calculated as L fea-
sidered as a member of that cluster according to Eq. (13): ture. The distance between customer’s last purchase and
P
k ave 1 distðx; aveÞ the end of the time period is calculated as R feature. The
b ¼ max ð12Þ number of purchased systems between the customer’s first
k px1
P and last purchases is calculated as F feature. The total
k ave 1 distðx; aveÞ monetary value between the customer’s first and last pur-
a ¼ min ð13Þ
k px1 chases over a given time period is calculated as M feature.
The maliciousness of customers (M2 feature) is also
where x is the new data, ave is the average of the data in the
determined by experts and sales managers.
clusters, k is the number of optimal clusters, and p is the
Table 2 shows an example of customers whose
number of data remaining in the data set. The centre of each
LRFMM2 features are calculated. Since the selected fea-
cluster, after adding the elements to them, is updated
tures of the samples in this study do not have the same
according to Eq. (11) to get k clusters with at least 1 member.
change intervals, to minimize the effect of different prop-
Then, using Eq. (14), the closest data to the centre of
erties, Min–Max normalization was used (Han et al. 2011).
each cluster are found and are added to it:
Normalization of LRFMM2 features for some data is
c ¼ minfdistðx; aÞgx2mðaÞ ð14Þ shown in Table 3. Finally, clustering of the customers is
performed according to these features.
where m(a) is the desired cluster number (a) in Eq. (13)
In the next step, LRFMM2 features are weighted using
and x is an element of the data set. After all the elements of
the grouped hierarchical method (Mojena 1977), the output
the original data set are clustered, the average of each
of the Expert Choice software, and based on the paired
cluster is also available and can be used as the centre of the
comparison of the LRFMM2 features. By comparing the
clusters in the K-means.
123
H. Zare, S. Emadi
Table 1 Introducing LRFMM2 features

Row Variable Description
1 ID Customer number
2 L The interval between the first purchase and the last customer’s purchase
3 R The interval between the last purchase of the customer by the end of the specified time period
4 F The number of customer purchases in the specified time period
5 M Customer purchases based on the financial value within the specified time period
6 M2 Customer satisfaction
pair of features, the significance coefficient of each of them The values of the LRFMM2 and CLV features are
is calculated. Therefore, the special vector is calculated shown in Table 5.
using geometric comparison and normalization of the Finally in this step, malicious customers are excluded
obtained values. from the data set to eliminate their effect on clustering.
According to the output of this software, the relative According to Eq. (3) and the conditions 1 and 2, the outlier
weights of these features were obtained according to data are detected and are deleted from the data set. The
Table 4, L (0.119), R (0.06), F (0.244), M (0.463), and M2 number of outlier customers was 16. An example of these
(0.115). results is shown in Table 6. According to this table, the
Because each of the LRFMM2 features has different data of row 4920 and 5059 are the result of condition 1 and
weights, these weights can affect the way customers are the data of row 6230 are the result of condition 2.
clustered. It is also necessary to determine the association
of the weights of each feature to the values obtained for 4.2 Phase 2: modelling
that feature for all records in the data set. This is deter-
mined by multiplying the normalized value of the feature To obtain the number of clusters in the proposed method, it
by its weight (Eq. (15)): is necessary to identify the points at which a large jump is
M200 ¼ WM2 M20 M200 ¼ WM M20 made in the customer CLV. To find points with a large
ð15Þ
L00 ¼ WL L0 R00 ¼ WR R0 F 00 ¼ WF F 0 jump, the data are first arranged in terms of the CLV in
ascending order; then, the distance of each datum from its
Finally, each customer life cycle value (CLV) is equal to previous data in ascending order CLV is obtained. The final
the total value of the LRFMM2 features based on Eq. (6): result of this step is shown in Table 7, and the average of
CLV ¼ L00 þ R00 þ F 00 þ M 00 þ M200 the distance of each datum is calculated with data smaller
than itself (Eq. 7).
Table 8 shows the distance between some data and
Table 2 LRFMM2 indices for customers’ sample centre of the first cluster. The values of optimal centres
Row Customer name L R F M M2 obtained are shown in Table 9. After that, these steps are
repeated to obtain all the cluster centres. Using Eqs. 7–10,
1 … 172 1278 7 43,079,000 1
the optimal number of clusters was calculated as 15. Also,
2 … 411 1020 3 53,206,000 4
as shown in Table 10, using classic K-means clustering and
3 … 0 223 1 23,423,000 3
the results obtained from the Dunn index, the optimal
4 … 0 1257 1 23,423,000 3
number of clusters with the highest Dunn index was
5 … 673 828 2 28,723,000 9
determined to be 15. This result shows that the proposed
6 … 385 1314 2 254,830,000 7 method has correctly identified the number of clusters, but
7 … 239 305 2 30,617,000 2 this was done in a shorter time.
8 … 88 770 2 284,230,000 1 The obtained values of the optimal centres in improved
9 … 882 692 2 31,148,000 4 K-means and centres of the clusters created randomly in
10 … 0 1595 1 234,230,000 2 classic K-means are shown in Table 9. Then, customers
11 … 91 913 8 702,700,000 2 were clustered based on the LRFMM2 features. Table 11
12 … 534 782 2 468,460,000 5 shows the number of data in each cluster in improved K-
13 … 819 824 3 248,000,000 1 means and classic K-means.
14 … 1040 769 5 305,050,000 1
15 … 158 1522 5 2,960,000 1
123
Table 3 Normalization of
Row Customer name l normal R normal F normal M Normal M2 normal
LRFMM2 features
1 … 0.95264317 0.245200219 0.047619048 0.007114706 1
2 … 0.691629956 0.905101481 0.031746032 0.006335914 1
3 … 0.345814978 0.878771256 0.007936507 0.002381598 1
4 … 0.890969163 0.921557872 0.095238095 0.016010109 1
5 … 0.642621145 0.768513439 0.07935079 0.013666975 0.888888889
6 … 0.263215859 0.280307186 0.007936508 0.004984594 0.444444444
7 … 0.233480176 0.93527153 0.063492064 0.026344879 0.333333333
8 … 0.898127753 0.990624412 0.087301587 0.008003638 1
9 … 0.756057269 0.840921558 0.103174603 0.078547543 0.777777778
10 … 0.844162996 0.967087219 0.142857143 0.040894094 0.777777778
11 … 0.050110132 0.500276273 0.055555556 0.02600279 0.666666667
12 … 0.65969163 0.936368623 0.023809524 0.007473242 0.666666667
13 … 0.900881057 0.947888097 0.158730159 0.038175515 0.333333333
14 … 0.572687225 0.579264948 0.031746032 0.003742459 1
15 … 0 0.723406473 0 0.001119114 0.888888889
Table 4 Weight of LRFMM2 features K-means. The results of SSE are shown in Table 12. Given
the SSE values, our proposed method with the lowest SSE
L R F M M2 Weight
value is selected for extraction of knowledge.
L 1 4 3 6 3 0.119
R 0.25 1 3 5 3 0.06 4.4 Using knowledge derived from clustering
F 0.33 0.33 1 2 6 0.244
M 0.16 0.2 0.5 1 4 0.463 Having classified the customers in 15 clusters, the required
M 0.33 0.33 0.16 0.25 1 0.115 knowledge of these clusters should be extracted so that the
clusters are evaluated and analysed. Since customers in a
4.3 Evaluating clusters by sum of total squared cluster have similar behavioural characteristics, extracted
errors criteria knowledge can be a great guide to adopting optimal
strategies tailored to each cluster for better management of
Sum of squared error criterion (SSE criteria) (Feng et al. customer relationships. Table 13 shows the average value
2015) is used to compare the proposed method and classic of L.R.F.M.M2 features in each cluster based on dividing
Table 5 Values of LRFMM2

Row Customer name L (value) R (value) F (value) M (value) M2 (value) CLV
and CLV features
1 … 0.0113365 0.01471201 0.01161905 0.00329611 0.115 0.15596
2 … 0.082304 0.05430609 0.00774602 0.00293353 0.115 0.26229
3 … 0.041152 0.05272628 0.00193651 0.00110268 0.115 0.21192
4 … 0.106053 0.05529347 0.0232381 0.00741268 0.115 0.30697
5 … 0.0764719 0.04611081 0.01936508 0.00632781 0.10222222 0.2505
6 … 0.0313227 0.01681843 0.00193651 0.00230787 0.051111111 0.1035
7 … 0.0277841 0.05611629 0.01549206 0.01219768 0.03833333 0.16992
8 … 0.1068772 0.05944048 0.02130159 0.00370568 0.115 0.30632
9 … 0.0899708 0.05045529 0.0251746 0.03636751 0.08944444 0.29161
10 … 0.1004554 0.05802523 0.03485714 0.01893397 0.08944444 0.30172
11 … 0.0059631 0.03001646 0.01355556 0.01203929 0.07666667 0.13824
12 … 0.0785033 0.05618212 0.00580952 0.00346011 0.07666667 0.22062
13 … 0.1072048 0.05687329 0.02783016 0.01767526 0.03833333 0.25882
14 … 0.0681498 0.0367559 0.00774603 0.00173276 0.115 0.22738
15 … 0 0.04400439 0 0.00051815 0.10222222 0.14674
123
H. Zare, S. Emadi
Table 6 Outlier data identification Also, customer life cycle value is calculated in each
Row Customer name Distance from other data M
cluster, and the customers are classified in the customer
value pyramid in Fig. 2. As shown in this figure, the
118 … 333.134728 1 clusters with higher CLV are located at upper levels of this
226 … 770.337678 1 pyramid. Also, the linear graph of the customer life cycle
244 … 519.0062648 1 value is shown in Fig. 3.
4920 … 2154.631055 6
1109 … 706.3420992 2 4.5 Analysis of clustering results
4574 … 351.6237346 6
5214 … 322.6735643 7 The purpose of this research is to classify the customers of
5059 … 1630.689118 7 Hamkaran System Co. based on their CLV using the
1941 … 938.6377863 3 LRFMM2 features and the improved K-means method.
6230 … 459.2522449 9 Also, the analysis of clusters in order to more accurately
3159 … 310.050627 4 assess the features of customers and the final partitioning of
… … … … customers in the form of CLV pyramid are some of the
… … … … special cases used in this study to satisfy the customers.
… … … … Customers in all clusters other than clusters from the first to
the fifth in this pyramid are in a favourable position in
Average distance between data: 446.2476034
terms of L, R, F, M, M2 features. Therefore, to maintain
these customers, the company is recommended to trans-
Table 7 Distance of any data from previous data form loyalty behaviour of these customers to loyalty view
Row Customer name CLV through establishing communication and interaction with
them.
1 … 0.262223628
Accordingly, customers in clusters 7 to 15 have the best
2 … 0.0268523932 conditions in terms of L, R, F, M, M features. Customers in
… … … cluster 6 are in favourable conditions in terms of recent
… … … purchasing, malicious and purchasing time. However, these
… … … customers are not in a good position in terms of a number
4206 … 0.1578784479 of purchases and purchasing amounts and the company
4207 … 0.1578898727 should invest on them to order new systems. Cluster 5
4208 … 0.1579069536 customers are considered to be missing customers, and the
4209 … 0.1581086040 company must pay special attention to the sale of new
4210 … 0.1581521162 products to them. Cluster 4 customers are customers who
4211 … 0.1581575984 are dissatisfied with the system and have had long-term
4212 … 0.1581942404 purchases. That is, these customers have not had double-
… … … sided interactions with the company in a long time, and
… … … therefore, they have reduced their satisfaction with the
company. As a result, the cost of purchasing this customer
group is also reduced. Customers in clusters 1, 2, and 3 are
the total value of the features in each cluster into the lost customers, and investments made on them by the
number of customers in that cluster. company will not be successful and beneficial. As a result,
Table 14 shows the comparison between the average these customers are regarded as lost customers as they are
values of the L, R, F, M, M features in each cluster and the very unhappy with the company and have not had an
average values of these features for all data. acceptable purchase over the time frame, which does not
As shown, the sign (:) (desirable state) is used when the help too much financially.
average value of each feature in a cluster is greater than the
average value of that feature in the total data, and the sign 4.6 Comparison of the proposed algorithm
(;) (undesirable condition) is used where the average value with other algorithms
of the feature in a cluster is less than the average value in
the total data. The last column of Table 14 shows the The proposed algorithm was compared with DBSCAN
average ranking of the feature s in each cluster in com- clustering, hierarchical clustering and IK-means-? (
parison with other clusters. Ismkhan 2018) algorithms.
123
Table 8 Distance from the first cluster centre

Row Customer name CLV First cluster centre First cluster
1 … 0.2287927 0.1461945673 0.825981328

2 … 0.16829495 0.1461945673 0.0221003827
3 … 0.1560914671 0.1461945673 0.0098958998
… … … … …
766 … 0.1462017052 0.1461945673 0.0000071379 Least distance
The second element of the cluster 1
… … … … …
5913 … 0.8987083171 0.1461945673 0.7525137498 Maximum distance
New cluster 2 element
… … … … …
6320 … 0.1039865355 0.1461945673 0.0422080318
6321 … 0.0020737094 0.1461945672 0.1441208579
6322 … 0.0015783931 0.1461945673 0.1446161842
Table 9 Value of optimal cluster centres

Row Cluster CLV of cluster centres in improved K-means CLV of cluster centres in classic K-means
1 Cluster 1 0.8987083171 0.5691094129

2 Cluster 2 0.7598177169 0.3215912223
3 Cluster 3 0.6192538932 0.8987083171
4 Cluster 4 0.5761698729 0.3638519496
5 Cluster 5 0.5691094129 0.8987083171
6 Cluster 6 0.4696476840 0.4204430338
7 Cluster 7 0.3638203636 0.4950704583
8 Cluster 8 0.1526541136 0.0564768887
9 Cluster 9 0.0646915269 0.5761698729
10 Cluster 10 0.0417882488 0.2961741289
11 Cluster 11 0.0323226943 0.3239657403
12 Cluster 12 0.0285419652 0.2711738810
13 Cluster 13 0.0269086453 0.1644750508
14 Cluster 14 0.0268523932 0.1689292279
15 Cluster 15 0.0262223628 0.2615105404
The number of data in the first data set is 6319 records (minPts). eps defines the neighbourhood around a data
and in the second data set is 15,000 records. Number of point, i.e. if the distance between two points is lower or
clusters, SSE, and clustering time parameters were selected equal to ‘eps’, then they are considered as neighbours. If
for comparison. the eps value is chosen too small, then a large part of the
DBSCAN clustering method is one of the density-based data will be considered as outliers. If it is chosen very
clustering algorithms. The main concept of this algorithm large, then the clusters will merge and majority of the data
is to locate regions of high density that are separated from points will be in the same clusters. MinPts is minimum
one another by regions of low density. The key idea is that number of neighbours (data points) within eps radius. The
for each point of a cluster, the neighbourhood of a given larger the data set, the larger value of MinPts must be
radius has to contain at least a minimum number of points. chosen. As a general rule, the minimum MinPts can be
DBSCAN requires two parameters: e (eps) and the mini- derived from the number of dimensions D in the data set as
mum number of points required to form a dense region
123
H. Zare, S. Emadi
Table 10 Dunn indicator values for specified clusters in classic K- MinPts C D ? 1. The minimum value of MinPts must be
means chosen at least 3 (Schubert et al. 2017).
Number of clusters Dan index In data mining and statistics, hierarchical clustering
(also called hierarchical cluster analysis or HCA) is a
10 0.0003594385
method of cluster analysis which seeks to build a hierarchy
11 0.0002493929 of clusters. Strategies for hierarchical clustering generally
12 0.0003208687 fall into two types (Szekely and Rizzo 2005):
13 0.0002494322
14 0.0004841421 • Agglomerative This is a ‘bottom-up’ approach: each
15 0.0005287482 observation starts in its own cluster, and pairs of
16 0.0005285562 clusters are merged as one moves up the hierarchy.
17 0.0003212698 • Divisive This is a ‘top-down’ approach: all observations
18 0.0001647504
start in one cluster, and splits are performed recursively
19 0.0005278998
as one moves down the hierarchy.
20 0.0003211122 Calculating the similarity between two clusters is
important to merge or divide the clusters. Then, we use
single linkage (the shortest distance between two points),
complete linkage (the longest distance between two points
Table 11 Number of cluster customers in each cluster), mean or average linkage (the average
distance between each point in one cluster to every point in
Cluster Number of customers per cluster Number of customers
number in improved K-means per cluster in K-means the other cluster), centroid distance and ward’s method
(sum of squared Euclidean distance is minimized).
1 287 1026 Table 15 shows the comparison between these algo-
2 579 1 rithms and the proposed algorithm on the first data set. In
3 780 12 this evaluation, the parameter value P = 2000 is selected
4 874 852 and for the DBSCAN algorithm, two values are selected for
5 1019 1188 required parameters based on repeated experiments. Also,
6 752 1 several methods have been used to determine the similarity
7 608 2 between clusters in the hierarchical algorithm. On the other
8 444 655 hand, because the number of clusters should be given as
9 424 39 inputs in the IK-means-? , in this evaluation the number
10 341 521 of clusters 15 was selected. As Table 15 shows, the
11 161 1 DBSCAN algorithm and IK-means-? are better than the
12 32 898 proposed algorithm in time, but worse than the SSE
13 3 369 parameter. Since the criterion for clustering suitability is
14 1 535 the SSE parameter, the proposed algorithm works well for
15 1 205 small data sets.
Tables 16 and 17 show the comparison between these
algorithms and the proposed algorithm on the second data
set. In this evaluation, the values P = 4000 and p = 5000 is
selected. The rest of the specifications are similar to the
first evaluation. In these tables as in Table 15, the
Table 12 The SSE values in the proposed algorithm and standard K-
means DBSCAN algorithm and IK-means-? algorithm are better
than the proposed algorithm in time, but worse than the
Clustering methods SSE
SSE parameter. Since the criterion for clustering suitability
Proposed K-means 0.3908527624 is the SSE parameter, the proposed algorithm works well
Classic K-means with random cluster centres 1 0.4834075448 for large data sets.
Classic K-means with random cluster centres 2 0.4156998956 In both data sets, the hierarchical algorithm performs
Classic K-means with random cluster centres 3 0.6040612737 worse than the other algorithms.
Classic K-means with random cluster centres 4 0.4834075448
123
Table 13 Average value of L, R, F, M, M2 features per cluster

Cluster Average feature Average feature Average feature Average feature Average feature Average feature value
value L value R value F value M value M2 CLV
1 0.0004717148 0.0089718865 0.0016396217 0.0008347777 0.0365969803 0.048514981

2 0.0012409697 0.0178519721 0.0029030896 0.0018665711 0.0500076761 0.073870289
3 0.0020568430 0.0224960125 0.0027930403 0.0019332268 0.064527778 0.0938069
4 0.0037594242 0.0270518207 0.0038818786 0.0027465332 0.0745760234 0.1121568
5 0.0053918041 0.0329044157 0.0043006059 0.0032322383 0.0838643550 0.129693419
6 0.0116369095 0.0407779729 0.0062781999 0.0045680594 0.0868277187 0.15008886
7 0.0223731787 0.0457105480 0.0088767231 0.0066546835 0.0890451389 0.172660272
8 0.0448456426 0.0478043320 0.0125044330 0.0096441649 0.0837462462 0.198544819
9 0.0682416438 0.0498514256 0.0157438553 0.0115905780 0.0832821119 0.228709615
10 0.0871776810 0.0538265854 0.0191379230 0.0136647278 0.0903812317 0.264188149
11 0.0916562440 0.0552327574 0.0374792468 0.0309624533 0.0914285714 0.306759273
12 0.0927353042 0.0546701865 0.0785496032 0.0896930320 0.0970312500 0.412679376
13 0.0975284508 0.0518705431 0.1213544973 0.2322390499 0.0851851852 0.588177726
14 0.0798794053 0.0596708722 0.0774603175 0.4533626775 0.0894444444 0.759817717
15 0.0170374449 0.0596708722 0.2440000000 0.4630000000 0.1150000000 0.898708317
Table 14 Clusters tagging based on LRFMM2 features a particular feature called maliciousness. Moreover, the
Cluster Average value Average rating
proposed method calculates the optimal number of clusters.
of LRFMM2 features of LRFMM2 features Therefore, the proposed method is more accurate and faster
than the classic K-means clustering as there is no need to
1 ##### 15.15.15.15.15 guess the number of clusters by the user and to compare it
2 ##### 14.14.13.14.14 with the number of predicted clusters in the previous steps.
3 ##### 13.13.14.13.13 The knowledge gained from the results of the proposed
4 "#### 12.12.12.12.12 method can easily be employed by company managers to
5 "###" 11.11.11.11.9 adopt an appropriate strategy for attracting and maintaining
6 ""##" 10.10.10.10.7 customers and improving its financial performance.
7 """"" 8.9.9.9.6 Future researchers are suggested to consider the wide
8 """"" 7.8.8.8.10 application of data mining methods in software industry,
9 """"" 6.7.7.7.11 the application of the proposed method in other fields such
10 """"" 4.5.6.6.4 as customer credibility assessment and prediction of cus-
11 """"" 3.3.5.5.3 tomer’s ability to buy from the company, and evaluation of
12 """"" 2.4.3.4.2 the company’s performance. It is also suggested that the
13 """"" 1.6.2.3.8 maliciousness level of customers should be determined
14 """"" 5.2.4.2.5 using data mining methods such as association rules and
15 """"" 9.1.1.1.1 the rules extraction. On the other hand, intermittent time
slices from the company database allow the dynamic
implementation of data mining based on the LRFMM2
5 Conclusion
model to determine the CLV, which reflects the process of
changes in customer behaviour, plays a significant role in
In this paper, an improved K-means method was proposed
marketing actions, and improves CRM.
based on malicious customers. Therefore, the results of
clustering have been improved through the consideration of
123
H. Zare, S. Emadi
Fig. 2 Customer value pyramid
Fig. 3 Linear life cycle value
123
Table 15 Comparison between

Number of clusters SSE Time (ms)
the proposed algorithm and
other algorithms on the first data IK-means-? 15 3.2 10,125
set
DBSCAN minpt = 10, eps = 0.001 17 2.76 221
minpt = 6, eps = 0.001 16 4.59 219
Hierarchical Average 38 5.35 52,059
Complete 49 5.2 52,473
Single 40 5.27 54,623
Ward 52 5.6 44,904
Centroid 50 5.57 46,459
Proposed algorithm 15 0.39 11,512

other algorithms on the second IK-means-? 15 5.41 90,129
data set and p = 4000
DBSCAN minpt = 10, eps = 0.001 1 6.49 2201
minpt = 6, eps = 0.001 4 6.49 2229
Complete 36 6.54 513,270
Single 35 5.98 514,270
Ward 35 5.35 500,711
Centroid 40 6.23 535,031

other algorithms on the second IK-means-? 15 5.41 93,230
data set and p = 5000
DBSCAN minpt = 10, eps = 0.001 1 6.49 2321
minpt = 6, eps = 0.001 4 6.49 2429
Complete 36 6.27 513,270
Single 35 5.78 514,270
Ward 35 4.85 500,711
Centroid 40 5.57 535,031
Compliance with ethical standards IEEE International Conference on Industrial Technology and
Management (ICITM), pp 41–46
Conflict of interest The authors declare that there is no conflict of Alvandi M, Fazli S, Abdoli FS (2012) K-mean clustering method for
interest regarding the publication of this paper. analysis customer lifetime value with LRFM relationship model
in banking services. Int Res J Appl Basic Sci 3(11):2294–2302
Human and animal rights This article does not contain any studies Anitha P, Patil MM (2019) RFM model for customer purchase
with human participants or animals performed by the authors. behavior using K-means algorithm. J King Saud Univ Comput
Inf Sci. https://doi.org/10.1016/j.jksuci.2019.12.011
Ansari A, Riasi A (2016) Customer clustering using a combination of
fuzzy c-means and genetic algorithms. Int J Bus Manag
References 11(7):59–66
Arthur D, Vassilvitskii S (2007) K-means??: the advantages of
Alsaç A, Çolak M, Keskin GA (2017) An integrated customer careful seeding. In: Proceedings of the eighteenth annual ACM-
relationship management and Data Mining framework for SIAM symposium on discrete algorithms, pp 1027–1035. Soci-
customer classification and risk analysis in health sector. In: ety for Industrial and Applied Mathematics
123
H. Zare, S. Emadi
Bablani A, Edla DR, Kuppili V, Ramesh D (2020) A multi stage EEG Gayathri A, Mohanavalli S (2011) Enhanced customer relationship
data classification using K-means and feed forward neural management using fuzzy clustering. Int J Comput Sci Eng
network. Clin Epidemiol Glob Health. https://doi.org/10.1016/j. Technol 1(4):163–167
cegh.2020.01.008 Govender P, Sivakumar V (2019) Application of K-means and
Bagirov AM (2008) Modified global K-means algorithm for minimum hierarchical clustering techniques for analysis of air pollution: a
sum-of-squares clustering problems. Pattern Recogn review (1980–2019). Atmos Pollut Res
41(10):3192–3199 Gu Y, Li K, Guo Z, Wang Y (2019) Semi-supervised K-means DDoS
Bagirov AM, Ugon J, Webb D (2011) Fast modified global K-means detection method using hybrid feature selection algorithm. IEEE
algorithm for incremental cluster construction. Pattern Recogn Access 7:64351–64365
44(4):866–876 Han J, Pei J, Kamber M (2011) Data mining: concepts and techniques.
Bai L, Liang J, Guo Y (2018) An ensemble clusterer of multiple fuzzy Elsevier, Amsterdam
k means clusterings to recognize arbitrarily shaped clusters. He BH, Song GF (2009) Knowledge management and data mining for
IEEE Trans Fuzzy Syst 26(6):3524–3533 supply chain risk management. In: IEEE international confer-
Baxter R, He H, Williams G, Hawkins S, Gu L (2002) An empirical ence on management and service science, 2009, pp 1–4
comparison of outlier detection methods. In: Sixth Pacific-Asia Hu J, Li M, Zhu E, Wang S, Liu X, Zhai Y (2019) Consensus multiple
conference on knowledge discovery and data mining (PAKDD- kernel K-means clustering with late fusion alignment and matrix-
02) induced regularization. IEEE Access 7:136322–136331
Carnein M, Trautmann H (2019) Customer segmentation based on Hussain SF, Haris M (2019) A K-means based co-clustering (kCC)
transactional data using stream clustering. In: Pacific-Asia algorithm for sparse, high dimensional data. Expert Syst Appl
conference on knowledge discovery and data mining, 118:20–34
pp 280–292. Springer, Cham Ismkhan H (2018) Ik-means-?: an iterative clustering algorithm
Celebi ME, Kingravi HA, Vela PA (2013) A comparative study of based on an enhanced version of the K-means. Pattern Recogn
efficient initialization methods for the K-means clustering 79:402–413
algorithm. Expert Syst Appl 40(1):200–210 Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM
Chen Y, Hu P, Wang W (2018) Improved K-means algorithm and its Comput Surv 31(3):264–323
implementation based on mean shift. In: 2018 11th International Jiang ZL, Guo N, Jin Y, Lv J, Wu Y, Liu Z, Fang J, Yiu SM, Wang X
congress on image and signal processing, biomedical engineer- (2020) Efficient two-party privacy-preserving collaborative K-
ing and informatics (CISP-BMEI), pp 1–5. IEEE means clustering protocol supporting both storage and compu-
Chiang WY (2018) Applying data mining for online CRM marketing tation outsourcing. Inf Sci 518:168–180
strategy. Br Food J. https://doi.org/10.1108/BFJ-02-2017-0075 Jones PJ, James MK, Davies MJ, Khunti K, Catt M, Yates T,
Christy AJ, Umamakeswari A, Priyatharsini L, Neyaa A (2018) RFM Rowlands AV, Mirkes EM (2020) FilterK: a new outlier
ranking—an effective approach to customer segmentation. detection method for K-means clustering of physical activity.
J King Saud Univ Comput Inf Sci. https://doi.org/10.1016/j. J Biomed Inf 103397:1–29
jksuci.2018.09.004 Kafashpour A, Tavakoli A, Alizadeh S (2012) Customers segmen-
Danesh M, Naghibzadeh M, Totonchi MRA, Danesh M, Minaei B, tation base on lifetime value, use RFM data mining. Iran J Public
Shirgahi H (2011) Data clustering based on an efficient hybrid of Manag 5(15):63–84
K-harmonic means, PSO and GA. In: Transactions on compu- Kanungo T, Mount DM, Netanyahu NS, Piatko CD, Silverman R, Wu
tational collective intelligence IV, pp 125–140. Springer, Berlin, AY (2002) An efficient K-means clustering algorithm: analysis
Heidelberg and implementation. IEEE Trans Pattern Anal Mach Intell
Deng CH, Zhao WL (2018) Fast K-means based on k-NN Graph. In: 24(7):881–892
2018 IEEE 34th international conference on data engineering Karczmarek P, Kiersztyn A, Pedrycz W, Al E (2020) K-means-based
(ICDE), pp 1220–1223. IEEE isolation forest. Knowledge-Based Syst 105659:1–15
Dong G, Jin Y, Wang S, Li W, Tao Z, Guo S (2019) DB-K means: an Katsavounidis I, Kuo CCJ, Zhang Z (1994) A new initialization
intrusion detection algorithm based on DBSCAN and K-means. technique for generalized Lloyd iteration. IEEE Signal Process
In: 2019 20th Asia-Pacific network operations and management Lett 1(10):144–146
symposium (APNOMS), pp 1–4. IEEE Khalili-Damghani K, Abdi F, Abolmakarem S (2018) Hybrid soft
Dyche J (2002) The CRM handbook: a business guide to customer computing approach based on clustering, rule mining, and
relationship management. Addison-Wesley Professional, Boston decision tree analysis for customer segmentation problem: real
Erdil A, Öztürk A (2016) Improvement a quality oriented model for case of customer-centric industries. Appl Soft Comput
customer relationship management: a case study for shipment 73:816–828
industry in Turkey. Procedia Soc Behav Sci 229:346–353 Kumar KM, Reddy ARM (2017) An efficient K-means clustering
Erisoglu M, Calis N, Sakallioglu S (2011) A new algorithm for initial filtering algorithm using density based initial cluster centers. Inf
cluster centers in K-means algorithm. Pattern Recogn Lett Sci 418:286–301
32(14):1701–1705 Kumar V, Shah D, Venkatesan R (2006) Managing retailer prof-
Eszergár-Kiss D, Caesar B (2017) Definition of user groups applying itability—one customer at a time! J Retail 82(4):277–294
Ward’s method. Transp Res Procedia 22:25–34 Lai JZ, Huang TJ (2010) Fast global K-means clustering using cluster
Fadaei A, Khasteh SH (2019) Enhanced K-means re-clustering over membership and inequality. Pattern Recogn 43(5):1954–1963
dynamic networks. Expert Syst Appl 132:126–140 Laudon KC, Laudon JP (2015) Management information systems:
Feng Q, Zhu X, Pan JS (2015) Global linear regression coefficient managing the digital firm plus MyMISLab with Pearson eText–
classifier for recognition. Optik Int J Light Electron Opt access card package. Prentice Hall Press, Upper Saddle River
126(21):3234–3239 Li DC, Dai WL, Tseng WT (2011) A two-stage clustering method to
Fränti P, Sieranoja S (2019) How much can K-means be improved by analyze customer characteristics to build discriminative cus-
using better initialization and repeats? Pattern Recogn tomer management: a case of textile manufacturing business.
93:95–112 Expert Syst Appl 38(6):7186–7191
123
Li X, Qin B, Zhu Z, Lin Q (2017) Study on application of data mining Redmond SJ, Heneghan C (2007) A method for initialising the K-
in customer acquisition. In: DEStech transactions on social means clustering algorithm using kd-trees. Pattern Recogn Lett
science, education and human science, (eemt) 28(8):965–973
Liao SH, Chu PH, Hsiao PY (2012) Data mining techniques and Riveros NAM, Espitia BAC, Pico LEA (2019) Comparison between
applications—a decade review from 2000 to 2011. Expert Syst K-means and self-organizing maps algorithms used for diagnosis
Appl 39(12):11303–11311 spinal column patients. Inform Med Unlocked 16:100206
Likas A, Vlassis N, Verbeek JJ (2003) The global K-means clustering Schubert E, Sander J, Ester M, Kriegel HP, Xu X (2017) DBSCAN
algorithm. Pattern Recogn 36(2):451–461 revisited, revisited: why and how you should (still) use
Lin CY (2020) A reversible privacy-preserving clustering technique DBSCAN. ACM Trans Database Syst 42(3):19
based on K-means algorithm. Appl Soft Comput 87:105995 Sharma V, Bala M (2020) An improved task allocation strategy in
MacQueen J (1967) Some methods for classification and analysis of cloud using modified K-means clustering technique. Egypt
multivariate observations. In: Proceedings of the fifth Berkeley Inform J. https://doi.org/10.1016/j.eij.2020.02.001
symposium on mathematical statistics and probability, vol. 1, no. Shatnawi MQ, Yassein MB, Al-natour H (2017) Customer relation-
14, pp 281–297 ship management at Jordan University of science and technol-
Maghfirah MM, Adji TB, Setiawan NA (2015) Appropriate data ogy: case study, issues and recommendations. In: IEEE
mining technique and algorithm for using in analysis of customer international conference on engineering and technology (ICET),
relationship management (CRM) in bank industry. In: Seminar pp 1–6. IEEE
Nasional Aplikasi Teknologi Informasi (SNATI), vol. 1, no. 1 Shmueli G, Bruce PC, Yahav I, Patel NR, Lichtendahl KC Jr (2017)
Manxi W, Liandong W, Chenfeng W, Xiaoguang G, Ruohai D Data mining for business analytics: concepts, techniques, and
(2018). Finding community structure of Bayesian networks by applications in R. Wiley, Hoboken
improved K-means algorithm. In: 2018 IEEE 3rd international Sohrabi J, Hadavandi E (2011) Data mining in banking industry.
conference on image, vision and computing (ICIVC), Iranian Jahad Publishing, Amir Kabir University of Technology,
pp 865–869. IEEE Tehran, pp 25–70
Maryani I, Riana D (2017) Clustering and profiling of customers Subbalakshmi C, Krishna GR, Rao SKM, Rao PV (2015) A Method
using RFM for customer relationship management recommen- to find optimum number of clusters based on fuzzy silhouette on
dations. In: IEEE 5th International Conference on Cyber and IT dynamic data set. Procedia Comput Sci 46:346–353
Service Management (CITSM), pp 1–6 Szekely GJ, Rizzo ML (2005) Hierarchical clustering via joint
Min Z, Kai-fei D (2015) Improved research to K-means initial cluster between-within distances: extending ward’s minimum variance
centers. In: 2015 Ninth international conference on frontier of method. J Classif 22(2)
computer science and technology, pp 349–353. IEEE Szulanski G (1996) Exploring internal stickiness: impediments to the
Mojena R (1977) Hierarchical grouping methods and stopping rules: transfer of best practice within the firm. Strateg Manag J
an evaluation. Comput J 20(4):359–363 17(S2):27–43
Mukhlas A, Ahmad A, Zainun Z Berhad MP (2016) Data mining Tzortzis G, Likas A (2014) The MinMax K-means clustering
technique: towards supporting local co-operative society in algorithm. Pattern Recogn 47(7):2505–2516
customer profiling, market analysis and prototype construction. Wang H, Zhang J (2010) Study of customer segmentation for auto
In: IEEE international conference on information and commu- services companies based on RFM model. School of Manage-
nication technology, pp 109–114 ment, Wuhan University of Technology, Wuhan
Nguyen B, De Baets B (2019) Kernel-based distance metric learning Wang S, Zhu E, Hu J, Li M, Zhao K, Hu N, Liu X (2019) Efficient
for supervised K-means clustering. IEEE Trans Neural Netw multiple kernel K-means clustering with late fusion. IEEE
Learn Syst 30(10):3084–3095 Access 7:61109–61120
Nithya A, Appathurai A, Venkatadri N, Ramji DR, Palagan CA Xiaofeng Z, Xiaohong H (2017) Research on intrusion detection
(2020) Kidney disease detection and segmentation using artifi- based on improved combination of K-means and multi-level
cial neural network and multi-kernel K-means clustering for SVM. In: 2017 IEEE 17th international conference on commu-
ultrasound images. Measurement 149:106952 nication technology (ICCT), pp 2042–2045. IEEE
Olson DL (2017) Recency frequency and monetary model. In: Khajvand M, Tarokh MJ (2011) Analyzing customer segmentation based
Descriptive data mining. Springer, Singapore on customer value components (case study: a private bank)
Pawar RG (2016) Data mining: techniques for enhancing customer Yu SS, Chu SW, Wang CM, Chan YK, Chang TC (2018) Two
relationship management in fast moving consumer goods improved K-means algorithms. Appl Soft Comput 68:747–755
industries. Int Res J Multidiscip Stud 2(2):1–5 Yuliari NPP, Putra IKGD, Rusjayanti NKD (2015) Customer
Peker S, Kocyigit A, Eren PE (2017) LRFMP model for customer segmentation through fuzzy C-means and fuzzy RFM method.
segmentation in the grocery retail industry: a case study. Market J Theor Appl Inf Technol 78(3):380–385
Intell Plan 35(4):544–559 Zahrotun L (2017) Implementation of data mining technique for
Prabha D, Subramanian RS (2017) A survey on customer relationship customer relationship management (CRM) on online shop
management. In: 4th IEEE international conference on advanced tokodiapers.com with fuzzy c-means clustering. In: IEEE 2nd
computing and communication systems (ICACCS), pp 1–5 international conferences on information technology, informa-
Qadadeh W, Abdallah S (2018) Customers segmentation in the tion systems and electrical engineering (ICITISEE), pp 299–303
insurance company (TIC) dataset. Procedia Comput Sci Zhang GY, Wang CD, Huang D, Zheng WS, Zhou YR (2018) TW-
144:277–290 Co-K-means: two-level weighted collaborative K-means for
Qiao J, Cai X, Xiao Q, Chen Z, Kulkarni P, Ferris C, Kamarthi S, multi-view clustering. Knowl Based Syst 150:127–138
Sridhar S (2019) Data on MRI brain lesion segmentation using Žiberna A (2020) K-means-based algorithm for blockmodeling linked
K-means and Gaussian mixture model-expectation maximiza- networks. Soc Netw 61:153–169
tion. Data Brief 27:104628
Rajeh SM, Koudehi FA, Seyedhosseini SM, Farazmand R (2014) A Publisher’s Note Springer Nature remains neutral with regard to
model for customer segmentation based on loyalty using data jurisdictional claims in published maps and institutional affiliations.
mining approach and fuzzy concept in Iranian Bank. Int J Bus
Behav Sci 4(9):118–136
123

Determination of Customer Satisfaction Using Improved K-Means Algorithm

Uploaded by

Copyright:

Available Formats

You might also like

Determination of Customer Satisfaction Using Improved K-Means Algorithm

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Determination of Customer Satisfaction Using Improved K-Means Algorithm

Uploaded by

Copyright:

Available Formats

Soft Computing

METHODOLOGIES AND APPLICATION

Determination of Customer Satisfaction using Improved K-means

Springer-Verlag GmbH Germany, part of Springer Nature 2020

1 Introduction management and sales staff or organization services can adapt

Evaluating clusters by SSE

Fig. 1 Structure of research methodology

Table 1 Introducing LRFMM2 features

Table 5 Values of LRFMM2

Table 8 Distance from the first cluster centre

1 … 0.2287927 0.1461945673 0.825981328

Table 9 Value of optimal cluster centres

1 Cluster 1 0.8987083171 0.5691094129

Table 13 Average value of L, R, F, M, M2 features per cluster

1 0.0004717148 0.0089718865 0.0016396217 0.0008347777 0.0365969803 0.048514981

Fig. 2 Customer value pyramid

Fig. 3 Linear life cycle value

Table 15 Comparison between

Table 16 Comparison between

Table 17 Comparison between

You might also like