Expert Systems With Applications: Chui-Yu Chiu, Yi-Feng Chen, I-Ting Kuo, He Chun Ku
Keywords: Market segmentation; Data mining; Clustering; Particle swarm optimization

Abstract

With the development of information technology (IT), how to find useful information hidden in vast data has become an important issue. The most broadly discussed technique is data mining, which has been successfully applied to many fields as an analytic tool. Data mining extracts implicit, previously unknown, and potentially useful information from data. Clustering is one of the most important and useful technologies among data mining methods. Clustering groups objects according to their similarity, so that homogeneity is high within each cluster and heterogeneity is high between clusters. In this paper, we propose a market segmentation system based on the structure of a decision support system, which integrates a conventional statistical analysis method with intelligent clustering methods such as artificial neural networks and particle swarm optimization. The proposed system is expected to provide precise market segmentation for marketing strategy decision making and extended applications.

© 2008 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2008.05.029
C.-Y. Chiu et al. / Expert Systems with Applications 36 (2009) 4558–4565 4559
2.2. Data mining

2.2.1. Introduction of data mining
The science of extracting useful information from large data sets or databases is known as data mining. In the past, data mining has been referred to as knowledge management or knowledge engineering. Until recently, it was an obscure and exotic technology, discussed more by theoreticians in the artificial intelligence field. Fayyad et al. defined data mining as a step in the KDD process consisting of applying computational techniques that, under acceptable computational efficiency limitations, produce a particular enumeration of patterns or models over the data (Chung & Gary, 1999).

2.2.2. Methods of data mining
A particular data mining algorithm is usually an instantiation of the model-preference-search components. Some of the common model functions in current data mining practice include (Han & Kamber, 2001):

1. Classification: This model function classifies a data item into one of several predefined categorical classes.
2. Regression: The purpose of this model function is to map a data item to a real-valued prediction variable.
3. Clustering: This function maps a data item into one of several clusters, where clusters are natural groupings of data items based on similarity metrics or probability density models.
4. Rule generation: Here one mines or generates rules from the data. Association rule mining refers to discovering association relationships among different attributes. Dependency modeling corresponds to extracting significant dependencies among variables.
5. Summarization or condensation: This function provides a compact description for a subset of data. Data compression may play a significant role here, particularly for multimedia data, because of the advantage it offers to compactly represent the data with a reduced number of bits, thereby increasing the database storage bandwidth.
6. Sequence analysis: It models sequential patterns, like time-series analysis, gene sequences, etc. The goal is to model the states of the process generating the sequence, or to extract and report deviations and trends over time.

2.3.2. Clustering methods
A good clustering method will produce high-quality clusters with high intraclass similarity and low interclass similarity. The quality of a clustering result depends on both the similarity measure used by the method and its implementation. It is also measured by the ability of the system to discover some or all of the hidden patterns.

Clustering of data is broadly based on two approaches, namely, hierarchical and partitional (Jain & Dubes, 1988; Jain, Murty, & Flynn, 1999). Over the last several decades, due to the development of artificial intelligence and soft computing, clustering methods based on other theories or techniques have emerged (Backer, 1995; Fayyad, Piatetsky-Shapiro, & Smith, 1996).

2.4. Particle swarm optimization

Particle swarm optimization (PSO) is an evolutionary computation technique developed by Kennedy and Eberhart (1995). The method was developed through a simulation of simplified social models. The following scene explains the concept of PSO: a group of birds is randomly searching for a single piece of food in an area. None of the birds knows where the food is. The most effective search strategy is to figure out which bird is nearest the food and follow it.

Like GA (Maulik & Bandyopadhyay, 2000), PSO must also have a fitness evaluation function that takes the particle's position and assigns it a fitness value. The objective is to optimize the fitness function. Each particle also has its own coordinates and velocity, which change its direction of flight within a predefined D-dimensional search domain. All particles fly through the search domain by following the current optimum particles. PSO has been applied to market segmentation with good performance (Van der Merwe & Engelbrecht, 2003).

3. Development of the system

3.1. Framework of system

In this study, we propose a market segmentation system based on the structure of a decision support system, which integrates a conventional statistical analysis method with intelligent clustering methods such as artificial neural networks and particle swarm optimization. We then put it into practice at a real company. The proposed system is expected to provide precise market segmentation for marketing strategy decision making. The first important requirement of marketing is the accuracy of data: accurate data analysis is essential to making profitable marketing decisions. Therefore, we designed the market segmentation system architecture as shown in Fig. 3.1.

3.2. SOM + k-means

Here we utilize PSO + k-means (Huang, 2006) to decide the vectors of the cluster centers. PSO + k-means has powerful distinguishing capability in multidimensional space. A single particle represents the k cluster centroid vectors. That is, each particle X_id is constructed as follows:

X_id = (z_i1, ..., z_ij, ..., z_ik)    (3.1)

where z_ij refers to the jth cluster centroid vector of the ith particle in cluster C_ij. Therefore, a swarm represents a number of candidate
clusters for the current data vectors. The fitness of particles is easily measured as the quantization error

J_e = [ Σ_{j=1}^{k} ( Σ_{x ∈ n_ij} d(x, z_ij) / |n_ij| ) ] / k    (3.2)

d(x, z_ij) = ||x − z_ij||    (3.3)

where d is defined in Eq. (3.3), and |n_ij| is the number of data vectors belonging to cluster n_ij, i.e. the frequency of that cluster. The execution of PSO + k-means is shown in Fig. 4.4 and the procedure of PSO + k-means is as follows:

1. Initialize each particle to contain k randomly selected cluster centroids.
2. For t = 1 to t_max do
   (a) For each particle i do
   (b) For each data vector x
       i. Calculate the Euclidean distance d(x, z_ij) to all cluster centroids n_ij.
       ii. Assign x to cluster n_ij such that d(x, z_ij) = min_{c=1,...,k} d(x, z_ic).
       iii. Calculate the fitness using Eq. (3.2).
   (c) Update the global best and local best positions.
   (d) Update the cluster centroids using Eqs. (3.4) and (3.5):

   V_id^new = W · V_id^old + c_1 · rand_1 · (P_id − X_id) + c_2 · rand_2 · (P_gd − X_id)    (3.4)

   X_id^new = X_id^old + V_id^new    (3.5)

   where W is the inertia weight, c_1 and c_2 are the acceleration constants, and rand_1, rand_2 ~ U(0, 1).
3. Calculate the Euclidean distance d(x, z_ij) to all cluster centroids n_ij.
4. Assign x to cluster n_ij such that d(x, z_ij) = min_{c=1,...,k} d(x, z_ic).
5. Recalculate the positions of the centroids as z_ij = (1/|n_ij|) Σ_{x ∈ n_ij} x.
6. Stop when t_max, the maximum number of iterations, is reached.

3.4. Clustering performance evaluation

The algorithms try to minimize the cost function – the Mean Square Error (MSE). The main purpose of this section is to compare the quality of the respective clusters, where quality is measured according to the following three criteria (Van den Bergh, 2002):
• The mean square error, as defined in Eq. (3.6).
• The intra-cluster distances: the distances between data vectors within a cluster, where the objective is to minimize the intra-cluster distances. This is defined in Eq. (3.7).
• The inter-cluster distances: the distances between the centroids of the clusters, where the objective is to maximize the distance between clusters. This is defined in Eq. (3.8).

MSE = Σ_{i=1}^{n} Σ_{j=1, x_i ∈ C_j}^{k} |x_i − m_j|^2    (3.6)

where x_i (i = 1, 2, ..., n) is an object of a data set X with n objects, k is the number of clusters, and m_j is the centroid of cluster C_j (j = 1, 2, ..., k).

Intra-cluster Distances = Σ_{i=1}^{n} Σ_{j=i+1}^{n} d(x_i, x_j)    (3.7)

where x_i (i = 1, 2, ..., n) and x_j (j = 1, 2, ..., n) are objects of a data set X with n objects.

Inter-cluster Distances = Σ_{i=1}^{k} Σ_{j=i+1}^{k} d(m_i, m_j)    (3.8)

where m_i is the centroid of cluster C_i (i = 1, 2, ..., k), m_j is the centroid of cluster C_j (j = 1, 2, ..., k), and k is the number of clusters.

4. Applying the market segmentation system

The proposed system integrates a conventional statistical analysis method and intelligent clustering methods, such as k-means, SOM + k-means, and PSO + k-means. The practical implementation of this system is intended for the market survey industry. With regard to system architecture, the market segmentation system has been designed to provide convenient and efficient clustering analysis.

4.1.1. System interface

The market segmentation system is presented in a practical manner for the marketers or analysts dealing with marketing events. To develop the system, we first implemented the three clustering algorithms: k-means, SOM + k-means, and PSO + k-means. This part is the core of the proposed system. Second, the necessary information is collected to improve the system. Finally, we provide the clustering results and use three criteria to evaluate their quality.

The main form of the proposed system is shown in Fig. 4.1. The k-means module, SOM + k-means module, and PSO + k-means module can be launched through this form. The main form allows users to load and execute the data files, and before executing, users can choose whether or not to normalize the data. The module subsystem is shown in Figs. 4.2, 4.3 and 4.4.

While the clustering process executes, all information is recorded behind this module. Furthermore, system users can change model parameters such as W, c_1, and c_2 of PSO.

All algorithms are run for 100 iterations, and the PSO algorithm used 50 particles, with W = 0.72 and c_1 = c_2 = 1.49. For SOM, the neighborhood width is 0.1 and the training rate is 0.995. These values are chosen to ensure good convergence (Van den Bergh, 2002).

The clustering problems used for the purpose of this study are as follows:

• Iris plants database: This is a well-known database with 4 numeric attributes, 3 classes and 150 instances.
• Glass identification database: This database contains 6 types of glass defined in terms of their oxide content (i.e. Na, Fe, K, etc.). All attributes are numeric-valued.
• Teaching assistant evaluation database: This database contains 151 instances, 6 attributes, and 3 classes. The data consist of evaluations of teaching performance over three regular semesters and two summer semesters of 151 teaching assistant (TA) assignments at the Statistics Department of the University of Wisconsin – Madison.

Table 4.1 summarizes the results obtained from the three clustering algorithms for the problems above. The values reported are averages over 30 simulations, with standard deviations to indicate the range of values to which the algorithms converge. For a good clustering, each cluster should be as homogeneous as possible and the various clusters should be as heterogeneous as possible. To achieve this objective, we use the MSE value and the ratio Intra-cluster Distances / Inter-cluster Distances. If an algorithm can cluster data with a lower MSE value, the similarity within a segment increases. Likewise, the standard deviation of an algorithm is also taken as an index: Chung and Gary argued that the quality of an algorithm should be assessed by its robustness, and that a good algorithm should always produce the same results (Chung & Gary, 1999). When considering inter- and intra-cluster distances, the latter ensures compact clusters with little deviation from the cluster centroids, while the former ensures larger separation between the different clusters. Therefore, according to the ratio Intra-cluster Distances / Inter-cluster Distances, a small value represents homogeneity within clusters and heterogeneity between clusters.

For these three problems, the PSO + k-means algorithm had the smallest average MSE and standard deviation. When considering Intra-cluster Distances / Inter-cluster Distances, the PSO + k-means algorithm also had the smallest value compared with k-means and SOM + k-means.

4.2. Example and demonstration of the proposed system – a business application

In this section, a further discussion of how the system helps managers in business practice is given. By understanding customers better, it is easier to enhance the relationship between a business and its customers.

4.2.1. Data description
The data were collected primarily by a company which provides passive and active electronic components for consumer electronics, telecommunications, computers, etc. The period of the business transaction detail data is from 2003/1/1 to 2006/6/15.
Table 4.1
Comparison of three methods of the proposed system
(Problem | Algorithm | MSE | Intra-cluster Distance | Inter-cluster Distance | Intra-cluster Distances / Inter-cluster Distances) [table entries not preserved in this extract]
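The three criteria reported in Table 4.1 can be computed directly from Eqs. (3.6)–(3.8). The sketch below is a minimal Python reading of those formulas, with hypothetical toy points and centroids; the intra-cluster sum is restricted to pairs in the same cluster, following the stated definition of intra-cluster distance:

```python
import math

def dist(a, b):
    # Euclidean distance between two vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mse(points, labels, centroids):
    # Eq. (3.6): summed squared distance of each point to its own centroid.
    return sum(dist(p, centroids[c]) ** 2 for p, c in zip(points, labels))

def intra_cluster(points, labels):
    # Eq. (3.7): pairwise distances between vectors in the same cluster
    # (to be minimized).
    return sum(dist(points[i], points[j])
               for i in range(len(points))
               for j in range(i + 1, len(points))
               if labels[i] == labels[j])

def inter_cluster(centroids):
    # Eq. (3.8): pairwise distances between cluster centroids (to be maximized).
    ks = sorted(centroids)
    return sum(dist(centroids[a], centroids[b])
               for i, a in enumerate(ks) for b in ks[i + 1:])

pts = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]  # toy data
lbl = [0, 0, 1, 1]                                      # cluster assignments
cen = {0: (0.0, 1.0), 1: (4.0, 1.0)}                    # cluster centroids
print(mse(pts, lbl, cen), intra_cluster(pts, lbl), inter_cluster(cen))
```

A clustering is preferred when MSE and the intra-cluster sum are small while the inter-cluster sum is large, which is exactly the ratio criterion used in the comparison above.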
Table 4.2
Clustering result from Group = 3 to Group = 8 [table entries not preserved in this extract]
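The PSO clustering procedure of Section 3 (the particle encoding of Eq. (3.1), the quantization-error fitness of Eq. (3.2), and the update rules of Eqs. (3.4) and (3.5)) can be sketched as follows. This is a minimal illustration using the parameter values W = 0.72 and c1 = c2 = 1.49 quoted above; the toy data, random initialization, and empty-cluster handling are our own assumptions, and the k-means refinement stage is omitted:

```python
import random

W, C1, C2 = 0.72, 1.49, 1.49  # inertia weight and acceleration constants

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantization_error(centroids, data):
    # Eq. (3.2): average, over non-empty clusters, of the mean distance
    # of each cluster's members to its centroid.
    sums, counts = [0.0] * len(centroids), [0] * len(centroids)
    for x in data:
        j = min(range(len(centroids)), key=lambda c: dist2(x, centroids[c]))
        sums[j] += dist2(x, centroids[j]) ** 0.5
        counts[j] += 1
    per_cluster = [s / c for s, c in zip(sums, counts) if c > 0]
    return sum(per_cluster) / len(per_cluster)

def pso_cluster(data, k, n_particles=20, iters=50, seed=1):
    rng = random.Random(seed)
    dim = len(data[0])
    # Each particle is a flat vector of k centroids (Eq. (3.1)),
    # initialized near randomly chosen data coordinates.
    X = [[rng.choice(data)[d] + rng.uniform(-0.1, 0.1)
          for _ in range(k) for d in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * (k * dim) for _ in range(n_particles)]
    unflatten = lambda x: [tuple(x[j * dim:(j + 1) * dim]) for j in range(k)]
    fit = [quantization_error(unflatten(x), data) for x in X]
    P, pbest = [x[:] for x in X], fit[:]                 # personal bests
    g = min(range(n_particles), key=lambda i: pbest[i])
    G, gbest = P[g][:], pbest[g]                         # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(k * dim):
                r1, r2 = rng.random(), rng.random()
                # Eq. (3.4): velocity update; Eq. (3.5): position update.
                V[i][d] = (W * V[i][d] + C1 * r1 * (P[i][d] - X[i][d])
                           + C2 * r2 * (G[d] - X[i][d]))
                X[i][d] += V[i][d]
            f = quantization_error(unflatten(X[i]), data)
            if f < pbest[i]:
                P[i], pbest[i] = X[i][:], f
                if f < gbest:
                    G, gbest = X[i][:], f
    return unflatten(G), gbest

data = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]  # two obvious groups
centroids, err = pso_cluster(data, k=2)
```

With the two well-separated toy groups above, the global-best particle should settle near the two group centers, giving a small quantization error.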
4.2.2. Data extraction with RFM variables
With customers' purchasing behavior data available, we need to decide which variables are to be used for the analysis. Among many behavior variables, RFM variables are well known to both researchers and practitioners. In order to understand customers completely, database marketers have applied RFM variables to segment customer markets (Newell, 1997). RFM variables, which represent a customer's latest purchase (recency), number of purchases (frequency), and total purchase amount (monetary), can be derived from transaction history records. After the data extraction operation, one file is created listing the customers who belong to each cluster as a result of customer clustering. The file contains the customers and their recency, frequency, and monetary values.

4.2.3. Data normalization
Before feeding data into the clustering algorithms, the RFM variables should be normalized to eliminate scale effects. Normalization entails relatively minor additional computations during application of a solution to new data, which must also be normalized. For frequency and monetary, whose preferences are monotonically increasing, a simple positive linear normalization function as in Eq. (4.1) can be used to derive the massaged value. Recency, however, exhibits a monotonically decreasing preference, so an inverse function as in Eq. (4.2) is applied:

Massaged Value = (log x_i − log x_min) / (log x_max − log x_min)    (4.1)

Massaged Value = 1 − (log x_i − log x_min) / (log x_max − log x_min)    (4.2)

4.2.4. Determine the number of clusters
In this study, we employ the two-stage clustering suggested by Punj and Stewart (1983): Ward's hierarchical method is used to determine the number of clusters, and the k-means clustering method is then performed to yield the clusters.

First, we run PSO + k-means on the normalized data from Group = 3 to Group = 8. The result is shown in Table 4.2. It is found that the Inter-cluster Distance is highest at Group = 6 and that the Intra-cluster Distance decreases relatively flatly once the number of clusters exceeds six. Therefore, it is a good indication that the number of clusters should be six.

With the number of clusters determined, we can easily perform the k-means, SOM + k-means and PSO + k-means algorithms to find which customers are clustered together.

4.2.5. Experiment results
This section compares the results of the k-means, SOM + k-means and PSO + k-means algorithms with the number of clusters set to six. The result is presented in Table 4.3.

After executing the system, we also conclude that PSO + k-means performs better than k-means and SOM + k-means for this case. Among the three methods, the MSE value of PSO + k-means is the lowest, the intra-cluster distance is the lowest, the inter-cluster distance is the highest, and the Intra-cluster Distances / Inter-cluster Distances value is the lowest. Therefore, we adopt the result of PSO + k-means to make marketing strategies; Table 4.4 shows the clustering results for the customers.

Regarding RFM status, we can make use of the strategic positioning of RFM clusters, similar to Ha and Park's (1998) work, to set up priorities among the clusters. Ha and Park proposed positioning each customer cluster by comparing the average RFM values of the cluster with the total average RFM values of all clusters. If the average of a cluster is greater than the total average, then an upward arrow (↑) is assigned to that variable; otherwise, a downward arrow (↓) is assigned. According to this result, cluster characteristics can be analyzed and their strategic positions determined.

Cluster 5, which has R↓ F↑ M↑, and Cluster 2, which has R↑ F↑ M↑, can be considered loyal customers who deal frequently and make large purchases. Cluster 6, which has R↓ F↓ M↓, is probably a new customer segment that has dealt with the company only recently. Cluster 1, which has R↓ F↑ M↓, is a promising segment that might be promoted to loyal customers. Clusters 3 and 4, which have R↑ F↓ M↓, are likely to be vulnerable customers who have not dealt with the company for a long time.

Direct marketers have been using recency, frequency, and monetary (RFM) analysis to predict customer behavior for more than 50 years (Hughes, 1994). It is one of the most powerful techniques available to marketers. Among the six clusters, Cluster 1 is selected as the target customer segment with the first priority, followed by
Table 4.3
Comparison of three methods with Group = 6
(Problem | Algorithm | MSE | Intra-cluster Distance | Inter-cluster Distance | Intra-cluster Distances / Inter-cluster Distances) [table entries not preserved in this extract]
Table 4.4
Clustering results of PSO + k-means

Cluster       | Customer Counts | Recency (Avg.) | Frequency (Avg.) | Monetary (Avg.)   | RFM Status
1             | 21              | 20.190476      | 139.285714       | 1,959,011.190476  | R↓ F↑ M↓
2             | 10              | 174.100000     | 526.400000       | 27,645,786.600000 | R↑ F↑ M↑
3             | 33              | 233.666667     | 17.727273        | 245,315.030303    | R↑ F↓ M↓
4             | 29              | 331.103448     | 2.793103         | 15,980.827586     | R↑ F↓ M↓
5             | 14              | 2.357143       | 522.928571       | 10,205,913.857143 | R↓ F↑ M↑
6             | 13              | 12.461538      | 10.153846        | 179,290.692308    | R↓ F↓ M↓
Total average |                 | 163.941667     | 135.900000       | 3,928,079.283333  |
Clusters 5 and 2, because the effect on these target customer clusters is potentially greater than the effect on the others from the RFM point of view.

5. Conclusion

Data mining is a fast-expanding field, with many new research results reported and new systems or prototypes developed recently (Ha & Park, 1998). In this study, we presented the data mining process, clustering methods, and the corresponding system. We proposed a market segmentation system based on the structure of a decision support system. The proposed system provides three clustering tools for users, including k-means, SOM + k-means and PSO + k-means, and users can select any of these tools according to their needs.

In CRM, it is essential to understand the characteristics of different customer groups, not only to improve their profitability but, more importantly, to develop long-term relationships with them. For the case in this study, we applied the proposed market segmentation system to solving a real problem. Our idea is to segment customers via RFM variables to indicate the degrees of importance of customers. After customers have been clustered, marketers can develop proper tactics for their customers. Especially in industries with high competitive intensity, a good understanding of customers can help a business improve the quality of its service and maintain a good relationship with its customers. As a result, a competitive advantage can be generated.

References

Backer, U. E. (1995). Computer-assisted reasoning in cluster analysis. Prentice-Hall.
Chung, H. M., & Gary, P. (1999). Special section: Data mining. Journal of Management Information Systems, 16(1), 11–16.
Fayyad, U., Piatetsky-Shapiro, G., & Smith, P. (1996). From data mining to knowledge discovery in databases. American Association for Artificial Intelligence, August, 37–54.
Han, J., & Kamber, M. (2001). Data mining: Concepts and techniques. San Diego: Academic Press.
Ha, S. H., & Park, S. C. (1998). Application of data mining tools to hotel data mart on the intranet for database marketing. Expert Systems with Applications, 15, 1–31.
Huang, T. W. (2006). Application of clustering analysis for reducing SMT setup time – a case study on Advantech company. Master Thesis, National Taipei University of Technology, Taipei, Taiwan.
Hughes, A. M. (1994). Strategic database marketing: The masterplan for starting and managing a profitable customer-based marketing program. Chicago: Probus Publishing.
Jain, A. K., & Dubes, R. C. (1988). Algorithms for clustering data. Englewood Cliffs, NJ: Prentice-Hall.
Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: A review. ACM Computing Surveys, 31, 264–323.
Kennedy, J., & Eberhart, R. C. (1995). Particle swarm optimization. In Proceedings of the IEEE international joint conference on neural networks (Vol. 4, pp. 1942–1948).
Kotler, P. (2000). Marketing management. Upper Saddle River, NJ: Prentice-Hall.
Kuo, R. J., Ho, L. M., & Hu, C. M. (1999). Cluster analysis in industrial market segmentation through artificial neural network. In Proceedings of the 26th international conference on computers and industrial engineering (pp. 15–17), Melbourne, Australia.
MacQueen, J. B. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 281–296).
Maulik, U., & Bandyopadhyay, S. (2000). Genetic algorithm-based clustering technique. Pattern Recognition, 33, 1455–1465.
Newell, F. (1997). The new rule of marketing: How to use one-to-one relationship marketing to be the leader in your industry. New York: McGraw-Hill.
Punj, G., & Stewart, D. W. (1983). Cluster analysis in marketing research: Review and suggestions for application. Journal of Marketing Research, 20, 134–148.
Tou, J. T., & Gonzalez, R. C. (1974). Pattern recognition principles. London: Addison-Wesley.
Van den Bergh, F. (2002). An analysis of particle swarm optimizers. PhD Thesis, Department of Computer Science, University of Pretoria, Pretoria, South Africa.
Van der Merwe, D. W., & Engelbrecht, A. P. (2003). Data clustering using particle swarm optimization. In The 2003 Congress on Evolutionary Computation (pp. 215–220).
Vesanto, J., & Alhoniemi, E. (2002). Clustering of the self-organizing map. IEEE Transactions on Neural Networks, 11(3), 586–600.
Wendell, R. S. (1956). Product differentiation and market segmentation as alternative marketing strategies. Journal of Marketing, 21, 3–8.