Usage of Cluster Analysis in Consumer Behavior Res
net/publication/265109861
3 authors:
Arnost Motycka
Mendel University in Brno
All content following this page was uploaded by Jiri Stastny on 17 October 2014.
Abstract: - This article presents a case study on the application of clustering methods in data mining research on consumer behavior in the food market. Data obtained from a questionnaire survey by the Institute of Marketing and Trade of the Faculty of Business and Economics of Mendel University in Brno are processed with different types of cluster analysis algorithms to find market segments. The aim of this study is to identify the possibilities of these methods in this area and to describe their suitability or unsuitability for solving such problems.
Key-Words: - Cluster analysis, Data mining, Consumer behavior, Marketing research, Application of methods of knowledge discovery in marketing, Data processing.
estimations of future development. These are the collection of secondary data from national and international sources, the collection of primary data through marketing research, and data processing by applying selected statistical methods [4]. The data usually need to be prepared before analysis [5].

3 Cluster analysis
Cluster analysis is a multidimensional statistical method used to classify objects. Its basic problem is to classify objects into groups (clusters) so that two objects in the same cluster are more similar than two objects from different clusters [6].
To illustrate how difficult it can be to determine the form of individual clusters, Figure 1 shows three different ways of dividing twenty points into groups. The shapes of the markers indicate cluster membership. Figures 1 (b) and 1 (d) divide the data into two and six parts, respectively. However, the apparent division of each of the two major clusters into three subclusters may be only an artifact of the human visual system. In some cases it may be reasonable to classify the items into four groups, as shown in Figure 1 (c). Figure 1 thus shows that the definition of a cluster is not clear-cut. The optimal division depends on the nature of the data and the required results [7].
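As a minimal sketch of what "more similar" can mean for numeric attributes, the following Python fragment compares objects by Euclidean distance, where a smaller distance is read as greater similarity. The three objects and their values are hypothetical and serve only as an illustration; the choice of Euclidean distance is an assumption, not a measure prescribed by the cited sources.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical objects, each described by three numeric attributes.
obj_a = [9, 2, 7]
obj_b = [8, 3, 7]
obj_c = [1, 9, 2]

d_ab = euclidean(obj_a, obj_b)  # small distance: the objects are similar
d_ac = euclidean(obj_a, obj_c)  # large distance: the objects are dissimilar

# Under this measure obj_a and obj_b would rather share a cluster
# than obj_a and obj_c.
print(d_ab < d_ac)
```

Other variable types (binary, categorical, ordinal) generally need other dissimilarity measures, so this sketch covers only the numeric case.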
The first problem is to determine the similarity of two objects. To measure similarity, each object must be characterized by its properties [6]. Properties of objects can be divided into several categories. Han and Kamber [8] list these types of variables:
• Interval-Scaled variables
• Binary variables
• Categorical variables
• Ordinal variables
• Ratio-Scaled variables
How to determine the similarity or dissimilarity for the individual types is described in detail in [8, 9] and elsewhere.

3.1 Types of methods of cluster analysis
A large number of clustering algorithms exist. It is, however, difficult to divide them clearly into categories, as some of them may fall into more than one category. Han and Kamber [8] offer the following breakdown:
Partitioning methods divide 𝑛 objects into 𝑘 groups, where 𝑘 ≤ 𝑛. These methods must satisfy two requirements:
a) each group must contain at least one object,
b) each object must belong to exactly one group.
First, the method creates an initial division of the objects into 𝑘 groups. Subsequently, the algorithm iteratively relocates objects between groups to improve the division. The algorithm can be terminated after a certain number of iterations or when no objects move any more.
Hierarchical methods create a hierarchical decomposition of objects. Depending on how this decomposition is carried out, hierarchical methods are divided into agglomerative (bottom-up), where each object initially forms its own class and the most similar classes are then successively merged into
one class, and divisive (top-down), where all objects are first in one class, which is subdivided until the desired level of division is reached.
Density-based methods find clusters as regions with a high density of objects that are separated by regions with a low density of objects. These methods can find clusters of different shapes and can also deal with noise and outliers in the data.
Grid-based methods transform the object space into a finite number of cells that form a grid structure. All clustering operations are performed on the grid structure. The main advantage of this approach is its processing speed, which is usually independent of the number of data objects and depends only on the number of cells in each dimension.
Model-based methods try to optimize the fit between the dataset and some mathematical model, which means that they try to find the clusters that best correspond to that model.
Methods for clustering high-dimensional data allow transformation or selection of attributes to reduce the number of dimensions while preserving the relevant distances between objects [8].
Examples of individual algorithms are presented and described in [6, 8, 10, 11, 12, 13].

4 Data source
The data file on which the knowledge discovery is performed was acquired in a survey by the Department of Marketing and Trade of the Faculty of Business and Economics of Mendel University in Brno. The marketing research focused on the behavior of consumers in the food market in the Czech Republic. The questionnaire contained thirty items related to the research questions (low price, product composition, ...), which respondents rated on a scale from 1 to 10, where a value of 10 meant that the criterion has the highest importance to the respondent. Another eight questions characterized the respondent (age, sex, educational level, etc.) [3]. By applying data mining methods to this kind of data, interesting patterns in customer behavior can be identified [14].

5 Finding the clusters
Several algorithms were used to find clusters: the K-means algorithm, Expectation-Maximization, DBSCAN, and an algorithm for hierarchical clustering. Because the number of clusters in the dataset was not known, we first focused on methods that do not require this information. The thirty items related to consumer behavior were selected as input attributes.
The DBSCAN method was tested first. With the default setting (𝑀𝑖𝑛𝑃𝑡𝑠 = 6, 𝜀 = 0.9), no cluster was found; all objects were identified as noise. The parameters were therefore modified. The following table shows the values for selected parameters.
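The effect of 𝜀 and 𝑀𝑖𝑛𝑃𝑡𝑠 can be illustrated with a small sketch of the DBSCAN idea in Python. This is a simplified textbook-style implementation written for illustration, not the Weka implementation used in the study, and the nine two-dimensional points are hypothetical. With a very small 𝜀 no point has enough neighbours and everything is labelled as noise; with a very large 𝜀 all points merge into one cluster.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE            # may later be claimed as a border point
            continue
        cluster += 1                     # points[i] is a core point: new cluster
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point, not expanded further
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)   # j is a core point too
    return labels


# Hypothetical data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]

two_clusters = dbscan(points, eps=1.5, min_pts=3)   # two clusters plus one noise point
all_noise = dbscan(points, eps=0.5, min_pts=3)      # eps too small: only noise
one_cluster = dbscan(points, eps=20.0, min_pts=3)   # eps too large: a single cluster
```

The sketch uses Euclidean distance on raw coordinates, which is an assumption; for thirty rated items the distances between respondents are typically much larger than in this two-dimensional toy example, so a suitable 𝜀 is harder to find.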
The table does not list all the tested values, but all other attempts to set the parameters so that this method would produce comparably sized clusters failed. As the table shows, applying the DBSCAN method to the survey data usually yields one large cluster and several others of negligible size. For this reason, this method can be considered unsuitable for such data.
Hierarchical clustering was chosen as the second option. This method did not lead to the desired division either. In almost all cases a single object was separated from the rest. For this reason, hierarchical clustering is also unsuitable for our purpose.
The next method tested was Expectation-Maximization. To obtain the outputs of this method, the minimum standard deviation (𝑚𝑖𝑛𝑆𝑡𝑑𝐷𝑒𝑣) and the maximum number of iterations were adjusted step by step. Table 2 shows the results.
As the table shows, the number of iterations does not have a large influence on the result. Apart from one case, in which the numbers of elements in the clusters differed by at most two, the number of iterations had no effect. A further increase in the number of iterations might cause major changes, but the running time of the algorithm would then be too long.
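The role of a minimum standard deviation in Expectation-Maximization can be sketched for a one-dimensional mixture of Gaussians. The Python code below is a simplified illustration under stated assumptions (fixed number of components, given starting means, hypothetical data); it is not the Weka implementation, and the parameter name min_std_dev merely mirrors 𝑚𝑖𝑛𝑆𝑡𝑑𝐷𝑒𝑣. The floor in the M-step prevents a component from collapsing onto a few nearly identical values.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def em_1d(data, means, min_std_dev=1e-6, max_iterations=100):
    """Fit a 1-D Gaussian mixture by EM, starting from the given means."""
    k = len(means)
    means = list(means)
    stds = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(max_iterations):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in data:
            p = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
            total = sum(p)
            resp.append([value / total for value in p])
        # M-step: re-estimate weights, means and standard deviations.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            stds[j] = max(math.sqrt(var), min_std_dev)  # floor on the std dev
    return weights, means, stds


# Hypothetical observations forming two tight groups around 1 and 5.
data = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9]

_, means_free, stds_free = em_1d(data, [0.0, 6.0])
_, means_floor, stds_floor = em_1d(data, [0.0, 6.0], min_std_dev=0.5)

print(stds_free)   # estimated spreads of the two groups
print(stds_floor)  # both spreads are held at the floor of 0.5
```

The sketch omits safeguards that a production implementation would need, for example handling a component that receives no responsibility at all.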
At the beginning of the experiments, it seemed that even a change in the minimum standard deviation did not change the number and composition of the clusters. A change occurred between the values 0.005 and 0.01, where the number of clusters decreased from ten to seven. A further increase in this parameter then increased the number of clusters again.
This method of identifying clusters provided results that at first glance appear usable. The output is a few clusters with an acceptable number of objects.
The 𝑘-means algorithm was used as the last method. This procedure requires the number of clusters to be known. The following table presents the results for different numbers of clusters.
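How the result of 𝑘-means depends on the chosen number of clusters can be sketched as follows. The Python code is a plain version of Lloyd's algorithm with hand-picked starting centroids; the six points and the starting centroids are hypothetical, and the implementation is an illustration rather than the Weka one used in the study.

```python
import math


def k_means(points, centroids, max_iterations=100):
    """Lloyd's algorithm from given starting centroids; returns (labels, centroids)."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break                        # no object changed its cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:                  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical data: two visually obvious groups of three points each.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]

labels_2, _ = k_means(points, [(0, 0), (10, 10)])          # k = 2
labels_3, _ = k_means(points, [(1, 1), (8, 8), (9, 8)])    # k = 3

sizes_2 = [labels_2.count(j) for j in range(2)]
sizes_3 = [labels_3.count(j) for j in range(3)]
print(sizes_2, sizes_3)
```

The algorithm returns a division for any requested 𝑘, so the cluster sizes alone do not say which 𝑘 is appropriate.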
With this method we obtained clusters of comparable size. However, it is difficult to determine which number of clusters is closest to reality. This is a task for an expert in the field, who is able to assess whether the clusters are meaningful. The same applies to the previous method.

6 Conclusion
Weka was chosen as the tool for data analysis. Weka (Waikato Environment for Knowledge Analysis) is a machine learning tool written in Java and developed at the University of Waikato, New Zealand. Weka is freely available software under the GNU General Public License. It is a collection of machine learning algorithms designed for data mining tasks. The algorithms can be applied directly to a data file or called from one's own Java code. Weka contains tools for preprocessing, classification, regression, clustering, association rules and visualization. It is also suitable for developing new machine learning schemes [15].
This case study was performed to obtain information about the behavior of consumers in the food market using the methods of cluster analysis. The study dealt with the issue primarily in terms of the suitability of the selected methods for this type of data.
The study concluded that applying cluster analysis to such a number of attributes is possible, but not all types of methods are suitable for this purpose. However, the results need to be consulted with an expert in consumer behavior to determine whether they are relevant. For this type of data it would be useful to test other approaches, such as the creation of association rules.
In the research of consumer behavior in the food market, the data were analysed with the following methods of cluster analysis: DBSCAN, HierarchicalClusterer, Expectation-Maximization and K-means. The analysis shows that for the given data set only the EM and K-means methods are suitable, as they create usable (reasonable) clusters from the input data. These methods can be used to advantage in the preparatory phase before the subsequent application of methods for data classification according to set parameters [16] or [17].

References:
[1] Solomon, M. R. Consumer Behavior. Buying, Having, and Being. Pearson Prentice Hall. Saddle River 2004, 621 p., ISBN: 0-13-123011-5.
[2] Chalupová, N., Motyčka, A. Situation and trends in trade-supporting information technologies. In Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis. 2008, LVI, no. 6, pp. 25-36. ISSN 1211-8516.
[3] Turčínková, J., Kalábová, J. Preferences of Moravian consumers when buying food. Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis. 2011. Vol. LIX, No. 2, pp. 371-376.
[4] Turčínková, J., Stávková, J., Stejskal, L. Chování a rozhodování spotřebitele. 2007. 102 p. ISBN 978-80-7392-013-5.
[5] Munk, M., Kapusta, J., Švec, P., Turčáni, M. 2010. Data Advance Preparation Factors Affecting Results of Sequence Rule Analysis in Web Log Mining. In E & M Ekonomie a Management. ISSN 1212-3609, 2010, vol. 13, no. 4, pp. 143-160.
[6] Řezanková, H., Húsek, D., Snášel, V. Shluková analýza dat. Professional Publishing. Praha, 2007, 1st ed., 196 p. ISBN 978-80-86946-26-9.
[7] Tan, P.-N., Steinbach, M., Kumar, V. Introduction to Data Mining. Addison-Wesley. 2006. 769 p. ISBN 9780321321367.