
Proceedings of the 8th World Congress on Intelligent Control and Automation
July 6-9 2010, Jinan, China

A Knowledge Discovery and Data Mining Process Model in E-Marketing∗
Huifang Zeng, Ding Pan+
Center for Business Intelligence Research, Management School, Jinan University
Guangzhou 510632
E-mail: zenghuif@gmail.com; +pandingcn@gmail.com

Abstract - Based on the CRISP-DM process model, this paper addresses E-marketing problems such as network customer prediction and market segmentation. Each phase of the model is described first, and then its application in E-marketing is presented, which provides a complete description of all the steps, from understanding the problem domain to using the discovered knowledge. Finally, a KDDM procedure for network marketing is discussed thoroughly, and a process model that is simple to operate and easy to understand is provided for marketing people who have difficulty making marketing decisions.

Key words - E-marketing/network marketing, e-customer, data mining process, CRISP-DM

I. INTRODUCTION

Using the Internet and web-based information technology to carry out marketing activities is called E-marketing [1]. With the development of network technology, it has become convenient and inexpensive to obtain large amounts of data. However, the original data cannot be used directly to guide E-marketing decision-making, so a crucial problem is how to extract useful knowledge and patterns to support the decisions made by business people.

Traditional market research methods are highly subjective and static, so it is difficult for them to support objective marketing decisions. In addition, previous data mining work in E-marketing has emphasized model selection and algorithms too much, while ignoring important steps such as business understanding and data preparation. Although an e-customer behavior (eCB) model that uses an online analytical mining (OLAM) methodology has been proposed by Kwan et al. to foster the development of a marketing plan for B2C Web sites [2], knowledge discovery in E-marketing remains challenging.

CRISP-DM is currently the most popular and broadly adopted model, acknowledged and relatively widely used in both research and industrial communities [3]. For instance, the CRISP-DM model has been used in automotive direct marketing by Gersten et al. [4], in processing small and medium sized enterprises' data [5], and for customer churn prediction [6]. Nevertheless, little research has applied it to E-marketing problems. Since it has almost become a prescribed standard, this paper argues that specializing CRISP-DM for discovering knowledge in websites is significant.

The objective of this paper is to remedy the dilemma discussed above and to propose the use of a six-phase knowledge discovery and data mining process model, CRISP-DM, in the previously unexplored area of E-marketing. We first present the framework of the process model and then describe each step in detail. Above all, we aim to propose a process model that is simple to operate and easy to understand, and that is reliable and repeatable for E-marketing people with little data mining background.

II. THE CRISP-DM MODEL AND E-MARKETING

A. The CRISP-DM Process Model

The cross-industry standard process for data mining (CRISP-DM) is a nonlinear and iterative process model that was developed in 1996 and has been the most favored methodology ever since [7]. The research described in this article follows this methodology. Figure 1 shows that CRISP-DM comprises six phases:

Fig. 1 The CRISP-DM process model (six phases arranged around the data: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment)

1) Business Understanding: involves understanding the project objectives and requirements, defining the data mining problem, and designing a preliminary plan to achieve the objectives.
2) Data Understanding: starts with the initial data collection and proceeds to the description of the data, the exploration of the data, and the verification of data quality.


This work is partially supported by National Natural Science Foundation of China grant #70771044 and #70872020, and Jinan University Fund for
Humanities and Social Science grant #006JSYJ013.

978-1-4244-6712-9/10/$26.00 ©2010 IEEE


3) Data Preparation: covers analyzing the initial raw data and constructing the final data set to be mined.
4) Modeling: aims at selecting the actual modeling techniques and calibrating their parameters to optimal values.
5) Evaluation: evaluates the model and reviews the steps executed to construct it, to make sure that it properly achieves the business objectives.
6) Deployment: generates a report or implements a data mining process across the enterprise, depending on the requirements.

B. E-marketing

E-marketing is built on an advanced e-commerce information platform and uses the Internet to carry out marketing activities. In the information business model, consumer behavior has the following characteristics:

1) Personalized consumption has become increasingly prominent
For some of the latest personalized merchandise, consumers can use network facilities to determine their consumption behavior and make decisions about their own personal consumption.
2) Buying behavior is inclined to emotion
When consumers face too much information on the Internet, they have neither the time nor the energy to choose, so their behavior becomes somewhat emotional.
3) Customer demand for services becomes the mainstream
Network marketing is not only selling products to customers, but also a concept. Marketers should tone down promotion strategies and communicate with consumers in order to meet their needs.

Traditional data mining in E-marketing has emphasized model selection and algorithms too much, while ignoring important steps such as business understanding and data preparation. Given the characteristics that appear in network marketing, a simple data mining analysis of customer behavior can no longer meet the requirements. We therefore need to use CRISP-DM to thoroughly analyze and mine customer behavior in order to fully understand customer preferences, buying patterns, and even impulse buying, and then work out correct and timely marketing strategies.

III. APPLICATIONS OF CRISP-DM IN E-MARKETING

A huge amount of data accumulates in web marketing, such as commodity information, network customer information, and transaction records. Making full use of these data and finding the useful knowledge hidden in the raw data is very important for enterprises to make the right decisions, adjust market strategy, and reduce risk. The CRISP-DM process model can be used to analyze the data systematically, to extract potential patterns in a highly automated way, to predict network customers' behavior, and finally to help the enterprise achieve the above objectives. The application of this model in network marketing is described in detail below (see Fig. 2), which provides a complete description of all the steps, from understanding the problem domain to using the discovered knowledge. We do not discuss detailed technical problems here; we only offer an application process that helps marketing people mine patterns and knowledge in actual situations.

A. Business Understanding

Business understanding means understanding the goals and requirements of the data mining project from a business perspective and then translating them into a data mining problem. The business understanding phase involves several key steps, including determining the business objectives, assessing the situation, determining the data mining goals, and producing the project plan. In the KDDM process described in this article, CRISP-DM is applied in order to reveal hidden, potentially useful knowledge in the data of network customers. Specific targets can be refined as follows:

Business Understanding: Determine the E-marketing Objectives; Assess Situation; Determine Data Mining Goals; Produce Project Plan
Data Understanding: Collect Initial Data; Import Data into Clementine; Explore Data; Verify Data Quality; Select Working Data
Data Preparation: Select Attributes and Datasets; Clean Data; Derive New Attributes; Integrate Data; Format Data
Modeling: Select Modeling Technique; Generate Test Design; Set up Modeling Streams; Assess Model
Evaluation: Evaluate Results; Quality Assurance; Determine Next Steps; Review Process Plan
Deployment: Plan Deployment; Plan Monitoring & Maintenance; Run Campaign; Produce Final Report; Review Project

Fig. 2 Tasks and Outputs of the CRISP-DM in E-marketing

1) Determine the E-marketing objectives: By using data mining techniques to analyze network customer data, we can segment the market at all levels and thus provide a reliable basis for the company to locate its target market.
2) E-marketing program development: Data mining technology is used to analyze customer consumption data and to dig out customers' consumption patterns, in order to help vendors develop effective marketing programs that win customers and create profits.
3) Carry out one-to-one marketing: Enterprises can use classification and clustering technology to divide a large number of customers into different classes, where the customers in each class have similar attributes. Enterprises can then provide a different service for each category of customer, which increases customer satisfaction and enhances customer loyalty.

B. Data Understanding and Preparation

1) Data Understanding
This phase should determine what data is available to address our business needs. Based on the business understanding, we select variables from the following three categories for data understanding.

Customer individual information data: In network marketing, the main marketing method is "one to one", so we can collect a large amount of customers' personal information through direct sales and distribution, or through online surveys. These data mainly reflect individual characteristics such as age, sex, and education level.

Customer purchase behavior data: At the direct level, each customer's purchase process can be tracked automatically by a computer and recorded. The record contains the characteristics of the client's online purchase behavior (for example purchase amount, payment method, and frequency of purchase). Such information about the market as a whole can be collected by a professional market research company.

Other relevant information: This type of data is contained neither in the customer individual information nor in the customer purchase behavior characteristics. Examples are the brand awareness of the website and the macroeconomic conditions of the network market, including policy changes, which can be obtained through surveys and consulting firms.

Fig. 4 The sex structure of network customers in different cities [8] (stacked bars for females and males; shares shown: Beijing 57.0% and 43.0%, Shanghai 56.5% and 43.5%, Guangzhou 53.0% and 47.0%)

Based on the above analysis of data understanding, Figure 3 shows a framework of data classification and collection methods. In addition, exploring the primary data reveals some interesting observations; for example, female online shoppers account for about half of all users, with a current share of 50.8%. In highly developed cities such as Beijing, Shanghai, and Guangzhou, online purchases by female users are significantly higher than those by male users, as shown in Figure 4. This information has a potential impact on the business goals.

2) Data Preparation
This step is the key to the success of the project. It is usually assumed that data preparation takes about half of the project workload [3]. First, we should select the data discussed under data understanding and then clean them. Customer individual information data are stored permanently in the customer material database, customer purchase behaviour data are mainly obtained from the customer speech list or the web documents, and other relevant information can be obtained through investigation and detection.

Furthermore, in order to improve mining efficiency, it is necessary to eliminate useless or irrelevant information and to integrate the remaining information. For example, we remove redundant information such as advertisement links and tags in web documents, and organize the data into a neat logical form, or even a relational table. We should also delete attributes that express a similar meaning, such as name and account number.

In the data cleansing phase, missing values are either replaced by some neutral value, or the records with missing data are excluded from further analysis in the KDDM process. If the sample size is large, we can choose the latter approach; otherwise we should consider retaining the records.
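The two cleansing options described above can be illustrated with a minimal Python sketch. It is not part of the Clementine workflow used in the paper. The function name `clean_records`, the `large_sample` threshold, and the choice of the median as the "neutral value" are all assumptions made for illustration only.

```python
from statistics import median


def clean_records(records, fields, large_sample=1000):
    """Handle missing values (represented as None) in a list of dict records.

    Assumed policy, for illustration only:
    - large sample: exclude records that have any missing value;
    - small sample: keep every record and fill each gap with the
      median of the observed values of that field (a "neutral" value).
    """
    def has_missing(record):
        return any(record.get(f) is None for f in fields)

    if len(records) >= large_sample:
        # Enough data: drop incomplete records.
        return [dict(r) for r in records if not has_missing(r)]

    # Small sample: compute one neutral value per field from observed data.
    neutral = {}
    for f in fields:
        observed = [r[f] for r in records if r.get(f) is not None]
        neutral[f] = median(observed) if observed else None

    cleaned = []
    for r in records:
        row = dict(r)  # copy, so the input records stay unchanged
        for f in fields:
            if row.get(f) is None:
                row[f] = neutral[f]
        cleaned.append(row)
    return cleaned


# Hypothetical customer records with gaps in age and purchase amount.
customers = [
    {"age": 20, "amount": 10.0},
    {"age": None, "amount": 30.0},
    {"age": 40, "amount": None},
]
filled = clean_records(customers, ["age", "amount"])
dropped = clean_records(customers, ["age", "amount"], large_sample=2)
```

Here `filled` keeps all three records, while `dropped` keeps only the complete one. Which branch is appropriate depends on how much data can be spared, as the text notes.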
Fig. 3 Data classification and collection methods (data classes: customer individual information, customer purchase behaviour data, other relevant information; collection methods: online survey, direct sales and distribution, consultant firm)

C. Modeling

The modeling phase of the CRISP-DM methodology includes the application of various DM technologies and KD methods, with a wide range of tunable parameters [5]. It is well known that no method dominates the other methods all the time. Ref. [9] provides some answers to the question of how to decide which method to choose for a particular application. For our applications, we can choose association
rules, cluster analysis, and classification analysis to accomplish our network marketing purposes.

1) Association Rules: The purpose of association rules is to find the relevance of different items in the same event, such as the relevance of different commodities in one purchase; the most typical example is market basket analysis. With the help of such association analysis, we can carry out one-to-one marketing and develop E-marketing plans.
2) Cluster Analysis: Cluster analysis is used to group together data items and customers that have similar characteristics. In network marketing, in order to carry out targeted marketing and provide more appropriate and satisfactory service, it is necessary to use clustering technology to divide the market into a number of segments on the basis of the available customer data.
3) Classification Analysis: Classification computes on the values of variables and then assigns classes according to the results. It has usually involved two kinds of statistical methods, logistic regression and discriminant analysis; however, neural networks and decision trees have gradually been adopted. By analyzing existing historical data, classification can produce a prediction model that predicts which customers might react to advertising, product catalogs, and so on. We can then determine the network marketing objectives and provide personalized information services that address the characteristics of this type of client.

In the research described in this paper, the modeling was supported by the data mining tool Clementine from SPSS, mainly because of its breadth of techniques, its process support, and its scripting facilities. Using a visual programming interface, Clementine offers rich facilities for exploring and manipulating data. It also contains several modeling techniques and offers standard graphics for visualization. The single operations are represented by nodes, which are linked on a workspace to form a data flow, a so-called stream.

For our applications, we chose Clementine to build the clustering model. The process of modeling is shown in Figure 5. We use three kinds of algorithms: Kohonen Clustering Network, K-means Clustering, and Two-step Clustering. Because Clementine contains many kinds of models, it allows users to compare different models and choose the most appropriate one through evaluation.

In the modeling phase, we did not use any specific data to verify the validity of the model; this paper only provides a process for reference. The results may not be as expected, so a large number of models may need to be built, compared, and analyzed. If the results are still not satisfactory, we must consider looping back to the second or third step to check whether the prepared data is suitable for the tool of choice. Generally, the cost of data collection is huge. If we can use the KDDM techniques and mining results to improve data collection, we can not only save costs, including those of saving and preprocessing the data, but also directly improve the ability of knowledge discovery.

D. Evaluation and Deployment

The key steps in the evaluation phase are the evaluation of results, the process review, and the determination of next steps. As shown in Figure 6, the Evaluation Chart node offers an easy way to evaluate and compare predictive models in order to choose the best model for an application. There are five types of evaluation charts: Gains Charts, Lift Charts, Response Charts, Profit Charts, and ROI Charts, each of which emphasizes a different evaluation criterion.

Figure 6 gives an example of a gains chart. Gains are defined as the proportion of total hits that occurs in each quantile. Cumulative gains charts always start at 0% and end at 100% as you go from left to right. For a good model, the gains chart will rise steeply toward 100% and then level off. A model that provides no information will follow the diagonal from lower left to upper right.

Fig. 6 Gains chart (cumulative); vertical axis: % Gain
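As an illustration of how a cumulative gains curve is computed, the following minimal Python sketch ranks customers by a model score and accumulates the share of all hits captured up to each quantile. It is an independent sketch, not the implementation of Clementine's Evaluation Chart node. The function name `cumulative_gains`, the sample scores, and the 0/1 hit coding are assumptions for illustration.

```python
def cumulative_gains(scores, hits, n_quantiles=10):
    """Return the cumulative gain (in percent) at each quantile boundary.

    scores: model scores, where higher means more likely to be a hit
    hits:   1 if the customer actually responded (a "hit"), else 0
    The returned list has n_quantiles + 1 entries: it starts at 0.0
    and ends at 100.0.
    """
    if not scores or len(scores) != len(hits):
        raise ValueError("scores and hits must be non-empty and equally long")

    # Rank customers from highest to lowest score.
    ranked = [h for _, h in sorted(zip(scores, hits),
                                   key=lambda pair: pair[0],
                                   reverse=True)]
    total_hits = sum(ranked)
    if total_hits == 0:
        raise ValueError("at least one hit is required")

    n = len(ranked)
    gains = [0.0]
    for q in range(1, n_quantiles + 1):
        cutoff = round(q * n / n_quantiles)  # customers contacted so far
        gains.append(100.0 * sum(ranked[:cutoff]) / total_hits)
    return gains


# Hypothetical scores and responses for ten customers, split into quintiles.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
hits = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
curve = cumulative_gains(scores, hits, n_quantiles=5)

# Reference line of a model that provides no information (the diagonal).
baseline = [100.0 * q / 5 for q in range(6)]
```

For this toy data, `curve` rises faster than `baseline` and reaches 100% by the third quintile, which is the steep-then-flat shape the text attributes to a good model.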
After evaluation, it is time to do a more thorough review of the data mining engagement to determine whether any important factor or task has somehow been overlooked, and then to decide whether to finish the project and move on to deployment or to initiate further iterations.

The goal of deployment is that the results or solutions should be fully understood by the marketing people in a suitable form, so that they can deploy the data mining result(s) into the business and take effective actions.

Fig. 5 The process of modeling

IV. CONCLUSIONS AND FURTHER WORK

According to the standard data mining process, CRISP-DM, one can directly collect data that are essential
and useful for the mining results. It also offers practical help to KD researchers from both industry and academia. In this paper, a KD procedure and method for E-marketing is discussed thoroughly, and a process model that is simple to operate and easy to understand is presented for marketing people with little data mining background.

In addition, traditional market research methods are highly subjective, so it is difficult for them to support objective marketing decisions. The KDDM process model, in contrast, can be effective in helping market analysts find the distribution and propensity of customers, and thus predict customer needs, determine the marketing strategy, and ultimately develop effective marketing plans.

However, the dynamic characteristics of marketing make it difficult for KDDM to deal with unstructured data, and a complete dynamic model cannot yet be established to support marketing decisions. In addition, theoretical and applied data mining research on network marketing is still at an early exploratory stage, and the dynamic characteristics add various difficulties to the study. The KDDM process model for the academic study of E-marketing still has a long way to go.

REFERENCES

[1] S. Krishnamurthy, "Introducing e-markplan: A practical methodology to plan e-marketing activities," Business Horizons, vol. 49, pp. 51-60, 2006.
[2] I.S.Y. Kwan, J. Fong and H.K. Wong, "An e-customer behavior model with online analytical mining for internet marketing planning," Decision Support Systems, vol. 41, pp. 189-204, 2005.
[3] L.A. Kurgan and P. Musilek, "A survey of Knowledge Discovery and Data Mining process models," The Knowledge Engineering Review, vol. 21, pp. 1-24, 2006.
[4] W. Gersten, R. Wirth and D. Arndt, "Predictive modeling in automotive direct marketing: tools, experiences and open issues," in Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 398-406, 2000.
[5] Z. Bošnjak, O. Grljević and S. Bošnjak, "CRISP-DM as a Framework for Discovering Knowledge in Small and Medium Sized Enterprises' Data," 5th International Symposium on Applied Computational Intelligence and Informatics, pp. 509-514, 2009.
[6] Z. Mo, S. Zhao, L. Li and A. Liu, "A Predictive Model of Churn in Telecommunications Based on Data Mining," 2007 IEEE International Conference on Control and Automation, pp. 809-813, 2007.
[7] C. Shearer, "The CRISP-DM model: the new blueprint for data mining," Journal of Data Warehousing, vol. 5, pp. 13-19, 2000.
[8] Data sources: http://research.cnnic.cn/
[9] T.W. Liao and E. Triantaphyllou, "Recent Advances in Data Mining of Enterprise Data: Algorithms and Applications," Series on Computers and Operations Research, vol. 6, 2008.
