
Intelligent Decision Support System Based on Data Mining: Foreign Trading Case Study

Bingru Yang, Wei Song and Linna Li
School of Information Engineering, University of Science and Technology Beijing, China

Abstract-
In this paper, we propose an Intelligent Decision Support System based on Data Mining (IDSSDM), which integrates several data mining techniques and handles both structured and semi-structured data. For structured transactional data, Online Analytical Processing (OLAP) is first used to access the data warehouse for multidimensional analysis and primary decision support. For semi-structured data, classification and clustering are used to mine contract documents, while Web usage mining is used to analyze user behavior and extract relationships in the recorded data. Furthermore, Knowledge Discovery in Knowledge Base (KDK) serves as the primary inference engine. As the main business intelligence tool, the system has been adopted by the E-Commerce Center of the Ministry of Commerce of the People's Republic of China.
Keywords- Decision Support System, IDSSDM, KDD*, SOM, KDK, Foreign Trading.

I. INTRODUCTION-
Business Intelligence is the gathering, management, and analysis of large amounts of data on a company's customers, products, services, operations, suppliers, and partners, and on all the transactions in between. Business Intelligence applications in the e-commerce industry include target marketing, market basket analysis, customer profiling, and fraud detection, all of which require analyzing large volumes of shopping transaction data from electronic storefront sites.
Intelligent Decision Support Systems (IDSS), computer-mediated tools that assist managerial decision making by presenting information and interpretations of various alternatives, are usually regarded as one of the major implementations of Business Intelligence. Data Mining (DM), or Knowledge Discovery in Databases (KDD), has received increasing attention because it develops techniques and tools for exploring databases in order to extract relevant and interesting hidden relationships among variables or between causes and effects. Results from the data mining community can make an important contribution to the IDSS community by providing techniques that future IDSS can use to make a wide range of information available to decision makers.
In this paper, by utilizing several data mining techniques and considering both structured and semi-structured data, we aim to integrate DSS and DM more closely, providing decision makers with an additional knowledge source besides domain knowledge and the inference engine. As the main Business Intelligence tool, the proposed system has been adopted by the E-commerce Center of the Ministry of Commerce of the People's Republic of China (ECCMCPRC) to handle foreign trading.
The rest of this paper is organized as follows. Section II briefly describes the proposed system. Section III discusses the techniques for structured data mining. The techniques for semi-structured data mining are introduced in Section IV. Section V describes the inference engine. Section VI compares the proposed system with related works. Section VII concludes the paper and outlines future work.

II. BRIEF INTRODUCTION TO IDSSDM-


Unlike traditional DSSs, which process information from only one or two organizational databases, organizations in today's ever-changing business climate are compelled to base their decisions on knowledge captured from a wide variety of information sources. The data sources of IDSSDM therefore include both structured data, collected from transactional databases, and semi-structured data, which comes from the World Wide Web.
The structured transactional data is first preprocessed and then stored in a data warehouse, where OLAP is used for multidimensional analysis and basic decision support. To further analyze the complex hidden relationships among different attributes, KDD* is employed. For semi-structured data, text mining is used to categorize trading contracts, and Web usage mining provides knowledge about user behavior in order to extract relationships in the recorded data. Furthermore, Knowledge Discovery in Knowledge Base (KDK) is used as the primary inference engine.

III. STRUCTURED DATA MINING-


A. OLAP
Online Analytical Processing (OLAP) is an efficient way to access a data warehouse for multidimensional analysis and decision support. However, OLAP alone cannot generate patterns from the stored data, a capability most companies need if they are to identify trends and consumer preferences; data mining is thus a natural partner to OLAP. Analysts build a data cube to describe data at different levels of abstraction and then use OLAP operations to visualize the cube and identify interesting patterns and outcomes: pivoting (reorienting data views), rolling up or drilling down (increasing or decreasing the level of abstraction), and filtering. Data cubes thus make it easier to use OLAP on large multidimensional databases, and OLAP makes it easier to analyze the data cubes themselves.
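To make the OLAP operations named above concrete, the following is a minimal Python sketch over an in-memory fact table. The dimension names (year, quarter, seaport, commodity), the amounts, and the helper functions are all hypothetical and are not taken from the ECCMCPRC warehouse; a production system would run these operations in an OLAP server rather than in plain dictionaries.

```python
from collections import defaultdict

# Hypothetical fact table: one row per approved export contract (illustrative data only).
FACTS = [
    {"year": 2007, "quarter": "Q1", "seaport": "Shanghai", "commodity": "textiles", "amount": 120.0},
    {"year": 2007, "quarter": "Q1", "seaport": "Tianjin", "commodity": "machinery", "amount": 300.0},
    {"year": 2007, "quarter": "Q2", "seaport": "Shanghai", "commodity": "machinery", "amount": 180.0},
    {"year": 2008, "quarter": "Q1", "seaport": "Tianjin", "commodity": "textiles", "amount": 90.0},
    {"year": 2008, "quarter": "Q2", "seaport": "Shanghai", "commodity": "textiles", "amount": 60.0},
]


def aggregate(facts, dims, measure="amount"):
    """Group rows by the given dimensions and sum the measure (one cuboid of the cube)."""
    cells = defaultdict(float)
    for row in facts:
        cells[tuple(row[d] for d in dims)] += row[measure]
    return dict(cells)


def roll_up(facts, dims, drop):
    """Roll up: remove a dimension, moving to a higher level of abstraction."""
    return aggregate(facts, [d for d in dims if d != drop])


def drill_down(facts, dims, add):
    """Drill down: add a dimension, moving to a more detailed level."""
    return aggregate(facts, list(dims) + [add])


def filter_facts(facts, **conditions):
    """Filtering (slicing): keep only rows that match every dimension=value condition."""
    return [r for r in facts if all(r[k] == v for k, v in conditions.items())]


def pivot(facts, row_dim, col_dim, measure="amount"):
    """Pivot: reorient the view as a 2-D table, row_dim down and col_dim across."""
    table = defaultdict(dict)
    for (r, c), value in aggregate(facts, [row_dim, col_dim], measure).items():
        table[r][c] = value
    return dict(table)


if __name__ == "__main__":
    print(roll_up(FACTS, ["year", "seaport"], drop="seaport"))
    # {(2007,): 600.0, (2008,): 150.0}
    print(drill_down(FACTS, ["year"], add="quarter"))
    # {(2007, 'Q1'): 420.0, (2007, 'Q2'): 180.0, (2008, 'Q1'): 90.0, (2008, 'Q2'): 60.0}
    print(pivot(filter_facts(FACTS, commodity="textiles"), "year", "seaport"))
    # {2007: {'Shanghai': 120.0}, 2008: {'Tianjin': 90.0, 'Shanghai': 60.0}}
```

Each operation is just a different grouping of the same fact rows, which is why a cube (a set of precomputed cuboids) lets OLAP answer them quickly.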
Data warehousing is the process of envisioning, planning, constructing, using, managing, maintaining, and enhancing the storage of a variety of data in one place under one schema. The ultimate goal of a data warehouse is to make standard OLAP operations and data mining easier. In this work, data collected from the "Foreign Trading Contracts Approval System" of ECCMCPRC were stored in the data warehouse. After preprocessing, seven OLAP data marts (subdivisions of the data that capture the abstraction level of interest) were generated: "information of domestic companies", "currency", "information of foreign companies", "export commodity", "process cost", "raw material purchasing", and "seaport".

B. Association Rule Mining


To further analyze the hidden relationships among different attributes, KDD*, a software tool developed by the authors, is used to discover association rules from massive trading data. The main ideas behind its association rule mining are as follows.
We regard knowledge discovery as a cognitive system and study the knowledge discovery process from the perspective of cognitive psychology, with an emphasis on self-cognition. Two important features of cognitive psychology, "creating intent" and "psychology information maintenance", are adopted to address two important issues of knowledge discovery:
(1) Enabling the system to detect knowledge shortage automatically by simulating "creating intent".
(2) Maintaining the knowledge base in real time by simulating "psychology information maintenance".
To accomplish these two functions, the database and the knowledge base are used together, and a one-to-one mapping between them is constructed, provided that both are built with a specific structure.
To realize these two functions, we first established the theoretical foundation of the double-base cooperating mechanism and the structure correspondence theorem. Our goal is to achieve "directional searching" and "directional mining", which reduce the search space and the complexity of the algorithms. The key technology for achieving this goal is to construct a specific mapping between the database and the knowledge base; this mapping is called the double-base cooperating mechanism. To a certain extent, it reveals the underlying nature, principles, and complexity of knowledge discovery from a particular perspective. Based on the double-base cooperating mechanism, KDD*, a software tool for association rule mining, was then developed. The main advantage of KDD* is that it can, to a certain extent, avoid multiple scans of the database. The roadmap of KDD* is illustrated in Fig. 1.
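The paper does not publish the KDD* algorithm itself, so the Python below is only a simplified, hypothetical illustration of the two ideas just described: item and item-pair supports are counted in a single scan of the database, and only candidate rules that the knowledge base does not yet contain (a crude stand-in for "knowledge shortage") are evaluated and then stored back into the knowledge base. Rules are limited to one item on each side, and the function names, thresholds, and data layout are assumptions rather than the authors' design.

```python
from collections import Counter
from itertools import combinations


def count_supports(transactions):
    """Single pass over the database: count single items and item pairs together."""
    item_counts, pair_counts = Counter(), Counter()
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    return item_counts, pair_counts


def find_shortage(item_counts, knowledge_base, min_count):
    """Hypothetical 'knowledge shortage': candidate rules a -> b over frequent items
    that the knowledge base does not contain yet."""
    frequent = sorted(i for i, c in item_counts.items() if c >= min_count)
    return [(a, b) for a in frequent for b in frequent
            if a != b and (a, b) not in knowledge_base]


def directional_mining(transactions, knowledge_base, min_support, min_confidence):
    """Evaluate only the missing candidate rules and store those that pass both
    thresholds in the knowledge base (a simplified maintenance step)."""
    n = len(transactions)
    item_counts, pair_counts = count_supports(transactions)
    new_rules = {}
    for a, b in find_shortage(item_counts, knowledge_base, min_support * n):
        both = pair_counts[tuple(sorted((a, b)))]
        support, confidence = both / n, both / item_counts[a]
        if support >= min_support and confidence >= min_confidence:
            new_rules[(a, b)] = (support, confidence)
    knowledge_base.update(new_rules)  # maintain the knowledge base in place
    return new_rules
```

For example, with four contracts `{"textiles", "Shanghai", "USD"}`, `{"textiles", "Shanghai", "EUR"}`, `{"machinery", "Tianjin", "USD"}`, `{"textiles", "Shanghai", "USD"}` and a knowledge base that already holds `textiles -> Shanghai`, a call with `min_support=0.5, min_confidence=0.8` adds only `Shanghai -> textiles`, and a second call finds nothing new. Counting every pair in one pass trades memory for fewer database scans; longer itemsets are omitted from this sketch.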

IV. SEMI-STRUCTURED DATA MINING-

A. Data Preprocessing

Data preprocessing is the foundation for data analysis and mining. Poorly integrated data inevitably leads to incorrect outputs from data mining algorithms, and decision support obtained this way is unreliable. In fact, a large part of the work in data mining is spent on data preparation and on improving data quality. Therefore, to improve the accuracy of mining results, data preprocessing should not be neglected. Because semi-structured data are collected from the World Wide Web, they often contain a large amount of noise, such as advertisements and hyperlinks. Data preprocessing usually involves checking data integrity and consistency, smoothing noisy data, filling in missing data, eliminating "dirty" data, and removing duplicate records. Common data preprocessing techniques include data cleaning, data integration, data transformation, and data reduction. Data cleaning routines "clean" the data by filling in missing values, smoothing noisy data, identifying or removing outliers, and resolving inconsistencies. Binning and regression can be employed to smooth out noise.
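As an illustration of the cleaning steps listed above, the following minimal Python sketch fills missing numeric values with the mean, smooths noise by equal-frequency binning (replacing each value with its bin mean), and removes duplicate records. The function names and the choice of mean imputation are assumptions; regression-based smoothing, also mentioned above, is not shown.

```python
def fill_missing(values):
    """Replace missing entries (None) with the mean of the observed values.
    Assumes at least one value is present."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def smooth_by_bin_means(values, bin_size):
    """Equal-frequency binning: sort the values, cut them into bins of bin_size,
    and replace each value by its bin mean. Output keeps the original order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    smoothed = [0.0] * len(values)
    for start in range(0, len(order), bin_size):
        bin_idx = order[start:start + bin_size]
        bin_mean = sum(values[i] for i in bin_idx) / len(bin_idx)
        for i in bin_idx:
            smoothed[i] = bin_mean
    return smoothed


def drop_duplicates(records):
    """Remove repeated (hashable) records, keeping the first occurrence of each."""
    seen, unique = set(), []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)
    return unique


if __name__ == "__main__":
    prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
    print(smooth_by_bin_means(prices, 3))
    # [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

Smoothing by bin means flattens local fluctuations while preserving the overall ordering of values; a larger `bin_size` smooths more aggressively.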
B. Web Text Classification
Thousands of new trading contracts are created in China, and categorizing them efficiently for decision support is far beyond the ability of the existing statistical software used by the E-commerce Center of the Ministry of Commerce of the People's Republic of China. Furthermore, with the rapid development of the World Wide Web, more and more contract data are collected via the Internet, so efficient content-based retrieval, searching, and filtering for these huge, semi-structured online repositories are urgently needed. Text classification is the assignment of free-text documents to one or more predefined categories based on their content. Each document may belong to multiple categories, exactly one category, or none at all. Text classification has many application areas, such as information management, real-time sorting of email or files into folder hierarchies, topic identification to support topic-specific processing operations, structured search and/or browsing, and finding documents that match long-term standing interests or more dynamic task-based interests. In the research community, the dominant approach is based on machine learning: a general inductive process automatically builds a classifier by learning the characteristics of the categories from a set of pre-classified documents. This system adopts a fusion of K-NN (K-Nearest Neighbor) and SVM (Support Vector Machine). K-NN classification is an instance-based learning algorithm. To decide the class of a document, the K-NN algorithm ranks the document's neighbors among the training documents and uses the class labels of the k most similar neighbors to predict the class of the input document. SVM treats training samples as relevant or non-relevant and seeks the separating hyperplane that minimizes the loss function. A detailed introduction to SVM can be found in the literature. The fused method can be described as follows:
Input: texts to be classified
Output: texts and their corresponding categories
(1) While there are texts that have not been processed:
(2) Take the next unprocessed text.
(3) Apply K-NN to it first.
(4) If K-NN classifies the text unambiguously, assign it to the corresponding category.
(5) Otherwise, apply SVM to it.
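The fused procedure above can be sketched in self-contained Python as follows. Several details are assumptions, because the paper does not specify them: documents are bag-of-words vectors compared by cosine similarity, "classified unambiguously" is taken to mean that all k nearest neighbors share one label, and the SVM is a bias-free one-vs-rest linear SVM trained with Pegasos-style subgradient steps on the regularized hinge loss (a substitute for whatever SVM library the authors used).

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(u, v):
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def knn_vote(x, train, k):
    """Label shared by all k most similar training documents, or None if they disagree
    (our assumed reading of 'cannot be classified undoubtedly')."""
    neighbours = sorted(train, key=lambda item: cosine(x, item[0]), reverse=True)[:k]
    labels = {label for _, label in neighbours}
    return labels.pop() if len(labels) == 1 else None


def train_linear_svm(train, label, epochs=50, lam=0.01):
    """One-vs-rest linear SVM (no bias) via Pegasos-style subgradient descent."""
    w, step = {}, 0
    for _ in range(epochs):
        for x, y_label in train:
            step += 1
            eta = 1.0 / (lam * step)
            y = 1.0 if y_label == label else -1.0
            margin = y * sum(w.get(t, 0.0) * v for t, v in x.items())
            for t in list(w):              # shrink: gradient of the L2 regularizer
                w[t] *= 1.0 - eta * lam
            if margin < 1.0:               # hinge loss active: step towards y * x
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + eta * y * v
    return w


def fused_classify(texts, train_texts, k=3):
    """Apply K-NN first; fall back to the linear SVMs only when K-NN is ambiguous."""
    train = [(vectorize(t), label) for t, label in train_texts]
    labels = sorted({label for _, label in train})
    models = None                          # SVMs are trained lazily, on first need
    results = []
    for text in texts:
        x = vectorize(text)
        label = knn_vote(x, train, k)
        if label is None:
            if models is None:
                models = {c: train_linear_svm(train, c) for c in labels}
            scores = {c: sum(w.get(t, 0.0) * v for t, v in x.items()) for c, w in models.items()}
            label = max(scores, key=scores.get)
        results.append((text, label))
    return results
```

Using K-NN first keeps the common, clear-cut cases cheap, while the SVM handles documents whose neighborhoods are mixed; the lazy training means the SVMs are only built if at least one text needs them.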
C. Web Text Clustering
The text classification technique described above requires a large number of labeled training examples. In some cases, however, such examples are difficult to label. Clustering can then be applied to the collected texts to generate hierarchical clusters of the unlabeled documents, on which categorization can subsequently be performed. Clustering via Self-Organizing Maps (SOM) is adopted. The SOM is one of the major unsupervised artificial neural network models and is often used to learn useful features from its input data. It provides a means of cluster analysis by mapping high-dimensional input vectors onto a two-dimensional output space while preserving topological relations as faithfully as possible. After sufficient training iterations, similar input items are grouped spatially close to each other, so the resulting map performs the clustering task in a completely unsupervised fashion. Furthermore, the SOM approach offers stronger data visualization than other cluster analysis methods used in data mining. For these reasons, the SOM method was adopted in this work to produce a document cluster map for Web text mining.
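The SOM training loop described above can be sketched in pure Python as below. This is a generic textbook SOM (Gaussian neighborhood, linearly decaying learning rate and radius), not the improved SOM mentioned later in the paper; the grid size, decay schedules, and document vectors are assumptions for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid cell whose weight vector is closest (squared Euclidean distance) to x."""
    return min(weights, key=lambda node: sum((wi - xi) ** 2 for wi, xi in zip(weights[node], x)))


def train_som(vectors, rows, cols, epochs=200, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols Self-Organizing Map on equal-length numeric vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    weights = {(r, c): [rng.random() for _ in range(dim)] for r in range(rows) for c in range(cols)}
    total, step = epochs * len(vectors), 0
    for _ in range(epochs):
        for x in rng.sample(vectors, len(vectors)):     # shuffled pass over the documents
            frac = step / total
            lr = lr0 * (1.0 - frac)                      # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5          # shrinking neighborhood radius
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))  # Gaussian neighborhood weight
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])       # pull unit towards the input
            step += 1
    return weights


def cluster_map(weights, docs):
    """Map each named document vector to its best-matching grid cell."""
    return {name: best_matching_unit(weights, vec) for name, vec in docs.items()}


if __name__ == "__main__":
    # Hypothetical term vectors over [cotton, textiles, silk, machine, engine, tools].
    docs = {"c1": [1, 1, 0, 0, 0, 0], "c2": [1, 1, 0, 0, 0, 0],
            "c3": [0, 0, 0, 1, 1, 0], "c4": [0, 0, 0, 1, 0, 1]}
    print(cluster_map(train_som(list(docs.values()), 3, 3, seed=1), docs))
```

Because every unit near the winner is also updated, neighboring grid cells end up with similar weight vectors, which is what makes the trained map usable as a visual cluster display.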

V. INFERENCE ENGINE-
The inference engine is an important part of a DSS, and common methods for building it mainly include rule-based reasoning and case-based reasoning. In IDSSDM, Knowledge Discovery in Knowledge Base (KDK), proposed in this paper, is used for this task. The basic ideas of KDK are as follows.
(i) KDK is the nontrivial process of discovering new knowledge in a huge knowledge base. The key problem of the discovery process is therefore induction, with deduction playing only a supporting role, since deduction cannot always guarantee the resulting facts;
(ii) KDK can discover deeper-level knowledge. Specifically, it goes further to discover new relations based on the existing attributes and relations; from a logical point of view, it is important to discover relations between predicates or function words.
(iii) Because knowledge itself may have properties such as uncertainty, non-monotonicity, and incompleteness, the KDK process is complex and may yield multiple solutions. It is closely related to the organization of the knowledge base and to the types of knowledge that a user pursues, and the means of reasoning may involve many different logical domains.
(iv) The knowledge discovered in KDK should be original, potentially useful, effective, and understandable to users.
From the above description, KDK is essentially a machine learning process whose aim is to obtain knowledge. Its learning resource is the knowledge base, its learning method combines induction with deduction, and its output includes not only factual knowledge but also rules. Consequently, two mining methods should be adopted in a concrete implementation.
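The paper gives no algorithm for KDK, so the following Python is only a toy, hypothetical illustration of the combination described above: induction (the main step) proposes new rules between predicates from the facts already in the knowledge base, and deduction (the supporting step) applies known and induced rules by forward chaining to derive new facts. The predicate names, the (predicate, entity) fact format, and the support/confidence thresholds are all assumptions.

```python
from itertools import permutations


def deduce(facts, rules):
    """Forward chaining: apply rules p -> q until no new (predicate, entity) fact appears."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for p, q in rules:
            for pred, ent in list(facts):
                if pred == p and (q, ent) not in facts:
                    facts.add((q, ent))
                    changed = True
    return facts


def induce(facts, known_rules, min_support, min_confidence):
    """Propose rules p -> q whose premise holds for at least min_support entities and
    whose conclusion holds for at least min_confidence of those entities."""
    by_pred = {}
    for pred, ent in facts:
        by_pred.setdefault(pred, set()).add(ent)
    new_rules = set()
    for p, q in permutations(sorted(by_pred), 2):
        covered = by_pred[p]
        if (p, q) in known_rules or len(covered) < min_support:
            continue
        if len(covered & by_pred[q]) / len(covered) >= min_confidence:
            new_rules.add((p, q))
    return new_rules


def kdk_step(facts, rules, min_support=2, min_confidence=1.0):
    """One hypothetical KDK round: returns the induced rules (rule knowledge) and
    the facts newly deduced from all rules (fact knowledge)."""
    new_rules = induce(facts, rules, min_support, min_confidence)
    new_facts = deduce(facts, set(rules) | new_rules) - set(facts)
    return new_rules, new_facts


if __name__ == "__main__":
    FACTS = {("state_owned", "A"), ("state_owned", "B"), ("state_owned", "C"),
             ("exports_textiles", "A"), ("exports_textiles", "B"), ("uses_seaport", "A")}
    RULES = {("exports_textiles", "uses_seaport")}
    print(kdk_step(FACTS, RULES, min_support=2, min_confidence=0.6))
```

With a confidence threshold below 1.0, induced rules generalize beyond the observed facts, so the derived facts are uncertain and may later be retracted, which echoes the non-monotonic character noted in (iii).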

VI. COMPARISONS WITH RELATED WORKS


Data mining has often not been well integrated with decision support systems. The IDSS proposed by Wang takes advantage of the intelligent, autonomous, and active aspects of intelligent agent technology, and it successfully integrates a data mining process for generating an optimal sampling method into a DSS framework by adopting intelligent agent technology. Other studies have proposed integrating DSS and knowledge management processes using DM, proposed the concept of a knowledge warehouse to manage the knowledge of the firm, and summarized the evolution of DSS while emphasizing the importance of data warehouses, OLAP, and DM in those systems. Compared with these works, most of the main components of IDSSDM, such as KDD* for association rule mining, an improved SOM for contract data clustering, and KDK as the inference engine, were newly proposed rather than being a simple integration of existing data mining techniques.

VII. CONCLUSIONS-
In this paper, an Intelligent Decision Support System based on Data Mining (IDSSDM) was proposed. It integrates several data mining techniques and handles both structured and semi-structured data. For structured transactional data, Online Analytical Processing (OLAP) is first used to access the data warehouse for multidimensional analysis and primary decision support. To uncover hidden relationships between different attributes, KDD* is used to discover association rules from massive trading data. For semi-structured data, classification and clustering are used to mine contract documents, while Web usage mining is used to analyze user behavior and extract relationships in the recorded data. Furthermore, Knowledge Discovery in Knowledge Base (KDK) is used as the primary inference engine. As the main business intelligence tool, the proposed system has been adopted by the E-Commerce Center of the Ministry of Commerce of the People's Republic of China (ECCMCPRC) for foreign trading. Our future work is to better integrate the main components of IDSSDM for real-time decision support; the second stage of our joint project with ECCMCPRC has just begun.

[Fig. 1. The KDD* process model. Components shown: real DB and basic KB, preprocessing, classified data sub-base and knowledge sub-base, mining database and mining knowledge base, search for associations of knowledge nodes to discover knowledge shortage (directional searching), heuristic coordinator driven by the user's interests and needs, focusing, directional mining, hypothesis acquisition, rule evaluation, storage of mined rules in the knowledge base, maintenance coordinator.]


Abhishek Prasad
R no.-07141A1202
2nd B.Tech IT, Swami Ramananda Tirtha Institute of Science and Technology,
Ramananda nagar, (post)SLBC, Nalgonda-508004
Ph no.-9966622362
Email id- abhishekprasad73@gmail.com
