
2009 International Joint Conference on Artificial Intelligence

Research on KDD process model and an improved algorithm

Zhen Peng
Department of Computer, North China Institute of Science and Technology, Beijing, China

Bingru Yang
School of Information Engineering, Beijing University of Science and Technology, Beijing, China

Hongde Ren
Department of Computer, North China Institute of Science and Technology, Beijing, China

e-mail: yx_dpzc@yahoo.com.cn

Abstract-To improve the performance of KDD, this paper studies the KDD process model based on the double bases cooperating mechanism and focuses on one of its components, the heuristic coordinator, which simulates the "creating intent" feature of cognitive psychology. An improved heuristic coordinator algorithm is proposed. The method uses a Fuzzy Cognitive Map (FCM) to represent knowledge and to perform efficient inference, obtaining the non-association states of the knowledge base so that knowledge can be mined directionally and automatically from a massive database. Furthermore, the method greatly reduces the search space and the complexity of the algorithm and enhances the self-cognition ability.

Keywords-Knowledge Discovery in Database (KDD); Double Bases Cooperating Mechanism; Heuristic Coordinator; Fuzzy Cognitive Map

I. INTRODUCTION

With the fast-growing amount of data stored in databases, data warehouses and other data repositories, how to derive useful knowledge has become a pressing problem. KDD [1,2] offers a powerful tool. Generally, KDD systems mine knowledge from massive databases according to user demands. However, research in KDD has mostly concentrated on good algorithms for various tasks, and relatively little has been published about the theoretical framework or foundations of KDD. To address this gap and improve the performance of KDD, we regard knowledge discovery as a cognitive system and construct a double bases (knowledge base and database) cooperating mechanism to improve the knowledge discovery process model.

The double bases cooperating mechanism has two important parts: the heuristic coordinator [2,3] and the maintenance coordinator. The heuristic coordinator simulates the "creating intent" feature of cognitive psychology and enhances the self-cognition ability. KDD systems based on the heuristic coordinator can focus from two perspectives, user demands and the system itself. The heuristic coordinator algorithm discovers knowledge automatically by searching the knowledge base for non-association states of knowledge nodes and then activating the corresponding data sub-class structure in the massive database, thus realizing directional mining and effectively reducing the search space and the complexity of the algorithm. The heuristic coordinator therefore conforms to the inner cognitive law of KDD, improves the KDD model and has been widely applied [4,5,6].

Based on this research on the KDD process model, the paper proposes an improved heuristic coordinator algorithm based on FCM, which uses FCM for knowledge representation and an inference algorithm to obtain the accessible matrix. Compared with the approach based on hypergraphs, the FCM-based algorithm on the one hand has strong reasoning ability, which effectively reduces the search space and the complexity of the algorithm; on the other hand, it fully reflects the inner cognitive law of KDD and the cognitive characteristics of cognitive psychology.

The rest of the paper is organized as follows: Section II introduces knowledge representation and knowledge reasoning based on FCM, Section III presents the heuristic coordinator algorithm based on FCM, and Section IV concludes.

II. KNOWLEDGE REPRESENTATION AND INFERENCE MECHANISM BASED ON FCM

A. Knowledge representation

As a soft computing methodology, the Fuzzy Cognitive Map (FCM) [7] stores knowledge in concepts and in the relations between concepts, both of which are fuzzy variables. It is relatively easy to use for representing structured knowledge, and inference can be computed by numeric matrix operations.

In the heuristic coordinator, the number of relational nodes equals the number of rules, and each relational node corresponds to a probabilistic relationship rule "if Ci then Cj", in which Ci is the single concept node or co-node pointing to the relational node and Cj is the single concept node or co-node pointed to by the relational node. Each concept node Cj has a state value Aj = δ(Cj)/N, where δ(Cj) is the number of records in the database in which Cj is true and N is the total number of records.

The corresponding relational matrix W of the FCM is shown in Figure 1. The diagonal values Wii are always 0, because the rule Ci→Ci is always true and need not take part in the calculation. Each Wij for an interconnection relationship (rule) denotes the set {?, sup(Ci→Cj), con(Ci→Cj)}. "?" is the state value of the rule Ci→Cj, indicating whether the rule is a knowledge rule, so the value of "?" is 1 or -1. sup(Ci→Cj) = δ(Ci∧Cj)/N is the support of the rule Ci→Cj; if it is less than the support threshold, "?" is recorded as -1, indicating that the rule is a non-knowledge rule. con(Ci→Cj) = δ(Ci∧Cj)/δ(Ci) is the confidence of the rule Ci→Cj; similarly, if it is less than the confidence threshold, "?" is also recorded as -1. In summary, each value Wij in the matrix is either 0 or a set consisting of state value, support and confidence.
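The quantities defined in this subsection can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy records and the threshold values are all hypothetical, and a concept node or co-node is simply modelled as a set of concept labels. Following the text, a rule is marked -1 when its support or confidence is less than the corresponding threshold, and 1 otherwise.

```python
from fractions import Fraction

def count(records, concepts):
    """delta(C): number of records in which every concept in `concepts` is true."""
    return sum(1 for rec in records if concepts <= rec)

def state_value(records, concepts):
    """State value A_j = delta(C_j) / N of a single node or co-node."""
    return Fraction(count(records, concepts), len(records))

def rule_entry(records, antecedent, consequent, min_sup, min_con):
    """Entry W_ij = (state, sup, con) for the rule antecedent -> consequent."""
    n = len(records)
    both = count(records, antecedent | consequent)       # delta(Ci AND Cj)
    ante = count(records, antecedent)                     # delta(Ci)
    sup = Fraction(both, n)                               # delta(Ci AND Cj) / N
    con = Fraction(both, ante) if ante else Fraction(0)   # delta(Ci AND Cj) / delta(Ci)
    state = 1 if sup >= min_sup and con >= min_con else -1
    return state, sup, con

# Hypothetical toy database: each record is the set of concepts true in it.
RECORDS = [
    {"C1", "C3"},
    {"C1", "C3", "C4"},
    {"C2", "C4"},
    {"C1"},
]

if __name__ == "__main__":
    print(state_value(RECORDS, {"C1"}))
    print(rule_entry(RECORDS, {"C1"}, {"C3"}, Fraction(1, 4), Fraction(1, 2)))
    print(rule_entry(RECORDS, {"C4"}, {"C1"}, Fraction(1, 4), Fraction(3, 5)))
```

Exact fractions are used here only so that threshold comparisons are free of rounding effects; floating-point values would serve equally well in practice.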

978-0-7695-3615-6/09 $25.00 © 2009 IEEE
DOI 10.1109/JCAI.2009.15
            C1       C2       C3       C4       C5       C6       C7
C1          0        0        {1,s,c}  0        0        0        0
C2          {1,s,c}  0        0        0        0        0        {1,s,c}
C3          0        0        0        0        0        0        0
C4          0        0        0        0        0        0        0
C5:C1∧C3    0        0        0        0        0        0        0
C6:C2∧C4    0        0        0        0        {1,s,c}  0        0
C7:C3∧C4    0        0        0        0        0        0        0

Figure 1. The FCM relational matrix

In FCM, each Aj and Wij may change over time as the number of records in the database changes. The maintenance coordinator is responsible for the real-time maintenance of this information; it is not described further here.

B. Inference mechanism

The inference mechanism based on FCM automatically calculates accessible knowledge using the adjacency matrix W and the state values of the nodes of the FCM. It includes the following four conditions.

Inference 1. If the rules A→B and B→C are valid, then the rule A→C is also valid.

Inference 2. If the rule A→B is valid, in which the consequent B is a co-node, then the rule A→B1 must be valid for any B1⊂B.
Because the rule A→B is true, δ(A∧B)/N is not less than the support threshold and δ(A∧B)/δ(A) is not less than the confidence threshold. Every record that satisfies A∧B also satisfies A∧B1, so δ(A∧B1)≥δ(A∧B) always holds. Hence δ(A∧B1)/N is not less than the support threshold and δ(A∧B1)/δ(A) is not less than the confidence threshold. Therefore, the rule A→B1 is true.

Inference 3. If the rule A→B is valid, in which the consequent B is a co-node, then for any B1⊂B it can be determined whether the rules B1→A∧(B-B1) and A∧B1→(B-B1) are true or false.
The support of both rules B1→A∧(B-B1) and A∧B1→(B-B1) is δ(A∧B1∧(B-B1))/N, which equals δ(A∧B)/N.
The confidence of the rule B1→A∧(B-B1) is δ(A∧B)/δ(B1), which equals support×N/δ(B1), and the confidence of the rule A∧B1→(B-B1) is δ(A∧B)/δ(A∧B1), which equals support×N/δ(A∧B1), where support, N, δ(B1) and δ(A∧B1) are known. Consequently, the confidences determine whether the two rules are knowledge or not.

Inference 4. If the rule A→B is valid, in which the antecedent A is a co-node, then for any A1⊂A it can be determined whether the rule A1→B∧(A-A1) is true or false.
By the same token, the support δ(A1∧B∧(A-A1))/N and the confidence δ(A∧B)/δ(A1) can be determined.

III. AN IMPROVED ALGORITHM

The first step of the heuristic coordinator algorithm is to discover knowledge by reasoning, in order to obtain the accessible matrix W that determines the non-association states of knowledge nodes according to the above inference mechanism. It consists of two processes: constructing the two-dimensional array W and the one-dimensional array A, and calculating the accessible matrix W.

Function calculate_reach_matrix
//call construct_matrix for the first (constructing) process
Step1: construct_matrix(C,E,A,W);
//call calculate_matrix for the second (calculating) process
Step2: calculate_matrix(A,W,K);

Procedure construct_matrix(C,E,A,W)
Step1: Taking all single nodes first and co-nodes after them, every single node and existing co-node in the FCM is given ID(Ci), the subscript i of Ci. The total number of nodes in the FCM is recorded as |C|. The |C| nodes form one |C|×|C| matrix W;
Step2: The state value A[i] of each node in the one-dimensional array A is assigned δ(Ci)/N;
Step3: Initially, W[i][j] of each relationship in the matrix is set to 0;
Step4: e:=1;
Step5: Read the e-th rule re: Ci→Cj;
Step6: W[i][j]:={1,sup,con}, where sup and con are the support and confidence of the rule respectively;
Step7: if e<|E| (|E| is the number of relationships in the FCM)
Step8:   e:=e+1, go to Step5.

Procedure calculate_matrix(A,W,K)
Step1: for r:=1 to |C|   //r is the row index of the matrix W
Step2:   for l:=1 to |C|   //l is the column index of the matrix W
Step3:     if r!=l && W[r][l]!=0 is true,
Step4:       if Cr is a co-node,   //the 4th inference condition
Step5:         for any Cri⊂Cr
Step6:           if Cri or (Cr-Cri)∧Cl does not exist in W,
Step7:             A[ri]:=δ(Cri)/N or A[ID((Cr-Cri)∧Cl)]:=δ((Cr-Cri)∧Cl)/N
Step8:           if con:=W[r][l].sup/A[ri] > min_con is true,
Step9:             W[ri][ID((Cr-Cri)∧Cl)]:={1,W[r][l].sup,con}
Step10:            K:=K∪{Cri→((Cr-Cri)∧Cl)}   //adding new knowledge to knowledge base K
Step11:          else, W[ri][ID((Cr-Cri)∧Cl)]:={-1,W[r][l].sup,con}
Step12:      if Cl is a co-node,   //the 3rd inference condition
Step13:        for each Cli⊂Cl
Step14:          if Cr∧Cli or Cl-Cli does not exist in W,
Step15:            A[ID(Cr∧Cli)]:=δ(Cr∧Cli)/N or A[ID(Cl-Cli)]:=δ(Cl-Cli)/N
Step16:          if con:=W[r][l].sup/A[ID(Cr∧Cli)] > min_con is true,
Step17:            W[ID(Cr∧Cli)][ID(Cl-Cli)]:={1,W[r][l].sup,con}
Step18:            K:=K∪{Cr∧Cli→Cl-Cli}
Step19:          else, W[ID(Cr∧Cli)][ID(Cl-Cli)]:={-1,W[r][l].sup,con}
Step20:          if Cli or Cr∧(Cl-Cli) does not exist in W,
Step21:            A[li]:=δ(Cli)/N or A[ID(Cr∧(Cl-Cli))]:=δ(Cr∧(Cl-Cli))/N
Step22:          if con:=W[r][l].sup/A[li] > min_con is true,
Step23:            W[li][ID(Cr∧(Cl-Cli))]:={1,W[r][l].sup,con}
Step24:            K:=K∪{Cli→(Cr∧(Cl-Cli))}
Step25:          else, W[li][ID(Cr∧(Cl-Cli))]:={-1,W[r][l].sup,con}
                 //the 2nd inference condition
Step26:          W[r][li]:={1,δ(Cr∧Cli)/N,δ(Cr∧Cli)/(A[r]×N)}
Step27:      for k:=1 to |C|   //the 1st inference condition
Step28:        if k!=r && W[l][k]!=0 is true,
Step29:          W[r][k]:={1,δ(Cr∧Ck)/N,δ(Cr∧Ck)/(A[r]×N)}

To avoid missing any rule, the procedure calculate_matrix applies the fourth inference condition first and the first inference condition last. Finally, the entries with weight 0 in matrix W are the non-association states, which comprise knowledge rules and non-knowledge rules. The knowledge rules among them are those that satisfy the support and confidence thresholds, do not yet appear in the knowledge base and cannot be obtained by the knowledge inference mechanism.

The second step of the heuristic coordinator algorithm is to activate the corresponding data sub-class structure of the massive database and to carry out the directional mining process that obtains the missing knowledge, by calling the procedure Heuristic_Coordinator.

Procedure Heuristic_Coordinator(A,W,K)
Step1: for i:=1 to |C|
Step2:   for j:=1 to |C|
Step3:     if i!=j && W[i][j]==0 is true, then
Step4:       Directional mining on the data tables corresponding to i and j;
Step5:       if the support and confidence of the rule Ci→Cj satisfy the thresholds,
Step6:         W[i][j]:={1,support,confidence};
Step7:         K:=K∪{Ci→Cj};
Step8:         calculate_matrix(A,W,K);
Step9:       else,
Step10:        W[i][j]:={-1,support,confidence}

IV. CONCLUSION

The paper proposed a new heuristic coordinator algorithm based on the knowledge representation method and inference mechanism of FCM. Compared with the hypergraph-based algorithm, the heuristic coordinator algorithm can effectively reason out more rules, especially knowledge whose length is greater than 2, and accords better with cognitive characteristics. At the same time, KDD systems with the heuristic coordinator can achieve a double focus, from user demands and from the system itself, which simulates the "creating intent" feature of cognitive psychology. The new heuristic coordinator algorithm based on FCM thus reduces the search space and the complexity of the algorithm and enhances the self-cognition ability and degree of intelligence, which we expect to promote the mainstream development of KDD.

ACKNOWLEDGMENT

We would like to thank the Science and Research Project of North China Institute of Science and Technology (No.2008-B-13) and the National Nature Science Foundation (No.60675030) for their support.

REFERENCES

[1] P. S. Gregory, "Knowledge Discovery in Database: 10 Years After," SIGKDD Explorations, vol. 1, 2000, pp. 59-61.
[2] H. Mannila, "Theoretical frameworks for data mining," SIGKDD Explorations Newsletter, vol. 1, 2000, pp. 30-32.
[3] B. Yang, T. Zhang, W. Song, J. Gao, "Coordinators based on cognitive psychology features and the corresponding KDD process model," Journal of University of Science and Technology of China, vol. 37, 2007, pp. 212-216.
[4] B. Yang, W. Song, Z. Xu, "New structure of Expert System based on Knowledge Discovery Innovation Technology," Science in China (Series E: Information Sciences), vol. 37, 2007, pp. 738-747.
[5] B. Yang, J. Wang, H. Sun, "A Study on Double Bases Cooperating Mechanism in KDD (Ⅱ)," Engineering Science, vol. 4, 2002, pp. 34-44.
[6] B. Yang, W. Song, Z. Xu, "Knowledge Discovery theory and application based on inner cognitive mechanism," Progress in Natural Science, vol. 16, 2006, pp. 107-115.
[7] J. Aguilar, "A Survey about Fuzzy Cognitive Maps Papers (Invited Paper)," International Journal of Computational Cognition, vol. 3, 2005, pp. 27-33.
[8] J. P. Carvalho, A. B. Tomé J, "Rule Based Fuzzy Cognitive Maps and Fuzzy Cognitive Maps - A Comparative Study," Proceedings of the 18th International Conference of the North American Fuzzy Information Processing Society, New York, USA, 1999, pp. 115-119.
[9] J. P. Carvalho, A. B. Tomé J, "Rule Based Fuzzy Cognitive Maps - Fuzzy Causal Relations," In: Mohammadian, M. (Ed.), Computational Intelligence for Modelling, Control and Automation - Evolutionary Computation and Fuzzy Logic for Intelligent Control, Knowledge Acquisition and Information Retrieval, IOS Press, pp. 276-281.
