
Global Congress on Intelligent Systems

Building Ontology Automatically Based on Bayesian Network and PART Neural Network

ZHI Xi-hu and LI Yan-fei
Academy of Information Technology, Luoyang Normal University, Luoyang, China
zhixihu@yahoo.com.cn, xhs_henu2004@sina.com

Abstract—The deployment of the semantic web depends on the rapid and efficient construction of ontologies, but traditional ontology construction is a time-consuming and costly procedure. This paper presents a novel ontology construction method based on the PART neural network and a Bayesian network. The PART architecture overcomes the lack of flexibility in clustering, while in the web page analysis, WordNet and entropy compensate for the lack of knowledge acquisition. The system then uses a Bayesian network to insert the remaining terms and complete the hierarchy of the ontology. The experimental results indicate that this method has great promise.

Keywords—ontology; Bayesian network; neural network

I. INTRODUCTION

A semantic web consists of machine-understandable documents and data. The core technology of the semantic web is an artifact called an ontology. Ontologies also play an important role in biomedical informatics and in knowledge management. Unfortunately, constructing and maintaining an ontology is a difficult task. Traditional ontology construction relies on domain experts, but it is costly, lengthy, and arguable [1]. In addition to the lack of standards, the field also lacks methods for fully automated knowledge acquisition: ontology construction is a time-consuming and costly procedure. Though current ontology construction methods can achieve a partially automated classification framework, they suffer from limitations such as the requirement for human labor and domain restrictions.

In order to overcome the above problems, this paper proposes a novel method that combines a Projective Adaptive Resonance Theory (PART) neural network with Bayesian network probability theory to construct ontologies automatically. The PART neural network considers not only the data points but also the dimensions, and can deal with the lack of flexibility in clustering. Additionally, the system uses WordNet combined with TF-IDF and entropy to acquire key terms automatically. Then, based on term frequency, a document-term matrix is constructed for the PART neural network to use in clustering the web pages. Finally, the system uses a Bayesian network to reason out the complete hierarchy of terms and to construct the final domain ontology. The system then stores the resultant ontology using the Resource Description Framework (RDF). RDF is recommended by the W3C and addresses the lack of a standard for reusing or integrating existing ontologies.

II. BAYESIAN NETWORKS

Bayesian methods provide a formalism for reasoning about partial beliefs under conditions of uncertainty [2][3]. The basic expressions in the Bayesian formalism are statements about conditional probabilities. We say two random variables X and Y are independent if P(x|y) = P(x). The variables X and Y are conditionally independent given the random variable Z if P(x|y, z) = P(x|z). By the chain rule, the joint distribution P(x1, x2, ..., xn) of variables X1, X2, ..., Xn can be represented as a product of local conditional distributions. That is,

P(x1, x2, ..., xn) = P(x1) P(x2|x1) ... P(xn|x1, ..., xn−1). (1)

A Bayesian network for a collection X1, X2, ..., Xn of random variables represents the joint probability distribution of these variables. The joint distribution, together with a set of assertions of conditional independence among the variables, can be written as

P(x1, x2, ..., xn) = ∏(i=1..n) P(xi | Dxi), (2)

where Dxi is the subset of the variables x1, ..., xi−1 on which xi depends. Hence a Bayesian network can be described as a directed acyclic graph consisting of a set of n nodes and a set of directed edges between nodes. Each node in the graph corresponds to a variable xi, and each directed edge runs from a variable in Dxi to the variable xi. If each variable has a finite set of values, then each variable xi with parents Dxi has an attached table of conditional probabilities P(xi | Dxi). For problems with many variables, the direct approach is often impractical. Nevertheless, at least when all the variables are discrete, we can exploit the conditional independences encoded in the Bayesian network to make the calculation more efficient.

III. ADAPTIVE RESONANCE THEORY NETWORK

The Adaptive Resonance Theory (ART) network is an unsupervised learning network model [4]. It obtains training examples directly from known or real input data. The principle of ART originates from the study of cognition, i.e., the human mnemonic system, which stores known information.

978-0-7695-3571-5/09 $25.00 © 2009 IEEE 563


DOI 10.1109/GCIS.2009.29
When a human memorizes new information (the plasticity of cognition), it must also maintain old memories (the stability of memory). However, confusion between new and old memories may occur. The trade-off between how much new information can be stored and how many old memories can still be maintained is controlled by a 'vigilance test'.

ART networks can be divided into two classes: ART1, which accepts only binary input, and ART2, which accepts continuous or binary input. The basic ART combines bottom-up competitive learning with top-down cluster pattern learning. These operations generate a new output node dynamically when an unfamiliar input pattern is fed in. The forward and backward processes operate until the message resonates. The operation of ART appears similar to that of the neural system of the human brain: not only does the system learn new examples, it also preserves old memory cells.

A. Projective Adaptive Resonance Theory Network (PART)

In order to deal with the feasibility-reliability dilemma in clustering high-dimensional data sets, a new neural network architecture, PART (Projective Adaptive Resonance Theory), was presented in 2002 [5]. The basic architecture of PART is similar to that of ART neural networks. The main difference between PART and ART is in the input layer. In PART, the input layer selectively sends signals to nodes in the output layer (cluster layer). The signals are determined by a similarity check between the corresponding top-down weight and the signal generated in the input layer. Hence, the similarity check plays a crucial role in the projected clustering of PART. In addition to the vigilance test, PART adds a distance test to increase the accuracy of clustering. The PART algorithm is presented below.

PART Algorithm:
0. Initialization: initialize the parameters L, ρ, σ, α, θw, θc.
   Input vectors: Dj = (F1j, F2j, ..., Fij, ..., Fnj), j = 1, 2, ..., m.
   Output nodes: Yk, k = 1, 2, ..., m. Initially, no Yk has learned any input pattern.
1. Input the patterns D1, D2, ..., Dj, ..., Dm.
2. Similarity check: hjk = h(Dj, Wjk, Wkj) = hσ(Dj, Wkj) l(Wjk).
   If hjk = 1, Dj is similar to Yk; otherwise hjk = 0 and Dj is not similar to Yk.
3. Selection of the winner node: Tk = Σ Wjk hjk = Σ Wjk h(Dj, Wjk, Wkj); the node with max{Tk} is the winner.
4. Vigilance and reset: Rk = Σ hjk < ρ. If the winner node passes the vigilance test, the input pattern is clustered into the winner node; otherwise, the input pattern is clustered into a new node.
5. Learning: update the bottom-up and top-down weights for the winner node Tk.
   If Yk has not learned any pattern before: Wjk(new) = L/(L−1+n), Wkj(new) = Dj.
   If Yk has learned some patterns before, the weights are updated from their previous values.
6. Repeat steps 1 to 5 until the number of data points in each cluster falls below the threshold θc.
7. Return the clusters.

IV. DOMAIN ONTOLOGY CONSTRUCTION BASED ON BAYESIAN NETWORK AND PART

In this study, an automatic ontology construction method based on a projective ART and a Bayesian network is presented. The PART architecture overcomes the lack of flexibility in clustering, while in the web page analysis, WordNet and entropy compensate for the lack of knowledge acquisition. The RDF format of the domain ontology will hasten the integration and reuse of existing ontologies.

A. The System Architecture of Ontology Construction

The architecture of the system, shown in Figure 1, includes six processes: web page collection, web page analysis (producing the term frequency table), term weighting, clustering into a PART tree, hierarchy establishment (producing the ontology tree), and evaluation and output. The details of the domain ontology construction are described as follows.

Figure 1. The system architecture of ontology construction.

(1) Collect web pages
The system uses Uniform Resource Locators (URLs) to collect web pages from the Google and ESPN search engines. The system then filters the collected pages and passes the filtered web pages to the web page analysis step.

(2) Analyze web pages
In this step, the proposed system first performs word segmentation according to spaces. The system adopts WordNet 2.1 to ascertain the existence of keywords and to restore keywords to their word stems. The system then calculates the TF-IDF and entropy value of each keyword and defines a TF-IDF threshold θt and an entropy threshold θe. A keyword is selected if its TF-IDF is greater than θt or its entropy value is greater than θe. These key terms become the concepts of the problem domain used to construct the domain ontology. The system mixes TF-IDF with entropy to define the weight of each key term, as shown in formula (3):

W(Ti) = (TF-IDFi + E(Ti)) / 2 (3)
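The weighting in formula (3) can be sketched as follows. This is a minimal illustration that assumes a raw-count TF with natural-log IDF and a base-2 Shannon entropy over the term's distribution across documents; the paper does not specify which TF-IDF or entropy variants it uses, so these estimators are assumptions.

```python
import math

def tf_idf(term, doc, docs):
    """TF-IDF of `term` in `doc` (a token list) relative to the corpus `docs`."""
    df = sum(1 for d in docs if term in d)
    if df == 0:
        return 0.0
    tf = doc.count(term) / len(doc)
    return tf * math.log(len(docs) / df)

def entropy(term, docs):
    """Shannon entropy (base 2) of the term's frequency distribution over documents."""
    freqs = [d.count(term) for d in docs]
    total = sum(freqs)
    if total == 0:
        return 0.0
    probs = [f / total for f in freqs if f > 0]
    return -sum(p * math.log2(p) for p in probs)

def term_weight(term, doc, docs):
    """Formula (3): the average of the TF-IDF and entropy values."""
    return (tf_idf(term, doc, docs) + entropy(term, docs)) / 2
```

A term concentrated in one document gets entropy 0, while a term spread evenly over the corpus gets the maximum entropy, so the average balances local salience against corpus-wide spread.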

Finally, the system uses formula (4) to represent each document Dj as follows:

Dj = {(T1, F1j, W(T1)), ..., (Ti, Fij, W(Ti)), ..., (Tn, Fnj, W(Tn))} (4)

(3) Cluster semantic web pages
In this step, the system uses the PART neural network to cluster the web pages. In order to obtain detailed information from the PART clustering, we add the notion of recursion to the PART architecture: each cluster result calls PART again if the number of elements in the cluster exceeds the threshold θc. The PART tree provides information about the hierarchical relations of the projective clusters; the original PART obtains only the information of four clusters after clustering according to the above calculation. The system then chooses the term with the highest weight in each cluster to represent that cluster. If that term already represents an upper cluster, the system chooses the term with the second-highest entropy value to represent the cluster. We call the recursive tree the PART tree; it displays the basic hierarchical relationships of the ontology. The PART tree is the initial structure of the ontology and will be completed by the Bayesian network.

(4) Establish the hierarchical relations
After the PART tree is built, some terms have still not been inserted into the domain ontology. We adopt Bayesian network probability theory to insert the remaining terms into the PART tree, because Bayesian networks have several advantages for data analysis [6]. First, a Bayesian network encodes the dependencies among all variables, so it can easily deal with missing data entries. Second, the network can handle causal relationships, and hence it can be used to gain understanding of a problem domain and to predict results. Third, the Bayesian network is a statistics-based technology that offers a valid and widely recognized approach for avoiding the over-fitting of data. Finally, diagnostic performance with a Bayesian network is often surprisingly insensitive to imprecision in the numerical probabilities. Owing to these advantages, we adopt the Bayesian network to complete the hierarchy of the domain ontology.

In addition, when the highest conditional probability is less than the threshold θBN, the system moves on to the next remaining term. Following the above steps, the system finishes the complete hierarchical relationship of the domain ontology.

(5) The exhibition of the ontology
RDF can be used to describe the resources of a given web page, using an RDF graph to represent a problem. RDF accentuates the exchange and automated processing of web resources designated by a Uniform Resource Identifier (URI), a string identifying a web resource or an element of XML; the description depicts the resource attributes, and the framework describes a common model of the resources that is independent of any particular resource.

In this study, an RDF structure is used to describe and store the relationships between terms and clusters. RDF can improve the effectiveness of queries and aid the integration of existing ontologies [7]. The system uses the Jena package (version 2.5.2) for the Java language to generate the RDF file and stores it on disk for ease of reuse.

B. Experiments and Discussions
In this section, we present the results of the system. We collected web pages from the plant domain to construct the ontology. Two important types of experiments were conducted: an investigation of how the number of collected web pages affects the precision of the constructed ontology, and a comparison of PART and ART in the construction of the domain ontology.

In the first experiment, we explore whether the quantity of data affects the result, since Bayesian reasoning is usually affected by the quantity of data. We divided the experiment into five stages and used Precision (C_P) and Precision (C_L_P) to evaluate the five ontology results. In the first stage, we randomly extracted 500 web pages and constructed a domain ontology. In the second stage, we randomly extracted a further 200 web pages from the remaining web pages to add to the quantity of web pages. In the final stage, we extracted all of the web pages to construct the domain ontology. After clustering, the system selects the highest term weight (TF-IDF value) of each cluster to represent the cluster and obtains the basic hierarchical concepts (the PART tree). The Bayesian network is then used to infer the relationships among the levels of the remaining terms. The system calculates the conditional probability table and sets the threshold θBN = 0.35 in order to insert the remaining terms into the PART tree. Following these steps, the system discovers the complete concept stratum. Table 1 shows the results of the experiment.

TABLE I. THE ONTOLOGY RESULTS FOR DIFFERENT QUANTITIES OF WEB PAGES

Stage                 Stage1   Stage2   Stage3   Stage4   Stage5
No. of web pages      200      455      677      990      1200
Depth of ontology     40       48       53       63       68
Breadth of ontology   16       13       13       12       12
A                     39       42       48       52       59
B                     17       10       10       9        9
C                     46       47       50       59       62
C_P                   78.4%    82.9%    87.5%    88.2%    91.4%
C_L_P                 71.3%    72.5%    79.6%    83.6%    84.4%

The five data sets are input to the system in their respective order. The detailed results of the ontology are shown in Table 1. We found the results to be suboptimal when the quantity of data is small. The system is based on probabilistic inference, and thus the result can be affected by partial data, especially with a small sample. Based on the foregoing discussion, it is clear that the more web pages are included in the data set, the higher the precision. As Table 1 shows, when the quantity of data exceeds 900, Precision (C_P) exceeds 80%. After the experiment, we suggest that Precision (C_P) will exceed 80% when the quantity of data exceeds 900. Efficiency, however, is suboptimal because the Bayesian reasoning is affected by the quantity of web pages.

The preceding experiment suggested that the projective ART achieved better results with greater quantities of data.
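The remainder-term insertion described above — attach each remaining term under the node with the highest conditional probability, provided it exceeds θBN = 0.35 — can be sketched as follows. The co-occurrence estimator of P(term | node) and the dict-of-children tree representation are illustrative assumptions, not details given in the paper.

```python
THETA_BN = 0.35  # conditional-probability threshold used in the experiment

def cond_prob(term, node, docs):
    """P(term | node), estimated (assumption) as the fraction of documents
    containing `node` that also contain `term`."""
    node_docs = [d for d in docs if node in d]
    if not node_docs:
        return 0.0
    return sum(1 for d in node_docs if term in d) / len(node_docs)

def insert_remainder_terms(tree, remainder, docs):
    """Attach each remaining term as a child of the tree node with the
    highest conditional probability, if that probability reaches THETA_BN;
    otherwise defer the term and move on to the next one."""
    deferred = []
    for term in remainder:
        best_node = max(tree, key=lambda n: cond_prob(term, n, docs))
        if cond_prob(term, best_node, docs) >= THETA_BN:
            tree[best_node].append(term)
        else:
            deferred.append(term)
    return tree, deferred
```

Here `tree` maps each existing PART-tree node to its list of children, and `docs` is the collection of term sets extracted from the web pages.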

In this experiment, we explored whether PART is superior to ART for web page clustering. The ART neural network was used to cluster the entire set of web pages (1,523 web pages), and the results were compared with those of PART. To ensure an equitable comparison, the parameter settings of ART were identical to those of PART (ρ=3, α=0.1, θ=4). After clustering, we obtain a basic ART tree and employ the Bayesian network to complete the hierarchical relationships. Figure 2 shows the detailed comparison of ART and PART.

Figure 2. The comparison of the PART and ART results.

Processing with the Bayesian network shows that these proper nouns usually have the highest inferred probability from the node plant (the root of the ontology). The result is an ontology with a wider breadth. For instance, when ART processed the node cleanup, it determined that it must be some kind of hitter, but ART could not discover the relationship. Eventually, the node cleanup was clustered by itself and became a descendant of the node plant. The domain experts judged the node to be in the wrong location, which shows that PART is superior to ART in clustering.

We adopted RDF, a standard ontology web language recommended by the W3C, to record and represent the resulting domain ontology. The system uses the Jena package to output the results in RDF format. RDF is capable of describing the resources of the World Wide Web; moreover, it helps achieve the integration and reuse of ontologies. Figure 3 shows a partial RDF serialization of the plant ontology.

<rdfs:subClassOf>
  <owl:Restriction>
    <owl:someValuesFrom>
      <owl:Class rdf:about="#DoubleCotyledon"/>
    </owl:someValuesFrom>
    <owl:onProperty>
      <owl:ObjectProperty rdf:ID="Cotyledon"/>
    </owl:onProperty>
  </owl:Restriction>
</rdfs:subClassOf>

Figure 3. A partial RDF serialization of the plant ontology.

V. CONCLUSIONS AND FUTURE WORK

In conclusion, building ontologies rapidly and correctly has become an essential task for content-based search on the Internet. In this field, ontology construction is usually done with manual or semi-automated methods, which require help from domain experts. This paper proposes a novel method combining a Projective Adaptive Resonance Theory (PART) neural network with Bayesian network probability theory to construct ontologies automatically. The PART architecture considers not only the data points but also the dimensions, and thereby overcomes the lack of flexibility in clustering. The system uses a Bayesian network to reason out the complete hierarchy of terms and to construct the final domain ontology. The experimental results indicate that this method has great promise.

In future work, we will attempt to improve the precision of term placement and to combine the method with a multi-field ontology to develop a well-rounded system.

REFERENCES

[1] Navigli, R., Velardi, P., Gangemi, A., "Ontology Learning and Its Application to Automated Terminology Translation," IEEE Intelligent Systems, vol. 18, no. 1, pp. 22-31, 2003.
[2] Denoyer, L., Gallinari, P., "Bayesian Network Model for Semi-structured Document Classification," Information Processing and Management, vol. 40, pp. 807-827, 2004.
[3] Park, Y. C., Choi, K. S., "Automatic Thesaurus Construction Using Bayesian Networks," Information Processing & Management, vol. 32, no. 5, pp. 543-553, 1996.
[4] R. J. Kuo, J. L. Liao, and C. Tu, "Integration of ART2 Neural Network and Genetic K-means Algorithm for Analyzing Web Browsing Paths in Electronic Commerce," Decision Support Systems, vol. 40, no. 2, pp. 355-374, 2005.
[5] Cao, Y., Wu, J., "Dynamics of Projective Adaptive Resonance Theory Model: The Foundation of PART Algorithm," IEEE Transactions on Neural Networks, vol. 15, no. 2, pp. 245-260, 2004.
[6] Langseth, H., Portinale, L., "Bayesian Networks in Reliability," Reliability Engineering and System Safety, vol. 92, pp. 92-108, 2007.
[7] Hjelm, J., Creating the Semantic Web with RDF, Wiley Computer Publishing, 2001.