2008 Umedia

Sheng-Yuan Yang
Lunghwa University of Science and Technology
instance of relationship. The complete PC ontology can be referenced from the Protégé Ontology Library at the Stanford Website (http://protege.stanford.edu/download/download.html). We have also developed a problem ontology to help process user queries. Fig. 3 illustrates part of the Problem ontology, which contains query type and operation type. These two concepts constitute the basic semantics of a user query and are therefore used as indices to structure the cases in OD, which in turn can provide fast case retrieval. Finally, we use Protégé's APIs (Application Program Interfaces) to develop a set of ontology services, which work as the primitive functions to support the application of the ontologies. The ontology services currently available include …

[Fig. 1 Part of PC ontology taxonomy: Hardware, CD, DVD, CDR/W, CDR, and CPU (Synonym = Central Processing Unit; D-Frequency String), connected by isa and instance-of relationships]

[Fig. 3 Part of problem ontology taxonomy: Query Type (How, What, Why, Where) and Operation Type (Adjust, Use, Setup, Close, Open, Support, Provide)]

2.2 Ontology-Supported User Query Processing

Fig. 4 illustrates two ways in which the user can enter a Chinese query through the user interface. Fig. 4(a) shows the user typing keywords directly into the input field. Fig. 4(b) shows the user composing a query with the keyword index.

[Fig. 4 User query through our User Interface: (a) keyword-based query, (b) keyword index, (c) best-matched templates for user query in natural language]

To build the query templates, we collected 1215 FAQs from the FAQ websites of the six most famous motherboard factories in Taiwan and used them as the reference materials for query template construction. To simplify the construction process, we deliberately restricted the user query to contain only one intent word and at most three sentences. The collected FAQs were analyzed and categorized into six types of queries, as shown in Table 1, which was originally developed in Chinese and is rendered in English here for ease of explanation. For each type of query, we further identified several intent types according to its operations. Finally, we defined a query pattern for each intent type, as shown in Table 1. Based upon these concepts, we can then formally define a query template, as exemplified in Table 2. We have also developed a hierarchy of intent types that organizes all FAQs in accord with the generalization relationships among the intent types, as shown in Fig. 5; it helps reduce the search scope during the retrieval of FAQs after the intent of a user query is recognized.

Table 2 Example of query template: ANA_CAN_SUPPORT
  Template_Number  76
  #Sentence        1
  Intent_Words     could, support
  Intent_Type      ANA_CAN_SUPPORT
  Query_Type       could
  Operation_Type   support
  Query_Patterns   <could S1 support S2>, <could support S1>
  Focus            S1

[Fig. 5 Intention type hierarchy: intent types such as HOW_FIX and WHAT_SUPPORT organized by generalization relationships]

3 System Architecture

Fig. 6 illustrates the architecture of our FAQ system. Ontology Base is the key component, which stores both the PC ontology and the query ontology. OD is a storage structure designed according to the ontology structure,
serving as an ontology-directed canonical format for storing FAQ information. Webpage Wrapper performs the parsing, extraction, and transformation of the Q-A pairs on each Web page into the canonical format for Ontological Database Manager (ODM) to store in OD. User Interface is responsible for singling out significant keywords from the user queries. Finally, FAQ Answerer is responsible for retrieving best-matched Q-A pairs from OD, deleting any conflicting Q-A pairs, and ranking the results according to their match degrees for the user.

3.2 Ontological Webpage Wrapping

Fig. 7 shows the structure of Webpage Wrapper. Q_A Pairs Parser removes the HTML tags, deletes unnecessary spaces, and segments the words in the Q-A pairs using MMSEG [10]. The results of MMSEG segmentation were poor, because the predefined MMSEG word corpus contains insufficient terms of the PC domain. For example, it did not know the keywords "華碩" (Asus) or "AGP4X", and returned wrong segmentations like "華" (A), "碩" (sus), "AGP", and "4X". We easily fixed this by using Ontology Base as a second word corpus to bring those mis-segmented words back. Keyword Extractor is responsible for building canonical keyword indices for FAQs. It first extracts keywords from the segmented words, applies the ontology services to check whether they are ontology terms, and then eliminates ambiguous or conflicting keywords.

[Fig. 7 Structure of Webpage Wrapper: Wrappers for Various Types of Webpage, Ontology Base, Ontological Database Manager (ODM), Ontological Database (OD)]
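The second-word-corpus repair described above can be sketched as a greedy merge over the segmenter's output. This is a minimal illustration under stated assumptions: the lexicon and helper below are hypothetical stand-ins, not the system's actual implementation.

```python
# Sketch of ontology-assisted repair of mis-segmented MMSEG output.
# ONTOLOGY_TERMS is an illustrative lexicon, not the system's real one.
ONTOLOGY_TERMS = {"華碩", "AGP4X", "CPU", "K7V"}

def repair_segmentation(tokens, lexicon=ONTOLOGY_TERMS, max_span=3):
    """Greedily merge adjacent tokens that together form a known ontology term."""
    repaired, i = [], 0
    while i < len(tokens):
        # Try the longest possible merge first (up to max_span tokens).
        for span in range(min(max_span, len(tokens) - i), 1, -1):
            candidate = "".join(tokens[i:i + span])
            if candidate in lexicon:
                repaired.append(candidate)
                i += span
                break
        else:  # no merge found: keep the token as segmented
            repaired.append(tokens[i])
            i += 1
    return repaired

print(repair_segmentation(["華", "碩", "AGP", "4X"]))
# ['華碩', 'AGP4X']
```

Trying the longest merge first mirrors MMSEG's own maximum-matching spirit: the domain lexicon simply gets a second chance at spans the general-purpose corpus split apart.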
3.3.1 Ranking method for Full Keywords Match

If only one Q-A pair can be located in OD under full keywords match, FAQ Answerer directly outputs its answer part to the user. If more than one, say N, are retrieved, it employs Eq. (1) to calculate a match score (MS) for each Q-A pair.

$MS(FAQ_i) = W_{AP} \times \frac{AP_i}{\max(AP_1, \ldots, AP_N)} + W_{SV} \times \frac{SV_i}{\max(SV_1, \ldots, SV_N)}$  (1)

where $AP_i$ is the Appearance Probability and $SV_i$ the Satisfaction Value of $FAQ_i$. The weight factors $W_{AP}$ and $W_{SV}$ are set to 0.6 and 0.4, respectively, in our experiments (detailed later). Eqs. (2) and (3), in turn, define $AP_i$.

$AP_i = \prod_{j=1}^{n} P(k_{i,j})$  (2)

$P(k_{i,j}) = \begin{cases} 1, & \text{if } k_{i,j} \in \text{user's query} \\ \frac{\#k_i}{N}, & \text{otherwise} \end{cases}$  (3)

where $k_{i,j}$ represents the $j$th keyword of $FAQ_i$ and $\#k_i$ is the number of keywords in $FAQ_i$.

We use Eq. (4) to calculate $SV_i$.

$SV_i = \frac{\sum_{m=1}^{n} USL_m \times UPL_m}{\sum_{m=1}^{n} \max(USL_1, \ldots, USL_n) \times \max(UPL_1, \ldots, UPL_n)} - (0.1 \times IA)$  (4)

where $USL_m$ represents the user satisfaction level of the $m$th feedback, which takes on one of five predefined user feedback levels, namely, highly satisfied, satisfied, normal, unsatisfied, and highly unsatisfied, with corresponding scores 5, 4, 3, 2, and 1; $UPL_m$ stands for the user proficiency level of the $m$th user feedback, which takes on one of five predefined user levels [12,14], namely, expert, senior, junior, novice, and amateur, with corresponding weights 5, 4, 3, 2, and 1; and n is the number of user feedbacks. FAQ Answerer employs IA to record how aged an FAQ is in order to track the hot topics. It increases or decreases the … junior user can give, FAQ Answerer will decrease its IA, which in turn raises its SV, signifying the anti-aging process. Continuing with the query example of Fig. 8, Table 4 illustrates the results of the full keywords match method. According to the MS values in the last column, the most probable answer for the user is FAQ23.

Table 4 Example of full keywords match results
  No  Question Keywords                   AP (%)  SV (%)  MS
  15  1GHZ, K7V, CPU, Motherboard, AGP4X  50      60      54
  21  1GHZ, K7V, CPU, HD, AGP4X           12.5    40      23.5
  23  1GHZ, K7V, CPU                      100     50      80
  29  1GHZ, K7V, CPU, Display Card        25      30      27

3.3.2 Ranking method for Partial Keywords Match

In the partial keywords match method, we calculate match scores for the retained FAQs according to Eq. (5).

$MS(FAQ_i) = W_{CV} \times \frac{CV_i}{\max(CV_1, \ldots, CV_N)} + W_{SSV} \times \frac{SSV_i}{\max(SSV_1, \ldots, SSV_N)} + W_{CR} \times \frac{CR_i}{\max(CR_1, \ldots, CR_N)} + W_{SV} \times \frac{SV_i}{\max(SV_1, \ldots, SV_N)}$  (5)

where $SV_i$ is the same as in Eq. (4) and $SSV_i$ stands for the Statistic Similarity Value of $FAQ_i$, which calculates the inner product of the two keyword vectors according to the Vector Space Model [6]. Eq. (6) defines $CV_i$ as the Compatibility Value and Eq. (7) defines $CR_i$ as the Coverage Ratio for $FAQ_i$.

$CV_i = \frac{C(T_{i,q}, T_{i,f})}{|T_{i,q}| \times |T_{i,f}|}$, with $C(T_{i,q}, T_{i,f}) = \sum_{q_k \in T_{i,q},\, f_j \in T_{i,f}} c(q_k, f_j)$ and $c(q_k, f_j) = \begin{cases} 1, & q_k \text{ compatible with } f_j \\ 0, & \text{else} \end{cases}$  (6)

where $T_{i,q}$ contains the unmatched keywords in the user query, while $T_{i,f}$ contains the unmatched keywords in $FAQ_i$. Function $c(q_k, f_j)$ checks for compatibility and is supported by the ontology services, which check whether the two keywords are related by conflicting constraints. If yes, it returns 0; otherwise, it returns 1.

$CR_i = \frac{\sum_{q_k \in K_{i,q},\, f_j \in K_{i,f}} E(q_k, f_j)}{|K_{i,f}|}$, with $E(q_k, f_j) = \begin{cases} 1, & \text{if } q_k = f_j \\ 0, & \text{else} \end{cases}$  (7)

where $K_{i,f}$ contains the keywords in $FAQ_i$. Function $E(q_k, f_j)$ checks for syntactical equality between keywords $q_k$ and $f_j$.

4 System Evaluation

We have done a series of experiments to evaluate how the system performs under the support of ontology. The first experiment is to learn how well the ontology supports keywords trimming and conflict resolution. We randomly selected 100 FAQs from OD, extracted proper query keywords from their question parts, and randomly combined the keywords into a set of 45 queries, which were used to simulate real user queries in our experiments. Table 5(a) illustrates that the precision rate with keyword trimming is far better than that without trimming under every match score threshold. Note that the domain experts decide whether a retrieved FAQ is relevant. Table 5(b) illustrates the results with ontology-supported conflict resolution, where we achieve a 5 to 20% improvement in precision rate compared with non-conflict detection under different thresholds.

Table 5 Ontology-supported performance experiments
(a) Results of keywords trimming
  Match Score Threshold         0             0.2           0.4           0.6
                           Without  With  Without With  Without With  Without With
  Relevant #FAQ Retrieved    41      53     29     51    23      45    20      32
(b) Results of conflict resolution
  Match Score Threshold    0    0.2    0.4    0.6
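The full-keywords-match scoring of Eq. (1) amounts to a weighted sum of max-normalized AP and SV values over the N retrieved Q-A pairs. A minimal sketch, using the paper's weights of 0.6 and 0.4 but illustrative AP/SV inputs rather than the measured ones of Table 4:

```python
# Sketch of the match score of Eq. (1): normalize each FAQ's Appearance
# Probability (AP) and Satisfaction Value (SV) by the maximum over the
# N retrieved Q-A pairs, then combine with weights W_AP and W_SV.
W_AP, W_SV = 0.6, 0.4  # weights used in the paper's experiments

def match_scores(faqs):
    """faqs: list of (ap, sv) pairs for the N retrieved Q-A pairs."""
    max_ap = max(ap for ap, _ in faqs)
    max_sv = max(sv for _, sv in faqs)
    return [W_AP * ap / max_ap + W_SV * sv / max_sv for ap, sv in faqs]

# Illustrative values: the FAQ that tops both AP and SV gets the
# maximum possible score of 1.0 and ranks first.
scores = match_scores([(1.0, 4.0), (0.5, 2.0), (0.25, 3.0)])
print([round(s, 3) for s in scores])  # [1.0, 0.5, 0.45]
```

The max-normalization makes the two measures commensurate before weighting, so neither AP nor SV dominates merely because of its scale.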
To investigate how the weights behave in our ranking methods and to find a set of adequate weights for our system, we have carried out an experiment on how the weights should be chosen with respect to user satisfaction. First, we need a measure to evaluate ranking results. Assume n documents are in the system, $(S_1, S_2, \ldots, S_n)$ represents the ranking sequence by the system, and $(E_1, E_2, \ldots, E_n)$ represents the ranking sequence by the domain expert. Eq. (8) defines $R_i$ as the importance degree of the $i$th document in a given sequence of ranks. For convenience of discussion, this importance degree is dubbed $SR_i$ if the ranking sequence is from the system and $ER_i$ if the ranking sequence is from the expert.

$R_i = 10\exp\left(\frac{n - i + 1}{n}\right)$  (8)

We then define the ranking gain G by Eq. (9) to measure how consistent the two ranking sequences are. A higher G implies better ranking consistency between the domain expert and the system. Note that this calculation is a special form of vector similarity and reflects the degree of similarity between the two ranking sequences.

$G = \sum_{i=1}^{n} SR_i \times ER_i$  (9)

Finally, to support comparison among various factors, we pre-calculate the best ranking gain $G_{max}$ and the worst ranking gain $G_{min}$ and define a normalized ranking gain $G_{normal}$ by Eq. (10). Note that the best ranking gain occurs when the system ranking sequence conforms to the expert sequence, as shown in Table 6.

$G_{normal} = \frac{G - G_{min}}{G_{max} - G_{min}} \times 100\%$  (10)

Table 6 Experiments on weights
(a) Results of full keywords match
  No  W_AP  W_SV  G_normal (%)
  1   0     1     70.24
  2   0.2   0.8   85.32
  3   0.4   0.6   90
  4   0.6   0.4   90
  5   0.8   0.2   90
  6   1     0     86.83
(b) Results of partial keywords match
  No  W_SSV  W_CV  W_CR  W_SV  G_normal (%)
  1   0.2    0.2   0.2   0.4   81.85
  2   0.2    0.2   0.4   0.2   79.64
  3   0.2    0.4   0.2   0.2   74.91
  4   0.4    0.2   0.2   0.2   81.99
  5   0.2    0.2   0.3   0.3   80.99
  6   0.2    0.3   0.3   0.2   77.83
  7   0.3    0.3   0.2   0.2   77.83
  8   0.3    0.2   0.3   0.2   81.99
  9   0.3    0.2   0.2   0.3   82.15
  10  0.2    0.3   0.2   0.3   78.04
  11  1      0     0     0     81.68
  12  0      1     0     0     48.77
  13  0      0     1     0     72.69
  14  0      0     0     1     81.79

5 Related Works and Comparisons

Ranking mechanism is an important technique for web-based information systems. For example, FAQFinder [3] is a Web-based natural language question-answering system. It applies natural language techniques to organize FAQ files and answers user questions by retrieving similar FAQ questions using term vector similarity, coverage, semantic similarity, and question type similarity as four metrics, each weighted by 0.25. Sneiders [8] proposed to analyze FAQs in the database long before any user queries are submitted, in order to associate with each FAQ four categories of keywords, namely required, optional, irrelevant, and forbidden, to support retrieval. In this way, the work of FAQ retrieval is reduced to simple keyword matching without inference. Our system differs from these two systems in two ways. First, we employ an ontology-supported, template-based natural language processing technique to support both FAQ analysis for storage in OD, in order to provide solutions with better semantics, and user query processing, in order to better understand user intent. Second, we improve the ranking methods by proposing a different set of metrics for different match mechanisms. In addition, Ding and Chi [1] proposed a ranking model to measure the relevance of a whole website rather than merely a web page. Its generalized feature, supported by both the score propagation and site ranking functions, provides another level of calculation in the ranking mechanism and deserves more attention. Finally, Massa et al. [4] suggested extending the web language so that semantic links can be expressed, in order to discriminate between sites that are highly linked and sites that are highly trusted. Although this algorithm provides an objective global estimate of webpage importance, it might not be targeted to specific user preferences. Srour et al. [9] presented a novel approach for personalizing search engine results based on the user's taste and preferences, which also deserves further investigation.

6 Conclusions

We have described an ontological FAQ system in the Personal Computer domain, which employs ontology as the key technique to pre-process FAQs and to process user queries in ubiquitous information environments. It is also equipped with an enhanced ranking technique to present retrieved, query-relevant results. Basically, the system relies on the wrapper technique to help clean, retrieve, and transform FAQ information collected from a heterogeneous environment and stores it in an ontological database. During the retrieval of FAQs, the system trims irrelevant query keywords, employs either full keywords match or partial keywords match to retrieve FAQs, and removes conflicting FAQs before returning the final results to the user. Ontology plays the key role in all the above activities. To produce a more effective presentation of the search results, the system employs an enhanced ranking technique, which includes Appearance Probability, Satisfaction Value, Compatibility Value, and Statistic Similarity Value as four measures, properly weighted, to rank the FAQs. Our experiments show the system does improve the precision rate and produces better ranking results. The proposed FAQ system manifests the following interesting features. First, the ontology-supported FAQ extraction from webpages can clean FAQ information by removing redundant data, restoring missing data, and resolving inconsistent data. Second, the FAQs are stored in an ontology-directed internal format, which supports semantics-constrained retrieval of FAQs. Third, the ontology-supported natural language processing of user queries helps pinpoint the user's intent. Finally, the partial
keywords match-based ranking method helps present user-most-wanted, conflict-free FAQ solutions to the user.

Acknowledgements

The author would like to thank Ying-Hao Chiu, Yai-Hui Chang, and Fang-Chen Chuang for their assistance in system implementation. This work was supported by the National Science Council under Grants NSC-89-2213-E-011-059, NSC-89-2218-E-011-014, and NSC-95-2221-E-129-019.

References

[1] C. Ding and C.H. Chi, "A Generalized Site Ranking Model for Web IR," Proc. of the IEEE/WIC International Conference on Web Intelligence, Halifax, Canada, 2003, pp. 584-587.
[2] J. Lin and B. Katz, "Question Answering from the Web Using Knowledge Annotation and Knowledge Mining Techniques," Proc. of the 12th International Conference on Information and Knowledge Management, New Orleans, LA, 2003, pp. 116-123.
[3] S. Lytinen and N. Tomuro, "The Use of Question Types to Match Questions in FAQFinder," AAAI Spring Symposium on Mining Answers from Texts and Knowledge Bases, Stanford, CA, USA, 2002, pp. 46-53.
[4] P. Massa and C. Hayes, "Page-rerank: Using Trusted Links to Re-rank Authority," Technical Report, ITC/iRST, Trento, Italy, 2005.
[5] N.F. Noy and D.L. McGuinness, "Ontology Development 101: A Guide to Creating Your First Ontology," Available at http://www.ksl.stanford.edu/people/dlm/papers/ontology-tutorial-noy-mcguinness.pdf, 2000.
[6] G. Salton and M.J. McGill, Introduction to Modern Information Retrieval, McGraw-Hill Book Company, New York, USA, 1983.
[7] F. Scarselli, S.L. Yong, M. Gori, M. Hagenbuchner, A.C. Tsoi, and M. Maggini, "Graph Neural Networks for Ranking Web Pages," Proc. of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence, Siena, Italy, 2005, pp. 666-672.
[8] E. Sneiders, "Automated FAQ Answering: Continued Experience with Shallow Language Understanding," Question Answering Systems, AAAI Fall Symposium Technical Report FS-99-02, 1999.
[9] L. Srour, A. Kayssi, and A. Chehab, "Personalized Web Page Ranking Using Trust and Similarity," IEEE/ACS International Conference on Computer Systems and Applications, Amman, Jordan, 2007, pp. 454-457.
[10] C.H. Tsai, "MMSEG: A Word Identification System for Mandarin Chinese Text Based on Two Variants of the Maximum Matching Algorithm," Available at http://technology.chtsai.org/mmseg/, 2000.
[11] H.L. Van and A. Trentini, "FAQshare: A Frequently Asked Questions Voting System as a Collaboration and Evaluation Tool in Teaching Activities," Proc. of the 14th International Conference on Software Engineering and Knowledge Engineering, Ischia, Italy, 2002, pp. 557-560.
[12] S.Y. Yang and C.S. Ho, "Ontology-Supported User Models for Interface Agents," Proc. of the 4th Conference on Artificial Intelligence and Applications, 1999, pp. 248-253.
[13] S.Y. Yang, Y.H. Chiu, and C.S. Ho, "Ontology-Supported and Query Template-Based User Modeling Techniques for Interface Agents," The 12th National Conference on Fuzzy Theory and Its Applications, I-Lan, 2004, pp. 181-186.
[14] S.Y. Yang, "FAQ-master: A New Intelligent Web Information Aggregation System," International Academic Conference 2006 Special Session on Artificial Intelligence Theory and Application, Tao-Yuan, 2006, pp. 2-12.
[15] S.Y. Yang, F.C. Chuang, and C.S. Ho, "Ontology-Supported FAQ Processing and Ranking Techniques," International Journal of Intelligent Information Systems, Vol. 28, No. 3, 2007, pp. 233-251.
[16] W. Winiwarter, "Adaptive Natural Language Interface to FAQ Knowledge Bases," International Journal on Data and Knowledge Engineering, Vol. 35, 2000, pp. 181-199.