

Developing an Ontological FAQ System with FAQ Processing and Ranking
Techniques for Ubiquitous Services
Sheng-Yuan Yang
Dept. of Computer and Communication Engineering, St. John’s University
E-mail: ysy@mail.sju.edu.tw

Abstract

This paper proposes an ontological FAQ system for the Personal Computer domain, which employs ontology as the key technique to pre-process FAQs and to process user queries in ubiquitous information environments. It is also equipped with an enhanced ranking technique to present retrieved, query-relevant results. Basically, the system relies on the wrapper technique to help clean, retrieve, and transform FAQ information collected from a heterogeneous environment, and stores the information in an ontological database. Our experiments show that the system improves the precision rate and produces better ranking results.
key technique to help pre-process FAQs and process the
1 Introduction

The increasing popularity of the World Wide Web has been attracting more and more people to visit FAQ (Frequently Asked Question) sites for answers to their questions in ubiquitous information environments. Most FAQ sites, however, provide no effective mechanisms to assist the user, who either has to pan through a long list of FAQs or has to rely on rudimentary keyword search to find relevant questions and answers. Researchers in information retrieval have been developing various techniques to retrieve truly relevant information in accord with the real intent of user queries [3]. For example, Winiwarter [16] proposes an adaptive natural language interface architecture to access FAQ knowledge bases. Without some background on what the user may ask, however, it is in general very hard to capture true user intent by natural language processing alone. As to ranking techniques, how well can a ranking technique tackle both user interest and semantic correctness? FAQshare [11] works along this track by employing interest and correctness as two metrics to rank FAQs. Unfortunately, it proposes no techniques to calculate the metrics; instead, it relies on users to evaluate them. The underlying factors that may affect these two metrics are thus unable to surface, making the effectiveness of the results dubious. Knowledge annotation techniques require careful analysis of FAQ classes so that effective schemata can be developed and user queries can be compared with the schemata to determine how to process them, as proposed in Aranea [2]. This approach invites the sampling problem encountered in any statistical classification domain: unless one is sure that the FAQs used to develop the schemata are truly representative of the domain, the approach tends to return no answers. A further deficit is that it is hard for this approach to explain why an answer is not given. Finally, Scarselli et al. [7] apply an artificial neural network model, capable of processing general types of graph-structured data, to the computation of customised page ranks in the World Wide Web. The class of customised page ranks that can be implemented in this way is very general, and implementation is easy because the neural network model is learned from examples. The open problems of this approach are how to handle more complex learning tasks, for example, what contradicting and weighting constraints are involved in the training of graph neural networks, how to collect the classical user examples used in the training set, etc.

In short, we need a technique that has a general idea of how the user is going to ask and what he may be asking in order to provide better FAQ service. This paper introduces an ontological FAQ system for the Personal Computer (PC) domain, which employs ontology as the key technique to help pre-process FAQs and process user requests, along with an enhanced ranking technique to present the search results in ubiquitous information environments. Specifically, the system relies on the wrapper technique to help clean, retrieve, and transform FAQ information collected from the Web, and stores it in an ontological database (OD). During the retrieval of FAQs, the system trims irrelevant query keywords, employs either full keywords or partial keywords to retrieve FAQs, and removes conflicting FAQs before returning the final results to the user. Ontology plays the key role in all the above activities. To produce a more effective presentation of the search results, the system employs an enhanced ranking technique, which combines Appearance Probability, Satisfaction Value, Compatibility Value, and Statistic Similarity Value as four properly weighted measures to rank the FAQs. The system is also equipped with an aging and anti-aging mechanism driven by user feedback to properly track hot topics. Our experiments show that the system improves the precision rate and produces better ranking results.

2 Domain Ontology

2.1 Fundamental Semantics and Services

The key background knowledge of the system is the domain ontology about PCs, which was originally developed in Chinese using Protégé 2000 [5] but is rendered in English here for ease of explanation. Fig. 1 shows part of the ontology taxonomy. The taxonomy represents relevant PC concepts as classes and their relationships as isa links, which allow inheritance of features from parent classes to child classes. Fig. 2 exemplifies the detailed ontology for the concept CPU. In the figure, the uppermost node uses various fields to define the semantics of the CPU class, each field representing an attribute of "CPU", e.g., interface, provider, synonym, etc. The nodes at the lower level represent various CPU instances, which capture real-world data. The arrow line labeled "io" denotes the
instance-of relationship. The complete PC ontology can be referenced from the Protégé Ontology Library at the Stanford website (http://protege.stanford.edu/download/download.html).

We have also developed a problem ontology to help process user queries. Fig. 3 illustrates part of the problem ontology, which contains query types and operation types. These two concepts constitute the basic semantics of a user query and are therefore used as indices to structure the cases in OD, which in turn provides fast case retrieval. Finally, we use Protégé's APIs (Application Program Interfaces) to develop a set of ontology services, which work as the primitive functions supporting the application of the ontologies. The ontology services currently available include transforming query terms into canonical ontology terms, finding definitions of specific terms in the ontology, finding relationships among terms, finding compatible and/or conflicting terms against a specific term, etc. (a minimal sketch of these services is given after Fig. 1).

Fig. 1 Part of PC ontology taxonomy [figure: an isa hierarchy rooted at Hardware, branching into Interface Card, Power Equipment, Storage Media, Memory, and Case, and further into Network Chip, Sound Card, Display Card, SCSI Card, Network Card, Power Supply, Main Memory, UPS, ROM, Optical (CD, DVD, CDR/W, CDR), and ZIP]
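To make the ontology services concrete, the following is a minimal Python sketch of the three most-used services (term canonicalization, definition lookup, and conflict checking). The dictionary-based ontology and every name in it are illustrative assumptions; the actual system implements these services on top of Protégé's APIs.

SYNONYMS = {"central processing unit": "CPU", "athlon 1.33g": "THUNDERBIRD 1.33G"}
FEATURES = {
    "THUNDERBIRD 1.33G": {"interface": "Socket A", "factory": "AMD"},
    "PENTIUM 4 2.0AGHZ": {"interface": "Socket 478", "factory": "Intel"},
}

def canonicalize(term):
    """Transform a query term into its canonical ontology term."""
    return SYNONYMS.get(term.lower(), term)

def definition(term):
    """Find the attribute/value definition of a specific ontology term."""
    return FEATURES.get(canonicalize(term), {})

def conflicting(term_a, term_b):
    """Report a conflict when two terms carry different values for the same
    attribute, e.g. a Socket A CPU versus a Socket 478 CPU."""
    a, b = definition(term_a), definition(term_b)
    return any(attr in b and b[attr] != value for attr, value in a.items())

print(canonicalize("Central Processing Unit"))                  # -> CPU
print(conflicting("THUNDERBIRD 1.33G", "PENTIUM 4 2.0AGHZ"))    # -> True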
Fig. 2 Ontology for the concept of CPU [figure: the uppermost CPU node defines fields such as Synonym = Central Processing Unit, D-Frequency, Interface, L1 Cache, and Abbr.; instance nodes linked by "io" include XEON, THUNDERBIRD 1.33G, DURON 1.2G, PENTIUM 4 2.0AGHZ, PENTIUM 4 1.8AGHZ, CELERON 1.0G, and PENTIUM 4 2.53AGHZ, each carrying attribute values such as Factory, Interface, L1 Cache, Clock, Synonym, and Abbr.]

Fig. 3 Part of problem ontology taxonomy [figure: Query branches via isa links into Operation Type, with instances Adjust, Use, Setup, Close, Open, Support, and Provide, and Query Type, with instances How, What, Why, and Where]
2.2 Ontology-Supported User Query Processing

Fig. 4 illustrates two ways in which the user can enter a Chinese query through the user interface. Fig. 4(a) shows the traditional keyword-based method, enhanced by the ontology features illustrated in the left column: the user can directly click on the ontology terms to select them into the input field. Fig. 4(b) shows the user entering a query in natural language. In this case, Interface Agent first employs MMSEG [10] to perform word segmentation, then applies the template matching technique [13] to select the best-matched query templates, as shown in Fig. 4(c), and finally trims any irrelevant keywords in accord with the templates [15]; a sketch of this matching-and-trimming step is given after Fig. 5.

Fig. 4 User query through our User Interface [figure: (a) user query in keywords, assisted by a keyword index of query types and operation types; (b) user query in natural language; (c) best-matched templates for a user query in natural language]

To build the query templates, we collected 1215 FAQs from the FAQ websites of the six most famous motherboard factories in Taiwan and used them as the reference materials for query template construction. To simplify the construction process, we deliberately restricted a user query to contain only one intent word and at most three sentences. The collected FAQs were analyzed and categorized into six types of queries, as shown in Table 1, which was originally developed in Chinese and is rendered in English here for ease of explanation. For each type of query, we further identified several intent types according to its operations. Finally, we defined a query pattern for each intent type, as shown in Table 1. Based upon these concepts we can then formally define a query template; Table 2 shows an example. We have also developed a hierarchy of intent types to organize all FAQs in accord with the generalization relationships among the intent types, as shown in Fig. 5, which helps reduce the search scope during the retrieval of FAQs once the intent of a user query is recognized.

Table 1 Question types and examples of query patterns

Query Type  Operation Type  Intent Type       Query Pattern                Example
Could       Support         ANA_CAN_SUPPORT   <could S1 support S2>        Could the GA-7VRX motherboard support the KINGMAX DDR-400 memory type?
How         Setup           HOW_SETUP         <how S1><setup S2>           How to setup the 8RDA sound driver on a Windows 98SE platform?
What        Is              WHAT_IS           <what is S1>                 What is an AUX power connector?
When        Support         WHEN_SUPPORT      <when S1 support S2>         When can the P4T support the 32-bit 512 MB RDRAM memory specification?
Where       Download        WHERE_DOWNLOAD    <where can download S2><S1>  Where can I download the sound driver of CUA whose Driver CD was lost?
Why         Print           WHY_PRINT         <why cannot print S2>[S1]    Why can I not print after coming back from dormancy on a Win ME platform?

Table 2 Example of query template: ANA_CAN_SUPPORT

Template_Number   76
#Sentence         1
Intent_Words      could, support
Intent_Type       ANA_CAN_SUPPORT
Query_Type        could
Operation_Type    support
Query_Patterns    <could S1 support S2>, <could support S1>
Focus             S1

Fig. 5 Intention type hierarchy [figure: intent types grouped under their query types, e.g. COULD covers ANA_CAN_SUPPLY, ANA_CAN_SET, ...; HOW covers HOW_SET, HOW_SOLVE, HOW_FIX, ...; WHAT covers WHAT_SUPPORT, WHAT_SETUP, ...; WHEN covers WHEN_SUPPORT; WHERE covers WHERE_DOWNLOAD, WHERE_OBTAIN, ...; WHY covers WHY_USE, WHY_EXPLAIN, WHY_OFF, ...]
3 System Architecture

Fig. 6 illustrates the architecture of our FAQ system. Ontology Base is the key component, which stores both the PC ontology and the query ontology. OD is a storage structure designed according to the ontology structure,

serving as an ontology-directed canonical format for storing FAQ information. Webpage Wrapper performs parsing, extracting, and transforming of the Q-A pairs on each Web page into the canonical format for the Ontological Database Manager (ODM) to store in OD. User Interface is responsible for singling out significant keywords from user queries. Finally, FAQ Answerer is responsible for retrieving the best-matched Q-A pairs from OD, deleting any conflicting Q-A pairs, and ranking the results according to their match degrees for the user.

Fig. 6 System architecture [figure: Webpage Wrapper, containing wrappers for various types of webpages, feeds the Ontological Database Manager (ODM), which stores FAQs in the Ontological Database (OD) with the support of Ontology Base; FAQ Answerer serves the user through User Interface]
3.1 Ontological FAQ Storage

The FAQs stored in OD come from the FAQ website (FAQs in Chinese) of a famous motherboard manufacturer in Taiwan (http://www.asus.com.tw). Since these FAQs are already correctly categorized, they are directly used in our experiments. We pre-analyzed all FAQs and divided them into six question types, namely, "which", "where", "what", "why", "how", and "could". These types are used as the table names in OD. Take the "what" table as an example, shown in Table 3, which contains a field "Operation type" to represent the query intent. Other important fields in the structure include "Segmented words of query" and "Segmented words of answer", to record the word segmentation results produced by MMSEG; "Query keywords" and "Answer keywords", to record, respectively, the stemmed query and answer keywords produced by the Webpage Wrapper; and "Number of feedbacks", "Date of feedbacks", and "Aging count", to support the aging and anti-aging mechanism. Still other fields record statistical information that helps speed up system performance, including "Number of query keywords", "Appearance frequency of query keywords", "Number of answer keywords", and "Total satisfaction degree". Finally, some fields store auxiliary information that helps trace back to the original FAQs, including "Original query", "Original answers", and "FAQ URL".

Table 3 Structure of the "what" table

No.
Operation type
Original query
Segmented words of query
Query keywords
Number of query keywords
Appearance frequency of query keywords
Original answers
Segmented words of answer
Answer keywords
Number of answer keywords
FAQ URL
Total satisfaction degree
Number of feedbacks
Date of feedbacks
Aging count
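As an illustration, the "what" table of Table 3 can be sketched as the following SQLite DDL. The paper does not give column types, so the types and column names below are assumptions that merely mirror the field list.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE what (
        no                   INTEGER PRIMARY KEY,
        operation_type       TEXT,     -- query intent, e.g. 'support'
        original_query       TEXT,
        segmented_query      TEXT,     -- MMSEG output for the question part
        query_keywords       TEXT,     -- canonical keywords from the wrapper
        num_query_keywords   INTEGER,
        keyword_frequency    TEXT,     -- appearance frequency of query keywords
        original_answers     TEXT,
        segmented_answer     TEXT,
        answer_keywords      TEXT,
        num_answer_keywords  INTEGER,
        faq_url              TEXT,
        total_satisfaction   REAL,
        num_feedbacks        INTEGER,
        date_of_feedbacks    TEXT,
        aging_count          INTEGER DEFAULT 0   -- the IA of Eq. (4)
    )""")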
3.2 Ontological Webpage Wrapping

Fig. 7 shows the structure of Webpage Wrapper. Q_A Pairs Parser removes the HTML tags, deletes unnecessary spaces, and segments the words in the Q-A pairs using MMSEG. The raw results of MMSEG segmentation were poor, because the predefined MMSEG word corpus contains insufficient terms of the PC domain. For example, it did not know the keywords "華碩" (Asus) or "AGP4X", and returned wrong word segmentations like "華", "碩", "AGP", and "4X". We easily fixed this by using Ontology Base as a second word corpus to bring those mis-segmented words back (a sketch follows below). Keyword Extractor is responsible for building canonical keyword indices for FAQs. It first extracts keywords from the segmented words, applies the ontology services to check whether they are ontology terms, and then eliminates ambiguous or conflicting terms accordingly. The ontology techniques used here include employing ontology synonyms to delete redundant data, utilizing the features of ontology concepts to restore missing data, and exploiting the value constraints of ontology concepts to resolve inconsistency. It then treats the remaining, consistent keywords as canonical keywords and makes them the indices for OD. Finally, Structure Transformer calculates the statistical information associated with the canonical ontological keywords and stores it in the proper database tables according to their query types.

Fig. 7 Structure of Webpage Wrapper [figure: Webpage -> Q_A Pairs Parser -> pre-processed Q-A pairs -> Keyword Extractor -> transformed keywords -> Structure Transformer -> FAQ stored in OD]
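A minimal sketch of the segmentation repair just described: adjacent tokens are re-merged whenever their concatenation is a known ontology term. The greedy longest-match strategy and the small term set are our assumptions.

ONTOLOGY_TERMS = {"華碩", "AGP4X", "K7V", "CPU"}

def repair_segmentation(tokens):
    """Re-merge tokens that MMSEG split because its generic corpus lacks
    PC-domain terms, using the ontology as a second word corpus."""
    out, i = [], 0
    while i < len(tokens):
        # Greedily try the longest merge of upcoming tokens first.
        for j in range(len(tokens), i, -1):
            merged = "".join(tokens[i:j])
            if j - i > 1 and merged in ONTOLOGY_TERMS:
                out.append(merged)
                i = j
                break
        else:
            out.append(tokens[i])
            i += 1
    return out

print(repair_segmentation(["華", "碩", "AGP", "4X"]))   # -> ['華碩', 'AGP4X']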
3.3 Ontological FAQ Retrieval

Given a user query, ODM performs the retrieval of best-matched Q-A pairs from OD, the deletion of any conflicting Q-A pairs, and the ranking of the results according to their match degrees for the user. First, Fig. 8 shows the SQL statement transformed from a user query. Here the "Where" clause contains all the keywords of the query; this is called the full keywords match method. In this method, the agent retrieves from OD, as candidate outputs, only those Q-A pairs whose question part contains all the user query keywords. If no Q-A pair can be located, the agent then turns to a partial keywords match method to find solutions. In this method, we select the best half of the query keywords according to their TFIDF (Term Frequency - Inverse Document Frequency) values and use them to retrieve a set of FAQs from OD. We then check the retrieved FAQs for any conflict with the user query keywords by submitting the unmatched keywords to the ontology services, which check for semantic conflicts. Only those FAQs that the ontology proves consistent with the user intention are retained for ranking. We finally apply different ranking methods to rank the retrieval results, according to whether full keywords match or partial keywords match is applied.

Select *
From COULD
Where operation = 'support' AND Question keywords like '%1GHZ%K7V%CPU%'

Fig. 8 Example of transformed SQL statement
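A sketch of the two-stage retrieval follows: build_like_clause mirrors the SQL of Fig. 8 (with the table and operation value hard-coded as in the figure), and select_partial_keywords keeps the best half of the keywords by TFIDF. The TFIDF bookkeeping (term and document frequencies) is assumed to come from the statistics fields of OD.

import math

def build_like_clause(keywords):
    """Full keywords match: the question part must contain every keyword."""
    pattern = "%" + "%".join(keywords) + "%"
    return ("SELECT * FROM COULD WHERE operation = 'support' "
            f"AND query_keywords LIKE '{pattern}'")

def select_partial_keywords(keywords, tf, df, n_docs):
    """Partial match fallback: keep the best half of the keywords by TFIDF."""
    weight = lambda k: tf.get(k, 0) * math.log(n_docs / (1 + df.get(k, 0)))
    ranked = sorted(keywords, key=weight, reverse=True)
    return ranked[: max(1, len(ranked) // 2)]

print(build_like_clause(["1GHZ", "K7V", "CPU"]))
# SELECT * FROM COULD WHERE operation = 'support'
#   AND query_keywords LIKE '%1GHZ%K7V%CPU%'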

3.3.1 Ranking method for Full Keywords Match

If only one Q-A pair can be located in OD under full keywords match, FAQ Answerer directly outputs its answer part to the user. If more than one, say N, are retrieved, it employs Eq. (1) to calculate a match score (MS) for each Q-A pair:

MS(FAQ_i) = W_{AP} \times \frac{AP_i}{\max(AP_1, \ldots, AP_N)} + W_{SV} \times \frac{SV_i}{\max(SV_1, \ldots, SV_N)}    (1)

where AP_i is the Appearance Probability and SV_i the Satisfaction Value of FAQ_i. The weight factors W_{AP} and W_{SV} are set to 0.6 and 0.4, respectively, in our experiments (detailed later). Eqs. (2) and (3), in turn, define AP_i:

AP_i = \prod_{j=1}^{n} P(k_{i,j})    (2)

P(k_{i,j}) = \begin{cases} 1, & \text{if } k_{i,j} \in \text{user's query} \\ \#k_i / N, & \text{otherwise} \end{cases}    (3)

where k_{i,j} represents the jth keyword of FAQ_i and \#k_i is the number of keywords in FAQ_i.
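A sketch of Eqs. (2)-(3) follows. Because the printed form of Eq. (3) is hard to recover exactly from the typesetting, the penalty for a keyword that does not occur in the user query is left as a parameter (for example, the keyword's appearance frequency kept in OD); the numeric penalty used below is purely illustrative.

import math

def appearance_probability(faq_keywords, query_keywords, penalty):
    """AP_i per Eq. (2): the product over FAQ_i's keywords of P(k), where
    P(k) = 1 for keywords occurring in the user query and penalty(k) < 1
    otherwise (Eq. (3))."""
    return math.prod(1.0 if k in query_keywords else penalty(k)
                     for k in faq_keywords)

# With the Fig. 8 query {1GHZ, K7V, CPU} and an illustrative penalty of 0.5
# per unmatched keyword, FAQ 29 of Table 4 scores 0.5.
print(appearance_probability(["1GHZ", "K7V", "CPU", "Display Card"],
                             {"1GHZ", "K7V", "CPU"}, lambda k: 0.5))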
We use Eq. (4) to calculate SV_i:

SV_i = \frac{\sum_{m=1}^{n} USL_m \times UPL_m}{\sum_{m=1}^{n} \max(USL_1, \ldots, USL_n) \times \max(UPL_1, \ldots, UPL_n)} - (0.1 \times IA)    (4)

where USL_m represents the user satisfaction level of the mth feedback, which takes on one of five predefined user feedback levels, namely, highly satisfied, satisfied, normal, unsatisfied, and highly unsatisfied, with corresponding scores 5, 4, 3, 2, and 1; UPL_m stands for the user proficiency level of the mth user feedback, which takes on one of five predefined user levels [12,14], namely, expert, senior, junior, novice, and amateur, with corresponding weights 5, 4, 3, 2, and 1; n is the total number of feedbacks to FAQ_i; and IA stands for the Aging Index, with an initial value of zero. Note that FAQ Answerer employs IA to record how aged an FAQ is in order to track hot topics. It increases or decreases the IA of an FAQ according to user feedback. Thus, if an FAQ receives no feedback over seven days, FAQ Answerer increases its IA, signifying the aging process. On the other hand, if an FAQ's USL multiplied by the user's UPL is larger than 9, implying the user's satisfaction is better than a junior user can give, FAQ Answerer decreases its IA, which in turn raises its SV, signifying the anti-aging process.
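The aging and anti-aging bookkeeping can be sketched as follows. The 7-day window and the USL*UPL > 9 threshold come from the text; the unit step size per update is our assumption.

from datetime import date, timedelta

def update_aging(ia, last_feedback, today, usl=None, upl=None):
    """One bookkeeping pass over the Aging Index IA of an FAQ."""
    if usl is not None and upl is not None and usl * upl > 9:
        return ia - 1                      # anti-aging on strong feedback
    if today - last_feedback > timedelta(days=7):
        return ia + 1                      # aging: no feedback for over a week
    return ia

print(update_aging(0, date(2008, 1, 1), date(2008, 1, 10)))        # -> 1
print(update_aging(1, date(2008, 1, 1), date(2008, 1, 2), 5, 4))   # -> 0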
Continuing with the query example of Fig. 8, Table 4 illustrates the results of the full keywords match method. According to the MS values of the last column, the most probable answer for the user is FAQ 23.

Table 4 Example of full keywords match results

No.  Question keywords                     AP (%)  SV (%)  MS
15   1GHZ, K7V, CPU, Motherboard, AGP4X    50      60      54
21   1GHZ, K7V, CPU, HD, AGP4X             12.5    40      23.5
23   1GHZ, K7V, CPU                        100     50      80
29   1GHZ, K7V, CPU, Display Card          25      30      27
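A sketch of Eq. (1) over the Table 4 data follows. Note that weighting the raw percentage values directly (0.6*AP + 0.4*SV) reproduces the printed MS column (e.g. FAQ 15: 0.6*50 + 0.4*60 = 54), while the max-normalized form of Eq. (1) yields the same ordering on this data; the sketch implements the normalized form.

W_AP, W_SV = 0.6, 0.4

def match_scores(faqs):
    """faqs maps FAQ number -> (AP_i, SV_i); returns number -> MS per Eq. (1)."""
    max_ap = max(ap for ap, _ in faqs.values())
    max_sv = max(sv for _, sv in faqs.values())
    return {n: W_AP * ap / max_ap + W_SV * sv / max_sv
            for n, (ap, sv) in faqs.items()}

table4 = {15: (50, 60), 21: (12.5, 40), 23: (100, 50), 29: (25, 30)}
scores = match_scores(table4)
print(max(scores, key=scores.get))   # -> 23, i.e. FAQ 23 ranks first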

3.3.2 Ranking method for Partial Keywords Match

In the partial keywords match method, we calculate match scores for the retained FAQs according to Eq. (5):

MS(FAQ_i) = W_{CV} \times \frac{CV_i}{\max(CV_1, \ldots, CV_N)} + W_{SSV} \times \frac{SSV_i}{\max(SSV_1, \ldots, SSV_N)} + W_{CR} \times \frac{CR_i}{\max(CR_1, \ldots, CR_N)} + W_{SV} \times \frac{SV_i}{\max(SV_1, \ldots, SV_N)}    (5)

where SV_i is the same as in Eq. (4) and SSV_i stands for the Statistic Similarity Value of FAQ_i, calculated as the inner product of the two keyword vectors according to the Vector Space Model [6]. Eq. (6) defines CV_i as the Compatibility Value and Eq. (7) defines CR_i as the Coverage Ratio of FAQ_i:

CV_i = \frac{C(T_{i,q}, T_{i,f})}{|T_{i,q}| \times |T_{i,f}|},  with  C(T_{i,q}, T_{i,f}) = \sum_{q_k \in T_{i,q},\, f_j \in T_{i,f}} c(q_k, f_j)  and  c(q_k, f_j) = \begin{cases} 1, & q_k \text{ compatible with } f_j \\ 0, & \text{else} \end{cases}    (6)

where T_{i,q} contains the unmatched keywords in the user query, while T_{i,f} contains the unmatched keywords in FAQ_i. Function c(q_k, f_j) checks for compatibility and is supported by the ontology services, which check whether the two keywords are related by conflicting constraints; if so, it returns 0, otherwise 1.

CR_i = \frac{\sum_{q_k \in K_{i,q},\, f_j \in K_{i,f}} E(q_k, f_j)}{|K_{i,f}|},  with  E(q_k, f_j) = \begin{cases} 1, & \text{if } q_k = f_j \\ 0, & \text{else} \end{cases}    (7)

where K_{i,q} contains the keywords in the user query and K_{i,f} contains the keywords in FAQ_i. Function E(q_k, f_j) checks for syntactic equality between keywords q_k and f_j.
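A sketch of the Compatibility Value (Eq. (6)) and Coverage Ratio (Eq. (7)) follows. The compatible argument stands in for the ontology service that reports whether two keywords carry conflicting value constraints; returning 1.0 when either side has no unmatched keywords is our edge-case assumption.

def compatibility_value(unmatched_q, unmatched_f, compatible):
    """CV_i per Eq. (6): the fraction of compatible pairs among all pairs of
    unmatched query keywords and unmatched FAQ keywords."""
    if not unmatched_q or not unmatched_f:
        return 1.0   # no unmatched pairs, hence no potential conflicts
    hits = sum(1 for q in unmatched_q for f in unmatched_f if compatible(q, f))
    return hits / (len(unmatched_q) * len(unmatched_f))

def coverage_ratio(query_keywords, faq_keywords):
    """CR_i per Eq. (7): the fraction of the FAQ's keywords that literally
    (syntactically) appear in the user query."""
    return len(set(query_keywords) & set(faq_keywords)) / len(set(faq_keywords))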

4 System Evaluation

We have done a series of experiments to evaluate how the system performs under the support of ontology. The first experiment examines how well the ontology supports keyword trimming and conflict resolution. We randomly selected 100 FAQs from OD, extracted proper query keywords from their question parts, and randomly combined the keywords into a set of 45 queries, which were used to simulate real user queries in our experiments. Table 5(a) illustrates that the precision rate with keyword trimming is far better than that without it under every match score threshold. Note that domain experts decide whether a retrieved FAQ is relevant. Table 5(b) illustrates the results with ontology-supported conflict resolution, where we achieve a 5 to 20% improvement in precision rate compared with no conflict detection under different thresholds.

Table 5 Ontology-supported performance experiments

(a) Results of keyword trimming (each cell: with trimming / without trimming)

Match score threshold        0              0.2            0.4            0.6
Relevant #FAQ retrieved      41 / 53        29 / 51        23 / 45        20 / 32
#FAQ retrieved               44 / 98        30 / 96        23 / 74        20 / 36
Precision (%)                93.18 / 54.08  96.66 / 53.12  100 / 60.81    100 / 88.88

(b) Results of conflict resolution (each cell: with conflict detection / without conflict detection)

Match score threshold        0              0.2            0.4            0.6
Relevant #FAQ retrieved      74 / 81        74 / 81        69 / 60        21 / 23
#FAQ retrieved               111 / 158      105 / 147      88 / 83        21 / 24
Precision (%)                66.67 / 51.26  70.47 / 55.10  78.40 / 72.28  100 / 95.83

To investigate how the weights behave in our ranking methods and to find a set of adequate weights for our system, we carried out an experiment on how the weights should be chosen with respect to user satisfaction. First, we need a measure to evaluate ranking results. Assume n documents are in the system, (S_1, S_2, \ldots, S_n) represents the ranking sequence produced by the system, and (E_1, E_2, \ldots, E_n) represents the ranking sequence produced by the domain expert. Eq. (8) defines R_i as the importance degree of the ith document in a given sequence of ranks. For convenience of discussion, this importance degree is dubbed SR_i if the ranking sequence is from the system and ER_i if the ranking sequence is from the expert.

R_i = 10^{(n - i + 1)/n}    (8)

We then define the ranking gain G by Eq. (9) to measure how consistent the two ranking sequences are. A higher G implies better ranking consistency between the domain expert and the system. Note that this calculation is a special form of vector similarity and reflects the degree of similarity between the two ranking sequences.

G = \sum_{i=1}^{n} SR_i \times ER_i    (9)

Finally, to support comparison among various factors, we pre-calculate the best ranking gain G_max and the worst ranking gain G_min, and define a normalized ranking gain G_normal by Eq. (10). Note that the best ranking gain occurs when the system ranking sequence conforms to the expert sequence, as shown in Table 6.

G_{normal} = \frac{G - G_{min}}{G_{max} - G_{min}} \times 100\%    (10)

Table 6 Experiments on weights

(a) Results of full keywords match
No.  W_AP  W_SV  G_normal (%)
1    0     1     70.24
2    0.2   0.8   85.32
3    0.4   0.6   90
4    0.6   0.4   90
5    0.8   0.2   90
6    1     0     86.83

(b) Results of partial keywords match
No.  W_SSV  W_CV  W_CR  W_SV  G_normal (%)
1    0.2    0.2   0.2   0.4   81.85
2    0.2    0.2   0.4   0.2   79.64
3    0.2    0.4   0.2   0.2   74.91
4    0.4    0.2   0.2   0.2   81.99
5    0.2    0.2   0.3   0.3   80.99
6    0.2    0.3   0.3   0.2   77.83
7    0.3    0.3   0.2   0.2   77.83
8    0.3    0.2   0.3   0.2   81.99
9    0.3    0.2   0.2   0.3   82.15
10   0.2    0.3   0.2   0.3   78.04
11   1      0     0     0     81.68
12   0      1     0     0     48.77
13   0      0     1     0     72.69
14   0      0     0     1     81.79
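A sketch of Eqs. (8)-(10) follows. Two readings in it are our own assumptions: "10 EXP(x)" is taken as 10**x, and G_min is taken as the gain of the completely reversed expert ordering (the minimum of the inner product by the rearrangement inequality).

def importance(rank, n):
    """Eq. (8): importance degree of the document at 1-based position rank."""
    return 10 ** ((n - rank + 1) / n)

def gain(system_rank, expert_rank):
    """Eq. (9): inner product of the two importance sequences."""
    n = len(expert_rank)
    return sum(importance(system_rank[d], n) * importance(expert_rank[d], n)
               for d in expert_rank)

def normalized_gain(system_rank, expert_rank):
    """Eq. (10): G normalized between the best and worst achievable gains."""
    n = len(expert_rank)
    g = gain(system_rank, expert_rank)
    g_max = gain(expert_rank, expert_rank)                 # identical orderings
    reversed_rank = {d: n - r + 1 for d, r in expert_rank.items()}
    g_min = gain(reversed_rank, expert_rank)               # opposite orderings
    return (g - g_min) / (g_max - g_min) * 100

expert = {"faq23": 1, "faq15": 2, "faq29": 3, "faq21": 4}
system = {"faq23": 1, "faq29": 2, "faq15": 3, "faq21": 4}
print(round(normalized_gain(system, expert), 2))   # ranking consistency in %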
5 Related Works and Comparisons

Ranking is an important technique for web-based information systems. For example, FAQFinder [3] is a Web-based natural language question-answering system. It applies natural language techniques to organize FAQ files and answers users' questions by retrieving similar FAQ questions, using term vector similarity, coverage, semantic similarity, and question type similarity as four metrics, each weighted by 0.25. Sneiders [8] proposed to analyze the FAQs in the database long before any user queries are submitted, in order to associate with each FAQ four categories of keywords, namely, required, optional, irrelevant, and forbidden, to support retrieval. In this way, the work of FAQ retrieval is reduced to simple keyword matching without inference. Our system differs from these two systems in two ways. First, we employ an ontology-supported, template-based natural language processing technique to support both FAQ analysis for storage in OD, in order to provide solutions with better semantics, and user query processing, in order to better understand user intent. Second, we improve the ranking methods by proposing a different set of metrics for different match mechanisms. In addition, Ding and Chi [1] proposed a ranking model that measures the relevance of a whole website rather than merely a single web page. Its generalized feature, supported by the two functions of score propagation and site ranking, provides another level of calculation in the ranking mechanism and deserves more attention. Massa et al. [4] suggested extending the web language so that semantic links can be expressed, in order to discriminate between sites that are highly linked and sites that are highly trusted. Although this algorithm provides an objective global estimate of webpage importance, it might not be targeted to specific user preferences. Finally, Srour et al. [9] present a novel approach to personalizing the results of a search engine based on the user's taste and preferences, which also deserves further investigation.
6 Conclusions

We have described an ontological FAQ system for the Personal Computer domain, which employs ontology as the key technique to pre-process FAQs and process user queries in ubiquitous information environments. It is also equipped with an enhanced ranking technique to present retrieved, query-relevant results. Basically, the system relies on the wrapper technique to help clean, retrieve, and transform FAQ information collected from a heterogeneous environment, and stores it in an ontological database. During retrieval of FAQs, the system trims irrelevant query keywords, employs either full keywords match or partial keywords match to retrieve FAQs, and removes conflicting FAQs before returning the final results to the user. Ontology plays the key role in all the above activities. To produce a more effective presentation of the search results, the system employs an enhanced ranking technique, which combines Appearance Probability, Satisfaction Value, Compatibility Value, and Statistic Similarity Value as four properly weighted measures to rank the FAQs. Our experiments show that the system improves the precision rate and produces better ranking results. The proposed system manifests the following interesting features. First, the ontology-supported FAQ extraction from webpages can clean FAQ information by removing redundant data, restoring missing data, and resolving inconsistent data. Second, the FAQs are stored in an ontology-directed internal format, which supports semantics-constrained retrieval of FAQs. Third, the ontology-supported natural language processing of user queries helps pinpoint the user's intent. Finally, the partial keywords match-based ranking method helps present user-most-wanted, conflict-free FAQ solutions to the user.

Acknowledgements

The author would like to thank Ying-Hao Chiu, Yai-Hui Chang, and Fang-Chen Chuang for their assistance in system implementation. This work was supported by the National Science Council under Grants NSC-89-2213-E-011-059, NSC-89-2218-E-011-014, and NSC-95-2221-E-129-019.

References

[1] C. Ding and C.H. Chi, "A Generalized Site Ranking Model for Web IR," Proc. of the IEEE/WIC International Conference on Web Intelligence, Halifax, Canada, 2003, pp. 584-587.
[2] J. Lin and B. Katz, "Question Answering from the Web Using Knowledge Annotation and Knowledge Mining Techniques," Proc. of the 12th International Conference on Information and Knowledge Management, New Orleans, LA, 2003, pp. 116-123.
[3] S. Lytinen and N. Tomuro, "The Use of Question Types to Match Questions in FAQFinder," AAAI Spring Symposium on Mining Answers from Texts and Knowledge Bases, Stanford, CA, USA, 2002, pp. 46-53.
[4] P. Massa and C. Hayes, "Page-rerank: Using Trusted Links to Re-rank Authority," Technical Report, ITC/iRST, Trento, Italy, 2005.
[5] N.F. Noy and D.L. McGuinness, "Ontology Development 101: A Guide to Creating Your First Ontology," Available at http://www.ksl.stanford.edu/people/dlm/papers/ontology-tutorial-noy-mcguinness.pdf, 2000.
[6] G. Salton and M.J. McGill, Introduction to Modern Information Retrieval, McGraw-Hill Book Company, New York, USA, 1983.
[7] F. Scarselli, S.L. Yong, M. Gori, M. Hagenbuchner, A.C. Tsoi, and M. Maggini, "Graph Neural Networks for Ranking Web Pages," Proc. of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence, Siena Univ., Italy, 2005, pp. 666-672.
[8] E. Sneiders, "Automated FAQ Answering: Continued Experience with Shallow Language Understanding," Question Answering Systems, AAAI Fall Symposium Technical Report FS-99-02, 1999.
[9] L. Srour, A. Kayssi, and A. Chehab, "Personalized Web Page Ranking Using Trust and Similarity," IEEE/ACS International Conference on Computer Systems and Applications, Amman, Jordan, 2007, pp. 454-457.
[10] C.H. Tsai, "MMSEG: A Word Identification System for Mandarin Chinese Text Based on Two Variants of the Maximum Matching Algorithm," Available at http://technology.chtsai.org/mmseg/, 2000.
[11] H.L. Van and A. Trentini, "FAQshare: A Frequently Asked Questions Voting System as a Collaboration and Evaluation Tool in Teaching Activities," Proc. of the 14th International Conference on Software Engineering and Knowledge Engineering, Ischia, Italy, 2002, pp. 557-560.
[12] S.Y. Yang and C.S. Ho, "Ontology-Supported User Models for Interface Agents," Proc. of the 4th Conference on Artificial Intelligence and Applications, 1999, pp. 248-253.
[13] S.Y. Yang, Y.H. Chiu, and C.S. Ho, "Ontology-Supported and Query Template-Based User Modeling Techniques for Interface Agents," The 12th National Conference on Fuzzy Theory and Its Applications, I-Lan, Taiwan, 2004, pp. 181-186.
[14] S.Y. Yang, "FAQ-master: A New Intelligent Web Information Aggregation System," International Academic Conference 2006 Special Session on Artificial Intelligence Theory and Application, Tao-Yuan, Taiwan, 2006, pp. 2-12.
[15] S.Y. Yang, F.C. Chuang, and C.S. Ho, "Ontology-Supported FAQ Processing and Ranking Techniques," International Journal of Intelligent Information Systems, Vol. 28, No. 3, 2007, pp. 233-251.
[16] W. Winiwarter, "Adaptive Natural Language Interface to FAQ Knowledge Bases," International Journal on Data and Knowledge Engineering, Vol. 35, 2000, pp. 181-199.