Ubicc-Id365 365
Nessah Djamel
University Center of Khenchela, Algeria
Kazar Okba
Biskra University, Algeria
ABSTRACT
Currently, most search systems developed for information retrieval are based on vector representations such as the vector space model; others use various statistical and/or probabilistic approaches. Generally, for a given user request, some of the retrieved documents are not relevant because precision is only partial (presence of noise), and, furthermore, other relevant documents are never found; this difficulty is an effect of low recall (presence of silence).
Our work proposes a model whose objective is to improve the results returned for a user query by acting on both precision and recall. First, we use a multi-agent system to reproduce the concepts of autonomy, cooperation and communication that are inherent to this type of search system. Second, our approach combines a syntactic search, improved by the semantics that the WordNet taxonomy provides, with a semantic search engine based on a domain ontology. The knowledge base (domain ontology) is used to annotate documents with the defined concepts and instances; these annotations form a set of semantic index classes on which our search model is based.
[...] system; it receives the user feedback and transmits it to the system, and it presents the search results to the user. The "Information-Search" agent collects relevant information resources in the area of interest; it may subcontract several other agents to accomplish this goal. Meanwhile, the "Domain-Ontology" agent inspects and monitors dynamic changes in the content of information resources; it extracts and stores in an RDF base the links of documents that are annotated by concepts and instances of the specified ontology.
The query management (processing) unit hosts the "Query-Treatment" agent, which coordinates the activities of the system; it formulates and refines (prepares) the query to be submitted to the "Information-Search" agent.

The semantic annotation allows agents that use a semantic search engine to decide intelligently about the relevance of the returned results; for this reason, the information retrieval process depends largely on the quality of the formal semantic annotations defined by the domain ontology.
Documents in the domain of interest are annotated by concepts and by instances of concepts. This annotation has two relational properties, annotated instances and annotated documents, through which concepts and documents are linked. Thus, the terms (class, concept, data type, object property, data type property) defined in the ontology are used as metadata to annotate the content of the [...]

3.3.2 Weight mapping
The weights of the annotations are used to evaluate relevance and appropriateness, and to implement a classification (ranking) algorithm for the retrieved relevant documents.
These annotation weights reflect the relevance of an instance for the semantics of the document in which it appears. Our model is based on the frequency of occurrence of annotation instances in each document and takes into account the principle of generating an equivalent annotation class described above. The adaptation of the Tf-Idf algorithm therefore counts the number of times an annotation's label, within an equivalent class, appears in a document; the formula is:

$$d_x = \frac{freq_{x,d}}{\max_y freq_{y,d}} \times \log\frac{D}{n_x} \qquad (1)$$

d_x : weight of the instance "x" in document "d"
freq_{x,d} : number of occurrences in "d" of keywords linked with instance "x"
max_y freq_{y,d} : frequency of occurrence of the most repeated instance in document "d"
n_x : number of documents annotated by "x"
D : total number of documents.

Based on the work presented in [3], [4], and to simplify the calculations, we retain only the importance of an instance in the document:

$$w_{x_i} = \frac{freq_{x_i,d}}{\max_y freq_{y,d}} \qquad (2)$$
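Equations (1) and (2) can be checked with a short Python sketch. It assumes, as a simplification not stated in the paper, that each document is already reduced to the list of instance labels annotating it (one entry per occurrence), with the labels of an equivalent annotation class mapped to a single instance. The natural logarithm is also an assumption, since the base of the log in (1) is not specified, and the names `normalized_frequencies` and `annotation_weights` are hypothetical.

```python
import math
from collections import Counter


def normalized_frequencies(instances_in_doc):
    """Equation (2): w_x = freq_{x,d} / max_y freq_{y,d} for every instance x of one document."""
    counts = Counter(instances_in_doc)
    if not counts:
        return {}
    max_freq = max(counts.values())  # frequency of the most repeated instance in d
    return {x: c / max_freq for x, c in counts.items()}


def annotation_weights(corpus):
    """Equation (1): d_x = (freq_{x,d} / max_y freq_{y,d}) * log(D / n_x).

    corpus maps a document id to the list of instance labels annotating it
    (assumption: one list entry per occurrence, equivalent labels already merged).
    """
    total_docs = len(corpus)  # D
    # n_x: number of documents annotated by instance x (each document counted once)
    doc_freq = Counter(x for instances in corpus.values() for x in set(instances))
    weights = {}
    for doc_id, instances in corpus.items():
        tf = normalized_frequencies(instances)
        weights[doc_id] = {
            x: tf_x * math.log(total_docs / doc_freq[x]) for x, tf_x in tf.items()
        }
    return weights
```

With the toy corpus {"d1": ["chelia", "chelia", "hotel-service"], "d2": ["chelia"], "d3": ["room-service"]}, document d1 gets w = 1.0 for chelia and 0.5 for hotel-service under (2), and d_x of about 0.405 and 0.549 under (1). An instance that annotates every document receives d_x = 0, because log(D/n_x) = log 1.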
4 SYSTEM’S AGENTS
4.1 User-Interface Agent
The interface agent resides on the user's desktop and provides the interface for interacting with the system. For a search session, it records the user request in terms of keywords.
The user can optionally define a search domain and introduce various preferences, such as the favorite search engine (Google by default) and a set of variables defining calculation thresholds. This agent also presents the search results to the user when they arrive; it can implement an intelligent behavior and learn from past experience and from user feedback on earlier requests.

4.2 Query-Treatment Agent
In our multi-agent system, this agent manages the cooperative execution of the user request. It has knowledge about each agent, including its identification and the roles that the agent can perform according to its capabilities (Fig.5). According to their various skills, it allocates tasks to the agents so that they achieve their common goal. Through its interactions with the "Evaluate-Ranking" agent, it performs various substitutions involving:
• The weight of keywords: the weights of keywords are replaced by values calculated by a heuristic evaluation of similarity; these values are [...]

Figure 5: Query-Treatment agent structure

Another module, complementing the first one, prepares the same query based on concepts; it performs a semantic search that uses relationships between concepts, as follows. An RDQL query is generated from the keywords expressed in the original request; this may also be done by the "User-Interface" agent, which in this case accesses the domain ontology and helps the user to explicitly select classes and introduce the desired property values.
The "Query-Treatment" agent interacts with the "Domain-Ontology" agent to run the RDQL query over the schema of the domain ontology and the instances of concepts specified in OWL; the result is a set of instances that strictly satisfy the conditions of the RDQL query (a standard engine such as Jena is used to execute RDQL queries). The execution instantiates the concepts of the OWL ontology schema with the values of the variables used in the constructed query, and invokes a reasoner such as Jena's to infer the related knowledge.

4.3 Information-Search Agent
The first search component of this agent is based on syntactic keywords and targets the search area (e.g. the web) through a traditional search engine; however, to improve the purely syntactic search results, we introduce a second component, which performs semantic search.
Both modules operate simultaneously; each receives as input a model adapted to its search mode, prepared by the "Query-Treatment" agent (keywords for the syntactic-semantic search, and instances of concepts derived from executing the RDQL query for the semantic search).
The agent can contract several other agents to complete the search; the choice of an agent for a given search can depend on the agent's capabilities and on the nature of the information sought (Fig.6).

4.3.1 Semantic-Syntactic search
This module uses a syntactic search engine such as Google to find, in the area of interest, documents that satisfy the submitted query. This is a search for purely syntactic correspondence between the keywords in the query and the terms indexing the documents available in the search space.
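The RDQL generation step of the Query-Treatment agent (Section 4.2) can be illustrated by a small Python sketch that only builds the query text. The keyword-to-class table `KEYWORD_TO_CLASS`, the function `build_rdql` and the way value constraints are passed in are hypothetical; in the described system the classes come from the domain ontology or from the user's explicit selection, and the query is executed by an engine such as Jena, which is not shown here. The syntax used (triple patterns in parentheses after WHERE, an AND clause for value constraints) follows the W3C RDQL submission as we understand it.

```python
ONTO = "http://mydomain/ontology/infohotel/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical keyword-to-class table; in the described system this knowledge
# would come from the domain ontology or from the user selecting classes.
KEYWORD_TO_CLASS = {"hotel": "Hotel", "service": "Serviceh"}


def build_rdql(keywords, constraints=()):
    """Build the text of an RDQL query selecting instances ?x of the classes
    matched by the keywords.

    constraints: iterable of (property, operator, numeric value) triples,
    e.g. ("classement", ">=", 4); each one adds a triple pattern and a filter.
    """
    classes = [KEYWORD_TO_CLASS[k.lower()] for k in keywords if k.lower() in KEYWORD_TO_CLASS]
    if not classes:
        raise ValueError("no keyword maps to an ontology class")
    # One rdf:type pattern per matched class
    patterns = [f"(?x, <{RDF_TYPE}>, <{ONTO}{cls}>)" for cls in classes]
    filters = []
    for i, (prop, op, value) in enumerate(constraints):
        var = f"?v{i}"
        patterns.append(f"(?x, <{ONTO}{prop}>, {var})")
        filters.append(f"{var} {op} {value}")
    query = "SELECT ?x\nWHERE " + ",\n      ".join(patterns)
    if filters:
        query += "\nAND " + " && ".join(filters)
    return query
```

For example, build_rdql(["hotel"], [("classement", ">=", 4)]) produces a query selecting the instances of Hotel whose classement value is at least 4; classement is the integer-valued property declared in the schema of Section 5.2.1.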
Figure 6: Information-Search agent structure

4.3.2 Semantic search
This module searches the RDF document base for RDF annotations that match the tuples of instances recovered by the "Query-Treatment" agent. The module receives as input the instances resulting from the RDQL query; the documents whose links have been stored in the RDF database are then analyzed, and those annotated by these instance tuples are considered semantically relevant. The agent then records the following details in a temporary file: the links of the resources found and their evaluated similarities.

4.4 Evaluate-Ranking Agent
The "Information-Search" agent stores the links of the resources found in a temporary file that the "Evaluate-Ranking" agent accesses; it is therefore a type of memory that can be modeled as a blackboard. For each entry, the "Evaluate-Ranking" agent downloads the page referenced by the link. Furthermore, the keywords that syntactically index the page, or the semantic index instances, are assigned weights according to the principle of the vector model.
By analogy with the vector space model, semantic annotations are assigned weights reflecting the importance of the annotation instance for the document; consequently, in RDQL queries, the variables in the SELECT clauses are also assigned weights according to the principle of the vector model.
The cosine formula is used to calculate the document-query similarity; for a page "j" and a request "q" we use the expression:

$$Sim(P_j, q) = \frac{\vec{P_j} \cdot \vec{q}}{|\vec{P_j}|\,|\vec{q}|} \qquad (7)$$

|Pj|, |q| : respectively the norms of the vectors Pj and q.

When Sim(Pj, q) ≥ Rmin, the page Pj is considered relevant, and its link and similarity value are returned to the "Information-Search" agent for final storage; when Sim(Pj, q) < Rmin, the current page is ignored. The process ends when all pages have been crawled, at which point we have obtained the set of all relevant pages. If, according to user feedback, the number of relevant resources found is sufficient, a grading algorithm is executed to present the results in order of relevance; this algorithm is implemented by the ranking component in the structure of this agent.
If, on the other hand, we want to include more resources (depending on user feedback), the "Evaluate-Ranking" agent explores the relationships between concepts defined in the WordNet hierarchy to extract sets of synonyms, hyponyms and hypernyms. The query expansion then uses these synsets within the depth limits set by the user; generally, however, when expanding with hypernym synsets, the depth is set to 1, because similarity tends to decrease as the sense is generalized.
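The depth-limited query expansion of Section 4.4 can be sketched as a breadth-first walk over a taxonomy. To keep the example self-contained, the tiny `SYNONYMS` and `HYPERNYMS` tables below are hypothetical stand-ins for WordNet; with NLTK, the same walk could instead use `wordnet.synsets`, `Synset.hypernyms()`, `Synset.hyponyms()` and `Synset.lemma_names()`. The function names and the hyponym default depth are assumptions; only the hypernym default depth of 1 comes from the text.

```python
from collections import deque

# Hypothetical toy taxonomy standing in for WordNet.
SYNONYMS = {"hotel": {"hostel"}}
HYPERNYMS = {
    "hotel": {"building"},
    "building": {"structure"},
    "inn": {"hotel"},
    "motel": {"hotel"},
    "bed-and-breakfast": {"inn"},
}


def hypernyms_of(word):
    return HYPERNYMS.get(word, set())


def hyponyms_of(word):
    # Inverse of the hypernym relation
    return {w for w, parents in HYPERNYMS.items() if word in parents}


def expand(word, hyponym_depth=2, hypernym_depth=1):
    """Expand a query term with its synonyms, hyponyms up to hyponym_depth levels
    and hypernyms up to hypernym_depth levels (1 by default, since similarity
    tends to decrease when generalizing)."""
    terms = {word} | SYNONYMS.get(word, set())
    for neighbours, max_depth in ((hyponyms_of, hyponym_depth),
                                  (hypernyms_of, hypernym_depth)):
        seen = {word}
        frontier = deque([(word, 0)])
        while frontier:  # breadth-first walk, stopping at max_depth
            current, depth = frontier.popleft()
            if depth >= max_depth:
                continue
            for nxt in neighbours(current):
                if nxt not in seen:
                    seen.add(nxt)
                    terms.add(nxt)
                    frontier.append((nxt, depth + 1))
    return terms
```

Here expand("hotel") returns hotel, hostel, inn, motel, bed-and-breakfast and building; raising hypernym_depth to 2 also adds the more general term "structure".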
4.4.1 Evaluate module
Let W_ij be the weight of term "i" (keyword) in page "j":

$$W_{ij} = \frac{freq\_t_i}{\max_{j=1,\dots,n} freq\_t_j} \qquad (6)$$
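Equations (6) and (7), together with the Rmin test and the final ranking, fit in a few lines of Python. This sketch assumes that each downloaded page has already been tokenized into the list of its indexing terms and that the query weights are given (the heuristic similarity values of Section 4.2 are not reproduced); the maximum in (6) is read as running over the terms of the same page. The function names are hypothetical.

```python
import math
from collections import Counter


def term_weights(terms):
    """Equation (6): weight of term i in a page = freq_t_i / max over the page's terms."""
    counts = Counter(terms)
    if not counts:
        return {}
    top = max(counts.values())
    return {t: c / top for t, c in counts.items()}


def cosine(p, q):
    """Equation (7): Sim(P_j, q) = (P_j . q) / (|P_j| |q|), on sparse dict vectors."""
    dot = sum(w * q.get(t, 0.0) for t, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def evaluate_and_rank(pages, query_weights, r_min):
    """Keep the pages with Sim >= r_min and return (link, similarity) pairs
    sorted by decreasing similarity; pages below r_min are ignored."""
    kept = []
    for link, terms in pages.items():
        sim = cosine(term_weights(terms), query_weights)
        if sim >= r_min:
            kept.append((link, sim))
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

With pages {"p1": ["hotel", "hotel", "khenchela"], "p2": ["pizza"], "p3": ["hotel"]}, query weights {"hotel": 1.0, "khenchela": 1.0} and Rmin = 0.5, page p2 is ignored and the ranking is p1 (about 0.949) before p3 (about 0.707).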
The returned documents with a high similarity are those for which a ≠ 0 and b ≠ 0.

4.5 Domain-Ontology Agent
Attached to the domain ontology, the main goal of this agent is to maintain the ontology closely and [...]

An inference engine applied to the ontology schema and to the defined instances infers knowledge beyond what is explicitly declared. Inference is a mechanism based on the expressiveness of the language (OWL-Lite) and on its formal semantics, which are grounded in description logics; this concerns especially restrictions on classes, restrictions on the properties among classes, and a set of axioms defined on classes. For example, we specify that a 5-star rated hotel must offer the service "guided-visits" through the class: guided-visits = ((hotel) ∩ (≥ 5 rated.star)).

5.2 Inference Models
The integration of the Jena API into our model allows it to derive additional RDF assertions from the OWL knowledge base; this mechanism supports the RDF/RDFS and OWL languages and uses an inference model with two components:
• The schema of the model
• The instances of the model
The example below illustrates an inference model used by the RDFS inference engine. Inference is performed through the transitive sub-property relation, which defines "room-service" as a sub-property of the property "hotel-service".
5.2.1 Model’s schema
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
  <!ENTITY hotelerie 'http://mydomain/ontology/infohotel/'>
  <!ENTITY rdf 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
  <!ENTITY rdfs 'http://www.w3.org/2000/01/rdf-schema#'>
  <!ENTITY xsd 'http://www.w3.org/2001/XMLSchema#'>]>
<rdf:RDF xmlns:rdf="&rdf;" xmlns:rdfs="&rdfs;"
         xmlns:xsd="&xsd;"
         xml:base="http://mydomain/ontology/infohotel/"
         xmlns="&hotelerie;">
  <rdf:Description rdf:about="&hotelerie;room-service">
    <rdfs:subPropertyOf rdf:resource="&hotelerie;hotel-service"/>
  </rdf:Description>
  <rdf:Description rdf:about="&hotelerie;hotel-service">
    <rdfs:range rdf:resource="&hotelerie;Hotel"/>
    <rdfs:domain rdf:resource="&hotelerie;Serviceh"/>
  </rdf:Description>
  <rdf:Description rdf:about="&hotelerie;classement">
    <rdfs:range rdf:resource="&xsd;integer" />
  </rdf:Description>
</rdf:RDF>

[...]
  <class>3</class>
</Hotel>
</rdf:RDF>
The execution of the code associated with this model produced the following results:
Type: Chelia is
http://mydomain/ontology/infohotel/chelia rdf:type http://mydomain/ontology/infohotel/hotel
Type: Chelia is
http://mydomain/ontology/infohotel/chelia rdf:type http://mydomain/ontology/infohotel/serviceh

6 CONCLUSION
The proposed semantic search model, based on a multi-agent system and using a domain ontology, illustrates the concept of cooperative resolution of distributed problems. The process combines an ontology-based search engine with a traditional keyword-based search that includes the synonymy and hyponymy relations provided by the WordNet taxonomy.
The semantic search relies on an RDQL query generated from the query keywords; an inference engine such as Jena then uses the ontology schema to retrieve the defined instances corresponding to the keywords in the query. These instances are sought in the RDF database, which returns the documents they annotate.
As prospects for research in this area and in relation to our model, we propose to enrich the agents' knowledge base with query-formulation techniques, including explicit rules and decision policies. This will allow the "Query-Treatment" agent to optimize the request intelligently, since the better defined the query, the more relevant the results obtained.
Also, to take advantage of new technologies applied to artificial intelligence systems, we intend to couple the "Query-Treatment" agent with a case-based reasoning (CBR) system; this will improve the search process by reasoning from cases already resolved and stored in the CBR database.
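The RDFS entailment used in Section 5.2 can be reproduced without Jena by forward-chaining a few RDFS rules over triples (Jena itself offers an RDFS inference model, for example through ModelFactory.createRDFSModel). The sketch below encodes the schema of Section 5.2.1 as triples with abbreviated names; because the instance file is only partially reproduced, the instance triples, including the resource "breakfast", are hypothetical. Only rules rdfs2 (domain), rdfs3 (range), rdfs5 (sub-property transitivity) and rdfs7 (sub-property inheritance) are implemented.

```python
RDF_TYPE = "rdf:type"
SUB_PROPERTY = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"

# Schema of Section 5.2.1, as (subject, predicate, object) triples.
SCHEMA = {
    ("room-service", SUB_PROPERTY, "hotel-service"),
    ("hotel-service", RANGE, "Hotel"),
    ("hotel-service", DOMAIN, "Serviceh"),
    ("classement", RANGE, "xsd:integer"),
}
# Hypothetical instance triples (the paper's instance file is truncated).
INSTANCES = {
    ("chelia", RDF_TYPE, "Hotel"),
    ("chelia", "room-service", "breakfast"),
}


def rdfs_closure(triples):
    """Apply rdfs2, rdfs3, rdfs5 and rdfs7 until no new triple is produced."""
    inferred = set(triples)
    while True:
        sub = {(p, q) for p, r, q in inferred if r == SUB_PROPERTY}
        dom = {(p, c) for p, r, c in inferred if r == DOMAIN}
        rng = {(p, c) for p, r, c in inferred if r == RANGE}
        new = set()
        for p, q in sub:  # rdfs5: subPropertyOf is transitive
            for q2, r in sub:
                if q == q2:
                    new.add((p, SUB_PROPERTY, r))
        for s, p, o in inferred:
            for p2, q in sub:  # rdfs7: a sub-property triple also holds for the super-property
                if p == p2:
                    new.add((s, q, o))
            for p2, c in dom:  # rdfs2: typing of the subject through rdfs:domain
                if p == p2:
                    new.add((s, RDF_TYPE, c))
            for p2, c in rng:  # rdfs3: typing of the object through rdfs:range
                if p == p2:
                    new.add((o, RDF_TYPE, c))
        if new <= inferred:
            return inferred
        inferred |= new
```

Running rdfs_closure(SCHEMA | INSTANCES) derives (chelia, hotel-service, breakfast) because room-service is a sub-property of hotel-service, and then (chelia, rdf:type, Serviceh) through rdfs:domain, analogous to the second result line reported above; because hotel-service declares rdfs:range Hotel, breakfast is also typed Hotel.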