
A Rule-based Query Language for HTML

Mengchi Liu
Department of Computer Science, University of Regina
Regina, Saskatchewan, Canada S4S 0A2
mliu@cs.uregina.ca

Tok Wang Ling
School of Computing, National University of Singapore
Lower Kent Ridge Road, Singapore 119260
lingtw@comp.nus.edu.sg

Abstract

With the recent popularity of the web, an enormous amount of information is now available online. Most web documents available over the web are in HTML format and are hierarchically structured in nature. How to query such web documents based on their internal hierarchical structure becomes more and more important. In this paper, we present a rule-based language called WebQL to support effective and flexible web queries. Unlike other web query languages, WebQL is a high-level declarative query language with a logical semantics. It allows us to query web documents based on their internal hierarchical structures. It supports not only negation and recursion, but also query result restructuring in a natural way. We also describe the implementation of the system that supports the WebQL query language.

1 Introduction

With the recent popularity of the web, an enormous amount of information is now available online. Most web documents available over the web conform to the HTML specification. They are intended to be human readable through a browser and thus are constructed following some common conventions and often exhibit some hierarchical structure. How to query such web documents based on their internal structure becomes more and more important.

In the past few years, a number of query languages and systems have been developed in the database community to retrieve data from the web, such as W3QS [6], WebSQL [15], WebLog [10], UnQL [5], Lorel [2], WebOQL [3], Strudel [7] and Florid [9]. For surveys, see [1, 8, 16]. However, most proposals use relational, graph-based or tree-based data models to represent the web data. They focus on inter-document structures, with little attention to intra-document structure, and thus can only represent the web data at a very rough level. For example, none of the existing data models and query languages can capture the internal structure of the latest DBLP bibliography server web page of Michael Ley at http://www.informatik.uni-trier.de/ley/db, shown in Figure 1, in a simple and natural way and use such structure to express practical queries.

In [13], we presented a conceptual model for HTML. It has only a few simple constructs but is able to represent the complex hierarchical structure in the web documents at a high level that is close to human conceptualization/visualization of the documents. Also, a set of rules was presented to convert HTML documents into this conceptual model.

In this paper, we present a rule-based language called WebQL based on this conceptual model. Unlike other web query languages, WebQL is a declarative query language with a logical semantics. It allows us to query web documents based on their internal hierarchical structures. It supports not only negation and recursion, but also query result restructuring in a natural way. It can be used in two different ways: one is for the user to query the internal structure of the web documents; the other is for the user to query parts of the web documents when they know part of their internal structure. As the conceptual model on which the rule-based language is based is high level, user queries in the rule-based language are also quite high level.

The rest of the paper is organized as follows. Section 2 presents the syntax of the language. Section 3 gives several query examples. Section 4 provides the logical semantics for the language. Section 5 describes the implementation of our web query and inference system that supports WebQL. Section 6 summarizes and points out further research issues.

2 Syntax of WebQL

We assume the existence of two kinds of disjoint symbols: a set C of constants containing the set U of URLs, and a set V of variables starting with ‘$’ followed by a string; ‘$’ itself is an anonymous variable.

Definition 1 The terms are defined recursively as follows:
(1) A constant is a lexical term.
(2) If X is a constant or a variable, and Y is a URL or a variable, then X⟨Y⟩ is a linking term; X is called the label and Y is called the anchor of the linking term. When X or Y is the anonymous variable, we can simply use ⟨Y⟩ or X⟨⟩ respectively.
(3) If X and Y are terms, then X ⇒ Y is an attributed term.
(4) If X1, ..., Xn are terms, then {X1, ..., Xn} is a bag term.
(5) A variable is either an atomic term, a linking term, a label term, a URL term, an attributed term, or a bag term, depending on the context.

Example 1 The following are several examples of terms:
  Lexical terms: CS Dept, John Smith, $Name
  Linking terms: Faculty⟨fac.html⟩, Faculty⟨$U⟩, ⟨$U⟩, Faculty⟨⟩, $F⟨fac.html⟩, ⟨fac.html⟩
  Attributed terms: Title ⇒ CS Dept, Program ⇒ {$D}, $A ⇒ $V, $A ⇒ $L⟨$U⟩, $A⟨$U⟩ ⇒ $V
  Bag terms: {$X}, {$X, John}, {Author⟨$U⟩}

A term is ground if it has no variables. An object is a ground term. Corresponding to terms, four kinds of objects are distinguished in WebQL: lexical, linking, attributed, and bag objects.

Definition 2 The expressions are defined as follows:
(1) Let U be a URL or a variable and T a term. Then U : T is a positive expression.
(2) If P is a positive expression, then ¬P is a negative expression.
(3) Arithmetic, string and bag operation expressions are defined using terms in the usual way.

Example 2 The following are several examples of expressions, where a stands for some constant URL:
  Positive expressions: a : $X, a : {$X}, a : {Answer ⇒ $X}, $U : $V
  Negative expressions: ¬a : $X, ¬a : {Faculty ⇒ {John}}, ¬a : {Faculty⟨⟩}
  Arithmetic expressions: $A = $B * 2, $Age = 2001 - $Birthyear
  String expressions: $FName = John + $LName, John ∈ $FName
  Bag expressions: John ∈ $Faculty, $S = $S1 ∪ $S2

An expression is ground if it contains no variables.

Definition 3 A rule has the form A :– L1, ..., Ln, where A is a positive expression u : T, and each Li is a positive expression, a negative expression, or an arithmetic, string or bag operation expression defined using terms. A rule is safe if all variables in the head are covered or limited as defined in [4, 11, 17].

For a negative expression with a bag term in the body of a rule, we can move the negation sign into the bag for convenience. For example, we can use a : {¬Faculty ⇒ {John}} to stand for ¬a : {Faculty ⇒ {John}}. We can also combine positive and negative expressions with the same URL for convenience. For example, we can use the expression
  a : {Faculty ⇒ {John}, ¬Faculty ⇒ {Mary}}
to stand for a : {Faculty ⇒ {John}}, a : {¬Faculty ⇒ {Mary}} in the body of a rule.

Note that the anonymous variable $ may appear several times in a rule, and its different appearances in general stand for different variables. Thus, it cannot appear in the head of a safe rule.

Definition 4 A web document is a safe rule with empty body.

In other words, a web document is a ground positive expression.

Example 3 The following is an example of a web document:
  http://www.cs.uregina.ca/csdept.html : {
    Title ⇒ CS Dept,
    People⟨people.html⟩ ⇒ {
      Faculty⟨fac.html⟩,
      Staff⟨staff.html⟩,
      Students⟨students.html⟩},
    Programs ⇒ {
      Ph.D Program⟨phd.html⟩,
      M.Sc Program⟨msc.html⟩,
      B.Sc Program⟨bsc.html⟩},
    Research⟨research.html⟩
  }

Using the methods presented in [13], we can convert most HTML documents into web documents of WebQL.

Example 4 Consider part of the latest DBLP bibliography server web page of Michael Ley at http://www.informatik.uni-trier.de/ley/db, shown in Figure 1. We can convert it into a web document as shown in Figure 2, with simplified URLs such as a1, b1, etc. to fit in the paper.
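The safety condition of Definition 3, together with the requirement that the anonymous variable $ never appears in the head of a safe rule, can be checked mechanically. The following Python sketch is only a simplified approximation of "covered or limited" from [4, 11, 17], not its exact definition: it assumes that a variable is limited when it occurs in a positive body expression, or when it is the left-hand side of an equation (such as $Age = 2001 - $Birthyear) whose other variables are already limited. The `Literal` and `Rule` classes are hypothetical and hold only variable names, so no WebQL parser is assumed.

```python
from dataclasses import dataclass, field
from typing import List, Optional

ANON = "$"  # the anonymous variable of WebQL


@dataclass(frozen=True)
class Literal:
    """A body expression reduced to what the safety check needs (hypothetical).

    kind   -- "pos" (u : T), "neg" (negated u : T) or "eq" (an equation)
    vars   -- names of the variables occurring in the expression
    target -- for "eq" literals, the variable on the left-hand side
    """
    kind: str
    vars: frozenset
    target: Optional[str] = None


@dataclass
class Rule:
    head_vars: frozenset
    body: List[Literal] = field(default_factory=list)


def limited_vars(body):
    """Variables limited by the body, computed as a small fixpoint."""
    limited = {v for lit in body if lit.kind == "pos" for v in lit.vars}
    changed = True
    while changed:
        changed = False
        for lit in body:
            if (lit.kind == "eq" and lit.target not in limited
                    and lit.vars - {lit.target} <= limited):
                limited.add(lit.target)
                changed = True
    limited.discard(ANON)  # each $ is a fresh variable, so it binds nothing reusable
    return limited


def is_safe(rule):
    """Simplified safety test for a rule A :- L1, ..., Ln."""
    if ANON in rule.head_vars:
        return False
    return rule.head_vars <= limited_vars(rule.body)


# $X in the head and in a positive body expression such as a : {Answer ⇒ $X}
pos = Rule(frozenset({"$X"}), [Literal("pos", frozenset({"$X"}))])
# head variable occurring only under negation: not safe
neg_only = Rule(frozenset({"$X"}), [Literal("neg", frozenset({"$X"}))])
# $Age is limited through the equation $Age = 2001 - $Birthyear
age = Rule(frozenset({"$Age"}), [
    Literal("pos", frozenset({"$Birthyear"})),
    Literal("eq", frozenset({"$Age", "$Birthyear"}), target="$Age"),
])
print(is_safe(pos), is_safe(neg_only), is_safe(age))  # True False True
```

Only the head variables are tested, following the wording of Definition 3; a stricter variant could also require the variables of negative expressions to be limited.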
Figure 1. DBLP Bibliography

http://www.informatik.uni-trier.de/ley/db : {
  Title ⇒ DBLP Bibliography,
  Body ⇒ {
    Search ⇒ {Author⟨a1⟩, Title⟨a2⟩, Advanced⟨a3⟩,
              Home Page Search⟨a4⟩},
    Bibliographies ⇒ {
      Conferences⟨b1⟩ ⇒ {SIGMOD⟨b11⟩, VLDB⟨b12⟩, PODS⟨b13⟩, ER⟨b14⟩, ...},
      Journals⟨b2⟩ ⇒ {CACM⟨b21⟩, TODS⟨b22⟩, TOIS⟨b23⟩, TOPLAS⟨b24⟩, ...},
      Series⟨b3⟩ ⇒ {LNCS/LNAI⟨b41⟩, DISDBIS⟨b42⟩},
      Books ⇒ {Collections⟨b51⟩, DB Textbook⟨b52⟩},
      By Subjects⟨b4⟩ ⇒ {Database Systems⟨b61⟩,
                         Logic Prog⟨b62⟩, IR⟨b63⟩}},
    Full Text ⇒ ACM SIGMOD Anthology⟨ 1⟩,
    Reviews ⇒ ACM SIGMOD Digital Review⟨ 2⟩,
    Links ⇒ {
      Research Groups ⇒ {Database Systems⟨d1⟩,
                         Logic Programming⟨d2⟩},
      Computer Science Organization⟨e1⟩ ⇒ {
        ACM⟨e11⟩ (DL⟨e12⟩, SIGMOD⟨e13⟩, SIGIR⟨e14⟩),
        IEEE Computer Society⟨e15⟩ (DL⟨e16⟩)},
      Related Services⟨f1⟩ ⇒ {
        CoRR⟨f11⟩, ResearchIndex⟨f12⟩, NZ-DL⟨f13⟩,
        CS BibTex⟨f14⟩, HBP⟨f15⟩, Virtual Library⟨f16⟩}}
  }
}

Figure 2. DBLP Web Document
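To make the nesting in Figure 2 concrete, the following Python sketch encodes a fragment of it and collects every linking object label⟨anchor⟩ at any depth, which is the kind of navigation that the * shorthand of Section 3 relies on. The classes `Link` and `Attr`, and the choice of Python lists for bag objects and strings for lexical objects, are assumptions made for illustration; they are not the representation used by the WebQL system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Link:
    """A linking object label⟨anchor⟩ (either part may be None)."""
    label: Optional[str]
    anchor: Optional[str]


@dataclass(frozen=True)
class Attr:
    """An attributed object attribute ⇒ value."""
    attribute: object  # a lexical object (str) or a Link
    value: object


# Bag objects are modelled as Python lists, lexical objects as str.
# A fragment of the DBLP web document of Figure 2 (the URL part is omitted).
dblp = [
    Attr("Title", "DBLP Bibliography"),
    Attr("Body", [
        Attr("Search", [Link("Author", "a1"), Link("Title", "a2"),
                        Link("Advanced", "a3"), Link("Home Page Search", "a4")]),
        Attr("Bibliographies", [
            Attr(Link("Journals", "b2"),
                 [Link("CACM", "b21"), Link("TODS", "b22")]),
        ]),
    ]),
]


def links(obj):
    """Yield every linking object occurring at any depth inside obj."""
    if isinstance(obj, Link):
        yield obj
    elif isinstance(obj, Attr):
        yield from links(obj.attribute)
        yield from links(obj.value)
    elif isinstance(obj, list):
        for element in obj:
            yield from links(element)
    # lexical objects (str) contain no links


pairs = [(lk.label, lk.anchor) for lk in links(dblp)]
tods = next(lk.anchor for lk in links(dblp) if lk.label == "TODS")
print(tods)  # b22
```

Looking up the anchor labelled TODS yields b22, the same anchor that query Q6 in Section 3 retrieves from the full document.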

In deductive database languages, a query is normally defined as a rule with empty head. If the query contains no variables, the query result is either true or false. If the query contains variables, the query result is a set of bindings that make each ground query true. However, for web queries, we want not only the set of bindings that make the query true but also proper structuring of the query results. The head of the rule can be used for this purpose. Also, complex queries over the web documents may need more than one rule to express. Thus, we introduce our notion of query as follows.

Definition 5 A query is a set of safe rules whose heads have the same URL.

In order to make queries easier to write, we introduce the following shorthands for rules, terms and expressions appearing in rules:
(1) X. stands for X ⇒ $
(2) X1.X2. ... .Xn stands for X1 ⇒ {X2 ⇒ ... {Xn} ...}
(3) A :– ... .ⁿX ... stands for the following n + 1 rules:
      A :– ... X ...
      A :– ... $.X ...
      ...
      A :– ... $. ... .$.X ...   (with n occurrences of $)
    If there are several such dot notations in a rule, then it stands for their various combinations as outlined above.
(4) A :– ... *X ... stands for A :– ... .ⁿX ... for some fixed number n.
(5) A :– ... ⟨X⟩ : Y stands for A :– ... ⟨X⟩, X : Y
(6) A :– ... ⟨X⟩.Y stands for A :– ... ⟨X⟩, X : {Y}

In other words, .ⁿ stands for 0 to n anonymous variables in the path.

3 Query Examples

The following queries are based on the DBLP web document shown in Figure 2. To keep them simple, we use ul to stand for http://www.informatik.uni-trier.de/ley/db and uo for the URL of the query result.

(Q1) Copy the contents of the document at the given URL ul into a local file given by the URL uo:
  uo : $X :– ul : $X
Note that no matter what kind of document ul points to, such as HTML, postscript, image, executable, etc., it is copied to the destination uo. However, if we know that it is an HTML document which can be converted into a bag, then we can use the following query instead:
  uo : {$X} :– ul : {$X}
which says that every element denoted by $X in the bag is also an element in the result bag. The notation {$X} in the body of the rule means that $X is an element in the corresponding bag, whereas the notation {$X} in the head of the rule is used to group the result into a bag. It corresponds to a partial set term in Relationlog [12].

(Q2) List the objects under the attribute Search:
  uo : {Answer ⇒ $X} :– ul : {*Search ⇒ $X}
The result of this query based on the web document in Figure 2 is as follows:
  {Answer ⇒ {Author⟨a1⟩, Title⟨a2⟩, Advanced⟨a3⟩, ...}}

(Q3) List the anchors (URLs) under the attribute Search:
  uo : {Answer ⇒ {$X}} :– ul : {*Search ⇒ {⟨$X⟩}}
The result is {Answer ⇒ {a1, a2, a3, a4}}.

(Q4) List the labels under the attribute Search:
  uo : {Answer ⇒ {$X}} :– ul : {*Search ⇒ {$X⟨⟩}}
The result is {Answer ⇒ {Author, Title, Advanced, ...}}. Note that this query can be represented equivalently using the dot notation in WebQL as follows:
  uo : {Answer ⇒ {$X}} :– ul : {*Search.$X⟨⟩}

(Q5) List all the attributes at the first and second levels:
  uo : {Answer ⇒ {$X}} :– ul : {$X ⇒ $Y}
  uo : {Answer ⇒ {$Y}} :– ul : {$X ⇒ {$Y ⇒ $Z}}
The result is {Answer ⇒ {Title, Body, Search, ...}}. This query can also be represented equivalently using the dot notation:
  uo : {Answer ⇒ {$X}} :– ul : {$X.}
  uo : {Answer ⇒ {$Y}} :– ul : {$X.$Y.}

(Q6) Obtain the URL of TODS:
  uo : {Answer ⇒ $X} :– ul : {*TODS⟨$X⟩}
The result is {Answer ⇒ b22}.

(Q7) Obtain all the URLs in the page:
  uo : {Answer ⇒ {$U}} :– ul : {*⟨$U⟩}

(Q8) Obtain all the URLs together with their labels:
  uo : {Answer ⇒ {$L⟨$U⟩}} :– ul : {*$L⟨$U⟩}

(Q9) Get all the URLs reachable from the page:
  ur : {$X} :– ul : {*⟨$X⟩}
  ur : {$X} :– ur : {$Y}, $Y : {*⟨$X⟩}
Note that this query involves multiple web documents and is recursive, since the web document it defines is used in the body of the second rule.

(Q10) Get all the URLs together with their labels reachable from the page:
  ur : {$L⟨$U⟩} :– ul : {*$L⟨$U⟩}
  ur : {$L⟨$U⟩} :– ur : {$L1⟨$U1⟩}, $U1 : {*$L⟨$U⟩}

In order to demonstrate the expressive power of WebQL, let us consider the CIA World Factbook 2000 at http://www.odci.gov/cia/publications/factbook. This web server contains detailed information in HTML format about each country in the world, such as its location, geographic coordinates, area, land boundaries, population, etc. We can view the web server as a set of web documents, and therefore we can query them and infer useful information. Figure 3 shows part of these web documents in a simplified form.

CanadaURL : {
  Title ⇒ Canada,
  Body ⇒ { Geographic ⇒ { Land boundaries ⇒ {
    border countries ⇒ {US}, ... }, ... }, ... }
}
USURL : {
  Title ⇒ US,
  Body ⇒ { Geographic ⇒ { Land boundaries ⇒ {
    border countries ⇒ {Canada, Mexico}, ... }, ... }, ... }
}
MexicoURL : {
  Title ⇒ Mexico,
  Body ⇒ { Geographic ⇒ { Land boundaries ⇒ {
    border countries ⇒ {US, ...}, ... }, ... }, ... }
}
GermanyURL : { ... }
FranceURL : { ... }
...

Figure 3. CIA World Factbook
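The recursive query Q9 can be evaluated bottom-up by applying its first rule once and then its second rule repeatedly until no new URL is derived. The following Python sketch does this semi-naively: each round only follows the URLs derived in the previous round. It assumes a hypothetical input, a dict mapping each URL to the set of anchors found anywhere in its document (what ul : {*⟨$X⟩} would match); fetching and converting real pages is not modelled. Because the input is just a graph, the same function also applies to the border lists visible in Figure 3, in the spirit of query Q14.

```python
def reachable(web, start):
    """Semi-naive bottom-up evaluation of query Q9:
         ur : {$X} :- ul : {*⟨$X⟩}
         ur : {$X} :- ur : {$Y}, $Y : {*⟨$X⟩}
    `web` maps a URL to the set of anchors found anywhere in its document;
    URLs absent from `web` are treated as documents without links."""
    result = set(web.get(start, ()))   # first rule
    delta = set(result)                # facts derived in the last round
    while delta:                       # second rule, until a fixpoint
        new = set()
        for y in delta:
            new |= set(web.get(y, ())) - result
        result |= new
        delta = new
    return result


# Hypothetical toy web: page b22 links back to the start page ul.
toy_web = {"ul": {"a1", "b2"}, "b2": {"b21", "b22"}, "b22": {"ul"}, "a1": set()}
print(sorted(reachable(toy_web, "ul")))  # ['a1', 'b2', 'b21', 'b22', 'ul']

# Border lists visible in Figure 3, with the elided entries ignored.
borders = {"Canada": {"US"}, "US": {"Canada", "Mexico"}, "Mexico": {"US"}}
print(sorted(reachable(borders, "Canada")))  # ['Canada', 'Mexico', 'US']
```

The start page appears in the first result only because the hypothetical page b22 links back to it; likewise Canada is reachable from itself through US.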
(Q11) Find countries that border both Germany and France:
  uo : {Answer ⇒ {$N}} :–
      $U : {Title ⇒ $N,
            *border countries ⇒ {Germany, France}}

(Q12) Find countries that border Germany but not France:
  uo : {Answer ⇒ {$N}} :–
      $U : {Title ⇒ $N,
            *border countries ⇒ {Germany},
            ¬*border countries ⇒ {France}}
Note that this query involves negation.

(Q13) Find pairs of countries that border the same countries:
  uo : {Answer ⇒ {Country1 ⇒ $N1, Country2 ⇒ $N2}} :–
      $U1 : {Title ⇒ $N1, *border countries ⇒ $Cs},
      $U2 : {Title ⇒ $N2, *border countries ⇒ $Cs},
      $N1 ≠ $N2

(Q14) Find all the countries that can be reached from Canada by land transportation:
  ur : {Answer ⇒ {$C}} :–
      $U : {Title ⇒ Canada, *border countries ⇒ {$C}}
  ur : {Answer ⇒ {$C}} :–
      ur : {Answer ⇒ {$X}},
      $U : {Title ⇒ $X, *border countries ⇒ {$C}}
Note that this is another recursive query.

4 Semantics of WebQL

In this section, we define the Herbrand-like logical semantics for WebQL queries.

Definition 6 The Herbrand universe U_H of WebQL is the set of all ground terms that can be formed.

In other words, U_H is the domain of all possible objects.

Definition 7 The Herbrand base B_H of WebQL is the set of all ground web documents that can be formed using terms in U_H.

That is, B_H is the set of all possible web documents that can be formed.

Definition 8 A web database WD is a subset of B_H.

In other words, a web database is a set of web documents. For example, the CIA World Factbook shown in Figure 3 is a web database. The whole World Wide Web is also a web database.

Definition 9 A ground substitution θ is a mapping from the set of web variables V − {$} to U_H.

The use of anonymous variables allows us to simplify our query rules, as demonstrated in the examples above. However, when we deal with semantics, we disallow anonymous variables. We assume that each appearance of the anonymous variable is replaced by a non-anonymous variable that never occurs in the query rules. This is why we do not map the anonymous variable $ to any object in the above definition.

In order to define the semantics, we now introduce the following auxiliary notions.

Definition 10 An object o is part-of an object o′, denoted by o ⊑ o′, if and only if one of the following holds:
(1) both are constants and o = o′;
(2) both are linking objects such that one of the following holds:
    (a) o ≡ l⟨⟩ and o′ ≡ l′⟨u⟩ such that l ⊑ l′;
    (b) o ≡ l⟨u⟩ and o′ ≡ l′⟨u⟩ such that l ⊑ l′;
    (c) o ≡ ⟨u⟩ and o′ ≡ l′⟨u⟩;
(3) both are attributed objects, o ≡ a ⇒ v and o′ ≡ a′ ⇒ v′, such that a ⊑ a′ and v ⊑ v′;
(4) both are bag objects such that for each oi ∈ o − o′, there exists o′i ∈ o′ − o such that oi ⊑ o′i.

The part-of relationship between objects o and o′ captures the fact that o is part of o′.

Example 5 The following are several examples:
  Faculty ⊑ Faculty
  Faculty ⊑ Faculty⟨fac.html⟩
  ⟨fac.html⟩ ⊑ Faculty⟨fac.html⟩
  Programs ⇒ {M.Sc Program} ⊑ Programs ⇒ {Ph.D Program, M.Sc Program}
  {Title ⇒ CS Dept} ⊑ {Title ⇒ CS Dept, Faculty⟨fac.html⟩}

We need this notion because ground positive expressions in the body of a query should always be part of some web document. Thus, we extend the part-of relationship to web documents and web databases as follows.

Definition 11 Let W ≡ u : t and W′ ≡ u′ : t′ be two web documents. Then W is part-of W′, denoted by W ⊑ W′, if and only if u = u′ and t ⊑ t′.

Definition 12 Let DB and DB′ be two web databases. Then DB is part-of DB′, denoted by DB ⊑ DB′, if and only if for each W ∈ DB − DB′, there exists W′ ∈ DB′ − DB such that W ⊑ W′.

Definition 13 Let DB be a web database. The notion of satisfaction (denoted by ⊨) and its negation (denoted by ⊭) based on DB are defined as follows.
(1) For a ground positive expression u : t, DB ⊨ u : t if and only if there exists u : t′ ∈ DB such that t ⊑ t′.
(2) For a ground negative expression ¬u : t, DB ⊨ ¬u : t if and only if DB ⊭ u : t.
(3) For each ground arithmetic, string, or bag operation expression φ, DB ⊨ φ if and only if φ is satisfied in the usual sense.
(4) For a rule r of the form A :– L1, ..., Ln, DB ⊨ r if and only if for every ground substitution θ, DB ⊨ θL1, ..., DB ⊨ θLn implies DB ⊨ θA.

In other words, a ground positive expression is satisfied if and only if it is part of a web document in the web database; a ground negative expression is satisfied if and only if it is not part of a web document; and a rule is satisfied if, for each ground substitution that makes the body of the rule satisfied, there is a web document in the database that satisfies the head of the rule.

Example 6 Let DB denote the web database containing the DBLP web document in Example 4. Then we have
  DB ⊨ ul : {Search ⇒ {Author⟨⟩}}
  DB ⊨ ul : {Search ⇒ {Title⟨⟩}}
  DB ⊨ ul : {Search ⇒ {Advanced⟨⟩}}
  DB ⊨ ¬ul : {Search ⇒ {SIGMOD⟨⟩}}
  DB ⊨ ¬ul : {Journals ⇒ {Author⟨⟩}}
  DB ⊨ 6 = 3 * 2
  DB ⊨ John Smith = John + Smith
  DB ⊨ John ∈ {John, Mary, Tony}

Note that, given a query that is a set of safe rules, an existing web database in general does not satisfy the heads of these rules. We have to generate a new web document using the rules so that the new web document and the existing web database together satisfy the query.

Definition 14 Let Q be a query. A model M of Q is a web database that satisfies Q. A model M of Q is minimal if and only if for each model N of Q, M ⊑ N.

As in deductive databases, we are interested in a minimal model of the query that can be computed bottom-up. We first introduce several auxiliary notions.

Definition 15 Let DB be a web database and Q a set of rules. The immediate logical consequence operator T_Q over DB is defined as follows:
  T_Q(DB) = {θA | A :– L1, ..., Ln ∈ Q and there exists a ground substitution θ
                  such that DB ⊨ θL1, ..., DB ⊨ θLn}

Example 7 Consider query Q4 in Section 3 and the database DB above. We have
  T_Q4(DB) = { uo : {Answer ⇒ {Author}},
               uo : {Answer ⇒ {Title}},
               uo : {Answer ⇒ {Advanced}},
               uo : {Answer ⇒ {Home Page Search}} }

Note that the operator T_Q does not perform grouping. Therefore, we introduce the following notions.

Definition 16 Two objects o and o′ are compatible if and only if one of the following holds:
(1) both are constants and are equal;
(2) o ≡ a ⇒ v and o′ ≡ a ⇒ v′ such that v and v′ are compatible;
(3) both are bag objects.
A set of objects is compatible if and only if each pair of them is compatible.

Example 8 The following pairs are compatible:
  Author and Author
  {Author} and {Title}
  Answer ⇒ {Author} and Answer ⇒ {Title}

Definition 17 Two web documents u : t and u′ : t′ are compatible if and only if u = u′ and t and t′ are compatible. A set of web documents is compatible if and only if each pair of them is compatible.

Example 9 The following set of web objects is compatible:
  uo : {Answer ⇒ {Author}},
  uo : {Answer ⇒ {Title}},
  uo : {Answer ⇒ {Advanced}},
  uo : {Answer ⇒ {Home Page Search}}

Definition 18 Let S be a set of (web) objects and S′ a compatible subset of S. Then S′ is a maximal compatible set in S if there does not exist a (web) object o ∈ S − S′ that is compatible with each object in S′.

Definition 19 Let S be a set of objects. The grouping operator G is defined recursively on S as follows:
(1) If S is a singleton set S = {o}, then G(S) = o.
(2) If S is a compatible set of attributed objects S = {a ⇒ v1, ..., a ⇒ vn}, then G(S) = a ⇒ G({v1, ..., vn}).
(3) If S is a set of bag objects, then G(S) = {G(S′) | S′ is a maximal compatible subset of {o | o ∈ s and s ∈ S}}.
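Definitions 16 to 19 can be read operationally. The following Python sketch is one simplified reading under stated assumptions: `Attr` is a hypothetical class for attributed objects, tuples stand for bag objects, and every other value is treated as a constant compared by equality (linking objects would fall into this case, which Definition 16 does not address explicitly). In this encoding compatibility is reflexive, symmetric and transitive, so the maximal compatible subsets of Definition 18 are its equivalence classes and can be formed in one greedy pass. Equal objects are collapsed, since Definition 19 is stated over sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attr:
    """An attributed object attribute ⇒ value (hypothetical encoding)."""
    attribute: object
    value: object


def is_bag(o):
    """Bag objects are encoded as tuples in this sketch."""
    return isinstance(o, tuple)


def is_constant(o):
    return not (is_bag(o) or isinstance(o, Attr))


def compatible(o1, o2):
    """Definition 16, treating non-bag, non-attributed values as constants."""
    if is_constant(o1) and is_constant(o2):
        return o1 == o2                                   # (1)
    if isinstance(o1, Attr) and isinstance(o2, Attr):     # (2)
        return o1.attribute == o2.attribute and compatible(o1.value, o2.value)
    return is_bag(o1) and is_bag(o2)                      # (3)


def maximal_compatible_subsets(objects):
    """Definition 18: compatibility is an equivalence relation here, so each
    object joins the first class whose representative it is compatible with."""
    classes = []
    for o in objects:
        for cls in classes:
            if compatible(cls[0], o):
                cls.append(o)
                break
        else:
            classes.append([o])
    return classes


def group(objects):
    """The grouping operator G of Definition 19 on a compatible collection."""
    if all(o == objects[0] for o in objects):             # (1) a singleton set
        return objects[0]
    if all(isinstance(o, Attr) for o in objects):         # (2) same attribute
        return Attr(objects[0].attribute, group([o.value for o in objects]))
    if all(is_bag(o) for o in objects):                   # (3) pool, then regroup
        pooled = [e for bag in objects for e in bag]
        return tuple(group(cls) for cls in maximal_compatible_subsets(pooled))
    raise ValueError("group() expects a compatible collection")


# The bag parts of the four web documents in Example 9 (URL uo omitted).
docs = [(Attr("Answer", (label,)),)
        for label in ["Author", "Title", "Advanced", "Home Page Search"]]
print(group(docs))
```

The printed result is a one-element tuple holding Answer ⇒ (Author, Title, Advanced, Home Page Search), which corresponds to the last step of Example 10.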
It is extended to a set of web objects as follows:
(1) If S is a compatible set of web objects of the form u : o1, ..., u : on, then G(S) = u : G({o1, ..., on}).
(2) If S is divided into maximal compatible subsets S1, ..., Sn such that S = S1 ∪ ... ∪ Sn, then G(S) = G(S1) ∪ ... ∪ G(Sn).

Definition 20 The powers of the operator T_Q over the web database DB are defined as follows:
  T_Q ↑ 0(DB) = DB
  T_Q ↑ n(DB) = T_Q(G(T_Q ↑ (n−1)(DB))) ∪ T_Q ↑ (n−1)(DB)
  T_Q ↑ ω(DB) = ∪_{n=0}^{∞} T_Q ↑ n(DB)

Example 10 Continuing with Example 7, we have
  G(T_Q4 ↑ ω(DB))
    = DB ∪ G({ uo : {Answer ⇒ {Author}},
               uo : {Answer ⇒ {Title}},
               uo : {Answer ⇒ {Advanced}},
               uo : {Answer ⇒ {Home Page Search}} })
    = DB ∪ uo : G({ {Answer ⇒ {Author}},
                    {Answer ⇒ {Title}},
                    {Answer ⇒ {Advanced}},
                    {Answer ⇒ {Home Page Search}} })
    = DB ∪ uo : { Answer ⇒ G({ {Author}, {Title}, {Advanced},
                               {Home Page Search} }) }
    = DB ∪ uo : { Answer ⇒ {Author, Title, Advanced, Home Page Search} }
which contains the original database DB plus a new web document that satisfies the head of the rule for query Q4.

Theorem 1 Let DB be a web database and Q a set of query rules. Then G(T_Q ↑ ω(DB)) is a minimal model of Q.

The semantics of rules in a rule-based language is usually given by the minimal model of the rules, since non-minimal models may contain things that cannot be derived. We do the same for WebQL.

Definition 21 Let DB be a web database and Q a set of query rules. Then the semantics of Q under DB is given by G(T_Q ↑ ω(DB)).

Therefore, given a recursive query, we just need to compute its fixpoint bottom-up and construct the web document that satisfies the rules in the query.

Figure 4. System Architecture (components: Textual Interface, Browser Interface, Query and Inference Processor, Local Data Repository, Intelligent Wrapper, World Wide Web)

5 Implementation

The WebQL language presented in this paper is part of our web search and inference system project, which is currently under implementation at the University of Regina. The architecture of the system being implemented is shown in Figure 4.

The system is organized into four layers. The first layer is the entire World Wide Web.

The second layer is the intelligent wrapper. It accesses the World Wide Web through the Internet and extracts structure and data, which are stored in the local data repository with proper indexing support for efficient query processing. It converts between web documents in HTML/XML and web documents in WebQL, allows the user to adjust web documents by adding or removing attributes and maintains such adjustment information in the local data repository, and caches web documents in the local data repository to speed up query answering.

The third layer is the query and inference processor, which is mainly in charge of query processing. It communicates with the user interface layer and uses the data in the local data repository to process user queries. For recursive queries, it uses semi-naive bottom-up fixpoint computation to generate the result. Simple keyword-based search is also supported by the query and inference processor.

The fourth layer is the user interface. Two kinds of user interfaces are provided: textual user interface and browser
user interface. They provide different kinds of environments for the user to express queries and view the results. They accept user commands and queries, display web documents like lynx and netscape respectively, display web documents converted by the intelligent wrapper, and invoke the query and inference processor to process queries. They also provide various templates to generate web documents in HTML/XML for query results.

6 Conclusion

In this paper, we have presented WebQL, a rule-based language for querying HTML documents over the web based on the conceptual model proposed in [13]. Unlike other web query languages, WebQL provides a simple but very powerful way to query both the structure and contents of HTML documents and to restructure the results. As mentioned in the implementation section, this query language can also be used to query XML documents, since XML documents can be converted into our web documents much more easily than HTML documents. We have also defined a bottom-up fixpoint semantics for WebQL. The system that supports WebQL is currently under implementation and will soon be available from the web page at http://www.cs.uregina.ca/mliu/WebQL/.

We would like to extend the functionality of WebQL by adding other useful features to make it a really useful tool for web query and inference, and to investigate the computability and complexity issues of WebQL queries. Using WebQL, we would also like to develop data extraction tools and data integration tools based on the method proposed in [14]. Our objective is to build an intelligent web search engine on top of the query and inference system.

Acknowledgments The research was partially supported by grants from the Natural Sciences and Engineering Research Council of Canada (NSERC). The authors are also grateful to Yibin Su for implementing the system.

References

[1] S. Abiteboul. Querying Semistructured Data. In Proceedings of the International Conference on Data Base Theory, pages 1–18. Springer-Verlag LNCS 1186, 1997.
[2] S. Abiteboul, D. Quass, J. McHugh, J. Widom, and J. L. Wiener. The Lorel Query Language for Semistructured Data. Intl. Journal of Digital Libraries, 1(1):68–88, 1997.
[3] G. Arocena and A. Mendelzon. WebOQL: Restructuring Documents, Databases and Webs. In Proceedings of the International Conference on Data Engineering, pages 24–33. IEEE Computer Society, 1998.
[4] C. Beeri, S. Naqvi, O. Shmueli, and S. Tsur. Set Construction in a Logic Database Language. Journal of Logic Programming, 10(3,4):181–232, 1991.
[5] P. Buneman, S. Davidson, G. Hillebrand, and D. Suciu. A Query Language and Optimization Techniques for Unstructured Data. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 505–516, 1996.
[6] D. Konopnicki and O. Shmueli. W3QS: A Query System for the World-Wide Web. In Proceedings of the International Conference on Very Large Data Bases, pages 54–65, Zurich, Switzerland, 1995. Morgan Kaufmann Publishers, Inc.
[7] M. Fernandez, D. Florescu, A. Levy, and D. Suciu. A Query Language for a Web-Site Management System. SIGMOD Record, pages 4–11, 1997.
[8] D. Florescu, A. Levy, and A. Mendelzon. Database Techniques for the World-Wide Web: A Survey. SIGMOD Record, 26(3), 1997.
[9] R. Himmeröder, G. Lausen, B. Ludäscher, and C. Schlepphorst. On a Declarative Semantics for Web Queries. In Proceedings of the International Conference on Deductive and Object-Oriented Databases, pages 386–398, Switzerland, 1997. Springer-Verlag LNCS.
[10] L. V. S. Lakshmanan, F. Sadri, and I. N. Subramanian. A Declarative Language for Querying and Restructuring the Web. In Proceedings of the 6th International Workshop on Research Issues in Data Engineering, 1996.
[11] M. Liu. ROL: A Deductive Object Base Language. Information Systems, 21(5):431–457, 1996.
[12] M. Liu. Relationlog: A Typed Extension to Datalog with Sets and Tuples. Journal of Logic Programming, 36(3):271–299, 1998.
[13] M. Liu and T. W. Ling. A Conceptual Model for the Web. In Proceedings of the International Conference on Conceptual Modeling (ER 2000), Salt Lake City, October 9–12, 2000. Springer-Verlag LNCS.
[14] M. Liu and T. W. Ling. A Data Model for Semistructured Data with Partial and Inconsistent Information. In Proceedings of the International Conference on Advances in Database Technology (EDBT 2000), pages 317–331, Konstanz, Germany, March 27–31, 2000. Springer-Verlag LNCS 1777.
[15] A. Mendelzon, G. Mihaila, and T. Milo. Querying the World Wide Web. In Proceedings of the First International Conference on Parallel and Distributed Information Systems, pages 80–91, 1996.
[16] A. O. Mendelzon and T. Milo. Formal Models of Web Queries. In Proceedings of the ACM Symposium on Principles of Database Systems, 1997.
[17] J. D. Ullman. Principles of Database and Knowledge-Base Systems, volume 1. Computer Science Press, 1988.
