

Optimizing Queries on Files
Mariano P. Consens
University of Waterloo
Waterloo, Canada N2L 3G1
mconsens@uwaterloo.ca

Tova Milo
CSRI, University of Toronto
Toronto, Canada M5S 1A1
milo@db.toronto.edu

An earlier version was published in Proceedings of the 1994 ACM SIGMOD Conference.

Abstract

We present a framework which allows the user to access and manipulate data uniformly, regardless of whether it resides in a database or in the file system (or in both). A key issue is the performance of the system. We show that text indexing, combined with newly developed optimization techniques, can be used to provide an efficient high level interface to information stored in files. Furthermore, using these techniques, some queries can be evaluated significantly faster than in standard database implementations. We also study the tradeoff between efficiency and the amount of indexing.

1 Introduction

Database systems provide powerful features for manipulating large quantities of data. They do not, however, provide access to data stored in files outside the database system. A large portion of the information in a computerized environment resides in the file system, and not in databases. This includes, in particular, semi-structured textual information such as: electronic documents, programs, log files, online newspapers, patent information, literature citations, business profiles, and e-mail. The tools available for manipulating such files do not provide high level query and update facilities as exist in database systems. The purpose of this research is to bridge the gap between databases and the surrounding environment, and to provide a uniform framework where data can be accessed and manipulated, regardless of whether it resides in a database or in the file system (or in both). Similar motivations are presented in [SLS+93, ACM93, Sch93, GNOT92, BGMM93, Pae93, BGH+92].

In order to allow data stored in files to benefit from standard database technology, and in particular be queried and updated using database languages, we need to address two issues. First, we need to define a mapping between files and databases. This mapping should allow the effective translation of queries and updates on databases to operations on files, and vice versa. An example of such a mapping is given in [ACM93]. Second, we want to use this mapping to provide an efficient high level database interface to files.

A key issue is the performance of the system. To have feasible execution of queries and updates on files, the query evaluation and optimization mechanisms used in standard databases must be extended. In this paper we concentrate on query evaluation. To answer queries on files, one would like to avoid scanning the whole file system. We show in the paper that this can be achieved using advanced text indexing techniques. The idea is to translate high level queries on files into low level expressions that manipulate indices, and then evaluate these expressions efficiently using the engine of an indexing system. We do not address the issues of building and maintaining the indices; we assume that this is a service given by the underlying text indexing system. In this work we use the Pat [footnote 1] text indexing system [Ope93], and translate database queries on files to expressions in the Pat algebra [Gon87, ST92]. Specifically, we show how word indexing and region indexing can be combined with extended database query optimization to provide efficient access to files. Using these techniques, some queries can be evaluated significantly faster than in standard database implementations.

Footnote 1: Pat is a registered trademark of Open Text Corporation.

Database queries may be expressed by several index expressions, each affording a different degree of efficiency. Clearly, we want to evaluate a query using the most efficient expression. We therefore study optimization of index expressions, and present a polynomial optimization algorithm that finds the most efficient expression that is equivalent to a given one.

As in standard database systems, it turns out that there is a tradeoff between efficiency and the amount of data being indexed. We study how the selection of specific indices affects the amount of data that actually needs to be scanned when answering a query. We also present guidelines for a profitable choice of indices.

We show that there are cases where database queries can be fully computed using the indexing engine. In other cases, the indices are not sufficient for full computation, but can be used to locate file regions that are potentially relevant for the query computation, and thus save on file scanning. The potentially relevant regions must then be further processed, in which case we describe how extended database optimization techniques can be used to improve the performance.

The ideas presented in this paper can be used to build a unified and efficient interface to databases and file systems. The main advantage is that the system can be implemented on top of any suitable DBMS, file system, and indexing system, and can use any standard parser for the mapping between the file and the database.

In particular, the Hy+ System [CM93] is an example of a system that demonstrates how deductive database technology can be combined with Pat to evaluate a mixture of traditional and textual queries expressed in the visual language GraphLog [Con89, CM90]. One of the specific applications of Hy+ where textual queries were certainly useful was the querying and visualization of software engineering data. A discussion of the implementation approach taken to integrate Pat into Hy+ can be found in [Yeu93]. In addition, the optimization techniques presented here were tested on a prototype system combining Pat with the O2 database system [BCD89], and the Yacc parser [AJ74].

In Section 2 we present an example used in the rest of the paper. Section 3 presents the indexing engine, and the main properties of the index algebra used in the optimization process. Section 4 introduces structuring schemas, a tool for specifying how a file should be interpreted in a database. The following sections describe the optimization process. Section 5 considers full indexing. Section 6 considers partial indexing, and Section 7 provides guidelines for a profitable choice of indices. Finally, we conclude in Section 8.

2 Example

We are interested in files that have strong inner structure (e.g., electronic documents, programs, SGML files). Our goal is to use the inner structure of files for providing a high level and efficient interface to the information they contain. Bibliography files constitute an example of semi-structured data with which all researchers are well acquainted. The text in Figure 1 describes one bibliographic entry in the familiar BibTeX format [Lam85].

    @INCOLLECTION{ Corl82a,
    AUTHOR = "G. F. Corliss and Y. F. Chang",
    TITLE = "Solving Ordinary Differential
             Equations Using Taylor Series",
    BOOKTITLE = "Automatic Differentiation Algorithms",
    YEAR = "1982",
    EDITOR = "A. Griewank and G. F. Corliss",
    PUBLISHER = "SIAM",
    ADDRESS = "Philadelphia, Penn.",
    PAGES = "114--144",
    REFERRED = "[Aber88a]; [Corl88a]; [Gupt85a].",
    KEYWORDS = "point algorithm; Taylor series;
                radius of convergence;",
    ABSTRACT = "A Fortran pre-processor uses automatic
                differentiation to write a Fortran
                program to solve the system."}

Figure 1: A sample bibliographic reference in BibTeX.

There is a multitude of bibliographic files that are available on the Internet. Even at the local level, it is common that each one of the members of a research group keeps several such files on a variety of subjects, and all of the members may share access to those bibliographies. In this familiar scenario, there is a strong motivation for being able to express queries against the information in those files.

It is easy to see that bibliography files have a well defined complex structure. We can find (i) atomic fields like YEAR, (ii) set valued fields like KEYWORDS, (iii) nested structures like AUTHOR and EDITOR that are sets of names of people, where each name consists of a first and last name, (iv) references to other entries like each of the elements in the set valued field REFERRED, and (v) chunks of unstructured text, both long (e.g., ABSTRACT) and short (e.g., TITLE and BOOKTITLE).

Viewing this information as a database provides both modeling and processing benefits. We assume in the following basic knowledge about object oriented databases. We use below the data model and query language of XSQL [KKS92]. With suitable variations we could have used any other object oriented data
model and language. A bibliography database may contain classes like: References, Authors, and Editors. Every reference object may have atomic attributes like Year, and set attributes like Authors (where each Author has First Name and Last Name attributes). This bibliography database can support complex queries like "Select the names of editors who never wrote a paper with any of the keywords occurring in a book that they edited".

Note that this query can be easily formulated in XSQL, but cannot be directly expressed by standard text search tools (e.g. grep) that work directly on the file, and not even by most text retrieval systems, since those systems do not support join-like operations. Our research, thus, has two goals: (i) enabling users to view files as databases, and (ii) using this database view for manipulating the files.

It was shown in [ACM93] that structuring schemas can be used to specify how data stored in a file should be interpreted in a database. The main question is how to avoid scanning the whole file system when answering queries over such a database view. We show below that this unnecessary effort can be avoided if some information about the file content and structure is indexed and maintained by the system. We start by giving an intuition of the process with a very simple query. We then present in the next sections the full optimization algorithm.

Suppose that we want to find references where "Chang" is one of the authors. This can be formulated by an SQL query

    Q = SELECT r FROM References r
        WHERE r.Authors.Name.Last Name = "Chang"

Assume that we pre-process the file, and build two kinds of indices: a word index, recording the location(s) of all the words in the file, and region indices, recording the location of various regions in the file. Assume that we built 3 region indices. The first records the locations of Reference regions (i.e. where each reference in the file starts, and where it ends). The second records the locations of regions corresponding to Authors (i.e. regions starting with AUTHOR= and ending with a comma). The third index records locations of Last-Names.

This pre-processed information can be used to locate the references required by the query without actually accessing the BibTeX file. The references required by the query are exactly those corresponding to Reference regions that include some Authors region that includes a Last-Name region that contains the word "Chang". Note that the Pat text indexing system has facilities for indexing words and regions, and it provides a very efficient inclusion test. Thus, using a text indexing system for the pre-processing stage, and for the computation, results in a significant performance advantage.

In the above example all the necessary information (i.e. words, references, authors, and last-names) was indexed in advance. In general we would like to index as little as possible, to save on space and update cost. Assume, for example, that we decide not to index the authors regions. Note that a reference region contains two kinds of last names: last names of authors, and last names of editors. If the authors regions are not indexed, then it is impossible to distinguish between references where Chang is an author and those where Chang is only an editor, without actually accessing the file. Thus the query cannot be fully computed using the indexing engine.

However, the indexed information can still be used to improve performance. In particular, it can be used to locate the references that are "potentially relevant" for the query computation, and thus save on file scanning. The Reference regions that include some Last-Name region that is the word "Chang" are a superset of the required references (in those references, Chang is either an author or an editor). The number of these potentially relevant references is significantly smaller than the number of all the references in the file system. Thus scanning those references (in order to filter out the irrelevant references), instead of scanning the whole file system, provides a big performance improvement. Furthermore, we show in the paper that these potential regions can be processed efficiently using various optimization techniques.

Another way to reduce the amount of indexed data is to use selective indexing. Assume that users often query names of authors, but never (or hardly ever) query names of editors. In that case, instead of indexing all the last-name regions it is better to index only last names of authors. Selective indexing can also be done for words. As in standard database systems, we show that there is a tradeoff between efficiency and the amount of data being indexed.

In the next sections we explain how queries on the database view of files are transformed to expressions manipulating word and region indices. Then we show how these expressions can be optimized. We consider full and partial indexing and the performance acceleration gained by them.

3 Indexing and Optimization

In this section we provide a brief overview of the Pat algebra [ST92], the language used by the Pat text retrieval system [Ope93]. Pat combines traditional
text search capabilities (lexical, proximity, contextual, boolean, see [SM83]) with some original powerful features (position and frequency search). In particular, we are interested in Pat's ability to manipulate regions of text (in the sense that we discussed informally in the previous section). Regions are a generalization of the concepts of document and field usually found in conventional information retrieval systems. Similar modelling capabilities are described in [Bur92].

The first subsection presents a subset of the Pat algebra (with some extensions) that deals with regions. In the second part of this section we deal with a powerful optimization technique for expressions in this algebra.

3.1 The Region Algebra

Pat is a set-at-a-time algebra for text queries. There are two types of sets in the algebra: sets of match points (specific positions in the text) and sets of regions. The match points correspond to the position in the text of indexed strings (the entries of the word index referred to earlier). Each region is a substring of the indexed text, and is defined by a pair of positions in the text corresponding to the beginning and end of the region. We use the notation $r \supset s$, where $r, s$ are two regions, to denote the fact that the region $r$ includes the region $s$ (i.e., the endpoints of $s$ are within those of $r$).

To simplify the presentation (and highlight the aspects of the Pat algebra that are of interest to us), we describe below a subset of the algebra (with some extensions) that concentrates on the manipulation of sets of regions. We call this algebra the region algebra. In particular, we assume that we are given a specific set of named regions on the indexed text [footnote 2].

Footnote 2: Note that the full Pat algebra is capable of constructing sets of regions dynamically. From the point of view of this work we can treat regions defined dynamically as if they were views.

A region index $\mathcal{I}$ is a set of region names $R_1, \ldots, R_n$. An instance of a region name $R_i$ is a set of regions in a file (with no restrictions on overlaps). An instance $I$ of a region index $\mathcal{I}$ is a mapping associating an instance $R_i(I)$ to each region name $R_i$. As a notational convenience when $I$ is understood from the context, we use $R_i$ for both the region name and the instance $R_i(I)$. Region expressions over $\mathcal{I}$ are expressions generated by the grammar

    $e \rightarrow R_i \mid e \cup e \mid e \cap e \mid e - e \mid \sigma_w(e) \mid \iota(e) \mid \omega(e) \mid e \supset e \mid e \subset e \mid e \supset_d e \mid e \subset_d e \mid (e)$

where the terminals $R_i$ are the region names in $\mathcal{I}$.

Given a region expression $e$ and an instance $I$, $e(I)$ denotes the result of evaluating $e$ on $I$, where the semantics are as follows. The operations $\cup$ (union), $\cap$ (intersection), and $-$ (difference) are the usual set theoretic operations on sets of regions. The selection operation $\sigma_w$ takes a set of regions $R$ and returns the regions $r \in R$ containing (exactly) the word $w$. (The selection is implemented by combined usage of the word and region indices.) The innermost $\iota$ (resp. outermost $\omega$) operation takes a set of regions $R$ and returns the $r \in R$ such that there is no $r' \in R$, $r' \neq r$, for which $r \supset r'$ (resp. $r' \supset r$).

The $\supset$ (including) and $\subset$ (included) operations take two sets of regions $R$ and $S$ and return the sets of regions

    $R \supset S = \{ r \in R : \exists s \in S,\ r \supset s \}$
    $R \subset S = \{ r \in R : \exists s \in S,\ s \supset r \}$

Finally, the $\supset_d$ (directly including) and $\subset_d$ (directly included) operations are a refinement of $\supset$ and $\subset$ resp. The operation $\supset_d$ ($\subset_d$) selects regions $r \in R$ that directly include (are directly included in) a region $s \in S$, i.e. there is no other indexed region between $r$ and $s$. More formally,

    $R \supset_d S = \{ r \in R : \exists s \in S,\ r \supset s \wedge \neg \exists t \in T,\ T \in \mathcal{I},\ r \supset t \supset s \}$
    $R \subset_d S = \{ r \in R : \exists s \in S,\ s \supset r \wedge \neg \exists t \in T,\ T \in \mathcal{I},\ s \supset t \supset r \}$

We show below how $\supset_d$ can be computed using the other algebra operators, by an algorithm that additionally uses a while construct ($\subset_d$ can be computed similarly). The main objective of this presentation is to give intuition about the cost of this operation, and in particular to show that it is significantly more expensive than the simple inclusion operation $\supset$.

The program, which takes as input two sets of regions $R, S$ and produces as output $R_{result} = R \supset_d S$, basically iterates over nested layers of $R$ regions, and for each layer selects the $R$ regions of the layer that directly include an $S$ region.

    $R_{layer} := \omega(R)$;  $R_{rest} := R - R_{layer}$;  $R_{result} := \emptyset$;
    while $(R_{layer} \supset S) \neq \emptyset$ do
        $R_{result} := R_{result} \cup (R_{layer} \supset (S - (\bigcup_{T \in \mathcal{I} - \{S\}} (S \subset T \subset R_{layer}))))$;
        $R_{layer} := \omega(R_{rest})$;  $R_{rest} := R_{rest} - R_{layer}$;
    end
    return $R_{result}$

Note that $\supset$, $\supset_d$, $\subset$, $\subset_d$ are not associative. For brevity, we omit parentheses and assume that the operations are grouped from the right. The following is
an example of a region expression. Consider the region index $\mathcal{I}$ = {Reference, Key, Authors, Editors, Name, First Name, Last Name}, that can be defined for BibTeX files. The expression

    $(\text{Reference} \supset \text{Authors} \supset \sigma_{\text{"Chang"}}(\text{Last Name})) \cup (\text{Reference} \supset \text{Editors} \supset \sigma_{\text{"Corliss"}}(\text{Last Name}))$

returns the set of Reference regions that either contain an Authors region containing a Last Name region that is the word "Chang", or contain an Editors region containing a Last Name region that is the word "Corliss".

3.2 Optimizing Region Expressions

We begin by describing some of the properties of expressions in the region algebra that are the basis of the optimization technique, and are exploited throughout the rest of the paper.

Our goal is to translate database queries to region expressions, and evaluate them using the indexing engine. Note that database queries may be expressed by several different region expressions, some of which are more efficient, and some less. Clearly, we want to evaluate the query using the most efficient expression. We therefore present below an optimization algorithm that, given such an expression, finds the most efficient equivalent expression. In the rest of this section we concentrate on region expressions where all the operations are $\supset$ and $\supset_d$. (As we show later, expressions of this type are used for evaluating database queries on text files.) We call such expressions inclusion expressions.

We first observe that files of a specific format have specific inclusion relationships among regions. For instance, in our BibTeX file example, Reference regions can include Editors regions, but not vice versa. To describe such relationships between regions, we introduce a region inclusion graph (RIG, for short). The nodes of the graph are region names, and the edges state the possible inclusion relationships between the corresponding region instances. An edge $(R_i, R_j)$ is in the graph iff an $R_i$ region can directly include an $R_j$ region. The graph is used to characterize a set of instances that obey certain inclusion restrictions. In general, the RIG may contain cycles (e.g., self-nested regions).

Definition 3.1 An instance $I$ of a region index $\mathcal{I} = \{R_1, \ldots, R_n\}$ satisfies a RIG (region inclusion graph) $G = (\mathcal{I}, E)$ iff for every two regions $r_i \in R_i(I)$, $r_j \in R_j(I)$, if $r_i$ directly includes $r_j$ then $(R_i, R_j) \in E$. The set of all instances of $\mathcal{I}$ that satisfy a RIG $G$ is denoted $\mathcal{I}_G$.

We next consider equivalence of region expressions. In the standard database approach, two queries over a given schema are equivalent iff they have the same result for every instance of the database. In the context of queries in the region algebra, a RIG can be viewed as a schema. We therefore have the following definition.

Definition 3.2 Two region expressions $e_1, e_2$ are equivalent with respect to a RIG $G = (\mathcal{I}, E)$ iff for every instance $I \in \mathcal{I}_G$, $e_1(I) = e_2(I)$.

For example, let $\mathcal{I}$ = {Reference, Key, Authors, Title, Editors, Name, First Name, Last Name}, and consider the following RIG.

[Figure: region inclusion graph over the region names listed above; the drawing did not survive extraction.]
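The region operators of Section 3.1 and the satisfaction test of Definition 3.1 can be illustrated with a small sketch. It is only an illustration under stated assumptions, not the Pat implementation: regions are modelled as (start, end) offset pairs, the word index is assumed to store the span of each word occurrence, "between" in direct inclusion is read as strictly between, and all offsets, words, and edges in the sample data are hypothetical.

```python
# Regions are (start, end) character offsets. An instance maps each region
# name to a set of regions. All offsets and words below are made-up sample data.

def includes(r, s):
    """r includes s: the endpoints of s lie within those of r."""
    return r[0] <= s[0] and s[1] <= r[1]

def including(R, S):
    """R 'including' S: the regions of R that include some region of S."""
    return {r for r in R if any(includes(r, s) for s in S)}

def included(R, S):
    """R 'included' S: the regions of R that lie inside some region of S."""
    return {r for r in R if any(includes(s, r) for s in S)}

def outermost(R):
    """Regions of R that are not included in any other region of R."""
    return {r for r in R if not any(q != r and includes(q, r) for q in R)}

def innermost(R):
    """Regions of R that include no other region of R."""
    return {r for r in R if not any(q != r and includes(r, q) for q in R)}

def select(R, word, word_index):
    """Selection: regions of R that are exactly one occurrence of `word`."""
    return R & word_index.get(word, set())

def directly_includes(r, s, instance):
    """r includes s and no other indexed region lies strictly between them."""
    if r == s or not includes(r, s):
        return False
    return not any(t not in (r, s) and includes(r, t) and includes(t, s)
                   for T in instance.values() for t in T)

def direct_including(R, S, instance):
    """R 'directly including' S, computed straight from the definition."""
    return {r for r in R if any(directly_includes(r, s, instance) for s in S)}

def satisfies_rig(instance, edges):
    """Definition 3.1: every direct inclusion must be an edge of the RIG."""
    return all((ni, nj) in edges
               for ni, Ri in instance.items() for nj, Rj in instance.items()
               for ri in Ri for rj in Rj
               if directly_includes(ri, rj, instance))

# Hypothetical instance: two references; "Chang" is an author of the first
# and an editor of the second.
instance = {
    "Reference": {(0, 100), (101, 200)},
    "Authors": {(10, 40), (110, 140)},
    "Editors": {(50, 80), (150, 180)},
    "Name": {(12, 25), (27, 39), (52, 70), (112, 130), (152, 170)},
    "Last Name": {(18, 25), (33, 39), (60, 70), (120, 130), (160, 170)},
}
words = {"Chang": {(33, 39), (160, 170)}, "Corliss": {(18, 25)}}
edges = {("Reference", "Authors"), ("Reference", "Editors"),
         ("Authors", "Name"), ("Editors", "Name"), ("Name", "Last Name")}

chang = select(instance["Last Name"], "Chang", words)
# Operations are grouped from the right, as in the text.
as_author = including(instance["Reference"], including(instance["Authors"], chang))
anywhere = including(instance["Reference"], chang)
print(sorted(as_author), sorted(anywhere))
```

In this sample, dropping the Authors step returns both references, which mirrors the superset of "potentially relevant" references discussed in Section 2. The quadratic scans are chosen for readability; an indexing engine would rely on sorted region lists instead.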
Consider the expression $e_3 = \text{Reference} \supset \text{Title} \supset \text{Last Name}$. The result of $e_3$ is empty for all the instances satisfying the above inclusion graph. This is because in all those instances, no Last Name region is included in a Title region. In general we have that

Proposition 3.3 Let $\mathcal{I} = \{R_1, \ldots, R_n\}$ be a region index, and let $e$ be an inclusion expression over $\mathcal{I}$. Let $G = (\mathcal{I}, E)$ be a region inclusion graph. $e(I) = \emptyset$ for every $I \in \mathcal{I}_G$, iff at least one of the following holds:
(i) $e$ has a subexpression $R_i \supset_d R_j$, and $(R_i, R_j) \notin E$.
(ii) $e$ has a subexpression $R_i \supset R_j$, and $G$ does not contain a path from $R_i$ to $R_j$.

The proof follows immediately from the properties of the instances $I \in \mathcal{I}_G$ that satisfy $G$.

We next show that knowledge about the structure of files can be used to shorten inclusion expressions, and to replace the $\supset_d$ operation by $\supset$. W.l.o.g. we consider in the following only non trivial expressions (i.e. expressions whose result is not always empty).

Definition 3.4 Let $G = (\mathcal{I}, E)$ be a RIG, and let $e_1, e_2$ be two inclusion expressions over $\mathcal{I}$. We say that $e_2$ is more efficient than $e_1$ w.r.t. $G$, iff $e_1$ and $e_2$ are equivalent w.r.t. $G$, and $e_2$ was obtained from $e_1$ by replacing sub-expressions of the form $R_{i_1} \theta_1 R_{i_2} \ldots \theta_{n-1} R_{i_n}$ (where $\theta_i$ is $\supset$ or $\supset_d$), by $R_{i_1} \supset R_{i_n}$. We say that $e_2$ is the most efficient version of $e_1$ w.r.t. $G$, iff it is more efficient than $e_1$, and there is no other expression that is more efficient than $e_2$ w.r.t. $G$.

We show below that every inclusion expression $e_1$ has a unique most efficient version. Furthermore, we present an algorithm that computes this expression in time polynomial in the size of $e_1$. The algorithm is based on the following observations.

Proposition 3.5 Let $\mathcal{I} = \{R_1, \ldots, R_n\}$ be a region index. Let $G = (\mathcal{I}, E)$ be a RIG. Let $e, e_1, e_2$ be inclusion expressions over $\mathcal{I}$, s.t. $e_1$ is constructed from $e$ by replacing the rightmost subexpression $R_i \supset_d R_j$ by $R_i \supset R_j$, and $e_2$ is constructed from $e$ by replacing some subexpression $R_i \supset R_j \supset R_k$ by $R_i \supset R_k$.
(a) $e_1$ is equivalent to $e$ w.r.t. $G$, iff every path from $R_i$ to $R_j$ in $G$ has the edge $(R_i, R_j)$ and if there is a cycle through $R_i$ then $\supset$ precedes $R_i$ in $e$.
(b) $e_2$ is equivalent to $e$ w.r.t. $G$, iff every path from $R_i$ to $R_k$ in $G$ passes through $R_j$.

The proposition is proved by analyzing the cases that can cause ambiguity in the interpretation of the inclusion relationship between regions (details are omitted for lack of space).

The Optimization Algorithm

We present below an optimization algorithm that, given an inclusion expression $e$, computes the most efficient version of $e$. The algorithm has two steps. The first replaces $\supset_d$ operations by $\supset$, and the second shortens the expression.

1. The rightmost subexpression $R_i \supset_d R_j$ satisfying criterion (a) of Proposition 3.5 is replaced by $R_i \supset R_j$, and this step is repeated on the resulting expression until no more changes can be made.

2. All subexpressions of the form $R_i \supset R_j \supset R_k$ satisfying criterion (b) of Proposition 3.5 are replaced by $R_i \supset R_k$, and this step is repeated until no more changes can be made.

When the algorithm is applied to the expression $\text{Reference} \supset_d \text{Authors} \supset_d \text{Name} \supset_d \text{Last Name}$, the first step replaces the three $\supset_d$ operations by $\supset$ from right to left. The second step replaces $\text{Authors} \supset \text{Name} \supset \text{Last Name}$ by $\text{Authors} \supset \text{Last Name}$, obtaining $\text{Reference} \supset \text{Authors} \supset \text{Last Name}$. No more simplifications can be made due to the multiple paths from Reference to Last Name. (The inclusion of Last Name in Authors must be tested to filter out last-names of editors.)

Theorem 3.6
(i) Every inclusion expression $e$ has a unique most efficient version $e'$.
(ii) The optimization algorithm, on input $e$, computes this $e'$, in time polynomial in the size of $e$.

Proof: (sketch) We prove the theorem by showing that

• only the kind of rewriting done by the algorithm can yield a more efficient expression which is still equivalent to the original one,

• the replacement system used by the algorithm satisfies the finite Church-Rosser property (this is shown using Sethi's theorem [Set74]).

In the following sections we present a technique for translating database queries on files into inclusion expressions. The expressions are then optimized using the above algorithm and evaluated using the indexing engine.
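The emptiness test of Proposition 3.3 and the two rewriting steps above can be sketched for chain-shaped inclusion expressions. This is a minimal illustration rather than the authors' implementation. The representation (a list of region names plus a list of operators between them), the function names, and the `RIG` edge set are assumptions; the edges are inferred from the BibTeX grammar of Section 4.1, since the original figure is not available. The path conditions are checked by reachability after deleting an edge or a node, which is one possible reading of criteria (a) and (b).

```python
from collections import deque

INC, DIRECT = "includes", "directly_includes"

# Assumed RIG edges, inferred from the BibTeX grammar (not the original figure).
RIG = {
    "Reference": {"Key", "Authors", "Title", "Editors"},
    "Authors": {"Name"},
    "Editors": {"Name"},
    "Name": {"First Name", "Last Name"},
}

def reachable(graph, src, dst, skip_edge=None, skip_node=None):
    """True if dst is reachable from src along at least one edge,
    ignoring the given edge and/or intermediate node."""
    queue, seen = deque([src]), set()
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if (node, nxt) == skip_edge or nxt == skip_node:
                continue
            if nxt == dst:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def always_empty(graph, names, ops):
    """Proposition 3.3, applied to adjacent pairs of a chain expression."""
    for i, op in enumerate(ops):
        ri, rj = names[i], names[i + 1]
        if op == DIRECT and rj not in graph.get(ri, ()):
            return True
        if op == INC and not reachable(graph, ri, rj):
            return True
    return False

def criterion_a(graph, names, ops, i):
    """Every path from Ri to Rj uses the edge (Ri, Rj); if Ri lies on a
    cycle, the operator to the left of Ri must be plain inclusion."""
    ri, rj = names[i], names[i + 1]
    only_edge = (rj in graph.get(ri, ())
                 and not reachable(graph, ri, rj, skip_edge=(ri, rj)))
    on_cycle = reachable(graph, ri, ri)
    guarded = not on_cycle or (i > 0 and ops[i - 1] == INC)
    return only_edge and guarded

def criterion_b(graph, ri, rj, rk):
    """Every path from Ri to Rk passes through Rj."""
    return reachable(graph, ri, rk) and not reachable(graph, ri, rk, skip_node=rj)

def optimize(graph, names, ops):
    names, ops = list(names), list(ops)
    changed = True
    while changed:  # Step 1: weaken direct inclusions, rightmost first.
        changed = False
        for i in range(len(ops) - 1, -1, -1):
            if ops[i] == DIRECT and criterion_a(graph, names, ops, i):
                ops[i], changed = INC, True
                break
    changed = True
    while changed:  # Step 2: drop redundant intermediate regions.
        changed = False
        for i in range(len(ops) - 1):
            if ops[i] == ops[i + 1] == INC and criterion_b(graph, *names[i:i + 3]):
                del names[i + 1], ops[i + 1]
                changed = True
                break
    return names, ops

result = optimize(RIG, ["Reference", "Authors", "Name", "Last Name"], [DIRECT] * 3)
print(result)
```

On the expression from the text, the sketch weakens all three direct inclusions and then removes only Name, because the assumed graph has a second path from Reference to Last Name through Editors.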
4 Mapping Files to Databases

In this section we consider mappings between files and databases, and explain how to use such mappings for deriving a RIG for regions in a file.

4.1 Structuring Schemas

Structuring schemas were introduced in [ACM93] as a tool for specifying how the data stored in a file should be interpreted in a database. Structuring schemas enable users to view information stored in files as if it were stored in a database, and to use database query and update languages for accessing this information. We briefly describe below the main concepts. For a full discussion see [ACM93]. In the sequel, we assume standard knowledge of object-oriented databases, context-free grammars and parsing.

A structuring schema consists of a database schema and a grammar annotated with database programs. The grammar describes some of the structure of the file. The annotation specifies the relationship between the grammar non-terminals and their database representation. In particular, it specifies how a word $w$ derivable from a nonterminal $A$ should be represented in a database. This is done by associating to each derivation rule $A \rightarrow A_1, \ldots, A_n$ a statement describing how the database representation of a word derived from this rule is constructed using the database representations of the subwords derived from $A_1, \ldots, A_n$.

In the sequel, we use a Yacc-like notation [AJ74]. In a rule $A \rightarrow A_1, \ldots, A_n$, the symbol "$i" denotes the database image of the string corresponding to $A_i$, and "$$" the one associated to $A$.

The next example provides a simplistic subset of the structuring schema for BibTeX files. Every BibTeX file is represented in the database as a set of reference objects. Each such object has attributes containing the key of the reference, the title, the set of authors, etc.

The first part of the specification defines the classes and types used in the database representation. The second part associates with each non-terminal in the grammar the type/class used for representing words derived from that non-terminal. The third part describes how the words are mapped into their database representation.

    /* Classes and types */
    Class Reference =
        tuple(Key : string, Authors : set(Name),
              Title : string, ..., Editors : set(Name), ...)
    Type Name =
        tuple(First Name : string, Last Name : string)

    /* Non-terminals type definition */
    Type <Ref Set> = set(Reference)
    Type <Reference> = Reference
    Type <Key> = string
    Type <Authors> = set(Name)
    Type <Title> = string
    Type <Editors> = set(Name)
    Type <Name> = Name
    Type <First Name> = string
    Type <Last Name> = string

    /* Annotated grammar BibTeX Schema */ [footnote 3]
    <Ref Set> → <Reference>*
        {$$ := ∪ $i}
    <Reference> → "@INCOLLECTION{" <Key>
                  "AUTHOR = " <Authors>
                  "TITLE = " <Title> ...
                  "EDITOR = " <Editors> ... }
        {$$ := new(Reference, tuple(Key : $1,
                                    Authors : $2,
                                    Title : $3, ...
                                    Editors : $6, ...))}
    <Key> → string
        {$$ := $1}
    <Authors> → <Name>*
        {$$ := ∪ $i}
    <Title> → string
        {$$ := $1}
    <Editors> → <Name>*
        {$$ := ∪ $i}
    <Name> → <First Name> <Last Name>
        {$$ := tuple(First Name : $1,
                     Last Name : $2)}
    <First Name> → string
        {$$ := $1}
    <Last Name> → string
        {$$ := $1}

Footnote 3: We use the notation A → B* {$$ := ∪ $i} to denote the fact that A is a sequence (possibly empty) of B's, and that the database representation of A is a set containing the database representation of all the B's in the sequence.

Structuring schemas can be used to specify a virtual database view over files [ACM93]. To answer a query on the database view of a file, one may construct the database image of the file (i.e. parse the file using the structuring schema, construct the objects/tuples, and load them into the database), and then evaluate the query on the database. This technique will obviously lead to scanning and parsing the whole file, and constructing many unnecessary objects and complex values. This is time and space consuming and we want to avoid it. The optimization technique presented in [ACM93] reduces the amount of data loaded into the database while answering a query. But the whole file still needs to be scanned and parsed. We will show below that this unnecessary effort can be
avoided using text indexing techniques. The key observation is that word and region indices can be used to locate substrings that are potentially relevant to the query computation, and thus save on scanning the whole file when evaluating the query. Moreover, we identify cases where database queries on a file can be fully computed using the indexing engine, and the scanning of the file can be completely avoided.

4.2 Deriving a RIG from a Natural Structuring Schema

Note that the database representation of the BibTeX file is rather "natural", i.e. it is very close to the actual structure of a file. We call such structuring schemas natural schemas. The database representation using a natural schema is essentially the p-string of [GT87]. In general, the database representation may significantly differ from the file structure (for examples, see [ACM93]).

To simplify the presentation we demonstrate the optimization technique on queries over views defined using natural structuring schemas. We assume below that literals A defined using rules of the form A → B* are represented in the database by sets or lists. Literals A defined using rules of the form A → B_1 ... B_n are represented by tuples or by objects whose attributes correspond to B_1 ... B_n. We also assume that the names of attributes are the same as the names of the non-terminals they represent⁴. Terminals are represented by atomic types⁵.

⁴ This requires that every non-terminal name appears at most once in the right hand side of a rule. This is not a serious limitation since every grammar can be easily adjusted to satisfy this requirement.

⁵ When considering general context-free grammars, disjunctive types will naturally arise from non-terminals defined disjunctively. This may not be realized directly in some database systems. There are of course a variety of means of simulating such types (in particular using inheritance).

We first consider a simple case where all the words in the file are being indexed, and where the region index I = {A_1, ..., A_n} contains all the non-terminal names A_i in the grammar G, except the root of the grammar. (We defer the discussion of partial indexing to Section 6.) We also assume that each index A_i is instantiated by the set of all regions corresponding to occurrences of A_i in the parse tree of the file f using the grammar G.

The inclusion relationship between the indexed regions is determined by the grammar. In particular, a region a_i corresponding to a non-terminal A_i can directly include a region a_j corresponding to a non-terminal A_j iff the grammar G has a rule where A_i appears on the left side, and A_j on the right side. Thus, the region inclusion graph of I can be automatically derived from the grammar G. The nodes are the non-terminals of the grammar, and the graph has an edge (A_i, A_j) iff G has a rule where A_i appears on the left side, and A_j on the right side. For example, the graph in Section 3.2 is part of the RIG defined by the BibTeX grammar.

It is important to note that the optimization technique presented in the following sections is applicable to other mappings between files and databases as well. The technique depends only on the existence of a region inclusion graph describing the relationships between the indexed regions, and on the existence of a mapping between path expressions in queries and paths in the region inclusion graph. In the case of natural structuring schemas, the graph and the mapping can be automatically derived from the grammar. In the case of more general mappings, the user may need to provide this information as part of the database schema.

5 Querying Fully Indexed Files

This section describes how to translate database queries into expressions in the region algebra, assuming that all the needed regions are indexed. The last subsection highlights a class of queries that are very expensive when computed in traditional databases, but are significantly cheaper using text indexing.

5.1 From Simple Queries to Region Expressions

We first study optimization of simple queries of the form "SELECT r FROM R WHERE r.p = w", where p is some path expression, and R is a database view of a file f defined using a natural structuring schema with a grammar G.

Let References be a view defined using the BibTeX structuring schema. Consider the query

    Q = SELECT r FROM References r
        WHERE r.Authors.Name.Last_Name = "Chang"

The references retrieved by this query are exactly those that match Reference regions that directly include an Authors region, that directly includes a Name region, that directly includes a Last_Name region, that contains exactly the string "Chang". In fact, the path expression in the query corresponds to a path in the RIG of the fully indexed file. Part of the RIG is presented below. The path is denoted by dashed arrows.

[Figure: fragment of the RIG. Reference has edges to Key, Authors, Title and Editors (among others); Authors and Editors each have an edge to Name; Name has edges to First-Name and Last-Name. The dashed path is Reference → Authors → Name → Last-Name.]

Consider the parse tree of the BibTeX file, shown in Figure 2. The data used to answer the query resides in regions reachable by paths that match the path in the RIG.

It follows that the references retrieved by the query Q can be selected using the expression e1 = Reference ⊃_d Authors ⊃_d Name ⊃_d σ_"Chang"(Last_Name). As shown in Section 3.2, this expression can be optimized, obtaining e2 = Reference ⊃ Authors ⊃ σ_"Chang"(Last_Name). To compute the query, we: (i) evaluate e2, (ii) parse the reference regions in the result using the BibTeX structuring schema, obtaining reference objects, and (iii) return these objects to the user.

Recall that we consider here only natural structuring schemas. This implies that every path expression p in the query matches a derivation sequence(s) in the grammar (p may match several derivation paths due to conjunctive rules). This also implies that the attributes used in the path p match regions that correspond to the non-terminals in this derivation. In general, every path p in a query "SELECT r FROM R WHERE r.p = w" matches path(s) A_1 → A_2 → ... → A_m in the RIG. The path(s) can be easily determined by syntactically analyzing the grammar. The regions matching the objects retrieved by Q are exactly those regions selected by the inclusion expressions A_1 ⊃_d A_2 ⊃_d ... ⊃_d σ_w(A_m). Thus to compute the query efficiently we (i) transform the query to an inclusion expression, (ii) optimize the inclusion expression, (iii) evaluate the inclusion expression, (iv) parse the resulting regions, and return the required objects.

5.2 Select-Project-Join Queries

The queries considered above have only one selection criterion comparing an attribute to a constant. We next consider queries with more complex selection criteria. Consider first selections that compare the values of two attributes. The query

    Q = SELECT r FROM References r
        WHERE r.Editors.Name = r.Authors.Name

selects references that appeared in books edited by one of the authors. Unlike the queries discussed in the previous subsections, this query cannot be fully evaluated using the region algebra. The problem is that the region algebra does not support operations comparing contents of regions. This limitation is typical of text indexing systems. It singles out the difference between database systems and traditional text indexing systems, and the benefits one gains from having a full database interface to files.

It turns out, however, that indices can still be used to accelerate the computation. The region index can be used to locate the regions corresponding to the attributes specified by the two paths. The content of the regions is then loaded into the database, and a database join operator is used to select regions with matching content. Then, the region index is used again to locate the references containing those matching strings.

Queries may select elements based on several selection criteria composed using and, or and not operators. These operations can be simulated in the region algebra using intersection, union and subtraction, respectively, of the corresponding index expressions. Note that different selection criteria may access common attributes. As in classical query optimization, the goal is to find common subexpressions in the region expressions and evaluate them once.

Projection is handled similarly to selection. Instead of using the ⊃ and ⊃_d operators, we use ⊂ and ⊂_d, respectively. For example, the query

    Q = SELECT r.Authors.Name.Last_Name
        FROM References r

is translated to e1 = Last_Name ⊂_d Name ⊂_d Authors ⊂_d Reference.

The optimization technique presented in Section 3.2 works for expressions containing ⊂ and ⊂_d operations as well. In particular, e1 above is optimized, giving e2 = Last_Name ⊂ Authors ⊂ Reference.

Complex queries involving several view definitions (e.g. the BibTeX authors that are cited in a LaTeX file) or several occurrences of the same view (e.g. nested queries) use join. Text indexing systems are inadequate for performing join-like computations. This must be done at the database level. However, we can use the indexing system to reduce the amount of information loaded into the database for performing the join. The idea is to use rewriting rules to push selection and projection down as much as possible, and then use indexes to locate the elements needed for the join computation. We will not address in detail the problems raised because they are similar to those of complex queries in any rewriting system.
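The derivation of the RIG from the grammar (Section 4.2) and the translation of simple selections and projections into inclusion expressions (Sections 5.1 and 5.2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy grammar, the root name Ref_Set, and the function names are hypothetical and are not taken from the paper's implementation, and the optimization step of Section 3.2 is not shown.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy grammar in the spirit of the BibTeX schema: each
# non-terminal maps to the non-terminals on the right-hand side of its
# rules (terminals are omitted).
GRAMMAR: Dict[str, List[str]] = {
    "Ref_Set": ["Reference"],
    "Reference": ["Key", "Authors", "Title", "Editors"],
    "Authors": ["Name"],
    "Editors": ["Name"],
    "Name": ["First_Name", "Last_Name"],
}
ROOT = "Ref_Set"  # assumed root; the root is not part of the region index


def derive_rig(grammar: Dict[str, List[str]], root: str) -> Set[Tuple[str, str]]:
    """Edge (A, B) iff some rule has A on the left and B on the right."""
    return {(lhs, rhs)
            for lhs, rhss in grammar.items() if lhs != root
            for rhs in rhss}


def check_path(nodes: List[str], rig: Set[Tuple[str, str]]) -> None:
    """Raise if consecutive nodes are not joined by a RIG edge."""
    for a, b in zip(nodes, nodes[1:]):
        if (a, b) not in rig:
            raise ValueError(f"no RIG edge {a} -> {b}")


def selection_expression(view_region: str, path: List[str], word: str,
                         rig: Set[Tuple[str, str]]) -> str:
    """SELECT r FROM R WHERE r.p = w  ->  A1 ⊃_d ... ⊃_d σ_w(Am)."""
    nodes = [view_region, *path]
    check_path(nodes, rig)
    *outer, last = nodes
    return " ⊃_d ".join([*outer, f'σ_"{word}"({last})'])


def projection_expression(view_region: str, path: List[str],
                          rig: Set[Tuple[str, str]]) -> str:
    """SELECT r.p FROM R  ->  Am ⊂_d ... ⊂_d A1."""
    nodes = [view_region, *path]
    check_path(nodes, rig)
    return " ⊂_d ".join(reversed(nodes))


RIG = derive_rig(GRAMMAR, ROOT)
print(selection_expression("Reference", ["Authors", "Name", "Last_Name"], "Chang", RIG))
print(projection_expression("Reference", ["Authors", "Name", "Last_Name"], RIG))
```

The two printed strings correspond to the unoptimized expressions e1 given in the text for the selection and projection examples; a path that does not exist in the RIG is rejected before any expression is built.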

Figure 2: The parse tree for BibTeX files (full indexing).
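To make the evaluation of inclusion expressions over the regions of such a parse tree concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the region offsets, the CONTENT table (a stand-in for the word index), and the function names are hypothetical, regions are assumed to have distinct extents, and chained expressions are evaluated from right to left.

```python
from typing import Dict, Set, Tuple

Region = Tuple[int, int]  # (start, end) offsets in the file, end exclusive

# Hypothetical region index for a file with two references (made-up offsets).
INDEX: Dict[str, Set[Region]] = {
    "Reference": {(0, 100), (100, 200)},
    "Authors": {(10, 40), (110, 140)},
    "Editors": {(50, 90), (150, 190)},
    "Name": {(11, 25), (51, 70), (111, 125), (151, 170)},
    "Last_Name": {(17, 25), (59, 70), (117, 125), (159, 170)},
}
# Stand-in for the word index: the text covered by each Last_Name region.
CONTENT: Dict[Region, str] = {
    (17, 25): "Chang", (59, 70): "Smith",
    (117, 125): "Jones", (159, 170): "Chang",
}
ALL_REGIONS: Set[Region] = set().union(*INDEX.values())


def includes(a: Region, b: Region) -> bool:
    """True if region a properly includes region b."""
    return a != b and a[0] <= b[0] and b[1] <= a[1]


def select(word: str, regions: Set[Region]) -> Set[Region]:
    """σ_w(A): regions of A whose content is exactly the word w."""
    return {r for r in regions if CONTENT.get(r) == word}


def including(A: Set[Region], B: Set[Region]) -> Set[Region]:
    """A ⊃ B: regions of A that include some region of B."""
    return {a for a in A if any(includes(a, b) for b in B)}


def directly_including(A: Set[Region], B: Set[Region]) -> Set[Region]:
    """A ⊃_d B: as above, with no other indexed region in between."""
    def direct(a: Region, b: Region) -> bool:
        return includes(a, b) and not any(
            includes(a, c) and includes(c, b) for c in ALL_REGIONS)
    return {a for a in A if any(direct(a, b) for b in B)}


chang = select("Chang", INDEX["Last_Name"])

# e1 = Reference ⊃_d Authors ⊃_d Name ⊃_d σ_"Chang"(Last_Name)
e1 = directly_including(
    INDEX["Reference"],
    directly_including(INDEX["Authors"],
                       directly_including(INDEX["Name"], chang)))

# e2 = Reference ⊃ Authors ⊃ σ_"Chang"(Last_Name)
e2 = including(INDEX["Reference"], including(INDEX["Authors"], chang))

# "Chang" as an editor, then OR / AND NOT via union and subtraction.
chang_editor = including(INDEX["Reference"], including(INDEX["Editors"], chang))
author_or_editor = e2 | chang_editor
author_not_editor = e2 - chang_editor

print(e1, e2, author_or_editor, author_not_editor)
```

On this toy data both e1 and e2 select only the first reference, which is the behaviour the optimization relies on: the cheaper simple-inclusion expression returns the same regions as the direct-inclusion chain. The quadratic scans used here are only for readability; an actual text indexing engine would evaluate these operators over sorted region lists.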

5.3 Extended Path Expressions

Information stored in files often has complex structure because it represents objects that are inherently complex. It has been observed [KKS92, MBW80] that simple path expressions are not always suitable for manipulating objects with complex structure. One way to overcome this problem is to use path expressions with variables.

Assume that one wants to find all references where "Chang" is an author or an editor. This can be expressed by the XSQL query

    Q = SELECT r FROM References r
        WHERE r.*X.Last_Name = "Chang"

The notation *X means that we are interested in the attribute Last_Name, no matter what path leads to this attribute. A similar facility for querying text databases with partial knowledge of the schema is described in [KM93].

A naive way to evaluate such a query is to analyze the path expression, find all the possible assignments to the variables, and then evaluate the query separately for each specific path. There are cases, however, where a better evaluation strategy exists. In particular, if a variable name X appears in the path expression only once, then any path from the attribute left of X to the attribute on the right is an acceptable assignment. Recall that the database representation of files naturally matches their structure, and that attributes correspond to regions in the file. The attribute A_j specified by the path expression A_i.*X.A_j resides in a region of type A_j that is included in some region of type A_i. Thus, A_i.*X.A_j (where X appears in p only once) is translated to A_i ⊃ A_j. Variables that appear more than once need to be instantiated by appropriate attribute sequences, and are translated as before (using the ⊃_d operator). Thus, the query Q above can be mapped into the region algebra expression Reference ⊃ σ_"Chang"(Last_Name), which can be evaluated much more efficiently than in the standard approach.

XSQL also supports path expressions of the form A_i.X_1.X_2...X_n.A_j (i.e. variables without the star notation). This is used to access A_j attributes that are reachable from A_i by an arbitrary path of length n. (In contrast, the star notation denotes paths of arbitrary length.) This can be simulated in the region algebra by looking for regions a_i of type A_i that contain some region a_j of type A_j, provided that there are exactly n nested regions contained in a_i and containing a_j.

It is important to emphasize that in traditional OODBMS, path expressions with variables are computationally more expensive than those with no variables (since the system has to actually traverse all possible paths). In contrast, for text files, path expressions with variables may be cheaper. This is due to the fact that simple inclusion (⊃) may be applicable instead of direct inclusion (⊃_d).

One could also go beyond first order queries, and use a notation borrowed from GraphLog [Con89, CM90]: path regular expressions. These extend path expressions with the traditional regular expression operators (in particular, the transitive closure operator). Within the framework we describe here it is possible to evaluate paths with a regular expression involving a transitive closure, with just an inclusion expression. This shows, once more, that in some cases a traditionally expensive query (a closure) can be implemented much more efficiently with the techniques we describe here.

6 Partial Indexing

In the previous sections we assumed that all the non-terminals in the grammar are indexed. In practice, we may want to create a smaller number of region indexes to reduce the space and update costs. We show below that performance improvements can be

obtained even when only a selected subset of regions is indexed.

Partial indexing may not be sufficient for fully evaluating queries using the region algebra. It can be used, however, to significantly reduce the search space. In the presence of partial indexing, a query is computed in two phases: (i) the query is compiled into an inclusion expression that computes a superset of the required result, a set of candidate regions, and (ii) the candidate regions are further processed to obtain the exact result.

6.1 Obtaining the Candidate Regions

In this subsection we describe the first phase: transforming the query to an inclusion expression computing a set of candidate regions. The second phase, the processing of the regions, is discussed next.

Let I_p = {A_i1, ..., A_ik} be a region index containing part of the non-terminal names in the grammar G. Assume that each index A_ij is instantiated by the set of all regions corresponding to occurrences of A_ij in the parse tree of the file f using the grammar G.

As in the case of full indexing, the inclusion relationship between the indexed regions is determined by the grammar. In particular, the region inclusion graph of I_p can be automatically derived from the grammar G. The nodes are the indexed non-terminals in I_p. The graph has an edge (A_i, A_j) iff in the RIG of the full grammar (i.e. where all the non-terminals are indexed) there is a path from A_i to A_j where all the non-terminals on the path other than A_i, A_j are not indexed (i.e. do not belong to I_p).

For example, consider the BibTeX grammar, and let I_p = {Reference, Key, Last_Name}. The corresponding RIG is presented below. As before, a path p in a database query matches a path in the RIG. The dashed arrows in the following diagram correspond to the path in the query

    Q = SELECT r FROM References r
        WHERE r.Authors.Name.Last_Name = "Chang"

[Figure: the RIG of I_p. Its nodes are Reference, Key and Last-Name, with edges from Reference to Key and from Reference to Last-Name. The dashed path is Reference → Last-Name.]

Once more, consider the parse tree of the BibTeX file. The data used to answer the query resides in regions reachable by paths that match the path in the RIG. In the case of full indexing, the indexed information was sufficient for exactly locating the regions needed for the query processing. In the case of partial indexing only an approximation of the required regions can be achieved. Below is a part of the parse tree of a BibTeX file. The dashed lines in Figure 3 correspond to paths that match the path in the above RIG.

Figure 3: The parse tree for BibTeX files (partial indexing).

Note that due to the partial indexing, one cannot distinguish between last names of authors and last names of editors. Thus, the inclusion expression Reference ⊃_d σ_"Chang"(Last_Name) identifies a superset of the required references (references where "Chang" is either an author or an editor).

In general, a path p in a query

    Q = SELECT r FROM References r WHERE r.p = w

matches a path A_1 → A_2 → ... → A_m in the RIG of the indexed non-terminals. The inclusion expression A_1 ⊃_d A_2 ⊃_d ... ⊃_d σ_w(A_m) retrieves a set of candidate regions, that is a superset of the regions required by the query. (This expression can be further optimized using the optimization algorithm of Section 3.2.)

Note that there are cases where the set of candidate regions coincides with the query's answer. The conditions under which this happens are discussed in Section 6.3.

6.2 Parsing the Candidate Regions

We next filter out the irrelevant regions. To this end, we parse the regions in the superset, building for each region a corresponding database representation, and then select the required elements by applying the query on the resulting database objects.

It was observed in [ACM93] that the structuring schema can be optimized by "pushing" the query into the parsing process, so that only objects that meet the query selection criteria are built. Parsing using an optimized schema reduces the construction of unnecessary database objects.

6.3 Exact Answer with Partial Indexing

There are cases where partial indexing is sufficient for fully computing the query, without additional parsing. This happens when the indexed non-terminals provide enough information to avoid ambiguities in path interpretation. The conditions are sketched below.

Let I be a region index containing all the names of non-terminals in the grammar G, and let I_p ⊆ I be a partial index. Let RIG(I) and RIG(I_p) be the corresponding region inclusion graphs. The key observation is that every edge (A_i, A_j) in RIG(I_p) matches path(s) from A_i to A_j in RIG(I), where all

the nodes on the path(s), other than A_i and A_j, are not in I_p.

Consider a query "SELECT r FROM R r WHERE r.p = w". Let A_1 → A_2 → ... → A_n be the path in RIG(I_p) corresponding to the path p in the query Q. The inclusion expression A_1 ⊃_d A_2 ⊃_d ... ⊃_d σ_w(A_n) fully computes Q iff each of the edges (A_i, A_j) on this path matches a unique path in RIG(I). If the edges match several paths, then the inclusion expression computes a superset of the required regions.

7 Choosing What to Index

The efficiency of query evaluation depends on the choice of region indices. There is a tradeoff between performance and the number of regions being indexed. As in standard database systems, knowledge about the access patterns to files can be used to select indices. In fact, many aspects of this issue should be considered in the broader context of indexing techniques for OODBMSs (see [Ber94] for a survey). A key observation in our context is that in many cases partial indexing is sufficient for full computation of queries.

Consider a query "SELECT r FROM R r WHERE r.p = w", where R is a view defined by a structuring schema with grammar G. Assume first that all the non-terminals in the grammar are indexed, and let e = A_1 ⊃_1 A_2 ⊃_2 ... ⊃_{n-1} A_n (where each ⊃_i stands for ⊃ or ⊃_d) be the optimized inclusion expression that computes Q (e is constructed and optimized as explained in Section 5).

Note that not all the indexed non-terminals are actually needed for evaluating e. One can distinguish two kinds of region indices used in the computation: regions that are explicitly mentioned in e (i.e. the A_i's), and those that are implicitly mentioned. The implicit usage is due to the ⊃_d operation. To compute A_i ⊃_d A_{i+1}, one has to rule out all non-direct inclusions. This requires checking that none of the other indexed regions resides between the two regions. Note however that not all the indexed regions must indeed be checked. The grammar G enforces certain relationships between regions. In fact, only regions corresponding to non-terminals derivable from A_i and deriving A_{i+1} can violate the direct inclusion. Thus only region indices corresponding to those non-terminals need to be checked. Moreover, if A_i derives some A_i' that derives some A_i'' that derives A_{i+1}, it is not necessary to check both A_i' and A_i''; one suffices.

In summary, to fully compute Q, it is sufficient to (i) index the non-terminals mentioned in e, and (ii) for every subexpression A_i ⊃_d A_{i+1} in e, index one non-terminal (other than A_i, A_{i+1}) on each path from A_i to A_{i+1} in the RIG of the grammar G.

Indexing can be either performed globally (i.e., for the whole file) or only in specific regions. For example, assume that users often query names of authors, but never (or hardly ever) query names of editors. In that case, instead of indexing all the Name regions it is better to index only those that reside in some Authors region.

As explained in Section 6, one can trade indexing for accuracy of computation. The parameters taken into consideration are the number of regions that need to be indexed for full computation, and the number and expected size of the regions that need to be parsed due to non-indexed data.

8 Conclusions

In this paper we discussed how to provide efficient access to semi-structured data residing in files. The approach combines the convenience of using an extended SQL to query the more structured components of the information in files with the efficiency afforded by text indexing.

An original contribution of the paper consists of an optimization technique that is applicable when advanced text indexing technology is available to the

database query evaluator. The concept of a region inclusion graph is introduced to provide the information needed to perform the above optimization. We also discuss how to automatically derive a RIG when structuring schemas are used to specify the mapping from files to databases.

Preliminary experimental results show that significant performance improvements can be obtained by using optimized Pat inclusion expressions (instead of relying on a traditional database engine) for the evaluation of queries.

Acknowledgments: We would like to thank Vassos Hadzilacos for his fruitful suggestions, and Frank Tompa and Pekka Kilpeläinen for detailed comments on a preliminary version of this paper. The first author would like to acknowledge Gaston Gonnet for several discussions on Pat. This work was done at University of Toronto and the second author was supported by the Institute for Robotics and Intelligent Systems.

References

[ACM93] S. Abiteboul, S. Cluet, and T. Milo. Querying and updating the file. In Proc. of the 19th Int. Conf. on Very Large Databases, VLDB 93, pages 73-84, 1993.
[AJ74] A. V. Aho and S. C. Johnson. Programming utilities and libraries LR parsing. Computing Surveys, June 1974.
[BCD89] F. Bancilhon, S. Cluet, and C. Delobel. Query languages for object-oriented database systems: the O2 proposal. In Proc. DBPL, Salishan Lodge, Oregon, June 1989.
[Ber94] E. Bertino. A Survey of Indexing Techniques for Object-Oriented Database Management Systems. In J. Freytag, D. Maier, and G. Vossen, editors, Query Processing for Advanced Database Systems, pages 383-418, San Mateo, CA, 1994. Morgan Kaufmann.
[BGH+92] T. F. Bowen, G. Gopal, G. Herman, T. Hickey, K. C. Lee, W. H. Mansfield, J. Raitz, and A. Weinrib. The Datacycle architecture. Communications of the ACM, 35(12):71-81, December 1992.
[BGMM93] D. Barbara, H. Garcia-Molina, and S. Mehrotra. The gold mailer. In IEEE Data Eng., pages 92-99, 1993.
[Bur92] F. J. Burkowski. Retrieval activities in a database consisting of heterogeneous collections of structured text. In Proc. of the 15th SIGIR Conference, pages 112-125, 1992.
[CM90] M. Consens and A. Mendelzon. GraphLog: a visual formalism for real life recursion. In Proceedings of the Ninth ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, pages 404-416, 1990.
[CM93] M. Consens and A. Mendelzon. Hy+: A Hygraph-based query and visualization system. In Proceedings of the ACM-SIGMOD 1993 Annual Conference on Management of Data, pages 511-516, 1993.
[Con89] Mariano P. Consens. Graphlog: "real life" recursive queries using graphs. Master's thesis, Department of Computer Science, University of Toronto, January 1989.
[GNOT92] D. Goldberg, D. Nichols, B. M. Oki, and D. Terry. Using collaborative filtering to weave an information tapestry. CACM, 35(12), December 1992.
[Gon87] G. Gonnet. Examples of Pat applied to the Oxford English Dictionary. Technical Report OED-87-02, University of Waterloo, 1987.
[GT87] G. Gonnet and F. Tompa. Mind your grammar: a new approach to modelling text. In Proc. of the 13th Int. Conf. on Very Large Databases, pages 339-346, 1987.
[KKS92] M. Kifer, W. Kim, and Y. Sagiv. Querying object-oriented databases. In Proc. SIGMOD, San Diego, 1992.
[KM93] P. Kilpeläinen and H. Mannila. Retrieval from hierarchical texts by partial patterns. In Proc. of the 15th SIGIR Conference, 1993.
[Lam85] L. Lamport. LaTeX: A Document Preparation System. Addison-Wesley, Reading, MA, 1985.
[MBW80] J. Mylopoulos, P. A. Bernstein, and H. K. T. Wong. A language facility for designing database-intensive applications. ACM Transactions on Database Systems, 5(2), 1980.
[Ope93] Open Text Corporation. Pat Reference Manual and Tutorial, 1993.
[Pae93] A. Paepcke. An object oriented view onto public heterogeneous text databases. In IEEE Data Eng., page 484, 1993.
[Sch93] M. F. Schwartz. Internet Resource Discovery at the University of Colorado. IEEE Computer Networking, 26(9), September 1993.
[Set74] R. Sethi. Testing for Church-Rosser Property. JACM, 21(4), October 1974.
[SLS+93] K. Shoens, A. Luniewski, P. Schwartz, J. Stamos, and J. Thomas. The Rufus system: Information organization for semi-structured data. In Proc. of the 19th Int. Conf. on Very Large Databases, VLDB 93, pages 97-107, 1993.

[SM83] G. Salton and M. J. McGill. Introduction to modern information retrieval. McGraw-Hill, 1983.
[ST92] A. Salminen and F. W. Tompa. Pat expressions: an algebra for text search. In Papers in Computational Lexicography: COMPLEX'92, pages 309-332, 1992.
[Yeu93] A. Yeung. Text Searching in the Hy+ Visualization System. Master's thesis, Department of Computer Science, University of Toronto, October 1993.
