Professional Documents
Culture Documents
Towards A Theory of Spatial Database Queries: Jan Paredaens Jan Van Den 13usschet Dirk Van Gucht$
Towards A Theory of Spatial Database Queries: Jan Paredaens Jan Van Den 13usschet Dirk Van Gucht$
(Extended Abstract)
Abstract 1 Introduction.
A general model for spatial databases is considered, tial database systems (e.g., [Buc89, GS91, A093]),
which extends the relational model by allowing as tuple which are capable of dealing with spatial informa-
components not only atomic values but also geometrical tion. Spatial database models and prototypes pro-
figures. The model, which is inspired by the work posed in the literature typically focus on one spe-
of Kanellakis, Kuper and Revesz on constraint query cific type of spatial information (or a finite set of
languages, includes a calculus and an algebra which are
specific types), such as intervals for temporal appli-
equivalent. Given this framework, the concept of spatial
cations or polygonal line segments for geographic
database query is investigated. Thereto, Chandra and
Harel’s well-known consistency criterion for classical applications. This narrow focus is often justified,
relational queries is adapted. Various adaptations since it is sufficient for many applications, and al-
are proposed, depending on the kinds of geometry in lows for tuned, efficient implementations. Never-
which the spatial information in the database is to theless, in order to obtain a better understanding of
be interpreted. The consistency problem for calculus the fundamental issues underlying spatial database
queries is studied. Expressiveness issues are examined.
management, a more general perspective is in or-
The main purpose of the paper is to open up new
der.
grounds for theoretical research in the area of spatial
A natural approach to achieving this generality is
database systems. Consequently, many open problems
are indicated. tct consider as spatial data any geometrical figure
definable in elementary geometry, i.e., first-order
logic over the reals. This wide class of figures is
also known as the class of semi-algebraic sets in
*Dept. Math. & Computer Sci., University of Antwerp real algebraic geometry [B CR87]. The first-order
(UIA), Universiteitsplein 1, B-261O Antwerp, Belgium. thleory of the reals is decidable; actually, it allows
t Research Assistant of the Belgian National Fund for a very strong form of effective quantifier elimina-
Scientific Research. Address: same as Jan Paredaens. tion (known as Cylindrical Algebraic Decomposi-
t Computer Sci. Dept., Indiana University, Bloomington,
tion [C0175, Arn88]), which makes that many prop-
IN 47405-4101, USA.
erties of semi-algebraic sets are decidable as well
Permission to copy without fee all or part of this material is [HRR91]. Databases containing arbitrary semi-
granted provided that the copies are not made or distributed for
direct commercial advantage, the ACM copyright notice and the algebraic sets (i.e., their defining formulas) and a
title of the publication and its date appear, and notice is given ca,lculus for querying them were first considered by
that copying is by permission of the Association of Computing
Machinery. To copy otherwise, or to republish, requires a fee Kanellakis, Kuper and Revesz (abbr. KKR) in the
and/or specific permission. cclntext of their work on constraint query languages
279
model and the KKR model. The tuples in the order logic on finite structures. This is no longer
relations have both ordinary data attributes and true however in a purely spatial setting without
spatial attributes. Correspondingly, we have an ordinary data. Nevertheless, we show that unde-
associated calculus query language that is a two- cidability carries over, even for the simplest prob-
sorted combination (and a conservative extension) lem of testing for translation-genericity over one-
of the standard relational calculus and the KKR dimensional spatial relations.
calculus. We also define an equivalent algebra. We also look into the expressiveness of the
In the seminal paper on the theory of classical spatial calculus. We observe that it can be decided
relational queries, Chandra and Harel defined a in the calculus whether an arbitrary semi-algebraic
query formally as a partial recursive function from set is finite. ~From this it follows that various
databases to databases that is invariant under important integrity constraints are expressible. For
permutations of the universe of atomic values instance, one can check in the calculus whether
( [CH80]; cf. also [AU79]). The latter consistency the database consists of a finite number of circles,
criterion, nowadays known as genericity [HY84], polygonal line segments, closed polygons, lines, and
says informally that the query can be computed points, as is often the case in practical applications.
(on a Turing machine, say) regardless of how Along the way, several open problems are in-
atomic values are encoded as strings: two different dicated. For example, it is effectively decidable
atomic values can only be distinguished on the whether a semi-algebraic set is topologically con-
basis of their logical properties. This intuition nected. But is it decidable by a calculus query? It
was also expressed by Tarski at a talk given in is in one dimension, and in at least three dimen-
1966 [Tar86], where he compared the situation sions we show that the problem is related to an-
with that of geometry. There, for example, other open problem, namely whether connectivity
a mathematical object (e.g., a set of points, of a finite graph over the real numbers is express-
or a function from sets of points to sets of ible in the calculus. The case of two dimensions
points) is called “Euclidean” if it is invariant remains open. Another important open problem is
under similarities, or called “topological” if it is to find query languages that are complete for the
invariant under homomorphisms. More generally, (different kinds of generic] spatial database queries.
geometrical concepts are classified according to the In this context, we briefly discuss ways to extend
group of geometrical transformations under which the calculus with an iteration mechanism.
they are invariant. It should be clear that our main purpose with
This will lead us to the definition of various this paper, rather than presenting a number of
consistency criteria for spatial database queries, hard results, is to break up new grounds for the
depending on the kind of geometry in which the study of fundamental aspects of queries for spatial
280
defines a set of points {(~1, . . . . z~) I p} in the type) and flat instances. In the flat case, there
Euclidean space Rn in the obvious way by letting is no difference between the intensional and the
real variables range over the real numbers. Sets extensional level.
thus definable are called semi-algebraic. Note that On the other hand, relation types of the form
for m = O, there are only two such sets: {( )}, which [0, m] are called pureiy spatial. Purely spatial
also stands for the Boolean constant True, and the databases are exactly those considered in the KKR
empty set which stands for False. model [KKR90] mentioned in the Introduction.
Two real formulas are equivalent if they define The sharp distinction we make between ordinary
the same set. Since Tarski [Tar51] it is known data and spatial data allows a clean definition of
that equivalence is decidable. Actually, every real the concepts introduced in the next section. This
formula can be effectively converted to an equiva- would not be possible if we would have chosen to
lent formula that is quantifier-free. Algorithms are represent the atomic values in U as, e.g., integers
known [C0175, Ren92] which perform this quanti- in R, which seems what is suggested in [KKR90].
fier elimination in time which is exponential only
in the number of variables of the formula (assum- i] Generic spatial queries.
ing prenex normal form). So, the algorithms have IDefinition 1 For a scheme S, denote the set of
polynomial-time complexity if this number is con- i-instances (resp., e-instances) of S by i-inst( S)
st ant. This will clearly be the case in our setting. (resp., e-inst(S)). Let Sin and SOUt be disjoint
schemes.
281
a query has the same behavior on “isomorphic” Q(I,).
databases. With respect to the ordinary data, An i-query Q is Ggeneric if its corresponding
isomorphisms are simply arbitrary permutations e-query Q’ (cf. Definition 1) is. ■
of the set of at omit values U. But what are
Let us now consider a few naturally occurring
the isomorphisms with respect to the spatial
groups of transformations in a bit more detail: T,
data? They will certainly be transformations of
the group of translations; D, the group of direct
the Euclidean space, but precisely which kinds
isometrics; I, the group of isometrics; S, the group
of transformations should be considered depends
of similarities; A, the group of affinities; H, the
on the kind of geometry in which the spatial
group of homeomorphisms; and P, the full group
information is to be interpreted.
of all permutations.
For inst ante, in some applications, the relative
P-genericity is an extreme form which enforces
positions (e.g., above-below, left-right ) among the
the standard genericity criterion of Chandra and
different geometrical figures in the database form
Harel on the querying of spatial data. In other
an essential part of the spatial information. This
words, the points in space are considered to be
is for example the case in temporal databases deal-
abstract atomic values without any geometrical
ing with time points on the real line [T+93]. For
meaning, much like the atomic values in U.
these applications, translations of the space are
As already mentioned earlier, H-genericity is
considered to be isomorphisms, but reflections are
most appropriate for applications that are inter-
not. On the other hand, in geographical appli-
ested only in the topological properties of the spa-
cations, which oft en deal exclusively with areas
tial data.
and dist antes, any dist ante-preserving t ransforma-
A-genericity corresponds to applications where
tion (i.e., any isomet ry ) is considered an isomor-
straightness of lines becomes important. With S-
phism. Still other applications may only be in-
genericity, also the length of straight line segments
terested in the topological properties of the spa-
comes in. A-genericity is appropriate for applica-
tial information (e.g., [EF91]), and will think of
tions which consider any two figures with the same
isomorphisms as topological homeomorphisms. In
shape (e.g., a circle and an ellipse, or two triangles)
summary, genericity of spatial database queries is
to be isomorphic; S-genericity captures the appli-
best defined with respect to some general group of
cations which do so only if the two figures also have
geometrical transformations.
the same proportions (e.g., two circles or two sim-
Before we can give a formal definition, we need
ilar triangles ).
the following notion. A relation type of the form
As already mentioned earlier, I-genericity is for
[n, m] is said to be of dimension k if n-t is a
applications in which the exact distances are im-
multiple of k. Assume this is the case: m =
portant when considering two figures as isomor-
i.k. The intuition then is that in an etu-
phic. D-genericity is even stronger, also requiring
ple (al, ..,, annul, ..., u~) of type [n, m], the m-
that the orientation of angles is the same: direct
tuple of real numbers (UI, . . . . u~) is thought of
isomet ries are transitions or rotations. Finally,
as an i-tuple of points in the k-dimensional Eu-
we have T-genericity which, as already mentioned
clidean space R~. We can similarly talk about
earlier, is relevant for temporal databases (in di-
k-dimensional schemes (containing only relation
mension 1).
names with a k-dimensional type) and instances.
Clearly, the stronger the notion of isomorphism
Given a k-dimensional instance 1 and a permuta-
is, the smaller the group ~ is and the more queries
tion g of Rk, we can apply g to I in the obvious
are G-generic. We thus have a series of implications
way.
which we will now show to be strict. The proof of
the following Proposition also serves to illustrate
Definition. Let Sin and SOUt be two disjoint
the different kinds of genericity by means of natural
schemes of dimension k. Let ~ be a group of
examples.
permutations of R~. An e-query Q of signature
Sin Y S.Uk is G-generic if for any g E ~ and any Proposition 1 P-generic + H-generic + A-
11,12 c e-inst(Sin), if g(ll) = 12 then g(Q(lI)) = generic + S-generic + I-generic + D-generic +
282
T-generic. These implications hold for i-queries single atomic value ‘John’, and rephrase the query
(whence also for e-queries) and are strict. correspondingly. This is of course merely an
alternative view on the well-known notion of
Proof. (Sketch) We work in dimension 2. Con- C-genericity [HY84], a slight generalization of
sider a database scheme {Lives, Region}, where classical genericity allowing queries which depend
Lives is of type [1,2] and Region is of type [0, 2]. cm some finite set of constants C.
On the extensional level, each tuple in Lives is of Under this alternative view, we can generalize
the form (n; z, y) where n is (the name of) some the notion of C’-genericity to the spatial context,
person and (x, y) is a location where this person appropriate for a finite set of spatial constants
lives. Region cent ains some region, i.e., a semi- (which can be arbitrary semi-algebraic sets in
algebraic set in the Euclidean plane. The following general). As a simple example where the constant
queries have an output scheme consisting of a single is a single point, consider a relation R oft ype [0, 2]
relation. a,nd the query (of result type [0, 2]) which rotates
The query “give the persons who live in the 1? over 90° around the center (O, O). This query is
region” (resulting in a relation of type [1, O]) is P- nlot even T-generic since the center point is given
generic. special status. However, given two relations RI and
The query “give the persons who live in the 122 of type [0, 2], the query which checks whether
topological interior of the region” is H-generic but R2 consists of a single point p and, if so, rotates
not P-generic. Another example of a query with RI over 90° around p certainly is T-generic (it is
this property is “give the topological boundary of actually S-generic).
the region” (of result type [0, 2]).
The query of result type [0,2] “give the straight 41 Calculus and algebra.
lines contained in the region” is A-generic but not
We now present a basic mechanism for specifying
H-generic.
spatial queries in the form of a calculus language
The query “give the persons who live closest
which is an orthogonal combination of the standard
to the region” is S-generic but not A-generic.
relational calculus and one of the constraint query
Another such query (of result type [0, O]) is “does
languages of [KKR90]. The operational semantics
the region consist of two orthogonal finite-length
of the calculus is provided by an equivalent algebra.
line segments?”
Recall the language of real formulas defined in
The query “give the equilateral triangles with
Section 2. The formulas of the calculus query
sides of length 1 contained in the region” is I-
limguage are obtained simply by adding to the
generic but not S-generic. Another such query is
former language:
“give the pairs of persons living at Ieast 10 miles
apart”. ● A totally ordered infinite set of variables called
Assume the region is a circle and that persons value variables, disjoint from the set of real
live on this circle. The query “give the pairs variables;
(nl, nz) such that person nl lives before person nz
in clockwise order” is D-generic but not I-generic. ● Atomic formulas of the form VI = V2 or
Finally, the query “give the pairs (nl, n2) such R(vl,. ... v~; pi,.. ., pm) where R is a relation
that person nl lives west of person n2° is T-generic name of type [n, m], the vi are value variables
but not D-generic. ■ and the p; are real terms (the latter kind of
atomic formula is called a relation atom); and
We conclude this section with the following
remark. Asking for the locations where John ● Quantifications (2v) or (Vu) of value variables.
lives, though reasonable as a query, is strictly
speaking not a query at all since it gives special Let @ be a calculus formula with free value vari-
status to the atomic value ‘John) and is therefore ables VI, ..., Vn and free real variables ZI, . . . . r~.
not invariant under all permutations of U (cf. Let S be a scheme containing all relation names
Definition 1). This problem goes away if we occurring in @. Given an e-inst ante 1 of S, 0 de-
provide a constant relation John containing the fines an e-relation 0(1) of type [n, m] in the obvious
283
way by letting value variables range over the set of Let 1 ~ ii, ..., iP s m and let gl, ..., yP be
all atomic values appearing in 1 and letting real real variables different from each Xt. For a real
variables range over the real numbers. formula ~ with free variables 01, . . . . xn, define
The calculus can thus be used as a declara- T ~tl ,...,~,, (y) as the quantifier-free equivalent of
tive language for expressing queries on the exten- the real formula
sional level. It is, of course, more interesting how-
ever to express i-queries (which are effectively com- (~ Xl)... (~Xn)(PA ~ ye = z,,).
putable). This turns out to be possible: every e- != 1
query expressible in the calculus corresponds (in
the sense of Definition 1) to an i-query. We show Then the spatial projection mz,, ,...,CLP(T) equals
this by translating calculus formulas into equiva-
{(va~(t); ~Ttl,...,ztp (spat(t))) I t ~ r}, of type
lent algebra expressions which have an operational [n,p].
semantics on the intensional level. The operators
of this algebra are presented next. ● Finally, the algebra includes the constant R~
We will use the following terminology: given an for each k, standing for the “full” relation of
ituple t = (al, . . . , an; ~), we will denote the value type [0, k].
part (al, ..., an) of t by val(t) and the spatial part
p oft by spat(t). Algebra expressions are obtained by applying al-
Now let r, TI, and Tz be i-relations of type [n, m] gebra operators to relation names. Using standard
and let r’ be an i-relation of type [n’, m’]. Without techniques we can prove:
loss of generality, assume that all real formulas
appearing in T, T1 and r2 use the same variables Proposition. Every calculus jormzda @ can be
xl, . . ..x~. an d that no real variable is used both effectively converted into an algebra expression E
in r and r’. such that the e-query expressed by @ coTTesponds
to the i-query expressed by E.
● The union rl U r2 is the standard set-theoretic
union.
There is also an obvious converse to this propo-
● For an arbitrary ituple t, denote the set sition which we do not state explicitly. We can
thus conclude that the calculus and the algebra are
{t’ ET, I val(f) = val(t)} equivalent.
● The value selection u~=j (r) equals {t ~ r / To conclude this section we point out (proof
val(t)(i) = vail}. omitted) that the spatial calculus (algebra) is a
Let p be a quantifier-free real formula on conservative extension of the standard relational
the variables xl, . . . . x~. The spatial selection calculus (algebra).
OP(T) equals {(va/(t); spat(t) A $0) I t E r}.
Proposition. Every calculus query of flat signa-
● For 1 < il, . . ..iP < n, the value projection
ture is expressible in the fiat relational calculus.
T’,l,..,,,P(T) equals
284
5 Genericity of calculus queries. algebra (in particular for any natural subgroup of
the group of affine transformations with algebraic
Not all calculus queries are generic. The simplest
parameters), G-genericit y of purely spatial calculus
example is the trivial query ‘{z I z = O}’ defined on
queries is co-r. e., since for any fixed transformation
the l-dimensional scheme SI = {R} with T(R) =
g, query Q, and i-instance 1, the statement
[0, 1]. This query is not T-generic. Actually,
it is not g-generic for any g which contains a g(Q(0) = Q(g(I)) can be written as a sentence
in the decidable first-order theory of the reals.
transformation that does not fix O.
Hence, there is a “consistency problem” for the If we disallow quantification, the consistency
calculus. The following theorem witnesses the problem seems to become more manageable. We
difficulty of this problem even in the simplest present one theorem that is probably typical for
possible setting: the kind of results that can be expected in this
sit uation.
Theorem 1 With g and S1 as above, G-genericity Consider the simple 2-dimensional scheme con-
of calculus queries of signature S1 -+ S1 is sisting of a single relation R of type [n, 2] for some
undecidable. n. Call a calculus query of result type [n’, 2] over
this scheme simple quantifier-free if its defining for-
Proof. (Sketch) The V*-fragment of number
mula ~(~z,y) has the form QI A . . . A O, A E“,
theory is undecidable since Hilbert’s 10th problem
where each @i is a relation atom or its negation
can be reduced to it. Encode a natural number n by
and @ is a quantifier-free formula not involving
the one-dimensional semi-algebraic set enc(n) :=
R. The class of simple quantifier-free queries cap-
{o,..., n}, and encode a vector of natural numbers
tures the “global” operations (and selections, us-
(721,..., nk) by enc(nl) U enc(n2) + nl + 1U ...U
ing IO’] on the spatial data, such as a transla-
enc(n~ ) + nk–l + 1. The corresponding decoding is
tion over a constant vector ii of all points in R:
first-order. We then reduce a V*-sentence (V7)P(~)
{(;,;) I R(~i?- d)}.
of number theory to the query:
databases” equivalent to ours. Note that equiv- with free value variables ii and free real variables
alence of purely spatial tableau calculus queries E, j. The query {(~ ji) I {Z I O(Z ~, ~)} is finite}
285
In other words, Proposition 2 says that the However, restricting attention to purely spatial
generalized quantifier (~fin~) (“there exist only databases, we conjecture that also many PTIME
finitely many d’) is expressible in the calculus.1 queries are not expressible in the calculus. An
So, (’dn)(~fi”z, y) Lives(n; z, g) expresses that every obvious candidate is the query to decide whether
person lives only on a finite number of points. a finite e-relation of type [0, 2] is connected when
Importantly, afi” can also be used to recognize viewed as a directed graph. We thus have the
properties of infinite semi-algebraic sets. Indeed, following open problem, also stated as an open
many types of geometrical figures used in practical problem in [G S]:
situations consists of an infinite number of points,
Open problem 1 Is graph-theoretic connectivity
but can be described using only a finite number.
expressible in the calculus?
For instance, polygons can be described their
vertices, and circles by their center and radius.
There is also another notion of connectivity
Moreover, the finite description is often first-order
entirely different from the graph-theoretic one,
definable (expressible in the calculus). Given any
which comes in naturally in the context of spatial
fixed collection of such types of figures, it can
data. Recall that an e-relation of type [0, k] is a
then be verified in the calculus whether the spatial
(possibly infinite) semi-algebraic set of points in
information in the database consists of a finite
k-dimensional space Rk. This set may or may
number of figures of these types.
not be connected in the classical topological sense.
Example. As a simple example, we can verify The query to decide topological connectivity in
whether a relation R of type [0, 2] contains only dimension k is in PTIME [HRR91]. For k = 1,
a finite number of line segments. If the endpoints it is easily expressed in the calculus. However, we
of a line segment are (zl, gl) and (X2, y2), then the ask:
286
R, there is one such line segment, connecting the References
vertical a-line with the vertical b-line, situated
[A093] D. Abel and B.C. Ooi, editors. Ad-
at height Nla/m + b, Here, ill is one plus
vances in spatial databases—3rd Sym-
the maximum max.+.,cv la – a’1, and m is the
posium SSD ’93, volume 692 of Lecture
minimum min.#./Gv la – a’1 minus one. It can be
Notes in Computer Science. Springer-
verified that for any two vertices a and b of R, the
Verlag, 1993.
a-line and the b-line are topologically connected if
and only if there is a path in the graph R from a [Arn88] D .S. Arnon. Geometric reasoning with
to b. Furthermore, Q is expressible in the calculus. logic and algebra. Artificial Intelligence,
Hence, the claim follows. ■ 37:37-60, 1988.
The design of complete query languages in our ings of the ACM Symposium on Prin-
framework is an important subject for further re- ciples of Programming Languages, pages
In [KKR90], it was observed that the calculus [CH80] A. Chandra and D. Harel. Computable
cannot be extended with declarative iteration queries for relational database systems.
based on least fixpoints, since such an extension Journal of Computer and System Sci-
would no longer be closed for the spatial data ences, 21(2):156–178, 1980.
model. But (partially defined) procedural iteration
based on conventional while-loops can of course [CO175] G.E. Collins. Quantifier elimination for
still be used. real closed fields by cylindrical algebraic
Another problem is finding languages complete decomposition. Lecture Notes in Com-
not for all i-queries, but for all ~-generic ones, for puter Science, 33:134–183, 1975.
various groups of transformations G (e.g., those
[EF91] M.J. Egenhofer and R.D. Franzosa.
considered in Section 3). In view of our undecid-
Point-set topological spatial relations.
ability result, for this problem it is perhaps more
Int. J. Geographical Information Sys-
appropriate to have primitives for the querying of
tems, 5(2):161-174, 1991.
spatial data that are more specific to G-generic ge-
ometry than general first-order logic. For instance,
i[Eng93] E. Engeler. Foundations of Mathematics.
in the case of S-genericity (corresponding to Eu- Springer-Verlag, 1993.
clidean geometry), the programming language for
geometrical constructions in Euclidean geometry [GS] S. Grumbach and J. Su. Finitely repre-
defined in [Eng93] might be a good starting point. sentable databases. These Proceedings.
287
[GS91] 0. Gunther and H.-J. Schek, editors. Ad-
vances in spatial databases—2nd Sym-
posium SSD ’91, volume 525 of Lecture
Notes in Computer Science. Springer-
Verlag, 1991.
288