Experimental Evaluation of Spatial Indices with FESTIval (October 2016)
All content following this page was uploaded by Anderson Chaves Carniel on 15 November 2016.
Abstract. Spatial indices like the R-tree and the R*-tree are widely employed in
spatial databases to improve spatial query processing, such as point queries
and spatial range queries. Spatial indices admit different parameters, which
directly impact their performance. Although there are many evaluations of spatial
indices in the literature, reproducing these evaluations requires considerable
implementation effort. In this paper, we propose FESTIval, a PostgreSQL extension
that provides a unique environment to evaluate different spatial indices with
different parameters. FESTIval automatically collects statistical data of
performed operations and allows the performance of spatial indices to be
compared using different metrics.
1. Introduction
Several advanced applications use spatial database systems to manage spatial
information represented by spatial data types like points, lines, and regions
[Schneider and Behr 2006]. For instance, cities can be represented by regions.
Spatial queries [Gaede and Günther 1998] commonly employ topological predicates
(e.g., overlap, inside) [Schneider and Behr 2006] to return the set of objects that
satisfy a given predicate. For instance, a spatial range query returns all objects
overlapping a rectangular-shaped object (window query). A large number of spatial
indices [Gaede and Günther 1998] has been proposed in the literature to improve
spatial query processing. Hierarchical spatial indices are the most popular,
such as the R-tree [Guttman 1984] and its variant, the R*-tree [Beckmann et al. 1990,
Beckmann and Seeger 2009]. The parameterization of a spatial index plays an important
role in the performance of spatial query processing.
Experimental evaluations are needed to verify the performance of a spatial index.
This task is complicated because working implementations are hard to find, even for
the most popular spatial indices. Another problem is that the available implementations
are based on different programming languages or system environments, which can lead
to unfair comparisons and to problems in the collection of statistical data.
31th SBBD – Demonstration Track October, 2016 – Salvador, BA, Brazil
This paper has two objectives. The first objective is to propose a unique framework
for spatial indexing evaluation that contains implementations of different spatial indices
to provide fairer comparisons between them. The second objective is to capture, collect,
and store statistical information of performed operations (e.g., insertion, spatial queries)
so that different spatial indices with different parameters can be compared. We achieve
these goals by proposing FESTIval (which stands for Framework to Evaluate SpaTial Indices
in non-volatile memories for PostgreSQL). FESTIval is a framework implemented as a
PostgreSQL extension that aids in the execution of experimental evaluations of different
spatial indices for non-volatile memories, such as magnetic disks.
It provides the following main functionalities:
2. Related Work
Spatial indexing has been an important topic in spatial databases, and several spatial
indices with different characteristics have been proposed [Gaede and Günther 1998].
Commonly, extensive experimental evaluations are conducted in order to verify
the performance of a spatial index [Guttman 1984, Beckmann et al. 1990,
Gaede and Günther 1998, Beckmann and Seeger 2009, Sowell et al. 2013]. Here, we
consider two important characteristics: (i) the use of a unique framework or environment
that comprises the implementations of the compared spatial indices and (ii) the collection
of an expressive set of different comparison metrics.
With regard to the first characteristic, most approaches [Guttman 1984,
Beckmann et al. 1990, Gaede and Günther 1998, Beckmann and Seeger 2009] do not
use a unique framework or environment to evaluate the performance of spatial
indices. As a consequence, the experiments are conducted by using a specific
implementation for each spatial index. Hence, in order to compare different spatial
indices, we need either to reimplement them based on their original papers or to reuse
their existing implementations. Unfortunately, reimplementation is the most common
situation since the source code of the spatial indices is often not available for several
reasons, such as license restrictions. On the other hand, FESTIval and [Sowell et al. 2013]
provide a unique extensible, free, and open-source framework to evaluate different spatial
indices and thus do not require extra implementation effort. Unlike [Sowell et al. 2013],
however, FESTIval offers several different parameterizations of a spatial index.
Hence, FESTIval allows experimental evaluations to be reproduced and extended
by changing several parameters of spatial indices (see Section 4).
With regard to the second characteristic, the approaches differ in the way
that they collect the statistical data of an experimental evaluation. Most
approaches [Guttman 1984, Beckmann et al. 1990, Gaede and Günther 1998,
Beckmann and Seeger 2009] collect specific statistical data from their own
implementations. Thus, in addition to the cost of implementing each spatial index of the
experiment, these approaches also bear the cost of collecting their statistical data of
interest. On the other hand, FESTIval automatically collects and stores an expressive set
of statistical data from a spatial index operation. The collected statistical data are
based on several metrics used in existing evaluations and are detailed in Section 4.2.
4. The FESTIval
We propose FESTIval (which stands for Framework to Evaluate SpaTial Indices in non-volatile
memories for PostgreSQL), which offers a unique environment to perform experimental
evaluations of different spatial indices with different parameters. FESTIval is a
There are two categories of information managed by FESTIval and stored in sdf.
The first category refers to the configuration of a spatial index. Firstly, FESTIval stores
the information needed about the indexed spatial objects, such as their table, column, and
primary key column. This information is stored in the table Source. We also store two types
of parameters of a spatial index. The first type refers to generic parameters that are used
for all spatial indices implemented in FESTIval, such as the page size as well as the minimum
and maximum number of entries allowed in nodes. These parameters are stored in the table
BasicConfiguration. The second type refers to specific parameters that are used by a specific
index. That is, each spatial index has its own set of specific parameters, and the table
SpecializedConfiguration generalizes the specific parameters of each spatial index. For
instance, the R*-tree has specific parameters, such as the reinsertion percentage of leaf
and internal nodes (rein_perc) and its type (rein_type). They are stored in the specialized
table RStarTreeConfiguration, and for each record in this table, there is a value in
SpecializedConfiguration. The R-tree is handled similarly. Several possible values of
both generic and specific parameters are included by default. Users are also able to insert
new parameters, which are checked to determine whether they are valid.
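The configuration tables described above can be approximated by the following minimal sketch, which uses Python's built-in sqlite3 module rather than PostgreSQL. Only the table names and the parameters rein_perc and rein_type come from the text; every other column name, the types, and the validity checks are assumptions made for illustration and do not reproduce the actual FESTIval schema.

```python
import sqlite3

# Hypothetical, simplified schema. Table names follow the text; all column
# names, types, and CHECK constraints are illustrative assumptions.
SCHEMA = """
CREATE TABLE Source (
    src_id      INTEGER PRIMARY KEY,
    table_name  TEXT NOT NULL,
    column_name TEXT NOT NULL,
    pk_name     TEXT NOT NULL
);
CREATE TABLE BasicConfiguration (
    bc_id       INTEGER PRIMARY KEY,
    page_size   INTEGER NOT NULL,
    min_entries INTEGER NOT NULL,
    max_entries INTEGER NOT NULL,
    -- Assumed validity rule: the usual R-tree constraint m <= M / 2.
    CHECK (min_entries >= 1 AND min_entries <= max_entries / 2)
);
CREATE TABLE SpecializedConfiguration (
    sc_id INTEGER PRIMARY KEY
);
CREATE TABLE RStarTreeConfiguration (
    sc_id     INTEGER PRIMARY KEY REFERENCES SpecializedConfiguration(sc_id),
    rein_perc REAL NOT NULL CHECK (rein_perc > 0 AND rein_perc < 100),
    rein_type TEXT NOT NULL
);
"""


def demo():
    """Create the sketch schema, insert valid rows, and try an invalid one."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Source VALUES (1, 'cities', 'geom', 'id')")
    conn.execute("INSERT INTO BasicConfiguration VALUES (1, 4096, 20, 50)")
    conn.execute("INSERT INTO SpecializedConfiguration VALUES (1)")
    conn.execute("INSERT INTO RStarTreeConfiguration VALUES (1, 30.0, 'example')")
    try:
        # min_entries = 40 violates the assumed rule for max_entries = 50.
        conn.execute("INSERT INTO BasicConfiguration VALUES (2, 4096, 40, 50)")
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True
    valid_rows = conn.execute("SELECT COUNT(*) FROM BasicConfiguration").fetchone()[0]
    conn.close()
    return rejected, valid_rows
```

In this sketch, invalid parameter values are rejected by constraints at insertion time, which is one possible way to realize the validity check mentioned above.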
The second category refers to the storage of statistical data collected after a
performed operation. Two types of statistical information are collected by FESTIval. The
first type refers to statistical data of a performed operation. This information is stored in
the table Execution, which is a non-normalized table that stores statistical data of any type
of execution. Thus, each spatial operation fills some specific columns of this table (see
Section 4.2). The second type refers to statistical data about the structure of the spatial
index. It includes the height of the index, the number of leaf and internal nodes, the number
of entries in leaf and internal nodes, and the total and dead space area of each node. This
information is stored in the table IndexSnapshot.
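As an illustration of the structural statistics listed above, the following sketch computes the height, the node counts, and the entry counts of a toy hierarchical index held in memory. The Node class, the dictionary keys, and the convention that a single leaf has height 1 are assumptions of this sketch, not FESTIval's internal representation; the total and dead space areas are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Node:
    """Toy tree node: leaves hold object rectangles, internal nodes hold children."""
    is_leaf: bool
    rects: List[Rect] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def snapshot(root: Node) -> Dict[str, int]:
    """Collect structural statistics by a depth-first traversal."""
    stats = {
        "height": 0,
        "leaf_nodes": 0,
        "internal_nodes": 0,
        "leaf_entries": 0,
        "internal_entries": 0,
    }

    def visit(node: Node, depth: int) -> None:
        # Assumed convention: height counts levels, so a lone leaf has height 1.
        stats["height"] = max(stats["height"], depth)
        if node.is_leaf:
            stats["leaf_nodes"] += 1
            stats["leaf_entries"] += len(node.rects)
        else:
            stats["internal_nodes"] += 1
            stats["internal_entries"] += len(node.children)
            for child in node.children:
                visit(child, depth + 1)

    visit(root, 1)
    return stats


# Usage: a root with two leaves holding two and three rectangles.
leaf_a = Node(is_leaf=True, rects=[(0, 0, 1, 1), (2, 2, 3, 3)])
leaf_b = Node(is_leaf=True, rects=[(5, 5, 6, 6), (7, 7, 8, 8), (9, 9, 10, 10)])
toy_root = Node(is_leaf=False, children=[leaf_a, leaf_b])
toy_stats = snapshot(toy_root)
```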
FESTIval provides the following SQL function that constructs a spatial index:
FT_CreateSpatialIndex(integer index, text name, text path, integer src_id, integer bc_id,
integer sc_id). It returns true if the spatial index was successfully constructed, and false
otherwise. A spatial index is constructed by inserting spatial objects one by one according
to the insertion algorithm of the index. To create a spatial index, FESTIval considers the
following parameters: index, name, path, src_id, bc_id, and sc_id. The parameter index is
an identifier that specifies the spatial index to be constructed. The possible values are 1 for
the R-tree and 2 for the R*-tree. The parameter name is the name of the spatial
index, while the parameter path is the full path of the index directory. The parameter src_id
is a primary key value of the table Source that indicates the spatial objects to
be indexed. The parameters bc_id and sc_id specify the generic and specific parameters
of the spatial index by using the primary key values of the tables BasicConfiguration and
SpecializedConfiguration, respectively. Since the specific parameters refer to only one
type of index, FESTIval checks whether the index to be constructed (i.e., the parameter index)
is compatible with the value of sc_id.
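A call to this function could be prepared from client code roughly as follows. This is a minimal sketch that assumes the function is spelled FT_CreateSpatialIndex (with an underscore) and that the statement is later executed through a Python database driver that accepts %s placeholders, such as psycopg2. The helper name create_index_call and the example index name and path are hypothetical.

```python
from typing import Tuple

# Index identifiers as given in the text: 1 for the R-tree, 2 for the R*-tree.
R_TREE = 1
RSTAR_TREE = 2


def create_index_call(index: int, name: str, path: str,
                      src_id: int, bc_id: int, sc_id: int) -> Tuple[str, tuple]:
    """Return a parameterized SQL statement and its arguments.

    Hypothetical helper: it only validates the index identifier locally.
    The compatibility check between index and sc_id is left to FESTIval.
    """
    if index not in (R_TREE, RSTAR_TREE):
        raise ValueError("index must be 1 (R-tree) or 2 (R*-tree)")
    sql = "SELECT FT_CreateSpatialIndex(%s, %s, %s, %s, %s, %s);"
    return sql, (index, name, path, src_id, bc_id, sc_id)


# Usage with illustrative values; nothing is sent to a database here.
example_sql, example_params = create_index_call(
    RSTAR_TREE, "rstar_cities", "/tmp/indices/", 1, 1, 1
)
```

Passing the values as parameters instead of formatting them into the SQL string leaves quoting of the name and path to the driver.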
This function also automatically collects statistical data with respect to the
construction of a spatial index. This data is stored in the table Execution (Section 4.1),
which includes the following information: total time of the construction (total_time),
processing time of read and write operations (read_time, write_time), and processing time of
splitting operations (split_time). FESTIval also collects other statistical data that are not
shown in Section 4.1. Further, FESTIval allows the constructed index to be visualized by
accessing another relational table. This additional statistical data is not detailed here due
to space limitations but is detailed in the FESTIval documentation.
FESTIval provides the following SQL function that processes a spatial query:
FT_QuerySpatialIndex(text name, text path, integer query, geometry obj, integer p). It
returns a set of records that corresponds to the final result of the query with the
following format (id, geo), where id is the primary key value of the spatial object geo. This
SQL function has the following parameters: name, path, query, obj, and p. The parameters
name and path specify, respectively, the name and the location of the index previously
created (Section 4.2.1). The parameter query identifies the type of the spatial query that
will be processed, which can be 1 for spatial selection, 2 for spatial range query, and 3
for point query. The parameter obj gives the search object to be used in the spatial query.
Some restrictions with respect to the geometric format of obj may apply. First, if query
is 2, then the minimum bounding rectangle (MBR) of obj is considered. Second, if query
is 3, then only a point object is allowed for obj. The parameter p specifies the
topological predicate to be used in the spatial query, which includes intersects, overlap,
disjoint, meet, inside, coveredBy, contains, covers, and equals.
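The two restrictions on obj can be sketched as a small preprocessing step. In the following illustration, a geometry is simplified to a list of (x, y) vertices, and the function names mbr and prepare_search_object are hypothetical. FESTIval itself operates on PostgreSQL geometry values, so this only mirrors the rules stated above.

```python
from typing import List, Tuple

Point = Tuple[float, float]

# Query type identifiers as given in the text.
SELECTION, RANGE_QUERY, POINT_QUERY = 1, 2, 3


def mbr(points: List[Point]) -> Tuple[float, float, float, float]:
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a vertex list."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def prepare_search_object(query: int, points: List[Point]) -> List[Point]:
    """Apply the restrictions on the search object for each query type."""
    if not points:
        raise ValueError("the search object must have at least one vertex")
    if query == SELECTION:
        return list(points)  # the object is used as given
    if query == RANGE_QUERY:
        # Only the MBR of the object is considered: return its four corners.
        xmin, ymin, xmax, ymax = mbr(points)
        return [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    if query == POINT_QUERY:
        if len(points) != 1:
            raise ValueError("a point query requires a single point")
        return list(points)
    raise ValueError("query must be 1, 2, or 3")


# Usage: a triangle used as the search object of a spatial range query.
triangle = [(0.0, 0.0), (4.0, 1.0), (2.0, 3.0)]
window = prepare_search_object(RANGE_QUERY, triangle)
```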
This function also automatically collects statistical data with respect to the
spatial query processing. This data is stored in the table Execution (Section 4.1), which
includes: total time of the spatial query (total_time), processing time of the filtering and
refinement steps (filter_time, refin_time), processing time of read operations (read_time),
the employed predicate (query_pred), and the number of candidates and results of the
spatial query (cand_num, result_num). Due to space limitations, we refer the reader to the
FESTIval documentation, which details the other collected statistical data.
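The relationship between the filtering and refinement steps and the collected metrics can be illustrated with a small sketch. Here the indexed objects are circles, the filter step is a linear scan over their MBRs (in an actual evaluation the spatial index would play this role), and the refinement step applies an exact circle-rectangle intersection test. The dictionary keys reuse the metric names from the text with underscores restored; the function names and everything else are assumptions of the sketch.

```python
import time
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Circle = Tuple[float, float, float]       # (center_x, center_y, radius)


def mbr_of_circle(cx: float, cy: float, r: float) -> Rect:
    return (cx - r, cy - r, cx + r, cy + r)


def rects_intersect(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def circle_intersects_rect(cx: float, cy: float, r: float, rect: Rect) -> bool:
    # Exact test: distance from the center to the closest point of the rectangle.
    nx = min(max(cx, rect[0]), rect[2])
    ny = min(max(cy, rect[1]), rect[3])
    return (cx - nx) ** 2 + (cy - ny) ** 2 <= r * r


def window_query(circles: Dict[int, Circle], window: Rect):
    """Two-step window query that records timing and cardinality metrics."""
    t0 = time.perf_counter()
    # Filter step: keep objects whose MBR intersects the window (candidates).
    candidates = [(oid, c) for oid, c in circles.items()
                  if rects_intersect(mbr_of_circle(*c), window)]
    t1 = time.perf_counter()
    # Refinement step: keep candidates whose exact geometry intersects it.
    results: List[int] = [oid for oid, c in candidates
                          if circle_intersects_rect(*c, window)]
    t2 = time.perf_counter()
    stats = {
        "total_time": t2 - t0,
        "filter_time": t1 - t0,
        "refin_time": t2 - t1,
        "cand_num": len(candidates),
        "result_num": len(results),
    }
    return results, stats


# Usage: object 2 passes the filter step but is discarded by the refinement.
objects = {1: (5.0, 5.0, 1.0), 2: (11.0, 11.0, 1.2), 3: (20.0, 20.0, 1.0)}
query_results, query_stats = window_query(objects, (0.0, 0.0, 10.0, 10.0))
```

The gap between cand_num and result_num indicates how many false candidates the filter step let through, which is why both counts are useful when comparing indices.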
Acknowledgments. This work has been supported by the Brazilian federal research agencies
CAPES and CNPq as well as by the São Paulo Research Foundation (FAPESP). A. C. Carniel
has been supported by the grant #2015/26687-8, FAPESP. R. R. Ciferri has been supported
by the grant #311868/2015-0, CNPq. C. D. A. Ciferri has been supported by the grant
#2016/04990-3, FAPESP.
References
Beckmann, N., Kriegel, H.-P., Schneider, R., and Seeger, B. (1990). The R*-tree: An
efficient and robust access method for points and rectangles. SIGMOD Record,
19(2):322–331.
Beckmann, N. and Seeger, B. (2009). A revised R*-tree in comparison with related index
structures. In ACM SIGMOD Int. Conf. on Management of Data, pages 799–812.
Gaede, V. and Günther, O. (1998). Multidimensional access methods. ACM Computing
Surveys, 30(2):170–231.
Guttman, A. (1984). R-trees: A dynamic index structure for spatial searching. SIGMOD
Record, 14(2):47–57.
Schneider, M. and Behr, T. (2006). Topological relationships between complex spatial
objects. ACM Trans. on Database Systems, 31(1):39–81.
Sowell, B., Salles, M. V., Cao, T., Demers, A., and Gehrke, J. (2013). An experimental
analysis of iterated spatial joins in main memory. Proc. VLDB Endow.,
6(14):1882–1893.