
2016 International Conference on Computer, Control, Informatics and its Applications

Spatial Co-location Pattern Discovery Using Multiple Neighborhood Relationship Function
Erna Piantari1, Saiful Akbar2
School of Electrical Engineering and Informatics
Institut Teknologi Bandung
Indonesia
erna.piantari14@gmail.com1, saiful@informatika.org2

Abstract— Co-location pattern discovery is the process of finding a subset of Boolean spatial features that are frequently located in the same geographic area. Several approaches have been used for this process. Most co-location mining has been done for point data whose features come from the same domain. In reality, however, spatial data has three types: point, line, and polygon. In this paper, we discover spatial co-location patterns that involve all three types of spatial data from different domains. We propose a multiple neighborhood relationship function to find neighborhood relations across multiple types and multiple domains of spatial data, and apply co-location mining with the joinless approach to find co-location patterns. The evaluation of the proposed method on real data shows that a multiple neighborhood relationship function is needed to extract correct and complete spatial relationships from data with expanded data types and heterogeneous data sources.

Keywords— spatial co-location; co-location pattern; spatial mining; spatial neighborhood relationship

I. INTRODUCTION
Spatial data mining, or spatial knowledge discovery, is the process of extracting knowledge, spatial relations and spatial patterns that cannot be extracted directly from a spatial dataset [1]. With the fast growth of spatial data, spatial data mining has become increasingly important. One of the processes in spatial data mining is spatial co-location pattern discovery. A spatial co-location pattern is a subset of Boolean features that are frequently located in the same geographic area [2]. Spatial co-location patterns may give important insights in many applications. In ecology and environmental management, for example, they can be used to identify the spatial objects that are frequently found close to damaged environments (e.g. fire, flood). In transportation, G. Kiran et al. [3] used co-location patterns to analyze the factors that cause accidents and to suggest remedies.

There are two things to be aware of when discovering co-location patterns. The first is finding pairs of spatial objects that have a spatial co-location relationship. We call such a pair a co-location instance. A neighborhood relationship function is required to find co-location instances. This function determines the co-location relationship of the objects in the spatial dataset. The neighborhood function can be defined using graph theory as a connected, adjacent or connectivity matrix, by using the size of the displacement distance, the Euclidean distance, or a combination of them. The choice of neighborhood function is based on the semantics of the data domain. The second is how to identify row co-location instances. A row co-location instance is a clique co-location instance that may consist of more than one pair of co-location instances. Identifying row co-location instances can be difficult because the objects in the spatial domain are continuous [2].

Several approaches have been used to discover spatial co-location patterns; one of them is the Joinless Approach. This approach has several advantages over the other approaches: it does not require a reference feature to establish neighborhood relations, so it can be used for spatial data that cannot be clustered well, and its complexity is better than that of the Join Approach. It uses a star neighbor partition model of the neighborhood relations and has been evaluated on point-type spatial objects, where it gives better results than the Join Approach.

A. Problems

1) Expansion of the data types
In reality, spatial data has three types: points, lines, and polygons. However, co-location pattern discovery involving all three types is still not widely applied. The difference between co-location discovery on points and on the other types lies in the determination of the neighborhood function. Previous research has conducted co-location pattern mining involving all spatial types by building a framework around a buffer function, which is used to generate the relations between spatial objects [4]. With this approach, we must consider the buffer size of the spatial objects. However, applying the buffering operation to all types of data is not efficient. For point data, an alternative is to generate the neighborhood relationship using a near distance function. The near distance function calculates the distance from each object to the other nearby objects and generates a distance matrix. By calculating the distance to nearby objects, we can determine the relationship between two point objects. This approach is more efficient than the buffer operation for point data. Therefore, the determination of the neighborhood function has to consider the type of spatial data.
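
The near distance function described above can be illustrated with a minimal sketch. It assumes planar (x, y) coordinates and a Euclidean metric; the function names and the sample points are hypothetical and are not taken from the paper's implementation.

```python
import math

def distance_matrix(points):
    """Return an n x n matrix of Euclidean distances between (x, y) points."""
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            matrix[i][j] = matrix[j][i] = d  # the matrix is symmetric
    return matrix

def neighbor_pairs(points, threshold):
    """Return index pairs (i, j), i < j, whose distance is within the threshold."""
    matrix = distance_matrix(points)
    n = len(points)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if matrix[i][j] <= threshold]

# Hypothetical sample: only the first two points are close enough.
points = [(0.0, 0.0), (3.0, 4.0), (10.0, 0.0)]
pairs = neighbor_pairs(points, threshold=5.0)
```

Each returned pair plays the role of a point-to-point co-location instance; the full matrix costs one distance computation per pair of objects, which is what motivates an index for larger datasets.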

978-1-5090-2323-3/16/$31.00 ©2016 IEEE
2) Heterogeneous Data Source
As mentioned earlier, the determination of the neighborhood function should be based on the domain semantics of the data being processed. Within the same database, semantic differences between features will probably require different neighborhood functions. This often happens in a spatial dataset consisting of different data types (points, lines or polygons). Therefore, we need an approach that allows a different neighborhood relation function for each feature in the dataset.
In this study, co-location pattern mining is done using all types of spatial objects (points, lines, and polygons). The process of materializing neighborhood relations uses several different functions, one for each feature. Co-location pattern mining is then done with the Joinless approach, which has proved effective for finding co-location patterns in data that have no reference feature and cannot be clustered properly [2].

B. Contributions
Based on the problems above, the contributions of this paper are:

1) Propose a multiple neighborhood function to find spatial neighborhood relations in a dataset that involves three types of data from various domains
2) Apply the Joinless approach to discover spatial co-location patterns in a dataset that involves three types of spatial data
3) Conduct experiments to evaluate the proposed method using real data.

The rest of the paper is organized as follows: Chapters II and III present related work and the proposed solution for the defined problem. Chapter IV presents the implementation and results of co-location pattern discovery using real data. Chapter V presents the evaluation, and the conclusion is presented in Chapter VI.

II. RELATED WORK
In this section, we discuss the basic concepts and several works related to the topic of this paper.

A. Co-location Spatial
A spatial co-location pattern indicates the existence of a subset of a set of Boolean spatial features whose members frequently appear in the same geographic area. The pattern represents the relationship between events that occur at different locations but close together.

There are several approaches that can be used to find spatial co-location patterns, i.e. the spatial statistics approach and the data mining approach [5][6]. The spatial statistics approach uses spatial correlation calculations to classify the relationship between different spatial features. The calculations include chi-square tests, correlation coefficients, and regression models.

The co-location mining approach is an algorithm that generates all co-location patterns and rules whose prevalence and conditional probability are above the specified minimum prevalence and minimum conditional probability [2]. In co-location mining algorithms, every feature is initialized as a co-location set of size k = 1, where k is the number of members of the co-location set. Next, co-location patterns of size k = 2 are generated, and so on until k reaches or approaches the number of features contained in the data. For any size k > 1, the participation index is calculated and used as the interestingness measure of the co-location pattern. The participation index is obtained from the participation ratio (PR) with the following formulas [5][6][8]:

$$PR(c, f_i) = \frac{\left|\pi_{f_i}\left(\text{instances of } c\right)\right|}{\left|\text{instances of } f_i\right|} \qquad (1)$$

$$PI(c) = \min_{f_i \in c} PR(c, f_i) \qquad (2)$$

where c is a co-location pattern, f_i is a feature in c, and the numerator of (1) counts the distinct objects of f_i that take part in instances of c.

Yan Huang and Pungseng Zhang developed a framework for co-location pattern mining using a clustering approach [7]. This approach is suitable for data that can be clustered properly. Seung Kwan Kim et al. constructed a framework for co-location pattern mining using a transaction-based approach that employs maximal cliques as a transaction-type dataset [9]. This approach is suitable for data that has a reference object, for example to discover co-location patterns in cancer image data. Venkatesan et al. used the participation index as the interest measure [8]. For data that cannot be clustered and has no reference object, Soung Ji Yoo and Shashi Shekhar developed the joinless approach to co-location pattern mining [2]. This approach implements a star neighbor partition to avoid the join process, which has a high computational cost.

B. Neighborhood Relation
Nearest neighbor search is a fundamental process in machine learning and data mining [9]. Various methods have been used to find the nearest neighbors of spatial data. One simple method uses the distance from each object to every other object. Another method used to find the relation between spatial objects is the spatial query for topology relations.

1) Distance Relation
A distance relation defines how far a certain object is from another object. A distance relation function results in a real value that can be calculated. Several distance functions are often used to find neighborhood relations. The simplest one is the Euclidean distance function. Another distance function is the Manhattan distance [11].

2) Topology Relation
In GIS, topology is the term used to describe the geometric characteristics of objects which do not change under transformations such as stretching or bending and are independent of any co-ordinate system [12] [13]. Topology, as related to spatial data, consists of three elements (adjacency, containment, and connectivity) which describe the geometric

relationship that exists between spatial objects [14]. Clementini has defined a formal description of topology relations using predicate functions [15]. There are eight topology relations:

1) Disjoint
The disjoint relation is true if there is no intersection between the two geometry objects.
2) Equal
Equal(A,B) is true if the interior of A intersects the interior of B and no part of the interior or boundary of either geometry intersects the exterior of the other. In other words, the equal relation is true if the two geometry objects have the same type and identical coordinates.
3) Intersection
Intersection is the opposite of disjoint. This relation is true if there is an intersection between the two geometry objects.
4) Touch
The touch relation is true if the two geometries share at least one boundary point but their interiors do not intersect. At least one of the objects has to be a line, polygon, multi-line or multi-polygon.
5) Crosses
This relation is true if the two geometries share some, but not all, interior points and the intersection has a lower dimension than the higher-dimensional of the two objects.
6) Overlap
This relation is true if there is an intersection between two objects that have the same dimension.
7) Within
This relation is true if the whole first object is within the second object.
8) Contains
Contains is the opposite of within. This relation is true if the whole second object is within the first object.

III. THE PROPOSED SOLUTION
There are five main phases in mining co-location patterns: dataset preparation, materializing neighborhood relations, generating candidate co-locations, filtering, and generating co-location patterns (Fig. 1).

[Figure 1: flow diagram. Spatial dataset → Materialize neighborhood relations → Generate co-location pattern candidate → Filtering → Generate co-location pattern]
Fig. 1. Co-location mining process

Our solution to the problem in this paper is to use different relation functions for different data types and domains. Therefore, the research in this paper focuses on the materialize neighborhood relation phase. The input for materializing neighborhood relations is spatial vector data (point, line and polygon) from heterogeneous data sources. The output of materializing the neighborhood is the set of co-location instances that are passed to the next process.
We propose three operations for materializing neighborhood relations: kd-tree construction, the buffering operation, and topology relations (Fig. 2). The kd-tree and buffering operations are used specifically to search for object relations defined by distance.

[Figure 2: diagram with boxes: Spatial data; Kd-tree; Buffering; Topology relation using spatial query; Generate star neighborhood; Co-location instance]
Fig. 2. Materialize neighborhood relations for expansion of data types

A. Distance neighborhood relation
The process of finding the nearest neighbors is usually done by calculating the distance from each object to all other objects in the dataset. The nearest neighbors of a reference object are obtained by looking for other objects whose distance is within a certain threshold. The complexity of calculating all distances in this way is O(N), where N = n × n is the number of object pairs, i.e. O(n²) in the number of objects n.

3) Using Kd-tree
For large data, searching for neighbors by building an n × n distance matrix is not efficient: objects at a great distance can be ruled out as nearest neighbors, so the complete calculation is not required. Therefore, we propose building a kd-tree index. This index is used to find the nearest neighbors of point data within a given radius. The method is quite effective for locating nearest neighbors in data with a large number of rows but low dimensionality [10].
In Fig. 3, the kd-tree index is constructed by dividing each area at the median value of the data points. The division is performed on the y-axis and the x-axis in turn and stops when the amount of data in an area is small or equal to 1. Given the kd-tree index and a radius r, the nearest neighbors of the object q are the data residing in the regions crossed by the circle of radius r, namely regions 1, 2, 3, 4, and 5. The complexity of a nearest neighbor search using the kd-tree index is O(log n). It is more efficient than calculating all distances and searching for the neighbors with minimum distance.

Fig. 3. Illustration of searching nearest neighbor using Kd-tree structure [9]

As illustrated in Fig. 3, the nearest neighbors generated by the kd-tree do not have a known distance

value from the reference object q. So, we have to calculate the distance between the reference object and each nearest neighbor. We use the Euclidean or Manhattan distance function, depending on the requirement. Fig. 4 shows an algorithm to materialize neighborhood relations using a kd-tree.

Get co-location instance using kd-tree
Input
    S : spatial dataset
    r : radius for nearest neighbor search
    minDistance : threshold
Output
    colocationInstance : pairs of spatial objects that form co-location instances
Variable
    tree : kd-tree structure
    i : index of S
    NN : list of nearest neighbors found using the kd-tree
Process
    tree = kdtree(S)
    for i in S
        NN = getNearestPoint(r, tree, i)
        for n in NN
            if (getDistance(n, i) < minDistance)
                colocationInstance.add(n, i)
            end if
        end for
    end for
Fig. 4. Materialize neighborhood relation for point using kd-tree

4) Buffering
In contrast to point data, a distance function cannot be used directly to find distance relations for polygons and lines. Therefore, we use the buffer operation to generate distance relations for polygons and lines. H. Xiong and Shekhar (2004) applied the buffer operation to co-location mining with extended data [4]. Fig. 5 shows the algorithm to materialize neighborhood relations using the buffer operation.

Co-location instance using buffering
Input
    S : spatial dataset
    r : radius for buffering
    TopologiQuery : topology function
Output
    colocationInstance : pairs of spatial objects that form co-location instances
Variable
    bufferZone : new polygons resulting from the buffering operation
    n : index of bufferZone
    i : index of S
Process
    bufferZone = SpatialQueryBuffering(r, S)
    for n in bufferZone
        for i in S
            if (TopologiQuery(bufferZone[n], i))
                colocationInstance.add(n, i)
            end if
        end for
    end for
Fig. 5. Materialize neighborhood relation using buffering

5) Topology relation
Topology relation functions can help to find a relationship between two or more spatial objects of type point, line or polygon. In addition to the data domain, the spatial data type determines which topology relation functions can be used to generate spatial object relationships. Determining the relation on any feature of the data is based on the formal definition by Eliseo Clementini et al. [15].

Co-location instances are generated by applying one topological relation function to all spatial objects of certain features. The result of the relation function is a Boolean value, true or false. For example:

A and B are spatial objects,
f(A, B): topological relation function,
if f(A, B) = true then A and B are a co-location instance.

Technically, topology relations can be generated by a spatial query operation.

Based on the explanation above, the distance function and the topology relation functions can serve as alternative neighborhood relation functions. Table I presents the alternative relation functions that can be used for points, lines, and polygons.

TABLE I. NEIGHBORHOOD RELATION FUNCTION
Alternative Relation Functions (A,B)
Spatial object A | Spatial object B | Distance relation | Topology relations (equal, touch, cross, overlap, within, intersection, contained)
point | point | kd-trees | √ √
point | line | buffer | √ √ √
point | polygon | buffer | √ √ √ √
line | line | buffer | √ √ √ √ √ √ √
line | polygon | buffer | √ √ √ √
polygon | polygon | buffer | √ √ √ √ √

The materialize neighborhood process is illustrated below. Fig. 6 shows a dataset that consists of three types of data: C1 is a polygon, B1 is a line, and A1, A2, A3, A4 and A5 are points. The neighborhood functions used are presented in Table II.

The neighborhood relation between two points is generated using the distance function and is represented by a line. A1 and A2 have a line between them, so A1 and A2 are a pair of co-location instances. Buffering zones have been created for the line and the polygon. A3 is within B1's buffering zone, A4 is within B1's buffering zone and C1's buffering zone, and A5 is within C1's buffering zone. Therefore (A3, B1), (A4, B1), (A4, C1) and (A5, C1) are pairs of co-location instances.

Fig. 6. Dataset for illustration
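
The kd-tree procedure of Fig. 4 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: points are (x, y) tuples, the metric is Euclidean, and the names `build_kdtree`, `radius_search` and `colocation_instances` are hypothetical stand-ins for `kdtree`, `getNearestPoint` and the outer loop of the algorithm.

```python
import math

def build_kdtree(points, depth=0):
    """Recursively split (x, y) points at the median, alternating the axis."""
    if not points:
        return None
    axis = depth % 2
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return {
        "point": ordered[mid],
        "axis": axis,
        "left": build_kdtree(ordered[:mid], depth + 1),
        "right": build_kdtree(ordered[mid + 1:], depth + 1),
    }

def radius_search(node, query, r, found=None):
    """Collect every stored point within Euclidean distance r of query."""
    if found is None:
        found = []
    if node is None:
        return found
    point, axis = node["point"], node["axis"]
    if math.dist(point, query) <= r:
        found.append(point)
    diff = query[axis] - point[axis]
    # Descend into a half-plane only if the search circle reaches it.
    if diff - r <= 0:
        radius_search(node["left"], query, r, found)
    if diff + r >= 0:
        radius_search(node["right"], query, r, found)
    return found

def colocation_instances(points, r, min_distance):
    """Pair each point with kd-tree neighbors closer than min_distance."""
    tree = build_kdtree(points)
    pairs = set()
    for p in points:
        for q in radius_search(tree, p, r):
            # Points with identical coordinates are treated as the same object.
            if p != q and math.dist(p, q) < min_distance:
                pairs.add(tuple(sorted((p, q))))
    return sorted(pairs)

# Hypothetical sample data.
points = [(1.0, 1.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.5), (20.0, 20.0)]
tree = build_kdtree(points)
instances = colocation_instances(points, r=2.0, min_distance=1.2)
```

In this sample the radius search around (8.0, 8.0) returns (8.0, 9.5) as a candidate, but the pair is rejected by the `min_distance` check, which mirrors the two-step structure of Fig. 4: candidate retrieval through the index, then an explicit distance test.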

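
The Fig. 6 illustration can be reproduced with a small sketch that applies a different neighborhood function per type pair. The coordinates, radii and helper names below are assumptions chosen only so that the layout resembles the description; the paper's implementation uses spatial queries rather than these hand-written tests. The sketch relies on the fact that a point lies inside a round buffer of radius r around a geometry exactly when its distance to that geometry is at most r.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.dist(p, (ax + t * dx, ay + t * dy))

def point_in_polygon(p, ring):
    """Ray-casting test; ring lists the vertices without repeating the first."""
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside

def within_line_buffer(p, line, r):
    """True if p is within distance r of any segment of the polyline."""
    return any(point_segment_distance(p, a, b) <= r
               for a, b in zip(line, line[1:]))

def within_polygon_buffer(p, ring, r):
    """True if p is inside the polygon or within distance r of its boundary."""
    edges = zip(ring, ring[1:] + ring[:1])
    return point_in_polygon(p, ring) or any(
        point_segment_distance(p, a, b) <= r for a, b in edges)

# Hypothetical coordinates for the objects named in Fig. 6.
points = {"A1": (0, 0), "A2": (1, 0), "A3": (5, 1), "A4": (9, 1), "A5": (12, 3)}
lines = {"B1": [(4, 0), (10, 0)]}
polygons = {"C1": [(10, 1), (13, 1), (13, 4), (10, 4)]}
POINT_THRESHOLD, BUFFER_RADIUS = 1.5, 1.5  # assumed values

instances = []
names = sorted(points)
for i, a in enumerate(names):            # point-point: distance function
    for b in names[i + 1:]:
        if math.dist(points[a], points[b]) <= POINT_THRESHOLD:
            instances.append((a, b))
for a in names:                          # point-line and point-polygon: buffer
    for name, line in lines.items():
        if within_line_buffer(points[a], line, BUFFER_RADIUS):
            instances.append((a, name))
    for name, ring in polygons.items():
        if within_polygon_buffer(points[a], ring, BUFFER_RADIUS):
            instances.append((a, name))
```

With these assumed coordinates the sketch yields (A1, A2), (A3, B1), (A4, B1), (A4, C1) and (A5, C1). The line-polygon relation is left out to keep the example short.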
TABLE II. MATERIALIZE NEIGHBORHOOD ILLUSTRATION
Object 1 | Object 2 | Neighborhood function | Co-location instance
Point | Point | Distance | (A1, A2)
Point | Polygon | Buffering and within | (A4, C1), (A5, C1)
Point | Line | Buffering and intersection | (A4, B1), (A3, B1)
Polygon | Line | Buffering and intersection | (B1, C1)

Furthermore, the star neighbors are generated from the co-location instances. Table III is the star neighbor from the co-location instances in Table II. The star neighbors are then processed to discover co-location patterns using the joinless approach algorithm by Soung Ji Yoo and Shashi Shekhar [2].

IV. IMPLEMENTATION AND RESULT
In this section, we present the implementation used to discover co-location patterns in real data. The tools used for the implementation are QGIS for displaying the results, PostGIS for storing the data and executing the spatial queries, and Python 2.4 as the primary language.

B. Data
This implementation used three types of data with varied features. Table IV provides a list of the data used.

TABLE III. DATASET WITH FEATURE VARIATIONS
Feature | Data Type
Garbage dump, market, epidemic of Dengue Hemorrhagic Fever (DHF) | Point
River | Line
Level of population area, level of health service area | Polygon

C. Materialize neighborhood
Considering the alternative neighborhood relations in Table I, this implementation uses the neighborhood relations in Table V.

TABLE IV. NEIGHBORHOOD RELATION FUNCTION FOR IMPLEMENTATION
Data 1 | Data 2 | Operation | Neighborhood relation
Point | Point | Kd-trees | Euclidean distance
Point (all data that have point type) | Line (river) | Create river buffer | Within
Line (river) | Polygon (all data that have polygon type) | Create river buffer | Intersection
Polygon (all data that have polygon type) | Polygon (all data that have polygon type) | - | Equal
Polygon | Point | - | Within

D. Co-location pattern
Co-location mining was performed several times using different participation index thresholds. Co-location mining with participation index thresholds of 0.45, 0.40, 0.35, 0.30, and 0.25 produced co-location patterns with a maximal length of 5. Therefore, we used 0.45 as the participation index threshold. The co-location pattern mining generated 174 rules. The pattern that has the largest participation index is 'area with high population', 'river' with pi = 0.906. This value indicates that rivers and high-population areas have a strong co-location relationship. Fig. 7 maps the river and the high-population areas, and Fig. 8 shows the co-location instances generated by co-location mining. In Fig. 8, the area with id = 6 is not a co-location instance because there is no intersection between that area and the river.

Fig. 7. Mapping river and high population area

Fig. 8. Mapping co-location instance of river and high population area with pi=0.906

Another rule is market and big garbage dump, which has pi = 0.75. Fig. 9 shows the co-location instances of this rule. The relations between objects are represented as lines.

Fig. 9. Co-location relation of market and big garbage dump with pi=0.75

Fig. 10 shows that the co-location relation between the epidemic of DHF and middle garbage dump is low. The value of this rule is 0.28.
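
The participation index values quoted for these rules follow the standard definition: the participation ratio of a feature is the fraction of its objects that take part in at least one instance of the pattern, and the participation index is the minimum ratio over the pattern's features. A minimal sketch is given below; the object identifiers and counts are made-up toy values, not the paper's dataset, and the function name is hypothetical.

```python
def participation_index(pattern, row_instances, feature_counts):
    """Return (participation index, participation ratio per feature).

    pattern        : tuple of feature names, e.g. ("market", "garbage_dump")
    row_instances  : list of tuples of object ids, aligned with pattern
    feature_counts : total number of objects of each feature in the dataset
    """
    ratios = {}
    for pos, feature in enumerate(pattern):
        # Distinct objects of this feature that appear in any instance.
        distinct = {row[pos] for row in row_instances}
        ratios[feature] = len(distinct) / feature_counts[feature]
    return min(ratios.values()), ratios

# Toy data: 4 markets and 3 garbage dumps exist in total (assumed).
pattern = ("market", "garbage_dump")
rows = [("m1", "g1"), ("m2", "g1"), ("m2", "g2")]
counts = {"market": 4, "garbage_dump": 3}

pi, pr = participation_index(pattern, rows, counts)
is_prevalent = pi >= 0.45  # threshold used in the implementation above
```

Here two of the four markets and two of the three garbage dumps participate, so the ratios are 0.5 and about 0.67 and the index is 0.5, which would pass the 0.45 threshold.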

Fig. 10. Co-location relation of epidemic of DHF and middle garbage dump with pi = 0.28

V. EVALUATION
The evaluation is conducted against two conditions:

A. Completeness
Completeness indicates that the set of generated patterns is complete, i.e. no pattern is lost. To fulfil this condition, the materialize neighborhood relation phase has to generate all relations in the dataset. The correct relation function is used for every feature in the dataset so that the complete set of relations is generated for each feature. Furthermore, the star neighbors are generated without losing any relation.

B. Correctness
Correctness indicates that all generated patterns are correct. This condition is fulfilled by the filtering process. Applying clique filtering and prevalence filtering in joinless co-location mining ensures that all generated patterns fulfil this condition. Furthermore, the co-location patterns resulting from the implementation are evaluated by their participation index values. Based on human judgment using a visualization of each pattern, the resulting co-location patterns and their participation index values are acceptable.

VI. CONCLUSION AND FUTURE WORK
In this paper, we performed co-location pattern mining with expanded data types (point, line and polygon) and heterogeneous data sources. With three alternative neighborhood relations, we materialized neighborhood relations for points, lines and polygons. Applying a different function for each feature is a solution for obtaining complete and correct relations for a dataset with expanded data types and heterogeneous data sources. In the future, co-location mining should be investigated under hierarchical relation conditions.

ACKNOWLEDGMENT
This research was supported by Sekolah Teknik Elektro dan Informatika, Institut Teknologi Bandung.

REFERENCES
[1] Guo Dianseng, Jeremy Mennis, "Spatial data mining and geographic knowledge discovery," Elsevier: Computers, Environment and Urban Systems 33, 2009, pp. 403-408.
[2] Soung Ji Yoo, Shashi Shekhar, "A joinless approach for mining spatial co-location patterns," IEEE Transactions on Knowledge and Data Engineering, 10 (18), pp. 1323-1337, 2006.
[3] G. Kiran Kumar, P. Premchad, T. Venu Gopal, "Mining of spatial co-location pattern from spatial datasets," International Journal of Computer Application (0975-8887), Vol. 42, No. 21, 2012.
[4] Xiong Hui, Shashi Shekhar, Yan Huang, Vipin Kumar, "A framework for discovering co-location patterns in data sets with extended spatial objects," In: Proceedings of the 2004 SIAM International Conference on Data Mining (SDM'04), Lake Buena Vista, FL, pp. 78-89, 2004.
[5] Huang Yan, Shashi Shekhar, Hui Xiong, "Discovering co-location patterns from spatial dataset: a general approach," IEEE Transactions on Knowledge and Data Engineering, Vol. 16(12), pp. 1472-1485, 2004.
[6] Zala Rushirajsinh, Brijesh B Metha, Mahipalsinh R. Zala, "A survey on spatial co-location patterns discovery from spatial dataset," International Journal of Computer Trends and Technology (UCTT), Vol. 7, No. 3, Jan 2014.
[7] Huang Yan, Pusheng Zang, "On the relations between clustering and spatial co-location pattern mining," IEEE International Conference on Tools with Artificial Intelligence, 2006.
[8] Venkatesan, Arunkumar Thangavelu, Prabhavathy, "Event centric modeling approach in co-location pattern analysis from spatial data," International Journal of Database Management Systems (IJDMS), Vol. 3, No. 3, August 2011.
[9] Kwan Seung Kim, Jee Hyung Lee, Keun Ho Ryu, "A framework of spatial co-location pattern mining for ubiquitous GIS," Proc. Springer Science Business Media, 2014.
[10] Liberty Edo, "Nearest neighbor search," http://www.cs.yale.edu/homes/el327/ , accessed 26 August 2015.
[11] Greenacre M, "Correspondence analysis and related methods," http://www.econ.upf.edu/~michael/stanford/ , accessed 26 May 2016.
[12] Heywood Ian, Sarah Cornelius, Steve Carver. An Introduction to Geographical Information Systems. Pearson Prentice Hall. 2006.
[13] Bernharden T. Choosing a GIS. In: Longley P A, Goodchild M F, Maguire D J and Rhind D W (eds) Geographical Information System. Wiley, New York. pp. 589-600. 1999.
[14] Burrough P. Principles of Geographical Information Systems for Land Resources Assessment. Clarendon Press. Oxford.
[15] Clementini Eliseo, Paolino Di Felice, Peter van Oasterom, "A small set of formal topological relationships suitable for end-user interaction," In: Abel David; Ooi, Beng Chin. Advances in Spatial Databases: Third International Symposium, SSD '93 Singapore, June 23-25. 1995. Proceedings. Lecture Notes in Computer Science. 692/1993. Springer. pp
