MySQL and MongoDB For GIS Applications-Final
Contents
Introduction
Theoretical knowledge
1. Geographical Data
2. Geometric Data
3. Geospatial Data
Method
Previous Research
Hypothesis
Variables
Dataset
Code for Testing sqlite3
Performance analysis of sqlite3
Code for testing MongoDB
Performance analysis of pymongo
Discussion
Conclusion
References
Introduction
The integration of databases within Geographic Information Systems (GIS) has gained paramount
importance in recent years, driven by the exponential growth of spatial data acquisition and
analysis. As GIS applications continue to expand into new domains, the need for efficient and
reliable databases to store and manage spatial data has intensified. A database management system
is essential in managing data efficiently and enabling users to complete various tasks with ease, as
it "improves the effectiveness of business processes and decreases overall costs" (Sahatqija et al.,
2018a, p. 1).
Efficient and reliable database systems are equally crucial in GIS applications, where the volume
and complexity of spatial data keep growing. The choice of database system can significantly
affect data management, analysis, and decision-making, and therefore operational efficiency and
cost. Understanding these implications is essential when evaluating the suitability of systems such
as MongoDB and MySQL for GIS applications. Choosing the right database management system is a
critical decision that firms and software developers face regularly. This paper examines two
database systems commonly used in GIS applications: MongoDB and MySQL. Each system offers
distinct features and capabilities that make it suitable for different scenarios within GIS. Our
goal is to compare the strengths and weaknesses of MongoDB and MySQL by answering the following
research question:
"Which database has faster retrieval time for data and how does it vary according to a specific
data type for GIS applications, and how do MySQL and MongoDB perform against that
data?"
This research will provide GIS professionals and organizations with informed insights into
each database's operational efficiency, focusing in particular on ease of
implementation, performance during complex spatial queries, and overall suitability for
various GIS applications. The objective therefore not only addresses a significant academic
gap but also serves a practical need in the GIS community.
Theoretical knowledge
Overview of Geographic Information Systems (GIS)
Geographic Information Systems (GIS) are integral tools that provide the ability to capture, store,
manipulate, analyze, manage, and present all types of geographical data. These systems are pivotal
in supporting decision-making processes across a multitude of sectors. Heywood et al. describe GIS
as follows: “In general, the definitions of GIS cover three main components. They reveal that GIS
is a computer system. This implies more than just a series of computer boxes sitting on a desk but
includes hardware (the physical parts of the computer itself and associated peripherals - plotters
and printers), software (the computer programs that run on the computer) and appropriate procedures
(or techniques and orders for task implementation).” (Heywood et al., 2010, p. 18). This definition
presents GIS as a computer system that comprises hardware, software, and procedures, and it
reflects the interdisciplinary nature of GIS, which integrates technology, software, and
methodology. By emphasizing these elements, it conveys the complexity of such systems and dispels
the impression that they consist merely of computers and peripherals: software applications and
data management techniques are as critical as the hardware components. Successful use of GIS
therefore requires not only appropriate hardware and software but also suitable procedures and
organizational structure. Together, these components make GIS an integrated tool for storing,
managing, and analyzing data in support of decision-making. GIS can also be defined more generally
as “A special kind of information system, often located on a user's desk, dedicated to performing
special kinds of operations related to location.” (Longley et al., 2015, p. 3). This definition
offers a more user-centric perspective, portraying GIS as a specialized information system focused
on location-related operations. In short, GIS is a facility for handling geographical information
and tasks through spatial operations.
Introduction to Spatial Data
Another term commonly used in GIS is spatial data. Spatial data, also known as geospatial data,
encompasses information about the geographic location and characteristics of natural or
constructed features on the earth, integrating location with descriptive information. This data type
is foundational to Geographic Information Systems (GIS), which are designed to collect, store,
process, and interpret datasets tied to spatial contexts.
2. Geometric Data
Geometric data focuses on the shapes and relative positions of objects in space. This includes
points (single locations), lines (roads, rivers), and polygons (areas like parks, building
footprints), which are used to represent spatial dimensions and boundaries.
3. Geospatial Data
Encompassing both geographical and geometric data, geospatial data adds additional layers of
information such as elevation, population density, or traffic patterns.
Geospatial data often involves a temporal component that tracks changes over time, providing
dynamic insights into the evolution of a particular area. Geospatial data refers to information
about objects, events, or other features with a position on Earth (Khalilizangelani and
Ghaffarian, n.d.). What distinguishes spatial data from other data is its coordinates, such as
latitude and longitude, which tie each attribute to a specific location. This spatial reference is
why such data is critical for Geographic Information Systems (GIS) and other applications that
depend on maps, location analysis, and spatial relationships. It makes it possible to identify
patterns, trends, and correlations in fields such as urban planning, environmental monitoring, and
public health. Spatial data is thus a key factor in spatially oriented decision-making, where it
provides insights that support better management as well as a better understanding of an
interconnected world.
MongoDB
MongoDB is a NoSQL database that provides high performance, high availability, and easy
scalability. MongoDB stores data in flexible, JSON-like documents, meaning fields can vary
from document to document and data structure can be changed over time. This model makes the
integration of data in certain types of applications easier and faster.
As a NoSQL database, MongoDB allows flexibility both in storing data and in performing operations
on it. It is a document-oriented database written in C++. An entry in such a database is stored as
an object, which in turn is serialized as JSON, XML, or BSON; MongoDB itself uses BSON. The objects
do not need to have the same structure, fields, or types of fields, which makes the database
flexible (Li & Manoharan, 2013a, n.p.). As a result, an application can change its data model
dynamically without maintaining a strict schema, which can be difficult to use or maintain when
requirements change or when data must be processed in real time. MongoDB can therefore accommodate
heterogeneous collections on the fly without the overhead of costly schema migrations.
This flexibility fits agile development, especially where data requirements are complex or
uncertain, for example in content management systems, IoT, or big data analytics. Currently,
MongoDB uses GeoJSON objects to store spatial geometries. GeoJSON is an open specification for the
JSON formatting of shapes in a coordinate space. The GeoJSON spec is used in the geospatial
community and there is growing library support in most popular languages. Each GeoJSON document
(or subdocument) is generally composed of two fields:
1. Type – the shape being represented, which informs a GeoJSON reader how to
interpret the “coordinates” field.
2. Coordinates – an array of points, the specific arrangement of which is determined
by the “type” field (Agarwal, n.d., p. 39).
However, MongoDB does not support R-trees.
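As a minimal sketch of the two fields described above, the following Python snippet writes a point, a line, and a polygon as GeoJSON-style dictionaries. The coordinate values are borrowed from the study area used later in this paper purely for illustration, and the helper name `describe` is hypothetical. Note that GeoJSON lists longitude before latitude.

```python
import json

# GeoJSON geometries: "type" names the shape, "coordinates" holds its positions.
# Positions are [longitude, latitude], in that order.
point = {"type": "Point", "coordinates": [7.1788184, 50.748386]}

line = {
    "type": "LineString",
    "coordinates": [[7.1759171, 50.7498233], [7.1788184, 50.748386]],
}

# A polygon is a list of linear rings; each ring repeats its first position at the end.
polygon = {
    "type": "Polygon",
    "coordinates": [[
        [7.17, 50.745], [7.18, 50.745], [7.18, 50.75], [7.17, 50.75], [7.17, 50.745],
    ]],
}


def describe(geometry):
    """Return a short summary showing how "type" determines the reading of "coordinates"."""
    kind = geometry["type"]
    coords = geometry["coordinates"]
    if kind == "Point":
        return f"Point at lon={coords[0]}, lat={coords[1]}"
    if kind == "LineString":
        return f"LineString with {len(coords)} positions"
    if kind == "Polygon":
        ring = coords[0]
        closed = ring[0] == ring[-1]
        return f"Polygon with outer ring of {len(ring)} positions (closed: {closed})"
    raise ValueError(f"unsupported geometry type: {kind}")


for geom in (point, line, polygon):
    print(describe(geom))

# The dictionaries serialize directly to JSON text.
print(json.dumps(point))
```

The same "type" key decides whether "coordinates" is read as one position, a list of positions, or a list of rings, which is why a reader must inspect it first.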
By storing geometries as GeoJSON objects, MongoDB adheres to a specification that is standard
across the geospatial community. GeoJSON encodes shapes in a coordinate system, which improves
interoperability and simplifies data sharing across systems and programming languages, supported
by a growing number of libraries. Each GeoJSON document has a clear structure built on the
"type" and "coordinates" fields: the type names the kind of geometry object (such as Point,
LineString, or Polygon), while the coordinates define the positions that make up the shape. This
structure allows spatial documents to be stored and processed effectively, which makes MongoDB
suitable for mapping, geolocation services, and spatial analytics. By supporting the GeoJSON
standard, MongoDB offers developers and data scientists an adaptable way to manage and analyze
location-based data. MongoDB supports geospatial queries and indexing, which makes it suitable for
handling GIS data that is less structured or rapidly evolving. Its geospatial features include
support for GeoJSON and legacy coordinate pairs, allowing for efficient querying of location-based
data.
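To illustrate the geospatial querying mentioned above, the sketch below builds the documents that a 2dsphere index and a `$near` query would use. It only constructs Python dictionaries and does not connect to a server. The field name `location`, the helper names, and the 500 m radius are assumptions for illustration and are not taken from the experiment.

```python
def make_place(name, lon, lat):
    """Build a document whose "location" field holds a GeoJSON Point."""
    return {"name": name, "location": {"type": "Point", "coordinates": [lon, lat]}}


def near_query(lon, lat, max_distance_m):
    """Build a $near filter: documents within max_distance_m metres, nearest first."""
    return {
        "location": {
            "$near": {
                "$geometry": {"type": "Point", "coordinates": [lon, lat]},
                "$maxDistance": max_distance_m,
            }
        }
    }


# Index specification for GeoJSON data: (field name, index type).
index_spec = [("location", "2dsphere")]

doc = make_place("node 507464742", 7.1788184, 50.748386)
query = near_query(7.1759171, 50.7498233, 500)

print(doc)
print(query)

# With pymongo and a running server these would be used roughly as:
#   collection.create_index(index_spec)
#   collection.insert_one(doc)
#   results = list(collection.find(query))
```

A `$near` filter needs a geospatial index on the queried field, which is why the index specification is shown next to the query.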
Method
A comparison between MySQL and MongoDB databases will be conducted using a geospatial
dataset under identical conditions. The dataset will be loaded and the queries executed using
Python within a Jupyter notebook environment, so that query performance can be measured and
compared between the two databases. This methodology allows consistent evaluation of execution
times, identification of optimization opportunities, and analysis of how each database handles
geospatial data. We answer our research question in this section through a quantitative approach:
we measure response times for both databases and analyze why one database performs better than
the other.
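The measurement described above can be sketched as a small timing helper. The function names `time_query` and `summarize` and the stand-in workload are hypothetical; in the experiment the callable would wrap an actual database query.

```python
import statistics
import time


def time_query(run_query, repeats=10):
    """Call run_query `repeats` times and return the elapsed seconds for each call."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Summarize a list of timings with mean, median, and extremes."""
    return {
        "mean": statistics.mean(durations),
        "median": statistics.median(durations),
        "min": min(durations),
        "max": max(durations),
    }


# Stand-in workload; a real run would execute a query against a database here.
records = [{"id": i} for i in range(10_000)]
timings = time_query(lambda: [r for r in records if r["id"] == 9_999])

summary = summarize(timings)
print(f"Time taken to execute query: {summary['mean']:.4f} seconds (mean of {len(timings)} runs)")
```

Using `time.perf_counter` rather than `time.time` gives a monotonic, high-resolution clock, which matters when individual queries finish in a few milliseconds.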
Previous Research
A total of five papers served as primary information sources and as inspiration for this
experiment. These papers were found using search terms such as “Geospatial Data SQL”, “Geospatial
Data NoSQL”, “maps MongoDB”, “maps MySQL”, and “geospatial data”, primarily in Google Scholar and
IEEE. Two of them are “SQL versus NoSQL databases for geospatial applications” (Baralis et al.,
2017) and “Performance Analysis of RDBMS and No SQL Databases: PostgreSQL, MongoDB and
Neo4j” (Sharma et al., 2018). The former discusses many databases, both NoSQL and SQL,
and includes descriptions for all supported geospatial data types within each
database. Although the paper did not test MongoDB or MySQL, it compared other NoSQL and
SQL databases, offering valuable insights into evaluating results and the impact of hardware on
tests. The study provides practical examples of data insertion and handling in both SQL and
NoSQL databases, including MongoDB. While the implementation
differed slightly from the intended approach for this thesis, it still offered useful information on
query usage for data insertion and retrieval. These findings supported the development of
database management and data structure strategies.
Hypothesis
The hypothesis formulated in this regard would be:
“MySQL, due to its structured query capabilities and indexing, will perform better in terms of
query execution times for GIS data compared to MongoDB.”
This hypothesis provides a basis for designing experiments and analyzing results. An
experimental approach is chosen because it allows for controlled and systematic comparisons
between MySQL and MongoDB. We then perform an experiment to measure the effect of the
independent variable (type of database) on the dependent variable (query execution time).
Variables
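The variables defined for this process are the independent variable (the type of database) and the
dependent variable (the query execution time).
The dataset is retrieved from the OpenStreetMap Overpass API, converted to GeoJSON, and saved as
CSV with the following steps:
    import requests
    import osm2geojson
    import geopandas as gpd

    # Overpass QL: nodes inside a bounding box, plus the ways and relations that use them
    Q = """
    [out:json][timeout:25];
    (
      node(50.745,7.17,50.75,7.18);
      <;
    );
    out body;
    >;
    """
    # Pass the query as a parameter so that it is URL-encoded correctly
    response = requests.get("http://overpass-api.de/api/interpreter", params={"data": Q})
    if response.status_code == 200:
        geojson_data = osm2geojson.json2geojson(response.json())
        # Assumed step (not preserved in the listing): build GeoDataFrame G from the features
        G = gpd.GeoDataFrame.from_features(geojson_data["features"])
        G.to_csv('datasetop.csv', index=False)
        # Print GeoDataFrame G
        print(G)
    else:
        # Assumed handling (the body of this branch is not preserved in the listing)
        print("Request failed with status", response.status_code)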
The following sample output shows how the program lets the user select a query and returns
different results depending on the selection:
1. View all data
2. Search data records
3. Exit
Enter your choice: 2
Enter your search query for 'id': 507464742
geometry type id \
0 POINT (7.1788184 50.748386) node 507464742
1 POINT (7.1788184 50.748386) node 507464742
tags nodes
0 {'VRS:gemeinde': 'BONN', 'VRS:ortsteil': 'Holz... None
1 None None
Time taken to execute query: 0.00 seconds
nodes
0 [99672363, 99668853, 1936339964, 8282081287, 2...
Time taken to execute query: 0.02 seconds
geometry type id \
0 POINT (7.1759171 50.7498233) node 507464720
1 POINT (7.1788184 50.748386) node 507464742
tags \
0 {'VRS:gemeinde': 'BONN', 'VRS:ortsteil': 'Holz...
1 {'VRS:gemeinde': 'BONN', 'VRS:ortsteil': 'Holz...
2 {'VRS:gemeinde': 'BONN', 'VRS:ortsteil': 'Holz...
3 {'VRS:gemeinde': 'BONN', 'VRS:ortsteil': 'Holz...
4 {'VRS:gemeinde': 'BONN', 'VRS:ortsteil': 'Holz...
... ...
1456 None
1457 None
1458 None
1459 None
1460 None
nodes
0 None
1 None
2 None
3 None
4 None
... ...
1456 [300758079, 516363187, 9155787455, 7773004753,...
1457 [99668851, 99672560, 1476849747, 978635870, 11...
1458 [100697013, 7827934685, 99668849, 1585742366, ...
1459 [3723260306, 632714853, 99672363]
1460 [99672363, 99668853, 1936339964, 8282081287, 2...
[1461 rows x 5 columns]
Time taken to execute query: 0.05 seconds
Performance analysis of sqlite3
With sqlite3, retrieving a single record by 'id' took 0.02 seconds and retrieving all records took
0.05 seconds, which indicates strong read performance. This is likely due mainly to an appropriate
data structure and indexing. SQLite is a lightweight embedded database designed for fast
retrieval, which makes it a good choice for applications with a moderate data workload.
SQLite's near-instant response to queries reflects effective use of indexing for fast lookups and
specific searches. This makes sqlite3 an efficient option when fast data access and flexible
querying are required.
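A minimal sketch of such a lookup is shown below, assuming an in-memory SQLite database and a hypothetical table `nodes` with an integer primary key. The table layout and the generated rows are placeholders rather than the dataset used in this paper. SQLite resolves a search on an `INTEGER PRIMARY KEY` column directly through the table's key, so the lookup by id does not have to scan every row.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, type TEXT, geometry TEXT)")

# Placeholder rows; the experiment loads the OpenStreetMap extract instead.
rows = [(i, "node", f"POINT ({7.17 + i * 1e-5:.5f} 50.74800)") for i in range(1, 1462)]
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)", rows)
conn.commit()


def timed(sql, params=()):
    """Run a query and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    result = conn.execute(sql, params).fetchall()
    return result, time.perf_counter() - start


one, t_one = timed("SELECT * FROM nodes WHERE id = ?", (42,))
everything, t_all = timed("SELECT * FROM nodes")

print(f"Search by id: {len(one)} row, {t_one:.4f} seconds")
print(f"View all data: {len(everything)} rows, {t_all:.4f} seconds")

# EXPLAIN QUERY PLAN reports how SQLite intends to resolve the lookup.
for step in conn.execute("EXPLAIN QUERY PLAN SELECT * FROM nodes WHERE id = ?", (42,)):
    print(step)
```

The query plan output is a quick way to confirm whether a given search uses a key or index instead of a full table scan.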
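    import time
    import pandas as pd
    from pymongo import MongoClient  # assumed import; needed to obtain `collection`
    ...
    sampled_df.to_csv('sampled_datasetop.csv', index=False)
    ...
    # Load the records into the MongoDB collection
    collection.insert_many(data)

    def menu():
        while True:
            ...

    # Execute menu
    menu()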
Performance analysis of pymongo
With pymongo, retrieving all records from the database took 1.23 seconds, while finding a record
by id took 0.14 seconds. The difference can be explained by the nature of the two queries:
retrieving the full record set involves scanning the entire collection and printing every
document, and consequently takes longer to execute than a lookup of a single id.
Discussion
Comparing both databases, SQLite3 clearly outperforms MongoDB in terms of query execution time in
this experiment: 0.02 versus 0.14 seconds for a lookup by id, and 0.05 versus 1.23 seconds for
retrieving all records. SQLite3's speed advantage likely comes from its lightweight, embedded
nature: the database engine runs inside the Python process through the sqlite3 library, so no
separate database server had to be installed or configured on the device, and data can be held in
memory, which reduces disk I/O overhead. Its efficient indexing also shortens the path to the
relevant data, which gives fast lookups, especially in single-user environments where contention
and concurrency are not issues. The dataset chosen for the experiment also had a table-like
structure rather than a document structure, which favors SQLite3.
On the other hand, MongoDB's slower performance seems to stem from its distributed design,
which enables it to handle massive datasets but also introduces network latency and computational
overhead as it manages data partitions and coordinates results across clusters. Furthermore, its
support for multiple concurrent operations and ACID-compliant transactions, though crucial for
complex data management, increases resource consumption and further slows execution. Thus, while
MongoDB excels at large-scale, distributed applications, SQLite3 stands out in scenarios requiring
swift, lightweight data retrieval.
The tests included 10 measurements for each database under controlled conditions. SQLite3
consistently exhibited faster query responses across these runs, and the code returned similar
results each time the desired user actions were performed, which is the basis for regarding the
difference in execution times as significant.
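One way to check whether a difference between two sets of ten timings is more than noise is Welch's t statistic. The sketch below is an assumption about how such a check could be done, not the procedure used in this paper. The two workloads are stand-ins so that the example runs without a database, and `measure` and `welch_t` are hypothetical helpers.

```python
import math
import statistics
import time


def measure(fn, repeats=10):
    """Return `repeats` elapsed times (seconds) for calling fn."""
    out = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        out.append(time.perf_counter() - start)
    return out


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


data = list(range(200_000))
lookup = set(data)
fast = measure(lambda: 199_999 in lookup)  # hash lookup
slow = measure(lambda: 199_999 in data)    # linear scan

t, df = welch_t(slow, fast)
print(f"t = {t:.2f}, df = {df:.1f}")
# |t| is then compared with the critical value of the t distribution for df
# (about 2.26 for df = 9 at the 5 % level, two-sided).
```

Welch's variant is chosen here because it does not assume that both databases produce timings with the same variance.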
Conclusion
SQLite3 and MongoDB each have distinct strengths and are best suited to different kinds of
applications. In-memory operation, efficient indexing, and a lightweight architecture give SQLite3
its speed advantage in fast data retrieval, rather than in large-scale applications dominated by
heavy reading and writing. MongoDB's distributed design and rich feature set, in contrast, suit
big data management, where high-performance processing of massive datasets and parallel
transactions are required. These capabilities, however, come with additional network latency and
make responding to a query more complex. The final choice therefore depends on the application,
since demands differ from one application to another. SQLite3 is better where speed and simplicity
are the prime objectives, whereas MongoDB is more appropriate for applications that require
distributed, scalable, and high-performance data management. Understanding these differences
ensures that each database is used according to its strengths and delivers the best performance
for its use case.
References
● Agarwal, S. (n.d.). Performance analysis of MongoDB vs.
PostGIS/PostGreSQL databases for line intersection and point containment
spatial queries. ScholarWorks@UMass Amherst.
https://scholarworks.umass.edu/foss4g/vol15/iss1/50/
● Baralis, E., Dalla Valle, A., Garza, P., Rossi, C., & Scullino, F. (2017, December). SQL
versus NoSQL databases for geospatial applications. 2017 IEEE International
Conference on Big Data (Big Data). http://dx.doi.org/10.1109/bigdata.2017.8258324.
● Sharma, M., Sharma, V. D., & Bundele, M. M. (2018, November). Performance analysis of
RDBMS and no SQL databases: PostgreSQL, MongoDB and Neo4j. 2018 3rd