Migration of Data From Relational Database To Graph Database
All content following this page was uploaded by Yelda Ünal on 20 November 2018.
ICIST '2018, March 16–18, 2018, Istanbul, Turkey Y.Unal and H.Oguztuzun
ABSTRACT
Relational databases have been widely used in many applications until today, and they have met the needs of data-intensive domains and transactions. Today, however, data is growing faster than ever, and extracting information from this huge volume of data is becoming more challenging. The growing size of data and the growing number of connections between data items reduce performance, because relational databases use many complex join operations to query and access data. As a solution, graph databases store these connections between entities, which makes traversing connections fast and easy and makes data access efficient. This article reports on our experience of migrating document-based, parent-child hierarchical data from a relational database to a graph database. It also reports a comparison of the data access processes and performance of the relational database and the graph database.
CCS CONCEPTS
• Information systems → Data management systems → Database design and models → Graph-based database models → Hierarchical data models → Migration from relational database to graph database, Comparison of data access.
KEYWORDS
Relational Database, Graph Database, Migration, NoSQL.
ACM Reference format:
Y. Unal and H. Oguztuzun. 2018. Migration of Data from Relational Database to Graph Database. In ICIST ’18: 8th International Conference on Information Systems and Technologies, March 16–18, 2018, Istanbul, Turkey. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3200842.3200852
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
ICIST ’18, March 16–18, 2018, Istanbul, Turkey
© 2018 Association for Computing Machinery.
ACM ISBN 978-1-4503-6404-1/18/03...$15.00
https://doi.org/10.1145/3200842.3200852
1 INTRODUCTION
Relational database management systems were first released in the early 1970s. Since its creation, the relational model has been the most popular and common database model for both commercial and non-commercial applications. Today there are many commercial relational database management systems (RDBMS), such as Oracle, IBM DB2 and Microsoft SQL Server, as well as free and open-source ones, such as MySQL, PostgreSQL and SQLite. The relational database model is a set of tables, rows and columns used to organize and store data. A table can be shown as a matrix of rows and columns, where each intersection of a row and a column contains a specific data value. The model is relational because all rows in a table share the same fields, and relationships can be created among tables to store and retrieve selected data efficiently. The standard way to access data in a relational database is an SQL (Structured Query Language) query. SQL queries can be used to create, read, update and delete data in existing tables.
Today, with the spread of the Internet and the digitization of data, data is growing faster than ever, and extracting information from this huge volume of data is becoming more challenging. The growing size of data and the growing number of connections between data items reduce performance in traditional database management systems, because these systems use many complex join operations to access and retrieve data. As data grew, relational database models started failing to meet the requirements of application domains that are data-intensive and have highly connected data. Therefore, researchers started to investigate storage alternatives to relational databases. NoSQL is a common term for these alternative systems, which first began to gain popularity in 2009. BigTable, Cassandra, CouchDB and Dynamo are all NoSQL projects: they are huge-volume data stores that eschew relational and object-relational models. Early adopters of graph technology reimagined their businesses around the value of data relationships. These companies, among them LinkedIn, Google, Facebook and PayPal, have now become industry leaders.
The graph database model is a set of nodes, edges and properties used to represent and store data; it uses a graph structure for semantic queries to retrieve data and is based on the NoSQL approach. Graph databases are gaining a lot of interest, as they use powerful data modeling tools that provide a closer fit to real-world data. Graph databases store, process and query connections between data efficiently by storing relationships as edges in the data model. While relational databases compute relationships at query time through expensive and complex join operations, graph databases store and process connections as data entities.
Because a graph database stores connections as persistent entities, accessing connections is efficient, and data can be traversed easily without any high-cost join operations. The property graph contains connected entities as nodes, which can hold any number of attributes (key-value pairs). Nodes can be tagged with labels representing their different roles in the domain. Relationships provide directed, named, semantically relevant connections between two node entities. A relationship always has a direction, a type, a start node and an end node. Like nodes, relationships can have properties.
2 BACKGROUND
MySQL is one of the most popular open-source relational databases, enabling reliable and scalable relational database applications. In this study, MySQL Community Server is used as the relational database management system for storing application data.
Neo4J is one of the most popular graph database management systems, and it is also one of the most popular NoSQL database systems. Neo4J stores and presents data in the form of a graph: data is represented by nodes and by the relationships between those nodes. Neo4J is well suited for storing and retrieving data that has many interconnecting relationships.
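The property graph model described in the introduction (nodes with labels and key-value properties, and relationships that always have a direction, a type, a start node and an end node) can be sketched as a small in-memory structure. This is a minimal illustration under our own assumptions, not Neo4J's storage engine or API: the class and method names (`PropertyGraph`, `add_node`, `relate`, `neighbours`), the `Set`/`Law` labels and the `HAS_CHILD` relationship type are all hypothetical. Each node keeps its outgoing relationships with it, so following a connection is a direct lookup rather than a join computed at query time.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    node_id: int
    labels: set[str]                                           # roles in the domain, e.g. {"Law"}
    properties: dict[str, Any] = field(default_factory=dict)   # key-value pairs
    # Outgoing relationships are stored with the node itself; excluded from
    # repr/eq so printing or comparing a node does not recurse through edges.
    outgoing: list[Relationship] = field(default_factory=list, repr=False, compare=False)


@dataclass
class Relationship:
    rel_type: str                                              # the relationship's name/type
    start: Node                                                # direction: start -> end
    end: Node
    properties: dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical in-memory property graph (illustration only)."""

    def __init__(self) -> None:
        self.nodes: dict[int, Node] = {}

    def add_node(self, node_id: int, labels: set[str], **properties: Any) -> Node:
        node = Node(node_id, set(labels), dict(properties))
        self.nodes[node_id] = node
        return node

    def relate(self, start_id: int, rel_type: str, end_id: int, **properties: Any) -> Relationship:
        rel = Relationship(rel_type, self.nodes[start_id], self.nodes[end_id], dict(properties))
        # The relationship is persisted on its start node, so traversal needs no join.
        rel.start.outgoing.append(rel)
        return rel

    def neighbours(self, node_id: int, rel_type: str) -> list[Node]:
        # Follow only the stored outgoing edges of the requested type.
        return [r.end for r in self.nodes[node_id].outgoing if r.rel_type == rel_type]


if __name__ == "__main__":
    g = PropertyGraph()
    g.add_node(1, {"Set"}, name="Vergi Mevzuat Seti")
    g.add_node(2, {"Law"}, name="Law A")
    g.add_node(3, {"Law"}, name="Law B")
    g.relate(1, "HAS_CHILD", 2)
    g.relate(1, "HAS_CHILD", 3, order=2)
    print([n.properties["name"] for n in g.neighbours(1, "HAS_CHILD")])
```

Running the script prints `['Law A', 'Law B']`. Keeping the edge list on the node is one simple way to make a stored relationship cheap to follow; a real graph database adds persistence, indexing and a query language on top of this idea.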
Migration of Data from Relational Database to Graph Database ICIST '2018, March 16–18, 2018, Istanbul, Turkey
During the data migration of the legal document system, conversion was implemented according to predefined transformation rules. A migration tool that applies the transformation rules below was implemented as a Java application, and the data was migrated by executing this application. Metadata information was used to create node labels and relationship types in the graph database, and the data itself was used to create node instances and the edges between nodes.
1. Each entity table is represented by a label on nodes.
2. Each row in an entity table is a node.
3. Columns on those tables become node properties.
4. Remove technical primary keys; keep domain primary keys.
5. Add unique constraints for business primary keys; add indexes for frequent lookup attributes.
6. Replace foreign keys with relationships to the other table, and remove them afterwards.
7. Remove data with default values; there is no need to store it.
8. Data in tables that is denormalized and duplicated might have to be pulled out into separate nodes to get a cleaner model.
9. Join tables are transformed into relationships; columns on those tables become relationship properties.
Figure 5: Transformation of relational model to graph model (Kanun = Law, Hukum = Judgment, Vergi = Tax)
5 DATA QUERY AND PERFORMANCE
Queries for the same data access were developed in both the relational and the graph database models in order to compare them. After the legal document system data was migrated to the graph database, data access performance was compared by searching for the same data value and relationship pattern. Figure 6 shows the SQL query developed in the MySQL database to retrieve all laws in the system whose parent name is "Vergi Mevzuat Seti". As shown in Figure 6, two join operations are necessary to access the children data; these tables hold a large amount of data, and the join operations on them decreased performance.
Figure 6: Example of the SQL query
In Figure 7, the same data access operation was implemented in the Neo4J database to retrieve all laws in the system whose parent name is "Vergi Mevzuat Seti". This Cypher query is more readable, and the query result was retrieved 10 times faster than in the relational database. As seen in Figure 7, there was no need for complex join operations or for traversing all table data of the given entity type during the search. During query execution, the parent node whose name is "Vergi Mevzuat Seti" was accessed first. After that, all children nodes were accessed through only the related edges.
6 CONCLUSIONS
When deciding which database model is most suitable for a specific domain, the data should be investigated against some basic criteria. If the data has many many-to-many relationships, using a graph database model can be very efficient. A graph database can traverse data very efficiently by using relationship entities, while a relational database has to use many complex and expensive join operations.
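The transformation rules of Section 4 and the parent-to-children lookup of Section 5 can be illustrated together on a toy schema. The sketch below rests on several assumptions: it uses Python's built-in `sqlite3` in place of MySQL, plain dictionaries in place of Neo4J, and a hypothetical schema (`doc_set`, `law`, `set_law`, with columns such as `code` and `status`, and a `CONTAINS` relationship type) rather than the legal document system's real tables, which are not listed here. It applies rules 1 to 7 and 9 (rule 8 is not exercised), then answers the same question both ways: an SQL query with two joins, and a walk from the parent node along its own edges. It is not the authors' Java migration tool.

```python
import sqlite3
from collections import defaultdict

# Hypothetical mapping metadata (not the paper's real schema).
ENTITY_TABLES = {"doc_set": "Set", "law": "Law"}    # rule 1: entity table -> node label
DEFAULTS = {"status": "active"}                     # rule 7: default values are not stored
# rules 6 and 9: join table -> (start FK, start table, end FK, end table, relationship type)
JOIN_TABLES = {"set_law": ("set_id", "doc_set", "law_id", "law", "CONTAINS")}


def load_demo(conn: sqlite3.Connection) -> None:
    """Create a toy parent-child schema standing in for the relational data."""
    conn.executescript("""
        CREATE TABLE doc_set (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE law (id INTEGER PRIMARY KEY, code TEXT UNIQUE, title TEXT,
                          status TEXT DEFAULT 'active');
        CREATE TABLE set_law (set_id INTEGER REFERENCES doc_set(id),
                              law_id INTEGER REFERENCES law(id), position INTEGER);
        INSERT INTO doc_set VALUES (1, 'Vergi Mevzuat Seti'), (2, 'Other Set');
        INSERT INTO law (id, code, title) VALUES
            (10, 'K-1', 'Law A'), (11, 'K-2', 'Law B'), (12, 'K-3', 'Law C');
        UPDATE law SET status = 'repealed' WHERE id = 12;
        INSERT INTO set_law VALUES (1, 10, 1), (1, 11, 2), (2, 12, 1);
    """)


def migrate(conn: sqlite3.Connection):
    conn.row_factory = sqlite3.Row
    nodes = {}                   # internal key (table, technical id) -> node properties
    edges = defaultdict(list)    # internal key -> [(type, target key, relationship props)]
    for table, label in ENTITY_TABLES.items():
        for row in conn.execute(f"SELECT * FROM {table}"):         # rule 2: row -> node
            props = {k: row[k] for k in row.keys()
                     if k != "id"                                   # rule 4: drop technical key
                     and row[k] != DEFAULTS.get(k)}                 # rule 7: drop defaults and NULLs
            nodes[(table, row["id"])] = {"label": label, **props}   # rule 3: columns -> properties
    for table, (src_col, src_tab, dst_col, dst_tab, rel_type) in JOIN_TABLES.items():
        for row in conn.execute(f"SELECT * FROM {table}"):
            # Foreign keys become the relationship's endpoints, not stored properties;
            # the remaining join-table columns become relationship properties.
            rel_props = {k: row[k] for k in row.keys() if k not in (src_col, dst_col)}
            edges[(src_tab, row[src_col])].append((rel_type, (dst_tab, row[dst_col]), rel_props))
    # rule 5: an index on a frequent lookup attribute (label + name)
    index = {(n["label"], n["name"]): key for key, n in nodes.items() if "name" in n}
    return nodes, edges, index


# Relational side: the parent -> children lookup needs two joins.
SQL = """
    SELECT l.title FROM doc_set s
    JOIN set_law sl ON sl.set_id = s.id
    JOIN law l ON l.id = sl.law_id
    WHERE s.name = ? ORDER BY sl.position
"""


def children_sql(conn: sqlite3.Connection, parent_name: str) -> list:
    return [row[0] for row in conn.execute(SQL, (parent_name,))]


def children_graph(nodes, edges, index, parent_name: str) -> list:
    parent = index[("Set", parent_name)]          # find the parent node once via the index
    rels = sorted(edges[parent], key=lambda e: e[2]["position"])
    # Visit only the parent's own edges; no other rows or nodes are scanned.
    return [nodes[target]["title"] for rel_type, target, _ in rels if rel_type == "CONTAINS"]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_demo(conn)
    nodes, edges, index = migrate(conn)
    print(children_sql(conn, "Vergi Mevzuat Seti"))
    print(children_graph(nodes, edges, index, "Vergi Mevzuat Seti"))
```

Both calls print `['Law A', 'Law B']`. The sketch makes no performance claim; it only shows that the graph side follows the parent's stored edges, while the SQL side expresses the same lookup through two joins evaluated at query time.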