
STORING AND QUERYING XML DATA USING RDBMS
Yesi Novaria Kunang
Computer Science Faculty
Bina Darma University, Palembang – Indonesia
ykunang@yahoo.com
Ahmad Ashari
Faculty of Mathematics and Natural Sciences
Gadjah Mada University, Yogyakarta – Indonesia
ashari@ugm.ac.id

Abstract

XML (eXtensible Markup Language) is rapidly becoming a popular data format and an emerging
standard for data exchange over the Internet. With a large amount of data represented as XML
documents, it becomes necessary to store and query these documents. One way to do this is to use an
RDBMS as the storage medium and SQL to query the XML data.
There are two approaches to parsing an XML document into an RDBMS through middleware: SAX
parsing and DOM parsing. This research studied both methods and compared their performance. It
also studied the performance of several alternative ways of structuring and tagging data from one or
more RDBMS tables as a hierarchical XML document. The final result identifies which of these
alternatives performs best for storing and querying XML data using an RDBMS.

1. Introduction
XML (eXtensible Markup Language) has become a popular format for representing and exchanging
data over the Internet. The flexibility of the XML structure makes it suitable for exchanging data and
modeling applications. However, when a large amount of data is represented as XML documents,
storing and querying those documents becomes essential. One approach is to use a native XML
database system. This approach has two weaknesses: first, a native XML database system is not
adequate for storing data and cannot accommodate the complex queries supported by a relational
database system; second, users cannot directly query XML documents together with other data
stored in a relational database system.

Techniques for storing and querying XML data in a relational database system have been developed
to overcome these weaknesses. The steps for this approach are as follows: first, design the relational
tables that will hold the data of an XML document; second, split the XML data into the columns of
those tables; and third, run SQL queries that return the RDBMS data in the required XML document
format.

2. Literature Review

Bourret [3] says that XML and its related technologies can be regarded as a simple database, because
XML documents can be used in environments with small amounts of data, few users, and modest
performance requirements. XML also provides many things found in databases: storage (XML
documents), schemas (DTD, XML schema languages), query languages, programming interfaces
(SAX, DOM, JDOM), and so on. However, XML lacks many of the things found in real databases:
efficient storage, indexes, security, transactions and data integrity, multi-user access, queries across
multiple documents, and so on. For these reasons, XML is not suitable for environments that have
many users, strict data integrity requirements, and the need for good performance.

Two mappings are commonly used to map an XML document schema to a database schema: the
table-based mapping and the object-relational mapping [4]. These mappings model the data in an
XML document rather than the document itself, so they are suitable for data-centric documents and
not suitable for document-centric ones [6].

Strategies for transferring data from XML to a database depend on the software used and on
middleware support. There are two ways to parse an XML document using a middleware application
(Java, Perl, PHP, Python, C/C++, Eiffel, Tcl, etc.): with a SAX (Simple API for XML) parser or
with a DOM (Document Object Model) parser [1].

Shanmugasundaram et al. [5] discuss several alternatives for publishing relational data as an XML
document. The alternatives follow from a basic difference between the two: an XML document is
tagged and structured, while a relational table is neither. Therefore, in order to convert relational
tables into an XML document, tagging and structuring must be added during processing. One
approach is to do tagging as the final step of query processing (late tagging), while another is to do it
earlier in the process (early tagging). Similarly, structuring can be done as the final step of query
processing (late structuring) or earlier (early structuring).

The alternatives also differ in how much work is done inside the relational engine. Inside engine
means that tagging and structuring are done completely inside the relational engine, whereas outside
engine means that part, though not necessarily all, of that work is done outside the relational engine.
The choice depends on the capabilities of the relational database engine in use. Early tagging with
late structuring is not a viable alternative, because adding tags to an XML document without having
its structure makes no sense.

2.1. Individual

Tagging and structuring are both done early in query processing. One of the simplest techniques for
publishing relational data as an XML document is this early tagging, early structuring method,
which is called the Stored Procedure approach [5] or the Individual table technique [2].

This strategy transfers the data in document order, following the hierarchy. It opens a result set on
the table for the root element. For each row in that table, it creates a row element and then opens a
result set on each child table in turn, repeating the process for the child rows.
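The traversal described above can be sketched as follows. This is only an illustration, not the implementation used in this research: it is written in Python with the standard sqlite3 module standing in for PHP and MySQL, the table names tabbook1 and tabauthor1 follow Section 3.2, and the reduced column set, the second book and the helper names are hypothetical.

```python
import sqlite3
from xml.sax.saxutils import escape, quoteattr

def sample_db():
    """In-memory stand-in for the book/author tables (reduced, assumed columns)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tabbook1 (idbook TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE tabauthor1 (idbook TEXT, first TEXT, last TEXT);
        INSERT INTO tabbook1 VALUES ('001', 'Professional PHP Programming');
        INSERT INTO tabbook1 VALUES ('002', 'Sample Book');
        INSERT INTO tabauthor1 VALUES ('001', 'Jesus', 'Castagnetto');
        INSERT INTO tabauthor1 VALUES ('001', 'Harish', 'Rawat');
        INSERT INTO tabauthor1 VALUES ('002', 'Ann', 'Example');
    """)
    return conn

def publish_individual(conn):
    """Tag each row as soon as it is read; return the XML and the query count."""
    out = ["<inventory>"]
    queries = 1  # the query over the root table
    books = conn.execute(
        "SELECT idbook, title FROM tabbook1 ORDER BY idbook").fetchall()
    for idbook, title in books:
        out.append(f"<book><title id={quoteattr(idbook)}>{escape(title)}</title><authors>")
        queries += 1  # one more query for every parent row
        authors = conn.execute(
            "SELECT first, last FROM tabauthor1 WHERE idbook = ? ORDER BY rowid",
            (idbook,))
        for first, last in authors:
            out.append(f"<name first={quoteattr(first)} last={quoteattr(last)}/>")
        out.append("</authors></book>")
    out.append("</inventory>")
    return "".join(out), queries

xml_text, query_count = publish_individual(sample_db())
```

The query count grows with the number of parent rows (one root query plus one per book), which is the cost that matters when the document becomes large.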

2.2. Universal I table

In this approach, tagging and structuring are done as the final step in building the XML document.
Building the document is divided into two phases: (a) content creation, where the relational data is
produced, and (b) tagging and structuring, where the relational data is arranged and tagged to
produce the XML document.

The required content is produced as a single result set (a universal table) containing all the data in
the document. It is built by joining all the tables, using join predicates to relate each parent to its
children. This is known as the Redundant Relation approach [5], also called Universal I [2].

Tagging and structuring the result takes two steps: (a) grouping all siblings in the XML document
that have the same parent (and eliminating the duplicates caused by redundancy), and (b) extracting
the information from each tuple and tagging it to produce the XML result.
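A minimal sketch of the two phases is given below, under stated assumptions: Python and sqlite3 stand in for PHP and MySQL, the table names follow Section 3.2, and the reduced columns, the sample rows and the function names are hypothetical.

```python
import sqlite3
from xml.sax.saxutils import escape, quoteattr

def sample_db():
    """In-memory stand-in for the book/author tables (reduced, assumed columns)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tabbook1 (idbook TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE tabauthor1 (idbook TEXT, first TEXT, last TEXT);
        INSERT INTO tabbook1 VALUES ('001', 'Professional PHP Programming');
        INSERT INTO tabbook1 VALUES ('002', 'Sample Book');
        INSERT INTO tabauthor1 VALUES ('001', 'Jesus', 'Castagnetto');
        INSERT INTO tabauthor1 VALUES ('001', 'Harish', 'Rawat');
        INSERT INTO tabauthor1 VALUES ('002', 'Ann', 'Example');
    """)
    return conn

def publish_universal_1(conn):
    """Late tagging, late structuring; return the XML and the result-set size."""
    # Phase (a), content: one join yields a single redundant result set.
    rows = conn.execute("""
        SELECT b.idbook, b.title, a.first, a.last
        FROM tabbook1 AS b
        LEFT JOIN tabauthor1 AS a ON a.idbook = b.idbook
    """).fetchall()
    # Phase (b), structuring: group siblings under their parent in memory,
    # which also collapses the parent data repeated on every child row.
    books = {}
    for idbook, title, first, last in rows:
        book = books.setdefault(idbook, {"title": title, "authors": []})
        if first is not None:
            book["authors"].append((first, last))
    # Phase (b), tagging: emit tags only after every row has been grouped.
    out = ["<inventory>"]
    for idbook in sorted(books):
        book = books[idbook]
        out.append(f"<book><title id={quoteattr(idbook)}>{escape(book['title'])}</title><authors>")
        for first, last in book["authors"]:
            out.append(f"<name first={quoteattr(first)} last={quoteattr(last)}/>")
        out.append("</authors></book>")
    out.append("</inventory>")
    return "".join(out), len(rows)

xml_text, row_count = publish_universal_1(sample_db())
```

Only one query is issued, but the title of a book is repeated on every author row, and the grouping step has to hold the whole result in memory before tagging starts.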

2.3. Universal II table

The main problem with the late tagging, late structuring technique is memory management during
tagging. To overcome this problem, the relational engine can be used to structure the relational
content. This strategy can be implemented using a Universal table of type II [2], or Sorted Outer
Union [5], which uses the UNION statement.
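The idea can be sketched as follows, again as an illustration under assumptions rather than the code used in this research: Python and sqlite3 replace PHP and MySQL, and the reduced columns, the sample rows, the `kind` discriminator column and the function names are hypothetical. The engine returns parent and child rows already in document order, so the middleware can tag them in a single pass.

```python
import sqlite3
from xml.sax.saxutils import escape, quoteattr

def sample_db():
    """In-memory stand-in for the book/author tables (reduced, assumed columns)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tabbook1 (idbook TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE tabauthor1 (idbook TEXT, first TEXT, last TEXT);
        INSERT INTO tabbook1 VALUES ('001', 'Professional PHP Programming');
        INSERT INTO tabbook1 VALUES ('002', 'Sample Book');
        INSERT INTO tabauthor1 VALUES ('001', 'Jesus', 'Castagnetto');
        INSERT INTO tabauthor1 VALUES ('001', 'Harish', 'Rawat');
        INSERT INTO tabauthor1 VALUES ('002', 'Ann', 'Example');
    """)
    return conn

# One row per book (kind 1) and one per author (kind 2), padded with NULLs
# and sorted so that each book row is followed by its author rows.
SORTED_OUTER_UNION = """
    SELECT idbook, 1 AS kind, title, NULL AS first, NULL AS last FROM tabbook1
    UNION ALL
    SELECT idbook, 2 AS kind, NULL, first, last FROM tabauthor1
    ORDER BY idbook, kind
"""

def publish_universal_2(conn):
    """Late tagging, early structuring; assumes every author has a parent book."""
    out = ["<inventory>"]
    book_open = False
    row_count = 0
    for idbook, kind, title, first, last in conn.execute(SORTED_OUTER_UNION):
        row_count += 1
        if kind == 1:
            if book_open:
                out.append("</authors></book>")  # close the previous book
            out.append(f"<book><title id={quoteattr(idbook)}>{escape(title)}</title><authors>")
            book_open = True
        else:
            out.append(f"<name first={quoteattr(first)} last={quoteattr(last)}/>")
    if book_open:
        out.append("</authors></book>")
    out.append("</inventory>")
    return "".join(out), row_count

xml_text, row_count = publish_universal_2(sample_db())
```

The tagger keeps no per-book state beyond a flag, at the price of a larger result set (one row per book plus one per author) and a sort inside the engine.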

3. Conducting the Research


This research examines strategies for transferring data from XML documents to a relational
database and vice versa. The steps are as follows:

3.1. Designing an XML Document

To show how the elements, attributes and sub-elements of an XML document are handled when
transferring it to a database or vice versa, an example XML document containing elements,
attributes and sub-elements is used. The example used in this research is shown in Figure 1.
<?xml version="1.0"?>
<inventory>
<book>
<title id="001" year="2000">Professional PHP Programming</title>
<authors>
<name first="Jesus" last="Castagnetto"/>
<name first="Harish" last="Rawat"/>
<name first="Sascha" last="Schumann"/>
<name first="Chris" last="Schollo"/>
<name first="Deepak" last="Veliath"/>
</authors>
<binding>paperback</binding>
<pages>909</pages>
<price>$25</price>
<Genre>Tutorial</Genre>
<Publisher>Wrox Press Ltd.</Publisher>
</book>
</inventory>

Figure 1: Inventory Document sample

To test with this XML document, the number of nodes was increased from 1 to 100,000, with a
PHP script generating the data files. The file size ranges from 583 bytes for the smallest document
to 35,135 M for the largest.
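A data generator of this kind might look like the sketch below. It is written in Python rather than the PHP script used in this research, and the template (a single author per book, a zero-padded running id) and the function name are assumptions made for illustration.

```python
import io

# One <book> record modeled on Figure 1; only the id changes per record.
BOOK_TEMPLATE = (
    '<book><title id="{id:06d}" year="2000">Professional PHP Programming</title>'
    '<authors><name first="Jesus" last="Castagnetto"/></authors>'
    '<binding>paperback</binding><pages>909</pages><price>$25</price>'
    '<Genre>Tutorial</Genre><Publisher>Wrox Press Ltd.</Publisher></book>\n'
)

def write_inventory(out, n):
    """Write an inventory document with n <book> nodes to a text stream."""
    out.write('<?xml version="1.0"?>\n<inventory>\n')
    for i in range(1, n + 1):
        out.write(BOOK_TEMPLATE.format(id=i))
    out.write('</inventory>\n')

buffer = io.StringIO()
write_inventory(buffer, 10)
generated = buffer.getvalue()
```

Writing record by record to a stream keeps the generator's memory use flat, so the same function can produce the largest test files by passing an open file instead of a `StringIO`.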

3.2. Object Relational - XML Document Mapping

Using object-relational mapping, the XML document in the previous example can be mapped to the
objects shown in Figure 2.
Object Book {
  - Idbook = 001
  - Title = Professional PHP Programming
  - Year = 2000
  - Binding = paperback
  - Pages = 909
  - Price = $25
  - Genre = Tutorial
  - Publisher = Wrox Press Ltd.
}

Object Author {
  - first = Jesus
  - middle =
  - last = Castagnetto
}

Object Author {
  - first = Harish
  - middle =
  - last = Rawat
}

Object Author {
  - first = Sascha
  - middle =
  - last = Schumann
}

Etc.

Figure 2: Object based Mapping on Document Inventory

These objects can be mapped into a MySQL database with two tables. The database schema (table
relation) is shown in Figure 3.

Figure 3: Table Relation

The idbook attribute in the tabbook1 table is the primary key, and the idbook attribute in the
tabauthor1 table is the foreign key.
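The relation between the two tables can be sketched as a schema like the one below. Only the table names and the idbook key are taken from the text; the other column names and types are assumed from the fields in Figure 2, and SQLite (through Python's sqlite3 module) is used here in place of MySQL.

```python
import sqlite3

# Hypothetical schema: only tabbook1, tabauthor1 and idbook come from the
# text; the remaining columns are guessed from the object fields.
SCHEMA = """
CREATE TABLE tabbook1 (
    idbook    TEXT PRIMARY KEY,
    title     TEXT,
    year      INTEGER,
    binding   TEXT,
    pages     INTEGER,
    price     TEXT,
    genre     TEXT,
    publisher TEXT
);
CREATE TABLE tabauthor1 (
    idbook TEXT REFERENCES tabbook1(idbook),
    first  TEXT,
    middle TEXT,
    last   TEXT
);
"""

def create_db():
    """Create an in-memory database holding the two tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only on request
    conn.executescript(SCHEMA)
    return conn

conn = create_db()
conn.execute(
    "INSERT INTO tabbook1 VALUES ('001', 'Professional PHP Programming', "
    "2000, 'paperback', 909, '$25', 'Tutorial', 'Wrox Press Ltd.')")
conn.execute("INSERT INTO tabauthor1 VALUES ('001', 'Jesus', '', 'Castagnetto')")
```

With the foreign key enforced, an author row whose idbook has no matching book is rejected by the engine, which is the integrity guarantee a flat XML file does not provide.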

3.3. Transferring Data from XML to Database

In this research, data is transferred from XML to the database using PHP as middleware, in two
ways:
1. Parsing the file with the PHP SAX parser
2. Using PEAR’s XML_Tree class
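The first, event-driven route can be sketched as follows. The sketch uses Python's xml.sax and sqlite3 modules as stand-ins for the PHP SAX parser and MySQL; the column list, the handler class and the shortened sample document (two authors instead of five) are assumptions made for illustration.

```python
import sqlite3
import xml.sax

XML_DOC = b"""<?xml version="1.0"?>
<inventory>
<book>
<title id="001" year="2000">Professional PHP Programming</title>
<authors>
<name first="Jesus" last="Castagnetto"/>
<name first="Harish" last="Rawat"/>
</authors>
<binding>paperback</binding>
<pages>909</pages>
<price>$25</price>
<Genre>Tutorial</Genre>
<Publisher>Wrox Press Ltd.</Publisher>
</book>
</inventory>"""

BOOK_COLUMNS = ("idbook", "title", "year", "binding",
                "pages", "price", "genre", "publisher")

class InventoryLoader(xml.sax.ContentHandler):
    """Insert one tabbook1 row per <book> and one tabauthor1 row per <name>."""

    def __init__(self, conn):
        super().__init__()
        self.conn = conn
        self.book, self.authors, self.text = {}, [], []

    def startElement(self, name, attrs):
        self.text = []  # start collecting character data for this element
        if name == "book":
            self.book, self.authors = {}, []
        elif name == "title":
            self.book["idbook"] = attrs.get("id")
            self.book["year"] = attrs.get("year")
        elif name == "name":
            self.authors.append((attrs.get("first"), attrs.get("last")))

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        key = name.lower()  # the sample mixes <Genre> with lower-case tags
        if name == "book":
            self.conn.execute(
                "INSERT INTO tabbook1 VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
                [self.book.get(column) for column in BOOK_COLUMNS])
            self.conn.executemany(
                "INSERT INTO tabauthor1 VALUES (?, ?, ?)",
                [(self.book.get("idbook"), first, last)
                 for first, last in self.authors])
        elif key in BOOK_COLUMNS:
            self.book[key] = "".join(self.text).strip()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tabbook1 (idbook TEXT PRIMARY KEY, title TEXT, year TEXT,
        binding TEXT, pages TEXT, price TEXT, genre TEXT, publisher TEXT);
    CREATE TABLE tabauthor1 (idbook TEXT, first TEXT, last TEXT);
""")
xml.sax.parseString(XML_DOC, InventoryLoader(conn))
conn.commit()
```

The handler holds only the book currently being read, so its memory use does not depend on how many books the document contains.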

3.4. Transferring Data from Database into XML

The XML document is produced from the relational MySQL database by PHP scripts acting as
middleware, using the alternative tagging and structuring methods. For all alternatives, tagging and
structuring are done outside the engine, meaning that part of the processing is done outside the
relational engine.

The alternatives for transferring data from the database into XML used in this research are:
1. Early tagging, early structuring: Stored Procedure/Individual table
2. Late tagging, late structuring: Universal table I/Redundant Relation
3. Late tagging, early structuring: Universal table II/Sorted Outer Union

3.5. Presenting and Searching XML Data

To compare the performance of an XML document and an RDBMS in terms of loading speed in the
browser, the following tasks were carried out:

1. Searching data in the XML document using the DSO binding technique with a script.
2. Presenting XML data from the RDBMS by searching the XML data that has been stored in the
MySQL database, using the redundant method. The query result is saved as an XML document
using the DOM tree method, and the resulting file is read and bound using an XSL file.

4. Results and Discussion


4.1. Comparison between SAX and DOM

SAX and DOM were compared in this research by inserting data from XML documents with
varying numbers of nodes into the database tables. The result is shown in Figure 4.

[Chart: Time (s), 0 to 2000, versus Number of nodes (1, 50, 100, 500, 1000, 5000); series: DOM, SAX]

Figure 4: Comparison between SAX and DOM

As the graph shows, parsing an XML document is faster with the SAX (Simple API for XML)
method than with the DOM (Document Object Model) method. Two things are important here.
First, SAX code uses less memory because it buffers only one row at a time, while DOM code
buffers the whole document. Second, SAX code is faster because it does not spend time building a
DOM tree.

The most important consequence of this memory usage is that the SAX method can be used for
large documents, whereas DOM consumes a lot of memory. However, DOM is suitable for
applications where a DOM tree is needed, for example to present an XML document with XSLT.
Another reason to use DOM is that it exposes the hierarchy of the parsed XML document more
completely: tag names, tag attributes, data content and nested tags.
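The difference between the two programming models can be illustrated with the sketch below. It uses Python's xml.sax and xml.dom.minidom in place of the PHP parsers, works on synthetic data, and does not reproduce the timings in Figure 4; the function and class names are hypothetical.

```python
import xml.sax
from xml.dom import minidom

def make_inventory(n):
    """Build a synthetic inventory document with n <book> nodes (test data)."""
    books = "".join(
        f'<book><title id="{i:03d}">Book {i}</title></book>'
        for i in range(1, n + 1))
    return f'<?xml version="1.0"?><inventory>{books}</inventory>'.encode("utf-8")

class TitleCollector(xml.sax.ContentHandler):
    """SAX side: reacts to parser events and keeps only the extracted titles."""

    def __init__(self):
        super().__init__()
        self.titles, self.buffer, self.in_title = [], [], False

    def startElement(self, name, attrs):
        if name == "title":
            self.in_title, self.buffer = True, []

    def characters(self, content):
        if self.in_title:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self.buffer))
            self.in_title = False

def titles_with_sax(data):
    handler = TitleCollector()
    xml.sax.parseString(data, handler)
    return handler.titles

def titles_with_dom(data):
    """DOM side: the whole tree is built before the first title is read."""
    document = minidom.parseString(data)
    return [node.firstChild.data
            for node in document.getElementsByTagName("title")]

data = make_inventory(100)
sax_titles = titles_with_sax(data)
dom_titles = titles_with_dom(data)
```

Both functions return the same titles. The DOM version is shorter and allows random access to any node afterwards, while the SAX version never holds more than the current element's text plus whatever the handler chooses to keep.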

To parse larger XML documents with a large number of nodes, this research suggests dividing the
document into smaller parts, which makes parsing faster with both the SAX and DOM methods.

4.2. The Comparison Result for transfer from Database into XML

To compare the strategies for transferring data from the database into an XML document, each
method was tested with varying numbers of records to observe the transfer speed. The comparison
result is shown in Figure 5.

[Chart: Time (s), 0 to 1500, versus Number of records (1, 10, 50, 100, 1000, 5000); series: Individual, Universal I, Universal II]

Figure 5: Comparison of Data Transfer Strategies from Database into XML Document

The graph shows clearly that the Individual table method is the least suitable. It has the worst
performance, converting data into an XML document more slowly than the Universal I and
Universal II methods. The main cause is that this method consumes most of the database resources:
one or more SQL queries must be issued for every tuple of each table that has a nested structure.
Therefore, forming a larger document requires thousands of queries, which causes inefficiency or
even deadlock.

The Universal table I (redundant) method performs very well compared to Universal II. This is
because its query is processed efficiently despite the redundancy, whereas the Universal II method
must produce its result table in structured form. Tagging is also faster because the result set contains
fewer rows than that of Universal table II. Universal table I also uses memory better than Universal
table II, as seen at 50.000 records, where the Universal II method uses all of the default memory
provided by MySQL, or even more.

4.3. The Comparison of Searching Data for XML Document toward RDBMS

The result of comparing data searches in an XML document file and in the RDBMS is shown in
Figure 6.

[Chart: Time (s), 0 to 40, versus Number of nodes/records (1, 10, 50, 100, 500, 1000, 5000, 10000, 50000); series: XML, RDBMS]

Figure 6: Comparison of query using RDBMS and using XML Document

The graph compares searching XML data stored in an XML document with searching the same data
stored in an RDBMS. It leads to the conclusion that the RDBMS performs better for storing and
querying the data than keeping it in an XML document. When finding a particular record (the final
record of the data set), the RDBMS is more stable, measured from the start of the search until the
result is displayed as a web page. With an XML document, the larger the number of nodes, the
longer it takes to load the data and reach the final node. This happens because, with DSO, the
browser must first load (cache) the data from the XML document before searching and then visit
each node to find the required data. So the more data an XML document holds, the longer it takes to
cache and search it.
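The contrast described above can be sketched as follows. The sketch is an analogy, not the browser-based DSO test used in this research: Python's ElementTree plays the role of the client that loads and walks the document, sqlite3 plays the role of MySQL, and the record count, data and function names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

N = 1000  # hypothetical number of records

# Flat-file side: the whole document is parsed, then nodes are walked in order.
doc = "<inventory>" + "".join(
    f'<book><title id="{i:05d}">Book {i}</title></book>'
    for i in range(1, N + 1)) + "</inventory>"

def find_in_xml(text, idbook):
    """Return the matching title and how many nodes were visited to find it."""
    visited = 0
    for title in ET.fromstring(text).iter("title"):
        visited += 1
        if title.get("id") == idbook:
            return title.text, visited
    return None, visited

# RDBMS side: the primary key gives the engine an index to search.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tabbook1 (idbook TEXT PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO tabbook1 VALUES (?, ?)",
    [(f"{i:05d}", f"Book {i}") for i in range(1, N + 1)])

def find_in_db(conn, idbook):
    row = conn.execute(
        "SELECT title FROM tabbook1 WHERE idbook = ?", (idbook,)).fetchone()
    return row[0] if row else None

last_id = f"{N:05d}"  # the final record, as in the experiment
xml_title, nodes_visited = find_in_xml(doc, last_id)
db_title = find_in_db(conn, last_id)

# Ask the engine how it resolves the lookup; the last column is the plan text.
plan = " ".join(row[3] for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT title FROM tabbook1 WHERE idbook = ?",
    (last_id,)))
```

For the final record the flat-file search visits every title node, so its cost grows with the document, whereas the query plan reports an index search on the key.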

On the other hand, data in an XML document also needs more storage space than the same data in
an RDBMS, as shown in Table 1. This is because every item stored in an XML document must be
stored together with its tags, which increases the size. In addition, the use of an index in the
RDBMS makes searching for data faster.

Table 1: Comparison of Space, XML Document vs. RDBMS Table

Number of        XML         RDBMS
nodes/records    document    Data        Index      Total
1                583 B       224 B       3 KB       3,2 KB
10               4 KB        1,3 KB      3 KB       4,3 KB
50               16 KB       6,8 KB      3 KB       9,8 KB
100              32 KB       13,6 KB     3 KB       19,6 KB
500              165 KB      73,0 KB     7 KB       80,0 KB
1.000            330 KB      147,2 KB    11,0 KB    158,2 KB
5.000            1,7 MB      717,7 KB    43,0 KB    760,7 KB
10.000           3,4 MB      1,4 MB      82,0 KB    1,5 MB
50.000           17,5 MB     7,9 MB      403 KB     8,3 MB

5. Conclusion and Future Work

The results clearly indicate that storing XML documents in an RDBMS performs better with a SAX
parser than with the DOM tree technique. The Redundant/Universal I technique is the best
alternative for querying XML documents in an RDBMS and for transferring data from the RDBMS
into XML documents, since it uses memory better than the other techniques. Using an RDBMS to
query XML data, especially large amounts of data, is faster than using a flat XML document file as
storage. In general, storing XML data in an RDBMS is more efficient, since the RDBMS needs less
space to store the data than an XML document does. This is because an XML document stores not
only the content of the data but also the tags. Our future work will compare the relative
performance of relational databases and native XML databases, and will integrate and compare
other XML query languages.

6. References
[1] Asaduzzaman, A., Building XML Trees with PEAR’s XML_Tree Class, Devshed article, 2003.
http://www.devarticles.com/art/1/443

[2] Bourret, R., Data Transfer Strategies, 2001. http://www.rpbourret.com/xml/DataTransfer.htm

[3] Bourret, R., XML and Databases, 2003.
http://www.informatik.tudarmstadt.de/DVS1/staff/bourret/xml/XMLAndDatabases.htm

[4] Florescu, D., Kossmann, D., Storing and Querying XML Data using an RDBMS, Bulletin of the Technical
Committee on Data Engineering, 1999. http://www.research.microsoft.com/research/db/debull/99sept/we.ps

[5] Shanmugasundaram, J., Shekita, E., Carey, M., Lindsay, B., Pirahesh, H., Reinwald, B., Efficiently Publishing
Relational Data as XML Document, 1999. http://www.acm.org/sigmod/vIdb/conf/2000/P065.pdf

[6] Tatarinov, I., Viglas, S.D., Beyer, K., Shanmugasundaram, J., Shekita, E., Zhang, C., Storing and Querying
Ordered XML Using a Relational Database System, ACM SIGMOD, Madison, Wisconsin, USA, 2002.
