Download as pdf or txt
Download as pdf or txt
You are on page 1of 26

When To Store Data As XML ?

Philip Nelson
Lloyds Banking Group PLC
Agenda

Why Consider Storing As XML

Where XML Started Out

XML : Data In Transit Data At Rest

Experiences Before pureXML

Life After pureXML


Why Consider Storing as XML

Whats Wrong With Relational ?

Is XML The Answer ?

Lets consider the real world


Is Relational Wrong ?

NO

Still best for


Relatively simple and uniform structures
Non-evolving data structures

Typical data best in relational format -


Financial transactions
Reference data (e.g. actuarial rates)
Why Is Relational Ubiquitous ?

Traditional large scale business computing


Largely automating simple financial processes

Availability Of RDBMS Software


Many products / many platforms
Standards (e.g. SQL) help in update

Focus Of Education
Application development : SQL
Database design targets relational structures
Traditional (DB) World View
The World Is Not All Tabular

Only small percentage is naturally tabular

Much is unstructured

More is semi-structured

Some is hierarchical
When to Use XML

Volatile Data Structures

Naturally Hierarchical Data

Complete Business Objects

Sparse Attributes

Information Interchange
Volatile Data Structures

When database definitions change frequently

Not an excuse for poor database design

Often driven by external factors


Regulatory changes
Regular new product releases
Naturally Hierarchical Data

Some data in naturally hierarchical

Examples
Organisational structures
Bill of Materials

Often using relational tables for these


structures results in complex programming
and poor performance
Complete Business Objects

Collection of data used as a whole

Often represented as objects in application

Must be decomposed into and constructed


from many relational tables

Often this process is as expensive as the


real work done with the data
Sparse Attributes

In relational terms nullable columns

High percentage of null instances

Wasted storage (less with compression)

Unnecessary processing

Can't see the wood for the trees


Especially in BI contexts
Information Interchange

Traditional XML use case

But consider storing received data intact

Also store transmitted data intact

Useful as an audit trail


XMLs Heritage

SGML : dynamic information display (1980s)

W3C interest : solving internet problems

XML : first draft (1996)

XML : W3C recommendation (1998)


XML : First Uses

Information interchange
Became seen as a means of representing data in
transit
Many industry-specific dialects / schemas
produced to enable B2B processes

Web standards
Basis of, among other things, web services
Storing XML

Two initial approaches


Storage as LOBs

Every access requires parsing the textual format

Updates are full document replacements
Shredding to relational tables

XML is parsed and mapped to relational columns

Even simple XMLs need many tables

Result : large overheads !!!


Before pureXML : LOBs

Web services audit log

Key elements extracted before LOB storage

Only practically able to search on these few


elements

Useful information locked up in the LOB


eventually unlocked by moving to pureXML
Before pureXML : Shredding

Application Form from POS System

Shredded to 80 tables

Complex mapping documents

Shred time limits system throughput


With pureXML

Faster storage
Direct reads
Direct updates
Easier shredding (XMLTABLE)
Easier XML construction
PureXML : Faster Storage

Storing into XML column performs well


Even with validation as well as parsing
Faster than previous LOB storage solution
Even if some overhead
Massive benefits
Especially if high read to write ratio
PureXML : Direct Reads

Opens the XML content to full query access


In one case we were able to unlock the BI
potential of our ESB audit logs
Turning data exhaust into business benefit
PureXML : Direct Updates

XQuery update is different


But well worth learning
Key to treating XML as a first class citizen
Very efficient
PureXML : Easier Shredding

Often still need to get data into table format


Joining to existing relational data
Populating legacy systems
Presentation to users
XMLTABLE is king in this field
Easy to learn
Some basic concepts (e.g. setting the right
granularity) critical to success
Replaced old mapping document schemes
PureXML : Easier Construction

Old mapping document schemes for


construction replaced
SQL/XML construction functions are -
Standard
Easier to learn and maintain
Often matched with XQuery updates to -
Produce control totals and checksums
Perform sync with previous documents in series
Horses for Courses

In conclusion
Native XML storage is now an option to be
considered (we have the technology)
Not for everything : many data storage
requirements are better suitable to relational
Learn to know the signs of when XML is better
Other technologies (e.g. Big Data) will introduce
yet other data storage formats
The great thing is you can mix and match
Philip Nelson
(teamba@scotdb.com)

You might also like