Data Mapping Techniques

Data Mapping Techniques -- WDDX

Contents:

 Introduction
 Description of the canonical XML form
 Commonly Used Templates -- A Template Cookbook

Introduction
This paper explores a technique for exploiting the interchangeability between XML documents
and Python data structures.

It is often useful to load an XML document into Python data structures so that they can be
processed. The DOM interface performs this task. However, DOM data structures are
specific to DOM. Often it is desirable to load the XML document into custom or application-
specific data structures.

There are a number of approaches to this problem that come to mind:

 Use the SAX interface -- Scan the XML document, creating custom Python data
structures while doing so.
 Use the DOM interface -- Load the XML document into a DOM tree, then perform a
tree walk on the DOM tree, creating custom Python data structures while doing so.

In both of the above techniques, custom Python code is used to perform much of the
conversion, plucking data values from the XML document and creating instances of the
Python data structures. In effect, the developer's control over the conversion process is
encoded in custom Python code.
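
For comparison, the DOM tree-walk approach can be sketched as follows. This is a minimal illustration written for Python 3 and the standard library's xml.dom.minidom; the Person class, the load_people function, and the element and attribute names are hypothetical and do not come from this paper.

```python
from xml.dom import minidom


class Person:
    """Hypothetical application-specific class (not from the paper)."""

    def __init__(self, name, phones):
        self.name = name
        self.phones = phones


def load_people(xml_text):
    """Walk the DOM tree and build Person instances by hand."""
    doc = minidom.parseString(xml_text)
    people = []
    for node in doc.getElementsByTagName("person"):
        # Collect the text content of each <phone> child element.
        phones = [
            "".join(t.data for t in p.childNodes if t.nodeType == t.TEXT_NODE)
            for p in node.getElementsByTagName("phone")
        ]
        people.append(Person(node.getAttribute("name"), phones))
    return people


# Made-up sample document used only to exercise the sketch.
SAMPLE = """<people>
  <person name="Ann"><phone>555-0100</phone><phone>555-0101</phone></person>
  <person name="Bob"/>
</people>"""

people = load_people(SAMPLE)
```

Every element name, attribute name, and class in this sketch is hard-coded in Python, which is the sense in which control over the conversion lives in custom code.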

This paper presents an alternative technique. It uses XSLT to transform the original XML
document into an XML document having a canonical form, specifically WDDX, then uses the
(un-)marshaller that is distributed with PyXML to convert that document into Python objects.
In effect, XSLT is used to customize the conversion. The developer's control over the
conversion process is encoded in an XSLT stylesheet.

This technique has the following benefits:

 XSLT can be used to perform the conversion.
 We can provide a set of XSLT templates that can be easily adapted to each of a set of
common transformations.
 The hope is that we can also help with generating the templates for the XSLT
stylesheet used in the conversion process. Perhaps we can enable the developer to
describe the mapping from specific XML elements to specific Python data structures in
an easier way, and then generate XSLT stylesheet templates that perform that
mapping (or, more correctly, that convert to the XML elements that can be
automatically loaded into Python data structures).

Our interest in this paper is to provide help with implementing an equivalence between XML
documents and Python data structures.

Note, however, that the technique we describe in this paper will work for any language for
which there is an implementation of the marshaller/unmarshaller provided in generic.py (in
PyXML).

The top level process is composed of the following steps:

1. Use an XSLT processor and the stylesheet that you have written to transform the
source XML document into an XML document of the form accepted by the class
generic.Unmarshaller.
2. Use generic.py (in the PyXML distribution) to load the generated/marshalled
XML into Python data structures.

Here is sample code that performs this transformation:


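import generic
import libxsltmod

class MsgHandler:
    # Receives messages written by the XSLT processor.
    def __init__(self):
        pass
    def write(self, msg):
        print('*** %s' % msg)

def loadfile(inFileName, stylesheetFile):
    msgHandler = MsgHandler()
    # Apply the stylesheet to the input file; the result is returned as a string.
    s1 = libxsltmod.translate_to_string(
        'f', stylesheetFile,
        'f', inFileName,
        msgHandler)
    print(s1)
    # Load the generated XML into Python data structures.
    um = generic.Unmarshaller()
    ds = um.loads(s1)
    ds.show()
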
Some notes about this code:

 This code uses the libxsltmod XSLT processor, which is the libxslt C library that I
wrapped for Python. You can find it at http://www.rexx.com/~dkuhlman. You should be
able to use the XSLT processor in PyXML just as easily.
 The Unmarshaller in generic.py is sensitive to whitespace. I squeezed whitespace out
of the generated XML by including the following in my stylesheet:
<xsl:strip-space elements="tag1 tag2 ..."/>
 Module generic is included in the PyXML distribution (under the xml/marshal
subdirectory). You can find out about PyXML here.
 The above sample code assumes that the created data structure is an instance of a
class that has a show method.

A note on the term "WDDX" -- I don't believe that the XML documents that we are
generating (for input to the generic.py Unmarshaller) follow the DTD for WDDX. That
doesn't concern me much for our purposes here, since in this technique we are building
documents for input to generic.py. However, if you plan to share or "syndicate" those
documents, then you will want to pay attention to generating XML documents that obey a
publicly known DTD. In the meantime, I believe that what this paper describes is in the
"spirit" of WDDX in the sense that it marshals and unmarshals data structures in a way that is
programming language neutral. However, the occurrence of so many quasi-quotes in this
paragraph should be a caution. As should paragraphs that refer to themselves.

Description of the canonical XML form


This technique generates XML that can be processed by the Unmarshaller in generic.py,
which is included in the PyXML distribution.

This section describes the XML elements that we must generate. Effectively, we are
describing the XML elements generated by class generic.Marshaller and function
generic.dumps() and accepted by class generic.Unmarshaller and function generic.loads().
(generic.py is in the PyXML distribution.)

To create an instance of a class, generate something like the following:

<object class="object_class_name" module="object_classes">
    <tuple/>
    <dictionary>
        <string>member_1</string>
        <string>value_1</string>
        <string>member_2</string>
        <string>value_2</string>
        ...
    </dictionary>
</object>

Here are a few things to notice about this generated XML:

 The name of the class, an instance of which is created, is the value of the attribute
class, e.g. "object_class_name".
 This class must be defined in a module whose name is the value of the attribute
module, e.g. "object_classes". So, in this case we would need a module
object_classes.py.
 The empty tuple in this generated XML could contain parameters to be passed to the
constructor of the class. If this tuple is empty, the constructor is not called. However,
member variables for the instance will be initialized (see next bullet).
 The dictionary contains the names and values of the member variables to be set in the
instance. The format is a member name followed by its value followed by the next
member name followed by its value, and so on.
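
To make the notes above concrete, here is a rough Python 3 sketch of how an object element of this form could be turned into an instance. It is not the code in generic.py. The Point class, the CLASSES registry (used here instead of importing the module named by the module attribute), and the build_object function are all assumptions made for illustration, and only string values are handled.

```python
import xml.etree.ElementTree as ET


class Point:
    """Hypothetical class standing in for one defined in object_classes.py."""

    def show(self):
        return "Point(%s, %s)" % (self.x, self.y)


# Hypothetical registry used instead of importing the module named by the
# "module" attribute; it keeps this sketch self-contained.
CLASSES = {("object_classes", "Point"): Point}


def build_object(elem):
    """Create an instance from an <object> element that holds string members."""
    cls = CLASSES[(elem.get("module"), elem.get("class"))]
    args = [child.text or "" for child in elem.find("tuple")]
    # Assumption taken from the text: an empty tuple means the constructor
    # is not called, so the instance is created without running __init__.
    obj = cls(*args) if args else cls.__new__(cls)
    items = list(elem.find("dictionary"))
    # Children alternate: member name, value, member name, value, ...
    for name_elem, value_elem in zip(items[0::2], items[1::2]):
        setattr(obj, name_elem.text, value_elem.text or "")
    return obj


XML = ('<object class="Point" module="object_classes"><tuple/>'
       '<dictionary><string>x</string><string>1</string>'
       '<string>y</string><string>2</string></dictionary></object>')

point = build_object(ET.fromstring(XML))
```

Note that the member values arrive as strings because the dictionary uses string elements for them.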

To create a list of objects, generate something like the following:


<string>member_variable_name</string>
<list>
    <object class="object_class_name" module="object_classes">
        ...
    </object>
    ...
</list>

Or:

<string>member_variable_name</string>
<list>
    <string>value_1</string>
    <string>value_2</string>
    ...
</list>

Here are a few things to notice about this generated XML:

 If this list is to be the value of a member variable of a class, generate this code within
the dictionary that defines the member variables of an instance of a class.
 The list will become the value of the member variable member_variable_name.

To create a string value, generate the following:

<string>value_1</string>

To create an integer value, generate the following:

<int>101</int>

To create a float value, generate the following:

<float>1.23</float>

You can use class Marshaller in generic.py to determine the format of other data types. The
following code will print a sample of the input to the Unmarshaller:

import generic

m = generic.Marshaller()
# Marshal a sample data structure to see the XML that the Unmarshaller accepts.
ds1 = ([11,22], 333, 'bbb')
s1 = m.dumps(ds1)
print(s1)

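
If PyXML is not at hand, the element forms described in this section can be approximated with a short Python 3 sketch built on the standard library. The to_element and dumps functions below are hypothetical helpers, not the generic.Marshaller API, and the real Marshaller output may differ in details such as a wrapping root element or extra attributes. The tuple element for Python tuples is taken from the object example above.

```python
import xml.etree.ElementTree as ET


def to_element(value):
    """Convert a Python value to an element of the form described above."""
    if isinstance(value, bool):
        # bool is a subclass of int; it is left out of this sketch on purpose.
        raise TypeError("bool is not covered by this sketch")
    if isinstance(value, int):
        elem = ET.Element("int")
        elem.text = str(value)
    elif isinstance(value, float):
        elem = ET.Element("float")
        elem.text = repr(value)
    elif isinstance(value, str):
        elem = ET.Element("string")
        elem.text = value
    elif isinstance(value, (list, tuple)):
        elem = ET.Element("list" if isinstance(value, list) else "tuple")
        for item in value:
            elem.append(to_element(item))
    else:
        raise TypeError("unsupported type: %r" % type(value))
    return elem


def dumps(value):
    """Serialize a value to an XML string with no whitespace between elements."""
    return ET.tostring(to_element(value), encoding="unicode")


s1 = dumps(([11, 22], 333, "bbb"))
```

Because the elements are serialized without indentation, the result contains no whitespace between tags, which matters given the whitespace sensitivity mentioned earlier.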
Commonly Used Templates -- A Template Cookbook


This section presents some (skeletons of) templates that produce commonly needed XML
elements, for input to class generic.Unmarshaller. It can be viewed as a cookbook for creating
XSLT templates to perform common data structure loading tasks.

Create an object

To create an instance of a class from the current element, create a template rule similar to the
following:
<xsl:template match="object_element_name">
    <xsl:element name="object">
        <xsl:attribute name="class">object_class_name</xsl:attribute>
        <xsl:attribute name="module">object_classes</xsl:attribute>
        <xsl:element name="tuple"/>
        <xsl:element name="dictionary">
            <xsl:element name="string">
                <xsl:text>member_x</xsl:text>
            </xsl:element>
            <xsl:element name="string">
                <xsl:value-of select="@attribute_x"/>
            </xsl:element>
            <xsl:element name="string">
                <xsl:text>sub_object_list</xsl:text>
            </xsl:element>
            <xsl:element name="list">
                <xsl:apply-templates select="./*"/>
            </xsl:element>
        </xsl:element>
    </xsl:element>
</xsl:template>

Where:

 object_element_name is the name of the element.
 object_class_name is the name of the class. An instance of this class will be created
from the element.
 object_classes is the name of the module in which the class is defined. Create a .py
file with this name containing the class definition.
 Additional notes:
    o This example creates a member variable named member_x with a string value
    from the attribute named attribute_x.
    o This example creates a list of sub-objects and assigns it to member variable
    sub_object_list.
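
A matching module might look like the following Python 3 sketch. The file contents are hypothetical: the class name is the placeholder used in the template, the describe method and the class-level defaults are assumptions added for illustration, and show is included only because the earlier sample code calls it on the loaded object.

```python
# object_classes.py (hypothetical contents; all names are placeholders)


class object_class_name:
    """Instances are created by the unmarshaller from <object> elements."""

    # Class-level defaults, in case the constructor is not called and a
    # member is missing from the generated dictionary.
    member_x = ""
    sub_object_list = ()

    def describe(self, indent=0):
        """Return an indented text dump of this object and its sub-objects."""
        lines = ["%s%s member_x=%r"
                 % (" " * indent, type(self).__name__, self.member_x)]
        for sub in self.sub_object_list:
            lines.append(sub.describe(indent + 2))
        return "\n".join(lines)

    def show(self):
        """Print the text dump."""
        print(self.describe())


# Build two instances the way an unmarshaller might: without calling the
# constructor, then setting member variables directly.
parent = object_class_name.__new__(object_class_name)
parent.member_x = "top"
child = object_class_name.__new__(object_class_name)
child.member_x = "leaf"
parent.sub_object_list = [child]
text = parent.describe()
```

The class-level defaults are a design choice for this sketch: they let show work even when a member is absent from the generated XML.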

Add a string member data item

To add a member variable to the current object with a simple string value that comes from an
attribute of the current element, do the following:

Add the following snippet to the current template:

<xsl:element name="string">
    <xsl:text>member_variable_name</xsl:text>
</xsl:element>
<xsl:element name="string">
    <xsl:value-of select="@attribute_name"/>
</xsl:element>

Where:

 member_variable_name is the name of the member data item to be added to the
current instance.
 attribute_name is the name of the attribute that provides the value.

To add a member variable to the current object with a simple string value that comes
from the text (node) in the current element, do the following:

Add the following snippet to the current template:

<xsl:element name="string">
    <xsl:text>member_variable_name</xsl:text>
</xsl:element>
<xsl:element name="string">
    <xsl:value-of select="."/>
</xsl:element>

Where:

 member_variable_name is the name of the member data item to be added to the
current instance.

Create a list of objects

To create a list of objects from a nested list of elements, do the following:

Step 1. Add the following snippet to the parent template:

<xsl:element name="string">
    <xsl:text>object_list</xsl:text>
</xsl:element>
<xsl:element name="list">
    <xsl:apply-templates select="./object_element_name"/>
</xsl:element>

Where:

 object_list is the name of the member variable to be added to the parent instance.
 object_element_name is the element/tag of the sub-elements. One object will be
created and added to the list (object_list) for each sub-element of this name.

Step 2. Add a template rule for the sub-element:


<xsl:template match="object_element_name">
    <xsl:element name="object">
        <xsl:attribute name="class">object_class_name</xsl:attribute>
        <xsl:attribute name="module">object_classes</xsl:attribute>
        <xsl:element name="tuple"/>
        <xsl:element name="dictionary">
            <xsl:element name="string">
                <xsl:text>x</xsl:text>
            </xsl:element>
            <xsl:element name="string">
                <xsl:value-of select="@X"/>
            </xsl:element>
            <xsl:element name="string">
                <xsl:text>object_list</xsl:text>
            </xsl:element>
            <xsl:element name="list">
                <xsl:apply-templates select="./*"/>
            </xsl:element>
        </xsl:element>
    </xsl:element>
</xsl:template>

Where:

 object_element_name is the name of the element.
 object_class_name is the name of the class. An instance of this class will be created
from the element.
 object_classes is the name of the module in which the class is defined. Create a .py
file with this name containing the class definition.

2.1 Causal mapping

Causal mapping is one of the most commonly used cognitive mapping techniques in
investigating the cognition of decision makers in organizations (Swan, 1997). Causal
mapping is derived from personal construct theory (Kelly, 1955). This theory posits
that an individual's set of perspectives is a system of personal constructs and
individuals use their own personal constructs to understand and interpret events. In
other words, an individual understands the environment with salient concepts
(constructs), which can be expressed by either simple single-polar phrases or
contextually rich bipolar phrases. An example of a single-polar phrase is "good reader",
while an example of a bipolar phrase is "good computer skills - poor computer skills".
As revealed by its name, a causal map represents a set of causal relationships among
constructs within a belief system. Through capturing the cause-effect relationships,
insights into the reasoning of a particular person are acquired.

Semantic mapping

It must be pointed out that causal assertions are only part of an individual's total belief
system. There are some cognitive mapping techniques that can be used to identify other
relations among concepts. Semantic mapping, also known as idea mapping, is used to explore
an idea without the constraints of a superimposed structure (Buzan, 1993). To make a
semantic map, one starts at the center of the paper with the main idea, and works outwards in
all directions, producing a growing and organized structure composed of key words and key
images. Around the main idea (a central word), five to ten ideas (child words) that are related
to the central word are drawn. Each of these "child" words then serves as a sub-central word
for the next level drawing (Buzan, 1993). In other words, a semantic map has one main or
central concept with tree-like branches. Figure 2 is an example of a semantic map that depicts
related words around the main idea "UML".
