Data Mapping Techniques
Contents:
Introduction
Description of the canonical XML form
Commonly Used Templates -- A Template Cookbook
Introduction
This paper explores a technique for exploiting the interchangeability between XML documents
and Python data structures.
It is often useful to load an XML document into Python data structures so that they can be
processed. The DOM interface performs this task. However, DOM data structures are
specific to DOM. Often it is desirable to load the XML document into custom or
application-specific data structures. There are two conventional ways to do this:
Use the SAX interface -- Scan the XML document, creating custom Python data
structures while doing so.
Use the DOM interface -- Load the XML document into a DOM tree, then perform a
tree walk on the DOM tree, creating custom Python data structures while doing so.
In both of the above techniques, custom Python code is used to perform much of the
conversion, plucking data values from the XML document and creating instances of the
Python data structures. In effect, the developer's control over the conversion process is
encoded in custom Python code.
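As an illustration of the conventional approach, the following minimal sketch uses the DOM technique with xml.dom.minidom from the Python standard library. The <person> document and the Person class are hypothetical, invented for this example; the point is that the mapping from elements and attributes to Python objects lives entirely in hand-written Python code.

```python
from xml.dom import minidom


class Person:
    """Hypothetical application-specific class."""
    def __init__(self, name, email):
        self.name = name
        self.email = email


def load_people(xml_text):
    """Walk a DOM tree and build Person objects from <person> elements."""
    doc = minidom.parseString(xml_text)
    people = []
    for elem in doc.getElementsByTagName('person'):
        # The name comes from an attribute; the email from the element's text.
        text = ''.join(node.data for node in elem.childNodes
                       if node.nodeType == node.TEXT_NODE)
        people.append(Person(elem.getAttribute('name'), text.strip()))
    return people


people = load_people('<people><person name="Ann">ann@example.com</person>'
                     '<person name="Bob">bob@example.com</person></people>')
for p in people:
    print(p.name, p.email)
```

Every change to the mapping (a new attribute, a renamed element) means editing this Python code, which is exactly what the technique in this paper moves into a stylesheet.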
This paper presents an alternative technique. It transforms the original XML document into
an XML document having a canonical form, specifically, WDDX, and then uses the
(un-)marshaller that is distributed with PyXML to convert that document into Python objects.
In effect, XSLT is used to customize the conversion: the developer's control over the
conversion process is encoded in an XSLT stylesheet.
The goal of this paper is to help you implement an equivalence between XML documents
and Python data structures.
Note, however, that the technique we describe in this paper will work for any language for
which there is an implementation of the marshaller/unmarshaller provided in generic.py (in
PyXML).
The technique consists of two steps:
1. Use an XSLT processor and the stylesheet that you have written to transform the
source XML document into an XML document of the form accepted by the class
generic.Unmarshaller.
2. Use the module generic.py (in the PyXML distribution) to load the generated
(marshalled) XML into Python data structures.
class MsgHandler:
    """Writes each message it receives, prefixed by '***'."""
    def __init__(self):
        pass
    def write(self, msg):
        print('*** %s' % msg)
This code uses the libxsltmod XSLT processor, which is the libxslt C library that I
wrapped for Python. You can find it at http://www.rexx.com/~dkuhlman. You should be
able to use the XSLT processor in PyXML just as easily.
The Unmarshaller in generic.py is sensitive to whitespace. I squeezed whitespace out
of the generated XML by including the following in my stylesheet:
<xsl:strip-space elements="tag1 tag2 ..."/>
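If you cannot (or prefer not to) add xsl:strip-space to the stylesheet, a possible alternative is to remove whitespace-only text nodes from the generated document in Python before handing it to the Unmarshaller. The following is a minimal sketch using xml.dom.minidom; the helper name strip_whitespace_nodes is hypothetical. Unlike xsl:strip-space, which applies only to the elements you list, it strips whitespace-only text everywhere, so it is safe only if no string value in your data consists solely of whitespace.

```python
from xml.dom import minidom


def strip_whitespace_nodes(node):
    """Recursively remove text nodes that contain only whitespace."""
    # Iterate over a copy, because childNodes is modified inside the loop.
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        elif child.hasChildNodes():
            strip_whitespace_nodes(child)
    return node


generated = """<list>
  <string>aaa</string>
  <string>bbb</string>
</list>"""
doc = strip_whitespace_nodes(minidom.parseString(generated))
compact = doc.documentElement.toxml()
print(compact)
```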
Module generic is included in the PyXML distribution (in the xml/marshal subdirectory).
You can find more information about PyXML on the PyXML project page.
The above sample code assumes that the created data structure is an instance of a
class that has a show method.
A note on the term "WDDX" -- I don't believe that the XML documents that we are
generating (for input to the generic.py Unmarshaller) follow the DTD for WDDX. That
doesn't concern me much for our purposes here, since in this technique we are building
documents only for input to generic.py. However, if you plan to share or "syndicate" those
documents, then you will want to generate XML documents that obey a publicly known
DTD. In the meantime, I believe that what this paper describes is in the "spirit" of WDDX,
in the sense that it marshals and unmarshals data structures in a way that is programming-
language neutral. However, the occurrence of so many quasi-quotes in this paragraph should
be a caution. As should paragraphs that refer to themselves.
Description of the canonical XML form
This section describes the XML elements that we must generate. Effectively, we are
describing the XML elements generated by class generic.Marshaller and function
generic.dumps(), and accepted by class generic.Unmarshaller and function generic.loads().
(generic.py is in the PyXML distribution.)
The name of the class, an instance of which is created, is the value of the attribute
class, e.g. "object_class_name".
This class must be defined in a module whose name is the value of the attribute
module, e.g. "object_classes". So, in this case we would need a module
object_classes.py.
The empty tuple in this generated XML could contain parameters to be passed to the
constructor of the class. If this tuple is empty, the constructor is not called. However,
the member variables of the instance will still be initialized (see next bullet).
The dictionary contains the names and values of the member variables to be set in the
instance. The format is a member name followed by its value followed by the next
member name followed by its value, and so on.
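To make the format above concrete, here is a minimal sketch of an unmarshaller for the subset of elements described in this section (string, int, list, tuple, dictionary, and object). It is not generic.py, which handles more types and more details; it uses xml.etree.ElementTree and importlib from the Python standard library, and the function name unmarshal is hypothetical. Following the rules above, it calls the constructor only when the tuple is non-empty, and otherwise creates the instance without calling it before setting the members from the dictionary. The example borrows argparse.Namespace purely as a convenient importable class.

```python
import importlib
import xml.etree.ElementTree as ET


def unmarshal(elem):
    """Convert one canonical-form element into a Python value."""
    tag = elem.tag
    if tag == 'string':
        return elem.text or ''
    if tag == 'int':
        return int(elem.text)
    if tag == 'list':
        return [unmarshal(child) for child in elem]
    if tag == 'tuple':
        return tuple(unmarshal(child) for child in elem)
    if tag == 'dictionary':
        # Children alternate: key, value, key, value, ...
        items = [unmarshal(child) for child in elem]
        return dict(zip(items[0::2], items[1::2]))
    if tag == 'object':
        # The module and class attributes name the class to instantiate.
        module = importlib.import_module(elem.get('module'))
        cls = getattr(module, elem.get('class'))
        args_elem = elem.find('tuple')
        members_elem = elem.find('dictionary')
        args = unmarshal(args_elem) if args_elem is not None else ()
        # Call the constructor only when constructor arguments are given.
        obj = cls(*args) if args else cls.__new__(cls)
        if members_elem is not None:
            obj.__dict__.update(unmarshal(members_elem))
        return obj
    raise ValueError('unsupported element: %r' % tag)


xml_text = """<object class="Namespace" module="argparse">
<tuple/>
<dictionary>
<string>name</string><string>sample</string>
<string>sizes</string><list><int>11</int><int>22</int></list>
</dictionary>
</object>"""
obj = unmarshal(ET.fromstring(xml_text))
print(obj.name, obj.sizes)
```

ElementTree stores the whitespace between elements as tail text rather than as child nodes, so this sketch is not sensitive to the indentation that trips up generic.py.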
If this list is to be the value of a member variable of an object, generate it inside the
dictionary that defines the member variables of that object, immediately after the string
element that holds the member name. The list will then become the value of the member
member_variable_name.
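To see the XML that generic.py generates for a particular data structure, marshal the
structure and print the result:
from xml.marshal import generic    # PyXML: xml/marshal/generic.py
m = generic.Marshaller()
ds1 = ([11, 22], 333, 'bbb')
s1 = m.dumps(ds1)                  # marshal ds1 to an XML string
print(s1)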
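PyXML is no longer maintained and does not support Python 3. If you only want to generate documents in the canonical form described above, the following minimal sketch of a marshaller may help; the function name marshal is hypothetical. It handles only str, int, list, tuple, dict, and instances with a __dict__, and it does not attempt to reproduce generic.py's output exactly.

```python
from xml.sax.saxutils import escape, quoteattr


def marshal(value):
    """Return canonical-form XML for a Python value (a limited subset)."""
    if isinstance(value, str):
        return '<string>%s</string>' % escape(value)
    if isinstance(value, int):
        return '<int>%d</int>' % value
    if isinstance(value, list):
        return '<list>%s</list>' % ''.join(marshal(v) for v in value)
    if isinstance(value, tuple):
        return '<tuple>%s</tuple>' % ''.join(marshal(v) for v in value)
    if isinstance(value, dict):
        # Keys and values alternate: key, value, key, value, ...
        body = ''.join(marshal(k) + marshal(v) for k, v in value.items())
        return '<dictionary>%s</dictionary>' % body
    if hasattr(value, '__dict__'):
        # An object: empty constructor tuple, members in a dictionary.
        cls = type(value)
        return '<object class=%s module=%s><tuple/>%s</object>' % (
            quoteattr(cls.__name__), quoteattr(cls.__module__),
            marshal(vars(value)))
    raise TypeError('unsupported type: %s' % type(value).__name__)


ds1 = ([11, 22], 333, 'bbb')
print(marshal(ds1))
```

Because the object branch always emits an empty tuple, objects marshalled this way are rebuilt without calling their constructors, matching the rule for empty tuples described earlier.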
Commonly Used Templates -- A Template Cookbook
Create an object
To create an instance of a class from the current element, create a template rule similar to the
following:
<xsl:template match="object_element_name">
  <xsl:element name="object">
    <xsl:attribute name="class">class_name</xsl:attribute>
    <xsl:attribute name="module">object_classes</xsl:attribute>
    <xsl:element name="tuple"/>
    <xsl:element name="dictionary">
      <xsl:element name="string">
        <xsl:text>member_x</xsl:text>
      </xsl:element>
      <xsl:element name="string">
        <xsl:value-of select="@attribute_x"/>
      </xsl:element>
      <xsl:element name="string">
        <xsl:text>sub_object_list</xsl:text>
      </xsl:element>
      <xsl:element name="list">
        <xsl:apply-templates select="./*"/>
      </xsl:element>
    </xsl:element>
  </xsl:element>
</xsl:template>
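Where:
object_element_name is the element/tag of the source elements to be converted into objects.
class_name and object_classes are the values of the class and module attributes, that is,
the name of the class to instantiate and the module that defines it.
member_x is a member variable whose value is taken from the attribute attribute_x of the
current element.
sub_object_list is a member variable whose value is a list of the objects created from the
sub-elements of the current element.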
To add a member variable to the current object with a simple string value that comes from an
attribute of the current element, do the following:
<xsl:element name="string">
  <xsl:text>member_variable_name</xsl:text>
</xsl:element>
<xsl:element name="string">
  <xsl:value-of select="@attribute_name"/>
</xsl:element>
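Where:
member_variable_name is the name of the member variable to be added to the current object.
attribute_name is the attribute of the current element whose value becomes the value of
that member variable.
To add a member variable to the current object with a simple string value that comes from
the text content of the current element, do the following: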
<xsl:element name="string">
  <xsl:text>member_variable_name</xsl:text>
</xsl:element>
<xsl:element name="string">
  <xsl:value-of select="."/>
</xsl:element>
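Where:
member_variable_name is the name of the member variable to be added to the current object.
Its value is the string value (text content) of the current element, selected by ".".
To add a member variable whose value is a list of objects, one object for each sub-element
of a given name, do the following: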
<xsl:element name="string">
  <xsl:text>object_list</xsl:text>
</xsl:element>
<xsl:element name="list">
  <xsl:apply-templates select="./object_element_name"/>
</xsl:element>
Where:
object_list is the name of the member variable to be added to the parent instance.
object_element_name is the element/tag of the sub-elements. One object will be
created and added to the list (object_list) for each sub-element of this name.
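If you want to prototype a mapping before writing the stylesheet, or have no XSLT processor at hand, the cookbook rules above can be emulated in a few lines of Python. The following is a sketch under stated assumptions, using xml.etree.ElementTree: it turns each source element into an object element in the canonical form. Unlike the templates, which name each member explicitly, it copies every attribute into a string member of the same name, stores non-empty element text in a member called text (a hypothetical choice), and collects all sub-elements into sub_object_list. The function name to_canonical and the placeholder values class_name and object_classes are borrowed from the templates above, not from any library.

```python
import xml.etree.ElementTree as ET


def to_canonical(src, class_name='class_name', module='object_classes'):
    """Build a canonical-form <object> element from a source element."""
    obj = ET.Element('object', {'class': class_name, 'module': module})
    ET.SubElement(obj, 'tuple')  # empty tuple: constructor is not called
    members = ET.SubElement(obj, 'dictionary')

    def add_member(name, value):
        # Each member is a name string followed by its value string.
        ET.SubElement(members, 'string').text = name
        ET.SubElement(members, 'string').text = value

    # "Member from an attribute": one string member per attribute.
    for attr_name, attr_value in src.attrib.items():
        add_member(attr_name, attr_value)
    # "Member from element content": only if the element has real text.
    if src.text and src.text.strip():
        add_member('text', src.text.strip())
    # "List of sub-objects": recurse into each child element.
    ET.SubElement(members, 'string').text = 'sub_object_list'
    sub_list = ET.SubElement(members, 'list')
    for child in src:
        sub_list.append(to_canonical(child, class_name, module))
    return obj


source = ET.fromstring('<item attribute_x="1"><item attribute_x="2"/></item>')
print(ET.tostring(to_canonical(source), encoding='unicode'))
```

Once the generated structure looks right, the same decisions can be written down as xsl:template rules, keeping the Python code out of the final conversion.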
Causal mapping
Causal mapping is one of the most commonly used cognitive mapping techniques for
investigating the cognition of decision makers in organizations (Swan, 1997). Causal
mapping is derived from personal construct theory (Kelly, 1955). This theory posits
that an individual's set of perspectives is a system of personal constructs, and that
individuals use their own personal constructs to understand and interpret events. In
other words, an individual understands the environment through salient concepts
(constructs), which can be expressed either by simple single-polar phrases or by
contextually rich bipolar phrases. An example of a single-polar phrase is "good reader",
while an example of a bipolar phrase is "good computer skills - poor computer skills".
As its name suggests, a causal map represents a set of causal relationships among
constructs within a belief system. By capturing these cause-effect relationships, it
provides insight into the reasoning of a particular person.
Semantic mapping
It must be pointed out that causal assertions are only part of an individual's total belief
system. There are some cognitive mapping techniques that can be used to identify other
relations among concepts. Semantic mapping, also known as idea mapping, is used to explore
an idea without the constraints of a superimposed structure (Buzan, 1993). To make a
semantic map, one starts at the center of the paper with the main idea, and works outwards in
all directions, producing a growing and organized structure composed of key words and key
images. Around the main idea (a central word), five to ten ideas (child words) that are related
to the central word are drawn. Each of these "child" words then serves as a sub-central word
for the next level drawing (Buzan, 1993). In other words, a semantic map has one main or
central concept with tree-like branches. Figure 2 is an example of a semantic map that depicts
related words around the main idea "UML".