INFO-3138 Tutorial 7 - DOM Node, NodeList, NamedNodeMap


INFO-3138 Programming with Declarative Languages

Tutorial 7: DOM Node, NodeList and NamedNodeMap


Purpose
To learn the fundamentals of how to work with XML data programmatically using the XML
DOM.

Background and Overview of the Example


The XML DOM is a W3C standard model for representing and manipulating XML data in a
program. Implementations exist for most development platforms and languages. Here we’ll
use Microsoft .NET’s System.Xml.XmlDocument implementation along with C#. Other
implementations are very similar, so the skills demonstrated here are transferable.
Using the DOM in our code involves an assortment of interfaces, as summarized in the
following diagram:

Figure 1 - Key DOM Interfaces

For this tutorial we’ll focus on the following interfaces: Node, Node list and Named node map.

Tutorial on DOM Interfaces Page 1 S2023



One thing Figure 1 shows us is that the Node interface is a generalization of several more
specialized interfaces such as Element, Attr, Comment and Text. This means the Node interface
can be used to represent and manipulate individual XML building blocks such as elements,
attributes, comments and text nodes. However, it can only represent these items in a generic
way. In other words, the Node interface will give you some ability to manipulate an element (for
example) but not as fully as the more specialized Element interface.
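
As a rough illustration of this difference, the sketch below describes nodes using only members of the general XmlNode type, then switches to the more specialized XmlElement type to read an attribute directly. The class name, the Describe helper and the inline sample XML are hypothetical and are not taken from the tutorial's project or from africa.xml.

```csharp
using System;
using System.Xml;

static class NodeVersusElement
{
    // Uses only members available on the general XmlNode type.
    public static string Describe(XmlNode node)
    {
        return $"{node.NodeType}: {node.Name}";
    }

    static void Main()
    {
        var doc = new XmlDocument();
        // Hypothetical sample data for illustration only.
        doc.LoadXml("<country name=\"Kenya\"><!-- sample -->Text</country>");

        XmlNode root = doc.DocumentElement;
        Console.WriteLine(Describe(root));          // Element: country

        // Every child, whatever its kind, can be handled as an XmlNode.
        foreach (XmlNode child in root.ChildNodes)
        {
            Console.WriteLine(Describe(child));     // Comment: #comment, then Text: #text
        }

        // Element-specific members such as GetAttribute() need the specialized type.
        if (root is XmlElement element)
        {
            Console.WriteLine(element.GetAttribute("name"));   // Kenya
        }
    }
}
```

The same root object is reached through both types; only the set of members available on the variable changes.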
At the top right of Figure 1 you’ll find the Named node map and Node list interfaces. In the
DOM specification, these interfaces are not derived from other interfaces, nor are they used to
derive other interfaces. However, Microsoft’s XmlDocument implementation of the DOM does
include its own specialized version of the Named node map, which is called
XmlAttributeCollection. The key point is that these interfaces represent collections of nodes.
The Node list represents an ordered list of nodes and the Named node map represents a
collection of nodes that are looked up by name, much like key-value pairs. For the upcoming
example we’ll use a Node list to manipulate a collection of child nodes belonging to a parent
element and to manipulate a set of elements with the same tag name. We’ll also use the
Named node map to work with the set of attributes of a selected element.
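
The two collection types can be contrasted with a small sketch: items in a Node list are reached by position, while items in a Named node map are reached by name. The class name, the LoadSample helper, and the sample data (including the name and region attributes) are assumptions made for illustration; the real africa.xml may be structured differently.

```csharp
using System;
using System.Xml;

static class CollectionsSketch
{
    // Builds a small in-memory document (hypothetical sample data).
    public static XmlDocument LoadSample()
    {
        var doc = new XmlDocument();
        doc.LoadXml(
            "<countries>" +
            "<country name=\"Kenya\" region=\"East\"/>" +
            "<country name=\"Ghana\" region=\"West\"/>" +
            "</countries>");
        return doc;
    }

    static void Main()
    {
        XmlDocument doc = LoadSample();

        // Node list: ordered, so items are reached by position.
        XmlNodeList countries = doc.GetElementsByTagName("country");
        Console.WriteLine(countries.Count);                      // 2
        XmlNode second = countries[1];

        // Named node map: items are reached by name.
        XmlNamedNodeMap attrs = second.Attributes;
        Console.WriteLine(attrs.GetNamedItem("name").Value);     // Ghana
        Console.WriteLine(attrs.GetNamedItem("region").Value);   // West
    }
}
```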

The AfricanCountries Example Program


Download and extract the archive DOM Example using Node, NodeList and NamedNodeMap,
then open the extracted solution in Visual Studio. This is a basic but complete example that
demonstrates using the Node, Node list and Named node map interfaces of the DOM.
The XML data file used in this example contains information about countries in Africa. You
should look at this file before examining the program code. It’s included in the project’s
bin\Debug\net5.0 folder. There’s a country element for each country containing attributes and
child elements that document the population, region within Africa, capital city, languages and
ethnic groups for the selected country.

Figure 2 - Example country element from africa.xml

The general structure of the program’s Main method is as follows:


1. Opens an XML file called africa.xml, parses it and loads it into the DOM (in memory)
2. Calls the helper method DisplayCountries to display the name of every African country
from the data
3. Calls the helper method DisplayEthnicGroups to display the names of all the ethnic
groups that are represented in a particular African country (based on user input)
Now let’s go through these steps in more detail!
1. Loading the XML Document
We’re using .NET’s System.Xml namespace, which implements the DOM. We can load
the XML document into the DOM using an instance of the XmlDocument class, which
implements the Document interface from Figure 1. The object’s Load() method loads
the XML file indicated via a path argument. Once the file is loaded, the XmlDocument
object (doc) is used to manipulate the DOM. Note that Load() can throw exceptions of
type IOException (for example, when the file can’t be read) and XmlException (when
the file isn’t well-formed XML).
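
A minimal sketch of this loading step, with handling for the two exception types mentioned, might look like the following. The class name and the TryLoad helper are hypothetical, and the relative path "africa.xml" assumes the file sits in the program's working directory; the tutorial's own Main method may be organized differently.

```csharp
using System;
using System.IO;
using System.Xml;

static class LoadSketch
{
    // Returns the loaded document, or null if the file could not be read or parsed.
    public static XmlDocument TryLoad(string path)
    {
        try
        {
            var doc = new XmlDocument();
            doc.Load(path);   // parses the file and builds the DOM in memory
            return doc;
        }
        catch (IOException ex)
        {
            // Covers, for example, a missing file (FileNotFoundException).
            Console.WriteLine("Could not read the file: " + ex.Message);
        }
        catch (XmlException ex)
        {
            // Raised when the content is not well-formed XML.
            Console.WriteLine("XML error at line " + ex.LineNumber + ": " + ex.Message);
        }
        return null;
    }

    static void Main()
    {
        XmlDocument doc = TryLoad("africa.xml");
        if (doc != null)
        {
            Console.WriteLine("Loaded root element: " + doc.DocumentElement.Name);
        }
    }
}
```

Returning null on failure keeps the sketch short; a larger program might prefer to let the exception propagate or report it differently.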
2. Displaying the country names

Figure 3 - DisplayCountries method

The DisplayCountries() method obtains a collection of country elements (and only
country elements) using the XmlDocument object’s GetElementsByTagName() method.
The method accepts the tag name of the elements (“country”) and returns a Node list
interface which represents an ordered list of all country elements in the entire data set.
C# allows us to iterate through the collection using a foreach loop. Each item in the
collection can be referenced using a Node interface because an element is a specialized
form of node.
Inside the foreach loop we’re obtaining a collection of attributes belonging to the
current element. This is done via the Attributes property, which is a Named node map
interface. Notice that this property is assigned to a variable of type
XmlAttributeCollection called attrs. As mentioned earlier, .NET’s implementation of the
DOM includes the XmlAttributeCollection class, which is a more specialized form of
the Named node map interface. In fact, we could instead have declared the variable as
type XmlNamedNodeMap.


Once we have the attributes collection we can look up a specific attribute by its
name, since the Named node map contains key-value pairs. Here we’re assigning the
attribute to a variable of the generalized type XmlNode. We could also have used a
variable of the more specialized attribute type (XmlAttribute). Finally, we can print the
name of the country using the Node interface’s InnerText property.
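
The steps described for DisplayCountries() can be sketched roughly as follows. This is not the tutorial's actual listing: the attribute name "name", the GetCountryNames helper, the class name and the sample data are all assumptions, and the sketch returns the names in a list so that the printing is kept separate from the DOM work.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class CountriesSketch
{
    // Collects the value of each country's "name" attribute (assumed attribute name).
    public static List<string> GetCountryNames(XmlDocument doc)
    {
        var names = new List<string>();

        // Node list of every country element in the document, in document order.
        XmlNodeList countries = doc.GetElementsByTagName("country");

        foreach (XmlNode country in countries)
        {
            // Named node map holding the current element's attributes.
            XmlAttributeCollection attrs = country.Attributes;

            // Look the attribute up by name; the result is null if it is absent.
            XmlNode nameAttr = attrs["name"];
            if (nameAttr != null)
            {
                names.Add(nameAttr.InnerText);
            }
        }
        return names;
    }

    static void Main()
    {
        var doc = new XmlDocument();
        // Hypothetical sample data standing in for africa.xml.
        doc.LoadXml("<africa><country name=\"Kenya\"/><country name=\"Ghana\"/></africa>");

        foreach (string name in GetCountryNames(doc))
        {
            Console.WriteLine(name);
        }
    }
}
```

Declaring attrs as XmlNamedNodeMap would also work, but the lookup would then be written as attrs.GetNamedItem("name") because the string indexer belongs to XmlAttributeCollection.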

Aside: You may have noticed that Microsoft has chosen to prefix the DOM interface
types with “Xml”. For example, the Node interface (as named in the W3C standard for
the DOM) is of type XmlNode and the NodeList interface is type XmlNodeList.

3. Displaying a country’s ethnic groups

Figure 4 - DisplayEthnicGroups method

The DisplayEthnicGroups() method begins like the DisplayCountries() method. It uses the
XmlDocument object’s GetElementsByTagName() method to obtain a Node list of all the
country elements. It also iterates through the countries using a foreach loop and a
variable of type XmlNode to reference each country individually. However, the foreach
loop isolates the required country, as specified by the DisplayEthnicGroups() method’s
country parameter, using an if statement.
Inside the if statement we’re obtaining a Node list of all child nodes of the country
element using the country node’s ChildNodes property. It’s important to understand that
this collection may contain some nodes that are not elements. For example, it could
contain a text node belonging to the country element as well as any comment nodes that
may be present. Therefore, when we process this collection of nodes via another foreach
loop we must ensure we’re processing a child element with the tag name ethnic_group.
When both criteria are met, we print the inner text (the value of the element’s text node)
of the ethnic_group element.
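
The logic described for DisplayEthnicGroups() can be sketched along these lines. As before, this is an approximation rather than the tutorial's own code: the attribute name "name", the GetEthnicGroups helper, the class name and the sample data are assumptions; only the ethnic_group tag name comes from the text above.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class EthnicGroupsSketch
{
    // Returns the inner text of each ethnic_group child of the matching country.
    public static List<string> GetEthnicGroups(XmlDocument doc, string countryName)
    {
        var groups = new List<string>();

        foreach (XmlNode country in doc.GetElementsByTagName("country"))
        {
            // Isolate the requested country ("name" is an assumed attribute name).
            XmlNode nameAttr = country.Attributes["name"];
            if (nameAttr == null || nameAttr.InnerText != countryName)
            {
                continue;
            }

            // ChildNodes may include comments and text, so filter explicitly.
            foreach (XmlNode child in country.ChildNodes)
            {
                if (child.NodeType == XmlNodeType.Element && child.Name == "ethnic_group")
                {
                    groups.Add(child.InnerText);
                }
            }
        }
        return groups;
    }

    static void Main()
    {
        var doc = new XmlDocument();
        // Hypothetical sample data standing in for africa.xml.
        doc.LoadXml(
            "<africa>" +
            "<country name=\"Kenya\">" +
            "<!-- a comment node that must be skipped -->" +
            "<capital>Nairobi</capital>" +
            "<ethnic_group>Kikuyu</ethnic_group>" +
            "<ethnic_group>Luhya</ethnic_group>" +
            "</country>" +
            "</africa>");

        foreach (string group in GetEthnicGroups(doc, "Kenya"))
        {
            Console.WriteLine(group);
        }
    }
}
```

Checking NodeType before Name makes the intent explicit, since comment and text nodes report names such as #comment and #text rather than a tag name.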

Now try the practice exercise accompanying this tutorial.
