AdvancedJavaProgramming-SLIDES03-UNIT1-FP2005-Ver 1.0

Advanced Java Programming
(J2EE LC)
XML Parsers - Day 3
Course Objectives
Overview of XML
XML Document Type Definitions (DTDs)
XML Schemas
To understand the need for parsing XML documents
To understand types of XML Parsers
– Validating vs. Non-Validating Parsers
To understand different XML Parser Interfaces
– Tree Based Interface Standard : DOM
– Event Based Interface Standard : SAX
Evaluating Parsers
– Which parser to use?
ER/CORP/CRS/LA22/003
Copyright © 2005, Infosys Technologies Ltd, Version 1.00
Recap on XML
What is XML?
– eXtensible Markup Language (XML)
Uses of XML
– XML data buffers : used to store data
– Config files : describe the configuration of servers
– Example : the user configuration file for the Tomcat web server (tomcat-users.xml)
How are these files read?
– Parsers are used to read XML documents programmatically
– Types of parsers
• DOM: reads the entire XML document, converts it into in-memory objects and keeps the data ready for access
• SAX: incremental parser that processes the document piece by piece; used for very large XML data
• Validating and non-validating parsers
Technologies for validating XML structure and data
– XML Document Type Definition (DTD)
– XML Schema
Recap on XML
XML Document (address.xml)
– XML declaration : <?xml version="1.0"?>
– Root element : address
– Nested elements : first, middle, last (inside name)
– Attribute : type (on telephone)
<?xml version="1.0"?>
<address>
  <name>
    <first>John</first>
    <middle>Fitzgerald Johansen</middle>
    <last>Doe</last>
  </name>
  <doornumber>2345</doornumber>
  <street>Kalidasa Road</street>
  <city>Mysore</city>
  <pin>570 002</pin>
  <telephone type="work">91-821-2404000</telephone>
  <telephone type="home">91-821-2404001</telephone>
  <telephone type="mobile">91-93424-
Namespaces in XML
Namespaces help to differentiate two objects (XML data) that have the same name
In the example below, a ‘table’ can be a visual element or a piece of furniture
Without prefixes, the two <table> elements cannot be told apart:
<table>
  <tr>
    <td>Apples</td>
    <td>Bananas</td>
  </tr>
</table>
<table>
  <name>Coffee Table</name>
  <width>80</width>
  <length>120</length>
</table>
With the prefixes web: and wood: each <table> is unambiguous (each prefix must be bound to a namespace URI with an xmlns:prefix attribute):
<web:table>
  <web:tr>
    <web:td>Apples</web:td>
    <web:td>Bananas</web:td>
  </web:tr>
</web:table>
<wood:table>
  <wood:name>Coffee Table</wood:name>
  <wood:width>80</wood:width>
  <wood:length>120</wood:length>
</wood:table>
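A namespace-aware parser makes this distinction visible to a program. The sketch below is only an illustration: the namespace URIs http://example.com/web and http://example.com/wood, the wrapping <root> element and the class name NamespaceDemo are assumptions, not part of the original slides.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NamespaceDemo {
    // Hypothetical namespace URIs, chosen for this illustration only
    static final String WEB = "http://example.com/web";
    static final String WOOD = "http://example.com/wood";

    static final String XML =
        "<root xmlns:web='" + WEB + "' xmlns:wood='" + WOOD + "'>"
      + "<web:table><web:tr><web:td>Apples</web:td>"
      + "<web:td>Bananas</web:td></web:tr></web:table>"
      + "<wood:table><wood:name>Coffee Table</wood:name>"
      + "<wood:width>80</wood:width><wood:length>120</wood:length></wood:table>"
      + "</root>";

    // Counts the <table> elements that belong to the given namespace
    static int countTables(String namespaceUri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for namespace-based lookups
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(XML)));
            return doc.getElementsByTagNameNS(namespaceUri, "table").getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("web tables:  " + countTables(WEB));
        System.out.println("wood tables: " + countTables(WOOD));
    }
}
```

Both elements have the local name table, but each lookup finds only the one in its own namespace.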
Document Type Definitions (DTDs)
A DTD describes the syntax that specifies
– which elements may appear in the XML document
– what the contents and attributes of each element are
Need for DTD
– A validating parser (a program) can be used to check whether XML data adheres to the rules in the DTD
– The parser can do appropriate error handling if there is any violation
– A validity error is not necessarily a fatal error, but some applications may treat it as one
Document Type Declarations
– A valid XML document must include a reference to the DTD against which it is validated
– Types of DTD
• Internal DTD: the DTD is embedded in the XML document
• External DTD: the DTD is in a separate file
Internal DTD
DTD embedded in the XML document
– The declarations appear between [ and ]
– E.g. AddressBook.xml
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE AddressBook [
  <!ELEMENT AddressBook (Address+)>
  <!ELEMENT Address (Name, Street, City)>
  <!ELEMENT Name (#PCDATA)>
  <!ATTLIST Name salutation CDATA #REQUIRED>
  <!ELEMENT Street (#PCDATA)>
  <!ELEMENT City (#PCDATA)>
]>
<AddressBook>
  <Address>
    <Name salutation="Mr.">Ram</Name>
    <Street>M G Road</Street>
    <City>Bangalore</City>
  </Address>
</AddressBook>
– The first line is the XML declaration
– AddressBook in the <!DOCTYPE line is the document name (root element)
– The <!ELEMENT lines of the internal DTD define the elements AddressBook, Address, Name, Street and City
– The <!ATTLIST line defines the attribute salutation
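The effect of an internal DTD can be checked with a validating DOM parser. The following is a minimal sketch, assuming the AddressBook DTD shown above; the class name InternalDtdDemo and the two sample bodies are made up for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class InternalDtdDemo {
    // The internal DTD from the AddressBook example
    static final String DTD =
        "<!DOCTYPE AddressBook [\n"
      + "<!ELEMENT AddressBook (Address+)>\n"
      + "<!ELEMENT Address (Name, Street, City)>\n"
      + "<!ELEMENT Name (#PCDATA)>\n"
      + "<!ATTLIST Name salutation CDATA #REQUIRED>\n"
      + "<!ELEMENT Street (#PCDATA)>\n"
      + "<!ELEMENT City (#PCDATA)>\n"
      + "]>\n";

    static final String VALID =
        "<AddressBook><Address><Name salutation='Mr.'>Ram</Name>"
      + "<Street>M G Road</Street><City>Bangalore</City></Address></AddressBook>";

    // Same data, but the required attribute salutation is missing
    static final String MISSING_ATTRIBUTE =
        "<AddressBook><Address><Name>Ram</Name>"
      + "<Street>M G Road</Street><City>Bangalore</City></Address></AddressBook>";

    // Parses DTD + body with validation on and returns the reported problems
    static List<String> validate(String body) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // validate against the DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage()); // validity errors are recoverable
                }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
        } catch (Exception e) {
            errors.add("fatal: " + e.getMessage());
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println("valid document:    " + validate(VALID));
        System.out.println("missing attribute: " + validate(MISSING_ATTRIBUTE));
    }
}
```

The parser keeps going after a validity error, which is why the handler can collect the messages instead of aborting.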
External DTD
DTD is present in a separate file
Example
– The DTD for AddressBook.xml is contained in the file AddressBook.dtd
– AddressBook.xml contains only XML data with a reference to the DTD file
AddressBook.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE AddressBook SYSTEM "file:///c:/XML/AddressBook.dtd">
<AddressBook>
  <Address>
    <Name salutation="Mr.">Ram</Name>
    <Street>M G Road</Street>
    <City>Bangalore</City>
  </Address>
</AddressBook>
– The <!DOCTYPE AddressBook SYSTEM ...> line is the reference to the external DTD
Anatomy of DTD – Defining new XML tags (Elements)
<!ELEMENT element_name content_specification>
– element_name: Specifies the name of the XML tag
– content_specification: Specifies the contents of the element
• #PCDATA: Parsed character data, i.e. text in which the parser still recognizes markup and entity references
• CDATA: Character data; in a DTD this is an attribute type used in <!ATTLIST>, not an element content specification
• Nested elements
Example:
– <!ELEMENT Street (#PCDATA)>
• Element Street contains parsed character data
– <!ELEMENT Address (Name, Street, City)>
• Element Address contains the three nested elements Name, Street and City, in that order
– <!ELEMENT AddressBook (Address+)>
• Element AddressBook contains one or more occurrences of element Address
Anatomy of DTD – Attribute Declarations
Specifies the allowable attributes of each element
<!ATTLIST Tag-name Attr-Name Attr-Type Restriction>
– Tag-name : Name of the element for which the attribute is defined
– Attr-Name : Name of the attribute
– Attr-Type : Type of the attribute, e.g. CDATA
– Restriction : Whether the attribute must be present (#REQUIRED), is optional (#IMPLIED), etc.
Example
– <!ATTLIST Name salutation CDATA #REQUIRED>
– The element Name has the attribute salutation, which is of type CDATA
– The attribute salutation must be specified in every Name tag
Anatomy of DTD – Entity Declarations (1 of 2)
Way to escape special characters
Some special characters such as <, > and & are not written literally in #PCDATA
They are escaped using an “entity reference”
The following kinds of entity references are used in XML documents
– Built-in entities : &amp;, &lt;, &gt;, &apos;, &quot; (representing &, <, >, ', ")
– Character references : &#243; representing ó
– General entities : &source-text;
Example (AddressBook1.xml)
– <State>Jammu &amp; Kashmir</State>
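A parser turns these references back into the characters they stand for. A small sketch of this, using the JAXP DOM parser; the class name EntityEscapeDemo and the helper text() are made up for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EntityEscapeDemo {
    // Parses the XML string and returns the text content of its root element
    static String text(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Built-in entity reference for the ampersand
        System.out.println(text("<State>Jammu &amp; Kashmir</State>"));
        // Remaining built-in entities
        System.out.println(text("<T>&lt;b&gt; &apos;x&apos; &quot;y&quot;</T>"));
        // Numeric character reference
        System.out.println(text("<T>&#243;</T>"));
    }
}
```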
Anatomy of DTD – Entity Declarations (2 of 2)
Data that is frequently used can be declared as a general entity
– <!ENTITY entity_name entity_contents>
• entity_name : Name of the new entity
• entity_contents : Contents of the new entity
Example
– <!ENTITY MyCountry "India">
• Defines the entity called MyCountry
• “India” is the contents of the entity MyCountry
Usage in the XML document
– <Country>&MyCountry;</Country>
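The expansion of a general entity can be observed with a DOM parser. This is a sketch under assumptions: the entity is declared in an internal DTD subset, and the class name GeneralEntityDemo is made up for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class GeneralEntityDemo {
    // Declares MyCountry in an internal DTD subset and uses it once
    static final String XML =
        "<!DOCTYPE Country [ <!ENTITY MyCountry \"India\"> ]>"
      + "<Country>&MyCountry;</Country>";

    // Returns the text of <Country> after the parser has expanded the entity
    static String country() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(country());
    }
}
```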
XML Schema
What is XML Schema?
– An XML vocabulary for expressing your data's structure and business rules
– Validating parsers can use a schema to check whether XML data adheres to the rules in the schema
– More robust and extensive than DTD; it can even validate data types
– E.g. : Consider the following XML document
<Result>
  <EmpNo>45609</EmpNo>
  <Name>Kiran</Name>
  <Subject>
    <Name>CHSSC</Name>
    <Marks>80</Marks>
    <Grade>A</Grade>
  </Subject>
</Result>
Is this data valid?
To be valid, it must meet the following business rules (constraints)
– The Result must contain the Subject, Marks and Grade in the order shown
– The Subject must be a valid subject from the list (PF, CHSSC, RDBMS, IWT, AOA)
– The Marks must be between 0 and 100, and the Grade can be A, B or C
XML Schema : Validating the XML Document
Validating your Data (XML Document)
[Figure: the XML document (instance document) and the constraints on it (the schema) are both given to a schema-validating parser, which reports "Data is OK!"]
XML Document (Instance Document)
<Result>
  <Name>Kiran</Name>
  <Subject>
    <Name>CHSSC</Name>
    <Marks>80</Marks>
    <Grade>A</Grade>
  </Subject>
</Result>
Constraints on XML Document (Schema)
– Subject, Marks and Grade must appear in that order
– The Subject must be one of the following: CHSSC, PF, RDBMS, AOA, IWT
– The Marks must be between 0 and 100 only
– The Grade can be A, B or C
How can XML schema help to accomplish this?
Answer
– It creates an XML vocabulary : defines the following set of elements
• <Result>, <Subject>, <Marks>, <Grade>
– It specifies the contents of each element and the restrictions on each element
• <Result> element must contain <Subject>, <Marks>, <Grade> in that order
• <Subject> must be one of the valid subjects (CHSSC, PF, RDBMS, AOA, IWT)
• The Marks must be between 0 and 100 only
• Grade can be A, B or C
– XML Schema specifies the namespace to which the created vocabulary belongs
– The namespace name need not be an actual URL; it uses URL syntax and should be a unique string
– Example: the namespace http://www.Results.com defines the following vocabulary
• <Result>, <Subject>, <Marks>, <Grade>
Example of referring to Schema : Result.xml
<?xml version="1.0" encoding="UTF-8"?>
<res:Result xmlns:res="http://www.Results.com"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://www.Results.com Result.xsd">
  <res:Name>Kiran</res:Name>
  <res:EmpNo>45609</res:EmpNo>
  <res:Subject>
    <res:Name>CHSSC</res:Name>
    <res:Marks>80.70</res:Marks>
    <res:Grade>A</res:Grade>
  </res:Subject>
  <res:Subject>
    <res:Name>PF</res:Name>
    <res:Marks>78.30</res:Marks>
    <res:Grade>B+</res:Grade>
  </res:Subject>
</res:Result>
Schema example : Result.xsd
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.Results.com"
            xmlns="http://www.Results.com" elementFormDefault="qualified">
  <xsd:element name="Result">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Name" type="xsd:string"/>
        <xsd:element name="EmpNo" type="xsd:int"/>
        <xsd:element name="Subject" type="SubjectType" maxOccurs="5"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:simpleType name="NameType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="CHSSC|PF|RDBMS|IWT|AOA"/>
    </xsd:restriction>
  </xsd:simpleType>
[ Continued ……]
Schema example : Result.xsd (Continued ……)
  <xsd:complexType name="SubjectType">
    <xsd:sequence>
      <xsd:element name="Name" type="NameType"/>
      <xsd:element ref="Marks"/>
      <xsd:element name="Grade">
        <xsd:simpleType>
          <xsd:restriction base="xsd:string">
            <xsd:pattern value="A|B\+|B|C|D"/>
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:element name="Marks">
    <xsd:simpleType>
      <xsd:restriction base="xsd:float">
        <xsd:minInclusive value="0.0"/>
        <xsd:maxInclusive value="100.0"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>
</xsd:schema>
– In the Grade pattern the + of B+ is escaped as \+, because an unescaped + is a regular-expression quantifier
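A schema like this can be applied from Java with the javax.xml.validation API. The sketch below is an illustration only: it uses a cut-down schema that declares just a Subject element (Name, Marks in the range 0 to 100, Grade pattern), and the class name MarksSchemaDemo and helper subject() are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class MarksSchemaDemo {
    // Cut-down schema: only the Subject element with its three children
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='http://www.Results.com' xmlns='http://www.Results.com'"
      + " elementFormDefault='qualified'>"
      + "<xsd:element name='Subject'><xsd:complexType><xsd:sequence>"
      + "<xsd:element name='Name' type='xsd:string'/>"
      + "<xsd:element name='Marks'><xsd:simpleType>"
      + "<xsd:restriction base='xsd:float'>"
      + "<xsd:minInclusive value='0.0'/><xsd:maxInclusive value='100.0'/>"
      + "</xsd:restriction></xsd:simpleType></xsd:element>"
      + "<xsd:element name='Grade'><xsd:simpleType>"
      + "<xsd:restriction base='xsd:string'><xsd:pattern value='A|B\\+|B|C|D'/>"
      + "</xsd:restriction></xsd:simpleType></xsd:element>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "</xsd:schema>";

    // Builds a Subject instance document with the given marks and grade
    static String subject(String marks, String grade) {
        return "<res:Subject xmlns:res='http://www.Results.com'>"
             + "<res:Name>CHSSC</res:Name>"
             + "<res:Marks>" + marks + "</res:Marks>"
             + "<res:Grade>" + grade + "</res:Grade>"
             + "</res:Subject>";
    }

    // Returns true when the document satisfies the schema
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a schema violation is reported as a SAXException
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(subject("80.70", "A")));  // marks within range
        System.out.println(isValid(subject("120", "A")));    // marks above maxInclusive
        System.out.println(isValid(subject("78.30", "B+"))); // grade matched by B\+
    }
}
```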
Result.xml : Understanding XML Declaration
<?xml version="1.0" encoding="UTF-8"?>
<res:Result xmlns:res="http://www.Results.com"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://www.Results.com Result.xsd">
  <res:Name>Kiran</res:Name>
  <res:EmpNo>1000</res:EmpNo>
  <res:Subject>
    ...
    ...
  </res:Subject>
</res:Result>
– The first line is the XML declaration; the rest is the XML data
– All elements prefixed with res: are defined in the www.Results.com namespace
– The namespace www.Results.com is defined in Result.xsd
Result.xml : Understanding Structure of XML Data
<res:Result xmlns:res="http://www.Results.com"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://www.Results.com Result.xsd">
  <res:Name>Kiran</res:Name>
  <res:EmpNo>1000</res:EmpNo>
  <res:Subject>
    <res:Name>CHSSC</res:Name>
    <res:Marks>80.90</res:Marks>
    <res:Grade>A</res:Grade>
  </res:Subject>
  <res:Subject>
    <res:Name>PF</res:Name>
    <res:Marks>45.30</res:Marks>
    <res:Grade>D</res:Grade>
  </res:Subject>
</res:Result>
– The start tag of res:Result carries the namespace declarations and the reference to the schema
– Attributes prefixed with xsi: are defined in the www.w3.org/.../XMLSchema-instance namespace
– All elements prefixed with res: are defined in the www.Results.com namespace
– The first Subject element is the CHSSC result, the second is the PF result
Understanding XML Schema
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.Results.com"
            xmlns="http://www.Results.com" elementFormDefault="qualified">
  <xsd:element name="Result">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Name" type="xsd:string"/>
        <xsd:element name="EmpNo" type="xsd:int"/>
        <xsd:element name="Subject" type="SubjectType" maxOccurs="5"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:complexType name="SubjectType">
    ...
    ...
  </xsd:complexType>
</xsd:schema>
– All the elements prefixed with xsd: are defined in the www.w3.org/../... namespace
– The <xsd:element name="Result"> block defines the element Result
– All the elements defined here are part of the “targetNamespace”
DTD vs Schema
XML documents and DTDs use different syntax : inconsistency
– Schema uses XML syntax
Limited data type capability
– DTDs support only a very limited capability for specifying data types
– DTDs do not support field-level validations or complex types
• E.g. : You cannot express "I want the <Marks> element to hold an integer in the range 0 to 100" in a DTD
Schema describes a set of data types compatible with those found in databases
– E.g. : Databases support data types such as integer and string
– Schema supports integer, string, etc., while DTD does not
Element Declarations: Simple Element
Syntax :
<xsd:element name="Element_name" type="Element_type" Occurrence/>
Element_name : Any valid XML name
Element_type : Built-in simple type
Occurrence : Number of occurrences of that element (minOccurs / maxOccurs), optional
Example :
– <xsd:element name="Name" type="xsd:string"/>
• Defines the element Name of type string
– <xsd:element name="Marks" type="xsd:float" maxOccurs="5"/>
• Defines the element Marks of simple type float
• Marks may appear at most 5 times
• and, by default, at least once (minOccurs defaults to 1)
Element Declarations
Syntax :
<xsd:element name="Element_name">
  <xsd:complexType>

  </xsd:complexType>
</xsd:element>
– Example
<xsd:element name="Subject">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="Name" type="xsd:string"/>
      <xsd:element name="Marks" type="xsd:float"/>
      <xsd:element name="Grade" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
• Defines a non-reusable complex element called ‘Subject’
• The child elements must appear in the order declared, because the <xsd:sequence> tag is used
Element Declarations: Reusable Simple Type
Syntax :
<xsd:simpleType name="Element_type_name">
  <xsd:restriction base="Base_Data_type">
    Restriction_specification
  </xsd:restriction>
</xsd:simpleType>
Element_type_name : Name of the data type
Base_Data_type : Any of the built-in simple data types (integer, float, etc.)
Restriction_specification : Specifies the restrictions on the element, if any
Example :
<xsd:simpleType name="MarksType">
  <xsd:restriction base="xsd:float">
    <xsd:minInclusive value="0.0"/>
    <xsd:maxInclusive value="100.0"/>
  </xsd:restriction>
</xsd:simpleType>
– Defines the reusable element type MarksType
– An element declared as MarksType may take a minimum value of 0.0 and a maximum value of 100.0
– <xsd:element name="Marks" type="MarksType"/>
Element Declarations: Reusable Complex Type
Syntax
<xsd:complexType name="Type_name">
– Defines the reusable type Type_name
Example
<xsd:complexType name="SubjectType">
  <xsd:sequence>
    <xsd:element name="Name" type="xsd:string"/>
    <xsd:element name="Marks" type="xsd:int"/>
    <xsd:element name="Grade" type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType>
– Defines the reusable complex element type SubjectType
– It comprises the following elements in the sequence specified (<xsd:sequence> tag)
• Name
• Marks
• Grade
This type can be used to define elements in your XML
<xsd:element name="Subject" type="SubjectType"/>
Defining the Attributes
Syntax : <xsd:attribute name="Attr_Name" type="Attr_Type"/>
– Example
<xsd:attribute name="Project" type="xsd:string"/>
– All attributes are declared as simple types
– Only complex elements can have attributes
– Example
<xsd:complexType name="EmpNo">
  <xsd:sequence>

  </xsd:sequence>
  <xsd:attribute name="Project" type="xsd:string"/>
</xsd:complexType>
• Defines the attribute Project of type string
• For an element that has text content as well as an attribute, the attribute is declared inside <xsd:simpleContent> with an <xsd:extension> of the text type instead of after an <xsd:sequence>
– Attribute Project being used
<res:Name>Kiran</res:Name>
<res:EmpNo Project="Training">45609</res:EmpNo>
<res:Subject>CHSSC</res:Subject>
Anatomy of XML Schema : Constraints specification
Controls the occurrence of an individual element or a group of elements
Types of constraints
• <choice> : allows only one of the listed elements to appear
• <sequence> : elements must appear in the same order as they are declared
• <all> : elements can appear in any order, each at most once
<choice> constraint
– E.g.:
<xsd:choice>
  <xsd:element name="first"/>
  <xsd:element name="last"/>
</xsd:choice>
• Allows either first or last name to be used in the instance XML document
<sequence> constraints
– E.g.:
<xsd:sequence>
  <xsd:element name="Name" type="xsd:string"/>
  <xsd:element name="EmpNo" type="xsd:int"/>
  <xsd:element name="Subject" type="SubjectType" maxOccurs="5"/>
</xsd:sequence>
• All elements must appear in the defined order only
Anatomy of XML Schema : Constraints specification
<all> constraints
– E.g. :
<xsd:all>
  <xsd:element name="invoice"/>
  <xsd:element name="purchaseOrder"/>
  <xsd:element name="mailingLabel"/>
</xsd:all>
• Elements may appear in any order
• Each element is required by default; it may be omitted only if it is declared with minOccurs="0"
XML Parsers
XML Parser : The Big Picture
Why use a parser?
– Applications typically use a pre-built XML parser (e.g. JAXP, Apache Xerces, etc.)
– This enables you to build your application much more quickly
[Fig. 1 : Usage of the XML Parser. The XML document and its DTD / Schema are input to the XML parser; the parser hands the parsed data to the client application through its APIs.]
Need for Parser
Defining the parser’s responsibilities
– Ensure that the document adheres to specific standards
• Does the document match the DTD or Schema?
• Is the document well-formed?
– Make the document contents available to your application
• The parser parses the XML document and makes its data available to your application
• An application using a parser can access data in the XML document by walking the hierarchy or by using tag names
Types of XML Parsers
Validating Parser
– a parser that verifies that the XML document adheres to the DTD or Schema
Non-Validating Parser
– a parser that does not verify the XML document against the DTD or Schema
Most parsers provide an option to turn validation on or off
All parsers check the well-formedness of the XML document at all times
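The well-formedness check can be seen even with validation switched off. The sketch below assumes the JAXP DOM parser; the class name WellFormedDemo and the sample strings are made up for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedDemo {
    // Returns true when a non-validating parser accepts the document
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // no DTD / schema validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false; // a well-formedness violation is a fatal error
        }
    }

    public static void main(String[] args) {
        // Properly nested tags
        System.out.println(isWellFormed("<Result><Name>Kiran</Name></Result>"));
        // <Name> is never closed
        System.out.println(isWellFormed("<Result><Name>Kiran</Result>"));
    }
}
```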
XML Parser Interfaces
Two types of interfaces are provided by XML parsers
– SAX : an event-based interface
– DOM : a tree-based interface
JAXP
– “Java API for XML Processing”
– JAXP is part of the JDK
– Provides parsers which can be used in any Java application
It supports both
– Tree-based parser : DOM
– Event-based parser : SAX
DOM Parser
Tree Based Parser

– Definition: Parser reads the XML document, and creates an in-memory “tree”
representation of XML Document
– For example: Given a sample XML document below
– What kind of tree would be produced?
<Result>
<Name>Kiran</Name>
<Subject>
<Name>CHSSC</Name>
<Marks>80</Marks>
<Grade>A</Grade>
</Subject>
</Result>
DOM Parser
In-memory tree created by a tree-based parser
– The tree represents the hierarchy of the XML document
[Figure: tree with the element nodes Result, Name and EmpNo and the text nodes Kiran (under Name) and 45609 (under EmpNo)]
DOM Parser
Tree-based APIs present an in-memory model of the entire document to the application once parsing has concluded
No extra data structures are needed to maintain the information during parsing
An application can navigate through the tree to find the desired pieces of the document
Document Object Model (DOM) is the standard for tree-based parsing of XML documents
Document Object Model (DOM)
The Document Object Model (DOM) is a set of interfaces defined by the W3C DOM Working Group
DOM is the tree-based interface used by programmers to manipulate the XML document
A DOM parser can be validating or non-validating
A DOM parser represents the logical model of the XML document in memory
All entity references are expanded before the DOM tree is constructed
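DOM Structure representing XML
[Figure, left: XML document structure. A Document root node contains Element nodes, which can in turn contain Element, Attribute, Text and Comment nodes.]
[Figure, right: document structure representing Result.xml. The Document node contains Result; Result contains Name (text Kiran), EmpNo (text 45609) and Subject; Subject contains Name (text CHSSC), Marks (text 80.0) and Grade (text A).]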
Document Object Model (DOM) : Overview
The root of the DOM hierarchy is called the Document node
– It represents the whole document; its document element (root element) in the example is Result
The child nodes of the Document node are Element nodes, Comment nodes, etc.
– Example : Result is a child of the Document node; Name, Subject, EmpNo, etc. are child nodes of Result
All the nodes in the DOM tree are derived from the interface org.w3c.dom.Node
The Big picture : Parsing the XML Document
The document builder factory creates an instance of the parser with the required characteristics
– Whether the parser should be a validating parser or not
– Whether namespace support is required or not
– Whether to ignore the white space between elements or not
The factory hides the implementation details of the parser and gives a standard DOM interface for parsing XML
– (Analogous to a JDBC driver)
Java Application using DOM Parser (JAXP)
DomApp.java : Parsing XML Document using DOM Parser
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// MyErrorHandler is an application class that implements org.xml.sax.ErrorHandler
public class DomApp {
    public static void main(String[] argv) {
        MyErrorHandler hErr;
        Document hDocument;
        // Configure the factory for a validating, namespace-aware parser
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        factory.setNamespaceAware(true);
        try {
            hErr = new MyErrorHandler();
            DocumentBuilder hBuilder = factory.newDocumentBuilder();
            // Set the error handler
            hBuilder.setErrorHandler(hErr);
            hDocument = hBuilder.parse(new File("Result.xml"));
        } catch (Exception e) {
            // Handle any exception generated during parsing
            e.printStackTrace();
        }
    } // End of function main
}
Parsing the XML Document using DOM Parser
Step 1 : Get the instance of the document-builder factory
This will be used to produce the DOM parser (called DocumentBuilder)
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
Step 2 : Set the properties of the DOM parser to be produced
a. It should validate the XML document against the Schema / DTD
b. It should be namespace aware
factory.setValidating(true);
factory.setNamespaceAware(true);
Note : setValidating(true) switches on DTD validation; validating against an XML Schema needs additional factory configuration, e.g. factory.setSchema(...)
Step 3 : Obtain the instance of the MyErrorHandler class
This instance handles the errors generated during parsing in an application-specific way
hErr = new MyErrorHandler();
Step 4 : Obtain the instance of the DOM parser and register the error handler
The parser will be used to parse the XML document and create the in-memory tree representation of the XML document
DocumentBuilder hBuilder = factory.newDocumentBuilder();
hBuilder.setErrorHandler(hErr);
Step 5 : Parse the XML document (Result.xml) using the parser created above
hDocument = hBuilder.parse(new File("Result.xml"));
DOM : Exploring the org.w3c.dom.Node Interface
The Node interface is the root of the DOM Core class hierarchy
This interface can be used to extract information from any DOM object without knowing the actual type of the underlying node (e.g. Element node, Text node, Attr node, etc.)
i.e. it is possible to access a document's complete structure and content using only the methods and properties exposed by the Node interface
The class hierarchy rooted at org.w3c.dom.Node
[Figure: Node at the top, with Element, Document, Entity, Attr, Text and Comment derived from it]
DOM : Important Methods of Node interface
Methods to retrieve information from the XML DOM tree
• Node getFirstChild() : Returns the first child of the current node
• Node getLastChild() : Returns the last child of the current node
• String getNodeName() : The name of this node
• String getNodeValue() : The value of this node, depending on its type
• short getNodeType() : A code representing the type of the underlying object
Methods to alter the elements of the XML DOM tree
• Node insertBefore(Node newChild, Node refChild)
• Node appendChild(Node newChild)
• Node removeChild(Node oldChild)
• Node replaceChild(Node newChild, Node oldChild)
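The methods listed above can be tried on a small in-memory document. This is a sketch, not code from the original slides: the class name NodeMethodsDemo, its helper methods and the sample document are assumptions made for the illustration, and getNextSibling() is an additional Node method used for the traversal.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeMethodsDemo {
    // No white space between tags, so every child of Result is an element node
    static final String XML =
        "<Result><Name>Kiran</Name><EmpNo>45609</EmpNo></Result>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Walks the children with getFirstChild()/getNextSibling() and collects names
    static List<String> childNames(Node parent) {
        List<String> names = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            names.add(n.getNodeName());
        }
        return names;
    }

    // Appends <Grade>A</Grade> to Result, then removes Result's first child
    static void edit(Document doc) {
        Element result = doc.getDocumentElement();
        Element grade = doc.createElement("Grade");
        grade.appendChild(doc.createTextNode("A"));
        result.appendChild(grade);
        result.removeChild(result.getFirstChild());
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        Element result = doc.getDocumentElement();
        Node name = result.getFirstChild();
        System.out.println(name.getNodeName());                      // name of the first child
        System.out.println(name.getFirstChild().getNodeValue());     // value of its text node
        System.out.println(name.getNodeType() == Node.ELEMENT_NODE); // type code check
        edit(doc);
        System.out.println(childNames(result));                      // children after the edit
    }
}
```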
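Using Node Interface
Node hNode = hDocument.getDocumentElement();    // hNode refers to Result
Node hFirstChild = hNode.getFirstChild();       // hFirstChild refers to Name
String sName = hFirstChild.getNodeName();       // sName = "Name"
Node hLastChild = hNode.getLastChild();         // hLastChild refers to Subject
hFirstChild = hFirstChild.getFirstChild();      // hFirstChild now refers to the text node under Name
String sVal = hFirstChild.getNodeValue();       // sVal = "Kiran"
These results assume that the parser ignores the white space between elements; otherwise getFirstChild() returns a white-space text node
[Figure: DOM tree of Result.xml. Result has the children Name (text Kiran), EmpNo (text 45609) and Subject (with child Name); hNode points to Result, hFirstChild to Name and hLastChild to Subject.]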
XML Parser Interfaces : Event Based Interface
Event Based Interface
– Definition : the parser reads the XML document and generates events for each parsing step
– Some common parsing events
• Element start-tag read
• Element content read
• Element end-tag read
– Example
<Result>
  <Name>Kiran</Name>
  <Subject>
    <Name>CHSSC</Name>
    <Marks>80</Marks>
    <Grade>A</Grade>
  </Subject>
</Result>
XML Parser Interfaces : Event Generated
Events generated for a Result element that contains Name (Kiran) and EmpNo (45609):
– startElement : Result
– startElement : Name
– contents : Kiran
– endElement : Name
– startElement : EmpNo
– contents : 45609
– endElement : EmpNo
– endElement : Result
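An event sequence like the one above can be reproduced with a small SAX handler that records each callback. The sketch below is an illustration: the class name SaxEventLog and the log format are assumptions chosen to mirror the list above, and the text is buffered because a parser may deliver character data in several characters() calls.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventLog {
    static final String XML =
        "<Result><Name>Kiran</Name><EmpNo>45609</EmpNo></Result>";

    // Parses the document and returns one log line per parsing event
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                flushText();
                log.add("startElement : " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // may arrive in several chunks
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                flushText();
                log.add("endElement : " + qName);
            }

            // Emits the buffered text as one "contents" event, skipping white space
            private void flushText() {
                String s = text.toString().trim();
                if (!s.isEmpty()) {
                    log.add("contents : " + s);
                }
                text.setLength(0);
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        for (String event : events(XML)) {
            System.out.println(event);
        }
    }
}
```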
XML Parser Interfaces : Event Based Interface
For each of these events, your application implements “event handlers”
Each time an event occurs, the corresponding event handler is called
Your application intercepts these events and handles them in any way you want
The application does not wait until the entire document has been parsed
The application has to keep the information from the XML document in its own data structures until it has been processed completely
Simple API for XML (SAX) is the standard for event-based parsing of XML documents
SAXApp.java : Parsing XML Document using SAX Parser
SAXApp.java
import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class SAXApp {
    public static void main(String[] argv) {
        // Get the instance of the parser event handling class
        DefaultHandler handler = new Handler();
        // Get the instance of SAXParserFactory
        SAXParserFactory factory = SAXParserFactory.newInstance();
        try {
            // Set the properties of the parser to be obtained
            factory.setValidating(true);
            factory.setNamespaceAware(true);
            // Get the new SAX parser
            SAXParser saxParser = factory.newSAXParser();
            // Parse the file
            // handler : processes events generated during parsing
            saxParser.parse(new File("Result.xml"), handler);
        } catch (Throwable t) {
            // Handle any exceptions generated during parsing
            t.printStackTrace();
        }
    } // End of function main
}
SAXApp.java : Parsing XML Document using SAX Parser
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class Handler extends DefaultHandler {
    // Process any recoverable errors in the XML document
    @Override
    public void error(SAXParseException e) throws SAXException {
        System.out.println("Error At Line: " + e.getLineNumber());
        System.out.println("Column: " + e.getColumnNumber());
        // Print the error message
        System.out.println(e.getMessage());
    }

    // Process any fatal errors in the XML document
    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        System.out.println("Fatal Error At Line: " + e.getLineNumber());
        System.out.println("Column: " + e.getColumnNumber());
        // Print the error message
        System.out.println(e.getMessage());
    }
} // End class Handler
Understanding The Simple API for XML (SAX)
Step 1 : Get the instance of SAXParserFactory
This instance is used to obtain the SAX parser
SAXParserFactory factory = SAXParserFactory.newInstance();
Step 2 : Get the instance of the event handler class
This class handles all the events generated by the parser
DefaultHandler handler = new Handler();
Step 3 : Set the properties of the parser to be obtained
a. It should validate the XML document against the Schema / DTD
b. It should be namespace aware
Step 4 : Obtain the instance of the SAX parser using the factory just obtained
SAXParser saxParser = factory.newSAXParser();
Step 5 : Parse the Result.xml file using the SAX parser obtained above
Events generated during parsing will be handled by the object handler
saxParser.parse(new File("Result.xml"), handler);
The Big picture : Parsing the XML Document using SAX
[Figure: the SAX parser factory creates the SAX parser (XML parser). The parser reads the XML document and sends events to DefaultHandler / MyHandler, which implements the org.xml.sax interfaces ContentHandler, ErrorHandler and EntityResolver.]
org.xml.sax.DefaultHandler Class
– Provides the default implementation of all the events
– DefaultHandler implements the ContentHandler, ErrorHandler, DTDHandler, and

EntityResolver interfaces (with null methods).
– Only the methods which are required are overridden
org.xml.sax.ContentHandler Interface
– Receive notification of the logical content of a document
– Defines methods like startDocument(), endDocument(), startElement(), and

endElement()
– These are invoked when an XML tags arerecognized
– Also defines methods characters() which are invoked when the parser encounters
the text in an XML element
org.xml.sax Interfaces
org.xml.sax.ErrorHandler Interface
– Allows a SAX application to do customized error handling
– Once a handler is registered, the parser reports all errors and warnings through this interface
– Important methods (each receives a SAXParseException)
• void error() : receives the notification of a recoverable error
• void fatalError() : receives the notification of a non-recoverable error
• void warning() : receives the notification of a warning
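The difference between error() and fatalError() shows up when a validating SAX parser meets a validity error and a well-formedness error. This is a sketch under assumptions: the class names SaxErrorHandlerDemo and Collector, the tiny Marks DTD and the two sample documents are made up for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class SaxErrorHandlerDemo {
    static final String DTD = "<!DOCTYPE Marks [<!ELEMENT Marks (#PCDATA)>]>";
    // Breaks the DTD: Marks may contain only text, and Extra is not declared
    static final String INVALID = DTD + "<Marks><Extra/></Marks>";
    // Not well-formed: the end tag does not match the start tag
    static final String MALFORMED = DTD + "<Marks>80</Mark>";

    // Records which ErrorHandler callback the parser invoked
    static class Collector extends DefaultHandler {
        final List<String> reports = new ArrayList<>();

        @Override
        public void warning(SAXParseException e) {
            reports.add("warning: " + e.getMessage());
        }

        @Override
        public void error(SAXParseException e) {
            reports.add("error: " + e.getMessage()); // parsing continues
        }

        @Override
        public void fatalError(SAXParseException e) throws SAXException {
            reports.add("fatalError: " + e.getMessage());
            throw e; // parsing cannot continue after a fatal error
        }
    }

    static List<String> parse(String xml) {
        Collector collector = new Collector();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // validate against the DTD
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(xml)), collector);
        } catch (Exception e) {
            // A fatal error ends the parse; it is already in the report list
        }
        return collector.reports;
    }

    public static void main(String[] args) {
        parse(INVALID).forEach(System.out::println);
        parse(MALFORMED).forEach(System.out::println);
    }
}
```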
Evaluating Parsers : SAX vs. DOM
SAX
– Advantage
• Good when the document must be processed serially and is very large
• i.e. when the size of the XML document is measured in GBs
– Disadvantage
• The application must maintain its own data structures to hold parts of the XML document until processing is complete, which makes SAX less convenient than DOM for small XML documents
DOM
– Advantage
• Supports DOM tree traversal methods
• Allows modification of the XML document
• Good when random access to the document is required
– Disadvantage
• For large XML documents (size in GBs) it requires much more memory than parsing the same document with a SAX parser, because the whole tree is held in memory
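The trade-off can be illustrated by answering the same question, "how many Subject elements are there?", with both interfaces. The sketch below is an illustration only: the class name SubjectCounter and the synthetic document built by sampleXml() are assumptions. The SAX version keeps a single counter, while the DOM version first builds the whole tree.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SubjectCounter {
    // Builds a Result document with n Subject elements (synthetic demo data)
    static String sampleXml(int n) {
        StringBuilder sb = new StringBuilder("<Result>");
        for (int i = 0; i < n; i++) {
            sb.append("<Subject><Marks>").append(i % 101).append("</Marks></Subject>");
        }
        return sb.append("</Result>").toString();
    }

    // SAX: serial processing, only a counter is kept in memory
    static int countWithSax(String xml) {
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if ("Subject".equals(qName)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }

    // DOM: the whole tree is built first, then queried
    static int countWithDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("Subject").getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = sampleXml(1000);
        System.out.println("SAX count: " + countWithSax(xml));
        System.out.println("DOM count: " + countWithDom(xml));
    }
}
```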
