WT Unit 4 @omi

Elective –I MCA 303 (2) Web Technology

UNIT IV XML

Introduction
Features
Anatomy
Declaration
Uses
Key Components
DTD
Schema
Markup Elements
Attributes
XML Objects
XML Scripting
Using XML with Applications
Transforming XML using XSL and XSLT
XPath: Template-Based Transformations

Contributed By: Om Shankar Mishra

www.linkedin.com/in/omi92229

https://t.me/SolutionsAndTricksIT
UNIT IV XML
Introduction:
 XML, or Extensible Markup Language, is a markup language designed to store and
transport data in a structured and readable format. It uses a set of rules to define and
describe the data, making it easy to exchange information between different systems and
applications.
 XML is used to address the need for a universal format that allows diverse systems to
share data seamlessly. Unlike programming languages, XML doesn't perform
computations; instead, it focuses on providing a standard way to structure and represent
data. It plays a crucial role in facilitating data interchange and communication between
disparate systems.
Difference from HTML: While both XML and HTML use markup symbols (tags) to structure
content, they serve different purposes.
HTML (Hypertext Markup Language):
 Purpose: HTML is designed for presenting and displaying information on web browsers.
 Tags: HTML has predefined tags for structuring web content, such as headings,
paragraphs, and links.
 Syntax: HTML is forgiving of syntax errors and is not case-sensitive.
XML (Extensible Markup Language):
 Purpose: XML is used to store and transport data, emphasizing a generic and flexible
approach.
 Tags: Users can define their own tags in XML, providing a customizable and extensible
structure.
 Syntax: XML is case-sensitive, and syntax errors can result in parsing failures by XML
parsers.
Benefits of Using XML:
 Interoperability: XML enables the exchange of data between different systems, fostering
interoperability.
 Data Integrity: Descriptive information included in XML helps in maintaining data
integrity during storage and transfer.
 Flexibility: XML's flexible structure allows for easy modification and adaptation to
changing requirements.
 Efficient Searching: Because tags describe the data they enclose, programs can sort, filter,
and categorize XML data by element, which makes targeted searches easier.
 Structured Data Transfer: It provides a standardized format for transferring data between
systems with different formats.
 Application Design: XML supports flexible application design, allowing for upgrades and
modifications without extensive reformatting.
Features:
XML (Extensible Markup Language) is a versatile markup language that provides a set of
features for defining, structuring, and sharing data in a standardized way.
1. Extensibility: XML is extensible, allowing users to define their own tags and structures.
This flexibility makes it suitable for a wide range of applications and industries.
2. Human-Readable: XML is designed to be easily readable and writable by both humans
and machines. Its text-based format enhances human readability, facilitating manual
inspection and editing.
3. Hierarchy: XML data is organized hierarchically using nested elements. This hierarchical
structure allows the representation of complex relationships and structures within the
data.
4. Platform-Independent: XML is platform-independent, meaning it can be used across
different operating systems and computing environments. This characteristic contributes
to its interoperability.
5. Interoperability: One of XML's primary purposes is to enable interoperability between
different systems. It provides a standardized way for diverse applications to exchange
data seamlessly.
6. Self-Descriptive: XML documents are self-descriptive, as they include both data and
information about the structure of the data. This self-descriptive nature contributes to
data integrity and interpretation.
7. Metadata Support: XML allows the inclusion of metadata, providing additional
information about the data elements. This metadata can include attributes, data types,
and other descriptors that enhance the understanding of the data.
8. Data Validation: XML documents can be validated against a specified XML schema. This
ensures that the data adheres to predefined rules, enhancing consistency and reducing
the likelihood of errors.
9. Reusable Components: XML supports the creation of modular and reusable components.
These components can be defined once and reused in multiple parts of the document,
promoting consistency and reducing redundancy.
10. Unicode Support: XML supports Unicode, allowing the representation of characters from
various languages and character sets. This ensures that XML is suitable for
internationalization and multilingual applications.
11. Well-Defined Standards: XML is governed by well-defined standards maintained by
organizations like the World Wide Web Consortium (W3C). These standards ensure
consistency and reliability in XML usage across different contexts.
12. Data Presentation Separation: XML separates data from its presentation, making it a
suitable choice for data exchange without concerns about how the data will be displayed
or formatted.
Anatomy
<?xml version="1.0" encoding="UTF-8"?>
<company>
  <!-- Employee 1 -->
  <employee id="101">
    <name>
      <first>Jane</first>
      <last>Doe</last>
    </name>
    <position>Software Engineer</position>
    <department>Engineering</department>
    <salary currency="USD">80000</salary>
  </employee>

  <!-- Employee 2 -->
  <employee id="102">
    <name>
      <first>John</first>
      <last>Smith</last>
    </name>
    <position>Marketing Specialist</position>
    <department>Marketing</department>
    <salary currency="EUR">60000</salary>
  </employee>

  <!-- Employee 3 -->
  <employee id="103">
    <name>
      <first>Alice</first>
      <last>Johnson</last>
    </name>
    <position>HR Manager</position>
    <department>Human Resources</department>
    <salary currency="GBP">70000</salary>
  </employee>
</company>

 The XML declaration (<?xml version="1.0" encoding="UTF-8"?>) specifies the XML
version and character encoding.
 The root element <company> contains information about three employees, each
represented by an <employee> element.
 Each <employee> element has child elements like <name>, <position>, <department>,
and <salary>.
 The <name> element includes nested <first> and <last> elements to represent the first
and last names of the employees.
 The <salary> element has an attribute (currency) to indicate the currency of the salary.
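The structure described above can be walked programmatically. The following is a minimal sketch in Python (an assumption; these notes otherwise use JavaScript in the browser) using the standard-library xml.etree.ElementTree module. The constant COMPANY_XML is a shortened copy of the document above, and the function name summarize_employees is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the company document above (two employees only).
COMPANY_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<company>
  <employee id="101">
    <name><first>Jane</first><last>Doe</last></name>
    <position>Software Engineer</position>
    <department>Engineering</department>
    <salary currency="USD">80000</salary>
  </employee>
  <employee id="102">
    <name><first>John</first><last>Smith</last></name>
    <position>Marketing Specialist</position>
    <department>Marketing</department>
    <salary currency="EUR">60000</salary>
  </employee>
</company>"""


def summarize_employees(xml_bytes):
    """Return (id, full name, salary, currency) for each <employee>."""
    root = ET.fromstring(xml_bytes)           # root is the <company> element
    rows = []
    for employee in root.findall("employee"):
        first = employee.findtext("name/first")   # nested child elements
        last = employee.findtext("name/last")
        salary = employee.find("salary")
        rows.append((
            employee.get("id"),               # attribute on <employee>
            f"{first} {last}",
            int(salary.text),                 # element text is always a string
            salary.get("currency"),           # attribute on <salary>
        ))
    return rows


for row in summarize_employees(COMPANY_XML):
    print(row)
```

Child elements are reached by path (name/first), while attributes such as id and currency are read with get(), which mirrors the element/attribute split in the bullets above.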
Declaration
In XML, the declaration is an optional but often included part of an XML document that
provides essential information about the version of XML being used and the character
encoding of the document. The XML declaration is placed at the very beginning of the
document, before any other content.
Let's break down the components of the XML declaration:
<?xml: This signals the start of the XML declaration.
version="1.0": Specifies the version of XML that the document adheres to. In the example, it is
version 1.0. This attribute is mandatory, and it indicates the XML version used in the
document.
encoding="UTF-8": Specifies the character encoding used in the document. UTF-8 is a widely
used character encoding that supports a vast range of characters from various languages. The
encoding attribute is optional, but it is good practice to include it to ensure proper
interpretation of special characters.
?>: Signals the end of the XML declaration.
Here's the XML declaration used in context:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<!-- Other elements and content go here -->
</root>
Including the XML declaration helps applications and parsers correctly interpret and process
the XML document by providing information about its version and character encoding. While
the version attribute is mandatory, the encoding attribute is optional, but specifying it is
recommended to avoid potential encoding issues.
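To see why the encoding attribute matters, here is a small sketch in Python (an assumed language for illustration) using the standard-library xml.etree.ElementTree module. The same name is stored with two different byte encodings, and the parser relies on the declaration to decode each one. The variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# "José" encoded as ISO-8859-1: the é is the single byte 0xE9.
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><name>Jos\xe9</name>'
# The same name encoded as UTF-8: the é becomes the two bytes 0xC3 0xA9.
utf8_doc = '<?xml version="1.0" encoding="UTF-8"?><name>José</name>'.encode("utf-8")

latin1_name = ET.fromstring(latin1_doc).text
utf8_name = ET.fromstring(utf8_doc).text
print(latin1_name == utf8_name)   # both decode to the same text

# If the declaration does not match the actual bytes, parsing fails.
try:
    ET.fromstring(b'<?xml version="1.0" encoding="UTF-8"?><name>Jos\xe9</name>')
    mismatch_detected = False
except ET.ParseError:
    mismatch_detected = True
print(mismatch_detected)
```

The last case shows the "potential encoding issues" mentioned above: a lone 0xE9 byte is not valid UTF-8, so the parser rejects the document instead of guessing.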
When creating XML documents, it's important to follow certain naming rules to ensure that
the document is well-formed and can be correctly interpreted by XML parsers.
Element Names:
 Must start with a letter or underscore (_).
 After the first character, can contain letters, digits, hyphens (-), underscores (_), and
periods (.).
 Cannot start with the letters "xml" (or XML, or Xml, etc.), as these are reserved for XML-
related constructs.
<!-- Valid element names -->
<book>
<book_title/> <chapter1/> <first-name/>
</book>
Attribute Names:
 Follow the same rules as element names.
 Cannot contain spaces.
Case Sensitivity:
XML is case-sensitive. <Book> and <book> are considered different elements.
<!-- Invalid: case-sensitive mismatch -->
<Book>Title</book> ✖
<book>Content</book> ✔
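A parser reports a case mismatch as a well-formedness error. A minimal sketch, assuming Python and its standard-library xml.etree.ElementTree module (the helper name is_well_formed is hypothetical):

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<Book>Title</book>"))    # mismatched case in the end tag
print(is_well_formed("<book>Content</book>"))  # start and end tags match
```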
Reserved Characters:
 Certain characters have special meanings in XML and cannot be used directly in element
or attribute names. These characters are <, >, &, ', and ".
 If these characters need to be included, use character entities: &lt;, &gt;, &amp;, &apos;,
and &quot;.
<!-- Using character entities -->
<title>It&apos;s a Book</title>
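Escaping is usually left to a library rather than done by hand. The sketch below assumes Python; escape comes from the standard-library xml.sax.saxutils module and by default replaces only &, <, and >, so the two quote entities are passed in explicitly. The sample title and variable names are made up for illustration.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

raw_title = "It's a Book about <tags> & \"quotes\""

# escape() handles &, < and > itself; the quote entities are added here.
quote_entities = {"'": "&apos;", '"': "&quot;"}
escaped_title = escape(raw_title, quote_entities)
print(escaped_title)

# Parsing the escaped text gives back the original characters.
document = f"<title>{escaped_title}</title>"
round_tripped = ET.fromstring(document).text
print(round_tripped == raw_title)
```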
Whitespace:
Names cannot contain whitespace, and no whitespace is allowed between < and the element
name. Whitespace is allowed between attributes and before the closing > of a tag. Whitespace
inside element content is kept and passed to the application.
<!-- Well-formed: whitespace before > is allowed; the content keeps its trailing space -->
<element_name attribute_name="value" >Content </element_name>
Namespace Considerations:
 When using namespaces, the rules for element and attribute names still apply within
the specific namespace.
 A namespace prefix must be declared with an xmlns:prefix attribute on the element
that uses it or on one of its ancestors.
<!-- Namespace example -->
<ns:book xmlns:ns="http://example.com">
<ns:title>XML Basics</ns:title>
</ns:book>
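When a namespaced document like the one above is parsed, the prefix is only shorthand for the namespace URI. A minimal sketch, assuming Python's standard-library xml.etree.ElementTree module; the lookup prefix "b" is an arbitrary choice made here to show that the prefix itself does not matter.

```python
import xml.etree.ElementTree as ET

doc = ('<ns:book xmlns:ns="http://example.com">'
       '<ns:title>XML Basics</ns:title>'
       '</ns:book>')
root = ET.fromstring(doc)

# ElementTree stores the expanded name: {namespace URI}local-name.
print(root.tag)

# Any prefix can be used for lookups as long as it maps to the same URI.
prefixes = {"b": "http://example.com"}
title = root.find("b:title", prefixes).text
print(title)
```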
Following these naming rules ensures that your XML documents are well-formed and can be
accurately processed by XML parsers and other tools.

Uses
XML (Extensible Markup Language) is used in a variety of applications and industries due to its
versatility, extensibility, and ability to facilitate data interchange.
Data Exchange Between Applications: XML is widely used for exchanging data between
different applications and systems. It provides a standardized format that allows diverse
software to understand and process the shared data.
Web Services: Many web services use XML as the data format for communication between
client and server. XML's structure makes it well-suited for representing complex data in a way
that can be easily processed by web applications.
Configuration Files: XML is often used for configuration files in software applications. These files
define settings and parameters for the application and can be easily modified without altering
the application's code.
Document Representation: XML is used to represent and structure documents in a machine-
readable format. It's common in applications where document structure and semantics need to
be maintained, such as in content management systems.
Database Interaction: XML is employed in database systems to facilitate the exchange of data
between databases and other applications. It provides a standardized way to represent and
transfer structured data.
Middleware Communication: Middleware systems use XML to facilitate communication
between different software components and services. This allows for interoperability in
distributed computing environments.
Industry Standards and Protocols: XML is often a key component in defining industry-specific
standards and protocols. Various domains, such as finance, healthcare, and
telecommunications, adopt XML-based standards for data exchange.
RDF (Resource Description Framework): XML is used as a syntax for representing RDF, which is
a standard model for expressing metadata and relationships on the web. This is particularly
important in the context of the Semantic Web.
SOAP (Simple Object Access Protocol): XML is the foundation for SOAP, a protocol used for
exchanging structured information in web services. SOAP messages are typically XML
documents.
Data Transformation: XML is employed in data transformation processes where data from one
format needs to be converted into another format. XSLT (Extensible Stylesheet Language
Transformations) is commonly used for XML transformations.
Messaging Formats: XML is used in messaging formats, providing a common language for
communication between systems. For example, XML-based formats like RSS and Atom are used
for syndicating web content.
Inter-business Data Transfer: XML facilitates electronic data interchange between businesses,
allowing them to exchange information on goods, services, and transactions in a standardized
format.

Key components:
XML Declaration: The XML declaration is an optional, but often included, part of an XML
document. It provides essential information about the version of XML being used and the
character encoding.
<?xml version="1.0" encoding="UTF-8"?>
Root Element: Every XML document has a single root element that encapsulates all other
elements. It is the top-level element in the hierarchy and represents the entire document.
<root>
<!-- Other elements and content go here -->
</root>
XML Elements: Elements are the building blocks of an XML document and represent the
structure of the data. Each element has a start tag, content, and an end tag. Elements can be
nested within other elements. For example:
<book>
<title>Learning XML</title>
<author>Om Mishra</author>
</book>
Attributes: XML elements can have attributes, which provide additional information about an
element. Attributes are specified within the start tag and are written as name-value pairs.
<person age="30" gender="male">John</person>
Text Content: The text content of an element is the data it holds. In the example below,
"Learning XML" and "Om Mishra" are the text content of the <title> and <author> elements,
respectively.
<book>
<title>Learning XML</title>
<author>Om Mishra</author>
</book>
Comments: Comments in XML are enclosed within <!-- and -->. They are used to add
explanatory notes or annotations within the document. For example:
<!-- This is a comment -->
Processing Instructions: Processing instructions are special directives for applications
processing the XML document. They begin with <? and end with ?>. For instance:
<?target instruction?>
CDATA Section: CDATA (Character Data) sections allow the inclusion of character data that
should not be treated as XML markup. This is useful when including blocks of text or code.
<![CDATA[This is a CDATA section. <tag> This will not be treated as markup. ]]>
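The effect of a CDATA section can be checked with a parser. A short sketch, assuming Python's standard-library xml.etree.ElementTree module; the sample content and variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Inside CDATA, <, > and & are plain characters, not markup.
with_cdata = "<example><![CDATA[if (a < b && b > c) { return <tag>; }]]></example>"
# The same content written with character entities instead.
with_entities = ("<example>if (a &lt; b &amp;&amp; b &gt; c) "
                 "{ return &lt;tag&gt;; }</example>")

cdata_text = ET.fromstring(with_cdata).text
entity_text = ET.fromstring(with_entities).text
print(cdata_text)
print(cdata_text == entity_text)
```

Both forms give the application the same text; CDATA only saves the author from escaping each character.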

These components collectively define the structure of an XML document, providing a
standardized way to represent and exchange data in a readable and extensible format. The
proper arrangement and nesting of these elements create a well-formed XML document that
adheres to XML syntax rules.

DTD
 DTD stands for Document Type Definition.
 It is a set of rules that define the structure and the legal elements and attributes of an
XML document.
 DTDs specify the order and nesting of elements, the types of data allowed, and the
relationships between elements.
 DTDs are used for standardizing the structure of XML documents.
 They allow independent groups to agree on a common structure for exchanging data.
 Applications can use a DTD to verify that XML data is valid and adheres to a predefined
structure.

DTD Syntax:
DTDs use a set of declarations to define the structure of an XML document.
The basic syntax for an element declaration is:
<!ELEMENT element_name content_type>
Example:
<!ELEMENT note (to, from, heading, body)>
Data Types in DTD:
Two keywords for character data appear in DTDs: #PCDATA and CDATA.
 #PCDATA (Parsed Character Data) is used in element declarations for text content, which
the parser scans for markup and entity references.
 CDATA (Character Data) is used as an attribute type for plain string values.
Example:
<!ELEMENT title (#PCDATA)>
Element Structure in DTD:
DTDs define the structure of elements, their nesting, and the order of child elements.
Example:
<!ELEMENT address (name, email, phone, birthday)>
Attribute Declaration in DTD:
Attributes are declared with a separate <!ATTLIST> declaration that names the element they
belong to.
Example:
<!ELEMENT person (name)>
<!ATTLIST person age CDATA #REQUIRED>
Entity Declaration:
DTDs allow the declaration of entities for reuse and modularity.
Example:
<!ENTITY greeting "Hello, ">
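An entity declared in the DTD is referenced in the document as &greeting; and replaced by the parser. A minimal sketch, assuming Python's standard-library xml.etree.ElementTree module (which expands entities declared in the internal DTD subset); the message element is hypothetical.

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE message [
  <!ELEMENT message (#PCDATA)>
  <!ENTITY greeting "Hello, ">
]>
<message>&greeting;World</message>"""

# The parser substitutes the entity's replacement text for &greeting;.
message_text = ET.fromstring(doc).text
print(message_text)
```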
Validation Using DTD:
 XML parsers can use DTDs to validate XML documents.
 Validation ensures that the document adheres to the defined structure and rules.

Internal DTD:
 An internal DTD is declared within the XML document itself.
 The DTD declarations are placed within the <!DOCTYPE> declaration, which is part of the
XML document's prolog.
Example:
<?xml version="1.0"?>
<!DOCTYPE rootElement [
<!-- DTD declarations go here -->
]>
<rootElement>
<!-- XML content goes here -->
</rootElement>
In this example, the DTD declarations are enclosed within square brackets [] immediately
following the <!DOCTYPE> declaration.

Internal DTD Declaration Example: DTD declarations can be included inside the XML file using
the <!DOCTYPE> definition.
<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend</body>
</note>
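Python's standard-library parser reads an internal DTD but does not validate against it, so the sketch below is a hand-written check of the note content model (to, from, heading, body), not real DTD validation. The function name matches_note_model and the sample documents are assumptions made for illustration; a validating parser from a third-party library would be used in practice.

```python
import xml.etree.ElementTree as ET

DTD = """<!DOCTYPE note [
<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
"""
VALID_NOTE = DTD + ("<note><to>Tove</to><from>Jani</from>"
                    "<heading>Reminder</heading>"
                    "<body>Don't forget me this weekend</body></note>")
# Well-formed, but <heading> is missing, so it breaks the content model.
INVALID_NOTE = DTD + "<note><to>Tove</to><from>Jani</from><body>Hi</body></note>"

EXPECTED_CHILDREN = ["to", "from", "heading", "body"]


def matches_note_model(xml_text):
    """Manually check the order of children and that each holds only text."""
    root = ET.fromstring(xml_text)   # parses fine even if the DTD is violated
    children = list(root)
    order_ok = [child.tag for child in children] == EXPECTED_CHILDREN
    text_only = all(len(child) == 0 for child in children)
    return root.tag == "note" and order_ok and text_only


print(matches_note_model(VALID_NOTE))
print(matches_note_model(INVALID_NOTE))
```

The second document still parses without error, which shows the difference between a well-formed document and a valid one.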

The main difference between internal and external Document Type Definitions (DTDs) in XML
lies in where the DTD is located and how it is associated with the XML document.
External DTD:
 An external DTD is stored in a separate file, and the XML document references this
external file.
 The SYSTEM keyword is used in the <!DOCTYPE> declaration to specify the location of
the external DTD file.
Example:
<?xml version="1.0"?>
<!DOCTYPE rootElement SYSTEM "external.dtd">
<rootElement>
<!-- XML content goes here -->
</rootElement>
External DTD (external.dtd):
<!-- DTD declarations go here -->
In this example, the DTD declarations are stored in a separate file named "external.dtd," and
the XML document references it using SYSTEM "external.dtd".
External DTD Declaration Example: DTD declarations can also be stored in an external file and
referenced in the XML file.
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend</body>
</note>
The corresponding "note.dtd" file contains the DTD declarations.

Internal DTD vs. External DTD

Advantages of an internal DTD:
 Simplifies document distribution, as the DTD is contained within the XML file.
 Useful for small documents or cases where the DTD is specific to a particular XML file.
Advantages of an external DTD:
 Promotes modularization, as the DTD can be reused across multiple XML documents.
 Facilitates easier maintenance, as changes to the DTD only need to be made in one place
for multiple documents.
Considerations for an internal DTD:
 Suitable for standalone documents with a simple structure.
 May lead to redundancy if the same DTD needs to be used across multiple documents.
Considerations for an external DTD:
 Better suited for larger projects with multiple XML documents sharing the same structure.
 Requires an additional file, which may be a consideration for document distribution.
Write a DTD for the following XML code.
<?xml version="1.0"?>
<employee>
<firstname>Om</firstname>
<lastname>Mishra</lastname>
<email>Omshankar92229@gmail.com</email>
</employee>
DTD
<!DOCTYPE employee [
<!ELEMENT employee (firstname, lastname, email)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT email (#PCDATA)>
]>
 The <!DOCTYPE> declaration indicates the start of the Document Type Definition (DTD)
and associates it with the "employee" element.
 The <!ELEMENT> declarations define the structure of the "employee" element and its
child elements:
 "employee" must contain "firstname," "lastname," and "email."
 "firstname," "lastname," and "email" elements can contain parsed character data
(#PCDATA).

Schema
A schema in the context of XML refers to a formal specification that defines the structure,
content, and data types of XML documents. XML Schema Definition (XSD) is the most commonly
used schema language for XML. It provides a way to describe the elements and attributes that
can appear in an XML document and the relationships between them.
Purpose: The primary purpose of an XML schema is to define the rules and constraints that
XML documents must adhere to. It serves as a blueprint for the structure and content of valid
XML documents.
Language: XML Schema Definition (XSD) is the predominant schema language for XML. It is an
XML-based language itself and is used to define the structure and constraints of other XML
documents.
Structure: An XML schema consists of element declarations, attribute declarations, complex
types, simple types, and other constructs that collectively define the structure of XML
documents.
Element Declarations: Element declarations specify the structure of XML elements, including
their names, types, and whether they are required or optional.
Attribute Declarations: Attribute declarations define the attributes that can be associated
with XML elements, specifying their names, types, and constraints.
Complex Types and Simple Types: Complex types define the structure of elements that can
contain other elements and attributes. Simple types define the content of elements that
contain only text.
Namespaces: XML Schema supports the use of namespaces, allowing the definition of distinct
types and elements for different purposes.
Validation: XML documents can be validated against an XML schema to ensure that they
conform to the specified rules. Validation helps identify errors and ensures data consistency.
Example: Below is a simplified example of an XML schema for an address book:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="addressBook">
<xs:complexType>
<xs:sequence>
<xs:element name="contact" minOccurs="0" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="email" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
In this example, the schema defines an "addressBook" element containing "contact" elements
with "name" and "email" sub-elements.
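Because an XSD is itself an XML document, it can be read with any XML parser. The sketch below assumes Python's standard-library xml.etree.ElementTree module and only lists the element declarations in the address book schema; it does not validate anything, since the standard library has no XSD validator. The variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"   # expanded form of the xs: prefix

ADDRESS_BOOK_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="addressBook">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="contact" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="email" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema_root = ET.fromstring(ADDRESS_BOOK_XSD)

# Collect (name, type) for every xs:element declaration, in document order.
declared = [(el.get("name"), el.get("type"))
            for el in schema_root.iter(XS + "element")]
for name, type_name in declared:
    print(name, type_name)
```

Elements with an inline complex type (addressBook, contact) have no type attribute, so their type shows as None.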
Benefits:
 XML Schema provides a powerful and standardized way to define and validate XML
document structures.
 It supports data typing, ensuring that data adheres to specific types such as strings,
numbers, and dates.
 Schemas enable better data interoperability and communication between different
systems.

Elements
A Document Type Definition (DTD) defines the structure and legal elements of an XML
document. A DTD is written in its own declaration syntax rather than as XML elements, and it
sets the rules for the markup elements that can appear in an XML document. It specifies the
elements, attributes, entities, and their relationships, acting as a blueprint for creating valid
XML documents.
Element Declarations:
In DTD, you define elements using the <!ELEMENT> declaration. The syntax is as follows:
<!ELEMENT element_name content_model>
 element_name: The name of the XML element.
 content_model: Describes what can appear inside the element (e.g., child elements,
data content).
Example:
<!ELEMENT book (title, author)>
This DTD declaration states that a "book" element must contain both "title" and "author"
elements.
Attributes:
In Document Type Definitions (DTD), attributes are used to provide additional information
about elements in an XML document. Attributes help define characteristics or properties
associated with an element. DTD allows you to specify the attributes that elements may or
must have. Here's how you can declare attributes in DTD:
Attribute Declaration Syntax:
Attributes are declared in a separate <!ATTLIST> declaration that names the element they
belong to, using the following syntax:
<!ATTLIST elementName attributeName attributeType attributeDefaultValue>
elementName: The name of the XML element for which the attribute is being defined.
attributeName: The name of the attribute.
attributeType: The data type of the attribute's value (e.g., CDATA, ID, IDREF, etc.).
attributeDefaultValue: The default declaration for the attribute: #REQUIRED, #IMPLIED, #FIXED
followed by a quoted value, or a quoted default value on its own.
Example: Consider an XML document representing a library of books, where each book has
title, author, and year child elements plus genre and language attributes:
<!DOCTYPE library [
<!ELEMENT library (book+)>
<!ELEMENT book (title, author, year)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT year (#PCDATA)>

<!ATTLIST book
genre CDATA #IMPLIED
language CDATA #REQUIRED
>
]>
<library>
<book genre="Short Stories" language="English">
<title>Malgudi Days</title>
<author>R. K. Narayan</author>
<year>1943</year>
</book>
<!-- Other book elements go here -->
</library>

 The <!ATTLIST book ...> declaration specifies two attributes for the "book" element:
"genre" and "language."
 genre has the type CDATA and is optional (#IMPLIED).
 language has the type CDATA and is required (#REQUIRED).
This allows instances of the "book" element to have optional attributes like "genre" and
required attributes like "language."
Key Points about XML Attributes:
Name-Value Pairs:
 Attributes consist of a name and a corresponding value, separated by an equal sign (=).
 The value is enclosed in double or single quotes.
Position in Opening Tag: Attributes are specified within the opening tag of an element.
Quotation Marks: Attribute values must be enclosed in matching double quotes (") or single
quotes ('); double quotes are more common.
<element attribute="value">content</element>
Multiple Attributes: An element can have multiple attributes, each separated by a space.
<person gender="male" age="30">John</person>
Order of Attributes: The order of attributes within an opening tag does not matter; XML
parsers treat them as an unordered set.
Empty Elements: Attributes can also be used with empty elements (self-closing tags).
<empty-element attribute="value" />
Data Types: Attribute values are generally treated as strings, but they can represent different
data types based on the application's interpretation.
Use Cases:
 Attributes are often used to convey metadata, such as IDs, classes, or styles in XML
documents.
 They are useful for providing additional information about the content of an element.
Example with Multiple Attributes:
<product id="123" category="electronics" price="49.99">Smartphone</product>
In this example, the <product> element has three attributes: "id," "category," and "price,"
each providing additional information about the product.
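Since attribute values reach the application as strings, any conversion is up to the program. A minimal sketch, assuming Python's standard-library xml.etree.ElementTree module; the color attribute and its fallback value are hypothetical.

```python
import xml.etree.ElementTree as ET

product = ET.fromstring(
    '<product id="123" category="electronics" price="49.99">Smartphone</product>'
)

# All attributes as a dictionary of strings.
print(product.attrib)

# Values are strings, so numeric attributes are converted explicitly.
price = float(product.get("price"))

# get() accepts a fallback for attributes that are not present.
color = product.get("color", "unspecified")
print(price, color)
```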
XML Objects:
In XML processing, the Document Object Model (DOM) provides a way to represent and
manipulate XML documents as a tree structure of objects. Each XML element, attribute, and
piece of text is treated as an object within the DOM, and these objects can be accessed and
manipulated using programming languages like JavaScript, Java, Python, etc.
Here's a basic overview of XML DOM objects:
1. Document Object: The Document object represents the entire XML document. It serves as
the entry point to access and manipulate the content.
var xmlDoc = new DOMParser().parseFromString(xmlString, "text/xml");
In JavaScript, for example, you can create a new Document object by parsing an XML string
using the DOMParser.
2. Element Objects: Element objects represent XML elements in the document. They have
properties corresponding to attributes and methods to access child elements.
var rootElement = xmlDoc.documentElement;
In this example, xmlDoc.documentElement gives you the root Element of the XML document.
3. Attribute Objects: Attribute objects represent attributes of XML elements. They can be
accessed through the attributes property of an Element.
var attributeValue = rootElement.getAttribute("attributeName");
Here, getAttribute("attributeName") returns the value of the specified attribute as a string;
the attribute objects themselves are available through rootElement.attributes.
4. Text Node Objects: Text node objects represent the text content within XML elements.
var textContent = rootElement.firstChild.nodeValue;
In this example, rootElement.firstChild gives you the first child node (which might be a text
node), and nodeValue retrieves its content.
5. NodeList Objects: NodeList objects represent an ordered collection of nodes. They are often
returned by methods that access multiple nodes.
var nodeList = rootElement.getElementsByTagName("childElement");
In this example, getElementsByTagName returns a NodeList containing all elements with the
specified tag name.
Example (JavaScript):
// Creating a new XML document
var xmlString = '<root><item>Item 1</item><item>Item 2</item></root>';
var parser = new DOMParser();
var xmlDoc = parser.parseFromString(xmlString, "text/xml");
// Accessing elements and text content
var rootElement = xmlDoc.documentElement;
var items = xmlDoc.getElementsByTagName("item");
for (var i = 0; i < items.length; i++) {
console.log(items[i].textContent);
}
In this example, the XML string is parsed into a DOM, and then elements and text content are
accessed and printed.
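The same DOM objects exist outside the browser. As a rough equivalent of the JavaScript example, the sketch below assumes Python's standard-library xml.dom.minidom module, which follows the same DOM naming (documentElement, getElementsByTagName, firstChild, nodeValue).

```python
from xml.dom import minidom

xml_string = "<root><item>Item 1</item><item>Item 2</item></root>"
doc = minidom.parseString(xml_string)          # Document object

root_element = doc.documentElement             # Element object for <root>
items = doc.getElementsByTagName("item")       # NodeList of <item> elements

# Each <item> has one child: a text node holding its content.
item_texts = [item.firstChild.nodeValue for item in items]

print(root_element.nodeName)
print(len(items))
print(item_texts)
```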

XML Scripting
XML itself is not a scripting language; it is a markup language designed to store and transport
data. However, when people refer to "XML scripting," they might be talking about using
scripting languages to work with XML data. Common scripting languages for XML manipulation
include JavaScript, Python, and Ruby.
DOM Parser in XML:
A DOM parser is a software module or library that parses an XML document and builds a
corresponding DOM tree. It provides an interface to navigate and manipulate the XML
document using the DOM. The parsing process involves reading the XML document and creating
in-memory representations of its structure.
Here's a general process of using a DOM parser in XML:
Create a DOM Parser: In most programming languages, you would instantiate a DOM parser
object or use a library that provides DOM parsing capabilities.
Load XML Document: Use the parser to load the XML document. This can be from a file, a string,
or any other input source.
Build DOM Tree: The parser reads the XML document and constructs a tree of DOM objects
representing the document's structure.
Access and Manipulate: Use the DOM objects to navigate, access, and manipulate the XML
document. This involves traversing the tree, modifying elements, attributes, or text content, and
performing various operations.
Save or Serialize: Optionally, you can save or serialize the modified DOM tree back to an XML
document.
Example in JavaScript (using a browser environment):
Step 1: Create an XML Document
<!-- example.xml -->
<bookstore>
<book>
<title>Introduction to XML</title>
<author>John Doe</author>
<price>29.99</price>
</book>

<book>
<title>Data Structures in JavaScript</title>
<author>Jane Smith</author>
<price>39.95</price>
</book>
</bookstore>

Step 2: Use DOMParser to Parse the XML
var xmlString = `
<bookstore>
  <book>
    <title>Introduction to XML</title>
    <author>John Doe</author>
    <price>29.99</price>
  </book>
  <book>
    <title>Data Structures in JavaScript</title>
    <author>Jane Smith</author>
    <price>39.95</price>
  </book>
</bookstore>`;
var parser = new DOMParser();
var xmlDoc = parser.parseFromString(xmlString, "text/xml");
console.log(xmlDoc);
Step 3: Access Elements in the Parsed XML
var rootElement = xmlDoc.documentElement;
console.log("Root Element Name:", rootElement.nodeName);

var books = xmlDoc.getElementsByTagName("book");
for (var i = 0; i < books.length; i++) {
  var title = books[i].getElementsByTagName("title")[0].textContent;
  var author = books[i].getElementsByTagName("author")[0].textContent;
  var price = books[i].getElementsByTagName("price")[0].textContent;
  console.log(`Book ${i + 1}:`);
  console.log(` Title: ${title}`);
  console.log(` Author: ${author}`);
  console.log(` Price: $${price}`);
}

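Step 4: Update XML Content
// Change the text of the first <title> element in the document.
var firstTitleElement = xmlDoc.getElementsByTagName("title")[0];
firstTitleElement.textContent = "Updated Introduction to XML";

// Serialize the modified DOM tree back to an XML string.
console.log("Updated XML Content:");
console.log(new XMLSerializer().serializeToString(xmlDoc));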
In this example:
 We created a simple XML document with bookstore and book elements.
 We used the DOMParser to parse the XML string, creating a Document object (xmlDoc).
 We accessed elements within the Document object using methods like
getElementsByTagName.
 We printed information about each book in the console.
 We updated the text content of the first <title> element.
 Finally, we logged the updated XML content.
Using XML with Applications
Using XML with applications involves integrating XML as a data format for communication,
configuration, or storage within your software. XML's self-descriptive and extensible nature
makes it a versatile choice for representing structured data.
Data Exchange Between Applications:
 XML is often used as a format for exchanging data between different applications. The
sending application structures its data as XML, and the receiving application parses the
XML to extract the information.
 This is particularly common in web services, APIs, and other forms of data interchange
where multiple systems need to communicate.
Configuration Files:
 Many applications use XML for configuration files. XML allows developers to define a clear
structure for configuration settings, making it easy to update and maintain.
 For example, a web server might use an XML configuration file to specify various settings
such as server ports, directories, and security configurations.
Web Services and APIs:
 XML is widely used in web services and APIs to structure data in a standard and
interoperable way. SOAP (Simple Object Access Protocol) is an XML-based protocol for
exchanging structured information in web services.
 RESTful APIs may also offer XML as a data interchange format alongside JSON.
Database Interaction: XML can be used to represent data in a database-agnostic way. For
example, it can be employed to export or import data between databases or to represent
complex data structures within a database column.
Document Storage: XML is often used for storing documents or configuration data in a
standardized, machine-readable format. This is common in content management systems,
where data needs to be structured, searchable, and easily transportable.
Middleware Communication: In enterprise applications, XML is frequently used as a data
format for communication between middleware systems. This allows different components of
a larger system to exchange data in a standard way.
Custom Data Formats: Some applications use custom XML schemas to define their specific data
structures. This is especially prevalent when working with specialized domains or industries that
adopt XML for their specific data representation needs.
Data Transformation and Processing: XML can be used as an intermediate format for data
transformation and processing. Applications might convert data to XML for processing, and then
convert it back to another format.
Transforming XML using XSL and XSLT
Transforming XML using XSLT (Extensible Stylesheet Language Transformations) involves using
an XSLT stylesheet to define rules for how the XML should be transformed into another
structure, typically HTML, but it could be any text-based format.
Example:
The provided XML document contains information about students, and it is associated with an
XSLT stylesheet (Rule.xsl) to transform and display the data in an HTML table format.

XML Document (data.xml):


<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="Rule.xsl"?>
<student>
  <s>
    <name>Deepak Singh Sikarwar</name>
    <branch>CSE</branch>
    <age>18</age>
    <city>Agra</city>
  </s>
  <s>
    <name>Anamika Chauhan</name>
    <branch>CSE</branch>
    <age>20</age>
    <city>Shahjahanpur</city>
  </s>
  <s>
    <name>Shejal Agarwal</name>
    <branch>CSE</branch>
    <age>23</age>
    <city>Buland Shar</city>
  </s>
  <s>
    <name>Om Shankar Mishra</name>
    <branch>MCA</branch>
    <age>27</age>
    <city>Rewa</city>
  </s>
  <s>
    <name>Tammanna Bhatia</name>
    <branch>IT</branch>
    <age>25</age>
    <city>Indore</city>
  </s>
</student>

 The XML document contains student information, each enclosed in the <s> element.
 Each <s> element has child elements holding the student's name, branch, age, and city.
 The <?xml-stylesheet> processing instruction links this XML document to the XSLT
stylesheet (Rule.xsl).
XSLT Stylesheet (Rule.xsl):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
        <h1 align="center">Students' Basic Details</h1>
        <table border="3" align="center">
          <tr>
            <th>Name</th>
            <th>Branch</th>
            <th>Age</th>
            <th>City</th>
          </tr>
          <xsl:for-each select="student/s">
            <tr>
              <td><xsl:value-of select="name"/></td>
              <td><xsl:value-of select="branch"/></td>
              <td><xsl:value-of select="age"/></td>
              <td><xsl:value-of select="city"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>

 The XSLT stylesheet defines rules for transforming the XML document.
 The <xsl:template match="/"> rule matches the document root, so it runs once for the whole document.
 The transformed output is an HTML document with a centered heading and a table.
 The <xsl:for-each> loop iterates over each <s> element in the XML and creates a table
row for each student.
 <xsl:value-of> extracts the text of each child element (name, branch, age, city) and places it
in a table cell.
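For readers more comfortable with JavaScript, the loop in the stylesheet can be compared to
mapping over an array. The sketch below is only an analogy, not an XSLT processor: the students
array is a hand-copied subset of data.xml, and renderRows is a hypothetical helper. It also skips
HTML escaping, which is safe here only because the sample values contain no special characters.

```javascript
// Hand-written stand-in for two of the <s> elements in data.xml.
var students = [
  { name: "Deepak Singh Sikarwar", branch: "CSE", age: "18", city: "Agra" },
  { name: "Om Shankar Mishra", branch: "MCA", age: "27", city: "Rewa" }
];

// Plays the role of <xsl:for-each select="student/s">: one <tr> per student.
function renderRows(list) {
  return list
    .map(function (s) {
      // Each property lookup plays the role of one <xsl:value-of select="..."/>.
      return "<tr><td>" + s.name + "</td><td>" + s.branch + "</td><td>" +
        s.age + "</td><td>" + s.city + "</td></tr>";
    })
    .join("\n");
}

var rows = renderRows(students);
console.log(rows);
```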

Output:
Students’ Basic Details
Name                    Branch   Age   City
Deepak Singh Sikarwar   CSE      18    Agra
Anamika Chauhan         CSE      20    Shahjahanpur
Shejal Agarwal          CSE      23    Buland Shar
Om Shankar Mishra       MCA      27    Rewa
Tammanna Bhatia         IT       25    Indore
XPATH - Template Based Transformations:
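XPath (XML Path Language) is a query language used for navigating XML documents and selecting
nodes based on their structure, attributes, or content. XSLT uses XPath expressions in attributes
such as match and select to choose the nodes a template processes. XPath is also commonly used
in web scraping and in tools like Selenium to locate elements on a web page.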
Key Concepts:
Node:
 XPath treats every part of an XML document as a node. This includes elements, attributes,
text, comments, and more.
 Nodes are organized in a hierarchical tree structure.
Expression:
 An XPath expression is a string of text that specifies a set of nodes.
 It is used to navigate through elements and attributes in an XML document.
Path:
 A path is a sequence of steps separated by slashes (/), each step selecting nodes.
 Paths describe the hierarchy of nodes in an XML document.
XPath Syntax: XPath expressions use a combination of symbols and patterns to locate nodes. A
common general form is //tagname[@attribute='value'], whose parts are:
Symbol      Description
//          Selects matching nodes anywhere in the document, no matter where they are.
/           At the start of a path, selects from the root node; elsewhere, separates steps.
tagname     Name of the element to select.
@           Selects an attribute.
attribute   Name of the attribute to test.
value       Value the attribute must have.
Example: Consider the following XML document:
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="Math">
    <title lang="en">IIT Mathematics</title>
    <author>A Das Gupta</author>
  </book>
  <book category="Chemistry">
    <title lang="en">Inorganic chemistry for JEE</title>
    <author>V K Jaiswal</author>
  </book>
</bookstore>

To select the author element of the chemistry book using XPath:


/bookstore/book[@category='Chemistry']/author
Types of XPath:
Absolute XPath:
 Starts with the root element and includes the complete path.
 Less preferable, as changes in the structure may break the path.
Example:
/html[1]/body[1]/div[6]/div[1]/div[3]/div[1]/div[1]/div[1]/div[3]/ul[1]/li[2]/a[1]

Relative XPath:
 Starts with // and can search for the element anywhere in the XML. (This is the usage common
in Selenium; in the XPath specification, a relative path is any path that does not begin with /.)
 Preferred, as it is more flexible to changes in the structure.
Example:
//input[@id = 'fakebox-input']

XPath Functions:
contains(): Selects nodes whose specified attribute value contains the specified string.
Example:
//input[contains(@id, 'fakebox')]
starts-with(): Selects nodes whose specified attribute value starts with the specified string.
Example:
//input[starts-with(@id, 'fakebox')]
text(): Selects the text nodes of an element; compared with =, it finds elements whose text
exactly matches the specified string.
Example:
//div[text() = 'Search Google or type a URL']
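To get a feel for what these predicates accept, the sketch below mimics them with ordinary
JavaScript string methods. It does not evaluate XPath; the id values in the ids array are made up
for illustration, and the comparison with === stands in for an exact match such as the text()
example above.

```javascript
// Hypothetical id attribute values found on a page.
var ids = ["fakebox-input", "fakebox-text", "search-fakebox", "logo"];

// Mirrors contains(@id, 'fakebox'): the substring may appear anywhere in the value.
var containsMatches = ids.filter(function (id) {
  return id.includes("fakebox");
});

// Mirrors starts-with(@id, 'fakebox'): the value must begin with the substring.
var startsWithMatches = ids.filter(function (id) {
  return id.startsWith("fakebox");
});

// Mirrors an exact comparison with "=": the whole value must be equal.
var exactMatches = ids.filter(function (id) {
  return id === "fakebox-input";
});

console.log(containsMatches);
console.log(startsWithMatches);
console.log(exactMatches);
```

The contains-style check is the most permissive of the three, and the exact comparison is the
strictest.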
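AND and OR in XPath: The operators and and or (written in lowercase) combine two or more
conditions inside a predicate. or matches a node when at least one condition is true.
Example:
//input[@value='Log In' or @type='submit']
and is applied the same way but requires every condition to be true, for example
//input[@value='Log In' and @type='submit'].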

-Om Shankar Mishra
