Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 28

XML Validation

DTD
Sep-2009

© 2008 MindTree Consulting


Agenda

Introduction to XML Validation


DTD
XML Schema

Slide 2
XML Validation

© 2008 MindTree Consulting


An Introduction to XML Validation

One of the important innovations of XML is the ability to place


preconditions on the data the programs read, and to do this in a
simple declarative way.
XML allows you to say
that every Order element must contain exactly one Customer element,
that each Customer element must have an id attribute that contains an
XML name token,
that every ShipTo element must contain one or more Streets, one City,
one State, and one Zip, and so forth.

Checking an XML document against this list of conditions is called


validation.
Validation is an optional step but an important one.

Slide 4
Validation

There are many reasons and opportunities to validate an XML document:


When we receive one, before importing data into a legacy system
When we receive one, before importing data into a legacy system, when we have
produced or hand-edited one
To test the output of an application, etc.
Validation as “firewall”
to serve as actual firewalls when we receive documents from the external world
(as is commonly the case with Web Services and other XML communications),
to provide check points when we design processes as pipelines of transformations.
Validation can take place at several levels.
Structural validation
Data validation

Slide 5
Schema Languages

There is more than one language in which you can express such
validation conditions. Generically, these are called schema
languages, and the documents that list the constraints are called
schemas.
Different schema languages have different strengths and
weaknesses.
The document type definition (DTD) is the only schema language
built into most XML parsers and endorsed as a standard part of XML.
The W3C XML Schema Language (schemas for short, though it’s
hardly the only schema language) addresses several limitations of
DTDs.
Many other schema languages have been invented that can easily
be integrated with your systems.

Slide 6
Document Type Definition (DTD)

© 2008 MindTree Consulting


Document Type Definition (DTD)

XML 1.0 included a set of tools for defining XML document structures,
called Document Type Definitions (DTDs).
A DTD focuses on the element structure of a document. It says what
elements a document may contain, what each element may and must
contain in what order, and what attributes each element has. DTDs can be
used for:
defining reusable content (entities),
some kinds of metadata information (notations).
mechanisms for providing default values for attributes.
Document type definitions (DTDs) serve two general purposes.
They provide the syntax for describing/constraining the logical structure of
a document. (Element/attribute declarations are used for it)
They provide syntax for composing a logical document from physical
entities. (entity/notation declarations are used to accomplish it.)

Slide 8
DTD Declarations

DTDs contain
several types
of
declarations

DOCTYPE ENTITY NOTATION ELEMENT ATTLIST

Slide 9
<!DOCTYPE ... >

The DOCTYPE declaration is the container for all other DTD


declarations.
The document type declaration is placed in the instance
document’s prolog, after the XML declaration but before the root
element start-tag to associate the given document with a set of
declarations.
The name of the DOCTYPE must be the same as the name of the
document’s root element.
Example:

Slide 10
DOCTYPE Syntax

DOCTYPE may contain internal declarations (referred to as the


internal DTD subset ), may refer to declarations in external files
(referred to as the external DTD subset ), or may use a combination
of both techniques.

Slide 11
Internal Declarations

The simplest way to define a DTD is through internal declarations. In this case, all
declarations are simply placed between the open/close square brackets. The obvious
downside to this approach is that you can’t reuse the declarations across different
XML document instances.
 <!DOCTYPE name [
<!-- insert declarations here -->
]>

Example ; using internal declarations


 <!DOCTYPE person [
<!-- internal subset -->
<!ELEMENT person (name, age)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT age (#PCDATA)>
]>
<person>
<name>Billy Bob</name>
<age>33</age>
</person>

Slide 12
External Declarations

DOCTYPE can also contain a reference to an external resource


containing the declarations. This type of declaration is useful
because it allows you to reuse the declarations in multiple
document instances.
The DOCTYPE declaration references the external resource through
public and system identifiers.
<!DOCTYPE name PUBLIC "publicId" "systemId">
<!DOCTYPE name SYSTEM "systemId">
A system identifier is a URI that identifies the location of the
resource; a public identifier is a location-independent identifier.
Processors can use the public identifier to determine how to retrieve the
physical resource if necessary. The PUBLIC token identifies a public
identifier followed by a backup system identifier.

Slide 13
Using external declarations examples

Using external declarations (public Using external declarations (system


identifier) identifier)
<!-- person.dtd -->
<!-- person.dtd -->
<!ELEMENT person (name, age)>
<!ELEMENT person (name, age)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT age (#PCDATA)> <!ELEMENT name (#PCDATA)>
<!-- person.xml --> <!ELEMENT age (#PCDATA)>
<!DOCTYPE person PUBLIC <!-- person.xml -->
"uuid:d2d19398-4be3-4928-a0fc-
26d572a19f39" <!DOCTYPE person SYSTEM "person.dtd">
<person>
"http://www.develop.com/people/person
.dtd">
<name>Billy Bob</name>

<person> <age>33</age>
<name>Billy Bob</name> </person>
<age>33</age>
</person>
Slide 14
Internal and external declarations

A DOCTYPE declaration can also use both the internal and external declarations.
 This is useful when you’ve decided to use external declarations but you need to extend them
further or override certain external declarations.
 Note: only ENTITY and ATTLIST declarations may be overridden.
Example
<!-- person.dtd -->
<!ENTITY % person-content "name, age">
<!ELEMENT person (%person-content;)>
<!ELEMENT name (#PCDATA)>
<!-- person2.xml -->
<!ELEMENT age (#PCDATA)> <!DOCTYPE person SYSTEM "person.dtd" [
<!-- change person's content model -->
<!-- person1.xml --> <!ENTITY % person-content "age, name">
<!DOCTYPE person SYSTEM "person.dtd">
]>
<person>
<person>
<name>Billy Bob</name>
<age>33</age>
<age>33</age>
<name>Billy Bob</name>
</person>
</person>

Slide 15
<!ELEMENT name content-model>

An ELEMENT declaration defines an element of the specified name with the
specified content model. The content model defines the element’s allowed
children.
Content Model Basics

Syntax Description

ANY Any child is allowed within the element.

EMPTY No children are allowed within the element.

(#PCDATA) PCDATA stands for parsed character data and means


the element can contain text.
(child1,child2,...) Only the specified children in the order given are
allowed within the element.
(child1|child2|...) Only one of the specified children is allowed within
the element.
Slide 16
Occurrence Modifiers

Occurrence modifiers that can be used to control how many times a


particular child or group occurs in the content model.

Syntax Description
No modifier means the child or child group must appear exactly once at
the specified location (except in a choice content model).

* Annotated child or child group may appear zero or more times at the
specified location.

+ Annotated child or child group may appear one or more times at the
specified location.

? Annotated child or child group may appear zero or one time at the
specified location.

A mixed content model is a special declaration that allows a mixture of text


and child elements in any order. Mixed content models must use the
following syntax:<!ELEMENT name (#PCDATA | child1 | child2 | ...)*>
Slide 17
Elements - Examples
Element and text content models Mixed content model
<!-- person.dtd -->
<!ELEMENT person (name, age, children?)>
<!-- p.dtd -->
<!ELEMENT name (fname, (mi|mname)?, lname)?>
<!ELEMENT fname (#PCDATA)> <!ELEMENT p (#PCDATA | b | i)*>
<!ELEMENT lname (#PCDATA)>
<!ELEMENT b (#PCDATA)>
<!ELEMENT mi (#PCDATA)>
<!ELEMENT mname (#PCDATA)> <!ELEMENT i (#PCDATA)>
<!ELEMENT age (#PCDATA)> <!-- p.xml -->
<!ELEMENT children (person*)>
<!-- person.xml -->
<!DOCTYPE p SYSTEM "p.dtd">
<!DOCTYPE person SYSTEM "person.dtd"> <p>This <i>is</i> an <b>example</b> of <i>mixed</i>
<person>
<i>content</i><b>!</b></p>
<name>
<fname>Billy</fname>
<lname>Smith</lname>
</name>
<age>43</age>
<children>
<person>
<name/>
<age>0.1</age>
</person>
<person>
<name>
<fname>Jill</fname>
<mi>J</mi>
<lname>Smith</lname>
</name>
<age>21</age>
</person>
</children>
</person>
Slide 18
<!ATTLIST eName aName1 aType default
aName2 aType default ...>
Attribute types- Attribute types make it possible Default declarations - After the attribute type,
to constrain the attribute value in different ways. you must specify either a default value for the
See the following list of type identifiers for details. attribute or a keyword that specifies whether it is
required.

Type Description Declaration Description

CDATA Arbitrary character data “Value” Default value for attribute. If the
attribute is not explicitly used on
ID A name that is unique within the
the given element, it will still
document
exist in the logical document
IDREF A reference to an ID value in the with the specified default value.
document
ENTITY The name of an unparsed entity #REQUIRED Attribute is required on the given
declared in the DTD element.
ENTITIES A space-delimited list of ENTITY #IMPLIED Attribute is optional on the given
values element.
NMTOKEN A valid XML name (NMTOKEN is #FIXED Attribute always has the
essentially a word without spaces.) "value" specified fixed value.
NMTOKENS A space-delimited list of
NMTOKEN values

Slide 19
Attribute enumerations
<!ATTLIST eName aName (token1 | token2 | token3 | ...)>
<!ATTLIST eName aName NOTATION (token1 | token2 | token3 |
...)>
It’s also possible to define an attribute as an enumeration of tokens. The tokens may be of type NMTOKEN or NOTATION . In
either case, the attribute value must be one of the specified enumerated values.

Example - Using attribute enumerations


Example - Using attribute types
<!-- emp.dtd -->
<!-- emp.dtd -->
<!ELEMENT employees (employee*)> <!ELEMENT employee (address)>

<!ELEMENT employee (#PCDATA)> <!-- NMTOKEN enumeration -->


<!ATTLIST employee <!ATTLIST employee
name CDATA #REQUIRED title (president|vice-pres|secretary|sales)
species NMTOKEN #FIXED "human"
#REQUIRED>
id ID #REQUIRED
<!ELEMENT address (#PCDATA)>
mgr IDREF #IMPLIED
<!-- NOTATION enumeration -->
manage IDREFS #IMPLIED>
<!-- emp.xml --> <!ATTLIST address

<!DOCTYPE employees SYSTEM "emp.dtd"> format NOTATION (cs|lf) "cs">


<employees> <!NOTATION cs PUBLIC "urn:addresses:comma-separated">
<employee name="Billy Bob" id="e100" manage="e101 <!NOTATION lf PUBLIC "urn:addresses:line-breaks">
e102"/>
<!-- emp.xml -->
<employee name="Jesse Jim" id="e101" mgr="e100"/>
<employee name="Sarah Sas" id="e102" mgr="e100" <!DOCTYPE employee SYSTEM "emp.dtd">

manage="e103" species="human"/> <employee title='vice-pres'>


<employee name="Nikki Nak" id="e103" mgr="e102"/> <!-- notation informs consuming application how to
<employee name="Peter Pan" id="e104"/> process element content -->
</employees> <address format='cs'>1927 N 52 E, Layton, UT, 84041
</address>
</employee>
Slide 20
<!ENTITY ... >

Entities are the most atomic unit of information in XML. Entities are used
to construct logical XML documents (as well as DTDs) from physical
resources. There are several types of entities, each of which is declared
using an ENTITY declaration.

A given entity is either

General or parameter Internal or external Parsed or unparsed

•General Entity may only be referenced in an XML document (not the DTD).
•Parameter Entity may only be referenced in a DTD (not the XML document).

•Internal Entity value defined inline.


•External Entity value contained in an external resource.

•Parsed Entity value parsed by a processor as XML/DTD content.


•Unparsed Entity value not parsed by XML processor.
Slide 21
Entity Syntax

Note that unparsed entities


are always general and
external whereas
parameter/internal
entities are always
parsed.

Distinct Entity Types Syntax Description Entity References Syntax Description

<!ENTITY % name "value"> Internal &name; General


parameter
%name; Parameter
<!ENTITY % name SYSTEM External
"systemId"> parameter
Name is used as the value of Unparsed
<!ENTITY name "value"> Internal general an attribute of type ENTITY
<!ENTITY name SYSTEM External parsed or ENTITIES
"systemId"> general
<!ENTITY name SYSTEM Unparsed
"systemId" NDATA nname>
Slide 22
Internal parameter entities
<!ENTITY % name "value">

•It’s common to override parameter entities defined in the


external subset with declarations in the internal subset
Always parsed
•Parameter entities may not be referenced within other
declarations in internal subset but it can be in external subset

Referenced
(%name;) is within
Internal
replaced with ELEMENT,
parameter
the parsed ATTRIBUTE,
entities
content NOTATION,
ENTITY

Used to
parameterize
portions of the Example: Parameter entities in the internal subset
DTD
<!DOCTYPE person [
<!ELEMENT person (name)>
<!ENTITY % nameDecl "<!ELEMENT name (#PCDATA)>">
<!-- parameter entity expands to
complete declaration -->
%nameDecl;
]>
<person><name>Billy Bob</name></person>

Slide 23
External parameter entities
<!ENTITY % name PUBLIC "publicId" "systemId">
<!ENTITY % name SYSTEM "systemId">

External parameter entities are Example


<!-- person-decls.dtd -->
used to include declarations
<!ELEMENT person (name, age)>
from external resources. <!ELEMENT name (#PCDATA)>
External parameter entities are <!ELEMENT age (#PCDATA)>

always parsed. A reference to an <!-- person.xml -->


<!DOCTYPE person [
external parameter entity
<!ENTITY % decls SYSTEM "person-decls.dtd">
(%name;) is replaced with the %decls;
parsed content. ]>
<person>
This example uses an external
<name>Billy Bob</name>
parsed entity (decls) to include
<age>33</age>
the set of declarations that are
</person>
contained in person-decls.dtd.

Slide 24
Internal general entities
<!ENTITY name "value">

Internal general entities always contain The resulting logical document could
parsed XML content. The parsed content is
be serialized as follows:
placed in the logical XML document
everywhere it’s referenced (&name;). <person>

Example : Using internal general entities <name>


<fname>Billy</fname>
<!DOCTYPE person [ <lname>Smith</lname>
<!ENTITY n </name>
"<fname>Billy</fname><lname>Smith</lname>">
<!ENTITY a "<age>33</age>"> <age>33</age>
]> </person>
<person>
<name>&n;</name>
&a;
</person>

Slide 25
External general parsed entities and Unparsed entities

 External general parsed entities  Unparsed entities make it possible to attach


<!ENTITY name PUBLIC "publicId" "systemId"> arbitrary binary resources to an XML document.
<!ENTITY name SYSTEM "systemId">
 External general parsed entities are used the same  Unparsed entities are always general and
way as internal general entities except for the fact external.
that they aren’t defined inline. They always contain  Because unparsed entities can reference any
parsed XML content that becomes part of the logical
XML document wherever it’s referenced (&name;). binary resource, applications require additional
Example: information to determine the resource’s type.
The notation name (nname) provides exactly this
<!DOCTYPE person [ type of information
<!ENTITY n SYSTEM "name.xml">  Because unparsed entities don’t contain XML
<!ENTITY a SYSTEM "age.xml"> content, they aren’t referenced the same way as
]> other general entities (&name;), but rather
<person> through an attribute of type ENTITY/ENTITIES.
<name>&n;</name> <!DOCTYPE person [
&a; <!ELEMENT person (#PCDATA)>
</person> <!ATTLIST person photo ENTITY #REQUIRED>
<!ENTITY imgEntity SYSTEM "aaron.gif" NDATA pic>
 Unparsed entities
<!NOTATION pic PUBLIC "urn:mime:img/gif">
<!ENTITY name PUBLIC "publicId" "systemId" NDATA
nname> ]>
<!ENTITY name SYSTEM "systemId" NDATA nname> <person photo='imgEntity'>Aaron</person>

Slide 26
Questions

Slide 27
Thank you

XML Technology, Semester 4


SICSR Executive MBA(IT) @ MindTree, Bangalore, India

By Neeraj Singh (toneeraj(AT)gmail(DOT)com


)
Slide 28

You might also like