
EXtensible Markup Language (XML)
U. K. Roy
An HTML system

[Figure: an HTML document held by a web server reaches a web client (parser, formatter, interface) over the Internet]

©U.K.R., 2008
Role of HTML
 HTML
 Designed to display data
 Focuses on appearance
 Has a fixed set of predefined tags
 Ambiguity

Role of XML
 EXtensible Markup Language
 W3C recommendation, 1998
 Designed to structure, transport and store data
 Transformation and Dynamic data
customization
 Interoperable way to represent and process
documents (not necessarily on web)
 Self descriptive

Example
<note>
<from>Ani</from>
<to>John</to>
<heading>Reminder</heading>
<body>Return my book on Monday</body>
</note>
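
Because the tags describe the data, a program can pick the fields out by name. A minimal sketch using Python's standard xml.etree.ElementTree module; the helper name read_note is an assumption made for illustration:

```python
import xml.etree.ElementTree as ET

NOTE_XML = """<note>
<from>Ani</from>
<to>John</to>
<heading>Reminder</heading>
<body>Return my book on Monday</body>
</note>"""

def read_note(xml_text):
    """Return the children of the root element as a tag -> text dictionary."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}

note = read_note(NOTE_XML)
print(note["from"], "->", note["to"], ":", note["body"])
```

The dictionary keys come straight from the element names, which is what "self descriptive" means in practice.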

Another Example
<song>
<title>Requiem</title>
<composer>Mozart</composer>
</song>

Equivalent HTML code:

<p>Requiem is a song composed by Mozart</p>

Another Example
<question>
<text>What is the full form of XML?</text>
<A>eXtra Markup Language</A>
<B>eXtensible Markup Language</B>
<C>X-Markup Language</C>
<D>eXpandable Markup Language</D>
<answer value="B"/>
</question>

Another Example
<schedule>
<appointment>
<subject>Appointment w/Doc</subject>
<when>
<date day='20' month='03' year='2013'/>
<startTime hour='8' minute='30'/>
</when>
</appointment>
<appointment>
<subject>Lunch w/Boss</subject>
<when>
<date day='21' month='03' year='2013'/>
<startTime hour='13' minute='30'/>
</when>
</appointment>
</schedule>
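
Data held in attributes is read differently from data held in element content. The sketch below walks the schedule with xml.etree.ElementTree, with the attribute values written in straight quotes as XML requires; the function name list_appointments and the output format are assumptions for illustration:

```python
import xml.etree.ElementTree as ET

SCHEDULE_XML = """<schedule>
<appointment>
<subject>Appointment w/Doc</subject>
<when>
<date day='20' month='03' year='2013'/>
<startTime hour='8' minute='30'/>
</when>
</appointment>
<appointment>
<subject>Lunch w/Boss</subject>
<when>
<date day='21' month='03' year='2013'/>
<startTime hour='13' minute='30'/>
</when>
</appointment>
</schedule>"""

def list_appointments(xml_text):
    """Return (subject, date, time) tuples, one per <appointment>."""
    root = ET.fromstring(xml_text)
    result = []
    for appointment in root.findall("appointment"):
        date = appointment.find("when/date")        # empty element, data in attributes
        start = appointment.find("when/startTime")
        day = "-".join(date.get(key) for key in ("year", "month", "day"))
        time = f"{int(start.get('hour')):02d}:{start.get('minute')}"
        result.append((appointment.findtext("subject"), day, time))
    return result

appointments = list_appointments(SCHEDULE_XML)
for subject, day, time in appointments:
    print(day, time, subject)
```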

Another Example
<contact>
<person>
<name>B. S. Roy</name>
<number>9345654334</number>
</person>
<person>
<name>G. Mahapatra</name>
<number>9444554734</number>
</person>
</contact>

Role of XML
 Not a replacement of HTML
 XML focuses on what data are
 HTML focuses on how data look
 Tags are custom defined (not predefined)
 Functional meaning depends on application
 Everything must be marked up correctly

XML and Databases
 XML brings benefits of DBs to documents
 Schema to model information directly
 Formal validation, locking, versioning, rollback...
 But
 Not all traditional database concepts map
cleanly, because documents are fundamentally
different in some ways

XML Building blocks
 Element
 Delimited by angle brackets
 Identifies the nature of the content it surrounds
 General format: <element> … </element>
 Empty element: <empty-element/>
 Attribute
 Name-value pairs that occur inside start-tags after
element name, like:
<element attribute="value">
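
The same building blocks can be produced from code. A small sketch, assuming Python's xml.etree.ElementTree; the element name attachment is hypothetical and is there only to show an empty element:

```python
import xml.etree.ElementTree as ET

note = ET.Element("note", {"date": "12/11/2007"})   # element with one attribute
ET.SubElement(note, "to").text = "Ani"               # child element with text content
ET.SubElement(note, "from").text = "John"
ET.SubElement(note, "attachment")                    # empty element, no content

xml_text = ET.tostring(note, encoding="unicode")
print(xml_text)
```

ElementTree writes the empty element in its short form, which is equivalent to a start-tag immediately followed by an end-tag.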

XML Building blocks--Prolog
 The part of an XML document that precedes the
XML data
 Includes
 A declaration: version [, encoding, standalone]
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
 An optional DTD (Document Type Definition)
<!DOCTYPE greeting SYSTEM "hello.dtd">
 Processing instructions (optional)
<?xml-stylesheet href="simple.xsl" type="text/xsl"?>

XML Elements
 XML Elements are Extensible
 More and more elements may be added to carry more
information
 XML Elements have Relationships
 Elements are related as parent and children
 Elements have Content
 Elements can have different types of content:
 empty content
 simple content
 element content
 mixed content
 attributes
 XML elements must follow the naming rules

XML Elements naming rules
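 Names can contain letters, digits, and a few other characters: hyphens, underscores, periods, and colons.
 Names must start with a letter or an underscore; they cannot start with a digit, a hyphen, or a period.
 Names must not start with the string “xml”, “XML”, or “Xml” (or any other case variant); such names are reserved.
 Names cannot contain white space.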
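
These rules can be approximated with a regular expression. The sketch below is an ASCII-only simplification (the real XML 1.0 Name production also admits many non-ASCII letters), and the function name is_usable_element_name is an assumption for illustration:

```python
import re

# ASCII-only approximation of an XML name: a letter, underscore, or colon first,
# then letters, digits, underscores, colons, periods, or hyphens.
_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")

def is_usable_element_name(name):
    """Return True if name looks like a legal, non-reserved XML element name."""
    if not _NAME.fullmatch(name):
        return False
    # Names beginning with "xml" in any letter case are reserved.
    return not name.lower().startswith("xml")

for candidate in ("startTime", "first-name", "2ndChild", "my name", "xmlData"):
    print(candidate, is_usable_element_name(candidate))
```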

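Anatomy of an element

<p type="rule">Use a hyphen: &#173;.</p>

 Element type: p, named in both the start-tag and the end-tag
 Attribute: type="rule" (attribute name type, attribute value rule)
 Character (entity) reference: &#173;
 Start-tag: <p type="rule">
 Content: Use a hyphen: &#173;.
 End-tag: </p>
 Element: start-tag, content, and end-tag taken together

The Basic Rules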
 XML is case sensitive

<Msg>This is incorrect</msg>
<msg>This is correct</msg>

The Basic Rules
 All start tags must have end tags

//incorrect
<composer>Mozart

//correct
<composer>Mozart</composer>

The Basic Rules
 Empty Element

<BR></BR>
<BR/>
<img align="center" src="logo.gif"/>
<composer name="Mozart"></composer>
<composer name="Mozart"/>

The Basic Rules
 Elements must be properly nested

<b><i>This is incorrect nesting</b></i>

<b><i>This is correct nesting</i></b>

The Basic Rules
 The XML declaration, if present, must be the very first thing in the document

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>

The Basic Rules
 Every document must contain a root element

<root>
<child>
<subchild>.....</subchild>
</child>
</root>

The Basic Rules
 Attribute values must always be quoted, with either single or double quotes

<note date="12/11/2007">
<to>Ani</to>
<from>John</from>
</note>

The Basic Rules
 Certain characters are reserved for parsing

//incorrect
<message>if salary < 1000 then</message>

//correct
<message>if salary &lt; 1000 then</message>

Predefined entities

Entity    Character   Meaning
&lt;      <           less than
&gt;      >           greater than
&amp;     &           ampersand
&apos;    '           apostrophe
&quot;    "           quotation mark
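
When XML is generated by a program, the reserved characters are usually escaped by a library routine rather than by hand. A short sketch, assuming Python's standard xml.sax.saxutils module; its escape function covers &, < and > by default, and the quote entities are passed explicitly here:

```python
from xml.sax.saxutils import escape, unescape

text = "if salary < 1000 then"
escaped = escape(text)                       # replaces &, < and >
print(f"<message>{escaped}</message>")

# Quotes are not escaped by default; extra replacements can be supplied.
quote_entities = {'"': "&quot;", "'": "&apos;"}
escaped_quotes = escape("say \"hi\" & 'bye'", quote_entities)
print(escaped_quotes)

# unescape() reverses the default replacements.
restored = unescape(escaped)
```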

The Basic Rules
 With XML, white space in content is preserved (HTML collapses runs of white space)
 With XML, a new line is always stored as LF; parsers normalize CR/LF and CR to LF
 Comments in XML: <!-- This is a comment -->
 Can go almost anywhere (not inside tags)
 Schemas can contain comments, too
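
White-space handling can be observed directly. In this sketch, which assumes Python's xml.etree.ElementTree (built on the Expat parser), the runs of spaces survive parsing while the CR/LF pair in the input is reported as a single LF, as XML 1.0 end-of-line handling specifies:

```python
import xml.etree.ElementTree as ET

# The input contains three consecutive spaces and a Windows-style line break.
doc = "<body>Return   my book\r\non Monday</body>"
text = ET.fromstring(doc).text

print(repr(text))
```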

Common Errors for Element Naming
 Do not use white space when creating names
for elements
 Element names cannot begin with a digit,
although names can contain digits
 Only certain punctuation is allowed: periods, colons, hyphens, and underscores

XML Attributes
 Located in the start tag of elements
 Provide additional information about
elements
 Often provide information that is not a
part of data
 Must be enclosed in quotes

 Should I use an element or an attribute?
 A common guideline: metadata (data about data) should be stored as attributes, and the data itself should be stored as elements

Types of XML Documents
 XML document
 Well Formed XML.
 Syntax is correct
 Valid XML.
 Well formed
 Conforms to a DTD/Schema

Well-Formed XML
 Properties
 Documents must have a root element
 Elements must have a closing tag
 Elements must be properly nested
 Attribute values must be quoted
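
Any XML parser can act as a well-formedness checker, because it must reject a document that breaks these rules. A minimal sketch assuming Python's xml.etree.ElementTree; the helper name is_well_formed is an assumption, and the samples reuse the incorrect snippets from the Basic Rules slides:

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as a single well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

samples = [
    "<msg>This is correct</msg>",
    "<Msg>This is incorrect</msg>",             # case mismatch between tags
    "<composer>Mozart",                         # missing end tag
    "<b><i>This is incorrect nesting</b></i>",  # improper nesting
    "<note date=12/11/2007></note>",            # unquoted attribute value
    "<first/><second/>",                        # more than one root element
]
results = [is_well_formed(sample) for sample in samples]
for sample, ok in zip(samples, results):
    print(ok, sample)
```

This checks well-formedness only; validity against a DTD or schema needs a validating parser.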

 Advantage
 Avoids fixed nature like HTML
 Flexible
 Expandable

Valid XML
 Properties
 Well Formed
 Comply with the rules defined in a DTD/Schema

 Advantage
 Clear Understanding
 Data verification
 Interoperability
 Better document processing

XML Validation
[Figure: an XML document and an XML schema are passed to an XML parser, which produces either an optimized XML document or error messages]

 xmllint --valid sample.xml

Displaying XML
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="books.xsl"?>
<bookstore>
<book category="literature">
<title lang="beng">Sanchoita</title>
<author>Rabindranath Tagore</author>
<year>2009</year>
<price>200.00</price>
</book>

</bookstore>

Document Type Definition
 Allows developers to create a set of rules to
specify legal content and place restrictions on
an XML file
 Parser generates error, if XML document does
not follow the rules contained within DTD
 Including a DTD
 Using internal declaration
 Using external file
 Both

Internal (standalone) DTD
 Uses DOCTYPE declaration

<!DOCTYPE greeting [
<!ELEMENT greeting (#PCDATA)>
]>
<greeting>Hello, world!</greeting>

 Specify standalone="yes" in the XML declaration

<?xml version="1.0" standalone="yes"?>

External DTD
 Most common
 Use DOCTYPE declaration before root element

<!DOCTYPE greeting SYSTEM "hello.dtd">
<greeting>Hello, world!</greeting>

External plus Internal DTD
 Usually to declare entities
 Use DOCTYPE declaration before root element

<!DOCTYPE greeting SYSTEM "hello.dtd" [
<!ENTITY excl "&#x21;">
]>
<greeting>Hello, world&excl;</greeting>

DTD – XML Building Blocks
 XML documents consist of following blocks
 Elements
 Attributes
 Entities
 &lt; &gt; &amp; &quot; &apos;
 PCDATA
 Parsed Character DATA
 Entities will be expanded
 CDATA
 Character DATA
 Entities will not be expanded

Declaring empty Elements
<!ELEMENT elementName EMPTY>

Example
<!ELEMENT br EMPTY>
<!ELEMENT Bool EMPTY>

Usage:
<br/>
<Bool Value="True"></Bool>

Elements with text data
<!ELEMENT elementName (#PCDATA)>

Example
<!ELEMENT from (#PCDATA)>
<!ELEMENT question (#PCDATA)>
<!ELEMENT email (#PCDATA)>

Elements with text data: Usage
<from>U. K. Roy</from>

<question>
What does DTD stand for?
</question>

<email>
u_roy@it.jusl.ac.in
</email>

Elements with arbitrary content
<!ELEMENT elementName ANY>

Example
<!ELEMENT tutorial ANY>
<!ELEMENT employee ANY>
<!ELEMENT book ANY>

Usage
<tutorial>Hello <p>World!</p></tutorial>

<employee>
<fname>Arnab</fname>
<lname>Mitra</lname>
</employee>

<book>
<title>Web Technologies</title>
<publisher>
Oxford University Press
</publisher>
</book>

DTD Declarations
Example : Elements with Data
<!ELEMENT Month (#PCDATA)>

Valid Usage
<Month>April</Month>
<Month>This is a month</Month>

Invalid Usage:
<Month> <!-- Invalid usage within XML file, can't have children! -->
<January>Jan</January>
<March>March</March>
</Month>

Declaring Elements
Element with Children (sequential)
<!ELEMENT elementName (child1, child2, …)>

Example
<!ELEMENT message (from, to, body)>
<!ELEMENT address (street, city, zip)>

Declaring Elements
Inner elements must also be declared
<!ELEMENT message (from, to, body)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT body (#PCDATA)>

<!ELEMENT address (street, city, zip)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT zip (#PCDATA)>

Declaring Elements
Usage:
<?xml version="1.0"?>
<!DOCTYPE message SYSTEM "message.dtd">
<message>
<from>tom@it.jusl.ac.in</from>
<to>jerry@rediffmail.com</to>
<body>Learn DTD from
www.w3schools.com</body>
</message>

Declaring Elements
Usage:
<?xml version="1.0"?>
<!DOCTYPE address SYSTEM "address.dtd">
<address>
<street>S. C. Mallick Road</street>
<city>Kolkata</city>
<zip>700032</zip>
</address>

Using internal DTD
<?xml version="1.0"?>
<!DOCTYPE message [
<!ELEMENT message (from, to, body)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<message>
<from>tom@it.jusl.ac.in</from>
<to>jerry@rediffmail.com</to>
<body>Learn DTD from w3schools.com</body>
</message>

DTD Declarations
<?xml version="1.0"?>
<!DOCTYPE House [
<!ELEMENT House (address)>
<!ELEMENT address (person, street, city, zip)>
<!ELEMENT person (#PCDATA)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT zip (#PCDATA)>
]>
<House>
<address>
<person>John Doe</person>
<street>1234 Preston Ave.</street>
<city>Charlottesville, Va</city>
<zip>22903</zip>
</address>
</House>

Declaring Elements
Occurrence Indicators
Symbol   Meaning                  Example
,        Sequence                 a, b, c
|        Choice                   a|b|c
+        One or more              a+
*        Zero or more             a*
?        Zero or one (optional)   a?
()       Grouping                 (a)

Declaring Elements
Example
<!ELEMENT Book (Front, Chapter+, Back?)>
<!ELEMENT Front (Title, Author+, Publisher?)>
<!ELEMENT Chapter (Name, Content)>
<!ELEMENT Back (ISBN)>

<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
<!ELEMENT Publisher (#PCDATA)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Content (#PCDATA)>
<!ELEMENT ISBN (#PCDATA)>

Examples
<!ELEMENT a EMPTY>
<!ELEMENT b ANY>
<!ELEMENT either (one | theother)>
<!ELEMENT ordered (first, second)>
<!ELEMENT list (item+)>
<!ELEMENT dl ((dt?, dd?)*)>
<!ELEMENT text (#PCDATA)>
<!ELEMENT mixed (#PCDATA | b | i | em)*>
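
A content model behaves much like a regular expression over the names of an element's children. The sketch below illustrates that idea for the model (Front, Chapter+, Back?); the regular expression is translated by hand, the helper name children_match is an assumption, and this is not how a validating parser is actually implemented:

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of <!ELEMENT Book (Front, Chapter+, Back?)>.
# Each child name is followed by one space so that names cannot run together.
BOOK_MODEL = re.compile(r"Front (Chapter )+(Back )?")

def children_match(element, model):
    """Return True if the sequence of child tags matches the content model."""
    names = "".join(child.tag + " " for child in element)
    return model.fullmatch(names) is not None

good = ET.fromstring("<Book><Front/><Chapter/><Chapter/><Back/></Book>")
no_back = ET.fromstring("<Book><Front/><Chapter/></Book>")
wrong_order = ET.fromstring("<Book><Chapter/><Front/></Book>")
no_chapter = ET.fromstring("<Book><Front/><Back/></Book>")

for book in (good, no_back, wrong_order, no_chapter):
    print(children_match(book, BOOK_MODEL))
```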

Cautions concerning DTDs
 All element declarations begin with <!ELEMENT and end with >
 The ELEMENT declaration is case sensitive
 Elements declared with the #PCDATA content
model can not have children
 When describing sequences, the XML
document must contain exactly those elements
in exactly that order.

Declaring Attributes
General Syntax
<!ATTLIST elementName attributeName attributeType default>
Example
<!ATTLIST HDD speed CDATA "7200">
<!ATTLIST HDD unit CDATA #IMPLIED>
<!ATTLIST price currency CDATA "INR">
<!ATTLIST question number ID #REQUIRED>

<HDD speed="6000"> … </HDD>
<price currency="USD">10</price>
<question number="q1"> … </question>

Declaring Attributes
The attribute-type can be one of the following:
Type           Description
CDATA          The value is character data
(en1|en2|..)   The value must be one from an enumerated list
ID             The value is a unique id (it must be a valid XML name, so it cannot start with a digit)
IDREF          The value is the id of another element
IDREFS         The value is a list of other ids
NMTOKEN        The value is a valid XML name token
NMTOKENS       The value is a list of valid XML name tokens
ENTITY         The value is the name of an unparsed entity
ENTITIES       The value is a list of entity names
NOTATION       The value is a name of a notation
xml:           Prefix of predefined attributes such as xml:lang and xml:space

Declaring Attributes
The default-value can be one of the following:

Value          Explanation
value          The default value of the attribute
#REQUIRED      The attribute is required
#IMPLIED       The attribute is not required
#FIXED value   The attribute value is fixed

Examples

<!ATTLIST termdef
id ID #REQUIRED
name CDATA #IMPLIED>

<!ATTLIST list
type (bullets|ordered|glossary) "ordered">

<!ATTLIST form
method CDATA #FIXED "POST">

Examples
<!ELEMENT square EMPTY>
<!ATTLIST square width CDATA "0">

Valid XML:
<square width="100" />
<square/>
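
A parser that reads the DTD fills in a declared default when the attribute is omitted. A sketch assuming Python's xml.etree.ElementTree, whose underlying Expat parser applies defaults declared in the internal DTD subset; the wrapper element shapes is hypothetical and only gives the two squares a common root:

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE shapes [
<!ELEMENT shapes (square*)>
<!ELEMENT square EMPTY>
<!ATTLIST square width CDATA "0">
]>
<shapes><square width="100"/><square/></shapes>"""

# The second square has no width in the document; the default comes from the DTD.
widths = [square.get("width") for square in ET.fromstring(DOC)]
print(widths)
```

ElementTree does not validate, so it applies the default but would not report a document that breaks the declared content model.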

Examples--#REQUIRED
DTD:
<!ATTLIST person number CDATA #REQUIRED>

Valid XML:
<person number="9876556789" />

Invalid XML:
<person />

Examples--#IMPLIED
DTD:
<!ATTLIST contact fax CDATA #IMPLIED>

Valid XML:
<contact fax="555-667788" />
<contact />

Examples--#FIXED
DTD:
<!ATTLIST sender company CDATA #FIXED "Microsoft">

Valid XML:
<sender company="Microsoft" />

Invalid XML:
<sender company="W3Schools" />

Examples--Enumerated
DTD:
<!ATTLIST payment type (cheque|cash) "cash">

XML example:
<payment type="cheque" />
<payment type="cash" />
<payment />

Elements or Attributes?
 attributes cannot contain multiple values (child
elements can)
 attributes are not easily expandable (for future
changes)
 attributes cannot describe structures (child
elements can)
 attributes are more difficult to manipulate by
program code
 attribute values are not easy to test against a DTD

Declaring Entities
General Syntax
<!ENTITY entityName "entityValue">
Example
<!ENTITY euro "&#x20AC;"> <!-- € -->
<!ENTITY language "XML">
<!ENTITY W3C "World Wide Web Consortium">
<!ENTITY copyright "&#x00A9;"> <!-- © -->
<!ENTITY USD SYSTEM "currency.dtd">

<tutorial>
&language; is standardized by &W3C;
&copyright; UKR
</tutorial>
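
Entity references are replaced by their declared values when the document is parsed. A sketch assuming Python's xml.etree.ElementTree, which expands internal entities declared in the internal DTD subset; the external entity USD is left out because this parser does not fetch external files, and the tutorial text is put on one line for brevity:

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE tutorial [
<!ENTITY language "XML">
<!ENTITY W3C "World Wide Web Consortium">
<!ENTITY copyright "&#x00A9;">
]>
<tutorial>&language; is standardized by &W3C; &copyright; UKR</tutorial>"""

# Each &name; reference is replaced by the entity's value during parsing.
text = ET.fromstring(DOC).text
print(text)
```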

Questions?
