XML Documents Structure: Encoding

Encoding
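• XML (like Java) uses Unicode to represent characters.
• Unicode has several encoding forms, including UTF-8, UTF-16, and UTF-32. UTF-8 is the
most widely used. XML parsers assume UTF-8 (or UTF-16 with a byte order mark) when no
encoding is declared.
• UTF-8 is a variable-length code. Each character is encoded in 1, 2, 3, or 4 bytes.
• The first 128 characters in Unicode are ASCII. In UTF-8 each of them takes a single byte.
• The code points between 128 and 255 cover the more common accented characters used
in western Europe, such as ã, á, å, or ç. In UTF-8 each of these takes two bytes.
• Two-byte codes also cover other alphabets such as Greek, Cyrillic, Hebrew, and Arabic.
• Three-byte codes cover the rest of the Basic Multilingual Plane, including most Asian
ideographs.
• Four-byte codes handle the characters that are left, such as rarer ideographs.
• UTF-8 can encode every Unicode character, so it also works for non-western languages.
Text made up mostly of Asian characters is often more compact in UTF-16, which every
XML parser must also support.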
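The variable length of UTF-8 can be checked with a few lines of Python. This is only a small sketch; the helper name utf8_length and the four sample characters are chosen for illustration.

```python
def utf8_length(ch):
    """Return the number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))

samples = [
    ("A", "ASCII letter"),
    ("\u00e7", "c with cedilla, code point between 128 and 255"),
    ("\u4e2d", "common CJK ideograph"),
    ("\U0001F600", "character outside the Basic Multilingual Plane"),
]

for ch, description in samples:
    print(f"U+{ord(ch):04X} ({description}): {utf8_length(ch)} byte(s)")
```

The four samples take 1, 2, 3, and 4 bytes respectively.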
Well-Formed Documents
• An XML document is said to be well-formed if it follows all the syntax rules of XML, for
example a single root element and properly nested, properly closed tags.
• An XML parser is used to check that all the rules have been obeyed.
• Web browsers have shipped with XML parsers since Internet Explorer 5 and Netscape 7.
• Parsers are also available for free download over the Internet. One is Xerces, from the
Apache open-source project.
• Java 1.4 and later also bundle an open-source parser with the standard library, accessed
through the JAXP API.
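As a rough illustration of a parser checking well-formedness, the sketch below uses the parser in Python's standard library rather than Xerces. The function name is_well_formed is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

# Properly nested and closed tags.
print(is_well_formed("<Person><name>Pritesh</name></Person>"))
# The <name> element is never closed.
print(is_well_formed("<Person><name>Pritesh</Person>"))
# Two root elements instead of one.
print(is_well_formed("<Person/><Person/>"))
```

This only checks well-formedness. It does not validate the document against a DTD.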
XML Documents structure
An XML document consists of several parts. The two main parts are the document
information (the prolog), followed by the document body.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>

<!DOCTYPE Person SYSTEM "Person.dtd">

<?xml-stylesheet type="text/css" href="Styles.css"?>
<Person>
<name>Pritesh</name>
<age>22</age>
<name>Pooja</name>
<age>22</age>
</Person>
Part 1 : Prolog (optional)
1.1 XML Declaration :
1. The XML declaration is optional.
2. If present, it must be the very first thing in the document. Nothing, not even white
space or a comment, may come before it.
3. It identifies the document as XML.
4. It gives the XML version used to write the document.
5. It gives the character encoding used for the document.
6. standalone="yes" states that the document does not depend on declarations in an
external document such as an external DTD. The attribute is optional; use
standalone="no" when the document does depend on an external DTD.
7. The W3C recommends including the XML declaration.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
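To see how a parser reports the three parts of the declaration, here is a minimal sketch using the expat parser from Python's standard library. The function name read_declaration and the sample document are assumptions made for this example.

```python
import xml.parsers.expat

def read_declaration(xml_bytes):
    """Parse a document and return what its XML declaration says."""
    found = {}

    def on_declaration(version, encoding, standalone):
        # expat reports standalone as 1 for "yes", 0 for "no", -1 if omitted.
        found["version"] = version
        found["encoding"] = encoding
        found["standalone"] = standalone

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(xml_bytes, True)
    return found

sample = b'<?xml version="1.0" encoding="UTF-8" standalone="no"?>\n<Person/>'
print(read_declaration(sample))
```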
1.2 Document Type Definition (DTD)
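1. A Document Type Definition defines the legal structure of an XML document: its
elements, attributes, and entities.
2. The DTD is used when you validate your XML document.
3. A DTD can be internal (written inside the DOCTYPE declaration) or external (kept in a
separate file).
4. DTD rules tell which elements are allowed to nest inside other elements.
5. The DOCTYPE declaration names the root element, and the keyword SYSTEM must be
written in upper case.
<!DOCTYPE Person SYSTEM "Person.dtd">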

1.3 Comment
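1. Comments are an optional part of an XML document.
2. Comments in XML use the same syntax as in HTML: they begin with <!-- and end
with -->.
3. Content written inside a comment is not treated as part of the document's data. (The
comment text is not parsed as markup.)
4. Comments can appear almost anywhere in an XML document, but not before the XML
declaration and not inside a tag. The text of a comment must not contain two hyphens
in a row (--).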
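A short sketch of how a comment drops out of the parsed data, using Python's standard ElementTree parser, which discards comments by default (other parsers may hand them to the application). The function name child_tags and the sample document are assumptions for this example.

```python
import xml.etree.ElementTree as ET

SAMPLE = "<Person><!-- contact details --><name>Pritesh</name><age>22</age></Person>"

def child_tags(xml_text):
    """Return the tag names of the root element's children."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root]

# Only the two elements are listed; the comment is not part of the tree.
print(child_tags(SAMPLE))
```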
1.4 Styling and Processing Instruction
1. Processing instructions begin with <? and end with ?>
2. Processing instructions pass instructions to the application that processes the XML
document.
3. Processing instructions are processor dependent, so not all processors understand
all processing instructions.
<?xml-stylesheet type="text/css" href="Styles.css"?>
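The sketch below shows how an application might read a processing instruction, here with the minidom parser from Python's standard library. The function name processing_instructions is an assumption for this example; the style sheet instruction is the one used in these notes.

```python
from xml.dom import Node, minidom

SAMPLE = '<?xml-stylesheet type="text/css" href="Styles.css"?>\n<Person/>'

def processing_instructions(xml_text):
    """Return (target, data) pairs for instructions that precede or follow the root."""
    document = minidom.parseString(xml_text)
    return [
        (node.target, node.data)
        for node in document.childNodes
        if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
    ]

for target, data in processing_instructions(SAMPLE):
    print("target:", target)
    print("data:  ", data)
```

The parser only hands over the target and the data as text. What to do with them is left to the application.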
1.5 White Space
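1. White space consists of the space, tab, carriage return, and line feed characters.
2. White space between markup in the prolog, and between attributes inside a tag, does
not affect parsing. White space inside element content is passed on to the application.
3. You are free to use white space for layout, but not before the XML declaration or
inside a tag name.
4. The XML recommendation uses the UNIX convention for line endings: a single line feed
character (ASCII code 10) marks the end of a line.
5. The parser converts each carriage return + line feed pair, and each lone carriage
return, to a single line feed before passing text on, so documents written on any
platform behave the same.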
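The line-ending rule can be observed with a small experiment. This sketch uses Python's standard ElementTree parser; the function name element_text and the sample bytes are assumptions for this example.

```python
import xml.etree.ElementTree as ET

# One Windows-style line ending (CR LF) and one lone carriage return (CR).
SAMPLE = b"<lines>first\r\nsecond\rthird</lines>"

def element_text(xml_bytes):
    """Return the character data of the root element as the parser delivers it."""
    return ET.fromstring(xml_bytes).text

# repr() makes the line-ending characters visible.
print(repr(element_text(SAMPLE)))
```

Both line endings arrive in the application as a single line feed.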

XML Declaration                   <?xml version="1.0" encoding="UTF-8" standalone="no"?>
Document Type Definition (DTD)    <!DOCTYPE Person SYSTEM "Person.dtd">
Comment                           <!-- ... -->
Processing Instructions           <?xml-stylesheet type="text/css" href="Styles.css"?>
Elements & Content : XML Document Body
The document body is the mandatory part of an XML document.

<address>
<name>Alice Lee</name>
<email>alee@aol.com</email>
<phone>212-346-1234</phone>
<birthday>1985-03-22</birthday>
</address>
In the above example –
1. Root Node
<address>
2. Sub Nodes
<name> <email> <phone> <birthday>
are siblings of each other and Children nodes of <address>
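The parent and child relationships can be listed with a few lines of Python. This is a sketch using the standard ElementTree parser; the function name describe is an assumption for this example, and the data is the address record used in these notes.

```python
import xml.etree.ElementTree as ET

ADDRESS_XML = """<address>
<name>Alice Lee</name>
<email>alee@aol.com</email>
<phone>212-346-1234</phone>
<birthday>1985-03-22</birthday>
</address>"""

def describe(xml_text):
    """Return the root tag and a list of (tag, text) pairs for its children."""
    root = ET.fromstring(xml_text)
    children = [(child.tag, child.text) for child in root]
    return root.tag, children

root_tag, children = describe(ADDRESS_XML)
print("Root node:", root_tag)
for tag, text in children:
    print("  Child node:", tag, "=", text)
```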
Root Element :
1. Each XML document must have one and only one root element.
2. All other XML elements must be nested inside the root element.
3. The opening tag of the root element is the first tag of the document body.
4. The closing tag of the root element is the last tag of the document body.
Some Facts :
1. An XML document is organized as a tree structure.
2. XML can have user-defined tags.
3. An XML document can contain any number of nodes.
Elements & Content :
Root element opening tag        <Person>
Child elements and content        <Student>
                                    <Boy>
                                      <name>Pritesh</name>
                                      <marks>90</marks>
                                    </Boy>
                                    <Girl>
                                      <name>Pooja</name>
                                      <marks>89</marks>
                                    </Girl>
                                  </Student>
Root element closing tag        </Person>
