XML Documents Structure: Encoding
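• Unicode can be stored in several encodings. The most common one used in the West is UTF-8.
• UTF-8 is a variable-length encoding. Each character is encoded in 1, 2, 3, or 4 bytes.
• Code points 0 to 127 (plain ASCII) take one byte each.
• Code points between 128 and 255 cover the more common characters used in western
Europe, such as ã, á, å, or ç. In UTF-8 each of these takes two bytes; they are single
bytes only in ISO-8859-1 (Latin-1).
• Two-byte codes also cover other alphabets with code points up to 2047, such as Greek,
Cyrillic, Hebrew, and Arabic.
• Three-byte codes cover the rest of the Basic Multilingual Plane, including most Asian
ideographs.
• Four-byte codes handle any characters that are left, including the rarer ideographs.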
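The variable-length behaviour can be checked directly. The following is a minimal Python sketch; the sample characters and the helper name `utf8_length` are chosen here for illustration and are not part of the original notes.

```python
# Sample characters: ASCII, an accented Latin letter, the euro sign,
# a common CJK ideograph, and an emoji outside the Basic Multilingual Plane.
samples = ["A", "ç", "€", "中", "😀"]

def utf8_length(ch: str) -> int:
    """Return how many bytes the single character ch occupies in UTF-8."""
    return len(ch.encode("utf-8"))

for ch in samples:
    # Print the code point in U+XXXX form next to its encoded length.
    print(f"U+{ord(ch):04X} {ch!r}: {utf8_length(ch)} byte(s)")
```

Running it shows lengths of 1, 2, 3, 3, and 4 bytes for the five samples, in that order.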
Well-Formed Documents
• An XML parser is used to check that all the rules have been obeyed.
• Recent browsers such as Internet Explorer 5 and Netscape 7 come with XML parsers.
• Parsers are also available for free download over the Internet. One is Xerces, from the
Apache open-source project.
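As a rough illustration of what a parser's well-formedness check looks like in practice, here is a small Python sketch. It uses the `xml.etree.ElementTree` module from Python's standard library as a stand-in for a parser such as Xerces; the function name `is_well_formed` and the two sample strings are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text, False if it reports an error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised for any well-formedness violation, e.g. mismatched tags.
        return False

good = "<Person><name>Pritesh</name><age>22</age></Person>"
bad = "<Person><name>Pritesh</age></Person>"  # <name> is closed by </age>

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```

This only checks well-formedness; it does not validate the document against a DTD or schema.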
An XML document consists of several parts. The two main parts are the document
information, followed by the document body.
<Person>
<name>Pritesh</name>
<age>22</age>
<name>Pooja</name>
<age>22</age>
</Person>
Part 1 : Prolog (optional)
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
parsed by Parser)
<?xml-stylesheet type="text/css" href="Styles.css"?>
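To see how a parser treats these prolog lines, the sketch below feeds a small document to `xml.dom.minidom` from Python's standard library and lists the processing instructions it finds. The document body (`<address>` with a single `<name>`) and the helper name `processing_instructions` are assumptions for illustration only.

```python
from xml.dom import minidom, Node

# A tiny document: XML declaration, one processing instruction, one root element.
document_bytes = (
    b'<?xml version="1.0" encoding="UTF-8" standalone="no"?>'
    b'<?xml-stylesheet type="text/css" href="Styles.css"?>'
    b'<address><name>Alice Lee</name></address>'
)

doc = minidom.parseString(document_bytes)

def processing_instructions(document):
    """Return (target, data) pairs for the top-level processing instructions."""
    return [
        (node.target, node.data)
        for node in document.childNodes
        if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
    ]

print(processing_instructions(doc))
print(doc.documentElement.tagName)
```

The XML declaration itself is consumed by the parser and does not show up as a processing instruction node; only `xml-stylesheet` is listed.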
1. White space can be created using the space, carriage return, line feed, and tab characters.
5. It means that you should use a linefeed character only (ASCII code 10) to indicate
Processing Instructions
<?xml-stylesheet type="text/css" href="Styles.css"?>
<address>
<name>Alice Lee</name>
<email>alee@aol.com</email>
<phone>212-346-1234</phone>
<birthday>1985-03-22</birthday>
</address>
1. Root Node: <address>
2. Sub Nodes: <name>, <email>, <phone>, <birthday>
Root Element :
1. Each XML document must have one and only one root element.
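The single-root rule can be observed with a parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `describe_root` and the two-root sample string are made up for this example.

```python
import xml.etree.ElementTree as ET

def describe_root(xml_text: str):
    """Parse xml_text and return (root tag, list of direct child tags)."""
    root = ET.fromstring(xml_text)
    return root.tag, [child.tag for child in root]

address_doc = (
    "<address>"
    "<name>Alice Lee</name>"
    "<email>alee@aol.com</email>"
    "<phone>212-346-1234</phone>"
    "<birthday>1985-03-22</birthday>"
    "</address>"
)
print(describe_root(address_doc))

# Two top-level elements: the parser rejects the second one.
two_roots = "<address></address><address></address>"
try:
    describe_root(two_roots)
except ET.ParseError as err:
    print("not well-formed:", err)
```

The first call reports `address` as the root with its four sub nodes; the second input fails because content follows the end of the root element.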
Some Facts :