01 - XML - Introduction To XML
<course startdate="February 06, 2006">
  <title>eXtensible Markup Language</title>
  <lecturer>Phan Vo Minh Thang</lecturer>
</course>
XML stands for eXtensible Markup Language. It is fast becoming the new Internet standard for information exchange. For complex information reuse, XML is the technology of choice.
Markup Language
What's a markup language?
Markup is a collection of characters that group, organize, and label the pieces of content in a document.
In XML, markup is primarily meta content: information about information.
A markup language is a set of rules for representing data and encoding structures.
Types of markup
Procedural markup specifies what to do with the data, that is, how to process it.
Descriptive (declarative) markup specifies what the data is.
XML is designed to describe data and to focus on what the data is, so its markup is primarily descriptive or declarative.
HTML
<html>
  <body>
    <p>333 MHz Pentium II with 256K internal cache, 512K external cache, 32MB standard RAM, 512MB max. RAM</p>
  </body>
</html>
XML
<pcinfo>
  <processor>
    <type>Pentium II</type>
    <speed>333</speed>
    <intcache>256</intcache>
  </processor>
  <extcache>512</extcache>
  <ram>
    <standard>32</standard>
    <max>512</max>
  </ram>
</pcinfo>
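The difference between the two versions shows up as soon as a program needs one of the values. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module; the function name `read_pc_info` and the dictionary keys are illustrative choices, not part of the course material.

```python
import xml.etree.ElementTree as ET

# The <pcinfo> document from the slide, with a correctly spelled </extcache> end tag.
PCINFO_XML = """
<pcinfo>
  <processor>
    <type>Pentium II</type>
    <speed>333</speed>
    <intcache>256</intcache>
  </processor>
  <extcache>512</extcache>
  <ram>
    <standard>32</standard>
    <max>512</max>
  </ram>
</pcinfo>
"""

def read_pc_info(xml_text):
    """Return a few fields of the machine description as typed values."""
    root = ET.fromstring(xml_text)
    return {
        "processor": root.findtext("processor/type"),
        "speed_mhz": int(root.findtext("processor/speed")),
        "max_ram_mb": int(root.findtext("ram/max")),
    }

info = read_pc_info(PCINFO_XML)
print(info)
```

With the HTML version, the same values would have to be picked out of a free-text sentence, since the `<p>` tag says nothing about what the text means.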
eXtensible Markup Language
Alphabet Soup
What's with all the different markup languages?
SGML HTML XML XHTML
Disadvantages of SGML
SGML is comprehensive, but it is very complex and difficult to learn and apply. SGML tools have traditionally been very expensive. SGML has been primarily a technology for print publishing.
SGML
What is XML?
XML is NOT a set of tags that you can apply to documents
XML is a specification that sets rules for creating tag sets that you can apply to documents (hence "eXtensible"). XML does not define the tag names; you do.
XML is NOT a programming language like C++. XML is NOT a network transport protocol like HTTP or FTP. XML is NOT a database.
A database can contain XML data, but the database itself is not an XML document. You can store XML data in a database or retrieve XML data from a database, but to do so you need to run software written in a real programming language such as C or Java.
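To illustrate the last point, here is a small sketch of a program that stores an XML document in a database and reads it back. It assumes Python with the standard `sqlite3` and `xml.etree.ElementTree` modules; the table name `documents`, the column `body`, and the function `store_and_load` are hypothetical names chosen for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

def store_and_load(xml_text):
    """Store an XML document as text in a database, then read it back and parse it."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute("INSERT INTO documents (body) VALUES (?)", (xml_text,))
    (body,) = conn.execute("SELECT body FROM documents WHERE id = 1").fetchone()
    conn.close()
    return ET.fromstring(body)

course = store_and_load("<course><title>eXtensible Markup Language</title></course>")
print(course.findtext("title"))
```

The database only holds the text; the storing, retrieving, and parsing are all done by the surrounding program.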
16 Lecturer: Phan Vo Minh Thang MSc.
It will be easy to write programs that process XML documents. XML documents will be easy to create, readable without specialized tools, and reasonably clear.
XML is a plain-text format, not a binary one; documents are Unicode text, with UTF-8 or UTF-16 as the default encodings.
transport
Web client: removes the platform dependency.
Java: lets you write code that is not specific to an OS.
XML: makes data independent of the software that processes it.
XML is a World Wide Web Consortium (W3C) standard and has overwhelming vendor support.
XML Specification
The XML 1.0 specification was released by the W3C on February 10, 1998, as a Recommendation, the highest level of endorsement the W3C gives.
Over HTML
Interchangeable, reusable, searchable; enables automation.
Better support for multimedia and wireless devices.
XHTML 1.0 became an official W3C Recommendation on January 26, 2000.
Using XHTML
Some don't recommend XHTML because:
You have all the pain of XML without any of the gain.
Markup is still structural and presentational.
Exercise A
1. Metadata is information about information. TRUE
2. XML is a programming language. FALSE
3. XML assumes that a document is structured hierarchically. TRUE
4. XML is platform and language independent. TRUE
5. You can use any text editor to create XML documents. TRUE
6. XHTML lets you define your own tags. FALSE
Exercise A (continued)
7. XML content can be in any language of the world. TRUE
8. Internet Explorer 5 and Netscape 6 support XML. TRUE
9. XML is about adding procedural markup code to documents. FALSE
10. XML provides 50% of the capabilities of SGML. FALSE
11. XML is a good solution to achieve data independence. TRUE
12. A parser checks to see if your markup tags are spelled correctly. FALSE
A Look at XML
No Predefined Tags
<price currency="usd">499.00</price>
<toc xlink:href="/pineapple">Pineapplesoft</toc>
<P> The Price is $499.00 <TABLE> <TR> <TD> <A HREF=/pineapple><B>Pineapplesoft</B></A> </TD> <TR> </TABLE>
Issues
How to look at XML documents
WWW browsers?
How to map XML to HTML?
Stricter
HTML has a very forgiving syntax
This makes it difficult to develop software (such as browsers) to process HTML, and the resulting software is large.
Document Structure
Dear John, Hello .. This is the document for you! Harry April 12, 2000
Benefits of XML
XML is a text-based format that lets developers describe, deliver, and exchange structured data between a range of applications, and deliver it to clients for local display and manipulation.
Information becomes more accessible and reusable.
XML brings power and flexibility to Web-based applications for exchanging structured data.
XML offers the tantalizing possibility of truly cross-platform, long-term data formats.
[Figure: XML technology overview. Labels: Structure (DTDs, Schema Language), Processing (XML Parser), Display (XSL).]
Extensible Stylesheet Language (XSL) - used
Document Object Model (DOM)
The DOM is an application programming interface (API) for HTML and XML documents. It allows content to be accessed and manipulated even after it has become part of an HTML or XML document. It is published as a Recommendation by the W3C.
The DOM provides a common way of accessing data structures from structured documents
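As a rough illustration of accessing and manipulating a document through the DOM, the sketch below uses `xml.dom.minidom` from the Python standard library, which implements a subset of the W3C DOM. The `<catalog>` document and its contents are made up for the example.

```python
from xml.dom import minidom

doc = minidom.parseString("<catalog><item>XML Editor</item></catalog>")

# Access: reach into the parsed tree through the DOM interface.
item = doc.getElementsByTagName("item")[0]
original_name = item.firstChild.data

# Manipulate: add an attribute and a new element after parsing.
item.setAttribute("price", "499.00")
new_item = doc.createElement("item")
new_item.appendChild(doc.createTextNode("XML Parser"))
doc.documentElement.appendChild(new_item)

result = doc.documentElement.toxml()
print(result)
```

The method names (`getElementsByTagName`, `createElement`, `appendChild`, `setAttribute`) come from the DOM specification, which is why the same code shape carries over to DOM implementations in other languages.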
Opens the door for XML as the lingua franca of data interchange on the Internet
XML Software
XML Browser
View and print XML documents. Example: Microsoft Internet Explorer.
XML Editor
Example: Microsoft XML Notepad.
XML Parser
Shields programmers from the XML syntax. Example: IBM's XML for Java.
XSL Processor
Transforms XML to HTML. Example: LotusXSL.
Applications of XML
Document-oriented applications
Intended for human consumption. Example: document publishing.
Data-oriented applications
Intended for software consumption. Example: data exchange.
Data Exchange in Digital Libraries
Document Applications
XML concentrates on the document structure, which makes it independent of the delivery medium. It is possible to edit and maintain documents in XML and automatically publish them on different media.
Many publications are available both online and in print. Web sites are often optimized for specific viewers, with several versions: one generic and one optimized for some users. Done manually, this is very costly.
It makes sense to maintain a common version of the documentation in a media-independent format (XML) and to convert it automatically into publishing formats such as HTML, PostScript, RTF, and PDF.
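The idea of one media-independent source with several generated outputs can be sketched as follows. This is plain Python standing in for a real publishing pipeline (it is not XSL); the `<manual>` document and the functions `to_html` and `to_text` are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# A single media-independent source document (made up for this example).
MANUAL_XML = """
<manual>
  <title>Installing the Editor</title>
  <step>Download the package.</step>
  <step>Run the installer.</step>
</manual>
"""

def to_html(root):
    """Render the manual as an HTML fragment for the web."""
    items = "".join(f"<li>{escape(s.text)}</li>" for s in root.iter("step"))
    return f"<h1>{escape(root.findtext('title'))}</h1><ol>{items}</ol>"

def to_text(root):
    """Render the same manual as plain text, e.g. as input to a print pipeline."""
    lines = [root.findtext("title").upper()]
    lines += [f"{n}. {s.text}" for n, s in enumerate(root.iter("step"), start=1)]
    return "\n".join(lines)

manual = ET.fromstring(MANUAL_XML)
html_version = to_html(manual)
text_version = to_text(manual)
print(html_version)
print(text_version)
```

Editing a step in the XML source changes both outputs on the next run, which is the saving over maintaining each version by hand.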
Document Applications
Write XML
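/* Pseudocode: C-style output mixed with embedded SQL; the Record block repeats for each row */
printf("<?xml version=\"1.0\"?>");
printf("<Records>");
SELECT IDENTIFIER, NAME, PRICE FROM TABLE1 INTO :IDENTIFIER, :NAME, :PRICE
printf("<Record>");
printf("<Identifier> :IDENTIFIER </Identifier> <Name> :NAME </Name> <Price> :PRICE </Price>");
printf("</Record>");
printf("</Records>");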
XML Result
<?xml version="1.0"?>
<Records>
  <Record>
    <Identifier>p1</Identifier>
    <Name>XML Editor</Name>
    <Price>$499.00</Price>
  </Record>
  <Record>
  </Record>
</Records>
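The same export can be written as a runnable program. The sketch below is one possible version, assuming Python with `sqlite3` and `xml.etree.ElementTree`; the table and column names follow the slide, while the in-memory database and the single sample row are stand-ins for real data.

```python
import sqlite3
import xml.etree.ElementTree as ET

def records_to_xml(conn):
    """Build a <Records> document from the rows of TABLE1."""
    root = ET.Element("Records")
    query = "SELECT IDENTIFIER, NAME, PRICE FROM TABLE1"
    for identifier, name, price in conn.execute(query):
        record = ET.SubElement(root, "Record")
        ET.SubElement(record, "Identifier").text = identifier
        ET.SubElement(record, "Name").text = name
        ET.SubElement(record, "Price").text = price
    return ET.tostring(root, encoding="unicode")

# Hypothetical sample data standing in for the real table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLE1 (IDENTIFIER TEXT, NAME TEXT, PRICE TEXT)")
conn.execute("INSERT INTO TABLE1 VALUES ('p1', 'XML Editor', '$499.00')")
xml_output = records_to_xml(conn)
conn.close()
print(xml_output)
```

Building the tree with a library instead of printing strings has one practical advantage: characters such as `&` and `<` in the data are escaped automatically, so the output stays well-formed.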
Characteristics of XML
Structured Content
Example structure: book, chapter, title, section, summary.
When you examine similar information products, their structures are not consistent from product to product. This is a big problem for reuse.
In XML, structure can be defined in a Document Type Definition (DTD) or a schema. A DTD:
Defines all the elements (XML tags) that can be used in a document.
Defines the relationship of those elements to other elements.
Defines the hierarchy of elements ("a chapter contains"), the order of elements, and the number of elements.
Maintains structural consistency.
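To give a feel for what "defining the hierarchy" buys you, the sketch below checks a document against a hand-written table of allowed child elements. This is not DTD validation (Python's standard library does not validate against DTDs, and a real DTD also constrains order and count); it only imitates the hierarchy part. The `ALLOWED_CHILDREN` table and the `book` vocabulary are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: which child elements each element may contain.
ALLOWED_CHILDREN = {
    "book": {"title", "chapter"},
    "chapter": {"title", "section"},
    "section": {"title", "summary"},
    "title": set(),
    "summary": set(),
}

def structure_errors(element):
    """Return messages for elements that the content model does not allow."""
    allowed = ALLOWED_CHILDREN.get(element.tag)
    if allowed is None:
        return [f"unknown element <{element.tag}>"]
    errors = []
    for child in element:
        if child.tag not in allowed:
            errors.append(f"<{child.tag}> is not allowed inside <{element.tag}>")
        errors.extend(structure_errors(child))
    return errors

good = ET.fromstring("<book><title>XML</title><chapter><title>Intro</title></chapter></book>")
bad = ET.fromstring("<book><summary>Misplaced</summary></book>")
good_errors = structure_errors(good)
bad_errors = structure_errors(bad)
print(good_errors)
print(bad_errors)
```

Every document that passes the same check has the same shape, which is what makes the content reusable across products.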
XML makes documents transportable across systems and applications. XML focuses on the structure of a document, not the presentation. Presentation information (style) is kept in separate files that are associated with the document when it is published or used.
Built-in Metadata
XML is a set of rules for creating markup tags. The tag names themselves offer additional detail about the information.
The tag names become metadata. Attributes can be used to further define metadata.
Example
Attributes can identify the audience for each specific step, option, or even word.
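A short sketch of how such audience attributes might be used by a program follows. The `<procedure>` document, the attribute name `audience`, and its values `all` and `admin` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Each step carries metadata about who it is written for.
PROCEDURE_XML = """
<procedure>
  <step audience="all">Open the settings page.</step>
  <step audience="admin">Enable remote access.</step>
  <step audience="all">Save the changes.</step>
</procedure>
"""

def steps_for(root, audience):
    """Return the text of steps meant for everyone or for the given audience."""
    return [
        step.text
        for step in root.iter("step")
        if step.get("audience") in ("all", audience)
    ]

procedure = ET.fromstring(PROCEDURE_XML)
user_steps = steps_for(procedure, "user")
admin_steps = steps_for(procedure, "admin")
print(user_steps)
print(admin_steps)
```

The document is written once; the attribute values decide which reader sees which step.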
Database Orientation
XML makes you look at information as data. XML DTD designers are not interested in the actual data values at design time. They are concerned with the type of information, the hierarchy of information, and the relationships between the pieces of information.
The result is a structured format that can be stored very easily in a database. Each element can become a field of a table.
Use of XML
How do you format XML information for presentation? XSL is a powerful mechanism for both transforming and formatting XML documents.
XSL is itself an XML markup language and can:
Format content for online display or for paper-based delivery.
Add constant text or graphics.
Filter content.
Sort or reorder text.
XSLT
Can manipulate the information to reorder, repeat, or filter out information, or even add information based on details in the file. Can transform an XML document into another markup language.
XSL-FO
Provides style sheet capabilities for converting XML to paper-based formats such as PDF. Includes page layouts, headers, and footers.
Document Applications
Personalization
Personalization is information that can be manipulated to serve the needs of a specific user.
Personalization can be user defined, or it can be managed by software, based on a user's login information.
Personalization that is managed by software may be controlled by observing user behavior, combined with preferences, or both, to create a personalized experience.
With XML, documents can be broken down, stored as separate physical pieces in a database, and then assembled in any order to meet user demands.
HTML Document
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <title>Title content</title>
  </head>
  <body>
    <p align=center>
      Content inside the body<br/>
      of an HTML document
    </p>
  </body>
</html>
XML Document
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<noidung>
  <phandau>
    <tieude>Title content</tieude>
  </phandau>
  <phanthan>
    <gioithieu>Content inside the body of an XML element</gioithieu>
  </phanthan>
</noidung>
Task List
Identify the method to store data in a device-independent format.
Identify the structure of the document in which data is to be stored.
Create an XML document to store data.
View the XML document in a browser.
XML provides a way to store structured data that is capable of being recognized by different kinds of devices. In other words, it enables device-independence.
Before you store data in an XML document, you need to organize it. An XML document is composed of a number of components that can be used for representing information. These components are:
Processing Instruction
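An XML document usually begins with the XML declaration, which this course refers to as the Processing Instruction (PI). It has the same <?...?> form as a processing instruction, although the XML 1.0 specification treats the declaration as a separate construct. The PI provides information regarding the way in which the XML file should be processed.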
In the above example, the PI states that version 1.0 is used. The PI uses the encoding property to specify information about the encoding scheme that is used to create the XML file.
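The role of the version and encoding information can be seen by letting a library write the declaration. This is a minimal sketch, assuming Python 3.8 or later (for the `xml_declaration` argument of `ElementTree.tostring`); the element names `noidung` and `tieude` are borrowed from the sample document, and the text value is illustrative.

```python
import xml.etree.ElementTree as ET

root = ET.Element("noidung")
ET.SubElement(root, "tieude").text = "Nội dung"

# xml_declaration=True asks the serializer to emit the declaration line;
# with a byte encoding such as UTF-8 the result is a bytes object.
data = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
first_line = data.decode("utf-8").splitlines()[0]
print(first_line)

# A parser reads the declaration and uses the stated encoding to decode the rest.
parsed = ET.fromstring(data)
print(parsed.findtext("tieude"))
```

Because the encoding is stated in the file itself, the non-ASCII text survives the round trip without the reader having to guess how the bytes were written.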
Element Content
Elements can contain other elements. The elements contained in another element are called child elements. The containing element is called the parent element.
Combination
Elements can contain textual information as well as other elements.
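How a program sees such a combination of text and child elements can be sketched with `xml.etree.ElementTree`; the `<p>` and `<b>` element names and the sentence are made up for the example.

```python
import xml.etree.ElementTree as ET

para = ET.fromstring("<p>XML is <b>strict</b> about nesting.</p>")

children = [child.tag for child in para]   # child elements of the parent <p>
leading_text = para.text                   # text before the first child
bold_text = para[0].text                   # text inside the child <b>
trailing_text = para[0].tail               # text after </b>, still inside <p>
full_text = "".join(para.itertext())       # all text in document order

print(children, repr(leading_text), repr(bold_text), repr(trailing_text))
print(full_text)
```

In this library the text that follows a child element is stored on the child as `tail`, so `itertext()` is the simplest way to recover the full text of an element with mixed content.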
Summary (Contd.)
An XML document consists of:
Processing instructions, elements, attributes, entities, comments, and content.
Summary (Contd.)
The rules that govern the creation of a well-formed XML document are as follows:
Every start tag must have an end tag.
Empty tags must be closed using a forward slash (/).
All attribute values must be quoted; XML accepts single or double quotation marks.
Tags must nest correctly.
XML tags are case-sensitive: a start tag and its end tag must match exactly, including case.
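These rules can be tried out directly with a parser. The sketch below assumes Python's `xml.etree.ElementTree`, whose parser rejects text that is not well-formed by raising `ParseError`; the helper `is_well_formed` and the sample strings are illustrative.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

checks = {
    "matching tags": is_well_formed("<a><b>text</b></a>"),
    "missing end tag": is_well_formed("<a><b>text</a>"),
    "empty tag closed": is_well_formed("<a><br/></a>"),
    "unquoted attribute": is_well_formed("<a id=1></a>"),
    "bad nesting": is_well_formed("<a><b></a></b>"),
    "case mismatch": is_well_formed("<Note></note>"),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

Only the first and third samples are accepted; each of the others breaks one of the rules listed above.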
Info
Course name: