
A BRIEF INTRODUCTION ABOUT THE INTERNET

Origins:
1960s
o U.S. Department of Defence (DoD) became interested in developing a new large-scale
computer network
o The purposes of this network were communications, program sharing, and remote computer
access for researchers working on defence-related contracts.
o The DoD’s Advanced Research Projects Agency (ARPA) funded the construction of
the first such network, which was therefore named ARPAnet.
o The primary early use of ARPAnet was simple text-based communications through e-mail.

1990s

o NSFnet, which was created in 1986, replaced ARPAnet by 1990.
o It was sponsored by the National Science Foundation (NSF).
o By 1992, NSFnet connected more than 1 million computers around the world.
o In 1995, a small part of NSFnet returned to being a research network. The rest became known as the
Internet.

What Is the Internet?


 The Internet is a huge collection of computers connected in a communications network.
 The Transmission Control Protocol/Internet Protocol (TCP/IP) became the standard for
computer network connections in 1982.
 Rather than connecting every computer on the Internet directly to every other computer on
the Internet, normally the individual computers in an organization are connected to each other in a
local network. One node on this local network is physically connected to the Internet.
 So, the Internet is actually a network of networks, rather than a network of computers.
 Obviously, all devices connected to the Internet must be uniquely identifiable.

IP Addresses

In order to identify all the computers and other devices (printers and other networked peripherals) on the
Internet, each connected machine has a unique number, called an "IP address". IP stands for "Internet
Protocol," the common language used by machines on the Internet to share information.

An IP address is written as a set of 4 numbers, separated by periods, as in

203.183.184.10

This representation is sometimes referred to as dotted-octet representation of an IP address.

Here is some simple mathematics you should know! Each number in this four-number address can range from
0 through 255. So there are 256 different possible numbers for each part.

Since there are four parts, there are 256 x 256 x 256 x 256 = 256 to the 4th power =
4,294,967,296 different possible machine numbers - over four billion different possible machine numbers.
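
The arithmetic above can be checked with a short Python sketch. The helper name count_addresses is made up for this illustration; the ipaddress module is part of the Python standard library.

```python
import ipaddress

def count_addresses(parts: int = 4, values_per_part: int = 256) -> int:
    """Number of distinct addresses when each part can take values_per_part values."""
    return values_per_part ** parts

print(count_addresses())  # 4294967296

# The dotted form is a readable way of writing a single 32-bit number.
address = ipaddress.IPv4Address("203.183.184.10")
print(int(address))
```

Each of the four parts contributes 8 bits, which is why the total equals 2 to the 32nd power.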

This sounds like more than enough addresses to go around, but IP addresses are beginning to run out.
One single network, for example, might typically have all the IP addresses starting with 203.183.184
(203.183.184.0 through 203.183.184.255). To make matters worse, many networks don't even use all the 256
IP addresses available to them, which means that although the IP addresses are reserved by them they are
going to waste.
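
As a rough sketch of the example network above, the block of addresses starting with 203.183.184 can be described with Python's standard ipaddress module. The "/24" prefix notation is not used in these notes; it is assumed here as the usual way of saying "the first three parts are fixed".

```python
import ipaddress

# Every address that starts with 203.183.184 (first 24 bits fixed).
network = ipaddress.ip_network("203.183.184.0/24")

print(network.num_addresses)    # 256
print(network[0], network[-1])  # 203.183.184.0 203.183.184.255

# Membership test: does a given machine number belong to this network?
print(ipaddress.ip_address("203.183.184.10") in network)  # True
```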
Domain Names

IP addresses are numbers, which are difficult for users to remember. To solve
this problem, text-based names were introduced. The system that manages these names is known as
the Domain Name System (DNS).

These names begin with the name of the host machine, followed by progressively larger enclosing
collections of machines, called domains. There may be two, three or more domain names.

A domain name is of the form hostname.domainName.domainName. Example: rnsit.ac.in


The steps for converting a domain name to an IP address:
 The domain name has to be converted to an IP address before the destination can be reached.
 This conversion is needed because machines on the Internet are identified by numbers.
 The conversion is done with the help of name servers.
 As soon as a domain name is provided, it is sent across the Internet to a name server.
 The name server is responsible for converting the domain name to an IP address.
 If one name server is not able to convert the name, it contacts another name server.
 This process continues until the IP address is found.
 Once the IP address is found, the host can be accessed.
 The hostname and all the domain names together form what is known as a FULLY QUALIFIED DOMAIN
NAME.
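
The lookup steps above can be sketched as a toy model in Python. This is only an illustration of "ask one name server, which asks another if it does not know"; real DNS is considerably more involved. The class NameServer, its tables, and the address 192.0.2.10 (taken from a range reserved for documentation) are all hypothetical.

```python
from typing import Dict, Optional

class NameServer:
    """A toy name server: a lookup table plus an optional server to ask next."""

    def __init__(self, table: Dict[str, str], fallback: Optional["NameServer"] = None):
        self.table = table
        self.fallback = fallback

    def resolve(self, name: str) -> Optional[str]:
        # Answer from the local table when possible.
        if name in self.table:
            return self.table[name]
        # Otherwise pass the query on to another name server.
        if self.fallback is not None:
            return self.fallback.resolve(name)
        return None  # no server in the chain knows this name

# Hypothetical data, for illustration only.
remote = NameServer({"rnsit.ac.in": "192.0.2.10"})
local = NameServer({}, fallback=remote)

print(local.resolve("rnsit.ac.in"))      # 192.0.2.10
print(local.resolve("unknown.example"))  # None
```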

For example, the mail server at Web Crossing Harbor has an IP address 210.226.166.200. You could send
email to doug@210.226.166.200, but it isn't very convenient for a lot of reasons. For one thing, it is difficult
to remember. For another, we might need to move the mail server to a different machine (with a different IP
address) someday. Then we would have to tell everybody our new IP address in order to receive mail.

To solve this problem a system of giving easy-to-remember names to IP addresses was created. This system
is called the domain system. There are several top-level domains, and all other names fall under that in a
hierarchy of sub-domains.

There are two basic kinds of top-level domains - those based on type of activity and those based on
geographical location:
Some Activity Based Domains
.com Perhaps the most well-known top-level domain. Originally it was designated for use by
companies and commercial activities. Now it can be used by anybody for any purpose.
.org Originally designated for use by nonprofit organizations and individuals, now it can be used for
any purpose.
.net Originally designated for use by network organizations (such as Internet providers). Now it can
be used for any purpose.
.gov For governmental organizations in the United States.
.mil For military organizations in the United States.
.edu For four-year degree-granting colleges and universities only.

Some Geographic Based Domains


.jp The Japan domain
.us The U.S. domain
.ca The Canada domain
.to The Tonga domain

Some geographic domains, such as the .to (Tonga) domain, open up their domains and make them available
to anybody in the world.

A web application is an application that is accessed by users using a web browser.

What are the advantages of Web applications?


1. Web applications need to be installed only on the web server, whereas desktop applications need to
be installed on every computer from which you want to access them.
2. Maintenance, support and patches are easier to provide.
3. Only a browser is required on the client machine to access a web application.
4. They are accessible from anywhere, provided there is an Internet connection.
5. They are cross-platform.

Servers

A server is just a host that serves something. Some examples are:

 web servers - computers that serve web pages. People connect to web servers using browsers, such as
Netscape Navigator or Internet Explorer.
 FTP servers - People connect to them for file transfer, using a browser or a specialized FTP program, such as
Fetch (on a Mac) or FTP Explorer (on Windows).
 mail servers - People connect to them to send and receive mail, using such programs as Eudora, Netscape
Mail, Claris Mail and Microsoft Outlook Express.
 Web Crossing - a server that lets users create and use online communities, including forums and chat and
other services

The Web is known as a client-server system. Your computer is the client and the remote computers that store
electronic files are the servers.

Here's how web works:


When you enter something like http://www.google.com, the request goes to one of many special computers
on the Internet known as Domain Name Servers (DNS). All these requests are routed through various routers
and switches. The domain name servers keep tables of machine names and their IP addresses, so when you
type in http://www.google.com, it gets translated into a number, which identifies the computers that serve the
Google Web site to you.

When you want to view any page on the Web, you must initiate the activity by requesting a page using your
browser. The browser asks a domain name server to translate the domain name you requested into an IP
address. The browser then sends a request to that server for the page you want, using a standard called
Hypertext Transfer Protocol or HTTP.

The server should be constantly connected to the Internet, ready to serve pages to visitors. When it receives a
request, it looks for the requested document and returns it to the Web browser. When a request is made, the
server usually logs the client's IP address, the document requested, and the date and time it was requested.
The information logged varies from server to server.

In short:

We have seen how a Web client-server interaction happens. We can summarize these steps as follows:

 A user enters a URL into a browser (for example, http://www.google.com). This request is passed to a
domain name server.

 The domain name server returns an IP address for the server that hosts the Web site (for example,
68.178.157.132).

 The browser requests the page from the Web server using the IP address specified by the domain name
server.

 The Web server returns the page to the IP address specified by the browser requesting the page. The page
may also contain links to other files on the same server, such as images, which the browser will also request.

 The browser collects all the information and displays it on your computer in the form of a Web page.
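
As a minimal sketch of the third step, the following Python function builds the text of the request a browser might send once it knows which server to contact. The function name build_get_request is made up for this example, and only a bare-bones HTTP/1.1 GET request is shown; real browsers send many more header lines.

```python
from urllib.parse import urlsplit

def build_get_request(url: str) -> str:
    """Return the text of a minimal HTTP/1.1 GET request for the given URL."""
    parts = urlsplit(url)
    path = parts.path or "/"  # an empty path means the site's top-level page
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {parts.netloc}",  # the server name taken from the URL
        "Connection: close",
        "",                       # a blank line ends the request headers
        "",
    ]
    return "\r\n".join(lines)

request = build_get_request("http://www.google.com")
print(request.splitlines()[0])  # GET / HTTP/1.1
print(request.splitlines()[1])  # Host: www.google.com
```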

WEB BROWSERS
 Documents provided by servers on the Web are requested by browsers, which are programs running on
client machines.
 They are called browsers because they allow the user to browse the resources available on servers.
 Mosaic was the first browser with a graphical user interface.
 A browser is a client on the Web because it initiates the communication with a server, which waits for
a request from the client before doing anything.
 In the simplest case, a browser requests a static document from a server.
 The server locates the document among its servable documents and sends it to the browser, which
displays it for the user.
 Sometimes a browser directly requests the execution of a program stored on the server. The output of
the program is then returned to the browser.
 Examples: Internet Explorer, Mozilla Firefox, Netscape Navigator, Google Chrome, Opera, etc.

WEB SERVERS
Web servers are programs that provide documents to requesting browsers. Example: Apache
Web server operations:
 All communications between a web client and a web server use HTTP.
 When a web server begins execution, it informs the OS under which it is running, and it then runs
as a background process.
 A web client or browser, opens a network connection to a web server, sends information requests and
possibly data to the server, receives information from the server and closes the connection.
 The primary task of web server is to monitor a communication port on host machine, accept HTTP
commands through that port and perform the operations specified by the commands.
 When the URL is received, it is translated into either a filename or a program name.

General characteristics of web server:


 The file structure of a web server has two separate directories.
 The root of one of these is called the document root, which stores the web documents.
 The root of the other directory is called the server root, which stores the server and its support software.
 The files stored directly in the document root are those available to clients through top-level URLs.
 The secondary areas from which documents can be served are called virtual document trees.
 Many servers can support more than one site on a computer, potentially reducing the cost of each
site and making their maintenance more convenient. Such secondary hosts are called virtual hosts.
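
The translation of a URL into a filename below the document root can be sketched roughly as follows. This is not how any particular server implements it; the function name url_to_filename, the directory /var/www/html and the default page index.html are assumptions chosen for illustration.

```python
import posixpath
from urllib.parse import urlsplit, unquote

def url_to_filename(url: str, document_root: str, default_page: str = "index.html") -> str:
    """Map the path part of a URL onto a file below the document root."""
    path = unquote(urlsplit(url).path)
    # Normalise the path so that ".." segments cannot climb above the document root.
    path = posixpath.normpath("/" + path).lstrip("/")
    if path == "":
        path = default_page  # no document named: serve the default page
    return posixpath.join(document_root, path)

print(url_to_filename("http://www.rnsit.ac.in/index.html", "/var/www/html"))
# /var/www/html/index.html
print(url_to_filename("http://www.rnsit.ac.in/", "/var/www/html"))
# /var/www/html/index.html
```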

Apache
Apache is the most widely used Web server.
The primary reasons are as follows: Apache is an excellent server because it is both fast and reliable.
Furthermore, it is open-source software, which means that it is free and is managed by a large team of
volunteers, a process that efficiently and effectively maintains the system.
Finally, it is one of the best available servers for Unix-based systems, which are the most popular for
Web servers.
Apache is capable of providing a long list of services beyond the basic process of serving documents
to clients.
When Apache begins execution, it reads its configuration information from a file and sets its parameters
to operate accordingly.

IIS
Microsoft IIS server is supplied as part of Windows—and because it is a reasonably good server—most
Windows-based Web servers use IIS.
With IIS, server behaviour is modified by changes made through a window-based management
program, named the IIS snap-in, which controls both IIS and ftp.
This program allows the site manager to set parameters for the server.
Under Windows XP and Vista, the IIS snap-in is accessed by going to Control Panel, Administrative
Tools, and IIS Admin.

UNIFORM RESOURCE LOCATORS


 Uniform Resource Locators (URLs) are used to identify different kinds of resources on Internet.
 If the web browser wants some document from web server, just giving domain name is not sufficient
because domain name can only be used for locating the server.
 It does not have information about which document client needs. Therefore, URL should be provided.
 The general format of a URL is: scheme:object-address
Example: http://www.vtu.ac.in/results.php
 The scheme indicates protocols being used. (http, ftp, telnet...)
 In case of http, the full form of the object address of a URL is as follows:
//fully-qualified-domain-name/path-to-document
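 URLs can never have embedded spaces.
 Special characters such as semicolons, ampersands and colons also cannot appear literally; a space or one of these characters must be coded as a percent sign (%) followed by its two-digit hexadecimal ASCII code.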
 The path to the document for http protocol is a sequence of directory names and a filename, all
separated by whatever special character the OS uses. (forward or backward slashes)
 The path in a URL can differ from a path to a file because a URL need not include all directories on
the path
 A path that includes all directories along the way is called a complete path.
Example: http://www.rnsit.ac.in/index.html
 In most cases, the path to the document is relative to some base path that is specified in the
configuration files of the server. Such paths are called partial paths.
Example: http://www.rnsit.ac.in/
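
To illustrate the parts of a URL described above, here is a short sketch using Python's standard urllib.parse module. The file name "my results & marks.php" is invented purely to show how a space and an ampersand are encoded.

```python
from urllib.parse import urlsplit, quote

parts = urlsplit("http://www.vtu.ac.in/results.php")
print(parts.scheme)  # http
print(parts.netloc)  # www.vtu.ac.in
print(parts.path)    # /results.php

# Characters that may not appear literally are percent-encoded.
encoded = quote("my results & marks.php")
print(encoded)       # my%20results%20%26%20marks.php
```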

MULTIPURPOSE INTERNET MAIL EXTENSIONS


 MIME stands for Multipurpose Internet Mail Extensions.
 Apart from sending the requested document, the server also sends MIME information.
 The MIME information is used by the web browser to render the document properly.
 The format of a MIME type is: type/subtype
Example: text/html , text/plain , image/jpeg , video/mpeg
 When the type is either text or image, the browser renders the document without any problem.
 However, if the type is video or audio, the browser may not be able to render the document by itself.
 It has to take the help of other software such as a media player, Winamp, etc.
 Such software is called a helper application or plugin.
 Non-textual information of this kind is known as HYPERMEDIA.
 Along with creating a customized document type, the user should also provide a helper application for it.
 This helper application will be used by the browser for rendering the document.
 The list of MIME specifications is stored in configuration file of web server.
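
As a small illustration of the type/subtype format, Python's standard mimetypes module can guess a MIME type from a file name, much as a server consults its configured list. The helper name describe and the file names are made up for this sketch.

```python
import mimetypes

def describe(filename: str):
    """Return (type, subtype) guessed from the file extension."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    top_level, subtype = mime_type.split("/")
    return top_level, subtype

for name in ("index.html", "photo.jpeg", "movie.mpeg"):
    print(name, "->", describe(name))
```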

Chapter 2: Web 2.0

Web 2.0 is the term given to describe a second generation of the World Wide Web that is focused on
the ability for people to collaborate and share information online. Web 2.0 basically refers to the
transition from static HTML Web pages to a more dynamic Web that is more organized and is based
on serving Web applications to users.

Other improved functionality of Web 2.0 includes open communication with an emphasis on Web-based
communities of users, and more open sharing of information. Over time, Web 2.0 has been
used more as a marketing term than a computer-science-based term. Blogs, wikis, and Web
services are all seen as components of Web 2.0.

Search 2.0 (S-2.0):


What I'm calling Search 2.0 are actually third generation search technologies. To explain the
generations:

 First-generation search ranked sites based on page content - examples are early yahoo.com and
AltaVista.
 Second-generation relies on link analysis for ranking - so they take the structure of the Web into
account. Examples are Google and Overture.
 Third-generation search technologies are designed to combine the scalability of existing internet
search engines with new and improved relevancy models; they bring into the equation user
preferences, collaboration, collective intelligence, a rich user experience, and many other
specialized capabilities that make information more productive.

Content Networks

Content networks are websites or collections of websites that provide information in various forms
(such as articles, wikis, blogs, etc.). These provide another way of filtering the vast amounts of
information on the Internet, by allowing users to go to a trusted site that has already sorted through
many sources to find the best content or has provided its own content.

User-Generated Content

User-generated content has been the key to success for many of today’s leading Web 2.0 companies,
such as Amazon, eBay and Monster. The community adds value to these sites, which, in many cases,
are almost entirely built on user-generated content. For example, eBay (an online auction site) relies on
the community to buy and sell auction items, and Monster (a job search engine) connects job seekers
with employers and recruiters.

Wikis, websites that allow users to edit existing content and add new information, are prime
examples of user-generated content and collective intelligence. The most popular wiki is Wikipedia, a
community-generated encyclopedia with articles available in over 200 languages. Wikipedia trusts its
users to follow certain rules, such as not deleting accurate information and not adding biased
information, while allowing community members to enforce the rules.

Blog:

The term “blog” evolved from weblog, a regularly updated list of interesting websites. These blogs
consisted of short postings, in reverse chronological order, that contained links to other web pages
and short commentaries or reactions. Blogging has since taken on a looser structure—some blogs
still follow the traditional format of links and small amounts of text, while others consist of essays,
sometimes not containing any links. Blogs can also now incorporate media, such as music or videos.
Many people are familiar with personal journal blogs.

Social Networking

Social networking sites, which allow users to keep track of their existing interpersonal relationships and
form new ones, are experiencing extraordinary growth in Web 2.0.

RSS:

RSS is the acronym used to describe the de facto standard for the syndication of Web content. RSS is an
XML-based format and while it can be used in different ways for content distribution, its most widespread
usage is in distributing news headlines on the Web. A Web site that wants to allow other sites to publish
some of its content creates an RSS document and registers the document with an RSS publisher. A user
that can read RSS-distributed content can use the content on a different site. Syndicated content can
include data such as news feeds, events listings, news stories, headlines, project updates, excerpts from
discussion forums or even corporate information.
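
To make the idea of an XML-based feed concrete, here is a minimal sketch that reads headlines from an RSS 2.0 document with Python's standard ElementTree module. The feed contents (site name, links, headlines) are hypothetical, and the function name headlines is invented for this example.

```python
import xml.etree.ElementTree as ET

# A hypothetical feed, used only for illustration.
FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://www.example.com/</link>
    <description>Hypothetical feed</description>
    <item>
      <title>First headline</title>
      <link>http://www.example.com/first</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://www.example.com/second</link>
    </item>
  </channel>
</rss>"""

def headlines(feed_xml: str):
    """Return (title, link) pairs for every item in the feed."""
    channel = ET.fromstring(feed_xml).find("channel")
    return [(item.findtext("title"), item.findtext("link"))
            for item in channel.findall("item")]

for title, link in headlines(FEED):
    print(title, "-", link)
```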

VOIP - VOICE OVER INTERNET PROTOCOL

"Voice Over Internet Protocol" (also called VoIP, IP Telephony, Internet telephony, Broadband
telephony, Broadband Phone and Voice over Broadband) allows people to make telephone calls, use
voice instant messaging and teleconference over the Internet or through any IP-based network.

VoIP works by converting the voice into a digital signal that travels over the internet then converts it
back at the other end. It can be used by way of a regular analogue phone, dedicated VoIP phone or
headset/microphone connected to a computer system using a software programme, with the end user
able to receive the call through their own computer or on a normal phone number.

Companies providing VoIP service are commonly referred to as providers, and protocols which are
used to carry voice signals over the IP network are commonly referred to as Voice over IP or VoIP
protocols.

INTRODUCTION TO XML

XML stands for Extensible Markup Language. It is a text-based markup language derived from
Standard Generalized Markup Language (SGML).

XML tags identify the data and are used to store and organize it, rather than specifying how to
display it, as HTML tags do. XML is not going to replace HTML in
the near future, but it introduces new possibilities by adopting many successful features of HTML.

The following characteristics of XML make it useful in a variety of systems and
solutions:

 XML is extensible: XML allows you to create your own self-descriptive tags, or language, that suits
your application.

 XML is a public standard: XML was developed by an organization called the World Wide Web
Consortium (W3C) and is available as an open standard.

 All XML documents begin with an XML declaration. This declaration identifies the document
as an XML document and also specifies the version number of the XML standard.
 It also specifies the encoding standard.
<?xml version = "1.0" encoding = "utf-8"?>
 Comments in XML are written the same way as in HTML.
 XML names are used to name elements and attributes.
 XML names are case-sensitive.
 Every XML document contains a single root element, whose opening tag appears first, after the declaration.
 All other tags must be nested inside the root element.
 As in XHTML, XML tags can also have attributes.
 The values of attributes must be enclosed in single or double quotation marks.
Example:
<?xml version = "1.0" encoding = "utf-8"?>
<student>
<name>Santhosh</name>
<usn>sc120CS090</usn>
</student>
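
As a quick check of the example, the student document can be read with Python's standard ElementTree parser. This is only a sketch of how a program might consume such a document; the variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

DOCUMENT = b"""<?xml version="1.0" encoding="utf-8"?>
<student>
  <name>Santhosh</name>
  <usn>sc120CS090</usn>
</student>"""

root = ET.fromstring(DOCUMENT)
print(root.tag)  # student

# Every other element is nested inside the single root element.
for child in root:
    print(child.tag, "=", child.text)
```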
XML Declaration

The XML document can optionally have an XML declaration. It is written as below:

<?xml version="1.0" encoding="UTF-8"?>

Where version is the XML version and encoding specifies the character encoding used in the document.

Syntax Rules for XML declaration

 The XML declaration is case sensitive and must begin with "<?xml", where "xml" is written in
lower case.

 If the document contains an XML declaration, it must be the first statement of the XML
document.

 The HTTP protocol can override the value of encoding that you put in the XML declaration.

Tags and Elements

An XML file is structured by several XML-elements, also called XML-nodes or XML-tags. The names of
XML-elements are enclosed in angle brackets < >.

Element Syntax: Each XML-element needs to be closed, either with a matching end tag or, for an
empty element, within the start tag itself, as shown below:

<element>....</element>

or in simple-cases, just this way:

<element/>

Root element: An XML document can have only one root element. For example, following is not a
correct XML document, because both the x and y elements occur at the top level without a root element:

<x>...</x>
<y>...</y>

The following example shows a correctly formed XML document:

<root>
<x>...</x>
<y>...</y>
</root>
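
The single-root rule can be observed with any XML parser. The sketch below uses Python's standard ElementTree module; the helper name is_well_formed is made up for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as a complete XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<x>...</x><y>...</y>"))               # False
print(is_well_formed("<root><x>...</x><y>...</y></root>"))  # True
```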

Case sensitivity: The names of XML-elements are case-sensitive. That means the name of the start and
the end elements need to be exactly in the same case.

For example <contact-info> is different from <Contact-Info>.

Attributes

An attribute specifies a single property for the element, using a name/value pair. An XML-element can
have one or more attributes. For example:

<a href="http://www.google.com/">google</a>

Here href is the attribute name and http://www.google.com/ is the attribute value.

Attribute values must always be enclosed in quotation marks, as in the example above.

XML References

References usually allow you to add or include additional text or markup in an XML document.
References always begin with the symbol "&" ,which is a reserved character and end with the symbol
";". XML has two types of references:

Entity References: An entity reference contains a name between the start and the end delimiters. For
example &amp; where amp is name. The name refers to a predefined string of text and/or markup.

Character References: These contain a hash mark ("#") followed by a number, as in &#65;.
The number always refers to the Unicode code point of a character. In this case, 65 refers to
the letter "A".
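
A parser replaces both kinds of reference with the text they stand for. The following sketch, using Python's standard ElementTree module and an invented <note> element, shows the result.

```python
import xml.etree.ElementTree as ET

# &amp; is an entity reference; &#65; is a character reference.
note = ET.fromstring("<note>Tom &amp; Jerry, grade &#65;</note>")
print(note.text)  # Tom & Jerry, grade A
```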

XML Text

 The names of XML-elements and XML-attributes are case-sensitive, which means the name of start and
end elements need to be written in the same case.
 To avoid character encoding problems, all XML files should be saved as Unicode UTF-8 or UTF-16
files.
 Whitespace characters like blanks, tabs and line-breaks between XML-elements and between the XML-
attributes will be ignored.
 Some characters are reserved by the XML syntax itself. Hence, they cannot be used directly. To use
them, some replacement-entities are used, which are listed below:

Character not allowed   Replacement entity   Character description
<                       &lt;                 less than
>                       &gt;                 greater than
&                       &amp;                ampersand
'                       &apos;               apostrophe
"                       &quot;               quotation mark

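When a program writes XML, these replacements are usually applied by a library rather than by hand. A minimal sketch with Python's standard xml.sax.saxutils module follows; the sample string is invented, and the two quote characters are passed explicitly because escape() only handles &, < and > by default.

```python
from xml.sax.saxutils import escape, unescape

QUOTES = {"'": "&apos;", '"': "&quot;"}

raw = 'if a < b & c > d: say "hi"'
safe = escape(raw, QUOTES)
print(safe)  # if a &lt; b &amp; c &gt; d: say &quot;hi&quot;

# Reversing the replacement gives back the original text.
restored = unescape(safe, {"&apos;": "'", "&quot;": '"'})
print(restored == raw)  # True
```
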

The document prolog comes at the top of the document, before the root element. This section
contains:

 XML declaration
 Document type declaration

Document Elements Section

Document Elements are the building blocks of XML. These divide the document into a hierarchy of
sections, each serving a specific purpose. You can separate a document into multiple sections so that
they can be rendered differently, or used by a search engine. The elements can be containers, with a
combination of text and other elements.

Syntax

Following syntax shows XML declaration:

<?xml
version="version_number"
encoding="encoding_declaration"
standalone="standalone_status"
?>

Parameter: version
Parameter_value: 1.0
Parameter_description: Specifies the version of the XML standard used.

Parameter: encoding
Parameter_value: UTF-8, UTF-16, ISO-10646-UCS-2, ISO-10646-UCS-4, ISO-8859-1 to ISO-8859-9, ISO-2022-JP,
Shift_JIS, EUC-JP
Parameter_description: Defines the character encoding used in the document. UTF-8 is the default encoding used.

Parameter: standalone
Parameter_value: yes or no
Parameter_description: Informs the parser whether the document relies on information from an external source, such as
an external document type definition (DTD), for its content. The default value is no. Setting it to
yes tells the processor that no external declarations are required for parsing the document.

Types of Character Entities

The following types of character entities are covered here:

 Predefined Character Entities

 Numbered Character Entities

Predefined Character Entities

They are introduced to avoid ambiguity when using certain symbols. For example, an ambiguity
arises when the less than ( < ) or greater than ( > ) symbol appears in content, because these symbols
are also used to delimit tags in XML. Following is a list of predefined character entities
from the XML specification. These can be used to express characters without ambiguity.

 Ampersand: &amp;

 Single quote: &apos;

 Greater than: &gt;

 Less than: &lt;

 Double quote: &quot;

Numeric Character Entities

The numeric reference is used to refer to a character entity. Numeric reference can either be in decimal
or hexadecimal format. As there are thousands of numeric references available, these are a bit hard to
remember. Numeric reference refers to the character by its number in the Unicode character set.

General syntax for decimal numeric reference is:

&# decimal number ;

General syntax for hexadecimal numeric reference is:

&#x Hexadecimal number ;

The following table lists a predefined character entity with its numeric values:

Entity name   Character   Decimal reference   Hexadecimal reference
quot          "           &#34;               &#x22;
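
Because a numeric reference is simply the character's Unicode number, both forms can be computed rather than memorized. The helper name numeric_references in this Python sketch is made up for illustration.

```python
def numeric_references(character: str):
    """Return the decimal and hexadecimal references for one character."""
    code_point = ord(character)  # the character's number in Unicode
    return f"&#{code_point};", f"&#x{code_point:X};"

print(numeric_references('"'))  # ('&#34;', '&#x22;')
print(numeric_references("&"))  # ('&#38;', '&#x26;')
```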

The term CDATA means Character Data. CDATA sections are blocks of text that are not parsed by
the parser as markup, even if they contain characters that would otherwise be recognized as markup.
The predefined entities such as &lt;, &gt;, and &amp; require extra typing and are generally difficult to read in
the markup. In such cases, a CDATA section can be used. By using a CDATA section, you are telling
the parser that the particular section of the document contains no markup and should be treated as regular
text.

Syntax

Following is the syntax for CDATA section:

<![CDATA[
characters with markup
]]>

The above syntax is composed of three sections:

 CDATA Start section - CDATA begins with the nine-character delimiter <![CDATA[

 CDATA End section - CDATA section ends with ]]> delimiter.

 CData section - Characters between these two enclosures are interpreted as characters, and not as
markup. This section may contain markup characters (<, >, and &), but they are ignored by the XML
processor.

Example

The following markup code shows an example of CDATA. Here, each character written inside the CDATA
section is treated as text and is not interpreted as markup by the parser.

<script>
<![CDATA[
<message> Welcome to “college” </message>
]]>
</script>

In the above example, everything inside the CDATA section, including the <message> and </message> tags, is
treated as character data and not as markup.

CDATA Rules

The given rules are required to be followed for XML CDATA:

 A CDATA section cannot contain the string "]]>", because that string marks the end of the section.

 Nesting is not allowed in CDATA section.
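
The effect of a CDATA section can be observed with a parser. In this sketch, which uses Python's standard ElementTree module, a <message> tag written inside the CDATA section arrives as plain text rather than as a child element. The document below is a cleaned-up variant of the example, with straight quotation marks.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<script>
<![CDATA[
<message> Welcome to "college" </message>
]]>
</script>"""

script = ET.fromstring(DOCUMENT)
print(len(script))          # 0, because no child elements were created
print(script.text.strip())  # <message> Welcome to "college" </message>
```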

Whitespace:

A special attribute named xml:space may be attached to an element. This indicates that whitespace
should not be removed for that element by the application. You can set this attribute to default or
preserve as shown in the example below:

<!ATTLIST address xml:space (default|preserve) 'preserve'>

Where:
 The value default signals that the default whitespace processing modes of an application are acceptable
for this element;

 The value preserve tells the application to preserve all the whitespace.

Processing instructions (PIs) can be used to pass information to applications. PIs can appear anywhere in
the document outside the markup. They can appear in the prolog, including the document type definition
(DTD), in textual content, or after the document.

DOCUMENT TYPE DEFINITIONS


 A DTD is a set of structural rules called declarations, which specify the set of elements that can appear in
the document. It also specifies how and where these elements may appear.
 A DTD also specifies entity definitions.
 A DTD is most useful when the same tag set definition is used by a collection of documents.
 A DTD can be embedded in the XML document whose syntax rules it describes. In this case, it is called an
internal DTD.
 Alternatively, a separate file can be created and linked to the XML file. In this case, the DTD is called an
external DTD.
 An external DTD can be used with more than one XML file.

Syntax
Basic syntax of a DTD is as follows:

<!DOCTYPE element DTD identifier
[
declaration1
declaration2
........
]>

In the above syntax,

 The DTD starts with <!DOCTYPE delimiter.

 element names the root element, telling the parser to parse the document from the specified root element.

 DTD identifier is an identifier for the document type definition, which may be the path to a file on the
system or the URL of a file on the Internet. If the DTD points to an external path, it is called the External
Subset.

 The square brackets [ ] enclose an optional list of declarations called the Internal Subset.

Internal DTD

A DTD is referred to as an internal DTD if the elements are declared within the XML file. In this case, the
standalone attribute in the XML declaration can be set to yes, which means that the declaration
works independently of any external source.

Syntax

The syntax of internal DTD is as shown:

<!DOCTYPE root-element [element-declarations]>


where root-element is the name of root element and element-declarations is where you declare the
elements.

Example

Following is a simple example of internal DTD:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE address [
<!ELEMENT address (name,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>

<address>
<name>Raj</name>
<phone>(011) 123-4567</phone>
</address>

Several elements are declared here that make up the vocabulary of the <address> document.
<!ELEMENT name (#PCDATA)> defines the element name to be of type "#PCDATA". Here
#PCDATA means parsed character data, i.e. text that the parser will examine for markup.
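
The relationship between the DOCTYPE and the document body can be sketched in Python. This is only an illustration: the standard-library parser used here reads the DTD but does not validate against it, so the content model check is done by hand. The names XML_DOC, EXPECTED_CHILDREN and inspect_document are hypothetical, and the content model is assumed to be (name,phone).

```python
from xml.dom import minidom

# The internal-DTD example, assuming the content model (name,phone).
XML_DOC = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE address [
<!ELEMENT address (name,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>
<address>
<name>Raj</name>
<phone>(011) 123-4567</phone>
</address>"""

# Hypothetical helper data: the child order required by the content model.
EXPECTED_CHILDREN = ["name", "phone"]


def inspect_document(xml_bytes):
    """Return the DOCTYPE name, the root element name and the child element names."""
    doc = minidom.parseString(xml_bytes)
    root = doc.documentElement
    children = [node.tagName for node in root.childNodes
                if node.nodeType == node.ELEMENT_NODE]
    return doc.doctype.name, root.tagName, children


doctype_name, root_name, children = inspect_document(XML_DOC)

# The DOCTYPE name must match the root element, and the children must
# appear in the order given by the content model.
print("root matches DOCTYPE:", doctype_name == root_name)
print("children follow content model:", children == EXPECTED_CHILDREN)
```

A validating parser would perform both checks automatically and report an error if either failed.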

External DTD

In an external DTD, elements are declared outside the XML file. They are accessed by specifying a
system identifier, which may be either the path to a .dtd file or a valid URL. When a document relies on
an external DTD, the standalone attribute in the XML declaration is set to no. This means the declaration
includes information from the external source.

Syntax

Following is the syntax for external DTD:

<!DOCTYPE root-element SYSTEM "file-name">

where file-name is the file with .dtd extension.

Example

The following example shows external DTD usage:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE address SYSTEM "address.dtd">
<address>
<name>raj</name>
<phone>(011) 123-4567</phone>
</address>

The contents of the DTD file address.dtd are as shown:

<!ELEMENT address (name,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT phone (#PCDATA)>

Types

You can refer to an external DTD by using either system identifiers or public identifiers.

System Identifiers

A system identifier enables you to specify the location of an external file containing DTD declarations.
The syntax is as follows:

<!DOCTYPE name SYSTEM "address.dtd" [...]>

As you can see, it contains the keyword SYSTEM and a URI reference pointing to the location of the
DTD file.

Public Identifiers

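Public identifiers provide a mechanism to locate DTD resources by a well-known name. In XML, a public
identifier must be followed by a system identifier that the parser can fall back on, as below:

<!DOCTYPE name PUBLIC "-//Beginning XML//DTD Address Example//EN" "address.dtd">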
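
As a rough illustration, the two kinds of identifier can be read back from a parsed document with Python's standard library. The names SYSTEM_DOC, PUBLIC_DOC and doctype_ids are hypothetical, and the public form assumes that a system identifier ("address.dtd") follows the public one. The parser used here records the identifiers but does not fetch address.dtd.

```python
from xml.dom import minidom

# A document that refers to its DTD with a system identifier only.
SYSTEM_DOC = b"""<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE address SYSTEM "address.dtd">
<address><name>raj</name><phone>(011) 123-4567</phone></address>"""

# The same document using a public identifier plus a system identifier.
PUBLIC_DOC = b"""<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE address PUBLIC "-//Beginning XML//DTD Address Example//EN" "address.dtd">
<address><name>raj</name><phone>(011) 123-4567</phone></address>"""


def doctype_ids(xml_bytes):
    """Return the (public identifier, system identifier) pair of the DOCTYPE."""
    doctype = minidom.parseString(xml_bytes).doctype
    return doctype.publicId, doctype.systemId


system_ids = doctype_ids(SYSTEM_DOC)
public_ids = doctype_ids(PUBLIC_DOC)
print("SYSTEM form:", system_ids)
print("PUBLIC form:", public_ids)
```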

XML Schema

XML Schema is commonly known as XML Schema Definition (XSD). It is used to describe and
validate the structure and content of XML data. An XML schema defines the elements, attributes and
data types of a document. The schema element supports Namespaces.

Syntax

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

Example

The following example shows how to use schema:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="contact">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string" />
<xs:element name="phone" type="xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>

The basic idea behind XML Schemas is that they describe the legitimate format that an XML
document can take.
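
Because a schema is itself an XML document, it can be inspected with any XML parser. The following Python sketch lists the elements that the schema above declares. It does not validate anything (the standard library has no XSD validator), and the names SCHEMA, XS and declared_elements are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URI bound to the xs: prefix in the schema.
XS = "http://www.w3.org/2001/XMLSchema"

SCHEMA = b"""<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="contact">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string" />
<xs:element name="phone" type="xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>"""


def declared_elements(schema_bytes):
    """Return (name, type) for every xs:element declaration, in document order."""
    root = ET.fromstring(schema_bytes)
    return [(el.get("name"), el.get("type"))
            for el in root.iter("{%s}element" % XS)]


elements = declared_elements(SCHEMA)
for name, type_name in elements:
    # contact has no type attribute because its type is defined inline.
    print(name, type_name)
```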

Elements

Elements are the building blocks of an XML document. An
element can be defined within an XSD as follows:

<xs:element name="x" type="y"/>

Definition Types

You can define XML schema elements in the following ways:

Simple Type - A simple type element can contain only text; it cannot have child elements or attributes.
Some of the predefined simple types are: xs:integer, xs:boolean, xs:string, xs:date. For example:

<xs:element name="phone_number" type="xs:int" />

Complex Type - A complex type is a container for other element definitions. This allows you to
specify which child elements an element can contain and to provide some structure within your XML
documents. For example:

<xs:element name="Address">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string" />
<xs:element name="phone" type="xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>

In the above example, the Address element consists of child elements. It is a container for other
<xs:element> definitions, which allows you to build a simple hierarchy of elements in the XML document.

Global Types - With a global type, you can define a single named type in your schema, which can then be
referenced by any element declaration. For example, suppose you want to reuse the name and company for
different addresses of the company. In such a case, you can define a general type as below:

<xs:complexType name="AddressType">
<xs:sequence>
<xs:element name="name" type="xs:string" />
<xs:element name="company" type="xs:string" />
</xs:sequence>
</xs:complexType>

Now let us use this type in our example as below:

<xs:element name="Address1">
<xs:complexType>
<xs:sequence>
<xs:element name="address" type="AddressType" />
<xs:element name="phone1" type="xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="Address2">
<xs:complexType>
<xs:sequence>
<xs:element name="address" type="AddressType" />
<xs:element name="phone2" type="xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>

Instead of having to define the name and the company twice (once for Address1 and once for
Address2), we now have a single definition. This makes maintenance simpler, i.e., if you decide to add
"Postcode" elements to the address, you need to add them at just one place.

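One way to see how a shared type is referenced is to check that every user-defined type name used by an element is actually declared in the schema. The Python sketch below does this for a cut-down version of the example, assuming the shared type is declared as a named xs:complexType. The names SCHEMA and unresolved_type_refs are hypothetical, and treating every type without the xs: prefix as user-defined is a simplification.

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for the XML Schema namespace.
XS = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType name="AddressType">
<xs:sequence>
<xs:element name="name" type="xs:string" />
<xs:element name="company" type="xs:string" />
</xs:sequence>
</xs:complexType>
<xs:element name="Address1">
<xs:complexType>
<xs:sequence>
<xs:element name="address" type="AddressType" />
<xs:element name="phone1" type="xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>"""


def unresolved_type_refs(schema_bytes):
    """Return user-defined type names that are referenced but never declared."""
    root = ET.fromstring(schema_bytes)
    # Named types declared directly under xs:schema.
    declared = {t.get("name") for t in root.findall(XS + "complexType")}
    # Simplification: any type without the xs: prefix counts as user-defined.
    referenced = {el.get("type") for el in root.iter(XS + "element")
                  if el.get("type") and not el.get("type").startswith("xs:")}
    return sorted(referenced - declared)


missing = unresolved_type_refs(SCHEMA)
# An empty list means every referenced type has a declaration.
print("unresolved type references:", missing)
```
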
Attributes

Attributes in XSD provide extra information within an element. An attribute has a name and a type,
as shown below:

<xs:attribute name="x" type="y"/>

Namespace Declaration

A Namespace is declared using reserved attributes. Such an attribute name must either be xmlns or
begin with xmlns:, as shown below:

<element xmlns:name="URL">

Syntax

 The Namespace starts with the keyword xmlns.

 The word name is the Namespace prefix.

 The URL is the Namespace identifier.

Example

Namespace affects only a limited area in the document. An element containing the declaration and all
of its descendants are in the scope of the Namespace. Following is a simple example of XML
Namespace:

<?xml version="1.0" encoding="UTF-8"?>
<cont:contact xmlns:cont="www.college.com/profile">
<cont:name>raj</cont:name>
<cont:phone>(011) 123-4567</cont:phone>
</cont:contact>

Here, the Namespace prefix is cont, and the Namespace identifier (URI) is
www.college.com/profile. This means that the element names and attribute names with the cont
prefix (including the contact element) all belong to the www.college.com/profile namespace.
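
The effect of the prefix can be observed with a namespace-aware parser. In the Python sketch below, the parser replaces the cont prefix with the Namespace identifier, so the prefix chosen when searching (here "c", an arbitrary assumption) does not have to match the one used in the document. The names XML_DOC and NS are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<cont:contact xmlns:cont="www.college.com/profile">
<cont:name>raj</cont:name>
<cont:phone>(011) 123-4567</cont:phone>
</cont:contact>"""

root = ET.fromstring(XML_DOC)

# The parser expands the prefix: the tag becomes {namespace-identifier}local-name.
print(root.tag)

# Any prefix can be mapped to the identifier when searching; "c" is arbitrary.
NS = {"c": "www.college.com/profile"}
name = root.find("c:name", NS).text
phone = root.find("c:phone", NS).text
print(name, phone)
```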

XSLT

XSL stands for EXtensible Stylesheet Language, and is a style sheet language for XML documents.

The World Wide Web Consortium (W3C) started to develop XSL because there was a need for an
XML-based Stylesheet Language.

XSL describes how the XML document should be displayed!


XSLT is used to transform an XML document into another XML document, or another type of
document that is recognized by a browser, like HTML and XHTML. Normally XSLT does this by
transforming each XML element into an (X)HTML element.

With XSLT you can add/remove elements and attributes to or from the output file. You can also
rearrange and sort elements, perform tests and make decisions about which elements to hide and
display, and a lot more.

Correct Style Sheet Declaration

The root element that declares the document to be an XSL style sheet is <xsl:stylesheet> or
<xsl:transform>.

Note: <xsl:stylesheet> and <xsl:transform> are completely synonymous and either can be used!

The correct way to declare an XSL style sheet according to the W3C XSLT Recommendation is:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

or:

<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

To get access to the XSLT elements, attributes and features we must declare the XSLT namespace at the
top of the document.

The xmlns:xsl="http://www.w3.org/1999/XSL/Transform" points to the official W3C XSLT
namespace. If you use this namespace, you must also include the attribute version="1.0".

Start with a Raw XML Document

We want to transform the following XML document ("cdcatalog.xml") into XHTML:

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <cd>
    <title>Empire Burlesque</title>
    <artist>Bob Dylan</artist>
    <country>USA</country>
    <company>Columbia</company>
    <price>10.90</price>
    <year>1985</year>
  </cd>
.
.
</catalog>

Create an XSL Style Sheet

Then you create an XSL Style Sheet ("cdcatalog.xsl") with a transformation template:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
  <html>
  <body>
  <h2>My CD Collection</h2>
  <table border="1">
    <tr bgcolor="#9acd32">
      <th>Title</th>
      <th>Artist</th>
    </tr>
    <xsl:for-each select="catalog/cd">
    <tr>
      <td><xsl:value-of select="title"/></td>
      <td><xsl:value-of select="artist"/></td>
    </tr>
    </xsl:for-each>
  </table>
  </body>
  </html>
</xsl:template>
</xsl:stylesheet>
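
To make clear what the template does, the same steps can be written out by hand in Python. This is not XSLT and does not run the stylesheet (the Python standard library has no XSLT processor); it is a hand-written equivalent of the for-each and value-of steps, shown only as an illustration. The names CATALOG and catalog_to_html are hypothetical, and the catalog is cut down to the single cd shown earlier.

```python
import xml.etree.ElementTree as ET

CATALOG = b"""<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <cd>
    <title>Empire Burlesque</title>
    <artist>Bob Dylan</artist>
    <country>USA</country>
    <company>Columbia</company>
    <price>10.90</price>
    <year>1985</year>
  </cd>
</catalog>"""


def catalog_to_html(xml_bytes):
    """Build the same table the stylesheet describes and return it as text."""
    catalog = ET.fromstring(xml_bytes)

    # Fixed part of the template: page skeleton and table header.
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h2").text = "My CD Collection"
    table = ET.SubElement(body, "table", border="1")
    header = ET.SubElement(table, "tr", bgcolor="#9acd32")
    ET.SubElement(header, "th").text = "Title"
    ET.SubElement(header, "th").text = "Artist"

    # Counterpart of <xsl:for-each select="catalog/cd">.
    for cd in catalog.findall("cd"):
        row = ET.SubElement(table, "tr")
        # Counterpart of <xsl:value-of select="title"/> and select="artist".
        ET.SubElement(row, "td").text = cd.findtext("title")
        ET.SubElement(row, "td").text = cd.findtext("artist")

    return ET.tostring(html, encoding="unicode")


html_text = catalog_to_html(CATALOG)
print(html_text)
```

Each cd element in the input produces one table row, which is the behaviour the for-each loop in the stylesheet expresses declaratively.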
