L2 Mechanisms For Web 1

Lecture 2: Mechanisms for the
functioning of the Web
Dr. Victor ODUMUYIWA

vodumuyiwa@unilag.edu.ng
Learning Objectives
At the end of this lecture, you should be able to:
• Describe how the Web works
• Describe the three mechanisms upon which the Web relies
• Analyse a URL
WWW (Web)
The WWW (an Internet-based hypermedia initiative for global
information exchange) was invented by Tim Berners-Lee
in 1989 while he was working at CERN, the European Particle Physics
Laboratory: http://www.w3.org/People/Berners-Lee/
• Tim is a graduate of Oxford University
• He wrote the first web client and server in 1990
• He gave the initial specifications of URIs, HTTP, and HTML
• His specifications were refined as web technology spread
• He is the founder and director of the W3C
• He’s currently a professor at MIT, USA
WWW (Web) Contd.
• W3C (World Wide Web Consortium) seeks to standardize and improve web technologies and
other related issues
• W3C (founded in 1994) develops interoperable technologies (specifications, guidelines,
software, and tools) to lead the Web to its full potential
• The Web is based on the Internet. While W3C works on defining standards for the Web, IETF
focuses on making the Internet work better
• The mission of the IETF (Internet Engineering Task Force) is to make the Internet work better from an
engineering point of view
Mechanisms for Information Exchange
The Web relies on three mechanisms to make information exchange and
resource sharing possible:
• A uniform naming scheme for locating resources on the Web, e.g. URIs
• Protocols for access to named resources over the Web, e.g. HTTP
• Hypertext for easy navigation among resources, e.g. HTML
Uniform Naming Scheme
• A name is a logical way of referring to an object in some abstract name space
• The naming architecture given by the W3C (and submitted to the IETF as an RFC) is the URI (Uniform
Resource Identification)
• The URI architecture contains several naming schemes
• All the naming schemes are standardized (uniformised) under the URI
• When referring to the naming architecture, URI can be read as Uniform Resource
Identification; when URI refers to the identity of an individual resource on the Web, it means
Uniform Resource Identifier
URI
• A Uniform Resource Identifier (URI) is a compact sequence of characters that
identifies an abstract or physical resource (RFC 3986)
• URI provides a simple and extensible means for identifying a resource (RFC
3986)
• Every resource available on the Web -- HTML document, image, video clip,
program, audio file etc. -- has an address that may be encoded by a URI
Uniform (RFC3986)
Uniformity provides several benefits:
• It allows different types of resource identifiers to be used in the same context,
even when the mechanisms used to access those resources may differ
• It allows uniform semantic interpretation of common syntactic conventions
across different types of resource identifiers
• It allows introduction of new types of resource identifiers without interfering
with the way that existing identifiers are used
• It allows the identifiers to be reused in many different contexts, thus
permitting new applications or protocols to leverage a pre-existing, large, and
widely used set of resource identifiers.
Resource (RFC3986)
• The term "resource" is used in a general sense for whatever might be identified by a URI.
Familiar examples include an electronic document, an image, a source of information with a
consistent purpose (e.g., "today's weather report for Los Angeles"), a service (e.g., an HTTP-to-
SMS gateway), and a collection of other resources.
• A resource is not necessarily accessible via the Internet; e.g., human beings, corporations, and
bound books in a library can also be resources. Likewise, abstract concepts can be resources,
such as the operators and operands of a mathematical equation, the types of a relationship
(e.g., "parent" or "employee"), or numeric values (e.g., zero, one, and infinity).
Identifier (RFC3986)
• An identifier embodies the information required to distinguish what is being identified from
all other things within its scope of identification
• Our use of the terms "identify" and "identifying" refer to this purpose of distinguishing one
resource from all other resources, regardless of how that purpose is accomplished (e.g., by
name, address, or context)
• These terms should not be mistaken as an assumption that an identifier defines or embodies
the identity of what is referenced, though that may be the case for some identifiers.
• It should not be assumed that a system using URIs will access the resource identified: in many
cases, URIs are used to denote resources without any intention that they be accessed.
• Also, the "one" resource identified might not be singular in nature (e.g., a resource might be a
named set or a mapping that varies over time).
URI Syntax (RFC3986)
The generic URI syntax consists of a hierarchical sequence of components referred to as the
scheme, authority, path, query, and fragment.
URI = scheme:[//[user:password@]host[:port]][/]path[?query][#fragment]
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
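The five components named above can be pulled out of a URI with a few lines of Python. This is only a minimal sketch: it assumes the standard library function urllib.parse.urlsplit is an acceptable stand-in for the generic syntax, and the helper name uri_components is made up for illustration. The sample URIs are taken from the examples used in this lecture.

```python
from urllib.parse import urlsplit


def uri_components(uri: str) -> dict:
    """Split a URI into the five generic components (hypothetical helper)."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,  # user info, host and port, if any
        "path": parts.path,
        "query": parts.query,       # text after "?", without the "?"
        "fragment": parts.fragment, # text after "#", without the "#"
    }


if __name__ == "__main__":
    for uri in (
        "http://video.google.com/?hl=fr&tab=wv",
        "telnet://192.0.2.16:80/",
        "mailto:John.Doe@example.com",
    ):
        print(uri_components(uri))
```

Note that a URI such as mailto:John.Doe@example.com has no "//", so its authority is empty and everything after the scheme is treated as the path.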
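URIs and Their Component Parts (RFC3986)
[Figure: diagram of the URI components, as given in RFC 3986]
URIs and Their Component Parts (Wikipedia)
[Figure: diagram of the URI components, from Wikipedia]
URIs and Their Component Parts (Stack Overflow)
[Figure: diagram of the URI components, from Stack Overflow]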
Scheme
• The scheme consists of a sequence of characters beginning with a letter and followed by
any combination of letters, digits, plus (+), period (.), or hyphen (-).
• Examples of popular schemes include http, ftp, mailto, file, and data.
• URI schemes should be registered with the Internet Assigned Numbers Authority (IANA)
• Two slashes (//): required by some schemes and not by others. When
the authority component is absent, the path component cannot begin with two slashes.
(Wikipedia)
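The scheme rule above (a letter, then any mix of letters, digits, "+", "." and "-") translates directly into a regular expression. The sketch below is illustrative only: the function name is_valid_scheme is an assumption, and it checks the syntax alone, not whether the scheme is registered with IANA.

```python
import re

# A letter followed by any combination of letters, digits, "+", "." or "-".
SCHEME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(scheme: str) -> bool:
    """Return True if the text is syntactically a valid URI scheme (hypothetical helper)."""
    return SCHEME_PATTERN.fullmatch(scheme) is not None


if __name__ == "__main__":
    for candidate in ("http", "mailto", "svn+ssh", "1http", "my_scheme"):
        print(candidate, is_valid_scheme(candidate))
```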
Authority
• An authentication section (optional) of a user name and password, separated by a colon,
followed by an at symbol (@)
• A "host", consisting of either a registered name (including but not limited to a hostname), or
an IP address.
• IPv4 addresses must be in dot-decimal notation, and IPv6 addresses must be enclosed
in brackets ([ ])
• A port number (optional), separated from the hostname by a colon
(Wikipedia)
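As a rough sketch of how the authority breaks down, Python's urllib.parse.urlsplit exposes the user name, password, host and port as separate attributes. The helper name authority_parts and the ftp URI with credentials are invented for illustration; the other two URIs come from the RFC 3986 examples in this lecture.

```python
from urllib.parse import urlsplit


def authority_parts(uri: str) -> dict:
    """Break the authority of a URI into its sub-parts (hypothetical helper)."""
    parts = urlsplit(uri)
    return {
        "user": parts.username,      # None when no user name is given
        "password": parts.password,  # None when no password is given
        "host": parts.hostname,      # brackets around IPv6 addresses are removed
        "port": parts.port,          # an int, or None when no port is given
    }


if __name__ == "__main__":
    for uri in (
        "ftp://alice:secret@ftp.example.org:2121/pub/",  # made-up example
        "telnet://192.0.2.16:80/",
        "ldap://[2001:db8::7]/c=GB",
    ):
        print(authority_parts(uri))
```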
Path
• A path contains data usually organized in hierarchical form
• It appears as a sequence of segments separated by slashes.
• A path must begin with a single slash (/) if an authority part was present, and may also if one
was not, but must not begin with a double slash.
(Wikipedia)
Query and Fragment
• Query (optional), separated from the preceding part by a question mark (?)
• Most often a sequence of attribute–value pairs separated by a delimiter.
• A delimiter could be an ampersand (&) or a semicolon (;)
• Fragment (optional), separated from the preceding part by a hash (#).
• Contains a fragment identifier providing direction to a secondary resource, such as a
section heading in an article identified by the remainder of the URI.
(Wikipedia)
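The attribute–value structure of a query can be illustrated with urllib.parse.parse_qsl. This is a sketch under assumptions: the helper name query_pairs is hypothetical, and the separator argument used for semicolon-delimited queries is only available on recent Python versions.

```python
from urllib.parse import parse_qsl, urlsplit


def query_pairs(uri: str, separator: str = "&") -> list:
    """Return the query of a URI as a list of (attribute, value) pairs (hypothetical helper)."""
    query = urlsplit(uri).query
    # parse_qsl also decodes "+" as a space and %XX escapes.
    return parse_qsl(query, separator=separator)


if __name__ == "__main__":
    uri = "http://video.google.com/?hl=fr&tab=wv&q=victor+hugo"
    print(query_pairs(uri))

    # The fragment is everything after "#".
    print(urlsplit("http://brownmath.com/swt/chap09.htm#c09_DefMargerr").fragment)
```

A query that uses semicolons would be parsed with query_pairs(uri, separator=";").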
Examples (RFC3986)
http://ng.linkedin.com/pub/temitope-odumuyiwa/28/b51/807
http://video.google.com/?hl=fr&tab=wv
ftp://ftp.is.co.za/rfc/rfc1808.txt
http://www.ietf.org/rfc/rfc2396.txt
ldap://[2001:db8::7]/c=GB
mailto:John.Doe@example.com
news:comp.infosystems.www.servers.unix
tel:+1-816-555-1212
telnet://192.0.2.16:80/
urn:oasis:names:specification:docbook:dtd:xml:4.1.2
URL and URN (RFC3986)
• A URI can be further classified as a locator, a name, or both. The term "Uniform Resource
Locator" (URL) refers to the subset of URIs that, in addition to identifying a resource, provide
a means of locating the resource by describing its primary access mechanism (e.g., its
network "location").
• The term "Uniform Resource Name" (URN) has been used historically to refer to both URIs
under the "urn" scheme, which are required to remain globally unique and persistent even when the
resource ceases to exist or becomes unavailable, and to any other URI with the properties of a
name.
URL
A URL typically consists of three pieces:
• The naming scheme of the mechanism used to access the resource
• Each URI begins with a scheme name that refers to a specification for assigning identifiers within
that scheme
• The name of the machine hosting the resource
• The name of the resource itself, given as a path
Example: http://www.w3.org/People/Berners-Lee/
• Naming scheme: http
• Host machine: www.w3.org
• Name of the resource: /People/Berners-Lee/
In-class activity
Analyse the following URLs:
1 http://www.tc3.edu/instruct/sbrown/swt/symbol.htm
2 http://brownmath.com/swt/chap09.htm#c09_DefMargerr
3 http://video.google.com/?hl=fr&tab=wv&q=victor+hugo
4 http://mail.unilag.edu.ng:8080/zimbra/#1
5 https://www.google.com.ng/?gfe_rd=cr&ei=yQJbVbjYH_HH8geLlYDwCQ&gws_rd=ssl&q=statistical+symbols
Relative URIs
• A relative URI does not contain a naming scheme
• Its path generally refers to a resource on the same machine as the current document
• It may contain relative path components (e.g. ".." means one level up in the hierarchy defined by the path)
• <IMG src="../image/logo.gif" alt="logo">
• It may contain fragments
• It is resolved to a full URI using a base URI
Illustration
Given a base URI http://www.okoro.com/home/index.html
www.okoro.com is the authority while /home/index.html is the path to the
resource
<A href="about_us.html">About us</A>
resolves to http://www.okoro.com/home/about_us.html
<IMG src="../image/logo.gif" alt="logo">
resolves to http://www.okoro.com/image/logo.gif
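The resolution shown in this illustration can be reproduced with urllib.parse.urljoin from the Python standard library. This is a sketch, not a definitive implementation: it assumes the base URI carries the http scheme, and the names BASE_URI and resolve are made up for the example.

```python
from urllib.parse import urljoin

# Assumption: the base URI of the illustration, written with its http scheme.
BASE_URI = "http://www.okoro.com/home/index.html"


def resolve(relative_uri: str, base_uri: str = BASE_URI) -> str:
    """Resolve a relative URI against a base URI (hypothetical helper)."""
    return urljoin(base_uri, relative_uri)


if __name__ == "__main__":
    print(resolve("about_us.html"))      # same directory as index.html
    print(resolve("../image/logo.gif"))  # one level up, then into image/
```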
In-class Activity
Base URL = http://localhost/400level-2015-2016/relative/index.html
<img src = "/image/vicmykid.JPG">
<img src = "../image/vicmykid.JPG">
<img src = "./image/vicmykid.JPG">
<img src = "image/vicmykid.JPG">
<img src = ".../image/vicmykid.JPG">
Solution
Base URL = http://localhost/400level-2015-2016/relative/index.html
<img src = "/image/vicmykid.JPG">
http://localhost/image/vicmykid.JPG
<img src = "../image/vicmykid.JPG">
http://localhost/400level-2015-2016/image/vicmykid.JPG
<img src = "./image/vicmykid.JPG">
http://localhost/400level-2015-2016/relative/image/vicmykid.JPG
<img src = "image/vicmykid.JPG">
http://localhost/400level-2015-2016/relative/image/vicmykid.JPG
<img src = ".../image/vicmykid.JPG">
http://localhost/400level-2015-2016/relative/.../image/vicmykid.JPG
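To see why these answers come out as they do, the path part of the resolution can be written by hand. The sketch below is a simplified version of the merge and dot-segment removal steps described in RFC 3986. It assumes the base path starts with "/", it handles paths only (no scheme, query or fragment), and the name resolve_path is hypothetical.

```python
def resolve_path(base_path: str, reference: str) -> str:
    """Resolve a relative path against a base path (simplified, hypothetical helper)."""
    if reference.startswith("/"):
        merged = reference  # absolute path: the base path is ignored
    else:
        # Keep the base path up to and including its last "/", then append.
        merged = base_path[: base_path.rfind("/") + 1] + reference

    segments = merged.split("/")
    output = []
    for segment in segments:
        if segment == ".":
            continue  # "." means the current level: drop it
        if segment == "..":
            if len(output) > 1:  # never remove the leading empty segment
                output.pop()     # ".." means one level up
            continue
        output.append(segment)   # anything else, including "...", is kept
    if segments[-1] in (".", ".."):
        output.append("")        # keep a trailing slash
    return "/".join(output)


if __name__ == "__main__":
    base = "/400level-2015-2016/relative/index.html"
    for ref in ("/image/vicmykid.JPG", "../image/vicmykid.JPG",
                "./image/vicmykid.JPG", "image/vicmykid.JPG",
                ".../image/vicmykid.JPG"):
        print(ref, "->", resolve_path(base, ref))
```

Only "." and ".." have a special meaning. A segment of three dots is an ordinary name, which is why the last case keeps it in the resolved path.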
Common uses of relative URLs
• To facilitate porting from a test environment to a live environment
Problems with relative URLs
• Search engine optimization
• A "spidered" and indexed test environment, leading to massive duplicate content
issues
• Spider traps
Uses of URIs in HTML
URIs can be used in HTML to:

• link to another document or resource ( e.g. in <a> and <link> elements).
• link to an external style sheet or script (e.g. in <link> and <script> elements).
• include an image, object, or applet in a page, (e.g. in <img>, <object>, <applet> and <input>
elements).
• create an image map (e.g. in <map> and <area> elements).
• submit a form (e.g. in <form>).
• create a frame document (e.g. in <frame> and <iframe> elements).
• cite an external reference (e.g. in <q>, <blockquote>, <ins> and <del> elements).
• refer to metadata conventions describing a document (e.g. in <head> element).
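As a small illustration of where URIs appear in a page, the sketch below walks through an HTML fragment and lists the values of a few URI-carrying attributes. It is an assumption-laden example: the attribute set (href, src, action, cite) covers only some of the elements listed above, and the names URICollector and collect_uris are invented for the example.

```python
from html.parser import HTMLParser

# Assumption: a small, non-exhaustive set of attributes whose value is a URI.
URI_ATTRIBUTES = {"href", "src", "action", "cite"}


class URICollector(HTMLParser):
    """Collect (tag, attribute, URI) triples from start tags (hypothetical class)."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        for name, value in attrs:
            if name in URI_ATTRIBUTES and value:
                self.found.append((tag, name, value))


def collect_uris(html: str) -> list:
    collector = URICollector()
    collector.feed(html)
    collector.close()
    return collector.found


if __name__ == "__main__":
    page = (
        '<A href="about_us.html">About us</A>'
        '<IMG src="../image/logo.gif" alt="logo">'
        '<form action="/submit"></form>'
    )
    for triple in collect_uris(page):
        print(triple)
```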