
Lecture 2: Mechanisms for the functioning of the Web

Dr. Victor ODUMUYIWA


vodumuyiwa@unilag.edu.ng
Learning Objectives

At the end of this lecture, you should be able to:

• Describe how the Web works

• Describe the three mechanisms upon which the Web relies

• Analyse a URL
WWW (Web)
The WWW (an Internet-based hypermedia initiative for global
information exchange) was invented by Tim Berners-Lee
in 1989 while he was working at CERN, the European Particle Physics
Laboratory. http://www.w3.org/People/Berners-Lee/

• Tim is a graduate of Oxford University


• He wrote the first web client and server in 1990
• He gave the initial specifications of URIs, HTTP, and HTML
• His specifications were refined as web technology spread
• He is the founder and director of the W3C
• He’s currently a professor at MIT, USA

WWW (Web) Contd.

• W3C (World Wide Web Consortium) seeks to standardize and improve web technologies and
other related issues

• W3C (founded in 1994) develops interoperable technologies (specifications, guidelines,
software, and tools) to lead the Web to its full potential

• The Web is built on the Internet. While the W3C works on defining standards for the Web, the
IETF (Internet Engineering Task Force) works on the Internet itself

• The mission of the IETF is to make the Internet work better from an engineering point of view

Mechanisms for Information Exchange

The Web relies on three mechanisms to make information exchange and
resource sharing possible:

• A uniform naming scheme for locating resources on the Web e.g. URIs

• Protocols for access to named resources over the Web e.g. HTTP

• Hypertext for easy navigation among resources e.g. HTML

Uniform Naming Scheme

• A name is a logical way of referring to an object in some abstract name space

• The naming architecture given by the W3C (and submitted to the IETF as an RFC) is the URI
(Uniform Resource Identification)

• The URI architecture contains several naming schemes

• All the naming schemes are standardized (uniformised) under the URI

• When referring to the naming architecture, URI can be read as Uniform Resource
Identification. When URI refers to the identity of an individual resource on the Web, it
stands for Uniform Resource Identifier

URI

• A Uniform Resource Identifier (URI) is a compact sequence of characters that
identifies an abstract or physical resource (RFC 3986)

• URI provides a simple and extensible means for identifying a resource (RFC
3986)

• Every resource available on the Web -- HTML document, image, video clip,
program, audio file etc. -- has an address that may be encoded by a URI

Uniform (RFC3986)

Uniformity provides several benefits:

• It allows different types of resource identifiers to be used in the same context,
even when the mechanisms used to access those resources may differ
• It allows uniform semantic interpretation of common syntactic conventions
across different types of resource identifiers
• It allows introduction of new types of resource identifiers without interfering
with the way that existing identifiers are used
• It allows the identifiers to be reused in many different contexts, thus
permitting new applications or protocols to leverage a pre-existing, large, and
widely used set of resource identifiers.

Resource (RFC3986)

• The term "resource" is used in a general sense for whatever might be identified by a URI.
Familiar examples include an electronic document, an image, a source of information with a
consistent purpose (e.g., "today's weather report for Los Angeles"), a service (e.g., an HTTP-to-
SMS gateway), and a collection of other resources.

• A resource is not necessarily accessible via the Internet; e.g., human beings, corporations, and
bound books in a library can also be resources. Likewise, abstract concepts can be resources,
such as the operators and operands of a mathematical equation, the types of a relationship
(e.g., "parent" or "employee"), or numeric values (e.g., zero, one, and infinity).

Identifier (RFC3986)

• An identifier embodies the information required to distinguish what is being identified from
all other things within its scope of identification

• Our use of the terms "identify" and "identifying" refer to this purpose of distinguishing one
resource from all other resources, regardless of how that purpose is accomplished (e.g., by
name, address, or context)

• These terms should not be mistaken as an assumption that an identifier defines or embodies
the identity of what is referenced, though that may be the case for some identifiers.

• It should not be assumed that a system using URIs will access the resource identified: in many
cases, URIs are used to denote resources without any intention that they be accessed.

• Also, the "one" resource identified might not be singular in nature (e.g., a resource might be a
named set or a mapping that varies over time).
URI Syntax (RFC3986)

The generic URI syntax consists of a hierarchical sequence of components referred to as the
scheme, authority, path, query, and fragment.

URI = scheme:[//[user:password@]host[:port]][/]path[?query][#fragment]

URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
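As a rough illustration of the generic syntax above, the sketch below splits one URI into its five components with Python's standard `urllib.parse` module. The sample URI is the one RFC 3986 uses in its own component diagram; the choice of Python is an assumption made purely for illustration.

```python
from urllib.parse import urlsplit

# Sample URI taken from RFC 3986, section 3
uri = "foo://example.com:8042/over/there?name=ferret#nose"

parts = urlsplit(uri)

print("scheme:   ", parts.scheme)    # foo
print("authority:", parts.netloc)    # example.com:8042
print("path:     ", parts.path)      # /over/there
print("query:    ", parts.query)     # name=ferret
print("fragment: ", parts.fragment)  # nose
```

Note that `urlsplit` calls the authority component `netloc`; the other attribute names match the RFC terms.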

URIs and Their Component Parts (figures from RFC 3986, Wikipedia, and Stack Overflow)
Scheme

• The scheme consists of a sequence of characters beginning with a letter and followed by
any combination of letters, digits, plus (+), period (.), or hyphen (-).

• Examples of popular schemes include http, ftp, mailto, file, and data.

• URI schemes should be registered with the Internet Assigned Numbers Authority (IANA)

• Two slashes (//): required by some schemes and not by others. When
the authority component is absent, the path component cannot begin with two slashes.

(Wikipedia)
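The scheme syntax rule above can be checked with a short regular expression. This is a minimal sketch: the function name `is_valid_scheme` and the sample strings are hypothetical, and the check covers syntax only, not whether a scheme is registered with IANA.

```python
import re

# A letter first, then any mix of letters, digits, "+", "." or "-"
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(name: str) -> bool:
    """Return True if name follows the scheme syntax described above."""
    return SCHEME_RE.fullmatch(name) is not None


# Arbitrary sample strings, chosen only to exercise the rule
for candidate in ["http", "mailto", "svn+ssh", "3gp", "my_scheme"]:
    print(f"{candidate:<10} {is_valid_scheme(candidate)}")
```

The last two samples are rejected: "3gp" starts with a digit, and "my_scheme" contains an underscore, which is not among the permitted characters.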
Authority

• An authentication section (optional) of a user name and password, separated by a colon,
followed by an at symbol (@)

• A "host", consisting of either a registered name (including but not limited to a hostname), or
an IP address.
• IPv4 addresses must be in dot-decimal notation, and IPv6 addresses must be enclosed
in brackets ([ ])
• A port number (optional), separated from the hostname by a colon

(Wikipedia)
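To make the parts of the authority concrete, the sketch below pulls the user information, host, and port out of three URIs using Python's `urllib.parse`. The first URI (with the user name `alice` and password `secret`) is a made-up example; the other two appear among the RFC 3986 examples. The helper name `describe_authority` is hypothetical.

```python
from urllib.parse import urlsplit


def describe_authority(uri: str) -> dict:
    """Break the authority of a URI into user, password, host and port."""
    parts = urlsplit(uri)
    return {
        "authority": parts.netloc,
        "user": parts.username,   # None when no user name is given
        "password": parts.password,
        "host": parts.hostname,   # brackets around IPv6 addresses are removed
        "port": parts.port,       # None when no port is given
    }


samples = [
    "ftp://alice:secret@ftp.example.com:2121/pub/file.txt",  # hypothetical
    "ldap://[2001:db8::7]/c=GB",
    "telnet://192.0.2.16:80/",
]

for uri in samples:
    print(describe_authority(uri))
```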
Path

• A path contains data usually organized in hierarchical form

• It appears as a sequence of segments separated by slashes.

• A path must begin with a single slash (/) if an authority part was present, and may also if one
was not, but must not begin with a double slash.

(Wikipedia)
Query and Fragment

• Query (optional), separated from the preceding part by a question mark (?)
• Most often a sequence of attribute–value pairs separated by a delimiter.
• A delimiter could be an ampersand (&) or a semicolon (;)

• Fragment (optional), separated from the preceding part by a hash (#).
• Contains a fragment identifier providing direction to a secondary resource, such as a
section heading in an article identified by the remainder of the URI.

(Wikipedia)
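The attribute–value structure of a query can be seen by decoding one. The sketch below uses Python's `urllib.parse`; the URI is a hypothetical example, not one from the lecture. Recent Python versions split on the ampersand only by default, so the example sticks to "&" as the delimiter.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URI with a two-pair query and a fragment
uri = "http://www.example.com/search?hl=fr&q=victor+hugo#results"

parts = urlsplit(uri)

# parse_qs maps each attribute to a list of values and decodes "+" as a space
params = parse_qs(parts.query)

print(parts.query)     # hl=fr&q=victor+hugo
print(params)          # {'hl': ['fr'], 'q': ['victor hugo']}
print(parts.fragment)  # results
```

Each value is a list because the same attribute may appear more than once in a query.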
Examples (RFC3986)

http://ng.linkedin.com/pub/temitope-odumuyiwa/28/b51/807
http://video.google.com/?hl=fr&tab=wv
ftp://ftp.is.co.za/rfc/rfc1808.txt
http://www.ietf.org/rfc/rfc2396.txt
ldap://[2001:db8::7]/c=GB
mailto:John.Doe@example.com
news:comp.infosystems.www.servers.unix
tel:+1-816-555-1212
telnet://192.0.2.16:80/
urn:oasis:names:specification:docbook:dtd:xml:4.1.2
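One way to read the examples above is to note which ones carry an authority component and which do not. The sketch below runs each through Python's `urlsplit` and prints the scheme, the authority (if any), and the path. It is an illustration only; the column layout is arbitrary.

```python
from urllib.parse import urlsplit

examples = [
    "http://ng.linkedin.com/pub/temitope-odumuyiwa/28/b51/807",
    "http://video.google.com/?hl=fr&tab=wv",
    "ftp://ftp.is.co.za/rfc/rfc1808.txt",
    "http://www.ietf.org/rfc/rfc2396.txt",
    "ldap://[2001:db8::7]/c=GB",
    "mailto:John.Doe@example.com",
    "news:comp.infosystems.www.servers.unix",
    "tel:+1-816-555-1212",
    "telnet://192.0.2.16:80/",
    "urn:oasis:names:specification:docbook:dtd:xml:4.1.2",
]

# One (scheme, authority) pair per example; authority is "" when absent
summary = [(urlsplit(uri).scheme, urlsplit(uri).netloc) for uri in examples]

for uri, (scheme, authority) in zip(examples, summary):
    print(f"{scheme:<7} {authority or '(no authority)':<20} {urlsplit(uri).path}")
```

The mailto, news, tel, and urn examples have no "//" and therefore no authority; everything after the scheme is treated as the path.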

URL and URN (RFC3986)

• A URI can be further classified as a locator, a name, or both. The term "Uniform Resource
Locator" (URL) refers to the subset of URIs that, in addition to identifying a resource, provide
a means of locating the resource by describing its primary access mechanism (e.g., its
network "location").

• The term "Uniform Resource Name" (URN) has been used historically to refer to both URIs
under the "urn" scheme, which are required to remain globally unique and persistent even when the
resource ceases to exist or becomes unavailable, and to any other URI with the properties of a
name.

URL

A URL typically consists of three pieces:

• The naming scheme of the mechanism used to access the resource

• Each URI begins with a scheme name that refers to a specification for assigning identifiers within
that scheme

• The name of the machine hosting the resource

• The name of the resource itself, given as a path

http://www.w3.org/People/Berners-Lee/
• Naming scheme: http
• Host machine: www.w3.org
• Name of the resource: /People/Berners-Lee/

In-class activity

Analyse the following URLs:

1 http://www.tc3.edu/instruct/sbrown/swt/symbol.htm

2 http://brownmath.com/swt/chap09.htm#c09_DefMargerr

3 http://video.google.com/?hl=fr&tab=wv&q=victor+hugo

4 http://mail.unilag.edu.ng:8080/zimbra/#1

5 https://www.google.com.ng/?gfe_rd=cr&ei=yQJbVbjYH_HH8geLlYDwCQ&gws_rd=ssl&q=statistical+symbols
Relative URIs
• Do not contain any naming scheme
• Their paths generally refer to a resource on the same machine as the current document
• May contain relative path components (e.g. ".." meaning one level up in the hierarchy defined by the path)
• Example: <IMG src="../image/logo.gif" alt="logo">
• May contain fragments
• Are resolved to full URIs using a base URI
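
Resolution against a base URI can be tried out with `urljoin` from Python's standard library. This is a minimal sketch: the base URI and the three references are hypothetical and chosen to show a sibling document, a ".." segment, and a fragment-only reference.

```python
from urllib.parse import urljoin

# Hypothetical base URI: the address of the document containing the links
base = "http://www.example.com/docs/guide/index.html"

references = ["page2.html", "../img/logo.gif", "#top"]

# Map each relative reference to the full URI it resolves to
resolved = {ref: urljoin(base, ref) for ref in references}

for ref, full in resolved.items():
    print(f"{ref:<16} -> {full}")
# page2.html       -> http://www.example.com/docs/guide/page2.html
# ../img/logo.gif  -> http://www.example.com/docs/img/logo.gif
# #top             -> http://www.example.com/docs/guide/index.html#top
```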

Illustration
Given a base URI http://www.okoro.com/home/index.html
www.okoro.com is the authority while /home/index.html is the path to the
resource
<A href="about_us.html">About us</A>
resolves to "http://www.okoro.com/home/about_us.html"
<IMG src="../image/logo.gif" alt="logo">
resolves to "http://www.okoro.com/image/logo.gif"
In-class Activity

Base URL = http://localhost/400level-2015-2016/relative/index.html

<img src = "/image/vicmykid.JPG">
<img src = "../image/vicmykid.JPG">
<img src = "./image/vicmykid.JPG">
<img src = "image/vicmykid.JPG">
<img src = ".../image/vicmykid.JPG">
Solution
Base URL = http://localhost/400level-2015-2016/relative/index.html

<img src = "/image/vicmykid.JPG">
http://localhost/image/vicmykid.JPG
<img src = "../image/vicmykid.JPG">
http://localhost/400level-2015-2016/image/vicmykid.JPG
<img src = "./image/vicmykid.JPG">
http://localhost/400level-2015-2016/relative/image/vicmykid.JPG
<img src = "image/vicmykid.JPG">
http://localhost/400level-2015-2016/relative/image/vicmykid.JPG
<img src = ".../image/vicmykid.JPG">
http://localhost/400level-2015-2016/relative/.../image/vicmykid.JPG
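
The answers above follow from two steps: merge the reference with the base path, then remove the "." and ".." segments. The hand-written sketch below shows those two steps for path-only references. It is a simplification, not the full algorithm of RFC 3986 section 5.2: it ignores queries, fragments, and several edge cases, and the function name `resolve_path` is hypothetical.

```python
def resolve_path(base_path: str, reference: str) -> str:
    """Resolve a relative path reference against a base path (simplified)."""
    if reference.startswith("/"):
        # Reference starting with "/": the base path is ignored entirely
        segments = reference.split("/")
    else:
        # Drop the last segment of the base (the document name), then append
        segments = base_path.split("/")[:-1] + reference.split("/")

    resolved = []
    for segment in segments:
        if segment == ".":
            continue                 # "." means the current level
        if segment == "..":
            if len(resolved) > 1:    # never climb above the root
                resolved.pop()
            continue
        resolved.append(segment)     # any other segment, even "...", is kept
    return "/".join(resolved)


BASE_PATH = "/400level-2015-2016/relative/index.html"

for ref in ["/image/vicmykid.JPG", "../image/vicmykid.JPG",
            "./image/vicmykid.JPG", "image/vicmykid.JPG",
            ".../image/vicmykid.JPG"]:
    print(f"{ref:<24} -> http://localhost{resolve_path(BASE_PATH, ref)}")
```

The last case shows why "..." does not move up two levels: only "." and ".." have special meaning, so "..." is treated as an ordinary segment name.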
Common uses of relative URLs

• To facilitate porting from a test environment to a live environment

Problems with relative URLs

• Search engine optimization

• “Spidered” and indexed test environments, leading to massive duplicate content issues

• Spider traps
Uses of URIs in HTML

URIs can be used in HTML to:


• link to another document or resource (e.g. in <a> and <link> elements).
• link to an external style sheet or script (e.g. in <link> and <script> elements).
• include an image, object, or applet in a page (e.g. in <img>, <object>, <applet> and <input>
elements).
• create an image map (e.g. in <map> and <area> elements).
• submit a form (e.g. in <form>).
• create a frame document (e.g. in <frame> and <iframe> elements).
• cite an external reference (e.g. in <q>, <blockquote>, <ins> and <del> elements).
• refer to metadata conventions describing a document (e.g. in <head> element).
