L2 Mechanisms For Web 1
• Analyse a URL
WWW (Web)
The WWW (an Internet-based hypermedia initiative for global
information exchange) was invented by Tim Berners-Lee
in 1989 while working at CERN, the European Particle Physics
Laboratory. http://www.w3.org/People/Berners-Lee/
WWW (Web) Contd.
• The W3C (World Wide Web Consortium) seeks to standardize and improve web technologies and
related issues
• The Web is built on the Internet. While the W3C defines standards for the Web, the IETF
works on the Internet itself
• The IETF (Internet Engineering Task Force) mission is to make the Internet work better from an
engineering point of view
Mechanisms for Information Exchange
• A uniform naming scheme for locating resources on the Web e.g. URIs
• Protocols for access to named resources over the Web e.g. HTTP
Uniform Naming Scheme
• The naming architecture given by the W3C (and submitted to the IETF as an RFC) is the URI: Uniform
Resource Identification
• All naming schemes are standardized (made uniform) under the URI
• When referring to the naming architecture, URI can be read as Uniform Resource
Identification; when URI refers to the identity of an individual resource on the Web, it means
Uniform Resource Identifier
URI
• A URI provides a simple and extensible means for identifying a resource (RFC 3986)
• Every resource available on the Web -- HTML document, image, video clip,
program, audio file etc. -- has an address that may be encoded by a URI
Uniform (RFC3986)
Resource (RFC3986)
• The term "resource" is used in a general sense for whatever might be identified by a URI.
Familiar examples include an electronic document, an image, a source of information with a
consistent purpose (e.g., "today's weather report for Los Angeles"), a service (e.g., an HTTP-to-
SMS gateway), and a collection of other resources.
• A resource is not necessarily accessible via the Internet; e.g., human beings, corporations, and
bound books in a library can also be resources. Likewise, abstract concepts can be resources,
such as the operators and operands of a mathematical equation, the types of a relationship
(e.g., "parent" or "employee"), or numeric values (e.g., zero, one, and infinity).
Identifier (RFC3986)
• An identifier embodies the information required to distinguish what is being identified from
all other things within its scope of identification
• Our use of the terms "identify" and "identifying" refer to this purpose of distinguishing one
resource from all other resources, regardless of how that purpose is accomplished (e.g., by
name, address, or context)
• These terms should not be mistaken as an assumption that an identifier defines or embodies
the identity of what is referenced, though that may be the case for some identifiers.
• It should not be assumed that a system using URIs will access the resource identified: in many
cases, URIs are used to denote resources without any intention that they be accessed.
• Also, the "one" resource identified might not be singular in nature (e.g., a resource might be a
named set or a mapping that varies over time).
URI Syntax (RFC3986)
The generic URI syntax consists of a hierarchical sequence of components referred to as the
scheme, authority, path, query, and fragment.
URI = scheme:[//[user:password@]host[:port]][/]path[?query][#fragment]
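The generic syntax above can be explored with Python's standard `urllib.parse` module. This is a minimal sketch: the URI is a hypothetical one chosen only because it exercises all five components, and `urlsplit` is one convenient parser, not the only way to decompose a URI.

```python
from urllib.parse import urlsplit

# Hypothetical URI (not from the slides) that contains all five components.
uri = "http://www.example.com:8080/docs/intro.html?lang=en#section2"
parts = urlsplit(uri)

print("scheme:   ", parts.scheme)    # http
print("authority:", parts.netloc)    # www.example.com:8080
print("path:     ", parts.path)      # /docs/intro.html
print("query:    ", parts.query)     # lang=en
print("fragment: ", parts.fragment)  # section2
```

Note that `urlsplit` reports the authority under the name `netloc`, and that the delimiters (`:`, `//`, `?`, `#`) are not part of the returned components.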
URIs and Their Component Parts (RFC3986)
URIs and Their Component Parts (Wikipedia)
URIs and Their Component Parts (Stack Overflow)
Scheme
• The scheme consists of a sequence of characters beginning with a letter and followed by
any combination of letters, digits, plus (+), period (.), or hyphen (-).
• Examples of popular schemes include http, ftp, mailto, file, and data.
• URI schemes should be registered with the Internet Assigned Numbers Authority (IANA)
• Two slashes (//) : this is required by some schemes and not required by some others. When
the authority component is absent, the path component cannot begin with two slashes.
(Wikipedia)
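The character rule for scheme names can be written as a regular expression. The sketch below is illustrative only; the helper name `is_valid_scheme` is an assumption made for this example. It checks the syntax, not whether the scheme is registered with IANA.

```python
import re

# A letter, followed by any mix of letters, digits, "+", "." or "-".
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")

def is_valid_scheme(name: str) -> bool:
    """Return True if `name` is syntactically valid as a URI scheme."""
    return SCHEME_RE.fullmatch(name) is not None

print(is_valid_scheme("http"))       # True
print(is_valid_scheme("svn+ssh"))    # True: "+" is allowed after the first letter
print(is_valid_scheme("1http"))      # False: must begin with a letter
print(is_valid_scheme("my_scheme"))  # False: "_" is not an allowed character
```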
Authority
• A "host", consisting of either a registered name (including but not limited to a hostname), or
an IP address.
• IPv4 addresses must be in dot-decimal notation, and IPv6 addresses must be enclosed
in brackets ([ ])
• A port number (optional), separated from the hostname by a colon
(Wikipedia)
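As a rough illustration of the host and port rules, the sketch below splits the authority of three URIs that appear elsewhere in these slides, using Python's `urllib.parse.urlsplit`. The variable names are arbitrary.

```python
from urllib.parse import urlsplit

uris = [
    "ldap://[2001:db8::7]/c=GB",                 # IPv6 host in brackets, no port
    "telnet://192.0.2.16:80/",                   # IPv4 host in dot-decimal, port 80
    "http://mail.unilag.edu.ng:8080/zimbra/#1",  # registered name, port 8080
]

results = []
for uri in uris:
    parts = urlsplit(uri)
    # hostname drops the IPv6 brackets; port is None when absent.
    results.append((parts.netloc, parts.hostname, parts.port))
    print(parts.netloc, "->", parts.hostname, parts.port)

# [2001:db8::7] -> 2001:db8::7 None
# 192.0.2.16:80 -> 192.0.2.16 80
# mail.unilag.edu.ng:8080 -> mail.unilag.edu.ng 8080
```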
Path
• A path must begin with a single slash (/) if an authority part was present, and may also if one
was not, but must not begin with a double slash.
(Wikipedia)
Query and Fragment
• Query (optional), separated from the preceding part by a question mark (?)
• most often a sequence of attribute–value pairs separated by a delimiter.
• A delimiter could be an ampersand (&) or a semicolon (;)
• Fragment (optional), separated from the preceding part by a hash (#)
(Wikipedia)
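A short sketch of how a query string breaks down into attribute-value pairs, and how a fragment is picked out, using two URIs from the in-class activity in these slides. It relies on `parse_qs` from Python's standard library and assumes the ampersand as the delimiter.

```python
from urllib.parse import urlsplit, parse_qs

# URI with a query: three attribute-value pairs separated by "&".
with_query = urlsplit("http://video.google.com/?hl=fr&tab=wv&q=victor+hugo")
params = parse_qs(with_query.query)
print(params)  # {'hl': ['fr'], 'tab': ['wv'], 'q': ['victor hugo']}

# URI with a fragment and no query.
with_fragment = urlsplit("http://brownmath.com/swt/chap09.htm#c09_DefMargerr")
print(repr(with_fragment.query))     # ''
print(with_fragment.fragment)        # c09_DefMargerr
```

Each value is a list because the same attribute may legally appear more than once in a query; `parse_qs` also decodes `+` as a space.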
Examples (RFC3986)
http://ng.linkedin.com/pub/temitope-odumuyiwa/28/b51/807
http://video.google.com/?hl=fr&tab=wv
ftp://ftp.is.co.za/rfc/rfc1808.txt
http://www.ietf.org/rfc/rfc2396.txt
ldap://[2001:db8::7]/c=GB
mailto:John.Doe@example.com
news:comp.infosystems.www.servers.unix
tel:+1-816-555-1212
telnet://192.0.2.16:80/
urn:oasis:names:specification:docbook:dtd:xml:4.1.2
URL and URN (RFC3986)
• A URI can be further classified as a locator, a name, or both. The term "Uniform Resource
Locator" (URL) refers to the subset of URIs that, in addition to identifying a resource, provide
a means of locating the resource by describing its primary access mechanism (e.g., its
network "location").
• The term "Uniform Resource Name" (URN) has been used historically to refer both to URIs
under the "urn" scheme, which are required to remain globally unique and persistent even when the
resource ceases to exist or becomes unavailable, and to any other URI with the properties of a
name.
URL
• Each URI begins with a scheme name that refers to a specification for assigning identifiers within
that scheme
http://www.w3.org/People/Berners-Lee/
Naming scheme: http
Host machine: www.w3.org
Name of the resource: /People/Berners-Lee/
In-class activity
1 http://www.tc3.edu/instruct/sbrown/swt/symbol.htm
2 http://brownmath.com/swt/chap09.htm#c09_DefMargerr
3 http://video.google.com/?hl=fr&tab=wv&q=victor+hugo
4 http://mail.unilag.edu.ng:8080/zimbra/#1
5 https://www.google.com.ng/?gfe_rd=cr&ei=yQJbVbjYH_HH8geLlYDwCQ&gws_rd=ssl&q=statistical+symbols
Relative URIs
• A relative URI does not contain a naming scheme
• Its path generally refers to a resource on the same machine as the current document
• It may contain relative path components (e.g. ".." meaning one level up in the hierarchy defined by the path)
• <IMG src="../image/logo.gif" alt="logo">
• It may contain a fragment
• It is resolved to a full URI using a base URI
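Resolution against a base URI can be sketched with `urljoin` from Python's standard library. The base and the first two relative references are those of this section's illustration, with an http scheme assumed for the base; the `#contact` fragment is a hypothetical addition to show a fragment-only reference.

```python
from urllib.parse import urljoin

# Base URI of the current document (http scheme assumed).
base = "http://www.okoro.com/home/index.html"

sibling = urljoin(base, "about_us.html")      # same directory as the base
parent = urljoin(base, "../image/logo.gif")   # ".." climbs one level up
fragment = urljoin(base, "#contact")          # hypothetical fragment-only reference

print(sibling)   # http://www.okoro.com/home/about_us.html
print(parent)    # http://www.okoro.com/image/logo.gif
print(fragment)  # http://www.okoro.com/home/index.html#contact
```

In each case the scheme and authority come from the base URI, since the relative reference supplies neither.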
Illustration
Given a base URI http://www.okoro.com/home/index.html
www.okoro.com is the authority while /home/index.html is the path to the
resource
<A href="about_us.html">About us</A>
resolves to "http://www.okoro.com/home/about_us.html"
<IMG src="../image/logo.gif" alt="logo">
resolves to "http://www.okoro.com/image/logo.gif"
In-class Activity
• Spider traps
Uses of URIs in HTML