
Internet and Web Based Technology

http://144.16.192.60/~isg/IWT/
About the Course
• I will be covering half the course (2 hours / week)
– Tuesday 9:30 AM – 11:25 AM
• Topics to be covered
– How Internet works, HTML, HTTP, CGI scripts, PERL, etc.
– Basic concepts of cryptology
– Network security protocols, firewall, NAT, etc.
• Details would be available on the web site.
• What to expect?
– Self-study materials will be prescribed all throughout the
course, from which questions will be set.
– Assignments:
• In groups of two, students will be assigned a Term
Paper and a Programming Assignment.

Internet & Web Based Technology 2


• Attendance is mandatory
– If the cumulative attendance of a student falls below 75%, it
will lead to immediate deregistration.
– Proxy attendance, if detected, will lead to deregistration
and subsequent disciplinary actions.
• Other requirements
– Satisfactory completion of Term Paper and Assignment is
essential, failing which a student will get “F” grade.
– Term Paper:
• A topic will be assigned to a group of two students. A
comprehensive study report of 25-30 pages (11 point,
1.5 spacing) will have to be submitted.



– Programming Assignment
• A non-trivial programming problem will be given to a
group. It will have to be implemented and demonstrated.
• Typical example:
– “Design and implement a web based email client
that supports attachments”.



Introduction
Internetworking: Basic Concepts
• Computer Network
– a communication system for connecting end-systems
(hosts)
• Local Area Network (LAN)
– connects hosts within a relatively small geographical area
– room, building, campus
• Wide Area Network (WAN)
– hosts may be widely dispersed
– across buildings, cities, countries



What is Internet?
• The network formed by the co-operative
interconnection of a large number of computer
networks.
– Network of Networks.
– No one owns the internet
• every person who makes a connection owns a slice of
the Internet
– There is no central administration to the Internet



[Figure: several networks interconnected through a common BACKBONE]


What is it actually?
• A community of people who use and develop the
networks.
• A collection of resources that can be reached from
those networks.
• A setup to facilitate collaboration among members
of the research and educational communities,
world-wide.
• The connected networks use the TCP/IP protocol.



Growth of Internet
[Figures not reproduced]
How Data Flows?
• Packet Switching
– Internet uses TCP/IP protocol.
– TCP/IP uses packet switching.
• A message is broken down into smaller packets.
– A packet is a self-contained bundle of data sent over the
network.
• Generally less than 1500 bytes long.
– Each packet contains
• Address of origin
• Address of destination
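
The splitting of a message into packets can be sketched as follows. This is only an illustration: the `Packet` class, the `sequence` field and the 1500-byte payload limit are assumptions made for the example, not the actual TCP/IP packet layout.

```python
from dataclasses import dataclass

MAX_PAYLOAD = 1500  # assumed upper bound on the data bytes carried per packet


@dataclass
class Packet:
    source: str       # address of origin
    destination: str  # address of destination
    sequence: int     # position of this piece in the message (assumed field)
    data: bytes       # the piece of the message carried by this packet


def packetize(message: bytes, source: str, destination: str,
              max_payload: int = MAX_PAYLOAD) -> list[Packet]:
    """Break a message into packets of at most max_payload bytes each."""
    return [
        Packet(source, destination, seq, message[start:start + max_payload])
        for seq, start in enumerate(range(0, len(message), max_payload))
    ]


def reassemble(packets: list[Packet]) -> bytes:
    """Rebuild the message, even if the packets arrived out of order."""
    return b"".join(p.data for p in sorted(packets, key=lambda p: p.sequence))
```

Because every packet carries both addresses, each one can travel independently; the sequence number is what lets the receiver put the pieces back in order.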



Packet
[Figure: a MESSAGE is split into packets; each packet consists of a HEADER followed by DATA]


World Wide Web (WWW)
• WWW is an Internet “organizer”.
– Proposed in 1989 by Tim Berners-Lee at CERN.
– Internet browsers (Mosaic, Netscape, Internet Explorer,
etc.) developed to make use of WWW easier.
• Based on client-server technology.
– The server is a computer (hardware and software)
providing access to the data.
– The client is the software that allows users to access the
data.



The Inside Story
• Interconnected web of documents.
– Billions of them around.
• Where do the documents reside?
– On web (http) servers.
– http stands for Hyper Text Transfer Protocol
• They are written in
– html, typically
– html stands for Hyper Text Markup Language
• Documents get formatted/displayed using
– Web browsers (Netscape, Mosaic, Explorer)
– WWW clients



Illustration
[Figure: a web client sends http requests to web servers and receives http responses]


Topics for Self-study
• Hyper Text Markup Language
– http://www.w3schools.com/html/default.asp
• Hyper Text Transfer Protocol
– http://www.comptechdoc.org/independent/web/http/reference/
– http://www.jmarshall.com/easy/http/



HTML Forms
Introduction

• Provides two-way communication between web
servers and browsers.
– Demand for most of the emerging applications.
– Provides dynamic contents.

[Figure: two-way communication between the WEB BROWSER and the WEB SERVER]


What is an HTML FORM?

• A form basically contains boxes and buttons.
– Real-life examples:
• Search engines
• On-line purchase of items
• Registration
– The form allows a user to fill up the blank entries and send
it back to the owner of the page.
• Called SUBMITTING the form.



FORM Example



FORM Tags and Attributes

• Several tags are used in connection with forms:


<form> …… </form>
<input>
<textarea> …… </textarea>
<select> …… </select>



<FORM> …… </FORM>

• This tag is used to bracket an HTML form.
– Includes attributes which specify where and how to deliver
filled-up information to the web server.
• Two main attributes:
– METHOD
– ACTION



• METHOD:
– Indicates how the information in the form will be sent to the
web server when the form is submitted.
– Two possible values:
• POST: sends the field names and values in the body of the
request, separate from the URL.
• GET: concatenates all field names and values into a single
string, which is appended to the URL.
– POST is the preferred method for large forms, because many
systems limit the length of a URL.
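
As a rough sketch of the difference between the two methods, the Python fragment below encodes a few form fields with the standard `urllib.parse.urlencode` function. The field names, values and the `/cgi-bin/myprog.pl` path are made up for illustration.

```python
from urllib.parse import urlencode

# Hypothetical form fields, as (name, value) pairs filled in by the user.
fields = [("name", "A. Kumar"), ("rollno", "7312"), ("course1", "CS101")]

# Both methods encode the fields the same way: name=value pairs joined by "&".
encoded = urlencode(fields)

# GET: the encoded string is appended to the ACTION URL after a "?".
get_url = "/cgi-bin/myprog.pl?" + encoded

# POST: the same string travels in the request body instead of the URL.
post_body = encoded.encode("ascii")
```

With GET the whole form ends up inside the URL, which is why long forms can run into URL length limits; with POST the URL stays short whatever the size of the form.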



• ACTION:
– Specifies the URL of a program on the origin server that
will be receiving the form’s inputs.
– Traditionally, this is a Common Gateway Interface (CGI) program.
• Details of CGI to be discussed later.
– The specified program is executed on the server, when the
form is submitted.
• Output sent back to the browser.



• Typical usage:
<FORM METHOD="POST"
ACTION="cgi-bin/myprog.pl">
……..
……..
</FORM>



<INPUT>
• This tag defines a basic form element.
• Several attributes are possible:
– TYPE
– NAME
– SIZE
– MAXLENGTH
– VALUE
– SRC
– ALIGN



• TYPE:
– Defines the kind of element that is to be displayed in the
form.
• “TEXT” – defines a text box, which provides a single
line area for entering text.
• “RADIO” – radio button, used when a choice must be
made among several alternatives (clicking on one of
the buttons turns off all others in the same group).
• “CHECKBOX” – similar to the radio buttons, but each
box here can be selected independently of the
others.



• “PASSWORD” – similar to text box, but characters are not
shown as they are typed.
• “HIDDEN” – not displayed on the page, so the user cannot
modify it (mainly used to carry choices that have already been
made earlier).
• “IMAGE” – used for active maps. When the user clicks on
the image, the (x,y) co-ordinates are stored in variables,
and are returned for further processing.
• “SUBMIT” – creates a box labeled Submit; if clicked, the
form data are passed on to the designated CGI script.
• “RESET” – creates a box labeled Reset; if clicked, clears a
form’s contents.



• NAME:
– Specifies a name for the input field.
– The input-handling program (CGI) in reality receives a
number of (name,value) pairs.
• SIZE:
– Defines the number of characters that can be displayed in a
TEXT box without scrolling.
• MAXLENGTH:
– Defines the maximum number of characters a TEXT box
can contain.



• VALUE:
– Used to supply a default value for a TEXT or HIDDEN field.
– Can also be used for specifying the label of a button
(renaming “Submit”, for example).
• SRC:
– Provides a pointer to an image file.
– Used for clickable maps.
• ALIGN:
– Used for aligning image types.
ALIGN = TOP | MIDDLE | BOTTOM



<TEXTAREA> … </TEXTAREA>
• Can be used to accommodate multiple text lines in a
box.
• Attributes are:
– NAME: name of the field.
– ROWS: number of lines of text that can fit into the box.
– COLS: width of the text area on the screen.



<SELECT> …. </SELECT>
• Used along with the tag <OPTION>.
• Used to define a selectable list of elements.
– The list appears as a scrollable menu or a pop-up menu
(depends on browser).
• Attributes are:
– NAME: name of the field.
– SIZE: specifies the number of option elements that will be
displayed at a time on the menu. (If actual number exceeds
SIZE, a scrollbar will appear).
– MULTIPLE: specifies that multiple selections from the list
can be made.



<FORM ………….>
……..
Languages known:
<SELECT NAME="lang" SIZE=3 MULTIPLE>
<OPTION> English
<OPTION> Hindi
<OPTION> French
<OPTION> Hebrew
</SELECT>
</FORM>



Example 1

<HTML>
<HEAD>
<TITLE> Using HTML Forms </TITLE>
</HEAD>

<BODY TEXT="#FFFFFF" BGCOLOR="#0000FF"
LINK="#FF9900" VLINK="#FF9900"
ALINK="#FF9900">

<CENTER><H3> Student Registration Form </H3>
</CENTER>

Please fill up the following form about the
courses you will register for this Semester.

<FORM METHOD="POST" ACTION="/cgi/feedback">
<P> Name: <INPUT NAME="name" TYPE="TEXT"
SIZE="30" MAXLENGTH="50">
<P> Roll Number: <INPUT NAME="rollno"
TYPE="TEXT" SIZE="7">
<P> Course Numbers:
<INPUT NAME="course1" TYPE="TEXT" SIZE="6">
<INPUT NAME="course2" TYPE="TEXT" SIZE="6">
<INPUT NAME="course3" TYPE="TEXT" SIZE="6">
<P> <P> Press SUBMIT when done.
<P> <INPUT TYPE="SUBMIT">
<INPUT TYPE="RESET">
</FORM> </BODY> </HTML>

Example 2

<HTML>
<HEAD>
<TITLE> Using HTML Forms </TITLE>
</HEAD>

<BODY TEXT="#FFFFFF" BGCOLOR="#0000FF"
LINK="#FF9900" VLINK="#FF9900"
ALINK="#FF9900">

<CENTER> <H3> Student Registration Form </H3>
</CENTER>

Please fill up the form below and press DONE when done.

<FORM METHOD="POST" ACTION="/cgi/feedback">
<P> Name: <INPUT NAME="name" TYPE="TEXT"
SIZE="30" MAXLENGTH="50">
<P> Roll Number:
<INPUT NAME="rollno" TYPE="TEXT" SIZE="7">
<P> Course Numbers:
<INPUT NAME="course1" TYPE="TEXT" SIZE="6">
<INPUT NAME="course2" TYPE="TEXT" SIZE="6">
<INPUT NAME="course3" TYPE="TEXT" SIZE="6">
<!-- VALUE tells the server which radio button was chosen -->
<P> Category: SC <INPUT NAME="cat" TYPE=RADIO VALUE="SC">
ST <INPUT NAME="cat" TYPE=RADIO VALUE="ST">
GE <INPUT NAME="cat" TYPE=RADIO VALUE="GE">
<P> Mother tongue: <SELECT NAME="mtongue" SIZE="3">
<OPTION> Hindi
<OPTION> Bengali
<OPTION> Gujarati
<OPTION> Tamil
<OPTION> Oriya
<OPTION> Assamese
</SELECT>
<P> <P> Thanks for the information.
<P> <INPUT TYPE="SUBMIT" VALUE="DONE">
<INPUT TYPE="RESET" VALUE="CLEAR FORM">
</FORM>
</BODY>
</HTML>

Example 3
<HTML>
<HEAD>
<TITLE> Using HTML Forms </TITLE>
</HEAD>

<BODY TEXT="#FFFFFF" BGCOLOR="#0000FF"
LINK="#FF9900" VLINK="#FF9900"
ALINK="#FF9900">

<CENTER> <H3> Student Feedback Form </H3>
</CENTER>

Please fill up the following form and press DONE
when finished.

<FORM METHOD="POST" ACTION="/cgi/feedback">
<P> Name: <INPUT NAME="name" TYPE="TEXT"
SIZE="30" MAXLENGTH="50">
<P> Roll Number:
<INPUT NAME="rollno" TYPE="TEXT" SIZE="7">
<P> Password:
<INPUT NAME="code" TYPE=PASSWORD
SIZE="10">
<P> Course Numbers:
<INPUT NAME="course1" TYPE="TEXT" SIZE="6">
<INPUT NAME="course2" TYPE="TEXT" SIZE="6">
<INPUT NAME="course3" TYPE="TEXT" SIZE="6">
<!-- VALUE tells the server which radio button was chosen -->
<P> Category: SC <INPUT NAME="cat" TYPE=RADIO VALUE="SC">
ST <INPUT NAME="cat" TYPE=RADIO VALUE="ST">
GE <INPUT NAME="cat" TYPE=RADIO VALUE="GE">

<P> Mother tongue: <SELECT NAME="mtongue"
SIZE="3">
<OPTION> Hindi
<OPTION> Bengali
<OPTION> Gujarati
<OPTION> Tamil
<OPTION> Assamese
<OPTION> Oriya
</SELECT>
<P> Languages known:
<!-- each checked box is sent as lang=VALUE -->
English <INPUT NAME="lang" TYPE=CHECKBOX VALUE="English">
Hindi <INPUT NAME="lang" TYPE=CHECKBOX VALUE="Hindi">
<P> Scholarship holder (select for yes):
<INPUT NAME="schol" TYPE=CHECKBOX VALUE="yes">
<P> General feedback:
<TEXTAREA NAME="feed" ROWS=3 COLS=20>
</TEXTAREA>
<P> <P> Thanks for the information.
<P> <INPUT TYPE="SUBMIT" VALUE="DONE">
<INPUT TYPE="RESET" VALUE="CLEAR FORM">
</FORM>
</BODY>
</HTML>

How to Submit a Form?

• Three different ways:
– Clicking on the Submit button.
– Clicking on an active map.
– Pressing <ENTER> in a TEXT box.



The Basic Mechanism
[Figure: the Browser loads the original page, submits the form to a cgi program on the server, and receives a new html page in return]


• Web page including form
– Resides on the web server in the regular folder where html
files and other documents are kept.
• CGI script program handling form data
– Resides under a special folder on the web server (usually,
“cgi-bin”).
– May be written in Perl, C, shell script, etc.
• Web page linked to the cgi script.



<FORM METHOD="POST"
ACTION="cgi-bin/myprog.pl">
……..
……..
</FORM>



How to Write the CGI Program?

• Must know …
– How to access the form data.
• Mechanism depends on METHOD (GET or POST).
– How to return processed output back to the browser.
• HTML file created on the fly (typically).
• Details to be discussed later.
– Good idea to have a look at a typical Perl script.



Image Maps
Introduction
• An image map allows us to create links to different
URLs depending upon where we click on the image.
– Useful for creating links on maps, diagrams, fancy buttons,
etc.
• There are two parts to an image map.
– The image.
– The map file.
• The map file defines the areas of the image and the
URLs that correlate to different areas.



So basically …
• An image map is a single image that contains hot
spots.
– When we click on a hot spot, we go to a new location (URL).
– Requires loading of only one image from the server.
• Thus requires fewer server calls.
• Is generally better looking.



Types of Image Maps
• Depending on the way they are configured and the
location where the processing is carried out, image
maps can be classified as two types.
– Server side
• Traditional
– Client side
• More efficient; supported by all recent browsers.



Server Side Image Maps
Basic Functioning
• Three ingredients are required to incorporate an
image map into an HTML document.
a) Creating the image map with well-defined boundaries.
b) Creating an image map configuration file.
– Contains relative pixel co-ordinates marking the
boundaries of the different clickable regions.
– Allowable geometries: circle, poly, point, rect.
c) Establishing appropriate HTML information in the page to
link
– the map image,
– the map configuration file, and
– an (optional) CGI script which decodes the map co-ordinates
and selects the corresponding URL.



Typical Usage

<HTML>
<BODY>
……..
……..
<A HREF="cgi-bin/map/menu.map">
<IMG SRC="IMAGES/imagemap.gif" ISMAP>
</A>
……..
……..
</BODY>
</HTML>



• The URL that is sent to the image map program or web server
when a user clicks the map resembles the following:

http://myserver.com/menu.map?x,y

where x and y are integers denoting the pixel co-ordinates of
the point of click.



Image Map Configuration File
• There are several different formats, all similar, and
varying slightly in syntax.
a) NCSA httpd server
b) APACHE httpd server
c) CERN httpd server
d) W3C httpd server



Example: APACHE server
• A sample configuration file looks like:

# An example
default http://www.myserver.edu
base_uri http://www.iitkgp.ac.in/demo
circle circle.html 45,45 80,45
rect rectangle.html 20,10 178,70
point point.html 100,50
poly polygon.html 200,60 295,60 275,10



• Defining the default
– Typically, the first line in the map file is a default line.
– Defines the URL to which users will be taken if they click on
an undefined area of the image.
• Defining circles
– A circle is defined by two co-ordinates.
– The first co-ordinate is the centre point.
– The second co-ordinate is any point on the circumference.



• Defining rectangles
– A rectangle is defined by two co-ordinates.
– The first co-ordinate refers to the upper left corner.
– The second co-ordinate refers to the bottom right corner.
• Defining points
– A point is defined by a single co-ordinate.
– Clicks closest to that point on the image map will take the user
to the specified URL.
• Defining polygons
– A polygon is defined by a series of co-ordinates that outline the
area to be defined.
– We can start from any vertex of the polygon.
– Maximum number of vertices is 100.
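
The region definitions above can be turned into a small hit-testing routine. The sketch below is not the server's actual code; it only illustrates the geometry, using the circle, rect and poly entries of the sample configuration file. The function names are invented for the example, `point` regions are left out for brevity, and "the first matching region wins" is an assumption.

```python
import math


def in_circle(x, y, cx, cy, px, py):
    """Circle given by its centre (cx, cy) and a point (px, py) on the circumference."""
    return math.hypot(x - cx, y - cy) <= math.hypot(px - cx, py - cy)


def in_rect(x, y, x1, y1, x2, y2):
    """Rectangle given by two opposite corners."""
    return min(x1, x2) <= x <= max(x1, x2) and min(y1, y2) <= y <= max(y1, y2)


def in_poly(x, y, vertices):
    """Ray casting: count the edges crossed by a horizontal ray starting at (x, y)."""
    inside = False
    n = len(vertices)
    for i in range(n):
        (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # this edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def resolve(x, y, regions, default_url):
    """Return the URL for a click at (x, y); regions are tried in order."""
    for shape, url, coords in regions:
        if shape == "circle" and in_circle(x, y, *coords):
            return url
        if shape == "rect" and in_rect(x, y, *coords):
            return url
        if shape == "poly" and in_poly(x, y, list(zip(coords[0::2], coords[1::2]))):
            return url
    return default_url


# Regions copied from the sample configuration file.
REGIONS = [
    ("circle", "circle.html", (45, 45, 80, 45)),
    ("rect", "rectangle.html", (20, 10, 178, 70)),
    ("poly", "polygon.html", (200, 60, 295, 60, 275, 10)),
]
```

For example, `resolve(150, 30, REGIONS, "http://www.myserver.edu")` falls outside the circle but inside the rectangle, so it selects `rectangle.html`; a click that matches no region falls back to the default URL.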



Illustrative Example



An Important Point

• For each of the specified URLs, it is required to
specify the entire path.
• However, a common URL prefix can be specified by
the base_uri command.

base_uri http://www.iitkgp.ac.in
circle circle.html 45,45 80,45
rect rectangle.html 20,10 178,70



Client Side Image Maps
Introduction
• In client-side image maps, the map information is
contained in the HTML document itself.
• Consists of three components:
– An ordinary image file (gif, jpeg, png)
– A map delimited by <MAP> tags containing the co-ordinate
and URL information for each region.
– The USEMAP attribute within the <IMG> tag that indicates
which map to reference.



Advantages
• They are self-contained within the HTML document.
• No dependence on the server to handle every
client’s request for image mapping.
• Faster processing; improves response time.
• No longer required to specify a default URL.
– Clicking outside hyperlinked area will take a user nowhere.
• Complete URL information displays in the status bar
when the mouse moves over the hot spots.
– In contrast, server-side image maps show only co-
ordinates.



Disadvantage
• The only disadvantage is that they are not
universally supported.
– Netscape Navigator 1.0 and Internet Explorer 2.0 do not
support client-side image maps.



Sample Client-side Image Map

<MAP NAME="demo_map">
<AREA SHAPE=CIRCLE COORDS="45,45,20"
HREF="circle.html" ALT="Circle">
<AREA SHAPE=RECT COORDS="20,20,80,80"
HREF="rectangle.html" ALT="Rectangle">
<AREA SHAPE=POLY COORDS="10,10,50,50,70,100"
HREF="polygon.html" ALT="Triangle">
</MAP>



• Some points:
– POINT is not supported.
– CIRCLE is specified by the centre co-ordinates, followed by
its radius.
– Comments can be included as in HTML, using
<!-- ……….. -->



Linking to an Image
• This can be done with the USEMAP attribute of the
<IMG> tag.

<IMG SRC="mymap.gif" USEMAP="#demo_map">

– References the image “mymap.gif”.
– Searches for the <MAP> element with the NAME attribute of
“demo_map”.



A Complete Example
<HTML>
<HEAD><TITLE> Client Side Image map </TITLE></HEAD>
<BODY>
<MAP NAME="demo_map">
<AREA SHAPE=CIRCLE COORDS="45,45,20"
HREF="circle.html" ALT="Circle">
<AREA SHAPE=RECT COORDS="20,20,80,80"
HREF="rectangle.html" ALT="Rectangle">
<AREA SHAPE=POLY COORDS="10,10,50,50,70,100"
HREF="polygon.html" ALT="Triangle">
</MAP>
<IMG SRC="mymap.gif" USEMAP="#demo_map">
</BODY>
</HTML>



Combining the Two
• Motivation for combining client and server side
image map processing:
– Browsers ignore tags they do not understand.
– Newer browsers will use client-side map.
– Older browsers will use the server-side map.
• How to do this?



<A HREF="http://myserver.edu/cgi-bin/map/demo_map">
<IMG SRC="mymap.gif" USEMAP="#demo_map" ISMAP>
</A>

• USEMAP will be ignored by older browsers.
• ISMAP will be considered redundant by browsers
supporting client-side maps.



Creating Image Maps
Available Tools

• There are several tools with which we can create
an image map.
• Some of the tools are:
– MapEdit
– Macromedia Dreamweaver
– Adobe GoLive
• Irrespective of the tool used, the steps required for
creation are more or less the same.



Creating the Map
• Typical steps:
– Open the image in the imagemap editor.
– Define areas within the image that will be clickable:
rectangle, circle or polygon.
– Highlight an area, and enter the URL for that area.
– Repeat the above steps for all the clickable areas of the
image.
– For server-side image maps, we also need to define a
default URL.
– Select the type (client or server side).



Hyper Text Transfer Protocol
(HTTP)
What is HTTP?
• Hyper Text Transfer Protocol
– A protocol using which web clients (browsers) interact with
web servers.
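• It is a stateless protocol.
– Each request is handled independently; the server keeps no
memory of earlier requests.
– In HTTP/1.0, a fresh connection is opened for every item to be
downloaded.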
• Transfers hypertext across the Internet.
– A text with links to other text documents.



HTTP Protocol
• Web clients (browsers) and web servers communicate
via HTTP protocol.
• Basic steps:
– Client opens socket connection to the HTTP server.
• Typically over port 80.
– Client sends HTTP requests to server.
– Server sends back response.
– Server closes connection.
• HTTP is a stateless protocol.



Illustration
[Figure: a web client sends http requests to web servers and receives http responses]


HTTP Request Format
• A client request to a server consists of:
– Request method
– Path portion of the HTTP URL
– Version number of the HTTP protocol
– Optional request header information
– Blank line
– POST or PUT data if present.
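
The layout of a request can be sketched as a small Python function that assembles the pieces in the order listed above. The function name `build_request` and its parameters are made up for this illustration; a real client would normally use an HTTP library instead.

```python
def build_request(method, path, headers=None, body=b"", version="HTTP/1.0"):
    """Assemble an HTTP request: request line, headers, blank line, data."""
    lines = [f"{method} {path} {version}"]           # method, path, version
    for name, value in (headers or {}).items():      # optional header fields
        lines.append(f"{name}: {value}")
    head = "\r\n".join(lines) + "\r\n\r\n"           # empty line ends the headers
    return head.encode("ascii") + body               # data follows, if any
```

Each line ends with a carriage return and line feed; the empty line tells the server where the header information stops and the data (if any) begins.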



HTTP Request Methods
• GET
– Most common HTTP method.
– Returns the contents of the specified document.
– Places any parameters in the request URL, not in the request body.
– Can also be used to submit forms:
• The form data is URL-encoded and appended to the
GET command URL.

GET /cgi-bin/myscript.cgi?Roll=1234&Sex=M HTTP/1.0



Illustration of GET
– A very simple HTTP connection to a server.
telnet www.facweb.iitkgp.ac.in http
– Client sends request for a file:
GET /test.html HTTP/1.0
– The server sends back the response:
HTTP/1.1 200 OK
Date: Sun, 22 May 2005 09:51:42 GMT
Server: Apache/1.3.33 (Win32)
Last-Modified: Sun, 22 May 2005 09:51:10 GMT
Accept-Ranges: bytes
Content-Length: 119
Connection: close



Illustration of GET (contd.)
Content-Type: text/html

<html> <head> <title> A test page </title> </head>
<body>
This is the body of the test page.
</body>
</html>



HTTP Request Methods (contd.)
• HEAD
– Returns only the header information of the specified
document.
– Used by clients to determine the file size, modification date,
server version, etc.



Illustration of HEAD
• Client sends
HEAD /index.html HTTP/1.0

• Server responds with:
HTTP/1.1 200 OK
Date: Sun, 22 May 2005 10:08:37 GMT
Server: Apache/1.3.33 (Win32)
Last-Modified: Thu, 03 May 2001 11:30:38 GMT
Accept-Ranges: bytes
Content-Length: 1494
Connection: close
Content-Type: text/html



HTTP Request Methods (contd.)
• POST
– Used to send data to the server to be processed in some
way, as in a CGI script.
– Basic difference from GET:
• A block of data is sent along with the request.
• Extra headers like Content-Type and Content-Length
are used for this purpose.
• The requested object is not a resource to retrieve.
Rather, it is a script that can handle the data being sent.
• The server response is not a static file; but is generated
dynamically as the program output.



Illustration of POST
– A typical form submission using POST is illustrated below:

POST /cgi-bin/myscript.cgi HTTP/1.0
From: isg@hotmail.com
User-Agent: HTTPTool/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 22

Roll=1234&Sex=M&Age=20
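
The Content-Length header must equal the number of bytes in the data block, so it is safest to compute it from the encoded body rather than type it by hand. A minimal sketch, in which the helper name `build_post` is hypothetical:

```python
from urllib.parse import urlencode


def build_post(path, fields):
    """Build a form-style POST request with a computed Content-Length."""
    body = urlencode(fields).encode("ascii")      # e.g. b"Roll=1234&Sex=M&Age=20"
    head = (
        f"POST {path} HTTP/1.0\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"        # number of bytes that follow
        "\r\n"                                    # blank line before the data
    )
    return head.encode("ascii") + body
```

For the three fields Roll=1234, Sex=M and Age=20 the encoded body is 22 bytes long, and that is the value the function places in the header.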



HTTP Request Methods (contd.)
• PUT
– Replaces the contents of the specified document with data
supplied along with the command.
– Not used widely.
• DELETE:
– Deletes the specified document from the server.
– Not used widely.



HTTP Request Headers
• After an HTTP request line, a client can send any
number of header fields.
– Usually optional – used to convey some information.
– Some commonly used fields:
• Accept: MIME types client accepts, in order of
preference.
• Connection: connection options, close or Keep-Alive.
• Content-Length: number of bytes of data to follow.
• Content-Type: MIME type and subtype of the data that
follows.
• Pragma: “no-cache” option directs the server/proxy to
return a fresh document even though a cached copy
may exist.



HTTP Request Data
• To be given if the request type is either PUT or
POST.
– Send the data immediately after the HTTP request header,
and a blank line.



HTTP Response
• An initial response line.
– Also called the status line.
– Consists of three parts separated by spaces
• The HTTP version
• A 3-digit response status code
• An English phrase describing the status code.

HTTP/1.0 200 OK

HTTP/1.0 404 Not Found



HTTP Response (contd.)
• Header information, followed by a blank line, and
then the data.
HTTP/1.1 200 OK
Date: Sun, 22 May 2005 09:51:42 GMT
Server: Apache/1.3.33 (Win32)
Last-Modified: Sun, 22 May 2005 09:51:10 GMT
Content-Length: 119
Connection: close
Content-Type: text/html

<html> <head> <title> A test page </title> </head>
<body>
This is the body of the test page.
</body> </html>



3-digit Status Code
• 1xx
– Indicates informational messages only.
• 2xx
– Indicates successful transaction.
• 3xx
– Redirects the client to another URL.
• 4xx
– Indicates client error, such as unauthorized request.
• 5xx
– Indicates internal server error.
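
A client can decide how to react by looking only at the first digit of the status code. The following sketch parses a status line and maps the code to its class; the function names and the class labels are chosen for this example.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line):
    """Split a status line into (HTTP version, numeric code, reason phrase)."""
    version, code, reason = line.strip().split(" ", 2)
    return version, int(code), reason


def status_class(code):
    """Classify a 3-digit status code by its first digit."""
    return STATUS_CLASSES.get(code // 100, "unknown")
```

The split is limited to two spaces because the reason phrase itself may contain spaces, as in "Not Found".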



Common Status Codes
• 200 OK
• 301 Moved Permanently
• 302 Moved Temporarily
• 401 Unauthorized
• 403 Forbidden
• 404 Not Found
• 500 Internal Server Error



HTTP Response Headers
• Common response headers include:
– Content-Length
• Size of the data in bytes.
– Content-Type
• MIME type and subtype of data being sent.
– Date
• Current date.
– Expires
• Date at which document expires.
– Last-Modified
– Set-Cookie
• Name/value pair to be stored as cookie.



HTTP Response Data
• A blank line follows the response header, and the
data follows next.
– No upper limit on data size.
• HTTP/1.0
– Server typically closes connection after completing a
transaction.
• HTTP/1.1
– Server keeps the connection open by default, across
transactions.



HTTP version 1.1
• Current standard and widely used.
– Specified in RFC 2616, an IETF draft standard published in 1999.
• Improvements over HTTP 1.0:
– Requires host identification.

GET /index.html HTTP/1.1
Host: www.facweb.iitkgp.ac.in
<blank line>

• Allows multi-homed servers.
• More than one domain living on same server.



HTTP version 1.1 (contd.)
– Default support for persistent connections.
• Multiple transactions over a single connection.
– Support for content negotiation.
• Decides on the best among the available
representations.
• Server-driven or browser-driven.
– Browsers can request part of document.
• Specify the bytes using Range header.
• Browser can ask for more than one range.
• Continue interrupted downloads.

Range: bytes=1200-3500
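
How a server might interpret the value of a Range header can be sketched as below. This is a simplified illustration that handles a single range only; the function name `parse_byte_range` is an assumption, and byte positions are counted from 0 with both ends inclusive.

```python
def parse_byte_range(header_value, size):
    """Return (first, last) byte positions, inclusive, for a single-range value."""
    unit, _, spec = header_value.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("only a single bytes range is handled in this sketch")
    first_text, _, last_text = spec.strip().partition("-")
    if first_text == "":                      # "bytes=-500": the last 500 bytes
        length = int(last_text)
        return max(size - length, 0), size - 1
    first = int(first_text)
    last = int(last_text) if last_text else size - 1   # "bytes=1200-": to the end
    return first, min(last, size - 1)
```

To continue an interrupted download, a browser that already holds the first 1200 bytes of a document can send `Range: bytes=1200-` and append the bytes it receives to what it has.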



HTTP version 1.1 (contd.)
– Efficient caching support
• A document caching model that allows both the server
and the client to control the level of cachability and
update conditions and requirements.
• HTTP 1.1 requires several extra things from both
clients and servers.
– Mandatory to know these if one is trying to write an HTTP
client or server.



HTTP 1.1 Client Requirements
• The clients must do the following:
– Include the Host: header with each request.
– Either support persistent connections, or include the
Connection: close header with each request.
– Handle the 100 Continue response.
– Accept responses with chunked data.



HTTP 1.1 Server Requirements
• The servers must do the following:
– Require the Host: header from HTTP 1.1 clients.
– Accept absolute URLs in a request.
– Accept requests with chunked data.
– Include the Date: header in each response.
– Support at least the GET and HEAD methods.
– Support HTTP 1.0 requests.
– Either support persistent connections, or include the
Connection: close header with each response.



HTTP Proxy servers
• What is an HTTP Proxy server?
– A program that acts as an interface between a client and a
server.
– It receives requests from the clients, and forwards them to
the server(s).
– The responses are sent back in the same way.
– A proxy thus acts both as an HTTP client and a server.



• Request from a client to a proxy server differs from
normal server requests in one way.
– The complete URL of the resource being requested must be
specified.

GET http://www.xyz.com/docs/abc.txt HTTP/1.0

– The proxy needs this to know where to forward the
request.



Uniform Resource Locators (URL)
What is a URL?
• They are the mechanism by which documents are
addressed in the WWW.
• A URL contains the following information:
– Name of the site containing the resource.
– The type of service to be used to access the resource (ftp,
http, etc.).
– The port number of the service.
• Default assumed, if omitted.
– Location of the resource (path name) in the server.



• URLs specify Internet addresses.
• General format for URL:
scheme://address:port/path/filename
• Examples:
http://www.rediff.com/news/ab1.html
http://www.xyz.edu:2345/home/rose.jpg
mailto:skdas@yahoo.co.in
news:alt.rec.flowers
ftp://kumar:km123@www.abc.com/docs/paper/x1.pdf
ftp://www.ftpsite.com/docs/paper1.ps
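
The parts of a URL can be pulled apart with Python's standard `urllib.parse.urlsplit`. In the sketch below the helper `describe` and the small table of default ports are assumptions added for illustration.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "ftp": 21}  # defaults assumed when the port is omitted


def describe(url):
    """Break a URL into scheme, host, port, path and (optional) user name."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path,
        "user": parts.username,
    }
```

Applied to `http://www.xyz.edu:2345/home/rose.jpg`, this yields the scheme `http`, the host `www.xyz.edu`, the explicit port 2345 and the path `/home/rose.jpg`; for the ftp example with `kumar:km123@`, the user name is `kumar` and the port falls back to the assumed default.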



Sending a Query String
• The mechanism can also be used to send a query
string to a specified URL.
– Used for CGI scripts.
– Place a question mark at the end of the URL, followed by
the query string.

http://www.xyz.com/cgi-bin/xyz.pl?Roll=1234&Sex=M
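
On the receiving side, the query string is everything after the question mark, and it splits into (name, value) pairs. A short sketch using the standard library; the helper name `query_fields` is invented here.

```python
from urllib.parse import parse_qsl, urlsplit


def query_fields(url):
    """Return the (name, value) pairs carried in the query string of a URL."""
    return parse_qsl(urlsplit(url).query)
```

For the URL above, this gives the pairs ("Roll", "1234") and ("Sex", "M").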



CGI Scripts
Introduction
• CGI stands for Common Gateway Interface.
– Allows interactive web pages to be written.
• Page created dynamically, based on user request.
– CGI programs are called “scripts” because the first CGI
programs were written using UNIX shell scripts, and PERL.
• Can be written in almost any language.
– Usually resides in a special directory in the web server
(typically, “cgi-bin”).



• Apache Directory Structure: a case study
– cgi-bin
• Here most of the interactive programs will reside. These
will be written in Perl, Java, or any other programming
language.
– conf
• This will contain the configuration files.
– htdocs
• This will contain the actual HTML documents, and will
typically have many subdirectories. This directory is
known as the DocumentRoot.

– icons
• This contains the icons that Apache will use when
displaying information or error messages.
– images
• This will contain the image files that will be used in the
web site.
– logs
• This will contain the log files: the access_log and
error_log.

Structure of CGI Script
• When a CGI script is invoked by the server, the
server passes information to the script in one of
two ways:
a) GET
b) POST

• The request method used is passed to the script
via the environment variable REQUEST_METHOD.

“GET” Request Method
• The GET method sends request information as
parameters appended at the end of the URL.
http://myserver.edu/cgi-bin/myprog.pl?name=niloy&rollno=7312&age=24

• The parameters are passed to the CGI program via
the environment variable QUERY_STRING.
– For the above example, QUERY_STRING will contain
“name=niloy&rollno=7312&age=24”
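
A minimal bash sketch of how a script might split such a QUERY_STRING into its name/value pairs. list_params is a hypothetical helper name, and the fallback value is only there so the sketch can be tried outside a web server.

```shell
#!/bin/bash
# Print each name=value pair of a query string on its own line.
# list_params is a hypothetical helper name.
list_params() {
    local query=$1 pair
    local -a pairs
    IFS='&' read -r -a pairs <<< "$query"   # split the string at every '&'
    for pair in "${pairs[@]}"; do
        # Text before the first '=' is the name, text after it is the value.
        echo "${pair%%=*} -> ${pair#*=}"
    done
}

# A web server would set QUERY_STRING; the default is only for a dry run.
list_params "${QUERY_STRING:-name=niloy&rollno=7312&age=24}"
```

The values printed here are still URL-encoded; decoding is a separate step.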

“POST” Request Method
• The data gets passed from the server to the CGI
script through STDIN.
• The environment variable CONTENT_LENGTH
indicates the size in bytes of the incoming data.
• The format of the POST-ed data is:
var1=value1&var2=value2&……
• The REQUEST_METHOD environment variable must
be examined to know whether or not to read from
STDIN.

To Summarize
• For GET
– Data are read from QUERY_STRING environment variable.
• For POST
– Data are read from STDIN.
– Number of bytes to be read is obtained from
CONTENT_LENGTH.
• In both cases, the data are available in the same format:
var1=value1&var2=value2&……
name=niloy&rollno=7312&age=24
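
The summary can be sketched as one bash function that returns the raw form data whichever method was used. read_form_data is a hypothetical helper name; the sketch assumes the head utility accepts -c (a byte count), which is common but not guaranteed on every system.

```shell
#!/bin/bash
# Write the raw (still URL-encoded) form data to standard output.
# read_form_data is a hypothetical helper name.
read_form_data() {
    local data=""
    if [ "$REQUEST_METHOD" = "POST" ]; then
        # Read exactly CONTENT_LENGTH bytes from STDIN; the script should
        # not wait for an end-of-file marker.
        data=$(head -c "${CONTENT_LENGTH:-0}")
    else
        # GET: the data arrive in the environment, not on STDIN.
        data=$QUERY_STRING
    fi
    printf '%s\n' "$data"
}
```

Inside a CGI script this would typically be used as form_data=$(read_form_data), after which the same parsing code serves both methods.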

URL Encoding
• For platform independence, all data passed to the
server are URL-encoded.
– Variables are separated by ‘&’.
– Special characters (including ‘&’) are escaped as ‘%’ followed
by a 2-digit hex number, e.g.,
%25 → ‘%’
%20 → ‘ ’
– A ‘+’ sign is interpreted as a space character.

• The process of decoding back:
– Separate out the variables.
– Replace all ‘+’ signs by spaces.
– Replace all %## with the corresponding ASCII character.
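
The last two steps can be sketched in bash as follows. urldecode is a hypothetical helper name; it relies on bash's printf '%b', which also interprets any literal backslashes in the data, so it is a teaching sketch and not a hardened decoder. The variables are assumed to have been separated at ‘&’ beforehand.

```shell
#!/bin/bash
# Decode one URL-encoded value. urldecode is a hypothetical helper name.
urldecode() {
    local s=${1//+/ }             # every '+' becomes a space
    # Turn each %HH into \xHH, then let printf %b expand the escapes.
    printf '%b\n' "${s//%/\\x}"
}

urldecode "Asha+Rao%20%26+Co."
```

Decoding only after the split matters: an encoded ‘&’ (%26) inside a value would otherwise be mistaken for a separator.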

• Which characters are encoded?
– Control characters: 0x00 through 0x1F, and 0x7F.
– 8-bit characters: 0x80 through 0xFF
– Characters given special importance within URLs:
; / ? : @ & = + $ ,
– Characters often used to delimit URLs:
< > # % “
– Characters considered unsafe as they may have special
meaning for other protocols:
{ } | \ ^ [ ] `
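
A rough bash sketch of the encoding direction. urlencode is a hypothetical helper name; as a simplifying assumption it handles ASCII input only, and it escapes every character other than letters, digits and . _ ~ -, which is stricter than the lists above require.

```shell
#!/bin/bash
# Encode a string for use in a query string (ASCII input assumed).
# urlencode is a hypothetical helper name.
urlencode() {
    local LC_ALL=C s=$1 out="" c i
    for (( i = 0; i < ${#s}; i++ )); do
        c=${s:i:1}
        case $c in
            [a-zA-Z0-9._~-]) out+=$c ;;     # kept as-is
            ' ')             out+=+ ;;      # space becomes '+'
            *)               out+=$(printf '%%%02X' "'$c") ;;   # %HH
        esac
    done
    printf '%s\n' "$out"
}

urlencode "a b&c=d"
```

A literal ‘+’ in the input comes out as %2B, so it cannot be confused with an encoded space.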

• A point to note:
– When the server passes data using the POST method, the
script checks the environment variable CONTENT_TYPE.
– If the value of CONTENT_TYPE is
application/x-www-form-urlencoded
the data needs to be decoded before use.

Basic Structure of CGI Script
• Step 1: Initialization
– Check REQUEST_METHOD.
– Parse string and extract variables depending on “GET” or
“POST”.
– Check CONTENT_TYPE, to find out if the string is URL-
encoded.
• Step 2: Processing
– Process the input data.
– Output the results (MIME-type header, and the contents).
• Step 3: Termination
– Release the system resources.
– Terminate the program.
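
The three steps might be laid out as in the following bash skeleton. handle_request is a hypothetical name, the processing step is reduced to counting characters, and head -c is assumed to be available for reading POST data.

```shell
#!/bin/bash
# Skeleton following the three steps; handle_request is a hypothetical name.
handle_request() {
    # Step 1: initialization. Pick the input source by request method.
    local input
    case $REQUEST_METHOD in
        POST) input=$(head -c "${CONTENT_LENGTH:-0}") ;;
        *)    input=$QUERY_STRING ;;
    esac

    # Step 2: processing. Emit the header, a blank line, then the contents.
    printf 'Content-Type: text/plain\n\n'
    printf 'Received %d characters of form data\n' "${#input}"

    # Step 3: termination. Nothing to release here; report success.
    return 0
}

handle_request
```

A real script would decode and validate the input in step 2 before using it.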

Environment Variables Used
• CONTENT_LENGTH
– Length of URL-encoded data in bytes.
• CONTENT_TYPE
– Specifies the type of data as a MIME header.
• QUERY_STRING
– Information at the end of the URL after ‘?’.
• REMOTE_ADDR
– IP address of the client making the request.
• REMOTE_HOST
– Resolved host name of the client.

• REQUEST_METHOD
– “GET” or “POST”.
• SERVER_NAME
– Web server’s host name, or IP address.
• SERVER_PROTOCOL
– Say, HTTP/1.0
• SERVER_PORT
– Port number on server that received the HTTP request.
• SCRIPT_NAME
– Name of the CGI script being run.
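
A small bash sketch that reports these variables back to the browser can help when experimenting. print_cgi_env is a hypothetical helper name; which variables are actually set depends on the web server.

```shell
#!/bin/bash
# Print the CGI variables listed above as plain text.
# print_cgi_env is a hypothetical helper name.
print_cgi_env() {
    local name
    for name in CONTENT_LENGTH CONTENT_TYPE QUERY_STRING REMOTE_ADDR \
                REMOTE_HOST REQUEST_METHOD SERVER_NAME SERVER_PROTOCOL \
                SERVER_PORT SCRIPT_NAME; do
        # ${!name} is bash indirect expansion: the value of the variable
        # whose name is stored in "name" (empty if the server did not set it).
        printf '%s=%s\n' "$name" "${!name}"
    done
}

echo "Content-Type: text/plain"
echo ""
print_cgi_env
```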

Response Header
• The most common response header is Content-Type,
which is based on MIME types.
• Typical values are:

Content-Type: text/plain
Content-Type: text/html
Content-Type: image/gif
Content-Type: video/avi

• A complete MIME header looks like this:

Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Description: Postscript

CGI Real-life Examples
• Search Engine
• Page-hit Counter
• Student Registration
• On-line Booking of Tickets
• On-line Purchase of Items
• E-mail Gateways
• Feedback Scripts
• Web-based Games

Security Issues with CGI Scripts
• A CGI script is a program that anyone in the world
can run on your machine.
• Do not trust the user input.
– In particular, do not put user data in a shell command
without verifying the data carefully.
– An example follows.

• An example
– Suppose that you have a CGI script that lets users run the
“finger” command on your host.
– In Perl, there can be a line:
system "finger $username";
– A malicious user may enter
isg; rm -r /
as the username.

– The result → the shell also runs “rm -r /”, which tries to delete every file.
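
One common defence is to accept only input that matches a strict pattern before it goes anywhere near a shell command. The sketch below shows the idea in bash rather than Perl; is_safe_username is a hypothetical helper name, and the allowed character set (letters, digits, underscore) is an assumption that would need adjusting to the site's real user names.

```shell
#!/bin/bash
# Accept a user name only if it is non-empty and made of letters,
# digits and underscores. is_safe_username is a hypothetical helper name.
is_safe_username() {
    case $1 in
        "" | *[!A-Za-z0-9_]*) return 1 ;;   # empty, or has any other character
        *)                    return 0 ;;
    esac
}

username='isg; rm -r /'
if is_safe_username "$username"; then
    echo "would run: finger $username"
else
    echo "rejected"
fi
```

With the malicious input from the example, the space and the ‘;’ fail the check, so the command is never built.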

[Figure: input form with the field “Enter UserId” filled in as “isg; rm -r /”]

An Example CGI Program
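• Using bash shell script:
#!/bin/bash
# Print the sorted lines of the file named in the query string.
CAT=/bin/cat

# Header first, then the blank line that separates it from the body.
echo "Content-type: text/plain"
echo ""

# Run only if the cat binary exists and is executable.
if [[ -x "$CAT" ]]
then
    "$CAT" "$1" | sort
else
    echo "Cannot find command on this system."
fi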

• What does this program do?
– Sends the contents of a file residing on the server, sorted
line by line, back to the browser.
• How to invoke?
<A HREF="/cgi-bin/test1.sh?/home/user1/public_html/text-file.txt">
Click here to activate</A>
– The text after ‘?’ reaches the script as its first argument, $1.

Another Example
#!/bin/sh
echo Content-type: text/html
echo ""

/bin/cat << EOM


<HTML>
<HEAD>
<TITLE>File Output: /home/user1/public_html/text-file.txt </TITLE>
</HEAD>
<BODY bgcolor="#cccccc" text="#000000">
<HR SIZE=5>
<H1>File Output: /home/user1/public_html/text-file.txt </H1>
<HR SIZE=5> <P>

<SMALL>
<PRE>
EOM

/bin/cat /home/user1/public_html/text-file.txt
/bin/cat << EOM
</PRE>
</SMALL> <P>
</BODY>
</HTML>
EOM

• What does this program do?
– Outputs the contents of the file “text-file.txt” as an HTML page.
• How to invoke?
– Through a dummy HTML form.
– Through the following link:
<A HREF="/cgi-bin/test2.sh">Click here</A>

E-mail Gateways: an Example
• E-mail gateways are very popular on the web.
• They allow users to send and receive mail without
having to worry about managing a mail server.
• Can be designed using CGI scripts, or any other
similar technologies.
• Popular e-mail gateways:
– Yahoo, Rediffmail, Hotmail, Gmail, etc.

[Figure: Browser ↔ Email Gateway ↔ Mail Server]

Writing CGI Scripts using Perl
• To be discussed later.
– After discussing the syntax and semantics of Perl.
– We will see how the form data can be extracted and
processed.
• Requires string manipulation.
