
Technologies for Internet Systems

“The structured web”

ICP3123 & ICP4123

Dr. Jonathan C. Roberts


Emergence of the World Wide Web
• In the 1960s, Ted Nelson (founder of Project Xanadu) described a
system called hypertext
– Text on one page links to the text on other pages, but with
additions!
– Nelson coined “hypertext” and “hypermedia”
• 1982 (Guide)
– Peter Brown (Uni Kent)
– the first significant hypertext system for PCs
• 1985 (NoteCards)
– [Halasz et al. 1987]
– It was designed at Xerox PARC
• Note: Andy van Dam built the Hypertext Editing System;
Ben Shneiderman also did work in this area

Emergence of the World Wide Web

• Tim Berners-Lee developed code for a hypertext server program
– In 1980, Berners-Lee created ENQUIRE, an early hypertext
database system
– Hypertext server:
o Stores files written in the Hypertext Markup Language
o Lets other computers connect to it and read files
– Hypertext Markup Language (HTML)
o Includes a set of codes (or tags) attached to text
• 1987, Apple Computer released HyperCard for the Mac
– first ACM Hypertext conference, 1987
• 1989–1990, Tim Berners-Lee invented the WWW for
physicists working at CERN
• 1993–1994, growth from 500 to 10,000 web servers
• 1994, Lynx (an early browser)

Thinking about structure

• What is structure?
– An observation of patterns or relationships of entities
– Structure may be
hierarchical, linear, network (many-to-many), lattice
• Where is the structure?
• In Biology:
– Atomic, molecular, cellular, tissue, organ
• A report
– Title, Abstract, contents, Introduction, main-body, conclusions,
bibliography, appendices

Document Links are only one “structure” component
“HTML is precisely what we were trying to PREVENT – ever-breaking links,
links going outward only, quotes you can't follow to their origins,
no version management, no rights management”. – Ted Nelson
But what are links?
– A reference the reader can follow
– If it has an anchor then it can go to a particular place in that
document
• Not only one way?
– In some hypertext documents they can be one-to-one or one-to-many
• Are they typed or guaranteed?
• Where does the information go?
– Traditionally it replaces the current document,
– could open a new window.
– Transclusion – inserts the text in place
• “inline linking” (e.g., banner ads)
<img src="http://www.webserver.com/picture.jpg">

Need a good way to encode structure of docs
The idea was to separate structure of the document from
its appearance:
e.g. author, title, address, introduction etc.

Standard Generalized Markup Language (SGML)
ISO 8879:1986 SGML
– Older and more general text markup language than HTML
(accepted as standard in 1986)
– A meta language
• Is a system to define markup languages
• Offers a system of marking up documents that is
independent of any software application
• Platform independent
• Offers user-defined tags
• Costly to set up and maintain

World Wide Web Consortium (W3C)
– Not-for-profit group that maintains standards for the Web

http://www.w3.org/TR/html4/intro/sgmltut.html

a generalized markup language should be...

• Declarative
– Describe the structure (and attributes) of the
document
– Not describe the processing that is done to it!
– Does not have side-effects
– Extensible
• Rigorous
– Clear correspondence to mathematics
– So there can be known/pre-defined and consistent
ways to manipulate it.
– Can be used in programming and processing
documents

A bit more on SGML
• An element is a structural unit
e.g., <poem>
• Tag start/end pairs delimit the information
<poem> ... </poem>
• From this poem we could derive the following rules:
1. An anthology contains a number of poems and nothing else.
2. A poem always has a single title element which precedes
the first stanza and contains no other elements.
3. Apart from the title, a poem consists only of stanzas.
4. Stanzas consist only of lines and every line is contained by
a stanza.
5. Nothing can follow a stanza except another stanza or the
end of a poem.
6. Nothing can follow a line except another line or the start of
a new stanza.

The example is taken from William Blake's Songs of Innocence and of
Experience (1794). The markup is designed for illustrative purposes.
http://www-sul.stanford.edu/tools/tutorials/html2.0/gentle.html#fn6
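The rules above can be checked mechanically. The following is a minimal Python sketch, not part of the original slides: it assumes the poem is written with every end tag present (so that the standard-library XML parser can read it), and the function name `check_anthology` and the sample text (two lines from Blake's "The Sick Rose") are chosen here for illustration only.

```python
import xml.etree.ElementTree as ET

# Illustrative sample; element names follow the rules listed above.
SAMPLE = """
<anthology>
  <poem>
    <title>The Sick Rose</title>
    <stanza>
      <line>O Rose thou art sick.</line>
      <line>The invisible worm,</line>
    </stanza>
  </poem>
</anthology>
"""


def check_anthology(xml_text):
    """Return a list of rule violations (empty when the markup obeys the rules)."""
    root = ET.fromstring(xml_text)
    if root.tag != "anthology":
        return ["root element must be <anthology>"]
    problems = []
    for poem in root:
        # Rule 1: an anthology contains poems and nothing else.
        if poem.tag != "poem":
            problems.append(f"rule 1: <{poem.tag}> found inside <anthology>")
            continue
        tags = [child.tag for child in poem]
        # Rule 2: exactly one title, and it comes first.
        if tags[:1] != ["title"] or tags.count("title") != 1:
            problems.append("rule 2: a poem needs exactly one leading <title>")
        for child in poem:
            if child.tag == "title":
                if len(child) > 0:
                    problems.append("rule 2: <title> must not contain elements")
            elif child.tag == "stanza":
                # Rule 4: stanzas consist only of lines.
                for line in child:
                    if line.tag != "line":
                        problems.append(f"rule 4: <{line.tag}> found inside <stanza>")
            else:
                # Rule 3: apart from the title, only stanzas are allowed.
                problems.append(f"rule 3: <{child.tag}> found inside <poem>")
    return problems


print(check_anthology(SAMPLE))  # []
```

Rules 5 and 6 follow from the nesting itself once every element is explicitly closed, so the sketch does not test them separately.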

We could simplify the markup (making assumptions)
• For instance
Rule 2, “a single title”, could mean that we can remove the title's end tag
Rule 3, we could imply the end of the poem, so we don’t need </poem>
Likewise we don’t need </line>

Rules like these need to be made formal
• Need a formal specification -> document type definition (DTD)
• Markup declarations, BNF (Backus Normal Form or Backus–Naur Form)

Three parts:
1. a name or group of names,
2. two characters specifying minimization rules,
i.e., whether the start tag and the end tag may be omitted: either a
hyphen (tag required) or a letter O (for "optional" or "omissible")
3. and a content model (the material)
– there are several reserved words, of which by far the most commonly
encountered is #PCDATA (this means "parsed character data")
– occurrence indicators: the plus sign (+, one or more), the question
mark (?, zero or one) and the asterisk (*, zero or more)

http://www-sul.stanford.edu/tools/tutorials/html2.0/gentle.html

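To make the occurrence indicators concrete, here is a small Python sketch (an illustration, not how an SGML parser is actually built). It assumes a simplified content model that contains only element names, commas, `|`, parentheses and the indicators `+`, `?`, `*`; the `&` connector and `#PCDATA` are not handled. The helper names are hypothetical. The idea is that such a content model maps directly onto a regular expression over the sequence of child element names.

```python
import re


def content_model_to_regex(model):
    """Translate e.g. '(title, stanza+)' into a regex over 'name;' tokens."""
    compact = "".join(model.split())  # drop all whitespace
    # Each element name becomes a group matching that name plus a ';' terminator.
    pattern = re.sub(r"[A-Za-z][\w.-]*",
                     lambda m: "(?:" + m.group(0) + ";)",
                     compact)
    # A comma means "followed by", which is plain concatenation in a regex.
    return pattern.replace(",", "")


def children_match(model, child_names):
    """True when the list of child element names satisfies the content model."""
    regex = re.compile(content_model_to_regex(model))
    sequence = "".join(name + ";" for name in child_names)
    return regex.fullmatch(sequence) is not None


print(children_match("(title, stanza+)", ["title", "stanza", "stanza"]))  # True
print(children_match("(title, stanza+)", ["title"]))                      # False
```

Under this reading, rule 2 and rule 3 for a poem collapse into the single content model `(title, stanza+)`.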
DTD, SGML & HTML

• The SGML encoding thus needs a DTD
• The W3C XML (Extensible Markup Language) is
a subset of SGML
– It reduces the SGML options (e.g., optional end tags)
– So, compulsory end tags

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

• All HTML 4.01 documents conform to one of
three SGML DTDs
-//W3C//DTD HTML 4.01//EN
-//W3C//DTD HTML 4.01 Transitional//EN
-//W3C//DTD HTML 4.01 Frameset//EN

Hypertext Markup Language (HTML)
• HTML is a markup language
– a language for adding some annotation or “markup” to text.
• Prevalent markup language used to create web documents
• There are many markup languages: e.g. LaTeX

LaTeX
\documentclass[twocolumn]{article}
\usepackage{times}
\begin{document}
..
\section{Introduction}
Some text…
\section{Sec 2}
Some more text
..
\end{document}

• HTML tags are interpreted by a Web browser
and are used by it to format the display of the
text
• HTML links can be structured as:
– Linear hyperlink structures
– Hierarchical hyperlink structures

Hypertext Markup Language (HTML) (cont.)

• Cascading Style Sheets (CSS) gives the appearance
• CSS is a set of instructions for the display of pages
– Style sheet is:
o Usually stored in a separate file
o Referenced using the HTML link tag (the style tag holds rules
embedded in the page itself)

HTML
• HTML has a specific set of tags that allow:
– the structure of a document to be described (e.g., <h1> - heading)
– links to other documents on the web to be defined (e.g., <a> - anchor)
– some control of presentation (e.g., fonts).
• To learn HTML you should start creating simple files using
notepad.

<html>
<head>
<title> My Page </title>
</head>
<body>
<h1> My Page </h1>
<p> This is great </p>
</body>
</html>

Uses Tags: html, head, title, body, h1, p

Basic HTML tags

• All text is inside <html>
• <head> and <body> are compulsory
• Comments are included as follows
– <!-- my first comment, which may extend to more than a single line -->
• Always use comments to implement some basic version control
– application name, description, author, creation date, modification
date, version number, copyright

Tags and Elements

• HTML tags are used to mark up a page.
• Tags start and end with angle brackets, e.g., <h1>.
• Need a start and an end tag to mark up a region. End
tags have an additional “/”.
<h1> My Page </h1>
• The two tags define named elements. Above is an h1
element.
• head element gives information about page.
• body element contains content of page.

Elements and Attributes

• Elements can also have attributes, giving additional information.

<body bgcolor="blue"> ..</body>
<table width="610"> ..</table>
<a href="http://…"> link </a>

URL - Uniform resource locator

• Last example defines anchor with hyperlink to
another page.

Presentational markup

• <b>boldface</b> indicates that something should be bold
– But, what about screen readers?
– How do they interpret this? How should this sound?
• Most presentational markup elements became deprecated
under the HTML 4.0 specification
– Instead use CSS
• Deliberately presentational forms are bad
– center, font, strike, u,
– frames, frameset, iframe
– b, br (probably should have used div or p)
– big, small, sub, sup, i
• Presentational attributes:
– align, alink, bgcolor, face, height, size, valign, vlink, width
• Even elements such as TABLE can be used as presentational
markup, which represents bad practice

URLs: Uniform Resource Locators

• Links to other files are defined using URLs.
• These define precisely the location of a file, anywhere
on the WWW.
• URLs can be relative or absolute.
– Absolute URLs give the full path to the file.
– Relative URLs give the location relative to the file containing the
URL.
• A link containing the relative URL href="bit.html" in the
file with URL "http://www.loc/main.html" would load the
file from "http://www.loc/bit.html".
• URLs are also referred to by the broader term URIs -
Uniform Resource Identifiers
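The resolution of a relative URL against the URL of the containing file can be tried out with Python's standard library. This is a small sketch for illustration; the wrapper name `resolve` is hypothetical, while `urljoin` is the real `urllib.parse` function. The host `www.loc` is the one used in the example above.

```python
from urllib.parse import urljoin


def resolve(base_url, href):
    """Resolve an href found in the document at base_url to an absolute URL."""
    return urljoin(base_url, href)


# The example from the slide: a relative link inside main.html.
print(resolve("http://www.loc/main.html", "bit.html"))          # http://www.loc/bit.html
# A relative link that climbs out of a sub-directory.
print(resolve("http://www.loc/docs/main.html", "../bit.html"))  # http://www.loc/bit.html
# An absolute URL is returned unchanged.
print(resolve("http://www.loc/main.html", "http://www.w3.org/"))  # http://www.w3.org/
```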

The Anchor Tag; Href Attribute; Named anchors

• HTML uses the <a> (anchor) tag to create a link to another
document.
• An anchor can point to any resource on the Web: an HTML page,
an image, a sound file, a movie, etc.
• The syntax of creating an anchor:
<a href="url">Text to be displayed</a>
<a href="http://www.w3schools.com/">Visit W3Schools!</a>
• Named anchors
– Jump to specific points
<a name="tips">Useful tips section</a>
– You should notice that a named anchor is not displayed in a special
way.
– To link directly to the "tips" section, add a # sign and the name of the
anchor to the end of a URL,
<a href="http://www.mytips.com/links.html#tips">Jump
to the Useful Tips Section</a>
– Or within the file:
<a href="#tips">Jump to the Useful Tips Section</a>

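Links within a file only work if the named anchor they point at exists. As an illustration, the following Python sketch uses the standard-library `html.parser` to collect named anchors and `#fragment` links, then reports fragments with no matching name. The class and function names are hypothetical, and the sketch assumes targets are declared with the `name` attribute as on this slide.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects <a name="..."> targets and <a href="#..."> links."""

    def __init__(self):
        super().__init__()
        self.names = set()
        self.fragment_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        if attributes.get("name"):
            self.names.add(attributes["name"])
        href = attributes.get("href")
        if href and href.startswith("#"):
            self.fragment_links.append(href[1:])


def broken_fragment_links(html_text):
    """Return the in-page fragments that have no matching named anchor."""
    collector = AnchorCollector()
    collector.feed(html_text)
    return [name for name in collector.fragment_links
            if name not in collector.names]


page = ('<a name="tips">Useful tips section</a>'
        '<a href="#tips">Jump to the Useful Tips Section</a>'
        '<a href="#missing">A link with no target</a>')
print(broken_fragment_links(page))  # ['missing']
```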
more HTML tags…

• Ordered list
<ol>
<li>…</li>
<li> … </li>
</ol>
• Definition lists
<dl>
<dt>Coffee</dt>
<dd>Black</dd>
<dt>Milk</dt>
<dd>iced</dd>
</dl>
• Background color
<body bgcolor="#000000">
<body bgcolor="rgb(0,0,0)">
<body bgcolor="black">
• Background image
<body background="clouds.gif">
<body background="http://www.mytips.com/clouds.gif">

HTML Versions and Validity

• HTML has evolved over the last 15 years, and is continuing to change!
• Development is coordinated now by World Wide Web
Consortium (W3C).
• Currently most widely used versions are HTML 4 and
XHTML. Use of HTML5 is growing
• HTML which is correct in one version may not be correct
in another.
• HTML 4 has a strict version and a transitional version.
Strict is preferred, but transitional allows more backward
compatibility.

Document Type Declaration & Character encoding

• There are tools that allow you to validate an HTML page.
• You must stipulate which version of HTML you are
writing!
• Add a document type declaration to the top of your page.
E.g.
– <!DOCTYPE HTML PUBLIC
"-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
– …
• You may also need to state what character encoding you are using!
• So, add the following in the head element of the document:
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=utf-8">
• (UTF-8 is a common encoding)
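Why the declared encoding matters can be seen by decoding the same bytes in two ways. The Python sketch below is illustrative only (the function name `decode_page` and the sample text are made up here): a page saved as UTF-8 but read as Latin-1 shows garbled accented characters.

```python
def decode_page(raw_bytes, declared_charset):
    """Turn the bytes of a page into text using the declared character encoding."""
    return raw_bytes.decode(declared_charset)


# The page as stored on disk or sent over the network: UTF-8 encoded bytes.
page_bytes = "<p>café</p>".encode("utf-8")

print(decode_page(page_bytes, "utf-8"))    # <p>café</p>
print(decode_page(page_bytes, "latin-1"))  # <p>cafÃ©</p>
```

The "é" is stored as two bytes in UTF-8; a reader that assumes Latin-1 treats each byte as a separate character.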

Validators

• Once we have added these, we can use online
validators to check that our document is correct.
• One validator is the W3C’s validator at
http://validator.w3.org.
• Try it on an example file

HTML Conclusion

• Easy to create Web pages,
• important to understand the basis of HTML,
– and to be able to author “by hand” using text editors.
• Pages that work in your browser may not work in
someone else’s!
– Validate your documents. If you get errors try and
understand them by looking at HTML references.

XHTML - Making HTML XML Compatible!

• HTML is based on the SGML standard for markup languages.
• SGML has now been largely superseded by XML
(eXtensible Markup Language).
• XML is simpler and stricter than SGML, making it more
suitable for “lightweight” processing in Web applications.
• XHTML is an XML compatible version of HTML 4.

HTML 4 vs XHTML

HTML 4.01:
<!DOCTYPE … HTML 4.01 ..>
<html>
…
<body>
<h1> My HTML Page </H1>
<P> A paragraph
<UL>
<li> a list item
</UL>
</body>
</html>
validate

XHTML:
<!DOCTYPE … XHTML ..>
<html>
…
<body>
<h1> My HTML Page </h1>
<p> A paragraph </p>
<ul>
<li> a list item </li>
</ul>
</body>
</html>
validate

XHTML vs HTML 4

• Key things to note about XHTML:
– All tag names must be in lower case.
– Every start tag must have a matching end tag.
The following is not allowed:
<li> stuff ….
<li> more stuff
– Note: Empty elements can be abbreviated to e.g.,
<br />
– The document must start with a document type
declaration, which for XHTML is:
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

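Because XHTML is XML, the "every start tag must have a matching end tag" requirement can be checked with any XML parser. The sketch below uses Python's standard-library `xml.etree.ElementTree`; the function name `is_well_formed` is hypothetical. Note that it tests well-formedness only, and does not validate against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """True when the markup parses as XML (all tags closed and properly nested)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<ul><li>stuff<li>more stuff</ul>"))            # False
print(is_well_formed("<ul><li>stuff</li><li>more stuff</li></ul>"))  # True
print(is_well_formed("<p>a line<br /></p>"))                         # True
print(is_well_formed("<P>mixed case</p>"))                           # False
```

The last case fails because XML tag names are case-sensitive, so `<P>` is not closed by `</p>`.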
The XHTML document head
• Declaring the XHTML file encoding
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8"/>
</head>
• XHTML documents must have a DTD
– strict: full compliance with XHTML
– transitional: allows some presentational markup
– frameset: pages with frames
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

Extensible Markup Language (XML)

• XML uses paired start and stop tags
• It includes data management capabilities that
HTML cannot provide
• Differences between XML and HTML:
– XML is not a markup language with defined tags
– XML tags do not specify how text appears on a Web
page

XML Schema

• Like a DTD, an XML Schema imposes constraints on the XML document
• Filename .xsd
• A set of rules for that document to be considered
to “conform”,
– e.g., adhering to specific datatypes
– Such as an integer or a date
– New datatypes can be derived by
i) Restriction (only include some explicit values)
ii) List (a sequence of values)
iii) Union (a choice of values from specific types)
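The three ways of deriving new datatypes can be mimicked in a few lines of Python. This is only an analogy, not an XML Schema processor: every function name below is hypothetical, and the integer and date checks are approximations of the corresponding schema datatypes.

```python
from datetime import date


def is_integer(text):
    """Approximates an integer datatype."""
    try:
        int(text)
    except ValueError:
        return False
    return True


def is_date(text):
    """Approximates a date datatype written as YYYY-MM-DD."""
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True


def restriction(base, allowed):
    """i) Restriction: the base type, limited to some explicit values."""
    return lambda text: base(text) and text in allowed


def list_of(item):
    """ii) List: a whitespace-separated sequence of values of one type."""
    return lambda text: all(item(part) for part in text.split())


def union(*members):
    """iii) Union: a value that is valid for at least one member type."""
    return lambda text: any(member(text) for member in members)


small_number = restriction(is_integer, {"1", "2", "3"})
numbers = list_of(is_integer)
number_or_date = union(is_integer, is_date)

print(small_number("2"), small_number("5"))                      # True False
print(numbers("1 2 3"), numbers("1 x"))                          # True False
print(number_or_date("2020-01-31"), number_or_date("hello"))     # True False
```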

XML Schema, example

An example of an XML document that conforms to this schema


XPath

• XPath is a query language
• Used for addressing parts of an XML document,
• designed to be used by both XSLT and XPointer.
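A few path expressions in action, as a hedged illustration: Python's standard-library `xml.etree.ElementTree` supports only a limited subset of XPath, which is enough to show the idea of addressing parts of a document. The sample document, the `id` attributes and the helper name `select_text` are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

ANTHOLOGY = """
<anthology>
  <poem id="p1">
    <title>The Sick Rose</title>
    <stanza>
      <line>O Rose thou art sick.</line>
      <line>The invisible worm,</line>
    </stanza>
  </poem>
  <poem id="p2">
    <title>The Fly</title>
    <stanza>
      <line>Little Fly,</line>
    </stanza>
  </poem>
</anthology>
"""


def select_text(xml_text, path):
    """Return the text of every element addressed by the path expression."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall(path)]


# Child steps: the title of every poem.
print(select_text(ANTHOLOGY, "./poem/title"))
# Descendant step: every line, at any depth.
print(len(select_text(ANTHOLOGY, ".//line")))
# Attribute predicate: only the poem whose id is "p2".
print(select_text(ANTHOLOGY, "./poem[@id='p2']/title"))
```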

XSLT, Extensible Stylesheet Language
Transformations
• a declarative, XML-based language
• used for the transformation of XML documents.
• use XSLT to convert XML data into HTML or
XHTML
• Typically transform XML into XSL Formatting
Objects
– Which are then translated into PDF, PostScript etc.
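The kind of XML-to-HTML conversion an XSLT stylesheet performs can be imitated by hand. Note that the sketch below is not XSLT: Python's standard library has no XSLT processor, so it builds the output tree directly with `xml.etree.ElementTree`. The function name `poem_to_html`, the input markup and the choice of output elements are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def poem_to_html(xml_text):
    """Turn <poem><title/><stanza><line/>...</stanza></poem> into an HTML body."""
    poem = ET.fromstring(xml_text)
    body = ET.Element("body")
    # The poem title becomes a level-one heading.
    ET.SubElement(body, "h1").text = poem.findtext("title")
    # Each stanza becomes one paragraph, with its lines separated by " / ".
    for stanza in poem.findall("stanza"):
        lines = [line.text for line in stanza.findall("line")]
        ET.SubElement(body, "p").text = " / ".join(lines)
    return ET.tostring(body, encoding="unicode")


source = ("<poem><title>The Sick Rose</title>"
          "<stanza><line>O Rose thou art sick.</line>"
          "<line>The invisible worm,</line></stanza></poem>")
print(poem_to_html(source))
# <body><h1>The Sick Rose</h1><p>O Rose thou art sick. / The invisible worm,</p></body>
```

An XSLT stylesheet would express the same mapping declaratively, as templates matched against `title` and `stanza`, rather than as procedural code.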

Stylesheets - CSS (Cascading Stylesheets)
• Stylesheets control presentation.
• Main stylesheet language is CSS
• Stylesheets specify how elements should be displayed

body {background-color: blue}
h1 {color: red; font-family: times}

• Placed where?
– style instructions at top of your HTML file

<style type="text/css">
body {background-color: blue}
h1 {color: red}
</style>

– Or external file

<link rel="stylesheet"
type="text/css"
href="mystyle.css">

o Good for consistent appearance across site
o Conflicts between internal and external (internal one wins
when it is declared after the link!)

More on CSS

• CSS is based on a simple statement:
– selector {property: value}
• Selector is (usually) name of element, e.g., h1, body, li
• Property is something like font or color or alignment.
• Value is value you want that property to have, e.g.,
“times”, “blue”.

• Syntax:
– Style sheets consist of a list of rules
– Each rule consists of one or more comma-separated selectors
and a declaration block
– A declaration block consists of a list of semicolon-separated
declarations in curly braces
– Each declaration consists of a property, a colon (:) and a value.
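The syntax just described is regular enough to take apart with a short program. The Python sketch below is a simplification for illustration only (the function name `parse_css` is hypothetical): it handles plain rules and comments, but not at-rules such as `@media` or values that themselves contain semicolons or braces.

```python
import re


def parse_css(css_text):
    """Parse simple CSS into a list of (selectors, declarations) pairs."""
    # Remove /* ... */ comments first.
    css_text = re.sub(r"/\*.*?\*/", "", css_text, flags=re.S)
    rules = []
    # A rule is: selectors, then a declaration block in curly braces.
    for selector_part, block in re.findall(r"([^{}]+)\{([^{}]*)\}", css_text):
        # Selectors are comma-separated.
        selectors = [s.strip() for s in selector_part.split(",") if s.strip()]
        declarations = {}
        # Declarations are semicolon-separated "property: value" pairs.
        for declaration in block.split(";"):
            if ":" in declaration:
                prop, value = declaration.split(":", 1)
                declarations[prop.strip()] = value.strip()
        rules.append((selectors, declarations))
    return rules


sheet = """
h1, h2 {color: red; font-family: times}
body {background-color: blue}
"""
for selectors, declarations in parse_css(sheet):
    print(selectors, declarations)
```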

CSS example

p /* paragraph */
{ font-family: "Garamond", serif; }

h2 /* heading 2 */
{ font-size: 110%; color: red; background: white; }

p.Leftmargin
{margin-left: 2cm}

img
{ float: right; border: 1px dotted black; margin: 0px 0px 15px 20px; }

Advantages of using CSS

• All presentation information for pages held in one place
• Presentation can be updated quickly and easily
• Different users can have different style sheets
• Document code reduced in size and complexity
• CSS has a simple syntax

FORM HTML tag

• The FORM tag creates an HTML form.
• The form can contain interface elements such as
– text fields,
– buttons,
– checkboxes,
– radio buttons,
– selection lists
that let users enter text and make choices.

INPUT tag of FORM
• Several kinds of form elements can be defined using the
INPUT tag,
– Uses TYPE attribute to indicate (e.g.) button,
checkbox, and so on.

INPUT TYPE="BUTTON"
INPUT TYPE="CHECKBOX"
INPUT TYPE="FILE"
INPUT TYPE="HIDDEN"
INPUT TYPE="IMAGE"
INPUT TYPE="PASSWORD"
INPUT TYPE="RADIO"
INPUT TYPE="RESET"
INPUT TYPE="SUBMIT"
INPUT TYPE="TEXT"

<HTML><HEAD>
<TITLE>Form example .. tell me your name </TITLE>
</HEAD>
<BODY>
<H2>Who are you?</H2>
<FORM
METHOD=POST
ACTION="mailto:j.c.roberts@somehost.ac.uk">
<P>Enter your name:<INPUT NAME="theName"></P>
<P><INPUT TYPE="submit"></P>
</FORM>
</BODY>
</HTML>

Enter some text, more fields

<html><head>
<title>Form example .. tell me your name </title>
</head>
<body>
<h2>Who are you?</h2>
<form
method="post"
action="mailto:j.c.roberts@somehost.ac.uk"
enctype="text/plain">
<p>Enter your name:
<input name="theName">
</p>
<p>Enter your age:
<input name="theAge" size="3" maxlength="3">
</p>
<p>Enter your address:
<input name="theAddress" type="text">
</p>
<p><input type="submit"></p>
</form>
</body>
</html>

Radio buttons...

• List of items (only choose one..)
– Use radio type
• Multiple items from a list: none, one or more
– Use checkbox type, as here:
<OL>
<LI><INPUT TYPE="checkbox" NAME="red">Red</LI>
<LI><INPUT TYPE="checkbox" NAME="green">Green</LI>
<LI><INPUT TYPE="checkbox" NAME="blue">Blue</LI>
</OL>

Setting and Resetting

• For the default value:
– use VALUE for text fields
– use CHECKED for check boxes and radio buttons
– use RESET type for a button that resets the fields to their defaults
– Also, PASSWORD type allows a password to be entered; the typed
characters are replaced by ****’s
<INPUT TYPE="RESET" VALUE="Clear fields">
<INPUT TYPE="PASSWORD" NAME="password">

• Form data can go to mailto:blob@somehost.ac.uk but also
could go to an external program:

<FORM METHOD=POST
ACTION="http://www.cs.bangor.ac.uk/cgi-bin/post-query">

Forms

• Forms introduce interactivity
– user enters information
– information is passed on from client to server
• Processing client response
– use PHP scripts to process the data on the server
– invoke the client emailer to forward data to the server
• Data communication
– get is a short request attached to the end of a URL
– post sends a separate data message, and can send a large amount of
data

<form action="URL" method="post"|"get">...</form>

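The difference between get and post can be made concrete by encoding the same form fields both ways. This Python sketch assumes the default form encoding (`application/x-www-form-urlencoded`); the helper names and the `www.example.org` address are hypothetical, while the field names `theName` and `theAge` are the ones used in the earlier form example.

```python
from urllib.parse import urlencode, parse_qs


def build_get_url(action, fields):
    """get: the encoded fields are attached to the end of the action URL."""
    return action + "?" + urlencode(fields)


def build_post_body(fields):
    """post: the same encoded fields travel separately, in the request body."""
    return urlencode(fields).encode("ascii")


fields = {"theName": "Ada Lovelace", "theAge": "36"}

print(build_get_url("http://www.example.org/cgi-bin/query", fields))
# http://www.example.org/cgi-bin/query?theName=Ada+Lovelace&theAge=36
print(build_post_body(fields))
# b'theName=Ada+Lovelace&theAge=36'

# The receiving program decodes the fields again.
print(parse_qs("theName=Ada+Lovelace&theAge=36"))
# {'theName': ['Ada Lovelace'], 'theAge': ['36']}
```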
Summary

• Use XHTML if possible for XML compatibility.
• Validate your documents
– they will then be more likely to work in all browsers!
• Use stylesheets to control appearance,
– especially when you want lots of pages to have the same
style.
• Use forms to allow user input, with “action”
attribute to specify what is to be done with the
input.

The Structured web

• Talked about structure and the web
• Discussed some technologies to mark up the
information
– Discussed SGML, and HTML, and XML
• Thought about structure where it is and how to
control it.
– What is structured?
– The Web is only one place for structured content.
o What other information can be structured?
• Discussed hypertext and links and
