
IST 195 Programming 1: HTML-Web The Semantic Web, XML, and English on the Web

Prof. Randy Wenner

Adapted from Professor Jeff Stanton

Learning Map
1. The Semantic Web
Giving web page contents more meaning for people and computers

2. XML
One of the most important tools for creating the semantic web

3. English and the Web
Challenges of many cultures, many pages, in many languages, and how XML and the semantic web may help

Semantic Web
Semantic:
Part of the structure of language relating to meaning, especially of words

The Semantic Web: Web 3.0?


An idea for the future of the WWW in which information is tagged with information about its meaning rather than about its format

The Web is not Semantic now


Currently the web is a large collection of HTML documents and a bit of other stuff
We know that HTML is a formatting language: It says where elements on the page should go and what they should look like
Example: <h2>Zap Mama</h2> makes a second level heading, left justified, bold, larger font

HTML does not say what anything actually is, that is, what kind of information it is
You can't tell from the <h2> tag what Zap Mama signifies. Is it a command? A label? A name?

Where we are Today: the Syntactic Web

[Hendler & Miller 02]

The Syntactic Web is


A hypermedia, a digital library
A library of documents (called web pages) interconnected by hypermedia links

A database, an application platform


A common portal to applications accessible through web pages, and presenting their results as web pages

A platform for multimedia


BBC Radio 4 anywhere in the world! Terminator trailers!

A naming scheme
Unique identity for those documents
[Goble 03]

A place where computers do the presentation (easy) and people do the linking and interpreting (hard). Why not get computers to do more of the hard work?

Hard Work using the Syntactic Web


Find image of Buzz Shaw (SU former chancellor)

http://www.buzzbutt.com/html/shaw_party.html

What is the Problem? Consider a typical web page:


Markup consists of:
Rendering information (e.g., font size and color)
Hyperlinks to related content

Semantic content is accessible to humans but not (easily) to computers

What information we see


WWW 2002 The eleventh international world wide web conference Sheraton Waikiki hotel Honolulu, Hawaii, USA 7-11 may 2002 1 location 5 days learn interact Registered participants coming from australia, canada, chile denmark, france, germany, ghana, hong kong, india, ireland, italy, japan, malta, new zealand, the netherlands, norway, singapore, switzerland, the united kingdom, the united states, vietnam, zaire Register now On the 7th May Honolulu will provide the backdrop of the eleventh international world wide web conference. This prestigious event Speakers confirmed Tim Berners-Lee Tim is the well known inventor of the Web, Ian Foster Ian is the pioneer of the Grid, the next generation internet

What information a machine sees


WWW2002 The eleventh inteqnational woqld wide web confeqence Sheqaton waikiki hotel Honolulu, hawaii, USA 7-11 may 2002 1 location 5 days leaqn inteqact Registeqed paqticipants coming fqom austqalia, canada, chile denmaqk, fqance, geqmany, ghana, hong kong, india, iqeland, italy, japan, malta, new zealand, the netheqlands, noqway, singapoqe, switzeqland, the united kingdom, the united states, vietnam, zaiqe Registeq now On the 7th May Honolulu will pqovide the backdqop of the eleventh inteqnational woqld wide web confeqence. This pqestigious event Speakeqs confiqmed Tim beqneqs-lee Tim is the well known inventoq of the Web, Ian Fosteq Ian is the pioneeq of the Gqid, the next geneqation inteqnet

So, if a machine sees garble


You can't ask it
Who's speaking at the conference?
What countries will be represented?
What dates is the conference being held?
etc.

The Semantic Web aims to solve this


Rather than describing formatting, tags would designate what kind of information a piece of information was.
Rather than discarding the internal organization of the data when placing it on web pages, authors would keep the natural structure of the data.

Artist | Title | Courtesy
Beastie Boys | Now Get Busy | Beastie Boys appear courtesy of Beastie Boys and Capitol Records.
David Byrne | My Fair Lady | David Byrne appears courtesy of Nonesuch Records.
Zap Mama | Wadidyusay? | Zap Mama appears courtesy of Luaka Bop Records.

Tags like <artist>Zap Mama</artist> would replace the current HTML strategy of tagging the format of the information
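
To make the idea of meaning-bearing tags concrete, here is a minimal Python sketch. It assumes a small XML document that wraps the artist and title data in <artist> and <title> tags. The <tracks> and <track> wrapper elements are hypothetical names chosen for illustration and are not part of any standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: only <artist> comes from the slide; <tracks>, <track>
# and <title> are illustrative names made up for this sketch.
TRACKS_XML = """<tracks>
  <track><artist>Beastie Boys</artist><title>Now Get Busy</title></track>
  <track><artist>David Byrne</artist><title>My Fair Lady</title></track>
  <track><artist>Zap Mama</artist><title>Wadidyusay?</title></track>
</tracks>"""

root = ET.fromstring(TRACKS_XML)

# Because the tag names say what the data is, "find every artist" is a
# direct query rather than a guess based on headings or font sizes.
artists = [element.text for element in root.iter("artist")]

# The natural structure of the data (each track pairs an artist with a title)
# is kept, so the pairing can be read back out.
title_by_artist = {
    track.findtext("artist"): track.findtext("title")
    for track in root.iter("track")
}

if __name__ == "__main__":
    print(artists)
    print(title_by_artist["Zap Mama"])
```

With an <h2>Zap Mama</h2> heading, a program could only learn that the text is a second-level heading. With <artist>Zap Mama</artist>, it can answer the question "which artists are listed?".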

The Semantic Web was always the goal


The Web was invented by Tim Berners-Lee (amongst others), a physicist working at CERN
TBL's original vision of the Web was much more ambitious than the reality of the existing (syntactic) Web:
... a goal of the Web was that, if the interaction between person and hypertext could be so intuitive that the machine-readable information space gave an accurate representation of the state of people's thoughts, interactions, and work patterns, then machine analysis could become a very powerful management tool, seeing patterns in our work and facilitating our working together through the typical problems which beset the management of large organizations.

TBL (and others) have since been working towards realizing this vision, which has become known as the Semantic Web
article in May 2001 issue of Scientific American

More on the Semantic Web

Oh Happy Day!
The Semantic Web is under development
Three major components:

XML: Extensible Markup Language
for tagging the structure of the data

RDF: Resource Description Framework
a way to break knowledge down into small pieces, with some rules about the meaning of those pieces
Goal: to have a method so simple that it can express any fact, and yet so structured that computer applications can do useful things with knowledge expressed in RDF

OWL: Web Ontology Language
for describing the big picture about how data elements on one or more pages all fit together and relate to one another

Of course, we can't be drawing our way through the Semantic Web, so instead how about a table-style representation for the graph? Each row represents an arrow (an edge) in the figure. The first column has the name of the node at the start of the edge. The second column has the label of the edge itself (the kind of edge). The third column has the name of the node at the end of the arrow.

Start Node | Edge Label | End Node
vincent_donofrio | starred_in | law_&_order_ci
law_&_order_ci | is_a | tv_show
the_thirteenth_floor | similar_plot_as | the_matrix
...
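
The table-style representation maps directly onto a list of three-part records. The sketch below is one possible way to hold the rows in Python and follow edges between nodes. The names `Edge`, `ends_of`, and `tv_shows_starring` are made up for this illustration.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    """One row of the table: an arrow from a start node to an end node."""
    start: str
    label: str
    end: str


# The rows of the table, one Edge per row.
GRAPH = [
    Edge("vincent_donofrio", "starred_in", "law_&_order_ci"),
    Edge("law_&_order_ci", "is_a", "tv_show"),
    Edge("the_thirteenth_floor", "similar_plot_as", "the_matrix"),
]


def ends_of(graph, start, label):
    """Return the end nodes of every edge leaving `start` with this label."""
    return [edge.end for edge in graph if edge.start == start and edge.label == label]


def tv_shows_starring(graph, actor):
    """Follow starred_in edges, keeping only nodes that are is_a tv_show."""
    return [
        node
        for node in ends_of(graph, actor, "starred_in")
        if "tv_show" in ends_of(graph, node, "is_a")
    ]


if __name__ == "__main__":
    print(tv_shows_starring(GRAPH, "vincent_donofrio"))
```

Answering "which TV shows did this actor star in?" takes two hops through the table: one along starred_in and one along is_a. This is the kind of chained lookup that small, uniform facts make possible.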

Example XML
The following text may look identical in a browser

Example XML
But it's quite different under the hood. See how the XML differs from the HTML?

[Figure: the HTML source and the XML source of the same text]

Example XML
You can use Internet Explorer to view XML in its raw form (VIEW>SOURCE)
Note the meaningful tags, like <tasklist>

Example RDF
RDF information is expressed in XML
This example describes the prior example
Gives the title, author, creation date, and subject
These pieces of information are called metadata because they are data about data

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://www.example.org/">
  <rdf:Description rdf:about="http://www.example.org/vincent_donofrio">
    <ex:starred_in>
      <ex:tv_show rdf:about="http://www.example.org/law_and_order_ci" />
    </ex:starred_in>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.example.org/the_thirteenth_floor">
    <ex:similar_plot_as rdf:resource="http://www.example.org/the_matrix" />
  </rdf:Description>
</rdf:RDF>
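
As a rough illustration of how a program might read this RDF back into start node, edge label, and end node rows, here is a sketch that uses only Python's standard XML parser. It is not a general RDF parser: it assumes the two patterns that appear in this example (a property with an rdf:resource attribute, and a property containing a typed node with rdf:about). Reporting the node's type under the label is_a is a simplification chosen for this sketch. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://www.example.org/"

RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://www.example.org/">
  <rdf:Description rdf:about="http://www.example.org/vincent_donofrio">
    <ex:starred_in>
      <ex:tv_show rdf:about="http://www.example.org/law_and_order_ci" />
    </ex:starred_in>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.example.org/the_thirteenth_floor">
    <ex:similar_plot_as rdf:resource="http://www.example.org/the_matrix" />
  </rdf:Description>
</rdf:RDF>"""


def local_name(uri_or_tag):
    """Drop the example.org namespace, leaving a short readable name."""
    return uri_or_tag.replace("{" + EX_NS + "}", "").replace(EX_NS, "")


def extract_triples(rdf_xml):
    """Collect (start, label, end) rows from the two patterns used above."""
    about = "{" + RDF_NS + "}about"
    resource = "{" + RDF_NS + "}resource"
    root = ET.fromstring(rdf_xml)
    triples = []
    for description in root.findall("{" + RDF_NS + "}Description"):
        subject = local_name(description.get(about))
        for prop in description:
            label = local_name(prop.tag)
            # Pattern 1: the end node is named by an rdf:resource attribute.
            if prop.get(resource) is not None:
                triples.append((subject, label, local_name(prop.get(resource))))
            # Pattern 2: the end node is a nested element whose tag is its type.
            for node in prop:
                end = local_name(node.get(about))
                triples.append((subject, label, end))
                triples.append((end, "is_a", local_name(node.tag)))
    return triples


if __name__ == "__main__":
    for row in extract_triples(RDF_XML):
        print(row)
```

For real work, a dedicated RDF library would handle the many syntax variations that this sketch ignores.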

OWL Example
OWL is also expressed in a form similar to XML
Things to note from the example:
A wine is a potable liquid produced by at least one maker of type winery
A wine is made from at least one type of grape (such grapes are restricted to wine grapes elsewhere in the ontology)

Wine
<rdfs:Class rdf:ID="WINE">
  <rdfs:subClassOf rdf:resource="#POTABLE-LIQUID"/>
  <rdfs:subClassOf>
    <daml:Restriction>
      <daml:onProperty rdf:resource="#MAKER"/>
      <daml:minCardinality> 1 </daml:minCardinality>
    </daml:Restriction>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <daml:Restriction>
      <daml:onProperty rdf:resource="#MAKER"/>
      <daml:toClass rdf:resource="#WINERY"/>
    </daml:Restriction>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <daml:Restriction>
      <daml:onProperty rdf:resource="#GRAPE-SLOT"/>
      <daml:minCardinality> 1 .........
</rdfs:Class>

Learning Map - Pause


1. The Semantic Web
As it stands, HTML documents have lots of formatting but little meaningful structure
For example, it is impossible to search the web for a name and restrict the search to names of authors
Adding semantics to web pages would simplify both searching and organization of information on the web; automatic research tools could eliminate lots of the drudgery
The semantic web is under development by W3C and others and consists of XML, RDF, and OWL

Next: More on XML
Last: English and the Web

XML The Universal Markup


A human and computer readable coding system for tagging web documents
In order to be human readable, both markup and data are represented in character form

Can be used for a variety of purposes, but for the semantic web is used for giving structure to raw data
XML is a restricted form of SGML, the Standard Generalized Markup Language (ISO Standard #8879)

XML Smackdown
XML is bigger and more powerful than HTML
Anything HTML can do, XML can do better
XML can be used to create specialized tools (such as XML Signatures for storing and managing digital signatures)
There is no easy or consistent way to do this in HTML

HTML has been reformulated as XHTML, which expresses all of the HTML tags as proper XML tags
HTML may eventually become obsolete, in which case pages would have to be XHTML compliant to display properly

XML Make it up as you go


<?xml version="1.0" encoding="UTF-8" ?>
<courseOffering>IST195
  <semester>Summer 2010</semester>
  <instructor>Randy Wenner
    <instrTitle>Adjunct Professor</instrTitle>
  </instructor>
  <location>Hinds 010</location>
</courseOffering>
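
Because the tags were invented for this document, any program that knows their names can pull the data back out. The following is a minimal sketch using Python's standard library. The function name `read_course` and the dictionary keys are hypothetical choices for this illustration.

```python
import xml.etree.ElementTree as ET

# The course offering document from the slide, passed as bytes so the
# encoding declaration on the first line is honored by the parser.
COURSE_XML = b"""<?xml version="1.0" encoding="UTF-8" ?>
<courseOffering>IST195
  <semester>Summer 2010</semester>
  <instructor>Randy Wenner
    <instrTitle>Adjunct Professor</instrTitle>
  </instructor>
  <location>Hinds 010</location>
</courseOffering>"""


def read_course(xml_bytes):
    """Return the fields of a courseOffering document as a dictionary."""
    course = ET.fromstring(xml_bytes)
    instructor = course.find("instructor")
    return {
        # Text that sits directly inside an element, before its first child,
        # is available as .text (with surrounding whitespace to strip).
        "course": course.text.strip(),
        "semester": course.findtext("semester"),
        "instructor": instructor.text.strip(),
        "instructor_title": instructor.findtext("instrTitle"),
        "location": course.findtext("location"),
    }


if __name__ == "__main__":
    for field, value in read_course(COURSE_XML).items():
        print(field, "=", value)
```

The course code and the instructor's name share their elements with child tags, which is why the sketch strips whitespace from them. Putting each value in its own element, such as a hypothetical <courseCode>, would make the document easier to process.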

Learning Map - Pause


1. The Semantic Web

2. XML
The universal markup language, poised to take over pretty much everything having to do with markup on the web
XML is the heart of the semantic web

English and the Web

English Language Country Categories


Countries with English as a native language:
Examples: UK, USA, Australia, New Zealand, Canada

Countries with English as a second language:
Everyday or official usage
Examples: India, Singapore, Ghana

Countries with English as a frequently taught foreign language:
Examples: France, Germany, Netherlands

Countries with English as a foreign language:
English not generally spoken within the country
Examples: Japan, Thailand, China

Where is the major growth coming from in terms of new web content?

North America? Nope, try again


North America does not have the most Internet users, and it does not have the fastest growth in Internet use
WORLD INTERNET USAGE AND POPULATION STATISTICS

World Regions | Population (2009 Est.) | Internet Users Dec. 31, 2000 | Internet Users Latest Data | Penetration (% Population) | Growth 2000-2009 | Users % of Table
Africa | 991,002,342 | 4,514,400 | 86,217,900 | 8.7 % | 1,809.8 % | 4.8 %
Asia | 3,808,070,503 | 114,304,000 | 764,435,900 | 20.1 % | 568.8 % | 42.4 %
Europe | 803,850,858 | 105,096,093 | 425,773,571 | 53.0 % | 305.1 % | 23.6 %
Middle East | 202,687,005 | 3,284,800 | 58,309,546 | 28.8 % | 1,675.1 % | 3.2 %
North America | 340,831,831 | 108,096,800 | 259,561,000 | 76.2 % | 140.1 % | 14.4 %
Latin America/Caribbean | 586,662,468 | 18,068,919 | 186,922,050 | 31.9 % | 934.5 % | 10.4 %
Oceania / Australia | 34,700,201 | 7,620,480 | 21,110,490 | 60.8 % | 177.0 % | 1.2 %
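
The percentage columns can be reproduced from the three count columns. The sketch below assumes the usual definitions: penetration is latest users divided by population, growth is the increase in users relative to the 2000 figure, and share is a region's latest users divided by the total for all regions in the table. The function names are made up for this illustration.

```python
# Per region: population (2009 est.), users on Dec. 31, 2000, users (latest data)
REGIONS = {
    "Africa": (991_002_342, 4_514_400, 86_217_900),
    "Asia": (3_808_070_503, 114_304_000, 764_435_900),
    "Europe": (803_850_858, 105_096_093, 425_773_571),
    "Middle East": (202_687_005, 3_284_800, 58_309_546),
    "North America": (340_831_831, 108_096_800, 259_561_000),
    "Latin America/Caribbean": (586_662_468, 18_068_919, 186_922_050),
    "Oceania / Australia": (34_700_201, 7_620_480, 21_110_490),
}


def penetration_pct(population, users_latest):
    """Share of the population that uses the Internet, in percent."""
    return 100 * users_latest / population


def growth_pct(users_2000, users_latest):
    """Increase in users since 2000, relative to the 2000 figure, in percent."""
    return 100 * (users_latest - users_2000) / users_2000


def summarize(regions):
    """Return {region: (penetration, growth, share of table)}, one decimal each."""
    total_users = sum(latest for _, _, latest in regions.values())
    return {
        name: (
            round(penetration_pct(population, latest), 1),
            round(growth_pct(users_2000, latest), 1),
            round(100 * latest / total_users, 1),
        )
        for name, (population, users_2000, latest) in regions.items()
    }


if __name__ == "__main__":
    for name, (penetration, growth, share) in summarize(REGIONS).items():
        print(f"{name}: penetration {penetration} %, growth {growth} %, share {share} %")
```

Sorting by the growth figure puts Africa and the Middle East at the top. Sorting by user count puts Asia first. Both orderings support the point that North America leads in neither category.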

But English is the Language of the Web

Are you pleased? Better not get too happy!


Because the Internet was developed in the U.S. and had its earliest rapid growth there, originally most pages were in English.
As the Internet has grown and the adoption has occurred more rapidly, the overall number of English language pages has grown, but as a proportion of the total content on the web, English has dropped.

Top 10 Languages on the Web (http://www.internetworldstats.com)


Top Ten Languages in the Internet | Internet Users by Language | Internet Penetration by Language | Growth in Internet (2000 - 2009) | Internet Users % of Total | World Population for this Language (2009 Estimate)
English | 499,213,462 | 39.5 % | 251.7 % | 27.7 % | 1,263,830,976
Chinese | 407,650,713 | 29.7 % | 1,162.0 % | 22.6 % | 1,373,859,774
Spanish | 139,849,651 | 34.0 % | 669.2 % | 7.8 % | 411,631,985
Japanese | 95,979,000 | 75.5 % | 103.9 % | 5.3 % | 127,078,679
Portuguese | 77,569,900 | 31.4 % | 923.9 % | 4.3 % | 247,223,493
German | 72,337,310 | 75.0 % | 161.1 % | 4.0 % | 96,389,702
Arabic | 60,252,100 | 17.5 % | 2,297.7 % | 3.3 % | 344,139,242
French | 57,017,099 | 16.9 % | 375.2 % | 3.2 % | 337,046,097
Russian | 45,250,000 | 32.3 % | 1,359.7 % | 2.5 % | 140,041,247
Korean | 37,475,800 | 52.7 % | 96.8 % | 2.1 % | 71,174,317

Likely Conclusions
Although the total number of English language web pages will continue to grow, the proportion versus total pages will continue to drop
The proportional growth of pages in European languages will also slow down
The proportion of pages in Chinese will grow at an accelerating pace

The use for or necessity of automated and semiautomated page translation will increase markedly over the coming ten years

Machine Translation
Refers to the process of using a computer program to translate from one language to another
The state of the art is still not as accurate or sophisticated as one might like
Back-translation example from Babelfish:
Original text: In all of Syracuse University, there is not a finer instructor of Information Technology than the highly-accomplished, intellectual overachiever, and all-around good guy who is known as Randy Wenner.

English to Chinese: Wenner

Chinese back to English: West the grand total forces the dove Si university, compared to is called blue Wenner the high success, the intelligence high achievement and the versatile goodness does not have an information technology better instructor.

Machine Translation and XML

Machine translation can be improved by the use of XML and XML standards
XML documents are much easier to translate than other electronic documents because they separate out form from content, and they conform to a rigorous standard and defined syntax.
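
One way to picture the separation of form from content: a translation tool can walk an XML document, translate only the text inside elements known to hold human-language content, and leave the tags untouched. The sketch below is illustrative only. `translate_text` is a hypothetical stand-in backed by a toy glossary, not a real translation service. The choice of which tags are translatable is also an assumption.

```python
import xml.etree.ElementTree as ET

# Toy glossary standing in for a real machine translation engine (assumption).
TOY_GLOSSARY = {
    "Summer 2010": "Verano 2010",
    "Adjunct Professor": "Profesor adjunto",
}


def translate_text(text):
    """Hypothetical translator: look the phrase up, else return it unchanged."""
    return TOY_GLOSSARY.get(text, text)


def translate_document(xml_text, translatable_tags):
    """Translate the text of the listed elements; markup stays as it was."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if element.tag in translatable_tags and element.text:
            element.text = translate_text(element.text.strip())
    return ET.tostring(root, encoding="unicode")


SOURCE = (
    "<courseOffering>"
    "<semester>Summer 2010</semester>"
    "<instrTitle>Adjunct Professor</instrTitle>"
    "<location>Hinds 010</location>"
    "</courseOffering>"
)

if __name__ == "__main__":
    # <location> is a room name, so it is deliberately left out of the list.
    print(translate_document(SOURCE, {"semester", "instrTitle"}))
```

Because the tags identify what each piece of text is, the tool can skip content that should not be translated, such as names and room numbers. That is much harder to do reliably with format-only markup.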

Web-Based Translation Tools

Babelfish (http://babelfish.yahoo.com/)
Translates short text passages
Also try http://www.freetranslation.com/

Google
Google Translate (http://translate.google.com/)
Translates text or URLs

Google Language Tools (http://www.google.com/language_tools?hl=en)
Search for pages in any of several dozen languages

Google Chrome (browser) will offer to translate for you

Learning Map - Pause


1. The Semantic Web

2. XML

3. English and the Web
English is a first language for relatively little of the world population and a declining proportion of Internet users
The need will grow over the coming 5-10 years for automated translation of web content to facilitate use of foreign language web pages
XML can be used to develop tools and standards that will assist with the development of better machine translation
