SW Mod1 PartB

Semantic Web
In the area of the Semantic Web, the metadata is

used to add meaning to the page, and this
structured data can be easily understood by
machines
Metadata provides the essential link between the

page content and content meaning
SEARCH ENGINE FOR THE TRADITIONAL
WEB
●
Today, the most visible component of the Internet is the Web
●
It has hundreds of millions of pages available, presenting
information on an amazing variety of topics
●
you can find almost anything you are interested in on the Web,
and all this information originates from search engines
●
In the current world, there are many different search engines
you can chose from.Perhaps the most widely used ones are
Google and Yahoo! There are also other search engines such
as AltaVista, Lycos, etc.
BUILDING THE INDEX TABLE
●
Even before a search engine is made available on the Web, it
starts preparing a huge index table for its potential users. This
process is called the indexation process
●
it will be conducted repeatedly throughout the life of the search
engine
●
The quality of the generated index table to a large extent
decides the quality of a query result
●
The indexation process is conducted by a special piece of
software usually called a spider, or crawler
Indexation Process
Step 1: Build an index table for every single word on this page
Step 2: From the current page, find the first link (which is again
a URL pointing to another page) and “crawl” to this link,
meaning to download the page pointed to by this link
Step 3: After downloading this page, start reading each word on
this page, and add them all to the index table
Now there are two possibilities: (1) the current word from this new page has never
been added to the index table, or (2) it already exists in the index table
Step 4: Go to step 2, until no unvisited link exists (more about

this later)
Indexation Process
Indexation Process
Indexation Process
Indexation Process
CONDUCTING THE SEARCH
A DDITIONAL DETAILS
Let us summarize
●
The Web is made up of billions of Web pages
●
Most Web page have three different kinds of codes:
HTML/XHTML tags (required), CSS tags (optional), and
JavaScript (optional)
●
All these tags in a Web page just tell the machines how to display
the page; they do not tell the machines what they mean
●
When a crawler reaches a given page, it has no way to know
what this page is all about. In fact, it is not even able to tell
whether the page is a personal page or a corporate Web site
●
Thus, the crawler can only collect keywords, turning all search
engines into essentially keyword matchers
SEARCH ENGINE FOR THE
SEMANTIC WEB
A HYPOTHETICAL USAGE OF THE
TRADITIONAL SEARCH ENGINE
By typing “SLR,” I should get the following list

●
www.cheapCameras.com
●
www.buyItHere.com and many other links...
They do not discuss the performance issues of digital SLRs
which is the topic that I am looking for Only if I am patient
enough will I be able to reach www.goodPhoto.com , which is
very useful and does address my concerns
●
There are some really good sites I have missed
www.digcamhelp.com talks about shutter speed and aperture
it uses the phrase single lens reflex instead of SLR
BUILDING A
SEMANTIC WEB SEARCH ENGINE
Step 1: Build a common language

The word semantics means meaning. Adding semantics into the
Web, therefore, means adding meaning to the Web. Before we
can add meaning, the first step is to figure out how to express
meaning
This is done by constructing a vocabulary that has meaning
(knowledge) coded inside its terms
there is a group of experts — mainly professional photographers
who have extensive domain knowledge — who get together, and
work out a vocabulary set
A simple Vocabulary
A simple Vocabulary
This extremely small vocabulary tells us the following:

Camera is a root concept in this vocabulary; we can also call it
a root term or a root class. Camera has two subconcepts:
Digital camera or Film camera. Also, Digital camera can have
two subclasses: SLR camera or Point-and-Shoot camera
●
the concept of class is the same as that used in the object-
oriented programming language world
●
At this point, we can see the meaning, or knowledge if you will,
of the photography domain being coded into this vocabulary
A simple Vocabulary
We can even add more knowledge into it by saying more about

this domain:
SingleLengthReflex camera is exactly the same class or
concept as the SLR class, and SLR-Camera is also exactly the
same concept as the SLR class. Further, one can use two
different concepts to describe the performance of a SLR class:
one is ShutterSpeed and the other is Aperture ; or, one can say
that SLR class has two properties: ShutterSpeed and Aperture
●
Clearly, it is domain specific and includes a bunch of concepts
(classes) with the relations defined among these concepts
A simple Vocabulary
The benefits of building such a vocabulary are the following :

●
It is a standard way to express the meaning/knowledge of a
specific domain
●
It is a common understanding/language/meaning/knowledge
shared by different parties over the Web
This vocabulary is built for a specific domain, and in our example
this domain is photography
building a comprehensive vocabulary that covers everything is just
too ambitious; it is not even technically possible, given the fact that
a single concept may have different meanings in different domains
BUILDING A
Step 2: Markup the pages
●
to take advantage of the Semantic Web, semantic similarities
between the words on the given page and the concepts defined in the
vocabulary have to be explicitly expressed
●
The solution is to markup the page. More specifically, to markup
means to add some extra data or information to the Web page to
describe some specific characteristic of the page. In our case, we
need to add some data to explicitly indicate that the semantics of
some words contained in this page is defined in some common
vocabulary set
●
The extra data used to express semantic similarity is indeed
metadata. Therefore, to markup a page is to add some metadata to it
BUILDING A
●
First, the page owner of www.goodPhoto.com creates a special
file by using one of the description languages. This file conveys
the following information:
●
The word SLR used in these pages has the same semantics as
that defined in the common vocabulary file
●
The page owner then saves this file somewhere on the Web,
most likely on the Web server hosting www.goodPhoto.com
●
Second, the pages have to be marked up: each page that has the
word SLR has to be connected to the preceding description file.
This is done by adding a <link> tag in the <head> section of the
page
Connecting the Web Page to the
Special File
BUILDING A
Let us now assume the owners of the following two sites have
also marked up their pages:
●
www.digcamhelp.com
●
www.ehow.com
The semantics of the words shutter speed on this page is the
same as that defined in the common vocabulary; the semantics
of the word aperture is the same as that defined in the common
vocabulary
The single lens reflex camera discussed on this page is an
instance of the class SingleLensReflex defined in the common
vocabulary.
BUILDING A
Step 3: A much smarter crawler
●
Now that the related pages have been marked up, it is up to the crawler to
collect this information while crawling. It can now be viewed as a smart
agent
●
At some point during its journey on the Web, the crawler reaches a page
under www.cheapcameras.com . Just as it usually does, it downloads this
page and starts to index the words on it
●
The first thing the crawler notices is that this page does not link to any
special file. This implies that this page has not been marked up for any
special semantics
●
For all the pages meant mainly for selling the cameras, the same process
is repeated by the crawler: when it hits the word SLR, it simply locates it in
the index table, and adds the corresponding document record
BUILDING A
BUILDING A
Finally, the crawler reaches a page belonging to
www.goodPhoto.com . The first thing the crawler sees is that this
page links to another file (the markup file), which specifies that
the word SLR used on this page has the same semantics as that
defined in the common vocabulary. Immediately, the crawler
accesses the URL of this markup file and downloads it to its
memory. It will perform the following two important steps:
BUILDING A
USING THE SEMANTIC WEB SEARCH
ENGINE
Step 1 : Select your semantics

●
After accepting the keyword we entered (in this case, SLR), the
engine will first scan all the available common vocabularies and
find all the ones that have the concept SLR defined in them
●
it is quite possible that there are other common vocabulary sets
●
The engine then presents these owl files, and it is up to us to tell
the engine that the semantics (meaning) of the keyword SLR is
the same as that defined in this vocabulary:
mySimpleCamera.owl
USING THE SEMANTIC WEB SEARCH
ENGINE
Step 2 : Search the index table and return the results
It does this by retrieving the document list indexed by the keyword
SLR. It then iterates over this list and for each document in the list, it
checks the markupURL field and does the following:
●
If this field points to NULL , discard this document
●
If the field is not NULL and if it does not point to
mySimpleCamera.owl , then discard this document
●
If the markupURL field is not NULL and it does point to
mySimpleCamera .owl , then include this document in the candidate
For our hypothetical example, the following three sites will be collected
: www.goodPhoto.com, www.digcamhelp.com, www.ehow.com
THE SEMANTIC WEB SEARCH
ENGINE : EXAMPLES
●
Swoogle (http://swoogle.umbc.edu/2006/)
●
Watson (http://kmi.open.ac.uk/technologies/name/watson/)
●
Sindice (https://sindice.com/)
●
SemSearch (http://technologies.kmi.open.ac.uk/semsearch/)
●
WebOWL
THE SEMANTIC WEB: A SUMMARY
THE SEMANTIC WEB: A SUMMARY
THE SEMANTIC WEB: DEFINITION
WHAT IS THE KEY TO SEMANTIC WEB
IMPLEMENTATION?
●
the key to the implementation of the Semantic Web is this
structured data set
●
a structured data set implies that it is machine readable
●
we called it the common vocabulary file
●
a commonly accepted name for this common vocabulary file
is ontology
●
RDF, RDFS, and OWL are all description languages that you
can use to construct such an ontology

SW Mod1 PartB

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

SW Mod1 PartB

Uploaded by

Copyright:

Available Formats

Semantic Web

In the area of the Semantic Web, the metadata is

Metadata provides the essential link between the

Step 4: Go to step 2, until no unvisited link exists (more about

By typing “SLR,” I should get the following list

Step 1: Build a common language

This extremely small vocabulary tells us the following:

We can even add more knowledge into it by saying more about

The benefits of building such a vocabulary are the following :

Step 1 : Select your semantics

You might also like