Download as pdf or txt
Download as pdf or txt
You are on page 1of 33

Semantic Web

In the area of the Semantic Web, the metadata is


used to add meaning to the page, and this
structured data can be easily understood by
machines

Metadata provides the essential link between the


page content and content meaning
SEARCH ENGINE FOR THE TRADITIONAL
WEB


Today, the most visible component of the Internet is the Web

It has hundreds of millions of pages available, presenting
information on an amazing variety of topics

you can find almost anything you are interested in on the Web,
and all this information originates from search engines

In the current world, there are many different search engines
you can chose from.Perhaps the most widely used ones are
Google and Yahoo! There are also other search engines such
as AltaVista, Lycos, etc.
BUILDING THE INDEX TABLE


Even before a search engine is made available on the Web, it
starts preparing a huge index table for its potential users. This
process is called the indexation process

it will be conducted repeatedly throughout the life of the search
engine

The quality of the generated index table to a large extent
decides the quality of a query result

The indexation process is conducted by a special piece of
software usually called a spider, or crawler
Indexation Process

Step 1: Build an index table for every single word on this page
Step 2: From the current page, find the first link (which is again
a URL pointing to another page) and “crawl” to this link,
meaning to download the page pointed to by this link
Step 3: After downloading this page, start reading each word on
this page, and add them all to the index table
Now there are two possibilities: (1) the current word from this new page has never
been added to the index table, or (2) it already exists in the index table

Step 4: Go to step 2, until no unvisited link exists (more about


this later)
Indexation Process
Indexation Process
Indexation Process
Indexation Process
CONDUCTING THE SEARCH
A DDITIONAL DETAILS
Let us summarize

The Web is made up of billions of Web pages

Most Web page have three different kinds of codes:
HTML/XHTML tags (required), CSS tags (optional), and
JavaScript (optional)

All these tags in a Web page just tell the machines how to display
the page; they do not tell the machines what they mean

When a crawler reaches a given page, it has no way to know
what this page is all about. In fact, it is not even able to tell
whether the page is a personal page or a corporate Web site

Thus, the crawler can only collect keywords, turning all search
engines into essentially keyword matchers
SEARCH ENGINE FOR THE
SEMANTIC WEB
A HYPOTHETICAL USAGE OF THE
TRADITIONAL SEARCH ENGINE

By typing “SLR,” I should get the following list



www.cheapCameras.com

www.buyItHere.com and many other links...
They do not discuss the performance issues of digital SLRs
which is the topic that I am looking for Only if I am patient
enough will I be able to reach www.goodPhoto.com , which is
very useful and does address my concerns

There are some really good sites I have missed
www.digcamhelp.com talks about shutter speed and aperture
it uses the phrase single lens reflex instead of SLR
BUILDING A
SEMANTIC WEB SEARCH ENGINE

Step 1: Build a common language


The word semantics means meaning. Adding semantics into the
Web, therefore, means adding meaning to the Web. Before we
can add meaning, the first step is to figure out how to express
meaning
This is done by constructing a vocabulary that has meaning
(knowledge) coded inside its terms
there is a group of experts — mainly professional photographers
who have extensive domain knowledge — who get together, and
work out a vocabulary set
A simple Vocabulary
A simple Vocabulary

This extremely small vocabulary tells us the following:


Camera is a root concept in this vocabulary; we can also call it
a root term or a root class. Camera has two subconcepts:
Digital camera or Film camera. Also, Digital camera can have
two subclasses: SLR camera or Point-and-Shoot camera

the concept of class is the same as that used in the object-
oriented programming language world

At this point, we can see the meaning, or knowledge if you will,
of the photography domain being coded into this vocabulary
A simple Vocabulary

We can even add more knowledge into it by saying more about


this domain:
SingleLengthReflex camera is exactly the same class or
concept as the SLR class, and SLR-Camera is also exactly the
same concept as the SLR class. Further, one can use two
different concepts to describe the performance of a SLR class:
one is ShutterSpeed and the other is Aperture ; or, one can say
that SLR class has two properties: ShutterSpeed and Aperture

Clearly, it is domain specific and includes a bunch of concepts
(classes) with the relations defined among these concepts
A simple Vocabulary

The benefits of building such a vocabulary are the following :



It is a standard way to express the meaning/knowledge of a
specific domain

It is a common understanding/language/meaning/knowledge
shared by different parties over the Web
This vocabulary is built for a specific domain, and in our example
this domain is photography
building a comprehensive vocabulary that covers everything is just
too ambitious; it is not even technically possible, given the fact that
a single concept may have different meanings in different domains
BUILDING A
SEMANTIC WEB SEARCH ENGINE
Step 2: Markup the pages

to take advantage of the Semantic Web, semantic similarities
between the words on the given page and the concepts defined in the
vocabulary have to be explicitly expressed

The solution is to markup the page. More specifically, to markup
means to add some extra data or information to the Web page to
describe some specific characteristic of the page. In our case, we
need to add some data to explicitly indicate that the semantics of
some words contained in this page is defined in some common
vocabulary set

The extra data used to express semantic similarity is indeed
metadata. Therefore, to markup a page is to add some metadata to it
BUILDING A
SEMANTIC WEB SEARCH ENGINE

First, the page owner of www.goodPhoto.com creates a special
file by using one of the description languages. This file conveys
the following information:

The word SLR used in these pages has the same semantics as
that defined in the common vocabulary file

The page owner then saves this file somewhere on the Web,
most likely on the Web server hosting www.goodPhoto.com

Second, the pages have to be marked up: each page that has the
word SLR has to be connected to the preceding description file.
This is done by adding a <link> tag in the <head> section of the
page
Connecting the Web Page to the
Special File
BUILDING A
SEMANTIC WEB SEARCH ENGINE
Let us now assume the owners of the following two sites have
also marked up their pages:

www.digcamhelp.com

www.ehow.com
The semantics of the words shutter speed on this page is the
same as that defined in the common vocabulary; the semantics
of the word aperture is the same as that defined in the common
vocabulary
The single lens reflex camera discussed on this page is an
instance of the class SingleLensReflex defined in the common
vocabulary.
BUILDING A
SEMANTIC WEB SEARCH ENGINE
Step 3: A much smarter crawler

Now that the related pages have been marked up, it is up to the crawler to
collect this information while crawling. It can now be viewed as a smart
agent

At some point during its journey on the Web, the crawler reaches a page
under www.cheapcameras.com . Just as it usually does, it downloads this
page and starts to index the words on it

The first thing the crawler notices is that this page does not link to any
special file. This implies that this page has not been marked up for any
special semantics

For all the pages meant mainly for selling the cameras, the same process
is repeated by the crawler: when it hits the word SLR, it simply locates it in
the index table, and adds the corresponding document record
BUILDING A
SEMANTIC WEB SEARCH ENGINE
BUILDING A
SEMANTIC WEB SEARCH ENGINE
Finally, the crawler reaches a page belonging to
www.goodPhoto.com . The first thing the crawler sees is that this
page links to another file (the markup file), which specifies that
the word SLR used on this page has the same semantics as that
defined in the common vocabulary. Immediately, the crawler
accesses the URL of this markup file and downloads it to its
memory. It will perform the following two important steps:
BUILDING A
SEMANTIC WEB SEARCH ENGINE
USING THE SEMANTIC WEB SEARCH
ENGINE

Step 1 : Select your semantics



After accepting the keyword we entered (in this case, SLR), the
engine will first scan all the available common vocabularies and
find all the ones that have the concept SLR defined in them

it is quite possible that there are other common vocabulary sets

The engine then presents these owl files, and it is up to us to tell
the engine that the semantics (meaning) of the keyword SLR is
the same as that defined in this vocabulary:
mySimpleCamera.owl
USING THE SEMANTIC WEB SEARCH
ENGINE
Step 2 : Search the index table and return the results
It does this by retrieving the document list indexed by the keyword
SLR. It then iterates over this list and for each document in the list, it
checks the markupURL field and does the following:

If this field points to NULL , discard this document

If the field is not NULL and if it does not point to
mySimpleCamera.owl , then discard this document

If the markupURL field is not NULL and it does point to
mySimpleCamera .owl , then include this document in the candidate
For our hypothetical example, the following three sites will be collected
: www.goodPhoto.com, www.digcamhelp.com, www.ehow.com
THE SEMANTIC WEB SEARCH
ENGINE : EXAMPLES


Swoogle (http://swoogle.umbc.edu/2006/)

Watson (http://kmi.open.ac.uk/technologies/name/watson/)

Sindice (https://sindice.com/)

SemSearch (http://technologies.kmi.open.ac.uk/semsearch/)

WebOWL
THE SEMANTIC WEB: A SUMMARY
THE SEMANTIC WEB: A SUMMARY
THE SEMANTIC WEB: DEFINITION
WHAT IS THE KEY TO SEMANTIC WEB
IMPLEMENTATION?


the key to the implementation of the Semantic Web is this
structured data set

a structured data set implies that it is machine readable

we called it the common vocabulary file

a commonly accepted name for this common vocabulary file
is ontology

RDF, RDFS, and OWL are all description languages that you
can use to construct such an ontology

You might also like