
Chapter 3

USING SEMANTIC
DATA
Querying Data
Information represented in RDF needs to be accessed for reasoning
and for developing applications.
SPARQL, pronounced ‘sparkle’, is the standard query language and
protocol for Linked Open Data on the web and for semantic graph
databases (also called RDF triplestores). It lets us select, extract,
and otherwise easily get hold of particular sections of knowledge
expressed in RDF.
SPARQL is specifically designed for RDF, and is tailored to and relies
upon the various technologies underlying the web.
SPARQL, short for “SPARQL Protocol and RDF Query Language”,
enables users to query information from databases or any data source
that can be mapped to RDF.
SPARQL lets us translate heavily interlinked graph data into
normalized, tabular data with rows and columns, which can
be opened in programs like Excel or imported into a
visualization suite such as plot.ly or Palladio.
It is useful to think of a SPARQL query as a Mad Lib - a set
of sentences with blanks in them. The database will take
this query and find every set of matching statements that
correctly fill in those blanks, returning the matching values
to us as a table.
Take this SPARQL query:
SELECT ?painting
WHERE {
  ?painting <has medium> <oil on canvas> .
}
?painting - the node (or nodes) that the database will
return.
On receiving this query, the database will search for all
values of ?painting that complete the pattern
?painting <has medium> <oil on canvas> .
Visualizing the query
When the query runs against the full database, it looks for
the subjects, predicates, and objects that match this
statement, while excluding the rest of the data:

painting
--------
The Nightwatch
Woman with a Balance
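To see the “Mad Lib” idea concretely, here is a minimal Python sketch (not a real triplestore): the pseudo-RDF statements from the slides are stored as tuples, and the query fills in the ?painting blank. The third triple is a hypothetical extra row added only for contrast.

```python
# A minimal sketch of pattern matching over (subject, predicate, object)
# triples; names are the pseudo-RDF statements from the slides.
triples = [
    ("The Nightwatch", "has medium", "oil on canvas"),
    ("Woman with a Balance", "has medium", "oil on canvas"),
    ("The Kiss", "has medium", "oil on wood"),  # hypothetical extra row
]

# SELECT ?painting WHERE { ?painting <has medium> <oil on canvas> . }
# becomes: collect every subject whose predicate and object match.
paintings = [s for (s, p, o) in triples
             if p == "has medium" and o == "oil on canvas"]

print(paintings)  # ['The Nightwatch', 'Woman with a Balance']
```

The non-matching triple is simply excluded, which mirrors how the database ignores the rest of the data.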
RDF and SPARQL allow us to create complex queries
that reference many variables at a time.
For example, we could search our pseudo-RDF database
for paintings by any artist who is Dutch:
SELECT ?artist ?painting
WHERE {
  ?artist <has nationality> <Dutch> .
  ?painting <was created by> ?artist .
}
Here we use a second variable, ?artist.
The RDF database will return all matching combinations
of ?artist and ?painting that fulfill both of these
statements.
artist               painting
Rembrandt van Rijn   The Nightwatch
Johannes Vermeer     Woman with a Balance
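The two-pattern query can be sketched the same way in plain Python (triples taken from the slides): both patterns must be satisfied by the same ?artist binding, which is what makes this a join.

```python
# Plain-Python sketch of the two-pattern query; triples are the
# pseudo-RDF statements from the slides.
triples = [
    ("Rembrandt van Rijn", "has nationality", "Dutch"),
    ("Johannes Vermeer", "has nationality", "Dutch"),
    ("The Nightwatch", "was created by", "Rembrandt van Rijn"),
    ("Woman with a Balance", "was created by", "Johannes Vermeer"),
]

# Pattern 1 binds ?artist; pattern 2 must reuse the same binding.
dutch_artists = {s for (s, p, o) in triples
                 if p == "has nationality" and o == "Dutch"}
results = [(artist, painting) for (painting, p, artist) in triples
           if p == "was created by" and artist in dutch_artists]

print(results)
# [('Rembrandt van Rijn', 'The Nightwatch'),
#  ('Johannes Vermeer', 'Woman with a Balance')]
```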
Inferences:
Generating new facts and proving facts using reasoning
and a semantic knowledge base
Triplestores are designed to store semantic facts in an
atomic form: subject, predicate and object.
Example: “Flipper is a dolphin” is one fact and “A dolphin is a
mammal” is another.
Storing massive amounts of triples improves search.
Organizations can analyze all of the facts.
Customers receive highly relevant results.
Web pages can be dynamically built based on search
history, profiles and facts about unstructured text.
Billions of these facts are available for free in the
Linked Open Data world about music, places, subjects
of interest and products.
When applied correctly, these facts about people, places,
organizations and events can enhance your knowledge
management and data discovery applications in a
powerful way.
What if you could combine these facts to create new facts
derived from the original sets? This is called “inferencing” or
“reasoning”, and it is one of the most powerful and important
aspects of this type of database.
Because we know that “Flipper is a dolphin” and “dolphin is a
mammal”, we can infer a new fact: “Flipper is a mammal”.
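A minimal sketch of this inference in Python, assuming a naive one-rule forward chainer over “is a” facts (a real reasoner distinguishes rdf:type from rdfs:subClassOf; this toy example glosses over that):

```python
# Sketch of the dolphin inference: if X is-a Y and Y is-a Z,
# add the new fact X is-a Z.
facts = {("Flipper", "is a", "dolphin"), ("dolphin", "is a", "mammal")}

def infer(facts):
    """One round of forward chaining over the transitive 'is a' rule."""
    new = {(x, "is a", z)
           for (x, p1, y1) in facts if p1 == "is a"
           for (y2, p2, z) in facts if p2 == "is a" and y1 == y2}
    return facts | new

facts = infer(facts)
print(("Flipper", "is a", "mammal") in facts)  # True
```

This is the key behaviour described above: as new facts are added, further rounds of inference can derive still more facts.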
The applications of inferencing span industries and use cases.
Knowing that two people are connected through a series of other
factual relationships can be helpful in identifying networks for a
variety of purposes.
Examples:
* Social networks, fraud networks or terrorist networks.
* In physician referrals and clinical trials research, the
ability to infer a doctor’s specialty based on the drugs
prescribed can help you find the doctors you require.
* When analyzing economic markets, the ability to
infer trading price points for commodities using weather
and regional data may provide you with a competitive
advantage.
List the triples in the above graph.
List some of the inferences that you can arrive at by
referring to the graph.
Assume that an article appeared in a major newspaper that
mentioned a company called “My Local Café”, and in that
news article My Local Café was referred to as “a network of
independent coffee shops.” Prove or disprove this fact.
This is the key: As new facts are added to the
database, inferred facts can be created.
It’s these inferred facts that add extra explanatory
power to queries and search results.
Inferencing can be one of the most useful weapons in
your knowledge discovery arsenal.
Types of Inferences
Simple and Deterministic: If I know a rock weighs 1 kg, I
can infer that the same rock weighs 2.2 lbs.
Rule Based: If I know a person is under 16 and in
California, I can infer that they are not allowed to drive.
Classifications: If I know a company is in San Francisco or
Seattle, I can classify it as a “west coast company”.
Judgement: If I know a person’s height is 6 feet or more, I
refer to them as tall.
Online services: If I know a restaurant’s address, I can use
a geocoder to find its coordinates on a map.
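The first three inference types can be sketched as tiny Python rules. The thresholds and city names are the ones stated in the slides; the function names themselves are made up for illustration.

```python
# Tiny rule sketches; thresholds and cities come from the slides,
# function names are hypothetical.

def weight_lbs(kg):
    """Simple and deterministic: 1 kg is 2.2 lbs."""
    return kg * 2.2

def may_drive(age, state):
    """Rule based: under 16 in California means not allowed to drive."""
    return not (state == "California" and age < 16)

def coast(city):
    """Classification: San Francisco or Seattle means 'west coast'."""
    return "west coast" if city in {"San Francisco", "Seattle"} else "other"

print(weight_lbs(1))                # 2.2
print(may_drive(15, "California"))  # False
print(coast("Seattle"))             # west coast
```

Judgement and online-service inferences do not reduce to a one-line rule this cleanly, which is exactly why the slides list them separately.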
Searching for connections
The common algorithm to find the shortest path between
two entities (nodes) in a graph of data is called breadth-
first search.
Example: “Six Degrees of Kevin Bacon”.
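A minimal breadth-first search sketch in Python; the co-starring graph below is a hypothetical toy example in the spirit of the “Six Degrees of Kevin Bacon” game.

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search: returns the shortest chain of links, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Hypothetical co-starring graph (adjacency lists).
graph = {
    "Kevin Bacon": ["Tom Hanks"],
    "Tom Hanks": ["Kevin Bacon", "Meg Ryan"],
    "Meg Ryan": ["Tom Hanks"],
}
print(shortest_path(graph, "Kevin Bacon", "Meg Ryan"))
# ['Kevin Bacon', 'Tom Hanks', 'Meg Ryan']
```

Because BFS explores the graph level by level, the first path that reaches the goal is guaranteed to be a shortest one.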
Linked Data
Linked Data is one of the core pillars of the Semantic Web,
also known as the Web of Data.
It makes links between datasets that are understandable
not only to humans, but also to machines.
Linked Data is a set of design principles for sharing
machine-readable interlinked data on the Web.
The aim of Linked Data is to relate data described using
the RDF model so that machines can browse the Web.
This collection of interrelated datasets on the Web can
also be referred to as Linked Data.
To make the Web of Data a reality, it is important to have the huge
amount of data on the Web available in a standard format, reachable
and manageable by Semantic Web tools, along with the relationships
among the data.
Four Principles for Linked Data (Tim Berners-Lee, 2006)
1. Use Uniform Resource Identifiers (URIs) as names for things.
URIs provide a single global identification system for giving unique
names to anything. They let us distinguish between different things,
or know that a thing in one dataset is the same as a thing in a
different dataset.
2. Use HTTP URIs so that the names can be looked up on the Web.
The HTTP protocol in conjunction with URIs provides a simple
mechanism for identifying and retrieving resources.
Publishing any kind of data and adding it to the global data space is quick.
3. When a URI is looked up, provide useful information, using the
standards (RDF, SPARQL).
* RDF is a graph-based representation format for data publishing
and interchange on the Web. It is also used in semantic graph
databases (RDF triplestores), a technology for storing
interconnected data and inferring new facts out of existing ones.
* SPARQL is the W3C-standardized query language for retrieving
and manipulating data stored in RDF format. It allows us to search
the Web of Data (or any database) and discover relationships.
4. Include links to other URIs so that users can discover more
things.
* Links to other URIs make data interconnected and enable us to
find different things.
* By interlinking new information with existing resources, we
maximize reuse.
* This creates a richly interconnected network of
machine-processable data.
5-star development scheme for LOD (Tim Berners-Lee, 2010)

★ Make your content available on the web with an open license.


★★ Make your content available as machine-readable structured data
(e.g., Excel instead of an image scan of a table). If you’re using WordLift,
you will also be able to enrich your structured data with more related
pieces of information, the so-called properties.
★★★ Make your content available in a non-proprietary open format
(e.g., CSV instead of Excel).
★★★★ Assign a unique and permanent URI to each entity to identify
them and to make your content easily findable by people using stable
IDs.
★★★★★ Link your data to other data to provide a context: now the
web is connected and you’ve reached the 5th star.
Some of the primary datasets that implement the 5-star linked
data scheme are foundational for the machine learning
algorithms behind semantic search engines like Google and Bing
as well as digital personal assistants like Alexa, Cortana, and the
Google Assistant.
How does Linked Data work?
Let’s say that on Website A we present the entity Jason and the fact that he
knows Marie.
Website B provides all the information about Marie.
Website C contains information about Marie’s birthplace.

Each page contains the structured data to describe an entity (Jason, Marie and
Italy) and the link to the entity that could be described on a different page or
even on a different website.
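A sketch of how the three sites’ triples merge into one graph. The URIs are hypothetical (example domains invented here), and the predicates are plain strings rather than real RDF vocabulary.

```python
# Each site publishes triples about its own entities, using HTTP URIs
# (hypothetical ones) so the separate graphs can be merged and traversed.
site_a = [("http://site-a.example/Jason", "knows", "http://site-b.example/Marie")]
site_b = [("http://site-b.example/Marie", "born in", "http://site-c.example/Italy")]
site_c = [("http://site-c.example/Italy", "is a", "Country")]

web_of_data = site_a + site_b + site_c

# Follow the links across sites: where was the person Jason knows born?
marie = next(o for (s, p, o) in site_a if p == "knows")
birthplace = next(o for (s, p, o) in web_of_data
                  if s == marie and p == "born in")
print(birthplace)  # http://site-c.example/Italy
```

Because each entity has a globally unique name, a fact published on one site can be joined with facts published anywhere else.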
(Figure: Linked Open Data graph visualization, Salzburgerland Tourism)
Linked Data vs. Open Data
When data can be freely used and distributed by anyone, it is called Open Data.
Open Data is not the same as Linked Data: Open Data can be made available to
everyone without links to other data, and at the same time data can be linked
without being freely available for reuse and distribution.
The W3C community puts a lot of effort into enriching the
Linked Open Data (LOD) cloud.
Linked Open Data is a powerful blend of Linked Data and Open Data: it is both
linked and openly available.
Example: DBpedia, a crowd-sourced community effort to
extract structured information from Wikipedia and make it
available on the Web as RDF. DBpedia not only includes
Wikipedia data, but also incorporates links
to other datasets on the Web, e.g., to Geonames.

By providing those extra links (in terms of RDF triples)


applications may exploit the extra (and possibly more
precise) knowledge from other datasets when developing
an application.

By virtue of integrating facts from several datasets, the


application may provide a much better user experience.
A semantic graph database such as Ontotext’s GraphDB is able to
handle huge datasets coming from disparate sources and link
them to Open Data. This provides richer queries, boosting
untapped knowledge discovery and efficient data-driven
analytics.
The Linked Open Data Cloud: these datasets (like DBpedia, Wikidata
and Geonames, to name just a few) are all interlinked together.
Freebase
was a large collaborative knowledge base consisting of data
composed mainly by its community members. It was an
online collection of structured data harvested from many
sources, including individual, user-submitted wiki
contributions.[3] Freebase aimed to create a global resource
that allowed people (and machines) to access common
information more effectively. It was developed by the
American software company Metaweb and ran publicly
beginning in March 2007. Metaweb was acquired by Google
in a private sale announced 16 July 2010.[4] Google's
Knowledge Graph was powered in part by Freebase.[5]
3rd chapter completed

THANK YOU 
