Matteo Palmonari

matteo.palmonari@unimib.it

Data Semantics: Introduction

INSID&S Lab – INteraction and Semantics for Innovation with Data & Services
Dipartimento di Informatica, Sistemistica e Comunicazione
Università degli Studi di Milano-Bicocca
DATA SEMANTICS @ DATA SCIENCE - UNIMIB

Just to warm up #1
•  Matteo Palmonari
–  AI, Knowledge Representation, Data Integration
–  How to represent? How to match representations? How to learn representations?
•  Blerina Spahiu
–  Knowledge Graphs, Semantic Web, Data Management and Profiling
•  Manuel Vimercati
–  Deep Learning, Knowledge Graphs, NLP

Just to warm up #2
•  What is your background (BA)?
–  Write it in the chat
•  What do you think data semantics is about?
–  Write it in the chat

Data Semantics / What
(the intuition)
[Figure: Google’s Knowledge Graph]
•  What is data semantics?
–  Data semantics is about:
    -  modeling the meaning attached to data and using it to improve their interpretation and the subsequent processing
    -  interpreting the meaning of data and using this imputed interpretation to improve data processing

Data Semantics – A Personal View
•  Data Semantics as a science of (not so straightforward) data interpretation
•  See my chapter “Dati e Semantica” in the book edited by Carlo Batini, “Le Basi della Scienza dei Dati” (Ch. 6), in Italian
•  A lot of attention to cases where interpretation is not straightforward:
–  Semi-structured data (e.g., CSV files)
–  Texts (e.g., news and social media)

Data Semantics / Why
•  When should we take care of data semantics?
–  When we need to integrate different datasets
–  When we need to make a dataset accessible to parties who do not know the dataset
–  When we need to access a dataset that is produced by third parties
–  When we need to infer the meaning of data that is not structured as we wished
•  Why?
–  To improve data processing in crucial steps of the data science lifecycle
–  Data preparation (including cleaning, integration, transformation) can cover up to 80% of the effort in a data science project

Data Semantics / What
(aka what you’ll learn)
•  Make the semantics of data explicit in such a way that machines can process data in a smarter way
•  Extract semantics from data that have no semantics attached, in such a way that machines can process data in a smarter way
•  Understand the role of semantics in data integration and data access applications, i.e., semantic interoperability
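
A minimal sketch of what "making semantics explicit" can look like for a CSV file, using only the Python standard library. The CSV snippet, the `http://example.org/` property IRIs and the `rows_to_triples` helper are all assumptions made for illustration; they are not part of the course material.

```python
import csv
import io

# Hypothetical CSV snippet: the column names alone carry no machine-readable meaning.
RAW = "name,born,city\nAda Lovelace,1815,London\n"

# Assumed mapping from column names to explicit properties (illustrative IRIs only).
COLUMN_TO_PROPERTY = {
    "born": "http://example.org/birthYear",
    "city": "http://example.org/birthPlace",
}


def rows_to_triples(text, subject_column="name"):
    """Turn each CSV row into (subject, property, value) triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = row[subject_column]
        for column, prop in COLUMN_TO_PROPERTY.items():
            triples.append((subject, prop, row[column]))
    return triples


triples = rows_to_triples(RAW)
for triple in triples:
    print(triple)
```

Once the meaning of each column is stated as a property, two datasets that use different column names for the same property can be matched on the property rather than on the header string.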

Data Semantics / What
(aka what you’ll learn)
•  Make the semantics of data explicit in such a way that machines can process data in a smarter way
•  Extract semantics from data that have no semantics attached, in such a way that machines can process data in a smarter way
–  NLP pills: Using Knowledge Graphs for NLP
–  NLP pills: Distributional Semantics for Text Analysis
•  Understand the role of semantics in data integration and data access applications, i.e., semantic interoperability
–  Semantic Data Integration

Relation to Artificial Intelligence
•  Knowledge Graphs and Inference
•  This course: more practical perspective:
–  How to query, model, create and develop Knowledge Graphs?
–  How to validate and complete Knowledge Graphs using inference?
–  How can Knowledge Graphs support data integration?
•  Distributional Semantics and NLP
•  This course: more practical perspective:
–  How to use NLP to support integration of structured and unstructured information?
–  How to use NLP and Knowledge Graphs to analyze text sources?

Program of the Course
•  Lessons
–  Introduction to semantics in Big Data
–  Knowledge Graphs (models/languages/tools/architectures)
–  Model transformations, data integration, data enrichment, data linking
–  Content relevance & similarity
–  Distributional Semantics and Text Analysis (deep learning & semantics)
•  Exercises
–  Querying KGs
–  Modeling KGs (ontologies)
–  Creating KGs from legacy sources
–  Data integration (+ semantic enrichment ?)
–  Text analysis with Distributional Semantics
•  Seminars
–  SPAZIODATI, …

Program of the Course
•  Based on a textbook: models and techniques to master
•  Not based on a textbook: models and techniques with selected but specific teaching material (selected articles, slides, additional material)
•  Lessons
–  Introduction to semantics in Big Data
–  Knowledge Graphs (models/languages/tools/architectures)
–  Model transformations, data integration, data enrichment, data linking
–  Content relevance & similarity
–  Distributional Semantics and Text Analysis (deep learning & semantics)
•  Exercises
–  Querying KGs
–  Modeling KGs (ontologies)
–  Creating KGs from legacy sources
–  Data integration & Semantic Enrichment
–  Text analysis with Distributional Semantics
•  Seminars
–  SPAZIODATI, …

Temporal Analogies
New York Times + DBpedia Entities
Try to complete analogies 2-5
1) Barack Obama is for 2010 what George W Bush is for 2005
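
One common way to answer such a temporal analogy is a nearest-neighbour lookup across time-sliced embedding spaces. The sketch below assumes the 2005 and 2010 spaces are already aligned to a shared coordinate system; the 2-D vectors, the entity keys and the `temporal_analogy` helper are invented for illustration and are not the data or method behind the slide.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy vectors, invented for illustration; both slices are assumed to be aligned.
SPACE_2010 = {"barack_obama": (0.9, 0.1), "lady_gaga": (0.1, 0.9)}
SPACE_2005 = {"george_w_bush": (0.8, 0.2), "britney_spears": (0.2, 0.8)}


def temporal_analogy(entity, source_space, target_space):
    """Return the target-slice entity closest to `entity` from the source slice."""
    query = source_space[entity]
    return max(target_space, key=lambda name: cosine(query, target_space[name]))


answer = temporal_analogy("barack_obama", SPACE_2010, SPACE_2005)
print(answer)
```

The alignment assumption matters: embeddings trained independently on each time slice are not directly comparable, so the lookup is only meaningful after the spaces have been mapped onto each other.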


Temporal Shift in Entity Similarity
New York Times + DBpedia Entities
Is the similarity between the entities increased (+) or decreased (-)?
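
The question above can be sketched as a difference of cosine similarities computed within each time slice. The entity names, the toy vectors and the `similarity_shift` helper are hypothetical; real slices would come from embeddings trained on each period of the corpus.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy per-slice embeddings, invented for illustration.
SLICE_EARLY = {"entity_a": (1.0, 0.0), "entity_b": (0.0, 1.0)}
SLICE_LATE = {"entity_a": (1.0, 0.2), "entity_b": (0.8, 1.0)}


def similarity_shift(a, b, early, late):
    """Similarity of (a, b) in the late slice minus their similarity in the early slice."""
    return cosine(late[a], late[b]) - cosine(early[a], early[b])


shift = similarity_shift("entity_a", "entity_b", SLICE_EARLY, SLICE_LATE)
sign = "+" if shift > 0 else "-" if shift < 0 else "0"
print(sign, round(shift, 4))
```

Because each similarity is computed inside a single slice, this comparison does not require the two embedding spaces to be aligned.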

Relative Polarization: SWEAT measure
Measure delta in cumulative associations between topic and two polarizations
OUTPUT:
●  Effect size
●  Statistical Significance
●  Data-driven polarization wordsets from annotated lexicon
e.g. topic “Silvio Berlusconi”
{"berlusconi","mediaset","fi","arcore",
"fininvest","italia1","rete4","canale5"}
Repubblica ~ Negative
Giornale ~ Positive
-  'score': -0.4149
-  'effect_size': -1.5621
-  'p-value': 0.0109

Knowledge Technologies in the Industry Today
•  Knowledge-based data processing is crucial to support data science and AI tasks
•  Knowledge graphs are used by most IT giants (Google, Facebook, Amazon, eBay, etc.)

Why Knowledge Graphs
(Semantic Web)
DATA SEMANTICS

TEACHING MATERIAL
Tommaso Di Noia, Roberto De Virgilio, Eugenio Di Sciascio, Francesco M. Donini
Semantic Web. Tra ontologie e Open Data.
1st ed. (Apogeo, 2013), pp. 240

Tom Heath, Christian Bizer
Linked Data: Evolving the Web into a Global Data Space
1st ed. (Morgan & Claypool, 2011), pp. 136

Slides, papers and other texts shared during the course

DATA SEMANTICS

EVALUATION
•  Aggregation of the scores obtained in two independent assessments.
•  First assessment:
–  exam-tailored project or a survey (individuals or groups)
–  oral presentation supported by slides, lasting about 20 minutes (with a short demo of the project, if any)
–  project: in-depth knowledge and/or hands-on experience of a specific topic covered in the course or linked to topics covered in the course;
–  survey: bibliographic review on a topic, in which the student discusses and compares solutions proposed in the state of the art for a specific problem of interest to them.
–  Evaluated by: significance, methodological soundness, mastery of the in-depth topic.
•  Second assessment:
–  Assignments to be completed before the oral exam, presented and evaluated at the exam
