
Intelligent Techniques
Introduction
Teacher: Carlos A. Iglesias
You can contact me at carlosangel.iglesias@upm.es - Office C-211

Goals

▣ Learn the foundations of intelligent systems
▣ Understand the challenges and objectives of intelligent systems
▣ Understand some of the key technologies in this field
□ Knowledge representation
□ Knowledge-based systems
□ Machine learning
□ Natural language processing

Topics

1. Introduction

2. What is AI?

What is AI? - Press view

What is Artificial Intelligence (AI)?

THOUGHT: systems that think like humans (HUMAN) vs. systems that think rationally (RATIONAL)
BEHAVIOUR: systems that act like humans vs. systems that act rationally

THINKING HUMANLY
"The exciting new effort to make computers think ... machines with minds, in the full and literal sense." (Haugeland, 1985)
"[The automation of] activities that we associate with human thinking, activities such as decision-making, problem solving, learning ..." (Bellman, 1978)

THINKING RATIONALLY
"The study of mental faculties through the use of computational models." (Charniak and McDermott, 1985)
"The study of the computations that make it possible to perceive, reason, and act." (Winston, 1992)

ACTING HUMANLY
"The art of creating machines that perform functions that require intelligence when performed by people." (Kurzweil, 1990)
"The study of how to make computers do things at which, at the moment, people are better." (Rich and Knight, 1991)

ACTING RATIONALLY
"Computational Intelligence is the study of the design of intelligent agents." (Poole et al., 1998)
"AI ... is concerned with intelligent behavior in artifacts." (Nilsson, 1998)
Acting humanly: The Turing Test

Which capabilities does the Turing test require?
▣ Natural Language Processing
▣ Automated reasoning
▣ Knowledge representation
▣ Machine learning

Total Turing Test (Harnad, 1992)

▣ The computer is a robot that should look, act and communicate like a human
▣ It additionally requires Computer Vision and Robotics
Critique: Searle's Chinese room
Minds, Brains and Programs, John Searle, 1980

Philosophy and AI: Strong AI vs Weak AI
▣ Strong AI:
□ "True" AI
□ AI matches (or exceeds) human intelligence
□ AI machines have real conscious minds
□ E.g. HAL, Terminator, ...
▣ Weak AI:
□ AI "only" simulates human cognition
□ Narrow AI: constrained in problems / domains

Thinking humanly: the cognitive approach
▣ First: how do humans think?
▣ Ways to understand the human mind
□ Observe human reasoning
□ Psychological experiments
□ Observe the brain in action
▣ Cognitive science: computer models from AI + techniques from psychology to construct precise theories of the human mind
▣ General Problem Solver (GPS, Newell & Simon 1961): not only solves problems correctly but with the same reasoning steps as humans

Thinking rationally: the logicist approach
▣ Logicist tradition from Aristotle
□ Formalize knowledge with logical notation and rules to derive new knowledge
▣ Correct inferences - logical proof
▣ Challenges
□ Integrating informal / uncertain knowledge
□ Scalability in real problems
Acting rationally: the rational agent approach
▣ The branch of computer science that is concerned with the automation of intelligent behavior (Luger)
▣ Systems that operate autonomously, perceive their environment, persist over a prolonged time period, adapt to change, and create and pursue goals
▣ A rational agent is one that acts so as to achieve the best outcome or, when there is uncertainty, the best expected outcome
▣ Rational behaviour: doing the right thing

Discussion
▣ Acting rationally
□ More general than thinking rationally (the logicist approach)
■ correct inference is just one way to achieve rationality
□ More amenable to scientific development than thinking or acting humanly
▣ Our focus in the course: rational agents

Foundations of AI: disciplines that contribute to AI

▣ Philosophy: logic, reasoning, rationality, …
▣ Mathematics: logic, computability, tractability, NP-completeness, probability, ...
▣ Economics: utility, decision theory, game theory, operations research, …
▣ Neuroscience: neurons
▣ Psychology: behaviourism, cognitive psychology
▣ Computer engineering: ENIAC, ...
▣ Control theory and cybernetics: the homeostat, ...
▣ Linguistics: computational linguistics

2. The History of AI
History of AI

Pre-AI: 1911 Torres Quevedo's "Ajedrecista" (chess automaton)
https://history.computer.org/pioneers/torres.html
"Torres and His Remarkable Automatic Devices," Scientific American Supplement, Vol. 80, No. 2079, 1915, pp. 296-298.

Pre-AI: the 1940s
▣ Science fiction
▣ Asimov's Three Laws of Robotics, 1942
□ Do not injure human beings
□ Obey human orders
□ Protect itself

Pre-AI: the first neuron
▣ 1888 - Ramón y Cajal - the brain is composed of neurons
▣ 1943 - McCulloch-Pitts - first model of a neuron
□ A Turing machine can be implemented with an ANN
Birth of AI: 1952-1956
▣ Ratio Club dinners: British cybernetics, bringing together biologists, psychologists, mathematicians, neurologists, computer scientists, …
▣ Turing Test
▣ Cybernetics (e.g. Walter)
□ behaviour patterns and Pavlov
Walter's tortoise

"The brain is a physical machine… Adaptive behaviour is that which maintains it in physical equilibrium with its environment"
Ashby, 1947

"'Adaptive' behaviour is equivalent to the behaviour of a stable system, the region of the stability being the region of the phase-space in which all the essential variables lie within their normal limits"
Ashby, 1952

The Law of Requisite Variety (Ashby)
▣ The number of states of the control mechanism must be greater than or equal to the number of states in the system being controlled
▣ Ashby states the Law as: "Only variety can destroy variety"

Birth of AI: 1952-1956
▣ 1951 Prinz's checkers program (UK)
▣ Ferranti Mark I, Manchester Electronic Computer, 1st commercial computer
Birth of AI: 1952-1956
▣ 1956: Newell & Simon: Logic Theorist
□ first AI program; it used heuristic "rules of thumb" to prove 38 out of 52 theorems of Principia Mathematica (Whitehead and Russell)

AI: Dartmouth Summer 1956
▣ McCarthy, Minsky, Shannon, Simon, Newell, McCulloch, Nash, Samuel
▣ McCarthy
□ Program Advice Taker - Logic with Common Sense
https://en.wikipedia.org/wiki/Dartmouth_workshop

Golden era - 1956-1974
▣ 1957 Newell & Simon - General Problem Solver (GPS): generalization of LT, model of cognition
□ reasoning as search
▣ 1957 Rosenblatt - Perceptron
□ based on McCulloch-Pitts (1943)
□ a training mechanism to learn the weights
▣ 1959 Samuel's ML checkers program (USA)
□ coined the term 'machine learning'
▣ 1960 - Quillian - 1st semantic network (or frame)
▣ 1966 - Weizenbaum, ELIZA - 1st chatbot
Golden era - 1956-1974
▣ Optimism
▣ Funding
□ Millions of dollars for AI labs in the USA and Europe

The first AI winter 1974–1980
▣ High expectations, poor results
▣ End of funding
▣ 1966 - failure of Machine Translation
□ NRC - after a $20 million investment
□ Underestimated difficulties
■ word disambiguation
■ common-sense knowledge
▣ 1969 - abandonment of connectionism
□ Minsky & Papert: the Perceptron cannot solve XOR!
▣ Lack of scalability
□ Limited computing facilities
□ Quillian's semantic networks worked with only 20 words
□ Combinatorial explosion → toy problems

First AI winter 1974–1980 - Teenage AI
▣ 1971-75 DARPA's frustration with the Speech Understanding Research program at CMU
▣ 1973 - Lighthill report - large decrease in AI research in the UK
▣ 1973-74 DARPA's cutbacks to academic AI research in general

Schools of Thought
"Neat": logical approach; McCarthy, Newell, Simon; Logic; Prolog; CMU, Stanford, Edinburgh
"Scruffy": symbolic approach; Schank, Minsky, Weizenbaum; Frames and Scripts (origin of OOP); Lisp; MIT
New developments: Logic and Symbolic reasoning
▣ 1972 - Colmerauer and Roussel, success of Prolog (PROgrammation en LOGique)
□ Restricts logic (Horn clauses) to make it tractable (similar to rules and production rules)
▣ Critiques of logic from psychologists: people do not think with logic
□ McCarthy → machines should not think like humans
▣ Development of Expert Systems & Knowledge-Based Systems (KBS)

1969-1979 - Expert Systems: New hopes
▣ Small domains to avoid common sense
▣ 1969 Feigenbaum, Buchanan et al.: DENDRAL
□ infers molecular structure from info provided by a mass spectrometer
▣ 1972 Feigenbaum: MYCIN
□ diagnosed infectious blood diseases
□ evolved into E-MYCIN
▣ 1978 McDermott: R1/XCON (eXpert CONfigurer)
□ selects computer system components based on the customer's requirements
□ 2500 rules
□ By 1986, it had processed 80,000 orders and achieved 95-98% accuracy. It was estimated to be saving DEC $25M a year.

MYCIN

Expert System
ES with KBS architecture

Boom 1980–1987
▣ Emergence of KBS
□ Knowledge engineering
▣ 1982 Knowledge Level - Newell
▣ 1988 Deep Thought - beats chess masters
▣ Return on investment
▣ 1986 Revival of connectionism
□ Rumelhart, backpropagation

Bust: the second AI winter 1987–1993
▣ 1987: collapse of the Lisp machine market
▣ 1988: cancellation of new spending on AI
▣ 1993: failure of expert systems
▣ 1990s: the quiet disappearance of the fifth-generation computer project's original goals

The Knowledge Level:
"There exists a distinct computer systems level, lying immediately above the symbol level, which is characterized by knowledge as the medium and the principle of rationality as the law of behaviour"
Newell, 1982
AI 1993–2001
▣ 90s: intelligent agents
▣ 1997 Deep Blue beats Kasparov
▣ 2011 IBM Watson wins Jeopardy!

Weak definition of agents, Wooldridge 1994
▣ autonomy: agents operate without the direct intervention of humans, and have control over their actions and internal state;
▣ social ability: agents interact with other agents / humans via some kind of agent-communication language;
▣ reactivity: agents perceive their environment, and respond to changes that occur in it;
▣ pro-activeness: agents are able to exhibit goal-directed behaviour by taking the initiative.

Multiagent Systems
▣ Systems of agents
▣ Require cooperation, coordination, negotiation
▣ Social structures

Deep learning, big data and artificial general intelligence: 2000–present
▣ 2007 Hinton: deep learning
□ Distributed hypothesis
▣ Big Data
Technological singularity


4. AI Industry

AI Companies

Investment in AI (08/2015)
Innovation in AI (08/2015)
Funding per AI category

9. Conclusions - What we have learnt

Conclusions
▣ AI aims at developing systems that reproduce human behaviour and reasoning
▣ AI is a multidisciplinary field
▣ AI has a number of subfields, such as reasoning, knowledge representation, machine learning, natural language processing, computer vision and robotics
▣ AI is living a new golden era based on the availability of large data sets

References
▣ Artificial Intelligence: A Modern Approach, 3rd Ed., Russell & Norvig, Chapter 1, Prentice Hall, 2016
Thanks! Any questions?
You can find me at cif@gsi.dit.upm.es

Credits
Thanks to all who have published these resources with a free licence:
▣ Minicons by Webalys
▣ Slide template by SlidesCarnival
▣ Images from Unsplash and Wix
Linked Data Technologies
Carlos A. Iglesias
Universidad Politécnica de Madrid
You can contact me at cif@gsi.dit.upm.es - Office C-211

Objectives

The learning objectives of this lesson consist of having a clear understanding of:
▣ What open data is
▣ What linked data is
▣ The differences between open data and linked data
▣ How to publish linked data
▣ How to define new vocabularies
▣ What the main technologies of linked data are (RDF, SPARQL) and their principles

1. Open Data - What is Open Data and why it is important
How can one govern informed citizens?

"My Administration is committed to creating an unprecedented level of openness in Government. We will work together to ensure the public trust and establish a system of transparency, public participation, and collaboration. Openness will strengthen our democracy and promote efficiency and effectiveness in Government."
President Obama, 2009
https://www.whitehouse.gov/open

Open government ("gobierno abierto"): a vision of public services

Open Data, Open Government, Big Data
Source: Creating Value through Open Data, European Data Portal, 2015
http://ec.europa.eu/newsroom/dae/document.cfm?doc_id=3179


Open Data characteristics

● Public: public access, respecting legal restrictions and concerns such as privacy or security
● Accessible: open formats
● Documented: metadata to understand its use
● Reusable: open licences that do not limit its use
● Complete: raw data (not processed) with enough granularity
● Updated: an updated data stream to preserve its value
● User support centre: provide support

https://www.whitehouse.gov/sites/default/files/omb/memoranda/2013/m-13-13.pdf

"That's why I say that data is the new oil for the digital age. How many other ways could we stimulate a market worth 70 billion euros a year, without spending big budgets? Not many, I'd say."
Neelie Kroes, EU Vice-President, 2012
http://europa.eu/rapid/press-release_SPEECH-12-149_en.htm

datos.gob.es
Data Value Chain
Source: Creating Value through Open Data, European Data Portal, 2015

Benefits of open data
▣ Increased economic activity
□ Enterprise creation, improved entrepreneurship
□ Creation of new products and services
□ Improvement of existing products
▣ Reduced administration costs
□ Self-service → better use of resources
□ Transparency → quick public detection of wasteful expenses
▣ Increased efficiency in linking data across administrations
▣ Creation of jobs
□ Need for new skills for the Big Data era

Key figures (EU+26, 2016-2020):
▣ 325bn€ direct market size for 2016-2020
▣ 25k new jobs: 75k open data jobs in the EU+26 in 2016; 100k in 2020 (a 33% increase), 9k of them in Italy
▣ 1.7bn€ saved in costs: 182M€ saved by Italian public administrations in 2020
▣ 626M hours saved: congestion costs are 1% of GDP every year; Milan was the most congested city, with 61.2 hours wasted in 2014

Source: http://www.europeandataportal.eu/es/content/creating-value-through-open-data
Examples of companies created using raw governmental data
▣ Zillow: valued at more than $1 billion
▣ Weather Channel: sold for $3.5 billion in 2008
▣ Garmin: a market of $7.24 billion; 329 jobs in 5 years
Source: Creating Value through Open Data, European Data Portal, 2015

Open Data Europe

Open data ("datos abiertos") in Spain
Spain - opendatabarometer.org
How can we publish open data? The 5-star deployment scheme (http://5stardata.info/en/):
★ Open licence
★★ Machine readable
★★★ Open format
★★★★ Uniform Resource Identifier (URI)
★★★★★ Linked Data

How can we publish information to be shared?
Usual technologies for sharing data - traditional approach: structural integration
▣ Data dump or database accessible through a service
▣ Problems
□ Heterogeneous data, without common keys or with different types
□ Different definitions (what is a region, ...)
□ Different attributes
□ Case-by-case integration
□ "Centralized" integration
□ Mix of fields carrying semantics with fields for optimizing or normalising the database schema

Linked Data approach: semantic integration
▣ No need to change the existing IT infrastructure
▣ We change how we publish and share data
□ Processes for transforming our data into linked data
■ Potential benefit from external open data
□ Processes for publishing interesting and relevant parts of our data

2. Linked Data - What is Linked Data and why it is important
From a Web of Documents to a Web of Data

"Linked Data is the term used to describe a method of exposing and connecting data on the Web from different sources following the four principles of Linked Data."
Tim Berners-Lee
https://www.ted.com/talks/tim_berners_lee_on_the_next_web
https://www.w3.org/DesignIssues/LinkedData.html

The Four Design Principles of Linked Data (by Tim Berners-Lee, 2006)
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
4. Include links to other URIs, so that they can discover more things

The 4 principles in practice (1)
1. Use URIs (Uniform Resource Identifiers) as names for things
2. Use HTTP URIs so that people can look up those names
Example: UPM (Universidad Politécnica de Madrid, Technical University of Madrid)
http://es.dbpedia.org/page/Universidad_Politécnica_de_Madrid
The 4 principles in practice (2)
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover more things

UPM
UPM - N3
UPM - XML
The Four Linked Data Principles (by Tim Berners-Lee, 2006)
▣ Everything should have a unique web identifier (URI)
▣ Information about things is published as web resources over HTTP
▣ Information is described with RDF 'triples' (subject, predicate, object) (= entity, attribute, value)
▣ Things are connected with relationships

Example - Madrid:
http://dbpedia.org/resource/Madrid (Madrid) — http://dbpedia.org/ontology/country (country) → http://dbpedia.org/resource/Spain (Spain)

RDF (Resource Description Framework)
▣ W3C specification
▣ Resources are described as triples that form a graph
▣ Graphs can be serialised using different languages: XML, Turtle, JSON-LD
▣ E.g. Turtle:

@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix dbp: <http://dbpedia.org/property/> .
@prefix dbr: <http://dbpedia.org/resource/> .

dbr:Madrid
    dbo:country dbr:Spain ;
    dbo:populationTotal 3141991 .

dbr:Spain
    dbp:areaTotal 604300000.00 .

Web Resource (figure: the RDF graph of the triples above)
Linked Data ≠ Open Data
▣ Open Data: data can be published and be publicly available under an open licence without linking to other data sources
▣ Linked Data: data can be linked to URIs from other data sources, using open standards such as RDF, without being publicly available under an open licence

Strategy: towards the Web of Data

Linked Open Data Cloud Evolution: 2007, 2009, 2011
LOD 2014
http://lod-cloud.net/
Linked Geodata cloud

Queries: SPARQL (http://dbpedia.org/sparql)

PREFIX yago: <http://dbpedia.org/class/yago/>

SELECT ?thing
WHERE {
  ?thing rdf:type yago:EuropeanCountries .
}
ORDER BY DESC(?thing)
LIMIT 25

SPARQL Query
"Soccer players who are born in a country with more than 10 million inhabitants, who played as goalkeeper for a club that has a stadium with more than 30,000 seats, and the club country is different from the birth country" (see the sketch below)
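One possible way to express this query against the DBpedia endpoint is sketched below. The specific classes and properties (dbo:SoccerPlayer, dbo:position, dbo:ground, dbo:capacity, …) are assumptions about the DBpedia ontology and may need adjusting:

PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT DISTINCT ?player
WHERE {
  ?player a dbo:SoccerPlayer ;
          dbo:position <http://dbpedia.org/resource/Goalkeeper_(association_football)> ;
          dbo:birthPlace ?birthCountry ;
          dbo:team ?club .
  ?birthCountry a dbo:Country ;
                dbo:populationTotal ?population .
  ?club dbo:ground ?stadium ;
        dbo:country ?clubCountry .
  ?stadium dbo:capacity ?capacity .
  FILTER (?population > 10000000)
  FILTER (?capacity > 30000)
  FILTER (?birthCountry != ?clubCountry)
}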

"Linked Open Data (LOD) is Linked Data which is released under an open license, which does not impede its reuse for free."
Tim Berners-Lee
https://www.w3.org/DesignIssues/LinkedData.html
Benefits of LOD
▣ Flexible data integration
□ Interconnection of disparate datasets
▣ Network effect
□ Adding a new dataset increases the value of the existing datasets
▣ Increase in data quality
□ The use of URIs leads to improved data management and quality
□ Increased (re)use of datasets improves data quality: errors are progressively corrected
▣ Increase in data usability
□ URIs and different formats (XML, CSV, JSON, …)
▣ Compatibility with existing infrastructure
□ Database / file conversion to RDF

3. Foundations - Main Linked Data technologies

W3C Technologies for Linked Data
▣ URIs for naming things
▣ RDF for describing data
▣ SPARQL for querying linked data

Uniform Resource Identifier (URI)
▣ "A Uniform Resource Identifier (URI) is a compact string of characters for identifying an abstract or physical resource." (RFC 2396)
▣ IRI - Internationalized Resource Identifier
□ Extends URIs beyond ASCII
https://www.ietf.org/rfc/rfc2396.txt
Resource Description Framework: RDF example (informal)
▣ RDF is a framework for representing data and resources on the web
▣ Anything can be a resource (entity):
□ a document, a physical thing, ...
▣ All information is expressed in triples:
□ Subject: a resource, identified by a URI
□ Predicate: a relationship identified by a URI
□ Object: a resource or literal to which the subject is related
▣ A set of such triples is called an RDF graph
▣ dbr:Madrid dbo:country dbr:Spain .
(http://dbpedia.org/resource/Madrid — http://dbpedia.org/ontology/country → http://dbpedia.org/resource/Spain)
https://www.w3.org/TR/rdf11-concepts/
https://www.w3.org/TR/rdf11-primer/

RDF Serialization
Triples: <subject> <predicate> <object> .
▣ Several syntaxes to describe RDF graphs
□ RDF/XML
□ The Turtle family of RDF languages
■ Turtle, N3, N-Quads, TriG
□ JSON-LD (JSON for Linked Data)
□ RDFa (embedding RDF into HTML)
Turtle (note: "a" is shorthand for rdf:type)
JSON-LD
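The JSON-LD example on this slide was a figure; a minimal sketch of the Madrid triples from the earlier Turtle example, expressed in JSON-LD (illustrative, not the exact slide content):

{
  "@context": {
    "dbo": "http://dbpedia.org/ontology/"
  },
  "@id": "http://dbpedia.org/resource/Madrid",
  "dbo:country": { "@id": "http://dbpedia.org/resource/Spain" },
  "dbo:populationTotal": 3141991
}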

RDFa
▣ Annotate HTML web pages semantically
▣ Similar to microformats and microdata
https://www.w3.org/TR/rdfa-lite/
RDF/XML

Blank nodes
▣ A resource (subject/object) without an ID

RDF Schema (RDF-S)
▣ Supports the definition of vocabularies

Example:
ex:MotorVehicle rdf:type rdfs:Class .
ex:Van rdf:type rdfs:Class .
ex:Truck rdf:type rdfs:Class .
ex:Van rdfs:subClassOf ex:MotorVehicle .
ex:Truck rdfs:subClassOf ex:MotorVehicle .

ex:yourCar rdf:type ex:MotorVehicle .
ex:myCar rdf:type ex:Van .

ex:Book rdf:type rdfs:Class .
ex:Person rdf:type rdfs:Class .
ex:author rdf:type rdf:Property .
ex:author rdfs:domain ex:Book .
ex:author rdfs:range ex:Person .
Popular RDF-S vocabularies
▣ FOAF (Friend of a Friend) - social networks
▣ SKOS - thesauri and taxonomies
▣ Schema.org - web pages for search engines
▣ Dublin Core - metadata of pages (creator, author, …)

OWL - Web Ontology Language
▣ Additional semantics based on Description Logics
▣ Complex class construction, cardinality, …
▣ E.g. Grandfather: Man ⋂ Parent; John has at least 4 children who are Parents
https://www.w3.org/TR/2012/REC-owl2-primer-20121211/

Triplestore

SPARQL Protocol and RDF Query Language (SPARQL)
▣ W3C standardised language to query and manipulate RDF data
▣ Family of specifications
□ SPARQL Query - graph patterns (triples with variables)
■ SELECT query (returns variable bindings)
■ ASK query (yes / no question)
■ CONSTRUCT query (a new RDF graph is constructed from the query result)
□ SPARQL Federated Query
■ Delegates subqueries to SPARQL endpoints
□ SPARQL Update
■ Insert, delete, update RDF triples
□ SPARQL Protocol
■ Graph operations over HTTP
https://www.w3.org/TR/sparql11-overview/
SPARQL 1.1 Graph pattern
https://programminghistorian.org/lessons/graph-databases-and-SPARQL

RDF, RDFS, OWL, SPARQL
▣ OWL - Ontology: what to say (what is correct) (class constructors, inferences, cardinality, …)
▣ RDFS - Vocabulary: expressing shared terms (classes, subclasses, range, domain)
▣ RDF - Statements: how to say things (triples, syntax)
▣ SPARQL - Query / Manipulate: consulting and manipulating the said things

4. How to create a new RDF vocabulary
6 steps for creating an RDF vocabulary
1. Start with a robust domain model
2. Research and reuse existing terms
3. When new terms specialize existing terms, use subclasses and subproperties
4. When new terms are required, create them following commonly agreed best practice
5. Publish in a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services

1. Robust domain model
▣ A vocabulary should enable interoperability within a domain: common understanding
▣ Example: SIOC (online communities)

2. Reuse existing vocabularies - search the available vocabularies
▣ General purpose: Dublin Core (DC)
▣ People: FOAF, vCard
▣ Projects: DOAP
▣ Addresses: vCard
▣ Geography: geo
▣ Entities: DBpedia (dbp)
▣ Sensors: Semantic Sensor Network (ssn)
▣ Social networks: SIOC
▣ Web information (eCommerce, music, products, restaurants, …): schema.org
▣ Taxonomies / thesauri: SKOS
3. Create subclasses / subproperties. 4. New terms (see the sketch below)
▣ Classes: capital letter and singular
□ skos:Concept, sioc:Post
▣ Properties: lower case
□ rdfs:label
▣ Object properties should be verbs
□ sioc:has_reply
▣ Datatype properties should be nouns
□ dc:title
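A minimal Turtle sketch applying these conventions; the ex: namespace and the terms it defines are hypothetical, and foaf:Person is reused rather than redefined:

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.org/vocab#> .

# Class: capital letter, singular; specializes an existing term
ex:Student rdfs:subClassOf foaf:Person .

# Object property: lower case, verb phrase
ex:has_supervisor rdf:type rdf:Property ;
    rdfs:domain ex:Student ;
    rdfs:range  foaf:Person .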

Google's Knowledge Graph
Facebook's Open Graph Protocol
Conclusions
▣ Linked data is a set of design principles for sharing machine-readable data on the Web
▣ Open data is a movement for publishing data on the web
▣ Linked Open Data is Open Data published according to Linked Data principles
▣ The basic technologies of Linked Data are RDF, SPARQL and URIs
▣ RDFS and OWL add semantics to RDF
▣ RDF vocabularies are reusable; some popular ones are SKOS and FOAF

Thanks! Any questions?
You can find me at cif@gsi.dit.upm.es
Teacher
Carlos A. Iglesias
Introduction to
Machine Learning You can contact me at carlosangel.iglesias@upm.es
- Office C-211

References
▣ Web Data Mining, 2nd Ed., Bing Liu, 2011
▣ Machine Learning: The Art and Science of Algorithms that Make Sense of Data, by Peter Flach, 2012

1. Motivation and Goals - What this talk is about
Topics
▣ What is machine learning?
▣ Ingredients of machine learning

Problem: classify houses

Some intuition
▣ We face a classification problem
▣ Since San Francisco is hilly, maybe home elevation data is relevant

Let's look at home elevation data
Source: http://www.r2d3.us/visual-intro-to-machine-learning-part-1/
If elevation > 239.5 ft then San Francisco

Adding nuance
▣ SF can be expensive
▣ We add more 'features' ('variables', 'predictors' or 'characteristics')
▣ Scatterplot: elevation vs price/sqft

Boundaries
▣ Linear separation
▣ Lines
□ elevation > 239.5 → SF
□ price > $1776 → LA
▣ Still one region with 'mixed values' (pairwise scatterplot)
Machine learning
▣ Automate the discovery of patterns
▣ Split point to classify (new view: histogram)

Split point: SF
▣ False negatives: some SF homes are classified as LA ones
▣ False positives: some LA homes are classified as SF ones
▣ Best split
Recursion with more features
▣ Accuracy: 82% → 84% → 96% → 100% as the tree keeps splitting
Training and overfitting

2. What is Machine Learning?

What is Machine Learning?

"Field of study that gives computers the ability to learn without being explicitly programmed"
Arthur Samuel, Science, 1959
Use case: spam detection

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E"
Tom M. Mitchell, 1997

Use case: spam detection → predictive model
Use case: market segmentation in marketing → descriptive model
Use case: association rules → descriptive model
Use case: cross selling → descriptive model
Use case: credit card fraud detection → descriptive & predictive model
Use case: Tesla's autopilot → reinforcement learning

Supervised Learning Workflow → predictive model
Unsupervised Learning Workflow → descriptive model
Reinforcement learning

Deep Learning
Source: https://hackernoon.com/log-analytics-with-deep-learning-and-machine-learning-20a1891ff70e

ML Process
Source: https://machinelearningmastery.com/machine-learning-checklist/

Descriptive, Predictive and Prescriptive models
Descriptive, Predictive, Prescriptive Analytics

Data analytics

3. The Ingredients of Machine Learning - Elements of Machine Learning
Ingredients of Machine Learning
▣ Tasks
▣ Models
▣ Features

Task hierarchy
Source: CommonKADS, Guus Schreiber
ML tasks
▣ Supervised learning
□ Predictive model: classification, regression
□ Descriptive model: subgroup discovery
▣ Unsupervised learning
□ Predictive model: predictive clustering
□ Descriptive model: clustering, association rule discovery, dimensionality reduction

Machine learning tasks and algorithms
▣ SUPERVISED
□ CATEGORICAL output - classification: kNN, SVM, Naive Bayes, decision tree induction
□ CONTINUOUS output - regression: linear, logistic, polynomial; decision trees; random forests
▣ UNSUPERVISED
□ clustering: k-means
□ association rules: Apriori
□ dimensionality reduction: SVD, PCA

Machine learning types
Source: http://www.apress.com/us/book/9781484223338

Tasks
Source: http://smartbasegroup.com/introduccion-al-machine-learning/

Use cases
Source: http://usblogs.pwc.com/emerging-technology/demystifying-machine-learning-part-2-supervised-unsupervised-and-reinforcement-learning/#94555

Supervised ML - Predictive model
Unsupervised ML - Descriptive model
Evaluating task performance
Confusion matrix, precision, recall, F1
▣ Accuracy (exactitud)
▣ Precision (precisión)
▣ Recall (exhaustividad)
▣ F-measure (factor F)

Training, Test and Validation data
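A minimal scikit-learn sketch of these metrics, using made-up toy labels (not data from the slides):

from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score, f1_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # toy ground-truth labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # toy predictions

print(confusion_matrix(y_true, y_pred))  # rows: true class, columns: predicted class
print(accuracy_score(y_true, y_pred))    # accuracy
print(precision_score(y_true, y_pred))   # precision = TP / (TP + FP)
print(recall_score(y_true, y_pred))      # recall = TP / (TP + FN)
print(f1_score(y_true, y_pred))          # harmonic mean of precision and recall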

Evaluation
▣ Training set: seen data used to build (fit) the model
□ determines the parameters
▣ Validation set: data for an (unbiased) evaluation of the model while tuning it
□ determines the hyperparameters
▣ Test/evaluation set: unseen data to measure the model's performance without bias
□ holding parameters and hyperparameters constant

Overfitting and underfitting
Dimensionality
▣ When we add more dimensions,
□ we can often separate the data more easily
□ but if we add too many, overfitting can occur
▣ Overfitting
□ Our model lacks generalization capability
□ it only classifies correctly the data used in training

Dimensionality in practice
▣ Suppose we want to classify photos of cats and dogs
▣ Let's start with one feature (e.g. average red colour)
▣ Very bad separation :(
Source: https://www.visiondummy.com/2014/04/curse-dimensionality-affect-classification/

Second dimension
▣ Second feature: average green colour
▣ Not a great improvement

Third dimension
▣ Third feature: average blue colour: Eureka!
Overfitting
▣ If we add more dimensions, the model learns 'the exceptions' (the list of points)

Generalization
▣ Even though it is not evident, in this example the model with 2 dimensions is better than the one with 3
▣ The 2D model misclassifies some examples, but has a better generalization capability

Another approach
▣ How much data do we need to train a model?
▣ From all the possible photos of cats and dogs, how many do we need?
▣ Suppose we want to train with 20% of the data...

Generalization and training data
▣ Suppose 1 feature is unique for every cat and dog
□ with 1 single feature, we need 20% of the population of data
▣ But… with 2 features we need 45% per dimension (0.45² ≈ 0.2) to cover 20% of the 2D space
▣ and with 3 features, we need 58% (0.58³ ≈ 0.2)...
Generalization and training data
▣ Conclusion
□ If we add more dimensions, we need more data to cover 20% of the possibilities, so that our training data represents the dataset well and we avoid overfitting

Holdout validation
▣ If a lot of (labelled) data is available:
□ independent sampling: one dataset for training, another for testing
▣ Holdout: divide the data into training and testing sets
□ usually 10% - 30% for testing
▣ Stratified holdout: each class is represented in both sets

K-Fold Cross Validation (see the sketch below)
▣ Maximizes the use of data
□ all data is used for both training and testing
▣ Usually, k = 10
▣ Leave-One-Out Cross-Validation
□ k = n (number of samples): 1 sample for testing, n-1 for training
□ Maximizes training data
□ Cannot be stratified (only 1 instance in each test fold)
□ High computational cost
▣ Stratified K-Fold
□ each class is (approximately) represented in each fold
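A minimal scikit-learn sketch of stratified holdout plus stratified 10-fold cross-validation; the iris dataset and the decision tree classifier are placeholders, not part of the slides:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Stratified holdout: 30% for testing, each class represented in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Stratified 10-fold cross-validation on the training part
scores = cross_val_score(DecisionTreeClassifier(),
                         X_train, y_train,
                         cv=StratifiedKFold(n_splits=10))
print(scores.mean())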
Holdout + K-Fold

Ingredients of Machine Learning: Tasks, Models, Features

Models: the output of ML
Models: a function
Model: a process
Model building and hyperparameters

ML tribes
Pedro Domingos, The Master Algorithm
Source: https://kevinbinz.com/2017/08/13/ml-five-tribes/
Types of models
▣ Geometric models
▣ Probabilistic models
▣ Logical models

Geometric models
▣ Hypothesis: there is a geometric separator (e.g. a line or hyperplane) between the instances
▣ Output: a decision (clustering, classifier) based on a geometric descriptor (e.g. distance)
Gaussian Mixture Models (GMM)

Probabilistic models
▣ Hypothesis: there is an underlying unknown probability distribution that generates the output from the input
▣ Output: a probabilistic model

Logical models
▣ Hypothesis: there are rules based on features to classify the instances
▣ Output: decision trees
Ingredients of Machine Learning: Tasks, Models, Features

Features
Good features
Example: classification
Example: Bag of Words (see the sketch below)

Dimensionality reduction
▣ Feature selection
▣ Feature extraction (compression)
□ PCA (Principal Component Analysis)
□ SVD (Singular Value Decomposition)
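The Bag of Words slide was a figure; a minimal scikit-learn sketch on a made-up toy corpus:

from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat",      # toy documents (assumption)
        "the dog sat on the log",
        "the cat and the dog"]

vec = CountVectorizer()
X = vec.fit_transform(docs)            # one row of word counts per document
print(vec.get_feature_names_out())     # the learned vocabulary (the features)
print(X.toarray())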

Conclusions
▣ Machine learning is about using the right features to build the right models that achieve the right tasks
▣ These tasks include: binary and multi-class classification, regression, clustering and descriptive modelling
▣ Supervised learning works with labelled datasets while unsupervised learning works with unlabelled datasets
▣ The output of predictive models involves the target variable, while descriptive models identify interesting structure in the data

Thanks! Any questions?
You can find me at carlosangel.iglesias@upm.es
Credits
Thanks to all who have published free resources:
▣ Minicons by Webalys
▣ Slide template by SlidesCarnival
▣ Photos by Unsplash and Wix
Geometric Models
Teacher: Carlos A. Iglesias
You can contact me at cif@gsi.dit.upm.es - Office C-211

References
▣ Web Data Mining, 2nd Ed., Bing Liu, 2011
▣ Machine Learning: The Art and Science of Algorithms that Make Sense of Data, by Peter Flach, 2012
▣ Machine Learning, Andrew Ng, Stanford University, Coursera
▣ Python Machine Learning, S. Raschka, Packt, 2015

1. Introduction - What this talk is about
Topics
▣ Tour of geometric methods
□ Simple linear classifier
□ Linear regression
□ Perceptron
□ Support Vector Machines (SVM)
▣ Linear regression
▣ Gradient descent
▣ Conclusions

2. Tour of Geometric Models - Main geometric models

Families - tour of algorithms
▣ Linear models: Linear Regression, Logistic Regression, Perceptron, SVM
□ Regression models: Linear Regression, Logistic Regression
□ Classification: Perceptron, Logistic Regression, Support Vector Machines (SVM)
▣ Distance-based models: kNN, k-means
□ Classification: k-Nearest Neighbours (kNN)
□ Clustering: k-Means
3. Tour of Distance-based Models - Main distance-based models

kNN - k Nearest Neighbors (see the sketch below)
▣ Classification decision based on the majority class among the k nearest neighbors
▣ Usually Euclidean distance
▣ Non-linear classifier
▣ Effect of a large k
□ smooths out noise
□ but biases towards the majority class
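A minimal scikit-learn sketch of kNN (the iris dataset is a placeholder; k=5 and Euclidean distance are shown explicitly):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# classify by majority vote among the k=5 nearest neighbours (Euclidean distance)
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))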

K-Means Clustering (see the sketch below)
▣ Place k random centroids
▣ For each point i
□ find the nearest centroid j
□ assign point i to cluster j
▣ For each cluster
□ new centroid = average of the points assigned to the cluster
▣ Repeat until there are no changes
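A minimal NumPy sketch of the algorithm above (it assumes no cluster ever becomes empty):

import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # place k random centroids (here: k random points from the data)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # for each point, find the nearest centroid and assign the point to that cluster
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # for each cluster, the new centroid is the average of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # repeat until there are no changes
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids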
4. Tour of Linear-based Models - Main linear-based models
Perceptron
▣ Linear classifier
▣ Binary classifier
▣ Learning
□ 1943 Warren McCulloch and Walter Pitts
□ 1957 Frank Rosenblatt

Perceptron learning rule (see the sketch below)
▣ Output: h(x) = sign(wᵀx)
▣ Pick a misclassified point (xₙ, yₙ), i.e. sign(wᵀxₙ) ≠ yₙ
▣ Update the weight vector: wₖ₊₁ = wₖ + yₙxₙ

Support Vector Machines (SVM)
▣ Perceptron: learns any separating hyperplane
□ Minimizes the number of misclassified points
▣ SVM: learns the hyperplane with maximum margin
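A minimal NumPy sketch of this learning rule (labels in {-1, +1}; a bias column is assumed to be included in X):

import numpy as np

def perceptron(X, y, n_epochs=10):
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for xn, yn in zip(X, y):
            # pick a misclassified point: sign(w·x) != y
            if np.sign(w @ xn) != yn:
                # update rule: w <- w + y * x
                w = w + yn * xn
    return w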
Support Vector Machines (SVM)
▣ Large margin classifier
□ Data is messy → allow some errors
□ Parameter C: amount of allowed errors
Source: Python Machine Learning

Kernel Trick
▣ Intuition
□ Use linear algorithms for classification in higher dimensions

Example
Kernel trick
▣ We only ever operate on wᵀx
□ wᵀx = <x, w> = ||x|| ||w|| cos θ
▣ Kernel trick: redefine the inner products
□ Go from the original space to a projected (higher-dimensional) space
□ No need to compute the transformations explicitly → only the inner products are changed (matrices)
□ No extra computational cost
▣ Common kernels (see the sketch below)
□ Polynomial
□ Gaussian Radial Basis Function (RBF)
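A minimal scikit-learn sketch of a soft-margin SVM with an RBF kernel on made-up non-linearly-separable data (concentric circles); the C and gamma values are illustrative:

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# toy data that no line can separate in the original 2D space
X, y = make_circles(n_samples=200, factor=0.3, noise=0.1, random_state=0)

# the RBF kernel redefines the inner product (implicit higher-dimensional space);
# C controls how many margin errors are tolerated
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)
print(clf.score(X, y))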

5. Linear Regression and Gradient Descent

Problem: house price prediction

Some intuition
▣ We face a…
□ continuous or discrete data problem?
□ supervised or unsupervised problem?
□ prediction or classification problem?
▣ Scatter plot: it seems we can describe the relationship between surface and price with a line
Source: https://www.pugetsystems.com/labs/hpc/Machine-Learning-and-Data-Science-Linear-Regression-Part-1-954/
Machine learning process
Training set (xᵢ, yᵢ) → Learning Algorithm → Model (h)

Model representation
(the vertical distance between a point and the fitted line is the residual)

Model visualization
Cost function J(a₀, a₁) (see the standard form below)
Cost function visualization
h(a₁) and J(a₁)
In n+1 dimensions: θ₀ and θ₁
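The cost function on these slides was shown as a figure; in the standard notation of Ng's course (cited in the references), for the hypothesis h_a(x) = a_0 + a_1 x it is:

J(a_0, a_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_a(x^{(i)}) - y^{(i)} \right)^2

where m is the number of training examples and h_a(x^{(i)}) - y^{(i)} is the residual for example i.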
Solving Linear Regression
▣ Simple linear regression
□ Using statistics if all the data is available (means, variances, correlations, …)
□ Not useful when the data changes
▣ Ordinary Least Squares
□ Goal: minimize the sum of squared residuals
□ Models the data as a matrix and uses linear algebra
□ Commonly used and very fast
▣ Gradient Descent
□ Optimizes the coefficients iteratively
□ The alpha parameter determines the size of each improvement step
□ Useful with very large datasets

Gradient descent
Source: https://mubaris.com/2017/09/28/linear-regression-from-scratch/

Gradient intuition
Source: http://www.big-data.tips/gradient-descent

Gradient descent - local minimum

Gradient descent - linear regression

Gradient descent algorithm (see the sketch below)
▣ Given the cost function, iterate until the gradient is 0
▣ ∇: the nabla operator (the gradient)
▣ α = step size in the gradient direction
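A minimal NumPy sketch of gradient descent for simple linear regression, using the cost function above (the alpha and iteration count are illustrative):

import numpy as np

def gradient_descent(x, y, alpha=0.01, n_iter=1000):
    a0, a1 = 0.0, 0.0
    m = len(x)
    for _ in range(n_iter):
        residual = a0 + a1 * x - y           # h(x) - y for every example
        grad0 = residual.sum() / m           # dJ/da0
        grad1 = (residual * x).sum() / m     # dJ/da1
        a0 -= alpha * grad0                  # step of size alpha against the gradient
        a1 -= alpha * grad1
    return a0, a1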
Example: gradient descent with a good (slow) alpha
Example: alpha too big - oscillates around the minimum
Convex function - local minimum?

Example: alpha too big - does not converge

Conclusions
▣ There are a large number of ML algorithms
▣ Geometric ML can be classified into distance-based and linear-based methods
▣ Linear algorithms can address non-linear classification with the kernel trick (a higher-dimensional space)
▣ Most ML algorithms rely on some optimization algorithm, such as gradient descent
▣ It is important to understand both parameters and hyperparameters
Thanks! Any questions?

Credits
Thanks to all who have published free resources:
▣ Minicons by Webalys
▣ Slide template by SlidesCarnival
▣ Photos by Unsplash and Wix

You can find me at cif@gsi.dit.upm.es
Tree-based Models
Teacher: Carlos A. Iglesias
You can contact me at carlosangel.iglesias@upm.es - Office C-211

References
▣ Machine Learning: The Art and Science of Algorithms that Make Sense of Data, by Peter Flach, 2012
▣ Induction of Decision Trees, J. R. Quinlan, Machine Learning, 1: 81-106, 1986, Kluwer

Topics
▣ Tour of tree models
□ Decision trees
□ Ensembles
▣ Decision trees
▣ Conclusions
1. Introduction - What this talk is about

Historical perspective
Decision Trees:
▣ J. R. Quinlan, "Induction of Decision Trees", 1979
▣ L. Breiman et al., Classification and Regression Trees, T&F, 1984
▣ J. R. Quinlan, "C4.5: Programs for Machine Learning", MK Publishers, 1993

Random Forest:
▣ T.K. Ho, "Random Decision Forests", Proc. ICDAR, 1995
▣ L. Breiman, "Random Forests", Machine Learning, 45 (1): 5–32, 2001

Practical use: Kinect
▣ Uses Random Forest to predict body pose
▣ Implemented efficiently on the GPU
http://www.i-programmer.info/news/105-artificial-intelligence/2176-kinects-ai-breakthrough-explained.html
Families
▣ Decision trees: ID3, C4.5, C5.0, CART
▣ Ensemble models (decision forests): Random Forest, Extra Trees, GBM (Gradient Boosting Machine)

2. Tour of Tree Models - Main tree models

Tour of algorithms

Family: Decision tree
▣ ID3, C4.5, C5.0 — classification: entropy (information gain, IG); regression: --
▣ CART (binary) — classification: gini impurity, entropy or a generic function; regression: mean squared error (mse), mean absolute error (mae)

Family: Ensemble
▣ Random Forest — classification: gini, entropy; regression: mse, mae
▣ Extra Trees — classification: gini, entropy; regression: mse, mae

Binary Classification tree
Source: http://apprize.info/python/scratch/17.html
Non-binary Classification tree
Source: Python Machine Learning

Regression tree (continuous output)

Multiway and binary splitting
▣ Multiway split: color? → red / green / yellow
▣ Binary splits: color == green? (Yes / No); if No, then color == yellow? (Yes / No)

Decision tree naming
Basic algorithm
▣ Choose the best attribute(s) to split the remaining instances and make that attribute a decision node
▣ Repeat this process recursively for each child
▣ Stop when:
□ All the instances have the same target attribute value
□ There are no more attributes
□ There are no more instances

3. Decision Tree Learning - Fundamentals of Decision Tree Learning

Will Nadal play the match? The induction task
▣ outlook: {sunny, overcast, rain}
▣ temperature: {cool, mild, hot}
▣ humidity: {high, normal}
▣ windy: {true, false}

Induction of Decision Trees, J. R. Quinlan, Machine Learning, 1: 81-106, 1986, Kluwer
Data: 14 examples, (9P, 5N)

Which attribute provides more information?
▣ Outlook: Sunny (2P, 3N), Overcast (4P, 0N), Rain (3P, 2N)
▣ Temperature: Hot (2P, 2N), Mild (4P, 2N), Cool (3P, 1N)
▣ Humidity: High (3P, 4N), Normal (6P, 1N)
▣ Wind: Weak (6P, 2N), Strong (3P, 3N)

A split is "pure" when it yields all of one class and zero of the rest (e.g. Overcast), so there is no need to split further.
Select best attribute: measure of node impurity
▣ We prefer nodes with a homogeneous class distribution
▣ E.g.
□ Wind strong (3P, 3N) → non-homogeneous, high impurity
□ Outlook overcast (4P, 0N) → homogeneous, low impurity
probability (proportion) of the positive class: 0 → all negative (pure); 1.0 → all positive (pure)

Entropy — Outlook: Sunny (2P, 3N), Overcast (4P, 0N), Rain (3P, 2N)
▣ Entropy: level of 'disorder', 'impurity' or 'diversity'
▣ Entropy(S) = E(S) = -∑ pᵢ log₂ pᵢ
□ S = set of examples
□ pᵢ = proportion of examples of class i in S
▣ E(Initial) = -9/14 log₂(9/14) - 5/14 log₂(5/14) = 0.940
▣ E(Sunny) = -2/5 log₂(2/5) - 3/5 log₂(3/5) = 0.971
▣ E(Overcast) = 0
▣ E(Rain) = -3/5 log₂(3/5) - 2/5 log₂(2/5) = 0.971

Information Gain (IG)
▣ IG(S, A): expected reduction in entropy in S because of the split on attribute A
▣ IG(S, A) = E(S) - ∑ᵥ |Sᵥ|/|S| E(Sᵥ), v: values of A
▣ E(S) = 0.940
▣ E(Outlook) = 5/14 E(Sunny) + 4/14 E(Overcast) + 5/14 E(Rain) = 5/14 · 0.971 + 4/14 · 0 + 5/14 · 0.971 = 0.694 bits
▣ IG(S, Outlook) = E(S) - E(Outlook) = 0.246 bits
IG Temperature — Hot (2P, 2N), Mild (4P, 2N), Cool (3P, 1N)
▣ E(S) = 0.940
▣ E(Hot) = 1
▣ E(Mild) = -4/6 log₂(4/6) - 2/6 log₂(2/6) = 0.918
▣ E(Cool) = -3/4 log₂(3/4) - 1/4 log₂(1/4) = 0.811
▣ E(Temperature) = 4/14 · 1 + 6/14 · 0.918 + 4/14 · 0.811 = 0.911
▣ IG(S, Temperature) = 0.940 - 0.911 = 0.029 bits

IG Humidity — High (3P, 4N), Normal (6P, 1N)
▣ E(S) = 0.940
▣ E(High) = -3/7 log₂(3/7) - 4/7 log₂(4/7) = 0.985
▣ E(Normal) = -6/7 log₂(6/7) - 1/7 log₂(1/7) = 0.591
▣ IG(S, Humidity) = 0.940 - 7/14 · 0.985 - 7/14 · 0.591 = 0.151 bits

IG Wind — Weak (6P, 2N), Strong (3P, 3N)
▣ E(S) = 0.940
▣ E(Weak) = -6/8 log₂(6/8) - 2/8 log₂(2/8) = 0.811
▣ E(Strong) = 1
▣ IG(S, Wind) = 0.940 - 8/14 · 0.811 - 6/14 · 1 = 0.048 bits

Select attribute (see the sketch below)
▣ IG(S, Outlook) = 0.246 bits ← highest, chosen as the root
▣ IG(S, Temperature) = 0.029 bits
▣ IG(S, Humidity) = 0.151 bits
▣ IG(S, Wind) = 0.048 bits
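A minimal Python sketch of these two formulas; running entropy on 9 positives and 5 negatives reproduces the 0.940 bits above:

import math
from collections import Counter

def entropy(labels):
    # E(S) = -sum p_i * log2(p_i) over the class proportions in S
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    # IG(S, A) = E(S) - sum_v |S_v|/|S| * E(S_v)
    n = len(labels)
    remainder = 0.0
    for v in {e[attribute] for e in examples}:
        subset = [l for e, l in zip(examples, labels) if e[attribute] == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

print(entropy(["P"] * 9 + ["N"] * 5))   # ≈ 0.940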
Select next attribute - Sunny (I)
Outlook = Sunny (2P, 3N):
▣ temperature | Sunny: hot (0P, 2N), mild (1P, 1N), cold (1P, 0N)
▣ E(Outlook=Sunny) = 0.971; E(hot) = 0; E(mild) = 1; E(cold) = 0
▣ E(temperature | Sunny) = 2/5 · 0 + 2/5 · 1 + 1/5 · 0 = 0.4 bits
▣ IG(temperature) = 0.971 - 0.4 = 0.571 bits

Select next attribute - Sunny (II)
▣ humidity | Sunny: high (0P, 3N), normal (2P, 0N)
▣ windy | Sunny: weak (1P, 2N), strong (1P, 1N)
▣ E(humidity | Sunny) = 0; IG(humidity) = 0.971 - 0 = 0.971 bits
▣ E(weak | Sunny) = -1/3 log₂(1/3) - 2/3 log₂(2/3) = 0.918
▣ E(windy | Sunny) = 3/5 · 0.918 + 2/5 · 1 = 0.951
▣ IG(windy) = 0.971 - 0.951 = 0.020 bits

Select next attribute
▣ Humidity is chosen for the Sunny branch: high (0P, 3N), normal (2P, 0N); Overcast (4P, 0N) is already pure; Rain (3P, 2N) remains to be split

Select next attribute - Rain (I)
▣ temperature | Rain: hot (0P, 0N), mild (2P, 1N), cold (1P, 1N)
▣ E(Outlook=Rain) = 0.971; E(hot) = 0; E(mild) = 0.918; E(cold) = 1
▣ E(temperature | Rain) = 3/5 · 0.918 + 2/5 · 1 = 0.951 bits
▣ IG(temperature) = 0.971 - 0.951 = 0.02 bits
Select next attribute - Rain (II)
▣ humidity | Rain: high (1P, 1N), normal (2P, 1N)
▣ wind | Rain: weak (3P, 0N), strong (0P, 2N)
▣ E(Outlook=Rain) = 0.971
▣ E(Humidity | Rain) = 2/5 · 1 + 3/5 · 0.918 = 0.951 bits; IG(Humidity | Rain) = 0.971 - 0.951 = 0.02 bits
▣ E(Wind | Rain) = 0; IG(Wind | Rain) = 0.971 bits → Wind is chosen

Final decision tree
(9P, 5N) outlook:
▣ Sunny (2P, 3N) → humidity: high (0P, 3N) → N; normal (2P, 0N) → P
▣ Overcast (4P, 0N) → P
▣ Rain (3P, 2N) → wind: weak (3P, 0N) → P; strong (0P, 2N) → N

Final tree as decision rules / as a disjunction of conjunctions (see the rules below)
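Reading the final tree above from root to leaves gives these rules (each path is a conjunction, and the tree is their disjunction):

IF outlook = sunny AND humidity = high THEN N
IF outlook = sunny AND humidity = normal THEN P
IF outlook = overcast THEN P
IF outlook = rain AND wind = weak THEN P
IF outlook = rain AND wind = strong THEN N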
Inductive bias
▣ Does ID3 generalize from the training samples?
▣ ML "bias": a ML algorithm prefers some hypotheses over others
▣ ID3 prefers "short trees" to "long trees" → short hypotheses

"Pluralitas non est ponenda sine necessitate" - "when you have two competing theories that make exactly the same predictions, the simpler one is the better"
Occam's razor, c. 1287–1347

Issues
▣ Overfitting with training data
□ Prepruning: stop growing the tree at some point during top-down construction when there is no longer sufficient data to make reliable decisions
□ Postpruning: grow the full tree, then remove subtrees that do not have sufficient evidence
▣ Handling missing or wrong values

Conclusions
▣ Decision trees (DTs) can be seen as rules and make it easy to understand the outcome of ML
▣ DTs can be used for classification and regression
▣ The main algorithms are CART, ID3 and C4.5
▣ Ensembling approaches such as Random Forest provide a very robust way of combining DTs
Thanks! Any questions?

Credits
Thanks to all who have published free resources:
▣ Minicons by Webalys
▣ Slide template by SlidesCarnival
▣ Photos by Unsplash and Wix

You can find me at carlosangel.iglesias@upm.es
