
Intelligent Techniques
Introduction
Teacher: Carlos A. Iglesias
You can contact me at carlosangel.iglesias@upm.es - Office C-211

Goals

▣ Learn the foundations of intelligent systems
▣ Understand the challenges and objectives of intelligent systems
▣ Understand some of the key technologies in this field
□ Knowledge representation
□ Knowledge-based systems
□ Machine learning
□ Natural language processing

Topics

1. Introduction

2. What is AI?

What is AI? - Press view

What is Artificial Intelligence (AI)?

THOUGHT: systems that think like humans (HUMAN) vs. systems that think rationally (RATIONAL)
BEHAVIOUR: systems that act like humans vs. systems that act rationally

THINKING HUMANLY
"The exciting new effort to make computers think ... machines with minds, in the full and literal sense." (Haugeland, 1985)
"[The automation of] activities that we associate with human thinking, activities such as decision-making, problem solving, learning ..." (Bellman, 1978)

THINKING RATIONALLY
"The study of mental faculties through the use of computational models." (Charniak and McDermott, 1985)
"The study of the computations that make it possible to perceive, reason, and act." (Winston, 1992)

ACTING HUMANLY
"The art of creating machines that perform functions that require intelligence when performed by people." (Kurzweil, 1990)
"The study of how to make computers do things at which, at the moment, people are better." (Rich and Knight, 1991)

ACTING RATIONALLY
"Computational Intelligence is the study of the design of intelligent agents." (Poole et al., 1998)
"AI ... is concerned with intelligent behavior in artifacts." (Nilsson, 1998)
Acting humanly: The Turing Test

Which capabilities does the Turing test require?
▣ Natural Language Processing
▣ Automated reasoning
▣ Knowledge representation
▣ Machine learning

Total Turing Test (Harnad, 1992)

▣ The computer is a robot that should look, act and communicate like a human
▣ It additionally requires Computer Vision and Robotics
Critique: Searle's Chinese room
Minds, Brains and Programs, John Searle, 1980

Philosophy and AI: Strong AI vs Weak AI
▣ Strong AI:
□ "True" AI
□ AI matches (or exceeds) human intelligence
□ AI machines have real conscious minds
□ E.g. HAL, Terminator, ...
▣ Weak AI:
□ AI "only" simulates human cognition
□ Narrow AI: constrained in problems / domains

Thinking humanly: the cognitive approach
▣ First: how do humans think?
▣ Ways to understand the human mind
□ Observe human reasoning
□ Psychological experiments
□ Observe the brain in action
▣ Cognitive science: computer models from AI + techniques from psychology to construct precise theories of the human mind
▣ General Problem Solver (GPS, Newell & Simon 1961): not only solves problems correctly but with the same reasoning steps as humans

Thinking rationally: the logicist approach
▣ Logicist tradition from Aristotle
□ Formalize knowledge with logical notation and rules to derive new knowledge
▣ Correct inferences - logical proof
▣ Challenges
□ Integrating informal / uncertain knowledge
□ Scalability in real problems
Acting rationally: the rational agent approach
▣ The branch of computer science that is concerned with the automation of intelligent behavior (Luger)
▣ Systems that operate autonomously, perceive their environment, persist over a prolonged time period, adapt to change, and create and pursue goals
▣ A rational agent is one that acts so as to achieve the best outcome or, when there is uncertainty, the best expected outcome
▣ Rational behaviour: doing the right thing

Discussion
▣ Acting rationally
□ More general than thinking rationally (the logicist approach)
■ correct inference is just one way to achieve rationality
□ More amenable to scientific development than thinking or acting humanly
▣ Our focus in the course: rational agents

Foundations of AI: disciplines that contribute to AI

▣ Philosophy: logic, reasoning, rationality, …
▣ Mathematics: logic, computability, tractability, NP-completeness, probability, ...
▣ Economics: utility, decision theory, game theory, operations research, …
▣ Neuroscience: neurons
▣ Psychology: behaviourism, cognitive psychology
▣ Computer engineering: ENIAC, ...
▣ Control theory and cybernetics: the homeostat, ...
▣ Linguistics: computational linguistics

2. The History of AI
History of AI

Pre-AI: 1911 Torres Quevedo's "Ajedrecista" (chess automaton)
https://history.computer.org/pioneers/torres.html
"Torres and His Remarkable Automatic Devices," Scientific American Supplement, Vol. 80, No. 2079, 1915, pp. 296-298.

Pre-AI: the 1940s
▣ Science fiction
▣ Asimov's Three Laws of Robotics, 1942
□ Do not injure human beings
□ Obey human orders
□ Protect itself

Pre-AI: the first neuron
▣ 1888 - Ramón y Cajal - the brain is composed of neurons
▣ 1943 - McCulloch-Pitts - first model of a neuron
□ A Turing machine can be implemented with an ANN
Birth of AI: 1952-1956
▣ Ratio Club dinners: British cybernetics, bringing together biologists, psychologists, mathematicians, neurologists, computer scientists, …
▣ Turing Test
▣ Cybernetics (e.g. Walter)
□ behaviour patterns and Pavlov
Walter's tortoise

"The brain is a physical machine… Adaptive behaviour is that which maintains it in physical equilibrium with its environment"
Ashby, 1947

"'Adaptive' behaviour is equivalent to the behaviour of a stable system, the region of the stability being the region of the phase-space in which all the essential variables lie within their normal limits"
Ashby, 1952

The Law of Requisite Variety (Ashby)
▣ The number of states of the control mechanism must be greater than or equal to the number of states in the system being controlled
▣ Ashby states the Law as: "Only variety can destroy variety"

Birth of AI: 1952-1956
▣ 1951 Prinz's checkers program (UK)
▣ Ferranti Mark I, Manchester Electronic Computer, 1st commercial computer
Birth of AI: 1952-1956
▣ 1956: Newell & Simon: Logic Theorist
□ first AI program; it used heuristic "rules of thumb" to prove 38 out of 52 theorems of Principia Mathematica (Whitehead and Russell)

AI: Dartmouth Summer 1956
▣ McCarthy, Minsky, Shannon, Simon, Newell, McCulloch, Nash, Samuel
▣ McCarthy
□ Program Advice Taker - Logic with Common Sense
https://en.wikipedia.org/wiki/Dartmouth_workshop

Golden era - 1956-1974
▣ 1957 Newell & Simon - General Problem Solver (GPS): generalization of LT, model of cognition
□ reasoning as search
▣ 1957 Rosenblatt - Perceptron
□ based on McCulloch-Pitts (1943)
□ a training mechanism to learn the weights
▣ 1959 Samuel's ML checkers program (USA)
□ coined the term 'machine learning'
▣ 1960 - Quillian - 1st semantic network (or frame)
▣ 1966 - Weizenbaum, ELIZA - 1st chatbot
Golden era - 1956-1974
▣ Optimism
▣ Funding
□ Millions of dollars for AI labs in the USA and Europe

The first AI winter 1974–1980
▣ High expectations, poor results
▣ End of funding
▣ 1966 - failure of Machine Translation
□ NRC - after a $20 million investment
□ Underestimated difficulties
■ word disambiguation
■ common-sense knowledge
▣ 1969 - abandonment of connectionism
□ Minsky & Papert: the Perceptron cannot solve XOR!
▣ Lack of scalability
□ Limited computing facilities
□ Quillian's semantic networks worked with only 20 words
□ Combinatorial explosion → toy problems

First AI winter 1974–1980 - Teenage AI
▣ 1971-75 DARPA's frustration with the Speech Understanding Research program at CMU
▣ 1973 - Lighthill report - large decrease in AI research in the UK
▣ 1973-74 DARPA's cutbacks to academic AI research in general

Schools of Thought
"Neat": logical approach; McCarthy, Newell, Simon; Logic; Prolog; CMU, Stanford, Edinburgh
"Scruffy": symbolic approach; Schank, Minsky, Weizenbaum; Frames and Scripts (origin of OOP); Lisp; MIT
New developments: Logic and Symbolic reasoning
▣ 1972 - Colmerauer and Roussel, success of Prolog (PROgrammation en LOGique)
□ Restricts logic (Horn clauses) to make it tractable (similar to rules and production rules)
▣ Critiques of logic from psychologists: people do not think with logic
□ McCarthy → machines should not think like humans
▣ Development of Expert Systems & Knowledge-Based Systems (KBS)

1969-1979 - Expert Systems: New hopes
▣ Small domains to avoid common sense
▣ 1969 Feigenbaum, Buchanan et al.: DENDRAL
□ infers molecular structure from info provided by a mass spectrometer
▣ 1972 Feigenbaum: MYCIN
□ diagnosed infectious blood diseases
□ evolved into E-MYCIN
▣ 1978 McDermott: R1/XCON (eXpert CONfigurer)
□ selects computer system components based on the customer's requirements
□ 2500 rules
□ By 1986, it had processed 80,000 orders and achieved 95-98% accuracy. It was estimated to be saving DEC $25M a year.

MYCIN

Expert System
ES with KBS architecture

Boom 1980–1987
▣ Emergence of KBS
□ Knowledge engineering
▣ 1982 Knowledge Level - Newell
▣ 1988 Deep Thought - beats chess masters
▣ Return on investment
▣ 1986 Revival of connectionism
□ Rumelhart, backpropagation

Bust: the second AI winter 1987–1993
▣ 1987: collapse of the Lisp machine market
▣ 1988: cancellation of new spending on AI
▣ 1993: failure of expert systems
▣ 1990s: the quiet disappearance of the fifth-generation computer project's original goals

The Knowledge Level:
"There exists a distinct computer systems level, lying immediately above the symbol level, which is characterized by knowledge as the medium and the principle of rationality as the law of behaviour"
Newell, 1982
AI 1993–2001
▣ 90s: intelligent agents
▣ 1997 Deep Blue beats Kasparov
▣ 2011 IBM Watson wins Jeopardy!

Weak definition of agents, Wooldridge 1994
▣ autonomy: agents operate without the direct intervention of humans, and have control over their actions and internal state;
▣ social ability: agents interact with other agents / humans via some kind of agent-communication language;
▣ reactivity: agents perceive their environment, and respond to changes that occur in it;
▣ pro-activeness: agents are able to exhibit goal-directed behaviour by taking the initiative.

Multiagent Systems
▣ Systems of agents
▣ Require cooperation, coordination, negotiation
▣ Social structures

Deep learning, big data and artificial general intelligence: 2000–present
▣ 2007 Hinton: deep learning
□ Distributed hypothesis
▣ Big Data
Technological singularity


4. AI Industry

AI Companies

Investment in AI (08/2015)
Innovation in AI (08/2015)
Funding per AI category

9. Conclusions - What we have learnt

Conclusions
▣ AI aims at developing systems that reproduce human behaviour and reasoning
▣ AI is a multidisciplinary field
▣ AI has a number of subfields, such as reasoning, knowledge representation, machine learning, natural language processing, computer vision and robotics
▣ AI is living a new golden era based on the availability of large data sets

References
▣ Artificial Intelligence: A Modern Approach, 3rd Ed., Russell & Norvig, Chapter 1, Prentice Hall, 2016
Thanks! Any questions?
You can find me at cif@gsi.dit.upm.es

Credits
Thanks to all who have published these resources with a free licence:
▣ Minicons by Webalys
▣ Slide template by SlidesCarnival
▣ Images from Unsplash and Wix
Linked Data Technologies
Carlos A. Iglesias
Universidad Politécnica de Madrid
You can contact me at cif@gsi.dit.upm.es - Office C-211

Objectives

The learning objectives of this lesson consist of having a clear understanding of:
▣ What open data is
▣ What linked data is
▣ The differences between open data and linked data
▣ How to publish linked data
▣ How to define new vocabularies
▣ What the main technologies of linked data are (RDF, SPARQL) and their principles

1. Open Data - What is Open Data and why it is important
How can one govern informed citizens?

"My Administration is committed to creating an unprecedented level of openness in Government. We will work together to ensure the public trust and establish a system of transparency, public participation, and collaboration. Openness will strengthen our democracy and promote efficiency and effectiveness in Government."
President Obama, 2009
https://www.whitehouse.gov/open

Open government ("gobierno abierto"): a vision of public services

Open Data, Open Government, Big Data
Source: Creating Value through Open Data, European Data Portal, 2015
http://ec.europa.eu/newsroom/dae/document.cfm?doc_id=3179


Open Data characteristics

● Public: public access, respecting legal restrictions and concerns such as privacy or security
● Accessible: open formats
● Documented: metadata to understand its use
● Reusable: open licences that do not limit its use
● Complete: raw data (not processed) with enough granularity
● Updated: an updated data stream to preserve its value
● User support centre: provide support

https://www.whitehouse.gov/sites/default/files/omb/memoranda/2013/m-13-13.pdf

"That's why I say that data is the new oil for the digital age. How many other ways could we stimulate a market worth 70 billion euros a year, without spending big budgets? Not many, I'd say."
Neelie Kroes, EU Vice-President, 2012
http://europa.eu/rapid/press-release_SPEECH-12-149_en.htm

datos.gob.es
Data Value Chain
Source: Creating Value through Open Data, European Data Portal, 2015

Benefits of open data
▣ Increased economic activity
□ Enterprise creation, improved entrepreneurship
□ Creation of new products and services
□ Improvement of existing products
▣ Reduced administration costs
□ Self-service → better use of resources
□ Transparency → quick public detection of wasteful expenses
▣ Increased efficiency in linking data across administrations
▣ Creation of jobs
□ Need for new skills for the Big Data era

Key figures (EU+26, 2016-2020):
▣ 325bn€ direct market size for 2016-2020
▣ 25k new jobs: 75k open data jobs in the EU+26 in 2016; 100k in 2020 (a 33% increase), 9k of them in Italy
▣ 1.7bn€ saved in costs: 182M€ saved by Italian public administrations in 2020
▣ 626M hours saved: congestion costs are 1% of GDP every year; Milan was the most congested city, with 61.2 hours wasted in 2014

Source: http://www.europeandataportal.eu/es/content/creating-value-through-open-data
Examples of companies created using raw governmental data
▣ Zillow: valued at more than $1 billion
▣ Weather Channel: sold for $3.5 billion in 2008
▣ Garmin: a market of $7.24 billion; 329 jobs in 5 years
Source: Creating Value through Open Data, European Data Portal, 2015

Open Data Europe

Open data ("datos abiertos") in Spain
Spain - opendatabarometer.org
How can we publish open data? The 5-star deployment scheme (http://5stardata.info/en/):
★ Open licence
★★ Machine readable
★★★ Open format
★★★★ Uniform Resource Identifier (URI)
★★★★★ Linked Data

How can we publish information to be shared?
Usual technologies for sharing data - traditional approach: structural integration
▣ Data dump or database accessible through a service
▣ Problems
□ Heterogeneous data, without common keys or with different types
□ Different definitions (what is a region, ...)
□ Different attributes
□ Case-by-case integration
□ "Centralized" integration
□ Mix of fields carrying semantics with fields for optimizing or normalising the database schema

Linked Data approach: semantic integration
▣ No need to change the existing IT infrastructure
▣ We change how we publish and share data
□ Processes for transforming our data into linked data
■ Potential benefit from external open data
□ Processes for publishing interesting and relevant parts of our data

2. Linked Data - What is Linked Data and why it is important
From a Web of Documents to a Web of Data

"Linked Data is the term used to describe a method of exposing and connecting data on the Web from different sources following the four principles of Linked Data."
Tim Berners-Lee
https://www.ted.com/talks/tim_berners_lee_on_the_next_web
https://www.w3.org/DesignIssues/LinkedData.html

The Four Design Principles of Linked Data (by Tim Berners-Lee, 2006)
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
4. Include links to other URIs, so that they can discover more things

The 4 principles in practice (1)
1. Use URIs (Uniform Resource Identifiers) as names for things
2. Use HTTP URIs so that people can look up those names
Example: UPM (Universidad Politécnica de Madrid, Technical University of Madrid)
http://es.dbpedia.org/page/Universidad_Politécnica_de_Madrid
The 4 principles in practice (2)
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover more things

UPM
UPM - N3
UPM - XML
The Four Linked Data Principles (by Tim Berners-Lee, 2006)
▣ Everything should have a unique web identifier (URI)
▣ Information about things is published as web resources over HTTP
▣ Information is described with RDF 'triples' (subject, predicate, object) (= entity, attribute, value)
▣ Things are connected with relationships

Example - Madrid:
http://dbpedia.org/resource/Madrid (Madrid) — http://dbpedia.org/ontology/country (country) → http://dbpedia.org/resource/Spain (Spain)

RDF (Resource Description Framework)
▣ W3C specification
▣ Resources are described as triples that form a graph
▣ Graphs can be serialised using different languages: XML, Turtle, JSON-LD
▣ E.g. Turtle:

@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix dbp: <http://dbpedia.org/property/> .
@prefix dbr: <http://dbpedia.org/resource/> .

dbr:Madrid
    dbo:country dbr:Spain ;
    dbo:populationTotal 3141991 .

dbr:Spain
    dbp:areaTotal 604300000.00 .

Web Resource (figure: the RDF graph of the triples above)
Linked Data ≠ Open Data
▣ Open Data: data can be published and be publicly available under an open licence without linking to other data sources
▣ Linked Data: data can be linked to URIs from other data sources, using open standards such as RDF, without being publicly available under an open licence

Strategy: towards the Web of Data

Linked Open Data Cloud Evolution: 2007, 2009, 2011
LOD 2014
http://lod-cloud.net/
Linked Geodata cloud

Queries: SPARQL (http://dbpedia.org/sparql)

PREFIX yago: <http://dbpedia.org/class/yago/>

SELECT ?thing
WHERE {
  ?thing rdf:type yago:EuropeanCountries .
}
ORDER BY DESC(?thing)
LIMIT 25

SPARQL Query
"Soccer players who are born in a country with more than 10 million inhabitants, who played as goalkeeper for a club that has a stadium with more than 30,000 seats, and the club country is different from the birth country" (see the sketch below)
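One possible way to express this query against the DBpedia endpoint is sketched below. The specific classes and properties (dbo:SoccerPlayer, dbo:position, dbo:ground, dbo:capacity, …) are assumptions about the DBpedia ontology and may need adjusting:

PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT DISTINCT ?player
WHERE {
  ?player a dbo:SoccerPlayer ;
          dbo:position <http://dbpedia.org/resource/Goalkeeper_(association_football)> ;
          dbo:birthPlace ?birthCountry ;
          dbo:team ?club .
  ?birthCountry a dbo:Country ;
                dbo:populationTotal ?population .
  ?club dbo:ground ?stadium ;
        dbo:country ?clubCountry .
  ?stadium dbo:capacity ?capacity .
  FILTER (?population > 10000000)
  FILTER (?capacity > 30000)
  FILTER (?birthCountry != ?clubCountry)
}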

"Linked Open Data (LOD) is Linked Data which is released under an open license, which does not impede its reuse for free."
Tim Berners-Lee
https://www.w3.org/DesignIssues/LinkedData.html
Benefits of LOD
▣ Flexible data integration
□ Interconnection of disparate datasets
▣ Network effect
□ Adding a new dataset increases the value of the existing datasets
▣ Increase in data quality
□ The use of URIs leads to improved data management and quality
□ Increased (re)use of datasets improves data quality: errors are progressively corrected
▣ Increase in data usability
□ URIs and different formats (XML, CSV, JSON, …)
▣ Compatibility with existing infrastructure
□ Database / file conversion to RDF

3. Foundations - Main Linked Data technologies

W3C Technologies for Linked Data
▣ URIs for naming things
▣ RDF for describing data
▣ SPARQL for querying linked data

Uniform Resource Identifier (URI)
▣ "A Uniform Resource Identifier (URI) is a compact string of characters for identifying an abstract or physical resource." (RFC 2396)
▣ IRI - Internationalized Resource Identifier
□ Extends URIs beyond ASCII
https://www.ietf.org/rfc/rfc2396.txt
Resource Description Framework: RDF example (informal)
▣ RDF is a framework for representing data and resources on the web
▣ Anything can be a resource (entity):
□ a document, a physical thing, ...
▣ All information is expressed in triples:
□ Subject: a resource, identified by a URI
□ Predicate: a relationship identified by a URI
□ Object: a resource or literal to which the subject is related
▣ A set of such triples is called an RDF graph
▣ dbr:Madrid dbo:country dbr:Spain .
(http://dbpedia.org/resource/Madrid — http://dbpedia.org/ontology/country → http://dbpedia.org/resource/Spain)
https://www.w3.org/TR/rdf11-concepts/
https://www.w3.org/TR/rdf11-primer/

RDF Serialization
Triples: <subject> <predicate> <object> .
▣ Several syntaxes to describe RDF graphs
□ RDF/XML
□ The Turtle family of RDF languages
■ Turtle, N3, N-Quads, TriG
□ JSON-LD (JSON for Linked Data)
□ RDFa (embedding RDF into HTML)
Turtle (note: "a" is shorthand for rdf:type)
JSON-LD
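The JSON-LD example on this slide was a figure; a minimal sketch of the Madrid triples from the earlier Turtle example, expressed in JSON-LD (illustrative, not the exact slide content):

{
  "@context": {
    "dbo": "http://dbpedia.org/ontology/"
  },
  "@id": "http://dbpedia.org/resource/Madrid",
  "dbo:country": { "@id": "http://dbpedia.org/resource/Spain" },
  "dbo:populationTotal": 3141991
}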

RDFa
▣ Annotate HTML web pages semantically
▣ Similar to microformats and microdata
https://www.w3.org/TR/rdfa-lite/
RDF/XML

Blank nodes
▣ A resource (subject/object) without an ID

RDF Schema (RDF-S)
▣ Supports the definition of vocabularies

Example:
ex:MotorVehicle rdf:type rdfs:Class .
ex:Van rdf:type rdfs:Class .
ex:Truck rdf:type rdfs:Class .
ex:Van rdfs:subClassOf ex:MotorVehicle .
ex:Truck rdfs:subClassOf ex:MotorVehicle .

ex:yourCar rdf:type ex:MotorVehicle .
ex:myCar rdf:type ex:Van .

ex:Book rdf:type rdfs:Class .
ex:Person rdf:type rdfs:Class .
ex:author rdf:type rdf:Property .
ex:author rdfs:domain ex:Book .
ex:author rdfs:range ex:Person .
Popular RDF-S vocabularies
▣ FOAF (Friend of a Friend) - social networks
▣ SKOS - thesauri and taxonomies
▣ Schema.org - web pages for search engines
▣ Dublin Core - metadata of pages (creator, author, …)

OWL - Web Ontology Language
▣ Additional semantics based on Description Logics
▣ Complex class construction, cardinality, …
▣ E.g. Grandfather: Man ⋂ Parent; John has at least 4 children who are Parents
https://www.w3.org/TR/2012/REC-owl2-primer-20121211/

Triplestore

SPARQL Protocol and RDF Query Language (SPARQL)
▣ W3C standardised language to query and manipulate RDF data
▣ Family of specifications
□ SPARQL Query - graph patterns (triples with variables)
■ SELECT query (returns variable bindings)
■ ASK query (yes / no question)
■ CONSTRUCT query (a new RDF graph is constructed from the query result)
□ SPARQL Federated Query
■ Delegates subqueries to SPARQL endpoints
□ SPARQL Update
■ Insert, delete, update RDF triples
□ SPARQL Protocol
■ Graph operations over HTTP
https://www.w3.org/TR/sparql11-overview/
SPARQL 1.1 Graph pattern
https://programminghistorian.org/lessons/graph-databases-and-SPARQL

RDF, RDFS, OWL, SPARQL
▣ OWL - Ontology: what to say (what is correct) (class constructors, inferences, cardinality, …)
▣ RDFS - Vocabulary: expressing shared terms (classes, subclasses, range, domain)
▣ RDF - Statements: how to say things (triples, syntax)
▣ SPARQL - Query / Manipulate: consulting and manipulating the said things

4. How to create a new RDF vocabulary
6 steps for creating an RDF vocabulary
1. Start with a robust domain model
2. Research and reuse existing terms
3. When new terms specialize existing terms, use subclasses and subproperties
4. When new terms are required, create them following commonly agreed best practice
5. Publish in a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services

1. Robust domain model
▣ A vocabulary should enable interoperability within a domain: common understanding
▣ Example: SIOC (online communities)

2. Reuse existing vocabularies - search the available vocabularies
▣ General purpose: Dublin Core (DC)
▣ People: FOAF, vCard
▣ Projects: DOAP
▣ Addresses: vCard
▣ Geography: geo
▣ Entities: DBpedia (dbp)
▣ Sensors: Semantic Sensor Network (ssn)
▣ Social networks: SIOC
▣ Web information (eCommerce, music, products, restaurants, …): schema.org
▣ Taxonomies / thesauri: SKOS
3. Create subclasses / subproperties. 4. New terms (see the sketch below)
▣ Classes: capital letter and singular
□ skos:Concept, sioc:Post
▣ Properties: lower case
□ rdfs:label
▣ Object properties should be verbs
□ sioc:has_reply
▣ Datatype properties should be nouns
□ dc:title
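A minimal Turtle sketch applying these conventions; the ex: namespace and the terms it defines are hypothetical, and foaf:Person is reused rather than redefined:

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.org/vocab#> .

# Class: capital letter, singular; specializes an existing term
ex:Student rdfs:subClassOf foaf:Person .

# Object property: lower case, verb phrase
ex:has_supervisor rdf:type rdf:Property ;
    rdfs:domain ex:Student ;
    rdfs:range  foaf:Person .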

Google's Knowledge Graph
Facebook's Open Graph Protocol
Conclusions
▣ Linked data is a set of design principles for sharing machine-readable data on the Web
▣ Open data is a movement for publishing data on the web
▣ Linked Open Data is Open Data published according to Linked Data principles
▣ The basic technologies of Linked Data are RDF, SPARQL and URIs
▣ RDFS and OWL add semantics to RDF
▣ RDF vocabularies are reusable; some popular ones are SKOS and FOAF

Thanks! Any questions?
You can find me at cif@gsi.dit.upm.es
Teacher
Carlos A. Iglesias
Introduction to
Machine Learning You can contact me at carlosangel.iglesias@upm.es
- Office C-211

References
▣ Web Data Mining, 2nd Ed., Bing Liu, 2011
▣ Machine Learning: The Art and Science of Algorithms that Make Sense of Data, by Peter Flach, 2012

1. Motivation and Goals - What this talk is about
Topics
▣ What is machine learning?
▣ Ingredients of machine learning

Problem: classify houses

Some intuition
▣ We face a classification problem
▣ Since San Francisco is hilly, maybe home elevation data is relevant

Let's look at home elevation data
Source: http://www.r2d3.us/visual-intro-to-machine-learning-part-1/
If elevation > 239.5 ft then San Francisco

Adding nuance
▣ SF can be expensive
▣ We add more 'features' ('variables', 'predictors' or 'characteristics')
▣ Scatterplot: elevation vs price/sqft

Boundaries
▣ Linear separation
▣ Lines
□ elevation > 239.5 → SF
□ price > $1776 → LA
▣ Still one region with 'mixed values' (pairwise scatterplot)
Machine learning
▣ Automate the discovery of patterns
▣ Split point to classify (new view: histogram)

Split point: SF
▣ False negatives: some SF homes are classified as LA ones
▣ False positives: some LA homes are classified as SF ones
▣ Best split
Recursion with more features
▣ Accuracy: 82% → 84% → 96% → 100% as the tree keeps splitting
Training and overfitting

2. What is Machine Learning?

What is Machine Learning?

"Field of study that gives computers the ability to learn without being explicitly programmed"
Arthur Samuel, Science, 1959
Use case: spam detection

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E"
Tom M. Mitchell, 1997

Use case: spam detection → predictive model
Use case: market segmentation in marketing → descriptive model
Use case: association rules → descriptive model
Use case: cross selling → descriptive model
Use case: credit card fraud detection → descriptive & predictive model
Use case: Tesla's autopilot → reinforcement learning

Supervised Learning Workflow → predictive model
Unsupervised Learning Workflow → descriptive model
Reinforcement learning

Deep Learning
Source: https://hackernoon.com/log-analytics-with-deep-learning-and-machine-learning-20a1891ff70e

ML Process
Source: https://machinelearningmastery.com/machine-learning-checklist/

Descriptive, Predictive and Prescriptive models
Descriptive, Predictive, Prescriptive Analytics

Data analytics

3. The Ingredients of Machine Learning - Elements of Machine Learning
Ingredients of Machine Learning
▣ Tasks
▣ Models
▣ Features

Task hierarchy
Source: CommonKADS, Guus Schreiber
ML tasks
▣ Supervised learning
□ Predictive model: classification, regression
□ Descriptive model: subgroup discovery
▣ Unsupervised learning
□ Predictive model: predictive clustering
□ Descriptive model: clustering, association rule discovery, dimensionality reduction

Machine learning tasks and algorithms
▣ SUPERVISED
□ CATEGORICAL output - classification: kNN, SVM, Naive Bayes, decision tree induction
□ CONTINUOUS output - regression: linear, logistic, polynomial; decision trees; random forests
▣ UNSUPERVISED
□ clustering: k-means
□ association rules: Apriori
□ dimensionality reduction: SVD, PCA

Machine learning types
Source: http://www.apress.com/us/book/9781484223338

Tasks
Source: http://smartbasegroup.com/introduccion-al-machine-learning/

Use cases
Source: http://usblogs.pwc.com/emerging-technology/demystifying-machine-learning-part-2-supervised-unsupervised-and-reinforcement-learning/#94555

Supervised ML - Predictive model
Unsupervised ML - Descriptive model
Evaluating task performance
Confusion matrix, precision, recall, F1
▣ Accuracy (exactitud)
▣ Precision (precisión)
▣ Recall (exhaustividad)
▣ F-measure (factor F)

Training, Test and Validation data
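A minimal scikit-learn sketch of these metrics, using made-up toy labels (not data from the slides):

from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score, f1_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # toy ground-truth labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # toy predictions

print(confusion_matrix(y_true, y_pred))  # rows: true class, columns: predicted class
print(accuracy_score(y_true, y_pred))    # accuracy
print(precision_score(y_true, y_pred))   # precision = TP / (TP + FP)
print(recall_score(y_true, y_pred))      # recall = TP / (TP + FN)
print(f1_score(y_true, y_pred))          # harmonic mean of precision and recall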

Evaluation
▣ Training set: seen data used to build (fit) the model
□ determines the parameters
▣ Validation set: data for an (unbiased) evaluation of the model while tuning it
□ determines the hyperparameters
▣ Test/evaluation set: unseen data to measure the model's performance without bias
□ holding parameters and hyperparameters constant

Overfitting and underfitting
Dimensionality
▣ When we add more dimensions,
□ we can often separate the data more easily
□ but if we add too many, overfitting can occur
▣ Overfitting
□ Our model lacks generalization capability
□ it only classifies correctly the data used in training

Dimensionality in practice
▣ Suppose we want to classify photos of cats and dogs
▣ Let's start with one feature (e.g. average red colour)
▣ Very bad separation :(
Source: https://www.visiondummy.com/2014/04/curse-dimensionality-affect-classification/

Second dimension
▣ Second feature: average green colour
▣ Not a great improvement

Third dimension
▣ Third feature: average blue colour: Eureka!
Overfitting
▣ If we add more dimensions, the model learns 'the exceptions' (the list of points)

Generalization
▣ Even though it is not evident, in this example the model with 2 dimensions is better than the one with 3
▣ The 2D model misclassifies some examples, but has a better generalization capability

Another approach
▣ How much data do we need to train a model?
▣ From all the possible photos of cats and dogs, how many do we need?
▣ Suppose we want to train with 20% of the data...

Generalization and training data
▣ Suppose 1 feature is unique for every cat and dog
□ with 1 single feature, we need 20% of the population of data
▣ But… with 2 features we need 45% per dimension (0.45² ≈ 0.2) to cover 20% of the 2D space
▣ and with 3 features, we need 58% (0.58³ ≈ 0.2)...
Generalization and training data
▣ Conclusion
□ If we add more dimensions, we need more data to cover 20% of the possibilities, so that our training data represents the dataset well and we avoid overfitting

Holdout validation
▣ If a lot of (labelled) data is available:
□ independent sampling: one dataset for training, another for testing
▣ Holdout: divide the data into training and testing sets
□ usually 10% - 30% for testing
▣ Stratified holdout: each class is represented in both sets

K-Fold Cross Validation (see the sketch below)
▣ Maximizes the use of data
□ all data is used for both training and testing
▣ Usually, k = 10
▣ Leave-One-Out Cross-Validation
□ k = n (number of samples): 1 sample for testing, n-1 for training
□ Maximizes training data
□ Cannot be stratified (only 1 instance in each test fold)
□ High computational cost
▣ Stratified K-Fold
□ each class is (approximately) represented in each fold
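A minimal scikit-learn sketch of stratified holdout plus stratified 10-fold cross-validation; the iris dataset and the decision tree classifier are placeholders, not part of the slides:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Stratified holdout: 30% for testing, each class represented in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Stratified 10-fold cross-validation on the training part
scores = cross_val_score(DecisionTreeClassifier(),
                         X_train, y_train,
                         cv=StratifiedKFold(n_splits=10))
print(scores.mean())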
Holdout + K-Fold

Ingredients of Machine Learning: Tasks, Models, Features

Models: the output of ML
Models: a function
Model: a process
Model building and hyperparameters

ML tribes
Pedro Domingos, The Master Algorithm
Source: https://kevinbinz.com/2017/08/13/ml-five-tribes/
Types of models
▣ Geometric models
▣ Probabilistic models
▣ Logical models

Geometric models
▣ Hypothesis: there is a geometric separator (e.g. a line or hyperplane) between the instances
▣ Output: a decision (clustering, classifier) based on a geometric descriptor (e.g. distance)
Gaussian Mixture Models (GMM)

Probabilistic models
▣ Hypothesis: there is an underlying unknown probability distribution that generates the output from the input
▣ Output: a probabilistic model

Logical models
▣ Hypothesis: there are rules based on features to classify the instances
▣ Output: decision trees
Ingredients of Machine Learning: Tasks, Models, Features

Features
Good features
Example: classification
Example: Bag of Words (see the sketch below)

Dimensionality reduction
▣ Feature selection
▣ Feature extraction (compression)
□ PCA (Principal Component Analysis)
□ SVD (Singular Value Decomposition)
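The Bag of Words slide was a figure; a minimal scikit-learn sketch on a made-up toy corpus:

from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat",      # toy documents (assumption)
        "the dog sat on the log",
        "the cat and the dog"]

vec = CountVectorizer()
X = vec.fit_transform(docs)            # one row of word counts per document
print(vec.get_feature_names_out())     # the learned vocabulary (the features)
print(X.toarray())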

Conclusions
▣ Machine learning is about using the right features to build the right models that achieve the right tasks
▣ These tasks include: binary and multi-class classification, regression, clustering and descriptive modelling
▣ Supervised learning works with labelled datasets while unsupervised learning works with unlabelled datasets
▣ The output of predictive models involves the target variable, while descriptive models identify interesting structure in the data

Thanks! Any questions?
You can find me at carlosangel.iglesias@upm.es
Credits
Thanks to all who have published free resources:
▣ Minicons by Webalys
▣ Slide template by SlidesCarnival
▣ Photos by Unsplash and Wix
Geometric Models
Teacher: Carlos A. Iglesias
You can contact me at cif@gsi.dit.upm.es - Office C-211

References
▣ Web Data Mining, 2nd Ed., Bing Liu, 2011
▣ Machine Learning: The Art and Science of Algorithms that Make Sense of Data, by Peter Flach, 2012
▣ Machine Learning, Andrew Ng, Stanford University, Coursera
▣ Python Machine Learning, S. Raschka, Packt, 2015

1. Introduction - What this talk is about
Topics
▣ Tour of geometric methods
□ Simple linear classifier
□ Linear regression
□ Perceptron
□ Support Vector Machines (SVM)
▣ Linear regression
▣ Gradient descent
▣ Conclusions

2. Tour of Geometric Models - Main geometric models

Families - tour of algorithms
▣ Linear models: Linear Regression, Logistic Regression, Perceptron, SVM
□ Regression models: Linear Regression, Logistic Regression
□ Classification: Perceptron, Logistic Regression, Support Vector Machines (SVM)
▣ Distance-based models: kNN, k-means
□ Classification: k-Nearest Neighbours (kNN)
□ Clustering: k-Means
3. Tour of Distance-based Models - Main distance-based models

kNN - k Nearest Neighbors (see the sketch below)
▣ Classification decision based on the majority class among the k nearest neighbors
▣ Usually Euclidean distance
▣ Non-linear classifier
▣ Effect of a large k
□ smooths out noise
□ but biases towards the majority class
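A minimal scikit-learn sketch of kNN (the iris dataset is a placeholder; k=5 and Euclidean distance are shown explicitly):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# classify by majority vote among the k=5 nearest neighbours (Euclidean distance)
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))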

K-Means Clustering (see the sketch below)
▣ Place k random centroids
▣ For each point i
□ find the nearest centroid j
□ assign point i to cluster j
▣ For each cluster
□ new centroid = average of the points assigned to the cluster
▣ Repeat until there are no changes
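A minimal NumPy sketch of the algorithm above (it assumes no cluster ever becomes empty):

import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # place k random centroids (here: k random points from the data)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # for each point, find the nearest centroid and assign the point to that cluster
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # for each cluster, the new centroid is the average of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # repeat until there are no changes
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids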
4. Tour of Linear-based Models - Main linear-based models
Perceptron
▣ Linear classifier
▣ Binary classifier
▣ Learning
□ 1943 Warren McCulloch and Walter Pitts
□ 1957 Frank Rosenblatt

Perceptron learning rule (see the sketch below)
▣ Output: h(x) = sign(wᵀx)
▣ Pick a misclassified point (xₙ, yₙ), i.e. sign(wᵀxₙ) ≠ yₙ
▣ Update the weight vector: wₖ₊₁ = wₖ + yₙxₙ

Support Vector Machines (SVM)
▣ Perceptron: learns any separating hyperplane
□ Minimizes the number of misclassified points
▣ SVM: learns the hyperplane with maximum margin
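A minimal NumPy sketch of this learning rule (labels in {-1, +1}; a bias column is assumed to be included in X):

import numpy as np

def perceptron(X, y, n_epochs=10):
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for xn, yn in zip(X, y):
            # pick a misclassified point: sign(w·x) != y
            if np.sign(w @ xn) != yn:
                # update rule: w <- w + y * x
                w = w + yn * xn
    return w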
Support Vector Machines (SVM)
▣ Large margin classifier
□ Data is messy → allow some errors
□ Parameter C: amount of allowed errors
Source: Python Machine Learning

Kernel Trick
▣ Intuition
□ Use linear algorithms for classification in higher dimensions

Example
Kernel trick
▣ We only ever operate on wᵀx
□ wᵀx = <x, w> = ||x|| ||w|| cos θ
▣ Kernel trick: redefine the inner products
□ Go from the original space to a projected (higher-dimensional) space
□ No need to compute the transformations explicitly → only the inner products are changed (matrices)
□ No extra computational cost
▣ Common kernels (see the sketch below)
□ Polynomial
□ Gaussian Radial Basis Function (RBF)
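A minimal scikit-learn sketch of a soft-margin SVM with an RBF kernel on made-up non-linearly-separable data (concentric circles); the C and gamma values are illustrative:

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# toy data that no line can separate in the original 2D space
X, y = make_circles(n_samples=200, factor=0.3, noise=0.1, random_state=0)

# the RBF kernel redefines the inner product (implicit higher-dimensional space);
# C controls how many margin errors are tolerated
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)
print(clf.score(X, y))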

5. Linear Regression and Gradient Descent

Problem: house price prediction

Some intuition
▣ We face a…
□ continuous or discrete data problem?
□ supervised or unsupervised problem?
□ prediction or classification problem?
▣ Scatter plot: it seems we can describe the relationship between surface and price with a line
Source: https://www.pugetsystems.com/labs/hpc/Machine-Learning-and-Data-Science-Linear-Regression-Part-1-954/
Machine learning process
Training set (xᵢ, yᵢ) → Learning Algorithm → Model (h)

Model representation
(the vertical distance between a point and the fitted line is the residual)

Model visualization
Cost function J(a₀, a₁) (see the standard form below)
Cost function visualization
h(a₁) and J(a₁)
In n+1 dimensions: θ₀ and θ₁
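The cost function on these slides was shown as a figure; in the standard notation of Ng's course (cited in the references), for the hypothesis h_a(x) = a_0 + a_1 x it is:

J(a_0, a_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_a(x^{(i)}) - y^{(i)} \right)^2

where m is the number of training examples and h_a(x^{(i)}) - y^{(i)} is the residual for example i.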
Solving Linear Regression
▣ Simple linear regression
□ Using statistics if all the data is available (means, variances, correlations, …)
□ Not useful when the data changes
▣ Ordinary Least Squares
□ Goal: minimize the sum of squared residuals
□ Models the data as a matrix and uses linear algebra
□ Commonly used and very fast
▣ Gradient Descent
□ Optimizes the coefficients iteratively
□ The alpha parameter determines the size of each improvement step
□ Useful with very large datasets

Gradient descent
Source: https://mubaris.com/2017/09/28/linear-regression-from-scratch/

Gradient intuition
Source: http://www.big-data.tips/gradient-descent

Gradient descent - local minimum

Gradient descent - linear regression

Gradient descent algorithm (see the sketch below)
▣ Given the cost function, iterate until the gradient is 0
▣ ∇: the nabla operator (the gradient)
▣ α = step size in the gradient direction
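A minimal NumPy sketch of gradient descent for simple linear regression, using the cost function above (the alpha and iteration count are illustrative):

import numpy as np

def gradient_descent(x, y, alpha=0.01, n_iter=1000):
    a0, a1 = 0.0, 0.0
    m = len(x)
    for _ in range(n_iter):
        residual = a0 + a1 * x - y           # h(x) - y for every example
        grad0 = residual.sum() / m           # dJ/da0
        grad1 = (residual * x).sum() / m     # dJ/da1
        a0 -= alpha * grad0                  # step of size alpha against the gradient
        a1 -= alpha * grad1
    return a0, a1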
Example: gradient descent with a good (slow) alpha
Example: alpha too big - oscillates around the minimum
Convex function - local minimum?

Example: alpha too big - does not converge

Conclusions
▣ There are a large number of ML algorithms
▣ Geometric ML can be classified into distance-based and linear-based methods
▣ Linear algorithms can address non-linear classification with the kernel trick (a higher-dimensional space)
▣ Most ML algorithms rely on some optimization algorithm, such as gradient descent
▣ It is important to understand both parameters and hyperparameters
Thanks! Any questions?

Credits
Thanks to all who have published free resources:
▣ Minicons by Webalys
▣ Slide template by SlidesCarnival
▣ Photos by Unsplash and Wix

You can find me at cif@gsi.dit.upm.es
Tree-based Models
Teacher: Carlos A. Iglesias
You can contact me at carlosangel.iglesias@upm.es - Office C-211

References
▣ Machine Learning: The Art and Science of Algorithms that Make Sense of Data, by Peter Flach, 2012
▣ Induction of Decision Trees, J. R. Quinlan, Machine Learning, 1: 81-106, 1986, Kluwer

Topics
▣ Tour of tree models
□ Decision trees
□ Ensembles
▣ Decision trees
▣ Conclusions
1. Introduction - What this talk is about

Historical perspective
Decision Trees:
▣ J. R. Quinlan, "Induction of Decision Trees", 1979
▣ L. Breiman et al., Classification and Regression Trees, T&F, 1984
▣ J. R. Quinlan, "C4.5: Programs for Machine Learning", MK Publishers, 1993

Random Forest:
▣ T.K. Ho, "Random Decision Forests", Proc. ICDAR, 1995
▣ L. Breiman, "Random Forests", Machine Learning, 45 (1): 5–32, 2001

Practical use: Kinect
▣ Uses Random Forest to predict body pose
▣ Implemented efficiently on the GPU
http://www.i-programmer.info/news/105-artificial-intelligence/2176-kinects-ai-breakthrough-explained.html
Families
▣ Decision trees: ID3, C4.5, C5.0, CART
▣ Ensemble models (decision forests): Random Forest, Extra Trees, GBM (Gradient Boosting Machine)

2. Tour of Tree Models - Main tree models

Tour of algorithms

Family: Decision tree
▣ ID3, C4.5, C5.0 — classification: entropy (information gain, IG); regression: --
▣ CART (binary) — classification: gini impurity, entropy or a generic function; regression: mean squared error (mse), mean absolute error (mae)

Family: Ensemble
▣ Random Forest — classification: gini, entropy; regression: mse, mae
▣ Extra Trees — classification: gini, entropy; regression: mse, mae

Binary Classification tree
Source: http://apprize.info/python/scratch/17.html
Non-binary Classification tree
Source: Python Machine Learning

Regression tree (continuous output)

Multiway and binary splitting
▣ Multiway split: color? → red / green / yellow
▣ Binary splits: color == green? (Yes / No); if No, then color == yellow? (Yes / No)

Decision tree naming
Basic algorithm
▣ Choose the best attribute(s) to split the remaining instances and make that attribute a decision node
▣ Repeat this process recursively for each child
▣ Stop when:
□ All the instances have the same target attribute value
□ There are no more attributes
□ There are no more instances

3. Decision Tree Learning - Fundamentals of Decision Tree Learning

Will Nadal play the match? The induction task
▣ outlook: {sunny, overcast, rain}
▣ temperature: {cool, mild, hot}
▣ humidity: {high, normal}
▣ windy: {true, false}

Induction of Decision Trees, J. R. Quinlan, Machine Learning, 1: 81-106, 1986, Kluwer
Data: 14 examples, (9P, 5N)

Which attribute provides more information?
▣ Outlook: Sunny (2P, 3N), Overcast (4P, 0N), Rain (3P, 2N)
▣ Temperature: Hot (2P, 2N), Mild (4P, 2N), Cool (3P, 1N)
▣ Humidity: High (3P, 4N), Normal (6P, 1N)
▣ Wind: Weak (6P, 2N), Strong (3P, 3N)

A split is "pure" when it yields all of one class and zero of the rest (e.g. Overcast), so there is no need to split further.
Select best attribute: measure of node impurity
▣ We prefer nodes with a homogeneous class distribution
▣ E.g.
□ Wind strong (3P, 3N) → non-homogeneous, high impurity
□ Outlook overcast (4P, 0N) → homogeneous, low impurity
probability (proportion) of the positive class: 0 → all negative (pure); 1.0 → all positive (pure)

Entropy — Outlook: Sunny (2P, 3N), Overcast (4P, 0N), Rain (3P, 2N)
▣ Entropy: level of 'disorder', 'impurity' or 'diversity'
▣ Entropy(S) = E(S) = -∑ pᵢ log₂ pᵢ
□ S = set of examples
□ pᵢ = proportion of examples of class i in S
▣ E(Initial) = -9/14 log₂(9/14) - 5/14 log₂(5/14) = 0.940
▣ E(Sunny) = -2/5 log₂(2/5) - 3/5 log₂(3/5) = 0.971
▣ E(Overcast) = 0
▣ E(Rain) = -3/5 log₂(3/5) - 2/5 log₂(2/5) = 0.971

Information Gain (IG)
▣ IG(S, A): expected reduction in entropy in S because of the split on attribute A
▣ IG(S, A) = E(S) - ∑ᵥ |Sᵥ|/|S| E(Sᵥ), v: values of A
▣ E(S) = 0.940
▣ E(Outlook) = 5/14 E(Sunny) + 4/14 E(Overcast) + 5/14 E(Rain) = 5/14 · 0.971 + 4/14 · 0 + 5/14 · 0.971 = 0.694 bits
▣ IG(S, Outlook) = E(S) - E(Outlook) = 0.246 bits
IG Temperature — Hot (2P, 2N), Mild (4P, 2N), Cool (3P, 1N)
▣ E(S) = 0.940
▣ E(Hot) = 1
▣ E(Mild) = -4/6 log₂(4/6) - 2/6 log₂(2/6) = 0.918
▣ E(Cool) = -3/4 log₂(3/4) - 1/4 log₂(1/4) = 0.811
▣ E(Temperature) = 4/14 · 1 + 6/14 · 0.918 + 4/14 · 0.811 = 0.911
▣ IG(S, Temperature) = 0.940 - 0.911 = 0.029 bits

IG Humidity — High (3P, 4N), Normal (6P, 1N)
▣ E(S) = 0.940
▣ E(High) = -3/7 log₂(3/7) - 4/7 log₂(4/7) = 0.985
▣ E(Normal) = -6/7 log₂(6/7) - 1/7 log₂(1/7) = 0.591
▣ IG(S, Humidity) = 0.940 - 7/14 · 0.985 - 7/14 · 0.591 = 0.151 bits

IG Wind — Weak (6P, 2N), Strong (3P, 3N)
▣ E(S) = 0.940
▣ E(Weak) = -6/8 log₂(6/8) - 2/8 log₂(2/8) = 0.811
▣ E(Strong) = 1
▣ IG(S, Wind) = 0.940 - 8/14 · 0.811 - 6/14 · 1 = 0.048 bits

Select attribute (see the sketch below)
▣ IG(S, Outlook) = 0.246 bits ← highest, chosen as the root
▣ IG(S, Temperature) = 0.029 bits
▣ IG(S, Humidity) = 0.151 bits
▣ IG(S, Wind) = 0.048 bits
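A minimal Python sketch of these two formulas; running entropy on 9 positives and 5 negatives reproduces the 0.940 bits above:

import math
from collections import Counter

def entropy(labels):
    # E(S) = -sum p_i * log2(p_i) over the class proportions in S
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    # IG(S, A) = E(S) - sum_v |S_v|/|S| * E(S_v)
    n = len(labels)
    remainder = 0.0
    for v in {e[attribute] for e in examples}:
        subset = [l for e, l in zip(examples, labels) if e[attribute] == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

print(entropy(["P"] * 9 + ["N"] * 5))   # ≈ 0.940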
Select next attribute - Sunny (I)
Outlook = Sunny (2P, 3N):
▣ temperature | Sunny: hot (0P, 2N), mild (1P, 1N), cold (1P, 0N)
▣ E(Outlook=Sunny) = 0.971; E(hot) = 0; E(mild) = 1; E(cold) = 0
▣ E(temperature | Sunny) = 2/5 · 0 + 2/5 · 1 + 1/5 · 0 = 0.4 bits
▣ IG(temperature) = 0.971 - 0.4 = 0.571 bits

Select next attribute - Sunny (II)
▣ humidity | Sunny: high (0P, 3N), normal (2P, 0N)
▣ windy | Sunny: weak (1P, 2N), strong (1P, 1N)
▣ E(humidity | Sunny) = 0; IG(humidity) = 0.971 - 0 = 0.971 bits
▣ E(weak | Sunny) = -1/3 log₂(1/3) - 2/3 log₂(2/3) = 0.918
▣ E(windy | Sunny) = 3/5 · 0.918 + 2/5 · 1 = 0.951
▣ IG(windy) = 0.971 - 0.951 = 0.020 bits

Select next attribute
▣ Humidity is chosen for the Sunny branch: high (0P, 3N), normal (2P, 0N); Overcast (4P, 0N) is already pure; Rain (3P, 2N) remains to be split

Select next attribute - Rain (I)
▣ temperature | Rain: hot (0P, 0N), mild (2P, 1N), cold (1P, 1N)
▣ E(Outlook=Rain) = 0.971; E(hot) = 0; E(mild) = 0.918; E(cold) = 1
▣ E(temperature | Rain) = 3/5 · 0.918 + 2/5 · 1 = 0.951 bits
▣ IG(temperature) = 0.971 - 0.951 = 0.02 bits
Select next attribute - Rain (II)
▣ humidity | Rain: high (1P, 1N), normal (2P, 1N)
▣ wind | Rain: weak (3P, 0N), strong (0P, 2N)
▣ E(Outlook=Rain) = 0.971
▣ E(Humidity | Rain) = 2/5 · 1 + 3/5 · 0.918 = 0.951 bits; IG(Humidity | Rain) = 0.971 - 0.951 = 0.02 bits
▣ E(Wind | Rain) = 0; IG(Wind | Rain) = 0.971 bits → Wind is chosen

Final decision tree
(9P, 5N) outlook:
▣ Sunny (2P, 3N) → humidity: high (0P, 3N) → N; normal (2P, 0N) → P
▣ Overcast (4P, 0N) → P
▣ Rain (3P, 2N) → wind: weak (3P, 0N) → P; strong (0P, 2N) → N

Final tree as decision rules / as a disjunction of conjunctions (see the rules below)
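Reading the final tree above from root to leaves gives these rules (each path is a conjunction, and the tree is their disjunction):

IF outlook = sunny AND humidity = high THEN N
IF outlook = sunny AND humidity = normal THEN P
IF outlook = overcast THEN P
IF outlook = rain AND wind = weak THEN P
IF outlook = rain AND wind = strong THEN N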
Inductive bias
▣ Does ID3 generalize from the training samples?
▣ ML "bias": a ML algorithm prefers some hypotheses over others
▣ ID3 prefers "short trees" to "long trees" → short hypotheses

"Pluralitas non est ponenda sine necessitate" - "when you have two competing theories that make exactly the same predictions, the simpler one is the better"
Occam's razor, c. 1287–1347

Issues
▣ Overfitting with training data
□ Prepruning: stop growing the tree at some point during top-down construction when there is no longer sufficient data to make reliable decisions
□ Postpruning: grow the full tree, then remove subtrees that do not have sufficient evidence
▣ Handling missing or wrong values

Conclusions
▣ Decision trees (DTs) can be seen as rules and make it easy to understand the outcome of ML
▣ DTs can be used for classification and regression
▣ The main algorithms are CART, ID3 and C4.5
▣ Ensembling approaches such as Random Forest provide a very robust way of combining DTs
Thanks! Any questions?

Credits
Thanks to all who have published free resources:
▣ Minicons by Webalys
▣ Slide template by SlidesCarnival
▣ Photos by Unsplash and Wix

You can find me at carlosangel.iglesias@upm.es
