1 Intro IA - Merged
Intelligent Techniques
Carlos A. Iglesias

Introduction
You can contact me at carlosangel.iglesias@upm.es - Office C-211
1. Goals
▣ Machine learning
▣ Natural Language Processing

Topics
What is AI? - Press view

2. Introduction: What is AI?
What is Artificial Intelligence (AI)?

            HUMAN                            RATIONAL
THOUGHT     Systems that think like humans   Systems that think rationally
BEHAVIOUR   Systems that act like humans     Systems that act rationally

THINKING HUMANLY
"The exciting new effort to make computers think ... machines with minds, in the full and literal sense." (Haugeland, 1985)
"[The automation of] activities that we associate with human thinking, activities such as decision-making, problem solving, learning." (Bellman, 1978)

THINKING RATIONALLY
"The study of mental faculties through the use of computational models." (Charniak and McDermott, 1985)
"The study of the computations that make it possible to perceive, reason, and act." (Winston, 1992)

ACTING HUMANLY
"The art of creating machines that perform functions that require intelligence when performed by people." (Kurzweil, 1990)
"The study of how to make computers do things at which, at the moment, people are better." (Rich and Knight, 1991)

ACTING RATIONALLY
"Computational Intelligence is the study of the design of intelligent agents." (Poole et al., 1998)
"AI ... is concerned with intelligent behavior in artifacts." (Nilsson, 1998)
Acting humanly: The Turing Test

Which capabilities does the Turing test require?
▣ Natural Language Processing
▣ Knowledge representation
▣ Automated reasoning
▣ Machine learning
▣ Computer Vision
▣ Robotics
Philosophy and AI: Strong AI vs Weak AI
Critique: Searle's Chinese room

▣ Strong AI:
□ "True" AI
□ AI matches (or exceeds) human intelligence
□ AI machines have real conscious minds
□ E.g. HAL, Terminator, ...
▣ Weak AI:
□ AI "only" simulates human cognition
□ Narrow AI: constrained to specific problems / domains
Acting rationally: the rational agent approach

Discussion

Foundations of AI
History of AI

Pre-AI: 1911 Torres Quevedo, "El Ajedrecista" (chess automaton)

Birth of AI: 1952-1956
▣ Walter's tortoise
Birth of AI: 1952-1956
▣ 1956: Newell & Simon: Logic Theorist
□ first AI program; used heuristics ("rules of thumb") to prove 38 out of 52 theorems of Principia Mathematica (Whitehead and Russell)

AI: Dartmouth Summer 1956
▣ McCarthy, Minsky, Shannon, Simon, Newell, McCulloch, Nash, Samuel
▣ McCarthy
□ program Advice Taker - Logic with Common Sense
https://en.wikipedia.org/wiki/Dartmouth_workshop
▣ 1957 Newell & Simon - General Problem Solver (GPS): generalization of LT, model of cognition
□ reasoning as search
▣ 1957 Rosenblatt - Perceptron
□ based on McCulloch-Pitts (1943)
□ training mechanism to learn the weights
▣ 1959 Samuel's ML checkers player (USA)
□ coined the term 'machine learning'
▣ 1960 - Quillian - 1st Semantic Network (or Frame)
▣ 1966 - Weizenbaum, ELIZA - 1st Chatbot
Golden era - 1956-1974

The first AI winter 1974–1980
New developments: Logic and Symbolic reasoning
▣ 1972 - Colmerauer and Roussel, success of Prolog (PROgrammation en LOGique)
□ restricts logic (Horn clauses) to be tractable (similar to rules and production rules)
▣ Criticism of logic from psychologists: people do not think with logic
□ McCarthy → machines should not think like humans
▣ Development of Expert Systems & Knowledge Based Systems (KBS)

1969-1979 - Expert Systems: New hopes
▣ Small domains to avoid common sense
▣ 1969 Feigenbaum, Buchanan et al. DENDRAL
□ infer molecular structure from info provided by a mass spectrometer
▣ 1972 Feigenbaum - MYCIN
□ diagnosed infectious blood diseases
□ evolved into E-MYCIN
▣ 1978 McDermott - R1/XCON (eXpert CONfigurer)
□ selects computer system components based on the customer's requirements
□ 2500 rules
□ By 1986, it had processed 80,000 orders and achieved 95-98% accuracy. It was estimated to be saving DEC $25M a year.
Expert System
ES with KBS architecture

Boom 1980–1987
▣ Emergence of KBS
▣ 1988 Deep Thought - beats chess masters
▣ 1982 Knowledge Level - Newell
□ Knowledge engineering
▣ Return of investment
▣ 1986 Revival of connectionism
□ Rumelhart, Backpropagation
AI 1993–2001

Weak definition of agents (Wooldridge, 1994)

Technological singularity
4. Disciplines that contribute to AI

AI Industry
AI Companies
Funding per AI category

9. What we have learnt

Conclusions

References

Credits
Hello!
Carlos A. Iglesias

Linked Data Technologies
Carlos A. Iglesias, Universidad Politécnica de Madrid
You can contact me at cif@gsi.dit.upm.es - Office C-211

Objectives
▣ How to publish linked data
▣ How to define new vocabularies
▣ What are the main technologies of linked data (RDF, SPARQL) and their principles

1. Open Data: What is Open Data and why it is important
How can one govern informed citizens?

"My Administration is committed to creating an unprecedented level of openness in Government. We will work together to ensure the public trust and establish a system of transparency, public participation, and collaboration. Openness will strengthen our democracy and promote efficiency and effectiveness in Government."
https://www.whitehouse.gov/sites/default/files/omb/memoranda/2013/m-13-13.pdf
Source: Creating Open Value through Open Data, European Data Portal, 2015

Neelie Kroes, EU vice-president, 2012
http://europa.eu/rapid/press-release_SPEECH-12-149_en.htm
Data Value Chain

Benefits of open data
▣ Direct market size: 325M€ (EU+26, 2016-2020)
▣ Zillow: valued at more than $1 billion
▣ The Weather Channel: sold in 2008 for $3.5 billion
▣ Garmin: market of $7.24 billion
▣ 329 jobs in 5 years
How can we publish open data?
The 5-star model (http://5stardata.info/en/):
▣ Open licence
▣ Machine readable
▣ Open format
▣ Uniform Resource Identifier (URI)
▣ Linked Data

How can we publish information to be shared?
Usual technologies for sharing data

Traditional approach: Structural integration

2. What is Linked Data and why it is important
From a Web of Documents to a Web of Data

"Linked Data" is used to describe a method of exposing and connecting data on the Web from different sources, following the four principles of Linked Data.
Tim Berners-Lee
https://www.ted.com/talks/tim_berners_lee_on_the_next_web
https://www.w3.org/DesignIssues/LinkedData.html
The Four Linked Data principles (by Tim Berners-Lee, 2006)
1. Use URIs (Uniform Resource Identifiers) as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs, so that they can discover more things.

The 4 principles in practice
▣ UPM: http://es.dbpedia.org/page/Universidad_Politécnica_de_Madrid
▣ Madrid → Spain: http://dbpedia.org/resource/Spain
RDF
▣ W3C specification
▣ Resources are described as triples that form a graph
▣ Graphs can be serialised using different languages: XML, Turtle, JSON-LD
▣ E.g. Turtle (describing Madrid, its populationTotal and country):
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix dbp: <http://dbpedia.org/property/> .
@prefix dbr: <http://dbpedia.org/resource/> .
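The triple model above can also be sketched in plain Python, with a graph as a set of (subject, predicate, object) tuples. The prefixes follow the Turtle example; the population value is a placeholder, not the real DBpedia figure:

```python
# A sketch of the RDF triple model: a graph is a set of
# (subject, predicate, object) tuples.
triples = {
    ("dbr:Madrid", "rdf:type", "dbo:City"),
    ("dbr:Madrid", "dbo:country", "dbr:Spain"),
    ("dbr:Madrid", "dbo:populationTotal", "3000000"),  # placeholder value
}

def objects(graph, subject, predicate):
    """All objects linked to `subject` by `predicate` (a tiny query)."""
    return {o for s, p, o in graph if s == subject and p == predicate}

print(objects(triples, "dbr:Madrid", "dbo:country"))   # {'dbr:Spain'}
```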
The LOD cloud: 2007, 2009, 2011
http://lod-cloud.net/

Linked Geodata cloud

Queries: SPARQL
http://dbpedia.org/sparql

SELECT ?thing
WHERE
{
  ?thing rdf:type yago:EuropeanCountries .
}
ORDER BY DESC(?thing)
LIMIT 25

Tim Berners-Lee
https://www.w3.org/DesignIssues/LinkedData.html
Benefits of LOD
▣ Increase in data quality
□ Use of URIs leads to improved data management and quality
□ Increased (re)use of datasets increases data quality; errors are progressively corrected
▣ Increase in data usability
□ URIs and different formats (XML, CSV, JSON, …)

3. Linked Data Foundations: Main technologies
▣ RDF for describing data
▣ SPARQL for querying linked data
https://www.ietf.org/rfc/rfc2396.txt
Resource Description Framework (RDF)

RDF Example (informal)

Turtle and JSON-LD
▣ a → rdf:type

RDFa
https://www.w3.org/TR/rdfa-lite/

RDF/XML

Blank nodes
ex:Book rdf:type rdfs:Class .
ex:Person rdf:type rdfs:Class .
ex:author rdf:type rdf:Property .
ex:author rdfs:domain ex:Book .
ex:author rdfs:range ex:Person .
Popular RDF-S vocabularies

OWL - Web Ontology Language
https://www.w3.org/TR/2012/REC-owl2-primer-20121211/
https://programminghistorian.org/lessons/graph-databases-and-SPARQL

The stack, bottom to top:
▣ RDF - Statements: how to say things (triples, syntax)
▣ RDFS - Vocabulary: expressing shared terms (classes, subclasses, range, domain)
▣ OWL - Ontology: what to say, what is correct (class constructors, inferences, cardinality, …)
▣ SPARQL - Query / Manipulate: consulting and manipulating said things

4. How to create a new RDF vocabulary
6 steps for creating an RDF vocabulary
1. Robust domain model
3. Create subclasses / subproperties
4. New terms
Conclusions
Teacher
Carlos A. Iglesias

Introduction to Machine Learning
You can contact me at carlosangel.iglesias@upm.es - Office C-211

References

1. Motivation and Goals: What this talk is about

Topics

Problem: classify houses
▣ We face a classification problem
▣ Since San Francisco is hilly, maybe home elevation data is relevant
Source: http://www.r2d3.us/visual-intro-to-machine-learning-part-1/
if elevation > 239.5 ft then San Francisco

Adding nuance
▣ SF can be expensive
▣ We add more 'features', 'variables', 'predictors' or 'characteristics'
▣ Scatterplot of elevation vs price/sqft

▣ Linear separation
▣ Lines
□ elevation > 239.5 → SF
□ price > $1776 → LA
▣ Still one region with 'mixed values'
▣ Pairwise scatterplot
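The hand-built rules above amount to a tiny decision "stump" per feature. A minimal sketch, using the thresholds from the slides (239.5 ft, $1776/sqft); the sample homes are made up, and the fallback label for the mixed region is an arbitrary choice for illustration:

```python
# Two hand-built rules as a classifier: elevation > 239.5 → SF,
# price/sqft > $1776 → LA; everything else falls in the 'mixed' region.

def classify(home):
    """Classify a home as 'SF' or 'LA' with the slides' two rules."""
    if home["elevation_ft"] > 239.5:      # high elevation → San Francisco
        return "SF"
    if home["price_per_sqft"] > 1776:     # very expensive → LA
        return "LA"
    return "SF"                           # mixed region: arbitrary fallback

print(classify({"elevation_ft": 300, "price_per_sqft": 500}))   # SF
print(classify({"elevation_ft": 50, "price_per_sqft": 2000}))   # LA
```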
Machine learning

Split point: SF

Recursion (accuracy: 82%)
Recursion with more features (accuracy: 84%)
Recursion (accuracy: 96%)
Recursion (accuracy: 100%)

Training and overfitting

2. Introduction: What is Machine Learning?
"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." (Mitchell, 1997)

Use case: spam detection (predictive model)
Use case: marketing segmentation (descriptive model)
Use case: association rules (descriptive model)
Use case: cross selling (descriptive model)
Use case: credit card fraud detection
Use case: fraud detection (descriptive & predictive model)
Use case: Tesla's autopilot (reinforcement learning, predictive model)

Unsupervised Learning Workflow (descriptive models)
Reinforcement learning
Source: https://hackernoon.com/log-analytics-with-deep-learning-and-machine-learning-20a1891ff70e
Source: https://machinelearningmastery.com/machine-learning-checklist/

Descriptive, Predictive and Prescriptive models

Descriptive, Predictive, Prescriptive Analytics

Data analytics

3. The ingredients of Machine Learning: Elements of Machine Learning
Ingredients of Machine Learning
▣ Tasks
▣ Models
▣ Features

Tasks
▣ Supervised learning (predictive)
□ CATEGORICAL output: classification, subgroup discovery
□ CONTINUOUS output: regression
▣ Unsupervised learning (descriptive)
□ CATEGORICAL output: clustering, association rule discovery
□ CONTINUOUS output: dimensionality reduction
□ predictive clustering

Models
● classification: kNN, SVM, Naive Bayes, induction of trees
● clustering: k-means
● association rules: Apriori
● regression: linear, logistic, polynomial
● dimensionality reduction: SVD, PCA
● decision trees
● random forests
Source: http://www.apress.com/us/book/9781484223338
Source: http://smartbasegroup.com/introduccion-al-machine-learning/

Use cases

Supervised ML - Predictive model
Source: http://usblogs.pwc.com/emerging-technology/demystifying-machine-learning-part-2-supervised-unsupervised-and-reinforcement-learning/#94555
Training, Test and Validation data

Confusion matrix, precision, recall, F1
▣ Accuracy (exactitud)
▣ Precision (precisión)
▣ Recall (exhaustividad)
▣ F-measure (factor F)
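These metrics can be computed directly from confusion-matrix counts. A minimal sketch; the tp/fp/fn/tn values below are made up for illustration:

```python
# Metrics from confusion-matrix counts: tp/fp/fn/tn are the numbers of
# true positives, false positives, false negatives and true negatives.

def metrics(tp, fp, fn, tn):
    accuracy  = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # of the predicted positives, how many are right
    recall    = tp / (tp + fn)   # of the real positives, how many we found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

acc, p, r, f1 = metrics(tp=8, fp=2, fn=2, tn=88)
print(acc, p, r, round(f1, 2))   # 0.96 0.8 0.8 0.8
```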
Dimensionality

Dimensionality in practice
Source: https://www.visiondummy.com/2014/04/curse-dimensionality-affect-classification/
▣ Second feature: average of green
▣ Third dimension: average of blue: Eureka!
▣ If we add more dimensions, the model learns 'the exceptions' (the list of points)
▣ Even though it is not evident, in this example the model with 2 dimensions is better than the one with 3
▣ The 2-D model misclassifies some examples, but has a better generalization capability
▣ How much data do we need to train a model?
▣ From all the possible photos of cats and dogs, how many do we need?
▣ If we want to train with 20% of the data…
▣ Suppose 1 feature is unique for every cat and dog
□ with 1 single feature, we need 20% of the population of data
▣ But… with 2 features we need 45% along each axis (0.45² ≈ 0.2) to cover 20% of the 2D space
▣ and with 3 features we need 58% (0.58³ ≈ 0.2)…
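The arithmetic above follows from one formula: to cover a fixed fraction (here 20%) of a d-dimensional feature space, the fraction needed along each axis is 0.2^(1/d). A quick check:

```python
# Curse of dimensionality: per-axis coverage needed to span a target
# fraction of a d-dimensional space is target**(1/d).

def fraction_per_axis(target=0.2, dims=1):
    return target ** (1.0 / dims)

for d in (1, 2, 3):
    print(d, round(fraction_per_axis(0.2, d), 2))   # 1 0.2 / 2 0.45 / 3 0.58
```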
Generalization and training data

Holdout validation

Holdout + K-Fold

Ingredients of Machine Learning
▣ Tasks
▣ Models
▣ Features
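The holdout and k-fold schemes above can be sketched with plain Python over example indices (no ML library, index lists only):

```python
# Two validation schemes on indices 0..n-1: a single holdout split,
# and k folds in which each example is tested exactly once.
import random

def holdout_split(n, test_ratio=0.3, seed=0):
    """Shuffle indices 0..n-1 and split them into (train, test) lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_ratio)      # size of the held-out test set
    return idx[n_test:], idx[:n_test]

def k_fold(n, k=5):
    """Yield (train, test) index lists; each index appears in one test fold."""
    idx = list(range(n))
    fold = (n + k - 1) // k             # ceil(n / k) indices per fold
    for i in range(k):
        test = idx[i * fold:(i + 1) * fold]
        train = idx[:i * fold] + idx[(i + 1) * fold:]
        yield train, test

train, test = holdout_split(10)
print(len(train), len(test))            # 7 3
```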
Model: process

Model building and Hyperparameters
ML tribes

Types of models

Geometric models

Probabilistic models
▣ Hypothesis: there is an underlying unknown probability distribution that generates the output from the input
▣ Output: probabilistic model

Rule-based (logical) models
▣ Hypothesis: there are rules based on features to classify the instances
▣ Output: decision trees

Ingredients of Machine Learning
▣ Tasks
▣ Models
▣ Features

Features
Example: Bag of Words

Dimensionality reduction
▣ Feature selection
▣ Feature extraction (compression)
□ PCA (Principal Component Analysis)
□ SVD (Singular Value Decomposition)
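A minimal bag-of-words sketch (stdlib only): each document becomes a vector of word counts over a shared, sorted vocabulary. The two sample documents are made up:

```python
# Bag of Words: ignore word order, keep counts per document.
from collections import Counter

def bag_of_words(docs):
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for doc in tokenized for w in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors

vocab, vecs = bag_of_words(["the cat sat", "the cat saw the dog"])
print(vocab)   # ['cat', 'dog', 'sat', 'saw', 'the']
print(vecs)    # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```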
Conclusions
Teachers
Carlos A. Iglesias

References

Topics
□ Perceptron
□ Support Vector Machines (SVM)
▣ Linear Regression
▣ Gradient Descent
▣ Conclusions

2. Tour of Geometric Models: Main geometric models
kNN - k Nearest Neighbors
▣ Classification decision based on a majority vote of the k nearest neighbors
▣ Usually Euclidean distance
▣ Non-linear classifier
▣ Effect of a large k
□ noise is smoothed out
□ but the majority class dominates

3. Tour of Distance-based Models: Main distance-based models
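The kNN rule above as a minimal stdlib sketch: Euclidean distance plus a majority vote among the k nearest training points. The toy training set is made up:

```python
# kNN: classify a query point by the majority label among its
# k nearest training points (Euclidean distance).
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """train: list of (point, label) pairs; query: point as a tuple."""
    dists = sorted((math.dist(p, query), label) for p, label in train)
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

train = [((0, 0), "A"), ((1, 0), "A"), ((0, 1), "A"),
         ((5, 5), "B"), ((6, 5), "B"), ((5, 6), "B")]
print(knn_predict(train, (1, 1), k=3))    # A
print(knn_predict(train, (5.5, 5.5)))     # B
```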
K-Means Clustering
▣ Place k random centroids
▣ For each point i
□ find the nearest centroid j
□ assign point i to cluster j
▣ For each cluster
□ new centroid = average of the points assigned to the cluster
▣ Repeat until there are no changes

4. Tour of Linear Models: Main linear models
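The k-means loop above, sketched directly in stdlib Python. Initial centroids are taken from the data instead of at random so the example is deterministic; the points are made up:

```python
# k-means: alternate assignment (nearest centroid) and update
# (centroid = mean of assigned points) until nothing changes.
import math

def k_means(points, centroids):
    while True:
        # assignment step: each point goes to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            clusters[j].append(p)
        # update step: new centroid = average of the assigned points
        new = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[j]
               for j, cl in enumerate(clusters)]
        if new == centroids:            # repeat until there are no changes
            return centroids, clusters
        centroids = new

pts = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 8), (8, 9)]
centroids, clusters = k_means(pts, [(0, 0), (9, 9)])
print(centroids)
```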
Perceptron
▣ Linear classifier
▣ Binary classifier
▣ 1943 Warren McCulloch and Walter Pitts
▣ Learning
▣ Output: h(x) = sign(wᵀx)
▣ Pick a misclassified point xₙ
□ sign(wᵀxₙ) != yₙ
▣ Update the weight vector
□ wₖ₊₁ = wₖ + yₙxₙ
(Figure: adding y·x rotates the hyperplane wₖ·x + b = 0 towards wₖ₊₁·x + b = 0 so the point ends on the correct side.)

Perceptron vs SVM
▣ Perceptron: learns any separating hyperplane
□ minimizes misclassified points
▣ SVM: learns the hyperplane with maximum margin

Support Vector Machines (SVM)
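The perceptron learning rule (w ← w + y·x on a misclassified point) as a minimal sketch; a bias is folded in as a constant 1 feature, and the ±1-labelled toy data is made up:

```python
# Perceptron learning: scan for misclassified points and apply
# the update w ← w + y·x until the data is separated.

def sign(z):
    return 1 if z >= 0 else -1

def train_perceptron(data, epochs=100):
    """data: list of (x_vector, label) pairs with label in {-1, +1}."""
    w = [0.0] * (len(data[0][0]) + 1)            # weights plus bias term
    for _ in range(epochs):
        errors = 0
        for x, y in data:
            xb = list(x) + [1.0]                 # append the bias input
            if sign(sum(wi * xi for wi, xi in zip(w, xb))) != y:
                w = [wi + y * xi for wi, xi in zip(w, xb)]   # w ← w + y·x
                errors += 1
        if errors == 0:                          # data linearly separated
            break
    return w

data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((2, 2), 1), ((3, 2), 1)]
w = train_perceptron(data)
print(w)
```

Since the toy data is linearly separable, the loop is guaranteed to stop with every point on the correct side of the learned hyperplane.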
▣ Intuition
□ Use linear algorithms for classification in higher dimensions
▣ Kernel trick
□ No need to compute the transformations → only the products are changed (matrices)
□ No extra computational cost
▣ Common kernels
□ Polynomial
□ Gaussian Radial Basis Function (RBF)

5. Linear Regression and Gradient Descent
▣ We face a…
□ is the data continuous or discrete?
□ is it supervised or unsupervised?
□ is it a prediction or a classification problem?
▣ Scatter plot: it seems we can describe the relationship between surface and price by a line
Source: https://www.pugetsystems.com/labs/hpc/Machine-Learning-and-Data-Science-Linear-Regression-Part-1-954/
Machine learning process
▣ Training set (xᵢ, yᵢ)
▣ Learning Algorithm
▣ Model (h)

Model representation
▣ residual

Cost function J(a0, a1)

Cost function visualization
Solving Linear Regression

Gradient descent
Source: http://www.big-data.tips/gradient-descent

Gradient intuition

Gradient descent - local minimum

Nabla operator
Example: good alpha - slow, steady descent
Example: alpha too big - oscillates around the minimum
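The whole pipeline (model h(x) = a0 + a1·x, cost J(a0, a1), gradient step with learning rate alpha) can be sketched in a few lines. The data points and alpha below are made up, chosen so the method converges:

```python
# Gradient descent for simple linear regression h(x) = a0 + a1*x,
# minimizing the mean squared cost J(a0, a1) = (1/2n) * sum(residual^2).

def gradient_descent(xs, ys, alpha=0.05, steps=2000):
    a0, a1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # residuals: h(x_i) - y_i
        res = [a0 + a1 * x - y for x, y in zip(xs, ys)]
        # partial derivatives of J with respect to a0 and a1
        grad0 = sum(res) / n
        grad1 = sum(r * x for r, x in zip(res, xs)) / n
        a0 -= alpha * grad0          # step against the gradient
        a1 -= alpha * grad1
    return a0, a1

# points on the line y = 1 + 2x, so descent should recover a0 ≈ 1, a1 ≈ 2
a0, a1 = gradient_descent([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
print(round(a0, 2), round(a1, 2))   # 1.0 2.0
```

With a much larger alpha the same loop overshoots and oscillates around the minimum, which is the failure mode the slide illustrates.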
Credits
Teachers
Carlos A. Iglesias

References

Topics

1. Introduction: What this talk is about
Decision Trees
▣ J. R. Quinlan, "Induction of Decision Trees", 1979
▣ L. Breiman et al., Classification and Regression Trees. T&F, 1984.
▣ J. R. Quinlan, "C4.5: Programs for Machine Learning". MK Publishers, 1993.

Random Forest
▣ T.K. Ho, "Random Decision Forests". Proc. ICDAR, 1995.
▣ L. Breiman, "Random Forests". Machine Learning, 45 (1): 5–32, 2001.

Kinect body-pose estimation
▣ Uses Random Forest to predict body pose
▣ Implemented efficiently on the GPU
http://www.i-programmer.info/news/105-artificial-intelligence/2176-kinects-ai-breakthrough-explained.html
Families

DECISION TREES
▣ ID3
▣ C4.5
▣ C5.0
▣ CART

ENSEMBLE MODELS (decision forests)
▣ Random Forest
▣ Extratrees
▣ GBM (Gradient Boosting Machine)

2. Tour of Tree Models: Main tree models
Source: http://apprize.info/python/scratch/17.html
Non-binary classification tree: multiway split on "color?"

Regression tree (continuous output): binary split "color == green?" (Yes / No)

Basic algorithm
Which attribute provides more info? Outlook vs Temperature

Probability (proportion) of the positive class: 0 → all negative (pure); 1.0 → all positive (pure)

Entropy and Information Gain (IG)
▣ Dataset S: (9P, 5N)
▣ Outlook splits S into Sunny (2P, 3N), Overcast (4P, 0N), Rain (3P, 2N)
▣ Wind splits S into Weak (6P, 2N), Strong (3P, 3N)
▣ E(S) = 0.94
▣ E(Weak) = -6/8 * log2(6/8) - 2/8 * log2(2/8) = 0.811
▣ E(Strong) = 1
▣ IG(S, Outlook) = 0.246 bits
▣ IG(S, Temperature) = 0.029 bits
▣ IG(S, Humidity) = 0.151 bits
▣ IG(S, Wind) = 0.048 bits
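The entropy and information-gain arithmetic of these slides, in code. Counts are (positives, negatives) pairs; S has (9P, 5N), Wind splits it into Weak (6P, 2N) and Strong (3P, 3N), and Outlook into (2P, 3N), (4P, 0N), (3P, 2N):

```python
# Entropy of a (p, n) split and information gain of an attribute,
# matching the worked example in the slides.
import math

def entropy(p, n):
    total = p + n
    e = 0.0
    for c in (p, n):
        if c:
            frac = c / total
            e -= frac * math.log2(frac)
    return e

def info_gain(parent, splits):
    """parent: (p, n); splits: one (p, n) pair per attribute value."""
    total = sum(parent)
    remainder = sum((p + n) / total * entropy(p, n) for p, n in splits)
    return entropy(*parent) - remainder

print(round(entropy(9, 5), 2))                                 # 0.94
print(round(entropy(6, 2), 3))                                 # 0.811
print(round(info_gain((9, 5), [(6, 2), (3, 3)]), 3))           # 0.048 = IG(S, Wind)
print(round(info_gain((9, 5), [(2, 3), (4, 0), (3, 2)]), 3))   # 0.247 ≈ IG(S, Outlook)
```

(The exact value for Outlook is ≈ 0.2467 bits; the slides round it down to 0.246.)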
Select next attribute - Sunny (1)
▣ E(Outlook=Sunny) = 0.971
▣ E(humidity | Sunny) = 0
▣ IG(humidity) = 0.971 - 0 = 0.971 bits
▣ E(temperature | Sunny) = 2/5 E(hot) + 2/5 E(mild) + 1/5 E(cold), with E(hot) = 0, E(mild) = 1, E(cold) = 0
  = 2/5 * 0 + 2/5 * 1 + 1/5 * 0 = 0.4 bits
▣ IG(temperature) = 0.971 - 0.4 = 0.571 bits

Select next attribute - Sunny (2)
▣ E(weak | Sunny) = -1/3 * log2(1/3) - 2/3 * log2(2/3) = 0.918
▣ E(windy | Sunny) = 3/5 * 0.918 + 2/5 * 1 = 0.951
▣ IG(windy) = 0.971 - 0.951 = 0.020 bits

Select next attribute - Rain
▣ E(Outlook=Rain) = 0.971
▣ E(Humidity | Rain) = 2/5 * 1 + 3/5 * 0.918 = 0.951 bits
▣ IG(Humidity | Rain) = 0.971 - 0.951 = 0.02 bits
▣ E(Wind | Rain) = 0
▣ IG(Wind | Rain) = 0.971 bits

Resulting tree: Outlook (9P, 5N)
□ Sunny (2P, 3N) → humidity: high (0P, 3N), normal (2P, 0N)
□ Overcast (4P, 0N) → all positive
□ Rain (3P, 2N) → wind: weak (3P, 0N), strong (0P, 2N)
Inductive bias
▣ ML "bias": an ML algorithm prefers some hypotheses over others
▣ ID3 prefers "short trees" to "long trees" → short hypotheses

"… sine necessitate": "when you have two competing theories that make exactly the same predictions, the simpler one is the better"
Occam's razor, c. 1287–1347
Issues
▣ Overfitting on training data
□ Prepruning: stop growing the tree at some point during top-down construction, when there is no longer sufficient data to make reliable decisions
□ Postpruning: grow the full tree, then remove subtrees that do not have sufficient evidence
▣ Handling missing or wrong values

Conclusions
▣ Decision trees (DTs) can be seen as rules and make it easy to understand the outcome of ML
▣ DTs can be used for classification and regression
▣ The main algorithms are CART, ID3 and C4.5
▣ Ensembling approaches such as Random Forest provide a very robust way of combining DTs

Credits