
Artificial Intelligence and Machine Learning

A.A. 2020/2021
Tatiana Tommasi
Naïve Bayes Classifier

Slide Credit: Barnabás Póczos & Alex Smola
Deep Learning

How good are our predictions?
The avocado problem:

Example:
● Imagine buying your first avocado
● How do you tell whether it is ripe?
● Gather data: buy avocados and open them
● Features:
○ color: from dark green to dark brown (image: https://www.istockphoto.com/)
○ softness: from rock hard to mushy
● What is your prediction strategy on new avocados?
● Is perfect prediction of avocados possible?

slide credit: Francesco Orabona
Training Data → Build a Model → Make predictions on a new sample
...why do we minimize the training error when we care about the test error?
...is it always possible to get 100% accuracy?
...what are the important parameters characterizing a learning problem?

2020/21
slide credit: Francesco Orabona 6
Notation

Loss

True Risk

Bayes Classifier

Bayes Risk

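The slides' figures and formulas are not reproduced in this extraction; as a reference point, the standard binary-classification form of the Bayes classifier and the Bayes risk (assuming labels in {0, 1} and writing η(x) = P(y = 1 | x); this notation is a common convention, not necessarily the slides' own) is:

```latex
\eta(x) = P(y = 1 \mid x), \qquad
f^\star(x) =
\begin{cases}
1 & \text{if } \eta(x) \ge 1/2 \\
0 & \text{otherwise}
\end{cases}
\qquad
R^\star = R(f^\star) = \mathbb{E}_x\!\left[\min\{\eta(x),\, 1 - \eta(x)\}\right]
```

No classifier can achieve risk below R*, which is why it serves as the baseline for everything that follows.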
Batch Learning

Batch Learning - IID condition

Empirical Risk

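To make the empirical risk concrete, here is a minimal sketch (an illustration of the general idea, not code from the slides): the empirical risk of a hypothesis is simply its average loss over the training sample.

```python
def empirical_risk(predictor, samples, loss):
    """Average loss of `predictor` over a list of (x, y) pairs."""
    return sum(loss(predictor(x), y) for x, y in samples) / len(samples)

def zero_one(y_pred, y):
    """0-1 loss: 1 on a mistake, 0 otherwise."""
    return 0 if y_pred == y else 1

# Toy sample: the true label is 1 when x >= 0.5.
S = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
h = lambda x: 1 if x >= 0.7 else 0  # a hypothesis that misclassifies x = 0.6
print(empirical_risk(h, S, zero_one))  # 0.25
```

Minimizing this quantity over a hypothesis class is exactly empirical risk minimization (ERM).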
Can Only Be Probably Correct

Can Only Be Approximately Correct

Probably Approximately Correct (PAC) Learning

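For reference, the standard formal statement of PAC learnability (a sketch in common textbook notation, assumed rather than taken from the slides): a class H is PAC learnable if there exist a sample-complexity function m_H(ε, δ) and a learning algorithm A such that, for every accuracy ε, confidence δ, and data distribution D,

```latex
m \ge m_{\mathcal{H}}(\varepsilon, \delta)
\;\Longrightarrow\;
\Pr_{S \sim \mathcal{D}^m}\!\big[\, L_{\mathcal{D}}(A(S)) \le \varepsilon \,\big] \ge 1 - \delta
```

where L_D denotes the true risk. "Approximately" corresponds to ε, "probably" to δ.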
Let’s Start Easy: Realizability Assumption

Learning in the Realizable Setting

Analysis of a Consistent Classifier (1)

Analysis of a Consistent Classifier (2)

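The analysis referred to above, in its standard form for a finite hypothesis class H in the realizable setting (a sketch of the usual union-bound argument; the slides' own derivation is not reproduced in this extraction): the probability that some hypothesis is consistent with the sample yet has true risk above ε is at most

```latex
\Pr\!\big[\exists h \in \mathcal{H}:\ \hat{L}_S(h) = 0 \ \wedge\ L_{\mathcal{D}}(h) > \varepsilon\big]
\;\le\; |\mathcal{H}|\,(1 - \varepsilon)^m \;\le\; |\mathcal{H}|\, e^{-\varepsilon m}
```

so setting the right-hand side to δ shows that m ≥ (1/ε) ln(|H|/δ) samples suffice for any consistent classifier to be ε-good with probability at least 1 − δ.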
PAC Learning

What is Learnable and How to Learn?

Waiving the Realizability Assumption

What is Learnable and How to Learn?

Infinite Hypothesis Classes?

Shattering & VC-dimension

Example

[Figure: a single point c1 can be given either label (+ or -), so {c1} is shattered; for a two-point set {c1, c2}, the labeling c1 ➡ -, c2 ➡ + cannot be realized.]

This two-point set is not shattered, hence VCdim = 1.

Example
Consider the set of linear classifiers in two dimensions.

Case 1: one point inside the triangle formed by the others. We cannot label the inside point as positive and the outside points as negative.

Case 2: all four points on the boundary (convex hull). We cannot label two diagonally opposite points as positive and the other two as negative.

In both cases the four-point set is not shattered, so VCdim = 3.
In d dimensions, linear classifiers have VCdim = d + 1.
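The two cases above can be checked numerically. The sketch below (my own illustration with hypothetical helper names, not code from the slides) enumerates the labelings realizable by a 2D linear classifier sign(w·x + b) by sweeping directions w over a fine angular grid: for a fixed direction, exactly the "top-k points positive" splits of the projection ordering are realizable with some bias b. Three points in general position admit all 2³ = 8 labelings (shattered); four points in convex position admit only 14 of 16, missing the two diagonal splits.

```python
import math

def halfplane_dichotomies(points, n_dirs=3600):
    """Enumerate labelings of `points` realizable by sign(w.x + b) in 2D,
    by sweeping directions w on an angular grid. For each direction, every
    strict split of the projection-sorted points into 'top k positive /
    rest negative' is realizable by choosing b between the two groups."""
    n = len(points)
    achievable = set()
    for s in range(n_dirs):
        t = 2 * math.pi * s / n_dirs
        wx, wy = math.cos(t), math.sin(t)
        proj = [wx * x + wy * y for x, y in points]
        order = sorted(range(n), key=lambda i: proj[i])
        for k in range(n + 1):
            # require strict separation at the split (ties are not realizable here)
            if k in (0, n) or proj[order[n - k - 1]] < proj[order[n - k]]:
                achievable.add(frozenset(order[n - k:]))  # top-k = positive set
    return achievable

three = [(0, 0), (2, 1), (1, 3)]              # general position: shattered
four = [(0, 0), (4, 1), (5, 4), (1, 3)]       # convex position: not shattered
print(len(halfplane_dichotomies(three)))      # 8  = 2^3
print(len(halfplane_dichotomies(four)))       # 14 < 2^4 (diagonal splits missing)
```

This matches the known count of halfplane dichotomies of n points in general position, 2·(C(n−1,0) + C(n−1,1) + C(n−1,2)): 8 for n = 3 and 14 for n = 4.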
Not PAC Learnable

Infinite Hypothesis Classes?

Things can go wrong

Example: regression using a polynomial curve.

Figures from Pattern Recognition and Machine Learning, Bishop, and http://www-inst.eecs.berkeley.edu/~cs188/fa19/
Things can go wrong

General phenomenon:

Figure from Deep Learning, Goodfellow, Bengio and Courville
Overfitting

Cross Validation

Dataset: split into Training and Testing.
If data permits, split the Training part further into Training and Validation.
k-fold Cross Validation

Dataset: split into Training and Testing; cross validation is applied within the training part.

Figure from Pattern Recognition and Machine Learning, Bishop
If k = |S|: Leave-One-Out Cross Validation.
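The fold construction can be sketched in a few lines (a minimal illustration with a hypothetical function name, not the slides' code): partition the n sample indices into k contiguous validation folds, training on the rest each time.

```python
def k_fold_splits(n, k):
    """Yield (train_indices, val_indices) pairs for k-fold cross validation
    over a dataset of size n. Each sample appears in exactly one validation
    fold; the first n % k folds get one extra sample when k does not divide n."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size

for train, val in k_fold_splits(6, 3):
    print(val)  # [0, 1] then [2, 3] then [4, 5]
```

With k = n, every validation fold is a single sample, which is exactly leave-one-out cross validation.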
Train - Validation - Test

Model Selection In Summary

Learning Solution to Overfitting

Figure Credit: Nati Srebro

Control complexity by penalizing complex models in learning: regularization.
Regularization

Regularization

Empirical Risk(h) + λ Complexity(h)

● λ is a parameter, a positive number that serves as a conversion rate between the loss and the hypothesis complexity (they might not have the same scale)
● the form of the complexity/regularization function depends on the hypothesis space
● we still need cross validation, and it must now include the parameter λ: select the value that gives the best validation score

Linear Regression

Flashback: Loss Function

Empirical Loss

Linear Predictors in 1d

Linear Fitting to Data

Linear Functions

Least Squares Criterion

Least Squares in Matrix / Vector Form

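The matrix/vector formulation leads to the standard normal equations (a reference statement in common notation, with X the design matrix and y the target vector; the slides' own symbols may differ):

```latex
\hat{w} = \arg\min_{w} \|Xw - y\|_2^2
\quad\Longrightarrow\quad
X^\top X \hat{w} = X^\top y
\quad\Longrightarrow\quad
\hat{w} = (X^\top X)^{-1} X^\top y \quad \text{(when } X^\top X \text{ is invertible)}
```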
Least Squares via Calculus

Slide Credit: E. Rodolà


Linear Predictor in 1d

forget about the bias term b for one second…

Slide Credit: William Cohen
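In the bias-free 1d case, the least squares minimizer has a one-line closed form: for predictions ŷ = w·x, minimizing Σᵢ (w·xᵢ − yᵢ)² gives w = (Σᵢ xᵢ yᵢ) / (Σᵢ xᵢ²). A minimal sketch (function name my own):

```python
def fit_1d_no_bias(xs, ys):
    """Least squares for y ~ w * x (no bias term): the minimizer of
    sum_i (w * x_i - y_i)^2 is w = sum(x_i * y_i) / sum(x_i ** 2)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # data exactly on the line y = 2x
print(fit_1d_no_bias(xs, ys))  # 2.0
```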
add the bias term

Add regularization (Ridge Regression)

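Ridge regression adds the λ‖w‖² penalty to least squares, which keeps the closed form: w = (XᵀX + λI)⁻¹ Xᵀy. A sketch with the bias handled via an appended ones column (an illustration with hypothetical names; for simplicity the bias is regularized together with the weights, whereas in practice it is often left unpenalized):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge regression in closed form: w = (X^T X + lam * I)^{-1} X^T y.
    A column of ones is appended so the last coefficient is the bias b."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    d = Xb.shape[1]
    return np.linalg.solve(Xb.T @ Xb + lam * np.eye(d), Xb.T @ y)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])     # data exactly on y = 2x + 1
w = ridge_fit(X, y, lam=1e-8)          # tiny lam: essentially plain least squares
print(w)                               # approximately [2.0, 1.0]
```

Note the connection to the regularization slide: λ is the conversion rate between fit and complexity, and its value is chosen by validation.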
Beyond Linear Models: Polynomial Regression

Fitting Polynomials

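Fitting a polynomial is just linear regression on the expanded features [1, x, x², …, x^d], so the machinery above applies unchanged. A sketch using a Vandermonde design matrix (function name my own):

```python
import numpy as np

def fit_polynomial(xs, ys, degree):
    """Fit a degree-`degree` polynomial by least squares on the
    Vandermonde features [1, x, x^2, ..., x^degree]."""
    V = np.vander(xs, degree + 1, increasing=True)
    coeffs, *_ = np.linalg.lstsq(V, ys, rcond=None)
    return coeffs  # coeffs[k] multiplies x**k

xs = np.array([-1.0, 0.0, 1.0, 2.0])
ys = xs ** 2                          # samples from y = x^2, no noise
c = fit_polynomial(xs, ys, degree=2)
print(np.round(c, 6))                 # approximately [0, 0, 1]
```

Raising the degree far beyond what the data supports reproduces the overfitting phenomenon from the "Things can go wrong" slides: training error keeps dropping while test error grows.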
Regression Summary

Final Summary

● Learning can only be Probably Approximately Correct (PAC)
○ true risk vs empirical risk
○ loss
○ empirical risk minimization can lead to learning algorithms with reasonable generalization guarantees (under some conditions)
○ when a task is not PAC learnable

● Underfitting / Overfitting
○ cross-validation

● Linear Regression
○ without and with regularization

