
Course enrollment on Blackboard

Announced on e-com

course_name: Theoretical Foundations of Machine Learning
course_id: 202101.FCI.AI321
enroll_access_code: 367505


Book chapters for additional reading
• You can read selected chapters from these books:
• Chapter 1 of Foundations of Machine Learning (Mohri)
• Chapter 1 of Duda & Hart (PDF available online)
• Next lecture: Chapter 2 of Duda & Hart
• You can also read Koutroumbas, Chapters 1 & 2 (PDF available online)
• The Elements of Statistical Learning, Chapters 1 & 2 (PDF available online)
• Book PDFs are on Blackboard
• Useful blogs
• https://machinelearningmastery.com/
Blackboard View
• https://cu.blackboard.com/ultra/courses/_65_1/cl/outline
Check your knowledge (1/2)
• Supervised vs unsupervised
• Classification vs regression
• Features
• Need to be predictive, in both regression and classification
• Something you can compute from the input (e.g., from a picture alone, a feature such as “the animal drinks milk” is hard to compute)
• Preferably fast to compute
• Can be:
• Numeric: e.g., age
• Categorical: one of several categories, e.g., male vs female
• String: e.g., a word
• Algorithms usually require converting features to numeric values
• Usually represented as vectors of values in d dimensions (see the sketch below)
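
Not from the slides, but a minimal sketch of the last two points: turning a mixed-type sample into a numeric d-dimensional vector. The sample, its feature names, and the encoding choices are all illustrative assumptions.

```python
# Minimal sketch: mixed-type features -> numeric d-dimensional vector.
# The feature names and values here are invented for illustration.
import numpy as np

sample = {"age": 23, "sex": "female", "word": "cat"}

# One-hot encode the categorical feature
sex_categories = ["male", "female"]
sex_onehot = [1.0 if sample["sex"] == c else 0.0 for c in sex_categories]

# A crude numeric stand-in for a string feature (its length); real systems
# would use bag-of-words counts or embeddings instead
word_feature = [float(len(sample["word"]))]

x = np.array([float(sample["age"])] + sex_onehot + word_feature)
print(x)  # [23.  0.  1.  3.] -- a vector in d = 4 dimensions
```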
Check your knowledge (2/2)
• ML as function mapping
• Linear vs nonlinear classifier
• Intra-class vs. inter-class variability/similarity
• Evaluating an algorithm
• Accuracy
• Loss function
• Generalization error vs Training set error
• Simpler classifiers/regressors usually generalize better
• Role of Train/test/validation
• i.i.d. samples
• Examples of non-i.i.d. data: speech, words in text, pixels in an image
• A full sentence could be considered independent of other sentences
• Whole images are i.i.d. as well (unless they come from a video); a split sketch follows
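
A minimal sketch of the train/validation/test split mentioned above, on synthetic i.i.d. data; the dataset and the 60/20/20 split sizes are assumptions.

```python
# Minimal sketch of a train/validation/test split for i.i.d. samples.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                 # 1000 i.i.d. samples, d = 5 features
y = (X[:, 0] + 0.1 * rng.normal(size=1000) > 0).astype(int)

idx = rng.permutation(len(X))                  # shuffling is safe because samples are i.i.d.
train, val, test = np.split(idx, [600, 800])   # 60/20/20 split

# Train on `train`, tune model choices on `val`, and touch `test` only once,
# at the very end, to estimate the generalization error.
```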
Linear Regression as an example
• Life expectancy vs. number of hours spent practicing sports
• Collect data
• Assume some model (say, linear)
• Define a loss function
• Optimize the model parameters
• Use the learned function to predict life expectancy for an unseen case
• Measure performance (mean squared error) on the test set; a sketch of these steps follows
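
A minimal sketch of these steps on made-up data; the numbers and the closed-form least-squares fit are illustrative assumptions, not the course's data.

```python
# Minimal sketch: life expectancy (years) vs. weekly hours of sport.
import numpy as np

rng = np.random.default_rng(1)
hours = rng.uniform(0, 10, size=200)                    # 1. collect data
life = 70 + 1.2 * hours + rng.normal(0, 3, size=200)    #    (synthetic ground truth)

train, test = np.arange(150), np.arange(150, 200)

# 2. assume a linear model y = w*x + b; 3./4. minimize squared loss in closed form
w, b = np.polyfit(hours[train], life[train], deg=1)

pred = w * hours[test] + b                              # 5. predict unseen cases
mse = np.mean((pred - life[test]) ** 2)                 # 6. mean squared error on test set
print(f"w={w:.2f}, b={b:.2f}, test MSE={mse:.2f}")
```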
Keep in mind
• We are still in the introduction
• Much of what we said so far will be detailed later
Any questions so far?
ML in the news
• Photoshop’s AI neural filters can tweak age and expression with a few clicks
ML system overview

(Figure: end-to-end ML pipeline)
Sensor → Preprocessing → Extract features → Split data into train/test/validate → Select relevant features → Transform features → Build predictive model/ensemble → Evaluate model on validation set → Built model
• Inputs along the pipeline: data collection + labelling; training + validation data; evaluation criterion, ground truth, and baseline; domain knowledge feedback
• The feature and model stages each involve technique selection + parameter tuning
• Error analysis is used to improve the classifier/features; the model is accepted or rejected based on the validation result
• Use test data at the very end to test the “generalization” of your ML algorithm (a pipeline sketch follows)
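
Not from the slides, but a minimal sketch of such a pipeline using scikit-learn; the library choice, the feature selector, and the classifier are all assumptions.

```python
# Minimal sketch of the pipeline in the figure (scikit-learn is an assumption).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Split data into train/validate/test (60/20/20)
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)

# Preprocess -> select relevant features -> build predictive model
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=10)),
    ("model", LogisticRegression()),
])
pipe.fit(X_train, y_train)

print("validation accuracy:", pipe.score(X_val, y_val))  # accept/reject, tune, iterate
print("test accuracy:", pipe.score(X_test, y_test))      # only at the very end
```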
Difficulties in Representation

(Figure: photos of many different trees)
A young child can easily say these are all trees
MNIST handwritten digit dataset
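
As a hedged aside, one common way to load MNIST in Python is scikit-learn’s fetch_openml; the slides do not specify a source, so treat this as an assumption.

```python
# Loading MNIST via OpenML (an assumption; requires an internet connection).
from sklearn.datasets import fetch_openml

mnist = fetch_openml("mnist_784", version=1, as_frame=False)
X, y = mnist.data, mnist.target   # X: 70000 x 784 pixel vectors; y: digit labels '0'..'9'
print(X.shape, y[:10])
```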
Good Representation!
• Should have some invariant properties (a histogram sketch follows this list)
• Image case: rotation, translation, scale, skew, deformation, color, …
• Speech case: noise, tone, speech amplitude, …
• Text (spam): single vs plural, capitalization, …
• Account for intra-class variations
• Ability to discriminate pattern classes of interest
• Robustness to noise/occlusion
• Lead to simple decision making (e.g., linear decision boundary)
• Low cost (affordable)
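
To make the invariance idea concrete, here is a minimal sketch of my own (not from the slides): a gray-level histogram of an image is unchanged by wraparound translation or 90° rotation, since those operations only move pixels without changing their values.

```python
# Minimal sketch of an invariant representation: a gray-level histogram.
import numpy as np

rng = np.random.default_rng(2)
img = rng.integers(0, 256, size=(28, 28))

shifted = np.roll(img, shift=5, axis=1)          # translate (with wraparound)
rotated = np.rot90(img)                          # rotate by 90 degrees

def hist(im):
    return np.histogram(im, bins=16, range=(0, 256))[0]

print(np.array_equal(hist(img), hist(shifted)))  # True: translation-invariant
print(np.array_equal(hist(img), hist(rotated)))  # True: rotation-invariant (90 degrees)
```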
So…Basic questions in a classification task
• How are features generated? (feature generation stage)
• What is the best number of features? (feature selection)
• Classifier design (need an optimality criterion to choose a classifier, e.g., linear or not)
• Classification evaluation

Some Key considerations
• Performance
• Error rate (Prob. of misclassification) on independent test samples
• Speed
• Cost
• Robustness
• Reject option (sketched after this list)
• Return on investment
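
A minimal sketch of the reject option: refuse to classify when the model’s confidence is too low. The 0.8 threshold and the helper function are illustrative assumptions.

```python
# Minimal sketch of a classifier with a reject option.
import numpy as np

def classify_with_reject(probs, threshold=0.8):
    """probs: (n_samples, n_classes) predicted class probabilities."""
    best = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    # -1 marks "reject": hand the sample to a human or a more expensive model
    return np.where(best >= threshold, labels, -1)

probs = np.array([[0.95, 0.05],    # confident -> class 0
                  [0.55, 0.45]])   # ambiguous -> reject
print(classify_with_reject(probs)) # [ 0 -1]
```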
Let’s consider one specific example
Fish Classification: Salmon vs Sea Bass

From Duda & Hart


Length Feature

(Figure: length histograms for the two classes, from Duda & Hart)

Lightness Feature

(Figure: lightness histograms for the two classes, from Duda & Hart)
The histogram overlap is smaller than in the case of length
Two-Dimensional Feature Space (Representation)

(Figure: scatter plot of the two features with a separating line, from Duda & Hart)
• Two features together are better than either one separately
• The line shown here is a linear classifier (a code sketch follows)
• The line is also called a decision boundary
• It separates the decision regions
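
Below is a minimal sketch of such a linear classifier on synthetic salmon/sea bass data; the feature values and the choice of logistic regression are assumptions, not Duda & Hart’s actual data or method.

```python
# Minimal sketch: a linear classifier in a two-dimensional feature space.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
salmon = rng.normal([3.0, 10.0], 0.8, size=(100, 2))   # (lightness, width) -- invented
seabass = rng.normal([6.0, 14.0], 0.8, size=(100, 2))

X = np.vstack([salmon, seabass])
y = np.array([0] * 100 + [1] * 100)                    # 0 = salmon, 1 = sea bass

clf = LogisticRegression().fit(X, y)                   # a linear classifier
# The decision boundary w1*x1 + w2*x2 + b = 0 is the line drawn on the slide
w, b = clf.coef_[0], clf.intercept_[0]
print(f"boundary: {w[0]:.2f}*lightness + {w[1]:.2f}*width + {b:.2f} = 0")
print("training accuracy:", clf.score(X, y))
```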
Complex Decision Boundary: Issue of Generalization

(Figure from Duda & Hart)
• The point marked “?” is wrongly classified by the complex decision boundary
Boundary with Good Generalization

(Figure from Duda & Hart)
• A validation set is used to choose this boundary (see the degree-selection sketch below)
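
As a hedged illustration of using a validation set to pick model complexity (a regression analogue, not the fish example), we can compare polynomial degrees by validation error; the data and the degree grid are assumptions.

```python
# Minimal sketch: pick model complexity with a validation set.
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(-3, 3, size=120)
y = np.sin(x) + rng.normal(0, 0.3, size=120)
x_tr, y_tr, x_val, y_val = x[:80], y[:80], x[80:], y[80:]

for degree in [1, 3, 9]:
    coeffs = np.polyfit(x_tr, y_tr, deg=degree)         # fit on training data only
    tr_mse = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    val_mse = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    print(f"degree {degree}: train MSE {tr_mse:.3f}, validation MSE {val_mse:.3f}")
# High degrees drive the training error down but the validation error up:
# that is overfitting; pick the degree with the lowest validation error.
```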


Underfitting vs. Overfitting
Generalization

• The best hypothesis on the sample may not be the best overall
• Generalization is not memorization
• Complex rules (very complex separation surfaces) can be poor predictors
Handling Overfitting

• Simplest approach: use a validation set
• Cross-validation
• Can also penalize complex models with respect to how well they fit the training data (regularization); a sketch of both ideas follows
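
A minimal sketch of both ideas together, using scikit-learn’s Ridge (an L2 penalty) and 5-fold cross-validation; the data and the alpha grid are assumptions.

```python
# Minimal sketch: regularization strength chosen by cross-validation.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 20))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, size=100)

# Larger alpha penalizes complex (large-weight) models more strongly
for alpha in [0.01, 1.0, 100.0]:
    scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=5)   # 5-fold CV
    print(f"alpha={alpha}: mean CV R^2 = {scores.mean():.3f}")
# Choose alpha by cross-validation rather than by training error alone.
```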
