
Course enrollment on Blackboard

Announced on e-com

course_name: Theoretical Foundations of Machine Learning
course_id: 202101.FCI.AI321
enroll_access_code: 367505


Book chapters for additional reading
• You can read selected chapters from these books:
• Chapter 1 of Foundations of Machine Learning (Mohri)
• Chapter 1 of Duda & Hart (PDF available online)
• Next lecture: Chapter 2 of Duda & Hart
• You can also read Koutroumbas, Chapters 1 & 2 (PDF available online)
• The Elements of Statistical Learning, Chapters 1 & 2 (PDF available online)
• Book PDFs are on Blackboard
• Useful blogs
• https://machinelearningmastery.com/
Blackboard View
• https://cu.blackboard.com/ultra/courses/_65_1/cl/outline
Check your knowledge (1/2)
• Supervised vs unsupervised
• Classification vs regression
• Features
• Need to be predictive, in both regression and classification
• Something you can compute from the input (e.g., from a picture alone, a feature such as “the animal drinks milk” is hard to compute)
• Preferably fast to compute
• Can be:
• Numeric: e.g., age
• Categorical: one of several categories, e.g., male vs female
• String: e.g., a word
• Algorithms usually require converting features to numeric values
• Usually represented as vectors of values in d dimensions (see the sketch below)
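
Not from the slides, but a minimal sketch of the last two points: turning a mixed-type sample into a numeric d-dimensional vector. The sample, its feature names, and the encoding choices are all illustrative assumptions.

```python
# Minimal sketch: mixed-type features -> numeric d-dimensional vector.
# The feature names and values here are invented for illustration.
import numpy as np

sample = {"age": 23, "sex": "female", "word": "cat"}

# One-hot encode the categorical feature
sex_categories = ["male", "female"]
sex_onehot = [1.0 if sample["sex"] == c else 0.0 for c in sex_categories]

# A crude numeric stand-in for a string feature (its length); real systems
# would use bag-of-words counts or embeddings instead
word_feature = [float(len(sample["word"]))]

x = np.array([float(sample["age"])] + sex_onehot + word_feature)
print(x)  # [23.  0.  1.  3.] -- a vector in d = 4 dimensions
```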
Check your knowledge (2/2)
• ML as function mapping
• Linear vs nonlinear classifier
• Intra-class vs. inter-class variability/similarity
• Evaluating an algorithm
• Accuracy
• Loss function
• Generalization error vs Training set error
• Simpler classifiers/regressors usually generalize better
• Role of Train/test/validation
• i.i.d. samples
• Examples of non-i.i.d. data: speech, words in text, pixels in an image
• A full sentence could be considered independent of other sentences
• Whole images are i.i.d. as well (unless they come from a video); a split sketch follows
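
A minimal sketch of the train/validation/test split mentioned above, on synthetic i.i.d. data; the dataset and the 60/20/20 split sizes are assumptions.

```python
# Minimal sketch of a train/validation/test split for i.i.d. samples.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                 # 1000 i.i.d. samples, d = 5 features
y = (X[:, 0] + 0.1 * rng.normal(size=1000) > 0).astype(int)

idx = rng.permutation(len(X))                  # shuffling is safe because samples are i.i.d.
train, val, test = np.split(idx, [600, 800])   # 60/20/20 split

# Train on `train`, tune model choices on `val`, and touch `test` only once,
# at the very end, to estimate the generalization error.
```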
Linear Regression as an example
• Life expectancy vs. number of hours spent practicing sports
• Collect data
• Assume some model (say, linear)
• Define a loss function
• Optimize the model parameters
• Use the learned function to predict life expectancy for an unseen case
• Measure performance (mean squared error) on the test set; a sketch of these steps follows
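
A minimal sketch of these steps on made-up data; the numbers and the closed-form least-squares fit are illustrative assumptions, not the course's data.

```python
# Minimal sketch: life expectancy (years) vs. weekly hours of sport.
import numpy as np

rng = np.random.default_rng(1)
hours = rng.uniform(0, 10, size=200)                    # 1. collect data
life = 70 + 1.2 * hours + rng.normal(0, 3, size=200)    #    (synthetic ground truth)

train, test = np.arange(150), np.arange(150, 200)

# 2. assume a linear model y = w*x + b; 3./4. minimize squared loss in closed form
w, b = np.polyfit(hours[train], life[train], deg=1)

pred = w * hours[test] + b                              # 5. predict unseen cases
mse = np.mean((pred - life[test]) ** 2)                 # 6. mean squared error on test set
print(f"w={w:.2f}, b={b:.2f}, test MSE={mse:.2f}")
```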
Keep in mind
• We are still in the introduction
• Much of what we said so far will be detailed later
Any questions so far?
ML in the news
• Photoshop’s AI neural filters can tweak age and expression with a few clicks
ML system overview

(Figure: end-to-end ML pipeline)
Sensor → Preprocessing → Extract features → Split data into train/test/validate → Select relevant features → Transform features → Build predictive model/ensemble → Evaluate model on validation set → Built model
• Inputs along the pipeline: data collection + labelling; training + validation data; evaluation criterion, ground truth, and baseline; domain knowledge feedback
• The feature and model stages each involve technique selection + parameter tuning
• Error analysis is used to improve the classifier/features; the model is accepted or rejected based on the validation result
• Use test data at the very end to test the “generalization” of your ML algorithm (a pipeline sketch follows)
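
Not from the slides, but a minimal sketch of such a pipeline using scikit-learn; the library choice, the feature selector, and the classifier are all assumptions.

```python
# Minimal sketch of the pipeline in the figure (scikit-learn is an assumption).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Split data into train/validate/test (60/20/20)
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)

# Preprocess -> select relevant features -> build predictive model
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=10)),
    ("model", LogisticRegression()),
])
pipe.fit(X_train, y_train)

print("validation accuracy:", pipe.score(X_val, y_val))  # accept/reject, tune, iterate
print("test accuracy:", pipe.score(X_test, y_test))      # only at the very end
```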
Difficulties in Representation

(Figure: photos of many different trees)
A young child can easily say these are all trees
MNIST handwritten digit dataset
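
As a hedged aside, one common way to load MNIST in Python is scikit-learn’s fetch_openml; the slides do not specify a source, so treat this as an assumption.

```python
# Loading MNIST via OpenML (an assumption; requires an internet connection).
from sklearn.datasets import fetch_openml

mnist = fetch_openml("mnist_784", version=1, as_frame=False)
X, y = mnist.data, mnist.target   # X: 70000 x 784 pixel vectors; y: digit labels '0'..'9'
print(X.shape, y[:10])
```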
Good Representation!
• Should have some invariant properties (a histogram sketch follows this list)
• Image case: rotation, translation, scale, skew, deformation, color, …
• Speech case: noise, tone, speech amplitude, …
• Text (spam): single vs plural, capitalization, …
• Account for intra-class variations
• Ability to discriminate pattern classes of interest
• Robustness to noise/occlusion
• Lead to simple decision making (e.g., linear decision boundary)
• Low cost (affordable)
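
To make the invariance idea concrete, here is a minimal sketch of my own (not from the slides): a gray-level histogram of an image is unchanged by wraparound translation or 90° rotation, since those operations only move pixels without changing their values.

```python
# Minimal sketch of an invariant representation: a gray-level histogram.
import numpy as np

rng = np.random.default_rng(2)
img = rng.integers(0, 256, size=(28, 28))

shifted = np.roll(img, shift=5, axis=1)          # translate (with wraparound)
rotated = np.rot90(img)                          # rotate by 90 degrees

def hist(im):
    return np.histogram(im, bins=16, range=(0, 256))[0]

print(np.array_equal(hist(img), hist(shifted)))  # True: translation-invariant
print(np.array_equal(hist(img), hist(rotated)))  # True: rotation-invariant (90 degrees)
```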
So…Basic questions in a classification task
• How are features generated? (feature generation stage)
• What is the best number of features? (feature selection)
• Classifier design (need an optimality criterion to choose a classifier, e.g., linear or not)
• Classification evaluation

Some Key considerations
• Performance
• Error rate (Prob. of misclassification) on independent test samples
• Speed
• Cost
• Robustness
• Reject option (sketched after this list)
• Return on investment
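
A minimal sketch of the reject option: refuse to classify when the model’s confidence is too low. The 0.8 threshold and the helper function are illustrative assumptions.

```python
# Minimal sketch of a classifier with a reject option.
import numpy as np

def classify_with_reject(probs, threshold=0.8):
    """probs: (n_samples, n_classes) predicted class probabilities."""
    best = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    # -1 marks "reject": hand the sample to a human or a more expensive model
    return np.where(best >= threshold, labels, -1)

probs = np.array([[0.95, 0.05],    # confident -> class 0
                  [0.55, 0.45]])   # ambiguous -> reject
print(classify_with_reject(probs)) # [ 0 -1]
```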
Let’s consider one specific example
Fish Classification: Salmon vs Sea Bass

From Duda & Hart


Length Feature

(Figure: length histograms for the two classes, from Duda & Hart)

Lightness Feature

(Figure: lightness histograms for the two classes, from Duda & Hart)
The histogram overlap is smaller than in the case of length
Two-Dimensional Feature Space (Representation)

(Figure: scatter plot of the two features with a separating line, from Duda & Hart)
• Two features together are better than either one separately
• The line shown here is a linear classifier (a code sketch follows)
• The line is also called a decision boundary
• It separates the decision regions
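
Below is a minimal sketch of such a linear classifier on synthetic salmon/sea bass data; the feature values and the choice of logistic regression are assumptions, not Duda & Hart’s actual data or method.

```python
# Minimal sketch: a linear classifier in a two-dimensional feature space.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
salmon = rng.normal([3.0, 10.0], 0.8, size=(100, 2))   # (lightness, width) -- invented
seabass = rng.normal([6.0, 14.0], 0.8, size=(100, 2))

X = np.vstack([salmon, seabass])
y = np.array([0] * 100 + [1] * 100)                    # 0 = salmon, 1 = sea bass

clf = LogisticRegression().fit(X, y)                   # a linear classifier
# The decision boundary w1*x1 + w2*x2 + b = 0 is the line drawn on the slide
w, b = clf.coef_[0], clf.intercept_[0]
print(f"boundary: {w[0]:.2f}*lightness + {w[1]:.2f}*width + {b:.2f} = 0")
print("training accuracy:", clf.score(X, y))
```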
Complex Decision Boundary: Issue of Generalization

(Figure from Duda & Hart)
• The point marked “?” is wrongly classified by the complex decision boundary
Boundary with Good Generalization

(Figure from Duda & Hart)
• A validation set is used to choose this boundary (see the degree-selection sketch below)
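
As a hedged illustration of using a validation set to pick model complexity (a regression analogue, not the fish example), we can compare polynomial degrees by validation error; the data and the degree grid are assumptions.

```python
# Minimal sketch: pick model complexity with a validation set.
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(-3, 3, size=120)
y = np.sin(x) + rng.normal(0, 0.3, size=120)
x_tr, y_tr, x_val, y_val = x[:80], y[:80], x[80:], y[80:]

for degree in [1, 3, 9]:
    coeffs = np.polyfit(x_tr, y_tr, deg=degree)         # fit on training data only
    tr_mse = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    val_mse = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    print(f"degree {degree}: train MSE {tr_mse:.3f}, validation MSE {val_mse:.3f}")
# High degrees drive the training error down but the validation error up:
# that is overfitting; pick the degree with the lowest validation error.
```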


Underfitting vs. Overfitting
Generalization

• The best hypothesis on the sample may not be the best overall
• Generalization is not memorization
• Complex rules (very complex separation surfaces) can be poor predictors
Handling Overfitting

• Simplest approach: use a validation set
• Cross-validation
• Can also penalize complex models with respect to how well they fit the training data (regularization); a sketch of both ideas follows
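
A minimal sketch of both ideas together, using scikit-learn’s Ridge (an L2 penalty) and 5-fold cross-validation; the data and the alpha grid are assumptions.

```python
# Minimal sketch: regularization strength chosen by cross-validation.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 20))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, size=100)

# Larger alpha penalizes complex (large-weight) models more strongly
for alpha in [0.01, 1.0, 100.0]:
    scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=5)   # 5-fold CV
    print(f"alpha={alpha}: mean CV R^2 = {scores.mean():.3f}")
# Choose alpha by cross-validation rather than by training error alone.
```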
