03 Supervised Classification
Classification
[Figure: what the computer sees — a grid of pixel intensity values]
Challenges
Attempts have been made
Output
Supervised vs. Unsupervised Learning
• Supervised learning:
• Train the machine using data that is well "labeled"
• The machine then predicts unforeseen data in the same domain
• Labeling is not easy
• Challenging with huge data
• Unsupervised learning:
• Works with unlabeled data
• Clusters data patterns
• Finds all kinds of unknown patterns in the data
• Unlabeled data is easier to get than labeled data
• Better in the case of huge data
Supervised vs. Unsupervised Learning
• Are supervised and unsupervised learning data-driven approaches?
Supervised Learning
• Classification:
• Image classification
• Segmentation
• Object detection
• Regression
Unsupervised Learning
• Clustering
• Association
• Semi-supervised Learning
Training and Testing in Supervised Learning
[Figure: training phase (Input → learned model) and testing phase (Input → Prediction)]
Supervised Classification
Binary Classification
• Input: the image's pixel intensities stacked into a column vector
x = [255, 231, …, 255, 134, …, 142]ᵀ
• Output: a label y ∈ {0, 1}
Binary Classification
• m training examples: (x⁽¹⁾, y⁽¹⁾), …, (x⁽ᵐ⁾, y⁽ᵐ⁾)
• The data is split into m_train training and m_test test examples
• Inputs stacked as columns: X ∈ ℝ^(n_x × m)
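The stacked-columns convention above can be sketched in NumPy. The image sizes and labels here are made up purely for illustration:

```python
import numpy as np

# Hypothetical sketch: flatten m tiny images and stack them as the
# columns of X, giving the slide's shape X ∈ R^(n_x × m).
m, h, w, c = 5, 4, 4, 3                 # five made-up 4x4 RGB images
images = np.random.rand(m, h, w, c)
X = images.reshape(m, -1).T             # shape (n_x, m), n_x = h*w*c = 48
y = np.array([[0, 1, 1, 0, 1]])         # binary labels, shape (1, m)
print(X.shape)                          # (48, 5)
```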
Multi-class Classifier
• Data-driven approach:
• Collect a dataset of images and labels
• Use Machine Learning to train a classifier
• Evaluate the classifier on new images
[Figure: Nearest Neighbor decision regions for classes 1, 2, 3, 4, …]
First classifier: Nearest Neighbor
Example Dataset: CIFAR10
• 10 classes
• 50,000 training images
• 10,000 testing images
[Figure: test images and their nearest neighbors in the training set]
Distance Metric to compare images
• L1 distance: d₁(I₁, I₂) = Σ_p |I₁ᵖ − I₂ᵖ| (sum of absolute pixel-wise differences)
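As a quick illustration of the L1 distance, with two made-up 2×2 "images":

```python
import numpy as np

# L1 (Manhattan) distance between two images: sum over all pixels of
# the absolute difference. Pixel values here are invented for the demo.
I1 = np.array([[56, 32], [10, 8]], dtype=np.int32)
I2 = np.array([[10, 20], [24, 8]], dtype=np.int32)
d1 = np.abs(I1 - I2).sum()
print(d1)  # 46 + 12 + 14 + 0 = 72
```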
Nearest Neighbor Classifier
• Training: memorize the training data
• For each test image:
• Find the closest training image
• Assign it the label of that nearest image
Problem:
• Training is fast, but prediction is slow
• We want the opposite: classifiers that are fast at prediction; slow training is OK
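The procedure described above can be sketched in a few lines of NumPy, using the L1 distance and two made-up 2-D training points:

```python
import numpy as np

class NearestNeighbor:
    """Memorize the training set; label each test point with the label
    of its closest training point (L1 distance)."""
    def fit(self, X, y):
        self.X, self.y = X, y          # "training" is just memorization

    def predict(self, X_test):
        preds = []
        for x in X_test:
            d = np.abs(self.X - x).sum(axis=1)   # L1 distance to every train point
            preds.append(self.y[np.argmin(d)])
        return np.array(preds)

nn = NearestNeighbor()
nn.fit(np.array([[0, 0], [10, 10]]), np.array([0, 1]))
preds = nn.predict(np.array([[1, 1], [9, 9]]))
print(preds)  # [0 1]
```

Note how `fit` does no computation at all while `predict` scans the whole training set, which is exactly the fast-train / slow-predict problem the slide points out.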
K-Nearest Neighbors (KNN)
Instead of copying the label from the nearest neighbor,
take a majority vote among the K closest points
Setting hyperparameters
• KNN is useful for small datasets, but not used very often on real-world problems
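One common way to set the hyperparameter K (a sketch, not prescribed by the slides) is to score each candidate by cross-validation on a small dataset such as Iris and keep the best:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Score each candidate K by 5-fold cross-validation; keep the best one.
X, y = load_iris(return_X_y=True)
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
          for k in (1, 3, 5, 7, 9)}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```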
Pros and Cons
• Pros:
• Simple to implement and understand
• Takes no time to train
• Cons:
• Pay computational cost at test time
• A poor choice for high-dimensional data, since distances in high-dimensional spaces can be very counter-intuitive
sklearn.neighbors.KNeighborsClassifier
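A minimal usage sketch of the class named above, run on scikit-learn's small built-in digits dataset (8×8 images, standing in here for MNIST):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Train/test split, fit a 5-NN classifier, report test accuracy.
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(round(acc, 3))
```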
Assignment
Linear Regression
• Model: y ≈ f(x) = w·x + b
• w: constant (weight), b: bias
• The relationship between x and y is linear
• This is a regression problem: we want to find the optimized parameters w and b
Linear Regression
• Loss function: L(w) = ½‖y − X̄w‖²   (1)
• If X̄ᵀX̄ is invertible (non-singular), then (1) has a unique solution: w = (X̄ᵀX̄)⁻¹X̄ᵀy
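The closed-form least-squares solution can be checked numerically. This sketch (with made-up data lying exactly on y = 2x + 1) adds a column of ones to form X̄ and solves the normal equations:

```python
import numpy as np

# Data generated exactly from y = 2x + 1, so the fit should recover
# bias 1 and slope 2.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0
Xbar = np.column_stack([np.ones_like(x), x])   # prepend the bias column
# Solve (Xbar^T Xbar) w = Xbar^T y instead of inverting explicitly.
w = np.linalg.solve(Xbar.T @ Xbar, Xbar.T @ y)
print(w)  # ≈ [1. 2.]  (bias, slope)
```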
[Figure: data points and the fitted regression line]
Simple Linear Regression
• We have a table of height and weight of 15 persons as below
[Table: Height (cm) vs. Weight (kg) for the 15 persons — values not extracted]
sklearn.neighbors.KNeighborsRegressor
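A usage sketch of the class named above; the height/weight pairs below are invented stand-ins for the slide's 15-person table:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Made-up height (cm) / weight (kg) pairs for illustration only.
heights = np.array([[147], [150], [153], [158], [163], [168], [173], [178], [183]])
weights = np.array([49, 50, 51, 54, 58, 60, 63, 64, 68])

# A 3-NN regressor predicts the average weight of the 3 nearest heights.
reg = KNeighborsRegressor(n_neighbors=3).fit(heights, weights)
pred = reg.predict([[160]])
print(pred)
```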
Neural Network
• A neural network learns an unknown mapping from input to output: X → y = f(X)
Convolutional Neural Network (CNN)
Retinal Disease Classification
Skin Diseases Detection
Acne Grading
Linear Classification
• Parametric Approach
Parametric Approach: Linear Classifier
Example: an image with 4 pixels and 3 classes (cat/dog/ship)
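The toy setup above can be written out directly: with a 4-pixel image x, a 3×4 weight matrix W, and a 3-vector bias b, the class scores are Wx + b. The numbers below are made up for illustration:

```python
import numpy as np

# One score per class (cat, dog, ship); W and b are illustrative values.
x = np.array([56, 231, 24, 2])                 # 4 flattened pixel values
W = np.array([[ 0.2, -0.5,  0.1,  2.0],
              [ 1.5,  1.3,  2.1,  0.0],
              [ 0.0,  0.25, 0.2, -0.3]])       # shape (3, 4)
b = np.array([1.1, 3.2, -1.2])                 # shape (3,)
scores = W @ x + b                             # shape (3,)
print(scores)  # [-96.8  437.9  60.75]
```

Training a linear classifier means finding W and b that give the correct class the highest score.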
Non-linear
Sigmoid Function
• σ(x) = 1 / (1 + e⁻ˣ) squashes any real-valued score into (0, 1)
[Figure: sigmoid outputs for example scores — very negative inputs saturate near 0, large inputs near 1]
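A quick sketch of the sigmoid's squashing behavior:

```python
import numpy as np

def sigmoid(x):
    """Map any real score into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))    # 0.5
print(sigmoid(10.0))   # large scores saturate toward 1
print(sigmoid(-10.0))  # very negative scores saturate toward 0
```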
Homework 1
• Use KNN or a linear classifier (one dense layer without an activation function) to classify the Iris flower dataset or the MNIST hand-written digit dataset.
• Hint: Dense layer: tf.keras.layers.Dense(activation=None)
References
• https://cs231n.github.io/
• https://www.coursera.org/programs/data-science-program-6-months-5n0mk/browse?productId=W62RsyrdEeeFQQqyuQaohA&productType=s12n&query=deep+learning&showMiniModal=true