
CS 437 / CS 5317 / EE414


Deep Learning

Murtaza Taj
murtaza.taj@lums.edu.pk

Lecture 2: Introduction
Wed 20th Jan 2021
Recap: Datasets over Algorithms

Perhaps the most important news of our day is that datasets—not algorithms—might be the key limiting factor to development of human-level artificial intelligence.
Alexander Wissner-Gross, 2016.

http://www.spacemachine.net/views/2016/3/datasets-over-algorithms
Transfer Learning
! Low-budget deep learning: less data and less compute power

[Figure: a pretrained network reused on a new task; early layers are frozen, later layers are trained]

Adapted from https://www.slideshare.net/AndrKarpitenko/practical-deep-learning
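A minimal PyTorch sketch of this recipe (assuming torchvision's pretrained ResNet-18 as the donor network; the 10-class head and learning rate are illustrative, not from the slides):

import torch
import torch.nn as nn
from torchvision import models

# Load a network pretrained on ImageNet: the "less data" part,
# since its features were already learned elsewhere.
model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the existing layers: their weights will not be updated.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with a fresh head for the new task
# (here, a hypothetical 10-class problem). Only it gets trained.
model.fc = nn.Linear(model.fc.in_features, 10)

# The optimizer sees only the unfrozen parameters: the "less compute" part.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3)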


Why does it work now?
! The success of deep learning is multi-factorial:
! Five decades of research in machine learning, 


! CPUs/GPUs/storage developed for other purposes, 


! lots of data from “the internet”, 


! tools and culture of collaborative and reproducible science, 


! resources and efforts from large corporations.


Why does it work now?
! From a practical perspective, deep learning
! lessens the need for a deep mathematical grasp,
! makes the design of large learning architectures a system/software development task,
! allows leveraging modern hardware (clusters of GPUs),
! does not plateau when using more data,
! makes large trained networks a commodity.


What is machine learning?
What is Machine Learning?

! “the acquisition of knowledge or skills through experience, study, or by being taught.”

(C) Dhruv Batra


What is Machine Learning?

! [Arthur Samuel, 1959]


! Field of study that gives computers the ability to learn without
being explicitly programmed

! [Kevin Murphy] algorithms that


! automatically detect patterns in data and use the uncovered patterns to predict future data or other outcomes of interest

! [Tom Mitchell] algorithms that


! improve their performance (P)
! at some task (T)
! with experience (E)

(C) Dhruv Batra


What is Machine Learning?

[Diagram: Data → Machine Learning → Understanding]

(C) Dhruv Batra


ML in a Nutshell

! Tens of thousands of machine learning algorithms
! Hundreds new every year

! Decades of ML research oversimplified:
! All of Machine Learning:
! Learn a mapping from input to output f: X → Y
! e.g. X: emails, Y: {spam, notspam}

(C) Dhruv Batra. Slide Credit: Pedro Domingos
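A toy instance of such a mapping (purely illustrative; the keyword weights are made up, and real ML would learn them from labeled emails rather than hard-code them):

def f(email_text: str) -> str:
    # A hand-written mapping f: X -> Y over the label set {spam, notspam}.
    spam_words = {"winner": 2.0, "free": 1.5, "prize": 2.0, "urgent": 1.0}
    score = sum(w for word, w in spam_words.items()
                if word in email_text.lower())
    return "spam" if score > 2.0 else "notspam"

print(f("URGENT: you are a winner, claim your FREE prize"))  # spam
print(f("Lecture 2 slides are now online"))                  # notspam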


Types of Learning

! Supervised learning
! Training data includes desired outputs

! Unsupervised learning
! Training data does not include desired outputs

! Weakly or Semi-supervised learning
! Training data includes a few desired outputs

! Reinforcement learning
! Rewards from sequence of actions

(C) Dhruv Batra


Vision: Image Classification

• http://cloudcv.org/classify/

[Figure: input image x → predicted label y; top predictions: scuba diver, tiger shark, hammerhead shark]

(C) Dhruv Batra


So what is Deep (Machine) Learning?
Feature Engineering

[Figure: examples of hand-crafted features: SIFT, Spin Images, HoG, Textons, and many many more…]

(C) Dhruv Batra


Traditional Machine Learning

! Vision: hand-crafted features (SIFT/HOG, fixed) → your favorite classifier (learned) → “car”

! Speech: hand-crafted features (MFCC, fixed) → your favorite classifier (learned) → \ˈdēp\

! NLP: “This burrito place is yummy and fun!” → hand-crafted features (Bag-of-words, fixed) → your favorite classifier (learned) → “+”

Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
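A minimal sketch of this fixed-features-plus-learned-classifier recipe for the vision case (assuming scikit-image and scikit-learn; the images and labels are placeholders supplied by the caller):

import numpy as np
from skimage.feature import hog    # fixed, hand-crafted feature
from sklearn.svm import LinearSVC  # learned classifier

def extract_features(images):
    # HOG is "fixed": its parameters are chosen by hand, not learned.
    return np.array([hog(img, orientations=9,
                         pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2)) for img in images])

def train(images, labels):
    # images: equal-sized grayscale arrays; labels: e.g. "car" / "not car"
    X = extract_features(images)
    clf = LinearSVC()  # only this stage is learned from data
    clf.fit(X, labels)
    return clf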
Hierarchical Compositionality

! Vision: pixels → edge → texton → motif → part → object

! Speech: sample → spectral band → formant → motif → phone → word

! NLP: character → word → NP/VP/.. → clause → sentence → story

Slide Credit: Marc'Aurelio Ranzato, Yann LeCun


Building A Complicated Function

Given a library of simple functions, compose them into a complicated function.

(C) Dhruv Batra. Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
Building A Complicated Function

Given a library of simple functions, compose them into a complicated function.

Idea 1: Linear Combinations
• Boosting
• Kernels
• …

(C) Dhruv Batra. Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
Building A Complicated Function

Given a library of simple functions, compose them into a complicated function.

Idea 2: Compositions
• Deep Learning
• Grammar models
• Scattering transforms…

(C) Dhruv Batra. Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
Linear combination: y = Σᵢ αᵢ gᵢ(x)
Composition: y = g₃(g₂(g₁(x)))
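A tiny numerical contrast of the two ideas (pure NumPy; the simple functions g and the coefficients are arbitrary choices for illustration):

import numpy as np

x = np.linspace(-1.0, 1.0, 5)

# A library of simple functions.
g1, g2, g3 = np.tanh, np.sin, np.abs

# Idea 1: linear combination -- one shallow layer over fixed basis functions.
alphas = [0.5, -1.0, 2.0]
y_combo = alphas[0] * g1(x) + alphas[1] * g2(x) + alphas[2] * g3(x)

# Idea 2: composition -- functions stacked in depth, as in deep learning
# (where each stage would additionally have learnable parameters).
y_comp = g3(g2(g1(x)))

print(y_combo)
print(y_comp)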
Deep Learning = Hierarchical Compositionality

[Figure: a deep network mapping an input image through a stack of learned layers to the label “car”]

Slide Credit: Marc'Aurelio Ranzato, Yann LeCun


Deep Learning = Hierarchical Compositionality

Low-Level Feature → Mid-Level Feature → High-Level Feature → Trainable Classifier → “car”

Feature visualization of a convolutional net trained on ImageNet, from [Zeiler & Fergus 2013].

Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
Sparse DBNs [Lee et al. ICML ‘09]
Figure courtesy: Quoc Le
(C) Dhruv Batra
Study of the visual cortex of cats
(Hubel & Wiesel, 1959)

Image credits: https://distillery.com/


Study of the visual cortex of cats
(Hubel & Wiesel, 1959)

Image credits: https://distillery.com/

Only specific neurons activate each time (right) while the bar moves (left) (Hubel & Wiesel, 1962).
The Mammalian Visual Cortex is Hierarchical

The ventral (recognition) pathway in the visual cortex

[picture from Simon Thorpe]


Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
Visual Cortex vs. Deep Learning

Image credits: https://distillery.com/

Visual Cortex in Cats ↔ Sparse Deep Belief Networks
Visual Cortex in Mammals ↔ Deep Neural Networks


Neuron

[Figure: biological neuron (dendrites → cell body → axon → synaptic terminal) alongside its artificial counterpart: inputs x = {1, x₁, x₂, ⋯}ᵀ weighted by w, output y = wᵀx]
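A one-line artificial neuron in NumPy (the weights are arbitrary here; a learning rule would set them from data):

import numpy as np

x = np.array([1.0, 0.5, -0.2])  # input with a leading 1 for the bias term
w = np.array([0.1, 0.8, -0.3])  # weights, w[0] acting as the bias

y = w @ x                       # the neuron's output: y = w^T x
print(y)                        # 0.1 + 0.4 + 0.06 = 0.56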
Linear Regression
Parameter Optimization: Least Squared Error Solutions
! Let us first consider the ‘simpler’ problem of fitting a line to a set of data points…
x y
1.3 5.7
2.4 7.3
3.4 10.5
4.6 11.8
5.3 13.9
6.6 16.3
6.4 15.3
8.0 17.9
8.9 20.8
9.2 20.9

! Equation of the best-fit line?


Line Fitting: Least Squared Error Solution
! Step 1: Identify the model
! Equation of the line: y = mx + c

! Step 2: Set up an error term which gives the goodness of fit of every point with respect to the (unknown) model
! Error induced by the i-th point: e⁽ⁱ⁾ = (t⁽ⁱ⁾ − m x⁽ⁱ⁾ − c)²
! Error for the whole data: E = Σᵢ (t⁽ⁱ⁾ − y⁽ⁱ⁾)² = Σᵢ (t⁽ⁱ⁾ − m x⁽ⁱ⁾ − c)²

! Step 3: Differentiate the error w.r.t. the parameters, set it equal to zero, and solve for the minimum point

! In other words, find the parameters m and c for which the error term E is minimized
Line Fitting: Least Squared Error Solution

x     t
1.3   5.7
2.4   7.3
3.4   10.5
4.6   11.8
5.3   13.9
6.6   16.3
6.4   15.3
8.0   17.9
8.9   20.8
9.2   20.9

E = Σᵢ (t⁽ⁱ⁾ − m x⁽ⁱ⁾ − c)²
∂E/∂m = −2 Σᵢ (t⁽ⁱ⁾ − m x⁽ⁱ⁾ − c) x⁽ⁱ⁾
∂E/∂c = −2 Σᵢ (t⁽ⁱ⁾ − m x⁽ⁱ⁾ − c)

Setting both derivatives to zero gives the normal equations:

[ Σᵢ (x⁽ⁱ⁾)²   Σᵢ x⁽ⁱ⁾ ] [ m ]   [ Σᵢ x⁽ⁱ⁾ t⁽ⁱ⁾ ]
[ Σᵢ x⁽ⁱ⁾      Σᵢ 1    ] [ c ] = [ Σᵢ t⁽ⁱ⁾      ]

Ax = b

[ 380.63   56.1 ] [ m ]   [ 914.68 ]
[ 56.1     10   ] [ c ] = [ 140.4  ]

Solution: m = 1.9274, c = 3.227
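A quick numerical check of this solution (NumPy; the data are the ten points from the table above):

import numpy as np

x = np.array([1.3, 2.4, 3.4, 4.6, 5.3, 6.6, 6.4, 8.0, 8.9, 9.2])
t = np.array([5.7, 7.3, 10.5, 11.8, 13.9, 16.3, 15.3, 17.9, 20.8, 20.9])

# Build the normal equations A [m, c]^T = b and solve them.
A = np.array([[np.sum(x**2), np.sum(x)],
              [np.sum(x),    len(x)]])
b = np.array([np.sum(x * t), np.sum(t)])

m, c = np.linalg.solve(A, b)
print(m, c)  # approximately 1.9274 and 3.227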


Line Fitting: Least Squared Error Solution

The same derivation generalizes from one input variable to many. With w = {w₁, w₂, ⋯}ᵀ and x = {x₁, x₂, ⋯}ᵀ, the line y = mx + c becomes y = wᵀx + w₀ = Σⱼ wⱼ xⱼ + w₀.

E = Σᵢ (t⁽ⁱ⁾ − m x⁽ⁱ⁾ − c)²   becomes   E = Σᵢ (t⁽ⁱ⁾ − wᵀx⁽ⁱ⁾ − w₀)²

∂E/∂m = −2 Σᵢ (t⁽ⁱ⁾ − m x⁽ⁱ⁾ − c) x⁽ⁱ⁾
∂E/∂c = −2 Σᵢ (t⁽ⁱ⁾ − m x⁽ⁱ⁾ − c)
become
∂E/∂wⱼ = −2 Σᵢ (t⁽ⁱ⁾ − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾

and the normal equations generalize accordingly:

[ Σᵢ (x⁽ⁱ⁾)²   Σᵢ x⁽ⁱ⁾ ] [ m ]   [ Σᵢ x⁽ⁱ⁾ t⁽ⁱ⁾ ]
[ Σᵢ x⁽ⁱ⁾      Σᵢ 1    ] [ c ] = [ Σᵢ t⁽ⁱ⁾      ]
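The gradient ∂E/∂wⱼ also supports an iterative fit instead of solving the normal equations; a minimal gradient-descent sketch (NumPy; the learning rate and step count are arbitrary choices):

import numpy as np

def fit_linear(X, t, lr=0.01, steps=20000):
    # X: (n_samples, n_features), t: (n_samples,). A column of ones is
    # prepended so that w[0] plays the role of the bias w0.
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        y = Xb @ w                           # predictions y = w^T x
        grad = -2 * Xb.T @ (t - y) / len(t)  # dE/dw from the slide, averaged
        w -= lr * grad                       # step downhill
    return w

# One-feature sanity check against the earlier line fit:
x = np.array([1.3, 2.4, 3.4, 4.6, 5.3, 6.6, 6.4, 8.0, 8.9, 9.2])
t = np.array([5.7, 7.3, 10.5, 11.8, 13.9, 16.3, 15.3, 17.9, 20.8, 20.9])
print(fit_linear(x[:, None], t))  # approaches [3.227, 1.9274]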
Linear Classifier
Linear Regression for Classification
w = {w₀, w₁, w₂, ⋯}ᵀ,  x = {1, x₁, x₂, ⋯}ᵀ
y = wᵀx = Σⱼ wⱼ xⱼ
t⁽ⁱ⁾ ∈ {−1, +1}

x1   x2   x3   y
1    3    2    +ve
1    4    3    +ve
1    4    8    −ve
1    8    9    −ve

E = Σᵢ (t⁽ⁱ⁾ − wᵀx⁽ⁱ⁾)²
∂E/∂wⱼ = −2 Σᵢ (t⁽ⁱ⁾ − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾

Decision rule: class = +1 if y > 0, −1 if y ≤ 0

! w = {10, −1, −1}
! 1×10 + (−1)×3 + (−1)×2 > 0  → +ve
! 1×10 + (−1)×4 + (−1)×8 < 0  → −ve
Linear Regression for Classification

w = {w₀, w₁, w₂, ⋯}ᵀ,  x = {1, x₁, x₂, ⋯}ᵀ
y = wᵀx = Σⱼ wⱼ xⱼ
t⁽ⁱ⁾ ∈ {−1, +1}

Lbs   Width   Height   y
1     3       2        Insect
1     4       3        Insect
2     11      12       Bird
2     14      9        Bird

E = Σᵢ (t⁽ⁱ⁾ − wᵀx⁽ⁱ⁾)²
∂E/∂wⱼ = −2 Σᵢ (t⁽ⁱ⁾ − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾

Decision rule: class = +1 if y > 0, −1 if y ≤ 0

! w = {10, −1, −1}
! 1×10 + (−1)×3 + (−1)×2 > 0
! 1×10 + (−1)×4 + (−1)×8 < 0
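A sketch of the slide's worked example in NumPy (the weights w = {10, −1, −1} and the four data points come from the slides above):

import numpy as np

# Each row is x = (1, x1, x2): a leading 1 multiplies the bias weight w0.
X = np.array([[1, 3, 2],
              [1, 4, 3],
              [1, 4, 8],
              [1, 8, 9]])
t = np.array([+1, +1, -1, -1])

w = np.array([10, -1, -1])      # weights from the slide

y = X @ w                       # y = w^T x for every example
pred = np.where(y > 0, +1, -1)  # decision rule: sign of y
print(y)                        # [ 5  3 -2 -7]
print(np.all(pred == t))        # True: w separates the two classes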
Neuron

[Figure: biological neuron (dendrites → cell body → axon → synaptic terminal) alongside its artificial counterpart: inputs x = {1, x₁, x₂, ⋯}ᵀ weighted by w, output y = wᵀx]
