
CS 437 / CS 5317 / EE414


Deep Learning

Murtaza Taj
murtaza.taj@lums.edu.pk

Lecture 2: Introduction
Wed 20th Jan 2021
Recap: Datasets over Algorithms

Perhaps the most important news of our day is that datasets—not algorithms—might be the key limiting factor to development of human-level artificial intelligence.
Alexander Wissner-Gross, 2016.

http://www.spacemachine.net/views/2016/3/datasets-over-algorithms
Transfer Learning
! Low-budget deep learning: less data and less compute power

[Figure: a pretrained network reused on a new task; early layers are frozen, later layers are trained]

Adapted from https://www.slideshare.net/AndrKarpitenko/practical-deep-learning
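A minimal PyTorch sketch of this recipe (assuming torchvision's pretrained ResNet-18 as the donor network; the 10-class head and learning rate are illustrative, not from the slides):

import torch
import torch.nn as nn
from torchvision import models

# Load a network pretrained on ImageNet: the "less data" part,
# since its features were already learned elsewhere.
model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the existing layers: their weights will not be updated.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with a fresh head for the new task
# (here, a hypothetical 10-class problem). Only it gets trained.
model.fc = nn.Linear(model.fc.in_features, 10)

# The optimizer sees only the unfrozen parameters: the "less compute" part.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3)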


Why does it work now?
! The success of deep learning is multi-factorial:
! Five decades of research in machine learning, 


! CPUs/GPUs/storage developed for other purposes, 


! lots of data from “the internet”, 


! tools and culture of collaborative and reproducible science, 


! resources and efforts from large corporations.


Why does it work now?
! From a practical perspective, deep learning
! lessens the need for a deep mathematical grasp,
! makes the design of large learning architectures a system/software development task,
! allows leveraging modern hardware (clusters of GPUs),
! does not plateau when using more data,
! makes large trained networks a commodity.


What is machine learning?
What is Machine Learning?

! “the acquisition of knowledge or skills through experience, study, or by being taught.”

(C) Dhruv Batra


What is Machine Learning?

! [Arthur Samuel, 1959]


! Field of study that gives computers the ability to learn without
being explicitly programmed

! [Kevin Murphy] algorithms that


! automatically detect patterns in data and use the uncovered patterns to predict future data or other outcomes of interest

! [Tom Mitchell] algorithms that


! improve their performance (P)
! at some task (T)
! with experience (E)

(C) Dhruv Batra


What is Machine Learning?

[Diagram: Data → Machine Learning → Understanding]

(C) Dhruv Batra


ML in a Nutshell

! Tens of thousands of machine learning algorithms
! Hundreds new every year

! Decades of ML research oversimplified:
! All of Machine Learning:
! Learn a mapping from input to output f: X → Y
! e.g. X: emails, Y: {spam, notspam}

(C) Dhruv Batra. Slide Credit: Pedro Domingos
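A toy instance of such a mapping (purely illustrative; the keyword weights are made up, and real ML would learn them from labeled emails rather than hard-code them):

def f(email_text: str) -> str:
    # A hand-written mapping f: X -> Y over the label set {spam, notspam}.
    spam_words = {"winner": 2.0, "free": 1.5, "prize": 2.0, "urgent": 1.0}
    score = sum(w for word, w in spam_words.items()
                if word in email_text.lower())
    return "spam" if score > 2.0 else "notspam"

print(f("URGENT: you are a winner, claim your FREE prize"))  # spam
print(f("Lecture 2 slides are now online"))                  # notspam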


Types of Learning

! Supervised learning
! Training data includes desired outputs

! Unsupervised learning
! Training data does not include desired outputs

! Weakly or Semi-supervised learning
! Training data includes a few desired outputs

! Reinforcement learning
! Rewards from sequence of actions

(C) Dhruv Batra


Vision: Image Classification

• http://cloudcv.org/classify/

[Figure: input image x → predicted label y; top predictions: scuba diver, tiger shark, hammerhead shark]

(C) Dhruv Batra


So what is Deep (Machine) Learning?
Feature Engineering

[Figure: examples of hand-crafted features: SIFT, Spin Images, HoG, Textons, and many many more…]

(C) Dhruv Batra


Traditional Machine Learning

! Vision: hand-crafted features (SIFT/HOG, fixed) → your favorite classifier (learned) → “car”

! Speech: hand-crafted features (MFCC, fixed) → your favorite classifier (learned) → \ˈdēp\

! NLP: “This burrito place is yummy and fun!” → hand-crafted features (Bag-of-words, fixed) → your favorite classifier (learned) → “+”

Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
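A minimal sketch of this fixed-features-plus-learned-classifier recipe for the vision case (assuming scikit-image and scikit-learn; the images and labels are placeholders supplied by the caller):

import numpy as np
from skimage.feature import hog    # fixed, hand-crafted feature
from sklearn.svm import LinearSVC  # learned classifier

def extract_features(images):
    # HOG is "fixed": its parameters are chosen by hand, not learned.
    return np.array([hog(img, orientations=9,
                         pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2)) for img in images])

def train(images, labels):
    # images: equal-sized grayscale arrays; labels: e.g. "car" / "not car"
    X = extract_features(images)
    clf = LinearSVC()  # only this stage is learned from data
    clf.fit(X, labels)
    return clf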
Hierarchical Compositionality

! Vision: pixels → edge → texton → motif → part → object

! Speech: sample → spectral band → formant → motif → phone → word

! NLP: character → word → NP/VP/.. → clause → sentence → story

Slide Credit: Marc'Aurelio Ranzato, Yann LeCun


Building A Complicated Function

Given a library of simple functions, compose them into a complicated function.

(C) Dhruv Batra. Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
Building A Complicated Function

Given a library of simple functions, compose them into a complicated function.

Idea 1: Linear Combinations
• Boosting
• Kernels
• …

(C) Dhruv Batra. Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
Building A Complicated Function

Given a library of simple functions, compose them into a complicated function.

Idea 2: Compositions
• Deep Learning
• Grammar models
• Scattering transforms…

(C) Dhruv Batra. Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
Linear combination: y = Σᵢ αᵢ gᵢ(x)
Composition: y = g₃(g₂(g₁(x)))
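A tiny numerical contrast of the two ideas (pure NumPy; the simple functions g and the coefficients are arbitrary choices for illustration):

import numpy as np

x = np.linspace(-1.0, 1.0, 5)

# A library of simple functions.
g1, g2, g3 = np.tanh, np.sin, np.abs

# Idea 1: linear combination -- one shallow layer over fixed basis functions.
alphas = [0.5, -1.0, 2.0]
y_combo = alphas[0] * g1(x) + alphas[1] * g2(x) + alphas[2] * g3(x)

# Idea 2: composition -- functions stacked in depth, as in deep learning
# (where each stage would additionally have learnable parameters).
y_comp = g3(g2(g1(x)))

print(y_combo)
print(y_comp)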
Deep Learning = Hierarchical Compositionality

[Figure: a deep network mapping an input image through a stack of learned layers to the label “car”]

Slide Credit: Marc'Aurelio Ranzato, Yann LeCun


Deep Learning = Hierarchical Compositionality

Low-Level Feature → Mid-Level Feature → High-Level Feature → Trainable Classifier → “car”

Feature visualization of a convolutional net trained on ImageNet, from [Zeiler & Fergus 2013].

Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
Sparse DBNs [Lee et al. ICML ‘09]
Figure courtesy: Quoc Le
(C) Dhruv Batra
Study of the visual cortex of cats
(Hubel & Wiesel, 1959)

Image credits: https://distillery.com/


Study of the visual cortex of cats
(Hubel & Wiesel, 1959)

Image credits: https://distillery.com/

Only specific neurons activate each time (right) while the bar moves (left) (Hubel & Wiesel, 1962).
The Mammalian Visual Cortex is Hierarchical

The ventral (recognition) pathway in the visual cortex

[picture from Simon Thorpe]


Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
Visual Cortex vs. Deep Learning

Image credits: https://distillery.com/

Visual Cortex in Cats ↔ Sparse Deep Belief Networks
Visual Cortex in Mammals ↔ Deep Neural Networks


Neuron

[Figure: biological neuron (dendrites → cell body → axon → synaptic terminal) alongside its artificial counterpart: inputs x = {1, x₁, x₂, ⋯}ᵀ weighted by w, output y = wᵀx]
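A one-line artificial neuron in NumPy (the weights are arbitrary here; a learning rule would set them from data):

import numpy as np

x = np.array([1.0, 0.5, -0.2])  # input with a leading 1 for the bias term
w = np.array([0.1, 0.8, -0.3])  # weights, w[0] acting as the bias

y = w @ x                       # the neuron's output: y = w^T x
print(y)                        # 0.1 + 0.4 + 0.06 = 0.56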
Linear Regression
Parameter Optimization: Least Squared Error Solutions
! Let us first consider the ‘simpler’ problem of fitting a line to a set of data points…
x y
1.3 5.7
2.4 7.3
3.4 10.5
4.6 11.8
5.3 13.9
6.6 16.3
6.4 15.3
8.0 17.9
8.9 20.8
9.2 20.9

! Equation of the best-fit line?


Line Fitting: Least Squared Error Solution
! Step 1: Identify the model
! Equation of the line: y = mx + c

! Step 2: Set up an error term which gives the goodness of fit of every point with respect to the (unknown) model
! Error induced by the i-th point: e⁽ⁱ⁾ = (t⁽ⁱ⁾ − m x⁽ⁱ⁾ − c)²
! Error for the whole data: E = Σᵢ (t⁽ⁱ⁾ − y⁽ⁱ⁾)² = Σᵢ (t⁽ⁱ⁾ − m x⁽ⁱ⁾ − c)²

! Step 3: Differentiate the error w.r.t. the parameters, set it equal to zero, and solve for the minimum point

! In other words, find the parameters m and c for which the error term E is minimized
Line Fitting: Least Squared Error Solution

x     t
1.3   5.7
2.4   7.3
3.4   10.5
4.6   11.8
5.3   13.9
6.6   16.3
6.4   15.3
8.0   17.9
8.9   20.8
9.2   20.9

E = Σᵢ (t⁽ⁱ⁾ − m x⁽ⁱ⁾ − c)²
∂E/∂m = −2 Σᵢ (t⁽ⁱ⁾ − m x⁽ⁱ⁾ − c) x⁽ⁱ⁾
∂E/∂c = −2 Σᵢ (t⁽ⁱ⁾ − m x⁽ⁱ⁾ − c)

Setting both derivatives to zero gives the normal equations:

[ Σᵢ (x⁽ⁱ⁾)²   Σᵢ x⁽ⁱ⁾ ] [ m ]   [ Σᵢ x⁽ⁱ⁾ t⁽ⁱ⁾ ]
[ Σᵢ x⁽ⁱ⁾      Σᵢ 1    ] [ c ] = [ Σᵢ t⁽ⁱ⁾      ]

Ax = b

[ 380.63   56.1 ] [ m ]   [ 914.68 ]
[ 56.1     10   ] [ c ] = [ 140.4  ]

Solution: m = 1.9274, c = 3.227
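A quick numerical check of this solution (NumPy; the data are the ten points from the table above):

import numpy as np

x = np.array([1.3, 2.4, 3.4, 4.6, 5.3, 6.6, 6.4, 8.0, 8.9, 9.2])
t = np.array([5.7, 7.3, 10.5, 11.8, 13.9, 16.3, 15.3, 17.9, 20.8, 20.9])

# Build the normal equations A [m, c]^T = b and solve them.
A = np.array([[np.sum(x**2), np.sum(x)],
              [np.sum(x),    len(x)]])
b = np.array([np.sum(x * t), np.sum(t)])

m, c = np.linalg.solve(A, b)
print(m, c)  # approximately 1.9274 and 3.227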


Line Fitting: Least Squared Error Solution

The same derivation generalizes from one input variable to many. With w = {w₁, w₂, ⋯}ᵀ and x = {x₁, x₂, ⋯}ᵀ, the line y = mx + c becomes y = wᵀx + w₀ = Σⱼ wⱼ xⱼ + w₀.

E = Σᵢ (t⁽ⁱ⁾ − m x⁽ⁱ⁾ − c)²   becomes   E = Σᵢ (t⁽ⁱ⁾ − wᵀx⁽ⁱ⁾ − w₀)²

∂E/∂m = −2 Σᵢ (t⁽ⁱ⁾ − m x⁽ⁱ⁾ − c) x⁽ⁱ⁾
∂E/∂c = −2 Σᵢ (t⁽ⁱ⁾ − m x⁽ⁱ⁾ − c)
become
∂E/∂wⱼ = −2 Σᵢ (t⁽ⁱ⁾ − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾

and the normal equations generalize accordingly:

[ Σᵢ (x⁽ⁱ⁾)²   Σᵢ x⁽ⁱ⁾ ] [ m ]   [ Σᵢ x⁽ⁱ⁾ t⁽ⁱ⁾ ]
[ Σᵢ x⁽ⁱ⁾      Σᵢ 1    ] [ c ] = [ Σᵢ t⁽ⁱ⁾      ]
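The gradient ∂E/∂wⱼ also supports an iterative fit instead of solving the normal equations; a minimal gradient-descent sketch (NumPy; the learning rate and step count are arbitrary choices):

import numpy as np

def fit_linear(X, t, lr=0.01, steps=20000):
    # X: (n_samples, n_features), t: (n_samples,). A column of ones is
    # prepended so that w[0] plays the role of the bias w0.
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        y = Xb @ w                           # predictions y = w^T x
        grad = -2 * Xb.T @ (t - y) / len(t)  # dE/dw from the slide, averaged
        w -= lr * grad                       # step downhill
    return w

# One-feature sanity check against the earlier line fit:
x = np.array([1.3, 2.4, 3.4, 4.6, 5.3, 6.6, 6.4, 8.0, 8.9, 9.2])
t = np.array([5.7, 7.3, 10.5, 11.8, 13.9, 16.3, 15.3, 17.9, 20.8, 20.9])
print(fit_linear(x[:, None], t))  # approaches [3.227, 1.9274]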
Linear Classifier
Linear Regression for Classification
w = {w₀, w₁, w₂, ⋯}ᵀ,  x = {1, x₁, x₂, ⋯}ᵀ
y = wᵀx = Σⱼ wⱼ xⱼ
t⁽ⁱ⁾ ∈ {−1, +1}

x1   x2   x3   y
1    3    2    +ve
1    4    3    +ve
1    4    8    −ve
1    8    9    −ve

E = Σᵢ (t⁽ⁱ⁾ − wᵀx⁽ⁱ⁾)²
∂E/∂wⱼ = −2 Σᵢ (t⁽ⁱ⁾ − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾

Decision rule: class = +1 if y > 0, −1 if y ≤ 0

! w = {10, −1, −1}
! 1×10 + (−1)×3 + (−1)×2 > 0  → +ve
! 1×10 + (−1)×4 + (−1)×8 < 0  → −ve
Linear Regression for Classification

w = {w₀, w₁, w₂, ⋯}ᵀ,  x = {1, x₁, x₂, ⋯}ᵀ
y = wᵀx = Σⱼ wⱼ xⱼ
t⁽ⁱ⁾ ∈ {−1, +1}

Lbs   Width   Height   y
1     3       2        Insect
1     4       3        Insect
2     11      12       Bird
2     14      9        Bird

E = Σᵢ (t⁽ⁱ⁾ − wᵀx⁽ⁱ⁾)²
∂E/∂wⱼ = −2 Σᵢ (t⁽ⁱ⁾ − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾

Decision rule: class = +1 if y > 0, −1 if y ≤ 0

! w = {10, −1, −1}
! 1×10 + (−1)×3 + (−1)×2 > 0
! 1×10 + (−1)×4 + (−1)×8 < 0
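A sketch of the slide's worked example in NumPy (the weights w = {10, −1, −1} and the four data points come from the slides above):

import numpy as np

# Each row is x = (1, x1, x2): a leading 1 multiplies the bias weight w0.
X = np.array([[1, 3, 2],
              [1, 4, 3],
              [1, 4, 8],
              [1, 8, 9]])
t = np.array([+1, +1, -1, -1])

w = np.array([10, -1, -1])      # weights from the slide

y = X @ w                       # y = w^T x for every example
pred = np.where(y > 0, +1, -1)  # decision rule: sign of y
print(y)                        # [ 5  3 -2 -7]
print(np.all(pred == t))        # True: w separates the two classes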
Neuron

[Figure: biological neuron (dendrites → cell body → axon → synaptic terminal) alongside its artificial counterpart: inputs x = {1, x₁, x₂, ⋯}ᵀ weighted by w, output y = wᵀx]
