
Deep Learning - Theory and Practice

IE 643
Lecture 2

P. Balamurugan

August 18, 2020


Outline

1 Perceptron: Recap

2 Nature of Machine Learning Tasks


Supervised Machine Learning

3 Perceptron and Learning

4 Multi-layer Perceptron

5 Convolutional Neural Networks

6 Recurrent Neural Networks

7 Other Topics



Perceptron: Recap

Perceptron

Biological Motivation


Key Assumptions
- Stimuli which are similar will tend to form pathways to some sets of response cells.
- Stimuli which are dissimilar will tend to form pathways to different sets of response cells.
- Application of positive or negative reinforcements may facilitate or hinder the formation of connections.
- Similarity of stimuli is a dynamically evolving attribute.




Nature of Machine Learning Tasks


Supervised Learning
- Inputs and corresponding outputs are known during learning
- e.g. Classification (Binary, Multi-class, Multi-label)

Unsupervised Learning
- Input objects are not generally labeled
- e.g. Clustering, Principal Component Analysis

Semi-supervised Learning
- Learning from a few labeled data

Transfer Learning
- Transferring a learned model from task T1 to task T2
- e.g. transfer from image captioning to video captioning


Supervised Machine Learning


Binary Classification

Recall: e-mail Spam Classification

Input: e-mail messages
Output: Spam/Not spam

Feature Extraction

Input: e-mail messages =⇒ some feature space ⊆ R^d
- x ∈ X ⊆ R^d

Output: Spam/Not spam =⇒ {+1, −1}
- y ∈ Y = {+1, −1}

Generally, n input/output pairs {(x_i, y_i)}_{i=1}^n ⊂ (X × Y) are given for learning the machine learning model.

D = {(x_i, y_i)}_{i=1}^n is called the training data.
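As a concrete illustration of this setup, here is a minimal Python sketch of what such training data could look like; the vocabulary, feature map, and example messages are invented purely for illustration.

import numpy as np

# Hypothetical vocabulary used to turn an e-mail message into a feature vector x in R^d.
# A real spam filter would use a much larger vocabulary and richer features.
VOCABULARY = ["free", "winner", "money", "meeting", "project", "deadline"]

def extract_features(message: str) -> np.ndarray:
    """Map an e-mail message to x in R^d by counting vocabulary words (d = len(VOCABULARY))."""
    words = message.lower().split()
    return np.array([words.count(w) for w in VOCABULARY], dtype=float)

# Training data D = {(x_i, y_i)}_{i=1}^n with y_i in {+1, -1} (+1 = spam, -1 = not spam).
messages = [
    ("free money winner winner", +1),
    ("project meeting moved to friday", -1),
    ("claim your free money now", +1),
    ("deadline for the project report", -1),
]
D = [(extract_features(m), y) for m, y in messages]
print(D[0])  # (array([1., 2., 1., 0., 0., 0.]), 1)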



General Nature of a Supervised Machine Learning Task

Training
Input: Training data D = {(x_i, y_i)}_{i=1}^n
Aim: Learn a model h : X → Y

Testing
Given x̂, predict ŷ = h(x̂)
Perceptron and Learning

Perceptron


Input is of the form x = (x_1, x_2, . . . , x_d) ∈ R^d.

We associate weights w = (w_1, w_2, . . . , w_d) ∈ R^d to the connections.

Prediction Rule:
- ⟨w, x⟩ ≥ θ =⇒ predict +1.
- ⟨w, x⟩ < θ =⇒ predict −1.

Note: ⟨w, x⟩ = ∑_{i=1}^d w_i x_i denotes the inner product between w and x.
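A minimal NumPy sketch of this prediction rule; the weight vector, input, and threshold values below are arbitrary examples.

import numpy as np

def perceptron_predict(w: np.ndarray, x: np.ndarray, theta: float) -> int:
    """Return +1 if <w, x> >= theta, else -1."""
    return 1 if np.dot(w, x) >= theta else -1

# Illustrative values only.
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 0.0, 1.0])
print(perceptron_predict(w, x, theta=1.0))  # <w, x> = 2.5 >= 1.0, so +1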
Perceptron - Geometry


Perceptron

Input is of the form x̃ = (1, x) = (1, x_1, x_2, . . . , x_d) ∈ R^{d+1}.

We associate weights w̃ = (−θ, w) = (−θ, w_1, w_2, . . . , w_d) ∈ R^{d+1} to the connections.

Prediction Rule:
- ⟨w̃, x̃⟩ = ⟨w, x⟩ − θ ≥ 0 =⇒ predict +1.
- ⟨w̃, x̃⟩ = ⟨w, x⟩ − θ < 0 =⇒ predict −1.
Perceptron - Data Perspective

Actual Input: x = (x_1, x_2, . . . , x_d) ∈ R^d
Transformed Input: x̃ = (1, x_1, x_2, . . . , x_d) ∈ R^{d+1}
Actual Weights: w = (w_1, w_2, . . . , w_d) ∈ R^d, θ ∈ R
Transformed Weights: w̃ = (w_0, w_1, w_2, . . . , w_d) ∈ R^{d+1}, with w_0 = −θ


Equivalently, we might use:
Transformed Input: x̃ = (x_1, x_2, . . . , x_d, 1) ∈ R^{d+1}
Transformed Weights: w̃ = (w_1, w_2, . . . , w_d, w_0) ∈ R^{d+1}
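A small sketch of this augmentation trick, assuming the second convention above (constant 1 appended at the end, bias weight w_0 = −θ last); the numerical values are arbitrary.

import numpy as np

def augment(x: np.ndarray) -> np.ndarray:
    """Transform x = (x_1, ..., x_d) into x_tilde = (x_1, ..., x_d, 1)."""
    return np.append(x, 1.0)

# Illustrative values only: weights w, threshold theta, and an input x.
w, theta = np.array([1.0, 1.0]), 4.0
x = np.array([2.0, 1.0])

w_tilde = np.append(w, -theta)   # w_tilde = (w_1, ..., w_d, w_0) with w_0 = -theta
x_tilde = augment(x)             # x_tilde = (x_1, ..., x_d, 1)

# <w_tilde, x_tilde> equals <w, x> - theta, so the two threshold tests agree.
print(np.dot(w, x) - theta)      # -1.0
print(np.dot(w_tilde, x_tilde))  # -1.0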


Figure: Perceptron unit. Inputs x_1, . . . , x_d with weights w_1, . . . , w_d, together with a constant input 1 carrying the bias weight w_0, are summed as ⟨w̃, x̃⟩ = ∑ w_i x_i and passed through the activation function sign(⟨w̃, x̃⟩) to produce the output ŷ.


Input: data point x = (x_1, x_2, . . . , x_d), label y ∈ {+1, −1}.
Perceptron - Geometrical Interpretation

The hyperplane ⟨w, x⟩ = 0 divides the input space into the region where ⟨w, x⟩ > 0 and the region where ⟨w, x⟩ < 0.


Perceptron - Training

Perceptron Training Procedure

1: w̃^1 = 0
2: for t ← 1, 2, 3, . . . do
3:     receive (x^t, y^t), x^t ∈ R^d, y^t ∈ {+1, −1}
4:     transform x^t into x̃^t = (x^t, 1) ∈ R^{d+1}
5:     ŷ = Perceptron(x̃^t; w̃^t)
6:     if ŷ ≠ y^t then
7:         w̃^{t+1} = w̃^t + y^t x̃^t
8:     else
9:         w̃^{t+1} = w̃^t
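A minimal Python sketch of this training procedure in the online form above; the stream of (x^t, y^t) pairs is a made-up toy example.

import numpy as np

def perceptron_output(x_tilde: np.ndarray, w_tilde: np.ndarray) -> int:
    """Step 5: prediction with the current weights (sign of the inner product)."""
    return 1 if np.dot(w_tilde, x_tilde) >= 0 else -1

def perceptron_train(stream):
    """Steps 1-9: start from zero weights and update only on mistakes."""
    w_tilde = None
    for x, y in stream:                       # receive (x^t, y^t)
        x_tilde = np.append(x, 1.0)           # transform x^t into (x^t, 1)
        if w_tilde is None:
            w_tilde = np.zeros_like(x_tilde)  # w_tilde^1 = 0
        y_hat = perceptron_output(x_tilde, w_tilde)
        if y_hat != y:                        # mistake: w_tilde^{t+1} = w_tilde^t + y^t x_tilde^t
            w_tilde = w_tilde + y * x_tilde
    return w_tilde

# Illustrative data only: a few linearly separable points.
stream = [(np.array([2.0, 1.0]), 1), (np.array([-1.0, -1.0]), -1), (np.array([3.0, 2.0]), 1)]
print(perceptron_train(stream))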


Perceptron Update Rule - Geometric Idea

The current weights are w̃^t = (1, 1, −4).


Suppose a new point x^t = (2, 1), y^t = 1 arrives.

Transform x^t into x̃^t = (2, 1, 1).

However, with the current weights, the perceptron outputs ŷ^t = −1. (why?)

Hence an error occurs and w̃^t gets updated to w̃^{t+1} = w̃^t + y^t x̃^t.

After the update, the new weights become w̃^{t+1} = (3, 2, −3).
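A quick numeric check of this example, using the same w̃^t, x̃^t and y^t as above.

import numpy as np

w_t = np.array([1.0, 1.0, -4.0])   # current weights w_tilde^t = (1, 1, -4)
x_t = np.array([2.0, 1.0, 1.0])    # transformed input x_tilde^t = (2, 1, 1)
y_t = 1                            # true label

score = np.dot(w_t, x_t)           # 1*2 + 1*1 + (-4)*1 = -1 < 0, so the prediction is -1
print(score)                       # -1.0  (this answers the "why?" above)

w_next = w_t + y_t * x_t           # mistake, so update: w_tilde^{t+1} = w_tilde^t + y^t x_tilde^t
print(w_next)                      # [ 3.  2. -3.]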


Homework:
- Suppose now a new point x^t = (−1, −1) with label −1 comes up. How will the weights change?
- Suppose a different new point x^t = (−2, 3) with label +1 comes up. How will the weights change?
Perceptron - Convergence

Convergence of Perceptron Training Procedure
Under a suitable separation assumption on the positively and negatively labeled data, the training procedure for the Perceptron converges in finite time.

Proof
Will be discussed later...


Perceptron - Caveat

Not suitable when the separation assumption fails.

Example: the classical XOR problem.
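For concreteness, a small sketch of the XOR data, assuming the common encoding with inputs in {0, 1}^2 and labels in {+1, −1}; no single weight vector and threshold can classify all four points correctly, which is why a single perceptron fails here.

import numpy as np

# Classical XOR data: label +1 when exactly one coordinate is 1, else -1.
xor_data = [
    (np.array([0.0, 0.0]), -1),
    (np.array([0.0, 1.0]), +1),
    (np.array([1.0, 0.0]), +1),
    (np.array([1.0, 1.0]), -1),
]

# No w = (w_1, w_2) and threshold theta can satisfy all four sign constraints,
# so the perceptron training procedure never converges on this data.
for x, y in xor_data:
    print(x, y)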


Perceptron - Sigmoid Activation
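A minimal sketch of a perceptron unit with a sigmoid activation, assuming the standard logistic sigmoid σ(z) = 1/(1 + e^{−z}) in place of the hard sign(·) threshold; the weight and input values are arbitrary.

import numpy as np

def sigmoid(z: float) -> float:
    """Logistic sigmoid: maps any real score to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def perceptron_sigmoid_output(w_tilde: np.ndarray, x_tilde: np.ndarray) -> float:
    """Perceptron unit with a sigmoid activation instead of sign(.)."""
    return sigmoid(np.dot(w_tilde, x_tilde))

# Illustrative values only.
w_tilde = np.array([1.0, 1.0, -4.0])
x_tilde = np.array([2.0, 1.0, 1.0])
print(perceptron_sigmoid_output(w_tilde, x_tilde))  # sigmoid(-1) is roughly 0.2689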


Multi-layer Perceptron


Convolutional Neural Networks (CNNs)

CNN - LeNet Architecture


Recurrent Neural Networks (RNNs)


Sequential outputs

Figure: Part-of-Speech Tagging. Inputs x_{t−1}, x_t, x_{t+1} ("Magadh", "ran", "fast") are mapped to outputs y_{t−1}, y_t, y_{t+1} (Noun, Verb, Adverb).


Recurrent Neural Network

Figure: A simple recurrent network, with an input layer, a hidden layer, and an output layer.


Other Topics

- Memory models: e.g. Boltzmann machines, Hopfield Nets
- Other variants: Capsule Nets, U-Nets, R-CNN, etc.
- Different types of learning problems
- Optimization algorithms
- Applications from a variety of fields


Acknowledgments

Some content borrowed from various open-access resources for the Lecture 1 and Lecture 2 slides:
- Blogs
- Tutorials
- Free e-books
- Open-access papers
- YouTube videos
- Scribe notes from my students


Home Work

Programming Exercise
Write Python code to generate 100 two-dimensional points from a normal distribution with mean (−5, −5) and covariance matrix [[1, 0], [0, 1]] (the 2×2 identity). Show the generated points in a plot.
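One possible solution sketch using NumPy and Matplotlib; the random seed is arbitrary.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)                 # arbitrary seed, for reproducibility
mean = np.array([-5.0, -5.0])
cov = np.array([[1.0, 0.0], [0.0, 1.0]])       # identity covariance

points = rng.multivariate_normal(mean, cov, size=100)   # 100 two-dimensional points

plt.scatter(points[:, 0], points[:, 1], s=15)
plt.title("100 points from N((-5, -5), I)")
plt.xlabel("x1")
plt.ylabel("x2")
plt.show()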
