
Deep Learning - Theory and Practice

IE 643
Lecture 2

P. Balamurugan

August 18, 2020


Outline

1 Perceptron: Recap

2 Nature of Machine Learning Tasks


Supervised Machine Learning

3 Perceptron and Learning

4 Multi-layer Perceptron

5 Convolutional Neural Networks

6 Recurrent Neural Networks

7 Other Topics



Perceptron: Recap

Perceptron

Biological Motivation


Key Assumptions
- Stimuli which are similar will tend to form pathways to some sets of response cells.
- Stimuli which are dissimilar will tend to form pathways to different sets of response cells.
- Application of positive or negative reinforcements may facilitate or hinder the formation of connections.
- Similarity of stimuli is a dynamically evolving attribute.




Nature of Machine Learning Tasks


Supervised Learning
- Inputs and corresponding outputs are known during learning
- e.g. Classification (Binary, Multi-class, Multi-label)

Unsupervised Learning
- Input objects are not generally labeled
- e.g. Clustering, Principal Component Analysis

Semi-supervised Learning
- Learning from a few labeled data

Transfer Learning
- Transferring a learned model from task T1 to task T2
- e.g. transfer from image captioning to video captioning


Supervised Machine Learning


Binary Classification

Recall: e-mail Spam Classification

Input: e-mail messages
Output: Spam/Not spam

Feature Extraction

Input: e-mail messages =⇒ some feature space ⊆ R^d
- x ∈ X ⊆ R^d

Output: Spam/Not spam =⇒ {+1, −1}
- y ∈ Y = {+1, −1}

Generally, n input/output pairs {(x_i, y_i)}_{i=1}^n ⊂ (X × Y) are given for learning the machine learning model.

D = {(x_i, y_i)}_{i=1}^n is called the training data.
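As a concrete illustration of this setup, here is a minimal Python sketch of what such training data could look like; the vocabulary, feature map, and example messages are invented purely for illustration.

import numpy as np

# Hypothetical vocabulary used to turn an e-mail message into a feature vector x in R^d.
# A real spam filter would use a much larger vocabulary and richer features.
VOCABULARY = ["free", "winner", "money", "meeting", "project", "deadline"]

def extract_features(message: str) -> np.ndarray:
    """Map an e-mail message to x in R^d by counting vocabulary words (d = len(VOCABULARY))."""
    words = message.lower().split()
    return np.array([words.count(w) for w in VOCABULARY], dtype=float)

# Training data D = {(x_i, y_i)}_{i=1}^n with y_i in {+1, -1} (+1 = spam, -1 = not spam).
messages = [
    ("free money winner winner", +1),
    ("project meeting moved to friday", -1),
    ("claim your free money now", +1),
    ("deadline for the project report", -1),
]
D = [(extract_features(m), y) for m, y in messages]
print(D[0])  # (array([1., 2., 1., 0., 0., 0.]), 1)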



General Nature of a Supervised Machine Learning Task

Training
Input: Training data D = {(x_i, y_i)}_{i=1}^n
Aim: Learn a model h : X → Y

Testing
Given x̂, predict ŷ = h(x̂)
Perceptron and Learning

Perceptron


Input is of the form x = (x_1, x_2, . . . , x_d) ∈ R^d.

We associate weights w = (w_1, w_2, . . . , w_d) ∈ R^d to the connections.

Prediction Rule:
- ⟨w, x⟩ ≥ θ =⇒ predict +1.
- ⟨w, x⟩ < θ =⇒ predict −1.

Note: ⟨w, x⟩ = ∑_{i=1}^d w_i x_i denotes the inner product between w and x.
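A minimal NumPy sketch of this prediction rule; the weight vector, input, and threshold values below are arbitrary examples.

import numpy as np

def perceptron_predict(w: np.ndarray, x: np.ndarray, theta: float) -> int:
    """Return +1 if <w, x> >= theta, else -1."""
    return 1 if np.dot(w, x) >= theta else -1

# Illustrative values only.
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 0.0, 1.0])
print(perceptron_predict(w, x, theta=1.0))  # <w, x> = 2.5 >= 1.0, so +1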
Perceptron - Geometry


Perceptron

Input is of the form x̃ = (1, x) = (1, x_1, x_2, . . . , x_d) ∈ R^{d+1}.

We associate weights w̃ = (−θ, w) = (−θ, w_1, w_2, . . . , w_d) ∈ R^{d+1} to the connections.

Prediction Rule:
- ⟨w̃, x̃⟩ = ⟨w, x⟩ − θ ≥ 0 =⇒ predict +1.
- ⟨w̃, x̃⟩ = ⟨w, x⟩ − θ < 0 =⇒ predict −1.
Perceptron - Data Perspective

Actual Input: x = (x_1, x_2, . . . , x_d) ∈ R^d
Transformed Input: x̃ = (1, x_1, x_2, . . . , x_d) ∈ R^{d+1}
Actual Weights: w = (w_1, w_2, . . . , w_d) ∈ R^d, θ ∈ R
Transformed Weights: w̃ = (w_0, w_1, w_2, . . . , w_d) ∈ R^{d+1}, with w_0 = −θ


Equivalently, we might use:
Transformed Input: x̃ = (x_1, x_2, . . . , x_d, 1) ∈ R^{d+1}
Transformed Weights: w̃ = (w_1, w_2, . . . , w_d, w_0) ∈ R^{d+1}
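A small sketch of this augmentation trick, assuming the second convention above (constant 1 appended at the end, bias weight w_0 = −θ last); the numerical values are arbitrary.

import numpy as np

def augment(x: np.ndarray) -> np.ndarray:
    """Transform x = (x_1, ..., x_d) into x_tilde = (x_1, ..., x_d, 1)."""
    return np.append(x, 1.0)

# Illustrative values only: weights w, threshold theta, and an input x.
w, theta = np.array([1.0, 1.0]), 4.0
x = np.array([2.0, 1.0])

w_tilde = np.append(w, -theta)   # w_tilde = (w_1, ..., w_d, w_0) with w_0 = -theta
x_tilde = augment(x)             # x_tilde = (x_1, ..., x_d, 1)

# <w_tilde, x_tilde> equals <w, x> - theta, so the two threshold tests agree.
print(np.dot(w, x) - theta)      # -1.0
print(np.dot(w_tilde, x_tilde))  # -1.0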


Figure: Perceptron unit. Inputs x_1, . . . , x_d with weights w_1, . . . , w_d, together with a constant input 1 carrying the bias weight w_0, are summed as ⟨w̃, x̃⟩ = ∑ w_i x_i and passed through the activation function sign(⟨w̃, x̃⟩) to produce the output ŷ.


Input: data point x = (x_1, x_2, . . . , x_d), label y ∈ {+1, −1}.
Perceptron - Geometrical Interpretation

The hyperplane ⟨w, x⟩ = 0 divides the input space into the region where ⟨w, x⟩ > 0 and the region where ⟨w, x⟩ < 0.


Perceptron - Training

Perceptron Training Procedure

1: w̃^1 = 0
2: for t ← 1, 2, 3, . . . do
3:     receive (x^t, y^t), x^t ∈ R^d, y^t ∈ {+1, −1}
4:     transform x^t into x̃^t = (x^t, 1) ∈ R^{d+1}
5:     ŷ = Perceptron(x̃^t; w̃^t)
6:     if ŷ ≠ y^t then
7:         w̃^{t+1} = w̃^t + y^t x̃^t
8:     else
9:         w̃^{t+1} = w̃^t
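A minimal Python sketch of this training procedure in the online form above; the stream of (x^t, y^t) pairs is a made-up toy example.

import numpy as np

def perceptron_output(x_tilde: np.ndarray, w_tilde: np.ndarray) -> int:
    """Step 5: prediction with the current weights (sign of the inner product)."""
    return 1 if np.dot(w_tilde, x_tilde) >= 0 else -1

def perceptron_train(stream):
    """Steps 1-9: start from zero weights and update only on mistakes."""
    w_tilde = None
    for x, y in stream:                       # receive (x^t, y^t)
        x_tilde = np.append(x, 1.0)           # transform x^t into (x^t, 1)
        if w_tilde is None:
            w_tilde = np.zeros_like(x_tilde)  # w_tilde^1 = 0
        y_hat = perceptron_output(x_tilde, w_tilde)
        if y_hat != y:                        # mistake: w_tilde^{t+1} = w_tilde^t + y^t x_tilde^t
            w_tilde = w_tilde + y * x_tilde
    return w_tilde

# Illustrative data only: a few linearly separable points.
stream = [(np.array([2.0, 1.0]), 1), (np.array([-1.0, -1.0]), -1), (np.array([3.0, 2.0]), 1)]
print(perceptron_train(stream))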


Perceptron Update Rule - Geometric Idea

The current weights are w̃^t = (1, 1, −4).


Suppose a new point x^t = (2, 1), y^t = 1 arrives.

Transform x^t into x̃^t = (2, 1, 1).

However, with the current weights, the perceptron outputs ŷ^t = −1. (why?)

Hence an error occurs and w̃^t gets updated to w̃^{t+1} = w̃^t + y^t x̃^t.

After the update, the new weights become w̃^{t+1} = (3, 2, −3).
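A quick numeric check of this example, using the same w̃^t, x̃^t and y^t as above.

import numpy as np

w_t = np.array([1.0, 1.0, -4.0])   # current weights w_tilde^t = (1, 1, -4)
x_t = np.array([2.0, 1.0, 1.0])    # transformed input x_tilde^t = (2, 1, 1)
y_t = 1                            # true label

score = np.dot(w_t, x_t)           # 1*2 + 1*1 + (-4)*1 = -1 < 0, so the prediction is -1
print(score)                       # -1.0  (this answers the "why?" above)

w_next = w_t + y_t * x_t           # mistake, so update: w_tilde^{t+1} = w_tilde^t + y^t x_tilde^t
print(w_next)                      # [ 3.  2. -3.]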


Homework:
- Suppose now a new point x^t = (−1, −1) with label −1 comes up. How will the weights change?
- Suppose a different new point x^t = (−2, 3) with label +1 comes up. How will the weights change?
Perceptron - Convergence

Convergence of Perceptron Training Procedure
Under a suitable separation assumption on the positively and negatively labeled data, the training procedure for the Perceptron converges in finite time.

Proof
Will be discussed later...


Perceptron - Caveat

Not suitable when the separation assumption fails.

Example: the classical XOR problem.
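For concreteness, a small sketch of the XOR data, assuming the common encoding with inputs in {0, 1}^2 and labels in {+1, −1}; no single weight vector and threshold can classify all four points correctly, which is why a single perceptron fails here.

import numpy as np

# Classical XOR data: label +1 when exactly one coordinate is 1, else -1.
xor_data = [
    (np.array([0.0, 0.0]), -1),
    (np.array([0.0, 1.0]), +1),
    (np.array([1.0, 0.0]), +1),
    (np.array([1.0, 1.0]), -1),
]

# No w = (w_1, w_2) and threshold theta can satisfy all four sign constraints,
# so the perceptron training procedure never converges on this data.
for x, y in xor_data:
    print(x, y)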


Perceptron - Sigmoid Activation
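A minimal sketch of a perceptron unit with a sigmoid activation, assuming the standard logistic sigmoid σ(z) = 1/(1 + e^{−z}) in place of the hard sign(·) threshold; the weight and input values are arbitrary.

import numpy as np

def sigmoid(z: float) -> float:
    """Logistic sigmoid: maps any real score to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def perceptron_sigmoid_output(w_tilde: np.ndarray, x_tilde: np.ndarray) -> float:
    """Perceptron unit with a sigmoid activation instead of sign(.)."""
    return sigmoid(np.dot(w_tilde, x_tilde))

# Illustrative values only.
w_tilde = np.array([1.0, 1.0, -4.0])
x_tilde = np.array([2.0, 1.0, 1.0])
print(perceptron_sigmoid_output(w_tilde, x_tilde))  # sigmoid(-1) is roughly 0.2689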


Multi-layer Perceptron


Convolutional Neural Networks (CNNs)

CNN - LeNet Architecture


Recurrent Neural Networks (RNNs)


Sequential outputs

Figure: Part-of-Speech Tagging. Inputs x_{t−1}, x_t, x_{t+1} ("Magadh", "ran", "fast") are mapped to outputs y_{t−1}, y_t, y_{t+1} (Noun, Verb, Adverb).


Recurrent Neural Network

Figure: A simple recurrent network, with an input layer, a hidden layer, and an output layer.


Other Topics

- Memory models: e.g. Boltzmann machines, Hopfield Nets
- Other variants: Capsule Nets, U-Nets, R-CNN, etc.
- Different types of learning problems
- Optimization algorithms
- Applications from a variety of fields


Acknowledgments

Some content borrowed from various open-access resources for the Lecture 1 and Lecture 2 slides:
- Blogs
- Tutorials
- Free e-books
- Open-access papers
- YouTube videos
- Scribe notes from my students


Home Work

Programming Exercise
Write Python code to generate 100 two-dimensional points from a normal distribution with mean (−5, −5) and covariance matrix [[1, 0], [0, 1]] (the 2×2 identity). Show the generated points in a plot.
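One possible solution sketch using NumPy and Matplotlib; the random seed is arbitrary.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)                 # arbitrary seed, for reproducibility
mean = np.array([-5.0, -5.0])
cov = np.array([[1.0, 0.0], [0.0, 1.0]])       # identity covariance

points = rng.multivariate_normal(mean, cov, size=100)   # 100 two-dimensional points

plt.scatter(points[:, 0], points[:, 1], s=15)
plt.title("100 points from N((-5, -5), I)")
plt.xlabel("x1")
plt.ylabel("x2")
plt.show()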
