
Lecture 01: Introduction to Neural Networks
Artificial Intelligence vs. Machine Learning vs. Deep Learning and Neural Networks

What is AI?
The idea of artificial intelligence came into existence, and the term "AI" was first coined, in 1956 by John McCarthy.
"Artificial Intelligence is a technique which enables machines to mimic human behavior."
Artificial Intelligence vs. Machine Learning vs. Deep Learning and Neural Networks

What is Learning?
• "Learning is any process by which a system improves performance from experience." – Arthur Samuel
• "Learning denotes changes in a system that ... enable a system to do the same task more efficiently the next time." – Herbert Simon
• "Learning is constructing or modifying representations of what is being experienced." – Ryszard Michalski
• "Learning is making useful changes in our minds." – Marvin Minsky
Artificial Intelligence vs. Machine Learning vs. Deep Learning and Neural Networks

What is Machine Learning?
Machine Learning is a subset of Artificial Intelligence (AI) which provides machines with the ability to learn automatically and improve from experience without being explicitly programmed.
Artificial Intelligence vs. Machine Learning vs. Deep Learning and Neural Networks

What is Deep Learning?
Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, unsupervised or reinforcement-based. (Wikipedia)
Artificial Intelligence vs. Machine Learning vs. Deep Learning and Neural Networks

What are Neural Networks (NNs)?
Neural Networks, or Artificial Neural Networks (ANNs), are machine learning techniques inspired by the functionality of our brain cells. These brain cells are known as biological neurons.

Biological neuron (Image source: Wikipedia)
What are Neural Networks?
Biological Neuron vs. Artificial Neuron

Biological neuron (Image source: Wikipedia); Artificial neuron (Image source: https://towardsdatascience.com)
What are Neural Networks?

Biological Neuron vs. Artificial Neuron

Biological Neuron           Artificial Neuron
Dendrites (input)           Input
Soma (cell body)            Node
Axon (output)               Output
Synapse                     Interconnections
Adaptation-based learning   Model-based learning
Neural Network Learning Algorithms (1)

• Two main algorithms:
  • Perceptron: the initial algorithm for learning simple neural networks (with no hidden layer), developed in the 1950s.
  • Backpropagation: a more complex algorithm for learning multi-layer neural networks, developed in the 1980s.
Neural Network Learning Algorithms (2)

• Neural Networks are one of the most important classes of learning algorithms in ML.
• The learned classification model is an algebraic function.
• The function is linear for the Perceptron algorithm and non-linear for the Backpropagation algorithm.
• Both the features and the output classes are allowed to be real-valued.
Types of Machine Learning Algorithms (1)

• Supervised learning is a method in which the machine is trained using data with corresponding labels.
• Unsupervised learning is when a machine is trained on unlabeled data, without providing any guidance with the data.
• Reinforcement learning is a method in which an agent learns by interacting with the environment, taking specific actions and discovering rewards or penalties in response to its actions.
Types of Machine Learning Algorithms (2)
Target problems in supervised, unsupervised and reinforcement learning, respectively (from left to right).

Types of Machine Learning Algorithms (3)
Nature of data in supervised, unsupervised and reinforcement learning, respectively (from left to right).

Types of Machine Learning Algorithms (4)
Training process in supervised, unsupervised and reinforcement learning, respectively (from left to right).

Types of Machine Learning Algorithms (5)
Purpose of supervised, unsupervised and reinforcement learning, respectively (from left to right).
Traditional ML vs. Deep Learning (1)

Traditional ML vs. Deep Learning (2)

Traditional ML vs. Deep Learning (3)
Perceptron: The First Neural Network
Types of Artificial Neural Networks
• ANNs can be categorized based on the number of hidden layers contained in the ANN architecture:

01 One-Layer Neural Network (Perceptron)
   • Contains 0 hidden layers

02 Multi-Layer Neural Network
   • Regular Neural Network: contains 1 hidden layer
   • Deep Neural Network: contains >1 hidden layers
One-Layer Artificial Neural Network (Perceptron)

• Multiple input nodes x1, x2, x3, ..., xn with weights w1, w2, w3, ..., wn
• Single output node ŷ
• Takes a weighted sum of the inputs: S = w1*x1 + w2*x2 + ... + wn*xn
• A unit function calculates the output of the network

Unit Function
• Linear function
• Weighted sum followed by an activation function (see the sketch below)
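
A minimal sketch of this unit function in Python (the helper names `weighted_sum`, `step_activation` and `predict` are illustrative, not from the slides): a weighted sum followed by a threshold activation that outputs +1 or -1.

```python
def weighted_sum(weights, inputs):
    """Linear part of the unit function: S = w1*x1 + ... + wn*xn."""
    return sum(w * x for w, x in zip(weights, inputs))

def step_activation(s):
    """Threshold activation: +1 if S >= 0, otherwise -1."""
    return 1 if s >= 0 else -1

def predict(weights, inputs):
    """Output ŷ of a one-layer network (perceptron) for one example."""
    return step_activation(weighted_sum(weights, inputs))
```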
Perceptron Example

• Task: categorize a 2x2 pixel binary image as "Bright" or "Dark"
• The rule is:
  • If it contains 2, 3 or 4 white pixels, it is "bright"
  • If it contains 0 or 1 white pixels, it is "dark"
• Perceptron architecture:
  • Four input units, one for each pixel
  • One output unit: +1 for bright, -1 for dark

(Figure: an image of 4 pixels)
Perceptron Example

• Inputs: Pixel 1 (x1), Pixel 2 (x2), Pixel 3 (x3), Pixel 4 (x4), each with weight 0.25
• Output ŷ: +1 (Bright) if S >= 0, otherwise -1 (Dark)

S = 0.25*x1 + 0.25*x2 + 0.25*x3 + 0.25*x4 (illustrated in code below)
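
Using the `predict` sketch above, this bright/dark classifier with all weights set to 0.25 can be checked on a couple of images (an illustrative check, not part of the original slides):

```python
weights = [0.25, 0.25, 0.25, 0.25]

# Two white pixels: S = 0.5 >= 0, so the image is classified "bright" (+1).
print(predict(weights, [1, 1, 0, 0]))   # +1
# One white pixel: S = 0.25 >= 0, so it is also classified "bright" (+1),
# even though the target label is "dark" (-1) -- this is the error that the
# training rule on the following slides corrects.
print(predict(weights, [1, 0, 0, 0]))   # +1
```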


Perceptron Example

• Calculation (Step 1):
  • x1 = 1, x2 = 0, x3 = 0, x4 = 0
  • S = 0.25*(1) + 0.25*(0) + 0.25*(0) + 0.25*(0) = 0.25
  • 0.25 >= 0, so the output of the ANN is +1
  • So the image is categorized as "Bright"
  • Target: "Dark"
Perceptron Training Rule (How to Update Weights)
• When t(E) is different from o(E):
  • Add ∆i to weight wi
  • Where ∆i = ɳ (t(E) – o(E)) xi, and ɳ is the learning rate (usually a very small value)
  • Do this for every weight in the network (see the sketch after this slide)
• Let ɳ = 0.1

Calculating the error values:                              Calculating the new weights:
∆1 = ɳ (t(E) – o(E)) * x1 = 0.1 * (-1 - 1) * 1 = -0.2      wʹ1 = w1 + ∆1 = 0.25 - 0.2 = 0.05
∆2 = ɳ (t(E) – o(E)) * x2 = 0.1 * (-1 - 1) * 0 = 0         wʹ2 = w2 + ∆2 = 0.25 + 0 = 0.25
∆3 = ɳ (t(E) – o(E)) * x3 = 0.1 * (-1 - 1) * 0 = 0         wʹ3 = w3 + ∆3 = 0.25 + 0 = 0.25
∆4 = ɳ (t(E) – o(E)) * x4 = 0.1 * (-1 - 1) * 0 = 0         wʹ4 = w4 + ∆4 = 0.25 + 0 = 0.25
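
A minimal sketch of this update rule, ∆i = ɳ (t(E) – o(E)) xi, reusing the `predict` helper sketched earlier (the function name `update_weights` is illustrative):

```python
def update_weights(weights, inputs, target, lr=0.1):
    """Perceptron training rule: w_i <- w_i + eta * (t(E) - o(E)) * x_i."""
    output = predict(weights, inputs)
    return [w + lr * (target - output) * x for w, x in zip(weights, inputs)]

# Reproducing the slide: input (1, 0, 0, 0), target -1, current output +1.
w = update_weights([0.25, 0.25, 0.25, 0.25], [1, 0, 0, 0], target=-1)
print(w)   # approx [0.05, 0.25, 0.25, 0.25] (up to floating-point rounding);
           # only w1 changes, since x2 = x3 = x4 = 0
```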
Perceptron Example

• Calculation (Step 2):
  • x1 = 1, x2 = 0, x3 = 0, x4 = 0
  • S = 0.05*(1) + 0.25*(0) + 0.25*(0) + 0.25*(0) = 0.05
  • 0.05 >= 0, so the output of the ANN is +1
  • So the image is categorized as "Bright"
  • Target: "Dark"
Perceptron Training Rule (How to Update Weights)
• When t(E) is different from o(E):
  • Add ∆i to weight wi
  • Where ∆i = ɳ (t(E) – o(E)) xi, and ɳ is the learning rate (usually a very small value)
  • Do this for every weight in the network
• Let ɳ = 0.1

Calculating the error values:                              Calculating the new weights:
∆1 = ɳ (t(E) – o(E)) * x1 = 0.1 * (-1 - 1) * 1 = -0.2      wʹ1 = w1 + ∆1 = 0.05 - 0.2 = -0.15
∆2 = ɳ (t(E) – o(E)) * x2 = 0.1 * (-1 - 1) * 0 = 0         wʹ2 = w2 + ∆2 = 0.25 + 0 = 0.25
∆3 = ɳ (t(E) – o(E)) * x3 = 0.1 * (-1 - 1) * 0 = 0         wʹ3 = w3 + ∆3 = 0.25 + 0 = 0.25
∆4 = ɳ (t(E) – o(E)) * x4 = 0.1 * (-1 - 1) * 0 = 0         wʹ4 = w4 + ∆4 = 0.25 + 0 = 0.25
Perceptron Example

• Calculation (Step 3):
  • x1 = 1, x2 = 0, x3 = 0, x4 = 0
  • S = -0.15*(1) + 0.25*(0) + 0.25*(0) + 0.25*(0) = -0.15
  • -0.15 < 0, so the output of the ANN is -1
  • So the image is categorized as "Dark"
  • Target: "Dark"

The short loop below reproduces these three steps.
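
A small loop, assuming the `predict` and `update_weights` helpers sketched earlier, that repeats the update for the same example until the output matches the target (illustrative, not from the slides):

```python
weights = [0.25, 0.25, 0.25, 0.25]
image, target = [1, 0, 0, 0], -1     # one white pixel -> target "dark"

step = 1
while predict(weights, image) != target:
    weights = update_weights(weights, image, target, lr=0.1)
    print(f"step {step}: weights = {weights}")
    step += 1
# step 1: w1 becomes approx  0.05 -> image still classified "bright"
# step 2: w1 becomes approx -0.15 -> image classified "dark", matching the target
```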
Another Example (AND)

X1  X2  X1 AND X2
0   0   0
0   1   0
1   0   0
1   1   1

Perceptron: output ŷ = +1 (ON) if S >= 0, otherwise -1 (OFF); initial weights w1 = 0.5, w2 = 0.5.

Training on x1 = 0, x2 = 1, with ɳ = 0.1 and t(E) = -1:

Weights          Step-1  Step-2  Step-3  Step-4
w1               0.5     0.5     0.5     0.5
w2               0.5     0.3     0.1     -0.1
Weighted Sum     0.5     0.3     0.1     -0.1
Observed Output  +1      +1      +1      -1
Another Example (AND)

X1  X2  X1 AND X2
0   0   0
0   1   0
1   0   0
1   1   1

Perceptron: output ŷ = +1 (ON) if S >= 0, otherwise -1 (OFF); current weights w1 = 0.5, w2 = -0.1.

Training on x1 = 1, x2 = 0, with ɳ = 0.1 and t(E) = -1:

Weights          Step-1  Step-2  Step-3  Step-4
w1               0.5     0.3     0.1     -0.1
w2               -0.1    -0.1    -0.1    -0.1
Weighted Sum     0.5     0.3     0.1     -0.1
Observed Output  +1      +1      +1      -1
Another Example (AND)

X1  X2  X1 AND X2
0   0   0
0   1   0
1   0   0
1   1   1

Perceptron: output ŷ = +1 (ON) if S >= 0, otherwise -1 (OFF); current weights w1 = -0.1, w2 = -0.1.

Training on x1 = 1, x2 = 1, with ɳ = 0.1 and t(E) = +1:

Weights          Step-1  Step-2
w1               -0.1    0.1
w2               -0.1    0.1
Weighted Sum     -0.2    0.2
Observed Output  -1      +1
Use of Bias

• Bias is just like an intercept added in a linear equation.
• Diagram: bias input x0, inputs x1 and x2 with weights 0.5, output ŷ = +1 (ON) if S >= 0, otherwise -1 (OFF).

output = sum(weights * inputs) + bias

• The output is calculated by multiplying the inputs with their weights and then passing the result through an activation function such as the Sigmoid function. Here, the bias acts like a constant which helps the model fit the given data (see the sketch below).
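
In code, the bias is simply added to the weighted sum before the activation; a minimal sketch, using the threshold from this slide (the name `predict_with_bias` is illustrative):

```python
def predict_with_bias(weights, inputs, bias):
    """output = activation( sum(weights * inputs) + bias )"""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if s >= 0 else -1
```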
Use of Bias

• A simpler way to understand bias is through the constant c of a linear function: y = mx + c
• It allows us to move the line up and down, fitting the prediction to the data better. If the constant c is absent, the line will pass through the origin (0, 0) and we will get a poorer fit.
Example (AND) with Bias

X1  X2  X1 AND X2
0   0   0
0   1   0
1   0   0
1   1   1

Perceptron with bias input x0 = 1: output ŷ = +1 (ON) if S > 0, otherwise -1 (OFF); initial weights w0 = 0.5, w1 = 0.5, w2 = 0.5.

Training on x1 = 0, x2 = 1, with ɳ = 0.1 and t(E) = -1:

Weights          Step-1  Step-2  Step-3  Step-4
w0               0.5     0.3     0.1     -0.1
w1               0.5     0.5     0.5     0.5
w2               0.5     0.3     0.1     -0.1
Weighted Sum     1       0.6     0.2     -0.2
Observed Output  +1      +1      +1      -1
Example (AND) with Bias

X1  X2  X1 AND X2
0   0   0
0   1   0
1   0   0
1   1   1

Perceptron with bias input x0 = 1: output ŷ = +1 (ON) if S > 0, otherwise -1 (OFF); current weights w0 = -0.1, w1 = 0.5, w2 = -0.1.

Training on x1 = 1, x2 = 0, with ɳ = 0.1 and t(E) = -1:

Weights          Step-1  Step-2
w0               -0.1    -0.3
w1               0.5     0.3
w2               -0.1    -0.1
Weighted Sum     0.4     0
Observed Output  +1      -1
Example (AND) with Bias

X1  X2  X1 AND X2
0   0   0
0   1   0
1   0   0
1   1   1

Perceptron with bias input x0 = 1: output ŷ = +1 (ON) if S > 0, otherwise -1 (OFF); current weights w0 = -0.3, w1 = 0.3, w2 = -0.1.

Training on x1 = 1, x2 = 1, with ɳ = 0.1 and t(E) = +1:

Weights          Step-1  Step-2
w0               -0.3    -0.1
w1               0.3     0.5
w2               -0.1    0.1
Weighted Sum     -0.1    0.5
Observed Output  -1      +1
Example (AND) with Bias

After 2 Epochs

X1  X2  X1 AND X2
0   0   0
0   1   0
1   0   0
1   1   1

Perceptron with bias input x0 = 1: output ŷ = +1 (ON) if S > 0, otherwise -1 (OFF).

• x1 = 0, x2 = 1, ɳ = 0.1, t(E) = -1

Final Weights
w0   -0.3
w1   0.3
w2   0.1

A quick check of these final weights follows below.
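
An illustrative check that the final weights w0 = -0.3, w1 = 0.3, w2 = 0.1 reproduce the AND truth table under the slide's "If S > 0" rule:

```python
w0, w1, w2 = -0.3, 0.3, 0.1

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    s = w0 * 1 + w1 * x1 + w2 * x2   # bias input x0 is always 1
    y = 1 if s > 0 else -1           # +1 = ON, -1 = OFF
    print(x1, x2, "->", y)
# 0 0 -> -1, 0 1 -> -1, 1 0 -> -1, 1 1 -> +1  (matches X1 AND X2)
```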
Learning in Perceptron

• Need to learn:
  • Both the weights between input and output units
  • And the value for the bias
• Make calculations easier by:
  • Thinking of the bias as a weight from a special input unit whose output is always 1
• Exactly the same result:
  • But we only have to worry about learning weights
New Representation for Perceptron

• A special input x0, which is always 1, carries the bias as weight w0.
• The threshold function becomes: ŷ = +1 if S > 0, otherwise -1.

S = w0 + w1*x1 + w2*x2 + ... + wn*xn
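
With the bias treated as weight w0 on a constant input x0 = 1, the whole network is again just a weighted sum followed by the threshold; a minimal sketch (the helper name is illustrative):

```python
def predict(weights, inputs):
    """weights = [w0, w1, ..., wn]; the bias input x0 = 1 is prepended automatically."""
    s = sum(w * x for w, x in zip(weights, [1] + list(inputs)))
    return 1 if s > 0 else -1
```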


Learning Algorithm
• Weights are randomly initialized.
• For each training example E:
  • Calculate the observed output from the Perceptron, o(E)
  • If the target output t(E) is different from o(E):
    • Then update all the weights so that o(E) becomes closer to t(E)
• This process is done for every example.
• It is not necessary to stop when all examples have been used once.
• Repeat the cycle again (an epoch) until the network produces the correct output (see the sketch below).
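
Putting the pieces together, a minimal, self-contained sketch of this learning algorithm (random initial weights, repeated epochs until every example is classified correctly; the function name and signature are illustrative):

```python
import random

def train_perceptron(examples, lr=0.1, max_epochs=100):
    """examples: list of (inputs, target) pairs with targets in {+1, -1}."""
    n = len(examples[0][0])
    weights = [random.uniform(-0.5, 0.5) for _ in range(n + 1)]   # w0 is the bias weight
    for _ in range(max_epochs):
        all_correct = True
        for inputs, target in examples:
            xs = [1] + list(inputs)                       # prepend the constant bias input
            s = sum(w * x for w, x in zip(weights, xs))
            output = 1 if s > 0 else -1
            if output != target:
                all_correct = False
                weights = [w + lr * (target - output) * x  # perceptron training rule
                           for w, x in zip(weights, xs)]
        if all_correct:                                    # stop once an epoch has no errors
            break
    return weights

# AND function, encoded with targets +1 (ON) and -1 (OFF)
and_examples = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], 1)]
print(train_perceptron(and_examples))
```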
Limitations of Perceptron
• The perceptron can only learn simple problems: it is only useful if the problem is linearly separable.
• A linearly separable problem is one in which the classes can be separated by a single hyperplane.
Any Questions?
