
CS-471

Machine Learning
Dr. Hammad Afzal
hammad.afzal@mcs.edu.pk

Assoc Prof (NUST)


Data and Text Processing Lab
www.codteem.com

Agenda
• Preliminary Concepts
– Derivatives
• Perceptron
• Gradient Descent
• Perceptron Learning
• Multi-Layer Perceptron
• Back Propagation Algorithm

Derivatives
Session-21

Contents
• Derivatives
• Computation Graphs and Chain Rule

Derivatives

• The derivative of a function gives the slope of the function
– Line: the same slope everywhere
– Curve: different slopes at different points
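To make the slope idea concrete, here is a minimal sketch in Python (not from the slides) that estimates slopes numerically with a finite-difference quotient: a line has the same slope everywhere, while a curve's slope changes from point to point.

# Numerical slope via the difference quotient (f(x+h) - f(x)) / h
def slope(f, x, h=1e-6):
    return (f(x + h) - f(x)) / h

line = lambda x: 3 * x      # a line: slope is 3 everywhere
curve = lambda x: x ** 2    # a curve: slope 2x varies with x

for x in [1.0, 2.0, 5.0]:
    print(x, round(slope(line, x), 3), round(slope(curve, x), 3))
# line -> 3.0 at every x; curve -> roughly 2, 4 and 10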
Computation Graph

Chain Rule

[Slides: computation-graph and chain-rule worked examples (figures not reproduced).]
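The figures for these slides are not reproduced, so below is a hedged sketch of the idea using an assumed textbook-style graph J = 3(a + b·c): evaluate the graph left to right (forward pass), then apply the chain rule right to left to get the derivative of J with respect to each input.

# Forward pass through the computation graph J = 3 * (a + b*c)
a, b, c = 5.0, 3.0, 2.0
u = b * c       # node u = b*c
v = a + u       # node v = a + u
J = 3 * v       # output node J = 3v

# Backward pass: chain rule, one local derivative per edge
dJ_dv = 3.0            # dJ/dv
dJ_da = dJ_dv * 1.0    # dv/da = 1
dJ_du = dJ_dv * 1.0    # dv/du = 1
dJ_db = dJ_du * c      # du/db = c
dJ_dc = dJ_du * b      # du/dc = b

print(J, dJ_da, dJ_db, dJ_dc)   # 33.0 3.0 6.0 9.0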
Artificial Neural Networks

What is it?
• An artificial neural network is a crude way of trying to
simulate the human brain (digitally)
• Human brain – Approx 10 billion neurons
• Each neuron connected with thousands of others
• Parts of a neuron
– Cell body
– Dendrites – receive input signals
– Axons – give output signals
What is ANN
• ANN – made up of artificial neurons
– Digitally modeled biological neurons
• Each input into the neuron has its own weight associated with it
• As each input enters the nucleus (the blue circle in the original figure), it is multiplied by its weight
• The nucleus sums all these weighted input values, which gives us the activation
• For n inputs and n weights – each weight is multiplied by its input and the products are summed
What is ANN
• If the activation is greater than a threshold value, the neuron outputs a signal (for example, 1)
• If the activation is less than the threshold, the neuron outputs zero
• This thresholding is typically called a step function
What is ANN
• The combination of summation and thresholding is called a node

http://www-cse.uta.edu/~cook/ai1/lectures/figures/neuron.jpg

• For the step (activation) function, the output is 1 if:

x1·w1 + x2·w2 + x3·w3 + ... + xn·wn > T
What is ANN

x1·w1 + x2·w2 + x3·w3 + ... + xn·wn > T
x1·w1 + x2·w2 + x3·w3 + ... + xn·wn − T > 0

Let w0 = −T and x0 = 1:

D = x0·w0 + x1·w1 + x2·w2 + x3·w3 + ... + xn·wn

Output is 1 if D > 0;
Output is 0 otherwise.

w0 is called a bias weight
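A minimal sketch of this unit in Python (names are illustrative, not from the slides): the threshold T is folded into the bias weight w0 = -T with a constant input x0 = 1, so the comparison against T becomes a comparison against 0.

def neuron_output(weights, inputs):
    # weights[0] is the bias weight w0 = -T, paired with constant input x0 = 1
    d = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if d > 0 else 0

# A unit with threshold T = 1.5 (so w0 = -1.5) and weights w1 = w2 = 1
print(neuron_output([-1.5, 1.0, 1.0], [1, 1]))   # 1: activation 2 > 1.5
print(neuron_output([-1.5, 1.0, 1.0], [1, 0]))   # 0: activation 1 < 1.5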


Typical activation functions

• Step function:    Y = 1 if X ≥ 0, else 0
• Sign function:    Y = +1 if X ≥ 0, else −1
• Sigmoid function: Y = 1 / (1 + e^(−X))
• Linear function:  Y = X

The activation function controls when the unit is “active” or “inactive”
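A direct transcription of the four functions in Python (a sketch, using only the standard library):

import math

def step(x):    return 1 if x >= 0 else 0
def sign(x):    return +1 if x >= 0 else -1
def sigmoid(x): return 1 / (1 + math.exp(-x))
def linear(x):  return x

for f in (step, sign, sigmoid, linear):
    print(f.__name__, f(-2.0), f(0.0), f(2.0))
# step: 0 1 1; sign: -1 1 1; sigmoid: ~0.12 0.5 ~0.88; linear: -2.0 0.0 2.0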


Simplest Classifier

Can a single neuron learn a task?


A motivating example

• Each day you get lunch at the cafeteria.


– Your diet consists of fish, chips, and drink.
– You get several portions of each
• The cashier only tells you the total price of the meal
– After several days, you should be able to figure out the price of
each portion.
• Each meal price gives a linear constraint on the prices of the
portions:

price = x_fish · w_fish + x_chips · w_chips + x_drink · w_drink


Solving the problem
• The prices of the portions are like the weights of a linear neuron:

w = (w_fish, w_chips, w_drink)

• We will start with guesses for the weights and then adjust the guesses to give a better fit to the prices given by the cashier.
The Cashier’s Brain

[Figure: a linear neuron with weights 150, 50, 100 applied to 2 portions of fish, 5 portions of chips and 3 portions of drink; price of meal = 2·150 + 5·50 + 3·100 = 850.]
A model of the cashier’s brain with arbitrary initial weights

[Figure: the same neuron with all three weights set to 50, giving price of meal = 2·50 + 5·50 + 3·50 = 500.]

• Residual error = 850 − 500 = 350
• The learning rule is:

Δw_i = ε · x_i · (y − ŷ)

• With a learning rate ε of 1/35, the weight changes are +20, +50, +30
• This gives new weights of 70, 100, 80
• Notice that the weight for chips got worse!
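A minimal sketch of this single update in Python (variable names are mine), reproducing the numbers above:

x = [2, 5, 3]             # portions of fish, chips, drink
w = [50.0, 50.0, 50.0]    # arbitrary initial weights (guessed prices)
y = 850                   # true price given by the cashier
lr = 1 / 35               # learning rate

y_hat = sum(wi * xi for wi, xi in zip(w, x))    # predicted price: 500
error = y - y_hat                               # residual error: 350
w = [wi + lr * xi * error for wi, xi in zip(w, x)]
print(w)    # [70.0, 100.0, 80.0]

Repeating this update over many meals moves the weights toward the true portion prices, even though a single step can move an individual weight (here, chips) the wrong way.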
Perceptron
• In 1958, Frank Rosenblatt introduced a training algorithm that
provided the first procedure for training a simple ANN: a
Perceptron.
[Figure: a two-input perceptron. Inputs x1 and x2, with weights w1 and w2, feed a linear combiner; a hard limiter with a threshold then produces the output Y.]
Implementing ‘OR’ with Perceptron
Implementing ‘AND’ with Perceptron
Implementing ‘XOR’ with Perceptron
Linearly Separable
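The figures for these four slides are not reproduced; the sketch below (weights hand-picked by me) shows a single perceptron computing OR and AND. No choice of weights reproduces XOR, because its positive and negative examples cannot be split by one straight line: OR and AND are linearly separable, XOR is not.

def perceptron(w0, w1, w2, x1, x2):
    # w0 is the bias weight (w0 = -threshold); inputs are 0 or 1
    return 1 if w0 + w1 * x1 + w2 * x2 > 0 else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        or_out  = perceptron(-0.5, 1, 1, x1, x2)   # fires when x1 + x2 > 0.5
        and_out = perceptron(-1.5, 1, 1, x1, x2)   # fires when x1 + x2 > 1.5
        print(x1, x2, or_out, and_out)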
Perceptron Learning
Simple Perceptron
Perceptron Learning Algorithm
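The algorithm slides are figure-only in this export, so here is a hedged sketch of the standard perceptron learning rule (w_i <- w_i + eta * (t - o) * x_i, cycling through the examples until all are classified correctly), trained on AND:

def train_perceptron(samples, lr=0.1, epochs=100):
    # samples: list of ((x1, x2), target); w[0] is the bias weight
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        errors = 0
        for (x1, x2), t in samples:
            o = 1 if w[0] + w[1] * x1 + w[2] * x2 > 0 else 0
            for i, xi in enumerate((1, x1, x2)):   # 1 is the bias input
                w[i] += lr * (t - o) * xi
            errors += t != o
        if errors == 0:    # converged: every example classified correctly
            break
    return w

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
print(train_perceptron(AND))    # weights of a line separating AND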
Learning Example

[Slides: the perceptron learning rule traced step by step on a worked example (figures not reproduced).]
Perceptron Learning
Mean Square Algorithm – Gradient Descent
Gradient Descent vs. Batch Gradient
Perceptron Training Rule (X)
Incremental Gradient Descent (X)
Gradient Descent – Algorithm

[Slides: the gradient descent algorithm traced step by step on an example of perceptron learning: the logical operation AND (figures not reproduced).]
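As a stand-in for the missing figures, a minimal batch gradient descent sketch for a linear unit minimising the squared error E = 1/2 * sum((t - o)^2) over the training set (notation assumed from standard treatments, not copied from the slides), fitted to the AND targets:

def gradient_descent(samples, lr=0.05, epochs=500):
    # samples: ((x1, x2), target); linear unit o = w0 + w1*x1 + w2*x2
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for (x1, x2), t in samples:            # batch: sum over all examples
            o = w[0] + w[1] * x1 + w[2] * x2
            for i, xi in enumerate((1, x1, x2)):
                grad[i] += (t - o) * xi        # -dE/dw_i for this example
        w = [wi + lr * g for wi, g in zip(w, grad)]
    return w

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
print(gradient_descent(AND))    # approaches the least-squares fit [-0.25, 0.5, 0.5]

In incremental (stochastic) gradient descent the weights would instead be updated after each individual example, as in the cashier example earlier.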
Multi-Layer Perceptron

[Slides: multi-layer perceptron architecture and notation (figures not reproduced).]
Back Propagation

[Slides: derivation of the back-propagation algorithm (figures not reproduced).]
Back Propagation – Example

[Slides: worked back-propagation example (figures not reproduced).]
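In place of the missing worked example, a self-contained back-propagation sketch on a 2-2-1 sigmoid network learning XOR (architecture, learning rate and seed are my choices, not the slides'):

import math, random

random.seed(0)
sig = lambda x: 1 / (1 + math.exp(-x))

# 2-2-1 network: each unit holds weights [bias, w_in1, w_in2]
h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
o = [random.uniform(-1, 1) for _ in range(3)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
lr = 0.5

for _ in range(20000):
    for (x1, x2), t in XOR:
        # forward pass
        hv = [sig(w[0] + w[1] * x1 + w[2] * x2) for w in h]
        ov = sig(o[0] + o[1] * hv[0] + o[2] * hv[1])
        # backward pass: output delta = error * sigmoid derivative,
        # hidden deltas via the chain rule through the output weights
        d_o = (t - ov) * ov * (1 - ov)
        d_h = [d_o * o[i + 1] * hv[i] * (1 - hv[i]) for i in range(2)]
        # weight updates
        for i, xi in enumerate((1, hv[0], hv[1])):
            o[i] += lr * d_o * xi
        for j in range(2):
            for i, xi in enumerate((1, x1, x2)):
                h[j][i] += lr * d_h[j] * xi

for (x1, x2), t in XOR:
    hv = [sig(w[0] + w[1] * x1 + w[2] * x2) for w in h]
    print(x1, x2, round(sig(o[0] + o[1] * hv[0] + o[2] * hv[1]), 2))
# typically converges to ~0, ~1, ~1, ~0 (backprop can occasionally get
# stuck in a local minimum on XOR, depending on the initial weights)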
XOR – Revisited

[Figure: the four inputs (0,0), (0,1), (1,0), (1,1) plotted for AND and for XOR; AND is separable by a single straight line, XOR is not.] [Russell & Norvig, 1995]

• Piece-wise linear separation

[Figure: XOR separated piece-wise by two lines (1 and 2); the positive points (0,1) and (1,0) lie between them.]
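A hedged sketch of this piece-wise separation with hand-set weights (thresholds are my choice): hidden unit 1 fires above line 1 (x1 + x2 = 0.5), hidden unit 2 fires above line 2 (x1 + x2 = 1.5), and the output fires only between the two lines.

step = lambda x: 1 if x > 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)    # 1 when above line 1
    h2 = step(x1 + x2 - 1.5)    # 1 when above line 2
    return step(h1 - h2 - 0.5)  # fire only between the two lines

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))   # 0,0->0  0,1->1  1,0->1  1,1->0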
BackPropagation

Backpropagation Algorithm

Computational complexity

• Could lead to a very large number of calculations

[Figure: influence maps; each input unit influences every hidden unit (influence map, layer 1), and each hidden unit influences every output unit (influence map, layer 2).]
• Sensitivity to noise
– Very tolerant
• Transparency
– Neural networks are essentially black boxes
– There is no explanation or trace for a particular answer
– Tools for the analysis of networks are very limited
Different non-linearly separable problems
Other neural networks

Multilayer neural networks trained with the back-propagation algorithm are used for pattern recognition problems.

However, to emulate the human memory’s associative characteristics we need a different type of network: a recurrent neural network.

A recurrent neural network has feedback loops from its outputs to its inputs.
RESOURCES

Acknowledgements

• Dr. Imran Siddiqi, Bahria University, Islamabad
• Machine Intelligence, Dr. M. Hanif, UET, Lahore
• Machine Learning, S. Stock, University of Nebraska
• Lecture Slides, Introduction to Machine Learning, E. Alpaydin, MIT Press
• Machine Learning, Andrew Ng, Stanford University
• Fisher kernels for image representation & generative classification models, Jakob Verbeek
Thank You
