
CS-471

Machine Learning
Dr. Hammad Afzal
hammad.afzal@mcs.edu.pk

Assoc Prof (NUST)


Data and Text Processing Lab
www.codteem.com

Agenda
• Preliminary Concepts
– Derivatives
• Perceptron
• Gradient Descent
• Perceptron Learning
• Multi-Layer Perceptron
• Back Propagation Algorithm

Derivatives
Session-21

Contents
• Derivatives
• Computation Graphs and Chain Rule

Derivatives

• The derivative of a function gives the slope of the function
– Line: the same slope everywhere
– Curve: different slopes at different points
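To make the slope idea concrete, here is a minimal sketch in Python (not from the slides) that estimates slopes numerically with a finite-difference quotient: a line has the same slope everywhere, while a curve's slope changes from point to point.

# Numerical slope via the difference quotient (f(x+h) - f(x)) / h
def slope(f, x, h=1e-6):
    return (f(x + h) - f(x)) / h

line = lambda x: 3 * x      # a line: slope is 3 everywhere
curve = lambda x: x ** 2    # a curve: slope 2x varies with x

for x in [1.0, 2.0, 5.0]:
    print(x, round(slope(line, x), 3), round(slope(curve, x), 3))
# line -> 3.0 at every x; curve -> roughly 2, 4 and 10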
Computation Graph

Chain Rule

[Slides: computation-graph and chain-rule worked examples (figures not reproduced).]
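The figures for these slides are not reproduced, so below is a hedged sketch of the idea using an assumed textbook-style graph J = 3(a + b·c): evaluate the graph left to right (forward pass), then apply the chain rule right to left to get the derivative of J with respect to each input.

# Forward pass through the computation graph J = 3 * (a + b*c)
a, b, c = 5.0, 3.0, 2.0
u = b * c       # node u = b*c
v = a + u       # node v = a + u
J = 3 * v       # output node J = 3v

# Backward pass: chain rule, one local derivative per edge
dJ_dv = 3.0            # dJ/dv
dJ_da = dJ_dv * 1.0    # dv/da = 1
dJ_du = dJ_dv * 1.0    # dv/du = 1
dJ_db = dJ_du * c      # du/db = c
dJ_dc = dJ_du * b      # du/dc = b

print(J, dJ_da, dJ_db, dJ_dc)   # 33.0 3.0 6.0 9.0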
Artificial Neural Networks

What is it?
• An artificial neural network is a crude way of trying to
simulate the human brain (digitally)
• Human brain – Approx 10 billion neurons
• Each neuron connected with thousands of others
• Parts of a neuron
– Cell body
– Dendrites – receive input signals
– Axons – give output signals
What is ANN
• ANN – made up of artificial neurons
– Digitally modeled biological neurons
• Each input into the neuron has its own weight associated with it
• As each input enters the nucleus (the blue circle in the original figure), it is multiplied by its weight
• The nucleus sums all these weighted input values, which gives us the activation
• For n inputs and n weights – each weight is multiplied by its input and the products are summed
What is ANN
• If the activation is greater than a threshold value, the neuron outputs a signal (for example, 1)
• If the activation is less than the threshold, the neuron outputs zero
• This thresholding is typically called a step function
What is ANN
• The combination of summation and thresholding is called a node

http://www-cse.uta.edu/~cook/ai1/lectures/figures/neuron.jpg

• For the step (activation) function, the output is 1 if:

x1·w1 + x2·w2 + x3·w3 + ... + xn·wn > T
What is ANN

x1·w1 + x2·w2 + x3·w3 + ... + xn·wn > T
x1·w1 + x2·w2 + x3·w3 + ... + xn·wn − T > 0

Let w0 = −T and x0 = 1:

D = x0·w0 + x1·w1 + x2·w2 + x3·w3 + ... + xn·wn

Output is 1 if D > 0;
Output is 0 otherwise.

w0 is called a bias weight
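A minimal sketch of this unit in Python (names are illustrative, not from the slides): the threshold T is folded into the bias weight w0 = -T with a constant input x0 = 1, so the comparison against T becomes a comparison against 0.

def neuron_output(weights, inputs):
    # weights[0] is the bias weight w0 = -T, paired with constant input x0 = 1
    d = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if d > 0 else 0

# A unit with threshold T = 1.5 (so w0 = -1.5) and weights w1 = w2 = 1
print(neuron_output([-1.5, 1.0, 1.0], [1, 1]))   # 1: activation 2 > 1.5
print(neuron_output([-1.5, 1.0, 1.0], [1, 0]))   # 0: activation 1 < 1.5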


Typical activation functions

• Step function:    Y = 1 if X ≥ 0, else 0
• Sign function:    Y = +1 if X ≥ 0, else −1
• Sigmoid function: Y = 1 / (1 + e^(−X))
• Linear function:  Y = X

The activation function controls when the unit is “active” or “inactive”
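A direct transcription of the four functions in Python (a sketch, using only the standard library):

import math

def step(x):    return 1 if x >= 0 else 0
def sign(x):    return +1 if x >= 0 else -1
def sigmoid(x): return 1 / (1 + math.exp(-x))
def linear(x):  return x

for f in (step, sign, sigmoid, linear):
    print(f.__name__, f(-2.0), f(0.0), f(2.0))
# step: 0 1 1; sign: -1 1 1; sigmoid: ~0.12 0.5 ~0.88; linear: -2.0 0.0 2.0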


Simplest Classifier

Can a single neuron learn a task?


A motivating example

• Each day you get lunch at the cafeteria.


– Your diet consists of fish, chips, and drink.
– You get several portions of each
• The cashier only tells you the total price of the meal
– After several days, you should be able to figure out the price of
each portion.
• Each meal price gives a linear constraint on the prices of the
portions:

price = x_fish · w_fish + x_chips · w_chips + x_drink · w_drink


Solving the problem
• The prices of the portions are like the weights of a linear neuron:

w = (w_fish, w_chips, w_drink)

• We will start with guesses for the weights and then adjust the guesses to give a better fit to the prices given by the cashier.
The Cashier’s Brain

[Figure: a linear neuron with weights 150, 50, 100 applied to 2 portions of fish, 5 portions of chips and 3 portions of drink; price of meal = 2·150 + 5·50 + 3·100 = 850.]
A model of the cashier’s brain with arbitrary initial weights

[Figure: the same neuron with all three weights set to 50, giving price of meal = 2·50 + 5·50 + 3·50 = 500.]

• Residual error = 850 − 500 = 350
• The learning rule is:

Δw_i = ε · x_i · (y − ŷ)

• With a learning rate ε of 1/35, the weight changes are +20, +50, +30
• This gives new weights of 70, 100, 80
• Notice that the weight for chips got worse!
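A minimal sketch of this single update in Python (variable names are mine), reproducing the numbers above:

x = [2, 5, 3]             # portions of fish, chips, drink
w = [50.0, 50.0, 50.0]    # arbitrary initial weights (guessed prices)
y = 850                   # true price given by the cashier
lr = 1 / 35               # learning rate

y_hat = sum(wi * xi for wi, xi in zip(w, x))    # predicted price: 500
error = y - y_hat                               # residual error: 350
w = [wi + lr * xi * error for wi, xi in zip(w, x)]
print(w)    # [70.0, 100.0, 80.0]

Repeating this update over many meals moves the weights toward the true portion prices, even though a single step can move an individual weight (here, chips) the wrong way.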
Perceptron
• In 1958, Frank Rosenblatt introduced a training algorithm that
provided the first procedure for training a simple ANN: a
Perceptron.
[Figure: a two-input perceptron. Inputs x1 and x2, with weights w1 and w2, feed a linear combiner; a hard limiter with a threshold then produces the output Y.]
Implementing ‘OR’ with Perceptron
Implementing ‘AND’ with Perceptron
Implementing ‘XOR’ with Perceptron
Linearly Separable
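The figures for these four slides are not reproduced; the sketch below (weights hand-picked by me) shows a single perceptron computing OR and AND. No choice of weights reproduces XOR, because its positive and negative examples cannot be split by one straight line: OR and AND are linearly separable, XOR is not.

def perceptron(w0, w1, w2, x1, x2):
    # w0 is the bias weight (w0 = -threshold); inputs are 0 or 1
    return 1 if w0 + w1 * x1 + w2 * x2 > 0 else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        or_out  = perceptron(-0.5, 1, 1, x1, x2)   # fires when x1 + x2 > 0.5
        and_out = perceptron(-1.5, 1, 1, x1, x2)   # fires when x1 + x2 > 1.5
        print(x1, x2, or_out, and_out)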
Perceptron Learning
Simple Perceptron
Perceptron Learning Algorithm
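The algorithm slides are figure-only in this export, so here is a hedged sketch of the standard perceptron learning rule (w_i <- w_i + eta * (t - o) * x_i, cycling through the examples until all are classified correctly), trained on AND:

def train_perceptron(samples, lr=0.1, epochs=100):
    # samples: list of ((x1, x2), target); w[0] is the bias weight
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        errors = 0
        for (x1, x2), t in samples:
            o = 1 if w[0] + w[1] * x1 + w[2] * x2 > 0 else 0
            for i, xi in enumerate((1, x1, x2)):   # 1 is the bias input
                w[i] += lr * (t - o) * xi
            errors += t != o
        if errors == 0:    # converged: every example classified correctly
            break
    return w

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
print(train_perceptron(AND))    # weights of a line separating AND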
Learning Example

[Slides: the perceptron learning rule traced step by step on a worked example (figures not reproduced).]
Perceptron Learning
Mean Square Algorithm – Gradient Descent
Gradient Descent vs. Batch Gradient
Perceptron Training Rule (X)
Incremental Gradient Descent (X)
Gradient Descent – Algorithm

[Slides: the gradient descent algorithm traced step by step on an example of perceptron learning: the logical operation AND (figures not reproduced).]
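As a stand-in for the missing figures, a minimal batch gradient descent sketch for a linear unit minimising the squared error E = 1/2 * sum((t - o)^2) over the training set (notation assumed from standard treatments, not copied from the slides), fitted to the AND targets:

def gradient_descent(samples, lr=0.05, epochs=500):
    # samples: ((x1, x2), target); linear unit o = w0 + w1*x1 + w2*x2
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for (x1, x2), t in samples:            # batch: sum over all examples
            o = w[0] + w[1] * x1 + w[2] * x2
            for i, xi in enumerate((1, x1, x2)):
                grad[i] += (t - o) * xi        # -dE/dw_i for this example
        w = [wi + lr * g for wi, g in zip(w, grad)]
    return w

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
print(gradient_descent(AND))    # approaches the least-squares fit [-0.25, 0.5, 0.5]

In incremental (stochastic) gradient descent the weights would instead be updated after each individual example, as in the cashier example earlier.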
Multi-Layer Perceptron

[Slides: multi-layer perceptron architecture and notation (figures not reproduced).]
Back Propagation

[Slides: derivation of the back-propagation algorithm (figures not reproduced).]
Back Propagation – Example

[Slides: worked back-propagation example (figures not reproduced).]
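In place of the missing worked example, a self-contained back-propagation sketch on a 2-2-1 sigmoid network learning XOR (architecture, learning rate and seed are my choices, not the slides'):

import math, random

random.seed(0)
sig = lambda x: 1 / (1 + math.exp(-x))

# 2-2-1 network: each unit holds weights [bias, w_in1, w_in2]
h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
o = [random.uniform(-1, 1) for _ in range(3)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
lr = 0.5

for _ in range(20000):
    for (x1, x2), t in XOR:
        # forward pass
        hv = [sig(w[0] + w[1] * x1 + w[2] * x2) for w in h]
        ov = sig(o[0] + o[1] * hv[0] + o[2] * hv[1])
        # backward pass: output delta = error * sigmoid derivative,
        # hidden deltas via the chain rule through the output weights
        d_o = (t - ov) * ov * (1 - ov)
        d_h = [d_o * o[i + 1] * hv[i] * (1 - hv[i]) for i in range(2)]
        # weight updates
        for i, xi in enumerate((1, hv[0], hv[1])):
            o[i] += lr * d_o * xi
        for j in range(2):
            for i, xi in enumerate((1, x1, x2)):
                h[j][i] += lr * d_h[j] * xi

for (x1, x2), t in XOR:
    hv = [sig(w[0] + w[1] * x1 + w[2] * x2) for w in h]
    print(x1, x2, round(sig(o[0] + o[1] * hv[0] + o[2] * hv[1]), 2))
# typically converges to ~0, ~1, ~1, ~0 (backprop can occasionally get
# stuck in a local minimum on XOR, depending on the initial weights)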
XOR – Revisited

[Figure: the four inputs (0,0), (0,1), (1,0), (1,1) plotted for AND and for XOR; AND is separable by a single straight line, XOR is not.] [Russell & Norvig, 1995]

• Piece-wise linear separation

[Figure: XOR separated piece-wise by two lines (1 and 2); the positive points (0,1) and (1,0) lie between them.]
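A hedged sketch of this piece-wise separation with hand-set weights (thresholds are my choice): hidden unit 1 fires above line 1 (x1 + x2 = 0.5), hidden unit 2 fires above line 2 (x1 + x2 = 1.5), and the output fires only between the two lines.

step = lambda x: 1 if x > 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)    # 1 when above line 1
    h2 = step(x1 + x2 - 1.5)    # 1 when above line 2
    return step(h1 - h2 - 0.5)  # fire only between the two lines

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))   # 0,0->0  0,1->1  1,0->1  1,1->0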
BackPropagation

Backpropagation Algorithm

Computational complexity

• Could lead to a very large number of calculations

[Figure: influence maps; each input unit influences every hidden unit (influence map, layer 1), and each hidden unit influences every output unit (influence map, layer 2).]
• Sensitivity to noise
– Very tolerant
• Transparency
– Neural networks are essentially black boxes
– There is no explanation or trace for a particular answer
– Tools for the analysis of networks are very limited
Different non-linearly separable problems
Other neural networks

Multilayer neural networks trained with the back-propagation algorithm are used for pattern recognition problems.

However, to emulate the human memory’s associative characteristics we need a different type of network: a recurrent neural network.

A recurrent neural network has feedback loops from its outputs to its inputs.
RESOURCES

Acknowledgements

• Dr. Imran Siddiqi, Bahria University, Islamabad
• Machine Intelligence, Dr. M. Hanif, UET, Lahore
• Machine Learning, S. Stock, University of Nebraska
• Lecture Slides, Introduction to Machine Learning, E. Alpaydin, MIT Press
• Machine Learning, Andrew Ng, Stanford University
• Fisher kernels for image representation & generative classification models, Jakob Verbeek
Thank You
