
ARTIFICIAL NEURAL NETWORK

■  Introduction, or how the brain works


■  The neuron as a simple computing element
■  The perceptron
■  Multilayer neural networks
Neural Networks and Deep Learning: Using neural nets to recognize handwritten digits

The human visual system is one of the wonders of the world. Consider the following sequence of handwritten digits:

[Image: a sequence of six handwritten digits reading 504192]

We effortlessly recognize those digits as 504192. That ease is deceptive. In each hemisphere of our brain, humans have a primary visual cortex, also known as V1, containing 140 million neurons, with tens of billions of connections between them. And yet human vision involves not just V1, but an entire series of visual cortices - V2, V3, V4, and V5 - doing progressively more complex image processing. We carry in our heads a supercomputer, tuned by evolution over hundreds of millions of years, and superbly adapted to understand the visual world. Recognizing handwritten digits isn't easy.

Neural networks approach the problem in a different way. The idea is to take a large number of handwritten digits, known as training examples, and then develop a system which can learn from those training examples. In other words, the neural network uses the examples to automatically infer rules for recognizing handwritten digits. Furthermore, by increasing the number of training examples, the network can learn more about handwriting, and so improve its accuracy.
Introduction, or how the brain works
Machine learning involves adaptive mechanisms that
enable computers to learn from experience, learn by
example and learn by analogy. Learning capabilities can
improve the performance of an intelligent system over
time. The most popular approaches to machine learning
are artificial neural networks and genetic algorithms.
This lecture is dedicated to neural networks.
A neural network can be defined as a model of
reasoning based on the human brain. The
brain consists of a densely interconnected set
of nerve cells, or basic information-processing
units, called neurons.

The human brain incorporates nearly 10 billion neurons and 60 trillion connections, synapses, between them. By using multiple neurons simultaneously, the brain can perform its functions much faster than the fastest computers in existence today.

Each neuron has a very simple structure, but an army of such elements constitutes a tremendous processing power.

A neuron consists of a cell body, soma, a number of fibers called dendrites, and a single long fiber called the axon.

[Figure: Biological neural network - somas, dendrites, axons, and the synapses connecting neighbouring neurons]
Our brain can be considered a highly complex, non-linear and parallel information-processing system.

Information is stored and processed in a neural network simultaneously throughout the whole network, rather than at specific locations. In other words, in neural networks, both data and its processing are global rather than local.
Learning is a fundamental and essential characteristic of biological neural networks. The ease with which they can learn led to attempts to emulate a biological neural network in a computer.

An artificial neural network consists of a number of very simple processors, also called neurons, which are analogous to the biological neurons in the brain.

The neurons are connected by weighted links passing signals from one neuron to another.

The output signal is transmitted through the neuron’s outgoing connection. The outgoing connection splits into a number of branches that transmit the same signal. The outgoing branches terminate at the incoming connections of other neurons in the network.
Architecture of a typical artificial neural network

[Figure: a layered feedforward network - input signals enter the Input Layer, pass through the Middle Layer, and leave the Output Layer as output signals]
Analogy between biological and
artificial neural networks

Biological Neural Network    Artificial Neural Network
Soma                         Neuron
Dendrite                     Input
Axon                         Output
Synapse                      Weight
The neuron as a simple computing element

[Figure: Diagram of a neuron - input signals x1, x2, …, xn arrive over weighted links w1, w2, …, wn; the neuron combines them and transmits the same output signal Y on each outgoing branch]
■  The neuron's output, Y, is determined by whether the weighted sum of its inputs is less than or greater than some threshold value. Just like the weights, the threshold is a real number which is a parameter of the neuron.

■  The neuron computes the weighted sum of the input signals and compares the result with a threshold value, θ. If the net input is less than the threshold, the neuron output is −1. But if the net input is greater than or equal to the threshold, the neuron becomes activated and its output attains the value +1.
[Figure: a perceptron with inputs x1, x2, x3, weights w1, w2, w3, and output Y]

In the example shown the perceptron has three inputs; in general it could have more or fewer inputs. Rosenblatt proposed a simple rule to compute the output. He introduced weights, w1, w2, …, real numbers expressing the importance of the respective inputs to the output. The neuron's output is determined by whether the weighted sum is less than or greater than some threshold value. Just like the weights, the threshold is a real number which is a parameter of the neuron.

■  That's the basic mathematical model. A way you can think about the perceptron is that it's a device that makes decisions by weighing up evidence.

■  The neuron uses the following transfer or activation function. To put it in more precise algebraic terms:

X = ∑ xi wi, summed over i = 1, …, n

Y = +1 if X ≥ θ; Y = −1 if X < θ

■  This type of activation function is called a sign function.


That's all there is to how a perceptron works!
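Concretely, this computing element is a few lines of code. A minimal Python sketch (the function name and the sample numbers are illustrative, not from the lecture):

```python
def neuron_output(inputs, weights, theta):
    """Sign-activation neuron: weighted sum compared against threshold theta."""
    # X = sum of xi * wi over all inputs
    x = sum(xi * wi for xi, wi in zip(inputs, weights))
    # Y = +1 if X >= theta, -1 otherwise (the sign activation above)
    return 1 if x >= theta else -1

# Example: two inputs, weights 0.5 and -0.3, threshold 0.1
print(neuron_output([1, 1], [0.5, -0.3], 0.1))  # -> 1, since 0.2 >= 0.1
```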
Can a single neuron learn a task?

■  In 1958, Frank Rosenblatt introduced a training algorithm that provided the first
procedure for training a simple ANN: a perceptron.
■  The perceptron is the simplest form of a neural network. It consists of a single neuron
with adjustable synaptic weights and a hard limiter.

[Figure: Single-layer two-input perceptron - inputs x1 and x2 with weights w1 and w2 feed a linear combiner ∑; a hard limiter with threshold θ produces the output Y]
Perceptron Example (Chapter 1, pp. 1-10)

Suppose the weekend is coming up, and you've heard that there's going to be a cheese festival in your city. You like cheese, and are trying to decide whether or not to go to the festival. You might make your decision by weighing up three factors:

1. Is the weather good?
2. Does your boyfriend or girlfriend want to accompany you?
3. Is the festival near public transit? (You don't own a car.)

We can represent these three factors by corresponding binary variables X1, X2, and X3. For instance, we'd have X1 = 1 if the weather is good, and X1 = 0 if the weather is bad. Similarly, X2 = 1 if your boyfriend or girlfriend wants to go, and X2 = 0 if not. And similarly again for X3 and public transit: X3 = 1 if the festival is near, and X3 = 0 if not.

Now, suppose you absolutely adore cheese, so much so that you're happy to go to the festival even if your boyfriend or girlfriend is uninterested and the festival is hard to get to. But perhaps you really loathe bad weather, and there's no way you'd go to the festival if the weather is bad. You can use perceptrons to model this kind of decision-making.

One way to do this is to choose a weight w1 = 6 for the weather, and w2 = 2 and w3 = 2 for the other conditions. The larger value of w1 indicates that the weather matters a lot to you, much more than whether your boyfriend or girlfriend joins you, or the nearness of public transit. Finally, suppose you choose a threshold of 5 for the perceptron. With these choices, the perceptron implements the desired decision-making model, outputting 1 whenever the weather is good, and 0 whenever the weather is bad. It makes no difference to the output whether your boyfriend or girlfriend wants to go, or whether public transit is nearby.
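Written as code, this decision model is tiny. A minimal Python sketch using the weights and threshold from the example (the function name and the 1 = go / 0 = stay-home convention are illustrative):

```python
def festival_perceptron(x1, x2, x3):
    """Decide whether to go to the festival (1 = go, 0 = stay home).

    x1: weather is good, x2: partner wants to go, x3: near public transit.
    Weights and threshold follow the example above.
    """
    w1, w2, w3 = 6, 2, 2
    theta = 5
    weighted_sum = w1 * x1 + w2 * x2 + w3 * x3
    return 1 if weighted_sum >= theta else 0

# Bad weather, partner keen, transit nearby: 2 + 2 = 4 < 5, so stay home.
print(festival_perceptron(0, 1, 1))  # -> 0
# Good weather alone clears the threshold: 6 >= 5, so go.
print(festival_perceptron(1, 0, 0))  # -> 1
```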
The Perceptron

The operation of Rosenblatt’s perceptron is based on the McCulloch and Pitts neuron model. The model consists of a linear combiner followed by a hard limiter.

The weighted sum of the inputs is applied to the hard limiter, which produces an output equal to +1 if its input is positive and −1 if it is negative.

[Figure: Single-layer two-input perceptron - linear combiner ∑ followed by a hard limiter with threshold θ, producing output Y]

The aim of the perceptron is to classify inputs x1, x2, …, xn into one of two classes, say A1 and A2.
In the case of an elementary perceptron, the n-dimensional space is divided by a hyperplane into
two decision regions. The hyperplane is defined by the linearly separable function:

∑ xi wi − θ = 0, summed over i = 1, …, n
[Figure: (a) Two-input perceptron - the line x1w1 + x2w2 − θ = 0 separates Class A1 from Class A2 in the (x1, x2) plane. (b) Three-input perceptron - the plane x1w1 + x2w2 + x3w3 − θ = 0 divides the (x1, x2, x3) space into the two decision regions.]
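Phrased as code, the classifier just checks which side of the hyperplane a point falls on. A minimal Python sketch (the string labels "A1"/"A2" and the example boundary are my own choices):

```python
def classify(inputs, weights, theta):
    """Assign a point to class A1 or A2 by its side of the hyperplane."""
    # The hyperplane is sum(xi * wi) - theta = 0
    net = sum(xi * wi for xi, wi in zip(inputs, weights)) - theta
    return "A1" if net >= 0 else "A2"

# Two-input example: decision boundary is the line x1 + x2 - 1.5 = 0
print(classify([1, 1], [1.0, 1.0], 1.5))  # -> A1 (on or above the line)
print(classify([0, 1], [1.0, 1.0], 1.5))  # -> A2 (below the line)
```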
HOW DOES THE PERCEPTRON LEARN ITS CLASSIFICATION
TASKS?
This is done by making small adjustments in the weights to reduce the
difference between the actual and desired outputs of the perceptron.
The initial weights are randomly assigned, usually in the range [-0.5,
0.5], and then updated to obtain the output consistent with the training
examples.

■  If at iteration p, the actual output is Y(p) and the desired output is Yd(p), then the error is given by:

e(p) = Yd(p) − Y(p), where p = 1, 2, 3, …

Iteration p here refers to the pth training example presented to the perceptron.
■  If the error, e(p), is positive, we need to increase perceptron output Y(p), but if it is
negative, we need to decrease Y(p).

The perceptron learning rule

wi(p + 1) = wi(p) + α · xi(p) · e(p), where p = 1, 2, 3, …

α is the learning rate, a positive constant less than unity.

The perceptron learning rule was first proposed by Rosenblatt in 1960. Using this rule we can derive the perceptron training algorithm for classification tasks.
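For instance, take the first corrective update in the AND example below: with α = 0.1, w1(p) = 0.3, x1(p) = 1 and e(p) = −1, the rule gives w1(p + 1) = 0.3 + 0.1 · 1 · (−1) = 0.2, exactly the step shown in the third row of the table.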
Perceptron’s training algorithm

Step 1: Initialisation
Set initial weights w1, w2,…, wn and threshold θ to random numbers
in the range [-0.5, 0.5].

Perceptron’s training algorithm (continued)

Step 2: Activation
Activate the perceptron by applying inputs x1(p), x2(p), …, xn(p) and desired output Yd(p). Calculate the actual output at iteration p = 1:

Y(p) = step[∑ xi(p) wi(p) − θ], summed over i = 1, …, n

where n is the number of the perceptron inputs, and step is a step activation function.
Perceptron’s training algorithm (continued)

Step 3: Weight training

Update the weights of the perceptron:

wi(p + 1) = wi(p) + Δwi(p)

where Δwi(p) is the weight correction at iteration p. The weight correction is computed by the delta rule:

Δwi(p) = α · xi(p) · e(p)

Step 4: Iteration
Increase iteration p by one, go back to Step 2 and repeat the process until convergence.
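A rough from-scratch Python sketch of Steps 1-4 (names and structure are mine, not the lab's reference code; the round() call keeps borderline sums at the threshold behaving like exact decimal arithmetic):

```python
import random

def step(x):
    """Step activation: 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0

def train_perceptron(examples, n_inputs, theta, alpha, max_epochs=100, weights=None):
    """Steps 1-4 of the training algorithm above.

    examples: list of (inputs, desired_output) pairs, e.g. ([1, 0], 0).
    """
    # Step 1: Initialisation - random weights in [-0.5, 0.5] unless supplied
    if weights is None:
        weights = [random.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for inputs, yd in examples:
            # Step 2: Activation - Y(p) = step(sum of xi(p)*wi(p) - theta)
            y = step(sum(x * w for x, w in zip(inputs, weights)) - theta)
            e = yd - y  # error e(p) = Yd(p) - Y(p)
            errors += abs(e)
            # Step 3: Weight training - delta rule, delta_wi = alpha * xi * e
            weights = [round(w + alpha * x * e, 10)  # rounding avoids float drift
                       for w, x in zip(weights, inputs)]
        # Step 4: Iteration - repeat until an epoch produces no errors
        if errors == 0:
            return weights, epoch
    return weights, max_epochs
```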
Example of perceptron learning: the logical operation AND
        Inputs   Desired     Initial      Actual   Error     Final
Epoch   x1  x2   output Yd   w1     w2    output Y   e      w1    w2
  1      0   0       0       0.3  −0.1       0       0      0.3  −0.1
         0   1       0       0.3  −0.1       0       0      0.3  −0.1
         1   0       0       0.3  −0.1       1      −1      0.2  −0.1
         1   1       1       0.2  −0.1       0       1      0.3   0.0
  2      0   0       0       0.3   0.0       0       0      0.3   0.0
         0   1       0       0.3   0.0       0       0      0.3   0.0
         1   0       0       0.3   0.0       1      −1      0.2   0.0
         1   1       1       0.2   0.0       1       0      0.2   0.0
  3      0   0       0       0.2   0.0       0       0      0.2   0.0
         0   1       0       0.2   0.0       0       0      0.2   0.0
         1   0       0       0.2   0.0       1      −1      0.1   0.0
         1   1       1       0.1   0.0       0       1      0.2   0.1
  4      0   0       0       0.2   0.1       0       0      0.2   0.1
         0   1       0       0.2   0.1       0       0      0.2   0.1
         1   0       0       0.2   0.1       1      −1      0.1   0.1
         1   1       1       0.1   0.1       1       0      0.1   0.1
  5      0   0       0       0.1   0.1       0       0      0.1   0.1
         0   1       0       0.1   0.1       0       0      0.1   0.1
         1   0       0       0.1   0.1       0       0      0.1   0.1
         1   1       1       0.1   0.1       1       0      0.1   0.1

Threshold: θ = 0.2; learning rate: α = 0.1
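Assuming the train_perceptron sketch above is in scope, the table's run can be reproduced by supplying its initial weights instead of random ones:

```python
AND_EXAMPLES = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

weights, epochs = train_perceptron(
    AND_EXAMPLES, n_inputs=2, theta=0.2, alpha=0.1, weights=[0.3, -0.1]
)
print(weights, epochs)  # -> [0.1, 0.1] 5, matching the table's final row
```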
■  Lab Tutorial (Python from scratch)
■  Chapter 10: Perceptron (pp. 80-88)

■  How to train the network weights for the perceptron
■  How to make predictions with the perceptron
■  How to implement the Perceptron algorithm for a real-world classification problem
Two-dimensional plots of basic logical operations
[Figure: plots in the (x1, x2) plane of (a) AND (x1 ∩ x2), (b) OR (x1 ∪ x2), and (c) Exclusive-OR (x1 ⊕ x2); for AND and OR a straight line separates the 0-outputs from the 1-outputs, but for Exclusive-OR no such line exists]

A perceptron can learn the operations AND and OR, but not Exclusive-OR. The Exclusive-OR points are not linearly separable: no single straight line can put (0, 1) and (1, 0) on one side and (0, 0) and (1, 1) on the other.
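Assuming the train_perceptron sketch above is in scope, this can be checked empirically: since no weight vector gets all four Exclusive-OR examples right, every epoch contains at least one error, and training only stops at the epoch cap.

```python
XOR_EXAMPLES = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

weights, epochs = train_perceptron(
    XOR_EXAMPLES, n_inputs=2, theta=0.2, alpha=0.1, weights=[0.3, -0.1]
)
print(epochs)  # -> 100 (max_epochs): the weights cycle and never converge
```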
Activation functions of a neuron

[Figure: graphs of the four activation functions, each plotting Y against X]

Step function:    Y = 1 if X ≥ 0; Y = 0 if X < 0
Sign function:    Y = +1 if X ≥ 0; Y = −1 if X < 0
Sigmoid function: Y = 1 / (1 + e^−X)
Linear function:  Y = X
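These four functions are one-liners in Python; a minimal sketch of the formulas above (function names are illustrative):

```python
import math

def step(x):
    """Step: Y = 1 if X >= 0, else 0."""
    return 1 if x >= 0 else 0

def sign(x):
    """Sign: Y = +1 if X >= 0, else -1."""
    return 1 if x >= 0 else -1

def sigmoid(x):
    """Sigmoid: Y = 1 / (1 + e^-X), a smooth squash into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def linear(x):
    """Linear: Y = X, the identity on the net input."""
    return x

for f in (step, sign, sigmoid, linear):
    print(f.__name__, f(-2.0), f(2.0))
```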
