12 Neural Network


Artificial Intelligence

Neural Network
The idea of ANNs
▪ ANNs learn relationships between cause and effect, or organize large volumes of data
into orderly and informative patterns.

Example: shown an unfamiliar picture ("What is that?") and the candidate labels frog,
lion, and bird, a trained network answers "It's a frog".
Artificial Neural Network
▪ ANN Definition:
“Data processing system consisting of a large number of simple, highly interconnected
processing elements (artificial neurons) in an architecture inspired by the structure of the
cerebral cortex of the brain”

(Tsoukalas & Uhrig, 1997)

▪ Neural network: information processing paradigm inspired by biological nervous
systems, such as our brain
▪ Structure: large number of highly interconnected processing elements (neurons)
working together
▪ Like people, they learn from experience (by example)
Artificial Neural Networks
▪ Computational models inspired by the human brain:
▪ Algorithms that try to mimic the brain.

▪ Massively parallel, distributed systems made up of simple processing units (neurons)
▪ Synaptic connection strengths among neurons are used to store the acquired knowledge.
▪ Knowledge is acquired by the network from its environment through a learning process.
Biological Inspirations
Biological Neuron
▪ Dendrites: nerve fibres carrying electrical signals to the cell
▪ Soma (cell body): computes a non-linear function of its inputs
▪ Axon: single long fiber that carries the electrical signal from the cell body to other
neurons
▪ Synapse: the point of contact between the axon of one cell and the dendrite of another,
regulating a chemical connection whose strength affects the input to the cell
Biological Inspirations
▪ Humans perform complex tasks like vision, motor control, or language understanding
very well.
▪ One way to build intelligent machines is to try to imitate the (organizational
principles of the) human brain.
▪ A variety of different neurons exist (motor neurons, on-center off-surround visual
cells, ...), with different branching structures.
▪ The connections of the network and the strengths of the individual synapses establish
the function of the network.
Artificial Neuron (Perceptron)

[Figure: inputs x0, x1, ..., xn with weights w0, w1, ..., wn feed a summing unit that
computes Σ_{i=0}^{n} x_i w_i; the sum passes through an activation function f to give
the output y]

For example, with input vector x, weight vector w, a weighted sum and an activation
function:

    y = sign( Σ_{i=0}^{n} w_i x_i )

▪ The n-dimensional input vector x is mapped into the variable y by means of the scalar
product and a nonlinear function mapping.
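For concreteness, here is a minimal sketch of this computation in Python; the input and
weight values are made up for illustration, not taken from the slides.

```python
# A rough sketch of the artificial neuron above: the scalar product of inputs and
# weights is passed through a sign-style activation.
import numpy as np

def artificial_neuron(x, w):
    net = np.dot(w, x)               # weighted sum: sum_{i=0}^{n} w_i * x_i
    return 1 if net > 0 else -1      # y = sign(net)

x = np.array([1.0, 0.5, -0.2, 0.7])  # x[0] can play the role of a clamped bias input
w = np.array([-0.1, 0.6, 0.3, 0.4])
print(artificial_neuron(x, w))       # -> 1  (net = -0.1 + 0.3 - 0.06 + 0.28 = 0.42)
```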
Key features of ANN are suggested by the properties of biological neurons:
▪ The processing element receives many signals.
▪ Signals may be modified by a weight at the receiving synapse.
▪ The processing element sums the weighted inputs.
▪ Under appropriate circumstances (sufficient input), the neuron transmits a single output.
▪ The output from a particular neuron may go to many other neurons.

[Figure: inputs x0 ... xn with weights w0 ... wn, summation Σ_{i=0}^{n} x_i w_i,
activation, output y]
Key features of ANN are suggested by the properties of biological neurons:
▪ The strength of the connection between two neurons is stored as a weight value for that
specific connection.
▪ Learning the solution to a problem = changing the connection weights

[Figure: the same single-neuron diagram as above]
ANN = Simplified Biology
Properties of ANNs
▪ Learning from examples
▪ labeled or unlabeled
▪ Adaptivity
▪ changing the connection strengths to learn things
▪ Non-linearity
▪ the non-linear activation functions are essential
▪ Fault tolerance
▪ if one of the neurons or connections is damaged, the whole network still works quite well
▪ Thus, they might be better alternatives than classical solutions for problems
characterized by
▪ high dimensionality, noisy, imprecise or imperfect data
▪ a lack of a clearly stated mathematical solution or algorithm
Characterization of ANN
ANNs can be characterized along three axes: interconnections, learning, and activation
functions.
▪ Interconnections
  ▪ Feed-forward: single layer, multilayer
  ▪ Feedback: single layer, multilayer
  ▪ Recurrent
▪ Learning
  ▪ Learning rules: parameter learning, structure learning
  ▪ Learning types: supervised, unsupervised, reinforcement
▪ Activation functions
  ▪ Binary step
  ▪ Linear
  ▪ Non-linear: Sigmoid, Softmax, TanH, ReLU, Leaky ReLU
Characterization of ANN: Interconnections
▪ Feed-forward: single layer, multilayer
▪ Feedback: single layer, multilayer
▪ Recurrent
ANN Models
[Figures: examples of ANN model architectures]
Learning in ANN
▪ Learning rules
  ▪ Parameter learning: connection weights are updated
  ▪ Structure learning: the network structure itself is changed
▪ Learning types: supervised, unsupervised, reinforcement

[Figure: supervised learning loop. The input X is fed to the neural network, which
produces the predicted output Ŷ. An error-signal generator compares Ŷ with the desired
output Y and feeds the error (Y − Ŷ) back to adjust the weights W.]
Activation Function
▪ Binary step
▪ Linear
▪ Non-linear: Sigmoid, Softmax, TanH, ReLU, Leaky ReLU
Perceptron
▪ A perceptron
  ▪ computes the weighted sum of its inputs (called its net input)
  ▪ adds its bias
  ▪ passes this value through an activation function

    net = Σ_{i=1}^{n} w_i x_i + b

    Y = 1 if net > 0
        0 otherwise

▪ We say that the neuron "fires" (i.e. becomes active) if its output is above zero.
▪ The bias can be incorporated as another weight clamped to a fixed input of +1.0.
▪ This extra free variable (bias) makes the neuron more powerful.

[Figure: inputs x1 ... xn with weights w1 ... wn and bias b feeding the output Y]
Perceptron (bias as a weight)
▪ The same perceptron, with the bias incorporated as the weight w0 on an extra input x0
clamped to +1.0:

    net = Σ_{i=0}^{n} w_i x_i

    Y = 1 if net > 0
        0 otherwise

[Figure: inputs x0 = +1, x1, ..., xn with weights w0, w1, ..., wn feeding the output Y]
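A small sketch of this version of the perceptron, with the bias folded in as w[0] on a
clamped input x0 = +1; the example values are illustrative, not from the slides.

```python
# 0/1 perceptron with the bias treated as w[0] on a clamped input x0 = +1.
import numpy as np

def perceptron(x, w):
    """Fires (outputs 1) when the net input exceeds zero."""
    x = np.concatenate(([1.0], x))   # prepend the clamped bias input x0 = +1
    net = np.dot(w, x)               # net = sum_{i=0}^{n} w_i * x_i
    return 1 if net > 0 else 0

w = np.array([0.2, -0.4, 0.7])       # w[0] acts as the bias weight
print(perceptron(np.array([0.5, 1.0]), w))   # net = 0.2 - 0.2 + 0.7 = 0.7 > 0  ->  1
print(perceptron(np.array([1.0, 0.0]), w))   # net = 0.2 - 0.4 + 0.0 = -0.2     ->  0
```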
Perceptrons
[Figure: inputs x0, x1, x2, ..., xn with weights wk0, wk1, wk2, ..., wkn feeding hidden
units H1, H2 and the output Y]

    net = Σ_{i=0}^{n} w_ki x_i
Inverter (NOT)
A single perceptron with w0 = 0.5 and w1 = -1 implements the inverter:

  input x1 | output Y
  ---------+---------
      0    |    1
      1    |    0

Boolean OR
A single perceptron with w0 = -0.5, w1 = 1, w2 = 1 implements OR:

  x1 | x2 | output Y
  ---+----+---------
   0 |  0 |    0
   0 |  1 |    1
   1 |  0 |    1
   1 |  1 |    1

[Figure: in the (x1, x2) plane, OR is linearly separable - a single line separates the
negative point (0, 0) from the three positive points]
Boolean AND
A single perceptron with w0 = -1.5, w1 = 1, w2 = 1 implements AND:

  x1 | x2 | output Y
  ---+----+---------
   0 |  0 |    0
   0 |  1 |    0
   1 |  0 |    0
   1 |  1 |    1

[Figure: in the (x1, x2) plane, AND is linearly separable - a single line separates the
positive point (1, 1) from the other three points]
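The gate weights from these slides can be checked directly with a step perceptron; the
helper below is the same sketch as before, repeated so the block runs on its own.

```python
# Checking the NOT, OR, and AND weights from the slides with a 0/1 step perceptron.
import numpy as np

def perceptron(x, w):
    return 1 if np.dot(w, np.concatenate(([1.0], x))) > 0 else 0

NOT_W = np.array([0.5, -1.0])        # inverter:  w0 = 0.5,  w1 = -1
OR_W  = np.array([-0.5, 1.0, 1.0])   # OR:        w0 = -0.5, w1 = w2 = 1
AND_W = np.array([-1.5, 1.0, 1.0])   # AND:       w0 = -1.5, w1 = w2 = 1

print("NOT:", [perceptron(np.array([x]), NOT_W) for x in (0.0, 1.0)])   # [1, 0]
for x1 in (0.0, 1.0):
    for x2 in (0.0, 1.0):
        x = np.array([x1, x2])
        print(int(x1), int(x2), "OR:", perceptron(x, OR_W), "AND:", perceptron(x, AND_W))
```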
Boolean XOR
XOR cannot be implemented by a single perceptron:

  x1 | x2 | output Y
  ---+----+---------
   0 |  0 |    0
   0 |  1 |    1
   1 |  0 |    1
   1 |  1 |    0

[Figure: in the (x1, x2) plane, the positive points (0, 1) and (1, 0) cannot be separated
from the negative points (0, 0) and (1, 1) by a single straight line]
Boolean XOR with two layers
XOR can be built from the OR and AND perceptrons above plus one output neuron:
▪ Hidden neuron h1 = OR(x1, x2): weights w10 = -0.5, w11 = 1, w12 = 1
▪ Hidden neuron h2 = AND(x1, x2): weights w20 = -1.5, w21 = 1, w22 = 1
▪ Output neuron o: bias weight -0.5, weight +1 from h1 and weight -1 from h2
  (i.e. o fires when h1 is on and h2 is off)

  x1 | x2 | output o
  ---+----+---------
   0 |  0 |    0
   0 |  1 |    1
   1 |  0 |    1
   1 |  1 |    0
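The two-layer construction above can be verified the same way; the sketch below reuses
the OR and AND weights and the step perceptron from the previous examples.

```python
# The two-layer XOR network: OR and AND hidden units feeding one output neuron.
import numpy as np

def perceptron(x, w):
    return 1 if np.dot(w, np.concatenate(([1.0], x))) > 0 else 0

OR_W  = np.array([-0.5, 1.0, 1.0])
AND_W = np.array([-1.5, 1.0, 1.0])
OUT_W = np.array([-0.5, 1.0, -1.0])   # output neuron: bias -0.5, +1 from h1 (OR), -1 from h2 (AND)

def xor_net(x1, x2):
    x = np.array([x1, x2])
    h1 = perceptron(x, OR_W)          # hidden neuron 1 computes OR
    h2 = perceptron(x, AND_W)         # hidden neuron 2 computes AND
    return perceptron(np.array([float(h1), float(h2)]), OUT_W)

for x1 in (0.0, 1.0):
    for x2 in (0.0, 1.0):
        print(int(x1), int(x2), "->", xor_net(x1, x2))   # 0, 1, 1, 0
```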
Exercise
▪ Design neural network:
▪ XNOR
▪ NOR
▪ NAND
▪ XOR using OR, AND and NAND
▪ XNOR using OR, AND and NOR
Perceptron → Sigmoid
▪ Perceptron: functions of the form y = f(x) where
  ▪ x ∈ {0, 1}^n
  ▪ y ∈ {0, 1}
▪ What about arbitrary functions of the form y = f(x), where x ∈ R^n (instead of {0, 1}^n)
  and y ∈ R (instead of {0, 1})?
Multilayer Neural Networks
▪ A multilayer perceptron is a feedforward neural network with ≥ 1 hidden layers.

[Figure, left: a single neuron - inputs x1, x2 with weights w1, w2 feed a linear combiner
and a hard limiter with threshold θ, producing the output Y.
Figure, right: a multilayer network - input layer, first hidden layer, second hidden
layer, and output layer; input signals flow in on the left and output signals emerge on
the right.]
Single-layer vs. Multi-layer Neural Networks
[Figure: comparison of a single-layer and a multi-layer network]
Roles of Layers
▪ Input layer
  ▪ Accepts input signals from the outside world
  ▪ Distributes the signals to neurons in the hidden layer
  ▪ Usually does not do any computation
▪ Output layer (computational neurons)
  ▪ Accepts output signals from the previous hidden layer
  ▪ Outputs to the world
  ▪ Knows the desired outputs
▪ Hidden layer (computational neurons)
  ▪ Determines its own desired outputs

[Figure: input layer, first and second hidden layers, output layer, with input signals
flowing forward and output signals emerging on the right]
Roles of Layers
▪ Neurons in hidden layers are unobservable through the input and output of the network.
▪ Their desired output is unknown (hidden) from the outside and is determined by the
  layer itself.
▪ 1 hidden layer suffices for continuous functions
▪ 2 hidden layers for discontinuous functions
▪ Practical applications mostly use 3 layers
▪ More layers are possible, but each additional layer exponentially increases the
  computing load

[Figure: the same multilayer network - input layer, two hidden layers, output layer]
How do multilayer neural networks learn?
▪ More than a hundred different learning algorithms are available
for multilayer ANNs
▪ The most popular method is back-propagation.
Back-propagation Algorithm
▪ In a back-propagation neural network, the learning algorithm has 2 phases.
1. Forward propagation of inputs
2. Backward propagation of errors
▪ The algorithm loops over the 2 phases until the errors obtained are lower than a
certain threshold.
▪ Learning is done in a similar manner as in a perceptron
▪ A set of training inputs is presented to the network.
▪ The network computes the outputs.
▪ The weights are adjusted to reduce errors.
▪ The activation function used is a sigmoid function.

    Y^sigmoid = 1 / (1 + e^(-X))
Common Activation Functions

▪ Step function:     Y^step = 1 if X ≥ 0, 0 if X < 0
  Hard-limit function, often used for decision-making neurons in classification and
  pattern-recognition tasks.
▪ Sign function:     Y^sign = +1 if X ≥ 0, -1 if X < 0
  Also a hard-limit function for decision-making neurons.
▪ Sigmoid function:  Y^sigmoid = 1 / (1 + e^(-X))
  Output is a real number in the [0, 1] range; popular in back-propagation networks.
▪ Linear function:   Y^linear = X
  Often used for linear approximation.

[Figure: plots of the four functions - the step and sign functions jump at X = 0, the
sigmoid is S-shaped between 0 and 1, and the linear function is a straight line]
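A compact sketch of these four activation functions (NumPy, for illustration only):

```python
# Step, sign, sigmoid, and linear activations applied elementwise to a small input range.
import numpy as np

def step(x):     return np.where(x >= 0, 1.0, 0.0)   # hard limiter, outputs 0 or 1
def sign_fn(x):  return np.where(x >= 0, 1.0, -1.0)  # hard limiter, outputs -1 or +1
def sigmoid(x):  return 1.0 / (1.0 + np.exp(-x))     # smooth, outputs in (0, 1)
def linear(x):   return x                            # identity

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(step(x))              # [0. 0. 1. 1. 1.]
print(sign_fn(x))           # [-1. -1.  1.  1.  1.]
print(sigmoid(x).round(3))  # [0.119 0.378 0.5   0.622 0.881]
print(linear(x))            # [-2.  -0.5  0.   0.5  2. ]
```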
3-layer Back-propagation Neural Network

[Figure: an input layer with neurons 1 ... i ... n receiving x1 ... xn, a hidden layer
with neurons 1 ... j ... m, and an output layer with neurons 1 ... k ... l producing
y1 ... yl. Weights wij connect input to hidden neurons and weights wjk connect hidden to
output neurons. Input signals propagate left to right; error signals propagate right to
left.]
How a neuron determines its output
▪ Very similar to the perceptron:
  1. Compute the net weighted input:          X = Σ_{i=1}^{n} x_i w_i − θ
  2. Pass the result to the activation function:   Y^sigmoid = 1 / (1 + e^(-X))

Example (one hidden neuron j, with θ = 0.2):
  inputs x = (2, 5, 1, 8), weights w = (0.1, 0.2, 0.5, 0.3)

  X = (0.1(2) + 0.2(5) + 0.5(1) + 0.3(8)) − 0.2 = 3.9
  Y = 1 / (1 + e^(-3.9)) ≈ 0.98

[Figure: the 3-layer network with the example inputs feeding neuron j; each neuron shown
outputs approximately 0.98 in this example]
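The worked example can be reproduced in a few lines (same numbers as above):

```python
# Net weighted input minus the threshold, passed through the sigmoid activation.
import numpy as np

def neuron_output(x, w, theta):
    X = np.dot(x, w) - theta          # X = sum_i x_i * w_i - theta
    return 1.0 / (1.0 + np.exp(-X))   # sigmoid activation

x = np.array([2.0, 5.0, 1.0, 8.0])
w = np.array([0.1, 0.2, 0.5, 0.3])
print(neuron_output(x, w, theta=0.2))  # X = 3.9, Y ≈ 0.980
```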
How the errors propagate backward
▪ The errors are computed in a similar manner to the errors in the perceptron.
▪ Error = the output we want − the output we get

    e_k(p) = y_d,k(p) − y_k(p)      (error at output neuron k at iteration p)

Example (iteration p): suppose the desired output is 1 and the network's output is 0.98.

    e_k(p) = 1 − 0.98 = 0.02

[Figure: the 3-layer network with the error signal e_k(p) computed at output neuron k
and propagated backward]
Back-Propagation Training Algorithm
Step 1: Initialization
Randomly set the weights and thresholds θ to numbers within a small range:

    ( −2.4 / F_i ,  +2.4 / F_i )

where F_i is the total number of inputs of neuron i.
The weight initialization is done on a neuron-by-neuron basis.

[Figure: plot of 2.4 / F_i against F_i - the allowed weight range shrinks as the number
of inputs grows]
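One possible reading of this step, sketched in Python; init_layer and the layer sizes are
illustrative names and values, not taken from the slides.

```python
# Every weight and threshold of a neuron with F_i inputs is drawn uniformly
# from (-2.4/F_i, +2.4/F_i), neuron by neuron (here vectorized per layer).
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_inputs, n_neurons):
    bound = 2.4 / n_inputs                                    # F_i = n_inputs for this layer
    w = rng.uniform(-bound, bound, size=(n_neurons, n_inputs))
    theta = rng.uniform(-bound, bound, size=n_neurons)
    return w, theta

w_hidden, theta_hidden = init_layer(n_inputs=2, n_neurons=2)  # e.g. hidden layer of a 2-2-1 net
w_out, theta_out = init_layer(n_inputs=2, n_neurons=1)        # output layer
```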
Back-Propagation Training Algorithm
Step 2: Activation
Propagate the input signals forward from the input layer to the output layer.
For each neuron:

    X = Σ_{i=1}^{n} x_i w_i − θ
    Y^sigmoid = 1 / (1 + e^(-X)) ∈ [0, 1]

Example (as before, with θ = 0.2):

    X = (0.1(2) + 0.2(5) + 0.5(1) + 0.3(8)) − 0.2 = 3.9
    Y = 1 / (1 + e^(-3.9)) ≈ 0.98

[Figure: the 3-layer network with the input signals (2, 5, 1, 8) propagating forward;
each neuron's output is shown as approximately 0.98]
Back-Propagation Training Algorithm
Step 3: Weight Training
There are 2 types of weight training
1. For the output layer neurons
2. For the hidden layer neurons
*** It is important to understand that first the input signals propagate forward, and
then the errors propagate backward to help train the weights. ***
In each iteration (p + 1), the weights are updated based on the weights from the
previous iteration p.
The signals keep flowing forward and backward until the errors are below some
preset threshold value.
3.1 Weight Training (Output layer neurons)

[Figure: hidden neurons 1, 2, ..., j, ..., m connect to output neuron k through weights
w1,k, w2,k, ..., wj,k, ..., wm,k; at iteration p, neuron k produces yk(p) and the error
ek(p) = yd,k(p) − yk(p) is computed]

These formulas are used to perform the weight corrections:

    w_j,k(p+1) = w_j,k(p) + Δw_j,k(p)
    Δw_j,k(p)  = α · y_j(p) · δ_k(p)
    δ_k(p)     = y_k(p) · [1 − y_k(p)] · e_k(p)
    e_k(p)     = y_d,k(p) − y_k(p)

where δ is the error gradient and α is the (predefined) learning rate.

▪ We want to compute w_j,k(p+1); we already know w_j,k(p).
▪ Δw_j,k(p) follows from the hidden-neuron output y_j(p) and the error gradient δ_k(p).
▪ δ_k(p) follows from the output y_k(p) and the error e_k(p), both of which we know.
▪ We do the above for each of the weights of the output layer neurons.
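A sketch of this output-layer correction for one sigmoid output neuron; alpha is the
learning rate, and treating the threshold as a weight on a constant input of −1 is an
assumed convention, not something stated on the slide.

```python
# Output-layer weight correction for a single sigmoid output neuron k.
import numpy as np

def update_output_weights(w_jk, theta_k, y_hidden, y_k, y_desired, alpha=0.1):
    e_k = y_desired - y_k                          # e_k(p) = y_d,k(p) - y_k(p)
    delta_k = y_k * (1.0 - y_k) * e_k              # delta_k(p) = y_k (1 - y_k) e_k
    w_jk = w_jk + alpha * y_hidden * delta_k       # w_j,k(p+1) = w_j,k(p) + alpha * y_j * delta_k
    theta_k = theta_k + alpha * (-1.0) * delta_k   # threshold update (assumed convention)
    return w_jk, theta_k, delta_k
```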
3.2 Weight Training (Hidden layer neurons)

[Figure: input neurons 1, 2, ..., i, ..., n connect to hidden neuron j through weights
w1,j, w2,j, ..., wi,j, ..., wn,j; hidden neuron j connects forward to output neurons
1, ..., k, ..., l]

These formulas are used to perform the weight corrections:

    w_i,j(p+1) = w_i,j(p) + Δw_i,j(p)
    Δw_i,j(p)  = α · x_i(p) · δ_j(p)
    δ_j(p)     = y_j(p) · [1 − y_j(p)] · Σ_{k=1}^{l} δ_k(p) · w_j,k(p)

▪ We want to compute w_i,j(p+1); we already know w_i,j(p).
▪ Δw_i,j(p) follows from the (predefined) learning rate α, the input x_i(p), and the
  error gradient δ_j(p).
▪ δ_j(p) combines the hidden neuron's own output y_j(p) with the error gradients δ_k(p)
  that propagate back from the output layer.
▪ We do the above for each of the weights of the hidden layer neurons.
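A companion sketch for the hidden-layer correction; delta_out holds the output-layer
gradients δ_k(p) and w_jk is the hidden-to-output weight matrix (again, the threshold
update is an assumed convention).

```python
# Hidden-layer weight correction. Shapes: w_ij is (hidden, inputs),
# w_jk is (outputs, hidden), delta_out is (outputs,).
import numpy as np

def update_hidden_weights(w_ij, theta_j, x, y_hidden, delta_out, w_jk, alpha=0.1):
    # delta_j(p) = y_j (1 - y_j) * sum_k delta_k(p) * w_j,k(p)   (errors propagated back)
    delta_j = y_hidden * (1.0 - y_hidden) * (w_jk.T @ delta_out)
    w_ij = w_ij + alpha * np.outer(delta_j, x)      # delta w_i,j(p) = alpha * x_i * delta_j
    theta_j = theta_j + alpha * (-1.0) * delta_j    # threshold update (assumed convention)
    return w_ij, theta_j, delta_j
```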
Iteration p = 1

[Figure: the 3-layer network during iteration p = 1 - the input signals propagate
forward, the error signals propagate backward, and both the hidden-to-output and
input-to-hidden weights are trained]

After the weights are trained in p = 1, we go back to Step 2 (Activation) and compute the
outputs for the new weights.

If the errors obtained with the updated weights are still above the error threshold, we
start weight training for p = 2.

Otherwise, we stop.
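Putting the steps together, a rough training-loop sketch that repeats activation and
weight training until the sum of squared errors falls below a preset threshold; the
learning rate, error threshold, and epoch limit are illustrative, not from the slides.

```python
# Backprop training loop: forward pass (Step 2), then output- and hidden-layer
# weight training (Step 3), repeated until the errors are small enough.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(X, Y, w_ij, theta_j, w_jk, theta_k, alpha=0.1, err_threshold=0.001, max_epochs=10000):
    for p in range(max_epochs):
        sse = 0.0
        for x, y_d in zip(X, Y):
            # Step 2: activation (forward propagation)
            y_hidden = sigmoid(w_ij @ x - theta_j)
            y_out = sigmoid(w_jk @ y_hidden - theta_k)
            e = y_d - y_out
            sse += float(np.sum(e ** 2))
            # Step 3: weight training (errors propagate backward)
            delta_k = y_out * (1 - y_out) * e                          # output-layer gradients
            delta_j = y_hidden * (1 - y_hidden) * (w_jk.T @ delta_k)   # hidden-layer gradients
            w_jk += alpha * np.outer(delta_k, y_hidden)
            theta_k += alpha * -1.0 * delta_k
            w_ij += alpha * np.outer(delta_j, x)
            theta_j += alpha * -1.0 * delta_j
        if sse < err_threshold:                                        # otherwise, we stop
            break
    return w_ij, theta_j, w_jk, theta_k
```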
Iteration p = 2

[Figure: the same 3-layer network during iteration p = 2 - the forward and backward
passes are repeated and the weights are trained again]
Example: 3-layer ANN for XOR

[Figure: the four XOR inputs plotted in the (x1, x2) plane - (0, 0), (0, 1), (1, 0), and
(1, 1); no single straight line separates the two classes]

XOR is not a linearly separable function.

A single-layer ANN (the perceptron) cannot deal with problems that are not linearly
separable. We cope with such problems by using multi-layer neural networks.
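As a usage sketch, the init_layer, train, and sigmoid helpers sketched earlier can be
combined to train a 2-2-1 network on XOR; note that convergence is not guaranteed from
every random start.

```python
# Training a 2-2-1 back-propagation network on XOR (relies on the helpers defined above).
import numpy as np

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([[0.], [1.], [1.], [0.]])

w_ij, theta_j = init_layer(n_inputs=2, n_neurons=2)   # hidden layer
w_jk, theta_k = init_layer(n_inputs=2, n_neurons=1)   # output layer
w_ij, theta_j, w_jk, theta_k = train(X, Y, w_ij, theta_j, w_jk, theta_k, alpha=0.5)

for x, y_d in zip(X, Y):
    y = sigmoid(w_jk @ sigmoid(w_ij @ x - theta_j) - theta_k)
    print(x, round(float(y[0]), 2), "desired:", float(y_d[0]))   # should approach 0, 1, 1, 0
```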
