12 Neural Network
The Idea of ANNs
▪ ANNs learn relationships between cause and effect, or organize large volumes of data into orderly and informative patterns.
[Figure: a toy classification task. Shown an image and candidate labels (frog, lion, bird), the network answers "It's a frog".]
Artificial Neural Network
▪ ANN Definition:
“Data processing system consisting of a large number of simple, highly interconnected
processing elements (artificial neurons) in an architecture inspired by the structure of the
cerebral cortex of the brain”
▪ Synaptic connection strengths among neurons are used to store the acquired
knowledge.
[Figure: model of an artificial neuron. Inputs x0 … xn enter through weights w0 … wn, are summed, and pass through an activation function f to produce the output y.]

net = Σ_{i=0}^{n} x_i w_i ,  y = f(net)

▪ The n-dimensional input vector x is mapped into the output y by means of the scalar product and a nonlinear function mapping.
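A minimal sketch of this mapping in Python; the particular input values, weights, and choice of f are arbitrary here:

```python
import numpy as np

x = np.array([1.0, 0.5, -0.2])  # input vector
w = np.array([0.4, -0.1, 0.8])  # weights
f = np.tanh                     # some nonlinear activation f

y = f(np.dot(x, w))             # y = f(sum of x_i * w_i)
```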
Key features of ANNs are suggested by the properties of biological neurons:

[Figure: the same artificial-neuron model as above, with the weighted sum Σ_{i=0}^{n} x_i w_i passed through the activation ∫ to produce y.]
ANN = Simplified Biology
Properties of ANNs
▪ Learning from examples
▪ labeled or unlabeled
▪ Adaptivity
▪ changing the connection strengths to learn things
▪ Non-linearity
▪ the non-linear activation functions are essential
▪ Fault tolerance
▪ if one of the neurons or connections is damaged, the whole network still works quite well
▪ Thus, they might be better alternatives than classical solutions for problems
characterized by
▪ high dimensionality, noisy, imprecise or imperfect data
▪ a lack of a clearly stated mathematical solution or algorithm
Characterization of ANN

ANNs are characterized along three dimensions:
▪ Interconnections: single layer or multilayer
▪ Learning: parameter learning (updating weights) or structure learning; the learning paradigm may be supervised, unsupervised, or reinforcement
▪ Activation functions: Sigmoid, Softmax, TanH, ReLU, Leaky ReLU
Characterization of ANN

▪ Interconnections: feedforward (multilayer), feedback (multilayer), or recurrent
ANN Models
Learning in ANN

▪ Parameter learning: connection weights are updated
▪ Structure learning: the network structure itself is changed
▪ Learning paradigms: supervised, unsupervised, reinforcement

[Figure: supervised learning loop. The input X feeds the neural network, which produces the predicted output Ŷ; an error-signal generator compares Ŷ against the desired output Y and uses the error (Y − Ŷ) to update the connection weights W.]
Activation Function

Common ANN activation functions:
▪ Sigmoid
▪ Softmax
▪ TanH
▪ ReLU
▪ Leaky ReLU
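As a quick reference, here is a minimal NumPy sketch of these five functions; the function names and the stability trick in softmax are my own choices:

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # Normalizes a vector into a probability distribution
    e = np.exp(x - np.max(x))  # subtract max for numerical stability
    return e / e.sum()

def tanh(x):
    # Squashes input into (-1, 1)
    return np.tanh(x)

def relu(x):
    # Passes positive values, zeroes out negatives
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Like ReLU, but keeps a small slope for negative inputs
    return np.where(x > 0, x, alpha * x)
```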
Perceptron
Output Y ▪ Perceptron
▪ computes the weighted sum of its input
(called its net input)
▪ adds its bias
1 𝑖𝑓 𝑛𝑒𝑡 > 0
𝑌=ቊ
0 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒 ▪ passes this value through an activation
b function
𝑛
x0 wk0
wk1 H1
x1 𝑛
wkn
xn
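A minimal sketch of this rule in Python, assuming the bias is folded in as a weight w0 on a constant input x0 = 1 (as in the gate examples that follow):

```python
import numpy as np

def perceptron(x, w):
    """Step-activation perceptron.
    x: inputs, with the constant bias input x0 = 1 prepended
    w: weights, with the bias weight w0 first
    """
    net = np.dot(x, w)          # net input: weighted sum of the inputs
    return 1 if net > 0 else 0  # step activation: Y = 1 if net > 0, else 0
```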
Inverter (NOT)

input x1 | output Y
    0    |    1
    1    |    0

Weights: w0 = 0.5 (bias), w1 = −1
Boolean OR
x2
input input output
x1 x2 Y w0= -0.5 + +
OR
0 0 0 1
0 1 1 w1=1 w2=1
x1
- +
1 0 1
x1 x2
1 1 1
Boolean AND

x1 | x2 | output Y
 0 |  0 |    0
 0 |  1 |    0
 1 |  0 |    0
 1 |  1 |    1

Weights: w0 = −1.5 (bias), w1 = 1, w2 = 1

[Plot: AND is linearly separable; one line separates the single "+" point (1, 1) from the three "−" points.]
Boolean XOR

x1 | x2 | output Y
 0 |  0 |    0
 0 |  1 |    1
 1 |  0 |    1
 1 |  1 |    0

[Plot: XOR is not linearly separable; no single line can separate the "+" points (0, 1), (1, 0) from the "−" points (0, 0), (1, 1).]
Boolean XOR with two layers

Although no single perceptron computes XOR, two perceptrons feeding a third one do:
▪ h1 = OR(x1, x2): bias weight w10 = −0.5, input weights 1 and 1
▪ h2 = AND(x1, x2): bias weight w20 = −1.5, input weights 1 and 1
▪ output o: bias weight −0.5, weight +1 from h1 and −1 from h2

In other words, XOR(x1, x2) = OR(x1, x2) AND NOT AND(x1, x2), as verified in the sketch below.
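Using the perceptron sketch from above, we can verify the gate weights from these slides and compose OR and AND into XOR; the weight values are taken directly from the slides:

```python
# Each weight vector starts with the bias weight w0 (bias input is always 1).
NOT_W = [0.5, -1]      # inverter: w0 = 0.5, w1 = -1
OR_W  = [-0.5, 1, 1]   # OR:  w0 = -0.5, w1 = w2 = 1
AND_W = [-1.5, 1, 1]   # AND: w0 = -1.5, w1 = w2 = 1
OUT_W = [-0.5, 1, -1]  # output neuron: +1 from OR, -1 from AND

def xor(x1, x2):
    h1 = perceptron([1, x1, x2], OR_W)   # hidden neuron h1 = OR(x1, x2)
    h2 = perceptron([1, x1, x2], AND_W)  # hidden neuron h2 = AND(x1, x2)
    return perceptron([1, h1, h2], OUT_W)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor(a, b))  # 0 0 -> 0, 0 1 -> 1, 1 0 -> 1, 1 1 -> 0
```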
Exercise
▪ Design neural network:
▪ XNOR
▪ NOR
▪ NAND
▪ XOR using OR, AND and NAND
▪ XNOR using OR, AND and NOR
Perceptron vs. Sigmoid

▪ Perceptron: functions of the form y = f(x) where
▪ x ∈ {0, 1}
▪ y ∈ {0, 1}

[Figure: multilayer network. Input signals flow through an input layer, a first hidden layer, a second hidden layer, and an output layer of weighted, thresholded neurons to the output Y.]

Y = sigmoid(X) = 1 / (1 + e^{−X})
Common Activation Functions

▪ Step function, Sign function, Sigmoid function, Linear function
▪ The sigmoid's output is a real number in the [0, 1] range.

[Figure: plots of the four functions, each shown over X with output values between −1 and +1.]
[Figure: three-layer feedforward network. Inputs x1 … xn enter hidden neurons j through weights wij; hidden neurons feed output neurons k, …, l through weights wjk, producing outputs yk, …, yl. Error signals flow backward from the output layer.]
How a neuron determines its output

▪ Very similar to the perceptron:

1. Compute the net weighted input: X = Σ_{i=1}^{n} x_i w_i − θ
2. Pass the result to the activation function: Y = sigmoid(X) = 1 / (1 + e^{−X})
[Figure: worked example. A neuron j receives the inputs 2, 5, 1, 8 through the weights 0.1, 0.2, 0.5, 0.3, with threshold θ = 0.2, and outputs 0.98. The computation is shown below.]
e_k(p) = y_{d,k}(p) − y_k(p)   (error at an output neuron k at iteration p)
Input signals 0.1
1
x1 1 y1 0.2
5
x2
2
1
2 y2 0.5
k 0.98
2
i wij wjk
1 0.3
j
xi k yk
2.4/Fi
Fi Fi 1
0
0 10 20 30 40
Fi
X = Σ_{i=1}^{n} x_i w_i − θ ,  Y = sigmoid(X) = 1 / (1 + e^{−X}) ∈ [0, 1]

With the inputs 2, 5, 1, 8, the weights 0.1, 0.2, 0.5, 0.3, and θ = 0.2:

X = (0.1(2) + 0.2(5) + 0.5(1) + 0.3(8)) − 0.2 = 3.9
Y = 1 / (1 + e^{−3.9}) = 0.98
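The same computation as a runnable sketch, with the numbers taken from the slide:

```python
import math

x = [2, 5, 1, 8]          # inputs
w = [0.1, 0.2, 0.5, 0.3]  # weights
theta = 0.2               # threshold

X = sum(xi * wi for xi, wi in zip(x, w)) - theta  # net input: 3.9
Y = 1 / (1 + math.exp(-X))                        # sigmoid output
print(round(X, 2), round(Y, 2))                   # 3.9 0.98
```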
Back-Propagation Training Algorithm
Step 3: Weight Training
There are 2 types of weight training
1. For the output layer neurons
2. For the hidden layer neurons
*** It is important to understand that first the input signals propagate forward, and
then the errors propagate backward to help train the weights. ***
In each iteration (p + 1), the weights are updated based on the weights from the
previous iteration p.
The signals keep flowing forward and backward until the errors are below some
preset threshold value.
3.1 Weight Training (Output layer neurons)

[Figure: iteration p. Hidden-layer outputs y1 … ym reach output neuron k through the weights w1,k … wm,k; the network output yk(p) is compared with the desired output yd,k(p).]

e_k(p) = y_{d,k}(p) − y_k(p)
▪ These formulas are used to perform weight corrections:

w_{j,k}(p + 1) = w_{j,k}(p) + Δw_{j,k}(p)
Δw_{j,k}(p) = α · y_j(p) · δ_k(p)
δ_k(p) = y_k(p) · [1 − y_k(p)] · e_k(p)

where δ is the error gradient and α is the learning rate.
We want to compute w_{j,k}(p + 1) = w_{j,k}(p) + Δw_{j,k}(p), and we already know w_{j,k}(p).

We know how to compute the correction:
Δw_{j,k}(p) = α · y_j(p) · δ_k(p), with
δ_k(p) = y_k(p) · [1 − y_k(p)] · e_k(p)  and  e_k(p) = y_{d,k}(p) − y_k(p)

We do the above for each of the weights of the output-layer neurons.
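A minimal sketch of this output-layer correction for a single weight; the helper name and the learning-rate default are my own, since the slides do not fix a value for α:

```python
def update_output_weight(w_jk, y_j, y_k, y_dk, alpha=0.1):
    """One weight correction for output neuron k at iteration p."""
    e_k = y_dk - y_k                 # e_k(p) = y_d,k(p) - y_k(p)
    delta_k = y_k * (1 - y_k) * e_k  # error gradient delta_k(p)
    dw = alpha * y_j * delta_k       # weight correction
    return w_jk + dw, delta_k        # new weight, plus delta_k for step 3.2
```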
3.2 Weight Training (Hidden layer neurons)

[Figure: iteration p. Inputs x1 … xn reach hidden neuron j through the weights w1,j … wn,j; neuron j feeds every output neuron k = 1 … l, whose error signals flow back to j.]
▪ These formulas are used to perform weight corrections:

w_{i,j}(p + 1) = w_{i,j}(p) + Δw_{i,j}(p)
Δw_{i,j}(p) = α · x_i(p) · δ_j(p)
δ_j(p) = y_j(p) · [1 − y_j(p)] · Σ_{k=1}^{l} δ_k(p) · w_{j,k}(p)
We want to compute w_{i,j}(p + 1) = w_{i,j}(p) + Δw_{i,j}(p), and we already know w_{i,j}(p).

We know how to compute the correction:
Δw_{i,j}(p) = α · x_i(p) · δ_j(p), where x_i is a predefined input, and
δ_j(p) = y_j(p) · [1 − y_j(p)] · Σ_{k=1}^{l} δ_k(p) · w_{j,k}(p)

The output-layer gradients δ_k(p) are already known from step 3.1.
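A matching sketch for a hidden-layer weight; it consumes the δ_k values returned by the output-layer helper above (names are again my own):

```python
def update_hidden_weight(w_ij, x_i, y_j, deltas_k, w_jk_all, alpha=0.1):
    """One weight correction for hidden neuron j at iteration p.
    deltas_k: error gradients of all l output neurons
    w_jk_all: current weights from neuron j to those output neurons
    """
    back = sum(d_k * w_jk for d_k, w_jk in zip(deltas_k, w_jk_all))
    delta_j = y_j * (1 - y_j) * back  # hidden error gradient delta_j(p)
    return w_ij + alpha * x_i * delta_j
```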
[Figure: iteration p = 1. After the forward pass, both the hidden-to-output weights and the input-to-hidden weights are trained.]
After the weights are trained in p = 1, we go back to Step 2
(Activation) and compute the outputs for the new weights.
If the errors obtained via the use of the updated weights are still
above the error threshold, we start weight training for p = 2.
Otherwise, we stop.
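As a small, runnable illustration of this forward/backward loop and its stopping rule, here is a single sigmoid neuron trained until its sum-squared error falls below a preset threshold; the toy data, learning rate, and threshold are invented for the sketch:

```python
import math

data = [(0.0, 0.0), (1.0, 1.0)]  # toy task: output should follow the input
w0, w1 = 0.1, 0.1                # bias weight and input weight
alpha, threshold = 0.5, 0.01     # learning rate and error threshold

p = 0
while True:
    p += 1
    sse = 0.0
    for x, y_d in data:
        y = 1 / (1 + math.exp(-(w0 + w1 * x)))  # Step 2: activation
        e = y_d - y                             # error at iteration p
        sse += e * e
        delta = y * (1 - y) * e                 # error gradient
        w0 += alpha * 1 * delta                 # Step 3: weight training
        w1 += alpha * x * delta
    if sse < threshold:                         # errors small enough: stop
        break
print("stopped after", p, "iterations")
```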
[Figure: iteration p = 2. The forward pass is repeated with the updated weights, and both weight layers are trained again.]
Example: 3-layer ANN for XOR

[Plot: the four XOR input points in the (x1, x2) plane. (0, 1) and (1, 0) belong to one class, (0, 0) and (1, 1) to the other.]

A single-layer ANN such as the perceptron cannot deal with problems that are not linearly separable. We cope with these problems using multilayer neural networks, as in the training sketch below.
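Tying the section together, here is a compact 2-2-1 backprop sketch that learns XOR using the update rules from step 3. The initial weights, learning rate, and stopping threshold are assumptions; like any backprop run, an unlucky initialization can stall in a local minimum.

```python
import math, random

random.seed(1)
alpha = 0.5  # learning rate (not fixed by the slides)
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

sig = lambda v: 1 / (1 + math.exp(-v))

# w_h[j] = [bias, w1, w2] for hidden neuron j; w_o = [bias, from h1, from h2]
w_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_o = [random.uniform(-1, 1) for _ in range(3)]

def forward(x):
    h = [sig(w[0] + w[1] * x[0] + w[2] * x[1]) for w in w_h]
    return h, sig(w_o[0] + w_o[1] * h[0] + w_o[2] * h[1])

for p in range(20000):              # iterate until the error is small
    sse = 0.0
    for x, y_d in data:
        h, y = forward(x)           # Step 2: forward pass
        e = y_d - y
        sse += e * e
        d_k = y * (1 - y) * e       # output gradient (step 3.1)
        for j in (0, 1):            # hidden gradients (step 3.2)
            d_j = h[j] * (1 - h[j]) * d_k * w_o[j + 1]
            w_h[j][0] += alpha * 1 * d_j
            w_h[j][1] += alpha * x[0] * d_j
            w_h[j][2] += alpha * x[1] * d_j
        w_o[0] += alpha * 1 * d_k   # output-layer corrections
        w_o[1] += alpha * h[0] * d_k
        w_o[2] += alpha * h[1] * d_k
    if sse < 0.01:
        break

for x, y_d in data:
    print(x, round(forward(x)[1], 2), "target", y_d)
```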