12 Neural Network


Artificial Intelligence

Neural Network
The idea of ANNs
▪ ANNs learn relationships between cause and effect, or organize large volumes of data
into orderly and informative patterns.

Example: shown an unfamiliar picture ("What is that?") and the candidate labels frog,
lion, and bird, a trained network answers "It's a frog".
Artificial Neural Network
▪ ANN Definition:
“Data processing system consisting of a large number of simple, highly interconnected
processing elements (artificial neurons) in an architecture inspired by the structure of the
cerebral cortex of the brain”

(Tsoukalas & Uhrig, 1997)

▪ Neural network: information processing paradigm inspired by biological nervous
systems, such as our brain
▪ Structure: large number of highly interconnected processing elements (neurons)
working together
▪ Like people, they learn from experience (by example)
Artificial Neural Networks
▪ Computational models inspired by the human brain:
▪ Algorithms that try to mimic the brain.

▪ Massively parallel, distributed systems made up of simple processing units (neurons)
▪ Synaptic connection strengths among neurons are used to store the acquired knowledge.
▪ Knowledge is acquired by the network from its environment through a learning process.
Biological Inspirations
Biological Neuron
▪ Dendrites: nerve fibres carrying electrical signals to the cell
▪ Soma (cell body): computes a non-linear function of its inputs
▪ Axon: single long fiber that carries the electrical signal from the cell body to other
neurons
▪ Synapse: the point of contact between the axon of one cell and the dendrite of another,
regulating a chemical connection whose strength affects the input to the cell
Biological Inspirations
▪ Humans perform complex tasks like vision, motor control, or language understanding
very well.
▪ One way to build intelligent machines is to try to imitate the (organizational
principles of the) human brain.
▪ A variety of different neurons exist (motor neurons, on-center off-surround visual
cells, ...), with different branching structures.
▪ The connections of the network and the strengths of the individual synapses establish
the function of the network.
Artificial Neuron (Perceptron)

[Figure: inputs x0, x1, ..., xn with weights w0, w1, ..., wn feed a summing unit that
computes Σ_{i=0}^{n} x_i w_i; the sum passes through an activation function f to give
the output y]

For example, with input vector x, weight vector w, a weighted sum and an activation
function:

    y = sign( Σ_{i=0}^{n} w_i x_i )

▪ The n-dimensional input vector x is mapped into the variable y by means of the scalar
product and a nonlinear function mapping.
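For concreteness, here is a minimal sketch of this computation in Python; the input and
weight values are made up for illustration, not taken from the slides.

```python
# A rough sketch of the artificial neuron above: the scalar product of inputs and
# weights is passed through a sign-style activation.
import numpy as np

def artificial_neuron(x, w):
    net = np.dot(w, x)               # weighted sum: sum_{i=0}^{n} w_i * x_i
    return 1 if net > 0 else -1      # y = sign(net)

x = np.array([1.0, 0.5, -0.2, 0.7])  # x[0] can play the role of a clamped bias input
w = np.array([-0.1, 0.6, 0.3, 0.4])
print(artificial_neuron(x, w))       # -> 1  (net = -0.1 + 0.3 - 0.06 + 0.28 = 0.42)
```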
Key features of ANN are suggested by the properties of biological neurons:
▪ The processing element receives many signals.
▪ Signals may be modified by a weight at the receiving synapse.
▪ The processing element sums the weighted inputs.
▪ Under appropriate circumstances (sufficient input), the neuron transmits a single output.
▪ The output from a particular neuron may go to many other neurons.

[Figure: inputs x0 ... xn with weights w0 ... wn, summation Σ_{i=0}^{n} x_i w_i,
activation, output y]
Key features of ANN are suggested by the properties of biological neurons:
▪ The strength of the connection between two neurons is stored as a weight value for that
specific connection.
▪ Learning the solution to a problem = changing the connection weights

[Figure: the same single-neuron diagram as above]
ANN = Simplified Biology
Properties of ANNs
▪ Learning from examples
▪ labeled or unlabeled
▪ Adaptivity
▪ changing the connection strengths to learn things
▪ Non-linearity
▪ the non-linear activation functions are essential
▪ Fault tolerance
▪ if one of the neurons or connections is damaged, the whole network still works quite well
▪ Thus, they might be better alternatives than classical solutions for problems
characterized by
▪ high dimensionality, noisy, imprecise or imperfect data
▪ a lack of a clearly stated mathematical solution or algorithm
Characterization of ANN
ANNs can be characterized along three axes: interconnections, learning, and activation
functions.
▪ Interconnections
  ▪ Feed-forward: single layer, multilayer
  ▪ Feedback: single layer, multilayer
  ▪ Recurrent
▪ Learning
  ▪ Learning rules: parameter learning, structure learning
  ▪ Learning types: supervised, unsupervised, reinforcement
▪ Activation functions
  ▪ Binary step
  ▪ Linear
  ▪ Non-linear: Sigmoid, Softmax, TanH, ReLU, Leaky ReLU
Characterization of ANN: Interconnections
▪ Feed-forward: single layer, multilayer
▪ Feedback: single layer, multilayer
▪ Recurrent
ANN Models
[Figures: examples of ANN model architectures]
Learning in ANN
▪ Learning rules
  ▪ Parameter learning: connection weights are updated
  ▪ Structure learning: the network structure itself is changed
▪ Learning types: supervised, unsupervised, reinforcement

[Figure: supervised learning loop. The input X is fed to the neural network, which
produces the predicted output Ŷ. An error-signal generator compares Ŷ with the desired
output Y and feeds the error (Y − Ŷ) back to adjust the weights W.]
Activation Function
▪ Binary step
▪ Linear
▪ Non-linear: Sigmoid, Softmax, TanH, ReLU, Leaky ReLU
Perceptron
▪ A perceptron
  ▪ computes the weighted sum of its inputs (called its net input)
  ▪ adds its bias
  ▪ passes this value through an activation function

    net = Σ_{i=1}^{n} w_i x_i + b

    Y = 1 if net > 0
        0 otherwise

▪ We say that the neuron "fires" (i.e. becomes active) if its output is above zero.
▪ The bias can be incorporated as another weight clamped to a fixed input of +1.0.
▪ This extra free variable (bias) makes the neuron more powerful.

[Figure: inputs x1 ... xn with weights w1 ... wn and bias b feeding the output Y]
Perceptron (bias as a weight)
▪ The same perceptron, with the bias incorporated as the weight w0 on an extra input x0
clamped to +1.0:

    net = Σ_{i=0}^{n} w_i x_i

    Y = 1 if net > 0
        0 otherwise

[Figure: inputs x0 = +1, x1, ..., xn with weights w0, w1, ..., wn feeding the output Y]
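A small sketch of this version of the perceptron, with the bias folded in as w[0] on a
clamped input x0 = +1; the example values are illustrative, not from the slides.

```python
# 0/1 perceptron with the bias treated as w[0] on a clamped input x0 = +1.
import numpy as np

def perceptron(x, w):
    """Fires (outputs 1) when the net input exceeds zero."""
    x = np.concatenate(([1.0], x))   # prepend the clamped bias input x0 = +1
    net = np.dot(w, x)               # net = sum_{i=0}^{n} w_i * x_i
    return 1 if net > 0 else 0

w = np.array([0.2, -0.4, 0.7])       # w[0] acts as the bias weight
print(perceptron(np.array([0.5, 1.0]), w))   # net = 0.2 - 0.2 + 0.7 = 0.7 > 0  ->  1
print(perceptron(np.array([1.0, 0.0]), w))   # net = 0.2 - 0.4 + 0.0 = -0.2     ->  0
```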
Perceptrons
[Figure: inputs x0, x1, x2, ..., xn with weights wk0, wk1, wk2, ..., wkn feeding hidden
units H1, H2 and the output Y]

    net = Σ_{i=0}^{n} w_ki x_i
Inverter (NOT)
A single perceptron with w0 = 0.5 and w1 = -1 implements the inverter:

  input x1 | output Y
  ---------+---------
      0    |    1
      1    |    0

Boolean OR
A single perceptron with w0 = -0.5, w1 = 1, w2 = 1 implements OR:

  x1 | x2 | output Y
  ---+----+---------
   0 |  0 |    0
   0 |  1 |    1
   1 |  0 |    1
   1 |  1 |    1

[Figure: in the (x1, x2) plane, OR is linearly separable - a single line separates the
negative point (0, 0) from the three positive points]
Boolean AND
A single perceptron with w0 = -1.5, w1 = 1, w2 = 1 implements AND:

  x1 | x2 | output Y
  ---+----+---------
   0 |  0 |    0
   0 |  1 |    0
   1 |  0 |    0
   1 |  1 |    1

[Figure: in the (x1, x2) plane, AND is linearly separable - a single line separates the
positive point (1, 1) from the other three points]
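The gate weights from these slides can be checked directly with a step perceptron; the
helper below is the same sketch as before, repeated so the block runs on its own.

```python
# Checking the NOT, OR, and AND weights from the slides with a 0/1 step perceptron.
import numpy as np

def perceptron(x, w):
    return 1 if np.dot(w, np.concatenate(([1.0], x))) > 0 else 0

NOT_W = np.array([0.5, -1.0])        # inverter:  w0 = 0.5,  w1 = -1
OR_W  = np.array([-0.5, 1.0, 1.0])   # OR:        w0 = -0.5, w1 = w2 = 1
AND_W = np.array([-1.5, 1.0, 1.0])   # AND:       w0 = -1.5, w1 = w2 = 1

print("NOT:", [perceptron(np.array([x]), NOT_W) for x in (0.0, 1.0)])   # [1, 0]
for x1 in (0.0, 1.0):
    for x2 in (0.0, 1.0):
        x = np.array([x1, x2])
        print(int(x1), int(x2), "OR:", perceptron(x, OR_W), "AND:", perceptron(x, AND_W))
```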
Boolean XOR
XOR cannot be implemented by a single perceptron:

  x1 | x2 | output Y
  ---+----+---------
   0 |  0 |    0
   0 |  1 |    1
   1 |  0 |    1
   1 |  1 |    0

[Figure: in the (x1, x2) plane, the positive points (0, 1) and (1, 0) cannot be separated
from the negative points (0, 0) and (1, 1) by a single straight line]
Boolean XOR with two layers
XOR can be built from the OR and AND perceptrons above plus one output neuron:
▪ Hidden neuron h1 = OR(x1, x2): weights w10 = -0.5, w11 = 1, w12 = 1
▪ Hidden neuron h2 = AND(x1, x2): weights w20 = -1.5, w21 = 1, w22 = 1
▪ Output neuron o: bias weight -0.5, weight +1 from h1 and weight -1 from h2
  (i.e. o fires when h1 is on and h2 is off)

  x1 | x2 | output o
  ---+----+---------
   0 |  0 |    0
   0 |  1 |    1
   1 |  0 |    1
   1 |  1 |    0
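The two-layer construction above can be verified the same way; the sketch below reuses
the OR and AND weights and the step perceptron from the previous examples.

```python
# The two-layer XOR network: OR and AND hidden units feeding one output neuron.
import numpy as np

def perceptron(x, w):
    return 1 if np.dot(w, np.concatenate(([1.0], x))) > 0 else 0

OR_W  = np.array([-0.5, 1.0, 1.0])
AND_W = np.array([-1.5, 1.0, 1.0])
OUT_W = np.array([-0.5, 1.0, -1.0])   # output neuron: bias -0.5, +1 from h1 (OR), -1 from h2 (AND)

def xor_net(x1, x2):
    x = np.array([x1, x2])
    h1 = perceptron(x, OR_W)          # hidden neuron 1 computes OR
    h2 = perceptron(x, AND_W)         # hidden neuron 2 computes AND
    return perceptron(np.array([float(h1), float(h2)]), OUT_W)

for x1 in (0.0, 1.0):
    for x2 in (0.0, 1.0):
        print(int(x1), int(x2), "->", xor_net(x1, x2))   # 0, 1, 1, 0
```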
Exercise
▪ Design neural network:
▪ XNOR
▪ NOR
▪ NAND
▪ XOR using OR, AND and NAND
▪ XNOR using OR, AND and NOR
Perceptron → Sigmoid
▪ Perceptron: functions of the form y = f(x) where
  ▪ x ∈ {0, 1}^n
  ▪ y ∈ {0, 1}
▪ What about arbitrary functions of the form y = f(x), where x ∈ R^n (instead of {0, 1}^n)
  and y ∈ R (instead of {0, 1})?
Multilayer Neural Networks
▪ A multilayer perceptron is a feedforward neural network with ≥ 1 hidden layers.

[Figure, left: a single neuron - inputs x1, x2 with weights w1, w2 feed a linear combiner
and a hard limiter with threshold θ, producing the output Y.
Figure, right: a multilayer network - input layer, first hidden layer, second hidden
layer, and output layer; input signals flow in on the left and output signals emerge on
the right.]
Single-layer vs. Multi-layer Neural Networks
[Figure: comparison of a single-layer and a multi-layer network]
Roles of Layers
▪ Input layer
  ▪ Accepts input signals from the outside world
  ▪ Distributes the signals to neurons in the hidden layer
  ▪ Usually does not do any computation
▪ Output layer (computational neurons)
  ▪ Accepts output signals from the previous hidden layer
  ▪ Outputs to the world
  ▪ Knows the desired outputs
▪ Hidden layer (computational neurons)
  ▪ Determines its own desired outputs

[Figure: input layer, first and second hidden layers, output layer, with input signals
flowing forward and output signals emerging on the right]
Roles of Layers
▪ Neurons in hidden layers are unobservable through the input and output of the network.
▪ Their desired output is unknown (hidden) from the outside and is determined by the
  layer itself.
▪ 1 hidden layer suffices for continuous functions
▪ 2 hidden layers for discontinuous functions
▪ Practical applications mostly use 3 layers
▪ More layers are possible, but each additional layer exponentially increases the
  computing load

[Figure: the same multilayer network - input layer, two hidden layers, output layer]
How do multilayer neural networks learn?
▪ More than a hundred different learning algorithms are available
for multilayer ANNs
▪ The most popular method is back-propagation.
Back-propagation Algorithm
▪ In a back-propagation neural network, the learning algorithm has 2 phases.
1. Forward propagation of inputs
2. Backward propagation of errors
▪ The algorithm loops over the 2 phases until the errors obtained are lower than a
certain threshold.
▪ Learning is done in a similar manner as in a perceptron
▪ A set of training inputs is presented to the network.
▪ The network computes the outputs.
▪ The weights are adjusted to reduce errors.
▪ The activation function used is a sigmoid function.

    Y^sigmoid = 1 / (1 + e^(-X))
Common Activation Functions

▪ Step function:     Y^step = 1 if X ≥ 0, 0 if X < 0
  Hard-limit function, often used for decision-making neurons in classification and
  pattern-recognition tasks.
▪ Sign function:     Y^sign = +1 if X ≥ 0, -1 if X < 0
  Also a hard-limit function for decision-making neurons.
▪ Sigmoid function:  Y^sigmoid = 1 / (1 + e^(-X))
  Output is a real number in the [0, 1] range; popular in back-propagation networks.
▪ Linear function:   Y^linear = X
  Often used for linear approximation.

[Figure: plots of the four functions - the step and sign functions jump at X = 0, the
sigmoid is S-shaped between 0 and 1, and the linear function is a straight line]
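A compact sketch of these four activation functions (NumPy, for illustration only):

```python
# Step, sign, sigmoid, and linear activations applied elementwise to a small input range.
import numpy as np

def step(x):     return np.where(x >= 0, 1.0, 0.0)   # hard limiter, outputs 0 or 1
def sign_fn(x):  return np.where(x >= 0, 1.0, -1.0)  # hard limiter, outputs -1 or +1
def sigmoid(x):  return 1.0 / (1.0 + np.exp(-x))     # smooth, outputs in (0, 1)
def linear(x):   return x                            # identity

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(step(x))              # [0. 0. 1. 1. 1.]
print(sign_fn(x))           # [-1. -1.  1.  1.  1.]
print(sigmoid(x).round(3))  # [0.119 0.378 0.5   0.622 0.881]
print(linear(x))            # [-2.  -0.5  0.   0.5  2. ]
```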
3-layer Back-propagation Neural Network

[Figure: an input layer with neurons 1 ... i ... n receiving x1 ... xn, a hidden layer
with neurons 1 ... j ... m, and an output layer with neurons 1 ... k ... l producing
y1 ... yl. Weights wij connect input to hidden neurons and weights wjk connect hidden to
output neurons. Input signals propagate left to right; error signals propagate right to
left.]
How a neuron determines its output
▪ Very similar to the perceptron:
  1. Compute the net weighted input:          X = Σ_{i=1}^{n} x_i w_i − θ
  2. Pass the result to the activation function:   Y^sigmoid = 1 / (1 + e^(-X))

Example (one hidden neuron j, with θ = 0.2):
  inputs x = (2, 5, 1, 8), weights w = (0.1, 0.2, 0.5, 0.3)

  X = (0.1(2) + 0.2(5) + 0.5(1) + 0.3(8)) − 0.2 = 3.9
  Y = 1 / (1 + e^(-3.9)) ≈ 0.98

[Figure: the 3-layer network with the example inputs feeding neuron j; each neuron shown
outputs approximately 0.98 in this example]
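The worked example can be reproduced in a few lines (same numbers as above):

```python
# Net weighted input minus the threshold, passed through the sigmoid activation.
import numpy as np

def neuron_output(x, w, theta):
    X = np.dot(x, w) - theta          # X = sum_i x_i * w_i - theta
    return 1.0 / (1.0 + np.exp(-X))   # sigmoid activation

x = np.array([2.0, 5.0, 1.0, 8.0])
w = np.array([0.1, 0.2, 0.5, 0.3])
print(neuron_output(x, w, theta=0.2))  # X = 3.9, Y ≈ 0.980
```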
How the errors propagate backward
▪ The errors are computed in a similar manner to the errors in the perceptron.
▪ Error = the output we want − the output we get

    e_k(p) = y_d,k(p) − y_k(p)      (error at output neuron k at iteration p)

Example (iteration p): suppose the desired output is 1 and the network's output is 0.98.

    e_k(p) = 1 − 0.98 = 0.02

[Figure: the 3-layer network with the error signal e_k(p) computed at output neuron k
and propagated backward]
Back-Propagation Training Algorithm
Step 1: Initialization
Randomly set the weights and thresholds θ to numbers within a small range:

    ( −2.4 / F_i ,  +2.4 / F_i )

where F_i is the total number of inputs of neuron i.
The weight initialization is done on a neuron-by-neuron basis.

[Figure: plot of 2.4 / F_i against F_i - the allowed weight range shrinks as the number
of inputs grows]
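One possible reading of this step, sketched in Python; init_layer and the layer sizes are
illustrative names and values, not taken from the slides.

```python
# Every weight and threshold of a neuron with F_i inputs is drawn uniformly
# from (-2.4/F_i, +2.4/F_i), neuron by neuron (here vectorized per layer).
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_inputs, n_neurons):
    bound = 2.4 / n_inputs                                    # F_i = n_inputs for this layer
    w = rng.uniform(-bound, bound, size=(n_neurons, n_inputs))
    theta = rng.uniform(-bound, bound, size=n_neurons)
    return w, theta

w_hidden, theta_hidden = init_layer(n_inputs=2, n_neurons=2)  # e.g. hidden layer of a 2-2-1 net
w_out, theta_out = init_layer(n_inputs=2, n_neurons=1)        # output layer
```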
Back-Propagation Training Algorithm
Step 2: Activation
Propagate the input signals forward from the input layer to the output layer.
For each neuron:

    X = Σ_{i=1}^{n} x_i w_i − θ
    Y^sigmoid = 1 / (1 + e^(-X)) ∈ [0, 1]

Example (as before, with θ = 0.2):

    X = (0.1(2) + 0.2(5) + 0.5(1) + 0.3(8)) − 0.2 = 3.9
    Y = 1 / (1 + e^(-3.9)) ≈ 0.98

[Figure: the 3-layer network with the input signals (2, 5, 1, 8) propagating forward;
each neuron's output is shown as approximately 0.98]
Back-Propagation Training Algorithm
Step 3: Weight Training
There are 2 types of weight training
1. For the output layer neurons
2. For the hidden layer neurons
*** It is important to understand that first the input signals propagate forward, and
then the errors propagate backward to help train the weights. ***
In each iteration (p + 1), the weights are updated based on the weights from the
previous iteration p.
The signals keep flowing forward and backward until the errors are below some
preset threshold value.
3.1 Weight Training (Output layer neurons)

[Figure: hidden neurons 1, 2, ..., j, ..., m connect to output neuron k through weights
w1,k, w2,k, ..., wj,k, ..., wm,k; at iteration p, neuron k produces yk(p) and the error
ek(p) = yd,k(p) − yk(p) is computed]

These formulas are used to perform the weight corrections:

    w_j,k(p+1) = w_j,k(p) + Δw_j,k(p)
    Δw_j,k(p)  = α · y_j(p) · δ_k(p)
    δ_k(p)     = y_k(p) · [1 − y_k(p)] · e_k(p)
    e_k(p)     = y_d,k(p) − y_k(p)

where δ is the error gradient and α is the (predefined) learning rate.

▪ We want to compute w_j,k(p+1); we already know w_j,k(p).
▪ Δw_j,k(p) follows from the hidden-neuron output y_j(p) and the error gradient δ_k(p).
▪ δ_k(p) follows from the output y_k(p) and the error e_k(p), both of which we know.
▪ We do the above for each of the weights of the output layer neurons.
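A sketch of this output-layer correction for one sigmoid output neuron; alpha is the
learning rate, and treating the threshold as a weight on a constant input of −1 is an
assumed convention, not something stated on the slide.

```python
# Output-layer weight correction for a single sigmoid output neuron k.
import numpy as np

def update_output_weights(w_jk, theta_k, y_hidden, y_k, y_desired, alpha=0.1):
    e_k = y_desired - y_k                          # e_k(p) = y_d,k(p) - y_k(p)
    delta_k = y_k * (1.0 - y_k) * e_k              # delta_k(p) = y_k (1 - y_k) e_k
    w_jk = w_jk + alpha * y_hidden * delta_k       # w_j,k(p+1) = w_j,k(p) + alpha * y_j * delta_k
    theta_k = theta_k + alpha * (-1.0) * delta_k   # threshold update (assumed convention)
    return w_jk, theta_k, delta_k
```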
3.2 Weight Training (Hidden layer neurons)

[Figure: input neurons 1, 2, ..., i, ..., n connect to hidden neuron j through weights
w1,j, w2,j, ..., wi,j, ..., wn,j; hidden neuron j connects forward to output neurons
1, ..., k, ..., l]

These formulas are used to perform the weight corrections:

    w_i,j(p+1) = w_i,j(p) + Δw_i,j(p)
    Δw_i,j(p)  = α · x_i(p) · δ_j(p)
    δ_j(p)     = y_j(p) · [1 − y_j(p)] · Σ_{k=1}^{l} δ_k(p) · w_j,k(p)

▪ We want to compute w_i,j(p+1); we already know w_i,j(p).
▪ Δw_i,j(p) follows from the (predefined) learning rate α, the input x_i(p), and the
  error gradient δ_j(p).
▪ δ_j(p) combines the hidden neuron's own output y_j(p) with the error gradients δ_k(p)
  that propagate back from the output layer.
▪ We do the above for each of the weights of the hidden layer neurons.
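A companion sketch for the hidden-layer correction; delta_out holds the output-layer
gradients δ_k(p) and w_jk is the hidden-to-output weight matrix (again, the threshold
update is an assumed convention).

```python
# Hidden-layer weight correction. Shapes: w_ij is (hidden, inputs),
# w_jk is (outputs, hidden), delta_out is (outputs,).
import numpy as np

def update_hidden_weights(w_ij, theta_j, x, y_hidden, delta_out, w_jk, alpha=0.1):
    # delta_j(p) = y_j (1 - y_j) * sum_k delta_k(p) * w_j,k(p)   (errors propagated back)
    delta_j = y_hidden * (1.0 - y_hidden) * (w_jk.T @ delta_out)
    w_ij = w_ij + alpha * np.outer(delta_j, x)      # delta w_i,j(p) = alpha * x_i * delta_j
    theta_j = theta_j + alpha * (-1.0) * delta_j    # threshold update (assumed convention)
    return w_ij, theta_j, delta_j
```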
Iteration p = 1

[Figure: the 3-layer network during iteration p = 1 - the input signals propagate
forward, the error signals propagate backward, and both the hidden-to-output and
input-to-hidden weights are trained]

After the weights are trained in p = 1, we go back to Step 2 (Activation) and compute the
outputs for the new weights.

If the errors obtained with the updated weights are still above the error threshold, we
start weight training for p = 2.

Otherwise, we stop.
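Putting the steps together, a rough training-loop sketch that repeats activation and
weight training until the sum of squared errors falls below a preset threshold; the
learning rate, error threshold, and epoch limit are illustrative, not from the slides.

```python
# Backprop training loop: forward pass (Step 2), then output- and hidden-layer
# weight training (Step 3), repeated until the errors are small enough.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(X, Y, w_ij, theta_j, w_jk, theta_k, alpha=0.1, err_threshold=0.001, max_epochs=10000):
    for p in range(max_epochs):
        sse = 0.0
        for x, y_d in zip(X, Y):
            # Step 2: activation (forward propagation)
            y_hidden = sigmoid(w_ij @ x - theta_j)
            y_out = sigmoid(w_jk @ y_hidden - theta_k)
            e = y_d - y_out
            sse += float(np.sum(e ** 2))
            # Step 3: weight training (errors propagate backward)
            delta_k = y_out * (1 - y_out) * e                          # output-layer gradients
            delta_j = y_hidden * (1 - y_hidden) * (w_jk.T @ delta_k)   # hidden-layer gradients
            w_jk += alpha * np.outer(delta_k, y_hidden)
            theta_k += alpha * -1.0 * delta_k
            w_ij += alpha * np.outer(delta_j, x)
            theta_j += alpha * -1.0 * delta_j
        if sse < err_threshold:                                        # otherwise, we stop
            break
    return w_ij, theta_j, w_jk, theta_k
```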
Iteration p = 2

[Figure: the same 3-layer network during iteration p = 2 - the forward and backward
passes are repeated and the weights are trained again]
Example: 3-layer ANN for XOR

[Figure: the four XOR inputs plotted in the (x1, x2) plane - (0, 0), (0, 1), (1, 0), and
(1, 1); no single straight line separates the two classes]

XOR is not a linearly separable function.

A single-layer ANN (the perceptron) cannot deal with problems that are not linearly
separable. We cope with such problems by using multi-layer neural networks.
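As a usage sketch, the init_layer, train, and sigmoid helpers sketched earlier can be
combined to train a 2-2-1 network on XOR; note that convergence is not guaranteed from
every random start.

```python
# Training a 2-2-1 back-propagation network on XOR (relies on the helpers defined above).
import numpy as np

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([[0.], [1.], [1.], [0.]])

w_ij, theta_j = init_layer(n_inputs=2, n_neurons=2)   # hidden layer
w_jk, theta_k = init_layer(n_inputs=2, n_neurons=1)   # output layer
w_ij, theta_j, w_jk, theta_k = train(X, Y, w_ij, theta_j, w_jk, theta_k, alpha=0.5)

for x, y_d in zip(X, Y):
    y = sigmoid(w_jk @ sigmoid(w_ij @ x - theta_j) - theta_k)
    print(x, round(float(y[0]), 2), "desired:", float(y_d[0]))   # should approach 0, 1, 1, 0
```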
