26 Neural Networks
[Diagram: a small network with inputs x1, x2, x3]
What are they?
Inspired by the human brain.
The human brain has about 86 billion neurons and consumes about 20% of your body's energy.
These neurons are connected by somewhere between 100 trillion and 1 quadrillion synapses!
What are they?
[Diagram: analogy between the human brain and a machine, each mapping inputs to outputs]
What are they?
1. Originally developed by Warren McCulloch and Walter Pitts (1943).
2. First trainable network, the perceptron, proposed by Frank Rosenblatt (1957).
3. Started off as an unsupervised learning tool.
   1. Had problems with computing time and could not compute XOR.
   2. Was abandoned in favor of other algorithms.
4. Werbos's (1975) backpropagation algorithm:
   1. Incorporated supervision and solved XOR.
   2. But was still too slow compared with other algorithms, e.g., Support Vector Machines.
5. Backpropagation was accelerated by GPUs in 2010 and shown to be more efficient and cost-effective.
GPUs
GPUs handle parallel operations much better than CPUs (thousands of threads per core), even though each individual thread runs slower than a CPU thread. The matrix multiplication steps in ANNs can be run in parallel, resulting in considerable time and cost savings. The best CPUs handle about 50 GB/s of memory bandwidth, while the best GPUs handle about 750 GB/s.
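The savings come from the fact that a layer's pre-activations are one matrix multiplication, which vectorized libraries (and GPUs) execute in parallel rather than one neuron at a time. A minimal NumPy sketch of the idea; the layer sizes here are arbitrary, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# A batch of 64 inputs, each with 100 features (illustrative sizes).
X = rng.standard_normal((64, 100))
W = rng.standard_normal((100, 32))   # weights for a 100 -> 32 layer
b = np.zeros(32)                     # biases

# Naive version: one neuron at a time, as a single-threaded loop would do it.
slow = np.array([[x @ W[:, j] + b[j] for j in range(32)] for x in X])

# Vectorized version: the whole layer is a single matrix multiplication,
# which BLAS on a CPU or a GPU library executes in parallel.
fast = X @ W + b

print(np.allclose(slow, fast))  # True: the two computations agree
```

The loop and the matrix product compute exactly the same numbers; the difference is only how much of the work can run at once.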
Neuron
[Diagram: a single neuron; a bias input x0 with w0 = −10 and inputs x1, x2 with w1 = w2 = 20 feed an activation function g, producing output y]

z = x0 w0 + x1 w1 + x2 w2 = Σ_{i=0}^{2} wi xi,   y = g(z)
Activation function
[Diagram: the same neuron, with output y = g(w0 x0 + w1 x1 + w2 x2)]

g(z) = 1 / (1 + e^(−z))

z = x0 w0 + x1 w1 + x2 w2 = Σ_{i=0}^{2} wi xi,   y = g(z)
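With the weights shown (w0 = −10 on the bias input, w1 = w2 = 20), this single sigmoid neuron behaves like a logical OR of two binary inputs: z is strongly negative only when both inputs are 0. A small sketch:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x1, x2, w0=-10.0, w1=20.0, w2=20.0):
    # x0 is the bias input, fixed at 1
    z = w0 * 1 + w1 * x1 + w2 * x2
    return sigmoid(z)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(neuron(x1, x2)))  # rounds to x1 OR x2
```

For (0, 0) the pre-activation is −10 and the sigmoid output is nearly 0; for every other input pair it is at least +10 and the output is nearly 1.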
Activation Functions
■ Sigmoid: g(z) = 1 / (1 + e^(−z))
■ Hyperbolic tangent: tanh(z) = 2 / (1 + e^(−2z)) − 1
■ ReLU: r(z) = max(0, z)
■ Leaky ReLU: lr(z) = z for z ≥ 0, αz for z < 0
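Each of these is a one-liner in NumPy; α for the leaky ReLU is a small constant, with 0.01 a common default:

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^-z), squashes values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """tanh(z) = 2 / (1 + e^-2z) - 1, squashes into (-1, 1); same as np.tanh."""
    return 2.0 / (1.0 + np.exp(-2.0 * z)) - 1.0

def relu(z):
    """r(z) = max(0, z)."""
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    """lr(z) = z for z >= 0, alpha*z for z < 0."""
    return np.where(z >= 0, z, alpha * z)

z = np.linspace(-3, 3, 7)
print(np.allclose(tanh(z), np.tanh(z)))  # True: the identity matches np.tanh
```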
Neural Network
[Diagram: a feedforward network; inputs x1, x2, x3 and bias terms b0, b1 feed a hidden layer, which feeds the output]
XOR Gate

x y | za
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 0

[Diagram: inputs x and y feed hidden nodes z0 → h0 (weights w0 = 0.4, w2 = 0.3) and z1 → h1 (weights w1 = 0.6, w3 = 0.8); h0 and h1 feed the output node z2 → h2 (weights w4 = 0.7, w5 = 0.1); output = h2 = ẑ]
Feedforward propagation

Input x = 1, y = 1 (actual output za = 0); activation function g(z) = 1 / (1 + e^(−z)).

Hidden layer:
z0 = w0 x + w2 y = 0.4 × 1 + 0.3 × 1 = 0.7,   h0 = g(z0) = 1 / (1 + e^(−0.7)) = 0.67
z1 = w1 x + w3 y = 0.6 × 1 + 0.8 × 1 = 1.4,   h1 = g(z1) = 1 / (1 + e^(−1.4)) = 0.80

Output layer:
z2 = w4 h0 + w5 h1 = 0.7 × 0.67 + 0.1 × 0.80 = 0.55,   h2 = g(z2) = 1 / (1 + e^(−0.55)) = 0.63 = ẑ
Mean Squared Error, E = (1/2)(ẑ − za)² = (1/2)(0.63 − 0)² = 0.20
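The feedforward pass and the error are only a few lines of code; this sketch uses the weights and the (1, 1) input row from the example:

```python
import math

def g(z):
    return 1.0 / (1.0 + math.exp(-z))

# Weights from the XOR example
w0, w1, w2, w3, w4, w5 = 0.4, 0.6, 0.3, 0.8, 0.7, 0.1

x, y, za = 1, 1, 0          # input row (1, 1), actual output 0

z0 = w0 * x + w2 * y        # 0.7
h0 = g(z0)                  # ~ 0.67
z1 = w1 * x + w3 * y        # 1.4
h1 = g(z1)                  # ~ 0.80
z2 = w4 * h0 + w5 * h1      # ~ 0.55
h2 = g(z2)                  # z_hat ~ 0.63

E = 0.5 * (h2 - za) ** 2    # mean squared error ~ 0.20
print(round(h2, 2), round(E, 2))  # 0.63 0.2
```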
Backpropagation

Derivative of the sigmoid activation:
g′(z) = ∂/∂z g(z) = ∂/∂z [1 / (1 + e^(−z))] = g(z)[1 − g(z)]

Mean Squared Error, E = (1/2)(ẑ − za)² = (1/2)(h2 − za)² = (1/2)(0.63 − 0)² = 0.20

Gradient with respect to w4, by the chain rule:
∂E/∂w4 = ∂E/∂h2 · ∂h2/∂z2 · ∂z2/∂w4 = 0.63 × 0.233 × 0.67 = 0.098

where
∂E/∂h2 = ∂/∂h2 [(1/2)(h2 − za)²] = h2 − za = 0.63
∂h2/∂z2 = ∂/∂z2 g(z2) = g(z2)(1 − g(z2)) = h2(1 − h2) = 0.63 × 0.37 = 0.233
∂z2/∂w4 = ∂/∂w4 [w4 h0 + w5 h1] = h0 = 0.67
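A chain-rule gradient like ∂E/∂w4 can be sanity-checked numerically: perturb w4 by a small ε, rerun the forward pass, and compare the finite-difference slope of E with the analytic value. A small sketch using the example's weights:

```python
import math

def g(z):
    return 1.0 / (1.0 + math.exp(-z))

def error(w4, za=0):
    # Forward pass for input (x, y) = (1, 1), all other weights held fixed
    w0, w1, w2, w3, w5 = 0.4, 0.6, 0.3, 0.8, 0.1
    h0 = g(w0 + w2)                 # z0 = 0.7
    h1 = g(w1 + w3)                 # z1 = 1.4
    h2 = g(w4 * h0 + w5 * h1)
    return 0.5 * (h2 - za) ** 2

# Analytic gradient via the chain rule: (h2 - za) * h2(1 - h2) * h0
h0, h1 = g(0.7), g(1.4)
h2 = g(0.7 * h0 + 0.1 * h1)
analytic = h2 * h2 * (1 - h2) * h0   # ~ 0.098

# Central finite difference around w4 = 0.7
eps = 1e-6
numeric = (error(0.7 + eps) - error(0.7 - eps)) / (2 * eps)

print(abs(analytic - numeric) < 1e-8)  # True: the chain rule checks out
```

This kind of gradient check is a standard way to catch sign errors and dropped factors in a hand-derived backpropagation.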
Backpropagation

Gradient with respect to w5:
∂E/∂w5 = ∂E/∂h2 · ∂h2/∂z2 · ∂z2/∂w5 = 0.63 × 0.233 × 0.80 = 0.117

∂E/∂h2 and ∂h2/∂z2 are as before; the only new factor is
∂z2/∂w5 = ∂/∂w5 [w4 h0 + w5 h1] = h1 = 0.80
Backpropagation

Gradient with respect to w0 (a first-layer weight, so the chain passes through h0):
∂E/∂w0 = ∂E/∂h0 · ∂h0/∂z0 · ∂z0/∂w0 = 0.103 × 0.221 × 1 = 0.023

where
∂E/∂h0 = ∂E/∂h2 · ∂h2/∂z2 · ∂z2/∂h0 = (h2 − za) × h2(1 − h2) × w4 = 0.63 × 0.233 × 0.7 = 0.103
∂h0/∂z0 = ∂/∂z0 g(z0) = g(z0)(1 − g(z0)) = h0(1 − h0) = 0.67 × 0.33 = 0.221
∂z0/∂w0 = ∂/∂w0 [w0 x + w2 y] = x = 1
Backpropagation

Gradient with respect to w2 (also through h0):
∂E/∂w2 = ∂E/∂h0 · ∂h0/∂z0 · ∂z0/∂w2 = 0.103 × 0.221 × 1 = 0.023

∂E/∂h0 and ∂h0/∂z0 are as before; the only new factor is
∂z0/∂w2 = ∂/∂w2 [w0 x + w2 y] = y = 1
Backpropagation

Gradient with respect to w1 (the chain passes through h1):
∂E/∂w1 = ∂E/∂h1 · ∂h1/∂z1 · ∂z1/∂w1 = 0.0147 × 0.16 × 1 = 0.0023

where
∂E/∂h1 = ∂E/∂h2 · ∂h2/∂z2 · ∂z2/∂h1 = (h2 − za) × h2(1 − h2) × w5 = 0.63 × 0.233 × 0.1 = 0.0147
∂h1/∂z1 = g(z1)(1 − g(z1)) = h1(1 − h1) = 0.80 × 0.20 = 0.16
∂z1/∂w1 = ∂/∂w1 [w1 x + w3 y] = x = 1
Backpropagation

Gradient with respect to w3 (also through h1):
∂E/∂w3 = ∂E/∂h1 · ∂h1/∂z1 · ∂z1/∂w3 = 0.0147 × 0.16 × 1 = 0.0023

∂E/∂h1 and ∂h1/∂z1 are as before; the only new factor is
∂z1/∂w3 = ∂/∂w3 [w1 x + w3 y] = y = 1
Gradient descent weight updates

Feedforward (x = 1, y = 1, za = 0):
z0 = 0.7, h0 = 0.67;   z1 = 1.4, h1 = 0.80;   z2 = 0.55, h2 = ẑ = 0.63

Backpropagation gradients:
∂E/∂w4 = 0.098   ∂E/∂w0 = 0.023   ∂E/∂w1 = 0.0023
∂E/∂w5 = 0.117   ∂E/∂w2 = 0.023   ∂E/∂w3 = 0.0023

Each weight is then moved a small step against its gradient, w ← w − η ∂E/∂w for a learning rate η, and the feedforward/backpropagation cycle repeats until the error stops decreasing.
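The full cycle, feedforward, backpropagation, and a gradient-descent update, fits in a short script. The learning rate η = 0.5 here is an arbitrary illustrative choice, not part of the worked example:

```python
import math

def g(z):                     # sigmoid activation
    return 1.0 / (1.0 + math.exp(-z))

# Weights from the XOR example and one training row: (x, y) = (1, 1), za = 0
w = {"w0": 0.4, "w1": 0.6, "w2": 0.3, "w3": 0.8, "w4": 0.7, "w5": 0.1}
x, y, za = 1, 1, 0
eta = 0.5                     # learning rate (illustrative)

# --- feedforward ---
h0 = g(w["w0"] * x + w["w2"] * y)
h1 = g(w["w1"] * x + w["w3"] * y)
h2 = g(w["w4"] * h0 + w["w5"] * h1)      # z_hat
E = 0.5 * (h2 - za) ** 2

# --- backpropagation (chain rule, as on the slides) ---
delta2 = (h2 - za) * h2 * (1 - h2)       # dE/dz2
grads = {
    "w4": delta2 * h0,
    "w5": delta2 * h1,
    "w0": delta2 * w["w4"] * h0 * (1 - h0) * x,
    "w2": delta2 * w["w4"] * h0 * (1 - h0) * y,
    "w1": delta2 * w["w5"] * h1 * (1 - h1) * x,
    "w3": delta2 * w["w5"] * h1 * (1 - h1) * y,
}

# --- gradient descent update: w <- w - eta * dE/dw ---
for name, grad in grads.items():
    w[name] -= eta * grad

# After the update, the same input produces a smaller error
h0n = g(w["w0"] * x + w["w2"] * y)
h1n = g(w["w1"] * x + w["w3"] * y)
h2n = g(w["w4"] * h0n + w["w5"] * h1n)
E_new = 0.5 * (h2n - za) ** 2
print(E_new < E)  # True: one gradient-descent step reduced the error
```

In practice this loop would run over all four XOR rows for many epochs; one row and one step are shown here only to mirror the slides' worked numbers.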