Neural Nets Handout
Neural Networks
    Biological Neural Networks
    Artificial Neural Networks
    ANN Structure
    ANN Illustration
Perceptrons
    Perceptron Learning Rule
    Perceptrons Continued
    Example of Perceptron Learning (α = 1)
Gradient Descent
    Gradient Descent (Linear)
    The Delta Rule
Multilayer Networks
    Illustration
    Sigmoid Activation
    Plot of Sigmoid Function
    Applying The Chain Rule
    Backpropagation
    Hypertangent Activation Function
    Gaussian Activation Function
    Issues in Neural Networks
ANN Structure
A typical unit i receives inputs a_j1, a_j2, ... from other units and performs a weighted sum:

    in_i = W_{0,i} + sum_j W_{j,i} a_j

and outputs the activation a_i = g(in_i).

Typically, input units store the inputs, hidden units transform the inputs into an internal numeric vector, and an output unit transforms the hidden values into the prediction.

An ANN is a function f(x, W) = a, where x is an example, W is the weight vector, and a is the prediction (the activation value of the output unit).

Learning is finding a W that minimizes error.
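The weighted sum and activation of a single unit can be sketched in a few lines of Python. The weights, inputs, and hard-threshold activation below are illustrative choices, not the handout's data.

```python
# Sketch of one ANN unit, following the handout's definitions:
# in_i = W_0,i + sum_j W_j,i * a_j, then a_i = g(in_i).

def unit_activation(weights, bias, inputs, g):
    """Weighted sum of inputs plus the bias weight, passed through activation g."""
    in_i = bias + sum(w * a for w, a in zip(weights, inputs))
    return g(in_i)

# A hard-threshold activation, as a perceptron would use:
step = lambda x: 1 if x >= 0 else 0

print(unit_activation([1, 1, -1], bias=-1, inputs=[1, 0, 0], g=step))  # in = -1 + 1 = 0, so g fires
```

Swapping `step` for a differentiable g, such as the sigmoid used later in the handout, changes nothing else in this computation.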
ANN Illustration

[Figure: a feed-forward network. Input units x1, x2, x3, x4 connect to hidden units H5 and H6 through weights W15, W25, W35, W45 and W16, W26, W36, W46; bias weights W05 and W06 feed the hidden units. The hidden units connect through W57 and W67 (plus bias weight W07) to output unit O7, whose activation a7 is the output.]

Perceptrons Continued
CS 5233 Artificial Intelligence
Example of Perceptron Learning (α = 1)

Using α = 1:

[Table: a sequence of perceptron-learning steps. Each row lists the inputs x1-x4, the weighted sum in, and the weights W0-W4 after the update; the weights start at (0, 0, 0, 0, 0) and are adjusted by the learning rule after each misclassified example.]
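The steps tabulated above can be reproduced in code: compute the thresholded output, form diff = y − a, and add diff (scaled by each input) to the weights. The training set below (x1 AND x2, with x3 and x4 ignored) is an illustrative assumption, not the handout's exact data.

```python
# Perceptron learning with learning rate alpha = 1.

def train_perceptron(examples, n_inputs, epochs=10):
    W = [0] * (n_inputs + 1)              # W[0] is the bias weight W0
    for _ in range(epochs):
        for x, y in examples:
            in_ = W[0] + sum(W[j + 1] * x[j] for j in range(n_inputs))
            a = 1 if in_ >= 0 else 0      # hard-threshold output
            diff = y - a                  # diff = y - a
            W[0] += diff                  # W0 gets diff
            for j in range(n_inputs):
                W[j + 1] += x[j] * diff   # Wj gets xj * diff
    return W

# Learn "x1 AND x2" (x3, x4 are irrelevant inputs):
data = [((0, 0, 0, 0), 0), ((0, 1, 0, 0), 0), ((1, 0, 0, 0), 0), ((1, 1, 0, 0), 1)]
W = train_perceptron(data, n_inputs=4)
```

Because this target is linearly separable, the perceptron convergence theorem guarantees the loop eventually stops changing the weights.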
Gradient Descent

The weight update, with learning rate α:

    diff ← y − a
    W0 ← W0 + α · diff
    Wj ← Wj + α · xj · diff   for each input j

Multilayer Networks

Illustration

[Figure: the feed-forward network applied to a training example (x, y). Input units x1-x4 feed hidden units H5 and H6 (each with a bias weight); the hidden units feed output unit O7, which emits the activation a7.]
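For a single linear unit, the delta-style update Wj ← Wj + α · xj · (y − a) is exactly gradient descent on squared error. A minimal sketch, with illustrative data (samples of y = 2·x1 + 1) and an assumed α = 0.1:

```python
# Delta rule for one linear unit: a = W0 + sum_j Wj * xj.
# Each step moves the weights by alpha * (y - a) * xj, i.e. downhill
# on the squared error (y - a)^2.

def delta_rule_step(W, x, y, alpha):
    a = W[0] + sum(w * xj for w, xj in zip(W[1:], x))   # linear output
    diff = y - a
    W[0] += alpha * diff                    # bias weight update
    for j, xj in enumerate(x):
        W[j + 1] += alpha * xj * diff       # Wj update
    return W

# Fit y = 2*x1 + 1 from three noise-free samples:
W = [0.0, 0.0]
for _ in range(200):
    for x, y in [((0.0,), 1.0), ((1.0,), 3.0), ((2.0,), 5.0)]:
        delta_rule_step(W, x, y, alpha=0.1)
```

After the loop, W is close to the exact solution (W0, W1) = (1, 2); unlike the thresholded perceptron, the linear unit converges smoothly rather than in discrete jumps.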
Sigmoid Activation

    a_i = sigmoid(in_i) = sigmoid(W_{0,i} + sum_j W_{j,i} a_j)

Note that

    d/dx sigmoid(x) = sigmoid(x) (1 − sigmoid(x))

Applying The Chain Rule

For an output-unit weight W_{j,i}, with squared error E = (y_i − a_i)^2:

    ∂E/∂W_{j,i} = (∂E/∂a_i) (∂a_i/∂in_i) (∂in_i/∂W_{j,i})
                = −2 (y_i − a_i) · a_i (1 − a_i) · a_j

For a hidden-unit weight W_{k,j}:

    ∂E/∂W_{k,j} = (∂E/∂a_i) (∂a_i/∂in_i) (∂in_i/∂a_j) (∂a_j/∂in_j) (∂in_j/∂W_{k,j})
                = −2 (y_i − a_i) · a_i (1 − a_i) · W_{j,i} · a_j (1 − a_j) · x_k
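These chain-rule expressions can be verified numerically. The sketch below computes both gradients on a one-hidden-unit slice of the network (the smallest case of the formulas) and checks the output-weight gradient against a central finite difference; all weight and input values are arbitrary illustrative choices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, Wh, bh, Wo, bo):
    """One hidden unit j feeding one output unit i."""
    a_j = sigmoid(bh + sum(w * xk for w, xk in zip(Wh, x)))
    a_i = sigmoid(bo + Wo * a_j)
    return a_j, a_i

x, y = [1.0, 0.0, 1.0, 0.5], 1.0
Wh, bh, Wo, bo = [0.2, -0.1, 0.4, 0.3], 0.1, 0.5, -0.2

a_j, a_i = forward(x, Wh, bh, Wo, bo)

# Analytic gradients from the chain rule:
dE_dWo = -2 * (y - a_i) * a_i * (1 - a_i) * a_j               # output weight W_j,i
dE_dWh0 = -2 * (y - a_i) * a_i * (1 - a_i) * Wo \
          * a_j * (1 - a_j) * x[0]                            # hidden weight W_k,j (k = 0)

# Central finite-difference check on the output weight, E = (y - a_i)^2:
eps = 1e-6
_, a_hi = forward(x, Wh, bh, Wo + eps, bo)
_, a_lo = forward(x, Wh, bh, Wo - eps, bo)
numeric = ((y - a_hi) ** 2 - (y - a_lo) ** 2) / (2 * eps)
```

The analytic and numeric values agree to several decimal places, which is a quick sanity check worth running whenever a gradient is derived by hand.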
Plot of Sigmoid Function

[Figure: sigmoid(x) plotted against x; the curve rises from near 0.0 at x = −4 toward 1.0, passing through 0.5 at x = 0.]

Backpropagation

Each weight is moved a step against its error gradient:

    W ← W − α · ∂E/∂W

Hypertangent Activation Function

[Figure: tanh(x) plotted against x; the curve rises from −1.0 toward 1.0, passing through 0 at x = 0.]
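A full backpropagation pass combines the chain-rule gradients with the update W ← W − α · ∂E/∂W. The sketch below trains the handout's 4-2-1 shape on a toy task ("output 1 iff x1 is on"); the data, initialization range, α, and epoch count are all illustrative assumptions.

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, alpha=0.5, epochs=1000, seed=0):
    rng = random.Random(seed)
    # Wh[j]: weights into hidden unit j (index 0 is the bias weight).
    Wh = [[rng.uniform(-0.5, 0.5) for _ in range(5)] for _ in range(2)]
    Wo = [rng.uniform(-0.5, 0.5) for _ in range(3)]   # bias + two hidden units
    for _ in range(epochs):
        for x, y in examples:
            # Forward pass.
            a = [sigmoid(w[0] + sum(wk * xk for wk, xk in zip(w[1:], x)))
                 for w in Wh]
            out = sigmoid(Wo[0] + Wo[1] * a[0] + Wo[2] * a[1])
            # Backward pass: chain-rule gradient terms.
            d_out = -2 * (y - out) * out * (1 - out)
            d_hid = [d_out * Wo[j + 1] * a[j] * (1 - a[j]) for j in range(2)]
            # Gradient-descent updates: W <- W - alpha * dE/dW.
            Wo[0] -= alpha * d_out
            for j in range(2):
                Wo[j + 1] -= alpha * d_out * a[j]
                Wh[j][0] -= alpha * d_hid[j]
                for k in range(4):
                    Wh[j][k + 1] -= alpha * d_hid[j] * x[k]
    return Wh, Wo

# Learn "output 1 iff x1 is on":
data = [((1, 0, 0, 0), 1), ((0, 1, 0, 0), 0), ((0, 0, 1, 1), 0), ((1, 1, 0, 1), 1)]
Wh, Wo = train(data)
```

Note how the hidden-unit term d_hid reuses d_out scaled by the outgoing weight: this is the "propagating the error backward" that gives the algorithm its name.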