Neural Networks

CS 5233 Artificial Intelligence

Contents

Neural Networks
  Biological Neural Networks
  Artificial Neural Networks
  ANN Structure
  ANN Illustration

Perceptrons
  Perceptron Learning Rule
  Perceptrons Continued
  Example of Perceptron Learning (η = 1)

Gradient Descent
  The Delta Rule
  Gradient Descent (Linear)

Multilayer Networks
  Illustration
  Sigmoid Activation
  Plot of Sigmoid Function
  Applying The Chain Rule
  Backpropagation
  Hypertangent Activation Function
  Gaussian Activation Function
  Issues in Neural Networks

Neural Networks

Biological Neural Networks

Neural networks are inspired by our brains.


The human brain has about 10^11 neurons and 10^14 synapses.
A neuron consists of a soma (cell body), axons (sends signals), and
dendrites (receives signals).
A synapse connects an axon to a dendrite.
Given a signal, a synapse might increase (excite) or decrease (inhibit)
electrical potential. A neuron fires when its electrical potential reaches a
threshold.
Learning might occur by changes to synapses.


Artificial Neural Networks


An (artificial) neural network consists of units, connections, and weights. Inputs
and outputs are numeric.
  Biological NN     Artificial NN
  soma              unit
  axon, dendrite    connection
  synapse           weight
  potential         weighted sum
  threshold         bias weight
  signal            activation


ANN Structure

A typical unit i receives inputs aj1, aj2, ... from other units and performs a
weighted sum:
  ini = W0,i + Σj Wj,i aj
and outputs activation ai = g(ini).
Typically, input units store the inputs, hidden units transform the inputs
into an internal numeric vector, and an output unit transforms the hidden
values into the prediction.
An ANN is a function f(x, W) = a, where x is an example, W is the
weights, and a is the prediction (the activation value from the output unit).
Learning is finding a W that minimizes error.
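As a minimal sketch (my own illustration, not from the handout; the choice of sigmoid for g is only an example), a unit's weighted sum and activation can be computed as:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def unit_activation(inputs, weights, bias_weight, g=sigmoid):
    """Compute in_i = W0,i + sum_j Wj,i * aj and return ai = g(in_i).

    inputs are the activations aj arriving from other units, weights are the
    corresponding Wj,i, and bias_weight is W0,i.  The activation function g is
    left as a parameter; sigmoid is used here only as an example.
    """
    in_i = bias_weight + sum(w * a for w, a in zip(weights, inputs))
    return g(in_i)

# Example: a unit with bias weight 0.5 and two incoming connections.
print(unit_activation([1.0, 0.0], [0.25, -0.75], 0.5))
```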

ANN Illustration

[Figure: a feed-forward network with input units x1 through x4, hidden units H5 and H6,
and an output unit O7 producing activation a7. Weights W15, W25, W35, W45 and
W16, W26, W36, W46 connect the inputs to the hidden units; W57 and W67 connect the
hidden units to the output; W05, W06, and W07 are bias weights.]

Perceptrons

Perceptron Learning Rule

A perceptron is a single unit with activation:
  a = sign(W0 + Σj Wj xj)
sign returns 1 or −1. W0 is the bias weight.
One version of the perceptron learning rule is:
  in ← W0 + Σj Wj xj
  E ← max(0, 1 − y · in)
  if E > 0 then
    W0 ← W0 + η y
    Wj ← Wj + η xj y   for each input xj
x is the inputs, y ∈ {−1, 1} is the target, and η > 0 is the learning rate.
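A minimal Python sketch of this rule (function and variable names are mine, not the handout's):

```python
def perceptron_update(weights, x, y, eta):
    """One application of the perceptron learning rule above.

    weights[0] is the bias weight W0 and weights[1:] are W1..Wn; x is the input
    vector, y is the target in {-1, 1}, and eta > 0 is the learning rate.
    """
    in_ = weights[0] + sum(w * xj for w, xj in zip(weights[1:], x))
    error = max(0.0, 1.0 - y * in_)
    if error > 0:
        weights = ([weights[0] + eta * y] +
                   [w + eta * xj * y for w, xj in zip(weights[1:], x)])
    return weights

# An epoch is one pass of perceptron_update over every training example.
```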

Perceptrons Continued

This learning rule tends to minimize E.
The perceptron convergence theorem states that if some W classifies all
the training examples correctly, then the perceptron learning rule will
converge to zero error on the training examples.
Usually, many epochs (passes over the training examples) are needed until
convergence.
If zero error is not possible, use η ≈ 0.1/n, where n is the number of
normalized or standardized inputs.


Example of Perceptron Learning (η = 1)

Using η = 1. Each row of the table below shows one training example, the
resulting in and E, and the weights after the update, starting from all-zero weights:

  Inputs           y    in   E    Weights
  x1 x2 x3 x4                     W0 W1 W2 W3 W4
                                   0  0  0  0  0
   0  0  0  1     -1     0   1    -1  0  0  0 -1
   1  1  1  0      1    -1   2     0  1  1  1 -1
   1  1  1  1      1     2   0     0  1  1  1 -1
   0  0  1  1     -1     0   1    -1  1  1  0 -2
   0  0  0  0      1    -1   2     0  1  1  0 -2
   0  1  0  1     -1    -1   0     0  1  1  0 -2
   1  0  0  0      1     1   0     0  1  1  0 -2
   1  0  1  1      1    -1   2     1  2  1  1 -1
   0  1  0  0     -1     2   3     0  2  0  1 -1
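This trace can be reproduced with a short Python loop (a sketch using the rule from the Perceptron Learning Rule section; variable names are mine). It ends with W = (0, 2, 0, 1, -1), the last row of the table:

```python
# Training data (x1, x2, x3, x4, y) copied from the table above.
examples = [((0, 0, 0, 1), -1), ((1, 1, 1, 0), 1), ((1, 1, 1, 1), 1),
            ((0, 0, 1, 1), -1), ((0, 0, 0, 0), 1), ((0, 1, 0, 1), -1),
            ((1, 0, 0, 0), 1), ((1, 0, 1, 1), 1), ((0, 1, 0, 0), -1)]
eta = 1
W = [0, 0, 0, 0, 0]                      # W0 (bias), W1, W2, W3, W4
for x, y in examples:
    in_ = W[0] + sum(w * xj for w, xj in zip(W[1:], x))
    E = max(0, 1 - y * in_)
    if E > 0:                            # update only when the margin is violated
        W[0] += eta * y
        for j, xj in enumerate(x, start=1):
            W[j] += eta * xj * y
    print(x, y, in_, E, W)               # one line per row of the table
```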

Gradient Descent

The Delta Rule

Given error E(W), obtain the gradient:
  ∇E(W) = (∂E/∂W0, ∂E/∂W1, ..., ∂E/∂Wn)
To decrease error, use the update rule:
  Wj ← Wj − η ∂E/∂Wj
The LMS update rule can be derived using:
  E(W) = (y − (W0 + Σj Wj xj))² / 2
For the perceptron learning rule, use:
  E(W) = max(0, 1 − y (W0 + Σj Wj xj))
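As a one-step check (my derivation, not in the handout), differentiating the LMS error with a = W0 + Σj Wj xj recovers the Widrow-Hoff update given in the next section:

```latex
\frac{\partial E}{\partial W_0} = -(y - a),
\qquad
\frac{\partial E}{\partial W_j} = -(y - a)\,x_j
\quad\Longrightarrow\quad
W_0 \leftarrow W_0 + \eta\,(y - a),
\qquad
W_j \leftarrow W_j + \eta\,(y - a)\,x_j
```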


Gradient Descent (Linear)




Suppose activation is a linear function:
  a = W0 + Σj Wj xj
The Widrow-Hoff (LMS) update rule is:
  diff ← y − a
  W0 ← W0 + η diff
  Wj ← Wj + η xj diff   for each input j
where η is the learning rate.
This update rule tends to minimize the squared error
  E(W) = Σ(x,y) (y − a)²
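A minimal sketch of one pass of this rule over a data set (names are mine, not the handout's):

```python
def lms_epoch(W, data, eta):
    """One pass of the Widrow-Hoff (LMS) rule for a linear unit.

    W[0] is the bias weight W0 and W[1:] are W1..Wn; data is a list of (x, y)
    pairs with x a tuple of inputs; eta is the learning rate.
    """
    for x, y in data:
        a = W[0] + sum(w * xj for w, xj in zip(W[1:], x))   # linear activation
        diff = y - a
        W[0] += eta * diff
        for j, xj in enumerate(x, start=1):
            W[j] += eta * xj * diff
    return W

# Repeated epochs with a small eta (e.g. 0.01) drive the squared error down.
```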

Multilayer Networks

Illustration

[Figure: the same network as in the ANN Illustration (input units x1 through x4,
hidden units H5 and H6, output unit O7 with output a7, and bias weights), shown
with example numeric weight values.]

Sigmoid Activation

The sigmoid function is defined as:
  sigmoid(x) = 1 / (1 + e^(−x))
It is commonly used for ANN activation functions:
  ai = sigmoid(ini) = sigmoid(W0,i + Σj Wj,i aj)
Note that
  ∂sigmoid(x)/∂x = sigmoid(x) (1 − sigmoid(x))
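A quick numerical check of the derivative identity (a sketch; the test point is arbitrary):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Central-difference check of  d sigmoid(x)/dx = sigmoid(x) * (1 - sigmoid(x))
x, h = 0.7, 1e-6
numeric  = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
analytic = sigmoid(x) * (1 - sigmoid(x))
print(numeric, analytic)   # the two values agree closely
```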

Plot of Sigmoid Function

[Figure: plot of sigmoid(x) for x roughly in [−4, 4]; the curve rises from 0 toward 1.]

Applying The Chain Rule

Using E = (yi − ai)² for output unit i:
  ∂E/∂Wj,i = (∂E/∂ai) (∂ai/∂ini) (∂ini/∂Wj,i)
           = −2 (yi − ai) · ai (1 − ai) · aj
For weights from input to hidden units:
  ∂E/∂Wk,j = (∂E/∂ai) (∂ai/∂ini) (∂ini/∂aj) (∂aj/∂inj) (∂inj/∂Wk,j)
           = −2 (yi − ai) · ai (1 − ai) · Wj,i · aj (1 − aj) · xk

Backpropagation

Backpropagation is an application of the delta rule.
Update each weight W using the rule:
  W ← W − η ∂E/∂W
where η is the learning rate.
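A minimal sketch of one such update for the 4-2-1 sigmoid network of the illustration, using the chain-rule gradients above (function names, starting weights, and the example input are my own, not the handout's):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_step(x, y, Wh, Wo, eta):
    """One backpropagation update with E = (y - a_o)^2.

    Wh[h] = [W0h, W1h, ..., W4h]: weights into hidden unit h (index 0 is the bias weight).
    Wo    = [W0o, W5o, W6o]:      weights into the output unit.
    """
    # Forward pass: hidden activations, then output activation.
    a_h = [sigmoid(w[0] + sum(wk * xk for wk, xk in zip(w[1:], x))) for w in Wh]
    a_o = sigmoid(Wo[0] + sum(wo * ah for wo, ah in zip(Wo[1:], a_h)))

    # Backward pass: dE/d(in) at the output, then at each hidden unit.
    delta_o = -2.0 * (y - a_o) * a_o * (1.0 - a_o)
    Wo_new = ([Wo[0] - eta * delta_o] +
              [w - eta * delta_o * ah for w, ah in zip(Wo[1:], a_h)])
    Wh_new = []
    for h, w in enumerate(Wh):
        delta_h = delta_o * Wo[1 + h] * a_h[h] * (1.0 - a_h[h])
        Wh_new.append([w[0] - eta * delta_h] +
                      [wk - eta * delta_h * xk for wk, xk in zip(w[1:], x)])
    return Wh_new, Wo_new

# Example usage with small arbitrary starting weights.
Wh = [[0.1, 0.2, -0.1, 0.05, 0.0],     # weights into H5: bias, then from x1..x4
      [-0.2, 0.1, 0.3, -0.05, 0.2]]    # weights into H6
Wo = [0.0, 0.5, -0.5]                  # weights into O7: bias, then from H5, H6
Wh, Wo = backprop_step((1, 0, 1, 0), 1, Wh, Wo, eta=0.1)
```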

Hypertangent Activation Function

[Figure: plot of tanh(x) for x roughly in [−2, 2]; the curve rises from −1 toward 1.]

Gaussian Activation Function

[Figure: plot of gaussian(x, 1) for x roughly in [−3, 3]; a bell-shaped curve peaking at x = 0.]

Issues in Neural Networks

Sufficiently complex ANNs can approximate any reasonable function.
ANNs have (approximately) a preference bias for interpolation.
How many hidden units and layers?
What are good initial values?
One approach to avoid overfitting (sketched in the code after this list) is:
  Remove validation examples from the training examples.
  Train the neural network using the training examples.
  Choose the weights that are best on the validation set.
Faster algorithms might rely on:
  Momentum, a running average of deltas;
  Conjugate Gradient, a second-derivative method;
  or RPROP, which updates based only on the sign of the gradient.
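A sketch of the validation-based approach above (the train_one_epoch and evaluate hooks are placeholders supplied by the caller, not part of the handout):

```python
import copy

def train_with_validation(network, train_exs, val_exs, epochs,
                          train_one_epoch, evaluate):
    """Keep the weights that are best on held-out validation examples.

    train_one_epoch(network, examples) is assumed to update the network in
    place, and evaluate(network, examples) to return an error to minimize.
    """
    best_error = evaluate(network, val_exs)
    best_weights = copy.deepcopy(network)
    for _ in range(epochs):
        train_one_epoch(network, train_exs)
        error = evaluate(network, val_exs)
        if error < best_error:
            best_error, best_weights = error, copy.deepcopy(network)
    return best_weights
```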

