Neural Networks

CS 5233 Artificial Intelligence

Contents

Neural Networks
  Biological Neural Networks
  Artificial Neural Networks
  ANN Structure
  ANN Illustration

Perceptrons
  Perceptron Learning Rule
  Perceptrons Continued
  Example of Perceptron Learning (η = 1)

Gradient Descent
  The Delta Rule
  Gradient Descent (Linear)

Multilayer Networks
  Illustration
  Sigmoid Activation
  Plot of Sigmoid Function
  Applying The Chain Rule
  Backpropagation
  Hypertangent Activation Function
  Gaussian Activation Function
  Issues in Neural Networks

Neural Networks

Biological Neural Networks

Neural networks are inspired by our brains.


The human brain has about 10^11 neurons and 10^14 synapses.
A neuron consists of a soma (cell body), axons (sends signals), and
dendrites (receives signals).
A synapse connects an axon to a dendrite.
Given a signal, a synapse might increase (excite) or decrease (inhibit)
electrical potential. A neuron fires when its electrical potential reaches a
threshold.
Learning might occur by changes to synapses.


Artificial Neural Networks


An (artificial) neural network consists of units, connections, and weights. Inputs
and outputs are numeric.
  Biological NN     Artificial NN
  soma              unit
  axon, dendrite    connection
  synapse           weight
  potential         weighted sum
  threshold         bias weight
  signal            activation


ANN Structure

A typical unit i receives inputs aj1, aj2, ... from other units and performs a
weighted sum:
  ini = W0,i + Σj Wj,i aj
and outputs activation ai = g(ini).
Typically, input units store the inputs, hidden units transform the inputs
into an internal numeric vector, and an output unit transforms the hidden
values into the prediction.
An ANN is a function f(x, W) = a, where x is an example, W is the
weights, and a is the prediction (the activation value from the output unit).
Learning is finding a W that minimizes error.
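As a minimal sketch (my own illustration, not from the handout; the choice of sigmoid for g is only an example), a unit's weighted sum and activation can be computed as:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def unit_activation(inputs, weights, bias_weight, g=sigmoid):
    """Compute in_i = W0,i + sum_j Wj,i * aj and return ai = g(in_i).

    inputs are the activations aj arriving from other units, weights are the
    corresponding Wj,i, and bias_weight is W0,i.  The activation function g is
    left as a parameter; sigmoid is used here only as an example.
    """
    in_i = bias_weight + sum(w * a for w, a in zip(weights, inputs))
    return g(in_i)

# Example: a unit with bias weight 0.5 and two incoming connections.
print(unit_activation([1.0, 0.0], [0.25, -0.75], 0.5))
```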

ANN Illustration

[Figure: a feed-forward network with input units x1 through x4, hidden units H5 and H6,
and an output unit O7 producing activation a7. Weights W15, W25, W35, W45 and
W16, W26, W36, W46 connect the inputs to the hidden units; W57 and W67 connect the
hidden units to the output; W05, W06, and W07 are bias weights.]

Perceptrons

Perceptron Learning Rule

A perceptron is a single unit with activation:
  a = sign(W0 + Σj Wj xj)
sign returns 1 or −1. W0 is the bias weight.
One version of the perceptron learning rule is:
  in ← W0 + Σj Wj xj
  E ← max(0, 1 − y · in)
  if E > 0 then
    W0 ← W0 + η y
    Wj ← Wj + η xj y   for each input xj
x is the inputs, y ∈ {−1, 1} is the target, and η > 0 is the learning rate.
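A minimal Python sketch of this rule (function and variable names are mine, not the handout's):

```python
def perceptron_update(weights, x, y, eta):
    """One application of the perceptron learning rule above.

    weights[0] is the bias weight W0 and weights[1:] are W1..Wn; x is the input
    vector, y is the target in {-1, 1}, and eta > 0 is the learning rate.
    """
    in_ = weights[0] + sum(w * xj for w, xj in zip(weights[1:], x))
    error = max(0.0, 1.0 - y * in_)
    if error > 0:
        weights = ([weights[0] + eta * y] +
                   [w + eta * xj * y for w, xj in zip(weights[1:], x)])
    return weights

# An epoch is one pass of perceptron_update over every training example.
```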

Perceptrons Continued

This learning rule tends to minimize E.
The perceptron convergence theorem states that if some W classifies all
the training examples correctly, then the perceptron learning rule will
converge to zero error on the training examples.
Usually, many epochs (passes over the training examples) are needed until
convergence.
If zero error is not possible, use η ≈ 0.1/n, where n is the number of
normalized or standardized inputs.


Example of Perceptron Learning (η = 1)

Using η = 1. Each row of the table below shows one training example, the
resulting in and E, and the weights after the update, starting from all-zero weights:

  Inputs           y    in   E    Weights
  x1 x2 x3 x4                     W0 W1 W2 W3 W4
                                   0  0  0  0  0
   0  0  0  1     -1     0   1    -1  0  0  0 -1
   1  1  1  0      1    -1   2     0  1  1  1 -1
   1  1  1  1      1     2   0     0  1  1  1 -1
   0  0  1  1     -1     0   1    -1  1  1  0 -2
   0  0  0  0      1    -1   2     0  1  1  0 -2
   0  1  0  1     -1    -1   0     0  1  1  0 -2
   1  0  0  0      1     1   0     0  1  1  0 -2
   1  0  1  1      1    -1   2     1  2  1  1 -1
   0  1  0  0     -1     2   3     0  2  0  1 -1
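This trace can be reproduced with a short Python loop (a sketch using the rule from the Perceptron Learning Rule section; variable names are mine). It ends with W = (0, 2, 0, 1, -1), the last row of the table:

```python
# Training data (x1, x2, x3, x4, y) copied from the table above.
examples = [((0, 0, 0, 1), -1), ((1, 1, 1, 0), 1), ((1, 1, 1, 1), 1),
            ((0, 0, 1, 1), -1), ((0, 0, 0, 0), 1), ((0, 1, 0, 1), -1),
            ((1, 0, 0, 0), 1), ((1, 0, 1, 1), 1), ((0, 1, 0, 0), -1)]
eta = 1
W = [0, 0, 0, 0, 0]                      # W0 (bias), W1, W2, W3, W4
for x, y in examples:
    in_ = W[0] + sum(w * xj for w, xj in zip(W[1:], x))
    E = max(0, 1 - y * in_)
    if E > 0:                            # update only when the margin is violated
        W[0] += eta * y
        for j, xj in enumerate(x, start=1):
            W[j] += eta * xj * y
    print(x, y, in_, E, W)               # one line per row of the table
```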

Gradient Descent

The Delta Rule

Given error E(W), obtain the gradient:
  ∇E(W) = (∂E/∂W0, ∂E/∂W1, ..., ∂E/∂Wn)
To decrease error, use the update rule:
  Wj ← Wj − η ∂E/∂Wj
The LMS update rule can be derived using:
  E(W) = (y − (W0 + Σj Wj xj))² / 2
For the perceptron learning rule, use:
  E(W) = max(0, 1 − y (W0 + Σj Wj xj))
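As a one-step check (my derivation, not in the handout), differentiating the LMS error with a = W0 + Σj Wj xj recovers the Widrow-Hoff update given in the next section:

```latex
\frac{\partial E}{\partial W_0} = -(y - a),
\qquad
\frac{\partial E}{\partial W_j} = -(y - a)\,x_j
\quad\Longrightarrow\quad
W_0 \leftarrow W_0 + \eta\,(y - a),
\qquad
W_j \leftarrow W_j + \eta\,(y - a)\,x_j
```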


Gradient Descent (Linear)




Suppose activation is a linear function:
  a = W0 + Σj Wj xj
The Widrow-Hoff (LMS) update rule is:
  diff ← y − a
  W0 ← W0 + η diff
  Wj ← Wj + η xj diff   for each input j
where η is the learning rate.
This update rule tends to minimize the squared error
  E(W) = Σ(x,y) (y − a)²
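A minimal sketch of one pass of this rule over a data set (names are mine, not the handout's):

```python
def lms_epoch(W, data, eta):
    """One pass of the Widrow-Hoff (LMS) rule for a linear unit.

    W[0] is the bias weight W0 and W[1:] are W1..Wn; data is a list of (x, y)
    pairs with x a tuple of inputs; eta is the learning rate.
    """
    for x, y in data:
        a = W[0] + sum(w * xj for w, xj in zip(W[1:], x))   # linear activation
        diff = y - a
        W[0] += eta * diff
        for j, xj in enumerate(x, start=1):
            W[j] += eta * xj * diff
    return W

# Repeated epochs with a small eta (e.g. 0.01) drive the squared error down.
```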

Multilayer Networks

Illustration

[Figure: the same network as in the ANN Illustration (input units x1 through x4,
hidden units H5 and H6, output unit O7 with output a7, and bias weights), shown
with example numeric weight values.]

Sigmoid Activation

The sigmoid function is defined as:
  sigmoid(x) = 1 / (1 + e^(−x))
It is commonly used for ANN activation functions:
  ai = sigmoid(ini) = sigmoid(W0,i + Σj Wj,i aj)
Note that
  ∂sigmoid(x)/∂x = sigmoid(x) (1 − sigmoid(x))
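A quick numerical check of the derivative identity (a sketch; the test point is arbitrary):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Central-difference check of  d sigmoid(x)/dx = sigmoid(x) * (1 - sigmoid(x))
x, h = 0.7, 1e-6
numeric  = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
analytic = sigmoid(x) * (1 - sigmoid(x))
print(numeric, analytic)   # the two values agree closely
```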

Plot of Sigmoid Function

[Figure: plot of sigmoid(x) for x roughly in [−4, 4]; the curve rises from 0 toward 1.]

Applying The Chain Rule

Using E = (yi − ai)² for output unit i:
  ∂E/∂Wj,i = (∂E/∂ai) (∂ai/∂ini) (∂ini/∂Wj,i)
           = −2 (yi − ai) · ai (1 − ai) · aj
For weights from input to hidden units:
  ∂E/∂Wk,j = (∂E/∂ai) (∂ai/∂ini) (∂ini/∂aj) (∂aj/∂inj) (∂inj/∂Wk,j)
           = −2 (yi − ai) · ai (1 − ai) · Wj,i · aj (1 − aj) · xk

Backpropagation

Backpropagation is an application of the delta rule.
Update each weight W using the rule:
  W ← W − η ∂E/∂W
where η is the learning rate.
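A minimal sketch of one such update for the 4-2-1 sigmoid network of the illustration, using the chain-rule gradients above (function names, starting weights, and the example input are my own, not the handout's):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_step(x, y, Wh, Wo, eta):
    """One backpropagation update with E = (y - a_o)^2.

    Wh[h] = [W0h, W1h, ..., W4h]: weights into hidden unit h (index 0 is the bias weight).
    Wo    = [W0o, W5o, W6o]:      weights into the output unit.
    """
    # Forward pass: hidden activations, then output activation.
    a_h = [sigmoid(w[0] + sum(wk * xk for wk, xk in zip(w[1:], x))) for w in Wh]
    a_o = sigmoid(Wo[0] + sum(wo * ah for wo, ah in zip(Wo[1:], a_h)))

    # Backward pass: dE/d(in) at the output, then at each hidden unit.
    delta_o = -2.0 * (y - a_o) * a_o * (1.0 - a_o)
    Wo_new = ([Wo[0] - eta * delta_o] +
              [w - eta * delta_o * ah for w, ah in zip(Wo[1:], a_h)])
    Wh_new = []
    for h, w in enumerate(Wh):
        delta_h = delta_o * Wo[1 + h] * a_h[h] * (1.0 - a_h[h])
        Wh_new.append([w[0] - eta * delta_h] +
                      [wk - eta * delta_h * xk for wk, xk in zip(w[1:], x)])
    return Wh_new, Wo_new

# Example usage with small arbitrary starting weights.
Wh = [[0.1, 0.2, -0.1, 0.05, 0.0],     # weights into H5: bias, then from x1..x4
      [-0.2, 0.1, 0.3, -0.05, 0.2]]    # weights into H6
Wo = [0.0, 0.5, -0.5]                  # weights into O7: bias, then from H5, H6
Wh, Wo = backprop_step((1, 0, 1, 0), 1, Wh, Wo, eta=0.1)
```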

Hypertangent Activation Function

[Figure: plot of tanh(x) for x roughly in [−2, 2]; the curve rises from −1 toward 1.]

Gaussian Activation Function

[Figure: plot of gaussian(x, 1) for x roughly in [−3, 3]; a bell-shaped curve peaking at x = 0.]

Issues in Neural Networks

Sufficiently complex ANNs can approximate any reasonable function.
ANNs have (approximately) a preference bias for interpolation.
How many hidden units and layers?
What are good initial values?
One approach to avoid overfitting (sketched in the code after this list) is:
  Remove validation examples from the training examples.
  Train the neural network using the training examples.
  Choose the weights that are best on the validation set.
Faster algorithms might rely on:
  Momentum, a running average of deltas;
  Conjugate Gradient, a second-derivative method;
  or RPROP, which updates based only on the sign of the gradient.
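A sketch of the validation-based approach above (the train_one_epoch and evaluate hooks are placeholders supplied by the caller, not part of the handout):

```python
import copy

def train_with_validation(network, train_exs, val_exs, epochs,
                          train_one_epoch, evaluate):
    """Keep the weights that are best on held-out validation examples.

    train_one_epoch(network, examples) is assumed to update the network in
    place, and evaluate(network, examples) to return an error to minimize.
    """
    best_error = evaluate(network, val_exs)
    best_weights = copy.deepcopy(network)
    for _ in range(epochs):
        train_one_epoch(network, train_exs)
        error = evaluate(network, val_exs)
        if error < best_error:
            best_error, best_weights = error, copy.deepcopy(network)
    return best_weights
```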

