13b Neural Networks 1
Topics
  Perceptrons
    structure
    training
    expressiveness
  Multilayer networks
    possible structures
    activation functions
    training with gradient descent and backpropagation
    expressiveness
Consider humans:
  Neuron switching time ~ 0.001 second
  Number of neurons ~ 10^10
  Connections per neuron ~ 10^4 to 10^5
  Scene recognition time ~ 0.1 second
  100 inference steps doesn't seem like enough
Properties:
  Many neuron-like threshold switching units
  Many weighted interconnections among units
  Highly parallel, distributed process
  Emphasis on tuning weights automatically
The model is an assembly of nodes connected by weighted links.
The output node sums up its input values according to the weights of their links and compares the sum to a threshold t:

  y = sign( Σ_j w_j x_j − t )
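A direct reading of this rule as code (an illustrative sketch, not from the original slides; the function and argument names are ours):

```python
import numpy as np

def perceptron_output(x, w, t):
    """Perceptron output y = sign(sum_j w_j * x_j - t)."""
    # np.sign returns +1 / -1 (or 0 exactly on the boundary)
    return np.sign(np.dot(w, x) - t)
```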
X1  X2  X3  |  Y
 1   0   0  |  0
 1   0   1  |  1
 1   1   0  |  1
 1   1   1  |  1
 0   0   1  |  0
 0   1   0  |  0
 0   1   1  |  1
 0   0   0  |  0
  y = I( 0.3 x1 + 0.3 x2 + 0.3 x3 > 0.4 ),  where I(z) = 1 if z is true, 0 otherwise
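A quick check (illustrative sketch; the variable names are ours) that this rule reproduces the table above:

```python
import numpy as np

# Rows of the table above: (x1, x2, x3, y)
data = [
    (1, 0, 0, 0), (1, 0, 1, 1), (1, 1, 0, 1), (1, 1, 1, 1),
    (0, 0, 1, 0), (0, 1, 0, 0), (0, 1, 1, 1), (0, 0, 0, 0),
]

w, t = np.array([0.3, 0.3, 0.3]), 0.4

for x1, x2, x3, y in data:
    y_hat = int(np.dot(w, [x1, x2, x3]) > t)  # I(0.3 x1 + 0.3 x2 + 0.3 x3 > 0.4)
    assert y_hat == y                         # matches the table's Y in every row
```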
Perceptron decision boundary: the boundary Σ_j w_j x_j = t is linear (a hyperplane in the input space).
Delta rule
  train on unthresholded outputs
  driven by continuous differences between correct and predicted outputs
  modify weights via gradient descent
Error to minimize (with ŷ_i the unthresholded output w · x_i):

  E(w) = ½ Σ_i (y_i − ŷ_i)²

Example update with learning rate η = 0.1, for a sample with error y_i − ŷ_i = 1:

  Δw_1 = η (y_i − ŷ_i) x_i1 = 0.1
  Δw_2 = η (y_i − ŷ_i) x_i2 = 0.0
  Δw_3 = η (y_i − ŷ_i) x_i3 = 0.1
Note that the error is the difference between the correct output and the unthresholded sum of inputs, a continuous quantity (rather than the binary difference between the correct output and the thresholded output).
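A minimal sketch of one delta-rule step (illustrative; the function and variable names are ours, η = 0.1 follows the example above):

```python
import numpy as np

def delta_rule_update(w, x, y, eta=0.1):
    """One delta-rule step for a single sample (x, y)."""
    y_hat = np.dot(w, x)          # unthresholded output
    error = y - y_hat             # continuous error, not a 0/1 difference
    return w + eta * error * x    # w_j <- w_j + eta * (y - y_hat) * x_j
```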
Gradient:

  ∇E(w) = [ ∂E/∂w_0, ∂E/∂w_1, …, ∂E/∂w_d ]

  ∂E/∂w_j = ∂/∂w_j [ ½ Σ_i (y_i − ŷ_i)² ]
          = ½ Σ_i ∂/∂w_j (y_i − ŷ_i)²
          = ½ Σ_i 2 (y_i − ŷ_i) ∂/∂w_j (y_i − ŷ_i)
          = Σ_i (y_i − ŷ_i) ∂/∂w_j (y_i − w · x_i)
          = Σ_i (y_i − ŷ_i) (− x_ij)

  so  ∂E/∂w_j = − Σ_i (y_i − ŷ_i) x_ij
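This gradient can be computed over a whole sample matrix at once; a sketch in NumPy (the names are ours):

```python
import numpy as np

def error_gradient(w, X, y):
    """dE/dw for E(w) = 1/2 * sum_i (y_i - w . x_i)^2.

    Implements dE/dw_j = -sum_i (y_i - y_hat_i) * x_ij from the
    derivation above; X holds one sample per row.
    """
    y_hat = X @ w                 # unthresholded outputs for all samples
    return -(X.T @ (y - y_hat))   # vector of partial derivatives, one per weight
```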
Batch mode
  Compute errors and weight updates for a block of samples (maybe all samples).
  Apply all updates simultaneously to the weights.
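Putting batch mode together with the gradient above, a sketch of the full training loop (illustrative; the names, zero initialization, and epoch count are our choices):

```python
import numpy as np

def train_batch(X, y, eta=0.1, epochs=100):
    """Batch-mode delta-rule training (sketch)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        y_hat = X @ w                        # unthresholded outputs for the whole block
        delta_w = eta * (X.T @ (y - y_hat))  # accumulate eta * (y_i - y_hat_i) * x_ij over all i
        w += delta_w                         # apply all updates simultaneously
    return w
```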
  y = I( Σ_j w_j x_j > t )