
Chapter 1

 Why neural networks:


 There are problem categories that cannot be formulated as an algorithm (example: predicting
house prices)
 Without an algorithm, a computer cannot solve the problem
 A computer cannot learn the way a human brain does (a computer is static)
 The largest part of the human brain is continuously working, whereas the largest part of a
computer is used only for passive data storage

                                Brain                 Computer
No. of processing units         10^11                 10^9
Type of processing units        neurons               transistors
Type of calculation             massively parallel    usually serial
Data storage                    associative           address-based
Possible switching operations   10^13 s^-1            10^18 s^-1
Actual switching operations     10^12 s^-1            10^10 s^-1

 Neural network characteristics:


 inspired by biological systems which:
 consist of very simple but numerous nerve cells that
 work massively in parallel and
 have the capability to learn
 No need to explicitly program a neural network
 it can learn from:
o training samples or
o by encouragement
 Can generalize and associate data
 Fault tolerance against noisy input data

 main characteristics we try to adapt from biology:


 Self-organization and learning capability
 Generalization capability
 Fault tolerance
Chapter 2

 Nervous system:
 Central:
 brain
 spinal cord
 Peripheral
 nerves outside of brain and spinal cord
[see neuron figure]

 Neuron components:
 Dendrites:
 receive electrical signals from many different sources, which are then transferred into the
nucleus of the cell.
 Nucleus (soma):
 accumulates signals from dendrites or synapses
 when the accumulated signal exceeds a certain value the nucleus activates an electrical
pulse
 axon
 long, slender extension of the soma
 electrically isolated
 transfers the electrical signal to dendrites of other neurons (via synapses)
 synapse:
 connect different neurons together
 electrical synapse: direct, strong, unadjustable connection
 chemical synapse:
 synaptic cleft electrically separates the pre-synaptic side from the post-synaptic one
 electrical signal is converted into chemical signal, passes through the cleft, and then is
converted back to (modified) electrical signal
 one-way connection
 adjustable

 Characteristics of technical neural networks:


 vectorial input
 Scalar output
 synapses change input
 Accumulating inputs
 Non-linear characteristic: No proportional relationship between input and output
 Adjustable weights
Chapter 3
 Formal definition of ANN:
 set of triples (N, V, w) where:
 N is the set of neurons
 V is the set of connections {(i, j) : i, j \in N}
 w: weights, a function mapping each connection (i, j) to its weight w_ij
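As a hedged sketch, the (N, V, w) triple can be written down directly in Python; the neuron ids and weight values below are made up for illustration:

```python
# Minimal sketch of an ANN as a triple (N, V, w).
N = {1, 2, 3}                      # set of neurons
V = {(1, 3), (2, 3)}               # set of connections (i, j)
w = {(1, 3): 0.5, (2, 3): -0.8}    # weight mapping (i, j) -> w_ij

# A weight is defined exactly for the existing connections:
assert set(w) == V
```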

 Steps of data processing in neuron:


 Propagation function: often weighted sum of inputs to convert vector input to scalar net input
 Activation function: Transforms net input (and sometimes old activation) to new activation
 Output function: Often identity function, transforms activation to output to other neurons
Threshold definition: theta_j marks the position of the maximum gradient value of the activation
function

 Step functions:
 Heaviside (binary threshold function)
 if the input is above a certain threshold, the function changes from one value to another, but
otherwise remains constant
 Fermi (logistic function): maps to the range 0:1
 Can be expanded by a temperature parameter: the smaller the temperature, the more the function is compressed along the x-axis
 Hyperbolic tangent (tanh): maps to the range -1:1
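The three step functions can be sketched in Python; the function names and the default values of theta and T are illustrative choices, not fixed by the notes:

```python
import math

def heaviside(x, theta=0.0):
    """Binary threshold: jumps from 0 to 1 at the threshold theta,
    otherwise remains constant."""
    return 1.0 if x >= theta else 0.0

def fermi(x, T=1.0):
    """Logistic function mapping to the range (0, 1); a smaller
    temperature T compresses the curve along the x-axis."""
    return 1.0 / (1.0 + math.exp(-x / T))

def tanh_act(x):
    """Hyperbolic tangent, mapping to the range (-1, 1)."""
    return math.tanh(x)
```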

 Topologies:
 Feedforward network:

One input layer, one output layer and one or more hidden layers.

Connections are only permitted to neurons of the following layer.
 Feedforward with shortcut connections:
 connections may not only be directed towards the next layer but also towards any other
subsequent layer.
 Recurrent networks:
 Direct recurrent: a neuron can connect to itself
 Indirect recurrent: a neuron can connect to neurons in preceding layer
 Lateral recurrent: a neuron can connect to neurons in the same layer
 Completely linked:
 Every neuron is connected to every other neuron except itself
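A minimal sketch of distinguishing the first two topologies, assuming neurons are tagged with a hypothetical layer index (the layer assignment below is made up):

```python
layer = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2}   # input layer 0, hidden layer 1, output layer 2

def is_feedforward(V, allow_shortcuts=False):
    """Plain feedforward: connections go only to the next layer;
    with shortcut connections, to any subsequent layer."""
    for i, j in V:
        if allow_shortcuts:
            if layer[j] <= layer[i]:
                return False
        elif layer[j] != layer[i] + 1:
            return False
    return True

V_plain = {(1, 3), (2, 4), (3, 5), (4, 5)}
V_short = V_plain | {(1, 5)}             # shortcut from input straight to output
assert is_feedforward(V_plain)
assert not is_feedforward(V_short)
assert is_feedforward(V_short, allow_shortcuts=True)
```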
 A bias neuron:
 is a neuron whose output value is always 1
 Connected to neurons j1, j2, ..., jn with weights equal to the negative thresholds -theta_j1,
-theta_j2, ..., -theta_jn
 We can modify these weights to learn instead of trying to modify the threshold (which is
difficult)
 Advantage: easier to implement
 Disadvantage: the network representation already becomes ugly with only a few neurons, let
alone with a great number of them
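The bias-neuron trick can be sketched as follows (a minimal illustration; the input values, weights and thresholds are made up):

```python
def fires_with_threshold(inputs, weights, theta):
    """Compare the weighted sum directly against the threshold theta."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return net >= theta

def fires_with_bias(inputs, weights, theta):
    """Append a bias neuron with constant output 1 and weight -theta,
    then compare against zero instead."""
    net = sum(w * x for w, x in zip(weights + [-theta], inputs + [1.0]))
    return net >= 0.0

# Both formulations agree, but the bias weight -theta can now be
# trained like any other connection weight.
```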
Chapter 4
 A neural network could learn by:
1. developing new connections,
2. deleting existing connections,
3. changing connecting weights,
4. changing the threshold values of neurons,
5. varying one or more of the three neuron functions (activation function, propagation function
and output function),
6. developing new neurons, or
7. deleting existing neurons (and so, of course, existing connections).

 A training set (named P):


 is a set of training patterns, which we use to train our neural net

 Unsupervised learning:
 The training set only consists of input patterns
 The network tries by itself to detect similarities and to generate pattern classes

 Reinforcement learning:
 The training set consists of input patterns
 After completion of a sequence, a value is returned to the network indicating whether the
result was right or wrong and, possibly, how right or wrong it was

 Supervised learning:
 The training set consists of input patterns with correct results
 The network can receive a precise error vector

 Supervised learning scheme (steps):


1. Entering the input pattern (activation of input neurons),
2. Forward propagation of the input by the network, generation of the output
3. Comparing the output with the desired output (teaching input), provides error vector
(difference vector),
4. Corrections of the network are calculated based on the error vector,
5. Corrections are applied.
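The five steps can be sketched for a single linear neuron. The correction rule delta_w = eta * error * input used here is an assumed delta-style rule, since the notes do not fix a specific one, and eta is an assumed learning rate:

```python
eta = 0.1        # assumed learning rate
w = [0.0, 0.0]   # weights of one linear neuron

def train_step(p, t):
    global w
    y = sum(wi * pi for wi, pi in zip(w, p))        # steps 1-2: enter pattern, propagate
    error = t - y                                   # step 3: compare with teaching input
    corrections = [eta * error * pi for pi in p]    # step 4: compute corrections
    w = [wi + c for wi, c in zip(w, corrections)]   # step 5: apply corrections
    return error

for _ in range(100):
    train_step([1.0, 2.0], 1.0)   # teach: output 1.0 for input (1, 2)
```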

 Offline learning:
 Several training patterns are entered into the network at once,
 the errors are accumulated and it learns for all patterns at the same time.

 Online learning:
 The network learns directly from the errors of each training sample.

 Questions you should answer before learning:


1. Where does the learning input come from and in what form?
2. How must the weights be modified to allow fast and reliable learning?
3. How can the success of a learning process be measured in an objective way?
4. Is it possible to determine the "best" learning procedure?
5. Is it possible to predict if a learning procedure terminates?
6. How can the learned patterns be stored in the network?
7. Is it possible to avoid that newly learned patterns destroy previously learned associations
(the so-called stability/plasticity dilemma)?

 Training pattern is:


 an input vector p with the components p1 ,p2 ,..., pn whose desired output (teaching input) is
known.

 Training sample (p):


 nothing more than an input vector
 We use it for training if we know its corresponding teaching input

 Teaching input:
 desired output vector to the training sample
 The teaching input t_j is the desired and correct value that neuron j should output after the
input of a certain training pattern

 Error vector (Ep): For several output neurons Omega_1, Omega_2, ..., Omega_n the
difference between output vector and teaching input under a training input p

Ep = [t1 - y1; t2 - y2; ... tn - yn]

 Overfitting: the network is oversized, with too much free storage capacity, so it memorizes the
training samples instead of generalizing


 Underfitting: the network has insufficient capacity to represent the problem

 Learning curve:
 Indicates the progress of the error over time t
 A perfect learning curve looks like a negative exponential function, that means it is
proportional to e^-t

 Specific error:
 Err_p is based on a single training sample, which means it is generated online

 Total error:
 based on all training samples, that means it is generated offline.
 Total error = sum[p \in P]: Err_p

 Euclidean error:
 sqrt( sum[Omega \in O]: (t_Omega - y_Omega)^2 )

 RMS error:
 sqrt( {sum[Omega \in O]: (t_Omega - y_Omega)^2}/{|O|} )
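The error definitions above translate directly to Python, assuming the teaching input t and the actual output y are given as plain lists over the output neurons O:

```python
import math

def error_vector(t, y):
    """Ep: component-wise difference between teaching input and output."""
    return [ti - yi for ti, yi in zip(t, y)]

def euclidean_error(t, y):
    """sqrt of the sum of squared differences over all output neurons."""
    return math.sqrt(sum((ti - yi) ** 2 for ti, yi in zip(t, y)))

def rms_error(t, y):
    """Like the Euclidean error, but averaged over the |O| output neurons."""
    return math.sqrt(sum((ti - yi) ** 2 for ti, yi in zip(t, y)) / len(t))
```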

 When to stop learning:


 depends on a more objective view: the comparison of several learning curves
 stop when the network nearly always reaches the same final error rate for different
random initializations
 We must not forget to plot the validation error curve as well

 Gradient:
 gradient g of a function f is a vector that directs from any point of f towards the steepest
ascent from this point,
 with |g| corresponding to the degree of this ascent

 Gradient descent:
 going from f(s) against the direction of g, i.e. towards -g with steps of the size of |g| towards
smaller and smaller values of f.
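A minimal sketch of gradient descent on the 1-D function f(s) = s^2, stepping against the gradient; the step size eta and the starting point are assumed values:

```python
def descend(f_grad, s, eta=0.1, steps=100):
    """Repeatedly move against the gradient g, with step size eta * |g|,
    towards smaller and smaller values of f."""
    for _ in range(steps):
        g = f_grad(s)      # gradient points towards the steepest ascent
        s = s - eta * g    # so step in the opposite direction
    return s

s_min = descend(lambda s: 2 * s, s=5.0)   # f(s) = s^2, f'(s) = 2s; minimum at s = 0
```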

 Gradient descent problems: [refer to figure 4.4]


1. converging against suboptimal minima (figure 4.4: a)
2. Flat plateaus on the error surface may slow down training (figure 4.4: b)
3. Even if good minima are reached, they may be left afterwards (figure 4.4: d)
4. Steep canyons in the error surface may cause oscillations (figure 4.4: c)

 Hebbian learning rule:


 If neuron j receives an input from neuron i and if both neurons are strongly active at the same
time, then increase the weight w_i,j
Rule: delta w_i,j is proportional to eta * o_i * a_j

 The generalized form of the Hebbian Rule:


 only specifies the proportionality of the change in weight to the product of two undefined
functions, but with defined input values.
Rule: delta w_i,j is proportional to eta * h(o_i, w_i,j) * g(a_j, t_j)
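Both forms of the rule can be sketched in Python; eta and all input values are made up for illustration:

```python
eta = 0.1  # assumed learning rate

def hebb(w_ij, o_i, a_j):
    """Plain Hebbian rule: delta w_ij = eta * o_i * a_j."""
    return w_ij + eta * o_i * a_j

def hebb_general(w_ij, o_i, a_j, t_j, h, g):
    """Generalized form: delta w_ij = eta * h(o_i, w_ij) * g(a_j, t_j)."""
    return w_ij + eta * h(o_i, w_ij) * g(a_j, t_j)

# The plain rule is recovered by choosing h(o, w) = o and g(a, t) = a:
w_new = hebb_general(0.5, o_i=1.0, a_j=1.0, t_j=None,
                     h=lambda o, w: o, g=lambda a, t: a)
```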
