Chapter 3: Supervised Learning Networks
of the neuron is the binary threshold function, which is illustrated by its own neuron symbol. This leads us
to the complete depiction of information-processing neurons. Other neurons that
use the weighted sum as propagation function but the hyperbolic tangent or the
Fermi function as activation function, or a separately defined activation function fact, are represented by similar symbols.
Fig. 3.1: Architecture of a perceptron with one layer of variable connections in different views.
The solid-drawn weight layer in the two illustrations on the bottom can be trained.
Left side: Example of scanning information in the eye.
Right side, upper part: Drawing of the same example with indicated fixed-weight layer using
the defined designs of the functional descriptions for neurons.
Right side, lower part: Without indicated fixed-weight layer, with the name of each neuron
corresponding to our convention. The fixed-weight layer will no longer be taken into account in
the course of this work.
Fig. 3.2: A single layer perceptron with two input neurons and one output neuron. The network
returns the output by means of the arrow leaving the network. The trainable layer of weights is
situated in the center (labeled). As a reminder, the bias neuron is again included here. Although
the weight ωBIAS,Ω is a normal weight and is also treated like one, I have represented it by a dotted
line – which significantly increases the clarity of larger networks. In the future, the bias neuron will
no longer be included.
Certainly, the existence of several output neurons Ω1, Ω2, . . . , Ωn does not considerably change
the concept of the perceptron (fig. 3.3). A perceptron with several output neurons can also be
regarded as several different perceptrons with the same input.
Table 3.1: Definition of the logical XOR. The input values are shown on the left, the output
values on the right.

i1  i2 | Ω
 0   0 | 0
 0   1 | 1
 1   0 | 1
 1   1 | 0
Fig. 3.5: Sketch of a single layer perceptron that shall represent the XOR function - which is
impossible.
Here we use the weighted sum as propagation function, a binary activation function with
the threshold value Θ and the identity as output function. Depending on i1 and i2, Ω has to output the
value 1 if the following holds:
𝑛𝑒𝑡𝛺 = 𝑜𝑖1 𝜔𝑖1,𝛺 + 𝑜𝑖2 𝜔𝑖2,𝛺 ≥ 𝛩𝛺    (Eq. 3.1)
We assume a positive weight 𝜔𝑖1,𝛺; inequality 3.1 is then equivalent to eq. 3.2:

𝑜𝑖1 ≥ (1 / 𝜔𝑖1,𝛺) (𝛩𝛺 − 𝑜𝑖2 𝜔𝑖2,𝛺)    (Eq. 3.2)
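The four XOR patterns impose four inequalities of this kind on a single-layer perceptron, and no choice of weights and threshold satisfies all of them at once. A small brute-force sketch (an illustration, not taken from the text) searches a grid of candidate weights and thresholds and finds no solution:

```python
import itertools

def slp_output(i1, i2, w1, w2, theta):
    """Single-layer perceptron: weighted sum with binary threshold."""
    return 1 if i1 * w1 + i2 * w2 >= theta else 0

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# Search a coarse grid of weights and thresholds for a solution.
grid = [x / 10 for x in range(-20, 21)]  # -2.0 .. 2.0 in steps of 0.1
solutions = [
    (w1, w2, theta)
    for w1, w2, theta in itertools.product(grid, repeat=3)
    if all(slp_output(i1, i2, w1, w2, theta) == out
           for (i1, i2), out in XOR.items())
]
print(len(solutions))  # 0 -- no single-layer perceptron realizes XOR
```

The grid search only illustrates the algebraic argument: the patterns (0,1) and (1,0) force both weights to reach the threshold, while (1,1) requires their sum to stay below it, which is contradictory for a positive threshold.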
The XOR problem itself is one of these tasks, since a perceptron that is supposed to
represent the XOR function already needs a hidden layer (fig. 3.7 on the next page).
Fig. 3.7: Neural network realizing the XOR function. Threshold values are located
within the neurons
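One concrete weight assignment realizing the idea of fig. 3.7 can be sketched as follows; the specific weights and threshold values here are illustrative assumptions, not necessarily those of the figure:

```python
def step(x, theta):
    """Binary threshold activation: 1 if the net input reaches theta."""
    return 1 if x >= theta else 0

def xor_net(i1, i2):
    # Hidden layer: h1 fires on "at least one input" (OR),
    # h2 fires on "both inputs" (AND).
    h1 = step(i1 + i2, 0.5)
    h2 = step(i1 + i2, 1.5)
    # Output: OR but not AND, i.e. XOR.
    return step(h1 - h2, 0.5)

for pattern in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pattern, xor_net(*pattern))  # 0, 1, 1, 0
```

The hidden layer turns the inputs into linearly separable features, which is exactly why the extra layer makes XOR representable.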
3.3 The Multilayer Perceptron: A multilayer perceptron contains more trainable weight
layers.
A perceptron with two or more trainable weight layers (called multilayer perceptron or
MLP) is more powerful than an SLP. As we know, a single layer perceptron can divide the
input space by means of a hyperplane (in a two-dimensional input space by means of a straight
line). A two-stage perceptron (two trainable weight layers, three neuron layers) can classify
convex polygons by further processing these straight lines, e.g. in the form "recognize patterns
lying above straight line 1, below straight line 2 and below straight line 3". Thus,
metaphorically speaking, we took an SLP with several output neurons and "attached" another SLP
(upper part of fig. 3.8). A multilayer perceptron is a universal function approximator,
as proven by the theorem of Cybenko.
Another trainable weight layer proceeds analogously, now with the convex polygons.
Those can be added, subtracted or somehow processed with other operations (lower part of fig.
3.8).
Generally, it can be mathematically proven that even a multilayer perceptron with one
layer of hidden neurons can arbitrarily precisely approximate functions with only finitely many
discontinuities as well as their first derivatives. Unfortunately, this proof is not constructive and
therefore it is left to us to find the correct number of neurons and weights.
In the following we want to use a widespread abbreviated form for different multilayer
perceptrons: We denote a two-stage perceptron with 5 neurons in the input layer, 3 neurons in
the hidden layer and 4 neurons in the output layer as a 5-3-4-MLP.
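As a sketch of this notation, the forward pass of a 5-3-4-MLP might look as follows; the random weights and the hyperbolic tangent activation are illustrative choices, not prescribed by the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# A 5-3-4-MLP: 5 input, 3 hidden, 4 output neurons, i.e. two trainable
# weight layers (random weights for illustration; shapes follow the notation).
W1 = rng.standard_normal((3, 5))   # input layer  -> hidden layer
W2 = rng.standard_normal((4, 3))   # hidden layer -> output layer

def forward(x, act=np.tanh):
    """Forward pass with hyperbolic tangent activation."""
    hidden = act(W1 @ x)
    return act(W2 @ hidden)

y = forward(rng.standard_normal(5))
print(y.shape)  # (4,)
```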
Fig. 3.8: We know that an SLP represents a straight line. With 2 trainable weight layers, several
straight lines can be combined to form convex polygons (above). By using 3 trainable weight
layers several polygons can be formed into arbitrary sets (below).
Definition of Multilayer perceptron: Perceptrons with more than one layer of variably
weighted connections are referred to as multilayer perceptrons (MLP). An n-layer or n-stage
perceptron has thereby exactly n variable weight layers and n + 1 neuron layers (the retina is
disregarded here) with neuron layer 1 being the input layer.
Since three-stage perceptrons can classify sets of any form by combining and separating
arbitrarily many convex polygons, another step will not be advantageous with respect to
function representations. Be cautious when reading the literature: There are many different
definitions of what is counted as a layer. Some sources count the neuron layers, some count the
weight layers. Some sources include the retina, some the trainable weight layers. Some exclude
(for some reason) the output neuron layer. In this work, I chose the definition that provides, in
my opinion, the most information about the learning capabilities – and I will use it consistently.
Remember: An n-stage perceptron has exactly n trainable weight layers. You can find a
summary of which perceptrons can classify which types of sets in table 3.3. We now want to
face the challenge of training perceptrons with more than one weight layer.
Table 3.3: Representation of which perceptron can classify which types of sets with n being the
number of trainable weight layers.
3.4 Backpropagation of Error:
The backpropagation of error learning rule (abbreviated: backpropagation, backprop or
BP) can be used to train multistage perceptrons with semi-linear activation functions.
Binary threshold functions and other non-differentiable functions are no longer supported, but
that does not matter: we have seen that the Fermi function or the hyperbolic tangent can
approximate the binary threshold function arbitrarily well by means of a temperature parameter T.
Backpropagation is a gradient descent procedure (including all strengths and weaknesses of the
gradient descent) with the error function Err(W) receiving all n weights as arguments and
assigning them to the output error, i.e. being n-dimensional. On Err(W) a point of small error or
even a point of the smallest error is sought by means of the gradient descent. Thus, in analogy to
the delta rule, backpropagation trains the weights of the neural network. And it is exactly the
delta rule or its variable δi for a neuron i which is expanded from one trainable weight layer to
several ones by backpropagation.
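A minimal sketch of this procedure follows: a 2-3-1 MLP with Fermi activation trained on XOR by batch gradient descent. The learning rate, initialization, hidden-layer size and iteration count are illustrative choices, not taken from the text:

```python
import numpy as np

def fermi(x):
    """Fermi (logistic) function -- a differentiable threshold substitute."""
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(42)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [0.]])          # XOR targets
Xb = np.hstack([X, np.ones((4, 1))])            # bias input fixed at 1

# 2-3-1 MLP: one hidden layer, bias handled as an extra weight column.
W1 = rng.normal(0.0, 0.5, (3, 3))               # input+bias  -> hidden
W2 = rng.normal(0.0, 0.5, (1, 4))               # hidden+bias -> output
eta = 0.5                                       # learning rate

def forward():
    H = fermi(Xb @ W1.T)
    Hb = np.hstack([H, np.ones((4, 1))])
    return H, Hb, fermi(Hb @ W2.T)

err_start = np.sum((forward()[2] - T) ** 2)
for _ in range(10000):
    H, Hb, Y = forward()
    delta_out = (Y - T) * Y * (1 - Y)                   # delta, output layer
    delta_hid = (delta_out @ W2[:, :3]) * H * (1 - H)   # delta, backpropagated
    W2 -= eta * delta_out.T @ Hb                        # gradient descent steps
    W1 -= eta * delta_hid.T @ Xb
err_end = np.sum((forward()[2] - T) ** 2)
print(err_end < err_start)  # True: the squared error decreased
```

The hidden-layer delta is formed from the output delta weighted back through W2, which is exactly the sense in which backpropagation extends the delta rule across several weight layers.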
Radial Basis Function Networks:
Definition of RBF neuron: The hidden neurons h of an RBF network use the distance of the
input vector to a center ch as propagation function and a radial basis function, radially
symmetric around ch, as activation function fact.
Definition of RBF output neuron: RBF output neurons Ω use the weighted sum as
propagation function fprop, and the identity as activation function fact.
Definition of RBF network: An RBF network has exactly three layers in the following order:
The input layer consisting of input neurons, the hidden layer (also called RBF layer) consisting
of RBF neurons and the output layer consisting of RBF output neurons. Each layer is
completely linked with the following one; shortcuts do not exist (fig. 3.9 on the next page); it is a
feedforward topology. The connections between input layer and RBF layer are unweighted, i.e.
they only transmit the input. The connections between RBF layer and output layer are weighted.
The original definition of an RBF network only referred to an output neuron, but – in analogy to
the perceptrons – it is apparent that such a definition can be generalized. A bias neuron is not
used in RBF networks. The set of input neurons shall be represented by I, the set of hidden
neurons by H and the set of output neurons by O. The inner neurons are called radial
basis neurons because it follows directly from their definition that all input vectors with the same
distance from the center of a neuron produce the same output value (fig. 3.10).
Fig. 3.9: An exemplary RBF network with two input neurons, five hidden neurons and three
output neurons. The connections to the hidden neurons are not weighted, they only transmit the
input. Right of the illustration you can find the names of the neurons, which coincide with the
names of the MLP neurons: Input neurons are called i, hidden neurons are called h and output
neurons are called Ω. The associated sets are referred to as I, H and O.
Fig. 3.10: Let ch be the center of an RBF neuron h. Then the activation function 𝑓𝑎𝑐𝑡 ℎ is radially
symmetric around ch.
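A forward pass through such an RBF network can be sketched as follows; the Gaussian activation, the centers and the common width are arbitrary illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# RBF network sketch: 2 input, 5 hidden (RBF) and 3 output neurons,
# matching the layout of fig. 3.9. Centers and width chosen for illustration.
centers = rng.uniform(-1, 1, (5, 2))   # one center c_h per RBF neuron
sigma = 0.8                            # common width of the Gaussians
W = rng.standard_normal((3, 5))        # weighted RBF layer -> output layer

def rbf_forward(x):
    # Propagation: Euclidean distance ||x - c_h|| (unweighted input links).
    dist = np.linalg.norm(centers - x, axis=1)
    # Activation: Gaussian, radially symmetric around each center c_h.
    h = np.exp(-dist**2 / (2 * sigma**2))
    # Output neurons: weighted sum with identity activation.
    return W @ h

y = rbf_forward(np.array([0.2, -0.4]))
print(y.shape)  # (3,)
```

Because the hidden activation depends on the input only through the distance to each center, any two inputs equidistant from a center produce the same contribution from that neuron, as stated above.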
Neural networks have a considerable impact on decision-making in business, science, and
engineering. Stock market prediction and weather forecasting are typical applications of
prediction/forecasting techniques (see fig. 3.12 below).