Preliminaries: Biological Neuron To Artificial Neural Network
1.1 Motivation
An artificial neural network (ANN) is designed with the goal of building intelligent
machines that solve complex perceptual problems, such as pattern recognition and
optimization, by mimicking the networks of real neurons in the human brain. The biological
neural system possesses massive parallelism, distributed representation and computation,
learning ability, generalization ability, adaptability, inherent contextual information
processing, fault tolerance, and low energy consumption (Jain et al., 1996). An ANN is a
collection of simple processing units that has a natural propensity for storing
experiential knowledge and resembles the human brain in two respects (Haykin, 1999):
1. Knowledge is acquired by the network from its environment through a learning process.
2. Interneuron connection strengths, known as synaptic weights, are used to store the
knowledge.
The procedure used to perform the learning process is called a learning algorithm.
Since synaptic weights store the knowledge, the goal of a learning algorithm is to modify
the synaptic weights of a network in an orderly fashion to attain a desired objective.
Back-propagation (BP) (Rumelhart et al., 1986) is the most popular ANN learning method for
multilayer networks. In BP, synaptic weights are adjusted while the output error is
propagated from the output layer back to the input layer.
A synapse is the junction between two neurons (an axon strand of one neuron and a
dendrite of another). When an impulse reaches the synapse's terminal, chemicals called
neurotransmitters are released. The neurotransmitters diffuse across the synaptic gap
to enhance or inhibit, depending on the type of the synapse, the receptor neuron's own
tendency to emit electrical impulses. The synapse's effectiveness can be adjusted by the
signals passing through it, so that synapses can learn from the activities in which they
participate. This dependence on history acts as a memory, which is possibly responsible
for human memory.
McCulloch and Pitts proposed a binary threshold unit as a computational model for
an artificial neuron (see Figure 1.2). This mathematical neuron computes a weighted sum
of its n input signals, xj, j = 1, 2, ..., n, and generates an output of 1 if this sum is above a
certain threshold u; otherwise, an output of 0 results. Mathematically,

y = θ( ∑_{j=1}^{n} wj xj − u )     (1.1)

where θ(·) is the unit step function at 0, and wj is the synaptic weight associated with the j-th
input. Positive weights correspond to excitatory synapses, while negative weights model
inhibitory ones. There is a crude analogy here to a biological neuron: wires and
interconnections model axons and dendrites, connection weights represent synapses, and
the threshold function approximates the activity in a soma. The McCulloch and Pitts
model, however, contains a number of simplifying assumptions that do not reflect the true
behavior of biological neurons.
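Eq. (1.1) can be sketched in a few lines of code (Python here as an illustration; the function and argument names are ours, not part of the original model):

```python
def mcculloch_pitts(x, w, u):
    """Binary threshold unit (Eq. 1.1): output 1 if the weighted sum
    of the inputs exceeds the threshold u, otherwise 0."""
    s = sum(wj * xj for wj, xj in zip(w, x))
    return 1 if s - u > 0 else 0

# Example: two excitatory synapses (w = +1) with threshold u = 1.5
# realize the logical AND of two binary inputs.
print(mcculloch_pitts([1, 1], [1, 1], 1.5))  # 1 (fires)
print(mcculloch_pitts([1, 0], [1, 1], 1.5))  # 0 (does not fire)
```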
The McCulloch-Pitts neuron has been generalized in many ways. An obvious one is
to use activation functions other than the threshold function, such as piecewise linear,
sigmoid, or Gaussian. Again, for notational simplicity, the threshold u is often treated
as another weight w0 = −u attached to the neuron, with a constant input x0 = 1. Fig. 1.3
shows a generalized form of the artificial neuron, which consists of a set of connection
weights, a summing unit, and an activation function.

Figure 1.3: Generalized form of an artificial neuron: input signals X1, ..., XN with
synaptic weights W1, ..., WN, a bias input (x0 or b) = 1 with weight W0, a summing unit,
and an activation function producing the output Y.

Each input signal is weighted; that is, it is multiplied by the weight value of the
corresponding input (with an analogy to
the synaptic strength of the connections of real neurons). The output of the summing unit
is, therefore, a combination of weighted input signals and an externally applied bias. The
bias has the effect of increasing or decreasing the net input of the activation function,
depending on whether it is positive or negative, respectively. Finally, the output of a
neuron comes from the activation function. The activation function is also referred to as
a squashing function in the sense that it squashes the permissible amplitude range of the
output signal to some finite value.
The activation function may be a threshold unit, a linear function, or a non-linear function,
as shown in Fig. 1.4. The most commonly used nonlinear activation function is the sigmoid
function (Haykin, 1999). The sigmoid function is defined by y = 1 / (1 + exp(−ax)), where a
is the slope parameter. By varying the parameter a, one can obtain sigmoid functions of
different slopes, as illustrated in Fig. 1.5. Since the output of the sigmoid function is
bounded between 0 and 1, increasing or decreasing the input value (x) by a large amount
pushes the output into a saturated region. The important characteristics of the sigmoid
function are that it is bounded above and below, it is monotonically increasing, and it is
differentiable everywhere.
Figure 1.4: Different types of activation functions: (a) threshold, (b) piecewise linear,
(c) sigmoid, and (d) Gaussian.
Figure 1.5: Sigmoid function with various slope (a) values.
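The properties above are easy to check numerically. A minimal sketch (Python, using only the standard library; the parameter default is our choice):

```python
import math

def sigmoid(x, a=1.0):
    """Logistic sigmoid y = 1 / (1 + exp(-a*x)); a is the slope parameter."""
    return 1.0 / (1.0 + math.exp(-a * x))

# Bounded between 0 and 1, with value 0.5 at x = 0 for any slope a.
print(sigmoid(0, a=2))     # 0.5
# A larger slope pushes the output toward the saturated region faster:
print(sigmoid(2, a=2))     # close to 1
print(sigmoid(2, a=0.5))   # still clearly below saturation
```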
Table 1.2: Weight values for two-input, single-decision-line logic operations.

Logic Operation   W1   W2   W0
OR                +1   +1   −0.5
AND               +1   +1   −1.5
NOR               −1   −1   +0.5
NAND              −1   −1   +1.5
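The weight assignments in Table 1.2 can be verified with a single threshold unit, treating the bias as weight W0 on a constant input x0 = 1 as described above (a Python sketch; `threshold_unit` is our illustrative name, not from the text):

```python
def threshold_unit(x1, x2, w1, w2, w0):
    """Single binary threshold neuron with bias input x0 = 1."""
    return 1 if w1 * x1 + w2 * x2 + w0 * 1 > 0 else 0

# Weight values taken from Table 1.2.
gates = {
    "OR":   (+1, +1, -0.5),
    "AND":  (+1, +1, -1.5),
    "NOR":  (-1, -1, +0.5),
    "NAND": (-1, -1, +1.5),
}
for name, (w1, w2, w0) in gates.items():
    truth = [threshold_unit(x1, x2, w1, w2, w0)
             for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]]
    print(name, truth)
```

Each printed truth table matches the corresponding logic operation over the inputs (0,0), (0,1), (1,0), (1,1).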
Figure: Single threshold-unit implementation of the OR logic, with logic inputs X1 and X2
(weights +1 and +1), a bias input (X0 or b) = 1 with weight −0.5, and output Y (OR).
response is generated through a different unit. In this case, the XOR logic comes from two
inputs (i.e., H1 and H2), which are manipulated through AND and OR logic units. The final
output may be represented as

Y = (NOT H1) AND H2 = (NOT (X1 AND X2)) AND (X1 OR X2).
The operational procedure is similar to the previous model: for inputs (0, 0), (0, 1), and
(1, 0), H1 = 0 (AND logic) and Y = H2 (OR logic), giving outputs 0, 1, and 1, respectively.
On the other hand, for (1, 1), both H1 and H2 are activated, but the output Y = 0 because
H1 is connected through the weight value −2. In general terms, the preparation of the
intermediate AND and OR logics is termed a hidden representation. Finally, to solve a
problem, an appropriate architecture as well as a proper weight assignment is essential.
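The hidden representation described above can be sketched directly: threshold units with the AND and OR weights of Table 1.2, and an output unit in which H1 enters with weight −2 as stated in the text (Python; the −0.5 output bias is our illustrative choice for a threshold between 0 and 1):

```python
def step(s):
    """Unit step function at 0."""
    return 1 if s > 0 else 0

def xor_net(x1, x2):
    """Two-layer threshold network: hidden units H1 (AND) and H2 (OR),
    output unit Y = step(-2*H1 + H2 - 0.5) so that input (1, 1) is suppressed."""
    h1 = step(x1 + x2 - 1.5)          # AND logic (weights from Table 1.2)
    h2 = step(x1 + x2 - 0.5)          # OR logic (weights from Table 1.2)
    return step(-2 * h1 + h2 - 0.5)   # H1 inhibits the output via weight -2

print([xor_net(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```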
In BP, synaptic weights are adjusted while the output error is propagated from the output
layer back to the input layer. To apply BP, the connection weights of a network are
initialized with random values in a small range.
Consider the most common three-layer NN architecture, shown in Fig. 1.11, to explain the
BP algorithm. The BP algorithm consists of two basic steps: a forward pass and a backward
pass. In the forward pass, the input values of an example or pattern are presented to the
network, actual outputs are measured at the output layer by passing responses from the
input layer to the output layer through the hidden layer, and the error for the pattern is
then calculated from the actual output and the desired output of that pattern. In the
backward pass, the connection weights are adjusted based on the calculated error. The
weights between the hidden and output layers are updated first, then the weights between
the input and hidden layers.
If a weight w sends input x to a neuron and f is the output of that neuron, then according
to BP learning, the weight correction (Δw) for that weight is given by the following equation:

Δw = η δ x ,     (1.2)

where δ is the local gradient of the neuron and η is the learning rate. The learning rate
merely indicates the relative size of the change in weights and therefore affects learning
speed. If its value is too high, the update due to one example may alter a weight so
adversely with respect to the others that oscillation results. In general, the value of η
is chosen in a small range, such as between 0.1 and 0.3.
The local gradients of an output unit (δo) and a hidden unit (δh) are defined, for the
layered structure

I → { Wh } → H → { Wo } → O ,

by

δo = − (∂e / ∂fo) (∂fo / ∂xo)     (1.3)

δh = (∂fh / ∂xh) ∑o δo wo     (1.4)
Here, xo and fo represent the net input (weighted sum) and the output of an output neuron,
respectively, and e is the error, defined as the difference between the desired output and
the actual response. The error function for the n-th training pattern may be defined by the
following equation:

e(n) = ½ (d(n) − fo(n))² ,     (1.5)
where fo(n) is the actual output and d(n) is the desired output. To update the weights, the
BP algorithm requires the partial derivative of Eq. (1.5) with respect to the output fo(n),
which is calculated as follows:

∂e(n) / ∂fo(n) = − (d(n) − fo(n))     (1.6)

For the sigmoid activation function, fo(n) = 1 / (1 + exp(−xo(n))), and therefore

∂fo(n) / ∂xo(n) = fo(n) (1 − fo(n))     (1.7)

Now the local gradient of an output unit (δo) becomes

δo = (d(n) − fo(n)) fo(n) (1 − fo(n))     (1.8)
For the same sigmoid activation function, the local gradient of a hidden unit (δh) becomes

δh = fh(n) (1 − fh(n)) ∑o δo wo     (1.9)
Figure 1.12 shows the operational flowchart of the NN shown in Fig. 1.11. In the figure, the
upper part is the forward pass from the input to the actual output and then the error. The
lower portion demonstrates the generation of the local gradients and weight corrections from
the output layer toward the input layer. Every directed edge indicates the components required
to calculate its target. For example, according to Eq. (1.8), the local gradient of the output
layer (i.e., δo) requires the actual output (fo) and the desired output (do). On the other
hand, according to Eq. (1.9), the local gradient of the hidden layer (i.e., δh) requires the
hidden layer output (fh), the local gradient of the output layer (i.e., δo), and the connecting
weights between the particular hidden node and the output nodes (wo).
On the other hand, the local gradient for the first hidden layer (H1) will be

δh1 = fh1(n) (1 − fh1(n)) ∑h2 δh2 wh2     (1.11)
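Eqs. (1.2)-(1.9) combine into a compact training loop. The sketch below (Python with NumPy; the toy XOR data, the 4-unit hidden layer, the random seed, the epoch count, and η = 0.3 are our illustrative choices, not prescribed by the text) trains a single-hidden-layer network by BP:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy data: the four XOR patterns (inputs X, desired outputs D).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([[0.0], [1.0], [1.0], [0.0]])

# Random initial weights in a small range, with an extra row for the
# bias input x0 = 1 (2 inputs -> 4 hidden units -> 1 output).
Wh = rng.uniform(-0.5, 0.5, size=(3, 4))
Wo = rng.uniform(-0.5, 0.5, size=(5, 1))
eta = 0.3  # learning rate

def predict(x):
    fh = sigmoid(np.append(x, 1.0) @ Wh)        # forward pass, hidden layer
    return sigmoid(np.append(fh, 1.0) @ Wo)[0]  # forward pass, output layer

def total_error():
    return sum(0.5 * (d[0] - predict(x)) ** 2 for x, d in zip(X, D))  # Eq. (1.5)

err_before = total_error()
for epoch in range(10000):
    for x, d in zip(X, D):
        # Forward pass
        xb = np.append(x, 1.0)
        fh = sigmoid(xb @ Wh)
        fhb = np.append(fh, 1.0)
        fo = sigmoid(fhb @ Wo)
        # Backward pass
        delta_o = (d - fo) * fo * (1 - fo)             # Eq. (1.8)
        delta_h = fh * (1 - fh) * (Wo[:-1] @ delta_o)  # Eq. (1.9)
        Wo += eta * np.outer(fhb, delta_o)             # Eq. (1.2), output layer
        Wh += eta * np.outer(xb, delta_h)              # Eq. (1.2), hidden layer
print(err_before, total_error())  # the summed error drops during training
```

Note that, as in the text, the hidden-to-output weights enter the hidden local gradients through `Wo[:-1] @ delta_o`, so the output-layer gradients are computed before the hidden-layer updates.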
1. NN Setup:
The function nnsetup(.) creates a feedforward NN (i.e., an MLP); it takes the NN
architecture as a parameter and returns the NN as an object. CS1 is the code segment that
defines the NN for the sample case of classifying handwritten numerals from 28×28 (= 784)
pixel images. In the function, the parameters 784, 300, and 10 are the numbers of neurons
in the input, hidden, and output layers, respectively. The function returns the NN as the
object nn. The setup function also defines other parameters, such as the activation
function and the learning rate.
CS1:
nn = nnsetup([784 300 10]);
2. NN Training:
The function nntrain(.) trains the NN; it takes the network object with training data and
returns the trained network. CS2 is the code segment for training, which trains the neural
network nn with input train_x and output train_y, with opts setting the epochs and batch
size. It returns the NN nn with updated activations, errors, weights, and biases, and L,
the sum squared error for each training batch.
CS2:
[nn, L] = nntrain(nn, train_x, train_y, opts);
The training operation is performed in three different steps, and three functions are
invoked in it:
(i) nnff (.) - performs a feedforward pass
(ii) nnbp (.) - performs backpropagation and calculate gradients
(iii) nnapplygrads (.) - updates weights and biases with calculated gradients
3. NN Evaluation:
The function nntest(.) evaluates the performance of the NN, and CS3 is its code segment.
The function takes the network nn, the test input features test_x, and the desired outputs
test_y. It measures the actual response of the NN for each input, compares it with the
desired output, and finally returns the error rate er and the misclassified pattern
indices bad.
CS3:
[er, bad] = nntest(nn, test_x, test_y);
the training data. Generalization is a more desirable and critical feature because the most
common use of a classifier is to make good predictions on new or unknown objects.
Commonly, generalization ability is measured on testing-set data that are reserved from
the available data at the time of training. The testing error rate (TER), i.e., the rate of
wrong classification on the testing set, is a widely accepted quantitative measure; a lower
value is better. A number of benchmark problems are available to measure the TER or
generalization ability of neural networks or any other machine learning system.
1.9 Features of NN
activation function for the weighted sum of the previous layer's responses. If the final
output matches the desired output, the local gradients of the output-layer neurons will be
zero. Since the local gradient of the output layer is transferred backward toward the input
layer, the weights of a layer will not be updated if the weights connected to the output
layer are not updated. In other words, training a NN by BP starting from randomly
initialized parameters does not work very well and easily gets stuck in undesired local
optima. This prevents the lower layers from learning useful features.
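The weakening of the backward-flowing signal can be illustrated numerically: the sigmoid derivative f(1 − f) is at most 0.25, so under simplifying assumptions (one neuron per layer, weights of magnitude 1; our illustration, not from the text) a local gradient propagated through Eq. (1.9) repeatedly shrinks multiplicatively:

```python
def sigmoid_prime_max():
    """The sigmoid derivative f*(1-f) is maximized at f = 0.5."""
    return 0.5 * (1 - 0.5)  # = 0.25

# A local gradient reaching a layer L steps below the output is scaled
# by at most (0.25)**L when Eq. (1.9) is applied layer by layer.
grad = 1.0
for layer in range(10):
    grad *= sigmoid_prime_max()
print(grad)  # 0.25**10, roughly 9.5e-7: lower layers receive almost no signal
```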
References
Akhand, M. A. H., Islam, M. M., & Murase, K. (2009). A Comparative Study of Data
Sampling Techniques for Constructing Neural Network Ensembles, International
Journal of Neural Systems 19(2), 67-89.
Akhand, M. A. H. & Murase, K. (2010). Neural Networks Ensembles: Existing Methods and
New Techniques, ISBN-10: 3838391373 & ISBN-13: 978-3838391373, LAP LAMBERT Academic
Publishing, 2010.
Demsar, J., Zupan, B., & Leban, G. (2004). Orange Datasets, AI Laboratory, Faculty of
Computer and Information Science, University of Ljubljana.
(http://www.ailab.si/orange/datasets.asp)
Haykin, S. (1999). Neural Networks: A Comprehensive Foundation, 2nd edition. Prentice
Hall, Upper Saddle River, NJ.
Jain, A. K., Mao, J., & Mohiuddin, K. M. (1996). Artificial Neural Networks: A Tutorial.
IEEE Computer 29 (3), 31-44.
Krose, B. & Smagt, P. (1996). An Introduction to Neural Networks. 8th edition.
Newman, D. J., Hettich, S., Blake, C. L., & Merz, C. J. (1998). UCI Repository of
Machine Learning Databases. Department of Information and Computer Science,
University of California Irvine. (http://www.ics.uci.edu/~mlearn/)
Prechelt, L. (1994). Proben1- A Set of Benchmarks and Benching Rules for Neural
Network Training Algorithms. Tech. rep. 21/94, Fakultat fur Informatik, University
of Karlsruhe, Germany.
Rasmussen, C. E. & Neal, R. M. (2003). Delve - Data for Evaluating Learning in Valid
Experiments. Department of Computer Science, University of Toronto, Canada.
(http://www.cs.toronto.edu/~delve/data/datasets.html)
Rumelhart, D., Hinton, G., & Williams, R. (1986). Learning internal representations by
error propagation. In Rumelhart, D., & McClelland, J. (Eds.), Parallel Distributed
Processing: Explorations in the Microstructure of Cognition. Volume 1: Foundations,
pp. 318-363. MIT Press, Cambridge, MA.