
CHAPTER 3

BIOLOGICALLY-INSPIRED MODEL

3.1 DEEP LEARNING

Deep learning is a subset of machine learning that differentiates itself through the
way it solves problems. Classical machine learning requires a domain expert to identify
and engineer most of the applied features, whereas deep learning learns features
incrementally, eliminating the need for that domain expertise. Deep learning is a class
of machine learning algorithms that uses multiple layers to progressively extract
higher-level features from the raw input. It allows us to train a model to predict
outputs given a set of inputs, and both supervised and unsupervised learning can be
used for the training.

Figure 3.1: AI and deep learning

The word "deep" in "deep learning" refers to the number of layers through which the
data is transformed. More precisely, deep learning systems have a substantial credit
assignment path (CAP) depth. The CAP is the chain of transformations from input to
output. CAPs describe potentially causal connections between input and output. For
a feed forward neural network, the depth of the CAPs is that of the network and is the
number of hidden layers plus one (as the output layer is also parameterized).
For recurrent neural networks, in which a signal may propagate through a layer more
than once, the CAP depth is potentially unlimited. No universally agreed-upon threshold
of depth divides shallow learning from deep learning, but most researchers agree that
deep learning involves CAP depth higher than 2. CAP of depth 2 has been shown to be
a universal approximator in the sense that it can emulate any function. Beyond that,
more layers do not add to the function approximator ability of the network. Deep models
(CAP > 2) are able to extract better features than shallow models and hence, extra
layers help in learning the features effectively.

For supervised learning tasks, deep learning methods eliminate manual feature
engineering by translating the data into compact intermediate representations akin to
principal components and deriving layered structures that remove redundancy in the
representation. Deep learning algorithms can also be applied to unsupervised learning
tasks. This is an important benefit because unlabeled data are more abundant than
labeled data. Examples of deep structures that can be trained in an unsupervised
manner are neural history compressors and deep belief networks.

3.2 CONVOLUTIONAL NEURAL NETWORKS

Convolutional neural networks (CNNs) are biologically inspired models and one of the
most efficient supervised deep learning methods, having made remarkable improvements
in the image processing field. A convolutional neural network is composed of multiple
building blocks, such as convolution layers, pooling layers, and fully connected
layers, and is designed to automatically and adaptively learn spatial hierarchies of
features through the backpropagation algorithm.

There are two stages of training in every convolutional neural network. In the
feedforward stage, input images are fed to the network: the dot product of the input
vector and the parameter vector of each neuron is performed, the convolution operator
is applied in each layer, and the output is computed. Using a loss function, the
network output is compared with the desired output (the correct answers) and the error
is computed; based on this error, the backpropagation stage begins. The gradient of
each parameter is calculated in this stage using the chain rule, and finally all the
parameters are updated. This is repeated for an adequate number of iterations.
Figure 3.2: General structure of a CNN
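As a concrete illustration of this structure and the two training stages, the following
minimal sketch assumes the TensorFlow/Keras framework (this chapter does not prescribe
one; the layer sizes and dummy data are placeholders). It stacks a convolution layer, a
max-pooling layer and a fully connected layer, and fit() runs the feedforward pass,
compares the output with the correct answers through the loss, and backpropagates the
parameter updates at each iteration.

```python
# Minimal sketch, assuming TensorFlow/Keras; layer sizes and data are placeholders.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, (3, 3), padding="same", activation="relu",
                           input_shape=(28, 28, 1)),   # convolution layer
    tf.keras.layers.MaxPooling2D((2, 2), strides=2),   # pooling layer
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),   # fully connected layer
])

model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Dummy data stands in for the real images and labels used in the project.
x = np.random.rand(32, 28, 28, 1).astype("float32")
y = tf.keras.utils.to_categorical(np.random.randint(0, 10, 32), 10)

# fit() = feedforward pass + loss evaluation + backpropagation update, repeated.
model.fit(x, y, epochs=2, batch_size=8, verbose=0)
```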

3.2.1 CONVOLUTIONAL LAYER

In convolutional layers, the network uses different kernels to convolve the input
image and create various feature maps. Applying this layer significantly reduces the
number of parameters of the network (weight sharing), and the network learns the
correlation between neighbouring pixels (local connectivity). For an (n x n) image and
an (f x f) filter/kernel, the dimensions of the image resulting from a convolution
operation are (n – f + 1) x (n – f + 1).
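This output size can be verified with a short NumPy sketch (illustrative only; the
6 x 6 image and 3 x 3 kernel are arbitrary choices):

```python
# Checks the (n - f + 1) x (n - f + 1) output size of an unpadded convolution.
import numpy as np

n, f = 6, 3                       # 6 x 6 image, 3 x 3 kernel
image = np.random.rand(n, n)
kernel = np.random.rand(f, f)

out = np.zeros((n - f + 1, n - f + 1))
for i in range(n - f + 1):
    for j in range(n - f + 1):
        # dot product of the kernel with the local image patch
        out[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)

print(out.shape)                  # (4, 4)  ->  (n - f + 1) x (n - f + 1)
```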

3.2.1.1 PADDING

To keep the size of the output unchanged and to preserve the information on the
borders of the image, zero padding is done. Padding is a technique in which the input
volume is padded with zeros around the border. There are two padding methods: valid
padding and same padding. In this project, same padding is used for all the
convolutional layers. In same padding, p padding layers are added such that the output
image has the same dimensions as the input image. Therefore,
[(n + 2p) x (n + 2p) image] * [(f x f) filter] -> [(n x n) image], which gives
p = (f – 1) / 2 (because n + 2p – f + 1 = n).
Figure 3.3: Same convolution
Figure 3.3 gives an example of convolving a 3 x 3 filter over a 4 x 4 input matrix. As
can be seen, by adding zeros to the input matrix, the size of the output matrix remains
the same as that of the input matrix, 4 x 4.
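The same-padding arithmetic can be checked with a small NumPy sketch (illustrative
only; it mirrors the 4 x 4 input and 3 x 3 filter of Figure 3.3, so p = 1):

```python
# Same padding: p = (f - 1) / 2 zeros around the border keep the output at n x n.
import numpy as np

n, f = 4, 3
p = (f - 1) // 2                          # p = 1 for a 3 x 3 filter
image = np.random.rand(n, n)
kernel = np.random.rand(f, f)

padded = np.pad(image, p, mode="constant")    # (n + 2p) x (n + 2p) = 6 x 6
out = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        out[i, j] = np.sum(padded[i:i + f, j:j + f] * kernel)

print(out.shape)                          # (4, 4) -- same size as the input
```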
3.2.2 ACTIVATION FUNCTION

Generally, a nonlinear operator or activation function is used in deep networks after
the convolutions. The presence of this function enriches the model in comparison with a
purely linear model. It is well known that applying the rectified linear unit (ReLU)
activation function in deep networks increases the training speed. ReLU simply projects
negative values to zero and is defined as

relu(x) = x if x ≥ 0,  0 if x < 0

In many cases Leaky ReLU has performed better than ReLU. It allows a small, non-zero
gradient when the unit is not active, i.e. for negative values. It is defined as

leaky_relu(x) = x if x ≥ 0,  ax if x < 0
Recently, using exponential linear units (ELUs) has led to an increase in training
speed and classification accuracy. ELUs accept negative values, allowing them to push
mean unit activations closer to zero, like batch normalization but with less
computational cost. The ELU is defined as

elu(x) = x if x ≥ 0,  a(e^x − 1) if x < 0
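The three activation functions translate directly into NumPy as follows (illustrative
only; the values of a shown are common choices, not values fixed by this project):

```python
# NumPy versions of ReLU, Leaky ReLU and ELU as defined above.
import numpy as np

def relu(x):
    return np.where(x >= 0, x, 0.0)

def leaky_relu(x, a=0.01):
    # small non-zero slope a for negative inputs
    return np.where(x >= 0, x, a * x)

def elu(x, a=1.0):
    # exponential branch for negative inputs pushes mean activations toward zero
    return np.where(x >= 0, x, a * (np.exp(x) - 1.0))

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))        # [ 0.      0.      0.   1.5]
print(leaky_relu(x))  # [-0.02   -0.005   0.   1.5]
print(elu(x))         # [-0.8647 -0.3935  0.   1.5]
```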
3.2.3 POOLING LAYER

The pooling operation involves sliding a two-dimensional filter over each channel of
the feature map and summarizing the features lying within the region covered by the
filter. For a feature map of dimensions nh x nw x nc, where nh, nw and nc are the
height, width and number of channels, the dimensions of the output obtained from a
pooling layer with an (f x f) filter and stride s are
((nh − f)/s + 1) x ((nw − f)/s + 1) x nc.
Pooling layers are used to reduce the dimensions of the feature maps, which reduces
the number of parameters to learn and the amount of computation performed in the
network. The pooling layer summarizes the features present in a region of the feature
map generated by a convolution layer, so further operations are performed on
summarized features instead of precisely positioned features generated by the
convolution layer. This makes the model more robust to variations in the position of
the features in the input image.
The pooling options available include max pooling, average pooling and global pooling.
One of the most widely used pooling methods is max pooling. In this project, a
max-pooling layer always follows each convolutional layer, with filters of size 2 x 2
applied with a stride of 2, so it takes the maximum over four numbers. Max pooling is a
pooling operation that selects the maximum element from the region of the feature map
covered by the filter. Thus, the output of a max-pooling layer is a feature map
containing the most prominent features of the previous feature map.

Figure 3.4: Max pooling
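The 2 x 2, stride-2 max pooling described above can be reproduced with a short NumPy
sketch (the 4 x 4 feature map is an arbitrary example):

```python
# 2 x 2 max pooling with stride 2: a 4 x 4 feature map is reduced to 2 x 2,
# each output value being the maximum of one 2 x 2 block.
import numpy as np

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 0],
                 [7, 2, 9, 8],
                 [3, 4, 6, 5]], dtype=float)

f, s = 2, 2
out_h = (fmap.shape[0] - f) // s + 1
out_w = (fmap.shape[1] - f) // s + 1
pooled = np.zeros((out_h, out_w))
for i in range(out_h):
    for j in range(out_w):
        pooled[i, j] = fmap[i * s:i * s + f, j * s:j * s + f].max()

print(pooled)   # [[6. 4.]
                #  [7. 9.]]
```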


3.2.4 FULLY CONNECTED LAYER

The CNN process begins with convolution and pooling, which break the image down into
features and analyze them independently. The result of this process feeds into a fully
connected neural network structure that drives the final classification decision. The
fully connected layer takes the output of the previous layers, "flattens" it into a
single vector and uses that vector as the input for the next stage.
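A minimal sketch of this flattening step follows (assuming Keras; the 7 x 7 x 32
feature-map shape and layer widths are illustrative, not the project's actual values):

```python
# Flatten the pooled feature maps into one vector, then classify with dense layers.
import tensorflow as tf

head = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(7, 7, 32)),   # 7*7*32 -> 1568-element vector
    tf.keras.layers.Dense(128, activation="relu"),     # fully connected layer
    tf.keras.layers.Dense(10, activation="softmax"),   # final classification scores
])
head.summary()
```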

3.2.5 REGULARIZATION

The main issue in machine learning is to produce an algorithm that performs well
not only on the training data but also on new entries. Several regularization
methods for deep learning have been proposed. In this project, dropout, which provides
a computationally inexpensive yet powerful method of regularization, is used. It
randomly removes some nodes of the fully connected layer during the training phase to
prevent overfitting. Dropout can also be considered an ensemble method, since it
effectively trains a different network at each training step.
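A minimal sketch of dropout applied to the fully connected part (assuming Keras; the
rate of 0.5 and the layer sizes are illustrative choices, not the values tuned in this
project):

```python
# Dropout randomly zeroes unit outputs during training only; it is a no-op at inference.
import tensorflow as tf

fc = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(1568,)),
    tf.keras.layers.Dropout(0.5),                  # drops ~half the nodes per training step
    tf.keras.layers.Dense(10, activation="softmax"),
])
```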

3.2.6 LOSS FUNCTION


One of the important aspects of designing a deep neural network is the selection of the
loss function to be minimized. The categorical cross-entropy function H is usually a
good candidate and has been used here. It is defined for two distributions p and q over
a discrete variable x as:

H(p, q) = −∑_x p(x) ln(q(x))

where q(x) is the estimate of the true distribution p(x).
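A direct NumPy version of this definition, with an illustrative one-hot true
distribution p and a predicted distribution q:

```python
# Categorical cross-entropy H(p, q) = -sum_x p(x) * ln(q(x)).
import numpy as np

def categorical_cross_entropy(p, q, eps=1e-12):
    # eps guards against log(0) when the network assigns zero probability
    return -np.sum(p * np.log(q + eps))

p = np.array([0.0, 1.0, 0.0])           # true class is class 1 (one-hot)
q = np.array([0.1, 0.7, 0.2])           # network's estimated distribution
print(categorical_cross_entropy(p, q))  # -ln(0.7) ~= 0.357
```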
3.3 TRAINING
In order to train a deep network, the loss function must be minimized by a gradient-
based optimization algorithm. Stochastic gradient descent (SGD) is widely used as an
optimizer in deep learning. Recently, a method for stochastic optimization called
adaptive moment estimation (Adam) was presented, and it has been demonstrated that
Adam works better than the customary optimization algorithms. Furthermore, its
computational efficiency on large datasets is an additional advantage of this method.
The learning rate used for updating the weights remains constant in the SGD algorithm,
whereas the Adam algorithm computes adaptive learning rates by estimating the first
moment (the mean) and the second moment (the uncentered variance) of the gradients.
Other optimizers such as Adagrad, Adadelta, Adamax and Nadam can also be used. In the
Adagrad optimizer, the learning rate adapts to the parameters by performing larger
updates for infrequent parameters than for frequent ones. Unlike Adagrad, which
accumulates all past squared gradients, Adadelta limits the size of the window of
accumulated previous gradients. Adamax is a variant of the Adam optimizer based on the
infinity norm. Nadam combines the Adam and Nesterov accelerated gradient optimizers:
in Nadam, the parameters are updated with the momentum step before the gradient is
computed, which makes it possible to take more precise steps in the gradient direction.
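As an illustration (assuming Keras; the learning rates shown are illustrative defaults
rather than values prescribed here), these optimizers can be instantiated and passed to
the model at compile time:

```python
# The optimizers discussed above, ready to be swapped in when compiling a model.
import tensorflow as tf

optimizers = {
    "sgd":      tf.keras.optimizers.SGD(learning_rate=0.01),       # constant learning rate
    "adam":     tf.keras.optimizers.Adam(learning_rate=0.001),     # adaptive moment estimation
    "adagrad":  tf.keras.optimizers.Adagrad(learning_rate=0.001),  # per-parameter learning rates
    "adadelta": tf.keras.optimizers.Adadelta(learning_rate=0.001), # windowed squared gradients
    "adamax":   tf.keras.optimizers.Adamax(learning_rate=0.001),   # Adam with the infinity norm
    "nadam":    tf.keras.optimizers.Nadam(learning_rate=0.001),    # Adam + Nesterov momentum
}
# model.compile(optimizer=optimizers["adam"], loss="categorical_crossentropy")
```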
