Machine Learning (ML) :: Aim: Analysis and Implementation of Deep Neural Network. Definitions
DEFINITIONS :
INTRODUCTION :
Deep learning is a class of machine learning algorithms that use multiple layers to progressively
extract higher level features from raw input.
ARCHITECTURE OVERVIEW : Neural Networks receive an input (a single
vector), and transform it through a series of hidden layers. Each hidden layer is made up
of a set of neurons, where each neuron is fully connected to all neurons in the previous
layer, and where neurons in a single layer function completely independently and do not
share any connections. The last fully-connected layer is called the “output layer” and in
classification settings it represents the class scores. Convolutional Neural Networks
(ConvNets) have neurons arranged in 3 dimensions: width, height, depth. By the end of
the ConvNet architecture, the final output layer reduces the full image to a
single vector of class scores, arranged along the depth dimension.
Three main types of layers are used to build ConvNet architectures. Each layer accepts
an input 3D volume and transforms it to an output 3D volume through a differentiable
function.
1. Convolutional Layer
Described without brain/neuron analogies, the CONV layer’s
parameters consist of a set of learnable filters.
Accepts a volume of size W1×H1×D1
Requires four hyperparameters:
Number of filters K,
their spatial extent F,
the stride S,
the amount of zero padding P.
Produces a volume of size W2×H2×D2 where:
W2=(W1−F+2P)/S+1
H2=(H1−F+2P)/S+1 (i.e. width and height are computed equally by symmetry)
D2=K
With parameter sharing, it introduces F⋅F⋅D1 weights per filter, for a total
of (F⋅F⋅D1)⋅K weights and K biases.
In the output volume, the d-th depth slice (of size W2×H2) is the result of
performing a valid convolution of the d-th filter over the input volume with a stride
of S, and then offsetting by the d-th bias.
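The size formulas and the per-slice computation above can be sketched in NumPy. This is a minimal, unoptimized illustration rather than the implementation used in this experiment: the function names are hypothetical, the array layout (H, W, D) is an assumption, and, as in most deep learning libraries, the "convolution" is computed as a sliding dot product without flipping the filter.

```python
import numpy as np

def conv_output_size(w1, f, p, s):
    """Spatial output size (W1 - F + 2P) / S + 1; raises if the filter does not fit evenly."""
    span = w1 - f + 2 * p
    if span < 0 or span % s != 0:
        raise ValueError("hyperparameters do not fit the input size")
    return span // s + 1

def conv_param_count(f, d1, k):
    """(F*F*D1)*K shared weights plus K biases."""
    return f * f * d1 * k + k

def conv_forward(x, filters, biases, stride=1, pad=0):
    """Naive forward pass. x: (H1, W1, D1); filters: (K, F, F, D1); biases: (K,)."""
    h1, w1, _ = x.shape
    k, f, _, _ = filters.shape
    h2 = conv_output_size(h1, f, pad, stride)
    w2 = conv_output_size(w1, f, pad, stride)
    # Zero-pad height and width only; the depth axis is left untouched.
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros((h2, w2, k))
    for i in range(h2):
        for j in range(w2):
            r, c = i * stride, j * stride
            patch = xp[r:r + f, c:c + f, :]
            for d in range(k):
                # d-th depth slice: d-th filter applied to the patch, offset by the d-th bias.
                out[i, j, d] = np.sum(patch * filters[d]) + biases[d]
    return out
```

For example, a 32×32×3 input with K=10 filters of extent F=5, stride S=1 and padding P=2 keeps the 32×32 spatial size and uses 760 parameters in total.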
2. Pooling Layer
It progressively reduces the spatial size of the representation to reduce the number of
parameters and the amount of computation in the network, and hence also to control overfitting.
Accepts a volume of size W1×H1×D1.
Requires two hyperparameters:
the spatial extent F,
the stride S.
Produces a volume of size W2×H2×D2 where:
W2=(W1−F)/S+1
H2=(H1−F)/S+1
D2=D1
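A pooling layer with these output sizes can be sketched as follows. The text does not say which pooling operation is used, so max pooling (the most common choice) is assumed here; the function name and the (H, W, D) layout are illustrative.

```python
import numpy as np

def max_pool_forward(x, f=2, stride=2):
    """Max pooling over each depth slice independently. x: (H1, W1, D1)."""
    h1, w1, d1 = x.shape
    h2 = (h1 - f) // stride + 1
    w2 = (w1 - f) // stride + 1
    out = np.zeros((h2, w2, d1), dtype=x.dtype)
    for i in range(h2):
        for j in range(w2):
            r, c = i * stride, j * stride
            window = x[r:r + f, c:c + f, :]
            # Depth is preserved (D2 = D1): reduce over the spatial axes only.
            out[i, j, :] = window.max(axis=(0, 1))
    return out
```

With F=2 and S=2, each 2×2 window is replaced by its maximum, which halves the width and height and leaves the depth unchanged. The layer has no learnable parameters.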
3. Fully-Connected Layer (Dense)
Fully connected layers connect every neuron in one layer to every neuron in another layer.
It is in principle the same as the traditional multi-layer perceptron neural network (MLP).
The flattened matrix goes through a fully connected layer to classify the images.
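As a rough sketch of the flatten-then-classify step, the final volume can be reshaped into a single vector and multiplied by a weight matrix. The names and shapes below are assumptions made for illustration only.

```python
import numpy as np

def dense_forward(volume, weights, bias):
    """Flatten a (H, W, D) volume and compute class scores.

    weights: (num_classes, H*W*D); bias: (num_classes,).
    """
    x = volume.reshape(-1)  # every input value connects to every output neuron
    return weights @ x + bias
```

Each row of `weights` holds the connections of one output neuron, so the result contains one score per class.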
2. Batch Normalisation.
Batch Normalization normalizes the output of a previous activation layer by subtracting
the batch mean and dividing by the batch standard deviation. It increases the stability of a
neural network.
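The training-time normalization described above might look like the sketch below. The scale `gamma`, shift `beta` and the small constant `eps` are not mentioned in the text; they are added here as assumptions because they are part of the usual formulation, and the running statistics used at test time are omitted.

```python
import numpy as np

def batch_norm_forward(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the batch. x: (batch_size, num_features)."""
    mean = x.mean(axis=0)                     # batch mean per feature
    var = x.var(axis=0)                       # batch variance per feature
    x_hat = (x - mean) / np.sqrt(var + eps)   # eps guards against division by zero
    return gamma * x_hat + beta               # optional learnable scale and shift (assumed)
```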
3. Leaky_Relu
It is an activation function which helps the network learn non-linear decision boundaries.
Unlike the standard ReLU, it keeps a small non-zero slope for negative inputs.
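A minimal sketch of the activation, assuming a negative-side slope `alpha` of 0.01 (a common default, not a value taken from this experiment):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """Identity for positive inputs, small slope alpha for negative inputs."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x, alpha * x)
```

Because the negative side is not clamped to zero, a small gradient still flows through units whose input is negative.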
1. Data Preprocessing
Preprocessing centers the data to have zero mean (Mean Subtraction) and
normalizes its scale to [-1, 1] along each feature (Normalisation).
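One way to realize both steps is sketched below. Dividing by the largest absolute value of each centered feature is an assumption; it is one of several ways to reach the [-1, 1] range, and the function name is illustrative.

```python
import numpy as np

def preprocess(x):
    """Zero-center each feature, then scale it into [-1, 1]. x: (num_samples, num_features)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)            # mean subtraction
    scale = np.abs(centered).max(axis=0)     # largest magnitude per feature
    scale[scale == 0] = 1.0                  # leave constant features unchanged
    return centered / scale                  # normalisation to [-1, 1]
```

In practice the mean and scale would typically be computed on the training set only and then reused for the validation and test sets.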
2. Weight Initialization
Initialize the weights by drawing them from a Gaussian distribution with standard
deviation √(2/n), where n is the number of inputs to the neuron. If the weights were all
initialized to the same value or to zero, there would be no source of asymmetry between
neurons, so they are initialized randomly. Calibrating the variance by scaling with 1/sqrt(n)
(sqrt(2/n) for ReLU-type units) ensures that all neurons in the network initially have
approximately the same output distribution and empirically improves the rate of convergence.
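The initialization can be sketched as follows, assuming a weight matrix of shape (n_in, n_out) and NumPy's `default_rng` generator; the function name is hypothetical.

```python
import numpy as np

def init_weights(n_in, n_out, rng):
    """Gaussian weights with standard deviation sqrt(2 / n_in)."""
    return rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / n_in)

rng = np.random.default_rng(0)        # fixed seed only to make the sketch reproducible
w = init_weights(1000, 500, rng)
```

Here `n_in` plays the role of n, the number of inputs to each neuron.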
3. Regularization
Regularization is a technique which makes slight modifications to the learning algorithm
such that the model generalizes better. Regularization penalizes the coefficients i.e. the
weight matrices of the nodes. L1 and L2 are the most common types of regularization.
These update the general cost function by adding another term known as the
regularization term.
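In L2, we have (up to a constant scaling factor):
Cost = Loss + λ⋅Σ w²
In L1, we have (up to a constant scaling factor):
Cost = Loss + λ⋅Σ |w|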
Here, lambda (λ) is the regularization parameter, a hyperparameter whose value
is tuned for better results, and “w” represents the weights. Also, the loss function
could be either the Softmax loss (preferred) or the SVM (Support Vector Machine) loss.
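The regularized cost can be sketched as a data loss plus a penalty term. The forms below assume the common definitions (sum of squared weights for L2, sum of absolute weights for L1, each multiplied by lambda) without any additional scaling constant; the function names are illustrative.

```python
import numpy as np

def l2_penalty(weights, lam):
    """lam times the sum of squared weights over all weight matrices."""
    return lam * sum(np.sum(w ** 2) for w in weights)

def l1_penalty(weights, lam):
    """lam times the sum of absolute weights over all weight matrices."""
    return lam * sum(np.sum(np.abs(w)) for w in weights)

def softmax_loss(scores, labels):
    """Mean cross-entropy of softmax probabilities. scores: (N, C); labels: (N,) ints."""
    shifted = scores - scores.max(axis=1, keepdims=True)   # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```

The total cost is then `softmax_loss(scores, labels) + l2_penalty(weights, lam)`, or the same expression with `l1_penalty`.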
Dropout
This is one of the most interesting types of regularization techniques. During training,
dropout is implemented by keeping a neuron active only with some probability p (a
hyperparameter), and setting its output to zero otherwise.
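A minimal sketch of dropout on a layer's activations is given below. It uses the "inverted" variant, which rescales the kept activations by 1/p during training so that nothing needs to change at test time; that rescaling is an assumption, since the text only describes the masking step.

```python
import numpy as np

def dropout_forward(x, p, rng, train=True):
    """Keep each activation with probability p; drop (zero) it otherwise."""
    if not train:
        return x                                  # no dropout at test time
    mask = (rng.random(x.shape) < p) / p          # 1/p where kept, 0 where dropped
    return x * mask
```

With p = 0.5, roughly half of the activations are zeroed on each forward pass and the rest are doubled, so the expected activation stays the same.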
STEPS INVOLVED :
1. Loading and Analyzing the data
2. Training the classifier by Data preprocessing, Weight initialization and computing the loss.
3. Computing the Analytic Gradient with Backpropagation and performing parameter update.
4. Modeling the data by introducing Convolutional Layer, Pooling Layer, Fully-Connected Layer
and adding dropout.
5. Evaluating the test set and predicting the labels.
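Steps 2, 3 and 5 can be illustrated on a much smaller model than the ConvNet described above. The sketch below trains a single linear softmax classifier with plain gradient descent; the learning rate, regularization strength, step count and function names are all assumptions chosen for illustration, not the settings of this experiment.

```python
import numpy as np

def loss_and_grad(w, b, x, y, lam):
    """Softmax loss with L2 regularization and its analytic gradient."""
    n = x.shape[0]
    scores = x @ w + b
    scores -= scores.max(axis=1, keepdims=True)       # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(n), y]).mean() + lam * np.sum(w ** 2)
    dscores = probs.copy()
    dscores[np.arange(n), y] -= 1.0                   # gradient of the loss w.r.t. the scores
    dscores /= n
    dw = x.T @ dscores + 2.0 * lam * w                # backpropagate into the weights
    db = dscores.sum(axis=0)
    return loss, dw, db

def train(x, y, num_classes, lr=0.1, lam=1e-3, steps=200, seed=0):
    """Gradient descent with small random initial weights and zero biases."""
    rng = np.random.default_rng(seed)
    w = 0.01 * rng.standard_normal((x.shape[1], num_classes))
    b = np.zeros(num_classes)
    losses = []
    for _ in range(steps):
        loss, dw, db = loss_and_grad(w, b, x, y, lam)
        w -= lr * dw                                  # parameter update
        b -= lr * db
        losses.append(loss)
    return w, b, losses

def predict(w, b, x):
    """Predicted label = class with the highest score."""
    return np.argmax(x @ w + b, axis=1)
```

The same loop structure carries over to deeper models: only the forward pass and the backpropagation through the additional layers change.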
EXPERIMENTAL RESULTS :