Report About Neural Network For Image Classification
INTRODUCTION
In the last few years, there has been a lot of hype about artificial neural networks and
other machine learning methods, mostly due to the success of a series of applications
in the industry that use this technology. This in turn has spawned the creation of
relatively powerful open source libraries and services that make this technology
accessible to average developers. In order to use these libraries optimally,
programmers have to have a basic understanding of the underlying implementation
of the algorithms.
If the network generates a good or desired output, there is no need to adjust the
weights. However, if the network generates a poor or undesired output or an error,
then the system alters the weights in order to improve subsequent results.
1.2.1 Supervised Learning
Supervised learning involves a teacher that is more knowledgeable than the ANN
itself. For example, the teacher feeds the network some example data for which the
teacher already knows the answers.
Pattern recognition is one example. The ANN comes up with guesses while
recognizing. Then the teacher provides the ANN with the answers. The
network then compares its guesses with the teacher's correct answers and
makes adjustments according to the errors.
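The teacher-driven adjustment described above can be sketched with a single linear neuron. This is only a minimal illustration: the training data, the learning rate, and the function name `train_with_teacher` are assumptions made for the example, not part of the project.

```python
def train_with_teacher(examples, learning_rate=0.1, epochs=500):
    """Error-driven learning for one linear neuron (a hypothetical sketch)."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, answer in examples:
            guess = sum(w * x for w, x in zip(weights, inputs)) + bias
            error = answer - guess  # teacher's answer minus the network's guess
            # A zero error leaves the weights unchanged; otherwise they are nudged.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Assumed teacher data: the correct answer is 2*x1 - x2 + 1.
examples = [((0.0, 0.0), 1.0), ((1.0, 0.0), 3.0),
            ((0.0, 1.0), 0.0), ((1.0, 1.0), 2.0)]
weights, bias = train_with_teacher(examples)
print(weights, bias)
```

After enough passes over the examples, the weights settle close to the values the teacher used, and the errors (and therefore the adjustments) shrink towards zero.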
prediction, chemical product design analysis, dynamic modeling of chemical
process systems, machine maintenance analysis, project bidding, planning,
and management.
Medical: Cancer cell analysis, EEG and ECG analysis, prosthetic design,
transplant time optimizer.
1.4.1 Image processing
The second component of Computer Vision is the low-level processing of images.
Algorithms are applied to the binary data acquired in the first step to infer low-level
information on parts of the image. This type of information is characterized by image
edges, point features or segments, for example. They are all the basic geometric
elements that build objects in images.
This second step usually involves advanced applied mathematics algorithms and
techniques.
We will be focusing on using artificial neural networks for image classification.
While artificial neural networks are some of the oldest machine learning algorithms
in existence, they have not been widely used in the field of computer vision.
More recent improvements in the methods of training artificial neural networks have
made them worth looking into once again for the task of image classification.
Image recognition and classification is a problem that has been around for a long
time and has many real world applications. Police can use image recognition and
classification to help identify suspects in security footage. Banks can use it to help
sort out checks. More recently, Google has been using it in their self-driving car
program.
Traditionally, a lot of different machine learning algorithms have been utilized for
image classification, including template matching, support vector machines, k-NN,
and hidden Markov models.
1.5.1 Challenges
Here is a set of challenges in applying deep learning techniques to image
classification problems:
1) Image size: The majority of the image-related tasks where deep learning was
successfully applied used images of small size. Examples: MNIST (28x28
pixels), STL-10 (96x96), NORB (108x108), CIFAR-10 and CIFAR-100 (32x32).
2) Model size and training time: One exception to the dataset list above is the
ImageNet dataset, which consists of high-resolution images of variable sizes.
The best results on this dataset, however, require significant
computing resources (such as 16 thousand cores running for three days). The
texture datasets commonly consist of higher-resolution images, and therefore
different techniques need to be tested in order to classify the textures without
using too many computing resources.
1.6 Motivation
There is a considerable set of potential applications for image classification, as
briefly presented above. However, despite the reported success of classical image
classification techniques in many of these tasks, these problems are still not resolved
and are the subject of active research, with potential to increase classification rates.
A second motivation is that traditional machine learning techniques often require
human expertise and knowledge to hand-engineer features, for each particular
domain, to be used in classification and regression tasks. It can be considered that the
actual intelligence in such systems is therefore in the creation of such features,
instead of the machine learning algorithm that uses them. Therefore, using
techniques that do not rely on expert-defined feature extractors can make it easier to
develop effective machine learning models for novel datasets, without requiring the
testing and selection of a large set of possible feature extractors.
CHAPTER-2
LITERATURE SURVEY
Classifying objects is an easy task for humans, but it has proved to be a
complex problem for machines [1]. The rise of high-capacity computers, the
availability of high-quality and low-priced video cameras, and the increasing need for
automatic video analysis have generated an interest in object classification algorithms.
A simple classification system consists of a camera fixed high above the zone of
interest, where images are captured and subsequently processed. Classification includes
image sensors, image preprocessing, object detection, object segmentation, feature
extraction and object classification. [2]
A classification system consists of a database of predefined patterns that
are compared with the detected object to classify it into the proper category. [3] Image
classification is an important and challenging task in various application domains,
including biomedical imaging, biometry, video surveillance, vehicle navigation,
industrial visual inspection, robot navigation, and remote sensing. The classification
process consists of the following steps [5]:
C. Training: Selection of the particular attribute which best describes the pattern.
D. Classification of the object: The object classification step categorizes detected objects
into predefined classes by using a suitable method that compares the image patterns
with the target patterns. [6]
Image classification has made great progress over the past decades in the following
three areas [7]:
(1) development and use of advanced classification algorithms, such as sub pixel,
per-field, and knowledge-based classification algorithms
(2) use of multiple remote-sensing features, including spectral, spatial,
multitemporal, and multisensor information.[8]
(3) incorporation of ancillary data into classification procedures, including such
data as topography, soil, road and census data.
Accuracy assessment is an integral part in an image classification procedure.[9]
Accuracy assessment based on error matrix is the most commonly employed
approach for evaluating per-pixel classification, while fuzzy approaches are gaining
attention for assessing fuzzy classification results.[10] Uncertainty and error
propagation in the image-processing chain is an important factor influencing
classification accuracy. Identifying the weakest links in the chain and then
reducing the uncertainties are critical for improvement of classification accuracy.
[11] The study of uncertainty will be an important topic in the future research of
image classification. Spectral features are the most important information for image
classification.[15]
Standard training algorithms for the multi-layer perceptron use back-propagation to
evaluate the first derivatives of the error function with respect to the weights and
thresholds in the network. There are, however, several situations in which it is also of
interest to evaluate the second derivatives of the error measure. These derivatives
form the elements of the Hessian matrix. [12]
Second derivative information has been used to provide a fast procedure for
re-training a network following a small change in the training data. In this application it
is important that all elements of the Hessian matrix be evaluated accurately.
Approximations to the Hessian have been used to identify the least significant
weights as a basis for network pruning techniques, as well as for improving the speed
of training algorithms.[13] The Hessian has also been used by MacKay for Bayesian
estimation of regularization parameters, as well as for calculation of error bars on the
network outputs and for assigning probabilities to different network solutions.[14]
[9] MacKay found that the approximation scheme of Le Cun was not sufficiently
accurate and therefore included off-diagonal terms in the approximation scheme.
Chapter-3
BACKGROUND
different, adjustable weight. Data is passed through these many hidden layers and the
output is eventually interpreted as different results.
In this example diagram, there are three input nodes, shown in red. Each input node
represents a parameter from the dataset being used. Ideally the data from the dataset
would be preprocessed and normalized before being put into the input nodes.
There is only one hidden layer in this example and it is represented by the nodes in
blue. This hidden layer has four nodes in it. Some artificial neural networks have
more than one hidden layer.
The output layer in this example diagram is shown in green and has two nodes. The
connections between all the nodes (represented by black arrows in this diagram) are
weighted differently during the training process.
More recently, Google has been using it in their self-driving car program.
Traditionally, a lot of different machine learning algorithms have been utilized for
image classification, including template matching, support vector machines,
k-NN, and hidden Markov models. Image classification remains one of the most
difficult problems in machine learning, even today.
CHAPTER- 4
PROBLEM DEFINITION & PROCESS INVOLVED
4.1 INTRODUCTION
Image classification is a critical task for both humans and computers. One of the
challenges lies in the large scale of the semantic space. In particular, humans can
recognize tens of thousands of object classes and scenes.
Applying image classification to the CIFAR-10 dataset, we find that: a) computational
issues become crucial in algorithm design; b) conventional wisdom from a couple
of hundred image categories on relative performance of different classifiers does not
necessarily hold when the number of categories increases; c) there is a surprisingly
strong relationship between the structure of WordNet (developed for studying
language) and the difficulty of visual categorization; d) classification can be
improved by exploiting the semantic hierarchy.
The drawbacks of the existing methods are twofold: (a) lack of high accuracy and (b)
slow convergence rate. Since wrong identification leads to misleading results,
accuracy must be exceedingly high in classification and segmentation techniques.
Also, these techniques must possess a faster convergence rate which will make them
practically feasible for real-time applications. These problems can be overcome by
using artificial intelligence techniques and by performing suitable modifications on
the existing conventional algorithms.
This paper is from Geoffrey Hinton at the University of Toronto and the Canadian
Institute for Advanced Research and deals with the problem of training multilayer
neural networks [1]. It is an overview of the many different strategies that are used to
train multilayer neural networks today.
First it discusses five strategies for learning neural networks, which include denial,
evolution, procrastination, calculus, and generative. Out of these five strategies, the
most significant ones are strategies four and five. Calculus includes the strategy of
backpropagation, which has been independently discovered by multiple researchers.
Generative includes the wake-sleep algorithm.
propagation also has several limitations, including requiring labelled data and
becoming slow when used on neural networks with an excessive number of layers.
The backpropagation algorithm was originally introduced in the 1970s, but its
importance wasn't fully appreciated until a famous 1986 paper by David
Rumelhart, Geoffrey Hinton, and Ronald Williams. That paper describes several
neural networks where backpropagation works far faster than earlier approaches to
learning, making it possible to use neural nets to solve problems which had
previously been insoluble. Today, the backpropagation algorithm is the workhorse of
learning in neural networks.
Backpropagation is a practical realization of the gradient descent algorithm in
multilayered neural networks. It calculates the gradient of the loss function with
respect to all the weights in the network by iteratively applying the multivariable
chain rule.
Applying the backpropagation algorithm to a neural network is a two way process:
we first propagate the input values through the network and calculate the errors, and
then we backpropagate the errors through the network backwards to adjust the
connection weights in order to minimize the error. The algorithm calculates the
gradient of the loss function with respect to the weights between the hidden layer and
output layer nodes, and then it proceeds to calculate the gradient of the loss function
with respect to the weights between the input layer and hidden layer nodes. After
calculating the gradients, it subtracts them from the corresponding weight vectors to
get the new weights for the connections. This process is repeated until the network
produces the desired outputs.
The whole process is better explained with an example. In the neural network
implemented in this project, the gradient of the loss function with respect to the
weights between the hidden layer and output layer nodes can be computed as
follows:
where loss is the cross-entropy loss described and z is a vector that holds the values
of the output layer nodes.
Going further down the line, the gradient of the loss function with respect to the
weights between the input layer and hidden layer nodes is calculated with the
following formula:
Equation 4.4.2. Calculation of the gradient of the loss function
where y is a vector that holds the values of the hidden layer nodes.
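The two gradient calculations and the weight update can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the project's implementation: it assumes a sigmoid hidden layer, a softmax output with cross-entropy loss, no bias terms, and hypothetical weight values. As in the text, y holds the hidden layer values and z the output layer values.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, w1, w2):
    y = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w1]  # hidden layer
    z = [sum(w * yj for w, yj in zip(row, y)) for row in w2]           # output layer
    return y, z


def loss_fn(x, target, w1, w2):
    _, z = forward(x, w1, w2)
    return -math.log(softmax(z)[target])  # cross-entropy for one example


def gradients(x, target, w1, w2):
    y, z = forward(x, w1, w2)
    p = softmax(z)
    # d loss / d z for softmax with cross-entropy: p minus the one-hot target
    dz = [p[k] - (1.0 if k == target else 0.0) for k in range(len(z))]
    # hidden-to-output weights: d loss / d w2[k][j] = dz[k] * y[j]
    grad_w2 = [[dz[k] * y[j] for j in range(len(y))] for k in range(len(z))]
    # chain rule back through the output weights and the sigmoid
    dy = [sum(dz[k] * w2[k][j] for k in range(len(z))) for j in range(len(y))]
    da = [dy[j] * y[j] * (1.0 - y[j]) for j in range(len(y))]
    # input-to-hidden weights: d loss / d w1[j][i] = da[j] * x[i]
    grad_w1 = [[da[j] * x[i] for i in range(len(x))] for j in range(len(y))]
    return grad_w1, grad_w2


def apply_update(weights, grads, learning_rate):
    # subtract a fraction of the gradient from the corresponding weights
    return [[w - learning_rate * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weights, grads)]


# Hypothetical example values
x = [0.5, -1.0]
target = 1
w1 = [[0.1, -0.2], [0.4, 0.3]]
w2 = [[0.2, -0.1], [-0.3, 0.5]]

grad_w1, grad_w2 = gradients(x, target, w1, w2)
new_w1 = apply_update(w1, grad_w1, 0.1)
new_w2 = apply_update(w2, grad_w2, 0.1)
print(loss_fn(x, target, w1, w2), loss_fn(x, target, new_w1, new_w2))
```

A single update with a small learning rate should lower the loss for this example; repeating the forward pass, gradient calculation and update is what the text calls running the process until the network produces the desired outputs.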
CHAPTER 5
SYSTEM ANALYSIS AND DESIGN
OS: Windows
5.1.1 NumPy
NumPy is the fundamental package for scientific computing with Python. It contains
among other things:
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-
dimensional container of generic data. Arbitrary data-types can be defined. This
allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
5.1.2 SciPy
SciPy is an open source Python library used for scientific computing and technical
computing. SciPy contains modules for optimization, linear algebra, integration,
interpolation, special functions, FFT, signal and image processing, ODE solvers and
other tasks common in science and engineering.
SciPy builds on the NumPy array object and is part of the NumPy stack which
includes tools like Matplotlib, pandas and SymPy. There is an expanding set of
scientific computing libraries that are being added to the NumPy stack every day.
This NumPy stack has similar users to other applications such as MATLAB, GNU
Octave, and Scilab. The NumPy stack is also sometimes referred to as the SciPy
stack.
5.1.3 Cython Compiler
Cython is an optimizing static compiler for both the Python programming language
and the extended Cython programming language (based on Pyrex). It makes writing
C extensions for Python as easy as Python itself.
The Cython language is a superset of the Python language that additionally supports
calling C functions and declaring C types on variables and class attributes. This
allows the compiler to generate very efficient C code from Cython code. The C code
is generated once and then compiles with all major C/C++ compilers in CPython 2.6,
2.7 (2.4+ with Cython 0.20.x) as well as 3.2 and all later versions. We regularly run
integration tests against all supported CPython versions and their latest in-
development branches to make sure that the generated code stays widely compatible
and well adapted to each version. PyPy support is work in progress (on both sides)
and is considered mostly usable since Cython 0.17. The latest PyPy version is always
recommended here.
All of this makes Cython the ideal language for wrapping external C libraries,
embedding CPython into existing applications, and for fast C modules that speed up
the execution of Python code.
5.1.4 Python
Uses an elegant syntax, making the programs you write easier to read.
Is an easy-to-use language that makes it simple to get your program working.
This makes Python ideal for prototype development and other ad-hoc
programming tasks, without compromising maintainability.
Python's interactive mode makes it easy to test short snippets of code. There's
also a bundled development environment called IDLE.
Is easily extended by adding new modules implemented in a compiled
language such as C or C++.
5.1.5 TensorFlow
TensorFlow is an open source software library for numerical computation using data
flow graphs. Nodes in the graph represent mathematical operations, while the graph
edges represent the multidimensional data arrays (tensors) communicated between
them. The flexible architecture allows you to deploy computation to one or more
CPUs or GPUs in a desktop, server, or mobile device with a single API. TensorFlow
was originally developed by researchers and engineers working on the Google Brain
Team within Google's Machine Intelligence research organization for the purposes of
conducting machine learning and deep neural networks research, but the system is
general enough to be applicable in a wide variety of other domains as well.
The pictures are stored in memory as six 10,000 x 3072 NumPy arrays, where each
row represents a single image. The 3072 values in each row are logically divided into
3 chunks of 1024 values, holding the red, green and blue channel values,
respectively. Within each channel, the values are stored in row-major order, which means
that the i-th block of 32 consecutive values holds the i-th row of that channel
in the actual image.
The labels are stored as six lists of 10,000 elements, where each element is an integer
between 0 and 9. These numbers map to the class names, so that 0 maps to the
airplanes class, 1 maps to automobiles etc. The i-th element in the labels list is the
label for the i-th picture in the numpy array that was described in the previous
paragraph.
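The row layout and the label mapping described above can be illustrated with a short sketch. The helper name `pixel_rgb` and the stand-in row are assumptions made for the example; the class names follow the standard CIFAR-10 label order.

```python
CLASS_NAMES = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]


def pixel_rgb(row, x, y, size=32):
    """Return (red, green, blue) of pixel (x, y) from one flat 3072-value row."""
    channel_len = size * size   # 1024 values per colour channel
    offset = y * size + x       # row-major position inside a channel
    return (row[offset],
            row[channel_len + offset],
            row[2 * channel_len + offset])


# Hypothetical stand-in for one image row: each value equals its own index.
row = list(range(3072))
print(pixel_rgb(row, x=5, y=2))   # (69, 1093, 2117)
print(CLASS_NAMES[0], CLASS_NAMES[1])
```

With the stand-in row, the printed indices show where the three channel values of one pixel sit inside the flat row: 1024 positions apart.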
Here are the 10 different classes in the dataset and some examples of the pictures that
belong to these classes:
5.2 DESIGN
CHAPTER 6
IMPLEMENTATION
Neural networks are very loosely based on how biological brains work. They consist
of a number of artificial neurons which each process multiple incoming signals and
return a single output signal. The output signal can then be used as an input signal for
other neurons.
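We have a vector of input values and a vector of weights. The weights are the
neuron's internal parameters. Both the input vector and the weights vector contain the same
number of values, so they can be combined into a weighted sum:
$$\text{WeightedSum} = \text{input}_1 w_1 + \text{input}_2 w_2 + \dots$$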
As long as the result of the weighted sum is a positive value, the neuron's output is
this value. But if the weighted sum is a negative value, we ignore that negative value
and the neuron generates an output of 0 instead. This operation is called a Rectified
Linear Unit (ReLU).
Fig 6.2 Rectified Linear Unit, which is defined by f(x) = max (0,x)
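A single neuron with a ReLU can be sketched in a few lines. The input and weight values below are made up for illustration.

```python
def relu(value):
    """f(x) = max(0, x)"""
    return max(0.0, value)


def neuron_output(inputs, weights):
    weighted_sum = sum(i * w for i, w in zip(inputs, weights))
    return relu(weighted_sum)


print(neuron_output([1.0, 2.0], [0.5, 0.25]))  # weighted sum is 1.0, output 1.0
print(neuron_output([1.0, 2.0], [0.5, -1.0]))  # weighted sum is -1.5, output 0.0
```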
The reason for using a ReLU is that this creates a non-linearity. The neuron's output
is now not strictly a linear combination (= weighted sum) of its inputs anymore.
We'll see why this is useful when we stop looking at individual neurons and instead
look at the whole network.
The neurons in artificial neural networks are usually not connected randomly to each
Fig 6.3 Architecture and functioning of a neural network
The input image's pixel values are the inputs for the network's first layer of neurons.
The output of the neurons in layer 1 is the input for the neurons of layer 2.
This is the reason why having a non-linearity is so important. Without the ReLU at
each layer, we would only have a sequence of weighted sums. And stacked weighted
sums can be merged into a single weighted sum, so the multiple layers would give us
no advantage over a single layer. The non-linearity
solves this problem as each additional layer really adds something to the network.
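The claim that stacked weighted sums collapse into one can be checked with a small sketch. The weight matrices and the input vector are hypothetical values chosen so the arithmetic is easy to follow.

```python
def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def relu(vector):
    return [max(0.0, v) for v in vector]


w1 = [[1.0, -1.0], [0.0, 2.0]]   # layer 1 weights
w2 = [[1.0, 1.0]]                # layer 2 weights
x = [1.0, 3.0]

stacked = matvec(w2, matvec(w1, x))          # two purely linear layers
merged = matvec(matmul(w2, w1), x)           # one layer with merged weights
with_relu = matvec(w2, relu(matvec(w1, x)))  # ReLU between the layers

print(stacked, merged, with_relu)  # [4.0] [4.0] [6.0]
```

The two linear layers give exactly the same result as a single layer with the merged weight matrix, while inserting the ReLU produces an output that no single weighted sum of these weights reproduces.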
The network's final layer's outputs are the values we are interested in: the scores for
the image categories. In this network architecture each neuron is connected to all
neurons of the previous layer, therefore this kind of network is called a fully
connected network.
Equation 6.2.2. Perceptron rule
where w and x are the weight and input vectors, respectively, and b is the bias term.
Although in theory one can use perceptrons to compute any function, they are almost
never used in practice, mostly due to their discrete nature: a small change in the
weights might cause the perceptron to produce a drastically different output, which is
not desirable for learning. Accordingly, in this neural network implementation, a
sigmoid activation function is used instead of the perceptron rule. It is quite
similar to the perceptron, but gets rid of its shortcoming. The formula for the sigmoid
function is the following:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
The intuition behind the function is that it takes a real valued input and outputs a
value between 0 and 1. The bigger the input value, the closer the output is to 1, and
vice versa. In that respect, the sigmoid function is very similar to the perceptron,
because for most of the inputs, it produces an output that is really close to either 1
or 0, as can be seen from the following graph:
Fig. 6.4. The sigmoid activation function.
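The difference between the two activations can be seen by nudging a weight across zero. The sketch below assumes the common form of the perceptron rule (output 1 if w·x + b is positive, otherwise 0); the weights and inputs are hypothetical.

```python
import math


def perceptron(weights, inputs, bias):
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def sigmoid_neuron(weights, inputs, bias):
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-total))


inputs = [1.0]
for w in (-0.01, 0.01):
    # A tiny change in the weight flips the perceptron from 0 to 1,
    # while the sigmoid output moves only slightly around 0.5.
    print(w, perceptron([w], inputs, 0.0), sigmoid_neuron([w], inputs, 0.0))
```

For large positive or negative sums the sigmoid output is close to 1 or 0, so it still behaves much like the perceptron there.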
6.3 Loss function
The task of the neural network implemented in this project is to learn to classify
images. In order to learn something, one must be provided with some feedback about
one's current performance. The job of the loss function is precisely that: to evaluate the
accuracy of the neural network. As the name suggests, if the loss is low, the neural
network is doing a good job at classifying the images, and the loss will be high if the
network is not guessing the right classes.
In order to calculate the loss for a specific guess, the neural network's output must
first be interpreted as class scores. This is the job of the score function, which takes
the values from the output layer nodes and calculates the probability that a given
input represents a specific class. The score function used in this project is called the
softmax function and is given by the following formula:
$$p_i = \frac{e^{z_i}}{\sum_{j \in \text{group}} e^{z_j}}$$
Equation 6.3.1. Softmax function
where z is a vector of output nodes and group denotes the set of indexes of every
node in the output layer.
To illustrate the workings of the score function, an example is probably needed. Let's
say we have a neural network that has the job to classify 3 different classes of data.
For this, we need 3 nodes in the output layer, each voting for a different class (the first
neuron represents the first class, the second neuron represents the second class, etc.). The
3 nodes in the output layer could have the values 3, 4, 5, so z = (3, 4, 5). After
calculating the softmax probability distribution, we get p = (0.090, 0.245, 0.665). Since
the third node has the highest score, we say that the network is classifying the input
to be from class 3.
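The worked example can be reproduced with a few lines of Python. Subtracting the largest score before exponentiating is a common numerical-stability trick added here as an assumption; it does not change the result.

```python
import math


def softmax(z):
    largest = max(z)
    exps = [math.exp(v - largest) for v in z]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


p = softmax([3.0, 4.0, 5.0])
print([round(v, 3) for v in p])      # [0.09, 0.245, 0.665]
print("class", p.index(max(p)) + 1)  # class 3
```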
After calculating the probability score for each class, the cross-entropy loss function
can be used to calculate the network's total loss:
$$C = -\sum_i t_i \log p_i$$
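A minimal sketch of the cross-entropy calculation follows, assuming that t is a one-hot target vector (1 for the correct class, 0 elsewhere) and reusing the probabilities from the softmax example.

```python
import math


def cross_entropy(targets, probabilities):
    # Only classes with a non-zero target contribute to the sum.
    return -sum(t * math.log(p)
                for t, p in zip(targets, probabilities) if t > 0)


p = [0.090, 0.245, 0.665]
loss_if_class_3 = cross_entropy([0, 0, 1], p)  # roughly 0.408
loss_if_class_1 = cross_entropy([1, 0, 0], p)  # roughly 2.408
print(loss_if_class_3, loss_if_class_1)
```

The loss is low when the network assigns a high probability to the correct class and grows quickly when the correct class receives a low probability.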
Gradient descent works by finding the gradient of the loss function with respect to
the weights of the network, telling us the direction where the function increases the
most. In order to minimize the loss, it is required to subtract a fraction of the gradient
from the corresponding weight vector. To understand the algorithm intuitively, an
analogy can be used.
One can imagine being placed blindfolded at the top of a mountain and given the task
of finding a way down. The strategy a blindfolded person would use would be to feel the
ground just in front of him and take a small step in the direction where the ground
descends most steeply. Doing this iteratively, the person is guaranteed to arrive at a
location where the ground just around him is not descending. Since the mountain is
not always going downwards at every location, but rather has a very fluctuating
terrain, it is not guaranteed that the place where the person feels the ground to be horizontal
in every direction is actually the lowest place on the mountain. When it is not, it is
called a local minimum and it can be visualized with the following picture:
As can be seen from the graph, the hikers move along the paths (depicted as red
lines) that continuously take them down the mountain while it is descending, but fail
to reach the absolute bottom once the ground is level.
In order to get out of local minima, several methods can be used, most notably by
adjusting the learning rate, which is intuitively the length of the steps that the hiker
takes in a specific direction. When the learning rate is low, the hiker takes very small
steps and can be almost certain that he is descending, but it takes a lot of time to
reach the bottom of the mountain. On the other hand, when the learning rate is high
and he takes long steps, he might reach the bottom faster, but with a relatively high
chance of going in the wrong direction. For example, if the hiker feels that the
ground is descending just in front of him, but then takes a very big step in that direction,
he might end up at a location that is actually higher than where he was previously.
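The effect of the learning rate can be sketched on a one-dimensional loss. The loss function w**2, the starting point and the two learning rates are assumptions chosen for illustration.

```python
def gradient_descent(gradient, start, learning_rate, steps):
    w = start
    for _ in range(steps):
        # Subtract a fraction of the gradient from the weight.
        w = w - learning_rate * gradient(w)
    return w


def gradient(w):
    return 2.0 * w  # derivative of the hypothetical loss w**2


small_steps = gradient_descent(gradient, start=4.0, learning_rate=0.1, steps=50)
large_steps = gradient_descent(gradient, start=4.0, learning_rate=1.1, steps=50)
print(small_steps, large_steps)
```

With the low learning rate the weight creeps steadily towards the minimum at 0. With the high learning rate every step overshoots the minimum and lands higher up on the other side, so the weight moves further away instead of converging.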
Another straightforward method that is used to improve gradient descent and to help
it get out of local minima is called the momentum update, which adds a fraction of
the previous weight update values to the current one. The intuition behind the method
is very simple and can be easily derived from its name: the faster the hiker is
descending (i.e., the steeper the hill), the more momentum he has due to inertia when
he makes his next step; hence the step will be longer if he is descending fast, and
shorter when the descent is not as steep.
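One common form of the momentum update is sketched below; the momentum coefficient of 0.9, the learning rate and the constant slope are assumptions for the example.

```python
def momentum_steps(gradient, start, learning_rate, momentum, steps):
    w = start
    velocity = 0.0
    history = []
    for _ in range(steps):
        # Keep a fraction of the previous update and add the new gradient step.
        velocity = momentum * velocity - learning_rate * gradient(w)
        w += velocity
        history.append(velocity)
    return w, history


# On a constant downhill slope (gradient 1.0) the steps grow longer each time.
w, history = momentum_steps(lambda w: 1.0, start=0.0,
                            learning_rate=0.1, momentum=0.9, steps=3)
print(history)  # approximately [-0.1, -0.19, -0.271]
```

Setting momentum to 0 reduces this to plain gradient descent, where every step on the same slope has the same length.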
Softmax Classifier:
The softmax classifier is used to predict one class at a time. The Softmax Regression
model first computes a score $s_k(x)$ for each class k, then estimates the probability of
each class by applying the softmax function (also called the normalized exponential)
to the scores.
$\sigma(s(x))_k$ is the estimated probability that the instance x belongs to class k
given the scores of each class for that instance.
6.5 Backpropagation
Backpropagation is a practical realization of the gradient descent algorithm in
multilayered neural networks. It calculates the gradient of the loss function with
respect to all the weights in the network by iteratively applying the multivariable
chain rule.
Applying the backpropagation algorithm to a neural network is a two-way process:
we first propagate the input values through the network and calculate the errors, and
then we backpropagate the errors through the network backwards to adjust the
connection weights in order to minimize the error. The algorithm calculates the
gradient of the loss function with respect to the weights between the hidden layer and
output layer nodes, and then it proceeds to calculate the gradient of the loss function
with respect to the weights between the input layer and hidden layer nodes. After
calculating the gradients, it subtracts them from the corresponding weight vectors to
get the new weights for the connections. This process is repeated until the network
produces the desired outputs.
The whole process is better explained with an example. In the neural network
implemented in this project, the gradient of the loss function with respect to the
weights between the hidden layer and output layer nodes can be computed as
follows:
where loss is the cross-entropy loss described and z is a vector that holds the values
of the output layer nodes.
Going further down the line, the gradient of the loss function with respect to the
weights between the input layer and hidden layer nodes is calculated with the
following formula:
where y is a vector that holds the values of the hidden layer nodes.
The code is split into two files: two_layer_neural_network.py, which defines the
inference() gets us from input data to class scores.
6.6.1.1 Inference() describes the forward pass through the network. How are the
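import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.variable_scope, tf.contrib)

def inference(images, image_pixels, hidden_units, classes, reg_constant):
    # Layer 1
    with tf.variable_scope('Layer1'):
        weights = tf.get_variable(
            name='weights',
            shape=[image_pixels, hidden_units],
            initializer=tf.truncated_normal_initializer(
                stddev=1.0 / np.sqrt(float(image_pixels))),
            regularizer=tf.contrib.layers.l2_regularizer(reg_constant))
        biases = tf.Variable(tf.zeros([hidden_units]), name='biases')
        # layer 1 output
        hidden = tf.nn.relu(tf.matmul(images, weights) + biases)

    # Layer 2
    with tf.variable_scope('Layer2'):
        # Define variables
        weights = tf.get_variable(
            name='weights',
            shape=[hidden_units, classes],
            initializer=tf.truncated_normal_initializer(
                stddev=1.0 / np.sqrt(float(hidden_units))),
            regularizer=tf.contrib.layers.l2_regularizer(reg_constant))
        biases = tf.Variable(tf.zeros([classes]), name='biases')
        # layer 2 output
        logits = tf.matmul(hidden, weights) + biases

    return logits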
##############################################################
unnecessary complications. There are different ways to achieve this, and the
the sum of the squares of all the weights in the network to the loss function.
This corresponds to a heavy penalty if the model is using big weights and a
That's why we used the regularizer parameter when defining the weights and
assigned an l2_regularizer to it. This tells TensorFlow to keep track of the L2
regularization terms (and weigh them by the parameter reg_constant) for this
variable.
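The idea of an L2 penalty can be sketched independently of TensorFlow. The sketch follows the description above (the sum of the squared weights, weighted by reg_constant); library implementations may apply an additional constant factor, and the weight values and the data loss of 0.4 are made up for illustration.

```python
def l2_penalty(weights, reg_constant):
    # Sum of the squares of all weights, scaled by the regularization constant.
    return reg_constant * sum(w * w for w in weights)


data_loss = 0.4  # hypothetical cross-entropy loss, identical for both models
small_weights = [0.1, -0.2, 0.1]
big_weights = [3.0, -2.0]

print(data_loss + l2_penalty(small_weights, reg_constant=0.1))  # about 0.406
print(data_loss + l2_penalty(big_weights, reg_constant=0.1))    # about 1.7
```

Two models with the same data loss are therefore ranked differently: the one that relies on big weights ends up with the higher total loss.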
accesses. We then add the sum of all regularization losses to the previously
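def loss(logits, labels):  # name assumed; the original def line is missing
    with tf.name_scope('Loss'):
        # Cross entropy between the class scores and the true labels
        cross_entropy = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(
                logits=logits, labels=labels, name='cross_entropy'))
        # Add the sum of all regularization losses
        loss = cross_entropy + tf.add_n(tf.get_collection(
            tf.GraphKeys.REGULARIZATION_LOSSES))
        tf.summary.scalar('loss', loss)
    return loss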
##############################################################
global_step is a scalar variable which keeps track of how many training iterations
have already been performed. When repeatedly running the model in our training
loop, we already know this value. It's the iteration variable of the loop. The reason
we're adding this value directly to the TensorFlow graph is that we want to be able to
take snapshots of the model. And these snapshots should include information about
The definition of the gradient descent optimizer is simple. We provide the learning
rate and tell the optimizer which variable it is supposed to minimize. In addition, the
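def training(loss, learning_rate):  # name assumed; the original def line is missing
    # Track the number of training iterations inside the graph
    global_step = tf.Variable(0, name='global_step', trainable=False)
    # Create a gradient descent optimizer
    # (which also increments the global step counter)
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(
        loss, global_step=global_step)
    return train_step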
#############################################################
We compare the model's predictions with the true labels and calculate the frequency of
how often the prediction is correct. We're also interested in how the accuracy evolves
over time, so we're adding a summary operation which keeps track of the value
of accuracy.
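def evaluation(logits, labels):  # name assumed; the original def line is missing
    with tf.name_scope('Accuracy'):
        # Compare each prediction with the true label
        correct_prediction = tf.equal(tf.argmax(logits, 1), labels)
        # The fraction of correct predictions is the accuracy
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        tf.summary.scalar('train_accuracy', accuracy)
    return accuracy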
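The same accuracy calculation can be sketched without TensorFlow to make the steps explicit: take the index of the highest class score per image, compare it with the label, and average. The scores and labels below are hypothetical.

```python
def accuracy(logits_batch, labels):
    # The predicted class is the index of the highest score (argmax).
    predictions = [row.index(max(row)) for row in logits_batch]
    correct = sum(1 for p, label in zip(predictions, labels) if p == label)
    return correct / len(labels)


logits_batch = [[2.0, 0.1, -1.0],
                [0.3, 0.2, 0.9],
                [1.5, 1.7, 0.0],
                [0.0, 0.1, 0.2]]
labels = [0, 2, 0, 2]
print(accuracy(logits_batch, labels))  # 0.75 (the third image is misclassified)
```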
##########################################################
inference() describes the forward pass through the network. How are the class scores
Each neuron takes all values from the previous layer as input and generates a single
output value. Each neuron in the hidden layer therefore has image_pixels inputs and
the layer as a whole generates hidden_units outputs. These are then fed into
the classes neurons of the output layer which generate classes output values, one
automatically.
Fig 6.7. Regularization in process
size). The initializer parameter describes the weight variable's initial values. Up to
now, we've initialized our variables to 0, but this wouldn't work here. Think about
the neurons in a single layer. They all receive exactly the same input values. If they
all had the same internal parameters as well, they would all make the same
calculation and all output the same value. To avoid this, we need to randomize their
initial weights. We use an initialization scheme which usually works well: the
weights are initialized to normally distributed values. We drop values which are more
39
than 2 standard deviations from the mean, and the standard deviation is set to the
To create the first layer's output we multiply the images matrix and
the weights matrix with each other and add the bias variable.
Then we apply tf.nn.relu(), the ReLU function, to arrive at the hidden layer's output.
Layer 2 is very similar to layer 1. The number of inputs is hidden_units, the number
are [hidden_units, classes]. Since this is the final layer of our network, there's no
need for a ReLU anymore. We arrive at the class scores (logits) by multiplying input
To sum it up, the inference() function as a whole takes in input images and returns
class scores. That's all a trained classifier needs to do, but in order to arrive at a
trained classifier, we first need to measure how good those class scores are.
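The forward pass described above can be sketched in plain Python as follows. This is a simplified illustration, not the project's TensorFlow implementation: the helper names, the layer sizes and the standard deviation of 0.1 are assumptions chosen for the example, and the truncated normal draw is approximated by simply redrawing values that fall outside 2 standard deviations.

```python
import random


def truncated_normal(stddev, rng):
    """Draw from N(0, stddev^2), redrawing values beyond 2 standard deviations."""
    while True:
        value = rng.gauss(0.0, stddev)
        if abs(value) <= 2.0 * stddev:
            return value


def init_layer(n_inputs, n_outputs, stddev, rng):
    """Create a [n_inputs, n_outputs] weight matrix and a zero bias vector."""
    weights = [[truncated_normal(stddev, rng) for _ in range(n_outputs)]
               for _ in range(n_inputs)]
    biases = [0.0] * n_outputs
    return weights, biases


def dense(inputs, weights, biases):
    """Compute inputs x weights + biases for a batch of input rows."""
    return [[sum(x * weights[i][j] for i, x in enumerate(row)) + biases[j]
             for j in range(len(biases))]
            for row in inputs]


def relu(matrix):
    """Replace negative entries with zero."""
    return [[max(0.0, v) for v in row] for row in matrix]


def inference(images, layer1, layer2):
    """Forward pass: hidden layer with ReLU, then a linear output layer."""
    hidden = relu(dense(images, *layer1))
    return dense(hidden, *layer2)


# Sizes and stddev chosen for illustration only.
image_pixels, hidden_units, classes = 8, 5, 3
rng = random.Random(0)
layer1 = init_layer(image_pixels, hidden_units, 0.1, rng)
layer2 = init_layer(hidden_units, classes, 0.1, rng)
images = [[rng.random() for _ in range(image_pixels)] for _ in range(4)]
logits = inference(images, layer1, layer2)
print(len(logits), len(logits[0]))  # one row of class scores per image
```

Each input row yields one score per class, and no ReLU is applied after the second layer, which matches the description of the output layer above.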
after 10 iterations each image will have been picked once on average. But in fact
some images will have been picked multiple times while some images haven't been
part of any batch so far. As long as you repeat this often enough, it's not that terrible
that randomness causes some images to be part of the training batches somewhat
But we want to improve the sampling process. We first shuffle the 100
images of the training dataset. The first 10 images of the shuffled data are our first
batch, the next 10 images are our second batch and so forth. After 10 batches we're
at the end of our dataset and the process starts again. We shuffle the data another time
and run through it from front to back. This guarantees that no image is picked
more often than any other while still ensuring that the order in which the images are
returned is random.
Python generator, which returns the next batch each time it is evaluated.
We're using Python's built-in zip() function to generate a list of tuples of the
form [(image1, label1), (image2, label2), ...], which is then passed to our generator
function.
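A minimal sketch of such a shuffle-and-slice batch generator is shown below. It is an assumption about how the idea could be written in plain Python, not the project's actual helper; the name gen_batch, its parameters and the placeholder image names are hypothetical.

```python
import random


def gen_batch(data, batch_size, num_iter, rng=random):
    """Yield num_iter batches, reshuffling each time the data is exhausted."""
    data = list(data)
    index = len(data)  # forces a shuffle before the first batch
    for _ in range(num_iter):
        index += batch_size
        if index > len(data):
            # End of the dataset reached: reshuffle and start from the front.
            # If len(data) is not a multiple of batch_size, the leftover tail
            # of the current pass is skipped.
            index = batch_size
            rng.shuffle(data)
        yield data[index - batch_size:index]


# Placeholder data: 100 image names with labels from 10 classes.
images = ["image%d" % i for i in range(100)]
labels = [i % 10 for i in range(100)]
zipped_data = zip(images, labels)
batches = gen_batch(zipped_data, 10, 20, random.Random(0))

batch = next(batches)
# Unzip [(imageA, labelA), (imageB, labelB), ...] into separate tuples.
images_batch, labels_batch = zip(*batch)
```

With 100 items and a batch size of 10, the first 10 batches together contain every item exactly once, after which the data is reshuffled for the next pass.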
next(batches) returns the next batch of data. Since it's still in the form of [(imageA,
labelA), (imageB, labelB), ...], we need to unzip it first to separate images from
Every 100 iterations the model's current accuracy is evaluated and printed to the
screen. In addition, the summary operation is being run and its results are added to
After the training is finished, the final model is evaluated on the test set (remember,
the test set contains data that the model has not seen so far, allowing us to judge how
data_sets = data_helpers.load_data()
name='images')
tf.summary.histogram('logits', logits)
loss = two_layer_neural_network.loss(logits, labels_placeholder)
run_metadata = tf.RunMetadata()
saver = tf.train.Saver()
summary = tf.summary.merge_all()
sess.run(tf.global_variables_initializer())
zipped_data = zip(data_sets['images_train'], data_sets['labels_train'])
FLAGS.max_steps)
for i in range(FLAGS.max_steps):
batch = next(batches)
feed_dict = {
images_placeholder: images_batch,
labels_placeholder: labels_batch }
if i % 100 == 0:
summary_writer.add_summary(summary_str, i)
summary_writer.add_run_metadata(run_metadata, 'step%d' % i)
summary_writer.add_summary(summary_str, i)
if (i + 1) % 1000 == 0:
print('Saved checkpoint')
images_placeholder: data_sets['images_test'],
labels_placeholder: data_sets['labels_test']})
CHAPTER 7
RESULTS
We can see that the training accuracy starts at a level we would expect from guessing
randomly (10 classes -> 10% chance of picking the correct one). Over roughly the first
1000 iterations the accuracy increases to around 50% and fluctuates around that
value for the next 1000 iterations. The test accuracy of 46% is not much lower than
the training accuracy. This indicates that our model is not significantly overfitted.
7.2 Accuracy and Loss graph: The following graph is displayed in
TensorFlow. The accuracy of the classification gradually increases.
7.3 Histograms Generation
CHAPTER 8
The task of image classification has relevant applications in a wide range of domains,
and although it has been successful in many areas, it is still the subject of active
research with potential to increase classification rates. Herein, we have used artificial
neural networks for image classification. Our technique classifies new samples with
the help of the patterns it learned from the training set.
However, the accuracy of this project still needs to be improved. The neural network can
be improved in a number of ways, such as refactoring the code to make it more efficient and
adding more hidden layers so that the network can deal with more complex
images. The objective, to use a machine learning algorithm to learn a classification
model from a training dataset, has been achieved.
Humans, for instance, receive roughly eighty percent of the information about
their environment through vision. Put simply, this field has great scope in
machine vision, especially in autonomous robotics, artificial intelligence,
satellite imaging and tracking, and the list continues.
One good thing about computer vision is that, when it is paired with machine learning,
it has applications in many diverse fields, from medical to military. However,
it demands high computing power and parallel processing, which current-generation
processors are not yet able to provide well; new technologies have improved
performance, but we are not there yet. Put simply, computer vision and
especially machine learning are still at an early, budding stage, and there's
a lot more to come.
Future work can explore the optimization of the network's hyperparameters
(such as the number of layers, the number of neurons per layer, etc.), since these
hyperparameters were fixed during the experiments in the present work. The strategy to
extract patches during testing can also be explored further, experimenting with
different strategies such as random patch extraction.
CHAPTER-9
REFERENCES
[2] D. Lu and Q. Weng, "A survey of image classification methods and techniques for
improving classification performance," Int. J. Remote Sens., pp. 90-94, 2007.
[3] Mokhairi Makhtar, Engku Fadzli, and Shazwani Kamarudin, "The Contribution
of Feature Selection and Morphological Operation For On-Line Business System's
Image Classification," World Applied Science Journal, Vol. 1, pp. 22-29, 2014.
[4] Christopher M. Bishop, "Exact Calculation of the Hessian Matrix for the
Multilayer Perceptron," Neural Computation, Vol. 4, pp. 122-125,
1992.
[6] A. C. Berg, T. L. Berg, J. Malik, and U. C. Berkeley, "Shape Matching and Object
Recognition using Low Distortion Correspondences," International Journal of
Computer Trends and Technology, pp. 343-349, 2005.
[9] L. P. Ricotta, S. Ragazzini and G. Martinelli, "Learning of Word Stress in a
Suboptimal Second Order Back-propagation Neural Network," in Proceedings IEEE
International Conference on Neural Networks, San Diego, Vol. 1, pp. 355, 2007.
[12] L. Fei-Fei, R. Fergus, and P. Perona, "Learning Generative Visual Models from
Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object
Categories," Conf. of Comput. Vis. Pattern Recognit. Work., p. 178, 2004.
[15] S. Banerji, A. Sinha, and C. Liu, "New image descriptors based on color,
texture, shape, and wavelets for object and scene image classification," published in
IEEE Neurocomputing, vol. 117, pp. 173-185, Oct. 2013.