OBJECT RECOGNITION
A mini project report submitted by
MEKALA SATHVIK REDDY URK18CS146
in partial fulfilment for the award of the degree of
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING
under the supervision of
DR. E. GRACE MARY KANAGA, Associate Professor
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
KARUNYA INSTITUTE OF TECHNOLOGY AND SCIENCES
(Declared as Deemed to be University under Sec-3 of the UGC Act, 1956)

Karunya Nagar, Coimbatore - 641 114. INDIA
March 2021
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
BONAFIDE CERTIFICATE
This is to certify that the project report entitled, “Object Recognition” is a bonafide record of
Mini Project work done during the even semester of the academic year 2020-2021 by
MEKALA SATHVIK REDDY (Reg. No: URK18CS146)
in partial fulfilment of the requirements for the award of the degree of Bachelor of Technology in
Computer Science and Engineering of Karunya Institute of Technology and Sciences.
Submitted for the Viva Voce held on 30/03/2021
Project Coordinator Signature of the Guide
CONTENTS
Acknowledgement
Abstract
1. Introduction
1.1 Introduction
1.2 Objectives
1.3 Overview of the Project
2. Analysis and Design
2.1 Functional Requirements
2.2 Non-Functional Requirements
2.3 Operating environment
2.4 Architecture
2.5 Use case diagram
3. Background Research
3.1 Types of machine learning
3.2 Artificial neural networks
3.3 Convolutional neural network
3.4 Background Research
3.5 Experiment setup
4. Implementation
4.1 Modules Description
4.2 Implementation Details
4.3 Tools used
5. Test results/experiments/verification
5.1 Testing
5.2 Results
5.3 Verification
6. Conclusions
6.1 Goals/vision
6.2 Future work
References
ACKNOWLEDGEMENT
First and foremost, I praise and thank ALMIGHTY GOD whose blessings have bestowed
on me the willpower and confidence to carry out my project.
I am grateful to our beloved founders Late. Dr D.G.S. Dhinakaran, C.A.I.I.B, PhD and
Dr Paul Dhinakaran, M.B.A, PhD, for their love and always remembering us in their prayers.
I extend my thanks to our Vice-Chancellor Dr P. Mannar Jawahar, PhD and our
Registrar Dr Elijah Blessing, M.E., PhD, for giving me this opportunity to do the project.
I would like to thank Dr Prince Arulraj, M.E., PhD, Dean, School of Engineering and
Technology for his direction and invaluable support to complete the same.
I would like to place my heartfelt thanks and gratitude to Dr J. Immanuel John Raja,
M.E., PhD, Head of the Department, Computer Science and Engineering for his encouragement
and guidance.
I feel it is a pleasure to be indebted to Mr J. Andrew, M.E, (PhD), Assistant Professor,
Department of Computer Science and Engineering and Dr Grace Mary Kanaga (Guide) for their
invaluable support, advice and encouragement.
I also thank all the staff members of the Department for extending their helping hands to
make this project a successful one.
I would also like to thank all my friends and my parents who have prayed and helped me
during the project work.
ABSTRACT
TITLE: OBJECT RECOGNITION USING MACHINE LEARNING
Object recognition is a computer vision technique for identifying objects in images, and it is a
key application of deep learning and machine learning algorithms. When humans look at a
photograph, we can readily spot people, objects, scenes, and visual details. The goal is to teach
a computer to do what comes naturally to humans: to gain a level of understanding of what an
image contains. Object recognition has many applications in monitoring and surveillance,
medical analysis, robot localization and navigation, and other fields. The appearance of an
object can vary because of scene clutter, photometric effects, and changes in the shape and
viewpoint of the object, so recognition should be invariant to viewpoint changes and object
transformations. An object recognition system finds objects in the real world from an image of
the world, using object models that are known a priori. This task is surprisingly difficult:
humans perform object recognition effortlessly and instantaneously, but an algorithmic
description of the task suitable for implementation on machines has proven very difficult. In
this report, we discuss the different steps in object recognition and introduce some techniques
that have been used for object recognition in many applications. We discuss the different types
of recognition tasks that a vision system may need to perform, analyze the complexity of these
tasks, and present approaches useful in different phases of the recognition task.
Machine Learning:
Machine learning is one of the applications of Artificial Intelligence (AI) which enables
computers to learn on their own and perform tasks without human intervention [15]. There are
numerous applications of machine learning algorithms in the field of computer vision. With the
help of machine learning, some of the most complex problems have become much easier to
formulate. Various computer programs which were previously written by humans, sometimes by
hand, are now being produced with little human contribution with the help of machine
learning [16]. In recent years, due to the remarkable increase in the availability of large
sources of data and of computational resources, machine learning has become predominant,
with a wide range of applications in our daily lives.
CHAPTER 1
1. INTRODUCTION
1.1 Introduction
Object recognition is the process of recognizing objects in videos and images. This computer
vision technology enables autonomous vehicles to classify and detect objects in real time. An
autonomous vehicle is an automobile that can sense and react to its environment to navigate
without the help or involvement of a human. Object recognition is one of its most important
tasks, because it is what helps the vehicle detect obstacles and plan its future course.
Object recognition algorithms must therefore be highly accurate.
There are many machine learning and deep learning algorithms for object detection and
recognition, such as support vector machines (SVMs), convolutional neural networks (CNNs),
region-based convolutional neural networks (R-CNNs) and the You Only Look Once (YOLO) model.
It is important to choose the right algorithm for autonomous driving, because it requires
real-time object recognition. Machines cannot detect the objects in an image as instantly as
humans do, so the algorithms must be both fast and accurate and must detect objects in real
time, so that the vehicle controllers can solve optimization problems at a frequency of at
least once per second.
Convolutional neural networks:
Most modern convolutional neural networks (CNNs) used for object recognition are built using
the same principles: alternating convolution and max-pooling layers followed by a small number
of fully connected layers. We re-evaluate the state of the art for object recognition from small
images with convolutional networks, questioning the necessity of different components in the
pipeline. We find that max-pooling can simply be replaced by a convolutional layer with
increased stride without loss in accuracy on several image recognition benchmarks. Following
this finding – and building on other recent work for finding simple network structures – we
propose a new architecture that consists solely of convolutional layers and yields competitive or
state of the art performance on several object recognition datasets (CIFAR-10, CIFAR-100,
ImageNet). To analyze the network we introduce a new variant of the “deconvolution approach”
for visualizing features learned by CNNs, which can be applied to a broader range of network
structures than existing approaches.
1.2 OBJECTIVES
The aim is to evaluate the classification performance of suitable deep learning and machine
learning models for real-time object recognition. The specific objectives are to identify
suitable and highly efficient CNN models for real-time object recognition and to evaluate the
classification performance of these CNN models.
1.3 OVERVIEW OF THE PROJECT
❖ A fully functional machine learning-based object recognition system
❖ It helps machines or computers to recognize objects
❖ It uses example images (called templates or exemplars) of the objects to perform
recognition
❖ Object recognition software can affect everyday life in incredible ways while
optimizing operations in many industries, for example:
1. Industrial robots can track all processes, and giving them object recognition
abilities helps to root out human error.
2. This software can play an important role in medical diagnosis, through
high-resolution pictures, MRIs, and CT scans.
3. Object recognition has been successfully implemented in the retail industry. This
technology can help improve quality control and automate other day-to-day
product operations.
4. Implementing object recognition in the automotive business has proven to be a
breakthrough in this industry, especially as it works toward developing driverless
vehicles.
CHAPTER 2
ANALYSIS AND DESIGN
2.1 FUNCTIONAL REQUIREMENTS

❖ Should make accurate predictions efficiently
❖ Should offer a good user experience and user interface
❖ Should be able to take inputs from the end user
❖ Should be self-explanatory to use
2.2 NON-FUNCTIONAL REQUIREMENTS

❖ High accuracy
❖ Fast, optimized computation
❖ Clear visualizations and analysis to help understand the data
2.3 OPERATING ENVIRONMENT
Software environment
Table-2.1: Software requirements
Number   Description               Type
1        Operating system          Windows 10
2        Language                  Python
3        Browser                   Google Chrome
4        Development environment   JupyterLab
Hardware environment
Table-2.2: Hardware requirements
Number   Description
1        PC with a hard drive of 250 GB or more
2        PC with 4 GB of RAM or more
3        PC with a graphics processing unit (GPU)
2.4 ARCHITECTURE
❖ Preprocess the input data (image normalization and label encoding)
❖ Import the dataset from Keras
❖ Use one-hot vectors for the categorical labels
❖ Add layers to a Keras model
❖ Load pre-trained weights
❖ Make predictions using the trained Keras model
2.5 USE CASE DIAGRAM
[Diagram nodes: OBJECT RECOGNITION, IMAGE CLASSIFICATION, OBJECT LOCALIZATION,
OBJECT DETECTION, OBJECT SEGMENTATION]
Fig-2.1: Use case diagram
CHAPTER 3
BACKGROUND RESEARCH
3.1 TYPES OF MACHINE LEARNING
• Supervised Learning
Supervised learning is considered to be the most elementary class of machine learning
algorithms. As the name suggests, these algorithms require direct supervision. In this type
of learning, data labelled/annotated by humans is fed to the algorithm. This data
contains the classes and locations of the objects of interest. Eventually, the algorithm learns
from the annotated data and predicts the annotations of new data previously not known
to the algorithm.
• Unsupervised Learning
In unsupervised learning, the algorithm tries to learn and identify useful properties of the
classes from the given unannotated data, without the help or intervention of a human.
• Reinforcement Learning
In this type of learning, the machine is allowed to train itself continually using trial and
error. As a result, the machine learns from past experience and attempts to capture the
best knowledge possible to predict accurately.
3.2 ARTIFICIAL NEURAL NETWORKS
Artificial neural networks are a popular type of supervised learning model. A special case of a
neural network, the convolutional neural network (CNN), is the primary focus of this object
recognition work.
3.3 CONVOLUTIONAL NEURAL NETWORK
There are various important types of artificial neural networks, such as the radial basis
function neural network, feed-forward neural network, convolutional neural network, recurrent
neural network and modular neural network. Among these, convolutional neural networks (CNNs)
are effective in applications such as image/video recognition. A convolutional neural network
typically comprises three types of layers: the convolutional layer, the pooling layer and the
fully-connected layer.
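To make the three layer types concrete, the following is a minimal pure-Python sketch of a convolution, a max-pooling step and a fully-connected layer applied in sequence. It is an illustration only, not the Keras implementation used in this project; the function names, the 6 x 6 input, the kernel and the weights are all made-up assumptions.

```python
# Illustrative sketch of the three CNN layer types (all values are made up).

def conv2d(image, kernel, stride=1):
    """Valid (no padding) 2-D cross-correlation of a square image and kernel."""
    n, k = len(image), len(kernel)
    out_size = (n - k) // stride + 1
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    acc += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(acc)
        out.append(row)
    return out

def relu(fmap):
    """Element-wise max(0, v) activation."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2, stride=2):
    """Keep the largest value in each size x size window."""
    out_size = (len(fmap) - size) // stride + 1
    return [
        [
            max(fmap[i * stride + a][j * stride + b]
                for a in range(size) for b in range(size))
            for j in range(out_size)
        ]
        for i in range(out_size)
    ]

def dense(vector, weights, bias):
    """Fully-connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

# A hypothetical 6x6 single-channel image and a 3x3 edge-like filter.
image = [[float((r + c) % 4) for c in range(6)] for r in range(6)]
kernel = [[1.0, 0.0, -1.0], [1.0, 0.0, -1.0], [1.0, 0.0, -1.0]]

features = max_pool(relu(conv2d(image, kernel)))  # conv: 6 -> 4, pool: 4 -> 2
flat = [v for row in features for v in row]       # 2x2 map flattened to 4 values
scores = dense(flat, weights=[[0.5] * 4, [-0.5] * 4], bias=[0.0, 0.1])
```

The convolution extracts local features, pooling shrinks the feature map, and the fully-connected layer turns the flattened features into one score per class.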
3.4 BACKGROUND RESEARCH
The vast majority of modern convolutional neural networks (CNNs) used for object recognition
are built using the same principles: they use alternating convolution and max-pooling layers
followed by a small number of fully connected layers. We empirically study the effect of
transitioning from a more standard architecture to our simplified CNN by performing an ablation
study on CIFAR-10 and compare our model to the state of the art on CIFAR-10, CIFAR-100 and
the ILSVRC-2012 ImageNet dataset. Our results both confirm the effectiveness of using small
convolutional layers as recently proposed by Simonyan & Zisserman (2014) and give rise to
interesting new questions about the necessity of pooling in CNNs. Since dimensionality
reduction is performed via strided convolution rather than max-pooling in our architecture, it
also naturally lends itself to studying questions about the invertibility of neural networks
(Estrach et al., 2014). As a first step in that direction, we study the properties of our network
using a deconvolutional approach similar to Zeiler & Fergus (2014).
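The reason a strided convolution can stand in for max-pooling is that both reduce the spatial size of a feature map in the same way; the difference is that the convolution has learnable weights while pooling has none. The sketch below works through that arithmetic with the standard output-size formula. The helper names are hypothetical, and the example assumes no padding.

```python
def output_size(n, window, stride, padding=0):
    """Spatial size after sliding a square window over an n x n input."""
    return (n + 2 * padding - window) // stride + 1

def conv_parameters(kernel, in_channels, out_channels):
    """Learnable weights plus one bias per output channel."""
    return kernel * kernel * in_channels * out_channels + out_channels

# 3x3 max-pooling with stride 2 on a 32x32 feature map.
pooled = output_size(32, window=3, stride=2)
# 3x3 convolution with stride 2 on the same feature map.
strided = output_size(32, window=3, stride=2)

print(pooled, strided)             # 15 15
print(conv_parameters(3, 96, 96))  # 83040
```

Both operations map a 32 x 32 input to 15 x 15, so swapping one for the other leaves the shapes of the later layers unchanged, at the cost of the extra weights in the convolution.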
3.5 EXPERIMENT SETUP

To quantify the effect of simplifying the model architecture, we perform experiments on two
datasets: CIFAR-10 and CIFAR-100. Specifically, we use CIFAR-10 to perform an in-depth study
of different models, since a large model on this dataset can be trained with moderate computing
costs of ≈ 10 hours on a modern GPU. We then test the best model found on CIFAR-10 and
CIFAR-100 with and without augmentations. In experiments on CIFAR-10 and CIFAR-100, we
use three different base network models which are intended to reflect current best practices for
setting up CNNs for object recognition.
Table-3.1: The three base networks used for classification on CIFAR-10 and CIFAR-100.
Table-3.2: Model description of the three networks derived from base model C used for
evaluating the importance of pooling in case of classification on CIFAR-10 and CIFAR-100. The
derived models for base models A and B are built analogously. The higher layers are the same as
in Table-3.1.
Strided-CNN-C                  ConvPool-CNN-C               All-CNN-C
Input: 32 x 32 RGB image (all three models)
3 x 3 conv. 96 ReLU            3 x 3 conv. 96 ReLU          3 x 3 conv. 96 ReLU
3 x 3 conv. 96 ReLU            3 x 3 conv. 96 ReLU          3 x 3 conv. 96 ReLU
  with stride r = 2            3 x 3 conv. 96 ReLU
                               3 x 3 max-pooling stride 2   3 x 3 conv. 96 ReLU
                                                              with stride r = 2
3 x 3 conv. 192 ReLU           3 x 3 conv. 192 ReLU         3 x 3 conv. 192 ReLU
3 x 3 conv. 192 ReLU           3 x 3 conv. 192 ReLU         3 x 3 conv. 192 ReLU
  with stride r = 2            3 x 3 conv. 192 ReLU
                               3 x 3 max-pooling stride 2   3 x 3 conv. 192 ReLU
                                                              with stride r = 2
Table-3.3: Comparison between the base and derived models on the CIFAR-10 dataset
Model              Error (%)   Number of parameters
Without data augmentation
Model A            12.47       ~0.9M
Strided-CNN-A      13.46       ~0.9M
ConvPool-CNN-A     10.21       ~1.28M
All-CNN-A          10.30       ~1.28M
Model B            10.20       ~1M
Strided-CNN-B      10.98       ~1M
ConvPool-CNN-B     9.33        ~1.35M
All-CNN-B          9.10        ~1.35M
Model C            9.74        ~1.3M
Strided-CNN-C      10.19       ~1.3M
ConvPool-CNN-C     9.31        ~1.4M
All-CNN-C          9.08        ~1.4M
CHAPTER 4
IMPLEMENTATION
4.1 MODULES DESCRIPTION
Input:
➢ The data is fed into the model pipeline as NumPy arrays
➢ Data read from various sources is stored in NumPy arrays
➢ In our case, we get data from the CIFAR-10 dataset
Preprocessing:
➢ Preprocessing prepares the data for modelling
➢ Most of the data-handling work is done in this step
➢ We need to preprocess the dataset so the images and labels are in a form that Keras can
ingest
➢ We define a NumPy seed for reproducibility, then normalize the images
➢ We also convert our class labels to one-hot vectors. This is a standard output format
for neural networks.
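The preprocessing steps above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the project's actual code: the seed value, the helper names `normalize` and `one_hot`, and the random stand-in data (used here instead of loading CIFAR-10) are all hypothetical.

```python
import numpy as np

np.random.seed(6)  # arbitrary seed value, fixed only for reproducibility

# Stand-in data with CIFAR-10 shapes; a real run would load the dataset instead.
images = np.random.randint(0, 256, size=(8, 32, 32, 3)).astype("uint8")
labels = np.random.randint(0, 10, size=(8, 1))

def normalize(x):
    """Scale 8-bit pixel values into the range [0, 1]."""
    return x.astype("float32") / 255.0

def one_hot(y, num_classes=10):
    """Turn integer class labels of shape (n, 1) or (n,) into one-hot rows."""
    y = np.asarray(y).reshape(-1)
    encoded = np.zeros((y.size, num_classes), dtype="float32")
    encoded[np.arange(y.size), y] = 1.0
    return encoded

x = normalize(images)  # float32 array, same shape as images
y = one_hot(labels)    # shape (8, 10), exactly one 1.0 per row
```

Each one-hot row has a single 1.0 at the index of the true class, which matches the ten-way output of the network's final layer.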
Building the All-CNN:

➢ Using the CIFAR-10 dataset as a reference, we can implement the All-CNN network in
Keras.
➢ Keras models are built by simply adding layers, one after another.
➢ To make things easier, we wrap this model in a function, which allows us to
quickly and neatly generate the model later on in the project.
➢ We define the model type as Sequential
➢ We add the model layers: Conv2D, Activation, Dropout
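As a rough check on the size of the network being built, the convolutional stack can be written down as plain data and its parameters counted. This sketch is plain Python, not the Keras code. The first six layers follow the All-CNN-C column of Table-3.2; the last three layers (a 3 x 3 convolution with 192 filters and two 1 x 1 convolutions) are an assumption taken from the All-CNN paper [6], since Table-3.1 is not reproduced in this report. The names `ALL_CNN_C` and `count_parameters` are hypothetical.

```python
# Each entry is (kernel_size, output_channels, stride).
ALL_CNN_C = [
    (3, 96, 1),
    (3, 96, 1),
    (3, 96, 2),   # strided convolution in place of the first max-pooling layer
    (3, 192, 1),
    (3, 192, 1),
    (3, 192, 2),  # strided convolution in place of the second max-pooling layer
    (3, 192, 1),  # assumed from [6]
    (1, 192, 1),  # assumed from [6]
    (1, 10, 1),   # assumed from [6]: one output channel per CIFAR-10 class
]

def count_parameters(layers, in_channels=3):
    """Sum weights and biases over a chain of convolutional layers."""
    total = 0
    for kernel, out_channels, _stride in layers:
        weights = kernel * kernel * in_channels * out_channels
        total += weights + out_channels  # one bias per output channel
        in_channels = out_channels
    return total

print(count_parameters(ALL_CNN_C))  # 1369738
```

The result, about 1.37 million parameters, is in line with the roughly 1.4M listed for the All-CNN variant of model C in Table-3.3.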
Defining Parameters and Training the Model:
➢ Define our hyperparameters, such as learning rate and momentum
➢ Build the All-CNN model
➢ Define the optimizer and compile the model
➢ Define additional training parameters
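To show what the learning rate and momentum hyperparameters control, here is a minimal sketch of one stochastic gradient descent step with classical momentum, applied to a toy one-dimensional problem. The report does not state the optimizer settings used, so the values 0.01 and 0.9 and the function name `sgd_momentum_step` are illustrative assumptions.

```python
def sgd_momentum_step(weights, grads, velocity, learning_rate=0.01, momentum=0.9):
    """One SGD update with classical momentum.

    velocity <- momentum * velocity - learning_rate * gradient
    weights  <- weights + velocity
    """
    new_velocity = [momentum * v - learning_rate * g
                    for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity

# Toy problem: minimise f(w) = w^2, whose gradient is 2w, starting from w = 1.
w, v = [1.0], [0.0]
for _ in range(200):
    grads = [2.0 * wi for wi in w]
    w, v = sgd_momentum_step(w, grads, v)
```

The learning rate scales each step, while momentum carries over a fraction of the previous step so that updates keep moving in a consistent direction; after the loop, `w[0]` is close to the minimum at zero.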
GPU:
The graphics processing unit (GPU) is the processor in charge of image rendering. The most
advanced GPUs were originally designed for gamers; however, GPU-accelerated computing, the use
of a GPU together with a CPU to accelerate deep learning, analytics, and engineering
applications, has become increasingly common. Training deep neural networks is not practical
without GPUs.
The most common GPUs for deep learning are produced by NVIDIA. Furthermore, the NVIDIA
Deep Learning SDK provides high-performance tools and libraries to power GPU-accelerated
machine learning applications. An alternative would be an AMD GPU in combination with the
OpenCL libraries; however, these libraries have fewer active users and less support than the
NVIDIA libraries.
The CIFAR-10 dataset:

The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images
per class. There are 50000 training images and 10000 test images.
The dataset is divided into five training batches and one test batch, each with 10000 images. The
test batch contains exactly 1000 randomly-selected images from each class. The training batches
contain the remaining images in random order, but some training batches may contain more
images from one class than another. Between them, the training batches contain exactly 5000
images from each class.
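The figures quoted above are mutually consistent, which the short calculation below checks. It only restates the numbers from this section; the constant names are hypothetical, and the raw size assumes one byte per colour value.

```python
NUM_CLASSES = 10
IMAGES_PER_CLASS = 6000
TRAIN_BATCHES, TEST_BATCHES, BATCH_SIZE = 5, 1, 10000
HEIGHT, WIDTH, CHANNELS = 32, 32, 3

total_images = NUM_CLASSES * IMAGES_PER_CLASS    # 60000
train_images = TRAIN_BATCHES * BATCH_SIZE        # 50000
test_images = TEST_BATCHES * BATCH_SIZE          # 10000
train_per_class = train_images // NUM_CLASSES    # 5000
test_per_class = test_images // NUM_CLASSES      # 1000
values_per_image = HEIGHT * WIDTH * CHANNELS     # 3072
raw_bytes = total_images * values_per_image      # 184320000 at one byte per value
```

At one byte per value, the whole dataset is about 184 MB of raw pixel data, small enough to hold in memory on the hardware listed in Table-2.2.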
Fig-4.1: CIFAR-10 dataset sample images
4.2 IMPLEMENTATION DETAILS

➢ The NumPy package is used for scientific computation
➢ Keras is used for loading the dataset and for building and training the neural network
➢ Matplotlib is used for data visualizations
➢ Jupyter or any equivalent environment is used for the development
4.3 TOOLS USED

o NumPy
o Matplotlib
o Keras
o CIFAR-10 dataset
o JupyterLab
o Anaconda Navigator
CHAPTER 5
TEST RESULTS/EXPERIMENTS/VERIFICATION
5.1 TESTING
o The model is tested using test/validation data
o Prior to model training, the data is split into training and test/validation sets
o The model is therefore trained only on the training data
o The model never sees the test data, which is split off before training
o The classes are completely mutually exclusive
o An object is recognized in a new image by individually comparing each feature from the
new image to this database and finding candidate matching features
5.2 RESULTS
o Once the testing takes place, based on the type of the model and the use case, various
evaluation metrics are obtained
o These metrics give a glimpse of how well our model works
o The accuracy of our model is approximately 88% (87.51%, see Fig-5.3)
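Classification accuracy is the fraction of test images whose highest-scoring class matches the true label. The sketch below computes it for a handful of made-up score vectors; the function names, the scores and the labels are hypothetical and are not the project's results.

```python
def predicted_class(scores):
    """Index of the largest score (the predicted class)."""
    return max(range(len(scores)), key=scores.__getitem__)

def accuracy(score_rows, true_labels):
    """Fraction of rows whose predicted class equals the true label."""
    correct = sum(predicted_class(s) == t
                  for s, t in zip(score_rows, true_labels))
    return correct / len(true_labels)

# Made-up model outputs for four images and three classes.
scores = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]
labels = [1, 0, 2, 1]

acc = accuracy(scores, labels)  # 3 of 4 correct
error = 1.0 - acc
print(acc, error)               # 0.75 0.25
```

By the same relation, the 87.51% accuracy shown in Fig-5.3 corresponds to an error rate of 12.49%, which is the form in which results are reported in Table-3.3.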
Fig-5.1: Number of training and testing images from the dataset
Fig-5.2: Sample images for the computer to recognise
5.3 VERIFICATION
o The model is verified on various data samples
o The model is tested and verified by predicting results for sample images inside the code
o When the model was trained on the datasets using the above variations, random patterns
were obtained on the desired target datasets.
Fig-5.3: The accuracy is 87.51%
Fig-5.4: The predicted output and the original output are the same
Fig-5.5: Feature Filters of Front, Middle and Rear-End Layers in a CNN
CHAPTER 6
CONCLUSIONS
This section gives an overall look at the project and at some future work that may
enhance the solution.
6.1 Goals / Vision

Our goal was to work on data collection and feature extraction with minimal prototyping, and
then to build, train and test an object recognition model, developing a feasible model
architecture that delivers high accuracy. A further goal was to work towards document
analysis and character recognition.
6.2 Future Work

Future work includes expanding this application to regional languages and, by testing with
real-time users, improving the performance of the product by training with more data. Object
recognition has been one of the most important topics in machine vision and, in one form or
another, it has attracted significant attention. Many approaches have been developed for
pattern classification, and these approaches are very useful in many applications of machine
vision. Many object recognition systems are built upon low-level vision modules which operate
upon images to derive depth measurements. These measurements are often incomplete and
unreliable and thus adversely affect the performance of higher-level recognition modules.
REFERENCES
[1] https://machinelearningmastery.com/object-recognition-with-deep-learning/
[2] https://www.geeksforgeeks.org/object-detection-vs-object-recognition-vs-image-segmentation/
[3] https://github.com/PAN001/All-CNN/blob/master/all_cnn_weights_0.9088_0.4994.hdf5
[4] https://ieeexplore.ieee.org/document/7845313
[5] https://www.cs.toronto.edu/~kriz/cifar.html
[6] https://arxiv.org/abs/1412.6806
[7] SENLA, "Object Recognition Software: A Technology That Lives Up To The Hype", senlainc.com