Internship Report Dikshant Sharma (191203040)
AT
Azure Skynet
Gurgaon, Haryana
BACHELOR OF ENGINEERING
In
Computer Science and Engineering
SUBMITTED BY:
Dikshant Sharma
Roll Number: 191203040
SUBMITTED TO
CANDIDATES’ DECLARATION
I, Dikshant Sharma, 191203040, hereby declare that the work being presented in the Industry Internship Report entitled “Object Recognition”, in partial fulfillment of the requirement for the award of the degree of B.E. (Branch Name) and submitted to the Department Name, Model Institute of Engineering and Technology (Autonomous), Jammu, is an authentic record of my own work carried out by me at Azure Skynet, Gurgaon (Haryana), under the supervision and mentorship of Mr. Ashish Saini, Azure Skynet, Gurgaon (Haryana), and Faculty Name (Designation, Department, Institute) respectively. The matter presented in this report has not been submitted to this or any other University / Institute for the award of the B.E. degree.
(Dikshant Sharma)
191203040
INTERNSHIP CERTIFICATE
ACKNOWLEDGEMENTS
(Dikshant Sharma)
191203040
Department Name
Model Institute of Engineering and Technology (Autonomous) Kot
Bhalwal, Jammu, India
(NAAC “A” Grade Accredited)
CERTIFICATE
Certified that this Industry Internship Report entitled “Object Recognition using OpenCV” is the bonafide work of Dikshant Sharma, 191203040, of 7th Semester, Computer Science and Engineering, Model Institute of Engineering and Technology (Autonomous), Jammu, who carried out the Industry Internship at Azure Skynet, Gurgaon (Haryana).
This is to certify that the above statement is correct to the best of my knowledge.
ABSTRACT
Contents
Candidates’ Declaration
Internship Certificate
Acknowledgement
Abstract
Contents
List of Figures
Chapter 1 Introduction
1.1 Preamble
1.2 Object Detection
1.3 Approaches
1.4 Problem Description and Research Questions
Chapter 2 Background
2.1 Machine Learning
2.2 Computer Vision
2.3 Why Python
2.4 Libraries & Modules
2.4.1 NumPy
2.4.2 Argparse
2.4.3 OpenCV
2.5 Unified Detection Model – YOLO
2.6 Training YOLO on COCO
2.7 Image Processing Technique
2.8 Object Classification in Moving Object Detection
Chapter 3 Workflow
3.1 Steps Involved in Object Detection in Python 3.7
3.1.1 Install OpenCV-Python
3.1.2 Read an Image
3.1.3 Feature Detection and Description
3.2 Architecture of the Proposed Model
3.3 Implementation
3.4 Results and Analysis
Chapter 4 Conclusions
REFERENCES
APPENDIX A
APPENDIX B
LIST OF FIGURES
Chapter 1 Introduction
1.1 Preamble
The main aim of this project is to build a system that detects objects from the image or a
stream of images given to the system in the form of previously recorded video or the real time
input from the camera. Bounding boxes will be drawn around the objects that are being
detected by the system. The system will also classify the object to the classes the object belongs.
Python Programming and a Machine Learning Technique named YOLO (You Only Look Once)
algorithm using Convolutional Neural Network is used for the Object Detection.
1.3 Approaches
Broadly speaking, object detection can be broken down into machine learning-based
approaches and deep learning-based approaches.
In more traditional ML-based approaches, computer vision techniques are used to look at
various features of an image, such as the color histogram or edges, to identify groups of pixels
that may belong to an object. These features are then fed into a regression model that predicts
the location of the object along with its label.
On the other hand, deep learning-based approaches employ convolutional neural networks (CNNs) to perform end-to-end object detection, in which features don’t need to be defined and extracted separately.
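To make the traditional feature-plus-classifier pipeline concrete, the following is a minimal sketch using OpenCV’s built-in HOG descriptor with its pretrained pedestrian SVM. It illustrates the approach described above and is not part of this project’s YOLO pipeline; the file names are placeholders.

```python
import cv2

# Traditional ML pipeline: hand-crafted HOG features fed to a linear SVM.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

image = cv2.imread("street.jpg")  # placeholder file name

# Slide a detection window over the image at multiple scales.
boxes, weights = hog.detectMultiScale(image, winStride=(8, 8), scale=1.05)

for (x, y, w, h) in boxes:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("street_detected.jpg", image)
```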
Chapter 2 Background
This is quite different from image processing, which involves manipulating or enhancing visual information and is not concerned with the contents of the image. Applications of computer vision include image classification, visual detection, 3D scene reconstruction from 2D images, image retrieval, augmented reality, machine vision, and traffic automation.
2.4 Libraries
2.4.1 Numpy
NumPy is the fundamental package for scientific computing with Python [11]. It can be treated as an extension of the Python programming language with support for multidimensional matrices and arrays. It is open-source software with many contributors. Among other things, it contains:
• A powerful N-dimensional array object.
• Broadcasting functions.
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data types can be defined, which allows NumPy to seamlessly and speedily integrate with a wide variety of databases. NumPy is licensed under the BSD license, enabling reuse with few restrictions.
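As a brief illustration of the array object and broadcasting mentioned above (the values are arbitrary):

```python
import numpy as np

# An N-dimensional array object: a 2 x 3 matrix of pixel-like values.
a = np.array([[10, 20, 30],
              [40, 50, 60]])

# Broadcasting: the scalar is "stretched" across the whole array,
# so every element is scaled without an explicit loop.
normalized = a / 255.0

# Broadcasting also works between arrays of compatible shapes.
row_offsets = np.array([1, 2, 3])   # shape (3,)
shifted = a + row_offsets           # added to each row of the (2, 3) array
print(normalized.shape, shifted)
```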
2.4.2 Argparse
The argparse module in Python helps create programs for a command-line environment in a way that is not only easy to code but also improves user interaction. The module automatically generates help and usage messages and issues errors when users give the program invalid arguments.
Command-line interfaces are familiar from programs like git, ls, grep, and find, all of which allow an underlying program to be called with specific inputs and options. argparse lets custom Python code be invoked with command-line arguments in the same way, which is useful when other developers need to run the code from the command line. A CLI built with argparse can accept positional and optional arguments to control the underlying program’s behavior, and can be self-documented by providing help text that is displayed to its users.
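A minimal sketch of how this project’s detector might expose a CLI with argparse; the argument names (--image, --video, --confidence) are illustrative assumptions, not the project’s actual interface.

```python
import argparse

# Hypothetical CLI for the detector; argument names are illustrative.
parser = argparse.ArgumentParser(
    description="Run object detection on an image or video.")
parser.add_argument("--image", help="path to an input image file")
parser.add_argument("--video", help="path to an input video file")
parser.add_argument("--confidence", type=float, default=0.3,
                    help="minimum probability to keep a detection (default: 0.3)")
args = parser.parse_args()

print(args.image, args.video, args.confidence)
```

Running the script with --help would print an automatically generated usage message covering all three arguments.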
2.4.3 OpenCV
OpenCV (Open Source Computer Vision) is an open-source computer vision and machine learning software library. OpenCV was initially built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products. As a BSD-licensed product, it is easy for businesses to utilize and modify the existing code in OpenCV.
Around 3,000 efficiently optimized algorithms are currently embedded in the OpenCV library, and it supports real-time vision applications. These algorithms fall under classic algorithms, state-of-the-art computer vision algorithms, and machine learning algorithms. They are easily used from Java, MATLAB, Python, C, C++, etc., and are well supported by operating systems such as Windows, macOS, Linux, and Android. Full-featured CUDA and OpenCL interfaces are being actively developed. There are more than 500 different algorithms and even more functions that compose or support those algorithms. OpenCV is written natively in C++ and has a templated interface that works seamlessly with STL containers.
For OpenCV to work efficiently with Python, the NumPy package must be installed first.
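A minimal example of OpenCV’s Python interface, showing how NumPy underpins it (an image loads directly as a NumPy array); the file names are placeholders.

```python
import cv2

# cv2.imread returns a NumPy ndarray (height x width x BGR channels).
image = cv2.imread("input.jpg")  # placeholder file name
print(type(image), image.shape)

# Convert to grayscale and blur -- two of OpenCV's classic algorithms.
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
cv2.imwrite("output.jpg", blurred)
```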
2.5 YOLO Algorithm
All previous object detection algorithms used regions to localize the object within the image: the network does not look at the complete image, but only at parts of the image that have high probabilities of containing the object. YOLO, or You Only Look Once, is an object detection algorithm that differs from these region-based algorithms. In YOLO, a single convolutional network predicts the bounding boxes and the class probabilities for those boxes.
YOLO works by splitting an image into an S × S grid; within each grid cell we take m bounding boxes. For each bounding box, the network outputs a class probability and offset values for the box. Bounding boxes with a class probability above a threshold value are selected and used to locate the object within the image.
YOLO is orders of magnitude faster (45 frames per second) than other object detection algorithms. The limitation of the YOLO algorithm is that it struggles with small objects within the image; for example, it might have difficulty identifying a flock of birds. This is due to the spatial constraints of the algorithm.
The YOLO model was introduced by Joseph Redmon in his paper “You Only Look Once: Unified, Real-Time Object Detection”. The algorithm employs a single neural network that takes a photograph as input and directly predicts bounding boxes and class labels for each bounding box. Although this offered lower predictive accuracy, mostly due to more localization errors, it boasted speeds of up to 45 frames per second, and up to 155 frames per second on speed-optimized versions of the model [6].
“Our unified architecture is extremely fast. Our base YOLO model processes images in real-time
at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding
155 frames per second …” [7]
To begin with, the model operates by splitting the input image into a grid of cells, where each cell is responsible for predicting a bounding box if the center of a bounding box falls within it. Each grid cell predicts a bounding box in terms of the x, y coordinates, the width and height, and a quality metric known as a confidence score. Each cell also makes a class prediction. For example, an image may be divided into a 7 × 7 grid, with each cell in the grid predicting 2 bounding boxes, resulting in 98 proposed bounding box predictions. The class probability map and the bounding boxes with confidences are then combined into a final set of bounding boxes and class labels. YOLO was not without shortcomings; the algorithm had a number of limitations arising from the grid it runs on, as well as some other issues, which are addressed subsequently. Firstly, the model uses a 7 × 7 grid and, since each grid cell can only identify one object, the model restricts the maximum number of detectable objects to 49. Secondly, the model suffers from the close-detection problem: since each grid cell is only capable of detecting one object, if a grid cell contains more than one object the model will be unable to detect all of them. Thirdly, an object might span more than one grid cell, so there is a possibility that the model detects the same object more than once [8]. Due to the aforementioned problems encountered when running YOLO, it was fairly obvious that the localization error and other shortcomings of the system needed to be addressed. As a result, YOLOv2 was created as an improvement to deal with the issues posed by its predecessor; localization and recall errors were significantly reduced in the new version. The model was updated by Joseph Redmon and Ali Farhadi to further revamp model performance in their 2016 paper “YOLO9000: Better, Faster, Stronger” [6].
Structure of YOLO
YOLO is implemented as a convolutional neural network and has been evaluated on the PASCAL VOC detection dataset. It consists of a total of 24 convolutional layers followed by 2 fully connected layers.
[Figure: Network structure]
The confidence score helps ensure that the model does not predict more than one box for the same object, by eliminating bounding boxes with lesser scores. If two bounding boxes have very high confidence scores, the chances are that they are detecting the same object, so the box with the higher confidence rating is selected. However, where the confidence ratings of the two boxes are low, it is likely that they are predicting separate objects of the same class, such as two different cars in the same picture. This procedure is applied after the model has completed training and is being deployed [12]. As mentioned earlier, Intersection over Union (IoU) is a good metric for evaluating the quality of a task. Before further elucidation, it is pertinent to define some terminology: true positives, true negatives, false negatives, and false positives. Simply put, a true positive is any correctly drawn annotation with an IoU score greater than 0.5, while a true negative is when the model refuses to draw any annotation because there simply is not one to be drawn; no value is recorded here because no annotation is drawn, so there is no way to count true negatives. A false negative occurs where an annotation is missing, while a false positive occurs where an annotation is incorrectly drawn with an IoU score of less than 0.5. A measure known as accuracy is usually used to quantify the performance of a task because it is a straightforward measurement. It is simply the ratio of correctly drawn annotations to the total expected annotations (ground truth): the sum of true positives and true negatives divided by the sum of true positives, false positives, true negatives, and false negatives.
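Written out, the accuracy described above is:

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$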
For example, a prediction of $t_x = 1$ would shift the box to the right by the width of the anchor box, while a prediction of $t_x = -1$ would shift it to the left by the same amount [25]. This formulation was not bounded, so any anchor box could end up at any point in the image regardless of which location predicted the box; with random initialization it took the model a very long time to stabilize into predicting sensible offsets [13]. Subsequent versions of YOLO diverged from this approach and devised a means to properly tackle the situation. YOLOv2 bounds the location using the logistic activation sigma ($\sigma$), which ensures that the value remains between 0 and 1 [14]. Given an anchor box of size $(p_w, p_h)$ at the grid cell whose top-left corner is at $(c_x, c_y)$, the model predicts the offsets and scales $(t_x, t_y, t_w, t_h)$, and the corresponding predicted bounding box has center $(b_x, b_y)$ and size $(b_w, b_h)$. The confidence score is the sigmoid ($\sigma$) of another output, $t_o$. Since the location prediction is constrained, the parameterization is easier to learn, making the network more stable. The use of dimension clusters along with directly predicting the bounding box’s center location improves YOLO by almost 5% over the version with anchor boxes.
In YOLOv2, the algorithm employed dimension clusters in place of hand-picked anchor boxes, and 4 coordinates are predicted for each bounding box: $t_x, t_y, t_w, t_h$. If the cell is offset from the top-left corner of the image by $(c_x, c_y)$ and the bounding box prior has width and height $p_w, p_h$, then the predictions correspond to [32]:
$$b_x = \sigma(t_x) + c_x$$
$$b_y = \sigma(t_y) + c_y$$
$$b_w = p_w \, e^{t_w}$$
$$b_h = p_h \, e^{t_h}$$
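As a sketch, these equations can be applied per grid cell as follows; the grid offsets, prior sizes, and raw outputs below are illustrative values, not trained results.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Decode raw network outputs into a box, per the YOLOv2 equations.

    (cx, cy) is the cell's top-left offset and (pw, ph) the prior size,
    all expressed in grid units.
    """
    bx = sigmoid(tx) + cx   # center x, constrained inside the cell
    by = sigmoid(ty) + cy   # center y, constrained inside the cell
    bw = pw * np.exp(tw)    # width scales the prior
    bh = ph * np.exp(th)    # height scales the prior
    return bx, by, bw, bh

print(decode_box(0.2, -0.1, 0.5, 0.3, cx=3, cy=4, pw=1.5, ph=2.0))
```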
During training, a sum-of-squared-error loss is used. Assuming that the ground truth for some coordinate prediction is $\hat{t}_*$, the gradient is the ground truth value (computed from the ground truth box) minus the prediction, $\hat{t}_* - t_*$; this ground truth value can be easily computed by inverting the equations above. YOLOv3 predicts an objectness score for each bounding box using logistic regression. This should be 1 if the bounding box prior overlaps a ground truth object by more than any other bounding box prior does. If a bounding box prior is not the best but does overlap a ground truth object by more than some threshold, the prediction is ignored; a threshold of 0.5 is used. The system assigns only one bounding box prior to each ground truth object. If a bounding box prior is not assigned to a ground truth object, it incurs no loss for coordinate or class predictions, only for objectness [33].
The concept of a bounding box prior was introduced in YOLOv2. Previously, the model was expected to produce unique bounding box descriptors for each new image; instead, a collection of bounding boxes with varying aspect ratios is defined, embedding prior information about the shapes of the objects we expect to detect. Redmon offers an approach for discovering the best aspect ratios by running k-means clustering (with a custom distance metric) on all of the bounding boxes in the training dataset [33].
Thus, instead of predicting the bounding box dimensions directly, the task is reformulated to simply predict the offsets from the bounding box prior, in order to fine-tune the predicted bounding box dimensions. The result is that the prediction task becomes easier to learn.
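A sketch of that clustering idea, using the commonly cited $1 - \text{IoU}$ distance between a box shape and a cluster centroid; the box data here are randomly generated stand-ins for a real training set.

```python
import numpy as np

def iou_wh(box, centroids):
    """IoU between a (w, h) box and each centroid, ignoring position."""
    w = np.minimum(box[0], centroids[:, 0])
    h = np.minimum(box[1], centroids[:, 1])
    inter = w * h
    union = box[0] * box[1] + centroids[:, 0] * centroids[:, 1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=100):
    """Cluster training-box shapes into k priors with distance = 1 - IoU."""
    centroids = boxes[np.random.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        # Assign each box to the centroid with the smallest (1 - IoU) distance.
        assign = np.array([np.argmin(1 - iou_wh(b, centroids)) for b in boxes])
        for c in range(k):
            if np.any(assign == c):
                centroids[c] = boxes[assign == c].mean(axis=0)
    return centroids

# Illustrative (width, height) pairs standing in for a training set.
boxes = np.random.rand(500, 2) * 10 + 0.5
print(kmeans_anchors(boxes, k=5))
```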
When the loss during training is calculated, the IoU score is computed between the predicted box and the ground truth label, and objects are matched to whichever bounding box prediction on the same grid cell produced the highest IoU score. For unmatched boxes, the only descriptor included in the loss function is $p_{obj}$.
Upon the introduction of additional bounding box priors in YOLOv2, it became possible to assign objects to whichever anchor box on the same grid cell has the highest IoU score with the labeled object.
YOLO (version 3) redefined the “objectness” target score $p_{obj}$ to be 1 for the bounding box with the highest IoU score for each given target, and 0 for all remaining boxes. However, bounding boxes that have an IoU score above a defined threshold, but not the highest score, are excluded when calculating the loss. This simply means that such a box is not penalized for a reasonable prediction merely because it is not the best possible prediction [33].
Class Labels
YOLO (version 3) uses sigmoid activations for multi-label classification, noting that SoftMax (from the previous versions) is not necessary for good performance. This choice depends on the dataset and on whether or not the labels overlap (e.g., “golden retriever” and “dog”).
Output Layer
YOLO (version 3) has 3 output layers, which predict box coordinates at 3 different scales. The output prediction is of the form width × height × filters.
YOLO (version 3) replaced the skip-connection splitting with a more standard feature pyramid network output structure. With this method, the network alternates between outputting a prediction and upsampling the feature maps (with skip connections). This allows predictions to take advantage of finer-grained information from earlier in the network, which helps in detecting small objects in the image [33].
2.8.4 Texture-Based
Texture-based approaches, which rely on texture pattern recognition, work similarly to motion-based approaches. They provide better accuracy by using overlapping local contrast normalization, but may require more time, which can be improved using fast techniques.
Chapter 3 Workflow
The images are divided into S × S grid cells before being sent to the convolutional neural network (CNN). The CNN generates B bounding boxes per grid cell around the detected objects in the image, and also classifies the objects, giving C class probabilities per grid cell. A threshold is then applied to the detections; in this project we use a threshold of 0.3. The lower the threshold value, the more bounding boxes appear in the output, resulting in a cluttered result.
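A sketch of this pipeline using OpenCV’s dnn module with pretrained Darknet YOLO files. The file names and the 0.3 threshold mirror the description above, but this loading code is an assumption, not this project’s verbatim implementation.

```python
import cv2
import numpy as np

# Assumed file names for a pretrained Darknet YOLO model.
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
names = net.getLayerNames()
out_layers = [names[i - 1] for i in net.getUnconnectedOutLayers().flatten()]

image = cv2.imread("input.jpg")  # placeholder file name
h, w = image.shape[:2]

# Preprocess: scale to [0, 1], resize to the network input, swap BGR->RGB.
blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416),
                             swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(out_layers)

boxes, confidences, class_ids = [], [], []
for output in outputs:
    for det in output:              # det = [cx, cy, bw, bh, obj, class scores...]
        scores = det[5:]
        class_id = int(np.argmax(scores))
        confidence = float(scores[class_id])
        if confidence >= 0.3:       # the project's threshold
            cx, cy, bw, bh = det[:4] * np.array([w, h, w, h])
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            confidences.append(confidence)
            class_ids.append(class_id)

# Non-max suppression removes duplicate boxes for the same object.
keep = cv2.dnn.NMSBoxes(boxes, confidences, 0.3, 0.4)
```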
Fig-2 illustrates the flow of data in the system. Initially, the user is given the option to choose the type of file given to the system as input: the user can either choose file selection or start the camera. In the former case, the user chooses either an image file or a video file; in the latter, the user starts the camera module. Once the input is selected, preprocessing is done, forming the S × S grid. The gridded result is sent to the bounding box prediction process, where bounding boxes are drawn around the detected objects. The result is then sent to class prediction, where the class to which each object belongs is predicted. It then goes to the detection process, where a threshold is applied to reduce clutter from too many bounding boxes and labels in the final output. At the end, an image (for image input) or a stream of images (for video or camera input) with bounding boxes and labels is produced as the output.
3.3 Implementation
This chapter describes the methodology for implementing the project. The following is the algorithm for detecting objects in the object detection system:
3. YOLO detects one object per grid cell only, regardless of the number of bounding boxes.
4. It predicts C conditional class probabilities.
5. If no object exists, the confidence score is zero; otherwise the confidence score should be greater than or equal to the threshold value.
6. YOLO then draws a bounding box around each detected object and predicts the class to which the object belongs (a sketch of this step follows the list).
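Continuing the earlier sketch, step 6 might look like this with OpenCV. The variables image, boxes, confidences, class_ids, and keep come from the inference sketch above, and class_names is an assumed list of COCO class labels.

```python
import cv2
import numpy as np

# Draw the detections that survived non-max suppression.
for i in np.array(keep).flatten():
    x, y, bw, bh = boxes[i]
    label = f"{class_names[class_ids[i]]}: {confidences[i]:.2f}"
    cv2.rectangle(image, (x, y), (x + bw, y + bh), (0, 255, 0), 2)
    cv2.putText(image, label, (x, y - 5),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
cv2.imwrite("detections.jpg", image)
```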
3.4.2 Result
This section describes the different results obtained from the various test cases described above.
Fig-3.3 illustrates the output of the object detection system: bounding boxes are drawn around the detected objects. Fig-3.4 illustrates the output obtained when objects overlap, showing that partially visible objects are also detected, with a bounding box drawn around each along with a label indicating the class to which it belongs.
Chapter 4 CONCLUSIONS
Computer vision is still a developing discipline; it has not yet matured to the level where it can be applied directly to all real-life problems. Within a few years, however, computer vision, and object detection in particular, will no longer be futuristic and will be ubiquitous. For now, we can consider object detection a sub-branch of machine learning. Some common and widely used applications of object detection are:
4.1.4 Industries
Object detection is also used in industrial processes for the identification of different products. If a machine should detect only objects of a particular shape, this can be achieved very easily; for example, the Hough circle transform can be used for detecting circular objects.
4.1.5 Security
Object detection techniques are used to identify unwanted or suspicious objects in a particular area, and more specifically for detecting bombs and explosives. They are even used for personal security. Biometric recognition, for instance, uses physical or behavioral traits of humans to recognize individuals for security and authentication purposes, relying on distinct biological traits such as fingerprints, hand geometry, and retina and iris patterns.
4.1.6 Surveillance
Objects can be recognized and tracked in videos for security purposes; object recognition is required so that a suspected person or vehicle can be tracked.
The scope of object detection is not limited to these areas; it can be used for almost any purpose one can think of. For example, it can solve number puzzles by taking their images as input and applying suitable algorithms after detecting the different numbers and their positions in the input image.
4.2 CHALLENGES
The main purpose is to recognize a specific object in real time from among a large number of objects. Most recognition systems scale poorly as the number of recognizable objects grows, since the computational cost rises with the number of objects. Comparing and querying images using color, texture, and shape is not enough, because two objects might have the same attributes. Designing a recognition system that works in a dynamic environment and behaves like a human is difficult. The main challenges in designing an object recognition system include lighting, dynamic backgrounds, the presence of shadows, camera motion, the speed of moving objects, intermittent object motion, and weather conditions.
4.3 CONCLUSION
The project was developed with the objective of detecting objects in real time in images, videos, and camera streams. Bounding boxes are drawn around the detected objects, along with a label indicating the class to which each object belongs. A CPU was used for processing in this project; future enhancements could focus on implementing the project on a system with a GPU for faster results and better accuracy.
The possibilities of using computer vision to solve real-world problems are immense. The basics of object detection, various ways of achieving it, and its scope have been discussed. Python has been preferred over MATLAB for integrating with OpenCV because a MATLAB program must be interpreted at run time (MATLAB is built on Java), whereas OpenCV is essentially a library of functions written in C/C++. Additionally, OpenCV is easier to use for someone with little programming background, so OpenCV-Python is a good starting point for research on any object detection concept. Feature understanding and matching are the major steps in object detection and should be performed with high accuracy. DeepFace is an effective face detection method that is preferred over Haar cascades by social applications such as Facebook, Snapchat, and Instagram. OpenCV is likely to remain immensely popular among developers and a prime requirement of IT companies. IoU measures are used to evaluate and improve the performance of object detection.
REFERENCES
[1] Lele Xie, Tasweer Ahmad, Lianwen Jin, Yuliang Liu, and Sheng Zhang, “A New CNN-Based Method for Multi-Directional Car License Plate Detection”, IEEE Transactions on Intelligent Transportation Systems, ISSN (e): 1524-9050, Vol. 19, Issue 2, 2018.
[2] L. Carminati, J. Benois-Pineau and C. Jennewein, “Knowledge-Based Supervised Learning Methods in a Classical Problem of Video Object Tracking”, 2006 International Conference on Image Processing, Atlanta, GA, USA, ISSN (e): 2381-8549, 2006.
[3] Jinsu Lee, Junseong Bang and Seong-Il Yang, “Object Detection with Sliding Window in Images including Multiple Similar Objects”, 2017 IEEE International Conference on Information and Communication Technology Convergence (ICTC), Jeju, South Korea.
[4] Qichang Hu, Sakrapee Paisitkriangkrai, Chunhua Shen, Anton van den Hengel and Fatih Porikli, “Fast Detection of Multiple Objects in Traffic Scenes with a Common Detection Framework”, IEEE Transactions on Intelligent Transportation Systems.
[5] Haihui Xie, Qingxiang Wu and Binshu Chen, “Vehicle Detection in Open Parks Using a Convolutional Neural Network”, 2015 6th International Conference on Intelligent Systems Design and Engineering Applications (ISDEA), Guiyang, China.
[6] J. Brownlee, “A Gentle Introduction to Object Recognition with Deep Learning” (2018). Available at: https://machinelearningmastery.com/object-recognition-with-deep-learning/ [Accessed: 01/10/20]
[7] “You Only Look Once: Unified, Real-Time Object Detection”, 2015. Available at: https://arxiv.org/abs/1506.02640 [Accessed: 04/10/20]
[8] A. Kamal, “YOLO, YOLOv2 and YOLOv3: All You want to know” (2019). Available at: https://medium.com/@amrokamal_47691/yolo-yolov2-and-yolov3-all-you-want-toknow-7e3e92dc4899 [Accessed: 04/10/20]
[9] A. Rosebrock, “Intersection over Union (IoU) for object detection” (2016). Available at: https://pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/
[10] “YOLO: You Only Look Once – Real Time Object Detection”. Available at: https://www.geeksforgeeks.org/yolo-you-only-look-once-real-time-object-detection/
[11] I. Tan, “Measuring Labelling Quality with IOU and F1 Score”. Available at: https://medium.com/supahands-techblog/measuring-labelling-quality-with-iou-and-f1-score-1717e29e492f
[12] Stack Overflow, “Intersection Over Union (IoU) ground truth in YOLO”. Available at: https://stackoverflow.com/questions/61758075/intersection-over-union-iou-ground-truth-in-yolo
[13] J. Redmon & A. Farhadi (University of Washington), “YOLO9000: Better, Faster, Stronger”. Available at: https://pjreddie.com/media/files/papers/YOLO9000.pdf
[14] “YOLO v2 – Object Detection”. Available at: https://www.geeksforgeeks.org/yolo-v2-object-detection/
[15] A. Aggarwal, “YOLO Explained”. Available at: https://medium.com/analytics-vidhya/yolo-explained-5b6f4564f31
[16] L. Cai, F. Jiang, W. Zhou, and K. Li, “Design and Application of an Attractiveness Index for Urban Hotspots Based on GPS Trajectory Data”, IEEE, p. 4.
[17] K. Mahesh Babu, M.V. Raghunadh, “Vehicle number plate detection and recognition using bounding box method”, May 2016, pp. 106–110.
APPENDIX A
APPENDIX B