
Vehicle Detection and Counting With Custom

Data-Set Training

Project report submitted


in partial fulfilment of the requirement for the degree of

Bachelor of Technology

By

Raunak Giri (190610026033)
Trinashis Deb Roy (190610026045)
Dibakar Das Gupta (190610026014)
Sandip Tiwari (190610026037)

Under the guidance of

NISHANT BHARTI, ASST. PROFESSOR, AEC

DEPARTMENT OF ELECTRONICS & TELECOMMUNICATION ENGINEERING

ASSAM ENGINEERING COLLEGE


JALUKBARI- 781013, GUWAHATI

January, 2023
ASSAM ENGINEERING COLLEGE, GUWAHATI

CERTIFICATE

This is to certify that the thesis entitled “Vehicle Detection and Counting With Custom Data-Set
Training” submitted by Raunak Giri (190610026033), Trinashis Deb Roy (190610026045),
Dibakar Das Gupta (190610026014) and Sandip Tiwari (190610026037) in partial fulfilment of the
requirements for the award of the Bachelor of Technology degree in Electronics & Telecommunication
at Assam Engineering College, Jalukbari, Guwahati is an authentic work carried out by them under
my supervision and guidance.

To the best of my knowledge, the matter embodied in the thesis has not been submitted to any other
University/Institute for the award of any Degree or Diploma.

Signature of Supervisor(s)
Name(s)
Electronics & Telecommunication
Assam Engineering College
January, 2023
DECLARATION

We declare that this written submission represents our ideas in our own words and, where others'
ideas or words have been included, we have adequately cited and referenced the original sources. We
also declare that we have adhered to all principles of academic honesty and integrity and have not
misrepresented or fabricated or falsified any idea/data/fact/source in our submission. We understand
that any violation of the above will be cause for disciplinary action by the Institute and can also
evoke penal action from the sources which have thus not been properly cited or from whom proper
permission has not been taken when needed.

(Signature) (Signature) (Signature) (Signature)

Raunak Giri Trinashis Deb Roy Dibakar Das Gupta Sandip Tiwari

190610026033 190610026045 190610026014 190610026037

Date: Date: Date: Date:


ACKNOWLEDGMENTS

We would like to express our sincere gratitude to our HOD Navajit Saikia Sir and our
Project Guide Nishant Bharti Sir, Asst. Professor, AEC, who gave us the golden opportunity
to do this project on the topic Vehicle Detection and Counting With Custom Data-Set
Training. It helped us carry out a great deal of research, through which we came to know
about many things related to this topic.
Finally, we would also like to thank our teachers and friends who helped us a lot in
finalizing this project within the limited time frame.

Raunak Giri (190610026033)


Trinashis Deb Roy (190610026045)
Dibakar Das Gupta (190610026014)
Sandip Tiwari (190610026037)
ABSTRACT

Detecting and counting on-road vehicles is a key task in intelligent transport management and
surveillance systems. The applicability lies both in urban and highway traffic monitoring and control,
particularly in difficult weather and traffic conditions. In the past, the task has been performed using
data acquired from sensors and conventional image processing toolboxes. However, with the advent of
emerging deep learning based smart computer vision systems, the task has become computationally
efficient and reliable. The data acquired from road-mounted surveillance cameras can be used to train
models which can detect and track on-road vehicles for smart traffic analysis and for handling problems
such as traffic congestion, particularly in harsh weather conditions where visibility is poor because of
low illumination and blurring. Different vehicle detection algorithms addressing the same issue deal
with only one or two specific conditions.
LIST OF FIGURES

Fig. No. Fig. Title

1 General processing pipeline of object detection with machine learning classifier
3.1 Workflow of vehicle detection
3.2 Image annotation process
4.1 Confusion matrix obtained after training
4.2 Training curves: losses, precision, recall and mAP per epoch
4.3 Precision against confidence for all classes
4.4 Precision x Recall curve for all classes
4.5 Overall precision for all classes at confidence 0.0
4.6 Confusion matrix obtained after testing
4.7 Test sample from the web application
LIST OF TABLES

Table No. Table Title

2.1 Summary of the Literature Survey
KEYWORDS

intelligent traffic monitoring

urban and highway traffic analysis

artificial intelligence

deep learning

vehicle detection

traffic surveillance
CONTENTS
Page No.
CANDIDATE’S DECLARATION i
ACKNOWLEDGEMENTS ii
ABSTRACT iii
LIST OF FIGURES iv
LIST OF TABLES v
CONTENTS vi

Chapter 1 INTRODUCTION 1
1.1 Introduction............................................................................................. 1
1.2 Background of vehicle detection............................................................ 2
1.3 Motivation.............................................................................................. 10
1.4 Objectives……………………………………………………………... 10
1.5 Organisation of Report……………………………………………….. 11
Chapter 2 LITERATURE REVIEW 12
2.1 Introduction……………………………………………………………. 12
2.2 Literature survey………………………………………………………. 12
2.3 Conclusion……………………………………………………………. 19

Chapter 3 METHODOLOGY 20
3.1 Introduction……………………………………………………………. 20
3.2 Methodology………………………………………………………….. 20
3.3 Software Tools Used…………………………………………………... 22
3.4 Conclusion…………………………………………………………….. 26

Chapter 4 RESULT ANALYSIS 27


4.1 Introduction……………………………………………………………. 27
4.2 Result Analysis………………………………………………………... 27
4.3 Significance of Result Obtained……………………………………… 32
4.4 Test Sample……………………………………………………………. 32
4.5 Conclusion of Result Analysis………………………………………… 33

Chapter 5 CONCLUSION & FUTURE SCOPE OF WORK 35


REFERENCES 36
CHAPTER 1

INTRODUCTION

1.1. Introduction:

In recent years, vehicle detection has become a popular topic of research among researchers working in
related fields due to its societal importance. Vehicle detection and counting is a process of estimating
road traffic density to assess traffic conditions for intelligent transportation systems. With the
extensive utilization of cameras in urban transport systems, surveillance video has become a central
data source. Also, real-time traffic management systems have become popular recently due to
handheld/mobile cameras and big-data analysis. According to surveys, every year a large number of
people die worldwide because of fatal accidents, which are mainly caused by the negligence of
drivers or poor visibility during inclement weather conditions. The report published in the National
Crime Records Bureau's Accidental Deaths and Suicides in India stated that hundreds of people died,
mainly in two states of India (Andhra Pradesh and Telangana), in the year 2014 due to accidents caused
by poor visibility during inclement weather conditions. Another report, published on the website of the
U.S. Department of Transportation, Federal Highway Administration, based on data collected over a
span of 10 years (2007-2016) by NHTSA, also stated that approximately 21% of total annual
crashes in the U.S. occurred due to poor visibility during inclement weather. According to the data
published in [3], about 90% of the road accidents in India occur due to the negligence of
drivers. These statistics, published in several significant sources, clearly state the importance of
performing accurate vehicle detection in the real world. In the past decade, numerous methods have been
designed for accurate tracking and detection of vehicles. Although traditional vehicle detection
algorithms such as the Gaussian Mixture Model (GMM) give promising results, they fail to perform
desirably when illumination changes occur or in the presence of background clutter. Deep
learning methods have an inherent feature extraction capability, which makes them much more acceptable
to researchers compared to the traditional methods, as it minimizes, to a great extent, the classification
errors that arise from erroneous handcrafted feature extraction. As Convolutional
Neural Networks (CNN) are designed to artificially replicate the functional capabilities of the human
cognitive system, they give better performance in various computer vision tasks compared to
traditional methods. Here, we have discussed vehicle detection using machine learning and deep
learning techniques.

1.2. Background of vehicle detection:

Vehicle detection methods can be broadly classified into classical methods, which rely on manual
feature extraction algorithms combined with a machine learning classifier, and more recent, more
robust methods, which depend on deep learning models for automatic feature extraction.

1.2.1. Vehicle Detection Using Machine Learning:

Classical methods of detecting objects within an image involve, first, feature extraction algorithms such
as HOG [57–63], HAAR [64–74] and SIFT [75–81] to manually craft features and, second, a classifier
such as a support vector machine [8,82–86] or K Nearest Neighbour [84,87,88] to classify multiple
objects within the image. The general processing pipeline is shown in Figure 1.

Figure 1. General processing pipeline of object detection with machine learning classifier.
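To make this pipeline concrete, below is a minimal sketch of the HOG-plus-SVM approach using
OpenCV and scikit-learn. The random patches stand in for labelled vehicle and background crops,
which in practice would come from an annotated dataset; everything here is illustrative rather than a
production detector.

    import cv2
    import numpy as np
    from sklearn.svm import LinearSVC

    hog = cv2.HOGDescriptor()  # default 64x128 detection window

    def extract_hog(patch):
        # Resize each candidate patch to the HOG window and compute its descriptor
        patch = cv2.resize(patch, (64, 128))
        return hog.compute(patch).flatten()

    # Stand-in data: in practice these patches come from labelled images
    rng = np.random.default_rng(0)
    vehicle_patches = [rng.integers(0, 255, (120, 80), dtype=np.uint8) for _ in range(20)]
    background_patches = [rng.integers(0, 255, (120, 80), dtype=np.uint8) for _ in range(20)]

    X = np.array([extract_hog(p) for p in vehicle_patches + background_patches])
    y = [1] * len(vehicle_patches) + [0] * len(background_patches)

    clf = LinearSVC()  # the machine learning classifier stage
    clf.fit(X, y)      # learn: HOG features -> vehicle / not vehicle
    print(clf.predict(X[:3]))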

1.2.2. Object Detection Using Deep Learning:

Deep learning based algorithms provide a more robust and accurate solution to the problem of object
detection. There is no need for a separate feature description step; rather, the model is trained using
images with bounding boxes and class labels, and thus automatically learns the visual attributes
present in the underlying image. In the past few years, the task of object detection using deep learning
has been performed by a number of algorithms including R-CNN, Fast R-CNN, Faster R-CNN, YOLO,
YOLOv2, YOLOv3, YOLOv4, YOLOv5, YOLOv6 and YOLOv7.

1.2.3 R-CNN Family:


R-CNN, or Region-based Convolutional Neural Network, was introduced by Ross Girshick et al. in
2014. It is a method for object detection that uses a convolutional neural network (CNN) to classify
object proposals, or regions of interest (ROIs), within an image. R-CNN is a two-stage object detection
pipeline that first generates a set of ROIs using a method such as selective search or edge boxes and
then classifies the objects within these ROIs using a CNN.

The R-CNN pipeline can be divided into three main steps:

 Region proposal: A method, such as selective search or edge boxes, generates a set of ROIs
within the image. The bounding boxes around the objects of interest typically define these
ROIs.
 Feature extraction: A CNN is used to extract features from each ROI. These features are
then used to
represent the ROI in a compact and informative manner.
 Classification: The extracted features are fed into a classifier, such as a support vector
machine (SVM), to predict the object’s class within the ROI.

One of the main advantages of R-CNN is that it can handle many object classes, as the classifier is
trained separately for each class. However, a significant drawback of R-CNN is that it is
computationally expensive, requiring the CNN to be run on each ROI individually.

What is Fast R-CNN?

Fast R-CNN, introduced by Ross Girshick in 2015, is an improvement over R-CNN that
addresses the computational inefficiency of the original method. It achieves this by running a
single CNN over the entire image once and pooling features from the shared feature map for
each ROI, rather than running the CNN on each ROI individually.

The Fast R-CNN pipeline can be divided into four main steps:

 Region proposal: A set of ROIs is generated using a method such as selective search
or edge boxes.
 Feature extraction: A CNN extracts features from the entire image.
 ROI pooling: The extracted features are then used to compute a fixed-length feature
vector for each ROI. This is done by dividing the ROI into a grid of cells and max-
pooling the features within each cell.
 Classification and bounding box regression: The fixed-length feature vectors for each
ROI are fed into two fully connected (FC) layers: one for classification and one for
bounding box regression. The classification FC layer predicts the object’s class within
the ROI, while the bounding box regression FC layer predicts the refined bounding
box coordinates for the object.

Fast R-CNN significantly reduces the computational cost of object detection compared to R-
CNN, as the CNN is only run once on the entire image rather than multiple times on each ROI.
However, it still requires a separate classifier for each object class, which can be
computationally expensive if the number of classes is large.

Fast R-CNN has been widely used in various applications, including object detection in
natural, medical, and satellite images. It has also been extended to handle tasks such as
instance segmentation, joint object detection, and scene classification.

What is Faster R-CNN?


Faster R-CNN, introduced by Shaoqing Ren et al. in 2015, is an improvement over Fast R-
CNN that further reduces the computational cost of object detection. It achieves this by using a
single CNN to generate both the ROIs and the features for each ROI, rather than relying on an
external proposal method as in Fast R-CNN.

The Faster R-CNN pipeline can be divided into four main steps:

 Feature extraction: A CNN extracts features from the entire image.


 Region proposal: A set of ROIs is generated by a Region Proposal Network (RPN), a
small fully convolutional network that processes the extracted features.
 ROI pooling: The extracted features are then used to compute a fixed-length feature
vector for each ROI using the same ROI pooling process as in Fast R-CNN.
 Classification and bounding box regression: The fixed-length feature vectors for each
ROI are fed into two separate FC layers: one for classification and one for bounding
box regression. The classification FC layer predicts the object’s class within the ROI,
while the bounding box regression FC layer predicts the refined bounding box
coordinates for the object.

Faster R-CNN combines the feature extraction and region proposal steps of R-CNN and Fast
R-CNN into a single CNN, making it more computationally efficient than both methods. It also
uses an anchor box mechanism to handle multiple scales and aspect ratios, which can improve
the robustness of object detection.
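Faster R-CNN is available off the shelf in PyTorch's torchvision (version 0.13 or later for the
weights argument); the short, illustrative sketch below runs it on a random image just to show
the interface, where each prediction is a dict of boxes, labels and scores.

    import torch
    import torchvision

    # Faster R-CNN with a ResNet-50 FPN backbone, pre-trained on COCO
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    model.eval()

    image = torch.rand(3, 480, 640)  # a random image stands in for a traffic frame

    with torch.no_grad():
        predictions = model([image])[0]  # dict with 'boxes', 'labels', 'scores'

    keep = predictions["scores"] > 0.5  # simple confidence threshold
    print(predictions["boxes"][keep])   # [x1, y1, x2, y2] per detection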

1.2.4 YOLO Family:

YOLO v1:

Redmon et al. designed this object detection network to reduce the huge run-time
complexity of R-CNN and its proposed variants. Unlike R-CNN and its variants, YOLO does
not require region proposals to localize and classify objects; instead, it divides the entire image
into an S×S grid, and within each grid cell it predicts 'm' bounding boxes. Each bounding
box predicts class probabilities and offset values, and the bounding boxes which predict class
probabilities below a certain threshold are suppressed (a sketch of this filtering and
suppression step is shown below).
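The suppression step just described can be sketched in a few lines of NumPy: keep only boxes above
a score threshold, then apply greedy non-maximum suppression so that overlapping duplicates are
dropped. The thresholds and sample boxes below are illustrative, not YOLO's actual implementation.

    import numpy as np

    def iou(a, b):
        # Intersection over Union of two boxes given as [x1, y1, x2, y2]
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter + 1e-9)

    def suppress(boxes, scores, score_thr=0.25, iou_thr=0.5):
        # Drop low-confidence boxes, then keep the best box of each overlapping group
        order = [i for i in np.argsort(scores)[::-1] if scores[i] >= score_thr]
        keep = []
        while order:
            best = order.pop(0)
            keep.append(best)
            order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thr]
        return keep

    boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], float)
    scores = np.array([0.9, 0.8, 0.7])
    print(suppress(boxes, scores))  # -> [0, 2]: the duplicate of box 0 is removed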

Limitations of YOLO v1:

 The maximum number of objects detected by a YOLO detector depends on the
dimension of the grid, as YOLO can detect only one object per grid cell. For example,
if the size of the grid is S×S, the maximum number of objects detected is S².
 As the maximum number of objects detected per grid cell is 1, the detector performs
erroneous detection when more than one object exists within a grid cell.

YOLO v2:

Redmon et al. proposed an improved version of YOLO, also known as YOLO9000, which
not only surpasses state-of-the-art methods like Fast R-CNN and Faster R-CNN in terms of
efficiency but also performs detection within a reasonable amount of time. In this version of
the YOLO detector, the authors made various changes to the architecture of YOLO version 1
in order to solve its limitations.
Some notable architectural changes made over YOLO v1:

 Introduction of Batch Normalization layers: the introduction of this layer after every
convolutional layer improves the performance of the detector and reduces the
chance of overfitting without even adding dropout layers.
 The original YOLO detector uses images of dimension 224×224 for training and
increases the dimension to 448×448 during the test phase. This sudden increase in
image resolution during the test phase decreases the performance of YOLO version 1.
To overcome this drawback, in YOLO version 2 fine-tuning is done and the network is
trained on images of dimension 448×448 for 10 epochs so that it can gradually adjust
to high-resolution images. Hence, the drop in mAP (mean Average Precision) which
occurs in YOLO due to the sudden increase in image dimension is solved.
 This improved model does not predict the offset values using fully connected layers
lying on top of the convolutional layers as in YOLO version 1; instead, it removes
the fully connected layers from the architecture and predicts objectness scores using
anchor boxes. The use of anchor boxes slightly reduces the mAP of YOLO v2
compared to YOLO v1, but it increases its recall value.
 YOLO v1 performs training of the network using hand-annotated bounding boxes, but to
make the learning process easier, the authors performed training of their network using
bounding-box priors generated with the k-means algorithm in combination with their
proposed distance metric, which is defined in (1).

d(box, centroid) = 1 - IoU(box, centroid)                (1)
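A compact sketch of this anchor clustering is given below, assuming each training box is reduced to a
(width, height) pair; the seed boxes are synthetic and the routine is illustrative of the idea, not the
authors' exact code.

    import numpy as np

    def iou_wh(box, centroids):
        # IoU between one (w, h) box and each centroid, as if sharing a corner
        inter = np.minimum(box[0], centroids[:, 0]) * np.minimum(box[1], centroids[:, 1])
        union = box[0] * box[1] + centroids[:, 0] * centroids[:, 1] - inter
        return inter / union

    def kmeans_anchors(boxes, k=5, iters=50):
        # k-means on box shapes with the distance d = 1 - IoU(box, centroid)
        rng = np.random.default_rng(0)
        centroids = boxes[rng.choice(len(boxes), k, replace=False)]
        for _ in range(iters):
            assign = np.array([np.argmin(1 - iou_wh(b, centroids)) for b in boxes])
            for j in range(k):
                if np.any(assign == j):
                    centroids[j] = boxes[assign == j].mean(axis=0)
        return centroids

    boxes = np.abs(np.random.default_rng(1).normal([50, 40], [20, 15], (200, 2)))
    print(kmeans_anchors(boxes, k=3))  # three (w, h) anchor priors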

YOLO v3:

YOLO v3 is an improved version of the YOLO detector designed by Redmon et al. YOLO v3 does
not use a softmax classifier to predict the classes of detected objects, as softmax allows the prediction
of only one class per object and thus fails to efficiently handle multi-label prediction. To overcome
this drawback, YOLO v3 uses independent logistic classifiers for each class, which allows it to
efficiently handle multi-label prediction. Unlike YOLO v2, which uses Darknet-19 as its feature
extractor, YOLO v3 uses a hybrid feature extraction approach, combining the Darknet-19 design with
residual connections (the resulting backbone is known as Darknet-53). The YOLO v3 architecture has
several shortcut connections, which increase its performance while detecting small objects but
decrease its performance while detecting large and medium objects.

YOLO v4:

YOLO v4 is designed taking inspiration from several Bag-of-Freebies and Bag-of-Specials object
detection methods. Bag-of-Freebies methods increase the training cost of the detector without
affecting inference time, while improving its accuracy; Bag-of-Specials methods increase the
inference cost to some extent but significantly improve accuracy. Apart from these, other
improvements made in the YOLO v4 model are the selection of optimal hyper-parameter values
using genetic algorithms, the introduction of data-augmentation methods like Self-Adversarial
Training (SAT) and Mosaic, and alterations of existing methods like Cross mini-Batch
Normalization, the Spatial Attention Module, etc.

YOLO v5:

Unlike previous versions of YOLO, which were developed in the Darknet research framework, this is
the first version of YOLO developed in the PyTorch framework. This makes YOLO v5 much more
production-ready compared to its previous versions, as PyTorch is much more easily configurable
than Darknet. Another notable improvement of this version of YOLO is its run time: YOLO v5 is
much faster than the previously proposed versions. The inference speed of YOLO v5 is about 140
frames per second, while that of YOLO v4 is about 50 frames per second when implemented using
the same PyTorch library as YOLO v5.

YOLO V6:

The original intention behind this new version was to solve practical problems encountered in
industrial applications. MT-YOLOv6 is a target detection framework developed by the Visual
Intelligence Department of Meituan (a Chinese shopping platform).

Note that MT-YOLOv6 is not a part of the official YOLO series and has been called YOLOv6 as it was
inspired by one-stage YOLO algorithms and hence is being pushed as the next version of the YOLO
models by the authors.

It is a single-stage object detection framework mostly focused on industrial applications, with a
hardware-efficient design and better performance than YOLOv5 in both detection accuracy and
inference speed. According to its authors, this makes it the best open-source YOLO version for
production.

Performance:

 YOLOv6(nano) achieves 35.0 mAP (mean Average Precision) on the COCO dataset and 1242
FPS inference speed on T4 GPUs.
 YOLOv6(s) can reach 43.1 mAP, and its inference speed can reach up to 520 FPS on T4.

New Improvements:

Other than that, YOLOv6 has been enhanced by adding in improvements from previous YOLO releases,
more specifically by including —

 Anchor-free paradigm: a 51% improvement in speed compared to anchor-based detectors.


 SimOTA label assignment strategy: to dynamically allocate positive samples to further
improve the detection accuracy.
 SIoU bounding box regression loss: to supervise the learning of the network.

YOLO V7:

YOLOv7 was, at its release, the fastest and most accurate real-time object detection model for
computer vision tasks. The official YOLOv7 paper, “YOLOv7: Trainable bag-of-freebies sets new
state-of-the-art for real-time object detectors”, was released in July 2022 by Chien-Yao Wang,
Alexey Bochkovskiy, and Hong-Yuan Mark Liao.

The different basic YOLOv7 models include YOLOv7, YOLOv7-tiny, and YOLOv7-W6:

 YOLOv7 is the basic model that is optimized for ordinary GPU computing.
 YOLOv7-tiny is a basic model optimized for edge GPU. The suffix “tiny” of computer vision
models means that they are optimized for Edge AI and deep learning workloads, and more
lightweight to run ML on mobile computing devices or distributed edge servers and devices.
This model is important for distributed real-world computer vision applications. Compared to the
other versions, the edge-optimized YOLOv7-tiny uses leaky ReLU as the activation function,
while other models use SiLU as the activation function.
 YOLOv7-W6 is a basic model optimized for cloud GPU computing. Such Cloud Graphics Units
(GPUs) are computer instances for running applications to handle massive AI and deep learning
workloads in the cloud without requiring GPUs to be deployed on the local user device.

Other variations include YOLOv7-X, YOLOv7-E6, and YOLOv7-D6, which were obtained by
applying the proposed compound scaling method (see YOLOv7 architecture further below) to scale
up the depth and width of the entire model.

1.3. Motivation:

Vehicle detection methods have been under development for several years in academia and industry.
Even so, some state-of-the-art object detection methods cannot achieve competitive performance on
vehicle detection benchmarks. The main problem for vehicle detection with two-stage detectors
(R-CNN, Fast R-CNN, etc.) is that they do not provide real-time detection. One-stage detectors
provide real-time detection, but most of them have low detection precision and poor results for small
and dense objects. Here, we try to achieve vehicle detection using YOLO v7, which is faster and
provides real-time detection with better performance and accuracy compared to other one-stage
detectors.

1.4. Objectives:

• Detection of vehicles.
• Classification of different types of vehicles such as a car, bus, truck, motorcycle, auto, etc.
• Count the total number of vehicles.

1.5. Organisation of report:

• Chapter-2 contains a discussion about the existing literature survey.

• Chapter-3 provides an overview of methods that are used to carry
out implementation.
• Chapter-4 shows the results of various experiments performed for vehicle
detection and classification, vehicle counting from the image, vehicle tracking
and counting from the video sequence.
• Chapter-5 contains a conclusion and discussion about the future scope.

CHAPTER 2

LITERATURE REVIEW

2.1. Introduction:

Since its inception, the object detection field has grown significantly, and the state-of-the-art
architectures generalize well on various test datasets. But to understand the ideas behind the
current best architectures, it is necessary to know how it all started and how far the YOLO family
has come.
In the last few years, researchers have shown great interest in vehicle detection and classification,
and also in the area of tracking and counting. Many researchers have used deep learning algorithms
for vehicle detection and classification tasks. This chapter briefly discusses the related research work
carried out by different researchers in the areas of vehicle detection, classification, counting, and
tracking.

2.2. Literature Survey:

D. Mittal et al. (2018) proposed Faster RCNN for vehicle detection and classification. A large amount
of data is needed to train a deep learning network, so they augmented existing large-scale datasets
and used Faster RCNN for vehicle detection and classification. They performed testing using four
different approaches: first, they applied a pre-trained Faster RCNN model; second, they fine-tuned a
pre-trained model with their collected dataset; third, they trained the model from scratch using their
dataset; and finally, they trained the model using both the collected dataset and an existing dataset.
However, the collected dataset is quite different from the existing dataset, so they augmented the
Pascal VOC dataset with the IITM-HeTra dataset, which improves the performance of vehicle
detection.

B. Hicham et al. (2018) presented two methods: first, they implemented a data augmentation
technique to reduce the imbalanced-dataset problem, and second, they applied a CNN model to
perform vehicle detection and classification. The structure of the CNN comprises a series of
convolutional layers and fully connected layers: the convolutional layers perform feature extraction,
and the fully connected Artificial Neural Network (ANN) layers classify the objects. Precision,
Recall, and Accuracy metrics are used to measure the performance of the system.
M. V. et al. (2019) proposed the deep learning framework R-CNN for the classification of different
types of vehicles. A box-filter smoothing technique is implemented to reduce the noise in the image,
and background information is removed from the frame to detect the region of interest.

K. Shi et al. (2017) presented Fast RCNN for the purpose of vehicle detection. The authors
performed efficient vehicle detection using an incremental learning and pre-processing approach to
optimize the training process; training parameters are adjusted during training in order to achieve
the best state. The vehicle detection model comprises two stages: 1) a training stage and 2) a testing
stage. In the training stage, they re-trained the initial parameters of the CNN after pre-training on
ImageNet. The testing stage is used to test the input samples. CaffeNet, VGG_Cnn_M_1024, and
VGG-16 networks are used in the pre-training process. Fast RCNN is initialized with CaffeNet, but
an ROI pooling layer is used in place of the last pooling layer, two parallel layers are used instead of
a fully connected layer and softmax layer, and it takes two sets of data as input. A new dataset is
formed for incremental learning by adding the BUU-T2Y dataset to the KITTI dataset, and this
improves the vehicle detection performance.

C. N. Aishwarya et al. (2018) proposed the YOLO method for object detection. To achieve better
prediction accuracy they trained a neural network, Inception V3, with a dataset that comprises two
different types of images, static and dynamic. After that, they used image processing methods such
as the Sobel kernel and thresholding to identify license plates, and used the KNN classification
algorithm to determine the characters. The accuracy of Tiny YOLO was compared with YOLO;
compared to YOLO, Tiny YOLO is less accurate and classifies fewer objects.

B. Benjdira et al. (2018) examined the performance of Faster RCNN and YOLOv3 for the purpose
of car detection. They trained both Faster RCNN and YOLOv3 on a dataset and, after training,
evaluated both algorithms using the F1 score, precision, recall, quality, and processing time. Both
algorithms have high precision, meaning both are able to recognize cars, but in the case of recall,
YOLOv3 is better than Faster RCNN.

G. Prabhakar et al. (2017) proposed Faster RCNN to detect on-road objects. The ZF Net pre-trained
model is fine-tuned with the PASCAL VOC 2012 dataset to detect only 20 objects. Training is
available in two ways: Approximate Joint Training, which trains both the RPN and Fast R-CNN at
the same time, and Alternate Training, which first trains the RPN and then trains Fast R-CNN using
the generated region proposals. They performed training using Approximate Joint Training and
tested the results on a variety of datasets under different climate conditions. The performance of
vehicle detection is measured using Mean Average Precision (mAP). This method is able to detect
all on-road objects, but it sometimes fails to detect the auto class.

M. C. Olgun et al. (2018) presented Faster RCNN and a Haar Cascade Classifier. They performed
training using Faster RCNN with the ResNet Inception V2 feature extractor to detect the vehicles.
To detect traffic lights and stop signs they used a Haar Cascade Classifier. A CNN based on
NVIDIA's PilotNet is used for lane tracking; they trained the model using the CNN and applied
data augmentation to support a variety of lanes.

S. Srilekha et al. (2015) presented vehicle detection, tracking, and counting using a Kalman filter.
Moving-object detection using a background subtraction technique is difficult, so they used the
Kalman filter algorithm, and using this technique they achieve better object detection accuracy
compared to background subtraction.

P. K. Bhaskar and S.-P. Yong (2014) used a Gaussian Mixture Model (GMM) and blob detection
for vehicle recognition and tracking. Some morphological operations are applied to detect moving
objects and remove noise. The Gaussian Mixture Model is used to separate the foreground and
background of the image, and blob detection is used to determine the object movement in the frame.

M. Djalalov et al. (2010) perform object detection using median filtering and blob extraction and
track the vehicles using a Kalman filter. First, they apply the median filter, which results in a
background image. After that, background subtraction is applied to determine the moving objects in
each frame. Edge pixels are combined into objects by performing morphological closing operations,
and the locations of the vehicles are then tracked using the Kalman filter.

Table 2.1 Summary of the Literature Survey

1. "Training a Deep Learning Architecture for Vehicle Detection Using Limited Heterogeneous Traffic Data" [1]
Authors: Deepak Mittal, Avinash Reddy, Gitakrishnan Ramadurai, Kaushik Mitra, Balaraman Ravindran
Published in: 2017 2nd International Conferences on Information Technology, Information Systems and Electrical Engineering (ICITISEE)
Method used: Faster RCNN
Advantages: The Faster RCNN architecture is able to detect all types of vehicles.
Disadvantages: It was unable to detect the Auto-rickshaw class.
Remarks: AP values used for evaluating results. Datasets: PASCAL VOC, IITM-HeTra.

2. "Vehicle Type Classification Using Convolutional Neural Network" [7]
Authors: Bensedik Hicham, Azough Ahmed, Meknassi Mohammed
Published in: 2018 IEEE
Method used: Data augmentation, CNN
Advantages: CNNs have demonstrated better performance in image classification.
Disadvantages: -
Remarks: Their methods provide good results for precision and recall. Dataset: own dataset.

3. "A Deep Learning RCNN Approach for Vehicle Recognition in Traffic Surveillance System" [8]
Authors: Murugan V., Vijaykumar V. R., Nidhila A.
Published in: International Conference on Communication and Signal Processing, April 4-6, 2019, India
Method used: SVM, RCNN
Advantages: RCNN is used to overcome degradation factors such as the complexity in computation and the time required by CNN.
Disadvantages: -
Remarks: Recognition accuracy using RCNN is 91.3%.

4. "Forward Vehicle Detection Based on Incremental Learning and Fast R-CNN" [10]
Authors: Kaijing Shi, Hong Bao, Nan Ma
Published in: 2017 IEEE
Method used: Fast R-CNN
Advantages: The proposed method solved the problems related to missed vehicle detections and improves the testing accuracy of the vehicle.
Disadvantages: The model was not an end-to-end network structure compared to the other models, and it was not real-time.
Remarks: The experimental result can reach 86.2%. Datasets: KITTI, BUU-T2Y.

5. "Multilayer vehicle classification integrated with single frame optimized object detection framework using CNN based deep learning architecture" [11]
Authors: Chaya N Aishwarya, Rajshekhar Mukherjee, Dharmendra Kumar Mahato
Published in: 2018 IEEE
Method used: YOLO
Advantages: The algorithm uses a faster and more efficient object detection system, YOLO, which has significant advantages over SURF as it uses a neural network instead of the conventional sliding-window technique.
Disadvantages: The object detection system was implemented on a Raspberry Pi; it could be expanded to a GPU-enabled platform to create a faster and portable real-time object detection system.
Remarks: A Sedan detected with 93% accuracy, and cars as SUV and Small Car with accuracies of 98% and 74% respectively. Dataset: COCO.

6. "Car Detection using Unmanned Aerial Vehicles: Comparison between Faster R-CNN and YOLOv3" [12]
Authors: Bilel Benjdira, Taha Khursheed, Anis Koubaa, Adel Ammar, Kais Ouni
Published in: Proceedings of the 1st International Conference on Unmanned Vehicle Systems (UVS), Muscat, Oman, 5-7 February 2019
Method used: Faster R-CNN and YOLOv3
Advantages: YOLOv3 is more capable of extracting all the cars in the image, with 99.07% accuracy.
Disadvantages: It classifies the car object only; it cannot classify other vehicles such as bicycles, motorcycles, buses and trucks.
Remarks: Performance evaluated based on precision, recall, F1 score, quality and processing time. Dataset: UAV imagery dataset.

7. "Obstacle Detection and Classification using Deep Learning for Tracking in High-Speed Autonomous Driving" [13]
Authors: Gowdham Prabhakar, Binsu Kailath, Sudha Natarajan, Rajesh Kumar
Published in: 2017 IEEE Region 10 Symposium (TENSYMP)
Method used: Faster RCNN
Advantages: The deep learning network is found robust to variation in object view, lighting and climatic conditions.
Disadvantages: It sometimes fails to detect the auto class, which is commonly seen on Indian roads but not in the PASCAL training dataset.
Remarks: Mean Average Precision is 71.7% for the KITTI driving video and 90.5% for the Chennai road video. Datasets: PASCAL VOC 2012, KITTI and iRoads.

8. "Autonomous Vehicle Control for Lane and Vehicle Tracking by Using Deep Learning via Vision" [14]
Authors: Masum Celil Olgun, Zakir Baytar, Kadir Metin Akpolat, Ozgur Koray Sahingoz
Published in: 2018 6th International Conference on Control Engineering & Information Technology (CEIT)
Method used: Faster RCNN
Advantages: Faster RCNN gives good performance for tracking the vehicles.
Disadvantages: An autonomous vehicle would need more advanced artificial intelligence algorithms to take proper action by combining the outputs of the model(s) and sensors to handle real-life traffic complications.
Remarks: The dataset for vehicle tracking was collected manually.

9. "A Multiple Vehicle Tracking and Counting Method and its Realization on an Embedded System with a Surveillance Camera" [18]
Authors: Yi-Hsuan Hsu, Ssu-Yuan Chang, Jiun-In Guo
Published in: 2018 IEEE International Conference on Consumer Electronics - Taiwan (ICCE-TW)
Method used: Foreground extraction, data association, Normalized Cross-Correlation (NCC), Kalman filter, PVA-Lite
Advantages: Detects vehicles; the Kalman filter is used to predict the position of a lost tracker.
Disadvantages: -
Remarks: Counting accuracy can be more than 95%.

10. "A Novel Multi-source Vehicle Detection Algorithm based on Deep Learning" [19]
Authors: Yong He, Liangqun Li
Published in: 2018 IEEE
Method used: YOLOv2, coordinate space conversion, data fusion
Advantages: The integration of video detections and radar information improves the performance of vehicle detection.
Disadvantages: Weather conditions such as heavy fog or rain affect the video information.
Remarks: 96% accuracy at 38 FPS. Dataset: BrnoCompSpeed.

2.3. Conclusion:

The literature survey has shown that various approaches have been used for vehicle detection,
including image processing techniques, deep learning algorithms, background subtraction,
and object tracking. The use of these techniques has resulted in improved accuracy and
efficiency in vehicle detection systems. Despite the advancements in the field, there are still
challenges that need to be addressed. For example, vehicle detection algorithms must be
robust to varying lighting conditions and vehicle types, and should be able to accurately
detect and track vehicles in complex and crowded environments. In addition, counting the
number of vehicles can be used to develop efficient systems for traffic monitoring,
surveillance, and autonomous vehicles.

CHAPTER 3

METHODOLOGY

3.1. Introduction:

Image classification determines which objects are present in an image, such as a car or a
bicycle, while image localization provides the specific location of these objects using
bounding boxes. In order to classify the images, the convolutional neural network has to
recognize different objects, such as a car, bus and motorcycle. Hence image classification
together with localization can be defined as object detection:
Object detection = Image classification + Image localization
The workflow has three parts: the first step is gathering the training data, the second is
training the model, and the final one is prediction on new images.

Fig 3.1 Workflow of vehicle detection.

3.2. Methodology:

 Gather Training Data:

 For this task, a camera is used to capture data as close as possible to the data that
should finally be predicted. The dataset collection has 8219 images. After the images
are captured, the obtained set of images is resized and ground-truth labelling is
generated with the locations and labels of the objects of interest. However, this
process is a fairly intensive and time-consuming task.

 Image Annotation:

 In machine learning (ML) and deep learning (DL), image annotation is the technique
of labelling or categorizing an image using annotation text, software tools, or both, to
specify the data features we want our ML/DL model to identify on its own. When we
conduct image annotation, we are basically adding metadata to a dataset to specify the
ground truth. Put simply, image annotation is a kind of data labelling, often referred to
as tagging, processing, or transcribing. The method applies to both image and video
annotation: just like a set of images, videos can also be annotated continuously, like an
image feed, or frame by frame. Here, we have used labelImg in Python for image
annotation; an example of the resulting label format is shown after Fig 3.2.

Fig 3.2 Image annotation process.
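For reference, when labelImg is set to YOLO format it writes one .txt file per image, with one line
per annotated object. A hypothetical line for a car roughly in the middle of a frame might look like:

    2 0.513 0.462 0.241 0.187

The fields are the class index (pointing into a classes.txt file that lists our eight classes), followed by
the box centre x and y and the box width and height, all normalized by the image dimensions; the
numbers above are illustrative.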

 Dataset Splitting:

 Splitting of the dataset into train and test sets (ratio 80:20) using Python data
structures and modules like numpy and pandas; a sketch is shown below.
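A minimal sketch of this 80:20 split, assuming all annotated images sit in a single folder (paths and
file names are illustrative):

    import os
    import pandas as pd

    # List every annotated image, shuffle reproducibly, and split 80:20
    files = pd.Series(sorted(f for f in os.listdir("data/images") if f.endswith(".jpg")))
    shuffled = files.sample(frac=1.0, random_state=42)

    split = int(0.8 * len(shuffled))
    train_files, test_files = shuffled.iloc[:split], shuffled.iloc[split:]

    # YOLO-style training expects plain text lists of image paths
    train_files.to_csv("train.txt", index=False, header=False)
    test_files.to_csv("test.txt", index=False, header=False)
    print(len(train_files), "train /", len(test_files), "test")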

 Training Our Model:

 Trained our model with the training data and pre-trained YOLOv7 weights on a
Google Colab virtual GPU, and obtained our custom-trained model along with
analytical results; a representative training command is sketched below.
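In the Colab notebook this boils down to cloning the public YOLOv7 repository and invoking its
train.py; a command along the following lines was used, where the dataset YAML, image size and
run name are illustrative of our setup rather than fixed values:

    !git clone https://github.com/WongKinYiu/yolov7
    %cd yolov7
    !pip install -r requirements.txt
    !python train.py --weights yolov7.pt --data data/custom.yaml \
        --epochs 60 --batch-size 16 --img-size 640 640 --device 0 --name vehicle_custom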

 Conversion Of The Obtained Model:

 Converted the obtained model (best.pt) into an OpenCV-supported model
(best.onnx); the export command is sketched below.
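The YOLOv7 repository ships an export script for this conversion; a command of roughly this shape
turns best.pt into best.onnx (the weight path and image size are illustrative):

    !python export.py --weights runs/train/vehicle_custom/weights/best.pt \
        --grid --simplify --img-size 640 640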

 Building Of Python Code:

 Built a Python script (yolo_predictions.py) to showcase the predictions with
OpenCV visuals; a condensed sketch follows.
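A condensed, illustrative sketch of what such a script does with OpenCV's DNN module is shown
below; the 640x640 input size matches the export step above, the 0.4/0.45 thresholds are example
values, and the post-processing is abbreviated compared to the full script.

    import cv2
    import numpy as np

    net = cv2.dnn.readNetFromONNX("best.onnx")   # our exported model
    image = cv2.imread("test.jpg")
    h, w = image.shape[:2]

    # Scale pixels to [0, 1], resize to the network input, swap BGR -> RGB
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (640, 640), swapRB=True, crop=False)
    net.setInput(blob)
    preds = net.forward()[0]   # one row per candidate: cx, cy, w, h, conf, class scores

    boxes, scores, class_ids = [], [], []
    for row in preds:
        if row[4] < 0.4:       # discard low-confidence candidates
            continue
        cx, cy, bw, bh = row[:4]
        # Map the box from 640x640 network space back to image pixels
        boxes.append([int((cx - bw / 2) * w / 640), int((cy - bh / 2) * h / 640),
                      int(bw * w / 640), int(bh * h / 640)])
        scores.append(float(row[4]))
        class_ids.append(int(np.argmax(row[5:])))

    # OpenCV's built-in non-maximum suppression removes duplicate boxes
    for i in np.array(cv2.dnn.NMSBoxes(boxes, scores, 0.4, 0.45)).flatten():
        x, y, bw, bh = boxes[i]
        cv2.rectangle(image, (x, y), (x + bw, y + bh), (0, 255, 0), 2)
    cv2.imwrite("result.jpg", image)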

 Testing Our Model:

 Tested and observed our model using Python and OpenCV.

 Web Application:

 Built a web application using Streamlit to test our model on image data and
predict objects with their counts; a trimmed sketch is shown below.
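The web application itself reduces to a few Streamlit calls. The sketch below assumes a YOLO_Pred
helper class in yolo_predictions.py that returns an annotated image and a per-class count dictionary;
that class name and its methods are our own illustrative interface, not a library API.

    import cv2
    import numpy as np
    import streamlit as st
    from yolo_predictions import YOLO_Pred   # our own helper (illustrative interface)

    st.title("Vehicle Detection and Counting")
    model = YOLO_Pred("best.onnx", "data.yaml")

    uploaded = st.file_uploader("Upload a traffic image", type=["jpg", "jpeg", "png"])
    if uploaded is not None:
        # Decode the uploaded bytes into an OpenCV BGR image
        image = cv2.imdecode(np.frombuffer(uploaded.read(), np.uint8), cv2.IMREAD_COLOR)
        annotated, counts = model.predictions(image)   # boxes drawn + class counts
        st.image(cv2.cvtColor(annotated, cv2.COLOR_BGR2RGB))
        st.write("Vehicle counts:", counts)
        st.write("Total vehicles:", sum(counts.values()))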

3.3. Software Tools Used:

Visual Studio Code:

Visual Studio Code (VS Code) is a free, open-source, cross-platform code editor developed
by Microsoft. It supports multiple programming languages, including but not limited to
JavaScript, Python, C++, and Java, and provides features like:

 IntelliSense: context-aware code completion and suggestions.


 Debugging: integrated debugging support for Node.js, Go, and C++.
 Git integration: source control management through Git and GitHub.
 Code snippets: code templates to improve your productivity.
 Extensions: a rich ecosystem of extensions to add additional functionality and support
for different languages and tools.

VS Code is widely used by developers due to its versatility, extensibility, and ease of use.

Google Colab: Google Colab is a free, cloud-based platform for machine learning and
data science research. It provides access to powerful computing resources, including GPUs
and TPUs, and allows users to write and execute code in Jupyter notebooks.

Some of the key features of Google Colab include:

 Free access to GPUs and TPUs for training machine learning models.
 Integration with Google Drive, allowing you to save and share your work with others.
 Easy-to-use Jupyter notebook environment, with pre-installed popular libraries
such as TensorFlow, PyTorch, and OpenCV.
 Code sharing and collaboration, allowing multiple users to work on the same
notebook at the same time.
 Integration with Google's cloud services, such as BigQuery and Google Cloud
Storage.

Python and Its Libraries: Python is a high-level, interpreted, general-purpose


programming language. It was first released in 1991 and has since become one of the most
widely used programming languages in the world, especially for data science, machine
learning, web development, and scientific computing.
There are several Python libraries that can be used for vehicle detection, including:

OpenCV: OpenCV (Open Source Computer Vision) is an open-source computer vision and
machine learning library. It provides a wide range of tools and algorithms for image
processing, including vehicle detection.

YOLO (You Only Look Once): YOLO is a real-time object detection system that is
capable of detecting vehicles. Python implementations are available, and it can be used for
vehicle detection with appropriately trained weights.

Yolo V7: YOLOv7 (You Only Look Once version 7) is a state-of-the-art object detection
system for real-time applications. It is an improvement over the earlier versions of YOLO,
including YOLOv3, and is designed to be faster, more accurate, and easier to use. YOLOv7
is a convolutional neural network-based system that is trained to detect objects in an image
by dividing the image into a grid of cells and predicting the bounding boxes and class
probabilities for each cell. Among other refinements, YOLOv7 uses multiple scales and
aspect ratios to better detect objects of different sizes and shapes, making it more accurate
and robust compared to previous versions. The official YOLOv7 implementation is built on
PyTorch, making it easy to integrate into existing Python applications. It can be used for a wide
range of object detection tasks, including vehicle detection, pedestrian detection, and face
detection, among others. YOLOv7 has been well received in the computer vision and
machine learning communities, and is widely used for real-time object detection in a variety
of applications, including autonomous vehicles, surveillance systems, and computer vision
research.
YOLOv7 is considered to be an improvement over YOLOv6, although the exact extent of the
improvement will depend on the specific use case and evaluation metrics. Some of the ways
in which YOLOv7 is thought to be better than YOLOv6 include:

Improved accuracy: YOLOv7 is designed to be more accurate than YOLOv6, particularly in


detecting smaller objects and in handling scale variations.

Faster inference: YOLOv7 is optimized for real-time performance, with faster inference times
compared to YOLOv6. This makes it suitable for use in real-time applications, such as
autonomous vehicles and surveillance systems.

More flexible architecture: YOLOv7 is designed to be more flexible and modular, making it
easier to fine-tune for specific use cases and to integrate into existing applications.

It is worth noting that the improvements in YOLOv7 come at the cost of increased
complexity, with a larger model size and increased computational requirements compared
to YOLOv6. The choice between YOLOv6 and YOLOv7 will depend on the specific
requirements and resources of each application.

Streamlit: Streamlit is an open-source Python library for building data-driven web
applications. It provides a simple, intuitive way to build beautiful, dynamic, and
interactive applications with minimal effort. Some of the key features of Streamlit include:

 Easy-to-use: Streamlit provides a simple and intuitive syntax for building web
applications, making it easy to get started even for those with little web development
experience.
 Dynamic UI: Streamlit's dynamic UI allows you to build responsive and
interactive applications, with live updates as data changes.
 Data visualization: Streamlit includes built-in support for a wide range of data
visualization libraries, including Matplotlib, Plotly, and Bokeh, making it easy
to build beautiful and informative visualizations.
 Reactive programming: Streamlit uses reactive programming to automatically
update the UI when the underlying data changes, without the need to manually
manage updates.
 Extensibility: Streamlit is highly extensible, allowing you to integrate custom Python
code and libraries, and use its API to build custom components and interactions.

Streamlit is widely used for building data-driven applications, including machine learning
models, dashboards, data visualization tools, and more. Its ease of use and powerful features
make it a popular choice for both data scientists and web developers.

3.4. Conclusion:

The steps for the implementation of our model, along with the software tools used, have
been studied and discussed in brief.

CHAPTER 4

RESULT ANALYSIS

4.1. Introduction:

In this chapter, we will discuss the evaluation of the analytical results and graphs we
obtained after training our YOLO model with our custom dataset. We will understand how
to read those graphs and come to a conclusion about the quality of the model produced by
training.
After reading the results and observing our trained model, we will also test the model with
the test data (the remaining 20% of the image dataset) and evaluate its performance.

4.2. Result Analysis:

Post Training Analysis:

We trained our model on a Google Colab virtual GPU for 60 epochs with a batch size of 16
on around 6574 training images, and obtained the confusion matrix below as output.

Figure 4.1

From figure 4.1, we can see that out of the 8 object classes used to train our model, car,
motorcycle and tractor are trained the best, with True Positive scores of 0.92, 0.88 and 0.92
respectively, while the LCV class is trained the worst, with a True Positive score of 0.69
(which is nevertheless a reasonably good score in general).

Besides the confusion matrix, we also have precision, recall and other graphs for analysis.
Let’s have a look at them too.

Figure 4.2

From the above graphs (fig 4.2) we can say that:

1) The box loss, objectness loss and classification loss are all converging down towards
zero, which is a good sign.
2) Precision and recall per epoch increase roughly logarithmically up to about 0.8-0.85
and then level off.
3) Mean Average Precision also increases smoothly up to about 0.85 and then levels off.

Thus, we can say that training our model for 60 epochs has done a good job of achieving
these results, though increasing the number of epochs might have further improved the
output model.

We also have some other resulting graphs for precision, recall and confidence to evaluate
the training result of our YOLO model. Let’s also have a look at them.

Figure 4.3: From the graph, we can see that at a confidence score of 1.0 we have an
overall precision of 0.953 (95.3%) for all classes, which is a good result.

Figure 4.4: From the above graph, we can observe that the mean average precision across
all classes is 0.852, obtained from the Precision x Recall curve.

Figure 4.5: From the above graph, we can see that at a confidence score of 0.0 we have an
overall precision of 0.99 (99%) for all classes, which is a very good result.

Post Testing Analysis:

Figure 4.6

Comparing the confusion matrices obtained after training and testing (fig 4.1 and 4.6),
we can see the following differences:

True Positive Values

Class Name        Post Training    Post Testing
1. Auto           0.82             0.82
2. Bus            0.83             0.83
3. Car            0.92             0.93
4. LCV            0.69             0.66
5. Motorcycle     0.88             0.88
6. Multi-axle     0.76             0.73
7. Tractor        0.92             0.87
8. Truck          0.75             0.74

So, we can see that Car has the highest True Positive value in both the training and testing
results, and LCV has the lowest in both.

4.3. Significance of Result Obtained:

From the above analysis of the output results, we can conclude that our model is
performing very well on the datasets provided. Five out of eight classes (Auto, Car, Bus,
Motorcycle and Tractor) perform extremely well, while the other three classes (LCV,
Multi-axle and Truck) are trained and predicted slightly below our expectations.
The reasons behind this behaviour could be bad annotations during image labelling or,
more likely, that Multi-axle, Truck and LCV vehicles are treated as the same class in some
rare scenarios, since in our Indian dataset these three classes look quite similar.
Overall, our model performs considerably well in terms of accuracy, precision and
confidence scores.

4.4. Test Sample:

We have built an application to test our model using Python and Streamlit, with the
backend configured with Python code for YOLO predictions using our model. This helps
us test the model in a user-friendly way. Below is a sample taken using it:

Figure 4.7

4.5. Conclusion of Result Analysis:

The performance of our model output has been studied in detail, and the significant
behaviour of the model has been noted and examined.

CHAPTER 5

CONCLUSION AND FUTURE SCOPE OF WORK

The use of custom data in training the vehicle detection model has proven to be vital in
achieving high detection rates. The model was fine-tuned to the specific characteristics of the
vehicles in the dataset, resulting in a higher detection rate compared to using pre-trained
weights alone. The custom data also enabled the model to adapt to the specific scenarios in
which it will be used, making it more robust and effective in detecting vehicles. The results of
this study have shown that the vehicle detection model using YOLO and custom data is
capable of accurately detecting and classifying vehicles in video footage. The model's high
detection rate and fast processing time make it suitable for practical use in real-world
applications such as traffic monitoring and surveillance. Furthermore, the model has the
potential to be further improved and adapted to different scenarios.

Traffic Monitoring:

Traffic monitoring using vehicle detection techniques is a widely applicable use of computer
vision technology. With the help of our model we can analyze images or videos from cameras
mounted on roadways to identify and track vehicles. The information gathered by the model
can be used to generate traffic flow data such as traffic volume, speed, and density; a sketch
of a simple counting scheme is given below. Overall, traffic monitoring using computer vision
can act as a crucial tool for optimizing traffic flow, improving road safety, and reducing
congestion. The use of advanced algorithms and computer vision techniques is expected to
improve the accuracy and efficiency of traffic monitoring systems in the future.
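As an illustration of how per-frame detections could be turned into flow counts, the hypothetical
counter below increments whenever a tracked box centre crosses a virtual line; the detect function
stands in for our YOLO model plus a simple tracker that assigns stable ids to boxes across frames.

    def count_line_crossings(frames, detect, line_y=300):
        # detect(frame) is assumed to return {track_id: (x, y, w, h)} for that frame
        last_cy = {}   # track id -> box-centre y in the previous frame
        total = 0
        for frame in frames:
            for track_id, (x, y, w, h) in detect(frame).items():
                cy = y + h / 2
                prev = last_cy.get(track_id)
                # A crossing: the centre moves from above the line to on/below it
                if prev is not None and prev < line_y <= cy:
                    total += 1
                last_cy[track_id] = cy
        return total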

Surveillance Systems:

The model can further be used as a component of surveillance systems, such as those used for
security, parking control and law enforcement. The system can use cameras or other imaging
devices to capture images or video feeds, which are then processed to detect and track
vehicles.

 Security: Vehicle detection algorithms can be used to identify vehicles entering or


leaving restricted areas, such as airports, military bases, or high-security facilities.

 Parking control: Vehicle detection algorithms can be used to monitor parking
spaces, enforce parking regulations, and direct drivers to available spaces.

 Law enforcement: Vehicle detection algorithms can be used by police to monitor


roadways and enforce traffic laws, such as speed limits and red-light violations.

However, as with any model, it is important to note that the performance of the model is
dependent on the quality of the data used for training. The model's generalization ability can
be improved by using a larger and more diverse dataset, which will help it perform better in
various scenarios. Additionally, further testing and evaluations are needed to ensure the
model's robustness and effectiveness in real-world applications.

In conclusion, the implementation of a vehicle detection and counting model using YOLO
and custom data has been shown to be a promising approach for accurately detecting,
classifying and counting vehicles in video footage. The model's potential for practical use in
real-world applications, combined with its efficiency, makes it a valuable tool for various
industries. This
study highlights the importance of using custom data and fine-tuning models to specific
scenarios, and further research is needed to optimize the model's performance and adapt it to
different scenarios.

REFERENCES

Journal / Conference Papers


[1] Name 1 and Name 2, “Paper Title”, Full Journal Name, volume no, publication year,
page numbers
[2] Name 1 and Name 2, “Paper Title”, Proceedings of the International / National
Conference on , Institution, Country, Date, page numbers

Reference / Hand Books


[1] Name 1, “Book Title”, Publication Name, Edition, ISBN number

Web
[1] Topic 1, website name (do not include long URL’s)

