Professional Documents
Culture Documents
Final Project Report
Final Project Report
Of
in
By
(2020-2024)
P age |2
ACKNOWLEDGEMENTS
First of all, we would like to thank the almighty whe showered his immense
blessings on us. Our heartfelt full thanks to Prof (Dr.) Swarup Kumar Mitra,
Principal of Guru Nanak Institute of Technology Kolkata for giving us the
accessory environment to acquire knowledge and skill. Our heartfelt thanks to
Dr. Suman Bhattacharya Sir, HOD, Computer Science and Engineering, for
gluing us the opportunity to do this project. I would like to express my special
thanks of gratitude to our Professor Mr. Rafiqul Islam Sir for his able
guidance and support in completing our project on the topic "Object
Detection Technique using OpenCV".
----------------------------------
Arnab Ghorui
----------------------------------
Suvadip Paul
----------------------------------
Subhadeep Mondal
----------------------------------
Rudranil Sen
----------------------------------
Krishna Das
----------------------------------
Shubhudeep Mandal
Date:
Place:
P age |3
CERTIFICATE
The contents of this thesis, in full or part, have not been submitted to any
other university or institution for the award of any degree or diploma.
Supervisor
______________
----------------------------------
Dr. Suman Bhattacharya
Head of the Department
Department of Computer Science and Engineering,
GNIT, Kolkata
P age |4
Contents
c
Chapter no Content Page No.
1. Abstract 05
2. Introduction 05
3. Objectives 06
4. Problem Statement 06
4.1 Applications 07
4.2 Challenges 08
5. Solution Statement 10
6. Required Resources 10
7. Pre-Processing 10
8. Classification and Localization 11
1. ABSTRACT :
Efficient and accurate object detection has been an important topic in the advancement of
computer vision systems. With the advent of deep learning techniques, the accuracy for
object detection has increased drastically. The project aims to incorporate state-of-the-art
technique for object detection with the goal of achieving high accuracy with a real-time
performance.
2. INTRODUCTION :
Object detection is a well-known computer technology connected with computer vision
and image processing. With the advent of deep learning techniques, the accuracy for
object detection has increased drastically. It focuses on detecting objects or its instances
of a certain class (such as humans, flowers, animals) in digital images and videos. There
are various applications including face detection, character recognition, and vehicle
calculator.
Object detection is a pivotal task in computer vision, identifying and locating objects
within images or video frames. Traditional methods employ handcrafted features and
classifiers, like SVMs, while modern approaches leverage deep learning, particularly
Convolutional Neural Networks (CNNs). CNN-based architectures, such as R-CNN,
YOLO, and SSD, have revolutionized object detection by directly learning hierarchical
features from data. One-stage detectors like YOLO and SSD predict bounding boxes and
class probabilities directly, while two-stage detectors like Faster R-CNN first propose
regions and then classify them. Evaluation metrics like Intersection over Union (IoU) and
Mean Average Precision (mAP) gauge the accuracy and completeness of detection.
Challenges include scale variation, occlusion, cluttered backgrounds, and the need for
real-time processing. Object detection techniques continually evolve to meet the demands
P age |6
3. OBJECTIVES :
Develop a application that detects an object and it can be used for vehicles counting,
when the object is a vehicle such as a bicycle or car, it can count how many vehicles have
passed from a particular area or road and it can recognize human activity too.
7. Scalability: Being able to handle large datasets and diverse types of objects.
4. PROBLEM STATEMENT :
Humans can easily detect and identify objects present in an image but for the computer or
machine a classifying and finding an unknown number of individual objects within an
image is extremely a difficult problem.
Although there exist object detection software and application they do not give an
accurate result of an object because despite a lot of research, real-time, and dynamic
object detection methods are still in process.
P age |7
The problem statement in object detection techniques involves accurately identifying and
localizing objects within images or video frames, addressing challenges such as scale
variation, occlusion, cluttered backgrounds, and the need for real-time processing to
enable effective applications across diverse domains.
4.1 APPLICATIONS :
A well known application of object detection is face detection, that is used in almost all
the mobile cameras. A more generalized (multi-class) application can be used in
autonomous driving where a variety of objects need to be detected. Also it has a
important role to play in surveillance systems. These systems can be integrated with other
tasks such as pose estimation where the rst stage in the pipeline is to detect the object,
and then the second stage will be to estimate pose in the detected region. It can be used
for tracking objects and thus can be used in robotics and medical applications. Thus this
problem serves a multitude of applications:-
3. Retail Analytics: In retail, object detection helps track product movements, analyze
customer behavior, and manage inventory more efficiently.
4. Medical Imaging: Object detection assists in medical image analysis for identifying
and localizing abnormalities in X-rays, MRI scans, and other medical images.
7. Natural Disaster Management: Object detection techniques can identify and track
the movement of natural phenomena such as hurricanes, tornadoes, and wildfires, aiding
in disaster preparedness and response efforts.
10. Sports Analytics: Object detection is used in sports to track player movements,
analyze game tactics, and enhance training programs.
4.2 CHALLENGES :
The major challenge in this problem is that of the variable dimension of the output which
is caused due to the variable number of objects that can be present in any given input
image. Any general machine learning task requires a xed dimension of input and output
for the model to be trained. Another important obstacle for widespread adoption of object
detection systems is the requirement of real-time (>30fps) while being accurate in
detection. The more complex the model is, the more time it requires for inference; and
the less complex the model is, the less is the accuracy. This trade-o between accuracy and
performance needs to be chosen as per the application. The problem involves classi
cation as well as regression, leading the model to be learnt simultaneously. This adds to
the complexity of the problem.
P age |9
2. Scale and Size Variability: Objects can appear at different scales and sizes within
an image, requiring object detection algorithms to be capable of detecting objects
regardless of their size.
4. Limited Training Data: Object detection models typically require large amounts of
annotated training data to learn effectively. However, obtaining high-quality labeled
datasets can be time-consuming and expensive, especially for specialized domains or rare
objects.
5. SOLUTION STATEMENT :
To classify image objects and also to determine the objects positions we use two different
methods object classification and localization in the same time.
The true purpose of this project is that it can be used in security, surveillance and
autonomous vehicle driving to detect pedestrians walking or jogging on the street to
avoid accidents.
We use object classification and localization it provides accuracy, speed for real time
detection and also improves detection tasks are optimized using one multi-task function
and an object is compared to the image's true objects.
6. REQUIRED RESOURCES :
An integrated development environment (IDE) Microsoft Visual Studio code.
Open CV.
Tensorflow.
YOLO Framework.
7. PRE-PROCESSING :
Preprocessing in object detection involves preparing input images before feeding them
into the model for detection. This typically includes several steps. First, resizing the
image to a fixed size to ensure uniformity across inputs. Next, normalization to
standardize pixel values, often by subtracting the mean and dividing by the standard
deviation. Then, data augmentation techniques like rotation, scaling, or flipping to
increase the diversity of the training dataset and improve model generalization.
P a g e | 11
Additionally, some models require converting images to a specific color space, such as
RGB or BGR. Finally, padding images to ensure they fit the input dimensions required by
the model. Overall, preprocessing enhances the model's ability to detect objects
accurately by providing clean, standardized inputs for training and inference.
Localization, on the other hand, involves determining the precise location of objects
within an image by predicting bounding boxes that tightly enclose them. This task
requires regression-based models that predict the coordinates of bounding box corners
relative to the image frame.
Combining classification and localization, object detection models not only classify
objects but also provide spatial information about their locations in the image.
Techniques like region-based convolutional neural networks (R-CNN), You Only Look
Once (YOLO), and Single Shot Multibox Detector (SSD) integrate both tasks to
accurately detect and localize objects in images with varying scales and aspect ratios.
The bounding box is predicted using regression and the class within the bounding box is
predicted using classification. The overview of the architecture is shown in Fig.
P a g e | 12
In the second stage, features are extracted from each proposal and fed into a classifier to
determine the presence of objects and assign class labels. Additionally, bounding box
regression is applied to refine the coordinates of the bounding boxes, improving
localization accuracy.
Common examples of two-stage methods include Faster R-CNN and its variants. While
effective, two-stage methods tend to be slower due to the sequential nature of proposal
generation and classification. However, they often achieve higher detection accuracy
compared to single-stage methods, making them suitable for applications requiring
precise object localization.
In this case, the proposals are extracted using some other computer vision technique and
then resized to xed input for the classification network, which acts as a feature extractor.
Then an SVM is trained to classify between object and background (one SVM for each
class). Also a bounding box regressor is trained that outputs some correction (o sets) for
P a g e | 13
proposal boxes. The overall idea is shown in Fig. These methods are very accurate but
are computationally intensive (low fps).
P a g e | 14
The difference here is that instead of producing proposals, pre-de ne a set of boxes to
look for objects. Using convolutional feature maps from later layers of the network, run
another network over these feature maps to predict class scores and bounding box o sets.
The broad idea is depicted in Fig. The steps are mentioned below:
2. Gather activation from later layers to infer classification and location with a fully
connected or convolutional layers.
3. During training, use jaccard distance to relate predictions with the ground truth.
4. During inference, use non-maxima suppression to lter multiple boxes around the
same object.
P a g e | 15
Evaluation metrics such as precision, recall, and mean Average Precision (mAP)
assess the model's accuracy. Additionally, considerations for real-world deployment,
including hardware constraints and computational efficiency, are vital for practical
applications. Continuous refinement through iterative model updates and adaptation to
evolving datasets and requirements further enhance the technique's effectiveness across
various scenarios. Overall, a well-designed and implemented object detection technique
integrates both theoretical considerations and practical constraints to achieve accurate and
efficient detection results.
P a g e | 16
Results are evaluated using metrics like precision, recall, and mean Average Precision
(mAP) to assess detection accuracy. Visualization techniques, such as bounding box
overlays on images or video frames, provide qualitative insights into model performance.
Furthermore, benchmarking against state-of-the-art methods and comparison with ground
truth annotations validate the technique's effectiveness.
The project is implemented in python 3. Tensor ow was used for training the deep
network and OpenCV was used for image pre-processing. The system specifications on
which the model is trained and evaluated are mentioned as follows: CPU - Intel Core i7-
7700 3.60 GHz, RAM - 32 Gb, GPU - Nvidia Titan Xp.
P a g e | 19
11. CONCLUSION :
Object detection techniques have revolutionized computer vision, enabling machines to
perceive and understand the visual world. Through the amalgamation of deep learning
algorithms, specifically convolutional neural networks (CNNs), and sophisticated
architectures like R-CNN, Fast R-CNN, and YOLO, object detection has achieved
remarkable accuracy and speed.
These techniques have myriad applications, from surveillance systems and autonomous
vehicles to medical imaging and augmented reality. However, challenges persist, such as
detecting small objects, handling occlusions, and maintaining real-time performance on
resource-constrained devices.
Furthermore, the integration of object detection with other tasks like instance
segmentation and multi-object tracking is an exciting area of exploration, promising even
richer understanding of visual scenes.
In conclusion, object detection techniques have evolved into powerful tools, driving
innovation across industries. With ongoing research and development, the future holds
the promise of even more accurate, efficient, and versatile object detection systems.