Object Detection Technique using OpenCV


Report Submitted in fulfillment of the Requirement for the degree

Of

Bachelor of Technology (B.Tech)

in

Department of Computer Science and Engineering

By

Arnab Ghorui. (Roll. 500120010067, Reg. 201430100110106)

Subhadeep Mondal. (Roll. 500120010066, Reg. 201430100110105)

Suvadip Paul. (Roll. 500120010063, Reg. 201430100110159)

Krishna Das. (Roll. 500120010117, Reg. 201430100110155)

Shubhadeep Mandal. (Roll. 500120010119, Reg. 201430100110175)

Rudranil Sen. (Roll. 500120010088, Reg. 201430100110023)

Under the guidance of

Mr. Rafiqul Islam

Assistant Professor, Computer Science & Engineering Department

Guru Nanak Institute of Technology, Kolkata-700114

(2020-2024)

ACKNOWLEDGEMENTS

First of all, we would like to thank the Almighty, who showered his immense
blessings on us. Our heartfelt thanks go to Prof. (Dr.) Swarup Kumar Mitra,
Principal of Guru Nanak Institute of Technology, Kolkata, for giving us the
necessary environment to acquire knowledge and skill. Our heartfelt thanks also go to
Dr. Suman Bhattacharya Sir, HOD, Computer Science and Engineering, for
giving us the opportunity to do this project. We would like to express our special
gratitude to our Professor, Mr. Rafiqul Islam Sir, for his able
guidance and support in completing our project on the topic "Object
Detection Technique using OpenCV".

----------------------------------
Arnab Ghorui
----------------------------------
Suvadip Paul

----------------------------------
Subhadeep Mondal

----------------------------------
Rudranil Sen
----------------------------------
Krishna Das
----------------------------------
Shubhadeep Mandal

Date:

Place:

GURU NANAK INSTITUTE OF TECHNOLOGY


DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CERTIFICATE

This is to certify that the thesis entitled “Object Detection Technique
using OpenCV”, being submitted by Arnab Ghorui, Suvadip Paul, Subhadeep
Mondal, Rudranil Sen, Krishna Das and Shubhadeep Mandal for the award of the
degree of Bachelor of Technology (Computer Science and Engineering) of GNIT,
is a record of bona fide research work carried out by them under my supervision and
guidance. They have worked for nearly one year on the above problem
at the Department of Computer Science and Engineering, Guru Nanak Institute of
Technology, Kolkata, and the work has reached the standard fulfilling the requirements
and regulations relating to the degree.

The contents of this thesis, in full or part, have not been submitted to any
other university or institution for the award of any degree or diploma.

Supervisor

______________

Mr. Rafiqul Islam, Assistant Professor


Department of Computer Science and Engineering
GNIT, Kolkata.
Head of the Department

----------------------------------
Dr. Suman Bhattacharya
Head of the Department
Department of Computer Science and Engineering,
GNIT, Kolkata

Contents

Chapter no Content

1. Abstract
2. Introduction
3. Objectives
4. Problem Statement
4.1 Applications
4.2 Challenges
5. Solution Statement
6. Required Resources
7. Pre-Processing
8. Classification and Localization
8.1 Two-Stage Method
8.2 Unified Method
9. Design and Implementation
9.1 Use Case Diagram
9.2 Activity Diagram
9.3 Sequence Diagram
9.4 Architecture Of The System
10. Implementation And Results
11. Conclusion

1. ABSTRACT :
Efficient and accurate object detection has been an important topic in the advancement of
computer vision systems. With the advent of deep learning techniques, the accuracy of
object detection has increased drastically. The project aims to incorporate a state-of-the-art
technique for object detection with the goal of achieving high accuracy with real-time
performance.

A major challenge in many object detection systems is their dependency on
other computer vision techniques to support the deep learning based approach, which
leads to slow and non-optimal performance. In this project, we use a completely deep
learning based approach to solve the problem of object detection in an end-to-end
fashion. The network is trained on the most challenging publicly available dataset, on
which an object detection challenge is conducted annually. The resulting system is fast and
accurate, thus aiding those applications which require object detection.

2. INTRODUCTION :
Object detection is a well-known computer technology connected with computer vision
and image processing. It focuses on detecting instances of objects of a certain class
(such as humans, flowers, animals) in digital images and videos. With the advent of deep
learning techniques, the accuracy of object detection has increased drastically. There
are various applications, including face detection, character recognition, and vehicle
counting.

Object detection is a pivotal task in computer vision, identifying and locating objects
within images or video frames. Traditional methods employ handcrafted features and
classifiers, like SVMs, while modern approaches leverage deep learning, particularly
Convolutional Neural Networks (CNNs). CNN-based architectures, such as R-CNN,
YOLO, and SSD, have revolutionized object detection by directly learning hierarchical
features from data. One-stage detectors like YOLO and SSD predict bounding boxes and
class probabilities directly, while two-stage detectors like Faster R-CNN first propose
regions and then classify them. Evaluation metrics like Intersection over Union (IoU) and
Mean Average Precision (mAP) gauge the accuracy and completeness of detection.
Challenges include scale variation, occlusion, cluttered backgrounds, and the need for
real-time processing. Object detection techniques continually evolve to meet the demands
of various applications, ranging from surveillance and autonomous vehicles to medical
imaging and retail.
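
To make the Intersection over Union (IoU) metric concrete, the following is a minimal Python sketch. It assumes boxes are given as (x_min, y_min, x_max, y_max) tuples in pixel coordinates; that box format and the function name iou are illustrative choices, not taken from the project code.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (if the boxes overlap at all).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped at zero so disjoint boxes give zero overlap.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 2x2 boxes sharing a 1x1 corner: intersection 1, union 7.
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

An IoU of 1 means the predicted box coincides with the ground truth, and 0 means the boxes do not overlap. A detection is usually counted as correct when its IoU exceeds a chosen threshold such as 0.5.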

3. OBJECTIVES :
Develop an application that detects objects and can be used for vehicle counting: when
the object is a vehicle such as a bicycle or car, the application can count how many
vehicles have passed through a particular area or road. It can also recognize human activity.

1. Localization: Precisely determining the location of objects within an image.
2. Classification: Assigning a label or category to each detected object.
3. Detection: Identifying multiple objects within an image and distinguishing them from
the background.
4. Accuracy: Achieving high accuracy in detecting and classifying objects, minimizing
false positives and negatives.
5. Efficiency: Ensuring fast and efficient processing, especially for real-time applications.
6. Robustness: Maintaining performance across various conditions such as lighting
changes, occlusion, and variations in object appearance.
7. Scalability: Being able to handle large datasets and diverse types of objects.
8. Interpretability: Providing insights into the decision-making process of the model,
making it easier to understand and trust.

4. PROBLEM STATEMENT :
Humans can easily detect and identify objects present in an image, but for a computer,
classifying and finding an unknown number of individual objects within an
image is an extremely difficult problem.

Although object detection software and applications exist, they do not always give
accurate results because, despite a lot of research, real-time and dynamic
object detection methods are still under development.

The problem statement in object detection techniques involves accurately identifying and
localizing objects within images or video frames, addressing challenges such as scale
variation, occlusion, cluttered backgrounds, and the need for real-time processing to
enable effective applications across diverse domains.

4.1 APPLICATIONS :
A well-known application of object detection is face detection, which is used in almost all
mobile cameras. A more generalized (multi-class) application is
autonomous driving, where a variety of objects need to be detected. Object detection also
has an important role to play in surveillance systems. These systems can be integrated with
other tasks such as pose estimation, where the first stage in the pipeline is to detect the
object and the second stage is to estimate pose in the detected region. It can be used
for tracking objects and thus in robotics and medical applications. This
problem therefore serves a multitude of applications:

1. Autonomous Vehicles: Object detection is crucial for identifying pedestrians,
vehicles, and other objects on the road, enabling autonomous vehicles to navigate safely.

2. Surveillance Systems: Object detection is used for monitoring and detecting
suspicious activities or objects in public places, enhancing security measures.

3. Retail Analytics: In retail, object detection helps track product movements, analyze
customer behavior, and manage inventory more efficiently.

4. Medical Imaging: Object detection assists in medical image analysis for identifying
and localizing abnormalities in X-rays, MRI scans, and other medical images.

5. Augmented Reality: Object detection enables augmented reality applications to
recognize real-world objects and overlay digital content on them.

6. Industrial Automation: Object detection is used in manufacturing for quality
control, defect detection, and robotic assembly tasks.

7. Natural Disaster Management: Object detection techniques can identify and track
the movement of natural phenomena such as hurricanes, tornadoes, and wildfires, aiding
in disaster preparedness and response efforts.

8. Agriculture: In precision agriculture, object detection helps monitor crop health,
detect pests and diseases, and optimize farming practices.

9. Resource Management: Object detection is employed in environmental monitoring
to track wildlife populations, monitor deforestation, and manage natural resources.

10. Sports Analytics: Object detection is used in sports to track player movements,
analyze game tactics, and enhance training programs.

4.2 CHALLENGES :
The major challenge in this problem is the variable dimension of the output, which
is caused by the variable number of objects that can be present in any given input
image. Most machine learning models require a fixed dimension of input and output
in order to be trained. Another important obstacle to widespread adoption of object
detection systems is the requirement of real-time operation (>30 fps) while remaining
accurate in detection. The more complex the model is, the more time it requires for
inference; and the less complex the model is, the lower its accuracy. This trade-off
between accuracy and performance needs to be chosen as per the application. The problem
involves classification as well as regression, so the model must learn both tasks
simultaneously. This adds to the complexity of the problem.
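
As a rough illustration of the real-time requirement, the sketch below times an arbitrary inference function and compares the measured rate with the 30 fps target mentioned above. The names measure_fps, frame_budget_ms and meets_real_time are hypothetical helpers introduced for this example, and the dummy infer function only stands in for a real detector.

```python
import time


def frame_budget_ms(target_fps=30.0):
    """Time available per frame, in milliseconds, at the given frame rate."""
    return 1000.0 / target_fps


def measure_fps(infer, frames, warmup=2):
    """Average frames per second of `infer` over `frames` (after a short warm-up)."""
    frames = list(frames)
    for frame in frames[:warmup]:
        infer(frame)  # warm-up calls are not timed
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")


def meets_real_time(fps, target_fps=30.0):
    """True when the measured rate exceeds the real-time target."""
    return fps > target_fps


if __name__ == "__main__":
    # Placeholder workload; a real detector's forward pass would go here.
    dummy_infer = lambda frame: sum(range(1000))
    fps = measure_fps(dummy_infer, range(100))
    print(f"budget per frame: {frame_budget_ms():.1f} ms, measured: {fps:.0f} fps")
```

At 30 fps the whole pipeline, including pre-processing and post-processing, has roughly 33 ms per frame, which is why model complexity has to be traded against speed.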

1. Variability in Object Appearance: Objects can vary significantly in appearance
due to factors such as lighting conditions, occlusions, pose variations, and background
clutter, making it challenging for algorithms to accurately detect them.

2. Scale and Size Variability: Objects can appear at different scales and sizes within
an image, requiring object detection algorithms to be capable of detecting objects
regardless of their size.

3. Real-Time Processing: Many applications require real-time object detection, such
as autonomous vehicles and surveillance systems. Achieving real-time performance while
maintaining accuracy can be challenging, especially for complex scenes and
high-resolution images.

4. Limited Training Data: Object detection models typically require large amounts of
annotated training data to learn effectively. However, obtaining high-quality labeled
datasets can be time-consuming and expensive, especially for specialized domains or rare
objects.

5. Class Imbalance: In some scenarios, certain object classes may be significantly
more prevalent than others in the dataset, leading to class imbalance issues. This can
result in biased models that perform poorly on underrepresented classes.

6. Computational Complexity: Object detection algorithms often involve
computationally intensive operations, especially deep learning-based approaches.
Deploying these models on resource-constrained devices or in real-time systems can be
challenging due to computational limitations.

7. Privacy and Ethical Concerns: In applications such as surveillance and facial
recognition, object detection raises concerns related to privacy, data security, and
potential misuse of technology. Addressing these ethical considerations is essential for
responsible deployment.

8. Adversarial Attacks: Object detection models are vulnerable to adversarial attacks,
where imperceptible perturbations to input images can cause misclassification or false
detections. Mitigating the impact of adversarial attacks is an ongoing research area in
computer vision.

9. Interpretability and Explainability: Deep learning-based object detection models
are often perceived as black boxes, making it challenging to understand their
decision-making process. Ensuring interpretability and explainability of model predictions
is crucial, especially in safety-critical applications.

5. SOLUTION STATEMENT :
To classify the objects in an image and also determine their positions, we perform two
tasks, object classification and localization, at the same time.

The purpose of this project is that it can be used in security, surveillance and
autonomous vehicle driving, for example to detect pedestrians walking or jogging on the
street in order to avoid accidents.

Using object classification and localization together provides accuracy and speed for
real-time detection. Both detection tasks are optimized using one multi-task function,
in which each predicted object is compared to the image's true objects.

6. REQUIRED RESOURCES :
• An integrated development environment (IDE): Microsoft Visual Studio Code.

• Python programming language.

• OpenCV.

• TensorFlow.

• YOLO framework.

7. PRE-PROCESSING :
Preprocessing in object detection involves preparing input images before feeding them
into the model for detection. This typically includes several steps. First, the image is
resized to a fixed size to ensure uniformity across inputs. Next, pixel values are
normalized, often by subtracting the mean and dividing by the standard
deviation. Then, data augmentation techniques like rotation, scaling, or flipping are applied
to increase the diversity of the training dataset and improve model generalization.
Additionally, some models require converting images to a specific color space, such as
RGB or BGR. Finally, images are padded to ensure they fit the input dimensions required by
the model. Overall, preprocessing enhances the model's ability to detect objects
accurately by providing clean, standardized inputs for training and inference.

In this project, preprocessing improves the image by suppressing unwanted features or
enhancing useful ones for further processing. Each input image is cropped and resized to
448 × 448 pixels so that feature extraction can be performed easily, and its contrast and
brightness are normalized.
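
The resizing, padding and normalization steps can be sketched in plain Python as below. This is only an illustration under stated assumptions: the report does not say whether the aspect ratio is preserved, so the letterbox-style padding here is one possible choice, and letterbox_geometry and normalize are hypothetical helper names. In the actual pipeline the pixel resampling itself would be done by an image library such as OpenCV.

```python
def letterbox_geometry(width, height, target=448):
    """Resized size and padding that fit an image into target x target
    while preserving its aspect ratio (an assumed design choice)."""
    scale = min(target / width, target / height)
    new_w = min(target, round(width * scale))
    new_h = min(target, round(height * scale))
    pad_x = target - new_w
    pad_y = target - new_h
    left, top = pad_x // 2, pad_y // 2
    # Padding is returned as (left, top, right, bottom) in pixels.
    return scale, (new_w, new_h), (left, top, pad_x - left, pad_y - top)


def normalize(pixels):
    """Standardize pixel values: subtract the mean, divide by the standard deviation."""
    n = len(pixels)
    mean = sum(pixels) / n
    std = (sum((p - mean) ** 2 for p in pixels) / n) ** 0.5
    if std == 0:
        return [0.0] * n  # constant image: nothing to scale
    return [(p - mean) / std for p in pixels]


if __name__ == "__main__":
    scale, size, padding = letterbox_geometry(640, 480)
    print(size, padding)          # resized size and border widths
    print(normalize([0, 255]))    # two-pixel example
```

Keeping the scale and padding values is useful later, because predicted boxes in the 448 × 448 frame have to be mapped back to the original image coordinates.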

8. CLASSIFICATION AND LOCALIZATION :


Object detection techniques involve both classification and localization tasks.
Classification aims to identify the category or class of objects present in an image, such
as cars, pedestrians, or animals. This is typically achieved using convolutional neural
networks (CNNs) trained on large datasets to learn discriminative features for each class.

Localization, on the other hand, involves determining the precise location of objects
within an image by predicting bounding boxes that tightly enclose them. This task
requires regression-based models that predict the coordinates of bounding box corners
relative to the image frame.

Combining classification and localization, object detection models not only classify
objects but also provide spatial information about their locations in the image.
Techniques like region-based convolutional neural networks (R-CNN), You Only Look
Once (YOLO), and Single Shot Multibox Detector (SSD) integrate both tasks to
accurately detect and localize objects in images with varying scales and aspect ratios.

The bounding box is predicted using regression and the class within the bounding box is
predicted using classification. The overview of the architecture is shown in Fig.
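
A combined objective for these two predictions can be sketched as a classification term plus a weighted regression term. The report does not name the exact loss it uses, so cross-entropy and smooth L1 below are assumptions (both are common choices in the detection literature), and box_weight is a hypothetical balancing parameter.

```python
import math


def cross_entropy(class_probs, true_class):
    """Classification term: negative log-probability of the true class."""
    return -math.log(max(class_probs[true_class], 1e-12))


def smooth_l1(pred_box, true_box):
    """Regression term: smooth L1 distance summed over the box coordinates."""
    total = 0.0
    for p, t in zip(pred_box, true_box):
        d = abs(p - t)
        # Quadratic near zero, linear for large errors.
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def detection_loss(class_probs, true_class, pred_box, true_box, box_weight=1.0):
    """Multi-task loss for one object: classification plus weighted localization."""
    return cross_entropy(class_probs, true_class) + box_weight * smooth_l1(pred_box, true_box)


if __name__ == "__main__":
    probs = [0.1, 0.7, 0.2]           # predicted class probabilities
    pred = (0.48, 0.52, 0.30, 0.41)   # predicted box (normalized coordinates)
    truth = (0.50, 0.50, 0.30, 0.40)  # ground-truth box
    print(detection_loss(probs, 1, pred, truth))
```

Training on the sum of both terms lets a single network learn to classify and localize at the same time.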

8.1 TWO-STAGE METHOD :


Two-stage methods in object detection involve a sequential approach to localize objects
within an image. The first stage generates a set of candidate regions or proposals likely to
contain objects. These proposals are usually obtained using region proposal algorithms
like Selective Search or EdgeBoxes. The second stage involves classifying and refining
these proposals to accurately detect objects.

In the second stage, features are extracted from each proposal and fed into a classifier to
determine the presence of objects and assign class labels. Additionally, bounding box
regression is applied to refine the coordinates of the bounding boxes, improving
localization accuracy.

Common examples of two-stage methods include Faster R-CNN and its variants. While
effective, two-stage methods tend to be slower due to the sequential nature of proposal
generation and classification. However, they often achieve higher detection accuracy
compared to single-stage methods, making them suitable for applications requiring
precise object localization.

In this case, the proposals are extracted using some other computer vision technique and
then resized to a fixed input size for the classification network, which acts as a feature
extractor. Then an SVM is trained to classify between object and background (one SVM for
each class). A bounding box regressor is also trained that outputs corrections (offsets) for
the proposal boxes. The overall idea is shown in Fig. These methods are very accurate but
are computationally intensive (low fps).
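
The offsets produced by a bounding box regressor can be illustrated with the center/size parameterization commonly used by R-CNN style detectors. This is a sketch under that assumption; the report does not state which parameterization it uses, and the function names are illustrative.

```python
import math


def to_center(box):
    """Convert (x_min, y_min, x_max, y_max) to (center_x, center_y, width, height)."""
    x_min, y_min, x_max, y_max = box
    w, h = x_max - x_min, y_max - y_min
    return x_min + w / 2, y_min + h / 2, w, h


def encode_offsets(proposal, target):
    """Offsets that move `proposal` onto `target` (the regression targets)."""
    px, py, pw, ph = to_center(proposal)
    gx, gy, gw, gh = to_center(target)
    # Shifts are relative to the proposal size; scales are in log space.
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))


def apply_offsets(proposal, offsets):
    """Refine a proposal box with predicted offsets (inverse of encode_offsets)."""
    px, py, pw, ph = to_center(proposal)
    tx, ty, tw, th = offsets
    cx, cy = px + tx * pw, py + ty * ph
    w, h = pw * math.exp(tw), ph * math.exp(th)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


if __name__ == "__main__":
    proposal = (10, 10, 50, 50)
    ground_truth = (20, 15, 60, 75)
    offsets = encode_offsets(proposal, ground_truth)
    print(apply_offsets(proposal, offsets))  # recovers the ground-truth box
```

Predicting relative offsets instead of absolute coordinates keeps the regression targets small and independent of where the proposal lies in the image.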

8.2 UNIFIED METHOD :


The unified method in object detection integrates object localization and classification
into a single neural network architecture. This approach eliminates the need for separate
stages, enabling end-to-end training and inference. Models like SSD (Single Shot
Multibox Detector) and YOLO (You Only Look Once) are examples of unified methods.
They predict object classes and bounding box coordinates simultaneously, allowing for
efficient real-time detection in various scenarios. While these methods sacrifice some
accuracy compared to two-stage approaches, they offer significant speed advantages,
making them well-suited for applications requiring fast and accurate object detection,
such as autonomous driving and surveillance systems.

The difference here is that instead of producing proposals, we pre-define a set of boxes to
look for objects. Using convolutional feature maps from later layers of the network, we run
another network over these feature maps to predict class scores and bounding box offsets.
The broad idea is depicted in Fig. The steps are mentioned below:

1. Train a CNN with regression and classification objectives.

2. Gather activations from later layers to infer classification and location with
fully connected or convolutional layers.

3. During training, use Jaccard distance to relate predictions with the ground truth.

4. During inference, use non-maximum suppression to filter multiple boxes around the
same object.
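
Steps 3 and 4 can be sketched in a few lines of Python. The Jaccard distance between two boxes is one minus their IoU, and greedy non-maximum suppression keeps the highest-scoring box and discards lower-scoring boxes that overlap it too much. The (x_min, y_min, x_max, y_max) box format and the 0.5 threshold are assumptions made for this example.

```python
def iou(a, b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def jaccard_distance(a, b):
    """Jaccard distance: 0 for identical boxes, 1 for disjoint boxes."""
    return 1.0 - iou(a, b)


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    # The second box overlaps the first heavily and is suppressed.
    print(non_max_suppression(boxes, scores))
```

In a multi-class detector this suppression is normally applied separately for each class, so that overlapping objects of different classes are not removed.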

9. DESIGN & IMPLEMENTATION :


The process of designing and implementing an object detection technique begins
with selecting a suitable neural network architecture, such as Faster R-CNN, SSD, or
YOLO. Preprocessing steps include image resizing, normalization, and augmentation.
Defining appropriate loss functions, like classification loss and localization loss, ensures
effective training. Implementation involves training the model on annotated datasets,
adjusting hyperparameters, and optimizing inference speed for real-time performance.

Evaluation metrics such as precision, recall, and mean Average Precision (mAP)
assess the model's accuracy. Additionally, considerations for real-world deployment,
including hardware constraints and computational efficiency, are vital for practical
applications. Continuous refinement through iterative model updates and adaptation to
evolving datasets and requirements further enhance the technique's effectiveness across
various scenarios. Overall, a well-designed and implemented object detection technique
integrates both theoretical considerations and practical constraints to achieve accurate and
efficient detection results.

9.1 Use Case Diagram :

9.2 Activity Diagram :



9.3 Sequence Diagram :

9.4 Architecture Of The System :



10. IMPLEMENTATION AND RESULTS :


Implementation of an object detection technique involves training the model on labeled
datasets, adjusting parameters, and optimizing inference speed. This process requires
significant computational resources and may involve fine-tuning pre-trained models to
improve performance on specific tasks. Additionally, deploying the model in real-world
applications involves considerations such as hardware compatibility and efficiency.

Results are evaluated using metrics like precision, recall, and mean Average Precision
(mAP) to assess detection accuracy. Visualization techniques, such as bounding box
overlays on images or video frames, provide qualitative insights into model performance.
Furthermore, benchmarking against state-of-the-art methods and comparison with ground
truth annotations validate the technique's effectiveness.
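
As an illustration of how precision and recall are obtained from detections, the sketch below matches predictions to ground-truth boxes for one image and one class. It assumes a greedy matching in order of confidence and an IoU threshold of 0.5; these are common conventions rather than details stated in this report, and no results of the project are reproduced here.

```python
def iou(a, b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """predictions: list of (score, box); ground_truth: list of boxes."""
    matched = set()   # ground-truth boxes already claimed by a prediction
    true_pos = 0
    for _, box in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            true_pos += 1
    false_pos = len(predictions) - true_pos
    false_neg = len(ground_truth) - true_pos
    precision = true_pos / (true_pos + false_pos) if predictions else 0.0
    recall = true_pos / (true_pos + false_neg) if ground_truth else 0.0
    return precision, recall


if __name__ == "__main__":
    truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
    preds = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.6, (21, 21, 31, 31))]
    print(precision_recall(preds, truth))
```

Mean Average Precision extends this idea by sweeping the confidence threshold to trace a precision-recall curve per class and averaging the area under those curves over all classes.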

Ultimately, successful implementation yields accurate and efficient object detection
across diverse scenarios, with the potential for real-time deployment in applications
ranging from autonomous driving to surveillance systems. Continuous refinement and
adaptation based on feedback and evolving requirements ensure the technique remains
effective in addressing new challenges and datasets.

The project is implemented in Python 3. TensorFlow was used for training the deep
network and OpenCV was used for image pre-processing. The system specifications on
which the model is trained and evaluated are as follows: CPU - Intel Core i7-7700
3.60 GHz, RAM - 32 GB, GPU - Nvidia Titan Xp.

11. CONCLUSION :
Object detection techniques have revolutionized computer vision, enabling machines to
perceive and understand the visual world. Through the amalgamation of deep learning
algorithms, specifically convolutional neural networks (CNNs), and sophisticated
architectures like R-CNN, Fast R-CNN, and YOLO, object detection has achieved
remarkable accuracy and speed.

These techniques have myriad applications, from surveillance systems and autonomous
vehicles to medical imaging and augmented reality. However, challenges persist, such as
detecting small objects, handling occlusions, and maintaining real-time performance on
resource-constrained devices.

Continued research is focused on addressing these challenges, with advancements in
model architectures, dataset collection, and training methodologies. Techniques like
one-stage detectors, which directly predict bounding boxes and class probabilities, and
efficient backbones, such as MobileNet and EfficientNet, have pushed the boundaries of
object detection.

Furthermore, the integration of object detection with other tasks like instance
segmentation and multi-object tracking is an exciting area of exploration, promising even
richer understanding of visual scenes.

In conclusion, object detection techniques have evolved into powerful tools, driving
innovation across industries. With ongoing research and development, the future holds
the promise of even more accurate, efficient, and versatile object detection systems.
