
LIVE OBJECT DETECTION AND TRACKING USING MACHINE LEARNING

Presented by
J S Deepa Chandrika (19691A0524)
M Bharathi (19691A0515)
P Bhavani (19691A0519)
S Jasmeen (19691A0554)

Under the guidance of
Dr. G. Arun Kumar
Asso.Prof – CSE
Introduction to Object Detection with RCNN
• Region-Based Convolutional Neural Networks, or R-CNNs, are a family of
techniques for object detection, designed with model performance (detection accuracy) in mind.
Region-Based Object Detection (R-CNN):
R-CNN was introduced in one of the most influential deep learning papers, published in 2014:
"Rich feature hierarchies for accurate object detection and semantic segmentation" by
Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik.

Architecture and Working of RCNN:


Working of RCNN:
• Step1: We start with an input image and run a region proposal method such as selective search,
which gives us about 2,000 candidate region proposals in the image that we need to evaluate.
• Step2: Because region proposals can have different sizes and aspect ratios, we warp each
candidate region to a fixed size, say 224x224.
• Step3: We run each warped image region independently through a
Convolutional Neural Network (CNN), and the CNN outputs a classification score for
each of these regions.
• There is a slight problem here, however:
• What happens if the region proposals that we get from selective search do not exactly
match the objects that we want to detect in the image?
• To overcome this problem, the CNN also outputs an additional value: a transformation that
maps the region proposal box to the final box that we want to output for the object of
interest.
In summary, the R-CNN pipeline looks like this:
Step1: Run a region proposal method to compute about 2,000 candidate region proposals

Step2: Resize each region to a fixed size (224x224) and run it independently through the CNN
to predict class scores and a bounding box transform

Step3: Use the scores to select a subset of region proposals to output

Step4: Compare the output boxes with the ground truth boxes
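The bounding box transform predicted in Step2 can be sketched as follows. This is a minimal illustration, assuming the (tx, ty, tw, th) parameterization commonly used for R-CNN box regression, where the center shift is scaled by the proposal size and the width and height are scaled exponentially. The function name and the sample numbers are hypothetical.

```python
import math


def apply_box_transform(proposal, transform):
    """Apply a predicted (tx, ty, tw, th) transform to a region proposal.

    proposal  -- (cx, cy, w, h): box center, width, and height in pixels
    transform -- (tx, ty, tw, th): values predicted by the network (assumed layout)
    """
    cx, cy, w, h = proposal
    tx, ty, tw, th = transform
    new_cx = cx + w * tx        # shift the center, scaled by the proposal width
    new_cy = cy + h * ty        # shift the center, scaled by the proposal height
    new_w = w * math.exp(tw)    # scale the width (exp keeps it positive)
    new_h = h * math.exp(th)    # scale the height
    return (new_cx, new_cy, new_w, new_h)


# Hypothetical proposal and transform values, for illustration only.
refined = apply_box_transform((100.0, 80.0, 50.0, 40.0), (0.1, -0.05, 0.0, 0.2))
print(refined)
```

A transform of all zeros leaves the proposal unchanged, which is a convenient sanity check.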


We can compare these bounding boxes using a metric called Intersection over Union
(IOU):
IOU = (Area of Intersection) / (Area of Union)
More generally, IOU is a measure of the overlap between two bounding boxes.
As a rule of thumb: IOU < 0.5 → 'Bad', IOU > 0.5 → 'Decent', IOU > 0.7 → 'Good', IOU > 0.9 → 'Almost perfect'
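The IOU formula above can be sketched in a few lines of Python. This assumes boxes are given as (x1, y1, x2, y2) corner coordinates, which is a choice made for this example and not something the slides specify.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


# Two 10x10 boxes that overlap on half their width: 50 / 150.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

Identical boxes give an IOU of 1, and boxes that do not touch give 0.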
Non-maximum suppression (NMS) uses IOU to remove duplicate boxes for the same object. In this
example, each box is labelled with its confidence score Pc:
1. NMS takes the box with the largest Pc, which is 0.9 in this case.
2. It checks the IOU of this box against all the remaining bounding boxes (i.e. 0.6 and 0.7 for car 1, and 0.8 and 0.7 for car 2).
3. NMS suppresses the 0.6 and 0.7 boxes for car 1, as they have a high IOU with the bounding box of
Pc=0.9, so we get only one bounding box for car 1, which is highlighted in the image.
4. Among the remaining bounding boxes, the highest is Pc=0.8 for car 2, and again we check the IOU for
the remaining boxes (i.e. 0.9 for car 1 and 0.7 for car 2).
5. NMS suppresses the 0.7 box, as it has a high IOU with the bounding box of Pc=0.8, and we get
only one bounding box for car 2 as well.
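The suppression steps above can be sketched as a small greedy loop. The box coordinates and the 0.5 IOU threshold below are made up for illustration; only the Pc scores (0.9, 0.7, 0.6 for car 1 and 0.8, 0.7 for car 2) come from the example.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Greedy NMS over a list of (box, pc) pairs; returns the kept pairs."""
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)  # box with the highest remaining Pc
        kept.append(best)
        # Drop every box that overlaps the chosen box too much.
        remaining = [d for d in remaining if iou(best[0], d[0]) < iou_threshold]
    return kept


# Hypothetical coordinates: three overlapping boxes on car 1, two on car 2.
detections = [
    ((10, 10, 110, 60), 0.9),   # car 1
    ((15, 12, 115, 62), 0.6),   # car 1
    ((5, 8, 105, 58), 0.7),     # car 1
    ((200, 20, 300, 70), 0.8),  # car 2
    ((205, 22, 305, 72), 0.7),  # car 2
]
kept = non_max_suppression(detections)
print([pc for _, pc in kept])
```

With these made-up coordinates, one box per car survives: the Pc=0.9 box for car 1 and the Pc=0.8 box for car 2.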

R-CNN, however, is very slow, because each of the roughly 2,000 region proposals is run through the CNN independently, so it cannot be used in real time.


Faster RCNN Object detection

Introduction
Faster RCNN is an object detection
architecture presented by Ross Girshick,
Shaoqing Ren, Kaiming He and Jian
Sun in 2015. It is one of the best-known
object detection architectures that use
convolutional neural networks, alongside YOLO
(You Only Look Once) and SSD
(Single Shot Detector).
Let's explain how this architecture
works.
Faster RCNN is composed of 3 parts
• Part 1 : Convolution layers
• In these layers we train filters to extract the appropriate features from the image. For example,
if we train the filters to extract the appropriate features for a human
face, then the filters learn through training the shapes and colors that only exist
in a human face.
• We can compare convolution layers to coffee filters. A coffee filter doesn't let the coffee
powder pass into the cup; likewise, our convolution layers learn the object features and don't let
anything else pass, only the desired object.
• Coffee powder + Coffee liquid = Input image
• Coffee filter = CNN filters
• Coffee liquid = Last feature map of the CNN
• Let's talk more about convolutional neural networks.
• Convolutional networks are generally composed of convolution layers, pooling layers and a
last component, which is a fully connected layer or another extension that is used for
a specific task such as classification or detection.
We compute a convolution by sliding a filter all along our input image; the result is a
two-dimensional matrix called a feature map.

Pooling reduces the number of values in the feature map. Max pooling, for example, keeps only
the largest value in each small window and discards the pixels with lower values.
The last step is using the fully
connected layer to classify those features,
which is not our case in Faster RCNN.
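The convolution and pooling steps described above can be sketched in plain Python. This is a minimal illustration with no padding and a stride of 1 for the convolution. As in most CNN libraries, the filter is not flipped, so strictly speaking this is cross-correlation. The 5x5 image and 2x2 filter values are made up.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) to get a feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Keep only the largest value in each non-overlapping size x size window."""
    return [
        [
            max(feature_map[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


# Hypothetical 5x5 input image and 2x2 filter.
image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 1, 0],
    [3, 0, 1, 2, 1],
    [1, 2, 0, 0, 2],
    [0, 1, 3, 1, 0],
]
kernel = [[1, 0],
          [0, 1]]

feature_map = convolve2d(image, kernel)  # 4x4 feature map
pooled = max_pool2d(feature_map)         # 2x2 after 2x2 max pooling
print(feature_map)
print(pooled)
```

Here the 5x5 image becomes a 4x4 feature map, and 2x2 max pooling shrinks it to 2x2.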
Part 2 : Region Proposal Network (RPN)
The RPN is a small neural network that slides over
the last feature map of the convolution
layers and predicts whether there is an
object or not, and also predicts the
bounding boxes of those objects.
Part 3 : Classes and Bounding Boxes
prediction
Introduction to YOLO Algorithm for Object Detection

How the YOLO algorithm works


YOLO algorithm works using the following
three techniques:
• Residual blocks
• Bounding box regression
• Intersection Over Union (IOU)
Residual blocks
• First, the image is divided into a grid of
S x S cells of equal size. The
following image shows how an input image is
divided into grid cells.
In the image above, there are many grid cells of equal dimension. Every grid cell
detects objects whose center falls within it. For example, if an object's center appears within a
certain grid cell, then this cell is responsible for detecting it.
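The rule that the cell containing an object's center is responsible for it can be sketched as below. The function name, the 448x448 image size, and S = 7 are assumptions chosen for illustration.

```python
def responsible_cell(center_x, center_y, image_w, image_h, s):
    """Return the (row, col) of the grid cell that contains the object center."""
    # Scale the pixel coordinates to grid units, then truncate to a cell index.
    col = min(int(center_x / image_w * s), s - 1)  # clamp centers on the right edge
    row = min(int(center_y / image_h * s), s - 1)  # clamp centers on the bottom edge
    return row, col


# Hypothetical 448x448 image split into a 7x7 grid (each cell is 64x64 pixels).
cell = responsible_cell(300, 100, 448, 448, 7)
print(cell)
```

A center at pixel (300, 100) falls in column 4 (300 // 64) and row 1 (100 // 64), so that cell is the one that must predict the object.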
Bounding box regression
A bounding box is an outline that highlights an object in an image.
Every bounding box in the image consists of the following attributes:
• Width (bw)
• Height (bh)
• Class (for example, person, car, traffic light, etc.), represented by the letter c
• Bounding box center (bx, by)
The following image shows an example of a bounding box. The bounding box has been
represented by a yellow outline.
YOLO uses a single bounding box regression to predict the
height, width, center, and class of objects. In the image
above, pc represents the probability of an object appearing in
the bounding box.
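The attributes listed above can be packed into a single vector per bounding box. The sketch below assumes the layout [pc, bx, by, bh, bw, c1, ..., cn], where pc is the probability that an object is present and the class c is one-hot encoded. The layout, the function name, and the sample values are assumptions for illustration, not taken from the slides.

```python
def encode_box(pc, bx, by, bh, bw, class_index, num_classes):
    """Pack one bounding box into a flat vector (assumed layout, see above)."""
    # One-hot class vector: 1.0 at the object's class, 0.0 elsewhere.
    one_hot = [1.0 if i == class_index else 0.0 for i in range(num_classes)]
    return [pc, bx, by, bh, bw] + one_hot


# Hypothetical class list and box values (coordinates relative to the grid cell).
classes = ["person", "car", "traffic light"]
y = encode_box(1.0, 0.5, 0.4, 0.3, 0.6, classes.index("car"), len(classes))
print(y)
```

The vector has 5 + n entries for n classes, so the three classes here give 8 values per box.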
Intersection over union (IOU)
Intersection over union (IOU) is a metric in object
detection that describes how much two boxes overlap. YOLO uses
IOU to select an output box that surrounds the object
as closely as possible.
Each grid cell is responsible for predicting the bounding
boxes and their confidence scores. The IOU is equal to 1 if
the predicted bounding box is the same as the real box. This
mechanism eliminates bounding boxes that do not closely match
the real box.
The following image provides a simple example of how
IOU works.
• In the image above, there are two
bounding boxes, one in green and the
other one in blue. The blue box is the
predicted box while the green box is
the real box. YOLO aims to make the
predicted box match the real box as closely as possible.
Combination of the three techniques
• The following image shows how the
three techniques are applied to
produce the final detection results.
UML DIAGRAMS FOR LIVE OBJECT DETECTION AND TRACKING
