UAV Detection and Tracking in High-Resolution Images/Videos
PROJECT REPORT
Submitted by:
Aqsa Abu Bakar
(UET-18F-BSCE-KICSIT-02)
Ayesha Tahir
(UET-18F-BSCE-KICSIT-13)
Esha Baig
(UET-18F-BSCE-KICSIT-18)
PROJECT SUPERVISOR
Mr. Mughees Sarwar Awan
KAHUTA
(2022)
In the name of ALLAH ALMIGHTY the most beneficent,
the most merciful
We hereby declare that neither this project nor any part of it has been copied from any source. We further declare that we conducted this project work and accomplished this thesis entirely on the basis of our personal efforts, under the sincere guidance of our supervisor, Mr. Mughees Sarwar Awan. If any part of this project is proved to have been copied from any source or found to be a reproduction of another project, we shall stand by the consequences. No portion of the work presented in this dissertation has been submitted in support of any other degree or qualification at this or any other university or institute of learning.
MEMBERS’ SIGNATURES
______________________________
Aqsa Abu Bakar.
(UET-18F-BSCE-KICSIT-02)
_____________________________
Ayesha Tahir.
(UET-18F-BSCE-KICSIT-13)
_____________________________
Esha Baig.
(UET-18F-BSCE-KICSIT-18)
___________________________
Mr. Mughees Sarwar Awan
Assistant Professor
Computer Engineering
It is certified that we have examined the thesis titled UAV Detection and Tracking in High-Resolution Images/Videos, submitted by Aqsa Abu Bakar (UET-18F-BSCE-KICSIT-02), Ayesha Tahir (UET-18F-BSCE-KICSIT-13), and Esha Baig (UET-18F-BSCE-KICSIT-18), and found it to be as per standard. We accept the work contained in the report as confirmation of the required standard for the partial fulfillment of the degree of Bachelor of Science in Computer Engineering.
Committee:
External Examiner:
_________________________
Name
Designation
Institute
Project Coordinator:
_________________________
Mr. Muhammad Waqas
Project Supervisor:
_________________________
Mr. Mughees Sarwar Awan
Director KICSIT:
_________________________
Engr. Masood Khalid
Copyright in the text of this thesis rests with the authors. Copies (by any process), either in full or of extracts, may be made only in accordance with instructions given by the authors and lodged in the Library of KICSIT. Details may be obtained from the Librarian. This page must form part of any such copies made. Further copies (by any process) of copies made in accordance with such instructions may not be made without the written permission of the authors.
The ownership of any intellectual property rights which may be described in this thesis is vested in KICSIT, subject to any prior agreement to the contrary, and may not be made available for use by third parties without the written permission of KICSIT, which will prescribe the terms and conditions of any such agreement.
Further information on the conditions under which disclosures and exploitation may take place
is available from the Library of KICSIT, Kahuta.
First of all, we thank Allah, the Almighty, for giving us the strength to carry out this project and for blessing us with many great people who have been the greatest support in our personal and professional lives. We also thank our parents for advising, praying for, and supporting us in everything in our lives, especially during our four years of studying Engineering. We owe so much to our whole family for their undying support and their unwavering belief that we could achieve so much. We would like to thank our university, Dr. A Q Khan Institute of Computer Sciences & Information Technology, for teaching us Computer Engineering, and all our professors and doctors. We would also especially like to thank the Swarm Robotics Lab, National Center of Robotics and Automation, for their cooperation and contribution during the training process of our dataset.
We deeply thank our Director I/C, Dr. Salman Iqbal, whose help and advice were invaluable. Not least of all, we are very fortunate and grateful to our project supervisor, Mr. Mughees Sarwar Awan, for providing leadership, ideas, and help in solving various problems. Thank you for your support and helpful suggestions; we will be forever thankful to you.
Several organizations, including the military, government, and commercial sectors, are adopting drone technology. Drones are also known as UAVs (Unmanned Aerial Vehicles). The exponentially increasing public accessibility of drones poses a great threat to general security and confidentiality. Therefore, detecting and eliminating drones before lethal outcomes is of paramount interest. For this purpose, we have developed an autonomous drone detection and tracking system. For detection, we have used YOLOv3 combined with the MOSSE tracker to enhance the efficiency of the detector. The detection algorithm was trained on our custom dataset, for which we manually labeled approximately 8,000 images gathered from almost 120 videos. The ROI (Region Of Interest) is passed from the detector to the tracker, which keeps tracking the UAV until the target is lost, after which the system shifts back to the initial detection stage. The rotating turret rotates along with the targeted UAV as it displaces from the center of the frame. As soon as the drone is detected, the system gives an alert in the form of a siren, sends an email message, and calls the authorized admin to inform them about the detected drone. The authorized admin can view the screen via a mobile app connected with the system through Wi-Fi. Moreover, the videos of the tracked drones are saved at the backend along with the proper date and time so that they can be used in the future for other purposes.
Acknowledgments............................................................................................................................... vii
4.2 Motors..............................................................................................................................................17
4.2.1 Introduction ....……........…………...…………………………………….……….........17
4.2.2 Technical Specification …...……........…………...….……………...…………….........17
4.3 Arduino.........................................................................................................................................18
5.1.1 Detection and Tracking of the face in real-time using Haar cascade…..………………. 35
6.1 Conclusion...................................................................................................................................... 64
REFERENCES ................................................................................................................................ 65
Figure 4.6: Circuit Diagram of L298N Motor Driver with motors. ….................................. 22
Figure 4.9: Turrets (a) Static Turret, (b) Rotating turret. ....................................................... 26
Figure 4.11: Base of Aluminum and Timber (a) Side view, (b) Top view............................. 27
Figure 4.12: Motors and Gears attached with the rod. …..................................................... 28
Figure 4.15: Static Turret. (a) Static Turret without the camera, (b) Static Turret with the
camera …………………..................................................................................................... 29
Figure 4.16: Rotating Turret. (a) Rotating Turret without the camera, (b) Rotating Turret with
the camera ………………………………….…………....................................................... 30
Figure 5.6: Testing of drone detection using the pre-trained classifier. ................................. 38
Figure 5.7: Testing of drone detection using the custom-trained classifier. .......................... 39
Figure 5.29: (a) Folder of tracked drones, (b) Images saved inside folder, (c) Zoomed images along with the video of tracked drones ................................................................................... 54-55
Acronyms and Abbreviations
Chapter 1
INTRODUCTION
1.1 Project Background
UAVs (Unmanned Aerial Vehicles), also known as drones, are a modern and advanced technology used in many fields. They are currently being used by many countries for military intelligence, reconnaissance, and espionage operations, and a number of organizations are adopting this technology, including the government, commercial, and military sectors. Drones are ubiquitous in our daily lives; for example, they are used for courier services, security monitoring, and other purposes. On the other hand, drones are a double-edged sword. They can be used to improve people's quality of life, but they can also be used for criminal and harmful purposes. Terrorists have even used smart drones to attack several countries. Therefore, drone detection is an active research topic that attracts the attention of many scientists. Sophisticated drones are often difficult to detect and, in some cases, life-threatening.
Most modern detection technologies use video, audio, radar, temperature, radio frequency (RF), or Wi-Fi technology. However, each detection method has disadvantages that make it unsuitable for detecting drones in sensitive areas.
Our goal is to overcome the challenges faced by current methods of drone detection. In this project,
we have combined different strategies to create an efficient drone detection and tracking system. In
particular, we have evaluated different detection and tracking techniques based on risk assessment and
probability of success. Most of the proposed drone detection solutions are based on reliable data
transmission, which is continuously sent to the ground station for data processing. Such
implementations may fail if large amounts of data need to be transferred over long distances.
The popular techniques for UAV detection are RADAR, LIDAR, and acoustics. RADARs cannot resolve the type of object being detected; LIDARs, with their high-intensity laser beams, can cause damage to the human eye; and acoustics is inefficient in noisy environments such as airports.
It is well known that detection abilities are high when the target is visible, and camera sensors have this advantage. The popularity of vision-based systems combined with image- and video-processing technologies is growing, in part, due to the proliferation of systems based on deep learning. The advantage of these automated systems is that they can be operated at scale at any time of the day or night, without human intervention and with minimal maintenance, and they are less susceptible to human bias.
• To develop a system that doesn’t require large space and can be mounted on a small space.
Cameras have long been used to monitor security-sensitive areas such as streets, busy public places, and borders, and video surveillance has been around for a long time. In recent years, the number of drone incidents involving both civilian and military sites has increased dramatically, and exorbitantly expensive specialized equipment is often used to identify drones; in the past, detection systems such as RADARs have cost a fortune to set up. Our system provides the same functionality at a much lower cost and can be mounted anywhere due to its compactness. Therefore, a drone detection system is needed that can run on high-performance hardware.
The aim is to improve detection accuracy for drones/UAVs using high-resolution multi-cameras and to detect the high-fidelity movement of aerial objects, i.e., drones.
This project is based on video coming from two different cameras, one with 4K resolution and the other with 1080p, mounted on two different turrets. The two turrets used in this project are:
1. Static Turret
2. Rotating Turret
The wide-angle camera with 4K resolution is mounted on the static turret and connected to the system. It detects the object, i.e., the drone, and is capable of detecting multiple drones at a time.
The rotating turret is a revolving structure with two movements: pitch and yaw. These movements are controlled by two motors. The low-angle camera with 1080p resolution is mounted on the rotating turret and connected to the system to track the object. It also allows us to monitor the tracked drone closely by displaying a zoomed-in version of it. An Arduino rotates the motors based on the coordinates sent serially from the Python script. The motors are driven by the L298N motor driver IC and powered by a battery.
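The turret-following behavior described above can be sketched as a small function that maps a detected bounding box to pan/tilt commands based on its offset from the frame center. This is an illustrative sketch, not the project's actual code; the function name, command letters, and dead-zone value are our own assumptions.

```python
def turret_command(bbox, frame_w, frame_h, dead_zone=20):
    """Decide pan/tilt direction from a bounding box's offset from frame center.

    bbox is (x, y, w, h) in pixels; dead_zone suppresses jitter near the center.
    Returns one-letter commands: L/R/S for pan, U/D/S for tilt (S = stay).
    """
    x, y, w, h = bbox
    cx, cy = x + w // 2, y + h // 2                # center of the detected drone
    dx, dy = cx - frame_w // 2, cy - frame_h // 2  # offset from frame center
    pan = "L" if dx < -dead_zone else ("R" if dx > dead_zone else "S")
    tilt = "U" if dy < -dead_zone else ("D" if dy > dead_zone else "S")
    return pan, tilt
```

These per-frame commands would then be sent serially to the Arduino, which drives the motors accordingly.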
YOLO is an algorithm that uses neural networks to provide real-time object detection. The algorithm is popular because of its speed and accuracy, and it has been used in various applications to detect traffic signals, people, parking meters, and animals.
YOLO is an abbreviation for "You Only Look Once". It is an algorithm that detects and recognizes various objects in an image in real time. Object detection in YOLO is done as a regression problem, and the algorithm provides the class probabilities of the detected objects.
The YOLO algorithm employs convolutional neural networks (CNNs) to detect objects in real time. As the name suggests, the algorithm needs only one forward propagation through a neural network to detect objects. This means that prediction over the entire image is done in a single algorithm run. The CNN is used to predict various class probabilities and bounding boxes simultaneously. The algorithm relies on three techniques:
• Residual blocks
• Bounding box regression
• Intersection Over Union (IOU).
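Of these three techniques, Intersection over Union (IoU) is easy to make concrete: it scores how well a predicted box overlaps a reference box. A minimal sketch follows; the corner-format (x1, y1, x2, y2) box convention is an assumption for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2) corners."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)   # overlap area (0 if disjoint)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

YOLO uses IoU both to suppress duplicate detections and to measure how well predictions match ground truth during training.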
YOLOv3 is a real-time object detection algorithm that identifies specific objects in videos, live feeds, or images. YOLO uses features learned by a deep convolutional neural network to detect an object. It is a state-of-the-art algorithm, so fast that it has become almost a standard approach to detecting objects in the field of computer vision.
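To make the single-forward-pass idea concrete, here is a sketch of how one YOLOv3 output row is decoded into a pixel-space box. It assumes the common 85-element row layout (4 box terms, 1 objectness score, 80 class scores) of COCO-trained YOLOv3; the function name and thresholds are our own illustrative choices, not the project's code.

```python
import numpy as np

def decode_yolo_row(row, frame_w, frame_h, conf_thresh=0.5):
    """Turn one YOLOv3 output row into (x, y, w, h, class_id, confidence) or None.

    row = [cx, cy, w, h, objectness, class scores...], with box terms normalized
    to the frame size; (x, y) is the top-left corner of the box in pixels.
    """
    scores = row[5:]
    class_id = int(np.argmax(scores))
    confidence = float(scores[class_id])
    if confidence < conf_thresh:
        return None                      # discard weak detections
    cx, cy = row[0] * frame_w, row[1] * frame_h
    w, h = row[2] * frame_w, row[3] * frame_h
    return (int(cx - w / 2), int(cy - h / 2), int(w), int(h), class_id, confidence)
```

In practice, rows surviving this threshold would then go through non-maximum suppression before the best box is handed to the tracker.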
Initialization and tracking are the algorithm's two core components. The object is initialized from the first few frames in either the simple or the more complex version of the tracker. The object is cropped and centered before the filter is applied. The initialization filter is then applied to the video frame to calculate the object's new position; in the complex version, the tracker modifies the filter while tracking. The user selects an object by clicking its center to initialize the filter. When an object is annotated, a bounding box around it is displayed; the bounding box represents both the monitor window and the initialized model. During tracking, the model is clipped in order to create and update the filter. Before filter initialization and object tracking, each frame of the video goes through a pre-processing stage.
In the preprocessing step, the model is converted into the Fourier domain. In order to initialize and modify the filter, a composite result is created; this combined output is also converted into the Fourier domain, where the filter is then computed. In the following frames of the video, while tracking, the correlation output is translated back into the spatial domain to obtain the new position of the identified item. To accelerate the computation, the correlation is carried out in the Fourier domain.
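The Fourier-domain speed-up can be sketched in a few lines. This is a simplified illustration of the idea behind MOSSE (one FFT per image instead of a sliding-window search), not the project's tracker; it ignores windowing and filter updates, and uses NumPy FFTs.

```python
import numpy as np

def fourier_shift(template, frame):
    """Find the (dy, dx) displacement of `template` inside `frame` by computing
    their circular cross-correlation in the Fourier domain; the correlation
    peak marks the object's new position."""
    corr = np.real(np.fft.ifft2(np.fft.fft2(frame) * np.conj(np.fft.fft2(template))))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = frame.shape
    # Wrap large indices to negative displacements (circular correlation).
    return (dy - h if dy > h // 2 else dy, dx - w if dx > w // 2 else dx)
```

A full MOSSE tracker additionally learns and updates a correlation filter over these Fourier-domain products, but the localization step above is the computational core.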
This chapter discussed the idea behind the project, the background, and the basic need for this project. Furthermore, it gave a short introduction to the components and algorithms used in the project, details of which will be provided in the following chapters.
Chapter 2
LITERATURE REVIEW
In this portion, we present a literature review of UAVs, and their prevention using anti-UAV
techniques. We first explain the existing systems such as cameras, videos for detection, RADAR
systems, etc. After understanding their operation, various techniques for monitoring and
preventing UAV attacks are described along with case studies.
The existing systems include video detection methods, cameras for detection, RADAR systems,
and RF-based methods.
The surveillance system captures a wide area of the sky in high resolution with a 4K camera and
transmits the individual image to the server every second. [4]
A visual camera allows adjustment and provides each UAV with the ability to visually monitor other members of the formation, enabling a robust solution for detecting and tracking collaborative targets through a series of frames. [5]
2.1.3 Radar Systems:
Radar has been used for years to detect Unmanned Aerial Vehicles, especially at long distances and in poor visibility such as thick fog or at night. However, traditional radar is not designed to detect objects that are smaller, slower, and flying at lower altitudes than regular aircraft. Radar finds a drone in flight most easily, but it has a high false-positive rate in crowded urban areas. [6]
2.1.4 Radio Frequency Based Methods:
Radio frequency-based approaches are ineffective if the drone is not interacting with the controller via RF-based control and video transmission. Detection is also difficult when the altitude is high, as it depends on the transmission power and reception sensitivity. [7]
2.1.5 Real-Time Drone Detection and Tracking with Visible, Thermal, and Acoustic Sensors:
This paper describes the design process of a multi-sensor drone auto-detection system. Besides the usual video and audio sensors, the system also includes thermal infrared cameras, which prove to be a viable solution for drone detection tasks; despite a slightly lower resolution, their performance is comparable to visible-range cameras. Detector performance as a function of sensor-to-target distance is also considered. In addition, through sensor fusion, the system is designed to be more robust than the individual sensors, which helps reduce false positives and addresses the lack of publicly available data. [8]
A convolutional neural network (CNN) is a milestone in the deep learning field. It usually includes one input layer and one output layer, and additionally consists of more than one hidden layer between the input and output layers. The convolutional layer applies a convolution kernel over the input tensor, and the convolutional operation is basically a cross-correlation mathematical process that can reduce a large number of network parameters while achieving the same goals as a conventional fully connected network. [9] initially introduced a deep convolutional neural network to classify ImageNet datasets, going beyond traditional classification methods. After their successful work, a variety of new applications emerged based on CNNs [10].
CNNs are most commonly employed in computer vision for image classification, image object detection, image object segmentation, image style transfer, and more. Object detection is a computer vision task that discovers the category and location of objects in a specific image. Detected objects are identified by a rectangular bounding box with a particular class identifier. Recent studies have used deep learning to achieve object detection. Region-based convolutional neural networks (R-CNNs) for object detection are typical of this kind of deep learning technique; they use the region-of-interest technique to separate object classification and bounding box predictions and detect objects [11]. Fast R-CNN improves the speed of R-CNN by introducing region-of-interest (ROI) pooling that can share a convolutional layer [12].
Chapter 3
REQUIREMENT ANALYSIS
The requirement analysis covers the user requirements, functional requirements, non-functional
requirements, and important key points indicator for a successful project.
UAV detection and tracking in high-resolution images/videos is specially designed for indigenous organizations. The main objectives of this project are to detect and track drones, resolve issues in the current detection systems, avoid false alarms, increase security at a lower cost, and develop a system that doesn't require a large space and can be mounted in a small space.
Image-based real-time detection and tracking has been specially developed for indigenous organizations; the main goal of this plan is to detect and track drones. The use of visual information to find and classify drones is still in its infancy. Most of the work has been done using learned features and various deep learning models and techniques. Deep learning approaches, however, are data-driven and need large labeled datasets to build strong models. The shortage of publicly available datasets is a major obstacle to research in this area; rather than starting from scratch, some authors used transfer learning to reduce this difficulty. In other studies, special software was used [13] to generate composite pictures to increase the number of samples in the dataset. Data augmentation, as well as the use of generative models such as Generative Adversarial Networks (GANs) to create synthetic data comparable to the real data, increases the body of data that can be used in the future. Most of the research on visual drone detection does not specify the type of detection device, drone type, detection range, or dataset used. These details are necessary to validate the work and compare it to similar investigations. Apart from these machine learning aspects, visual detection is restricted in its efficiency and is often limited because it depends on the presence of a line of sight (LOS) between the drone and the camera system.
The hardware components include the onboard Arduino circuit to which all system components
are connected.
3.2.2 Arduino Python programming
The connected components are controlled by Arduino, which also performs the geared motor and
turret rotation functions.
• Safety:
Appropriate testing and security measures have been taken to ensure that the system for
implementation and maintenance is functioning efficiently and properly.
• Performance:
• Availability:
The system should be accessible and should work effectively at any time.
• Reliability:
It requires far fewer human resources, yet it still provides more efficient results than the drone detectors and trackers currently in use.
Our focus is on detecting and tracking drones in high-resolution images/videos and on generating an alarm, sending emails, and calling the dedicated authorities in real time. The drone detection and tracking videos are also saved in memory.
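The requirement that recordings be saved with a proper date and time can be sketched as a small helper. The directory name and filename pattern are illustrative assumptions, not the project's actual layout.

```python
from datetime import datetime
from pathlib import Path

def recording_path(base_dir="tracked_drones", ext="avi"):
    """Return a timestamped path for saving a tracked-drone video, creating
    the folder on first use so a video writer can open it directly."""
    Path(base_dir).mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    return Path(base_dir) / f"drone_{stamp}.{ext}"
```

Embedding the timestamp in the filename keeps recordings unique and lets an admin locate an incident by date later.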
Chapter 4
SYSTEM DESIGN AND IMPLEMENTATION
4.1 Battery
4.2 Motors
4.2.1 Introduction
Two motors are responsible for the clockwise and anticlockwise rotation of the turret. These motors allow two movements: pitch and yaw. One motor is responsible for the yaw (left and right) movement and the other for the pitch (up and down) movement. The turret follows the detected drone.
4.3 Arduino
The Arduino Uno is a microcontroller board based on the ATmega328 (datasheet). It has 14 digital input/output pins (6 of which can be used as PWM outputs), 6 analog inputs, a 16 MHz crystal oscillator, a USB connector, a power jack, an ICSP header, and a reset button. It contains everything needed to support the microcontroller; simply connect it to a computer with a USB cable, or power it with an AC-DC adapter or battery, to get started. The Uno differs from all previous boards in that it does not use the FTDI USB-to-serial driver chip; instead, its ATmega8U2 is programmed as a USB-to-serial converter. "Uno" means "one" in Italian and was named to mark the release of Arduino 1.0. The Uno and version 1.0 became the reference versions of the evolving Arduino platform, against which later boards are compared. [14]
• Microcontroller: ATmega328P
4.3.2 Communication
In this project we have to connect the Arduino with the PC (Python script). We have used the PySerial library, because PySerial is a Python library that allows a PC to communicate serially with an Arduino or other systems.
Python gets the bounding box coordinates of the detected object and sends a serial command to the Arduino accordingly. The Arduino then drives the motor in the direction (clockwise or counterclockwise) received from Python, via the motor driver IC described later in this chapter.
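A minimal sketch of that PC side follows. The two-character command format, the port name, and the baud rate are our illustrative assumptions; the actual protocol depends on the Arduino sketch.

```python
def encode_command(pan, tilt):
    """Pack pan/tilt direction characters into one newline-terminated ASCII
    command, ready to be written to a serial port."""
    return f"{pan}{tilt}\n".encode("ascii")

# Hypothetical usage with PySerial (port name and baud rate are assumptions):
#   import serial
#   ser = serial.Serial("COM3", 9600, timeout=1)
#   ser.write(encode_command("L", "U"))   # pan left, tilt up
```

Terminating each command with a newline lets the Arduino sketch read one command at a time with a simple line-based parser.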
4.3.3 Programming
The Arduino Uno can be programmed with the Arduino software (download); select the Arduino Uno with ATmega328 from the Tools > Board menu (matching the onboard microcontroller). The Arduino Uno's ATmega328 comes pre-installed with a bootloader that allows you to upload new code without using an external hardware programmer. It communicates using the original STK500 protocol (reference, C header files). You can also bypass the bootloader and program the microcontroller through the ICSP (In-Circuit Serial Programming) header; see the relevant instructions for details. The source code of the ATmega8U2 firmware is available.
The ATmega8U2 is loaded with a DFU bootloader, which can be activated by connecting the solder jumper on the back of the board (near the map of Italy) and then resetting the 8U2. You can then use Atmel's FLIP software or a DFU programmer to load different firmware. Alternatively, you can use the ISP header with an external programmer (overwriting the DFU bootloader).
The L298N is based on the H-bridge concept: a circuit that permits current to flow in both directions, since the voltage polarity determines the direction of motor rotation, either clockwise or anticlockwise. The L298N chip consists of two H-bridge circuits that can independently rotate two DC motors. The pin diagram of the L298N motor driver is shown below.
The two enable pins lie on pins 6 and 11, and both should be high to drive the motors. Pin 6 should be high to drive a motor with the left H-bridge, and pin 11 must be enabled for the right H-bridge. The motor in the corresponding segment stops functioning if either pin 6 or pin 11 goes low, acting like an inverted switch.
Inputs                              Function
EN = H   Input1 = H, Input2 = L     Clockwise
EN = H   Input1 = L, Input2 = H     Anticlockwise
EN = H   Input1 = Input2            Fast motor stop
EN = L   Input1 = X, Input2 = X     Free-running motor stop
Important point: we can simply connect pin 4 Vcc (5V) to pin 6 and pin 11 to make them high.
The L298N has four input pins. Input 1 and Input 2 on pins 5 and 7 control the rotation of motor A, while Input 3 and Input 4 on pins 10 and 12 control the rotation of motor B. The rotation of each motor is based on the LOGIC 0 or LOGIC 1 input carried via the input pins.
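The truth table above can be captured in a tiny lookup on the PC side before the levels are sent to the Arduino. The direction names and the tuple convention are our own illustration, not the project's code.

```python
def l298n_inputs(direction):
    """Map a rotation request to (IN1, IN2) logic levels for one L298N H-bridge,
    per its truth table; the enable pin is assumed to be held high."""
    table = {
        "cw":   (1, 0),  # IN1 = H, IN2 = L -> clockwise
        "ccw":  (0, 1),  # IN1 = L, IN2 = H -> anticlockwise
        "stop": (0, 0),  # IN1 == IN2      -> fast motor stop
    }
    return table[direction]
```

On the Arduino side, the corresponding digitalWrite calls on the input pins realize these levels.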
The wide-angle camera that has 4K resolution is mounted on the static turret.
A PK-940 HA 1080p camera is used in our project. The camera is mounted on the rotating turret and connected to the system through a USB port. The camera captures frames in the form of video/images in real time. The algorithms are developed in Python, using OpenCV for image processing and its DNN module to execute deep neural networks and detect the desired object as output.
4.7 Turrets
This portion introduces the turrets and the mechanical parts related to their movements.
• Static turret.
• Rotating turrets.
The wide-angle camera with 4K resolution is mounted on the static turret and connected to the system. It detects the object and passes the targeted object's bounding box to the tracker. The rotating turret is a revolving structure with two movements, pitch and yaw, controlled by two motors. The low-angle camera with 1080p resolution is mounted on the rotating turret and connected to the system to track the object. An Arduino rotates the motors based on the coordinates sent serially from the Python script. The motors are driven by the L298N motor driver IC and powered by a battery.
In this section, all of the components, including the timber board, rods, gears, and so on, are shown with the help of images, along with their dimensions. All of these mechanical components are required to construct the turret model precisely so that it works efficiently.
The upper part of the model is made from timber and is well positioned over a base of aluminum and timber to offer a balanced model, as shown above.
Figure 4.11: Base of Aluminum and Timber (a) Side View, (b) Top View.
The motor designated for pitch (up and down) is fitted into the timber arm of the upper part of the rotating turret through a hole that allows the aluminum rod to rotate freely with the help of bearings and gears, with a metal strip keeping the motor in place.
The top part of the turret's base has a central hole into which we inserted a bearing that permits it to revolve through a full 360 degrees.
This is the mechanism by which the turret rotates in the yaw direction (left to right); it is supported by three wheels that roll on the aluminum top with no trouble. These three caster wheels are deployed to distribute the weight of the turret's upper module so that the lower motor does not bear the entire load.
Figure 4.15: Static turret (a) Static Turret without the camera, (b) Static Turret
with the camera.
Figure 4.16: Rotating turret (a) Rotating turret without the camera, (b) Rotating
turret with the camera.
The system that we used in this project is a Dell T7920 workstation. Some of the technical specifications of the system are as follows:
The NVIDIA Quadro RTX 5000 16GB GPU provides a wide range of features to supercharge next-generation workflows. The RTX 5000 offers many new features that allow professionals to push the boundaries of what is possible, with the NVIDIA Turing architecture packaged in a 12nm manufacturing process. Additionally, the graphics card is equipped with 3072 CUDA cores, 16GB of GDDR6 memory, and 48 ray tracing cores. [18]
The features of the RTX 5000 GPU include four DisplayPort 1.4 connectors, a VirtualLink connector, DisplayPort with audio, VGA support, 3D stereo support with a stereo connector, NVIDIA GPUDirect™ support, Quadro Sync II compatibility, NVIDIA nView® desktop management software, HDCP 2.2 support, and NVIDIA Mosaic.
CHAPTER 5
We employed a .xml file with a pre-trained Haar cascade classifier for real-time face detection, for which we created a Python algorithm. Face identification was done primarily so that we could gain a fundamental understanding of the Haar cascade algorithm. The results are depicted visually in the provided image, and it is clear that the classifier accurately recognizes faces in real time.
5.1.1 Detection and Tracking of the face in real-time using Haar cascade
Following the development of the face detection algorithm, the bounding box's coordinates were extracted. Once we knew the face's coordinates, we sent a command to the Arduino to drive the motors in that direction so we could track the face in real time.
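Since a cascade classifier can return several boxes per frame, a small helper is needed to pick one target whose center is then sent to the Arduino. The sketch below is illustrative, not the project's code; it assumes (x, y, w, h) boxes of the kind returned by OpenCV's detectMultiScale.

```python
def primary_target(detections):
    """From a list of (x, y, w, h) detection boxes, pick the largest one and
    return its center pixel, or None when nothing was detected."""
    if len(detections) == 0:
        return None
    x, y, w, h = max(detections, key=lambda d: d[2] * d[3])  # largest area wins
    return (x + w // 2, y + h // 2)
```

Choosing the largest box is a simple heuristic for "closest face"; other policies (e.g. the box nearest the frame center) would slot in the same way.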
Many experts who study drone detection choose to keep their work (including their classifiers and datasets) private. Furthermore, there isn't a drone-specific bespoke dataset available in Pakistan that can be used to train various methods. Because of this, it was necessary to gather our own dataset for this particular task. We developed our dataset and gave it the title AUDATS-UAV.
Nearly every machine learning algorithm has a specific dataset format, and Haar Cascade is no exception. Positive and negative images are needed to train a Haar cascade classifier. Negative images are those that contain no drones, and positive images solely contain drone images. Which kind of negative images to use is up to you; depending on the type of object you wish to identify, you can employ various situations, backdrops, and objects in the negative images.
In our case, we used 2000 negative images gathered from various sources along with 1500
positive images (self-captured). When training a Haar classifier, more negative images are
typically used than positive ones.
Positive and negative samples were kept in separate folders named "p" and "n," respectively.
We used the GUI tool Cascade Trainer GUI (built by Amin Ahmadi) to train our classifier on
this dataset.
Haar cascade classification is a machine-learning-based method in which a cascade function is
trained using a large number of positive and negative images; it is then used to detect objects in
other images based on that training. The trained classifiers are large individual .xml files with a
huge number of feature sets, and each XML file corresponds to one specific use case.
5.2.5 Block Diagram of Drone Detection Using the Custom Haar Cascade Classifier
[Block diagram: drone images and non-drone images, together with the training parameters and
Haar features, are fed into training, which produces the memory model of the drone (.xml).]
With a basic grasp of how the Haar cascade method functions, and of how to write an algorithm
that extracts the coordinates of the detected object and sends them to an Arduino over serial
communication for real-time tracking, the drone was detected and tracked using the Haar
cascade in the following phase.
The trained classifier was tested on drone video. Although it detects drones fairly well, it does
not reliably detect moving drones and is not very effective indoors. In other words, it works well
when the background is the sky or otherwise clear, but it struggles when the background changes
or the drone is moving.
YOLO is an abbreviation of "You Only Look Once." It is an algorithm that detects and
recognizes different objects in an image in real time. YOLO treats object detection as a
regression problem and returns the class probabilities of the detected objects. The YOLO
algorithm employs a convolutional neural network (CNN) to detect objects in real time. As the
name implies, the algorithm requires only one forward propagation through the neural network:
the entire image is processed in a single run, and the CNN predicts multiple class probabilities
and bounding boxes at the same time. The YOLO algorithm has several variants; Tiny YOLO
and YOLOv3 are two popular examples.
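The report does not show its inference code; the sketch below uses OpenCV's DNN module with YOLOv3 Darknet weights as one plausible route. The weight and config file names are assumptions, and non-max suppression is omitted for brevity:

```python
# Sketch of YOLOv3 inference via OpenCV's DNN module. The weight/config
# file names are assumptions; the report's own pipeline may differ.
def decode_box(cx, cy, w, h, frame_w, frame_h):
    """Convert YOLO's normalized centre-format box to pixel (x, y, w, h)."""
    bw, bh = int(w * frame_w), int(h * frame_h)
    x = int(cx * frame_w - bw / 2)
    y = int(cy * frame_h - bh / 2)
    return x, y, bw, bh

def detect(frame, conf_thresh=0.5):
    import cv2
    net = cv2.dnn.readNetFromDarknet("yolov3-drone.cfg", "yolov3-drone.weights")
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True)
    net.setInput(blob)
    boxes = []
    h, w = frame.shape[:2]
    # Each detection row is [cx, cy, w, h, objectness, class scores...]
    for output in net.forward(net.getUnconnectedOutLayersNames()):
        for det in output:
            score = float(det[5:].max())
            if score > conf_thresh:
                boxes.append(decode_box(*det[:4], w, h))
    return boxes        # (non-max suppression omitted for brevity)
```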
To begin, we recorded various videos from different angles and locations. About 120 videos
were taken with three mobile-phone cameras (Android and iPhone). Using Python, these videos
were converted into images, and any extra or raw images were removed. In the end, we obtained
12500 drone images.
Multiple cameras were used to record drone videos from various angles and locations (indoor
and outdoor). A Python script converts the videos into frames. Videos were recorded at 30
frames per second, and frames were extracted at three frames per second. Each image was
manually annotated with the LabelImg tool.
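A sketch of the video-to-frames conversion, assuming 30 fps footage downsampled to 3 fps (i.e., keeping every 10th frame); the file paths are placeholders, not the report's actual script:

```python
# Sketch of the video-to-frames conversion described above. Extracting at
# 3 fps from 30 fps footage means keeping every 10th frame.
def kept_indices(total_frames, src_fps=30, target_fps=3):
    """Indices of the frames to keep when downsampling the frame rate."""
    step = max(1, round(src_fps / target_fps))
    return list(range(0, total_frames, step))

def extract_frames(video_path, out_dir, src_fps=30, target_fps=3):
    import os
    import cv2
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    keep = set(kept_indices(int(cap.get(cv2.CAP_PROP_FRAME_COUNT)),
                            src_fps, target_fps))
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx in keep:
            cv2.imwrite(os.path.join(out_dir, f"frame_{saved:05d}.jpg"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved
```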
We gathered a dataset of around 12500 drone images and titled it AUDATS-UAV. We then
trained on 10000 of these images using a GPU in the Swarm Robotics Lab, National Center of
Robotics and Automation.
When tracking, our objective is to locate an object in the current frame given that we have
successfully tracked it in all (or nearly all) of the preceding frames. OpenCV's various object
tracking implementations can be used in computer vision applications. Let us see how the
various tracking algorithms perform.
The BOOSTING tracker: this algorithm is over ten years old and works reasonably, but we could
not find a solid reason to use it given the availability of more sophisticated trackers (MIL, KCF)
built on similar ideas. Its tracking results are only fair, and it cannot reliably determine when
tracking has failed.
The MIL tracker: its performance is acceptable. It performs admirably even when partially
occluded and does not drift as much as the BOOSTING tracker. If you are using OpenCV 3.0,
this might be the best tracker available to you; on a more recent version, consider KCF.
The KCF tracker: it reports tracking failure more reliably than BOOSTING and MIL, and it is
both more accurate and faster than MIL. However, it cannot recover from complete occlusion.
The TLD tracker: it performs well over several frames of occlusion and also tracks best across
scale changes. However, it produces numerous false positives, rendering it all but unusable.
The MEDIANFLOW tracker: it provides excellent tracking-failure reporting and works
incredibly well when there is no occlusion and the motion is predictable, but it fails under large
motion.
The MOSSE tracker is robust to changes in position, scale, illumination, and other non-rigid
deformations. It also detects occlusion using the peak-to-sidelobe ratio, which lets the tracker
pause and resume where it left off when the object reappears. The MOSSE tracker also operates
at a higher frame rate. In addition to these advantages, it is relatively simple to implement, more
precise than other sophisticated trackers, and faster.
The CSRT tracker performs object tracking with higher accuracy while operating at a
comparatively low frame rate (about 25 fps).
In contrast to deep neural networks and other detection techniques, mean shift does not require
any training. If only one drone needs to be tracked, there is no need to feed the computer
hundreds or thousands of labeled drone images; instead, the system analyzes the drone's initial
color input and then tracks it automatically thereafter.
An object is harder to track if its color or texture varies widely. The tracked object may also be
confused with a "busy" or "noisy" background, i.e., an image or video with a lot of color
fluctuation.
We implemented the detector together with the MOSSE tracker because of its pure speed.
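A minimal sketch of handing a detector ROI to OpenCV's MOSSE tracker. The `cv2.legacy` module name applies to recent OpenCV builds with the contrib tracking module (opencv-contrib-python); `center` is a small helper we add here for the trajectory:

```python
# Sketch: initialise a MOSSE tracker from a detector ROI and update it per
# frame, collecting the box centres as the drone's trajectory.
def center(box):
    """Centre point of an (x, y, w, h) bounding box."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def track(frames, roi):
    import cv2
    tracker = cv2.legacy.TrackerMOSSE_create()
    tracker.init(frames[0], roi)
    trajectory = [center(roi)]
    for frame in frames[1:]:
        ok, box = tracker.update(frame)
        if not ok:
            break                      # tracking lost: fall back to detection
        trajectory.append(center(box))
    return trajectory
```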
For detection-based tracking, the input is first taken from a 4K wide-angle camera, and the
detection algorithm is run on its frames. If a drone is detected, the ROI (region of interest) is
passed from the detector to the tracker. We then start taking input from the 1080p camera
mounted on the rotating turret, on which tracking begins. The output is displayed on the screen
in the form of a zoomed image, a binary image, and the trajectory of the tracked drone. The
videos and images of the tracked drone are saved at the back end of the system in a folder named
after the time and date at which the drone was tracked. Finally, the turret rotates to follow the
movement of the tracked drone.
[Block diagram: the 4K camera feeds frames to the detection algorithm, whose results go to the
display and to memory; the detected ROI is handed to the tracking algorithm running on the
1080p camera, which drives the turret.]
Initially, the 4K wide-angle camera monitors the environment at 30 fps (frames per second). If
no drone is detected, it keeps monitoring. If a drone is detected, a bounding box is drawn on the
detected target and a frame counter is incremented. Monitoring continues until the counter
reaches four; at that point, the bounding box of the detected target is passed to the tracker as the
ROI, and the 1080p camera starts tracking. If the target is lost, detection and monitoring start
again.
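The handoff logic above can be sketched as a small state machine; the `detect` and `track` arguments are stand-ins (any callable returning a box or None), not the report's actual implementations:

```python
# Sketch of the detect-then-track handoff described above: require four
# consecutive detections before trusting the ROI, then switch to tracking,
# and fall back to detection when the tracker loses the target.
DETECTING, TRACKING = "detecting", "tracking"

def run_pipeline(frames, detect, track, confirm_frames=4):
    """Yield (state, box) per frame. detect/track return a box or None."""
    state, hits, roi = DETECTING, 0, None
    for frame in frames:
        if state == DETECTING:
            box = detect(frame)
            hits = hits + 1 if box is not None else 0
            if hits >= confirm_frames:
                state, roi = TRACKING, box     # hand the ROI to the tracker
                hits = 0
            yield state, box
        else:
            roi = track(frame, roi)
            if roi is None:                    # target lost: detect again
                state = DETECTING
            yield state, roi
```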
The Haar Cascade detector detects the drone and passes the bounding box as ROI (Region of
Interest).
The Yolo v3 detector detects the drone and passes the bounding box as ROI (Region of Interest).
5.5.4 Comparing the Efficiency of the Detector Alone and the YOLOv3 Detector with the
MOSSE Tracker
With drone detection alone, the frame rate is 8 fps. When we combine the YOLOv3 detector
with the MOSSE tracker, the frame rate is 32 fps. The combination is used to improve
efficiency, giving 4x better performance than the detector alone.
Figure 5.26 (a) FPS rate of the detector alone; (b) FPS of the YOLOv3 detector combined with
the MOSSE tracker.
5.5.5 Demonstration
We conducted several indoor and outdoor tests to confirm the working of our project. Here we
show the final demonstration, in which the zoomed image, the binary image, the drone being
tracked, and its trajectory can all be clearly seen in one frame, along with the turret rotating in
pitch and yaw to track the drone effectively.
As soon as the drone is detected, the system alerts the admin by email, phone call, and SMS to
warn of a hostile UAV in their territory. The message and email state that a drone was detected,
along with the date and time of detection. Videos and images of the tracked drone are saved at
the back end in separate folders labeled with the date and time of tracking.
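A hedged sketch of the email leg of this alert, using Python's standard smtplib; the addresses, SMTP host, and credentials are placeholders, and the phone-call/SMS legs (typically a third-party service) are omitted:

```python
# Sketch of the email alert described above. Addresses, host, and
# credentials are placeholders; the SMS/phone-call legs are omitted.
import smtplib
from datetime import datetime
from email.message import EmailMessage

def build_alert(now=None):
    now = now or datetime.now()
    msg = EmailMessage()
    msg["Subject"] = "ALERT: hostile UAV detected"
    msg["From"] = "audats@example.com"
    msg["To"] = "admin@example.com"
    msg.set_content(
        f"Drone detected at {now:%Y-%m-%d %H:%M:%S}. "
        "Tracking footage is being saved on the system.")
    return msg

def send_alert(msg, host="smtp.example.com", user="audats", password="..."):
    with smtplib.SMTP(host, 587) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(msg)
```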
Figure 5.29 (c) Zoomed images along with the video of tracked drone
A confusion matrix is a table used to describe the performance of a classification model on a
dataset for which the true values are known. It is relatively simple to understand, but the related
terminology can be confusing. To define the terminology, we need to set a threshold (α = 0.6).
5.6.2 Terminologies:
True positives (TP): the classifier predicts yes and the actual case is positive.
True negatives (TN): the classifier predicts no and the actual case is also negative.
False positives (FP): the classifier predicts yes but the actual case is negative.
False negatives (FN): the classifier predicts no but the actual case is positive.
Confusion matrix (N = 2500):

                     Predicted Positive   Predicted Negative   Total
Actual Positive            1898                   92            1990
Actual Negative               0                  510             510
Total                      1898                  602            2500
From the confusion matrix, the following measures are computed:
• Accuracy.
• Precision.
• Recall.
• Error rate.
• Specificity.
• F1 Score.
Accuracy:

Accuracy = (TP + TN) / N = (1898 + 510) / 2500 = 0.9632
Precision:
It reflects the reliability with which the model classifies a sample as positive.

Precision = TP / (TP + FP) = 1898 / (1898 + 0) = 1.0

Precision in % = 100%
Recall:
Recall, the fraction of relevant cases retrieved, is also known as sensitivity; it measures the
model's ability to detect positive samples.

Recall = TP / (TP + FN) = 1898 / (1898 + 92) = 0.9537

Recall in % = 95.37%
Error Rate:
The error rate indicates how often the model makes false predictions; it is also known as the
misclassification rate.

Error rate = (FP + FN) / N = (0 + 92) / 2500 = 0.0368
Specificity:
Specificity, also known as the true negative rate (TNR), is the proportion of negative samples
that receive a negative test result.

Specificity = TN / (TN + FP) = 510 / (510 + 0) = 1.0

Specificity in % = 100%
F1 Score:
Precision and recall are the two components of the F1 score, which is defined as their harmonic
mean.
• A model receives a high F1 score if both precision and recall are high.
• A model receives a low F1 score if both precision and recall are low.
• A model receives a middling F1 score if one of precision and recall is low and the other is
high.

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
         = 2 × (1 × 0.9537) / (1 + 0.9537) = 0.976

F1 Score in % = 97.6%
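These figures can be verified with a few lines of Python; the counts come from the confusion matrix above:

```python
# Recomputing the report's metrics from its confusion-matrix counts
# (TP = 1898, FN = 92, FP = 0, TN = 510, N = 2500).
def metrics(tp, fn, fp, tn):
    n = tp + fn + fp + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / n,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "specificity": tn / (tn + fp),
        "error_rate": (fp + fn) / n,
    }

m = metrics(tp=1898, fn=92, fp=0, tn=510)
# accuracy 0.9632, precision 1.0, recall ~0.9537, F1 ~0.976
```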
CHAPTER 6
CONCLUSION AND FUTURE WORK
6.1 Conclusion:
A drone detection system with the YOLOv3 algorithm and MOSSE tracker was implemented
and tested using drone images from our custom dataset, from the internet, and from real-time
drone images and videos. We used a sequence of image processing algorithms to perform real-
time object detection and tracking, which is useful for applications that require a real-time
response. We implemented a fully autonomous drone surveillance system based on RGB
cameras and computer vision. With appropriate changes, the implemented scheme can be used,
partially or fully, for other video surveillance tasks beyond countering drone activity. The key
point is that frames from multiple cameras, in the proper configuration, are used for detection
and tracking. Compared to conventional object detection methods, it exhibits higher
performance in terms of accuracy and speed.
REFERENCES
[1] Wu, M., Xie, W., Shi, X., Shao, P. and Shi, Z., 2018, July. Real-time drone
detection using deep learning approach. In International Conference on Machine
Learning and Intelligent Communications (pp. 22-32).
[2] Kashiyama, T., Sobue, H. and Sekimoto, Y., 2021. "Sky Monitoring System
for Flying Object Detection Using 4K Resolution Camera."
[3] Unlu, E., Zenou, E., Riviere, N. and Dupouy, P.E., 2019. Autonomous drone
surveillance and tracking architecture. Electronic Imaging, 2019(15), pp. 35-1.
[4] Kashiyama, T., Sobue, H. and Sekimoto, Y., 2020. Sky Monitoring System
for Flying Object Detection Using 4K Resolution Camera. Sensors, 20(24),
p. 7071.
[5] Opromolla, R., Inchingolo, G. and Fasano, G., 2019. Airborne visual
detection and tracking of cooperative UAVs exploiting deep
learning. Sensors, 19(19), p.4332.
[6] Unlu, E., Zenou, E., Riviere, N. and Dupouy, P.E., 2019. Deep learning-based
strategies for the detection and tracking of drones using several cameras. IPSJ
Transactions on Computer Vision and Applications, 11(1), pp.1-13.
[7] I. Bisio, C. Garibotto, F. Lavagetto, A. Sciarrone and S. Zappatore,
"Unauthorized Amateur UAV Detection Based on WiFi Statistical Fingerprint
Analysis," in IEEE Communications Magazine, vol. 56, no. 4, pp. 106-111, April
2018, DOI: 10.1109/MCOM.2018.1700340.
[8] Svanström, F., Englund, C. and Alonso-Fernandez, F., 2021, January. Real-
time drone detection and tracking with visible, thermal and acoustic sensors. In
2020 25th International Conference on Pattern Recognition (ICPR) (pp. 7265-
7272). IEEE.
[10] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand,
T., Andreetto, M. and Adam, H., 2017. Mobilenets: Efficient convolutional
neural networks for mobile vision applications. arxiv preprint arXiv:1704.04861.
[11] Girshick, R., Donahue, J., Darrell, T. and Malik, J., 2014. Rich feature
hierarchies for accurate object detection and semantic segmentation. In
Proceedings of the IEEE conference on computer vision and pattern recognition
(pp. 580-587).
[12] Li, C., Sun, X. and Cai, J., 2019. Intelligent Mobile Drone System Based on
Real-Time Object Detection. Journal on Artificial Intelligence, 1(1), pp.1-8.
[13] J. Peng, C. Zheng, P. Lv, T. Cui, Y. Cheng and S. Lingyu, "Using images
rendered by PBRT to train faster R-CNN for UAV detection", Proc. Comput. Sci.
Res. Notes, pp. 13-18, 2018.
[14] Arduino A000066 datasheet. [online] Available at:
https://datasheet.octopart.com/A000066-Arduino-datasheet-38879526.pdf
[Accessed 1 June 2022].
[15] Singha, S. and Aydin, B., 2021. Automated Drone Detection Using
YOLOv4. Drones, 5(3), p.95.
[16] Reader's Digest, 2022. What Happens When a Plane Collides with a Flock
of Birds? [online] Available at: https://www.rd.com/article/plane-surrounded-
by-birds/ [Accessed 6 July 2022].
[18] TechPowerUp, 2022. NVIDIA A100 SXM4 40 GB Specs. [online]
Available at: https://www.techpowerup.com/gpu-
specs/a100-sxm4-40-gb.c3506 [Accessed 6 July 2022].