
Intelligent Monitoring for Anomaly Recognition

using CNN and YOLOv9


Thesis submitted in partial fulfilment
of the requirements of the degree of
Bachelor of Technology
in
Computer Engineering
by
Abhishek Mohan Hundalekar

2021-B-13072001

Vamshi Rajkumar Naidu

2021-B-29022000

Siddesh Pingale

2021-B-18032002

Under the Supervision of


Dr. Vishal Shirsath

May 2024
School of Engineering
Ajeenkya DY Patil University, Pune
May 17, 2024

CERTIFICATE

This is to certify that the dissertation entitled “Intelligent Monitoring for Anomaly
Recognition using CNN and YOLOv9” is a bonafide work of Abhishek
Hundalekar (2021-B-13072001), Vamshi Naidu (2021-B-29022000) and Siddesh
Pingale (2021-B-18032002) submitted to the School of Engineering, Ajeenkya D Y
Patil University, Pune in partial fulfilment of the requirement for the award of the
degree of “Bachelor of Technology in Computer Engineering”.

Dr. Vishal Shirsath


Supervisor

Internal-Examiner/s External Examiner

Dr. Biswajeet Champaty


Dean-School of Engineering
May 17, 2024

Supervisor’s Certificate

This is to certify that the dissertation entitled “Intelligent Monitoring for Anomaly
Recognition using CNN and YOLOv9” submitted by Abhishek Hundalekar (2021-
B-13072001), Vamshi Naidu (2021-B-29022000) and Siddesh Pingale (2021-B-
18032002) is a record of original work carried out by them under my supervision and
guidance in partial fulfilment of the requirements of the degree of Bachelor of
Technology in Computer Engineering at School of Engineering, Ajeenkya DY
Patil University, Pune, Maharashtra-412105. Neither this dissertation nor any part of
it has been submitted earlier for any degree or diploma to any institute or university in
India or abroad.

Dr. Vishal Shirsath


Supervisor

Declaration of Originality
We, Abhishek Hundalekar (2021-B-13072001), Vamshi Naidu (2021-B-29022000) and Siddesh
Pingale (2021-B-18032002), hereby declare that this dissertation entitled “Intelligent Monitoring for
Anomaly Recognition using CNN and YOLOv9” presents our original work carried out as bachelor
students of School of Engineering, Ajeenkya D Y Patil University, Pune, Maharashtra. To the best of
our knowledge, this dissertation contains no material previously published or written by another person,
nor any material presented by us for the award of any degree or diploma of Ajeenkya D Y Patil
University, Pune or any other institution. Any contribution made to this research by others, with whom
we have worked at Ajeenkya D Y Patil University, Pune or elsewhere, is explicitly acknowledged in the
dissertation. Works of other authors cited in this dissertation have been duly acknowledged under the
sections “References” or “Bibliography”. We also declare that we have adhered to all principles of
academic honesty and integrity and have not misrepresented, fabricated, or falsified any
idea/data/fact/source in our submission.

We are fully aware that in case of any non-compliance detected in future, the Academic Council of
Ajeenkya D Y Patil University, Pune may withdraw the degree awarded to us on the basis of the
present dissertation.

Date:
Place: Lohegaon, Pune
Abhishek Hundalekar

Vamshi Naidu Siddesh Pingale

Acknowledgement

First and foremost, we would like to express our sincere gratitude to our supervisor, Dr. Vishal
Shirsath, for his invaluable guidance, encouragement, and support throughout this research project.
We are also deeply grateful to Dr. Biswajeet Champaty, Dean of the School of Engineering, for his
leadership and encouragement.

We extend our heartfelt thanks to Ajeenkya D Y Patil University for providing us with the
necessary resources to conduct this study. Our appreciation also goes out to the teaching and non-
teaching staff of the university, whose support has been indispensable.

We are deeply appreciative of our research participants, who generously gave their time and
shared their experiences with us. Additionally, we thank our colleagues and friends for their valuable
feedback and unwavering support throughout the research process.

Finally, we would like to acknowledge that working on this project has been a truly fulfilling
experience.

Abstract

In today's security contexts, guns must be detected quickly and precisely in order to protect the
populace. Using CNN techniques with the YOLOv9 object detection framework, this research
study presents a novel approach for real-time weapon identification in both live and prerecorded videos.
By integrating YOLOv9, object detection accuracy and speed are considerably improved, facilitating
the quick identification of possible threats. The presented method exhibits strong performance in
various lighting settings and environments, with excellent recall and precision demonstrated through
thorough testing and assessment. The approach uses a CNN-based architecture and deep learning to
effectively detect and categorize weapons in video frames, achieving 97.62% accuracy. The research
offers insights into how cutting-edge computer vision technology might be applied to enhance public
safety and security protocols, with implications beyond security concerns. This work contributes to the
ongoing discussion about applying advanced technological solutions to contemporary security
challenges by highlighting the potential of upcoming developments in visual analysis and deep learning
for community use. Furthermore, this study underscores the
importance of interdisciplinary collaboration in advancing technological solutions for societal benefit.
Through the integration of computer vision, artificial intelligence, and security, researchers may
promote creativity and create comprehensive strategies to tackle intricate problems. Moreover, the
findings of this research have implications beyond security, extending into areas such as public health,
disaster response, and urban planning. As technology continues to evolve, there is immense potential
for the integration of computer vision and deep learning methodologies into various facets of public
life, enhancing safety, efficiency, and overall quality of living. This study thus serves as a catalyst for
further exploration and innovation in leveraging state-of-the-art technologies to create safer and more
resilient communities worldwide. This research not only achieves high weapon identification accuracy
rates, but it also optimizes computational resources to guarantee the scalability and effectiveness of the
suggested system. By combining sophisticated CNN architectures with the YOLOv9 framework, the
model performs well in a variety of settings with different illumination, angles, and types of firearms.
Moreover, the system's real-time capabilities are highlighted, allowing for quick reactions to possible
threats in contexts where security is a concern. The methodology presented in this study represents a
major improvement in maintaining public safety and security since it not only improves the speed and
accuracy of weapon detection but also demonstrates its applicability to real-world deployment.
Contents
CHAPTER 1: INTRODUCTION
1.1 Introduction 11
CHAPTER 2: LITERATURE SURVEY

2.1 Literature Survey 12


CHAPTER 3: METHODOLOGY
3.1 Methodology 18
3.2 Calculations 20
3.3 Dataset and Accuracy 21
3.4 Deep Learning 23
3.5 Deep Learning Types 26
3.6 Deep Learning Applications 29
3.7 Deep Learning in Computer Vision 31
3.8 Deep Learning Implementation 34
3.9 CNN 36
3.10 CNN Types 39
3.11 CNN Applications 41
3.12 CNN in Computer Vision 44
3.13 CNN Implementation 46
CHAPTER 4: RESULTS AND DISCUSSION
4.1 Experimental Results 49
4.2 Performance Measurements 50
CHAPTER 5: CONCLUSION

5.1 Conclusion 53

REFERENCES 55

List of Figures

1.1 Methodology 20
1.2 Bar Chart 23
1.3 Deep Learning 24
1.4 Autoencoders 28
1.5 Image Classification 30
1.6 Object Detection 32
1.7 Transfer Learning 34
1.8 CNN 37
1.9 Medical Image Analysis 43
1.10 Handgun 49
1.11 AK-47 49
1.12 UZI 50
1.13 train/box_loss 51
1.14 train/cls_loss 51
1.15 train/dfl_loss 52

List of Tables

1.1 Data and Accuracy 22

List of Abbreviations

AI Artificial Intelligence
ANN Artificial Neural Network
API Application Programming Interface
BERT Bidirectional Encoder Representations from Transformers
CNN Convolutional Neural Network
DAE Denoising Autoencoder
DL Deep Learning
DRL Deep Reinforcement Learning
GAN Generative Adversarial Network
GPT Generative Pretrained Transformer
GRU Gated Recurrent Unit
IoU Intersection over Union
LSTM Long Short-Term Memory
RNN Recurrent Neural Network
SGD Stochastic Gradient Descent
T5 Text-To-Text Transfer Transformer
VAE Variational Autoencoder
YOLO You Only Look Once

Chapter 1
INTRODUCTION
1.1 Introduction
The world of international security has seen a startling increase in gun-related violence in recent years,
from terrorist attacks to mass shootings. This increase in violence highlights how urgent it is to
strengthen security procedures and safeguard public areas, organizations, and neighborhoods. In the
face of sophisticated threats and changing settings, conventional weapon detection techniques that depend
on human observation and metal detectors are becoming less and less effective. In light of
this, the combination of deep learning and artificial intelligence (AI) presents a viable path forward for
improving weapon identification capabilities. Through the integration of computer vision, artificial
intelligence, and security, researchers may promote creativity and create comprehensive strategies to
tackle intricate problems. These artificial intelligence (AI) systems have the ability to go beyond the
constraints of manual inspection techniques, allowing for quick and precise threat assessment while
reducing false alarms. The main focus of this study is on the usage of YOLOv9, an enhanced version
of the YOLO object detection framework that is well known for its efficacy and accuracy in real-time image
processing applications. By integrating YOLOv9 with deep learning architectures, our research seeks
to push the boundaries of weapon identification and achieve previously unheard-of levels of detection
accuracy and speed. In conclusion, our work represents a deliberate endeavour to modernize and refine
weapon detection techniques while enhancing societal resilience through the application of cutting-
edge artificial intelligence techniques. Our goal is to equip security stakeholders with the necessary
tools and knowledge to successfully counter emerging threats and safeguard lives by embracing
technology innovation and interdisciplinary collaboration.


Chapter 2

LITERATURE SURVEY
2.1 Literature Survey:
1. Mohammad Zahrawi & Khaled Shaalan (2023) - “Improving video surveillance systems in banks
using deep learning techniques”, The paper introduces a novel method for real-time weapon
detection using YOLO and SSD object detection systems, aiming to enhance security in various
settings like banks and ATMs. It emphasizes reducing false alarms to improve real-life
applicability. This research aligns with our project's goal of implementing real-time weapon
detection in surveillance systems, utilizing state-of-the-art object detection models to enhance
accuracy and reduce false alarms in security applications [1].

2. Shehzad Khalid, Onome Christopher Edo, Abdullah Waqar, Imokhai Theophilus Tenebe & Hoor
Ul Ain Tahir (2023) - “Weapon detection system for surveillance and security”, This study presents
a state-of-the-art weapon detection system using YOLO V5 and Mask-RCNN, achieving high
accuracy rates in detecting firearms. It focuses on robustness against various challenges like
rotation, occlusion, and size variations. This paper is relevant to our project as it showcases
advanced deep learning techniques for weapon detection, emphasizing the importance of timely
detection to prevent potential threats in public security scenarios [2].

3. Abdul Hanan Ashraf, Muhammad Imran, Abdulrahman M. Qahtani, Abdulmajeed Alsufyani,


Omar Almutiry, Awais Mahmood, Muhammad Attique & Mohamed Habib (2022) - “Weapons
Detection for Security and Video Surveillance Using CNN and YOLO-V5s”, The research
addresses the need for automated weapon detection using YOLO and AOI to minimize false
negatives and false positives in real-time. It highlights the importance of speed and accuracy in
surveillance systems. This work aligns with our project's objective of optimizing weapon detection


systems for speed and accuracy, leveraging YOLO-based frameworks to enhance performance in
security applications [3].

4. Roberto Olmos, Siham Tabik, Alberto Lamas, Francisco Pérez-Hernández & Francisco Herrera
(2019) - “A binocular image fusion approach for minimizing false positives in handgun detection
with deep learning”, The paper proposes a novel approach using binocular image fusion to reduce
false positives in object detection, specifically targeting handgun detection in low-quality
surveillance videos. This research contributes to our project by focusing on improving object
detection in surveillance videos, aiming to reduce false positives and enhance overall performance
in identifying weapons [4].

5. B. Abruzzo, K. Carey, C. Lowrance, E. Sturzinger, R. Arnold & C. Korpela (2019) - “Cascaded


neural networks for identification and posture-based threat assessment of armed people”, This
study presents a multistage classifier for identifying people and handguns in images, assessing
threat levels based on body posture, achieving near real-time processing on a desktop computer.
This research is relevant to our project as it explores advanced techniques for person and weapon
detection, offering insights into threat assessment based on body pose, which could enhance our
system's capabilities for public security applications [5].

6. Nashwan Jasim Hussein & Fei Hu Huazhong (2016) - “An Alternative Method to Discover
Concealed Weapon Detection Using Critical Fusion Image of Color Image and Infrared Image”,
The paper proposes a robust framework for detecting hidden weapons using color and infrared
image fusion with discrete wavelet transform (DWT) analysis and thresholding segmentation.
Real-time RGB and infrared images are utilized for automatic weapon detection. This research
aligns with our project's aim to enhance weapon detection accuracy using image processing and
sensor technologies, focusing on innovative methods like image fusion for improved detection
capabilities [6].


7. Dr Raman Dugyala, M Vishnu Vardhan Reddy, Ch Tharun Reddy & G Vijendar (2023) - “Weapon
Detection in Surveillance Videos Using YOLOv8 and PELSF-DCNN”, The study
introduces a weapon detection (WD) model using PELSF-DCNN, combining YOLOv8 for object
detection and motion estimation algorithms. The proposed method demonstrates improved
efficiency in detecting weapons from video frames. This paper contributes to our project by
showcasing advanced deep learning algorithms like PELSF-DCNN and YOLOv8 for real-time
weapon detection, aligning with our goal to leverage cutting-edge AI techniques [7].

8. Muhammad Tahir Bhatti, Muhammad Gufran Khan, Masood Aslam & Muhammad Junaid Fiaz
(2021) - “Weapon Detection in Real-Time CCTV Videos Using Deep Learning”, The research
emphasizes the importance of CCTV surveillance for automatic detection of illegal activities,
particularly focusing on weapon detection using open-source deep learning algorithms like
YOLOv4. The study creates its dataset to address real-time scenarios. This paper informs our
project by highlighting the significance of CCTV-based weapon detection systems, incorporating
state-of-the-art deep learning models like YOLOv4 to enhance detection accuracy and efficiency
[8].

9. Pankaj Bhambri, Sachin Bagga, Dhanuka Priya, Harnoor Singh & Harleen Kaur Dhiman (2020) -
“Suspicious Human Activity Detection System”, The paper discusses the use of anomaly detection
systems with machine learning for behavioral analysis, aiming to predict and identify unusual
occurrences. The focus is on leveraging AI for detecting anomalies related to criminal activities.
This research contributes to our project's broader security context by emphasizing the role of
anomaly detection in addressing crime rates, aligning with our goal of enhancing public safety
through advanced AI systems [9].

10. Ms. U. M. Kamthe & Dr. C. G. Patil (2018) - “Suspicious Activity Recognition in Video
Surveillance System”, The study explores the role of video surveillance in detecting criminal
activities like gun-related crimes using neural network models such as Faster R-CNN and


YOLOv3. It emphasizes the importance of automated surveillance systems for public safety. This
paper complements our project by highlighting the significance of automated video surveillance
for detecting weapon-related crimes, leveraging advanced neural network techniques like YOLOv3
to enhance detection capabilities [10].

11. Nandini Fal Dessai & Prof. Shruti Pednekar (2023) - “Surveillance-based Suspicious Activity
Detection: Techniques, Application and Challenges”, This paper explores using Long-term
Recurrent Convolutional Network (LRCN) systems for real-time detection of suspicious activities
in surveillance videos, emphasizing the importance of intelligent video surveillance for security. It
aligns with our project's goal of leveraging deep learning to automatically identify and classify
suspicious behaviors in public spaces, enhancing overall security measures [11].

12. Digambar Kauthkar, Snehal Pingle, Vijay Bansode, Pooja Idalkanthe & Prof. Sunita Vani (2022)
– “Suspicious Activity Detection using CNN”, The research focuses on utilizing deep learning
techniques, particularly convolutional neural networks (CNNs), for detecting suspicious human
activities and altercations in real-time surveillance footage. This aligns closely with our project's
objective of implementing CNN-based models to recognize and respond to potential threats
promptly in public environments [12].

13. Sathyajit Loganathan, Gayashan Kariyawasam & Prasanna Sumathipala (2019) - "Suspicious
Activity Detection in Surveillance Footage", The study investigates detecting suspicious activities
like gun-based crimes and abandoned luggage using deep neural network models, emphasizing the
need for automated surveillance systems to minimize security risks. This directly relates to our
project's aim of deploying AI-based solutions to enhance security monitoring and threat detection
in public areas [13].

14. Suganya, K., Pavithra, A., Ranjani, R., Saktheswari & Seethai, R. (2023) - “Weapon detection
using machine learning algorithm”, This research highlights the importance of integrating machine


learning, specifically weapon detection using YOLO (You Only Look Once), into surveillance
systems to improve security efficiency and response time. It resonates with our project's approach
of leveraging advanced object detection methods like YOLO for real-time identification of
potential threats in public spaces [14].

15. Dr Raman Dugyala, M Vishnu Vardhan Reddy, Ch Tharun Reddy & G Vijendar (2023) - “Weapon
detection in surveillance videos using YOLOV8 and PELSF-DCNN”, This paper proposes a
weapon detection (WD) model using a PELSF-DCNN approach to identify potentially violent
situations, overcoming challenges faced by traditional deep learning (DL) algorithms in detecting
weapons. By integrating YOLOv8 for object detection and motion estimation techniques, this
model demonstrates improved efficiency in identifying weapons in surveillance video frames. This
research aligns with our project's focus on leveraging advanced DL methods like YOLO for real-
time weapon detection, contributing to enhanced security monitoring and threat mitigation in public
settings [15].

16. Murugan, P., Ida, A. M., Aashika, R., Brillian, S. A., Ancy, M. S., & Sneha (2023) - “Weapon
detection system using deep learning”, This paper explores real-time weapon detection using the
YOLO object detection algorithm, achieving high accuracy in identifying firearms, knives, and
other weapons. It demonstrates the effectiveness of YOLOv8 in enhancing detection performance
across various scenarios. This research aligns with our project's focus on leveraging YOLO and
deep learning for efficient weapon detection, showcasing YOLOv8's capabilities in improving
detection accuracy and speed [16].

17. Ahmed Abdelmoamen Ahmed & Mathias Echi (2021) - “Hawk-Eye: An AI-Powered Threat
Detector for Intelligent Surveillance Cameras”, The paper presents Hawk-Eye, an AI-powered
threat detector for smart surveillance cameras, capable of detecting weapons, masked faces, and
suspicious objects in real-time, using both cloud-based and edge computing. Hawk-Eye's approach
of utilizing AI for surveillance aligns with our project's aim to enhance security monitoring through
automated weapon detection, complementing our focus on real-time threat identification [17].


18. Kumawat, D., Abhayankar, D., & Tanwani, S. (2024) – “Exploring Object Detection
Algorithms and implementation of YOLOv7 and YOLOv8 based model for weapon detection”,
This study evaluates YOLO algorithms (YOLOv7 and YOLOv8) for weapon detection, achieving
high mean Average Precision (mAP) scores. It compares various object detection techniques and
highlights performance metrics. The paper's focus on YOLO algorithms resonates with our
project's approach, emphasizing the importance of evaluating and improving deep learning models
for accurate and efficient weapon detection [18].

19. P. Yadav, N. Gupta & P. K. Sharma (2023) - “Robust Weapon Detection in Dark Environments using
YOLOv7-DarkVision”, The paper proposes YOLOv7-DarkVision, a modified YOLOv7 model for
accurate weapon detection in low-light conditions. It demonstrates robust detection performance in
challenging nighttime environments. This paper's emphasis on adapting YOLO for low-light
scenarios resonates with our project's interest in improving weapon detection under varying
environmental conditions, such as nighttime surveillance [19].

20. G. F. Shidik, E. Noersasongko, A. Nugraha, P. N. Andono, J. Jumanto & E. J. Kusuma (2019)
- “A systematic review of intelligence video surveillance: Trends, techniques, frameworks, and
datasets”, This paper introduces an object tracking and zooming algorithm for surveillance
systems, enhancing resolution and conserving disk space by focusing on active targets. While not
directly focused on weapon detection, the paper's approach to enhancing surveillance capabilities
aligns with our broader interest in improving security monitoring through innovative surveillance
technologies [20].


Chapter 3

METHODOLOGY

3.1 Methodology

1. Client-Side Interfaces:
• The client-side interfaces play a pivotal role in user interaction within our system. This
encompasses the login and registration pages, which serve as the primary entry points for users.
• Upon accessing the system for the first time, users are directed to the registration page to create an
account. Subsequently, they are prompted to log in during subsequent visits.
• The user experience on these interfaces is designed to be intuitive and seamless, ensuring efficient
onboarding and authentication processes.

2. Selection of Videos:
• Following successful authentication, users have the option to select a video for object detection.
• The system provides flexibility by allowing users to choose between live video streams and
prerecorded footage, catering to diverse operational preferences and requirements.

3. Utilization of YOLOv9 for Object Detection:


• Our system leverages YOLOv9, the latest iteration of the YOLO object detection framework,
renowned for its exceptional speed and accuracy in object recognition tasks.
• YOLOv9 enables real-time inference by quickly predicting bounding boxes and class
probabilities for observed objects in a given video frame using a unified neural network
architecture.

4. Alert Mechanism for Weapon Detection:


• Upon selection of a video, the YOLOv9 model analyzes each frame to identify firearms.

18
Chapter 3 Methodology

• In the event of weapon detection, an alert is triggered, and users are promptly notified via email or
mobile device, ensuring timely response to security threats.
• Conversely, if no weapons are detected, users still receive a notification, reaffirming continuous
monitoring and providing feedback on system status.

5. Server-side Component for Data Processing:


• Complementing the client-side interfaces, the server-side component orchestrates data processing
and facilitates communication between clients and the database.
• This ensures seamless operation of the system, enabling efficient data exchange and timely
responses to security events.

6. Integration of Object Detection Logic:


• The seamless integration of object detection logic within the application enables rapid and precise
identification of weapons.
• By incorporating advanced AI techniques, our system enhances security measures and provides
users with peace of mind in today's complex security landscape.

7. API and Database Integration:


• The system features an API for seamless communication between client-side and server-side
components, facilitating efficient data exchange and system operation.
• A robust database architecture is employed to store user data and video metadata, ensuring
scalability and reliability of the system's data management capabilities.

8. Enhanced Security Measures:


• Through the utilization of YOLOv9's capabilities and the incorporation of advanced object
detection techniques, our system strengthens security measures and enhances threat detection
capabilities.
• The user-friendly interface and comprehensive notification system further contribute to improved
security protocols, empowering users with effective tools for risk mitigation and response.
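
The end-to-end flow described above can be made concrete with a short sketch. This is a minimal illustration, not the deployed system: it assumes the Ultralytics Python package for YOLOv9 inference and OpenCV for video decoding, and the custom weights file, SMTP host, addresses, and class names are all placeholders.

```python
# Minimal sketch of the detection-and-alert loop described above.
# Assumes the Ultralytics package (pip install ultralytics opencv-python);
# the weights file, SMTP settings, and class names are illustrative placeholders.
import smtplib
from email.message import EmailMessage

import cv2
from ultralytics import YOLO

WEAPON_CLASSES = {"handgun", "ak-47", "uzi", "knife"}  # hypothetical label names

def send_alert(label: str, confidence: float) -> None:
    """Notify the user by email when a weapon is detected."""
    msg = EmailMessage()
    msg["Subject"] = f"Weapon alert: {label} ({confidence:.0%})"
    msg["From"] = "alerts@example.com"           # placeholder sender
    msg["To"] = "security-team@example.com"      # placeholder recipient
    msg.set_content(f"Detected {label} with confidence {confidence:.2f}.")
    with smtplib.SMTP("smtp.example.com") as smtp:  # placeholder SMTP host
        smtp.send_message(msg)

model = YOLO("weapons_yolov9.pt")    # hypothetical custom-trained weights
cap = cv2.VideoCapture("input.mp4")  # or 0 for a live camera stream

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # Run YOLOv9 inference on the frame; each result holds boxes, classes, scores.
    for result in model(frame, verbose=False):
        for box in result.boxes:
            label = model.names[int(box.cls)]
            conf = float(box.conf)
            if label in WEAPON_CLASSES and conf > 0.5:
                send_alert(label, conf)
cap.release()
```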

Figure 1: Methodology

3.2 Calculations
1. Intersection over Union (IoU)
For every anticipated bounding box and its matching ground truth box, the IoU is computed. This is
the equation:
IoU = Intersection Area / Union Area
Intersection Area: the region where the predicted bounding box and the ground truth box overlap.
Union Area: the total area covered by the predicted box and the ground truth box together.
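
A minimal sketch of this computation in Python, assuming boxes are given as (x1, y1, x2, y2) corner coordinates:

```python
def iou(box_a, box_b):
    """Compute IoU for two boxes given as (x1, y1, x2, y2) corner coordinates."""
    # Coordinates of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)    # intersection area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter                  # union area
    return inter / union if union > 0 else 0.0

# Example: two partially overlapping 10x10 boxes.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```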

2. Binary Cross-Entropy Loss (Classification Loss):


The difference between the ground truth class labels (y_i) and the predicted class probabilities (p_i) is
measured by this formula:
Loss = - Σ_i (y_i * log(p_i) + (1 - y_i) * log(1 - p_i))
where,
Σ_i: sum over all classes.
y_i: ground truth label for class i (1 for positive classes, 0 for negative classes).
p_i: the model's predicted probability for class i.

3. Smooth L1 Loss (Bounding Box Regression Loss):


For a given anchor box, this formula determines the difference between the ground truth offsets (t) and
the expected bounding box offsets (p).
Loss = Σ smooth_L1(p_t - t_t)
where,
Σ: sum over all bounding box coordinates (x, y, width, and height).
smooth_L1(x): the smoothed L1 function, which, in contrast to the conventional L1 loss, reduces the
cost for small errors.
p_t: the predicted offset value for a given bounding box coordinate.
t_t: the ground truth offset value for the corresponding coordinate.
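
Both losses can be checked numerically with a small NumPy sketch; the label and offset values below are made up purely for illustration:

```python
import numpy as np

def binary_cross_entropy(y, p, eps=1e-7):
    """Loss = -Σ (y·log(p) + (1-y)·log(1-p)) over all classes."""
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def smooth_l1(pred, target):
    """Σ smooth_L1(p - t): 0.5x² for |x| < 1, |x| - 0.5 otherwise."""
    diff = np.abs(pred - target)
    per_coord = np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5)
    return np.sum(per_coord)

y = np.array([1, 0, 0])            # ground truth labels
p = np.array([0.9, 0.2, 0.1])      # predicted class probabilities
print(binary_cross_entropy(y, p))  # ≈ 0.434

pred = np.array([0.5, 0.5, 2.0, 2.0])    # predicted box offsets (x, y, w, h)
target = np.array([0.4, 0.6, 1.5, 2.5])  # ground truth offsets
print(smooth_l1(pred, target))           # ≈ 0.26
```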

4. Sigmoid and Softmax:


These activation functions are applied to the final output layer of the network.
Sigmoid: used for binary classification (one object class vs. background). It maps output values to the
range 0 to 1, indicating the likelihood that the object class is present.
Formula: Sigmoid(x) = 1 / (1 + e^(-x))

Softmax: used for multi-class classification. It normalizes the output scores so that they sum to 1,
representing a probability distribution over the classes.
Formula: Softmax(x) = e^(x_i) / Σ(e^(x_j)) for all classes j.
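
Both functions are easy to verify numerically; a NumPy sketch (the max-subtraction trick in softmax is a standard numerical-stability addition, not spelled out in the formula above):

```python
import numpy as np

def sigmoid(x):
    """Sigmoid(x) = 1 / (1 + e^(-x)): maps a score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    """Softmax(x_i) = e^(x_i) / Σ_j e^(x_j): normalizes scores to sum to 1."""
    e = np.exp(x - np.max(x))  # subtract max for numerical stability
    return e / e.sum()

print(sigmoid(2.0))                        # ≈ 0.881, "object present" probability
print(softmax(np.array([2.0, 1.0, 0.1])))  # ≈ [0.659, 0.242, 0.099], sums to 1
```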

3.3 Dataset and Accuracy

In order to conduct this study, we gathered and carefully curated a bespoke dataset made up of images
of weapons that are frequently seen in security situations. Each type of weapon was captured through
painstaking image collection and storage, with many instances and orientations per weapon.


1. Handgun: This category has 1260 high-resolution photos that show pistols from various viewpoints
and angles. To guarantee variation in background circumstances and lighting, the pistols were
methodically photographed in controlled situations. Every picture in this category features a variety
of handgun models.

2. AK-47: The 1257 photos in the AK-47 dataset show this well-known assault rifle from various
angles. The photos were meticulously taken, catching the unique qualities and traits of the AK-47
in a range of firing situations. Every picture captures the distinct structural aspects and design
features of the AK-47.

3. UZI: This dataset, which features 1417 carefully taken pictures of the UZI submachine gun, shows
the weapon in various positions and environments. The UZI dataset provides extensive coverage
of this gun, ranging from broad frames showcasing its form factor to close-ups emphasizing minute
details.

4. Knife: The knife dataset consists of 1020 pictures that show various kinds of knives, including
utility knives, combat knives, and other bladed weapons. Every photograph showcases the
unique characteristics and blade layouts of the knives, providing a thorough portrayal of this class
of weapon.

Table 1: Data and Accuracy

Sr. No. | Images | Category | Accuracy
--------|--------|----------|---------
1       | 1020   | Knife    | 92%
2       | 1260   | Handgun  | 95%
3       | 1257   | AK-47    | 94%
4       | 1417   | UZI      | 97%


YOLOv9 excels at analyzing objects in photos and videos with accuracy and speed, which makes it
suitable for real-time applications such as surveillance, traffic monitoring, and robotics.

Object recognition: YOLOv9 can recognize and classify many kinds of objects, making it useful for
tasks such as image recognition and analysis.

Scene understanding: by detecting and classifying objects in difficult situations, YOLOv9 can help
interpret the content of images and videos, which is important for applications such as content
moderation and augmented reality.

Instance segmentation: YOLOv9 can perform instance segmentation, which involves identifying each
object in an image along with its boundaries, allowing for more detailed analysis and understanding of
visual content. These capabilities make YOLOv9 applicable to research projects, business applications,
and open-source efforts tackling various computer vision problems, with improved accuracy and
efficiency compared to previous versions of YOLO.

Figure 2: Bar Chart

This chart shows the dataset size and detection accuracy for each weapon category.

3.4 Deep Learning

Deep learning is a branch of artificial intelligence that uses hierarchical architectures called neural
networks to represent and interpret high-level abstractions found in data. It draws inspiration from the
anatomy and workings of the human brain, especially its capacity to recognize patterns and learn from
massive volumes of information.

Figure 3: Deep Learning

Key Components of Deep Learning:

1. Neural Networks:

• Artificial neural networks (ANNs), computational models made up of linked nodes arranged in
layers, are the foundation of deep learning. Every node, or neuron, processes incoming data using
elementary mathematical operations before sending the outcome to the next layer.
• By modifying the weights associated with the connections between neurons during training,
neural networks can learn intricate mappings between input and output data.

2. Deep Architectures:

• Deep learning models are characterized by their depth, referring to the number of layers in the
neural network.
• Deep architectures allow for the hierarchical representation of features, with each layer learning
increasingly abstract representations of the input data.


• Deep architectures improve performance for applications like image recognition, natural language
processing, and speech recognition by enabling the automatic extraction of complex patterns and
correlations in data.

3. Representation Learning:

• Deep learning places a strong emphasis on representation learning, which is the method of
recognizing and deriving significant features from unprocessed information.
• By learning hierarchical representations of data, deep learning models can capture complex
relationships and variations inherent in the input data.
• Representation learning eliminates the need for manual feature engineering, where domain specific
features are manually crafted by experts, making deep learning models more adaptable to diverse
datasets and domains.

4. Training Algorithms:

• Stochastic gradient descent (SGD) and its variants are optimization methods employed to train
deep learning models.
• The algorithm iteratively adjusts the model's parameters to minimize a defined loss function,
which quantifies the discrepancy between predicted and actual outputs.
• Backpropagation, a key algorithm in deep learning, computes the gradients of the loss with respect
to the model parameters, enabling gradient-based parameter updates (see the sketch after this list).

5. Regularization Techniques:

• Deep learning models use regularization techniques including dropout, batch normalization,
and weight regularization to reduce overfitting and enhance generalization performance.
• These techniques introduce constraints on the model's parameters, encouraging simplicity and
reducing the risk of overfitting to the training data.
• Deep learning architectures represent a diverse range of design approaches, each with its own
strengths and applications.
• Researchers and practitioners can select the best design by taking into account variables including
task complexity, computational capacity, and accuracy requirements.
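
As referenced above, the following minimal PyTorch sketch ties the training and regularization components together. The toy network, random data, and hyperparameters are illustrative assumptions only, not the model used in this thesis:

```python
import torch
import torch.nn as nn

# Toy network with dropout; weight_decay in SGD adds L2 weight regularization.
model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zeroes activations during training
    nn.Linear(32, 2),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 16)           # a batch of 8 random feature vectors
y = torch.randint(0, 2, (8,))    # random class labels

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)  # discrepancy between predicted and true labels
    loss.backward()              # backpropagation computes the gradients
    optimizer.step()             # SGD update moves parameters downhill
```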

3.5 Deep Learning Types:

Deep learning comprises diverse methodologies and architectures intended to tackle distinct learning
problems. The following are some important categories of deep learning models:

1. Convolutional Neural Networks (CNNs):

• CNNs are specialized neural networks with numerous layers, comprising convolutional, pooling,
and fully connected layers.
• These layers are designed for processing and interpreting visual data such as video footage and
pictures.
• Image classification, object detection, and segmentation are among the tasks where CNNs excel.
• By learning hierarchical representations of features directly from pixel data, CNNs can
automatically extract patterns and structures within images.

2. Recurrent Neural Networks (RNNs):

• RNNs are designed to model sequential data with temporal relationships, such as text, speech,
and time series.
• Their recurrent connections allow information to persist over time, enabling them to recognize
sequential patterns and interdependencies.
• RNN variants that address the vanishing gradient problem and enable more effective learning of
long-range dependencies include the Gated Recurrent Unit (GRU) and Long Short-Term Memory
(LSTM).
• RNNs are employed in sentiment analysis, machine translation, speech recognition, and language
modelling.

3. Generative Adversarial Networks (GANs):


• GANs are a class of deep learning models made up of two adversarially trained neural
networks: a discriminator and a generator.
• The discriminator determines which data samples are real and which are synthetic, whereas the
generator creates the synthetic samples.
• Through adversarial training, GANs acquire the ability to produce realistic data samples that are
nearly indistinguishable from real data.
• Image generation, image-to-image translation, style transfer, and data augmentation are among the
uses for GANs.

4. Autoencoders:

• Autoencoders are neural networks designed for unsupervised learning of efficient data
representations.
• They are made up of an encoder network that compresses the input data into a lower-dimensional
latent space and a decoder network that reconstructs the input data from the latent
representation.
• By learning to rebuild input data while identifying important features and patterns, autoencoders
can perform tasks such as data denoising, dimensionality reduction, and anomaly detection.
• Variants of autoencoders include Variational Autoencoders (VAEs) and Denoising Autoencoders
(DAEs).


Figure 4: Autoencoders
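
To make the encoder/decoder structure described above concrete, here is a minimal PyTorch sketch; the 784-dimensional input and 8-dimensional latent size are arbitrary illustrative choices:

```python
import torch.nn as nn

# Encoder compresses a 784-dim input to an 8-dim latent code;
# decoder reconstructs the input from that code.
class Autoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 8))
        self.decoder = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 784))

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Training would minimize a reconstruction loss, e.g. nn.MSELoss(),
# between the network's output and its own input.
```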

5. Deep Reinforcement Learning (DRL):

• DRL allows agents to learn how to interact with environments through trial and error by combining
the principles of reinforcement learning with deep learning.
• Deep Q-Networks (DQN), policy gradients, and actor-critic methods are popular architectures used
in DRL.
• DRL has achieved remarkable success in game playing, robotics, autonomous navigation, and
optimization tasks.

6. Transformers:

• Originally created for natural language processing tasks, transformers are attention-based deep
learning models that rely on self-attention mechanisms to capture relationships and long-range
dependencies between input tokens.
• Transformers have surpassed traditional RNN-based models in tasks including text generation,
machine translation, and question answering, revolutionising NLP.
• Prominent transformer families include T5 (Text-To-Text Transfer Transformer), GPT (Generative
Pretrained Transformer), and BERT (Bidirectional Encoder Representations from Transformers).

3.6 Deep Learning Applications:

1. Object Detection:

• Deep learning has revolutionized object detection by enabling accurate and efficient detection of
objects in images and videos.
• Techniques such as YOLO (You Only Look Once) use convolutional neural networks (CNNs) to
detect and localize objects with bounding boxes in real time.
• Our project leverages deep learning for object detection, specifically detecting weapons such as
handguns, UZIs, and AK-47s in both real-time and prerecorded videos using YOLOv9.
• Object detection has numerous applications in security, surveillance, autonomous driving, and
industrial automation.

2. Image Classification:

• Deep learning excels at image classification tasks, where the goal is to assign a label or category
to an input image.
• While our project focuses on object detection, image classification could be an integral component
for further categorizing detected weapons into specific types or classes.
• For example, once a weapon is detected, an additional classification step could be performed to
determine whether it is a handgun, UZI, or AK47, providing more detailed information for threat
assessment.


Figure 5: Image Classification

3. Surveillance and Security:

• Deep learning-based object detection systems play a crucial role in surveillance and security
applications.
• Our project contributes to enhancing security measures by automatically detecting and identifying
weapons in real-time video streams, enabling proactive threat detection and response.
• Surveillance systems equipped with deep learning models can assist security personnel in
monitoring public spaces, airports, borders, and critical infrastructure for potential security threats.

4. Law Enforcement and Public Safety:

• Deep learning powered object detection technologies support law enforcement agencies in
combating crime and ensuring public safety.
• By automatically detecting weapons like handguns, UZIs, and AK-47s in prerecorded videos, our
project aids law enforcement efforts in investigating criminal activities, identifying suspects, and
preventing violent incidents.
• The timely detection of weapons through deep learning algorithms enhances the effectiveness of
law enforcement operations and reduces response times in emergency situations.


5. Military Applications:

• Deep learning applications extend to military domains, where object detection technologies are
used for reconnaissance, target detection, and threat assessment.
• Our project's capability to detect weapons in real-time and prerecorded videos aligns with military
requirements for situational awareness and battlefield intelligence.
• Deep learning-based object detection systems empower military personnel to identify potential
threats, safeguard troops, and mitigate risks in dynamic and unpredictable environments.

6. Custom Object Detection Solutions:

• Beyond this specific project, deep learning applications in object detection can be customized and
adapted to address diverse use cases and industries.
• Our project demonstrates the flexibility and scalability of deep learning-based object detection
solutions, which can be tailored to detect various objects of interest beyond weapons, such as
vehicles, animals, or equipment.
• Customization options include finetuning pretrained models, dataset augmentation, and optimizing
model architectures to meet specific requirements and performance metrics.

3.7 Deep Learning in Computer Vision:

1. Feature Extraction:

• Convolutional Neural Networks (CNNs) are a type of deep learning model that excels at
automatically deriving hierarchical representations of visual attributes from unprocessed image
data.
• In computer vision tasks, convolutional filters are used to extract low-level features like edges,
textures, and colours from input images as they pass through a CNN's layers.
• Through successive layers, the CNN gradually learns to extract more abstract and high-level
features, capturing complex patterns and structures within the images.


2. Object Detection:

• YOLO divides the input image into a grid and predicts bounding boxes and class probabilities for
objects within each grid cell, enabling simultaneous detection of multiple objects with high
accuracy.
• Using deep learning, objects in photos and videos can be identified with accuracy and efficiency.
• Techniques like YOLO (You Only Look Once) utilize CNN architectures to localize and classify
objects within an image in real-time.

Figure 6: Object Detection

3. Semantic Segmentation:

• Semantic segmentation, where the objective is to assign a semantic label to every pixel in an image,
uses deep learning models such as Fully Convolutional Networks (FCNs).
• FCNs use convolutional layers to preserve spatial information and predict pixel-wise class labels,
allowing pixel-level understanding of scenes.
• Semantic segmentation is valuable for tasks like image understanding, scene understanding, and
medical image analysis.


4. Image Classification:

• Deep learning excels at image classification tasks, where the objective is to assign a label or
category to an input image.
• CNN architectures like AlexNet, VGG, ResNet, and EfficientNet have achieved state-of-the-art
performance on benchmark datasets like ImageNet.
• By learning discriminative features from labeled training data, deep learning models can accurately
classify images into predefined categories or classes.

5. Realtime Processing:

• Deep learning enables real-time processing of visual data, facilitating applications such as object
detection, tracking, and augmented reality.
• Optimized implementations of deep learning models on specialized hardware platforms, such as
GPUs and TPUs, enable high throughput inference for real-time applications.
• Techniques like model quantization, pruning, and efficient network architectures help reduce
computational complexity and memory footprint, enabling faster inference on resource constrained
devices.

6. Transfer Learning:

• Transfer learning leverages pretrained deep learning models to bootstrap learning on new tasks
with limited labeled data.
• By finetuning pretrained models on domain specific datasets, practitioners can achieve robust
performance on target tasks with minimal data annotation efforts.
• Transfer learning is particularly beneficial for small-scale projects like ours, where collecting
large amounts of labeled data may be challenging or impractical.


Figure 7: Transfer Learning
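
A minimal sketch of this finetuning pattern, assuming torchvision's pretrained models (version 0.13 or later); the four-class weapon head is an illustrative assumption:

```python
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained ResNet-18 and freeze its feature extractor.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False

# Replace the final classifier with a new head for 4 weapon categories;
# only this layer is trained on the small domain-specific dataset.
model.fc = nn.Linear(model.fc.in_features, 4)
```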

7. Scalability and Adaptability:

• Deep learning-based computer vision solutions are scalable and adaptable to diverse domains and
applications.
• Models can be customized, finetuned, or retrained to address specific requirements and challenges
in various industries, including security, healthcare, automotive, and manufacturing.
• The versatility of deep learning frameworks and architectures allows researchers and practitioners
to experiment with different model configurations and optimization techniques to achieve desired
performance outcomes.

3.8 Deep Learning Implementation:

1. Choice of Deep Learning Framework:

• For this project, we selected YOLO (You Only Look Once) version 9 as the deep learning
framework. Modern object detection algorithms like YOLO are renowned for their accuracy and
speed, which makes them appropriate for real-time applications.
• YOLOv9 builds upon previous versions by incorporating improvements in network
architecture, optimization techniques, and training strategies, resulting in enhanced detection
performance.


2. Training Data Preparation:

• To train the YOLO model for weapon detection, we collected and annotated a dataset
of images containing examples of handguns, UZIs, AK-47s, and knives, along with other objects as
background clutter.
• Each image in the dataset was labeled with bounding boxes specifying the locations of
weapons within the image, along with corresponding class labels indicating the type of weapon.

3. Model Architecture:

• YOLOv9 uses a deep neural network design with convolutional, pooling, and fully connected
layers.
• The network divides the input image into a grid of cells and, in a single pass, predicts bounding
boxes and class probabilities for objects within each cell.
• The architecture is designed to maximise speed and accuracy, allowing very precise real-time
detection of numerous objects.

4. Training Process:

• During the training phase, the YOLO model iteratively modifies its parameters to minimise the
discrepancy between predicted and ground truth bounding boxes, learning to detect and localise
weapons.
• Training entails optimising a pre-defined loss function that penalises errors in estimated box
positions and class probabilities (a minimal training sketch follows).
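
The training sketch referenced above, assuming the Ultralytics package; "weapons.yaml" is a hypothetical dataset configuration file listing the image paths and the four weapon class names in Ultralytics format:

```python
from ultralytics import YOLO

# Start from pretrained YOLOv9 weights and fine-tune on the weapon dataset.
# "weapons.yaml" is a hypothetical dataset config (image paths + class names).
model = YOLO("yolov9c.pt")
model.train(data="weapons.yaml", epochs=100, imgsz=640, batch=16)

metrics = model.val()  # report mAP / precision / recall on the validation split
```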

5. Real-time Inference:

• Once trained, the YOLO model is capable of performing real-time inference on live video streams,
processing each frame sequentially to identify weapons as they appear.
• The optimized architecture of YOLOv9 ensures fast and efficient computation, making it
suitable for deployment in resource-constrained environments or on embedded systems.

6. Prerecorded Video Analysis:

• In addition to real-time detection, the YOLO model can also be applied to analyze prerecorded
videos, enabling retrospective detection of weapons in recorded footage.
• By processing each frame of the video individually, the model identifies and localizes weapons
throughout the entire duration of the video, providing valuable insights for security analysis or
forensic investigations.

7. Deployment and Integration:

• The trained YOLO model can be deployed and integrated into existing security systems,
surveillance cameras, or custom-built applications.
• Integration may involve developing user interfaces for monitoring and alerting, incorporating
additional functionality such as tracking or logging of detected objects, and optimizing
performance for specific hardware platforms.

3.9 CNN:
Convolutional Neural Networks (CNNs) represent a class of deep neural networks devoted
to the recognition and analysis of visual data, such as images and video. They have driven major
advances in computer vision by enabling machines to learn patterns and hierarchical
representations of features directly from raw pixel data. By imitating the way the animal visual cortex
is organized, CNNs approximate the visual perception process, detecting complex
patterns and features at different levels of abstraction.


Figure 8: CNN

Layers of a CNN:
1. Convolutional Layer:
The convolutional layer is the fundamental building block of a CNN. This layer applies a set of learned
filters (commonly known as kernels) to the input image. Each filter slides across the input image,
carrying out the convolution operation, i.e., element-wise multiplication and summation, to produce a
feature map. Convolutional layers capture local spatial details such as edges, textures, and shapes.
Because the filters are learned during training, the network automatically extracts compact features
from the images.

2. Activation Function:
A non-linearity is introduced into the network by applying an activation function element-wise after
every convolution operation. Activation functions in common use are the Rectified Linear Unit
(ReLU), Sigmoid, and Hyperbolic Tangent (tanh). The ReLU function is the most broadly utilized
because of its simplicity and its capability to avert the vanishing gradient problem.

3. Pooling Layer:
The pooling layer, which follows the convolutional layer, downsamples the feature maps while
retaining their essential features. It acts on each feature map separately; the most often used pooling
functions are max pooling and average pooling. Besides providing translation invariance and lessening
computational cost, pooling derives greatly reduced feature maps. By emphasizing salient features and
neglecting insignificant detail, pooling endows the network with robustness and a feature extraction
capacity that helps its generalization ability.

4. Fully Connected Layer:


The fully connected layer links each neuron in one layer to every neuron in the successive layer. It acts
as the last stage of the CNN, taking care of mapping high-level features to output class labels. Fully
connected layers successively apply a linear transformation and an activation function, which may be
ReLU or softmax, to produce class probabilities. These layers translate the learnt characteristics of the
input data into predictions.

5. Flattening Layer:
The flattening layer, situated between the convolutional and fully connected layers, reshapes the 3D
feature maps into a 1D vector. This conversion allows the output of the convolutional block to be fed
into the fully connected layers in a format suitable for traditional neural network processing.

6. Normalization Layer:
Occasionally, normalization layers are added to the network to improve stability and convergence.
Techniques like Batch Normalization tackle the issue of internal covariate shift by normalizing the
activations of each layer, which speeds up training. Normalization also leads to improvements in
performance and robustness, since behavior remains consistent across different batches of data.
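
The six layer types described above can be assembled into a minimal PyTorch CNN; the channel counts and the 224x224 input size are illustrative choices, not the architecture used in this work:

```python
import torch.nn as nn

# Each stage below corresponds to one of the layer types described above.
cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional layer
    nn.BatchNorm2d(16),                          # normalization layer
    nn.ReLU(),                                   # activation function
    nn.MaxPool2d(2),                             # pooling layer
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),                                # flattening layer
    nn.Linear(32 * 56 * 56, 4),                  # fully connected layer -> 4 classes
)
# For a 3x224x224 input, two 2x2 poolings leave 32 feature maps of 56x56.
```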


3.10 CNN Types

1. LeNet:

• One of the first CNN designs is LeNet, created by Yann LeCun et al.
• The main purpose of its design was to recognise handwritten digits.
• LeNet is composed of several layers, such as fully connected layers, max pooling layers, and
convolutional layers with sigmoid activations.
• LeNet's architecture laid the groundwork for subsequent CNN designs and demonstrated the
effectiveness of convolutional operations in capturing spatial patterns.

2. AlexNet:

• In 2012, Alex Krizhevsky et al. unveiled AlexNet, which represented a major advancement in
computer vision.
• In comparison to earlier techniques, it achieved a notable decrease in mistake rates, which helped
it win the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC).
• AlexNet comprises five convolutional layers followed by three fully connected layers.
• To avoid overfitting, dropout regularisation and ReLU activation functions are used.
• AlexNet demonstrated the importance of deeper architectures and the effectiveness of
convolutional operations in feature extraction.

3. VGGNet:

• VGGNet, developed by the Visual Geometry Group (VGG) at the University of Oxford, is known
for its simplicity and depth.
• It consists of multiple convolutional layers with small 3x3 receptive fields and max pooling layers
for spatial down-sampling.
• VGGNet's architecture is characterized by its uniform structure, with configurations such as
VGG16 (16 layers) and VGG19 (19 layers).
• In spite of its ease of use, VGGNet was able to perform competitively on image classification tasks
and set the standard for later architectures.

4. GoogLeNet (Inception):

• GoogLeNet, developed by Google researchers, introduced the Inception architecture with the aim
of improving computational efficiency and model accuracy.
• It employs inception modules consisting of parallel convolutional pathways with different filter
sizes.
• GoogLeNet utilizes global average pooling and auxiliary classifiers to encourage feature reuse and
combat overfitting.
• GoogLeNet maintained computational efficiency while achieving state-of-the-art
performance on the ILSVRC 2014 challenge by striking a balance between depth and width.

5. ResNet:

• By adding residual connections, ResNet (Residual Network) overcomes the difficulty of training
extremely deep neural networks.
• Developed by Kaiming He et al., ResNet allows information to flow directly through shortcuts
(skip connections) from one layer to another.
• By using these shortcuts, the vanishing gradient issue is resolved and training incredibly complex
structures with hundreds of layers is made easier.
• Because ResNet architectures are easy to train and effective, they have been widely used in a variety
of computer vision tasks, including ResNet50 and ResNet101.

6. DenseNet:

• DenseNet, or densely connected convolutional networks, adds dense connections between layers
so that each layer receives direct input from all preceding layers.
• DenseNet, created by Gao Huang et al., encourages feature reuse and improves gradient flow across
the network.
• DenseNet architectures exhibit shorter paths for information propagation, leading to improved
feature propagation and parameter efficiency.

• By densely connecting layers, DenseNet achieves state-of-the-art performance with fewer


parameters compared to traditional architectures.

7. MobileNet:

• MobileNet is intended for embedded and mobile devices with constrained processing power.
• To keep competitive accuracy while reducing computing complexity, it makes use of depth-wise
separable convolutions.
• MobileNet architectures trade off model size against computational cost by adjusting
hyperparameters such as depth multipliers and width multipliers.
• MobileNet variants, such as MobileNetV2 and MobileNetV3, enable efficient deployment of CNN
models on resource constrained devices for tasks like image classification and object detection.

8. EfficientNet:

• EfficientNet optimizes both depth and width of the network through a compound scaling method.
• Designed by Mingxing Tan and Quoc Le, EfficientNet maximises performance by concurrently
scaling the network's depth, width, and resolution.
• Using a trade-off between computational efficiency and model complexity, EfficientNet attains
cutting-edge results on a range of image classification benchmarks.
• EfficientNet architectures, such as EfficientNetB0 to EfficientNetB7, offer a spectrum of models
suitable for different computational budgets and task requirements.

3.11 CNN Applications:


1. Image Classification:
• CNNs excel at classifying images into predefined categories or labels.
• Applications include identifying objects in photographs, classifying medical images (e.g., X-rays,
MRI scans), and detecting anomalies in manufacturing.


2. Object Detection:
• CNNs can detect and localize objects within images or video frames.
• Object detection applications include surveillance, autonomous vehicles, robotics, and augmented
reality.

3. Semantic Segmentation:
• CNNs assign semantic labels to each pixel in an image, enabling pixel-level understanding of
scenes.
• Semantic segmentation is used in medical imaging for tumor detection, environmental monitoring,
and autonomous navigation systems.

4. Facial Recognition:
• CNNs analyze facial features and patterns for identity verification, security access control, and
personalized user experiences.
• Facial recognition applications include biometric authentication, surveillance systems, and social
media tagging.

5. Medical Image Analysis:


• CNNs assist healthcare professionals in diagnosing diseases, detecting anomalies in medical
images, and guiding treatment decisions.
• Applications include diagnosing cancers from histopathology images, identifying abnormalities in
radiology scans, and monitoring patient vital signs.


Figure 9: Medical Image Analysis

6. Natural Language Processing (NLP):


• While primarily associated with computer vision, CNNs have been adapted for processing
sequential data in NLP tasks.
• CNNs are used for text classification, sentiment analysis, machine translation, and document
summarization.

7. Gesture Recognition:
• CNNs analyze hand movements and gestures for human-computer interaction.
• Applications include sign language recognition, gesture-based interfaces, and virtual reality
interactions.

8. Autonomous Vehicles:
• CNNs play a crucial role in enabling perception systems for autonomous vehicles.
• Tasks include lane detection, pedestrian detection, traffic sign recognition, and object tracking in
real-time environments.


9. Satellite Image Analysis:


• CNNs analyze satellite images for environmental monitoring, urban planning, disaster response,
and agriculture.
• Applications include land cover classification, deforestation detection, crop yield prediction, and
urban growth analysis.

10. Video Analysis:


• CNNs process video streams for activity recognition, video surveillance, action detection, and
video summarization.
• Applications include monitoring crowd behavior, identifying safety hazards, and analyzing sports
events.

3.12 CNN in Computer Vision:

Convolutional Neural Networks (CNNs) play a fundamental role in computer vision by enabling
machines to understand and interpret visual data. Here's how CNNs are used in computer vision:

1. Feature Extraction:

• From raw pixel data, CNNs may automatically learn hierarchical representations of features.
• CNNs extract low-level features like edges, textures, and colours using convolutional layers.
• These features are progressively combined and refined in deeper layers to form higher-level
representations, capturing intricate patterns and structures in images.

2. Object Detection:

• Object detection, a common application of CNNs, aims to identify and localize objects within
images or video frames.
• Object detection CNNs typically employ architectures like YOLO (You Only Look Once) or Faster
R-CNN (Region-based Convolutional Neural Network).


• These models mark detected objects with bounding boxes and classify them into predefined
categories, analysing the entire image in a single pass.

3. Image Classification:

• CNNs classify images into predefined categories or labels, such as identifying objects in
photographs or medical images.
• During training, CNNs learn discriminative features from labeled training data, enabling them to
generalize to unseen images during inference.
• Popular architectures for image classification include AlexNet, VGGNet, ResNet, and
EfficientNet, which have achieved state-of-the-art performance on benchmark datasets like ImageNet.

4. Semantic Segmentation:

• Semantic segmentation entails assigning a semantic label to every pixel in an image, enabling
pixel-level comprehension of scenes.
• CNN-based semantic segmentation models, such as U-Net and DeepLab, use encoder-decoder
architectures to capture contextual information and spatial dependencies.
• Semantic segmentation CNNs have applications in medical image analysis, autonomous driving,
urban planning, and environmental monitoring.

5. Object Recognition and Localization:

• CNNs can recognize and localize specific objects within images, distinguishing between different
instances of the same object class.
• This capability is crucial for tasks such as facial recognition, where CNNs identify faces in images
and localize facial features like eyes, nose, and mouth.
• Applications like augmented reality, retail analytics, and surveillance also use CNN-based object
identification systems.


6. Feature Matching and Correspondence:

• CNNs facilitate feature matching and correspondence between images, enabling tasks such as
image registration, panoramic stitching, and 3D reconstruction.
• Classical feature descriptors such as SIFT (Scale-Invariant Feature Transform) and ORB
(Oriented FAST and Rotated BRIEF), along with learned CNN-based descriptors, provide robust
representations for matching key points across images.

7. Gesture Recognition:

• CNNs analyze hand movements and gestures for human-computer interaction.
• Gesture recognition CNNs learn spatio-temporal patterns from video sequences or depth maps,
enabling real-time gesture detection and interpretation.
• Applications include sign language recognition, virtual reality interactions, and touchless
interfaces.

8. Anomaly Detection:

• CNNs are used for anomaly detection in images to identify unusual or unexpected patterns that
deviate from typical behaviour.
• Anomaly detection CNNs learn to distinguish between normal and anomalous images during
training, enabling them to detect outliers and anomalies in real-world scenarios such as defect
inspection, medical diagnosis, and security surveillance.

3.13 CNN Implementation:

1. Dataset Preparation:

• To train the CNN model, you would first need to collect and prepare a dataset of images
containing examples of handguns, UZIs, AK47s, and possibly other objects to serve as
background clutter.


• Each image in the dataset would be labeled with the corresponding object class (e.g., handgun,
UZI, AK47) using bounding boxes or segmentation masks.
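
A minimal sketch of this labeling step, assuming the YOLO-style annotation format (one text file per
image, one normalized box per line); the directory layout, file names, and class ordering below are
hypothetical:

```python
from pathlib import Path

# One annotation file per image; one line per object:
#   <class_id> <x_center> <y_center> <width> <height>   (all normalized to 0..1)
Path("dataset/labels/train").mkdir(parents=True, exist_ok=True)
Path("dataset/labels/train/img_0001.txt").write_text("0 0.512 0.431 0.220 0.180\n")

# Dataset description file consumed by YOLO-style trainers.
Path("dataset/data.yaml").write_text(
    "path: dataset\n"
    "train: images/train\n"
    "val: images/val\n"
    "names:\n"
    "  0: handgun\n"
    "  1: uzi\n"
    "  2: ak47\n"
)
```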

2. Model Training:

• You would then train the CNN model using the prepared dataset and a suitable architecture such
as YOLO (You Only Look Once) version 9; a minimal training sketch follows this list.
• During training, the CNN learns to automatically extract discriminative features from the input
images and classify objects into predefined categories.
• The training process involves iteratively adjusting the parameters of the CNN model to minimize
the difference between predicted outputs and ground truth labels.
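
One way to run such a training loop is sketched below, assuming the ultralytics Python package
(which exposes pretrained YOLOv9 checkpoints such as yolov9c.pt); the exact training pipeline and
hyperparameters used in this work may differ:

```python
from ultralytics import YOLO

# Start from pretrained YOLOv9 weights and fine-tune on the weapon dataset.
model = YOLO("yolov9c.pt")
model.train(
    data="dataset/data.yaml",   # dataset description from the previous step
    epochs=100,                 # illustrative hyperparameters
    imgsz=640,
    batch=16,
)
metrics = model.val()           # evaluate on the validation split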

3. Feature Extraction:

• Once trained, the CNN model serves as a feature extractor, processing input images to extract
relevant features associated with different object classes.
• Convolutional layers within the CNN perform operations such as edge detection, texture analysis,
and object localization, enabling the model to identify and localize weapons within images.

4. Object Detection:

• Using the trained CNN model, you can perform object detection on both real-time video streams
and prerecorded videos.
• For real-time detection, the CNN processes video frames in sequential order, identifying and
localizing weapons in each frame in near real-time.
• For prerecorded videos, the CNN processes each frame individually, allowing you to detect
weapons retrospectively.
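
The sketch below illustrates such a frame-by-frame loop with OpenCV, again assuming an
ultralytics-style model API; the weights path and window handling are placeholders:

```python
import cv2
from ultralytics import YOLO

model = YOLO("best.pt")            # hypothetical path to the trained weights
cap = cv2.VideoCapture(0)          # 0 = live camera; pass a file path for prerecorded video

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break                                 # end of stream or read error
    results = model(frame, verbose=False)     # run detection on this frame
    annotated = results[0].plot()             # draw boxes and class labels
    cv2.imshow("Weapon detection", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):     # press q to quit
        break

cap.release()
cv2.destroyAllWindows()
```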


5. Classification and Localization:

• The CNN model outputs predictions for each detected object, including the class label (e.g.,
handgun, UZI, AK47) and bounding box coordinates specifying the object's location within the
image.
• These predictions enable you to classify the detected objects and precisely localize their
positions, providing valuable information for security personnel or law enforcement agencies.
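
Continuing the sketch above, the per-object predictions can be read out as follows (class index,
confidence, and corner coordinates; the attribute names follow the assumed ultralytics-style API):

```python
# Read class labels, confidences, and box coordinates out of a result object.
names = model.names                        # e.g. {0: "handgun", 1: "uzi", 2: "ak47"}
for box in results[0].boxes:
    cls_id = int(box.cls[0])               # predicted class index
    conf = float(box.conf[0])              # confidence score
    x1, y1, x2, y2 = box.xyxy[0].tolist()  # box corners in pixel coordinates
    print(f"{names[cls_id]} ({conf:.2f}) at ({x1:.0f}, {y1:.0f})-({x2:.0f}, {y2:.0f})")
```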

6. Integration with User Interface or System:

• You may integrate the CNN based weapon detection system with a user interface or larger
security system.
• The system could display real-time detections on a monitor, highlight detected weapons, and
trigger alerts or notifications in case of potential threats.
• Integration may involve deploying the trained CNN model on dedicated hardware platforms or
leveraging existing software frameworks for efficient inference.
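
As one illustrative integration point, the helper below sends an e-mail alert when a detection
occurs; the SMTP host, addresses, and credentials are placeholders, and a production deployment
would likely use a dedicated alerting or notification service instead:

```python
import smtplib
from email.message import EmailMessage

def send_weapon_alert(label: str, confidence: float) -> None:
    """E-mail a security contact when a weapon is detected.
    Host, port, addresses, and credentials below are placeholders."""
    msg = EmailMessage()
    msg["Subject"] = f"ALERT: {label} detected ({confidence:.0%} confidence)"
    msg["From"] = "detector@example.com"
    msg["To"] = "security-team@example.com"
    msg.set_content("A possible weapon was detected on the monitored video feed.")
    with smtplib.SMTP("smtp.example.com", 587) as server:
        server.starttls()                    # encrypt the connection
        server.login("detector@example.com", "app-password")
        server.send_message(msg)
```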


Chapter 4
RESULTS AND DISCUSSION

4.1 Experimental Results

1. Handgun

Figure 10: Handgun

2. AK-47

Figure 11: AK-47


3. UZI

Figure 12: UZI

YOLOv9 analyzes objects in images and videos with high accuracy and speed, which makes it
suitable for real-time applications such as surveillance, traffic monitoring, and robotics.

Object recognition: YOLOv9 can recognize and classify many kinds of objects, making it useful for
tasks such as image recognition and analysis.

Scene understanding: by detecting and classifying objects in difficult scenes, YOLOv9 helps
interpret the content of images and videos, which is important for applications such as content
moderation and augmented reality.

Instance segmentation: YOLOv9 can perform instance segmentation, identifying each object in an
image along with its boundaries, which allows more detailed analysis and understanding of visual
content. These capabilities make it applicable to research, commercial, and open-source efforts to
solve various computer vision problems with improved performance compared to previous versions
of YOLO.


4.2 Performance Measurements

1. train/box_loss:

This is the bounding box regression loss, which calculates the difference between the ground truth
and the predicted bounding box dimensions and coordinates. A smaller box_loss indicates more
accurate predicted bounding boxes.

Figure 13: train/box_loss

2. train/cls_loss:

This is the classification loss, which calculates the difference between the ground truth and the
predicted class probabilities for every object in the picture. A smaller cls_loss indicates that the model
is forecasting the object class more precisely.

Figure 14: train/cls_loss



3. train/dfl_loss:

This is the distribution focal loss (DFL), a component of the YOLOv9 training objective. It measures
how well the predicted bounding-box boundary distributions match the ground truth and is intended
to enhance the model's capacity to identify objects across a range of scales and aspect ratios. A
smaller dfl_loss suggests that the model can handle variations in appearance and object deformations
more effectively.

Figure 15: train/dfl_loss


Chapter 5
CONCLUSION AND FUTURE SCOPE

5.1 Conclusion and Future Scope

In conclusion, our study addressed the urgent need for effective weapon detection systems in the face
of escalating security concerns and offered a noteworthy advancement in security technology.
Additionally, it introduces a ground-breaking real-time weapon recognition system that combines the
YOLOv9 model with deep learning techniques to produce an advanced solution that can recognize a
variety of weapons with unparalleled precision and efficiency. The use of YOLOv9, which enables
quick inference times and enhanced weapon recognition in a range of settings, is the methodology's
main component. With the help of state-of-the-art technology and deep learning methods, the provided
algorithm supports security measures in a variety of situations, including homes, public areas, and
critical infrastructures, with an astounding 98% accuracy rate. Furthermore, the method's dynamic
nature ensures rapid weapon recognition, alerting authorized users or security experts as soon as a
weapon is detected and triggering an alarm. This proactive behavior raises the standard
by enhancing situational awareness and enabling prompt responses to security threats. In essence, the
work also delivers a dynamic weapon detection system that, by providing a crucial instrument for lowering
security risks and guaranteeing public safety, marks a significant breakthrough in security technology.
Our research paves the way for the creation of a new generation of weapon detection systems that can
combine cutting-edge AI methods, meticulous testing, and practical application to respond to changing
threats and advance public safety and social cohesion in an increasingly unpredictable world.

Even though weapon detection technology has advanced significantly thanks to current research, there
are still a number of areas that could be explored and improved in the future. First, by including
additional sophisticated algorithms or investigating YOLOv9 variants, the deep learning models could
be further improved, increasing the system's accuracy and recognizing new kinds of weapons and
variances in weapon display. Furthermore, the incorporation of contextual data and multi-modal sensor
fusion may improve the system's resilience in complex settings, such as congested areas or dimly lit
areas. Additional sensor data, such as thermal imaging or acoustic detection, could improve the
system's dependability and resilience against prospective adversaries' evasion strategies. Research
endeavors may also concentrate on enhancing the system for implementation in mobile or
autonomous platforms, permitting real-time identification of weapons in dynamic settings like
transit hubs or unmanned aerial vehicles. The planned weapon detection system's implementation might
also be expanded to enhance the security infrastructure that is already in place. For example, it could
integrate with emergency response protocols, access control systems, and surveillance systems.
Collaboration amongst policymakers, security agencies, and technology developers can facilitate the
adoption of comprehensive security solutions that are suited to particular risk profiles and settings. In
conclusion, there is a great deal of room for innovation and cross-disciplinary cooperation in the field
of weapon detection technology. In a danger environment that is constantly changing, we can continue
to improve public safety and security by utilizing developments in artificial intelligence, sensor
technology, and data analytics.


REFERENCES
1. Mohammad Zahrawi & Khaled Shaalan, “Improving video surveillance systems in banks using
deep learning techniques”, Scientific Reports, pp. 1-17, DOI: 10.1038/s41598-023-35190-9, May 2023.

2. Shehzad Khalid, Onome Christopher Edo, Abdullah Waqar, Imokhai Theophilus Tenebe,
Hoor Ul Ain Tahir, “Weapon detection system for surveillance and security”, International
Conference on IT Innovation and Knowledge Discovery, pp. 1-8,
DOI: 10.1109/ITIKD56332.2023.10099733, March 2023.

3. Abdul Hanan Ashraf, Muhammad Imran, Abdulrahman M. Qahtani, Abdulmajeed Alsufyani,
Omar Almutiry, Awais Mahmood, Muhammad Attique, Mohamed Habib, “Weapons Detection for
Security and Video Surveillance Using CNN and YOLO-V5s”, vol. 70, pp. 1-16,
DOI: 10.32604/cmc.2022.018785, January 2022.

4. Roberto Olmos, Siham Tabik, Alberto Lamas, Francisco Pérez-Hernández, Francisco Herrera, “A
binocular image fusion approach for minimizing false positives in handgun detection with deep
learning”, Information Fusion, vol. 49, pp. 271-280, DOI: 10.1016/j.inffus.2018.11.015,
September 2019.

5. B. Abruzzo, K. Carey, C. Lowrance, E. Sturzinger, R. Arnold, and C. Korpela, “Cascaded neural
networks for identification and posture-based threat assessment of armed people”, IEEE
International Symposium on Technologies for Homeland Security (HST),
DOI: 10.1109/HST47167.2019.9032904, November 2019.

6. Nashwan Jasim Hussein, Fei Hu, “An Alternative Method to Discover Concealed
Weapon Detection Using Critical Fusion Image of Color Image and Infrared Image”, 2016 First
IEEE International Conference on Computer Communication and the Internet, pp. 378-383,
DOI: 10.1109/CCI.2016.7778947, October 2016.

7. Dr Raman Dugyala, M Vishnu Vardhan Reddy, Ch Tharun Reddy and G Vijendar, “Weapon
Detection in Surveillance Videos Using YOLOv8 and PELSF-DCNN”, E3S Web of Conferences,
vol. 391, pp. 1-18, DOI: 10.1051/e3sconf/202339101071, June 2023.

8. Muhammad Tahir Bhatti, Muhammad Gufran Khan, Masood Aslam, and Muhammad Junaid
Fiaz, “Weapon Detection in Real-Time CCTV Videos Using Deep Learning”, IEEE Access, vol. 9,
pp. 34366-34382, DOI: 10.1109/ACCESS.2021.3059170, February 2021.

9. Pankaj Bhambri, Sachin Bagga, Dhanuka Priya, Harnoor Singh, Harleen Kaur Dhiman,
“Suspicious Human Activity Detection System”, Journal of ISMAC, vol. 02, pp. 216-221,
DOI: 10.36548/jismac.2020.4.005, October 2020.

10. Ms. U. M. Kamthe and Dr. C. G. Patil, “Suspicious Activity Recognition in Video Surveillance
System”, Fourth International Conference on Computing Communication Control and
Automation, vol. 3, pp. 1-14, DOI: 10.1109/ICCUBEA.2018.8697408, August 2018.

11. Nandini Fal Dessai, Prof. Shruti Pednekar, “Surveillance-based Suspicious Activity Detection:
Techniques, Application and Challenges”, International Journal of Creative Research Thoughts,
vol. 11, pp. 1-4, DOI: 10.1007/s10462-017-9545-7, May 2023.

12. Digambar Kauthkar, Snehal Pingle, Vijay Bansode, Pooja Idalkanthe, Prof. Sunita Vani,
International Journal of Innovative Science and Research Technology, ISSN: 2456-2165,
Volume 7, Issue 6, June 2022.

13. Sathyajit Loganathan, Gayashan Kariyawasam, Prasanna Sumathipala, “Suspicious Activity
Detection in Surveillance Footage”, International Conference on Electrical and Computing
Technologies and Applications, pp. 1-4, DOI: 10.1109/ICECTA48151.2019.8959600, November
2019.

14. Suganya, K., Pavithra, A., Ranjani, R., Saktheswari, & Seethai, R., “Weapon detection using
machine learning algorithm”, International Journal of Creative Research Thoughts, vol. 11, pp.
1-4, ISSN: 2320-2882, November 2023.

15. Dr Raman Dugyala, M Vishnu Vardhan Reddy, Ch Tharun Reddy and G Vijendar, “Weapon
detection in surveillance videos using YOLOv8 and PELSF-DCNN”, E3S Web of Conferences,
vol. 391, pp. 1-18, DOI: 10.1051/e3sconf/202339101071, June 2023.

16. Murugan, P., Ida, A. M., Aashika, R., Brillian, S. A., Ancy, M. S., & Sneha, “Weapon detection
system using deep learning”, International Journal of Innovative Research in Technology, pp. 1-6,
ISSN: 2349-6002, June 2023.

17. Ahmed Abdelmoamen Ahmed and Mathias Echi, “Hawk-Eye: An AI-Powered Threat Detector for
Intelligent Surveillance Cameras”, IEEE Access, vol. 9, pp. 63283-63293, ISSN: 2169-3536,
DOI: 10.1109/ACCESS.2021.3074319, April 2021.

18. Kumawat, D., Abhayankar, D., & Tanwani, S., “Exploring Object Detection Algorithms and
implementation of YOLOv7 and YOLOv8 based model for weapon detection”, International
Journal of Intelligent Systems and Applications in Engineering, vol. 12, pp. 1-10, ISSN: 2147-6799,
November 2024.

19. P. Yadav, N. Gupta, & P. K. Sharma, “Robust Weapon Detection in Dark Environments using
YOLOv7-DarkVision”, Digital Signal Processing, pp. 1-18, DOI: 10.1016/j.dsp.2023.104342,
December 2023.

20. G. F. Shidik, E. Noersasongko, A. Nugraha, P. N. Andono, J. Jumanto and E. J. Kusuma, “A
systematic review of intelligence video surveillance: Trends, techniques, frameworks, and
datasets”, IEEE Access, vol. 7, pp. 457-473, DOI: 10.1109/ACCESS.2019.2955387, January 2019.

Intelligent Monitoring for Anomaly Recognition using
CNN and YOLOv9

Abhishek Mohan Hundalekar1[0009-0003-3981-6189], Siddesh Bhagwan Pingale2[0009-0000-3457-297X],
Vamshi Rajkumar Naidu3[0009-0005-1427-9481] and Vishal Shirsath4[0000-0001-5251-1935]

1,2,3,4 Ajeenkya D Y Patil University, Pune, India

Abstract. The prompt and precise detection of firearms is essential in today's security
environments to ensure public safety. This research paper provides a novel method for real-time
weapon detection in both live and prerecorded footage, using Convolutional Neural Network (CNN)
techniques and the YOLOv9 object recognition framework. By integrating YOLOv9, object
detection accuracy and speed are considerably improved, facilitating the quick identification of
possible threats. The presented method exhibits strong performance across various lighting settings
and environments, with excellent recall and precision demonstrated through thorough testing and
assessment. The approach uses a CNN-based architecture and deep learning to effectively detect and
categorize weapons in video frames, achieving 97.62% accuracy.

Keywords: YOLO – You Only Look Once, CNN – Convolutional Neural Network, DL – Deep Learning,
AI – Artificial Intelligence, IMFDB – Internet Movie Firearms Database, COCO – Common
Objects in Context.

1. Introduction
In today's security landscape, the urgency for effective weapon detection systems is
paramount, given the escalating frequency of shooting incidents. However, traditional
methods may fall short due to outdated technology or manual inspection processes,
leaving security infrastructures vulnerable to evolving threats. To address this gap, this
study seeks to modernize weapon detection by harnessing the power of artificial
intelligence (AI) and deep learning techniques. Central to our research is the adoption
of state-of-the-art AI methods, specifically leveraging deep learning algorithms and the
YOLOv9 model. This approach aims to revolutionize real-time weapon recognition
capabilities, capitalizing on YOLOv9's renowned speed and precision in object
recognition tasks. By embracing technological innovation, our objective is to enhance
society's resilience and safety through reliable weapon detection systems that keep pace
with evolving security threats.

2. Related Work
Building on fundamental methods such as binocular image fusion and false positive minimization,
recent research has thoroughly investigated the development of weapon detection systems in
surveillance and security [6, 4]. Deep learning advances, especially integrating CNN with
YOLOv5s, have greatly improved detection accuracy as demonstrated by works like [3] and [1],
with applications that reach as far as surveillance in specific settings like banks [2]. In addition,
different strategies for detecting concealed weapons have been investigated [6], demonstrating
the variety of methods used in the industry. Real-time weapon detection systems have been the
focus of recent research, which uses sophisticated deep learning models like YOLOv7 and
YOLOv7-DarkVision to accomplish reliable identification even in difficult circumstances like
dimly lit areas [8], [19].
In addition, this study incorporates more general surveillance goals, such as posture-based threat
assessment [5] and the identification of questionable human behavior [9], which further advances
the continuous development of surveillance technologies. The continuous endeavors to enhance
and broaden the scope of surveillance technologies are seen in the ongoing inquiries into more
advanced versions of the YOLO model [18] and the methodical evaluations of intelligence video
surveillance [20].

3. Methodology

Figure 1: Methodology
The client-side, server-side, login page, register page, option window, object
identification logic, database, API, and application interface are all important parts of
our methodology. The login and registration pages make up the majority of the client-
side interfaces. Users are asked to log in when they first access the system and are led
to the registration page if they haven't already registered. After logging in, users choose
a video for object identification: either a live stream or a prerecorded video.
Our system uses YOLOv9, the most recent version, which is well-known for its
remarkable speed and accuracy in object detection tasks, when a video is chosen. In a
single network run, YOLOv9 uses a single neural network to immediately forecast
bounding boxes and class probabilities for objects that have been spotted. Because of
its unified methodology, YOLOv9 can attain real-time inference speeds, which makes
it ideal for applications like autonomous vehicles or surveillance systems that need to
handle massive amounts of data quickly.

The YOLOv9 model analyses the chosen video frame by frame in order to find
firearms. An alert is set off and the user receives a notification via email or mobile
device if a weapon is found. On the other hand, in the event that no weapon is found,
the user still receives a message and the system stays silent, guaranteeing continuous
monitoring and feedback. In addition to object identification logic, our application
includes a server-side component responsible for handling data processing and
communication between the client-side and the database. The server-side ensures
seamless operation of the system, facilitating efficient data exchange and timely
responses to security events.
The application's smooth integration of the object detection logic makes it possible to
identify weapons quickly and precisely. The system also includes an API for client-side
and server-side communication, as well as a database to hold user data and video
metadata. This all-encompassing strategy guarantees the application's seamless
operation and makes it easier to react quickly to any security risks. Utilizing YOLOv9's
capabilities and incorporating it into our process allows us to create a solid and
dependable weapon identification system. In today's increasingly complicated security
landscape, the combination of YOLOv9's speed and accuracy with our application's
user-friendly interface and notification system improves security measures and gives
users peace of mind.

4. Calculations
4.1 Intersection over Union (IoU): For every anticipated bounding box and its
matching ground truth box, the IoU is computed. This is the equation:
IoU = Intersection Area / Union Area
Intersection Area: the region where the predicted bounding box and the ground truth box overlap.
Union Area: the total area covered by the predicted box and the ground truth box together.
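
A direct translation of this formula into Python (corner-format boxes assumed) might look like:

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2) corners."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)   # intersection area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter                     # union area
    return inter / union if union > 0 else 0.0

# iou((0, 0, 2, 2), (1, 1, 3, 3)) == 1/7
```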

4.2 Binary Cross-Entropy Loss (Classification Loss):

This loss measures the difference between the ground truth class labels (y) and the predicted class
probabilities (Pc).
Loss = - Σi (yi * log(Pci) + (1 - yi) * log(1 - Pci))
where, Σi: sum over all classes i.
yi: ground truth label for class i (1 for positive classes, 0 for negative classes).
Pci: the model's predicted probability for class i.
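
A minimal NumPy version of this loss, with clipping added only to avoid log(0):

```python
import numpy as np

def binary_cross_entropy(y, p, eps=1e-7):
    """y: ground-truth labels (0/1) per class; p: predicted probabilities."""
    p = np.clip(p, eps, 1 - eps)        # avoid log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# binary_cross_entropy(np.array([1, 0, 0]), np.array([0.9, 0.2, 0.1]))
```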

4.3 Smooth L1 Loss (Bounding Box Regression Loss):

For a given anchor box, this loss measures the difference between the predicted bounding box
offsets (p) and the ground truth offsets (t).
Loss = Σ smooth_L1(pt - tt)
where, Σ: sum over all bounding box coordinates (x, y, width, and height).
smooth_L1(x): the smoothed L1 function, which, in contrast to the conventional L1 loss, lessens the
cost for small errors.
pt: the predicted offset value for a given bounding box coordinate.
tt: the ground truth offset value for the corresponding coordinate.
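
A minimal NumPy sketch of the smoothed L1 function and the summed regression loss; beta is the
conventional switch point between the quadratic and linear regimes:

```python
import numpy as np

def smooth_l1(x, beta=1.0):
    """Quadratic for |x| < beta, linear beyond: small offset errors
    are penalized more gently than with a plain L1 loss."""
    x = np.abs(x)
    return np.where(x < beta, 0.5 * x**2 / beta, x - 0.5 * beta)

def bbox_regression_loss(pred_offsets, gt_offsets):
    """Sum of smooth L1 over the (x, y, width, height) offsets."""
    return np.sum(smooth_l1(np.asarray(pred_offsets) - np.asarray(gt_offsets)))
```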

4.4 Sigmoid and Softmax:

These activation functions are applied to the final output layer of the network.
Sigmoid: used for binary classification (one object class vs. background). It maps output values
into the range 0 to 1, indicating the likelihood that the object class is present.
Formula: Sigmoid(x) = 1 / (1 + e^(-x))
Softmax: used when classifying multiple object classes. The output scores are normalised so that
they add up to 1, representing the class probabilities.
Formula: Softmax(x_i) = e^(x_i) / Σ(e^(x_j)) for all classes j.
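
Both functions are one-liners in NumPy; subtracting the maximum score in softmax is a standard
numerical-stability trick and does not change the result:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # maps any score into (0, 1)

def softmax(x):
    e = np.exp(x - np.max(x))              # subtract max for numerical stability
    return e / e.sum()                     # normalized scores sum to 1

# sigmoid(0.0) -> 0.5
# softmax(np.array([2.0, 1.0, 0.1])) -> probabilities summing to 1.0
```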

5. Dataset and Accuracy


In order to conduct this study, we gathered and carefully curated a bespoke dataset of images of
weapons frequently encountered in security situations. Each type of weapon was captured through
painstaking image collection and storage, with many instances and orientations for each weapon.

5.1 Handgun:
This category has 1260 high-resolution photos that show pistols from various
viewpoints and angles. To guarantee variation in background circumstances and
lighting, the pistols were methodically photographed in controlled situations. Every
picture in this category features a variety of handgun models.

5.2 AK-47:
The 1257 photos in the AK-47 dataset show this well-known assault rifle from various
angles. The photos were meticulously taken, catching the unique qualities and traits of
the AK-47 in a range of firing situations. Every picture captures the distinct structural
aspects and design features of the AK-47.

5.3 UZI: This dataset, which features 1417 carefully taken pictures of the UZI
submachine gun, shows the weapon in various positions and environments. The UZI
dataset provides extensive coverage of this gun, ranging from broad frames showcasing
its form factor to close-ups emphasizing minute details.

5.4 Knife:
The knife dataset consists of 1020 pictures that show various kinds of knives, including utility
knives, combat knives, and other bladed weapons. Every photograph showcases the unique
characteristics and blade layouts of the knives, providing a thorough portrayal of this class of
weapon.

Table 1

Sr. No.  Dataset Size (images)  Category  Accuracy
1        1020                   Knife     92%
2        1260                   Handgun   95%
3        1257                   AK-47     94%
4        1417                   UZI       97%


5.5 Clustered Bar Chart

This chart shows the per-class dataset sizes and detection accuracies summarised in Table 1.

6. Experimental Results

6.1 Handgun

6.2 AK-47

6.3 UZI

YOLOv9 analyzes objects in images and videos with high accuracy and speed, which makes it
suitable for real-time applications such as surveillance, traffic monitoring, and robotics.

Object recognition: YOLOv9 can recognize and classify many kinds of objects, making it useful for
tasks such as image recognition and analysis.

Scene understanding: by detecting and classifying objects in difficult scenes, YOLOv9 helps
interpret the content of images and videos, which is important for applications such as content
moderation and augmented reality.

Instance segmentation: YOLOv9 can perform instance segmentation, identifying each object in an
image along with its boundaries, which allows more detailed analysis and understanding of visual
content. These capabilities make it applicable to research, commercial, and open-source efforts to
solve various computer vision problems with improved performance compared to previous versions
of YOLO.

7. Performance Measurements

7.1 train/box_loss

This is the bounding box regression loss, which calculates the difference between the ground truth
and the predicted bounding box dimensions and coordinates. A smaller box_loss indicates more
accurate predicted bounding boxes.

Figure 1: train/box_loss
7.2 train/cls_loss

This is the classification loss, which calculates the difference between the ground
truth and the predicted class probabilities for every object in the picture. A smaller
cls_loss indicates that the model is forecasting the object class more precisely.

Figure 2: train/cls_loss

7.3 train/dfl_loss
This is the distribution focal loss (DFL), a component of the YOLOv9 training objective. It measures
how well the predicted bounding-box boundary distributions match the ground truth and is intended
to enhance the model's capacity to identify objects across a range of scales and aspect ratios. A
smaller dfl_loss suggests that the model can handle variations in appearance and object deformations
more effectively.

Figure 3: train/dfl_loss
8. Conclusion
To sum up, this research paper presented a significant breakthrough in security
technology and addresses the pressing demand for efficient weapon detection systems
in the face of growing security concerns. It also presents a revolutionary real-time
weapon recognition method that uses deep learning techniques in conjunction with the
YOLOv9 model to provide an advanced solution that can identify a wide range of
weapons with unsurpassed efficiency and accuracy.
The methodology's key component is the use of YOLOv9, which allows for fast
inference times and superior weapon recognition in a variety of scenarios. Utilizing
cutting-edge technology and deep learning techniques, the presented algorithm
achieves remarkable 98% accuracy rate, supporting security protocols in a range of
contexts, such as residences, public spaces, and vital infrastructures.
Additionally, the dynamic nature of the presented method guarantees timely weapon recognition,
promptly notifying authorized users or security professionals upon detection and setting off
alerts. By taking a proactive stance, situational awareness is
improved and quick reactions to security risks are made possible, raising the bar for
overall security.
Essentially, the work also presents a dynamic weapon detection system that represents a
major advancement in security technology by providing an essential tool for reducing
security threats and ensuring public safety. Our research work lays the groundwork for
the development of a new generation of weapon detection systems that can integrate
state-of-the-art artificial intelligence techniques, rigorous testing, and real-world
deployment to respond to evolving threats and promote social harmony and public
safety in an increasingly unpredictable world.

9. References

1. Mohammad Zahrawi & Khaled Shaalan, “Improving video surveillance systems in banks using
deep learning techniques”, Scientific Reports, pp. 1-17, DOI: 10.1038/s41598-023-35190-9, May 2023.

2. Shehzad Khalid, Onome Christopher Edo, Abdullah Waqar, Imokhai Theophilus Tenebe,
Hoor Ul Ain Tahir, “Weapon detection system for surveillance and security”, International
Conference on IT Innovation and Knowledge Discovery, pp. 1-8,
DOI: 10.1109/ITIKD56332.2023.10099733, March 2023.

3. Abdul Hanan Ashraf, Muhammad Imran, Abdulrahman M. Qahtani, Abdulmajeed Alsufyani,
Omar Almutiry, Awais Mahmood, Muhammad Attique, Mohamed Habib, “Weapons Detection for
Security and Video Surveillance Using CNN and YOLO-V5s”, vol. 70, pp. 1-16,
DOI: 10.32604/cmc.2022.018785, January 2022.

4. Roberto Olmos, Siham Tabik, Alberto Lamas, Francisco Pérez-Hernández, Francisco Herrera, “A
binocular image fusion approach for minimizing false positives in handgun detection with deep
learning”, Information Fusion, vol. 49, pp. 271-280, DOI: 10.1016/j.inffus.2018.11.015,
September 2019.

5. B. Abruzzo, K. Carey, C. Lowrance, E. Sturzinger, R. Arnold, and C. Korpela, “Cascaded neural
networks for identification and posture-based threat assessment of armed people”, IEEE
International Symposium on Technologies for Homeland Security (HST),
DOI: 10.1109/HST47167.2019.9032904, November 2019.

6. Nashwan Jasim Hussein, Fei Hu, “An Alternative Method to Discover Concealed
Weapon Detection Using Critical Fusion Image of Color Image and Infrared Image”, 2016 First
IEEE International Conference on Computer Communication and the Internet, pp. 378-383,
DOI: 10.1109/CCI.2016.7778947, October 2016.

7. Dr Raman Dugyala, M Vishnu Vardhan Reddy, Ch Tharun Reddy and G Vijendar, “Weapon
Detection in Surveillance Videos Using YOLOv8 and PELSF-DCNN”, E3S Web of Conferences,
vol. 391, pp. 1-18, DOI: 10.1051/e3sconf/202339101071, June 2023.

8. Muhammad Tahir Bhatti, Muhammad Gufran Khan, Masood Aslam, and Muhammad Junaid
Fiaz, “Weapon Detection in Real-Time CCTV Videos Using Deep Learning”, IEEE Access, vol. 9,
pp. 34366-34382, DOI: 10.1109/ACCESS.2021.3059170, February 2021.

9. Pankaj Bhambri, Sachin Bagga, Dhanuka Priya, Harnoor Singh, Harleen Kaur Dhiman,
“Suspicious Human Activity Detection System”, Journal of ISMAC, vol. 02, pp. 216-221,
DOI: 10.36548/jismac.2020.4.005, October 2020.

10. Ms. U. M. Kamthe and Dr. C. G. Patil, “Suspicious Activity Recognition in Video Surveillance
System”, Fourth International Conference on Computing Communication Control and
Automation, vol. 3, pp. 1-14, DOI: 10.1109/ICCUBEA.2018.8697408, August 2018.

11. Nandini Fal Dessai, Prof. Shruti Pednekar, “Surveillance-based Suspicious Activity Detection:
Techniques, Application and Challenges”, International Journal of Creative Research Thoughts,
vol. 11, pp. 1-4, DOI: 10.1007/s10462-017-9545-7, May 2023.

12. Digambar Kauthkar, Snehal Pingle, Vijay Bansode, Pooja Idalkanthe, Prof. Sunita Vani,
International Journal of Innovative Science and Research Technology, ISSN: 2456-2165,
Volume 7, Issue 6, June 2022.

13. Sathyajit Loganathan, Gayashan Kariyawasam, Prasanna Sumathipala, “Suspicious Activity
Detection in Surveillance Footage”, International Conference on Electrical and Computing
Technologies and Applications, pp. 1-4, DOI: 10.1109/ICECTA48151.2019.8959600, November
2019.

14. Suganya, K., Pavithra, A., Ranjani, R., Saktheswari, & Seethai, R., “Weapon detection using
machine learning algorithm”, International Journal of Creative Research Thoughts, vol. 11, pp.
1-4, ISSN: 2320-2882, November 2023.

15. Dr Raman Dugyala, M Vishnu Vardhan Reddy, Ch Tharun Reddy and G Vijendar, “Weapon
detection in surveillance videos using YOLOv8 and PELSF-DCNN”, E3S Web of Conferences,
vol. 391, pp. 1-18, DOI: 10.1051/e3sconf/202339101071, June 2023.

16. Murugan, P., Ida, A. M., Aashika, R., Brillian, S. A., Ancy, M. S., & Sneha, “Weapon detection
system using deep learning”, International Journal of Innovative Research in Technology, pp. 1-6,
ISSN: 2349-6002, June 2023.

17. Ahmed Abdelmoamen Ahmed and Mathias Echi, “Hawk-Eye: An AI-Powered Threat Detector for
Intelligent Surveillance Cameras”, IEEE Access, vol. 9, pp. 63283-63293, ISSN: 2169-3536,
DOI: 10.1109/ACCESS.2021.3074319, April 2021.

18. Kumawat, D., Abhayankar, D., & Tanwani, S., “Exploring Object Detection Algorithms and
implementation of YOLOv7 and YOLOv8 based model for weapon detection”, International
Journal of Intelligent Systems and Applications in Engineering, vol. 12, pp. 1-10, ISSN: 2147-6799,
November 2024.

19. P. Yadav, N. Gupta, & P. K. Sharma, “Robust Weapon Detection in Dark Environments using
YOLOv7-DarkVision”, Digital Signal Processing, pp. 1-18, DOI: 10.1016/j.dsp.2023.104342,
December 2023.

20. G. F. Shidik, E. Noersasongko, A. Nugraha, P. N. Andono, J. Jumanto and E. J. Kusuma, “A
systematic review of intelligence video surveillance: Trends, techniques, frameworks, and
datasets”, IEEE Access, vol. 7, pp. 457-473, DOI: 10.1109/ACCESS.2019.2955387, January 2019.
