cis_ipd report
Criminal Detection using Facial Recognition
by
Project Guide
(Information Technology)
Dwarkadas J. Sanghvi College of Engineering
University of Mumbai
2023-2024
Criminal Identification using Facial Recognition
CERTIFICATE
This is to certify that the project entitled Criminal Detection using Facial Recognition is a
bonafide work of Prince Doshi (60003210047), Krish Shah (60003210048), and Sara Kore
(60003210155) of the Department of "Information Technology".
Date: 17/05/2024
DECLARATION
We declare that this written submission represents our ideas in our own words and,
where others' ideas or words have been included, we have adequately cited and
referenced the original sources. We also declare that we have adhered to all principles
of academic honesty and integrity and have not misrepresented, fabricated, or falsified
any idea, data, fact, or source in our submission. We understand that any violation of
the above will be a cause for disciplinary action by the Institute and can also evoke
penal action from the sources which have thus not been properly cited or from whom
proper permission has not been taken, when needed.
Date: 17/05/2024
APPROVAL SHEET
Doshi” (60003210047), “Krish Shah” (60003210048) and “Sara Kore” (60003210155) for
Date: 17/05/2024
Abstract
Law enforcement faces significant challenges in criminal identification due to the vast amount of
CCTV footage and its often poor quality. Manually reviewing every frame is impractical, leading
to missed identification opportunities. To address this, we propose a novel system that leverages AI
and machine learning techniques to enhance and analyze keyframes from CCTV feeds. Using frame
differencing, candidate frames are extracted and clustered with HDBSCAN based on similarity.
Each frame undergoes preprocessing, including cosine transformation, scaling, and grayscale
conversion. Keyframes are identified as unclustered frames and the best frame from each cluster.
Enhanced Super-Resolution Generative Adversarial Networks (ESRGAN) are then employed to
improve image quality and detail in these keyframes.
Following enhancement, facial recognition and mapping algorithms produce facial embeddings
from the keyframes, which are stored in a PostgreSQL database. During identification, these
enhanced embeddings are compared to the database using vector similarity analysis, allowing for
rapid and accurate person identification. This system offers improved image quality, more efficient
criminal recognition, and higher accuracy in suspect identification, thereby enhancing law
enforcement's investigative capabilities and contributing to public safety and crime prevention.
List of Figures
Fig No. Figure Name Page No.
List of Tables
Table No. Table Name Page No.
Table of Contents
Abstract
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Motivation / Objective
1.2 Major Challenges
1.3 Report Overview
Chapter 2 Literature Review
2.1 Existing Work
2.1.1 Literature Related to Existing Systems
2.1.2 Literature Related to Methodology / Approaches / Algorithms
2.1.3 Literature Related to Technology / Tools / Frameworks
2.2 Observations on Existing Work
Chapter 3 Proposed Methodology / Approach
3.1 Problem Definition
3.2 Scope
3.2.1 Assumptions and Constraints
Chapter 4 Project Management
4.1 Project Schedule
4.1.1 Timeline Chart
4.2 Feasibility Study
4.2.1 Technical Feasibility
4.2.2 Operational Feasibility
4.2.3 Economic Feasibility
4.3 Project Resources
4.3.1 Hardware Requirements
4.3.2 Software Requirements
4.3.3 Operating Requirements
Chapter 5 System Design
5.1 System Design
5.1.1 Data Flow Diagram
5.1.2 Use Case Diagram
Chapter 1
Introduction
The introduction outlines the motivation behind enhancing criminal identification through CCTV
footage analysis, addressing challenges such as large data volumes and poor image quality. It
also provides an overview of the report, detailing the proposed AI-driven system for improving
investigative efficiency and accuracy.
1.1 Motivation / Objective
The need to close this gap is what motivates our project. Our suggested AI system uses machine
learning (ML) and artificial intelligence (AI) to evaluate CCTV footage. Keyframes, which are the
most informative parts of the video feeds, will be automatically extracted and enhanced by this
system. Through our focus on these keyframes, we hope to accomplish a number of important goals:
enhanced image quality using methods such as Enhanced Super-Resolution Generative Adversarial
Networks (ESRGAN); simplified suspect identification by ranking leads according to analysis; and,
finally, a notable improvement in the accuracy of suspect identification using facial recognition
algorithms that generate "facial fingerprints" that can be compared with current databases. To put
it simply, this AI-powered system gives law enforcement an effective tool to transform criminal
identification procedures, which will ultimately make society safer.
1.2 Major Challenges
Facial recognition from CCTV footage is fraught with challenges due to a variety of factors. One
significant issue is the variation in lighting conditions. Poor or inconsistent lighting can obscure
facial features, making it difficult for algorithms to accurately identify individuals. Overexposure
or underexposure in footage can cause parts of the face to be too bright or too dark, complicating
the recognition process further. Similarly, the fixed angles of CCTV cameras often fail to capture a
person's face directly. Side views or partial views reduce the effectiveness of facial recognition
systems, which perform best when the subject is facing the camera. This inconsistency in camera
angles makes matching captured images with those in databases more challenging.
Another major challenge is the dynamic nature of human facial expressions. Faces change with
different expressions, such as smiling or frowning, which can distort key features like the eyes and
mouth. This variability requires recognition systems to be robust enough to identify individuals
despite these changes, adding complexity to the task. Additionally, occlusions such as masks, hats,
glasses, and scarves can partially or fully obscure a person's face. The widespread use of masks
during events like the COVID-19 pandemic has significantly hindered facial recognition efforts, as
these occlusions hide critical features needed for accurate identification.
The sheer volume of footage generated by CCTV systems, especially in high-traffic areas, presents
another substantial obstacle. Manually reviewing this vast amount of data is impractical, and
automated systems must be capable of processing and analyzing large volumes quickly and
efficiently. This demands high computational power and sophisticated algorithms to ensure
thorough and rapid analysis. Compounding this issue is the low resolution typical of many CCTV
cameras. Low-resolution images lack the detail needed for precise facial recognition, as blurred
facial features provide insufficient information for accurate identification.
Ensuring real-time processing of footage is also crucial for effective surveillance. Real-time analysis
requires substantial computational resources and highly optimized algorithms to handle continuous
data streams without delay. This capability is essential for timely identification and response but is
technically demanding. Furthermore, maintaining high accuracy in diverse and dynamic
environments is challenging. Facial recognition systems must perform consistently across various
lighting conditions, backgrounds, and crowd densities, which requires continuous tuning and
validation to ensure reliability. Addressing these challenges necessitates advancements in both
hardware and software, including better camera technology, sophisticated algorithms for handling
variability, and powerful computing resources for real-time analysis.
1.3 Report Overview
Chapter 1 covers the introduction and motivation. Chapter 2 focuses on the literature review
and the gaps in existing work. The approach to detecting criminals is discussed in Chapter 3.
Project management details, along with the project schedule, are covered in Chapter 4. The
system architecture and its design are discussed in Chapter 5. The implementation details are
highlighted in Chapter 6. Testing and results are discussed in Chapter 7. The report ends with
the conclusion and future scope.
Chapter 2
Literature Review
This section summarizes the literature review we conducted for the system, which included
research on existing systems, research on various methodologies, approaches, and algorithms, as
well as research on technology and frameworks. The section ends with an observation of existing
systems, which details the various features present in existing systems and how they might be
useful in our system.
Trinetra
The Uttar Pradesh Police introduced Trinetra, a digital platform, in March 2024 to aid in both crime
prevention and investigation. Advanced features of the app include audio search, facial recognition,
QR codes, and the Crime GPT feature, which enables detailed analysis of seizure details. Officers
can quickly identify suspects, access crime files, and expedite investigative procedures with the aid
of this app. It also offers voice sample analysis, which is helpful in cybercrime cases. More than
9.32 lakh criminal records are available in the Trinetra database, which aids frontline police in
identifying suspects during security checks. Through the use of photo linking and facial recognition
technology, the app also makes it possible to look for missing persons.
Clearview AI
Renowned for creating a robust system that can match faces to billions of photos taken from
websites and social media platforms, Clearview AI is a contentious provider of facial recognition
technology. The business's facial recognition algorithm identifies people with high accuracy across
a variety of datasets by analyzing facial features in gathered images to produce distinct facial
embeddings. Law enforcement agencies, security companies, and other organizations have used
Clearview AI's technology for security and investigative needs. However, due to privacy violations,
data scraping activities, worries about mass surveillance, and ethical considerations surrounding
facial recognition technology, Clearview AI has drawn a great deal of criticism and controversy.
NEC NeoFace
Across a wide range of locations and industries, NEC's biometric face recognition technology is
utilized globally to combat crime, stop fraud, ensure public safety, and enhance customer
experience. NeoFace, which employs cutting-edge algorithms and deep learning techniques, is ideal
for use in law enforcement, border control, airport security, retail, banking, and other fields because
of its exceptional ability to recognize faces in a variety of lighting conditions, angles, and partial
occlusions. Through the effective real-time face-to-large database matching, the technology helps
organizations prevent unwanted access, expedite identification procedures, and strengthen security
measures. NeoFace provides scalable solutions that support features like access control, automated
identity verification, and surveillance while integrating seamlessly with current security systems.
NEC guarantees adherence to privacy laws and the moral application of facial recognition
technology by placing a strong emphasis on privacy and data protection.
Criminal Identification for Low Resolution Surveillance: S. P. Patil has presented a face
recognition model based on the Tiny Face Detector, a popular real-time face detector that is web-
friendly and portable. It takes a person's face as input, detects faces and facial landmarks on
images or frames, and outputs a vector known as an embedding of 128 numbers that represents the
key facial features. After being trained on a dataset of images of different criminals, any criminal
can be identified using real-time surveillance video feed as input. The frames are subjected to
feature extraction of the identified faces, and a threshold value is used to determine whether there
is a match between the embeddings and the features of the dataset's images. The admin can
view the results through a portal created with the Django framework, which has the recognized
images saved in a folder in PNG format.[1]
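The threshold-based matching step described above can be sketched as follows. This is a minimal illustration, not the cited paper's code: the 128-dimensional embedding comes from the paper, while the 0.6 distance threshold (a value commonly used with dlib/FaceNet-style embeddings) and the function name are assumptions.

```python
import numpy as np

def euclidean_match(probe, gallery, threshold=0.6):
    """Compare a 128-d probe embedding against enrolled gallery embeddings.

    Returns the index of the closest gallery entry if its Euclidean
    distance falls below `threshold`, else -1 (no match).
    """
    probe = np.asarray(probe, dtype=float)
    gallery = np.asarray(gallery, dtype=float)
    dists = np.linalg.norm(gallery - probe, axis=1)  # one distance per enrolled face
    best = int(np.argmin(dists))
    return best if dists[best] < threshold else -1
```

In a real deployment the gallery rows would be the embeddings of known criminals, and the probe would come from each detected face in the surveillance feed.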
A system for identifying criminals through real-time image processing: In surveillance facial
recognition systems, images are first preprocessed to remove noise and redundancy, and then the
Haar cascade algorithm is used to extract features. To ascertain whether an individual is a criminal
or a suspect, the system compares the processed images with databases of citizens and
local/international watch list users. The individual is deemed innocent if they cannot be located in
either database. The use of these systems has sparked questions about ethical implications and
privacy.[2]
Combined Face Alignment and 3D Face Reconstruction for Face Recognition Applications: A
method for joint face alignment and 3D face reconstruction is proposed in this paper. The process
entails defining the problem to be solved and creating a 3D face model with detachable identity
and expression components. The general process is then described, which includes refining the
landmarks and updating the 3D face shape. The training data preparation is demonstrated, and the
main components of the suggested method are introduced, including learning the 2D landmark
and 3D shape regressors and estimating the 3D-to-2D mapping and landmark visibility. When
tested on benchmark datasets, the approach beats state-of-the-art methods [3].
After grouping crime data into subsets based on the evidence at hand, Shalinda Adikari,
Kaumalee Bogahawatte, and their colleagues used clustering techniques and Naive Bayesian
classification to determine the most likely suspects for criminal incidents. [4]
One of the best methods to improve image super-resolution, according to Nathanael Carraz
Rakotonirina and Andry Rasoanaivo, is to use an Enhanced Super-Resolution Generative
Adversarial Network. It functions even when the input image quality is poor. Noise inputs are
added to a few layers to improve the image's appearance and make it look more photorealistic.
The paper also compares SRGAN, ESRGAN, ESRGAN+, EnhanceNet, and SRCNN. The
generalization of noise injection is not without limitations, and there is a risk that GANs
will be applied maliciously or illegally. [5]
The paper titled "Face Recognition in the Presence of Age Differences using Holistic and
Subpattern-based Approaches" by Sertbay and Toygar, presented at the 8th WSEAS International
Conference on Signal Processing (SIP '09) in Istanbul, Turkey, aims to address the challenge of
face recognition in scenarios where age differences between individuals may impact the accuracy
of recognition systems.
The paper explores two different approaches for face recognition: a holistic approach and a
subpattern-based approach. In the holistic approach, the entire face image is considered as a single
entity for recognition purposes. This method typically involves feature extraction techniques that
capture global facial characteristics without explicitly considering specific facial regions or
features. On the other hand, the subpattern-based approach focuses on extracting and analyzing
local facial features or patterns, such as eyes, nose, and mouth regions. By examining these
individual subpatterns, the recognition system can potentially achieve more robust performance,
particularly in scenarios where age differences between individuals may result in significant
variations in specific facial regions.
The study likely investigates the effectiveness of both approaches in handling age differences
during face recognition tasks. It may explore various feature extraction techniques, classification
algorithms, and performance evaluation metrics to assess the accuracy and robustness of each
approach.[12]
Facial Recognition Technology: Advances and Challenges: This paper offers a perceptive
summary of the most recent developments in facial recognition technology, emphasizing
convolutional neural networks (CNNs) and deep learning-based methods. It talks about the
amazing advancements in facial recognition performance and accuracy brought about by the use
of deep learning techniques.
The study also discusses the difficulties faced by facial recognition systems in practical settings,
such as problems with accuracy, bias, and privacy. It emphasizes how critical it is to deal with these
issues in order to guarantee the ethical and responsible application of facial recognition
technology. [6]
Clustering Algorithms for Video Analysis and Surveillance: Reviewing scalable clustering
algorithms, this paper concentrates on Hierarchical Density-Based Spatial Clustering of
Applications with Noise (HDBSCAN) and its uses in surveillance and video analysis. It looks at
how well these algorithms cluster extracted frames from CCTV footage according to similarity.
The study emphasizes the significance of organizing and structuring surveillance data for effective
analysis, enabling tasks like object detection, anomaly detection, and pattern recognition in
video streams. [7]
Machine Learning Techniques for Image Enhancement in Surveillance Systems: This study
examines several machine learning methods for improving image quality in surveillance systems,
with a focus on deep learning architectures and Generative Adversarial Networks (GANs). It
assesses how well these methods work to enhance the lucidity and nuance of CCTV footage
captured at low resolution. The research shows notable advances in image enhancement through the
use of GANs and deep learning, facilitating improved visualization and analysis of surveillance
footage for the purpose of criminal identification and investigation.[8].
The field of facial recognition technology and surveillance systems is currently characterized by
significant progress, but it is also characterized by ongoing difficulties. A trend toward the use of
convolutional neural networks (CNNs), in particular, and other deep learning-based techniques for
facial recognition tasks is evident in a number of studies. These methods have produced notable
gains in performance and accuracy, suggesting improved capacities for law enforcement
organizations in detecting criminals and augmenting public safety.
But even with these developments, there are still significant issues that need to be resolved in order
to use face recognition technology responsibly. Concerns about privacy, bias, and accuracy continue
to be major obstacles in the development and application of facial recognition systems, highlighting
the significance of ethical and legal frameworks.
To further improve image quality in surveillance applications, the integration of Enhanced Super-
Resolution Generative Adversarial Networks (ESRGAN) has shown promise. ESRGANs contribute
to a clearer visualization of surveillance imagery by improving the resolution and detail of low-
resolution images. This ultimately improves the accuracy and efficacy of tasks related to facial
recognition and identification.
While these advancements represent notable breakthroughs, several challenges and limitations
persist. Despite gains through CNNs and deep learning techniques, practical implementation faces
issues such as varying lighting, camera angles, and occlusions. ESRGANs aim to address low-
resolution images, but may not fully compensate for inherent variability and quality issues in CCTV
feeds.
Privacy remains a significant concern, with ethical and legal questions about surveillance and
potential misuse. Without robust frameworks, there's a high risk of infringing on individual privacy
rights. Biases in facial recognition algorithms also pose a major obstacle, often leading to higher
error rates across demographic groups and undermining reliability and fairness.
In conclusion, while advancements in deep learning, image enhancement, and clustering algorithms
offer promising improvements for surveillance systems, significant challenges related to data
quality, privacy, bias, and computational efficiency must be addressed. Future research and
development must focus on overcoming these obstacles to ensure that facial recognition technology
can be used responsibly and effectively in enhancing public safety and security.
Chapter 3
Proposed Methodology / Approach
3.1 Problem Definition
The project intends to address the difficulties law enforcement agencies encounter in criminal
identification as a result of the copious amount of CCTV footage they possess and the subpar
quality of the images and videos obtained from these sources. The central challenge lies in
facial recognition from CCTV footage due to variables such as occlusions and camera angles. The ultimate
objective of this project is to create an intelligent system that automates keyframe extraction,
improves image quality, and speeds up facial recognition from CCTV video feeds in order to
increase the efficacy and accuracy of criminal recognition tasks performed by law enforcement
agencies. By doing this, the project hopes to improve public safety and efforts to prevent crime by
giving law enforcement officials an effective tool for locating suspects and strengthening their
investigative skills in actual surveillance situations.
3.2 Scope
The goal of the project is to create an artificial intelligence (AI) system that will greatly improve
CCTV footage's capacity for criminal identification. With the use of sophisticated frame
differencing and clustering algorithms, along with preprocessing methods like cosine
transformation and grayscale conversion to isolate pertinent features, this all-inclusive system will
automate keyframe extraction.
Using ESRGAN to improve the quality and resolution of extracted keyframes is a crucial part of
the project. The project's ultimate goal is to create a unified AI-based system that seamlessly
combines features for facial recognition, picture enhancement, and keyframe extraction. Strict
performance reviews will be carried out to evaluate dependability, efficiency, and accuracy;
iterative optimizations will be made to improve performance in criminal identification tasks. The
problem statement's scope includes addressing issues that police departments deal with, such as the
excessive amount of CCTV footage and the low image quality that impede manual identification
procedures.
Throughout the project's lifecycle, ethical concerns such as privacy, data protection, and algorithmic
bias will be crucial to maintaining compliance with legal regulations governing surveillance data
and facial recognition technologies. The created system seeks to empower law enforcement
organizations by utilizing state-of-the-art AI and ML techniques, ultimately improving public safety
outcomes and crime prevention initiatives.
3.2.1 Assumptions and Constraints
Assumptions:
● Consistency of Facial Features: The system assumes that despite variations in lighting, angles, and
occlusions, the facial features captured will be consistent enough across different frames to allow
for accurate clustering and recognition.
● Sufficient Training Data: The facial recognition algorithms assume the availability of a diverse
and extensive dataset for training, which includes various lighting conditions, angles, expressions,
and occlusions to ensure robust performance.
Constraints:
● False Positives and Negatives: The system must manage the constraint of minimizing false
positives (incorrectly identifying individuals) and false negatives (failing to identify individuals).
This requires fine-tuning of the algorithms to balance sensitivity and specificity, which can be
challenging in diverse and dynamic real-world environments.
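The sensitivity/specificity balance described in this constraint can be made concrete with a small sweep over candidate distance thresholds. The distances and labels below are synthetic, and the function is only a sketch of how such tuning might be evaluated.

```python
import numpy as np

def sweep_thresholds(dists, is_same, thresholds):
    """For each candidate distance threshold, report the true-positive
    rate (sensitivity) and false-positive rate (1 - specificity).

    `dists` holds probe-to-gallery distances; `is_same` marks genuine
    pairs. A lower distance means a predicted match.
    """
    dists = np.asarray(dists, dtype=float)
    is_same = np.asarray(is_same, dtype=bool)
    out = []
    for t in thresholds:
        pred = dists < t  # predicted matches at this threshold
        tpr = float((pred & is_same).sum() / max(is_same.sum(), 1))
        fpr = float((pred & ~is_same).sum() / max((~is_same).sum(), 1))
        out.append((t, tpr, fpr))
    return out
```

Raising the threshold trades false negatives for false positives; plotting these pairs gives the familiar ROC curve used to pick an operating point.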
● Data Privacy and Security: The system must adhere to privacy and security regulations governing
the handling and analysis of CCTV footage, ensuring that sensitive information is protected and
appropriately handled.
Chapter 4
Project Management
The proposed system leverages advanced AI and machine learning techniques, including
convolutional neural networks (CNNs) for facial recognition and Enhanced Super-Resolution
Generative Adversarial Networks (ESRGAN) for image enhancement. These technologies are well-
established in the field of computer vision and have shown promising results in similar applications.
The use of frame differencing and scalable clustering algorithms like HDBSCAN ensures that only
the most relevant frames are processed, optimizing performance. Preprocessing steps such as cosine
transformation, scaling, and grayscale conversion are standard techniques that can be effectively
implemented. Given the advancements in hardware, such as powerful GPUs, the technical
implementation of this system is feasible. However, the system's success relies heavily on the
quality and robustness of the training data and the computational resources available for real-time
processing.
Operationally, the system integrates seamlessly into existing CCTV infrastructure, enhancing its
capabilities without requiring a complete overhaul. The use of PostgreSQL databases for storing
facial embeddings ensures efficient data management and retrieval. Real-time processing
capabilities allow law enforcement agencies to respond promptly to potential threats, improving
overall security operations. However, the system requires skilled personnel for setup, maintenance,
and continuous monitoring to address any technical issues and to fine-tune the algorithms based on
evolving requirements. Training law enforcement officers to interpret and act on the system's
outputs is also necessary to maximize its effectiveness. Despite these requirements, the operational
integration is feasible with proper planning and resource allocation.
Economically, the implementation of this system involves initial costs related to
acquiring the necessary hardware, such as high-performance GPUs and servers, and software
development or procurement. There are also ongoing costs for system maintenance, updates, and
potential subscription fees for cloud-based AI services. However, these costs can be offset by the
significant benefits in terms of improved efficiency and accuracy in criminal identification,
potentially reducing the time and resources spent on investigations. Enhanced public safety and
crime prevention can lead to long-term economic benefits by fostering a safer environment,
potentially reducing crime-related costs. Additionally, the system's ability to process large volumes
of footage more effectively than manual review can result in substantial operational cost savings.
Overall, the economic feasibility is favorable, provided that the initial investment is justified by the
long-term benefits and savings.
The success of any project depends heavily on the availability and suitability of resources. In this
section, we outline the essential resources required for the development and operation of our facial
recognition system. This includes hardware specifications, such as CPUs and GPUs, software
components like libraries and databases, and operating system compatibility. By identifying and
addressing these resource requirements upfront, we can ensure a smooth and efficient
implementation of the system, ultimately contributing to its overall success and effectiveness.
Hardware Requirements:
• T4 GPU
• 8 GB RAM
Software Requirements:
• Scikit-Learn
• PostgreSQL
Thus we conclude the section by stating the problem definition, the scope of the system, the
proposed approach to building our system, and finally the project resources, i.e. the software,
hardware, and operating requirements needed for the smooth working of our system.
Chapter 5
System Design
The Level 0 DFD provides an overarching view of the entire system, illustrating the high-level
processes and their interactions.
The Level 1 DFD delves deeper into the processes outlined in Level 0, breaking them down into
more detailed subprocesses.
As illustrated in the figure, the system architecture consists of multiple layers, described below.
Video Input from CCTV Footage: This stage entails gathering video streams from CCTV
cameras that have been positioned tactically throughout the area. These cameras record security
footage continuously, giving analysts access to a multitude of data.
Frame Differencing for Frame Extraction: This technique finds significant motion by comparing
successive frames in the video stream. Frames exhibiting significant alterations are extracted,
suggesting possible illicit activities or noteworthy occurrences.
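As a rough sketch of this step (the function name and thresholds below are illustrative, not values taken from the report), frame differencing over grayscale frames can be expressed in a few lines of NumPy:

```python
import numpy as np

def candidate_frames(frames, threshold=30, min_changed_fraction=0.01):
    """Return indices of frames whose pixel-level change from the
    previous frame exceeds min_changed_fraction of the image area."""
    selected = []
    for i in range(1, len(frames)):
        # Signed difference to avoid uint8 wrap-around.
        diff = np.abs(frames[i].astype(np.int16) - frames[i - 1].astype(np.int16))
        changed = np.mean(diff > threshold)
        if changed > min_changed_fraction:
            selected.append(i)
    return selected

# Two static frames followed by one with a bright moving region.
still = np.zeros((64, 64), dtype=np.uint8)
moved = still.copy()
moved[10:30, 10:30] = 255  # simulated motion
print(candidate_frames([still, still, moved]))  # -> [2]
```

In practice the thresholds would be tuned per camera; the point is that only frames clearing both thresholds move on to clustering.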
Preprocessing for Feature Extraction: To improve image quality and extract significant features,
each extracted frame is put through preprocessing. The frames are prepared for further analysis
using methods like scaling, grayscale conversion, and cosine transformation.
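A minimal preprocessing sketch along these lines, assuming NumPy/SciPy and purely illustrative sizes, might combine grayscale conversion, rescaling, and a 2D discrete cosine transform:

```python
import numpy as np
from scipy.fft import dct

def preprocess(frame_rgb, size=32):
    # Grayscale via standard luminance weights.
    gray = frame_rgb @ np.array([0.299, 0.587, 0.114])
    # Naive rescaling by block averaging (illustrative; a real pipeline
    # would use proper interpolation, e.g. cv2.resize).
    h, w = gray.shape
    gray = gray[:h - h % size, :w - w % size]
    gray = gray.reshape(size, h // size, size, w // size).mean(axis=(1, 3))
    # 2D discrete cosine transform; the low-frequency coefficients
    # summarize the frame's coarse structure.
    coeffs = dct(dct(gray, axis=0, norm='ortho'), axis=1, norm='ortho')
    return coeffs[:8, :8]  # keep the low-frequency block as a compact feature

features = preprocess(np.random.rand(128, 128, 3))
print(features.shape)  # (8, 8)
```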
Face Detection with Haar Cascade Model: To accurately detect faces in the enhanced images, the
pre-trained Haar Cascade Model is used. It can be downloaded from the OpenCV GitHub repository.
An important first step in facial recognition and identification tasks is the efficient identification of
facial regions by this model.
Image Recognition with OpenAI CLIP: With OpenAI CLIP, image recognition tasks can be
completed. Facial feature extraction and the creation of facial embeddings for additional analysis
are made possible. This cutting-edge method improves the system's capacity to precisely identify
and assess faces, even under difficult circumstances.
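Whatever model produces the embeddings (here, CLIP), the matching step itself reduces to a similarity comparison. The sketch below illustrates that step with random stand-in vectors; the function names and the 0.8 threshold are assumptions for illustration, not values from the report:

```python
import numpy as np

def cosine_similarity(a, b):
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(a @ b)

def best_match(query, gallery, threshold=0.8):
    """Return (index, score) of the most similar gallery embedding,
    or (None, score) when no entry clears the threshold."""
    scores = [cosine_similarity(query, g) for g in gallery]
    idx = int(np.argmax(scores))
    return (idx, scores[idx]) if scores[idx] >= threshold else (None, scores[idx])

rng = np.random.default_rng(0)
gallery = [rng.normal(size=512) for _ in range(3)]
query = gallery[1] + rng.normal(scale=0.01, size=512)  # near-duplicate of entry 1
print(best_match(query, gallery)[0])  # 1
```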
Mapping and Recognition with pgvector: The PostgreSQL database uses pgvector for mapping and
recognition. With the help of this extension, facial embeddings can be stored and retrieved
efficiently, allowing quick access to pertinent data for analysis and identification tasks.
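A schema along these lines (table and column names are illustrative, and the query vector is abbreviated) could use pgvector as follows:

```sql
-- Enable the extension and store 512-dimensional facial embeddings.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE faces (
    id        bigserial PRIMARY KEY,
    person    text,
    embedding vector(512)
);

-- Approximate index for fast nearest-neighbour search (cosine distance).
CREATE INDEX ON faces USING ivfflat (embedding vector_cosine_ops);

-- Retrieve the five closest stored faces to a query embedding.
SELECT person, embedding <=> '[0.1, 0.2, ...]' AS distance
FROM faces
ORDER BY distance
LIMIT 5;
```

The `<=>` operator is pgvector's cosine-distance operator; with the `ivfflat` index, the `ORDER BY ... LIMIT` query is served approximately rather than by a full scan.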
Search Optimization for Quicker Mapping: Search optimization strategies are used to speed up
the mapping procedure. By improving the speed and effectiveness of data retrieval from the
database, these optimizations guarantee prompt access to crucial data for analysis.
Alert Generation and Metadata Display: The system creates alerts for law enforcement agencies
and shows the identified person's metadata after finding a match in the database. In response to
possible criminal activity, these alerts enable prompt action and decision-making by providing
timely notifications.
UCF Dataset:
Video Duration: The dataset comprises 128 hours of video footage, providing a substantial amount
of data for training and testing anomaly detection algorithms.
Anomaly Categories: Thirteen realistic anomalies are included in the dataset, covering a wide range
of criminal activities such as abuse, arrest, arson, assault, vehicle accidents, explosions, fighting,
robbery, shootings, stealing, shoplifting, and vandalism. Each category represents a distinct type of
criminal behavior, enabling comprehensive analysis and evaluation of the system's performance.
Real-world Scenarios: The videos are captured from real-world surveillance cameras, depicting
actual incidents and scenarios that occur in public spaces. This realism enhances the dataset's value
for training and evaluating anomaly detection models in real-world settings.
Uncut Surveillance Tapes: The dataset consists of 1900 uncut surveillance tapes, ensuring that the
videos capture the full context of each incident without any edits or modifications. This preserves
the authenticity of the data and allows for a more accurate assessment of the system's performance.
Mugshot Dataset:
Dataset Size: With a total of 68149 entries, the Mugshot dataset provides a large and diverse
collection of prisoner mugshots and related information. This extensive dataset enables
comprehensive training and testing of facial recognition algorithms.
Mugshot Images: Each entry in the dataset includes front and side views of prisoner mugshots,
serving as the primary data for facial recognition tasks. These images capture various facial
expressions and angles, enhancing the dataset's diversity and utility for training robust recognition
models.
Public Domain Status: Since the Mugshot dataset is sourced from a government agency and
consists of factual information, it is considered to be in the public domain. This means that the
dataset is freely available for use and distribution without any copyright restrictions, making it an
accessible resource for research and development purposes.
By leveraging these datasets, researchers and practitioners can train and evaluate facial recognition and
anomaly detection algorithms with real-world data, ultimately improving the accuracy and effectiveness of
surveillance and security systems.
Chapter 6
Implementation
This section includes the implementation details used for preparing the system which includes the
working of our system, along with the algorithms and tools used by us to build it and snapshots of
the outputs of our system.
• CCTV cameras are strategically placed based on risk assessments and crime statistics to ensure
comprehensive coverage of high-traffic and high-risk areas.
• Camera placement takes into account factors such as lighting conditions, field of view, and potential
obstructions to maximize surveillance effectiveness.
• Cameras are equipped with features such as pan, tilt, and zoom for flexible monitoring of dynamic
environments.
• Video feeds are transmitted in real-time to a central monitoring station for immediate analysis and
response.
• Frame differencing algorithms employ techniques like thresholding and background subtraction to
identify regions of interest with significant motion.
• Temporal filtering algorithms suppress transient motion caused by factors like wind or moving
foliage, focusing on sustained motion indicative of human activity.
• Extracted frames are timestamped and tagged with metadata such as camera location, motion
intensity, and scene context for later analysis.
• Density estimation techniques dynamically adjust cluster boundaries to adapt to variations in data
density and noise levels.
• Hierarchical clustering structures enable multi-level organization of data, allowing for fine-grained
analysis of complex scenes with overlapping activities.
• Cluster quality metrics such as silhouette scores and coherence indices are computed to assess the
effectiveness of clustering algorithms and guide parameter tuning.
• Preprocessing steps are tailored to address specific challenges in facial feature extraction, such as
variations in illumination, pose, and facial expression.
• Adaptive scaling techniques adjust image resolution and aspect ratio to normalize facial size and
orientation across frames.
• Histogram equalization methods enhance contrast and detail in low-contrast regions, improving the
visibility of facial features.
• Noise reduction algorithms such as Gaussian smoothing and median filtering mitigate artifacts
introduced by image compression and sensor noise.
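Histogram equalization, one of the enhancement steps listed above, can be implemented directly in NumPy; this is a generic sketch of the classic algorithm, not the report's exact pipeline:

```python
import numpy as np

def equalize_histogram(gray):
    """Spread the intensity histogram of an 8-bit image across the full
    0-255 range, boosting contrast in dim regions."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0].min()
    # Classic histogram-equalization mapping, as a lookup table.
    lut = np.clip((cdf - cdf_min) / (cdf[-1] - cdf_min), 0, 1)
    lut = np.round(lut * 255).astype(np.uint8)
    return lut[gray]

# A low-contrast image confined to [100, 120] ...
rng = np.random.default_rng(1)
low = rng.integers(100, 121, size=(64, 64), dtype=np.uint8)
eq = equalize_histogram(low)
# ... is stretched to span the full dynamic range.
print(int(eq.min()), int(eq.max()))  # 0 255
```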
• ESRGAN architectures consist of multiple layers of convolutional neural networks trained on large-
scale image datasets to learn complex image transformations.
• Perceptual loss functions measure the perceptual similarity between generated and ground truth
images, guiding model optimization to prioritize relevant features.
• Adversarial training frameworks introduce adversarial examples to the training process, enhancing
model robustness against adversarial attacks and domain shifts.
• Haar Cascade classifiers utilize cascading stages of simple classifiers trained on integral image
features to efficiently detect faces in real-time.
• Non-maximum suppression algorithms prune overlapping detections and retain only the most
confident face candidates, reducing redundancy and computational overhead.
• Facial landmark detection algorithms localize key facial landmarks such as eyes, nose, and mouth
to facilitate accurate alignment and pose estimation.
• Deep neural network architectures such as ResNet and VGGNet are employed for facial feature
extraction, leveraging large-scale face datasets for robust representation learning.
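The non-maximum suppression step mentioned above can be sketched in NumPy; boxes are in (x1, y1, x2, y2) form and the IoU threshold is illustrative:

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop overlapping candidates, repeat."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        # Intersection of the top box with the remaining candidates.
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_threshold]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]
```

The two overlapping detections collapse to the higher-scoring one, while the distant box survives.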
• PostgreSQL, a robust relational database management system, is employed for its scalability,
reliability, and support for advanced indexing and querying capabilities.
• pgvector, an extension for PostgreSQL, enables efficient storage and retrieval of high-dimensional
vector data, such as facial feature embeddings.
• Vector indexing techniques such as similarity search trees and inverted indexes optimize query
performance for large-scale vector datasets.
• Parallel processing frameworks like Apache Spark are utilized for distributed computation of
embeddings across large footage archives.
• KD trees are binary trees used for partitioning multidimensional space into regions to facilitate
efficient nearest neighbor search.
• Splitting criteria such as median or mean are employed to recursively divide the feature space along
different dimensions.
• Balanced tree structures ensure uniform distribution of data points within each partition, minimizing
search complexity.
• Nearest neighbor search algorithms traverse the tree in a recursive manner, pruning branches based
on distance bounds and node characteristics to expedite search operations.
• Approximate nearest neighbor search techniques such as locality-sensitive hashing (LSH) are
employed to trade off search accuracy for improved efficiency in high-dimensional spaces.
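Exact nearest-neighbour lookup with a KD tree can be sketched with SciPy (the gallery of random vectors stands in for stored embeddings):

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(7)
gallery = rng.normal(size=(1000, 16))   # stand-ins for stored embeddings
tree = KDTree(gallery)                  # built once, queried many times

query = gallery[123] + 0.001            # near-duplicate of entry 123
distance, index = tree.query(query, k=1)
print(index)  # 123
```

KD trees work well at modest dimensionality; for 512-dimensional face embeddings, approximate methods such as LSH or pgvector's ivfflat index are the usual trade-off, as noted above.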
• When a match is found between a queried facial feature embedding and the database, an alert is
generated to notify relevant authorities or security personnel.
• Alerts are dispatched through various communication channels such as email, SMS, or push
notifications, ensuring timely response to potential security threats.
• Metadata enrichment techniques integrate data from external sources such as social media, public
records, and law enforcement databases to augment the information available for analysis and
response.
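A minimal alert-construction sketch (the field names and confidence threshold are assumptions for illustration, not the report's specification) might look like:

```python
import datetime

def build_alert(person, score, camera, threshold=0.85):
    """Create an alert payload when a match clears the confidence
    threshold; dispatch (email/SMS/push) would consume this record."""
    if score < threshold:
        return None
    return {
        "person": person,
        "confidence": round(score, 3),
        "camera": camera,
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }

alert = build_alert("suspect-042", 0.91, "cam-entrance-3")
print(alert["person"], alert["confidence"])  # suspect-042 0.91
```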
• Performance metrics such as detection accuracy, false positive rate, and response time are monitored
continuously to evaluate system effectiveness and identify areas for improvement.
• Feedback mechanisms gather input from end-users, stakeholders, and domain experts to inform
system refinements and enhancements.
• Continuous integration and deployment (CI/CD) pipelines automate the delivery of software
updates and model improvements, ensuring seamless integration of new features and bug fixes.
• Threat intelligence feeds and anomaly detection algorithms analyze emerging patterns of criminal
activity and security threats, enabling proactive measures to mitigate risks and vulnerabilities.
By incorporating these detailed components, the proposed criminal recognition system achieves a high level
of sophistication, reliability, and effectiveness in detecting and preventing criminal activity, thereby
enhancing public safety and security.
ESRGAN reconstructs high-resolution images from low-resolution inputs, improving the clarity and fidelity of surveillance footage. By
enhancing the visual quality of the footage, ESRGAN enables more accurate facial recognition and
analysis, leading to improved identification of individuals and activities captured in the video
streams. This enhancement is particularly crucial in scenarios where the quality of surveillance
footage may be compromised due to factors such as low lighting conditions or camera limitations.
3. Facial Recognition and Mapping:
• OpenCV, Pre-trained Haar Cascade Model:
• Purpose: The pre-trained Haar Cascade Model is employed for face detection within the
surveillance footage. This model utilizes a cascade of classifiers to efficiently detect faces in images,
providing the initial step in facial recognition. By accurately localizing faces in the video streams,
the Haar Cascade Model enables subsequent analysis and identification of individuals.
• OpenAI CLIP (Contrastive Language-Image Pre-training):
• Purpose: CLIP is utilized for facial recognition tasks, including facial feature extraction and
generation of facial embeddings. Trained on a diverse dataset of image-text pairs, CLIP can
associate images with textual descriptions, enabling robust image understanding capabilities. By
extracting facial features and generating embeddings for each detected face, CLIP enhances the
accuracy and robustness of facial recognition, facilitating accurate identification of individuals
across varying poses, lighting conditions, and image resolutions.
• pgvector for Mapping and Recognition in PostgreSQL Database:
• Purpose: Facial embeddings and associated metadata are efficiently stored and retrieved using
pgvector in the PostgreSQL database. This optimization improves the responsiveness and scalability
of facial recognition tasks, enabling real-time identification of known individuals and potential
suspects. By leveraging pgvector's capabilities for vectorized data storage and indexing, the system
can efficiently query the database for matching facial embeddings, facilitating rapid identification
and retrieval of relevant information.
• Additionally, for search optimization, the system employs KD trees, a data structure used for
efficient nearest neighbor search in multidimensional spaces. KD trees partition the feature space
into regions, enabling rapid retrieval of pertinent data from the database and facilitating quick
identification of matching individuals. This optimization enhances the responsiveness and
scalability of the system, improving its effectiveness in identifying and apprehending criminals.
• By integrating these advanced algorithms and techniques, the proposed criminal recognition system
achieves a high level of accuracy, efficiency, and scalability, contributing to enhanced public safety
and security initiatives.
OpenCV, an open-source computer vision and machine learning library, serves as the backbone of
the project, providing essential functionalities for processing CCTV footage. Leveraging OpenCV's
extensive suite of functions, the system performs tasks like frame extraction from video streams and
face detection using pre-trained models such as the Haar Cascade Classifier. Additionally, OpenCV
facilitates various image preprocessing tasks to prepare data for further analysis, ensuring that the
surveillance footage is ready for subsequent processing steps.
OpenAI CLIP, a cutting-edge deep learning model developed by OpenAI, enhances the system's
facial recognition capabilities. Trained on a large dataset of image-text pairs, CLIP learns to
associate images with textual descriptions, enabling robust image understanding. Within the project,
OpenAI CLIP is utilized for facial recognition tasks, extracting facial features and generating
embeddings for detected faces. By leveraging CLIP's multimodal capabilities, the system achieves
accurate identification of individuals across varying poses, lighting conditions, and image
resolutions.
In the database management aspect of the project, pgvector, an extension for PostgreSQL, optimizes
storage and retrieval of high-dimensional vector data. Designed to handle vector data types
efficiently, pgvector enhances the responsiveness and scalability of the PostgreSQL database.
Specifically, pgvector is employed for mapping and recognition tasks, enabling efficient storage and
retrieval of facial embeddings and associated metadata. This optimization ensures rapid facial
recognition and analysis, facilitating real-time identification of individuals in surveillance footage.
Chapter 7
Testing and Results
System testing is a critical stage in software development that aims to verify whether an integrated
system meets the defined requirements and functions correctly. This section includes creating a test
plan, developing various test cases and testing methods, and then executing those tests to obtain
experimental results. The ultimate objective of system testing is to identify any defects, bugs, or
other issues that could impact the system's performance, security, or functionality.
The test plan is composed of several phases: creation of the plan as the first step, then
development of the test cases, unit testing, integration testing, system testing, usability
testing, performance testing, and finally report preparation. Every stage is carefully
constructed to fulfill a particular testing objective and to make sure the system functions
correctly while its performance is strictly analyzed.
Testing is carried out for each component of the system as specified, namely frame
extraction, image enhancement with ESRGAN, facial-embedding generation with OpenAI
CLIP, mapping and recognition in the PostgreSQL database via pgvector, the search-optimization
methods, alert generation, component integration, and overall performance. Each component has
its own testing process: verifying that frame differencing isolates regions of human motion,
that ESRGAN measurably improves image quality, that mapping and recognition through
pgvector are efficient, and that the system remains responsive as load conditions change.
This comprehensive testing plan has the objective of proving the system's accuracy,
reliability, and efficiency. Verification also ensures that the system can perform its tasks
properly from the outset in real-world scenarios, using data captured under varied lighting
and weather conditions, to help sustain public safety and security.
Test Case UT02
Test Scenario: Check that the most pertinent frame is chosen following frame differencing and clustering.
Test Steps: 1. Record a variety of motion-levelled video feeds. 2. Apply frame differencing to identify regions of noteworthy motion, then utilize HDBSCAN clustering to organize comparable frames. 3. Verify which frame within each cluster was chosen as the most pertinent.
Test Data: A video stream with various moving scenes in it.
Expected Result: The most pertinent frame from each cluster, representing regions of notable motion or noteworthy events, is selected accurately.
Actual Result: The most relevant frame is selected.
Pass/Fail: Pass

Test Case UT03
Test Scenario: Video input is captured at a different angle.
Test Steps: 1. Record video streams from CCTV cameras positioned at different angles.
Test Data: Footage from CCTV cameras.
Expected Result: The system recognizes regions of interest precisely.
Actual Result: Frames are generated from videos taken at different angles.
Pass/Fail: Pass

Test Case UT07
Test Scenario: Check whether the system can identify and map frames that resemble stored photos, even if they differ.
Test Steps: 1. Take the frames' facial embeddings and compare them to the embeddings of the photos in the database. 2. Check whether the system can recognize and associate frames having comparable facial features with previously stored images.
Test Data: 1. Video frames containing people who resemble old pictures. 2. Images kept in the database.
Expected Result: The system identifies and maps frames that bear resemblance to stored photos, demonstrating its ability to identify faces.
Actual Result: The system is able to differentiate similar faces.
Pass/Fail: Pass

Test Case IT02
Test Scenario: Make sure the mapping and facial recognition modules are properly integrated.
Test Steps: 1. Use facial recognition to create facial embeddings from the extracted frames. 2. Connect the created embeddings to the images stored in the database. 3. Check that the mapped people correspond with the matching photos.
Test Data: Enhanced images.
Expected Result: Precise mapping of facial embeddings to stored photos guarantees precise identification.
Actual Result: Both units are integrated properly.
Pass/Fail: Pass

Test Case IT03
Test Scenario: Verify that the database interaction and alert generation modules work together seamlessly.
Test Steps: 1. Act out the scenario where a suspect is identified by facial recognition. 2. Activate the alert generation module to create an alert for the identified suspect.
Test Data: Mapped images.
Expected Result: The system correctly creates alerts for recognized suspects and enters them into the database error-free.
Actual Result: Alerts are sent to the concerned department.
Pass/Fail: Pass

Test Case ST01
Test Scenario: Verify the system's overall operation, including its ability to identify suspects in input videos.
Test Steps: Carry out the full system workflow, which includes frame extraction, image enhancement, face recognition, and mapping.
Test Data: A video feed with a variety of scenarios, such as various lighting settings, mask-wearing people, and different angles.
Expected Result: The system maps and correctly identifies people from the given video feed, proving its usefulness in practical situations.
Actual Result: Identification is done correctly.
Pass/Fail: Pass
Unit testing:
Unit testing is used to verify particular features or parts of the system, like facial mapping, image
enhancement, frame extraction, frame clustering, and masked person detection. Unit testing helps
find flaws or errors early in the development process by isolating each component, which makes it
simpler to fix them before integrating with other units.
Integration testing:
Integration testing assesses how various system modules or components interact with one another.
It guarantees the smooth operation of these components and the proper flow of data between them.
Verifying the integration of different modules, such as image enhancement and frame extraction,
facial recognition and mapping, and database interaction with alert generation, is the main goal of
integration test cases. Integration testing assists in identifying potential errors or inconsistencies.
System testing:
System testing evaluates the overall functionality and behavior of the system. It assesses whether
the system satisfies its objectives and operates as planned in actual use cases. The system test case
assesses the system's overall performance, taking into account image enhancement, face
recognition, mapping, and frame extraction. System testing guarantees that all parts function
cohesively and satisfy the expectations of the stakeholders by evaluating the system as a whole.
Performance testing:
Performance testing assesses the system's scalability, stability, and responsiveness to changes in
workload. In order to evaluate the effectiveness and dependability of the system, it tracks
important performance indicators including response time, throughput, and resource utilization. A
performance test case evaluates the system's data processing speed at both peak and typical load
levels.
Conclusion
In conclusion, the presented system is a significant step toward resolving the difficulties law
enforcement agencies face when using closed-circuit television (CCTV) to identify criminals. Facial
detection, AI, and machine learning together provide a robust, cutting-edge method for dealing with
surveillance data that is large in volume and often poor in quality.
Through frame differencing, scalable clustering, and super-resolution enhancement, the system
draws clear, detailed keyframes out of CCTV footage. The resulting improvement in image quality
makes facial recognition and mapping more accurate and faster, so that individuals can be
identified more effectively.
What differentiates the system is not only its enhanced investigative capability. By building on
state-of-the-art AI methods such as GANs, it contributes both to improving public safety and to
reducing crime. Privacy-preserving protocols ensure that the monitoring technology is developed
and applied ethically and responsibly, safeguarding civil rights while contributing to security.
It can therefore be argued that the proposed system has the potential to solve the criminal
identification difficulties that law enforcement agencies face by transforming the procedures they
follow. Harnessing the power of AI and machine learning allows law enforcement organizations not
merely to adapt to a changing crime landscape but to stay ahead of it, achieving safer communities
and advocating for a secure society.
Future Scopes
Face Frontalization (Facial Reconstruction): To boost facial recognition accuracy, use deep
learning to transform non-frontal face images into a standard frontal orientation.
Multi-Modal Fusion for Enhanced Recognition: Combining voice analysis, gait analysis, and facial
recognition will yield higher precision, especially in difficult conditions.
Semantic Segmentation for Object Detection: Applying semantic segmentation algorithms to isolate
key objects or people in surveillance footage could further increase the efficiency of criminal
identification.
Real-Time Video Analytics and Alerting: Develop real-time video analytics that automatically
detect possible threats and provide live tracking, enabling a fast response to such cases.
Behavioral Analysis and Anomaly Detection: Utilize anomaly detection algorithms to flag behavior
indicative of potentially criminal activity, raising suspicion and prompting deeper investigation.
Privacy-Preserving Data Sharing: Enable secure sharing of information between agencies through
techniques such as differential privacy or federated learning, so that data can be exchanged
between agents while the underlying personal information remains protected.
References
[1] S. P. Patil et al., "Criminal Identification for Low Resolution Surveillance," in VIVA-
Tech International Journal for Research and Innovation, Mumbai, 2021, vol. 1.
[3] F. Liu, Q. Zhao, X. Liu and D. Zeng, "Joint Face Alignment and 3D Face Reconstruction
with Application to Face Recognition," in IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 42, no. 3, pp. 664-678, 1 March 2020.
[4] Kaumalee Bogahawatte & Shalinda Adikari, “Intelligent Criminal Identification System”
The 8th International Conference on Computer Science & Education (ICCSE 2013)
Colombo, Sri Lanka April 26-28, 2013
[6] Parkhi, O. M., Vedaldi, A., & Zisserman, A. (2015). Deep Face Recognition. In
Proceedings of the British Machine Vision Conference (BMVC) (pp. 41.1-41.12).
[7] Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., ... & Shi, W.
(2017). Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial
Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR) (pp. 4681-4690).
[8] Campello, R. J., Moulavi, D., Zimek, A., & Sander, J. (2015). Hierarchical Density
Estimates for Data Clustering, Visualization, and Outlier Detection. ACM Transactions on
Knowledge Discovery from Data (TKDD), 10(1), 5.
[9] Wang, X., Yu, K., Wu, S., Gu, J., Liu, Y., Dong, C., ... & Wang, Y. (2018). ESRGAN:
Enhanced Super-Resolution Generative Adversarial Networks. In Proceedings of the
European Conference on Computer Vision (ECCV) (pp. 63-79).
[11] Waqas Sultani, Chen Chen and Mubarak Shah, "Real-World Anomaly Detection in Surveillance
Videos," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018:
https://webpages.charlotte.edu/cchen62/dataset.html
[12] H. Sertbay and Ö. Toygar, "Face Recognition in the Presence of Age Differences using
Holistic and Subpattern-based Approaches," in Proceedings of the 8th WSEAS International
Conference on Signal Processing (SIP '09), Istanbul, Turkey, May 30 - June 1, 2009.
Acknowledgement
Sara Kore
Prince Doshi
Krish Shah