Journal of Physics: Conference Series

PAPER • OPEN ACCESS

YOLO Based Real Time Human Detection Using Deep Learning

To cite this article: Y M Jaswanth Kumar and P Valarmathi 2023 J. Phys.: Conf. Ser. 2466 012034


4th National Conference on Communication Systems (NCOCS 2022) IOP Publishing
Journal of Physics: Conference Series 2466 (2023) 012034 doi:10.1088/1742-6596/2466/1/012034

YOLO Based Real Time Human Detection Using Deep Learning
Y M Jaswanth Kumar 1, P Valarmathi 2*

1 Student, School of Computer Science Engineering, Vellore Institute of Technology, Chennai 600 127, India
2 Assistant Professor Senior Grade 2, School of Computer Science Engineering, Vellore Institute of Technology, Chennai 600 127, India

*valarmathi.sudhakar@vit.ac.in

Abstract. Person identification is widely used in computer vision and is one of its more difficult applications, employed in a variety of fields such as autonomous vehicles, robotics, security tracking, and aiding visually impaired people. As deep learning developed rapidly, numerous algorithms strengthened the link between video analysis and visual comprehension. The goal of all these algorithms, regardless of how their network architectures operate, is to find multiple people inside a complicated image. Freedom of movement in an unknown environment is restricted by vision impairment, so it is crucial to use modern technologies and train them to assist blind people whenever necessary. We provide a system that identifies multiple people encountered in daily life and then prompts a voice to inform the user about both nearby and distant people.

1. Introduction

Humans are taught by their parents to classify numerous objects, including themselves, nearly from birth. Because of its high accuracy and precision, the human visual system can manage several tasks even when the conscious mind is not fully engaged. When there is a lot of data, a more precise system is required to recognize and localize several objects concurrently. Now that such machines exist, we may teach our computers to recognize several items in an image with great accuracy and precision by using better algorithms. Recognizing objects with computer vision is among the most difficult tasks, since it requires a deep understanding of images. To put it another way, an object tracker searches for the presence of objects throughout a number of frames and detects them separately. Complex visuals, information loss, and the transformation of 3D environments into 2D images can all cause problems for the tracker. Detecting objects is important, but if we wish to identify things with high precision, we also need to pinpoint the locations of several objects whose positions may change from image to image. Building the best real-time object tracking algorithm is a difficult undertaking. Such problems have been addressed since 2012 using deep learning. This study attempts to evaluate the performance of both algorithms in a number of real-world circumstances and is specifically intended for persons who are blind or visually impaired. Blind people are forced to follow someone else or make physical contact with them, both of which can be quite dangerous. Without some clever technologies, blind people may find it daunting to navigate new settings on a regular basis. This contribution's major objective is to investigate the possibility of doing more things at once in order to improve the help given to persons who are visually impaired.

Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution
of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Published under licence by IOP Publishing Ltd

2. Related Work

Real-time object tracking and detection are crucial functions in many computer vision systems. Variations in object shape, partial and total occlusion, and scene illumination pose serious challenges for reliable object tracking. The two key components of object tracking that can be accomplished by applying these methods are the representation of the target item and the location prediction [1]. The question of feature sets for trustworthy human recognition is examined using a test case of linear SVM-based human detection; the experiments demonstrate that grids of Histograms of Oriented Gradient (HOG) descriptors significantly outperform existing feature sets for human detection [2]. Shallow trainable structures and handmade features provide the basis of typical person identification systems. It is simple to improve their performance by creating intricate ensembles that combine a number of low-level picture features with high-level data from scene classifiers and object detectors. As a result of deep learning's rapid development, more powerful tools that can learn semantic, high-level, deeper features are becoming available in response to the problems with conventional designs [3]. One approach combines two ideas: (1) supervised pre-training on an auxiliary task followed by domain-specific fine-tuning significantly improves performance when labelled training data are scarce; and (2) when labelled training data are plentiful, bottom-up region proposals can be used with high-capacity convolutional neural networks (CNNs) to localize and segment objects [4]. Another study presents camera applications for person detection and identification based on the convolutional neural network (CNN) YOLO-v2; deep learning-based computer vision is used to determine the person's position and status [5]. A method has been proposed for practically real-time human detection, localization, and recognition using frames from a video dataset that can be acquired from a security camera. After a predetermined amount of time, the model begins receiving input frames and can assign an action label based on a single frame; the action label for the video stream is determined by combining information gathered over a predetermined period of time [6]. A deep learning model known as You Only Look Once (YOLO) has been used to examine person detection from an overhead perspective. The model is tested on person data from an overhead view after being trained on frontal-view data, and data from categorized bounding boxes is also utilized. With a TPR of 95%, the YOLO model generates noticeably good outcomes [7].
3. System Analysis and Feasibility Study

3.1 Existing Method


In this computer vision application, deep learning and machine learning techniques are both effective. The SVM technique is used together with Histograms of Oriented Gradients (HOG) to recognize people. HOG is a feature descriptor that is used to ignore the background and extract significant information from an image. This algorithm is particularly good at identifying text and human data. Many researchers have also used a variety of deep learning techniques to enhance performance in more general scenarios. Recently, real-time person recognition using convolutional neural network (CNN) based techniques was demonstrated.
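The HOG descriptor mentioned above summarizes local gradient orientations cell by cell. A simplified, unnormalized sketch of the per-cell orientation histogram is shown below; real HOG additionally applies block normalization and feeds the concatenated histograms to a linear SVM, and the function name and bin count here are illustrative choices, not the exact pipeline of the cited work:

```python
import numpy as np

def hog_cell_histogram(cell, n_bins=9):
    """Unsigned gradient orientation histogram for one HOG cell.

    cell: 2D grayscale array. Returns a length-n_bins histogram of
    gradient orientations in [0, 180) degrees, weighted by magnitude.
    """
    gy, gx = np.gradient(cell.astype(float))
    magnitude = np.hypot(gx, gy)
    # HOG uses unsigned orientations, folded into [0, 180) degrees.
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bins = (orientation / (180.0 / n_bins)).astype(int) % n_bins
    hist = np.zeros(n_bins)
    for b, m in zip(bins.ravel(), magnitude.ravel()):
        hist[b] += m
    return hist

# A vertical edge gives purely horizontal gradients, so all the
# magnitude lands in the 0-degree orientation bin.
cell = np.zeros((8, 8))
cell[:, 4:] = 255.0
hist = hog_cell_histogram(cell)
```

In the full detector, one such histogram per cell is computed over a sliding window and the concatenated vector is scored by the trained SVM.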

3.2 Proposed Method


We provide a system that identifies multiple people encountered in daily life and then prompts a voice to inform the user about both nearby and distant people. We generate speech using the Web Speech API in order to obtain audio output.
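As a rough sketch of this behaviour, the near/far decision and the prompt text can be derived from bounding-box size, since closer people occupy a larger share of the frame. The function name, box format, and the 15% area threshold below are illustrative assumptions; the returned string would then be handed to the browser's speech synthesis:

```python
def announce_people(detections, frame_area, near_ratio=0.15):
    """Build the voice prompt for detected people.

    detections: list of (x, y, w, h) person bounding boxes in pixels.
    A person whose box covers more than `near_ratio` of the frame is
    treated as nearby; the rest are treated as far away.
    """
    if not detections:
        return "No person detected."
    near = sum(1 for (_, _, w, h) in detections
               if w * h / frame_area > near_ratio)
    far = len(detections) - near
    return f"{near} person(s) nearby, {far} person(s) far away."

# Example: two detections in a 640x480 frame, one large (near), one small (far).
msg = announce_people([(100, 50, 300, 400), (500, 200, 40, 90)], 640 * 480)
```

A fixed area threshold is the simplest possible distance proxy; a deployed system could calibrate it per camera or use stereo depth instead.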

Fig. 1. General Architecture of Human Detection System


Fig. 2. Block Diagram of Human Detection System

3.3 Methodology and Algorithms

3.3.1 Convolutional Neural Network (CNN)


Convolutional neural networks (CNNs, also known as ConvNets) are a family of deep neural networks that are often employed in deep learning to analyse visual imagery. Because the shared-weight convolution kernels shift across the input features and produce effective and efficient outputs, they are also known as shift invariant or space invariant artificial neural networks (SIANN).
CNNs are modified versions of multilayer perceptrons. In a multilayer perceptron, also known as a fully connected network, each neuron in a layer is connected to every neuron in the layer above it. These networks are prone to overfitting the data because of their "full connectivity". Two common strategies for avoiding overfitting are reducing connectivity and penalizing training parameters (regularization). CNNs take a different approach to regularization: they exploit the data's hierarchical structure to assemble increasingly complex patterns from smaller, simpler patterns imprinted in their filters. CNNs are therefore near the lower end in terms of connectivity and complexity. The phrase "convolutional neural network" indicates that the network uses the mathematical operation of convolution. A convolutional network is a neural network that uses convolution in place of general matrix multiplication in at least one of its layers. Applications include video analysis, natural language processing, and image recognition.
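The shared-weight convolution described above can be sketched in a few lines. This is a minimal "valid" 2D convolution implemented, as in most deep learning frameworks, as cross-correlation without kernel flipping; the kernel and image are toy examples:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution with a single shared kernel.

    The same weights slide over every spatial position: this is the
    shared-weight property that makes CNN feature maps respond to a
    pattern wherever it appears in the input.
    """
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-difference kernel fires exactly on the horizontal edge
# between rows 1 and 2 of this image, and nowhere else.
edge_kernel = np.array([[1.0], [-1.0]])
image = np.zeros((5, 5))
image[2:, :] = 1.0
response = conv2d(image, edge_kernel)
```

A real CNN stacks many such kernels per layer, learns their weights by backpropagation, and interleaves nonlinearities and pooling.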

3.3.2 YOLO
YOLO (You Only Look Once) is an object detection algorithm. Finding instances of semantic objects that belong to a certain class (such as people, buildings, or automobiles) in digital photos and videos is the aim of object detection, a computer science discipline related to image processing and computer vision. Two well-studied object detection domains are face and pedestrian detection. Object detection is necessary for several computer vision applications, including image retrieval and video monitoring.
Every object class has distinguishing features that make it easier to classify the objects; for example, all circles are round. These distinctive properties are used to detect the object class. When searching for circles, for instance, one looks for objects that lie at a specific distance from a centre point.
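A standard building block of detectors like YOLO is intersection-over-union (IoU), used both to score predicted boxes against ground truth and to prune duplicate detections of the same person. A minimal sketch, assuming boxes given as (x1, y1, x2, y2) corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    # Corners of the intersection rectangle (empty if boxes are disjoint).
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

full = iou((0, 0, 10, 10), (0, 0, 10, 10))    # identical boxes
none = iou((0, 0, 10, 10), (20, 20, 30, 30))  # disjoint boxes
```

In non-maximum suppression, predictions whose IoU with a higher-confidence box exceeds a threshold (commonly around 0.5) are discarded.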

3.4 Software Development Life Cycle – SDLC


It includes dataset collection, designing the system as per the requirements, and then implementing the system according to the design drawn up; testing is done to check for any errors in the implementation. Finally, the application is run and maintained properly.
Requirement Gathering and Analysis − During this stage, all the requirements that are needed for the development of the system are collected and specified in the system requirements.
System Design − In this stage, the requirements collected in the previous phase are checked and the design of the system is started. This determines the overall system architecture as well as the hardware and software needed for development.
Implementation − The system is initially designed as the set of webpages that are required, and they

are combined in the following phase, using the inputs from the previous phase, that is, system design. Unit testing helps us test the designed system and evaluate each and every unit.
Integration and Testing − After the unit testing in the previous stage, all the units are combined. The merged system is then passed to the next phase to be tested for faults and errors.
Deployment of System − Once the product shows no errors or faults, it is released or installed in the market, satisfying the user's needs.
Maintenance − If the system doesn't satisfy the user's needs, various problems arise. Patches are released to fix the issues raised. After fixing the issues, the next versions of the system are released. Maintenance is required to bring about these changes based on user feedback.

Fig. 3. Process for modelling the Detection System

3.5 Feasibility Study


At this level, the viability of the idea is assessed, and a business proposal with a very basic project design and some cost estimates is presented. The viability of the proposed system must be assessed during system analysis. This guarantees that the proposed solution won't burden the business. A thorough understanding of the key system requirements is necessary for the feasibility study.

Three key considerations involved in the feasibility analysis are:

• Economic Feasibility
• Technical Feasibility
• Social Feasibility
3.5.1 Economic Feasibility
This study is conducted to determine the system's potential financial impact on the organization. The company's resources for research and development of the technology are limited, and the establishment expenses must be justified. Because some of the resources are available free of charge, the entire system can be constructed within the given budget; only the resources that are not freely available need to be purchased.

3.5.2 Technical Feasibility


The main goal of this study is to determine the technical requirements of the system. Any system that is designed must not place a significant burden on the available resources. If the available resources are severely limited, the client will have high


expectations of the system. The system should therefore be designed to meet the stated requirements with modest resource demands, so that no alterations are needed later.
3.5.3 Social Feasibility
The main agenda of this study is to determine how readily the user accepts the system. Once the system is accepted, users should be able to use it effectively. The user should be made comfortable while using the system rather than feel threatened by it. The method used to familiarize the user with the system is the sole element that influences how accepting they are of it.
4. System Requirements and Specifications
4.1 Functional Requirements
For the system to fulfil the fundamental requirements of the end user, these requirements must be met. They are the functions programmers must implement to help users complete their tasks. It is essential to state clearly the requirements to be developed, in a manner that satisfies the end user. These requirements frequently describe how a system will behave in particular situations.
Examples of functional requirements:
• Whenever a user logs into the system, they must authenticate themselves.
• The system shuts down if any cyber threats are detected.
• If a person registers for the first time, a verification email has to be sent to the user.

4.2 Non-Functional Requirements


These are criteria that do not concern specific system functions; rather, they dictate how the system should perform.
Examples of non-functional requirements:
• Emails should not be sent after 12 hours.
• Each action selected by the user should not take more than 10 seconds.
• The next page should load within a few seconds even if there are multiple users on the same page or on any other page in the system.
4.3 System Design
4.3.1 Use Case Diagram
The fundamental goal of a use case diagram is to describe the functions provided by each page that is created and the operations performed by the actor on each of those web pages. The events of the system and the actions of the user are represented in this diagram.

Fig. 4. Use Case Diagram of Human Detection System


4.3.2 Class Diagram


The classes, their properties, operations (or methods), and the connections between the classes are displayed in a class diagram. It is one of the diagrams that shows the actions to be performed in the system, seen from the user's perspective.

Fig. 5. Class Diagram of Human Detection System

4.3.3 Sequence Diagram


A sequence diagram is a form of interaction diagram used for modelling the exchanges between the system and the user; it shows the connections and the sequential order of actions between processes.
It is part of a chart showing the message flow. This diagram tells the user about the events that will be performed entirely by the system.

Fig. 6. Sequence Diagram of Human Detection System


5. Results
5.1 Home page
From the Home Page, the user can navigate to the rest of the pages of the Human Detection Application.

Fig.7. Home page of Human Detection System


5.2 About Page
The About page describes the complete concept behind building this application.

Fig. 8. Complete info of the Human Detection System


5.3 Prediction
The prediction page will capture the image and detect the humans along with the things in and around
the scene that is being captured and specify them with their name.

Fig. 9. Detecting the image captured



6. Conclusion
In this paper, we have developed a YOLO approach for identifying and classifying any item in front of the webcam with excellent accuracy and in a short amount of time. YOLO is quick to pick up all adjacent items, but when tested against a complex scene it overlooks small and distant objects. Achieving acceptable accuracy across all of the deep learning algorithms used for this application remains a significant problem.
In this study, we evaluated the algorithms on a small scale with regard to their accuracy, recall, and inference time. Future work will evaluate alternative algorithms with more parameters and images.
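The accuracy, recall, and TPR figures discussed in this paper derive from simple detection counts. A minimal sketch of how these metrics are computed is shown below; the counts in the example are illustrative, not the paper's measured results:

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall (TPR), and F1 score from detection counts.

    tp: correctly detected people, fp: false alarms, fn: missed people.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical run: 95 people detected, 10 false alarms, 5 missed.
p, r, f1 = detection_metrics(tp=95, fp=10, fn=5)
```

Recall here is what the overhead-view YOLO study reports as TPR; inference time is measured separately as milliseconds per frame.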

7. References
[1] Z.-Q. Zhao, P. Zheng, S.-T. Xu and X. Wu, Object detection with deep learning: A review, IEEE Transactions on Neural Networks and Learning Systems, 30(11), 3212-3232, 2019.

[2] R. Bharti, K. Bhadane, P. Bhadane and A. Gadhe, Object Detection and Recognition for Blind Assistance, International Research Journal of Engineering and Technology (IRJET), e-ISSN: 2395-0056, Volume 06, 2019.

[3] M. Ahmad, I. Ahmed and A. Adnan, Overhead view person detection using YOLO, IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), 2019.

[4] J. Sun, H. Ge and Z. Zhang, AS-YOLO: An Improved YOLOv4 based on Attention Mechanism and SqueezeNet for Person Detection, IEEE Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), 2021.

[5] P. Kannadaguli, YOLO v4 Based Human Detection System Using Aerial Thermal Imaging for UAV Based Surveillance Applications, IEEE Decision Aid Sciences and Application (DASA), 2020.

[6] H. H. Nguyen, T. N. Ta, N. C. Nguyen, V. T. Bui, H. M. Pham and D. M. Nguyen, YOLO Based Real-Time Human Detection for Smart Video Surveillance at the Edge, IEEE International Conference on Communications and Electronics (ICCE), 2020.

[7] S. R. C. De Guzman, L. C. Tan and J. F. Villaverde, Social Distancing Violation Monitoring Using YOLO for Human Detection, IEEE International Conference on Control Science and Systems Engineering (CCSSE), 2021.

[8] S. Degadwala, D. Vyas, U. Chakraborty, A. R. Dider and H. Biswas, YOLO-v4 Deep Learning Model for Medical Face Mask Detection, International Conference on Artificial Intelligence and Smart Systems (ICAIS), 2021.

[9] M. A. B. Zuraimi and F. H. K. Zaman, Vehicle Detection and Tracking using YOLO and Deep SORT, IEEE Symposium on Computer Applications & Industrial Electronics (ISCAIE), 2021.

[10] H. H. Nguyen, T. N. Ta, N. C. Nguyen, V. T. Bui and H. M. Pham, YOLO Based Real-Time Human Detection for Smart Video Surveillance at the Edge, International Conference on Communications and Electronics (ICCE), 2021.
