drones
Article
A Human-Detection Method Based on YOLOv5 and Transfer
Learning Using Thermal Image Data from UAV Perspective for
Surveillance System
Aprinaldi Jasa Mantau 1, * , Irawan Widi Widayat 1 , Jenq-Shiou Leu 2 and Mario Köppen 1
1 Department of Computer Science and System Engineering (CSSE), Graduate School of Computer Science and
System Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka-shi, Fukuoka 820-8502, Japan
2 Department of Electronic and Computer Engineering (ECE), National Taiwan University of Science and
Technology, Taipei City 106, Taiwan
* Correspondence: mantau.aprinaldi@ieee.org
Abstract: At this time, many illegal activities are being carried out, such as illegal mining,
hunting, logging, and forest burning. These things can have a substantial negative impact on the
environment. These illegal activities are increasingly rampant because of the limited number of
officers and the high cost required to monitor them. One possible solution is to create a surveillance
system that utilizes artificial intelligence to monitor the area. Unmanned aerial vehicles (UAV) and
NVIDIA Jetson modules (general-purpose GPUs) can be inexpensive and efficient because they use
few resources. The problem from the object-detection field utilizing the drone’s perspective is that
the objects are relatively small compared to the observation space, and there are also illumination
and environmental challenges. In this study, we will demonstrate the use of the state-of-the-art
object-detection method you only look once (YOLO) v5 using a dataset of visual images taken from a
UAV (RGB-image), along with thermal infrared information (TIR), to find poachers. There are seven
scenario training methods that we have employed in this research with RGB and thermal infrared
data to find the best model that we will deploy on the Jetson Nano module later. The experimental
result shows that a new model with pre-trained model transfer learning from the MS COCO dataset
can improve YOLOv5 to detect the human–object in the RGBT image dataset.
Keywords: human detection; Jetson; surveillance; thermal imaging; UAV; YOLO
Citation: Mantau, A.J.; Widayat, I.W.; Leu, J.-S.; Köppen, M. A Human-Detection Method Based on
YOLOv5 and Transfer Learning Using Thermal Image Data from UAV Perspective for Surveillance
System. Drones 2022, 6, 290. https://doi.org/10.3390/drones6100290
Academic Editor: Federico Tombari

Received: 7 September 2022
Accepted: 30 September 2022
Published: 4 October 2022
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (https://creativecommons.org/licenses/by/4.0/).
Drones 2022, 6, 290. https://doi.org/10.3390/drones6100290 https://www.mdpi.com/journal/drones

1. Introduction
Since 2012, the United Nations has proclaimed 21 March as World Forest Day [1]. The
goal is to make people aware of the importance of forest sustainability. Based on data
from the Food and Agriculture Organization of the United Nations [2], Indonesia is the
eighth most forested country, with a total forest area of more than 50% of the total land area
or about 93 million hectares. However, many illegal activities occur in large forest areas
in Indonesia, such as land clearing without a permit, forest fires, and illegal hunting [3].
Illegal activities carried out in the forest environment may cause many natural disasters
such as landslides, floods, and the loss of the biological environment for many animals [4].
Various efforts have been made to maintain Indonesia’s forests and their various biological
species, yet many illegal activities still occur. Due to a lack of personnel resources and
thorough area coverage, the traditional technique of patrolling and monitoring these areas
has not been able to resolve this issue.
The answer to this problem is to develop a surveillance system that monitors the area
using artificial intelligence. Unmanned aerial vehicles (UAVs, often known as drones) and
NVIDIA Jetson modules, which are general-purpose GPUs, are an affordable and effective
solution because they need only a small number of resources. The proposed solution for a
surveillance system using drones and Jetson can be seen in Figure 1.

Figure 1. UAVs with NVIDIA Jetson Nano for surveillance system.
UAV technology is now very advanced and is the most realistic solution available
because it is flexible, fast, relatively inexpensive, lightweight, and easy to use [5]. In several
fields of study, UAVs have been employed as tools for area and target coverage, path and
trajectory planning, image analysis and vision-based techniques, networking, and flight
control [6]. Despite the widespread use of UAVs in these fields, many challenges remain,
including weather conditions, shadows, illumination, and other variations. To address
these challenges, RGBT images, that is, red, green, and blue (RGB) images paired with
thermal infrared information, are utilized.
Conceptually, a thermal infrared (TIR) image represents data that capture information
outside the spectrum of the human eye. It captures wavelengths outside the visible light
spectrum, as shown in Figure 2. This makes TIR robust to the changes in light intensity
that affect the color captured by the human eye. However, TIR also has a weakness: it is
sensitive to temperature changes and does not contain as much detailed information
as visual RGB images [7].
Figure 2. Wavelength of light.
Due to the small size of humans in UAV videos, the UAV’s motion, and the low
resolution, the ability to detect poachers in UAV video, particularly thermal infrared
footage, is an important topic of research. In this present study, several scenarios have been
used to enhance the you only look once (YOLO) [8] object-detection method, which focuses
on small human–object detection from a UAV perspective. The target presents a harder
challenge for the object detection due to its various shapes and dense crowds. Therefore,
the YOLOv5 model was trained using the RGB image and TIR dataset in order to evaluate
how well it performed when identifying humans from aerial perspective data.
The main contributions of this paper are as follows:
• Optimizing the YOLOv5s algorithm for a small human–object detection dataset via the
transfer learning method.
• Developing a method to handle different environmental issues, including illumination
and mobility change, using thermal infrared (TIR) images in addition to RGB (RGBT)
images.
• Manually annotating the original dataset to make it YOLO-format-compatible; the
annotation will be made available to the public.
• Proposing a surveillance system for wildlife conservation using the NVIDIA Jetson Nano
module.
This paper is organized as follows: Section 2 describes the object detection for surveil-
lance and provides a brief overview of the NVIDIA Jetson Modules. Section 3 consists of
the methodology and necessary background information, as well as the evaluation method.
Sections 4 and 5 consist of the experiment’s results and the conclusion, respectively.
2. Related Work
The technique of object detection in UAVs or drones has been developed for use in a
variety of contexts, including aerial image analysis, monitoring agents, delivery routing
agents, intelligent surveillance, and air force security. Hengstler et al. [9] introduce a new
approach to the distribution model of the surveillance camera by using a low-resolution
stereo camera that calculates all the captured images for the position, range, and dimension
that UAVs use, called MeshEye. Widiyanto et al. [10] introduced a PSO algorithm for
the odor-source localization model of automatic robotic movement by reconstructing two
different points of robotic sensing. Zhao et al. [11] proposed a new mixed YOLOv3-LITE
for image detection precision and speed, which can be used on a non-GPU computing
system such as a mobile or portable device.
Several studies have been conducted in the field of object detection, especially with the
availability of large datasets online and the increasing computing power, which have made
extraordinary achievements in the field of computer vision [12]. It has been observed that
object detection has been able to solve general and specific problems. The two examples of
single-stage detection include you only look once (YOLO) and single-shot multi-box detec-
tor (SSD) [13]. Meanwhile, the RCNN family, which includes RCNN [14], Fast RCNN [15],
and Faster RCNN [16], is categorized as being composed of two-stage detectors. These two
categories of deep-learning-based detectors are divided based on accuracy and processing
time.
2.1. You Only Look Once (YOLO)

The first YOLO method was introduced by Redmon et al. in 2016 [8]. This single
convolutional network predicts object categories and locations at up to 45 fps. The YOLO
algorithm processes the whole image in a single pass and divides it into an S × S grid.
Each grid cell is responsible for detecting the objects whose centers fall inside it,
predicting their bounding boxes together with the class probabilities. The YOLO
architecture has 24 convolutional layers for performing feature extraction and two fully
connected layers for predicting the bounding box of the predicted object. YOLO is also
known for combining high performance with a small model size, which makes it an ideal
candidate for real-time, on-device object detection.
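The grid assignment described above can be illustrated with a minimal Python sketch. It assumes S = 7, the grid size used in the original YOLO paper; the function names are hypothetical and are not taken from any YOLO code base. The sketch only shows which cell is responsible for an object and where the object center lies inside that cell.

```python
def responsible_cell(cx, cy, img_w, img_h, s=7):
    """Return (row, col) of the S x S grid cell that contains the box centre."""
    if not (0 <= cx < img_w and 0 <= cy < img_h):
        raise ValueError("box centre must lie inside the image")
    col = int(cx / img_w * s)  # horizontal cell index
    row = int(cy / img_h * s)  # vertical cell index
    return row, col


def cell_relative_offsets(cx, cy, img_w, img_h, s=7):
    """Return the centre position inside its cell, each value in [0, 1)."""
    row, col = responsible_cell(cx, cy, img_w, img_h, s)
    x_off = cx / img_w * s - col
    y_off = cy / img_h * s - row
    return x_off, y_off
```

For example, a person centered in a 640 × 480 image falls into the middle cell of the 7 × 7 grid, and that cell alone is responsible for predicting the person's box.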
By late 2021, YOLO had reached version 5. The first three YOLO versions were released
in 2016, 2017, and 2018, respectively, and two further versions, YOLOv4 and YOLOv5,
were released within a few months of each other in 2020.
YOLO version 2 (YOLOv2) replaced the original architecture with a 19-layer feature
extractor called Darknet-19 [17]. In the third version (YOLOv3), the network was updated
again to a deeper architecture known as Darknet-53 [18]. YOLO version 4 (YOLOv4) uses
CSP Darknet-53 as its backbone, that is, the same Darknet-53 architecture with additional
cross stage partial (CSP) connections [19]. YOLOv4 appeared in 2020 with several
additional features that were shown to enhance accuracy.
2.2. NVIDIA Jetson Modules

Embedded machine learning is evolving rapidly. NVIDIA is recognized as a manufacturer
of graphics-processing units for gaming and professional markets and of system-on-chip
units for mobile computing. It also produces the NVIDIA Jetson modules, a family of
embedded computers with integrated GPUs designed for high-performance computing,
which make it easy to create an embedded AI system [20]. Jetson Nano is the cheapest of
all NVIDIA Jetson modules, and with its 128 parallel processing cores, it can handle a
real-time video feed. The main technical parameters of the Jetson Nano module are
summarized in Table 1.
Table 1. Jetson Nano Technical Specification [21].
Technical Specifications
AI Performance    472 GFLOPs
GPU               NVIDIA Maxwell architecture
CPU               Quad-core ARM Cortex-A57 MPCore processor
CUDA Cores        128
Memory            4 GB 64-bit LPDDR4, 25.6 GB/s
Power             5 W | 10 W
The NVIDIA Jetson Nano uses the compute unified device architecture (CUDA) as its
parallel computing platform. CUDA is a development and execution platform designed by
NVIDIA for general-purpose computing on graphics processing units (GPUs) [22]. It lets
tasks that are independent of one another, and therefore do not need to be executed
sequentially, run in parallel on the GPU. Furthermore, it supports many programming
languages, such as C, C++, Fortran, and Python. CUDA is useful in domains that require a
lot of computing power or in situations where parallelization is possible and high
performance is required. NVIDIA Jetson modules have been widely used in computer
vision research because these general-purpose GPUs have become a viable platform for
the efficient execution of some computational models [23].
In the current study, the NVIDIA Jetson Nano was used to detect human appearance
from a UAV perspective for the surveillance system. The best YOLOv5 model trained on
the RGBT dataset was deployed on the Jetson Nano. An overview of the Jetson Nano
module used is shown in Figure 3.
Figure 3. Jetson Nano module.
3. Methodology
3.1. Object Detector
YOLOv5 [24] was the latest major version of YOLO at the time of writing. Jocher released
YOLOv5 publicly on 9 June 2020, and it is still being updated. The release includes four
main model sizes: YOLOv5s (the smallest), YOLOv5m (medium), YOLOv5l (large), and
YOLOv5x (the largest). When it was released, YOLOv5 was intended only for an image
size of 640 pixels, but it now also offers 1280 pixels.
The architecture of YOLOv5 has a cross stage partial (CSP) backbone and a PANet neck,
just like YOLOv4. However, YOLOv5 is implemented in PyTorch instead of the original
Darknet framework. Its significant improvements include mosaic data augmentation and
auto-learning bounding box anchors. The architecture of
YOLOv5 is shown in Figure 4.
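To give an intuition for learning bounding box anchors from data, the following Python sketch clusters the width and height of training boxes with plain k-means, using 1 − IoU as the distance (the idea popularized by YOLOv2). This is a simplified illustration under stated assumptions, not YOLOv5's actual auto-anchor routine, which is more elaborate; the function names and the deterministic initialization are hypothetical choices made for this example.

```python
def wh_iou(box, anchor):
    """IoU of two (w, h) boxes, assuming they share the same top-left corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iterations=50):
    """Cluster (w, h) pairs into k anchors; highest IoU means closest."""
    if k < 1 or k > len(boxes):
        raise ValueError("k must be between 1 and the number of boxes")
    # Deterministic start (an assumption): pick k boxes evenly spaced by area.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    step = len(ordered) / k
    anchors = [ordered[int(i * step + step / 2)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: wh_iou(box, anchors[i]))
            clusters[best].append(box)
        new_anchors = []
        for i, members in enumerate(clusters):
            if not members:  # keep an anchor that attracted no boxes
                new_anchors.append(anchors[i])
                continue
            mean_w = sum(b[0] for b in members) / len(members)
            mean_h = sum(b[1] for b in members) / len(members)
            new_anchors.append((mean_w, mean_h))
        if new_anchors == anchors:  # converged
            break
        anchors = new_anchors
    # Return anchors ordered from smallest to largest area.
    return sorted(anchors, key=lambda a: a[0] * a[1])
```

For aerial imagery, where people occupy only a few pixels, anchors fitted in this way would be much smaller than anchors fitted to ground-level datasets, which is one reason data-driven anchors matter for small-object detection.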
Figure 4. YOLOv5 architecture. Backbone: CSPD; neck: PANet; and head: YOLO layer detection
results (class, score, location, and size).
3.2. Dataset
The experiments used the VisDrone 2021 RGBT dataset [25]. This dataset was originally
part of the VisDrone 2021 Crowd Counting Challenge, which aims to estimate the number
of people in each image. VisDrone 2021 provides a dataset with pairs of RGB and TIR
images. The VisDrone 2021 RGBT dataset was collected by the AISKYEYE team from the
Lab of Machine Learning and Data Mining at Tianjin University, China.
The data consist of 1807 pairs of RGB and TIR images; an example pair can be seen in
Figure 5. The team collected the data with an actual UAV under several different scenarios
as well as various lighting and weather conditions. The ground truth of the dataset is the
target point of each object, stored in XML format. Before the data were used in the
experiment, some preprocessing was performed to make them compatible with the
YOLO format. In this study, the data were divided into training and test sets in a ratio of
80:20.
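The two preparation steps mentioned above, writing labels in the YOLO format and splitting the image pairs 80:20, can be sketched as follows. This is a minimal illustration, not the authors' preprocessing script: it assumes that a pixel-space bounding box is already available for each person (the original ground truth contains only points), and the function names, the fixed random seed, and the truncating split are hypothetical choices.

```python
import random


def to_yolo_line(class_id, xmin, ymin, xmax, ymax, img_w, img_h):
    """Format one pixel-space box as a YOLO label line, normalised to [0, 1]."""
    cx = (xmin + xmax) / 2 / img_w   # box centre, x
    cy = (ymin + ymax) / 2 / img_h   # box centre, y
    w = (xmax - xmin) / img_w        # box width
    h = (ymax - ymin) / img_h        # box height
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


def split_pairs(pair_ids, train_fraction=0.8, seed=0):
    """Shuffle RGB/TIR pair identifiers and split them into train and test lists."""
    ids = sorted(pair_ids)               # fixed order before shuffling
    random.Random(seed).shuffle(ids)     # reproducible shuffle
    cut = int(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]
```

Splitting by pair identifier keeps the RGB and TIR images of the same scene in the same subset. With the 1807 pairs of the dataset, this truncating split would give 1445 training pairs and 362 test pairs; the paper does not state the exact counts.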
Figure 5. RGBT VisDrone crowd-counting dataset [25].
3.3. Experiment Setup

In this research, the YOLOv5 model was trained on a host machine with an NVIDIA
RTX 3060 GPU with 12 GB of VRAM, an Intel Core i9-10900K processor (3.70 GHz, 20 MB
cache), and 32 GB of memory. The best model from the training stage was converted to a
TensorRT model and deployed to the Jetson Nano. Finally, the model was tested with the
NVIDIA DeepStream SDK. The training parameters are listed in Table 2. Since this
study aims to deploy the inference model on the Jetson Nano module, the smallest model
version of YOLOv5 (YOLOv5s) was chosen.
Table 2. Training parameters.
Training Parameter    Value
Class                 1
Batch size            16
Epoch                 100
Learning rate         1 × 10^−3
The seven training-testing dataset scenarios considered in this study, all based on
YOLOv5, are as follows:
• Original YOLOv5 model (MS COCO RGB Dataset);
• VisDrone RGB image data + transfer learning YOLOv5s model (YOLO-RGB-TL);
• VisDrone RGB image data (YOLO-RGB);
• VisDrone TIR image data + transfer learning YOLOv5s model (YOLO-TIR-TL);
• VisDrone TIR image data (YOLO-TIR);
• VisDrone RGB and TIR Image + transfer learning YOLOv5s model (YOLO-RGBT-TL);
• VisDrone RGB and TIR image data (YOLO-RGBT).
The seven aforementioned scenarios are intended to investigate the combined impact of
the transfer-learning approach and the dataset used, so that the best scenario can
be selected and applied to the Jetson Nano device.
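The difference between the transfer-learning scenarios and the scenarios trained from scratch can be sketched as the arguments passed to the `train.py` script of the YOLOv5 repository [24]. The Python helper below only builds the argument lists; it is an illustration under assumptions, not the authors' training script. The dataset YAML file names and the image size of 640 are hypothetical, while the batch size and epoch count come from Table 2. In YOLOv5 the initial learning rate is read from a hyperparameter YAML file (`lr0`, passed with `--hyp`), which is not shown here.

```python
# Hypothetical dataset config names; True means COCO-pretrained weights are used.
SCENARIOS = {
    "YOLO-RGB-TL": ("visdrone_rgb.yaml", True),
    "YOLO-RGB": ("visdrone_rgb.yaml", False),
    "YOLO-TIR-TL": ("visdrone_tir.yaml", True),
    "YOLO-TIR": ("visdrone_tir.yaml", False),
    "YOLO-RGBT-TL": ("visdrone_rgbt.yaml", True),
    "YOLO-RGBT": ("visdrone_rgbt.yaml", False),
}


def yolov5_train_args(data_yaml, transfer_learning, img_size=640):
    """Build an argument list for YOLOv5's train.py for one scenario."""
    args = [
        "python", "train.py",
        "--img", str(img_size),   # assumed input size
        "--batch-size", "16",     # Table 2
        "--epochs", "100",        # Table 2
        "--data", data_yaml,
    ]
    if transfer_learning:
        # Start from the YOLOv5s checkpoint pre-trained on MS COCO.
        args += ["--weights", "yolov5s.pt"]
    else:
        # Random initialization: empty weights plus the YOLOv5s model definition.
        args += ["--weights", "", "--cfg", "yolov5s.yaml"]
    return args


def all_scenario_args():
    """Return {scenario name: argument list} for the six trained scenarios."""
    return {name: yolov5_train_args(data, tl) for name, (data, tl) in SCENARIOS.items()}
```

The seventh scenario, the original YOLOv5 model, needs no training run because it uses the MS COCO weights as released.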
3.4. Evaluation
The training scenarios for VisDrone RGB, TIR, and RGBT images were evaluated
in both RGB and TIR test sets. The evaluation measurements utilized include precision
(P), recall (R), and average precision (AP). The AP measures a combination of recall and
precision for ranked retrieval results and is the average precision at various recall values [26].
The formula to calculate P and R is as follows:
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{1}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}$$
where:
• TP denotes true positive;
• FP denotes false positive;
• FN denotes false negative.
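Equations (1) and (2) and the ranked-retrieval view of AP can be written as a short Python sketch. The AP function below follows the definition cited above (the mean of the precision values at the ranks where a true positive is retrieved); it is an assumption that this matches the numbers reported later, since detection toolkits often compute AP from an interpolated precision-recall curve instead. The function names and the input layout are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from counts, following Equations (1) and (2)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(scored_hits, num_ground_truth):
    """AP for ranked detections.

    scored_hits: list of (confidence, is_true_positive) pairs, one per detection.
    num_ground_truth: number of annotated objects (TP + FN).
    """
    ranked = sorted(scored_hits, key=lambda d: d[0], reverse=True)
    tp = 0
    precision_sum = 0.0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            tp += 1
            precision_sum += tp / rank  # precision at this recall level
    return precision_sum / num_ground_truth if num_ground_truth else 0.0
```

Whether a detection counts as a true positive is decided beforehand, typically by requiring sufficient overlap with an annotated box; that matching step is outside this sketch.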
4. Experiments and Results

The experimental pipeline consists of two main stages: the first one is a model search
or training process to find the best model to perform the human-detection task from a UAV
perspective. This model search was performed on the computer host machine mentioned
earlier. The second stage is the execution or inference in the Jetson Nano module. The flow
of this experiment can be seen in Figure 6.
4.1. Model Search

The original YOLOv5 provided by Ultralytics can detect small objects in both the RGB
and TIR images, but it classifies them incorrectly. For example, as shown in Figure 7,
YOLOv5 classifies small humans as birds and kites.
Intuitively, this occurs because the original YOLOv5 was trained on the MS COCO
dataset, which has 80 classes and was captured from different perspectives. This indicates
that a model trained only on the MS COCO dataset is insufficient for human detection
from the standpoint of an unmanned aerial vehicle (UAV).
After training with the RGB, TIR, and RGBT images from the VisDrone 2021 dataset,
the models were tested on the RGB and TIR test-set images. The results of the seven
scenarios are given in Tables 3 and 4.
Table 3. Performance result on RGB test-set image.
Model            Precision (%)    Recall (%)    AP (%)
YOLOv5           24.6             7.3           12.36
YOLO-RGB-TL      80.8             75.4          79.8
YOLO-RGB         71.5             68            70
YOLO-TIR-TL      12.2             13.3          4.94
YOLO-TIR         9.89             12            4.01
YOLO-RGBT-TL     80.1             75.1          79.1
YOLO-RGBT        76.5             66.8          71.4
Table 4. Performance result on TIR test-set image.
Model            Precision (%)    Recall (%)    AP (%)
YOLOv5           21.2             4.2           8.3
YOLO-RGB-TL      76.3             63.3          71.3
YOLO-RGB         66               61.4          64
YOLO-TIR-TL      86.6             84.2          88.8
YOLO-TIR         81.7             80.1          84.6
YOLO-RGBT-TL     86.3             83.9          88.8
YOLO-RGBT        82.5             81.5          85.7
Figure 6. Model search and human–object detection on Jetson Nano workflow.
Table 3 compares the seven scenarios, including the original YOLOv5 model, on the
RGB test set. The models trained on RGB or RGBT images outperformed the original
YOLOv5 on every metric, whereas the models trained only on TIR images did not: their
AP (4.94% and 4.01%) was below the 12.36% of the original model.
The best model in this scenario was the YOLO-RGB-TL model, with an average precision
of 79.8%; the YOLO-TIR models failed on the RGB test set, producing the lowest AP
values. Table 3 also shows that both YOLO-RGB and YOLO-RGBT improved when
pre-trained weights from the MS COCO dataset were used for transfer learning: AP
increased from 70% to 79.8% for YOLO-RGB and from 71.4% to 79.1% for YOLO-RGBT.
Furthermore, Table 4 compares the same seven scenarios on the TIR test set.
YOLO-TIR-TL and YOLO-RGBT-TL, the TIR and RGBT models trained with transfer
learning, both reached the highest AP of 88.8%. YOLO-RGB and YOLO-RGB-TL did not
match the YOLO-TIR and YOLO-RGBT models because the information in a TIR image is
not as detailed as that in an RGB image. This limited information makes it difficult for
models that were not trained with TIR images to detect the objects. The detection results
of each scenario for the RGB and TIR images are shown in Figures 7–10.
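The comparison across both test sets can be reproduced from the AP columns of Tables 3 and 4 with a few lines of Python. The selection rule used here, ranking models by the mean of their RGB and TIR AP, is an assumption made for illustration; the text does not state the exact criterion used to pick the model for deployment. The function names are hypothetical.

```python
# AP (%) values copied from Table 3 (RGB test set) and Table 4 (TIR test set).
AP_RGB = {"YOLOv5": 12.36, "YOLO-RGB-TL": 79.8, "YOLO-RGB": 70.0,
          "YOLO-TIR-TL": 4.94, "YOLO-TIR": 4.01,
          "YOLO-RGBT-TL": 79.1, "YOLO-RGBT": 71.4}
AP_TIR = {"YOLOv5": 8.3, "YOLO-RGB-TL": 71.3, "YOLO-RGB": 64.0,
          "YOLO-TIR-TL": 88.8, "YOLO-TIR": 84.6,
          "YOLO-RGBT-TL": 88.8, "YOLO-RGBT": 85.7}


def rank_by_mean_ap(ap_rgb, ap_tir):
    """Return (model, mean AP) pairs sorted from best to worst."""
    means = {name: (ap_rgb[name] + ap_tir[name]) / 2 for name in ap_rgb}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


def transfer_learning_gain(ap):
    """Return the AP gain in percentage points of each -TL model over its baseline."""
    bases = ("YOLO-RGB", "YOLO-TIR", "YOLO-RGBT")
    return {base: round(ap[base + "-TL"] - ap[base], 2) for base in bases}
```

Under this assumed criterion, YOLO-RGBT-TL ranks first with a mean AP of about 83.95%, because it is the only transfer-learning model that performs well on both modalities. The gains also show that transfer learning helped in every pairing, by 9.8, 0.93, and 7.7 points on the RGB test set and by 7.3, 4.2, and 3.1 points on the TIR test set.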
Figure 7. YOLOv5s original model detection result. (a) TIR Image. (b) RGB Image.
Figure 8. YOLOv5-RGB model detection result. (a) TIR Image. (b) RGB Image.
Figure 9. YOLOv5-TIR model detection result. (a) TIR Image. (b) RGB Image.
Figure 10. YOLOv5-RGBT detection result. (a) TIR Image. (b) RGB Image.
4.2. Inference on the Jetson GPUs

The best model obtained from the previous step was chosen and executed on the Jetson
Nano module. Deploying the model on the Jetson Nano involves converting the model to
TensorRT and cloning the TensorRT project on the Jetson Nano. NVIDIA DeepStream was
then installed, and the model was executed on the Jetson Nano module. The best model was
run on the platform using the Keras API with TensorFlow v2. The Jetson Nano module was
switched to the highest performance mode (nvpmodel 0), and the model processed images
from the testing data set.
4.3. The Limitation of This Study

The limitation of this study is that the model we used as a foundation is YOLOv5s,
the smallest version of YOLOv5. Because we propose a method for applying the model to
the Jetson Nano, we had to consider resource constraints such as memory, time, and energy
consumption. Additionally, the proportion of RGB and TIR images has not been studied
further to determine the combination of these images that yields the most accurate object
detection.
5. Conclusions
The detection ability of the state-of-the-art deep-learning-based algorithm you only
look once (YOLO) has been investigated for small human–object detection from an
unmanned aerial vehicle perspective using NVIDIA Jetson modules. For
the model search aspect, the YOLOv5 model trained with RGB and thermal infrared images
produced a good result for solving the small object-detection problem. The RGB and TIR
images dataset from VisDrone was able to boost the performance of the YOLOv5 model in
order to detect the small object from a UAV perspective with AP values up to 79.8% and
88.8% for RGB and TIR images, respectively. Future study needs to consider more complex
methods for the training process, including the possibility to observe new architecture in
YOLO and the most effective way to utilize the combination of RGB and thermal infrared
dataset images. Finally, a complex surveillance system can be implemented in a multi-agent
UAV with an edge AI concept using the NVIDIA Jetson module in order to investigate the
cost performance of this solution.
Author Contributions: Conceptualization, A.J.M., I.W.W., M.K., and J.-S.L.; data curation, A.J.M.;
formal analysis, A.J.M.; investigation, A.J.M. and I.W.W.; resources, A.J.M.; supervision, M.K. and
J.-S.L.; visualization, A.J.M.; writing—original draft, A.J.M. and I.W.W.; and writing—review and
editing, A.J.M., I.W.W., M.K., and J.-S.L. All authors read and agreed to the published version of the
manuscript.
Funding: This study was supported by a collaborative research project between the Kyushu Institute
of Technology (Kyutech) and the National Taiwan University of Science and Technology (Taiwan-
Tech).
Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable
Acknowledgments: The authors gratefully acknowledge the support by the Kyushu Institute of
Technology—National Taiwan University of Science and Technology Joint Research Program, under
Grant Kyutech-NTUST-111-04.
Conflicts of Interest: The authors declare that they have no known competing financial interest or
personal relationships that could have appeared to influence the work reported in this paper.
References
1. United Nations. International Day of Forests, 21 March. Available online: https://www.un.org/en/observances/forests-and-
trees-day (accessed on 1 September 2021).
2. Food and Agriculture Organization of the United Nations. Global Forest Resources Assessment 2020: Main Report; FAO: Rome, Italy,
2020. https://doi.org/10.4060/ca9825en.
3. Assifa, F. Setiap Tahun, HUTAN INDONESIA HILANG 684.000 Hektar. Available online: https://regional.kompas.com/read/
2016/08/30/15362721/setiap.tahun.hutan.indonesia.hilang.684.000.hektar (accessed on 24 April 2021).
4. Nugroho, W.; Eko Prasetyo, M.S. Forest Management and Environmental Law Enforcement Policy against Illegal Logging in
Indonesia. Int. J. Manag. 2019, 10, 317–323.
5. Mantau, A.J.; Widayat, I.W.; Köppen, M. A Genetic Algorithm for Parallel Unmanned Aerial Vehicle Scheduling: A Cost
Minimization Approach. In Proceedings of the International Conference on Intelligent Networking and Collaborative Systems; Springer:
Cham, Switzerland, 2021; pp. 125–135. https://doi.org/10.1007/978-3-030-84910-8_14.
6. Shakeri, R.; Al-Garadi, M.A.; Badawy, A.; Mohamed, A.; Khattab, T.; Al-Ali, A.; Harras, K.A.; Guizani, M. Design Chal-
lenges of Multi-UAV Systems in Cyber-Physical Applications: A Comprehensive Survey, and Future Directions. arXiv 2018,
arXiv:1810.09729. https://doi.org/10.48550/ARXIV.1810.09729.
7. Bokolonga, E.; Hauhana, M.; Rollings, N.; Aitchison, D.; Assaf, M.H.; Das, S.R.; Biswas, S.N.; Groza, V.; Petriu, E.M. A compact
multispectral image capture unit for deployment on drones. In Proceedings of the 2016 IEEE International Instrumentation and
Measurement Technology Conference Proceedings, Taipei, Taiwan, 23–26 May 2016; pp. 1–5.
8. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
https://doi.org/10.48550/arXiv.1506.02640.
9. Hengstler, S.; Prashanth, D.; Fong, S.; Aghajan, H. Mesheye: A hybrid-resolution smart camera mote for applications in
distributed intelligent surveillance. In Proceedings of the 6th International Conference on Information Processing in Sensor
Networks, Cambridge, MA, USA, 25–27 April 2007; pp. 360–369.
10. Widiyanto, D.; Purnomo, D.; Jati, G.; Mantau, A.; Jatmiko, W. Modification of particle swarm optimization by reforming global best
term to accelerate the searching of odor sources. Int. J. Smart Sens. Intell. Syst. 2016, 9, 1410–1430. https://doi.org/10.21307/ijssis-
2017-924.
11. Zhao, H.; Zhou, Y.; Zhang, L.; Peng, Y.; Hu, X.; Peng, H.; Cai, X. Mixed YOLOv3-LITE: A lightweight real-time object-detection
method. Sensors 2020, 20, 1861.
12. Liu, M.; Wang, X.; Zhou, A.; Fu, X.; Ma, Y.; Piao, C. UAV-YOLO: Small object detection on unmanned aerial vehicle perspective.
Sensors 2020, 20, 2238.
13. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the
European Conference on Computer Vision; Springer: Cham, Switzerland, 2016; pp. 21–37.
14. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmenta-
tion. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June
2014; pp. 580–587. https://doi.org/10.48550/arXiv.1311.2524.
15. Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December
2015; pp. 1440–1448.
16. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural
Inf. Process. Syst. 2015, 28, 91–99.
17. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
18. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
19. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
20. Cass, S. Nvidia makes it easy to embed AI: The Jetson nano packs a lot of machine-learning power into DIY projects—[Hands on].
IEEE Spectr. 2020, 57, 14–16. https://doi.org/10.1109/MSPEC.2020.9126102.
21. Jetson Modules, 2021. Available online: https://developer.nvidia.com/embedded/jetson-modules (accessed on 12 January
2022).
22. Kirk, D. NVIDIA Cuda Software and Gpu Parallel Computing Architecture. In Proceedings of the 6th International Symposium
on Memory Management, ISMM’07, Montreal, QC, Canada, 21–22 October; Association for Computing Machinery: New York,
NY, USA, 2007; pp. 103–104. https://doi.org/10.1145/1296907.1296909.
23. Krömer, P.; Nowaková, J. Medical Image Analysis with NVIDIA Jetson GPU Modules. In Proceedings of the Advances in Intelligent
Networking and Collaborative Systems; Barolli, L., Chen, H.C., Miwa, H., Eds.; Springer International Publishing: Cham, Switzerland,
2022; pp. 233–242. https://doi.org/10.1007/978-3-030-84910-8_25.
24. Jocher, G. yolov5. 2021. Available online: https://github.com/ultralytics/yolov5 (accessed on 31 January 2022).
25. University, T. Crowd Counting. Available online: http://aiskyeye.com/download/crowd-counting_/ (accessed on 6 June 2022).
26. Zhang, E.; Zhang, Y. Average Precision. In Encyclopedia of Database Systems; Liu, L., Özsu, M.T., Eds.; Springer: Boston, MA, USA,
2009; pp. 192–193. https://doi.org/10.1007/978-0-387-39940-9_482.
