YOLO v4 Based Human Detection System Using Aerial Thermal Imaging For UAV Based Surveillance Applications

2020 International Conference on Decision Aid Sciences and Application (DASA)
Prashanth Kannadaguli
Technical Trainer and Freelancer
Dhaarini Academy of Technical Education
Bengaluru, India
prashscd@gmail.com

978-1-7281-9677-0/20/$31.00 ©2020 IEEE | DOI: 10.1109/DASA51403.2020.9317198
Abstract—This work builds a human detection system based on You Only Look Once (YOLO) v4, one of the most recent deep learning approaches, originally built on a single-shot detection proposal. Unlike two-stage region-based object detection schemes, this technique does not follow semantic segmentation, does not suffer loss of object information such as vanishing gradients, and does not require pre-defined anchors. It comprises strong feature extractors, reinforces multi-scale object detection, and is very fast in multi-threaded GPU environments. Since our fundamental research concentrates on object classification for Unmanned Aerial Vehicle (UAV) applications, as a first step we chose to detect humans in a thermal dataset. We therefore used thermal images and videos acquired from UAV thermal cameras 1 m to 50 m above ground level as our dataset for building and testing the model. YOLO v4 uses ground truth bounding boxes to extract features and relies on techniques such as Weighted Residual Connections (WRC), Cross Stage Partial Connections (CSP), Cross mini-Batch Normalization (CmBN), Self-Adversarial Training (SAT), Mish Activation (MA), Mosaic Data Augmentation (MDA) and DropBlock Regularization (DBR). Finally, the performance analysis of the model in terms of mean Average Precision (mAP) indicates that modelling with YOLO v4 performs promisingly and can be used in automatic human detection systems.

Keywords-YOLO v4; Deep Learning; Thermal Imaging; UAV; Object Detection; Smart City; Surveillance.

Research work fully funded by Dhaarini Academy of Technical Education, Bengaluru, India.

I. INTRODUCTION

Artificial intelligence is extensively applied to real-time object detection by means of computer vision. Owing to far-reaching research in deep learning, object detection and classification algorithms built on convolutional neural networks (CNN) have demonstrated superior recognition accuracy and speed compared to typical image segmentation and image classification algorithms. Human detection is one such application of computer vision, and it is an essential task before one can begin identifying a target during surveillance and analysing its behaviour.

There are essentially two ways of implementing human detection. The classic option uses HOG+SVM [1], which performs well on the MIT dataset. An improved DPM algorithm [2] is a component-based detection method that is strongly robust to the orientation of the human. The integral channel features proposed in [3] and the aggregated channel features [4] integrate gradient histogram, LUV, and gradient magnitude features to obtain better feature performance on the dataset. These traditional methods rely on manual feature extraction and require a manual design process.

With the development of deep neural networks, deep learning models have been widely used in vehicle detection as the other option [5]. R-CNN is a two-stage detection algorithm using region proposals [6], and YOLO is a single-stage detection scheme built upon a regression model [7]. Deep learning algorithms learn the features to be extracted from the huge training data by themselves, so they are more representative. Deep learning models resemble the human brain's visual perception system and extract features directly from the original image. Those features are then passed through a series of layers to obtain a high-dimensional representation of the image. Because convolutional neural networks offer powerful feature extraction and learning capability, they yield high detection accuracy and speed. Another reason deep learning algorithms have evolved rapidly is that machines can handle matrix operations and convolution mathematics efficiently by using the GPU's multi-threaded parallel architecture. Since embedded devices are usually limited by their small size and fixed onboard power supplies, they may not be equipped with GPUs comparable to those used in a PC to accelerate the matrix calculations. One may implement real-time
detection of moving vehicles on an embedded device by using YOLO [7]. A component-based detection method that is robust to deformation of the target is given in [8]. [9] proposed combining deep learning with a part model, resulting in Deep Parts, to solve occlusion issues in object detection. [10] proposed a fast real-time object detection method that combines the accuracy of deep neural networks with the speed of cascaded classifiers. The major target detection models include R-CNN [11], SPP-Net [12], Fast-RCNN [13], Faster-RCNN [14], SSD [15], YOLO [16] and ResNet [17]. The YOLO network models excel in real-time detection of targets from images or videos. Therefore, we have chosen the YOLO v4 technique to model and test the human detection system on a thermal dataset.

The application of deep neural networks to infrared images is increasing [18-26]. In [27] we find an infrared image classification based on the region growing method and a BP neural network (BPNN). There, the Otsu method is used to determine the optimal segmentation threshold when the grayscale intensity is uniformly distributed over the screen capture window, and the grayscale similarity threshold values are used to set the segmentation criterion of the region growing. The authors of [28] addressed the problem of target recognition against a complex background. Their system finds the position of the target by extracting the HOG feature and combining it with the SVM classifier, and then normalizes the gray levels to achieve target detection. In [29] a proposal-based framework is used to generate a huge number of candidate proposals, and bounding box regression is applied to these proposals before classification is completed. Regression-based frameworks output the positions of the bounding boxes directly through special iterative procedures. In [30] the authors show that by adding a convolution layer to the YOLO structure, the detector is adapted to the projection imaging of the supervising system, thereby achieving better detection.

To validate this deep learning technique, we built a database by collecting 30000 thermal images and 50 thermal videos from different sources. They were originally captured by thermal cameras from 1 m to 50 m above ground level in different scenarios. This database covers different noise profiles, resolutions, hardware and software to bring diversity and range to it.

II. DEEP LEARNING DATA MODELLING USING YOLO V4

The core idea of the YOLO v4 algorithm is to convert the classification problem into a regression problem. By using only one neural network to realize the regression function, the whole network framework is made simple and direct. Owing to this network topology, YOLO can predict multiple bounding boxes and their categories in a single shot. At the same time, the end-to-end network structure makes its detection rate faster than that of other networks. The general architecture of YOLO v4 is shown in Fig.1.

Fig.1: The general architecture of YOLO v4

The object detection or classification steps of the YOLO v4 algorithm for any given target are as follows:

STEP1: The input picture is divided into an S x S grid; a square grid is chosen to keep the mathematics simple. When the centre point of the target object falls within a certain grid cell, that cell is considered responsible for detecting the target.

STEP2: A Bag of Freebies (BOF) is created, which is intended to be the objective function of the bounding box (BBox) regression. MSE, direct regression or IoU computation is performed to estimate the BBox coordinates with respect to the ground truth.

STEP3: Perform CutMix, which forces the model not to be overconfident about specific features when classifying, and Mosaic data augmentation, which enhances the detection of objects outside their normal context.

STEP4: Perform DropBlock regularization to drop a block of pixels without affecting the spatial information of the data, forcing the model to learn a variety of features.

STEP5: Perform class label smoothing to avoid overfitting by adjusting the target upper bound of the prediction to 90% instead of 100%.

STEP6: Replace the soft activation function by the Mish activation function $f(x) = x \cdot \tanh(\varsigma(x))$, where $\varsigma(x) = \ln(1 + e^{x})$.

STEP7: Obtain Multi weighted Residual Connections (MWRC), which generate a Bi-directional Feature Pyramidal Network (BiFPN). STEP5 to STEP7 create the Bag of Specials (BOS).

As a result, each grid cell is expected to generate 3 bounding boxes to detect a given target. Each bounding box has five parameters, denoted by (x, y, w, h, c), representing the centre point coordinates, width, height, and confidence score of the box. The confidence score is mainly composed of two parts: one is the probability of the object category in the bounding box, and the other is the degree of coincidence between the predicted bounding box and the real bounding box. The confidence score is calculated as follows:

$$C = \Pr(\mathrm{Class}_i) \cdot \Pr(\mathrm{Object}) \cdot \mathrm{IoU}_{\mathrm{pred}}^{\mathrm{truth}}$$

where $\Pr(\mathrm{Class}_i)$ represents the probability of the object category inside the box, $\Pr(\mathrm{Object})$ represents the probability of finding a centre point in the grid cell, and $\mathrm{IoU}_{\mathrm{pred}}^{\mathrm{truth}}$ represents the intersection ratio of the predicted bounding box to the real box area.
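The confidence score described above can be illustrated with a minimal sketch. It assumes boxes are given in the (x, y, w, h) centre format used in the text; the function names `to_corners`, `iou` and `confidence` are hypothetical helpers chosen for this illustration and are not taken from DarkNet.

```python
def to_corners(x, y, w, h):
    """Convert a centre-format box (x, y, w, h) to (x_min, y_min, x_max, y_max)."""
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


def iou(box_a, box_b):
    """Intersection over Union of two corner-format boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero when the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def confidence(p_class, p_object, pred, truth):
    """Confidence = class probability * objectness * IoU(pred, truth).

    pred and truth are centre-format boxes (x, y, w, h).
    """
    return p_class * p_object * iou(to_corners(*pred), to_corners(*truth))
```

For example, a prediction that coincides exactly with the ground truth has an IoU of 1, so its confidence reduces to the product of the two probabilities.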
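STEP1, STEP5 and STEP6 of the procedure above lend themselves to a short illustration. The sketch below is an assumption-laden example rather than the DarkNet implementation: `responsible_cell`, `smooth_labels` and `mish` are hypothetical names, and the smoothing rule shown is one simple variant that caps the positive target at 0.9 and spreads the remainder evenly over the other classes.

```python
import math


def responsible_cell(cx, cy, img_w, img_h, s):
    """STEP1: return (row, col) of the S x S grid cell containing the box centre."""
    col = min(int(cx * s / img_w), s - 1)
    row = min(int(cy * s / img_h), s - 1)
    return row, col


def smooth_labels(one_hot, eps=0.1):
    """STEP5: cap the positive target at 1 - eps (0.9 by default).

    Assumes at least two classes; the remaining eps is shared by the others.
    """
    off_value = eps / (len(one_hot) - 1)
    return [1.0 - eps if v == 1 else off_value for v in one_hot]


def mish(x):
    """STEP6: Mish activation, x * tanh(ln(1 + e^x))."""
    # For large x, ln(1 + e^x) is numerically equal to x; this avoids overflow in exp.
    softplus = x if x > 30 else math.log1p(math.exp(x))
    return x * math.tanh(softplus)
```

With a 500 x 500 image and S = 5, a box centred at (250, 100) falls in row 1, column 2, and that cell would be the one responsible for the detection.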
STEP8: BOF for the detector should include CIoU-loss, CmBN, DropBlock regularization, Mosaic data augmentation, Self-Adversarial Training, elimination of grid sensitivity, use of multiple anchors for a single ground truth, and a cosine annealing scheduler.

STEP9: BOS for the detector includes Mish activation, SPP-block, SAM-block, PAN path-aggregation block and DIoU-NMS.

STEP10: By using the Non-Maximum Suppression (NMS) algorithm, one can filter the generated bounding boxes and discard the unnecessary ones by setting a threshold for the confidence score. The remaining bounding box is taken to be the actual detected bounding box of the target.

The loss function of the bounding box can be estimated during the training phase as follows:

$$L = L_{\mathrm{loc}} + L_{\mathrm{cls}}$$

where $L_{\mathrm{loc}}$ represents the positioning loss function and $L_{\mathrm{cls}}$ represents the classification loss function. YOLO v4 can perform multi-label classification in addition to multi-class classification. This is done by applying binary cross-entropy to each class one by one and summing the results, as the classes are not mutually exclusive. Only those cells containing a ground truth object are counted. The detailed architecture of YOLO v4 is shown in Fig.2.

Fig.2: Architectural design of YOLO v4.

III. METHODOLOGY

There are two phases, namely Training and Testing.

A. Database Creation

Though the intention is to establish an automatic human detection system, to begin with we adapted methods to build one for a thermal dataset. So only the thermal data collected using thermal imaging were used for training (80% of total) and testing (20% of total). The details of the database created are shown in Table1.

TABLE1: DETAILS OF THERMAL DATABASE
Class       Number of Training Frames   Number of Testing Samples
Human       48000                       12000
No Human    48000                       12000

Since the human detection network has many parameters to train, we used more computing power by attaching a CUDA enabled GPU [34] to the CPU [35]. The experimental setup is shown in Table2.

TABLE2: DETAILS OF EXPERIMENTAL PLATFORM
Hardware Platform                          Software Platform
CPU   Intel Core i7-7700HQ CPU @2.80GHz    OS             Windows 10 64-Bit
GPU   Nvidia Geforce GTX 1080 Ti           DL Framework   DarkNet
RAM   16 GB                                Programming    MATLAB 2020b 64-bit

B. Pre-processing of Images

In pre-processing, as a first step all the templates in the database were annotated using Microsoft's Visual Object Tagging Tool (VoTT) [31]. A screenshot of the annotation process is shown in Fig.3. Since the data collection took place under various conditions with different background noise profiles, it is necessary to remove the noise. Thermal images are subject to different kinds of noise, such as detector noise, ADC noise, atmospheric radiation noise or background noise. Such noise degrades the image contrast. As a noise removal procedure, median filtering is applied first [32]. The filter sorts the pixels in the neighbourhood of the current pixel from smallest to largest and replaces the current pixel value with the middle value of the sorted sequence.

Fig.3: Annotation by using VoTT

The results of median filtering applied to one such template can be seen in Fig.4.

Fig.4: Results of median filtering

In the next step we perform histogram matching of all the templates with respect to a target image [33]. The result of histogram matching is shown in Fig.5.
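The two pre-processing operations, median filtering and histogram matching, can be sketched in plain Python as follows. This is only an illustrative sketch under stated assumptions: the paper's own pre-processing was carried out in its MATLAB environment, the 3 x 3 window, edge replication and 256 gray levels are assumptions, and `median_filter`, `cdf` and `match_histogram` are hypothetical names.

```python
def median_filter(img, k=3):
    """Apply a k x k median filter to a 2-D list of gray values (edges replicated)."""
    h, w = len(img), len(img[0])
    r = k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Collect the neighbourhood, clamping coordinates at the image border.
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1)
            ]
            # Sort from smallest to largest and take the middle value.
            out[y][x] = sorted(window)[len(window) // 2]
    return out


def cdf(img, levels=256):
    """Cumulative distribution of integer gray levels in a 2-D list."""
    hist = [0] * levels
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    acc, result = 0, []
    for count in hist:
        acc += count
        result.append(acc / total)
    return result


def match_histogram(src, ref, levels=256):
    """Map each gray level of src to the ref level with the nearest higher-or-equal CDF."""
    src_cdf, ref_cdf = cdf(src, levels), cdf(ref, levels)
    lut, j = [], 0
    for i in range(levels):
        while j < levels - 1 and ref_cdf[j] < src_cdf[i]:
            j += 1
        lut.append(j)
    return [[lut[v] for v in row] for row in src]
```

A single hot pixel surrounded by uniform values is removed by the median filter, and matching an image against itself leaves it unchanged, which gives two quick sanity checks for the sketch.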
Fig.5: The result of histogram matching

In the case of template videos, median filtering and histogram matching were carried out frame by frame. After pre-processing, our database has different folders organized according to the names of the originally collected datasets, inside which the templates are saved in .jpg format.

C. YOLO v4 Based Human Detection

In order to convert the pre-processed data into a proper input format for a YOLO v4 network, the following step is crucial. Since the prediction of humans needs different scales, we first must compute the gap between the ground truth and the prediction. This can be done by scaling the ground truths into matrices of the corresponding scales. Therefore, for each ground truth bounding box, we select the anchor that fits it best and assign the box to that anchor. We have chosen 9 different anchors at 3 different scales in this work, which can be seen in Fig.6.

The training progress screenshot showing the average loss and average IoU for YOLO v4 is shown in Fig.7.
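The assignment of a ground truth box to its best-fitting anchor, as described in this subsection, can be sketched as below. The anchor sizes in `ANCHORS` are placeholder values assumed for illustration, not the anchors used in this work, and the grouping of three consecutive anchors per scale is likewise an assumption; `shape_iou` and `best_anchor` are hypothetical names.

```python
# Nine illustrative (width, height) anchors, ordered small to large:
# indices 0-2 for the finest scale, 3-5 for the middle, 6-8 for the coarsest.
ANCHORS = [
    (12, 16), (19, 36), (40, 28),
    (36, 75), (76, 55), (72, 146),
    (142, 110), (192, 243), (459, 401),
]


def shape_iou(wh_a, wh_b):
    """IoU of two boxes that share the same centre, so only width and height matter."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    union = wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter
    return inter / union


def best_anchor(box_wh, anchors=ANCHORS):
    """Return (anchor index, scale index) of the anchor that best fits the box shape."""
    scores = [shape_iou(box_wh, a) for a in anchors]
    idx = max(range(len(anchors)), key=scores.__getitem__)
    return idx, idx // 3
```

Under these assumed anchors, a tall 70 x 150 ground truth box is matched to anchor 5 on the middle scale, while a 450 x 400 box is matched to anchor 8 on the coarsest scale.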

Fig.7: YOLO v4 training progress
The final segment of this human detection system is Non-Maximum Suppression (NMS). NMS [36] helps in removing duplicate results. In NMS, we first pick the detection box with the highest confidence, add it to the final result, and then discard all other boxes whose IoU with this best box exceeds a pre-determined threshold. We then select the box with the highest confidence score among the remaining boxes and repeat the procedure until no boxes are left. The result of NMS applied to our work is shown in Fig.8.
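The greedy procedure just described can be written down in a few lines. The sketch below assumes corner-format boxes (x_min, y_min, x_max, y_max) paired with a confidence score, and a default IoU threshold of 0.5; both are assumptions for illustration, since the threshold used in this work is not stated, and `iou` and `nms` are hypothetical names.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy NMS over a list of (box, score) pairs; returns the kept pairs, best first."""
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)  # box with the highest confidence left
        kept.append(best)
        # Discard boxes that overlap the best box by more than the threshold.
        remaining = [d for d in remaining if iou(best[0], d[0]) <= iou_threshold]
    return kept
```

Two heavily overlapping detections of the same person collapse to the higher-scoring one, while a detection elsewhere in the frame is kept.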
Fig.6: Anchors and scales

Fig.8: Result of NMS
IV. RESULTS AND DISCUSSIONS

The result analysis was done using mAP, which is derived from Intersection over Union (IoU). IoU is defined as the ratio of the area of overlap between the predicted and ground truth boxes to the area of their union. The results obtained during the testing process are given in Table3.

TABLE3: RESULTS
Experiment No.             1       2       3       4
Total Data                 15000   30000   60000   120000
Training Data              12000   24000   48000   96000
Testing Data               3000    6000    12000   24000
Epoch                      100     200     400     800
Iterations per Epoch       50      100     200     400
True Positives             2500    5085    10330   21010
False Negatives            441     692     1148    1825
False Positives            217     324     430     428
Recall                     0.85    0.88    0.90    0.92
Precision                  0.92    0.94    0.96    0.98
mAP                        0.40    0.43    0.46    0.48
Computation Time in FPS    1.0     0.85    0.75    0.6

The test results show that the mAP of YOLO v4 is much higher than that of the conventional machine learning techniques [37-42]. This could be because of the presence of multiple layers in the backbone network. The table also shows more runtime and more accuracy as we increase the training samples. This is because the hyper-parameters of the network converge slowly but efficiently compared to the results of [43-47]. As the training samples increase, there is more learning due to the increased information content related to humans. YOLO v4 handles IoU thresholds during the classification of positive and negative anchor boxes. This is because of the stable training steps of YOLO compared to the results of [48-52]. The accuracy vs data size curves for YOLO v4 are plotted in Fig.9. The YOLO v4 based detector treats successive feature vectors as mutually exclusive, which provides consistent performance compared to [53-56].

Fig.9: Accuracy vs data size curves

V. CONCLUSION

In this work we built a human detection system using a YOLO v4 based approach. A human data model was created from the stochastic deep learning of thermal image processing. The template-based pattern recognizers were then built using deep neural networks, which are the most sophisticated stochastic approaches in deep learning. Annotation, noise removal and histogram matching were performed on the raw thermal image and video dataset in an algorithm-based approach. The networks use multiple layered feature vector matrices as the observation template vectors. During the testing stage, the features of an unknown thermal image are searched against the different models already trained. From the results obtained it can be inferred that humans in thermal aerial images or videos can be successfully modelled using YOLO, which is a top performer among DL algorithms. It can be used to build automatic human detection systems.

In future we need to investigate the performance of YOLO v4 by varying confidence thresholds, weight decay, momentum etc. We need to perform data augmentation, including random flipping, random cropping and random translating, to check whether it improves the robustness, accuracy and versatility of the algorithm. We can also try to build Region Proposal Networks (RPN) using YOLO v4 to simplify instance-level classification problems and key-point detection problems of computer vision. This work can also be extended by including more data and by using an embedded-hardware-friendly data model that works at a reduced computation time, to arrive at applications such as surveillance and traffic monitoring.

ACKNOWLEDGMENT

I thank my students of Dhaarini Academy of Technical Education for helping me in creating such a huge database and I thank my colleagues for providing valuable suggestions during this research work.

REFERENCES

[1] Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR. (2005).
[2] Felzenszwalb, P., McAllester, D., Ramanan, D.: A discriminatively trained, multiscale, deformable part model. In: CVPR. (2008).
[3] Dollár P, Tu Z, Perona P, et al. Integral Channel Features[C]//British Machine Vision Conference, BMVC, 2009:1-11.
[4] Dollar P, Appel R, Belongie S, et al. Fast Feature Pyramids for Object Detection[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2014, 36(8):1532-45.
[5] Redmon, J., Divvala, S., Girshick, R. and Farhadi, A., 2016. You only look once: Unified, real-time object detection. IEEE conference on computer vision and pattern recognition (pp. 779-788).
[6] Girshick, R., Donahue, J., Darrell, T. and Malik, J., 2014. Rich feature hierarchies for accurate object detection and semantic segmentation. IEEE conference on computer vision and pattern recognition (pp. 580-587).
[7] Redmon, J., Divvala, S., Girshick, R. and Farhadi, A., 2016. You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 779-788).
[8] Ouyang, W., Wang, X.: Joint deep learning for pedestrian detection. In: ICCV. (2013).
[9] Tian Y, Luo P, Wang X, et al. Pedestrian detection aided by deep learning semantic tasks[J]. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015:5079-5087.
[10] Anelia Angelova, Alex Krizhevsky, Vincent Vanhoucke, et al. Real-Time Pedestrian Detection With Deep Network Cascades. In: BMVC. (2015).
[11] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[J]. Computer Science, 2013:580-587.
[12] He K, Zhang X, Ren S, et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2014, 37(9):1904-16.
[13] Girshick R. Fast R-CNN[C]//IEEE International Conference on Computer Vision. IEEE, 2015:1440-1448.
[14] Ren S, He K, Girshick R, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2015:1-1.
[15] L. Wei, A. Dragomir: SSD: Single Shot MultiBox Detector. arXiv preprint arXiv:1512.02325v5, 2016.
[16] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. arXiv preprint arXiv:1506.02640, 2015.
[17] Redmon J, Farhadi A. YOLO9000: Better, Faster, Stronger. arXiv preprint arXiv:1612.08242v1, 2016.
[18] Ren S, He K, Girshick R, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, 39(6):1137.
[19] Hyeok-June Jeong, Kyeong-Sik Park, Young-Guk Ha. Image Preprocessing for Efficient Training of YOLO Deep Learning Networks. 2018 IEEE International Conference on Big Data and Smart Computing[C], 2018:635-637.
[20] Redmon J, Divvala S, Girshick R, et al. You Only Look Once: Unified, Real-Time Object Detection[J]. 2015:779-788.
[21] Abbas Q, Ahmad J. Analysis of Learning Rate Using BP Algorithm for Hand Written Digit Recognition Application [J]. Information and Emerging Technologies, 2010, 1-5.
[22] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, and S. E. Reed. SSD: single shot multibox detector. CoRR, abs/1512.02325, 2015.
[23] Fang Wei, Li Min, Sheng School, Shi Zeqiong, Dong Yanzhi. Research on Image Recognition of Infrared Ship Based on Enhanced SVM Algorithm [J]. Journal of Yantai University, 2018, 31(3), 254-259.
[24] Jadin M S, Taib S. Infrared image enhancement and segmentation for extracting the thermal anomalies in electrical equipment[J]. Electronics and Electrical Engineering, 2012, 4(120):107-112.
[25] Budak, HALICI U, SENGUR A, et al. Efficient airport detection using line segment detector and fisher vector representation[J]. IEEE Geoscience & Remote Sensing Letters, 2016, 13(8): 1079-1083. 12.
[26] Kendall A, Cipolla R. Segnet: a deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495.
[27] CHEN Yue-wei, PENG Dao-gang, XIA Fei, Qian Yuliang. Infrared image recognition based on region growing method and BP neural network [J]. Laser and Infrared, 2018, 48(3): 1-8.
[28] WU Shuai, XU Yong, ZHAO Dongning. Survey of Object Detection Based on Deep Convolutional Network[J]. Pattern Recognition and Artificial Intelligence, 2018, 31(4): 335-346.
[29] Fang Wei, Li Min, Sheng School, Shi Zeqiong, Dong Yanzhi. Research on Image Recognition of Infrared Ship Based on Enhanced SVM Algorithm [J]. Journal of Yantai University, 2018, 31(3), 254-259.
[30] CAI Chengtao, WU Kejun, YAN YongJie. Runway Target Detection Based on Optimized YOLO Method[J]. Command Information System and Technology, 2018, 3(120):112-117.
[31] https://github.com/microsoft/VoTT
[32] Y. Shao et al., "Using Multi-Scale Infrared Optical Flow-based Crowd motion estimation for Autonomous Monitoring UAV," 2018 Chinese Automation Congress (CAC), Xi'an, China, 2018, pp. 589-593, doi: 10.1109/CAC.2018.8623268.
[33] V. Voronin, S. Tokareva, E. Semenishchev and S. Agaian, "Thermal Image Enhancement Algorithm Using Local And Global Logarithmic Transform Histogram Matching With Spatial Equalization," 2018 IEEE Southwest Symposium on Image Analysis and Interpretation (SSIAI), Las Vegas, NV, 2018, pp. 5-8, doi: 10.1109/SSIAI.2018.8470344.
[34] https://www.geforce.com/hardware/desktop-gpus/geforce-gtx-1050-ti/specifications
[35] https://www.intel.com/content/www/us/en/products/processors/core/i7-processors/i7-7700hq.html
[36] R. Dong, D. Xu, J. Zhao, L. Jiao and J. An, "Sig-NMS-Based Faster R-CNN Combining Transfer Learning for Small Target Detection in VHR Optical Remote Sensing Imagery," in IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 11, pp. 8534-8545, Nov. 2019, doi: 10.1109/TGRS.2019.2921396.
[37] P. Kannadaguli and V. Bhat, "Microwave Imaging based Automatic Crack Detection System using Machine Learning for Columns," 2020 IEEE 9th International Conference on Communication Systems and Network Technologies (CSNT), Gwalior, India, 2020, pp. 5-8, doi: 10.1109/CSNT48778.2020.9115763.
[38] M. K. K and P. Kannadaguli, "IoT Based CNC Machine Condition Monitoring System Using Machine Learning Techniques," 2020 IEEE 9th International Conference on Communication Systems and Network Technologies (CSNT), Gwalior, India, 2020, pp. 61-65, doi: 10.1109/CSNT48778.2020.9115762.
[39] P. Kannadaguli and V. Bhat, "Comparison of Hidden Markov Model and Artificial Neural Network Based Machine Learning Techniques Using DDMFCC Vectors for Emotion Recognition in Kannada," 2019 IEEE International WIE Conference on Electrical and Computer Engineering (WIECON-ECE), Bangalore, India, 2019, pp. 1-6, doi: 10.1109/WIECON-ECE48653.2019.9019936.
[40] P. Kannadaguli and V. Bhat, "A comparison of Bayesian and HMM based approaches in machine learning for emotion detection in native Kannada speaker," 2018 IEEMA Engineer Infinite Conference (eTechNxT), New Delhi, 2018, pp. 1-6, doi: 10.1109/ETECHNXT.2018.8385377.
[41] P. Kannadaguli and V. Bhat, "A comparison of Gaussian Mixture Modeling (GMM) and Hidden Markov Modeling (HMM) based approaches for Automatic Phoneme Recognition in Kannada," 2015 International Conference on Signal Processing and Communication (ICSC), Noida, 2015, pp. 257-260, doi: 10.1109/ICSPCom.2015.7150658.
[42] P. Kannadaguli and V. Bhat, "A comparison of Bayesian multivariate modeling and hidden Markov modeling (HMM) based approaches for automatic phoneme recognition in kannada," 2015 Recent and Emerging trends in Computer and Computational Sciences (RETCOMP), Bangalore, 2015, pp. 1-5, doi: 10.1109/RETCOMP.2015.7090795.
[43] P. Talluri and M. Dua, "Low-resolution Human Identification in thermal imagery," 2020 5th International Conference on Communication and Electronics Systems (ICCES), COIMBATORE, India, 2020, pp. 1283-1287, doi: 10.1109/ICCES48766.2020.9138039.
[44] M. Krišto, M. Ivasic-Kos and M. Pobar, "Thermal Object Detection
in Difficult Weather Conditions Using YOLO," in IEEE Access,
vol. 8, pp. 125459-125476, 2020, doi: 10.1109/ACCESS.2020.3007481.
[45] S. Yun and S. Kim, "Recurrent YOLO and LSTM-based IR single
pedestrian tracking," 2019 19th International Conference on
Control, Automation and Systems (ICCAS), Jeju, Korea (South),
2019, pp. 94-96, doi: 10.23919/ICCAS47443.2019.8971679.
[46] Chaitanya, S. Sarath, Malavika, Prasanna and Karthik, "Human
Emotions Recognition from Thermal Images using Yolo
Algorithm," 2020 International Conference on Communication and
Signal Processing (ICCSP), Chennai, India, 2020, pp. 1139-1142,
doi: 10.1109/ICCSP48568.2020.9182148.
[47] V. Ghenescu, E. Barnoviciu, S. Carata, M. Ghenescu, R. Mihaescu
and M. Chindea, "Object Recognition on Long Range Thermal
Image Using State of the Art DNN," 2018 Conference Grid, Cloud
& High Performance Computing in Science (ROLCG), Cluj-
Napoca, 2018, pp. 1-4, doi: 10.1109/ROLCG.2018.8572026.
[48] V. Paidi, H. Fleyeh and R. G. Nyberg, "Deep learning-based vehicle
occupancy detection in an open parking lot using thermal camera,"
in IET Intelligent Transport Systems, vol. 14, no. 10, pp. 1295-1302,
10 2020, doi: 10.1049/iet-its.2019.0468.
[49] P. Tumas, A. Nowosielski and A. Serackis, "Pedestrian Detection in
Severe Weather Conditions," in IEEE Access, vol. 8, pp. 62775-
62784, 2020, doi: 10.1109/ACCESS.2020.2982539.
[50] L. Lianqiao, C. Xiai, Z. Huili and W. Ling, "Recognition and
Application of Infrared Thermal Image Among Power Facilities
Based on YOLO," 2019 Chinese Control And Decision Conference
(CCDC), Nanchang, China, 2019, pp. 5939-5943, doi:
10.1109/CCDC.2019.8833160.
[51] F. Jehlik and A. Blazquez de Mingo, "A Deep Learning Approach
to Modeling a Complex Multi-variate, Temporal Thermal Problem,"
2019 18th IEEE International Conference On Machine Learning
And Applications (ICMLA), Boca Raton, FL, USA, 2019, pp. 1506-
1510, doi: 10.1109/ICMLA.2019.00248.
[52] L. Li, S. Dai and Z. Cao, "Deep Long Short-term Memory (LSTM)
Network with Sliding-window Approach in Urban Thermal
Analysis," 2019 IEEE/CIC International Conference on
Communications Workshops in China (ICCC Workshops),
Changchun, China, 2019, pp. 222-227, doi:
10.1109/ICCChinaW.2019.8849965.
[53] R. Suresh and N. Keshava, "A Survey of Popular Image and Text
analysis Techniques," 2019 4th International Conference on
Computational Systems and Information Technology for
Sustainable Solution (CSITSS), Bengaluru, India, 2019, pp. 1-8,
doi: 10.1109/CSITSS47250.2019.9031023.
[54] L. Lianqiao, C. Xiai, Z. Huili and W. Ling, "Recognition and
Application of Infrared Thermal Image Among Power Facilities
Based on YOLO," 2019 Chinese Control And Decision Conference
(CCDC), Nanchang, China, 2019, pp. 5939-5943, doi:
10.1109/CCDC.2019.8833160.
[55] B. Ilikci, L. Chen, H. Cho and Q. Liu, "Heat-Map Based Emotion
and Face Recognition from Thermal Images," 2019 Computing,
Communications and IoT Applications (ComComAp), Shenzhen,
China, 2019, pp. 449-453, doi:
10.1109/ComComAp46287.2019.9018786.
[56] S. Raka, A. Kamat, S. Chavan, A. Tyagi and P. Soygaonkar, "Taste-
wise fruit sorting system using thermal image processing," 2019
IEEE Pune Section International Conference (PuneCon), Pune,
India, 2019, pp. 1-4, doi: 10.1109/PuneCon46936.2019.9105726.


