Article
Highly Accurate and Lightweight Detection Model of Apple Leaf
Diseases Based on YOLO
Zhaokai Sun 1,2, Zemin Feng 1,3,* and Ziming Chen 1

1 School of Mechanical Engineering, Yanshan University, Qinhuangdao 066004, China; szk192611@126.com (Z.S.); chenzm@ysu.edu.cn (Z.C.)
2 Tangshan Jidong Equipment Engineering, Tangshan 063200, China
3 School of Mechanical and Equipment Engineering, Hebei University of Engineering, Handan 056038, China
* Correspondence: fengzemin@hebeu.edu.cn

Abstract: To mitigate problems concerning small-sized spots on apple leaves and the difficulties
associated with the accurate detection of spot targets exacerbated by the complex backgrounds of
orchards, this research used alternaria leaf spots, rust, brown spots, gray spots, and frog eye leaf
spots on apple leaves as the research object and proposed the use of a high-accuracy detection
model YOLOv5-Res (YOLOv5-Resblock) and lightweight detection model YOLOv5-Res4 (YOLOv5-
Resblock-C4). Firstly, a multiscale feature extraction module, ResBlock (residual block), was designed
by combining the Inception multi-branch structure and ResNet residual idea. Secondly, a lightweight
feature fusion module C4 (CSP Bottleneck with four convolutions) was designed to reduce the number
of model parameters while improving the detection ability of small targets. Finally, a parameter-
streamlining strategy based on an optimized model architecture was proposed. The experimental
results show that the performance of the YOLOv5-Res model and YOLOv5-Res4 model is significantly
improved, with the mAP@0.5 values increasing by 2.8% and 2.2% compared to the YOLOv5s model
and YOLOv5n model, respectively. The sizes of the YOLOv5-Res model and YOLOv5-Res4 model are
only 10.8 MB and 2.4 MB, and the model parameter counts are reduced by 22% and 38.3% compared
to the YOLOv5s model and YOLOv5n model.

Keywords: leaf disease detection; YOLO; residual block; parameter streamlining strategy
Citation: Sun, Z.; Feng, Z.; Chen, Z. Highly Accurate and Lightweight Detection Model of Apple Leaf Diseases Based on YOLO. Agronomy 2024, 14, 1331. https://doi.org/10.3390/agronomy14061331

Academic Editor: Gamal Elmasry

Received: 14 May 2024; Revised: 11 June 2024; Accepted: 18 June 2024; Published: 19 June 2024

Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Apples are one of the world's most important temperate fruits, and they have medicinal, nutritional, and ornamental value. Apples are not only rich in essential nutrients for the brain, such as sugar, protein, vitamins, and minerals, but, more importantly, are rich in zinc. According to research, zinc is a component of many important enzymes in the human body and is a key element in promoting growth and development. Therefore, improving the yield and quality of apples can better improve human health [1]. The spring and summer temperatures and high humidity in northern China are favorable for the propagation and spread of pathogenic bacteria, increasing the possibility of apple tree leaves being infected with leaf spot diseases, such as alternaria leaf spots, rust, brown spots, gray spots, and frog eye leaf spots. These diseases are the most common, and because artificial orchards are planted at high density and are poorly ventilated, they occur frequently in the northern region, seriously affecting the healthy development of the apple industry. The early discovery and rapid identification of leaf diseases, as well as the precise prevention and control of apple leaf diseases, can effectively improve the yield and quality of apples [2].

Traditional disease detection requires experienced experts to undertake visual judgments of leaves in the field. This traditional detection method is inefficient, especially concerning the fact that some similar leaf diseases are easily affected by subjective factors,
resulting in omission and misdetection. A large number of studies have shown that image
processing and machine learning algorithms are widely used for the rapid diagnosis of
crop diseases [3]. However, traditional machine learning methods have shortcomings,
such as difficulties in terms of their feature extraction and their high computational com-
plexity [4]. Deep learning methods based on convolutional neural networks have faster
detection speeds and greater detection accuracy in target detection compared to traditional
machine learning methods [5–11]. Lee [12] used the Faster R-CNN (Faster Region-based Convolutional Neural Network) model to recognize brown blight, blister blight, and algal leaf spot diseases affecting tea, with AP (average precision) values of 63.58%, 81.08%, and 64.71%, respectively. Singh [13] used an MCNN
(multilayer convolutional network) to classify mango anthracnose leaf disease with 97.3%
accuracy. He [14] proposed a Mask-RCNN-based model for detecting leaf diseases of tea
tree with F1 scores of 88.3% and 95.3% for tea coal diseases and boll weevil leaf diseases;
however, the F1 scores were lower when distinguishing between brown leaf disease and
target spot disease, with scores of only 61.1% and 66.6%. Sun [15] et al. proposed a module
for MEAN by reconstructing the 3 × 3 convolutional module to detect a single diseased
apple leaf in a simple environment, and the MEAN-SSD (Mobile End AppleNet-based
SSD algorithm) model achieved a mAP (mean average precision) value of 83.12% and an
FPS (frames per second) value of 12.53 for apple leaf diseases. Tian [16] et al. combined
DenseNet to optimize the feature layer of the YOLOv3 model and proposed the use of the
YOLOv3-dense model to detect apple surface anthracnose; DenseNet greatly improved
the utilization of features in the neural network, with a detection accuracy of 95.57%, and
the detection speed could be up to 31 FPS. Li [17] proposed an improved YOLOv5s model
for vegetable disease detection, which achieved 93.1% mAP@0.5 for five disease images
with a model size of 17.1 MB. Lin [18] proposed an improved YOLOX-Tiny model for the
detection of tobacco brown spot disease, which introduced the HMU module for informa-
tion interaction and small feature extraction capability in the neck network, resulting in
an AP of 80.45%. Khan [19] obtained 99.04% AP for three maize crop diseases using the YOLOv8n model. Solimani [20] used the SE attention module after the
C2f module of the YOLOv8 model and investigated the effect of the SE module on the
YOLOv8 model, and the results showed that the SE module improved the efficiency when
detecting small targets, such as flowers, fruits, and neck nodes of tomato plants. Ma [21]
proposed a YOLOv8 detection model for the whole growth stage of apple fruits, which
assembled the ShuffleNetv2 base module, the Ghost module, and the SE attention module
and invoked the CIoU (Complete-IOU) marginal regression loss function, which resulted
in a final mAP of 91.4%, and a model size of 2.6 MB.
The analysis shows that deep learning detection algorithms have been effectively
used to diagnose various crop diseases; however, fewer studies have focused on apple leaf
disease detection. Moreover, most studies were implemented in simple environments using single leaves, meaning that they could not realistically simulate the complex backgrounds of the growing environment, varying light conditions, and the mutual shading of leaves, thus limiting their application in real production scenarios. It is crucial to develop an
apple leaf disease detection model with high recognition accuracy and fast detection speed.
In order to solve the problems concerning the small size of some disease spots on apple
leaves and the difficulty in accurately detecting the disease spot targets in the complex
background of plantations, a high-precision model and a lightweight model were proposed
based on the multiscale feature extraction module ResBlock and lightweight feature fusion
module C4 with a parameter streamlining strategy, which improved the accuracy of disease
detection on apple leaves and realized a lightweight model.

2. Materials and Methods

2.1. Materials

To train and evaluate the proposed model, we composed a new ALPP dataset using a fusion of the AppleLeaf9 dataset [22] and the Ai Studio dataset [23]. The AppleLeaf9 dataset fuses PlantVillage [24], ATLDSD [25], PPCD2020 [26], and PPCD2021 [26]. PlantVillage contains only single-leaf images of simple environments. The images of ATLDSD, PPCD2020, and PPCD2021 were collected from orchards under different weather conditions, which included various situations, such as sunny, cloudy, after rain, with light, against light, near, far, pitch, and elevation. Five types of leaf images of alternaria leaf spot, rust, brown spot, gray spot, and frog eye leaf spot were selected, and in order to increase the diversity of the images as much as possible and improve the generalizability of the model, the alternaria leaf spot, rust, brown spot, and gray spot leaf images were supplemented using the Ai Studio dataset. Examples of the five diseases and their labels are shown in Figure 1.

Figure 1. Examples of five diseases and labels.
The images were labeled using labeling software, with the label of each target feature corresponding to that feature; redundant information was ignored during image labeling. A total of 4863 images were labeled and randomly divided into a training set, a testing set, and a validation set in the ratio of 6:2:2; the specific number distribution of each disease category is shown in Table 1.
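As an illustration, a minimal Python sketch of such a per-class 6:2:2 random split is given below; the directory layout, file extension, and random seed are assumptions for illustration, not the authors' actual preprocessing script:

```python
import random
from pathlib import Path

def split_dataset(image_dir, seed=0):
    """Randomly split the labeled images of one class into train/test/validation at 6:2:2."""
    random.seed(seed)
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.shuffle(images)
    n = len(images)
    n_train, n_test = int(0.6 * n), int(0.2 * n)
    return {
        "train": images[:n_train],
        "test": images[n_train:n_train + n_test],
        "val": images[n_train + n_test:],
    }

# Splitting each disease class separately keeps the 6:2:2 ratio per category,
# matching the per-class counts reported in Table 1.
for disease in ["alternaria_leaf_spot", "rust", "brown_spot", "gray_spot", "frog_eye_leaf_spot"]:
    splits = split_dataset(f"ALPP/{disease}")  # hypothetical dataset path
    print(disease, {k: len(v) for k, v in splits.items()})
```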
Table 1. Distribution of dataset labels.

Disease Type Train Test Validation Number of Images
Alternaria leaf spot 490 163 163 816
Rust 508 169 169 846
Brown spot 741 247 247 1235
Gray spot 425 142 142 709
Frog eye leaf spot 755 251 251 1257
Total 2919 972 972 4863

2.2. Methods

Based on the characteristics of the YOLOv5 [27] model, an improved apple leaf disease detection model, YOLOv5-Res, was proposed, as shown in Figure 2, in order to improve the accuracy of the model in detecting small-scale apple leaf spot disease against a complex background. The model increases the number of bottlenecks in the C3 (CSP Bottleneck with 3 convolutions) [28] module of the backbone and neck networks to deepen the network and improve the feature extraction ability, adopts the lighter Focus module instead of the Conv module to balance the increase in the network parameters, and adopts the SPP (Spatial Pyramid Pooling) [29] module instead of the SPPF (Spatial Pyramid Pooling Fast) [29] module to reduce the overfitting risk and improve the generalizability of the model while decreasing the model parameters. In addition, the ResBlock (residual block) module is designed to adapt to input features of different scales and complexities.

Based on the YOLOv5-Res model, the C4 (CSP Bottleneck with 4 convolutions) module was designed with the goal of lightening the model and optimizing the transfer path of the feature map information fusion process; together with the parameter streamlining strategy, the lightweight YOLOv5-Res4 model was designed.

Figure 2. YOLOv5-Res model architecture.

2.2.1. YOLOv5 Model

The YOLOv5 model is a single-stage target recognition model proposed by Glenn Jocher in 2020. It can be categorized into YOLOv5s, YOLOv5n, YOLOv5m, YOLOv5l, and YOLOv5x versions based on the differences in network depth and network width. Among them, YOLOv5n is the lightest version of the YOLOv5 series, with the fastest detection speed, while YOLOv5s is the model with the best overall performance.

The YOLOv5 model structure consists of a backbone, a neck, and a head. The backbone is used to extract the features from the input image and consists of Conv, C3, and SPPF. The C3 structure is used for the backbone and neck. The SPPF structure is used for the maximum pooling operation of feature maps using differently sized convolution kernels to enrich feature representation. The neck part uses FPNs (Feature Pyramid Networks) [30] and PANs (Path Aggregation Networks) [31] to jointly enhance the semantic and spatial information. The FPN structure realizes semantic feature enhancement from the top down and fuses higher-order feature mapping with lower-order feature mapping through up-sampling. The PAN structure enhances the localization features from the bottom up to achieve the purpose of feature information enrichment. The head part generates the class probability and location information of the predicted target. The three detection heads correspond to 20 × 20, 40 × 40, and 80 × 80 feature maps, respectively. The YOLOv5 model uses CIoU [32] to compute intersections, as follows:

$$\mathrm{CIoU} = \mathrm{IoU} - \frac{\rho^2(b, b^{gt})}{c^2} - \alpha\upsilon$$

where b and b^{gt} represent the centers of the predicted and real frames, respectively, ρ represents the computation of the Euclidean distance between the two centers, c represents the diagonal distance of the smallest closure region that can contain both the prediction box and the true box, α represents the weighting function, and υ represents the difference between the aspect ratios of the predicted and real frames.
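The metric can be computed directly from these definitions; below is a minimal Python sketch for axis-aligned boxes in (x1, y1, x2, y2) format, using the standard CIoU forms of α and υ from [32] (an illustrative re-implementation, not the YOLOv5 source code):

```python
import math

def ciou(box1, box2):
    """CIoU for two boxes in (x1, y1, x2, y2) format, following the equation above."""
    # Intersection over union
    ix1, iy1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    ix2, iy2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    iou = inter / (area1 + area2 - inter + 1e-9)

    # rho^2: squared distance between the two box centers
    cx1, cy1 = (box1[0] + box1[2]) / 2, (box1[1] + box1[3]) / 2
    cx2, cy2 = (box2[0] + box2[2]) / 2, (box2[1] + box2[3]) / 2
    rho2 = (cx1 - cx2) ** 2 + (cy1 - cy2) ** 2

    # c^2: squared diagonal of the smallest enclosing (closure) box
    cw = max(box1[2], box2[2]) - min(box1[0], box2[0])
    ch = max(box1[3], box2[3]) - min(box1[1], box2[1])
    c2 = cw ** 2 + ch ** 2 + 1e-9

    # v measures the aspect-ratio difference; alpha is the trade-off weight
    w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
    w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
    v = (4 / math.pi ** 2) * (math.atan(w2 / (h2 + 1e-9)) - math.atan(w1 / (h1 + 1e-9))) ** 2
    alpha = v / (1 - iou + v + 1e-9)

    return iou - rho2 / c2 - alpha * v
```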

2.2.2. Design of ResBlock Multiscale Module

Diseases, such as alternaria leaf spot, rust, brown spot, gray spot, and frog eye leaf spot, occur in artificial apple orchards; the spots present in the early disease stage are mostly small targets, and the patches at the end of this stage still do not occupy a large percentage of the leaf, so they are difficult to detect via YOLOv5 using large convolution kernels. Specifically, when detecting similar diseases, the ability to extract the same feature map to obtain richer feature information is weaker. In order to improve the characterization and generalization abilities of the model, we designed the ResBlock (residual block) module, as shown in Figure 3.

Figure 3. ResBlock residual block.

GoogLeNet proposed the Inception [33] model to increase the network width by introducing multiple Inception modules while reducing the number of parameters. In Inception-v1 [33], multiscale feature extraction is achieved by using different-sized convolutional kernels in the same layer. The core idea of the ResNet (Residual Network) [34] model is to learn residuals, i.e., the neural network is designed to carry out residual mapping during the training process, which maps the inputs directly to the outputs, $y = F(x, \{w_i\}) + x$, where x denotes the inputs of the residual block, y denotes the output of the residual block, and $F(x, \{w_i\})$ denotes the function that the network needs to fit. Residual connectivity makes it easier to train deeper networks.

The ResBlock module was proposed based on the ideas of Inception and ResNet. There were three branch designs: a 1 × 1 convolutional kernel, a 3 × 3 convolutional kernel, and a jump connection. Each branch contains a BN (Batch Normalization) [35] module, which is used to improve the training stability of the model and accelerate the convergence process. Specifically, the 1 × 1 convolutional kernel is used to linearly transform the feature map to extract local feature information; the 3 × 3 convolutional kernel is used to capture a wider range of feature relationships to enhance the global expression capability; and the jump connection realizes direct information transfer and the back propagation of the gradient to alleviate the gradient vanishing problem. This multi-branch design utilizes the feature extraction capabilities of different convolutional kernels combined with jump connections to achieve feature fusion and information transfer and promote multiscale feature extraction. Meanwhile, the introduction of the BN module helps to normalize the feature distribution, accelerate the network convergence speed, and improve the generalization ability and stability of the model. Such a module design adapts to different scales and complexities in terms of input features to improve the characterization abilities and performance of the model.

2.2.3. Design of the C4 Feature Fusion Module
C3 is an important building block for feature extraction in the YOLOv5s model, which consists of multiple convolutional modules (Conv), bottleneck layers (bottleneck), BN layers (Batch Normalization), and SiLU (Sigmoid Linear Unit) [36] activation functions. The convolution module consists of a Conv2d convolution layer plus a BN layer plus a SiLU activation function; the bottleneck layer involves 1 × 1 convolution followed by 3 × 3 convolution, where the 1 × 1 convolution halves the number of channels and the 3 × 3 convolution doubles the number of channels and then adds the inputs, which is set up to improve network characterization and performance. As a whole, the C3 module is divided into three steps: a first PW (Pointwise Convolution) [37] to downscale the data, a convolution carried out using a regular convolution kernel, and a final PW to upscale the data. The C3 structure is shown in the left panel of Figure 4.

Figure 4. C3 and C4 module comparison.

C3 is used to enhance the learning capabilities of the network via the feature extraction of the input image through multilayer operations, such as a convolution module, a bottleneck layer, a BN layer, etc., increasing the complexity and computation of the model. On the basis of C3, considering that the first convolution of the main branch and the first convolution of the bottleneck layer are both point-by-point convolutions, the two layers are merged to form the C4 (CSP Bottleneck with 4 convolutions) module. C4 uses 1 × 1 convolution to reduce the number of channels in the main branch to extract the shallow features and obtain more detailed information; 3 × 3 convolution is used to increase the sensory field to extract the deeper information; and the side branch introduces a 1 × 1 convolutional layer to realize feature fusion. With the above structure, the C4 module optimizes the information transfer path within the module, simplifies the feature transfer and interaction process, and avoids information loss and blurring to better capture local features and spatial information; its structure is shown in the right panel of Figure 4.
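One plausible PyTorch sketch of the C4 module as described (three 1 × 1 Convs and one 3 × 3 Conv, all with stride 1, fused by Concat) is given below; the hidden-channel ratio and the placement of the third 1 × 1 Conv after the Concat are assumptions, since the paper gives only the structural outline in Figure 4:

```python
import torch
import torch.nn as nn

class ConvBNSiLU(nn.Module):
    """Conv2d + BatchNorm + SiLU, the basic Conv unit used throughout YOLOv5."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class C4(nn.Module):
    """CSP Bottleneck with 4 convolutions: a 1x1 reduction followed by a 3x3 on
    the main branch, a 1x1 side branch, and a final 1x1 after Concat."""
    def __init__(self, c_in, c_out):
        super().__init__()
        c_hidden = c_out // 2  # assumed channel-reduction ratio
        self.main1 = ConvBNSiLU(c_in, c_hidden, k=1)      # shallow/detail features
        self.main2 = ConvBNSiLU(c_hidden, c_hidden, k=3)  # larger receptive field
        self.side = ConvBNSiLU(c_in, c_hidden, k=1)       # side branch
        self.fuse = ConvBNSiLU(2 * c_hidden, c_out, k=1)  # fusion after Concat

    def forward(self, x):
        # Concatenate the low-level (side) and high-level (main) feature maps.
        return self.fuse(torch.cat((self.main2(self.main1(x)), self.side(x)), dim=1))
```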

2.2.4. YOLOv5-Res4 Based on Parameter Refinement Strategy

Since image feature extraction is mainly carried out in the backbone part, the neck is mainly used for the feature fusion of the feature map. In this paper, we use the lighter-weight Focus [38] module to replace the layer 0 Conv module in the backbone part, reduce the bottleneck structure coefficient in the C3 module to 1 in order to maximize the lightweight purpose, and use the ResBlock multiscale module to replace the other Conv modules in the backbone network. The SPP module is directly connected to the ResBlock module to improve the model's ability to extract features from the backbone network, improve its understanding of global information, and reduce the risk of model overfitting, thereby improving the generalization ability of the model; the streamlined C4 module was used to optimize the feature information transfer path to further improve the model's detection performance and ability to represent features. In the neck network, the lighter C4 module is used to replace the C3 module to accelerate the feature transfer and interaction process, which is conducive to the rapid dissemination and fusion of semantic information. The Conv modules in the 18th and 21st layers mainly play the role of downsampling; for this reason, the ResBlock multiscale module is used to replace them to improve the detection accuracy and reduce the computation volume while maintaining a sufficient sensory field, so as to accelerate the inference speed of the model.

In summary, the YOLOv5-Res4 model adopts lightweight modules, and the strategy of channel reduction and feature fusion achieves a significant reduction in the number of model parameters and computational complexity while maintaining high detection accuracy; the model realizes a shallower, more compact structure, as shown in Figure 5.

Figure 5. YOLOv5-Res4 model network architecture.
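Because both improved models replace SPPF with SPP, a minimal sketch of the SPP module is given below for reference; the pooling kernel sizes 5/9/13 are the common YOLO defaults and are an assumption, since the paper does not list them, and BN/activation are omitted for brevity:

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Spatial Pyramid Pooling: parallel max-pools at several kernel sizes,
    concatenated with the input path. SPPF instead chains three 5x5 pools
    sequentially, which is faster; SPP was adopted here to reduce overfitting."""
    def __init__(self, c_in, c_out, kernels=(5, 9, 13)):
        super().__init__()
        c_hidden = c_in // 2
        self.reduce = nn.Conv2d(c_in, c_hidden, 1, 1, bias=False)
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in kernels
        )
        self.fuse = nn.Conv2d(c_hidden * (len(kernels) + 1), c_out, 1, 1, bias=False)

    def forward(self, x):
        x = self.reduce(x)
        # Concatenate the un-pooled path with each pooled path, then fuse.
        return self.fuse(torch.cat([x] + [p(x) for p in self.pools], dim=1))
```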

3. Results

3.1. Experimental Environment and Model Evaluation Metrics

The main parameters of the computer platform used in this experiment are as follows: an Intel Core i9-13900K CPU with a main frequency of 3.00 GHz, a running memory of 64 GB, and an NVIDIA GeForce RTX3090 GPU with 24 GB of memory. The PyTorch 2.0.1 deep learning framework was used to build and evaluate the model. The training parameters were set with an image input size of 640 × 640; the training epoch was 150, the image batch size was set to 2, the Adam optimizer was used, the initial learning rate was set as 0.01, the momentum value was 0.937, and the weight decay parameter was 0.0005.
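These settings map onto the standard training entry point of the ultralytics/yolov5 repository; the sketch below shows an equivalent invocation under that assumption (the dataset YAML name ALPP.yaml and the hyperparameter file hyp.ALPP.yaml are hypothetical placeholders, the latter standing in for a hyperparameter YAML with lr0=0.01, momentum=0.937, weight_decay=0.0005):

```python
# Illustrative sketch only; run from the ultralytics/yolov5 repository root,
# whose train.py exposes a run(**kwargs) entry point.
import train

train.run(
    data="ALPP.yaml",      # hypothetical dataset config listing the five classes
    imgsz=640,             # input size 640 x 640
    epochs=150,
    batch_size=2,
    optimizer="Adam",
    hyp="hyp.ALPP.yaml",   # hypothetical hyperparameter file (see lead-in)
)
```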
To evaluate the performance of the improved YOLOv5s model, the evaluation metrics used in this study were categorized into precision (P), recall (R), single-category average precision (AP), mean average precision (mAP), number of parameters (parameters), and floating-point operations (FLOPs), where mAP, parameters, and FLOPs denote the mean average precision of disease detection, the number of network parameters, and the real-time processing speed of the model, respectively. The formulas used for the model performance evaluation metrics are shown in Equations (1)–(6):

$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad (1)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \quad (2)$$

$$AP = \int_0^1 P(R)\,dR \quad (3)$$

$$mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i \quad (4)$$

$$\mathrm{Parameters} = \sum \left(K \times K \times C_{in} \times C_{out}\right) \quad (5)$$

$$\mathrm{FLOPs} = \sum \left(K \times K \times C_{in} \times C_{out} \times H \times W\right) \quad (6)$$

where TP denotes the number of correctly detected positive samples, FP denotes the number of incorrectly detected negative samples, and FN denotes the number of undetected positive samples. K denotes the convolutional kernel size, C_{in} and C_{out} denote the number of input and output channels, respectively, and H and W denote the height and width of the output feature map, respectively.

3.2. Experimental Analysis of the ResBlock Module

In order to verify the effectiveness of the ResBlock module, the ResBlock2, ResBlock3, and ResBlock4 modules were reconstructed, as shown in Figure 6. Keeping the other structures in the YOLOv5-Res model unchanged, ResBlock was replaced with ordinary Conv, ResBlock2, ResBlock3, and ResBlock4 for comparison experiments.

Figure 6. ResBlock 2, ResBlock 3, and ResBlock 4 modules.
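To make the compared variants concrete, the following is a minimal PyTorch sketch of the three-branch ResBlock of Section 2.2.2; it is an illustrative reconstruction from the description of Figure 3, and the choice to sum the branches (rather than concatenate) and the single SiLU after fusion are assumptions:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Three parallel branches (1x1 conv, 3x3 conv, identity jump connection),
    each followed by BN, fused by addition and a SiLU activation."""
    def __init__(self, channels):
        super().__init__()
        self.branch1 = nn.Sequential(  # 1x1: linear transform, local features
            nn.Conv2d(channels, channels, 1, 1, 0, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.branch3 = nn.Sequential(  # 3x3: wider receptive field
            nn.Conv2d(channels, channels, 3, 1, 1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.branch_id = nn.BatchNorm2d(channels)  # jump connection with BN
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.branch1(x) + self.branch3(x) + self.branch_id(x))
```

ResBlock2, ResBlock3, and ResBlock4 vary the branch composition as shown in Figure 6; only the three-branch design above is sketched here.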



The features of leaf diseases vary greatly at different stages of development, and 1 × 1
and 3 × 3 parallel convolution kernels were used to ensure that the detailed information in
the image could be adequately captured. Different sensory fields facilitate the extraction
of local features of different sizes, which can better capture detailed information in the
image and improve the robustness of the network. As shown by the experimental results in Table 2, ResBlock obtained the best mAP@0.5 (mean average precision, IoU = 0.5) and
mAP@0.5–0.95 (mean average precision, IoU = 0.5–0.95) values, where the mAP@0.5 values
are 1.88%, 2.18%, and 2.28% higher than those of ResBlock2, ResBlock3, and ResBlock4,
respectively. In addition, the ResBlock and ResBlock2 models have the same number of
parameters and model size, but there is a large difference in terms of accuracy, which is
mainly because the BN layer plays a role in speeding up the training speed to improve
model accuracy. Comparing the FLOP values of the three improved modules, the ResBlock
module is also more advantageous. The experimental results show that using the ResBlock
module to replace the Conv module in the backbone structure of YOLOv5s is an effective
way to improve the model, and although the ResBlock module increases the number of
parameters, this multiscale feature extraction module adapts to the needs of apple leaf disease
detection and improves the accuracy of the detection model.

Table 2. Performance comparison of multiple ResBlock modules.

Modules mAP@0.5/% mAP@0.5–0.95/% Parameters FLOPs/G FPS


Conv 79.8% 32.7% 5,309,095 15.2 116.3
ResBlock 82.1% 34.3% 5,486,055 15.6 107.5
ResBlock2 80.2% 33.0% 5,486,055 15.6 107.5
ResBlock3 79.9% 34.0% 9,839,975 26.1 92.6
ResBlock4 79.8% 33.7% 18,195,815 46.3 87.7
Bolded fonts for best results.

3.3. Experiments with the C4 Module


To evaluate the effectiveness of the C4 feature fusion module, several different feature
fusion strategies were compared. The YOLOv5-Res4 model was used as the base model, and
other mainstream feature fusion module replacements, e.g., the CSP (CSP Bottleneck) [39],
C3, and C2f (CSPDarknet53 to 2-Stage FPN) [40] modules, were used sequentially in place
of the C4 module for comparison.
The experimental results, shown in Table 3, demonstrate that the mAP@0.5 values
of the CSP, C3, and C2f modules are 77.5%, 77.8%, and 77.2%, which are 2.4%, 2.1%,
and 2.7% lower than that of the YOLOv5-Res4 network model with the addition of the
C4 feature fusion module in terms of the mAP@0.5 values, respectively. The C3 module
introduces the idea of constantly mapping ResNet residual networks on top of the channel
separation strategy of the CSP module by stacking multiple bottleneck [41] modules on the
main branch to tune model depth. Additionally, C2f references the ELAN (Efficient Layer
Attention Network) [42] network’s idea of using short gradient paths to stack more blocks,
greatly enriching the gradient flow information. In contrast, the C4 feature fusion module
streamlines the network structure, which only consists of three 1 × 1 Convs with a step size
of 1 and one 3 × 3 Conv with a step size of 1. Finally, the low-level and high-level feature
maps are spliced using Concat, which retains more spatial and semantic information and enhances the model's expressive ability.
In addition, the C4 feature fusion module obtains the lowest parameters and FLOPs
while maintaining high detection accuracy. Additionally, the introduction of the C4 module
enables the network to have better sensing capabilities and information fusion abilities,
improving the accuracy and robustness of target detection.

Table 3. Performance comparison of different feature fusion strategies.

Modules mAP@0.5/% mAP@0.5–0.95/% Parameters FLOPs/G FPS


CSP 77.5% 26.9% 1,155,271 5.5 129.8
C3 77.8% 28.7% 1,124,519 5.4 131.5
C2f 77.2% 30.0% 1,183,911 5.5 125
C4 79.9% 31.5% 1,094,471 5.3 135.1
Bolded fonts for best results.

3.4. Experiments with Model Streamlining Strategies


To verify the effectiveness of the parameter streamlining scheme, this paper designed
several models for exploratory validation, and the specific streamlining methods of each
model are shown in Table 4.

Table 4. Model streamlining experimental program.

Models Method
A: Taking YOLOv5s as the base model, modify the layer 0 Conv of the backbone to the Focus module and replace all C3 modules, so that all C3s in the new model A are C3 × 1.
B: Taking YOLOv5s as the base model, modify the layer 0 Conv of the backbone to the Focus module and replace all C3 modules, so that each C3 in the new model B becomes C4 × 1.
C: Using model A as the base model, all Convs are replaced with ResBlock × 1, except for the backbone layer 0 Conv, which is the Focus, to obtain the new model C.
D: Using model C as the base model, all the Convs in the new model D are ResBlock, except the first Conv in the backbone, which is the Focus; the last-layer C3 in the backbone and the C3 in the head are both improved to C4 × 1.
E: With D as the base model, SPPF is improved to SPP.
F: With E as the base model, the SPP is adjusted up one layer so that C4 is connected to the head layer as the output layer of the backbone.
YOLOv5-Res4: Taking F as the base model, shrinking the number of channels and improving the 10th and 14th layers in the neck to Conv, we end up with YOLOv5-Res4.

As shown in Table 5, both the C4 module and the ResBlock module play a positive role in simplifying model complexity and improving recognition accuracy. Feature extraction is
mainly carried out in the shallow network, and model D has 1.7% fewer model parameters
than model C, while mAP@0.5 is improved by 0.5%. The effect of the SPP module was
verified using model D, model E, and model F. The results show that the mAP@0.5 value
gradually increases and reaches the highest value of 80.8% in model F. The mAP@0.5 value
of the YOLOv5-Res4 model is 79.9%, with the parameters only 22.89% and 61.7% of those
of the F and YOLOv5n models, and the model size is 2.4 MB, which significantly reduces
the size and computational cost of the model.

Table 5. Model streamlining exploration experiment performance comparison.

Models mAP@0.5/% mAP@0.5–0.95/% Parameters Model Size/MB FLOPs/G FPS


A 79.3% 33.2% 4,570,023 9.1 11.4 126.6
B 79.7% 33.8% 4,465,863 8.8 11.1 144.9
C 79.9% 32.3% 4,863,463 9.7 12.1 109.9
D 80.4% 33.1% 4,781,031 9.5 11.9 117.6
E 80.5% 33.9% 4,781,031 9.5 11.9 117.6
F 80.8% 32.1% 4,781,031 9.5 11.9 117.3
YOLOv5s 79.3% 32.7% 7,037,095 13.8 15.8 136.9
YOLOv5n 77.7% 31.1% 1,772,695 3.7 4.2 156.3
YOLOv5-Res4 79.9% 31.5% 1,094,471 2.4 5.3 135.1
Bolded fonts for best results.

3.5. YOLOv5-Res Comparative Analysis Experiment


To further validate the performance of the model and considering the differences
between the YOLOv5-Res accuracy model and the YOLOv5-Res4 lightweight model, this
experiment compared YOLOv5-Res and YOLOv5-Res4 separately with current mainstream
single-target detection models.
The YOLOv5-Res model was compared with RT-DETR-L [43], YOLOv5s [27], YOLOv5s-TR [27], YOLOv5s6 [27], YOLOv8s [44], and YOLOv9 [45]; RT-DETR-L is a transformer-based object detection model, a high-accuracy version of the DETR (detection transformer) model, and the experimental results are shown in Table 6. The precision of YOLOv5-Res is improved by 1.6%, 4.3%, 3.8%, 5.2%, and 1.9%; recall is improved by 7.5%, −0.3%, 2.3%, 2.6%, and 4.3%; and mAP@0.5 is improved by 7.5%, 2.8%, 3.7%, 4.3%, and 3.5%, respectively. Compared to the latest YOLOv9, YOLOv5-Res improved by 5.0%, −1.7%, and 2.0% in terms of precision, recall, and mAP@0.5, respectively, and YOLOv5-Res still has the highest detection performance with fewer parameters.

Table 6. Comparison of YOLOv5-Res performance with mainstream accuracy models.

Models Precision/% Recall/% mAP@0.5/% mAP@0.5–0.95/% Parameters Model Size/MB FLOPs/G FPS
RTDETR-L 79.8% 69.4% 74.6% 34.6% 32,004,290 66.2 103.5 65.4
YOLOv5s 77.1% 77.2% 79.3% 32.7% 7,037,095 13.8 15.8 136.9
YOLOv5s-TR 77.6% 74.6% 78.4% 32.7% 7,078,951 14.6 16.1 133.3
YOLOv5s6 76.2% 74.3% 77.8% 33.9% 12,342,868 25.2 16.2 120.5
YOLOv8s 79.5% 72.6% 78.6% 37.4% 11,129,454 21.4 28.5 92.6
YOLOv9 76.4% 78.6% 80.1% 40.6% 60,516,700 122.5 264 34.2
YOLOv5-Res 81.4% 76.9% 82.1% 34.0% 5,486,055 10.8 15.6 107.5
Bolded fonts for best results.

Table 7 displays the mAP@0.5 values of mainstream models for detecting various
categories of disease characteristics. Among these, the best-performing models for detecting
alternaria leaf spot, rust, brown spot, gray spot, and frog eye leaf spot were YOLOv9,
YOLOv5s6, YOLOv9, RTDETR-L, and YOLOv5-Res, respectively. Comparing the detection
results of the other models, YOLOv5-Res demonstrates a more balanced performance, with
only gray spot having a mAP@0.5 lower than 80%.

Table 7. YOLOv5-Res and mainstream accuracy modeling for detecting disease characteristics in
various categories mAP@0.5.

Models Alternaria Leaf Spot Rust Brown Spot Gray Spot Frog Eye Leaf Spot
RTDETR-L 73.4% 77.3% 76.2% 71.2% 75%
YOLOv5s 76.8% 81.4% 88.5% 69.8% 79.9%
YOLOv5s-TR 75.6% 76.2% 88.0% 70.4% 81.2%
YOLOv5s6 76.0% 82.6% 88.1% 62.0% 80.0%
YOLOv8s 75.8% 78.8% 87.2% 70.5% 80.8%
YOLOv9 85.2% 76.9% 97.4% 63.3% 77.6%
YOLOv5-Res 84.9% 80.4% 93.0% 70.4% 81.7%
Bolded fonts for best results.

3.6. YOLOv5-Res4 Comparative Analysis Experiment


Since the YOLOv5-Res4 model focuses more on being lightweight, this paper compares it
with lightweight YOLO models, such as YOLOv3-tiny [46], YOLOv5n [27], YOLOv5n6 [27], and
YOLOv8n [44]. The experimental results are shown in Table 8: the YOLOv5-Res4 model improved the precision by 7.1%, 0.9%, and 1.3% and decreased it by 0.2%; the recall is improved by 13.2%, 3.7%, 7.9%, and 3.2%; and the mAP@0.5 is improved by 11.7%, 2.2%, 5.3%, and 2.9%, respectively, with the lowest number of parameters and the smallest model size; the FLOPs are only 5.3 G, providing better overall performance.

Table 8. YOLOv5-Res4 performance comparison with mainstream lightweight modules.

Models Precision/% Recall/% mAP@0.5/% mAP@0.5–0.95/% Parameters Model Size/MB FLOPs/G FPS
YOLOv3-tiny 69.8% 62.7% 68.2% 22.2% 8,687,482 17.5 12.9 256.4
YOLOv5n 76.0% 72.2% 77.7% 31.1% 1,772,695 3.7 4.2 156.3
YOLOv5n6 75.6% 68.0% 74.6% 30.7% 3,104,644 6.7 4.3 138.8
YOLOv8n 77.1% 72.7% 77% 37.3% 3,007,598 6.3 8.1 112.4
YOLOv5-Res4 76.9% 75.9% 79.9% 31.5% 1,094,471 2.4 5.3 135.1
Bolded fonts for best results.

The experimental data of mainstream lightweight algorithmic models on various categories of diseases are shown in Table 9. The detection performance of YOLOv5-Res4 is higher than that of the other algorithmic models in all categories, except that its mAP@0.5 value for gray spot is 3.4% lower than that of the YOLOv8n model.

Table 9. YOLOv5-Res4 and mainstream lightweight models for detecting various types of disease
characteristics of mAP@0.5.

Models Alternaria Leaf Spot Rust Brown Spot Gray Spot Frog Eye Leaf Spot
YOLOv3-tiny 72.7% 77.0% 49.6% 64.4% 77.3%
YOLOv5n 80.4% 79.6% 81.6% 66.9% 79.9%
YOLOv5n6 75.5% 75.0% 85.4% 59.7% 77.3%
YOLOv8n 75.4% 76.9% 84.7% 70.9% 77.3%
YOLOv5-Res4 82.1% 81.3% 87.5% 67.5% 81.2%
Bolded fonts for best results.

3.7. Improved Model Validity Analysis


In order to assess the differences in the performance of the two groups of models,
YOLOv5s vs. YOLOv5-Res and YOLOv5n vs. YOLOv5-Res4, in the target detection task,
this study used a paired samples t-test to analyze their performance on the mAP@0.5
metric [47]. The significance level of this test was set at p < 0.05 to ensure that
the observed differences were statistically significant. According to the results of multiple
calculations shown in Table 10, YOLOv5-Res and YOLOv5-Res4 exhibited statistically
significant differences in mAP@0.5 compared to their control models. This finding provides
solid statistical support for the validity of the YOLOv5-Res series model on the current
dataset and further validates its potential to solve the problems relating to the detection of
small spots on apple leaves, as well as the difficulty of accurate spot target detection due to
the complex background of orchards.
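For reference, a minimal sketch of this paired-samples t-test on the repeated mAP@0.5 runs reported in Table 10 is shown below (the values are copied from the table; SciPy's ttest_rel implements the paired test, and the computed p-value matches the 0.004 reported for YOLOv5-Res):

```python
from scipy.stats import ttest_rel

# Four repeated mAP@0.5 measurements per model (values from Table 10).
yolov5s = [79.3, 79.4, 79.2, 79.4]
yolov5_res = [81.9, 81.3, 82.7, 82.4]

t_stat, p_value = ttest_rel(yolov5_res, yolov5s)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")  # p < 0.05 -> statistically significant
```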

Table 10. T-test of different control models.

Models mAP@0.5_1 mAP@0.5_2 mAP@0.5_3 mAP@0.5_4 p-Value


YOLOv5s 79.3 79.4 79.2 79.4 0.05
YOLOv5-Res 81.9 81.3 82.7 82.4 0.004
YOLOv5n 77.7 76.9 77.9 78.3 0.05
YOLOv5-Res4 79.9 80.3 80.2 79.3 0.020
Bolded fonts for best results.

The YOLOv5-Res model and YOLOv5-Res4 model were validated using the test set,
and the detection results are shown in Figure 7. The apple leaf disease targets were small,
and the YOLOv5s model and YOLOv5n model had misdetection and omission in detecting
alternaria leaf spot and gray spot and had a slightly lower confidence level compared with
YOLOv5-Res and YOLOv5-Res4. In summary, the YOLOv5-Res and YOLOv5-Res4 models
were able to accomplish the task of accurate and efficient apple leaf spot detection.
Figure 7. Model recognition performance comparison.

4. Discussion

In order to solve the problems concerning the small sizes of some lesions on apple leaves and the difficulty of accurately detecting lesion targets due to the complex background of the orchard, this study proposed the high-precision YOLOv5-Res model and the lightweight YOLOv5-Res4 model.

Compared with the study of Li [48], although both studies are based on the YOLO series, our model is more advantageous for lesion detection in complex orchard environments, and its design fully takes into account the diversity of actual orchard scenarios, which improves the adaptability and robustness of the model. Compared to Gao's [49] study, we used a more challenging multidimensional dataset containing images of apple leaves against complex orchard backgrounds under different weather conditions and shooting angles, which enhances the utility of our model. In addition, we were concerned with the difficulty of differentiating similar spots, especially since alternaria leaf spot is easily confused with gray spot in the early stages of the disease, and the results showed that gray spot was not detected with high accuracy; this will be considered as a separate research topic in future studies. In the field of agricultural detection, the studies of Solimani [20] and Ma [21] concluded that the attention mechanism can help the model to better focus on the key regions of the image while also increasing the model parameters. However, we propose a lightweighting strategy that successfully balances model complexity and performance by compressing the depth and width of the model, significantly reducing the model computation while ensuring detection accuracy.

In summary, our model effectively solves the challenge of accurately detecting small-sized spot targets on apple leaves under the complex background conditions of the orchard and shows excellent performance in both detection accuracy and model lightness, providing an innovative solution for the detection of apple leaf diseases.

5. Conclusions
In both YOLOv5-Res and YOLOv5-Res4, the proposed ResBlock multiscale module
helps to extract features at different scales and preserve image detail information. Addi-
tionally, the feature fusion module C4 enhances the ability to extract feature information
from small targets while being lightweight. By compressing the depth and width of the
model, the lightweight YOLOv5-Res4 model is obtained. The improved algorithm model
YOLOv5-Res has a mAP@0.5 of 82.1%, FLOPs of 15.6 G, 5,486,055 parameters, and a model size of 10.8 MB. The YOLOv5-Res4 model has a mAP@0.5 of 79.9%, FLOPs of 5.3 G, a parameter count of 1,094,471, and a model size of 2.4 MB. Overall, YOLOv5-Res was able
to achieve high accuracy in detecting apple leaf diseases, while YOLOv5-Res4 still had
high detection accuracy while being lightweight, and it also had certain advantages in
comparison with mainstream models.

Author Contributions: Conceptualization, Z.F.; Methodology, Z.S.; Resources, Z.C. All authors have
read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: Data are contained within the article.
Acknowledgments: The authors would like to thank the anonymous reviewers for their critical
comments and suggestions for improving the manuscript.
Conflicts of Interest: Zhaokai Sun and Zemin Feng were employed by the Jidong Equipment
Company. The remaining authors declare that the research was conducted in the absence of any
commercial or financial relationships that could be construed as a potential conflict of interest.

References
1. Bi, C.; Wang, J.; Duan, Y.; Fu, B.; Kang, J.-R.; Shi, Y. MobileNet based apple leaf diseases identification. Mob. Netw. Appl. 2022,
27, 172–180. [CrossRef]
2. Abbaspour-Gilandeh, Y.; Aghabara, A.; Davari, M.; Maja, J.M. Feasibility of using computer vision and artificial intelligence
techniques in detection of some apple pests and diseases. Appl. Sci. 2022, 12, 906. [CrossRef]
3. Liu, J.; Wang, X. Plant diseases and pests detection based on deep learning: A review. Plant Methods 2021, 17, 22. [CrossRef]
4. Fuentes, A.; Yoon, S.; Park, D.S. Deep learning-based techniques for plant diseases recognition in real-field scenarios. In
Proceedings of the Advanced Concepts for Intelligent Vision Systems: 20th International Conference, ACIVS 2020, Auckland,
New Zealand, 10–14 February 2020; Proceedings 20. pp. 3–14.
5. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural
Inf. Process. Syst. 2015, 28, 1137–1149. [CrossRef]
6. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision,
Venice, Italy, 22–29 October 2017; pp. 2961–2969.
7. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of
the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings,
Part I 14. pp. 21–37.
8. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
9. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
10. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
11. Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
12. Lee, S.-H.; Wu, C.-C.; Chen, S.-F. Development of image recognition and classification algorithm for tea leaf diseases using
convolutional neural network. In Proceedings of the 2018 ASABE Annual International Meeting, Detroit, MI, USA, 29 July–1
August 2018; p. 1.
13. Singh, U.P.; Chouhan, S.S.; Jain, S.; Jain, S. Multilayer convolution neural network for the classification of mango leaves infected
by anthracnose disease. IEEE Access 2019, 7, 43721–43729. [CrossRef]
14. Li, H.; Shi, H.; Du, A.; Mao, Y.; Fan, K.; Wang, Y.; Shen, Y.; Wang, S.; Xu, X.; Tian, L. Symptom recognition of disease and insect
damage based on Mask R-CNN, wavelet transform, and F-RNet. Front. Plant Sci. 2022, 13, 922797. [CrossRef]

15. Sun, H.; Xu, H.; Liu, B.; He, D.; He, J.; Zhang, H.; Geng, N. MEAN-SSD: A novel real-time detector for apple leaf diseases using
improved light-weight convolutional neural networks. Comput. Electron. Agric. 2021, 189, 106379. [CrossRef]
16. Tian, Y.; Yang, G.; Wang, Z.; Li, E.; Liang, Z. Detection of apple lesions in orchards based on deep learning methods of cyclegan
and yolov3-dense. J. Sens. 2019, 2019, 7630926. [CrossRef]
17. Li, J.; Qiao, Y.; Liu, S.; Zhang, J.; Yang, Z.; Wang, M. An improved YOLOv5-based vegetable disease detection method. Comput.
Electron. Agric. 2022, 202, 107345. [CrossRef]
18. Lin, J.; Yu, D.; Pan, R.; Cai, J.; Liu, J.; Zhang, L.; Wen, X.; Peng, X.; Cernava, T.; Oufensou, S. Improved YOLOX-Tiny network for
detection of tobacco brown spot disease. Front. Plant Sci. 2023, 14, 1135105. [CrossRef]
19. Khan, F.; Zafar, N.; Tahir, M.N.; Aqib, M.; Waheed, H.; Haroon, Z. A mobile-based system for maize plant leaf disease detection
and classification using deep learning. Front. Plant Sci. 2023, 14, 1079366. [CrossRef]
20. Solimani, F.; Cardellicchio, A.; Dimauro, G.; Petrozza, A.; Summerer, S.; Cellini, F.; Renò, V. Optimizing tomato plant phenotyping
detection: Boosting YOLOv8 architecture to tackle data complexity. Comput. Electron. Agric. 2024, 218, 108728. [CrossRef]
21. Ma, B.; Hua, Z.; Wen, Y.; Deng, H.; Zhao, Y.; Pu, L.; Song, H. Using an improved lightweight YOLOv8 model for real-time
detection of multi-stage apple fruit in complex orchard environments. Artif. Intell. Agric. 2024, 11, 70–82. [CrossRef]
22. Yang, Q.; Duan, S.; Wang, L. Efficient identification of apple leaf diseases in the wild using convolutional neural networks.
Agronomy 2022, 12, 2784. [CrossRef]
23. Pathological Images of Apple Leaves. Available online: https://aistudio.baidu.com/datasetdetail/11591/0 (accessed on 1 April 2023).
24. Hughes, D.; Salathé, M. An open access repository of images on plant health to enable the development of mobile disease
diagnostics. arXiv 2015, arXiv:1511.08060.
25. Feng, J.; Chao, X. Apple Tree Leaf Disease Segmentation Dataset; Science Data Bank: Beijing, China, 2022.
26. Thapa, R.; Zhang, K.; Snavely, N.; Belongie, S.; Khan, A. The Plant Pathology Challenge 2020 data set to classify foliar disease of
apples. Appl. Plant Sci. 2020, 8, e11390. [CrossRef]
27. Jocher, G.; Chaurasia, A.; Stoken, A.; Borovec, J.; Kwon, Y.; Michael, K.; Fang, J.; Wong, C.; Yifu, Z.; Montes, D. Ultralytics/yolov5: v6.2 - YOLOv5 Classification Models, Apple M1, Reproducibility, ClearML and Deci.ai Integrations; Zenodo, 2022. Available online: https://github.com/ultralytics/yolov5 (accessed on 13 May 2024).
28. Park, H.; Yoo, Y.; Seo, G.; Han, D.; Yun, S.; Kwak, N. C3: Concentrated-comprehensive convolution and its application to semantic
segmentation. arXiv 2018, arXiv:1812.04920.
29. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans.
Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [CrossRef]
30. Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
31. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE conference on
Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8759–8768.
32. Zheng, Z.; Wang, P.; Ren, D.; Liu, W.; Ye, R.; Hu, Q.; Zuo, W. Enhancing geometric factors in model learning and inference for
object detection and instance segmentation. IEEE Trans. Cybern. 2021, 52, 8574–8586. [CrossRef]
33. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with
convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June
2015; pp. 1–9.
34. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
35. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings
of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 448–456.
36. Elfwing, S.; Uchibe, E.; Doya, K. Sigmoid-weighted linear units for neural network function approximation in reinforcement
learning. Neural Netw. 2018, 107, 3–11. [CrossRef]
37. Hua, B.-S.; Tran, M.-K.; Yeung, S.-K. Pointwise convolutional neural networks. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 984–993.
38. Lucic, A.; Oosterhuis, H.; Haned, H.; de Rijke, M. FOCUS: Flexible optimizable counterfactual explanations for tree ensembles. In
Proceedings of the AAAI Conference on Artificial Intelligence, Online, 22 February–1 March 2022; pp. 5313–5322.
39. Wang, C.-Y.; Liao, H.-Y.M.; Wu, Y.-H.; Chen, P.-Y.; Hsieh, J.-W.; Yeh, I.-H. CSPNet: A new backbone that can enhance learning
capability of CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle,
WA, USA, 14–19 June 2020; pp. 390–391.
40. Sun, Y.; Chen, G.; Zhou, T.; Zhang, Y.; Liu, N. Context-aware cross-level fusion network for camouflaged object detection. arXiv
2021, arXiv:2105.12555.
41. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. Mobilenetv2: Inverted residuals and linear bottlenecks. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018;
pp. 4510–4520.
42. Zhang, X.; Zeng, H.; Guo, S.; Zhang, L. Efficient long-range attention network for image super-resolution. In Proceedings of the
European Conference on Computer Vision, Glasgow, UK, 23–28 August 2022; pp. 649–667.

43. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. Detrs beat yolos on real-time object detection. arXiv 2023,
arXiv:2304.08069.
44. Reis, D.; Kupec, J.; Hong, J.; Daoudi, A. Real-time flying object detection with YOLOv8. arXiv 2023, arXiv:2305.09972.
45. Wang, C.-Y.; Yeh, I.-H.; Liao, H.-Y.M. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information.
arXiv 2024, arXiv:2402.13616.
46. Adarsh, P.; Rathi, P.; Kumar, M. YOLO v3-Tiny: Object Detection and Recognition using one stage improved model. In Proceedings
of the 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India,
6–7 March 2020; pp. 687–694.
47. Mojaravscki, D.; Graziano Magalhães, P.S. Comparative Evaluation of Color Correction as Image Preprocessing for Olive
Identification under Natural Light Using Cell Phones. AgriEngineering 2024, 6, 155–170. [CrossRef]
48. Li, H.; Shi, L.; Fang, S.; Yin, F. Real-Time Detection of Apple Leaf Diseases in Natural Scenes Based on YOLOv5. Agriculture 2023,
13, 878. [CrossRef]
49. Ang, G.; Han, R.; Yuepeng, S.; Longlong, R.; Yue, Z.; Xiang, H. Construction and verification of machine vision algorithm model
based on apple leaf disease images. Front. Plant Sci. 2023, 14, 1246065. [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
