Agronomy 2024, 14, 1331
Article
Highly Accurate and Lightweight Detection Model of Apple Leaf
Diseases Based on YOLO
Zhaokai Sun 1,2, Zemin Feng 1,3,* and Ziming Chen 1
Abstract: To mitigate problems concerning small-sized spots on apple leaves and the difficulties
associated with the accurate detection of spot targets exacerbated by the complex backgrounds of
orchards, this research used alternaria leaf spots, rust, brown spots, gray spots, and frog eye leaf
spots on apple leaves as the research object and proposed the use of a high-accuracy detection
model YOLOv5-Res (YOLOv5-Resblock) and lightweight detection model YOLOv5-Res4 (YOLOv5-
Resblock-C4). Firstly, a multiscale feature extraction module, ResBlock (residual block), was designed
by combining the Inception multi-branch structure and ResNet residual idea. Secondly, a lightweight
feature fusion module C4 (CSP Bottleneck with four convolutions) was designed to reduce the number
of model parameters while improving the detection ability of small targets. Finally, a parameter-
streamlining strategy based on an optimized model architecture was proposed. The experimental
results show that the performance of the YOLOv5-Res model and YOLOv5-Res4 model is significantly
improved, with the mAP@0.5 values increasing by 2.8% and 2.2% compared to the YOLOv5s model
and YOLOv5n model, respectively. The sizes of the YOLOv5-Res model and YOLOv5-Res4 model are
only 10.8 MB and 2.4 MB, and the model parameter counts are reduced by 22% and 38.3% compared
to the YOLOv5s model and YOLOv5n model.
Keywords: leaf disease detection; YOLO; residual block; parameter streamlining strategy
Citation: Sun, Z.; Feng, Z.; Chen, Z. Highly Accurate and Lightweight Detection Model of Apple Leaf Diseases Based on YOLO. Agronomy 2024, 14, 1331. https://doi.org/10.3390/agronomy14061331

Academic Editor: Gamal Elmasry

Received: 14 May 2024; Revised: 11 June 2024; Accepted: 18 June 2024; Published: 19 June 2024

Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Apples are one of the world's most important temperate fruits, and they have medicinal, nutritional, and ornamental value. Apples are not only rich in nutrients essential for the brain, such as sugar, protein, vitamins, and minerals, but, more importantly, are rich in zinc. According to research, zinc is a component of many important enzymes in the human body and is a key element in promoting growth and development. Therefore, improving the yield and quality of apples can better improve human health [1]. The spring and summer temperatures and high humidity in northern China are favorable for the propagation and spread of pathogenic bacteria, increasing the possibility of apple tree leaves being infected with leaf spot diseases such as alternaria leaf spot, rust, brown spot, gray spot, and frog eye leaf spot. These diseases are the most common; coupled with the high planting density and poor ventilation of artificial orchards, they occur frequently in the northern region, seriously affecting the healthy development of the apple industry. The early discovery and rapid identification of leaf diseases, as well as their precise prevention and control, can effectively improve the yield and quality of apples [2].

Traditional disease detection requires experienced experts to undertake visual judgments of leaves in the field. This traditional detection method is inefficient, especially given that judgments of similar leaf diseases are easily affected by subjective factors,
resulting in omission and misdetection. A large number of studies have shown that image
processing and machine learning algorithms are widely used for the rapid diagnosis of
crop diseases [3]. However, traditional machine learning methods have shortcomings,
such as difficulties in terms of their feature extraction and their high computational com-
plexity [4]. Deep learning methods based on convolutional neural networks have faster
detection speeds and greater detection accuracy in target detection compared to traditional
machine learning methods [5–11]. Lee [12] used the Faster R-CNN (Faster Region-based
Convolutional Neural Network) model with region-based convolutional neural networks
to recognize brown blight, blister blight, and algal leaf spot diseases affecting tea, with an
AP (average precision) of 63.58%, 81.08%, and 64.71%, respectively. Singh [13] used MCNN
(multilayer convolutional network) to classify mango anthracnose leaf disease with 97.3%
accuracy. He [14] proposed a Mask-RCNN-based model for detecting leaf diseases of tea
tree with F1 scores of 88.3% and 95.3% for tea coal diseases and boll weevil leaf diseases;
however, the F1 scores were lower when distinguishing between brown leaf disease and
target spot disease, with scores of only 61.1% and 66.6%. Sun et al. [15] proposed a module
for MEAN by reconstructing the 3 × 3 convolutional module to detect a single diseased
apple leaf in a simple environment, and the MEAN-SSD (Mobile End AppleNet-based
SSD algorithm) model achieved a mAP (mean average precision) value of 83.12% and an
FPS (frames per second) value of 12.53 for apple leaf diseases. Tian et al. [16] combined
DenseNet to optimize the feature layer of the YOLOv3 model and proposed the use of the
YOLOv3-dense model to detect apple surface anthracnose; DenseNet greatly improved
the utilization of features in the neural network, with a detection accuracy of 95.57%, and
the detection speed could be up to 31 FPS. Li [17] proposed an improved YOLOv5s model
for vegetable disease detection, which achieved 93.1% mAP@0.5 for five disease images
with a model size of 17.1 MB. Lin [18] proposed an improved YOLOX-Tiny model for the
detection of tobacco brown spot disease, which introduced the HMU module for informa-
tion interaction and small feature extraction capability in the neck network, resulting in
an AP of 80.45%. Faiza Khan [19] obtained 99.04% AP for three maize crop diseases using
YOLOv8n model detection. Firozeh Solimani [20] used the SE attention module after the
C2f module of the YOLOv8 model and investigated the effect of the SE module on the
YOLOv8 model, and the results showed that the SE module improved the efficiency when
detecting small targets, such as flowers, fruits, and neck nodes of tomato plants. Ma [21]
proposed a YOLOv8 detection model for the whole growth stage of apple fruits, which
assembled the ShuffleNetv2 base module, the Ghost module, and the SE attention module
and invoked the CIoU (Complete-IOU) marginal regression loss function, which resulted
in a final mAP of 91.4%, and a model size of 2.6 MB.
The analysis shows that deep learning detection algorithms have been effectively
used to diagnose various crop diseases; however, fewer studies have focused on apple leaf
disease detection. Moreover, most studies were implemented in simple environments using
single leaves, meaning that they could not realistically simulate the complex background of
the growing environment, different light conditions, and the degree of leaves shading each
other, thus limiting their application in real production scenarios. It is crucial to develop an
apple leaf disease detection model with high recognition accuracy and fast detection speed.
In order to solve the problems concerning the small size of some disease spots on apple
leaves and the difficulty in accurately detecting the disease spot targets in the complex
background of plantations, a high-precision model and a lightweight model were proposed
based on the multiscale feature extraction module ResBlock and lightweight feature fusion
module C4 with a parameter streamlining strategy, which improved the accuracy of disease
detection on apple leaves and realized a lightweight model.
Table 1. Distribution of dataset labels.

Disease Type            Train    Test    Validation    Number of Images
Alternaria leaf spot      490     163        163              816
Rust                      508     169        169              816
Brown spot                741     247        247             1235
Gray spot                 425     142        142              709
Frog eye leaf spot        755     251        251             1257
Total                    2919     972        972             4863

2.2. Methods

Based on the characteristics of the YOLOv5 [27] model, an improved apple leaf disease detection model, YOLOv5-Res, was proposed, as shown in Figure 2, in order to improve the accuracy of the model in detecting small-scale apple leaf disease spots against a complex background. The model increases the number of bottlenecks in the C3 (CSP Bottleneck with 3 convolutions) [28] module of the backbone and neck networks to deepen the network and improve the feature extraction ability, adopts the lighter focus module instead of the Conv module to balance the increase in the network parameters, and adopts the SPP (Spatial Pyramid Pooling) [29] module instead of the SPPF (Spatial Pyramid Pooling Fast) [29] module to reduce the overfitting risk and improve the generalizability of the model while decreasing the model parameters. In addition, the ResBlock (residual block) module is designed to adapt to input features of different scales and complexities.

Based on the YOLOv5-Res model, the C4 (CSP Bottleneck with 4 convolutions) module was designed with the goal of lightening the model by optimizing the transfer path of the feature map information fusion process, and with the parameter streamlining strategy, the lightweight YOLOv5-Res4 model was designed.
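The SPP substitution described above can be illustrated with a minimal PyTorch sketch. This is our reading of the standard SPP layer, not the authors' released code; the 5/9/13 pooling kernels and the 1 × 1 reduction/fusion convolutions are assumptions modeled on the common YOLOv5 configuration:

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Spatial Pyramid Pooling: parallel stride-1 max-pools of several
    kernel sizes are concatenated with the input to mix receptive fields."""
    def __init__(self, c_in, c_out, kernel_sizes=(5, 9, 13)):
        super().__init__()
        c_hidden = c_in // 2
        self.reduce = nn.Conv2d(c_in, c_hidden, 1)  # 1x1 channel reduction
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
            for k in kernel_sizes
        )
        # 1x1 conv fuses the concatenated branches back to c_out channels
        self.fuse = nn.Conv2d(c_hidden * (len(kernel_sizes) + 1), c_out, 1)

    def forward(self, x):
        x = self.reduce(x)
        return self.fuse(torch.cat([x] + [p(x) for p in self.pools], dim=1))

x = torch.randn(1, 256, 20, 20)
y = SPP(256, 256)(x)
print(tuple(y.shape))  # (1, 256, 20, 20): stride-1 pooling preserves spatial size
```

Because each pooling branch uses stride 1 with padding k // 2, the spatial resolution is unchanged and only the effective receptive field varies between branches.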
The CIoU (Complete-IoU) loss used for bounding-box regression is L_CIoU = 1 − IoU + ρ²(b, b^gt)/c² + αυ, where b and b^gt represent the centers of the predicted and real frames, respectively; ρ represents the Euclidean distance between the two centers; c represents the diagonal distance of the smallest closure region that can contain both the prediction box and the true box; α represents the weighting function; and υ represents the difference between the aspect ratios of the predicted and real frames.
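The terms above can be traced in a small self-contained function. This is a sketch of the standard Complete-IoU formulation, not code from the paper; boxes are assumed to be (x1, y1, x2, y2) tuples:

```python
import math

def ciou_loss(box_p, box_g):
    """CIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # IoU of the two boxes
    ix1, iy1 = max(box_p[0], box_g[0]), max(box_p[1], box_g[1])
    ix2, iy2 = min(box_p[2], box_g[2]), min(box_p[3], box_g[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_g = (box_g[2] - box_g[0]) * (box_g[3] - box_g[1])
    iou = inter / (area_p + area_g - inter)

    # rho^2: squared distance between the centers b and b^gt
    cxp, cyp = (box_p[0] + box_p[2]) / 2, (box_p[1] + box_p[3]) / 2
    cxg, cyg = (box_g[0] + box_g[2]) / 2, (box_g[1] + box_g[3]) / 2
    rho2 = (cxp - cxg) ** 2 + (cyp - cyg) ** 2

    # c^2: squared diagonal of the smallest region enclosing both boxes
    cw = max(box_p[2], box_g[2]) - min(box_p[0], box_g[0])
    ch = max(box_p[3], box_g[3]) - min(box_p[1], box_g[1])
    c2 = cw ** 2 + ch ** 2

    # upsilon: aspect-ratio consistency term; alpha: its weighting
    wp, hp = box_p[2] - box_p[0], box_p[3] - box_p[1]
    wg, hg = box_g[2] - box_g[0], box_g[3] - box_g[1]
    v = (4 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(wp / hp)) ** 2
    alpha = v / (1 - iou + v + 1e-9)

    return 1 - iou + rho2 / c2 + alpha * v

print(ciou_loss((0, 0, 10, 10), (0, 0, 10, 10)))  # 0.0: identical boxes
print(ciou_loss((0, 0, 10, 10), (5, 5, 15, 15)))  # displaced boxes are penalized
```

Unlike plain IoU, the center-distance and aspect-ratio penalties keep the gradient informative even when the predicted and true boxes do not overlap at all.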
Figure 3. ResBlock residual block.
GoogLeNet proposed the Inception [33] model to increase the network width by introducing multiple Inception modules while reducing the number of parameters. In Inception-v1 [33], multiscale feature extraction is achieved by using different-sized convolutional kernels in the same layer. The core idea of the ResNet (Residual Network) [34] model is to learn residuals, i.e., the neural network is designed to carry out residual mapping during the training process, which maps the inputs directly to the outputs, y = F(x, {wi}) + x, where x denotes the input of the residual block, y denotes the output of the residual block, and F(x, {wi}) denotes the function that the network needs to fit. Residual connectivity makes it easier to train deeper networks.

The ResBlock module was proposed based on the ideas of Inception and ResNet. There are three branch designs: a 1 × 1 convolutional kernel, a 3 × 3 convolutional kernel, and a jump connection. Each branch contains a BN (Batch Normalization) [35] module, which is used to improve the training stability of the model and accelerate the convergence process. Specifically, the 1 × 1 convolutional kernel is used to linearly transform the feature map to extract local feature information; the 3 × 3 convolutional kernel is used to capture a wider range of feature relationships to enhance the global expression capability; and the jump connection realizes direct information transfer and the back propagation of the gradient to alleviate the gradient vanishing problem. This multi-branch design combines the feature extraction capabilities of different convolutional kernels with jump connections to achieve feature fusion and information transfer and to promote multiscale feature extraction. Meanwhile, the introduction of the BN module helps to normalize the feature distribution, accelerate the network convergence speed, and improve the generalization ability and stability of the model. Such a module design adapts to different scales and complexities of input features to improve the characterization abilities and performance of the model.
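As an illustration of this three-branch design, a minimal PyTorch sketch follows. It reflects our reading of the description rather than the authors' code; the channel-preserving layout and the SiLU activation after the summation are assumptions:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Three-branch residual block: a 1x1 conv, a 3x3 conv, and a jump
    connection, each followed by BN, summed and passed through SiLU."""
    def __init__(self, channels):
        super().__init__()
        self.branch1 = nn.Sequential(
            nn.Conv2d(channels, channels, 1, bias=False),             # local features
            nn.BatchNorm2d(channels),
        )
        self.branch3 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),  # wider context
            nn.BatchNorm2d(channels),
        )
        self.skip = nn.BatchNorm2d(channels)  # identity branch with BN
        self.act = nn.SiLU()

    def forward(self, x):
        # Summing the branches fuses local, contextual, and identity features
        return self.act(self.branch1(x) + self.branch3(x) + self.skip(x))

x = torch.randn(2, 64, 40, 40)
assert ResBlock(64)(x).shape == x.shape  # shape-preserving, so it can replace Conv
```

Because the block preserves both the spatial size and the channel count, it can stand in for a plain Conv layer in the backbone without changing any surrounding shapes.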
2.2.3. Design of the C4 Feature Fusion Module

C3 is an important building block for feature extraction in the YOLOv5s model, which consists of multiple convolutional modules (Conv), bottleneck layers (bottleneck), BN layers (Batch Normalization), and SiLU (Sigmoid Linear Unit) [36] activation functions. The convolution module consists of a Conv2d convolution layer plus a BN layer plus a SiLU activation function; the bottleneck layer involves 1 × 1 convolution followed by 3 × 3 convolution, where the 1 × 1 convolution halves the number of channels and the 3 × 3 convolution doubles the number of channels and then adds the inputs, which is set up to improve network characterization and performance. As a whole, the C3 module is divided into three steps: a first PW (Pointwise Convolution) [37] to downscale the data, then convolution carried out using a regular convolution kernel, and finally, PW used to upscale the data. The C3 structure is shown in the left panel of Figure 4.

On the basis of C3, considering that the first convolution of the main branch and the first convolution of the bottleneck layer are both point-by-point convolutions, the two layers are merged to form the C4 (CSP Bottleneck with 4 convolutions) module; C4 uses 1 × 1 convolution to reduce the number of channels in the main branch to extract the shallow features and obtain more detailed information; 3 × 3 convolution is used to increase the sensory field to extract the deeper information; and the side branch introduces a 1 × 1 convolutional layer to realize feature fusion. With the above structure, the C4 module optimizes the information transfer path within the module, simplifies the feature transfer and interaction process, and avoids information loss and blurring to better capture local features and spatial information; its structure is shown in the right panel of Figure 4.

Figure 4. C3 and C4 module comparison.

2.2.4. YOLOv5-Res4 Based on Parameter Refinement Strategy

C3 is used to enhance the learning capabilities of the network via the feature extraction of the input image through multilayer operations, such as a convolution module, a bottleneck layer, a BN layer, etc., increasing the complexity and computation of the model. Since image feature extraction is mainly carried out in the backbone part, the neck is mainly used for the feature fusion of the feature map. In this paper, we use the lightweight focus [38] module to replace the layer 0 Conv module in the backbone part, reduce the bottleneck structure coefficient in the C3 module to 1 in order to maximize the lightweight purpose, use the ResBlock multiscale module to replace the other Conv modules in the backbone network, and introduce the SPP module, directly connected to the ResBlock module, to improve the model's ability to extract features from the backbone network.
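A PyTorch sketch of this C4 layout follows. It is illustrative only, not the published implementation; the exact channel split between the main and side branches and the concatenation-then-fuse pattern are assumptions based on the usual CSP structure:

```python
import torch
import torch.nn as nn

def conv_bn_silu(c_in, c_out, k=1):
    """Conv2d + BN + SiLU, the basic Conv module of the YOLOv5 family."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(),
    )

class C4(nn.Module):
    """CSP bottleneck with 4 convolutions: the two former pointwise convs
    are merged into one 1x1 reduction on the main branch, followed by a
    3x3 conv; a 1x1 side branch feeds the final fusion conv."""
    def __init__(self, c_in, c_out):
        super().__init__()
        c_half = c_out // 2
        self.reduce = conv_bn_silu(c_in, c_half, 1)      # merged pointwise conv
        self.spatial = conv_bn_silu(c_half, c_half, 3)   # larger receptive field
        self.side = conv_bn_silu(c_in, c_half, 1)        # side branch
        self.fuse = conv_bn_silu(c_half * 2, c_out, 1)   # feature fusion

    def forward(self, x):
        main = self.spatial(self.reduce(x))
        return self.fuse(torch.cat((main, self.side(x)), dim=1))

x = torch.randn(1, 128, 20, 20)
assert C4(128, 128)(x).shape == (1, 128, 20, 20)
```

Merging the two pointwise convolutions is what brings the count down to four convolutions in total, which is where the parameter saving over C3 comes from.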
3. Results
3.1. Experimental Environment and Model Evaluation Metrics

The main parameters of the computer platform used in this experiment are as follows: an Intel Core i9-13900K CPU with a main frequency of 3.00 GHz, 64 GB of running memory, and an NVIDIA GeForce RTX 3090 GPU with 24 GB of memory. The PyTorch 2.0.1 deep learning framework was used to build and evaluate the model. The training parameters were set with an image input size of 640 × 640; the training epoch was 150, the image batch size was set to 2, the Adam optimizer was used, the initial learning rate was set as 0.01, the momentum value was 0.937, and the weight decay parameter was 0.0005.

To evaluate the performance of the improved YOLOv5s model, the evaluation metrics used in this study were categorized into precision (P), recall (R), single-category average precision (AP), mean average precision (mAP), number of parameters (Parameters), and floating-point operations (FLOPs), where mAP, Parameters, and FLOPs denote the mean average precision of disease detection, the number of network parameters, and the real-time processing speed of the model, respectively. The formulas used for the model performance evaluation metrics are shown in Equations (1)–(6):

Precision = TP / (TP + FP)    (1)

Recall = TP / (TP + FN)    (2)

AP = ∫₀¹ P(R) dR    (3)

mAP = (1/N) ∑ APᵢ, summing over the N classes    (4)

Parameters = K × K × Cin × Cout    (5)

FLOPs = ∑(K × K × Cin × Cout × H × W)    (6)

where TP denotes the number of correctly detected positive samples, FP denotes the number of incorrectly detected negative samples, and FN denotes the number of undetected positive samples. K denotes the convolutional kernel size, Cin and Cout denote the number of input and output channels, respectively, and H and W denote the height and width of the output feature map, respectively.
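Equations (1)–(6) can be checked with a short script. The counts and precision–recall points below are hypothetical, and AP is approximated here by a discrete sum over the precision–recall curve rather than a true integral:

```python
def precision(tp, fp):
    return tp / (tp + fp)          # Equation (1)

def recall(tp, fn):
    return tp / (tp + fn)          # Equation (2)

def average_precision(pr_points):
    """Equation (3): approximate AP = ∫ P(R) dR by summing precision
    over recall increments; pr_points is a list of (R, P) pairs."""
    ap, prev_r = 0.0, 0.0
    for r, p in sorted(pr_points):
        ap += p * (r - prev_r)
        prev_r = r
    return ap

def mean_ap(aps):
    return sum(aps) / len(aps)     # Equation (4): mean AP over N classes

def conv_params(k, c_in, c_out):
    return k * k * c_in * c_out    # Equation (5), biases ignored

def conv_flops(k, c_in, c_out, h, w):
    return k * k * c_in * c_out * h * w  # Equation (6), one conv layer

# Hypothetical counts for a single disease class
print(precision(80, 20))   # 0.8
print(recall(80, 40))      # 2/3 of true spots found
print(average_precision([(0.25, 1.0), (0.5, 0.9), (0.75, 0.8), (1.0, 0.6)]))
print(conv_params(3, 64, 128))  # 73728
```

Note the asymmetry the equations encode: FP inflates only the precision denominator, while FN inflates only the recall denominator, which is why both metrics are needed together.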
3.2. Experimental Analysis of the ResBlock Module

In order to verify the effectiveness of the ResBlock module, the ResBlock2, ResBlock3, and ResBlock4 modules were reconstructed, as shown in Figure 6. Keeping the other structures in the YOLOv5-Res model unchanged, ResBlock was replaced with ordinary Conv, ResBlock2, ResBlock3, and ResBlock4 for comparison experiments.
The features of leaf diseases vary greatly at different stages of development, and 1 × 1
and 3 × 3 parallel convolution kernels were used to ensure that the detailed information in
the image could be adequately captured. Different sensory fields facilitate the extraction
of local features of different sizes, which can better capture detailed information in the
image and improve the robustness of the network. As shown in Table 2 of the experimental
results, ResBlock obtained the best mAP@0.5 (mean average precision, IoU = 0.5) and
mAP@0.5–0.95 (mean average precision, IoU = 0.5–0.95) values, where the mAP@0.5 values
are 1.88%, 2.18%, and 2.28% higher than those of ResBlock2, ResBlock3, and ResBlock4,
respectively. In addition, the ResBlock and ResBlock2 models have the same number of
parameters and model size, but there is a large difference in terms of accuracy, which is
mainly because the BN layer plays a role in speeding up the training speed to improve
model accuracy. Comparing the FLOP values of the three improved modules, the ResBlock
module is also more advantageous. The experimental results show that using the ResBlock
module to replace the Conv module in the backbone structure of YOLOv5s is an effective
way to improve the model, and although the ResBlock module increases the number of parameters, this multiscale feature extraction module adapts to the needs of apple leaf disease detection and improves the accuracy of the detection model.
Models and methods:

A: Taking YOLOv5s as the base model, modify the layer 0 Conv of the backbone to the focus module and replace all C3 modules so that all C3s in the new model A are C3 × 1.
B: Taking YOLOv5s as the base model, modify the layer 0 Conv of the backbone to the focus module and replace all C3 modules so that C3 becomes C4 × 1 in the new model B.
C: Using model A as the base model, all Convs are replaced with ResBlock × 1, except for backbone layer 0 Conv, which is the focus, to obtain the new model C.
D: Using model C as the base model, all the Convs in the new model D are ResBlock, except the first Conv in the backbone, which is the focus; the last layer C3 in the backbone and the C3 in the head are both improved to C4 × 1.
E: With D as the base model, SPPF is improved to SPP.
F: With E as the base model, the SPP is adjusted up one layer so that C4 is connected to the head layer as the output layer of the backbone.
YOLOv5-Res4: Taking F as the base model, shrinking the number of channels and improving the 10th and 14th layers in the neck to Conv, we end up with YOLOv5-Res4.
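The channel-shrinking step in the last row can be made concrete with a small helper in the spirit of YOLOv5's width scaling. The function name and the rounding-to-a-divisor behavior are our assumptions modeled on common practice, not the authors' code:

```python
import math

def scale_channels(channels, width_multiple, divisor=8):
    """Shrink a layer's channel count by width_multiple, rounding up to a
    multiple of divisor so hardware-friendly channel counts are kept."""
    scaled = math.ceil(channels * width_multiple / divisor) * divisor
    return max(divisor, int(scaled))

# Shrinking example layers to a fraction of their original width
print(scale_channels(64, 0.25))   # 16
print(scale_channels(128, 0.33))  # 48
```

Applying such a multiple across all layers compresses the width of the whole network uniformly, which is how depth/width compression trades a small amount of accuracy for a large parameter reduction.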
As shown in Table 5, both the C4 module and the ResBlock module have a positive effect on simplifying model complexity and improving recognition accuracy. Feature extraction is
mainly carried out in the shallow network, and model D has 1.7% fewer model parameters
than model C, while mAP@0.5 is improved by 0.5%. The effect of the SPP module was
verified using model D, model E, and model F. The results show that the mAP@0.5 value
gradually increases and reaches the highest value of 80.8% in model F. The mAP@0.5 value
of the YOLOv5-Res4 model is 79.9%, with the parameters only 22.89% and 61.7% of those
of the F and YOLOv5n models, and the model size is 2.4 MB, which significantly reduces
the size and computational cost of the model.
Models Precision/% Recall/% mAP@0.5/% mAP@0.5–0.95/% Parameters Model Size/MB FLOPs/G FPS
RTDETR-L 79.8% 69.4% 74.6% 34.6% 32,004,290 66.2 103.5 65.4
YOLOv5s 77.1% 77.2% 79.3% 32.7% 7,037,095 13.8 15.8 136.9
YOLOv5s-TR 77.6% 74.6% 78.4% 32.7% 7,078,951 14.6 16.1 133.3
YOLOv5s6 76.2% 74.3% 77.8% 33.9% 12,342,868 25.2 16.2 120.5
YOLOv8s 79.5% 72.6% 78.6% 37.4% 11,129,454 21.4 28.5 92.6
YOLOv9 76.4% 78.6% 80.1% 40.6% 60,516,700 122.5 264 34.2
YOLOv5-Res 81.4% 76.9% 82.1% 34.0% 5,486,055 10.8 15.6 107.5
Bolded fonts for best results.
Table 7 displays the mAP@0.5 values of mainstream models for detecting various
categories of disease characteristics. Among these, the best-performing models for detecting
alternaria leaf spot, rust, brown spot, gray spot, and frog eye leaf spot were YOLOv9,
YOLOv5s6, YOLOv9, RTDETR-L, and YOLOv5-Res, respectively. Comparing the detection
results of the other models, YOLOv5-Res demonstrates a more balanced performance, with
only gray spot having a mAP@0.5 lower than 80%.
Table 7. YOLOv5-Res and mainstream accuracy modeling for detecting disease characteristics in
various categories mAP@0.5.
Models Alternaria Leaf Spot Rust Brown Spot Gray Spot Frog Eye Leaf Spot
RTDETR-L 73.4% 77.3% 76.2% 71.2% 75%
YOLOv5s 76.8% 81.4% 88.5% 69.8% 79.9%
YOLOv5s-TR 75.6% 76.2% 88.0% 70.4% 81.2%
YOLOv5s6 76.0% 82.6% 88.1% 62.0% 80.0%
YOLOv8s 75.8% 78.8% 87.2% 70.5% 80.8%
YOLOv9 85.2% 76.9% 97.4% 63.3% 77.6%
YOLOv5-Res 84.9% 80.4% 93.0% 70.4% 81.7%
Bolded fonts for best results.
Models Precision/% Recall/% mAP@0.5/% mAP@0.5–0.95/% Parameters Model Size/MB FLOPs/G FPS
YOLOv3-tiny 69.8% 62.7% 68.2% 22.2% 8,687,482 17.5 12.9 256.4
YOLOv5n 76.0% 72.2% 77.7% 31.1% 1,772,695 3.7 4.2 156.3
YOLOv5n6 75.6% 68.0% 74.6% 30.7% 3,104,644 6.7 4.3 138.8
YOLOv8n 77.1% 72.7% 77.0% 37.3% 3,007,598 6.3 8.1 112.4
YOLOv5-Res4 76.9% 75.9% 79.9% 31.5% 1,094,471 2.4 5.3 135.1
Bolded fonts for best results.
Table 9. YOLOv5-Res4 and mainstream lightweight models for detecting various types of disease
characteristics of mAP@0.5.
Models Alternaria Leaf Spot Rust Brown Spot Gray Spot Frog Eye Leaf Spot
YOLOv3-tiny 72.7% 77.0% 49.6% 64.4% 77.3%
YOLOv5n 80.4% 79.6% 81.6% 66.9% 79.9%
YOLOv5n6 75.5% 75.0% 85.4% 59.7% 77.3%
YOLOv8n 75.4% 76.9% 84.7% 70.9% 77.3%
YOLOv5-Res4 82.1% 81.3% 87.5% 67.5% 81.2%
Bolded fonts for best results.
The YOLOv5-Res model and YOLOv5-Res4 model were validated using the test set,
and the detection results are shown in Figure 7. The apple leaf disease targets were small,
and the YOLOv5s model and YOLOv5n model had misdetection and omission in detecting
alternaria leaf spot and gray spot and had a slightly lower confidence level compared with
YOLOv5-Res and YOLOv5-Res4. In summary, the YOLOv5-Res and YOLOv5-Res4 models
were able to accomplish the task of accurate and efficient apple leaf spot detection.
Figure 7. Model recognition performance comparison.
4. Discussion
4. Discussion
In order to solve problems concerning the small sizes of some lesions on apple leaves
and In
theorder to solve
difficulty problems
of accurately concerning
detecting lesionthe smalldue
targets sizes of some
to the complexlesions on apple
background of
leaves and the difficulty of accurately detecting lesion targets due to the complex
background of the orchard, this study proposed the high-precision YOLOv5-Res model
and the lightweight YOLOv5-Res4 model.
Compared with the study of Li [48], although both studies are based on the YOLO
series, our model is more advantageous for lesion detection in the complex environments
of orchards, and its design fully takes into account the diversity of actual orchard scenarios,
which improves the adaptability and robustness of the module. Compared to Gao's [49]
study, we used a more challenging multidimensional dataset containing images of apple
leaves against complex orchard backgrounds under different weather conditions and
different shooting angles, which enhances the utility of our model. In addition, we were
concerned with the difficulty of differentiating spots during identification; in particular,
alternaria leaf spot is easily confused with gray spot in the early stages of the disease, and
the results showed that gray spot was not detected with high accuracy, which will be
considered as a separate research topic in future studies. In the field of agricultural
detection, the studies of Solimani [20] and Ma [21] concluded that the attention mechanism
can help the model to better focus on the key regions of the image, but it also increases the
number of model parameters. In contrast, we propose a lightweighting strategy that
balances model complexity and performance by compressing the depth and width of the
model, significantly reducing the model computation while preserving detection accuracy.
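The depth-and-width compression described above follows the YOLOv5 convention of a depth multiple (scaling each block's repeat count) and a width multiple (scaling channel counts). The sketch below illustrates that scaling rule under the standard YOLOv5-style rounding; the example multiples shown are the stock s- and n-scale values, not the specific settings used for YOLOv5-Res4.

```python
import math

def scale_depth(n_repeats: int, depth_multiple: float) -> int:
    """Scale a block's repeat count, never dropping below one repeat."""
    return max(round(n_repeats * depth_multiple), 1) if n_repeats > 1 else n_repeats

def scale_width(channels: int, width_multiple: float, divisor: int = 8) -> int:
    """Scale a layer's channel count, rounding up to a multiple of `divisor`
    so the resulting tensor shapes stay hardware friendly."""
    return int(math.ceil(channels * width_multiple / divisor) * divisor)

# Example: moving from an s-scale model (width 0.50) toward an
# n-scale model (width 0.25) at a fixed depth multiple of 0.33.
print(scale_width(256, 0.50))  # 128 channels at s-scale
print(scale_width(256, 0.25))  # 64 channels at n-scale
print(scale_depth(9, 0.33))    # 3 repeats instead of 9
```

Because parameter count grows roughly with the square of channel width, halving the width multiple alone already shrinks the model by close to a factor of four.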
In summary, our model effectively addresses the challenge of accurately detecting
small-sized spots on apple leaves under the complex background conditions of the orchard
and shows excellent performance in both detection accuracy and model lightweighting,
providing an innovative solution for the detection of apple leaf diseases.
5. Conclusions
In both YOLOv5-Res and YOLOv5-Res4, the proposed ResBlock multiscale module
helps to extract features at different scales and preserve image detail information. Addi-
tionally, the feature fusion module C4 enhances the ability to extract feature information
from small targets while being lightweight. By compressing the depth and width of the
model, the lightweight YOLOv5-Res4 model is obtained. The improved algorithm model
YOLOv5-Res has a mAP@0.5 of 82.1%, FLOPs of 15.6 G, 5,486,055 parameters, and a
model size of 10.8 MB. The YOLOv5-Res4 model has a mAP@0.5 of 79.9%, FLOPs of 5.3 G,
1,094,471 parameters, and a model size of 2.4 MB. Overall, YOLOv5-Res was able
to achieve high accuracy in detecting apple leaf diseases, while YOLOv5-Res4 retained
high detection accuracy despite being lightweight; both models also held certain advantages
in comparison with mainstream models.
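As a sanity check on the reported figures, the model sizes are roughly consistent with a half-precision (fp16) checkpoint of the stated parameter counts. The sketch below assumes 2 bytes per parameter and ignores container metadata; that serialization assumption is ours, not a detail taken from the experiments.

```python
def approx_size_mb(n_params: int, bytes_per_param: int = 2) -> float:
    """Approximate serialized checkpoint size in MiB for an fp16 model,
    ignoring optimizer state and file-format overhead."""
    return n_params * bytes_per_param / 1024**2

print(round(approx_size_mb(5_486_055), 2))  # ~10.46, close to the reported 10.8 MB
print(round(approx_size_mb(1_094_471), 2))  # ~2.09, close to the reported 2.4 MB
```

The small gap between the estimate and the reported sizes is plausibly explained by metadata and any layers stored at higher precision.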
Author Contributions: Conceptualization, Z.F.; Methodology, Z.S.; Resources, Z.C. All authors have
read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: Data are contained within the article.
Acknowledgments: The authors would like to thank the anonymous reviewers for their critical
comments and suggestions for improving the manuscript.
Conflicts of Interest: Zhaokai Sun and Zemin Feng were employed by the Jidong Equipment
Company. The remaining authors declare that the research was conducted in the absence of any
commercial or financial relationships that could be construed as a potential conflict of interest.
References
1. Bi, C.; Wang, J.; Duan, Y.; Fu, B.; Kang, J.-R.; Shi, Y. MobileNet based apple leaf diseases identification. Mob. Netw. Appl. 2022,
27, 172–180. [CrossRef]
2. Abbaspour-Gilandeh, Y.; Aghabara, A.; Davari, M.; Maja, J.M. Feasibility of using computer vision and artificial intelligence
techniques in detection of some apple pests and diseases. Appl. Sci. 2022, 12, 906. [CrossRef]
3. Liu, J.; Wang, X. Plant diseases and pests detection based on deep learning: A review. Plant Methods 2021, 17, 22. [CrossRef]
4. Fuentes, A.; Yoon, S.; Park, D.S. Deep learning-based techniques for plant diseases recognition in real-field scenarios. In
Proceedings of the Advanced Concepts for Intelligent Vision Systems: 20th International Conference, ACIVS 2020, Auckland,
New Zealand, 10–14 February 2020; Proceedings 20. pp. 3–14.
5. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural
Inf. Process. Syst. 2015, 28, 1137–1149. [CrossRef]
6. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision,
Venice, Italy, 22–29 October 2017; pp. 2961–2969.
7. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of
the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings,
Part I 14. pp. 21–37.
8. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
9. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
10. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
11. Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
12. Lee, S.-H.; Wu, C.-C.; Chen, S.-F. Development of image recognition and classification algorithm for tea leaf diseases using
convolutional neural network. In Proceedings of the 2018 ASABE Annual International Meeting, Detroit, MI, USA, 29 July–1
August 2018; p. 1.
13. Singh, U.P.; Chouhan, S.S.; Jain, S.; Jain, S. Multilayer convolution neural network for the classification of mango leaves infected
by anthracnose disease. IEEE Access 2019, 7, 43721–43729. [CrossRef]
14. Li, H.; Shi, H.; Du, A.; Mao, Y.; Fan, K.; Wang, Y.; Shen, Y.; Wang, S.; Xu, X.; Tian, L. Symptom recognition of disease and insect
damage based on Mask R-CNN, wavelet transform, and F-RNet. Front. Plant Sci. 2022, 13, 922797. [CrossRef]
15. Sun, H.; Xu, H.; Liu, B.; He, D.; He, J.; Zhang, H.; Geng, N. MEAN-SSD: A novel real-time detector for apple leaf diseases using
improved light-weight convolutional neural networks. Comput. Electron. Agric. 2021, 189, 106379. [CrossRef]
16. Tian, Y.; Yang, G.; Wang, Z.; Li, E.; Liang, Z. Detection of apple lesions in orchards based on deep learning methods of cyclegan
and yolov3-dense. J. Sens. 2019, 2019, 7630926. [CrossRef]
17. Li, J.; Qiao, Y.; Liu, S.; Zhang, J.; Yang, Z.; Wang, M. An improved YOLOv5-based vegetable disease detection method. Comput.
Electron. Agric. 2022, 202, 107345. [CrossRef]
18. Lin, J.; Yu, D.; Pan, R.; Cai, J.; Liu, J.; Zhang, L.; Wen, X.; Peng, X.; Cernava, T.; Oufensou, S. Improved YOLOX-Tiny network for
detection of tobacco brown spot disease. Front. Plant Sci. 2023, 14, 1135105. [CrossRef]
19. Khan, F.; Zafar, N.; Tahir, M.N.; Aqib, M.; Waheed, H.; Haroon, Z. A mobile-based system for maize plant leaf disease detection
and classification using deep learning. Front. Plant Sci. 2023, 14, 1079366. [CrossRef]
20. Solimani, F.; Cardellicchio, A.; Dimauro, G.; Petrozza, A.; Summerer, S.; Cellini, F.; Renò, V. Optimizing tomato plant phenotyping
detection: Boosting YOLOv8 architecture to tackle data complexity. Comput. Electron. Agric. 2024, 218, 108728. [CrossRef]
21. Ma, B.; Hua, Z.; Wen, Y.; Deng, H.; Zhao, Y.; Pu, L.; Song, H. Using an improved lightweight YOLOv8 model for real-time
detection of multi-stage apple fruit in complex orchard environments. Artif. Intell. Agric. 2024, 11, 70–82. [CrossRef]
22. Yang, Q.; Duan, S.; Wang, L. Efficient identification of apple leaf diseases in the wild using convolutional neural networks.
Agronomy 2022, 12, 2784. [CrossRef]
23. Pathological Images of Apple Leaves. Available online: https://aistudio.baidu.com/datasetdetail/11591/0 (accessed on 1 April 2023).
24. Hughes, D.; Salathé, M. An open access repository of images on plant health to enable the development of mobile disease
diagnostics. arXiv 2015, arXiv:1511.08060.
25. Feng, J.; Chao, X. Apple Tree Leaf Disease Segmentation Dataset; Science Data Bank: Beijing, China, 2022.
26. Thapa, R.; Zhang, K.; Snavely, N.; Belongie, S.; Khan, A. The Plant Pathology Challenge 2020 data set to classify foliar disease of
apples. Appl. Plant Sci. 2020, 8, e11390. [CrossRef]
27. Jocher, G.; Chaurasia, A.; Stoken, A.; Borovec, J.; Kwon, Y.; Michael, K.; Fang, J.; Wong, C.; Yifu, Z.; Montes, D. ultralytics/yolov5:
v6.2 - YOLOv5 Classification Models, Apple M1, Reproducibility, ClearML and Deci.ai Integrations. Zenodo, 2022. Available online:
https://github.com/ultralytics/yolov5 (accessed on 13 May 2024).
28. Park, H.; Yoo, Y.; Seo, G.; Han, D.; Yun, S.; Kwak, N. C3: Concentrated-comprehensive convolution and its application to semantic
segmentation. arXiv 2018, arXiv:1812.04920.
29. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans.
Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [CrossRef]
30. Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
31. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE conference on
Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8759–8768.
32. Zheng, Z.; Wang, P.; Ren, D.; Liu, W.; Ye, R.; Hu, Q.; Zuo, W. Enhancing geometric factors in model learning and inference for
object detection and instance segmentation. IEEE Trans. Cybern. 2021, 52, 8574–8586. [CrossRef]
33. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with
convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June
2015; pp. 1–9.
34. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
35. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings
of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 448–456.
36. Elfwing, S.; Uchibe, E.; Doya, K. Sigmoid-weighted linear units for neural network function approximation in reinforcement
learning. Neural Netw. 2018, 107, 3–11. [CrossRef]
37. Hua, B.-S.; Tran, M.-K.; Yeung, S.-K. Pointwise convolutional neural networks. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 984–993.
38. Lucic, A.; Oosterhuis, H.; Haned, H.; de Rijke, M. FOCUS: Flexible optimizable counterfactual explanations for tree ensembles. In
Proceedings of the AAAI Conference on Artificial Intelligence, Online, 22 February–1 March 2022; pp. 5313–5322.
39. Wang, C.-Y.; Liao, H.-Y.M.; Wu, Y.-H.; Chen, P.-Y.; Hsieh, J.-W.; Yeh, I.-H. CSPNet: A new backbone that can enhance learning
capability of CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle,
WA, USA, 14–19 June 2020; pp. 390–391.
40. Sun, Y.; Chen, G.; Zhou, T.; Zhang, Y.; Liu, N. Context-aware cross-level fusion network for camouflaged object detection. arXiv
2021, arXiv:2105.12555.
41. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. Mobilenetv2: Inverted residuals and linear bottlenecks. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018;
pp. 4510–4520.
42. Zhang, X.; Zeng, H.; Guo, S.; Zhang, L. Efficient long-range attention network for image super-resolution. In Proceedings of the
European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 649–667.
43. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs beat YOLOs on real-time object detection. arXiv 2023,
arXiv:2304.08069.
44. Reis, D.; Kupec, J.; Hong, J.; Daoudi, A. Real-time flying object detection with YOLOv8. arXiv 2023, arXiv:2305.09972.
45. Wang, C.-Y.; Yeh, I.-H.; Liao, H.-Y.M. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information.
arXiv 2024, arXiv:2402.13616.
46. Adarsh, P.; Rathi, P.; Kumar, M. YOLO v3-Tiny: Object Detection and Recognition using one stage improved model. In Proceedings
of the 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India,
6–7 March 2020; pp. 687–694.
47. Mojaravscki, D.; Graziano Magalhães, P.S. Comparative Evaluation of Color Correction as Image Preprocessing for Olive
Identification under Natural Light Using Cell Phones. AgriEngineering 2024, 6, 155–170. [CrossRef]
48. Li, H.; Shi, L.; Fang, S.; Yin, F. Real-Time Detection of Apple Leaf Diseases in Natural Scenes Based on YOLOv5. Agriculture 2023,
13, 878. [CrossRef]
49. Ang, G.; Han, R.; Yuepeng, S.; Longlong, R.; Yue, Z.; Xiang, H. Construction and verification of machine vision algorithm model
based on apple leaf disease images. Front. Plant Sci. 2023, 14, 1246065. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.