Article
A Cucumber Leaf Disease Severity Grading Method in Natural
Environment Based on the Fusion of TRNet and U-Net
Hui Yao 1,2 , Chunshan Wang 1,2,3 , Lijie Zhang 4, *, Jiuxi Li 4 , Bo Liu 1,2 and Fangfang Liang 1,2
1 School of Information Science and Technology, Hebei Agricultural University, Baoding 071001, China;
gshanhui@163.com (H.Y.); chunshan9701@163.com (C.W.); boliu@hebau.edu.cn (B.L.);
liangfangfang@hebau.edu.cn (F.L.)
2 Hebei Key Laboratory of Agricultural Big Data, Baoding 071001, China
3 National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
4 College of Mechanical and Electrical Engineering, Hebei Agricultural University, Baoding 071001, China;
lijiuxi@163.com
* Correspondence: ljzhang@ysu.edu.cn
Abstract: Disease severity grading is the primary decision-making basis for the amount of pesticide usage in vegetable disease prevention and control. Based on deep learning, this paper proposed an integrated framework, which automatically segments the target leaf and disease spots in cucumber images using different semantic segmentation networks and then calculates the areas of the disease spots and the target leaf for disease severity grading. Two independent datasets of leaves and lesions were constructed, which served as the training sets for the first-stage diseased leaf segmentation model and the second-stage lesion segmentation model. The leaf dataset contains 1140 images, and the lesion dataset contains 405 images. The proposed TRNet is composed of a convolutional network and a Transformer network and achieved an accuracy of 93.94% by fusing local features and global features for leaf segmentation. In the second stage, U-Net (with ResNet50 as the feature network) was used for lesion segmentation, and a Dice coefficient of 68.14% was obtained. After integrating TRNet and U-Net, a Dice coefficient of 68.83% was obtained. Overall, the two-stage segmentation network achieved an average accuracy of 94.49% and 94.43% in the severity grading of cucumber downy mildew and cucumber anthracnose, respectively. Compared with DUNet and BLSNet, the average accuracy of TUNet in cucumber downy mildew and cucumber anthracnose severity classification increased by 4.71% and 8.08%, respectively. The proposed model showed a strong capability in segmenting cucumber leaves and disease spots at the pixel level, providing a feasible method for evaluating the severity of cucumber downy mildew and anthracnose.

Keywords: cucumber disease; disease spot; fusion of TRNet and U-Net; two-stage segmentation framework; disease severity grading

Citation: Yao, H.; Wang, C.; Zhang, L.; Li, J.; Liu, B.; Liang, F. A Cucumber Leaf Disease Severity Grading Method in Natural Environment Based on the Fusion of TRNet and U-Net. Agronomy 2024, 14, 72. https://doi.org/10.3390/agronomy14010072
the classification task: after inputting a disease image into the model, it is expected to output
the category label to which the image belongs based on the detected features. “Where”
corresponds to the object detection and localization task: the model is not only required
to identify the type of disease present in the image but also to indicate the location of the
abnormality. “How” corresponds to the semantic segmentation task: through semantic
segmentation, the model is expected to output a series of useful information, such as the
size and location of disease spots, in order to comprehensively evaluate the disease severity
and guide subsequent pesticide usage.
Among the three questions above, there have been a number of studies focusing on
the classification and detection tasks related to the prevention and control of crop diseases,
and fruitful results have been achieved. For example, Yang et al. [2] optimized GoogleNet
for rice disease detection and achieved an accuracy of 99.58%. Muhammad et al. [3,4]
constructed multiple convolutional neural network (CNN) structures and reported that
the Xception and DenseNet architectures delivered better performance in multi-label plant
disease classification. Zhang et al. [5] proposed MU-Net, which uses residual paths instead of the original skip connections in U-Net, to segment diseased leaves of corn and cucumber, and
the segmentation accuracy was significantly improved. Bhagat et al. [6,7] constructed the
Eff-UNet++ model, which uses EfficientNet-B4 as the encoder and a modified UNet++ as
the decoder. This model achieved a Dice coefficient of 83.44% in the segmentation of leaves
in the KOMATSUNA dataset.
However, there are few studies on the severity grading of diseases. Assessing and
grading disease severity has important practical significance because it directly affects
the formulation of control plans and the prediction of crop losses and provides a basis
for variable spraying. The existing disease severity grading methods can be divided into two
categories. The first category of methods is to directly construct a classification model
to identify the type and severity of the disease. For example, Esgario et al. [8] used two
CNNs in parallel to classify the disease type and severity of coffee leaves and achieved
an accuracy of 95.24% and 86.51%, respectively. Liang et al. [9] proposed the PD2SE-Net
for category recognition, disease classification, and severity estimation, and the accuracy
of disease severity estimation reached 91%. Hu et al. [10] improved the Faster R-CNN
model for detecting tea tree leaves and then used the VGG16 network to classify the disease
severity. Pan et al. [11] employed the Faster R-CNN model with VGG16 as the feature
network to extract strawberry leaf spots to form a new dataset and then used the Siamese
model to estimate the severity of strawberry burning. Dhiman et al. [12] classified the
severity of citrus diseases as high, medium, low, and healthy and used the optimized
VGGNet16 to classify the severity of diseased fruits, which achieved an accuracy of 97%.
The core of this type of method is to regard disease severity grading as a classification task
in order to establish a relationship between the disease severity and the samples using
an appropriate classification model. The advantage of such a method lies in the ease of
implementation, while the disadvantage is that the disease severity data in the dataset
is manually labelled, which involves a high degree of subjectivity and lacks a stringent
quantitative standard [13].
The second category of methods is to segment the diseased regions through semantic
segmentation first, and then calculate the ratio of the area of diseased regions to the total
area in order to estimate the disease severity. The nature of semantic segmentation is to
classify images pixel by pixel. Wspanialy et al. [14] used the improved U-Net to segment
nine tomato diseases in the PlantVillage tomato dataset and estimated the disease severity,
which had an error rate of 11.8%. Zhang et al. [15] constructed a CNN model, which
regarded cucumber downy mildew leaves with the background removed as the input, to
estimate the severity of cucumber downy mildew and achieved a high level of accuracy
(R2 = 0.9190). Gonçalves et al. [16] applied the multi-semantic segmentation method in
laboratory-acquired images to evaluate the disease severity. The results indicated that
DeepLab V3+ delivered the best performance in disease severity estimation. Lin et al. [17]
proposed a semantic segmentation model based on CNN, which was used to segment
cucumber powdery mildew images at the pixel level. The model achieved an average pixel
accuracy of 96.08%, an intersection-over-union (IoU) of 72.11%, and a Dice coefficient of 83.45% using
20 test samples. The advantage of this category of methods is that the classification criteria
are usually objective and clear, while the disadvantage is that the complexity of the image
background can seriously affect the segmentation accuracy.
In order to reduce the impact of complex backgrounds, Tassisa et al. [18] proposed
to use Mask R-CNN to identify the positions of coffee leaves in real production environ-
ments first, and then apply U-Net/PSPNet to segment the coffee leaves and disease spots
simultaneously. Li et al. [19] further improved the segmentation accuracy of the model
using a mixed attention mechanism that combined spatial attention and channel attention,
with support from transfer learning. The model was used to automatically estimate the
severity of cucumber leaf diseases under field conditions and achieved an R2 of 0.9578.
The above studies have, to some extent, reduced the interference of complex backgrounds
and achieved satisfactory results. However, segmenting leaves and disease spots simul-
taneously may affect the segmentation outcome of disease spots due to pixel imbalance,
leading to the omission of disease spots. In addition, none of these methods analyzed
the overlap of leaves in the collected images in actual production environments. In the
process of image collection, the target leaf often overlaps with other leaves, and similar
backgrounds can easily lead to the over-segmentation problem. In light of that, the grading
of disease severity requires the precise segmentation of leaves and disease spots; thus, we
proposed a two-stage segmentation method in this paper as follows:
(1) An image dataset for cucumber leaf segmentation was constructed, which contained
1140 diseased or healthy cucumber leaf images in complex backgrounds.
(2) In order to improve the accuracy of disease classification and disease severity grading,
we proposed a two-stage model. In the first stage, the diseased leaves were separated
from the background, and in the second stage, the diseased spots were separated from
the diseased leaves.
(3) In order to reduce the interference from overlapping leaves and minimize the phe-
nomenon of over-segmentation, the convolutional structure and Transformer were
used simultaneously for feature extraction in order to fuse the global and local fea-
tures. Thus, the information loss caused by down-sampling could be compensated to
optimize the effect of leaf edge segmentation.
Figure 2. Image segmentation labels.
2.2. Two-Stage Model

Due to interference from complex backgrounds and significant differences in the scale of disease spots, it is difficult to achieve the accurate segmentation of cucumber leaves and disease spots using a one-stage segmentation model. Therefore, a two-stage model was designed in this study to decompose a complex task into two simple subtasks. The proposed two-stage model, namely TUNet, consisted of TRNet and U-Net. In the first stage, TRNet was used to segment the target leaf from complex backgrounds. In the second stage, U-Net was used to further segment disease spots from the obtained target leaf. The advantage of two-stage segmentation is that the model only needs to focus on one type of target at each stage (the leaf target in the first stage and the disease spot target in the second stage). For the two different targets, semantic segmentation models with different structures were selected according to the specific needs of each target, so as to combine the advantages of the two models to improve the segmentation accuracy. The framework of the proposed two-stage model is shown in Figure 3. The structure and key algorithms of the two models used will be described in detail in the following sections.
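The two-stage inference flow described above can be sketched end to end. This is a minimal NumPy illustration under stated assumptions: the two stage models below are hypothetical threshold lambdas standing in for TRNet and U-Net(ResNet50); only the chaining logic (segment the leaf, mask out the background, segment the lesions) mirrors the framework.

```python
import numpy as np

def two_stage_segment(image, stage1_leaf_model, stage2_lesion_model):
    """Chain the two stages: leaf segmentation, background removal, lesion segmentation."""
    leaf_mask = stage1_leaf_model(image)            # stage 1: target leaf vs. background
    leaf_only = image * leaf_mask[..., None]        # zero out the complex background
    lesion_mask = stage2_lesion_model(leaf_only)    # stage 2: disease spots on the leaf
    return leaf_mask, lesion_mask & leaf_mask       # keep only lesions inside the leaf

# Toy stand-ins: "leaf" = any non-black pixel, "lesion" = a bright-green pixel.
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[1:3, 1:3] = [40, 90, 40]                        # a 2x2 "leaf"
img[1, 1] = [40, 160, 40]                           # one "lesion" pixel on it
stage1 = lambda im: im.sum(axis=-1) > 0
stage2 = lambda im: im[..., 1] > 100
leaf, lesion = two_stage_segment(img, stage1, stage2)
print(int(leaf.sum()), int(lesion.sum()))           # 4 1
```

Masking the background before the second stage is what keeps lesions on overlapping, non-target leaves from leaking into the lesion mask.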
Figure 4. TRNet structure diagram.
2.3.1. ResNet50

In this paper, after taking into account the network performance and model size, ResNet50 was chosen as the network for extracting local features [31]. ResNet50 is an architecture based on multi-layer convolution and identity mapping, as shown in Figure 3. For a given image as the input, ResNet50 first conducts a convolution operation and a maximum pooling operation on this image. The subsequent operations consist of four stages, namely Stage 1, Stage 2, Stage 3, and Stage 4, which all start with a Conv Block, followed by different numbers of Identity Blocks. From Figures 4 and 5, it can be seen that each block contains three layers of convolution. The difference between the Conv Block and the Identity Block lies in that the Conv Block uses a convolution kernel for dimensionality reduction at the residual jumps, which can be expressed as:

H(x) = F(x) + x    (1)
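Equation (1) can be made concrete in a few lines of NumPy. F below is a toy linear map standing in for the three-layer convolution stack, and P is a stand-in for the Conv Block's shortcut kernel; the point is only that the identity shortcut adds the input back unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.1, size=(8, 8))     # stand-in for the 1x1-3x3-1x1 conv stack
P = rng.normal(0.0, 0.1, size=(8, 8))     # stand-in for the Conv Block shortcut kernel

def F(x):
    return np.maximum(x @ W, 0.0)          # residual branch F(x)

def identity_block(x):
    return F(x) + x                        # Equation (1): H(x) = F(x) + x

def conv_block(x):
    # The Conv Block additionally projects the shortcut so that its
    # dimensions match the residual branch: H(x) = F(x) + Px
    return F(x) + x @ P

x = rng.normal(size=(4, 8))
assert np.allclose(identity_block(x) - F(x), x)   # the shortcut passes x through exactly
```

Because the shortcut carries x unchanged, the block only has to learn the residual F, which is what eases the optimization of very deep stacks.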
In each layer, the self-attention (SA) operation on the input sequence Z_{l−1} is computed as follows:

SA(Z_{l−1}) = softmax(QK^T / √d)V    (6)

Q = Z_{l−1}W^Q, K = Z_{l−1}W^K, V = Z_{l−1}W^V    (7)

where W^O ∈ R^{md×C} is the projection matrix applied after concatenating the heads in Equation (5); W^Q, W^K, W^V ∈ R^{C×d} are three learnable parameters; and d is the dimension of K.
Figure 5. Two types of residual modules: (a) Identity block and (b) Conv block.
2.3.2. Transformer

As shown in Figure 3, since the Transformer module cannot directly take three-dimensional image data as input, the input image first needs to be transformed into a vector sequence through an embedding layer. Considering that ResNet50 down-samples an input image 16 times, the input sequence length of the Transformer is designed to be (H/16) × (W/16) × C. The input image is divided into 1024 patches with a size of 16 × 16. Then, each patch is mapped into a one-dimensional vector through linear mapping, which is further processed using position coding. Subsequently, the obtained vector sequence is inputted into the Transformer Encoder for feature learning. From Figure 6, it can be seen that the Transformer Encoder mainly consists of L layers, each containing a Multi-Head Self-Attention (MSA) module and a multi-layer perceptron (MLP) module. As shown in Equations (3) and (4), in layer l, after the input sequence passes through the Transformer layer, the output can be obtained as follows:

Z'_l = Z_{l−1} + MSA(LN(Z_{l−1}))    (3)

Z_l = Z'_l + MLP(LN(Z'_l))    (4)

where the MSA operation is realized by projecting the concatenation of m SA operations, as shown in Equations (5)–(7):

MSA(Z_{l−1}) = Concat(SA_1(Z_{l−1}), SA_2(Z_{l−1}), ..., SA_m(Z_{l−1}))W^O    (5)

Figure 6. Structure of Transformer Encoder.
Finally, the features of the Transformer are projected onto the dimension of the number of categories, and the output is X ∈ R^{(H/16) × (W/16) × 768}.
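One encoder layer of the form given in Equations (3)–(7) can be sketched in NumPy. This is a minimal pre-norm illustration under assumed dimensions (1024 patches, 64 channels, 4 heads) with a ReLU MLP; it is not the paper's trained encoder, only the structure of the equations.

```python
import numpy as np

rng = np.random.default_rng(0)
N, C, m = 1024, 64, 4         # patches (32 x 32 for a 512 x 512 image), channels, heads
d = C // m                    # per-head dimension of K

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def sa(z, Wq, Wk, Wv):
    # Equations (6)-(7): one scaled dot-product self-attention head
    Q, K, V = z @ Wq, z @ Wk, z @ Wv
    return softmax(Q @ K.T / np.sqrt(d)) @ V

def msa(z, heads, Wo):
    # Equation (5): concatenate the m heads and project with Wo
    return np.concatenate([sa(z, *h) for h in heads], axis=-1) @ Wo

def encoder_layer(z, heads, Wo, W1, W2):
    # Equations (3)-(4): residual MSA and MLP sub-layers with LayerNorm
    z = z + msa(layer_norm(z), heads, Wo)
    return z + np.maximum(layer_norm(z) @ W1, 0) @ W2

heads = [tuple(rng.normal(0, 0.1, size=(C, d)) for _ in range(3)) for _ in range(m)]
Wo = rng.normal(0, 0.1, size=(m * d, C))
W1 = rng.normal(0, 0.1, size=(C, 4 * C))
W2 = rng.normal(0, 0.1, size=(4 * C, C))
z = rng.normal(size=(N, C))
out = encoder_layer(z, heads, Wo, W1, W2)
print(out.shape)   # (1024, 64): the sequence shape is preserved from layer to layer
```

Because each sub-layer is residual, the output keeps the input's sequence shape, which is what allows L such layers to be stacked.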
2.3.3. Decoder
The feature maps of the same scale but with different channels that are outputted
by ResNet50 and the Transformer are concatenated and then inputted into the decoder
part. The decoder adopts the naive structure in SETR, which consists of two layers of 1 × 1
convolution + BatchNorm + 1 × 1 convolution. The last 1 × 1 convolution maps each
component feature vector to the required number of categories. Then, bilinear interpolation
up-sampling is performed directly to obtain the output with the same resolution as the
original image, that is, X ∈ R^{H × W × num_cls}.
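The decoder head can be approximated in a few lines, since a 1 × 1 convolution is equivalent to a per-pixel matrix multiply over channels. This sketch folds out BatchNorm (as at inference) and substitutes nearest-neighbor repetition for bilinear interpolation purely to stay short; the intermediate width of 256 is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
H16, W16, C, num_cls = 32, 32, 768, 2          # feature grid for a 512 x 512 input

W1 = rng.normal(0, 0.02, size=(C, 256))        # first 1 x 1 convolution
W2 = rng.normal(0, 0.02, size=(256, num_cls))  # second 1 x 1 convolution -> class scores

def decode(feat, scale=16):
    x = np.maximum(feat @ W1, 0)               # 1 x 1 conv applied per pixel (+ activation)
    x = x @ W2                                 # map each feature vector to the classes
    # stand-in for bilinear up-sampling back to the original H x W resolution
    return x.repeat(scale, axis=0).repeat(scale, axis=1)

out = decode(rng.normal(size=(H16, W16, C)))
print(out.shape)   # (512, 512, 2)
```

The up-sampled score map then yields the per-pixel class by an argmax over the last axis.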
Figure 7. U-Net network structure diagram.
pesticide spraying or ineffective disease control due to insufficient pesticide usage. In this
study, we calculated the ratio of the total area of disease spots on each leaf to the area of the
entire leaf by segmenting the target leaf and the disease spots separately, which was used
as the basis of disease severity grading. The specific steps are as follows:
Step 1: The leaf and complex backgrounds in the image are considered as the targets.
Then, the complex backgrounds in the manually labeled mask image are removed to obtain
a complete leaf.
Step 2: The mask image obtained in Step 1 is taken as the input of the second stage,
which is segmented to obtain disease spots.
Step 3: The ratio of the total area of disease spots to the area of the entire leaf is
calculated according to Equation (8). Then, this ratio is compared with the disease severity
grading standard to derive the final grading result.
P = (S_Disease / S_Leaf) × 100%    (8)

where S_Leaf refers to the area of the leaf after segmentation; S_Disease refers to the total area of disease spots after segmentation; and P refers to the ratio of the total area of disease spots to the area of the entire leaf.
Disease Severity Grading Standard. Referring to the relevant disease severity grading
standards and suggestions from plant protection experts, the severity of cucumber diseases
was classified into five levels in this study, as shown in Table 2.
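Steps 1–3 can be condensed into a short sketch. The grading thresholds below are placeholders, not the values in Table 2, and the mask inputs are assumed to be the binary outputs of the two segmentation stages.

```python
import numpy as np

# Hypothetical level boundaries (upper percentage, level); Table 2 in the
# paper defines the real grading standard -- these values are illustrative only.
LEVELS = [(5, 1), (10, 2), (25, 3), (50, 4), (100, 5)]

def severity_level(leaf_mask, lesion_mask):
    """Equation (8): lesion-to-leaf pixel-area ratio, mapped to a severity level."""
    s_leaf = np.count_nonzero(leaf_mask)
    s_disease = np.count_nonzero(lesion_mask & leaf_mask)  # count lesions inside the leaf
    p = 100.0 * s_disease / s_leaf
    for upper, level in LEVELS:
        if p <= upper:
            return p, level
    return p, LEVELS[-1][1]

leaf = np.ones((100, 100), dtype=bool)
lesion = np.zeros((100, 100), dtype=bool)
lesion[:20, :20] = True            # 400 lesion pixels on a 10,000-pixel leaf
p, level = severity_level(leaf, lesion)
print(round(p, 2), level)          # 4.0 1
```

Working in pixel counts means the grading inherits its accuracy directly from the two segmentation stages, which is why the paper emphasizes precise leaf and lesion masks.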
union of the two sets of true values and predicted values for each category. The calculation
of PA and IoU is as follows:
PA = Σ_{i=0}^{k} P_ii / (Σ_{i=0}^{k} Σ_{j=0}^{k} P_ij)    (9)

IoU = P_ii / (Σ_{j=0}^{k} P_ij + Σ_{j=0}^{k} P_ji − P_ii)    (10)
where, Pij refers to the total number of i pixels predicted as j pixels; Pii refers to the total
number of i pixels predicted as i pixels, i.e., the total number of correctly classified pixels.
The k value for each stage in the two-stage model is 1. Specifically, in the first stage of the
two-stage model, k = 1 represents leaf, while in the second stage, it represents lesion.
Dice is usually used to calculate the similarity between two samples, and its value range is [0, 1]. A Dice value close to 1 indicates high set similarity, that is, the target is well separated from the background; a Dice value close to 0 indicates that the target cannot be effectively segmented from the background. Recall is the
ratio between the number of samples correctly predicted as positive classes and the total
number of positive classes. Dice and Recall are calculated as follows:
Dice = 2 × TP / (FN + FP + 2 × TP)    (11)

Recall = TP / (FN + TP)    (12)
where TP represents the true positive example, FP represents the false positive example,
and FN represents the false negative example.
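For binary masks (k = 1, as in each stage here), Equations (9)–(12) reduce to confusion-matrix counts, as in this NumPy sketch on a hand-checkable toy example:

```python
import numpy as np

def seg_metrics(pred, gt):
    """PA, IoU, Dice, and Recall for binary masks (Equations (9)-(12))."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)                   # foreground pixels predicted correctly
    fp = np.sum(pred & ~gt)                  # background predicted as foreground
    fn = np.sum(~pred & gt)                  # foreground predicted as background
    tn = np.sum(~pred & ~gt)
    pa = (tp + tn) / pred.size               # correctly classified pixels over all pixels
    iou = tp / (tp + fp + fn)                # intersection over union of the foreground
    dice = 2 * tp / (2 * tp + fp + fn)
    recall = tp / (tp + fn)
    return pa, iou, dice, recall

gt   = np.array([[1, 1, 0, 0]])
pred = np.array([[1, 0, 1, 0]])
pa, iou, dice, recall = seg_metrics(pred, gt)
print(pa, round(iou, 4), dice, recall)   # tp=1, fp=1, fn=1, tn=1 -> 0.5 0.3333 0.5 0.5
```

On this toy example one pixel is a true positive, one a false positive, and one a false negative, so PA = 0.5, IoU = 1/3, Dice = 0.5, and Recall = 0.5, matching the formulas above.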
errors caused by complex backgrounds. At the same time, the focus on local features makes
TRNet equally sensitive to detailed features in the target leaves. Therefore, the TRNet
network achieves the best segmentation performance of leaves and lesions, with a PA of
93.94%, an IoU of 96.86%, a Dice coefficient of 72.25%, and a Recall of 98.60%. Compared
with the SETR model using the Transformer as the encoder, the PA was improved by 2.38%,
the IoU was improved by 4.25%, the Dice coefficient was improved by 1.13%, and the Recall
was improved by 2.46%. Among the segmentation networks using convolutional networks
as encoders, DeepLabV3+(ResNet50) achieved the highest metrics, which were 92.90%,
95.49%, 71.65%, and 97.42% for the PA, IoU, Dice coefficient, and Recall, respectively. The
PA, IoU, Dice coefficient, and Recall for TRNet increased by 1.04%, 1.37%, 0.6% and 1.18%,
respectively, compared to DeepLabV3+(ResNet50). It can be seen that the segmentation
performance of TRNet was significantly improved. It further shows that the combination
of the Transformer and the CNN was effective.
In the second-stage task, the model needed to extract complete disease spots from the
target leaf, which required the model to extract finer features. Since ResNet50 is deeper
and wider than the original U-Net encoder, it can extract more comprehensive disease
spot information. Therefore, in the fine segmentation of lesions, U-Net, using ResNet50
as the feature extraction network, achieved an optimal performance, with the IoU, Dice
coefficient, and Recall reaching 52.52%, 68.14%, and 73.46% respectively, which are better
results than those obtained with the original U-Net. The improvements in the IoU, Dice
coefficient, and Recall were 2.87%, 3.14%, and 7.45%, respectively, which were 8.04%, 7.88%,
and 14.63% higher than the Transformer-based SETR network. The proposed TRNet model
had a slight negative impact on the fine segmentation of lesions because the Transformer
branch extracted global features, so the indicators of this model were slightly lower than
U-Net (ResNet50).
To further demonstrate the superiority of TRNet and U-Net(ResNet50), we visualized
the first-stage and second-stage segmentation results, as shown in Figure 8. It can be seen
that, in the first stage, models based on the CNN could completely segment the target leaf
but were inevitably affected by complex backgrounds, resulting in over-segmentation, more
or less. The SETR model, which is purely based on the Transformer as the feature extractor,
was obviously less affected by overlapping leaves. This is largely because the Transformer
mainly focuses on global features. On the other hand, the SETR model was significantly
weaker than CNN-based models in extracting local features of the cucumber leaf. TRNet,
which combines the advantages of both, could more completely segment the target leaf
from complex backgrounds and had less interference from environmental factors.
In the second stage, the image containing disease spots has a simple background with-
out external interference such that the attention to local features becomes more important.
Except for the original U-Net and U-Net(ResNet50), all the other models mistakenly segmented the connection between two adjacent disease spots, while U-Net also ignored some
minor disease spots. It can be seen that the U-Net model had a significant advantage in
fusing multi-scale features for the segmentation of small disease spots. Moreover, ResNet50,
as a feature extractor, could provide the precise extraction of the local features. Overall, U-Net(ResNet50) was the most suitable choice for the second-stage lesion segmentation.

Figure 8. Visualization of model segmentation.
A comparison of the results is shown in Table 7. It can be seen that the indicators
of the fusion model on these two categories were similar, which is because the lesions
of cucumber downy mildew and cucumber anthracnose were similar. Among the two
diseases, the performance of Scheme 1 was slightly better than that of Scheme 3 and Scheme 4. This is because TRNet, used as the first-stage model, segmented the leaf more accurately. Scheme 2 outperformed all the other fusion schemes, performing better on every metric (PA, IoU, Dice coefficient, and Recall). It was also noted that the indicators of Scheme 1, Scheme 3, and Scheme 4 were lower than they were before fusion; only Scheme 2 yielded higher values for all the indicators after fusion. In contrast to the declines observed in the other combinations, Scheme 2 fully reflected the integrated advantages of the two models.
The segmentation results of the various fusion schemes are shown in Figure 9. It can
be seen that Scheme 3 and Scheme 4, which used DeepLabV3+ for segmentation in the first
stage, mistakenly segmented some leaves with similar colors as the target leaf, resulting in
the segmentation of disease spots from non-target leaves in the second stage. Therefore, the
final accuracy was reduced. For Scheme 1 and Scheme 2, the TRNet model performed well
in the first stage and fully segmented the contour of the target leaf. However, for disease
spots of varying sizes, the multi-scale segmentation of U-Net apparently outperformed other schemes. Based on the advantages and disadvantages of the four schemes and the actual production needs, Scheme 2 was ultimately chosen as the cucumber disease segmentation model in this study.
Figure 9. The results of the fusion scenario are visualized.
3.4. Two-Stage Model

Considering that segmenting leaves and disease spots from complex backgrounds simultaneously with a one-stage model is extremely challenging, we proposed a two-stage segmentation method in this paper. Specifically, the purpose of the first stage was to remove complex backgrounds, and the purpose of the second stage was to segment the disease spots under a simple background. In order to verify the improvement of the proposed two-stage model as compared to one-stage segmentation, we chose U-Net(ResNet50), which delivered the best performance for disease spot segmentation, to extract disease spots from complex and simple backgrounds.

The segmentation results are shown in Figure 10. It can be seen that the results obtained from two-stage segmentation were far better than those obtained from one-stage segmentation. In the three images shown in Figure 10, some disease spots on non-target leaves were mistakenly segmented using one-stage segmentation. This is because one-stage segmentation does not remove confounding factors, such as overlapping leaves, before disease spot segmentation, leading to poor segmentation results. Therefore, in this study, a two-stage model was proposed for disease severity grading in order to guarantee a high classification accuracy.
We used TRNet and U-Net to segment the target leaf and disease spots, respectively, and calculated the ratio of the pixel area of disease spots to the pixel area of the leaf. Then, the severity of cucumber downy mildew and cucumber anthracnose was graded according to the specified grading standard. In this study, 90 cucumber downy mildew images and 94 cucumber anthracnose images were selected as test objects, and the predicted disease severity was compared with the manually labelled severity to evaluate the classification accuracy of the model. The experimental results are shown in Tables 8 and 9. It can be seen from Table 8 that the classification accuracy of cucumber downy mildew from Levels 1, 2, 3, 4, and 5 was 100.00%, 100.00%, 94.44%, 92.31%, and 85.71%, respectively, with an average accuracy of 94.49%. According to Table 9, the classification accuracy of cucumber anthracnose from Levels 1, 2, 3, 4, and 5 was 100%, 96%, 100.00%, 92.85%, and 83.33%, respectively, with an average accuracy of 94.43%. In general, the model had a high prediction accuracy for disease severity for Levels 1 to 3 but performed suboptimally for Levels 4 and 5. This is because the edges of leaves with Level 4–5 cucumber downy mildew or cucumber anthracnose were mostly withered, and the model might recognize such edges as background factors in the first-stage segmentation, resulting in a lower accuracy.
A comparison of the results of the proposed model TUNet and the existing models is
shown in Table 10. Ref. [34] used the two-stage method DUNet to segment diseased leaves
and lesions, and Ref. [13] used an improved U-Net model to segment leaves and lesions
simultaneously. As can be seen in Table 10, TUNet achieved higher accuracy in disease
severity grading than Ref. [34]. The one-stage model in Ref. [13] has a speed advantage,
but its accuracy is much lower than that of the two-stage models.
Table 10. Comparison of results of cucumber downy mildew and anthracnose grading.
As can be seen in Figure 12, both Refs. [13,34] have problems with over-segmentation;
that is, the lesions on the edge of the leaves are classified as background, resulting in an
incorrect classification of disease severity. DUNet failed to segment lesions due to the
incorrect segmentation of leaves in the first stage, resulting in an incorrect input in the
second stage, which illustrates the importance of the first-stage model in the two-stage
method. Our method adds global features to the first-stage model for context modeling so
that it can correctly determine whether an edge lesion is part of the leaf, thus avoiding the
over-segmentation problem. However, TUNet still has shortcomings in the segmentation
of small lesions, which needs further improvement.
Figure 12. A comparison of the results of segmentation.
4. Conclusions
This paper proposed a two-stage model, namely TUNet, for grading the severity of
cucumber leaf diseases. The proposed model consisted of two segmentation networks,
TRNet and U-Net. In the first stage, we chose TRNet to extract the target cucumber leaf
from the image. The TRNet network uses both a convolutional structure and a Transformer
to extract image features, so it can compensate for the loss of global information caused by down-sampling
in the convolutional structure. The combination of global and local information not only
improved the segmentation accuracy of the target leaf but also effectively reduced the
impact of complex backgrounds on the segmentation task. Then, the segmented leaf image
with a simple background was used as the input of the second-stage segmentation, and
U-Net, which uses ResNet50 as the backbone network, was chosen to extract disease spots
from the image. We found that when ResNet50 was used as the backbone network, it
could accurately detect and segment very small objects, which is conducive to disease
spot segmentation. Further, we compared these two models with several classic models.
The experimental results showed that these two networks outperformed other models in
leaf segmentation and disease spot segmentation, and the fusion of the two yielded more
effective results. Finally, the cucumber disease severity was graded by calculating the ratio
of the total area of disease spots to the area of the entire leaf. The results showed that the
two-stage model proposed in this study performed well with regard to the grading of the
severity of cucumber downy mildew and cucumber anthracnose under real production
environments. It is worth noting that our approach also has limitations. First of all, the
proposed TRNet model has a long inference time, and this cost cannot be ignored.
Therefore, future research should focus on lightweight model structures in order
to shorten the segmentation time. Secondly, in addition to the proportion of the lesion area,
the disease severity classification needs to consider the color of the lesions and whether the
diseased leaves are perforated. Therefore, a more accurate classification of disease severity
requires a comprehensive consideration of multiple factors mentioned above.
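The two-stage grading procedure summarized above can be sketched as follows. The segmentation networks are passed in as callables standing in for the trained TRNet and ResNet50 U-Net, and the grading cut-offs are assumed placeholders for illustration, since the exact grading standard is specified elsewhere in the paper:

```python
import numpy as np

# Assumed upper bounds on the lesion-to-leaf area ratio for Levels 1-4;
# any larger ratio is graded Level 5. These cut-offs are placeholders,
# not the paper's grading standard.
ASSUMED_CUTOFFS = [0.05, 0.10, 0.25, 0.50]

def grade_severity(image, segment_leaf, segment_lesions):
    """Two-stage severity grading: leaf segmentation, then lesion
    segmentation on the background-suppressed leaf, then ratio-based grading."""
    # Stage 1: extract the target leaf mask (TRNet in the paper).
    leaf_mask = segment_leaf(image).astype(bool)
    # Zero out the background so stage 2 sees a simple-background leaf image.
    masked = np.where(leaf_mask[..., None], image, 0)
    # Stage 2: segment disease spots (ResNet50 U-Net in the paper),
    # restricted to pixels inside the leaf.
    lesion_mask = segment_lesions(masked).astype(bool) & leaf_mask
    # Grade by the ratio of total lesion area to total leaf area.
    ratio = lesion_mask.sum() / max(int(leaf_mask.sum()), 1)
    for level, upper in enumerate(ASSUMED_CUTOFFS, start=1):
        if ratio <= upper:
            return level
    return 5
```

Restricting the lesion mask to the leaf mask reflects the point made about Figure 12: whether an edge lesion belongs to the leaf is decided by the first-stage segmentation, so errors there propagate directly into the computed ratio.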
References
1. Food and Agriculture Organization of the United Nations. Food and Agriculture Data. Available online: http://www.fao.org/
faostat/en/#home (accessed on 15 July 2021).
2. Yang, L.; Yu, X.; Zhang, S.; Long, H.; Zhang, H.; Xu, S.; Liao, Y. GoogLeNet based on residual network and attention mechanism
identification of rice leaf diseases. Comput. Electron. Agric. 2023, 204, 107543. [CrossRef]
3. Gulzar, Y. Fruit image classification model based on MobileNetV2 with deep transfer learning technique. Sustainability 2023,
15, 1906. [CrossRef]
4. Kabir, M.M.; Ohi, A.Q.; Mridha, M.F. A Multi-Plant Disease Diagnosis Method Using Convolutional Neural Network. arXiv 2020,
arXiv:2011.05151.
5. Zhang, S.; Zhang, C. Modified U-Net for plant diseased leaf image segmentation. Comput. Electron. Agric. 2023, 204, 107511.
[CrossRef]
6. Bhagat, S.; Kokare, M.; Haswani, V.; Hambarde, P.; Kamble, R. Eff-UNet++: A novel architecture for plant leaf segmentation and
counting. Ecol. Inform. 2022, 68, 101583. [CrossRef]
7. Gulzar, Y.; Ünal, Z.; Aktaş, H.; Mir, M.S. Harnessing the Power of Transfer Learning in Sunflower Disease Detection: A
Comparative Study. Agriculture 2023, 13, 1479. [CrossRef]
8. Esgario, J.G.M.; Krohling, R.A.; Ventura, J.A. Deep learning for classification and severity estimation of coffee leaf biotic stress.
Comput. Electron. Agric. 2020, 169, 105162. [CrossRef]
9. Liang, Q.; Xiang, S.; Hu, Y.; Coppola, G.; Zhang, D.; Sun, W. PD2SE-Net: Computer-assisted plant disease diagnosis and severity
estimation network. Comput. Electron. Agric. 2019, 157, 518–529. [CrossRef]
10. Hu, G.; Wang, H.; Zhang, Y.; Wan, M. Detection and severity analysis of tea leaf blight based on deep learning. Comput. Electr.
Eng. 2021, 90, 107023. [CrossRef]
11. Pan, J.; Xia, L.; Wu, Q.; Guo, Y.; Chen, Y.; Tian, X. Automatic strawberry leaf scorch severity estimation via faster R-CNN and
few-shot learning. Ecol. Inform. 2022, 70, 101706. [CrossRef]
12. Dhiman, P.; Kukreja, V.; Manoharan, P.; Kaur, A.; Kamruzzaman, M.M.; Dhaou, I.B.; Iwendi, C. A Novel Deep Learning Model for
Detection of Severity Level of the Disease in Citrus Fruits. Electronics 2022, 11, 495. [CrossRef]
13. Chen, S.; Zhang, K.; Zhao, Y.; Sun, Y.; Ban, W.; Chen, Y.; Zhuang, H.; Zhang, X.; Liu, J.; Yang, T. An Approach for Rice Bacterial
Leaf Streak Disease Segmentation and Disease Severity Estimation. Agriculture 2021, 11, 420. [CrossRef]
14. Wspanialy, P.; Moussa, M. A detection and severity estimation system for generic diseases of tomato greenhouse plants. Comput.
Electron. Agric. 2020, 178, 105701. [CrossRef]
15. Zhang, L.-x.; Tian, X.; Li, Y.-x.; Chen, Y.-q.; Chen, Y.-y.; Ma, J.-c. Estimation of Disease Severity for Downy Mildew of Greenhouse
Cucumber Based on Visible Spectral and Machine Learning. Spectrosc. Spectr. Anal. 2020, 40, 227–232.
16. Gonçalves, J.P.; Pinto, F.A.C.; Queiroz, D.M.; Villar, F.M.M.; Barbedo, J.G.A.; Del Ponte, E.M. Deep learning architectures for
semantic segmentation and automatic estimation of severity of foliar symptoms caused by diseases or pests. Biosyst. Eng. 2021,
210, 129–142. [CrossRef]
17. Lin, K.; Gong, L.; Huang, Y.; Liu, C.; Pan, J. Deep Learning-Based Segmentation and Quantification of Cucumber Powdery
Mildew Using Convolutional Neural Network. Front. Plant Sci. 2019, 10, 155. [CrossRef] [PubMed]
18. Tassis, L.M.; de Souza, J.E.T.; Krohling, R.A. A deep learning approach combining instance and semantic segmentation to identify
diseases and pests of coffee leaves from in-field images. Comput. Electron. Agric. 2022, 193, 106732. [CrossRef]
19. Li, K.; Zhang, L.; Li, B.; Li, S.; Ma, J. Attention-optimized DeepLab V3 + for automatic estimation of cucumber disease severity.
Plant Methods 2022, 18, 109. [CrossRef]
20. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. arXiv 2015, arXiv:1411.4038.
21. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In International
Conference on Medical Image Computing and Computer-Assisted Intervention, Proceedings of the Medical Image Computing and Computer-
Assisted Intervention—MICCAI 2015, Munich, Germany, 5–9 October 2015; Springer: Cham, Switzerland; Volume 9351, pp. 234–241.
22. Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic Image Segmentation with Deep Convolutional Nets
and Fully Connected CRFs. arXiv 2016, arXiv:1412.7062.
23. Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep
Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. arXiv 2017, arXiv:1606.00915. [CrossRef] [PubMed]
24. Chen, L.-C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv 2017,
arXiv:1706.05587.
25. Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic
Image Segmentation. arXiv 2018, arXiv:1802.02611.
26. Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-Based Learning Applied to Document Recognition. Proc. IEEE 1998,
86, 2278–2324. [CrossRef]
27. Wu, H.; Xiao, B.; Codella, N.; Liu, M.; Dai, X.; Yuan, L.; Zhang, L. CvT: Introducing Convolutions to Vision Transformers. arXiv
2021, arXiv:2103.15808.
28. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need.
arXiv 2017, arXiv:1706.03762.
29. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.;
Gelly, S.; et al. An Image is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929.
30. Zheng, S.; Lu, J.; Zhao, H.; Zhu, X.; Luo, Z.; Wang, Y.; Fu, Y.; Feng, J.; Xiang, T.; Torr, P.H.S.; et al. Rethinking Semantic
Segmentation from a Sequence-to-Sequence Perspective with Transformers. arXiv 2020, arXiv:2012.15840.
31. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385.
32. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient
Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
33. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. arXiv 2017, arXiv:1612.01105.
34. Wang, C.; Du, P.; Wu, H.; Li, J.; Zhao, C.; Zhu, H. A cucumber leaf disease severity classification method based on the fusion of
DeepLabV3+ and U-Net. Comput. Electron. Agric. 2021, 189, 106373. [CrossRef]