
Plant Disease Detection using Hyperspectral Imaging

Peyman Moghadam, Daniel Ward, Ethan Goan, Srimal Jayawardena, Pavan Sikka, Emili Hernandez

Abstract: Precision agriculture has enabled significant progress in improving yield outcomes for farmers. Recent progress in sensing and perception promises to further enhance the use of precision agriculture by allowing the detection of plant diseases and pests. When coupled with robotics methods for spatial localisation, early detection of plant diseases will allow farmers to respond in a timely and localised manner to disease outbreaks and limit crop damage. This paper proposes the use of hyperspectral imaging (VNIR and SWIR) and machine learning techniques for the detection of the Tomato Spotted Wilt Virus (TSWV) in capsicum plants. Discriminatory features are extracted using the full spectrum, a variety of vegetation indices, and probabilistic topic models. These features are used to train classifiers for discriminating between leaves obtained from healthy and inoculated plants. The results show excellent discrimination based on the full spectrum and comparable results based on data-driven probabilistic topic models and the domain vegetation indices. Additionally, our results show increasing classification performance as the dimensionality of the features increases.

I. INTRODUCTION

Plant diseases are responsible for major economic losses in the agricultural industry worldwide. In precision agriculture, monitoring plant health and detecting pathogens are essential to reduce disease spread and facilitate effective management practices [1]. Automatic detection of plant diseases and other crop stresses could be a valuable source of information for improving crop management strategies and disease control measures to prevent the development and the spread of diseases.

Automatic plant disease detection methods can be categorised into two main groups: direct and indirect (proxy) methods. Molecular and serological methods are examples of direct detection and favor high-throughput analysis of a large number of samples. These methods provide accurate identification of the disease-pathogen link by directly detecting the disease-causing pathogens [2]. Common pathogens include bacteria, fungi and viruses. These methods require at least a couple of days for sample harvest, processing, and analysis. Indirect (proxy) methods identify plant diseases through various parameters such as morphological change, temperature change, transpiration rate change and volatile organic compounds released by infected plants [2]. These methods are mainly based on optical imaging techniques. The optical imaging sensors provide detailed information based on different electromagnetic spectra and thus enable prediction of the plant health [3]. Thermography, fluorescence and hyperspectral imaging are among the most used indirect methods for plant disease detection [4].

Hyperspectral imaging has been successfully applied for plant disease characterisation, detection, modelling and classification [3]. Hyperspectral imaging measures reflected light from plants in hundreds of narrow bands across the electromagnetic spectrum as a hypercube (Figure 1a). A typical spectral reflectance of a healthy plant is shown in Figure 1b. A plant's interaction with different parts of the electromagnetic spectrum depends on leaf biochemical compounds and leaf anatomical structure. Healthy plants typically absorb light in the visible range (VIS 400-700 nm) due to leaf photosynthesis pigments. The amount of light scattered in the near-infrared range (NIR 700-1000 nm) is strongly sensitive to leaf cell structure. Dominant factors influencing leaf reflectance in short-wave infrared (SWIR 1000-2500 nm) are leaf water and chemical contents.

Fig. 1: (a) An example of a hypercube. (b) Spectral reflectance of a healthy plant.

When plants undergo various stresses, they react through biophysical and biochemical variations such as degradation of the leaf chlorophyll content or changes in leaf cell structures. Hyperspectral imaging has the advantage of detecting these subtle changes in spectral reflectance of plants. A range of machine learning methods have been developed based on these spectral reflectance values for automatic classification of plant diseases. The process usually involves extracting features from spectral reflectance, learning a classifier model from diseased and healthy plant images and finally applying the model to new data for the prediction of diseased leaves [5], [6]. A common feature extraction approach is to calculate spectral Vegetation Indices (VIs) related to specific physiological parameters. However, these VIs are generally not designed to discriminate between healthy and diseased plants [7].

The authors are with the Autonomous Systems, Cyber-Physical Systems, DATA61, CSIRO, Brisbane, QLD 4069, Australia. E-mails: firstname.lastname@data61.csiro.au. Correspondence should be addressed to peyman.moghadam@data61.csiro.au

978-1-5386-2839-3/17/$31.00 ©2017 IEEE


In this paper, we present an automatic method to detect plant disease using hyperspectral imaging (VNIR and SWIR) and machine learning techniques. We evaluate the utility of three types of features based on the hypercube: the full spectrum, spectral Vegetation Indices (VIs) and features generated using data-driven probabilistic topic models. In contrast to previous work [8], we propose to use a data-driven approach to build features and do not make any prior assumptions about the topics.

The remainder of this paper is organized as follows. Section II explains existing hyperspectral imaging and machine learning techniques applied to plant disease detection. In Section III we present the hyperspectral imaging system used in this paper. Section IV explains the pre-processing steps such as leaf segmentation and grid removal. In Section V we propose our feature extraction techniques based on leaf spectral indices, the entire spectrum, and probabilistic topic modeling from VNIR and SWIR imaging systems. Section VI demonstrates the effectiveness of the proposed feature extraction techniques for automatic plant disease classification in greenhouse experiments. Finally, we conclude our paper in Section VII and discuss the future directions.

II. RELATED WORK

SVM is a common choice of machine learning algorithm for detection of plant diseases. Rumpf et al. [5] used a variety of machine learning algorithms for early detection of plant diseases based on hyperspectral reflectance. They showed an SVM trained on spectral VI feature vectors calculated from a leaf mean of 25 representative spectra to be the best performing model in disease classification. Similarly, in our paper, a leaf mean spectrum was calculated from all pixels. The VNIR VIs chosen by Rumpf et al. [5] matched those used in our paper, with the addition of the leaf SPAD value, a measure of chlorophyll content. Rumpf et al. [5] reported diseased leaf classification accuracy of up to 97% using K-fold testing. The majority of existing literature in the field of small scale hyperspectral imaging involves VNIR technology; furthermore, the distribution of vegetation indices across the reflectance regions strongly favors VNIR [9], [10], [11].

More recently, Zhu et al. [12] investigated early detection and classification of inoculated tobacco leaves based on hyperspectral imaging. They used only a VNIR camera and generated mean leaf spectra from a ROI rather than from all pixels. Zhu et al. [12] reported model accuracies from 75% to 98.33%, most notably 81.67% for an SVM with RBF kernel and 98.33% for an extreme learning machine. Unlike this paper, they used effective wavelengths (EWs) as features to train all their models. EWs were found by the successive projections algorithm. The majority of literature has extracted features from the hyperspectral images through approaches such as PCA [12], VIs [5] and manual wavelength selection [13].

Probabilistic Topic Modeling [14] is used extensively in the natural language processing community as an unsupervised learning technique and has been recently extended to hyperspectral imaging [8]. However, in contrast to previous work, we do not make any assumptions on the nature of the topics (healthy vs disease) and use topic modelling to generate discriminative features for disease detection in a data-driven manner.

III. HYPERSPECTRAL IMAGING SYSTEM

The hyperspectral imaging system (Fig. 2) consists of two push-broom hyperspectral cameras from Headwall: the VNIR A-series and the SWIR M-series. The VNIR hyperspectral camera has a spectral range of 400-1000 nm with 324 spectral bands and a spatial resolution of 1004 pixels. The SWIR hyperspectral camera provides a spectral range of 900-2500 nm with 168 spectral bands and a spatial resolution of 384 pixels. The cameras are attached to a frame with a linear motion stage whose axis of motion is perpendicular to the spatial axis of the two cameras. Six 20 W halogen lights are used for illuminating the leaf samples. We use a thread grid to hold the leaf sample flat against the plate during imaging. The entire system is controlled from a PC running the Hyperspec III software from Headwall.

Fig. 2: (a) Hyperspectral imaging system. (b) Closeup of the target platform showing a leaf being imaged by the system.

The data acquisition protocol is an important aspect of experiment design. It was experimentally determined that while the temperature variation is less than 5% for the VNIR camera after it was first turned on, there is significant variation in the SWIR camera, which needs about 1 hour to stabilise after being turned on. Another important aspect of hyperspectral sensing is the timely collection of dark reference images, which provide an estimate of the noise (i.e., dark current noise), and white reference images, which provide the spectrum of the incident light. The dark reference is normally obtained by capturing an image with the lens cap on (0% reflectance), while the white reference is obtained by imaging a standard white Teflon target (99.9% reflectance). An automated system, implementing the considerations above, was developed for the data acquisition stage of the experiment. It also guarded the user from errors such as leaving a lens cap on during white reference and target leaf imaging or leaving a lens cap off for dark reference imaging. The system enforced the collection of meta-data and the timely collection of dark and white reference data (at least every 30 minutes, configurable).

Fig. 3: The dataflow of the hypercube pre-processing steps used in this paper.
Fig. 4: Grid removal process. (a) Whitened image from SWIR camera. (b) Centre of output of convolving the matched filter with the whitened image. (c) Average of matched filter output and peaks found via the CWT. (d) False colourised representation of SWIR data with the grid removed.

Once the raw radiance hypercubes were collected, they were calibrated to normalized reflectance hypercubes using white and dark reference images to account for drift of the light system and thermal variations in the hyperspectral imaging sensors. Then, to improve the signal-to-noise ratio (SNR), the first 58 wavelengths in VNIR (below 500 nm) and the first wavelength in SWIR (900 nm) were discarded for further processing in this paper. The Savitzky-Golay filter [8] was next applied to smooth the reflectance spectrum, with 11 supporting points and a second-degree polynomial.
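As a rough illustration of the calibration and smoothing steps described above, the following sketch assumes the common flat-field formula (raw - dark) / (white - dark), which the text does not state explicitly, and implements the Savitzky-Golay smoother directly in NumPy with the 11-point window and second-degree polynomial given above. The function names, the array layout (bands on the last axis) and the edge handling by replicating end values are assumptions made for this example only.

```python
import numpy as np

def calibrate_reflectance(raw, white, dark, eps=1e-12):
    """Assumed flat-field calibration: (raw - dark) / (white - dark), per band."""
    raw, white, dark = (np.asarray(a, dtype=float) for a in (raw, white, dark))
    # Guard against division by zero where white and dark coincide.
    return (raw - dark) / np.maximum(white - dark, eps)

def savgol_coefficients(window=11, degree=2):
    """Least-squares weights that evaluate the fitted polynomial at the window centre."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    vander = np.vander(offsets, degree + 1, increasing=True)  # columns: 1, k, k^2
    # Row 0 of the pseudo-inverse returns the constant term, i.e. the fit at k = 0.
    return np.linalg.pinv(vander)[0]

def smooth_spectrum(spectrum, window=11, degree=2):
    """Smooth a 1D reflectance spectrum; end values are replicated (an assumption)."""
    spectrum = np.asarray(spectrum, dtype=float)
    half = window // 2
    padded = np.pad(spectrum, half, mode="edge")
    coeffs = savgol_coefficients(window, degree)
    # Reversing the weights turns np.convolve into a sliding correlation.
    return np.convolve(padded, coeffs[::-1], mode="valid")
```

For the VNIR camera, the first 58 bands would be dropped before smoothing, for example with `reflectance[..., 58:]`. SciPy's `scipy.signal.savgol_filter` offers a tested alternative with other edge-handling modes.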

IV. HYPERCUBE PRE-PROCESSING

Figure 3 displays the dataflow of the hypercube pre-processing steps. Initially, a predefined region of interest (ROI) is extracted from the hypercube. Next, the leaf is automatically segmented from the background. The grid is detected within the images and a grid mask is generated. The final hypercube is generated by applying the relevant grid mask to each of the leaf segmented images.

A. Leaf Segmentation

No single frequency band or combination of bands could be used to segment the leaf from the background. This was particularly evident in some diseased leaves where dry/dead leaf spots were commonly misclassified as background. Spectral profiles of vegetation exhibit a unique spectral signature compared to other materials in the image such as plastics and metals. One could exploit this spectral signature to segment the leaf from the background.

In this paper, an unsupervised K-means clustering algorithm was applied in the spectral domain to segment the leaf from the background. Cluster analysis successfully classified the image spectra into two classes using the significant spectral profile difference between the vegetation and its surroundings. The spatial location of each classified spectrum was maintained and a 2D leaf mask was generated. Then, in order to detect the leaf and to identify any misclassified regions, the contours of the binary mask were detected using an algorithm proposed by Suzuki et al. [15]. The largest of the detected contours is assumed to be the leaf, while smaller contours were removed as misclassified pixels. The extremities of the largest contour were located and the mask was cropped to this size with a 5% buffer on all dimensions. It was ensured that the regions of interest (ROI) cropped from the VNIR and SWIR masks were of the same shape. Figure 3 shows an example image and the resulting leaf segmentation mask in the leaf segmentation step of the dataflow.

B. Grid Removal

Whilst use of the grid permits holding the leaf flat and measuring reflectance normal to the leaves, it introduces unwanted artifacts in hypercubes. Despite common placement during the experiment, a small amount of displacement in each grid line is caused by each individual leaf. This makes accurate and spatially adaptive grid removal essential to ensure that spectral measurements used for classification pertain only to the leaf being scanned. Since the shape and orientation of the grid remain consistent throughout scans, a spatial matched filter approach is proposed for detection and removal of the grid. A matched filter is an optimal filter for detection of a known signal corrupted with additive white noise.

For a matched filter approach to be suitable, the data must first be whitened. Since the spectral content of natural images is primarily at low frequencies, a high pass filter is applied in an effort to flatten the power spectral density of the image. The high pass filter used is a third order FIR Butterworth filter. An example of the output of this filter is shown in Figure 4a. It can be seen that whilst much of the leaf content has been attenuated, high frequency components at the edge of the grid are still prominent. With the whitened image, a spatially matched filter is created. Given the dimension and orientation of the grid, the matched filter is approximated by a column that is four pixels wide and the same height as the image, with all elements equal to
one. This filter is then convolved with the whitened image, and the absolute value of the output is found as shown in Figure 4b.

Peaks in the output of the convolution are found as column shapes in the 2D array. As only the horizontal location of these peaks is of importance, a vertical Gaussian blur is applied to smooth each column vertically. The output from convolution with the spatially matched filter and the whitened image is then averaged along the vertical axis to create a 1D array. The peaks in this filtered 1D vector were used to represent the horizontal position of each grid line.

The location of each grid line is indicated by a peak in the mean vector found through the matched filter. Figure 4c shows an example of the signal obtained. To find these peaks, the method proposed by Du et al. [16], which utilises the Continuous Wavelet Transform (CWT), is employed. The CWT can be expressed as

$$C(a, b) = \int_{-\infty}^{\infty} s(t)\,\psi_{a,b}(t)\,dt \tag{1}$$

where $s(t)$ is a given signal, $\psi(t)$ is the mother wavelet, $a$ is the scale and $b$ is the translation in the independent variable. To find peaks, the CWT is applied at a number of different scales and local maxima at each scale are found. Peaks found at surrounding scales create a ridge in the coefficient matrix at a common point of translation. For this application, the Ricker mother wavelet is used. From the peaks found from application of the CWT, a mask is created to indicate the location of the grid. A column with a width of nine pixels was used to illustrate the width of each grid line as shown in Figure 4d. The grid removal steps are further summarised in Algorithm 1.

Algorithm 1: Grid removal algorithm
Input : Hyperspectral hypercubes, Type of camera
Output: Hypercubes with grid removed
function gridRemoval(hypercube, camera)
    // Select band where grid most salient
    if camera = VNIR then
        2DImage = hypercube[666nm]
    else
        2DImage = hypercube[1450nm]
    end
    whitened ← HPF(2DImage)
    // Convolve with matched filter
    gridLoc2D ← convolve(abs(whitened), matchedFilt)
    gridLoc1D ← verticalMean(gridLoc2D)
    gridPeaks ← findPeaksCWT(gridLoc1D)
    // Use found peaks to mask out grid in horizontal dimension x
    return hypercube[x != gridPeaks, y, λ]

V. FEATURE EXTRACTION

A. Vegetation Indices

Vegetation Indices (VIs) are widely used for extracting features from the spectral reflectance by combining two or more wavelengths that are related to biophysical characteristics of plants. However, most VIs have been developed for remote sensing, and are not disease specific. As such, using a single VI for disease discrimination is not feasible. Different VIs relate to different biophysical plant properties. Steddom et al. [27], amongst others, investigated the effectiveness of using multiple VIs for disease discrimination. They found that differences in betacyanin and carotenoid levels appeared to be related to the investigated disease and not nitrogen content [28]. These results indicate the importance of choosing VIs specific to the application.

Table I displays the vegetation indices which were used in this paper as features, where $R$ denotes a particular wavelength reflectance value. In the VNIR range (400-1000 nm), vegetation indices are mostly correlated to variation in photosynthetic pigments. The Normalized Difference Vegetation Index (NDVI) and Simple Ratio (SR) relate to stress, biomass and leaf area [17], [18]. The Structure Insensitive Pigment Index (SIPI), Modified Chlorophyll Absorption Integral (mCAI) and both Pigment Specific Simple Ratio (PSSR) indices relate to chlorophyll concentration [20], [23]. Furthermore, the Anthocyanin Reflectance Index (ARI) provides an index for anthocyanin [21] and the Red Edge Position (REP) relates to the inflection point of the red edge [22]. Those relevant to the VNIR region were selected from results of correlation and regression analyses between VIs and disease severity by Mahlein et al. [29]. The VNIR VIs are the same as those used in a comparable study by Rumpf et al. [5].

In the SWIR range (1000-2500 nm), the majority of VIs are related to water content and chemical compounds [30]. The Water Band Index (WBI) relates to leaf water content [24], while the Cellulose Absorption Index (CAI) provides an indication of dried plant material [25], and the Normalized Difference Lignin Index (NDLI) and Normalized Difference Nitrogen Index (NDNI) are normalized difference indexes of lignin and nitrogen content respectively [26]. In this study, all VIs for each leaf (sample) were calculated using the mean spectra of all pixels in that leaf.

Spectral region | Index | Equation
VNIR | Normalised Difference Vegetation Index [17] | $NDVI = \frac{R_{800} - R_{670}}{R_{800} + R_{670}}$
VNIR | Simple Ratio [18] | $SR = \frac{R_{800}}{R_{670}}$
VNIR | Structure Insensitive Pigment Index [19] | $SIPI = \frac{R_{800} - R_{445}}{R_{800} + R_{680}}$
VNIR | Pigments Specific Simple Ratio (a) [20] | $PSSR_a = \frac{R_{800}}{R_{680}}$
VNIR | Pigments Specific Simple Ratio (b) [20] | $PSSR_b = \frac{R_{800}}{R_{635}}$
VNIR | Anthocyanin Reflectance Index [21] | $ARI = \frac{1}{R_{500}} - \frac{1}{R_{700}}$
VNIR | Red Edge Position [22] | $REP = 700 + \frac{40\,(R_{RE} - R_{700})}{R_{740} - R_{700}}$, $R_{RE} = \frac{R_{670} + R_{780}}{2}$
VNIR | Modified Chlorophyll Absorption Integral [23] | $mCAI = \frac{R_{545} + R_{752}}{2} \cdot (752 - 545) - \sum_{R_{545}}^{R_{752}} (R \cdot 1.423)$
SWIR | Water Band Index [24] | $WBI = \frac{R_{970}}{R_{900}}$
SWIR | Cellulose Absorption Index [25] | $CAI = 0.5 \cdot (R_{2000} + R_{2200}) - R_{2100}$
SWIR | Normalized Difference Lignin Index [26] | $NDLI = \frac{\log(1/R_{1754}) - \log(1/R_{1680})}{\log(1/R_{1754}) + \log(1/R_{1680})}$
SWIR | Normalized Difference Nitrogen Index [26] | $NDNI = \frac{\log(1/R_{1510}) - \log(1/R_{1680})}{\log(1/R_{1510}) + \log(1/R_{1680})}$

TABLE I: Vegetation indices used as features for VNIR and SWIR (VNIR indices replicated from Rumpf et al. [5]).

B. Full Spectrum

In order to preserve short range dependencies between spectral wavelengths, the mean reflectance at each wavelength over all pixels in a leaf was used as a feature. The entire spectrum (feature vector length: VNIR: 268, SWIR: 167) was used to allow the models to learn complex relationships in the shape of the spectral reflectance rather than limiting the model to the information within the few bands of predefined VIs. The length of the feature vector was dependent on the frequency range of the image following preprocessing.

C. Probabilistic Topic Modeling

In order to utilize the entire hyperspectral spectrum to build useful features with reduced dimensions for a classifier that predicts diseased leaves, we generate features based on a data-driven approach using Probabilistic Topic Modeling [14], which is used extensively in the natural language processing community. It is an unsupervised learning technique that models documents as words generated from topics based on a two step generative process where each word in a document is chosen by randomly picking a ‘topic’ and randomly picking a ‘word’ from that topic. To apply topic modelling to hyperspectral imaging we treat each hypercube pixel as a document, and each wavelength-reflectance pair in the discretized reflectance spectrum is treated as a word in the corpus [8]. The wordified hyperspectral data is used to train a regularized Latent Dirichlet Allocation (LDA) model using on-line Variational Bayes (VB) inference [8] on the hypercube.
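A minimal sketch of two small pieces of this section may help make the idea concrete: turning one pixel spectrum into "words", and turning per-pixel topic proportions into the leaf-wise feature used here (the fraction of a leaf's pixels whose proportion of a topic exceeds a threshold). The 50 quantisation levels and the down-sampling factor of five follow the values used in this paper; everything else is an assumption for illustration, including reflectance scaled to [0, 1], uniform quantisation, down-sampling by keeping every fifth band, the integer word encoding, and the function names. The LDA training itself is not shown.

```python
import numpy as np

def wordify_pixel(spectrum, n_levels=50, downsample=5):
    """Map one pixel spectrum to integer word ids (hypothetical encoding)."""
    spectrum = np.asarray(spectrum, dtype=float)
    bands = np.arange(0, spectrum.size, downsample)        # keep every 5th band
    clipped = np.clip(spectrum[bands], 0.0, 1.0)           # assumes reflectance in [0, 1]
    levels = np.minimum((clipped * n_levels).astype(int), n_levels - 1)
    # One word per kept band: the id encodes the (band, quantised reflectance) pair.
    return np.arange(bands.size) * n_levels + levels

def bag_of_words(word_ids, vocab_size):
    """Word-count vector for one pixel 'document'."""
    return np.bincount(word_ids, minlength=vocab_size)

def leaf_topic_features(topic_proportions, tau):
    """Per topic, the fraction of a leaf's pixels whose proportion exceeds tau."""
    theta = np.asarray(topic_proportions, dtype=float)     # shape (n_pixels, n_topics)
    return (theta > tau).mean(axis=0)
```

With this encoding the vocabulary size is the number of kept bands times the number of levels, and the count vectors of all pixels form the corpus on which a topic model would be trained.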
Similar to [8], we used 50 quantisation levels for reflectance and applied a down-sampling factor of five. The trained model is used to obtain topic proportions for the hyperspectral pixels of the leaves.

The proportions of topics in each hyperspectral pixel can be expected to differ between inoculated and control (healthy) leaves. Therefore the topic proportions should form a good basis for generating data-driven discriminative features. We build a feature vector $x \in \mathbb{R}^m$ for each leaf, where each element $x_i \in x$ contains the ratio of pixels in the leaf where the proportion of topic $i$ is greater than a threshold $\tau$. The threshold $\tau$ is determined from the data by conducting a grid search to select the threshold which gives the best classification performance on the training data with 5-fold cross validation. To illustrate the discriminative capability of these features, the distribution of the ratios for topic $i = 19$ ($x_{19}$) has been compared across diseased and control leaves in Figure 5. As shown, the distributions of the pixel ratios for diseased and control leaves are different, making it a useful feature to discriminate between the two classes. We are able to build a feature matrix $X \in \mathbb{R}^{n \times m}$ with a leaf-based feature vector in each row and a label vector $y \in \mathbb{R}^n$ which can be used for supervised learning.

Fig. 5: Differences in the distributions of the leaf-wise feature for control samples (green) vs diseased samples (red) show its utility as a discriminative feature. The feature is the ratio of pixels in each leaf where the topic proportion is greater than a threshold $\tau$ that is computed from the data. The figure shows the feature element $x_{19}$ from topic 19. Feature elements for other topics show similar characteristics.

VI. RESULTS

A. Greenhouse Experiment

The experiments were carried out in November and December 2016 in a controlled greenhouse environment with relative humidity of 68% ($\sigma$ = 13.5%) and temperature of 27.8 °C ($\sigma$ = 4.53 °C). Capsicum plants (variety Warlock) were used in this experiment. Plants were watered daily using a drip irrigation system. 30 plants were inoculated with Tomato Spotted Wilt Virus (TSWV) when they were 8 weeks old. Disease progression of TSWV was observed for 21 days after inoculation. A further 30 plants were not inoculated and served as healthy controls for this experiment.
Leaves were harvested from both plant groups (inoculated and control) from 3 days after inoculation (DAI) till 21 DAI. Leaves were only harvested from the new growth above the inoculation node. In this study, a total of 216 leaves (133 inoculated leaves, 103 control leaves) were sampled on six separate days over an 18-day period. Each leaf was assessed and labeled visually for symptoms by QDAF plant scientists. Labels consisted of presence or absence of symptoms and disease severity using a 1-5 disease rating scale. First visual symptoms of TSWV only appeared 10 DAI.

B. Dissimilarity Estimation using Relative Entropy

In order to make inferences from collected hyperspectral data using any machine learning technique, we should first measure the dissimilarity between the two spectrum distributions of inoculated and control hypercubes. We use the Kullback-Leibler (KL) divergence, also called relative entropy, to estimate the distance between the two distributions (inoculated and control). In this application, we use the control group to represent the model distribution ($Q$) while the inoculated group represents the true distribution of data ($P$). The KL-divergence between these two probability density functions can be approximated as:

$$D_\lambda(P \| Q) = \sum_i P_\lambda(i) \log \frac{P_\lambda(i)}{Q_\lambda(i)} \tag{2}$$

where $D_\lambda(P \| Q)$ denotes a dissimilarity measure between the control group and inoculated group. The subscript indicates that the respective measure is applied only to the marginal distributions along spectrum $\lambda$. The dissimilarity values obtained for single spectrum must be combined into a joint overall dissimilarity value as:

$$D(P \| Q) = \Big( \sum_\lambda \big( D_\lambda(P \| Q) \big)^p \Big)^{1/p} \tag{3}$$

where $p = 1$ is used in this work.

To evaluate dissimilarity between distributions of inoculated and control hypercubes, we first randomly split the control group per DAI by half to represent the prior probability distribution $Q$. Then, we calculate the KL-divergence between two marginal distributions, the selected control leaves ($Q$) and inoculated leaves ($P$) per DAI, along each spectrum ($\lambda$), where $\lambda = 1, ..., m$ and $m$ is the number of spectrum bands for each hyperspectral camera (VNIR and SWIR). Finally, the dissimilarity values obtained for each spectrum are combined into a joint overall dissimilarity value using Equation (3). For comparison, we repeat the procedure to measure dissimilarity between two control distributions (50% split) per DAI. Joint overall dissimilarity between two control distributions (control-control) shows close to zero values (0.24). This implies that their spectrum distributions in both SWIR and VNIR hypercubes are similarly distributed across different days.

Figure 6 illustrates the joint overall dissimilarity values between control-inoculated per DAI for SWIR and VNIR hypercubes. Note that the small dissimilarity values from DAI 3 to DAI 10 between inoculated and control spectrum distributions in VNIR (Figure 6) are quite consistent with the ground-truth labels in our experiments, where the visual symptoms of TSWV only appear after DAI 10.

Fig. 6: The joint overall dissimilarity values between control-inoculated per DAI for VNIR and SWIR hypercubes.

Figure 6 indicates that dissimilarity values between inoculated and control spectrum distributions in SWIR gradually increase over time, except for DAI 7 and 10. We were not able to correlate this small reduction in SWIR overall dissimilarity values to any plant-pathogen biophysical interaction. This suggests that it might be linked to other stress factors such as environmental variation during those days.

Figure 7a and Figure 7b demonstrate which part of the electromagnetic spectrum is mainly contributing to the joint overall dissimilarity values. As can be seen, the key discriminative spectrum ranges in VNIR (Figure 7a) are located in green (520-580 nm, maximum reflectivity of leaf pigments), red (650-680 nm, maximum chlorophyll absorption), and red edge (680-720 nm, where changes appear between reflectivity of stressed and healthy leaves). As we expected, only DAI 12 and 21 show strong dissimilarity in these spectrum ranges. In stressed plants, photosynthesis pigments decline, resulting in a reduction in overall green reflection and an increase in red reflectance.

The variation in SWIR wavelengths is scattered across the spectrum range (910-2500 nm) and across different DAI. In SWIR (Figure 7b), the KL-divergence variability is primarily governed by scattering and absorption characteristics of the leaf structure, biochemical and water content. Pathogenesis causes variation in leaf cell structure which results in different light scattering signatures. In addition, loss of water impacts on biological processes during pathogenesis and this can influence the spectral reflectance in the SWIR range.
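The relative-entropy measure of Equations (2) and (3) can be sketched as follows. This is only an illustration under stated assumptions: the per-band distributions are estimated with fixed-width histograms of leaf reflectance values, and the bin count, the value range, the small smoothing constant `eps` and the function names are choices made for the example rather than details given in the paper.

```python
import numpy as np

def band_kl(p_values, q_values, bins=20, value_range=(0.0, 1.0), eps=1e-10):
    """Histogram estimate of D_lambda(P || Q) for a single band."""
    p_hist, _ = np.histogram(p_values, bins=bins, range=value_range)
    q_hist, _ = np.histogram(q_values, bins=bins, range=value_range)
    # eps keeps empty bins from producing log(0) or division by zero.
    p = (p_hist + eps) / (p_hist + eps).sum()
    q = (q_hist + eps) / (q_hist + eps).sum()
    return float(np.sum(p * np.log(p / q)))

def joint_dissimilarity(inoculated, control, p=1, **kwargs):
    """Combine per-band divergences into one value as in Equation (3)."""
    inoculated = np.asarray(inoculated, dtype=float)   # shape (n_leaves, n_bands)
    control = np.asarray(control, dtype=float)
    per_band = np.array([band_kl(inoculated[:, k], control[:, k], **kwargs)
                         for k in range(inoculated.shape[1])])
    return float(np.sum(per_band ** p) ** (1.0 / p)), per_band
```

With p = 1 the joint value is simply the sum of the per-band divergences, so the returned `per_band` array shows directly which wavelengths contribute most.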


(a) VNIR (b) SWIR

Fig. 7: The KL-divergence between two distributions (inoculated and control group) for each spectrum in (a) VNIR and (b)
SWIR across different DAI.

C. Machine Learning was 93.6% and 91.5% accurate on VNIR and SWIR images
respectfully. This is higher than the same model used by
Table II presents the results of classification between Zhu et al. [12] which achieved 81.67% with a full spectrum
control and diseased leaves based on three types of feature feature vector calculated from a ROI. Whether the increased
sets derived: the full spectrum, spectral Vegetation Indices accuracy can be attributed to differing experimental condi-
(VIs) and data-driven features generated using probabilistic topic models. The full spectrum is the result of a spatial mean of the leaf, and the VIs are derived from this. The dataset was split 80-20% for training and testing respectively. Results are reported for the testing set only. All models were trained and evaluated on identical data. We trained an SVM with an RBF kernel, using grid search on the 80% training data. Overall, the full spectrum based models outperformed both the spectral Vegetation Indices (VIs) and probabilistic topic modelling for VNIR and SWIR. The performance of classifiers with the topic modelling based features was better than that of the VI based models but lower than that of the full spectrum based models. Interestingly, the dimensionality of the three types of features considered increases in the order of VIs, topic model based features and full spectrum based features. Our results indicate that the performance of the classifiers improves as the dimensionality of the features increases.

In this study, all components of the feature vectors were equally weighted. Future investigations could further explore this area and deep learning approaches such as neural networks. Moshou et al. [13] used multilayer perceptrons to classify yellow rust in wheat VNIR leaf spectra and achieved classification accuracy greater than 99%. Feature weighting may also play a role in investigating different selections and combinations of VIs. A similar trend between performance and feature vector length is seen when comparing VNIR models to SWIR models. The SWIR models have significantly fewer features and were outperformed by the VNIR models (approximately 10%). The VNIR full spectrum SVM with RBF kernel achieved the greatest performance on the test set. Most notably, it achieved an F1-score of 0.939 and an area under the Receiver Operating Characteristic (ROC) curve of 0.95.

The best performing model, full spectrum RBF SVM,

tions (capsicum plants and TSWV vs. tobacco and tobacco mosaic virus) or feature extraction (mean of all spectra vs. ROI) requires further investigation.

The accuracy of human experts labelling the leaves as inoculated on the last DAI was 70%. This suggests that the presented models outperform human expert classification.

VII. CONCLUSIONS

In this paper, we have presented the results of an early experiment in the automatic detection of plant disease using hyperspectral imaging. An experimental trial was run to detect capsicum plants with the Tomato Spotted Wilt Virus (TSWV). An automated system was developed to collect hyperspectral hypercubes of plant leaves in VNIR and SWIR spectrum ranges at regular intervals after inoculation. The system was also used to collect associated meta-data about the plants. The data was then analysed using image processing and machine learning techniques to demonstrate effective automatic discrimination between leaves from healthy and inoculated plants. We trained SVM classifiers on three types of features: vegetation indices, features based on probabilistic topic modelling and the full spectrum, on both VNIR and SWIR hypercubes. Our best results, obtained with the full spectrum features, showed better than 90% accuracy. It is important to emphasize that we used both the VNIR component (400-1000 nm) and the SWIR component (900-2500 nm) of the electromagnetic spectrum, whereas previous studies have mostly relied on the VNIR component only. The work based on topic models is of a preliminary nature and is very promising. Future work includes exploration of dynamic topic models to estimate the dynamic nature of pathogen-plant development over time.
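The evaluation protocol reported with Table II (an 80-20 split, with an RBF-kernel SVM tuned by grid search on the training portion only) can be sketched as follows. This is a minimal illustration using scikit-learn on synthetic data, not the authors' implementation: the array sizes, the C and gamma grids, the standardisation step, the 5-fold cross-validation and the F1 scoring are all assumptions that the paper does not specify.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for per-leaf mean spectra (assumed sizes, not the paper's data).
rng = np.random.default_rng(0)
n_leaves, n_bands = 200, 50
X = rng.normal(size=(n_leaves, n_bands))
y = rng.integers(0, 2, size=n_leaves)   # 0 = healthy, 1 = inoculated
X[y == 1, :10] += 1.5                   # make the two classes separable

# 80-20 split; the test set is held out and never seen by the grid search.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# RBF-kernel SVM; feature standardisation is an assumption of this sketch.
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_grid = {
    "svc__C": [0.1, 1, 10, 100],               # hypothetical grid
    "svc__gamma": ["scale", 0.001, 0.01, 0.1],  # hypothetical grid
}

# Grid search with cross-validation on the training portion only.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="f1")
search.fit(X_train, y_train)

# With scoring="f1", score() returns the F1-score on the held-out test set.
test_f1 = search.score(X_test, y_test)
print(search.best_params_, test_f1)
```

Keeping the grid search inside the training portion means the reported test figures are not influenced by hyperparameter selection.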
Spectral Range  Model               Accuracy  F1-Score  Specificity  Sensitivity
VNIR            Vegetation Indices  0.830     0.840     0.818        0.840
VNIR            Full Spectrum       0.936     0.939     0.955        0.920
VNIR            Topic Models        0.894     0.898     0.882        0.876
SWIR            Vegetation Indices  0.787     0.800     0.773        0.800
SWIR            Full Spectrum       0.915     0.920     0.909        0.920
SWIR            Topic Models        0.840     0.846     0.870        0.815

TABLE II: Classification results using an SVM (RBF) with vegetation indices, full spectrum and probabilistic topic models.
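The four metrics in Table II follow from the counts of a binary confusion matrix. The sketch below shows the standard definitions, treating "inoculated" as the positive class. The paper does not report its confusion matrices; the counts used here are hypothetical and were chosen only because they reproduce the VNIR full spectrum row.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, F1, specificity and sensitivity from confusion counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "f1": 2 * tp / (2 * tp + fp + fn),
        "specificity": tn / (tn + fp),   # true negative rate (healthy leaves)
        "sensitivity": tp / (tp + fn),   # true positive rate (inoculated leaves)
    }


# Hypothetical counts (an assumption, not taken from the paper):
# 25 inoculated and 22 healthy test leaves.
metrics = binary_metrics(tp=23, fp=1, tn=21, fn=2)
rounded = {name: round(value, 3) for name, value in metrics.items()}
print(rounded)
# {'accuracy': 0.936, 'f1': 0.939, 'specificity': 0.955, 'sensitivity': 0.92}
```

Specificity and sensitivity are reported separately because accuracy alone does not show whether errors fall on healthy or inoculated leaves.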

ACKNOWLEDGMENT

This work has been funded by Horticulture Innovation Australia through project VG15024 and the Commonwealth Scientific and Industrial Research Organisation (CSIRO), and in collaboration with the Queensland Government’s Department of Agriculture and Fisheries (QDAF). The authors gratefully acknowledge valuable feedback from Susanne Heisswolf throughout the project. The authors appreciate critical support provided by QDAF plant scientists Andrew Manners, Denis Persley, David Carey, Tony Cooke and Mary Firrell for greenhouse data collection and labeling. The authors also thank James Brett from the CSIRO.

REFERENCES

[1] F. Martinelli, R. Scalenghe, S. Davino, S. Panno, G. Scuderi, P. Ruisi, P. Villa, D. Stroppiana, M. Boschetti, L. R. Goulart et al., “Advanced methods of plant disease detection. A review,” Agronomy for Sustainable Development, vol. 35, no. 1, pp. 1–25, 2015.
[2] Y. Fang and R. P. Ramasamy, “Current and prospective methods for plant disease detection,” Biosensors, vol. 5, no. 3, pp. 537–561, 2015.
[3] A.-K. Mahlein, E.-C. Oerke, U. Steiner, and H.-W. Dehne, “Recent advances in sensing plant diseases for precision crop protection,” European Journal of Plant Pathology, vol. 133, no. 1, pp. 197–209, 2012.
[4] L. Chaerle and D. Van Der Straeten, “Imaging techniques and the early detection of plant stress,” Trends in Plant Science, vol. 5, no. 11, pp. 495–501, 2000.
[5] T. Rumpf, A.-K. Mahlein, U. Steiner, E.-C. Oerke, H.-W. Dehne, and L. Plümer, “Early detection and classification of plant diseases with support vector machines based on hyperspectral reflectance,” Computers and Electronics in Agriculture, vol. 74, no. 1, pp. 91–99, 2010.
[6] C. Xie and Y. He, “Spectrum and image texture features analysis for early blight disease detection on eggplant leaves,” Sensors, vol. 16, no. 5, p. 676, 2016.
[7] A.-K. Mahlein, T. Rumpf, P. Welke, H.-W. Dehne, L. Plümer, U. Steiner, and E.-C. Oerke, “Development of spectral indices for detecting and identifying plant diseases,” Remote Sensing of Environment, vol. 128, pp. 21–30, 2013.
[8] M. Wahabzada, A.-K. Mahlein, C. Bauckhage, U. Steiner, E.-C. Oerke, and K. Kersting, “Plant phenotyping using probabilistic topic models: Uncovering the hyperspectral language of plants,” Scientific Reports, vol. 6, 2016.
[9] A. Thenkabail, P. S. Lyon, and J. G. Huete, Hyperspectral remote sensing of vegetation. CRC Press, 2011.
[10] M. Govender, K. Chetty, and H. Bulcock, “A review of hyperspectral remote sensing and its application in vegetation and water resource studies,” Water SA, vol. 33, no. 2, pp. 145–151, 2007.
[11] E. Adam, O. Mutanga, and D. Rugege, “Multispectral and hyperspectral remote sensing for identification and mapping of wetland vegetation: a review,” Wetlands Ecology and Management, vol. 18, no. 3, pp. 281–296, 2010.
[12] H. Zhu, H. Cen, C. Zhang, and Y. He, “Early detection and classification of tobacco leaves inoculated with tobacco mosaic virus based on hyperspectral imaging technique,” in 2016 ASABE Annual International Meeting. American Society of Agricultural and Biological Engineers, 2016, p. 1.
[13] D. Moshou, C. Bravo, J. West, S. Wahlen, A. McCartney, and H. Ramon, “Automatic detection of yellow rust in wheat using reflectance measurements and neural networks,” Computers and Electronics in Agriculture, vol. 44, no. 3, pp. 173–188, 2004.
[14] D. M. Blei, “Probabilistic topic models,” Communications of the ACM, vol. 55, no. 4, pp. 77–84, 2012.
[15] S. Suzuki et al., “Topological structural analysis of digitized binary images by border following,” Computer Vision, Graphics, and Image Processing, vol. 30, no. 1, pp. 32–46, 1985.
[16] P. Du, W. A. Kibbe, and S. M. Lin, “Improved peak detection in mass spectrum by incorporating continuous wavelet transform-based pattern matching,” Bioinformatics, vol. 22, no. 17, pp. 2059–2065, 2006.
[17] J. Rouse Jr, R. Haas, J. Schell, and D. Deering, “Monitoring vegetation systems in the Great Plains with ERTS,” Third ERTS Symposium. NASA, pp. 309–317, 1974.
[18] M. G. R. Birth, Gerald S., “Measuring the color of growing turf with a reflectance spectrophotometer,” American Society of Agronomy, vol. 60, pp. 640–643, 1968.
[19] I. F. J. Penuelas, F. Baret, “Semiempirical indexes to assess carotenoids chlorophyll-a ratio from leaf spectral reflectance,” Photosynthetica, vol. 31, no. 2, pp. 221–230, 1995.
[20] G. A. Blackburn, “Spectral indices for estimating photosynthetic pigment concentrations: a test using senescent tree leaves,” International Journal of Remote Sensing, vol. 19, no. 4, pp. 657–675, 1998.
[21] A. A. Gitelson, M. N. Merzlyak, and O. B. Chivkunova, “Optical properties and nondestructive estimation of anthocyanin content in plant leaves,” Photochem. Photobiol., vol. 74, no. 1, pp. 38–45, 2001.
[22] G. Guyot and F. Baret, “Utilisation de la haute resolution spectrale pour suivre l’etat des couverts vegetaux,” in Spectral Signatures of Objects in Remote Sensing, vol. 287, 1988, p. 279.
[23] R. D. R. Laudien, G. Bareth, “Analysis of hyperspectral field data for detection of sugar beet diseases,” in Proceedings of the EFITA Conference, Debrecen, Hungary, Jul. 2003, pp. 375–381.
[24] C. Champagne, E. Pattey, A. Bannari, and I. B. Strachan, “Mapping crop water stress: Issues of scale in the detection of plant water status using hyperspectral indices,” in Physical measurements & signatures in remote sensing. International symposium, 2001, pp. 79–84.
[25] C. Daughtry, E. Hunt, and J. McMurtrey, “Assessing crop residue cover using shortwave infrared reflectance,” Remote Sensing of Environment, vol. 90, no. 1, pp. 126–134, 2004.
[26] L. Serrano, J. Penuelas, and S. L. Ustin, “Remote sensing of nitrogen and lignin in Mediterranean vegetation from AVIRIS data: Decomposing biochemical from structural signals,” Remote Sensing of Environment, vol. 81, no. 2, pp. 355–364, 2002.
[27] K. Steddom, M. Bredehoeft, M. Khan, and C. Rush, “Comparison of visual and multispectral radiometric disease evaluations of Cercospora leaf spot of sugar beet,” Plant Disease, vol. 89, no. 2, pp. 153–158, 2005.
[28] K. Steddom, G. Heidel, D. Jones, and C. Rush, “Remote detection of rhizomania in sugar beets,” Phytopathology, vol. 93, no. 6, pp. 720–726, 2003.
[29] A.-K. Mahlein, U. Steiner, H.-W. Dehne, and E.-C. Oerke, “Spectral signatures of sugar beet leaves for the detection and differentiation of diseases,” Precision Agriculture, vol. 11, no. 4, pp. 413–431, 2010.
[30] I. Herrmann, A. Karnieli, D. Bonfil, Y. Cohen, and V. Alchanatis, “SWIR-based spectral indices for assessing nitrogen content in potato fields,” International Journal of Remote Sensing, vol. 31, no. 19, pp. 5127–5143, 2010.
