2 FCM
net/publication/282771723
5 authors, including:
Diptesh Majumdar
Carnegie Mellon University
All content following this page was uploaded by Diptesh Majumdar on 12 October 2015.
Diptesh Majumdar1, Arya Ghosh2, Dipak Kumar Kole3, Aruna Chakraborty3 and
Dwijesh Dutta Majumder4
Abstract. This paper presents a novel and efficient way to detect the presence
of disease in a wheat leaf, and to identify that disease, from an image of the leaf.
The system applies Fuzzy C-Means (FCM) clustering to data points consisting of
selected features of a set of wheat leaf images. In the first step, the number of
clusters is fixed at 2 in order to divide the input into sets of diseased and
undiseased leaf images. The diseased leaf set is then divided into 4 sets,
corresponding to the 4 known types of disease, by applying FCM to this set with
the number of clusters fixed at 4. We propose an efficient method for selecting
the feature set based on inter-class and intra-class variance. Although testing has
been done only on wheat leaf images, the method can also be applied to other leaf
images through careful selection of the feature set.
Keywords: Image Processing, Feature Extraction, Fuzzy C-Means Clustering
1 Introduction
Wheat leaf rust (Puccinia recondita) usually does not cause spectacular damage, but
on a worldwide basis it probably causes more damage than the other wheat rusts. In
India, average losses of 3% have been estimated, although higher losses occur in
certain areas if the cultivars are susceptible to leaf rust.
Research in image processing and analysis for the detection of plant diseases has
grown immensely over the past decade. Various methods have been devised to study
plant diseases and traits using image processing. These methods aim to increase
throughput and reduce the subjectivity that arises when human experts detect
plant diseases [1].
As shown in Fig. 1, the general system for detection and recognition of disease in
a plant leaf consists of three main components: an image analyzer, a feature
extractor and a classifier.
The processing done by these components is divided into two phases.
The first is the offline phase, or training phase. In this phase, a set
of input images of leaves (diseased and normal) is processed by the image analyzer
and certain features are extracted. These features are then given as input to the
classifier, together with the information on whether each image is that of a
diseased or a normal leaf. The classifier learns the relation between the extracted
features and the conclusion about the presence of disease. Thus the system is trained.
The second is the online phase, in which the features of a given image are
extracted by the image analyzer and the classifier then decides whether the leaf is
diseased or not, according to what it learned in the offline phase.
Now, assume there is a large set of images of plant leaves, and we need to
determine the type of disease, if any, with which the leaves are infected. With the
existing system, we have to take the images one by one, feed each to the input of
the system, get the output, and repeat these steps for the next image until none
remains. This cycle can be time consuming when the test is conducted on a large
number of leaves, which is the case in most practical situations. The algorithm that
we propose overcomes this shortcoming: it is easy to comprehend and is designed to
work on a large set of images.
Fig. 1. Architecture of a General System for Detection and Recognition of Disease in Plant
Leaf
The paper is organized as follows. In Section 2, we go through the types of
disease with which a wheat leaf may be infected. Section 3 describes the architecture
of the proposed system. Selection of appropriate features is detailed in Section 4.
Section 5 explains the basics of the Fuzzy C-Means algorithm and its application in
our system. Finally, we show the results in Section 6 and conclude in Section 7.
For this work, more than 300 wheat leaf images were collected from a wheat farm
at Field Crop Research Station, Department of Agriculture, Govt of West Bengal,
Burdwan. The leaves were then classified according to the severity of infection by
two experienced doctors, Dr. Amitava Ghosh (Ex. Economic Botanist IX,
West Bengal) and Dr. P. K. Maity (Chief Agronomist & Ex-Officio Joint Director of
Agriculture, West Bengal).
All four diseases mentioned above can be considered to be wheat leaf rusts with
various degrees of infections.
A study by pathologists Marsalis and Goldberg [2] reveals that wheat leaf rust
symptoms begin as small, circular to oval yellow spots on infected tissues of
the upper leaf surface. As the disease progresses, the spots develop into
orange-colored pustules, which may be surrounded by a yellow halo (Fig. 2).
Fig. 2. Relative resistances of wheat to leaf rust: R=resistant, MR= moderately resistant, MS=
moderately susceptible, and S= susceptible. Source: Rust Scoring Guide, Research Institute for
Plant Protection, Wageningen, Netherlands.
The first step in making the proposed system work properly is the selection of
an appropriate set of features. To do that, we considered the features most
commonly used in such applications, namely Entropy, Median, Mode,
Variance [3], Standard Deviation, Number of Zeros and Number of Connected
Components in the Binary Image, Number of Peaks in the Histogram, Color Moments
[4, 5] and texture-based features [6, 7] such as Inertia, Correlation, Energy and
Homogeneity. We then used a simple algorithm based on inter-class and intra-class
variance (refer to Section 4) to identify the most effective among these features.
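As an illustration of the simpler first-order features in this list, the following sketch computes a handful of them from an 8-bit grayscale image with NumPy. It is only a sketch: the function name `basic_features` and the binarization threshold of 128 are assumptions made here, since the text does not state how the binary image is produced, and the color-moment and texture features are not covered.

```python
import numpy as np

def basic_features(gray, binary_threshold=128):
    """Compute a few first-order features of a 2-D uint8 grayscale image.

    binary_threshold is an assumed value used only for this illustration.
    """
    gray = np.asarray(gray, dtype=np.uint8)
    hist = np.bincount(gray.ravel(), minlength=256)   # gray-level histogram
    p = hist / hist.sum()                             # gray-level probabilities
    nonzero = p[p > 0]
    entropy = -np.sum(nonzero * np.log2(nonzero))     # Shannon entropy in bits
    binary = gray >= binary_threshold                 # True = 1, False = 0
    return {
        "entropy": float(entropy),
        "median": float(np.median(gray)),
        "mode": int(np.argmax(hist)),                 # most frequent gray level
        "variance": float(gray.var()),
        "std": float(gray.std()),
        "zeros_in_binary": int(np.count_nonzero(~binary)),
    }

# Tiny 2x2 example: half the pixels are black, half are white.
features = basic_features(np.array([[0, 0], [255, 255]], dtype=np.uint8))
```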
Next, we take the set of wheat leaf images that is to be divided into
separate clusters and extract the selected features from each image. The
feature vectors are then fed to the Fuzzy C-Means clustering algorithm (refer to
Section 5) with the number of clusters fixed at 2. This divides the set of wheat
leaf images into sets of diseased and undiseased leaf images. Next, we consider
only the feature vectors of the set of diseased leaves. We run the Fuzzy C-Means
clustering algorithm again on this set, with the number of clusters fixed at 4.
Thus, we obtain 4 sets partitioned according to the degree of infection of the leaves.
The architecture of the system is schematically represented in Fig. 3a. The “Feature
Extraction and Fuzzy C-Means” block in Fig. 3a has been expanded in Fig. 3b.
Fig. 3. (a) Architecture of the System (b) Feature Extraction followed by Fuzzy C-Means
Clustering
The algorithm selects the most suitable features from the set of all features mentioned
in the previous section. This activity decreases the size of each feature vector to an
optimum level, and thus enhances performance of the FCM. The steps of the
algorithm are as follows.
Fig. 4. Feature Selection Process
1. Collect 300 or more images of wheat leaf such that 50% of wheat leaf
images belong to R-Class, 25% to S-Class, 13% to MR-Class and 12% to
MS-Class.
2. Extract the 25 features from each of the collected images and store them in 4
separate files, one for each of the 4 categories of image. The file containing
the features of all images belonging to the R class is named FR, and the others
are named FS, FMR and FMS respectively.
3. For each class file, calculate the variance of each feature over all images
in that file. Store the results for the 4 files in a single file named
Intra_Class_Var. This file contains 4 rows, one for each class of image, and
25 columns, one for each feature. Hence, the value at position i,j is the
variance of feature j over the images belonging to class i. These values are
called the intra-class variances.
4. For each class file, calculate the mean of each feature over all images in
that file. Store the results for the 4 files in a single file named
Intra_Class_Mean. This file contains 4 rows, one for each category of image,
and 25 columns, one for each feature. Hence, the value at position i,j is the
mean of feature j over the images belonging to class i. These values are called
the intra-class means.
5. For each feature (one column of Intra_Class_Mean), calculate the variance of
the mean values over all rows. This gives 25 variance values, which are stored
in a file named Inter_Class_Var.
6. Choose two thresholds, T1 for the inter-class variance and T2 for the
intra-class variance.
7. Select the set InterFeature of features whose inter-class variance is
greater than T1.
8. Select the set IntraFeature of features whose intra-class variance is less
than T2 for all 4 categories.
9. Take the union of the two sets, InterFeature and IntraFeature, to get the
final selected set of features.
The schematic representation of the algorithm has been shown in Fig. 4.
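The steps above can be sketched in a few lines of NumPy. This is a sketch under stated assumptions, not the authors' implementation: the function name `select_features` and the in-memory dictionary (in place of the files FR, FS, FMR and FMS) are hypothetical. The population variance (`ddof=0`) is also assumed, because the text does not say which variance estimator or feature scaling was used.

```python
import numpy as np

def select_features(features_by_class, t_inter, t_intra):
    """Return the sorted indices of the selected features.

    features_by_class maps a class name to an array of shape
    (n_images_in_class, n_features). Population variance is assumed.
    """
    classes = sorted(features_by_class)
    # Step 3: intra-class variance, one row per class, one column per feature.
    intra_var = np.array([features_by_class[k].var(axis=0) for k in classes])
    # Step 4: intra-class mean, same layout.
    intra_mean = np.array([features_by_class[k].mean(axis=0) for k in classes])
    # Step 5: inter-class variance = variance of the class means per feature.
    inter_var = intra_mean.var(axis=0)
    # Step 7: features whose inter-class variance exceeds T1.
    inter_feature = set(np.flatnonzero(inter_var > t_inter).tolist())
    # Step 8: features whose intra-class variance is below T2 in every class.
    intra_feature = set(np.flatnonzero((intra_var < t_intra).all(axis=0)).tolist())
    # Step 9: union of the two sets.
    return sorted(inter_feature | intra_feature)

# Toy data with 3 features and 2 images per class:
# feature 0 separates the classes, feature 1 is constant, feature 2 is noise.
toy = {
    name: np.array([[10.0 * k - 5.0, 1.0, -5.0],
                    [10.0 * k + 5.0, 1.0, 5.0]])
    for k, name in enumerate(["R", "MR", "MS", "S"])
}
selected = select_features(toy, t_inter=1.0, t_intra=1.0)
```

In the toy data, feature 0 is kept because its class means differ strongly, feature 1 is kept because it varies little within every class, and feature 2 satisfies neither criterion.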
The concept of fuzzy sets [10, 11] is as follows. Suppose there are n elements in a
set, and we need to distribute these n elements into c clusters. Each of the n
elements has c degree-of-membership values, one for each of the c clusters. The
element is considered to belong to the cluster for which its membership value is
the highest.
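A minimal illustration of this assignment rule, using a made-up membership matrix (the numbers are hypothetical, not taken from the experiments):

```python
import numpy as np

# Rows are elements, columns are clusters; the values are made up.
U = np.array([[0.80, 0.20],
              [0.35, 0.65],
              [0.10, 0.90]])

# Each element is assigned to the cluster with the highest membership value.
labels = U.argmax(axis=1)
```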
In our case, the set consists of the feature vectors of all the images. We
applied the Fuzzy C-Means clustering algorithm [12], which iteratively minimizes
the objective function:
J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \, \lVert x_i - c_j \rVert^{2}    (1)
where N is the number of feature vectors, C is the number of clusters, m is any
real number greater than 1, u_ij is the degree of membership of feature vector x_i
in cluster j, x_i is the i-th d-dimensional measured data point, c_j is the
d-dimensional center of cluster j, and ||*|| is any norm expressing the similarity
between a measured data point and a center.
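The text does not spell out the update rules. In the standard formulation of Fuzzy C-Means, with the memberships constrained so that \sum_{j=1}^{C} u_{ij} = 1 for every i, the objective is minimized by alternating the two updates below until the memberships change by less than a small tolerance \varepsilon (a symbol introduced here for illustration):

```latex
u_{ij} = \frac{1}{\displaystyle\sum_{k=1}^{C}
         \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{\frac{2}{m-1}}},
\qquad
c_j = \frac{\displaystyle\sum_{i=1}^{N} u_{ij}^{m} \, x_i}
           {\displaystyle\sum_{i=1}^{N} u_{ij}^{m}},
\qquad
\text{stop when } \max_{i,j} \left| u_{ij}^{(t+1)} - u_{ij}^{(t)} \right| < \varepsilon .
```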
In our application, we considered 4 clusters corresponding to the 4 categories of
disease: R, MR, MS and S. This operation categorizes the feature vectors into 4
clusters. From the result we can conclude that an image I belonging to cluster C
implies that the wheat leaf in image I is infected with the disease of category C.
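The two-stage use of FCM can be sketched as follows. This is a minimal NumPy sketch of the standard algorithm with the Euclidean norm, not the authors' code: the function name `fuzzy_c_means`, its default parameters and the synthetic data are all assumptions. In particular, FCM does not say which of the two first-stage clusters is the diseased one; here it is picked with a rule that only makes sense for the toy data.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Return (centers, U); U[i, j] is the membership of X[i] in cluster j."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)        # memberships of a point sum to 1
    for _ in range(max_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every point to every center, shape (n, c).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)          # guard against division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Synthetic stand-in for feature vectors: one "healthy" blob near the origin
# and four "diseased" blobs far from it.
rng = np.random.default_rng(1)
healthy = rng.normal(0.0, 0.1, size=(40, 2))
diseased = np.vstack([rng.normal(mu, 0.1, size=(10, 2))
                      for mu in ([5, 5], [5, 7], [7, 5], [7, 7])])
X = np.vstack([healthy, diseased])

# Stage 1: two clusters (diseased versus undiseased).
centers2, U2 = fuzzy_c_means(X, c=2)
stage1 = U2.argmax(axis=1)
# Toy-data assumption: the cluster farther from the origin is "diseased".
diseased_cluster = np.linalg.norm(centers2, axis=1).argmax()
X_diseased = X[stage1 == diseased_cluster]

# Stage 2: four clusters on the diseased subset only.
centers4, U4 = fuzzy_c_means(X_diseased, c=4)
stage2 = U4.argmax(axis=1)
```

With real feature vectors the result depends on the random initialization and on feature scaling, so several restarts with different `seed` values may be worthwhile.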
6 Results
As shown in the description of the architecture of the system, the first step
required for the proposed algorithm to function properly is the selection of
appropriate features. We first collected 310 wheat leaf images and extracted all
25 features, as described in Section 4. We then applied the feature selection
algorithm formulated in that section to obtain a set of 7 features, which are
designated as the best among all the features in this particular application.
For this experiment, the Intra-Class Threshold was set to 0.6 and the Inter-Class
Threshold was set to 0.03. The 7 features selected by the system are shown in
Table 1.
6.2 Fuzzy C-Means Results
In the first step, we run FCM on feature vectors of 310 wheat leaf images fixing C=2.
We get a set of diseased wheat leaf images in one cluster. Next, we run FCM only on
the set of diseased leaves, fixing C=4. Classification into diseased and undiseased
leaves was found to be accurate in 88% of the cases. Recognition of type of disease
was accurate in 56% of the cases.
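The text does not say how the clusters were matched to the expert-assigned categories when these accuracies were computed. One common approach, shown here purely as an assumption, is to try every one-to-one mapping of cluster indices to class labels and report the best agreement. The function name `clustering_accuracy` and the labels in the example are hypothetical.

```python
from itertools import permutations

import numpy as np

def clustering_accuracy(true_labels, cluster_labels, n_classes):
    """Best accuracy over all one-to-one mappings of clusters to classes."""
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    best = 0.0
    for perm in permutations(range(n_classes)):
        mapped = np.array(perm)[cluster_labels]   # cluster j -> class perm[j]
        best = max(best, float(np.mean(mapped == true_labels)))
    return best

# Made-up example with 3 classes: one of the six items is misclustered.
acc = clustering_accuracy([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 0], n_classes=3)
```

For 2 or 4 clusters the exhaustive search covers only 2 or 24 mappings, so it is cheap at this scale.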
Wheat Leaf Images Category     R            MR        MS        S
Mode                           0.11695      0.0035    0         0.13443
Variance                       341.77       0.0587    301.94    73.155
Standard Deviation             5846.1       7.6602    5495      2704.7
No. of 0s in Binary Image      89.084       0.5634    18.972    44.876
No. of Connected Components    0.0055036    0.0032    0.0080    0.2457
S Plane Moment 3               -0.07605     0         0.1442    -0.034339
V Plane Moment 1               2.2459       0         0.3411    0.018235
7 Conclusion
References