
See discussions, stats, and author profiles for this publication at:
https://www.researchgate.net/publication/282771723

Application of Fuzzy C-Means Clustering Method to Classify Wheat Leaf
Images Based on the Presence of Rust Disease

Conference Paper · November 2014


DOI: 10.1007/978-3-319-11933-5_30


All content following this page was uploaded by Diptesh Majumdar on 12 October 2015.



Application of Fuzzy C-Means Clustering Method to
Classify Wheat Leaf Images based on the presence of rust
disease

Diptesh Majumdar1, Arya Ghosh2, Dipak Kumar Kole3, Aruna Chakraborty3 and
Dwijesh Dutta Majumder4

1 Department of Computer Science and Engineering, Indian Institute of Technology,
Guwahati, Guwahati, Assam, India
2 Department of Information Technology, Indian Institute of Engineering Science and
Technology, Shibpur, West Bengal, India
3 Department of Computer Science and Engineering, St. Thomas’ College of Engineering &
Technology, Kidderpore, Kolkata, West Bengal, India
4 Professor Emeritus in Electronics and Communication Sciences Unit, Indian Statistical
Institute, Director in Institute of Cybernetics Systems and Information Technology, Kolkata,
West Bengal, India

Abstract. This paper presents a novel and efficient way to detect and identify
disease in a wheat leaf from its image. The system applies Fuzzy C-Means (FCM)
clustering to data points consisting of selected features of a set of wheat leaf
images. In the first step, the number of clusters is fixed to 2 in order to divide
the input into sets of diseased and undiseased leaf images. The diseased leaf set
is then further classified into 4 sets, corresponding to the 4 known types of
disease, by applying FCM to this set with the number of clusters fixed to 4. We
propose an efficient method for selecting the feature set based on inter-class
and intra-class variance. Although testing has been done only on wheat leaf
images, the method can also be applied to other leaf images through careful
selection of the feature set.

Keywords: Image Processing, Feature Extraction, Fuzzy C-Means Clustering

1 Introduction

Wheat leaf rust, caused by Puccinia recondita, usually does not cause spectacular
damage, but on a world-wide basis it probably causes more damage than the other
wheat rusts. In India, average losses of 3% have been estimated, although higher
losses occur in certain areas if the cultivars are susceptible to leaf rust.
Research in Image Processing and Analysis for Detection of Plant Diseases has
grown immensely over the past decade. Various methods have been devised to study
plant diseases and traits using Image Processing. These methods aim at increasing
throughput and reducing the subjectiveness that arises when human experts detect
plant diseases [1].
As shown in Fig. 1, the general system for detection and recognition of disease in
plant leaf consists of three main components: Image analyzer, Feature extraction and
classifier.
The processing done by these components is divided into two phases.
The first is the offline phase, or Training Phase. In this phase, a set of input
images of leaves (diseased and normal) is processed by the image analyzer and
certain features are extracted. These features are then given as input to the
classifier, along with the information whether each image is that of a diseased
or a normal leaf. The classifier then learns the relation between the extracted
features and the possible conclusion about the presence of the disease. Thus the
system is trained.
The second processing phase is the online phase, in which the features of a
specified image are extracted by the image analyzer and the classifier then tests
whether the leaf is diseased or not, according to the information provided to it
in the training phase (offline phase).
Now, assume there is a large set of images of plant leaves, and we need to
determine the type of disease, if any, with which the leaves are infected. With
the existing system, we have to take the images one by one, feed each to the
input of the system, get the output, and repeat these steps until no images
remain. This cycle may be time-consuming if the test is conducted for a large
number of leaves, which is the case in most practical situations. The algorithm
that we propose overcomes this shortcoming: it is easy to comprehend and is
designed to work for a large set of images.

Fig. 1. Architecture of a General System for Detection and Recognition of Disease in Plant
Leaf

The Paper has been organized as follows. In Section 2, we go through the type of
diseases a wheat leaf may be infected with. Section 3 describes Architecture of the
proposed system. Selection of appropriate features is detailed in Section 4. Section 5
explains the basics and application of Fuzzy C-Means Algorithm in our system.
Finally we show the results in Section 6, and reach our conclusion in Section 7.
For this work, more than 300 wheat leaf images were collected from a wheat farm
at the Field Crop Research Station, Department of Agriculture, Govt. of West
Bengal, Burdwan. The leaves were then classified according to the severity of
infection by two experienced doctors, Dr. Amitava Ghosh (Ex. Economic Botanist IX,
West Bengal) and Dr. P. K. Maity (Chief Agronomist & Ex-Officio Joint Director of
Agriculture, West Bengal).

2 Recognition of Wheat Leaf Diseases and Gradations

A wheat leaf can be infected with the following four diseases:


I. Powdery Mildew: Elliptical patches of white fungal growth appear on both leaf
surfaces.
II. Septoria Leaf Spot: Yellow flecks first appear on the lower leaves. Later, yellow
to red-brown or gray-brown spots or blotches may develop on all above-ground
parts of wheat.
III. Tan spot or Yellow Leaf: Oval-shaped tan spots up to 12 mm in length appear
on the leaves.
IV. Snow Molds: Leaves may be partly or entirely dried and appear brown or
bleached. When the crowns are attacked, the plants are usually killed. When the
crowns are unharmed, new leaves emerge among the damaged leaves and the
wheat plants often recover.

All four diseases mentioned above can be considered to be wheat leaf rusts with
various degrees of infections.
A study by Pathologists Marsalis and Goldberg [2] reveals that Wheat Leaf rust
disease symptoms begin as small, circular to oval yellow spots on infected tissues of
the upper leaf surface. As the disease progresses, the spots develop into orange
colored pustules which may be surrounded by a yellow halo (Fig. 2).

Fig. 2. Relative resistances of wheat to leaf rust: R=resistant, MR= moderately resistant, MS=
moderately susceptible, and S= susceptible. Source: Rust Scoring Guide, Research Institute for
Plant Protection, Wageningen, Netherlands.

Accordingly, our proposed system is able to identify a wheat leaf as diseased or
not, and if found to be diseased, it can determine the severity of the infection by
labeling it as R, MR, MS or S.

3 Architecture of the System

The first step in making the proposed system work properly is the selection of
an appropriate set of features. To do that, we considered the features most
commonly used in such applications, namely Entropy, Median, Mode, Variance [3],
Standard Deviation, Number of Zeros and Connected Components in the Binary Image,
Number of Peaks in the Histogram, Color Moments [4, 5] and texture-based features
[6, 7] such as Inertia, Correlation, Energy and Homogeneity. We then used a simple
algorithm based on inter-class and intra-class variance (refer to Section 4) to
identify the most effective among these features.
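
As a rough illustration of what a few of these features look like in practice, the sketch below computes some first-order statistics and the first three color moments from plain lists of pixel values. It is only a sketch under stated assumptions: the function names `basic_features` and `color_moments` are hypothetical, the input is assumed to be a flat list of 8-bit intensities, population (not sample) statistics are assumed, and the color moments follow the commonly used mean / standard deviation / cube-root-of-third-central-moment definition. The paper does not specify its exact formulas or any normalization.

```python
import math
from collections import Counter
from statistics import median, pvariance, pstdev


def basic_features(pixels):
    """First-order features of a flat list of gray levels (hypothetical helper)."""
    n = len(pixels)
    counts = Counter(pixels)
    # Shannon entropy of the gray-level histogram, in bits.
    entropy = -sum((k / n) * math.log2(k / n) for k in counts.values())
    # Most frequent gray level.
    mode = counts.most_common(1)[0][0]
    return {
        "entropy": entropy,
        "median": median(pixels),
        "mode": mode,
        "variance": pvariance(pixels),  # population variance (assumption)
        "std_dev": pstdev(pixels),
    }


def color_moments(channel):
    """Mean, standard deviation and skewness-like third moment of one channel."""
    n = len(channel)
    mean = sum(channel) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in channel) / n)
    third = sum((v - mean) ** 3 for v in channel) / n
    # Signed cube root keeps the sign of the third central moment.
    skew = math.copysign(abs(third) ** (1 / 3), third)
    return mean, std, skew
```

For a leaf image in HSV space, `color_moments` would be called once per plane (for example on the S plane and the V plane), and the returned values concatenated with the output of `basic_features` to form the feature vector.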
Next, we take the set of wheat leaf images that is to be divided into separate
clusters and extract the selected features from each image. The feature vectors
are then fed to the Fuzzy C-Means Clustering Algorithm (refer to Section 5) with
the number of clusters fixed to 2. This divides the set of wheat leaf images into
sets of diseased and undiseased leaf images. Next, we consider only the feature
vectors of the set of diseased leaves and run the Fuzzy C-Means Clustering
Algorithm again on this set, with the number of clusters fixed to 4. Thus, we
obtain 4 sets partitioned according to the degree of infection of the leaf.
The architecture of the system is schematically represented in Fig. 3a. The “Feature
Extraction and Fuzzy C-Means” block in Fig. 3a has been expanded in Fig. 3b.

Fig. 3. (a) Architecture of the System (b) Feature Extraction followed by Fuzzy C-Means
Clustering

4 Feature Selection Algorithm

The algorithm selects the most suitable features from the set of all features mentioned
in the previous section. This activity decreases the size of each feature vector to an
optimum level, and thus enhances performance of the FCM. The steps of the
algorithm are as follows.

Fig. 4. Feature Selection Process

1. Collect 300 or more images of wheat leaf such that 50% of the images
belong to R-Class, 25% to S-Class, 13% to MR-Class and 12% to MS-Class.
2. Extract the 25 features of each collected image and store them in 4
separate files, one per category of image. The file containing the features
of all images belonging to the R Class is named FR, and the others are named
FS, FMR and FMS respectively.
3. Calculate the variance of all the values of one feature in a particular
class-file. This operation is done for each of the 4 files and the results
are stored in one file named Intra_Class_Var. This file contains 4 rows
corresponding to the 4 classes of image, and 25 columns corresponding to the
features. Hence, the value at position i,j specifies the variance of feature j
over the images belonging to class i. These values are called Intra-class
variance.
4. Calculate the mean of all the values of one feature in a particular
class-file. This operation is done for each of the 4 files and the results
are stored in one file named Intra_Class_Mean. This file contains 4 rows
corresponding to the 4 categories of image, and 25 columns corresponding to
the features. Hence, the value at position i,j specifies the mean of feature j
over the images belonging to class i. These values are called Intra-class
Mean.
5. Calculate the variance of the mean values (all rows) corresponding to each
feature (one column) in the file Intra_Class_Mean. This gives 25 variance
values, which are stored in a file named Inter_Class_Var.
6. Choose two thresholds, T1 based on inter-class variance and T2 based on
intra-class variance.
7. Select a set of features InterFeature such that the inter-class variance
of each selected feature is greater than T1.
8. Select a set of features IntraFeature such that the intra-class variance
of each selected feature is less than T2 in all 4 categories.
9. Perform the set union operation on the 2 sets, InterFeature and
IntraFeature, to get the final selected set of features.
The schematic representation of the algorithm is shown in Fig. 4.
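
Steps 3 to 9 above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the function name `select_features` and the dictionary layout of `class_files` are hypothetical, in-memory lists stand in for the FR, FS, FMR and FMS files, and population variance is assumed because the paper does not say which variance estimator was used.

```python
from statistics import mean, pvariance


def select_features(class_files, t1, t2):
    """Return the indices of the selected features.

    class_files maps a class name (e.g. "R", "S", "MR", "MS") to a list of
    feature vectors, one vector per image (hypothetical in-memory layout).
    t1 is the inter-class threshold, t2 the intra-class threshold.
    """
    n_features = len(next(iter(class_files.values()))[0])
    intra_var, intra_mean = {}, {}
    for name, rows in class_files.items():
        columns = list(zip(*rows))  # one tuple of values per feature
        intra_var[name] = [pvariance(col) for col in columns]  # step 3
        intra_mean[name] = [mean(col) for col in columns]      # step 4
    # Step 5: variance, across classes, of the per-class mean of each feature.
    inter_var = [
        pvariance([intra_mean[c][j] for c in class_files])
        for j in range(n_features)
    ]
    # Step 7: features whose class means are well separated.
    inter_feature = {j for j in range(n_features) if inter_var[j] > t1}
    # Step 8: features that vary little inside every class.
    intra_feature = {
        j for j in range(n_features)
        if all(intra_var[c][j] < t2 for c in class_files)
    }
    # Step 9: union of the two sets, as described in the text.
    return sorted(inter_feature | intra_feature)
```

Because step 9 takes a union, a feature is kept if it satisfies either criterion. Fixed thresholds are only meaningful when the features share a comparable scale, so this sketch assumes the feature values have been normalized beforehand; the paper does not describe such a step.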

5 Fuzzy C-Means Clustering Algorithm

The concept of Fuzzy sets [10, 11] is as follows. Suppose there are n elements in a
set, and we need to distribute these n elements into c clusters. Each of these n
elements has c membership values, one for each of the c clusters. The element is
considered to be a part of the cluster for which its membership value is the
highest.
In our case, the set consists of all the feature vectors of the images. We have
applied the Fuzzy c-means Clustering Algorithm [12], which iteratively minimizes
the objective function:
$$J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \, \lVert x_i - c_j \rVert^{2} \qquad (1)$$

where N is the number of feature vectors, C is the number of clusters, m is any
real number greater than 1, u_ij is the degree of membership of feature vector
x_i in cluster j, x_i is the i-th d-dimensional measured data point, c_j is the
d-dimensional center of cluster j, and ||*|| is any norm expressing the
similarity between a measured data point and a center.
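
For reference, the alternating updates usually associated with minimizing (1) are written out below. They are the standard textbook FCM update rules rather than something stated in this paper, and the iteration index k and tolerance ε are notation introduced here for illustration. The memberships of each feature vector are constrained to sum to 1 over the clusters.

$$u_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{\frac{2}{m-1}}}, \qquad c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} \, x_i}{\sum_{i=1}^{N} u_{ij}^{m}}, \qquad \sum_{j=1}^{C} u_{ij} = 1$$

The two updates are repeated until the memberships stop changing appreciably, for example until $\max_{i,j} \lvert u_{ij}^{(k+1)} - u_{ij}^{(k)} \rvert < \varepsilon$.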
When FCM is applied to the set of diseased leaves, we consider 4 clusters
corresponding to the 4 categories of disease: R, MR, MS and S. This operation
categorizes the feature vectors into 4 clusters. From the result we can conclude
that an image I belonging to cluster C implies that the wheat leaf in image I is
infected with disease belonging to category C.
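
To make the clustering step concrete, here is a minimal pure-Python sketch of FCM with Euclidean distance, followed by the highest-membership assignment described above. It is an illustration under assumptions, not the authors' code: the function names `fuzzy_c_means` and `hard_labels`, the random initialization, the default fuzzifier m = 2 and the stopping tolerance are all choices made here, none of which the paper specifies.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Return (centers, memberships) for a list of equal-length feature vectors."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            wsum = sum(w)
            centers.append(
                [sum(w[i] * points[i][k] for i in range(n)) / wsum for k in range(d)]
            )
        # Membership update from Euclidean distances to the new centers.
        new_u = []
        for x in points:
            dist = [math.dist(x, ctr) for ctr in centers]
            if min(dist) == 0.0:
                # The point coincides with a center: full membership there.
                hit = dist.index(0.0)
                new_u.append([1.0 if j == hit else 0.0 for j in range(c)])
                continue
            inv = [dj ** (-2.0 / (m - 1.0)) for dj in dist]
            total = sum(inv)
            new_u.append([v / total for v in inv])
        # Stop when no membership value moved by more than tol.
        change = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if change < tol:
            break
    return centers, u


def hard_labels(u):
    """Assign each point to the cluster with the highest membership value."""
    return [max(range(len(row)), key=row.__getitem__) for row in u]
```

In the two-stage scheme of Section 3, such a routine would be called first with `c=2` on all feature vectors, and then with `c=4` on the vectors of the cluster identified as diseased. FCM itself does not say which of the two clusters is the diseased one, nor which of the four clusters corresponds to R, MR, MS or S; that mapping has to come from inspecting the clusters or from a few labeled samples.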

6 Results

6.1 Feature Selection

As described in the architecture of the system, the first step required for the
proposed algorithm to function properly is the selection of appropriate
features. We first collected 310 wheat leaf images and extracted all 25 features,
as described in Section 4. We then applied the feature selection algorithm
formulated in Section 4 to obtain a set of 7 features, which are designated as
the best among all the features for this particular application. For this
experiment, the Intra-Class Threshold was set to 0.6 and the Inter-Class
Threshold was set to 0.03. The 7 features selected by the system are shown in
Table 1.

6.2 Fuzzy C-Means Results

In the first step, we run FCM on feature vectors of 310 wheat leaf images fixing C=2.
We get a set of diseased wheat leaf images in one cluster. Next, we run FCM only on
the set of diseased leaves, fixing C=4. Classification into diseased and undiseased
leaves was found to be accurate in 88% of the cases. Recognition of type of disease
was accurate in 56% of the cases.
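
The paper does not state how the cluster output was compared with the experts' labels. One common convention, shown here purely as an assumption, is to map each cluster to the expert label that occurs most often inside it and to count the images that agree with that mapping. The function name `cluster_accuracy` and the label strings in the usage note are hypothetical.

```python
from collections import Counter


def cluster_accuracy(cluster_ids, true_labels):
    """Fraction of items whose label matches the majority label of their cluster."""
    by_cluster = {}
    for cid, label in zip(cluster_ids, true_labels):
        by_cluster.setdefault(cid, Counter())[label] += 1
    # Each cluster contributes the count of its most frequent true label.
    correct = sum(counts.most_common(1)[0][1] for counts in by_cluster.values())
    return correct / len(true_labels)
```

For example, `cluster_accuracy([0, 0, 0, 1, 1], ["D", "D", "H", "H", "H"])` maps cluster 0 to "D" and cluster 1 to "H" and counts 4 of the 5 items as correct.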

Table 1. Selected Features and their Sample Values

                                      Wheat Leaf Image Category
Feature                          R            MR        MS        S
Mode                             0.11695      0.0035    0         0.13443
Variance                         341.77       0.0587    301.94    73.155
Standard Deviation               5846.1       7.6602    5495      2704.7
No. of 0s in Binary Image        89.084       0.5634    18.972    44.876
No. of Connected Components      0.0055036    0.0032    0.0080    0.2457
S Plane Moment 3                 -0.07605     0         0.1442    -0.034339
V Plane Moment 1                 2.2459       0         0.3411    0.018235

7 Conclusion

This paper demonstrates the power of the Fuzzy C-Means Clustering Algorithm as a
classifier for identifying disease in plant leaves. It is simpler and faster than
other contemporary integrated image processing approaches used for disease
identification. The only drawback is that the output of the system (the
partitioned set of images) can be ambiguous in some cases. For example, if the
input set of diseased leaves does not cover all possible types of infection, the
output will still be separated into 4 clusters. This drawback can be avoided
through better selection of the features. An improved AdaBoost algorithm [8] or
KPCA-based feature selection [9] can also be applied to the large feature set to
select the most significant features.

References

1. Patil J. K., Kumar R. “Advances in Image Processing for Detection of Plant Diseases”,
Journal of Advanced Bio Informatics Applications and Research ISSN 0976-2604, Vol 2,
Issue 2, June-2011, pp 135-141.
2. M. Halkidi, Y. B. M. V., 2002. Cluster validity methods: Part I, SIGMOD Record,
September.
3. Gonzalez R. C., Woods R. E. “Digital Image Processing”, Second Edition, 2002 by
Prentice-Hall, Inc.
4. Powbunthorn K., Abdullakasim W., Unartngam J. “Assessment of the severity of Brown
Leaf Spot Disease in Cassava using Image Analysis”, The International conference of the
Thai Society of Agricultural Engineering 2012, August 4-5, 2012, Chiangmai, Thailand.
5. Patil J. K., Kumar R. “Color Feature Extraction of Tomato Leaf Diseases”, International
Journal of Engineering Trends and Technology, Volume 2, Issue 2, 2011.
6. Majumder D. D., Chanda B. Digital Image Processing and Analysis. Prentice Hall of India
Private Limited. September 2007.
7. Patil J. K., Kumar R. “Feature Extraction of Diseased Leaf Images”, Journal of Signal and
Image Processing ISSN: 0976-8882 & E-ISSN: 0976-8890, Volume 3, Issue 1, 2012, pp-
60-63.
8. Zhang M., Meng Q. “Citrus canker detection based on leaf images Analysis”. 2010 IEEE.
9. Tian J., HU Q., MA X., HAN M. “An Improved KPCA/GA-SVM Classification Model for
Plant Leaf Disease Recognition”. Journal of Computational Information Systems 8: 18
(2012) 7737-7745.
10. Majumder D. D., Pal S. K. “Concepts of Fuzzy Sets and its Application in Pattern
Recognition Problems”, Proc. Nat. Conf. of CSI, Hyderabad, India, Jan. 1976.
11. Majumder D. D., Majumder S. “Fuzzy Set in Pattern Recognition and Image Analysis”, in
Advances in Inform. Sc. And Tech. (Ed. D. Dutta Majumder), Statistical Publishing
Society, ISI, Calcutta, pp. 50-69, 1982.
12. Bezdek J. C., Ehrlich R., Full W. “FCM: The Fuzzy c-Means Clustering Algorithm”,
Computers & Geosciences, 10, pp. 191-203, 1984.
