Applying Data Mining Techniques in The F

Vol 01, Issue 02, December 2012 International Journal of Business Intelligents ISSN: 2278-2400



Assistant Professor, Department of Computer Science
Maharani’s Science College for Women, Bangalore, India

In this paper an attempt has been made to review the research studies on the application of data mining
techniques in the field of agriculture. Some of the techniques applied in the field of agriculture, such as
the ID3 algorithm, k-means, k nearest neighbour, artificial neural networks and support vector machines,
are presented. The application of data mining in agriculture is a relatively new approach to forecasting
and prediction in agricultural crop and animal management. This article explores the applications
of data mining techniques in the field of agriculture and allied sciences.

Data mining, K-means algorithm, crop productivity, ID3 algorithm, rough sets, k nearest neighbour.

1. Introduction
The major reason that data mining has attracted a great deal of attention in the information industry
in recent years is the wide availability of huge amounts of data and the imminent need for turning such
data into useful information and knowledge. The information and knowledge gained can be used for
applications ranging from business management, production control, and market analysis to engineering
design and science exploration.

Data mining can be viewed as a result of the natural evolution of information technology. An
evolutionary path has been witnessed in the database industry in the development of the following
functionalities: data collection and database creation, data management (including data storage and
retrieval, and database transaction processing), and data analysis and understanding (involving data
warehousing and data mining). For instance, the early development of data collection and database
creation mechanisms served as a prerequisite for the later development of effective mechanisms for data
storage and retrieval, and for query and transaction processing. With systems offering query and
transaction processing as common practice, data analysis and understanding has naturally become the
next target.

Data mining, the extraction of hidden predictive information from large databases, is a powerful new
technology with great potential to help companies focus on the most important information in their data
warehouses. Data mining tools predict future trends and behaviours, allowing businesses to make
proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move
beyond the analyses of past events provided by the retrospective tools typical of decision support
systems.

Agriculture and allied activities constitute the single largest component of India’s gross domestic
product, contributing nearly 25% of the total, and nearly 60% of the Indian population depends on this
profession. Due to the vagaries of climate factors, agricultural productivity in India has been
decreasing continuously for over a decade. The reasons for this were studied mostly using regression
analysis. In this paper an attempt has been made to compile the research findings of different
researchers who used data mining.
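
The regression analysis mentioned above can be illustrated with a minimal sketch. It fits a straight line to crop yield against a single climate factor by ordinary least squares and reports the slope, the intercept and the coefficient of determination. The function name `fit_line`, the variable names and all data values are hypothetical and chosen only for illustration; they are not taken from any of the studies reviewed in this paper.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r2), where r2 is the coefficient of
    determination of the fitted line.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - residual sum of squares / total sum of squares.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    r2 = 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot
    return slope, intercept, r2


# Hypothetical seasonal rainfall (mm) and yield (t/ha); illustrative only.
rainfall = [600.0, 700.0, 800.0, 900.0, 1000.0]
crop_yield = [1.9, 2.3, 2.4, 2.9, 3.0]

slope, intercept, r2 = fit_line(rainfall, crop_yield)
print(f"yield = {slope:.4f} * rainfall + {intercept:.3f} (R^2 = {r2:.3f})")
```

A single-factor fit of this kind describes only a linear trend; this limitation is one reason the studies reviewed below turn to other data mining techniques.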

Integrated Intelligents Research (IIR) 72

2. Application of Data Mining Techniques in Agriculture

Many techniques for learning rules and relationships automatically from diverse data sets have been
developed [14] to simplify the often tedious and error-prone process of acquiring knowledge from
empirical data. While these techniques are plausible, theoretically well-founded, and perform well on
more or less artificial test data sets, their value depends on their ability to make sense of real-world
data. The cited work describes a project that applies a range of machine learning strategies to problems
in agriculture and horticulture. The authors briefly survey some of the techniques emerging from machine
learning research, describe a software workbench for experimenting with a variety of techniques on
real-world data sets, and describe a case study of dairy herd management in which culling rules were
inferred from a medium-sized database of herd information.

In recent years, several models for the simulation of soil dynamics have been developed. Some examples
are DSSAT [11], CROPSYST [17], and GLEAMS [13]. Such models are able to simulate the dynamics in a soil
using soil parameters that need to be specified. The three most commonly used parameters are referred to
as LL, DUL, and PESW: LL is the lower limit of plant water availability, DUL is the drained upper limit,
and PESW is the plant extractable soil water. Unfortunately, these parameters are usually unknown. The
available information about soils usually concerns their texture, such as the percentage of clay, silt,
sand and organic carbon in the soil. While the texture of a soil is usually known, the LL, DUL and PESW
parameters are difficult to estimate.

Data mining techniques are often used to study soil characteristics. For example, the k-means approach
has been used to classify soils in combination with GPS-based technologies [19] and to classify soils
and plants [15], and SVMs have been used to classify crops [6].

Independent component analysis techniques for mining spatio-temporal data have been applied to find
patterns in weather data, using the North Atlantic Oscillation (NAO) as a specific example [5]. The
authors found that the strongest independent components match the observed synoptic weather patterns
corresponding to the NAO. They validated the results by matching the independent component activities
with the NAO index.

Analytical exploration of vast amounts of agricultural data can best be supported by an appropriate
application. Data warehousing and Online Analytical Processing (OLAP) technologies were applied in [1]
to make appropriate use of agricultural data. A data warehouse provides a flexible yet efficient and
reliable storage structure for vast amounts of data, while OLAP techniques provide mechanisms for ad hoc
and in-depth analysis of this data.

Traditional analytical tools and database techniques may not succeed here due to their rigid nature. The
techniques used in that work are equally applicable at any geographic location, provided that related
data is available.

A case study interpreting paddy distributions in three counties of northern Taiwan during the two crop
seasons of the year 2000, using multi-temporal imagery together with cadastre GIS and a Bayesian
posterior probability classifier, was reported in [7]. To integrate Bayesian conditional probability,
prior probabilities of the paddy attributes were estimated from photogrammetric interpretation results
provided by the Food Bureau, and the spectral reflectance from different growth stages was used. Due to
the spatial heterogeneity of the paddy distribution, classifier parameters were established individually
for each map quadrangle. The temporal change of NDVI across the growth stages of the rice life cycle was
measured, and two-stage images were found to significantly improve the
classification results. The results of the study help to evaluate the accuracy of the classifier.
Imagery classification results were compared with aerial photo interpretation results to assess
accuracy. The overall accuracies for the first crop in Tao-yuan, Hsin-chu, and Miao-li were 89.93%,
92.83%, and 95.33%, respectively. The Bayesian classifier has advantages including rules that are easy
to adjust and compute, and comparatively stable results when only limited SPOT satellite imagery is
available. The Bayesian method also provides results with probabilities that help the operator to assess
the places having the least confidence. These advantages led the authors to suggest that the Bayesian
method be used in paddy-area investigation in Taiwan.

Studies conducted by agricultural researchers in Pakistan have shown that attempts at crop yield
maximization through pro-pesticide state policies have led to dangerously high pesticide usage. These
studies have reported a negative correlation between pesticide usage and crop yield. Hence excessive use
of pesticides is harming farmers, with adverse financial, environmental and social impacts. Study [2]
showed how data mining of integrated agricultural data, including pest scouting, pesticide usage and
meteorological recordings, is useful for optimizing pesticide usage. Unsupervised clustering of the data
was first performed through Recursive Noise Removal (RNR). These clusters reveal interesting patterns of
farmer practices along with pesticide usage dynamics, and hence help identify the reasons for this
pesticide abuse.

A mechanism for mapping nominal values to numeric values (in effect, a ranking), based on the
transmittance as well as the statistical properties of the plants, was proposed [3]. Spectral analysis
(using chemical means) is a tedious and time-consuming process, and thus difficult to repeat every time
(numerically) unclassified cotton varieties need to be classified. A supporting statistical method was
also proposed, based on linear regression curve fitting using normalized nominal attributes.
Subsequently a rank is assigned to the variety based on its R² value and the slope of the plot. This
rank thus becomes the numeric equivalent of the nominal alphanumeric name of the variety being
considered. Spectral analysis of 12 cotton varieties in the visible and near infrared regions was also
performed. A 60% classification accuracy was achieved, i.e., correspondence between the rank order
generated through regression and the spectral rank order.

A process model for analyzing data was presented, together with a description of the support that the
Waikato Environment for Knowledge Analysis (WEKA) provides for this model [8]. The domain model learned
by the data mining algorithm can then be readily incorporated into a software application. This
WEKA-based analysis and application construction process was illustrated through a case study in the
agricultural domain, namely mushroom grading.

The effect of pesticides on humans cannot be checked directly because of the poisonous nature of
pesticides; therefore the usage of pesticides on the cotton crop was taken into consideration for the
purpose [16]. The COF Clustering Tool can be used not only for pesticide data but also possesses the
flexibility to deal with any numeric data.

Spatial data mining methods for extracting interesting and regular knowledge from large spatial
databases of agriculture were studied [12], with the aim of discerning trends in agricultural production
with reference to the availability of inputs. The "predicted and real vs. counter" graph illustrated how
closely the PolyAnalyst prediction follows the actual value of the attribute over the range of the
dataset. By applying data mining techniques to agriculture, the targets for different food grains can be
achieved. The study demonstrated the scope for applying spatial mining tools for a utility study and
analysis. The specific application of PolyAnalyst gave a clear scope for evaluation and comparison of
predicted and real values.

The influence of climatic factors on the production of major kharif and rabi crops in the Bhopal
District of Madhya Pradesh State was studied [18]. The decision tree analysis indicated that the
productivity of the soybean crop was mostly influenced by relative humidity, followed by rainfall and
temperature. The productivity of the paddy crop was mostly influenced by rainfall, followed by
relative humidity and evaporation. For the wheat crop, the analysis indicated that productivity is
mostly influenced by temperature, followed by relative humidity and rainfall. The findings of the
decision tree were confirmed by Bayesian classification. Decision trees are fast to execute and are
desirable as interpretable representations of knowledge. The rules formed from the decision tree are
helpful in identifying the conditions responsible for high or low crop productivity.

Powdery mildew of mango, a devastating disease of mango, was predicted [9] using Decision Tree
induction, Rough Sets (RS) and hybridized Rough Set based Decision Tree Induction (RDT), in comparison
with the standard Logistic Regression (LR) method. The induction algorithms showed better performance
than logistic regression.

A web-based expert information system based on the ID3 algorithm was studied [4], in which an expert
system provides advisory services to tomato growers regarding pests, diseases and their control
measures. The web-based system also has a provision for growers to interact with other growers on the
management practices of tomato crop cultivation.

IBLE, an advanced decision tree algorithm grounded in information theory, was described in [20]. It uses
the concept of channel capacity as the measure for choosing the characteristics that are important to an
entity, and combines many characteristics into rules so that examples can be distinguished correctly and
effectively. The authors applied this algorithm to oral cavity disease diagnosis; the experimental
results indicated that the algorithm has a very strong recognition capability and can provide very good
diagnostic assistance for agricultural case diagnosis.

The application of information technology in agriculture accelerates the digitization of agricultural
information. A new improved CA algorithm based on the traditional decision tree method was presented in
[10]. It introduces a pre-treatment theory of double dimension reduction which can deal with large,
high-scale datasets. By using the CA algorithm in maize seed breeding, the authors analyzed the
potential rules and extracted useful information to guide the growth of maize. Their experiment showed
that the improved CA algorithm can obtain more intuitive and efficient information.

3. Conclusions

There is a growing number of applications of data mining techniques in agriculture and a growing amount
of data currently available from many resources. This is a relatively novel research field and it is
expected to grow in the future. There is a lot of work to be done in this emerging and interesting
research field. The multidisciplinary approach of integrating computer science with agriculture will
help in forecasting and managing agricultural crops effectively.

References

[1] Jiawei Han and Micheline Kamber, 2000. "Data Mining Concepts & Techniques". Simon Fraser University.

[2] Abdullah, A., Brobst, S., M. Umer M. 2004. "The case for an agri data ware house: Enabling
analytical exploration of integrated agricultural data". Proc. of IASTED International Conference on
Databases and Applications. Austria. Feb

[3] Abdullah, A., Brobst, S., Pervaiz. I., Umer M., A. Nisar. 2004. "Learning dynamics of pesticide
abuse through data mining". Proc. of Australian Workshop on Data Mining and Web Intelligence, New
Zealand, January.

[4] Zhenmin Li, Sudarshan M. Srinivasan, Zhifeng Chen, Yuanyuan Zhou, Peter Tzvetkov, Xifeng Yan, and
Jiawei Han. "Using Data Mining to Discover Patterns in Autonomic Storage Systems". 1st Workshop on
Algorithms and Architectures for Self-Managing Systems, in conjunction with ISCA and SIGMETRICS, June
2003.

[5] Abdullah, A., Bulbul. R., Tahir Mehmood. 2005. "Mapping nominal values to numbers by data mining
spectral properties of leaves". Proc. of 3rd International Symposium on Intelligent Information
Technology in Agriculture. Beijing, China. Oct, 2005.

[6] Babu, MSP., Ramana Murthy, NV, SVNL Narayana, 2010. "A web based tomato crop expert information
system based on artificial intelligence and machine learning algorithms". Int. J. of Comp. Sci., and
Information Technologies. Vol. 1(1). pp. 6-15.

[7] Basak J., Sudharshan, A., Trivedi D., M.S. Santhanam. 2004. "Weather Data Mining Using Independent
Component Analysis". J. of Machine Learning Research 5: pp. 239-253.

[8] Camps-Valls G, Gomez-Chova L, Calpe-Maravilla J, Soria-Olivas E, Martin-Guerrero JD, Moreno J.,
2003, "Support vector machines for crop classification using hyperspectral data". Lect Notes Comp Sci
2652: pp. 134–141.

[9] Chi-Chung LAU, Kuo-Hsin HSIAO, 2005. "Bayesian Classification For Rice Paddy interpretation". Paper
presented in Conference on data mining held at China Taipei. December, 2005.

[10] Cunningham S.J., G. Holmes. 2005. "Developing innovative applications in agriculture using data
mining". Proc. of 3rd International Symposium on Intelligent Information Technology in Agriculture.
Beijing, China. Oct, 2005.

[11] Jain Rajni, Minz, S., V. Rama Subramaniam. 2009. "Machine learning for forewarning crop diseases".
J. Ind. Soc. Agri. Stat. 63(1): pp. 97-107.

[12] Jianlin Ji Dan, Qiu Chen, Jianping Chen, Li He Peng, 2010. "An improved decision tree algorithm and
its application in maize seed breeding". Sixth International Conference on Natural Computation, held at
Yantai, Shandong 10-12th January. pp. 117-121.

[13] Jones JW, Tsuji GY, Hoogenboom G, Hunt LA, Thornton PK, Wilkens PW, Imamura DT, Bowen WT, Singh U.,
(1998), "Decision support system for agrotechnology transfer: DSSAT v3". In: Tsuji GY, Hoogenboom G,
Thornton PK (eds), "Understanding options for agricultural production". Kluwer Academic Publishers,
Dordrecht, pp 157–177.

[14] Kiran Mai, C., Murali Krishna, I.V., A. Venugopal Reddy, 2006. "Data Mining of Geo-spatial Database
for Agriculture Related Application". Proc. of Map India. New Delhi.

[15] Leonard RA, Knisel WG, Still DA., 1987, "GLEAMS: groundwater-loading effects of agricultural
management systems". Trans Am Soc Agric Eng 30(5): pp. 1403–1418.

[16] McQueen Robert J, Garner S.R., Nevill-Manning C.G., Ian H. Witten, 1995. "Applying machine learning
to agricultural data". Computers and Electronics in Agriculture. Vol. 12: pp. 275-293.

[17] Meyer GE, Neto JC, Jones DD, Hindman TW, 2004, "Intensified fuzzy clusters for classifying plant,
soil, and residue regions of interest from color images". Computer Electronics Agric Vol. 42: pp.
161–180.

[18] Rabia Imitiaz, Malik Sikandar Hayat Khiyal, Shahid Khalil, Ahsan Abdullah, 2005, "Effect of
pesticides on human life through visual data mining". Journal of Theoretical and Applied Information
Technology. pp. 104-109.

[19] Stockle CO, Martin SA, Campbell GS, 1994, "CropSyst, a cropping systems model: water/nitrogen
budgets and crop yield". Agric Syst Vol. 46(3): pp. 335–359.

[20] Verheyen K, Adriaens D, Hermy M, Deckers S., 2001, "High-resolution continuous soil classification
using morphological soil profile descriptions". Geoderma Vol. 101: pp. 31–48.

[21] Yue Jin Hai, Song Kai, 2010. "IBLE Algorithm in agricultural disease diagnosis". In third
International Conference on Intelligent Networks and Intelligent Systems held at Shenyang, Liaoning
China during November 01-November 2003.

[22] Olivia Parr Rud: "Data Mining, Modeling data for marketing risk, and Customer Relationship
Management", Wiley Publications 2003.
