Performance Evaluation of Machine Learning Algorithms for a Cluster-based Crop Recommendation System

2023 17th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS) | 979-8-3503-7091-1/23/$31.00 ©2023 IEEE | DOI: 10.1109/SITIS61268.2023.00079
Dalhatu Muhammed
Institut Supérieur d’Electronique de Paris (ISEP), Paris, France
dalhatu.muhammed@ext.isep.fr

Ehsan Ahvar
Nokia, Massy, France
ehsan.ahvar@nokia.com

Shohreh Ahvar
Nokia, Massy, France
shohreh.ahvar@nokia.com

Maria Trocan
Institut Supérieur d’Electronique de Paris (ISEP), Paris, France
maria.trocan@isep.fr
Abstract—A cluster-based crop recommendation system categorizes the candidate crops into several groups or classes (e.g., based on similarities in their soil and environment parameters). After receiving a request from a farmer, it recommends the most appropriate group of crops to the farmer. Proposing a group of crops (i.e., more than one crop) allows farmers to also take their personal interests into account. In addition, the cluster-based crop recommendation system can reduce complexity and resource usage (e.g., compute resources). As the main contribution of this paper, we evaluate the performance of different Machine Learning (ML) algorithms to find the most appropriate one for use in the cluster-based crop recommendation system.

Index Terms—Crop Recommendation, Machine Learning (ML) Algorithms, Cluster-based, Performance Evaluation

I. INTRODUCTION

Smart agriculture (SA) uses technology to increase efficiency, productivity and sustainability in agriculture. It offers various solutions, ranging from crop, soil, water and disease management to smart harvesting [1].

Crop management, one of the SA solutions, is a set of agricultural practices performed to improve the growth, development and yield of crops. It covers a wide range of activities, from seedbed preparation, sowing and crop maintenance to harvesting, storage and marketing. A crop recommendation system, as part of crop management, takes several features (e.g., weather, soil and crop data) as input and suggests the most appropriate crops to farmers.

In our previous work [2], we proposed the concept of a User-friendly AIoT-based Crop Recommendation system (UACR). In this paper, we first present the concept of a cluster-based crop recommendation system, which is an improved version of UACR.

The cluster-based crop recommendation system recommends a group of crops to a farmer. This gives the farmer the opportunity to choose one of the crops in the recommended group based on his/her interests. In addition, grouping the crops can reduce the complexity and, therefore, the resource usage (e.g., compute resources) of the recommendation system.

Furthermore, the choice of ML algorithm plays an important role in the performance of a cluster-based crop recommendation system. A performance evaluation of different ML classification algorithms is therefore needed to find the most appropriate one. The key contributions of this paper are as follows:
• We present the concept of a cluster-based crop recommendation system.
• We group the crops based on the similarities of their soil and environment parameters.
• Using a dataset, we evaluate the performance of different ML classification algorithms for the cluster-based crop recommendation system.
• We identify and propose the most appropriate ML classification algorithm for the cluster-based recommendation system.

The remainder of this paper is organized as follows: Section II presents the related work. The cluster-based crop recommendation system is introduced in Section III. In Section IV, we evaluate the performance of different ML algorithms. Section V concludes our work and discusses future work.

II. RELATED WORK

Crop recommendation systems, as part of crop management, have attracted the attention of researchers over the last few decades as a means to increase food production and reduce global food shortages. Studies in the literature have suggested crops for farmers to grow on their farms using different algorithms. Well-known algorithms such as Naïve Bayes, Support Vector Machine (SVM), Support Vector Classifier (SVC), Decision Tree (DT), K-Nearest Neighbor (KNN), Random Forest
(RF), Linear Regression (LR), Artificial Neural Networks (ANN) and Multivariate Linear Regression (MLR) have been evaluated in previous studies [3], [4], [5], [6].

In [7], Convolutional Neural Networks (CNNs) were used to analyze soil from pictures uploaded by farmers, and an RF algorithm was then used to select a crop to grow. However, that work did not consider all the important soil and weather parameters.

The work in [8] used a pattern matching method to select a suitable crop for a farmer, taking the user's location and the season as inputs and suggesting the best crop. In [9], the authors used a content-based filtering technique to suggest crops to the farmer, again taking the user's location and the cultivation season as input. However, the recommendation systems of [8] and [9] did not consider soil pH or nitrogen, phosphorus and potassium (NPK) when recommending crops.

The crop recommendation system introduced in [10] used the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) method with a ball-tree algorithm to recommend a crop to farmers based on soil parameters and seasonal soil variations. However, the system did not consider climate conditions such as rainfall and humidity. In addition, the clustering in [10] was used to group regions based on their soil parameters and geolocation information, whereas in our paper clustering is used to group crops with similar cultivation parameters.

In [11], the authors trained ML algorithms for crop recommendation and compared the performance of LR, RF, DT and XGBoost; their results showed that RF achieved the highest accuracy. Similarly, SVM was used in [12], [13] to suggest a suitable crop to farmers. An ensemble classification approach to crop recommendation was proposed in [14] using DT, SVM, RF, Naive Bayes and Logistic Regression, where RF and Naive Bayes achieved the highest accuracy.

The authors of [15] used DT, KNN, RF, SVM, Naive Bayes and Logistic Regression to recommend crops and compared their performance, with XGBoost and RF achieving the highest accuracy. The authors of [16] used LR and Neural Networks, considering the cost of cultivation, the model price of crops, the standard price of crops, the nutrient contents, and rainfall and temperature data. However, their recommendation system did not consider humidity.

A crop recommendation system was proposed in [17] in which a large number of algorithms (i.e., Naive Bayes, Logistic Regression, SVM, DT, RF, KNN, LGBM, XGBoost, Gradient Boosting, AdaBoost and Bagging Classifier) were evaluated, and RF achieved the highest accuracy. However, this work did not consider soil NPK as an input.

All of the above-mentioned works recommend only one crop to the farmer. In contrast, our work recommends a group of crops, giving the farmer the freedom to also weigh other local parameters and personal interests.

III. CLUSTER-BASED CROP RECOMMENDATION SYSTEM

In our previous work [2], we proposed the concept of a User-friendly AIoT-based Crop Recommendation system (UACR). In UACR, the user interface enables a user to make a recommendation request by sending the farm location to the recommendation system. Based on the farm location, the system automatically obtains the other required data and runs its ML model to find the most appropriate crop. It then sends the recommended crop to the farmer.

This paper presents a cluster-based recommendation system, which is an extended version of the UACR proposed in our previous work [2]. As shown in Fig. 1, unlike our previous work, which recommends only one crop, the cluster-based recommendation system recommends a group of crops to the farmer (usually 2 or 3 crops). By grouping the crops, and therefore reducing the number of classes, the system complexity and required resources (e.g., computing resources) can also be reduced. In addition, it gives farmers a set of candidates (i.e., the crops in the proposed group) from which they can select the most appropriate crop based on their interests. In this paper, we use the K-means clustering algorithm to group the crops.

IV. PERFORMANCE EVALUATION

This section presents the performance evaluation of ML algorithms for cluster-based crop recommendation. It covers the preprocessing and analysis of the dataset, the evaluation tools, the evaluated ML models, the hyperparameters, the validation techniques and the evaluation metrics. Finally, we compare the ML models based on the accuracy metric.

A. Dataset and Libraries

The initial dataset has 2200 data samples and 8 attributes. It was downloaded from the Kaggle website¹. The features of the dataset are:
• Nitrogen (N): nutrient content of the soil
• Phosphorus (P): nutrient content of the soil
• Potassium (K): nutrient content of the soil
• Temperature: weather data of the environment
• Humidity: weather data of the environment
• pH: soil pH level
• Rainfall: weather data of the environment
• Label: the crop type

¹ https://www.kaggle.com/datasets/aksahaha/crop-recommendation/

We started by analyzing the dataset characteristics and preprocessing the data: checking the attributes of the dataset, checking for null/missing values, duplicate values and outliers, and performing a descriptive analysis of the dataset.

We then augmented the dataset to 6000 and 12000 samples. Data augmentation is the process of increasing the size
of a dataset by creating new examples through transformations of existing data. This is often done to increase the diversity of the data available for training the models. Here, we used one of the common data augmentation techniques for CSV datasets, the random noise method, which adds random noise to the numerical features of the dataset [18]. As a result, we had the initial version and two augmented versions of the dataset.

Fig. 1. The cluster-based crop recommendation system

TABLE I
VALUES OBTAINED FOR HYPERPARAMETERS

Model                Parameter          Value       Best score
RF                   max-depth          15          0.9949
                     min-samples-leaf   2
                     min-samples-split  10
                     n-estimators       100
KNN                  metric             manhattan   0.9827
                     n-neighbors        5
Logistic Regression  C                  0.1         0.9581
                     penalty            l2
DT                   max-depth          8           0.9864
                     min-samples-leaf   4
                     min-samples-split  4
SVM                  C                  100         0.9863
                     gamma              scale
                     kernel             rbf
XGBoost              learning-rate      0.5         0.9896
                     max-depth          3
                     n-estimators       100
Naive Bayes          priors             None        0.9863
                     var-smoothing      1e-09
Bagging              bootstrap          True        0.9942
                     max-features       0.5
                     max-samples        1.0
                     n-estimators       50
GradientBoost        learning-rate      0.1         0.9812
                     max-depth          3
                     n-estimators       50
LGBM                 learning-rate      0.1         0.9909
                     max-depth          3
                     n-estimators       150

B. Descriptive Analysis and Data Visualization

A descriptive analysis of the data was conducted before model training to show what the dataset looks like and to give some insight into the best predictive models, using statistical analysis of the dataset. The data was visualized with a correlation matrix showing the correlation coefficients of the dataset attributes, presented in Fig. 2. It shows a significant correlation between P and K, humidity and temperature, humidity and nitrogen, and humidity and rainfall. The diagonal entries equal 1 by construction, since each feature is perfectly correlated with itself.

C. Crops Clustering

We used one of the most popular clustering algorithms, K-means, to group the crops. K-means clustering is an unsupervised ML algorithm that partitions a dataset into K distinct, non-overlapping subsets, or clusters [19]. The initial dataset used in this paper contains 22 crops. The clustering algorithm divided them into 7 clusters based on their similarities, aiming for groups of 2 to 3 crops.
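The grouping step above can be sketched with scikit-learn as follows. The paper does not specify how crops are represented for K-means, so this sketch assumes that each crop is summarized by the mean of its seven numeric features (N, P, K, temperature, humidity, pH, rainfall), standardized before clustering. The helper name group_crops and the synthetic data (22 hypothetical crops with 100 noisy samples each) are illustrative assumptions, not the authors' code. Note that K-means itself does not enforce a cluster size, so the target of 2 to 3 crops per group is not guaranteed by the algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def group_crops(X, labels, n_groups=7, seed=0):
    """Assign each crop to one of n_groups clusters (hypothetical helper).

    X      -- (n_samples, 7) array: N, P, K, temperature, humidity, pH, rainfall
    labels -- (n_samples,) array of crop names
    Returns a dict mapping crop name -> group id.
    """
    crops = np.unique(labels)
    # Assumption: each crop is represented by the mean of its samples' features.
    profiles = np.vstack([X[labels == c].mean(axis=0) for c in crops])
    # Standardize to unit variance so large-valued features (e.g., rainfall) do not dominate pH.
    profiles = StandardScaler().fit_transform(profiles)
    km = KMeans(n_clusters=n_groups, n_init=10, random_state=seed)
    group_ids = km.fit_predict(profiles)
    return {str(c): int(g) for c, g in zip(crops, group_ids)}


# Synthetic stand-in data: 22 hypothetical crops x 100 noisy samples = 2200 rows.
rng = np.random.default_rng(42)
crop_names = np.array([f"crop_{i:02d}" for i in range(22)])
centers = rng.uniform(0, 100, size=(22, 7))
labels = np.repeat(crop_names, 100)
X = np.repeat(centers, 100, axis=0) + rng.normal(0, 2, size=(2200, 7))

crop_to_group = group_crops(X, labels, n_groups=7)
# Relabel every sample with its crop's group id: this becomes the classification target.
y_group = np.array([crop_to_group[c] for c in labels])

for g in sorted(set(crop_to_group.values())):
    members = [c for c, gid in crop_to_group.items() if gid == g]
    print(g, members)
```

The y_group array is the kind of target a classifier in the cluster-based system would predict: a group id instead of a single crop, which is how grouping reduces the number of classes from 22 to 7.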
D. Models Comparison

The scikit-learn library [20] and Python were used to evaluate and compare different ML models in order to find the most appropriate model for the cluster-based crop recommendation system. Since the dataset contains no temporal information and no seasonal variation, we evaluated well-known algorithms that are used
for non-temporal datasets. We consider the RF, KNN, Logistic Regression, DT, SVM, XGBoost, Naive Bayes, Bagging, GradientBoost and LGBM models.

We used 70% of the initial dataset (2200 samples) for training and hyperparameter tuning and the remaining 30% for testing. The GridSearchCV method was used for hyperparameter tuning. Table I shows the hyperparameter values obtained for the ML models; the best score represents the accuracy of each ML model.

Fig. 2. Dataset features correlation matrix

TABLE II
ACCURACY COMPARISON USING DATASETS WITH SIZES OF 2200, 6000 AND 12000

Model                2200                6000                12000
                     i=5      i=10       i=5      i=10       i=5      i=10
RF                   0.9941   0.9952     0.9946   0.9952     0.9945   0.9951
KNN                  0.9793   0.9798     0.9791   0.9800     0.9791   0.9800
Logistic Regression  0.9553   0.9594     0.9559   0.9586     0.9559   0.9607
DT                   0.9875   0.9903     0.9874   0.9907     0.9876   0.9910
SVM                  0.9777   0.9789     0.9773   0.9791     0.9773   0.9791
XGBoost              0.9925   0.9948     0.9927   0.9950     0.9927   0.9950
Naive Bayes          0.9887   0.9885     0.9845   0.9841     0.9845   0.9841
Bagging              0.9899   0.9914     0.9898   0.9908     0.9897   0.9908
GradientBoost        0.9908   0.9922     0.9904   0.9928     0.9905   0.9929
LGBM                 0.9916   0.9921     0.9927   0.9931     0.9927   0.9932

In the next step, we evaluated the performance of the ML algorithms using k-fold cross validation (k = 5). To obtain more reliable results, we ran 5-fold cross validation 5 times (i = 5) and averaged the results of these 5 runs for every ML algorithm. Similarly, we ran 5-fold cross validation 10 times (i = 10). Table II shows the evaluation results. RF achieved the highest accuracy in all cases, and the results of XGBoost, GradientBoost and LGBM are close to those of RF. The worst results belong to Logistic Regression. In general, increasing the size of the dataset from 2200 to 6000 and even 12000 samples did not noticeably affect the results.

V. CONCLUSION

This paper first presented the concept of a cluster-based crop recommendation system in which crops are grouped based on the similarities of their soil and environment parameters. The cluster-based crop recommendation system receives farm information from a farmer as input and then recommends a group of crops to the farmer. The benefits are 1) giving farmers more than one option, so that they can also take their own interests into account, and 2) reducing complexity and resource usage in the recommendation system.

We then evaluated the performance of several well-known ML algorithms to determine which is the best candidate for the cluster-based crop recommendation system. The results showed that RF obtained the highest accuracy, with XGBoost, LGBM and GradientBoost close to RF. The dataset used in this work was small, and even augmenting it did not noticeably change the results. As future work, the ML models under study can be compared on a larger dataset.

REFERENCES

[1] M. Pathan, N. Patel, H. Yagnik, and M. Shah, “Artificial cognition for applications in smart agriculture: A comprehensive review,” Artificial Intelligence in Agriculture, vol. 4, pp. 81–95, 2020.
[2] D. Muhammed, E. Ahvar, S. Ahvar, and M. Trocan, “A user-friendly AIoT-based crop recommendation system (UACR): concept and architecture,” in 2022 16th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS). IEEE, 2022, pp. 569–576.
[3] T. Setiadi, F. Noviyanto, H. Hardianto, A. Tarmuji, A. Fadlil, and M. Wibowo, “Implementation of Naïve Bayes method in food crops planting recommendation,” Int. J. Sci. Technol. Res, vol. 9, no. 02, pp. 4750–4755, 2020.
[4] T. K. Mishra, S. K. Mishra, K. J. Sai, S. Peddi, and M. Surusomayajula, “Crop recommendation system using support vector machine considering Indian dataset,” in Advances in Distributed Computing and Machine Learning: Proceedings of ICADCML 2022. Springer, 2022, pp. 501–510.
[5] K. G. Sandhya, S. Vemuri, K. S. Deeksha, and T. Anvitha, “Crop recommendation system using ensembling technique,” in 2022 International Conference on Breakthrough in Heuristics And Reciprocation of Advanced Technologies (BHARAT). IEEE, 2022, pp. 55–58.
[6] A. Chougule, V. K. Jha, and D. Mukhopadhyay, “Crop suitability and fertilizers recommendation using data mining techniques,” in Progress in Advanced Computing and Intelligent Engineering. Springer, 2019, pp. 205–213.
[7] A. Motwani, P. Patil, V. Nagaria, S. Verma, and S. Ghane, “Soil analysis and crop recommendation using machine learning,” in 2022 International Conference for Advancement in Technology (ICONAT). IEEE, 2022, pp. 1–7.
[8] A. Kedlaya, A. Sana, B. A. Bhat, S. Kumar, N. Bhat et al., “An efficient algorithm for predicting crop using historical data and pattern matching technique,” Global Transitions Proceedings, vol. 2, no. 2, pp. 294–298, 2021.
[9] N. Patil, S. Kelkar, M. Ranawat, and M. Vijayalakshmi, “Krushi sahyog: Plant disease identification and crop recommendation using artificial intelligence,” in 2021 2nd International Conference for Emerging Technology (INCET). IEEE, 2021, pp. 1–6.
[10] M. Suchithra and M. L. Pai, “Data mining based geospatial clustering for suitable recommendation system,” in 2020 International Conference on Inventive Computation Technologies (ICICT). IEEE, 2020, pp. 132–139.
[11] M. D. Hossain, M. A. Kashem, and S. Mustary, “IoT based smart soil fertilizer monitoring and ML based crop recommendation system,” in 2023 International Conference on Electrical, Computer and Communication Engineering (ECCE). IEEE, 2023, pp. 1–6.
[12] D. Modi, A. V. Sutagundar, V. Yalavigi, and A. Aravatagimath, “Crop recommendation using machine learning algorithm,” in 2021 5th International Conference on Information Systems and Computer Networks (ISCON). IEEE, 2021, pp. 1–5.
[13] M. S. Teja, T. S. Preetham, L. Sujihelen, S. Jancy, M. P. Selvan et al., “Crop recommendation and yield production using SVM algorithm,” in 2022 6th International Conference on Intelligent Computing and Control Systems (ICICCS). IEEE, 2022, pp. 1768–1771.
[14] L. Meenachi, S. Ramakrishnan, M. Sivaprakash, C. Thangaraj, and S. Sethupathy, “Multi class ensemble classification for crop recommendation,” in 2022 International Conference on Inventive Computation Technologies (ICICT). IEEE, 2022, pp. 1319–1324.
[15] V. Vagisha, E. Rajesh, and P. Johri, “Crop recommendation system for intelligent smart farming technology,” in 2022 4th International Conference on Advances in Computing, Communication Control and Networking (ICAC3N). IEEE, 2022, pp. 249–253.
[16] A. Priyadharshini, S. Chakraborty, A. Kumar, and O. R. Pooniwala, “Intelligent crop recommendation system using machine learning,” in 2021 5th International Conference on Computing Methodologies and Communication (ICCMC). IEEE, 2021, pp. 843–848.
[17] S. K. S. Durai and M. D. Shamili, “Smart farming using machine learning and deep learning techniques,” Decision Analytics Journal, vol. 3, p. 100041, 2022.
[18] A. Arora, N. Shoeibi, V. Sati, A. González-Briones, P. Chamoso, and E. Corchado, “Data augmentation using Gaussian mixture model on CSV files,” in Distributed Computing and Artificial Intelligence, 17th International Conference. Springer, 2021, pp. 258–265.
[19] Sudirman, A. P. Windarto, and A. Wanto, “Data mining tools | RapidMiner: K-means method on clustering of rice crops by province as efforts to stabilize food crops in Indonesia,” in IOP Conference Series: Materials Science and Engineering, vol. 420. IOP Publishing, 2018, p. 012089.
[20] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg et al., “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.