Crop Yield Prediction Using Gradient Boosting Neural Network Regression Model
By
Batch – A7
CERTIFICATE
This is to certify that the project report titled “Crop Yield Prediction Using
Gradient Boosting Neural Network Regression Model” is a bona fide work of the following
IV/IV B.Tech. students in the Department of Computer Science and Engineering, Gayatri
Vidya Parishad College of Engineering for Women, affiliated to JNT University, Kakinada,
during the academic year 2022-23, in partial fulfillment of the requirement for the award of
the degree of Bachelor of Technology of this university.
External Examiner
ACKNOWLEDGEMENT
The satisfaction that accompanies the successful completion of any task would be
incomplete without the mention of people who made it possible and whose constant
guidance and encouragement crown all the efforts with success.
We would like to take this opportunity to express our profound sense of gratitude to
the revered Principal, Dr. R. K. Goswami, for allowing us to utilize the college resources,
thereby facilitating the successful completion of our thesis.
We would also like to express our profound sense of gratitude
to the Vice Principal, Dr. G. Sudheer, for allowing us to utilize the college resources, thereby
facilitating the successful completion of our thesis. Last but not least, we are thankful
to both the teaching and non-teaching faculty of the Department of Computer Science and
Engineering for their valuable suggestions on our project.
TABLE OF CONTENTS
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
LIST OF SCREENS
LIST OF ACRONYMS
1. INTRODUCTION
1.1. Motivation
1.2. Problem Definition
1.3. Objective of the Project
1.4. Limitation of the Project
1.5. Organization of the Project
2. LITERATURE SURVEY
2.1. Introduction
2.2. Existing System
2.3. Disadvantages of Existing System
2.4. Proposed System
2.5. Conclusion
3. REQUIREMENTS ANALYSIS
3.1. Introduction
3.2. Requirement Specification
3.2.1. Functional Requirement
3.2.2. Non-Functional Requirement
3.3. Conclusion
4. METHODOLOGY
4.1. Introduction
4.2. Modules Identified
4.2.1. Data Collection and Preprocessing
4.2.1.1. Data Collection
4.2.1.2. Data Preprocessing
4.2.2. Feature Extraction
4.2.3. Model Building
4.2.3.1. Random Forest
4.2.3.2. Support Vector Machine
4.2.3.3. Artificial Neural Networks
4.2.3.4. Gradient Boosting
4.2.4. Output Prediction
4.3. Architecture Diagram
5. IMPLEMENTATION
5.1. Output Screens
5.2. Conclusion
6. FUTURE WORK
7. CONCLUSION
REFERENCES
ABSTRACT
Agriculture is the most important sector of the economy, especially in developing countries like
India. The use of information technology in agriculture can change how decisions are
made and help farmers achieve better yields. About half of the population of India
depends on farming for its livelihood, yet the contribution of agriculture
to the GDP of India is just 14 percent. One plausible reason for this is the
lack of adequate decision making by farmers on yield prediction. There is no
system in place to advise farmers on which crops to grow. The proposed machine
learning approach aims at predicting the crop yield for a particular region by analyzing
various atmospheric factors like rainfall, temperature and humidity, and land factors like
soil pH and soil type, along with past records of crops grown. Finally, our system is expected to
predict the yield based on the dataset we have collected.
This project will help farmers to obtain an accurate estimate of the yield of their crop before cultivating
the field and thus help them to make appropriate decisions. It attempts
to solve the issue by building a prototype of an interactive prediction and error reduction
system. Such a system will be implemented with an easy-to-use web-based graphical user
interface and the learning algorithm. The results of the prediction will
be made available to the farmer. Several techniques and algorithms exist for this kind of
data analytics in crop prediction, and with their help we can
predict crop yield. The Random Forest algorithm, Support Vector Machine [SVM], Artificial
Neural Networks [ANN] and Gradient Boosting algorithms are used.
Key Words
Random Forest, Support Vector Machine, Neural Networks, Gradient Boosting Algorithm
LIST OF FIGURES
LIST OF TABLES
LIST OF SCREENS
1. Screen 1 Implementation of Random Forest Using All Parameters
2. Screen 2 Implementation of Random Forest Using Selected Parameters
LIST OF ACRONYMS
DT Decision Tree
RF Random Forest
ML Machine Learning
INTRODUCTION
Boosting and Artificial Neural Networks [ANN] on dataset which contains above resources
to predict the crop yield.
Chapter 1: Introduction describes the motivation, domain and problem definition of this
project.
Chapter 2: Literature survey describes the primary terms involved in the development of
the project.
Chapter 3: Analysis deals with the detailed analysis of the project, covering the introduction,
functional requirements and non-functional requirements.
Chapter 4: Methodology includes modules identification and architecture diagrams of the
system.
Chapter 5: Implementation contains the step-by-step process and screenshots of the output.
Chapter 6: Contains future scope.
Chapter 7: Conclusion of the project.
Chapter 8: Contains references and base paper.
2. LITERATURE SURVEY
2.1 INTRODUCTION
India is one of the world’s oldest countries with a thriving agricultural sector. Due
to globalization, agricultural trends have changed drastically in recent years, and the state of
agriculture in India has been influenced by a number of factors. Crop yield prediction
using more accurate models helps farmers all over the world to estimate the crop yield based
on their land and weather factors. In our project we have used both machine learning
and deep learning models and applied the Gradient Boosting algorithm for better accuracy.
This helps farmers to know the crop yield and also contributes to the country’s GDP.
The existing model uses machine learning and neural network models.
In [1], Elavarasan, D., & Vincent, P. M. D. (2020) proposed “CROP YIELD PREDICTION
USING DEEP REINFORCEMENT LEARNING MODEL FOR SUSTAINABLE
AGRARIAN APPLICATIONS”. Predicting crop yield based on environmental, soil,
water and crop parameters has been a potential research topic. Deep-learning-based models
are broadly used to extract significant crop features for prediction. Combining the
intelligence of reinforcement learning and deep learning, deep reinforcement learning
builds a complete crop yield prediction framework that can map the raw data to the crop
prediction values. The proposed work constructs a Deep Recurrent Q-Network model,
which is a Recurrent Neural Network deep learning algorithm over the Q-Learning
reinforcement learning algorithm, to forecast the crop yield. The proposed
model efficiently predicts the crop yield, outperforming existing models by preserving the
original data distribution, with an accuracy of 93.7%. Algorithms used: Deep reinforcement
learning, Deep learning, Artificial neural network, Random Forest, Bayesian ANN.
the three machine learning classifiers like DT, RF and SVM. A machine learning approach
is used to examine soil fertility and plant nutrient management. The backpropagation network
(BPN) used is trained with inputs on crop growth characteristics, nutrient reserves in the
soil, and external applications for crop production. Feature selection techniques: Boruta,
RFE (recursive feature elimination), MRFE (modified RFE). Classification techniques:
Naive Bayes (NB), Decision Trees (DT), Support Vector Machine
(SVM), K-Nearest Neighbor (KNN), Random Forest (RF).
In [4], Suresh, N., Ramesh, N. V. K., Inthiyaz, S., Priya, P. P., Nagasowmika, K., Kumar,
K. V. N. H., … Reddy, B. N. K. (2021) proposed “CROP YIELD PREDICTION USING
RANDOM FOREST ALGORITHM”. Agriculture plays an important role in
the economy of India. India is an agricultural country and its economy is largely based upon
crop production; hence agriculture can be called the backbone of all businesses
in our country. The paper focuses on predicting the yield of a crop
using machine learning, which the authors regard as the technique
that gives a better practical solution to the crop yield problem. The Random Forest
algorithm was chosen to train the model for high accuracy and good
prediction, with 5 climatic parameters used as training inputs. Agricultural inputs such as
pesticides, fertilizers, chemicals and soil quality are also mentioned. The model is built from
20 decision trees, which form the random forest and give better accuracy.
A 10-fold cross-validation technique is used to improve the accuracy of the model. The
reported accuracy of the model is 87%.
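The 10-fold cross-validation mentioned above can be sketched in plain Python as follows. This is only an illustrative sketch, not the cited authors' code: the helper names (`k_fold_indices`, `cross_validate`), the mean-predictor stand-in model and the toy rainfall/yield numbers are all assumptions, and the folds are taken contiguously without shuffling to keep the example short.

```python
from statistics import mean

def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(xs, ys, fit, predict, k=10):
    """Return the mean absolute error measured on each of the k held-out folds."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        errors = [abs(predict(model, xs[i]) - ys[i]) for i in test_idx]
        scores.append(mean(errors))
    return scores

# Hypothetical stand-in model: always predict the mean training yield.
def fit_mean(train_x, train_y):
    return mean(train_y)

def predict_mean(model, x):
    return model

# Toy data (invented for illustration): rainfall in mm and yield in t/ha.
rainfall = [100 + 10 * i for i in range(20)]
yields = [2.0 + 0.01 * r for r in rainfall]

fold_errors = cross_validate(rainfall, yields, fit_mean, predict_mean, k=10)
print("folds:", len(fold_errors), "mean MAE:", round(mean(fold_errors), 3))
```

Each sample is held out exactly once, so the averaged fold error is an estimate of how the model behaves on data it was not trained on.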
In [5], Rashid, M., Bari, B. S., Yusup, Y., Kamaruddin, M. A., & Khan, N. (2021) proposed
“A COMPREHENSIVE REVIEW OF CROP YIELD PREDICTION USING MACHINE
LEARNING APPROACHES WITH SPECIAL EMPHASIS ON PALM OIL
YIELD PREDICTION”. Intelligent agriculture requires extensive use of image recognition
of agricultural diseases. Several machine learning approaches, along with more recent
artificial intelligence (AI) techniques like deep learning and transfer learning, have begun
to be applied to agricultural diagnostics. Satellite-based SIF features are a potential
input for predicting crop yield, and there are several possible ways of improving the performance
of yield forecasting using SIF. In particular, the many ways of using SIF data to build
crop yield forecasting algorithms may lead to performance variation. The most
promising conventional ML architectures are LR, RF and NN. Besides these algorithms,
some DL models, including DNN, CNN and LSTM, are also employed in crop yield
estimation. A wide range of classification and regression algorithms have been employed
in previous studies to predict crop yield. According to the extracted data, the most utilized
crop yield prediction algorithm is ANN, and the second most used algorithm is RF. The
other popular algorithms are LR, CNN, SVM, SVR and LASSO.
In [6], Khaki, S., Wang, L., & Archontoulis, S. V. (2020) proposed “A CNN-RNN
FRAMEWORK FOR CROP YIELD PREDICTION”. It focuses on a deep learning
framework using convolutional neural networks (CNNs) and recurrent neural networks
(RNNs) for crop yield prediction based on environmental data and management practices.
The proposed CNN-RNN model, along with other popular methods such as random forest
(RF), deep fully connected neural networks (DFNN), and LASSO, was used to forecast
corn and soybean yield. The CNN-RNN model was designed to capture the time dependencies
of environmental factors and the genetic improvement of seeds over time without having
their genotype information. The model demonstrated the capability to generalize the yield
prediction to untested environments without a significant drop in prediction accuracy.
In [7], Khaki, S., & Wang, L. (2019) proposed “CROP YIELD PREDICTION USING
DEEP NEURAL NETWORKS”. Crop yield prediction is of great importance to global
food production. To compare the individual importance of genotype, soil and weather
components in yield prediction, the authors obtained prediction results using the following
models:
DNN(G): This model uses the DNN model to predict the phenotype based on the
genotype data (without using the environment data), and is able to capture linear and
nonlinear effects of genetic markers.
DNN(S): This model uses the DNN model to predict
the phenotype based on the soil data (without using the genotype and weather data), and
is able to capture linear and nonlinear effects of soil conditions.
DNN(W): This model uses
the DNN model to predict the phenotype based on the weather data (without using the
genotype and soil data), and is able to capture linear and nonlinear effects of weather
components.
In [8], Nishant, P. S., Sai Venkat, P., Avinash, B. L., & Jabber, B. (2020) proposed “CROP
YIELD PREDICTION BASED ON INDIAN AGRICULTURE USING MACHINE
LEARNING”. Agriculture is the backbone of India, and this
paper predicts the yield of almost all kinds of crops that are planted in the country. The
approach adds a meta-model that is trained on the out-of-fold predictions of the other models.
The total training set is divided into two sets (train and
holdout). The selected base models are trained on the first part (train) and tested on the second
part (holdout). The predictions obtained on the holdout part are then the inputs used to train
the higher-level learner, called the meta-model. The performance metric used is root mean
square error. When the models were applied individually, the error was around 4% for ENet,
about 2% for Lasso and about 1% for Kernel Ridge; after stacking it was less than
1%.
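The train/holdout stacking procedure and the root mean square error metric described above can be sketched as follows. This is a minimal illustration under stated assumptions: two single-feature least-squares lines stand in for the base models (the cited paper uses ENet, Lasso and Kernel Ridge), the meta-model is a single least-squares blending weight, and the data and all function names are invented for the example.

```python
from math import sqrt

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x on a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def predict_line(model, x):
    a, b = model
    return a + b * x

def rmse(y_true, y_pred):
    """Root mean square error between two equally long sequences."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def fit_blend_weight(p1, p2, ys):
    """Meta-model: least-squares weight w for the blend w * p1 + (1 - w) * p2."""
    num = sum((a - b) * (y - b) for a, b, y in zip(p1, p2, ys))
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    return num / den if den else 0.5

# Toy data (invented): yield depends on both rainfall and temperature.
rain = [50, 60, 70, 80, 90, 100, 110, 120, 130, 140]
temp = [30, 24, 29, 22, 27, 21, 26, 20, 25, 19]
yld = [0.02 * r + 0.05 * t for r, t in zip(rain, temp)]

# 1) Divide the training data into a train part and a holdout part.
train, hold = slice(0, 6), slice(6, 10)

# 2) Fit every base model on the train part only.
m_rain = fit_line(rain[train], yld[train])
m_temp = fit_line(temp[train], yld[train])

# 3) Predict the holdout part; these predictions are the meta-model inputs.
p_rain = [predict_line(m_rain, r) for r in rain[hold]]
p_temp = [predict_line(m_temp, t) for t in temp[hold]]

# 4) Fit the meta-model on the holdout predictions and blend.
w = fit_blend_weight(p_rain, p_temp, yld[hold])
stacked = [w * a + (1 - w) * b for a, b in zip(p_rain, p_temp)]

rain_rmse = rmse(yld[hold], p_rain)
temp_rmse = rmse(yld[hold], p_temp)
stacked_rmse = rmse(yld[hold], stacked)
print("RMSE rain-only:", rain_rmse)
print("RMSE temp-only:", temp_rmse)
print("RMSE stacked:  ", stacked_rmse)
```

Because the blending weight is fitted on the same holdout rows that are scored here, the stacked error can never exceed either base error on those rows; an honest comparison would score all three on a further unseen test set.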
In [9], M. Qiao et al. proposed “EXPLOITING HIERARCHICAL FEATURES FOR
CROP YIELD PREDICTION BASED ON 3-D CONVOLUTIONAL NEURAL
NETWORKS AND MULTI KERNEL GAUSSIAN PROCESS”. Accurate and timely
prediction of crop yield based on remote sensing data is important for food security.
However, crop growth is a complex process, which makes it quite difficult to achieve good
performance. A 3-D CNN is first applied to extract spatial–spectral features for
crop yield prediction. An MKGP with a new “spatial–spectral–spatio”
composite Gaussian kernel is then stacked on top of the 3-D CNN. County-level wheat
yield in China is predicted to show the effectiveness of the proposed method. Algorithms
used: 3-D convolutional neural networks, multi-kernel GP, Gaussian process,
$f(x) \sim \mathcal{GP}(m(x), k(x, x'))$.
In [11], Dr. V. Latha Jothi, Neelambigai A, Nithish Sabari S and Santhosh K (2020) proposed
“CROP YIELD PREDICTION USING KNN MODEL”. Climate and other
environmental changes have become a major threat in the agriculture field. This
makes the problem of predicting the yield of crops an exciting challenge. Data mining
techniques are a good choice for this purpose. A KNN model is used to classify the
groundwater level dataset in order to predict future test records. Models used: ARMA
model-based prediction for rainfall, temperature and ground water; ground water level
classification based on the KNN model. Their proposed algorithm was later compared with the
C&R tree algorithm and outperformed it with an accuracy of 90%.
In [12], Bhanumathi, S., Vineeth, M., & Rohit, N. (2019) proposed “CROP YIELD
PREDICTION AND EFFICIENT USE OF FERTILIZERS”. The paper proposes a
prediction model for agricultural datasets, referred to as the CRY algorithm for
crop yield, which uses beehive clustering techniques. The parameters taken into consideration
are crop type, soil type, soil pH value, humidity and crop sensitivity.
The analysis focused in particular on paddy, rice and sugarcane yields in India. Algorithms
used: K-means algorithm, Apriori algorithm, Bayes algorithm. Algorithms used for
classification: Linear regression, ANN algorithm, KNN algorithm. Their proposed
algorithm was later compared with the C&R tree algorithm and outperformed it
with an accuracy of ninety percent.
In [13], Bhosale, S. V., Thombare, R. A., Dhemey, P. G., & Chaudhari, A. N. (2018)
proposed “CROP YIELD PREDICTION USING DATA ANALYTICS AND HYBRID
APPROACH”. India is by and large an agrarian country, and agriculture is the single most
significant contributor to the Indian economy. Agricultural crop production depends on
seasonal, natural and economic factors. Forecasting agricultural yield is a challenging and
worthwhile task for every country. Crop yield prediction is improved through
data mining techniques. The proposed approach uses both soil and yield
features for predicting the crop yield. First, the collected soil and yield data
are pre-processed and the features are extracted. The extracted features are then selected
using the firefly optimization algorithm to reduce the search space during prediction. Once
the features are selected, KNN is used for classification, which helps to
predict the crop yield effectively. Algorithms used: KNN, Decision trees,
Classification algorithm.
In [14], Agarwal, S., & Tarar, S. (2021) proposed “A HYBRID APPROACH FOR CROP
YIELD PREDICTION USING MACHINE LEARNING AND DEEP LEARNING
ALGORITHMS”. Machine learning and deep learning techniques are applied in
order to predict the best crop production. An experiment is carried out on a crop dataset with the
proposed model. The crop is chosen on the basis of the current atmosphere and the soil along
with its constituents, since climatic and soil parameters are taken into consideration. Deep
learning, which has produced many successful results, is used to select the most
suitable crop when a number of options are available. By using this technique, crops are
predicted accurately. The SVM algorithm is implemented as the machine learning technique, while
LSTM and RNN are used as the deep learning techniques. Algorithms used: Support
Vector Machine (SVM), Long Short-Term Memory (LSTM), Recurrent Neural Network
(RNN).
In [16], Pallavi Shankarrao Mahore and Dr. Aashish A. Bardekar proposed “CROP YIELD
PREDICTION USING DIFFERENT MACHINE LEARNING TECHNIQUES”. Crop
yield is very useful information for farmers, and knowing the yield in advance
helps to reduce losses. In the past, yield prediction was done by experienced farmers,
and the proposed system works in a similar way. Datasets have been obtained from the
Kaggle website and other websites. The dataset contains instances
taken from past historical data. It includes 8 parameters or features such as temperature,
rainfall, moisture, humidity, alkaline, sandy etc. The paper presents various machine
learning algorithms for predicting the yield of the crop on the basis of temperature, rainfall,
season and area. Results reveal that Random Forest is the best classifier when all
parameters are combined.
the crop yield. Regression Analysis is used to establish a relationship among a set of
variables AR, AUC and FPI and their effects on yield of rice crop.
In [19], Dishant Israni, Kevin Masalia, Tanvi Khasgiwal, Monica Tolani and Dr. Mani Roja
Edinburgh proposed “CROP-YIELD PREDICTION AND CROP RECOMMENDATION
SYSTEM”. Precision farming is a modern approach in comparison to traditional
cultivation techniques. The paper uses techniques such as XGB Regressor,
Ridge Regression and LGBM Classifier, with hyperparameter tuning applied to these
models to get better accuracy. The authors also plan to combine the models and to
notify farmers using SMS or e-mail. The dataset consists of region-specific
attributes collected from districts of Karnataka, India, such as Bagalkot,
Chamarajanagar, Gadag, Belagavi (Belgaum), Tumakuru (Tumkur), Chikballapur and Koppal.
A total of 29 districts are taken into consideration. The crops considered in the dataset
are cotton, jowar, maize (corn), bajra and rice. The best results for
predicting the crop were obtained with the XGB Regressor with hyperparameter tuning, and the
best results for predicting the yield of crop that can be produced were obtained with the
LGBM Classifier with hyperparameter tuning.
In [20], Javad Ansarifar, Lizhi Wang & Sotirios V. Archontoulis proposed “AN
INTERACTION REGRESSION MODEL FOR CROP YIELD PREDICTION”. Crop
yield prediction is crucial for global food security yet notoriously challenging due to
the multitude of factors that jointly determine the yield, including genotype, environment,
management, and their complex interactions. The most significant contribution of the new
prediction model is its capability to produce accurate predictions and explainable insights
simultaneously. This was achieved by training the algorithm to select features and
interactions that are spatially and temporally robust, to balance prediction accuracy for the
training data and generalizability to the test data. The authors collected weather data from the Iowa
Environmental Mesonet, soil data from the Gridded Soil Survey Geographic Database,
and management and yield performance data from the National Agricultural Statistics
Service for all 293 counties of the states of Illinois, Indiana, and Iowa from 1990 to
2018, and proposed the interaction regression model for crop yield prediction.
Table 1: Summary of Literature Survey
Title | Year | Published in | Authors | Dataset | Algorithms | Limitations
Microwave sensing for estimating cranberry crop yield | 2021 | IEEE Transactions on Geoscience and Remote Sensing | Alex F. Haufler, John H. Booske and Susan C. Hagness | Cranberry proposed dataset | - | Lack of computing efficiency of the testing process
Crop Yield Prediction Using Random Forest Algorithm | 2021 | 2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS) | Suresh, N., Ramesh, N. V. K., Inthiyaz, S., Priya, P. P., Nagasowmika, K., Kumar, K. V. N. H., … Reddy, B. N. K. | Agricultural dataset | Random Forest algorithm | Not accurate on non-linear data; could not classify a hybrid as low yielding
A Comprehensive Review of Crop Yield Prediction Using Machine Learning Approaches with Special Emphasis on Palm Oil Yield Prediction | 2021 | IEEE Access | Rashid, M., Bari, B. S., Yusup, Y., Kamaruddin, M. A., & Khan, N. | Palm oil yield dataset | DNN, CNN and LSTM | Crop yield prediction and crop prediction together not possible
A CNN-RNN framework for crop yield prediction | 2019 | Frontiers in Plant Science, 2019 | Khaki, S., Wang, L., & Archontoulis, S. V. | Meteorological dataset | CNN, RNN | Could not classify a hybrid as low yielding
Crop Yield Prediction Using Deep Neural Networks | 2019 | Front. Plant Sci., 22 May 2019, Sec. Computational Genomics | Khaki, S., & Wang, L. | Agricultural dataset | Deep Neural Networks | Not accurate on non-linear data
Crop Yield Prediction based on Indian Agriculture using Machine Learning | 2020 | 2020 International Conference for Emerging Technology (INCET) | Nishant, P. S., Sai Venkat, P., Avinash, B. L., & Jabber, B. | Indian agricultural dataset | ENet, Lasso | Low prediction using Kernel ridge
Exploiting Hierarchical Features for Crop Yield Prediction based on 3D Convolutional Neural Networks and Multi-kernel Gaussian Process | 2021 | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing | M. Qiao et al. | Hierarchical dataset | 3-D Convolutional Neural Networks, Multi kernel GP | Low prediction accuracy for classification technique
Ensemble Machine Learning Techniques Using Computer Simulation Data for Wild Blueberry Yield Prediction | 2022 | IEEE Access | H. R. Seireg, Y. M. K. Omar, F. E. A. El-Samie, A. S. El-Fishawy and A. Elmahalawy | Blueberry dataset | Light Gradient Boosting Machine, Gradient Boosted Regression, Extreme Gradient Boosting, Ridge | Lack of accuracy for large data sets
Crop yield prediction using KNN model | 2020 | International Journal of Engineering Research & Technology (IJERT), RTICCT – 2020 | Dr. V. Latha Jothi, Neelambigai A, Nithish Sabari S, Santhosh K | Ground water level dataset | ARMA model-based prediction for rainfall, temperature, ground water; ground water level classification based on KNN model | Cannot classify large data sets
Crop yield prediction and efficient use of fertilizers | 2019 | 2019 International Conference on Communication and Signal Processing (ICCSP) | Bhanumathi, S., Vineeth, M., & Rohit, N. (2019) | Fertilizer dataset | K-means algorithm, Apriori algorithm, Bayes algorithm, Linear regression, ANN algorithm, KNN algorithm | Not efficient on different datasets
Crop yield prediction using data analytics and hybrid approach | 2018 | 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA) | Bhosale, S. V., Thombare, R. A., Dhemey, P. G., & Chaudhari, A. N. | Hybrid dataset | KNN, Decision trees, Classification algorithm | Not capable of handling large data sets
A hybrid approach for crop yield prediction using machine learning and deep learning algorithms | 2021 | Journal of Physics: Conference Series, 1714, 012012. doi:10.1088/1742-6596/1714/1/012012 | Agarwal, S., & Tarar, S. | Crop dataset | Support Vector Machine (SVM), Long Short-Term Memory (LSTM), Recurrent Neural Network (RNN) | Does not work for multi-skilled applications
Crop yield prediction using different machine learning techniques | 2021 | International Journal of Scientific Research in Computer Science, Engineering and Information Technology | Pallavi Shankarrao Mahore, Dr. Aashish A. Bardekar | Agricultural dataset | KNN, SVM, Random forest | Varying results with different datasets
Prediction of crop yield and fertilizer recommendation using machine learning algorithms | 2019 | International Journal of Engineering Applied Sciences and Technology, 2019, Vol. 4, Issue 5, ISSN No. 2455-2143, Pages 371-376, Published Online September 2019 in IJEAST | Devdatta A. Bondre, Mr. Santosh Mahagaonkar | Soil testing lab data set | Random forest, SVM | Not applicable for mobile applications
Prediction of crop yield using regression analysis | 2018 | - | V. Sellam and E. Poovammal | Distribution of crop dataset | Ridge regression, XGB Regressor, LGBM Classifier | Crop yield prediction and crop prediction together not possible
Crop-yield prediction and crop recommendation system | 2016 | Indian Journal of Science and Technology | Dishant Israni, Kevin Masalia, Tanvi Khasgiwal, Monica Tolani, Dr. Mani Roja Edinburgh | Rice production dataset | XGB Regressor, Ridge Regression and LGBM Classifier | FPI and their effects on predicting yield of rice crop
An interaction regression model for crop yield prediction | 2021 | Department of Industrial and Manufacturing Systems Engineering, Iowa State University, Ames, IA 50011, USA; Department of Agronomy, Iowa State University, Ames, IA 50011, USA | Javad Ansarifar, Lizhi Wang & Sotirios V. Archontoulis | Historical dataset | Linear regression | Low accuracy when predicting on a large dataset
2.2 EXISTING SYSTEM
Predicting crop yield based on environmental, soil, water and crop parameters
has been a potential research topic. Deep-learning-based models are broadly used to
extract significant crop features for prediction. The proposed work, as shown in Fig 1,
constructs a KNN (K-Nearest Neighbor) model to forecast the crop yield. The proposed model
efficiently predicts the crop yield outperforming existing models by preserving the
original data distribution with an accuracy of 93.7%.
2.3 DISADVANTAGES OF EXISTING SYSTEM
Lack of integrity
Lack of availability and continuity of service
Lack of accuracy
SVM algorithm
Random Forest
Artificial Neural Networks [ANN]
Gradient Boosting
Fig 2. Proposed System
2.5 CONCLUSION
Different papers and knowledge resources were studied, and the
proposed system was designed by incorporating different features from the survey of
research papers. The system uses different models to compare and contrast the
outputs and the accuracy results. It determines the crop yield by
considering different parameters.
3. REQUIREMENT ANALYSIS
3.1 INTRODUCTION
The main focus of the project is to predict the yield of a crop. Crop
yield prediction in agriculture is a new wave that is capturing public
interest. The majority of farmers are unaware of the productivity they can expect from a crop
on a particular field. This is why we are concentrating on
crop yield prediction to maximize the yield. Our Crop Yield Prediction System uses
better models to inform farmers about crop productivity and also to educate
them on it.
RAM: 4.00GB (3.90 GB usable)
System Type: 64-bit Operating System
3.3 CONCLUSION
The precise structure of a system’s data is the key to its success. Normal changes to
the business will not necessitate large adjustments to a system based on those facts if the
data are arranged to avoid redundancy along the lines of the business structure. For many
years, the holy grail of the computer industry has been achieving this durability in the face
of continual business change. It is possible if requirements are expressed in terms of a good
grasp of the data’s underlying structure.
4. METHODOLOGY
4.1 INTRODUCTION
To make things easier, complex programs are usually divided into smaller units of code
called "modules". In Python, a module is a file with the extension .py that contains
executable code, made up of Python statements and
expressions. Most modules are designed to be concise and unambiguous, and they
are intended to solve specific developer problems.
Data preprocessing is the process of preparing raw data for learning models. In
this project, the data obtained is already preprocessed. The dataset
consists of 3649 rows and 48 columns.
4.2.2 Feature Extraction
It refers to the process of transforming raw data into numerical features that can
be processed while preserving the information in the original dataset.
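As an illustration of turning raw records into numerical features, the sketch below one-hot encodes a categorical soil type and min-max scales two numeric columns. It is only a sketch under assumptions: the field names (`soil_type`, `rainfall_mm`, `soil_ph`), the three records and both helper functions are hypothetical and are not taken from the project dataset.

```python
def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over the given categories."""
    return [1.0 if value == c else 0.0 for c in categories]

def min_max_scale(values):
    """Rescale numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]

# Hypothetical raw records; the field names are illustrative only.
records = [
    {"soil_type": "clay", "rainfall_mm": 800.0, "soil_ph": 6.5},
    {"soil_type": "sandy", "rainfall_mm": 400.0, "soil_ph": 7.2},
    {"soil_type": "loam", "rainfall_mm": 600.0, "soil_ph": 6.8},
]

soil_types = sorted({r["soil_type"] for r in records})  # ['clay', 'loam', 'sandy']
rain_scaled = min_max_scale([r["rainfall_mm"] for r in records])
ph_scaled = min_max_scale([r["soil_ph"] for r in records])

# One numeric feature vector per record: soil one-hot, then rainfall, then pH.
features = [
    one_hot(r["soil_type"], soil_types) + [rain, ph]
    for r, rain, ph in zip(records, rain_scaled, ph_scaled)
]
for row in features:
    print(row)
```

Every record becomes a vector of the same length, which is the form that the learning models in the next step expect.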
In this phase, a machine learning or deep learning model is built by learning and
generalizing from training data, then applying that acquired knowledge to new
data it has never seen before to make predictions and fulfill its purpose. The model
is built using different algorithms such as Random Forest, Support Vector
Machine and Artificial Neural Networks. Later, the Gradient Boosting
algorithm is applied on each model to reduce the error rate.
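To make the error-reduction idea concrete, the following is a minimal sketch of gradient boosting for regression with squared loss. It assumes the textbook formulation in which each round fits a one-split decision stump to the current residuals; it is not the project's own pipeline, and the function names and the toy data are illustrative only.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimises squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps

def predict_gradient_boosting(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy data (invented): one input feature and a step-like yield response.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [1.0, 1.2, 1.1, 1.9, 2.1, 2.0, 3.2, 3.0, 3.1, 3.3]

gb_model = fit_gradient_boosting(xs, ys)
baseline_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mse(ys, [predict_gradient_boosting(gb_model, x) for x in xs])
print("baseline MSE:", baseline_mse)
print("boosted MSE: ", boosted_mse)
```

Each round removes part of the remaining training error, which is why the boosted error falls below the mean-only baseline; the learning rate controls how much of each stump is added.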
The goal of the SVM algorithm is to find the best line or decision boundary that
separates n-dimensional space into classes, so that a new data point can easily be placed in
the correct category in the future. This best decision boundary is called a hyperplane. SVM
chooses the extreme points/vectors that help in creating the hyperplane. These extreme
cases are called support vectors, and hence the algorithm is termed Support Vector
Machine.
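The hyperplane idea can be illustrated with a small linear SVM trained by subgradient descent on the regularised hinge loss. This is a sketch under assumptions rather than the project's implementation: the two-class toy points, the hyperparameters (`lam`, `lr`, `epochs`) and the function names are all chosen only for illustration.

```python
def decision_value(model, x):
    """Signed score w . x + b; its sign gives the predicted class."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_linear_svm(points, labels, lam=0.01, lr=0.01, epochs=5000):
    """Minimise lam/2 * ||w||^2 + mean hinge loss by subgradient descent."""
    dim, n = len(points[0]), len(points)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for x, y in zip(points, labels):
            # Only points inside the margin (or misclassified) contribute.
            if y * decision_value((w, b), x) < 1:
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def classify(model, x):
    return 1 if decision_value(model, x) >= 0 else -1

# Toy, linearly separable data (invented): class -1 near the origin, class +1 further out.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0), (4.0, 5.0)]
labels = [-1, -1, -1, 1, 1, 1]

svm_model = train_linear_svm(points, labels)
print("weights and bias:", svm_model)
for p, y in zip(points, labels):
    print(p, "label", y, "score", round(decision_value(svm_model, p), 2))
```

The points whose scores end up closest to the margin play the role of the support vectors: they are the only ones that keep influencing the position of the hyperplane.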
Fig 3. Working of Gradient Boosting Algorithm
After building the model, it is tested against the test data and the crop yield is predicted.
The performance of the model is also tested with the help of metrics like:
4.3 Architecture Diagram
5. IMPLEMENTATION
5.1 INTRODUCTION
The project's implementation stage is when the theoretical design is translated into
a workable system. As a result, it can be seen as the most crucial stage in ensuring the
success of a new system and giving the user confidence that the system will work and be
effective. The implementation step entails meticulous planning, research of the existing
system and its implementation limitations, designing of changeover methods, and
evaluation of changeover methods.
Screen 2: Implementation of Random Forest Using selected Parameters
Screen 4: Implementation of SVM Using All Parameters
Screen 6: Yield Prediction Using SVM
5.3 CONCLUSION
In this chapter we have discussed the result part of our project. It discusses some key
functions and the introduction to the chapter. It also includes output screens of the parts of
code implemented.
6. FUTURE WORK
As already stated, the project is aimed at making accurate yield predictions for various
crops by applying the Gradient Boosting algorithm on various models such as Random
Forest (RF), Support Vector Machine (SVM) and Artificial Neural Networks (ANN). The
current status of the project for this semester (4th year, 1st semester) is completion of the
implementation of Random Forest (RF) and Support Vector Machine (SVM). In the
upcoming semester, the project is expected to complete the implementation of the Artificial
Neural Networks model and then apply the Gradient Boosting algorithm on all the models that
are built. As a future enhancement, the project can be extended to
other machine learning or deep learning models and also to hybrid models.
7. CONCLUSION
The main objective of this project is to accurately estimate the yield of various crops in
Rajasthan. For this purpose, the Gradient Boosting algorithm is used. Currently, two models
have been built (Random Forest and SVM). In future, the project is expected to complete
the implementation of the Artificial Neural Networks model and then apply the Gradient
Boosting algorithm to all the models that have been built, so that the models predict the
yield more accurately. This project is intended to help farmers make appropriate crop
yield predictions in future.
REFERENCES
[1] Everingham, Y.L., Inman-Bamber, N.G., Thorburn, P.J., McNeill,T.J., 2007. A Bayesian
modelling approach for long lead sugarcane yield forecasts for the Australian sugar
industry.Australian Journal of Agricultural Research 58, 87–94.
[2] Hansen, J.W., Indeje, M., 2004. Linking dynamic seasonal climate forecasts with crop
simulation for maize yield prediction in semi-arid Kenya. Agricultural and Forest Meteorology
125 (1–2), 143–157.
[3] Selvaraj, A., Selvaraj, J., Maruthaiappan, S., Babu, G. C., & Kumar, P. M. (2020). L1 norm
based pedestrian detection using video analytics technique. Computational Intelligence, 36(4),
1569-1579.
[4] Zhang, P., Anderson, B., Tan, B., Huang, D., Myneni, R., 2005.Potential monitoring of
crop production using a satellitebased Climate-Variability Impact Index. Agricultural and
Forest Meteorology 132 (3–4), 344–358.
[5] Y.L. Everinghama, C.W. Smyth, N.G. Inman-Bamber.,2008. Ensemble data mining
approaches to forecast regional sugarcane crop production. Agricultural and Forest
Meteorology 149 (2009), 689–696.
[6] X.E. Pantazi ,D. Moshou , T. Alexandridis, R.L. Whetton, A.M. Mouazen ,2016.Wheat
yield prediction using machine learning and advanced sensing techniques. Computers an
Electronics in Agriculture 11 (2016) 5765.http://dx.doi.org/10.1016/j.compag.2015.11.018
[7] Yang Chen ,Won Suk Lee,y, Hao Gan , Natalia Peres , Clyde Fraisse , Yanchao Zhang and
Yong He.,2019Strawberry Yield Prediction Based on a Deep Neural Network Using High-
Resolution Aerial Orthoimages Remote Sens. 2019, 11, 1584; doi:10.3390/rs11131584.
[8] K. MATSUMURA, C. F. GAITAN, K. SUGIMOTO, A. J. CANNON AND W.
W.HSIEH.,2014.Maize yield forecasting by linear regression and artificial neural networks in
Jilin, China. Journal of Agricultural Science,(1- 12). doi:10.1017/S0021859614000392.
[9] Mohammad MotiurRahman, NaheenaHaq, Rashedur M Rahman.Machine Learning
Facilitated Rice Prediction in Bangladesh. 2014 Annual Global Online Conference on 35
Information and Computer Technology. 978-1-4799-8311-7/15 $31.00 © 2015 IEEE DOI
10.1109/GOCICT.2014.9
[10] Alberto Gonzalez-Sanchez, Juan Frausto-Solis,Waldo OjedaBustamante,2014.Predictive
ability of machine learning methods for massive crop yield prediction.Spanish Journal of
Agricultural Research 2014 12(2): 313- 328.InstitutoNacional de Investigación y Tecnología
33
Agraria y Alimentaria (INIA).http://dx.doi.org/10.5424/sjar/2014122-4439.
[11] McQueen RJ, Garner SR, Nevill-Manning CG, Witten IH,1995. Applying machine
learning to agricultural data.Comput Electron Agr 12(4): 275-293.
[12] Y., Xu, H., Yan, L., 2017. Support vector machine-based open crop model
(SBOCM):Case of rice production in China. Saudi Journal of Biological Sciences 24 (3),537–
547. https://doi.org/10.1016/j.sjbs.2017.01.024.
[13] Ahilan, A., Manogaran, G., Raja, C., Kadry, S., Kumar, S. N., Kumar, C. A., ...&
Murugan, N. S. (2019). Segmentation by fractional order darwinian particle swarm
optimization based multilevel thresholding and improved lossless prediction based
compression algorithm for medical images. Ieee Access, 7, 89570-89580.
[14] Marinkovic´ B, Crnobarac J, Brdar S, Antic´ B, Jac´imovic´ G,Crnojevic´ V, 2009. Data
mining approach for predictive modeling of agricultural yield data. Proc. First Int Workshop
on Sensing Technologies in Agriculture,Forestry and Environment (BioSense09), Novi Sad,
Serbia, October, pp: 1-5. [15] Ruß G, Kruse R, 2010. Feature selection for wheat yield
prediction. In: Research and development in intelligent systems XXVI (Bramer M et al., eds.),
SpringerVerlag,London.
[16] Zhang, B., Valentine, I., Kemp, P., 2005. Modelling the productivity of naturalised
pasture in the north island, New Zealand: a decision tree approach. Ecol. Model. 186 (3),299–
311. https://doi.org/10.1005/j.ecolmod.2005.10.331
[17] Matsumura, K., Gaitan, C.F., Sugimoto, F., Cannon, A., Hsieh, W.W., 2015. Maize yield
forecasting by linear regression and artificial neural networks in Jilin, China. J. Agr.Sci. 153
(3), 399–410. https://doi.org/10.1016/j.agrsci.2015.10.153. 36
[18] Rub, G., Kruse, R., 2010. Feature selection for wheat yield prediction. In: Bramer,
M.(Ed.), Research and Development in Intelligent Systems XXVI. SpringerVerlag,London.
[19] Bocca, F.F., Rodrigues, L.H.A., 2016. The effect of tuning, feature engineering, and
feature selection in data mining applied to rainfed sugarcane yield modelling.
Comput.Electron. Agric. 128, 67–76. https://doi.org/10.1016/j.com&ele.2016.10.128.
[20] Fortin, J.G., Anctil, F., Parent, L., Bolinder, M.A., 2011. Site specific early season potato
yield forecast by neural network in Eastern Canada. Precis. Agr. 12 (6), 905–
923.https://doi.org/10.1011/j.preagr.2011.10.905
34