Crop Yield Prediction Using Gradient Boosting Neural Network Regression Model

A PROJECT REPORT SUBMITTED IN PARTIAL FULFILLMENT OF THE
REQUIREMENTS FOR THE AWARD OF THE DEGREE OF
BACHELOR OF TECHNOLOGY IN COMPUTER SCIENCE AND ENGINEERING
By
Batch – A7
G. Chaitanya Sree (19JG1A0519) A. Niharika(19JG1A0501)
Ch. Rohitha Anupama (20JG5A0503) G. Srivalli(20JG5A0504)
Under the esteemed guidance of
Mr. K. Purushotham Naidu
Assistant Professor, CSE Department
Department of Computer Science and Engineering

GAYATRI VIDYA PARISHAD COLLEGE OF ENGINEERING FOR WOMEN
Approved by AICTE NEW DELHI, Affiliated to JNTUK Kakinada
Accredited by National Board of Accreditation (NBA) for B. Tech. CSE, ECE & IT (valid from 2019-22 and 2022-25)
Accredited by National Assessment and Accreditation Council (NAAC) with A Grade (valid from 2022-2027)
Kommadi, Madhurawada, Visakhapatnam – 530048
2019 – 2023
GAYATRI VIDYA PARISHAD COLLEGE OF ENGINEERING FOR
WOMEN
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
CERTIFICATE
This is to certify that the project report titled “Crop Yield Prediction using
Gradient Boosting Neural Network Regression Model” is a bonafide work of the following
IV/IV B.Tech. students in the Department of Computer Science and Engineering, Gayatri
Vidya Parishad College of Engineering for Women, affiliated to JNT University, Kakinada,
during the academic year 2022-23, in partial fulfillment of the requirement for the award of
the degree of Bachelor of Technology of this university.
G. Chaitanya Sree (19JG1A0519) A. Niharika (19JG1A0501)
Ch. Rohitha Anupama (20JG5A0503) G. Srivalli (20JG5A0504)
Mr. K. Purushotham Naidu Dr. P. V. S. L. Jagadamba

Assistant Professor Professor
Internal Guide Head of the Department
External Examiner
ACKNOWLEDGEMENT
The satisfaction that accompanies the successful completion of any task would be
incomplete without the mention of people who made it possible and whose constant
guidance and encouragement crown all the efforts with success.
We feel elated to extend our sincere gratitude to Mr. K. Purushotham Naidu,
Assistant Professor, for his encouragement throughout the analysis of the project and for
providing us with all the required facilities. His annotations, suggestions and criticisms
were key to the successful completion of the thesis.
We express our deep sense of gratitude and thanks to Dr. P. V. S. Lakshmi
Jagadamba, Professor and Head of the Department of Computer Science and Engineering,
for her guidance, for her valuable opinions on the development of the project, and for
providing lab sessions and extra hours to complete the project.
We would like to take this opportunity to express our profound sense of gratitude to
the revered Principal, Dr. R. K. Goswami, for allowing us to utilize the college resources,
thereby facilitating the successful completion of our thesis.
We would also like to express our profound sense of gratitude to the Vice Principal,
Dr. G. Sudheer, for allowing us to utilize the college resources, thereby facilitating the
successful completion of our thesis. Last but not the least, we are thankful to both the
teaching and non-teaching faculty of the Department of Computer Science and
Engineering for giving valuable suggestions for our project.
TABLE OF CONTENTS
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
LIST OF SCREENS
LIST OF ACRONYMS
1. INTRODUCTION
1.1. Motivation
1.2. Problem Definition
1.3. Objective of the Project
1.4. Limitations of the Project
1.5. Organization of the Project
2. LITERATURE SURVEY
2.1. Introduction
2.2. Existing System
2.3. Disadvantages of Existing System
2.4. Proposed System
2.5. Conclusion
3. REQUIREMENTS ANALYSIS
3.1. Introduction
3.2. Requirement Specification
3.2.1. Functional Requirement
3.2.2. Non-Functional Requirement
3.3. Conclusion
4. METHODOLOGY
4.1. Introduction
4.2. Modules Identified
4.2.1. Data Collection and Preprocessing
4.2.1.1. Data Collection
4.2.1.2. Data Preprocessing
4.2.2. Feature Extraction
4.2.3. Model Building
4.2.3.1. Random Forest
4.2.3.2. Support Vector Machine
4.2.3.3. Artificial Neural Networks
4.2.3.4. Gradient Boosting
4.2.4. Output Prediction
4.3. Architecture Diagram
5. IMPLEMENTATION
5.1. Output Screens
5.2. Conclusion
6. FUTURE WORK
7. CONCLUSION
REFERENCES
ABSTRACT
Agriculture is the most important sector, especially in developing countries like
India. The use of information technology in agriculture can change the way decisions are
made and help farmers achieve better yields. About half of the population of India depends
on farming for its livelihood, but its contribution to the GDP of India is only 14 percent.
One likely reason for this is the lack of adequate decision-making by farmers on yield
prediction. There is no system in place to suggest to farmers which crops to grow. The
proposed machine learning approach aims at predicting the crop yield for a particular region
by analyzing various atmospheric factors such as rainfall, temperature and humidity, and
land factors such as soil pH and soil type, along with past records of crops grown. Finally,
our system is expected to predict the yield based on the dataset we have collected.
This project will help farmers obtain an accurate estimate of the yield of their crop before
cultivating the field and thus help them make appropriate decisions. It attempts to solve
the issue by building a prototype of an interactive prediction and error reduction system.
Such a system will be implemented with an easy-to-use, web-based graphical user interface
and the learning algorithm. The results of the prediction will be made available to the
farmer. There are different techniques or algorithms for this kind of data analytics in crop
prediction, and with their help we can predict crop yield. The Random Forest algorithm,
Support Vector Machine (SVM), Artificial Neural Networks (ANN) and Gradient Boosting
algorithms are used.
Key Words
Random Forest, Support Vector Machine, Neural Networks, Gradient Boosting Algorithm
LIST OF FIGURES
S. No.  Figure No.  Figure Name
1.      Figure 1    Existing System
2.      Figure 2    Proposed System
3.      Figure 3    Working of Gradient Boosting Model Diagram
4.      Figure 4    Architecture Diagram
LIST OF TABLES
S. No.  Table No.  Table Name
1.      Table 1    Summary of Literature Survey
LIST OF SCREENS
S. No.  Screen No.  Screen Name
1.      Screen 1    Implementation of Random Forest Using All Parameters
2.      Screen 2    Implementation of Random Forest Using Selected Parameters
3.      Screen 3    Yield Prediction Using Random Forest
4.      Screen 4    Implementation of SVM Using All Parameters
5.      Screen 5    Implementation of SVM Using Selected Parameters
6.      Screen 6    Yield Prediction Using SVM
LIST OF ACRONYMS
CNN Convolutional Neural Network
DT Decision Tree
DNN Deep Neural Networks
MLP Multi Layer Perceptron
RNN Recurrent Neural Networks
SVM Support Vector Machine
RF Random Forest
ML Machine Learning
ARMA Autoregressive Moving Average
KNN K-Nearest Neighbors
1. INTRODUCTION
1.1 MOTIVATION OF THE PROJECT
Agriculture is considered the main and foremost culture practiced in India. In earlier
times, people cultivated crops on their own land, which met their needs. Today, many
people lack awareness of how to cultivate crops at the right time and in the right place.
Natural resources and inputs such as nitrogen, phosphorus, potassium, temperature, pH
value, rainfall and humidity are the foundations of agricultural production. Machine
Learning (ML) approaches are used in many fields, ranging from evaluating customer
behavior in supermarkets to predicting customers' phone use. Machine learning has also
been used in agriculture for several years. Crop yield prediction is one of the challenging
problems in precision agriculture, and many models have been proposed and validated so
far. This problem requires the use of several datasets, since crop yield depends on many
different factors such as climate, weather, soil, use of fertilizer, and seed. Crop yield
prediction is therefore not a trivial task; it consists of several complicated steps. Current
crop yield prediction models can estimate the actual yield reasonably well, but better
performance in yield prediction is still desirable.
1.2 PROBLEM DEFINITION
Agriculture is considered the main and foremost culture practiced in India. In earlier
times, people cultivated crops on their own land, which met their needs. Today, many
people lack awareness of how to cultivate crops at the right time and in the right place.
Natural resources and inputs such as nitrogen, phosphorus, potassium, temperature, pH
value, rainfall and humidity are the foundations of agricultural production. Crop yield
prediction is an important agricultural problem. The existing work constructs an ARMA
(Autoregressive Moving Average) model and a K-nearest neighbours (KNN) model to
forecast the crop yield. The proposed model improves on this by applying models such as
Random Forest, Support Vector Machine, Gradient Boosting and Artificial Neural Networks
(ANN) to a dataset that contains the above resources in order to predict the crop yield.
1.3 OBJECTIVE OF THE PROJECT
The main objective of this project is to accurately estimate the yield of various crops
in Rajasthan. Many models are used to predict crop yield, and each model has some error
rate when making predictions. By using the Gradient Boosting model, we try to reduce
the error rate of each model. This project will help farmers make appropriate crop yield
predictions in the future.
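The idea of using gradient boosting to reduce prediction error can be illustrated with a
minimal sketch. It assumes a squared-error loss, one-split regression stumps as weak
learners and a small made-up rainfall/yield dataset. The function names (fit_stump,
gradient_boost), the learning rate and the data are illustrative assumptions, not the
project's implementation or data.

```python
# Minimal gradient boosting for regression with squared-error loss.
# Data, names and hyperparameters are illustrative assumptions.

def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def fit_stump(x, residuals):
    """Fit a one-split regression stump to the residuals by least squares."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # skip the largest value so both sides are non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean

def gradient_boost(x, y, n_rounds=20, learning_rate=0.5):
    """Start from the mean and repeatedly fit stumps to the current residuals."""
    base = sum(y) / len(y)
    stumps = []
    pred = [base] * len(y)
    for _ in range(n_rounds):
        # For squared-error loss the residuals are the negative gradient
        # (up to a constant factor).
        residuals = [t - p for t, p in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [p + learning_rate * stump(xi) for p, xi in zip(pred, x)]

    def predict(xi):
        return base + learning_rate * sum(s(xi) for s in stumps)

    return predict

# Hypothetical rainfall (mm) and yield (t/ha) values, for illustration only.
rainfall = [300, 450, 500, 620, 700, 810, 900, 1000]
crop_yield = [1.1, 1.6, 1.8, 2.4, 2.6, 3.1, 3.0, 2.7]

baseline = sum(crop_yield) / len(crop_yield)
baseline_mse = mse(crop_yield, [baseline] * len(crop_yield))
model = gradient_boost(rainfall, crop_yield)
boosted_mse = mse(crop_yield, [model(r) for r in rainfall])
print(f"baseline MSE: {baseline_mse:.4f}, boosted MSE: {boosted_mse:.4f}")
```

The error here is measured on the training data only; in practice it should be checked on
held-out data, since more boosting rounds can overfit.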
1.4 LIMITATIONS OF THE PROJECT
The main challenge faced in the agriculture sector is the lack of knowledge about
changing variations in climate. As profit analysis is based on data from previous years,
the data may not be accurate and the output may vary from time to time.
1.5 ORGANIZATION OF THE PROJECT
A concise overview of the rest of the documentation is given below:
Chapter 1: Introduction describes the motivation, domain and problem definition of this
project.
Chapter 2: Literature survey describes the primary terms involved in the development of
the project.
Chapter 3: Analysis deals with the detailed analysis of the project: introduction, functional
requirements and non-functional requirements.
Chapter 4: Methodology includes module identification and architecture diagrams of the
system.
Chapter 5: Implementation contains the step-by-step process and screenshots of the output.
Chapter 6: Contains the future scope.
Chapter 7: Conclusion of the project.
Chapter 8: Contains references and the base paper.
2. LITERATURE SURVEY
2.1 INTRODUCTION
India is one of the world’s oldest countries with a thriving agricultural sector. Due
to globalization, agricultural trends have changed drastically in recent years, and the state
of agriculture in India has been influenced by a number of factors. Crop yield prediction
using more accurate models helps farmers all over the world to know the crop yield based
on their land and weather factors. In our project we have used both machine learning and
deep learning models and applied the Gradient Boosting algorithm for better accuracy.
This helps farmers to know the crop yield and also contributes to the country’s GDP.
The existing model uses machine learning and neural network models.
In [1] Elavarasan, D., & Vincent, P. M. D. (2020) proposed “CROP YIELD PREDICTION
USING DEEP REINFORCEMENT LEARNING MODEL FOR SUSTAINABLE
AGRARIAN APPLICATIONS”. Predicting crop yield based on environmental, soil,
water and crop parameters has been a potential research topic. Deep-learning-based models
are broadly used to extract significant crop features for prediction. Combining the
intelligence of reinforcement learning and deep learning, deep reinforcement learning
builds a complete crop yield prediction framework that can map the raw data to the crop
prediction values. The proposed work constructs a Deep Recurrent Q-Network model,
which is a Recurrent Neural Network deep learning algorithm over the Q-Learning
reinforcement learning algorithm, to forecast the crop yield. Methods considered: deep
reinforcement learning, deep learning, artificial neural network, Random Forest and
Bayesian ANN. The proposed model efficiently predicts the crop yield, outperforming
existing models by preserving the original data distribution, with an accuracy of 93.7%.
In [2] S. P. Raja, B. Sawicka, Z. Stamenkovic and G. Mariammal proposed “CROP
PREDICTION BASED ON CHARACTERISTICS OF THE AGRICULTURAL
ENVIRONMENT USING VARIOUS FEATURE SELECTION TECHNIQUES AND
CLASSIFIERS”. Crop prediction in agriculture is critical and is chiefly contingent upon
soil and environment conditions, including rainfall, humidity, and temperature. The work
applies machine learning classifiers such as DT, RF and SVM, and discusses a machine
learning approach to examine soil fertility and plant nutrient management. The
backpropagation network (BPN) used is trained with inputs on crop growth characteristics,
nutrient reserves in the soil, and external applications for crop production. Feature
selection techniques: Boruta, RFE (recursive feature elimination), MRFE (modified RFE).
Classification techniques: Naive Bayes (NB), Decision Trees (DT), Support Vector
Machine (SVM), K-Nearest Neighbor (KNN), Random Forest (RF).
In [3] A. F. Haufler, J. H. Booske and S. C. Hagness proposed “MICROWAVE SENSING
FOR ESTIMATING CRANBERRY CROP YIELD: A PILOT STUDY USING SIMULATED
CANOPIES AND FIELD MEASUREMENT TESTBEDS”. Accurate prediction of cranberry
yield is desirable to farmers, agricultural researchers, and the industry as a whole in order
to maximize supply chain efficiency and future crop yields. The authors collected
experimental field data with a prototype open-ended waveguide sensor operating between
600 and 1300 MHz. They measured experimental microwave signals by placing the sensor
directly on top of cranberry-crop bed canopies in central Wisconsin and recording
reflection coefficients across the operating band. A machine learning approach was
implemented to map the microwave reflection coefficients to yield. Performance
evaluations of the machine learning algorithm applied to the measured field data indicated
that, in 81% of test cases, the predicted crop yield had less than 8% error. Most
importantly, the average yield prediction error was less than 1.3%.
In [4] Suresh, N., Ramesh, N. V. K., Inthiyaz, S., Priya, P. P., Nagasowmika, K., Kumar,
K. V. N. H., … Reddy, B. N. K. (2021) proposed “CROP YIELD PREDICTION USING
RANDOM FOREST ALGORITHM”. Agriculture plays an important role in the economy
of India. India is an agricultural country and its economy is largely based upon crop
production. Hence agriculture can be regarded as the backbone of all businesses in the
country. The paper focuses on predicting the yield of the crop by using different machine
learning algorithms. Machine learning is presented as the technique that gives a better
practical solution to the crop yield problem. The authors chose the Random Forest
algorithm to train their model in order to obtain high accuracy and the best prediction,
and they chose 5 climatic parameters to train the model. The paper also mentions
agricultural inputs such as pesticides, fertilizers, chemicals and soil quality. The random
forest is built from 20 decision trees, which gives better accuracy of the model. A 10-fold
cross-validation technique is used to assess the accuracy of the model. The reported
accuracy of the model is 87%.
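The configuration summarized above, a random forest of 20 trees evaluated with 10-fold
cross-validation, can be sketched as follows. This is not the code of [4]: it assumes
scikit-learn and NumPy are available and uses synthetic data with five stand-in features,
since the paper's dataset is not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for five climatic features and a yield target (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 5))
y = 2.0 * X[:, 0] + np.sin(3.0 * X[:, 1]) + 0.1 * rng.normal(size=200)

# Random forest built from 20 decision trees, as in the summary above.
model = RandomForestRegressor(n_estimators=20, random_state=0)

# 10-fold cross-validation: each fold is held out once for scoring.
folds = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=folds, scoring="r2")

print(f"R^2 per fold: {np.round(scores, 3)}")
print(f"mean R^2 over 10 folds: {scores.mean():.3f}")
```

Cross-validation estimates how well the model generalizes to unseen data; it does not by
itself change the fitted model.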
In [5] Rashid, M., Bari, B. S., Yusup, Y., Kamaruddin, M. A., & Khan, N. (2021) proposed
“A COMPREHENSIVE REVIEW OF CROP YIELD PREDICTION USING MACHINE
LEARNING APPROACHES WITH SPECIAL EMPHASIS ON PALM OIL
YIELD PREDICTION”. Intelligent agriculture requires extensive use of image recognition
of agricultural disease. Several machine learning approaches, along with more recent
artificial intelligence (AI) techniques such as deep learning and transfer learning, have
begun to be applied to agricultural diagnostics. Satellite-based SIF features are a potential
input for predicting crop yield, and there are several potential ways of improving the
performance of yield forecasting using SIF. The different ways of using SIF data to build
crop yield forecasting algorithms may lead to variation in performance. The most
promising conventional ML architectures are LR, RF and NN. Besides these algorithms,
some DL models, including DNN, CNN and LSTM, are also employed in crop yield
estimation. A wide range of classification and regression algorithms have been employed
in previous studies to predict crop yield. According to the extracted data, the most utilized
crop yield prediction algorithm is ANN, and the second most used algorithm is RF. Other
popular algorithms used in the studies are LR, CNN, SVM, SVR and LASSO.
In [6] Khaki, S., Wang, L., & Archontoulis, S. V. (2020) proposed “A CNN-RNN
FRAMEWORK FOR CROP YIELD PREDICTION”. It focuses on a deep learning
framework using convolutional neural networks (CNNs) and recurrent neural networks
(RNNs) for crop yield prediction based on environmental data and management practices.
The proposed CNN-RNN model, along with other popular methods such as random forest
(RF), deep fully connected neural networks (DFNN), and LASSO, was used to forecast
corn and soybean yield. The CNN-RNN model was designed to capture the time
dependencies of environmental factors and the genetic improvement of seeds over time
without having their genotype information. The model demonstrated the capability to
generalize the yield prediction to untested environments without a significant drop in
prediction accuracy.
In [7] Khaki, S., & Wang, L. (2019) proposed “CROP YIELD PREDICTION USING
DEEP NEURAL NETWORKS”. Crop yield prediction is of great importance to global
food production. To compare the individual importance of genotype, soil and weather
components in the yield prediction, the authors obtained yield prediction results using the
following models.
DNN(G): this model uses the DNN model to predict the phenotype based on the genotype
data (without using the environment data), and is able to capture linear and nonlinear
effects of genetic markers.
DNN(S): this model uses the DNN model to predict the phenotype based on the soil data
(without using the genotype and weather data), and is able to capture linear and nonlinear
effects of soil conditions.
DNN(W): this model uses the DNN model to predict the phenotype based on the weather
data (without using the genotype and soil data), and is able to capture linear and nonlinear
effects of weather components.
In [8] Nishant, P. S., Sai Venkat, P., Avinash, B. L., & Jabber, B. (2020) proposed “CROP
YIELD PREDICTION BASED ON INDIAN AGRICULTURE USING MACHINE
LEARNING”. Agriculture is widely regarded as the backbone of India. This paper predicts
the yield of almost all kinds of crops that are planted in India. A meta-model is added, and
the out-of-fold predictions of the other models are used to train it. The total training set is
divided into two parts (train and holdout). The selected base models are trained on the first
part (train) and tested on the second part (holdout). The predictions obtained on the
holdout part are then the inputs used to train the higher-level learner, called the
meta-model. The performance metric used is root mean square error. When the models
were applied individually, the error was around 4% for ENet, about 2% for Lasso and
about 1% for Kernel Ridge; after stacking it was less than 1%.
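The stacking procedure described above can be sketched with scikit-learn's
StackingRegressor, which trains the final estimator on out-of-fold predictions of the base
models. The base models follow those named in [8] (ENet, Lasso, Kernel Ridge); the
synthetic data, the hyperparameters and the choice of a linear meta-model are assumptions
made for illustration, so the resulting numbers are not comparable to the paper's.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for the crop dataset (assumption).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

base_models = [
    ("enet", ElasticNet(alpha=0.1)),
    ("lasso", Lasso(alpha=0.1)),
    ("kernel_ridge", KernelRidge(alpha=1.0, kernel="rbf")),
]

# cv=5: the meta-model is trained on 5-fold out-of-fold base predictions.
stack = StackingRegressor(estimators=base_models,
                          final_estimator=LinearRegression(), cv=5)

def rmse(model):
    """Fit on the training split and return RMSE on the held-out split."""
    model.fit(X_train, y_train)
    return float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))

results = {name: rmse(model) for name, model in base_models}
results["stacked"] = rmse(stack)
for name, value in results.items():
    print(f"{name}: RMSE = {value:.2f}")
```

Using out-of-fold predictions keeps the meta-model from being trained on predictions the
base models made for samples they had already seen.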
In [9] M. Qiao et al. proposed “EXPLOITING HIERARCHICAL FEATURES FOR
CROP YIELD PREDICTION BASED ON 3-D CONVOLUTIONAL NEURAL
NETWORKS AND MULTI KERNEL GAUSSIAN PROCESS”. Accurate and timely
prediction of crop yield based on remote sensing data is important for food security.
However, crop growth is a complex process, which makes it quite difficult to achieve better
performance. A 3-D CNN is first applied for excavating spatial–spectral features in the
crop yield prediction assessment. An MKGP with a new “spatial–spectral–spatio”
composite Gaussian kernel is concatenated on top of the 3-D CNN. County-level wheat
yield in China is predicted to show the effectiveness of the proposed method. Algorithms
used: 3-D convolutional neural networks, multi-kernel GP, Gaussian process
(f(x) ∼ GP(m(x), k(x, x′))).
In [10] H. R. Seireg, Y. M. K. Omar, F. E. A. El-Samie, A. S. El-Fishawy and A.
Elmahalawy proposed “ENSEMBLE MACHINE LEARNING TECHNIQUES USING
COMPUTER SIMULATION DATA FOR WILD BLUEBERRY YIELD PREDICTION”.
The wild blueberry crop (also known as lowbush blueberry) is Maine’s most important
fruit crop, growing in upland acidic sandy soils. It is described as the most important crop
for most people and as the largest producer among the other crops. The paper applies
stacking and cascading techniques to accurately estimate wild blueberry yield based on
unique subgroup criteria. Meta-learning is a technique that integrates the predictions of
many machine learning algorithms (MLA) to develop an ensemble (EMLA) approach.
Machine learning models: Light Gradient Boosting Machine, Gradient Boosted
Regression, Extreme Gradient Boosting, Ridge.
In [11] Dr. V. Latha Jothi, Neelambigai A, Nithish Sabari S and Santhosh K (2020)
proposed “CROP YIELD PREDICTION USING KNN MODEL”. Climate and other
environmental changes have become a major threat in the agriculture field. This makes
the problem of predicting crop yield an interesting challenge. Data mining techniques are
well suited for this purpose. A KNN model is used to classify the groundwater level
dataset in order to predict future test data records. Models used: ARMA model-based
prediction for rainfall, temperature and groundwater; groundwater level classification
based on the KNN model. Their proposed algorithm was later compared with the C&R tree
algorithm and outperformed it with an accuracy of 90%.
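A K-nearest-neighbour classification of the kind summarized above can be sketched in a
few lines. The feature values (rainfall and temperature scaled to the range 0 to 1), the
groundwater-level labels and the function name knn_predict are hypothetical and only
illustrate the voting step; they are not taken from [11].

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical (scaled rainfall, scaled temperature) samples with
# groundwater-level classes, for illustration only.
train_points = [
    (0.10, 0.90), (0.20, 0.80), (0.15, 0.85),   # low groundwater level
    (0.50, 0.50), (0.55, 0.45), (0.60, 0.50),   # medium groundwater level
    (0.90, 0.20), (0.85, 0.10), (0.80, 0.15),   # high groundwater level
]
train_labels = ["low"] * 3 + ["medium"] * 3 + ["high"] * 3

wet_cool = knn_predict(train_points, train_labels, (0.82, 0.18))
dry_hot = knn_predict(train_points, train_labels, (0.12, 0.88))
print(wet_cool, dry_hot)
```

Because KNN relies on distances, features measured in different units are usually scaled
to a common range first, which is why the sketch uses scaled values.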
In [12] Bhanumathi, S., Vineeth, M., & Rohit, N. (2019) proposed “CROP YIELD
PREDICTION AND EFFICIENT USE OF FERTILIZERS”. The paper proposes a
prediction model for agricultural datasets, referred to as the CRY algorithm for crop yield,
which uses beehive clustering techniques. The parameters taken into consideration are
crop type, soil type, soil pH value, humidity and crop sensitivity. The analysis focuses in
particular on paddy, rice and sugarcane yields in India. Algorithms used: k-means
algorithm, Apriori algorithm, Bayes algorithm. Algorithms used for classification: linear
regression, ANN algorithm, KNN algorithm. Their proposed algorithm was later compared
with the C&R tree algorithm and outperformed it with an accuracy of 90%.
In [13] Bhosale, S. V., Thombare, R. A., Dhemey, P. G., & Chaudhari, A. N. (2018)
proposed “CROP YIELD PREDICTION USING DATA ANALYTICS AND HYBRID
APPROACH”. India is by and large an agrarian country. Agriculture is the single most
important contributor to the Indian economy. Agricultural crop production depends on
seasonal, natural and economic factors. Forecasting agricultural yield is a challenging and
beneficial task for every country. Crop yield prediction is improved through data mining
techniques. The proposed approach uses both soil and yield features for predicting the
crop yield. First, the collected soil and yield data are pre-processed and the features are
extracted. The extracted features are then selected using the firefly optimization algorithm
to reduce the search space during prediction. Once the features are selected, KNN is
applied for classification, which helps to predict the crop yield effectively. Algorithms
used: KNN, decision trees, classification algorithm.
In [14] Agarwal, S., & Tarar, S. (2021) proposed “A HYBRID APPROACH FOR CROP
YIELD PREDICTION USING MACHINE LEARNING AND DEEP LEARNING
ALGORITHMS”. Machine learning and deep learning techniques are applied in order to
predict the best crop production. An experiment is carried out on a crop dataset using the
proposed model. The crop is chosen on the basis of the current atmosphere and the soil
along with its constituents, since both climatic and soil parameters are taken into
consideration. Deep learning, which has achieved numerous successful results, is used to
select the most suitable crop when a number of options are available. By using this
technique, crops are predicted accurately. The SVM algorithm is implemented as the
machine learning technique, while LSTM and RNN are used as deep learning techniques.
Algorithms used: Support Vector Machine (SVM), Long Short-Term Memory (LSTM),
Recurrent Neural Network (RNN).
In [15] X. E. Pantazi, D. Moshou, T. Alexandridis, R. L. Whetton and A. M. Mouazen
proposed “WHEAT YIELD PREDICTION USING MACHINE LEARNING AND
ADVANCED SENSING TECHNIQUES”. Yield prediction in precision farming is
considered of high importance for the improvement of crop management and fruit
marketing planning. On-line proximal soil sensing for estimation of soil properties is
required, due to the ability of these sensors to collect high-resolution data (>1500 samples
per ha), which reduces the labor and time cost of soil sampling and analysis.
Algorithms used: counter-propagation artificial neural networks (CP-ANNs), XY-fused
Networks (XY-Fs) and Supervised Kohonen Networks (SKNs). Results showed that the
cross-validation-based yield prediction of the SKN model for the low yield class exceeded
91%, which can be considered highly accurate given the complex relationship between
limiting factors and the yield. The medium and high yield classes reached 70% and 83%
respectively. The average overall accuracy was 81.65% for SKN, 78.3% for CP-ANN and
80.92% for XY-F, showing that the SKN model had the best overall performance.
In [16] Pallavi Shankarrao Mahore and Dr. Aashish A. Bardekar proposed “CROP YIELD
PREDICTION USING DIFFERENT MACHINE LEARNING TECHNIQUES”. Crop
yield is very useful information for farmers. Knowing the yield in advance is beneficial
because it helps reduce losses. In the past, yield prediction was done by experienced
farmers, and the proposed system works in a similar way. Datasets have been obtained
from the Kaggle website and other websites. The dataset contains instances taken from
past historic data. It includes 8 parameters or features such as temperature, rainfall,
moisture, humidity, alkaline, sandy etc. The paper presents various machine learning
algorithms for predicting the yield of the crop on the basis of temperature, rainfall,
season and area. Results reveal that Random Forest is the best classifier when all
parameters are combined.
In [17] Devdatta A. Bondre and Mr. Santosh Mahagaonkar proposed “PREDICTION OF
CROP YIELD AND FERTILIZER RECOMMENDATION USING MACHINE
LEARNING ALGORITHMS”. Machine learning is an emerging research field in crop
yield analysis. Yield prediction is a very important issue in agriculture. Different machine
learning techniques are used and evaluated in agriculture for estimating the future year's
crop production. This paper proposes and implements a system to predict crop yield from
previous data. This is achieved by applying machine learning algorithms such as Support
Vector Machine and Random Forest to agricultural data, and the system recommends a
fertilizer suitable for each particular crop. The paper focuses on the creation of a
prediction model which may be used for future prediction of crop yield. For soil
classification, Random Forest performs better than Support Vector Machine, with an
accuracy of 86.35%. For crop yield prediction, Support Vector Machine performs better
than Random Forest, with an accuracy of 99.47%.
In [18] V. Sellam and E. Poovammal proposed “PREDICTION OF CROP YIELD USING
REGRESSION ANALYSIS”. The objective of this work is to analyse environmental
parameters such as Area under Cultivation (AUC), Annual Rainfall (AR) and Food Price
Index (FPI) that influence the yield of a crop, and to establish a relationship among these
parameters. In this research, Regression Analysis (RA) is used to analyse the
environmental factors and their influence on crop yield. A sample of environmental factors
(AR, AUC and FPI) is considered for a period of 10 years, from 1990 to 2000. A
coefficient of determination R² = 0.7 is obtained by applying Regression Analysis to the
data in Table 2 of that paper. This R² value indicates that AR, AUC and FPI together
account for about 70% of the variation in crop yield. Regression Analysis is used to
establish a relationship among the variables AR, AUC and FPI and their effects on the
yield of the rice crop.
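How an R² value such as the 0.7 reported above is obtained can be shown with a small
worked sketch. It fits a least-squares line with a single predictor and computes
R² = 1 − SS_res / SS_tot. The rainfall and yield numbers are made up for illustration, and
the example is a simplification: [18] uses three predictors (AR, AUC and FPI) rather than
one.

```python
# Coefficient of determination (R^2) for a one-variable least-squares fit.
# The data values below are hypothetical, not taken from [18].

def fit_line(x, y):
    """Return slope and intercept of the ordinary least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(x, y):
    """R^2 = 1 - SS_res / SS_tot: share of the variance in y explained by the fit."""
    slope, intercept = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

annual_rainfall = [810, 950, 700, 1020, 880, 760, 990, 840, 910, 1050]  # mm
rice_yield = [2.1, 2.6, 2.2, 2.5, 2.0, 2.3, 2.9, 2.4, 2.2, 2.8]         # t/ha

r2 = r_squared(annual_rainfall, rice_yield)
print(f"R^2 = {r2:.2f}")
```

An R² of 0.7 therefore means that the fitted model explains about 70% of the variance of
the observed yield, with the remaining 30% left unexplained by the chosen predictors.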
In [19] Dishant Israni, Kevin Masalia, Tanvi Khasgiwal, Monica Tolani and Dr. Mani Roja
Edinburgh proposed “CROP-YIELD PREDICTION AND CROP RECOMMENDATION
SYSTEM”. Precision farming is a modern approach in comparison to traditional
cultivation techniques. The paper uses various techniques such as XGB Regressor, Ridge
Regression and LGBM Classifier, with hyperparameter tuning applied to these models to
get better accuracy. The authors also plan to combine both models and to notify farmers
by SMS or e-mail. The dataset consists of region-specific attributes collected from
districts of Karnataka, India, such as Bagalkot, Chamarajanagar, Gadag, Belagavi
(Belgaum), Tumakuru (Tumkur), Chikballapur and Koppal. A total of 29 districts are
taken into consideration. The crops considered in the dataset are cotton, jowar, maize
(corn), bajra and rice. The best results for predicting the crop were obtained with the XGB
regressor with hyperparameter tuning, and the best results for predicting the yield of crop
that can be produced were obtained with the LGBM Classifier with hyperparameter tuning.
In [20] Javad Ansarifar, Lizhi Wang & Sotirios V. Archontoulis proposed “AN
INTERACTION REGRESSION MODEL FOR CROP YIELD PREDICTION”. Crop
yield prediction is crucial for global food security yet notoriously challenging due to the
multitudinous factors that jointly determine the yield, including genotype, environment,
management, and their complex interactions. The most significant contribution of the new
prediction model is its capability to produce accurate predictions and explainable insights
simultaneously. This was achieved by training the algorithm to select features and
interactions that are spatially and temporally robust, in order to balance prediction
accuracy for the training data and generalizability to the test data. The authors collected
weather data from the Iowa Environmental Mesonet, soil data from the Gridded Soil
Survey Geographic Database, and management and yield performance data from the
National Agricultural Statistics Service for all 293 counties of the states of Illinois,
Indiana, and Iowa from 1990 to 2018. They proposed the interaction regression model for
crop yield prediction.
Table 1: Summary of Literature Survey
Title | Year | Journal Name | Author(s) | Dataset(s) | Methodology | Limitation(s)
Crop Yield Prediction Using Deep Reinforcement Learning Model for Sustainable Agrarian Applications | 2020 | IEEE Access, 8, 86886–86901. doi:10.1109/access.2020.2992480 | Elavarasan, D., & Vincent, P. M. D. | Strident dataset | Deep reinforcement learning, deep learning, artificial neural network, random forest, Bayesian ANN | Lack of computing efficiency of the testing process
Crop Prediction Based on Characteristics of the Agricultural Environment Using Various Feature Selection Techniques and Classifiers | 2022 | IEEE Access | S. P. Raja, B. Sawicka, Z. Stamenkovic and G. Mariammal | Felin dataset | Naive Bayes (NB), Decision Trees (DT), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Random Forest (RF) | Low prediction accuracy for classification technique
Microwave Sensing for Estimating Cranberry Crop Yield | 2021 | IEEE Transactions on Geoscience and Remote Sensing | A. F. Haufler, J. H. Booske and S. C. Hagness | Cranberry dataset | Machine learning on microwave reflection coefficients | Lack of computing efficiency of the testing process
Crop Yield Prediction Using Random Forest Algorithm | 2021 | 2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS) | Suresh, N., Ramesh, N. V. K., Inthiyaz, S., Priya, P. P., Nagasowmika, K., Kumar, K. V. N. H., … Reddy, B. N. K. | Agricultural dataset | Random Forest algorithm | Not accurate on non-linear data; does not classify a hybrid for low yielding
A Comprehensive Review of Crop Yield Prediction Using Machine Learning Approaches with Special Emphasis on Palm Oil Yield Prediction | 2021 | IEEE Access | Rashid, M., Bari, B. S., Yusup, Y., Kamaruddin, M. A., & Khan, N. | Palm oil yield dataset | DNN, CNN and LSTM | Crop yield prediction and crop prediction all together not possible
A CNN-RNN Framework for Crop Yield Prediction | 2019 | Frontiers in Plant Science, 2019 | Khaki, S., Wang, L., & Archontoulis, S. V. | Meteorological dataset | CNN, RNN | Could not classify a hybrid for low yielding
Crop Yield Prediction Using Deep Neural Networks | 2019 | Front. Plant Sci., 22 May 2019, Sec. Computational Genomics | Khaki, S., & Wang, L. | Agricultural dataset | Deep Neural Networks | Not accurate on non-linear data
Crop Yield Prediction based on Indian Agriculture using Machine Learning | 2020 | 2020 International Conference for Emerging Technology (INCET) | Nishant, P. S., Sai Venkat, P., Avinash, B. L., & Jabber, B. | Indian Agricultural dataset | ENet, Lasso | Low prediction using Kernel ridge
Exploiting 2021 IEEE Journal of M. Qiao et al Hierarchi 3-D Convolutional Low prediction accuracy
Hierarchical Features Selected Topics cal Neural Networks for classification
for Crop Yield in Applied Earth dataset Multi kernel GP technique
Prediction based on Observations
3D Convolutional and Remote
Neural Networks and Sensing
Multi-kernel
Gaussian Process
Ensemble Machine 2022 IEEE Access H. R. Seireg, Blueberry LIGHT GRADIENT Lack of accuracy for
Learning Techniques Y. M. K. dataset BOOSTING large data sets
Using Computer Omar, F. E. A. MACHINE,
Simulation Data for El-Samie, A. GRADIENT
Wild Blueberry S. El-Fishawy BOOSTED
Yield Prediction and A. REGRESSION,
Elmahalawy EXTREME
13
GRADIENT
BOOSTING, RIDGE.
crop yield prediction 2020 INTERNATION Dr. V. Latha Ground ARMA MODEL Lack of classify for
using KNN model AL JOURNAL Jothi, water BASED PREDICTION large data set
OF Neelambigai level FOR RAINFALL,
ENGINEERING A, Nithish dataset TEMPERATURE,
RESEARCH & Sabari S, GROUND WATER.
TECHNOLOG Santhosh K GROUND WATER
Y (IJERT) LEVEL
RTICCT – 2020 CLASSIFICATION
BASED ON KNN
MODEL.
crop yield prediction 2019 . 2019 Bhanumathi, Fertilizer k-manner Algorithm, Not efficient on
and efficient use of International S., Vineeth, dataset Apriori Algorithm, different datasets
fertilizers Conference on M., & Rohit, Bayes Algorithm
Communication N. (2019) Linear regression, ANN
and Signal algorithm, KNN
Processing algorithm
(ICCSP).
crop yield prediction 2018 2018 Fourth Bhosale, S. V., Hybrid KNN, Decision trees, Not capable for large
using data analytics International Thombare, R. dataset Classification algorithm. data sets
and hybrid approach Conference on A., Dhemey,
Computing P. G., &
Communication Chaudhari, A.
Control and N.
Automation
(ICCUBEA).
14
A hybrid approach 2021 Journal of Agarwal, S., & Crop Support Vector Machine Not work for multi
for crop yield Physics: Tarar, S. dataset (SVM), Long-Short skilled application
prediction using Conference Term Memory (LSTM),
machine learning and Series, 1714, Recurrent Neural
deep learning 012012. doi:10. Network (RNN)
algorithms 1088/1742-
6596/1714/1/01
2012
Wheat yield 2015 Journal by X. E. Pantazi; Wheat counter-propagation Lack of computing

prediction using Elsevier BV D. Moshou; T. yield artificial neural efficiency of the testing
machine learning and Alexandridis; dataset networks (CP-ANNs), process
advanced sensing R. L. Whetton; XY-fused Networks
techniques A. M. (XY-Fs) and Supervised
Mouazen. Kohonen Networks
(SKNs)
crop yield prediction 2021 International Pallavi Agricultur KNN Varying results with
using different Journal of Shankarrao al dataset SVM different datasets
machine learning Scientific Mahore, Dr. Random forest
techniques Research in Aashish A.
Computer Bardekar
Science,
Engineering and
Information
Technology
prediction of crop 2019 International Devdatta A. Soil Random forest Not applicable for
yield and fertilizer Journal of Bondre, Mr. testing lab SVM Mobile applications
recommendation Engineering Santosh data set
using machine Applied Mahagaonkar
learning algorithms Sciences and
Technology,
2019
Vol. 4, Issue 5,
ISSN No. 2455-
2143, Pages
371-376
Published
Online
15
September 2019
in IJEAST
Prediction of crop 2018 V. Sellam and Distributi Ridge regression Crop yield prediction
yield using E. Poovammal on of crop XGB Regressor and crop prediction all
regression analysis dataset LGBM Classifier together not possible
Crop-yield prediction 2016 Indian Journal Dishant Israni, Rice XGB Regressor, Ridge FPI and their effects on
and crop of Science and Kevin productio Regression and LGBM predicting yield of rice
recommendation Technology, Masalia, Tanvi n dataset Classifier crop
system Khasgiwal,
Monica
Tolani, Dr.
Mani Roja
Edinburgh
An interaction 2021 Department of Javad Historical Linear regression Low accuracy to predict
regression model for Industrial and Ansarifar1, dataset on large dataset
crop yield prediction Manufacturing Lizhi Wang1
Systems & Sotirios V.
Engineering, Archontoulis
Iowa State
University,
Ames, IA
50011,
USA. 2
Department of
Agronomy,
Iowa State
University,
Ames, IA
50011, USA
2.2 EXISTING SYSTEM
Predicting crop yield based on environmental, soil, water and crop parameters
has been a potential research topic. Deep-learning-based models are broadly used to
extract significant crop features for prediction. The existing work, as shown in Fig 1,
constructs a KNN (K-Nearest Neighbor) model to forecast the crop yield. The model is
reported to predict the crop yield with an accuracy of 93.7%, outperforming earlier
models by preserving the original data distribution.
Fig 1: Existing System
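As a rough illustration of how a K-Nearest-Neighbor model can forecast a yield, the sketch below averages the yields of the k closest training samples. The two features (rainfall and soil pH), the sample values and the function name knn_predict are hypothetical and chosen only for the example; the report does not describe how the existing system was implemented.

```python
import math

def knn_predict(train_X, train_y, query, k=3):
    """Predict a yield as the mean yield of the k nearest training samples."""
    # Euclidean distance from the query point to every training sample
    distances = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Hypothetical samples: (rainfall in mm, soil pH) -> yield in t/ha
train_X = [(500.0, 6.5), (520.0, 6.8), (300.0, 7.9), (310.0, 8.1), (700.0, 6.0)]
train_y = [2.0, 2.2, 1.0, 1.2, 3.0]

knn_yield = knn_predict(train_X, train_y, query=(510.0, 6.6), k=2)
print(f"Predicted yield: {knn_yield:.2f} t/ha")
```

In practice the features would normally be scaled first, because an unscaled feature such as rainfall in millimetres dominates the distance.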
2.3 DISADVANTAGES OF EXISTING SYSTEM
• Lack of integrity
• Lack of availability and continuity of service
• Lack of accuracy
2.4 PROPOSED SYSTEM

The proposed system predicts the yield of a particular crop based on soil
contents and weather parameters such as rainfall and soil pH; the crop yield is the
target of the model. As shown in Fig 2, the system uses machine learning and deep
learning algorithms. The Gradient Boosting algorithm is also used to reduce the error
rates of these models and increase the accuracy of each of them. The algorithms used are:
• SVM algorithm
• Random Forest
• Artificial Neural Networks (ANN)
• Gradient Boosting
Fig 2. Proposed System
2.5 CONCLUSION
Different research papers and other knowledge resources were studied, and the
proposed system was designed by incorporating features identified in the survey of
research papers. The system uses different models so that their outputs and
accuracy results can be compared. It determines the crop yield by
considering different parameters.
3. REQUIREMENT ANALYSIS
3.1 INTRODUCTION
The main focus of the project is to predict the yield of a given crop. Crop
yield prediction in agriculture is a new-generation wave that is captivating the
public. The majority of farmers are unaware of the productivity they can expect
from a crop on a particular field. This is why we are concentrating on crop
yield prediction, with the aim of maximizing the yield. Our Crop Yield Prediction
System uses improved models to inform farmers about crop productivity and to
educate them about it.
3.2 REQUIREMENT SPECIFICATION

3.2.1 Functional Requirements
The software requirements include the languages, packages, operating system,
and tools used for developing the project.
• Language: Python 3, HTML, CSS
• Packages Used:
◦ Pandas:
Pandas is a Python package providing fast, flexible, and expressive data
structures designed to make working with “relational” or “labeled” data
both easy and intuitive. It aims to be the fundamental high-level building
block for doing practical, real-world data analysis in Python. Additionally,
it has the broader goal of becoming the most powerful and flexible open-source
data analysis/manipulation tool available in any language.
◦ NumPy:
NumPy is a library for the Python programming language, adding
support for large, multi-dimensional arrays and matrices, along with a large
collection of high-level mathematical functions to operate on these arrays.
◦ Seaborn:
Seaborn is a library that uses Matplotlib underneath to plot graphs. It
is used here to visualize distributions of the data.
◦ Matplotlib:
Matplotlib is a comprehensive library for creating static, animated, and
interactive visualizations in Python. It can create publication-quality plots
and interactive figures that can zoom, pan and update.
• Processor: Intel Core(TM) i5-7200U CPU @ 2.50 GHz
• RAM: 4.00 GB (3.90 GB usable)
• System Type: 64-bit Operating System
3.2.2 Non-Functional Requirements
• Compatibility: Compatibility is the capacity of two systems to work together
without having to be altered to do so. This project is compatible with
Windows 10 and Windows 11.
• Capacity: The capacity of a system refers to the amount of storage it
utilizes. This project can work with an i3 processor with 8 GB RAM
and an i7 processor with 8 GB RAM.
• Environment: The environment comprises all the external and internal forces
that act on the project. This project works on Python 3.8.5 and
higher versions.
• Performance: The performance of this project is analyzed based on the
accuracy of the model and the confusion matrix.
• Reliability: Reliability is the extent to which the software system
consistently performs the specified function without failure. In this project,
the output is analyzed based on a dataset taken in a controlled
environment.
3.3 CONCLUSION
The precise structure of a system’s data is the key to its success. Normal changes to
the business will not necessitate large adjustments to a system based on those facts if the
data are arranged to avoid redundancy along the lines of the business structure. For many
years, the holy grail of the computer industry has been achieving this durability in the face
of continual business change. It is possible if requirements are expressed in terms of a good
grasp of the data’s underlying structure.
4. METHODOLOGY
4.1 INTRODUCTION
To make things easier, complex programs are usually divided into smaller units
of code called "modules". A module is a file with the extension .py that contains
executable Python code. A module contains several Python statements and
expressions. Most modules are designed to be concise and unambiguous, and they
are intended to solve specific developer problems.
4.2 MODULES IDENTIFIED
In this project, we identify four modules to simplify the task:

1. Data collection and Data Preprocessing
2. Feature Extraction
3. Model Building
4. Output prediction
4.2.1 Data collection and Data Preprocessing

4.2.1.1. Data collection
This is the first module, which is concerned with data collection. We use
the Kaggle dataset titled "Rajasthan Dataset for yield prediction". The data
was originally collected from farmers in various regions of Rajasthan and
from the State Government.
4.2.1.2 Data Preprocessing
Data preprocessing is the process of preparing raw data for learning models. In
this project, the data obtained is already preprocessed. The dataset
consists of 3649 rows and 48 columns.
4.2.2 Feature Extraction
It refers to the process of transforming raw data into numerical features that can
be processed while preserving the information in the original dataset.
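One common way of turning a non-numerical column into numerical features is one-hot encoding. The sketch below is only an illustration of that idea: the column names (crop, rainfall_mm), the records and the helper one_hot_encode are hypothetical, since the report does not list the columns of the dataset or the exact transformation used.

```python
def one_hot_encode(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every other column unchanged
        new_row = {key: value for key, value in row.items() if key != column}
        # Add one indicator column per category
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded

# Hypothetical raw records; the real column names are not given in the report
raw_rows = [
    {"crop": "Bajra", "rainfall_mm": 320.0},
    {"crop": "Maize", "rainfall_mm": 540.0},
    {"crop": "Bajra", "rainfall_mm": 300.0},
]
numeric_rows = one_hot_encode(raw_rows, "crop")
print(numeric_rows[0])
```

Every value in the result is numeric, and the original category can still be read back from the indicator columns, so no information is lost.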
4.2.3 Model Building
In this phase, a machine learning or deep learning model is built by learning and
generalizing from training data, then applying that acquired knowledge to new
data it has never seen before to make predictions. Models are built using
different algorithms such as Random Forest, Support Vector Machine and
Artificial Neural Networks. The Gradient Boosting algorithm is then
applied to each model to reduce the error rate.
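To check how a model behaves on data it has never seen, part of the dataset is usually held out for testing. The following is a minimal sketch of such a split; the 80/20 ratio, the seed and the function name train_test_split are assumptions made for the example and are not stated in the report. Only the row count of 3649 comes from the dataset description.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows reproducibly and hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    # First n_test shuffled rows form the test set, the rest the training set
    return shuffled[n_test:], shuffled[:n_test]

# 3649 is the stated number of rows; the row contents here are placeholders
rows = list(range(3649))
train_rows, test_rows = train_test_split(rows)
print(len(train_rows), len(test_rows))
```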
4.2.3.1 Random Forest
Random Forest is a powerful and versatile supervised machine learning
algorithm that grows and combines multiple decision trees to create a “forest.” It can be
used for both classification and regression problems in R and Python. Random Forest
grows multiple decision trees which are merged together for a more accurate prediction.
The logic behind the Random Forest model is that multiple uncorrelated models (the
individual decision trees) perform much better as a group than they do alone. When using
Random Forest for classification, each tree gives a classification or a “vote.” The forest
chooses the classification with the majority of the “votes.” When using Random Forest for
regression, the forest picks the average of the outputs of all trees.
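The two ways of combining the trees described above can be sketched as follows. The tree outputs are hypothetical values standing in for already-trained decision trees, and the function names forest_classify and forest_regress are chosen only for this illustration.

```python
from collections import Counter
from statistics import mean

def forest_classify(tree_votes):
    """Return the class that receives the most votes from the trees."""
    return Counter(tree_votes).most_common(1)[0][0]

def forest_regress(tree_outputs):
    """Return the average of the numeric outputs of the trees."""
    return mean(tree_outputs)

# Hypothetical outputs of five already-trained trees for one field
votes = ["high", "low", "high", "high", "low"]
yields = [2.0, 2.5, 3.0, 2.5, 2.5]

majority_class = forest_classify(votes)
average_yield = forest_regress(yields)
print(majority_class, average_yield)
```

Since crop yield is a continuous quantity, the averaging form is the one that applies to yield prediction.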
4.2.3.2 Support vector Machine
Support Vector Machine (SVM) is one of the most popular supervised
learning algorithms and is used for classification as well as regression problems.
However, it is primarily used for classification problems in machine learning.
The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes so that a new data point can easily be put in
the correct category in the future. This best decision boundary is called a hyperplane. SVM
chooses the extreme points/vectors that help in creating the hyperplane. These extreme
cases are called support vectors, and hence the algorithm is termed Support Vector
Machine.
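A small sketch may help to picture the hyperplane and the support vectors. It assumes a linear SVM whose weights and bias have already been learned; the values of weights and bias, the four points and the function names are hypothetical. A point is classified by the sign of w·x + b, and the points whose score is exactly +1 or -1 lie on the margin and act as support vectors.

```python
def decision_value(weights, bias, point):
    """Signed score w.x + b; its sign tells which side of the hyperplane the point is on."""
    return sum(w * x for w, x in zip(weights, point)) + bias

def classify(weights, bias, point):
    """Assign class +1 or -1 according to the side of the hyperplane."""
    return 1 if decision_value(weights, bias, point) >= 0 else -1

# Hypothetical hyperplane x1 = 2, assumed to come from a trained linear SVM
weights, bias = (1.0, 0.0), -2.0

points = [(1.0, 1.0), (3.0, 1.0), (0.0, 0.0), (4.0, 4.0)]
labels = [classify(weights, bias, p) for p in points]

# Points with a score of exactly +1 or -1 lie on the margin: the support vectors
support_vectors = [
    p for p in points
    if abs(abs(decision_value(weights, bias, p)) - 1.0) < 1e-9
]
print(labels, support_vectors)
```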
4.2.3.3 Artificial Neural Networks

"Artificial neural network" refers to a biologically inspired sub-field of artificial
intelligence modeled after the brain. An artificial neural network is a
computational network based on the biological neural networks that make up the structure of
the human brain. Just as the human brain has neurons interconnected with each other,
artificial neural networks also have neurons that are linked to each other in various layers
of the network. These neurons are known as nodes.
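The idea of nodes arranged in layers can be illustrated with a forward pass through a very small network. This is only a sketch: the layer sizes, the hand-picked weights, the sigmoid activation and the helper dense_layer are assumptions for the example and do not describe the network planned in this project.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases, activation):
    """Each node takes a weighted sum of all inputs, adds a bias, then applies the activation."""
    return [
        activation(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]

# Hypothetical network: 2 inputs -> 2 hidden nodes -> 1 output node
hidden_weights = [[0.5, -0.5], [1.0, 1.0]]
hidden_biases = [0.0, -1.0]
output_weights = [[2.0, 4.0]]
output_biases = [1.0]

inputs = [1.0, 1.0]  # e.g. two scaled input features
hidden = dense_layer(inputs, hidden_weights, hidden_biases, sigmoid)
# A linear (identity) output node is the usual choice for a regression target
output = dense_layer(hidden, output_weights, output_biases, lambda z: z)
print(hidden, output)
```

Training such a network means adjusting the weights and biases so that the output moves closer to the known yield; that step is not shown here.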
4.2.3.4 GRADIENT BOOSTING:

Gradient Boosting is a popular boosting algorithm. In gradient boosting, each
predictor corrects its predecessor’s error, as shown in Fig 3. Each predictor is trained on
the residual errors of its predecessor, so that the loss is reduced step by step.
Formula:
Final Prediction = Base value + (LR × 1st residual predicted by residual model 1) + (LR ×
2nd residual predicted by residual model 2) + ...
where LR is the learning rate.
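The formula can be traced in a small from-scratch sketch. It assumes squared-error loss, a single input feature and one-split decision trees ("stumps") as the residual models; these choices, the learning rate of 0.5, the sample data and the function names are all assumptions for illustration and not the configuration used in the project.

```python
from statistics import mean

def fit_stump(xs, residuals):
    """Find the single threshold split on x that best fits the residuals (least squares)."""
    best = None
    # Skip the largest x so that both sides of the split are non-empty
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def gradient_boost(xs, ys, n_models=20, learning_rate=0.5):
    base_value = mean(ys)  # starting prediction for every sample
    predictions = [base_value] * len(ys)
    models = []
    for _ in range(n_models):
        # Residuals are the errors left by the ensemble built so far
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        models.append(stump)
        predictions = [p + learning_rate * stump(x) for p, x in zip(predictions, xs)]
    # Final prediction = base value + LR * (sum of the residual models' outputs)
    return lambda x: base_value + learning_rate * sum(m(x) for m in models)

# Hypothetical data: rainfall (mm) -> yield (t/ha)
rainfall = [200.0, 300.0, 400.0, 500.0, 600.0]
crop_yield = [1.0, 1.5, 2.5, 3.0, 3.2]

model = gradient_boost(rainfall, crop_yield)
baseline_error = sum((y - mean(crop_yield)) ** 2 for y in crop_yield)
boosted_error = sum((y - model(x)) ** 2 for x, y in zip(rainfall, crop_yield))
print(baseline_error, boosted_error)
```

Each round fits a new stump to what the previous rounds got wrong, so the training error of the boosted model is lower than that of the base value alone. A smaller learning rate makes each correction more cautious and usually needs more rounds.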
Fig 3. Working of Gradient Boosting Algorithm
4.2.4 OUTPUT PREDICTION
After building the model, it is tested against the test data and the crop yield is predicted.
The performance of the model is also evaluated on the test data using performance metrics.
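The specific metrics used in the project are not listed here. As an illustration only, the sketch below computes three metrics that are commonly used for regression problems such as yield prediction: mean absolute error (MAE), root mean squared error (RMSE) and the coefficient of determination (R²). The choice of these metrics and the sample values are assumptions.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: penalizes large errors more strongly."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def r2_score(actual, predicted):
    """Share of the variance in the actual values explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical test-set yields (t/ha) and the corresponding predictions
actual_yield = [2.0, 3.0, 4.0, 5.0]
predicted_yield = [2.5, 3.0, 3.5, 5.0]

print(mae(actual_yield, predicted_yield))
print(rmse(actual_yield, predicted_yield))
print(r2_score(actual_yield, predicted_yield))
```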
4.3 Architecture Diagram
Fig 4. Architecture Diagram
5. IMPLEMENTATION
5.1 INTRODUCTION
The project's implementation stage is when the theoretical design is translated into
a workable system. As a result, it can be seen as the most crucial stage in ensuring the
success of a new system and giving the user confidence that the system will work and be
effective. The implementation step entails meticulous planning, research of the existing
system and its implementation limitations, designing of changeover methods, and
evaluation of changeover methods.
5.2 OUTPUT SCREENS
Screen 1: Implementation of Random Forest Using All Parameters
Screen 2: Implementation of Random Forest Using selected Parameters
Screen 3: Yield Prediction Using Random Forest
Screen 4: Implementation of SVM Using All Parameters
Screen 5: Implementation of SVM Using Selected Parameters
Screen 6: Yield Prediction Using SVM
5.3 CONCLUSION
In this chapter we have discussed the implementation results of our project. The chapter
includes the output screens of the parts of the code that have been implemented.
6. FUTURE WORK
As already stated, the project aims at making accurate yield predictions for various
crops by applying the Gradient Boosting algorithm to models such as Random
Forest (RF), Support Vector Machine (SVM) and Artificial Neural Networks (ANN). The
current status of the project for this semester (4th year, 1st semester) is the completion of
the implementation of Random Forest (RF) and Support Vector Machine (SVM). In the
upcoming semester, the project is expected to complete the implementation of the Artificial
Neural Networks model and then apply the Gradient Boosting algorithm to all the models that
are built. As a future enhancement, the project can be extended to
other machine learning or deep learning models and also to hybrid models.
7. CONCLUSION
The main objective of this project is to accurately estimate the yield of various crops in
Rajasthan. For this purpose, the Gradient Boosting algorithm is used. Currently, two models
are built (Random Forest and SVM). In the future, the project is expected
to complete the implementation of the Artificial Neural Networks model; applying the
Gradient Boosting algorithm to all the models is then expected to make their
predictions more accurate. This project will be very helpful for farmers in making
appropriate crop yield predictions.
REFERENCES
[1] Everingham, Y.L., Inman-Bamber, N.G., Thorburn, P.J., McNeill, T.J., 2007. A Bayesian
modelling approach for long lead sugarcane yield forecasts for the Australian sugar
industry. Australian Journal of Agricultural Research 58, 87–94.
[2] Hansen, J.W., Indeje, M., 2004. Linking dynamic seasonal climate forecasts with crop
simulation for maize yield prediction in semi-arid Kenya. Agricultural and Forest Meteorology
125 (1–2), 143–157.
[3] Selvaraj, A., Selvaraj, J., Maruthaiappan, S., Babu, G. C., & Kumar, P. M. (2020). L1 norm
based pedestrian detection using video analytics technique. Computational Intelligence, 36(4),
1569-1579.
[4] Zhang, P., Anderson, B., Tan, B., Huang, D., Myneni, R., 2005. Potential monitoring of
crop production using a satellite-based Climate-Variability Impact Index. Agricultural and
Forest Meteorology 132 (3–4), 344–358.
[5] Y.L. Everingham, C.W. Smyth, N.G. Inman-Bamber, 2008. Ensemble data mining
approaches to forecast regional sugarcane crop production. Agricultural and Forest
Meteorology 149 (2009), 689–696.
[6] X.E. Pantazi, D. Moshou, T. Alexandridis, R.L. Whetton, A.M. Mouazen, 2016. Wheat
yield prediction using machine learning and advanced sensing techniques. Computers and
Electronics in Agriculture 121 (2016) 57–65. http://dx.doi.org/10.1016/j.compag.2015.11.018
[7] Yang Chen, Won Suk Lee, Hao Gan, Natalia Peres, Clyde Fraisse, Yanchao Zhang and
Yong He, 2019. Strawberry Yield Prediction Based on a Deep Neural Network Using
High-Resolution Aerial Orthoimages. Remote Sens. 2019, 11, 1584; doi:10.3390/rs11131584.
[8] K. Matsumura, C. F. Gaitan, K. Sugimoto, A. J. Cannon and W. W. Hsieh, 2014. Maize
yield forecasting by linear regression and artificial neural networks in
Jilin, China. Journal of Agricultural Science, (1-12). doi:10.1017/S0021859614000392.
[9] Mohammad Motiur Rahman, Naheena Haq, Rashedur M Rahman. Machine Learning
Facilitated Rice Prediction in Bangladesh. 2014 Annual Global Online Conference on
Information and Computer Technology. 978-1-4799-8311-7/15 $31.00 © 2015 IEEE. DOI
10.1109/GOCICT.2014.9
[10] Alberto Gonzalez-Sanchez, Juan Frausto-Solis, Waldo Ojeda-Bustamante, 2014. Predictive
ability of machine learning methods for massive crop yield prediction. Spanish Journal of
Agricultural Research 2014 12(2): 313-328. Instituto Nacional de Investigación y Tecnología
Agraria y Alimentaria (INIA). http://dx.doi.org/10.5424/sjar/2014122-4439.
[11] McQueen RJ, Garner SR, Nevill-Manning CG, Witten IH, 1995. Applying machine
learning to agricultural data. Comput Electron Agr 12(4): 275-293.
[12] Y., Xu, H., Yan, L., 2017. Support vector machine-based open crop model
(SBOCM): Case of rice production in China. Saudi Journal of Biological Sciences 24 (3),
537–547. https://doi.org/10.1016/j.sjbs.2017.01.024.
[13] Ahilan, A., Manogaran, G., Raja, C., Kadry, S., Kumar, S. N., Kumar, C. A., ... &
Murugan, N. S. (2019). Segmentation by fractional order darwinian particle swarm
optimization based multilevel thresholding and improved lossless prediction based
compression algorithm for medical images. IEEE Access, 7, 89570-89580.
[14] Marinković B, Crnobarac J, Brdar S, Antić B, Jaćimović G, Crnojević V, 2009. Data
mining approach for predictive modeling of agricultural yield data. Proc. First Int Workshop
on Sensing Technologies in Agriculture, Forestry and Environment (BioSense09), Novi Sad,
Serbia, October, pp: 1-5.
[15] Ruß G, Kruse R, 2010. Feature selection for wheat yield
prediction. In: Research and development in intelligent systems XXVI (Bramer M et al., eds.),
Springer-Verlag, London.
[16] Zhang, B., Valentine, I., Kemp, P., 2005. Modelling the productivity of naturalised
pasture in the north island, New Zealand: a decision tree approach. Ecol. Model. 186 (3),
299–311. https://doi.org/10.1005/j.ecolmod.2005.10.331
[17] Matsumura, K., Gaitan, C.F., Sugimoto, F., Cannon, A., Hsieh, W.W., 2015. Maize yield
forecasting by linear regression and artificial neural networks in Jilin, China. J. Agr. Sci. 153
(3), 399–410. https://doi.org/10.1016/j.agrsci.2015.10.153.
[18] Ruß, G., Kruse, R., 2010. Feature selection for wheat yield prediction. In: Bramer,
M. (Ed.), Research and Development in Intelligent Systems XXVI. Springer-Verlag, London.
[19] Bocca, F.F., Rodrigues, L.H.A., 2016. The effect of tuning, feature engineering, and
feature selection in data mining applied to rainfed sugarcane yield modelling.
Comput. Electron. Agric. 128, 67–76. https://doi.org/10.1016/j.com&ele.2016.10.128.
[20] Fortin, J.G., Anctil, F., Parent, L., Bolinder, M.A., 2011. Site specific early season potato
yield forecast by neural network in Eastern Canada. Precis. Agr. 12 (6), 905–923.
https://doi.org/10.1011/j.preagr.2011.10.905