
2019 Global Conference for Advancement in Technology (GCAT)

Bangalore, India. Oct 18-20, 2019

Design and Implementation of Mobile Application for Crop Yield Prediction using Machine Learning
Meeradevi
Dept. of CSE
M S Ramaiah Institute of Technology
Bangalore, India
meera_ak@msrit.edu

Hrishikesh Salpekar
Dept. of CSE
M S Ramaiah Institute of Technology
Bangalore, India
hrishikeshsalpekar@gmail.com

Abstract: India is primarily an agricultural country and depends largely on agriculture for its economy. If farmers knew which crop would give a higher yield, they could choose to grow that crop. This project designs an app that allows farmers to predict the production of specific crops in their region from physical parameters such as rainfall and temperature. Using crop datasets for various crops from various regions of India, together with rainfall and temperature datasets for the same regions, the proposed model predicts crop yield. The aim is to develop a tool that delivers predictions for the crop chosen by each individual farmer. The product version being developed focuses on prediction, but it can be extended as more data becomes available and is included as features in the data model. The proposed model provides farmers with a detailed set of recommendations to optimize their crop selection based on individual factors such as location, farm size, temperature, rainfall and various crop datasets. All data used in this study are publicly available.

Keywords: Forecasting yield, temperature, linear regression, multivariate regression, ARIMA model.

I. INTRODUCTION

Data mining extracts the required information from a large data set, and new correlations and patterns can then be found in the filtered data. Two common data mining tasks are clustering and classification. Clustering divides the data into groups of similar items, each of which can be treated as one unit during analysis. Classification assigns the data to predefined classes or groups. Although agriculture is a crucial part of our economy, farmers in most cases do not get the desired output because of the lack of technology and scientific methods in farming. In this work, an ARIMA model is trained on the weather data to forecast rainfall and temperature. The accuracy of the predicted yield depends greatly on the amount and quality of the available data, and also on the type of model used for prediction. Python is used for data processing and Android Studio is used to design the application. This paper presents a smartphone app, called IntelliGrow, which presents the results to the user in a systematic manner. The app has a simple, easy-to-use interface requiring only a few taps to retrieve the desired results.

978-1-7281-3694-3/19/$31.00 ©2019 IEEE

II. RELATED WORKS

Miao et al. [1] used an artificial neural network (ANN) to analyze and evaluate the relative importance of selected soil, landscape and seed hybrid factors on yield and grain quality in two fields in Illinois, USA. The response curves generated by the ANN models were more informative than simple correlation coefficients or the coefficients of multiple regression equations. In ANN analysis, the choice of variables, neural network architectures, training algorithms and associated parameters is critical. The commonly used trial-and-error approach is too time-consuming for manual execution, so the author used the Intelligent Problem Solver (IPS), a built-in feature combining heuristics and sophisticated optimization strategies, to automatically test 150 different combinations of predictors, network architectures, training algorithms and associated parameters. The more combinations are tested, the better the results found. The author identified the important factors influencing corn yield and grain quality as CEC (cation exchange capacity), relative elevation, aspect and soil sulfur.

Ayoubi and Sahrawat [2] designed artificial neural network (ANN) models to predict the biomass and grain yield of barley from soil properties. The ANN yield models gave a higher coefficient of determination and a lower root mean square error than multivariate regression, indicating that ANN is a more powerful tool than multivariate regression for yield prediction. In the ANN models, soil organic matter (SOM) was the most important factor. The results of this study indicated that production under dryland farming systems in Iran is limited by moisture shortage, by the lack of an optimum quantity of soil organic matter, and by improper management practices such as burning of crop and organic residues. Therefore, improving the soil organic matter (SOM) pool, soil aggregation, soil fertility and drainage could increase barley biomass and grain yield in the study area.

Kuang and Mouazen [3] studied how moisture content and soil texture affect the prediction of soil organic carbon and total nitrogen. A total of 174 soil samples were used in this study. The study included laboratory analysis and online measurement, with soil organic carbon, total nitrogen and texture determined by standard laboratory analyses. To
distinguish and group the soil spectra from each field, principal component analysis (PCA) was carried out on raw soil spectra collected in the laboratory from fresh soil samples. The study also explains the differences in moisture content and texture between fields, which are reflected in, and consistent with, the principal component analysis. The study was carried out to understand and quantify the individual and interaction effects of moisture content (MC) and texture fractions on the accuracy of organic carbon (OC) and total nitrogen (TN) predictions using laboratory-scanned visible and near infrared (vis-NIR) spectra.

Pantazi et al. [4] presented an efficient knowledge-based approach that uses machine learning to characterize wheat yield behavior. The novel method used is supervised self-organizing maps. The authors used a number of approaches, models and algorithms for assessing wheat yield prediction. One method used in this study is Counterpropagation Artificial Neural Networks (CP-ANNs), which combine features of both supervised and unsupervised learning. CP-ANNs consist of two layers, a Kohonen layer and an output layer, whose neurons have as many weights as the number of classes to be modeled. XY-Fused Networks (XY-Fs) use a supervised learning technique to build classification models. Supervised Kohonen Networks (SKNs) are supervised neural networks used to calculate classification models.

Crane-Droesch [5] found that the timing of heat and moisture is important for predicting corn yields, along with the simple accumulation of heat. The panelNNET R package was used to train the SNN. A central challenge in training neural networks is choosing appropriate hyperparameters, such that the main parameters constitute a model that predicts well out of sample.

III. DATASET DESCRIPTION

The crop data covers crop area and production, organized district-wise, crop-wise, season-wise and year-wise. The temperature dataset is based on surface air temperature (measured 1.2 m above ground) from more than 350 stations spread across the country. In the rainfall dataset, subdivision-wise rainfall and its departure from normal are provided for each month and season.

A. Data preprocessing

import random

import pandas as pd

df = pd.read_csv('C:/Users/User/Desktop/mini/apy.csv')
dfs = df.Crop.unique()            # unique crop names
dfs2 = df.District_Name.unique()  # unique district names
a = len(dfs)
b = len(dfs2)
for m in range(0, b):
    for l in range(0, a):
        # Production values of crop dfs[l] in district dfs2[m]
        mask = (df['District_Name'] == dfs2[m]) & (df['Crop'] == dfs[l])
        aaa = df.loc[mask, 'Production']
        if aaa.isna().any() and aaa.notna().any():
            q = aaa.max()  # maximum production of this district's crop
            w = aaa.min()  # minimum production of this district's crop
            df.loc[mask, 'Production'] = aaa.fillna(random.randint(int(w), int(q)))
df.to_csv('C:/Users/User/Desktop/mini/new.csv')

The dataset contains missing values, which are replaced by a random value within the production range of that district's crop. In the code above, df holds the CSV file, dfs contains the unique crop names and dfs2 the unique district names, while a and b are the numbers of distinct crops and districts. For each district's crop, the maximum and minimum production are assigned to q and w, and the missing values are filled with a random integer between w and q. Finally, the new dataset is stored in new.csv.

B. Implementation

import random

import pandas as pd

df = pd.read_csv("Mean_Temp_IMD_2017.csv")
Crop = pd.read_csv("apy2.csv")
Rain = pd.read_csv("Future_Rain.csv")
df = df[["YEAR", "ANNUAL"]]

# Keep only the district series of each crop that span 2000 to 2013
selected = []
for crop in Crop.Crop.unique():
    df1 = Crop.loc[Crop["Crop"] == crop]
    for dist in df1.District_Name.unique():
        df2 = df1.loc[df1["District_Name"] == dist]
        first = df2.Crop_Year.min()
        last = df2.Crop_Year.max()
        if first > 2000 or last < 2013:
            continue
        selected.append(df2)
final = pd.concat(selected, ignore_index=True)

# Fill missing 'ppa' values with a random integer in the [min, max]
# range of the same crop in the same district
filled = []
for crop in final.Crop.unique():
    df1 = final.loc[final["Crop"] == crop]
    for dist in df1.District_Name.unique():
        df2 = df1.loc[df1["District_Name"] == dist].copy()
        q = df2['ppa'].max()
        w = df2['ppa'].min()
        if pd.notna(q):
            df2['ppa'] = df2['ppa'].fillna(random.randint(int(w), int(q)))
        filled.append(df2)
newdf = pd.concat(filled, ignore_index=True)

The above Python code combines the separate datasets and treats the missing values in the resulting dataset. Rows are selected by unique crop and then by unique district (the two .loc selections inside the nested loops), and district series that start after 2000 or end before 2013 are skipped. The
values are appended to the final pandas DataFrame. The missing values in the production attribute are filled randomly in the range [min, max] and appended to the newdf pandas DataFrame.

Rainfall is then forecast for each subdivision with the following code:

import warnings

import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

df = pd.read_csv("India_Rainfall.csv")
p_values = range(0, 5)
d_values = range(0, 5)
q_values = range(0, 5)
warnings.filterwarnings("ignore")

frames = []
for sub in df.SUBDIVISION.unique():
    df1 = df.loc[df['SUBDIVISION'] == sub]
    series = pd.Series(df1.ANNUAL.tolist())
    # Grid search over (p, d, q); evaluate_models() is described below
    cfg, error = evaluate_models(series.values, p_values, d_values, q_values)
    history = [x for x in series]
    predictions = []
    try:
        # Walk-forward forecast for 2018 to 2021: each one-step forecast
        # is appended to the history before the model is refitted
        for t in range(4):
            model_fit = ARIMA(history, order=cfg).fit()  # e.g. order=(4, 0, 4)
            yhat = model_fit.forecast()[0]
            yh = mod(yhat)  # mod(): helper not listed in the paper
            predictions.append(yh)
            history.append(yh)
    except Exception:
        print("Error")
        continue  # skip a subdivision whose forecast failed
    df1 = df1[['SUBDIVISION', 'YEAR', 'ANNUAL']]
    future = pd.DataFrame({'SUBDIVISION': [sub] * 4,
                           'YEAR': [2018, 2019, 2020, 2021],
                           'ANNUAL': predictions})
    frames.append(pd.concat([df1, future]))
fin = pd.concat(frames, ignore_index=True)

To find the forecast values of rainfall and temperature, a modified ARIMA method is used. Rainfall values are available from 1901 to 2017 and are read into the DataFrame df. A grid search finds the (p, d, q) configuration that gives the least error in the ARIMA method. The p, d and q values are taken from 0 to 4 (range(0, 5) in the code), as configurations in this range gave the highest accuracy and any larger range would require a system with more processing power. A series is created from the values in the 'ANNUAL' column and passed to the 'evaluate_models' function. This function passes every combination of the parameters to the 'evaluate_arima_model' function, which returns the error for each configuration; the configuration with the least error is selected and returned. In the 'evaluate_arima_model' function, the list of values is divided into test and train datasets, with 30% training data. A list of predictions is generated from the train data and the configuration passed to the function, so that the test and prediction datasets hold rainfall values for the same period. These values are passed to the 'MAPE' function, which computes the error using the MAPE (Mean Absolute Percentage Error) formula. Once the best configuration for the rainfall values of a particular subdivision is found, that configuration of ARIMA parameters is used to predict the rainfall up to 2021.

IV. METHODOLOGY

As shown in Fig. 1, when the user opens the app on a mobile phone, it first obtains the location from Google. If the location is detected, the user can enter the details of all required parameters, such as soil type, temperature and rainfall. The app runs the algorithm and displays the list of crops suitable for the entered data, together with their predicted yield values. If the location is not detected, the user can enter it manually from a drop-down menu. The app also shows details of the crops the user is interested in and the procedure for growing them.

Fig. 1: Flow diagram for the proposed system

A. Linear regression

Crop production data is available only up to 2014, so the production value of each crop is forecast up to 2021. The CSV file is read into the df DataFrame. For each crop, df1 is selected from df using the loc function; for each district, df2 is selected from df1 for that crop. The 'ppa' and 'Crop_Year' values are passed to the pred function.
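The pred function itself is not listed in the paper. The following is a minimal sketch of what such a function might look like, assuming an ordinary least-squares straight-line fit of 'ppa' against 'Crop_Year' (NumPy's polyfit) and the 2021 horizon stated above; the signature pred(ppa, crop_year, last_year) is hypothetical.

```python
import numpy as np


def pred(ppa, crop_year, last_year=2021):
    """Hypothetical sketch of pred(): a straight-line trend forecast.

    Fits ppa = slope * year + intercept by ordinary least squares and
    evaluates the line for every year after the last observed Crop_Year
    up to and including last_year.
    """
    years = np.asarray(crop_year, dtype=float)
    values = np.asarray(ppa, dtype=float)
    slope, intercept = np.polyfit(years, values, deg=1)
    future_years = np.arange(int(years.max()) + 1, last_year + 1)
    return future_years, slope * future_years + intercept


# Made-up series ending in 2014, the last year of production data
years, forecast = pred([10.0, 12.0, 14.0], [2012, 2013, 2014])
print(years)  # [2015 2016 2017 2018 2019 2020 2021]
```

A linear trend is the simplest reading of "linear regression" on year alone; the paper's actual function may use different inputs or fitting details.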
The above functions are used to find the forecast values for the crop yield. To make these predicted values more realistic, a small variation is added to them: a random value between 0% and 10% of the range of the crop yield over the years 2015 to 2018.

B. Forecasting Yield

After reading the data into the DataFrame df, the temperature, rainfall and production values are selected for the different districts of every crop. This is done by selecting a DataFrame 'df1' for each crop and a DataFrame 'df2' for each district where that crop is grown. A copy of df2 is stored, to which the forecast values are appended.

To obtain the forecast with the greatest accuracy, three methods are trained for each DataFrame and the one with the least error is selected. The models are trained on 80% of the available data, split randomly using the 'train_test_split' function of the 'sklearn' module. Since this process selects the train and test datasets at random, the model is trained 5 times with different train and test datasets. First, an SVR model with an RBF kernel is trained and its confidence is computed; here, the confidence of a model is its accuracy on the held-out test data. If this accuracy is higher than that of the most accurate model so far, stored in 'high', then 'high' is set to this confidence and the trained model is saved in the pickle file 'crop.pickle'. Next, an SVR model is trained with a linear kernel. Again the proposed system computes the confidence of the model and, if it is higher than 'high', overwrites the previous pickle with the better model. This process is repeated 5 times to obtain the model with the best confidence.

Next, a linear regression model is trained. Again, if its confidence is better than that of the current best model, it is saved in the pickle. The model is then read from the pickle into 'clf'; this is the model with the best accuracy. To forecast the yield, the predict function of this model is called with the X_lately values, which are the rainfall, temperature and current yield for the years 2016, 2017 and 2018. The output is a list of 3 numbers, the production values for the years 2019, 2020 and 2021. These are appended to the 'df3' DataFrame, which is appended to the final DataFrame.

This process is performed for all the districts where each crop is grown, and the output is stored in a CSV file.

V. APPLICATION DESCRIPTION

An Android app has been developed to query the results of the machine learning analysis. The app is compatible with Android OS version 4.1.x (JELLY_BEAN) and higher. This makes sure the app runs on 99.5% of all Android devices, so it can reach the farmer demographic. The app has a simple, easy-to-use interface requiring only a few taps to retrieve the desired results.

There are two activities in the app, MainActivity and Results. The user has to fill in the form fields on MainActivity to move on to the Results activity. MainActivity uses the Search Dialog Android library to display the available entries for the fields State, District, Crop (which is optional), Area and Soil. State and District can be filled with a single tap of a button that gets the latitude and longitude of the user. The GPS coordinates are fetched using the LocationManager API and then translated, using Geocoder, into the proper State and District names.

The filled entries are bundled and sent to the Results activity, where the program queries the CSV file in the assets folder of the app. This does not require an active Internet connection, as the file is local. The CSV file is the output of the machine learning algorithms described earlier in this paper. If the Crop field was left empty in the form, the search returns the best crop to be grown; otherwise, it returns information about the requested crop.

The results are presented in a table with the columns Year, Predicted Production, Rainfall and Temperature. The data shown covers the years 2019 to 2021. Two plots in the same activity depict the variation of production and rainfall over time. The plots are made using the GraphView library, and the predicted values are distinguished from the past values in the dataset by different colours.

Fig. 2: Main Activity Query Form
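The model-selection loop described under B. Forecasting Yield can be sketched with scikit-learn as follows. This is a minimal sketch, not the authors' code: the function name select_best_model and its parameters are hypothetical, the "confidence" is assumed to be each model's score method on the held-out 20% (for scikit-learn regressors this is the coefficient of determination, R², rather than a classification accuracy), and all three models are assumed to share the same random split in each of the 5 rounds.

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR


def select_best_model(X, y, path="crop.pickle", rounds=5, seed=0):
    """Hypothetical sketch of the selection loop in B. Forecasting Yield.

    In each round the data is split at random (80% train, 20% test), and
    SVR (rbf kernel), SVR (linear kernel) and LinearRegression are fitted.
    Whenever a model's test score beats the best so far (`high`), it is
    pickled to `path`. The best model is finally reloaded as `clf`.
    """
    rng = np.random.RandomState(seed)
    high = -np.inf  # best confidence seen so far
    for _ in range(rounds):
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=rng.randint(1_000_000))
        for model in (SVR(kernel="rbf"), SVR(kernel="linear"),
                      LinearRegression()):
            model.fit(X_train, y_train)
            confidence = model.score(X_test, y_test)  # R^2 for regressors
            if confidence > high:
                high = confidence
                with open(path, "wb") as f:
                    pickle.dump(model, f)
    with open(path, "rb") as f:
        clf = pickle.load(f)
    return clf, high
```

With X holding rows of rainfall, temperature and yield and y the production to be predicted, clf.predict(X_lately) would then return the forecasts; how the paper builds X and y is not specified.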
As shown in Fig. 4, the user enters all the details, including a specific crop; the result then shows the predicted production for that region based on temperature, soil type and rainfall, as shown in Fig. 5.

Fig. 3: Results Activity

Fig. 2 shows the app launch activity with a sample filled form. The Crop field is left empty to get the crop that gives the best yield for the given parameters. Fig. 3 shows the fetched results, with graphs, for the query in Fig. 2: sugarcane gives a better yield than the other crops under the predicted temperature and rainfall.
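The Results activity is Android code, but the lookup it performs on the output CSV can be illustrated in Python. This is a minimal sketch under stated assumptions: the column names State, District, Crop, Year and Predicted_Production are hypothetical (the paper does not give the CSV layout), and "best crop" is assumed to mean the crop with the highest total predicted production in the chosen district.

```python
import pandas as pd


def query_results(table, state, district, crop=None):
    """Hypothetical sketch of the Results-activity lookup.

    With no crop given (the Crop field left empty), pick the crop with the
    highest total predicted production in the district; otherwise return
    the rows for the requested crop. Rows are ordered by year.
    """
    rows = table[(table["State"] == state) & (table["District"] == district)]
    if not crop:
        totals = rows.groupby("Crop")["Predicted_Production"].sum()
        crop = totals.idxmax()
    return rows[rows["Crop"] == crop].sort_values("Year")
```

Summing over the forecast years is only one way to rank crops; averaging, or using the first forecast year alone, would fit the description equally well.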

Fig. 5: Groundnut yield prediction of Nagpur district.

Fig. 5 shows a predicted groundnut production of 18.68 tons in 2019 and 18.94 tons in 2020.

Fig. 4: Main Activity Query Form

Fig. 6: Maize prediction for Bagalkot district in tons

Fig. 7: Wheat prediction for Bidar district in tons

REFERENCES
[1] Y. Miao, D. J. Mulla and P. C. Robert (2006), "Identifying important factors influencing corn yield and grain quality variability using artificial neural networks", Precision Agric., 7: 117. https://doi.org/10.1007/s11119-006-9004-y
[2] S. Ayoubi and K. L. Sahrawat (2011), "Comparing multivariate regression and artificial neural network to predict barley production from soil characteristics in northern Iran", Archives of Agronomy and Soil Science, 57:5, 549-565. DOI: 10.1080/03650341003631400
[3] B. Kuang and A. M. Mouazen (2013), "Non-biased prediction of soil organic carbon and total nitrogen with vis–NIR spectroscopy, as affected by soil moisture content and texture", Biosystems Engineering, vol. 114, issue 3, pp. 249-258, Elsevier. https://doi.org/10.1016/j.biosystemseng.2013.01.005
[4] X. E. Pantazi, D. Moshou, A. M. Mouazen, B. Kuang and T. Alexandridis (2014), "Application of Supervised Self Organising Models for Wheat Yield Prediction", Artificial Intelligence Applications and Innovations (AIAI 2014), IFIP Advances in Information and Communication Technology, vol. 436, Springer, Berlin, Heidelberg.
[5] A. Crane-Droesch (2018), "Machine learning methods for crop yield prediction and climate change impact assessment in agriculture", Environmental Research Letters, 13, 114003.
