CHAPTER-1
INTRODUCTION
1.1 Overview
Agriculture is the backbone of the Indian economy. In India, agricultural yield depends
primarily on weather conditions and cultivated area; rice cultivation, for example, depends
mainly on rainfall and soil type. Farmers need timely advice and analysis of future crop
productivity in order to maximize crop production, which makes yield prediction an important
agricultural problem. In the past, farmers predicted their yield from their experience of
previous years' yields. Today, various data analytics techniques and algorithms can be used
to predict crop yield, and the range of applications and the role of big data analytics
techniques in agriculture are growing.
Since the introduction of new technologies and techniques, the agriculture field has been
slowly degrading. With so many inventions available, people concentrate on cultivating
artificial, hybrid products, which leads to an unhealthy life. Many people today are not
aware of how to cultivate crops at the right time and in the right place. These cultivation
practices are also changing seasonal climatic conditions and harming fundamental assets such
as soil, water and air, which leads to food insecurity. Despite analysis of issues such as
weather, temperature and several other factors, there is still no proper solution or
technology to overcome this situation. In India, there are several ways to increase economic
growth in the field of agriculture, and multiple ways to improve both the yield and the
quality of crops.
Machine learning algorithms are also useful for predicting crop yield. The prediction is
based on past information on weather, temperature and a number of other factors. The
application we developed runs the algorithm and shows the list of crops suitable for the
entered data, together with the predicted yield value.
When crop producers have accurate information on the expected yield, losses are minimized.
Machine learning is a fast-growing approach that is spreading to every sector and helping it
make viable decisions and get the most out of its applications.
The core objective of crop yield estimation is to achieve higher agricultural production,
and many established models are used to increase crop yield. Nowadays, ML is used worldwide
because of its efficiency in various tasks such as forecasting, fault detection and pattern
recognition.
ML algorithms also help to improve the crop yield when unfavorable conditions would
otherwise cause a loss. They are applied to crop selection in order to reduce losses in crop
yield irrespective of adverse environmental conditions. The main objectives are:
a. To use machine learning techniques to predict crop yield.
b. To provide an easy-to-use user interface.
c. To increase the accuracy of crop yield prediction.
d. To analyse different climatic parameters (rainfall, temperature, etc.).
1.2 PROBLEM STATEMENT:
The problem facing the Indian agriculture sector is the integration of technology to bring
the desired outputs. With the advent of new technologies and the overuse of non-renewable
energy resources, patterns of rainfall and temperature have been disturbed. The inconsistent
trends arising from the side effects of global warming make it difficult for farmers to
predict temperature and rainfall patterns clearly, which affects their crop yield, and as
crop yield falls, agriculture's contribution to Indian GDP falls with it. The main aim of
this project is to help farmers cultivate a crop with maximum yield.
1.3 OBJECTIVE OF PROJECT
This project focuses on predicting the yield of the crop by applying various machine learning
techniques. The outcome of these techniques is compared on the basis of mean absolute error.
The prediction made by machine learning algorithms will help the farmers to decide which crop
to grow to get the maximum yield by considering factors like temperature, rainfall, area, etc.
CHAPTER-2
LITERATURE SURVEY
Title: YIELD OF THE CROP USING MACHINE LEARNING ALGORITHM

AUTHORS: P.Priya, U.Muthaiah & M.Balamurugan
Agriculture plays a dominant role in the growth of the country's economy. Climate and
other environmental changes have become a major threat to agriculture. Machine learning
(ML) is an essential approach for achieving practical and effective solutions to this
problem. Crop yield prediction involves predicting the yield of a crop from available
historical data such as weather parameters, soil parameters and historic crop yield. This
paper focuses on predicting the yield of the crop from existing data using the Random
Forest algorithm. Real data from Tamil Nadu were used to build the models, and the models
were tested with samples. The prediction will help the farmer to know the yield of the crop
before cultivating the field. Random Forest, a powerful and popular supervised machine
learning algorithm, is used to predict the future crop yield accurately.
Title: Applications of machine learning techniques in agricultural crop production:
a review
AUTHORS: Mishra, S., Mishra, D. and Santra, G. H.
This paper was prepared as an effort to reassess the research studies on the
relevance of machine learning techniques in the domain of agricultural crop production.
Methods/Statistical Analysis: Machine learning is a new approach to agricultural crop
production management. Accurate and timely forecasts of crop production are necessary for
important policy decisions such as import-export, pricing, marketing and distribution, which
are issued by the Directorate of Economics and Statistics. However, one has to understand
that these prior estimates are not objective estimates, as they require a lot of
descriptive assessment based on many different qualitative factors. Hence there is a
requirement to develop statistically sound, objective predictions of crop production.
Developments in computing and information storage have provided large amounts of data.
Findings: The problem has been to extract knowledge from this raw data, which has led to
the development of new approaches and techniques such as machine learning that can be
used to unite the knowledge of the data with crop yield evaluation. This research was
intended to evaluate these innovative techniques such that significant relationships can be
found by applying them to the various variables present in the database.
Application/Improvement: A few techniques such as artificial neural networks, Information
Fuzzy Network, Decision Tree, Regression Analysis, Bayesian belief network, time series
analysis, Markov chain model, k-means clustering, k-nearest neighbor and support vector
machine, as applied in the domain of agriculture, were presented.
Title: A Model for Prediction of Crop Yield.
AUTHORS: Manjula.E
Data mining is an emerging research field in crop yield analysis. Yield prediction is a
very important issue in agriculture. Any farmer is interested in knowing how much yield
he can expect. In the past, yield prediction was performed by considering the farmer's
experience with a particular field and crop. Yield prediction remains a major issue to be
solved on the basis of available data, and data mining techniques are a better choice for
this purpose. Different data mining techniques are used and evaluated in agriculture for
estimating the future year's crop production. This research proposes and implements a
system to predict crop yield from previous data, which is achieved by applying association
rule mining to agriculture data. The research focuses on the creation of a prediction model
which may be used for future prediction of crop yield. This paper presents a brief analysis
of crop yield prediction using a data mining technique based on association rules for the
selected region, i.e. a district of Tamil Nadu in India. The experimental results show that
the proposed work predicts crop yield efficiently.
Title: Agricultural crop yield prediction using artificial neural network approach
AUTHORS: Dahikar, S. S. and Rode, S. V.
Climatological phenomena affect local weather conditions in various parts of the
world, and these weather conditions have a direct effect on crop yield. Much research has
been done exploring the connections between large-scale climatological phenomena and crop
yield. Artificial neural networks have been demonstrated to be powerful tools for modeling
and prediction. The crop prediction methodology is used to predict the suitable crop by
sensing various soil parameters and also parameters related to the atmosphere, such as type
of soil, pH, nitrogen, phosphate, potassium, organic carbon, calcium, magnesium, sulphur,
manganese, copper, iron, depth, temperature, rainfall and humidity. For that purpose an
artificial neural network (ANN) is used.
Title: Predictive ability of machine learning methods for massive crop yield
prediction.
AUTHORS: González-Sánchez, A., Frausto-Solís, J. and Ojeda-Bustamante, W.
An important issue for agricultural planning purposes is the accurate yield
estimation for the numerous crops involved in the planning. Machine learning (ML) is an
essential approach for achieving practical and effective solutions for this problem. Many
comparisons of ML methods for yield prediction have been made, seeking the most
accurate technique. Generally, the number of evaluated crops and techniques is too low
and does not provide enough information for agricultural planning purposes. This paper
compares the predictive accuracy of ML and linear regression techniques for crop yield
prediction in ten crop datasets. Multiple linear regression, M5-Prime regression trees,
multilayer perceptron neural networks, support vector regression and k-nearest neighbor
methods were ranked. Four accuracy metrics were used to validate the models: the root
mean square error (RMSE), root relative square error (RRSE), normalized mean absolute
error (MAE), and correlation factor (R). Real data from an irrigation zone of Mexico were
used for building the models. Models were tested with samples of two consecutive years.
The results show that the M5-Prime and k-nearest neighbor techniques obtain the lowest
average RMSE errors (5.14 and 4.91), the lowest RRSE errors (79.46% and 79.78%), the
lowest average MAE errors (18.12% and 19.42%), and the highest average correlation
factors (0.41 and 0.42). Since M5-Prime achieves the largest number of crop yield models
with the lowest errors, it is a very suitable tool for massive crop yield prediction in
agricultural planning.
CHAPTER-3
SYSTEM ANALYSIS
3.1:EXISTING SYSTEM:
Due to the revolution in industrialization, the economic contribution of agriculture to
India's GDP is steadily declining as the country's economy grows on a broad base. The
problem facing the Indian agriculture sector is the integration of technology to bring
the desired outputs. With the advent of new technologies and the overuse of non-renewable
energy resources, patterns of rainfall and temperature have been disturbed. The inconsistent
trends arising from the side effects of global warming make it cumbersome for farmers to
predict temperature and rainfall patterns clearly, which affects their crop yield. In order
to perform accurate prediction and handle inconsistent trends in temperature and rainfall,
various machine learning algorithms such as CNNs, together with computer vision, can be
applied to find a pattern. This will complement agricultural growth in India and also
improve the ease of living for farmers. In the past, many researchers have applied machine
learning techniques and computer vision to enhance the agricultural growth of the country,
but these approaches gave lower accuracy.
3.2 PROPOSED SYSTEM:
This project focuses on predicting the yield of the crop by applying various machine
learning techniques: Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM) and
Feed Forward Neural Network. The outcomes of these techniques are compared on the basis of
mean square error. The predictions made by the machine learning algorithms will help
farmers to decide which crop to grow to get the maximum yield by considering factors like
temperature, rainfall, area, etc. The crop yield prediction is determined considering all
the features.
The advantages of proposed system:
• The proposed system is useful for the agriculture department and farmers to predict
crop yield and to suggest a suitable crop.
• It also helps farmers to decide which crop to cultivate in the field.
3.3:HARDWARE AND SOFTWARE REQUIREMENTS:
3.3.1 Hardware System Configuration:
• System - Pentium IV 3.5 GHz.
• Hard Disk - 40 GB.
• Floppy Drive - 1.44 MB.
• RAM - 512 MB.
3.3.2 Software Requirements:
• System - Windows.
• Coding Language – Python 3.7.
Python
• Python is an interpreted high-level programming language for general-purpose
programming. Created by Guido van Rossum and first released in 1991, Python
has a design philosophy that emphasizes code readability, notably using significant
whitespace. Python features a dynamic type system and automatic memory
management. It supports multiple programming paradigms, including object-
oriented, imperative, functional and procedural, and has a large and comprehensive
standard library.
• Python is Interpreted − Python is processed at runtime by the interpreter. You do
not need to compile your program before executing it. This is similar to PERL and
PHP.
• Python is Interactive − you can actually sit at a Python prompt and interact with the
interpreter directly to write your programs.
• Python also acknowledges that speed of development is important. Readable and
terse code is part of this, and so is access to powerful constructs that avoid tedious
repetition of code. Maintainability also ties into this: code size may be an all but
useless metric, but it does say something about how much code you have to scan, read
and/or understand to troubleshoot problems or tweak behaviors. This speed of
development, the ease with which a programmer of other languages can pick up
basic Python skills and the huge standard library are key to another area where
Python excels. Tools written in Python have been quick to implement, have saved a lot
of time, and several of them have later been patched and updated by people with no
Python background, without breaking.
Machine Learning
Before we take a look at the details of various machine learning methods, let's
start by looking at what machine learning is, and what it isn't. Machine learning is often
categorized as a subfield of artificial intelligence, but I find that categorization can often
be misleading at first brush. The study of machine learning certainly arose from research
in this context, but in the data science application of machine learning methods, it's more
helpful to think of machine learning as a means of building models of data. Fundamentally,
machine learning involves building mathematical models to help understand data.
"Learning" enters the fray when we give these models tuneable parameters that can be
adapted to observed data; in this way the program can be considered to be "learning" from
the data. Once these models have been fit to previously seen data, they can be used to
predict and understand aspects of newly observed data. I'll leave to the reader the more
philosophical digression regarding the extent to which this type of mathematical, model-
based "learning" is similar to the "learning" exhibited by the human brain. Understanding
the problem setting in machine learning is essential to using these tools effectively.
Applications of Machine Learning:
Machine learning is one of the most rapidly growing technologies, and according to
researchers we are in the golden age of AI and ML. It is used to solve many complex
real-world problems that cannot be solved with traditional approaches.
The following are some real-world applications of ML:
• Emotion analysis
• Sentiment analysis
• Error detection and prevention
• Weather forecasting and prediction
• Stock market analysis and forecasting
• Speech synthesis
• Speech recognition
• Customer segmentation
• Object recognition
• Fraud detection
• Fraud prevention
• Recommendation of products to customer in online shopping
CHAPTER-4
SYSTEM DESIGN
4.1:ARCHITECTURE
Fig.1 Architecture
[Figure: flow diagram. Agriculture Data -> Data Collection -> Data Preparation Stage ->
Pre-Processing Data -> Feature Extraction (Soil, Weather, Area, Temperature, Other
Conditions) -> Load and Train Dataset -> Applying Machine Learning Algorithms (LSTM, RNN,
Feed Forward Neural Network) -> Summarize the Data -> Make a Prediction (Constraint) ->
Calculate the yield of crop based on temperature, rainfall, area and soil type -> Result]
4.2:UML DIAGRAMS:
4.2.1 Introduction: UML stands for Unified Modelling Language. UML is a standardized
general-purpose modelling language in the field of object-oriented software
engineering. The standard is managed, and was created, by the Object Management Group.
The goal is for UML to become a common language for creating models of object-oriented
computer software. In its current form UML comprises two major components: a meta-model
and a notation. In the future, some form of method or process may also be added to, or
associated with, UML. The Unified Modelling Language is a standard language for
specifying, visualizing, constructing and documenting the artifacts of a software
system, as well as for business modelling and other non-software systems. The UML
represents a collection of best engineering practices that have proven successful in the
modelling of large and complex systems. The UML is a very important part of developing
object-oriented software and of the software development process. The UML uses mostly
graphical notations to express the design of software projects.
GOALS: The primary goals in the design of the UML are as follows:
1. Provide users with a ready-to-use, expressive visual modelling language so that they
can develop and exchange meaningful models.
2. Provide extensibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and development processes.
4. Provide a formal basis for understanding the modelling language.
5. Encourage the growth of the OO tools market.
6. Support higher-level development concepts such as collaborations, frameworks,
patterns and components.
7. Integrate best practices.
4.2.2 Use Case Diagram:
A use case diagram in the Unified Modeling Language (UML) is a type of behavioral
diagram defined by and created from a Use-case analysis. Its purpose is to present a
graphical overview of the functionality provided by a system in terms of actors, their goals
(represented as use cases), and any dependencies between those use cases. The main
purpose of a use case diagram is to show what system functions are performed for which
actor. Roles of the actors in the system can be depicted.
Fig.2 Usecase Diagram
4.2.3 Class Diagram:
In software engineering, a class diagram in the Unified Modelling Language (UML) is a type
of static structure diagram that describes the structure of a system by showing the system's
classes, their attributes, operations (or methods), and the relationships among the classes.
It shows which class contains which information.
Fig.3 Class Diagram

4.2.4 Sequence Diagram:
A sequence diagram in Unified Modelling Language (UML) is a kind of interaction
diagram that shows how processes operate with one another and in what order. It is a
construct of a Message Sequence Chart. Sequence diagrams are sometimes called event
diagrams, event scenarios, and timing diagrams.
Fig.4 Sequence Diagram
4.2.5 Activity Diagram:
Activity diagrams are graphical representations of workflows of stepwise activities and
actions with support for choice, iteration and concurrency. In the Unified Modelling
Language, activity diagrams can be used to describe the business and operational
step-by-step workflows of components in a system. An activity diagram shows the overall
flow of control.
Fig 5.Activity Diagram
CHAPTER-5
IMPLEMENTATION
5.1: Modules Description:
1. Upload data
Upload the dataset collected from the IMD. It contains attributes such as State,
District, Crop, Season, Area, Production and Rainfall.
Table 1. Dataset
2. Preprocessing
Data preprocessing is a method used to convert raw data into a clean data set.
The data are gathered from different sources and collected in a raw format that is not
suitable for analysis. By applying techniques such as replacing missing values
and null values, we can transform the data into an understandable format. The final step of
data preprocessing is splitting the data into training and testing sets. The data are
usually split unequally because training the model requires as many data points as possible.
The training dataset is the initial dataset used to train ML algorithms to learn and produce
correct predictions. A few rows of the preprocessed data are shown.
3. Feature Extraction:
Many factors affect the yield of any crop and its production. These are
the features that help in predicting the production of a crop over the year. In
this project we include factors like Temperature, Rainfall, Area, Humidity and
soil type.
4. Load Train and Test Dataset:
The dataset is split into a training set with 67% of the observations, which is used to
train the model, and a test set with the remaining 33%, which is used to test the model.
5. Apply Neural Network
The machine learning algorithms are trained on the processed data.
6. Performance Analysis
The performance of the neural network models was evaluated using the Mean
Square Error (MSE) metric.
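As a minimal sketch of this evaluation step, the Python below computes MSE by hand and uses it to pick the better of two models. The yield values, the model names model_a and model_b, and the helper mean_squared_error are assumptions made for illustration; they are not taken from the project's dataset or code.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical observed yields and the predictions of two hypothetical models.
actual_yield = [2.0, 3.0, 4.0, 5.0]
predictions = {
    "model_a": [2.5, 3.0, 3.5, 5.0],
    "model_b": [1.0, 3.0, 4.0, 7.0],
}

# Score every model and keep the one with the lowest error.
scores = {name: mean_squared_error(actual_yield, pred)
          for name, pred in predictions.items()}
best_model = min(scores, key=scores.get)
print(scores, best_model)
```

A lower MSE means the predictions are closer to the observed yields. In this toy data model_a scores 0.125 against 1.25 for model_b, so it would be preferred.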
5.2 Modules Used in Python:
TensorFlow:
TensorFlow is a free and open-source software library for dataflow and differentiable
programming across a range of tasks. It is a symbolic math library, and is also used for
machine learning applications such as neural networks. It is used for both research and
production at Google. TensorFlow was developed by the Google Brain team for internal
Google use.
It was released under the Apache 2.0 open-source license on November 9, 2015.
NumPy:
NumPy is a general-purpose array-processing package. It provides a high-performance
multidimensional array object, and tools for working with these arrays.
It is the fundamental package for scientific computing with Python. It contains various
features including these important ones:
• A powerful N-dimensional array object
• Sophisticated (broadcasting) functions
• Tools for integrating C/C++ and Fortran code
• Useful linear algebra, Fourier transform, and random number capabilities
Besides its obvious scientific uses, NumPy can also be used as an efficient
multidimensional container of generic data. Arbitrary data types can be defined using
NumPy, which allows NumPy to seamlessly and speedily integrate with a wide variety of
databases.
Pandas:
Pandas is an open-source Python library providing high-performance data manipulation
and analysis tools using its powerful data structures. Before Pandas, Python was mostly
used for data munging and preparation and contributed very little to data analysis; Pandas
solved this problem. Using Pandas, we can accomplish five typical steps in the processing
and analysis of data, regardless of the origin of the data: load, prepare, manipulate,
model, and analyze. Python with Pandas is used in a wide range of academic and
commercial domains, including finance, economics, statistics and analytics.
Matplotlib:
Matplotlib is a Python 2D plotting library which produces publication-quality figures in a
variety of hardcopy formats and interactive environments across platforms. Matplotlib can
be used in Python scripts, the Python and IPython shells, the Jupyter Notebook, web
application servers, and four graphical user interface toolkits. Matplotlib tries to make easy
things easy and hard things possible. You can generate plots, histograms, power spectra,
bar charts, error charts, scatter plots, etc., with just a few lines of code. For simple
plotting the pyplot module provides a MATLAB-like interface, particularly when combined
with IPython. For the power user, there is full control of line styles, font properties,
axes properties, etc., via an object-oriented interface or via a set of functions familiar
to MATLAB users.
Scikit-learn
Scikit-learn provides a range of supervised and unsupervised learning algorithms via a
consistent interface in Python. It is licensed under a permissive simplified BSD license and
is distributed with many Linux distributions, encouraging academic and commercial use.
Tkinter
Tkinter is the standard GUI library for Python. Python, when combined with Tkinter,
provides a fast and easy way to create GUI applications. Tkinter provides a powerful
object-oriented interface to the Tk GUI toolkit. Creating a GUI application using Tkinter
is an easy task.
5.3 ALGORITHMS
5.3.1:Feed Forward Neural Network:
• A Feed forward Neural Network is an artificial neural network wherein connections
between the nodes do not form a cycle.
• Perceptrons are arranged in layers, with the first layer taking in inputs and the last
layer producing outputs. The middle layers have no connection with the external
world, and hence are called hidden layers.
• Each perceptron in one layer is connected to every perceptron in the next layer.
Hence information is constantly "fed forward" from one layer to the next, and this
explains why these networks are called feed-forward networks.
• There is no connection among perceptrons in the same layer.
Fig.6 Feed Forward Neural Network
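The layer structure described above can be sketched in a few lines of plain Python. This is only an illustration of a forward pass under assumed values: the layer sizes, the weights and the helper names dense_layer and forward are hypothetical and are not the project's trained network.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense_layer(inputs, weights, biases, activation):
    """Fully connected layer: every input feeds every unit of the layer."""
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(unit_weights, inputs)) + bias
        outputs.append(activation(total))
    return outputs


def forward(inputs, layers):
    """Pass the values through each layer in order; nothing flows backwards."""
    values = inputs
    for weights, biases, activation in layers:
        values = dense_layer(values, weights, biases, activation)
    return values


# Hypothetical network: 3 inputs -> 2 hidden units (sigmoid) -> 1 linear output.
hidden = ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1], sigmoid)
output = ([[1.0, -1.0]], [0.5], lambda v: v)

prediction = forward([1.0, 2.0, 3.0], [hidden, output])
print(prediction)
```

Each unit only reads the outputs of the previous layer, so there are no connections within a layer and no cycles, which is the defining property listed above.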
5.3.2:Recurrent Neural Network:
Recurrent neural networks (RNNs) are a widely used class of algorithms for sequential data
and have been used in products such as Apple's Siri and Google's voice search. An RNN
retains information about its earlier inputs through an internal memory, which makes it
well suited to machine learning problems that involve sequential data. RNNs are among the
algorithms behind the achievements seen in deep learning over the past few years.
They are a powerful and robust type of neural network, and their internal memory is what
distinguishes them from feed-forward networks.
Fig.7 Recurrent Neural Network

In an RNN the information cycles through a loop. When it makes a decision, it considers
the current input and also what it has learned from the inputs it received previously.
Because of their internal memory, RNNs can remember important things about the input
they received, which allows them to be very precise in predicting what is coming next. This
is why they are a preferred algorithm for sequential data like time series, speech, text,
financial data, audio, video, weather and much more. Recurrent neural networks can form
a much deeper understanding of a sequence and its context than many other algorithms.
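To make the loop concrete, here is a minimal sketch of a single-unit recurrent network in plain Python. The weights w_x, w_h and b and the helper names rnn_step and run_rnn are assumptions chosen for illustration, not values from the project.

```python
import math


def rnn_step(x, h_prev, w_x, w_h, b):
    """One recurrent step: the new state mixes the current input with the old state."""
    return math.tanh(w_x * x + w_h * h_prev + b)


def run_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0):
    """Feed a sequence through a single-unit RNN and return every hidden state."""
    h = 0.0  # the memory starts empty
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states


# Two sequences that end with the same inputs but start differently.
states_a = run_rnn([1.0, 0.0, 0.5])
states_b = run_rnn([-1.0, 0.0, 0.5])
print(states_a[-1], states_b[-1])
```

The final states differ even though the last two inputs are identical, because the hidden state carries information about the first input forward through the loop.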
5.3.3:LONG SHORT-TERM MEMORY
Long short-term memory networks (LSTMs) are an extension of recurrent neural
networks that basically extends their memory. They are therefore well suited to learning
from important experiences that have very long time lags in between.
The units of an LSTM are used as building units for the layers of an RNN, which is then
often called an LSTM network.
LSTMs enable RNNs to remember inputs over a long period of time. This is because
LSTMs contain information in a memory, much like the memory of a computer. The
LSTM can read, write and delete information from its memory.
Fig.8 LSTM Algorithm

Pre-processing and Data Splitting :
Backfilling is used to handle the null values, and a simple method is used to split the
dataset into a training set with 67% of the observations, which is used to train
our model, leaving the remaining 33% for testing the model.
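The two operations can be sketched in plain Python as follows. The rainfall values, the helper names backfill and split_train_test, and the choice of an ordered (unshuffled) split are assumptions for illustration; the project's own code may differ.

```python
def backfill(values):
    """Replace each None with the next non-missing value that follows it."""
    filled = list(values)
    next_valid = None
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = next_valid  # stays None if nothing valid follows
        else:
            next_valid = filled[i]
    return filled


def split_train_test(rows, train_fraction=0.67):
    """Keep the first part of the series for training and the rest for testing."""
    train_size = int(len(rows) * train_fraction)
    return rows[:train_size], rows[train_size:]


# Hypothetical rainfall readings; None marks a missing value.
rainfall = [None, 120.0, None, None, 95.5, 101.2, None, 88.0, 110.3]
clean = backfill(rainfall)
train, test = split_train_test(clean)
print(clean)
print(len(train), len(test))
```

With nine observations this yields six training rows and three test rows. In pandas, the same fill is available through the bfill method of a DataFrame or Series.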
Define Model :
The network has a visible layer with 1 input, a hidden layer with 4 LSTM blocks or
neurons, and an output layer that makes a single value prediction. The Long Short-Term
Memory network, or LSTM network, is a recurrent neural network that is trained using
backpropagation. Instead of neurons, LSTM networks have memory blocks that are
connected through layers. A block has components that make it smarter than a classical
neuron and a memory for recent sequences. A block contains gates that manage the block's
state and output. A block operates upon an input sequence, and each gate within a block
uses sigmoid activation units to control whether it is triggered or not, making the change
of state and the addition of information flowing through the block conditional. Once the
model is fit, we can estimate its performance on the train and test datasets. This will
give us a point of comparison for new models. The network is trained for 50 epochs with a
batch size of 1. The predictions are inverted back to the original scale before error scores
are calculated, to ensure that performance is reported in the same units as the original
data.
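The gating behaviour of a memory block can be illustrated with one step of a single LSTM block written in plain Python. This is a sketch of the standard LSTM update with scalar input and state; the weights in params and the helper name lstm_step are hypothetical, and the code is not the Keras model used in the project.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a single LSTM block with scalar input and state.

    p maps each gate name to a (w_x, w_h, bias) tuple.
    """
    def gate(name, squash):
        w_x, w_h, b = p[name]
        return squash(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)        # how much old memory to keep
    i = gate("input", sigmoid)         # how much new information to write
    o = gate("output", sigmoid)        # how much memory to expose
    g = gate("candidate", math.tanh)   # the new information itself

    c = f * c_prev + i * g             # updated cell state (the memory)
    h = o * math.tanh(c)               # block output
    return h, c


# Hypothetical weights, chosen only for illustration.
params = {
    "forget": (0.5, 0.1, 0.0),
    "input": (0.6, 0.2, 0.0),
    "output": (0.4, 0.3, 0.0),
    "candidate": (0.9, 0.5, 0.0),
}

h, c = 0.0, 0.0
for x in [0.2, 0.4, 0.6]:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

Because every gate is a sigmoid with a value between 0 and 1, the block can keep, overwrite or hide its memory by degrees, which is what makes the change of state conditional.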
5.3.4 FLOW OF AN ALGORITHM :

Step 1: Import libraries. Here we use tkinter, pandas, matplotlib, numpy, keras and sklearn.
Tkinter is used for the graphical interface, and the others are used for training the
machine learning model.
Step 2: Create the main function and set the title.
Step 3: Create global variables.
Step 4: Upload the dataset. The dataset must be in .csv format.
Step 5: Preprocess the data.
Step 6: Define the rnn function, which creates the RNN model, defining each layer with a
certain number of filters, and trains it on the dataset.
During training, the loss and accuracy are recorded for each epoch.
Adam is the optimizer used for training, and accuracy is the metric reported.
Here the task is defined as binary classification.
Binary classification: In binary classification each input sample is assigned to one of two
classes. Generally these two classes are assigned labels like 1 and 0, or positive and
negative. More specifically, the two class labels might be something like malignant or
benign (e.g. if the problem is about cancer classification), or success or failure (e.g. if it is
about classifying student test scores).
Assume there is a binary classification problem with the classes positive and negative.
Here is an example of the labels for seven samples used to train the model. These are
called the ground-truth labels of the samples.
Positive, Negative, Positive, Negative, Positive, Negative, Positive, Negative, Positive, Negative
For comparison, here are both the ground-truth and predicted labels. At first glance we
can see 4 correct and 3 incorrect predictions. Note that changing the threshold might give
different results. For example, setting the threshold to 0.6 leaves only two incorrect
predictions.
To extract more information about model performance, the confusion matrix is used. The
confusion matrix helps us visualize whether the model is "confused" in discriminating
between the two classes. As seen in the next figure, it is a 2×2 matrix. The labels of the
two rows and columns are Positive and Negative, to reflect the two class labels. In this
example the row labels represent the ground-truth labels, while the column labels
represent the predicted labels. This convention could be reversed.
Accuracy is a metric that describes how the model performs across all classes.
It is useful when all classes are of equal importance. It is calculated as the ratio of
the number of correct predictions to the total number of predictions:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Using scikit-learn, the accuracy can be calculated from the confusion matrix computed
previously: the variable acc holds the result of dividing the sum of the True Positives
and True Negatives by the sum of all values in the matrix.
Step 7: Define the runLSTM and runFF functions, which train the LSTM and feed-forward models
on the dataset in the same way as the RNN function.
Step 8: Define the predict function to predict the yield. Here we use the trained Keras
model to make predictions on the test data; the actual and predicted values are stored in
arrays.
Mean squared error (MSE): the mean of the squared differences between the actual and
predicted values, MSE = (1/n) Σ (actual value − predicted value)².
The lower the MSE, the more accurate the model; the higher the MSE, the less accurate.
Step 9: Define the graph function, which plots the results with matplotlib (NumPy supplies
the bar positions).
Step 10: Define the GUI for the user interface.
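The two evaluation measures mentioned in Steps 6 and 8 can be illustrated with a small NumPy sketch. The labels, scores and yield values below are made up for the example (the scores are chosen so that the counts match those quoted in the text: three errors at a threshold of 0.5 and two at 0.6); the helper names are hypothetical and this is not the code used in the project.

```python
import numpy as np

def confusion_matrix_2x2(y_true, y_pred):
    """Rows are ground-truth labels, columns are predictions (0 = Negative, 1 = Positive)."""
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[int(t), int(p)] += 1
    return cm

def accuracy_from_cm(cm):
    """(TP + TN) divided by the sum of all entries of the matrix."""
    return np.trace(cm) / cm.sum()

def mean_squared_error(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((actual - predicted) ** 2))

# Made-up ground truth and model scores for seven samples.
y_true = np.array([1, 0, 0, 1, 1, 1, 0])
scores = np.array([0.70, 0.30, 0.55, 0.80, 0.45, 0.90, 0.65])

# A sample is labelled Positive (1) when its score reaches the threshold.
cm_05 = confusion_matrix_2x2(y_true, (scores >= 0.5).astype(int))
cm_06 = confusion_matrix_2x2(y_true, (scores >= 0.6).astype(int))
acc_05 = accuracy_from_cm(cm_05)
acc_06 = accuracy_from_cm(cm_06)
print(cm_05, acc_05)   # 4 of 7 predictions are correct
print(cm_06, acc_06)   # 5 of 7 predictions are correct

# Made-up yield values to show the error measure for numeric predictions.
mse = mean_squared_error([3.0, 5.0, 2.0], [2.5, 5.0, 4.0])
print(mse)
```

scikit-learn provides `confusion_matrix`, `accuracy_score` and `mean_squared_error` in `sklearn.metrics` for the same calculations; the hand-written versions are used here only to keep the arithmetic visible.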

5.4 SAMPLE CODE
from tkinter import *
import tkinter
from tkinter import filedialog
import os
import pickle
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import keras.layers
from keras.models import Sequential, model_from_json
from keras.layers import Dense, Dropout
from keras.utils import to_categorical
from sklearn.preprocessing import LabelEncoder, StandardScaler
main = tkinter.Tk()
main.title("Crop Yield Prediction using Machine Learning")
main.geometry("1000x650")

# Shared state, assigned inside the functions below through `global` declarations:
# filename, rnn_acc, lstm_acc, ff_acc, classifier, X, Y,
# rainfall_dataset, crop_dataset, le, weight_for_0, weight_for_1
scalerX = StandardScaler()
def upload():
    global filename
    global rainfall_dataset
    global crop_dataset
    filename = filedialog.askdirectory(initialdir=".")
    rainfall_dataset = pd.read_csv('dataset/district wise rainfall normal.csv')
    crop_dataset = pd.read_csv('dataset/Agriculture In India.csv')
    crop_dataset.fillna(0, inplace=True)  # replace missing values with 0
    crop_dataset['Production'] = crop_dataset['Production'].astype(np.int64)
    print(crop_dataset.dtypes)
    print(crop_dataset['Production'])
    text.delete('1.0', END)
    text.insert(END, filename + ' Loaded\n\n')
    text.insert(END, str(crop_dataset.head()))  # show the first rows of the dataset
def preprocess():
    global weight_for_0
    global weight_for_1
    global crop_dataset
    global le
    global X, Y
    le = LabelEncoder()
    # Replace each text column with integer IDs
    crop_dataset['State_Name'] = pd.Series(le.fit_transform(crop_dataset['State_Name']))
    crop_dataset['District_Name'] = pd.Series(le.fit_transform(crop_dataset['District_Name']))
    crop_dataset['Season'] = pd.Series(le.fit_transform(crop_dataset['Season']))
    crop_dataset['Crop'] = pd.Series(le.fit_transform(crop_dataset['Crop']))
    crop_datasets = crop_dataset.values
    cols = crop_datasets.shape[1] - 1
    X = crop_datasets[:, 0:cols]  # every column except the last is a feature
    Y = crop_datasets[:, cols]    # the last column is the target
    # Caution: uint8 only represents 0 to 255, so larger values are not preserved by this cast
    Y = Y.astype('uint8')
    avg = np.average(Y)
    # Label 1 (high yield) when the value is at or above the average, else 0 (low yield)
    Y = (Y >= avg).astype(int)
    a, b = np.unique(Y, return_counts=True)
    print(str(a) + " " + str(b))
    # Class weights are inversely proportional to the class frequencies;
    # they are computed from the integer labels before one-hot encoding
    counts = np.bincount(Y)
    weight_for_0 = 1.0 / counts[0]
    weight_for_1 = 1.0 / counts[1]
    Y = to_categorical(Y)
    print(X.shape)
    print(Y.shape)
    scalerX.fit(X)
    X = scalerX.transform(X)  # standardize the features
    text.insert(END, str(X))
def runRNN():
    global rnn_acc
    global X, Y
    global classifier
    global weight_for_0
    global weight_for_1
    if os.path.exists('model/rnnmodel.json'):
        # Reuse a previously trained model and its stored accuracy history
        with open('model/rnnmodel.json', "r") as json_file:
            loaded_model_json = json_file.read()
            classifier = model_from_json(loaded_model_json)
        classifier.load_weights("model/rnnmodel_weights.h5")
        classifier._make_predict_function()
        print(classifier.summary())
        with open('model/rnnhistory.pckl', 'rb') as f:
            data = pickle.load(f)
        accuracy = data[1] * 100
        rnn_acc = accuracy
        text.insert(END, 'RNN Prediction Accuracy : ' + str(accuracy) + "\n\n")
    else:
        class_weight = {0: weight_for_0, 1: weight_for_1}
        rnn = Sequential()  # create the model object
        # first fully connected layer with 256 units
        rnn.add(Dense(256, input_dim=X.shape[1], activation='relu', kernel_initializer="uniform"))
        # second fully connected layer with 128 units
        rnn.add(Dense(128, activation='relu', kernel_initializer="uniform"))
        # output layer with one unit per class (low or high yield)
        rnn.add(Dense(Y.shape[1], activation='softmax', kernel_initializer="uniform"))
        # report accuracy while training
        rnn.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
        print(rnn.summary())  # display the model details
        history = rnn.fit(X, Y, epochs=2, batch_size=64, class_weight=class_weight)  # train the model
        values = history.history['accuracy']  # accuracy after each epoch
        acc = values[1] * 100  # accuracy of the second (final) epoch
        rnn_acc = acc
        with open('model/rnnhistory.pckl', 'wb') as f:
            pickle.dump(values, f)
        text.insert(END, 'RNN Prediction Accuracy : ' + str(acc) + "\n\n")
        classifier = rnn
        classifier.save_weights('model/rnnmodel_weights.h5')
        model_json = classifier.to_json()
        with open("model/rnnmodel.json", "w") as json_file:
            json_file.write(model_json)
def runLSTM():
    global lstm_acc
    if os.path.exists('model/lstmmodel.json'):
        # Reuse a previously trained model and its stored accuracy history
        with open('model/lstmmodel.json', "r") as json_file:
            loaded_model_json = json_file.read()
            classifier1 = model_from_json(loaded_model_json)
        classifier1.load_weights("model/lstmmodel_weights.h5")
        classifier1._make_predict_function()
        print(classifier1.summary())
        with open('model/lstmhistory.pckl', 'rb') as f:
            data = pickle.load(f)
        accuracy = data[1] * 100
        lstm_acc = accuracy
        text.insert(END, 'LSTM Prediction Accuracy : ' + str(accuracy) + "\n\n")
    else:
        # LSTM layers expect input of shape (samples, time steps, features)
        XX = X.reshape((X.shape[0], X.shape[1], 1))
        model = Sequential()  # create the LSTM model object
        model.add(keras.layers.LSTM(512, input_shape=(X.shape[1], 1)))  # LSTM layer with 512 units
        model.add(Dropout(0.5))  # randomly drop half of the units during training
        model.add(Dense(256, activation='relu'))  # fully connected layer
        model.add(Dense(Y.shape[1], activation='softmax'))  # one output per class (low or high yield)
        model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
        print(model.summary())
        history = model.fit(XX, Y, epochs=2, batch_size=64)  # train the model
        values = history.history['accuracy']  # accuracy after each epoch
        acc = values[1] * 100
        lstm_acc = acc
        with open('model/lstmhistory.pckl', 'wb') as f:
            pickle.dump(values, f)
        text.insert(END, 'LSTM Prediction Accuracy : ' + str(acc) + "\n\n")
        classifier1 = model
        classifier1.save_weights('model/lstmmodel_weights.h5')
        model_json = classifier1.to_json()
        with open("model/lstmmodel.json", "w") as json_file:
            json_file.write(model_json)
def runFF():
    global ff_acc
    model = Sequential([
        Dense(64, activation='relu', input_shape=(X.shape[1],)),
        Dense(64, activation='relu'),
        Dense(2, activation='softmax')])
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    print(model.summary())
    history = model.fit(X, Y, epochs=2, batch_size=64)  # train the model
    values = history.history['accuracy']  # accuracy after each epoch
    ff_acc = values[1] * 100
    text.insert(END, 'Feed Forward Neural Network Prediction Accuracy : ' + str(ff_acc) + "\n\n")
def predict():
    file = filedialog.askopenfilename(initialdir="dataset")
    test = pd.read_csv(file)
    # Note: the encoder is fitted again on the test file, so the integer IDs match the
    # training IDs only if the test file contains the same set of values in each column.
    test['State_Name'] = pd.Series(le.fit_transform(test['State_Name']))
    test['District_Name'] = pd.Series(le.fit_transform(test['District_Name']))
    test['Season'] = pd.Series(le.fit_transform(test['Season']))
    test['Crop'] = pd.Series(le.fit_transform(test['Crop']))
    test = test.values
    test = scalerX.transform(test)  # reuse the scaling fitted on the training data
    print(test.shape)
    y_pred = classifier.predict(test)
    for i in range(len(test)):
        label = np.argmax(y_pred[i])  # index of the most probable class
        print(str(label))
        if label == 0:
            text.insert(END, "X=%s, Predicted = %s" % (test[i], 'Predicted Crop Yield will be LESS') + "\n\n")
        else:
            text.insert(END, "X=%s, Predicted = %s" % (test[i], 'Predicted Crop Yield will be HIGH') + "\n\n")
def graph():
    global rnn_acc, lstm_acc, ff_acc
    bars = ['RNN Accuracy', 'LSTM Accuracy', 'Feed Forward Accuracy']
    height = [rnn_acc, lstm_acc, ff_acc]
    y_pos = np.arange(len(bars))
    plt.bar(y_pos, height)
    plt.xticks(y_pos, bars)
    plt.show()
def topGraph():
    global rainfall_dataset
    global crop_dataset
    rainfall_dataset = pd.read_csv('dataset/district wise rainfall normal.csv')
    rainfall = rainfall_dataset.groupby(['STATE_UT_NAME'])['ANNUAL'].agg(['sum'])
    rainfall = rainfall.sort_values("sum", ascending=False).reset_index()
    rainfall = rainfall.loc[0:5].values  # six states with the highest total rainfall
    x1 = [str(row[0]) for row in rainfall]
    y1 = [row[1] for row in rainfall]
    rice = pd.read_csv('dataset/Agriculture In India.csv')
    rice.fillna(0, inplace=True)
    rice['Production'] = rice['Production'].astype(np.int64)
    rice = rice.groupby(['State_Name', 'Crop'])['Production'].agg(['sum'])
    rice = rice.sort_values("sum", ascending=False).reset_index()
    rice = rice.values

    # The loops for coconut, sugarcane and "any crop" are incomplete in the printed
    # listing; this helper reconstructs them by analogy with the surviving rice loop.
    def top_states(crop=None):
        x, y = [], []
        for row in rice:
            if crop is None or str(row[1]) == crop:
                x.append(str(row[0]))
                y.append(row[2])
                if len(x) > 5:  # stop after six states
                    break
        return x, y

    x2, y2 = top_states('Rice')
    x3, y3 = top_states('Coconut')
    x4, y4 = top_states('Sugarcane')
    x5, y5 = top_states()
    fig, ax = plt.subplots(5)
    fig.suptitle('Top 6 State Rainfall & Crop Yield')
    ax[0].plot(x1, y1)
    ax[0].set_title("State Vs Rainfall")
    ax[1].plot(x2, y2)
    ax[1].set_title("Top 6 State Vs Rice Crop Yield")
    ax[2].plot(x3, y3)
    ax[2].set_title("Top 6 State Vs Coconut Crop Yield")
    ax[3].plot(x4, y4)
    ax[3].set_title("Top 6 State Vs Sugarcane Crop Yield")
    ax[4].plot(x5, y5)
    ax[4].set_title("Top 6 State Vs Any Crop Yield")
    plt.show()
font = ('times', 15, 'bold')
title = Label(main, text='Crop Yield Prediction using Machine Learning', justify=LEFT)
title.config(bg='#00cc88', fg='#000000')
title.config(font=font)
title.config(height=3, width=120)
title.place(x=100,y=5)
title.pack()
font1 = ('times', 12, 'bold')
uploadButton = Button(main, text="Upload Agriculture Dataset", command=upload)
uploadButton.place(x=10,y=100)
uploadButton.config(font=font1)
preprocessButton = Button(main, text="Preprocess Dataset", command=preprocess)
preprocessButton.place(x=300,y=100)
preprocessButton.config(font=font1)
rnnButton = Button(main, text="Run RNN Algorithm", command=runRNN)
rnnButton.place(x=480,y=100)
rnnButton.config(font=font1)
lstmButton = Button(main, text="Run LSTM Algorithm", command=runLSTM)
lstmButton.place(x=670,y=100)
lstmButton.config(font=font1)
ffButton = Button(main, text="Run Feedforward Neural Network", command=runFF)
ffButton.place(x=10,y=150)
ffButton.config(font=font1)
graphButton = Button(main, text="Accuracy Comparison Graph", command=graph)
graphButton.place(x=300,y=150)
graphButton.config(font=font1)
predictButton = Button(main, text="Predict Crop using Test Data", command=predict)
predictButton.place(x=10,y=200)
predictButton.config(font=font1)
topButton = Button(main, text="Top 6 Crop Yield Graph", command=topGraph)
topButton.place(x=300,y=200)
topButton.config(font=font1)
font1 = ('times', 12, 'bold')
text=Text(main,height=20,width=160)
scroll=Scrollbar(text)
text.configure(yscrollcommand=scroll.set)
text.place(x=10,y=250)
text.config(font=font1)
main.mainloop()
5.5 RESULT
5.5.1 SCREENSHOTS
To run the project, double-click the ‘run.bat’ file to get the screen below.
Fig.9 output screenshot 1
In the above screen, click the ‘Upload Crop Dataset’ button to upload the dataset.
Fig.10 output screenshot 2
In the above screen, select the ‘Dataset.csv’ file and click the ‘Open’ button to load the
dataset and get the screen below.

In the above screen the dataset is loaded. It contains some non-numeric values, which the
machine learning models cannot take as input, so the dataset must be preprocessed to
convert them to numeric values by assigning an ID to each distinct non-numeric value.
Click the ‘Preprocess Dataset’ button to process the dataset.
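The ID assignment described above can be sketched in plain Python. The class name `IdEncoder` and the sample season values are hypothetical and only illustrate the idea; the project code itself uses scikit-learn's `LabelEncoder`, which likewise numbers the distinct values in sorted order.

```python
class IdEncoder:
    """Hypothetical helper: maps each distinct text value to an integer ID."""

    def fit(self, values):
        # Sort the distinct values so that the same data always yields the same IDs
        self.ids_ = {value: idx for idx, value in enumerate(sorted(set(values)))}
        return self

    def transform(self, values):
        # Raises KeyError for a value that was not seen during fit
        return [self.ids_[value] for value in values]

# Made-up sample column
seasons = ["Kharif", "Rabi", "Kharif", "Whole Year", "Rabi"]
encoder = IdEncoder().fit(seasons)
encoded = encoder.transform(seasons)
print(encoder.ids_)
print(encoded)
```

Keeping one fitted encoder per column and reusing it on the test data keeps the IDs consistent between training and prediction.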
In the above screen all non-numeric values have been converted to numeric format. The lines
below show that the dataset contains 246091 records in total; the application uses 196872
records (80%) to train the models and 49219 records (20%) to test the prediction error
rate (RMSE, root mean square error). Now click the ‘Run RNN Algorithm’ button.
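The record counts and the error measure quoted above can be checked with a short sketch. It assumes that the 20% test share is rounded up to a whole record, which reproduces the quoted counts; the function names, the random seed and the sample values passed to `rmse` are made up for illustration and are not taken from the project.

```python
import math
import numpy as np

def split_sizes(n_records, test_fraction=0.2):
    """Return (n_train, n_test), rounding the test share up to a whole record."""
    n_test = math.ceil(n_records * test_fraction)
    return n_records - n_test, n_test

def train_test_indices(n_records, test_fraction=0.2, seed=0):
    """Shuffle the row indices and cut them into a training part and a test part."""
    n_train, _ = split_sizes(n_records, test_fraction)
    order = np.random.default_rng(seed).permutation(n_records)
    return order[:n_train], order[n_train:]

def rmse(actual, predicted):
    """Root mean square error, reported in the same units as the data."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

n_train, n_test = split_sizes(246091)
print(n_train, n_test)  # 196872 49219

train_idx, test_idx = train_test_indices(246091)
error = rmse([3.0, 5.0, 2.0], [2.5, 5.0, 4.0])  # made-up values
print(error)
```

Shuffling before the cut keeps the test records from all coming from the end of the file, which matters when a dataset is sorted by state or year.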
Fig.15 output screenshot 7
In the above screen the models are trained. The RNN prediction accuracy is 58.84%, the
LSTM prediction accuracy is 78.73% and the feed-forward network accuracy is 62.69%. Then
click the ‘Accuracy Comparison Graph’ button.
In the above screen, select the ‘test.csv’ file and click the ‘Open’ button to load the
test data; the application then gives the prediction result below.
5.5.2 ADVANTAGES:
• The proposed system helps the agriculture department and farmers to predict
crop yield and to suggest a suitable crop if the yield is low.
• The model can be used to select the most suitable crops for a region.
• The proposed system removes the need for manual analysis.
5.5.3 DISADVANTAGES:
• Outliers in the data might lead to a completely inadequate suggestion.
CHAPTER-6
TESTING
6.1 INTRODUCTION
The purpose of testing is to discover errors. Testing is the process of trying to discover
every conceivable fault or weakness in a work product. It provides a way to check the
functionality of components, sub-assemblies, assemblies and/or a finished product. It is the
process of exercising software with the intent of ensuring that the software system meets
its requirements and user expectations and does not fail in an unacceptable manner. There
are various types of tests. Each test type addresses a specific testing requirement.
6.2 TYPES OF TESTS
6.2.1 Unit testing:
Unit testing involves the design of test cases that validate that the internal program logic
is functioning properly, and that program inputs produce valid outputs. All decision
branches and internal code flow should be validated. It is the testing of individual software
units of the application. It is done after the completion of an individual unit and before
integration. This is structural testing that relies on knowledge of the unit's construction
and is invasive. Unit tests perform basic tests at component level and test a specific
business process, application, and/or system configuration. Unit tests ensure that each
unique path of a business process performs accurately to the documented specifications and
contains clearly defined inputs and expected results.
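As an illustration of a unit test with clearly defined inputs and expected results, the sketch below tests a small labelling function with Python's built-in unittest module. The function `label_yield` is a hypothetical stand-alone version of the rule used in preprocessing (1 when a value is at or above the average, 0 otherwise); it is written for this example and is not part of the project code.

```python
import unittest
import numpy as np

def label_yield(production):
    """Hypothetical unit under test: 1 = at or above the average, 0 = below it."""
    production = np.asarray(production, dtype=float)
    return (production >= production.mean()).astype(int)

class LabelYieldTest(unittest.TestCase):
    def test_values_above_average_are_high(self):
        labels = label_yield([10, 20, 60])  # the average is 30
        self.assertEqual(labels.tolist(), [0, 0, 1])

    def test_value_equal_to_average_is_high(self):
        labels = label_yield([5, 5, 5])     # every value equals the average
        self.assertEqual(labels.tolist(), [1, 1, 1])

    def test_output_is_binary(self):
        labels = label_yield([1, 2, 3, 4])
        self.assertTrue(set(labels.tolist()) <= {0, 1})

def run_tests():
    """Run the test case and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(LabelYieldTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)

result = run_tests()
print(result.wasSuccessful())
```

Each test method covers one decision branch of the unit: values above the average, the boundary case of equality, and the range of the output.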
6.2.2 Integration testing:
Integration tests are designed to test integrated software components to determine if they
actually run as one program. Testing is event driven and is more concerned with the basic
outcome of screens or fields. Integration tests demonstrate that although the components
were individually satisfactory, as shown by successful unit testing, the combination of
components is also correct and consistent. Integration testing is specifically aimed at
exposing the problems that arise from the combination of components.
6.2.3 Functional test:
Functional tests provide systematic demonstrations that the functions tested are available as
specified by the business and technical requirements, system documentation, and user
manuals.
Functional testing is centered on the following items:
Valid Input: identified classes of valid input must be accepted.
Invalid Input: identified classes of invalid input must be rejected.
Functions: identified functions must be exercised.
Output: identified classes of application outputs must be exercised.
Systems/Procedures: interfacing systems or procedures must be invoked.
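The valid-input and invalid-input items above can be illustrated with a small check on an uploaded CSV file. This is only a sketch: the function `validate_dataset` and the sample rows are hypothetical, the column names are the ones referenced in the sample code, and treating blank Production values as acceptable is an assumption based on the sample code filling missing values with 0.

```python
import csv
import io

# Columns that the sample code reads from the crop dataset
REQUIRED_COLUMNS = ["State_Name", "District_Name", "Season", "Crop", "Production"]

def validate_dataset(csv_text):
    """Return a list of problems; an empty list means the input is accepted."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [column for column in REQUIRED_COLUMNS if column not in header]
    if missing:
        return ["missing column: " + column for column in missing]
    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        value = (row["Production"] or "").strip()
        if value == "":
            continue  # blank values are accepted (assumed to be filled with 0 later)
        try:
            float(value)
        except ValueError:
            problems.append("line %d: Production is not numeric: %r" % (line_no, value))
    return problems

# Made-up inputs: one valid file, one with a missing column, one with a bad value
valid = "State_Name,District_Name,Season,Crop,Production\nKerala,Kollam,Kharif,Rice,1200\n"
no_crop = "State_Name,District_Name,Season,Production\nKerala,Kollam,Kharif,1200\n"
bad_value = "State_Name,District_Name,Season,Crop,Production\nKerala,Kollam,Kharif,Rice,abc\n"

print(validate_dataset(valid))
print(validate_dataset(no_crop))
print(validate_dataset(bad_value))
```

A functional test would then assert that the valid file is accepted and that each invalid file is rejected with a message naming the problem.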
Organization and preparation of functional tests is focused on requirements, key functions,
or special test cases. In addition, systematic coverage of identified business process
flows, data fields, predefined processes, and successive processes must be considered for
testing. Before functional testing is complete, additional tests are identified and the
effective value of current tests is determined.
System Test
System testing ensures that the entire integrated software system meets requirements. It
tests a configuration to ensure known and predictable results. An example of system testing
is the configuration oriented system integration test. System testing is based on process
descriptions and flows, emphasizing pre-driven process links and integration points.
6.2.4 White Box Testing
White Box Testing is testing in which the software tester has knowledge of the inner
workings, structure and language of the software, or at least its purpose. It is used to
test areas that cannot be reached from a black box level.
6.2.5 Black Box Testing
Black Box Testing is testing the software without any knowledge of the inner workings,
structure or language of the module being tested. Black box tests, like most other kinds of
tests, must be written from a definitive source document, such as a specification or
requirements document. It is testing in which the software under test is treated as a
black box: you cannot “see” into it. The test provides inputs and responds to outputs
without considering how the software works.
Unit testing is usually conducted as part of a combined code and unit test phase of
the software lifecycle, although it is not uncommon for coding and unit testing to be
conducted as two distinct phases.
Test Results: All the test cases mentioned above passed successfully. No defects were
encountered.
6.2.6 Acceptance Testing:
User Acceptance Testing is a critical phase of any project and requires significant
participation by the end user. It also ensures that the system meets the functional
requirements.
Test Results: All the test cases mentioned above passed successfully. No defects were
encountered.
CHAPTER-7
CONCLUSION AND FUTURE SCOPE
7.1 CONCLUSION:
This project focuses on predicting crops and estimating their yield with the help of
machine learning techniques. Several machine learning methods were compared on accuracy:
RNN, LSTM and feed-forward neural network models were used for crop prediction for a chosen
district. We implemented a system that predicts crop yield from a collection of past data.
The proposed technique helps farmers decide which crop to cultivate in their fields. The
work is intended to extract knowledge about crops that can be used to make harvesting
efficient and useful. Accurate prediction for different crops across different areas will
help farmers, and this in turn can benefit the Indian economy by maximizing the yield rate
of crop production.
7.2 FUTURE SCOPE:
In the coming years, we can try to make the system independent of the data format, so that
whatever the format of the input, the system works with the same accuracy. Integrating soil
details into the system would be an advantage, since knowledge of the soil is also a
parameter in the selection of crops. Proper irrigation is another feature needed for crop
cultivation: with reference to rainfall, the system could indicate whether extra water is
needed or not. This research work can be taken to a higher level by extending it to the
whole of India.
CHAPTER-8
REFERENCES
1. Madhusudhan, L. Agriculture Role on Indian Economy.
https://www.omicsonline.org/openaccess/agriculture-role-on-indianeconomy-2151-6219-1000176.php?aid=62176
2. Priya, P., Muthaiah, U., Balamurugan, M. Predicting Yield of the Crop Using Machine
Learning Algorithm. International Journal of Engineering Sciences Research Technology.
3. Mishra, S., Mishra, D., Santra, G. H. (2016). Applications of machine learning
techniques in agricultural crop production: a review paper. Indian J. Sci. Technol, 9(38),
1-14.
4. Manjula, E., Djodiltachoumy, S. (2017). A Model for Prediction of Crop Yield.
International Journal of Computational Intelligence and Informatics, 6(4), 2349-6363.
5. Dahikar, S. S., Rode, S. V. (2014). Agricultural crop yield prediction using artificial
neural network approach. International journal of innovative research in electrical,
electronics, instrumentation and control engineering, 2(1), 683-686.
6. González Sánchez, A., Frausto Solís, J., Ojeda Bustamante, W. (2014). Predictive ability
of machine learning methods for massive crop yield prediction.
7. Mandic, D. P., Chambers, J. (2001). Recurrent neural networks for prediction: learning
algorithms, architectures and stability. John Wiley & Sons, Inc.
8. Hochreiter, S., Schmidhuber, J. (1997). Long short-term memory. Neural computation,
9(8), 1735-1780.
9. A. A. Alif, I. F. Shukanya, and T. N. Afee, “Crop prediction based on geographical and
climatic data using machine learning and deep learning” (Doctoral dissertation, BRAC
University), 2018.
10. Sak, H., Senior, A., Beaufays, F. (2014). Long short-term memory recurrent neural
network architectures for large scale acoustic modeling. In Fifteenth annual conference of
the international speech communication association.
11. Niketa Gandhi et al., "Rice Crop Yield Forecasting of Tropical Wet and Dry Climatic
Zone of India Using Data Mining Techniques", IEEE International Conference on
Advances in Computer Applications (ICACA).