
ANALYSIS OF CROP YIELD USING MACHINE

LEARNING

A MINOR PROJECT REPORT

Submitted by

TATIKONDA ROHIT

YETURI SRI LASYA

MUPPARAJU RAMYA

Under the Guidance of

Dr. S. JANA

in partial fulfillment for the award of the degree

of

BACHELOR OF TECHNOLOGY

in

ELECTRONICS & COMMUNICATION ENGINEERING

DECEMBER 2021
BONAFIDE CERTIFICATE

Certified that this Minor project report entitled "ANALYSIS OF CROP YIELD USING MACHINE LEARNING" is the bonafide work of TATIKONDA ROHIT (18UEEC0450), YETURI SRI LASYA (18UEEC0523) and MUPPARAJU RAMYA (18UEEC0278), who carried out this Minor project under my supervision.

SUPERVISOR HEAD OF THE DEPARTMENT

Dr. S. JANA Dr. P. ESTHER RANI


Professor Professor
Department of ECE Department of ECE

Submitted for Evaluation of Minor Project on:−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−

INTERNAL EXAMINER EXTERNAL EXAMINER

ACKNOWLEDGEMENT

We express our deepest gratitude to our respected Founder President and Chancellor Col.
Prof. Dr. R. Rangarajan, Foundress President Dr. R. Sagunthala Rangarajan, Chairperson
Managing Trustee and Vice President.

We are very thankful to our beloved Vice Chancellor Prof. Dr. S. Salivahanan for providing us
with an environment to complete the work successfully.

We are obligated to our beloved Registrar Dr. E. Kannan for providing immense support in all our
endeavours. We are thankful to our esteemed Dean Academics Dr. A. T. Ravichandran for providing
a wonderful environment to complete our work successfully.

We are extremely thankful and pay gratitude to our Dean/SOEC Dr. V. Jayasankar for his valuable
guidance and support on completion of this Minor project.

It is a great pleasure for us to acknowledge the assistance and contributions of our Head of the Department Dr. P. Esther Rani for her useful suggestions, which helped us complete the work in time, and we thank her for being instrumental in the completion of the final year (7th sem) with her encouragement and unwavering support during the entire course.

We are extremely thankful and pay gratitude to our supervisor Dr. S. Jana for her valuable guidance
and support in completing this Minor project report in a pleasant form.

We thank our department faculty, supporting staff and our parents for encouraging and supporting
us throughout the study to complete this Minor project report.

TATIKONDA ROHIT

YETURI SRI LASYA

MUPPARAJU RAMYA

TABLE OF CONTENTS

CONTENTS PAGE No.

ABSTRACT vi

LIST OF TABLES vii

LIST OF FIGURES viii

LIST OF ABBREVIATIONS x

1 INTRODUCTION 1
1.1 MACHINE LEARNING AND ITS TYPES . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Types of machine learning: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.2 Supervised Learning: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.3 Unsupervised Learning: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.4 Reinforcement learning: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 MACHINE LEARNING ALGORITHMS USED IN OUR PROJECT . . . . . . . . . . 5
1.2.1 Support Vector Machine: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.2 Decision Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.3 Naive Bayes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.4 Random Forest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 FACTORS INFLUENCING AGRICULTURE . . . . . . . . . . . . . . . . . . . . . . . 11
1.4 WEB APPLICATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2 LITERATURE SURVEY 17

3 PROPOSED SYSTEM 20
3.1 METHODOLOGY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.2 JUPYTER NOTEBOOK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.3 PROCESSING OF ALGORITHMS: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.4 WEB FRAMEWORK USED: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.4.1 Importance of python for machine learning: . . . . . . . . . . . . . . . . . . . . 29

3.4.2 Libraries Used: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.5 CODE EDITOR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

4 RESULTS AND DISCUSSION 33


4.1 ACCURACY COMPARISON . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.1.1 Fitting the algorithms: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

5 CONCLUSION 38

REFERENCES 39

ABSTRACT

Accurate yield prediction is essential in agriculture. Remote sensing (RS) systems are used increasingly in building decision support tools for farming systems, improving yield while decreasing operating costs and environmental impact. However, RS-based approaches require processing very large amounts of remotely sensed data from various platforms, and therefore greater attention is currently being given to machine learning (ML) methods, because of the capability of machine learning systems to process large inputs easily. Machine learning is an important tool for crop yield prediction, including supporting decisions on what crop to grow and what to do during the growing season. Several machine learning algorithms have been applied to support crop yield prediction research. In this project, we consider some of the parameters that affect crop yield: nutrients, humidity, temperature, pH and rainfall. From the information extracted from these parameters, a machine learning model is developed to analyse and predict the best crop.

Keywords: Crop, Yield, Machine Learning, Support Vector Machine (SVM), Naive Bayes, Decision Tree (DT), Random Forest (RF)

LIST OF TABLES

Sl.No. TITLE PAGE No.

4.1 Accuracy comparison of train set and test set . . . . . . . . . . . . . . . . . . . . . . . 34

LIST OF FIGURES

Sl.No. TITLE PAGE No.

1.1 Types of machine learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2


1.2 Flow chart of supervised learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Flow chart of unsupervised learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Hyperplane of Linear SVM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.5 Flow chart of Decision Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.6 Classification using Naive Bayes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.7 Flow chart of Random Forest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.8 Example of Bagging and Boosting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.9 Flow chart of Bagging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.10 Factors affecting agriculture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.11 User interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.12 Hypertext Markup Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.13 Cascading Style Sheets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3.1 Block diagram of proposed system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20


3.2 Flow chart of splitting train set and test set . . . . . . . . . . . . . . . . . . . . . . . . 21
3.3 Dataset of proposed model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.4 Flow chart of selecting the best algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.5 Flow chart of selected algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.6 Flask framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.7 Route and view function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.8 VS code editor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

4.1 Comparison of accuracy of four algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 33


4.2 Accuracy of decision tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.3 Accuracy of random forest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.4 Accuracy of Naive Bayes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.5 Accuracy of Support Vector Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

4.6 Fitting Random Forest classifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.7 Input and Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.8 Input and Output through website . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

LIST OF ABBREVIATIONS

CSS Cascading Style Sheets

DT Decision Tree
HTML HyperText Markup Language
OS Operating System
RF Random Forest
SVM Support Vector Machine

CHAPTER 1

INTRODUCTION

Cultivation is among the major occupations in the nation. Enormous financial improvement is achieved in the nation by performing various farming activities; therefore, it is referred to as the broadest money-earning sector. In India, about 60.45% of the population depends on agriculture, fulfilling the necessities of around 1.2 billion people. The pressure to streamline the agriculture business is very high in the present era; hence, farmers are moving towards methods that achieve greater profits at lower expense [1, 2]. Informational indexes are analysed with the help of data analytics (DA), by which inferences about the data they contain can be reached, with the aid of specialised software and frameworks. Traditionally, the yield was estimated on the basis of a farmer's experience of a specific land and harvest [3, 4, 5]. As conditions change bit by bit, farmers take up an ever-expanding number of crops. In the current situation, many farmers lack information about the new crops and are completely unaware of the gains they will get after cultivation [6, 7]. Likewise, the profitability of a farm can be increased by a good understanding and estimation of crop performance under natural conditions [8, 9]. The proposed framework requires the location of the user; the constituents of the soil, such as nitrogen, phosphorous and potassium, are obtained for that area [10, 11]. Two more datasets, the crop and feature datasets taken from the website kaggle.com, are used in the processing stage as static information [12]. The fixed information represents the formation of harvest and data about various yields obtained from different government sites [13, 14]. The proposed structure uses machine learning algorithms such as Multiple Linear Regression to recognise patterns in the data, which are then processed under the given conditions. Thus, it will suggest the best feasible crops for the given biological conditions. The system simply needs the location of the customer and suggests various beneficial crops, providing the farmer a decision about which crop to grow.

The objective of agricultural production is to achieve maximum crop yield. Early discovery and management of complications affecting crop yield can help maximise the return and the ensuing profits. If regional weather patterns are influenced, large-scale weather events can have a substantial effect on crop production.

Figure 1.1: Types of machine learning

Crop managers can use predictions to minimize damage in critical conditions. Furthermore, these forecasts could be used to make full use of favorable growing conditions when the potential for them exists.

1.1 MACHINE LEARNING AND ITS TYPES

Machine learning is a subset of artificial intelligence (AI) that allows software applications to become more accurate at predicting outcomes without being explicitly programmed to do so. Machine learning algorithms use historical data as input to predict new output values. Machine learning is an important part of the growing field of data science. Through the use of statistical methods, algorithms are trained to make classifications or predictions, uncovering key insights within data mining projects. These insights subsequently drive decision making within applications and businesses. As big data continues to expand and grow, the demand for data scientists will increase; they will be required to help identify the most relevant business questions and the data to answer them.

1.1.1 Types of machine learning:

As with any method, there are different ways to train machine learning algorithms, each with its own advantages and disadvantages. To understand the pros and cons of each type of machine learning, we must first look at what kind of data they ingest. In ML, there are two kinds of data: labeled data and unlabeled data. Labeled data has both the input and output parameters in a completely machine-readable pattern, but labeling the data requires a lot of human labor to begin with. Unlabeled data has only one or none of the parameters in a machine-readable form; this negates the need for human labor but requires more complex solutions. Figure 1.1 shows the three main methods of machine learning that are used in various use-cases.

1.1.2 Supervised Learning:

Supervised learning is one of the most basic types of machine learning. In this type, the
machine learning algorithm is trained on labeled data. Even though the data needs to be labeled
accurately for this method to work, supervised learning is extremely powerful when used in the right
circumstances.

In supervised learning, the ML algorithm is given a small training dataset to work with. This
training dataset is a smaller part of the bigger dataset and serves to give the algorithm a basic idea of
the problem, solution, and data points to be dealt with. The training dataset is also very similar to
the final dataset in its characteristics and provides the algorithm with the labeled parameters required
for the problem.

The algorithm then finds relationships between the parameters given, essentially establishing a cause-and-effect relationship between the variables in the dataset. At the end of the training, the algorithm has an idea of how the data works and of the relationship between the input and the output. This solution is then deployed for use with the final dataset, which it learns from in the same way as the training dataset. This means that supervised machine learning algorithms will continue to improve even after being deployed, discovering new patterns and relationships as they train themselves on new data. Figure 1.2 shows the flow chart of supervised learning.

Figure 1.2: Flow chart of supervised learning
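As a minimal illustration of learning from labeled data, the following sketch (plain Python, not the project's actual code, with invented feature values) fits a 1-nearest-neighbour rule to a tiny labeled training set and then predicts labels for unseen points:

```python
import math

# Tiny labeled training set: (rainfall_mm, temperature_C) -> crop label.
# The values are invented purely for illustration.
train_X = [(200.0, 27.0), (220.0, 26.0), (60.0, 18.0), (55.0, 17.0)]
train_y = ["rice", "rice", "wheat", "wheat"]

def predict(x, X=train_X, y=train_y):
    """1-nearest-neighbour rule: return the label of the closest training point."""
    dists = [math.dist(x, xi) for xi in X]
    return y[dists.index(min(dists))]

print(predict((210.0, 26.5)))  # close to the labeled 'rice' examples
print(predict((58.0, 17.5)))   # close to the labeled 'wheat' examples
```

The labeled pairs play the role of the training dataset in figure 1.2: the input-output relationship is learned from them and then applied to new inputs.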

1.1.3 Unsupervised Learning:

Unsupervised machine learning holds the advantage of being able to work with unlabeled data. This means that human labor is not required to make the dataset machine-readable, allowing much larger datasets to be worked on by the program. In supervised learning, the labels allow the algorithm to find the exact nature of the relationship between any two data points. Unsupervised learning, however, does not have labels to work from, resulting in the creation of hidden structures. Relationships between data points are perceived by the algorithm in an abstract manner, with no input required from human beings.

The creation of these hidden structures is what makes unsupervised learning algorithms versatile. Instead of a defined and set problem statement, unsupervised learning algorithms can adapt to the data by dynamically changing hidden structures. This offers more post-deployment development than supervised learning algorithms. Figure 1.3 shows the flow chart of unsupervised learning.

Figure 1.3: Flow chart of unsupervised learning
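To make the idea concrete, here is a small sketch (illustrative only, with invented values, not from the report's code) that groups unlabeled one-dimensional rainfall readings into two clusters using a few k-means-style assign/update iterations; no labels are supplied, and the structure is discovered from the data alone:

```python
# Unlabeled 1-D observations (e.g. rainfall values); no labels supplied.
data = [58.0, 61.0, 55.0, 201.0, 210.0, 198.0]

# Two initial cluster centres, then alternating assign/update steps (k-means).
centres = [data[0], data[-1]]
for _ in range(5):
    clusters = [[], []]
    for x in data:
        # Assign each point to its nearest centre.
        nearest = min(range(2), key=lambda i: abs(x - centres[i]))
        clusters[nearest].append(x)
    # Move each centre to the mean of its assigned points.
    centres = [sum(c) / len(c) for c in clusters]

print(sorted(round(c, 1) for c in centres))  # two centres emerge from the data
```

The algorithm finds a low-rainfall group and a high-rainfall group without ever being told which reading belongs where, which is the "hidden structure" described above.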

1.1.4 Reinforcement learning:

Reinforcement learning takes direct inspiration from how human beings learn from data in their lives. It features an algorithm that improves upon itself and learns from new situations using a trial-and-error method. Favorable outputs are encouraged or 'reinforced', and non-favorable outputs are discouraged or 'punished'. Based on the psychological concept of conditioning, reinforcement learning works by putting the algorithm in a work environment with an interpreter and a reward system. In every iteration of the algorithm, the output result is given to the interpreter, which decides whether the outcome is favorable or not.

If the program finds the correct solution, the interpreter reinforces the solution by providing a reward to the algorithm. If the outcome is not favorable, the algorithm is forced to reiterate until it finds a better result. In most cases, the reward system is directly tied to the effectiveness of the result. In typical reinforcement learning use-cases, such as finding the shortest route between two points on a map, the solution is not an absolute value. Instead, it takes on a score of effectiveness, expressed as a percentage; the higher this percentage, the more reward is given to the algorithm. Thus, the program is trained to give the best possible solution for the best possible reward.
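The reward-driven loop above can be sketched as a tiny multi-armed bandit (an illustrative toy, not part of this project): the agent tries actions, the environment returns rewards, and accumulated reward averages steer the agent toward the better action.

```python
import random

random.seed(0)

# Two actions with different hidden average rewards; the agent must
# discover which is better purely from the rewards it receives.
true_means = [0.3, 0.8]
totals, counts = [0.0, 0.0], [0, 0]

for step in range(500):
    # Epsilon-greedy: mostly exploit the current best estimate, sometimes explore.
    if random.random() < 0.1 or 0 in counts:
        action = random.randrange(2)
    else:
        action = max(range(2), key=lambda a: totals[a] / counts[a])
    # The "interpreter": a favorable outcome yields a reward of 1.
    reward = 1.0 if random.random() < true_means[action] else 0.0
    totals[action] += reward
    counts[action] += 1

best = max(range(2), key=lambda a: totals[a] / counts[a])
print(best)  # the agent settles on the higher-reward action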

1.2 MACHINE LEARNING ALGORITHMS USED IN OUR PROJECT

In the proposed model we have used four algorithms: support vector machine, random forest, decision tree and the naive Bayes algorithm.

1.2.1 Support Vector Machine:

The support vector machine algorithm is an ML algorithm used to find a hyperplane in an N-dimensional space (N = the number of features) that classifies the data points. To separate the two classes of data points, there are many possible hyperplanes that could be chosen, as shown in figure 1.4. Our objective is to find the plane with the maximum margin, i.e. the maximum distance between data points of both classes; maximizing the margin distance provides some reinforcement so that future data points can be classified with more confidence. Support vectors are data points that are closer to the hyperplane and influence its position and orientation. Using these support vectors, we maximize the margin of the classifier; deleting the support vectors would change the position of the hyperplane. These are the points that help us build our SVM.

Figure 1.4: Hyperplane of Linear SVM

Types of Support Vector Machine:

• Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes by a single straight line, the data is termed linearly separable, and the classifier used is called a linear SVM classifier.

• Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot be classified by a straight line, the data is termed non-linear, and the classifier used is called a non-linear SVM classifier.
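As a small numeric illustration (with invented points, not from the report), the distance of a point x from a separating hyperplane w·x + b = 0 is |w·x + b| / ||w||; the margin of a candidate separator is the smallest such distance over the training points, and a maximum-margin SVM picks the w and b that maximize it:

```python
import math

def distance(w, b, x):
    """Perpendicular distance of point x from the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return abs(dot + b) / math.hypot(*w)

# A candidate separating line x1 + x2 - 3 = 0 and a few 2-D points.
w, b = (1.0, 1.0), -3.0
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 3.0), (4.0, 2.0)]

# Margin = distance of the closest point to the hyperplane.
margin = min(distance(w, b, p) for p in points)
print(round(margin, 3))  # → 1.414
```

The points that attain this minimum distance are the support vectors; moving or deleting them changes the maximum-margin hyperplane, while the other points do not.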

1.2.2 Decision Tree

Decision Tree is a supervised learning technique that can be used for both classification and regression problems, but is mostly preferred for solving classification problems. It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules and each leaf node represents the outcome. In a decision tree there are two kinds of node, the decision node and the leaf node; decision nodes are used to make decisions and have multiple branches. Decision trees help you to evaluate your options. They are excellent tools for choosing between several courses of action, providing a highly effective structure within which you can lay out options and investigate the possible outcomes of choosing them. Figure 1.5 shows the flow chart of a decision tree.

Figure 1.5: Flow chart of Decision Tree

Types of Decisions

There are two main types of decision tree, based on the target variable: categorical variable decision trees and continuous variable decision trees.

• Categorical variable decision tree: A categorical variable decision tree has a categorical target variable divided into categories, for example yes or no. Every stage of the decision process falls into one category, with no in-betweens.

• Continuous variable decision tree: A continuous variable decision tree has a continuous target variable. For example, the unknown income of an individual can be predicted from available information such as their occupation, age and other continuous variables.
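A decision node is just a threshold test and a leaf is just an outcome, so a tiny hand-written tree (with illustrative thresholds and crop names, not rules learned from the project's dataset) makes the structure concrete:

```python
def predict_crop(rainfall_mm, ph):
    """A hand-built two-level decision tree: decision nodes test a feature
    against a threshold, and leaves return the outcome."""
    if rainfall_mm > 130.0:          # decision node on rainfall
        return "rice"                # leaf node
    if ph < 6.0:                     # decision node on soil pH
        return "millet"              # leaf node
    return "wheat"                   # leaf node

print(predict_crop(200.0, 6.5))  # high rainfall -> rice
print(predict_crop(80.0, 5.5))   # low rainfall, acidic soil -> millet
print(predict_crop(80.0, 7.0))   # low rainfall, neutral soil -> wheat
```

A learned decision tree has the same shape; the difference is that a training algorithm chooses which feature and threshold to test at each decision node.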

1.2.3 Naive Bayes

Naive Bayes classifiers are a collection of classification algorithms based on Bayes' theorem. It is not a single algorithm but a family of algorithms that share a common principle: every pair of features being classified is independent of each other.

The Naive Bayes classifier is one of the simplest and most effective classification algorithms, helping build fast machine learning models that can make quick predictions. A Naive Bayes model is easy to build and particularly useful for very large data sets. Along with its simplicity, Naive Bayes is known to outperform even highly sophisticated classification methods; figure 1.6 shows an example of classification using Naive Bayes. Bayes' theorem finds the probability of an event occurring given the probability of another event that has already occurred.

Figure 1.6: Classification using Naive Bayes

Types of Naive Bayes Models:

There are three types of Naive Bayes Model, which are given below:

• Gaussian: The Gaussian model assumes that features follow a normal distribution. If predictors take continuous rather than discrete values, the model assumes these values are sampled from a Gaussian distribution.

• Multinomial: The multinomial Naive Bayes classifier is used when the data is multinomially distributed. It is primarily used for document classification problems, i.e. deciding which category, such as sports, politics or education, a particular document belongs to. The classifier uses the frequency of words as the predictors.

• Bernoulli: The Bernoulli classifier works similarly to the multinomial classifier, but the predictor variables are independent Boolean variables, such as whether a particular word is present in a document or not. This model is also popular for document classification tasks.
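Bayes' theorem itself, P(A|B) = P(B|A)·P(A) / P(B), can be checked with a small worked example (all numbers invented for illustration): suppose 40% of fields grow rice, rice fields show high humidity 90% of the time, and other fields only 20% of the time.

```python
# Invented illustrative probabilities.
p_rice = 0.4                 # prior P(rice)
p_hum_given_rice = 0.9       # likelihood P(high humidity | rice)
p_hum_given_other = 0.2      # likelihood P(high humidity | not rice)

# Total probability of observing high humidity (law of total probability).
p_hum = p_hum_given_rice * p_rice + p_hum_given_other * (1 - p_rice)

# Bayes' theorem: posterior P(rice | high humidity).
posterior = p_hum_given_rice * p_rice / p_hum
print(round(posterior, 3))  # → 0.75
```

A Naive Bayes classifier applies this update once per feature, multiplying the per-feature likelihoods together under the independence assumption.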

1.2.4 Random Forest

A random forest is a supervised machine learning algorithm that is constructed from decision tree algorithms. The algorithm is applied in various industries, such as banking and e-commerce, to predict behavior and outcomes.

A random forest is a machine learning technique used to solve regression and classification problems. It utilizes ensemble learning, a technique that combines many classifiers to provide solutions to complex problems.

A random forest algorithm consists of many decision trees. The 'forest' generated by the random forest algorithm is trained through bagging, or bootstrap aggregating. Bagging is an ensemble meta-algorithm that improves the accuracy of machine learning algorithms.

The random forest algorithm establishes the outcome based on the predictions of the decision trees. It predicts by taking the average or mean of the outputs of the various trees; increasing the number of trees increases the precision of the outcome.

A random forest eradicates the limitations of a decision tree algorithm: it reduces the overfitting of datasets and increases precision, and it generates predictions without requiring much configuration in packages (like scikit-learn).

Random Forest Algorithm Working:

Decision trees are the building blocks of a random forest algorithm. A decision tree is a decision support technique that forms a tree-like structure; an overview of decision trees helps in understanding how random forest algorithms work.

A decision tree consists of three components: decision nodes, leaf nodes and a root node. A decision tree algorithm divides a training dataset into branches, which further segregate into other branches. This sequence continues until a leaf node is reached; the leaf node cannot be segregated further.

The nodes in the decision tree represent the attributes used for predicting the outcome, and decision nodes provide links to the leaves. Figure 1.7 shows the three types of node in a decision tree. To understand the working of the random forest, we must look at the ensemble technique. Ensemble simply means combining multiple models: a collection of models is used to make predictions rather than an individual model.

Figure 1.7: Flow chart of Random Forest

Ensemble uses two types of methods:

1. Bagging: Bagging creates different training subsets from the sample training data with replacement; the final output is based on majority voting. Random forest is an example.

2. Boosting: Boosting combines weak learners into strong learners by creating sequential models such that the final model has the highest accuracy. AdaBoost and XGBoost are examples. Examples of bagging and boosting are shown in figure 1.8.

Figure 1.8: Example of Bagging and Boosting

Random forest works on the bagging principle. Now let us understand bagging in detail.

Bagging:

Bagging, also known as bootstrap aggregation, is the ensemble technique used by random forest. Bagging chooses random samples from the data set: each model is generated from samples (bootstrap samples) drawn from the original data with replacement, a step known as row sampling. This step of row sampling with replacement is called bootstrapping; the flow chart of bagging is shown in figure 1.9. Each model is then trained independently and generates a result. The final output is based on majority voting after combining the results of all the models; this step, which involves combining all the results and generating an output based on majority voting, is known as aggregation.

Figure 1.9: Flow chart of Bagging
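The bootstrap-then-vote procedure can be sketched in a few lines (a toy with invented data, not the report's implementation): draw bootstrap samples with replacement, fit one deliberately simple model per sample, and aggregate predictions by majority vote.

```python
import random
from collections import Counter

random.seed(1)

# Tiny labeled dataset: rainfall value -> crop label (invented numbers).
data = [(200, "rice"), (210, "rice"), (190, "rice"),
        (60, "wheat"), (55, "wheat"), (70, "wheat")]

def fit_stump(sample):
    """A one-rule model: threshold halfway between the two class means."""
    rice = [x for x, y in sample if y == "rice"]
    wheat = [x for x, y in sample if y == "wheat"]
    t = (sum(rice) / len(rice) + sum(wheat) / len(wheat)) / 2
    return lambda x: "rice" if x > t else "wheat"

# Bagging: one bootstrap sample (drawn with replacement) per model.
models = []
while len(models) < 9:
    sample = random.choices(data, k=len(data))
    if {y for _, y in sample} == {"rice", "wheat"}:  # need both classes
        models.append(fit_stump(sample))

def predict(x):
    """Aggregation: majority vote over all bootstrap models."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

print(predict(195), predict(65))  # → rice wheat
```

A real random forest replaces the one-rule stump with a full decision tree and additionally samples a random subset of features at each split, but the bootstrap-and-vote skeleton is the same.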

1.3 FACTORS INFLUENCING AGRICULTURE

Much research has been conducted to develop an efficient method for yield prediction. Crop production depends on various factors, which change with every square meter:

1. Geography of the region

2. Weather (temperature, humidity, precipitation)

3. Soil type (sodic, non-alkaline)

4. Soil composition (pH, N, P, K, EC, OC, Zn, F).

The proposed structure uses machine learning (ML) to calculate the best crop. It is processed under the given conditions, and thus provides the best feasible crops for the given biological conditions; the system simply needs the factors of the customer's field and suggests various beneficial crops. Figure 1.10 shows the factors that affect agriculture.

It provides the farmer a decision about which crop to grow. The process begins by importing the necessary libraries and packages and continues with data preprocessing. The data is split into training data and test data. Finally, a model is constructed in which the required ML algorithms are utilized, which in return will provide the best suitable crop to be grown on a particular land.

Figure 1.10: Factors affecting agriculture
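The preprocessing and splitting step can be sketched as follows (plain Python with invented rows in the shape the report describes, N, P, K, temperature, humidity, pH and rainfall; the actual project uses a Kaggle dataset and library routines rather than this hand-rolled split):

```python
import random

random.seed(42)

# Invented rows: (N, P, K, temperature, humidity, pH, rainfall) -> crop label.
rows = [((90, 42, 43, 20.8, 82.0, 6.5, 202.9), "rice"),
        ((85, 58, 41, 21.7, 80.3, 7.0, 226.6), "rice"),
        ((60, 55, 44, 23.0, 82.3, 7.8, 263.9), "rice"),
        ((71, 54, 16, 22.6, 63.7, 5.7, 87.7), "maize"),
        ((61, 44, 17, 26.1, 71.5, 6.8, 102.3), "maize"),
        ((78, 40, 19, 24.3, 65.2, 6.2, 95.1), "maize")]

def train_test_split(data, test_ratio=0.33):
    """Shuffle the rows, then hold out the last fraction as the test set."""
    shuffled = data[:]
    random.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(rows)
print(len(train), len(test))  # 4 train rows, 2 test rows
```

The model is fitted on the training rows only, and the held-out test rows are used to compute the accuracies compared in chapter 4.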

1.4 WEB APPLICATION

A web application (or web app) is application software that runs on a web server, unlike computer-based software programs that run locally on the operating system (OS) of the device. Web applications are accessed by the user through a web browser with an active network connection. We built a web application for this project so that it is easily understandable and operable by the user.

User Interface:

The user interface (UI) is the point at which human users interact with a computer, website or application. The goal of effective UI is to make the user's experience easy and intuitive, requiring minimum effort on the user's part to receive the maximum desired outcome. To design the UI we used HTML, CSS and Python.

Figure 1.11: User interface
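Since the report uses the Flask framework (section 3.4), a minimal sketch of how such an app could route a submitted form to a trained model is shown below. The route name, form field names and the stand-in predict_crop function are illustrative assumptions, not the project's actual code.

```python
from flask import Flask, request

app = Flask(__name__)

def predict_crop(n, p, k, temperature, humidity, ph, rainfall):
    """Stand-in for the trained model; a real app would call model.predict()."""
    return "rice" if rainfall > 130 else "maize"

@app.route("/predict", methods=["POST"])
def predict():
    # Read the seven input parameters submitted from the HTML form.
    vals = [float(request.form[key])
            for key in ("N", "P", "K", "temperature", "humidity", "ph", "rainfall")]
    return {"crop": predict_crop(*vals)}

# Exercise the route without starting a server, using Flask's test client.
client = app.test_client()
resp = client.post("/predict", data={"N": 90, "P": 42, "K": 43,
                                     "temperature": 20.8, "humidity": 82.0,
                                     "ph": 6.5, "rainfall": 202.9})
print(resp.get_json())
```

In the deployed application the same view function would render an HTML results page instead of returning JSON, but the form-to-model-to-response flow is the same.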

Hypertext Markup Language:

HTML stands for HyperText Markup Language. It is the standard markup language for creating web pages: it describes the structure of a web page and consists of a series of elements that tell the browser how to display the content. HTML elements label pieces of content such as "this is a heading", "this is a paragraph", "this is a link", and so on. HTML is the standard markup language for documents designed to be displayed in a web browser; it can be assisted by technologies such as Cascading Style Sheets (CSS) and scripting languages such as JavaScript.

Web browsers receive HTML documents from a web server or from local storage and render
the documents into multimedia web pages. HTML describes the structure of a web page semantically
and originally included cues for the appearance of the document.

HTML elements are the building blocks of HTML pages. With HTML constructs, images
and other objects such as interactive forms may be embedded into the rendered page. HTML provides
a means to create structured documents by denoting structural semantics for text such as headings,
paragraphs, lists, links, quotes and other items. HTML elements are delineated by tags, written
using angle brackets. Tags such as img and input directly introduce content into the page. Other
tags such as p surround and provide information about document text and may include other tags as
sub-elements. Browsers do not display the HTML tags, but use them to interpret the content of the
page.

HTML can embed programs written in a scripting language such as JavaScript, which affects
the behavior and content of web pages. Inclusion of CSS defines the look and layout of content. The
World Wide Web Consortium (W3C), former maintainer of the HTML and current maintainer of the
CSS standards, has encouraged the use of CSS over explicit presentational HTML since 1997.

Figure 1.12: Hypertext Markup Language

Cascading Style Sheets

Cascading Style Sheets (CSS) is a style sheet language used for describing the presentation of a document written in a markup language such as HTML. CSS is a cornerstone technology of the World Wide Web, alongside HTML and JavaScript.

CSS is designed to enable the separation of presentation and content, including layout, colors, and fonts. This separation can improve content accessibility, provide more flexibility and control in the specification of presentation characteristics, and enable multiple web pages to share formatting by specifying the relevant CSS in a separate .css file, which reduces complexity and repetition in the structural content and allows the CSS file to be cached to improve the page-load speed between pages that share the file and its formatting.

Separation of formatting and content also makes it feasible to present the same markup page
in different styles for different rendering methods, such as on-screen, in print, by voice (via speech-
based browser or screen reader), and on Braille-based tactile devices. CSS also has rules for alternate
formatting if the content is accessed on a mobile device.

The name cascading comes from the specified priority scheme that determines which style rule applies if more than one rule matches a particular element. This cascading priority scheme is predictable. The CSS specifications are maintained by the World Wide Web Consortium (W3C). The Internet media type (MIME type) text/css is registered for use with CSS by RFC. The W3C operates a free CSS validation service for CSS documents. In addition to HTML, other markup languages support the use of CSS, including XHTML, plain XML, SVG, and XUL.

Figure 1.13: Cascading Style Sheets

Python:

Python is an object-oriented programming language preferred by many developers. New-
comers to the field of programming often start learning and practising with Python. Its simplicity,
versatility, and community support are the most important reasons for Python's popularity, and its
wide range of libraries further simplifies coding in Python.

Python is a natural choice because it has become the most preferred programming language
for machine learning applications. Development in Python is swift compared with other programming
languages: the syntax is simpler, and the pre-existing libraries eliminate the need for coding every
piece of logic from scratch.

Python is an interpreted, general-purpose language, so programmers are able to extend
its applications beyond analytical research, analytical modelling, and statistical modelling. Web ap-
plications created using Python can be integrated directly with the analytical models in the
background.

Python can be easily integrated with other platforms and programming languages. Because
it follows a common object-oriented programming architecture, existing IT analysts, IT developers,
and IT programmers can easily transition to the analytics domain. Python also has excellent
documentation support.

CHAPTER 2

LITERATURE SURVEY

Sheenoy et al. [1] presented a paper that offers a solution for reducing the cost of
transportation. An IoT-based methodology is used to decrease the number of agents and intermediate hops
between the clients and the farmers, which further supports the farmer. The paper proved to be the
inspiration for this research work. It implements integrated mechanisms and provides a
prediction-based mechanism that recommends the crops which yield maximum profit.

Monali et al. [2] analysed crops and classified them in order to predict crop yields. The
classification is performed on the basis of data mining algorithms.

The authors of [3] discussed various classification techniques such as K-Nearest Neighbor and
Naive Bayes. These techniques are studied to recognize which will be accurate for the dataset used
in this research work.

Abdullah et al. [4] provided a smartphone-based application that calculates soil pH, humidity,
and temperature in real time. The system uses a microcontroller block, a communication block, and a
sensing block. Sensors deployed in the farm maintain a continuous communication link with cellphones
over Bluetooth. This paper gives methods for the remote investigation of soil through different
procedures. It encouraged us to search for different methods through which the information taken
from sensors can be passed on for development and, in the end, for producing the yield.

Hemageetha et al. [5] surveyed data mining techniques such as Association Rule Mining, Clas-
sification, Clustering, Market-based Analysis, and Decision Trees, fully covering the data mining
concept. In this paper, different data mining algorithms, for example K-Means, the Naive Bayes
classifier, and J48, are discussed.

The authors of [6] discussed soil classification based on Genetic Algorithms, Naive Bayes,
and Association Rule Mining, and finally covered clustering of soil databases. It supported our
comprehension and analysis of various data mining algorithms and proved very beneficial while
building up this research work, as it helps in mining the dataset acquired from remotely used
sensors.

Nagini et al. [7] presented an exploratory data study along with an explanation of how to
create a number of predictive models. Different regression techniques are applied to a sample dataset
in order to recognize and examine their properties separately. The methods explained in the paper
are Linear, Non-Linear, Multiple Linear, Polynomial, Ridge, and Logistic regression.

The authors of [8] carried out a comparative study of a number of data analytics algorithms.
It further helps in making a better decision about the algorithm best suited to the proposed
structure.

Awanit et al. [9] proposed a framework to predict crop production for the present year. A
data mining algorithm called K-Means is used to estimate the production of the crop. A mechanism
is also applied to predict the crop in the form of fuzzy logic: a set of rules over a particular
cultivated land, precipitation, and crop production. This is referred to as rule-based prediction
logic, also termed Fuzzy Logic. The process of using K-Means to study the datasets is explained in
this paper.

The authors of [10] discussed a set of rules expressed in fuzzy logic form; these rules are
used to predict the crop that will maximize profit, on the basis of crop prices in past years and
present soil and weather information.

The authors of [11] discussed machine learning algorithms used to predict crops. A Random
Forest algorithm is trained on five climatic parameters, but other agricultural inputs such as soil
quality, pests, and chemicals used are not considered. The model was trained with 200 decision trees
to construct the random forest, and cross-validation was used to measure the accuracy of the trained
model.

B. Ji et al. [12] aimed in this study to: research whether Artificial Neural Network (ANN)
models could effectively and efficiently forecast Fujian rice yield under its characteristic mountainous
climate and atmospheric conditions; assess the performance of the ANN model across varieties of
training parameters; and compare the effectiveness of multiple linear regression models with ANN
models. The models were developed using historical harvest data from several locations in Fujian.
Field-specific rainfall information and the climate conditions at every location were utilized for the
rice yield prediction.

Lillian Kay Peterson [13] created satellite investigation methodologies and software tools
to forecast crop yields two to four months ahead of time, before the harvest. This procedure estimates
relative vegetation condition based on pixel-level historical anomalies of the NDVI, EVI, and NDWI
indices. Since no crop mask, calibration, or sub-national ground-truth information is necessary, this
procedure can be valuable for any area, location, harvest, or climate, making it ideal for African
nations with small fields and poor ground observation.

B. A. Smith et al. [14] discussed year-round and long-term air temperature prediction models
that were produced for forecast horizons of 1 to 12 h using Ward-style Artificial Neural Networks.
These models were intended for general support in decision making. The variations of the ANN design
described here provide greater precision during the winter time frame than previously developed
winter models. The models that also included precipitation terms in the input vector of the air
prediction model were progressively more exact.

Snehal S. Darikar et al. [15] discussed in their paper the use of Artificial Neural Networks
to predict crop yield. The system senses the parameters of the regional soil and the various atmo-
spheric conditions, then analyses them using a feed-forward back-propagation ANN. Using MATLAB
made the ANN approach more efficient. They structured a system that accurately links climate
effects to crop yield, can be used to estimate long-term or short-term crop production, and can
obtain an ANN with adequate and useful data.

CHAPTER 3

PROPOSED SYSTEM

3.1 METHODOLOGY

In the proposed framework, machine learning algorithms are executed in order to predict the
best crop production. An experiment is performed on a crop dataset by the proposed model. The crop
is chosen on the basis of the current atmosphere and the soil along with its constituents, as both
the climatic and soil parameters are taken into consideration. We used four algorithms and selected
the one that predicts most accurately. The algorithms used are Decision Tree, Naive Bayes, SVM,
and Random Forest. Since the Random Forest algorithm predicts most accurately, we used it as the
final algorithm to predict the best crop. The block diagram of the proposed model is shown in
figure 3.1.

Figure 3.1: Block diagram of proposed system

1. Dataset Collection:
We collected data from a variety of sources and prepared the dataset. This data is used for
descriptive analysis. Data is available from several online sources such as Kaggle.com and
data.gov.in.

2. Preprocessing Step:
This is a very important step in machine learning. Preprocessing consists of filling in the
missing values, bringing the data into an appropriate range, and extracting features. The quality
of the dataset is critical to the analysis process. Here we use the isnull() method for checking
null values and LabelEncoder() for converting the categorical data into numerical data.
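As a sketch of these two preprocessing calls, using pandas and scikit-learn (the column names and values here are illustrative, not the project's actual dataset):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Small stand-in for the crop dataset; the columns are illustrative only.
df = pd.DataFrame({
    "temperature": [25.5, 27.1, None, 24.8],
    "label": ["rice", "maize", "rice", "wheat"],
})

# Check for null values, then fill them (here with the column mean).
print(df.isnull().sum())
df["temperature"] = df["temperature"].fillna(df["temperature"].mean())

# Convert the categorical crop label into numeric codes.
le = LabelEncoder()
df["label"] = le.fit_transform(df["label"])
print(df)
```

LabelEncoder assigns each distinct crop label an integer code, which the classifiers can consume directly.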

3. Split the Dataset into Train Set and Test Set:


This step includes training and testing of the input data. The loaded data is divided into two
sets, a training set and a test set, with a split ratio of 80%/20% (i.e. 0.8 and 0.2); the
flowchart of the splitting is shown in figure 3.2. On the training set, a classifier is fitted to
the available input data; in this step the classifier builds the supporting data and assumptions
needed to approximate the classification function. During the test phase, the model is evaluated
on the test data. The final data formed during preprocessing is processed by the machine learning
module.

Figure 3.2: Flow chart of splitting train set and test set
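This split can be sketched with scikit-learn's train_test_split; the seven feature columns below are placeholders for the dataset's actual parameters:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative feature matrix and labels standing in for the crop dataset.
X = np.random.rand(100, 7)   # e.g. N, P, K, temperature, humidity, pH, rainfall
y = np.random.randint(0, 3, size=100)

# 80%/20% split, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (80, 7) (20, 7)
```

With test_size=0.2, 80 of the 100 rows land in the training set and 20 in the test set.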

4. Applying Machine Learning Modules:


In our project we make use of four algorithms to predict crop yield: the Decision Tree,
Naive Bayes, SVM, and Random Forest algorithms, to predict the suitable crop which gives more
yield. The dataset of the proposed model is shown in figure 3.3.

Figure 3.3: Dataset of proposed model

3.2 JUPYTER NOTEBOOK

The Jupyter Notebook is an open-source web application that allows you to create and share
documents that contain live code, equations, visualizations and narrative text. Uses include: data
cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine
learning, and much more.

The Jupyter Notebook App is a server-client application that allows editing and running
notebook documents via a web browser. The Jupyter Notebook App can be executed on a local
desktop requiring no internet access (as described in this document) or can be installed on a remote
server and accessed through the internet.

In addition to displaying, editing, and running notebook documents, the Jupyter Notebook App
has a “Dashboard” (Notebook Dashboard), a “control panel” showing local files and allowing users
to open notebook documents or shut down their kernels.

Process to Launch Jupyter Notebook:

1. Click on spotlight, type terminal to open a terminal window.

2. Enter the startup folder by typing cd /some folder name

3. Type jupyter notebook to launch the Jupyter Notebook App. The notebook interface will appear
in a new browser window or tab.

3.3 PROCESSING OF ALGORITHMS:

Machine learning (ML) is a type of artificial intelligence (AI) that allows software applications
to become more accurate at predicting outcomes without being explicitly programmed to do so.
Machine learning algorithms use historical data as input to predict new output values. We used four
ML algorithms and, based on their accuracy, selected one, using the Anaconda software. The flow
chart of selecting the best algorithm is shown in figure 3.4.

Figure 3.4: Flow chart of selecting the best algorithm

Selected Algorithm:

A random forest is a machine learning technique that’s used to solve regression and classifi-
cation problems. It utilizes ensemble learning, which is a technique that combines many classifiers to
provide solutions to complex problems.

A random forest algorithm consists of many decision trees. The ‘forest’ generated by the
random forest algorithm is trained through bagging or bootstrap aggregating. Bagging is an ensemble
meta algorithm that improves the accuracy of machine learning algorithms.

The random forest algorithm establishes the outcome based on the predictions of the deci-
sion trees. It predicts by taking the average or mean of the outputs from the various trees.
Increasing the number of trees increases the precision of the outcome.

A random forest overcomes the limitations of a decision tree algorithm: it reduces the over-
fitting of datasets and increases precision. It generates predictions without requiring many
configurations in packages (like scikit-learn). Figure 3.5 shows the flow chart of the selected
algorithm.

Figure 3.5: Flow chart of selected algorithm
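A minimal scikit-learn sketch of this bagging-and-aggregation behaviour, using synthetic data as a stand-in for the crop dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the crop dataset.
X, y = make_classification(n_samples=300, n_features=7, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Each of the 100 trees is fitted on a bootstrap sample of the training
# data (bagging); the forest aggregates the trees' votes for prediction.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print(accuracy_score(y_test, forest.predict(X_test)))
```

Increasing n_estimators adds more trees to the vote, which is exactly the precision trade-off described above.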

Pros of selected algorithm:

The following are the advantages of Random Forest algorithm:

1. It overcomes the problem of overfitting by averaging or combining the results of different
decision trees.

2. Random forests work well for a larger range of data items than a single decision tree does.

3. A random forest has less variance than a single decision tree.

4. Random forests are very flexible and possess very high accuracy.

5. Scaling of data is not required by the random forest algorithm. It maintains good accuracy
even when data is provided without scaling.

6. Random Forest algorithms maintain good accuracy even when a large proportion of the data is
missing.

Cons of selected algorithm:

The following are the disadvantages of Random Forest algorithm:

1. Complexity is the main disadvantage of Random forest algorithms.

2. Construction of random forests is much harder and more time-consuming than that of decision trees.

3. More computational resources are required to implement Random Forest algorithm.

4. It is less intuitive in case when we have a large collection of decision trees.

5. The prediction process using random forests is very time-consuming in comparison with other
algorithms.

3.4 WEB FRAMEWORK USED:

Here we used the Flask framework. Flask is a micro web framework written in Python. It is clas-
sified as a microframework because it does not require particular tools or libraries. It has no
database abstraction layer, form validation, or any other components where pre-existing third-party
libraries provide common functions. However, Flask supports extensions that can add application
features as if they were implemented in Flask itself. Extensions exist for object-relational mappers,
form validation, upload handling, various open authentication technologies, and several common
framework-related tools. Flask is a web framework that provides libraries to build lightweight web
applications in Python. It was developed by Armin Ronacher, who leads an international group of
Python enthusiasts

Figure 3.6: Flask framework

(POCCO). It is based on the WSGI toolkit and the Jinja2 template engine. Flask is considered a
micro framework.

WSGI

WSGI stands for Web Server Gateway Interface, a standard for Python web application
development. It is considered the specification for a universal interface between the web server
and the web application.

Jinja2

Jinja2 is a web template engine which combines a template with a certain data source to render the
dynamic web pages.

OVERVIEW OF FLASK FRAMEWORK:

Web apps are developed to generate content based on retrieved data that changes with a
user's interaction with the site. The server is responsible for querying, retrieving, and updating
data. This makes web applications slower and more complicated to deploy than static websites.
There are two primary coding environments for the whole web app ecosystem.

Client-side Scripting:

This code is executed in the user's browser and is visible to anyone who has access to it; it
generates what the user sees first.

Server-side Scripting:

This type of code is run on the backend, on a web server. To enable developers to design, build,
maintain, and host web apps over the internet, a web framework is necessary.

Routes and view functions in flask framework:

Clients send requests to the web server, which in turn sends them to the Flask application
instance. The instance needs to know what code to run for each requested URL, so it maps URLs to
Python functions. The association between a URL and the function that handles it is called a route.
The most convenient way to define a route in a Flask application is through the app.route decorator
exposed by the application instance, which registers the decorated function. Decorators are a Python
feature that modifies the behavior of a function.

Figure 3.7: Route and view function

Here hello is a view function, and the response, as shown in figure 3.7, can even be a string of
HTML. Server startup: the application instance has a run method that launches Flask's integrated
development web server. Once the script starts, it waits for requests and services them in a loop.
Local host: run the Python script in a virtual environment; Flask starts the server listening on
127.0.0.1 and port 5000 by default. To accept connections from any remote address, use
host='0.0.0.0'.
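The route and startup described above can be sketched as follows; the route path and message are illustrative, and the test client is used here so the sketch does not block on a running server:

```python
# Minimal sketch of a Flask route; the route and message are illustrative.
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    # The view function returns the response body sent back to the client.
    return "Hello, Flask!"

# Exercise the route without starting the development server:
with app.test_client() as client:
    assert client.get("/").data == b"Hello, Flask!"

# app.run() would launch the integrated development server on
# 127.0.0.1:5000; app.run(host="0.0.0.0") accepts remote connections.
```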

HTTP METHOD

Request:

To process incoming data in Flask, you need to use the request object, which exposes the MIME
type, IP address, and data. HEAD: like GET, but the server returns only the headers, without a
response body.

Get:

Requests data from the server, expecting a response body.

Post:

Reads form inputs, e.g. to register a user, sending form data to the server; such requests are
handled by the route. Flask attaches methods to each route so that different view functions can
handle different request methods at the same URL.

Response:

Flask invokes a view function, which has to return a response value to the client. An HTTP
response is more than a string body: it also carries a status code.

Templates:

To maintain the site, Flask uses a powerful template engine, Jinja2. In its simplest form, a
Jinja2 template is a file that contains the text of a response, returned by a view function, with
dynamic components represented by variables.

Linking:

Dynamic URL routing support is provided by the url_for() helper function. For example,
url_for('sagar', name='projectfile', _external=True) would return http://localhost:5000/sagar/project file.

Security:

CSRF (Cross-Site Request Forgery) occurs when a malicious website sends requests to a different
website on which the victim is logged in. Flask-WTF protects against such attacks. Apart from that,
Flask also implements common security mechanisms such as session-based management, role management,
password hashing, basic HTTP and token-based authentication, and optional log-in tracking.

DATABASE CONNECTIVITY:

Flask places no restrictions on the use of databases, and has no native database support.
Databases can, however, be broadly divided into two categories.

• Those following the relational model, e.g. SQL databases such as sqlite3, mainly for structured data.

• Those not following the relational model, e.g. NoSQL databases, primarily for unstructured data.

Flask-SQLAlchemy is a Flask extension that simplifies the use of SQLAlchemy inside Flask ap-
plications. SQLAlchemy is a robust relational database framework that supports several database
back ends. It offers a high-level ORM and low-level access to the database's native SQL function-
ality. Travis can be used for continuous integration to upload application images to Docker Hub.
After the image is published to Docker Hub, a webhook is triggered to pull it onto the target
servers.

3.4.1 Importance of python for machine learning:

Machine learning applications are improving traditional processes across industries and solv-
ing some of their pressing problems efficiently, enabling better personalisation, improved search
functionality, and smarter recommendations; Python has been instrumental in all these developments.
The characteristics of Python that make it an ideal programming language for machine learning are:

• Simplicity and Consistency

• Range of libraries and frameworks

• Platform Independence

• Flexibility

• Visualisation Options

Properties of python:


• Readable and Maintainable Code

• Multiple Programming Paradigms

• Compatible with Major Platforms and Systems

• Robust Standard Library

• Open Source Frameworks and Tools

• Simplified Software Development

• Test-Driven Development

3.4.2 Libraries Used:

A library is a collection of pre-written code that is used to reduce the time required to
code. Libraries eliminate the need to write frequently used code again and again from scratch by
providing access to pre-written routines. Similar to a physical library, a Python library is a
collection of reusable resources with a root source; this is the foundation of most open-source
Python libraries.

Used Libraries:

• Numpy

• Pandas

• Matplotlib

• Seaborn

• Sklearn

• Pickle

• Os

1. Numpy:
NumPy is a library for the python programming language, adding support for large,multi-
dimensional arrays and matrices, along with a large collection of high-level mathematical func-
tions to operate on these arrays.

2. Pandas:
Pandas is mainly used for data analysis. Pandas allows importing data from various file formats
such as comma-separated values, JSON, SQL, and Microsoft Excel. Pandas allows various data
manipulation operations such as merging, reshaping, selecting, as well as data cleaning, and
data wrangling features.

3. Matplotlib:
Matplotlib is a plotting library for the Python programming language and its numerical mathe-
matics extension NumPy. It provides an object-oriented API for embedding plots into applica-
tions using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK.

4. Seaborn:
Seaborn is an open-source Python library built on top of matplotlib. It is used for data visu-
alization and exploratory data analysis. Seaborn works easily with dataframes and the Pandas
library. The graphs created can also be customized easily.

5. Sklearn:
Scikit-learn (Sklearn) is the most useful and robust library for machine learning in Python. It
provides a selection of efficient tools for machine learning and statistical modeling including
classification, regression, clustering and dimensionality reduction via a consistence interface in
Python.

6. Pickle:
Python's pickle module is used for serializing and de-serializing a Python object structure. Any
object in Python can be pickled so that it can be saved on disk. Pickle “serializes” the object
first before writing it to file: pickling is a way to convert a Python object (list, dict, etc.)
into a byte stream.

7. Os:
The OS module in Python provides functions for interacting with the operating system. OS
comes under Python’s standard utility modules. This module provides a portable way of using
operating system-dependent functionality.
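As an example of how pickle (item 6 above) can persist a trained model for later use, e.g. behind the Flask app, here is a minimal sketch; the filename and toy training data are illustrative:

```python
import pickle
from sklearn.ensemble import RandomForestClassifier

# Train a tiny model on toy data and serialize it; the filename and
# training data are illustrative only.
model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit([[0, 0], [1, 1], [0, 1], [1, 0]], [0, 1, 0, 1])

with open("crop_model.pkl", "wb") as f:
    pickle.dump(model, f)

# Later (e.g. inside the Flask app), load the model back and predict.
with open("crop_model.pkl", "rb") as f:
    loaded = pickle.load(f)
print(loaded.predict([[1, 1]]))
```

Serializing the fitted model this way means the web app can load and reuse it without retraining on every request.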

3.5 CODE EDITOR

Visual Studio Code is a source-code editor made by Microsoft for Windows, Linux and macOS.
Features include support for debugging, syntax highlighting, intelligent code completion, snippets,
code refactoring, and embedded Git. Users can change the theme, keyboard shortcuts, and preferences,
and install extensions that add additional functionality.

Microsoft has released most of Visual Studio Code’s source code on GitHub under the per-
missive MIT License, while the releases by Microsoft are proprietary freeware.

In the Stack Overflow 2021 Developer Survey, Visual Studio Code was ranked the most pop-
ular developer environment tool, with about 70% of respondents reporting that they use it. Visual
Studio Code was first announced on April 29, 2015, by Microsoft at the 2015 Build conference. A
preview build was released shortly thereafter. On November 18, 2015, the source of Visual Studio
Code was released under the MIT License and made available on GitHub. Extension support was also
announced. On April 14, 2016, Visual Studio Code graduated from the public preview stage and was
released to the Web.

Working with python in Visual Studio Code, using the Microsoft Python extension, is simple,
fun, and productive. The extension makes VS Code an excellent Python editor, and works on any
operating system with a variety of Python interpreters.

Figure 3.8: VS code editor

CHAPTER 4

RESULTS AND DISCUSSION

This chapter explains the results obtained in implementing the analysis of crop yield using
machine learning. A suitable dataset was collected from the internet, labelled according to the
application, and trained using machine learning.

4.1 ACCURACY COMPARISON

The four algorithms we used are SVM, Decision Tree, Naive Bayes, and Random Forest.
Comparing the accuracies of the algorithms in a Jupyter Notebook using a bar plot, we get the
following output. Figure 4.1 shows the accuracy comparison of the four algorithms.

Figure 4.1: Comparison of accuracy of four algorithms
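A bar plot of this kind can be produced with Matplotlib roughly as follows; the accuracy values below are placeholders, since the real ones come from the trained models:

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

# Placeholder accuracies; the real values come from the trained models.
algorithms = ["Decision Tree", "Naive Bayes", "SVM", "Random Forest"]
accuracies = [0.90, 0.98, 0.991, 0.99]

plt.bar(algorithms, accuracies)
plt.ylabel("Accuracy")
plt.title("Accuracy comparison of the four algorithms")
plt.savefig("accuracy_comparison.png")
```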

Algorithms               Accuracy of train set   Accuracy of test set
Random Forest            20%                     25%
Support Vector Machine   99%                     99.1%
Naive Bayes              97%                     98%
Decision Tree            89%                     90%

Table 4.1: Accuracy comparison of train set and test set

Decision tree accuracy:

Accuracy is the number of correct predictions made divided by the total number of predictions
made. At each node, we predict the majority class associated with that node. Figure 4.2 shows
the accuracy of the decision tree for our model.

Figure 4.2: Accuracy of decision tree

Random forest accuracy:

Random forests tend to have high prediction accuracy; the accuracy of our model is shown in
figure 4.3. Random forests can handle large numbers of features thanks to the feature selection
embedded in the model-generation process. Note that when the number of features is large, it is
preferable to use a higher number of trees.

Figure 4.3: Accuracy of random forest

Naive Bayes accuracy:

The Naive Bayes classifier is a fast, accurate, and reliable algorithm. Naive Bayes classifiers
have high accuracy and speed on large datasets. The Naive Bayes classifier assumes that the effect
of a particular feature in a class is independent of the other features. The accuracy of the model
is shown in figure 4.4.

Figure 4.4: Accuracy of Naive Bayes

Support vector machine accuracy:

SVM gives very good results in terms of accuracy when the data are linearly or non-linearly
separable. When the data are linearly separable, the SVM's result is a separating hyperplane that
maximizes the margin of separation between classes, measured along a line perpendicular to the
hyperplane. Figure 4.5 below shows the accuracy of SVM for our model.

Figure 4.5: Accuracy of Support Vector Machine

4.1.1 Fitting the algorithms:

A random forest is a meta-estimator that fits a number of decision tree classifiers on various
sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-
fitting; otherwise, the whole dataset is used to build each tree. Random Forest gives better
accuracy than the others. The fitting of the random forest algorithm to our model is shown in
figure 4.6.

Figure 4.6: Fitting Random Forest classifier

Finally, the result can be obtained by giving the values of the required parameters in a NumPy
array; the algorithm predicts the best crop to grow and displays the output crop. Figure 4.7
shows how to give the input through a NumPy array.

Figure 4.7: Input and Output
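A sketch of this final prediction step, with random stand-in training data and arbitrary input values (not the project's real dataset or figures):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Random stand-in training data; the seven columns could be N, P, K,
# temperature, humidity, pH and rainfall, but the values are arbitrary.
rng = np.random.default_rng(0)
X = rng.random((60, 7))
y = rng.choice(["rice", "maize", "wheat"], size=60)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# A single observation supplied as a NumPy array, as in figure 4.7.
sample = np.array([[0.4, 0.6, 0.3, 0.7, 0.5, 0.6, 0.2]])
print(model.predict(sample))  # prints an array with one crop label
```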

INPUT AND OUTPUT THROUGH WEBSITE

The above pattern of giving input values cannot easily be understood by ordinary users, so
through the user interface the user can easily give the input, as shown in figure 4.8, and get
the resultant output.

Figure 4.8: Input and Output through website

CHAPTER 5

CONCLUSION

The proposed model is constructed using ML algorithms to reduce the farmers' problem of incurring
losses in their farms due to lack of knowledge of cultivation in different soil and weather conditions.
The model is created using machine learning techniques (SVM, Naive Bayes, Random Forest, Decision
Tree). After analysing the prediction parameters, the model predicts the best crops that should be
grown on the land with less expense, from among a number of available crops. To the best of our
knowledge, there is no existing work that uses the same techniques in predicting the crops. Hence,
it is concluded that there is an enhancement in the accuracy of this research work when compared to
the existing work that used other techniques for prediction of crops. The accuracy is calculated as
more than 97%. The farmers need to be educated, and they will then get clear information regarding
the best crop yield on their mobiles. The progress in the agribusiness field will be greatly
appreciable, which will further help the farmers in the production of crops.

REFERENCES

[1] Agarwal, Sonal, and Sandhya Tarar. ”a Hybrid Approach for Crop Yield Prediction Using Machine
Learning and Deep Learning Algorithms.” Journal of Physics: Conference Series. Vol. 1714. No. 1.

[2] Sajja, Guna Sekhar, et al. ”An Investigation on Crop Yield Prediction Using Machine Learning.”
2021 Third International Conference on Inventive Research in Computing Applications (ICIRCA).

[3] Van Klompenburg, Thomas, Ayalew Kassahun, and Cagatay Catal. ”Crop yield prediction using
machine learning: A systematic literature review.” Computers and Electronics in Agriculture 177
(2020): 105709.

[4] Awan, A. M., Sap, M. N. M. (2006, April). An intelligent system based on kernel methods for crop
yield prediction. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 841-846).
Springer, Berlin, Heidelberg.

[5] Bang, S., Bishnoi, R., Chauhan, A. S., Dixit, A. K., Chawla, I. (2019, August). Fuzzy Logic
based Crop Yield Prediction using Temperature and Rainfall parameters predicted through ARMA,
SARIMA, and ARMAX models. In 2019 Twelfth International Conference on Contemporary Com-
puting (IC3) (pp. 1-6).

[6] Bhosale, S. V., Thombare, R. A., Dhemey, P. G., Chaudhari, A. N. (2018, August). Crop Yield
Prediction Using Data Analytics and Hybrid Approach. In 2018 Fourth International Conference on
Computing Communication Control and Automation (ICCUBEA) (pp. 1-5).

[7] Gandge, Y. (2017, December). A study on various data mining techniques for crop yield predic-
tion. In 2017 International Conference on Electrical, Electronics, Communication, Computer, and
Optimization Techniques (ICEECCOT) (pp. 420-423).

[8] Gandhi, N., Petkar, O., Armstrong, L. J. (2016, July). Rice crop yield prediction using arti-
ficial neural networks. In 2016 IEEE Technological Innovations in ICT for Agriculture and Rural
Development (TIAR) (pp. 105-110).

[9] Gandhi, N., Armstrong, L. J., Petkar, O., Tripathy, A. K. (2016, July). Rice crop yield prediction
in India using support vector machines. In 2016 13th International Joint Conference on Computer
Science and Software Engineering (JCSSE) (pp. 1-5).

[10] Gandhi, N., Armstrong, L. J., Petkar, O. (2016, July). Proposed decision support system (DSS)
for Indian rice crop yield prediction. In 2016 IEEE Technological Innovations in ICT for Agriculture
and Rural Development (TIAR) (pp. 13-18).

[11] Jaikla, R., Auephanwiriyakul, S., Jintrawet, A. (2008, May). Rice yield prediction using a support
vector regression method. In 2008 5th International Conference on Electrical Engineering/Electronics,
Computer, Telecommunications and Information Technology (Vol. 1, pp. 29-32).

[12] Kadir, M. K. A., Ayob, M. Z., Miniappan, N. (2014, August). Wheat yield prediction: Artificial
neural network based approach. In 2014 4th International Conference on Engineering Technology and
Technopreneuship (ICE2T) (pp. 161-165).

[13] Manjula, A., Narsimha, G. (2015, January). XCYPF: A flexible and extensible framework for
agricultural Crop Yield Prediction. In 2015 IEEE 9th International Conference on Intelligent Systems
and Control (ISCO) (pp. 1-5).

[14] Mariappan, A. K., Das, J. A. B. (2017, April). A paradigm for rice yield prediction in Tamilnadu.
In 2017 IEEE Technological Innovations in ICT for Agriculture and Rural Development (TIAR) (pp.
18-21).

[15] Paul, M., Vishwakarma, S. K., Verma, A. (2015, December). Analysis of soil behaviour and pre-
diction of crop yield using data mining approach. In 2015 International Conference on Computational
Intelligence and Communication Networks (CICN) (pp. 766-771).

[16] Ahamed, A. M. S., Mahmood, N. T., Hossain, N., Kabir, M. T., Das, K., Rahman, F., Rah-
man, R. M. (2015, June). Applying data mining techniques to predict annual yield of major crops
and recommend planting different crops in different districts in Bangladesh. In 2015 IEEE/ACIS
16th International Conference on Software Engineering, Artificial Intelligence, Networking and Paral-
lel/Distributed Computing (SNPD) (pp. 1-6).

[17] Shastry, A., Sanjay, H. A., Hegde, M. (2015, June). A parameter based ANFIS model for crop
yield prediction. In 2015 IEEE International Advance Computing Conference (IACC) (pp. 253-257).

[18] Sujatha, R., Isakki, P. (2016, January). A study on crop yield forecasting using classification
techniques. In 2016 International Conference on Computing Technologies and Intelligent Data Engi-
neering (ICCTIDE’16) (pp. 1-4).

[19] Suresh, A., Kumar, P. G., Ramalatha, M. (2018, October). Prediction of major crop yields of Tamilnadu using K-means and Modified KNN. In 2018 3rd International Conference on Communication and Electronics Systems (ICCES) (pp. 88-93).

[20] Veenadhari, S., Misra, B., Singh, C. D. (2014, January). Machine learning approach for fore-
casting crop yield based on climatic parameters. In 2014 International Conference on Computer
Communication and Informatics (pp. 1-5).

[21] Kitchen, N. R., Lund, S. T. D., Sudduth, K. A., Buchleiter, G. W. (2003). Soil electrical conductivity and topography related to yield for three contrasting soil-crop systems. Agronomy Journal, 95, 483-495.

[22] Ezrin, M. H. (2009). Relationship between apparent electrical conductivity of paddy soil and rice yield (M.S. thesis). Institute of Advance Technology, Universiti Putra Malaysia.

[23] Pantazi, X. E., Tamouridou, A. A., Alexandridis, T. K., Lagopodi, A. L., Kashefi, J., Moshou, D. (2017). Evaluation of hierarchical self-organising maps for weed mapping using UAS multispectral imagery. Computers and Electronics in Agriculture, 139, 224-230.

[24] Zhang, M., Li, C., Yang, F. (2017). Classification of foreign matter embedded inside cotton lint using short wave infrared (SWIR) hyperspectral transmittance imaging. Computers and Electronics in Agriculture, 139, 75-90.

[25] Ming, J., Zhang, L., Sun, J., Zhang, Y. (2018, April). Analysis models of technical and economic data of mining enterprises based on big data analysis. In 2018 IEEE 3rd International Conference on Cloud Computing and Big Data Analysis (ICCCBDA) (pp. 224-227).

[26] Stastny, J., Konecny, V., Trenz, O. (2011). Agricultural data prediction by means of neural network (Vol. 7, pp. 356-361).

[27] Veenadhari, S., Misra, B., Singh, C. D. (2011, March). Data mining techniques for predicting crop productivity - A review article. International Journal of Computer Science and Technology.
