A Model To Predict The Crop Based On Soil Properties Using Machine Learning

A Project Report On
A MODEL TO PREDICT THE CROP BASED ON SOIL PROPERTIES USING MACHINE LEARNING
Submitted in partial fulfillment of the requirement for the award of the degree in
BACHELOR OF TECHNOLOGY
In
COMPUTER SCIENCE AND ENGINEERING
Submitted By
S KEERTHI 198X1A05E0
T ESTHER RANI 198X1A05E8
B VIVEK 198X1A05I7
V SASIDHAR REDDY 198X1A05G2
Under the esteemed Guidance of

Prof V RAJEEV JETSON MTech, (Ph.D)
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

KALLAM HARANADHAREDDY INSTITUTE OF TECHNOLOGY
Approved by AICTE- New Delhi, Accredited by NAAC A Grade and NBA Accredited
Permanently Affiliated to Jawaharlal Nehru Technological University, Kakinada
NH-5, Chowdavaram, Guntur, Andhra Pradesh, India
2022-2023
KALLAM HARANADHAREDDY INSTITUTE OF TECHNOLOGY
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
CERTIFICATE
This is to certify that the project report entitled A Model to Predict the Crop Based on Soil
Properties Using Machine Learning being submitted by
B VIVEK 198X1A05I7
in partial fulfillment for the award of the Degree of Bachelor of Technology in Computer
Science and Engineering to the Jawaharlal Nehru Technological University, Kakinada is a
record of bonafide work carried out under my guidance and supervision.
The result embedded in this thesis has not been submitted to any other university /institute
for the award of any degree / diploma.
PROJECT SUPERVISOR
Prof V Rajeev Jetson MTech, (Ph.D)
Professor, Department of CS, KHIT

HEAD OF THE DEPARTMENT
Prof V Rajeev Jetson MTech, (Ph.D)
Department of CS, KHIT
EXTERNAL EXAMINER
DECLARATION
I/We hereby declare that the work described in the project report entitled
“A Model to Predict the Crop based on Soil Properties using Machine Learning”,
which is submitted by us in partial fulfilment for the award of Bachelor of Technology
in the Department of Computer Science and Engineering, KHIT, Andhra Pradesh, is
a record of original and independent research work done by us during the academic year
2022–2023 under the supervision of Prof. V. Rajeev Jetson. The work is original and
has not been submitted for the award of any Degree, Diploma, Associateship,
Fellowship or any other similar title to this or any other university.
Name of the Student Roll No Signature
B VIVEK 198X1A05I7

ACKNOWLEDGMENT
We are profoundly grateful and express our deep sense of gratitude and respect towards
our honorable chairman, Sri KALLAM MOHAN REDDY, Chairman of Kallam group,
for his precious support in the college.
We are thankful to Dr. M. UMA SANKAR REDDY, Director, KHIT, GUNTUR
for his encouragement and support for the completion of the project.
We are very thankful to Dr. B. SIVA BASIVI REDDY, Principal, KHIT,
GUNTUR for his support during and until the completion of the project.
We are greatly indebted to Prof. V. Rajeev Jetson, Professor & Head of the
Department, Computer Science and Engineering, KHIT, GUNTUR for providing the
laboratory facilities as and when required and for giving us the opportunity to
carry out the project work in the college.
We are also thankful to our Project Coordinators D. Vinay Kumar, G. Mantru
Naik and Mr. Ramprasad Mathi, who helped us in each step of our project.
We extend our deep sense of gratitude to our Internal Guide Prof. V. Rajeev
Jetson, Professor & Head of the Department, and other faculty members and support staff
for their valuable suggestions, guidance and constructive ideas in each and every step,
which were indeed of great help towards the successful completion of our project.
Team Members
B VIVEK 198X1A05I7

TABLE OF CONTENTS
SNO CHAPTER NAME Page No

ABSTRACT I
LIST OF FIGURES II
LIST OF TABLES III
1 INTRODUCTION 1
1.1 Introduction for Crop Prediction 2
1.2 Significance of the Project 3
2 LITERATURE SURVEY 4
2.1 Literature Survey 5
3 SYSTEM ANALYSIS 8
3.1 Existing System 9
3.1.1 Disadvantages of Existing System 9
3.2 Proposed System 9
3.2.1 Advantages of Proposed System 10
3.3 Flow in Proposed System 10
3.3.1 Acquisition of training dataset 10
3.3.2 Data Preprocessing 11
3.4 Machine Learning Algorithm 12
4 PROJECT MODULES 18
4.1 Modules Description 19
4.1.1 Input 19
4.1.2 Data Processing 19
4.1.3 Classification Algorithm 19
4.1.4 Majority Voting 19
5 SYSTEM REQUIREMENT SPECIFICATION 20
5.1 Hardware Requirements 21
5.2 Software Requirements 21
5.3 Non-Functional Requirements 22
6 SYSTEM DESIGN 23
6.1 System Architecture 24
6.2 UML Diagrams 25
6.2.1 Use Case Diagram 27
6.2.2 Sequence Diagram 28
6.2.3 Activity Diagram 29
6.2.4 Collaboration Diagram 30
6.2.5 Data Flow Diagram 31
7 IMPLEMENTATION 32
7.1 Sample Source Code 34
8 TESTING 50
8.1 Testing Introduction 51
8.1.1 Unit Testing 52
8.1.2 Integration Testing 52
8.1.3 User Interface Testing 53
8.2 Test Results 54
9 OUTPUT SCREENS 55
10 CONCLUSION 60
11 REFERENCES 62
ABSTRACT
Agriculture is one of the most essential and widely practiced occupations in India and plays
a vital role in the development of the country. Around 60 percent of the total land in the
country is used for agriculture to meet the needs of 1.2 billion people, so improving crop
production is a significant concern. For any piece of land, the farmer needs to know which
crop can be grown on it. Crop production is a difficult task because it depends on many
factors, such as soil type, temperature and humidity. Nowadays food production is declining
and crop prediction is becoming less reliable because of unnatural climatic changes, which
leads to poor yields, hurts farmers' income and makes it harder for farmers to forecast
future crops. If the suitable crop can be identified before sowing, farmers and other
stakeholders can make appropriate decisions on storage and business. The proposed project
addresses these problems by analysing the soil properties of an agricultural area and
recommending the most appropriate crop to farmers, thereby helping them to significantly
increase productivity and reduce loss. Our project is a recommendation system that uses
machine learning techniques such as Logistic Regression and SVM to recommend suitable
crops based on the input soil parameters. The seed data of the crops is collected together
with parameters such as temperature, humidity and moisture content that the crops need for
successful growth. The system thus reduces the financial losses farmers incur by planting
the wrong crops and helps farmers find new types of crops that can be cultivated in their area.
LIST OF FIGURES
FIGURE NO DESCRIPTION PAGE NO

Fig 1 Acquisition of Training Dataset 10
Fig 2 System Architecture of Crop Prediction 24
Fig 3 Use Case diagram for Crop Prediction 27
Fig 4 Sequence Diagram for Crop Prediction 28
Fig 5 Activity Diagram for Crop Prediction 29
Fig 6 Collaboration Diagram for Crop Prediction 30
Fig 7 DFD Diagram for Crop Prediction 31
LIST OF TABLES
TABLE No TABLE NAME PAGE No

Table 1 Head of the Dataset 11
Table 2 Test Results 54
CHAPTER 1
INTRODUCTION
1 INTRODUCTION
1.1 INTRODUCTION FOR CROP PREDICTION
Agriculture is the main source of the Indian economy and has been one of the main
occupations in India since ancient times. About 50% of India's workforce is involved in
agricultural activities, and India is the leading producer of a few crops. Soil is the most
basic resource in agriculture, yet many farmers still rely on traditional methods. With
these methods farmers do not get satisfactory results, that is, crop output does not increase.
Increasing crop output requires good-quality soil. The production and quality of crops
depend heavily on the soil. Soil quality for agriculture is described by soil properties
related to organic matter and nutrients, such as N (Nitrogen), C (Carbon), P (Phosphorus),
Mg (Magnesium), Ca (Calcium) and K (Potassium).
We were motivated to build this system to help farmers decide which crop to sow for their
benefit. The dataset consists of the nutrients available in the farmers' soil. Based on the
nutrient values, our system predicts the soil type. According to the soil type, the system
predicts a list of crops that can grow in that soil. As a result the crop yield increases
and the farmer earns more money. We build the system using machine learning. Machine
learning concentrates on creating computer programs that can access data and learn from
it. It allows models to be built from sample data and gives them the ability to make
decisions automatically based on past experience.
1.2 SIGNIFICANCE OF THE PROJECT
Crop prediction has been a popular research problem for years because traditional crop
prediction depends on climatic conditions and does not give farmers satisfactory results.
Accurate crop prediction is important for farmers in the agriculture industry. Soil
properties play a significant role in choosing the crop to be sown by the farmer to increase
the yield of the crop.
The proposed model predicts the crop using soil properties. Here the farmers
can consider soil nutrients such as N (Nitrogen), C (Carbon), P (Phosphorus),
Mg (Magnesium) and Ca (Calcium) in order to predict the crop.
In short, crop prediction can help farmers determine the crop that is suitable for the
soil and increase their profit.
CHAPTER 2
LITERATURE SURVEY
2 LITERATURE SURVEY
2.1 LITERATURE SURVEY:
[ 1 ] Shriya Sahu, Meenu Chawla and Nilay Khare, “An Efficient Analysis of Crop Yield
Prediction using Hadoop Framework Based on Random Forest Approach”, IEEE .
In this paper, various parameters ranging from soil to atmosphere are considered for
predicting the suitable crop. Soil parameters such as type, pH level, iron, copper,
manganese, sulphur, organic carbon, potassium, phosphate and nitrogen are considered. The
random forest algorithm is used to classify the dataset and gives good accuracy with a low
error rate. The framework can handle large datasets by processing them with the MapReduce
programming model. The phases of the proposed work are: Data Collection, Data
Classification (Random Forest Algorithm), Hadoop Framework – MapReduce programming model,
and Final Prediction. The implementation is carried out on Ubuntu 14.04 LTS with Hadoop
2.6.0, and the dataset is collected from various online sources to predict the suitable crop.
[ 2 ] Rakesh Kumar, M.P. Singh, Prabhat Kumar and J.P. Singh, “Crop Selection
Method to Maximize Crop yield rate using Machine Learning Technique”.
This work presents a technique named Crop Selection Method (CSM) to select the sequence of
crops to be planted over a season. CSM may improve the net yield rate of crops planted over
the season. The proposed method resolves the selection of crop(s) based on the predicted
yield rate, which is influenced by parameters such as weather, soil type, water density and
crop type. The crop sowing table data were gathered from farmers of Patna District, Bihar
(India). The method takes the crops, their sowing time, plantation days and predicted yield
rate for the season as input and finds a sequence of crops whose production per day is
maximum over the season.
[ 3 ] Monali Paul, Santosh K. Vishwakarma and Ashok Verma. “Analysis of Soil
Behaviour and Prediction of Crop Yield using Data Mining Approach”.
In this work the experiments are performed using RapidMiner 5.3. Two well-known
classification algorithms, K-Nearest Neighbor (KNN) and Naive Bayes (NB), are applied to a
soil dataset taken from the soil testing laboratory, Jabalpur, M.P. The soil is classified
into low, medium and high categories in order to predict the crop yield using the available
dataset. This study can help soil analysts and farmers decide on which land sowing may
result in better crop production.
[ 4 ] Renuka & Sujata Terdal, "Evaluation of Machine Learning Algorithms for Crop
Prediction"
Agriculture plays a major role in the growth of the national economy. It relies on weather
and other environmental aspects. Some of the factors on which agriculture relies are soil,
climate, flooding, fertilizers, temperature, precipitation, crops, pesticides and herbs.
The crop yield depends on these factors and is therefore tough to predict. To understand the
status of crop production, this work performs a descriptive study on agricultural data using
various machine learning techniques. Crop yield estimates include estimating crop yields
from available historical data such as precipitation data, soil data and historic crop yields.
[ 5 ] V. Puranik, Sharmila, A. Ranjan and A. Kumari, "Automation in Agriculture and
IoT," 2019 4th International Conference on Internet of Things.
We live in a world of digitization. Almost everything around us is touched by this
transformation. The role that technology should play in the agriculture sector is becoming
more and more visible day by day. Since its inception, communication has played a crucial
part in agriculture; it was not restricted to the area of crop medicine but has played a
pivotal role in changing age-old agricultural practices. One can witness development in
the various methodologies and technologies being employed in the agricultural system. On
the other hand, the agriculture sector in India is losing ground every day, which has
affected the production capability of the system. There is a rising need to solve the
problem in this domain to restore vibrancy and put it back on higher growth.
[ 6 ] Arun Kumar et al., “Efficient Crop Yield Prediction Using Machine Learning
Algorithms”.
Descriptive analytics is the initial stage of analytics; it is a method by which we can
understand what happened in the past, and the past is the best predictor of the future. In
this research paper, descriptive analytics is applied in the agriculture production domain
for the sugarcane crop to find an efficient crop yield estimation. The paper uses three
datasets: a soil dataset, a precipitation dataset and a yield dataset. Several supervised
techniques are applied to the combined dataset to find the actual estimated cost and the
accuracy of each technique. Three supervised techniques are used: K-Nearest Neighbor,
Support Vector Machine, and Least Squares Support Vector Machine.
[ 7 ] Ramesh Medar & Anand M. Ambekar, “Sugarcane Crop prediction Using
Supervised Machine Learning”.
Traditionally, application of LTTS in the agriculture sector for yield prediction/crop
forecasting is limited to empirical methods using ground-based observations and production
reports gathered by various organizations from different sources: meteorological data,
agro-meteorological (yield) data, soil (water holding capacity) data, and remotely sensed
agricultural statistics. Based on this scientific data, many indices are derived that are
deemed to be related variables in finding crop yield.
CHAPTER 3
SYSTEM ANALYSIS
3 SYSTEM ANALYSIS
3.1 EXISTING SYSTEM

In agriculture, soil is the most basic resource, yet farmers still use traditional
methods. Because of these traditional methods they do not get satisfactory results, that
is, crop output does not increase. Existing systems predict the crop based on climatic
conditions, but weather conditions are now changing rapidly, which reduces food production
and threatens food security. The yield of the crop depends heavily on the soil, so to
increase the profit margins of the farmer we propose a system that predicts the crop based
on soil properties.
3.1.1 Disadvantages of Existing System

1. Climatic conditions are no longer what they were decades ago, so the farmer cannot
decide what to sow by depending on climatic conditions alone.
2. Farmers face difficulties in forecasting the weather and choosing crops based on climate data.
3. Ignoring soil properties decreases the crop yield and may cause loss to the farmers.
4. Seasons no longer arrive regularly, so cultivating crops based on the seasons while
ignoring soil properties sometimes leads to a financial crisis for the farmer.
3.2 PROPOSED SYSTEM

We were motivated to build this system to help farmers decide which crop to sow for their
benefit. The dataset consists of the nutrients available in the farmer's soil. Based on the
nutrient values, our system predicts the soil type. According to the soil type, the system
predicts a list of crops that can grow in that soil. As a result the crop yield increases
and the farmer earns more money. We build the system using machine learning.
3.2.1 Advantages of Proposed System

1. Increases the profit of farmers by cultivating the crop that is best suited to the soil.
2. Sowing the crop based on soil properties leads to an increase in crop yield.
3.3 Flow in the Proposed System

3.3.1 Acquisition of training dataset
Figure 1: Acquisition of Training Dataset
The accuracy of a machine learning algorithm may depend on the number of parameters
used and on the correctness of the dataset. Our dataset contains the N, P, K, and pH
values of different kinds of soils as attributes, and the corresponding crops that can be
grown in that soil as labels.
Thus, by using an appropriate machine learning algorithm we can train a model on the
dataset to predict the most suitable crop that can be grown under the given input
parameters. The dataset used in our project was obtained from Kaggle, is titled “Crop
recommendation” and is a CSV file. A Comma Separated Values (CSV) file is a delimited text
file that uses a comma to separate values. Each line of the file is a data record. Each
record consists of one or more fields, separated by commas. The use of the comma as a field
separator is the source of the name for this file format. A CSV file typically stores
tabular data (numbers and text) in plain text, in which case each line will have the same
number of fields. So, in order to use this dataset in Python, we have to import the .csv
file and then read it using the command:
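The command itself is not reproduced in this copy of the report. As a minimal sketch of reading such a file, the example below uses the standard-library csv module so that it runs without extra packages; because the project code imports pandas, pandas.read_csv would be the likely equivalent. The column names (N, P, K, ph, label) are assumed from the description above, and the sample rows are made-up placeholders, not values from the dataset.

```python
import csv
import io

# Placeholder rows for illustration only; not taken from the real dataset.
SAMPLE_CSV = """N,P,K,ph,label
90,42,43,6.5,rice
20,67,20,5.8,kidneybeans
"""

def read_soil_records(handle):
    """Read CSV text and return a list of dicts with numeric soil values."""
    records = []
    for row in csv.DictReader(handle):
        records.append({
            "N": float(row["N"]),
            "P": float(row["P"]),
            "K": float(row["K"]),
            "ph": float(row["ph"]),
            "label": row["label"],
        })
    return records

# For a file on disk, pass open("<path to the .csv file>", newline="") instead.
records = read_soil_records(io.StringIO(SAMPLE_CSV))
print(records[0])
```

Each record keeps the crop name as the label and converts the soil attributes to numbers, which is the shape a classifier expects.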
Table 1 : Head of the dataset

3.3.2 Data preprocessing
Data preprocessing is the second stage and consists of two steps. The first step is data
cleaning. The original dataset can contain many missing values, which should be handled
first. Missing values are denoted by a dot in the dataset; their presence degrades the
quality of the entire dataset and can reduce the performance of the model. To solve this
problem, we replace these values with large negative values, which the model treats as
outliers. The second step is generating the class labels. Since we are using a supervised
learning method, each entry in the dataset must have a class label, which is created
during preprocessing.
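The cleaning step described above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the sentinel value -9999.0 is an assumed stand-in for the "large negative value" (the report does not give the number), and the rows are placeholders.

```python
MISSING_MARKER = "."   # the text says missing values appear as a dot
SENTINEL = -9999.0     # assumed "large negative value"; not stated in the report

def clean_row(row):
    """Return a copy of row with missing markers replaced by the sentinel."""
    return [SENTINEL if value.strip() == MISSING_MARKER else float(value)
            for value in row]

# Placeholder raw rows (N, P, K, pH) as strings, as they would come from a CSV.
raw_rows = [["90", "42", ".", "6.5"],
            [".", "67", "20", "5.8"]]
cleaned = [clean_row(row) for row in raw_rows]
print(cleaned)
```

A sentinel keeps every row the same length, at the cost of introducing artificial outliers that the model has to tolerate.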
3.4 Machine Learning Algorithm
Support Vector Machine Algorithm
Support Vector Machine, or SVM, is one of the most popular supervised learning algorithms
and is used for classification as well as regression problems. However, it is primarily
used for classification problems in machine learning.
The goal of the SVM algorithm is to create the best line or decision boundary that
segregates n-dimensional space into classes, so that a new data point can easily be put in
the correct category in the future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme
cases are called support vectors, and hence the algorithm is termed Support Vector Machine.
Consider the diagram below, in which two different categories are classified
using a decision boundary or hyperplane:
Types of SVM
SVM can be of two types:
Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified
into two classes by a single straight line, the data is termed linearly separable, and the
classifier used is called a linear SVM classifier.
Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot
be classified by a straight line, the data is termed non-linear, and the classifier used is
called a non-linear SVM classifier.
Hyperplane and Support Vectors in the SVM algorithm:
Hyperplane:
There can be multiple lines/decision boundaries that segregate the classes in n-dimensional
space, but we need to find the best decision boundary for classifying the data
points. This best boundary is known as the hyperplane of SVM.
The dimensions of the hyperplane depend on the number of features in the dataset: if there
are 2 features (as shown in the image), the hyperplane is a straight line, and if there are
3 features, the hyperplane is a 2-dimensional plane.
We always create the hyperplane that has the maximum margin, which means the maximum
distance between the hyperplane and the nearest data points of each class.
Support Vectors:
The data points or vectors that are closest to the hyperplane and which affect the position
of the hyperplane are termed support vectors. Since these vectors support the hyperplane,
they are called support vectors.
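To make the terms concrete, the sketch below computes the distance of each point to a given hyperplane w.x + b = 0 and picks out the closest points, which play the role of support vectors. The points and the line x1 + x2 = 5 are hypothetical values chosen by hand; no SVM training is performed here.

```python
import math

def signed_distance(point, w, b):
    """Signed distance from point to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, point)) + b) / norm

# Hypothetical 2-feature data: with 2 features the hyperplane is a line.
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]
w, b = (1.0, 1.0), -5.0   # the line x1 + x2 = 5, chosen by hand

distances = [signed_distance(p, w, b) for p in points]
# The margin is the distance from the hyperplane to the nearest point.
margin = min(abs(d) for d in distances)
# Points lying at that minimum distance act as the support vectors.
support_vectors = [p for p, d in zip(points, distances)
                   if math.isclose(abs(d), margin)]
print(margin, support_vectors)
```

The sign of each distance tells which side of the line a point falls on, which is how the hyperplane assigns a class.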
How does SVM work?
Linear SVM:
The working of the SVM algorithm can be understood by using an example. Suppose we have
a dataset that has two tags (green and blue) and two features, x1 and x2. We want a
classifier that can classify the pair (x1, x2) of coordinates as either green or blue.
Consider the image below:
Since this is a 2-d space, we can easily separate these two classes by using a straight
line. But there can be multiple lines that separate these classes. Consider the image below:
Hence, the SVM algorithm helps to find the best line or decision boundary; this best
boundary or region is called a hyperplane. The SVM algorithm finds the points from both
classes that lie closest to the boundary. These points are called support vectors. The
distance between the support vectors and the hyperplane is called the margin, and the goal
of SVM is to maximize this margin. The hyperplane with maximum margin is called the
optimal hyperplane.
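The idea that several lines can separate the classes, and that the one with the largest margin is preferred, can be illustrated with a brute-force comparison. This is not the optimisation an SVM solver performs; the data points and the three candidate lines are hypothetical and picked by hand.

```python
import math

def margin_of(w, b, samples):
    """Smallest distance of any sample to the line w.x + b = 0, or None if
    the line puts a sample on the wrong side (labels are +1 / -1)."""
    norm = math.hypot(*w)
    distances = []
    for (x1, x2), label in samples:
        d = label * (w[0] * x1 + w[1] * x2 + b) / norm
        if d <= 0:
            return None
        distances.append(d)
    return min(distances)

# Hypothetical two-class data: -1 stands for "blue", +1 for "green".
samples = [((1.0, 1.0), -1), ((2.0, 1.0), -1),
           ((4.0, 4.0), +1), ((5.0, 4.0), +1)]

# Hand-picked candidate lines w1*x1 + w2*x2 + b = 0 that all separate the data.
candidates = {
    "x1 = 3": ((1.0, 0.0), -3.0),
    "x2 = 2": ((0.0, 1.0), -2.0),
    "x1 + x2 = 5.5": ((1.0, 1.0), -5.5),
}
margins = {name: margin_of(w, b, samples) for name, (w, b) in candidates.items()}
valid = {name: m for name, m in margins.items() if m is not None}
best = max(valid, key=valid.get)   # the separating line with the widest margin
print(margins, best)
```

All three lines classify the samples correctly, but the diagonal line leaves the most room on both sides, which is the property the optimal hyperplane has.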
Non-Linear SVM:
If data is linearly arranged, we can separate it by using a straight line, but for
non-linear data we cannot draw a single straight line. Consider the image below:
So, to separate these data points, we need to add one more dimension. For linear data we
have used two dimensions, x and y, so for non-linear data we add a third dimension, z. It
can be calculated as:
z = x^2 + y^2
By adding the third dimension, the sample space becomes as shown in the image below:
So now, SVM will divide the dataset into classes in the following way. Consider the
image below:
Since we are in 3-d space, the boundary looks like a plane parallel to the x-axis.
If we convert it to 2-d space with z = 1, it becomes:
Hence, we get a circle of radius 1 in the case of non-linear data.
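A small sketch of this mapping, using hypothetical points: after adding z = x^2 + y^2, the two classes can be split by the plane z = 1, which corresponds to the circle of radius 1 in the original 2-d space.

```python
def lift(x, y):
    """Add the third dimension z = x^2 + y^2."""
    return x * x + y * y

# Hypothetical points: inner class near the origin, outer class farther away.
inner = [(0.2, 0.1), (-0.5, 0.3), (0.0, -0.6)]
outer = [(1.5, 0.0), (-1.2, 1.1), (0.4, -1.8)]

THRESHOLD = 1.0   # the plane z = 1, i.e. the circle x^2 + y^2 = 1 in 2-d

def classify(x, y):
    """Assign a class by comparing the lifted coordinate with the plane."""
    return "inner" if lift(x, y) < THRESHOLD else "outer"

print([classify(x, y) for x, y in inner + outer])
```

No straight line in the (x, y) plane separates these two groups, but a single threshold on z does.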
CHAPTER 4
PROJECT MODULES
4 MODULES
4.1 Module Description

The proposed system architecture is divided into four modules, as described below.
4.1.1 Input:
The prediction of the crop depends on soil parameters such as pH, nitrogen, phosphorus,
potassium and soil type. The farmer provides the soil parameters to the system.
4.1.2 Data Preprocessing:

Data obtained from various sources is generally in raw form and may be redundant,
incomplete or inconsistent. Data preprocessing converts this raw data into a useful,
normalized form.
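The report does not say which normalization is applied. As one common possibility, the sketch below uses min-max scaling, which maps each column to the range [0, 1]; the function name and the nitrogen values are illustrative assumptions.

```python
def min_max_normalize(column):
    """Scale a list of numbers to the range [0, 1]."""
    low, high = min(column), max(column)
    if high == low:
        return [0.0 for _ in column]   # constant column: no spread to scale
    return [(value - low) / (high - low) for value in column]

nitrogen = [20.0, 60.0, 100.0]   # placeholder values
print(min_max_normalize(nitrogen))
```

Scaling matters for SVM in particular, because features with large numeric ranges would otherwise dominate the distance calculations.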
4.1.3 Classification Algorithm:

Classification takes place in two steps: the first step is to create a classification
model, and the second step is to use the model to predict the class label for given data.
The two classification algorithms used by us are Support Vector Machine and logistic regression.
4.1.4 Majority Voting:
Ensemble techniques create multiple models and then combine them to produce a more
accurate result. Every model predicts the class label for each instance, and the class
label predicted by the majority of the models is selected.
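A minimal sketch of the voting step, assuming each model's prediction for one soil sample is already available as a crop name. The votes are hypothetical, and breaking ties in favour of the first label seen is a choice made for this example.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most models.
    Ties go to the label that appears first in the list."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical per-model predictions for one soil sample.
votes = ["rice", "maize", "rice"]
print(majority_vote(votes))
```

With only two classifiers a tie is possible on every disagreement, so an odd number of models or an explicit tie-break rule is usually preferred.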
CHAPTER 5
SYSTEM REQUIREMENT SPECIFICATIONS
5 SYSTEM REQUIREMENT SPECIFICATIONS
A system specification contains a requirement model and a use case model. These two
models are different yet complementary ways to capture system requirements. It is formed
with the help of the following set of tasks.
1. Finding functional requirements.
2. Finding non-functional requirements.
3. Software requirements.
4. Hardware requirements.
Among these activities, the first two are the responsibility of the requirements engineer,
whereas the third and fourth are the responsibility of the architect.
5.1 Hardware Requirements:
• Processor : Intel Core i3 or newer
• Storage : 256 GB
• RAM : 4 GB
• Monitor : 15-inch VGA Color
5.2 Software Requirements:
• Operating system : Windows XP/7/10/11
• Coding Language : Python
• Compiler/Shell : IDLE
5.3 Non-Functional Requirements:
Non-functional requirements are requirements which specify criteria that can be used
to judge the operation of a system, rather than specific behaviors. This should be contrasted
with functional requirements that specify specific behavior or functions. Typical non-
functional requirements are reliability, scalability, and cost. Non-functional requirements are
often called the utilities of a system. Other terms for non-functional requirements are
“constraints”, “quality attributes” and “quality of service requirements”.
Reliability: Any exceptions that occur during the execution of the software should be
caught, thereby preventing the system from crashing.
Scalability: The system should be developed in such a way that new modules and
functionalities can be added, thereby facilitating system evolution.
Cost: The cost should be low because the software packages used are freely available.
CHAPTER 6
SYSTEM DESIGN
6 SYSTEM DESIGN
6.1 SYSTEM ARCHITECTURE
Figure 2 : System Architecture for crop prediction
The above figure represents the architectural design of the proposed work. System
architecture is a conceptual model that defines the structure and behavior of the system.
It comprises the system components and the relationships describing how they work together
to implement the overall system.
6.2 UML DIAGRAMS
Taking the software requirements specification document of the analysis phase as input to
the design phase, we have drawn Unified Modelling Language (UML) diagrams. UML depends
on the visual modelling of the system. Visual modelling is the process of taking information
from the model and displaying it graphically using a standard set of graphical
elements. The UML diagrams are drawn using the StarUML diagramming software.
Complexity is better understood when it is displayed visually rather than written textually.
By producing visual models of a system, one can understand how the system works on several
levels and can model the interactions between the users and the system.
Each UML diagram is designed to let developers and customers view a software system
from a different perspective and in varying degrees of abstraction.
There are two broad categories of diagrams, each divided into subcategories:
1. Structural Diagrams
2. Behavioral Diagrams
Structural Diagrams:
The structural diagrams represent the static aspect of the system. These static aspects
are the parts of a system that form its main structure and are therefore stable.
These static parts are represented by classes, interfaces, objects, components, and nodes.
The four structural diagrams are:
1. Class diagram
2. Object diagram
3. Component diagram
4. Deployment diagram
Behavioral Diagrams:
Any system can have two aspects, static and dynamic. A model is considered complete
when both aspects are fully covered. Behavioral diagrams capture the dynamic aspect of a
system, which can be described as the changing/moving parts of the system.
UML has the following five types of behavioral diagrams:
1. Use case diagram
2. Sequence diagram
3. Collaboration diagram
4. State chart diagram
5. Activity diagram
6.2.1 Use Case Diagram
A use case diagram summarizes the details of a system and the users within that system.
It contains actors and use cases. Use case diagrams specify the events in a system and
how those events flow; they do not describe how those events are implemented.
Figure 3: Use Case diagram for crop prediction
6.2.2 Sequence Diagram
Sequence diagrams describe interactions among objects in terms of the exchange of messages
over time, focusing on the time ordering of messages. A sequence diagram explains how
and in what order the objects work together. It is a good way to visualize and validate
various run-time scenarios and can help to predict how a system will behave.
Figure 4: Sequence Diagram for Crop Prediction
6.2.3 Activity Diagram
An activity diagram is an important behavioural diagram that describes the dynamic aspects
of the system. It mainly focuses on the flow of control from one activity to another. An
activity diagram is essentially an advanced version of a flow chart that represents the
flow from one activity to another. It describes how activities are coordinated to provide
a service, which can be at different levels of abstraction.
Figure 5: Activity Diagram for Crop Prediction
6.2.4 Collaboration Diagram
Collaboration diagrams capture the dynamic behavior of the objects in the system. They are
useful for visualizing the relationships between objects collaborating to perform a
particular task. They illustrate object interactions in a graph or network format and are
used to illustrate the coordination of object structure and control.
Figure 6 : Collaboration Diagram for Crop Prediction

6.2.5 Data Flow Diagram
A data flow diagram (DFD) is a graphical representation of the “flow” of data through an
information system, modelling its process aspects. A DFD is often used as a preliminary
step to create an overview of the system without going into great detail, which can later
be elaborated. DFDs can also be used for the visualization of data processing. A DFD
shows what kind of information will be input to and output from the system, how the data
will advance through the system, and where the data will be stored.
Figure 7: DFD diagram for Crop Prediction
CHAPTER 7
IMPLEMENTATION
7 IMPLEMENTATION
This chapter covers the implementation of the design and the source code. In this phase
the design is translated into code. Computer programs are written using a conventional
programming language or an application generator. Programming tools such as compilers,
interpreters and debuggers are used to build and test the code. The programming language
is chosen according to the type of application; the Python programming language is used
for coding in this project.
Python:
Python is an interpreted, high-level, general-purpose programming language. Created by
Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code
readability with its notable use of significant whitespace. Its language constructs and object-
oriented approach aim to help programmers write clear, logical code for small and large-scale
projects. Python is dynamically typed and garbage-collected. It supports multiple programming
paradigms, including structured (particularly, procedural), object-oriented, and functional
programming.
7.1 Sample Source Code
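from flask import Flask, render_template, request
from markupsafe import Markup  # flask.Markup was removed in Flask 3.0
import numpy as np
import pandas as pd
import pickle

# Load the trained SVM classifier saved by the training script
with open('cropmodel2.pkl', 'rb') as file:
    svm = pickle.load(file)

app = Flask(__name__)

# Map the integer class labels predicted by the model to crop names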
mapper = {1: 'rice',
2: 'maize',
3: 'chickpea',
4: 'kidneybeans',
5: 'pigeonpeas',
6: 'mothbeans',
7: 'mungbean',
8: 'blackgram',
9: 'lentil',
10: 'pomegranate',
11: 'banana',
12: 'mango',
13: 'grapes',
14: 'watermelon',
15: 'muskmelon',
16: 'apple',
17: 'orange',
18: 'papaya',
19: 'coconut',
20: 'cotton',
21: 'jute',
22: 'coffee'}
fertilizer_dic = {
'NHigh': """The N value of soil is high and might give rise to weeds.
 Please consider the following suggestions:
 1. Manure – adding manure is one of the simplest ways to amend your
soil with nitrogen. Be careful as there are various types of manures with varying degrees of
nitrogen.
 2. Coffee grinds – use your morning addiction to feed your gardening habit!
Coffee grinds are considered a green compost material which is rich in nitrogen. Once the
grounds break down, your soil will be fed with delicious, delicious nitrogen. An added benefit
to including coffee grounds to your soil is while it will compost, it will also help provide
increased drainage to your soil.
 3. Plant nitrogen fixing plants – planting vegetables that are in Fabaceae
family like peas, beans and soybeans have the ability to increase nitrogen in your soil
 4. Plant ‘green manure’ crops like cabbage, corn and broccoli
 5. Use mulch (wet grass) while growing crops - Mulch can also include
sawdust and scrap soft woods""",
'Nlow': """The N value of your soil is low.
 1. Add sawdust or fine woodchips to your soil – the carbon in the
sawdust/woodchips loves nitrogen and will help absorb and soak up any excess nitrogen.
 2. Plant heavy nitrogen feeding plants – tomatoes, corn, broccoli, cabbage
and spinach are examples of plants that thrive off nitrogen and will suck the nitrogen dry.
 3. Water – soaking your soil with water will help leach the nitrogen deeper
into your soil, effectively leaving less for your plants to use.
 4. Sugar – In limited studies, it was shown that adding sugar to your soil can
help potentially reduce the amount of nitrogen in your soil. Sugar is partially composed of
carbon, an element which attracts and soaks up the nitrogen in the soil. This is a similar concept
to adding sawdust/woodchips which are high in carbon content.
 5. Add composted manure to the soil.
 6. Plant Nitrogen fixing plants like peas or beans.
 7. Use NPK fertilizers with high N value.
 8. Do nothing – It may seem counter-intuitive, but if you already have plants
that are producing lots of foliage, it may be best to let them continue to absorb all the nitrogen
to amend the soil for your next crops.""",
'PHigh': """The P value of your soil is high.
 1. Avoid adding manure – manure contains many key nutrients for your
soil but typically includes high levels of phosphorous. Limiting the addition of manure will
help reduce phosphorus being added.
 2. Use only phosphorus-free fertilizer – if you can limit the amount of
phosphorous added to your soil, you can let the plants use the existing phosphorus while still
providing other key nutrients such as Nitrogen and Potassium. Find a fertilizer with numbers
such as 10-0-10, where the zero represents no phosphorous.
 3. Water your soil – soaking your soil liberally will aid in driving
phosphorous out of the soil. This is recommended as a last ditch effort.
 4. Plant nitrogen fixing vegetables to increase nitrogen without increasing
phosphorous (like beans and peas).
 5. Use crop rotations to decrease high phosphorous levels""",
'Plow': """The P value of your soil is low.

 1. Bone meal – a fast acting source that is made from ground animal
bones which is rich in phosphorous.
 2. Rock phosphate – a slower acting source where the soil needs to convert
the rock phosphate into phosphorous that the plants can use.
 3. Phosphorus Fertilizers – applying a fertilizer with a high phosphorous
content in the NPK ratio (example: 10-20-10, 20 being phosphorous percentage).
 4. Organic compost – adding quality organic compost to your soil will help
increase phosphorous content.
 5. Manure – as with compost, manure can be an excellent source of
phosphorous for your plants.
 6. Clay soil – introducing clay particles into your soil can help retain & fix
phosphorus deficiencies.
 7. Ensure proper soil pH – having a pH in the 6.0 to 7.0 range has been
scientifically proven to have the optimal phosphorus uptake in plants.
 8. If soil pH is low, add lime or potassium carbonate to the soil as fertilizers. Pure
calcium carbonate is very effective in increasing the pH value of the soil.
 9. If pH is high, addition of appreciable amount of organic matter will help acidify
the soil. Application of acidifying fertilizers, such as ammonium sulfate, can help lower soil
pH""",
'KHigh': """The K value of your soil is high.
 1. Loosen the soil deeply with a shovel, and water thoroughly to
dissolve water-soluble potassium. Allow the soil to fully dry, and repeat digging and watering
the soil two or three more times.
 2. Sift through the soil, and remove as many rocks as possible, using a soil
sifter. Minerals occurring in rocks such as mica and feldspar slowly release potassium into the
soil through weathering.
 3. Stop applying potassium-rich commercial fertilizer. Apply only commercial
fertilizer that has a '0' in the final number field. Commercial fertilizers use a three number
system for measuring levels of nitrogen, phosphorous and potassium. The last number stands
for potassium. Another option is to stop using commercial fertilizers altogether and to begin
using only organic matter to enrich the soil.
 4. Mix crushed eggshells, crushed seashells, wood ash or soft rock phosphate to the
soil to add calcium. Mix in up to 10 percent of organic compost to help amend and balance the
soil.
 5. Use NPK fertilizers with low K levels and organic fertilizers since they have low
NPK values.
 6. Grow a cover crop of legumes that will fix nitrogen in the soil. This practice will
meet the soil’s needs for nitrogen without increasing phosphorus or potassium.
""",
'Klow': """The K value of your soil is low.
 Please consider the following suggestions:
 1. Mix in muriate of potash or sulphate of potash
 2. Try kelp meal or seaweed
 3. Try Sul-Po-Mag
 4. Bury banana peels an inch below the soil's surface
 5. Use potash fertilizers since they contain high levels of potassium
"""
}
@app.route('/')
def home():
    return render_template('index.html')


@app.route('/dashboard')
def dashboard():
    return render_template('dashboard.html')


# Model inputs, in the order passed to the classifier:
# nitrogen, phosphorus, potassium, temperature, humidity, ph, rainfall
@app.route('/predict', methods=['GET', 'POST'])
def predict():
    if request.method == 'POST':
        mydict = request.form
        # Form values arrive as strings, so convert them to numbers first
        nitrogen = float(mydict.get('nitrogen'))
        phosphorus = float(mydict.get('phosphorus'))
        potassium = float(mydict.get('potassium'))
        temperature = float(mydict.get('temperature'))
        humidity = float(mydict.get('humidity'))
        ph = float(mydict.get('ph'))
        rainfall = float(mydict.get('rainfall'))
        input_features = [nitrogen, phosphorus, potassium,
                          temperature, humidity, ph, rainfall]

        # Predict the class label and map it to the crop name
        inf = svm.predict([input_features])[0]
        value = mapper[inf]
        print(value)

        # Look up the N, P and K levels listed for the predicted crop
        df = pd.read_csv('fertilizer.csv')
        print(df.head())
        crop_row = df[df['Crop'] == value].iloc[0]
        nitro, phos, pota = crop_row['N'], crop_row['P'], crop_row['K']
        print(f' Nitrogen is : {nitro},phos is : {phos},potassium is : {pota}')

        # A negative difference means the soil holds more than the listed level
        n = int(nitro) - int(nitrogen)
        p = int(phos) - int(phosphorus)
        k = int(pota) - int(potassium)

        # Pick the nutrient that deviates most from the listed level
        temp = {abs(n): "N", abs(p): "P", abs(k): "K"}
        max_val = temp[max(temp.keys())]
        print(f' Max val is : {max_val}')
        if max_val == 'N':
            key = 'NHigh' if n < 0 else 'Nlow'
        elif max_val == 'P':
            key = 'PHigh' if p < 0 else 'Plow'
        else:
            key = 'KHigh' if k < 0 else 'Klow'

        response = Markup(str(fertilizer_dic[key]))
        value = value.capitalize()
        return render_template('result.html', inf=response, value=value)
    return render_template('predict.html')


if __name__ == '__main__':
    app.run(debug=False, host='127.0.0.1')
# ML ALGORITHM
# X_train, X_test, y_train, y_test, X_train_scaled, train and temp are not
# defined in this excerpt; they come from data preparation steps not listed here.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sn
from sklearn.model_selection import RandomizedSearchCV, train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn import metrics
from sklearn.svm import SVC
from sklearn import tree

X_train_df = pd.DataFrame(X_train_scaled, columns=train.columns)
X_train_df.head()
X_train_scaled

# Fit an unpruned decision tree and plot it
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)
plt.figure(figsize=(16, 9))
tree.plot_tree(clf, filled=True, feature_names=list(train.columns))

# Candidate alpha values for cost-complexity pruning
path = clf.cost_complexity_pruning_path(X_train, y_train)
ccp_alpha = path.ccp_alphas
ccp_alpha

# Train one tree per alpha value (each tree must be fitted before scoring)
alpha_list = []
for i in ccp_alpha:
    clf = DecisionTreeClassifier(random_state=0, ccp_alpha=i)
    clf.fit(X_train, y_train)
    alpha_list.append(clf)

train_score = [clf.score(X_train, y_train) for clf in alpha_list]
test_score = [clf.score(X_test, y_test) for clf in alpha_list]

# Compare training and testing accuracy across alpha values
plt.xlabel('alpha')
plt.ylabel('accuracy')
plt.plot(ccp_alpha, train_score, marker='o', label='training', color='magenta', drawstyle='steps-post')
plt.plot(ccp_alpha, test_score, marker='+', label='testing', color='red', drawstyle='steps-post')
plt.legend()
plt.show()

# Refit with the chosen alpha and plot the pruned tree
clf = DecisionTreeClassifier(random_state=0, ccp_alpha=0.045)
clf.fit(X_train, y_train)
tree.plot_tree(clf, filled=True, feature_names=list(train.columns))
# Hyperparameter search space for each candidate model
params = {
    'RandomForest': {
        'model': RandomForestClassifier(),
        'params': {
            'n_estimators': [int(x) for x in np.linspace(start=1, stop=1200, num=10)],
            'max_depth': [int(x) for x in np.linspace(start=1, stop=30, num=5)],
            'min_samples_split': [2, 5, 10, 12],
            'min_samples_leaf': [2, 5, 10, 12],
            # 'auto' (same as 'sqrt' for classifiers) was removed in scikit-learn 1.3
            'max_features': ['sqrt'],
            'ccp_alpha': [0.030, 0.035, 0.040, 0.045, 0.050],
        }
    },
    'logistic': {
        'model': LogisticRegression(),
        'params': {
            'penalty': ['l1', 'l2', 'elasticnet'],
            'C': [0.25, 0.50, 0.75, 1.0],
            'tol': [1e-10, 1e-5, 1e-4, 1e-3, 0.025, 0.25, 0.50],
            'solver': ['lbfgs', 'liblinear', 'saga', 'newton-cg'],
            'multi_class': ['auto', 'ovr', 'multinomial'],
            'max_iter': [int(x) for x in np.linspace(start=1, stop=250, num=10)],
        }
    },
    'D-tree': {
        'model': DecisionTreeClassifier(),
        'params': {
            'criterion': ['gini', 'entropy'],
            'splitter': ['best', 'random'],
            # min_samples_split must be at least 2
            'min_samples_split': [2, 5, 10, 12],
            'min_samples_leaf': [1, 2, 5, 10, 12],
            # 'auto' (same as 'sqrt' for classifiers) was removed in scikit-learn 1.3
            'max_features': ['sqrt'],
            'ccp_alpha': [0.030, 0.035, 0.040, 0.045, 0.050],
        }
    },
    'SVM': {
        'model': SVC(),
        'params': {
            'C': [0.25, 0.50, 0.75, 1.0],
            'tol': [1e-10, 1e-5, 1e-4, 0.025, 0.50, 0.75],
            'kernel': ['linear', 'poly', 'sigmoid', 'rbf'],
            'max_iter': [int(x) for x in np.linspace(start=1, stop=250, num=10)],
        }
    }
}
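# Run a randomized search for every model and record the best result
scores = []
for model_name, mp in params.items():
    clf = RandomizedSearchCV(mp['model'], param_distributions=mp['params'], cv=5,
                             n_jobs=-1, n_iter=10, scoring='accuracy', verbose=2)
    # The search has to be fitted before best_score_ exists. Fitting on the
    # scaled training data is an assumption, matching the cross-validation step.
    clf.fit(X_train_scaled, y_train)
    scores.append({
        'model_name': model_name,
        'best_score': clf.best_score_,
        'best_estimator': clf.best_estimator_
    })

scores_df = pd.DataFrame(data=scores, columns=['model_name', 'best_estimator', 'best_score'])
scores_df
for i in scores_df['best_estimator']:
    print(i)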
# Candidate models with the selected hyperparameters
rf = RandomForestClassifier(ccp_alpha=0.03, max_depth=22, min_samples_leaf=12,
                            min_samples_split=5, n_estimators=800)
lr = LogisticRegression(C=0.75, max_iter=194, multi_class='ovr', penalty='l1',
                        solver='liblinear')
svc = SVC(C=0.5, kernel='poly', max_iter=139, tol=1e-10)

# 20-fold cross-validation on the scaled training data
rf_val = cross_val_score(estimator=rf, X=X_train_scaled, y=y_train, cv=20, n_jobs=-1)
lr_val = cross_val_score(estimator=lr, X=X_train_scaled, y=y_train, cv=20, n_jobs=-1)
svc_val = cross_val_score(estimator=svc, X=X_train_scaled, y=y_train, cv=20, n_jobs=-1)
score_list = [rf_val, lr_val, svc_val]
model_name = ['rf', 'lr', 'svc']
for i, j in zip(score_list, model_name):
    print(f' Model : {j} gave {i.mean()} accuracy')

svc.fit(X_train_scaled, y_train)
svc.score(X_train_scaled, y_train)

rf.fit(X_train, y_train)
rf.score(X_train, y_train)

# Final model: SVM refitted on the unscaled features
svc.fit(X_train, y_train)
svc.score(X_train, y_train)
svc.score(X_test, y_test)

# Confusion matrix and per-class metrics on the test set
y_pred = svc.predict(X_test)
cn = metrics.confusion_matrix(y_test, y_pred)
sn.heatmap(cn, annot=True, linecolor='red', linewidths=2, cmap='plasma')
print(metrics.classification_report(y_test, y_pred))
train.shape, temp.shape

# Predict a label for every row of the full feature set
train = np.array(train)
predict_list = svc.predict(train)
temp.head()

original_labels = ['rice', 'maize', 'chickpea', 'kidneybeans', 'pigeonpeas',
                   'mothbeans', 'mungbean', 'blackgram', 'lentil', 'pomegranate',
                   'banana', 'mango', 'grapes', 'watermelon', 'muskmelon', 'apple',
                   'orange', 'papaya', 'coconut', 'cotton', 'jute', 'coffee']
# Class labels start at 1, in the same order as the mapper used by the web app
labels_map_new = {i + 1: original_labels[i] for i in range(len(original_labels))}
labels_map_new
temp['Original_labels'] = temp['label'].map(labels_map_new)
temp.head()
temp['SVM_pred'] = predict_list
temp['Predicted_labels'] = temp['SVM_pred'].map(labels_map_new)
temp.head()

# Compare the distribution of true and predicted labels
sn.countplot(data=temp, x='Original_labels')
sn.countplot(data=temp, x='Predicted_labels')
temp['Predicted_labels'].value_counts()
temp['Original_labels'].value_counts()

# Rows where the prediction differs from the true label
a = temp[temp['Original_labels'] != temp['Predicted_labels']].style.background_gradient('plasma')

# Count the misclassified rows
tru = list(temp['Original_labels'].values.flatten())
predict = list(temp['Predicted_labels'].values.flatten())
count = 0
for i, j in zip(tru, predict):
    if i != j:
        count += 1
print(f'Total values after preprocessing : {temp.shape[0]}\nMisclassified values are : {count}')

# 21st label is jute!
data = np.array([[83, 45, 60, 28, 70.3, 7.0, 150.9]])
prediction = svc.predict(data)
pred = prediction[0]
print(labels_map_new[pred])

data = np.array([[104, 18, 30, 23.603016, 60.3, 6.7, 140.91]])
prediction = svc.predict(data)
pred = prediction[0]
print(labels_map_new[pred])

# Saving the model
import pickle

with open('crop_classification_model.pkl', 'wb') as svm_model_pkl:
    pickle.dump(svc, svm_model_pkl)

# Copy of the model that the Flask application loads
with open('cropmodel2.pkl', 'wb') as file:
    pickle.dump(svc, file)
CHAPTER 8
TESTING
8.1 TESTING INTRODUCTION
Software testing is the process of verifying and validating that a
software application is free of defects, meets the technical requirements set by its design
and development, and meets the user requirements effectively and efficiently by handling all
exceptional and boundary cases.
The process of software testing aims not only at finding faults in the existing software
but also at finding measures to improve the software in terms of efficiency, accuracy, and
usability. It mainly aims at measuring the specification, functionality, and performance of a
software program or application.
Need for Testing
Testing is essential for the following reasons:
▪ To detect program defects or inadequacies.
▪ To confirm that the software behaves as intended by its designer.
▪ To check conformance with the requirement specification and user needs.
▪ To assess the operational reliability of the system.
▪ To reflect the frequency of actual user inputs.
▪ To find the fault that caused an output anomaly.
▪ To detect flaws and deficiencies in the requirements.
▪ To check whether the software is operationally useful.
▪ To exercise the program using data like the real data processed by the program.
8.1.1 Unit Testing
Unit testing is a software testing method where individual units or components of a
software application are tested in isolation from the rest of the system. A unit refers to the
smallest testable part of an application, such as a function, method, or class. The purpose of
unit testing is to validate that each unit of the software performs as expected and to identify
and fix any defects or bugs early in the development process.
Unit tests are typically automated and run in a testing framework that allows
developers to create, run, and analyse the results of tests. Unit tests should be independent,
meaning that they should not rely on other units or external resources, and should be
repeatable and predictable.
By performing unit testing, developers can ensure that each unit of their code is
functioning as intended and that any issues are caught early in the development process, when
they are easier and less expensive to fix. This approach can also improve the overall quality
and reliability of the software, as well as make it easier to maintain and modify over time.
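As an illustration of such an isolated test, the sketch below checks a small helper that picks the fertilizer advice key from nutrient differences. The helper `pick_fertilizer_key` and its dictionary arguments are hypothetical names introduced here for illustration; the application itself performs this step inline in its `predict` route.

```python
import unittest


def pick_fertilizer_key(required, actual):
    """Return a key such as 'NHigh' or 'Klow' for the nutrient that deviates most.

    Hypothetical helper: `required` and `actual` are dicts with keys 'N', 'P', 'K'.
    """
    diffs = {name: required[name] - actual[name] for name in ('N', 'P', 'K')}
    worst = max(diffs, key=lambda name: abs(diffs[name]))
    # A negative difference means the soil holds more than the crop requires
    return worst + ('High' if diffs[worst] < 0 else 'low')


class PickFertilizerKeyTest(unittest.TestCase):
    def test_excess_nitrogen(self):
        key = pick_fertilizer_key({'N': 80, 'P': 40, 'K': 40},
                                  {'N': 120, 'P': 42, 'K': 38})
        self.assertEqual(key, 'NHigh')

    def test_potassium_deficit(self):
        key = pick_fertilizer_key({'N': 80, 'P': 40, 'K': 40},
                                  {'N': 78, 'P': 41, 'K': 10})
        self.assertEqual(key, 'Klow')
```

Saved in a file, these tests can be run with `python -m unittest`. Because the helper depends on neither Flask nor the trained model, the tests stay independent and repeatable.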
8.1.2 Integration Testing
Integration testing is a software testing method that involves testing the interaction
between different components or modules of an application to ensure that they work together
as expected. This type of testing is performed after unit testing and before system testing.
Integration testing can be performed using different approaches, such as top-down,
bottom-up, or a combination of both. In top-down integration testing, the higher-level
modules are tested first, followed by the lower-level modules. In contrast, bottom-up
integration testing tests the lower-level modules first, followed by the higher-level modules.
In both approaches, the goal is to ensure that the integration between the modules is seamless
and that the system as a whole functions correctly.
By performing integration testing, developers can identify any issues that may arise
when different components of the application are combined, and ensure that the system
functions as intended. This can help reduce the risk of errors or bugs in the final product,
improve the overall quality and reliability of the software, and ensure that the system meets
the requirements and expectations of end-users.
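A minimal sketch of a top-down integration check is shown below, assuming a stub stands in for the trained classifier. The names `StubModel` and `recommend_crop` are hypothetical and only illustrate how form parsing, prediction, and label mapping can be exercised together before the real model is plugged in.

```python
class StubModel:
    """Stands in for the trained classifier; always predicts class 21."""

    def predict(self, rows):
        return [21 for _ in rows]


def recommend_crop(model, mapper, form):
    """Convert form strings to floats, run the model, and map the label to a name."""
    order = ['nitrogen', 'phosphorus', 'potassium',
             'temperature', 'humidity', 'ph', 'rainfall']
    features = [float(form[name]) for name in order]
    label = model.predict([features])[0]
    return mapper[label].capitalize()


# Form values arrive as strings, as they would from an HTML form
form = {'nitrogen': '83', 'phosphorus': '45', 'potassium': '60',
        'temperature': '28', 'humidity': '70.3', 'ph': '7.0', 'rainfall': '150.9'}
result = recommend_crop(StubModel(), {21: 'jute'}, form)
print(result)
```

Replacing `StubModel` with the unpickled classifier turns the same check into a test of the full prediction path.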
8.1.3 User Interface Testing
User Interface (UI) testing is a software testing method that focuses on testing the user
interface or the front-end of a software application to ensure that it functions correctly and
meets the requirements of end-users. The purpose of UI testing is to verify that the graphical
user interface (GUI) elements such as buttons, menus, icons, and other visual components of
the application work as intended, are displayed correctly, and are responsive to user input.
UI testing can be performed manually or with the help of automated testing tools. In
manual UI testing, testers perform tests by using the software application and interacting with
the user interface to verify that it behaves as expected. Automated UI testing involves using
software tools to simulate user interactions and validate that the UI elements are displayed
correctly, and that they respond to user input in the expected manner.
Some common UI testing scenarios include verifying that the application is responsive
to different screen resolutions and orientations, checking that the user interface elements are
aligned correctly, testing that the application is compatible with different browsers, and
ensuring that the user interface is accessible to users with disabilities.
By performing UI testing, developers can ensure that the user interface of their software
application functions correctly and meets the expectations of end-users. This can help
improve the user experience, reduce the risk of errors or bugs, and increase the overall quality
and reliability of the software application.
8.2 Test Results
Table 2: Test Results.
CHAPTER 9
OUTPUT SCREENS
9 OUTPUT SCREENS
Home Page
Predict Page:
INPUT 1:
OUTPUT 1:
INPUT 2:
OUTPUT 2:
INPUT 3:
OUTPUT 3:
CHAPTER 10
CONCLUSION
10 CONCLUSION
Agriculture is the backbone of many countries, including India. Integrating
information technology with agriculture can guide farmers in improving productivity.
The proposed system predicts suitable crops for a field quickly and accurately,
using various soil parameters to select the crop.
These predictions help farmers improve the productivity, growth,
and quality of their plants.
CHAPTER 11
REFERENCES
11 REFERENCES
1. Sk Al Zaminur Rahman, S. M. Mohidul Islam, Kaushik Chandra Mitra, “Soil
Classification using Machine Learning Methods and Crop Suggestion Based on Soil
Series,” 2018 21st International Conference of Computer and Information Technology
(ICCIT), 21-23 December 2018.
2. S. Panchamurthi, M. D. Perarulalan, A. Syed Hameeduddin, P. Yuvaraj, “Soil
Analysis and Prediction of Suitable Crop for Agriculture using Machine Learning,”
International Journal for Research in Applied Science & Engineering
Technology (IJRASET).
3. D. Ramesh, B. Vishnu Vardhan, “Data Mining Techniques and Applications to Agricultural
Yield Data,” International Journal of Advanced Research in Computer and
Communication Engineering, Vol. 2, Issue 9, September 2013.
4. Rakesh Kumar, M. P. Singh, Prabhat Kumar, J. P. Singh, “Crop Selection Method to
Maximize Crop Yield Rate using Machine Learning Technique,” 2015 International
Conference on Smart Technologies and Management for Computing, Communication,
Controls, Energy and Materials (ICSTM), Vel Tech Rangarajan Dr. Sagunthala R&D
Institute of Science and Technology, Chennai, T.N., India, 6-8 May 2015, pp. 138-145.
5. Ayush Shah, Akash Dubey, Vishesh Hemnani, Divye Gala, D. R. Kalbande, “Smart
Farming System: Crop Yield Prediction Using Regression Techniques,” in H. Vasudevan
et al. (eds.), Proceedings of International Conference on Wireless Communication,
Lecture Notes on Data Engineering and Communications Technologies 19,
Springer Nature Singapore Pte Ltd., 2018.
6. S. Kanaga Suba Raja, Rishi R., Sundaresan E., Srijit V., “Demand Based Crop
Recommender System for Farmers,” 2017 IEEE International Conference on
Technological Innovations in ICT for Agriculture and Rural Development (TIAR 2017),
978-1-5090-4437-5/17/$31.
PREDICTION OF CROP BASED ON
SOIL PROPERTIES
S. Keerthi, T. Esther Rani, B. Vivek, V. Sasidhar Reddy
Prof V RAJEEV JETSON M.Tech (Ph.D), Professor, Department of CSE, KHIT, Guntur
ABSTRACT
Agriculture, one of India's most important and prevalent occupations, plays a crucial part in the
growth of our nation. Improving crop output is therefore a significant concern,
since 60 percent of the nation's territory is used for agriculture to feed its 1.2 billion inhabitants.
Given a plot of land, the basic question is which crop can be grown on it, and the
properties of the soil are central to that decision. Crop production is challenging because it
depends on many variables, including soil type, temperature, and humidity. Due to
changing climatic conditions, food output is declining and becoming harder to forecast, which
lowers yields, harms farmers' incomes, and makes it harder for farmers to
plan future crops. Farmers and the other parties involved would benefit greatly from being able
to identify a suitable crop before sowing, so that they can make informed choices regarding storage and business
operations. By analysing the agricultural area based on the properties of the soil and advising
farmers on the best crop to grow, the proposed project would help them significantly boost output
and reduce losses. In our research, we develop a recommendation system that uses machine learning
methods such as logistic regression and support vector machines, among others, to suggest the best crops based
on the input soil parameters. The crop data is gathered along with the
conditions the crops require, such as temperature, humidity, and moisture content, which supports
successful crop growth. Thus, this method lessens the financial losses that farmers experience as a
result of planting unsuitable crops. It also helps farmers discover new crop varieties that can be
grown in their region.
Keywords: Machine Learning, Crop Prediction, Soil Properties.
1. INTRODUCTION
In the era of Industry 4.0, also known as the Fourth Industrial Revolution, agriculture remains the
primary driver of the Indian economy. Agriculture has long been regarded as one of the
primary activities carried out in India, and it employs 50% of the labour
force. India is the top producer of a number of commodities.
The fundamental component of agriculture is the soil. However,
farmers still employ traditional techniques today, and their inability to obtain satisfactory
results with these techniques means that crop production is not growing. Good
soil quality is necessary to boost crop yields, since crop quality and output depend heavily
on the soil. The soil properties relevant to agriculture include nutrients such
as nitrogen (N), phosphorus (P), and potassium (K). We were inspired to create this method
in order to assist farmers in selecting the crop that should be grown for their benefit. The
dataset includes the nutrients available in the farmers' soil (N, P, K) together with
humidity, rainfall, pH, and temperature. The crop that can grow in a specific soil is predicted
Volume 13, Issue 03, Mar 2023 ISSN 2457-0362 Page 364
using a method that takes into account the soil type. As a result, the crop yield rises and the
farmer earns more. The system is built using machine learning, which focuses
on developing software applications that can acquire data and learn from it. Machine
learning enables the creation of models from sample data as well as the ability to make
decisions autonomously based on prior knowledge.
1.1 Prediction of Crop
We were inspired to create this method in order to assist farmers in selecting the crop to
cultivate for their benefit. The dataset consists of the nutrients available in the farmer's soil. Our system
predicts the crop from the nutrient values, that is, it predicts the
types of crops that will thrive in a given soil. As a result, the crop yield rises and the farmer
earns more. The system is built using machine learning.
2. LITERATURE SURVEY
The study in [1] performs exploratory data analysis and considers various predictive model designs.
Different regression techniques are applied to a sample data set to identify and examine each
property. To determine the best crop to cultivate, various algorithms were applied to the
data, including K-Nearest Neighbors, Naive Bayes, and KNN with cross-validation [1].
The system that was created suggested the crop that would grow best on a specific plot of
land based on soil composition and environmental variables such as rainfall, temperature, humidity,
and pH. Support Vector Machine (SVM) and Decision Tree algorithms are used to find patterns in
the input data and process it in accordance with the input conditions.
The system suggested a crop for the farmer as well as how much fertilizer should be added for the
anticipated produce. Other requirements for the system included displaying the estimated yield in
q/acre, the amount of seed needed for cultivation in kg/acre, and the crop's market price [2].
The paper in [3] offers a method for smart agriculture through field monitoring, which can greatly
help farmers increase output. It uses machine learning and prediction algorithms such as multiple
linear regression to find patterns in the data and then process it in accordance
with the input conditions [3].
The objective of [4] is to propose and implement a rule-based system to forecast crop
yield using historical data. This was accomplished by applying association rule mining to
agricultural data from 2000 to 2012 [4].
The main goal of [5] is to develop a prediction model that can be used to predict the crop's
maximum yield rate before it is sown. A machine learning algorithm is applied to the data to
estimate the yield rate of crops based on the farmer's state, district, season, land area, and crop type
[5].
According to the literature, 60% of India's territory is used for agriculture in order to feed its 1.3
billion inhabitants, and the population is growing daily. Agriculture must therefore be
modernised in order to benefit farmers in our nation and address many of their issues.
Farmers in the current setup have no access to technology or analysis. In the traditional
system they rely on trial and error: a farmer experiments on the land with various crops,
water availability, and so on, and only after numerous such attempts achieves the anticipated
crop output.
Numerous papers have surveyed the problem while taking various factors into account. Some
methods aid in crop selection, but no system is perfect.
In some papers, crop yield predictions based on climatic input parameters are made using data mining
methods. However, predicting crop output solely on the basis of climatic factors is insufficient.
Some survey studies have analysed the different machine learning algorithms that can be used to
predict crops.
There are numerous review articles on crop prediction that outline various prediction algorithms, but
at this time no such system is available to farmers in practice. It is therefore necessary to put such
a system in place so that farmers can benefit from it.
3. PROPOSED SYSTEM
Implementation methodology
Implementation Steps:
• Data Collection
• Data Pre-processing
• Training and Testing Data
• Result and analysis
Fig-1: Proposed approach

3.1 Data collection:
One of the initial steps we perform during implementation is data analysis. We carried out this
analysis to check for correlations between the different dataset features. The accuracy of any machine
learning method depends on the number of features and the validity of the training dataset.
After examining a variety of datasets from the Kaggle website, we carefully selected the
features that would yield the best results. Environmental factors have been used in many studies on
this topic to forecast crop sustainability; some have focused primarily on yield, while others have only
considered economic factors. In order to provide the farmer with an accurate and reliable recommendation on
which crop would be best for his land, we combined climatic factors such as rainfall and temperature
with soil parameters such as soil pH and soil nutrients. We import the dataset using the read_csv()
function from the pandas package.
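The loading and correlation check described above can be sketched as follows. To keep the example self-contained it uses only the Python standard library rather than pandas, and the four data rows are made up for illustration; the column names are assumed to match the dataset's layout (N, P, K, temperature, humidity, ph, rainfall, label).

```python
import csv
import io
import math

# Made-up rows in the assumed column layout of the dataset (illustrative only)
SAMPLE = """N,P,K,temperature,humidity,ph,rainfall,label
90,42,43,20.8,82.0,6.5,202.9,rice
85,58,41,21.7,80.3,7.0,226.6,rice
71,54,16,22.6,63.7,5.7,87.8,maize
61,44,17,26.1,71.6,6.9,102.3,maize
"""


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Read the CSV text into a list of dictionaries, one per row
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
n_values = [float(r['N']) for r in rows]
k_values = [float(r['K']) for r in rows]
corr_nk = pearson(n_values, k_values)
print(f'correlation(N, K) = {corr_nk:.3f}')
```

With pandas, the same check is usually a call to `DataFrame.corr()` on the frame returned by `read_csv()`.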
Fig-2: dataset of crop recommendation System
3.2 Data Preprocessing:
Real-world data often contains noise and missing values, or is in a format that
prevents it from being used directly in machine learning models. Data preprocessing cleans the
data and makes it suitable for a machine learning model, which increases the model's efficacy
and accuracy, so it is a crucial stage. Preprocessing is primarily concerned with
resolving any missing data as well as removing any outliers or inaccurate data. There are two methods
to handle gaps in the data. The first is to remove the complete row that contains the
inaccurate or missing data. Although this technique is straightforward to use, it works best with
sizable datasets, where discarding a few rows loses little information. The second is to fill in the
missing value with an estimate, such as the mean of the column.
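The sketch below contrasts two common ways of handling missing values: dropping incomplete rows and filling the gap with the column mean. Mean imputation is an assumption here, chosen as a typical second option; the function names and sample rows are hypothetical.

```python
def drop_incomplete(rows):
    """Keep only the rows without missing (None) values."""
    return [row for row in rows if None not in row.values()]


def impute_mean(rows, column):
    """Return copies of the rows with missing `column` values set to the mean."""
    observed = [row[column] for row in rows if row[column] is not None]
    mean = sum(observed) / len(observed)
    return [dict(row, **{column: mean if row[column] is None else row[column]})
            for row in rows]


rows = [
    {'N': 90, 'ph': 6.5},
    {'N': 85, 'ph': None},   # missing pH reading
    {'N': 71, 'ph': 5.5},
]
dropped = drop_incomplete(rows)      # two complete rows remain
imputed = impute_mean(rows, 'ph')    # the gap is filled with the mean pH
print(len(dropped), imputed[1]['ph'])
```

Dropping rows is simpler but shrinks the dataset, whereas imputation keeps every row at the cost of introducing estimated values.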
3.3 Training and Testing Data:
We used several machine learning methods to obtain accurate results, because the proposed model needs
to be trained and tested in a variety of scenarios. We train the model so that it can predict the
crop to be grown from the provided parameters, such as environmental variables
and soil nutrients. We fit the model to the X, Y training values
and make predictions on the X test data.
We ran 100 training epochs on the model. The best model is the one with the lowest loss, and this
model is used for testing and assessment.
Fig-3: Splitting of dataset.
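The split, fit, and predict sequence can be sketched with scikit-learn as shown below. Several details are assumptions for the sake of illustration: the text does not state the split ratio, so an 80/20 split is used; synthetic data from make_classification stands in for the crop dataset; and a random forest is used as the example model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the crop data: 7 features, 3 classes (assumed sizes).
X, y = make_classification(
    n_samples=200, n_features=7, n_informative=5,
    n_redundant=0, n_classes=3, random_state=0,
)

# Hold out 20% of the rows for testing (assumed ratio); stratify keeps class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Fit on the training values, then forecast on the held-out test features.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("train rows:", len(X_train), "test rows:", len(X_test))
```

Keeping the test rows out of the fitting step is what makes the later accuracy figure an estimate of performance on unseen data.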
3.4 Result and analysis:
The forecast outcomes are evaluated using the accuracy metric.
Accuracy:
Accuracy is the sum of true positives (TP) and true negatives (TN) divided by the total number of
predictions, that is, true positives, true negatives, false positives (FP), and false negatives (FN)
together. It estimates how close the predictions are to the actual values:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
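As a worked example, accuracy is the number of correct predictions (true positives plus true negatives) divided by all predictions. The counts below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Return the fraction of predictions that were correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    return (tp + tn) / total


# Hypothetical counts: 45 + 40 correct out of 100 predictions.
print(accuracy(tp=45, tn=40, fp=5, fn=10))  # 0.85
```

For a multi-class task such as crop prediction, the same idea applies: accuracy is the number of correctly predicted samples divided by the total number of samples.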
4. ALGORITHMS
Classification using Random Forest:

Journal of Engineering Sciences: Algorithms
1. Select k features at random from all m features in the model, where k < m.
2. Among the k features, compute the node d using the best split point.
3. Split the node into daughter nodes using the best split.
4. Repeat steps 1 through 3 until the required number of nodes has been reached.
5. Build the forest by repeating steps 1 through 4 n times to create n trees.
To forecast using the trained random forest, the method uses the following pseudo code:

1. Take the test features, use each randomly created decision tree to predict the outcome, and store
the predicted outcome.
2. Calculate the votes for each predicted outcome across the decision trees.
3. Take the outcome with the most votes as the final prediction of the random forest.
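The construction and voting steps can be sketched by hand as follows. This is an illustrative sketch rather than the project's implementation: it uses scikit-learn's DecisionTreeClassifier for the individual trees, synthetic data in place of the crop dataset, and bootstrap sampling of the training rows, which is standard for random forests but is an assumption here because the steps above do not mention it. The names forest_predict, n_trees, and k are hypothetical.

```python
from collections import Counter

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the crop data: m = 7 features, 3 classes.
X, y = make_classification(
    n_samples=300, n_features=7, n_informative=5,
    n_redundant=0, n_classes=3, random_state=0,
)
X_train, y_train = X[:240], y[:240]
X_test, y_test = X[240:], y[240:]

rng = np.random.default_rng(0)
n_trees = 25  # a finite number of trees
k = 3         # features considered at each split, k < m

forest = []
for i in range(n_trees):
    # Assumption: each tree is trained on a bootstrap sample of the rows.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # max_features=k makes the tree pick from k random features at every split.
    tree = DecisionTreeClassifier(max_features=k, random_state=i)
    tree.fit(X_train[idx], y_train[idx])
    forest.append(tree)


def forest_predict(trees, x):
    """Collect one vote per tree and return the most common class."""
    votes = [int(tree.predict(x.reshape(1, -1))[0]) for tree in trees]
    return Counter(votes).most_common(1)[0][0]


predictions = np.array([forest_predict(forest, x) for x in X_test])
print("forest accuracy:", (predictions == y_test).mean())
```

In practice, sklearn.ensemble.RandomForestClassifier packages the same idea, although it combines trees by averaging their predicted class probabilities rather than by counting hard votes.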
K-Nearest Neighbour (KNN) algorithm:
1. The K-Nearest Neighbour (KNN) algorithm is a supervised learning technique and one of the simplest
machine learning methods.
2. The K-NN algorithm stores all the available data and classifies a new data point based on its
similarity to the stored points.
3. This means that when new data appears, the K-NN algorithm can often classify it quickly into a
suitable category.
4. K-NN is most often used for classification problems, although it can also be applied to
regression problems.
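A minimal sketch of K-NN classification in plain Python is given below. The function name knn_predict, the choice of Euclidean distance, k = 3, and the sample (pH, rainfall) points are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest stored points."""
    # Euclidean distance from the query to every stored point.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical (soil pH, rainfall in mm) samples with their crops.
train_points = [(6.5, 200.0), (6.8, 220.0), (6.4, 210.0),
                (5.8, 90.0), (6.0, 100.0), (5.9, 85.0)]
train_labels = ["rice", "rice", "rice", "maize", "maize", "maize"]

print(knn_predict(train_points, train_labels, (6.6, 205.0)))  # rice
print(knn_predict(train_points, train_labels, (5.9, 95.0)))   # maize
```

Because the distance is computed on raw values, features with large ranges such as rainfall dominate the result, so scaling the features first is usually advisable.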
Decision Tree Algorithm:
1. Decision Tree belongs to the family of supervised learning algorithms. Decision tree algorithms
can solve both classification and regression problems.
2. The internal nodes of the tree represent tests on attributes, and each leaf node corresponds to a
class label.
3. Each test in the tree has a binary outcome: yes or no.
4. If further splitting is needed after a test, the branch leads to another sub-tree; otherwise, the
process halts and the node becomes a leaf node.
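The yes/no traversal from internal nodes down to a leaf can be sketched with a small hand-built tree. The tree below is purely hypothetical: its features, thresholds, and crops were chosen for illustration and were not learned from the project's data.

```python
# Hypothetical hand-built tree: internal nodes test one attribute against a
# threshold, and leaves hold a class label.
tree = {
    "feature": "rainfall", "threshold": 150.0,
    "yes": {"leaf": "rice"},
    "no": {
        "feature": "ph", "threshold": 6.5,
        "yes": {"leaf": "chickpea"},
        "no": {"leaf": "maize"},
    },
}


def classify(node, sample):
    """Follow yes/no answers from the root until a leaf node is reached."""
    while "leaf" not in node:
        answer = sample[node["feature"]] > node["threshold"]
        node = node["yes"] if answer else node["no"]
    return node["leaf"]


print(classify(tree, {"rainfall": 200.0, "ph": 6.0}))  # rice
print(classify(tree, {"rainfall": 90.0, "ph": 7.2}))   # chickpea
print(classify(tree, {"rainfall": 90.0, "ph": 5.8}))   # maize
```

A learning algorithm such as scikit-learn's DecisionTreeClassifier chooses the features and thresholds automatically from the training data instead of relying on hand-written rules.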
Support Vector Machine (SVM):
The Support Vector Machine (SVM) is a supervised machine learning method. Although SVM can be
used for both classification and regression, it is primarily used for classification. We also used it
because of the high accuracy it offers. In this method, each data item is represented as a point in
an N-dimensional space, where N is the number of features, and a hyperplane is built to divide the
points into classes. The hyperplane is then used to perform classification: it separates the data
into two classes, positive and negative.
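A small sketch of a separating hyperplane, using scikit-learn's SVC with a linear kernel. The six two-dimensional points and their positive and negative labels are hypothetical and chosen so that the two groups are clearly separable.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical 2-D points forming two linearly separable groups.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
              [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# With a linear kernel, the hyperplane is the set of points where w . x + b = 0.
w, b = clf.coef_[0], clf.intercept_[0]

new_points = np.array([[2.0, 2.0], [6.0, 7.0]])
print(clf.predict(new_points))        # class assigned to each new point
print(np.sign(new_points @ w + b))    # side of the hyperplane each point falls on
```

The predicted class matches the sign of w . x + b, which is what classifying by the side of the hyperplane means. For problems with more than two classes, such as crop prediction, SVC combines several such binary classifiers internally.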
5. RESULTS
Fig-4: Home Page
Fig-5: Soil details form
Fig-6: Crop Predicted
Fig-7: Soil details form
Fig-8: Crop Predicted

6. CONCLUSION:
India is one of many nations whose economy is based primarily on agriculture, and integrating
information technology into agriculture will help farmers increase output. The system outlined in
this work operates more quickly and provides better prediction accuracy when determining the best
crop for an area. It analyzes a number of soil parameters to recommend a crop. This forecast helps
farmers increase growth and output.

