A Model To Predict The Crop Based On Soil Properties Using Machine Learning
Submitted in partial fulfillment of the requirement for the award of the degree in
BACHELOR OF TECHNOLOGY
In
Submitted By
S KEERTHI 198X1A05E0
T ESTHER RANI 198X1A05E8
B VIVEK 198X1A05I7
V SASIDHAR REDDY 198X1A05G2
CERTIFICATE
This is to certify that the project report entitled A Model to Predict the Crop Based on Soil
S KEERTHI 198X1A05E0
T ESTHER RANI 198X1A05E8
B VIVEK 198X1A05I7
V SASIDHAR REDDY 198X1A05G2
in partial fulfillment for the award of the Degree of Bachelor of Technology in Computer
The result embedded in this thesis has not been submitted to any other university /institute
EXTERNAL EXAMINER
DECLARATION
I/We hereby declare that the work described in the project report, entitled
“A Model to Predict the Crop based on Soil Properties using Machine Learning”
which is submitted by us in partial fulfilment for the award of Bachelor of Technology
in the Department of Computer Science and Engineering, KHIT, Andhra Pradesh is
a record of original and independent research work done by us during the academic year
2022–2023 under the supervision of Prof. V. Rajeev Jetson. The work is original and
has not been submitted for the award of any Degree or Diploma of Associateship or
Fellowship or any other similar title to this or any other university.
S KEERTHI 198X1A05E0
B VIVEK 198X1A05I7
We are profoundly grateful and wish to express our deep sense of gratitude and respect to
our honorable chairman, Sri KALLAM MOHAN REDDY, Chairman of Kallam group,
for his precious support in the college.
We are greatly indebted to Prof. V. Rajeev Jetson, Professor and Head of the
Department, Computer Science and Engineering, KHIT, GUNTUR, for providing the
laboratory facilities as and when required and for giving us the opportunity to
carry out the project work in the college.
We extend our deep sense of gratitude to our Internal Guide, Prof. V. Rajeev
Jetson, Professor and Head of the Department, and to the other Faculty Members and support staff
for their valuable suggestions, guidance and constructive ideas at every step,
which were indeed of great help towards the successful completion of our project.
Team Members
S KEERTHI 198X1A05E0
B VIVEK 198X1A05I7
Agriculture is one of the most essential and widely practiced occupations in India, and it plays
a vital role in the development of our country. Around 60 percent of the total land in the
country is used for agriculture to meet the needs of 1.2 billion people, so improving crop
production is a significant concern. Given a piece of land, a farmer needs to know which
kind of crop can be grown on it. Crop production is a difficult task because it depends on
various factors such as soil properties, soil type, temperature and humidity. Nowadays, food
production is declining and is becoming harder to predict because of unnatural climatic changes.
Poor yields adversely affect the income of farmers and leave them less able to
forecast which crops to grow in the future. If the right crop can be identified before sowing,
it would be of great help to farmers and to the other people involved in making appropriate
decisions on the storage and business side. The proposed project addresses these
problems by assessing the agricultural area based on its soil properties and recommending the
most appropriate crop to farmers, thereby helping them to increase
productivity and reduce loss. Our project is a recommendation system that uses
machine learning techniques such as Logistic Regression and SVM to
recommend suitable crops based on the input soil parameters. The seed data of the
crops are collected together with parameters such as temperature, humidity and
moisture content, which help the crops to achieve successful growth. The system thus
reduces the financial losses that farmers face from planting the wrong crops, and
it also helps farmers to find new types of crops that can be cultivated in their area.
CHAPTER 1
INTRODUCTION
1 INTRODUCTION
Agriculture is the mainstay of the Indian economy and has been one of the main occupations
practiced in India since olden days. About 50% of India's workforce
is involved in agricultural activities, and India is the leading producer of a few crops.
Soil is the most basic resource in agriculture. However, many farmers still use traditional
methods, and as a result they do not get satisfactory results: the quantity of crops produced is not increasing.
Increasing the quantity of crops requires good-quality soil. The production and
quality of crops depend heavily on the soil. Soil quality in agriculture includes the soil
properties related to organic matter and nutrients, such as N (Nitrogen), C (Carbon), P (Phosphorus),
Mg (Magnesium), Ca (Calcium) and K (Potassium).
We were motivated to build this system to help farmers decide which crop to sow for their
benefit. The dataset consists of the nutrients available in the farmers' soil. Based on
the nutrient values, our system predicts the soil type. According to the soil type, the system predicts a list
of crops that can grow in that soil. As a result, the crop yield increases and the
farmer earns more money with this new method. We build the system using
machine learning. Machine learning
concentrates on the creation of computer programs that can access data and use it to learn
for themselves. It allows models to be built from sample data and gives them the ability to
make decisions automatically based on past experience.
1.2 SIGNIFICANCE OF THE PROJECT
Crop prediction has been a popular research problem for years, because traditional crop
prediction depends only on climatic conditions and does not satisfy farmers' needs.
Accurate crop prediction is important for farmers and for the agriculture industry. Soil
properties play a significant role in choosing the crop to be sown so as to increase the
yield.
The proposed model predicts the crop using the soil properties. Farmers
can consider soil nutrients such as N (Nitrogen), C (Carbon), P (Phosphorus),
Mg (Magnesium) and Ca (Calcium) in order to predict the crop.
In short, this crop prediction can help farmers to determine the crop that is suitable for their
soil and so increase their profit.
CHAPTER 2
LITERATURE SURVEY
2 LITERATURE SURVEY
2.1 LITERATURE SURVEY:
[ 1 ] Shriya Sahu, Meenu Chawla and Nilay Khare, “An Efficient Analysis of Crop Yield
Prediction using Hadoop Framework Based on Random Forest Approach”, IEEE .
In this paper, various parameters, from soil to atmosphere, are considered for predicting the
suitable crop. Soil parameters such as type, pH level, iron, copper, manganese, sulphur,
organic carbon, potassium, phosphate and nitrogen are considered. The random forest algorithm
is used to classify the dataset, and it gives good accuracy with a low error rate.
The framework can handle large datasets by processing them with the MapReduce programming
model. The phases of the proposed work are: Data Collection, Data Classification (Random
Forest Algorithm), Hadoop Framework (MapReduce programming model) and Final
Prediction. The implementation is carried out on Ubuntu 14.04 LTS with Hadoop 2.6.0, and the
dataset is collected from various online sources to predict the suitable crop.
[ 2 ] Rakesh Kumar, M.P. Singh, Prabhat Kumar and J.P. Singh, “Crop Selection
Method to Maximize Crop yield rate using Machine Learning Technique”.
This work presents a technique named CSM (Crop Selection Method) to select a sequence of crops to be planted over
a season. CSM may improve the net yield rate of the crops planted over the season. The
proposed method resolves the selection of crop(s) based on the predicted yield rate, which is influenced by
parameters such as weather, soil type, water density and crop type. The crop sowing table data
considered were gathered from farmers of Patna District, Bihar (India). The method takes the crops, their
sowing time, plantation days and predicted yield rate for the season as input and finds a
sequence of crops whose production per day is maximum over the season.
[ 3 ] Monali Paul, Santosh K. Vishwakarma and Ashok Verma. “Analysis of Soil
Behaviour and Prediction of Crop Yield using Data Mining Approach”.
In this work the experiments are performed using RapidMiner 5.3. Two important and well-known
classification algorithms, K-Nearest Neighbor (KNN) and Naive Bayes (NB), are
applied to a soil dataset taken from the soil testing laboratory in Jabalpur, M.P.
The soils are classified into low, medium and high categories in order to predict the
crop yield from the available dataset. This study can help soil analysts and farmers to decide
on which land sowing may result in better crop production.
[ 4 ] Renuka & Sujata Terdal, "Evaluation of Machine Learning Algorithms for Crop
Prediction"
Agriculture plays a major role in the growth of the national economy. It relies on weather
and other environmental aspects. Some of the factors on which agriculture depends are
soil, climate, flooding, fertilizers, temperature, precipitation, crops, pesticides and herbicides.
The crop yield depends on these factors and is therefore difficult to predict. To understand the status
of crop production, this work performs a descriptive study on agricultural
data using various machine learning techniques. Crop yield estimates
include estimating crop yields from available historical data such as precipitation
data, soil data and historic crop yields.
It has played a pivotal role in changing age-old agricultural practices. One may
witness development in the various methodologies and technologies being employed in the
agricultural system. On the contrary, the agriculture sector in India is losing
ground every day, which has affected the production capability of the system. There is a rising
need to solve the problem in this domain, to restore vibrancy and put it back on a higher
growth path.
[ 6 ] Arun Kumar & et al ,“Efficient Crop Yield Prediction Using Machine Learning
Algorithms”.
Descriptive analytics is the initial stage of analytics. It is a method by which we can
understand what happened in the past, and the past is the best predictor
of the future. In this research paper descriptive analytics is applied in the
agriculture production domain, for the sugarcane crop, to find an efficient crop yield
estimation. The paper uses three datasets: a soil dataset, a precipitation dataset
and a yield dataset. Several supervised techniques are applied to the combined dataset to find
the actual estimated cost and the accuracy of each technique. The three supervised techniques
used are K-Nearest Neighbor, Support Vector Machine and
Least Squares Support Vector Machine.
CHAPTER 3
SYSTEM ANALYSIS
3 SYSTEM ANALYSIS
the farmer, earn more money with this new method. We create the system with the help of
advanced technology. We use machine learning to create the system.
the soil.
2. Sowing the crop based on soil properties leads to an increase in the crop yield.
The accuracy of a machine learning algorithm may depend on the number of parameters
used and on the correctness of the dataset. Our dataset contains the N, P, K and pH
values of different kinds of soils as attributes, and the corresponding crops that
can be grown in each soil as the label.
Thus, by using an appropriate machine learning algorithm, we can train a model on the dataset to predict
the most suitable crop that can be grown under the given input parameters. The dataset used
in our project was obtained from Kaggle, is titled “Crop recommendation” and is a CSV
file. A Comma Separated Values (CSV) file is a delimited text file that uses commas to separate
values. Each line of the file is a data record. Each record consists of one or more fields,
separated by commas. The use of the comma as a field separator is the source of the name for
this file format. A CSV file typically stores tabular data (numbers and text) in plain text, in
which case each line will have the same number of fields. To use this dataset in
Python, we have to import the .csv file. To read the .csv
file in Python, we use the command:
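The command itself is not reproduced in this report. A minimal sketch of how such a file is commonly read, assuming the pandas library and the column names N, P, K, temperature, humidity, ph, rainfall and label (assumed from the features used in the source code of Chapter 7), might look as follows. The two inline rows are invented for illustration and stand in for the Kaggle file; `load_dataset` is a hypothetical helper name.

```python
import io

import pandas as pd

# Two invented rows standing in for the real CSV file (illustration only).
SAMPLE_CSV = """N,P,K,temperature,humidity,ph,rainfall,label
90,42,43,20.8,82.0,6.5,202.9,rice
20,67,20,18.9,16.9,7.1,80.0,chickpea
"""


def load_dataset(source):
    """Read the crop dataset from a file path or file-like object."""
    return pd.read_csv(source)


df = load_dataset(io.StringIO(SAMPLE_CSV))

# Split the table into input attributes and the crop label.
features = df.drop(columns=["label"])
labels = df["label"]

print(df.head())
```

With the real dataset, the call would take the path of the downloaded .csv file instead of the in-memory sample.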
3.4 Machine Learning Algorithm
Support Vector Machine Algorithm
Support Vector Machine (SVM) is one of the most popular supervised learning algorithms.
It is used for classification as well as regression problems, but primarily it is
used for classification problems in machine learning.
The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes, so that we can easily put a new data point in
the correct category in the future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme
cases are called support vectors, and hence the algorithm is termed Support Vector Machine.
Consider the below diagram in which there are two different categories that are classified
using a decision boundary or hyperplane:
Types of SVM
Linear SVM: Linear SVM is used for linearly separable data. If a dataset can
be classified into two classes by using a single straight line, then such data is termed linearly
separable data, and the classifier used is called a Linear SVM classifier.
Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If
a dataset cannot be classified by using a straight line, then such data is termed non-linear
data, and the classifier used is called a Non-linear SVM classifier.
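The difference between the two types can be illustrated with a small experiment. The sketch below assumes scikit-learn and uses its synthetic `make_circles` data (two concentric rings, which no straight line can separate); it is an illustration only and does not use the project dataset.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Synthetic data: an inner ring (one class) inside an outer ring (other class).
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Linear SVM: can only draw a straight-line boundary.
linear_clf = SVC(kernel="linear").fit(X, y)

# Non-linear SVM: the RBF kernel allows a curved boundary.
rbf_clf = SVC(kernel="rbf").fit(X, y)

linear_acc = linear_clf.score(X, y)
rbf_acc = rbf_clf.score(X, y)

print(f"linear kernel training accuracy: {linear_acc:.2f}")
print(f"rbf kernel training accuracy:    {rbf_acc:.2f}")
```

On data of this shape the non-linear classifier should fit the rings far better than the linear one, which is the situation the Non-linear SVM is intended for.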
Hyperplane and Support Vectors in the SVM algorithm:
Hyperplane:
There can be multiple lines/decision boundaries to segregate the classes in n-dimensional
space, but we need to find out the best decision boundary that helps to classify the data
points. This best boundary is known as the hyperplane of SVM.
The dimension of the hyperplane depends on the number of features in the dataset:
if there are 2 features (as shown in image), the hyperplane is a straight line, and if
there are 3 features, the hyperplane is a 2-dimensional plane.
We always create the hyperplane that has the maximum margin, which means the maximum
distance between the hyperplane and the nearest data points of each class.
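The idea of a margin can be made concrete with a short calculation. The sketch below is an illustration under stated assumptions: the four points and the two candidate lines are invented, and the helper names `distance_to_hyperplane` and `margin` are hypothetical. A hyperplane is written as w.x + b = 0, and the distance of a point x from it is |w.x + b| / ||w||.

```python
import math


def distance_to_hyperplane(w, b, x):
    """Perpendicular distance from point x to the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(dot + b) / norm


def margin(w, b, points):
    """Distance from the hyperplane to the nearest of the given points."""
    return min(distance_to_hyperplane(w, b, p) for p in points)


# Invented 2-D points: the first two belong to one class, the last two to the other.
points = [(1.0, 1.0), (2.0, 3.0), (4.0, 4.0), (5.0, 6.0)]

# Two candidate lines that both separate the classes.
margin_a = margin((1.0, 1.0), -6.0, points)  # line x1 + x2 = 6
margin_b = margin((1.0, 0.0), -3.0, points)  # line x1 = 3

print(f"margin of line A: {margin_a:.3f}")
print(f"margin of line B: {margin_b:.3f}")
```

Both lines classify the four points correctly, but SVM would prefer the one with the larger margin. The sketch only compares two hand-picked candidates; it does not search for the optimal hyperplane.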
Support Vectors:
The data points or vectors that are closest to the hyperplane and that affect the position
of the hyperplane are termed support vectors. Since these vectors support the hyperplane,
they are called support vectors.
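A fitted classifier exposes the support vectors it selected. The sketch below assumes scikit-learn, whose `SVC` stores them in the `support_vectors_` attribute; the four training points are invented for illustration.

```python
from sklearn.svm import SVC

# Four invented 2-D points, two per class.
X = [[1.0, 1.0], [2.0, 3.0], [4.0, 4.0], [5.0, 6.0]]
y = [0, 0, 1, 1]

# A large C approximates a hard margin on linearly separable data.
clf = SVC(kernel="linear", C=1000.0)
clf.fit(X, y)

# The training points that define the position of the hyperplane.
support_vectors = clf.support_vectors_.tolist()
predictions = clf.predict([[1.0, 2.0], [5.0, 5.0]]).tolist()

print("support vectors:", support_vectors)
print("predictions for (1, 2) and (5, 5):", predictions)
```

Only the points listed as support vectors influence the boundary; moving any other training point slightly would leave the hyperplane unchanged.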
Linear SVM:
The working of the SVM algorithm can be understood by using an example. Suppose we have
a dataset that has two tags (green and blue) and two features, x1 and x2. We
want a classifier that can classify the pair (x1, x2) of coordinates as either green or blue.
Consider the below image:
Since this is a 2-d space, we can easily separate these two classes using just a straight line.
But there can be multiple lines that can separate these classes. Consider the below image:
Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary
or region is called a hyperplane. The SVM algorithm finds the points from
both classes that lie closest to the boundary. These points are called support vectors. The distance between the support vectors and
the hyperplane is called the margin, and the goal of SVM is to maximize this margin. The
hyperplane with maximum margin is called the optimal hyperplane.
Non-Linear SVM:
If data is linearly arranged, then we can separate it by using a straight line, but for non-linear
data, we cannot draw a single straight line. Consider the below image:
So, to separate these data points, we need to add one more dimension. For linear data, we have
used two dimensions, x and y, so for non-linear data we will add a third dimension, z. It can
be calculated as:
z = x^2 + y^2
By adding the third dimension, the sample space will become as in the below image:
So now, SVM will divide the dataset into classes in the following way. Consider the below
image:
Since we are in 3-d space, the boundary looks like a plane parallel to the x-axis.
If we convert it to 2-d space with z = 1, it becomes a circle of radius 1 (x^2 + y^2 = 1):
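The effect of the added dimension can be checked numerically. The sketch below is an illustration with invented points: one class lies on a circle of radius 0.5 and the other on a circle of radius 2. The helper names `lift` and `ring` are hypothetical. After adding the third coordinate, the plane z = 1 separates the two classes.

```python
import math


def lift(point):
    """Map a 2-D point (x, y) to 3-D by adding the coordinate x^2 + y^2."""
    x, y = point
    return (x, y, x * x + y * y)


def ring(radius, count=8):
    """Return `count` evenly spaced points on a circle of the given radius."""
    return [
        (radius * math.cos(2 * math.pi * i / count),
         radius * math.sin(2 * math.pi * i / count))
        for i in range(count)
    ]


inner = ring(0.5)  # first class: close to the origin
outer = ring(2.0)  # second class: surrounds the first

# In 3-D the classes sit at different heights, so a flat plane separates them.
inner_z = [lift(p)[2] for p in inner]
outer_z = [lift(p)[2] for p in outer]

print("largest z in inner class:", max(inner_z))
print("smallest z in outer class:", min(outer_z))
```

No straight line in the (x, y) plane separates the two rings, but in the lifted space every inner point lies below z = 1 and every outer point lies above it.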
CHAPTER 4
PROJECT MODULES
4 MODULES
4.1.1 Input:
Accurate crop prediction depends on soil parameters such as pH, nitrogen, phosphorus,
potassium and soil type. The farmer provides these soil parameters to the
system.
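Values entered by a farmer usually reach the program as text and need to be converted and checked before they are passed to a model. The sketch below is one possible way to do this. The field names and their order are assumed from the web form handled in Section 7.1, the helper name `parse_soil_inputs` is hypothetical, and the only range check applied is that pH lies between 0 and 14.

```python
# Field names and order assumed from the form fields read in Section 7.1.
FEATURE_ORDER = [
    "nitrogen", "phosphorus", "potassium",
    "temperature", "humidity", "ph", "rainfall",
]


def parse_soil_inputs(form):
    """Convert a mapping of text inputs into a list of floats in model order."""
    values = []
    for name in FEATURE_ORDER:
        raw = form.get(name)
        if raw is None or str(raw).strip() == "":
            raise ValueError(f"missing value for {name}")
        values.append(float(raw))  # raises ValueError for non-numeric text

    ph = values[FEATURE_ORDER.index("ph")]
    if not 0.0 <= ph <= 14.0:
        raise ValueError("ph must be between 0 and 14")
    return values


# Invented example input, as it might arrive from a web form.
form = {
    "nitrogen": "90", "phosphorus": "42", "potassium": "43",
    "temperature": "20.8", "humidity": "82", "ph": "6.5", "rainfall": "202.9",
}
parsed = parse_soil_inputs(form)
print(parsed)
```

Rejecting bad input at this stage gives the farmer a clear error message instead of a failure inside the prediction step.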
Ensemble techniques are used to create multiple models and then combine them to
produce an accurate result. Every model predicts the class label for each instance, and
the class label that receives the majority of the votes is selected.
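Majority voting can be sketched in a few lines. The example below is an illustration only: the three "models" are arbitrary toy rules invented for the sketch, not trained classifiers and not agronomic advice, and the helper names `majority_vote` and `ensemble_predict` are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted most often; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(models, sample):
    """Ask every model for a label and return the majority label."""
    votes = [model(sample) for model in models]
    return majority_vote(votes)


# Three toy stand-ins for trained models (arbitrary rules, illustration only).
def model_a(sample):
    return "rice" if sample["rainfall"] > 150 else "maize"


def model_b(sample):
    return "rice" if sample["humidity"] > 70 else "maize"


def model_c(sample):
    return "maize" if sample["ph"] > 6.0 else "rice"


sample = {"rainfall": 200.0, "humidity": 80.0, "ph": 6.5}
winner = ensemble_predict([model_a, model_b, model_c], sample)
print("ensemble prediction:", winner)
```

Here two of the three models vote for the same label, so that label is returned even though the third model disagrees.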
CHAPTER 5
5 SYSTEM REQUIREMENT SPECIFICATIONS
A system specification contains a requirement model and a use case model. These two
models are different yet complementary ways to capture system requirements. The specification
is formed with the help of a set of tasks.
1. Finding functional requirements.
3. Software requirements.
4. Hardware requirements.
Among these activities, first two are the responsibilities of requirement engineer, whereas the
third and fourth are the responsibility of an architect.
5.1 Hardware Requirements:
• Processor : Intel Core i3 or newer
• Storage : 256GB
• RAM : 4GB
• Monitor : 15 VGA Color
5.3 Non-Functional Requirements:
Non-functional requirements are requirements which specify criteria that can be used
to judge the operation of a system, rather than specific behaviors. This should be contrasted
with functional requirements that specify specific behavior or functions. Typical non-
functional requirements are reliability, scalability, and cost. Non-functional requirements are
often called the utilities of a system. Other terms for non-functional requirements are
“constraints”, “quality attributes” and “quality of service requirements”.
Reliability: If any exception occurs during the execution of the software, it should be caught
so that the system does not crash.
Scalability: The system should be developed in such a way that new modules and
functionalities can be added, thereby facilitating system evolution.
Cost: The cost should be low because the software packages used are freely available.
CHAPTER 6
SYSTEM DESIGN
6 SYSTEM DESIGN
The above figure represents the architectural design of the proposed work. System architecture is
a conceptual model that defines the structure and behavior of the system. It comprises the
system components and the relationships describing how they work together to implement the
overall system.
6.2 UML DIAGRAMS
Taking the software requirements specification document of the analysis phase as input to the
design phase, we have drawn Unified Modelling Language (UML) diagrams. UML depends
on the visual modelling of the system. Visual modelling is the process of taking information
from the model and displaying it graphically using a standard set of graphical
elements. The UML diagrams are drawn using the StarUML diagramming software.
Complexity is better understood when it is displayed visually rather than written textually.
By producing visual models of a system, one can understand how the system works on several
levels and can model the interactions between the users and the system.
Each UML diagram is designed to let developers and customers view a software system
from a different perspective and in varying degrees of abstraction.
There are two broad categories of diagrams, and they are again divided into subcategories:
1. Structural Diagrams
2. Behavioral Diagrams
Structural Diagrams:
The structural diagrams represent the static aspect of the system. These static aspects
represent those parts of a diagram which form the main structure and are therefore stable.
These static parts are represented by classes, interfaces, objects, components, and nodes.
1. Class diagram
2. Object diagram
3. Component diagram
4. Deployment diagram
Behavioral Diagrams:
Any system can have two aspects, static and dynamic. So, a model is considered as
complete when both the aspects are fully covered. Behavioral diagrams basically capture the
dynamic aspect of a system. Dynamic aspect can be further described as the
changing/moving parts of a system.
2. Sequence diagram
3. Collaboration diagram
5. Activity diagram
6.2.1 Use Case Diagram
A use case diagram is a way to summarize details of a system and the users within that system.
It contains actors and use cases. Use case diagrams will specify the events in a system and
how those events flow. Use case diagram doesn’t describe how those events are implemented.
6.2.2 Sequence Diagram
6.2.3 Activity Diagram
An activity diagram is an important behavioural diagram that describes the dynamic aspects of the
system. It mainly focuses on the flow of control from one activity to another. An activity
diagram is essentially an advanced version of a flow chart that represents the flow from one
activity to another. It describes how activities are coordinated to provide a service,
which can be at different levels of abstraction.
6.2.4 Collaboration Diagram
Collaboration diagrams capture the dynamic behavior of the objects in the system. They are
useful for visualizing the relationships between objects collaborating to perform a particular
task. They illustrate object interactions in a graph or network format. Collaboration is used to
illustrate the coordination of object structure and control.
CHAPTER 7
IMPLEMENTATION
7 IMPLEMENTATION
This chapter includes the implementation of the design and source code. In this phase
the design is translated into code. Computer programs are written using a conventional
programming language or an application generator. Programming tools like Compilers,
Interpreters, and Debuggers are used to generate the code. Python programming language is
used for coding. With respect to the type of application, the right programming language is
chosen.
Python:
Python is an interpreted, high-level, general-purpose programming language. Created by
Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code
readability with its notable use of significant whitespace. Its language constructs and object-
oriented approach aim to help programmers write clear, logical code for small and large-scale
projects. Python is dynamically typed and garbage-collected. It supports multiple programming
paradigms, including structured (particularly, procedural), object-oriented, and functional
programming.
7.1 Sample Source Code
from flask import Flask, render_template, request
from markupsafe import Markup  # newer Flask releases no longer export Markup
import numpy as np
import pandas as pd
import pickle

# Load the trained SVM classifier that the training code below saves to disk.
with open('cropmodel2.pkl', 'rb') as file:
    svm = pickle.load(file)

app = Flask(__name__)

# Maps the numeric class label predicted by the model to a crop name.
mapper = {1: 'rice',
          2: 'maize',
          3: 'chickpea',
          4: 'kidneybeans',
          5: 'pigeonpeas',
          6: 'mothbeans',
          7: 'mungbean',
          8: 'blackgram',
          9: 'lentil',
          10: 'pomegranate',
          11: 'banana',
          12: 'mango',
          13: 'grapes',
          14: 'watermelon',
          15: 'muskmelon',
          16: 'apple',
          17: 'orange',
          18: 'papaya',
          19: 'coconut',
          20: 'cotton',
          21: 'jute',
          22: 'coffee'}
# Advice shown to the user, keyed by nutrient (N, P, K) and whether the soil
# has too much ('High') or too little ('low') of it.
fertilizer_dic = {
'NHigh': """The N value of soil is high and might give rise to weeds.
<br/> Please consider the following suggestions:
<br/><br/> 1. <i>Add sawdust or fine woodchips to your soil</i> – the carbon in the
sawdust/woodchips loves nitrogen and will help absorb and soak up any excess nitrogen.
<br/>2. <i>Plant heavy nitrogen feeding plants</i> – tomatoes, corn, broccoli, cabbage
and spinach are examples of plants that thrive off nitrogen and will suck the nitrogen dry.
<br/>3. <i>Water</i> – soaking your soil with water will help leach the nitrogen deeper
into your soil, effectively leaving less for your plants to use.
<br/>4. <i>Sugar</i> – In limited studies, it was shown that adding sugar to your soil can
help potentially reduce the amount of nitrogen in your soil. Sugar is partially composed of
carbon, an element which attracts and soaks up the nitrogen in the soil. This is a similar concept
to adding sawdust/woodchips which are high in carbon content.
<br/>5. <i>Do nothing</i> – It may seem counter-intuitive, but if you already have plants
that are producing lots of foliage, it may be best to let them continue to absorb all the nitrogen
to amend the soil for your next crops.""",
'Nlow': """The N value of your soil is low.
<br/> Please consider the following suggestions:
<br/><br/> 1. <i> Manure </i> – adding manure is one of the simplest ways to amend your
soil with nitrogen. Be careful as there are various types of manures with varying degrees of
nitrogen.
<br/> 2. <i>Coffee grinds </i> – use your morning addiction to feed your gardening habit!
Coffee grinds are considered a green compost material which is rich in nitrogen. Once the
grounds break down, your soil will be fed with delicious, delicious nitrogen. An added benefit
to including coffee grounds to your soil is while it will compost, it will also help provide
increased drainage to your soil.
<br/>3. <i>Plant nitrogen fixing plants</i> – planting vegetables that are in Fabaceae
family like peas, beans and soybeans have the ability to increase nitrogen in your soil
<br/>4. Plant ‘green manure’ crops like cabbage, corn and broccoli
<br/>5. <i>Use mulch (wet grass) while growing crops</i> - Mulch can also include
sawdust and scrap soft woods
<br/>6. Add composted manure to the soil.
<br/>7. Plant Nitrogen fixing plants like peas or beans.
<br/>8. <i>Use NPK fertilizers with high N value.</i>""",
'PHigh': """The P value of your soil is high.
<br/> Please consider the following suggestions:
<br/><br/>1. <i>Avoid adding manure</i> – manure contains many key nutrients for your
soil but typically including high levels of phosphorous. Limiting the addition of manure will
help reduce phosphorus being added.
<br/>2. <i>Use only phosphorus-free fertilizer</i> – if you can limit the amount of
phosphorous added to your soil, you can let the plants use the existing phosphorus while still
providing other key nutrients such as Nitrogen and Potassium. Find a fertilizer with numbers
such as 10-0-10, where the zero represents no phosphorous.
<br/>3. <i>Water your soil</i> – soaking your soil liberally will aid in driving
phosphorous out of the soil. This is recommended as a last ditch effort.
<br/>4. Plant nitrogen fixing vegetables to increase nitrogen without increasing
phosphorous (like beans and peas).
<br/>5. Use crop rotations to decrease high phosphorous levels""",
# The entries for 'Plow', 'KHigh' and 'Klow' are not shown in this excerpt.
}
@app.route('/')
def home():
    return render_template('index.html')


@app.route('/dashboard')
def dashboard():
    return render_template('dashboard.html')


# Form fields, in the order the model expects them:
# nitrogen, phosphorus, potassium, temperature, humidity, ph, rainfall
@app.route('/predict', methods=['GET', 'POST'])
def predict():
    if request.method == 'POST':
        mydict = request.form
        # Form values arrive as strings, so convert them to numbers first.
        nitrogen = float(mydict.get('nitrogen'))
        phosphorus = float(mydict.get('phosphorus'))
        potassium = float(mydict.get('potassium'))
        temperature = float(mydict.get('temperature'))
        humidity = float(mydict.get('humidity'))
        ph = float(mydict.get('ph'))
        rainfall = float(mydict.get('rainfall'))
        input_features = [nitrogen, phosphorus, potassium,
                          temperature, humidity, ph, rainfall]

        # Predict the numeric class label and translate it to a crop name.
        inf = svm.predict([input_features])[0]
        value = mapper[int(inf)]

        # Look up the recommended N, P and K levels for the predicted crop.
        df = pd.read_csv('fertilizer.csv')
        crop_row = df[df['Crop'] == value].iloc[0]

        # Positive gap: the soil has less than the crop needs; negative: more.
        n = crop_row['N'] - nitrogen
        p = crop_row['P'] - phosphorus
        k = crop_row['K'] - potassium

        # Give advice on the nutrient with the largest gap. A list of pairs
        # avoids the collisions a dict keyed on the gap has when gaps are equal.
        max_val = max([('N', n), ('P', p), ('K', k)],
                      key=lambda item: abs(item[1]))[0]

        if max_val == 'N':
            key = 'NHigh' if n < 0 else 'Nlow'
        elif max_val == 'P':
            key = 'PHigh' if p < 0 else 'Plow'
        else:
            key = 'KHigh' if k < 0 else 'Klow'

        response = Markup(str(fertilizer_dic[key]))
        value = value.capitalize()
        return render_template('result.html', inf=response, value=value)
    return render_template('predict.html')


if __name__ == '__main__':
    app.run(debug=False, host='127.0.0.1')
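# ML ALGORITHM
# X_train, X_test, y_train, y_test and the feature DataFrame `train` come from
# earlier data-preparation steps that are not shown in this excerpt.
import matplotlib.pyplot as plt
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier

# Fit an unpruned decision tree and plot it.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)
plt.figure(figsize=(16, 9))
tree.plot_tree(clf, filled=True, feature_names=list(train.columns))

# Candidate pruning strengths from minimal cost-complexity pruning.
path = clf.cost_complexity_pruning_path(X_train, y_train)
ccp_alpha = path.ccp_alphas

# Fit one tree per candidate alpha.
alpha_list = []
for i in ccp_alpha:
    clf = DecisionTreeClassifier(random_state=0, ccp_alpha=i)
    clf.fit(X_train, y_train)
    alpha_list.append(clf)

# The excerpt does not show how train_score and test_score are computed;
# scoring each fitted tree on both splits is the assumed step.
train_score = [clf.score(X_train, y_train) for clf in alpha_list]
test_score = [clf.score(X_test, y_test) for clf in alpha_list]

# Plot accuracy against alpha for the training and testing splits.
plt.xlabel('alpha')
plt.ylabel('accuracy')
plt.plot(ccp_alpha, train_score, marker='o', label='training',
         color='magenta', drawstyle='steps-post')
plt.plot(ccp_alpha, test_score, marker='+', label='testing',
         color='red', drawstyle='steps-post')
plt.legend()
plt.show()

# Refit with the chosen pruning strength and plot the pruned tree.
clf = DecisionTreeClassifier(random_state=0, ccp_alpha=0.045)
clf.fit(X_train, y_train)
plt.figure(figsize=(16, 9))
tree.plot_tree(clf, filled=True, feature_names=list(train.columns))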
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Candidate models and the hyperparameter values to sample for each.
params = {
    'RandomForest': {
        'model': RandomForestClassifier(),
        'params': {
            'n_estimators': [int(x) for x in np.linspace(start=1, stop=1200, num=10)],
            'max_depth': [int(x) for x in np.linspace(start=1, stop=30, num=5)],
            'min_samples_split': [2, 5, 10, 12],
            'min_samples_leaf': [2, 5, 10, 12],
            # 'auto' is not accepted by recent scikit-learn releases.
            'max_features': ['sqrt', 'log2'],
            'ccp_alpha': [0.030, 0.035, 0.040, 0.045, 0.050],
        }
    },
    'logistic': {
        'model': LogisticRegression(),
        'params': {
            'penalty': ['l1', 'l2', 'elasticnet'],
            'C': [0.25, 0.50, 0.75, 1.0],
            'tol': [1e-10, 1e-5, 1e-4, 1e-3, 0.025, 0.25, 0.50],
            'solver': ['lbfgs', 'liblinear', 'saga', 'newton-cg'],
            'multi_class': ['auto', 'ovr', 'multinomial'],
            'max_iter': [int(x) for x in np.linspace(start=1, stop=250, num=10)],
        }
    },
    'D-tree': {
        'model': DecisionTreeClassifier(),
        'params': {
            'criterion': ['gini', 'entropy'],
            'splitter': ['best', 'random'],
            # An integer min_samples_split must be at least 2.
            'min_samples_split': [2, 5, 10, 12],
            'min_samples_leaf': [1, 2, 5, 10, 12],
            'max_features': ['sqrt', 'log2'],
            'ccp_alpha': [0.030, 0.035, 0.040, 0.045, 0.050],
        }
    },
    'SVM': {
        'model': SVC(),
        'params': {
            'C': [0.25, 0.50, 0.75, 1.0],
            'tol': [1e-10, 1e-5, 1e-4, 0.025, 0.50, 0.75],
            'kernel': ['linear', 'poly', 'sigmoid', 'rbf'],
            'max_iter': [int(x) for x in np.linspace(start=1, stop=250, num=10)],
        }
    }
}

# Run a randomized search for each model and record its best result.
scores = []
for model_name, mp in params.items():
    clf = RandomizedSearchCV(mp['model'], param_distributions=mp['params'],
                             cv=5, n_jobs=-1, n_iter=10, scoring='accuracy',
                             verbose=2)
    clf.fit(X_train, y_train)
    scores.append({
        'model_name': model_name,
        'best_score': clf.best_score_,
        'best_estimator': clf.best_estimator_
    })
for i in scores_df['best_estimator']:
print(i)
rf_val = cross_val_score(estimator=rf,X=X_train_scaled,y=y_train,cv=20,n_jobs=-1)
lr_val = cross_val_score(estimator = lr,X=X_train_scaled,y=y_train,cv=20,n_jobs=-1)
svc_val = cross_val_score(estimator=svc,X=X_train_scaled,y=y_train,cv=20,n_jobs=-1)
score_list = [rf_val,lr_val,svc_val]
model_name = ['rf','lr','svc']
for i,j in zip(score_list,model_name):
print(f' Model : {j} gave {i.mean()} accuracy')
rf.fit(X_train,y_train)
rf.score(X_train,y_train)
svc.score(X_test,y_test)
y_pred = svc.predict(X_test)
cn = metrics.confusion_matrix(y_test,y_pred)
plt.figure(figsize=(16,9))
sn.heatmap(cn,annot=True,linecolor='red',linewidths=2,cmap='plasma')
print(metrics.classification_report(y_test,y_pred))
train.shape,temp.shape
train = np.array(train)
predict_list = []
for i in range(0,len(train)):
    # predict() returns a one-element array; keep the scalar so the result stays 1-D
    predict_list.append(svc.predict([train[i]])[0])
predict_list = np.array(predict_list)
temp.head()
temp['Original_labels'] = temp['label'].map(labels_map_new)
temp.head()
temp['SVM_pred'] = predict_list
temp['Predicted_labels'] = temp['SVM_pred'].map(labels_map_new)
temp.head()
plt.figure(figsize=(25,10))
sn.countplot(data=temp,x = 'Original_labels')
plt.figure(figsize=(25,10))
sn.countplot(data = temp,x = 'Predicted_labels')
temp['Predicted_labels'].value_counts()
temp['Original_labels'].value_counts()
a= temp[temp['Original_labels']!=temp['Predicted_labels']].style.background_gradient('plasma')
tru = temp['Original_labels'].values
tru = list(tru.flatten())
predict = temp['Predicted_labels'].values
predict = list(predict.flatten())
count = 0
for i,j in zip(tru,predict):
    if i!=j:
        count+=1
# Persist the trained SVM; the context manager closes the file even if dump() fails
with open('cropmodel2.pkl','wb') as file:
    pickle.dump(svc,file)
CHAPTER 8
TESTING
8.1 TESTING INTRODUCTION
Software testing can be stated as the process of verifying and validating whether a
software or application is bug-free, meets the technical requirements as guided by its design
and development, and meets the user requirements effectively and efficiently by handling all
the exceptional and boundary cases.
The process of software testing aims not only at finding faults in the existing software
but also at finding measures to improve the software in terms of efficiency, accuracy, and
usability. It mainly aims at measuring the specification, functionality, and performance of a
software program or application.
▪ Exercise the program using data like the real data processed by the program.
8.1.1 Unit Testing
Unit tests are typically automated and run in a testing framework that allows
developers to create, run, and analyse the results of tests. Unit tests should be independent,
meaning that they should not rely on other units or external resources, and should be
repeatable and predictable.
By performing unit testing, developers can ensure that each unit of their code is
functioning as intended and that any issues are caught early in the development process, when
they are easier and less expensive to fix. This approach can also improve the overall quality
and reliability of the software, as well as make it easier to maintain and modify over time.
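As an illustration of an independent, repeatable unit test, the sketch below checks a small input-validation helper with Python's standard unittest module. The function validate_soil_input and the value ranges in VALID_RANGES are hypothetical; the report does not describe how the prediction form validates its inputs.

```python
import unittest

# Hypothetical valid ranges for the form inputs; the report does not specify them.
VALID_RANGES = {
    "N": (0, 200),
    "P": (0, 200),
    "K": (0, 250),
    "temperature": (0, 60),
    "humidity": (0, 100),
    "ph": (0, 14),
    "rainfall": (0, 500),
}


def validate_soil_input(sample):
    """Return True when every expected field is present and inside its range."""
    for field, (low, high) in VALID_RANGES.items():
        value = sample.get(field)
        if not isinstance(value, (int, float)) or not low <= value <= high:
            return False
    return True


class ValidateSoilInputTest(unittest.TestCase):
    def setUp(self):
        # A fresh, valid sample for each test keeps the tests independent.
        self.good = {"N": 90, "P": 42, "K": 43, "temperature": 20.9,
                     "humidity": 82.0, "ph": 6.5, "rainfall": 202.9}

    def test_accepts_values_inside_ranges(self):
        self.assertTrue(validate_soil_input(self.good))

    def test_rejects_out_of_range_ph(self):
        self.assertFalse(validate_soil_input({**self.good, "ph": 15}))

    def test_rejects_missing_field(self):
        sample = dict(self.good)
        del sample["rainfall"]
        self.assertFalse(validate_soil_input(sample))


# Run the suite explicitly so the sketch also works outside a test runner.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ValidateSoilInputTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test builds its own input and touches no file, network, or trained model, which is what makes it repeatable and predictable.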
Integration testing is a software testing method that involves testing the interaction
between different components or modules of an application to ensure that they work together
as expected. This type of testing is performed after unit testing and before system testing.
In both approaches, the goal is to ensure that the integration between the modules is seamless
and that the system as a whole functions correctly.
By performing integration testing, developers can identify any issues that may arise
when different components of the application are combined, and ensure that the system
functions as intended. This can help reduce the risk of errors or bugs in the final product,
improve the overall quality and reliability of the software, and ensure that the system meets
the requirements and expectations of end-users.
User Interface (UI) testing is a software testing method that focuses on testing the user
interface or the front-end of a software application to ensure that it functions correctly and
meets the requirements of end-users. The purpose of UI testing is to verify that the graphical
user interface (GUI) elements such as buttons, menus, icons, and other visual components of
the application work as intended, are displayed correctly, and are responsive to user input.
UI testing can be performed manually or with the help of automated testing tools. In
manual UI testing, testers perform tests by using the software application and interacting with
the user interface to verify that it behaves as expected. Automated UI testing involves using
software tools to simulate user interactions and validate that the UI elements are displayed
correctly, and that they respond to user input in the expected manner.
Some common UI testing scenarios include verifying that the application is responsive
to different screen resolutions and orientations, checking that the user interface elements are
aligned correctly, testing that the application is compatible with different browsers, and
ensuring that the user interface is accessible to users with disabilities.
By performing UI testing, developers can ensure that the user interface of their software
application functions correctly and meets the expectations of end-users. This can help
improve the user experience, reduce the risk of errors or bugs, and increase the overall quality
and reliability of the software application.
CHAPTER 9
OUTPUT SCREENS
9 OUTPUT SCREENS
Home Page
Predict Page:
INPUT 1:
OUTPUT 1:
INPUT 2:
OUTPUT 2:
INPUT 3:
OUTPUT 3:
CHAPTER 10
CONCLUSION
10 CONCLUSION
Agriculture is the backbone of many countries, including India. Integrating information
technology with agriculture can guide farmers in improving productivity. The system
described in this work is designed to run quickly and to predict the suitable crops for a
field accurately. It takes various soil parameters into account when analysing which crop
to recommend. These predictions help farmers improve the productivity, growth, and
quality of their plants.
CHAPTER 11
REFERENCES
11 REFERENCES
1. Sk Al Zaminur Rahman, S.M. Mohidul Islam, Kaushik Chandra Mitra, "Soil
Classification using Machine Learning Methods and Crop Suggestion Based on Soil
Series", 2018 21st International Conference of Computer and Information Technology
(ICCIT), 21-23 December, 2018.
2. S. Panchamurthi M.E., M.D. Perarulalan, A. Syed Hameeduddin, P. Yuvaraj, "Soil
Analysis and Prediction of Suitable Crop for Agriculture using Machine Learning",
International Journal for Research in Applied Science & Engineering
Technology (IJRASET).
3. D. Ramesh, B. Vishnu Vardhan, "Data Mining Techniques and Applications to Agricultural
Yield Data", International Journal of Advanced Research in Computer and
Communication Engineering, Vol. 2, Issue 9, September 2013.
4. Rakesh Kumar, M.P. Singh, Prabhat Kumar and J.P. Singh, "Crop Selection Method to
Maximize Crop Yield Rate using Machine Learning Technique", 2015 International
Conference on Smart Technologies and Management for Computing, Communication,
Controls, Energy and Materials (ICSTM), Vel Tech Rangarajan Dr. Sagunthala R&D
Institute of Science and Technology, Chennai, T.N., India, 6-8 May 2015, pp. 138-145.
5. Ayush Shah, Akash Dubey, Vishesh Hemnani, Divye Gala and D.R. Kalbande, "Smart
Farming System: Crop Yield Prediction Using Regression Techniques", Springer Nature
Singapore Pte Ltd. 2018, H. Vasudevan et al. (eds.), Proceedings of International Conference
on Wireless Communication, Lecture Notes on Data Engineering and
Communications Technologies 19.
6. S. Kanaga Suba Raja, Rishi R., Sundaresan E., Srijit V., "Demand Based Crop
Recommender System for Farmers", 2017 IEEE International Conference on
Technological Innovations in ICT For Agriculture and Rural Development (TIAR 2017),
978-1-5090-4437-5/17/$31.
PREDICTION OF CROP BASED ON
SOIL PROPERTIES
S. Keerthi, T. Esther Rani, B. Vivek, V. Sasidhar Reddy
Prof V RAJEEV JETSON M.Tech (Ph.D), Professor, Department of CSE, KHIT, Guntur
ABSTRACT
Agriculture is one of India's most important and widespread occupations and plays a crucial part in the
growth of our nation. Because 60 percent of the nation's territory is used for agriculture to feed its
1.2 billion inhabitants, improving crop output is viewed as a significant aspect of agriculture. Given a
plot of land, the basic question is which crop can be grown on it. Soil properties are important here,
and crop production is challenging because it involves many variables, including soil type,
temperature, and humidity. Abnormal climatic changes are currently reducing food output and making
it harder to forecast, which lowers yields, hurts farmers' incomes, and leaves farmers less able to
anticipate future crops. Farmers and the other parties involved would benefit greatly from being able
to identify a suitable crop before sowing, so that they can make informed choices regarding storage
and business operations. By assessing agricultural land from the properties of its soil and advising
farmers on the best crop to grow, the proposed project aims to help them boost output and lower
losses. In our research, we develop a recommendation system that uses machine learning methods
such as logistic regression and support vector machines to suggest the best crops for the input soil
parameters. The seed information for the crops is gathered along with the conditions each crop needs,
such as temperature, humidity, and moisture content, which aids the crops' successful development.
This method thereby lessens the financial losses that farmers experience as a result of planting
unsuitable crops. It also helps farmers discover new crop varieties that can be grown in their region.
Keywords: Machine Learning, Crop Prediction, Soil Properties.
1. INTRODUCTION
In the era of Industry 4.0, also known as the Fourth Industrial Revolution, agriculture
remains the primary driver of the Indian economy and has long been regarded as one of the
principal activities carried out in India. Agriculture employs 50% of India's labour
population, and India is the top producer of a few commodities. The primary and fundamental
component of agriculture is the soil. However, producers still rely on traditional
techniques, and their inability to obtain satisfactory results with them means that crop
production is not growing. Good soil quality is necessary to boost crop yields, because crop
quality and output depend directly on the soil. The soil qualities used in agriculture include
those linked to organic matter, such as nitrogen (N), phosphorus (P), and potassium (K). We
were inspired to create this method in order to assist farmers in selecting the crop that
should be grown for their benefit. The dataset includes the nutrients available in the
farmers' soil (N, P, and K) along with humidity, rainfall, pH, and temperature. The crop that
can grow in a specific soil is predicted using a method that takes the soil type into account.
As a result, the crop yield rises and the farmer makes more money. The system is built using
machine learning, which focuses on developing software applications that can acquire data
and use that data to learn. Machine learning enables the creation of models from sample
data as well as the ability to make decisions autonomously based on prior knowledge.
Volume 13, Issue 03, Mar 2023 ISSN 2457-0362 Page 364
1.1 Prediction of Crop
We were inspired to create this method in order to assist farmers in selecting the crop to
grow for their benefit. The dataset consists of the nutrients available in the farmer's soil. Our system
predicts the crop from the values of these nutrients, that is, it predicts the types of crops that will
thrive in a given soil. As a result, the crop yield rises and the farmer makes more money. The system
is built using machine learning.
2. LITERATURE SURVEY
The study performs exploratory data analysis and takes into account various predictive model designs.
Different regression techniques are tried on a sample data set to identify and examine each property.
To determine the best crop to cultivate, various algorithms were applied to the data, including
K-Nearest Neighbours, Naive Bayes, and KNN with cross-validation [1].
The system that was created suggested the crop that would grow best on a specific plot of
land based on soil composition and environmental variables such as rainfall, temperature, humidity,
and pH. To find patterns in the input data and handle it in accordance with the input requirements,
Support Vector Machine (SVM) and Decision Tree machine learning algorithms are used.
The system suggested a crop for the farmer as well as how much fertilizer should be added for the
anticipated produce. Other requirements for the system included displaying the estimated yield in
q/acre, the amount of seed needed for cultivation in kg/acre, and the crop's market price [2].
This paper offers a method for smart agriculture through field monitoring, which can greatly
help farmers increase output. To find patterns in the data and then process it in accordance
with the input conditions, it also uses machine learning and prediction algorithms such as multiple
linear regression [3].
The objective of this study is to propose and implement a rule-based system to forecast crop
yield using historical data. This was accomplished by applying association rule mining to agricultural
data from 2000 to 2012 [4].
The project's main goal is to develop a prediction model that can be used to predict the crop's
maximum output rate before it is sown. A machine learning algorithm is applied to the data to
estimate the output rate of crops based on the farmer's state, district, season, land area, and crop type
[5].
According to the literature survey, 60% of India's territory is used for agriculture in order to feed its
1.3 billion inhabitants, and the population is growing daily. Agriculture must therefore be
modernised in order to benefit producers in our nation and address many of their issues.
Farmers in the current setup have no access to technology or analysis. In the traditional
system they rely on trial and error: a farmer experiments on the land with various crops,
water availability, and so on, and only after numerous such tries is the anticipated
crop output likely to be achieved.
Numerous papers have surveyed the field while taking various factors into account. Some
methods aid in crop selection, but no system is perfect.
In some papers, crop yield predictions based on climatic input parameters are made using data mining
methods. However, predicting crop output solely on the basis of climatic factors is insufficient.
Some survey studies have analysed different machine learning algorithms that can be used to
predict crops.
Numerous review articles on crop prediction outline various prediction algorithms, but
at this time no such system is in place. It is therefore necessary to implement such a system so that
farmers can benefit from it.
3. PROPOSED SYSTEM
Implementation methodology
Implementation Steps:
• Data Collection
• Data Pre-processing
• Training and Testing Data
• Result and analysis
One of the initial steps we perform during implementation is a data analysis. We carried out this
analysis to check for correlations between the different dataset features. The accuracy of any machine
learning method depends on the number of features and the quality of the training dataset.
After examining a variety of datasets from the Kaggle website, this study carefully selected the
features that would yield the best results. Environmental factors have been used in many studies on
this topic to forecast crop sustainability; some have focused primarily on yield, while others have only
considered economic factors. In order to provide the farmer with an exact and reliable recommendation
on which crop would be best for his property, we combined climatic factors such as rainfall,
temperature, and soil pH with soil parameters such as soil nutrients. We import the dataset using the
read_csv() function from the pandas package.
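The loading and correlation check described here can be sketched as follows. This is a minimal illustration that uses only the Python standard library in place of pandas, so that it runs without extra packages; the column names and the four sample rows are assumptions made for the example and are not taken from the report's dataset.

```python
import csv
import io
import math

# Illustrative rows only; these values are assumptions, not the report's data.
RAW = """N,P,K,temperature,humidity,ph,rainfall,label
90,42,43,20.8,82.0,6.5,202.9,rice
85,58,41,21.7,80.3,7.0,226.6,rice
20,67,20,18.9,22.1,5.7,105.0,kidneybeans
13,60,25,17.1,20.6,5.6,128.2,kidneybeans
"""


def load_rows(text):
    """Parse CSV text into dicts, converting every non-label column to float."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({k: (v if k == "label" else float(v)) for k, v in record.items()})
    return rows


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


rows = load_rows(RAW)
humidity = [r["humidity"] for r in rows]
rainfall = [r["rainfall"] for r in rows]
print(f"{len(rows)} rows, corr(humidity, rainfall) = {pearson(humidity, rainfall):.2f}")
```

With pandas, the same two steps would typically be a read_csv() call followed by the DataFrame's corr() method.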
Real-world data is sometimes noisy, contains missing values, or is in a format that prevents it
from being used directly in machine learning models. Data preprocessing cleans the data and
makes it suitable for a machine learning model, which increases the model's efficacy and accuracy,
so it is a crucial stage. Preprocessing is primarily concerned with resolving any missing data as
well as removing any outliers or inaccurate data. There are two methods
to fill in any gaps in the data. The first choice is to remove the complete row that contains the
inaccurate or missing data. Although this technique is straightforward to use, it works best with
sizable datasets.
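The row-removal option can be illustrated with a short sketch. The helper names is_missing and drop_incomplete_rows and the sample values are hypothetical; with pandas, the equivalent step is usually a single dropna() call.

```python
import math


def is_missing(value):
    """Treat None, empty strings, and NaN as missing."""
    return value is None or value == "" or (isinstance(value, float) and math.isnan(value))


def drop_incomplete_rows(rows, required):
    """Keep only rows in which every required field is present and not missing."""
    return [row for row in rows
            if all(not is_missing(row.get(name)) for name in required)]


# Made-up samples: the second and third rows each lack one required value.
samples = [
    {"N": 90, "P": 42, "K": 43, "ph": 6.5},
    {"N": 85, "P": None, "K": 41, "ph": 7.0},   # P is missing
    {"N": 60, "P": 55, "K": 44},                # ph column is absent
]
cleaned = drop_incomplete_rows(samples, required=("N", "P", "K", "ph"))
print(len(samples) - len(cleaned), "rows dropped")
```

Two of the three sample rows are discarded here, which shows why the technique suits sizable datasets: on a small dataset it can remove a large share of the training data.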
Because the proposed model needs to be trained and tested in a variety of scenarios, we used
several machine learning methods to obtain accurate findings. The model is trained to forecast the
crop that can be grown from a variety of input parameters, such as environmental variables
and soil nutrients. We fit the model to the X and Y training values and make forecasts on the
X test data. We ran 100 training epochs on the model. The best model is the one that has the
lowest loss, and this model is used for testing and assessment.
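A minimal sketch of the split into training and testing data is given below. The 80/20 ratio, the fixed seed, and the function name split_train_test are assumptions made for illustration; the text does not state the ratio used, and in practice a library helper would normally perform this step.

```python
import random


def split_train_test(features, labels, test_fraction=0.2, seed=42):
    """Shuffle the sample indices once, then cut them into a test and a train part."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(round(len(indices) * test_fraction)))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(features, train_idx), pick(features, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))


# Toy data: ten samples with two features each and an alternating label.
X = [[i, i * 2] for i in range(10)]
y = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = split_train_test(X, y)
print(len(X_train), "training samples,", len(X_test), "test samples")
```

Shuffling the indices rather than the data keeps each feature row aligned with its label, and fixing the seed makes the split reproducible between runs.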
Fig-3: Splitting of dataset.
3.4 Result and analysis:
Accuracy:
Accuracy is the proportion of predictions that are correct: the sum of true positives and true
negatives divided by the sum of true positives, true negatives, false positives, and false negatives.
It estimates how close the predictions are to the actual values:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
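The accuracy measure can be written out directly. The sketch below gives the binary form in terms of confusion-matrix counts and the equivalent multi-class form that simply counts matching labels; the numbers and crop names in the usage lines are made up for illustration.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy from confusion-matrix counts: (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion-matrix counts must not all be zero")
    return (tp + tn) / total


def multiclass_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Illustrative counts and labels only.
print(accuracy(tp=40, tn=45, fp=5, fn=10))
print(multiclass_accuracy(["rice", "maize", "rice"], ["rice", "rice", "rice"]))
```

For a crop recommender with many classes, the second form is the one that applies, since every prediction is either the correct crop or not.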
4. ALGORITHMS
1. The K-Nearest Neighbour (KNN) algorithm belongs to the class of supervised learning techniques
and is one of the simplest machine learning methods.
2. The K-NN algorithm stores all the available data and classifies new data based on
similarity.
3. This means that using the K-NN algorithm, new data can often be quickly and accurately classified
into a suitable group.
4. K-NN algorithms are frequently used for classification problems, and in some instances for
regression problems as well.
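The points above can be sketched in a few lines: the training samples are stored as they are, and a new sample receives the majority label of its k closest neighbours. The (N, P, K) values, the crop labels, and k = 3 are assumptions chosen for the example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest = [label for _, label in distances[:k]]
    return Counter(nearest).most_common(1)[0][0]


# Hypothetical (N, P, K) samples with their crop labels.
train_X = [(90, 42, 43), (85, 58, 41), (20, 67, 20), (13, 60, 25), (25, 70, 18)]
train_y = ["rice", "rice", "kidneybeans", "kidneybeans", "kidneybeans"]

print(knn_predict(train_X, train_y, query=(88, 45, 40)))
print(knn_predict(train_X, train_y, query=(18, 65, 22)))
```

Because the method relies on Euclidean distance, features on larger scales dominate the result, which is one reason the features are usually scaled before K-NN is applied.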
1. Decision Tree belongs to the group of supervised learning algorithms. Decision tree algorithms
are used to solve both classification and regression problems.
2. Each leaf node of the decision tree corresponds to a class label, and the internal nodes of the tree
represent tests on the attributes.
3. Each internal node asks a question that has only two possible answers, yes or no.
4. If the answer calls for a further test, the tree branches into another sub-tree; otherwise, the process
halts and the node becomes a leaf node.
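The structure described in these points can be illustrated with a small hand-built tree. The features, thresholds, and labels in TREE are hypothetical and chosen only for the example; a trained tree would learn them from the data.

```python
# Internal nodes hold a yes/no test on one attribute; leaf nodes hold a class label.
# All thresholds and labels below are assumptions made for illustration.
TREE = {
    "feature": "rainfall", "threshold": 150.0,
    "yes": {
        "feature": "humidity", "threshold": 70.0,
        "yes": {"label": "rice"},
        "no": {"label": "maize"},
    },
    "no": {"label": "kidneybeans"},
}


def classify(node, sample):
    """Follow the yes/no answers from the root until a leaf node is reached."""
    while "label" not in node:
        answer = "yes" if sample[node["feature"]] >= node["threshold"] else "no"
        node = node[answer]
    return node["label"]


print(classify(TREE, {"rainfall": 200.0, "humidity": 80.0}))
print(classify(TREE, {"rainfall": 100.0, "humidity": 90.0}))
```

Each prediction follows exactly one path from the root to a leaf, so the sequence of yes/no answers also serves as an explanation of the recommendation.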
The Support Vector Machine (SVM) is a supervised machine learning method.
Although the SVM can be used for both classification and regression, it is
primarily used for classification. We also used it because of the high accuracy rate it
offers. In this method, each piece of data is represented as a point
in an N-dimensional space, and a hyperplane is built to divide the points into classes. The
hyperplane is then used to perform classification: it separates the data points into a
positive and a negative class.
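How a hyperplane assigns a class can be sketched with a linear decision function: a point is labelled by the sign of w · x + b. The weights and bias below are hypothetical values for a two-dimensional example, not parameters learned in this work.

```python
def decision_value(weights, bias, x):
    """Signed score of x relative to the hyperplane w . x + b = 0."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def svm_classify(weights, bias, x):
    """Return +1 for points on the positive side of the hyperplane, -1 otherwise."""
    return 1 if decision_value(weights, bias, x) >= 0 else -1


# Hypothetical hyperplane in two dimensions: x1 - x2 - 0.5 = 0.
weights, bias = (1.0, -1.0), -0.5

print(svm_classify(weights, bias, (2.0, 0.5)))
print(svm_classify(weights, bias, (0.5, 2.0)))
```

A single hyperplane separates only two classes, so a problem with many crop labels is handled by combining several such binary classifiers.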
5. RESULTS
Fig-5: Soil details form
Fig-7: Soil details form
determine the best crop for the area. To analyze the crop, it contains a number of soil parameters. This
forecast encourages farmers to increase growth and output.
REFERENCES
[1] Kevin Tom Thomas, Varsha S, Merin Mary Saji, Lisha Varghese, Er. Jinu Thomas, "Crop
Prediction using Machine Learning".
[2] Nischitha K, Dhanush Vishwakarma, Mahendra N, Manjuraju M.R, Ashwini, "Crop Prediction
using Machine Learning Approaches".
[3] "Crop Yield Prediction Using K-Means Clustering", Capstone Design, Spring
2020, Amine Bouighoulouden, Dr. Ilham Kissani.
[4] A. Suruliandi, G. Mariammal & S.P. Raja, "Crop prediction based on soil and environmental
characteristics using feature selection techniques".
[5] Hardik Joshi, Monika Gawade, Manasvi Ganu, Prof. Priya Porwal, "Crop Yield Prediction Using
Supervised Machine Learning Algorithm".
[6] Sk Al Zaminur Rahman, S.M. Mohidul Islam, Kaushik Chandra Mitra, "Soil Classification using
Machine Learning Methods and Crop Suggestion Based on Soil Series", 2018 21st International
Conference of Computer and Information Technology (ICCIT), 21-23 December, 2018.
[7] S. Panchamurthi M.E., M.D. Perarulalan, A. Syed Hameeduddin, P. Yuvaraj, "Soil Analysis and
Prediction