CHAPTER 1

INTRODUCTION

1.1. CROP PREDICTION

Agriculture has an extensive history in India and remains one of the most important occupations practiced in the country. India being a nation of millions of villages, agriculture employs a large share of the rural population. It is the broadest economic sector and plays a central role in the overall development of the country. More than 60% of the land in the country is used for agriculture in order to meet the needs of 1.4 billion people, so adopting new agricultural technologies is very important and can lead the farmers of our country towards better profits. Current agriculture is highly dependent on technology and focuses on obtaining large profits from selected hybrid crops, which degrade the soil's physical and biochemical properties in the long run.

Earlier, crop prediction was performed on the basis of a farmer's experience with a particular location. The crop is the significant factor contributing to agricultural income, and it depends on multiple climatic, geographic, and financial elements. It is difficult for farmers to decide when and which crops to plant: due to uncertainty in climatic conditions, they are often unaware of which crop to grow and of the right time and place to sow. They tend to prefer the previous crop, or the crop trending in the surrounding region, for their land, and they do not have enough knowledge about the soil nutrient content, such as nitrogen, phosphorus, and potassium, of the land.

Taking all these problems into account, we designed a system using machine learning for the betterment of the farmer. Machine learning (ML) is a game changer for the agriculture sector. Machine learning, a part of artificial intelligence, has emerged together with big-data technologies and high-performance computing to create new opportunities for data-intensive science in the multidisciplinary agro-technology domain.

The designed system will recommend the most suitable crop for a particular land based on weather parameters and soil contents such as rainfall, temperature, humidity, and pH, collected from government websites and Kaggle. The system takes the required inputs, such as temperature, humidity, and pH, from the farmers or from the dataset. These inputs are fed to machine learning predictive algorithms such as logistic regression and decision trees, which identify patterns in the data and process them according to the input conditions. The system recommends a crop to the farmer and also recommends the amount of nutrients to be added for the predicted crop.

1.2. OBJECTIVE

 To contribute to optimal crop growth, development, and yield.
 To predict the appropriate crop from given temperature, rainfall, and soil data.
 To enhance the economic development of all stakeholders.
 To improve nutritional standards for the betterment of health.
 To contribute towards the protection and upgradation of the environment, ensuring ecological balance, avoidance of global warming, and healthy living for man and animal.
 To reduce the financial losses faced by farmers caused by planting the wrong crops.
 To help farmers find new types of crops that can be cultivated in their area, so that every farmer may prosper through this system.

1.3. ORGANIZATION OF REPORT

The following chapter describes the literature survey. Chapter 3 presents the existing and proposed systems. Chapter 4 provides technical details about the implementation, and Chapter 5 describes the methodology. Chapter 6 covers system analysis and design, and Chapter 7 gives the system requirements specifications. Chapter 8 presents the results and discussion, and Chapter 9 gives the conclusion and future work of our system.

CHAPTER 2
LITERATURE SURVEY
Given the significance of crop prediction, numerous approaches have been proposed in the past with the goal of improving crop prediction accuracy. In [1], different machine learning methodologies were applied to model and predict the yields of various crops in rural areas based on soil parameters (pH, nitrogen, potassium, etc.) and atmospheric parameters (rainfall, humidity, etc.).

The study in [2] looks at five of Tamil Nadu's most important crops (rice, maize, ragi, sugarcane, and tapioca) over a five-year period beginning in 2005. In order to obtain the maximum crop productivity, factors such as rainfall, groundwater, cultivation area, and soil type were used in the analysis. The K-means technique was used for clustering, and for classification the study looked at three different algorithms: fuzzy, KNN, and Modified KNN (MKNN). After the analysis, MKNN gave the best prediction result of the three algorithms.

An application for farmers can be created to aid in the reduction of many problems in the agriculture sector [3]. In this application, farmers perform single or multiple tests by providing inputs such as crop name, season, and location. Once the input is provided, the user can choose a method and mine the outputs, which show the crop's yield rate. The findings from the previous years' data are included in the datasets and transformed into a supported format. The machine learning models used are Naïve Bayes and KNN.

To create the dataset in [5], information about crops over the previous ten years was gathered from a variety of sources, such as government websites. An IoT device was set up to collect atmospheric data using components such as soil sensors, a DHT11 humidity and temperature sensor, and an Arduino Uno with an ATmega processor. Naïve Bayes, a supervised learning algorithm, obtained an accuracy of 97%, which was further improved by a boosting algorithm that iteratively combines weak rules to reach higher accuracy [5]. To anticipate the yield, another study employs advanced regression techniques such as ENet, Kernel Ridge, and Lasso [4]; the three regression techniques are combined using stacking regression for better prediction.

When a comparison study is conducted between an existing system employing Naïve Bayes and a proposed system employing Random Forest, the proposed system comes out on top. Because it is a bagging method, the random forest algorithm has a high accuracy level, while the Naïve Bayes classifier's accuracy is lower, as the algorithm is probability based [6].

The paper [7] contributes the following: (a) crop production prediction utilizing a range of machine learning approaches and a comparison of error rate and accuracy for certain regions; (b) an easy-to-use mobile app that recommends the most gainful crop; (c) a GPS-based location identifier that can be used to obtain rainfall estimates for a specific location; and (d) a system that recommends the prime time to apply fertilizers. On given datasets from Karnataka and Maharashtra, machine learning algorithms such as KNN, SVM, MLR, Random Forest, and ANN were deployed and assessed for yield prediction accuracy [9]. The accuracies of these algorithms were compared, and the results show that the Decision Tree is the most accurate of the standard algorithms on the given datasets, with a 99.87% accuracy rate.

Regression analysis is applied in [8] to determine the relationship between three factors (area under cultivation, food price index, and annual rainfall) and their impact on crop yield. The three factors are taken as independent variables, and crop yield is taken as the dependent variable. The R² values obtained after the regression analysis show slight differences among the three factors, indicating their respective impacts on crop yield.

In [10], the dataset is collected from government websites such as the APMC website and VC Farm, Mandya, and contains data related to climatic conditions and soil nutrients. Two machine learning models were used: a Support Vector Machine with a Radial Basis Function kernel was trained for rainfall prediction, and a Decision Tree for crop prediction.

A comparative study of various machine learning techniques can be applied to a dataset with a view to determining the best-performing methodology. In [11], the prediction is found by applying regression-based techniques such as Linear, Random Forest, Decision Tree, Gradient Boosting, Polynomial, and Ridge regression on a dataset containing details about the types of crops, different states, and climatic conditions in different seasons.

The parameters used to estimate the efficiency of these techniques were mean absolute error, root mean squared error, mean squared error, R-squared, and cross-validation. Gradient Boosting gave the best accuracy (87.9%) for the target variable 'Yield', and Random Forest gave the best accuracy (98.9%) for the target variable 'Production'. The DHT22 sensor is recommended for monitoring live temperature and humidity [12].

The surrounding air is measured with a thermistor and a capacitive humidity sensor, which output a digital signal on the data pin to an Arduino Uno port pin. The humidity reading ranges from 0 to 100% RH, and the temperature reading ranges from -40 to 80 degrees Celsius. These two parameters, together with soil characteristics, are used as input to three machine learning models: Support Vector Machine, Decision Tree, and KNN. The Decision Tree gave the best accuracy results.

Table 2.1: Approach to Crop Prediction

CHAPTER 3
EXISTING SYSTEM & PROPOSED SYSTEM
3.1 Existing System
The crop yield rate plays a significant role in the economic growth of the country, so there is a need to increase the crop yield rate. Some biological approaches (e.g., seed quality, crop hybridization, strong pesticides) and some chemical approaches (e.g., use of fertilizer, urea, potash) are carried out to address this issue. One existing system we identified is the Crop Selection Method (CSM), which aims to achieve a higher net yield rate of crops over the season. We take CSM as an example to demonstrate how it helps farmers achieve a higher crop yield. Crops can be classified as:

 Seasonal crops: crops that can be planted during a particular season, e.g., wheat, cotton.
 Whole-year crops: crops that can be planted during the entire year, e.g., vegetables, paddy, toor.
 Short-time plantation crops: crops that take a short time to grow, e.g., potato, vegetables.
 Long-time plantation crops: crops that take a long time to grow, e.g., sugarcane, onion.
A combination of these crops can be selected in a sequence based on yield rate per day, giving a cumulative yield rate over the season. The CSM method may improve the net yield rate of crops using the limited land resource and also increases the reusability of the land.

The agricultural systems most significant in India are subsistence farming, organic farming, and industrial farming. Regions all over India differ in the types of farming they use; some are based on horticulture, ley farming, agroforestry, and many more. The surveyed research papers gave a rough idea about using ML with only one attribute. We aim to add more attributes to our system and improve the results, which can increase yields and let us recognize several patterns for prediction. This system will be useful in justifying which crop can be grown in a particular region.

3.2 Proposed System


The proposed system will predict the most suitable crop for a particular land based on soil contents and weather parameters such as temperature, humidity, soil pH, and rainfall. The architecture of the proposed system consists of the following blocks:

Data Collection: Data collection is the process of gathering and measuring data from different resources, such as government websites, VC Farm Mandya, and the APMC website, to obtain an appropriate dataset for the system. The dataset must contain the following attributes: i) soil pH, ii) temperature, iii) humidity, iv) rainfall, v) crop data, and vi) NPK values; these parameters are considered for crop prediction.

Data Preprocessing: After datasets are collected from various resources, they must be preprocessed before the model is trained. Data preprocessing is done in several stages, beginning with reading the collected dataset and continuing with data cleaning. The datasets contain some redundant attributes that are not considered for crop prediction, so we drop the unwanted attributes. The datasets also contain some missing (NaN) values, which we either drop or fill with suitable values in order to get better accuracy.

Machine Learning Algorithm for Prediction: Machine learning predictive algorithms produce highly optimized estimates of the likely outcome based on the training data.

Crop Recommendation: Based on the predicted rainfall, soil contents, and weather parameters, the system recommends the most suitable crop for cultivation. The crop prediction process begins with loading the external crop datasets. Once the dataset is read, preprocessing is performed in the stages discussed in the Data Preprocessing section. After preprocessing, the model is trained using a decision tree classifier on the training set. For the prediction of the crop, we consider various factors such as temperature, humidity, soil pH, and predicted rainfall. These are the input parameters for the system and can be entered manually or taken from sensors. The predicted rainfall and the input parameter values are appended to a list, and the decision tree algorithm predicts the crop based on this list.

[Flowchart: Start → read temperature, humidity, and pH values → append all the values to a list and load the external crop dataset → data preprocessing (dealing with missing values, data cleaning, train/test split) → predict the crop using the decision tree classifier and the predicted crop index values → display the suitable crop.]

CHAPTER 4
IMPLEMENTATION
The system will predict the most suitable crop for a particular land based on soil parameters and weather parameters such as temperature, humidity, soil pH, potassium (K), phosphorus (P), nitrogen (N), and rainfall.

Fig.4.1: Architecture of crop prediction

4.1. Data Collection: -

Data collection, or loading data, is the process of gathering and measuring data from different resources, such as government websites and the Kaggle website, to obtain an appropriate dataset for the system. The dataset must contain the following attributes: i) soil pH, ii) temperature, iii) humidity, iv) rainfall, v) crop data, and vi) NPK values; these parameters are considered for crop prediction.

4.2. Data Preprocessing: -

After datasets are collected from various resources, they must be preprocessed before the model is trained. Preprocessing is done in several stages, beginning with reading the collected dataset and continuing with data cleaning. The datasets contain some redundant attributes that are not considered for crop prediction, so we drop the unwanted attributes. The datasets also contain some missing (NaN) values, which we either drop or fill with suitable values in order to get better accuracy. We then define the target for the model. After data cleaning, the dataset is split into training and test sets.

4.3. Machine Learning Algorithm for Prediction: -

Machine learning predictive algorithms produce highly optimized estimates of the likely outcome based on the training data. Predictive analytics is the use of data, statistical algorithms, and machine learning techniques to identify the likelihood of future outcomes based on historical data. The goal is to go beyond knowing what has happened to providing the best assessment of what will happen in the future. In our system, we first check which of several algorithms gives the maximum accuracy: Logistic Regression, Decision Tree, and Random Forest.

4.3.1 Logistic Regression:

Logistic regression is one of the most popular machine learning algorithms and comes under the supervised learning technique. It is used for predicting a categorical dependent variable from a given set of independent variables.

4.3.2 Decision Tree:

It is a supervised learning algorithm in which attributes and class labels are represented using a tree. The root attribute is compared with the record's attribute and, depending on the comparison, a new node is reached. This comparison continues until a leaf node with a predicted class value is reached. A modeled decision tree is therefore very efficient for prediction purposes.
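As a small sketch of this idea, a decision tree can be fit on the training split from the preprocessing step and its learned attribute comparisons printed (illustrative scikit-learn code, not the exact project implementation):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a decision tree on the training split from the preprocessing step.
tree = DecisionTreeClassifier(max_depth=4, random_state=42)
tree.fit(X_train, y_train)

# Print the learned comparisons from the root attribute down to the leaves.
print(export_text(tree, feature_names=list(X_train.columns)))
```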

Fig. 4.2: Structure of decision tree

4.3.3 Random Forest:

It is an ensemble learning method that is commonly used for both classification and regression. In order to perform prediction with this algorithm, the test features are passed through the rules of each randomly created tree. As a result, a different target may be predicted by each tree for the same test features. Votes are then counted for each predicted target, and the final prediction of the algorithm is the target with the highest number of votes. The fact that the random forest algorithm can efficiently handle missing values and that the classifier is highly resistant to over-fitting are major benefits of this algorithm.

Fig.4.3: Working Of Random Forest

We discuss the accuracy levels of the above three algorithms in the Results section.
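A compact way to run this comparison is sketched below, assuming the training and test splits from the preprocessing step; the hyperparameters are scikit-learn defaults, since the exact project settings are not specified:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Candidate models with default settings (project settings not specified).
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
}

# Train each candidate model and report its accuracy on the test set.
for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {acc:.2%}")
```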

4.4. Crop Prediction:

The crop prediction process begins with loading the external crop datasets. Once the dataset is read, preprocessing is performed in the stages discussed in the Data Preprocessing section. For the prediction of the crop, we consider various factors such as temperature, humidity, soil pH, soil parameters, and rainfall. These are the input parameters for the system and can be entered manually. The most accurate algorithm among the three (Logistic Regression, Decision Tree, and Random Forest) then predicts the crop from these data.
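A sketch of scoring one manually entered sample with the chosen model; the values and the column ordering are illustrative and must match the columns used in training:

```python
import pandas as pd

# Manually entered input: N, P, K, temperature, humidity, pH, rainfall
# (example values only).
sample = pd.DataFrame(
    [[90, 42, 43, 25.0, 80.0, 6.5, 200.0]],
    columns=["N", "P", "K", "temperature", "humidity", "ph", "rainfall"],
)

best_model = models["Random Forest"]  # the most accurate model in our runs
print("Recommended crop:", best_model.predict(sample)[0])
```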

4.5. DEPLOYMENT
The proposed system recommends the most suitable crop for a particular land by considering parameters such as annual rainfall, temperature, humidity, and soil pH.

In order to deploy the trained model for the farmers to use, we need a web application with a simple user interface that farmers can utilize.
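A minimal Flask sketch of such a deployment; the route, template name, form fields, and the pickled model file `model.pkl` are assumptions for illustration:

```python
import pickle
from flask import Flask, render_template, request

app = Flask(__name__)

# Load the trained model saved earlier (assumed to be pickled).
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/", methods=["GET", "POST"])
def predict():
    crop = None
    if request.method == "POST":
        # Read the form fields submitted by the farmer.
        features = [[
            float(request.form["temperature"]),
            float(request.form["humidity"]),
            float(request.form["ph"]),
            float(request.form["rainfall"]),
        ]]
        crop = model.predict(features)[0]
    return render_template("index.html", crop=crop)

if __name__ == "__main__":
    app.run(debug=True)
```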

Fig.4.4: Implementing Flask

Fig.4.5: Train & Testing Model

Fig.4.6: Prediction Code

Thus, we made a simple web interface using HTML and CSS.

Fig.4.7: Created Web Page

CHAPTER 5
METHODOLOGY

5.1 Basic Process


i. Data Collection: Collect the data that the algorithm will learn from.
ii. Data Preparation: Format and engineer the data into the optimal form, extracting important features and performing dimensionality reduction.
iii. Training: Also known as the fitting stage, this is where the machine learning algorithm actually learns, by showing it the data that has been collected and prepared.
iv. Evaluation: Test the model to see how well it performs.
v. Tuning: Fine-tune the model to maximize its performance.

Fig.5.1: General Process

5.2 Datasets

Machine learning depends heavily on data; data is the most crucial aspect that makes algorithm training possible. It uses historical data and information to gain experience. The better the collection of the dataset, the better the accuracy will be.

The first step is data collection. For this project, we require two datasets: one for modelling the crop prediction algorithm and the other for predicting weather, i.e., average rainfall and average temperature. These two parameters are predicted so that they can be used as inputs for predicting the suitable crop.

The crop prediction module dataset requires the following columns: State, District, Crop, Season, Average Temperature, Average Rainfall, Soil Type, Area, and Production, as these are the major factors that crops depend on. 'Production' is the dependent variable, or class variable. There are eight independent variables and one dependent variable. We achieved this by merging the datasets, taking the location as the common attribute in both.

Fig.5.2: Soil and Crop data sample

5.3 Exploratory Data Analysis


Exploratory data analysis is an approach to analysing datasets to summarize their main characteristics, often with visual methods. It is also about knowing your data and gaining a certain amount of familiarity with it before starting to extract insights from it. The idea is to spend less time coding and focus more on the analysis of the data itself. After the data has been collected, it undergoes some processing before being cleaned, and EDA is then performed. After EDA, one may go back to processing and cleaning of data, i.e., this can be an iterative process. Subsequently, the cleaned dataset and the knowledge from EDA are used to perform modelling and reporting. Exploratory data analysis is generally cross-classified in two ways: first, each method is either non-graphical or graphical; second, each method is either univariate or multivariate (usually just bivariate). It is good practice to understand the data first and to try to gather as many insights from it as possible. EDA is all about making sense of the data in hand. EDA can give us the following:

 Preview the data.
 Check the total number of entries and column types using built-in functions; it is good practice to know the columns and their corresponding data types.
 Check for null values.
 Check for duplicate entries.
 Plot the distribution of numeric data (univariate and pairwise joint distributions).
 Plot the count distribution of categorical data.
Using various built-in functions, we can get an insight into the number of values in each column, which gives us information about null values or duplicate data. We can also find the mean, standard deviation, minimum value, and maximum value. This is the basic procedure of EDA. To get better insight into the data being used, we can plot graphs such as the correlation matrix, one of the most important concepts, which gives a lot of information about how the variables (columns) are related to each other and the impact each of them has on the others. A few other graphs, such as box plots and distribution graphs, can be plotted too.
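A short sketch of these EDA steps, assuming the merged DataFrame `df` from the preprocessing stage; seaborn is an assumed choice of plotting library:

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Preview the data, the number of entries, and the column types.
print(df.head())
print(df.info())

# Summary statistics: mean, standard deviation, min, max, etc.
print(df.describe())

# Check for null values and duplicate entries.
print(df.isnull().sum())
print("duplicates:", df.duplicated().sum())

# Correlation matrix of the numeric columns, plotted as a heatmap.
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="coolwarm")
plt.title("Correlation matrix")
plt.show()
```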

Fig.5.3: Correlation Matrix Example

5.4 EDA Performed:

Fig.5.4: EDA Code

Fig.5.5: Output of describe function


Fig. 5.4 shows the simple code written in Python to perform the initial steps of EDA, i.e., finding the number of columns, the features, the missing values, etc. Fig. 5.5 shows the details of each attribute in tabular form, which helps in getting a deeper insight into the attributes.

Fig.5.6: Correlation Matrix of the Proposed System

5.5 Algorithms Used


Machine learning offers a wide range of algorithms to choose from. These are usually divided into classification, regression, clustering, and association algorithms. Classification and regression algorithms come under supervised learning, while clustering and association come under unsupervised learning.

 Classification: A classification problem is when the output variable is a category, such as "red" or "blue", or "disease" and "no disease". Example: decision trees.
 Regression: A regression problem is when the output variable is a real value, such as "dollars" or "weight". Example: linear regression.
 Clustering: A clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behaviour. Example: k-means clustering.
 Association: An association rule learning problem is where you want to discover rules that describe large portions of your data, such as people who buy X also tending to buy Y.

Algorithm               Accuracy
1. Logistic Regression  93.63%
2. Decision Tree        95.45%
3. Random Forest        97.87%

Table 5.1: Summary of the Approaches


From the above table, we can conclude that the Random Forest algorithm gives the best accuracy for our dataset.

5.6 Random Forest


Random forest is a flexible, easy-to-use machine learning algorithm that produces a great result most of the time, even without hyper-parameter tuning. It is also one of the most used algorithms because of its simplicity and diversity: it can be used for both classification and regression tasks, which form the majority of current machine learning systems. Random forest builds multiple decision trees and merges them together to get a more accurate and stable prediction. Another great quality of the random forest algorithm is that it is very easy to measure the relative importance of each feature to the prediction: it computes this score automatically for each feature after training and scales the results so that the sum of all importances is equal to one.
The working of random forest is as follows (a code sketch follows these steps):
 Step 1: First, start with the selection of random samples from a given dataset.
 Step 2: Next, the algorithm constructs a decision tree for every sample and obtains a prediction result from every decision tree.
 Step 3: Voting is performed over the predicted results.
 Step 4: At last, the most voted prediction result is selected as the final prediction.
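A sketch of training a random forest and reading off the scaled feature importances described above; the number of trees is illustrative:

```python
from sklearn.ensemble import RandomForestClassifier

# Each of the 100 trees is grown on a random bootstrap sample of the data;
# classification predictions are made by majority vote across the trees.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

# Relative importance of each feature; the scores sum to one.
for name, score in zip(X_train.columns, forest.feature_importances_):
    print(f"{name}: {score:.3f}")
```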

Fig.5.7: Random Forest Flow

5.6.1 Another Reason for choosing Random Forest:


The data of a particular crop was taken and passed through two algorithms, i.e., Random Forest and another algorithm said to give the best results for that crop. The accuracies achieved by the two algorithms were compared. Rice and groundnut were chosen based on the research papers that were found.
 Rice: According to the paper, the best algorithm for rice yield prediction is linear regression [14]. After running both algorithms, we found a very high difference between the actual value and the predicted value for linear regression, while random forest continued to maintain an accuracy above 90%.
 Groundnut: The research paper stated that KNN works best for groundnut yield prediction [13]. On running the algorithms, we did not find much difference between the results of the two algorithms. Hence, we can conclude that random forest can be used as a general algorithm that gives considerably high accuracy with good predictions.

CHAPTER 6
SYSTEM ANALYSIS AND DESIGN

Systems development is a systematic process which includes phases such as planning, analysis, design, deployment, and maintenance. System analysis is a process of collecting and interpreting facts, identifying problems, and decomposing a system into its components. It is conducted for the purpose of studying a system or its parts in order to identify its objectives. It is a problem-solving technique that improves the system and ensures that all the components of the system work efficiently to accomplish their purpose; analysis specifies what the system should do. System design is a process of planning a new business system or replacing an existing system by defining its components or modules to satisfy the specific requirements. Before planning, one needs to understand the old system thoroughly and determine how computers can best be used to operate efficiently. System design focuses on how to accomplish the objective of the system.

6.1 System Architecture


Architecture diagrams help system designers and developers visualize the high-level, overall structure of their system or application, to ensure the system meets users' needs. They can also be used to describe patterns that are used throughout the design. An architecture diagram is somewhat like a blueprint that a team can use as a guide for discussion, improvement, and implementation.

Fig.6.1: System Architecture.

6.2 Flowchart
A flowchart is simply a graphical representation of steps. It shows steps in sequential order and is widely used in presenting the flow of algorithms, workflows, or processes. Typically, a flowchart shows the steps as boxes of various kinds, and their order by connecting them with arrows. It originated in computer science as a tool for representing algorithms and programming logic, but has since been extended to all other kinds of processes. Nowadays, flowcharts play an extremely important role in displaying information and assisting reasoning. They help us visualize complex processes and make explicit the structure of problems and tasks. A flowchart can also be used to define a process or project to be implemented.

Fig.6.2: Research overview

CHAPTER 7

SYSTEM REQUIREMENTS SPECIFICATIONS

A software requirements specification (SRS) is a description of a software system to be developed. It lays out the functional and non-functional requirements, and may include a set of use cases that describe the user interactions that the software must provide. It is very important for an SRS to list the requirements and how they will be met. It helps the team save time, as they are able to comprehend how they are going to go about the project, and it also enables the team to find out about the limitations and risks early on.

An SRS can also be defined as a detailed description of a software system to be developed, with its functional and non-functional requirements. It may include the use cases of how the user is going to interact with the software system. The software requirements specification document is consistent with all the requirements necessary for project development. To develop the software system we should have a clear understanding of it; to achieve this, we need continuous communication with customers to gather all requirements.

A good SRS defines how the software system will interact with all internal modules, hardware, and other programs, and how human users will interact with it across a wide range of real-life scenarios. It is very important that testers be clear about every detail specified in this document in order to avoid faults in test cases and their expected results.

Qualities of SRS

 Correct
 Unambiguous
 Complete
 Consistent
 Ranked for importance and/or stability
 Verifiable
 Modifiable

 Traceable

7.1. Basic Requirements

1. Data collection: The dataset used in this project was collected from reliable websites and merged to achieve the desired dataset. The sources of our datasets are https://en.tutiempo.net/ for weather data and https://www.kaggle.com/srinivas1/agricuture-crops-production-in-india for crop yield data. It consists of the names of the crops, production, area, average temperature, average rainfall (mm), season, year, and the names of the states and districts. 'Production' is the dependent variable, or class variable. There are eight independent variables and one dependent variable.

2. Data Preprocessing: The purpose of preprocessing is to convert raw data into a form that fits machine learning. Structured and clean data allows a data scientist to get more precise results from an applied machine learning model. The techniques include data formatting, cleaning, and sampling. Here, data preprocessing focuses on finding the attributes with null or invalid values and on finding the relationships between the various attributes. Data preprocessing also helps in finding out the impact of each parameter on the target parameter. To preprocess our datasets we used the EDA methodology. All invalid and null values were handled by removing the record or by giving the default value of that particular attribute, based on its importance.

3. Dataset splitting: A dataset used for machine learning should be partitioned into two subsets: a training set and a test set. We split the dataset with a ratio of 80%, i.e., out of every 100 records, 80 records formed the training set and the remaining 20 records formed the test set.

4. Model training: After the data scientist has preprocessed the collected data and split it into train and test sets, model training can proceed. This process entails "feeding" the algorithm with training data. The algorithm processes the data and outputs a model that is able to find a target value (attribute) in new data, i.e., the answer you want to get from predictive analysis. The purpose of model training is to develop a model. We trained our model using the random forest algorithm; given the other attributes of the dataset as input, the trained model predicts the yield.

5. Model evaluation and testing: The goal of this step is to develop the simplest model able to formulate a target value quickly and well enough. A data scientist can achieve this through model tuning, i.e., the optimization of model parameters to achieve the algorithm's best performance.
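Tuning of this kind is commonly done with a cross-validated grid search; a sketch for the random forest used here, where the parameter grid is an assumption for illustration:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Candidate hyperparameter values to try (illustrative grid).
param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 10, 20],
}

# 5-fold cross-validated grid search on the training set.
search = GridSearchCV(
    RandomForestClassifier(random_state=42), param_grid, cv=5
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Best cross-validation score:", search.best_score_)
```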

7.2. Requirement

1. The access permissions for system data may only be changed by the system's data administrator.
2. Data shall never be viewable at the point of entry or at any other time.
3. The webpage should be responsive to user input or to any external interrupt of the highest priority, and should return to the same state.
4. The webpage must be easy to maintain.
5. It must be user-friendly; having a user-friendly application is of key importance for the success of the webpage.

7.2.1. Hardware Requirement

The hardware requirements include the requirements specification of the physical computer
resources for a system to work efficiently. The hardware requirements may serve as the basis for a
contract for the implementation of the system and should therefore be a complete and consistent
specification of the whole system. The Hardware Requirements are listed below:
1. Processor: A processor is an integrated electronic circuit that performs the calculations that run a computer. A processor performs arithmetical, logical, input/output (I/O), and other basic instructions that are passed from an operating system (OS); most other processes are dependent on the operations of a processor. A minimum 1 GHz processor should be used, although we would recommend 2 GHz or more. A processor includes an arithmetic logic unit (ALU) and a control unit (CU), and its capability is measured in terms of:
 the ability to process instructions at a given time;
 the maximum number of bits per instruction;
 relative clock speed.

Fig.7.1: Processor

The proposed system requires a 2.4 GHz processor or higher.

2. Ethernet connection (LAN) or a wireless adapter (Wi-Fi): Wi-Fi is a family of radio technologies commonly used for the wireless local area networking (WLAN) of devices, based on the IEEE 802.11 family of standards. Devices that can use Wi-Fi technologies include desktops and laptops, smartphones and tablets, TVs and printers, digital audio players, digital cameras, cars, and drones. Compatible devices can connect to each other over Wi-Fi through a wireless access point, as well as to connected Ethernet devices, and may use it to access the Internet. Such an access point (or hotspot) has a range of about 20 metres (66 feet) indoors and a greater range outdoors. Hotspot coverage can be as small as a single room with walls that block radio waves, or as large as many square kilometres, achieved by using multiple overlapping access points.

Fig.7.2: Wi-Fi

3. Hard Drive: A hard drive is an electro-mechanical data storage device that uses magnetic storage to store and retrieve digital information using one or more rigid, rapidly rotating disks, commonly known as platters, coated with magnetic material. The platters are paired with magnetic heads, usually arranged on a moving actuator arm, which read and write data to the platter surfaces. Data is accessed in a random-access manner, meaning that individual blocks of data can be stored or retrieved in any order, not only sequentially. HDDs are a type of non-volatile storage, retaining stored data even when powered off. 32 GB or more is recommended for the proposed system.

Fig.7.3: Hard Drive

4. Memory (RAM): Random-access memory (RAM) is a form of computer data storage that stores the data and machine code currently being used. A random-access memory device allows data items to be read or written in almost the same amount of time irrespective of the physical location of the data inside the memory. In today's technology, random-access memory takes the form of integrated chips. RAM is normally associated with volatile types of memory (such as DRAM modules), where the stored information is lost if power is removed, although non-volatile RAM has also been developed. A minimum of 2 GB of RAM is recommended for the proposed system.

Fig.7.4: RAM

7.2.2 Software Requirements


The software requirements are a description of the features and functionalities of the target system. Requirements convey the expectations of users from the software product. The requirements can be obvious or hidden, known or unknown, and expected or unexpected from the client's point of view.
1. Jupyter Notebook: The Jupyter Notebook is an open-source web application that can be used to create and share documents that contain live code, equations, visualizations, and text. Jupyter ships with the IPython kernel, which allows you to write your programs in Python, but there are currently over 100 other kernels that can also be used. The Jupyter Notebook combines three components:

 The notebook web application: an interactive web application for writing and running code interactively and authoring notebook documents.

 Kernels: separate processes started by the notebook web application that run users' code in a given language and return output back to the notebook web application. The kernel also handles things like computations for interactive widgets, tab completion, and introspection.

 Notebook documents: self-contained documents that contain a representation of all content visible in the notebook web application, including the inputs and outputs of the computations, narrative text, equations, images, and rich media representations of objects. Each notebook document has its own kernel.

Fig.7.5: Jupyter Notebook

2. Python: Python is an object-oriented, high-level programming language with integrated dynamic semantics, used primarily for web and app development. It is extremely attractive in the field of rapid application development because it offers dynamic typing and dynamic binding options. Python is relatively simple and easy to learn, since its syntax focuses on readability. Developers can read and translate Python code much more easily than other languages. In turn, this reduces the cost of program maintenance and development, because it allows teams to work collaboratively without significant language and experience barriers. Additionally, Python supports the use of modules and packages, which means that programs can be designed in a modular style and code can be reused across a variety of projects.

Fig.7.6: Python

3. PyCharm: PyCharm is the most popular IDE for Python, and includes great features such as excellent code completion and inspection, an advanced debugger, and support for web programming and various frameworks. The intelligent code editor provided by PyCharm enables programmers to write high-quality Python code. The editor enables programmers to read code easily through colour schemes, insert indents on new lines automatically, pick the appropriate coding style, and avail themselves of context-aware code completion suggestions. At the same time, programmers can also use the editor to expand a code block to an expression or logical block, use code snippets, format the code base, identify errors and misspellings, detect duplicate code, and auto-generate code. PyCharm offers some of the best features to its users and developers in the following aspects:
 Code completion and inspection
 Advanced debugging
 Support for web programming and frameworks such as Django and Flask

Fig.7.7: Pycharm

4. Flask: Flask is a lightweight WSGI web application framework. It is designed to make getting started quick and easy, with the ability to scale up to complex applications. It began as a simple wrapper around Werkzeug and Jinja and has become one of the most popular Python web application frameworks. It offers suggestions, but does not enforce any dependencies or project layout; it is up to the developer to choose the tools and libraries they want to use. There are many extensions provided by the community that make adding new functionality easy.

Fig.7.8: Flask

5. Microsoft Visual Studio: Microsoft Visual Studio is an integrated development environment (IDE) from Microsoft. It is used to develop computer programs as well as websites, web apps, web services, and mobile apps. Visual Studio uses Microsoft software development platforms such as the Windows API, Windows Forms, Windows Presentation Foundation, Windows Store, and Microsoft Silverlight. It can produce both native code and managed code.

Fig.7.9: Visual Studio

CHAPTER 8
RESULT & DISCUSSION

8.1 Result
First, we find the most accurate algorithm among Logistic Regression, Decision Tree, and Random Forest. The table below shows the accuracy levels of these algorithms.

Algorithm               Accuracy
1. Logistic Regression  93.63%
2. Decision Tree        95.45%
3. Random Forest        97.87%

Table 8.1: Accuracy Levels

We found that the most suitable algorithm is Random Forest, with an accuracy score of 97.87%. Inputs are to be given as shown below.

Fig.8.1: Input Screen

The system predicts the output using the Random Forest algorithm and displays it as shown below. The inputs to be given are temperature, humidity, pH, phosphorus, nitrogen, potassium, rainfall, and water level, as shown in the figure above.

Fig.8.2: Output Screen

Hence, the system returns the most suitable crop, as shown in the figure above.

CHAPTER 9
CONCLUSION & FUTURE WORK

9.1 Conclusion
At present, our farmers are not effectively using technology and analysis, so there is a chance of a wrong selection of crop for cultivation, which will reduce their income. To reduce such losses, we have developed a farmer-friendly system with a GUI that predicts the most suitable crop for a particular land and also provides information about the required nutrients to add, the seeds required for cultivation, the expected yield, and the market price. This helps farmers take the right decision in selecting a crop for cultivation, so that the agricultural sector can be developed through innovative ideas.

9.2 Future Work


Future work will focus on updating the datasets from time to time to produce accurate predictions, and the processes can be automated. By collecting all the required data from the GPS location of a plot of land, and by obtaining access to the government's rain forecasting system, we could predict crops given just a GPS location. We could also develop the model further to help avoid over- and under-supply crises of food. In addition, we would build a mobile application for farmers and convert the whole system into their regional languages.

REFERENCE

[1] Nischitha K., Dhanush Vishwakarma, Mahendra N., Ashwini, and Manjuraju M. R., "Crop Prediction using Machine Learning Approaches", 2020.

[2] Bhawana Sharma, Jay Kant Pratap Singh Yadav, and Sunita Yadav, "Predict Crop Production in India Using Machine Learning Technique: A Survey", 2020.

[3] Shilpa Mangesh Pande, Prem Kumar Ramesh, Anmol, B. R. Aishwarya, Karuna Rohilla, and Kumar Shourya, "Crop Recommender System Using Machine Learning Approach", 2021.

[4] M. Kalimuthu, P. Vaishnavi, and M. Kishore, "Crop Prediction using Machine Learning", 2020.

[5] Potnuru Sai Nishant, Pinapa Sai Venkat, Bollu Lakshmi Avinash, and B. Jabber, "Crop Yield Prediction based on Indian Agriculture using Machine Learning", 2020.

[6] D. Jayanarayana Reddy and M. Rudra Kumar, "Crop Yield Prediction using Machine Learning Algorithm", 2021.

[7] Namgiri Suresh, N. V. K. Ramesh, Syed Inthiyaz, P. Poorna Priya, Kurra Nagasowmika, Kota V. N. Harish Kumar, Mashkoor Shaik, and B. N. K. Reddy, "Crop Yield Prediction Using Random Forest Algorithm", 2021.

[8] Fatin Farhan Haque, Ahmed Abdelgawad, Venkata Prasanth Yanambaka, and Kumar Yelamarthi, "Crop Yield Analysis Using Machine Learning Algorithms", 2020.

[9] Shivani S. Kale and Preeti S. Patil, "A Machine Learning Approach to Predict Crop Yield and Success Rate", 2020.

[10] Viviliya B. and Vaidhehi V., "The Design of Hybrid Crop Recommendation System using Machine Learning Algorithms", 2020.

[11] Pavan Patil, Virendra Panpatil, and Shrikant Kokate, "Crop Prediction System using Machine Learning Algorithms".

[12] Himani Sharma and Sunil Kumar, "A Survey on Decision Tree Algorithms of Classification in Data Mining".

[13] Umamaheswari S., Sreeram S., Kritika N., and Prasanth D. J., "BIoT: Blockchain-based IoT for Agriculture", 11th International Conference on Advanced Computing (ICoAC), pp. 324-327, IEEE, 2019.

[14] Pijush Samui, Venkata Ravibabu Mandla, Arun Krishna, and Tarun Teja, "Prediction of Rainfall Using Support Vector Machine and Relevance Vector Machine".
