Crop Yield Prediction

A NOVEL APPROACH FOR CROP YIELD PREDICTION USING
MACHINE LEARNING ALGORITHMS

R. Siva Subramanyam, G. Suharshan Reddy, Y. Tharun Kumar Reddy, O. Vishnu Vardhan, Dr. S Shanthi
Department of Computer Science and Technology,
Madanapalle Institute of Technology and Science, Madanapalle - 517325, INDIA.
Sivasubramanyamrangani555@gmail.com
Abstract— Crop prediction is a vital aspect of agriculture and is highly reliant on soil and climatic factors, including temperature, humidity, and rainfall. In the past, farmers could choose their crops, keep an eye on their progress, and schedule harvests on their own. Dynamic shifts in environmental conditions now make this difficult. Machine learning (ML) techniques have become essential tools for predicting crop productivity in recent decades. To ensure that an ML model is accurate, features must be chosen efficiently so that raw data is transformed into a measurable, ML-friendly dataset. Selecting only the data characteristics that are highly relevant to the output can improve the accuracy of the model, and this may be achieved by using the best feature selection techniques. This stage streamlines the model and reduces redundancy so that superfluous features do not impede the process. Furthermore, features that add little to the model increase its time and space complexity, which can ultimately degrade the accuracy of the output. In this regard, logistic regression stands out as a potent crop prediction method. When used with the ensemble method, logistic regression shows better prediction accuracy than other available classification methods. By ensuring that the model contains only the most pertinent features, this method maximizes the model's effectiveness and overall performance.

Keywords— crop prediction, agriculture, machine learning, ML techniques, feature selection, logistic regression, environmental factors, dataset, accuracy, model effectiveness, ensemble method, classification methods.

I. INTRODUCTION

In agriculture, crop prediction is a complex process that involves a number of proposed and validated models. Given the wide range of biotic and abiotic elements that affect crop agriculture, the complexity stems from the requirement for various datasets. The term "biotic factors" refers to aspects that are a consequence of the presence of living things. These include microbes, plants, animals, parasites, predators, and pests. Anthropogenic factors include irrigation, fertilizer, plant protection, air and water pollution, and soil composition. Changes in crop production, internal flaws, irregular shapes, and changes in chemical composition can all result from these variables.

Both biotic and abiotic variables influence plant growth, quality, and the surrounding environment. Abiotic factors include mechanical vibrations, radiation (ionizing, electromagnetic, ultraviolet, and infrared), temperature, humidity, air movement, sunlight, topography, rockiness, atmosphere, and water chemistry, specifically salinity. These factors can be classified as physical, chemical, or other. Chemical variables include compounds such as mercury, arsenic, dioxins, furans, asbestos, and aflatoxins, as well as environmental pollutants such as sulphur dioxide, PAHs, nitrogen oxides, fluorine, lead, cadmium, nitrogen fertilizers, and carbon monoxide. Abiotic variables also include bedrock, relief, climate, and water quality, all of which have an impact on soil characteristics and agricultural value. The influence of soil-forming elements on soil development and agricultural suitability varies.

Crop yield prediction is a difficult and complex task. According to Myers et al. [5] and Muriithi [6], the methodology for estimating the cultivated area combines statistical and mathematical methods that are essential to an optimization process that is continually changing and improving. These approaches have important uses in the creation, improvement, and design of both new and current agricultural goods, in addition to being essential for forecasting crop yields. Numerical data must be available for statistical analysis to be performed and presented. This numerical basis is essential for deriving conclusions about different occurrences and for making well-informed economic decisions. Muriithi [6] highlights that the more accurate the numerical data used to quantify particular events, the more insightful the conclusions that can be drawn. Increased data accuracy improves information quality and makes decision-making processes more accurate. To comprehend agricultural dynamics and optimize tactics for sustainable crop production, a numerical approach is essential.

Evaluating the agroclimatic elements that affect the yields of winter plant species, especially grains, is the main difficulty in the temperate climate zone. The occurrence of days above 5°C is a crucial factor in determining the wintering yield; this includes the number, regularity, and length of days throughout the wintering season that are above 0°C and 5°C. In many cases, these factors are estimated using regression on data from prior years and public information.

A number of models have been created to examine and evaluate the circumstances, providing standards for assessing state policy pertaining to cereal market involvement.
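As a small illustration of the wintering indicators mentioned above, the sketch below counts the days above a temperature threshold and finds the longest unbroken run of such days. The function name `warm_day_stats` and the temperature values are hypothetical and chosen only for this example; the paper does not specify how these indicators were computed.

```python
from itertools import groupby

def warm_day_stats(daily_mean_temps_c, threshold_c):
    """Return (count, longest_run) of days whose mean temperature exceeds threshold_c."""
    above = [t > threshold_c for t in daily_mean_temps_c]
    count = sum(above)
    # Longest unbroken stretch of consecutive days above the threshold.
    longest_run = max(
        (sum(1 for _ in run) for is_above, run in groupby(above) if is_above),
        default=0,
    )
    return count, longest_run

# Hypothetical daily mean temperatures (°C) for part of a wintering season.
temps = [-3.0, 0.5, 1.2, 6.1, 7.4, 5.5, -1.0, 0.0, 5.1, 2.3]
stats_0 = warm_day_stats(temps, 0.0)  # days above 0°C
stats_5 = warm_day_stats(temps, 5.0)  # days above 5°C
```

Indicators of this kind could then serve as input features for a regression on data from prior years.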
Agrometeorological factor prediction is a prerequisite for effective productivity forecasting. However, the variability of these characteristics is a significant obstacle [7]. Numerous researchers have tackled this problem with differing degrees of success [8]–[10]. Improving crop output forecasts in the temperate climate zone requires understanding and managing the variability in agroclimatic parameters.

Grabowska et al. [9] used three climate change scenarios (GFDL, HadCM3, and E-GISS model) along with weather models to forecast narrow-leaved lupine yields for Central Europe between 2050 and 2060. A number of metrics were used to evaluate the model fit, including the standard error of estimation, the coefficient of determination R², the adjusted coefficient of determination R²adj, and the predicted coefficient of determination R²pred, which were all computed using the cross-validation process. The authors predicted lupine yield in conditions of doubled atmospheric CO2 content using the chosen equation.

According to the authors, the position of the station affected the response of the narrow-leaved lupine yield to meteorological conditions. Rainfall from the time of flowering until technical maturity and temperature (maximum, average, and minimum) at the start of the growing season can have a big impact on production. According to the study, lupine output would be positively impacted by anticipated climate changes, with predicted profitability exceeding that of the years 1990–2008. Out of all the scenarios, HadCM3 turned out to be the most advantageous for lupine production under these conditions. Comprehending the regional impacts of meteorological elements offers significant perspectives for predicting and adjusting to possible shifts in lupine yield.

Dąbrowska-Zielińska et al. [8] assessed the usefulness of plant biophysical parameters, obtained from reflected electromagnetic radiation recorded by the Sentinel-2 and Proba-V satellites, in forecasting crop yields in Poland. The assessment was informed by ground measurements made in arable fields between 2016 and 2018 as part of the GEO Joint Experiment of Crop Assessment and Monitoring (JECAM) global crop monitoring network. Crop classification was made easier by optical and radar data from Sentinel-1 and RadarSat-2. The PROtotypical model of Biomass and Evapotranspiration (PROBE) simulated the growth of winter wheat and accurately predicted the amount of biomass, with 94% agreement with real biomass.

According to Li et al. [10], accurate, high-resolution yield maps are essential in precision farming for pinpointing geographical yield patterns, comprehending the primary drivers of variability, and offering comprehensive management insights. Their study demonstrated how important varietal variations are for forecasting potato tuber yields with remote sensing technologies. The most promising strategy available at the moment, according to the authors, is combining diverse data with machine learning techniques, especially when using remote sensing from unmanned aerial vehicles (UAVs).

Although crop prediction algorithms have advanced and produced excellent results [11], the study recognizes the obstacles that still need to be overcome and attempts to suggest a better model that takes these problems into account. Two basic methods are used in the prediction process [12]: feature selection (FS) and classification. Before using FS approaches, sampling techniques are used to address imbalanced datasets. Within the constantly changing field of agricultural research, these methods aid in the advancement and improvement of crop prediction models.

II. RELATED WORK

A. Based On Soil Conditions

Duro et al. [13] used machine learning classifiers, including Decision Trees (DT) and Support Vector Machines (SVM), to propose pixel-based and object-based image analysis techniques for a variety of land cover classes. A digital image analysis method was presented by Honawad et al. [14] to evaluate the physical parameters of soil. This method seeks to replace conventional laboratory techniques by solving issues with human error, time consumption, manual work, and imprecise forecasts. By applying filters and computing features in the improved images, the signal processing method improves the quality of the original image. The algorithm makes use of Laws' mask, the Gabor filter, and texture-based feature extraction along with color quantization. Statistical measures including mean, standard deviation, skewness, and kurtosis are used in matching.

A flexible and accurate method for yield prediction utilizing publicly available remote sensing data was presented by You et al. [15]. The approach is based on remote sensing data and improves upon current practices. Furthermore, a new dimensionality reduction process is presented that is used with Long Short-Term Memory (LSTM) networks in conjunction with a Convolutional Neural Network (CNN). To improve accuracy, the spatio-temporal structure of the data is analyzed using a Gaussian process.

Anantha et al. [16] used an ensemble model with majority voting to create a recommendation system. To choose the best crop while taking soil factors into account, four learning algorithms are used: random trees, Chi-square Automatic Interaction Detection (CHAID), k-Nearest Neighbors (kNN), and Naive Bayes (NB). The outcomes show excellent effectiveness and precision. The classified image produced by these methods comprises parameters relating to weather, crop yield, and crop produce broken down by state and district, together with ground truth statistical information. All of these variables are used to forecast certain crop yields in a given situation.

B. Based On Environmental Conditions

By adding a decision support system algorithm, Jones et al. [18] modified the Decision Support System for Agrotechnology Transfer (DSSAT) crop model. Taking into account the difficulty of maintaining DSSAT crop models with distinct code sets for different crops, the new design employs a multi-modular strategy. This comprises a weather
and soil module, a cropping template, and a specific module for tracking light and water in crops, soil, and the surrounding environment.

To determine the economic impact of annual coconut output, Fernando et al. [19] studied production statistics from 1971 to 2001 in a particular location. According to their analysis, crop shortages have resulted in an estimated $50 million in economic losses. To estimate rice yields in mountainous areas, Ji et al. [20] suggested an ANN-based estimating technique. The study evaluated the performance of the ANN against several bilinear regression models and compared its effectiveness with biological parametric fluctuations.

A decision tree method was introduced by Boryan et al. [21] to represent publicly available state-level crop cover groups in accordance with National Agricultural Statistics Service (NASS) and Cropland Data Layer (CDL) criteria. The document describes the NASS CDL program and offers details on handling tactics, order and approval processes, precision assessment, and specifics of CDL items, such as methods for estimating product costs.

Landsat was suggested by Hansen and Loveland [22] as a way to obtain satellite imagery and improve remote sensing capabilities for environmental monitoring. Many contemporary systems for tracking significant changes in land cover over large areas rely heavily on Landsat data.

A highly accurate model for predicting maize and soybean yields in the Central United States was created by Bolton and Friedl [23]. They tested the ability of MODIS (Moderate Resolution Imaging Spectroradiometer) data to record variations in yield between years and found that the MODIS two-band Enhanced Vegetation Index predicts maize yields better than the commonly used Normalized Difference Vegetation Index. The model's internal and cross-year performance was greatly improved by using vegetation phenology data from MODIS.

A useful model for predicting wheat yield was created by Dempewolf et al. [24] for the Pakistani province of Punjab. In their study of agricultural regions in North America, Central America, and the Caribbean, Shannon and Motha [25] concentrated on weather- and climate-related natural disasters. To help farmers manage agricultural risks and uncertainties such as droughts, floods, typhoons, high heat, and freezing temperatures, the study stressed the significance of having up-to-date climate and meteorological data. Before disasters strike, a decision support system is essential for hazard management planning. The Agro Climate Research Center and Agro Meteorological Department are major contributors to agriculture-based risk management.

C. Survey Of Machine Learning Techniques for Crop Yield Prediction

A machine learning method for managing plant nutrients and evaluating soil fertility was presented by Shivnath and Santanu [28]. For crop production, they used a backpropagation network (BPN) trained on inputs describing crop growth traits, soil nutrient reserves, and external applications. Three processes make up the machine learning system: weight modification, backpropagation, and sampling (which takes into account soils with similar characteristics but different parameters).

Paul et al. [29] created a system that predicts soil dataset types based on crop yields by using data processing techniques. Using k-Nearest Neighbors (kNN) and Naive Bayes (NB) classification, the forecasting process is formulated as a classification rule.

A precision agriculture strategy was presented by Pudumalar et al. [30]. This method uses data on crop yields, soil characteristics, and soil attributes to help farmers choose the right crops depending on soil parameters. They suggested a novel ensemble model, built from random trees, CHAID, kNN, and NB, to recommend crops for particular land regions.

To help farmers cultivate suitable crops, Bodake et al. [31] developed a soil-based fertilizer guidance system that facilitates local soil analysis. The tool is designed to be easily understood in the local language.

To promote crop varieties with early spring harvests, Heupel et al. [32] proposed an unsupervised fuzzy classification strategy with the expectation of improving classification results over time.

In Zhuzhou City, Hunan Province, China, Liu et al. [33] looked into the possibility of detecting heavy metal-induced stress (Cd stress) in rice fields using multi-temporal Sentinel-2 satellite images.

A non-contact vision system utilizing a regular video camera was presented by Ali Al-Naji et al. [39] to handle irrigation-related issues in agriculture. For soil irrigation, they used feedforward backpropagation neural network analysis, gathering data at different times, distances, and light levels. Furthermore, logistic regression is suggested as a way to improve the analysis and the irrigation procedure.

D. Motivation And Justification

Farming is an integral part of daily life, and agricultural development depends on accurate crop forecasting. The issues with crop prediction are addressed by methods for feature selection and classification. The literature reviewed above indicates that logistic regression is a reliable statistical technique that may be applied to the binary classification of
crops. It calculates the likelihood that a certain instance falls into a certain class, which is especially helpful in determining whether the weather will be conducive to a particular crop. Because it is straightforward and easy to understand, logistic regression has several benefits. Motivated by these challenges, this work proposes a novel framework for feature selection in crops, followed by feature classification to predict the crop. In contrast to previous studies that often employ only one prediction method, our approach uses several classification techniques to produce more accurate crop projections.

Examining several research publications, it is evident that bagging, random forest, naïve Bayes, decision trees, k-nearest neighbor, and logistic regression are among the classification techniques with higher prediction rates. As a result, these techniques have been chosen for the prediction procedure.

III. PREPROCESSING

Several crucial methods are used in the preprocessing stage to improve the dataset and maximize prediction accuracy. This section provides a thorough discussion of the preprocessing stages and an outline of the sampling techniques used, such as Support Vector Machine (SVM), Naive Bayes, k-Nearest Neighbors (kNN), and Logistic Regression.

1) Sampling Techniques

Support Vector Machine (SVM): Support Vector Machine is a supervised learning technique for classification and regression applications. SVM can be used in preprocessing to identify and separate the classes, resulting in a balanced dataset. By ensuring that the model is trained on a representative collection of data from each class, it aids in resolving the issue of class imbalance.

Naive Bayes: Naive Bayes is a probabilistic machine learning algorithm based on Bayes' theorem. Naive Bayes can be used for oversampling or undersampling in preprocessing to balance the distribution of classes. This ensures that all classes are equally represented, which helps to reduce biases and increase the model's overall robustness.

K-Nearest Neighbors: k-Nearest Neighbors is a non-parametric classification approach that uses the majority class of a data point's k nearest neighbors to determine the class of that data point. kNN is used during the preprocessing phase to enhance dataset balance and ensure that each class is sufficiently represented. This increases the dataset's overall robustness for further study.

Logistic Regression: Logistic Regression is a widely used classification approach applied to binary and multi-class classification problems. In preprocessing, Logistic Regression can help maintain dataset balance by modifying the weights allotted to the various classes. It helps achieve a fairer distribution of cases among classes, which improves the generalization capacity of the model.

2) Detailed Preprocessing Steps

Data Cleaning: To guarantee the quality of the data, eliminate any duplicate or unnecessary records from the dataset. Handle missing values using imputation or deletion, depending on the kind and volume of missing data.

Feature Scaling: To bring the features to a common scale, normalize or standardize them. This is especially important for algorithms that depend on the magnitude of the input features, such as SVM and Logistic Regression.

Resolving Unbalanced Classes: To resolve imbalanced classes, employ suitable methods such as oversampling, undersampling, or a mix of the two. This step ensures that the model can successfully learn patterns from minority classes without being biased in favor of the dominant class.

Dataset Splitting: Separate the dataset into training and testing sets. The testing set assesses the model's performance after it has been trained on the training set. Common training-to-testing ratios are 70-30 or 80-20.

Using Methods of Sampling: Sample the data using SVM, Naive Bayes, and Logistic Regression to produce a balanced dataset. Logistic Regression modifies class weights, Naive Bayes uses probabilistic techniques, and Support Vector Machines (SVM) can create a hyperplane to divide classes.

Cross-Validation: Use cross-validation to evaluate the model's performance on various subsets of the dataset. This lowers the possibility of overfitting and helps confirm the model's robustness.

Combining these preprocessing stages and sampling strategies improves the dataset and optimizes it for use in machine learning algorithms, which eventually raises the precision and dependability of crop yield projections.

IV. METHODOLOGY OF PROPOSED WORK

Farming performs a vital function in everyday life. Crop prediction in farming, which is a challenge, is based on feature selection and classification. The literature survey above has revealed that crop prediction is best undertaken with feature selection techniques. Recursive feature elimination (RFE) is a wrapper feature selection method that searches through subsets of features in the training dataset for the most important ones, eliminating the rest until the desired number is obtained. The RFE technique predicts classification accuracy well. It is, however, limited by the fact that it demands dataset updating during the feature elimination process. Such updating in RFE is a difficult, time-consuming process. Motivated by these factors, this work proposes a new framework for selecting crop features, following which classification is undertaken to predict the crop. While existing studies have resorted to a single prediction method, our work uses several classification techniques for crop prediction.

In past years, crops were predicted according to the farmer's experience. Although farmers' knowledge endures, agricultural factors have changed to an astonishing degree. There is therefore a need to bring engineering methods into crop prediction.
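The preprocessing steps listed in Section III (data cleaning, feature scaling, class balancing, and dataset splitting) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation. The record layout (feature values followed by a class label), the use of `None` for missing values, mean imputation, random oversampling, and all function names are assumptions made for this example. Oversampling is applied only to the training portion here so that duplicated rows do not leak into the test set; the text does not prescribe this ordering.

```python
import random
from collections import Counter
from statistics import mean, pstdev

def clean(rows):
    """Drop exact duplicate records and fill missing values (None) with the column mean."""
    unique = list(dict.fromkeys(rows))        # keeps the first occurrence, preserves order
    n_features = len(unique[0]) - 1           # assumption: the last field is the class label
    col_means = [
        mean(r[j] for r in unique if r[j] is not None) for j in range(n_features)
    ]
    return [
        tuple(col_means[j] if r[j] is None else r[j] for j in range(n_features)) + (r[-1],)
        for r in unique
    ]

def standardize(rows):
    """Rescale every feature column to zero mean and unit variance."""
    n_features = len(rows[0]) - 1
    mus = [mean(r[j] for r in rows) for j in range(n_features)]
    sigmas = [pstdev(r[j] for r in rows) or 1.0 for j in range(n_features)]
    return [
        tuple((r[j] - mus[j]) / sigmas[j] for j in range(n_features)) + (r[-1],)
        for r in rows
    ]

def split(rows, test_ratio, rng):
    """Shuffle and split into (train, test), e.g. test_ratio=0.2 for an 80-20 split."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

def oversample(rows, rng):
    """Randomly duplicate minority-class rows until every class matches the largest one."""
    by_class = {}
    for r in rows:
        by_class.setdefault(r[-1], []).append(r)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced += group + [rng.choice(group) for _ in range(target - len(group))]
    return balanced

rng = random.Random(0)
# Hypothetical records: (rainfall_mm, temperature_c, crop_label).
records = [
    (200.0, 25.0, "rice"), (200.0, 25.0, "rice"),   # exact duplicate
    (180.0, None, "rice"), (220.0, 27.0, "rice"),   # one missing temperature
    (210.0, 26.0, "rice"), (190.0, 24.0, "rice"),
    (60.0, 30.0, "millet"), (50.0, 32.0, "millet"),
]
prepared = standardize(clean(records))
train, test = split(prepared, 0.2, rng)
train_balanced = oversample(train, rng)
class_counts = Counter(r[-1] for r in train_balanced)
```

In practice, a library such as scikit-learn provides equivalent building blocks, including a recursive feature elimination wrapper, which would usually be preferred over hand-written versions.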
Data mining plays a novel role in agricultural research. This field uses historical data for prediction, with techniques such as neural networks and K-nearest neighbor. The K-means algorithm does not use historical data but predicts by computing the centers of the samples and forming clusters. The computational cost of an algorithm is a major issue. Logistic Regression is a boon to the agriculture field because it computes accurately even with many inputs. The developed architecture takes the input and selects the needed features; classification and association rule mining are then applied and the results are visualized.

The expected outcome of this project is twofold. Crop prediction models can help identify and understand the impact of various environmental factors on crop growth. This knowledge is crucial for developing resilient crops and adapting agricultural practices to changing environmental conditions. Crop prediction models also enable farmers to make timely decisions related to planting, harvesting, and other critical activities. This is particularly important in agriculture, where the timing of operations can significantly impact yield and quality.

1. ARCHITECTURE DESIGN

FIGURE 1. Outline of the proposed work.

2. METHODOLOGY

Web Server: A computer program or hardware device that serves content (such as web pages, files, or applications) to users over the internet or an intranet. It processes incoming requests from clients (web browsers) and delivers appropriate responses, typically using HTTP or HTTPS protocols.

Accessing Data: The process of retrieving or obtaining information from a data source, such as a database, file system, or external API, in order to manipulate, analyze, or present it for various purposes.

Datasets Results and Storage: Refers to the outcome or findings obtained from analyzing datasets, as well as the methods and infrastructure used to store and manage those datasets efficiently, ensuring their accessibility, integrity, and security.

Service Provider: An entity or organization that offers services to clients or users, often over a network such as the internet. In this work, it refers to the provider of the web-based agricultural data analysis and prediction services.

Login: The process by which a user gains access to a computer system, application, or website by providing credentials (such as a username and password) to authenticate their identity.

View Crop Datasets: The action of accessing and displaying datasets containing information related to crops, such as historical yields, weather data, and soil conditions.

Browse Agriculture Datasets and Train & Test: Involves exploring and examining various datasets related to agriculture, as well as conducting training and testing procedures using machine learning or statistical models to analyse and predict agricultural outcomes.

View Trained and Tested Accuracy in Bar Chart: Displaying the accuracy metrics (such as precision, recall, and F1-score) of trained and tested models in a visual format, typically a bar chart, for easy interpretation and comparison.

View Trained and Tested Accuracy Results: Presenting the performance evaluation results of trained and tested models, indicating how well the models perform in predicting agricultural outcomes based on the provided datasets.

View All Crop Yield Production and Prediction: Accessing comprehensive information about crop yield production and predictions, including historical data, forecasts, and insights generated by analytical models.

Download Predicted Datasets: Allowing users to save or retrieve datasets containing predicted agricultural outcomes, such as crop yields or production estimates, for further analysis or reference.

View all Remote Users, View Crop Yield Prediction per acre Results: Providing access to a list of remote users and displaying crop yield predictions on a per-acre basis, offering insights into localized agricultural productivity.

V. EXPERIMENTAL RESULTS

1. K-Nearest Neighbor (KNN)

K-Nearest Neighbor (KNN) is currently one of the most widely used supervised, nonparametric machine learning methods, particularly for classification and regression problems. Supervised algorithms, including KNN, rely on labeled records, in which every input is paired with the correct output. In supervised learning, models are constructed to predict outputs based on the relevant input features. KNN, however, follows the 'lazy learning' principle, meaning that the actual learning takes place only when a prediction is needed.

The algorithm's core formulation revolves around the distances between data points, which can be determined using various methods. Notably, a distance must always be zero or positive.
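The distance computation that KNN relies on can be sketched with the Minkowski distance, one common family of metrics that is never negative because it sums absolute differences raised to a power. The function name and sample values below are illustrative assumptions; the paper names only the Euclidean distance as an example metric.

```python
def minkowski_distance(a, b, p=2):
    """Minkowski distance between two equal-length feature vectors.

    p=1 gives the Manhattan distance and p=2 the Euclidean distance.
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

# Hypothetical two-feature samples (e.g. scaled rainfall and temperature).
sample_a = (0.0, 0.0)
sample_b = (3.0, 4.0)
euclidean = minkowski_distance(sample_a, sample_b, p=2)  # 5.0
manhattan = minkowski_distance(sample_a, sample_b, p=1)  # 7.0
```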
Non-negative distances are obtained by squaring the differences, raising them to a specific power, or taking absolute values.

Before executing KNN, preprocessing steps are generally essential. They include normalizing the data, performing feature selection to remove unimportant features (as KNN struggles with too many features), and handling missing values by discarding the corresponding rows. The use of the KNN algorithm follows these key steps:

1. Data Loading and Preprocessing: Labeled data is loaded into the model, and common data handling steps are applied, including normalization and handling of missing values.
2. Feature Selection: Identifying and choosing the important features is central to improving the accuracy of the KNN model.
3. Choosing K: The desired number of neighbors (K) is specified. Choosing a proper K value is vital, as it significantly controls the behavior of the model. An odd number is usually chosen to ensure a decisive vote in classification problems.
4. Distance Calculation: For every entry in the dataset, the distance to the query input is computed using the preferred distance metric (e.g., Euclidean distance).
5. Nearest Neighbor Selection: The distances are added to an ordered collection and sorted in ascending order. The first K items are selected, and the output is determined from their values, using the mean for regression and the mode for classification.

KNN's strengths lie in its simplicity and ease of implementation. It serves various purposes, from classification and regression to search problems, and can be updated by incorporating additional training records. Nonetheless, its challenges include decreasing speed as the dataset grows and the need for a careful choice of K.

In agriculture, KNN is employed for predicting crop yields based on factors such as precipitation, temperature, humidity, and soil moisture. By relying on the neighbors' values, KNN has shown good accuracy in crop yield forecasting. Further improvements could be achieved by adding more features and data from various sources. Moreover, KNN has shown better and faster results than the SVM approach in predictive analysis of paddy production.

2. Logistic Regression

Logistic Regression is a very popular supervised learning algorithm widely used for binary classification problems. Despite its name, it is not used for regression but for predicting the probability that an instance belongs to a certain class. In the context of crop yield prediction, the logistic regression algorithm can be used effectively to determine the likelihood of a specific crop attaining a particular yield level.

Similar to Naïve Bayes, Logistic Regression works under the probabilistic classification paradigm. It models the probability of the default class through the logistic function. The logistic function, sometimes called the sigmoid function, maps any real-valued number into the range between 0 and 1.

Preprocessing steps for logistic regression in the context of crop yield prediction involve typical procedures such as standardizing the data and handling missing values appropriately. Additionally, feature selection is important to ensure that irrelevant features do not adversely affect the model's performance. Ignoring relevant features, on the other hand, may lead to misleading conclusions.

The steps involved in implementing the logistic regression algorithm are as follows:

1. Data Loading and Preprocessing: Labeled data is input into the model, and preprocessing steps such as normalization and handling of missing values are applied.
2. Feature Selection: It is vital to identify and select relevant features to enhance the accuracy of the logistic regression model.
3. Model Training: The logistic regression model is trained on the labeled dataset to find the relationship between the input features and the probability of a certain crop achieving a particular yield.
4. Prediction: Given a new set of input features, the logistic regression model predicts the probability of the crop falling into a predefined yield category.
5. Decision Boundary: The logistic regression model determines a decision boundary that separates the classes based on the learned parameters.

The success of logistic regression in crop yield prediction depends on picking the right features, handling missing values properly, and normalizing the data correctly. Logistic regression is often chosen for datasets with binary outcomes, making it suitable for predicting whether a crop yield will fall into a specific category (e.g., below or above a certain threshold).

The advantages of logistic regression include its simplicity and interpretability, and it provides probabilities for its predictions. However, it generally assumes a linear relationship between the features and the log-odds of the output.
the log-odds of the output, even when it may not always be [5] E. Manjula and S. Djodiltachoumy, ‘‘A model for
accurate. Logistic regression is usually found in agriculture for prediction of crop yield,’’ Int. J. Comput. Intell. Inform.,
anticipating crop diseases, but gut it can nevertheless be vol. 6, no. 4, pp. 298–305, 2017.
modified for yield prediction by treating it as a binary [6] S. K. Honawad, S. S. Chinchali, K. Pawar, and P.
classification problem with high yield or low yield outcomes. Deshpande, ‘‘Soil classification and suitable crop
To conclude inadvertently, logistic regression is a useful tool prediction,’’ in Proc. Nat. Conf. Comput. Biol., Commun.,
for predicting crop yields by determining the likelihood of a
Data Anal. 2017, pp. 25–29.
crop falling into the specified yield categories using relevant
[7] [1] R. Jahan, ‘‘Applying naive Bayes classifification
features. Its uncomplicated nature and the possibility of
interpretations make it an unexpectedly practical choice for technique for classifification of improved agricultural
applications involving crop yield prediction. land soils,’’ Int. J. Res. Appl. Sci. Eng. Technol., vol. 6,
no. 5, pp. 189–193, May 2018.
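
The logistic regression steps described in this section (preprocessing, training, prediction, decision boundary) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two features (rainfall, temperature), the toy records, the high/low-yield labels, the learning rate, the epoch count, and all function names are hypothetical. The sketch min-max normalizes the features, fits the weights by gradient descent on the log-loss, and applies a 0.5 probability threshold as the decision boundary.

```python
import math

def normalize(rows):
    """Min-max scale each feature column to [0, 1]; return the scaling too."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    scaled = [[(v - lo) / sp for v, lo, sp in zip(r, lows, spans)] for r in rows]
    return scaled, lows, spans

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Probability of the positive (high-yield) class for one feature vector."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))

def train(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the average log-loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, target in zip(X, y):
            err = predict_proba(weights, bias, x) - target
            grad_b += err
            for j, v in enumerate(x):
                grad_w[j] += err * v
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Hypothetical records: (rainfall in mm, temperature in C); label 1 = high yield.
raw = [(300, 35), (350, 34), (400, 33), (800, 27), (900, 26), (1000, 25)]
labels = [0, 0, 0, 1, 1, 1]

X, lows, spans = normalize(raw)
weights, bias = train(X, labels)

def classify(record, threshold=0.5):
    """Scale a new record like the training data, then threshold its probability."""
    x = [(v - lo) / sp for v, lo, sp in zip(record, lows, spans)]
    p = predict_proba(weights, bias, x)
    return p, int(p >= threshold)

prob, label = classify((950, 26))
print(f"P(high yield) = {prob:.2f}, predicted class = {label}")
```

The threshold of 0.5 is only a default; a different cut-off could be chosen if misclassifying low-yield seasons is costlier than misclassifying high-yield ones.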
INPUT: The input provided to this project is the Crop Name, Area, State, District, and the production recorded in each season.
OUTPUT: The output for the given input is the Yield Production Per Acre.
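
To make the input/output mapping concrete, the sketch below derives yield per acre as production divided by area and then estimates the yield for a new record with KNN regression (the mean over the K nearest neighbors), following the KNN steps given in the methodology. The production/area formula is an assumption about how the output is computed, since the paper does not state it. The records, the numeric features (rainfall, temperature), the units, and the function names are likewise hypothetical.

```python
import math
from statistics import mean

def yield_per_acre(production, area_acres):
    """Assumed definition: total production divided by cultivated area."""
    if area_acres <= 0:
        raise ValueError("area must be positive")
    return production / area_acres

def knn_regress(train, query, k=3):
    """Mean target of the k training points closest to `query` (Euclidean)."""
    # Distance calculation for every record, then sort in ascending order.
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    # Nearest neighbor selection: keep the first k and average their targets.
    return mean(target for _, target in ranked[:k])

# Hypothetical season records:
# (rainfall mm, temperature C), production in tonnes, area in acres.
records = [
    ((300.0, 35.0), 40.0, 50.0),
    ((350.0, 34.0), 45.0, 50.0),
    ((400.0, 33.0), 60.0, 60.0),
    ((800.0, 27.0), 150.0, 75.0),
    ((900.0, 26.0), 220.0, 100.0),
    ((1000.0, 25.0), 120.0, 50.0),
]

# Turn each record into (features, yield per acre) for the regressor.
train = [(features, yield_per_acre(prod, area)) for features, prod, area in records]

estimate = knn_regress(train, query=(950.0, 26.0), k=3)
print(f"Estimated yield: {estimate:.2f} tonnes per acre")
```

The features are left unscaled here to keep the sketch short. In practice they would usually be normalized first, since rainfall in millimetres would otherwise dominate the Euclidean distance.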
VI. CONCLUSION
By carefully choosing relevant characteristics, the model seeks to improve agricultural decision-making. When combined with ensemble approaches, logistic regression proves to be an effective tool for precise and efficient prediction. This method adds to a thorough understanding of the factors impacting agricultural output by combining extensive data, including geographic and environmental variables. The study is in line with the current trend of incorporating data-driven solutions into farming methods, providing a viable means of making sustainable and knowledgeable agricultural decisions under changing environmental circumstances. This work fosters practical application in real-world farming scenarios and enhances precision agriculture by laying the groundwork for collaborative efforts amongst agronomists, farmers, and data scientists.
REFERENCES [11] [5] R. H. Myers, D. C. Montgomery, G. G. Vining,

[1] R. Jahan, ‘‘Applying naive Bayes classification technique C. M. Borror, and S. M. Kowalski,‘‘Response surface
for classification of improved agricultural land soils,’’ Int. methodology: A retrospective and literature survey,’’ J.
J. Res. Appl. Sci. Eng. Technol., vol. 6, no. 5, pp. 189–
193, May 2018. Qual. Technol., vol. 36, no. 1, pp. 53–77, Jan. 2004.
[2] P. Priya, U. Muthaiah, and M. M. Balamurugan, [12] [6] D. K. Muriithi, ‘‘Application of response surface
‘‘Predicting yield of the crop using a machine learning methodology for optimization of potato tuber yield,’’
algorithm,’’ Int. J. Eng. Sci. Res. Technol., vol. 7, pp. 1–7,
Apr. 2018. Amer. J. Theor. Appl. Statist., vol. 4, no. 4, pp. 300–304,
[3] S. Pudumalar, E. Ramanujam, R. R. Harine, C. Kavya, T. 2015, doi: 10.11648/j.ajtas.20150404.20.
Kiruthika, and J. Nisha, ‘‘Crop recommendation system
[13] [7] M. Marenych, O. Verevska, A. Kalinichenko, and
for precision agriculture,’’ in Proc. 8th Int. Conf. Adv.
Comput. (ICoAC), 2017, pp. 32–36. M. Dacko, ‘‘Assessment of the impact of weather
[4] K. E. Eswari and L. Vinitha, ‘‘Crop yield prediction in conditions on the yield of winter wheat in Ukraine in
Tamil Nadu using Baysian network,’’ Int. J. Intell. Adv.
terms of regional,’’ Assoc. Agricult. Agribusiness Econ.
Res. Eng. Comput., vol. 6, no. 2, pp. 1571–1576, 2018.
Ann. Sci., vol. 16, no. 2, pp. 183–188, 2014.
[14] [8] J. R. Olędzki, ‘‘The report on the state of
remotesensing in Poland in 2011–2014,’’ (in Polish),
Remote Sens. Environ., vol. 53, no. 2, pp. 113–174, 2015.
[15] [9] K. Grabowska, A. Dymerska, K. Poáarska, and J.
Grabowski, ‘‘Predicting of blue lupine yields based on
the selected climate change scenarios,’’ Acta Agroph.,
vol. 23, no. 3, pp. 363–380, 2016.
