
Prediction of Rice Yield in Tarlac Province Using Support Vector Regression


Jeremiah Trinidad, Jaquilyn Timog,
Teofilo Luzon, Jessel Salvador,
Yves Pasquil

Contents

1 Introduction

2 Methodology
2.1 Data Collection
2.2 Support Vector Regression
2.3 Cross Validation Method

3 Results
3.1 Summary Statistics of Rice Parameters
3.2 Rice Yield Prediction of Province of Tarlac Using Various Kernels of SVR with Hyper Tuning Parameters
3.3 SVR with Different Kernels for Allocated Testing Data of Rice Yield

4 Discussions

5 Resources

1 Introduction
Rice is an integral part of meals in Asian countries. Many Asians eat three
or more rice meals per day, and a meal is not considered complete without rice
on the table. Rice is especially essential in the Philippines: despite the
availability of other foods such as bread and noodles, rice remains the main
and preferred staple of the Filipino people. Rice has various names depending
on its stage, from harvested grain to well-milled, ready-to-cook rice. In the
Philippines it is known as “Palay” (un-milled rice), “Bigas” (milled rice), and
“Kanin” (cooked rice).
Every Filipino meal includes rice, whether breakfast, lunch, dinner, or even
snacks. Filipinos have many ways of cooking rice, which reflects how much they
enjoy eating it every day. Because rice is one of the primary commodities in
Filipino life and the demand for it must be sustained, rice farming is one of
the main livelihoods of the people.
In 2018, the Philippines ranked eighth in global rice production (FAOSTAT,
2020). Rice is widely grown in the Philippines, particularly in Luzon, Western
Visayas, Southern Mindanao, and Central Mindanao. Rice production has increased
over the last two decades, rising from 12 Mt in 1999 to 19 Mt in 2008 (FAOSTAT,
2020). The annual mean rice harvested area in the Philippines is approximately
4.7M ha, with an average yield of approximately 3.95 tons per harvested
hectare. The largest rice-producing regions are Central Luzon and Cagayan
Valley (FAO, 2002).
According to the Mindanao Times (2022), Region 3 (Central Luzon) ranks first
among the top 10 rice-producing regions in the Philippines. Because of its vast
flatlands and swamps, it has a clear natural advantage in rice production and
has remained the top rice producer for decades. In 2021, the provinces of
Aurora, Bataan, Bulacan, Nueva Ecija, Pampanga, Tarlac, and Zambales produced
699,043.50 metric tons of palay (un-milled rice), accounting for 15.1% of total
production in the country. The province of Nueva Ecija leads the region and the
country in rice production.
Rice is the chief commodity in Asia and one of the most consumed goods in the
world, particularly in the Philippines. The Filipino people's love of rice
shows in the many words their language has for it: palay is the rice grain,
bigas is uncooked rice, kanin is cooked rice, tutong is the burned part, and
bahaw is cold rice. In fact, one of the UNESCO World Heritage Sites in the
Philippines is the Rice Terraces of the Philippine Cordilleras, 2,000-year-old
rice fields built by the Ifugao along the contours of the mountains. It is a
living cultural landscape that has withstood the test of time. All of this
shows how deeply rice is embedded in Filipino culture.
The increase in world population has led to a significant increase in food
demand throughout the world, so agricultural policy makers in all countries try
to estimate their annual food requirements in advance in order to provide food
security for their people. In order to achieve this goal, this study developed a
novel predictive model based on the energy inputs employed during the production
season. Rice supplies more than 30% of the calorie requirement of Asian
countries. In Iran, too, rice is one of the most important agricultural products.
Chen, H., et al. (2016) investigated the relative importance of climate
factors in the yield variation of paddies in southwestern China. An SVM was
compared with multiple linear regression (MLR) and an artificial neural network
(ANN), and the models were validated using error measures such as mean absolute
error (MAE), mean relative absolute error (MRAE), root mean square error
(RMSE), relative root mean square error (RRMSE), and the coefficient of
determination. The study further suggested considering various parameters of
soil management practices to increase the precision of the developed models.
Palanivel, K., et al. (2019) examined different machine learning techniques
for predicting crop yield and validated the findings using RMSE values. One
study used Modular Artificial Neural Networks (MANN) and SVR to estimate Kharif
crop production in Visakhapatnam, factoring in the amount of monsoon rainfall
to improve accuracy. Other researchers used SVR with an RBF kernel to construct
a more precise model of wetland rice production based on climate changes in the
Kalimantan province. Additionally, some researchers used four machine learning
algorithms (SVM, KNN, linear regression, and elastic net regression) to predict
potato tuber yield from soil and crop properties obtained through proximal
sensing, using a dataset of six fields across Atlantic Canada with different
zones for 2017–2018.
The rationale for choosing this research topic is to predict the rice yield
of Tarlac province. The researchers use the support vector regression approach
with different kernel functions to determine which model is the most efficient.
The results of this study may help Tarlac province secure food for the
Tarlaqueños.

Objectives

The objective of this study was to develop a model based on artificial
intelligence for predicting rice production output. Such a model could help
farmers and policy makers. The model employed the linear, polynomial, and
radial basis function (RBF) kernels for support vector regression (SVR).
Specifically, the study seeks to:
1. Determine the summary statistics of:
1.1 Yield
1.2 Area
1.3 Production
2. Determine the error on the training and testing datasets using:
2.1 RMSE (Root Mean Square Error)
2.2 MAE (Mean Absolute Error)
3. Compute predictions with the different kernel functions involved in the
study and create graphical representations:
3.1 Polynomial
3.2 Linear
3.3 RBF (Radial Basis Function)
4. Determine which model is the most efficient among the different kernel
functions.

2 Methodology
2.1 Data Collection
The data were gathered from the online database of the Philippine Statistics
Authority. The rice data cover the years 1987-2022, from which we took the
annual Volume of Production and the Area Harvested in Tarlac Province. We then
divided the volume of production by the area harvested to obtain the annual
Yield (KG/Hectare) of the province.

2.2 Support Vector Regression


In this research, we used Support Vector Regression (SVR), which provides
accurate predictions when dealing with non-linear uncertainties compared to
other traditional prediction models (Paidipati, K., et al., 2021). SVR is not
as well known as other machine learning methods; however, it is easier to
update. Its generalization capacity is also outstanding and it is easy to
implement. The concept of Support Vector Regression is as follows:
Let $F = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ be the set of $n$
samples, where each pair $(x_i, y_i)$ consists of an input $x_i$ and its
corresponding output variable $y_i$. The regression function is given as:

\[ y(x) = \sum_{i=1}^{n} w_i x_i + b; \quad y, b \in \mathbb{R} \text{ and } x, w \in \mathbb{R}^n \tag{1} \]

\[ y(x) = w x^T + b \tag{2} \]

where $x = (x_1, x_2, \ldots, x_n)$, $y = (y_1, y_2, \ldots, y_n)$, and
$w = (w_1, w_2, \ldots, w_n)$, with components $x_i, w_i \in \mathbb{R}$.
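The formula

\[ \min \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} (\delta_i + \delta_i^*) \tag{3} \]

such that

\[ \begin{aligned} y_i - w x_i^T - b &\le \epsilon + \delta_i, \\ w x_i^T + b - y_i &\le \epsilon + \delta_i^*, \\ \delta_i \ge 0, \; \delta_i^* &\ge 0 \end{aligned} \tag{4} \]

is the optimization problem, where slack variables are added for the
observations that fall outside the margin around the hyperplane of the
regression model. The parameter $C$ is the cost parameter that penalizes these
observations, $\epsilon$ is the constraint or tolerance level, and $\delta_i$
and $\delta_i^*$ are the slack variables. The term
$\sum_{i=1}^{n} (\delta_i + \delta_i^*)$ appears in Equation (3) so that the
observations outside of the margin are included.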
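As a worked illustration of Equations (3) and (4), the sketch below evaluates the objective for a fixed linear model by giving each observation the smallest slack that satisfies the constraints. The function name, the toy data, and the chosen values of w, b, C, and epsilon are hypothetical and are not taken from the study.

```python
def svr_primal_objective(w, b, X, y, C, eps):
    """Evaluate Eq. (3) for a fixed linear model (w, b).

    Each observation receives the smallest slack values that satisfy
    the constraints in Eq. (4).
    """
    slack_total = 0.0
    for x_i, y_i in zip(X, y):
        f_i = sum(w_k * x_k for w_k, x_k in zip(w, x_i)) + b
        delta = max(0.0, y_i - f_i - eps)       # observation above the margin
        delta_star = max(0.0, f_i - y_i - eps)  # observation below the margin
        slack_total += delta + delta_star
    return 0.5 * sum(w_k * w_k for w_k in w) + C * slack_total


# Hypothetical toy data: one input feature, three observations.
X = [[1.0], [2.0], [3.0]]
y = [1.0, 2.5, 2.0]

# Predictions are 1, 2, 3; the slacks are 0, 0.25, and 0.75.
objective = svr_primal_objective(w=[1.0], b=0.0, X=X, y=y, C=10.0, eps=0.25)
print(objective)  # 0.5 * 1 + 10 * 1.0 = 10.5
```

Only the second and third observations lie outside the tolerance band, so only they contribute to the penalty term.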

For easier solving, the optimization problem is translated into its Lagrange
dual formulation. The non-linear SVR in this formulation is:

\[ L(a) = \min \; \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (a_i - a_i^*)(a_j - a_j^*) K(x_i, x_j) + \epsilon \sum_{i=1}^{n} (a_i + a_i^*) - \sum_{i=1}^{n} y_i (a_i - a_i^*) \tag{5} \]

where $a_i$ and $a_i^*$ are the non-negative multipliers for each observation
$x_i$, subject to the constraints:

\[ \sum_{i=1}^{n} (a_i - a_i^*) = 0; \quad 0 \le a_i \le C, \; 0 \le a_i^* \le C \tag{6} \]

$K$ is the kernel function such that $K(x_i, x_j) = \phi(x_i)\phi(x_j)^T$.
Three kernel functions are used in this research:

\[ \begin{aligned} \text{Linear:} \quad & K(x_i, x_j) = x_i x_j^T \\ \text{Polynomial:} \quad & K(x_i, x_j) = (x_i x_j^T + 1)^d \\ \text{Radial Basis Function:} \quad & K(x_i, x_j) = e^{-\gamma \|x_i - x_j\|^2} \end{aligned} \tag{7} \]

where $d$ is the polynomial function’s degree and $\gamma$ is the scale
parameter of the radial basis function.
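The three kernels in Equation (7) can be written directly as small functions. This is a minimal sketch in plain Python; the function names and the two example vectors are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(x_i, x_j):
    return dot(x_i, x_j)


def polynomial_kernel(x_i, x_j, d):
    return (dot(x_i, x_j) + 1.0) ** d


def rbf_kernel(x_i, x_j, gamma):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * squared_distance)


# Hypothetical example vectors.
u, v = [1.0, 2.0], [3.0, 0.5]
k_lin = linear_kernel(u, v)            # 1*3 + 2*0.5 = 4
k_poly = polynomial_kernel(u, v, d=2)  # (4 + 1)^2 = 25
k_rbf = rbf_kernel(u, v, gamma=0.1)    # exp(-0.1 * 6.25)
```

With d = 1 the polynomial kernel equals the linear kernel plus a constant, which may help explain why a degree-1 polynomial model and a linear model can produce very similar fits.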

2.3 Cross Validation Method


To calibrate the regression model, the training set is divided into k
distinct subsets using k-fold cross validation. In each round, k-1 of the
subsets are used for training while the remaining subset is used for
validation. The parameters are calibrated on the training dataset, and the
trained model is then applied to the testing dataset and evaluated with the
Root Mean Square Error (RMSE) and the Mean Absolute Error (MAE). In this paper,
10 folds were used to obtain the mean values of RMSE and MAE.
\[ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - y_i^*)^2} \tag{8} \]

\[ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - y_i^*| \tag{9} \]
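The k-fold procedure can be sketched as an index split. The example assumes 25 training observations (roughly 70% of the 36 annual records) and contiguous, unshuffled folds; both choices are assumptions for illustration, and the function names are hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_validation_splits(n_samples, k):
    """Yield (train_idx, val_idx): k-1 folds train the model, one fold validates it."""
    folds = k_fold_indices(n_samples, k)
    for held_out in range(k):
        val_idx = folds[held_out]
        train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train_idx, val_idx


# Assumed sizes: 25 training observations, 10 folds.
splits = list(train_validation_splits(n_samples=25, k=10))
for train_idx, val_idx in splits[:2]:
    print(len(train_idx), val_idx)
```

Each observation is held out exactly once, and the RMSE and MAE from the k validation rounds are then averaged.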

3 Results
3.1 Summary Statistics of Rice Parameters
Table 1 shows the characteristics of the data, from area harvested to yield
per hectare. The dataset was gathered from the official website of the
Philippine Statistics Authority (PSA); to obtain the yield, the volume of
production in kilograms was divided by the area harvested. The data show the
mean of Area Harvested (115202.52), Volume of Production in kilograms
(442875896.4), and Yield (3767.52778). The standard deviations are 16740.38087
for Area Harvested, 131422562.7 for Volume of Production in kilograms, and
699.38 for Yield. Area Harvested is slightly positively skewed (0.12), while
Volume of Production (-0.29) and Yield (-0.87) are negatively skewed. The
kurtosis values are all negative, which indicates platykurtic distributions.
This suggests that rice output in Tarlac province increased over the recorded
years as the population grew, with both the area harvested and the volume of
production contributing to the yield.

Table 1. Summary statistics of rice production in Tarlac province for the years 1987-2022

                 MEAN          SD             SKEWNESS       KURTOSIS
Area (Ha)        115202.52     16740.38087     0.11755209    -1.605523137
VoP (in KG)      442875896.4   131422562.7    -0.290029022   -1.402961194
Yield (KG/Ha)    3767.527778   699.3780366    -0.868571874   -0.550724363
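The four statistics in Table 1 can be computed as in the following sketch. It assumes the sample standard deviation and moment-based skewness and excess kurtosis; the source does not state which estimators were used, so spreadsheet-style bias-corrected formulas would give slightly different values. The input series is a toy example, not the PSA data.

```python
import math


def summary_statistics(values):
    """Mean, sample SD, moment-based skewness, and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return {
        "mean": mean,
        "sd": math.sqrt(m2 * n / (n - 1)),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,  # excess kurtosis; normal distribution = 0
    }


# Toy series (hypothetical): symmetric, so the skewness is zero.
stats = summary_statistics([1.0, 2.0, 3.0, 4.0, 5.0])
print(stats)
```

A negative excess kurtosis, as in all three rows of Table 1, corresponds to a distribution flatter than the normal distribution.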

3.2 Rice Yield Prediction of Province of Tarlac Using Various Kernels of SVR with Hyper Tuning Parameters
The data were first divided into two sections, a training set and a testing
set, for the SVR. The training set is 70% of the overall data, while the
remaining 30% is the testing set. To tune the hyperparameters C, gamma,
epsilon, and d, the researchers used grid search with k-fold cross validation.
In this study, we use cross-validation (k = 10) to evaluate the model
performance on the training data of rice yield prediction and to obtain error
estimates with less bias and variance. The search ranges were: C (0.001, 100),
gamma (0.001, 100), and epsilon (0, 1) for the radial basis function; C (0.001,
100), gamma (0.001, 100), epsilon (0, 1), and degree (1) for the polynomial
kernel; and C (10) for the linear kernel. The findings are summarized as
follows:
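The tuning procedure described above could be sketched as follows. The paper does not name its software, so scikit-learn is used here only as one possible tool. The yield series is synthetic, and using the year as the sole input feature with a chronological split (first 25 years for training, last 11 for testing, consistent with the 2012-2022 test years in Table 3) is an assumption.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the annual yield series (NOT the PSA data).
years = np.arange(1987, 2023)
t = years - years[0]
yield_kg_ha = 2500.0 + 50.0 * t + 150.0 * np.sin(t / 3.0)

# Assumed chronological 70/30 split: 25 training years, 11 testing years.
n_train = 25
X = years.reshape(-1, 1).astype(float)
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = yield_kg_ha[:n_train], yield_kg_ha[n_train:]

# Grid over the ranges quoted in the text for the RBF kernel.
pipeline = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
param_grid = {
    "svr__C": [0.001, 0.01, 0.1, 1, 10, 100],
    "svr__gamma": [0.001, 0.01, 0.1, 1, 10, 100],
    "svr__epsilon": [0.0, 0.1, 1.0],
}
search = GridSearchCV(
    pipeline, param_grid, cv=10, scoring="neg_root_mean_squared_error"
)
search.fit(X_train, y_train)

# Evaluate the tuned model on the held-out years.
pred = search.predict(X_test)
test_rmse = float(np.sqrt(np.mean((y_test - pred) ** 2)))
test_mae = float(np.mean(np.abs(y_test - pred)))
print(search.best_params_, test_rmse, test_mae)
```

The same pattern applies to the polynomial and linear kernels by changing the `kernel` argument and the parameter grid.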

Table 2. Error analysis and cost values of training and testing datasets by using SVR kernels for rice yield prediction.

Province  Dataset   Kernel      RMSE    MAE     C    Scale parameter  d
Tarlac    Training  Polynomial  336.9   225.8   10   0.1              1
                    Linear      336.9   225.8   10   n.a              n.a
                    RBF         224.69  149.86  100  0.1              n.a
Tarlac    Testing   Polynomial  116.95  83.49   10   0.1              1
                    Linear      116.95  83.49   10   n.a              n.a
                    RBF         112.67  74.06   100  0.1              n.a

Table 2 shows the RMSE, MAE, and specified cost parameter for the Province
of Tarlac. The linear kernel has the largest errors: RMSE 336.9 and MAE 225.8
on the training dataset, and RMSE 116.95 and MAE 83.49 on the testing dataset,
with cost C = 10. The polynomial kernel gives exactly the same results, with
C = 10, a scale parameter of 0.1, and d = 1. The radial basis function,
however, showed the lowest errors, with RMSE 224.69 and MAE 149.86 on the
training dataset and RMSE 112.67 and MAE 74.06 on the testing dataset, with
C = 100 and a scale parameter of 0.1.

3.3 SVR with Different Kernels for Allocated Testing Data of Rice Yield

Table 3. SVR kernels for testing data of Rice Yield.

Year  Testing data  Linear       Polynomial   Radial Basis Function
2012  4456          4344.110353  4385.714398  4426.142311
2013  4338          4337.999988  4369.785854  4386.305512
2014  4682          4331.889623  4353.857311  4351.199462
2015  4251          4325.779257  4337.928768  4321.283263
2016  4223          4319.668892  4322.000224  4296.915803
2017  4286          4313.558527  4306.071681  4278.348573
2018  4344          4307.448162  4290.143137  4265.721385
2019  4287          4301.337797  4274.214594  4259.061098
2020  4188          4295.227432  4258.286051  4258.283342
2021  4331          4289.117066  4242.357507  4263.197151
2022  4283          4283.006701  4226.428964  4273.512281

Table 3 and the graphical representations (Figures 1-3) depict the
predictions for the testing data of rice yield of Tarlac Province from the
various SVR kernels. From the overall summary of Table 3, the RBF kernel is the
best model for predicting the rice yield of Tarlac Province, as it showed lower
RMSE and MAE than the linear and polynomial kernels.
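The testing errors can be recomputed from the columns of Table 3 with Equations (8) and (9). The sketch below does this for the polynomial and RBF columns, copied as printed in Table 3; the function names are hypothetical.

```python
import math

# Observed testing data and predictions for 2012-2022, as printed in Table 3.
observed = [4456, 4338, 4682, 4251, 4223, 4286, 4344, 4287, 4188, 4331, 4283]
polynomial = [4385.714398, 4369.785854, 4353.857311, 4337.928768, 4322.000224,
              4306.071681, 4290.143137, 4274.214594, 4258.286051, 4242.357507,
              4226.428964]
rbf = [4426.142311, 4386.305512, 4351.199462, 4321.283263, 4296.915803,
       4278.348573, 4265.721385, 4259.061098, 4258.283342, 4263.197151,
       4273.512281]


def rmse(y, y_hat):
    """Root mean square error, Eq. (8)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mae(y, y_hat):
    """Mean absolute error, Eq. (9)."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


poly_rmse, poly_mae = rmse(observed, polynomial), mae(observed, polynomial)
rbf_rmse, rbf_mae = rmse(observed, rbf), mae(observed, rbf)
print(round(poly_rmse, 2), round(poly_mae, 2))
print(round(rbf_rmse, 2), round(rbf_mae, 2))
```

For these two columns the recomputed values agree, to two decimal places, with the testing RMSE and MAE reported in Table 2.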

4 Discussions
5 Resources
FAOSTAT (2020). FAO. (http://www.fao.org/faostat/en/#data. Accessed on June 25, 2020).

FAO (2002). Rice information. (https://www.fao.org/3/y4347e/y4347e1g.htm#bm52).

Mindanao Times (2022). (https://mindanaotimes.com.ph/2022/01/30/top-10-rice-farming-regions-in-the-philippines/#:~:text=The%20provinces%20of%20Aurora%2C%20Bataan,the%20country%20in%20rice%20production).

Urrutia, J. D., Bedaa, J. S., Combalicer, C. B. V., Mingo, F. L. T. (2019, December 19). Forecasting rice production in Luzon using integrated spatio-temporal forecasting framework. AIP Publishing. (https://aip.scitation.org/doi/abs/10.1063/1.5139184).

Yousefi, M., Khoshnevisan, B., Shamshirband, S., et al. (2015). Support Vector Regression Methodology for prediction of output energy in rice production. Stochastic Environmental Research and Risk Assessment, 29(8), 2115-2126. 10.1007/s00477-015-1055-z.

Paidipati, K., Chesneau, C., Nayana, B. (2021). Prediction of Rice Cultivation in India—Support Vector Regression Approach with Various Kernels for Non-Linear Patterns. AgriEngineering, 3(2), 182-198. 10.3390/agriengineering3020012.

Figure 1: SVR RBF kernel for testing data.

Figure 2: SVR Polynomial kernel for testing data.

Figure 3: SVR Linear kernels for testing data.

Figure 4: SVR kernels for testing data of Rice Yield.
