Technologies
T. Dananjali, Faculty of Graduate Studies, Sabaragamuwa University of Sri Lanka, Belihuloya, Sri Lanka (dananjali@ccs.sab.ac.lk)
S. Wijesinghe, Department of Economics and Statistics, Sabaragamuwa University of Sri Lanka, Belihuloya, Sri Lanka (wijesinghewadsk@gmail.com)
J. Ekanayake, Department of Computer Science and Informatics, Uva Wellassa University, Badulla, Sri Lanka (jayalath@uwu.ac.lk)
Abstract— Rainfall forecasting is a technologically and scientifically challenging task around the world. Rainfall is one of the most important weather conditions in a given area. Forecasting possible rainfall can help to solve several problems

mining technologies, statistical models, and hybrid models, which are combinations of data mining models and statistics.

The multiplicative seasonal autoregressive integrated moving average (SARIMA) model is used to simulate
2020 From Innovation to Impact (FITI) | 978-1-6654-1471-5/20/$31.00 ©2020 IEEE | DOI: 10.1109/FITI52050.2020.9424877
Authorized licensed use limited to: Makerere University Library. Downloaded on November 01,2022 at 14:41:25 UTC from IEEE Xplore. Restrictions apply.
to be the best model. An MLP-ANN with 4 hidden nodes performed satisfactorily. According to their results, the M5 model tree predicts better than the MLP-ANN.

According to the survey paper [5], the most widely used techniques for prediction are regression analysis, clustering, and Artificial Neural Networks (ANN). Most studies have applied both a statistical method and an ANN model to the same problem, and comparisons between these two model types show that ANN models performed better than traditional statistical models [9], [10], [11].

Mishra et al. [12] developed and analyzed ANN models to forecast rainfall based on time series data. They developed two models, for one-month-ahead and two-months-ahead prediction, using a Feed Forward Neural Network (FFNN) trained with the back-propagation algorithm and the Levenberg-Marquardt training function. MSE and the Magnitude of Relative Error (MRE) were used to evaluate the models. According to their findings, the one-month-ahead prediction models performed better than the two-months-ahead prediction models.

Even though many rainfall forecasting models exist, models based on Sri Lankan weather conditions are comparatively rare. Therefore, in this study we aim to forecast weekly rainfall in Badulla district over the next six months using three different data mining technologies. The prediction quality of each model was evaluated using several evaluation metrics, after which it was analyzed using residual analysis to further confirm the quality of the prediction models. Thereby, the best-fitting data mining model was identified for weekly rainfall prediction.

III. METHODOLOGY
The SMO regression, linear regression, and M5P model tree were trained and tested to predict weekly rainfall in Badulla district, and the best-performing model among them was then identified to forecast weekly rainfall. The rainfall data was collected from the Meteorological Department of Sri Lanka for the sixteen years from 1st January 2002 to 31st December 2017. The first week is defined as 1st January 2002 to 7th January 2002, and thereafter each seven-day period was considered a week. The dataset was preprocessed because it contained missing values and outliers: outliers were removed from the dataset, and each missing value was filled with the mean value. There are 835 instances in the final dataset. The algorithms are implemented in the Weka (weka-dev-3.9.3) data mining tool, and Minitab (Minitab 18.0) was used to analyze the results. The training dataset is 66% of the total, and the rest was used for testing the models. The performance of the linear regression, SMO regression, and M5P model tree was evaluated using Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Root Relative Squared Error (RRSE), Relative Absolute Error (RAE), and Direction Accuracy (DA). Further, the performance of the models was evaluated using residual analysis.

IV. RESULTS
The rainfall prediction was conducted on a weekly basis for a six-month (06) period ahead; hence, the models predict twenty-four (24) rainfall values, one for each week.
Table I shows the prediction quality of the linear regression, SMO regression, and M5P model tree in the training process. Table II shows the prediction quality of each model in the testing process.

TABLE I. AVERAGE ERROR VALUES OF THE MODELS ON THE TRAINING DATA SET
Model                      MAE (mm)   RMSE (mm)   RRSE (%)   RAE (%)   DA (%)
Linear regression model    14.2       18.9        63.8       66.1      51.8
M5P model tree             14.5       19.1        64.6       67.9      49.5
SMO regression model       13.1       19.9        67.3       60.9      55.4

TABLE II. AVERAGE ERROR VALUES OF THE MODELS ON THE TESTING DATA SET
Model                      MAE (mm)   RMSE (mm)   RRSE (%)   RAE (%)   DA (%)
Linear regression model    19.2       25.9        78.3       85.5      44.4
M5P model tree             17.0       23.4        70.9       75.7      46.7
SMO regression model       19.9       28.0        84.5       88.3      44.3

According to the results, in the training process (Table I) the SMO regression model provides the lowest MAE and RAE and the highest direction accuracy, whereas the linear regression model provides the lowest RMSE and RRSE. The M5P model performs comparably to the linear regression model. In the testing process (Table II), the M5P model provides the lowest value of every error metric while also providing the highest direction accuracy. Hence, the M5P model is identified as the best-performing prediction model among these three models.

[Chart: Rainfall Value (mm) vs. Week, 7/25/2012 to 12/25/2012; legend: Actual Value, Predicted Value]
Fig. 1. Actual values and predicted values of the linear regression model

[Chart: Rainfall Value (mm) vs. Week, 7/25/2012 to 12/25/2012]
Fig. 2. Actual values and predicted values of the M5P model tree
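The Methodology describes grouping daily records into seven-day weeks starting on 1st January 2002, cleaning the series, and holding out the last 34% for testing. The Python sketch below illustrates those steps under assumptions that the paper does not state: a week's value is the total of its available daily readings; outliers are flagged with a hypothetical 1.5 times interquartile-range (Tukey fence) rule and then mean-filled, which keeps every week in the dataset; and the 66% split is chronological, with the training size rounded to the nearest integer. The helper names (`week_index`, `week_start`, `weekly_totals`, `clean`, `chronological_split`) are illustrative and are not part of Weka.

```python
from datetime import date, timedelta
import statistics

# Study period from the Methodology: week 1 runs from 1 to 7 January 2002.
START = date(2002, 1, 1)
END = date(2017, 12, 31)


def week_index(day):
    """0-based index of the seven-day block (counted from START) that contains `day`."""
    return (day - START).days // 7


def week_start(i):
    """First calendar day of week `i` (0-based)."""
    return START + timedelta(weeks=i)


def weekly_totals(daily):
    """Aggregate daily rainfall (mm) into consecutive seven-day weeks.

    `daily` maps a date to a reading in mm, or to None when the reading is missing.
    Assumption: a week's value is the sum of its available daily readings, and a
    week with no readings at all stays None so that it can be imputed later.
    """
    totals = [None] * (week_index(END) + 1)
    for day, mm in daily.items():
        if mm is None:
            continue
        i = week_index(day)
        totals[i] = (totals[i] or 0.0) + mm
    return totals


def clean(values, k=1.5):
    """Hypothetical cleaning rule: values outside (Q1 - k*IQR, Q3 + k*IQR) are
    treated as outliers and blanked, then every gap is filled with the mean of
    the remaining values, so the series keeps its original length."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    kept = [v if v is not None and low <= v <= high else None for v in values]
    mean = statistics.fmean(v for v in kept if v is not None)
    return [mean if v is None else v for v in kept]


def chronological_split(series, train_fraction=0.66):
    """First `train_fraction` of the weeks for training, the rest for testing (no shuffling)."""
    n_train = round(len(series) * train_fraction)
    return series[:n_train], series[n_train:]
```

With these definitions the study period spans 835 weeks (the last one, 26 to 31 December 2017, has only six days), matching the 835 instances in the final dataset. A 66% chronological split then gives 551 training weeks and 284 test weeks, with the first test week starting on 24 July 2012, close to the 7/25/2012 start of the date axis in Figs. 1 and 2.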
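Tables I and II report MAE, RMSE, RRSE, RAE, and DA, the evaluation metrics listed in the Methodology. The Python sketch below shows one way to compute them from a series of actual and predicted weekly values. It is not Weka's implementation: the relative errors here compare the model against always predicting a baseline mean, which defaults to the mean of the actual values (Weka's relative measures use a prior estimated from the training data, which can be passed as `baseline_mean`), and the direction-accuracy rule shown is one common definition that may differ from the one behind the tables. The function name `evaluate` is hypothetical.

```python
import math


def evaluate(actual, predicted, baseline_mean=None):
    """MAE, RMSE, RRSE (%), RAE (%) and DA (%) for two equal-length series.

    RAE and RRSE compare the model against always predicting `baseline_mean`
    (by default the mean of `actual`, an assumption). DA counts the steps
    where the forecast's move away from the previous actual value has the
    same sign as the observed move; a zero move is counted as "up".
    """
    n = len(actual)
    if baseline_mean is None:
        baseline_mean = sum(actual) / n
    errors = [p - a for a, p in zip(actual, predicted)]
    abs_err = sum(abs(e) for e in errors)
    sq_err = sum(e * e for e in errors)
    hits = sum(
        (actual[t] - actual[t - 1] >= 0) == (predicted[t] - actual[t - 1] >= 0)
        for t in range(1, n)
    )
    return {
        "MAE": abs_err / n,
        "RMSE": math.sqrt(sq_err / n),
        "RRSE": 100 * math.sqrt(sq_err / sum((a - baseline_mean) ** 2 for a in actual)),
        "RAE": 100 * abs_err / sum(abs(a - baseline_mean) for a in actual),
        "DA": 100 * hits / (n - 1),
    }


metrics = evaluate([10, 20, 15, 30], [12, 18, 21, 26])
```

In this toy example (values chosen only for illustration) the third forecast moves up while the observation moves down, so DA is two out of three steps, about 66.7%; MAE is 3.5 and RAE is 56%.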
[Chart: Rainfall Value (mm) vs. Week, 7/25/2012 to 12/25/2012]
Model                      Pearson Correlation Coefficient   P-Value
Linear regression model    0.16                              0.451
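Table III reports a Pearson correlation coefficient and a P-Value for each model. The Python sketch below computes r from paired actual and predicted values, and the two-sided P-Value for the null hypothesis of no correlation from the usual statistic t = |r| * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. It integrates the Student t density numerically (Simpson's rule) so that only the standard library is needed; Minitab's own routine may differ in detail, and the function names are illustrative. Assuming n = 24, the number of weekly predictions described in the Results, r = 0.16 gives a P-Value of about 0.45, close to the 0.451 in Table III, and r = 0.41 gives a P-Value just below 0.05, consistent with the M5P result.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_density(t, df):
    """Probability density of Student's t distribution with `df` degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)


def correlation_p_value(r, n, steps=2000):
    """Two-sided P-value for H0: no correlation, given r computed from n pairs.

    Integrates the t density from 0 to |t| with Simpson's rule (`steps` must be
    even); the two tails beyond |t| are what remains of the total area of 1.
    """
    if abs(r) >= 1:
        return 0.0
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    h = t / steps
    area = t_density(0.0, df) + t_density(t, df)
    for k in range(1, steps):
        area += (4 if k % 2 else 2) * t_density(k * h, df)
    area *= h / 3
    return 1 - 2 * area
```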
We define the following two hypotheses:
H0: There is no correlation between actual and predicted rainfall values.
H1: There is a correlation between actual and predicted rainfall values.

Table III shows the Pearson correlation coefficient of each model. H0 is rejected when the P-Value is less than 0.05. According to Table III, the P-Values of the linear regression and SMO regression models are greater than 0.05. Hence, H0 is not rejected for these two models, and there is no statistically significant correlation between their actual and predicted values. The P-Value of the M5P model is less than 0.05, and its correlation coefficient is 0.41. According to the correlation analysis, the M5P model therefore outperforms the other two models in predicting rainfall over the six-month (24-week) horizon. Next, residual analysis is conducted to check the goodness of fit of the models.

Fig. 6. Run chart of SMO regression model

A run chart is used to check the randomness of the residuals. Figs. 4, 5, and 6 show the run charts generated for the residual values of each model. The X-axis of each chart shows the observation and the Y-axis shows the residual of the model. To that end, we defined two hypotheses:
H0: The errors produced by the model are random.
H1: The errors produced by the model are not random, i.e., there is a pattern.
The null hypothesis H0 is rejected when the P-Value is less than 0.05. The hypothesis is tested using the approximate P-Values for clustering and trends.

According to the run charts, the linear regression, M5P, and SMO regression models obtain approximate P-Values of 0.338, 0.202, and 0.338, respectively, for clustering, and 0.567, 0.749, and 0.201, respectively, for trends. Since all of these P-Values exceed 0.05, H0 is not rejected for any of the models, which indicates that the error values are random. The M5P model obtains the highest P-Value for trends (0.749), although its P-Value for clustering (0.202) is the lowest of the three. On the trend test, therefore, the M5P residuals show the least evidence of a pattern, which further supports the conclusion that the prediction quality of M5P is better than that of the other two models.
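The run-chart analysis above judges residual randomness from approximate P-Values for clustering and trends. The Python sketch below implements the standard normal-approximation runs tests that run charts of this kind report: runs above and below the median (too few runs suggest clustering, too many suggest mixtures) and runs up or down (too few suggest trends, too many suggest oscillation). It assumes that points lying on the median and zero differences are dropped before counting; Minitab's exact handling of ties and any corrections may differ, so it is not guaranteed to reproduce the P-Values reported above. The function names are illustrative.

```python
import math
import statistics


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def count_runs(flags):
    """Number of maximal blocks of equal consecutive booleans (flags must be non-empty)."""
    return 1 + sum(1 for a, b in zip(flags, flags[1:]) if a != b)


def runs_about_median(residuals):
    """Approximate P-values (clustering, mixtures) from runs above/below the median.

    Assumption: points equal to the median are dropped before counting.
    """
    med = statistics.median(residuals)
    above = [r > med for r in residuals if r != med]
    n1 = sum(above)
    n2 = len(above) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("need points on both sides of the median")
    n = n1 + n2
    expected = 1 + 2 * n1 * n2 / n
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    z = (count_runs(above) - expected) / math.sqrt(variance)
    return normal_cdf(z), 1 - normal_cdf(z)  # few runs -> clustering, many -> mixtures


def runs_up_down(residuals):
    """Approximate P-values (trends, oscillation) from runs up or down.

    Assumption: zero differences are dropped; n counts the remaining points.
    """
    up = [b > a for a, b in zip(residuals, residuals[1:]) if b != a]
    if len(up) < 2:
        raise ValueError("need at least three points with non-zero differences")
    n = len(up) + 1
    expected = (2 * n - 1) / 3
    variance = (16 * n - 29) / 90
    z = (count_runs(up) - expected) / math.sqrt(variance)
    return normal_cdf(z), 1 - normal_cdf(z)  # few runs -> trends, many -> oscillation
```

A steadily increasing residual series gives very small P-Values for both clustering and trends, whereas residuals that alternate in sign give a clustering P-Value close to 1 and a small oscillation P-Value.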
Fig. 4. Run chart of linear regression model

V. CONCLUSION
This project proposes an approach to forecast rainfall in Badulla district, Sri Lanka. Towards that end, three data mining models were trained on rainfall data collected from Badulla District. From the evidence, it was concluded that the M5P model tree performed better than the linear regression and SMO regression models. In the testing process, the M5P model recorded the lowest MAE, RMSE, RRSE, and RAE and the highest DA of the three models, while in the training process its error values were close to those of linear regression. Only the M5P model shows a statistically significant positive correlation (0.41) between actual and predicted rainfall values, whereas the other two models show no significant correlation. Furthermore, the M5P residuals obtain the highest P-Value for trends in the run-chart analysis. Hence, the M5P model tree is proposed as the best model for weekly rainfall prediction in Badulla district.

Summarizing, weekly rainfall can be forecast six months ahead in Badulla District using the M5P model tree with reasonable accuracy. This finding is useful for many industries, particularly the agriculture sector in the Badulla area.

ACKNOWLEDGMENT
This work was supported by Research Grant 2016, Sabaragamuwa University of Sri Lanka.

REFERENCES
[1] “Four Types of Forecasting,” Sciencing. https://sciencing.com/four-types-forecasting-8155139.html (accessed Oct. 15, 2020).
[2] T. Mohamed and A. Ibrahim, “Time Series Analysis of Nyala Rainfall Using ARIMA Method,” vol. 17, 2016.
[3] D. Eni and F. Adeyeye, “Seasonal ARIMA Modeling and Forecasting of Rainfall in Warri Town, Nigeria,” Journal of Geoscience and Environment Protection, vol. 3, pp. 91–98, 2015, doi: 10.4236/gep.2015.36015.
[4] I. Mahmud, S. H. Bari, and M. T. U. Rahman, “Monthly rainfall forecast of Bangladesh using autoregressive integrated moving average method,” Environmental Engineering Research, 2016, doi: 10.4491/eer.2016.075.
[5] P. Burlando, R. Rosso, L. G. Cadavid, and J. D. Salas, “Forecasting of short-term rainfall using ARMA models,” Journal of Hydrology, vol. 144, no. 1, pp. 193–211, 1993, doi: 10.1016/0022-1694(93)90172-6.
[6] J. Joseph and T. K. Rathees, “Rainfall Prediction using Data Mining Techniques,” International Journal of Computer Applications, vol. 83, pp. 11–15, 2013, doi: 10.5120/14467-2750.
[7] M. A. I. Navid and N. H. Niloy, “Multiple Linear Regressions for Predicting Rainfall for Bangladesh,” Communications, vol. 6, no. 1, Art. no. 1, Feb. 2018, doi: 10.11648/j.com.20180601.11.
[8] E. Onyari and F. Ilunga, “Application of MLP Neural Network and M5P Model Tree in Predicting Streamflow: A Case Study of Luvuvhu Catchment, South Africa,” International Journal of Innovation, Management and Technology, vol. 4, pp. 11–15, 2013, doi: 10.7763/IJIMT.2013.V4.347.
[9] A. El-Shafie, H. Mazoghi, A. AbouKheira, and M. Taha, “Artificial neural network technique for rainfall forecasting applied to Alexandria, Egypt,” International Journal of the Physical Sciences, vol. 6, pp. 1306–1316, 2011.
[10] I. Khandelwal, R. Adhikari, and G. Verma, “Time Series Forecasting Using Hybrid ARIMA and ANN Models Based on DWT Decomposition,” Procedia Computer Science, vol. 48, pp. 173–179, 2015, doi: 10.1016/j.procs.2015.04.167.
[11] “Comparative Study of Rainfall Prediction Modeling Techniques (A Case Study on Srinagar, J&K, India),” The Research Publication. https://www.trp.org.in/issues/comparative-study-of-rainfall-prediction-modeling-techniques-a-case-study-on-srinagar-jk-india (accessed Oct. 15, 2020).
[12] N. Mishra, H. K. Soni, S. Sharma, and A. K. Upadhyay, “Development and Analysis of Artificial Neural Network Models for Rainfall Prediction by Using Time-Series Data,” IJISA, vol. 10, no. 1, pp. 16–23, Jan. 2018, doi: 10.5815/ijisa.2018.01.03.