
DECLARATION

I hereby declare that the project report on time series, “A Statistical Study on Rainfall Data of Lower Assam”, is the record of project work done by me under the guidance and supervision of Dr. Amit Choudhary, and that it has not previously formed the basis for the award of any degree, diploma, fellowship or any other similar title or recognition.

Date: 27/06/2023
Place: Guwahati

Sagarika Sarma
M.Sc. 4th Semester
Department of Statistics
Gauhati University

CERTIFICATE

This is to certify that Sagarika Sarma, a student of M.Sc. 4th Semester with Roll No. PS-221-832-0007, has successfully completed under my guidance and supervision the project work entitled “A Statistical Study on Rainfall Data of Lower Assam” in partial fulfilment of the M.Sc. course curriculum of Gauhati University. The work is original and no part of it has ever been published in any form.

Dr. Amit Choudhary
Professor, Department of Statistics
Gauhati University

ACKNOWLEDGEMENT

I would like to express my gratitude to my supervisor, Dr. Amit Choudhary, HOD, Department of Statistics, Gauhati University, for his valuable advice, precious suggestions, constant encouragement and unending support throughout the entire project.
I owe a deep sense of gratitude to Dr. Amit Choudhary (HOD, Department of Statistics, Gauhati University) for supporting me and providing me with the opportunity to embark on this project. Further, my acknowledgement cannot be concluded without extending my thanks to all the teachers of our department for their co-operation in carrying out the project successfully.
Lastly, I offer my thanks to my classmates and the non-teaching staff of our department for their kind co-operation.

Date: 27/06/2023
Place: Guwahati
Sagarika Sarma

CONTENTS

Declaration
Certificate
Acknowledgement
1. Introduction
2. Objectives
3. Data Sources
4. Methodology
5. Analysis and Results
6. Conclusions
References

INTRODUCTION
Assam experiences a monsoon-influenced tropical climate. The region
typically receives heavy rainfall during the monsoon season, which lasts from
June to September. The southwest monsoon brings intense rains, contributing to
the overall annual precipitation. The Brahmaputra River, flowing through
Assam, also influences the rainfall pattern. However, variations can occur,
impacting agriculture and the local ecosystem.
Assam, like many regions, faces climate change impacts such as erratic
rainfall, floods, and temperature variations. These changes affect agriculture,
biodiversity, and communities. Efforts to adapt and mitigate are crucial for
sustainable development in the face of climate challenges.
The impact of rainfall patterns in Assam can be significant, as they directly influence agriculture, ecosystems and communities in the region. Excessive rainfall can lead to floods, damaging crops and infrastructure. On the other hand, insufficient rainfall may result in drought conditions, affecting agriculture and water resources. Studying these patterns is therefore crucial for understanding and managing the associated challenges in Assam.
Studying rainfall in Assam matters for several reasons. It helps in understanding regional climate patterns, predicting floods or droughts, managing water resources and planning agriculture. It also contributes to infrastructure development, disaster preparedness and sustainable environmental management in the region.
This project aims to develop a comprehensive understanding of rainfall behaviour in the Lower Assam region of the state of Assam. This region consists of 11 districts. A statistical study of rainfall data involves several aspects:
• Temporal patterns: examining trends over time, seasonal variation and long-term changes in rainfall.
• Spatial distribution: how rainfall is distributed geographically over the eleven districts of the Lower Assam region.
• Predictive modelling: developing statistical models to predict future rainfall patterns based on historical data.
OBJECTIVES

The main objectives of the study are:


1. To describe the monthly and seasonal rainfall patterns of Lower Assam.
2. To describe the annual rainfall patterns of Lower Assam.
3. To determine the stationarity of the monthly rainfall data of Lower
Assam.
4. To forecast monthly rainfall for 2024-2026 using a suitable model.

DATA SOURCES

Monthly rainfall data for the districts of the Lower Assam region were collected for a period of 9 years (2015-2023) from the India Meteorological Department, Borjhar, Guwahati, Assam. The data used in this study are therefore secondary in nature. The data were transferred to MS Excel for organization, and the analysis was carried out using MS Excel and the R programming language. Rainfall is a continuous variable. No missing values were encountered in the dataset, so the analysis is based on a complete record.

METHODOLOGY
TREND ANALYSIS:
For many years, researchers have been interested in trend analysis of meteorological variables such as rainfall, temperature and relative humidity. Among the several non-parametric tests used to detect a trend in a time series, the Mann-Kendall test is the most widely used, and it is preferred over the others for detecting significant trends. In this study, trend analysis of rainfall over the selected area is carried out by applying the Mann-Kendall test to the monthly average rainfall series. The test determines whether the series shows an increasing trend, a decreasing trend or no trend.

MANN-KENDALL TEST:
The Mann-Kendall test is a non-parametric test used to detect linear and non-linear trends, as well as turning points, by means of Kendall's test statistic. The Mann-Kendall (MK) method detects a trend in time series data without requiring the trend to be specified as linear or non-linear. We wish to test the null hypothesis H0 of no trend against the alternative hypothesis H1 of a monotonic increasing or decreasing trend. In the MK test, x_k is indexed by k = 1, 2, ..., n−1 and x_j by j = k+1, ..., n. Each data point x_k is used as a reference point and is compared with all subsequent data points x_j, such that
$$\operatorname{sgn}(x_j - x_k) = \begin{cases} 1 & \text{if } x_j - x_k > 0 \\ 0 & \text{if } x_j - x_k = 0 \\ -1 & \text{if } x_j - x_k < 0 \end{cases}$$

The MK test statistic, S is given below:

$$S = \sum_{k=1}^{n-1} \sum_{j=k+1}^{n} \operatorname{sgn}(x_j - x_k)$$

where sgn(x_j − x_k) is the signum function. For large n, the test statistic S is approximately normally distributed with E(S) = 0 and variance

$$V(S) = \frac{n(n-1)(2n+5) - \sum_{i=1}^{m} t_i (t_i - 1)(2t_i + 5)}{18}$$

where n is the number of data points, m is the number of tied groups in the data set and t_i is the number of data values in the i-th tied group.
When the sample size n > 10, the standardised test statistic Z_c is computed as

$$Z_c = \begin{cases} \dfrac{S-1}{\sqrt{V(S)}} & \text{if } S > 0 \\ 0 & \text{if } S = 0 \\ \dfrac{S+1}{\sqrt{V(S)}} & \text{if } S < 0 \end{cases}$$

where Z_c approximately follows N(0, 1). A positive value of Z_c indicates an increasing trend, and a negative value indicates a decreasing trend. When |Z_c| > Z_{1−α/2}, the null hypothesis is rejected and a significant trend exists in the time series. The present study uses a significance level of 5% (α = 0.05). The null hypothesis of no trend is therefore rejected if |Z_c| > 1.96.
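The procedure above can be sketched in a few lines of Python. This is a minimal illustration that uses only the standard library; it is not the R routine used in the study. The function name `mann_kendall` is hypothetical. Tied groups are taken to be exact repeats of a value, and the p-value uses the normal approximation with the continuity correction given above.

```python
import math
from collections import Counter


def mann_kendall(x):
    """Mann-Kendall trend test with tie-corrected variance.

    Returns S, V(S), Z_c and the two-sided p-value (normal approximation).
    """
    n = len(x)
    # S = sum over all pairs k < j of sgn(x_j - x_k)
    s = 0
    for k in range(n - 1):
        for j in range(k + 1, n):
            diff = x[j] - x[k]
            s += (diff > 0) - (diff < 0)

    # Tie correction: t_i is the size of the i-th group of equal values
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(x).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18

    # Continuity-corrected standardised statistic Z_c
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, var_s, z, p


# Example: a strictly increasing series of 12 values
s, var_s, z, p = mann_kendall(list(range(1, 13)))
print(s, round(z, 3), p < 0.05)
```

In a strictly increasing series every pair increases, so S = 12·11/2 = 66 and |Z_c| exceeds 1.96.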

ARIMA MODEL:
ARIMA models are widely used in time series analysis for forecasting and
understanding the dynamics of a sequence of data points over time. These
models are used to identify trends and patterns in historical data, as well as the processes behind these patterns and trends.
AUTOREGRESSIVE (AR) MODEL:
In an autoregressive model, a value of a time series is regressed on preceding values of the same time series. The value of the response variable in the preceding time period becomes the predictor, and the errors have the same assumptions as in simple linear regression models.

Let {Y_t}, t = 1, 2, 3, ..., N, be a time series observed over N time periods, and let µ be the mean of Y. Then Y_t follows a first-order autoregressive AR(1) stochastic process if it is modelled as

$$Y_t - \mu = \phi_1 (Y_{t-1} - \mu) + e_t$$

where e_t is an independent random error with mean 0 and variance σ².


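In general, the AR(p) model, with Y_t measured as a deviation from its mean, is represented as

$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + e_t$$

The AR model can also be written using the backshift operator B as

$$\phi(B) Y_t = e_t, \qquad \phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p$$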
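The recursion and its backshift form can be checked numerically. The sketch below uses hypothetical function names. It generates a zero-mean AR(p) series from a given error sequence, taking pre-sample values as zero, and then applies the AR operator to the series to recover the errors.

```python
def ar_generate(errors, phi):
    """Generate Y_t = phi_1 Y_{t-1} + ... + phi_p Y_{t-p} + e_t.

    Values before the start of the series are taken as zero.
    """
    y = []
    for t, e in enumerate(errors):
        ar_part = sum(phi[i] * y[t - 1 - i] for i in range(len(phi)) if t - 1 - i >= 0)
        y.append(ar_part + e)
    return y


def apply_ar_operator(y, phi):
    """Apply the AR operator: Y_t - phi_1 Y_{t-1} - ... - phi_p Y_{t-p}, for t >= p."""
    p = len(phi)
    return [y[t] - sum(phi[i] * y[t - 1 - i] for i in range(p)) for t in range(p, len(y))]


errors = [1.0, 0.0, 0.0, 0.0, 0.0]
y = ar_generate(errors, [0.5])
print(y)                            # [1.0, 0.5, 0.25, 0.125, 0.0625]
print(apply_ar_operator(y, [0.5]))  # [0.0, 0.0, 0.0, 0.0], i.e. errors[1:]
```

A single shock decays geometrically but never vanishes completely. This long memory distinguishes the AR process from the MA process described next.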

MOVING AVERAGE (MA) PROCESS:

The moving average model, or moving-average process, is a simple approach to modelling a univariate time series. Unlike the AR model, a finite-order MA model is always stationary. The output of a moving-average model is a weighted combination of a white-noise input e_t and its past values, with weights θ_1, ..., θ_q.
Let {Y_t}, t = 1, 2, 3, ..., N, be a time series observed over N time periods, and let µ be the mean of Y. Then Y_t follows a first-order moving average MA(1) process if it is modelled as

$$Y_t = \mu + e_t + \theta_1 e_{t-1}$$

where µ is the mean. In general, an MA(q) process is defined as

$$Y_t = \mu + e_t + \theta_1 e_{t-1} + \cdots + \theta_q e_{t-q}$$

An MA process therefore depends only on the error terms: Y_t is a moving average of the present and past errors.
The MA model can also be written using the backshift operator B as

$$Y_t - \mu = \theta(B) e_t, \qquad \theta(B) = 1 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_q B^q$$
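A matching sketch for the MA(q) process follows. The function name is hypothetical, and pre-sample errors are set to zero. Feeding in a single unit shock shows the finite memory of the process: the shock affects only the next q values. This finite memory is behind the stationarity of any finite-order MA model.

```python
def ma_generate(errors, theta, mu=0.0):
    """Y_t = mu + e_t + theta_1 e_{t-1} + ... + theta_q e_{t-q} (pre-sample errors = 0)."""
    q = len(theta)
    return [
        mu + errors[t] + sum(theta[j] * errors[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        for t in range(len(errors))
    ]


# A unit shock passes through an MA(1) with theta_1 = 0.4 and then disappears
print(ma_generate([1.0, 0.0, 0.0, 0.0], [0.4]))  # [1.0, 0.4, 0.0, 0.0]
```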

AUTOREGRESSIVE INTEGRATED MOVING AVERAGE (ARIMA) MODEL:
The autoregressive integrated moving average (ARIMA) model is a generalization of the autoregressive moving average (ARMA) model. The ARIMA model is also used for understanding and predicting the future values of a time series (Chattopadhyay et al., 2012). The difference between the ARIMA and ARMA models is that ARIMA handles time series data that show evidence of non-stationarity (Chaudhury and Dutta, 2014). The non-stationarity can be removed by applying a differencing step one or more times.

The autoregressive integrated moving average model is expressed as ARIMA(p, d, q). Here p is the autoregressive (AR) order, which deals with regression on prior values. q is the moving average (MA) order, which deals with a linear combination of error terms occurring at various times in the past (Al Balasmeh et al., 2019). The integrated order d is the number of times the series is differenced (each value minus its preceding value) to attain stationarity. These orders are chosen so that the model fits the data well. The ARIMA model can be represented as

$$\phi(B) \nabla^d Y_t = \theta(B) e_t$$

where

$$\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p, \qquad \theta(B) = 1 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_q B^q$$

Here Y_t and e_t represent the time series and the error term at time t respectively, and B is the backward shift operator. ∇^d = (1 − B)^d denotes the differencing applied to the series to make it stationary, with d the number of differences. φ(B) and θ(B) are the AR and MA operators of orders p and q, with parameters φ_1, ..., φ_p and θ_1, ..., θ_q.

Time series data may have a seasonal effect, which can be removed by seasonal differencing. The ARIMA model has two general forms; the model with seasonality is known as the seasonal ARIMA model, represented as SARIMA. The autocorrelation function (ACF) measures the correlation between a time series and its lagged values. It provides information about the relationship between observations at different time points. The partial autocorrelation function (PACF) measures the correlation between a time series and its lagged values after removing the effects of the intermediate lags. It therefore describes the direct relationship between observations at different time points once the contributions of the intermediate lags are accounted for.
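A minimal standard-library sketch of the sample ACF and PACF follows. The PACF is computed here with the Durbin-Levinson recursion, one standard way to obtain it; plotting software may use a different but equivalent method. The ±1.96/√n band is the usual approximate 95% limit drawn on ACF/PACF plots. The toy series, a pure 12-month sine wave, is hypothetical.

```python
import math


def acf(x, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t + k] for t in range(n - k)) / c0 for k in range(max_lag + 1)]


def pacf(x, max_lag):
    """Sample partial autocorrelations phi_kk via the Durbin-Levinson recursion.

    Element 0 is set to 1.0 by convention; element k is phi_kk.
    """
    r = acf(x, max_lag)
    result = [1.0]
    phi_prev = []  # coefficients phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}, then append phi_kk
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result


# Toy monthly series with a 12-month cycle
x = [math.sin(2 * math.pi * t / 12) for t in range(48)]
limit = 1.96 / math.sqrt(len(x))  # approximate 95% limits drawn on ACF/PACF plots
r = acf(x, 24)
print([k for k in range(1, 25) if abs(r[k]) > limit])
print([round(v, 2) for v in pacf(x, 2)])
```

In a seasonal series, the ACF peaks again at lag 12. The sine wave above has r_12 = 0.75, which is the kind of spike the results section reads off the ACF plot.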
FITTING AN ARIMA MODEL:
Model identification, model selection, diagnostic checking and forecasting are the stages that make up the ARIMA modelling process. A brief overview of the model identification process for a SARIMA model is given below.
1. Stationarity Analysis
The first step is to check whether the time series is stationary. Stationarity implies that the statistical properties of the series, such as its mean and variance, remain constant over time. Stationarity is tested with the Augmented Dickey-Fuller (ADF) test. The test is based on a regression of the form Δy_t = α + γ y_{t−1} + Σ_{i=1}^{k} δ_i Δy_{t−i} + e_t. The hypotheses are
H₀: γ = 0 (the series has a unit root, i.e. it is not stationary)
H₁: γ < 0 (the series has no unit root, i.e. it is stationary)
The test statistic for the ADF test is

$$DF = \frac{\hat{\gamma}}{SE(\hat{\gamma})}$$

where γ̂ is the ordinary least squares estimate of γ and SE(γ̂) is its standard error. If the DF value is less than (more negative than) the critical value, H₀ is rejected; otherwise it is not rejected. If the series is non-stationary, differencing is applied to make it stationary.
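The following is a simplified sketch of the statistic. It assumes the plain, non-augmented Dickey-Fuller regression with a constant, Δy_t = α + γ y_{t−1} + e_t. The augmented test also includes lagged differences Δy_{t−i} as extra regressors, which this sketch omits. The function name is hypothetical. The resulting value must be compared with Dickey-Fuller critical values, not with the normal table; for the constant-only case the large-sample critical value is about −2.86 at the 5% level.

```python
import math
import random


def dickey_fuller_stat(y):
    """t-ratio gamma_hat / SE(gamma_hat) from OLS of dy_t = alpha + gamma * y_{t-1} + e_t."""
    x = y[:-1]                                         # y_{t-1}
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]   # delta y_t
    n = len(dy)
    mx, md = sum(x) / n, sum(dy) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (di - md) for xi, di in zip(x, dy))
    gamma = sxy / sxx                                  # OLS slope
    alpha = md - gamma * mx                            # OLS intercept
    rss = sum((di - alpha - gamma * xi) ** 2 for xi, di in zip(x, dy))
    se_gamma = math.sqrt(rss / (n - 2) / sxx)          # standard error of the slope
    return gamma / se_gamma


# Toy stationary AR(1) series with phi = 0.2 (true gamma = -0.8)
rng = random.Random(0)
y = [0.0]
for _ in range(200):
    y.append(0.2 * y[-1] + rng.gauss(0, 1))
print(round(dickey_fuller_stat(y), 2))  # far below -2.86, so H0 is rejected
```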
2. Differencing
The order of differencing, denoted d in SARIMA(p, d, q)(P, D, Q)m, represents the number of times differencing is required to achieve stationarity. Differencing helps remove trends or seasonal patterns from the data.

3. Seasonality Analysis
If the time series exhibits a seasonal pattern, seasonal differencing is required. The seasonal period is denoted m. The order of seasonal differencing, denoted D, determines the number of times seasonal differencing (Y_t − Y_{t−m}) is applied to remove the seasonal component. The model with seasonality is written as SARIMA(p, d, q)(P, D, Q)m. The model without seasonality is the non-seasonal ARIMA model, written as ARIMA(p, d, q).
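The operators (1 − B)^d and (1 − B^m)^D can both be sketched as repeated lagged subtraction. The function name and the toy monthly series below are hypothetical. The series is a fixed 12-month pattern plus a linear trend. It shows that one seasonal difference with m = 12 removes a fixed seasonal pattern and turns a linear trend into a constant.

```python
def difference(y, lag=1, times=1):
    """Apply (1 - B^lag)^times: each pass replaces y_t by y_t - y_{t-lag}."""
    for _ in range(times):
        y = [y[t] - y[t - lag] for t in range(lag, len(y))]
    return y


# Toy monthly series: fixed seasonal pattern plus a linear trend, 3 years long
pattern = [1, 2, 5, 9, 14, 20, 18, 12, 8, 4, 2, 1]
y = [pattern[t % 12] + 0.5 * t for t in range(36)]

seasonal = difference(y, lag=12, times=1)  # D = 1, m = 12
print(seasonal[:3], len(seasonal))         # [6.0, 6.0, 6.0] 24
```

Each seasonal difference shortens the series by m observations, and each ordinary difference shortens it by one.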
Seasonal Autoregressive Integrated Moving Average (SARIMA) Model
The seasonal ARIMA is an extension of ARIMA that explicitly handles time series data with seasonal components. The seasonal component of the model is composed of terms of the same form as the non-seasonal components. Seasonal ARIMA adds three new parameters for the seasonal component of the series: autoregression (AR), differencing (I) and moving average (MA). It also adds an extra parameter for the period of seasonality. Seasonal ARIMA can be written as SARIMA(p, d, q)(P, D, Q)m. It consists of two components, non-seasonal and seasonal (Hung Ken et al. 1998).
The non-seasonal part is represented as (p, d, q):
- p is the autoregressive order;
- d is the number of differences needed to attain stationarity;
- q is the moving average order.
The seasonal part is indicated as (P, D, Q)m:
- P is the autoregressive order of the seasonal component;
- D is the number of seasonal differences (each value minus the value m periods earlier) needed to make the seasonal component stationary;
- Q is the moving average order of the seasonal component;
- m is the period of seasonality in the data.
The seasonal ARIMA model can be represented by

$$\Phi_P(B^m)\, \phi_p(B)\, \nabla_m^D \nabla^d Y_t = \Theta_Q(B^m)\, \theta_q(B)\, e_t$$

where

$$\Phi_P(B^m) = 1 - \Phi_1 B^m - \Phi_2 B^{2m} - \cdots - \Phi_P B^{Pm}$$

represents the seasonal autoregressive operator of order P,

$$\phi_p(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p$$

denotes the non-seasonal autoregressive operator of order p,

$$\nabla_m^D = (1 - B^m)^D, \qquad \nabla^d = (1 - B)^d, \qquad B^k Y_t = Y_{t-k},$$

$$\Theta_Q(B^m) = 1 + \Theta_1 B^m + \Theta_2 B^{2m} + \cdots + \Theta_Q B^{Qm}$$

indicates the seasonal moving average operator of order Q, and

$$\theta_q(B) = 1 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_q B^q$$

denotes the non-seasonal moving average operator of order q.

Here, e_t represents the error terms, which are independently and identically distributed with mean zero and variance σ_e². m is the seasonal period and B is the backshift operator (Murthy,s).
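Multiplying out the operators shows which past values enter the model. The sketch below expands the product of the non-seasonal and seasonal AR operators for p = 1, P = 2 and m = 12. It plugs in the AR1, SAR1 and SAR2 estimates from the coefficient table in the results. It assumes these estimates follow the same "1 − ..." sign convention as the operators above, as R's `arima` does. The helper names are hypothetical.

```python
def poly_mul(a, b):
    """Multiply two polynomials in B given as coefficient lists [c_0, c_1, ...]."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def ar_operator(coefs, m=1):
    """1 - c_1 B^m - c_2 B^{2m} - ... as a coefficient list."""
    poly = [0.0] * (len(coefs) * m + 1)
    poly[0] = 1.0
    for i, c in enumerate(coefs, start=1):
        poly[i * m] = -c
    return poly


phi = ar_operator([0.5591])                          # non-seasonal, p = 1
seasonal_phi = ar_operator([-0.5596, -0.1309], m=12)  # seasonal, P = 2
combined = poly_mul(seasonal_phi, phi)
for lag, c in enumerate(combined):
    if c != 0.0:
        print(lag, round(c, 4))
```

The non-zero coefficients sit at lags 0, 1, 12, 13, 24 and 25. A model with only p = 1 and P = 2 therefore still links each month to values up to 25 months earlier, before the seasonal difference is applied.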

ANALYSIS AND RESULTS

Annual Rainfall:
The annual rainfall distribution in Lower Assam over the 9-year period 2015-2023 is illustrated in this graph. The average annual rainfall during the study period is 2497.76 mm. The maximum annual rainfall, 3472.08 mm, occurred in 2020 and the minimum, 1836.86 mm, occurred in 2021.

Monthly Variation of rainfall:


The rainfall pattern of individual months over the 9-year period 2015-2023 is illustrated in this graph. It shows that June has the most rainfall (7353.611 mm), followed by July (5237.789 mm) and May (4560.97 mm). December has the least rainfall (83.25 mm), followed by January (84.42 mm) and November (98.28 mm).

Quarterly variation of rainfall:
The quarterly rainfall distribution for the period 2015-2023 is illustrated in this graph. The 2nd quarter (Apr-Jun) received the most rainfall (131937.3 mm), followed by the 3rd quarter (Jul-Sep, 115036.8 mm) and the 4th quarter (Oct-Dec, 14305.1 mm). The 1st quarter (Jan-Mar) received the least rainfall (8479.8 mm).

From the above graph, the highest monthly rainfall (12852.7 mm) occurred in June 2022, the second highest (10120.8 mm) in July 2020 and the third highest (9515.5 mm) in June 2020.

Mann-Kendall Test:
Kendall's Tau statistic is -0.0315, which is very close to 0. This suggests
there is little to no monotonic trend (either upward or downward) in the data
over time.
The two-sided p-value is 0.63085, which is much greater than the commonly
used significance level of 0.05. This means we fail to reject the null hypothesis
that there is no trend present.
The Mann-Kendall test results do not provide evidence of a statistically
significant upward or downward trend in the data. The data appears to be
relatively stable over time, without a clear monotonic trend.
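As a rough consistency check, the reported Kendall's tau and p-value can be related through the formulas in the methodology. The sketch below rests on three assumptions. The series has n = 108 values (9 years × 12 months). τ = S / [n(n−1)/2], which holds when there are no ties. Tie corrections are negligible for continuous rainfall data. Under these assumptions the recovered two-sided p-value is about 0.631, close to the reported 0.63085.

```python
import math

n = 108        # assumed length: 9 years x 12 months
tau = -0.0315  # Kendall's tau reported above

pairs = n * (n - 1) // 2                 # number of (k, j) pairs
s = round(tau * pairs)                   # S = tau * n(n-1)/2 when there are no ties
var_s = n * (n - 1) * (2 * n + 5) / 18   # V(S) without tie correction

if s > 0:
    z = (s - 1) / math.sqrt(var_s)
elif s < 0:
    z = (s + 1) / math.sqrt(var_s)
else:
    z = 0.0

p = math.erfc(abs(z) / math.sqrt(2))     # two-sided p-value
print(s, round(z, 3), round(p, 3))       # -182 -0.481 0.631
```

Since |Z_c| ≈ 0.48 is far below 1.96, the check agrees with the conclusion that there is no significant trend.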

SARIMA model for monthly rainfall data of Lower Assam:


The above graph shows repeating patterns, from which we conclude that there is seasonal variation in the data and no visible trend.

Augmented Dickey-Fuller Test:

To check whether the rainfall data are stationary, we perform the ADF test. The test statistic and p-value obtained are given below.

Test        ADF test statistic    p-value
ADF test    -7.7051               0.01

The null hypothesis of a unit root is rejected because the p-value is less than 0.05. Hence, the monthly rainfall time series is stationary.

ACF and PACF plot:

The ACF of the rainfall series shows a sinusoidal pattern, with significant spikes at intervals of 12 months, indicating that seasonality exists in the data. The seasonal fluctuations therefore repeat every 12 months, giving m = 12. Seasonal differencing (D = 1) is required to make the time series seasonally stationary before fitting the model. Since the ADF test indicates that the series is stationary, the non-seasonal component does not need to be differenced; hence d = 0.

Model Identification:
The PACF plot shows significant partial autocorrelation at lag 1, indicating that a non-seasonal autoregressive order of p = 1 would be appropriate. It also shows significant partial autocorrelation at the seasonal lags 12 and 24, indicating P = 2.
Similarly, the ACF plot shows significant autocorrelation at lags 1 and 9, indicating a non-seasonal MA order of q = 1 or q = 9. It also shows significant autocorrelation at the seasonal lag 12, indicating Q = 1.
Based on these results, we estimated the candidate models and then determined their AIC values:

SARIMA model                   AIC value
SARIMA(1, 0, 9)(2, 1, 1)12     1560.77
SARIMA(1, 0, 1)(2, 1, 1)12     1558.65

Here, SARIMA(1, 0, 1)(2, 1, 1)12 has the lower AIC value.
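AIC trades goodness of fit against the number of estimated parameters: AIC = −2 ln L̂ + 2k, and the model with the smaller value is preferred. Below is a minimal sketch of that selection rule using the two AIC values reported above. The log-likelihoods themselves are not reported, so the hypothetical `aic` helper only makes the definition explicit.

```python
def aic(log_likelihood, k):
    """Akaike information criterion: -2 ln L + 2k, with k estimated parameters."""
    return -2.0 * log_likelihood + 2 * k


reported_aic = {
    "SARIMA(1, 0, 9)(2, 1, 1)12": 1560.77,
    "SARIMA(1, 0, 1)(2, 1, 1)12": 1558.65,
}
best = min(reported_aic, key=reported_aic.get)
print(best)  # SARIMA(1, 0, 1)(2, 1, 1)12
```

The (1, 0, 9) model has eight more MA coefficients than the (1, 0, 1) model, so it pays an extra penalty of 16 AIC units. Its better fit does not make up for this penalty.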
Next, we carry out diagnostic checking of the SARIMA(1, 0, 1)(2, 1, 1)12 model.

Model diagnostic checking:


ADF test for stationarity:

p-value: 0.01

The ACF plot of the residuals shows that almost all of the autocorrelations are within the threshold limits, indicating that the residuals behave as white noise.
The histogram of the residuals shows that they are approximately normally distributed.
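A portmanteau test is a common numerical complement to reading the residual ACF plot. It was not part of the analysis reported here. The sketch below computes the Ljung-Box statistic Q = n(n+2) Σ_{k=1}^{h} r_k²/(n−k). Under white noise, Q is approximately chi-square with h − (p + q + P + Q) degrees of freedom, i.e. h − 5 for this model. In R the same statistic is available through `Box.test(..., type = "Ljung-Box", fitdf = 5)`. The Python function name is hypothetical.

```python
def ljung_box(residuals, h):
    """Ljung-Box Q statistic over lags 1..h."""
    n = len(residuals)
    mean = sum(residuals) / n
    d = [r - mean for r in residuals]
    c0 = sum(v * v for v in d)
    q = 0.0
    for k in range(1, h + 1):
        r_k = sum(d[t] * d[t + k] for t in range(n - k)) / c0  # lag-k autocorrelation
        q += r_k * r_k / (n - k)
    return n * (n + 2) * q


# Strongly autocorrelated "residuals" give a very large Q
alternating = [1.0, -1.0] * 50
print(round(ljung_box(alternating, 12), 1))
```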

From the above diagnostics, it can be concluded that the SARIMA(1, 0, 1)(2, 1, 1)12 model is capable of producing reliable rainfall forecasts.

Model                         Parameter    Coefficient
SARIMA(1, 0, 1)(2, 1, 1)12    AR1           0.5591
                              MA1          -0.463
                              SAR1         -0.5596
                              SAR2         -0.1309
                              SMA1         -0.7254

Forecasting:
The SARIMA(1, 0, 1)(2, 1, 1)12 model, selected on the basis of the lowest AIC value, is used to forecast monthly rainfall in Lower Assam from January 2024 to December 2026. The forecasts are shown in the table below.

Year    Month    Rainfall forecast (mm)
2024    JAN      60.364
2024    FEB      145.0316364
2024    MAR      1018.298911
2024    APR      1921.103904
2024    MAY      2307.042845
2024    JUN      4396.341133
2024    JUL      3405.408532
2024    AUG      1190.999162
2024    SEP      936.8263764
2024    OCT      698.0972897
2024    NOV      43.5446777
2024    DEC      38.5505628
2025    JAN      37.6439282
2025    FEB      10.5508969
2025    MAR      207.1116311
2025    APR      267.6195713
2025    MAY      843.9668535
2025    JUN      1874.157516
2025    JUL      669.6851799
2025    AUG      46.6213344
2025    SEP      41.2268835
2025    OCT      4.8340939
2025    NOV      0.1681774
2025    DEC      10.1021208
2026    JAN      38.4591371
2026    FEB      68.8568286
2026    MAR      213.1549974
2026    APR      499.218778
2026    MAY      630.4826518
2026    JUN      1682.237626
2026    JUL      892.7914922
2026    AUG      304.4650696
2026    SEP      139.0545002
2026    OCT      95.8835406
2026    NOV      31.7402439
2026    DEC      11.456
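To show how point forecasts of this kind are produced in principle, the sketch below does three things. It expands the seasonal and non-seasonal AR operators together with the differencing operators, expands the MA operators, and then runs the forecasting recursion with future errors set to zero. It rests on several assumptions: the "1 − ..." (AR) and "1 + ..." (MA) sign conventions of the methodology, no intercept, and in-sample residuals supplied as input. The function names and the `history`/`residuals` placeholders are hypothetical. The monthly data are not reproduced in this report, so the sketch cannot reproduce the table above. R's own forecasting code also handles start-up values differently, so results could differ slightly.

```python
def poly_mul(a, b):
    """Multiply polynomials in B given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def lag_poly(coefs, m, sign):
    """1 + sign * (c_1 B^m + c_2 B^{2m} + ...) as a coefficient list."""
    poly = [0.0] * (len(coefs) * m + 1)
    poly[0] = 1.0
    for i, c in enumerate(coefs, start=1):
        poly[i * m] = sign * c
    return poly


def sarima_forecast(y, e, phi, theta, sphi, stheta, d, D, m, h):
    """h-step point forecasts from a SARIMA(p,d,q)(P,D,Q)m model without intercept.

    y: observed series; e: in-sample residuals aligned with y.
    AR operators use '1 - ...', MA operators use '1 + ...'.
    """
    ar = poly_mul(lag_poly(phi, 1, -1.0), lag_poly(sphi, m, -1.0))
    for _ in range(d):
        ar = poly_mul(ar, [1.0, -1.0])               # (1 - B)
    for _ in range(D):
        ar = poly_mul(ar, lag_poly([1.0], m, -1.0))  # (1 - B^m)
    ma = poly_mul(lag_poly(theta, 1, 1.0), lag_poly(stheta, m, 1.0))

    y, e = list(y), list(e)
    forecasts = []
    for _ in range(h):
        t = len(y)
        value = -sum(ar[k] * y[t - k] for k in range(1, len(ar)) if t - k >= 0)
        value += sum(ma[j] * e[t - j] for j in range(1, len(ma)) if t - j >= 0)
        y.append(value)
        e.append(0.0)  # future errors have expectation zero
        forecasts.append(value)
    return forecasts


# Coefficients from the parameter table, in the sign convention of the operators
fitted = dict(phi=[0.5591], theta=[-0.463], sphi=[-0.5596, -0.1309],
              stheta=[-0.7254], d=0, D=1, m=12)
# With the observed series `history` and its residuals `residuals`:
#     sarima_forecast(history, residuals, h=36, **fitted)

# Toy check: with no ARMA terms and D = 1, m = 12 the recursion repeats last year
toy = [float(v) for v in range(24)]
print(sarima_forecast(toy, [0.0] * 24, [], [], [], [], d=0, D=1, m=12, h=3))  # [12.0, 13.0, 14.0]
```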

The forecasted values for 2024, 2025 and 2026 are plotted below.

[Figure: forecast of monthly rainfall, January 2024 to December 2026]

CONCLUSIONS

The rainfall in Lower Assam does not exhibit a significant overall trend
during the period 2015-2023.
Our selected SARIMA(1, 0, 1)(2, 1, 1)12 model provides a suitable framework for understanding the patterns, trends and seasonal variations in the monthly rainfall data of Lower Assam.
The residuals of the SARIMA(1, 0, 1)(2, 1, 1)12 model were found to be white noise and approximately normally distributed, indicating that the model assumptions were satisfied.
The SARIMA(1, 0, 1)(2, 1, 1)12 model gives reliable rainfall forecasts that can support decision-making for water storage, irrigation scheduling and crop planning to optimize agricultural productivity. Accurate short-term and medium-term rainfall predictions can help mitigate the impacts of extreme weather events such as floods, droughts and landslides.
These forecasts can also support sustainable water resource management and climate change adaptation strategies in Lower Assam.
Other factors, such as climate change, topography, atmospheric conditions and regional influences, may also play a significant role in determining future rainfall patterns. These aspects may not be adequately captured by the model. The forecasts should therefore be interpreted with caution and treated as probabilistic estimates rather than deterministic predictions.

REFERENCES

For methodology:
Kabbilawsh P., Satish Kumar D. and Chithra N. R. (2022). Forecasting long term monthly precipitation using SARIMA model. Journal of Earth System Science.

For analysis:
Deka S. (2021). Trend analysis of rainfall and temperature data for India. Current Science.
Mahsin M., Akhter Y. and Begum M. (2012). Modelling rainfall in Dhaka division of Bangladesh using time series analysis. Journal of Mathematical Modelling and Application, Vol. 1, No. 5, 67-73.
Mitu K. N. and Hasan K. (2021). Modelling and forecasting temperature time series in the Memphis, Tennessee. International Journal of Environmental Monitoring and Analysis, 9(6): 214-221.
