TIME SERIES ANALYSIS OF SAUDI ARABIA OIL PRODUCTION DATA
A THESIS
SUBMITTED TO THE GRADUATE SCHOOL
IN PARTIAL FULFILLMENT OF THE
REQUIREMENTS
FOR THE DEGREE
MASTER OF SCIENCE
BY
ABDULMAJEED ALBARRAK
ADVISER DR. RAHMATULLAH IMON
BALL STATE UNIVERSITY
MUNCIE, INDIANA
DECEMBER, 2013
Time Series Analysis of Saudi Arabia Oil Production Data
A THESIS
MASTER OF SCIENCE
By
Abdulmajeed Albarrak
Committee Approval:
………………………………………………………………………………………………….
………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
Muncie, Indiana
November 2013
ACKNOWLEDGEMENTS
Foremost, I would like to express my sincere gratitude to my advisor, Professor Dr. Rahmatullah
Imon, for his continuous support of my thesis study and for his patience, motivation, enthusiasm, and
immense knowledge. His guidance helped me throughout my analysis and the writing of this
report. I could not have imagined having a better advisor and mentor for my thesis.
Besides my advisor, I would like to thank the rest of my thesis committee, Dr. Dale
Umbach and Dr. Munni Begum, for their encouragement, insightful comments and patience. I am
thankful to all my classmates for their kind support. Last but not least, I would like to thank
Abdulmajeed Albarrak
November 3, 2013
ABSTRACT
THESIS PAPER: Time series analysis of Saudi Arabia oil production data
PAGES: 123
Saudi Arabia is the largest petroleum producer and exporter in the world. The Saudi Arabian
economy depends heavily on the production and export of oil. This motivates us to study the oil
production of Saudi Arabia. The prime objective of our research is to find the most appropriate
models for analyzing Saudi Arabia oil production data. Initially we consider
autoregressive integrated moving average (ARIMA) models to fit the data. But most of the
variables under study show some kind of volatility and for this reason we finally decide to
ARCH effect, it will automatically become an ARIMA model. But the existence of missing
values for almost every variable complicates the analysis, since the estimation of
parameters in an ARCH model does not converge when observations are missing. As a remedy
to this problem we estimate the missing observations first. We employ the expectation maximization
(EM) algorithm for estimating the missing values. But since our data are time series data, a
simple EM algorithm is not appropriate for them. There is also evidence of the presence of
outliers in the data. Therefore we finally employ a robust regression least trimmed squares (LTS)
based EM algorithm to estimate the missing values. After the estimation of missing values we
employ the White test to select the most appropriate ARCH models for all sixteen variables
under study. A normality test on the resulting residuals is performed for each of the variables to check
Table 4.14.2 Order of ARCH Using the White Test for the Export of Refined Oil to Asia and Far East Data
Table 4.14.3 Normality Test of ARCH (1) Residuals for the Export of Refined Oil to Asia and Far East Data
Table 4.15.1 The ACF and PACF Values for the Export of Crude Oil to Oceania Data
Table 4.15.2 Order of ARCH Using the White Test for the Export of Crude Oil to Oceania Data
Table 4.15.3 Normality Test of ARCH (2) Residuals for the Export of Crude Oil to Oceania Data
Table 4.16.1 The ACF and PACF Values for the Export of Refined Oil to Oceania Data
Table 4.16.2 Order of ARCH Using the White Test for the Export of Refined Oil to Oceania Data
Table 4.16.3 Normality Test of ARCH (1) Residuals for the Export of Refined Oil to Oceania Data
Table 4.17 Selected Models for Saudi Arabia Oil Production Data
List of Figures
CHAPTER 1
Figure 1.1 Time Series Plot of Crude Oil Production
Figure 1.2 Time Series Plot of Export of Refined Oil
Figure 1.3 Time Series Plot of Export of Crude Oil to North America
Figure 1.4 Time Series Plot of Export of Refined Oil to North America
Figure 1.5 Time Series Plot of Export of Crude Oil to South America
Figure 1.6 Time Series Plot of Export of Refined Oil to South America
Figure 1.7 Time Series Plot of Export of Crude Oil to Western Europe
Figure 1.8 Time Series Plot of Export of Refined Oil to Western Europe
Figure 1.9 Time Series Plot of Export of Crude Oil to Middle East
Figure 1.10 Time Series Plot of Export of Refined Oil to Middle East
Figure 1.11 Time Series Plot of Export of Crude Oil to Africa
Figure 1.12 Time Series Plot of Export of Refined Oil to Africa
Figure 1.13 Time Series Plot of Export of Crude Oil to Asia and Far East
Figure 1.14 Time Series Plot of Export of Refined Oil to Asia and Far East
Figure 1.15 Time Series Plot of Export of Crude Oil to Oceania
Figure 1.16 Time Series Plot of Export of Refined Oil to Oceania
CHAPTER 3
Figure 3.1 Time Series Plot of Original and Missing Value Estimated Export of Crude Oil to North America
Figure 3.2 Time Series Plot of Original and Missing Value Estimated Export of Refined Oil to North America
Figure 3.3 Time Series Plot of Original and Missing Value Estimated Export of Crude Oil to South America
Figure 3.4 Time Series Plot of Original and Missing Value Estimated Export of Refined Oil to South America
Figure 3.5 Time Series Plot of Original and Missing Value Estimated Export of Crude Oil to Western Europe
Figure 3.6 Time Series Plot of Original and Missing Value Estimated Export of Refined Oil to Western Europe
Figure 3.7 Time Series Plot of Original and Missing Value Estimated Export of Crude Oil to Middle East
Figure 3.8 Time Series Plot of Original and Missing Value Estimated Export of Refined Oil to Middle East
Figure 3.9 Time Series Plot of Original and Missing Value Estimated Export of Crude Oil to Africa
Figure 3.10 Time Series Plot of Original and Missing Value Estimated Export of Refined Oil to Africa
Figure 3.11 Time Series Plot of Original and Missing Value Estimated Export of Crude Oil to Asia and Far East
Figure 3.12 Time Series Plot of Original and Missing Value Estimated Export of Refined Oil to Asia and Far East
Figure 3.13 Time Series Plot of Original and Missing Value Estimated Export of Crude Oil to Oceania
Figure 3.14 Time Series Plot of Original and Missing Value Estimated Export of Refined Oil to Oceania
CHAPTER 4
Figure 4.1.1 The ACF and PACF Values for the Crude Oil Production Data
Figure 4.1.2 Normal Probability Plot of ARCH (1) Residuals for the Crude Oil Production Data
Figure 4.1.3 The LTS ACF and PACF Values for the Crude Oil Production Data
Figure 4.1.4 Normal Probability Plot of ARCH (1) LTS Residuals for the Crude Oil Production Data
Figure 4.2.1 The ACF and PACF Values for Total Export of Refined Oil Data
Figure 4.2.2 Normal Probability Plot of ARCH (1) Residuals for the Total Export of Refined Oil Data
Figure 4.3.1 The ACF and PACF Values for the Export of Crude Oil to North America Data
Figure 4.3.2 Normal Probability Plot of ARCH (2) Residuals for the Export of Crude Oil to North America Data
Figure 4.4.1 The ACF and PACF Values for the Export of Refined Oil to North America Data
Figure 4.4.2 Normal Probability Plot of ARCH (1) Residuals for the Export of Refined Oil to North America Data
Figure 4.5.1 The ACF and PACF Values for the Export of Crude Oil to South America Data
Figure 4.5.2 Normal Probability Plot of ARCH (1) Residuals for the Export of Crude Oil to South America Data
Figure 4.6.1 The ACF and PACF Values for the Export of Refined Oil to South America Data
Figure 4.6.2 Normal Probability Plot of ARCH (1) Residuals for the Export of Refined Oil to South America Data
Figure 4.7.1 The ACF and PACF Values for the Export of Crude Oil to Western Europe Data
Figure 4.7.2 Normal Probability Plot of ARCH (1) Residuals for the Export of Crude Oil to Western Europe Data
Figure 4.8.1 The ACF and PACF Values for the Export of Refined Oil to Western Europe Data
Figure 4.8.2 Normal Probability Plot of ARCH (1) Residuals for the Export of Refined Oil to Western Europe Data
Figure 4.9.1 The ACF and PACF Values for the Export of Crude Oil to Middle East Data
Figure 4.9.2 Normal Probability Plot of AR (1) Residuals for the Export of Crude Oil to Middle East Data
Figure 4.10.1 The ACF and PACF Values for the Export of Refined Oil to Middle East Data
Figure 4.10.2 Normal Probability Plot of AR (1) Residuals for the Export of Refined Oil to Middle East Data
Figure 4.11.1 The ACF and PACF Values for the Export of Crude Oil to Africa Data
Figure 4.11.2 Normal Probability Plot of AR (1) Residuals for the Export of Crude Oil to Africa Data
Figure 4.12.1 The ACF and PACF Values for the Export of Refined Oil to Africa Data
Figure 4.12.2 Normal Probability Plot of AR (1) Residuals for the Export of Refined Oil to Africa Data
Figure 4.13.1 The ACF and PACF Values for the Export of Crude Oil to Asia and Far East Data
Figure 4.13.2 Normal Probability Plot of ARCH (2) Residuals for the Export of Crude Oil to Asia and Far East Data
Figure 4.14.1 The ACF and PACF Values for the Export of Refined Oil to Asia and Far East Data
Figure 4.14.2 Normal Probability Plot of ARCH (1) Residuals for the Export of Refined Oil to Asia and Far East Data
Figure 4.15.1 The ACF and PACF Values for the Export of Crude Oil to Oceania Data
Figure 4.15.2 Normal Probability Plot of ARCH (2) Residuals for the Export of Crude Oil to Oceania Data
Figure 4.16.1 The ACF and PACF Values for the Export of Refined Oil to Oceania Data
Figure 4.16.2 Normal Probability Plot of ARCH (1) Residuals for the Export of Refined Oil to Oceania Data
CHAPTER 1
INTRODUCTION
Saudi Arabia is the largest petroleum producer and exporter in the world. Saudi Arabia possesses
18 per cent of the world’s proven petroleum reserves, which is over 260 billion barrels. The oil
and gas sector accounts for roughly 50 per cent of gross domestic product, and 90 per cent of
export earnings. Saudi refineries produce around 10.78 million barrels of oil per day. Oil was
Oil exploration had been initiated in the Middle East before World War I, but large-scale exploration
did not begin in Saudi Arabia until 1933. The Standard Oil Company of California (SOCAL, now
Chevron) was given exploration rights to some areas of Saudi Arabia in 1933. SOCAL set up a
subsidiary company, the California Arabian Standard Oil Company (CASOC) to develop the oil
concession. SOCAL also joined forces with the Texas Oil Company when together they formed
CALTEX in 1936 to take advantage of the latter’s formidable marketing network in Africa and
Asia. When CASOC geologists surveyed the concession area, they identified a promising site
and named it Dammam No. 1. Over the next three years, the drillers were unsuccessful in
making a commercial strike. The drillers finally struck oil on March 3, 1938 in Dammam No. 7.
This discovery would turn out to be the first of many, eventually revealing the largest source of
crude oil in the world. The name of the operating company in Saudi Arabia was changed to
Arabian American Oil Company (Aramco) in January 1944. Two partners, Standard Oil
Company of New Jersey (later renamed Exxon) and Socony-Vacuum (now Mobil Oil
Company), were added in 1946 to gain investment capital and marketing outlets for the large
reserves being discovered in Saudi Arabia. These four companies were the sole owners of
Once the existence of oil in quantity was ascertained, the advantages of a pipeline to the
Mediterranean Sea seemed obvious, saving about 3,200 kilometers of sea travel and the transit
fees of the Suez Canal. The Trans-Arabian Pipeline Company (Tapline), a wholly owned
Aramco subsidiary, was formed in 1945, and the pipeline was completed in 1950. Tax problems
with Saudi authorities and transit fees due Jordan, Iraq, and Lebanon plagued Tapline for many
years. The line was damaged and out of operation several times in the 1970s. And while
operating costs of Tapline increased, supertankers were reducing seaborne expenses. By 1975
Tapline was no longer used to export Saudi crude via Sidon. In 1982 the line was again
damaged. In late 1983, Tapline filed formal notice to cease operations in Syria and Lebanon,
although small amounts of crude would reportedly continue, albeit temporarily, to supply a
refinery in Jordan.
The General Petroleum and Mineral Organization (Petromin) was established in 1962 as a public
corporation wholly owned by the Saudi government to develop industries based on petroleum,
natural gas, and minerals by itself or in conjunction with other investors, foreign or domestic.
Although its activities predominantly centered on the country's hydrocarbon resources, Petromin
After two decades of organizational change, the reshaping of the oil industry in Saudi Arabia
neared completion by the late 1980s. During the 1970s and early 1980s, the industry was
transformed from one controlled by foreign oil companies (the Aramco parent companies) to one
owned and operated by the government. Decisions made directly by the ruling family
increasingly became a feature of the industry in the late 1970s. Saudi Arabia's participation in the
Arab oil embargo in 1973 and foreign policy goals were features of this transition. In 1992 the
government had title to all mineral resources in the country (except in the former Divided Zone,
where both Kuwait and Saudi Arabia had interests in the national resources of the whole zone).
Through the Supreme Oil Council, headed by the king, and the Ministry of Petroleum and
Mineral Resources the government initiated, funded, and implemented all investment decisions.
Saudi Arabia is the world’s largest producer and exporter of oil, and has one quarter of the
world’s known oil reserves – more than 260 billion barrels. Most are located in the Eastern
Province, including the largest onshore field in Ghawar and the largest offshore field at Safaniya
in the Arabian Gulf. Saudi refineries produce around 8 million barrels of oil per day, and there
are plans to increase production to around 12 million barrels per day. As the world’s largest
producer and exporter of oil, Saudi Arabia plays a unique role in the global energy industry. Its
policies on the production and export of oil, natural gas and petroleum products have a major
impact on the energy market, as well as the global economy. Mindful of this responsibility, Saudi
In our study we consider various types of oil production data from Saudi Arabia.
We consider both crude oil and refined oil and look at data on the export of these
two types of oil to different continents. This data set is taken from the official website of Saudi
http://www.sama.gov.sa/sites/samaen/ReportsStatistics/statistics/Pages/YearlyStatistics.aspx
At first we present time series plot of all sixteen oil production variables of Saudi Arabia from
[Figure 1.1 Time Series Plot of Crude Oil Production: Crude_Production_mil_barl versus Year, 1962–2010]
[Figure 1.2 Time Series Plot of Export of Refined Oil: Total_refine_exp_mil_barl versus Year]
[Figure 1.3 Time Series Plot of Export of Crude Oil to North America: Crude_North_america versus Year, 1962–2010]
[Figure 1.4 Time Series Plot of Export of Refined Oil to North America: Refine_North_america versus Year]
[Figure 1.5 Time Series Plot of Export of Crude Oil to South America: Crude_South_america versus Year, 1962–2010]
[Figure 1.6 Time Series Plot of Export of Refined Oil to South America: Refine_South_america versus Year, 1962–2010]
[Figure 1.7 Time Series Plot of Export of Crude Oil to Western Europe: Crude_Western_europe versus Year, 1962–2010]
[Figure 1.8 Time Series Plot of Export of Refined Oil to Western Europe: Refine_Western_europe versus Year, 1962–2010]
[Figure 1.9 Time Series Plot of Export of Crude Oil to Middle East: Crude_Middle_east versus Year, 1962–2010]
[Figure 1.10 Time Series Plot of Export of Refined Oil to Middle East: Refine_Middle_east versus Year, 1962–2010]
[Figure 1.11 Time Series Plot of Export of Crude Oil to Africa: Crude_Africa versus Year, 1962–2010]
[Figure 1.12 Time Series Plot of Export of Refined Oil to Africa: Refine_Africa versus Year]
[Figure 1.13 Time Series Plot of Export of Crude Oil to Asia and Far East: Crude_Asia_and_Far_east versus Year, 1962–2010]
[Figure 1.14 Time Series Plot of Export of Refined Oil to Asia and Far East: Refine_Asia_and_Far_east versus Year, 1962–2010]
[Figure 1.15 Time Series Plot of Export of Crude Oil to Oceania: Crude_Oceania versus Year, 1962–2010]
[Figure 1.16 Time Series Plot of Export of Refined Oil to Oceania: Refine_Oceania versus Year]
The time series plots reveal several interesting features of the data. Most of the
plots show some kind of volatility, which indicates that ARCH models are perhaps more
appropriate for these variables. Whatever model we consider, there is always a chance that
there are a few outliers in the data set. It is also necessary to check the normality assumption of
the errors, which is the key to any kind of statistical inference drawn from these data. Figures 1.3 –
1.16 show that one observation, for the year 1987, is consistently missing. We searched for
the reason, but could not find any discussion, either on any authoritative
website or in any article, of why the Saudi Arabian government did not publish the data for
1987. So we simply treat this case as a missing value problem. For the ‘Export of Refined Oil to
Africa’ another observation (for the year 1982) is missing, as shown in Figure 1.12. Before fitting
any kind of econometric or time series model to these data we need to estimate the missing
values.
We organize this thesis in the following way. In Chapter 2, we introduce the methodologies
we use in our research, which include diagnostic and robust methods of outlier detection,
estimation of missing values by the robust EM algorithm, ARCH models and determination of the order
of ARCH in time series using graphical and analytical tests, and tests for normality of errors.
Since almost all variables that we consider in our study have missing observations, in Chapter 3
we employ the EM algorithm for estimating the missing values. It is now well known that
outliers can adversely affect the missing value estimation procedure, and for this reason we
employ the robust EM algorithm, in which the outliers are first identified by the least median of
squares (LMS) or the least trimmed squares (LTS) before applying the EM algorithm. In Chapter
4, in order to determine the most appropriate ARCH model for all sixteen Saudi Arabia oil
production variables, we employ three commonly used tests, the Breusch-Pagan, the
Goldfeld-Quandt, and the White test, and also the Ljung-Box test based on the
autocorrelation function (ACF) and the partial autocorrelation function (PACF). We observe
that different ARCH models are adequate for different variables. In Chapter 5, we draw
conclusions on our current research and suggest appropriate ARCH models for all sixteen
Saudi Arabia oil production variables. In this chapter we also outline our directions for future
research.
CHAPTER 2
ARCH/GARCH MODELS, OUTLIERS AND ROBUSTNESS,
TESTS FOR NORMALITY AND ESTIMATION OF MISSING
VALUES IN TIME SERIES
In this chapter we discuss different aspects of data analysis techniques useful in time series
analysis. Although the prime topic of our discussion will be regression analysis, we will also
consider some other important topics that we are going to use in our study. A time series is a
chronological sequence of observations on a particular variable. A time series model accounts for
patterns of the past movement of a variable and uses that information to predict its future
Researchers also use this technique to forecast and comment on goodness of fit of their model. In
regression analysis we usually use least squares models to find the contribution of each
explanatory variable to the response variable. All the assumptions, including equal variance of
the error terms, are required to be met. Equality of variance of the error terms is called
homoscedasticity. There are many situations where this assumption is violated and the least squares
model does not work well. Violation of this assumption is called heteroskedasticity. In any area
of research, including econometrics, error terms may be larger at some points than at others and
the volatility may not be explained by the explanatory variables. In such situations, ordinary least
squares regression may not fit well and the variance of the estimated coefficients of the explanatory
variables may be unreasonably high. In econometric or time series data, the unequal
variation of the error terms is usually related to the preceding time points. Engle (1982) described this as
time-varying volatility clustering and proposed a new method to address this kind
of heteroskedasticity. He proposed that the variance of the current error term may be set as a
function of the previous time periods’ error terms. He proposed the Autoregressive Conditionally
The ARCH model allows the conditional variance of the error terms to
change over time as a function of past error terms. In an ARCH model we have the flexibility to use
any lag structure. Bollerslev (1986) discussed the risk that a totally free lag
distribution may lead to violation of the non-negativity constraints. Then he proposed
more flexibility in lag structure. In the GARCH model we consider the variance of the error terms as a
Time series data can be modeled in different forms for different stochastic processes. To address
the variation in the process we may consider autoregressive (AR) models, integrated (I)
models and moving average (MA) models. All of these models assume
linear dependence on the previous data points. The general AR(p) model has the following form:
$$ y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t \tag{2.1} $$
where the term $\varepsilon_t$ is the source of randomness and is called white noise. It is assumed to satisfy the
following assumptions:
$E[\varepsilon_t] = 0$,
$E[\varepsilon_t^2] = \sigma_\varepsilon^2$ and
$E[\varepsilon_t \varepsilon_s] = 0$ for $t \neq s$.
In modeling time series data we need to meet the above assumptions, and one of the assumptions
is that the model possesses constant error variance (white noise). To be sure that the selected time
series model is valid, we must test the assumptions (i.e., $E[\varepsilon_t^2] = \sigma_\varepsilon^2$). According to Granger and
Ramanathan (1984), there is really no reason to believe that the errors are white noise without
testing. Engle (1982) has written that under some circumstances, the “error variance may change
over time and be predicted by past forecast errors.” In the area of financial econometrics and in
consider seriously [Engle (1982)]. In regression models where the error
variance is a function of the time lag, an autoregressive model with conditional heteroskedastic
(ARCH) error variance may be the appropriate model for that risk or volatility.
model that has a variance, $h_t$, that is conditional on the error variance at previous time periods,
. (2.2)
and
(2.3)
(2.4)
The generalized version of the ARCH model (GARCH) was first used by Bollerslev (1986). The
variance is a function of previous conditional variances and also previous innovations in the
(2.5)
2.1.2 Testing for ARCH
Before fitting the ARCH model we need to know whether an ARCH effect is present in the data or
not. A number of detection methods are now available in the literature [see Pindyck and
Rubinfeld (1997), Greene (1997)] for the detection of ARCH. They can be categorized broadly by
Graphical Tests
The simplest graphical test is to plot the data against time, which is popularly known
as the time series (TS) plot. If the ARCH effect is present, this plot is expected to show a
pattern: one is likely to find periods of high volatility followed by periods of low volatility and so on.
The ARCH effect can also be visible if we plot residuals against time. A plot containing periods of large
residuals followed by periods of small residuals will indicate the existence of the ARCH effect. Two
other graphs are available for the same purpose: a plot of time series values against their corresponding lag
values and a plot of residuals against the corresponding lagged time series values. For both plots, linear
Analytical Tests
The graphical methods are simple and very easy to understand. But they may often produce ambiguous
pictures, and analysts may come up with conflicting conclusions. That is why more formal
analytical tests are required. Several analytical methods are available to test for the ARCH effect. In each
case we wish to test the null hypothesis of no ARCH. The specific alternative hypothesis
against which the null hypothesis is to be tested depends on the estimation procedure that is considered
White’s general ARCH Test
The White test is considered to be the most popular test for detecting the ARCH effect, and it has been in
use from the very beginning [see Engle (1982)]. Let us consider an ARCH (1) model as defined in
Step 1. Given the data, we estimate the parameters given in (2.6) and compute the residuals $\hat{u}_t$.
Step 2. We then run the following (auxiliary) regression for the necessary order, i.e., for the pth order
In other words, here the squared residuals ($\hat{u}_t^2$) from the original regression are regressed on the lags of
squared residuals. Higher powers of regressors can also be introduced. Note that there is a constant
term in this equation even though the original regression may or may not contain it. We obtain $R^2$
Step 3. Under the null hypothesis that there is no ARCH, it can be shown that the sample size n times $R^2$
obtained from the auxiliary regression asymptotically follows the chi-square distribution with p degrees
of freedom, i.e.,
$$ nR^2 \sim \chi^2_{(p)} \tag{2.7} $$
Step 4. If the chi-square value obtained in equation (2.7) exceeds the critical value at the chosen level
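As an illustration of Steps 1 to 4, the following minimal Python sketch computes the $nR^2$ statistic from a series of residuals. It is not the code used in this thesis: the function names `solve_linear` and `arch_lm_statistic` and the example residuals are our own assumptions, and n is taken here as the number of observations available to the auxiliary regression after lagging.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def arch_lm_statistic(resid, p):
    """Regress u_t^2 on a constant and p lags of u_t^2; return (n * R^2, R^2)."""
    sq = [u * u for u in resid]
    y = sq[p:]
    X = [[1.0] + [sq[t - j] for j in range(1, p + 1)] for t in range(p, len(sq))]
    k = p + 1
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    ybar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return len(y) * r2, r2

# Hypothetical residuals whose squares satisfy s_t = 1 + 0.5 * s_{t-1} exactly,
# so the auxiliary regression of order 1 fits almost perfectly.
squares = [1.0]
for _ in range(19):
    squares.append(1.0 + 0.5 * squares[-1])
resid = [math.sqrt(s) for s in squares]

stat, r2 = arch_lm_statistic(resid, p=1)
CRITICAL_5PCT_DF1 = 3.841  # 5% critical value of chi-square with 1 degree of freedom
reject_no_arch = stat > CRITICAL_5PCT_DF1
```

In practice the residuals would come from the fitted model of Step 1, and the critical value would be taken from the chi-square table with p degrees of freedom.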
Goldfeld-Quandt ARCH Test
Let us consider the model (2.2) once again. Here we wish to test the null hypothesis of
$$ \sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 \tag{2.8} $$
The Goldfeld-Quandt test procedure involves the calculation of two least squares regression lines,
one using data thought to be associated with low variance errors and the other associated with
high variance errors. If the residual variances associated with the two regression lines are
approximately equal, the homoscedasticity assumption cannot be rejected, but if the residual
variance increases substantially, it is possible to reject the null hypothesis. The test can be carried
Step 1. Fit the two-variable model (2.6) and compute the residuals $\hat{u}_t$. Consider the squared
residuals $\hat{u}_t^2$ as the dependent variable and the one-period lag of $\hat{u}_t^2$ (i.e. $\hat{u}_{t-1}^2$) as the independent
variable.
Step 2. Order the data by the magnitude of the independent variable $\hat{u}_{t-1}^2$, which is thought to be
Step 3. Omit the middle d observations; d might be chosen, for example, to be approximately
Step 4. Fit two separate regressions, the first (indicated by subscript 1) for the portion of the data
associated with low values of $\hat{u}_{t-1}^2$ and the second (indicated by subscript 2) associated with high
values of $\hat{u}_{t-1}^2$. Each regression will involve (n-d)/2 pieces of data with [(n-d)/2] – 2 degrees of
freedom. The portion d must be small enough to ensure that sufficient degrees of freedom are
available to allow for the proper estimation of each of the separate regressions.
Step 5. Calculate the residual sum of squares for each regression: $ESS_1$, associated with low
Step 6. Assuming that the error process is normally distributed (and no serial correlation is
present), the statistic $ESS_2 / ESS_1$ will be distributed as an F statistic with (n-d-2k)/2 degrees of
freedom in both the numerator and the denominator, where k is the number of parameters estimated
in each regression (k = 2 here). We can reject the null hypothesis at a
chosen level of significance if the calculated statistic is greater than the critical value of the F
distribution.
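The six steps can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the helper names `residual_ss` and `goldfeld_quandt` and the toy data are hypothetical, and in the setting above x would be $\hat{u}_{t-1}^2$ and y would be $\hat{u}_t^2$.

```python
def residual_ss(x, y):
    """Residual sum of squares from a simple least squares line of y on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))

def goldfeld_quandt(x, y, d):
    """Return (ESS2 / ESS1, degrees of freedom) after omitting the middle d points."""
    pairs = sorted(zip(x, y))          # Step 2: order by the independent variable
    n = len(pairs)
    m = (n - d) // 2                   # Step 3: size of each outer group
    low, high = pairs[:m], pairs[-m:]  # Step 4: low-value and high-value groups
    ess1 = residual_ss(*zip(*low))     # Step 5
    ess2 = residual_ss(*zip(*high))
    return ess2 / ess1, m - 2          # Step 6: F ratio and its degrees of freedom

# Hypothetical data whose error size grows with x (alternating sign, magnitude 0.1 * x).
x = list(range(1, 31))
y = [2.0 + 3.0 * xi + ((-1) ** xi) * 0.1 * xi for xi in x]
f_stat, df = goldfeld_quandt(x, y, d=6)
```

The resulting ratio would then be compared with the critical value of the F distribution with `df` degrees of freedom in both the numerator and the denominator.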
The Goldfeld-Quandt test can easily be applied to the general linear model by ordering the
observations by the magnitude of one of the independent variables. The test works because it
allows for the independent regression estimation of both high and low observation data.
However, there is an important cost involved. Because no restrictions are placed on the regression
parameters (as well as the error variances) in each of the two regression runs, statistical power is
lost. A more powerful test (one that has smaller Type II errors) would take into account the
information that the regression parameters are identical for both sets of data and that only the
error variance has changed. But the main shortcoming of this test is that it can only detect whether or
not the data set is affected by ARCH and cannot say anything about its order.
The Goldfeld-Quandt test is a natural test to apply when one can order the observations in terms of the
increasing variance of the error term (or one independent variable). An alternative test, which does not
require such an ordering and is easy to apply, is the Breusch-Pagan test. Let us consider the model (2.2)
which includes a general assumption about the relationship between the true error variance and an
Equation (2.4) provides the specification of the form taken by autoregressive conditional
heteroscedasticity (ARCH) if it is indeed present; f (.) represents a general function that allows, for
example, for both non-linear and logarithmic forms. $u_{t-1}^2, u_{t-2}^2, \ldots, u_{t-p}^2$ could be independent variables,
Step 1. To test for ARCH, we first calculate the least squares residuals $\hat{u}_t$ from the regression in
equation (2.9). We consider the squared residuals $\hat{u}_t^2$ as the dependent variable and lagged values of $\hat{u}_t^2$,
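$$ \hat{\sigma}^2 = \frac{1}{n} \sum_{t=1}^{n} \hat{u}_t^2 \tag{2.10} $$
$$ p_t = \frac{\hat{u}_t^2}{\hat{\sigma}^2} \tag{2.11} $$
$$ p_t = \alpha_0 + \alpha_1 \hat{u}_{t-1}^2 + v_t \tag{2.12} $$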
Step 4. If the error term $v_t$ in equation (2.12) is normally distributed and there exists no
heteroscedasticity, then half of the regression sum of squares provides a suitable test statistic,
$$ \Phi = RSS / 2 = (TSS - ESS) / 2 \sim \chi^2_{m-1} \tag{2.13} $$
when there are m independent variables (including the constant term). Therefore, if in an application the
quantity $\Phi$ exceeds the critical $\chi^2$ value at the chosen level of significance, one can reject the
hypothesis of no ARCH.
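A minimal Python sketch of the computation in (2.10) to (2.13) for one lag is given below. The function name `breusch_pagan_arch1` and the residual values are hypothetical and serve only to show the order of the calculations; with a constant and one lag, m = 2 and the statistic is compared with the chi-square distribution with 1 degree of freedom.

```python
def breusch_pagan_arch1(resid):
    """Return (Phi, TSS, ESS) for the regression of p_t on a constant and u_{t-1}^2."""
    sq = [u * u for u in resid]
    sigma2 = sum(sq) / len(sq)              # (2.10): average squared residual
    p = [s / sigma2 for s in sq]            # (2.11): scaled squared residuals
    y, x = p[1:], sq[:-1]                   # (2.12): p_t against u_{t-1}^2
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    tss = sum((yi - ybar) ** 2 for yi in y)                              # total SS
    ess = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))  # error SS
    phi = (tss - ess) / 2.0                 # (2.13): half the regression SS
    return phi, tss, ess

# Hypothetical residuals with alternating calm and volatile stretches.
resid = [0.1, -0.2, 0.1, -0.1, 0.2, 2.0, -2.5, 2.2, -1.8, 2.4,
         0.1, -0.1, 0.2, -0.2, 0.1, 2.1, -2.3, 1.9, -2.2, 2.5]
phi, tss, ess = breusch_pagan_arch1(resid)
CRITICAL_5PCT_DF1 = 3.841  # 5% critical value of chi-square with 1 degree of freedom
reject_no_arch = phi > CRITICAL_5PCT_DF1
```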
$$ l = \frac{1}{n} \sum_{t=1}^{n} l_t \tag{2.14} $$
where $x_t$ may include lagged dependent and exogenous variables and an irrelevant constant has
been omitted from the likelihood. This likelihood function can be maximized with respect to the
models, especially for the determination of the order of ARCH. A good number of them are
based on the autocorrelation function because it provides a partial description of the process for
modeling purposes. The autocorrelation function tells us how much correlation there is between
neighboring data points in the series y t . We define the autocorrelation with lag k as
$$ \rho_k = \frac{\mathrm{Cov}(y_t, y_{t+k})}{\sqrt{V(y_t)\,V(y_{t+k})}} \tag{2.15} $$
In practice, we use an estimate of the autocorrelation function, called the sample autocorrelation
(SAC) function
$$ r_k = \frac{\sum_{t=1}^{T-k} (y_t - \bar{y})(y_{t+k} - \bar{y})}{\sum_{t=1}^{T} (y_t - \bar{y})^2} \tag{2.16} $$
A geometrically declining pattern of the sample autocorrelation indicates the presence of ARCH
$$ SE(r_k) = \begin{cases} 1/\sqrt{n} & \text{if } k = 1 \\ \sqrt{\left(1 + 2\sum_{i=1}^{k-1} r_i^2\right)/n} & \text{if } k > 1 \end{cases} \tag{2.17} $$
$$ T = r_k / SE(r_k) \tag{2.18} $$
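The sample autocorrelation (2.16), its standard error (2.17) and the ratio (2.18) can be computed directly, as in the following minimal Python sketch. The function names and the five-point example series are our own illustrative assumptions.

```python
import math

def sample_acf(y, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag as in (2.16)."""
    T = len(y)
    ybar = sum(y) / T
    denom = sum((v - ybar) ** 2 for v in y)
    return [
        sum((y[t] - ybar) * (y[t + k] - ybar) for t in range(T - k)) / denom
        for k in range(1, max_lag + 1)
    ]

def acf_standard_errors(r, n):
    """Standard errors of r_1, ..., r_K as in (2.17)."""
    errors = []
    for k in range(1, len(r) + 1):
        # For k = 1 the sum is empty, which gives 1 / sqrt(n).
        inflation = 1.0 + 2.0 * sum(ri ** 2 for ri in r[: k - 1])
        errors.append(math.sqrt(inflation / n))
    return errors

# Hypothetical short series used only to show the arithmetic.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
r = sample_acf(y, max_lag=2)                        # r_1 = 0.4, r_2 = -0.1
se = acf_standard_errors(r, n=len(y))
t_ratios = [rk / sk for rk, sk in zip(r, se)]       # (2.18)
```

For this series the mean is 3 and the denominator of (2.16) is 10, so $r_1 = 4/10$ and $r_2 = -1/10$.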
Box and Pierce Test and Ljung and Box Test
To test the joint hypothesis that all the autocorrelation coefficients are zero we use a test statistic
H 0 : ρ1 = ρ 2 = … = ρ k = 0.
Box and Pierce show that the appropriate statistic for testing this null hypothesis is
$$Q = n \sum_{i=1}^{k} r_i^2 \qquad (2.19)$$
A slight modification of the Box-Pierce test was suggested by Ljung and Box, which is
$$Q = n(n+2)\sum_{i=1}^{k} (n-i)^{-1} r_i^2 \qquad (2.20)$$
Thus, if the calculated value of Q is greater than, say, the critical 5% level, we can be 95% sure
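As a brief sketch, both portmanteau statistics can be computed from a list of sample autocorrelations. The function names and the example values are hypothetical; the Ljung-Box weights follow the standard form, with each squared autocorrelation divided by n minus its own lag.

```python
CHI2_2DF_5PCT = 5.991  # 5% critical value of chi-square with 2 degrees of freedom


def box_pierce(r, n):
    """Box-Pierce Q as in (2.19); r holds r_1, ..., r_k and n is the sample size."""
    return n * sum(ri * ri for ri in r)


def ljung_box(r, n):
    """Ljung-Box Q as in (2.20), weighting r_i^2 by 1 / (n - i)."""
    return n * (n + 2) * sum(ri * ri / (n - i) for i, ri in enumerate(r, start=1))


r_example = [0.4, -0.1]   # hypothetical autocorrelations at lags 1 and 2
n_example = 5
print(box_pierce(r_example, n_example), ljung_box(r_example, n_example))
print(ljung_box(r_example, n_example) > CHI2_2DF_5PCT)
```

Each Q is compared with a chi-square critical value whose degrees of freedom equal the number of lags k.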
The sample autocorrelation can indicate whether any ARCH effect is present in the
data but cannot tell much about the order of ARCH. The partial autocorrelation function is often
used to determine the order of an ARCH model. For an autoregressive process of order p, the
which gives
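$$\gamma_0 = \phi_1\gamma_1 + \phi_2\gamma_2 + \dots + \phi_p\gamma_p + \sigma_\epsilon^2 \qquad (2.22)$$
$$\gamma_1 = \phi_1\gamma_0 + \phi_2\gamma_1 + \dots + \phi_p\gamma_{p-1}$$
$$\vdots$$
$$\gamma_p = \phi_1\gamma_{p-1} + \phi_2\gamma_{p-2} + \dots + \phi_p\gamma_0 \qquad (2.23)$$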
The above equations also give a set of p equations, known as Yule-Walker equations, to
$$\rho_1 = \phi_1 + \phi_2\rho_1 + \dots + \phi_p\rho_{p-1}$$
$$\vdots$$
$$\rho_p = \phi_1\rho_{p-1} + \phi_2\rho_{p-2} + \dots + \phi_p \qquad (2.24)$$
The solution of the Yule-Walker equations requires the knowledge of p. Therefore we solve
the sample autocorrelation φˆ1 as an estimate of ρ1 . If this value is significantly different from 0,
we know that the autoregressive process is at least order 1. Next we consider the hypothesis that
p = 2. We solve the Yule-Walker equations for p = 2 and obtain a new set of estimates for φ1
and φ 2 . If φ 2 is significantly different from 0, we may conclude that the process is at least order
2. Otherwise we conclude that the process is order 1. We repeat this process for successive
values of p. We call the series φ1 , φ 2 , … the partial autocorrelation function. If the true order of the
To test whether a particular φ j is zero, we can use the fact that it is approximately
normally distributed with mean 0 and variance 1/n. Hence we can check whether it is statistically
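Solving the Yule-Walker equations for p = 1, 2, 3, ... in turn can be done with the Durbin-Levinson recursion. The sketch below uses that recursion as a convenient shortcut; it is a standard technique and an assumption here, not a method named in the thesis. The function names are hypothetical, and the 1.96 cut-off is the usual two-sided 5% normal value applied to the variance 1/n mentioned above.

```python
import math


def pacf_from_acf(rho):
    """Partial autocorrelations from autocorrelations rho_1, ..., rho_K.

    rho[k - 1] is the lag-k autocorrelation. The k-th returned value is the
    last coefficient of the Yule-Walker solution for an AR(k) model.
    """
    pacf = []
    prev = []  # coefficients of the AR(k - 1) solution
    for k in range(1, len(rho) + 1):
        if k == 1:
            phi_kk = rho[0]
        else:
            num = rho[k - 1] - sum(prev[j] * rho[k - 2 - j] for j in range(k - 1))
            den = 1.0 - sum(prev[j] * rho[j] for j in range(k - 1))
            phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


def significant_lags(pacf, n, z=1.96):
    """Lags whose partial autocorrelation exceeds z / sqrt(n) in absolute value."""
    bound = z / math.sqrt(n)
    return [k for k, value in enumerate(pacf, start=1) if abs(value) > bound]


# Autocorrelations of an AR(1) process with coefficient 0.5 (illustrative values).
print(pacf_from_acf([0.5, 0.25, 0.125]))
```

For an AR(1) autocorrelation pattern the partial autocorrelations vanish after the first lag, which is the cut-off behaviour used to read off the order.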
Table 2.1: Specification of ARCH Models
Model | ACF | PACF
… | … | …
ARCH (p) | Geometric decline from pth lag | Zero after p lags
The determination of the order of ARCH using the ACF and PACF values is summarized in the
above table.
and their GARCH (generalized ARCH) extension is due to Bollerslev (1986). In these models,
the key concept is the conditional variance, that is, the variance conditional on the past. In the
classical GARCH models, the conditional variance is expressed as a linear function of the
i) E ( .
(2.25)
because it provides consistent and asymptotically normal estimators for strictly stationary
GARCH processes under mild regularity conditions, but with no moment assumptions on the
observed process. By contrast, the least-squares methods of the previous chapter require
moments of order 4 at least. The QML considers an iterative procedure for computing the
written as if the law of the variables ηt were Gaussian N (0, 1) (refer to pseudo- or quasi-
likelihood), but this assumption is not necessary for the strong consistency of the estimator.
To write the likelihood of the model, a distribution must be specified for the iid variables ηt.
Here we do not make any assumption on the distribution of these variables, but we work with a
function, called the (Gaussian) quasi-likelihood, which, conditionally on some initial values,
coincides with the likelihood when the ηt are distributed as standard Gaussian. The conditional
(2.26)
. One can use Ljung-Box and Jarque-Bera tests for diagnostic checking.
the data set is called an outlier. According to Barnett and Lewis (1994), ‘We shall define an
inconsistent with the remainder of that set of data.’ Outliers do not inevitably ‘perplex’ or
‘mislead’; they are not necessarily ‘bad’ or ‘erroneous’, and the experimenter may be tempted in
some situations not to reject an outlier but to welcome it as an indication of some unexpectedly
useful industrial treatment or surprisingly successful agricultural variety. Outliers are considered
as an empirical reality. Hampel et al. (1986) claim that a routine data set typically contains about
1-10% outliers, and even the highest quality data set cannot be guaranteed free of outliers.
Normality and the entire classical inferential procedure might break down in the presence of
outliers.
measurement error.
Execution Error: Imperfect collection of data. We may inadvertently choose a biased sample or
Among them the three-sigma rule has become very popular with the statisticians. If we assume a
normal distribution, a single value may be considered as an outlier if it falls outside a certain
is the ratio between its distance to the sample mean and the sample SD:
$$t_i = \frac{x_i - \bar{x}}{s} \qquad (2.27)$$
Observations with | ti | > 3 are traditionally deemed as suspicious (the three-sigma rule), based on
the fact that they would be very unlikely under normality, since P (|t| > 3) = 0.003 for a random
Although the three-sigma rule is very popular, it may sometimes fail to identify outliers
because the statistic (2.27) is based on the mean and standard deviation, which may be severely
contaminated in the presence of outliers. When we have multiple outliers, the three-sigma rule
usually does not work, mainly because of the masking and swamping effects of outliers. Masking
occurs when we fail to detect the outliers (false negative). Swamping occurs when observations
outlier, although better than doing nothing, still poses a number of problems [see Maronna et al.
(2006)]:
• The user may think that ‘an observation is an observation’ (i.e., observations should
speak of themselves) and hence feel uneasy about deleting them. Sometimes atypical data
may be the most informative data and its deletion may outliers.
variability.
• Since the results depend on the user’s subjective decisions, it is difficult to determine the
The word “robust” literally means something “very strong.” So robust statistics are those
statistics which do not break down easily. The term robustness signifies insensitivity to small
deviations from the assumptions. That means a robust procedure is nearly as efficient as the
classical procedure when classical assumptions hold strictly but is considerably more efficient
overall when there is a small departure from them. One objective of robust techniques is to cope
with outliers by keeping the effects of their presence small. The analogous term used in the
Here we introduce several statistics which are robust in the presence of outliers. Median and
trimmed mean are robust measures of location. For the measure of dispersion we can use the
normalized median absolute deviation (MADN). For a set of data the Median Absolute Deviation
(MAD) is defined as
To make the MAD comparable to the SD in terms of efficiency, we consider the normalized
MAD defined as
Both of them are based on order statistics; the former is clearly very sensitive to outliers, while
ineffective in the identification of outliers since its components the mean and the standard
deviation are not outlier resistant. For this reason we need robust outlier detection methods
which are unaffected in the presence of outliers. In this section we discuss a few robust outlier
detection methods.
Let us now use the robust plug-in technique to obtain a robust t-like statistic from (2.27) by
replacing mean by median and SD by the normalized median absolute deviation (MADN). Thus
$$t_i' = \frac{x_i - \mathrm{Median}(x)}{\mathrm{MADN}(x)} \qquad (2.32)$$
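The contrast between the classical statistic (2.27) and its robust counterpart (2.32) can be seen on a small made-up sample. The sketch below assumes the usual definitions MAD = median of |x − median(x)| and MADN = MAD / 0.6745; the function names and the data are hypothetical.

```python
import statistics


def classical_t(x):
    """t_i = (x_i - mean) / SD, as in (2.27)."""
    mean = statistics.mean(x)
    sd = statistics.stdev(x)
    return [(v - mean) / sd for v in x]


def madn(x):
    """Normalized median absolute deviation (assumed form: MAD / 0.6745)."""
    med = statistics.median(x)
    mad = statistics.median([abs(v - med) for v in x])
    return mad / 0.6745


def robust_t(x):
    """t'_i = (x_i - median) / MADN, as in (2.32)."""
    med = statistics.median(x)
    scale = madn(x)
    return [(v - med) / scale for v in x]


def flagged(scores, cutoff=3.0):
    """Indices of observations whose absolute score exceeds the cutoff."""
    return [i for i, s in enumerate(scores) if abs(s) > cutoff]


# Hypothetical sample: eight ordinary values and two large ones.
data = [10, 11, 9, 10, 12, 10, 11, 9, 50, 55]
print(flagged(classical_t(data)))  # three-sigma rule
print(flagged(robust_t(data)))     # robust t-like statistic
```

In this sample the two large values inflate the mean and SD so much that neither exceeds three classical standard deviations, which is the masking effect described above, while the median-based statistic flags both.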
Interquartile Range
The above-mentioned strategies for identifying outliers are probably most appropriate for
Hampel’s Test
In recent years Hampel (1984)’s test for outliers has become very popular in data mining and
It is interesting to note that Hampel’s test is equivalent to robust t test. Recall that according to
the robust t test as described in (2.32). It is easy to show from (2.32) that an observation is
identified as an outlier if
time series data. Outliers occur in statistical data quite frequently. Hampel et al. (1986) indicate that
routine data generally contain 1-10% gross errors and even high quality data may not be guaranteed
free from them. Outliers are more critical in time series, especially in non-linear time series, because their
effects may persist much longer and they may have a serious impact on parameter estimation. An
excellent review of detection of outliers in time series is available in Gounder et al. (2007). Software
packages based on the work of Box and Jenkins (1976) are widely available, but unfortunately they are
restricted to the least square approach and do not provide for handling outliers. Indeed the field of
robust time series analysis has come into existence only fairly recently and has seen most of its activity
during the last decade. This is partly because one had to wait for the development of robust regression
techniques (of which extensive use is made) and also because of the increased difficulty inherent in
dealing with dependencies between the observations. Most of the tests for outliers in time series
analysis available in the literature are designed to identify additive and/or innovation outliers for
autoregressive (AR), moving average (MA), and autoregressive moving average (ARMA) models;
methods for detecting outliers in ARCH models were not developed until quite recently [see Franses and van Dijk
(2000)]. These methods are computationally intensive and are not readily available in
econometric/statistical packages.
Rousseeuw and Leroy (1987) gave a rough and ready suggestion to use the robust regression
techniques like LMS and LTS for any time series model. Rousseeuw (1984) proposed Least Median of
Squares (LMS) regression which is a fitting technique less sensitive to outliers than the OLS. In OLS,
we estimate parameters by minimizing the sum of squared residuals $\sum_{t=1}^{n} u_t^2$, which is equivalent to
minimizing the mean of squared residuals $\frac{1}{n}\sum_{t=1}^{n} u_t^2$.
Sample means are sensitive to outliers, but medians are not. Hence to make it less sensitive we can
Then the LMS estimate of β is the value that minimizes MSR ( β̂ ). Rousseeuw and Leroy (1987)
have shown that LMS estimates are very robust with respect to outliers and have the highest possible
The least trimmed (sum of) squares (LTS) estimator is proposed by Rousseeuw (1984). In this method
$$\mathrm{LTS}(\hat{\beta}) = \text{minimize} \sum_{t=1}^{h} \hat{u}_{(t)}^2 \qquad (2.37)$$
Here $\hat{u}_{(t)}$ is the t-th ordered residual. For a trimming percentage of α, Rousseeuw and Leroy (1987)
suggested choosing the number of observations h on which the model is fitted as h = [n(1 – α)] + 1.
The advantage of using LTS over LMS is that in the LMS we always fit the regression line based
on roughly 50% of the data, but in the LTS we can control the level of trimming. When we suspect that
the data contain nearly 10% outliers, the LTS with 10% trimming is expected to produce better results
than the LMS. We can increase the level of trimming if we suspect there are more outliers in the data.
In the quest for a robust fit that does well in ARCH models, Doula et al. (2007) show that the LTS in general
performs better than the LMS for detecting outliers. We employ the LTS method to fit the time series
model and also use it to identify outliers, if any. We consider graphical and analytical tests for ARCH
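To make the LTS criterion (2.37) concrete, here is a small pure-Python sketch for a straight-line fit. It is not the S-Plus routine used in the thesis: it approximates the LTS solution by starting from every pair of points and applying concentration steps (refitting least squares on the h points with the smallest squared residuals), a common heuristic that is assumed here for illustration. All function names and the data are hypothetical.

```python
from itertools import combinations


def ols_line(points):
    """Least squares intercept and slope, or None if all x values coincide."""
    m = len(points)
    x_bar = sum(x for x, _ in points) / m
    y_bar = sum(y for _, y in points) / m
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    if sxx == 0:
        return None
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sxx
    return y_bar - slope * x_bar, slope


def trimmed_ss(points, fit, h):
    """Sum of the h smallest squared residuals, the objective in (2.37)."""
    a, b = fit
    squared = sorted((y - a - b * x) ** 2 for x, y in points)
    return sum(squared[:h])


def lts_line(points, alpha=0.1, max_steps=20):
    """Approximate LTS line with h = [n(1 - alpha)] + 1 retained observations."""
    n = len(points)
    h = min(n, int(n * (1 - alpha)) + 1)
    best_fit, best_obj = None, float("inf")
    for pair in combinations(points, 2):
        fit = ols_line(list(pair))
        if fit is None:
            continue
        obj = trimmed_ss(points, fit, h)
        for _ in range(max_steps):
            a, b = fit
            # Concentration step: keep the h best-fitting points and refit.
            subset = sorted(points, key=lambda p: (p[1] - a - b * p[0]) ** 2)[:h]
            new_fit = ols_line(subset)
            if new_fit is None:
                break
            new_obj = trimmed_ss(points, new_fit, h)
            if new_obj >= obj:
                break
            fit, obj = new_fit, new_obj
        if obj < best_obj:
            best_fit, best_obj = fit, obj
    return best_fit, h


# Hypothetical data: y = 1 + 2x with the last two observations contaminated.
pts = [(float(x), 1.0 + 2.0 * x) for x in range(20)]
pts[18] = (18.0, 300.0)
pts[19] = (19.0, 300.0)
print(ols_line(pts))
print(lts_line(pts, alpha=0.25))
```

The least squares line is pulled towards the two contaminated points, while the trimmed fit ignores them once the trimming level exceeds the contamination level.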
statistical analysis which are normality, data screening and randomness. At first we check the
condition of normality assumption for the data. This is the most crucial diagnostic check as the
entire classical statistics are based on the normality assumption of observations. At the time of
the development of classical statistics there was a general belief among statisticians that
data sets follow a normal distribution. It was observed that most classical data such as
height, weight, etc. followed a normal distribution. In the last hundred years, attitudes towards the
assumption of a normal distribution in statistical models have varied from one extreme to
another. To quote Pearson (1905), ‘Even towards the end of the nineteenth century not all were
convinced of the need for curves other than normal.’ By the middle of the twentieth century Geary (1947)
made this comment: ‘Normality is a myth; there never was and never will be a normal
distribution.’ Now it is evident that nonnormal data are more prevalent in nature. A nice review
method is based on the fact that if the ordered observations are plotted against their cumulative
probabilities on normal probability paper, the resulting points should lie approximately on a
straight line.
observations and the expectation of normalized order statistics is known as the Shapiro–Wilk
test. A test based on empirical distribution function is known as the Anderson–Darling test.
Jarque-Bera Test
A test based on the coefficients of skewness and kurtosis is known as the Bowman–Shenton test.
This test is popularly known as the Jarque–Bera test. If we denote the sample size by n, the
sample skewness by S and the sample kurtosis by K, then the Jarque–Bera test statistic is defined
as
$$JB = \frac{n}{6}\left[S^2 + \frac{(K-3)^2}{4}\right] \qquad (2.38)$$
Standard theory tells us that a normal distribution has skewness 0 and kurtosis 3. So a departure
from these two values indicates non-normality, and that is how
this test statistic was developed. Under the null hypothesis of normality, the JB statistic
asymptotically follows a chi-square distribution with 2 degrees of freedom.
Rescaled Moments Test
Imon (2003) suggests a slight adjustment to the JB statistic to make it more suitable for the
regression problems. The skewness and kurtosis components of the JB test are based on the
unobserved errors but in reality we use residuals instead. Those estimates are not unbiased either.
To overcome these problems Imon (2003) proposed a statistic based on rescaled moments (RM)
$$RM = \frac{n c^3}{6}\left[S^2 + \frac{c\,(K-3)^2}{4}\right] \qquad (2.39)$$
where c = n/(n – k), k is the number of independent variables in a regression model. Both the JB
and the RM statistic follow a chi square distribution with 2 degrees of freedom. If the values of
these statistics are greater than the critical value of the chi square, we reject the null hypothesis
of normality.
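Both statistics are simple functions of the sample moments, as the following sketch shows. It assumes moment-based estimates of skewness and kurtosis with divisor n; the function names and the toy residuals are hypothetical and are not taken from the thesis data.

```python
CHI2_2DF_5PCT = 5.991  # 5% critical value of chi-square with 2 degrees of freedom


def skewness_kurtosis(e):
    """Moment-based sample skewness S and kurtosis K (divisor n)."""
    n = len(e)
    mean = sum(e) / n
    m2 = sum((v - mean) ** 2 for v in e) / n
    m3 = sum((v - mean) ** 3 for v in e) / n
    m4 = sum((v - mean) ** 4 for v in e) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def jarque_bera(e):
    """JB statistic as in (2.38)."""
    n = len(e)
    s, k = skewness_kurtosis(e)
    return (n / 6.0) * (s * s + (k - 3.0) ** 2 / 4.0)


def rescaled_moments(e, k_regressors):
    """RM statistic as in (2.39), with c = n / (n - k)."""
    n = len(e)
    c = n / (n - k_regressors)
    s, k = skewness_kurtosis(e)
    return (n * c ** 3 / 6.0) * (s * s + c * (k - 3.0) ** 2 / 4.0)


residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical residuals
print(jarque_bera(residuals), rescaled_moments(residuals, 1))
```

With k = 0 the factor c equals 1 and RM reduces to JB; either value is compared with the chi-square critical value on 2 degrees of freedom.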
The RM test performs better than the JB in every respect, but both the JB and the RM use
least squares residuals, which can be strongly affected by outliers. To overcome this problem
Rana et al. (2009) suggested a normality test whose form is exactly the same as the RM statistic
shown in (2.39), but which uses robust LMS or LTS residuals instead of the least squares residuals.
This test is known as the robust rescaled moments (RRM) test for normality.
parameters and loss of power. An excellent review of different aspects of missing values is
available in Little and Rubin (2002). In this section we introduce a few commonly used missing
Rubin (2002)) is one of the most widely used techniques to solve incomplete data problems.
Therefore, this study focuses on several imputation methods to determine the best methods to
by $x_1^*, x_2^*, \ldots, x_m^*$. Thus the observed data with missing values are
$$x_1, x_2, \ldots, x_{n_1}, x_1^*, x_{n_1+1}, x_{n_1+2}, \ldots, x_{n_2}, x_2^*, x_{n_2+1}, x_{n_2+2}, \ldots, x_m^*, x_n \qquad (2.40)$$
Therefore, the first missing value occurs after n1 observations, the second missing value occurs
after n2 observations, and so on. Note that there might be more than one consecutive missing
observation.
Mean-before Technique
The mean-before technique is one of the most popular imputation techniques in handling missing
data. This technique consists of substituting all missing values with the mean of all available data
before missing values. Thus for the data in (2.40), x1* will be replaced by
$$\bar{x}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} x_i \qquad (2.41)$$
$$\bar{x}_2 = \frac{1}{(n_2 - n_1 - 1)}\sum_{i=n_1+1}^{n_2} x_i \qquad (2.42)$$
and so on.
Mean-before-after Technique
The mean-before-after technique substitutes all missing values with the mean of one datum
before the missing value and one datum after the missing value. Thus for the data in (2.40),
$$\bar{x}_1 = \frac{x_{n_1} + x_{n_1+1}}{2} \qquad (2.43)$$
$$\bar{x}_2 = \frac{x_{n_2} + x_{n_2+1}}{2} \qquad (2.44)$$
and so on.
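The two mean-based imputations can be sketched as follows, with `None` marking a missing value. The function names and the sample series are hypothetical. For the mean-before rule the sketch follows the prose description, averaging all observed values that precede the gap, which is an assumption where the wording and (2.42) could be read differently.

```python
def mean_before(series):
    """Replace each missing value by the mean of all observed values before it."""
    filled = list(series)
    for i, value in enumerate(series):
        if value is None:
            seen = [w for w in series[:i] if w is not None]
            if seen:  # leave the gap if nothing has been observed yet
                filled[i] = sum(seen) / len(seen)
    return filled


def mean_before_after(series):
    """Replace each missing value by the mean of its nearest observed neighbours."""
    filled = list(series)
    for i, value in enumerate(series):
        if value is None:
            before = next((w for w in reversed(series[:i]) if w is not None), None)
            after = next((w for w in series[i + 1:] if w is not None), None)
            if before is not None and after is not None:
                filled[i] = (before + after) / 2.0
    return filled


sample = [2.0, 4.0, None, 8.0, 10.0]  # hypothetical series with one gap
print(mean_before(sample))
print(mean_before_after(sample))
```

On a trending series the mean-before value lags behind the local level, while the mean-before-after value interpolates between the neighbouring observations.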
parameter. If y were complete, the maximum likelihood of would be based on the distribution
incomplete data and is the unobserved missing data. Let us assume that the missing data is
(2.45)
(2.46)
conditional expectation given the observed data . The EM algorithm has an E-step
(2.47)
(2.48)
The E-step and M-step are repeated alternately until the difference is less than
If the convergence attribute of the likelihood function of the complete data, that is , is
attainable, then convergence of the EM algorithm is also attainable. The rate of convergence depends
on the number of missing observations. Dempster, Laird, and Rubin (1977) show that convergence is
linear with rate proportional to the fraction of information about in that is observed.
regression or in time series we assume a model and that should have a consideration when we try
to estimate missing values. In time series things are even more challenging as the observations
are dependent. In this study, we consider EM (LTS) method for estimating the missing values.
error (RMSE) and estimated bias (EB) are considered to examine the accuracy of these
imputation methods. In order to select the best method for estimating missing values, the
The mean absolute error is the average difference between predicted and actual data values, and
is given by
$$MAE = \frac{1}{N}\sum_{i=1}^{N} |P_i - O_i| \qquad (2.49)$$
where N is the number of imputations, Pi and Oi are the imputed and observed data points,
respectively. MAE varies from 0 to infinity and perfect fit is obtained when MAE=0.
The root mean squared error is one of the most commonly used measures and it is computed by
$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (P_i - O_i)^2} \qquad (2.50)$$
The smaller the RMSE value, the better the performance of the model.
The estimated bias is the absolute difference between the observed and the estimated value of the
$$EB = |O_i - E_i| \qquad (2.51)$$
where $E_i$ is the estimated value of the parameter obtained from the imputation methods.
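The accuracy measures (2.49) to (2.51) translate directly into code. The sketch below is illustrative only; the function names and the example values are hypothetical.

```python
import math


def mae(imputed, observed):
    """Mean absolute error as in (2.49)."""
    return sum(abs(p - o) for p, o in zip(imputed, observed)) / len(observed)


def rmse(imputed, observed):
    """Root mean squared error as in (2.50)."""
    return math.sqrt(
        sum((p - o) ** 2 for p, o in zip(imputed, observed)) / len(observed)
    )


def estimated_bias(observed_value, estimated_value):
    """Absolute difference between an observed and an estimated value, as in (2.51)."""
    return abs(observed_value - estimated_value)


imputed_values = [1.0, 2.0, 3.0]    # hypothetical imputations P_i
observed_values = [1.0, 4.0, 7.0]   # hypothetical observations O_i
print(mae(imputed_values, observed_values), rmse(imputed_values, observed_values))
```

Because RMSE squares the differences before averaging, a single large imputation error raises it more than it raises MAE.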
2.5 Computation
We have used a number of modern and sophisticated statistical software such as R, S-Plus and
CHAPTER 3
OUTLIER ANALYSIS AND ESTIMATION OF MISSING
VALUES BY ROBUST EM ALGORITHM FOR SAUDI ARABIA
OIL PRODUCTION DATA
In this section our main objective is to estimate missing values of Saudi Arabia oil production
data because if any data is missing we cannot fit any ARCH model to the data. It is now evident
that outliers may have an adverse effect [see Mamun (2013)] on the estimation of missing values
and also in the determination of the order of ARCH [see Imon et al. (2007)]. The presence of
outliers may also break the normality assumption [see Imon (2003)] which is one of the most
important assumptions required for statistical inference. So we would like to apply a robust
approach of missing value estimation. For this reason we need to know which observations are
At first we would like to identify outliers (if any) from all sixteen variables we are using in our
study. An excellent review of different outlier detection methods is available in Hadi et al.
(2009). But since our final objective is to fit the data by ARCH models we restrict our attention
to the identification of outliers in time series and ARCH models. An excellent review of methods
appropriate for such a condition is available in Murugeson et al. (2007). We would also like to
employ robust methods for the identification of outliers in time series data. Again we have many
different choices, but following the suggestions of Rousseeuw and Leroy (1987), Barnett and
Lewis (1994), and Imon et al. (2007) we use the least trimmed squares (LTS) method for the
identification purpose. We consider the sixteen data sets one by one using S-Plus and the results
The first variable that we consider is the Crude Oil Production data. The attached S-Plus output
shows that cases 17 – 26, i.e., observations for the years 1978 – 1987 are appearing as outliers.
Outliers in Model 1:
17 18 19 20 21 22 23 24 25 26
Export of Refined Oil to South America
Export of Crude Oil to Oceania
The attached S-Plus output shows that cases 16 – 26, i.e., observations for the years 1977 – 1987
16 17 18 19 20 21 22 23 24 25 26
The attached S-Plus output shows that cases 20 – 26, i.e., observations for the years 1981 – 1987
20 21 22 23 24 25 26
The above results make some sense. The Saudi Arabian pipeline for exporting oil was damaged
In this section we would like to estimate the missing values of the oil production variables of
Saudi Arabia. We have observed in section 1.2 that 14 out of 16 variables have missing
observations. Information for the year 1987 is missing for all fourteen of these variables. For the ‘Export
of Refined Oil to Africa’ another observation for the year 1982 is also missing. Estimation of
missing values is necessary while fitting an ARCH model because if there exists any
discontinuity the process does not converge. In section 2.4 we discussed different methods of
estimating missing values, but since we are dealing with time series data, the simple mean
imputation or EM estimation would not be appropriate. Moreover, a few of the
variables also contain outliers. For this reason we use the robust EM (LTS) method for
estimating missing values using S-Plus and the estimates are given in the following table.
Table 3.1 Estimates of Missing Values for the Saudi Arabia Oil Production Data
1982 1987
Next we construct time series plots for all fourteen variables where some values were missing, and
for comparison they are displayed with the corresponding time series plots of the original data.
[Two time series plots of Crude_North_america against Year (1962-2010): original data and data with estimated missing values]
Figure 3.1 Time Series Plot of Original and Missing Value Estimated Export of Crude Oil to
North America
[Two time series plots of Refine_North_america: original data and data with estimated missing values]
Figure 3.2 Time Series Plot of Original and Missing Value Estimated Export of Refined Oil to
North America
[Two time series plots of Crude_South_america against Year (1962-2010): original data and data with estimated missing values]
Figure 3.3 Time Series Plot of Original and Missing Value Estimated Export of Crude Oil to
South America
[Two time series plots of Refine_South_america against Year (1962-2010): original data and data with estimated missing values]
Figure 3.4 Time Series Plot of Original and Missing Value Estimated Export of Refined Oil to
South America
[Two time series plots of Crude_Western_europe against Year (1962-2010): original data and data with estimated missing values]
Figure 3.5 Time Series Plot of Original and Missing Value Estimated Export of Crude Oil to
Western Europe
[Two time series plots of Refine_Western_europe against Year (1962-2010): original data and data with estimated missing values]
Figure 3.6 Time Series Plot of Original and Missing Value Estimated Export of Refined Oil to
Western Europe
[Two time series plots of Crude_Middle_east against Year (1962-2010): original data and data with estimated missing values]
Figure 3.7 Time Series Plot of Original and Missing Value Estimated Export of Crude Oil to
Middle East
[Two time series plots of Refine_Middle_east against Year (1962-2010): original data and data with estimated missing values]
Figure 3.8 Time Series Plot of Original and Missing Value Estimated Export of Refined Oil to
Middle East
[Two time series plots of Crude_Africa against Year (1962-2010): original data and data with estimated missing values]
Figure 3.9 Time Series Plot of Original and Missing Value Estimated Export of Crude Oil to
Africa
[Two time series plots of Refine_Africa: original data and data with estimated missing values]
Figure 3.10 Time Series Plot of Original and Missing Value Estimated Export of Refined Oil to
Africa
[Two time series plots of Crude_Asia_and_Far_east against Year (1962-2010): original data and data with estimated missing values]
Figure 3.11 Time Series Plot of Original and Missing Value Estimated Export of Crude Oil to
Asia and Far East
[Two time series plots of Refine_Asia_and_Far_east against Year (1962-2010): original data and data with estimated missing values]
Figure 3.12 Time Series Plot of Original and Missing Value Estimated Export of Refined Oil to
Asia and Far East
[Two time series plots of Crude_Oceania against Year (1962-2010): original data and data with estimated missing values]
Figure 3.13 Time Series Plot of Original and Missing Value Estimated Export of Crude Oil to
Oceania
[Two time series plots of Refine_Oceania: original data and data with estimated missing values]
Figure 3.14 Time Series Plot of Original and Missing Value Estimated Export of Refined Oil to
Oceania
Figures 3.1 – 3.14 clearly show that the estimation of missing values using EM (LTS) perfectly
CHAPTER 4
SELECTION OF ARCH MODELS FOR SAUDI ARABIA OIL
PRODUCTION DATA
In this chapter our main objective is to find appropriate ARCH models for all sixteen oil
production variables under study. At first we look at the autocorrelation function (ACF) and
partial autocorrelation functions (PACF) to detect whether there is any ARCH effect in the
model and if so what is the order of it. But we know that ACF and PACF can only give an
indication. We need to employ formal tests to answer this question. Later we use the White test
and the Breusch-Pagan test to confirm the order of ARCH, if any. The Goldfeld-Quandt test can
only indicate whether there is any ARCH effect in the model, but it cannot determine its order.
After each fitting we have done the normality test. This is hugely important when we draw
inference as all of our conventional inferential procedures heavily rely on normality assumptions.
this data together with the Ljung-Box t and $\chi^2$ tests are given in Table 4.1.1 and in Figure 4.1.1.
Table 4.1.1 The ACF and PACF Values for the Crude Oil Production Data
4 0.415648 1.39671 97.897 -0.093001 -0.65101
[Two panels: Autocorrelation and Partial Autocorrelation against Lag (1-12), vertical axis from -1.0 to 1.0]
Figure 4.1.1 The ACF and PACF Values for the Crude Oil Production Data
The above table and figure clearly indicate an autoregressive pattern. The ACF values
show a geometrically declining pattern and only the first PACF value is significant; hence the
possible model could be ARCH (1). To confirm this we employ both the White and Breusch-Pagan
tests. Table 4.1.2 gives the significance of ARCH effects based on the White test.
Table 4.1.2 Order of ARCH Using the White Test for the Crude Oil Production Data
The above results clearly show that the data fits an ARCH (1) model. ARCH (1) effect is highly
significant but ARCH (2) is not and the White statistic for ARCH (1) is highly significant.
Next we employ the Breusch-Pagan test and the results are presented in Table 4.1.3.
Table 4.1.3 Order of ARCH Using the Breusch-Pagan Test for the Crude Oil Production Data
Here we observe the same type of results. The above results show that only ARCH (1) is
significant for the data. Thus we can conclude that the crude oil production data fits an ARCH
(1) model.
[Probability Plot of Residual, Normal - 95% CI: Percent against Residual; Mean -4.12115E-13, StDev 334.0, N 48, AD 0.404, P-Value 0.343]
Figure 4.1.2 Normal Probability Plot of ARCH (1) Residuals for the Crude Oil Production Data
For the validity of the inference we now check the normality assumption of the errors. At
first we give a normal probability plot of the ARCH (1) residuals which is given in Figure 4.1.2.
Table 4.1.4 Normality Test of ARCH (1) Residuals for the Crude Oil Production Data
Skewness -0.83
Kurtosis 4.81
It is not very clear from the above plot whether the errors satisfy the normality assumption
here. Hence we employ two analytic tests, the Jarque-Bera and the rescaled moments tests, to check
The above table clearly shows non-normality of the errors. Both the JB and RM tests appear
to be significant. For this particular data set we observed in section 3.2 that there are a few outliers. The
above normal probability plot also suggests the presence of a few outliers. We believe these
outliers are the main reason for this apparent non-normality. Since the ARCH (1) model fails the
Now we recompute all results using the robust LTS residuals instead of the least squares
residuals. At first we look at the ACF and PACF values which are now given in Table 4.1.5 and
in Figure 4.1.3.
Table 4.1.5 The ACF and PACF Values for the LTS Crude Oil Production Data
[Two panels: Autocorrelation and Partial Autocorrelation against Lag (1-9), vertical axis from -1.0 to 1.0]
Figure 4.1.3 The LTS ACF and PACF Values for the Crude Oil Production Data
As with the previous results, the ACF and PACF values indicate that the possible model could
be ARCH (1). Tables 4.1.6 and 4.1.7 show that both the White test and the Breusch-Pagan test
confirm that the ARCH (1) model fits the data. For the rest of the examples we observe that the
Breusch-Pagan test consistently gives the same conclusion, and since the White test is a much
easier and more popular test, we report only the White test for brevity.
Table 4.1.6 Order of ARCH Using the LTS White Test for the Crude Oil Production Data
Table 4.1.7 Order of ARCH Using the LTS Breusch-Pagan Test for the Crude Oil Production
Data
Finally we check the normality assumption. The normal probability plot of the ARCH (1) LTS
We use the robust rescaled moments (RRM) test which includes LTS residuals. To make
the results comparable with the tests with the outliers we present this result with the JB test as it
[Probability Plot of Del_Res, Normal - 95% CI: Percent against Del_Res; Mean 33.99, StDev 253.0, N 36, AD 0.426, P-Value 0.299]
Figure 4.1.4 Normal Probability Plot of ARCH (1) LTS Residuals for the Crude Oil Production
Data
Table 4.1.8 Normality Test of ARCH (1) Residuals for the Crude Oil Production Data
The above results clearly show that the LTS residuals follow a normal pattern. So we can make
reliable inferences based on these, and the appropriate model is ARCH (1).
Next we consider the total export of refined oil. The ACF and PACF values for this data
together with the Ljung-Box t and $\chi^2$ tests are given in Table 4.2.1 and in Figure 4.2.1.
Table 4.2.1 The ACF and PACF Values for the Total Export of Refined Oil Data
[Two panels: Autocorrelation and Partial Autocorrelation against Lag (1-12), vertical axis from -1.0 to 1.0]
Figure 4.2.1 The ACF and PACF Values for Total Export of Refined Oil Data
The above table and figure clearly indicate an autoregressive pattern. The ACF values
show a geometrically declining pattern and only the first PACF value is significant, indicating that
the possible model could be ARCH (1). To confirm this we employ the White test and the results
Table 4.2.2 Order of ARCH Using the White Test for the Total Export of Refined Oil Data
[Normal probability plot; horizontal axis: Residual, from -200 to 200]
Figure 4.2.2 Normal Probability Plot of ARCH (1) Residuals for the Total Export of Refined Oil
Data
Here we compare three different ARCH models. At first we fit an ARCH (1) model and observe
that the lag effect is significant. Then we fit an ARCH (2) model and observe that the first lag
effect is significant, but the second one is not. Thus we may conclude that ARCH (1) is the most
appropriate model for this data. The White statistic for ARCH (1) is 23.91, which is highly
significant.
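The order-selection logic described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used for the tables in this thesis: the mean equation is taken to be an AR(1) fitted by ordinary least squares, and the ARCH check is an auxiliary regression of squared residuals on their own lags (in the spirit of Engle's LM test), which stands in for the White test. The function names `ols`, `ar1_residuals` and `arch_lag_test` are hypothetical.

```python
import numpy as np

def ols(X, y):
    """Least-squares fit; returns coefficients, t-ratios and R^2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    t = beta / np.sqrt(np.diag(cov))
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, t, r2

def ar1_residuals(y):
    """Residuals from y_t = c + phi * y_{t-1} + e_t fitted by OLS."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    beta, _, _ = ols(X, y[1:])
    return y[1:] - X @ beta

def arch_lag_test(resid, p):
    """Regress e_t^2 on e_{t-1}^2, ..., e_{t-p}^2.

    Returns the t-ratios of the p lag coefficients and the statistic n * R^2.
    """
    e2 = np.asarray(resid, dtype=float) ** 2
    target = e2[p:]
    lags = [e2[p - j:len(e2) - j] for j in range(1, p + 1)]
    X = np.column_stack([np.ones(len(target))] + lags)
    _, t, r2 = ols(X, target)
    return t[1:], len(target) * r2
```

Calling `arch_lag_test(ar1_residuals(y), p)` for p = 1, 2, 3 and keeping the largest p whose last lag is still significant mirrors the sequential comparison made in the text.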
Table 4.2.3 Normality Test of ARCH (1) Residuals for the Total Export of Refined Oil Data
For the validity of the inference we now check the normality assumption of the errors. A normal
probability plot of the ARCH (1) residuals is given in Figure 4.2.2. This graph suggests that the
errors satisfy the normality assumption. Finally we employ the Jarque-Bera and the rescaled
moments tests to check the normality of the errors; the results are presented in Table 4.2.3.
Thus the ARCH (1) model passes the normality test and we can conclude that ARCH (1) is the
most appropriate model for this data.
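For reference, the Jarque-Bera statistic used in these normality checks is JB = (n/6)[S² + (K − 3)²/4], where S and K are the sample skewness and kurtosis of the residuals. The sketch below computes it in Python; the function name and the use of NumPy are assumptions for illustration, and the rescaled moments test is not reproduced here because its formula is not stated in this section.

```python
import numpy as np

def jarque_bera(resid):
    """Jarque-Bera statistic and its asymptotic chi-square(2) p-value."""
    e = np.asarray(resid, dtype=float)
    n = e.size
    d = e - e.mean()
    m2 = np.mean(d ** 2)
    m3 = np.mean(d ** 3)
    m4 = np.mean(d ** 4)
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # The chi-square(2) upper-tail probability has the closed form exp(-x / 2).
    p_value = float(np.exp(-jb / 2.0))
    return jb, p_value
```

A large p-value means the test does not reject normality. The chi-square approximation is asymptotic, so with series of about 40 observations it should be read with caution.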
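4.3 Export of Crude Oil to North America
Next we consider the export of crude oil to North America. The ACF and PACF values for this
data together with the Ljung-Box t and χ² tests are given in Table 4.3.1 and in Figure 4.3.1.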
Table 4.3.1 The ACF and PACF Values for the Export of Crude Oil to North America Data
6 0.067654 0.22598 67.2031 0.069746 0.45736
[ACF and PACF plots: autocorrelation and partial autocorrelation against lag (lags 1 to 11)]
Figure 4.3.1 The ACF and PACF Values for the Export of Crude Oil to North America Data
The above table and figure indicate an autoregressive pattern. The ACF values decline
geometrically and the first two PACF values are significant, indicating that the possible model
could be ARCH (2). To confirm this we employ the White test and the results are presented in
Table 4.3.2.
Table 4.3.2 Order of ARCH Using the White Test for the Export of Crude Oil to North America
Data
Lag 2 -0.3969 -2.63 0.012
[Normal probability plot of the residuals: percent against residual]
Figure 4.3.2 Normal Probability Plot of ARCH (2) Residuals for the Export of Crude Oil to
North America Data
Here we compare three different ARCH models. At first we fit an ARCH (1) model and observe
that the lag effect is significant. Then we fit an ARCH (2) model and observe that both lag
effects are significant. Next we fit an ARCH (3) model and observe that the first two lag
effects are significant, but the third one is not. Thus we may conclude that ARCH (2) is
the most appropriate model for this data. The White statistic for ARCH (2) is 17.934, which is
highly significant.
For the validity of the inference we now check the normality assumption of the errors. A
normal probability plot of the ARCH (2) residuals is given in Figure 4.3.2. This graph suggests
that the errors satisfy the normality assumption. Finally we employ the Jarque-Bera and the
rescaled moments tests to check the normality of the errors; the results are presented in
Table 4.3.3.
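Table 4.3.3 Normality Test of ARCH (2) Residuals for the Export of Crude Oil to North
America Data
Thus the ARCH (2) model passes the normality test and we can conclude that ARCH (2) is the
most appropriate model for this data.
4.4 Export of Refined Oil to North America
Next we consider the export of refined oil to North America. The ACF and PACF values for
this data together with the Ljung-Box t and χ² tests are given in Table 4.4.1 and in Figure 4.4.1.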
Table 4.4.1 The ACF and PACF Values for the Export of Refined Oil to North America Data
7 0.112194 0.34285 86.5790 0.029984 0.19662
[ACF and PACF plots: autocorrelation and partial autocorrelation against lag (lags 1 to 11)]
Figure 4.4.1 The ACF and PACF Values for the Export of Refined Oil to North America Data
The above table and figure indicate an autoregressive pattern. The ACF values decline
geometrically and the first two PACF values are significant, indicating that the possible model
could be ARCH (2). To confirm this we employ the White test and the results are presented in
Table 4.4.2.
Table 4.4.2 Order of ARCH Using the White Test for the Export of Refined Oil to North
America Data
Here we compare three different ARCH models. At first we fit an ARCH (1) model and observe
that the lag effect is significant. Then we fit an ARCH (2) model and observe that the first lag
effect is significant, but the second one is not. Thus we may conclude that ARCH (1) is the most
appropriate model for this data. The White statistic for ARCH (1) is 14.964 and that is highly
significant as well.
For the validity of the inference we now check the normality assumption of the errors. A
normal probability plot of the ARCH (1) residuals is given in Figure 4.4.2. This graph suggests
that the errors satisfy the normality assumption. Finally we employ the Jarque-Bera and the
rescaled moments tests to check the normality of the errors; the results are presented in
Table 4.4.3.
[Normal probability plot of the residuals: percent against residual]
Figure 4.4.2 Normal Probability Plot of ARCH (1) Residuals for the Export of Refined Oil to
North America Data
Table 4.4.3 Normality Test of ARCH (1) Residuals for the Export of Refined Oil to North
America Data
Thus the ARCH (1) model passes the normality test and we can conclude that ARCH (1) is the
most appropriate model for this data.
4.5 Export of Crude Oil to South America
Next we consider the export of crude oil to South America. The ACF and PACF values for
this data together with the Ljung-Box t and χ² tests are given in Table 4.5.1 and in Figure 4.5.1.
Table 4.5.1 The ACF and PACF Values for the Export of Crude Oil to South America Data
9 -0.040192 -0.13265 69.3862 0.225219 1.47686
[ACF and PACF plots: autocorrelation and partial autocorrelation against lag (lags 1 to 11)]
Figure 4.5.1 The ACF and PACF Values for the Export of Crude Oil to South America Data
The above table and figure indicate an autoregressive pattern. The ACF values decline
geometrically and the first two PACF values are significant, indicating that the possible model
could be ARCH (2). To confirm this we employ the White test and the results are presented in
Table 4.5.2.
Table 4.5.2 Order of ARCH Using the White Test for the Export of Crude Oil to South America
Data
Here we compare three different ARCH models. At first we fit an ARCH (1) model and observe
that the lag effect is significant. Then we fit an ARCH (2) model and observe that the first lag
effect is significant, but the second one is not. Thus we may conclude that ARCH (1) is the most
appropriate model for this data. The White statistic for ARCH (1) is 14.534 and that is highly
significant as well.
For the validity of the inference we now check the normality assumption of the errors. A
normal probability plot of the ARCH (1) residuals is given in Figure 4.5.2. Although this graph
shows a slight departure from normality in the middle, overall the errors appear to satisfy the
normality assumption. Finally we employ the Jarque-Bera and the rescaled moments tests to
check the normality of the errors; the results are presented in Table 4.5.3.
[Normal probability plot of the residuals: percent against residual]
Figure 4.5.2 Normal Probability Plot of ARCH (1) Residuals for the Export of Crude Oil to
South America Data
Table 4.5.3 Normality Test of ARCH (1) Residuals for the Export of Crude Oil to South
America Data
Thus the ARCH (1) model passes the normality test and we can conclude that ARCH (1) is the
most appropriate model for this data.
4.6 Export of Refined Oil to South America
Next we consider the export of refined oil to South America. The ACF and PACF values for
this data together with the Ljung-Box t and χ² tests are given in Table 4.6.1 and in Figure 4.6.1.
Table 4.6.1 The ACF and PACF Values for the Export of Refined Oil to South America Data
10 -0.093397 -0.33281 57.4892 -0.002826 -0.01853
[ACF and PACF plots: autocorrelation and partial autocorrelation against lag (lags 1 to 11)]
Figure 4.6.1 The ACF and PACF Values for the Export of Refined Oil to South America Data
The above table and figure indicate an autoregressive pattern. The ACF values decline
geometrically and only the first PACF value is significant, indicating that the possible model
could be ARCH (1). To confirm this we employ the White test and the results are presented in
Table 4.6.2.
Table 4.6.2 Order of ARCH Using the White Test for the Export of Refined Oil to South
America Data
Here we compare three different ARCH models. At first we fit an ARCH (1) model and observe
that the lag effect is significant. Then we fit an ARCH (2) model and observe that the first lag
effect is significant, but the second one is not. Thus we may conclude that ARCH (1) is the most
appropriate model for this data. The White statistic for ARCH (1) is 4.945, which is significant.
For the validity of the inference we now check the normality assumption of the errors. A
normal probability plot of the ARCH (1) residuals is given in Figure 4.6.2. This graph suggests
that the errors satisfy the normality assumption. Finally we employ the Jarque-Bera and the
rescaled moments tests to check the normality of the errors; the results are presented in
Table 4.6.3.
[Normal probability plot of the residuals: percent against residual]
Figure 4.6.2 Normal Probability Plot of ARCH (1) Residuals for the Export of Refined Oil to
South America Data
Table 4.6.3 Normality Test of ARCH (1) Residuals for the Export of Refined Oil to South
America Data
Thus the ARCH (1) model passes the normality test and we can conclude that ARCH (1) is the
most appropriate model for this data.
4.7 Export of Crude Oil to Western Europe
Next we consider the export of crude oil to Western Europe. The ACF and PACF values for
this data together with the Ljung-Box t and χ² tests are given in Table 4.7.1 and in Figure 4.7.1.
Table 4.7.1 The ACF and PACF Values for the Export of Crude Oil to Western Europe Data
[ACF and PACF plots for Crude_Western_europe, with 5% significance limits: autocorrelation and partial autocorrelation against lag (lags 1 to 12)]
Figure 4.7.1 The ACF and PACF Values for the Export of Crude Oil to Western Europe Data
The above table and figure indicate an autoregressive pattern. The ACF values decline
geometrically and only the first PACF value is significant, indicating that the possible model
could be ARCH (1). To confirm this we employ the White test and the results are presented in
Table 4.7.2.
Table 4.7.2 Order of ARCH Using the White Test for the Export of Crude Oil to Western Europe
Data
Here we compare three different ARCH models. At first we fit an ARCH (1) model and observe
that the lag effect is significant. Then we fit an ARCH (2) model and observe that the first lag
effect is significant, but the second one is not. Thus we may conclude that ARCH (1) is the most
appropriate model for this data. The White statistic for ARCH (1) is 21.456, which is highly
significant.
For the validity of the inference we now check the normality assumption of the errors. A normal
probability plot of the ARCH (1) residuals is given in Figure 4.7.2. This graph suggests that the
errors satisfy the normality assumption. Finally we employ the Jarque-Bera and the rescaled
moments tests to check the normality of the errors; the results are presented in Table 4.7.3.
[Normal probability plot of the residuals: percent against residual]
Figure 4.7.2 Normal Probability Plot of ARCH (1) Residuals for the Export of Crude Oil to
Western Europe Data
Table 4.7.3 Normality Test of ARCH (1) Residuals for the Export of Crude Oil to Western
Europe Data
Thus the ARCH (1) model passes the normality test and we can conclude that ARCH (1) is the
most appropriate model for this data.
4.8 Export of Refined Oil to Western Europe
Next we consider the export of refined oil to Western Europe. The ACF and PACF values for
this data together with the Ljung-Box t and χ² tests are given in Table 4.8.1 and in Figure 4.8.1.
Table 4.8.1 The ACF and PACF Values for the Export of Refined Oil to Western Europe Data
[ACF and PACF plots for Refine_Western_europe, with 5% significance limits: autocorrelation and partial autocorrelation against lag (lags 1 to 12)]
Figure 4.8.1 The ACF and PACF Values for the Export of Refined Oil to Western Europe Data
The above table and figure indicate an autoregressive pattern. The ACF values decline
geometrically and only the first PACF value is significant, indicating that the possible model
could be ARCH (1). To confirm this we employ the White test and the results are presented in
Table 4.8.2.
Table 4.8.2 Order of ARCH Using the White Test for the Export of Refined Oil to Western
Europe Data
Here we compare three different ARCH models. At first we fit an ARCH (1) model and observe
that the lag effect is significant. Then we fit an ARCH (2) model and observe that the first lag
effect is significant, but the second one is not. Thus we may conclude that ARCH (1) is the most
appropriate model for this data. The White statistic for ARCH (1) is 12.77, which is highly
significant.
For the validity of the inference we now check the normality assumption of the errors. A normal
probability plot of the ARCH (1) residuals is given in Figure 4.8.2. This graph suggests that the
errors satisfy the normality assumption. Finally we employ the Jarque-Bera and the rescaled
moments tests to check the normality of the errors; the results are presented in Table 4.8.3.
[Normal probability plot of the residuals: percent against residual]
Figure 4.8.2 Normal Probability Plot of ARCH (1) Residuals for the Export of Refined Oil to
Western Europe Data
Table 4.8.3 Normality Test of ARCH (1) Residuals for the Export of Refined Oil to Western
Europe Data
Thus the ARCH (1) model passes the normality test and we can conclude that ARCH (1) is the
most appropriate model for this data.
4.9 Export of Crude Oil to Middle East
Next we consider the export of crude oil to Middle East. The ACF and PACF values for this
data together with the Ljung-Box t and χ² tests are given in Table 4.9.1 and in Figure 4.9.1.
Table 4.9.1 The ACF and PACF Values for the Export of Crude Oil to Middle East Data
[ACF and PACF plots: autocorrelation and partial autocorrelation against lag (lags 1 to 11)]
Figure 4.9.1 The ACF and PACF Values for the Export of Crude Oil to Middle East Data
The above table and figure indicate an autoregressive pattern. The ACF values decline
geometrically and only the first PACF value is significant, indicating that the possible model
could be ARCH (1). To confirm this we employ the White test and the results are presented in
Table 4.9.2.
Table 4.9.2 Order of ARCH Using the White Test for the Export of Crude Oil to Middle East
Data
At first we fit an ARCH (1) model. Although the ACF and PACF values indicated that an ARCH
(1) model could fit this data, we observe that the lag effect is insignificant. Then we fit an ARCH
(2) model and observe that both lag effects are insignificant. The White statistic for
ARCH (1) is only 2.322 with a p-value of 0.1275, which is insignificant at both the 5% and 10%
levels. Thus we may conclude that these data do not show any evidence of an ARCH effect and hence
can be fitted by an AR(1) model.
For the validity of the inference we now check the normality assumption of the errors. A
normal probability plot of the AR (1) residuals is given in Figure 4.9.2. This graph suggests
that the errors satisfy the normality assumption. Finally we employ the Jarque-Bera and the
rescaled moments tests to check the normality of the errors; the results are presented in
Table 4.9.3.
[Normal probability plot of Residual (Normal, 95% CI): Mean 2.015958E-14, StDev 18.05, N 43, AD 0.687, P-Value 0.068]
Figure 4.9.2 Normal Probability Plot of AR (1) Residuals for the Export of Crude Oil to Middle
East Data
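Table 4.9.3 Normality Test of AR (1) Residuals for the Export of Crude Oil to Middle East Data
Thus the AR (1) model passes the normality test and we can conclude that AR (1) is the most
appropriate model for this data.
4.10 Export of Refined Oil to Middle East
Next we consider the export of refined oil to Middle East. The ACF and PACF values for this
data together with the Ljung-Box t and χ² tests are given in Table 4.10.1 and in Figure 4.10.1.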
Table 4.10.1 The ACF and PACF Values for the Export of Refined Oil to Middle East Data
[ACF and PACF plots: autocorrelation and partial autocorrelation against lag]
Figure 4.10.1 The ACF and PACF Values for the Export of Refined Oil to Middle East Data
The above table and figure indicate an autoregressive pattern. The ACF values decline
geometrically and only the first PACF value is significant, indicating that the possible model
could be ARCH (1). To confirm this we employ the White test and the results are presented in
Table 4.10.2.
Table 4.10.2 Order of ARCH Using the White Test for the Export of Refined Oil to Middle East
Data
At first we fit an ARCH (1) model. Although the ACF and PACF values indicated that an ARCH
(1) model could fit this data, we observe that the first lag effect is significant at the 10% level,
but not at the 5% level. Then we fit an ARCH (2) model and observe that both lag effects are
insignificant. The White statistic for ARCH (1) is 3.27 with a p-value of 0.0725,
which is significant at the 10% level, but not at the 5% level. Thus we may conclude that these
data do not show any strong evidence of an ARCH effect and hence can be fitted by an AR(1)
model.
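The p-values quoted with the White statistics for the Middle East and Africa series can be checked approximately with a short Python sketch. It assumes that the statistic is referred to a chi-square distribution with one degree of freedom; this is our assumption, and the text does not state the reference distribution explicitly. Under that assumption the upper-tail probability has the closed form erfc(√(x/2)). The function name is hypothetical.

```python
import math

def chi2_1df_pvalue(stat):
    """Upper-tail probability P(X > stat) for X ~ chi-square with 1 d.f."""
    return math.erfc(math.sqrt(stat / 2.0))

# White statistics and p-values as quoted in the text for the
# Middle East (crude, refined) and Africa (crude, refined) series.
reported = {2.322: 0.1275, 3.27: 0.0725, 0.946: 0.3307, 0.43: 0.5119}

for stat, quoted in reported.items():
    computed = chi2_1df_pvalue(stat)
    print(f"statistic={stat:6.3f}  computed p={computed:.4f}  quoted p={quoted:.4f}")
```

The computed and quoted values agree closely, which is consistent with a one-degree-of-freedom reference distribution for the ARCH (1) comparison.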
For the validity of the inference we now check the normality assumption of the errors. A
normal probability plot of the AR (1) residuals is given in Figure 4.10.2. This graph suggests
that the errors satisfy the normality assumption. Finally we employ the Jarque-Bera and the
rescaled moments tests to check the normality of the errors; the results are presented in
Table 4.10.3.
[Normal probability plot of Residual (Normal, 95% CI): Mean 6.373403E-13, StDev 10.31, N 43, AD 0.584, P-Value 0.121]
Figure 4.10.2 Normal Probability Plot of AR (1) Residuals for the Export of Refined Oil to
Middle East Data
Table 4.10.3 Normality Test of AR (1) Residuals for the Export of Refined Oil to Middle East
Data
Thus the AR (1) model passes the normality test and we can conclude that AR (1) is the most
appropriate model for this data.
4.11 Export of Crude Oil to Africa
Next we consider the export of crude oil to Africa. The ACF and PACF values for this data
together with the Ljung-Box t and χ² tests are given in Table 4.11.1 and in Figure 4.11.1.
Table 4.11.1 The ACF and PACF Values for the Export of Crude Oil to Africa Data
The table and figure indicate an autoregressive pattern. The ACF values decline
geometrically and only the first PACF value is significant, indicating that the possible model
could be ARCH (1). To confirm this we employ the White test and the results are presented in
Table 4.11.2.
[ACF and PACF plots for Crude_Africa, with 5% significance limits: autocorrelation and partial autocorrelation against lag (lags 1 to 11)]
Figure 4.11.1 The ACF and PACF Values for the Export of Crude Oil to Africa Data
Table 4.11.2 Order of ARCH Using the White Test for the Export of Crude Oil to Africa Data
At first we fit an ARCH (1) model. Although the ACF and PACF values indicated that an ARCH
(1) model could fit this data, we observe that the lag effect is insignificant. Then we fit an ARCH
(2) model and observe that both lag effects are insignificant. The White statistic for
ARCH (1) is only 0.946 with a p-value of 0.3307, which is insignificant at both the 5% and 10%
levels. Thus we may conclude that these data do not show any evidence of an ARCH effect and hence
can be fitted by an AR(1) model.
For the validity of the inference we now check the normality assumption of the errors. A
normal probability plot of the AR (1) residuals is given in Figure 4.11.2. This graph suggests
that the errors satisfy the normality assumption. Finally we employ the Jarque-Bera and the
rescaled moments tests to check the normality of the errors; the results are presented in
Table 4.11.3.
[Normal probability plot of Residual (Normal, 95% CI): Mean 5.023372E-14, StDev 21.15, N 43, AD 0.222, P-Value 0.819]
Figure 4.11.2 Normal Probability Plot of AR (1) Residuals for the Export of Crude Oil to Africa
Data
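Table 4.11.3 Normality Test of AR (1) Residuals for the Export of Crude Oil to Africa Data
Thus the AR (1) model passes the normality test and we can conclude that AR (1) is the most
appropriate model for this data.
4.12 Export of Refined Oil to Africa
Next we consider the export of refined oil to Africa. The ACF and PACF values for this data
together with the Ljung-Box t and χ² tests are given in Table 4.12.1 and in Figure 4.12.1.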
Table 4.12.1 The ACF and PACF Values for the Export of Refined Oil to Africa Data
The table and figure indicate an autoregressive pattern. The ACF values decline
geometrically and only the first PACF value is significant, indicating that the possible model
could be ARCH (1). To confirm this we employ the White test and the results are presented in
Table 4.12.2.
[ACF and PACF plots for Refine_Africa, with 5% significance limits: autocorrelation and partial autocorrelation against lag (lags 1 to 11)]
Figure 4.12.1 The ACF and PACF Values for the Export of Refined Oil to Africa Data
Table 4.12.2 Order of ARCH Using the White Test for the Export of Refined Oil to Africa Data
At first we fit an ARCH (1) model. Although the ACF and PACF values indicated that an ARCH
(1) model could fit this data, we observe that the lag effect is insignificant. Then we fit an ARCH
(2) model and observe that both lag effects are insignificant. The White statistic for
ARCH (1) is only 0.43 with a p-value of 0.5119, which is insignificant at both the 5% and 10%
levels. Thus we may conclude that these data do not show any evidence of an ARCH effect and hence
can be fitted by an AR(1) model.
For the validity of the inference we now check the normality assumption of the errors. A
normal probability plot of the AR (1) residuals is given in Figure 4.12.2. This graph suggests
that the errors satisfy the normality assumption. Finally we employ the Jarque-Bera and the
rescaled moments tests to check the normality of the errors; the results are presented in
Table 4.12.3.
[Normal probability plot of Residual (Normal, 95% CI): Mean -0.2040, StDev 7.747, N 43, AD 0.530, P-Value 0.166]
Figure 4.12.2 Normal Probability Plot of AR (1) Residuals for the Export of Refined Oil to
Africa Data
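Table 4.12.3 Normality Test of AR (1) Residuals for the Export of Refined Oil to Africa Data
Thus the AR (1) model passes the normality test and we can conclude that AR (1) is the most
appropriate model for this data.
4.13 Export of Crude Oil to Asia and Far East
Next we consider the export of crude oil to Asia and Far East. The ACF and PACF values for
this data together with the Ljung-Box t and χ² tests are given in Table 4.13.1 and in Figure
4.13.1.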
Table 4.13.1 The ACF and PACF Values for the Export of Crude Oil to Asia and Far East Data
The above table and figure indicate an autoregressive pattern. The ACF values decline
geometrically and only the first PACF value is significant, indicating that the possible model
could be ARCH (1). To confirm this we employ the White test and the results are presented in
Table 4.13.2.
[ACF plot for Crude_Asia_and_Far_east and PACF plot (panel titled Crude_Africa in the original), with 5% significance limits: autocorrelation and partial autocorrelation against lag (lags 1 to 11)]
Figure 4.13.1 The ACF and PACF Values for the Export of Crude Oil to Asia and Far East Data
Table 4.13.2 Order of ARCH Using the White Test for the Export of Crude Oil to Asia and Far
East Data
Here we compare three different ARCH models. At first we fit an ARCH (1) model and observe
that the lag effect is significant. Then we fit an ARCH (2) model and observe that both lag
effects are significant. Next we fit an ARCH (3) model and observe that the first two lag
effects are significant, but the third one is not. Thus we may conclude that ARCH (2) is
the most appropriate model for this data. The White statistic for ARCH (2) is 31.69, which is
highly significant.
[Normal probability plot of Residual (Normal, 95% CI): Mean 4.896466E-12, StDev 235.1, N 43, AD 0.798, P-Value 0.036]
Figure 4.13.2 Normal Probability Plot of ARCH (2) Residuals for the Export of Crude Oil to
Asia and Far East Data
Table 4.13.3 Normality Test of ARCH (2) Residuals for the Export of Crude Oil to Asia and Far
East Data
For the validity of the inference we now check the normality assumption of the errors. A
normal probability plot of the ARCH (2) residuals is given in Figure 4.13.2. This graph suggests
that the errors satisfy the normality assumption. Finally we employ the Jarque-Bera and the
rescaled moments tests to check the normality of the errors; the results are presented in
Table 4.13.3. Thus the ARCH (2) model passes the normality test and we can conclude that
ARCH (2) is the most appropriate model for this data.
4.14 Export of Refined Oil to Asia and Far East
Next we consider the export of refined oil to Asia and Far East. The ACF and PACF values for
this data together with the Ljung-Box t and χ² tests are given in Table 4.14.1 and in Figure
4.14.1.
Table 4.14.1 The ACF and PACF Values for the Export of Refined Oil to Asia and Far East Data
[ACF and PACF plots for Refine_Asia_and_Far_east, with 5% significance limits: autocorrelation and partial autocorrelation against lag (lags 1 to 11)]
Figure 4.14.1 The ACF and PACF Values for the Export of Refined Oil to Asia and Far East
Data
The above table and figure indicate an autoregressive pattern. The ACF values decline
geometrically and only the first PACF value is significant, indicating that the possible model
could be ARCH (1). To confirm this we employ the White test and the results are presented in
Table 4.14.2.
Here we compare three different ARCH models. At first we fit an ARCH (1) model and
observe that the lag effect is significant. Then we fit an ARCH (2) model and observe that the
first lag effect is significant, but the second one is not. Thus we may conclude that ARCH (1) is
the most appropriate model for this data. The White statistic for ARCH (1) is 24.08, which is
highly significant.
Table 4.14.2 Order of ARCH Using the White Test for the Export of Refined Oil to Asia and Far
East Data
[Normal probability plot of Residual (Normal, 95% CI): Mean 9.980647E-13, StDev 52.50, N 43, AD 0.575, P-Value 0.127]
Figure 4.14.2 Normal Probability Plot of ARCH (1) Residuals for the Export of Refined Oil to
Asia and Far East Data
Table 4.14.3 Normality Test of ARCH (1) Residuals for the Export of Refined Oil to Asia and
Far East Data
For the validity of the inference we now check the normality assumption of the errors. A
normal probability plot of the ARCH (1) residuals is given in Figure 4.14.2. This graph suggests
that the errors satisfy the normality assumption. Finally we employ the Jarque-Bera and the
rescaled moments tests to check the normality of the errors; the results are presented in
Table 4.14.3. Thus the ARCH (1) model passes the normality test and we can conclude that
ARCH (1) is the most appropriate model for this data.
4.15 Export of Crude Oil to Oceania
Next we consider the export of crude oil to Oceania. The ACF and PACF values for this data
together with the Ljung-Box t and χ² tests are given in Table 4.15.1 and in Figure 4.15.1.
Table 4.15.1 The ACF and PACF Values for the Export of Crude Oil to Oceania Data
[ACF and PACF plots: autocorrelation and partial autocorrelation against lag (lags 1 to 11)]
Figure 4.15.1 The ACF and PACF Values for the Export of Crude Oil to Oceania Data
The above table and figure indicate an autoregressive pattern. The ACF values decline
geometrically and only the first PACF value is significant, indicating that the possible model
could be ARCH (1). To confirm this we employ the White test and the results are presented in
Table 4.15.2.
Here we compare three different ARCH models. At first we fit an ARCH (1) model and observe
that the lag effect is significant. Then we fit an ARCH (2) model and observe that both lag
effects are significant. Next we fit an ARCH (3) model and observe that the first two lag
effects are significant, but the third one is not. Thus we may conclude that ARCH (2) is
the most appropriate model for this data. The White statistic for ARCH (2) is 16.02, which is
highly significant.
For the validity of the inference we now check the normality assumption of the errors. A normal
probability plot of the ARCH (2) residuals is given in Figure 4.15.2. This graph suggests that the
errors satisfy the normality assumption. Finally we employ the Jarque-Bera and the rescaled
moments tests to check the normality of the errors; the results are presented in Table 4.15.3.
Table 4.15.2 Order of ARCH Using the White Test for the Export of Crude Oil to Oceania Data
Lag 3 0.1567 0.95 0.348
[Normal probability plot of the residuals: percent against residual]
Figure 4.15.2 Normal Probability Plot of ARCH (2) Residuals for the Export of Crude Oil to
Oceania Data
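Table 4.15.3 Normality Test of ARCH (2) Residuals for the Export of Crude Oil to Oceania
Data
Thus the ARCH (2) model passes the normality test and we can conclude that ARCH (2) is the
most appropriate model for this data.
4.16 Export of Refined Oil to Oceania
Next we consider the export of refined oil to Oceania. The ACF and PACF values for this data
together with the Ljung-Box t and χ² tests are given in Table 4.16.1 and in Figure 4.16.1.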
Table 4.16.1 The ACF and PACF Values for the Export of Refined Oil to Oceania Data
[ACF and PACF plots: autocorrelation and partial autocorrelation against lag (lags 1 to 11)]
Figure 4.16.1 The ACF and PACF Values for the Export of Refined Oil to Oceania Data
The above table and figure indicate an autoregressive pattern. The ACF values decline
geometrically and only the first PACF value is significant, indicating that the possible model
could be ARCH (1). To confirm this we employ the White test and the results are presented in
Table 4.16.2.
Here we compare three different ARCH models. At first we fit an ARCH (1) model and observe
that the lag effect is significant. Then we fit an ARCH (2) model and observe that the first lag
effect is significant, but the second one is not. Thus we may conclude that ARCH (1) is the most
appropriate model for this data. The White statistic for ARCH (1) is 10.79, which is highly
significant.
Table 4.16.2 Order of ARCH Using the White Test for the Export of Refined Oil to Oceania
Data
For the validity of the inference we now check the normality assumption of the errors. A
normal probability plot of the ARCH (1) residuals is given in Figure 4.16.2. This graph suggests
that the errors satisfy the normality assumption. Finally we employ the Jarque-Bera and the
rescaled moments tests to check the normality of the errors; the results are presented in
Table 4.16.3.
[Figure: Probability Plot of Residual, Normal - 95% CI; horizontal axis: Residual, vertical axis: Percent. Legend: Mean 2.491031E-14, StDev 5.445, N 43, AD 0.965, P-Value 0.014]
Figure 4.16.2 Normal Probability Plot of ARCH (1) Residuals for the Export of Refined Oil to
Oceania Data
Table 4.16.3 Normality Test of ARCH (1) Residuals for the Export of Refined Oil to Oceania
Data
Thus the ARCH (1) model passes the normality test and we can conclude that ARCH (1) is the
most appropriate model for this data.
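For reference, the Jarque-Bera statistic used in the normality check can be computed directly from the sample skewness S and kurtosis K as JB = (n/6)[S² + (K − 3)²/4], which is asymptotically chi-square with 2 degrees of freedom under normality. A minimal sketch follows. The function name and the sample input are hypothetical, and the rescaled moments test is not reproduced here.

```python
import math


def jarque_bera(residuals):
    """Return (JB statistic, skewness, kurtosis, asymptotic p-value)."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Central moments with divisor n, as in the usual JB definition
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # The chi-square(2) survival function has the closed form exp(-x / 2)
    p_value = math.exp(-jb / 2.0)
    return jb, skew, kurt, p_value


if __name__ == "__main__":
    toy_residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical values
    print(jarque_bera(toy_residuals))
```

A large p-value means the test does not reject normality. With small samples, such as the 43 residuals here, the chi-square approximation is rough, which is one motivation for small-sample corrections such as the rescaled moments test.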
Table 4.17 Selected Models for Saudi Arabia Oil Production Data
The autocorrelation functions and partial autocorrelation functions indicate that all sixteen variables
exhibit an autoregressive pattern. The White test suggests an ARCH (1) model for nine of them.
Three other variables fit an ARCH (2) model. We do not find any ARCH effect in four other
CHAPTER 5
CONCLUSIONS AND DIRECTION OF
FUTURE RESEARCH
In this chapter we will summarize the findings of our research to draw some conclusions and
5.1 Conclusions
In our research the prime objective was to find the most appropriate models for analyzing Saudi
Arabia oil production data. Since these are time series data we could consider ARIMA models to
fit the data. But most of the variables showed some kind of volatility and for this reason we
selected ARCH models for them. If there is no ARCH effect, the model automatically reduces to an
ARIMA model. But the existence of missing values for almost every variable makes the
analysis complicated since an ARCH model does not converge when observations are
employ the EM algorithm for estimating the missing values. But since our data are time series, the
simple EM algorithm would not be appropriate for them. There is also evidence of the presence
of outliers in the data, and robust regression techniques indicate that three out of sixteen
variables contained multiple outliers. Hence we finally employed robust regression LTS
After the estimation of missing values we employed the White test to select the most
appropriate ARCH models for all sixteen variables under study. The ACF and PACF values
suggested that all of them showed an autoregressive pattern. Nine of them matched an ARCH (1)
model, three matched ARCH (2), and the remaining four did not show any ARCH effect and
matched AR (1). Normality tests on the resulting residuals were performed to check the validity
of the fitted models and all of them supported the normality assumption confirming that our
possible for us to look at every aspect of time series properties of Saudi Arabia oil production data.
Because of time constraints we could not study the interrelationships among the variables. We have
selected the appropriate ARCH model for each of the variables but we could not study their goodness
of fit. We also could not study how effective these ARCH models are in forecasting. A cross-
validation study could be used here to judge the quality of prediction for different models. For cross-
validation we used the data splitting technique. However, there is evidence [see Efron and Tibshirani
(1997)] that cross-validation can be improved by implementing a special type of bootstrap. We only
consider ARCH models in our study; sometimes GARCH could be a better alternative for this type of
data.
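The data-splitting idea mentioned above can be sketched as follows, assuming an AR (1) mean model fitted by ordinary least squares and one-step-ahead forecasts scored by mean squared error on the held-out final observations. The function names and the toy series are hypothetical. In a fuller study an ARCH or GARCH fit would take the place of fit_ar1.

```python
def fit_ar1(series):
    """OLS fit of y_t = c + phi * y_{t-1}; returns (c, phi)."""
    x = series[:-1]
    y = series[1:]
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    phi = sxy / sxx
    return my - phi * mx, phi


def holdout_mse(series, n_test):
    """Fit AR(1) on all but the last n_test points, then score
    one-step-ahead forecasts on those held-out points."""
    train, test = series[:-n_test], series[-n_test:]
    c, phi = fit_ar1(train)
    prev = train[-1]
    sq_errors = []
    for actual in test:
        forecast = c + phi * prev
        sq_errors.append((actual - forecast) ** 2)
        prev = actual  # one-step-ahead: condition on the observed value
    return sum(sq_errors) / n_test


if __name__ == "__main__":
    # Hypothetical series that follows y_t = 1 + 0.5 * y_{t-1} exactly
    toy = [2.0 - 2.0 * 0.5 ** t for t in range(12)]
    print(fit_ar1(toy[:-3]), holdout_mse(toy, 3))
```

Holding out the most recent observations, instead of a random subset, keeps the time ordering intact, so each forecast uses only information that would have been available at that point.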
REFERENCES
1. Barnett, V. and Lewis, T. (1994). Outliers in Statistical Data, 3rd ed., Wiley, New York.
6. Franses, P.H. and van Dijk, D. (2000). Nonlinear Time Series Models in Empirical
Finance, Cambridge University Press, Cambridge.
7. Gounder, M.K., Shitan, M. and Imon, A.H.M.R. (2007). Detection of outliers in
non-linear time series: A review, Festschrifts in Honour of Professor Mir Masoom Ali,
Department of Mathematical Sciences, Ball State University, USA, 213 – 224.
9. Greene, W.H. (1997). Econometric Analysis, 3rd ed., Prentice Hall, New Jersey.
10. Hadi, A.S., Imon, A.H.M.R. and Werner, M. (2009). Detection of outliers, Wiley
Interdisciplinary Reviews: Computational Statistics, 1, 57 – 70.
11. Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J. and Stahel, W. (1986). Robust Statistics:
The Approach Based on Influence Functions, Wiley, New York.
12. Imon, A.H.M.R. (2003). Regression residuals, moments, and their use in tests for
normality, Communications in Statistics—Theory and Methods, 32, 1021 – 1034.
13. Imon, A.H.M.R., Doula, M.S. and Hamzah, N.A. (2007). On the detection of ARCH
effect in time series data, Proceedings of an International Conference on
Mathematical Sciences on ‘Integrating Mathematical Sciences within Society’,
Bangi – Putrajaya, Malaysia, pp. 783 – 789.
14. Little, R.J.A. and Rubin, D.B. (2002). Statistical Analysis with Missing Data, 2nd
ed., Wiley, New York.
15. Mamun, A.S.A. (2013). Robust Statistics in Linear Structural Relationship Model
and Analysis of Missing Values. Unpublished Ph.D. thesis, University of Malaya.
16. Maronna, R.A., Martin, R.D. and Yohai, V.J. (2006). Robust Statistics: Theory and
Methods, Wiley, New York.
17. Pearson, K. (1905). On the general theory of skew correlation and non-linear
regression, Biometrika, 4, 171 – 212.
19. Rana, M.S., Habshah, M. and Imon, A.H.M.R. (2009). A robust rescaled moments test
for normality in regression, Journal of Mathematics and Statistics, 5, 54–62.
20. Rousseeuw, P.J. (1984). Least median of squares regression, Journal of the American
Statistical Association, 79, 871 – 880.
21. Rousseeuw, P.J. and Leroy, A.M. (1987). Robust Regression and Outlier Detection,
Wiley, New York.
APPENDIX
SAUDI ARABIA OIL PRODUCTION DATA
http://www.sama.gov.sa/sites/samaen/ReportsStatistics/statistics/Pages/YearlyStatistics.aspx
Total Total Crude Oil Refined Oil Crude Oil Refined Oil
1977 3357.96 188.39 359.68 2.63 369.21 6.11
2002 2588.98 362.64 488.8 4.91 22.08 10.87
Crude Oil Refined Oil Crude Oil Refined Oil Crude Oil Refined Oil
1972 1130.36 7.18 71.31 2.14 57.3 7.87
1987 * * * * * *
1997 591.13 29.95 77.69 67.68 38.57 29.73
Export of Crude Export of Export of Export of
1962
1963
1964
1965
1966
1967
1986 321.47 154.66 3.92 8.65
1987 * * * *