Project 1
School of Economics
The report starts with an analysis of the historical building approvals data set, which shows
that the series is volatile (fluctuating between roughly 1000 and 6000 approvals per month) with
a prominent falling trend beginning in 2002; the summary statistics show that building approvals
average 3592 per month. The following sections discuss each component of the classical
decomposition, beginning with trend. A polynomial trend of order 2 is selected on the basis of
the AIC and SIC together with the significance of the estimated trend coefficients. Seasonality
is addressed in the next section and its existence is confirmed by the result of the Wald test.
Twelve seasonal dummies are introduced because the data are monthly; the estimates show that
building approvals generally increase from the beginning of the year and decrease near the end
of the year. The final component, cycle, is discussed next, and the Box-Jenkins methodology is
used for model selection. An autoregressive process with 3 lags, AR(3), is found to have the
minimum AIC and SIC and is thus selected for the cyclical component.
The next section details a set of diagnostic checks with an in-depth examination of the residuals,
followed by quality checks of the model through various tests, which support the stability and
adequacy of the model. A 24-month-ahead forecast for the period from January 2010 to December
2011 is provided and indicates that future building approvals follow a slow, minor decreasing
trend with strong seasonality. The forecast results will serve as a guideline for RealAnswers
to improve their investment decisions. The shortcomings of the results are discussed at the end
of the report.
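$$y_t = \beta_1 \mathrm{TIME}_t + \beta_2 \mathrm{TIME}_t^2 + \sum_{i=1}^{12} \gamma_i D_{it} + \varepsilon_t, \qquad \Phi(L)\,\varepsilon_t = v_t$$
where $\beta_1, \beta_2$ = coefficients of TIME and TIME2 respectively
$\gamma_i$ = coefficients of seasonal dummies, $i = 1, \dots, 12$
$\Phi(L) = 1 - \phi_1 L - \phi_2 L^2 - \phi_3 L^3$
$\phi_j$ = coefficients of AR lags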
Main Report
1. Historical Data Analysis
Firstly, we begin with summary statistics of the historical building approvals and a time series
plot to show some general features of the series.
Figure 1.1: Histogram of Approvals Figure 1.2: Time series plot of Approvals
Figure 1.1 shows that there is an average of 3592 building approvals per month during the
historical period from July 1983 to December 2009. The maximum occurred in August 1994
at 6363 approvals, while the minimum occurred in January 2009 at 1183 approvals. The
Jarque-Bera test has a p-value of 0.365, which is insignificant even at the 10% level; thus we
fail to reject the null hypothesis of normality and conclude that the data are consistent with a
normal distribution. Furthermore, figure 1.2 shows that building approvals have a falling trend
that is visible near the end of the plot, and that the series has been volatile over time.
Therefore, the historical data reveal a general decline in the number of building approvals with
considerable random fluctuations.
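To make the normality check concrete, the sketch below shows how a Jarque-Bera statistic and its asymptotic p-value can be computed from sample skewness and kurtosis. It is only an illustration: the function name `jarque_bera` and the toy sample are assumptions, not the approvals data or the software output behind figure 1.1.

```python
import math

def jarque_bera(data):
    """Return (JB statistic, asymptotic p-value) for a sample.

    JB = T/6 * (S^2 + (K - 3)^2 / 4), where S is the sample skewness and
    K the sample kurtosis. Under the null of normality, JB is asymptotically
    chi-square with 2 degrees of freedom, whose survival function is
    exp(-x / 2).
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)
    return jb, p_value

# Hypothetical, symmetric toy sample (NOT the approvals data).
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5]
jb_stat, p = jarque_bera(sample)
```

A p-value above the chosen significance level means that normality is not rejected; it does not prove that the data are normal.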
2. Trend: $y_t = T_t + S_t + C_t$
Trend, $T_t$, is the slow, long-run evolution of the building approvals time series, $y_t$.
Generally, the number of building approvals can fluctuate substantially due to changes in the
state of the economy (interest rates, availability of mortgage funds, etc.) that affect the
demand for approvals; slowly evolving preferences and institutional regulations; and the
cyclical nature of the industry. To select the forecasting model that best fits the data, we
first have to choose the trend model.
Minimizing the sum of squared residuals is equivalent to minimizing the in-sample mean squared
error (MSE), but on its own this criterion always favours the higher polynomial order, because
adding a term can never increase the in-sample sum of squared residuals. The Akaike Information
Criterion (AIC) and the Schwarz Information Criterion (SIC) correct for this: each is of the
form penalty factor times MSE (in-sample residual variance), where the penalty grows with the
number of estimated parameters. We therefore select the polynomial order with the minimum AIC
and SIC.
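The "penalty factor times MSE" form can be sketched as follows, assuming the common textbook definitions AIC = exp(2k/T) * SSR/T and SIC = T^(k/T) * SSR/T, where k is the number of estimated parameters. The sums of squared residuals below are made-up numbers chosen for illustration, not the values in table 2.1; T = 318 is the number of months from July 1983 to December 2009.

```python
import math

def aic(ssr, T, k):
    """Akaike criterion in 'penalty times MSE' form: exp(2k/T) * SSR/T."""
    return math.exp(2.0 * k / T) * ssr / T

def sic(ssr, T, k):
    """Schwarz criterion in 'penalty times MSE' form: T^(k/T) * SSR/T."""
    return T ** (k / T) * ssr / T

T = 318  # monthly observations, July 1983 to December 2009

# Hypothetical (k, SSR) pairs for the three trend models; k counts the
# intercept plus the polynomial terms. These SSR values are assumptions.
candidates = {
    "linear": (2, 4.10e8),
    "quadratic": (3, 3.20e8),
    "cubic": (4, 3.05e8),
}

scores = {
    name: (aic(ssr, T, k), sic(ssr, T, k))
    for name, (k, ssr) in candidates.items()
}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_sic = min(scores, key=lambda name: scores[name][1])
```

Because the SIC penalty exceeds the AIC penalty whenever T is at least 8, the SIC tends to choose the more parsimonious model when the two criteria disagree.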
Figure 2.1: Linear time trend model OLS Figure 2.2: Quadratic time trend model OLS
Table 2.1: AIC and SIC of the linear, quadratic, and cubic time trend models
The OLS results for the linear, quadratic, and cubic time trend models are shown in figures 2.1,
2.2, and 2.3 respectively. The AIC and SIC of each model are summarized in table 2.1. The cubic
time trend model has the smallest AIC and SIC among the 3 models, which indicates that it
should be the optimal model. However, inspecting the p-values of the time trend variables in
each model shows that TIME and TIME2 in the cubic model have p-values of 0.5392 and 0.0421
respectively, so their estimates are statistically insignificant at the 1% level. Therefore,
the cubic time trend model is not considered in this case, and the model with the second
smallest AIC and SIC is used instead: the quadratic time trend model, whose coefficients,
including those on TIME and TIME2, are statistically significant at the 1% level.
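$$T_t = \beta_0 + \beta_1 \mathrm{TIME}_t + \beta_2 \mathrm{TIME}_t^2$$
$$y_t = \beta_0 + \beta_1 \mathrm{TIME}_t + \beta_2 \mathrm{TIME}_t^2 + \varepsilon_t$$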
We first check for the presence of seasonality in the data set by regressing Approvals on the
seasonal dummy variables and using the Wald test to test the joint significance of the seasonal
components.
Figure 3.1: Trend and seasonality regression Figure 3.2: Wald test
The Wald test tests the null hypothesis that the coefficients of the seasonal dummies are
simultaneously equal to zero, against the alternative hypothesis that one or more of them are
not equal to zero.
$H_0: \gamma_2 = \gamma_3 = \gamma_4 = \gamma_5 = \dots = \gamma_{12} = 0$
$H_1: \gamma_i \neq 0$ for at least one $i$
The Wald test results are displayed in figure 3.2 along with the regression results displayed in
figure 3.1. The two test statistics both have p-values reported as 0, which indicates that the
seasonal dummies are jointly significant at the 1% level and the null hypothesis is rejected.
Thus, at least one seasonal dummy coefficient differs from zero, and we conclude that there is
evidence that seasonality is present in the building approvals series.
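For linear restrictions in an OLS regression, the two statistics such a joint test usually reports can be sketched from the restricted and unrestricted sums of squared residuals. The snippet below is a minimal illustration under that assumption; the function names, the SSR values, and the parameter count k = 14 (intercept, TIME, TIME2 and 11 dummies) are hypothetical and are not taken from figures 3.1 or 3.2.

```python
def wald_f_statistic(ssr_restricted, ssr_unrestricted, q, T, k):
    """F statistic for q linear restrictions.

    F = ((SSR_r - SSR_u) / q) / (SSR_u / (T - k)), where k is the number
    of parameters in the unrestricted model and T the sample size.
    """
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / (T - k)
    return numerator / denominator

def wald_chi_square(f_stat, q):
    """Chi-square form of the Wald statistic for linear restrictions: q * F."""
    return q * f_stat

# Hypothetical inputs (assumptions for illustration only).
T = 318          # monthly observations
k = 14           # assumed: intercept, TIME, TIME2, 11 seasonal dummies
q = 11           # restrictions: 11 dummy coefficients set to zero
ssr_r = 3.2e8    # trend-only model (restricted)
ssr_u = 2.4e8    # trend plus seasonal dummies (unrestricted)

f_stat = wald_f_statistic(ssr_r, ssr_u, q, T, k)
chi2_stat = wald_chi_square(f_stat, q)
```

A large F (or chi-square) statistic relative to its reference distribution leads to rejection of the null that all tested seasonal coefficients are zero.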
Now, we model seasonality by running a least-squares regression on the model shown below. Note
that the constant term or intercept ($\beta_0$) is removed because in any time period $t$
exactly one of the 12 seasonal dummies equals 1 while all the others equal 0; the seasonal
dummies therefore sum to 1 in every period, which is identical to the intercept regressor, and
each dummy coefficient acts as the intercept for its month. Removing the intercept and including
all 12 seasonal dummies in the regression avoids this perfect collinearity and redundancy. Let
$D_{1t}, D_{2t}, D_{3t}, \dots, D_{12,t}$ be the 12 seasonal dummies; we run OLS on the
following model, which is visually represented in figure 3.3:
$$y_t = S_t + \beta_1 \mathrm{TIME}_t + \beta_2 \mathrm{TIME}_t^2, \qquad S_t = \sum_{i=1}^{12} \gamma_i D_{it}$$
$$y_t = \beta_1 \mathrm{TIME}_t + \beta_2 \mathrm{TIME}_t^2 + \sum_{i=1}^{12} \gamma_i D_{it} + \varepsilon_t$$
where $\beta_1, \beta_2$ represent the coefficients of the time trend and $\gamma_i$ represent the
coefficients of the seasonal dummies for $i = 1, \dots, 12$, from figure 3.4.
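The reason the intercept has to go when all 12 dummies are kept can be seen by building the dummy columns directly. The sketch below uses a hypothetical two-year monthly index that starts in January (the report's series starts in July); `seasonal_dummies` is an illustrative helper, not part of any library.

```python
def seasonal_dummies(months):
    """Build a 12-column dummy matrix from month numbers 1..12.

    Row t has a 1 in column i - 1 when observation t falls in month i,
    and 0 everywhere else.
    """
    return [[1 if m == i else 0 for i in range(1, 13)] for m in months]

# Hypothetical two-year sample: months 1..12 repeated twice.
months = [((t - 1) % 12) + 1 for t in range(1, 25)]
D = seasonal_dummies(months)

# Every row sums to exactly 1, so the 12 dummy columns add up to a column
# of ones. That column is the intercept regressor, which is why keeping
# both the intercept and all 12 dummies gives perfect collinearity.
row_sums = [sum(row) for row in D]
```

Dropping the intercept (as done here) or dropping one dummy are the two standard ways around the problem; they give the same fitted values and differ only in how the coefficients are interpreted.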
Figure 3.6: Seasonality time series plot from Jan 2004 to Dec 2005
4. Cycle: $y_t = T_t + S_t + C_t$
Cycle, $C_t$, is the random fluctuation in $y_t$ that is not contained within trend and
seasonality, also known as the irregular component. It is a stochastic process. Since we have
modelled trend and seasonality in the previous sections, we can obtain the cycle component by
subtracting trend and seasonality from Approvals, that is $C_t = y_t - T_t - S_t$. We apply
the Box-Jenkins methodology to model this component. The first step is to determine whether the
time series is stationary, that is, whether or not it has a unit root, by using the Augmented
Dickey-Fuller (ADF) test. This checks whether a data transformation (log, de-trend, difference)
is needed to achieve stationarity. The ADF or unit-root test is carried out under the null
hypothesis that the time series is integrated of order 1 (a unit root exists), against the
alternative hypothesis that the series is stationary (no unit root).
$H_0: C_t \sim I(1)$, the series has a unit root
$H_1: C_t \sim I(0)$, the series is stationary
Figure 4.1: Augmented Dickey-Fuller Unit Root test
The ADF test statistic, displayed in figure 4.1, equals -3.683714 with a p-value of 0.0248.
Since the ADF statistic is less (more negative) than the critical values at the 5% and 10%
levels, we reject the null hypothesis at the 5% level and conclude that the cycle is stationary
and no data transformation is needed. Note that we included a time trend in the unit-root test
since the plot of $y_t$ in figure 1.2 shows a downward trend from halfway through until the end
of the series; however, if we do not include a time trend in the ADF test, we still reject the
null hypothesis at the 5% and even the 1% level, indicating that the cycle is stationary. Hence
an Autoregressive Moving Average (ARMA) model can be built for the level of $C_t$, the cyclical
component.
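The idea behind the unit-root test can be sketched with the plain Dickey-Fuller regression, which is a simpler relative of the augmented test with trend used in the report: it has no constant, no trend, and no lagged differences. The series below is simulated from a hypothetical stationary AR(1) process, so the numbers have nothing to do with figure 4.1, and `dickey_fuller_t` is an illustrative helper.

```python
import math
import random

def dickey_fuller_t(y):
    """t-ratio on rho in the regression  dy_t = rho * y_{t-1} + e_t.

    This is the simple Dickey-Fuller regression (no constant, no trend,
    no augmentation lags). A strongly negative t-ratio is evidence
    against a unit root.
    """
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    y_lag = y[:-1]
    sxx = sum(v * v for v in y_lag)
    rho = sum(d * v for d, v in zip(dy, y_lag)) / sxx
    resid = [d - rho * v for d, v in zip(dy, y_lag)]
    n = len(dy)
    s2 = sum(e * e for e in resid) / (n - 1)  # residual variance, 1 parameter
    se_rho = math.sqrt(s2 / sxx)
    return rho / se_rho

# Simulated stationary AR(1) series with coefficient 0.2 (an assumption).
random.seed(0)
y = [0.0]
for _ in range(500):
    y.append(0.2 * y[-1] + random.gauss(0.0, 1.0))

t_stat = dickey_fuller_t(y)
```

The resulting t-ratio has to be compared with tabulated Dickey-Fuller critical values rather than the usual Student-t tables, because its distribution under the unit-root null is non-standard.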
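$$y_t = \beta_1 \mathrm{TIME}_t + \beta_2 \mathrm{TIME}_t^2 + \sum_{i=1}^{12} \gamma_i D_{it} + \varepsilon_t, \qquad \Phi(L)\,\varepsilon_t = v_t$$
where $\beta_1, \beta_2$ = coefficients of TIME and TIME2 respectively
$\gamma_i$ = coefficients of seasonal dummies, $i = 1, \dots, 12$
$\Phi(L) = 1 - \phi_1 L - \phi_2 L^2 - \phi_3 L^3$
$\phi_j$ = coefficients of AR lags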
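An AR(3) disturbance implies that each value equals a weighted sum of its three predecessors plus a new shock, so multi-step forecasts of the cyclical part can be produced by iterating that recursion with future shocks set to zero. The sketch below assumes hypothetical AR coefficients and starting values; they are not the estimates reported in the figures.

```python
def ar_forecast(history, phi, steps):
    """Iterate an AR(p) recursion forward with future shocks set to zero.

    history: past values of the series, oldest first (at least p values).
    phi:     AR coefficients [phi_1, ..., phi_p], phi_1 on the latest lag.
    steps:   number of periods to forecast.
    """
    values = list(history)
    p = len(phi)
    forecasts = []
    for _ in range(steps):
        # values[-1] is lag 1, values[-2] is lag 2, and so on.
        nxt = sum(phi[j] * values[-1 - j] for j in range(p))
        values.append(nxt)
        forecasts.append(nxt)
    return forecasts

# Hypothetical AR(3) coefficients and last three cycle values (assumptions).
phi = [0.5, 0.2, 0.1]
history = [100.0, -50.0, 80.0]  # oldest first

cycle_forecast = ar_forecast(history, phi, 24)  # 24 months ahead
```

For a stationary AR process the forecasts decay towards zero as the horizon grows, which is why long-horizon forecasts end up dominated by the trend and seasonal components.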
Before using the model to forecast future approvals, we must complete the last step of the
Box-Jenkins methodology, that is, perform diagnostic checks for model misspecification. In
other words, we have to ensure that the disturbance term (residual series) of our model
resembles a white noise process, which would indicate that the model is adequately specified.
Firstly, we check the full model by running a least-squares regression; then we provide an
in-depth examination of the residual series.
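A residual check of this kind typically rests on the sample autocorrelations, the approximate two-standard-error bands of plus or minus 2 divided by the square root of T, and the Ljung-Box Q-statistic. The following sketch computes all three for a deliberately non-white toy series; the helper names and the data are assumptions for illustration, not the residuals of the fitted model.

```python
import math

def autocorrelations(x, max_lag):
    """Sample autocorrelations rho_1 .. rho_max_lag of the series x."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]

def ljung_box_q(x, max_lag):
    """Ljung-Box statistic Q = T (T + 2) * sum_k rho_k^2 / (T - k)."""
    n = len(x)
    rho = autocorrelations(x, max_lag)
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(rho, start=1))

def bartlett_band(n):
    """Approximate 95% band for white-noise autocorrelations: 2 / sqrt(T)."""
    return 2.0 / math.sqrt(n)

# Toy series that alternates sign every period, so it is far from white noise.
toy = [1.0 if t % 2 == 0 else -1.0 for t in range(100)]

rho = autocorrelations(toy, 2)
q1 = ljung_box_q(toy, 1)
band = bartlett_band(len(toy))
```

Autocorrelations that lie outside the band, or a Q-statistic that is large relative to a chi-square distribution, point to serial correlation left in the residuals.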
To ensure the quality of the forecasts (model performance), we diagnose the stability of our
full forecasting model using recursive parameter estimates, recursive residuals, and the CUSUM
test.
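The logic of recursive residuals and the CUSUM test can be sketched for the simplest possible case, a model with an intercept only (k = 1), which is a strong simplification of the full forecasting model. Each observation is predicted from the mean of the earlier ones, the standardized one-step errors are accumulated, and the running sum is compared with the usual 5% bounds (using the constant 0.948). The function name and both toy series are assumptions for illustration.

```python
import math

def cusum_path(y, a=0.948):
    """CUSUM of recursive residuals for a mean-only model (k = 1).

    Returns a list of (W, bound) pairs for observations 2..T, where W is
    the cumulated standardized recursive residual and bound is the 5%
    significance line  a*sqrt(T-k) + 2*a*(t-k)/sqrt(T-k).
    """
    T = len(y)
    k = 1
    w = []
    running_sum = y[0]
    for t in range(1, T):  # t = number of observations already used
        mean_prev = running_sum / t
        # One-step-ahead error scaled by its forecast variance factor.
        w.append((y[t] - mean_prev) / math.sqrt(1.0 + 1.0 / t))
        running_sum += y[t]
    sigma = math.sqrt(sum(v * v for v in w) / (T - k))
    path = []
    cum = 0.0
    for j, v in enumerate(w, start=1):  # j = t - k
        cum += v / sigma
        bound = a * math.sqrt(T - k) + 2.0 * a * j / math.sqrt(T - k)
        path.append((cum, bound))
    return path

# Toy series with a stable mean, and one with a level shift halfway through.
stable = [1.0 if t % 2 == 0 else -1.0 for t in range(100)]
shifted = stable[:50] + [v + 10.0 for v in stable[50:]]

stable_breaks = any(abs(wt) > b for wt, b in cusum_path(stable))
shifted_breaks = any(abs(wt) > b for wt, b in cusum_path(shifted))
```

A path that stays inside the bounds is consistent with stable parameters, while a crossing, as in the level-shift series, signals structural instability.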
7. Further Issues
The future building approvals forecasted by the model are not perfect because the model itself
contains errors. Firstly, Gaussian white noise requires the residuals to be normally distributed
with zero mean and finite variance; however, as mentioned in section 5, the Jarque-Bera
statistic is large and normality is therefore rejected. Hence the residuals might not behave
like a Gaussian white noise process and may actually contain both signal and noise, which is
highly likely to affect the forecast results. Further investigation is needed to find the cause.
Furthermore, on closer examination of the correlogram of residuals (figure 5.2), the ACF and
PACF occasionally fall outside the Bartlett bands, most notably at lags 6, 13, 15 and from 21
onwards, and this observation is also reflected in the recursive residuals plot (figure 5.7).
Additionally, the Q-statistic shows a large increase from lag 12 to 13, which also implies that
there may be some serial correlation in the residual series. The original series and the fitted
values are plotted in figure 7.1. We can see a decreasing trend that starts from around 2002,
and this might be due to unexpected economic shocks that cause autocorrelation to occur.
Moreover, the adjusted R-squared (figure 5.1) for our model is 71.3% (the model accounts for
71.3% of the variation), which is not particularly high; hence the model may not be a very good
fit. These shortcomings of the model might affect the accuracy of the forecast results.