

THE UNIVERSITY OF NEW SOUTH WALES

School of Economics

ECON2209, Business Forecasting, 2016 S1


Course Project

Group Cover Sheet


Check-list
1. Choose one member to submit the project: one soft-copy to be submitted online and one
hard-copy to be submitted to your tutor.
2. Class number, group number, all names and student numbers of the group must be filled in
on this Cover Sheet.
3. Each group member must complete the online self and peer assessment. Marks on
teamwork will be zero if this assessment is not completed by the deadline. Details about the
teamwork assessment will be announced shortly (see Teamwork Assessment section at the
end of this document).

Class: TUT-W10A Group: 7

Full Name Student No.


1. Fong To (person for submission) z3336387
2. An Wang z3487833

3. Benson Hung Yao Wong z3376302

4. Stefan Wong z5062987


Executive Summary
The goal of this report is to 1) analyse the monthly series on building approvals in New South Wales to reveal the main features of the historical building approvals, and 2) use the historical data to forecast and reveal the main features of future building approvals for the period from January 2010 to December 2011. The data set provided covers the period from July 1983 to December 2009 and records the number of building approvals in New South Wales per month, which we denote y_t. The classical (additive) decomposition
$$y_t = T_t + S_t + C_t,$$
which splits the observable time series y_t into trend T_t, seasonality S_t, and cycle C_t, is used to model the main features of the historical building approvals (the provided data set) as well as the future approvals (the forecast).

The report starts by providing an analysis of the historical building approvals data set, which shows that the historical data are volatile (fluctuating between roughly 1000 and 6000) with a prominent falling trend beginning in 2002; the summary statistics show that building approvals average 3592 per month. The following sections discuss each component of the classical decomposition, beginning with trend. A polynomial trend of order 2 is found to be optimal using the AIC and SIC. Seasonality is addressed in the next section and its existence is confirmed by a Wald test. Twelve seasonal dummies are introduced because the data are monthly. They show that building approvals generally increase from the beginning of the year and decline towards the end of the year. The final component, the cycle, is discussed next, and the Box-Jenkins methodology is used for model selection. An autoregressive process with 3 lags is found to have the minimum AIC and SIC and is therefore selected for the cyclical component.

The next section details a set of diagnostic checks with an in-depth examination of the residuals, followed by quality checks of the model through various tests, which confirm the stability and correctness of the model. A 24-month-ahead forecast for the period from January 2010 to December 2011 is provided and indicates that future building approvals follow a slow, minor decreasing trend with strong seasonality. The forecast results will serve as a guideline for RealAnswers to improve their investment decisions. The shortcomings of the results are discussed at the end of the report.

The full model that is used for forecasting is shown as follows:

$$y_t = \beta_1 TIME_t + \beta_2 TIME_t^2 + \sum_{i=1}^{12} \gamma_i D_{i,t} + C_t, \qquad \Phi(L) C_t = \varepsilon_t$$

where $\beta_1, \beta_2$ = coefficients of TIME and TIME2 respectively,
$\gamma_i$ = coefficients of the seasonal dummies, $i = 1, \ldots, 12$,
$\Phi(L) = 1 - \phi_1 L - \phi_2 L^2 - \phi_3 L^3$,
$\phi_1, \phi_2, \phi_3$ = coefficients of the AR lags.
Main Report
1. Historical Data Analysis
Firstly, we will begin with summary statistics of the historical building approvals to show some general features of the time series.

Figure 1.1: Histogram of Approvals Figure 1.2: Time series plot of Approvals

Figure 1.1 shows that there is an average of 3592 building approvals per month during the historical period from July 1983 to December 2009. The maximum occurred in August 1994 at 6363 approvals, while the minimum occurred in January 2009 at 1183 approvals. The Jarque-Bera test has a p-value of 0.365, which is insignificant at the 10% level, thus we do not reject the null hypothesis of normality and conclude that the data are consistent with a normal distribution. Furthermore, figure 1.2 shows that the historical building approvals have a falling trend that becomes visible near the end of the plot, and that the series has been volatile over time. Therefore, the historical data reveal a general decline in the number of building approvals with considerable random fluctuations.

2. Trend: y_t = T_t + S_t + C_t
Trend, T_t, is the slow, long-run evolution of the building approvals time series, y_t. Generally, the number of building approvals can fluctuate substantially due to changes in the state of the economy (such as interest rates and the availability of mortgage funds) that affect the demand for approvals, slowly evolving preferences and institutional regulations, and the cyclical nature of the industry. In order to select the forecasting model that best fits the data, we first have to find the trend model that minimizes the sum of squared residuals.

Assuming the trend is deterministic, we can estimate it using least-squares regression. In order to find the trend model that best fits the data, we first construct a time trend variable called TIME, followed by TIME2 (squared) and TIME3 (cubed). Since the data fluctuate over time, as explained in the previous section, a polynomial trend is more suitable than a logarithmic or exponential trend. We use a low-order polynomial trend in order to maintain smoothness; therefore, we estimate time trends up to order 3 (linear, quadratic, cubic) using OLS to find the trend model that best fits the data.

In order to find the optimal regression model, we need to find the model that minimizes the sum of squared residuals, which is equivalent to selecting the model that minimizes the mean squared error (MSE). By using the Akaike Information Criterion (AIC) and the Schwarz Information Criterion (SIC), which take the form of a penalty factor times the MSE (in-sample residual variance), we can find the optimal model by selecting the polynomial order with the minimum AIC and SIC.
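A brief EViews sketch of this step, based on the commands in the Appendix (the AIC and SIC values reported in table 2.1 are read from each estimation output, or stored via @aic and @schwarz):

' generate the polynomial time-trend variables
genr time=@trend(1983:07)
genr time2=time*time
genr time3=time*time2
' estimate the three candidate trend models by OLS
ls y c time              ' linear trend
ls y c time time2        ' quadratic trend
ls y c time time2 time3  ' cubic trend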

Figure 2.1: Linear time trend model OLS Figure 2.2: Quadratic time trend model OLS

                              AIC         SIC
Linear time trend model       16.30649    16.33015
Quadratic time trend model    15.90777    15.94326
Cubic time trend model        15.85806    15.90538

Table 2.1: Trend models AIC/SIC

Figure 2.3: Cubic time trend model OLS

The OLS results for the linear, quadratic, and cubic time trend models are shown in figures 2.1, 2.2, and 2.3 respectively. The AIC and SIC of each model are summarized in table 2.1. The cubic time trend model has the smallest AIC and SIC among the three models, which suggests it should be the optimal model. However, inspecting the p-values of the time trend variables in each model, the variables TIME and TIME2 in the cubic model have p-values of 0.5392 and 0.0421 respectively, indicating that these estimates are statistically insignificant at the 1% level. Therefore, the cubic time trend model is not considered further, and the model with the second smallest AIC and SIC is used instead: the quadratic time trend model, whose coefficients on TIME and TIME2 are statistically significant at the 1% level.

A polynomial time trend model of order 2 is shown visually in figure 2.4:

$$T_t = \beta_0 + \beta_1 TIME_t + \beta_2 TIME_t^2$$
$$y_t = \beta_0 + \beta_1 TIME_t + \beta_2 TIME_t^2 + \varepsilon_t$$

where $\beta_0, \beta_1, \beta_2$ represent the coefficients of the time trend in figure 2.2.
Figure 2.4: Time series plot of quadratic time trend and residual
3. Seasonality: y_t = T_t + S_t + C_t
Seasonality, S_t, is the repetitive variation of y_t over a fixed number of periods. The building approvals time series may exhibit seasonal patterns. Possible causes of seasonal effects in building approvals include natural factors (e.g. weather, such as cold and storms, that may cause a decline in new housing construction) and calendar events (fixed holidays such as Christmas). The seasonal patterns for building approvals generally repeat on a yearly basis and, given monthly data on building approvals, we use 12 monthly seasonal dummy variables to account for the seasonality in y_t.

We will first check the presence of seasonality in the data set by regressing Approvals on the dummy variables and using the Wald test to test the joint significance of the seasonal components.

Figure 3.1: Trend and seasonality regression Figure 3.2: Wald test
The Wald test works by testing the null hypothesis that the coefficients of the seasonal dummies are simultaneously equal to zero, against the alternative hypothesis that one or more of the seasonal dummy coefficients are not equal to zero:
$$H_0: \gamma_2 = \gamma_3 = \gamma_4 = \gamma_5 = \cdots = \gamma_{12} = 0$$
$$H_1: \text{at least one } \gamma_i \neq 0$$
The Wald test results are displayed in figure 3.2, along with the regression results in figure 3.1. Both test statistics have p-values of (approximately) zero, which indicates that the seasonal dummies are jointly significant at the 1% level, and the null hypothesis is rejected. Thus the seasonal dummy coefficients do differ from zero, and we conclude that there is evidence that seasonality is present in the building approvals series.
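In EViews, this check corresponds to the following commands from the Appendix (eqn2 is simply an equation name):

' regress Approvals on an intercept, 11 seasonal dummies (D1 omitted) and the quadratic trend
equation eqn2.ls y c D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12 time time2
' Wald test of the joint hypothesis that all seasonal dummy coefficients are zero
freeze eqn2.wald c(2)=c(3)=c(4)=c(5)=c(6)=c(7)=c(8)=c(9)=c(10)=c(11)=c(12)=0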

Now, we will model seasonality by running a least-squares regression on our current model, shown below. Note that the constant term or intercept (β_0) is removed because at any time period t one of the 12 seasonal dummies equals 1 while all the others equal 0; the seasonal dummies therefore sum to 1 at every t and each dummy variable acts as an intercept when regressed. By removing the intercept and including all 12 seasonal dummies in the regression, perfect collinearity and redundancy are avoided. Let D_1, D_2, D_3, ..., D_12 be the 12 seasonal dummies; we run OLS on the following model, which is visually represented in figure 3.3:
$$y_t = S_t + \beta_1 TIME_t + \beta_2 TIME_t^2, \qquad S_t = \sum_{i=1}^{12} \gamma_i D_{i,t}$$
$$y_t = \beta_1 TIME_t + \beta_2 TIME_t^2 + \sum_{i=1}^{12} \gamma_i D_{i,t} + \varepsilon_t$$

where $\beta_1, \beta_2$ represent the coefficients of the time trend and $\gamma_i$ represent the coefficients of the seasonal dummies for $i = 1, \ldots, 12$, from figure 3.4.
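A sketch of how the fitted trend and seasonal components plotted in figures 3.3 to 3.7 can be recovered in EViews, following the Appendix (eqn1, gbar, trnd and seas are workspace names used there):

' regression on all 12 seasonal dummies plus the quadratic trend, no intercept
equation eqn1.ls y D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12 time time2
' average seasonal level, fitted values, then trend and seasonal components
scalar gbar=(c(1)+c(2)+c(3)+c(4)+c(5)+c(6)+c(7)+c(8)+c(9)+c(10)+c(11)+c(12))/12
eqn1.makeresids res
genr fitted=y-res
genr trnd=gbar + c(13)*time+c(14)*time2
genr seas=fitted-trnd
plot seas trnd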

Figure 3.3: Trend + Seasonality and residual time series plot


Figure 3.4: Trend and seasonality regression

Figure 3.5: Seasonality time series plot

Figure 3.7: Seasonality and time trend

The coefficients of the seasonal dummies are individually statistically significant at the 1% level, since they have p-values of (approximately) zero. The seasonal patterns for the series are shown in figure 3.5, and in figure 3.7 together with the fitted trend. A closer look at the shape of the seasonality plot is shown in figure 3.6. Generally, the number of building approvals increases at the beginning of a year and declines near the end of the year, which is likely due to calendar events such as Christmas.

Figure 3.6: Seasonality time series plot from Jan 2004 to Dec 2005
4. Cycle: y_t = T_t + S_t + C_t

Cycle, C_t, is the random fluctuation in y_t that is not captured by the trend and seasonal components; it is also known as the irregular component, and it is a stochastic process. Since we have modelled trend and seasonality in the previous sections, we can obtain the cycle component by subtracting trend and seasonality from Approvals, that is, C_t = y_t - T_t - S_t. We will apply the Box-Jenkins methodology to model this component. The first step is to determine whether the time series is stationary, that is, whether or not it has a unit root, using the Augmented Dickey-Fuller (ADF) test. This checks whether a data transformation (log, de-trending, differencing) is needed to achieve stationarity. The ADF (unit-root) test is carried out under the null hypothesis that the time series is integrated of order 1 (a unit root exists), against the alternative hypothesis that the series is stationary (no unit root):
H_0: the series is I(1) (a unit root exists)
H_1: the series is stationary (no unit root)
The ADF test statistic is displayed in figure 4.1; it equals -3.683714 with a p-value of 0.0248. Since the ADF statistic is less than (more negative than) the critical values at the 5% and 10% levels, we reject the null hypothesis at the 5% level and conclude that the cycle is stationary, so no data transformation is needed. Note that we included a time trend in the unit-root test since the plot of y_t in figure 1.2 shows a downward trend from about halfway through to the end of the series; however, if we do not include a time trend in the ADF test, we still reject the null hypothesis at the 5% and even the 1% level, again indicating that the cycle is stationary. Hence an Autoregressive Moving Average (ARMA) model can be built for the level of the cyclical component, C_t.

Figure 4.1: Augmented Dickey-Fuller unit root test
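The corresponding EViews command from the Appendix, where the unit-root view is applied with the trend option included:

' Augmented Dickey-Fuller unit-root test with a time trend (figure 4.1)
y.uroot(trnd)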

In order to find the optimal ARMA model, we analyse the sample Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) to determine the range of possible values for (p, q), where p is the autoregressive order, AR(p), and q is the moving-average order, MA(q). A correlogram for C_t with 23 lags is shown in figure 4.2. We specify the upper limits for (p, q) by examining the ACF and PACF. The correlogram of the residual series (cyclical component) shows a cut-off at lag 3 in the PACF and exponential decay in the ACF, which indicates that an AR(3) for the cycle component appears adequate (although not perfect, due to notable jumps in the PACF at lags 10 and 13). The upper limits (p, q) = (4, 1) are used in the AIC/SIC comparison to select the optimal model.
Figure 4.2: Correlogram with 23 lags
AIC      AR(0)      AR(1)      AR(2)      AR(3)      AR(4)
MA(0)    15.79061   15.34033   15.23697   15.18393   15.18968
MA(1)    15.53120   15.19939   15.20083   15.18964   15.19617

SIC      AR(0)      AR(1)      AR(2)      AR(3)      AR(4)
MA(0)    15.96097   15.53503   15.44384   15.40296   15.42089
MA(1)    15.72590   15.40626   15.41986   15.42084   15.43954

Table 4.1: AIC/SIC for ARMA models

As mentioned earlier, an AR(3) for the cycle component is adequate according to the correlogram, and this coincides with the AIC/SIC results shown in table 4.1, where ARMA(3,0) has the lowest AIC and SIC. Hence, AR(3) will be used for the cyclical component, which can also be implemented by including the first 3 lags of the dependent variable, Approvals, in the regression.
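The AIC/SIC grid in table 4.1 was built by re-estimating the trend-plus-seasonal regression with successive AR and MA terms and storing @aic and @schwarz after each fit, as in the Appendix; for example, the ARMA(3,0) cell is filled as follows:

matrix(3,5) aic     ' storage for AIC values
matrix(3,5) sic     ' storage for SIC values
' trend + seasonal dummies with AR(1)-AR(3) terms: the ARMA(3,0) specification
ls y time time2 D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12 ar(1) ar(2) ar(3)
aic(1,4)=@aic       ' AIC for the MA(0)/AR(3) cell
sic(1,4)=@schwarz   ' SIC for the MA(0)/AR(3) cell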

The ARMA(3,0) model is shown as follows:

$$\Phi(L) C_t = \Theta(L) \varepsilon_t$$
$$\Phi(L) = 1 - \phi_1 L - \phi_2 L^2 - \phi_3 L^3, \qquad \Theta(L) = 1$$
5. Full Model: y_t = T_t + S_t + C_t
By assembling the components of the classical (additive) decomposition, that is, trend, seasonality, and cycle, we obtain the following full model:

$$y_t = \beta_1 TIME_t + \beta_2 TIME_t^2 + \sum_{i=1}^{12} \gamma_i D_{i,t} + C_t, \qquad \Phi(L) C_t = \varepsilon_t$$

where $\beta_1, \beta_2$ = coefficients of TIME and TIME2 respectively,
$\gamma_i$ = coefficients of the seasonal dummies, $i = 1, \ldots, 12$,
$\Phi(L) = 1 - \phi_1 L - \phi_2 L^2 - \phi_3 L^3$,
$\phi_1, \phi_2, \phi_3$ = coefficients of the AR lags.
Before using the model to forecast future approvals, we must complete the last step of the Box-Jenkins methodology, that is, perform diagnostic checks for model misspecification. In other words, we have to ensure that the disturbance term (residual series) of our model resembles a white noise process, which implies the model is correctly specified. Firstly, we check the full model by running a least-squares regression; then we provide an in-depth examination of the residual series.
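A sketch of the full-model estimation and residual diagnostics in EViews, following the Appendix commands (eqn3 and r3 are workspace names):

' full model: quadratic trend + 12 seasonal dummies + AR(3) errors
equation eqn3.ls y time time2 D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12 ar(1) ar(2) ar(3)
' residual diagnostics: correlogram with 23 lags, time series plot, histogram
eqn3.makeresids r3
r3.correl(23)
plot r3
hist r3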

Figure 5.1: Full model regression Figure 5.2: Correlogram of residuals

The regression results displayed in figure 5.1 show that the coefficients of the variables are all statistically significant at the 1% level. Furthermore, the Durbin-Watson statistic equals 1.98, which is very close to 2, implying that there is no evidence of autocorrelation in the residuals. The residuals plot (figure 5.3) shows that the series appears stationary in the sense that it fluctuates around a constant mean (which is close to zero). Examining the correlogram of the residuals in figure 5.2, the series appears to be white noise (WN), evidenced by the fact that the ACF and PACF are mostly inside the Bartlett (95% confidence) bands (not significantly different from zero). The small Q-statistics lead to the same conclusion. On the other hand, taking into consideration the histogram of the residuals (figure 5.5), the skewness equals 0.627 (not 0), the kurtosis equals 5.307 (not 3), and the Jarque-Bera statistic equals 90.511 (not less than 5.99), so normality should be rejected. Nevertheless, the histogram suggests the residuals are approximately normally distributed apart from the upper tail; even so, we may not be able to assume normality in this case. The Breusch-Pagan test is used to test the null hypothesis of constant variance against the alternative hypothesis of non-constant variance, and from the results in figure 5.4 we do not reject the null hypothesis at the 1% level. Thus the error terms are homoscedastic, and we conclude that there is evidence that the variance is constant. The residuals have a constant mean close to zero, are not autocorrelated, and have finite variance, which suggests that the regression assumption that the residuals are iid (independently and identically distributed) holds here, i.e. they behave like a WN process. This means that the model should have accounted for all of the signal and the residuals consist purely of noise, so the model should be a good fit.

Figure 5.3: Residuals plot
Figure 5.4: Breusch-Pagan test
Figure 5.5: Residuals histogram

In order to ensure the quality of forecasts (model performance), we will diagnose the stability
of our forecasting (full) model by using recursive parameter estimates; recursive residuals; and
the CUSUM test.
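These checks correspond to the recursive least-squares views used in the Appendix, where eqn4 re-estimates the full model with the first three lags of y standing in for the AR(3) terms:

' full model with the first 3 lags of y in place of the AR(3) error terms
equation eqn4.ls y time time2 D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12 y(-1 to -3)
freeze eqn4.rls(c) c(1) c(2) c(3) c(4) c(15) c(16) c(17)  ' recursive coefficients (figure 5.6)
freeze eqn4.rls(r)   ' recursive residuals (figure 5.7)
freeze eqn4.rls(q)   ' CUSUM test (figure 5.8)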

Figure 5.6: Recursive coefficients/parameters


By examining the recursive parameter estimates, we can monitor how the estimates change as the sample size grows. Figure 5.6 shows the recursive coefficient estimates of the trend variables TIME and TIME2, the first seasonal dummy D1, and the cycle components AR(1)-AR(3). Since the cycle and trend components are the most likely to change, they are examined here instead of the seasonal dummies. The parameter estimates remain inside the ±2 standard error bands, hence the quality of the forecasts is maintained, as no dramatic change in the parameters is detected. Furthermore, the recursive residuals shown in figure 5.7 mostly remain within the ±2 standard error bands, hence the model is stable (correctly specified). The stability of the model is further confirmed by the results of the CUSUM (cumulative sum of standardized recursive residuals) test given in figure 5.8. The CUSUM falls inside the 95% confidence bands, therefore the hypothesis of correct model specification and parameter stability holds. Hence, we can confirm that the final model is stable and that there is no significant change in the patterns that would affect the final forecasts.

Figure 5.7: Recursive residuals Figure 5.8: CUSUM test

6. Forecast of Building Approvals with the Full Model

Having analysed the data and constructed the full econometric model for the historical building approvals, future building approvals can be forecast provided the main historical features remain unchanged. We forecast building approvals for the period from January 2010 to December 2011 based on the historical building approvals data set (the information set for the period from July 1983 to December 2009). Note that the point forecast $\hat{y}_{t+h|t} = \hat{T}_{t+h|t} + \hat{S}_{t+h|t} + \hat{C}_{t+h|t}$ is used.

A 24-month-ahead forecast is shown in figure 6.1 (the shaded area). The forecast shows that the number of building approvals for 2010 and 2011 fluctuates between roughly 1500 and 2700, with a slow, minor decreasing trend. The forecast data show the presence of seasonality: the number of building approvals increases at the beginning of the year, reaches a peak in May, and falls sharply from November to December. The downward trend in approvals over the forecast period might be due to the global financial crisis of 2007-2008: the building industry was still in a crisis-induced slump, and the first home buyers bonus had yet to take effect. The out-of-sample forecasts seem to perform well, as the forecast values fall within the 95% confidence bands.

Figure 6.1: Approvals: history and 24-month-ahead forecast
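The forecast and 95% interval in figure 6.1 can be generated along the lines of the Appendix commands, where eq1 is the preferred full model and yhat and se hold the point forecast and its standard error:

smpl 2009:10 2011:12
eq1.forecast yhat se         ' forecast and forecast standard errors
genr yf=yhat
genr yf_up = yf + 1.96*se    ' upper 95% forecast band
genr yf_lo = yf - 1.96*se    ' lower 95% forecast band
group yfig y yf yf_up yf_lo  ' actual series, forecast and bands for plotting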

7. Further Issues
The future building approvals forecast by the model are not perfect, because the model contains errors. Firstly, the white noise process is assumed to have normally distributed residuals with zero mean and finite variance; however, as mentioned in section 5, the Jarque-Bera statistic is large and normality is rejected. Hence the residuals might not behave like a white noise process and may actually contain both signal and noise, which is likely to affect the forecast results. Further investigation is needed to identify the cause. Furthermore, on closer examination of the correlogram of the residuals (figure 5.2), the ACF and PACF occasionally fall outside the Bartlett bands, most notably at lags 6, 13, 15 and from 21 onwards, and this is also reflected in the recursive residuals plot (figure 5.7). Additionally, the Q-statistic shows a large increase from lag 12 to 13, which also implies that there may be some serial correlation in the residual series. The original series and the fitted values are plotted in figure 7.1. We can see a decreasing trend that starts from around 2002, which might be due to unexpected economic shocks that cause autocorrelation. Moreover, the adjusted R-squared (figure 5.1) for our model is 71.3% (the model accounts for 71.3% of the variation), which is not particularly high, hence the model may not be a very good fit. These shortcomings of the model might affect the accuracy of the forecast results.

Nevertheless, the diagnostic checks show that the model is quite stable, hence the performance of the model should be satisfactory. The 24-month-ahead forecast generated by the model should be reasonably reliable for predicting future building approvals. Hence, the forecast results can serve as a guideline for RealAnswers to improve their decision making for future investment in the New South Wales real estate market.

Figure 7.1: Fitted and actual time series, residuals plot
Appendix
EViews Commands used in the analysis:
Specify a longer time framework
wfcreate(wf=whatever) m 1983:07 2016:12

Generate time-trend and monthly dummies
genr time=@trend(1983:07)
genr time2=time*time
genr time3=time*time2
genr D1=@seas(1)
genr D2=@seas(2)
genr D3=@seas(3)
genr D4=@seas(4)
genr D5=@seas(5)
genr D6=@seas(6)
genr D7=@seas(7)
genr D8=@seas(8)
genr D9=@seas(9)
genr D10=@seas(10)
genr D11=@seas(11)
genr D12=@seas(12)

Read data from .xls file
read buildingApprovals2009Dec.xls approvals

Sample period for modelling
smpl 1983:07 2009:12
hist approvals
plot approvals
genr y=approvals

Trend
ls y c time
ls y c time time2
ls y c time time2 time3

Seasonality
equation eqn1.ls y D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12 time time2
scalar gbar=(c(1)+c(2)+c(3)+c(4)+c(5)+c(6)+c(7)+c(8)+c(9)+c(10)+c(11)+c(12))/12
eqn1.makeresids res
genr fitted=y-res
genr trnd=gbar + c(13)*time+c(14)*time2
genr seas=fitted-trnd
plot seas
plot seas trnd
smpl 2004:1 2005:12
plot seas
smpl 1983:07 2009:12
ls y time time2 D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12

Test for presence of seasonality (Wald test)
equation eqn2.ls y c D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12 time time2
freeze eqn2.wald c(2)=c(3)=c(4)=c(5)=c(6)=c(7)=c(8)=c(9)=c(10)=c(11)=c(12)=0

Unit root test (ADF test)
y.uroot(trnd)

Generate correlogram with 23 lags
ls y time time2 D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12
genr x=res
x.correl(23)

Record AIC/SIC for ARMA(0,0) to ARMA(4,2)
smpl 1984:7 2009:12
matrix(3,5) aic
matrix(3,5) sic
ARMA(0,0) to ARMA(0,4)
ls y time time2 D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12
aic(1,1)=@aic
sic(1,1)=@schwarz
ls y time time2 D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12 ar(1)
aic(1,2)=@aic
sic(1,2)=@schwarz
ls y time time2 D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12 ar(1) ar(2)
aic(1,3)=@aic
sic(1,3)=@schwarz
ls y time time2 D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12 ar(1) ar(2) ar(3)
aic(1,4)=@aic
sic(1,4)=@schwarz
ls y time time2 D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12 ar(1) ar(2) ar(3) ar(4)
aic(1,5)=@aic
sic(1,5)=@schwarz
ARMA(1,0) to ARMA(1,4)
ls y time time2 D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12 ma(1)
aic(2,1)=@aic
sic(2,1)=@schwarz
ls y time time2 D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12 ar(1) ma(1)
aic(2,2)=@aic
sic(2,2)=@schwarz
ls y time time2 D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12 ar(1) ar(2) ma(1)
aic(2,3)=@aic
sic(2,3)=@schwarz
ls y time time2 D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12 ar(1) ar(2) ar(3) ma(1)
aic(2,4)=@aic
sic(2,4)=@schwarz
ls y time time2 D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12 ar(1) ar(2) ar(3) ar(4) ma(1)
aic(2,5)=@aic
sic(2,5)=@schwarz

Diagnostic checking
smpl 1983:07 2009:12
equation eqn3.ls y time time2 D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12 ar(1) ar(2) ar(3)
eqn3.makeresids r3
r3.correl(23)
plot r3
hist r3

Estimate the model: trend + season + AR(3) using first 3 lags of y
equation eqn4.ls y time time2 D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12 y(-1 to -3)

Recursive coefficients
freeze eqn4.rls(c) c(1) c(2) c(3) c(4) c(15) c(16) c(17)

Recursive residuals
freeze eqn4.rls(r)

Cumulative recursive residuals
freeze eqn4.rls(q)

Estimate preferred model
smpl 1983:7 2009:12
equation eq1.ls y time time2 D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12 ar(1) ar(2) ar(3)
freeze(figure2) eq1.resid(g)

Forecast with eq1
smpl 2009:10 2011:12
eq1.forecast yhat se
genr yf=yhat
genr yf_up = yf + 1.96*se
genr yf_lo = yf - 1.96*se
group yfig y yf yf_up yf_lo
smpl 2004:10 2011:12
freeze(Figure3) yfig.line
Figure3.draw(shade,bottom) 2010:01 2011:12
show Figure3
