
An Attempt to Develop a Model for Future Price Movements

Introduction
Gold, throughout history, has been known to be precious not only economically but also culturally. Mankind's obsession with Gold has been known since 3600 BC with the first recorded act of Gold ... lead to this price rise: A booming world population (that follows similar cultural beliefs ...

Data Source: World Gold Council website (http://www.gold.org/investment/statistics/gold_price_chart/). The data is available in XLS format and can also be viewed in an interactive format.

Note: We have taken the prices of gold per ten grams and considered the prices from 2/3/2009 to 9/3/2012.

The first step is to observe the raw data to see whether there is any trend and to check visually whether the data is stationary.

Time series plot of the raw data:

On observation one can say that there is a visible pattern in the raw data. It also indicates an approximately linear trend, from which one can naively conclude that the trend can be modeled using a linear equation. Hence, just by visualizing the data, one would expect that the raw data has to be differenced once. We will now look at the acf and pacf of the raw data to see if there is any hidden non-stationary component, to reaffirm this belief.

The plot of acf versus lag indicates that the acf decays very slowly with increasing lags, suggesting the presence of a non-stationary component and a unit root in the lag polynomial.

We now carry out the augmented Dickey-Fuller (ADF) test and the Phillips-Perron (PP) test to reaffirm our belief that there could be a unit root in the time series polynomial. The presence of a unit root can be interpreted as non-stationarity of the given time series. The Phillips-Perron test makes a non-parametric correction to the t-test statistic. The test is robust with respect to unspecified autocorrelation and heteroscedasticity in the disturbance process of the test equation.

ADF Test for the original raw data:
Null Hypothesis: Presence of unit root in the lag polynomial
Alternative Hypothesis: Unit root is not present in the lag polynomial
Dickey-Fuller = -2.6965, Lag order = 9, p-value = 0.2835
Since the p-value is sufficiently high we cannot reject the null hypothesis that there is a unit root in the lag polynomial.

PP Test for the original raw data:
Null Hypothesis: Presence of unit root in the AR polynomial of an ARMA process
Alternative Hypothesis: Unit root is not present in the lag polynomial
Dickey-Fuller Z(alpha) = -12.2383, Truncation lag parameter = 6, p-value = 0.427
Since the p-value is sufficiently high we cannot reject the null hypothesis that there is a unit root in the lag polynomial.
After seeing the ACF graph and the p-values, we can now conclude that the time series data is non-stationary. The next step is to difference the series. We will then plot the differenced series and acf and pacf plots for the same. Subsequently, ADF test and PP test will be used to check the presence of unit root.
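The differencing step and the acf inspection described above can be sketched in a few lines. The report's own output appears to come from R; the Python below is only an illustrative sketch run on a synthetic random walk with drift (an assumption, not the gold data), and the helper names `difference` and `sample_acf` are hypothetical names introduced here.

```python
import math
import random


def difference(series, lag=1):
    """Return the lag-differenced series x[t] - x[t - lag]."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def sample_acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag (divisor n at every lag)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(max_lag + 1)]


# Synthetic stand-in for a trending price series (assumption: NOT the gold data).
random.seed(0)
walk = [0.0]
for _ in range(499):
    walk.append(walk[-1] + 0.5 + random.gauss(0.0, 1.0))

raw_acf = sample_acf(walk, 10)
diff_acf = sample_acf(difference(walk), 10)
band = 1.96 / math.sqrt(len(walk) - 1)  # approximate 95% band for white noise

print("raw  acf, lags 1-3:", [round(r, 3) for r in raw_acf[1:4]])
print("diff acf, lags 1-3:", [round(r, 3) for r in diff_acf[1:4]])
print("95% band: +/-", round(band, 3))
```

On a series like this the raw acf stays close to one across the first lags, while the acf of the differenced series drops to small values, which is the visual pattern the text relies on when deciding to difference once.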

Time series plot of the data differenced once:

On observation of the time series plot for the data differenced once, we find that the trend component has been removed completely. At some time periods, however, there are spikes in the series, indicating volatility in the market; mathematically, one can say that the variance of the data is quite high. Now we look at the acf and pacf plots of the differenced data to reaffirm our belief that the trend has been accounted for completely.

The acf plot of the data differenced once suggests clearly that there are no values which are very negative at any lags and that the trend has been completely accounted for in this process. If the autocorrelations at a high number of lags were large and positive, a higher order of differencing would probably be required, which is not seen in our case.

The pacf plot of the same suggests, approximately, the number of AR terms in the model. If the pacf of the differenced series displays a sharp cutoff, then one can naively say that AR terms will have to be added to the model. The lag at which the pacf cuts off is the indicated number of AR terms, which from the plot would be somewhere around 4.

Next, we look at the two tests, ADF and PP, to reaffirm our belief that a first order integrated process would be a suitable one for the gold prices.

ADF Test for data differenced once:
Null Hypothesis: Presence of unit root in the lag polynomial
Alternative Hypothesis: Unit root is not present in the lag polynomial
Dickey-Fuller = -7.7409, Lag order = 9, p-value = 0.01

Note: The p-value is actually smaller than the indicated p-value above.

PP Test for data differenced once:
Null Hypothesis: Presence of unit root in the AR polynomial of an ARMA process
Alternative Hypothesis: Unit root is not present in the lag polynomial
Dickey-Fuller Z(alpha) = -778.0359, Truncation lag parameter = 6, p-value = 0.01
Note: The p-value is actually smaller than the indicated p-value above.

In both the tests, we find that since the p-value is very small, there is strong evidence to reject the null hypothesis, and hence we can conclude that a unit root is not present in the lag polynomial and that the integrated process of order one is apt for the gold prices data. Although the tests show a low p-value, the plots point out some points that exceed the 95% confidence levels. Hence, we will difference the data again and check the acf plot.

We see that the ACF at lag 2 is highly negative, from which we conclude that the data needs to be differenced only once, as further differencing would actually reduce the acf. Autocorrelations that are too small or negative are indicative of over-differencing. Therefore we conclude that a first order integrated process is suitable for gold prices.

Model Construction:

We now embark upon developing a suitable model with order of differencing one using the Akaike information criterion (AIC), which takes into account the log likelihood value and penalizes for a higher number of parameters in the model. AIC is a measure of relative goodness of fit for a statistical model. We select the model of that order which minimizes the AIC value. After several iterations with various values of p and q (orders of the AR and MA components), we find that an ARIMA(6, 1, 5) model might be appropriate based on the least AIC value.

arima(x = gold, order = c(6, 1, 5))
Coefficients:
         ar1      ar2     ar3     ar4      ar5      ar6      ma1     ma2
      1.0222  -0.1590  0.0661  0.7152  -0.5898  -0.1414  -1.1094  0.2307
s.e.  0.1034   0.1931  0.1971  0.1793   0.0978   0.0379   0.0996  0.1893
          ma3      ma4     ma5
      -0.1314  -0.6978  0.7993
s.e.   0.1956   0.1802  0.0939
sigma^2 estimated as 54959: log likelihood = -5425.99, aic = 10875.97
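The AIC figure printed with the fit can be reproduced by hand from the log likelihood. The sketch below is in Python rather than the R used for the output, and it rests on one assumption: that the parameter count k includes every freely estimated coefficient plus the innovation variance sigma^2. Under that assumption the arithmetic matches the printed value to rounding, and the same holds for the reduced fit reported further on in the text.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 * logL + 2 * k (smaller is better)."""
    return -2.0 * log_likelihood + 2.0 * n_params


# (log likelihood, k) pairs taken from the fits printed in the text.
# Assumption: k = number of free coefficients + 1 for sigma^2.
candidates = {
    "ARIMA(6,1,5), all 11 coefficients": (-5425.99, 11 + 1),
    "ARIMA(6,1,5), ar2 and ma3 fixed at 0": (-5426.51, 9 + 1),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)

for name, value in scores.items():
    print(f"{name}: AIC = {value:.2f}")
print("lowest AIC:", best)
```

The small drop in log likelihood from removing two terms is outweighed by the smaller penalty, which is why the reduced fit ends up with the lower AIC.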

The next step is to check the significance of the calculated parameters of the ARIMA model. In the above model, we find that the coefficients ar2, ar3, ma2 and ma3 are not significant.

Note: The significance of each parameter is calculated as zobs = parameter value / S.E., where S.E. is the standard error, and the significance level is 0.01. From the standard normal tables, any zobs smaller than 2.33 in absolute value is treated as insignificant.

Now, we embark upon obtaining a model whose parameters are all significant by dropping the insignificant terms from the model. The final significant parameters are given as follows:

Coefficients:
         ar1  ar2      ar3     ar4      ar5      ar6      ma1     ma2  ma3      ma4     ma5
      0.9410    0  -0.0813  0.8390  -0.6414  -0.1475  -1.0372  0.0896    0  -0.8177  0.8600
s.e.  0.0291    0   0.0281  0.0515   0.0681   0.0376   0.0352  0.0310    0   0.0492  0.0431
sigma^2 estimated as 55032: log likelihood = -5426.51, aic = 10873.01
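The significance screen described above (zobs = coefficient / S.E. compared with 2.33) is simple enough to check directly. The following Python sketch uses the coefficients and standard errors printed in the text; the function name `insignificant_terms` is a hypothetical helper introduced for illustration, and Python is used here even though the original output comes from R.

```python
Z_CRIT = 2.33  # cutoff quoted in the text for the 0.01 level


def insignificant_terms(coefs, std_errs, z_crit=Z_CRIT):
    """Return the names of terms whose |coefficient / s.e.| is below z_crit."""
    return [name for name, value in coefs.items()
            if abs(value / std_errs[name]) < z_crit]


# Full ARIMA(6,1,5) fit, as printed in the text.
full_coefs = {"ar1": 1.0222, "ar2": -0.1590, "ar3": 0.0661, "ar4": 0.7152,
              "ar5": -0.5898, "ar6": -0.1414, "ma1": -1.1094, "ma2": 0.2307,
              "ma3": -0.1314, "ma4": -0.6978, "ma5": 0.7993}
full_se = {"ar1": 0.1034, "ar2": 0.1931, "ar3": 0.1971, "ar4": 0.1793,
           "ar5": 0.0978, "ar6": 0.0379, "ma1": 0.0996, "ma2": 0.1893,
           "ma3": 0.1956, "ma4": 0.1802, "ma5": 0.0939}

# Reduced fit (ar2 and ma3 fixed at zero are omitted), as printed in the text.
reduced_coefs = {"ar1": 0.9410, "ar3": -0.0813, "ar4": 0.8390, "ar5": -0.6414,
                 "ar6": -0.1475, "ma1": -1.0372, "ma2": 0.0896,
                 "ma4": -0.8177, "ma5": 0.8600}
reduced_se = {"ar1": 0.0291, "ar3": 0.0281, "ar4": 0.0515, "ar5": 0.0681,
              "ar6": 0.0376, "ma1": 0.0352, "ma2": 0.0310,
              "ma4": 0.0492, "ma5": 0.0431}

print("full fit, insignificant:   ", insignificant_terms(full_coefs, full_se))
print("reduced fit, insignificant:", insignificant_terms(reduced_coefs, reduced_se))
```

Note that 2.33 is the one-sided 1% point of the standard normal; the two-sided 1% point is about 2.576. For these particular estimates the same terms are flagged with either cutoff.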

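Eventually, we have arrived at our model for gold prices data which is as follows:

\[ X_t = 0.9410 X_{t-1} - 0.0813 X_{t-3} + 0.8390 X_{t-4} - 0.6414 X_{t-5} - 0.1475 X_{t-6} + \varepsilon_t - 1.0372 \varepsilon_{t-1} + 0.0896 \varepsilon_{t-2} - 0.8177 \varepsilon_{t-4} + 0.8600 \varepsilon_{t-5} \]

Since the order of differencing is one, X_t here denotes the once-differenced price series, and \varepsilon_t is the innovation term.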

Now that we have zeroed in on the appropriate model, we need to check the characteristics of the residuals left after the modeling exercise. Once we arrive at white noise, our job ends. We start with plotting the residuals and a qqnorm plot to check normality.

Residual plot:

The residual plot indicates that the residuals are not homoscedastic, because of spikes in some time periods. It can be argued that the variance of the process is quite high and that the volatility of the market is high.

Note: When we do a residual analysis, we have to check for two important things: the whiteness test and the independence test.

Whiteness Test: A good model must have the autocorrelation function inside the confidence interval of the corresponding estimates, indicating that the residuals are uncorrelated.

Independence Test: A good model must have residuals uncorrelated with past inputs.

The normal qq plot is not a straight line, indicating that the residuals are not normal.

ACF and PACF plots:

The ACF and PACF for WN(0, \sigma^2) are zero at all non-zero lags. Hence, the residuals, which are estimates of the epsilons (\varepsilon) in the model, must show autocorrelations that die out or are absent at higher lags. Also, for WN it can be shown that the PACF estimate is asymptotically normal, N(0, 1/n). But the plots shown below do not suggest that, which might be because of several reasons, such as: high volatility of the market which is time dependent and which should be modeled separately, the presence of some seasonality component of higher order which wouldn't have been taken care of while differencing, other factors, etc.

Time series diagnosis plots:

We now look at the standardized residual plots and the p-values for Ljung-Box plots for various lags. The standardized residual plot shows spikes in some time periods, which indicates that the residuals violate normality and are also heteroscedastic. The acf of residuals plot indicates that the acfs at higher lags are not significant, which in turn indicates that the residuals are uncorrelated to some extent.
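The portmanteau statistics behind these diagnostics can be written out explicitly. With sample residual autocorrelations r_k, sample size n and maximum lag h, the Box-Pierce statistic is n times the sum of r_k^2, and the Ljung-Box statistic is n(n+2) times the sum of r_k^2 / (n - k); both are referred to a chi-square distribution. The Python sketch below is illustrative only: it uses simulated noise rather than the model's residuals, the function names are hypothetical, and the chi-square tail helper is a closed form that is valid only for an even number of degrees of freedom.

```python
import math
import random


def residual_acf(resid, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of a residual series."""
    n = len(resid)
    mean = sum(resid) / n
    dev = [x - mean for x in resid]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(1, max_lag + 1)]


def box_pierce(resid, max_lag):
    """Box-Pierce statistic: n * sum of squared autocorrelations."""
    n = len(resid)
    return n * sum(r * r for r in residual_acf(resid, max_lag))


def ljung_box(resid, max_lag):
    """Ljung-Box statistic: n (n + 2) * sum of r_k^2 / (n - k)."""
    n = len(resid)
    acf = residual_acf(resid, max_lag)
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(acf, start=1))


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


# Simulated white noise as a stand-in (assumption: NOT the model's residuals).
random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(500)]
bp, lb = box_pierce(noise, 20), ljung_box(noise, 20)
print("Box-Pierce:", round(bp, 3), "p =", round(chi2_sf_even_df(bp, 20), 4))
print("Ljung-Box: ", round(lb, 3), "p =", round(chi2_sf_even_df(lb, 20), 4))

# p-values for statistics of 26.3454 and 26.7919 on 20 degrees of freedom.
print(round(chi2_sf_even_df(26.3454, 20), 4), round(chi2_sf_even_df(26.7919, 20), 4))
```

Because each term is weighted by (n + 2) / (n - k) > 1, the Ljung-Box value is always at least as large as the Box-Pierce value computed on the same series.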

The p-values for the Ljung-Box statistic are high till approximately 10 lags, indicating that there is no serious problem of autocorrelation. The p-values from the Ljung-Box statistic suggest that white noise has been reached. This is further vetted by the Box-Pierce and Box-Ljung tests.

Box-Pierce test:
Null Hypothesis: The residuals are uncorrelated (\rho_1 = \rho_2 = \dots = \rho_n = 0)
Alternative Hypothesis: The residuals are not uncorrelated.
X-squared = 26.3454, df = 20, p-value = 0.1547
Note: These values are for lag = 20
Since the p-value is sufficiently high one cannot reject the null hypothesis.

Box-Ljung test:
Null Hypothesis: The residuals are independently distributed (correlations in the population are 0)
Alternative Hypothesis: The residuals are not independently distributed.
X-squared = 26.7919, df = 20, p-value = 0.1412
Note: These values are for lag = 20
Since the p-value is sufficiently high one cannot reject the null hypothesis.

Hence both the above tests reaffirm our belief that the residuals are WN.

Normality checks for residuals:

We will also run a normality test on the residuals.

> normtest(res1)
  Method                                               P.Value
1 Shapiro-Wilk normality test                     1.382387e-18
2 Anderson-Darling normality test                 8.365966e-23
3 Cramer-von Mises normality test                 6.683295e-10
4 Lilliefors (Kolmogorov-Smirnov) normality test  5.192090e-12
5 Shapiro-Francia normality test                  6.161848e-17

The normality tests show that the residuals do not follow a normal distribution: the p-values are very low, and hence the null hypothesis of normality can be rejected. With the above test results, we conclude that, having reached white-noise residuals, the model building exercise is now complete.

Step 2: Prediction

We now move on to the most important part of this exercise, which is to predict the gold prices for the next 10 periods.

Period  Predicted  SE        Lower Limit  Upper Limit
1       27063.31   234.5899  26594.13     27532.49
2       27087.15   316.2192  26454.71     27719.59
3       27101.28   380.6225  26340.04     27862.53
4       27153.65   426.5728  26300.51     28006.80
5       27116.62   463.5032  26189.62     28043.63
6       27128.82   506.0323  26116.76     28140.89
7       27129.63   545.9036  26037.82     28221.44
8       27164.76   579.5668  26005.62     28323.89
9       27130.08   608.7533  25912.58     28347.59
10      27123.65   642.8273  25837.99     28409.30

Real Value: 27126.69 26387.06 26696.27 26751.65 26834.82
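The tabulated limits can be cross-checked against the predictions and standard errors. The Python sketch below (illustrative, not the code that produced the table) shows that the limits are consistent with Predicted plus or minus 2 x SE, that is, a multiplier of 2 rather than 1.96. It also compares the five reported real values with the limits under a loud assumption: that they correspond to the first five forecast periods, which the flattened table does not confirm.

```python
predicted = [27063.31, 27087.15, 27101.28, 27153.65, 27116.62,
             27128.82, 27129.63, 27164.76, 27130.08, 27123.65]
std_err = [234.5899, 316.2192, 380.6225, 426.5728, 463.5032,
           506.0323, 545.9036, 579.5668, 608.7533, 642.8273]
lower = [26594.13, 26454.71, 26340.04, 26300.51, 26189.62,
         26116.76, 26037.82, 26005.62, 25912.58, 25837.99]
upper = [27532.49, 27719.59, 27862.53, 28006.80, 28043.63,
         28140.89, 28221.44, 28323.89, 28347.59, 28409.30]
real = [27126.69, 26387.06, 26696.27, 26751.65, 26834.82]

MULTIPLIER = 2.0  # the value that reproduces the table's limits

# Largest gap between the recomputed and tabulated limits (rounding only).
max_gap = max(
    max(abs(p - MULTIPLIER * s - lo), abs(p + MULTIPLIER * s - hi))
    for p, s, lo, hi in zip(predicted, std_err, lower, upper)
)
print("largest discrepancy vs. table:", round(max_gap, 4))

# ASSUMPTION: the five real values line up with forecast periods 1 to 5.
inside = [lo <= r <= hi for r, lo, hi in zip(real, lower, upper)]
for period, (r, ok) in enumerate(zip(real, inside), start=1):
    print(f"period {period}: real = {r:.2f}, within limits = {ok}")
```

If the alignment assumption holds, four of the five real values fall inside the limits and the second one falls slightly below the lower limit.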


CONCLUSION

We have predicted the gold prices per ten grams for the next ten periods with confidence levels of 95% and compared them with the real values for the corresponding periods. We observe that the ... gold are bound to increase as interested parties are advancing their purchases to avoid paying the increased duty which will come into effect from April 1, 2012.

3. Lastly, but not the least, it is conceded that the model itself may not be robust enough to handle variations due to external factors (e.g. seasonality or cyclicity).

At the end, we would like to quote George Edward Pelham Box: "All models are wrong, some models are useful."
