
1

Section 1:

The following table pertains to further calculations being performed in Section 1.

#    Direct labor cost (y)    Printing technique (x)
1     0.9     20
2     1.1     30
3     1.4     40
4     1.7     50
5     1.5     70
6     1.0     90
7     1.3     110
8     1.8     120
9     1.7     140
10    2.1     150
11    1.9     170
12    2.0     190
13    2.5     200
14    2.3     220
15    1.4     240
16    2.2     250
17    1.8     280
18    2.1     290
19    1.9     310
20    2.8     320
21    2.4     340
22    2.5     350
23    2.9     370
24    2.6     390
25    3.0     400
26    2.9     420
27    2.4     450
28    3.1     470
29    2.6     490
30    2.9     500

Summary statistics:

Σx = 7470      Σy = 62.7      Σx² = 2498500
Σy² = 142.53   Σxy = 17949    n = 30
x̄ = 249        ȳ = 2.09

S_xy = Σxy − (Σx)(Σy)/n = 17949 − (7470)(62.7)/30 = 2336.7

S_xx = Σx² − (Σx)²/n = 2498500 − (7470)²/30 = 588336.66

S_yy = Σy² − (Σy)²/n = 142.53 − (62.7)²/30 = 11.487
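As a quick arithmetic check, S_xy and S_yy can be recomputed from the summary totals alone; the sketch below is in Python, used here purely for illustration:

```python
# Summary totals from the table above (n = 30 observations)
n = 30
sum_x, sum_y = 7470, 62.7
sum_y2, sum_xy = 142.53, 17949

# Corrected cross-product and sum of squares
S_xy = sum_xy - sum_x * sum_y / n   # 17949 - (7470)(62.7)/30
S_yy = sum_y2 - sum_y ** 2 / n      # 142.53 - (62.7)^2/30

print(round(S_xy, 1), round(S_yy, 3))  # 2336.7 11.487
```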

2
a) [1] 14 82 20 8 69 50 48 36 67 71 51 121 10 22 128 19 107 2 98

[20] 39 100 89 78 62 97 139 72 45 92 30

b)

c) r = S_xy / √(S_xx · S_yy) = 2336.7 / √((588336.66)(11.487)) = 0.898848903 ≈ 0.8988


d) β̂1 = S_xy / S_xx = 2336.7 / 588336.66 = 0.003971706

β̂0 = ȳ − β̂1x̄ = 2.09 − (0.003971706)(249) = 1.101045206

ŷ = β̂0 + β̂1x = 1.101045206 + 0.003971706x

The fitted regression line is ŷ = 1.101045206 + 0.003971706x.
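From S_xy and S_xx, the least-squares estimates in part (d) follow mechanically; a short Python sketch reproduces them (all input values carried over from the document):

```python
# Sums of squares and means from Section 1
S_xy, S_xx = 2336.7, 588336.66
x_bar, y_bar = 249, 2.09

beta1 = S_xy / S_xx            # slope: agrees with 0.003971706 to 9 decimals
beta0 = y_bar - beta1 * x_bar  # intercept: agrees with 1.101045206 to 6 decimals
```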

3
e)

f) SS_Res = SS_T − β̂1·S_xy = S_yy − β̂1·S_xy = 11.487 − (0.003971706)(2336.7) = 2.20631459

MS_Res = SS_Res / (n − 2) = 2.20631459 / 28 = 0.07879695

se(β̂1) = √(MS_Res / S_xx) = √(0.07879695 / 588336.66) = 0.000365967
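The chain SS_Res → MS_Res → se(β̂1) can be verified numerically; a Python sketch using the document's figures:

```python
import math

# Values carried over from Section 1
S_yy, S_xy, S_xx = 11.487, 2336.7, 588336.66
beta1, n = 0.003971706, 30

SS_res = S_yy - beta1 * S_xy          # residual sum of squares
MS_res = SS_res / (n - 2)             # residual mean square
se_beta1 = math.sqrt(MS_res / S_xx)   # standard error of the slope
```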

4
g) R² = SS_R / SS_T = 1 − SS_Res / SS_T = 1 − 2.20631459 / 11.487 = 0.80793

80.79% of the variability in the direct labor cost is explained by the regression on the number of copies produced.

h) 𝐻0 : 𝛽1 = 0

𝐻1 : 𝛽1 ≠ 0

α = 0.05

t0 = β̂1 / se(β̂1) = 0.003971706 / 0.000365967 = 10.8526

Critical value: 𝑡0.025,28 = 2.048

Decision rule: Reject 𝐻0 when |𝑡0 | > 2.048

Decision: Since 𝑡0 > 2.048, we reject 𝐻0 .

Conclusion: At the 5% significance level, there exists a significant linear relationship between the number of copies produced and the associated direct labor cost.
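The t statistic and the decision rule can be checked the same way; a Python sketch (the critical value 2.048 is the tabled t(0.025, 28)):

```python
# Slope estimate and its standard error from Section 1
beta1_hat, se_beta1 = 0.003971706, 0.000365967
t_crit = 2.048  # t(0.025, 28)

t0 = beta1_hat / se_beta1     # about 10.85
reject_H0 = abs(t0) > t_crit  # True: reject H0, the slope is significant
```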

i) ŷ0 = μ̂(y|x0) = 1.101045206 + 0.003971706(250)

= 2.093972

5
90% Prediction Interval:

ŷ0 − t(0.05, 28)·√(MS_Res[1 + 1/n + (x0 − x̄)²/S_xx]) ≤ y0 ≤ ŷ0 + t(0.05, 28)·√(MS_Res[1 + 1/n + (x0 − x̄)²/S_xx])

2.093972 − 1.701·√(0.07879695[1 + 1/30 + (250 − 249)²/588336.66]) ≤ y0 ≤ 2.093972 + 1.701·√(0.07879695[1 + 1/30 + (250 − 249)²/588336.66])

2.093972 − 0.485377448 ≤ y0 ≤ 2.093972 + 0.485377448

1.608594552 ≤ y0 ≤ 2.579349448

We are 90% confident that the direct labor cost for a new run of x0 = 250 copies will fall between 1.6086 and 2.5794.
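The half-width of the 90% prediction interval follows from the formula above; a Python sketch using the document's figures:

```python
import math

# Figures from Section 1 of the document
y_hat0 = 2.093972          # point prediction at x0 = 250
t_crit = 1.701             # t(0.05, 28)
MS_res, n = 0.07879695, 30
x0, x_bar, S_xx = 250, 249, 588336.66

half_width = t_crit * math.sqrt(MS_res * (1 + 1/n + (x0 - x_bar) ** 2 / S_xx))
lower, upper = y_hat0 - half_width, y_hat0 + half_width
```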

j) Simple linear regression is appropriate as the following conditions are satisfied.


 The dependent variable Y has a linear relationship with the independent variable X. To check
this, we can see that the PT/DLC scatterplot is linear and that the residual plot shows a
random pattern. The following shows the scatterplot.

 For each value of X, the probability distribution of Y has the same standard deviation σ.
When this condition is satisfied, the variability of the residuals will be relatively constant
across all values of X, which is easily checked in a residual plot.

6
 For any given value of X,
 The Y values are independent, as indicated by a random pattern on the residual plot.
The following shows a residual plot.

 The Y values are roughly normally distributed. A histogram or a dot plot will show the
shape of the distribution. The figure below shows a histogram.

7
Section 2:

a) Ŷ = -0.5802 + 15.0352X

b)

It shows a strong positive relationship, as the estimated regression line fits the actual data well.

c) Ŷ = -0.5802 + 15.0352X; when the number of copiers serviced is X = 0, the estimated number of minutes
spent on a service call is -0.5802 minutes, which has no practical meaning since time cannot be negative.

d) When X = 5 copiers are serviced, the point estimate of the mean service time is 74.59608 minutes.

e)
Source of variation    Sum of squares    Degrees of freedom    Mean square    F0
Regression             76960             1                     76960          968.66
Residual               3416              43                    79
Total                  80376             44
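As a consistency check on the ANOVA table, the sums of squares and the degrees of freedom must add up; a short Python sketch with the table's values:

```python
ss_regression, df_regression = 76960, 1
ss_residual, df_residual = 3416, 43
ss_total, df_total = 80376, 44

# Sums of squares and degrees of freedom are additive
assert ss_regression + ss_residual == ss_total
assert df_regression + df_residual == df_total

ms_residual = ss_residual / df_residual  # rounds to 79, as in the table
```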

8
f) H0: β1 = 0
H1: β1 ≠ 0
α = 0.10
F0 = 968.7
Decision rule: Reject H0 when F0 exceeds F(0.10; 1, 43) = 2.88. Since F0 is larger than
F(0.10; 1, 43), we reject H0.

g) The total variation in the number of minutes spent on a call is reduced by 8.914 when the
number of copiers serviced is introduced into the analysis. This is a relatively small
reduction. The name of this measure is the residual standard error.

h) 𝑟 = √0.9575 = 0.9785
It shows a strong positive relationship.

i) R² has the more clear-cut interpretation, as it gives the percentage of the variation
in Y that is explained by X.
j)

The residuals show the distances from the observed Y values to the fitted values. There is little systematic
difference between the observed and fitted Y values, since the residual plot is roughly flat and the Y values are large.

9
k)

The normality assumption appears to be tenable.

10
Section 3:
a) Stem and Leaf plots for X1, X2, X3 are as follows:

X1=> The decimal point is 1 digit(s) to the right of the |

2 | 23
2 | 58899999
3 | 012233344
3 | 66678
4 | 0012233344
4 | 557779
5 | 0233
5 | 55

X2=>The decimal point is at the |

40 | 0
42 | 00
44 | 0
46 | 00000
48 | 0000000000
50 | 000000000000
52 | 000000
54 | 0000
56 | 00
58 | 0
60 | 0
62 | 0

X3=> The decimal point is 1 digit(s) to the left of the |

18 | 000000
20 | 00000000
22 | 0000000000000
24 | 00000000000
26 | 0000
28 | 0000

The stem-and-leaf plots for X1, X2 and X3 are produced above. None of the plots reveals any
noteworthy features: no outlying observations, no unusual distributional shapes, etc. Similar
information could also be obtained from histograms.

11
b) Scatterplot matrix

Correlation Matrix
Y X1 X2 X3
Y 1.0000000 -0.7867555 -0.6029417 -0.6445910
X1 -0.7867555 1.0000000 0.5679505 0.5696775
X2 -0.6029417 0.5679505 1.0000000 0.6705287
X3 -0.6445910 0.5696775 0.6705287 1.0000000

The scatter-plot and correlation matrices show that the response variable Y is negatively
correlated with each of the predictor variables X1, X2 and X3. The predictor variables X1, X2
and X3 are also moderately positively correlated with one another, which might introduce
multicollinearity into the problem.

12
c) The fitted regression model for three predictors can be stated by the following equation:

Ŷ = β̂0 + β̂1x1 + β̂2x2 + β̂3x3

Intercept       X1             X2             X3

159.3952778     -1.1197139     -0.4485034     -13.9946776

Hence, the fitted equation is

Ŷ = 159.3953 − 1.1197139x1 − 0.4485034x2 − 13.9946776x3


Interpretation of b2 = -0.4485: for a one-unit increase in severity of illness (X2), patient
satisfaction (Y) decreases on average by 0.4485 units, given that patient age (X1) and anxiety level (X3) are held fixed.
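The meaning of b2 can be illustrated numerically: with the fitted coefficients, raising X2 by one unit while holding X1 and X3 fixed changes Ŷ by exactly b2. A Python sketch (the example inputs are illustrative values, not taken from the data):

```python
# Fitted coefficients from the document
b0, b1, b2, b3 = 159.3952778, -1.1197139, -0.4485034, -13.9946776

def y_hat(x1, x2, x3):
    """Fitted patient-satisfaction value."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x3

# Illustrative inputs (hypothetical, not from the dataset)
x1, x2, x3 = 40, 50, 2.0
change = y_hat(x1, x2 + 1, x3) - y_hat(x1, x2, x3)
# change equals b2 = -0.4485034 (up to floating-point error)
```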

d)
Residuals:

Min 1st Q Median 3rd Q Max

-18.4364 -5.9196 0.2461 8.1356 17.1918

There are no outliers in the chosen dataset.


13
e) Residuals against Y plot:

The graph above is the predicted-versus-residual plot. The residuals against Ŷ are evenly
dispersed across the graph, with the model's predictions on the x-axis and the residuals on the y-axis.

Normal Probability Graph:

The normal probability plot shows that the residuals have longer tails in both directions.

14
f) In this data set the observations were not repeated. So a formal test for lack of fit is not
recommended here.

g)

Consider the multiple linear regression model with three predictors:

Y = β0 + β1x1 + β2x2 + β3x3 + ε

To test whether there is any regression relation, we test the null hypothesis

H0: β1 = β2 = β3 = 0

against the alternative

Ha: at least one βj is not equal to zero.

The test statistic is F0 = (SSR/3) / (SSE/(n − 4)) = 30.05, which follows an F distribution
with 3 and 42 degrees of freedom. At the α = 0.10 level of significance, the tabulated
value is F(1 − α; 3, 42) = 2.219059.

Decision rule: As F0 > F(1 − α; 3, 42), the null hypothesis is rejected at the α = 0.10 level of significance. That
means at least one of the β's is not equal to zero. The p-value of the test is 1.542 × 10⁻¹⁰.

h) R² = SSR/SST = 0.6822

This means that 68.22% of the variability in the response variable, patient satisfaction, is explained by
the fitted model.
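The F statistic in part (g) can also be recovered from R² alone, via F0 = (R²/k) / ((1 − R²)/(n − k − 1)) with k = 3 predictors and n = 46; a quick Python check:

```python
r_squared = 0.6822
k, n = 3, 46  # predictors, observations

f0 = (r_squared / k) / ((1 - r_squared) / (n - k - 1))
# f0 is about 30.05, matching the test statistic in part (g)
```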

15
i)
fit lower upper

69.01029 64.52854 73.49204

In repeated sampling with n = 46, 90 percent of intervals constructed this way will contain the
true mean response. This means the mean satisfaction level among 35-year-old patients with severity index 45 and
anxiety index 2.2 is estimated to be between 64.53 and 73.49 on the satisfaction scale.
j)
fit lower Upper

69.01029 51.50965 86.51092

In repeated sampling with n = 46, 90 percent of prediction intervals constructed this way will
contain the new observation. The satisfaction level for a new 35-year-old patient with severity
index 45 and anxiety index 2.2 is predicted to be between 51.51 and 86.51 on the satisfaction scale.
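As a sanity check, the 90% prediction interval in (j) must be wider than, and contain, the 90% confidence interval in (i), since predicting a single new observation adds the variability of the individual error term; a Python sketch with the two intervals:

```python
fit = 69.01029
ci_lower, ci_upper = 64.52854, 73.49204   # confidence interval for the mean response
pi_lower, pi_upper = 51.50965, 86.51092   # prediction interval for a new patient

# The prediction interval contains the confidence interval
assert pi_lower < ci_lower < fit < ci_upper < pi_upper

ci_width = ci_upper - ci_lower
pi_width = pi_upper - pi_lower
```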

16
Section 4:

Table of Contents
I Introduction ……………………………………………………………………………...18

II Analysis and discussions …………………………………………………………….….

Data preparation …………………………………………………………………….…....19

Model selection …………………………………………………………………………..21

Estimation …....………………………………………………………....………………..24

Diagnostics .……………………………………………………………………………....24

Forecasting ……………………………………………………………………………….25

III Conclusion ……………………………………………………………………………...26

IV References ……………………………………………………………………………....27

17
Introduction
We selected 36 monthly observations, ranging from May 2015 to April 2018, for our chosen
company, FGV Holdings Berhad (FELDA Global Ventures Holdings Berhad). It is a
Malaysia-based global agricultural and agri-commodities company. With operations worldwide,
FGV produces oil palm and rubber plantation products, soybean and canola products,
oleochemicals and sugar products. Its initial public offering in 2012 was the third largest in the world
that year, after Facebook, and the biggest IPO in Asia, at $3.1 billion. Moreover, it is the third largest
palm oil company in the world by planted acreage. In Malaysia alone, it controls over 850,000 ha of
land, including roughly 500,000 ha that it leases and manages for 112,635 FELDA smallholders.

In the ARIMA methodology, the first thing to check is whether the series is stationary. If the
series is not stationary, we use differencing, which allows us to transform a non-stationary
series into a stationary one. First differencing helps eliminate a trend, such as constant growth
in the series. If the series is still growing at an increasing rate, the same procedure can be
applied again to take a second difference.
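The effect of differencing can be sketched with toy data (Python here, purely illustrative): a series with a constant linear trend becomes constant after one difference, and a series growing at an increasing rate becomes constant after two.

```python
def diff(series, d=1):
    """Apply d rounds of first differencing to a list of values."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

trend = [3 * t + 5 for t in range(6)]  # linear trend: 5, 8, 11, 14, 17, 20
print(diff(trend))                     # [3, 3, 3, 3, 3] -- constant, stationary
accel = [t * t for t in range(6)]      # growing at an increasing rate
print(diff(accel, d=2))                # [2, 2, 2, 2] -- constant after two differences
```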

After that, we examine autocorrelations, which show how a series is related to itself over time.
Autocorrelation measures how data values are correlated with each other a number of periods apart;
the number of periods apart is called the "lag".
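The lag-k autocorrelation can be computed directly from its definition; a minimal Python sketch with illustrative data:

```python
def acf(x, k):
    """Sample autocorrelation of series x at lag k."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)                       # total variation
    num = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
    return num / denom

series = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 2.0, 3.0, 4.0]  # illustrative data
print(acf(series, 0))  # 1.0 -- every series is perfectly correlated with itself at lag 0
```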

In the ARIMA methodology, we have autoregressive (AR) and moving average (MA) parameters,
which are able to describe the movement of a stationary time series. The AR part states that
the output variable depends linearly on its own previous values, whereas the MA part relates
the value in a given period to the random errors that occurred in past time periods.

Overall, we used the ARIMA methodology, which combines AR and MA parameters, to build the
model. Such models are also referred to as "mixed models". Using them allows us to produce
a more accurate forecast.

18
Analysis and Discussion

Data preparation

Table 2.1 This represents the stock price for FGV Holdings Berhad for 36 months (3 years).

No.   Stock Price     No.   Stock Price     No.   Stock Price
1.    1.9700          13.   1.3700          25.   1.7500
2.    1.6300          14.   1.5100          26.   1.7100
3.    1.6800          15.   1.8500          27.   1.6300
4.    1.2200          16.   2.2700          28.   1.5500
5.    1.5000          17.   2.3400          29.   1.6900
6.    1.7800          18.   2.0100          30.   1.9100
7.    1.7700          19.   1.5400          31.   1.8200
8.    1.7100          20.   1.5500          32.   1.6900
9.    1.7200          21.   1.8500          33.   2.0100
10.   1.5300          22.   1.8800          34.   1.9400
11.   1.5100          23.   2.0900          35.   1.7000
12.   1.4500          24.   2.1300          36.   1.7100

19
Table 2.2 This represents the time series of the stock price for FGV Holdings Berhad from
year 2015 to 2018.

Jan Feb Mar Apr May Jun Jul Aug Sept Oct Nov Dec

2015 - - - - 1.970 1.630 1.680 1.220 1.500 1.780 1.770 1.710

2016 1.720 1.530 1.510 1.450 1.370 1.510 1.850 2.270 2.340 2.010 1.540 1.550

2017 1.850 1.880 2.090 2.130 1.750 1.710 1.630 1.550 1.690 1.910 1.820 1.690

2018 2.010 1.940 1.700 1.710 - - - - - - - -

20
Model Selection

Figure 2.1

Figure 2.1 Monthly FGV Holdings Berhad stock market price from 2015 to 2018. Data from
Table 2.1.

The time plot in Figure 2.1 shows an initial analysis of the original stock price data. We can
observe that the mean of the series changes over time; hence the series is non-stationary in mean. The
autocorrelation plot is a mixture of exponential decay and a sine-wave pattern, which also indicates
non-stationarity, and the data plot makes this clear too. The first partial autocorrelation is very
dominant and close to 1, again indicating non-stationarity. Thus, differencing is required to
remove the non-stationarity in this time series.

21
Figure 2.2

Figure 2.2 First differences of the time series.

Figure 2.2 shows the analysis of the stock price data after taking the first difference. We can
see that the time plot fluctuates horizontally around a constant mean with constant variation,
and there is no evidence of a change in the mean or variance over time. In the ACF
graph, the ACF drops to zero relatively quickly, and all of the correlations lie within the horizontal
confidence band. Thus, we can conclude that the time series is stationary after first
differencing.

22
Figure 2.3

Box-Pierce test

data: z2
X-squared = 9.7649, df = 10, p-value = 0.4614
> Box.test(z2,lag=10,type="Ljung")

Box-Ljung test

data: z2
X-squared = 11.772, df = 10, p-value = 0.3006

23
We take a second difference of the series, shown in Figure 2.3. From the ACF graph, we can
see clearly that lag 1 exceeds the horizontal confidence band while the other lags
lie within the limits. From the PACF graph in Figure 2.2, the exponential decay of the first
few lags suggests a non-seasonal MA(1) model, while in the ACF graph a significant spike
at lag 1 reinforces a non-seasonal MA(1) model.

Estimation

We choose the first-differenced model (Figure 2.2) since the series looks just like a white noise
series. There is no moving average (MA) or autoregressive (AR) process involved in Figure
2.2, since there are no autocorrelations or partial autocorrelations outside the limits. Finally, we
suggest an ARIMA(0,1,0) model in this case, since we chose a first-differenced model.

> model<-arima(y,order=c(0,1,0))

> model

Call:
arima(x = y, order = c(0, 1, 0))
sigma^2 estimated as 0.04962: log likelihood = 2.9, aic = -3.79
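An ARIMA(0,1,0) model is a random walk, so its only parameter is the innovation variance, estimated as the mean squared first difference. Recomputing it in Python from the 36 prices in Table 2.1 reproduces the sigma^2 = 0.04962 in the R output:

```python
# Monthly closing prices from Table 2.1 (May 2015 - April 2018)
prices = [1.97, 1.63, 1.68, 1.22, 1.50, 1.78, 1.77, 1.71, 1.72, 1.53, 1.51, 1.45,
          1.37, 1.51, 1.85, 2.27, 2.34, 2.01, 1.54, 1.55, 1.85, 1.88, 2.09, 2.13,
          1.75, 1.71, 1.63, 1.55, 1.69, 1.91, 1.82, 1.69, 2.01, 1.94, 1.70, 1.71]

diffs = [b - a for a, b in zip(prices, prices[1:])]        # 35 first differences
sigma2 = sum(d * d for d in diffs) / len(diffs)            # ML variance estimate
print(round(sigma2, 5))  # 0.04962, matching the R output
```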

Diagnostics

Portmanteau Tests

> Box.test(z,lag=10)

Box-Pierce test
data: z
X-squared = 13.845, df = 10, p-value = 0.1802
> Box.test(z,lag=24,type="Ljung")
Box-Ljung test
data: z
X-squared = 36.075, df = 24, p-value = 0.05398

24
The p-value of the Box-Pierce Q test is 0.1802 (18.02%), indicating that the first-differenced
series is consistent with white noise. The p-value of the Ljung-Box Q* test is 0.05398 (5.398%), which is
below 10%, giving some evidence at the 10% level against white noise. However, since both
p-values exceed 0.05, there is no strong evidence against the white-noise hypothesis. Hence, we
conclude that the model residuals follow a white noise process, and the model can be used for
forecasting.

Forecasting

By using RStudio, we can predict the stock price of FGV in June and July 2018.

> m1<-arima(y, order=c(0,1,0), seasonal=list(order=c(0,1,0), period=12))


> m1
Call:
arima(x = y, order = c(0, 1, 0), seasonal = list(order = c(0, 1, 0), period =
12))
sigma^2 estimated as 0.137: log likelihood = -9.78, aic = 21.55
> predict(object=m1, n.ahead=2, prediction.interval=F, level=.95)
$pred
         June  July
2018     1.33  1.29

$se
              June       July
2018     0.3701351  0.5234501

Here is a table showing the predictions:

Prediction        June 2018    July 2018

Stock Price       1.33         1.29
Standard Error    0.3701351    0.5234501

We estimated that stock price in June 2018 will be 1.33 and stock price in July 2018 will be 1.29.
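The seasonal ARIMA(0,1,0)(0,1,0)_12 model in the R call has a simple closed-form point forecast: after regular and seasonal differencing the forecast of the differenced series is zero, so ŷ(t+1) = y(t) + y(t+1−12) − y(t−12). A Python sketch reproduces the two forecasts from the R output:

```python
# Monthly closing prices from Table 2.1 (May 2015 - April 2018)
y = [1.97, 1.63, 1.68, 1.22, 1.50, 1.78, 1.77, 1.71, 1.72, 1.53, 1.51, 1.45,
     1.37, 1.51, 1.85, 2.27, 2.34, 2.01, 1.54, 1.55, 1.85, 1.88, 2.09, 2.13,
     1.75, 1.71, 1.63, 1.55, 1.69, 1.91, 1.82, 1.69, 2.01, 1.94, 1.70, 1.71]

def forecast(series, steps, s=12):
    """Point forecasts for ARIMA(0,1,0)(0,1,0)_s: y[t] + y[t+1-s] - y[t-s]."""
    vals = list(series)
    for _ in range(steps):
        vals.append(vals[-1] + vals[-s] - vals[-s - 1])
    return vals[len(series):]

print([round(f, 2) for f in forecast(y, 2)])  # [1.33, 1.29]
```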

25
Conclusion:

According to the time plot of the stock price, we observed that the series is non-stationary. From
the ACF plot, we can see that the autocorrelations decrease slowly to zero, and a sine-wave pattern
can be observed in the PACF plot as well. This indicates that the data is not stationary. Hence,
first differencing of the original data is needed to obtain a stationary series.

After first differencing, we can observe from the ACF graph that the ACF drops to zero relatively
quickly and all of the correlations lie within the horizontal confidence band. Thus, we
can conclude that the time series is stationary after first differencing.
We also identified a few possible ARIMA models. In the end, after some tests were
conducted, we chose the best ARIMA model, ARIMA(0,1,0), to forecast the future stock
price.

In conclusion, there is no specific trend or seasonality in the stock price, as observed from the
time plot of the data. Assuming the stock market remains stable in the next few months, holding
other factors constant, we predict that FGV Holdings Berhad's stock price will be 1.33 in
June 2018 and 1.29 in July 2018.

26
References:
International Conference on Computational Intelligence for Modelling, Control and
Automation and International Conference on Intelligent Agents, Web Technologies and
Internet Commerce (CIMCA-IAWTIC'06), Vienna, Austria, 28-30 Nov. 2005. IEEE.
Print ISBN: 0-7695-2504-0. DOI: 10.1109/CIMCA.2005.1631617.

Data for Section 4 taken from :


https://finance.yahoo.com/quote/5222.KL/history/

27
