Chapter 2 Regression and Forecasting

In China today, many people regard owning a house as a source of happiness. With house prices soaring recently, home ownership seems more and more important to many Chinese people.
My project forecasts house prices because I hope to give people who plan to buy a house more information and better suggestions.
Jenny Wilson Real Estate Data

SELLING PRICE ($)   SQUARE FOOTAGE   AGE   CONDITION
 95,000             1,926            30    Good
119,000             2,069            40    Excellent
124,800             1,720            30    Excellent
135,000             1,396            15    Good
142,000             1,706            32    Mint
145,000             1,847            38    Mint
159,000             1,950            27    Mint
165,000             2,323            30    Excellent
182,000             2,285            26    Mint
183,000             3,752            35    Good
200,000             2,300            18    Good
211,000             2,525            17    Good
215,000             3,800            40    Excellent
219,000             1,740            12    Mint
Table 4.5
Learning Objectives
1. Understand and know when to use various families of forecasting models.
2. Compare moving averages, exponential
smoothing, and other time-series models.
3. Compute a variety of error measures.
Introduction
• Managers are always trying to reduce uncertainty and make better estimates of what will happen in the future.
  – This is the main purpose of forecasting.
  – Some firms use subjective methods: seat-of-the-pants methods, intuition, experience.
  – There are also several quantitative techniques, including:
    » Moving averages
    » Exponential smoothing
    » Trend projections
    » Decomposition
• Eight steps to forecasting:
1. Determine the use of the forecast: what objective are we trying to obtain?
2. Select the items or quantities that are to be
forecasted.
3. Determine the time horizon of the forecast.
4. Select the forecasting model or models.
5. Gather the data needed to make the forecast.
6. Validate the forecasting model.
7. Make the forecast.
8. Implement the results.
Forecasting Models
Figure 5.1: Forecasting techniques
• Qualitative models: Delphi methods, jury of executive opinion, sales force composite, consumer market survey
• Time-series methods: moving average, exponential smoothing, trend projections, decomposition
• Causal methods: regression analysis, multiple regression
Correlation and Regression
This chapter introduces important methods for making inferences about a correlation (or relationship) between two variables, and for describing such a relationship with an equation (the regression equation) that can be used to predict the value of one variable given the value of the other variable.
We consider sample data that come in pairs.
Definition
A correlation exists between two
variables when one of them is related
to the other in some way.
The linear correlation coefficient r measures the
strength of the linear relationship between
paired x- and y- quantitative values in a sample.
Scatterplots of Paired Data
[Figures: scatterplots of paired data; plots not reproduced]
Formula
The linear correlation coefficient r measures the strength of a linear relationship between the paired values in a sample.
$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} $$
Example: Calculating r
Using the simple random sample of data below, find
the value of r.
Data
x 3 1 3 5
y 5 8 6 4
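Example: Calculating r - cont
With n = 4, \sum x = 12, \sum y = 23, \sum x^2 = 44, \sum y^2 = 141 and \sum xy = 61:
$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} = \frac{4(61) - (12)(23)}{\sqrt{4(44) - (12)^2}\,\sqrt{4(141) - (23)^2}} = \frac{-32}{\sqrt{32}\,\sqrt{35}} = -0.956 $$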
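The calculation of r can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides; the function name `pearson_r` is an illustrative choice, and it applies the raw-sum formula to the four sample pairs from the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Linear correlation coefficient r computed from the raw-sum formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# Sample data from the example
x = [3, 1, 3, 5]
y = [5, 8, 6, 4]

r = pearson_r(x, y)
print(round(r, 3))  # -0.956
```

A value this close to -1 indicates a strong negative linear relationship in the sample.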
Formulas for b0 and b1
$$ b_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} \quad \text{(slope)} $$
$$ b_0 = \bar{y} - b_1\bar{x} \quad \text{(y-intercept)} $$
Calculating the Regression Equation
Data
x   3   1   3   5
y   5   8   6   4
We have used these values to find that the linear correlation coefficient is r = -0.956. Use this sample to find the regression equation.

Calculating the Regression Equation - cont
n = 4, \sum x = 12, \sum y = 23, \sum x^2 = 44, \sum y^2 = 141, \sum xy = 61
$$ b_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{4(61) - (12)(23)}{4(44) - (12)^2} = \frac{-32}{32} = -1 $$
$$ b_0 = \bar{y} - b_1\bar{x} = 5.75 - (-1)(3) = 8.75 $$
$$ \hat{y} = 8.75 - 1x $$
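The slope and intercept formulas translate directly into code. The sketch below is an illustration added for this write-up (the name `fit_line` is hypothetical); it reproduces the regression equation for the four sample pairs.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    b0 = sum_y / n - b1 * (sum_x / n)                              # y-intercept
    return b0, b1


b0, b1 = fit_line([3, 1, 3, 5], [5, 8, 6, 4])
print(b0, b1)  # 8.75 -1.0

# Predicted y for a given x, for example x = 2
print(b0 + b1 * 2)  # 6.75
```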
Multiple Regression Analysis
• Multiple regression models are extensions to the simple linear model and allow the creation of models with more than one independent variable.
$$ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \epsilon $$
where
Y = dependent variable (response variable)
X_i = ith independent variable (predictor or explanatory variable)
\beta_0 = intercept (value of Y when all X_i = 0)
\beta_i = coefficient of the ith independent variable
k = number of independent variables
\epsilon = random error
Jenny Wilson Realty
Jenny Wilson wants to develop a model to determine the suggested listing price for houses based on the size and age of the house.
$$ \hat{Y} = b_0 + b_1 X_1 + b_2 X_2 $$
where
\hat{Y} = predicted value of dependent variable (selling price)
b_0 = Y intercept
X_1 and X_2 = values of the two independent variables (square footage and age), respectively
b_1 and b_2 = slopes for X_1 and X_2, respectively
She selects a sample of houses that have sold recently and records the data shown in Table 4.5.
Jenny Wilson Realty
Program 4.2A: Input screen for the Jenny Wilson Realty multiple regression example [screenshot not reproduced]
Program 4.2B: Output for the Jenny Wilson Realty multiple regression example [screenshot not reproduced]
Jenny Wilson Realty
• The model is statistically significant.
• The p-value for the F-test is 0.002.
• r^2 = 0.6719, so the model explains about 67% of the variation in selling price (Y).
• But the F-test is for the entire model, so it cannot tell us whether one or both of the independent variables are significant.
• By calculating the p-value of each variable, we can assess the significance of the individual variables.
• Since the p-values for X_1 (square footage) and X_2 (age) are both less than the significance level of 0.05, both null hypotheses can be rejected.
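The two-variable model can be fitted to the Table 4.5 data without special software. The sketch below is an assumption-laden illustration, not the program used in the slides: the helper names `solve` and `ols` are hypothetical, and the fit uses the normal equations solved by Gaussian elimination. Printed values may differ slightly from the slide output because of rounding.

```python
def solve(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients [b0, b1, ...] from the normal equations (X'X)b = X'y."""
    X = [[1.0, *row] for row in rows]  # prepend the intercept column
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Table 4.5: (selling price, square footage, age)
houses = [
    (95000, 1926, 30), (119000, 2069, 40), (124800, 1720, 30), (135000, 1396, 15),
    (142000, 1706, 32), (145000, 1847, 38), (159000, 1950, 27), (165000, 2323, 30),
    (182000, 2285, 26), (183000, 3752, 35), (200000, 2300, 18), (211000, 2525, 17),
    (215000, 3800, 40), (219000, 1740, 12),
]
y = [h[0] for h in houses]
rows = [h[1:] for h in houses]

b0, b1, b2 = ols(rows, y)

# Coefficient of determination r^2 = 1 - SS_residual / SS_total
pred = [b0 + b1 * sqft + b2 * age for sqft, age in rows]
y_mean = sum(y) / len(y)
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
ss_tot = sum((yi - y_mean) ** 2 for yi in y)
r2 = 1 - ss_res / ss_tot

print(f"Y-hat = {b0:.0f} + {b1:.2f}*X1 + {b2:.0f}*X2, r^2 = {r2:.4f}")
```

The sketch does not compute the p-values discussed above; those require the F and t distributions, which a statistics package provides.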
Binary or Dummy Variables
• Binary (or dummy or indicator) variables are special variables created for qualitative data.
• A dummy variable is assigned a value of 1 if a particular condition is met and a value of 0 otherwise.
• The number of dummy variables must equal one less than the number of categories of the qualitative variable.
Jenny Wilson Realty
• Jenny believes a better model can be developed if she includes information about the condition of the property.
X_3 = 1 if house is in excellent condition
    = 0 otherwise
X_4 = 1 if house is in mint condition
    = 0 otherwise
• Two dummy variables are used to describe the three categories of condition.
• No variable is needed for “good” condition, since if both X_3 and X_4 = 0, the house must be in good condition.
Jenny Wilson Realty
[Input screen for the Jenny Wilson Realty example with dummy variables: screenshot not reproduced]
[Output for the Jenny Wilson Realty example with dummy variables: screenshot not reproduced]
$$ \hat{Y} = 121658.4 + 56.4X_1 - 3962.8X_2 + 33162.6X_3 + 47369.2X_4 $$
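To see how the dummy variables enter a prediction, the fitted equation can be wrapped in a small function. This is an illustrative sketch: the function name `predict_price` and the example house (2,000 square feet, 30 years old) are hypothetical, while the coefficients are those of the equation above.

```python
# Coefficients of the fitted equation with dummy variables
INTERCEPT = 121658.4
B_SQFT = 56.4          # X1: square footage
B_AGE = -3962.8        # X2: age in years
B_EXCELLENT = 33162.6  # X3: 1 if excellent condition, else 0
B_MINT = 47369.2       # X4: 1 if mint condition, else 0


def predict_price(sqft, age, condition):
    """Predicted selling price; 'good' is the baseline category (X3 = X4 = 0)."""
    if condition not in ("good", "excellent", "mint"):
        raise ValueError("condition must be 'good', 'excellent' or 'mint'")
    x3 = 1 if condition == "excellent" else 0
    x4 = 1 if condition == "mint" else 0
    return INTERCEPT + B_SQFT * sqft + B_AGE * age + B_EXCELLENT * x3 + B_MINT * x4


# Hypothetical house: 2,000 square feet, 30 years old
for cond in ("good", "excellent", "mint"):
    print(cond, round(predict_price(2000, 30, cond), 1))
```

For otherwise identical houses, the coefficients of X_3 and X_4 are the predicted price premiums of excellent and mint condition over good condition.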
Colonel Motors
• Engineers at Colonel Motors want to use regression analysis to improve fuel efficiency.
• They have been asked to study the impact of weight on miles per gallon (MPG).

MPG   WEIGHT (1,000 LBS.)      MPG   WEIGHT (1,000 LBS.)
12    4.58                     20    3.18
13    4.66                     23    2.68
15    4.02                     24    2.65
18    2.53                     33    1.70
19    3.09                     36    1.95
19    3.11                     42    1.92
Table 4.6
Colonel Motors
[Figure: nonlinear model for MPG data; plot not reproduced]
• The nonlinear model is a quadratic model.
• The easiest way to work with this model is to develop a new variable:
$$ X_2 = (\text{weight})^2 $$
• This gives us a model that can be solved with linear regression software:
$$ \hat{Y} = b_0 + b_1 X_1 + b_2 X_2 $$
Program 4.5 [output screen not reproduced] gives:
$$ \hat{Y} = 79.8 - 30.2X_1 + 3.4X_2 $$
This is a better model, with a smaller significance value for the F-test and a larger adjusted r^2 value.
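The "new variable" trick can be sketched in code: square the weight, then run an ordinary linear least-squares fit on weight and weight squared. The helper names `solve` and `ols` below are hypothetical and the solver is a plain normal-equations approach, not the software used in the slides, so the printed coefficients may differ slightly from the rounded slide values.

```python
def solve(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients [b0, b1, ...] from the normal equations (X'X)b = X'y."""
    X = [[1.0, *row] for row in rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Table 4.6: (MPG, weight in 1,000 lbs.)
cars = [
    (12, 4.58), (13, 4.66), (15, 4.02), (18, 2.53), (19, 3.09), (19, 3.11),
    (20, 3.18), (23, 2.68), (24, 2.65), (33, 1.70), (36, 1.95), (42, 1.92),
]
mpg = [c[0] for c in cars]

# X1 = weight, X2 = weight squared (the new variable)
rows = [(w, w * w) for _, w in cars]
b0, b1, b2 = ols(rows, mpg)

print(f"Y-hat = {b0:.1f} + ({b1:.1f})*X1 + ({b2:.1f})*X2")
```

The model is still linear in its coefficients, which is why linear regression can estimate it even though the fitted curve is a parabola in weight.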
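Nonlinear Model
(1) Hyperbola model:
$$ y_i = \beta_1 + \beta_2 \frac{1}{x_i} + u_i $$
(2) Polynomial model:
$$ y_i = \beta_1 + \beta_2 x_i + \beta_3 x_i^2 + \dots + \beta_{n+1} x_i^n + u_i $$
(3) Logarithm model:
$$ y_i = \beta_1 + \beta_2 \ln x_i + u_i $$
(4) Trigonometric function model:
$$ y_i = \beta_1 + \beta_2 \sin x_i + u_i $$
(5) Exponential model:
$$ y_i = a b^{x_i} + u_i $$
$$ y_i = e^{\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i} $$
(6) Power function model:
$$ y_i = a x_i^b + u_i $$
(7) Logistic curve:
$$ y_i = \frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}} + u_i $$
(8) Modified exponential growth curve:
$$ y_i = a + b r^{x_i} + u_i $$

Linearizing substitutions and parameter estimation

Hyperbola model
  Original model:            y_i = \beta_1 + \beta_2 (1/x_i) + u_i
  Substitution:              x_i^* = 1/x_i
  Model after substitution:  y_i = \beta_1 + \beta_2 x_i^* + u_i
  Parameter estimation:      simple linear regression, OLS

Polynomial model
  Original model:            y_i = \beta_1 + \beta_2 x_i + \beta_3 x_i^2 + \dots + \beta_{n+1} x_i^n + u_i
  Substitution:              x_{ik} = x_i^k
  Model after substitution:  y_i = \beta_1 + \beta_2 x_{i1} + \beta_3 x_{i2} + \dots + \beta_{n+1} x_{in} + u_i
  Parameter estimation:      multiple linear regression, OLS

Logarithm model
  Original model:            y_i = \beta_1 + \beta_2 \ln x_i + u_i
  Substitution:              x_i^* = \ln x_i
  Model after substitution:  y_i = \beta_1 + \beta_2 x_i^* + u_i
  Parameter estimation:      simple linear regression, OLS

Trigonometric function model
  Original model:            y_i = \beta_1 + \beta_2 \sin x_i + u_i
  Substitution:              x_i^* = \sin x_i
  Model after substitution:  y_i = \beta_1 + \beta_2 x_i^* + u_i
  Parameter estimation:      simple linear regression, OLS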
Example: Patient Satisfaction
An administrator at Belltown Hospital wanted to study the relationship between patient satisfaction and patients' age, severity of illness and anxiety level. She randomly selected 23 patients and collected the data presented below. Larger values represent more satisfaction, increased severity and higher anxiety.

ID   Satisfaction   Age   Severity   Anxiety
1    48             50    51         2.3
2    57             36    46         2.3
3    66             40    48         2.2
4    70             41    44         1.8
5    89             28    43         1.8
6    36             49    54         2.9
7    46             42    50         2.2
8    54             45    48         2.4
9    26             52    62         2.9
10   77             29    50         2.1
11   89             29    48         2.4
12   67             43    53         2.4
13   47             38    55         2.2
14   51             34    51         2.3
15   57             53    54         2.2
16   66             36    49         2.0
17   79             33    56         2.5
18   88             29    46         1.9
19   60             33    49         2.1
20   49             55    51         2.4
21   77             29    52         2.3
22   52             44    58         2.9
23   60             43    50         2.3
SUMMARY OUTPUT: Patient Satisfaction

Regression Statistics
Multiple R 0.820157657
R Square 0.672658583
Adjusted R Square 0.620973096
Standard Error 10.28945339
Observations 23
ANOVA
             df   SS            MS         F          Significance F
Regression    3   4133.633221   1377.878   13.01446   7.48239E-05
Residual     19   2011.584171   105.8729
Total        22   6145.217391
The p-value (Significance F) is very small, so we reject the null hypothesis.

            Coefficients   Standard Error   t Stat    P-value    Lower 95%     Upper 95%
Intercept   162.875898     25.775651        6.31898   4.59E-06   108.92682     216.8249
Age         -1.21031816    0.30145159       -4.0149   0.0007     -1.841263     -0.57937
Severity    -0.66590561    0.82099695       -0.8110   0.42735    -2.3842725    1.052461
Anxiety     -8.61303150    12.2412512       -0.7036   0.49021    -34.234272    17.00820
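The summary statistics in this output are tied together by simple arithmetic on the ANOVA table. The sketch below is an added illustration with hypothetical variable names; it recomputes R Square, Adjusted R Square, F and the standard error from the sums of squares and degrees of freedom shown above.

```python
from math import sqrt

# Values taken from the ANOVA table
ss_regression = 4133.633221
ss_residual = 2011.584171
df_regression = 3
df_residual = 19

ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual

r_square = ss_regression / ss_total
adj_r_square = 1 - (1 - r_square) * df_total / df_residual

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual
standard_error = sqrt(ms_residual)

print(f"R Square          = {r_square:.6f}")
print(f"Adjusted R Square = {adj_r_square:.6f}")
print(f"F                 = {f_stat:.5f}")
print(f"Standard Error    = {standard_error:.5f}")
```

Only Age has a p-value below 0.05 in the coefficient table, so although the overall model is significant, Severity and Anxiety are not individually significant at that level.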
RESIDUAL OUTPUT
Observation Predicted Satisfaction Residuals
1 48.58883167 -0.588831665
2 68.86281401 -11.86281401
3 63.55103328 2.448966722
4 68.44955019 1.55044981
5 84.8495919 4.150408101
6 42.63361407 -6.633614068
7 59.79858572 -13.79858572
8 55.77683617 -1.776836171
9 33.67541465 -7.675414646
10 76.39402496 0.605975036
11 75.14192675 13.85807325
12 54.86794441 12.13205559
13 61.31033028 -14.31033028
14 67.95392224 -16.95392224
15 43.82146348 13.17853652
16 69.44900661 -3.449006606
17 64.11210601 14.88789399
18 80.78025374 7.219746265
19 72.21865794 -12.21865794
20 41.67593771 7.32406229
21 73.33960743 3.660392573
22 46.0215824 5.978417597
23 57.72696441 2.273035591
Time Series Forecasting
Forecast demand for the next four quarters.

Quarter      Demand Dt
II, 2006     8000
III, 2006    13000
IV, 2006     23000
I, 2007      34000
II, 2007     10000
III, 2007    18000
IV, 2007     23000
I, 2008      38000
II, 2008     12000
III, 2008    13000
IV, 2008     32000
I, 2009      41000

[Figure: time-series plot of quarterly demand; vertical axis from 0 to 60,000]
Time-Series Models
• Time-series models attempt to predict the future based on the past.
• Common time-series models are:
  – Moving average.
  – Exponential smoothing.
  – Trend projections.
  – Decomposition.
• Regression analysis is used in trend projections and one type of decomposition model.
Measures of Forecast Accuracy
• We compare forecasted values with actual values to see how well one model works or to compare models.
Forecast error = Actual value - Forecast value
• One measure of accuracy is the mean absolute deviation (MAD):
$$ \text{MAD} = \frac{\sum |\text{forecast error}|}{n} $$
Using a naïve forecasting model we can compute the MAD:

YEAR   ACTUAL SALES OF CD PLAYERS   FORECAST SALES   ABSOLUTE VALUE OF ERRORS (DEVIATION), |ACTUAL - FORECAST|
1      110                          -                -
2      100                          110              |100 - 110| = 10
3      120                          100              |120 - 100| = 20
4      140                          120              |140 - 120| = 20
5      170                          140              |170 - 140| = 30
6      150                          170              |150 - 170| = 20
7      160                          150              |160 - 150| = 10
8      190                          160              |190 - 160| = 30
9      200                          190              |200 - 190| = 10
10     190                          200              |190 - 200| = 10
11     -                            190              -
Table 5.2
Sum of |errors| = 160
MAD = 160/9 = 17.8
• There are other popular measures of forecast accuracy.
• The mean squared error:
$$ \text{MSE} = \frac{\sum (\text{error})^2}{n} $$
• The mean absolute percent error:
$$ \text{MAPE} = \frac{\sum \left| \dfrac{\text{error}}{\text{actual}} \right|}{n} \times 100\% $$
• And bias is the average error.
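All four accuracy measures can be computed in one pass over the forecast errors. The following sketch is an added illustration (variable names are hypothetical); it applies the naïve model, where each forecast is the previous year's actual value, to the CD player sales data.

```python
# Actual CD player sales for years 1..10
actual = [110, 100, 120, 140, 170, 150, 160, 190, 200, 190]

# Naive model: the forecast for each year is the previous year's actual value,
# so forecasts exist for years 2..10 only.
observed = actual[1:]
forecast = actual[:-1]

errors = [a - f for a, f in zip(observed, forecast)]  # actual - forecast
n = len(errors)

mad = sum(abs(e) for e in errors) / n
mse = sum(e ** 2 for e in errors) / n
mape = 100 * sum(abs(e) / a for e, a in zip(errors, observed)) / n
bias = sum(errors) / n

print(f"MAD  = {mad:.1f}")    # 17.8
print(f"MSE  = {mse:.1f}")    # 377.8
print(f"MAPE = {mape:.2f}%")
print(f"Bias = {bias:.2f}")
```

A positive bias here means the naïve forecast ran below actual sales on average, which is expected when sales are trending upward.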
Components of a Time-Series
A time series typically has four components:
1. Trend (T) is the gradual upward or downward
movement of the data over time.
2. Seasonality (S) is a pattern of demand fluctuations
above or below the trend line that repeats at regular
intervals.
3. Cycles (C) are patterns in annual data that occur every
several years.
4. Random variations (R) are “blips” in the data caused by
chance or unusual situations, and follow no discernible
pattern.
Moving Averages
• Moving averages can be used when demand is relatively steady over time.
• The next forecast is the average of the most recent n data values from the time series.
• This method tends to smooth out short-term irregularities in the data series.
$$ \text{Moving average forecast} = \frac{\text{Sum of demands in previous } n \text{ periods}}{n} $$
• Mathematically:
$$ F_{t+1} = \frac{Y_t + Y_{t-1} + \dots + Y_{t-n+1}}{n} $$
where
F_{t+1} = forecast for time period t + 1
Y_t = actual value in time period t
n = number of periods to average
Wallace Garden Supply
• Wallace Garden Supply wants to forecast demand for its Storage Shed.
• They have collected data for the past year.
• They are using a three-month moving average to forecast demand (n = 3).

MONTH       ACTUAL SHED SALES   THREE-MONTH MOVING AVERAGE
January     10
February    12
March       13
April       16                  (10 + 12 + 13)/3 = 11.67
May         19                  (12 + 13 + 16)/3 = 13.67
June        23                  (13 + 16 + 19)/3 = 16.00
July        26                  (16 + 19 + 23)/3 = 19.33
August      30                  (19 + 23 + 26)/3 = 22.67
September   28                  (23 + 26 + 30)/3 = 26.33
October     18                  (26 + 30 + 28)/3 = 28.00
November    16                  (30 + 28 + 18)/3 = 25.33
December    14                  (28 + 18 + 16)/3 = 20.67
January     -                   (18 + 16 + 14)/3 = 16.00
Table 5.3
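The three-month moving average in Table 5.3 can be reproduced with a short function. This is a minimal sketch added for illustration; the name `moving_average_forecasts` is hypothetical.

```python
def moving_average_forecasts(values, n):
    """Forecasts for periods n+1 .. len(values)+1, each the mean of the previous n actuals."""
    return [sum(values[i - n:i]) / n for i in range(n, len(values) + 1)]


# Actual shed sales, January through December
sales = [10, 12, 13, 16, 19, 23, 26, 30, 28, 18, 16, 14]

forecasts = moving_average_forecasts(sales, 3)
months = ["April", "May", "June", "July", "August", "September",
          "October", "November", "December", "January (next year)"]
for month, f in zip(months, forecasts):
    print(f"{month:<20} {f:.2f}")
```

The last value is the forecast for the coming January, which uses the October, November and December actuals.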
Weighted Moving Averages
• Weighted moving averages use weights to put more emphasis on recent periods.
• This is often used when a trend or other pattern is emerging.
$$ F_{t+1} = \frac{\sum (\text{Weight in period } i)(\text{Actual value in period } i)}{\sum (\text{Weights})} $$
• Mathematically:
$$ F_{t+1} = \frac{w_1 Y_t + w_2 Y_{t-1} + \dots + w_n Y_{t-n+1}}{w_1 + w_2 + \dots + w_n} $$
where
w_i = weight for the ith observation
• Wallace Garden Supply decides to try a weighted moving average model to forecast demand for its Storage Shed.
• They decide on the following weighting scheme:

WEIGHTS APPLIED   PERIOD
3                 Last month
2                 Two months ago
1                 Three months ago
6                 Sum of the weights

$$ \text{Forecast} = \frac{3 \times \text{Sales last month} + 2 \times \text{Sales two months ago} + 1 \times \text{Sales three months ago}}{6} $$
MONTH       ACTUAL SHED SALES   THREE-MONTH WEIGHTED MOVING AVERAGE
January     10
February    12
March       13
April       16                  [(3 × 13) + (2 × 12) + (10)]/6 = 12.17
May         19                  [(3 × 16) + (2 × 13) + (12)]/6 = 14.33
June        23                  [(3 × 19) + (2 × 16) + (13)]/6 = 17.00
July        26                  [(3 × 23) + (2 × 19) + (16)]/6 = 20.50
August      30                  [(3 × 26) + (2 × 23) + (19)]/6 = 23.83
September   28                  [(3 × 30) + (2 × 26) + (23)]/6 = 27.50
October     18                  [(3 × 28) + (2 × 30) + (26)]/6 = 28.33
November    16                  [(3 × 18) + (2 × 28) + (30)]/6 = 23.33
December    14                  [(3 × 16) + (2 × 18) + (28)]/6 = 18.67
January     -                   [(3 × 14) + (2 × 16) + (18)]/6 = 15.33
Table 5.4
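The weighted version differs from the simple moving average only in how the recent values are combined. The sketch below is an added illustration; the function name is hypothetical, and by assumption the first weight in the list applies to the most recent period, matching the 3-2-1 scheme above.

```python
def weighted_moving_average_forecasts(values, weights):
    """Weighted moving average forecasts; weights[0] applies to the most recent period."""
    n = len(weights)
    total = sum(weights)
    forecasts = []
    for i in range(n, len(values) + 1):
        recent_first = values[i - n:i][::-1]  # most recent actual value first
        forecasts.append(sum(w * v for w, v in zip(weights, recent_first)) / total)
    return forecasts


# Actual shed sales, January through December
sales = [10, 12, 13, 16, 19, 23, 26, 30, 28, 18, 16, 14]

forecasts = weighted_moving_average_forecasts(sales, [3, 2, 1])
print([round(f, 2) for f in forecasts])
```

Because the latest month carries half of the total weight, the forecast reacts faster to the autumn drop in sales than the unweighted three-month average does.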
Exponential Smoothing
• Exponential smoothing is a type of moving average that is easy to use and requires little record keeping of data.
New forecast = Last period's forecast + \alpha (Last period's actual demand - Last period's forecast)
Here \alpha is a weight (or smoothing constant), with 0 ≤ \alpha ≤ 1.
Mathematically:
$$ F_{t+1} = F_t + \alpha (Y_t - F_t) $$
where
F_{t+1} = new forecast (for time period t + 1)
F_t = previous forecast (for time period t)
\alpha = smoothing constant (0 ≤ \alpha ≤ 1)
Y_t = previous period's actual demand
The idea is simple: the new estimate is the old estimate plus some fraction of the error in the last period.
Exponential Smoothing Example
• In January, February's demand for a certain car model was predicted to be 142.
• Actual February demand was 153 autos.
• Using a smoothing constant of \alpha = 0.20, what is the forecast for March?
New forecast (for March demand) = 142 + 0.2(153 - 142) = 144.2 or 144 autos
• If actual demand in March was 136 autos, the April forecast would be:
New forecast (for April demand) = 144.2 + 0.2(136 - 144.2) = 142.6 or 143 autos
Selecting the Smoothing Constant
• Selecting the appropriate value for \alpha is key to obtaining a good forecast.
• The objective is always to generate an accurate forecast.
• The general approach is to develop trial forecasts with different values of \alpha and select the \alpha that results in the lowest MAD.
Port of Baltimore Exponential Smoothing Forecast for \alpha = 0.1 and \alpha = 0.5

QUARTER   ACTUAL TONNAGE UNLOADED   FORECAST USING \alpha = 0.10                  FORECAST USING \alpha = 0.50
1         180                       175                                          175
2         168                       175.5 = 175.00 + 0.10(180 - 175)             177.5
3         159                       174.75 = 175.50 + 0.10(168 - 175.50)         172.75
4         175                       173.18 = 174.75 + 0.10(159 - 174.75)         165.88
5         190                       173.36 = 173.18 + 0.10(175 - 173.18)         170.44
6         205                       175.02 = 173.36 + 0.10(190 - 173.36)         180.22
7         180                       178.02 = 175.02 + 0.10(205 - 175.02)         192.61
8         182                       178.22 = 178.02 + 0.10(180 - 178.02)         186.30
9         ?                         178.60 = 178.22 + 0.10(182 - 178.22)         184.15
Table 5.5
Absolute Deviations and MADs for the Port of Baltimore Example

QUARTER   ACTUAL TONNAGE UNLOADED   FORECAST WITH \alpha = 0.10   ABSOLUTE DEVIATIONS FOR \alpha = 0.10   FORECAST WITH \alpha = 0.50   ABSOLUTE DEVIATIONS FOR \alpha = 0.50
1         180                       175                           5.00                                    175                           5.00
2         168                       175.5                         7.50                                    177.5                         9.50
3         159                       174.75                        15.75                                   172.75                        13.75
4         175                       173.18                        1.82                                    165.88                        9.12
5         190                       173.36                        16.64                                   170.44                        19.56
6         205                       175.02                        29.98                                   180.22                        24.78
7         180                       178.02                        1.98                                    192.61                        12.61
8         182                       178.22                        3.78                                    186.30                        4.30
Sum of absolute deviations                                        82.45                                                                 98.62
MAD = Sum / 8                                                     10.31                                                                 12.33
Table 5.6
Best choice: \alpha = 0.10, which gives the lower MAD.
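The trial-and-compare approach to choosing the smoothing constant can be sketched as follows. This is an added illustration with hypothetical function names; it starts both series from the initial forecast of 175 used in Table 5.5 and keeps full precision, so the MADs agree with the table only up to rounding.

```python
def exponential_smoothing(actuals, alpha, initial_forecast):
    """Return forecasts for periods 1 .. len(actuals)+1 using F[t+1] = F[t] + alpha*(Y[t] - F[t])."""
    forecasts = [initial_forecast]
    for y in actuals:
        forecasts.append(forecasts[-1] + alpha * (y - forecasts[-1]))
    return forecasts


def mean_absolute_deviation(actuals, forecasts):
    """MAD over the periods that have both an actual value and a forecast."""
    return sum(abs(y - f) for y, f in zip(actuals, forecasts)) / len(actuals)


# Port of Baltimore: tonnage unloaded in quarters 1..8
tonnage = [180, 168, 159, 175, 190, 205, 180, 182]

results = {}
for alpha in (0.10, 0.50):
    forecasts = exponential_smoothing(tonnage, alpha, initial_forecast=175)
    results[alpha] = (forecasts[-1], mean_absolute_deviation(tonnage, forecasts))
    print(f"alpha = {alpha:.2f}: quarter 9 forecast = {forecasts[-1]:.2f}, "
          f"MAD = {results[alpha][1]:.2f}")

best_alpha = min(results, key=lambda a: results[a][1])
print("Lowest MAD at alpha =", best_alpha)
```

Only two candidate values are compared here, as in the slides; in practice a finer grid of \alpha values could be tried the same way.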
Decomposition of a Time-Series
Figure 5.3: Product demand charted over 4 years, with trend and seasonality indicated
[Plot not reproduced. Horizontal axis: time (years 1 to 4); vertical axis: demand for product or service. Annotations: trend component, seasonal peaks, actual demand line, average demand over 4 years.]
Components of an Observation
Observed demand (O) = Systematic component (S) + Random component (R)
The systematic component is made up of:
  – Level (current deseasonalized demand)
  – Trend (growth or decline in demand)
  – Seasonality (predictable seasonal fluctuation)
• Systematic component: expected value of demand
• Random component: the part of the forecast that deviates from the systematic component
• Forecast error: difference between forecast and actual demand
Seasonal Variations with Trend
• When both trend and seasonal components are present, the forecasting task is more complex.
• Seasonal indices should be computed using a centered moving average (CMA) approach.
• There are four steps in computing seasonal indices with CMAs:
1. Compute the CMA for each observation (where possible) to deseasonalize the data.
2. Compute the seasonal ratio = Observation/CMA for that observation.
3. Average seasonal ratios to get seasonal indices.
4. If seasonal indices do not add to the number of seasons, multiply each index by (Number of seasons)/(Sum of indices).
Estimating Level and Trend
• Before estimating level and trend, demand data must be deseasonalized.
• Deseasonalized demand = demand that would have been observed in the absence of seasonal fluctuations
• Periodicity (p)
  – the number of periods after which the seasonal cycle repeats itself
  – for demand at Tahoe Salt, p = 4
Deseasonalizing Demand
$$ \bar{D}_t = \begin{cases} \left[ D_{t-(p/2)} + D_{t+(p/2)} + \sum_{i=t+1-(p/2)}^{t-1+(p/2)} 2D_i \right] / (2p), & p \text{ even} \\ \sum_{i=t-\lfloor p/2 \rfloor}^{t+\lfloor p/2 \rfloor} D_i / p, & p \text{ odd} \end{cases} $$
(for odd p, p/2 is truncated to the lower integer)
For the example, p = 4 is even and t = 3:
$$ \bar{D}_3 = \left[ D_{3-(4/2)} + D_{3+(4/2)} + \sum_{i=2}^{4} 2D_i \right] / (2 \times 4) = (D_1 + D_5 + 2D_2 + 2D_3 + 2D_4)/8 $$
$$ = \{8000 + 10000 + [(2)(13000) + (2)(23000) + (2)(34000)]\}/8 = 19750 $$
$$ \bar{D}_4 = \left[ D_{4-2} + D_{4+2} + \sum_{i=3}^{5} 2D_i \right] / 8 = [D_2 + D_6 + 2(D_3 + D_4 + D_5)]/8 $$
$$ = \{13000 + 18000 + [(2)(23000) + (2)(34000) + (2)(10000)]\}/8 = 20625 $$
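The deseasonalizing formula can be written as one function that handles both the even and the odd case. This is an illustrative sketch (the function name `deseasonalize` is hypothetical), applied to the twelve quarters of demand in the example with p = 4.

```python
def deseasonalize(demand, p):
    """Deseasonalized demand by period (1-based) for every t where the full window fits."""
    n = len(demand)
    half = p // 2  # p/2, truncated to the lower integer when p is odd
    result = {}
    for t in range(half + 1, n - half + 1):
        i = t - 1  # 0-based position of period t
        if p % 2 == 0:
            # End points count once, interior points twice, divide by 2p
            interior = sum(2 * d for d in demand[i - half + 1:i + half])
            result[t] = (demand[i - half] + demand[i + half] + interior) / (2 * p)
        else:
            result[t] = sum(demand[i - half:i + half + 1]) / p
    return result


# Quarterly demand, II 2006 through I 2009 (periods 1..12)
demand = [8000, 13000, 23000, 34000, 10000, 18000,
          23000, 38000, 12000, 13000, 32000, 41000]

d_bar = deseasonalize(demand, p=4)
for t, value in d_bar.items():
    print(t, value)
```

No value is produced for the first two and last two periods, because the centered window would extend beyond the data.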
Table 7.3.1 Long-term trend calculation by the moving average method

Period   Value   3-term moving average ŷ   4-term moving average ŷ     2-term moving average ŷ
t1       y1
t2       y2      (y1+y2+y3)/3
                                           (y1+y2+y3+y4)/4 = ŷ23
t3       y3      (y2+y3+y4)/3                                          (ŷ23 + ŷ34)/2 = ŷ3
                                           (y2+y3+y4+y5)/4 = ŷ34
t4       y4      (y3+y4+y5)/3                                          (ŷ34 + ŷ45)/2 = ŷ4
                                           (y3+y4+y5+y6)/4 = ŷ45
t5       y5      (y4+y5+y6)/3
…        …       …                         …
tn       yn
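• Tea demand in China
Table 7.3.2 Long-term trend calculation by the moving average method

YEAR   QUARTER   SALES (10,000 UNITS)   4-TERM MOVING AVERAGE (falls between this quarter and the next)   2-TERM MOVING AVERAGE (TREND VALUE ŷ)
1980   3         15
1980   4         18                     12.0
1981   1         6                      12.5                                                               12.25
1981   2         9                      13.0                                                               12.75
1981   3         17                     13.5                                                               13.25
1981   4         20                     14.0                                                               13.75
1982   1         8                      14.5                                                               14.25
1982   2         11                     15.0                                                               14.75
1982   3         19                     15.5                                                               15.25
1982   4         22                     16.0                                                               15.75
1983   1         10                     16.5                                                               16.25
1983   2         13                     17.0                                                               16.75
1983   3         21                     17.5                                                               17.25
1983   4         24                     18.0                                                               17.75
1984   1         12
1984   2         15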
Turner Industries

YEAR   QUARTER   SALES   CMA   SEASONAL RATIO
1      1         108
1      2         125
1      3         150
1      4         141
2      1         116
2      2         134
2      3         159
2      4         152
3      1         123
3      2         142
3      3         168
3      4         165
Turner Industries
• To calculate the CMA for quarter 3 of year 1, we compare the actual sales with an average quarter centered on that time period.
• We will use 1.5 quarters before quarter 3 and 1.5 quarters after quarter 3; that is, we take quarters 2, 3, and 4 and one half of quarter 1, year 1 and quarter 1, year 2.
$$ \text{CMA}(q3, y1) = \frac{0.5(108) + 125 + 150 + 141 + 0.5(116)}{4} = 132.00 $$
$$ \text{Seasonal ratio} = \frac{\text{Sales in quarter 3}}{\text{CMA}} = \frac{150}{132} = 1.136 $$
Turner Industries
Figure 5.5: Scatterplot of Turner Industries sales data and centered moving average
[Plot not reproduced. Horizontal axis: time period (1 to 12); vertical axis: sales (0 to 200). Series: original sales figures and CMA.]
Turner Industries

YEAR   QUARTER   SALES   CMA       SEASONAL RATIO
1      1         108
1      2         125
1      3         150     132.000   1.136
1      4         141     134.125   1.051
2      1         116     136.375   0.851
2      2         134     138.875   0.965
2      3         159     141.125   1.127
2      4         152     143.000   1.063
3      1         123     145.125   0.848
3      2         142     147.875   0.960
3      3         168
3      4         165
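Turner Industries
There are two seasonal ratios for each quarter, so these are averaged to get the seasonal index:
Index for quarter 1 = I1 = (0.851 + 0.848)/2 = 0.85
Index for quarter 2 = I2 = (0.965 + 0.960)/2 = 0.96
Index for quarter 3 = I3 = (1.136 + 1.127)/2 = 1.13
Index for quarter 4 = I4 = (1.051 + 1.063)/2 = 1.06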
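The centered moving averages, seasonal ratios and seasonal indices for Turner Industries can be computed together. The sketch below is an added illustration with hypothetical names, and it assumes an even number of seasons (p = 4), as in the example.

```python
def centered_moving_average(values, p):
    """CMA for even p: half weight on the two end points, full weight in between."""
    half = p // 2
    cma = {}
    for i in range(half, len(values) - half):
        window = (0.5 * values[i - half]
                  + sum(values[i - half + 1:i + half])
                  + 0.5 * values[i + half])
        cma[i] = window / p
    return cma


# Quarterly sales for years 1 to 3 (index 0 is quarter 1 of year 1)
sales = [108, 125, 150, 141, 116, 134, 159, 152, 123, 142, 168, 165]
p = 4

cma = centered_moving_average(sales, p)
ratios = {i: sales[i] / c for i, c in cma.items()}  # seasonal ratio = sales / CMA

# Average the ratios that belong to the same quarter
indices = []
for quarter in range(p):
    quarter_ratios = [r for i, r in ratios.items() if i % p == quarter]
    indices.append(sum(quarter_ratios) / len(quarter_ratios))

print(cma[2])                           # 132.0 (quarter 3 of year 1)
print([round(x, 2) for x in indices])   # [0.85, 0.96, 1.13, 1.06]
```

The rounded indices add to 4.00, the number of seasons, so the adjustment in step 4 of the CMA procedure is not needed here.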
Deseasonalized Data for Turner Industries

SALES ($1,000,000s)   SEASONAL INDEX   DESEASONALIZED SALES ($1,000,000s)
108                   0.85             127.059
125                   0.96             130.208
150                   1.13             132.743
141                   1.06             133.019
116                   0.85             136.471
134                   0.96             139.583
159                   1.13             140.708
152                   1.06             143.396
123                   0.85             144.706
142                   0.96             147.917
168                   1.13             148.673
165                   1.06             155.660
• Find a trend line using the deseasonalized data:
b_1 = 2.34, b_0 = 124.78
• Develop a forecast using this trend and multiply the forecast by the appropriate seasonal index.
$$ \hat{Y} = 124.78 + 2.34X = 124.78 + 2.34(13) = 155.2 \quad \text{(forecast before adjustment for seasonality)} $$
$$ \hat{Y} \times I_1 = 155.2 \times 0.85 = 131.92 $$
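The trend-and-adjust step can be sketched in code as well. This illustration (variable names are hypothetical) deseasonalizes the Turner sales with the rounded indices 0.85, 0.96, 1.13 and 1.06, fits a least-squares trend line against periods 1 to 12, and forecasts period 13. Its coefficients differ slightly from the slide values of 2.34 and 124.78, presumably because of rounding in the original calculation.

```python
sales = [108, 125, 150, 141, 116, 134, 159, 152, 123, 142, 168, 165]
seasonal_index = [0.85, 0.96, 1.13, 1.06]  # quarters 1..4, rounded values

# Step 2: deseasonalize by dividing each observation by its seasonal index
deseasonalized = [s / seasonal_index[i % 4] for i, s in enumerate(sales)]

# Step 3: least-squares trend line through (period, deseasonalized sales)
periods = list(range(1, len(sales) + 1))
n = len(periods)
x_mean = sum(periods) / n
y_mean = sum(deseasonalized) / n
b1 = (sum((x - x_mean) * (y - y_mean) for x, y in zip(periods, deseasonalized))
      / sum((x - x_mean) ** 2 for x in periods))
b0 = y_mean - b1 * x_mean

# Steps 4 and 5: trend forecast for period 13, then apply the quarter 1 index
trend_13 = b0 + b1 * 13
forecast_13 = trend_13 * seasonal_index[(13 - 1) % 4]

print(f"trend line: Y-hat = {b0:.2f} + {b1:.2f}X")
print(f"period 13: trend = {trend_13:.1f}, seasonally adjusted = {forecast_13:.1f}")
```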
Using Time Series to Forecast the Demand for Salt in the US
Forecast demand for the next four quarters.

Quarter      Demand Dt
II, 2006     8000
III, 2006    13000
IV, 2006     23000
I, 2007      34000
II, 2007     10000
III, 2007    18000
IV, 2007     23000
I, 2008      38000
II, 2008     12000
III, 2008    13000
IV, 2008     32000
I, 2009      41000
Deseasonalized Demand

t    D_t     \bar{D}_t (deseasonalized demand)
1    8000
2    13000
3    23000   19,750
4    34000   20,625
5    10000   21,250
6    18000   21,750
7    23000   22,500
8    38000   22,125
9    12000   22,625
10   13000   24,125
11   32000
12   41000
Time Series of Demand
$$ \bar{D}_t = L + Tt = 18439 + 524t $$
[Figure: demand D_t and deseasonalized demand \bar{D}_t plotted against period 1 to 12; vertical axis: demand from 0 to 50,000]
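Estimating Seasonal Factors
Here \bar{D}_t is the deseasonalized demand given by the regression line, \bar{D}_t = 18439 + 524t, and the seasonal factor for each period is \bar{S}_t = D_t / \bar{D}_t.

t    D_t     \bar{D}_t   \bar{S}_t
1    8000    18963       0.42 = 8000/18963
2    13000   19487       0.67 = 13000/19487
3    23000   20011       1.15 = 23000/20011
4    34000   20535       1.66 = 34000/20535
5    10000   21059       0.47 = 10000/21059
6    18000   21583       0.83 = 18000/21583
7    23000   22107       1.04 = 23000/22107
8    38000   22631       1.68 = 38000/22631
9    12000   23155       0.52 = 12000/23155
10   13000   23679       0.55 = 13000/23679
11   32000   24203       1.32 = 32000/24203
12   41000   24727       1.66 = 41000/24727

S_1 = (\bar{S}_1 + \bar{S}_5 + \bar{S}_9)/3 = 0.47;   S_2 = (\bar{S}_2 + \bar{S}_6 + \bar{S}_{10})/3 = 0.68
S_3 = (\bar{S}_3 + \bar{S}_7 + \bar{S}_{11})/3 = 1.17;   S_4 = (\bar{S}_4 + \bar{S}_8 + \bar{S}_{12})/3 = 1.67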
Estimating the Forecast
Using the original equation, we can forecast the next four periods of demand:
$$ F_{t+l} = [L + (t + l)T] S_{t+l} $$
F_13 = (L + 13T)S_1 = [18439 + (13)(524)](0.47) = 11,868
F_14 = (L + 14T)S_2 = [18439 + (14)(524)](0.68) = 17,527
F_15 = (L + 15T)S_3 = [18439 + (15)(524)](1.17) = 30,770
F_16 = (L + 16T)S_4 = [18439 + (16)(524)](1.67) = 44,794
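The whole forecasting chain for this example (level and trend from the deseasonalized demand, seasonal factors, then forecasts) fits in a short script. The sketch below is an added illustration with hypothetical variable names. It keeps full precision throughout, so its forecasts differ from the slide values by a fraction of a percent, since the slides round L, T and the seasonal factors before multiplying.

```python
# Quarterly demand for periods 1..12 and deseasonalized demand for periods 3..10
demand = [8000, 13000, 23000, 34000, 10000, 18000,
          23000, 38000, 12000, 13000, 32000, 41000]
deseasonalized = {3: 19750, 4: 20625, 5: 21250, 6: 21750,
                  7: 22500, 8: 22125, 9: 22625, 10: 24125}
p = 4  # periodicity

# Level L and trend T: least-squares line through (t, deseasonalized demand)
ts = list(deseasonalized)
ds = list(deseasonalized.values())
t_mean = sum(ts) / len(ts)
d_mean = sum(ds) / len(ds)
T = (sum((t - t_mean) * (d - d_mean) for t, d in zip(ts, ds))
     / sum((t - t_mean) ** 2 for t in ts))
L = d_mean - T * t_mean

# Seasonal factor for each period, then the average factor for each season
ratios = [d / (L + T * t) for t, d in enumerate(demand, start=1)]
S = [sum(ratios[q::p]) / len(ratios[q::p]) for q in range(p)]

# Forecasts for the next four periods: F = [L + T*t] * S for the matching season
forecasts = {t: (L + T * t) * S[(t - 1) % p] for t in range(13, 17)}

print(f"L = {L:.0f}, T = {T:.0f}")
print("seasonal factors:", [round(s, 2) for s in S])
for t, f in forecasts.items():
    print(f"F{t} = {f:,.0f}")
```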
The Decomposition Method of Forecasting with Trend and Seasonal Components
• Decomposition is the process of isolating linear trend and seasonal factors to develop more accurate forecasts.
• There are five steps to decomposition:
1. Compute seasonal indices using CMAs.
2. Deseasonalize the data by dividing each number by its seasonal index.
3. Find the equation of a trend line using the deseasonalized data.
4. Forecast for future periods using the trend line.
5. Multiply the trend line forecast by the appropriate seasonal index.
