Chapter 2 Regression and Forecasting
Jenny Wilson Real Estate Data
SELLING PRICE ($)  SQUARE FOOTAGE  AGE  CONDITION
95,000             1,926           30   Good
119,000            2,069           40   Excellent
124,800            1,720           30   Excellent
135,000            1,396           15   Good
142,000            1,706           32   Mint
145,000            1,847           38   Mint
159,000            1,950           27   Mint
165,000            2,323           30   Excellent
182,000            2,285           26   Mint
183,000            3,752           35   Good
200,000            2,300           18   Good
211,000            2,525           17   Good
215,000            3,800           40   Excellent
219,000            1,740           12   Mint
Table 4.5
Learning Objectives
Scatterplots of Paired Data
Formula
The linear correlation coefficient r measures the strength of
a linear relationship between the paired values in a sample.
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$$
Example: Calculating r
Using the simple random sample of data below, find
the value of r.
Data
x 3 1 3 5
y 5 8 6 4
Example: Calculating r - cont
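With n = 4, Σx = 12, Σy = 23, Σx² = 44, Σy² = 141 and Σxy = 61:
$$r = \frac{4(61) - (12)(23)}{\sqrt{4(44) - 12^2}\,\sqrt{4(141) - 23^2}} = \frac{-32}{\sqrt{32}\,\sqrt{35}} \approx -0.956$$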
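A quick way to check the hand calculation is to code the computational formula for r directly. The sketch below uses only the standard library. The function name `pearson_r` is our own choice and does not come from any package.

```python
import math

def pearson_r(x, y):
    """Linear correlation coefficient r via the computational formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Sample data from the example above
x = [3, 1, 3, 5]
y = [5, 8, 6, 4]
r = pearson_r(x, y)
print(round(r, 3))  # -0.956
```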
Formulas for b0 and b1
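$$b_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} \quad \text{(slope)}$$
$$b_0 = \bar{y} - b_1\bar{x} \quad \text{(y-intercept)}$$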
Calculating the Regression Equation
Data
x 3 1 3 5
y 5 8 6 4
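n = 4, Σx = 12, Σy = 23, Σx² = 44, Σy² = 141, Σxy = 61, so x̄ = 3 and ȳ = 5.75
$$b_1 = \frac{4(61) - (12)(23)}{4(44) - 12^2} = \frac{-32}{32} = -1$$
$$b_0 = \bar{y} - b_1\bar{x} = 5.75 - (-1)(3) = 8.75$$
$$\hat{y} = 8.75 - 1x$$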
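The same sums also give the least-squares line, and the arithmetic above can be reproduced in a few lines. This is a minimal sketch using only built-ins. `least_squares_line` is a hypothetical helper name.

```python
def least_squares_line(x, y):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    b0 = sum_y / n - b1 * (sum_x / n)  # y-intercept: y-bar - b1 * x-bar
    return b0, b1

b0, b1 = least_squares_line([3, 1, 3, 5], [5, 8, 6, 4])
print(b0, b1)  # 8.75 -1.0
```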
Multiple Regression Analysis
Multiple regression models are extensions to
the simple linear model and allow the creation
of models with more than one independent
variable.
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$$
where
Y = dependent variable (response variable)
Xi = ith independent variable (predictor or explanatory variable)
β0 = intercept (value of Y when all Xi = 0)
βi = coefficient of the ith independent variable
k = number of independent variables
ε = random error
Jenny Wilson Realty
Jenny Wilson wants to develop a model to determine
the suggested listing price for houses based on the
size and age of the house.
$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$$
where
Ŷ = predicted value of dependent variable (selling price)
b0 = Y intercept
X1 and X2 = values of the two independent variables (square footage and age), respectively
b1 and b2 = slopes for X1 and X2, respectively
She selects a sample of houses that have sold recently
and records the data shown in Table 4.5
Jenny Wilson Realty
Input Screen for the Jenny Wilson Realty Multiple Regression Example (Program 4.2A)
Jenny Wilson Realty
Output for the Jenny Wilson Realty Multiple Regression Example (Program 4.2B)
Jenny Wilson Realty
The model is statistically significant: the p-value for the F-test is 0.002.
r² = 0.6719, so the model explains about 67% of the variation in selling price (Y).
But the F-test is for the entire model, so it cannot tell us whether one or both of the independent variables are significant.
The p-value of the t-test for each coefficient lets us assess the significance of the individual variables.
Since the p-values for X1 (square footage) and X2 (age) are both less than the significance level of 0.05, both null hypotheses (H0: β1 = 0 and H0: β2 = 0) can be rejected.
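The coefficients and r² in Program 4.2B come from ordinary least squares, so the fit can be sketched with NumPy (assumed to be installed). The snippet below fits Ŷ = b0 + b1X1 + b2X2 to Table 4.5 using `numpy.linalg.lstsq` and computes r² as 1 − SSE/SST. Its r² should be close to the 0.6719 reported above. It does not compute the coefficient p-values, which require the t-distribution. A statistics package such as statsmodels reports those.

```python
import numpy as np

# Table 4.5: selling price ($), square footage (X1) and age (X2)
price = np.array([95000, 119000, 124800, 135000, 142000, 145000, 159000,
                  165000, 182000, 183000, 200000, 211000, 215000, 219000], dtype=float)
sqft = np.array([1926, 2069, 1720, 1396, 1706, 1847, 1950,
                 2323, 2285, 3752, 2300, 2525, 3800, 1740], dtype=float)
age = np.array([30, 40, 30, 15, 32, 38, 27, 30, 26, 35, 18, 17, 40, 12], dtype=float)

# Design matrix: a column of ones for b0, then X1 and X2
X = np.column_stack([np.ones_like(sqft), sqft, age])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
b0, b1, b2 = coef

residuals = price - X @ coef
sse = np.sum(residuals ** 2)                 # unexplained variation
sst = np.sum((price - price.mean()) ** 2)    # total variation
r2 = 1 - sse / sst
print(f"Y-hat = {b0:.0f} + {b1:.2f} X1 + {b2:.2f} X2,  r^2 = {r2:.4f}")
```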
Binary or Dummy Variables
Jenny Wilson Realty
Jenny believes a better model can be developed if
she includes information about the condition of the
property.
X3 = 1 if house is in excellent condition
= 0 otherwise
X4 = 1 if house is in mint condition
= 0 otherwise
Two dummy variables are used to describe the
three categories of condition.
No variable is needed for “good” condition since if
both X3 and X4 = 0, the house must be in good
condition.
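The coding rule for X3 and X4 can be written as a small function. This is a sketch: `condition_dummies` is a hypothetical name, and the labels are assumed to be exactly "Good", "Excellent" and "Mint" as in Table 4.5.

```python
def condition_dummies(condition):
    """Map a condition label to (X3, X4); 'Good' is the baseline category (0, 0)."""
    if condition not in ("Good", "Excellent", "Mint"):
        raise ValueError(f"unknown condition: {condition!r}")
    x3 = 1 if condition == "Excellent" else 0
    x4 = 1 if condition == "Mint" else 0
    return x3, x4

# Condition column of Table 4.5, in row order
conditions = ["Good", "Excellent", "Excellent", "Good", "Mint", "Mint", "Mint",
              "Excellent", "Mint", "Good", "Good", "Good", "Excellent", "Mint"]
dummies = [condition_dummies(c) for c in conditions]
print(dummies[:3])  # [(0, 0), (1, 0), (1, 0)]
```

A third dummy for "good" would make the three dummy columns sum to the intercept column. The design matrix would then be perfectly collinear, which is why one category is always left out.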
Jenny Wilson Realty
Input Screen for the Jenny Wilson Realty Example with Dummy Variables
Output for the Jenny Wilson Realty Example with Dummy Variables
Colonel Motors
Engineers at Colonel Motors want to use
regression analysis to improve fuel efficiency.
They have been asked to study the impact of
weight on miles per gallon (MPG).
MPG  WEIGHT (1,000 LBS.)
12   4.58
13   4.66
15   4.02
18   2.53
19   3.09
19   3.11
20   3.18
23   2.68
24   2.65
33   1.70
36   1.95
42   1.92
Table 4.6
Colonel Motors
Nonlinear Model for MPG Data
Colonel Motors
The nonlinear model is a quadratic model.
The easiest way to work with this model is to
develop a new variable.
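$$X_2 = (\text{weight})^2$$
With X1 = weight, the quadratic model Ŷ = b0 + b1X1 + b2X2 can then be estimated as an ordinary multiple regression.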
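The new-variable trick can be sketched with NumPy (assumed available). The code adds a column equal to weight squared and compares the straight-line fit with the quadratic fit on Table 4.6. The helper `fit` is hypothetical. Adjusted r² uses the standard formula 1 − (1 − r²)(n − 1)/(n − k − 1).

```python
import numpy as np

# Table 4.6: MPG and weight (1,000 lbs.)
mpg = np.array([12, 13, 15, 18, 19, 19, 20, 23, 24, 33, 36, 42], dtype=float)
weight = np.array([4.58, 4.66, 4.02, 2.53, 3.09, 3.11, 3.18, 2.68, 2.65, 1.70, 1.95, 1.92])

def fit(X, y):
    """Least-squares coefficients plus r^2 and adjusted r^2 (X includes the intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    n, p = X.shape  # p counts the intercept, so there are k = p - 1 predictors
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p)
    return coef, r2, adj_r2

ones = np.ones_like(weight)
X_linear = np.column_stack([ones, weight])             # Y-hat = b0 + b1 X1
X_quad = np.column_stack([ones, weight, weight ** 2])  # adds X2 = (weight)^2
coef_lin, r2_lin, adj_lin = fit(X_linear, mpg)
coef_quad, r2_quad, adj_quad = fit(X_quad, mpg)
print(f"linear:    r2={r2_lin:.4f} adj={adj_lin:.4f}")
print(f"quadratic: r2={r2_quad:.4f} adj={adj_quad:.4f}")
```

Adding a term can never lower the plain r². For that reason, adjusted r² is the fairer comparison between the two models.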
Colonel Motors
Program 4.5
A better model, with a smaller significance level (p-value) for the F-test and a larger adjusted r² value
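Nonlinear Model
(1) Hyperbola model:
$$y_i = \beta_1 + \beta_2 \frac{1}{x_i} + u_i$$
(2) Polynomial model:
$$y_i = \beta_1 + \beta_2 x_i + \beta_3 x_i^2 + \dots + \beta_{n+1} x_i^n + u_i$$
(3) Logarithm model:
$$y_i = \beta_1 + \beta_2 \ln x_i + u_i$$
(4) Trigonometric model:
$$y_i = \beta_1 + \beta_2 \sin x_i + u_i$$
(5) Exponential model:
$$y_i = a b^{x_i} u_i \quad \text{or} \quad y_i = e^{\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i}$$
(6) Power function model:
$$y_i = a x_i^{b} u_i$$
(7) Logistic curve:
$$y_i = \frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}} + u_i$$
(8) Modified exponential growth curve:
$$y_i = a + b r^{x_i} + u_i$$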
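ORIGINAL MODEL | SUBSTITUTION | TRANSFORMED MODEL | PARAMETER ESTIMATION
Hyperbola model: $y_i = \beta_1 + \beta_2 \frac{1}{x_i} + u_i$ | $x_i^* = 1/x_i$ | $y_i = \beta_1 + \beta_2 x_i^* + u_i$ | Simple linear regression (OLS)
Polynomial model: $y_i = \beta_1 + \beta_2 x_i + \beta_3 x_i^2 + \dots + \beta_{n+1} x_i^n + u_i$ | $x_{ik} = x_i^k$ | $y_i = \beta_1 + \beta_2 x_{i1} + \beta_3 x_{i2} + \dots + \beta_{n+1} x_{in} + u_i$ | Multiple linear regression (OLS)
Logarithm model: $y_i = \beta_1 + \beta_2 \ln x_i + u_i$ | $x_i^* = \ln x_i$ | $y_i = \beta_1 + \beta_2 x_i^* + u_i$ | Simple linear regression (OLS)
Trigonometric model: $y_i = \beta_1 + \beta_2 \sin x_i + u_i$ | $x_i^* = \sin x_i$ | $y_i = \beta_1 + \beta_2 x_i^* + u_i$ | Simple linear regression (OLS)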
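Each substitution in the table turns a nonlinear model into one that ordinary least squares can fit. The sketch below assumes NumPy is available and uses hypothetical, noise-free data generated from known coefficients, so the recovered values are easy to check. The helper `ols` prepends the intercept column β1.

```python
import numpy as np

def ols(columns, y):
    """OLS with an intercept; returns [beta1, beta2, ...] in the table's notation."""
    X = np.column_stack([np.ones_like(y)] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical noise-free data, for illustration only
x = np.linspace(1.0, 10.0, 20)

beta_hyper = ols([1.0 / x], 2.0 + 3.0 / x)                  # x* = 1/x
beta_log = ols([np.log(x)], 1.0 + 0.5 * np.log(x))          # x* = ln x
beta_trig = ols([np.sin(x)], 4.0 - 2.0 * np.sin(x))         # x* = sin x
beta_poly = ols([x, x ** 2], 1.0 + 2.0 * x - 0.5 * x ** 2)  # x_i1 = x, x_i2 = x^2
print(beta_hyper, beta_log, beta_trig, beta_poly)
```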
Example: Patient Satisfaction
An administrator at Belltown Hospital wanted to study the relationship between patient satisfaction and the patient's age, severity of illness and anxiety level. She randomly selected 23 patients and collected the data presented below. Larger values represent more satisfaction, increased severity and higher anxiety.
ID  Satisfaction  Age  Severity  Anxiety
1   48            50   51        2.3
2   57            36   46        2.3
3   66            40   48        2.2
4   70            41   44        1.8
5   89            28   43        1.8
6   36            49   54        2.9
7   46            42   50        2.2
8   54            45   48        2.4
9   26            52   62        2.9
10  77            29   50        2.1
11  89            29   48        2.4
12  67            43   53        2.4
13  47            38   55        2.2
14  51            34   51        2.3
15  57            53   54        2.2
16  66            36   49        2.0
17  79            33   56        2.5
18  88            29   46        1.9
19  60            33   49        2.1
20  49            55   51        2.4
21  77            29   52        2.3
22  52            44   58        2.9
23  60            43   50        2.3
Example: Patient Satisfaction
Example: Patient Satisfaction
ANOVA
SOURCE      df  SS           MS        F         Significance F
Regression  3   4133.633221  1377.878  13.01446  7.48239E-05
Residual    19  2011.584171  105.8729
Total       22  6145.217391
The p-value (Significance F = 7.48 × 10⁻⁵) is very small, so reject the null hypothesis that all three regression coefficients are zero.
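Every derived column of the ANOVA table follows from SS and df. The sketch below recomputes MS, F and r² = SSR/SST from the table's values. Significance F is the upper-tail probability of an F(3, 19) distribution. With SciPy installed, `scipy.stats.f.sf(f_stat, 3, 19)` would compute it, but that step is left out here to keep the sketch dependency-free.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table
ss_regression, df_regression = 4133.633221, 3
ss_residual, df_residual = 2011.584171, 19

ms_regression = ss_regression / df_regression   # MS = SS / df
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual            # F = MS(regression) / MS(residual)
r_squared = ss_regression / (ss_regression + ss_residual)  # SSR / SST
print(f"MSR={ms_regression:.3f}  MSE={ms_residual:.4f}  F={f_stat:.5f}  r2={r_squared:.4f}")
```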
Example: Patient Satisfaction
Example: Patient Satisfaction
RESIDUAL OUTPUT
Observation Predicted Satisfaction Residuals
1 48.58883167 -0.588831665
2 68.86281401 -11.86281401
3 63.55103328 2.448966722
4 68.44955019 1.55044981
5 84.8495919 4.150408101
6 42.63361407 -6.633614068
7 59.79858572 -13.79858572
8 55.77683617 -1.776836171
9 33.67541465 -7.675414646
10 76.39402496 0.605975036
11 75.14192675 13.85807325
12 54.86794441 12.13205559
13 61.31033028 -14.31033028
14 67.95392224 -16.95392224
15 43.82146348 13.17853652
16 69.44900661 -3.449006606
17 64.11210601 14.88789399
18 80.78025374 7.219746265
19 72.21865794 -12.21865794
20 41.67593771 7.32406229
21 73.33960743 3.660392573
22 46.0215824 5.978417597
23 57.72696441 2.273035591
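The ANOVA and residual output both come from one least-squares fit. The sketch below refits the three-predictor model with NumPy (assumed available). If the data table above was transcribed exactly, the printed predicted values and residuals for the first patients should match the RESIDUAL OUTPUT, and SSR, SSE and SST should match the ANOVA table. The identity SSR + SSE = SST holds for any least-squares fit that includes an intercept.

```python
import numpy as np

# Columns: satisfaction, age, severity, anxiety (23 patients)
data = np.array([
    [48, 50, 51, 2.3], [57, 36, 46, 2.3], [66, 40, 48, 2.2], [70, 41, 44, 1.8],
    [89, 28, 43, 1.8], [36, 49, 54, 2.9], [46, 42, 50, 2.2], [54, 45, 48, 2.4],
    [26, 52, 62, 2.9], [77, 29, 50, 2.1], [89, 29, 48, 2.4], [67, 43, 53, 2.4],
    [47, 38, 55, 2.2], [51, 34, 51, 2.3], [57, 53, 54, 2.2], [66, 36, 49, 2.0],
    [79, 33, 56, 2.5], [88, 29, 46, 1.9], [60, 33, 49, 2.1], [49, 55, 51, 2.4],
    [77, 29, 52, 2.3], [52, 44, 58, 2.9], [60, 43, 50, 2.3],
])
y = data[:, 0]
X = np.column_stack([np.ones(len(y)), data[:, 1:]])  # intercept, age, severity, anxiety

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
predicted = X @ coef
residuals = y - predicted                      # residual = actual - predicted

sst = np.sum((y - y.mean()) ** 2)              # total SS
sse = residuals @ residuals                    # residual SS
ssr = np.sum((predicted - y.mean()) ** 2)      # regression SS
for i in range(3):
    print(i + 1, round(predicted[i], 4), round(residuals[i], 4))
print("SSR", round(ssr, 4), "SSE", round(sse, 4), "SST", round(sst, 4))
```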
Time Series Forecasting
Quarter    Demand Dt
II, 2006   8000
III, 2006  13000
IV, 2006   23000
I, 2007    34000
II, 2007   10000
III, 2007  18000
IV, 2007   23000
I, 2008    38000
II, 2008   12000
III, 2008  13000
IV, 2008   32000
I, 2009    41000
Forecast demand for the next four quarters.
Time Series Forecasting
[Figure: quarterly demand plotted over time]
Forecasting Models
Forecasting Techniques
Time-Series Models
Measures of Forecast Accuracy
Using a naïve forecasting model (each year's forecast is the previous year's actual sales), we can compute the MAD:
YEAR  ACTUAL SALES OF CD PLAYERS  FORECAST SALES  ABSOLUTE VALUE OF ERRORS (DEVIATION), |ACTUAL – FORECAST|
1     110                         -               -
2     100                         110             |100 – 110| = 10
3     120                         100             |120 – 100| = 20
4     140                         120             |140 – 120| = 20
5     170                         140             |170 – 140| = 30
6     150                         170             |150 – 170| = 20
7     160                         150             |160 – 150| = 10
8     190                         160             |190 – 160| = 30
9     200                         190             |200 – 190| = 10
10    190                         200             |190 – 200| = 10
11    -                           190             -
Table 5.2
Sum of |errors| = 160
$$\text{MAD} = \frac{\sum |\text{forecast error}|}{n} = \frac{160}{9} = 17.8$$
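The naïve forecast and its MAD take only a few lines to compute. This sketch uses the Table 5.2 sales, with years 1 through 10 held as a plain list. The variable names are illustrative.

```python
# Annual CD player sales, years 1-10 (Table 5.2)
sales = [110, 100, 120, 140, 170, 150, 160, 190, 200, 190]

# Naive model: the forecast for year t+1 is the actual value in year t
forecasts = sales[:-1]   # forecasts for years 2-10
actuals = sales[1:]      # actuals for years 2-10
abs_errors = [abs(a - f) for a, f in zip(actuals, forecasts)]
mad = sum(abs_errors) / len(abs_errors)
next_forecast = sales[-1]  # naive forecast for year 11
print(sum(abs_errors), round(mad, 1), next_forecast)  # 160 17.8 190
```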
Measures of Forecast Accuracy
There are other popular measures of forecast accuracy.
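The mean squared error:
$$\text{MSE} = \frac{\sum (\text{error})^2}{n}$$
The mean absolute percent error:
$$\text{MAPE} = \frac{\sum \left|\dfrac{\text{error}}{\text{actual}}\right|}{n} \times 100\%$$
And bias is the average error.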
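The three measures can be applied to the same naïve CD player forecasts for comparison with the MAD. This is a minimal sketch that takes error = actual − forecast, matching Table 5.2, so a positive bias means the forecasts ran low on average. The function names are our own.

```python
def errors(actuals, forecasts):
    return [a - f for a, f in zip(actuals, forecasts)]  # error = actual - forecast

def mse(actuals, forecasts):
    e = errors(actuals, forecasts)
    return sum(x * x for x in e) / len(e)

def mape(actuals, forecasts):
    e = errors(actuals, forecasts)
    return 100 * sum(abs(x) / a for x, a in zip(e, actuals)) / len(e)

def bias(actuals, forecasts):
    e = errors(actuals, forecasts)
    return sum(e) / len(e)

# Naive forecasts for the CD player data (years 2-10)
sales = [110, 100, 120, 140, 170, 150, 160, 190, 200, 190]
actuals, forecasts = sales[1:], sales[:-1]
print(round(mse(actuals, forecasts), 2),
      round(mape(actuals, forecasts), 2),
      round(bias(actuals, forecasts), 2))  # 377.78 11.58 8.89
```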
Components of a Time-Series
A time series typically has four components:
1. Trend (T) is the gradual upward or downward
movement of the data over time.
2. Seasonality (S) is a pattern of demand fluctuations
above or below the trend line that repeats at regular
intervals.
3. Cycles (C) are patterns in annual data that occur every
several years.
4. Random variations (R) are “blips” in the data caused by
chance or unusual situations, and follow no discernible
pattern.
Moving Averages
Moving averages can be used when demand
is relatively steady over time.
The next forecast is the average of the most
recent n data values from the time series.
This method tends to smooth out short-term irregularities in the data series.
Moving Averages
Mathematically:
$$F_{t+1} = \frac{Y_t + Y_{t-1} + \dots + Y_{t-n+1}}{n}$$
Where:
Ft+1 = forecast for time period t + 1
Yt = actual value in time period t
n = number of periods to average
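The formula translates directly into a sliding window. Table 5.3's figures are not reproduced in this text, so the sketch below borrows the monthly shed sales listed in Table 5.4 for illustration. `moving_average_forecasts` is a hypothetical helper name.

```python
def moving_average_forecasts(values, n):
    """F(t+1) = average of the n most recent actual values, for every full window."""
    return [sum(values[t - n:t]) / n for t in range(n, len(values) + 1)]

# Monthly shed sales, January-December (from Table 5.4)
shed_sales = [10, 12, 13, 16, 19, 23, 26, 30, 28, 18, 16, 14]
forecasts = moving_average_forecasts(shed_sales, 3)
# forecasts[0] is April's forecast; forecasts[-1] is next January's
print([round(f, 2) for f in forecasts])
```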
Wallace Garden Supply
Table 5.3
Weighted Moving Averages
Weighted moving averages use weights to put more emphasis on recent periods.
This is often used when a trend or other pattern is
emerging.
$$F_{t+1} = \frac{\sum (\text{weight in period } i)(\text{actual value in period } i)}{\sum (\text{weights})}$$
Mathematically:
$$F_{t+1} = \frac{w_1 Y_t + w_2 Y_{t-1} + \dots + w_n Y_{t-n+1}}{w_1 + w_2 + \dots + w_n}$$
where
wi = weight for the ith observation
Wallace Garden Supply
MONTH      ACTUAL SHED SALES  THREE-MONTH WEIGHTED MOVING AVERAGE
January 10
February 12
March 13
April 16 [(3 X 13) + (2 X 12) + (10)]/6 = 12.17
May 19 [(3 X 16) + (2 X 13) + (12)]/6 = 14.33
June 23 [(3 X 19) + (2 X 16) + (13)]/6 = 17.00
July 26 [(3 X 23) + (2 X 19) + (16)]/6 = 20.50
August 30 [(3 X 26) + (2 X 23) + (19)]/6 = 23.83
September 28 [(3 X 30) + (2 X 26) + (23)]/6 = 27.50
October 18 [(3 X 28) + (2 X 30) + (26)]/6 = 28.33
November 16 [(3 X 18) + (2 X 28) + (30)]/6 = 23.33
December 14 [(3 X 16) + (2 X 18) + (28)]/6 = 18.67
January — [(3 X 14) + (2 X 16) + (18)]/6 = 15.33
Table 5.4
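The weighted column of Table 5.4 can be reproduced with weights 3, 2 and 1, applied from the most recent month backward. This is a minimal sketch. `weighted_moving_average_forecasts` is a hypothetical name, and `weights[0]` is assumed to belong to the latest observation.

```python
def weighted_moving_average_forecasts(values, weights):
    """F(t+1) = sum(w_i * Y) / sum(w_i); weights[0] applies to the most recent value."""
    n = len(weights)
    total = sum(weights)
    forecasts = []
    for t in range(n, len(values) + 1):
        window = values[t - n:t][::-1]  # most recent first: Y_t, Y_(t-1), ...
        forecasts.append(sum(w * y for w, y in zip(weights, window)) / total)
    return forecasts

# Monthly shed sales, January-December (Table 5.4)
shed_sales = [10, 12, 13, 16, 19, 23, 26, 30, 28, 18, 16, 14]
forecasts = weighted_moving_average_forecasts(shed_sales, weights=(3, 2, 1))
print([round(f, 2) for f in forecasts])
# [12.17, 14.33, 17.0, 20.5, 23.83, 27.5, 28.33, 23.33, 18.67, 15.33]
```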
Exponential Smoothing
Exponential smoothing is a type of moving average
that is easy to use and requires little record keeping of
data.
Exponential Smoothing
Mathematically:
$$F_{t+1} = F_t + \alpha(Y_t - F_t)$$
Where:
Ft+1 = new forecast (for time period t + 1)
Ft = previous forecast (for time period t)
α = smoothing constant (0 ≤ α ≤ 1)
Yt = previous period's actual demand
Selecting the Smoothing Constant
Exponential Smoothing
Port of Baltimore Exponential Smoothing Forecast for α = 0.10 and α = 0.50
QUARTER  ACTUAL TONNAGE UNLOADED  FORECAST USING α = 0.10               FORECAST USING α = 0.50
1        180                      175                                   175
2        168                      175.5 = 175.00 + 0.10(180 – 175)      177.5
3        159                      174.75 = 175.50 + 0.10(168 – 175.50)  172.75
4        175                      173.18 = 174.75 + 0.10(159 – 174.75)  165.88
5        190                      173.36 = 173.18 + 0.10(175 – 173.18)  170.44
6        205                      175.02 = 173.36 + 0.10(190 – 173.36)  180.22
7        180                      178.02 = 175.02 + 0.10(205 – 175.02)  192.61
8        182                      178.22 = 178.02 + 0.10(180 – 178.02)  186.30
9        ?                        178.60 = 178.22 + 0.10(182 – 178.22)  184.15
Table 5.5
Exponential Smoothing
Absolute Deviations and MADs for the Port of
Baltimore Example
QUARTER  ACTUAL TONNAGE UNLOADED  FORECAST WITH α = 0.10  ABSOLUTE DEVIATIONS FOR α = 0.10  FORECAST WITH α = 0.50  ABSOLUTE DEVIATIONS FOR α = 0.50
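The Table 5.5 forecasts and the absolute-deviation MADs for both smoothing constants can be recomputed with the smoothing recursion. This is a sketch with hypothetical helper names. It starts from the initial forecast of 175 used in the table and keeps full precision rather than rounding each step to two decimals, as the table does.

```python
def exponential_smoothing(actuals, alpha, first_forecast):
    """F(t+1) = F(t) + alpha * (Y(t) - F(t)); returns forecasts for periods 1..len(actuals)+1."""
    forecasts = [first_forecast]
    for y in actuals:
        f = forecasts[-1]
        forecasts.append(f + alpha * (y - f))
    return forecasts

def mad(actuals, forecasts):
    return sum(abs(y - f) for y, f in zip(actuals, forecasts)) / len(actuals)

tonnage = [180, 168, 159, 175, 190, 205, 180, 182]  # quarters 1-8, Table 5.5
for alpha in (0.10, 0.50):
    f = exponential_smoothing(tonnage, alpha, first_forecast=175)
    # f[-1] is the quarter 9 forecast; f[:-1] pairs with quarters 1-8
    print(alpha, round(f[-1], 2), round(mad(tonnage, f[:-1]), 2))
```

On these eight quarters, α = 0.10 gives the lower MAD (about 10.31, against about 12.33 for α = 0.50).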
[Figure: demand over four years, showing the trend component, seasonal peaks, the actual demand line and the average demand over 4 years]
Estimating Level and Trend
Before estimating level and trend, demand data
must be deseasonalized
Deseasonalized demand = demand that would have
been observed in the absence of seasonal
fluctuations
Periodicity (p)
– the number of periods after which the seasonal cycle
repeats itself
– for demand at Tahoe Salt p = 4
Deseasonalizing Demand
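$$\bar{D}_t = \frac{D_{t-(p/2)} + D_{t+(p/2)} + \sum_{i=t+1-(p/2)}^{t-1+(p/2)} 2D_i}{2p} \quad \text{for } p \text{ even}$$
$$\bar{D}_t = \sum_{i=t-\lfloor p/2 \rfloor}^{t+\lfloor p/2 \rfloor} \frac{D_i}{p} \quad \text{for } p \text{ odd}$$
(for p odd, p/2 is truncated to the lower integer)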
Deseasonalizing Demand
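For p = 4 and t = 3, the sum runs from i = t + 1 − (p/2) = 2 to i = t − 1 + (p/2) = 4:
$$\bar{D}_3 = \frac{D_1 + D_5 + \sum_{i=2}^{4} 2D_i}{8} = \frac{D_1 + D_5 + 2D_2 + 2D_3 + 2D_4}{8}$$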
Deseasonalizing Demand
$$\bar{D}_3 = \frac{D_1 + D_5 + 2D_2 + 2D_3 + 2D_4}{8} = \frac{8000 + 10000 + [(2)(13000) + (2)(23000) + (2)(34000)]}{8} = 19750$$
$$\bar{D}_4 = \frac{D_2 + D_6 + \sum_{i=3}^{5} 2D_i}{8} = \frac{D_2 + D_6 + 2(D_3 + D_4 + D_5)}{8} = \frac{13000 + 18000 + [(2)(23000) + (2)(34000) + (2)(10000)]}{8} = 20625$$
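The even- and odd-periodicity formulas can be wrapped in one function that fills the whole D̄t column. This is a sketch under two assumptions: t is 1-based as in the slides, and D̄t is only computed where the full window fits. `deseasonalize` is a hypothetical name.

```python
def deseasonalize(demand, p):
    """Return {t: D-bar_t} (1-based t) for every period with a full window around it."""
    half = p // 2  # p/2, truncated for odd p
    result = {}
    for t in range(half + 1, len(demand) - half + 1):
        if p % 2 == 0:
            ends = demand[t - half - 1] + demand[t + half - 1]  # D(t-p/2) + D(t+p/2)
            middle = sum(demand[i - 1] for i in range(t + 1 - half, t + half))  # i = t+1-p/2 .. t-1+p/2
            result[t] = (ends + 2 * middle) / (2 * p)
        else:
            result[t] = sum(demand[i - 1] for i in range(t - half, t + half + 1)) / p
    return result

# Quarterly demand D1..D12 (II 2006 through I 2009)
demand = [8000, 13000, 23000, 34000, 10000, 18000, 23000, 38000, 12000, 13000, 32000, 41000]
dbar = deseasonalize(demand, p=4)
print(dbar[3], dbar[4])  # 19750.0 20625.0
```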
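Table 7.3.1 Calculating the long-term trend with moving averages
PERIOD  VALUE  3-TERM MOVING AVERAGE  4-TERM MOVING AVERAGE   CENTERED 4-TERM MOVING AVERAGE
t1      y1
t2      y2     (y1+y2+y3)/3
                                      ŷ23 = (y1+y2+y3+y4)/4
t3      y3     (y2+y3+y4)/3                                   ŷ3 = (ŷ23 + ŷ34)/2
                                      ŷ34 = (y2+y3+y4+y5)/4
t4      y4     (y3+y4+y5)/3                                   ŷ4 = (ŷ34 + ŷ45)/2
                                      ŷ45 = (y3+y4+y5+y6)/4
t5      y5     (y4+y5+y6)/3
…       …      …                      …                       …
tn      yn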
Tea demand in China
Table 7.3.2 Calculating the long-term trend with moving averages
YEAR  QUARTER  DEMAND  4-QUARTER MOVING AVERAGE  CENTERED MOVING AVERAGE
1980  3        15
      4        18
                       12.0
1981  1        6                                 12.25
                       12.5
      2        9                                 12.75
                       13.0
      3        17                                13.25
                       13.5
      4        20                                13.75
                       14.0
1982  1        8                                 14.25
                       14.5
      2        11                                14.75
                       15.0
      3        19                                15.25
                       15.5
      4        22                                15.75
                       16.0
1983  1        10                                16.25
                       16.5
      2        13                                16.75
                       17.0
      3        21                                17.25
                       17.5
      4        24                                17.75
                       18.0
1984  1        12
      2        15
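With an even number of terms, each moving average falls between two periods, which is why Tables 7.3.1 and 7.3.2 average adjacent pairs to center them. The sketch below reproduces Table 7.3.2. The helper names are hypothetical. Calling `moving_averages(values, 3)` gives the 3-term column of Table 7.3.1, which needs no centering.

```python
def moving_averages(values, k):
    """Simple k-term moving averages; for even k each one sits between two periods."""
    return [sum(values[i:i + k]) / k for i in range(len(values) - k + 1)]

def centered_moving_averages(values, k=4):
    """Average adjacent pairs of k-term averages so each lands on an actual period."""
    ma = moving_averages(values, k)
    return [(a + b) / 2 for a, b in zip(ma, ma[1:])]

# Tea demand, 1980 Q3 through 1984 Q2 (Table 7.3.2)
tea = [15, 18, 6, 9, 17, 20, 8, 11, 19, 22, 10, 13, 21, 24, 12, 15]
ma4 = moving_averages(tea, 4)
centered = centered_moving_averages(tea, 4)
# centered[0] belongs to 1981 Q1, the third value in the series
print(ma4[:3], centered[:3])  # [12.0, 12.5, 13.0] [12.25, 12.75, 13.25]
```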
Turner Industries
YEAR QUARTER SALES CMA SEASONAL RATIO
1 1 108
2 125
3 150
4 141
2 1 116
2 134
3 159
4 152
3 1 123
2 142
3 168
4 165
Turner Industries
To calculate the seasonal ratio for quarter 3 of year 1, we compare the actual sales with an average quarter centered on that time period, the centered moving average (CMA).
We use 1.5 quarters before quarter 3 and 1.5 quarters after it: quarters 2, 3 and 4 of year 1, plus one half of quarter 1, year 1 and one half of quarter 1, year 2.
$$\text{CMA}(q3, y1) = \frac{0.5(108) + 125 + 150 + 141 + 0.5(116)}{4} = 132.00$$
$$\text{Seasonal ratio} = \frac{\text{Sales in quarter 3}}{\text{CMA}} = \frac{150}{132} = 1.136$$
Turner Industries
YEAR QUARTER SALES CMA SEASONAL RATIO
1 1 108
2 125
3 150 132.000 1.136
4 141
2 1 116
2 134
3 159
4 152
3 1 123
2 142
3 168
4 165
Turner Industries
[Figure: Turner Industries sales by quarter; vertical axis: Sales]
Turner Industries
YEAR QUARTER SALES CMA SEASONAL RATIO
1 1 108
2 125
3 150 132.000 1.136
4 141 134.125 1.051
2 1 116 136.375 0.851
2 134 138.875 0.965
3 159 141.125 1.127
4 152 143.000 1.063
3 1 123 145.125 0.848
2 142 147.875 0.960
3 168
4 165
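The CMA and seasonal-ratio columns follow from a single weighted five-quarter window, with half weight on the two outer quarters. This is a sketch with 0-based indexing. The function name `cma` is illustrative.

```python
# Turner Industries quarterly sales, years 1-3
sales = [108, 125, 150, 141, 116, 134, 159, 152, 123, 142, 168, 165]

def cma(values, t):
    """Centered moving average at 0-based index t: half weight on the two outer quarters."""
    window = 0.5 * values[t - 2] + values[t - 1] + values[t] + values[t + 1] + 0.5 * values[t + 2]
    return window / 4

# The CMA needs two quarters on each side, so it exists for indices 2..9
cmas = [cma(sales, t) for t in range(2, len(sales) - 2)]
ratios = [sales[t] / c for t, c in zip(range(2, len(sales) - 2), cmas)]
print([round(c, 3) for c in cmas])
print([round(r, 3) for r in ratios])
```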
Turner Industries
Deseasonalized Data for Turner Industries
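b1 = 2.34, b0 = 124.78
Develop a forecast using this trend, then multiply the forecast by the appropriate seasonal index. Period X = 13 is quarter 1 of year 4, so the quarter 1 index I1 = 0.85 applies.
$$\hat{Y} = 124.78 + 2.34X = 124.78 + 2.34(13) = 155.2 \quad \text{(forecast before adjustment for seasonality)}$$
$$\hat{Y} \times I_1 = 155.2 \times 0.85 = 131.92$$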
Using Time Series to Forecast the Demand for Salt in the US
Deseasonalized Demand
t Dt Dt-bar
1 8000
2 13000
3 23000 19,750
4 34000 20,625
5 10000 21,250
6 18000 21,750
7 23000 22,500
8 38000 22,125
9 12000 22,625
10 13000 24,125
11 32000
12 41000
Time Series of Demand
$$\bar{D}_t = L + Tt = 18439 + 524t$$
[Figure: Dt and Dt-bar plotted against period 1 to 12; vertical axis: Demand]
Estimating Seasonal Factors
t Dt Dt-bar S-bar
1 8000 18963 0.42 = 8000/18963
2 13000 19487 0.67 = 13000/19487
3 23000 20011 1.15 = 23000/20011
4 34000 20535 1.66 = 34000/20535
5 10000 21059 0.47 = 10000/21059
6 18000 21583 0.83 = 18000/21583
7 23000 22107 1.04 = 23000/22107
8 38000 22631 1.68 = 38000/22631
9 12000 23155 0.52 = 12000/23155
10 13000 23679 0.55 = 13000/23679
11 32000 24203 1.32 = 32000/24203
12 41000 24727 1.66 = 41000/24727
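The trend line and the seasonal factors can be reproduced with NumPy (assumed available). The sketch regresses the deseasonalized demand for t = 3 to 10 on t with `numpy.polyfit`, which returns the slope first. It then divides each Dt by the fitted D̄t. The slides round L and T to 18439 and 524 before dividing, so the unrounded factors here can differ in the third decimal.

```python
import numpy as np

demand = np.array([8000, 13000, 23000, 34000, 10000, 18000,
                   23000, 38000, 12000, 13000, 32000, 41000], dtype=float)
# Deseasonalized demand D-bar_t for t = 3..10 (from the deseasonalized demand table)
t_known = np.arange(3, 11)
dbar_known = np.array([19750, 20625, 21250, 21750, 22500, 22125, 22625, 24125], dtype=float)

# Fit D-bar_t = L + T t by least squares; polyfit returns [slope, intercept]
T, L = np.polyfit(t_known, dbar_known, 1)

t_all = np.arange(1, 13)
dbar_fitted = L + T * t_all                # regression estimate of deseasonalized demand
seasonal_factors = demand / dbar_fitted    # S-bar_t = D_t / D-bar_t
print(round(L), round(T))
print(np.round(seasonal_factors, 2))
```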
The Decomposition Method of Forecasting with
Trend and Seasonal Components
Decomposition is the process of isolating linear trend and
seasonal factors to develop more accurate forecasts.
There are five steps to decomposition:
1. Compute seasonal indices using CMAs.
2. Deseasonalize the data by dividing each number by its
seasonal index.
3. Find the equation of a trend line using the
deseasonalized data.
4. Forecast for future periods using the trend line.
5. Multiply the trend line forecast by the appropriate
seasonal index.
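The five steps can be chained on the Turner Industries sales as an end-to-end check. This sketch assumes NumPy is available. It averages the unrounded seasonal ratios per quarter without normalizing them, while the slides round the indices (0.85 for quarter 1). Its coefficients and forecast should therefore land close to, but not exactly on, b0 = 124.78, b1 = 2.34 and 131.92.

```python
import numpy as np

# Turner Industries quarterly sales, years 1-3 (periods X = 1..12)
sales = np.array([108, 125, 150, 141, 116, 134, 159, 152, 123, 142, 168, 165], dtype=float)
n = len(sales)

# Step 1: seasonal ratios from CMAs, grouped by quarter (0 = Q1, ..., 3 = Q4)
ratios = {q: [] for q in range(4)}
for t in range(2, n - 2):
    cma = (0.5 * sales[t - 2] + sales[t - 1] + sales[t] + sales[t + 1] + 0.5 * sales[t + 2]) / 4
    ratios[t % 4].append(sales[t] / cma)
indices = np.array([np.mean(ratios[q]) for q in range(4)])  # seasonal index per quarter

# Step 2: deseasonalize each observation
quarter = np.arange(n) % 4
deseasonalized = sales / indices[quarter]

# Step 3: trend line on the deseasonalized data
periods = np.arange(1, n + 1)
b1, b0 = np.polyfit(periods, deseasonalized, 1)

# Steps 4 and 5: trend forecast for X = 13 (quarter 1 of year 4), then reseasonalize
x_next = 13
trend_forecast = b0 + b1 * x_next
forecast = trend_forecast * indices[(x_next - 1) % 4]
print(np.round(indices, 2), round(b0, 2), round(b1, 2), round(forecast, 2))
```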