Regression Analysis For Forecasting: Yosef Daryanto

Regression Analysis for Forecasting

Yosef Daryanto
Atma Jaya Yogyakarta University
2015
Simple Regression Forecasting Method

Yosef Daryanto
Faculty of Economy
Atma Jaya Yogyakarta University
2015
Overview of forecasting techniques
Quantitative: Sufficient quantitative information is available.
* Time series: Predicting the continuation of historical patterns, such as the growth in sales or gross national product
* Explanatory: Understanding how explanatory variables such as prices and advertising affect sales

Qualitative: Little or no quantitative information is available, but sufficient qualitative knowledge exists.
* Predicting the speed of telecommunications around the year 2020
* Forecasting how a large increase in oil prices will affect the consumption of oil

Unpredictable: Little or no information is available.
* Predicting the effects of interplanetary travel
Bivariate Regression
* Bivariate regression analysis (also called simple regression) is a statistical tool for estimating the mathematical relationship between a single independent variable (usually called X) and a dependent variable (usually called Y).
* The dependent variable is the variable for which we want to develop a forecast.
* The objective is to develop an explanatory model relating Y to X (e.g. through the correlation between X and Y).
Population (000) Sales (000)
505 372
351 275
186 214
175 135
132 81
115 144
108 90
79 97

Oil Production (million barrels) World Oil Price ($/barrel)
255 100
340 78
330 80
385 60
380 67
368 69
348 75

Pulp Shipment (millions metric tons) World Pulp Price ($/ton)
10.4 792
11.4 868
11.1 801
11.7 715
12.7 723
14.0 748
15.1 765
15.2 755
Visualization of Data: An Important Step in Regression Analysis
* All four scatter plots below have very similar statistical properties, including the same fitted line Y = 3 + 0.5X, but are visually quite different.
[Figure: four scatter plots]
One should use graphical techniques to inspect the data, looking especially for trend, seasonal, and cyclical components, as well as for outliers.
Least Squares Estimation
* The linear relationship between Y and X is given by
Y = a + bX + e
where a is the intercept, b is the slope of the line, and e denotes the error (the deviation of the observation from the linear relationship).
Least Squares Estimation
* The objective of least squares estimation is to find the values of a and b so that the line Ŷ = a + bX presents the “best fit” to the data.

\[
b = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
  = \frac{n \sum XY - (\sum X)(\sum Y)}{n \sum X^2 - (\sum X)^2}
\]

\[
a = \bar{Y} - b\bar{X}
\]
Cross-sectional Forecasting
Example: Based on the data in the table, develop the relationship between world pulp price and pulp shipment. How much shipment could we expect when the world pulp price increases to 90 $/ton?
Pulp Shipment (millions metric tons) World Pulp Price ($/ton)
10.4 79
11.4 86
11.1 80
11.7 71
12.7 72
14.0 74
15.1 76
15.2 75
Cross-sectional Forecasting
Exercise: A company operates in eight cities. The table below shows the most recent year’s sales and the population of each city. Develop a simple regression model to predict sales based on population.
Population (000) Sales (000)

505 372
351 275
186 214
175 135
132 81
115 144
108 90
79 97
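
A minimal Python sketch of this exercise, assuming the usual least-squares formulas b = [nΣXY - (ΣX)(ΣY)] / [nΣX² - (ΣX)²] and a = Ȳ - bX̄. The function name `fit_line` and the example population of 200 (thousand) are illustrative choices, not part of the original exercise.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line y_hat = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope from the raw-sum form of the least-squares formula.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: the fitted line passes through the point of means.
    a = sum_y / n - b * sum_x / n
    return a, b


population = [505, 351, 186, 175, 132, 115, 108, 79]  # thousands
sales = [372, 275, 214, 135, 81, 144, 90, 97]          # thousands

a, b = fit_line(population, sales)
print(f"Sales = {a:.2f} + {b:.4f} * Population")

# Hypothetical city of 200 thousand people, chosen only for illustration.
forecast_200 = a + b * 200
print(f"Forecast for a population of 200 (000): {forecast_200:.1f}")
```

With these eight cities the slope comes out at roughly 0.67, that is, about 0.67 thousand in extra sales for each additional thousand inhabitants.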
Time series forecast
* In fitting the linear regression, we have so far ignored the time ordering of the data.
* To use the equation to make a forecast for time series data, we need only substitute the appropriate values for time (T).
Car Sales
Month Time period Sales
Jan’09 1 100
Feb’09 2 96
Mar’09 3 107
Apr’09 4 98
May’09 5 103
Jun’09 6 99
Jul’09 7 126
Aug’09 8 128
Sep’09 9 122
Oct’09 10 130
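
The substitution of T into the fitted equation can be sketched in Python as follows. This is only an illustration: the helper name `fit_trend` and the choice of T = 11 (the month after the last observation) are assumptions, not part of the slide.

```python
def fit_trend(values):
    """Fit y = a + b*T by least squares, with T = 1, 2, ..., n."""
    n = len(values)
    periods = range(1, n + 1)
    t_mean = sum(periods) / n
    y_mean = sum(values) / n
    # Deviation form of the least-squares slope.
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in zip(periods, values))
    s_tt = sum((t - t_mean) ** 2 for t in periods)
    b = s_ty / s_tt
    a = y_mean - b * t_mean
    return a, b


# Car sales for time periods 1 to 10 (Jan'09 to Oct'09).
car_sales = [100, 96, 107, 98, 103, 99, 126, 128, 122, 130]

a, b = fit_trend(car_sales)
print(f"Sales = {a:.2f} + {b:.2f} * T")

# Forecast for the next period by substituting T = 11.
forecast_t11 = a + b * 11
print(f"Forecast for T = 11: {forecast_t11:.1f}")
```

The fitted trend rises by a little under 4 units per month, so the forecast for period 11 lands slightly above the last observed value.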
Correlation Coefficient (rxy)
* It often occurs that two variables are related to each other, even though it might be incorrect to say that the value of one of the variables depends upon, or is influenced by, changes in the value of the other variable.
* The coefficient of correlation, r, is a relative measure of the linear association between two numerical variables.

\[
r_{xy} = \frac{\mathrm{Cov}_{xy}}{S_x S_y}
       = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2}\,\sqrt{\sum (Y_i - \bar{Y})^2}}
\]
Correlation Coefficient (rxy)

(Makridakis et al, 1998)


Correlation Coefficient (rxy)
* The sign of the correlation (+ or -) indicates the direction of the relationship between two variables. If it is positive, they tend to increase and decrease together; if it is negative, one tends to increase as the other decreases.
* The magnitude of the correlation coefficient measures the strength of the association: as the absolute value of the correlation moves away from zero, the two variables are more strongly associated.
* Bear in mind that r values are unstable in small samples, measure only linear association, and are seriously influenced by extreme values.
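
As a rough illustration, the correlation coefficient can be computed directly from its definition. The sketch below reuses the population and sales figures from the cross-sectional exercise; the function name `correlation` and the idea of dropping the largest city to probe sensitivity to an extreme value are illustrative assumptions.

```python
from math import sqrt


def correlation(x, y):
    """Pearson correlation: S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    s_yy = sum((yi - y_mean) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)


population = [505, 351, 186, 175, 132, 115, 108, 79]  # thousands
sales = [372, 275, 214, 135, 81, 144, 90, 97]          # thousands

r_all = correlation(population, sales)
print(f"r with all eight cities: {r_all:.3f}")

# Drop the largest city (first row) to see how one extreme point moves r.
r_without_largest = correlation(population[1:], sales[1:])
print(f"r without the largest city: {r_without_largest:.3f}")
```

Comparing the two printed values gives a feel for how much a single extreme observation can shift r in a sample this small.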
Non-linear relationship
* When a scatter diagram indicates a nonlinear relationship between Y and X, we can try to transform the X variable to another form so that the resulting relationship with Y is linear.
* Four of the most common transformations (functions) used to generate new predictor variables are the reciprocal, the log, the square root, and the square:
1/X; log X; √X; X²
Example
* Advertising expenditures versus monthly sales in a hardware store.
Week Advertising Expenditure (X) Sales volume (Y)
1 3.9 1.1
2 4.9 1.7
3 7.6 2.6
4 6.8 2.4
5 5.9 2.3
6 9.1 2.9
7 3.4 0.4
8 11.6 3.2
9 14.1 3.3
10 14.9 3.1
11 10.5 3.2
12 9.9 3.0
13 17.1 3.7
14 12.4 3.3
1/X; log X; √X; X²
[Figure: scatter plot of Sales against Advertising expenditures]
Advertising Expenditure Vs Sales
[Figure: four scatter plots of Y (Sales) against 1/X, Log X, Sq Root X, and Square X, with the annotation "More linear"]
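
One hedged way to compare the transformations numerically, rather than by eye, is to compute the correlation between Y and each transformed X for the advertising data. Using |r| as the yardstick for "more linear" and taking the log to base 10 are assumptions of this sketch (the correlation does not depend on the base of the logarithm).

```python
from math import log10, sqrt


def correlation(x, y):
    """Pearson correlation: S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    s_yy = sum((yi - y_mean) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)


# Advertising expenditure (X) and sales volume (Y) for weeks 1 to 14.
advertising = [3.9, 4.9, 7.6, 6.8, 5.9, 9.1, 3.4,
               11.6, 14.1, 14.9, 10.5, 9.9, 17.1, 12.4]
sales = [1.1, 1.7, 2.6, 2.4, 2.3, 2.9, 0.4,
         3.2, 3.3, 3.1, 3.2, 3.0, 3.7, 3.3]

transforms = {
    "X": lambda x: x,
    "1/X": lambda x: 1 / x,
    "log X": log10,
    "sqrt X": sqrt,
    "X^2": lambda x: x ** 2,
}

# Correlation of Y with each transformed version of X.
r_by_transform = {
    name: correlation([f(x) for x in advertising], sales)
    for name, f in transforms.items()
}

for name, r in r_by_transform.items():
    print(f"{name:>6}: r = {r:+.3f}")
```

A larger |r| after transformation suggests a more nearly linear relationship. The sign for 1/X is negative simply because the reciprocal reverses the ordering of X.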
Multiple Regression Analysis

Yosef Daryanto
Universitas Atma Jaya Yogyakarta
2015
Introduction
* In simple linear regression, the relationship between a single independent variable and a dependent variable is investigated. The relationship between two variables frequently allows one to accurately predict the dependent variable from knowledge of the independent variable.
* Unfortunately, many real-life forecasting situations are not so simple. More than one independent variable is usually necessary in order to predict a dependent variable accurately.
* Regression models with more than one independent variable are called multiple regression models.
* Mr. Charlie observes the selling price and sales volume of gallons of milk for 10 randomly selected weeks, presented in the table below. Construct the linear relationship between the two variables and evaluate the equation using r².
Week Selling Price (X) Sales volume (Y)
1 1.30 10
2 2.00 6
3 1.70 5
4 1.50 12
5 1.60 10
6 1.20 15
7 1.60 5
8 1.40 12
9 1.00 17
10 1.10 20
Introduction
* In forecasting the sales volume of gallons of milk from knowledge of the price per gallon, Mr. Charlie is faced with the problem of making a prediction that is not entirely accurate. He can explain almost 75% of the differences in gallons of milk sold by using one independent variable. Thus, about 25% (1 - r²) of the total variation is unexplained.
* To do a more accurate job of forecasting, he needs to find another predictor variable that will enable him to explain more of the total variation. If Mr. Charlie can reduce the unexplained variation, his forecast will involve less uncertainty and be more accurate.
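
The "almost 75%" figure can be checked with a short Python sketch that fits the simple regression of sales volume on price and computes r² as explained variation over total variation. The variable names are illustrative, and the data are the ten weeks listed in Mr. Charlie's table.

```python
price = [1.30, 2.00, 1.70, 1.50, 1.60, 1.20, 1.60, 1.40, 1.00, 1.10]
volume = [10, 6, 5, 12, 10, 15, 5, 12, 17, 20]

n = len(price)
x_mean = sum(price) / n
y_mean = sum(volume) / n

# Least-squares slope and intercept for volume = a + b * price.
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(price, volume))
s_xx = sum((x - x_mean) ** 2 for x in price)
b = s_xy / s_xx
a = y_mean - b * x_mean

# r^2 = explained variation / total variation.
fitted = [a + b * x for x in price]
explained = sum((f - y_mean) ** 2 for f in fitted)
total = sum((y - y_mean) ** 2 for y in volume)
r_squared = explained / total

print(f"Volume = {a:.2f} + ({b:.2f}) * Price")
print(f"r^2 = {r_squared:.3f}, unexplained share = {1 - r_squared:.3f}")
```

The negative slope reflects that weeks with a higher price tend to show lower sales volume, and roughly a quarter of the variation in volume is left unexplained by price alone.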
Introduction
* A search must be conducted for another independent variable that is related to the sales volume of gallons of milk. However, this new independent variable cannot be too highly related to the independent variable already in use (price per gallon).
* If the two independent variables are highly related to each other, they will explain the same variation, and the addition of the second variable will not improve the forecast.
* This problem is often referred to as multicollinearity.
Correlation Matrix
* Mr. Charlie decides that advertising might help improve his forecast of weekly sales volume. He investigates the relationships among advertising expense, sales volume, and price per gallon by examining a correlation matrix.
* A correlation matrix is constructed by computing the simple correlation coefficient for each pair of variables.
Variables  1    2    3
1          r11  r12  r13
2               r22  r23
3                    r33

\[
R^2 = r_{Y\hat{Y}}^2 = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2}
\]

\[
r^2 = (r)^2
\]
Example
 Mr. Charlie’s data:
Week Sales (1,000) Price per Gallon ($) Advertising ($100)
Y X1 X2
1 10 1.3 9
2 6 2 7
3 5 1.7 5
4 12 1.5 14
5 10 1.6 15
6 15 1.2 12
7 5 1.6 6
8 12 1.4 10
9 17 1 15
10 20 1.1 21

Total 112 14.4 114


Mean 11.2 1.44 11.4
Example
Variables         Sales (Y)  Price (X1)  Advertising (X2)
Sales (Y)                    -0.863      0.891
Price (X1)                               -0.654
Advertising (X2)
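
The entries of this correlation matrix can be reproduced from Mr. Charlie's data with a short Python sketch. The helper `correlation` and the dictionary layout are illustrative choices, not the method used in the slides.

```python
from math import sqrt


def correlation(x, y):
    """Pearson correlation: S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    s_yy = sum((yi - y_mean) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)


data = {
    "Sales (Y)": [10, 6, 5, 12, 10, 15, 5, 12, 17, 20],
    "Price (X1)": [1.3, 2.0, 1.7, 1.5, 1.6, 1.2, 1.6, 1.4, 1.0, 1.1],
    "Advertising (X2)": [9, 7, 5, 14, 15, 12, 6, 10, 15, 21],
}

names = list(data)
# Full symmetric matrix, keyed by (row name, column name).
matrix = {
    (row, col): correlation(data[row], data[col])
    for row in names
    for col in names
}

for row in names:
    cells = "  ".join(f"{matrix[row, col]:+.3f}" for col in names)
    print(f"{row:>16}  {cells}")
```

The correlation of about -0.65 between price and advertising is the number to watch for multicollinearity: the two predictors are related, but less strongly than either one is related to sales.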
Multiple Regression Model
Y = b0 + b1X1 + b2X2 + … + bkXk + e
Ŷ = b0 + b1X1 + b2X2 + … + bkXk

* The calculations in multiple regression analysis are ordinarily performed using computer software packages:
  * Minitab
  * MS Excel
  * Win QSB 2.0
  * Etc.
Win QSB 2.0 Forecasting and LR Module
* Choose Linear Regression and specify the problem
* Fill in the data and then choose Perform Linear Regression
* Specify the dependent and independent variables
* Result of Regression Analysis
* Result of Correlation Analysis and the Regression Equation
Interpreting Regression Coefficients
* Consider the interpretation of b0, b1, and b2 in Mr. Charlie’s fitted regression function, Ŷ = b0 + b1X1 + b2X2.
* The value b0 is the Y-intercept; that is, the value of Ŷ when both X1 and X2 are equal to zero.
* The coefficients b1 and b2 are referred to as the partial or net regression coefficients. Each measures the average change in the dependent variable per unit change in the relevant independent variable, holding the other independent variables constant.
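
For readers without a statistics package, the two-predictor case can be worked out by hand. The sketch below solves the least-squares normal equations in deviation form using Cramer's rule. This shortcut is specific to two predictors and is an assumption of the sketch, not necessarily how Win QSB or Minitab compute the result; the helper name `dev_sum` is illustrative.

```python
def dev_sum(u, v):
    """Sum of cross-products of deviations from the means."""
    u_mean = sum(u) / len(u)
    v_mean = sum(v) / len(v)
    return sum((ui - u_mean) * (vi - v_mean) for ui, vi in zip(u, v))


# Mr. Charlie's data: sales (1,000), price per gallon ($), advertising ($100).
sales = [10, 6, 5, 12, 10, 15, 5, 12, 17, 20]
price = [1.3, 2.0, 1.7, 1.5, 1.6, 1.2, 1.6, 1.4, 1.0, 1.1]
advertising = [9, 7, 5, 14, 15, 12, 6, 10, 15, 21]

s11 = dev_sum(price, price)
s22 = dev_sum(advertising, advertising)
s12 = dev_sum(price, advertising)
s1y = dev_sum(price, sales)
s2y = dev_sum(advertising, sales)
syy = dev_sum(sales, sales)

# Normal equations in deviation form:
#   b1*s11 + b2*s12 = s1y
#   b1*s12 + b2*s22 = s2y
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det

# Intercept: the fitted plane passes through the point of means.
n = len(sales)
b0 = sum(sales) / n - b1 * sum(price) / n - b2 * sum(advertising) / n

# R^2 = explained variation / total variation.
r_squared = (b1 * s1y + b2 * s2y) / syy

print(f"Y_hat = {b0:.2f} + ({b1:.2f}) * X1 + {b2:.3f} * X2")
print(f"R^2 = {r_squared:.3f}")
```

Under these assumptions the sketch gives approximately Ŷ = 16.41 - 8.25 X1 + 0.585 X2. Read as net coefficients, a one-dollar increase in price is associated with about 8.25 thousand fewer gallons sold with advertising held constant, and an extra $100 of advertising with about 0.585 thousand more gallons with price held constant.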
Thank You