
Multiple Linear Regression

and Correlation

Prepared by: Dr. Indra, S.Si, M.Si


Multiple Linear Regression
• Response Variable: Y
• Explanatory Variables: X1,...,Xk
• Model (Extension of Simple Regression):
$E(Y) = \alpha + \beta_1 X_1 + \cdots + \beta_k X_k, \qquad V(Y) = \sigma^2$
• Partial Regression Coefficients ($\beta_i$): effect of increasing $X_i$ by 1 unit, holding all other predictors constant.
• Computer packages fit these models; hand calculations are very tedious.
Prediction Equation & Residuals
• Model Parameters: $\alpha, \beta_1, \ldots, \beta_k, \sigma$
• Estimators: $a, b_1, \ldots, b_k, s$
• Least squares prediction equation: $\hat{Y} = a + b_1 X_1 + \cdots + b_k X_k$
• Residuals: $e = Y - \hat{Y}$
• Error Sum of Squares: $SSE = \sum e^2 = \sum (Y - \hat{Y})^2$
• Estimated conditional standard deviation: $s = \sqrt{\dfrac{SSE}{n - k - 1}}$
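A minimal sketch, using simulated data and hypothetical variable names, of how the prediction equation, residuals, SSE, and $s$ can be computed by least squares in Python:

```python
import numpy as np

# Simulated data: n observations on k = 2 hypothetical predictors
rng = np.random.default_rng(0)
n, k = 50, 2
X = rng.normal(size=(n, k))
y = 10 + 2 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=3, size=n)

# Least squares fit using a design matrix with an intercept column
D = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(D, y, rcond=None)   # [a, b1, b2]

y_hat = D @ coef                    # predictions (Y-hat)
e = y - y_hat                       # residuals
SSE = np.sum(e ** 2)                # error sum of squares
s = np.sqrt(SSE / (n - k - 1))      # estimated conditional standard deviation
print(coef, SSE, s)
```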
Commonly Used Plots
• Scatterplot: Bivariate plot of pairs of variables; does not adjust for other variables. Some software packages plot a matrix of scatterplots.
• Conditional Plot (Coplot): Plot of Y versus a predictor variable, separately for certain ranges of a second predictor variable. Can show whether the relationship between Y and X1 is the same across levels of X2.
• Partial Regression (Added-Variable) Plot: Plots residuals from two regressions to show the association between Y and X2 after removing the effect of X1 (residuals of Y on X1 plotted against residuals of X2 on X1); see the sketch below.
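A minimal sketch of building an added-variable plot by hand, assuming simulated data and the statsmodels and matplotlib packages (data and variable names are illustrative only):

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Simulated data for Y, X1, X2 (illustration only)
rng = np.random.default_rng(1)
x1 = rng.uniform(0, 10, 200)
x2 = 0.5 * x1 + rng.normal(size=200)
y = 3 + 2 * x1 + 1.2 * x2 + rng.normal(scale=2, size=200)

# Residuals of Y regressed on X1, and of X2 regressed on X1
res_y  = sm.OLS(y,  sm.add_constant(x1)).fit().resid
res_x2 = sm.OLS(x2, sm.add_constant(x1)).fit().resid

# Added-variable (partial regression) plot for X2, adjusting for X1
plt.scatter(res_x2, res_y)
plt.xlabel("residuals of X2 on X1")
plt.ylabel("residuals of Y on X1")
plt.show()
```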
Example - Airfares 2002Q4
• Response Variable: Average Fare (Y, in $)
• Explanatory Variables:
– Distance (X1, in miles)
– Average weekly passengers (X2)
• Data: 1000 city pairs for 4th Quarter 2002
• Source: U.S. DOT
[Table: descriptive statistics for AVEFARE, DISTANCE, and AVEPASS]
Example - Airfares 2002Q4
Scatterplot Matrix of Average Fare, Distance, and Average
Passengers (produced by STATA):
[Figure: scatterplot matrix of avefare, distance, and avepass]
Example - Airfares 2002Q4
Partial Regression Plots: Showing whether a new predictor is
associated with Y, after removing effects of other predictor(s):
[Figure: two partial regression plots with dependent variable AVEFARE, one for DISTANCE and one for AVEPASS]
After controlling for AVEPASS, DISTANCE is linearly related to AVEFARE. After controlling for DISTANCE, AVEPASS is not related to AVEFARE.
Standard Regression Output
• Analysis of Variance:
– Regression Sum of Squares: $SSR = \sum(\hat{Y} - \bar{Y})^2$, $df_R = k$
– Error Sum of Squares: $SSE = \sum(Y - \hat{Y})^2$, $df_E = n - k - 1$
– Total Sum of Squares: $TSS = \sum(Y - \bar{Y})^2$, $df_T = n - 1$
• Coefficient of Correlation/Determination: $R^2 = SSR/TSS$
• Least Squares Estimates
– Regression Coefficients
– Estimated Standard Errors
– t-statistics
– P-values (Significance levels for 2-sided tests)
Explained Variation
ANOVA Table

Source        df          Sum of Squares                            Mean Square            F
Regression    $k$         $SSR = \sum(\hat{Y} - \bar{Y})^2$         $MSR = SSR/k$          $MSR/MSE$
Error         $n-k-1$     $SSE = \sum(Y - \hat{Y})^2$               $MSE = SSE/(n-k-1)$
Total         $n-1$       $TSS = \sum(Y - \bar{Y})^2 = SSR + SSE$

$SSR$ is the explained variation, $SSE$ the unexplained (random) variation, and $TSS$ the total variation.
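A minimal sketch, with simulated data, of how the ANOVA decomposition, $R^2$, and the overall F-statistic can be obtained from a fitted model (the statsmodels attribute names used are those documented for OLS results):

```python
import numpy as np
import statsmodels.api as sm

# Simulated data with k = 2 predictors
rng = np.random.default_rng(2)
n, k = 60, 2
X = rng.normal(size=(n, k))
y = 5 + 1.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

fit = sm.OLS(y, sm.add_constant(X)).fit()

SSR = fit.ess            # regression (explained) sum of squares
SSE = fit.ssr            # error sum of squares
TSS = fit.centered_tss   # total sum of squares = SSR + SSE
MSR = SSR / k
MSE = SSE / (n - k - 1)

print("R^2 =", SSR / TSS)      # same as fit.rsquared
print("F   =", MSR / MSE)      # same as fit.fvalue
print("p   =", fit.f_pvalue)   # overall F-test P-value
```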
Example - Airfares 2002Q4
[Software regression output for AVEFARE on DISTANCE and AVEPASS: model summary, ANOVA table, and coefficient estimates for the intercept (Constant), DISTANCE, and AVEPASS]
Multicollinearity
• Many social research studies have large numbers
of predictor variables
• Problems arise when the various predictors are
highly related among themselves (collinear)
– Estimated regression coefficients can change dramatically, depending on whether other predictor(s) are included in the model.
– Standard errors of regression coefficients can
increase, causing non-significant t-tests and wide
confidence intervals
– Variables are explaining the same variation in Y
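A minimal sketch of the problem, using two simulated, nearly identical predictors: including both inflates the standard errors, and variance inflation factors (a common diagnostic, assumed here rather than taken from the slides) flag the collinearity:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Two highly correlated (collinear) simulated predictors
rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)    # nearly identical to x1
y = 1 + 2 * x1 + rng.normal(size=n)

D = sm.add_constant(np.column_stack([x1, x2]))

# Standard errors balloon when both collinear predictors are included
print(sm.OLS(y, D).fit().bse)                    # model with x1 and x2
print(sm.OLS(y, sm.add_constant(x1)).fit().bse)  # model with x1 alone

# Variance inflation factors for the two predictors (columns 1 and 2 of D)
print([variance_inflation_factor(D, i) for i in (1, 2)])
```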
Testing for the Overall Model - F-test

• Tests whether any of the explanatory variables are associated with the response
• $H_0$: $\beta_1 = \cdots = \beta_k = 0$ (none of the $X$s is associated with $Y$)
• $H_A$: not all $\beta_i = 0$

$$T.S.:\; F_{obs} = \frac{MSR}{MSE} = \frac{R^2/k}{(1 - R^2)/(n - (k+1))} \qquad P\text{-value}:\; P(F \geq F_{obs})$$

The P-value is based on the F-distribution with $k$ numerator and $n-(k+1)$ denominator degrees of freedom.
Testing Individual Partial Coefficients - t-tests
• Wish to determine whether the response is
associated with a single explanatory variable, after
controlling for the others

• $H_0$: $\beta_i = 0$   $H_A$: $\beta_i \neq 0$ (2-sided alternative)

$$T.S.:\; t_{obs} = \frac{b_i}{\hat{\sigma}_{b_i}} \qquad R.R.:\; |t_{obs}| \geq t_{\alpha/2,\, n-(k+1)} \qquad P\text{-value}:\; 2P(t \geq |t_{obs}|)$$
Modeling Interactions
• Statistical Interaction: When the effect of one
predictor (on the response) depends on the level
of other predictors.
• Can be modeled (and thus tested) with cross-
product terms (case of 2 predictors):
– $E(Y) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2$
– $X_2 = 0 \;\Rightarrow\; E(Y) = \alpha + \beta_1 X_1$
– $X_2 = 10 \;\Rightarrow\; E(Y) = \alpha + \beta_1 X_1 + 10\beta_2 + 10\beta_3 X_1 = (\alpha + 10\beta_2) + (\beta_1 + 10\beta_3)X_1$
• The effect of increasing $X_1$ by 1 on $E(Y)$ depends on the level of $X_2$, unless $\beta_3 = 0$ (t-test)
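A minimal sketch, with simulated data, of fitting and testing a cross-product (interaction) term:

```python
import numpy as np
import statsmodels.api as sm

# Simulated data with a genuine X1*X2 interaction
rng = np.random.default_rng(5)
n = 120
x1 = rng.uniform(0, 5, n)
x2 = rng.uniform(0, 10, n)
y = 2 + 1.0 * x1 + 0.5 * x2 + 0.3 * x1 * x2 + rng.normal(size=n)

# Include the cross-product term as an extra column in the design matrix
D = sm.add_constant(np.column_stack([x1, x2, x1 * x2]))
fit = sm.OLS(y, D).fit()

# t-test on b3 (the interaction coefficient) tests H0: no interaction
print(fit.params[3], fit.pvalues[3])
```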
Comparing Regression Models
• Conflicting Goals: Explaining variation in Y while
keeping model as simple as possible (parsimony)
• We can test whether a subset of k-g predictors
(including possibly cross-product terms) can be
dropped from a model that contains the remaining
g predictors. $H_0$: $\beta_{g+1} = \cdots = \beta_k = 0$
– Complete Model: Contains all k predictors
– Reduced Model: Eliminates the predictors from H0
– Fit both models, obtaining the Error sum of squares for
each (or R2 from each)
Comparing Regression Models
• $H_0$: $\beta_{g+1} = \cdots = \beta_k = 0$ (after removing the effects of $X_1, \ldots, X_g$, none of the other predictors are associated with $Y$)
• $H_a$: $H_0$ is false

$$\text{Test Statistic}:\; F_{obs} = \frac{(SSE_r - SSE_c)/(k - g)}{SSE_c/[n - (k+1)]} \qquad P = P(F \geq F_{obs})$$

The P-value is based on the F-distribution with $k-g$ and $n-(k+1)$ d.f.; a sketch of this nested-model comparison follows below.


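A minimal sketch, with simulated data, of comparing complete and reduced models; statsmodels' compare_f_test computes the same F-statistic as the formula above:

```python
import numpy as np
import statsmodels.api as sm

# Simulated data: does X3 add anything beyond X1 and X2?
rng = np.random.default_rng(6)
n = 100
X = rng.normal(size=(n, 3))
y = 1 + 2 * X[:, 0] - 1 * X[:, 1] + rng.normal(size=n)   # X3 is irrelevant

complete = sm.OLS(y, sm.add_constant(X)).fit()           # all k = 3 predictors
reduced  = sm.OLS(y, sm.add_constant(X[:, :2])).fit()    # g = 2 predictors

# F = [(SSE_r - SSE_c)/(k - g)] / [SSE_c/(n - (k + 1))]
F_obs, p_val, df_diff = complete.compare_f_test(reduced)
print(F_obs, p_val, df_diff)
```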
Partial Correlation
• Measures the strength of association between Y
and a predictor, controlling for other predictor(s).
• The squared partial correlation represents the fraction of the variation in Y left unexplained by the other predictor(s) that is explained by this predictor.

$$r_{YX_2 \cdot X_1} = \frac{r_{YX_2} - r_{YX_1}\, r_{X_1 X_2}}{\sqrt{(1 - r_{YX_1}^2)(1 - r_{X_1 X_2}^2)}}, \qquad -1 \leq r_{YX_2 \cdot X_1} \leq 1$$
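A minimal sketch, with simulated data, of computing the partial correlation from the three pairwise correlations in the formula above:

```python
import numpy as np

# Simulated data for Y, X1, X2
rng = np.random.default_rng(7)
n = 200
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
y = 1 + 2 * x1 + 0.5 * x2 + rng.normal(size=n)

r_yx2  = np.corrcoef(y, x2)[0, 1]
r_yx1  = np.corrcoef(y, x1)[0, 1]
r_x1x2 = np.corrcoef(x1, x2)[0, 1]

# Partial correlation of Y and X2, controlling for X1, and its square
r_partial = (r_yx2 - r_yx1 * r_x1x2) / np.sqrt((1 - r_yx1**2) * (1 - r_x1x2**2))
print(r_partial, r_partial**2)
```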
Coefficient of Partial Determination
• Measures proportion of the variation in Y that is
explained by X2, out of the variation not explained by
X1
• Square of the partial correlation between Y and X2,
controlling for X1.

$$r_{YX_2 \cdot X_1}^2 = \frac{R^2 - r_{YX_1}^2}{1 - r_{YX_1}^2}, \qquad 0 \leq r_{YX_2 \cdot X_1}^2 \leq 1$$

• where $R^2$ is the coefficient of determination for the model with both $X_1$ and $X_2$: $R^2 = SSR(X_1, X_2)/TSS$
• Extends to more than 2 predictors (pp. 414-415)
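A minimal sketch, with simulated data, of obtaining the coefficient of partial determination from the $R^2$ values of the full and reduced models:

```python
import numpy as np
import statsmodels.api as sm

# Simulated data for Y, X1, X2
rng = np.random.default_rng(8)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1 + 1.0 * x1 + 0.8 * x2 + rng.normal(size=n)

R2_full    = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit().rsquared
r2_x1_only = sm.OLS(y, sm.add_constant(x1)).fit().rsquared

# r^2_{YX2.X1} = (R^2 - r^2_{YX1}) / (1 - r^2_{YX1})
partial_det = (R2_full - r2_x1_only) / (1 - r2_x1_only)
print(partial_det)
```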
Standardized Regression Coefficients
• Measures the change in E(Y) in standard
deviations, per standard deviation change in Xi,
controlling for all other predictors (bi*)
• Allows comparison of variable effects that are
independent of units
• Estimated standardized regression coefficients:
$$b_i^* = b_i \left(\frac{s_{X_i}}{s_Y}\right)$$

• where $b_i$ is the partial regression coefficient and $s_{X_i}$ and $s_Y$ are the sample standard deviations of the two variables
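A minimal sketch, with simulated data, of computing standardized coefficients from the fitted partial regression coefficients and the sample standard deviations:

```python
import numpy as np
import statsmodels.api as sm

# Simulated predictors on very different scales
rng = np.random.default_rng(9)
n = 150
X = rng.normal(size=(n, 2)) * [2.0, 10.0]
y = 3 + 0.8 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(size=n)

fit = sm.OLS(y, sm.add_constant(X)).fit()
b = fit.params[1:]                                   # partial regression coefficients
b_star = b * X.std(axis=0, ddof=1) / y.std(ddof=1)   # b*_i = b_i * (s_Xi / s_Y)
print(b_star)                                        # unit-free, comparable effects
```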
Illustration Using EVIEWS
• The example used a model of sales for Big Andy's Burger Barn.
• Big Andy’s sales revenue depends on the prices charged for
hamburgers, fries, shakes, and so on, and on the level of
advertising.
– The prices charged in a given city are collected together into a weighted price
index that is denoted by P = PRICE and measured in dollars.
– Monthly sales revenue for a given city is denoted by S = SALES and
measured in $1,000 units.
– Advertising expenditure for each city A = ADVERT is also measured in
thousands of dollars.
• The model includes two explanatory variables and a constant and is written as
$SALES = \beta_1 + \beta_2\, PRICE + \beta_3\, ADVERT + e$
Illustration Using EVIEWS
• Descriptive statistics for all variables…
ADVERT PRICE SALES
Mean 1.844000 5.687200 77.37467
Median 1.800000 5.690000 76.50000
Maximum 3.100000 6.490000 91.20000
Minimum 0.500000 4.830000 62.40000
Std. Dev. 0.831677 0.518432 6.488537
Skewness 0.037087 0.061846 -0.010631
Kurtosis 1.704890 1.667162 2.255328

Jarque-Bera 5.258786 5.599242 1.734340


Probability 0.072122 0.060833 0.420139

Sum 138.3000 426.5400 5803.100


Sum Sq. Dev. 51.18480 19.88911 3115.482

Observations 75 75 75
Illustration Using EVIEWS
• Correlation results
Covariance Analysis: Ordinary
Date: 02/17/19 Time: 20:18
Sample: 1 75
Included observations: 75

Correlation
t-Statistic
Probability ADVERT PRICE SALES
ADVERT 1.000000
-----
-----

PRICE 0.026366 1.000000


0.225348 -----
0.8223 -----

SALES 0.222080 -0.625541 1.000000


1.946052 -6.850394 -----
0.0555 0.0000 -----
Illustration Using EVIEWS
• Regression result…
Dependent Variable: SALES
Method: Least Squares
Date: 02/17/19 Time: 20:08
Sample: 1 75
Included observations: 75

Variable Coefficient Std. Error t-Statistic Prob.

C 118.9136 6.351638 18.72172 0.0000


PRICE -7.907854 1.095993 -7.215241 0.0000
ADVERT 1.862584 0.683195 2.726283 0.0080

R-squared 0.448258 Mean dependent var 77.37467


Adjusted R-squared 0.432932 S.D. dependent var 6.488537
S.E. of regression 4.886124 Akaike info criterion 6.049854
Sum squared resid 1718.943 Schwarz criterion 6.142553
Log likelihood -223.8695 Hannan-Quinn criter. 6.086868
F-statistic 29.24786 Durbin-Watson stat 2.183037
Prob(F-statistic) 0.000000
Illustration Using EVIEWS
• Standardized Coefficients and Elasticities…

Scaled Coefficients
Date: 02/17/19 Time: 20:16
Sample: 1 75
Included observations: 75

Standardized Elasticity
Variable Coefficient Coefficient at Means

C 118.9136 NA 1.536855
PRICE -7.907854 -0.631835 -0.581244
ADVERT 1.862584 0.238739 0.044389
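As a check, the PRICE row can be reproduced from the regression coefficient and the descriptive statistics reported earlier (standard deviations and means of PRICE and SALES):

$$b^*_{PRICE} = -7.907854 \times \frac{0.518432}{6.488537} \approx -0.632, \qquad \text{elasticity at means} = -7.907854 \times \frac{5.6872}{77.37467} \approx -0.581$$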
Assignment
• In this case we want to examine the effect
of government expenditure on infrastructure
(GEI, in IDR Billion), the inflation rate
(INF, in percent), and the labor force (TK,
in million people) on national GDP (GDP,
in IDR Billion).
• A multiple linear regression model will be
used which is expressed as follows:

$GDP_t = \alpha_0 + \alpha_1 GEI_t + \alpha_2 INF_t + \alpha_3 TK_t + \varepsilon_t$


Assignment
Based on the case above:
• Calculate descriptive statistics for all variables!
• Calculate the correlation coefficient between variables
and test whether it is statistically significant!
• Perform the regression analysis as follows:
– Test the coefficient significance both simultaneously and individually
– Interpret the estimated coefficients
– Interpret the coefficient of determination
