Regression Analysis: Li-Ann Lee C. Nalangan
1. Simple Regression
2. Multiple Regression
Learn….
Obs   X    Y
6     29   62
7     33   75
8     31   66
9     32   72
10    30   67
11    30   64
12    33   66
13    36   78
14    35   81
15    35   66
[Figure: scatter plot of Y against X]
What information does a scatter plot give?
◦ The form of the relationship (linear or non-linear)
◦ The strength of the relationship
Possible Relationships between X and Y in Scatter Diagrams
[Figure: Y plotted against X in three panels: (a) Direct linear, (b) Inverse linear, (c) Direct curvilinear]
Types of Relationships
Direct vs. Inverse
◦ Direct: X and Y increase together
◦ Inverse: X and Y move in opposite directions; as one increases, the other decreases
[Figure: Price of durian and Sales, which move in opposite directions (inverse relationship)]
[Figure: Y plotted against X in two panels: (c) No linear relationship, (d) Non-linear relationship]
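Correlation coefficient
$$ r = \frac{SP_{XY}}{\sqrt{SS_X \, SS_Y}} $$
where
$$ SP_{XY} = \sum xy - \frac{(\sum x)(\sum y)}{n}, \qquad SS_X = \sum x^2 - \frac{(\sum x)^2}{n}, \qquad SS_Y = \sum y^2 - \frac{(\sum y)^2}{n} $$
r lies between -1 (perfect inverse linear relationship) and +1 (perfect direct linear relationship).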
Area planted (X)   Area harvested (Y)   XY          X²          Y²
9784               9713                 95031992    95726656    94342369
11267              10526                1.19E+08    1.27E+08    1.11E+08
5681               5846                 33211126    32273761    34175716
14623              12157                1.78E+08    2.14E+08    1.48E+08
778                748                  581944      605284      559504
16151              15648                2.53E+08    2.61E+08    2.45E+08
9881               9598                 94837838    97634161    92121604
163                100                  16300       26569       10000
13828              12229                1.69E+08    1.91E+08    1.5E+08
670                663                  444210      448900      439569
665                669                  444885      442225      447561
n = 11
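A minimal Python sketch of the computational formulas for r, applied to the area planted / area harvested table. The function name `correlation` is hypothetical; it simply evaluates SP_XY, SS_X and SS_Y as sums, exactly as in the formulas above.

```python
import math

def correlation(x, y):
    """Pearson r = SP_XY / sqrt(SS_X * SS_Y), using the sum-based
    computational formulas (hypothetical helper for illustration)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length, n >= 2")
    sum_x, sum_y = sum(x), sum(y)
    sp_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    ss_x = sum(xi * xi for xi in x) - sum_x ** 2 / n
    ss_y = sum(yi * yi for yi in y) - sum_y ** 2 / n
    return sp_xy / math.sqrt(ss_x * ss_y)

# Area planted (X) and area harvested (Y), copied from the table
planted = [9784, 11267, 5681, 14623, 778, 16151, 9881, 163, 13828, 670, 665]
harvested = [9713, 10526, 5846, 12157, 748, 15648, 9598, 100, 12229, 663, 669]
print(round(correlation(planted, harvested), 4))
```

The sum-based form matches the hand-calculation table (columns XY, X², Y²); for very large values a two-pass version using deviations from the mean is numerically safer.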
Regression line: $\hat{y} = a + bx$
[Figure: scatter plot of Y against X with the fitted regression line]
ŷ is the predicted value of the response variable y
a is the y-intercept and b is the slope
The constant is another name for the y-intercept
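$$ b = \frac{SP_{XY}}{SS_X}, \qquad a = \bar{Y} - b\bar{X} $$
where
$$ SS_X = \sum X^2 - \frac{(\sum X)^2}{n}, \qquad SP_{XY} = \sum XY - \frac{(\sum X)(\sum Y)}{n} $$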
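The slope and intercept formulas can be sketched directly in Python. `fit_line` is a hypothetical helper name; it returns (a, b) for the least-squares line ŷ = a + bx.

```python
def fit_line(x, y):
    """Least-squares line y-hat = a + b*x with
    b = SP_XY / SS_X and a = mean(Y) - b * mean(X)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sp_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    ss_x = sum(xi * xi for xi in x) - sum_x ** 2 / n
    b = sp_xy / ss_x
    a = sum_y / n - b * (sum_x / n)
    return a, b

# Points that lie exactly on y = 1 + 2x
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # -> 1.0 2.0
```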
The Regression Line of Wage
What factors (or variables) determine a higher or lower wage?
◦ Years of education
◦ Gender
◦ Work experience
a. -446.341
b. 4.593
c. 331133
d. 71425
South Cotabato Total rice production data 2007
[Figure: Wage ($) against Educ (years), showing observed Y and predicted Y]
Residuals Are Prediction Errors
The regression equation is often called a prediction equation: $\hat{y} = a + bx$.
The difference between an observed outcome and its predicted value is the prediction error, called a residual: $e = y - \hat{y}$.
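A small Python sketch of residuals as prediction errors. The helper `residuals` and the line ŷ = 1 + 2x are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def residuals(x, y, a, b):
    """Residual e_i = y_i - y-hat_i, where y-hat_i = a + b * x_i."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Hypothetical line y-hat = 1 + 2x applied to three observations
print(residuals([1, 2, 3], [3, 6, 6], a=1, b=2))  # -> [0, 1, -1]
```

When a and b come from a least-squares fit with an intercept, the residuals always sum to zero, so a single large residual stands out against the rest.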
Outliers
Outliers are observations with large residuals
Check for outliers by plotting the data
The regression line can be pulled toward an
outlier and away from the general trend of
points
Influential Observation
An observation can be influential in affecting
the regression line when two things happen:
◦ Its x value is low or high compared to the rest of
the data
◦ It does not fall in the straight-line pattern that the
rest of the data have
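The first condition (an x value far from the rest) is usually measured by leverage, h_i = 1/n + (x_i - x̄)²/SS_X. The sketch below computes it; the cutoff h_i > 2·p/n with p = 2 parameters (a and b) is an assumed rule of thumb, and the helper names are hypothetical. A point is then a candidate influential observation if it has high leverage and also a large residual.

```python
def leverages(x):
    """Leverage h_i = 1/n + (x_i - mean)^2 / SS_X for simple regression."""
    n = len(x)
    mean_x = sum(x) / n
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    return [1 / n + (xi - mean_x) ** 2 / ss_x for xi in x]

def high_leverage(x, factor=2.0):
    """Indices with h_i > factor * p / n, p = 2 (a and b).
    The cutoff is an assumed rule of thumb."""
    n = len(x)
    return [i for i, h in enumerate(leverages(x)) if h > factor * 2 / n]

x = [1, 2, 3, 4, 5, 20]  # hypothetical: the last x is far from the others
print(high_leverage(x))  # -> [5]
```

In simple regression the leverages always add up to 2 (the number of fitted parameters), so an average point has h_i = 2/n.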
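Coefficient of determination (percentage of the variation in Y explained by X):
$$ R^2 = \frac{SP_{XY}^2}{SS_X \, SS_Y} \times 100\% $$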
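A minimal Python sketch of R² computed from the same SP_XY, SS_X and SS_Y sums used for r and b. In simple regression this equals r² × 100%, and also (1 - SSE/SS_Y) × 100%. `r_squared_percent` is a hypothetical name.

```python
def r_squared_percent(x, y):
    """R^2 = SP_XY^2 / (SS_X * SS_Y) * 100: share of the variation in Y
    explained by the straight-line fit on X."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sp_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    ss_x = sum(xi * xi for xi in x) - sum_x ** 2 / n
    ss_y = sum(yi * yi for yi in y) - sum_y ** 2 / n
    return sp_xy ** 2 / (ss_x * ss_y) * 100

print(r_squared_percent([1, 2, 3], [1, 3, 2]))  # -> 25.0
```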
Model Summary
1. Simple Regression
2. Multiple Regression
Models, coefficients and interpretation
Evaluation of statistical models
Cautions in using multiple regression
EXAMPLES OF RESEARCH QUESTIONS
$\hat{y} = a + b_1 x_1 + b_2 x_2 + b_3 x_3$
Production volume explained by 3 factors
Coefficients(a)
Model 1                Unstandardized Coefficients   Standardized Coefficients
                       B          Std. Error         Beta         t         Sig.
(Constant)             346.059    956.121                         .362      .728
area harvested(ha)     4.344      .981               .943         4.429     .003
area harvested(ha)     1.920      1.030              .455         1.865     .104
Farmers served         -3.029     1.330              -.403        -2.277    .057
a. Dependent Variable: production volume(mt/yr)
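The columns of the coefficients table can be reproduced with ordinary least squares. This NumPy sketch is an illustration under assumptions, not the SPSS procedure: `ols_table` is hypothetical, the data are simulated (not the rice data), and the Sig. column is omitted because it needs the t distribution with n - k - 1 degrees of freedom (e.g. `2 * scipy.stats.t.sf(abs(t), df)`).

```python
import numpy as np

def ols_table(X, y):
    """OLS for y-hat = a + b1*x1 + ... + bk*xk.
    Returns B (constant first), Std. Error, t = B / SE and standardized Beta."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])  # column of 1s for the constant
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (n - k - 1)  # residual variance, df = n - k - 1
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
    # Beta: each slope rescaled to standard-deviation units (none for the constant)
    beta = coef[1:] * X.std(axis=0, ddof=1) / y.std(ddof=1)
    return coef, se, coef / se, beta

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1 + 2 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=0.1, size=50)  # simulated data
B, se, t, beta = ols_table(X, y)
print(np.round(B, 2), np.round(t, 1))
```

As in the table, t is simply B divided by its standard error.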
Model Summary
Result if violated:
The effect of each factor cannot be singled out, because highly correlated factors act simultaneously.
Symptoms:
◦ High R²
◦ Few significant factors
◦ Negative b coefficients estimated for explanatory variables that are positively correlated with the response
◦ Strong or very strong correlation among the factors
Recall … OUTPUT: Model Summary
Diagnosis: Multicollinearity
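The "strong correlation among factors" symptom can be checked with a correlation matrix and, as a commonly used extra diagnostic not named in the slides, the variance inflation factor VIF_j = 1/(1 - R_j²), where R_j² comes from regressing factor j on the other factors. A VIF above about 10 is a frequently quoted rule of thumb. The sketch below uses NumPy with simulated data; `vif` is a hypothetical helper.

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), R_j^2 from regressing column j on the others."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, X[:, j], rcond=None)
        resid = X[:, j] - design @ coef
        ss_tot = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out[j] = ss_tot / (resid @ resid)  # same as 1 / (1 - R_j^2)
    return out

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.05, size=100)  # nearly a copy of x1
x3 = rng.normal(size=100)
X = np.column_stack([x1, x2, x3])
print(np.round(np.corrcoef(X, rowvar=False), 2))  # correlation among factors
print(np.round(vif(X), 1))
```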
Caution 2: No patterns should remain in the residuals. (Residuals should be random and not related to one another.)
Result if violated:
The model estimate is not the
BEST estimate for the data.
Symptoms:
Diagnosis: Heteroscedasticity
[Figure: residual plots. Good: no heteroscedasticity; Bad: heteroscedasticity]
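The slides diagnose heteroscedasticity visually from residual plots. As an informal numeric companion (similar in spirit to the Goldfeld-Quandt idea, but without its F-test), this NumPy sketch sorts residuals by fitted value and compares the spread of the upper and lower halves. `spread_ratio` and the simulated data are hypothetical.

```python
import numpy as np

def spread_ratio(x, y):
    """Fit a straight line, sort residuals by fitted value and return
    var(upper half) / var(lower half). A ratio far from 1 suggests the
    residual spread changes with the fitted value."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    fitted = intercept + slope * x
    resid = y - fitted
    order = np.argsort(fitted)
    half = len(resid) // 2
    low, high = resid[order[:half]], resid[order[-half:]]
    return np.var(high, ddof=1) / np.var(low, ddof=1)

rng = np.random.default_rng(2)
x = np.linspace(1, 10, 200)
y_bad = 2 + 3 * x + rng.normal(scale=0.5 * x)  # spread grows with x
y_good = 2 + 3 * x + rng.normal(scale=2.0, size=x.size)  # constant spread
print(round(spread_ratio(x, y_bad), 2), round(spread_ratio(x, y_good), 2))
```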
Caution 3: Be careful in assuming the model form (linear or non-linear).
Result if violated:
Forecasts (predicted value or
estimates) may also be incorrect.
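One simple way to question an assumed linear form is to compare the sum of squared residuals (SSE) of a straight line with that of a curved fit. This NumPy sketch uses a quadratic as the alternative; the data and `sse_for_degree` are hypothetical.

```python
import numpy as np

def sse_for_degree(x, y, degree):
    """Sum of squared residuals for a polynomial fit of the given degree."""
    coeffs = np.polyfit(x, y, degree)
    resid = np.asarray(y, dtype=float) - np.polyval(coeffs, x)
    return float(resid @ resid)

x = np.arange(10.0)
y = x ** 2  # hypothetical curved (non-linear) relationship
print(sse_for_degree(x, y, 1), sse_for_degree(x, y, 2))
```

A higher-degree fit never has a larger in-sample SSE, so a small improvement alone does not justify a curved model; a residual plot showing a systematic bend under the linear fit is stronger evidence.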
Caution 4: Check the data for outliers or typographical errors.
Result if violated:
Analyses and forecasts (predicted
value or estimates) may also be
incorrect.
Exercise 4
1. Identify one response variable and at least two explanatory variables.
2. Estimate the regression model for the response variable.
3. Evaluate the model using three indicators of a good statistical model.
4. Be reminded of the CAUTIONS.