Regression Analysis
Li-Ann Lee C. Nalangan

OUTLINE
1. Simple Regression
   • Scatter plot
   • Correlation
   • Models, coefficients and interpretation
   • Outliers
   • Evaluation of statistical models
2. Multiple Regression
Simple Regression
Learning objective: to use regression analysis to explore the association between two quantitative variables.
EXAMPLES OF RESEARCH QUESTIONS
• How does one's level of education affect one's earnings?
• Is the attention span of students affected by giving quizzes?
• What is the effect of unemployment benefits on the duration of unemployment?
• How is literacy affected by household income?
• How is unemployment affected by inflation?
• What individual characteristics help explain an individual's decision to retire?
• What is the effect of drug use on wages?
The Scatterplot
• The first step in answering the question of association is to look at the data.
• A scatterplot is a graphical display of the relationship between two variables.
Person  Neckline (cm)  Waistline (cm)
1       38             72
2       34             64
3       31             59
4       31             64
5       32             68
6       29             62
7       33             75
8       31             66
9       32             72
10      30             67
11      30             64
12      33             66
13      36             78
14      35             81
15      35             66
[Figure: scatterplot of waistline (cm, vertical axis, 55 to 90) against neckline (cm, horizontal axis, 28 to 40)]
What information does a scatter plot give?
• The form of the relationship (linear or non-linear)
• The strength of the relationship
Possible relationships between X and Y in Scatter Diagrams
[Figure: six scatter diagrams of Y against X]
(a) Direct linear
(b) Inverse linear
(c) Direct curvilinear
(d) Inverse curvilinear
(e) Inverse linear with more scattering
(f) No linear relationship
Types of Relationships
• Direct vs. Inverse
  ◦ Direct: X and Y increase together
  ◦ Inverse: X and Y move in opposite directions
• Linear vs. Curvilinear
  ◦ Linear: a straight line best describes the relationship between X and Y
  ◦ Curvilinear: a curved line best describes the relationship between X and Y
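Direct vs. Inverse Relationship
• Direct relationship: advertising and sales increase or decrease together.
  [Figure: sales (vertical axis) against advertising (horizontal axis)]
• Inverse relationship: the price of durian and the demand for durian move in opposite directions.
  [Figure: price of durian (vertical axis) against durian demand (horizontal axis)]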

South Cotabato total rice production data, 2007
Municipality  Area planted (ha)  Area harvested (ha)  Production volume (mt/yr)  Farmers served
banga         9784               9713                 44907                      6154
koronadal     11267              10526                46813                      6433
lake sebu     5681               5846                 24286                      4033
norala        14623              12157                59235                      7895
polomolok     778                748                  3258                       487
sto.nino      16151              15648                71998                      8594
surallah      9881               9598                 44965                      6012
tampakan      163                100                  435                        131
tantangan     13828              12229                51434                      8654
t'boli        670                663                  2470                       568
tupi          665                669                  3056                       442
• Observe the three scatter plots that follow.
• Comment on the form of the relationship for each pair of variables graphed.
[Figure: scatter plot of area planted vs. area harvested of rice, South Cotabato, 2007]
Area planted and area harvested increase together (direct relationship).
[Figure: scatter plot of area planted vs. production volume of rice]
Area planted and production volume increase together.
[Figure: scatter plot of farmers served vs. area planted of rice]
Area planted and farmers served increase together.
Exercise 1
1. Identify two variables and construct a scatter plot using SPSS.
2. Based on the diagram, what is the possible relationship between the variables?
3. What is the probable strength of the relationship between the variables?
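Direct vs. Inverse Relationship
An upward straight line implies a perfect positive linear correlation (ρ = 1), while a downward straight line implies a perfect negative linear correlation (ρ = -1).
• Advertising and sales are perfectly positively correlated.
  [Figure: sales (vertical axis) against advertising (horizontal axis), points on an upward straight line]
• Price and demand of durian are perfectly negatively correlated.
  [Figure: price of durian (vertical axis) against durian demand (horizontal axis), points on a downward straight line]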

Strength of relationships between X and Y in Scatter plots
[Figure: four scatter plots of Y against X]
(a) Very strong linear relationship
(b) Moderate linear relationship
(c) No linear relationship
(d) Non-linear relationship
Correlation coefficient
• The sign can be + or -.
  ◦ A "+" sign indicates that the two variables increase (or decrease) together.
  ◦ A "-" sign indicates that one increases while the other decreases.
• The magnitude (0 to 1) measures the strength of the association.
Values and interpretation of r
Absolute value of r   Strength of linear relationship between two variables
0.01 - 0.20           very weak
0.21 - 0.40           weak
0.41 - 0.60           moderate
0.61 - 0.80           strong
0.81 - 0.99           very strong
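The interpretation table can be written as a small lookup. This is only a sketch: the function name `strength_of_r` is hypothetical, and the labels for exactly 0 and exactly 1 are assumptions, since the table does not list those values.

```python
def strength_of_r(r):
    """Return the verbal strength label for a correlation coefficient r."""
    size = abs(r)  # strength depends on the magnitude only, not on the sign
    if size > 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    if size == 0:
        return "no linear relationship"  # assumption: not listed in the table
    if size == 1:
        return "perfect"                 # assumption: not listed in the table
    if size <= 0.20:
        return "very weak"
    if size <= 0.40:
        return "weak"
    if size <= 0.60:
        return "moderate"
    if size <= 0.80:
        return "strong"
    return "very strong"


print(strength_of_r(0.994))
print(strength_of_r(-0.35))
```

The sign is read separately: a negative r with the label "weak" is a weak inverse relationship.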
Interpret the sign and magnitude of the correlation coefficient
Correlation (r) between
• area planted and area harvested: r = 0.994
• farmers served and production volume: r = 0.982
Both pairs have a very strong, positive linear relationship.
Strength of linear relationship of rice production data
(Pearson r with the SPSS significance flags; Sig. = .000 for every pair)
                           Area planted (ha)  Area harvested (ha)  Production volume (mt/yr)  Farmers served
Area planted (ha)          1                  .994**               .993**                     .992**
Area harvested (ha)        .994**             1                    .997**                     .989**
Production volume (mt/yr)  .993**             .997**               1                          .982**
Farmers served             .992**             .989**               .982**                     1
How to estimate the Pearson correlation coefficient (r)
$$ r = \frac{SP_{XY}}{\sqrt{SS_X \, SS_Y}} $$
$$ SP_{XY} = \sum xy - \frac{(\sum x)(\sum y)}{n} $$
$$ SS_X = \sum x^2 - \frac{(\sum x)^2}{n}, \qquad SS_Y = \sum y^2 - \frac{(\sum y)^2}{n} $$
Area planted (X)  Area harvested (Y)  XY         X²         Y²
9784              9713                95031992   95726656   94342369
11267             10526               1.19E+08   1.27E+08   1.11E+08
5681              5846                33211126   32273761   34175716
14623             12157               1.78E+08   2.14E+08   1.48E+08
778               748                 581944     605284     559504
16151             15648               2.53E+08   2.61E+08   2.45E+08
9881              9598                94837838   97634161   92121604
163               100                 16300      26569      10000
13828             12229               1.69E+08   1.91E+08   1.5E+08
670               663                 444210     448900     439569
665               669                 444885     442225     447561
Sums: ΣX = 83491, ΣY = 77897, ΣXY = 9.43E+08, ΣX² = 1.02E+09, ΣY² = 8.75E+08
SPxy = 3.52E+08
SSx = 3.86E+08
SSy = 3.23E+08
r = 0.994
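The hand computation can be checked with a few lines of Python. This is a minimal sketch using the area planted and area harvested columns; the function name `pearson_r` is hypothetical, and the result should agree with the r reported on the slide up to rounding.

```python
import math

# Area planted (X) and area harvested (Y), in hectares, for the 11 municipalities.
x = [9784, 11267, 5681, 14623, 778, 16151, 9881, 163, 13828, 670, 665]
y = [9713, 10526, 5846, 12157, 748, 15648, 9598, 100, 12229, 663, 669]


def pearson_r(x, y):
    """Pearson r from the computational formulas for SPxy, SSx and SSy."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sp_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    ss_x = sum(a * a for a in x) - sum_x ** 2 / n
    ss_y = sum(b * b for b in y) - sum_y ** 2 / n
    return sp_xy / math.sqrt(ss_x * ss_y)


r = pearson_r(x, y)
print(f"r = {r:.3f}")
```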
Hypothesis Test
• H0: ρ = 0 (there is no significant correlation between X and Y)
  Ha: ρ ≠ 0 (there is a significant correlation between X and Y)
• Test statistic:
$$ t_c = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} $$
• Decision rule: reject H0 if $t_c > t_{\alpha/2}(n-2)$ or if $t_c < -t_{\alpha/2}(n-2)$.
• Read $t_{\alpha/2}$ from a t table.

Hypothesis Test
• H0: ρ = 0 (there is no significant correlation between area planted and area harvested)
  Ha: ρ ≠ 0 (there is a significant correlation between area planted and area harvested)
• Test statistic:
$$ t_c = \frac{r - 0}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{0.994 - 0}{\sqrt{\dfrac{1 - 0.994^2}{11 - 2}}} = 27.26 $$
• Decision rule: reject H0 if $|t_c| > t_{0.025}(9) = 2.262$.
• Decision: reject H0, since 27.26 > 2.262.
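The same test can be sketched in Python. The critical value 2.262 is taken from the t table as on the slide, not computed; the function name `t_statistic` is hypothetical.

```python
import math


def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))


r, n = 0.994, 11     # area planted vs. area harvested, 11 municipalities
t_critical = 2.262   # t(0.025, 9), read from a t table

t_c = t_statistic(r, n)
reject_h0 = abs(t_c) > t_critical
print(f"t_c = {t_c:.2f}, reject H0: {reject_h0}")
```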
Exercise 2
1. Identify at least three variables.
2. Compute their correlation coefficients.
3. Which pair of variables has significant
correlation?
4. Interpret the results.
Regression Analysis
• The next step in a regression analysis is to identify the response and explanatory variables.
  ◦ Y denotes the response (dependent) variable: the variable we want to predict in terms of other variables (factors).
  ◦ X denotes the explanatory variable: the variable(s) or factor(s) used to predict the value of the response variable.
EXAMPLES OF RESEARCH QUESTIONS
• How does one's level of education affect one's earnings?
  Response variable: earnings
  Explanatory variable: level of education
• What is the effect of unemployment benefits on the duration of unemployment?
  Response variable: duration of unemployment
  Explanatory variable: unemployment benefits
• How is literacy affected by household income?
  Response variable: literacy
  Explanatory variable: household income
• How is unemployment affected by inflation?
  Response variable: unemployment
  Explanatory variable: inflation
The Regression Line Equation
• When the scatterplot shows a linear trend, a straight line fitted through the data points describes that trend.
[Figure: scatterplot with a fitted straight line; vertical axis 55 to 90, horizontal axis 28 to 40]
• The regression line is:
$$ \hat{y} = a + bx $$
• $\hat{y}$ is the predicted value of the response variable y.
• a is the y-intercept and b is the slope.
• "Constant" is another name for the y-intercept.
$$ b = \frac{SP_{XY}}{SS_X}, \qquad a = \bar{Y} - b\bar{X} $$
$$ SP_{XY} = \sum XY - \frac{(\sum X)(\sum Y)}{n}, \qquad SS_X = \sum X^2 - \frac{(\sum X)^2}{n} $$
The Regression Line of Wage
What factors (or variables) determine higher or lower wage?
• Years of education
• Gender
• Work experience
Let's consider one factor as the explanatory variable: years of education.
How is years of education related to wage?
Hourly wage rate explained by years of education
In $\hat{y} = a + bx$, let
Y = hourly wage rate ($)
X = education (years)
a = -2.67853, b = 0.90538
Wage = -2.67853 + 0.90538 (educ)
[Figure: scatterplot of hourly wage rate ($, 0 to 50) against years of education (0 to 20)]
• If b is positive, the correlation is also positive.
• Higher values of education are associated with higher wages.
Wage = -2.679 + 0.905 (educ)
For example, educ = 0 gives Wage = -2.679 + 0.905(0) = -2.679, and each additional year adds 0.905.
Educ (yr)  Wage ($)
0          -2.679
1          -1.773
2          -0.868
3          0.0376
6          2.7538
10         6.3753
14         9.9968
• A person who has not attended school (educ = 0) is estimated to have an hourly wage of $-2.679. (Meaningful?)
• For every additional year of schooling, a person's hourly wage is estimated to increase by $0.905.
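The predicted wages can be reproduced from the reported coefficients. A minimal sketch, assuming the unrounded a and b from the slide; `predict_wage` is a hypothetical helper name.

```python
# Coefficients reported on the slide: wage = a + b * educ
a = -2.67853  # y-intercept (the "constant")
b = 0.90538   # slope: change in hourly wage per additional year of education


def predict_wage(educ_years):
    """Predicted hourly wage ($) for a given number of years of education."""
    return a + b * educ_years


for educ in (0, 1, 2, 3, 6, 10, 14):
    print(f"{educ:>2} years -> {predict_wage(educ):.4f}")
```

The difference between any two consecutive predictions equals the slope b, which is the reading "each additional year of schooling adds about $0.905".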
The Regression Line of Production Volume
What factors (or variables) determine higher or lower production volume?
• Area planted
• Area harvested
• Farmers served
Let's consider one factor as the explanatory variable: area harvested.
How is area harvested related to production volume?
Production volume explained by area harvested
In $\hat{y} = a + bx$, let
Y = production volume (mt/yr)
X = area harvested (ha)
a = -446.341, b = 4.593
prodxn volume = -446.341 + 4.593 (area harvested)
• A municipality with no harvested area has an estimated rice production of -446.341 mt/yr. (Meaningful?)
• For every additional 1 ha of harvested area, production volume is estimated to increase by 4.593 mt/yr.
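The coefficients can be recovered from the South Cotabato data with the formulas b = SPxy/SSx and a = Ȳ - bX̄. This is a sketch in plain Python rather than the SPSS procedure used in the slides; `fit_line` is a hypothetical name, and the output should match the reported a and b up to rounding.

```python
# Area harvested (ha) and production volume (mt/yr) for the 11 municipalities.
x = [9713, 10526, 5846, 12157, 748, 15648, 9598, 100, 12229, 663, 669]
y = [44907, 46813, 24286, 59235, 3258, 71998, 44965, 435, 51434, 2470, 3056]


def fit_line(x, y):
    """Least-squares intercept a and slope b, using b = SPxy / SSx."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    ss_x = sum(xi * xi for xi in x) - sum(x) ** 2 / n
    b = sp_xy / ss_x
    a = mean_y - b * mean_x
    return a, b


a, b = fit_line(x, y)
print(f"a = {a:.3f}, b = {b:.3f}")
```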
For recent data on the municipalities, the prediction equation relating y = production volume to x = area harvested is:
$$ \hat{y} = -446.341 + 4.593x $$
Find the predicted production volume of sto.nino, which has the largest area harvested (15648 ha).
a. -446.341
b. 4.593
c. 331133
d. 71425
Municipality  Area harvested (ha)  Production volume (mt/yr)  Estimated production volume  Residual (error)
banga         9713                 44907                      44165                        742
koronadal     10526                46813                      47900                        -1087
lake sebu     5846                 24286                      26404                        -2118
norala        12157                59235                      55391                        3844
polomolok     748                  3258                       2989                         269
sto.nino      15648                71998                      71425                        573
surallah      9598                 44965                      43637                        1328
tampakan      100                  435                        13                           422
tantangan     12229                51434                      55721                        -4287
t'boli        663                  2470                       2599                         -129
tupi          669                  3056                       2626                         430
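The estimated volumes and residuals in the table follow directly from the prediction equation. A minimal sketch, assuming the rounded coefficients reported on the slides; `residual_rows` is a hypothetical helper.

```python
a, b = -446.341, 4.593  # intercept and slope reported on the slides

# (municipality, area harvested in ha, actual production volume in mt/yr)
data = [
    ("banga", 9713, 44907), ("koronadal", 10526, 46813),
    ("lake sebu", 5846, 24286), ("norala", 12157, 59235),
    ("polomolok", 748, 3258), ("sto.nino", 15648, 71998),
    ("surallah", 9598, 44965), ("tampakan", 100, 435),
    ("tantangan", 12229, 51434), ("t'boli", 663, 2470),
    ("tupi", 669, 3056),
]


def residual_rows(data, a, b):
    """Return (name, predicted, residual), where residual = actual - predicted."""
    rows = []
    for name, area, actual in data:
        predicted = a + b * area
        rows.append((name, predicted, actual - predicted))
    return rows


rows = residual_rows(data, a, b)
for name, predicted, residual in rows:
    print(f"{name:<10} {predicted:>8.0f} {residual:>7.0f}")
```

A positive residual means the municipality produced more than the line predicts; a negative residual means it produced less.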
Actual Y data and Predicted (Estimated) Y
[Figure: actual Y and predicted Y plotted as wage ($, 0 to 50) against education (years, 0 to 20)]
Residuals are Prediction Errors
• The regression equation is often called a prediction equation: $\hat{y} = a + bx$
• The difference between an observed outcome and its predicted value is the prediction error, called a residual.
Outliers
• Outliers are observations with large residuals.
• Check for outliers by plotting the data.
• The regression line can be pulled toward an outlier and away from the general trend of the points.
Influential Observation
• An observation can be influential in affecting the regression line when two things happen:
  ◦ Its x value is low or high compared with the rest of the data
  ◦ It does not fall in the straight-line pattern that the rest of the data follow
• It is usually an extreme observation in the X-variable, lying away from the bulk of the X-data.
A Statistical Model
• A statistical model never holds exactly in practice.
• It is merely a simple approximation of reality.
• Even though it does not describe reality exactly, a model is useful if the true relationship is close to what the model predicts.
• Select the best among many candidate models.
How do we evaluate models to determine the best one?
Evaluation of a Statistical Model
A statistical model can be evaluated using:
• R², the coefficient of determination
• Standard error of the estimate
• Significance of parameters
Coefficient of Determination, R²
• The proportion of variation in Y that is explained by X
• Ranges from 0 to 100 percent (or 0 to 1 as a decimal)
• The nearer it is to 100 percent, the better the model
$$ R^2 = \frac{b_1 \, SP_{XY}}{SS_Y} \times 100\%, \qquad SS_Y = \sum Y^2 - \frac{(\sum Y)^2}{n} $$
where $b_1$ is the slope of the regression line.
R² is the percentage of variation in Y explained by X.
[Figure: example plot with R² = 38.3%]
R² of regression of production volume
• R² = .994 or 99.4%
• 99.4% of the variation in production volume is explained by area harvested.
• Only 0.6% is not explained by area harvested (it may be explained by other factors, such as area planted).
Model Summary
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate
1      .997a  .994      .993               2158.217
a. Predictors: (Constant), ha
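R² for this model can be computed from the data with the formula R² = b1·SPxy/SSy. A sketch in plain Python under that formula; `r_squared` is a hypothetical name, and the result should agree with the SPSS R Square up to rounding.

```python
# Area harvested (ha) and production volume (mt/yr) for the 11 municipalities.
x = [9713, 10526, 5846, 12157, 748, 15648, 9598, 100, 12229, 663, 669]
y = [44907, 46813, 24286, 59235, 3258, 71998, 44965, 435, 51434, 2470, 3056]


def r_squared(x, y):
    """Coefficient of determination as a proportion: b1 * SPxy / SSy."""
    n = len(x)
    sp_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    ss_x = sum(xi * xi for xi in x) - sum(x) ** 2 / n
    ss_y = sum(yi * yi for yi in y) - sum(y) ** 2 / n
    b1 = sp_xy / ss_x  # slope of the regression line
    return b1 * sp_xy / ss_y


r2 = r_squared(x, y)
print(f"R^2 = {100 * r2:.1f}%")
```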
Standard error of the estimate
• A measure of the average difference between the actual value and the value estimated by the statistical model.
• The smaller the standard error, the better the model.
Model Summary
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate
1      .997a  .994      .993               2158.217
a. Predictors: (Constant), ha
Standard error of the regression of production volume
• For the regression of production volume, the standard error of the estimate is 2158.217.
• Interpretation: the actual production volume differs from the value predicted from area harvested by an average of 2158.217 mt/yr.
• Note: a possibly better model has a standard error smaller than 2158.217 mt/yr.
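The standard error of the estimate can be approximated from the residual column of the table of actual and estimated production volumes, using the usual formula sqrt(SSE / (n - 2)) for simple regression. Because the tabulated residuals are rounded to whole numbers, the result is expected to differ slightly from the SPSS value.

```python
import math

# Residuals (actual minus estimated production volume, mt/yr), as tabulated.
residuals = [742, -1087, -2118, 3844, 269, 573, 1328, 422, -4287, -129, 430]


def standard_error_of_estimate(residuals, n_params=2):
    """sqrt(SSE / (n - n_params)); simple regression estimates a and b."""
    sse = sum(e * e for e in residuals)
    return math.sqrt(sse / (len(residuals) - n_params))


se = standard_error_of_estimate(residuals)
print(f"standard error of the estimate = {se:.1f} mt/yr")
```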
Significance of parameters
• An explanatory variable is said to be an important predictor if it has a significant effect on the response variable.
• An explanatory variable with a significant effect has a "Sig." value less than .05 or .01 (the usual levels of significance).
• A good model includes only significant explanatory variables.
Sample SPSS output for Coefficients
Coefficients(a)
Model                B         Std. Error  Beta   t       Sig.
(Constant)           -446.341  1070.321           -.417   .686
Area harvested (ha)  4.593     .120        .997   38.273  .000
(B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient.)
a. Dependent Variable: mt/yr
• Area harvested is a significant factor (explanatory variable): Sig. = .000 is less than .05.
• The constant is not significant (Sig. = .686) and could be excluded from the model.
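The t column in the SPSS output is each coefficient divided by its standard error. The sketch below recomputes it from the reported B and Std. Error values and compares it with the table value t(0.025, 9) = 2.262 used earlier; the dictionary labels and the helper name `t_ratio` are illustrative, and SPSS itself reports the exact "Sig." value instead of using a table.

```python
# (B, Std. Error) pairs copied from the SPSS coefficients table.
coefficients = {
    "(Constant)": (-446.341, 1070.321),
    "Area": (4.593, 0.120),
}
T_CRITICAL = 2.262  # t(0.025, 9) from a t table; n - 2 = 9 degrees of freedom


def t_ratio(b, std_error):
    """t statistic for H0: coefficient = 0."""
    return b / std_error


results = {}
for name, (b, std_error) in coefficients.items():
    t = t_ratio(b, std_error)
    results[name] = (t, abs(t) > T_CRITICAL)
    print(f"{name:<12} t = {t:8.3f}  significant at 5%: {abs(t) > T_CRITICAL}")
```

The slope's ratio differs from the SPSS value in the third decimal because the inputs are rounded.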
Application of correlation coefficient on regression
• A strong relationship between an explanatory variable (X) and the response variable (Y) may imply that X is a good variable for predicting Y.
• Usually, we prioritize using the X with the higher correlation coefficient.
• Note: correlation does not imply causation.
Exercise 3
1. Identify 2 variables, one explanatory and the other the response variable.
2. Compute the coefficients of a regression model for the response variable.
3. Evaluate the model using the 3 indicators of a good statistical model.
Sometimes we need to transform the data
[Figure: (a) Y versus PORC3_NR (percentage of large farms in number); (b) log10 Y versus log10 (PORC3_NR)]
[Figure: predicted vs. observed plots: (a) model with untransformed variables, R² = 0.61; (b) Model 7, R² = 0.85]
OUTLINE
1. Simple Regression
2. Multiple Regression
   • Models, coefficients and interpretation
   • Evaluation of statistical models
   • Cautions in using multiple regression
• How do one's level of education, years of experience and gender affect one's earnings?
  Response variable: earnings
  Explanatory variables: level of education, years of experience and gender
• How is literacy affected by household income, unemployment, family tradition and political condition?
  Response variable: literacy
  Explanatory variables: household income, unemployment, family tradition and political condition
• With the South Cotabato rice production data, which variable could be the response variable, and which could be its explanatory variables?
• Data: area planted, area harvested, production volume, farmers served
What factors (or variables) determine higher or lower production volume?
• Area planted
• Area harvested
• Farmers served
Let's consider the 3 factors as explanatory variables: area harvested, area planted and farmers served.
Compare the results of this model (having 3 factors) with those of the first model (having 1 factor).
Production volume explained by 3 factors
Let Y = production volume (mt/yr)
    X1 = area harvested (ha)
    X2 = area planted (ha)
    X3 = farmers served
$$ \hat{y} = a + b_1 x_1 + b_2 x_2 + b_3 x_3 $$
Coefficients(a)
Model                B       Std. Error  Beta   t       Sig.
(Constant)           346.059 956.121            .362    .728
Area harvested (ha)  4.344   .981        .943   4.429   .003
Area planted (ha)    1.920   1.030       .455   1.865   .104
Farmers served       -3.029  1.330       -.403  -2.277  .057
(B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient.)
a. Dependent Variable: production volume (mt/yr)
$$ \hat{y} = 346.059 + 4.344 x_1 + 1.92 x_2 - 3.029 x_3 $$
prodxn vol = 346.059 + 4.344 (a.harvested) + 1.92 (a.planted) - 3.029 (farmers)
For recent data on the municipalities, the prediction equation relating production volume to area harvested, area planted and farmers served is:
$$ \hat{y} = 346.059 + 4.344 x_1 + 1.92 x_2 - 3.029 x_3 $$
Find the predicted production volume of sto.nino, with area harvested = 15648 ha, area planted = 16151 ha and farmers served = 8594.
prodxn vol = 346.059 + 4.344 (15648) + 1.92 (16151) - 3.029 (8594)
prodxn vol = 73300
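The arithmetic of this prediction can be checked with a short sketch. The coefficient values are those reported in the SPSS output; the dictionary keys and the function name `predict_volume` are hypothetical.

```python
# Coefficients of the three-factor model, as reported in the SPSS output.
coef = {
    "constant": 346.059,
    "area_harvested": 4.344,
    "area_planted": 1.92,
    "farmers_served": -3.029,
}


def predict_volume(area_harvested, area_planted, farmers_served):
    """Predicted production volume (mt/yr) from the three explanatory variables."""
    return (coef["constant"]
            + coef["area_harvested"] * area_harvested
            + coef["area_planted"] * area_planted
            + coef["farmers_served"] * farmers_served)


sto_nino = predict_volume(15648, 16151, 8594)
print(f"predicted production volume = {sto_nino:.0f} mt/yr")
```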
Interpretation:
• A municipality with no harvested area, no area planted and no farmers served has an estimated rice production of 346.059 mt/yr. (Meaningful?)
• For every additional 1 ha of harvested area, rice production is estimated to increase by 4.344 mt/yr, holding other factors fixed.
• For every additional 1 ha of area planted, rice production is estimated to increase by 1.92 mt/yr, holding other factors fixed.
• For every additional farmer served, rice production is estimated to decrease by 3.029 mt/yr, holding other factors fixed. (Meaningful?)
Recall…
Evaluation of a Statistical Model
A good statistical model has:
• High R², the coefficient of determination
• Low standard error of the estimate
• Significant parameters ONLY
R² of regression of production volume
• R² = .997 or 99.7%
• 99.7% of the variation in production volume is explained by area harvested, area planted and farmers served.
• This is 0.3 percentage points more than the first model, which has R² = 99.4%.
Model Summary
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate
2      .998a  .997      .995               1809.882
a. Predictors: (Constant), area harvested, area planted, farmers served
Standard error of the regression of production volume
• The standard error of the estimate is 1809.882 mt/yr.
• Note: this model has a smaller standard error of the estimate than the first model (2158.217 mt/yr).
Coefficients(a)
Model 2              B       Std. Error  Beta   t       Sig.
(Constant)           346.059 956.121            .362    .728 ns
Area harvested (ha)  4.344   .981        .943   4.429   .003 **
Area planted (ha)    1.920   1.030       .455   1.865   .104 ns
Farmers served       -3.029  1.330       -.403  -2.277  .057 ns
• Area harvested is the only significant factor (explanatory variable); the rest are not.
Cautions in using multiple regression analysis
Caution 1: Factors (explanatory variables) should not be highly correlated with one another.
Result if violated:
The effect of each factor cannot be singled out, because highly correlated factors act simultaneously.
Symptoms:
• High R²
• Few significant factors
• Negative coefficients (b) estimated for explanatory variables that are positively correlated with the response variable
• Strong or very strong correlation among factors
Recall… OUTPUT
Model Summary
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate
2      .998a  .997      .995               1809.882
a. Predictors: (Constant), area harvested, area planted, farmers served
Coefficients
Model 2              B       Std. Error  Beta   t       Sig.
(Constant)           346.059 956.121            .362    .728 ns
Area harvested (ha)  4.344   .981        .943   4.429   .003 **
Area planted (ha)    1.920   1.030       .455   1.865   .104 ns
Farmers served       -3.029  1.330       -.403  -2.277  .057 ns
Interpretation:
• A municipality with no harvested area, no area planted and no farmers served has an estimated rice production of 346.059 mt/yr. (Meaningful?)
• For every additional farmer served, rice production is estimated to decrease by 3.029 mt/yr, holding other factors fixed. (Meaningful?)
• This is not consistent with r = .982 between production volume and farmers served.
• High R²: 99.7%
• Few significant factors: 1 out of 3
• Negative coefficient (b) estimated for an explanatory variable that is positively correlated with the response variable: farmers served
• Strong or very strong correlation among factors: r = .99
Diagnosis: Multicollinearity
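One of the symptoms, strong correlation among the factors, can be screened for before fitting the model. The sketch below computes the pairwise Pearson r among the three explanatory variables from the South Cotabato data. The cutoff of 0.9 is an assumption made for illustration, not a rule from the slides, and the names `pearson_r` and `THRESHOLD` are hypothetical.

```python
import math
from itertools import combinations

# Explanatory variables for the 11 municipalities.
predictors = {
    "area planted": [9784, 11267, 5681, 14623, 778, 16151, 9881, 163, 13828, 670, 665],
    "area harvested": [9713, 10526, 5846, 12157, 748, 15648, 9598, 100, 12229, 663, 669],
    "farmers served": [6154, 6433, 4033, 7895, 487, 8594, 6012, 131, 8654, 568, 442],
}
THRESHOLD = 0.9  # assumed cutoff for "highly correlated"


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    sp_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    ss_x = sum(a * a for a in x) - sum(x) ** 2 / n
    ss_y = sum(b * b for b in y) - sum(y) ** 2 / n
    return sp_xy / math.sqrt(ss_x * ss_y)


# Collect every pair of explanatory variables whose |r| exceeds the cutoff.
flagged = []
for (name_1, x1), (name_2, x2) in combinations(predictors.items(), 2):
    r = pearson_r(x1, x2)
    if abs(r) > THRESHOLD:
        flagged.append((name_1, name_2, r))

for name_1, name_2, r in flagged:
    print(f"{name_1} vs. {name_2}: r = {r:.3f}")
```

When every pair is flagged, as here, dropping or combining some of the factors is worth considering before interpreting individual coefficients.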
Caution 2: No patterns should remain in the residuals. (Residuals should be random and not related to one another.)
Result if violated:
The model estimate is not the BEST estimate for the data.
Symptoms:
• Residuals that increase with the predicted value (or with an explanatory variable)
• Significant correlations among the residuals
Municipality  Production volume (mt/yr)  Estimated production volume  Residual (error)
banga         44907                      42684                        2223
koronadal     46813                      48218                        -1405
lake sebu     24286                      24433                        -147
norala        59235                      57318                        1917
polomolok     3258                       3614                         -356
sto.nino      71998                      73300                        -1302
surallah      44965                      42801                        2164
tampakan      435                        697                          -262
tantangan     51434                      53806                        -2372
t'boli        2470                       2792                         -322
tupi          3056                       3190                         -134
Detecting patterns
As the predicted production volume increases, the points become more scattered.
Diagnosis: Heteroscedasticity
[Figure: two example residual plots. Good: no heteroscedasticity. Bad: heteroscedasticity.]
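The pattern can also be seen numerically by comparing the typical size of the residuals for small and large predicted values. This is only an informal check on the three-factor model's residual table, not a formal test of heteroscedasticity; the split into two halves and the name `spread_by_half` are assumptions made for illustration.

```python
# (estimated production volume, residual) pairs from the three-factor model.
pairs = [
    (42684, 2223), (48218, -1405), (24433, -147), (57318, 1917),
    (3614, -356), (73300, -1302), (42801, 2164), (697, -262),
    (53806, -2372), (2792, -322), (3190, -134),
]


def mean_abs_residual(group):
    """Average absolute residual of a list of (predicted, residual) pairs."""
    return sum(abs(residual) for _, residual in group) / len(group)


def spread_by_half(pairs):
    """Mean absolute residual for the lower and upper halves of predicted values."""
    ordered = sorted(pairs)  # tuples sort by predicted value first
    mid = len(ordered) // 2
    return mean_abs_residual(ordered[:mid]), mean_abs_residual(ordered[mid:])


low_spread, high_spread = spread_by_half(pairs)
print(f"lower half: {low_spread:.1f} mt/yr, upper half: {high_spread:.1f} mt/yr")
```

A much larger spread in the upper half is consistent with residuals that grow with the predicted value.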
Caution 3: Be careful in assuming the model form (linear or non-linear).
Result if violated:
Forecasts (predicted values or estimates) may be incorrect.
Caution 4: Check the data for outliers or typographical errors.
Result if violated:
Analyses and forecasts (predicted values or estimates) may be incorrect.
Exercise 4
1. Identify 1 response variable and at least 2 explanatory variables.
2. Estimate the regression model for the response variable.
3. Evaluate the model using the 3 indicators of a good statistical model.
4. Be reminded of the CAUTIONS.
