
STATISTICS IN ECONOMICS AND BUSINESS

Nguyen Huyen Trang


Faculty of Statistics - National Economics University
trangtk@neu.edu.vn
LECTURE 10: REGRESSION

➢ Causal relationships
➢ Methods of correlation
➢ Regression analysis
CAUSAL RELATIONSHIPS

There are many examples in economics where we need to consider the relationship between two or more variables:
▪ Consumption and income
▪ Inflation and unemployment
▪ Output and costs
▪ Advertising and Sales Revenue
THE RELATIONSHIP BETWEEN X AND Y

• Correlation: Is there a relationship between two variables?

• Regression: How well does an independent variable predict the dependent variable?
METHODS OF CORRELATION

• Scatter Plots
• Covariance
• Correlation coefficient
SCATTER PLOTS

Indicative of the type of relationship between your two variables

[Scatter plots: a positive relationship (height in cm vs. age in weeks) and a negative relationship (reliability vs. age of car)]
SCATTER PLOTS

[Scatter plot: no relationship between the two variables]
COVARIANCE

Using the formulas:

Variance gives information on the variability of a single variable:

$$S_x^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$$

Covariance gives information on the degree to which two variables vary together:

$$\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n-1}$$
COVARIANCE

Measures the combined variability of $X$ and $Y$:

$$\mathrm{Cov}(X, Y) = s_{XY} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n-1}$$

▪ When X and Y tend to move in the same direction: cov(x, y) > 0
▪ When X and Y tend to move in opposite directions: cov(x, y) < 0
▪ When there is no consistent relationship: cov(x, y) = 0
EXERCISE 1

Two data sets, side by side:

Subject |  x  |  y  | (x−x̄)(y−ȳ) |  x  |  y  | (x−x̄)(y−ȳ)
1       | 101 | 100 | 2500        | 54  | 53  | 9
2       |  81 |  80 |  900        | 53  | 52  | 4
3       |  61 |  60 |  100        | 52  | 51  | 1
4       |  51 |  50 |    0        | 51  | 50  | 0
5       |  41 |  40 |  100        | 50  | 49  | 1
6       |  21 |  20 |  900        | 49  | 48  | 4
7       |   1 |   0 | 2500        | 48  | 47  | 9
Mean    |     |     |             |     |     |
Sum of (x−x̄)(y−ȳ):                Sum of (x−x̄)(y−ȳ):
Covariance:                        Covariance:
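For readers who want to check their answers, here is a minimal Python sketch of the covariance formula above (the `covariance` helper is an illustrative name, not a library function):

```python
def covariance(x, y):
    # Sample covariance: sum of (x_i - x_bar)(y_i - y_bar), divided by n - 1.
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# The two data sets from the Exercise 1 table.
x1, y1 = [101, 81, 61, 51, 41, 21, 1], [100, 80, 60, 50, 40, 20, 0]
x2, y2 = [54, 53, 52, 51, 50, 49, 48], [53, 52, 51, 50, 49, 48, 47]

print(covariance(x1, y1))  # 7000 / 6, approximately 1166.67
print(covariance(x2, y2))  # 28 / 6, approximately 4.67
```

Both data sets follow the same perfectly linear relationship, yet the covariances differ enormously because the second data set has much smaller deviations; this is exactly the interpretability problem the correlation coefficient addresses next.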
CORRELATION COEFFICIENTS

▪ The size of the covariance does not really tell us anything by itself, because it depends on the units of X and Y

→ Standardize this measure → Correlation Coefficient (r)

▪ r measures the direction and strength of the linear relationship between two variables
CORRELATION COEFFICIENTS

$$r = \frac{\mathrm{Cov}(X, Y)}{S_X S_Y} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}$$

The value of r ranges between -1 and 1

The sign of r denotes the direction of association

The magnitude (absolute value) of r denotes the strength of association


CORRELATION

Graph and Correlation Coefficient (r)

[Scatter plots illustrating r: r = 0.8 (strong, positively correlated), r = 0.5 (weak, positively correlated), r = −0.5 (negatively correlated), r = 0 (not correlated)]
CORRELATION COEFFICIENTS

r runs from −1 to 1: the closer to −1, the stronger the negative (indirect, inverse) relationship; the closer to +1, the stronger the positive (direct) relationship; values near 0 indicate no relationship.
CORRELATION COEFFICIENTS

Absolute value of r | General interpretation
0.8 – 1.0           | Very strong
0.6 – 0.8           | Strong
0.4 – 0.6           | Moderate
0.2 – 0.4           | Weak
0.0 – 0.2           | Very weak or no relationship

The strength of the correlation depends on how closely the data points in the scatter plot follow a pattern.
EXERCISE 2

A sample of 6 children was selected, and data about their age in years and weight in kilograms were recorded as shown in the following table. It is required to find the correlation between age and weight.

Child No. | Age (years) | Weight (kg)
1         | 7           | 12
2         | 6           | 8
3         | 8           | 12
4         | 5           | 10
5         | 6           | 11
6         | 9           | 13
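Before working this by hand, here is a minimal Python sketch of the correlation formula applied to Exercise 2 (`correlation` is an illustrative helper name, not a library call); note that the (n − 1) factors cancel between numerator and denominator, so they are omitted:

```python
from math import sqrt

def correlation(x, y):
    # Pearson r: covariance standardized by both standard deviations.
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

age = [7, 6, 8, 5, 6, 9]
weight = [12, 8, 12, 10, 11, 13]
print(correlation(age, weight))  # approximately 0.76: a strong positive correlation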
REGRESSION ANALYSIS

➢ Correlation describes the strength of a linear relationship between two variables
➢ Regression tells us how to draw the straight line described by the correlation
REGRESSION ANALYSIS

A technique concerned with predicting some variables by knowing others
The process of predicting variable Y using variable X
Tells you how values of Y change as a function of changes in values of X
REGRESSION ANALYSIS

• Regression analysis is used to:
  • Predict the value of a dependent variable based on the value of at least one independent variable
  • Explain the impact of changes in an independent variable on the dependent variable

Dependent variable: the variable we wish to explain
Independent variable: the variable used to explain the dependent variable
SINGLE REGRESSION

• Variable: $Y \leftarrow X$

Y                   | X
Dependent variable  | Independent variable
Explained variable  | Explanatory variable
Controlled variable | Control variable
Predictand          | Predictor
Endogenous          | Exogenous
Example: Quantity of sales depends on Price
Advertising expenditure affects Revenue
Selling price of a home depends on its size
REGRESSION MODEL

➢ Single Regression
➢ Least Squares estimation method
➢ Coefficient of determination
➢ Population vs Sample Regression
➢ T-test for Coefficient
➢ Confidence interval of Coefficient
➢ F-test for goodness of fit
SIMPLE LINEAR REGRESSION MODEL

The population regression model:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

where $Y_i$ is the dependent variable, $\beta_0$ the population Y-intercept, $\beta_1$ the population slope coefficient, $X_i$ the independent variable, and $\varepsilon_i$ the random error term.
SIMPLE LINEAR REGRESSION MODEL

[Graph of $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$: for a given $X_i$, the observed value of Y lies a random error $\varepsilon_i$ away from the predicted value of Y on the line; the line has intercept $\beta_0$ and slope $\beta_1$]
SIMPLE LINEAR REGRESSION MODEL

The sample regression model:

$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$$

where $\hat{y}_i$ is the estimated (or predicted) y value for observation i, $\hat{\beta}_0$ the estimate of the regression intercept, $\hat{\beta}_1$ the estimate of the regression slope, and $x_i$ the value of x for observation i.

The individual random error terms $e_i$ have a mean of zero:

$$e_i = y_i - \hat{y}_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)$$

SIMPLE LINEAR REGRESSION MODEL

➢ $\hat{\beta}_0$ is the estimated average value of y when the value of x is zero (if x = 0 is in the range of observed x values)

➢ $\hat{\beta}_1$ is the estimated change in the average value of y as a result of a one-unit change in x
[Scatter plot with the fitted line $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$; the residuals $e_1, \dots, e_{10}$ are shown as the vertical distances from each observed point to the line]
LEAST SQUARES METHOD

▪ Used to estimate the coefficients
▪ Estimated regression: $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$
▪ Minimize $RSS = \sum e_i^2 = \sum (y_i - \hat{y}_i)^2$

The normal equations:

$$\sum y_i = n \hat{\beta}_0 + \hat{\beta}_1 \sum x_i$$

$$\sum x_i y_i = \hat{\beta}_0 \sum x_i + \hat{\beta}_1 \sum x_i^2$$

Solving them gives:

$$\hat{\beta}_1 = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}; \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
LEAST SQUARES METHOD

The coefficients $\hat{\beta}_0$ and $\hat{\beta}_1$, and other regression results in this topic, will be found using a computer:
• Hand calculations are tedious
• Statistical routines are built into Excel
• Other statistical analysis software can be used
EXERCISE 3

• A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet)
• A random sample of 10 houses is selected
• Dependent variable (Y) = house price in $1000s
• Independent variable (X) = square feet
EXERCISE 3

House Price in $1000s (Y) | Square Feet (X)
245                       | 1400
312                       | 1600
279                       | 1700
308                       | 1875
199                       | 1100
219                       | 1550
405                       | 2350
324                       | 2450
319                       | 1425
255                       | 1700
EXERCISE 3
House price model: scatter plot

[Scatter plot: House Price ($1000s), 0–450, against Square Feet, 0–3000]
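As a cross-check on the software output that follows, here is a minimal Python sketch of the least squares formulas from above (`ols_fit` is an illustrative name, not a library function); run on the house data it should reproduce the coefficients in the output below:

```python
def ols_fit(x, y):
    # Closed-form least squares estimates:
    #   beta1 = (mean(x*y) - mean(x)*mean(y)) / (mean(x^2) - mean(x)^2)
    #   beta0 = mean(y) - beta1 * mean(x)
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    xy_bar = sum(xi * yi for xi, yi in zip(x, y)) / n
    x2_bar = sum(xi ** 2 for xi in x) / n
    beta1 = (xy_bar - x_bar * y_bar) / (x2_bar - x_bar ** 2)
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]  # in $1000s

beta0, beta1 = ols_fit(square_feet, price)
print(beta0, beta1)  # approximately 98.24833 and 0.10977
```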
SPSS OUTPUT

Regression Statistics
Multiple R 0.76211
R Square 0.58082
Adjusted R Square 0.52842
Standard Error 41.33032
Observations 10

ANOVA
df SS MS F Significance F
Regression 1 18934.9348 18934.9348 11.0848 0.01039
Residual 8 13665.5652 1708.1957
Total 9 32600.5000

             Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
Constant     98.24833      58.03348        1.69296  0.12892  -35.57720  232.07386
Square Feet  0.10977       0.03297         3.32938  0.01039  0.03374    0.18580
EXERCISE 3

$$\hat{y}_i = 98.24833 + 0.10977\, x_i$$

▪ $\hat{\beta}_0$ is the estimated average value of Y when the value of X is zero (if X = 0 is in the range of observed X values). Here, no houses had 0 square feet, so $\hat{\beta}_0 = 98.24833$ just indicates that, for houses within the range of sizes observed, $98,248.33 is the portion of the house price not explained by square feet
▪ Here, $\hat{\beta}_1 = 0.10977$ tells us that the average value of a house increases by 0.10977 × $1000 = $109.77 for each additional square foot of size
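As an illustration of using the estimated equation, consider a hypothetical 2,000-square-foot house (not one of the sampled homes, but within the observed range of sizes); its predicted price would be:

$$\hat{y} = 98.24833 + 0.10977 \times 2000 \approx 317.79 \;(\$1000\text{s}) \approx \$317{,}788$$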
ANALYSIS OF VARIANCE

• $SST = \sum (Y_i - \bar{Y})^2$: Total Sum of Squares: total variation in the dependent variable
• $SSR = \sum (\hat{Y}_i - \bar{Y})^2$: Sum of Squares of Regression: variation in the dependent variable explained by variation in the independent variable
• $SSE = \sum (Y_i - \hat{Y}_i)^2$: Sum of Squares of Residual: variation due to errors
• $SST = SSR + SSE$

where $\bar{y}$ = average value of the dependent variable, $y_i$ = observed values of the dependent variable, and $\hat{y}_i$ = predicted value of y for the given $x_i$ value.
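The decomposition can be verified directly from the SS column of the ANOVA table in the output above:

$$SSR + SSE = 18934.9348 + 13665.5652 = 32600.5000 = SST$$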
COEFFICIENT OF DETERMINATION

• Goodness of fit measure:

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

• $0 \le R^2 \le 1$
• $R^2$ measures the proportion of variation in the dependent variable that is explained by the variation in the independent variable
• $R^2 = 0$: horizontal regression line
• $R^2 = 1$: all points lie exactly on a straight line
COEFFICIENT OF DETERMINATION

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \qquad (0 \le R^2 \le 1)$$

[Four scatter plots of Y against X, illustrating $R^2 = 1$ (all points exactly on the line), a high $R^2$, a low $R^2$, and $R^2 \approx 0$]
SPSS OUTPUT

Regression Statistics
Multiple R 0.76211
R Square 0.58082
Adjusted R Square 0.52842
Standard Error 41.33032
Observations 10

ANOVA
df SS MS F Significance F
Regression 1 18934.9348 18934.9348 11.0848 0.01039
Residual 8 13665.5652 1708.1957
Total 9 32600.5000

             Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
Constant     98.24833      58.03348        1.69296  0.12892  -35.57720  232.07386
Square Feet  0.10977       0.03297         3.32938  0.01039  0.03374    0.18580
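Checking "R Square" against the ANOVA table above:

$$R^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5000} \approx 0.58082$$

so about 58% of the variation in house prices is explained by variation in square footage.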
POPULATION VS SAMPLE REGRESSION

▪ Population regression model:
$$Y = \beta_0 + \beta_1 X + u$$
▪ Sample regression model:
$$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + e_i$$
$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$$
▪ $\hat{\beta}_0$ is the estimate of $\beta_0$
▪ $\hat{\beta}_1$ is the estimate of $\beta_1$
▪ Least Squares method: $RSS = \sum e_i^2 \to \min$
▪ Calculated standard errors: $SE(\hat{\beta}_0)$, $SE(\hat{\beta}_1)$
T-TEST FOR COEFFICIENT

▪ Based on the assumption of normality of the errors
▪ The test compares the value of $\beta_1$ with a value $b$
▪ Two-tail hypothesis pair:

$$H_0: \beta_1 = b; \qquad H_1: \beta_1 \ne b$$

$$T\text{-stat} = \frac{\hat{\beta}_1 - b}{SE(\hat{\beta}_1)}$$

$$T\text{-critical}: \; t^{(n-k)}_{\alpha/2}$$

where $n$ is the number of observations and $k$ is the number of coefficients.
T-TEST FOR COEFFICIENT

In all three cases the test statistic is $T = \dfrac{\hat{\beta}_1 - b}{SE(\hat{\beta}_1)}$.

Hypothesis pair                            | Critical value         | Reject $H_0$ when
$H_0: \beta_1 = b$; $H_1: \beta_1 > b$     | $t^{(n-k)}_{\alpha}$   | $T > t^{(n-k)}_{\alpha}$
$H_0: \beta_1 = b$; $H_1: \beta_1 < b$     | $-t^{(n-k)}_{\alpha}$  | $T < -t^{(n-k)}_{\alpha}$
$H_0: \beta_1 = b$; $H_1: \beta_1 \ne b$   | $t^{(n-k)}_{\alpha/2}$ | $\lvert T\rvert > t^{(n-k)}_{\alpha/2}$
SPSS OUTPUT

Regression Statistics
Multiple R 0.76211
R Square 0.58082
Adjusted R Square 0.52842
Standard Error 41.33032
Observations 10

ANOVA
df SS MS F Significance F
Regression 1 18934.9348 18934.9348 11.0848 0.01039
Residual 8 13665.5652 1708.1957
Total 9 32600.5000

             Coefficients  Standard Error  t Stat   Sig.     Lower 95%  Upper 95%
Constant     98.24833      58.03348        1.69296  0.12892  -35.57720  232.07386
Square Feet  0.10977       0.03297         3.32938  0.01039  0.03374    0.18580
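A worked two-tail test for the slope in the output above, with $b = 0$, $n = 10$, and $k = 2$ (the critical value $t^{(8)}_{0.025} = 2.306$ is taken from standard t tables):

$$T = \frac{0.10977 - 0}{0.03297} \approx 3.329 > t^{(8)}_{0.025} = 2.306$$

so we reject $H_0: \beta_1 = 0$ at the 5% level; equivalently, Sig. = 0.01039 < 0.05. Square footage has a statistically significant effect on house price.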
CONFIDENCE INTERVAL FOR COEFFICIENT

• At a confidence level of $(1 - \alpha)$:

$$\hat{\beta}_j - t^{(n-k)}_{\alpha/2}\, SE(\hat{\beta}_j) < \beta_j < \hat{\beta}_j + t^{(n-k)}_{\alpha/2}\, SE(\hat{\beta}_j)$$
SPSS OUTPUT

Regression Statistics
Multiple R 0.76211
R Square 0.58082
Adjusted R Square 0.52842
Standard Error 41.33032
Observations 10

ANOVA
df SS MS F Significance F
Regression 1 18934.9348 18934.9348 11.0848 0.01039
Residual 8 13665.5652 1708.1957
Total 9 32600.5000

             Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
Constant     98.24833      58.03348        1.69296  0.12892  -35.57720  232.07386
Square Feet  0.10977       0.03297         3.32938  0.01039  0.03374    0.18580
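A worked 95% confidence interval for the slope, again using $t^{(8)}_{0.025} = 2.306$ from standard t tables:

$$0.10977 \pm 2.306 \times 0.03297 = 0.10977 \pm 0.07603 \;\Rightarrow\; (0.03374,\; 0.18580)$$

matching the Lower 95% and Upper 95% columns in the output above.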
F-TEST FOR GOODNESS OF FIT

• $H_0: R^2 = 0$ (model is overall insignificant)
• $H_1: R^2 \ne 0$ (i.e. $R^2 > 0$; model is overall significant)
• F-stat:

$$F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)}$$

• Critical value: $F^{(k-1,\,n-k)}_{\alpha}$
• If F-stat > F-critical ⇒ Reject $H_0$
• Or: P-value < $\alpha$

SPSS OUTPUT

Regression Statistics
Multiple R 0.76211
R Square 0.58082
Adjusted R Square 0.52842
Standard Error 41.33032
Observations 10

ANOVA
df SS MS F Significance F
Regression 1 18934.9348 18934.9348 11.0848 0.01039
Residual 8 13665.5652 1708.1957
Total 9 32600.5000

             Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
Constant     98.24833      58.03348        1.69296  0.12892  -35.57720  232.07386
Square Feet  0.10977       0.03297         3.32938  0.01039  0.03374    0.18580
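A worked check of the F-test against the output above, with $k = 2$ and $n = 10$ (the critical value $F^{(1,8)}_{0.05} = 5.32$ is taken from standard F tables):

$$F = \frac{0.58082 / 1}{(1 - 0.58082)/8} \approx 11.08 > F^{(1,8)}_{0.05} = 5.32$$

so we reject $H_0$ and conclude the model is overall significant; equivalently, Significance F = 0.01039 < 0.05.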
T-TEST VS F-TEST

▪ The F-test is an overall, multiple-coefficient test
▪ The T-test is a single-coefficient test
• Sometimes the F-test and T-test conflict
