Week 12+13
➢ Causality relationship
➢ Methods of correlation
➢ Regression analysis
CAUSALITY RELATIONSHIP
• Scatter Plots
• Covariance
• Correlation coefficient
SCATTER PLOTS
[Figure: two scatter plots. One shows Height in CM against Age in Weeks; the other shows Reliability against Age of Car.]
SCATTER PLOTS
[Figure: scatter plot showing no relationship between the two variables.]
COVARIANCE
Using Formula
Variance gives information on the variability of a single variable:
\[ S_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} \]
Covariance gives information on the degree to which two variables vary together:
\[ \mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1} \]
COVARIANCE
\[ \mathrm{Cov}(X, Y) = s_{XY} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1} \]
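The sample variance and covariance formulas can be sketched in a few lines of Python. This is only an illustration: the function names and the five data points are hypothetical and do not come from the slides.

```python
from statistics import mean

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    x_bar = mean(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1)

def sample_covariance(xs, ys):
    """Sum of cross-products of deviations, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / (len(xs) - 1)

# Hypothetical data, chosen only to keep the arithmetic easy to follow.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

var_x = sample_variance(x)
cov_xy = sample_covariance(x, y)
print("variance of x:", var_x)
print("covariance of x and y:", cov_xy)
```

A positive covariance means the two variables tend to move in the same direction. Its size depends on the units of measurement, which is why the correlation coefficient is used to judge strength.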
[Figure: example scatter plots labelled r = 0 (no correlation) and r = −0.5 (negatively correlated).]
CORRELATION COEFFICIENTS
The correlation coefficient r lies between −1 and 1:
• r close to −1: stronger negative (indirect, inverse) relationship
• r = 0: no relationship
• r close to 1: stronger positive (direct) relationship
CORRELATION COEFFICIENTS
A sample of 6 children was selected, and their age in years and
weight in kilograms were recorded as shown in the following table.
Find the correlation between age and weight.
Child no.   Age (years)   Weight (kg)
1           7             12
2           6             8
3           8             12
4           5             10
5           6             11
6           9             13
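One way to work this exercise is sketched below. It assumes the usual Pearson formula, r = s_xy / (s_x · s_y), which the slides do not write out; the function name is illustrative. The n − 1 divisors cancel, so the code works directly with sums of deviations.

```python
from math import sqrt
from statistics import mean

# Data from the table: age in years and weight in kg for the 6 children.
age = [7, 6, 8, 5, 6, 9]
weight = [12, 8, 12, 10, 11, 13]

def pearson_r(xs, ys):
    """Pearson correlation from sums of squared and cross deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

r = pearson_r(age, weight)
print(f"r = {r:.4f}")
```

The result is roughly 0.76, a fairly strong positive (direct) relationship between age and weight in this sample.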
REGRESSION ANALYSIS
• Variables: Y ← X (Y depends on X)
Y                      X
Dependent variable     Independent variable
Explained variable     Explanatory variable
Controlled variable    Control variable
Predictand             Predictor
Endogenous             Exogenous
Examples:
• Quantity of sales depends on price
• Advertising expenditure affects revenue
• Selling price of a home depends on its size
REGRESSION MODEL
➢ Single Regression
➢ Least Squares estimation method
➢ Coefficient of determination
➢ Population vs Sample Regression
➢ T-test for Coefficient
➢ Confidence interval of Coefficient
➢ F-test for goodness of fit
SIMPLE LINEAR REGRESSION MODEL
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]
SIMPLE LINEAR REGRESSION MODEL
Y Yi = β0 + β1 X i + εi
Observed Value
of Y for Xi
εi Slope = β1
Intercept = β0
Xi
X
SIMPLE LINEAR REGRESSION MODEL
60 e10
e7
e9
55
e8
e6
50
e3
e5
45 e2
e4
40 e
35
8 10 12 14 16
LEAST SQUARES METHOD
▪ Used to estimate the coefficients
▪ Estimated regression: \( \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i \)
▪ Minimize \( RSS = \sum e_i^2 = \sum (y_i - \hat{y}_i)^2 \)
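For simple regression, minimizing RSS has a well-known closed-form solution: the slope is the sum of cross deviations divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means. The sketch below applies it to the age and weight data from the correlation example, purely as an illustration. The function names are hypothetical.

```python
from statistics import mean

def least_squares(xs, ys):
    """Closed-form least squares estimates (b0, b1) for y = b0 + b1 * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def rss(b0, b1, xs, ys):
    """Residual sum of squares for a candidate line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Age (years) and weight (kg) of the 6 children, reused as example data.
age = [7, 6, 8, 5, 6, 9]
weight = [12, 8, 12, 10, 11, 13]

b0, b1 = least_squares(age, weight)
best = rss(b0, b1, age, weight)
print(f"intercept = {b0:.4f}, slope = {b1:.4f}, RSS = {best:.4f}")

# Any other line gives a larger RSS, which is what "least squares" means.
print(rss(b0 + 0.5, b1, age, weight) > best)
print(rss(b0, b1 - 0.2, age, weight) > best)
```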
[Figure: scatter plot of the house data; horizontal axis: Square Feet.]
SPSS OUTPUT
Regression Statistics
Multiple R 0.76211
R Square 0.58082
Adjusted R Square 0.52842
Standard Error 41.33032
Observations 10
ANOVA
df SS MS F Significance F
Regression 1 18934.9348 18934.9348 11.0848 0.01039
Residual 8 13665.5652 1708.1957
Total 9 32600.5000
\[ \hat{y}_i = 98.24833 + 0.10977\, x_i \]
▪ \( \hat{\beta}_0 \) is the estimated average value of Y when the value of
X is zero (if X = 0 is in the range of observed X values).
Here, no houses had 0 square feet, so \( \hat{\beta}_0 = 98.24833 \) just
indicates that, for houses within the range of sizes
observed, $98,248.33 is the portion of the house price not
explained by square feet.
▪ Here, \( \hat{\beta}_1 = 0.10977 \) tells us that the value of a
house increases by 0.10977 × $1000 = $109.77, on average,
for each additional square foot of size (price is measured in $1000s).
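The fitted equation can also be used for prediction. The sketch below plugs a size into the estimated line. The 2000 square foot house is a hypothetical input, assumed to lie within the range of observed sizes, and the function name is illustrative.

```python
# Estimated coefficients from the fitted equation; price is in $1000s.
B0_HAT = 98.24833
B1_HAT = 0.10977

def predict_price_thousands(square_feet):
    """Predicted house price in $1000s for a given size in square feet."""
    return B0_HAT + B1_HAT * square_feet

size = 2000  # hypothetical house size, assumed inside the observed range
price = predict_price_thousands(size)
print(f"{size} sq ft -> predicted price ${price * 1000:,.2f}")

# One extra square foot adds the slope, about $109.77, to the prediction.
extra = predict_price_thousands(size + 1) - price
print(f"one more sq ft adds ${extra * 1000:.2f}")
```

Predictions for sizes far outside the observed range should not be trusted, for the same reason the intercept has no literal meaning here.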
ANALYSIS OF VARIANCE
\[ R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \]
• \( 0 \le R^2 \le 1 \)
• \( R^2 \) measures the proportion of variation in the dependent
variable that is explained by variation in the independent
variable
• \( R^2 = 0 \): horizontal regression line
• \( R^2 = 1 \): all points lie on a straight line
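As a check, the figures under Regression Statistics in the SPSS output can be reproduced from the ANOVA sums of squares. The sketch below assumes the standard definitions of adjusted R², F, and the standard error of the regression, which the slides do not state. Here k = 2 counts the estimated coefficients (intercept and slope).

```python
from math import sqrt

# Sums of squares from the ANOVA table in the SPSS output.
ssr = 18934.9348   # regression (explained) sum of squares
sse = 13665.5652   # residual (unexplained) sum of squares
sst = 32600.5000   # total sum of squares
n, k = 10, 2       # observations; estimated coefficients (assumed: b0 and b1)

r2 = ssr / sst
r2_alt = 1 - sse / sst                      # same value, second form
multiple_r = sqrt(r2)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)   # standard adjusted R-squared
mse = sse / (n - k)                         # residual mean square
f_stat = (ssr / (k - 1)) / mse
std_error = sqrt(mse)                       # standard error of the regression

print(f"R Square          {r2:.5f}")
print(f"Multiple R        {multiple_r:.5f}")
print(f"Adjusted R Square {adj_r2:.5f}")
print(f"F                 {f_stat:.4f}")
print(f"Standard Error    {std_error:.5f}")
```

About 58% of the variation in house price is explained by variation in square feet.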
COEFFICIENT OF DETERMINATION
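\[ R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \qquad (0 \le R^2 \le 1) \]
[Figure: four scatter plots of Y against X illustrating \( R^2 = 1 \), high \( R^2 \), low \( R^2 \), and \( R^2 \approx 0 \).]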
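T-TEST FOR COEFFICIENT
Test statistic: \( T = \dfrac{\hat{\beta}_1 - b}{SE(\hat{\beta}_1)} \)
Hypotheses                                            Critical value                    Reject H0 if
\( H_0: \beta_1 = b \), \( H_1: \beta_1 > b \)        \( t^{(n-k)}_{\alpha} \)          \( T > t^{(n-k)}_{\alpha} \)
\( H_0: \beta_1 = b \), \( H_1: \beta_1 < b \)        \( -t^{(n-k)}_{\alpha} \)         \( T < -t^{(n-k)}_{\alpha} \)
\( H_0: \beta_1 = b \), \( H_1: \beta_1 \ne b \)      \( t^{(n-k)}_{\alpha/2} \)        \( |T| > t^{(n-k)}_{\alpha/2} \)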
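A sketch of the two-sided test of \( H_0: \beta_1 = 0 \) for the house example follows. The SPSS output shown here does not list \( SE(\hat{\beta}_1) \), so the code recovers it from the identity T² = F, which holds in simple regression when b = 0. The critical value 2.306 is the t-table entry for 8 degrees of freedom at α/2 = 0.025, and the 5% significance level is an assumption made for the illustration.

```python
from math import sqrt

b1_hat = 0.10977     # estimated slope from the fitted equation
f_stat = 11.0848     # F from the ANOVA table
n, k = 10, 2         # observations; estimated coefficients (assumed: b0 and b1)

# In simple regression, T for H0: beta1 = 0 satisfies T^2 = F,
# and T takes the sign of the estimated slope (positive here).
t_for_zero = sqrt(f_stat)
se_b1 = b1_hat / t_for_zero   # implied standard error of the slope

def t_statistic(b_hat, b, se):
    """T = (b_hat - b) / SE(b_hat)."""
    return (b_hat - b) / se

T_CRIT = 2.306   # t-table value, n - k = 8 df, alpha/2 = 0.025 (assumed alpha = 5%)

t_stat = t_statistic(b1_hat, 0.0, se_b1)
reject_h0 = abs(t_stat) > T_CRIT

print(f"SE(b1) = {se_b1:.5f}, T = {t_stat:.4f}, df = {n - k}")
print("reject H0: beta1 = 0 ->", reject_h0)
```

The conclusion agrees with Significance F = 0.01039 in the output, which is below 0.05.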
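CONFIDENCE INTERVAL OF COEFFICIENT
• At confidence level \( (1 - \alpha) \):
\[ \hat{\beta}_j - t^{(n-k)}_{\alpha/2}\, SE(\hat{\beta}_j) < \beta_j < \hat{\beta}_j + t^{(n-k)}_{\alpha/2}\, SE(\hat{\beta}_j) \]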
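The interval for the slope of the house example can be sketched as follows, assuming a 95% confidence level. The standard error is not printed in the output shown here, so it is recovered from T² = F (valid for simple regression with b = 0). The critical value 2.306 is the t-table entry for 8 degrees of freedom at α/2 = 0.025.

```python
from math import sqrt

b1_hat = 0.10977                 # estimated slope
f_stat = 11.0848                 # F from the ANOVA table
se_b1 = b1_hat / sqrt(f_stat)    # implied SE, using T^2 = F in simple regression
T_CRIT = 2.306                   # t-table value, 8 df, alpha/2 = 0.025 (assumed 95% level)

margin = T_CRIT * se_b1
lower = b1_hat - margin
upper = b1_hat + margin

print(f"95% CI for beta1: ({lower:.4f}, {upper:.4f})")
# Price is in $1000s, so convert to dollars per square foot.
print(f"in dollars per sq ft: (${lower * 1000:.2f}, ${upper * 1000:.2f})")
```

The interval is roughly (0.034, 0.186) and does not contain 0, which is consistent with rejecting \( H_0: \beta_1 = 0 \) at the 5% level.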