
Department of Statistics
CENTRAL LUZON STATE UNIVERSITY

Introduction to Correlation and Linear Regression Analysis
Glyzel Grace M. Francisco
STAT1200 – Management Science
2nd Semester, 2022-2023

Learning Outcomes (slide 2)

At the end of this lesson, you will be able to:
1. Construct a scatterplot for bivariate data;
2. Compute the correlation coefficient and the estimates of the parameters of the regression model; and
3. Test the significance of the correlation coefficient and regression parameters.

Correlation (slide 3)

Linear correlation analysis
a statistical technique used to determine the direction and the strength (degree) of the linear relationship between two variables

Assumptions:
• The sample of paired (X,Y) data is a random sample.
• The pairs of (X,Y) data have a bivariate normal distribution. (For any fixed value of X, the corresponding values of Y have a distribution that is bell shaped, and for any fixed value of Y, the corresponding values of X have a distribution that is bell shaped.)

Correlation (slide 4)

Scatter Diagram / Scatter plot
a type of mathematical diagram using Cartesian coordinates to display values for two variables for a set of data

Correlation (slide 5)

Pearson Product Moment Correlation
It is appropriate when both variables are measured at an interval level.

Other Correlations
If you have two ordinal variables, you could use the Spearman rank order correlation (rho) or the Kendall rank order correlation (tau).
When one measure is a continuous interval-level variable and the other is dichotomous (i.e., two-category), you can use the Point-Biserial Correlation.

Types of Correlation (slide 6)

1. Positive linear correlation – if one variable increases, the other variable also increases; or if one variable decreases, the other also decreases.

2. Negative linear correlation – if one variable increases, the other variable decreases; or if one variable decreases, the other increases.

Types of Correlation (slide 7)

3. No linear correlation – no linear relationship exists between the two variables.

Note: This does not imply that the two variables are unrelated. The relationship may still be nonlinear, for example exponential or logarithmic.

Correlation (slide 8)

The degree of linear relationship in the population is measured by the population correlation coefficient ρ.

Pearson Product Moment Correlation Coefficient r
Measures the strength of the linear relationship between the paired X and Y values in a sample.

$$ r = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{\left[n\sum X_i^2 - \left(\sum X_i\right)^2\right]\left[n\sum Y_i^2 - \left(\sum Y_i\right)^2\right]}} $$

Correlation (slide 9)

Interpretations of linear correlation coefficient r

Value of r (in absolute value)   Interpretation
0.01 to 0.20                     Very weak linear relationship
0.21 to 0.40                     Weak linear relationship
0.41 to 0.70                     Moderate linear relationship
0.71 to 0.90                     Strong linear relationship
0.91 to 0.99                     Very strong linear relationship
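The interpretation table can be read as a simple lookup. The sketch below is one possible encoding, not part of the slides: the function name `interpret_r` is hypothetical, the table is assumed to apply to the absolute value of r with the sign reported separately, and the labels for r = 0 and |r| = 1 (which the table does not list) are assumptions. The input values are arbitrary illustrations.

```python
def interpret_r(r: float) -> str:
    """Return the verbal label for a sample correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    direction = "positive" if r > 0 else "negative"
    # Assumed labels for the two endpoints, which the table does not list.
    if size == 0.0:
        return "no linear relationship"
    if size == 1.0:
        return f"perfect {direction} linear relationship"
    # Upper bound of each band in the table. Gaps such as 0.20 to 0.21 are
    # closed by using the first band whose upper bound is not exceeded.
    bands = [
        (0.20, "very weak"),
        (0.40, "weak"),
        (0.70, "moderate"),
        (0.90, "strong"),
        (1.00, "very strong"),
    ]
    strength = next(label for upper, label in bands if size <= upper)
    return f"{strength} {direction} linear relationship"


# Arbitrary illustrative values, not data from the lesson.
for value in (-0.95, 0.10, 0.55):
    print(value, "->", interpret_r(value))
```

The cut-offs are conventions rather than fixed rules, so the labels are best treated as a rough guide.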
Correlation (slide 10)

Properties of the Linear Correlation Coefficient r

1. The value of r is always between -1 and 1 inclusive. That is, -1 ≤ r ≤ 1.
2. The value of r does not change if all values of either variable are converted to a different scale.
3. The value of r is not affected by the choice of x or y. Interchange all x and y values and the value of r will not change.
4. r measures the strength of a linear relationship. It is not designed to measure the strength of a relationship that is not linear.

Correlation (slide 11)

Formal hypothesis test

Two-tailed:
Ho : ρ = 0 (There is no significant linear relationship/correlation.)
Ha : ρ ≠ 0 (There is a significant linear relationship/correlation.)

Right-tailed:
Ho : ρ ≤ 0
Ha : ρ > 0 (There is a positive linear relationship/correlation.)

Left-tailed:
Ho : ρ ≥ 0
Ha : ρ < 0 (There is a negative linear relationship/correlation.)

Test statistic:

$$ t_c = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} $$

Decision Rule:
Two-tailed:    Reject Ho if |t_c| > t(α/2, n−2)
Right-tailed:  Reject Ho if t_c > t(α, n−2)
Left-tailed:   Reject Ho if t_c < −t(α, n−2)

Correlation Example 1 (slide 12)

The table below shows the number of absences, x, in a Statistics class and the final exam grade, y, for 7 students. Find the correlation coefficient and interpret your result.

Student   X   Y
1         1   95
2         0   90
3         2   90
4         6   55
5         4   70
6         3   80
7         3   85
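Properties 1 to 3 can be checked numerically on the Example 1 data. The following is a minimal sketch: the helper name `pearson_r` is hypothetical, and dividing the grades by 100 is just one assumed change of scale. Property 2 holds for rescaling by a positive factor or adding a constant; multiplying one variable by a negative number would flip the sign of r.

```python
from math import isclose, sqrt


def pearson_r(x, y):
    """Pearson r computed with the sums formula used in this lesson."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xx = sum(v * v for v in x)
    sum_yy = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator


# Example 1 data: number of absences and final exam grade for 7 students.
absences = [1, 0, 2, 6, 4, 3, 3]
grades = [95, 90, 90, 55, 70, 80, 85]

r = pearson_r(absences, grades)
# Property 2: express grades as proportions instead of percentages.
r_rescaled = pearson_r(absences, [g / 100 for g in grades])
# Property 3: interchange the roles of x and y.
r_swapped = pearson_r(grades, absences)

print(f"r = {r:.4f}")
print("within [-1, 1]:", -1 <= r <= 1)
print("unchanged by rescaling:", isclose(r, r_rescaled, rel_tol=1e-9))
print("unchanged by swapping x and y:", isclose(r, r_swapped, rel_tol=1e-9))
```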

Correlation Example 1 (slide 13)

Student   X           Y            X²          Y²             XY
1         1           95           1           9025           95
2         0           90           0           8100           0
3         2           90           4           8100           180
4         6           55           36          3025           330
5         4           70           16          4900           280
6         3           80           9           6400           240
7         3           85           9           7225           255
Total     ΣX = 19     ΣY = 565     ΣX² = 75    ΣY² = 46775    ΣXY = 1380

$$ r = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{\left[n\sum X_i^2 - \left(\sum X_i\right)^2\right]\left[n\sum Y_i^2 - \left(\sum Y_i\right)^2\right]}} = \frac{7(1380) - (19)(565)}{\sqrt{\left[7(75) - 19^2\right]\left[7(46775) - 565^2\right]}} = \frac{-1075}{1159.6551} = -0.9270 $$

Hence, the relationship of the number of absences (x) and the final exam grade (y) is a VERY STRONG NEGATIVE LINEAR RELATIONSHIP.

Correlation Example 2 (slide 14)

The data below consist of weights (in pounds) of discarded food and sizes of households. Test if there is a significant relationship between the two variables.

Household no.     1      2      3      4      5      6      7      8
Food              1.04   3.68   4.43   2.98   6.30   1.46   8.82   9.62
Household Size    2      3      3      6      4      2      1      5

Correlation Example 2 (slide 15)

Note: in correlation, y and x may interchange.

Household no.         1        2         3         4        5       6        7         8         Total
Food (X)              1.04     3.68      4.43      2.98     6.30    1.46     8.82      9.62      ΣX = 38.3300
Household Size (Y)    2        3         3         6        4       2        1         5         ΣY = 26
X²                    1.0816   13.5424   19.6249   8.8804   39.69   2.1316   77.7924   92.5444   ΣX² = 255.2877
Y²                    4        9         9         36       16      4        1         25        ΣY² = 104
XY                    2.08     11.04     13.29     17.88    25.2    2.92     8.82      48.1      ΣXY = 129.3300

$$ r = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{\left[n\sum X_i^2 - \left(\sum X_i\right)^2\right]\left[n\sum Y_i^2 - \left(\sum Y_i\right)^2\right]}} = \frac{8(129.3300) - (38.3300)(26)}{\sqrt{\left[8(255.2877) - 38.3300^2\right]\left[8(104) - 26^2\right]}} = \frac{38.0600}{299.0077} = 0.1273 $$

Hence, the relationship of the weights (in pounds) of discarded food (x) and sizes of households (y) is a VERY WEAK POSITIVE LINEAR RELATIONSHIP.
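The two hand computations above can be reproduced directly from the column totals. This is a minimal sketch; the function name `pearson_from_sums` and its argument names are hypothetical, and the totals are copied from the Example 1 and Example 2 tables.

```python
from math import sqrt


def pearson_from_sums(n, sum_x, sum_y, sum_xx, sum_yy, sum_xy):
    """Pearson r from the five column totals of the computation table."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator


# Example 1: absences (X) and final exam grade (Y), n = 7.
r_example1 = pearson_from_sums(7, 19, 565, 75, 46775, 1380)

# Example 2: discarded food in pounds (X) and household size (Y), n = 8.
r_example2 = pearson_from_sums(8, 38.33, 26, 255.2877, 104, 129.33)

print(f"Example 1: r = {r_example1:.4f}")
print(f"Example 2: r = {r_example2:.4f}")
```

Rounded to four decimal places, the results agree with the values of r obtained by hand.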

Formal Hypothesis Test Example 1 (slide 16)

Test whether there is a significant linear relationship/correlation between the number of absences (x) and the final exam grade (y) at the 5% level (Example 1 in Correlation).

1. Ho : ρ = 0 (There is no significant linear relationship/correlation.)
   Ha : ρ ≠ 0 (There is a significant linear relationship/correlation.) -> claim

2. Test statistic: t-test
   tail: two-tailed

Tail of distribution   Null hypothesis (Ho)   Alternative hypothesis (Ha)
Two-tailed             ρ = 0                  ρ ≠ 0
Right-tailed           ρ ≤ 0                  ρ > 0
Left-tailed            ρ ≥ 0                  ρ < 0

Formal Hypothesis Test Example 1 (slide 17)

3. Decision Rule
Reject Ho if |t_c| > t(α/2, n−2)
α = 0.05, n = 7, r = −0.9270 (see slide 13)
t(α/2, df) = t(0.05/2, 5) = 2.571 (see next slide)
Reject Ho if |t_c| > 2.571

[Figure: t distribution with rejection regions of area 0.05/2 in each tail, bounded by −t(0.05/2, 5) = −2.571 and t(0.05/2, 5) = 2.571]

4. Computation

$$ t_c = \frac{-0.9270}{\sqrt{\dfrac{1 - (-0.9270)^2}{7 - 2}}} = -5.5267 $$

(slide 18)

[t-distribution table lookup: α = 0.05, Tail: two-tailed, df = n−2 = 7−2 = 5, giving t(0.05/2, 5) = 2.571]
Formal Hypothesis Test Example 1 (slide 19)

5. Decision
Since |t_c| = |−5.5267| = 5.5267 > 2.571, we reject Ho.

[Figure: t distribution with rejection regions of area 0.05/2 in each tail, bounded by −t(0.05/2, 5) = −2.571 and t(0.05/2, 5) = 2.571]

6. Conclusion
At the 5% level of significance, the sample data support the claim that there is a significant linear relationship/correlation between the number of absences (x) and the final exam grade (y).
(see next slide for the wording of final conclusion)

(slide 20)

[Flowchart: wording of the final conclusion]
Claim: ρ ≠ 0
Decision: Reject Ho

Formal Hypothesis Test Example 2 (slide 21)

Test whether there is a positive linear relationship between the weights in pounds of discarded food (x) and sizes of households (y) at the 5% level (Example 2 in Correlation).

1. Ho : ρ ≤ 0
   Ha : ρ > 0 (There is a positive linear relationship/correlation.) -> claim

2. Test statistic: t-test
   tail: right-tailed

Tail of distribution   Null hypothesis (Ho)   Alternative hypothesis (Ha)
Two-tailed             ρ = 0                  ρ ≠ 0
Right-tailed           ρ ≤ 0                  ρ > 0
Left-tailed            ρ ≥ 0                  ρ < 0

Formal Hypothesis Test Example 2 (slide 22)

3. Decision Rule
Reject Ho if t_c > t(α, n−2)
α = 0.05, n = 8, r = 0.1273 (see slide 15)
t(α, df) = t(0.05, 6) = 1.943 (see next slide)
Reject Ho if t_c > 1.943

[Figure: t distribution with a rejection region of area 0.05 in the right tail, bounded by t(0.05, 6) = 1.943]

4. Computation

$$ t_c = \frac{0.1273}{\sqrt{\dfrac{1 - (0.1273)^2}{8 - 2}}} = 0.3144 $$

(slide 23)

[t-distribution table lookup: α = 0.05, Tail: one-tailed, df = n−2 = 8−2 = 6, giving t(0.05, 6) = 1.943]

Formal Hypothesis Test Example 2 (slide 24)

5. Decision
Since 0.3144 < 1.943, we fail to reject Ho.

[Figure: t distribution with a rejection region of area 0.05 in the right tail, bounded by t(0.05, 6) = 1.943]

6. Conclusion
At the 5% level of significance, there is not sufficient sample evidence to support the claim that there is a positive linear relationship between the weights in pounds of discarded food (x) and sizes of households (y).
(see slide 25 for the wording of final conclusion)
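Both formal tests follow the same steps, so they can be sketched with two small helpers. The names `t_statistic` and `reject_h0` are hypothetical. The critical values 2.571 and 1.943 are the t-table values quoted in the slides and are passed in by hand, since this sketch does not look up the t distribution itself.

```python
from math import sqrt


def t_statistic(r, n):
    """Test statistic t_c = r / sqrt((1 - r^2) / (n - 2))."""
    return r / sqrt((1 - r ** 2) / (n - 2))


def reject_h0(t_c, t_critical, tail):
    """Apply the decision rule; t_critical is the positive table value."""
    if tail == "two":
        return abs(t_c) > t_critical
    if tail == "right":
        return t_c > t_critical
    if tail == "left":
        return t_c < -t_critical
    raise ValueError("tail must be 'two', 'right' or 'left'")


# Example 1: two-tailed test, n = 7, r = -0.9270, t(0.05/2, 5) = 2.571.
t1 = t_statistic(-0.9270, 7)
decision1 = reject_h0(t1, 2.571, "two")

# Example 2: right-tailed test, n = 8, r = 0.1273, t(0.05, 6) = 1.943.
t2 = t_statistic(0.1273, 8)
decision2 = reject_h0(t2, 1.943, "right")

print(f"Example 1: t_c = {t1:.4f}, reject Ho: {decision1}")
print(f"Example 2: t_c = {t2:.4f}, reject Ho: {decision2}")
```

Note that the two-tailed rule compares the absolute value of t_c with the critical value, which is why a large negative statistic still leads to rejection in Example 1.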

(slide 25)

[Flowchart: wording of the final conclusion]
Claim: ρ > 0
Decision: Failed to reject Ho

Regression (slide 26)

Regression Analysis
▪ Used to predict the value of a dependent variable based on the value of at least one independent variable
▪ Used to explain the impact of changes in an independent variable on the dependent variable.

Dependent variable
the variable we wish to explain

Independent variable
the variable used to explain the dependent variable

Regression (slide 27)

Simple Linear Regression
composed of two components: a non-random component (the line itself) and a purely random component (the error term)

The population regression model:

$$ y = \beta_0 + \beta_1 X + \varepsilon $$

where
y   is the dependent variable
β0  is the population y intercept
β1  is the population slope coefficient
X   is the independent variable
ε   is the random error term
Regression (slide 28)

Estimated Regression Model by Least Squares Method

$$ \hat{y}_i = b_0 + b_1 X $$

where
ŷ   is the estimated (or predicted) y value
b0  is the estimate of the regression intercept
b1  is the estimate of the regression slope
X   is the independent variable

Interpretation of the slope and the intercept
▪ b0 is the estimated average value of y when the value of x is zero
▪ b1 is the estimated change (increase or decrease) in the average value of y as a result of a one-unit increase in x

Regression (slide 29)

The formulas for b1 and b0

$$ b_1 = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum X_i^2 - \left(\sum X_i\right)^2} $$

$$ b_0 = \bar{y} - b_1 \bar{x} $$

Regression (slide 30)

In predicting a value of ŷ based on some given value of x
1. If there is no significant linear correlation, the best predicted y-value is ȳ.
2. If there is a significant linear correlation, the best predicted y-value is found by substituting the x-value into the regression equation.
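The formulas for b1 and b0 translate directly into code. The sketch below uses the hypothetical function name `least_squares` and applies it to the absences and final exam grade data from Correlation Example 1.

```python
def least_squares(x, y):
    """Return (b0, b1) for the fitted line y_hat = b0 + b1 * x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xx = sum(v * v for v in x)
    sum_xy = sum(a * b for a, b in zip(x, y))
    # Slope: b1 = (n*Sum(XY) - Sum(X)*Sum(Y)) / (n*Sum(X^2) - (Sum(X))^2)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    # Intercept: b0 = y_bar - b1 * x_bar
    b0 = sum_y / n - b1 * (sum_x / n)
    return b0, b1


# Data from Correlation Example 1.
absences = [1, 0, 2, 6, 4, 3, 3]
grades = [95, 90, 90, 55, 70, 80, 85]

b0, b1 = least_squares(absences, grades)
print(f"b1 = {b1:.4f}")
print(f"b0 = {b0:.3f}")
```

Because the code keeps full precision throughout, the intercept can differ in the fourth decimal place from a hand computation that rounds the slope and the means first.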

Regression (slide 31)

Notes:
▪ Regression line is the graph of the regression equation. It is called the line of best fit.
▪ Least squares method is an objective and efficient method of determining the best-fitting straight line for the linear relationship between X and Y in the scatterplot.

Regression (slide 32)

Guidelines for using the Regression Equation
▪ If there is no significant linear correlation, do not use the regression equation to make predictions.
▪ When using the regression equation for predictions, stay within the scope of the available sample data.
▪ A regression equation based on old data is not necessarily valid now.
▪ Do not make predictions about a population that is different from the population from which the sample data were drawn.

Regression (slide 33)

Coefficient of Determination R²
proportion of the variation in y that is explained by the regression line

$$ R^2 = r^2 $$

Regression Example 1 (slide 34)

The table below shows the number of absences, x, in a Statistics class and the final exam grade, y, for 7 students. Find the following and interpret:
a. b1
b. b0
c. Regression equation
d. Estimate the final exam grade when the number of absences is 4
e. R²

Student   X   Y
1         1   95
2         0   90
3         2   90
4         6   55
5         4   70
6         3   80
7         3   85

Regression Example 1 (slide 35)

Student   X           Y            X²          Y²             XY
1         1           95           1           9025           95
2         0           90           0           8100           0
3         2           90           4           8100           180
4         6           55           36          3025           330
5         4           70           16          4900           280
6         3           80           9           6400           240
7         3           85           9           7225           255
Total     ΣX = 19     ΣY = 565     ΣX² = 75    ΣY² = 46775    ΣXY = 1380

x̄ = 2.7143, ȳ = 80.7143

a.
$$ b_1 = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum X_i^2 - \left(\sum X_i\right)^2} = \frac{7(1380) - (19)(565)}{7(75) - 19^2} = -6.5549 $$

Interpretation: the average final exam grade decreases by an estimated 6.5549 for every additional absence (a decrease, since b1 is negative). (see slide 28)

Regression Example 1 (slide 36)

x̄ = 2.7143, ȳ = 80.7143, b1 = −6.5549, b0 = 98.5063

b.
$$ b_0 = \bar{y} - b_1 \bar{x} = 80.7143 - (-6.5549)(2.7143) = 98.5063 $$
Interpretation: the estimated average value of the final exam grade is 98.5063 when the number of absences is zero. (see slide 28)

c. Regression equation
$$ \hat{y}_i = b_0 + b_1 X \quad\Rightarrow\quad \hat{y} = 98.5063 - 6.5549X $$

d. Estimate the final exam grade when the number of absences is 4
Substitute x = 4 in the regression equation:
$$ \hat{y} = 98.5063 - 6.5549(4) = 72.2867 $$
Interpretation: the estimated final exam grade is 72.2867 when the number of absences is 4.

e.
$$ R^2 = r^2 = (-0.9270)^2 = 0.8593 \text{ or } 85.93\% $$
Interpretation: 85.93% is the proportion of the variation in the final exam grade that is explained by the regression line. (see slide 33)
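Part (e) relies on the identity R² = r². A quick numerical check on the Example 1 data is sketched below. It assumes the usual definition of "proportion of variation explained", R² = 1 − SSE/SST, which the slides do not spell out, and it computes the slope in deviation form, which is algebraically equivalent to the sums formula used above.

```python
from math import isclose, sqrt

# Regression Example 1 data: absences (x) and final exam grade (y).
absences = [1, 0, 2, 6, 4, 3, 3]
grades = [95, 90, 90, 55, 70, 80, 85]

n = len(absences)
x_bar = sum(absences) / n
y_bar = sum(grades) / n

# Sums of squares and cross-products about the means.
s_xx = sum((x - x_bar) ** 2 for x in absences)
s_yy = sum((y - y_bar) ** 2 for y in grades)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(absences, grades))

b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# SSE: variation left unexplained by the fitted line; SST: total variation in y.
fitted = [b0 + b1 * x for x in absences]
sse = sum((y - f) ** 2 for y, f in zip(grades, fitted))
sst = s_yy

r_squared = 1 - sse / sst
r = s_xy / sqrt(s_xx * s_yy)

print(f"R^2 from the fitted line: {r_squared:.4f}")
print(f"r^2 from the correlation: {r ** 2:.4f}")
print("equal up to rounding:", isclose(r_squared, r ** 2, rel_tol=1e-9))
```

The identity holds for simple linear regression with one independent variable, which is the only case covered in this lesson.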
Regression Example 2 (slide 37)

The data below consist of weights (in pounds) of discarded food and sizes of households. Find the following and interpret:
a. b1
b. b0
c. Regression equation
d. Estimate the weight (in pounds) of discarded food when the household size is 5
e. R²

Household no.     1      2      3      4      5      6      7      8
Food              1.04   3.68   4.43   2.98   6.30   1.46   8.82   9.62
Household Size    2      3      3      6      4      2      1      5

Regression Example 2 (slide 38)

Household no.         1        2         3         4        5       6        7         8         Total
Food (Y)              1.04     3.68      4.43      2.98     6.30    1.46     8.82      9.62      ΣY = 38.3300
Household Size (X)    2        3         3         6        4       2        1         5         ΣX = 26
Y²                    1.0816   13.5424   19.6249   8.8804   39.69   2.1316   77.7924   92.5444   ΣY² = 255.2877
X²                    4        9         9         36       16      4        1         25        ΣX² = 104
XY                    2.08     11.04     13.29     17.88    25.2    2.92     8.82      48.1      ΣXY = 129.3300

a.
$$ b_1 = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum X_i^2 - \left(\sum X_i\right)^2} = \frac{8(129.3300) - (26)(38.3300)}{8(104) - 26^2} = 0.2440 $$

Interpretation: the average weight (in pounds) of discarded food increases by an estimated 0.2440 for every additional household member (an increase, since b1 is positive). (see slide 28)

Regression Example 2 (slide 39)

x̄ = 3.25, ȳ = 4.7913, b1 = 0.2440, b0 = 3.9983

b.
$$ b_0 = \bar{y} - b_1 \bar{x} = 4.7913 - (0.2440)(3.25) = 3.9983 $$
Interpretation: the estimated average weight (in pounds) of discarded food is 3.9983 when the size of the household is zero. (see slide 28)

c. Regression equation
$$ \hat{y}_i = b_0 + b_1 X \quad\Rightarrow\quad \hat{y} = 3.9983 + 0.2440X $$

d. Estimate the weight of discarded food when the size of household is 5
Substitute x = 5 in the regression equation:
$$ \hat{y} = 3.9983 + 0.2440(5) = 5.2183 $$
Interpretation: the estimated weight (in pounds) of discarded food is 5.2183 when the size of the household is 5.

e.
$$ R^2 = r^2 = (0.1273)^2 = 0.0162 \text{ or } 1.62\% $$
Interpretation: 1.62% is the proportion of the variation in the weights (in pounds) of discarded food that is explained by the regression line. (see slide 33)
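The prediction rule from slide 30 and the guidelines from slide 32 can be combined into one helper. The sketch below is only an illustration: the function name `best_prediction` is hypothetical, and the `significant` flag is set to False on the assumption that the outcome of Formal Hypothesis Test Example 2 (fail to reject Ho at the 5% level) is the relevant significance result for these data.

```python
def fit_line(x, y):
    """Least squares estimates (b0, b1) using the sums formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xx = sum(v * v for v in x)
    sum_xy = sum(a * b for a, b in zip(x, y))
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    b0 = sum_y / n - b1 * (sum_x / n)
    return b0, b1


def best_prediction(x_new, x, y, significant):
    """Use y-bar if the correlation is not significant, else the fitted line."""
    if not significant:
        return sum(y) / len(y)
    # Guideline: stay within the scope of the available sample data.
    if not min(x) <= x_new <= max(x):
        raise ValueError("x_new is outside the range of the sample data")
    b0, b1 = fit_line(x, y)
    return b0 + b1 * x_new


# Regression Example 2 data: household size (x) and discarded food in pounds (y).
household_size = [2, 3, 3, 6, 4, 2, 1, 5]
food = [1.04, 3.68, 4.43, 2.98, 6.30, 1.46, 8.82, 9.62]

b0, b1 = fit_line(household_size, food)
from_equation = b0 + b1 * 5
recommended = best_prediction(5, household_size, food, significant=False)

print(f"regression equation at x = 5: {from_equation:.3f}")
print(f"best prediction under the slide 30 rule: {recommended:.3f}")
```

Under that assumption, the rule points to the sample mean of about 4.79 pounds rather than the value from the regression equation, so the estimate in part (d) is best read as an exercise in substitution.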
