
STQT401

SIMPLE LINEAR REGRESSION AND CORRELATION
Regression analysis
The nature of the relationship between 2 variables can take many forms, from simple
ones to extremely complicated mathematical functions

Used primarily for the purpose of prediction

The simplest relationship is a straight-line, or linear, relationship

Development of a statistical model that can be used to predict the values of a dependent
or response variable from the values of at least one explanatory or independent variable

Dependent variable
• The variable we wish to predict or explain

Independent variable
• The variable used to predict or explain the dependent variable

Scatter diagram
• Visualises the relationship between the variables (the independent variable on the horizontal X axis and the dependent variable on the vertical Y axis)
• Helps suggest a starting point for regression analysis
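[Figure: positive straight-line relationship between X (independent variable, horizontal axis) and Y (dependent variable, vertical axis); the line meets the Y axis at the intercept a, and its slope is ∆Y/∆X, the change in Y divided by the change in X]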


Simple Linear regression model

Only one independent variable (X)

Relationship between X and Y is described by a linear function

Changes in Y are assumed to be related to changes in X

$$Y_c = a + bX$$

Y_c = Predicted value of Y for a given value of X

a = Y intercept (estimate of the population Y intercept)

b = Slope: the average change in Y_c for each one-unit increase in X (estimate of the population slope coefficient)

X = Independent Variable
Types of relationships

[Figure: example scatter plots showing a positive linear relationship, a negative linear relationship, a weak linear relationship, a curvilinear relationship, and no relationship]
Correlation Analysis

Used to measure the strength of the association between variables

The objective is not to use one variable to predict another, but rather to measure the
strength of the association or covariation that exists between two continuous
variables

Coefficient of correlation (r): measured on a scale from −1 to +1.

When r = 0, there is no linear relationship.

For r → +1, there is a strong positive relationship between the variables, i.e., as x
increases, y also increases.

For r → -1, there is a strong negative relationship between the variables, i.e., as x
increases, y decreases, and vice versa.
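As a quick check on the scale described above, the sketch below computes Pearson's r from deviations about the means for three small made-up data sets (the data and the helper name `pearson_r` are illustrative assumptions, not from the slides). A perfect straight-line increase gives r = +1, a perfect straight-line decrease gives r = −1, and a pattern with no linear trend gives r = 0 even though Y is not constant.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed from deviations about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2 * xi + 1 for xi in x]))   # perfect positive: 1.0
print(pearson_r(x, [10 - 3 * xi for xi in x]))  # perfect negative: -1.0
print(pearson_r(x, [2, 4, 3, 4, 2]))            # no linear trend: 0.0
```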
Association

[Figure: three scatter plots of Y against X illustrating perfect positive correlation, perfect negative correlation and no correlation]

Perfect positive correlation
• When one variable changes, the other variable changes in the same direction
• Y increases in a perfectly predictable manner as X increases

Perfect negative correlation
• When one variable changes, the other variable changes in the opposite direction
• Y decreases in a perfectly predictable manner as X increases

No correlation
• There is no relationship between the variables
• As X increases, there is no systematic change in Y, so there is no association between the values of X and the values of Y
Association

[Figure: example scatter plots labelled with their correlation coefficients: r = 0.00, −0.72, 0.50, −0.96, 0.98 and −0.45]
Procedure for Calculation

1. Collect the data for both the dependent (Y) and independent (X) variables.

2. Arrange the data in two columns X and Y.

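3. Compute Pearson's coefficient of correlation and the coefficient of determination:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

Coefficient of determination = $r^2$

4. Determine the values of a and b:

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} \qquad a = \bar{Y} - b\bar{X}$$

$$\bar{Y} = \frac{\sum Y}{n} \qquad \bar{X} = \frac{\sum X}{n}$$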
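Steps 3 and 4 can be sketched in Python as below. This is a minimal illustration, assuming the data arrive as two equal-length lists; the function name `fit_simple_regression` and the toy data are hypothetical. It uses the same sum formulas as above, so r and b share the numerator n·Σxy − Σx·Σy.

```python
import math

def fit_simple_regression(x, y):
    """Return (r, r_squared, b, a) using the sum formulas for r, b and a."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2 points)")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))

    numerator = n * sum_xy - sum_x * sum_y   # n·Σxy − Σx·Σy
    sxx_term = n * sum_x2 - sum_x ** 2       # n·Σx² − (Σx)²
    syy_term = n * sum_y2 - sum_y ** 2       # n·Σy² − (Σy)²

    r = numerator / math.sqrt(sxx_term * syy_term)
    b = numerator / sxx_term
    a = sum_y / n - b * (sum_x / n)          # a = Ȳ − b·X̄
    return r, r ** 2, b, a

# Hypothetical toy data, for illustration only
r, r2, b, a = fit_simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"r = {r:.4f}, r^2 = {r2:.4f}, b = {b:.4f}, a = {a:.4f}")
# r = 0.7746, r^2 = 0.6000, b = 0.6000, a = 2.2000
```

Computing the five sums in one pass mirrors the hand-calculation table used in the example that follows.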
Example
An engineer wishes to examine the relationship between the length of steel bars (cm) and their weight (kg). A random sample of 10 steel bars is selected.

Length of steel bar, X (cm)   Weight, Y (kg)

1.40                          2.45
1.60                          3.12
1.70                          2.79
1.88                          3.08
1.10                          1.99
1.55                          2.19
2.35                          4.05
2.45                          3.24
1.43                          3.19
1.70                          2.55

a) Compute the coefficients of correlation and determination and interpret your answers.
b) Determine the regression equation and estimate the weight of a steel bar 3 cm long.
c) Test the hypothesis that the coefficient of correlation in the population is zero. Use α = 0.05.
Scatter Plot
[Figure: scatter plot of Weight (kg) against Length (cm) for the 10 sampled steel bars]
Computing the values

X – Length (cm)   Y – Weight (kg)   X²      Y²       XY

1.40              2.45              1.96    6.00     3.43
1.60              3.12              2.56    9.73     4.99
1.70              2.79              2.89    7.78     4.74
1.88              3.08              3.53    9.49     5.79
1.10              1.99              1.21    3.96     2.19
1.55              2.19              2.40    4.80     3.39
2.35              4.05              5.52    16.40    9.52
2.45              3.24              6.00    10.50    7.94
1.43              3.19              2.04    10.18    4.56
1.70              2.55              2.89    6.50     4.34

17.16             28.65             31.02   85.34    50.89   (totals)
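The column totals in the table can be reproduced with a short script like the one below (a sketch; the variable names are illustrative). Values are printed to four decimal places because products of two-decimal numbers are exact at four decimals, which avoids ambiguous last-digit rounding such as 1.70 × 2.55 = 4.335.

```python
# Sample from the example: X = length (cm), Y = weight (kg)
lengths = [1.40, 1.60, 1.70, 1.88, 1.10, 1.55, 2.35, 2.45, 1.43, 1.70]
weights = [2.45, 3.12, 2.79, 3.08, 1.99, 2.19, 4.05, 3.24, 3.19, 2.55]

# One row per bar: X, Y, X², Y², XY
rows = [(x, y, x * x, y * y, x * y) for x, y in zip(lengths, weights)]
for row in rows:
    print("  ".join(f"{value:8.4f}" for value in row))

# Column totals: ΣX, ΣY, ΣX², ΣY², ΣXY
totals = [sum(column) for column in zip(*rows)]
print("  ".join(f"{value:8.4f}" for value in totals))
# totals: 17.1600, 28.6500, 31.0168, 85.3423, 50.8911
```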


Computing r and b

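$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

$$r = \frac{10 \times 50.89 - 17.16 \times 28.65}{\sqrt{\left[10 \times 31.02 - 17.16^2\right]\left[10 \times 85.34 - 28.65^2\right]}} \approx 0.764$$

(0.764 is obtained with the unrounded column totals; the two-decimal totals shown give about 0.763.)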

There is a moderate positive relationship between the length of the steel bars and their weight

$$r^2 = 0.764^2 \approx 0.584$$

About 58% of the variation in the weight of the steel bars can be explained by the variation in their length.
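The arithmetic for r and r² can be checked from the column totals alone. The sketch below (the function name `r_from_sums` is hypothetical) applies the formula both to the unrounded totals (ΣX² = 31.0168, ΣY² = 85.3423, ΣXY = 50.8911) and to the two-decimal totals printed in the table. Because the formula subtracts large, nearly equal numbers, rounding the totals shifts r slightly (about 0.763 instead of 0.764). The unrounded r² is about 0.583, while squaring the rounded r = 0.764 gives the 0.584 quoted above; either way, about 58% of the variation is explained.

```python
import math

def r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson's r from column totals (computational formula)."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r_exact = r_from_sums(10, 17.16, 28.65, 31.0168, 85.3423, 50.8911)  # unrounded totals
r_rounded = r_from_sums(10, 17.16, 28.65, 31.02, 85.34, 50.89)      # totals as printed
print(f"unrounded: r = {r_exact:.4f}, r^2 = {r_exact ** 2:.4f}")
print(f"rounded:   r = {r_rounded:.4f}, r^2 = {r_rounded ** 2:.4f}")
```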

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$

$$b = \frac{10 \times 50.89 - 17.16 \times 28.65}{10 \times 31.02 - 17.16^2} \approx 1.10$$
Computing a

$$\bar{Y} = \frac{\sum Y}{n} = \frac{28.65}{10} = 2.865$$

$$\bar{X} = \frac{\sum X}{n} = \frac{17.16}{10} = 1.716$$

$$a = \bar{Y} - b\bar{X}$$

$$a = 2.865 - (1.10 \times 1.716) \approx 0.977$$
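Likewise, b and a can be recomputed from the totals. The sketch below (the helper `slope_intercept` is hypothetical) uses the unrounded totals; the two-decimal totals give b ≈ 1.097, which also rounds to 1.10.

```python
def slope_intercept(n, sum_x, sum_y, sum_x2, sum_xy):
    """Least-squares slope b and intercept a from column totals."""
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * (sum_x / n)   # a = Ȳ − b·X̄
    return b, a

b, a = slope_intercept(10, 17.16, 28.65, 31.0168, 50.8911)
print(f"b = {b:.2f}, a = {a:.3f}")    # b = 1.10, a = 0.977
```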


Graphical Presentation

[Figure: scatter plot of Weight (kg) against Length (cm) with the fitted regression line; intercept a = 0.977, slope b = 1.1]

$$Y_{\text{Weight}} = a + bX_{\text{Length}} \qquad Y_c = 0.977 + 1.1X$$


Predicting the dependent variable

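Predict the weight of a steel bar that has a length of 3 cm.

$$Y_c = 0.977 + 1.1X$$

$$Y_c = 0.977 + (1.10 \times 3) \approx 4.28$$

The predicted weight of a steel bar 3 cm long is 4.28 kg. Because 3 cm lies outside the observed lengths (1.10 cm to 2.45 cm), this prediction is an extrapolation and should be treated with caution.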
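A prediction is the fitted line evaluated at the new X. The sketch below hard-codes a = 0.977 and b = 1.1 from this example and adds a hypothetical `is_extrapolation` check against the observed length range (1.10 cm to 2.45 cm).

```python
A, B = 0.977, 1.1          # intercept and slope from the example
X_MIN, X_MAX = 1.10, 2.45  # smallest and largest observed lengths (cm)

def predict_weight(length_cm):
    """Predicted weight Y_c = a + bX (kg) for a bar of the given length (cm)."""
    return A + B * length_cm

def is_extrapolation(length_cm):
    """True when the length lies outside the range of the sample data."""
    return not (X_MIN <= length_cm <= X_MAX)

print(round(predict_weight(3), 2), is_extrapolation(3))  # 4.28 True
```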
Hypothesis Testing (Correlation Coefficient (r) for Linear Regression)
State the null hypothesis (H₀)
• H₀: ρ = 0

State the alternative hypothesis (Hₐ)
• Hₐ: ρ ≠ 0

Specify the level of significance (α) to be used for the t test
• Commonly used α values are 0.01, 0.02, 0.05 and 0.10

Critical value t_c
• t_c is the two-tailed critical value of the t distribution with n − 2 degrees of freedom

Decision rule:
• Accept (do not reject) H₀ if −t_c < t < t_c; otherwise reject H₀

t test

$$t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$

• r – sample correlation coefficient
• n – number of observations in the sample
• ρ – population correlation coefficient (ρ = 0 under H₀)

State the decision
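The test statistic and decision rule translate directly into code. This is a sketch (the function names are hypothetical) for the case tested here, H₀: ρ = 0, where this t statistic follows a t distribution with n − 2 degrees of freedom; the critical value t_c is passed in from a t table rather than computed.

```python
import math

def correlation_t(r, n):
    """t = (r - 0) / sqrt((1 - r^2) / (n - 2)) for testing H0: rho = 0."""
    if n <= 2:
        raise ValueError("need n > 2 observations")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r / math.sqrt((1 - r ** 2) / (n - 2))

def two_tailed_decision(t, t_crit):
    """Accept (do not reject) H0 if -t_crit < t < t_crit, otherwise reject."""
    return "do not reject H0" if -t_crit < t < t_crit else "reject H0"

# Hypothetical case: r = 0.3 with n = 12, t_c = 2.228 (alpha = 0.05, df = 10)
print(two_tailed_decision(correlation_t(0.3, 12), 2.228))  # do not reject H0
```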
Example
Test the hypothesis that the coefficient of correlation in the population is zero. Use α = 0.05.

H₀: ρ = 0 (there is no correlation)

Hₐ: ρ ≠ 0 (there is correlation)

Two-tailed t test, α = 0.05, df = n − 2 = 10 − 2 = 8

Decision rule: Accept H₀ if −2.306 < t < 2.306

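Test statistic:

$$t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{0.764 - 0}{\sqrt{\dfrac{1 - 0.584}{10 - 2}}} \approx 3.35$$

(With the unrounded r ≈ 0.7636, t ≈ 3.34.)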

Decision: Since t > 2.306, t falls in the rejection region, so H₀ is rejected

Conclusion: There is a significant linear relationship between the length and the weight of the steel bars (α = 0.05)
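Applying the test to this example, with the critical value 2.306 taken from a t table (two-tailed, α = 0.05, df = 8), reproduces the decision. Using the rounded r = 0.764 gives t ≈ 3.35; the unrounded r (about 0.7636) gives t ≈ 3.34. If SciPy is available, `scipy.stats.t.ppf(0.975, 8)` returns the same critical value, but the sketch below avoids that dependency.

```python
import math

r, n = 0.764, 10   # sample correlation and sample size from the example
t_crit = 2.306     # t table: two-tailed, alpha = 0.05, df = n - 2 = 8

t = r / math.sqrt((1 - r ** 2) / (n - 2))
decision = "reject H0" if abs(t) >= t_crit else "do not reject H0"
print(f"t = {t:.2f}: {decision}")  # t = 3.35: reject H0
```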
