Topic 6 Correlation and Regression


DCC3132 - STATISTICS

TOPIC 6: CORRELATION AND REGRESSION (4 HOURS)
By the end of this topic, you should be able to:
6.1 Understand the basic concepts of correlation and regression
6.1.1 Explain the basic concepts of correlation and regression.
6.1.2 Explain the importance of the scatter diagram in correlation analysis.
6.1.3 Explain the different types of correlation.
6.2 Show the relationship between the value of a correlation coefficient and the scatter diagram graphically.
6.2.1 Illustrate the relationship between the value of a correlation coefficient and the scatter diagram graphically.
CORRELATION AND REGRESSION
Correlation analysis is a statistical method used to measure the strength of the relationship between two variables.
Regression analysis is a statistical technique that can be used to obtain the equation relating the two variables.
Basic concept of correlation and regression
Sometimes two variables are found to be related to each other in some way. A change in one variable might cause the other variable to change because of the effect of the first variable on the second.
For example, an increase in the price of sugar may raise the price of certain foods, so food manufacturers have to increase their selling prices.
The same scenario applies when the petroleum price increases. The petroleum price affects the prices of other supplies, products and services, resulting in a chain reaction. Higher fuel prices cause energy prices to increase, so producers need to increase the selling prices of their products and services to cover the escalating operating costs.
THE SCATTER DIAGRAM
The first step in determining whether a relationship exists between two variables is to plot or graph the available data. Normally the independent variable is labelled on the horizontal axis and the dependent variable on the vertical axis. The paired values are then plotted on the graph. This graph is called a scatter diagram.

If the points in the scatter diagram form a pattern (increasing or decreasing), this indicates that there is a relationship between the two variables. If the scatter diagram does not show any pattern, that is, the points are randomly scattered, we can assume that there is no relationship between the two variables.
Scatter plots and the strength of correlation between two variables
Positive Correlation
• For an increase in ‘X’ there is a corresponding increase in ‘Y’

Negative Correlation
• For an increase in ‘X’ there is a corresponding decrease in ‘Y’

No Correlation
• For an increase in ‘X’ there is no corresponding change in ‘Y’
Exercise
1. Draw a scatter diagram for the following data and state the type of
relationship between the variables.
x 1 3 5 7 9 13 17
y 0 5 11 14 19 22 30

2. Plot the data below and determine whether a linear relationship exists between the variables p and q. If there is, state the relationship.
p 3 4 6 8 11 17 18 20
q 10 20 3 7 4 1 12 5
Solution:
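When graph paper is not at hand, a rough scatter diagram can also be drawn on screen. The Python below is a minimal sketch, not part of the original notes: the helper `text_scatter`, its grid size and its marker character are our own assumptions. It scales each pair onto a character grid with the independent variable on the horizontal axis and the dependent variable on the vertical axis, so the pattern of the points can be read off directly.

```python
def text_scatter(xs, ys, width=40, height=12, marker="*"):
    """Return the rows of a crude character-grid scatter diagram.

    The independent variable runs along the horizontal axis and the dependent
    variable up the vertical axis; the top row holds the largest y values.
    Points that fall in the same grid cell overlap.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x_span = (x_max - x_min) or 1  # avoid dividing by zero if all x are equal
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column / row index on the grid.
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = marker  # row 0 is the top of the plot
    # "|" on the left is the vertical axis; the final "+---" row is the horizontal axis.
    return ["|" + "".join(r) for r in grid] + ["+" + "-" * width]


# Data from Exercise 1 and Exercise 2.
x = [1, 3, 5, 7, 9, 13, 17]
y = [0, 5, 11, 14, 19, 22, 30]
p = [3, 4, 6, 8, 11, 17, 18, 20]
q = [10, 20, 3, 7, 4, 1, 12, 5]

if __name__ == "__main__":
    for title, xs, ys in [("Exercise 1: y against x", x, y),
                          ("Exercise 2: q against p", p, q)]:
        print(title)
        print("\n".join(text_scatter(xs, ys)))
        print()
```

In Exercise 1 every increase in x is matched by an increase in y, so the markers climb from the bottom-left to the top-right of the grid. The Exercise 2 plot can then be inspected in the same way to decide whether any pattern is present.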
LINEAR CORRELATION COEFFICIENT
A linear correlation coefficient provides a numerical measure of the strength of the relationship between two variables.
Two methods are commonly used for this purpose:
◦ Pearson’s product moment correlation coefficient
◦ Spearman’s rank correlation coefficient
PEARSON’S PRODUCT MOMENT CORRELATION COEFFICIENT
Pearson’s correlation coefficient tells us two aspects of the relationship between two quantitative variables. The sign (- or +) of r identifies the kind (direction) of the relationship, and the magnitude of r describes the strength of the relationship.
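The mathematical formula for Pearson’s correlation coefficient r is as follows:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

or

$$r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}, \quad S_{xy} = \sum xy - \frac{\sum x \sum y}{n}, \quad S_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n}, \quad S_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n}$$

Where r = correlation coefficient
n = number of observations
∑x, ∑y = sum of the values of x and of y
∑xy = sum of the products of x and y
∑x² = sum of the squares of the values of x (∑y² is defined in the same way for y)
(∑x)² = square of the sum of all the values of x ((∑y)² is defined in the same way for y)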
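The computational formula above maps directly onto code. The following Python is a minimal sketch rather than a reference implementation; the function name `pearson_r` is our own, and it assumes two equal-length lists of numbers in which neither variable is constant (a constant variable makes the denominator zero, so r is undefined).

```python
import math


def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient via the raw-sum formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when x or y is constant")
    return numerator / denominator


if __name__ == "__main__":
    print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # points on a rising straight line
    print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # points on a falling straight line
```

Data lying exactly on a rising straight line give r = 1.0 and data on a falling straight line give r = -1.0, the two extremes of the range of r.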


The value of r always lies between -1.0 and 1.0, that is, -1.0 ≤ r ≤ 1.0.
A correlation coefficient close to -1.0 indicates that the two variables have a strong negative relationship. A negative relationship means that an increase in one variable is associated with a decrease in the other variable, and vice versa.
On the other hand, a value close to 1.0 indicates that the two variables have a strong positive relationship. A positive relationship means that an increase in one variable is associated with an increase in the other variable, and vice versa.
A correlation that is close to, or equal to, zero means that there is no linear relationship between the two variables. In that case, an increase or decrease in the value of one variable is not accompanied by a consistent linear change in the other variable, and vice versa.
Exercise (Pearson’s correlation coefficient)
Calculate Pearson’s product moment correlation coefficient, r, for the following set of data.
x 3 5 8 10 13 15 18 20 28
y 30 35 41 50 51 60 65 66 70
Solution:
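A hand calculation is usually organised around the five sums that appear in the formula. The Python sketch below (the variable names are our own) computes those sums for the exercise data and then r, so each intermediate value can be checked against a worked table.

```python
import math

x = [3, 5, 8, 10, 13, 15, 18, 20, 28]
y = [30, 35, 41, 50, 51, 60, 65, 66, 70]
n = len(x)

# The five sums needed by the computational formula for r.
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(f"n = {n}, sum x = {sum_x}, sum y = {sum_y}")
print(f"sum xy = {sum_xy}, sum x^2 = {sum_x2}, sum y^2 = {sum_y2}")
print(f"r = {r:.4f}")
```

For these data ∑x = 120, ∑y = 468, ∑xy = 7106, ∑x² = 2100 and ∑y² = 25988, so r = 7794 / √(4500 × 14868) ≈ 0.953, which indicates a strong positive relationship.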
SPEARMAN’S RANK CORRELATION COEFFICIENT
Spearman’s rank correlation coefficient is a measure of association between two variables that are measured on at least an ordinal scale. This means that Spearman’s rank correlation coefficient is suitable for ranked data, including qualitative data that can be placed in order.
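Spearman’s rank correlation coefficient ρ is then calculated as follows:

$$\rho = 1 - \frac{6\sum d^2}{n\left(n^2 - 1\right)}$$

Where d = difference between the two ranks given to each observation or subject
n = number of observations or subjects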


The value of ρ lies between -1.0 and 1.0, that is, -1.0 ≤ ρ ≤ 1.0. If the value of ρ is close to 1.0, there is a strong positive association between the two variables. If the value of ρ is close to -1.0, there is a strong negative association between the two variables. If ρ is close to 0, there is little or no association between the two variables.
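The rank formula can be sketched in Python as below. This is a minimal illustration under stated assumptions: the helper names `ranks` and `spearman_rho` and the sample marks are hypothetical, and `ranks` assumes there are no tied values (ties need average ranks, which this sketch does not handle, and the shortcut formula is exact only without ties). Ranking from the smallest or from the largest value gives the same ρ, provided both variables are ranked the same way, because reversing both rankings only flips the sign of each d.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest).

    Assumes there are no tied values; ties would need average ranks,
    which this sketch does not handle.
    """
    if len(set(values)) != len(values):
        raise ValueError("tied values are not handled in this sketch")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(rank_x, rank_y):
    """Spearman's rho from two rank lists: 1 - 6 * sum(d^2) / (n(n^2 - 1))."""
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("need two equal-length rank lists with at least 2 items")
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


if __name__ == "__main__":
    # Hypothetical raw marks, converted to ranks before applying the formula.
    marks_a = [56, 72, 40, 88, 63]
    marks_b = [60, 70, 45, 80, 75]
    print(spearman_rho(ranks(marks_a), ranks(marks_b)))
```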
Exercise (Spearman’s rank correlation coefficient)
Five students, A, B, C, D and E, are ranked in two subjects, statistics and computer programming, with the following results. Calculate Spearman’s rank correlation coefficient, ρ, for the two sets of rankings.

Student               A B C D E
Statistics            1 2 3 4 5
Computer programming  3 1 4 2 5
Solution:
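The working for this exercise is a table of d and d² for each student. The Python sketch below (the dictionary names are our own) builds that table from the given ranks and applies the formula for ρ.

```python
# Ranks from the exercise table, keyed by student.
statistics = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
programming = {"A": 3, "B": 1, "C": 4, "D": 2, "E": 5}

n = len(statistics)
# d = difference between each student's two ranks.
d = {s: statistics[s] - programming[s] for s in statistics}
sum_d2 = sum(di ** 2 for di in d.values())
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print("Student  Stats  Prog  d  d^2")
for s in statistics:
    print(f"{s:<8} {statistics[s]:>5} {programming[s]:>5} {d[s]:>2} {d[s] ** 2:>4}")
print("sum d^2 =", sum_d2)
print("rho =", rho)
```

Here ∑d² = 4 + 1 + 1 + 4 + 0 = 10, so ρ = 1 - 60/120 = 0.5: the two sets of rankings are positively associated, but not as strongly as a value close to 1.0 would indicate.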
REGRESSION LINE
In a scatter diagram, two axes are drawn on the graph paper. The values of the independent variable x are plotted on the horizontal axis and the values of the dependent variable y are plotted on the vertical axis.
A common method for fitting a regression line to these points is the method of least squares. A regression line with a positive slope indicates that there is a direct relationship between the two variables. This means that if x increases, y will increase as well, and vice versa.
METHOD OF LEAST SQUARES
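The least squares regression line of y on x for a set of data is in the form y = a + bx. The values of a and b in the regression line y = a + bx can be calculated using the following formulas:

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$

$$a = \frac{\sum y - b\sum x}{n} = \bar{y} - b\bar{x}$$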
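The two formulas can be applied in order: compute b first, then a from b. Below is a minimal Python sketch of the least squares line of y on x; the function name `least_squares_line` is our own, and it assumes the x values are not all equal (otherwise the denominator of b is zero).

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares regression line y = a + bx of y on x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denominator  # slope
    a = (sum_y - b * sum_x) / n                     # intercept
    return a, b


if __name__ == "__main__":
    a, b = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
    print(f"y = {a} + {b}x")
```

The numerator of b is the same quantity n∑xy - ∑x∑y that appears in the numerator of Pearson’s r, and both denominators are positive, so the slope of the regression line always has the same sign as r.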
Exercise (METHOD OF LEAST SQUARES)
Find the least squares regression line of y on x for the following data.

x 3 6 9 11 16 18
y 2 8 11 14 19 21
Determine the values of y when (a) x = 5 and (b) x = 14.


Solution:
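The Python sketch below (the variable names and the `predict` helper are our own) follows the same steps as a hand solution: form the sums, compute b and then a, and substitute x = 5 and x = 14 into y = a + bx.

```python
x = [3, 6, 9, 11, 16, 18]
y = [2, 8, 11, 14, 19, 21]
n = len(x)

# Sums required by the least squares formulas.
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
a = (sum_y - b * sum_x) / n                                   # intercept


def predict(x_value):
    """Estimate y from the fitted line y = a + bx."""
    return a + b * x_value


print(f"y = {a:.4f} + {b:.4f}x")
print(f"(a) x = 5:  y = {predict(5):.4f}")
print(f"(b) x = 14: y = {predict(14):.4f}")
```

With ∑x = 63, ∑y = 75, ∑xy = 989 and ∑x² = 827, the slope is b = 1209/993 ≈ 1.2175 and the intercept is a ≈ -0.2840, so the line gives y ≈ 5.80 when x = 5 and y ≈ 16.76 when x = 14.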
END
