Correlation and Regression

Quantitative Methods
Bivariate Analysis (Regression and Correlation)
By: Prof. Sonia Nangalia
INTRODUCTION
• In order to make decisions, decision makers
rely on the relationship between what is
known and what is to be estimated.
• In this chapter, we will study how to determine
the relationship between variables.
Difference
• The chi-square test tells us whether there is a
relationship, but it does not tell us what the
relationship is.
• Regression and correlation analyses show us
how to determine both the nature and the
strength of a relationship between two
variables.
Types of relationships
• Regression and correlation analyses are based
on the relationship, or association, between
two or more variables.
• The known variable is called the independent
variable.
• The variable we are trying to predict is the
dependent variable.
Example:

Salesperson   Years of experience   Annual sales (£1000s)
A             1                     80
B             3                     97
C             4                     92
D             4                     102
E             6                     103
F             8                     111
G             10                    119
H             10                    123
I             11                    117
J             13                    136

To get a first idea about the possible existence of a relation between the two
variables it is useful to plot the data in a scatter diagram.
The values of the independent variable are plotted along the x-axis,
and the values of the dependent variable along the y-axis.
• Direct relationship between X and Y: As the
independent variable increases, the dependent
variable also increases. The slope of this line is
positive: as X increases, Y also increases.
• Inverse (indirect) relationship between X and Y: As the
independent variable increases, the dependent
variable decreases. The slope of this line is
negative: as X increases, Y decreases.
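The direction of a relationship can also be checked numerically. The following is a minimal sketch, not a method given in the slides: the function name `direction` is hypothetical, and it assumes that the sign of the sum of cross-deviations of X and Y (which has the same sign as the slope of a fitted straight line) is an acceptable indicator of a direct versus an inverse (indirect) relationship. The data are the salesperson figures from the example.

```python
def direction(xs, ys):
    """Classify a relationship from the sign of the summed cross-deviations.

    Hypothetical helper for illustration only.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Positive when X and Y tend to move together, negative when they move apart.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if s_xy > 0:
        return "direct"
    if s_xy < 0:
        return "inverse"
    return "none"


years = [1, 3, 4, 4, 6, 8, 10, 10, 11, 13]
sales = [80, 97, 92, 102, 103, 111, 119, 123, 117, 136]
print(direction(years, sales))
```

For the salesperson data the cross-deviation sum is positive, so the sketch reports a direct relationship.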
Scatter diagrams
• The first step in determining whether there is a
relationship between two variables is to examine
the graph of the observed or known data. This
graph or chart is called scatter diagram.
• It gives us two kinds of information:
– Visually, we can look for patterns that indicate that the variables
are related.
– If the variables are related, we can see what kind of line, or
estimating equation, describes this relationship.
[Scatter diagram: Annual Sales (£1000s), 0 to 160, on the y-axis against Years of Experience, 0 to 14, on the x-axis]

The scatter diagram suggests two things:

a) Generally, when experience increases, annual sales increase.
In other words, there seems to be a positive relationship between the two variables.
b) The relationship is not perfect; a given increase in experience is not
accompanied by the same increase in annual sales in all cases.
Estimation using regression line
• Equation for a straight line
Y= a + bX
Y = dependent variable
X = independent variable
a = Y-intercept
b = slope of the line.
Finding the values for a and b
• Visually, we can find a by locating the point
where the line crosses the y-axis.
• To find b (the slope), determine how the
dependent variable changes as the
independent variable changes.
• Slope of the straight line:
– Direct relationship: positive slope
– Inverse (indirect) relationship: negative slope
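Reading a and b off a drawn line amounts to taking two points on it. The sketch below illustrates that arithmetic; the function name `line_through` and the two sample points are hypothetical and do not come from the slides.

```python
def line_through(p1, p2):
    """Return (a, b) for the line Y = a + bX through two points.

    Hypothetical helper: b is the change in Y per unit change in X,
    and a is the value of Y where the line crosses the y-axis (X = 0).
    """
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: slope is undefined")
    b = (y2 - y1) / (x2 - x1)
    a = y1 - b * x1
    return a, b


# Two made-up points on a line, chosen only for illustration.
a, b = line_through((1, 5), (3, 9))
print(f"Y = {a} + {b}X")
```

With these assumed points the slope is positive (b = 2, a = 3), which corresponds to a direct relationship.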
The Method of Least Squares
• Since we are normally not dealing with
deterministic relationships, it is not possible to
fit a straight line that coincides exactly with
the actual observations.
• The best we can do is to find a linear equation
for which the sum of the squared differences
between the estimated values of Y and the
actual values of Y is the smallest possible.
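• The estimating line:
$\hat{Y} = a + bX$
Equations to find the slope and the intercept of
the best-fitting regression line
• Slope of the best-fitting regression line:
$b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}$
• Y-intercept of the best-fitting regression line:
$a = \bar{Y} - b\bar{X}$
where $\bar{X}$ and $\bar{Y}$ are the means of the X and Y values and n is the number of data points.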
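As an illustration, the least-squares slope and intercept can be computed for the salesperson data from the earlier example. This is a minimal sketch under stated assumptions: the function name `least_squares` is hypothetical, and it uses the standard textbook formulas b = (ΣXY − n·X̄·Ȳ) / (ΣX² − n·X̄²) and a = Ȳ − b·X̄.

```python
def least_squares(xs, ys):
    """Return (a, b) of the least-squares line Y-hat = a + bX.

    Hypothetical helper using the standard sums-based formulas.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (sum_xy - n * x_mean * y_mean) / (sum_x2 - n * x_mean ** 2)
    a = y_mean - b * x_mean
    return a, b


years = [1, 3, 4, 4, 6, 8, 10, 10, 11, 13]              # X: years of experience
sales = [80, 97, 92, 102, 103, 111, 119, 123, 117, 136]  # Y: annual sales (£1000s)

a, b = least_squares(years, sales)
print(f"Y-hat = {a:.2f} + {b:.2f} X")
```

For these ten observations the sketch gives a = 80 and b = 4, so each additional year of experience is associated with about £4,000 more in estimated annual sales.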
Example
• The director of a sanitation department is interested in the
relationship between the age of a garbage truck and the
annual repair expense she should expect to incur. In
order to determine the relationship, she has
collected the following information:

Truck no.   Age of truck (years)   Repair expense (in hundreds)
101         5                      7
102         3                      7
103         3                      6
104         1                      4
Example (cont.)

• If the city has a truck that is 4 years old, what
repair expense should be expected?
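One way to work this example is sketched below. It fits the least-squares line to the four trucks and evaluates it at an age of 4 years. The function name `fit_line` is hypothetical, and the deviation form of the slope formula used here is an assumption about the calculation; it is algebraically equivalent to the sums-based form.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least-squares line Y-hat = a + bX (deviation form)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_mean - b * x_mean
    return a, b


ages = [5, 3, 3, 1]      # X: age of truck (years)
expenses = [7, 7, 6, 4]  # Y: repair expense (in hundreds)

a, b = fit_line(ages, expenses)
estimate = a + b * 4     # estimated expense for a 4-year-old truck
print(f"Y-hat = {a} + {b} X; estimate at X = 4: {estimate}")
```

With this data the fitted line is Ŷ = 3.75 + 0.75X, so the estimated repair expense for a 4-year-old truck is 6.75 (in hundreds).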
Correlation Analysis
• A statistical tool we can use to describe the
degree to which one variable is linearly
related to another.
• There are two measures for describing the
correlation between two variables:
– Coefficient of determination
– Coefficient of correlation
Coefficient of correlation
The Coefficient of Determination
• Measures the extent or strength of the
association that exists between two variables X
and Y.
• Referred to as the sample coefficient of determination.
• It is developed from the relationship between
two kinds of variation: the variation of the Y values around
– the fitted regression line
– their own mean
• Sample coefficient of determination:
one minus the ratio between these two
variations, represented by r²
$r^2 = 1 - \frac{\sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2}$

• Use the shortcut method to calculate the sample coefficient of determination.
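The slides do not show which shortcut is meant. One commonly used form, given here as an assumption, expresses r² in terms of the fitted intercept a and slope b and sums that are already available from the regression calculation, where n is the number of data points:

```latex
r^2 = \frac{a\sum Y + b\sum XY - n\bar{Y}^2}{\sum Y^2 - n\bar{Y}^2}
```

This follows from the definition because, for the least-squares line, $\sum (Y - \hat{Y})^2 = \sum Y^2 - a\sum Y - b\sum XY$ and $\sum (Y - \bar{Y})^2 = \sum Y^2 - n\bar{Y}^2$.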
Interpretation of r²
• The value of r² lies between 0 and 1.
• An r² close to 1 indicates a strong
correlation between X and Y.
• An r² close to 0 means that there is little
correlation between the two variables.
Example

• Calculate the coefficient of determination for the following data:

Year   R&D expense (X)   Annual profit (Y)
1995   5                 31
1994   11                40
1993   4                 30
1992   5                 34
1991   3                 25
1990   2                 20
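The calculation for this example can be sketched as follows. The code fits the least-squares line and then applies the definition r² = 1 − Σ(Y − Ŷ)² / Σ(Y − Ȳ)². The function name `r_squared` is hypothetical, and this is an illustrative sketch rather than the worked solution from the slides.

```python
def r_squared(xs, ys):
    """Return the sample coefficient of determination for a straight-line fit."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Least-squares slope and intercept.
    b = (sum(x * y for x, y in zip(xs, ys)) - n * x_mean * y_mean) / (
        sum(x * x for x in xs) - n * x_mean ** 2
    )
    a = y_mean - b * x_mean
    # Variation of Y around the fitted line and around its own mean.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_mean) ** 2 for y in ys)
    return 1 - sse / sst


rd_expense = [5, 11, 4, 5, 3, 2]   # X: R&D expense
profit = [31, 40, 30, 34, 25, 20]  # Y: annual profit

r2 = r_squared(rd_expense, profit)
print(f"r^2 = {r2:.3f}")
```

For this data the fitted line is Ŷ = 20 + 2X and r² is about 0.826, meaning that roughly 83% of the variation in annual profit is explained by the regression line.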
The Coefficient of Correlation
• It is denoted by r.
• It is the square root of the coefficient of
determination.
• When the slope b of the estimating equation is
positive, r is the positive square root, but if b is
negative, r is the negative square root.
• Thus the sign of r indicates the direction of the
relationship between two variables X and Y.
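The sign rule above can be sketched in code: compute r², take its square root, and attach the sign of the slope b. The function name `correlation` is hypothetical, and the guard against a constant Y series is an added assumption for robustness, not something the slides discuss.

```python
import math


def correlation(xs, ys):
    """Return r as the signed square root of r^2, taking its sign from the slope b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_mean - b * x_mean
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_mean) ** 2 for y in ys)
    if sst == 0:
        raise ValueError("Y is constant: r is undefined")
    r2 = max(1 - sse / sst, 0.0)  # clamp tiny negative rounding errors
    return math.copysign(math.sqrt(r2), b)


rd_expense = [5, 11, 4, 5, 3, 2]
profit = [31, 40, 30, 34, 25, 20]
r = correlation(rd_expense, profit)
print(f"r = {r:.3f}")
```

For the R&D data the slope is positive, so r is the positive root, about 0.909. Reversing the direction of the relationship (for example X = 1, 2, 3 with Y = 3, 2, 1) gives a negative r.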
Causation and correlation
• A correlation between variables does not
automatically mean that the change in one
variable is the cause of the change in the values
of the other variable.

• Causation indicates that one event is the result
of the occurrence of the other event, i.e. there is
a causal relationship between the two events.
Thank you