
Correlation and Regression Analysis

JC Beringuel, MA.Ed., LPT, ECT


General Education/Electronics Engineering Department, College of Engineering
University of Rizal System
Introduction
This chapter covers the correlation coefficient and
regression analysis. The early part of the
chapter discusses the scatter diagram and its
importance. It is immediately followed by
correlation, a statistical method used
to determine whether a relationship between
variables exists. A variable here is a characteristic
of the population being observed or measured.

The second part of the chapter is
regression analysis, a statistical
method used to describe the nature of the
relationship between variables, that is, whether it is
positive or negative, linear or non-linear. There
are two types of relationships: simple and
multiple. In a simple relationship, there are two
variables: an independent variable and a
dependent variable. In a
multiple relationship, two or more independent
variables are used to predict the dependent
variable.
Introduction cont.

A simple linear relationship can be
positive or negative. A positive
relationship exists when both
variables increase at the same
time or both decrease at the
same time. In contrast, in a
negative relationship, as one
variable increases, the other
variable decreases, and vice
versa. This text is limited to the
discussion of simple linear
regression analysis.
Scatter Diagram

• A scatter diagram is a useful tool for checking the
assumptions in a regression
analysis. It can be viewed
during an initial screening run
of the analysis or after the
analysis.
Example (Scatter Diagram)

1. Consider the data on lot size and number of man-hours
spent in preparing the land of 10 individuals.

Lot Size (X)   3     2     6     8     4     5     6     3     7     6
Man-hrs (Y)    7.3   5.2   12.8  17.0  8.7   10.8  13.5  6.9   14.8  13.2
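A scatter diagram of these data can be drawn even without a plotting library. The following is a minimal sketch that places each (X, Y) pair on a character grid; the function name text_scatter and the grid size are illustrative choices, not part of the original material.

```python
# Lot size (X) and man-hours (Y) for the 10 individuals in the example.
lot_size = [3, 2, 6, 8, 4, 5, 6, 3, 7, 6]
man_hours = [7.3, 5.2, 12.8, 17.0, 8.7, 10.8, 13.5, 6.9, 14.8, 13.2]


def text_scatter(x, y, width=40, height=12):
    """Return a list of strings that draw the points on a character grid."""
    x_min, x_max = min(x), max(x)
    y_min, y_max = min(y), max(y)
    grid = [[" "] * width for _ in range(height)]
    for xi, yi in zip(x, y):
        # Scale each coordinate to a column/row index inside the grid.
        col = round((xi - x_min) / (x_max - x_min) * (width - 1))
        row = round((yi - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return ["|" + "".join(line) for line in grid] + ["+" + "-" * width]


plot = text_scatter(lot_size, man_hours)
print("\n".join(plot))
```

The points rise from the lower left to the upper right, which suggests a positive linear relationship between lot size and man-hours.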
Statistical Model

In most real situations, the relation between two variables is not exact;
it is modeled as a straight line plus a random error:

Yi = b0 + b1 Xi + Ei

where:

b0 = intercept of the line

b1 = slope of the line; the rate of change in the value of Y for every
unit change in X

Ei = random error associated with the i-th observation


Estimation of the Parameters b1 and b0

The values of the parameters in the regression equation
above are often not known and must be estimated from sample data.
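The text does not state which estimation method is used. A common choice, assumed here, is ordinary least squares, which picks the line that minimizes the sum of squared errors. Under that assumption, the estimates of b1 and b0 from n sample pairs (Xi, Yi) are:

```latex
\hat{b}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
\hat{b}_0 = \bar{Y} - \hat{b}_1 \bar{X}
```

Here the bars denote the sample means of X and Y, and the hats mark values estimated from the sample.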
Example (Simple Linear Regression Analysis)

1. Consider the data on lot size and number of man-hours
spent in preparing the land of 10 individuals.

Lot Size (X)   3     2     6     8     4     5     6     3     7     6
Man-hrs (Y)    7.3   5.2   12.8  17.0  8.7   10.8  13.5  6.9   14.8  13.2
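One way to work this example is sketched below. It assumes the ordinary least-squares estimates (slope = Sxy / Sxx, intercept = mean of Y minus slope times mean of X); the function name fit_line is illustrative.

```python
# Lot size (X) and man-hours (Y) for the 10 individuals in the example.
lot_size = [3, 2, 6, 8, 4, 5, 6, 3, 7, 6]
man_hours = [7.3, 5.2, 12.8, 17.0, 8.7, 10.8, 13.5, 6.9, 14.8, 13.2]


def fit_line(x, y):
    """Return (b0, b1) from the ordinary least-squares formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares about the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept
    return b0, b1


b0, b1 = fit_line(lot_size, man_hours)
print(f"Estimated line: Y = {b0:.3f} + {b1:.3f} X")
```

With these data the slope is about 1.98 and the intercept about 1.11, so each additional unit of lot size is associated with roughly two more man-hours.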
Coefficient of Determination (R^2 %)
If we want to determine the percent contribution of the variable X
to the variation in Y, or whether the estimated line
fits the observed data, the coefficient of determination, denoted by
R^2, can be computed from the sample data.
Example (Coefficient of Determination)

1. Consider the data on lot size and number of man-hours
spent in preparing the land of 10 individuals.

Lot Size (X)   3     2     6     8     4     5     6     3     7     6
Man-hrs (Y)    7.3   5.2   12.8  17.0  8.7   10.8  13.5  6.9   14.8  13.2
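A possible computation for this example is sketched below. It assumes a least-squares line and the usual definition R^2 = 1 - (residual sum of squares) / (total sum of squares); the function name r_squared is illustrative.

```python
# Lot size (X) and man-hours (Y) for the 10 individuals in the example.
lot_size = [3, 2, 6, 8, 4, 5, 6, 3, 7, 6]
man_hours = [7.3, 5.2, 12.8, 17.0, 8.7, 10.8, 13.5, 6.9, 14.8, 13.2]


def r_squared(x, y):
    """Coefficient of determination for a simple least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # Variation in Y about its mean, and variation left after fitting.
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_resid = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_resid / ss_total


r2 = r_squared(lot_size, man_hours)
print(f"R^2 = {100 * r2:.1f}%")
```

For these data R^2 is about 99.5%, meaning that lot size accounts for nearly all of the observed variation in man-hours.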
Simple Correlation Analysis

Simple correlation analysis is a
technique used to describe the
relationship or association
between variables. If we want to
know the degree of the
relationship between two
variables measured on
an interval scale, the Pearson
Product Moment Correlation
Coefficient (r) may be obtained.
Pearson Product Moment Correlation Coefficient (r)

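The sample estimate of the population Pearson
Product Moment Correlation Coefficient, denoted by r, is

r = Sxy / sqrt(Sxx * Syy)

where Sxy = sum of (Xi - Xbar)(Yi - Ybar), Sxx = sum of (Xi - Xbar)^2,
and Syy = sum of (Yi - Ybar)^2.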
Properties of the Correlation Coefficient (r)

1. It is a unitless quantity.

2. It is always a number between -1 and +1, inclusive.

3. If r = 1, then all the points lie on a straight line and the relationship is
said to be a perfect positive relationship. If r = -1, then all the points
also lie on a straight line but in a reverse relation, and the relationship is
said to be a perfect negative relationship. If r = 0, then there is no
linear relationship between the two variables, and they are said to be
uncorrelated.

4. The magnitude of r is a measure of how closely the points
cluster about a trend line, known as the
regression line.
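These properties can be checked numerically. The sketch below uses small made-up data sets (hypothetical, chosen only for illustration) and a hand-written pearson_r helper to show that points on a rising line give r = 1, points on a falling line give r = -1, and changing the units of X leaves r unchanged.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation from sums of squares about the means."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)


x = [1, 2, 3, 4, 5]            # hypothetical values
y_scattered = [2, 1, 4, 3, 5]  # hypothetical values

r_up = pearson_r(x, [2 * xi + 1 for xi in x])     # points on a rising line
r_down = pearson_r(x, [10 - 3 * xi for xi in x])  # points on a falling line
r_original = pearson_r(x, y_scattered)
r_rescaled = pearson_r([100 * xi for xi in x], y_scattered)  # X in new units

print(r_up, r_down, r_original, r_rescaled)
```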
Example
Consider the scores obtained in Math (X) and Statistics (Y)
by 10 students.

Observation   Math Score (X)   Stat Score (Y)
1             5                2
2             8                7
3             10               8
4             12               9
5             12               10
6             14               12
7             15               14
8             16               10
9             18               16
10            20               12
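A sketch of the computation for this example follows. It uses the computational form of the Pearson formula built from the column sums, which is algebraically the same as Sxy / sqrt(Sxx * Syy); the variable names are illustrative.

```python
from math import sqrt

# Scores of the 10 students in the example.
math_scores = [5, 8, 10, 12, 12, 14, 15, 16, 18, 20]
stat_scores = [2, 7, 8, 9, 10, 12, 14, 10, 16, 12]

n = len(math_scores)
sum_x = sum(math_scores)
sum_y = sum(stat_scores)
sum_xy = sum(x * y for x, y in zip(math_scores, stat_scores))
sum_x2 = sum(x * x for x in math_scores)
sum_y2 = sum(y * y for y in stat_scores)

# r = [n*SumXY - SumX*SumY] / sqrt([n*SumX2 - SumX^2] * [n*SumY2 - SumY^2])
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(f"r = {r:.3f}")
```

The result is about 0.87, a strong positive relationship between the Math and Statistics scores.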
Spearman Rank Correlation Coefficient (rs)

When the variables of interest are measured on an ordinal
scale, the Spearman rank correlation coefficient (rs) may be
used instead of the Pearson r. To obtain the Spearman rs,
apply the following steps:

Step 1. Rank the scores in distribution X, giving the lowest a
rank of 1 and the highest a rank of n. Repeat the process
for the scores in distribution Y.

Step 2. Obtain the difference (di) between the two sets of
ranks.

Step 3. Square each difference and then obtain the sum of the
squared di.

Step 4. Compute rs using the formula
rs = 1 - 6 (sum of di^2) / (n (n^2 - 1)).

Step 5. If the proportion of ties in either the X or the Y
observations is large, use the tie-corrected formula.
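Steps 1 to 4 above can be sketched as a short function. The sketch assumes the standard no-ties formula rs = 1 - 6 * (sum of di^2) / (n (n^2 - 1)) and refuses tied data, since the tie correction of Step 5 is not covered here. The function names and the sample data are hypothetical.

```python
def rank_values(values):
    """Step 1: rank from 1 (lowest) to n (highest); ties are not handled."""
    if len(set(values)) != len(values):
        raise ValueError("tied values need a tie-corrected method")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(x, y):
    """Spearman rank correlation for data without ties."""
    rank_x = rank_values(x)
    rank_y = rank_values(y)
    # Step 2: differences between the paired ranks.
    d = [rx - ry for rx, ry in zip(rank_x, rank_y)]
    # Step 3: sum of the squared differences.
    sum_d2 = sum(di ** 2 for di in d)
    # Step 4: apply the no-ties formula.
    n = len(x)
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical data, used only to show the call.
rs = spearman_rs([10, 20, 30, 40, 50], [1, 3, 2, 5, 4])
print(f"rs = {rs:.2f}")
```

When ties are present, one common approach is to assign average ranks and compute the Pearson r on the ranks; whether that matches the formula intended in Step 5 is not something the text settles.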
Example
The sampling distribution of the test statistic is the Student's t
distribution with n - 2 degrees of freedom.

Observation   Math (X)   Stat (Y)
1             18         24
2             17         28
3             14         30
4             13         26
5             12         22
6             10         18
7             8          15
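The sketch below works this example under two assumptions not spelled out in the text: the no-ties Spearman formula rs = 1 - 6 * (sum of di^2) / (n (n^2 - 1)), and the commonly used test statistic t = rs * sqrt((n - 2) / (1 - rs^2)), which is referred to a t distribution with n - 2 degrees of freedom.

```python
from math import sqrt

# Scores of the 7 observations in the example.
math_scores = [18, 17, 14, 13, 12, 10, 8]
stat_scores = [24, 28, 30, 26, 22, 18, 15]


def rank_values(values):
    """Rank from 1 (lowest) to n (highest); assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


rank_x = rank_values(math_scores)
rank_y = rank_values(stat_scores)
d_squared = [(rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y)]

n = len(math_scores)
rs = 1 - 6 * sum(d_squared) / (n * (n ** 2 - 1))

# Assumed test statistic for H0: no association; df = n - 2.
t = rs * sqrt((n - 2) / (1 - rs ** 2))
df = n - 2

print(f"rs = {rs:.2f}, t = {t:.3f}, df = {df}")
```

Here the sum of squared rank differences is 14, giving rs = 0.75. The computed t would then be compared with the critical value of the t distribution with 5 degrees of freedom at the chosen significance level.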
Correlation between Nominal Variables

The use of Guttman’s lambda formula (also known as
Guttman’s Coefficient of Predictability)
Example
Let us measure the degree of relationship between an individual’s
religion and the political party to which he or she belongs.

                    LAKAS NUCD   LAMMP   REPORMA   Total
Catholic            20           9       15        44
Iglesia ni Cristo   5            18      4         27
Protestant          11           8       10        29
Total               36           35      29        100
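The text does not show which form of lambda is intended. The sketch below assumes the asymmetric form, lambda = (sum of the largest frequency in each row - largest column total) / (N - largest column total), which measures how much knowing the row category reduces errors in predicting the column category. The function name is illustrative.

```python
# Rows: religion. Columns: LAKAS NUCD, LAMMP, REPORMA.
table = {
    "Catholic": [20, 9, 15],
    "Iglesia ni Cristo": [5, 18, 4],
    "Protestant": [11, 8, 10],
}


def lambda_predict_columns(rows):
    """Lambda for predicting the column category from the row category."""
    total = sum(sum(row) for row in rows)
    col_totals = [sum(col) for col in zip(*rows)]
    sum_row_max = sum(max(row) for row in rows)  # best guess within each row
    max_col_total = max(col_totals)              # best guess ignoring rows
    return (sum_row_max - max_col_total) / (total - max_col_total)


rows = list(table.values())
cols = [list(col) for col in zip(*rows)]  # transpose to swap the roles

lam_party = lambda_predict_columns(rows)     # predict party from religion
lam_religion = lambda_predict_columns(cols)  # predict religion from party

print(f"lambda (party from religion) = {lam_party:.3f}")
print(f"lambda (religion from party) = {lam_religion:.3f}")
```

Under this assumption, knowing a person's religion reduces the errors in predicting party by 13/64, about 20%, while knowing the party reduces the errors in predicting religion by 9/56, about 16%.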
