PGD - Stat-3.Simple Corr & Regression


EAST WEST UNIVERSITY

Executive Development Centre, Skills for Employment Investment Program (SEIP)

Graduate Diploma in Leather and Footwear Management


Basic Computation and Statistical Techniques (LF 702)

Statistical Techniques – III

Linear Correlation and Regression


Instructor:
Dr. Md. Sohel Rana
Associate Professor of Statistics,
Department of Mathematical & Physical Sciences, EWU
Email: srana@ewubd.edu

Dr. Md. Sohel Rana, Stat, EWU


Correlation Analysis

• Correlation analysis is used to measure the strength of the association (linear relationship) between two variables
– Correlation is only concerned with the strength of the relationship
– No causal effect is implied by correlation
– Correlation was first presented in Chapter 3



Positive (Weak and Strong) Correlation Patterns



Negative (Weak and Strong) Correlation Patterns



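Sample Pearson’s correlation coefficient

$$ r = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}(x)\,\mathrm{Var}(y)}} = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} $$

where

$$ S_{xx} = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}, \qquad S_{yy} = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n} $$

$$ S_{xy} = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n} $$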



Pearson’s Correlation Coefficient

• “r” indicates…
– strength of relationship (strong, weak, or none)
– direction of relationship
• positive (direct) – variables move in same direction
• negative (inverse) – variables move in opposite directions

• r ranges in value from –1.0 to +1.0


r = –1.0: strong negative relationship
r = 0.0: no linear relationship
r = +1.0: strong positive relationship



Rule of thumb

Value of |r|    Strength of relationship

0.8 to 1        Very strong
0.7 to 0.8      Strong
0.5 to 0.7      Moderate
0.3 to 0.5      Weak
0 to 0.3        Very weak / no relationship
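
The rule of thumb can be written as a small lookup. This is only a sketch: the function name is hypothetical, and two choices are assumptions rather than anything stated on the slide, namely that the bands apply to the magnitude of r (so that negative values are classified too) and that a boundary value such as 0.8 goes to the stronger band.

```python
def strength_of_relationship(r: float) -> str:
    """Return the rule-of-thumb label for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # assumption: the bands describe the magnitude of r
    # Assumption: a value on a boundary (e.g. 0.8) falls in the stronger band.
    if size >= 0.8:
        return "Very strong"
    if size >= 0.7:
        return "Strong"
    if size >= 0.5:
        return "Moderate"
    if size >= 0.3:
        return "Weak"
    return "Very weak / no relationship"


# Illustrative values only; the sign of r gives the direction separately.
for r in (0.97, -0.75, 0.1):
    print(r, strength_of_relationship(r))
```

The label describes strength only; the direction (positive or negative) still has to be read from the sign of r.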



Example
• A major airline wants to estimate the relationship
between the number of reservations and the actual
number of passengers who show up for flight ABC.
• Information gathered over 12 randomly selected
days for flight ABC is given in the table below:



Airline Data
Day   No. of Reservations   No. of Passengers
1     250                   210
2     548                   405
3     156                   120
4     121                   89
5     416                   304
6     450                   320
7     462                   319
8     508                   410
9     307                   275
10    311                   289
11    265                   236
12    189                   170



Example
[Figure: scatter plot of No. of Passengers (vertical axis) against No. of Reservations (horizontal axis)]

$$ S_{xy} = 154483.2; \quad S_{xx} = 223736.9; \quad S_{yy} = 113564.2 $$

$$ r = \frac{154483.2}{\sqrt{(223736.9)(113564.2)}} = 0.97 $$
Comment: ?
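
The figures above can be reproduced from the airline table with a few lines of Python. This is a sketch using only the standard library; the function and variable names are illustrative, and the formulas are the computational forms of S_xx, S_yy and S_xy given for the sample Pearson's correlation coefficient.

```python
from math import sqrt

# Airline data: 12 randomly selected days for flight ABC.
reservations = [250, 548, 156, 121, 416, 450, 462, 508, 307, 311, 265, 189]
passengers = [210, 405, 120, 89, 304, 320, 319, 410, 275, 289, 236, 170]


def corrected_sums(x, y):
    """Return (S_xy, S_xx, S_yy) using the computational formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    s_xx = sum(v * v for v in x) - sum_x ** 2 / n
    s_yy = sum(v * v for v in y) - sum_y ** 2 / n
    s_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    return s_xy, s_xx, s_yy


s_xy, s_xx, s_yy = corrected_sums(reservations, passengers)
r = s_xy / sqrt(s_xx * s_yy)
print(f"r = {r:.2f}")  # r = 0.97
```

By the rule of thumb given earlier, a value of 0.97 falls in the 0.8 to 1 band, which points to a very strong positive linear relationship between reservations and passengers.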

Linear Regression: Introduction
• We are often interested in trying to determine the relationship between a pair of variables. For instance,
o How does the amount of money spent in advertising a new product relate to the first month’s sales figures for that product? Or
o How does the price of a house relate to its floor space? Or
o How does the expenditure of a family relate to the family income?



Introduction
• The variable whose value is determined first is called the input or independent variable, and the other is called the response or dependent variable.



Simple Linear Regression Model
• We consider a basic regression model where there is one input (or independent) variable X and one response (or dependent) variable Y and the relationship is linear.
• The regression model can be stated as follows:
Y = β0 + β1X + ε
• The quantities β0 and β1 are parameters (unknown population characteristics). The variable ε, called the random error, is assumed to be a random variable having a normal distribution with mean 0 and variance σ².

Simple Linear Regression Model
Definition:
• The relationship between the response variable Y and the input variable X specified by the equation
Y = β0 + β1X + ε
is called a simple linear regression.



Estimating Regression Parameters
• Suppose that the responses y_i corresponding to the input values x_i, i = 1, 2, …, n, are to be observed and used to estimate the parameters β0 and β1 in a simple linear regression model
$$ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n $$
• If $\hat{\beta}_0$ and $\hat{\beta}_1$ are the respective estimators of β0 and β1, then the estimator of the response corresponding to the input value x_i would be $\hat{\beta}_0 + \hat{\beta}_1 x_i$.
• Since the actual response is y_i, it follows that the difference between the actual response and its estimated value is given by
$$ \varepsilon_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i) $$


Estimating Regression Parameters
• Now, it is reasonable to choose our estimates of β0 and β1 to be the values of $\hat{\beta}_0$ and $\hat{\beta}_1$ that make these errors as small as possible.
• To do this, we choose $\hat{\beta}_0$ and $\hat{\beta}_1$ to minimize the value of the sum of the squares of the errors (SSE),
$$ \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left[ y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i) \right]^2 $$
• The resulting estimators of β0 and β1 are called least-squares estimators.



Estimating Regression Parameters
Definition:
For given data pairs (x_i, y_i), i = 1, 2, ..., n, the least-squares estimators of β0 and β1 are the values $\hat{\beta}_0$ and $\hat{\beta}_1$ that make

$$ \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left[ y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i) \right]^2 $$

as small as possible.
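
To make the definition concrete, the sketch below evaluates the sum of squared errors for three candidate lines on the airline data. The function name `sse` and the candidates are illustrative assumptions: line B uses the rounded estimates reported later in this example (intercept 33.07, slope 0.69), while lines A and C are arbitrary alternatives chosen only for comparison.

```python
# Airline data: reservations (x) and passengers (y) for 12 days.
reservations = [250, 548, 156, 121, 416, 450, 462, 508, 307, 311, 265, 189]
passengers = [210, 405, 120, 89, 304, 320, 319, 410, 275, 289, 236, 170]


def sse(b0, b1, x, y):
    """Sum of squared differences between y and the line b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# (intercept, slope) pairs; A and C are arbitrary comparison lines.
candidates = {
    "line A": (0.0, 0.80),
    "line B": (33.07, 0.69),
    "line C": (60.0, 0.60),
}

for name, (b0, b1) in candidates.items():
    print(f"{name}: SSE = {sse(b0, b1, reservations, passengers):.1f}")
```

Line B gives the smallest of the three sums, which is what the least-squares criterion asks for: among all possible lines, pick the one with the smallest SSE.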



Estimating Regression Parameters
• The Least Squares Method: minimize SSE to obtain the fitted line

$$ \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i $$



Estimating Regression Parameters
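• It can be shown that the least-squares estimators of β0 and β1, which we call $\hat{\beta}_0$ and $\hat{\beta}_1$, are given by

$$ \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} \quad \text{and} \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} $$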
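
The "it can be shown" step can be sketched as follows. This is the standard calculus argument and is not taken from the slides: set the partial derivatives of SSE with respect to the two estimators to zero and solve, using the definitions of S_xx and S_xy given with the correlation coefficient.

```latex
\begin{align*}
\mathrm{SSE} &= \sum_{i=1}^{n} \bigl[ y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i) \bigr]^2 \\[4pt]
\frac{\partial\,\mathrm{SSE}}{\partial \hat{\beta}_0}
  &= -2 \sum_{i=1}^{n} \bigl[ y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \bigr] = 0
  \;\Longrightarrow\; \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \\[4pt]
\frac{\partial\,\mathrm{SSE}}{\partial \hat{\beta}_1}
  &= -2 \sum_{i=1}^{n} x_i \bigl[ y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \bigr] = 0
  \;\Longrightarrow\; \sum x_i y_i = \hat{\beta}_0 \sum x_i + \hat{\beta}_1 \sum x_i^2 \\[4pt]
\text{Substituting } \hat{\beta}_0:\quad
\sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}
  &= \hat{\beta}_1 \left[ \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n} \right] \\[4pt]
S_{xy} &= \hat{\beta}_1 S_{xx}
  \;\Longrightarrow\; \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}
\end{align*}
```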



Example
Refer to the Airline Example.
a) Which is the dependent variable (Y) and which is the explanatory variable (X) in this problem?
b) Draw a scatter diagram of X and Y. Does the relationship look linear?
c) Fit the regression line of Y on X. That is, fit a linear regression model to these data with no. of Passengers as the response variable and no. of Reservations as the explanatory variable.
d) From the output, identify and interpret the slope and the intercept.

Example
Solution:
a) Since the no. of Passengers depends on the no. of Reservations, the dependent variable (Y) in this example is the no. of Passengers and the explanatory (or independent) variable (X) is the no. of Reservations.



Example
b) The scatter plot of no. of Reservations (X) and no. of Passengers (Y) is shown below.
It is clear from the diagram that there is a positive relationship between the variables. It is also reasonably clear that there is a linear trend in the data.

[Figure: scatter plot of No. of Passengers (vertical axis) against No. of Reservations (horizontal axis)]



x        y        x²          y²         xy
250      210      62500       44100      52500
548      405      300304      164025     221940
156      120      24336       14400      18720
121      89       14641       7921       10769
416      304      173056      92416      126464
450      320      202500      102400     144000
462      319      213444      101761     147378
508      410      258064      168100     208280
307      275      94249       75625      84425
311      289      96721       83521      89879
265      236      70225       55696      62540
189      170      35721       28900      32130
Σ=3983   Σ=3147   Σ=1545761   Σ=938865   Σ=1199025



Example - 1
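• For these data:

$$ \sum x_i = 3983; \quad \sum y_i = 3147; \quad \sum x_i^2 = 1545761; \quad \sum y_i^2 = 938865; \quad \sum x_i y_i = 1199025 $$

$$ S_{xy} = 154483.2; \quad S_{xx} = 223736.9; \quad S_{yy} = 113564.2 $$

$$ \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{154483.2}{223736.9} = 0.69 $$
and
$$ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 33.0721 $$
(using $\bar{x} = 3983/12$, $\bar{y} = 3147/12$ and the unrounded slope).
The fitted line is
$$ \hat{y} = 33.0721 + 0.69x $$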
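
As a check on the arithmetic, the slope and intercept can be computed directly from the column totals in the table. This is a sketch with illustrative variable names; it uses only the formulas for S_xx, S_xy and the least-squares estimators stated earlier.

```python
# Column totals from the airline table (n = 12 days).
n = 12
sum_x, sum_y = 3983, 3147
sum_x2, sum_xy = 1545761, 1199025

# Corrected sums of squares and cross-products.
s_xx = sum_x2 - sum_x ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Least-squares estimates: slope first, then intercept from the means.
slope = s_xy / s_xx
intercept = sum_y / n - slope * sum_x / n

print(f"slope     = {slope:.4f}")
print(f"intercept = {intercept:.4f}")
print(f"fitted line: y_hat = {intercept:.4f} + {slope:.2f} x")
```

The intercept has to be computed with the unrounded slope. Substituting the rounded value 0.69 into the intercept formula gives about 33.23 rather than 33.0721.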



Example
d) Interpretation of intercept (β0) and slope (β1):
(i) Slope β1:
• In general, this tells us how we expect Y to change, on average, if X is increased by 1 unit.
• In this example, the estimated slope is 0.69. Thus, for every additional reservation, the no. of passengers increases by 0.69 on average.
• Since the slope is positive, we expect Y to increase as X increases.
• If the slope were negative, we would expect Y to decrease as X increases.


Example

(ii) Intercept β0:

• This is the value of Y predicted for X = 0.
• In this example, the estimated intercept is 33.0721, which means that at zero Reservations the no. of Passengers is estimated to be 33.0721 (??).
• In most applications, the intercept has no useful practical interpretation. It just serves to fix the line.
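
Both interpretations can be read off the fitted line with a short sketch. The function name is hypothetical, the coefficients are the rounded estimates quoted in this example, and 300 reservations is an arbitrary illustrative input inside the observed range of 121 to 548.

```python
INTERCEPT = 33.0721  # estimated intercept quoted in the example
SLOPE = 0.69         # estimated slope quoted in the example (rounded)


def predicted_passengers(reservations):
    """Predicted no. of passengers from the fitted line."""
    return INTERCEPT + SLOPE * reservations


at_300 = predicted_passengers(300)
at_301 = predicted_passengers(301)

# Slope: the change in the prediction for one extra reservation.
print(f"one more reservation adds {at_301 - at_300:.2f} passengers on average")

# Intercept: the prediction at X = 0, far below the smallest observed X (121).
print(f"prediction at zero reservations: {predicted_passengers(0):.4f}")
```

Since X = 0 lies well outside the observed data, the prediction there is an extrapolation, which is one reason the intercept is hard to interpret in this example.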



Coefficient of determination

• The coefficient of determination, R², for a simple regression is equal to the simple correlation squared:

$$ R^2 = r_{xy}^2 $$

• An R² greater than 0.80 is usually an indication of a well-fitted model.
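
The identity can be checked numerically on the airline data. The sketch below assumes the usual definition R² = 1 − SSE/SST, where SST is the total sum of squares of Y about its mean. That definition is not given on the slides, and the variable names are illustrative.

```python
from math import sqrt

# Airline data: reservations (x) and passengers (y) for 12 days.
x = [250, 548, 156, 121, 416, 450, 462, 508, 307, 311, 265, 189]
y = [210, 405, 120, 89, 304, 320, 319, 410, 275, 289, 236, 170]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Corrected sums, written here as sums of deviations from the means.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares fit and its sum of squared errors.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

r_squared = 1 - sse / s_yy        # assumed definition: 1 - SSE/SST
r = s_xy / sqrt(s_xx * s_yy)      # sample Pearson correlation

print(f"R^2 = {r_squared:.3f}, r^2 = {r * r:.3f}")
```

The two quantities agree, and the value is above the 0.80 guideline, so by that rule of thumb the fitted line describes these data well.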



The End

