Covariance

The covariance of two variables x and y in a data set measures how
the two are linearly related. A positive covariance would indicate a
positive linear relationship between the variables, and a negative
covariance would indicate the opposite.

The covariance is expressed in units equal to the product of the units of the two variables. It can take any positive or negative value, or zero. The sign is interpreted as follows:

Positive covariance: Indicates that two variables tend to move in the same direction.

Negative covariance: Indicates that two variables tend to move in opposite directions.

Covariance
Variables may change in relation to each other.

Covariance measures how much the movement in one variable predicts the movement in a corresponding variable.

Variance vs Covariance
First, a note on your sample:
– If you wish to treat your sample as representative of the general population (RANDOM EFFECTS MODEL), use the degrees of freedom (n – 1) as the divisor in your calculations of variance or covariance.

– But if you simply want to describe your current sample (FIXED EFFECTS MODEL), substitute n for the degrees of freedom.

Variance vs Covariance
Do two variables change together?

Variance:
• Gives information on variability of a single variable.

$$\operatorname{Var}(X) = S_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Covariance:
• Gives information on the degree to which two variables vary together.
• Note how similar the covariance is to variance: the equation simply multiplies x's error scores by y's error scores as opposed to squaring x's error scores.

$$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
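The two formulas, together with the earlier note about dividing by n – 1 or by n, can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names, the `use_n_minus_1` flag and the three data points are all assumptions made up for the example.

```python
def sample_covariance(xs, ys, use_n_minus_1=True):
    """Covariance of paired observations.

    use_n_minus_1=True divides by n - 1; False divides by n.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products of the deviations ("error scores") of x and y.
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / (n - 1 if use_n_minus_1 else n)


def sample_variance(xs, use_n_minus_1=True):
    # Variance is the covariance of a variable with itself.
    return sample_covariance(xs, xs, use_n_minus_1)


# Made-up illustrative data, not taken from the slides.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
print(sample_variance(x))              # 1.0
print(sample_covariance(x, y))         # 2.0
print(sample_covariance(x, y, False))  # divides by n instead of n - 1
```

Defining the variance through the covariance function mirrors the point made above: the only difference between the two formulas is whether x's deviations are multiplied by themselves or by y's deviations.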

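Covariance

$$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

• When X increases and Y increases: cov(x, y) is positive.
• When X increases and Y decreases: cov(x, y) is negative.
• When there is no consistent relationship: cov(x, y) = 0.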

Covariance
When two random variables X and Y are not independent, it is frequently of interest to assess how strongly they are related to one another.
The covariance between two rv's X and Y is

$$\operatorname{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$$

Covariance
For two rv's X and Y and constants a and c,

$$\operatorname{Cov}(aX, cY) = ac\,\operatorname{Cov}(X, Y)$$

Covariance
For two rv's X and Y and constants a, b, c and d,

$$\operatorname{Cov}(aX + b, cY + d) = ac\,\operatorname{Cov}(X, Y)$$
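A short derivation may make the behaviour of covariance under linear transformations concrete. It is a sketch that assumes the usual definition Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)], with μ_X and μ_Y denoting the means of X and Y, and uses the fact that E(aX + b) = aμ_X + b.

```latex
\begin{align*}
\operatorname{Cov}(aX+b,\,cY+d)
  &= E\bigl[\bigl(aX+b-(a\mu_X+b)\bigr)\bigl(cY+d-(c\mu_Y+d)\bigr)\bigr] \\
  &= E\bigl[a(X-\mu_X)\cdot c(Y-\mu_Y)\bigr] \\
  &= ac\,E\bigl[(X-\mu_X)(Y-\mu_Y)\bigr] \\
  &= ac\,\operatorname{Cov}(X,Y).
\end{align*}
```

The additive constants b and d cancel in the deviations, so only the multipliers a and c affect the result. Setting b = d = 0 gives the special case Cov(aX, cY) = ac Cov(X, Y).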

Problem with Covariance:
The value of the covariance depends on the size of the data's standard deviations: it is larger in magnitude when they are large than when they are small, even if the relationship between x and y is exactly the same in the two datasets.
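A small numerical sketch can illustrate this scale dependence. The data below are made up for the example, and the helper `cov` is a hypothetical name; the same measurements are expressed once in metres and once in centimetres, so the relationship between the variables is unchanged while the standard deviation of x grows by a factor of 100.

```python
def cov(xs, ys):
    """Sample covariance with divisor n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


# Made-up illustrative data, not taken from the slides.
x_metres = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 3.0, 5.0, 6.0]

# Same observations, different unit for x.
x_centimetres = [100.0 * v for v in x_metres]

cov_m = cov(x_metres, y)
cov_cm = cov(x_centimetres, y)

# The covariance is 100 times larger although nothing about the
# relationship between the variables has changed.
print(cov_m, cov_cm, cov_cm / cov_m)
```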

10
Covariance
Ex: Calculate the covariance of the following pairs of
observations of two variates X and Y.
(1,6), (2,9), (3,6), (4,7), (5,8), (6,5), (7,12), (8,3), (9,17),
(10,1)
Sol:
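One way to work this exercise is sketched below in Python. Exact fractions are used so that no rounding enters the arithmetic; the variable names are illustrative, and both divisors (n − 1 and n) are shown because the exercise does not say which one is intended.

```python
from fractions import Fraction

pairs = [(1, 6), (2, 9), (3, 6), (4, 7), (5, 8),
         (6, 5), (7, 12), (8, 3), (9, 17), (10, 1)]
n = len(pairs)

# Means of X and Y as exact fractions: 11/2 and 37/5.
x_bar = Fraction(sum(x for x, _ in pairs), n)
y_bar = Fraction(sum(y for _, y in pairs), n)

# Sum of products of deviations from the means.
sum_products = sum((x - x_bar) * (y - y_bar) for x, y in pairs)

cov_n_minus_1 = sum_products / (n - 1)  # divisor n - 1
cov_n = sum_products / n                # divisor n

print(x_bar, y_bar, sum_products)  # 11/2 37/5 4
print(cov_n_minus_1, cov_n)        # 4/9 2/5
```

The sum of products of deviations is 4, so the covariance is 4/9 ≈ 0.44 with divisor n − 1 and 0.4 with divisor n.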

Example Covariance

  x    y    x_i − x̄    y_i − ȳ    (x_i − x̄)(y_i − ȳ)
  0    3      -3          0           0
  2    2      -1         -1           1
  3    4       0          1           0
  4    0       1         -3          -3
  6    6       3          3           9
  x̄ = 3     ȳ = 3                   ∑ = 7

$$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{7}{4} = 1.75$$
Covariance

Example: investigate the relationship between cigarette smoking and lung capacity.

Data: sample group response data on smoking habits, and measured lung capacities, respectively.

Smoking v Lung Capacity Data

  N    Cigarettes (X)    Lung Capacity (Y)
  1          0                  45
  2          5                  42
  3         10                  33
  4         15                  31
  5         20                  29

Smoking and Lung Capacity

[Figure: scatter plot of Lung Capacity (Y) against Smoking (yrs)]
Smoking v Lung Capacity
Observe that as smoking exposure goes up, the corresponding lung capacity goes down.

The variables vary inversely.

Covariance and correlation quantify this relationship.

Covariance

Variables that vary inversely, like smoking and lung capacity, tend to appear on opposite sides of their group means.
• When smoking is above its group mean, lung capacity tends to be below its group mean.
The average product of deviations measures the extent to which the variables vary together, that is, the degree of linkage between them.

The Sample Covariance
As with the variance, for theoretical reasons the average is typically computed using (N − 1), not N. Thus,

$$S_{xy} = \frac{1}{N - 1} \sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})$$

Calculating Covariance

  Cigs (X)    Lung Cap (Y)
      0           45
      5           42
     10           33
     15           31
     20           29
  X̄ = 10      Ȳ = 36

Calculating Covariance

  Cigs (X)    (X − X̄)    (X − X̄)(Y − Ȳ)    (Y − Ȳ)    Cap (Y)
      0         -10           -90              9         45
      5          -5           -30              6         42
     10           0             0             -3         33
     15           5           -25             -5         31
     20          10           -70             -7         29
                           ∑ = -215
Covariance Calculation

Evaluation yields,

$$S_{xy} = \frac{1}{4}(-215) = -53.75$$
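The same calculation can be reproduced with a short Python sketch. The variable names are illustrative; the data are the five (cigarettes, lung capacity) pairs from the table.

```python
cigs = [0, 5, 10, 15, 20]    # X: cigarettes
lung = [45, 42, 33, 31, 29]  # Y: lung capacity

n = len(cigs)
x_bar = sum(cigs) / n  # 10.0
y_bar = sum(lung) / n  # 36.0

# Products of deviations, one per subject; they sum to -215.
products = [(x - x_bar) * (y - y_bar) for x, y in zip(cigs, lung)]

s_xy = sum(products) / (n - 1)
print(s_xy)  # -53.75
```

The negative sign agrees with the observation that the two variables vary inversely.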

Computational Formula
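One widely used computational form of the sample covariance avoids computing individual deviations. The version below is an assumption about what is intended under this heading, since only the title is given here; it is checked against the smoking data, for which N = 5, ∑X = 50, ∑Y = 180 and ∑XY = 1585.

```latex
\[
S_{xy}
  = \frac{\sum_{i=1}^{N} X_i Y_i
          - \frac{1}{N}\Bigl(\sum_{i=1}^{N} X_i\Bigr)\Bigl(\sum_{i=1}^{N} Y_i\Bigr)}{N-1}
  = \frac{1585 - \frac{(50)(180)}{5}}{4}
  = \frac{1585 - 1800}{4}
  = -53.75
\]
```

The result matches the value obtained from the deviation-based formula.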

Covariance
That is, since $X - \mu_X$ and $Y - \mu_Y$ are the deviations of the two variables from their respective mean values, the covariance is the expected product of deviations. Note that $\operatorname{Cov}(X, X) = E[(X - \mu_X)^2] = V(X)$.

The rationale for the definition is as follows.

Suppose X and Y have a strong positive relationship to one another, by which we mean that large values of X tend to occur with large values of Y and small values of X with small values of Y.

Covariance
Then most of the probability mass or density will be associated with $(x - \mu_X)$ and $(y - \mu_Y)$ either both positive (both X and Y above their respective means) or both negative, so the product $(x - \mu_X)(y - \mu_Y)$ will tend to be positive.

Thus for a strong positive relationship, Cov(X, Y) should be quite positive.

For a strong negative relationship, the signs of $(x - \mu_X)$ and $(y - \mu_Y)$ will tend to be opposite, yielding a negative product.
Covariance
Thus for a strong negative relationship, Cov(X, Y) should be quite negative.

If X and Y are not strongly related, positive and negative products will tend to cancel one another, yielding a covariance near 0.

Covariance
Figure 5.4 illustrates the different possibilities. The covariance depends on both the set of possible pairs and the probabilities. In Figure 5.4, the probabilities could be changed without altering the set of possible pairs, and this could drastically change the value of Cov(X, Y).

Figure 5.4: p(x, y) = 1/10 for each of ten pairs corresponding to indicated points: (a) positive covariance; (b) negative covariance; (c) covariance near zero
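The dependence on the probabilities, and not only on the set of pairs, can be illustrated with a small Python sketch. The five points and both probability assignments below are hypothetical (they are not the points of Figure 5.4), and `covariance_from_pmf` is an illustrative helper name. Exact fractions are used so that the check that the probabilities sum to 1 is reliable.

```python
from fractions import Fraction


def covariance_from_pmf(pmf):
    """Cov(X, Y) for a joint pmf given as {(x, y): probability}."""
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    mu_x = sum(p * x for (x, _), p in pmf.items())
    mu_y = sum(p * y for (_, y), p in pmf.items())
    # Expected product of deviations from the means.
    return sum(p * (x - mu_x) * (y - mu_y) for (x, y), p in pmf.items())


# Hypothetical set of possible pairs, chosen for illustration only.
points = [(1, 1), (2, 2), (3, 3), (1, 3), (3, 1)]

# Equal probability on every pair.
uniform = {pt: Fraction(1, 5) for pt in points}

# Same pairs, but more probability on the pairs along the diagonal.
reweighted = {
    (1, 1): Fraction(3, 10),
    (2, 2): Fraction(2, 10),
    (3, 3): Fraction(3, 10),
    (1, 3): Fraction(1, 10),
    (3, 1): Fraction(1, 10),
}

print(covariance_from_pmf(uniform))     # 0
print(covariance_from_pmf(reweighted))  # 2/5
```

The set of possible pairs is identical in both cases; only the probabilities differ, and the covariance moves from 0 to 2/5.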
Covariance
The following shortcut formula for Cov(X, Y) simplifies the computations.

Proposition
$$\operatorname{Cov}(X, Y) = E(XY) - \mu_X \cdot \mu_Y$$

According to this formula, no intermediate subtractions are necessary; only at the end of the computation is $\mu_X \cdot \mu_Y$ subtracted from $E(XY)$. The proof involves expanding $(X - \mu_X)(Y - \mu_Y)$ and then taking the expected value of each term separately.
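The expansion described in the proof can be written out step by step. This is a sketch that relies only on the linearity of expectation, with μ_X = E(X) and μ_Y = E(Y) treated as constants.

```latex
\begin{align*}
\operatorname{Cov}(X,Y)
  &= E\bigl[(X-\mu_X)(Y-\mu_Y)\bigr] \\
  &= E\bigl[XY - \mu_X Y - \mu_Y X + \mu_X\mu_Y\bigr] \\
  &= E(XY) - \mu_X E(Y) - \mu_Y E(X) + \mu_X\mu_Y \\
  &= E(XY) - \mu_X\mu_Y - \mu_X\mu_Y + \mu_X\mu_Y \\
  &= E(XY) - \mu_X\mu_Y.
\end{align*}
```

As a numerical illustration, suppose the ten pairs from the earlier exercise each had probability 1/10. Then E(XY) = 411/10 = 41.1 and μ_X · μ_Y = (5.5)(7.4) = 40.7, so Cov(X, Y) = 41.1 − 40.7 = 0.4.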
