Covariance & Correlation


Litty Mathew
Table of contents
• Formula
• Key Differences
• Why correlation is preferred?
• Correlation Estimation
1. Scatter diagram
2. Correlation Coefficient
• Correlation: For different types of data
• Spurious correlation
• Coefficient of determination
Covariance & correlation
• Variance: How a single variable varies around its mean.
Var(X) = E([X − E(X)]^2)
= Cov(X, X)
• Covariance: How two variables vary together.

Cov (X,Y)=E([X−E(X)][Y−E(Y)])

• Correlation: How strongly a change in one variable is associated with a
change in the other.

Measures the strength and direction of the linear relationship.

Cor(X, Y) = Cov(X, Y) / (std(X) · std(Y))

Both covariance and correlation measure only the linear relationship.
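
As a minimal sketch of the three definitions above, the following computes variance, covariance and correlation directly from the expectation formulas. The data values and the helper name `pop_cov` are made up for illustration, and population formulas (dividing by n) are assumed.

```python
import numpy as np

# Illustrative data (hypothetical values, chosen so the arithmetic is easy to follow)
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

def pop_cov(a, b):
    """Population covariance: E[(a - E[a]) * (b - E[b])]."""
    return np.mean((a - a.mean()) * (b - b.mean()))

var_x = pop_cov(x, x)    # Var(X) = Cov(X, X)
cov_xy = pop_cov(x, y)   # Cov(X, Y)
cor_xy = cov_xy / (np.sqrt(pop_cov(x, x)) * np.sqrt(pop_cov(y, y)))

print(var_x, cov_xy, cor_xy)
```

NumPy's built-in `np.corrcoef(x, y)[0, 1]` gives the same correlation. Note that `np.cov` divides by n − 1 by default, so its covariance differs from `pop_cov` by a factor of n/(n − 1), while the correlation is unaffected by that choice.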


Which is Preferred?
• Correlation is preferred over covariance because it is unaffected by
changes in location and (positive) scale, so it can be used to compare the
relationship in one pair of variables with that in another pair.
• Correlation is dimensionless, i.e. it is a unit-free measure of the
relationship between variables.
• Correlation is a normalized covariance.
• A covariance matrix is hard to interpret because its entries (the
covariances) mix different units of measure.
• One way around that is to standardize the measures by converting
them to z-scores: z_i = (x_i − x̄) / SD_x
• The z-scores then have a distribution with mean M = 0 and SD = 1.
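
The points above can be checked numerically. This sketch uses made-up height and weight values (not from the slides) to show that changing units rescales the covariance but leaves the correlation unchanged, and that the covariance of z-scores equals the correlation. The helper names are hypothetical.

```python
import numpy as np

# Hypothetical measurements, for illustration only
height_cm = np.array([150.0, 160.0, 170.0, 180.0, 190.0])
weight_kg = np.array([52.0, 58.0, 69.0, 75.0, 88.0])

def pop_cov(a, b):
    """Population covariance E[(a - E[a]) * (b - E[b])]."""
    return np.mean((a - a.mean()) * (b - b.mean()))

def pop_cor(a, b):
    """Correlation as covariance divided by the product of standard deviations."""
    return pop_cov(a, b) / (a.std() * b.std())

def zscore(a):
    """Standardize to mean 0 and SD 1 (population SD)."""
    return (a - a.mean()) / a.std()

# Change of scale (cm -> m) and of location (subtract 1 m)
height_m_shifted = height_cm / 100.0 - 1.0

cov_cm = pop_cov(height_cm, weight_kg)
cov_m = pop_cov(height_m_shifted, weight_kg)   # 100 times smaller than cov_cm
cor_cm = pop_cor(height_cm, weight_kg)
cor_m = pop_cor(height_m_shifted, weight_kg)   # same as cor_cm

z_h, z_w = zscore(height_cm), zscore(weight_kg)
cov_z = pop_cov(z_h, z_w)                      # equals the correlation

print(cov_cm, cov_m, cor_cm, cor_m, cov_z)
```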
Correlation Estimation
Correlation Coefficient
• It measures the degree of linear relationship between two variables
X and Y. The population correlation coefficient is denoted by rho (ρ)
and is estimated by the sample coefficient r.

r = cov(X, Y) / (std(X) · std(Y))

• Also called the Pearson product-moment correlation coefficient.


• Correlation = +1 indicates a perfect direct (increasing) linear
relationship between the random variables.
• Correlation = -1 indicates a perfect inverse linear relationship: an
increase in one variable corresponds to a proportional decrease in the
other variable.
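
A short sketch of the two extreme cases, using arbitrary illustrative slopes and intercepts: any exact linear relationship gives r = +1 or r = -1 depending only on the sign of the slope, not on its size.

```python
import numpy as np

x = np.arange(1, 11, dtype=float)

# Two exact linear relationships (coefficients chosen arbitrarily)
y_up = 3.0 * x + 2.0      # positive slope
y_down = -0.5 * x + 7.0   # negative slope

r_up = np.corrcoef(x, y_up)[0, 1]      # Pearson r for the increasing line
r_down = np.corrcoef(x, y_down)[0, 1]  # Pearson r for the decreasing line

print(r_up, r_down)
```

The slope of -0.5 still yields r = -1, which is why the decrease is described as proportional rather than equal in magnitude.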
Don’t Confuse!!
• If two variables are independent, the
correlation coefficient between them is zero,
but the converse is not true.
• If r = 0, there is no linear relationship between
the variables X and Y; a nonlinear relationship
may still exist.
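
A small counterexample, as a sketch, makes the warning concrete: here Y is completely determined by X (so the two are certainly not independent), yet the correlation is zero because the relationship is symmetric rather than linear. The values are illustrative.

```python
import numpy as np

# X is symmetric around zero; Y depends on X exactly, but not linearly
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = x ** 2

r = np.corrcoef(x, y)[0, 1]
print(r)  # zero up to floating-point rounding
```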
Correlation: For different types of data
Spurious correlation
• A spurious correlation is a relationship between two variables that
appear to be related (correlated) but whose association arises only by
chance or through an unseen third variable.
• E.g., ice cream sales and accidental drownings are highly correlated:
they tend to move up and down together.
• In this case a third variable, temperature, is the driver. Hotter days
result in higher ice cream sales and more people swimming and,
unfortunately, more people drowning.
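
The ice cream example can be imitated with simulated data. Everything in this sketch is an assumption made for illustration: the numbers are synthetic, the coefficients are arbitrary, and removing the effect of temperature by correlating regression residuals (a partial correlation) is a technique brought in here, not one taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
n = 5000

# Synthetic data: temperature drives both series; they do not affect each other
temperature = rng.normal(25.0, 5.0, n)
ice_cream = 10.0 * temperature + rng.normal(0.0, 20.0, n)
drownings = 0.3 * temperature + rng.normal(0.0, 1.0, n)

# Raw correlation between the two outcomes looks strong
r_raw = np.corrcoef(ice_cream, drownings)[0, 1]

def residuals(v, t):
    """Part of v left over after a straight-line fit on t."""
    slope, intercept = np.polyfit(t, v, 1)
    return v - (slope * t + intercept)

# Correlation after removing the linear effect of temperature from both
r_partial = np.corrcoef(residuals(ice_cream, temperature),
                        residuals(drownings, temperature))[0, 1]

print(r_raw, r_partial)
```

In this setup the raw correlation is large while the temperature-adjusted one is close to zero, which is the signature of an association driven by a third variable.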
Coefficient of determination (R^2)
• In simple linear regression with an intercept, R^2 is simply the square
of the sample correlation coefficient r between the observed
outcomes and the observed predictor values.
• An R^2 of 0 means that the dependent variable cannot be predicted
linearly from the independent variable.
• An R^2 of 1 means the dependent variable can be predicted without
error from the independent variable.
• It is the proportion of the variance in the dependent variable that is
predictable from the independent variable.
• An R^2 of 0.10 means that 10 percent of the variance in Y is predictable
from X.
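
As a sketch of the first bullet, the following fits a straight line with an intercept to made-up data, computes R^2 from the residual and total sums of squares, and compares it with the squared sample correlation. The data values are hypothetical.

```python
import numpy as np

# Hypothetical observations, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.7])

# Least-squares line with an intercept (polyfit returns slope first)
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

ss_res = np.sum((y - y_hat) ** 2)     # variation left unexplained by the line
ss_tot = np.sum((y - y.mean()) ** 2)  # total variation in y
r_squared = 1.0 - ss_res / ss_tot

r = np.corrcoef(x, y)[0, 1]
print(r_squared, r ** 2)  # the two agree up to rounding
```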
Thank you
