Covariance and Correlation
Business Analytics
Correlation
• When you say that two items correlate, you are saying
that changes in one item tend to be accompanied by
changes in the other item. Correlation alone does not
show that one item causes the other to change.
Correlation
Covariance
• Covariance is a concept used in statistics and probability
theory to describe how two variables vary together.
• Covariance is not standardized, unlike the correlation
coefficient.
• In business and investing, covariance is used to measure
how the returns of different investments move in relation
to one another over a period of time. Usually the
investment assets are the marketable securities in a
portfolio.
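For reference, the sample covariance of n paired observations is usually written as below. This is the standard textbook definition, not a formula taken from the slides; the symbols x_i, y_i and the means x-bar and y-bar are introduced here for illustration. It is also what R's cov() computes by default.

```latex
\operatorname{cov}(x, y) = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
```

Each term is positive when x_i and y_i lie on the same side of their means and negative when they lie on opposite sides, which is why the sign of the sum shows the direction of the relationship.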
Covariance
• A positive covariance means the assets’ returns move up
or down together. A negative covariance means they
move in opposite directions.
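A minimal sketch of the sign of covariance, assuming three made-up return series. The names asset_a, asset_b and asset_c and all the numbers are hypothetical, not real market data.

```r
# Hypothetical monthly returns (in percent), invented for illustration only
asset_a <- c(2.0, -1.0, 3.0, 0.5, -2.0)
asset_b <- c(1.5, -0.5, 2.5, 1.0, -1.5)  # tends to move with asset_a
asset_c <- c(-1.0, 1.5, -2.0, 0.0, 2.5)  # tends to move against asset_a

cov_ab <- cov(asset_a, asset_b)  # positive: returns move together
cov_ac <- cov(asset_a, asset_c)  # negative: returns move in opposite directions

print(cov_ab)
print(cov_ac)
```

Only the sign is easy to read here. The size of each value depends on the units of the returns.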
Covariance
• For instance, if Asset A and Asset B have a negative
covariance, then when Asset A’s return is above its
average, Asset B’s return tends to be below its average.
• However, if Asset A’s returns increase by 15%, covariance
will only tell you that Asset B’s returns tend to go down.
The amount could be 5%, 10%, etc.
• To judge how strong the relationship is, you must first
calculate the correlation between the two assets, which
does not depend on their units of measure.
Covariance
• A large covariance can mean a strong relationship
between variables.
• However, you can’t compare covariances over data sets
with different scales (like pounds and inches).
• A weak covariance in one data set may be a strong one in
a different data set with different scales.
• The main problem with covariance is that the wide range
of results that it takes on makes it hard to interpret.
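The scale problem can be sketched in R as follows, assuming a small made-up set of heights and weights. The values and variable names are hypothetical. The same measurements are expressed in two unit systems to show that covariance changes while correlation does not.

```r
# Hypothetical heights and weights, invented for illustration only
height_in <- c(60, 62, 65, 68, 70, 72)        # inches
weight_lb <- c(115, 120, 140, 155, 165, 180)  # pounds

# The same measurements expressed in metric units
height_cm <- height_in * 2.54
weight_kg <- weight_lb * 0.45359237

cov_imperial <- cov(height_in, weight_lb)
cov_metric   <- cov(height_cm, weight_kg)
cor_imperial <- cor(height_in, weight_lb)
cor_metric   <- cor(height_cm, weight_kg)

print(c(cov_imperial, cov_metric))  # covariance changes with the units
print(c(cor_imperial, cor_metric))  # correlation is the same in both
```

Rescaling each variable by a constant multiplies the covariance by the product of those constants, so the two covariances describe the same relationship with different numbers.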
Covariance
• For example, your data set could return a value of 3, or
3,000. This wide range of values is caused by a simple
fact: the larger the X and Y values, the larger the
covariance tends to be.
• A covariance of 300 tells us that the variables are
positively related, but unlike the correlation coefficient,
that number doesn’t tell us how strong that relationship
is.
• The problem can be fixed by dividing the covariance by
the product of the two variables’ standard deviations to
get the correlation coefficient, which always lies
between -1 and 1.
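As a quick check of this relationship, the sketch below divides the covariance by the product of the two standard deviations and compares the result with R's built-in cor(). The vectors x and y are hypothetical values invented for illustration, not the ice cream data set.

```r
# Hypothetical data, invented for illustration only
x <- c(14, 16, 20, 23, 25, 30)
y <- c(215, 325, 410, 520, 540, 610)

# Covariance divided by the product of both standard deviations
r_manual <- cov(x, y) / (sd(x) * sd(y))

# Pearson correlation coefficient from base R
r_builtin <- cor(x, y)

print(r_manual)
print(r_builtin)
```

The two printed values should agree up to floating-point rounding.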
Correlation
R Code Correlation & Covariance
• Let us find whether there is a correlation between ice cream sales
and temperature.
• Read the CSV file into a data frame, "mydata" (read.csv() is part
of base R, so no extra package is needed)
> mydata <- read.csv("C:/Users/User/Documents/R/icecream.csv", header = TRUE)
> x <- mydata$Temperature
> y <- mydata$Sales
> cov(x, y)
[1] 484.0932
> cor(x, y)
[1] 0.9575066
R Code Correlation & Covariance
> plot(x, y, xlab = "Temperature", ylab = "Sales")
> abline(lm(y ~ x))
The correlation coefficient of Temperature and Sales is
0.9575066.
Since it is close to 1, we can conclude that the
variables have a strong positive linear relationship.
Scatterplot showing +ve Correlation