Preprocessing-Feature Engineering


Linear Relationship among Features/Variables
Day    ABC Returns (%)    XYZ Returns (%)
1      1.1                3.0
2      1.7                4.2
3      2.1                4.9
4      1.4                4.1
5      0.2                2.5

Daily returns for two stocks using the closing prices

How can we tell whether the returns of stocks ABC and XYZ are related?

How strongly are the two stocks related?


Covariance

Covariance measures how two variables move together. It measures
whether the two move in the same direction (a positive covariance)
or in opposite directions (a negative covariance).
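
As a minimal sketch, the covariance of the ABC and XYZ returns from the table can be computed as follows. This assumes the sample covariance (dividing by n - 1); the slides do not say whether the sample or population form is intended, and the function name `sample_covariance` is illustrative only. The sign of the result is the same under either convention.

```python
# Daily returns (%) for the two stocks, taken from the table above.
abc = [1.1, 1.7, 2.1, 1.4, 0.2]
xyz = [3.0, 4.2, 4.9, 4.1, 2.5]


def sample_covariance(x, y):
    """Sum of products of deviations from the means, divided by n - 1.

    Assumption: sample covariance; divide by n instead for the
    population form.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


cov = sample_covariance(abc, xyz)
print(round(cov, 3))  # 0.665
```

The result is positive, so above-average returns on ABC tend to coincide with above-average returns on XYZ.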

Covariance Interpretation
In the example there is a positive covariance, so
the two stocks tend to move together. When one
has a high return, the other tends to have a high
return as well.

If the covariance were negative, the two stocks would tend to move
in opposite directions: when one had an above-average return, the
other would tend to have a below-average return.
Correlation

Covariance tells us the direction in which the stocks move together,
but its magnitude depends on the scale of the variables. To determine
the strength of the relationship, we need correlation.
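
A minimal sketch of the strength calculation for the same two stocks, assuming the Pearson correlation coefficient is the measure intended (the slides say only "correlation"). The function name `pearson_correlation` is illustrative. The coefficient is the covariance rescaled by both standard deviations, so it always lies between -1 and 1.

```python
import math

# Daily returns (%) for the two stocks, taken from the table above.
abc = [1.1, 1.7, 2.1, 1.4, 0.2]
xyz = [3.0, 4.2, 4.9, 4.1, 2.5]


def pearson_correlation(x, y):
    """Covariance of x and y divided by the product of their standard deviations.

    The n - 1 (or n) factors cancel, so only sums of deviations are needed.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_xx = sum(a * a for a in dx)
    sum_yy = sum(b * b for b in dy)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


r = pearson_correlation(abc, xyz)
print(round(r, 2))  # 0.95
```

A value close to 1 indicates a strong positive linear relationship between the two return series.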
Correlation Analysis (Categorical Data)

• χ² (chi-square) test

  $$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

• The larger the χ² value, the more likely the variables are related
• The cells that contribute the most to the χ² value are those
whose actual count is very different from the expected count
• Correlation does not imply causality
– The number of hospitals and the number of car thefts in a city are correlated
– Both are causally linked to a third variable: population

September 6, 2015 Data Mining: Concepts and Techniques 15


Chi-Square Calculation: An Example

                            Play chess    Not play chess    Sum (row)
Like science fiction        250 (90)      200 (360)         450
Not like science fiction    50 (210)      1000 (840)        1050
Sum (col.)                  300           1200              1500

• χ² (chi-square) calculation (numbers in parentheses are expected
counts, calculated from the data distribution in the two categories,
i.e. row total × column total / grand total; for example,
450 × 300 / 1500 = 90)
( 250  90) 2
(50  210) 2
( 200  360) 2
(1000  840) 2
2      507.93
90 210 360 840
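• With 1 degree of freedom, this value far exceeds the 10.828 needed
to reject independence at the 0.001 significance level, so
like_science_fiction and play_chess are correlated in the group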
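
The calculation above can be sketched in a few lines of Python. This is an illustrative implementation, not code from the slides: the function name `chi_square` and the nested-list layout of the contingency table are assumptions. Expected counts are derived from the row and column totals under the hypothesis that the two attributes are independent.

```python
# Observed counts from the contingency table above.
# Rows: like / not like science fiction. Columns: play / not play chess.
observed = [
    [250, 200],
    [50, 1000],
]


def chi_square(table):
    """Return (statistic, expected counts) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    expected = []
    for i, row in enumerate(table):
        expected_row = []
        for j, obs in enumerate(row):
            # Expected count if the row and column attributes were independent.
            exp = row_totals[i] * col_totals[j] / grand_total
            expected_row.append(exp)
            statistic += (obs - exp) ** 2 / exp
        expected.append(expected_row)
    return statistic, expected


stat, expected = chi_square(observed)
print(expected)  # [[90.0, 360.0], [210.0, 840.0]]
print(stat)
```

The expected counts match the parenthesised values in the table. The unrounded statistic is 128000/252, about 507.94; the 507.93 quoted above corresponds to summing the four terms after rounding each to two decimals.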