Assignment 2 - Advanced Statistics

1.
Covariance: It measures how two variables change together, i.e. the direction of the linear relationship between them. Its value depends on the scales (units) of the variables, so there is no fixed range for the value of covariance.

Correlation: It also measures the linear relationship between two variables, but it is the covariance divided by the product of their standard deviations. This makes it unit-free and restricts its value to the range -1 to +1.
If the correlation tends to -1, the variables are negatively correlated. If it tends to +1, they are positively correlated.

Sample size (N) = 7

Sum(A) = 25 + 35 + 21 + 67 + 98 + 27 + 64 = 337
Mean(A) = 337 ÷ 7 = 48.143 (approx.)

Sum(B) = 52 + 10 + 5 + 98 + 52 + 36 + 69 = 322
Mean(B) = 322 ÷ 7 = 46

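$$
\operatorname{Cov}(A,B) = \frac{\sum_{i=1}^{N} (A_i - \mu_A)(B_i - \mu_B)}{N - 1}
$$

$$
\begin{aligned}
\operatorname{Cov}(A,B) &= \frac{1}{6}\big[(25-48.143)(52-46) + (35-48.143)(10-46) + (21-48.143)(5-46) \\
&\qquad + (67-48.143)(98-46) + (98-48.143)(52-46) + (27-48.143)(36-46) + (64-48.143)(69-46)\big] \\
&= \frac{3303}{6} = 550.5
\end{aligned}
$$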
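The same computation can be checked with a short plain-Python sketch that follows the formula step by step: compute the two means, multiply the paired deviations, sum the products, and divide by N - 1.

```python
# Sample covariance of A and B, following the formula step by step
A = [25, 35, 21, 67, 98, 27, 64]
B = [52, 10, 5, 98, 52, 36, 69]
n = len(A)

mean_a = sum(A) / n  # 337 / 7, about 48.143
mean_b = sum(B) / n  # 322 / 7 = 46.0

# Product of deviations for each pair (A_i, B_i)
products = [(a - mean_a) * (b - mean_b) for a, b in zip(A, B)]

sum_products = sum(products)     # about 3303
cov_ab = sum_products / (n - 1)  # divide by N - 1 for the sample covariance

print(f"sum of products = {sum_products:.3f}")  # 3303.000
print(f"Cov(A, B) = {cov_ab:.1f}")              # 550.5
```

Because the deviations of B sum to zero, rounding the mean of A to 48.143 does not change the sum of products, which is why the hand calculation lands exactly on 3303.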

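$$
\begin{aligned}
\sigma(A) &= \sqrt{\frac{\sum_{i=1}^{N} (A_i - \mu_A)^2}{N - 1}} \\
&= \sqrt{\frac{(25-48.143)^2 + (35-48.143)^2 + (21-48.143)^2 + (67-48.143)^2 + (98-48.143)^2 + (27-48.143)^2 + (64-48.143)^2}{7 - 1}} \\
&= \sqrt{\frac{4984.857}{6}} = \sqrt{830.809} \approx 28.8238
\end{aligned}
$$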

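$$
\begin{aligned}
\sigma(B) &= \sqrt{\frac{\sum_{i=1}^{N} (B_i - \mu_B)^2}{N - 1}} \\
&= \sqrt{\frac{(52-46)^2 + (10-46)^2 + (5-46)^2 + (98-46)^2 + (52-46)^2 + (36-46)^2 + (69-46)^2}{7 - 1}} \\
&= \sqrt{\frac{6382}{6}} = \sqrt{1063.667} \approx 32.6139
\end{aligned}
$$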
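A matching Python sketch for the two standard deviations. The helper name `sample_std` is introduced here for illustration; like the formulas above, it divides by N - 1 (the standard library's `statistics.stdev` computes the same sample standard deviation).

```python
from math import sqrt

A = [25, 35, 21, 67, 98, 27, 64]
B = [52, 10, 5, 98, 52, 36, 69]


def sample_std(values):
    """Square root of the sum of squared deviations divided by N - 1."""
    n = len(values)
    m = sum(values) / n
    squared_deviations = sum((v - m) ** 2 for v in values)
    return sqrt(squared_deviations / (n - 1))


sigma_a = sample_std(A)
sigma_b = sample_std(B)
print(f"sigma(A) = {sigma_a:.4f}")  # 28.8238
print(f"sigma(B) = {sigma_b:.4f}")  # 32.6139
```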

$$
r(A,B) = \frac{\operatorname{Cov}(A,B)}{\sigma(A)\,\sigma(B)} = \frac{550.5}{28.8238 \times 32.6139} = \frac{550.5}{940.06} \approx 0.5856
$$

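Since the covariance and both standard deviations are each divided by N - 1, those factors cancel, and the correlation can be written as r = S_AB / sqrt(S_AA × S_BB), where S_AB is the sum of cross-products and S_AA, S_BB are the sums of squared deviations (these names are chosen here for illustration). The plain-Python sketch below computes r both ways.

```python
from math import sqrt

A = [25, 35, 21, 67, 98, 27, 64]
B = [52, 10, 5, 98, 52, 36, 69]
n = len(A)
mean_a = sum(A) / n
mean_b = sum(B) / n

# Sum of cross-products and sums of squared deviations
s_ab = sum((a - mean_a) * (b - mean_b) for a, b in zip(A, B))  # about 3303
s_aa = sum((a - mean_a) ** 2 for a in A)                        # about 4984.857
s_bb = sum((b - mean_b) ** 2 for b in B)                        # 6382

# Full formula: covariance over the product of the standard deviations
r_full = (s_ab / (n - 1)) / (sqrt(s_aa / (n - 1)) * sqrt(s_bb / (n - 1)))

# The (n - 1) factors cancel, leaving an equivalent shorter form
r_short = s_ab / sqrt(s_aa * s_bb)

print(f"r (full formula):    {r_full:.4f}")   # 0.5856
print(f"r (n - 1 cancelled): {r_short:.4f}")  # 0.5856
```

Keeping full precision until the final division matters here: rounding the standard deviations to 28.82 and 32.61 first gives about 0.5858 instead of 0.5856.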
2. Multicollinearity occurs when two or more predictor (independent) variables are strongly linearly related to one another.
Including such variables makes the model's coefficient estimates unstable, so the model may perform well on the validation set but fail completely in out-of-time validation or in production.

The standard way to avoid multicollinearity is to remove the redundant variables, i.e. keep one variable from each group of strongly correlated variables and drop the others.

But there are scenarios in which we need to retain some of these linearly dependent variables in the final training set used to build the model. In such cases, we may create a new variable that is a linear combination of them and drop all of the identified variables.

Suppose we identify a strong correlation between the variables X1 and X2. Normally we would discard one of them. But the business considers both variables important and wants both to be used. In such a case, we may drop X1 and X2 and introduce a new variable Z = X1 + X2 (or any other linear combination).
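The combination step can be sketched in plain Python as well. The helper `combine` and the toy data are hypothetical; with the default equal weights it builds Z = X1 + X2, and passing `weights` gives any other linear combination.

```python
# Hypothetical toy predictors (not from the assignment); x1 and x2 are strongly correlated
data = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [2.1, 3.9, 6.2, 7.8, 10.1, 12.0],
    "x3": [5.0, 3.0, 6.0, 2.0, 7.0, 4.0],
}


def combine(features, cols, new_name, weights=None):
    """Replace the columns in `cols` with a single weighted sum called `new_name`.

    With the default weights of 1.0, cols = ["x1", "x2"] gives Z = X1 + X2.
    """
    if weights is None:
        weights = [1.0] * len(cols)
    n_rows = len(features[cols[0]])
    combined = [
        sum(w * features[c][i] for w, c in zip(weights, cols))
        for i in range(n_rows)
    ]
    # Keep every column that is not being combined, then add the new one
    result = {name: values for name, values in features.items() if name not in cols}
    result[new_name] = combined
    return result


new_data = combine(data, ["x1", "x2"], "z")
print(sorted(new_data))  # ['x3', 'z']
print(new_data["z"])
```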
