Quantitative Techniques Assignment: Correlation Analysis, Karl Pearson's Coefficient of Correlation


Correlation analysis
Some important definitions of correlation are given below:

''Correlation analysis deals with the association between two or more variables. If two or more quantities vary in sympathy, so that movements in one tend to be accompanied by corresponding movements in the other, then they are said to be correlated. When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation. Correlation analysis attempts to determine the degree of relationship between variables. Correlation is an analysis of the covariation between two or more variables.''

Thus correlation is a statistical device which helps us in analysing the covariation of two or more variables.

Correlation analysis contributes to the understanding of economic behaviour, aids in locating the critically important variables on which others depend, may reveal to the economist the connections by which disturbances spread, and suggests the paths through which stabilizing forces may become effective.
In business, correlation analysis enables the executive to estimate costs, sales, prices and other variables on the basis of some other series with which these costs, sales or prices may be functionally related. Some of the guesswork can be removed from decisions when the relationship between a variable to be estimated and the one or more other variables on which it depends is close and reasonably invariant.
Some of its main topics are:
1.    Correlation and causation
2.    Karl Pearson's coefficient of correlation
3.    Significance of the study of correlation
Correlation and causation
Correlation analysis helps us in determining the degree of relationship between two or more variables; it does not tell us anything about a cause and effect relationship. Even a high degree of correlation does not necessarily mean that a relationship of cause and effect exists between the variables or, simply stated, correlation does not necessarily imply causation or a functional relationship, though the existence of causation always implies correlation. By itself, correlation establishes only covariation. The explanation of a significant degree of correlation may be any one, or a combination, of the following reasons:
(i) The correlation may be due to pure chance, especially in a small sample. We may get a high degree of correlation between two variables in a sample, but in the universe there may not be any relationship between the variables at all. Such a correlation may arise either because of pure random sampling variation or because of the bias of the investigator in selecting the sample. The following example illustrates the point:
Income ($) 9500 9600 9700 9800 9900
Weight (lbs.) 120 140 160 180 200
The above data show a perfect positive relationship between income and weight, i.e. as income increases weight increases, and the rate of change between the two variables is the same. Such a relationship in a sample of only five observations is far more likely to reflect chance than any real association between income and weight.
(ii)   Both the correlated variables may be influenced by one or more other variables. It is quite possible that a high degree of correlation between the variables is due to some common causes affecting each in the same way. For example, a high degree of correlation between the yield per acre of rice and of tea may be due to the fact that both are related to the amount of rainfall; neither of the two variables is the cause of the other. To take another example: suppose the correlation between teachers' salaries and the consumption of liquor over a period of years comes out to be 0.9. This does not prove that teachers drink, nor does it prove that liquor sales increase teachers' salaries. Instead, both variables move together because both are influenced by a third variable: long-run growth in national income and population.
(iii)   Both the variables may be mutually influencing each other, so that neither can be designated as the cause and the other the effect. There may be a high degree of correlation between the variables, but it may be difficult to pinpoint which is the cause and which is the effect. This is especially likely to be so in the case of economic variables. For example, such variables as demand and supply, or price and production, mutually interact. To take a specific case, it is a well-known principle of economics that as the price of a commodity increases its demand goes down, so price is the cause and demand the effect; but an increase in the demand for a commodity, due to growth of population or other reasons, may also exercise an upward pressure on price. Thus, at times it may become difficult to say which of the two correlated variables is the cause and which is the effect, because each may be acting on the other.
The above points clearly bring out the fact that a mathematical relationship implies nothing in itself about cause and effect. In general, if factors A and B are correlated, it may be that
(1) A causes B, but it might also be that
(2) B causes A,
(3) A and B influence each other continuously or intermittently,
(4) A and B are both influenced by C, or
(5) the correlation is due to chance.
In many instances an extremely high degree of correlation between two variables may be obtained when no meaning can be attached to the answer. There is, for example, an extremely high correlation between some series representing the production of pigs and the production of pig iron, yet no one has ever believed that this correlation has any meaning or that it indicates the existence of a cause-effect relation. By itself, it establishes only covariation. Correlation observed between variables that cannot conceivably be causally related is called spurious or nonsense correlation; more appropriately, we should remember that it is the interpretation of the degree of correlation that is spurious, not the degree of correlation itself. The high degree of correlation indicates only the mathematical result. We should reach a conclusion based on logical reasoning and intelligent investigation of significantly related matters, thereby avoiding not only reading causation into spurious correlation but also interpreting spuriously a perfectly valid relationship.
Karl Pearson's coefficient of correlation
Of the several mathematical methods of measuring correlation, Karl Pearson's method, popularly known as Pearson's coefficient of correlation, is the most widely used in practice. The Pearsonian coefficient of correlation is denoted by the symbol r. It is one of the very few symbols that are used universally for describing the degree of correlation between two series. The formula for computing Pearsonian r is:

r = Σxy / (N σx σy)          ... (I)

where x = (X − X̄), y = (Y − Ȳ), σx is the standard deviation of series X, σy is the standard deviation of series Y, and N is the number of pairs of observations.

This method is to be applied only where deviations of items are taken from the actual mean and not from an assumed mean.
The value of the coefficient of correlation as obtained by the above formula always lies between +1 and −1. When r = +1, it means there is perfect positive correlation between the variables. When r = −1, it means there is perfect negative correlation between the variables. However, in practice such values of r as +1, −1 and 0 are rare. We normally get values which lie between +1 and −1, such as +0.8, −0.26, etc. The coefficient of correlation describes not only the magnitude of correlation but also its direction. Thus r = +0.8 would mean that the correlation is positive, because the sign of r is +, and that the magnitude of correlation is 0.8; similarly, r = −0.26 means a low degree of negative correlation.
The above formula for computing Pearson's coefficient of correlation can be transformed to the following form, which is easier to apply:

r = Σxy / √(Σx² × Σy²)

where x = (X − X̄) and y = (Y − Ȳ).
It is obvious that while applying this formula we do not have to calculate separately the standard deviations of the X and Y series, as is required by formula (I). This greatly simplifies the task of calculating the correlation coefficient. The steps are:
1. Take the deviations of the X series from the mean of X and denote these deviations by x.
2. Square these deviations and obtain the total, i.e. Σx².
3. Take the deviations of the Y series from the mean of Y and denote these deviations by y.
4. Square these deviations and obtain the total, i.e. Σy².
5. Multiply the deviations of the X and Y series pairwise and obtain the total, i.e. Σxy.
6. Substitute the values of Σxy, Σx² and Σy² in the above formula.
Illustration: Calculate Karl Pearson's coefficient of correlation from the following data and interpret its value:
 
Roll no. of students 1 2 3 4 5
Marks in Accountancy 48 35 17 23 47
Marks in Statistics 45 20 40 25 45
 
Solution: let marks in Accountancy be denoted by X and marks in Statistics by Y.
 
Roll no.  X  x = (X − 34)  x²  Y  y = (Y − 35)  y²  xy
1  48  +14  196  45  +10  100  +140
2  35  +1  1  20  −15  225  −15
3  17  −17  289  40  +5  25  −85
4  23  −11  121  25  −10  100  +110
5  47  +13  169  45  +10  100  +130
   ΣX = 170  Σx = 0  Σx² = 776  ΣY = 175  Σy = 0  Σy² = 550  Σxy = 280
 
r = Σxy / √(Σx² × Σy²)
where x = (X − X̄), y = (Y − Ȳ)
X̄ = ΣX/N = 170/5 = 34;  Ȳ = ΣY/N = 175/5 = 35
Σxy = 280, Σx² = 776, Σy² = 550
r = 280 / √(776 × 550) = 280 / 653.3 = +0.43
The value of r indicates a moderate degree of positive correlation between marks in Accountancy and marks in Statistics.
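The result can be cross-checked with a short Python sketch (not part of the original assignment) that applies the deviation form of the formula, r = Σxy/√(Σx² × Σy²), to the marks in the illustration:

```python
# Pearson's r from deviations taken from the actual means.
from math import sqrt

def pearson_r(X, Y):
    n = len(X)
    mean_x = sum(X) / n
    mean_y = sum(Y) / n
    dx = [xi - mean_x for xi in X]              # deviations x = X - mean(X)
    dy = [yi - mean_y for yi in Y]              # deviations y = Y - mean(Y)
    sxy = sum(a * b for a, b in zip(dx, dy))    # Σxy
    sxx = sum(a * a for a in dx)                # Σx²
    syy = sum(b * b for b in dy)                # Σy²
    return sxy / sqrt(sxx * syy)

accountancy = [48, 35, 17, 23, 47]   # X series from the illustration
statistics_ = [45, 20, 40, 25, 45]   # Y series from the illustration
print(round(pearson_r(accountancy, statistics_), 2))  # prints 0.43
```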
Significance of the study of Correlation
The study of correlation is of immense use in practical life because of the following reasons:
1. Most of the variables show some kind of relationship. For example, there is a relationship between price and supply, income and expenditure, etc. With the help of correlation analysis we can measure in one figure the degree of relationship existing between the variables.
2.  Once we know that two variables are closely related, we can estimate the value of one variable given the value of the other. This is done with the help of regression analysis.
So far we have studied problems relating to one variable only; in practice we come across a large number of problems involving the use of two or more variables. If two quantities vary in such a way that movements in one are accompanied by movements in the other, these quantities are correlated. For example, there exists some relationship between the age of husband and the age of wife, the price of a commodity and the amount demanded, increase in rainfall up to a point and the number of cinemagoers, etc. The degree of relationship between the variables under consideration is measured through correlation analysis. The measure of correlation, called the correlation coefficient or correlation index, summarizes in one figure the direction and degree of correlation. Correlation analysis thus refers to the techniques used in measuring the closeness of the relationship between the variables.
The problem of analysing the relation between different series should be broken into three steps:

1. Determining whether a relation exists and, if it does, measuring it.

2. Testing whether it is significant.

3. Establishing the cause and effect relation, if any.


Here only the first aspect will be discussed; for the second aspect a reference may be made to Tests of Significance. The third aspect of the analysis, that of establishing the cause-and-effect relation, is difficult to treat statistically. An extremely high and significant correlation between the increase in smoking and the increase in lung cancer would not, by itself, prove that smoking causes lung cancer. The proof of a cause and effect relation can be developed only by means of an exhaustive study of the operative elements themselves.

It should be noted that the detection and analysis of correlation (i.e. covariation) between two statistical variables requires a relationship of some sort which associates the observations in pairs, one of each pair being a value of each of the two variables. In general, the pairing relationship may be of almost any nature, such as observations made at the same time or place, or over a period of time, or at different places.

The computation concerning the degree of closeness is based on the regression equation. However, it is possible to perform correlation analysis without actually having a regression equation.
ANOVA
Introduction to ANOVA
ANOVA is the abbreviation for Analysis of Variance. It was developed by Ronald Fisher, and its application was first published in the early 1920s. Since then it has proved instrumental in providing significant statistical conclusions in various research studies. ANOVA is a collection of statistical tools which help in drawing significant inferences between different groups of data. The ANOVA test determines the significance of the difference between the means of three or more groups. Before delving into Analysis of Variance, it is imperative to have a fundamental understanding of standard deviation, variance, and hypothesis testing. Standard deviation and variance are also known as measures of dispersion: they determine the variation of individual values from the mean within a data set.

Consider a data set which shows the marks of 5 students (s1 to s5) over a period of time. The students are the same in each group. The first group shows their marks in 10th standard (g1), the second their marks in 12th standard (g2), and the third their marks in the final year of graduation (g3). In tabular format, the data set has one column for each group and one row for each student.

Step 1: Calculate the mean of each group, i.e. 10th, 12th and graduation (Mean(g1), Mean(g2) and Mean(g3)).

Step 2: Calculate the standard deviation [SD(g1), SD(g2), SD(g3)] and the variance [Var(g1), Var(g2), Var(g3)] of each group.

Step 3: Calculate the grand mean, based on the mean values of groups 1, 2 and 3 (Grand Mean).

Step 4: Using the grand mean, calculate the difference of each data point (v1 to v15) from it. Square each difference and calculate the sum total. This is termed the Sum of Squares total.

Step 5: Calculate the Sum of Squares between. This is the sum of the squared deviations of each group mean from the grand mean, weighted by the group size, i.e. [n*(Mean(g1) − Grand Mean)² + n*(Mean(g2) − Grand Mean)² + n*(Mean(g3) − Grand Mean)²], where n is the number of samples in each group. Here the sample size is the 5 students in each group.

Step 6: Sum of Squared Errors = Sum of Squares total − Sum of Squares between. This value can be verified by an alternative method: the cumulative sum of squares of each group's values from its own mean should equal [Sum of Squares total − Sum of Squares between].

Step 7: Create the ANOVA table and calculate the degrees of freedom based on the number of groups (k = 3 groups) and the total sample size (n = 3 groups of 5 observations each, i.e. a sample size of 15). Calculate the mean squares and the F-ratio using the ANOVA table.

7(a) Mean Square between = Sum of Squares between / (k − 1); Mean Square error = Sum of Squared Errors / (n − k)

7(b) F Ratio = (Mean Square between) / (Mean Square error)
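The seven steps above can be traced in a compact Python sketch (not part of the original write-up). The three groups of marks below are illustrative placeholders, since the student-marks table itself is not reproduced here; any three equal-sized groups can be substituted.

```python
# Manual one-way ANOVA following Steps 1-7 above.
# The marks below are hypothetical placeholders for groups g1, g2 and g3.
groups = [
    [65, 70, 72, 68, 75],   # g1: 10th-standard marks (hypothetical)
    [60, 66, 71, 64, 69],   # g2: 12th-standard marks (hypothetical)
    [78, 82, 74, 80, 76],   # g3: graduation marks (hypothetical)
]

k = len(groups)                            # number of groups
n_total = sum(len(g) for g in groups)      # total number of observations

group_means = [sum(g) / len(g) for g in groups]               # Step 1
grand_mean = sum(x for g in groups for x in g) / n_total      # Step 3

# Step 4: total sum of squares (every value vs. the grand mean)
ss_total = sum((x - grand_mean) ** 2 for g in groups for x in g)

# Step 5: between-group sum of squares (group means vs. the grand mean)
ss_between = sum(len(g) * (m - grand_mean) ** 2
                 for g, m in zip(groups, group_means))

# Step 6: within-group (error) sum of squares
ss_error = ss_total - ss_between

# Step 7: degrees of freedom, mean squares and the F-ratio
ms_between = ss_between / (k - 1)
ms_error = ss_error / (n_total - k)
f_ratio = ms_between / ms_error

print(f"SS_between = {ss_between:.2f}, SS_error = {ss_error:.2f}, F = {f_ratio:.2f}")
```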

Oneway ANOVA practice problems

Group 1: 51  45  33  45  67

Group 2: 23  43  23  43  45

Group 3: 56  76  74  87  56
Solution
Sample means (x̄) for the groups: 48.2, 35.4, 69.8
Intermediate steps in calculating the group variances:
Group 1:
value mean deviations sq deviations
1 51 48.2 2.8 7.84
2 45 48.2 -3.2 10.24
3 33 48.2 -15.2 231.04
4 45 48.2 -3.2 10.24
5 67 48.2 18.8 353.44

Group 2:
value mean deviations sq deviations
1 23 35.4 -12.4 153.76
2 43 35.4 7.6 57.76
3 23 35.4 -12.4 153.76
4 43 35.4 7.6 57.76
5 45 35.4 9.6 92.16

Group 3:
value mean deviations sq deviations
1 56 69.8 -13.8 190.44
2 76 69.8 6.2 38.44
3 74 69.8 4.2 17.64
4 87 69.8 17.2 295.84
5 56 69.8 -13.8 190.44
Sum of squared deviations from the mean (SS) for the groups: 612.8, 515.2, 732.8
Var1 = 612.8 / (5 − 1) = 153.2

Var2 = 515.2 / (5 − 1) = 128.8

Var3 = 732.8 / (5 − 1) = 183.2

MS_error = (153.2 + 128.8 + 183.2) / 3 = 155.07

Note: this is just the average within-group variance; it is not sensitive to group mean differences!

Calculating the remaining error (or within) terms for the ANOVA table:


df_error = 15 − 3 = 12

SS_error = (155.07)(15 − 3) = 1860.8

Intermediate steps in calculating the variance of the sample means:

Grand mean (x̄_grand) = (48.2 + 35.4 + 69.8) / 3 = 51.13

group mean grand mean deviations sq deviations

48.2 51.13 -2.93 8.58

35.4 51.13 -15.73 247.43

69.8 51.13 18.67 348.57

Sum of squares (SS_means) = 604.58

Var_means = 604.58 / (3 − 1) = 302.29

MS_between = (302.29)(5) = 1511.45

Note: this method of estimating the variance IS sensitive to group mean differences!

Calculating the remaining between (or group) terms of the ANOVA table:

df_groups = 3 − 1 = 2

SS_group = (1511.45)(3 − 1) = 3022.9

Test statistic and critical value

F = 1511.45 / 155.07 = 9.75

F_critical(2, 12) = 3.89

Decision: reject H0

ANOVA table
Source   SS       df   MS

Group    3022.9    2   1511.45

Error    1860.8   12   155.07

Total    4883.7   14
Effect size
η² = 3022.9 / 4883.7 = 0.62
APA writeup
F(2, 12) = 9.75, p < 0.05, η² = 0.62
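For reference, the worked solution above can be cross-checked with SciPy's built-in one-way ANOVA (this check is not part of the original write-up):

```python
# Cross-check of the one-way ANOVA practice problem using scipy.stats.f_oneway.
from scipy import stats

group1 = [51, 45, 33, 45, 67]
group2 = [23, 43, 23, 43, 45]
group3 = [56, 76, 74, 87, 56]

f_stat, p_value = stats.f_oneway(group1, group2, group3)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")   # F ≈ 9.75, p < 0.05, matching the table
```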

Regression

Regression analysis is the statistical tool for estimating relationships among variables. In regression analysis, the focus is primarily on determining the relationship between the dependent variable and one or more independent variables. The independent variables are also called 'predictors'. Through regression analysis, the investigator tries to ascertain the causal effect of one variable on another, for example the decrease in demand due to an increase in price. In addition, the 'statistical significance' of the estimated relationships is also assessed. Multiple regression techniques have been a key tool in the field of econometrics and have a wide range of applications, such as evaluating trends and making forecasts. Regression analysis is also used to generate insights into customer behaviour and to estimate parameters for profitability.

Regression Model

The general regression model can be represented as

Y = f(X, β)

Where, Y is the dependent variable,

X is the independent variable, and

β is the unknown parameter (constant) to be estimated.

Our Statistics assignment help experts will define the regression model based on the requirements mentioned in your regression analysis assignment or homework. They will assist you in selecting the right independent variables, which would be statistically significant and can explain the variation in the dependent variable. Through our online regression analysis assignment help, you can avail assistance to build multiple types of regression model, such as simple linear, multiple linear, logit and binary regression models. Our online regression assignment help service will enable you to learn complex academic concepts related to regression modelling.

Types Of Regression Analysis

Based on the relationships between the dependent and predictor variables, various forms of regression models can be defined, such as simple linear regression, multiple linear regression, logistic regression, polynomial regression, etc. All of our online Statistics experts are well versed in these different types of regression models and can provide quality regression analysis assistance online on a 24*7 basis. Our regression analysis writing service is one of the best in the industry thanks to our well-qualified, experienced team of professional regression analysis expert tutors. Our online experts have so far provided regression analysis homework help to numerous students across the UK, USA and Australia. Our regression analysis experts demonstrate the differences among multiple techniques through the examples and justifications below.

1. Linear Regression

Linear regression is one of the most widely known and applied techniques. It also has the highest number of business as well as academic applications. In the linear regression technique, the dependent variable is continuous, whereas the predictor variable(s) can be continuous or discrete. It establishes the relationship between the dependent variable (Y) and one or more predictor variables (X) using a best fit line whose nature is linear. The best fit line is also called the regression line.
The simple linear regression model can be represented as

Y = β0 + β1X + e

Where, Y is the dependent variable,

β0 is the intercept or constant,

β1 is the slope, and

e is the error term.

Simple linear regression examines the relationship between one dependent variable and one predictor (independent) variable. If the model includes more than one predictor or independent variable, it is called multiple linear regression.

Our Statistics assignment help experts can perform linear regression analysis of any nature and can prepare a detailed analysis report with relevant findings. If you seek any linear regression analysis assignment help, please email your assignment to us.

Ordinary Least Squares (OLS) regression: In the ordinary least squares technique, the equation is estimated by choosing the parameters so that the sum of squared vertical distances from each data point to the regression line is as small as possible. Certain assumptions must hold for OLS to provide the most precise results, such as:

- The regression model is linear in its parameters

- The residuals are normally distributed and have a mean of zero
Email us your assignment and avail the quality, accurate Ordinary Least squares (OLS)
regression homework help.
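As an illustration of the OLS idea described above, here is a minimal Python sketch (with invented data points) that computes the intercept and slope of a simple linear regression from the closed-form least squares formulas:

```python
# Simple linear regression Y = b0 + b1*X fitted by ordinary least squares.
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # predictor values (hypothetical)
Y = np.array([2.1, 4.3, 6.2, 8.1, 9.8])   # response values (hypothetical)

# Closed-form OLS estimates: b1 and b0 minimise the sum of squared residuals
# sum((Y - b0 - b1*X)**2).
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()

residuals = Y - (b0 + b1 * X)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
print(f"mean residual = {residuals.mean():.3e}")  # ~0, as OLS with an intercept guarantees
```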

2. Logistic Regression

Logistic regression, also called the logit model, measures the relationship between a categorical dependent variable and one or more predictor variables. The model estimates probabilities using a logistic function, which is the cumulative logistic distribution function. According to our logistic regression assignment help experts, logit regression can be treated as a special case of the generalized linear model and is thus analogous to linear regression.
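A brief sketch of a logit model, using scikit-learn's LogisticRegression on invented study-hours/pass data, may help illustrate how a binary categorical outcome is modelled through the logistic function:

```python
# Logistic (logit) regression for a binary outcome with one predictor.
import numpy as np
from sklearn.linear_model import LogisticRegression

hours = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])   # predictor (hypothetical)
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])                  # binary outcome (hypothetical)

model = LogisticRegression().fit(hours, passed)

# The fitted model returns probabilities through the logistic function
# P(pass) = 1 / (1 + exp(-(b0 + b1*hours))).
print(model.predict_proba([[4.5]]))   # [P(fail), P(pass)] at 4.5 study hours
```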

3. Polynomial Regression

Polynomial regression is a non-linear type of regression. In the polynomial regression model, the relationship between the dependent and the predictor variables is estimated using an nth-degree polynomial. These regression models are usually fitted using the method of least squares.
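The following short sketch (with invented data) fits a 2nd-degree polynomial regression by least squares using numpy.polyfit, in line with the description above:

```python
# Polynomial (2nd-degree) regression fitted by least squares.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.9, 7.2, 13.1, 21.0, 30.8])   # roughly quadratic (hypothetical)

coeffs = np.polyfit(x, y, deg=2)    # [a2, a1, a0] for y = a2*x^2 + a1*x + a0
fitted = np.polyval(coeffs, x)      # fitted values from the estimated curve

print("coefficients:", np.round(coeffs, 3))
print("max abs residual:", np.abs(y - fitted).max())
```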
