Short Term Training Programme On Data Analytics Using SPSS and RCMDR
[Scatter-plot illustrations of positive correlation: R = +0.20 and R = +0.40]
Karl Pearson’s coefficient of Correlation
Example: Correlation of Gestational Age and Birth Weight
A small study is conducted involving 17 infants to investigate the association between gestational age at birth, measured
in weeks, and birth weight, measured in grams.
We wish to estimate the association between gestational age and infant birth weight.
In this example, birth weight is the dependent variable and gestational age is the independent variable. Thus Y = birth weight and
X = gestational age.
$$ r = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{(n-1)\, s_x s_y} = 0.82 $$
Conclusion: The sample’s correlation coefficient indicates a strong positive correlation between Gestational Age and Birth Weight.
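Although the workshop tools are SPSS and R Commander, the computation behind r can be sketched in plain Python. The data below are hypothetical illustrations, not the study's actual 17 measurements:

```python
import math

# Hypothetical gestational ages (weeks) and birth weights (grams);
# NOT the actual study data, just an illustration.
ages    = [34.7, 36.0, 29.3, 40.1, 35.7, 42.4]
weights = [1895, 2030, 1440, 2835, 3090, 3827]

def pearson_r(x, y):
    """r = sum((Xi - Xbar)(Yi - Ybar)) / ((n - 1) * sx * sy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return cov / (sx * sy)

r = pearson_r(ages, weights)
```

The value always lies between -1 and +1; values near +1 indicate a strong direct relationship, as in the gestational-age example.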
Significance Test
To test whether the association is real or might merely have arisen by chance, use the t test in the following calculation:

$$ t = r\sqrt{\frac{n-2}{1-r^2}} $$

$$ t = 0.82\sqrt{\frac{17-2}{1-0.82^2}} \approx 5.55 $$

Consulting the t table at 15 degrees of freedom and $\alpha = 0.05$, we find the critical value t = 1.753. Since 5.55 far exceeds this, the value of Pearson's correlation coefficient in this case may be regarded as highly significant.
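The significance calculation, t = r·√((n − 2)/(1 − r²)), can be sketched in Python as:

```python
import math

def t_statistic(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

t = t_statistic(0.82, 17)
print(round(t, 2))  # 5.55
```

The resulting t is then compared against the tabled critical value for the chosen significance level at n − 2 degrees of freedom.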
Spearman’s Rank Correlation
When the data are not available in numerical form for correlation analysis, but the information is sufficient to rank the observations as first, second, third, and so forth, we often use the rank correlation method and work out the coefficient of rank correlation.
In fact, the rank correlation coefficient is a measure of the correlation that exists between two sets of ranks. In other words, it is a measure of association based on the ranks of the observations rather than on the numerical values of the data. It was developed by the famous statistician Charles Spearman in the early 1900s, and as such it is also known as Spearman's rank correlation coefficient.
Charles Spearman’s coefficient of correlation (or rank correlation)
It is the technique of determining the degree of correlation between two variables in the case of ordinal data, where ranks are given to the different values of the variables. The main objective of this coefficient is to determine the extent to which the two sets of rankings are similar or dissimilar. This coefficient is determined as under:

$$ \rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} $$

where $d_i$ is the difference between the ranks of the $i$-th pair of observations and $n$ is the number of pairs.
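Spearman's coefficient can be sketched in Python. The helper below uses tie-averaged ranks and the textbook formula ρ = 1 − 6Σd²/(n(n² − 1)), which is exact only when there are no ties:

```python
def rank(values):
    """Assign ranks 1..n, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a block of equal values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average rank for the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), d = rank difference."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

Two series ranked in the same order give ρ = +1; exactly reversed orders give ρ = −1.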
Simple Regression Analysis
Regression is the determination of a statistical relationship between two or more variables. In simple regression we have only two variables: one (the independent variable) is treated as the cause of the behavior of the other (the dependent variable).
Regression can only interpret what exists physically, i.e., there must be a physical way in which the independent variable X can affect the dependent variable Y. The basic relationship between X and Y is given by:

$$ Y = a + bX $$

This equation is known as the regression equation of Y on X (it also represents the regression line of Y on X when drawn on a graph), which means that each unit change in X produces a change of b in Y, which is positive for direct and negative for inverse relationships.
Simple Regression Analysis
For fitting a regression equation of the type $Y = a + bX$ to the given values of the X and Y variables, we can find the values of the two constants, viz., a and b, by using the following two normal equations:

$$ \sum Y = na + b\sum X $$
$$ \sum XY = a\sum X + b\sum X^2 $$
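Solving the two normal equations ΣY = na + bΣX and ΣXY = aΣX + bΣX² for a and b can be sketched in Python as:

```python
def fit_line(x, y):
    """Solve the two normal equations for the intercept a and slope b:
       sum(Y)  = n*a + b*sum(X)
       sum(XY) = a*sum(X) + b*sum(X^2)
    """
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # slope
    a = (sy - b * sx) / n                            # intercept
    return a, b

a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```

For the points above, which lie exactly on Y = 1 + 2X, the normal equations recover a = 1 and b = 2.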
Multiple Correlation & Regression
In multiple regression analysis, the regression coefficients (viz., b1, b2) become less reliable as the degree of correlation between the independent variables (viz., X1, X2) increases. If there is a high degree of correlation between the independent variables, we have the problem commonly described as multicollinearity.
In such a situation we should use only one of the correlated independent variables to make our estimate. In fact, adding a second variable, say X2, that is correlated with the first variable, say X1, distorts the values of the regression coefficients. Nevertheless, a prediction for the dependent variable can still be made even when multicollinearity is present, but in such a situation enough care should be taken in selecting the independent variables so as to ensure that multicollinearity is reduced to the minimum.
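A standard diagnostic for multicollinearity (not covered above, but widely used) is the variance inflation factor; with just two predictors it reduces to 1/(1 − r²), where r is the correlation between them. A minimal sketch:

```python
import math

def pearson_r(x, y):
    """Sample correlation between two predictor columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

def vif(x1, x2):
    """Variance inflation factor for two predictors: 1 / (1 - r^2).
    Values well above 5-10 are a common warning sign of multicollinearity."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)
```

A VIF near 1 means the predictors carry nearly independent information; a large VIF signals that one of them should be dropped, as the text recommends.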
THANK YOU ☺