
STATISTICAL TOOLS

Simple Correlation Analysis


Simple correlation analysis is a technique used to describe the relationship or
association between two variables.

The following statistical tools could be used.

TYPE OR LEVEL OF DATA                           STATISTICAL TOOL
TO BE CORRELATED

1. Both variables are measured at least         Pearson Product Moment Correlation
   in an interval scale                         Coefficient (Pearson r)

2. Variables of interest are measured in        Spearman's rs (Spearman rank
   an ordinal scale                             correlation coefficient)

3. Variables of interest are nominal            Guttman's Lambda Formula
   variables                                    (Guttman's Coefficient of
                                                Predictability)

4. One variable is interval and the other       Point Biserial Coefficient of
   variable is a dichotomous nominal variable   Correlation Formula

5. An interval variable and any nominal         Correlation Ratio Formula
   variable

Pearson Product Moment Correlation

When to use Pearson r:

The variables to be tested are both measured at the interval or ratio level.

Note: To obtain a more reliable and acceptable result, the number of paired observations
to be tested should be at least 30.

Properties of the Correlation Coefficient (r)

1. It is a unitless quantity.
2. It is always some number between -1 and +1, inclusive.
3. If r = 1, then all the points lie on a straight line and the relationship is said to
be a perfect positive relationship. If r = -1, then all the points lie on a straight
line but in an inverse relation, and this relationship is said to be a perfect
negative relationship. If r = 0, then there is no linear relationship between the two
variables and they are said to be independent.
4. The magnitude of r is simply a measure of how closely the points cluster about a
certain trend line known as the regression line.

Degree of correlation:

Value of r       Degree of Correlation

1.0              Perfect
.81 - .99        Very high
.61 - .80        Marked, substantial
.41 - .60        Moderate
.21 - .40        Definite but small
.01 - .20        Almost negligible
0                No correlation

Computational Format:
Consider the scores obtained in Math (X) and Statistics (Y) subjects of ten
Marketing Management students.
________________________________________________________________________
Observation    Math Score (X)    Stat Score (Y)    X²    Y²    XY
1 5 2 _____ _____ ______
2 8 7 _____ _____ ______
3 10 8 _____ _____ ______
4 12 9 _____ _____ ______
5 12 10 _____ _____ ______
6 14 12 _____ _____ ______
7 15 14 _____ _____ ______
8 16 10 _____ _____ ______
9 18 16 _____ _____ ______
10 20 12 _____ _____ ______
SUM 130 100 1878 1138 1440
_______________________________________________________________________

Summation X (∑X) : _____
Summation Y (∑Y) : _____
Summation X square (∑X²) : _____
Summation Y square (∑Y²) : _____
Summation XY (∑XY) : ______

Formula:
            n(∑XY) − (∑X)(∑Y)
r = ─────────────────────────────────────
    √{[n(∑X²) − (∑X)²][n(∑Y²) − (∑Y)²]}

where:
n = sample size or the number of paired observations
∑X = the X values summed
∑Y = the Y values summed
∑X² = each X value squared, then the squares summed
(∑X)² = the X values summed, then the sum squared
∑Y² = each Y value squared, then the squares summed
(∑Y)² = the Y values summed, then the sum squared
∑XY = the sum of the products of X and Y
r = correlation coefficient, or the degree of relationship between X and Y
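For illustration, the Pearson r for the ten paired scores above can be computed directly from these sums; the following Python sketch (not part of the original worksheet) fills in the blanks:

```python
import math

# Paired Math (X) and Statistics (Y) scores of the ten students
X = [5, 8, 10, 12, 12, 14, 15, 16, 18, 20]
Y = [2, 7, 8, 9, 10, 12, 14, 10, 16, 12]
n = len(X)

sum_x = sum(X)                              # 130
sum_y = sum(Y)                              # 100
sum_x2 = sum(x * x for x in X)              # 1878
sum_y2 = sum(y * y for y in Y)              # 1138
sum_xy = sum(x * y for x, y in zip(X, Y))   # 1440

# r = [n(ΣXY) − (ΣX)(ΣY)] / √{[nΣX² − (ΣX)²][nΣY² − (ΣY)²]}
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(round(r, 4))  # 0.8692
```

The resulting r of about .87 falls in the "very high" range of the degree-of-correlation table.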

Testing the significance of r:

To determine whether the observed degree of relationship is statistically
significant, a test of significance is necessary.

Example:

The significance of the computed r is tested against the null hypothesis as
follows:

Ho: r = 0 (There is no significant relationship between the Math scores and the
Statistics scores of students).

Ha: r ≠ 0 (There is a significant relationship between the Math scores and the
Statistics scores of students).

Correlation test of hypothesis

Formula:
        r √(n − 2)
t = ─────────────────
        √(1 − r²)

Computation:

Using Table B, compare the computed t with the tabular t at df = n − 2 = 10 − 2 = 8,
which is 2.306 at the .05 level and 3.355 at the .01 level.

Decision: The null hypothesis is rejected because the computed t-value exceeded the
tabular value.

Conclusion: The Math scores are significantly correlated with the Statistics scores
obtained by the students.
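As a check on the decision above, the t-value can be computed from the r obtained earlier (about .8692) in a short Python sketch:

```python
import math

# r as computed from the ten paired Math and Statistics scores; n = 10 pairs
r, n = 0.8692, 10

# t = r√(n − 2) / √(1 − r²)
t = (r * math.sqrt(n - 2)) / math.sqrt(1 - r ** 2)
df = n - 2  # 8

print(round(t, 2))  # about 4.97, exceeding the tabular t of 3.355 at .01 (df = 8)
```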

SPEARMAN RANK CORRELATION COEFFICIENT ( rs)

If the variables of interest are measured in an ordinal scale, the Spearman rank
correlation coefficient (rs) may be used instead of the Pearson r.

Steps in computing rs:

Step1. Rank the scores in distribution X giving the lowest a rank of 1 and the
highest a rank of n. Repeat the process for the scores in distribution Y.

Step 2. Obtain the difference (di) between the two sets of ranks.

Step 3. Square each difference and then take the sum of the squares of di.

Step 4. Compute using the following formula:


             6 ∑di²
rs = 1 − ─────────────
           n(n² − 1)

where:
rs = Spearman's rho
di = difference between the two sets of ranks
n = number of paired observations

Step 5. If the proportion of ties in either the X or the Y observations is large use
the formula.

         ∑x² + ∑y² − ∑di²
rs = ──────────────────────
        2 √(∑x²)(∑y²)

where:

        n(n² − 1)
∑x² = ─────────── − ∑Tx
           12

        tx³ − tx
Tx = ────────────
          12

        n(n² − 1)
∑y² = ─────────── − ∑Ty
           12

        ty³ − ty
Ty = ────────────
          12

tx = number of observations in X tied at a given rank

ty = number of observations in Y tied at a given rank

Step 6. To test whether the observed rs value indicates an association between the
variables, the following may be applied:
a) For n from 4 to 30, use the critical values of rs at the .05 and .01 levels
of significance in Table P.
b) For n > 30, the significance of the observed rs under the null hypothesis
can be determined with a t-test, using the formula:

         rs √(n − 2)
   t = ────────────────
         √(1 − rs²)

The sampling distribution of the test statistic is the Student t distribution with
n − 2 degrees of freedom.

Computation Format:

Observation    Math    Rank X    Stat    Rank Y    di    di²

1 18 ______ 24 ______ _____ _____


2 17 ______ 28 ______ _____ _____
3 14 ______ 30 ______ _____ _____
4 13 ______ 26 ______ ______ _____
5 12 ______ 22 ______ ______ _____
6 10 ______ 18 ______ ______ _____
7 8 ______ 15 ______ ______ _____
________________________________________________________________________
∑di² = _______

Computation:

The test of significance of the null hypothesis using the computed rs in the
example above is:

Ho: rs = 0 (There is no significant relationship between the Math scores and the
Statistics scores of students).

Ha: rs ≠ 0 (There is a significant relationship between the Math scores and the
Statistics scores of students).

Computation:

Decision:

Conclusion:
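The blanks above can be checked with a short Python sketch that ranks the scores (lowest = 1) and applies the rs formula; these data contain no ties, so the simple formula suffices:

```python
# Math and Statistics scores of the seven observations above
math_scores = [18, 17, 14, 13, 12, 10, 8]
stat_scores = [24, 28, 30, 26, 22, 18, 15]
n = len(math_scores)

def ranks(scores):
    # Lowest score gets rank 1, highest gets rank n (valid here: no ties)
    order = sorted(scores)
    return [order.index(s) + 1 for s in scores]

rx, ry = ranks(math_scores), ranks(stat_scores)
d2 = [(a - b) ** 2 for a, b in zip(rx, ry)]   # di² values, summing to 14

rs = 1 - (6 * sum(d2)) / (n * (n ** 2 - 1))
print(rs)  # 0.75
```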

CORRELATION BETWEEN INTERVAL AND DICHOTOMOUS NOMINAL VARIABLES

The Use of Point Biserial Coefficient of Correlation Formula

Formula:
            ∑f(∑faX) − (∑fa)(∑fX)
rpb = ─────────────────────────────────────
      √{(∑fa)(∑fb)[∑f(∑fX²) − (∑fX)²]}

Where: X = the interval variable
fa = frequency of one category of the dichotomous nominal variable
fb = frequency of the other category of the dichotomous nominal variable
f = total frequency of the dichotomous nominal variable (fa + fb)

Example: Determine the degree of relationship between sex and intelligence.

IQ Score Number of Males Number of Females


95 8 3
90 3 2
85 1 4
80 2 0
75 4 3

Solution:

(X)    (fa)    (fb)    (f)    (fX)    (fX²)    (faX)

95 8 3 11 ______ _______ _______


90 3 2 5 ______ _______ _______
85 1 4 5 ______ _______ _______
80 2 0 2 ______ _______ _______
75 4 3 7 ______ _______ _______

Total 18 12 30 2,605 228,075 1,575

Computation:

Interpretation:

There is a very small positive relationship between sex and intelligence.
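A Python sketch of the point biserial computation from the frequency table above, for checking the blanks:

```python
import math

# Each row: IQ score X, male frequency fa, female frequency fb
rows = [(95, 8, 3), (90, 3, 2), (85, 1, 4), (80, 2, 0), (75, 4, 3)]

sum_fa = sum(fa for _, fa, _ in rows)                     # 18
sum_fb = sum(fb for _, _, fb in rows)                     # 12
sum_f = sum_fa + sum_fb                                   # 30
sum_fx = sum((fa + fb) * x for x, fa, fb in rows)         # 2,605
sum_fx2 = sum((fa + fb) * x * x for x, fa, fb in rows)    # 228,075
sum_fax = sum(fa * x for x, fa, _ in rows)                # 1,575

num = sum_f * sum_fax - sum_fa * sum_fx
den = math.sqrt(sum_fa * sum_fb * (sum_f * sum_fx2 - sum_fx ** 2))
r_pb = num / den
print(round(r_pb, 3))  # about 0.103: a very small positive relationship
```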



CORRELATION BETWEEN INTERVAL AND ANY NOMINAL VARIABLES

Correlation Ratio Formula

        ∑NiȲi² − NtȲt²
E² = ─────────────────────
        ∑Yij² − NtȲt²

Where: Ni = number of samples per category
Ȳi = average obtained per category
Nt = total number of samples
Ȳt = over-all average
Yij = individual item

Example: Measure the degree of relationship between the civil status and the annual
salary (expressed in thousand pesos) of the given samples.

Single 65 83 81 69 73 89 76 60 N= 8
Married 70 67 90 84 78 N= 5
Widowed 89 64 78 N= 3

Solution:
N1 = 8 N2 = 5 N3 = 3 Nt = 16

Ÿ1 = 596/8 Ÿ2 = 389/5 Ÿ3 = 231/3 Ÿt = 1,216/16

= 74.5 = 77.8 = 77.0 = 76.0

∑Yij² = (65)² + (83)² + (81)² + . . . + (89)² + (64)² + (78)²

      = 93,792

Computation:

Interpretation:

There is a very small positive relationship between the civil status and the annual
salary (expressed in thousand pesos) of the given samples.
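A Python sketch of the correlation ratio computation for the data above:

```python
groups = {
    "Single":  [65, 83, 81, 69, 73, 89, 76, 60],
    "Married": [70, 67, 90, 84, 78],
    "Widowed": [89, 64, 78],
}

all_vals = [v for g in groups.values() for v in g]
n_t = len(all_vals)                        # 16
mean_t = sum(all_vals) / n_t               # 76.0

# ΣNiȲi²: each category size times its squared mean
sum_ni_yi2 = sum(len(g) * (sum(g) / len(g)) ** 2 for g in groups.values())
sum_yij2 = sum(v * v for v in all_vals)    # 93,792

e2 = (sum_ni_yi2 - n_t * mean_t ** 2) / (sum_yij2 - n_t * mean_t ** 2)
e = e2 ** 0.5
print(round(e2, 4), round(e, 3))  # about 0.027 and 0.164: very small relationship
```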

CORRELATION BETWEEN NOMINAL VARIABLES

The Use of Guttman's Lambda Formula (also known as Guttman's Coefficient of
Predictability)

Formula:

      ∑FR − CT              ∑FC − RT
λ = ────────────    and   λ = ────────────
      N − CT                N − RT

where: ∑FR = the sum of the biggest cell frequency in each row
∑FC = the sum of the biggest cell frequency in each column
CT = the biggest column total
RT = the biggest row total
N = total frequency

(The first form treats the row variable as the predictor of the column variable;
the second treats the column variable as the predictor of the row variable.)
Example:

Measure the degree of relationship between female teachers' level of knowledge and
their extent of adopting control measures for varicosity.

                             Extent of Adopting Control Measures

Knowledge Level              Always   Frequent   Sometimes   Never   Total

Very knowledgeable             18        9          10         0      37
Knowledgeable                   6       15           6         3      30
Slightly knowledgeable         10        8           9         3      30
Not knowledgeable               0        0           3         0       3

TOTAL                          34       32          28         6     100

Computation:

Conclusion:

The obtained lambda coefficient of _____ indicates that when the extent of adoption
of control measures for varicosity is treated as the independent variable, the error
reduced in the prediction (increasing its accuracy) is _____ percent. Meanwhile, the
obtained lambda coefficient of _____ indicates that when knowledge level is treated as
the independent variable, the error minimized in the prediction (increasing its
accuracy) is _____ percent. These results show that the extent of adoption of the
control measures for varicosity predicts knowledge level more accurately than
knowledge level predicts extent of practice.
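A Python sketch of the lambda computations for the table above, pairing the sum of the row maxima with the biggest column total (and vice versa), following the standard definition of the coefficient of predictability:

```python
# Contingency table: rows = knowledge level, columns = extent of adoption
table = [
    [18, 9, 10, 0],   # Very knowledgeable
    [6, 15, 6, 3],    # Knowledgeable
    [10, 8, 9, 3],    # Slightly knowledgeable
    [0, 0, 3, 0],     # Not knowledgeable
]

N = sum(sum(row) for row in table)                # 100
row_totals = [sum(row) for row in table]          # 37, 30, 30, 3
col_totals = [sum(col) for col in zip(*table)]    # 34, 32, 28, 6

sum_fr = sum(max(row) for row in table)           # biggest cell in each row, summed
sum_fc = sum(max(col) for col in zip(*table))     # biggest cell in each column, summed
RT, CT = max(row_totals), max(col_totals)

lambda_col_dep = (sum_fr - CT) / (N - CT)  # knowledge level predicting extent
lambda_row_dep = (sum_fc - RT) / (N - RT)  # extent predicting knowledge level
print(round(lambda_col_dep, 3), round(lambda_row_dep, 3))
```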

CHI – SQUARE TEST

Characteristics of the Chi-square

The statistic used in the analysis of enumeration data is known as the chi-square test.
The chi-square test can be used for one variable or two variables with two or more
categories each. It reflects discrepancies between the observed and the expected or
theoretical frequencies of individuals, objects, or events falling in the various categories.

Applications of the Chi-square

1. Test of goodness of fit
2. Test of homogeneity (two or more samples, one criterion variable)
3. Test of independence (one sample, two criterion variables)

Steps in Using Chi-square


These are the steps in using the chi-square test for k independent samples.
1. State the null hypothesis. The null hypothesis may be stated in any of these ways:
a. The sample distribution conforms with the hypothetical or theoretical
distribution.
b. The actual observed proportion is not significantly different from the ideal
or expected proportion.
c. One variable does not depend on the other variable, or the two variables are
independent of each other.
2. Set the level of significance, also known as alpha (α).
3. Cast the observed frequencies in a k x r contingency table, using the k columns
for the groups. Determine the expected frequency for each cell by finding the
product of the marginal totals common to the cell and dividing this product by N.
(N is the sum of each group of marginal totals; it represents the total number of
independent observations.)

Determine the degrees of freedom using the formula:

For one-way classification:
df = number of categories − 1

For two-way classification:
df = (k − 1)(r − 1)

4. Locate the tabular value of the chi-square in the chi-square distribution table by
getting the value where the desired level of significance and the computed degree
of freedom intersect.
5. Compute for the chi-square value by using the formula:

           (fo − fe)²
X² = ∑ ──────────────
              fe

where: fo = observed number of cases


fe = expected number of cases

How to compute fe:

fe = (row total × column total) / grand total

6. State the conclusion arrived at by the acceptance or rejection of the null hypothesis.

If the computed value of the chi-square is less than the tabular value, the null
hypothesis is accepted. If the computed value of the chi-square is greater than the tabular
value, the null hypothesis is rejected.

Test of Goodness of Fit

A chi-square goodness of fit test is performed in order to determine if a set of
obtained data corresponds to some theoretical distribution.

Example:

In a bowling tournament, is the number of wins related to lane position?

Data:
Lane Position    No. of Wins (fo)    Expected frequency (fe)    (fo − fe)²/fe

1 29 18 _______
2 19 18 _______
3 18 18 _______
4 25 18 _______
5 17 18 _______
6 10 18 _______
7 15 18 _______
8 11 18 _______
X2 = _______
N = 144

How the fe was computed:

fe = total number of wins / number of lane positions

   = 144 / 8 = 18
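A Python sketch of the goodness-of-fit computation for the bowling data:

```python
# Observed wins per lane position; each lane is expected to win equally often
fo = [29, 19, 18, 25, 17, 10, 15, 11]
N = sum(fo)        # 144
fe = N / len(fo)   # 18

chi2 = sum((o - fe) ** 2 / fe for o in fo)
print(round(chi2, 2))  # about 16.33; df = 8 - 1 = 7
```

The computed chi-square of about 16.33 exceeds the .05 tabular value of about 14.07 at df = 7, so the null hypothesis of equal wins per lane would be rejected.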

Chi-square Test of Homogeneity (Two or More Samples, One Criterion Variable)

The chi-square test is frequently used to determine if two or more populations are
homogeneous. This means that the data distributions are similar with respect to a
particular criterion variable.

The samples drawn from each population in a test of homogeneity need not be of
equal size. However, it is recommended that they be equal whenever possible, to make
the calculations easier.

Example: In an experiment involving two groups of samples, 100 males and 100 females,
subjects were asked to state a preference between polished rice and brown rice. Do the
preferences of the two groups differ?

Rice Preference

Sex Polished Brown Total


Males 69 31 100
Females 48 52 100
Total 117 83 200

Computation:
Rice Preference

                Polished              Brown
Sex           fo        fe         fo        fe        Total

Males         69    ________       31    ________       100

Females       48    ________       52    ________       100

Total        117    ________       83    ________       200

Solution: Follow the steps in using the chi-square test.


1. Ho: The preferences of the two groups do not differ significantly.
Ha: The preferences of the two groups differ significantly.
2. Use both 5% and 1% level of significance.
3. Since the given data is a two –way classification, the degree of freedom is
computed as :
df = ( k-1 ) ( r – 1 )
df = ( 2 – 1 ) ( 2 – 1 )
df = 1

4. The tabular value at 1 df and 5% and 1 % level is 3.84 and 6.64, respectively
( Table E ).
5. For two-way classification, get the expected frequencies using the following
formula:

   fe = (row total × column total) / grand total

a. Males -Polished rice ( fe) = 117 x 100 / 200 = _________


b. Males – Brown Rice (fe) = 83 x 100 / 200 = _________
c. Females – Polished rice (fe) = 117 x 100 / 200 = ________
d. Females – Brown rice ( fe) = 83 x 100 / 200 = _________

Then substitute the values into the formula, with a correction for continuity, since we
have a 2 x 2 table with 1 degree of freedom:

           (|fo − fe| − .5)²
X² = ∑ ──────────────────────
               fe
Computation:

6. Since ______ (tabular value) < ______ (computed value), we reject the null
hypothesis. The preferences of the two groups differ significantly.
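A Python sketch of the homogeneity computation with the continuity correction:

```python
# 2 x 2 table: rows = sex (males, females), columns = preference (polished, brown)
fo = [[69, 31],
      [48, 52]]

row_totals = [sum(r) for r in fo]           # 100, 100
col_totals = [sum(c) for c in zip(*fo)]     # 117, 83
N = sum(row_totals)                         # 200

chi2 = 0.0
for i in range(2):
    for j in range(2):
        fe = row_totals[i] * col_totals[j] / N
        chi2 += (abs(fo[i][j] - fe) - 0.5) ** 2 / fe  # Yates continuity correction

print(round(chi2, 2))  # about 8.24, exceeding both 3.84 (.05) and 6.64 (.01) at df = 1
```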

TEST OF INDEPENDENCE (ONE SAMPLE, TWO CRITERION VARIABLES)

The one-sample test of independence differs from the test of homogeneity in that
for each sample member there are measures on two variables. The sample used in a test
of independence consists of members randomly drawn from the same population. This
test is used to see whether measures taken on two criterion variables are independent of
or associated with one another in a given population.
The calculation of a chi-square test of independence is similar to that made with a
test of homogeneity.

Example:

One hundred high school students, aged 13-16, were given a test to determine
their levels of knowledge on the effects of climate change. Both age and levels of
knowledge were classified as shown below.

LEVELS OF KNOWLEDGE
AGE High Average Low TOTAL
______________________________________________________________________
13 - 14 23 20 17 60
15 - 16 18 12 10 40
TOTAL 41 32 27 100

Solution:

Follow the steps in using the chi-square test.


1. Ho: The level of knowledge of the students does not depend on their age.
   Ha: The level of knowledge of the students depends on their age.
2. Set 5% level of significance.
3. Since the given data is a two-way classification, the degree of freedom is
computed as:
df = ( k -1 ) ( r – 1 )
df = ( 2 – 1 ) ( 3 – 1 )
df = 2
4. For two-way classification get the expected frequencies using the following
formula:

fe = (row total) (column total)


grand total

Compute the fe as follows:

Age 13-14 – High score = (41 x 60 ) / 100 = __________


Age 13-14 – Average score = ( 32 x 60 ) / 100 = __________
Age 13- 14- Low score = ( 27 x 60 ) / 100 = __________
Age 15-16 – High score = ( 41 x 40 ) / 100 = __________
Age 15- 16 – Average score = ( 32 x 40 ) / 100 = __________
Age 15- 16 – Low score = ( 27 x 40 ) / 100 = __________

Then substitute the values in the formula:

Computation:

Conclusion:
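A Python sketch of the test-of-independence computation for the table above:

```python
# Rows = age group (13-14, 15-16), columns = level of knowledge (High, Average, Low)
fo = [[23, 20, 17],
      [18, 12, 10]]

row_totals = [sum(r) for r in fo]           # 60, 40
col_totals = [sum(c) for c in zip(*fo)]     # 41, 32, 27
N = sum(row_totals)                         # 100

# X² = Σ (fo − fe)² / fe, with fe = (row total)(column total) / grand total
chi2 = sum((fo[i][j] - row_totals[i] * col_totals[j] / N) ** 2
           / (row_totals[i] * col_totals[j] / N)
           for i in range(2) for j in range(3))

print(round(chi2, 2))  # about 0.44
```

The computed chi-square of about 0.44 is well below the .05 tabular value of 5.99 at df = 2, so the null hypothesis is accepted: the level of knowledge does not depend on age.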

TEST OF COMPARISON

Comparing Two Independent Samples ( t – test )


This is a test comparing two sample means.

Example:

Two groups of students were tested to determine which group is faster in traversing a
100-meter hanging bridge. There were 11 students in all; the first group was composed
of five (5) members while the other group consisted of six (6) members. These groups
of students were timed as they traversed the hanging bridge. The results in minutes are
shown below. Is there a significant difference in their mean times (in minutes) in
traversing the hanging bridge?

GROUP A GROUP B

Member Minutes Member Minutes


1 2 1 3
2 4 2 7
3 9 3 5
4 3 4 8
5 2 5 4
6 3

Total 5 20 6 30
Mean ( X ) 4 5

Hypothesis:

Ho : There is no significant difference in mean time of traversing the hanging


bridge between Group A and Group B.
Ha : There is a significant difference in mean time of traversing the hanging
bridge between Group A and group B.

Step 1. Calculate the sample standard deviation.

GROUP A                              GROUP B

Member   Minutes (X1)   X1²          Member   Minutes (X2)   X2²
1             2           4          1             3           9
2             4          16          2             7          49
3             9          81          3             5          25
4             3           9          4             8          64
5             2           4          5             4          16
                                     6             3           9

TOTAL (n = 5) 20        114          TOTAL (n = 6) 30        172



Formula of Standard Deviation:

Group A:  s1 = √{ [∑X1² − (∑X1)²/n1] / (n1 − 1) }

Group B:  s2 = √{ [∑X2² − (∑X2)²/n2] / (n2 − 1) }

Substitute the Values in the Formula Using the following values:

Group A Group B

n 5 6
∑X 20 30
∑X2 114 172

Computation:

Standard Deviation Group A: ________________


Standard Deviation Group B : ________________
Compute the t value

Formula:

            |X̄1 − X̄2|
t = ─────────────────────────
     √( s1²/n1 + s2²/n2 )

Where:

X̄1 = mean of Group A: _________
X̄2 = mean of Group B: _________
Standard deviation of Group A (s1) = _____________
Standard deviation of Group B (s2) = _____________
Degrees of freedom (df) = n1 + n2 − 2 = 5 + 6 − 2 = 9
Tabular value, using Table B (df = 9): .05 = 2.262; .01 = 3.250

Computation:
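A Python sketch of the whole computation, from the standard deviations to the t-value:

```python
import math

group_a = [2, 4, 9, 3, 2]
group_b = [3, 7, 5, 8, 4, 3]
n1, n2 = len(group_a), len(group_b)

mean_a = sum(group_a) / n1   # 4
mean_b = sum(group_b) / n2   # 5

# s² = [ΣX² − (ΣX)²/n] / (n − 1)
var_a = (sum(x * x for x in group_a) - sum(group_a) ** 2 / n1) / (n1 - 1)  # 8.5
var_b = (sum(x * x for x in group_b) - sum(group_b) ** 2 / n2) / (n2 - 1)  # 4.4

t = abs(mean_a - mean_b) / math.sqrt(var_a / n1 + var_b / n2)
df = n1 + n2 - 2  # 9

print(round(t, 2))  # about 0.64, below the tabular 2.262 at .05, so Ho is not rejected
```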

ANALYSIS OF VARIANCE

One-way ANOVA ( Parametric)

One-way ANOVA is used when the research calls for comparison of the means of
two or more groups. In the case of a two-group comparison the obtained F is equal to t2.

Example:
Is there a significant difference in the mean errors committed by encoders using
different machines?

M A C H I N E S
A B C D
18 3 7 11
5 9 5 19
10 14 15 20
17 10 14 19
10 9 9 11
∑ 60 45 50 80
Mean 12 9 10 16
∑X = 235
∑X2 = 3245

Compute the F-ratio at α = .05

Steps:
1. Compute the totals per machine.
2. Compute the means.
3. Compute ∑X² = 3245.
4. Compute ∑X = 235.
5. Find the correction term, C:

        (∑X)²
   C = ───────     (n = total number of observations = 20)
          n

6. Find the total sum of squares:

   SST = ∑X² − C

   Find the sum of squares (SS) among means:

          (∑XA)² + (∑XB)² + . . . + (∑XD)²
   SSm = ──────────────────────────────────── − C
                       n
   (here n = number of observations per machine = 5)

7. Find the sum of squares within:

   SSw = SST − SSm

8. Prepare the ANOVA Table



Analysis of Variance

Source of Variation       df       SS       Mean Square       F ratio

Among means              k − 1     SSm      SSm / (k − 1)     MSm / MSw

Within groups (error)    n − k     SSw      SSw / (n − k)

Total                    n − 1     SST

9. Interpret; use Table D:

   df numerator = 3
   df denominator = 16

   Table values:
   .05 = 3.24
   .01 = 5.29
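A Python sketch of the one-way ANOVA computation above (assuming, as in the data, five observations per machine):

```python
machines = {
    "A": [18, 5, 10, 17, 10],
    "B": [3, 9, 14, 10, 9],
    "C": [7, 5, 15, 14, 9],
    "D": [11, 19, 20, 19, 11],
}

all_x = [x for g in machines.values() for x in g]
n, k = len(all_x), len(machines)                 # 20 observations, 4 machines
per_group = len(next(iter(machines.values())))   # 5 per machine

C = sum(all_x) ** 2 / n                    # correction term: 235² / 20 = 2761.25
ss_total = sum(x * x for x in all_x) - C   # 3245 − C = 483.75
ss_among = sum(sum(g) ** 2 for g in machines.values()) / per_group - C  # 143.75
ss_within = ss_total - ss_among            # 340.0

ms_among = ss_among / (k - 1)    # df numerator = 3
ms_within = ss_within / (n - k)  # df denominator = 16
F = ms_among / ms_within
print(round(F, 2))  # about 2.25, below the .05 tabular value of 3.24: not significant
```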
