
STATISTICAL TOOLS

Simple Correlation Analysis


Simple correlation analysis is a technique used to describe the relationship or
association between two variables.

The following statistical tools could be used.

TYPE OR LEVEL OF DATA                           STATISTICAL TOOL
TO BE CORRELATED

1. Both variables are measured at least         Pearson Product Moment Correlation
   in an interval scale                         Coefficient (Pearson r)

2. Variables of interest are measured in        Spearman's rs (Spearman rank
   an ordinal scale                             correlation coefficient)

3. Variables of interest are nominal            Guttman's Lambda Formula
   variables                                    (Guttman's Coefficient of
                                                Predictability)

4. One variable is interval and the other       Point Biserial Coefficient of
   variable is a dichotomous nominal variable   Correlation Formula

5. An interval variable and any nominal         Correlation Ratio Formula
   variable

Pearson Product Moment Correlation

When to use Pearson r:

The variables to be tested are both measured at the interval or ratio level.

Note: To obtain a more reliable and acceptable result, the number of paired observations
to be tested should be at least 30.

Properties of the Correlation Coefficient (r)

1. It is a unitless quantity.
2. It is always some number between -1 and +1, inclusive.
3. If r = 1, then all the points lie on a straight line and the relationship is said to
be a perfect positive relationship. If r = -1, then all the points lie on a straight
line but in an inverse relation, and this relationship is said to be a perfect
negative relationship. If r = 0, then there is no linear relationship between the two
variables and they are said to be independent.
4. The magnitude of r is simply a measure of how closely the points cluster about a
certain trend line known as the regression line.

Degree of correlation:

Value of r       Degree of Correlation

1.0              Perfect
.81 - .99        Very high
.61 - .80        Marked, substantial
.41 - .60        Moderate
.21 - .40        Definite but small
.01 - .20        Almost negligible
0                No correlation

Computational Format:
Consider the scores obtained in Math (X) and Statistics (Y) subjects of ten
Marketing Management students.
________________________________________________________________________
Observation    Math Score (X)    Stat Score (Y)    X²    Y²    XY
1 5 2 _____ _____ ______
2 8 7 _____ _____ ______
3 10 8 _____ _____ ______
4 12 9 _____ _____ ______
5 12 10 _____ _____ ______
6 14 12 _____ _____ ______
7 15 14 _____ _____ ______
8 16 10 _____ _____ ______
9 18 16 _____ _____ ______
10 20 12 _____ _____ ______
SUM 130 100 1878 1138 1440
_______________________________________________________________________

Summation X (∑X) : _____
Summation Y (∑Y) : _____
Summation X square (∑X²) : _____
Summation Y square (∑Y²) : _____
Summation XY (∑XY) : ______

Formula:
            n(∑XY) − (∑X)(∑Y)
r = ─────────────────────────────────────
    √{[n(∑X²) − (∑X)²][n(∑Y²) − (∑Y)²]}

where:
n = sample size or the number of paired observations
∑X = the X values summed
∑Y = the Y values summed
∑X² = each X value squared, then the squares summed
(∑X)² = the X values summed, then the sum squared
∑Y² = each Y value squared, then the squares summed
(∑Y)² = the Y values summed, then the sum squared
∑XY = the sum of the products of X and Y
r = correlation coefficient, or the degree of relationship between X and Y
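For illustration, the Pearson r for the ten paired scores above can be computed directly from these sums; the following Python sketch (not part of the original worksheet) fills in the blanks:

```python
import math

# Paired Math (X) and Statistics (Y) scores of the ten students
X = [5, 8, 10, 12, 12, 14, 15, 16, 18, 20]
Y = [2, 7, 8, 9, 10, 12, 14, 10, 16, 12]
n = len(X)

sum_x = sum(X)                              # 130
sum_y = sum(Y)                              # 100
sum_x2 = sum(x * x for x in X)              # 1878
sum_y2 = sum(y * y for y in Y)              # 1138
sum_xy = sum(x * y for x, y in zip(X, Y))   # 1440

# r = [n(ΣXY) − (ΣX)(ΣY)] / √{[nΣX² − (ΣX)²][nΣY² − (ΣY)²]}
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(round(r, 4))  # 0.8692
```

The resulting r of about .87 falls in the "very high" range of the degree-of-correlation table.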

Testing the significance of r:

To determine whether the observed degree of relationship is statistically
significant, a test of significance is necessary.

Example:

The significance of the computed r is tested against the null hypothesis as
follows:

Ho: r = 0 (There is no significant relationship between the Math scores and the
Statistics scores of students).

Ha: r ≠ 0 (There is a significant relationship between the Math scores and the
Statistics scores of students).

Correlation test of hypothesis

Formula:
        r √(n − 2)
t = ─────────────────
        √(1 − r²)

Computation:

Using Table B, compare the computed t with the tabular t at df = n − 2 = 10 − 2 = 8,
which is 2.306 at the .05 level and 3.355 at the .01 level.

Decision: The null hypothesis is rejected because the computed t-value exceeded the
tabular value.

Conclusion: The Math scores are significantly correlated with the Statistics scores
obtained by the students.
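As a check on the decision above, the t-value can be computed from the r obtained earlier (about .8692) in a short Python sketch:

```python
import math

# r as computed from the ten paired Math and Statistics scores; n = 10 pairs
r, n = 0.8692, 10

# t = r√(n − 2) / √(1 − r²)
t = (r * math.sqrt(n - 2)) / math.sqrt(1 - r ** 2)
df = n - 2  # 8

print(round(t, 2))  # about 4.97, exceeding the tabular t of 3.355 at .01 (df = 8)
```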

SPEARMAN RANK CORRELATION COEFFICIENT ( rs)

If the variables of interest are measured in an ordinal scale, the Spearman rank
correlation coefficient (rs) may be used instead of the Pearson r.

Steps in computing rs:

Step1. Rank the scores in distribution X giving the lowest a rank of 1 and the
highest a rank of n. Repeat the process for the scores in distribution Y.

Step 2. Obtain the difference (di) between the two sets of ranks.

Step 3. Square each difference and then take the sum of the squares of di.

Step 4. Compute using the following formula:


             6 ∑di²
rs = 1 − ─────────────
           n(n² − 1)

where:
rs = Spearman's rho
di = difference between the two sets of ranks
n = number of paired observations

Step 5. If the proportion of ties in either the X or the Y observations is large use
the formula.

         ∑x² + ∑y² − ∑di²
rs = ──────────────────────
        2 √(∑x²)(∑y²)

where:

        n(n² − 1)
∑x² = ─────────── − ∑Tx
           12

        tx³ − tx
Tx = ────────────
          12

        n(n² − 1)
∑y² = ─────────── − ∑Ty
           12

        ty³ − ty
Ty = ────────────
          12

tx = number of observations in X tied at a given rank

ty = number of observations in Y tied at a given rank

Step 6. To test whether the observed rs value indicates an association between the
variables, the following may be applied:
a) For n from 4 to 30, use the critical values of rs at the .05 and .01 levels
of significance in Table P.
b) For n > 30, the significance of the observed rs under the null hypothesis
can be determined with a t-test, using the formula:

         rs √(n − 2)
   t = ────────────────
         √(1 − rs²)

The sampling distribution of the test statistic is the Student t distribution with
n − 2 degrees of freedom.

Computation Format:

Observation    Math    Rank X    Stat    Rank Y    di    di²

1 18 ______ 24 ______ _____ _____


2 17 ______ 28 ______ _____ _____
3 14 ______ 30 ______ _____ _____
4 13 ______ 26 ______ ______ _____
5 12 ______ 22 ______ ______ _____
6 10 ______ 18 ______ ______ _____
7 8 ______ 15 ______ ______ _____
________________________________________________________________________
∑di² = _______

Computation:

The test of significance of the null hypothesis using the computed rs in the
example above is:

Ho: rs = 0 (There is no significant relationship between the Math scores and the
Statistics scores of students).

Ha: rs ≠ 0 (There is a significant relationship between the Math scores and the
Statistics scores of students).

Computation:

Decision:

Conclusion:
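The blanks above can be checked with a short Python sketch that ranks the scores (lowest = 1) and applies the rs formula; these data contain no ties, so the simple formula suffices:

```python
# Math and Statistics scores of the seven observations above
math_scores = [18, 17, 14, 13, 12, 10, 8]
stat_scores = [24, 28, 30, 26, 22, 18, 15]
n = len(math_scores)

def ranks(scores):
    # Lowest score gets rank 1, highest gets rank n (valid here: no ties)
    order = sorted(scores)
    return [order.index(s) + 1 for s in scores]

rx, ry = ranks(math_scores), ranks(stat_scores)
d2 = [(a - b) ** 2 for a, b in zip(rx, ry)]   # di² values, summing to 14

rs = 1 - (6 * sum(d2)) / (n * (n ** 2 - 1))
print(rs)  # 0.75
```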

CORRELATION BETWEEN INTERVAL AND DICHOTOMOUS NOMINAL VARIABLES

The Use of Point Biserial Coefficient of Correlation Formula

Formula:
            ∑f(∑faX) − (∑fa)(∑fX)
rpb = ─────────────────────────────────────
      √{(∑fa)(∑fb)[∑f(∑fX²) − (∑fX)²]}

Where: X = the interval variable
fa = frequency of one category of the dichotomous nominal variable
fb = frequency of the other category of the dichotomous nominal variable
f = total frequency of the dichotomous nominal variable (fa + fb)

Example: Determine the degree of relationship between sex and intelligence.

IQ Score Number of Males Number of Females


95 8 3
90 3 2
85 1 4
80 2 0
75 4 3

Solution:

(X)    (fa)    (fb)    (f)    (fX)    (fX²)    (faX)

95 8 3 11 ______ _______ _______


90 3 2 5 ______ _______ _______
85 1 4 5 ______ _______ _______
80 2 0 2 ______ _______ _______
75 4 3 7 ______ _______ _______

Total 18 12 30 2,605 228,075 1,575

Computation:

Interpretation:

There is a very small positive relationship between sex and intelligence.
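A Python sketch of the point biserial computation from the frequency table above, for checking the blanks:

```python
import math

# Each row: IQ score X, male frequency fa, female frequency fb
rows = [(95, 8, 3), (90, 3, 2), (85, 1, 4), (80, 2, 0), (75, 4, 3)]

sum_fa = sum(fa for _, fa, _ in rows)                     # 18
sum_fb = sum(fb for _, _, fb in rows)                     # 12
sum_f = sum_fa + sum_fb                                   # 30
sum_fx = sum((fa + fb) * x for x, fa, fb in rows)         # 2,605
sum_fx2 = sum((fa + fb) * x * x for x, fa, fb in rows)    # 228,075
sum_fax = sum(fa * x for x, fa, _ in rows)                # 1,575

num = sum_f * sum_fax - sum_fa * sum_fx
den = math.sqrt(sum_fa * sum_fb * (sum_f * sum_fx2 - sum_fx ** 2))
r_pb = num / den
print(round(r_pb, 3))  # about 0.103: a very small positive relationship
```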



CORRELATION BETWEEN INTERVAL AND ANY NOMINAL VARIABLES

Correlation Ratio Formula

        ∑NiȲi² − NtȲt²
E² = ─────────────────────
        ∑Yij² − NtȲt²

Where: Ni = number of samples per category
Ȳi = average obtained per category
Nt = total number of samples
Ȳt = over-all average
Yij = individual item

Example: Measure the degree of relationship between the civil status and the annual
salary (expressed in thousand pesos) of the given samples.

Single 65 83 81 69 73 89 76 60 N= 8
Married 70 67 90 84 78 N= 5
Widowed 89 64 78 N= 3

Solution:
N1 = 8 N2 = 5 N3 = 3 Nt = 16

Ÿ1 = 596/8 Ÿ2 = 389/5 Ÿ3 = 231/3 Ÿt = 1,216/16

= 74.5 = 77.8 = 77.0 = 76.0

∑Yij² = (65)² + (83)² + (81)² + . . . + (89)² + (64)² + (78)²

      = 93,792

Computation:

Interpretation:

There is a very small positive relationship between the civil status and the annual
salary (expressed in thousand pesos) of the given samples.
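A Python sketch of the correlation ratio computation for the data above:

```python
groups = {
    "Single":  [65, 83, 81, 69, 73, 89, 76, 60],
    "Married": [70, 67, 90, 84, 78],
    "Widowed": [89, 64, 78],
}

all_vals = [v for g in groups.values() for v in g]
n_t = len(all_vals)                        # 16
mean_t = sum(all_vals) / n_t               # 76.0

# ΣNiȲi²: each category size times its squared mean
sum_ni_yi2 = sum(len(g) * (sum(g) / len(g)) ** 2 for g in groups.values())
sum_yij2 = sum(v * v for v in all_vals)    # 93,792

e2 = (sum_ni_yi2 - n_t * mean_t ** 2) / (sum_yij2 - n_t * mean_t ** 2)
e = e2 ** 0.5
print(round(e2, 4), round(e, 3))  # about 0.027 and 0.164: very small relationship
```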

CORRELATION BETWEEN NOMINAL VARIABLES

The Use of Guttman's Lambda Formula (also known as Guttman's Coefficient of
Predictability)

Formula:

      ∑FR − CT              ∑FC − RT
λ = ────────────    and   λ = ────────────
      N − CT                N − RT

where: ∑FR = the sum of the biggest cell frequency in each row
∑FC = the sum of the biggest cell frequency in each column
CT = the biggest column total
RT = the biggest row total
N = total frequency

(The first form treats the row variable as the predictor of the column variable;
the second treats the column variable as the predictor of the row variable.)
Example:

Measure the degree of relationship between female teachers' level of knowledge and
their extent of adopting control measures for varicosity.

                             Extent of Adopting Control Measures

Knowledge Level              Always   Frequent   Sometimes   Never   Total

Very knowledgeable             18        9          10         0      37
Knowledgeable                   6       15           6         3      30
Slightly knowledgeable         10        8           9         3      30
Not knowledgeable               0        0           3         0       3

TOTAL                          34       32          28         6     100

Computation:

Conclusion:

The obtained lambda coefficient of _____ indicates that when the extent of adoption
of control measures for varicosity is treated as the independent variable, the error
reduced in the prediction (increasing its accuracy) is _____ percent. Meanwhile, the
obtained lambda coefficient of _____ indicates that when knowledge level is treated as
the independent variable, the error minimized in the prediction (increasing its
accuracy) is _____ percent. These results show that the extent of adoption of the
control measures for varicosity predicts knowledge level more accurately than
knowledge level predicts extent of practice.
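A Python sketch of the lambda computations for the table above, pairing the sum of the row maxima with the biggest column total (and vice versa), following the standard definition of the coefficient of predictability:

```python
# Contingency table: rows = knowledge level, columns = extent of adoption
table = [
    [18, 9, 10, 0],   # Very knowledgeable
    [6, 15, 6, 3],    # Knowledgeable
    [10, 8, 9, 3],    # Slightly knowledgeable
    [0, 0, 3, 0],     # Not knowledgeable
]

N = sum(sum(row) for row in table)                # 100
row_totals = [sum(row) for row in table]          # 37, 30, 30, 3
col_totals = [sum(col) for col in zip(*table)]    # 34, 32, 28, 6

sum_fr = sum(max(row) for row in table)           # biggest cell in each row, summed
sum_fc = sum(max(col) for col in zip(*table))     # biggest cell in each column, summed
RT, CT = max(row_totals), max(col_totals)

lambda_col_dep = (sum_fr - CT) / (N - CT)  # knowledge level predicting extent
lambda_row_dep = (sum_fc - RT) / (N - RT)  # extent predicting knowledge level
print(round(lambda_col_dep, 3), round(lambda_row_dep, 3))
```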

CHI – SQUARE TEST

Characteristics of the Chi-square

The statistic used in the analysis of enumeration data is known as the chi-square test.
The chi-square test can be used for one variable or two variables with two or more
categories each. It reflects discrepancies between the observed and the expected or
theoretical frequencies of individuals, objects, or events falling in the various categories.

Applications of the Chi-square

1. Test of goodness of fit
2. Test of homogeneity (two or more samples, one criterion variable)
3. Test of independence (one sample, two criterion variables)

Steps in Using Chi-square


These are the steps in using the chi-square test for k independent samples.
1. State the null hypothesis. The null hypothesis may be stated in any of these ways:
a. The sample distribution conforms with the hypothetical or theoretical
distribution.
b. The actual observed proportion is not significantly different from the ideal
or expected proportion.
c. One variable does not depend on the other variable, or the two variables are
independent of each other.
2. Set the level of significance, also known as alpha (α).
3. Cast the observed frequencies in a k x r contingency table, using the k columns
for the groups. Determine the expected frequency for each cell by finding the
product of the marginal totals common to the cell and dividing this product by N.
(N is the sum of each group of marginal totals; it represents the total number of
independent observations.)

Determine the degrees of freedom using the formula:

For one-way classification:
df = number of categories − 1

For two-way classification:
df = (k − 1)(r − 1)

4. Locate the tabular value of the chi-square in the chi-square distribution table by
getting the value where the desired level of significance and the computed degree
of freedom intersect.
5. Compute for the chi-square value by using the formula:

           (fo − fe)²
X² = ∑ ──────────────
              fe

where: fo = observed number of cases


fe = expected number of cases

How to compute fe:

fe = (row total × column total) / grand total

6. State the conclusion arrived at by the acceptance or rejection of the null hypothesis.

If the computed value of the chi-square is less than the tabular value, the null
hypothesis is accepted. If the computed value of the chi-square is greater than the tabular
value, the null hypothesis is rejected.

Test of Goodness of Fit

A chi-square goodness of fit test is performed in order to determine if a set of
obtained data corresponds to some theoretical distribution.

Example:

In a bowling tournament, is the number of wins related to lane position?

Data:
Lane Position    No. of Wins (fo)    Expected frequency (fe)    (fo − fe)²/fe

1 29 18 _______
2 19 18 _______
3 18 18 _______
4 25 18 _______
5 17 18 _______
6 10 18 _______
7 15 18 _______
8 11 18 _______
X2 = _______
N = 144

How the fe was computed:

fe = total number of wins / number of lane positions

   = 144 / 8 = 18
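A Python sketch of the goodness-of-fit computation for the bowling data:

```python
# Observed wins per lane position; each lane is expected to win equally often
fo = [29, 19, 18, 25, 17, 10, 15, 11]
N = sum(fo)        # 144
fe = N / len(fo)   # 18

chi2 = sum((o - fe) ** 2 / fe for o in fo)
print(round(chi2, 2))  # about 16.33; df = 8 - 1 = 7
```

The computed chi-square of about 16.33 exceeds the .05 tabular value of about 14.07 at df = 7, so the null hypothesis of equal wins per lane would be rejected.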

Chi-square Test of Homogeneity (Two or More Samples, One Criterion Variable)

The chi-square test is frequently used to determine if two or more populations are
homogeneous. This means that the data distributions are similar with respect to a
particular criterion variable.

The samples drawn from each population in a test of homogeneity need not be of
equal size. However, it is recommended that they be equal whenever possible, to make
the calculations easier.

Example: In an experiment involving two groups of samples, 100 males and 100 females,
subjects were asked to state a preference between polished rice and brown rice. Do the
preferences of the two groups differ?

Rice Preference

Sex Polished Brown Total


Males 69 31 100
Females 48 52 100
Total 117 83 200

Computation:
Rice Preference

                Polished              Brown
Sex           fo        fe         fo        fe        Total

Males         69    ________       31    ________       100

Females       48    ________       52    ________       100

Total        117    ________       83    ________       200

Solution: Follow the steps in using the chi-square test.


1. Ho: The preferences of the two groups do not differ significantly.
Ha: The preferences of the two groups differ significantly.
2. Use both 5% and 1% level of significance.
3. Since the given data is a two –way classification, the degree of freedom is
computed as :
df = ( k-1 ) ( r – 1 )
df = ( 2 – 1 ) ( 2 – 1 )
df = 1

4. The tabular value at 1 df and 5% and 1 % level is 3.84 and 6.64, respectively
( Table E ).
5. For two-way classification, get the expected frequencies using the following
formula:

   fe = (row total × column total) / grand total

a. Males -Polished rice ( fe) = 117 x 100 / 200 = _________


b. Males – Brown Rice (fe) = 83 x 100 / 200 = _________
c. Females – Polished rice (fe) = 117 x 100 / 200 = ________
d. Females – Brown rice ( fe) = 83 x 100 / 200 = _________

Then substitute the values into the formula, with a correction for continuity, since we
have a 2 x 2 table with 1 degree of freedom:

           (|fo − fe| − .5)²
X² = ∑ ──────────────────────
               fe
Computation:

6. Since ______ (tabular value) < ______ (computed value), we reject the null
hypothesis. The preferences of the two groups differ significantly.
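A Python sketch of the homogeneity computation with the continuity correction:

```python
# 2 x 2 table: rows = sex (males, females), columns = preference (polished, brown)
fo = [[69, 31],
      [48, 52]]

row_totals = [sum(r) for r in fo]           # 100, 100
col_totals = [sum(c) for c in zip(*fo)]     # 117, 83
N = sum(row_totals)                         # 200

chi2 = 0.0
for i in range(2):
    for j in range(2):
        fe = row_totals[i] * col_totals[j] / N
        chi2 += (abs(fo[i][j] - fe) - 0.5) ** 2 / fe  # Yates continuity correction

print(round(chi2, 2))  # about 8.24, exceeding both 3.84 (.05) and 6.64 (.01) at df = 1
```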

TEST OF INDEPENDENCE (ONE SAMPLE, TWO CRITERION VARIABLES)

The one-sample test of independence differs from the test of homogeneity in that
for each sample member there are measures on two variables. The sample used in a test
of independence consists of members randomly drawn from the same population. This
test is used to see whether measures taken on two criterion variables are independent of
or associated with one another in a given population.
The calculation of a chi-square test of independence is similar to that made with a
test of homogeneity.

Example:

One hundred high school students, aged 13-16, were given a test to determine
their levels of knowledge on the effects of climate change. Both age and levels of
knowledge were classified as shown below.

LEVELS OF KNOWLEDGE
AGE High Average Low TOTAL
______________________________________________________________________
13 - 14 23 20 17 60
15 - 16 18 12 10 40
TOTAL 41 32 27 100

Solution:

Follow the steps in using the chi-square test.


1. Ho: The level of knowledge of the students does not depend on their age.
   Ha: The level of knowledge of the students depends on their age.
2. Set 5% level of significance.
3. Since the given data is a two-way classification, the degree of freedom is
computed as:
df = ( k -1 ) ( r – 1 )
df = ( 2 – 1 ) ( 3 – 1 )
df = 2
4. For two-way classification get the expected frequencies using the following
formula:

fe = (row total) (column total)


grand total

Compute the fe as follows:

Age 13-14 – High score = (41 x 60 ) / 100 = __________


Age 13-14 – Average score = ( 32 x 60 ) / 100 = __________
Age 13- 14- Low score = ( 27 x 60 ) / 100 = __________
Age 15-16 – High score = ( 41 x 40 ) / 100 = __________
Age 15- 16 – Average score = ( 32 x 40 ) / 100 = __________
Age 15- 16 – Low score = ( 27 x 40 ) / 100 = __________

Then substitute the values in the formula:

Computation:

Conclusion:
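A Python sketch of the test-of-independence computation for the table above:

```python
# Rows = age group (13-14, 15-16), columns = level of knowledge (High, Average, Low)
fo = [[23, 20, 17],
      [18, 12, 10]]

row_totals = [sum(r) for r in fo]           # 60, 40
col_totals = [sum(c) for c in zip(*fo)]     # 41, 32, 27
N = sum(row_totals)                         # 100

# X² = Σ (fo − fe)² / fe, with fe = (row total)(column total) / grand total
chi2 = sum((fo[i][j] - row_totals[i] * col_totals[j] / N) ** 2
           / (row_totals[i] * col_totals[j] / N)
           for i in range(2) for j in range(3))

print(round(chi2, 2))  # about 0.44
```

The computed chi-square of about 0.44 is well below the .05 tabular value of 5.99 at df = 2, so the null hypothesis is accepted: the level of knowledge does not depend on age.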

TEST OF COMPARISON

Comparing Two Independent Samples ( t – test )


This is a test comparing two sample means.

Example:

Two groups of students were tested to determine which group is faster in traversing a
100-meter hanging bridge. There were 11 students in all; the first group was composed
of five (5) members while the other group consisted of six (6) members. These groups
of students were timed as they traversed the hanging bridge. The results in minutes are
shown below. Is there a significant difference in their mean times (in minutes) in
traversing the hanging bridge?

GROUP A GROUP B

Member Minutes Member Minutes


1 2 1 3
2 4 2 7
3 9 3 5
4 3 4 8
5 2 5 4
6 3

Total 5 20 6 30
Mean ( X ) 4 5

Hypothesis:

Ho : There is no significant difference in mean time of traversing the hanging


bridge between Group A and Group B.
Ha : There is a significant difference in mean time of traversing the hanging
bridge between Group A and group B.

Step 1. Calculate the sample standard deviation.

GROUP A                              GROUP B

Member   Minutes (X1)   X1²          Member   Minutes (X2)   X2²
1             2           4          1             3           9
2             4          16          2             7          49
3             9          81          3             5          25
4             3           9          4             8          64
5             2           4          5             4          16
                                     6             3           9

TOTAL (n = 5) 20        114          TOTAL (n = 6) 30        172



Formula of Standard Deviation:

Group A:  s1 = √{ [∑X1² − (∑X1)²/n1] / (n1 − 1) }

Group B:  s2 = √{ [∑X2² − (∑X2)²/n2] / (n2 − 1) }

Substitute the Values in the Formula Using the following values:

Group A Group B

n 5 6
∑X 20 30
∑X2 114 172

Computation:

Standard Deviation Group A: ________________


Standard Deviation Group B : ________________
Compute the t value

Formula:

            |X̄1 − X̄2|
t = ─────────────────────────
     √( s1²/n1 + s2²/n2 )

Where:

X̄1 = mean of Group A: _________
X̄2 = mean of Group B: _________
Standard deviation of Group A (s1) = _____________
Standard deviation of Group B (s2) = _____________
Degrees of freedom (df) = n1 + n2 − 2 = 5 + 6 − 2 = 9
Tabular value, using Table B (df = 9): .05 = 2.262; .01 = 3.250

Computation:
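A Python sketch of the whole computation, from the standard deviations to the t-value:

```python
import math

group_a = [2, 4, 9, 3, 2]
group_b = [3, 7, 5, 8, 4, 3]
n1, n2 = len(group_a), len(group_b)

mean_a = sum(group_a) / n1   # 4
mean_b = sum(group_b) / n2   # 5

# s² = [ΣX² − (ΣX)²/n] / (n − 1)
var_a = (sum(x * x for x in group_a) - sum(group_a) ** 2 / n1) / (n1 - 1)  # 8.5
var_b = (sum(x * x for x in group_b) - sum(group_b) ** 2 / n2) / (n2 - 1)  # 4.4

t = abs(mean_a - mean_b) / math.sqrt(var_a / n1 + var_b / n2)
df = n1 + n2 - 2  # 9

print(round(t, 2))  # about 0.64, below the tabular 2.262 at .05, so Ho is not rejected
```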

ANALYSIS OF VARIANCE

One-way ANOVA ( Parametric)

One-way ANOVA is used when the research calls for comparison of the means of
two or more groups. In the case of a two-group comparison the obtained F is equal to t2.

Example:
Is there a significant difference in the mean errors committed by encoders using
different machines?

M A C H I N E S
A B C D
18 3 7 11
5 9 5 19
10 14 15 20
17 10 14 19
10 9 9 11
∑ 60 45 50 80
Mean 12 9 10 16
∑X = 235
∑X2 = 3245

Compute the F-ratio at α = .05

Steps:
1. Compute the totals per machine.
2. Compute the means.
3. Compute ∑X² = 3245.
4. Compute ∑X = 235.
5. Find the correction term, C:

        (∑X)²
   C = ───────     (n = total number of observations = 20)
          n

6. Find the total sum of squares:

   SST = ∑X² − C

   Find the sum of squares (SS) among means:

          (∑XA)² + (∑XB)² + . . . + (∑XD)²
   SSm = ──────────────────────────────────── − C
                       n
   (here n = number of observations per machine = 5)

7. Find the sum of squares within:

   SSw = SST − SSm

8. Prepare the ANOVA Table



Analysis of Variance

Source of Variation       df       SS       Mean Square       F ratio

Among means              k − 1     SSm      SSm / (k − 1)     MSm / MSw

Within groups (error)    n − k     SSw      SSw / (n − k)

Total                    n − 1     SST

9. Interpret; use Table D:

   df numerator = 3
   df denominator = 16

   Table values:
   .05 = 3.24
   .01 = 5.29
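A Python sketch of the one-way ANOVA computation above (assuming, as in the data, five observations per machine):

```python
machines = {
    "A": [18, 5, 10, 17, 10],
    "B": [3, 9, 14, 10, 9],
    "C": [7, 5, 15, 14, 9],
    "D": [11, 19, 20, 19, 11],
}

all_x = [x for g in machines.values() for x in g]
n, k = len(all_x), len(machines)                 # 20 observations, 4 machines
per_group = len(next(iter(machines.values())))   # 5 per machine

C = sum(all_x) ** 2 / n                    # correction term: 235² / 20 = 2761.25
ss_total = sum(x * x for x in all_x) - C   # 3245 − C = 483.75
ss_among = sum(sum(g) ** 2 for g in machines.values()) / per_group - C  # 143.75
ss_within = ss_total - ss_among            # 340.0

ms_among = ss_among / (k - 1)    # df numerator = 3
ms_within = ss_within / (n - k)  # df denominator = 16
F = ms_among / ms_within
print(round(F, 2))  # about 2.25, below the .05 tabular value of 3.24: not significant
```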
