Statistical Tools
Note: For a more reliable and acceptable result, the data to be tested should consist of
at least 30 pairs.
1. It is a unitless quantity.
2. It is always some number between -1 and +1 inclusive.
3. If r = 1, all the points lie on a straight line and the relationship is said to be a
perfect positive relationship. If r = -1, all the points lie on a straight line but in
an inverse relation, and the relationship is said to be a perfect negative relationship.
If r = 0, there is no linear relationship between the two variables and they are
said to be uncorrelated.
4. The magnitude of r is simply a measure of how closely the points cluster about a
certain trend line known as the regression line.
Degree of correlation:
1.0 Perfect
.81 - .99 Very high
.61 - .80 Marked, substantial
.41 - .60 Moderate
.21 - .40 Definite but small
.01 - .20 Almost negligible
0 No correlation
Computational Format:
Consider the scores obtained in Math (X) and Statistics (Y) subjects of ten
Marketing Management students.
________________________________________________________________________
Observation   Math Score (X)   Stat Score (Y)   X²      Y²      XY
1             5                2                _____   _____   ______
2             8                7                _____   _____   ______
3             10               8                _____   _____   ______
4             12               9                _____   _____   ______
5             12               10               _____   _____   ______
6             14               12               _____   _____   ______
7             15               14               _____   _____   ______
8             16               10               _____   _____   ______
9             18               16               _____   _____   ______
10            20               12               _____   _____   ______
SUM           130              100              1878    1138    1440
________________________________________________________________________
Formula:
$$r = \frac{n(\sum XY) - (\sum X)(\sum Y)}{\sqrt{[n(\sum X^2) - (\sum X)^2]\,[n(\sum Y^2) - (\sum Y)^2]}}$$
where:
n = sample size or the number of paired observations
∑X = the sum of the X values
∑Y = the sum of the Y values
∑X² = the sum of the squared X values
(∑X)² = the square of the sum of the X values
∑Y² = the sum of the squared Y values
(∑Y)² = the square of the sum of the Y values
∑XY = the sum of the products of X and Y
r = correlation coefficient or the degree of relationship between X and Y
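The computational format above can be checked with a short script. The following is a minimal sketch in Python (the variable names are illustrative, not part of the handout) that rebuilds the column sums from the ten pairs of scores and substitutes them into the formula for r.

```python
import math

# Scores from the table above: Math (X) and Statistics (Y) of ten students.
x = [5, 8, 10, 12, 12, 14, 15, 16, 18, 20]
y = [2, 7, 8, 9, 10, 12, 14, 10, 16, 12]

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(v * v for v in x)              # sum of squared X values
sum_y2 = sum(v * v for v in y)              # sum of squared Y values
sum_xy = sum(a * b for a, b in zip(x, y))   # sum of the products XY

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)   # matches the SUM row of the table
print(round(r, 4))
```

With these data r comes out at roughly 0.87, which falls in the "very high" band of the degree-of-correlation scale.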
If we want to know whether the computed relationship is statistically significant, a
test of significance is necessary.
Example:
To test the null hypothesis of no correlation using the computed r, proceed as
follows:
Formula:
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
Computation:
Decision: The null hypothesis is rejected because the computed t-value exceeds the
tabular value.
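As a rough illustration of the test above, the sketch below applies the t formula to the Math and Statistics example. The value r = 0.8692 is obtained from the sums in the table (n = 10). The 0.05 two-tailed level and the tabular value 2.306 for 8 degrees of freedom are assumptions taken from a standard t table, since the handout does not state the level used.

```python
import math

def t_from_r(r, n):
    """t statistic for testing a correlation coefficient against zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r = 0.8692          # computed from the sums in the Math/Statistics table, rounded
n = 10              # number of paired observations
t = t_from_r(r, n)

# Assumed: two-tailed test at the 0.05 level, df = n - 2 = 8 (standard t table).
t_tabular = 2.306

print(round(t, 2))
print("reject H0" if t > t_tabular else "do not reject H0")
```

Under these assumptions the computed t is close to 5, well above the tabular value, which agrees with the decision stated above.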
If the variables of interest are measured on an ordinal scale, the Spearman rank
correlation coefficient (rs) may be used instead of the Pearson r.
Step 1. Rank the scores in distribution X, giving the lowest a rank of 1 and the
highest a rank of n. Repeat the process for the scores in distribution Y.
Step 2. Obtain the difference (di) between the two sets of ranks.
Step 3. Square each difference and then take the sum of the squares of di.
Step 4. If there are few or no ties, compute rs using the standard formula:
$$r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$
Step 5. If the proportion of ties in either the X or the Y observations is large, use
the formula:
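$$r_s = \frac{\sum X^2 + \sum Y^2 - \sum d_i^2}{2\sqrt{(\sum X^2)(\sum Y^2)}}$$
where:
$$\sum X^2 = \frac{n(n^2 - 1)}{12} - \sum T_x, \qquad T_x = \frac{t_x^3 - t_x}{12}$$
$$\sum Y^2 = \frac{n(n^2 - 1)}{12} - \sum T_y, \qquad T_y = \frac{t_y^3 - t_y}{12}$$
Here t_x (t_y) is the number of X (Y) observations tied at a given rank.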
The sampling distribution of the test is the Student's t distribution with n-2 degrees
of freedom.
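The ranking steps and the tie-corrected formula can be sketched as follows. This is a minimal Python illustration, not the handout's own worked example: the function names are hypothetical, tied scores are assumed to share the mean of the ranks they occupy, and the two short data lists are made up purely to show the call.

```python
import math
from collections import Counter

def average_ranks(values):
    """Rank from lowest (1) to highest (n); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def tie_term(values):
    """Sum of T = (t^3 - t) / 12 over every group of t tied observations."""
    return sum((t ** 3 - t) / 12 for t in Counter(values).values())

def spearman_rs(x, y):
    """Tie-corrected Spearman rs; equals 1 - 6*sum(d^2)/(n(n^2-1)) without ties."""
    n = len(x)
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    sum_x2 = n * (n ** 2 - 1) / 12 - tie_term(x)
    sum_y2 = n * (n ** 2 - 1) / 12 - tie_term(y)
    return (sum_x2 + sum_y2 - sum_d2) / (2 * math.sqrt(sum_x2 * sum_y2))

# Hypothetical scores, for illustration only.
scores_x = [1, 2, 3, 4, 5]
scores_y = [2, 1, 4, 3, 5]
print(round(spearman_rs(scores_x, scores_y), 2))
```

Because an untied value contributes T = 0, the same function covers both the simple and the tie-corrected case.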
Computation Format:
Computation:
The test of significance of the null hypothesis using the computed rs in the
example above is:
Computation:
Decision:
Conclusion:
Formula:
$$r_{pb} = \frac{\sum f\,(\sum f_a X) - \sum f_a\,(\sum fX)}{\sqrt{\sum f_a \sum f_b\,[\sum f\,(\sum fX^2) - (\sum fX)^2]}}$$
Solution:
Computation:
Interpretation:
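A small sketch of the r_pb formula above, in Python, may help when filling in the computation. The symbols are not defined in the surviving text, so the reading used here is an assumption: ∑f is the total number of cases, ∑fa and ∑fb are the numbers of cases in the two categories, ∑faX is the sum of the scores in category a, and ∑fX and ∑fX² are the sum and the sum of squares of all scores. The data are hypothetical.

```python
import math

def point_biserial(group_a, group_b):
    """Point-biserial r from raw scores split into two categories (a and b)."""
    scores = group_a + group_b
    n = len(scores)                       # sum of f   (assumed: total cases)
    n_a, n_b = len(group_a), len(group_b) # sum of fa, sum of fb
    sum_a = sum(group_a)                  # sum of fa*X
    sum_x = sum(scores)                   # sum of f*X
    sum_x2 = sum(v * v for v in scores)   # sum of f*X^2
    numerator = n * sum_a - n_a * sum_x
    denominator = math.sqrt(n_a * n_b * (n * sum_x2 - sum_x ** 2))
    return numerator / denominator

# Hypothetical scores for two categories, for illustration only.
r_pb = point_biserial([3, 4, 5], [1, 2, 3])
print(round(r_pb, 4))
```

Under this reading the result is the same as a Pearson r computed with the category coded 1 for a and 0 for b.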
Example: Measure the degree of relationship between the civil status and the annual
salary (expressed in thousand pesos) of the given samples.
Single 65 83 81 69 73 89 76 60 N= 8
Married 70 67 90 84 78 N= 5
Widowed 89 64 78 N= 3
Solution:
N1 = 8 N2 = 5 N3 = 3 Nt = 16
Computation:
Interpretation:
There is a very small positive relationship between the civil status and the annual salary
(expressed in thousand pesos) of the given samples.
Formula:
$$c = \frac{FR - RT}{N - RT} \quad \text{and} \quad c = \frac{FC - CT}{N - CT}$$
TOTAL 34 32 28 6 100
Computation:
Conclusion:
The obtained lambda coefficient of _____ indicates that when the extent of adoption
of control measures for varicosity is treated as the independent variable, the error in
prediction is reduced (accuracy is increased) by _____ percent, while the obtained lambda
coefficient of _____ indicates that when knowledge is treated as the independent variable,
the error in prediction is reduced (accuracy is increased) by _______ percent. These
results indicate that the extent of adoption of the control measures for varicosity
predicts knowledge level more accurately than knowledge level predicts the extent of practice.
The statistic used in the analysis of enumeration data is known as the chi-square test. The
chi-square test can be used for one variable or for two variables that each have two or more
categories. It reflects discrepancies between the observed and the expected or
theoretical frequencies of individuals, objects, or events falling in the various categories.
4. Locate the tabular value of the chi-square in the chi-square distribution table by
getting the value where the desired level of significance and the computed degree
of freedom intersect.
5. Compute for the chi-square value by using the formula:
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
6. State the conclusion arrived at by the acceptance or rejection of the null hypothesis.
If the computed value of the chi-square is less than the tabular value, the null
hypothesis is accepted. If the computed value of the chi-square is greater than the tabular
value, the null hypothesis is rejected.
Example:
Data:
Lane Position   No. of Wins (fo)   Expected frequency (fe)   (fo – fe)² / fe
1               29                 18                        _______
2               19                 18                        _______
3               18                 18                        _______
4               25                 18                        _______
5               17                 18                        _______
6               10                 18                        _______
7               15                 18                        _______
8               11                 18                        _______
                                                        χ² = _______
N = 144
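The blank column in the table above can be filled in with a few lines of code. The sketch below (Python, with illustrative variable names) uses the observed wins per lane and an expected frequency of 144 / 8 = 18 for every lane. The 0.05 level and the tabular value 14.067 for 7 degrees of freedom are assumptions taken from a standard chi-square table, since the example does not state the level.

```python
# Observed number of wins per lane position, from the table above.
observed = [29, 19, 18, 25, 17, 10, 15, 11]

n_total = sum(observed)                  # N = 144
expected = n_total / len(observed)       # 18 for every lane

# One (fo - fe)^2 / fe entry per lane, then their sum.
contributions = [(fo - expected) ** 2 / expected for fo in observed]
chi_square = sum(contributions)
df = len(observed) - 1

# Assumed: 0.05 level, df = 7 (standard chi-square table).
tabular_05 = 14.067

for lane, value in enumerate(contributions, start=1):
    print(lane, round(value, 3))
print(round(chi_square, 2), df)
print("reject H0" if chi_square > tabular_05 else "accept H0")
```

With these figures the computed chi-square is about 16.33, which exceeds the assumed tabular value.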
The chi-square test is frequently used to determine if two or more populations are
homogeneous. This means that the data distributions are similar with respect to a
particular criterion variable.
The samples drawn from each population in a test of homogeneity need not be
of equal size. However, it is recommended that they be so whenever possible, to
make calculations easier.
Example: In an experiment involving two groups of samples, 100 males and 100 females,
subjects were asked to state a preference between polished rice and brown rice. Do the
preferences of the two groups differ?
Rice Preference
Computation:
Rice Preference
4. The tabular values at 1 df for the 5% and 1% levels are 3.84 and 6.64, respectively
(Table E).
5. For a two-way classification, get the expected frequencies using the following
formula:
$$f_e = \frac{(\text{row total})(\text{column total})}{N}$$
Then substitute the values into the formula with correction for continuity since we have a
2 x 2 table with 1 degree of freedom.
$$\chi^2 = \sum \frac{(|f_o - f_e| - 0.5)^2}{f_e}$$
Computation:
6. Since ______ (tabular value) < ______ (computed value), we reject the null
hypothesis. The preferences of the two groups differ significantly.
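The procedure for a 2 x 2 table with the correction for continuity can be sketched as below. The observed counts in this Python example are hypothetical, since the frequency table for the rice preference example is not reproduced here; only the two sample sizes of 100 are taken from the example. Function names are illustrative.

```python
def expected_frequencies(table):
    """Expected frequency of each cell: (row total * column total) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_yates(table):
    """Chi-square for a 2 x 2 table with the 0.5 correction for continuity."""
    expected = expected_frequencies(table)
    return sum(
        (abs(fo - fe) - 0.5) ** 2 / fe
        for obs_row, exp_row in zip(table, expected)
        for fo, fe in zip(obs_row, exp_row)
    )

# HYPOTHETICAL counts: rows = males, females; columns = polished, brown rice.
table = [[60, 40],
         [40, 60]]

chi_square = chi_square_yates(table)
print(round(chi_square, 2))
print("reject H0 at 5%" if chi_square > 3.84 else "accept H0 at 5%")
```

The comparison value 3.84 is the tabular value at 1 df and the 5% level quoted in step 4.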
The one-sample test of independence differs from the test of homogeneity in that
for each sample member there are measures on two variables. The sample used in a test
of independence consists of members randomly drawn from the same population. This
test is used to see if measures taken on two criterion variables are either independent of or
associated with one another in a given population.
The calculation of a chi-square test of independence is similar to that made with a
test of homogeneity.
Example:
One hundred high school students, aged 13-16, were given a test to determine
their levels of knowledge on the effects of climate change. Both age and levels of
knowledge were classified as shown below.
LEVELS OF KNOWLEDGE
AGE High Average Low TOTAL
______________________________________________________________________
13 - 14 23 20 17 60
15 - 16 18 12 10 40
TOTAL 41 32 27 100
Solution:
Computation:
Conclusion:
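For the age and level-of-knowledge table above, the computation can be sketched in Python as follows. The expected frequencies use (row total x column total) / N, and no continuity correction is applied because the table is larger than 2 x 2. The 5% level and the tabular value 5.991 for 2 degrees of freedom are assumptions from a standard chi-square table.

```python
# Observed frequencies: rows = age 13-14, 15-16; columns = High, Average, Low.
observed = [[23, 20, 17],
            [18, 12, 10]]

row_totals = [sum(row) for row in observed]          # 60, 40
col_totals = [sum(col) for col in zip(*observed)]    # 41, 32, 27
n = sum(row_totals)                                  # 100

chi_square = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n       # expected frequency
        chi_square += (fo - fe) ** 2 / fe

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Assumed: 5% level, df = 2 (standard chi-square table).
tabular_05 = 5.991

print(round(chi_square, 3), df)
print("reject H0" if chi_square > tabular_05 else "accept H0")
```

Under these assumptions the computed value is well below the tabular value, so age and level of knowledge would be treated as independent.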
TEST OF COMPARISON
Example:
Two groups of students were timed to determine which group is faster in traversing a
100-meter hanging bridge. There were 11 students in all: the first group was
composed of five (5) members while the other group consisted of six (6) members. The
results in minutes are shown below. Is there a significant difference in their mean time
(minutes) of traversing the hanging bridge?
              GROUP A         GROUP B
Total         5     20        6     30
Mean (X̄)           4               5
Hypothesis:
GROUP A                             GROUP B
Member   Minutes (X1)   X1²         Member   Minutes (X2)   X2²
1        2              4           1        3              9
2        4              16          2        7              49
3        9              81          3        5              25
4        3              9           4        8              64
5        2              4           5        4              16
                                    6        3              9
Group A:
$$s_1 = \sqrt{\frac{\sum X_1^2 - \frac{(\sum X_1)^2}{n_1}}{n_1 - 1}}$$
Group B:
$$s_2 = \sqrt{\frac{\sum X_2^2 - \frac{(\sum X_2)^2}{n_2}}{n_2 - 1}}$$
         Group A   Group B
n        5         6
∑X       20        30
∑X²      114       172
Computation:
Formula:
$$t = \frac{|\bar{X}_1 - \bar{X}_2|}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
Where:
Computation:
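The computation for the hanging bridge example can be sketched in Python as follows, using the times listed for Group A and Group B and the formulas for s and t given above. The function name is illustrative. No tabular value is compared here because the degrees of freedom to use are not stated in this part of the handout.

```python
import math

# Minutes taken by each member, from the table above.
group_a = [2, 4, 9, 3, 2]
group_b = [3, 7, 5, 8, 4, 3]

def sample_variance(values):
    """s^2 = (sum of X^2 - (sum of X)^2 / n) / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(v * v for v in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

n1, n2 = len(group_a), len(group_b)
mean_a = sum(group_a) / n1            # 4
mean_b = sum(group_b) / n2            # 5
var_a = sample_variance(group_a)      # s1 squared
var_b = sample_variance(group_b)      # s2 squared

t = abs(mean_a - mean_b) / math.sqrt(var_a / n1 + var_b / n2)

print(round(math.sqrt(var_a), 3), round(math.sqrt(var_b), 3))   # s1, s2
print(round(t, 3))
```

The resulting t is about 0.64, a small value relative to the usual tabular values for samples of this size.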
ANALYSIS OF VARIANCE
One-way ANOVA is used when the research calls for comparison of the means of
two or more groups. In the case of a two-group comparison, the obtained F is equal to t².
Example:
Is there a significant difference in the mean errors committed by encoders using
different machines?
            MACHINES
        A     B     C     D
        18    3     7     11
        5     9     5     19
        10    14    15    20
        17    10    14    19
        10    9     9     11
∑       60    45    50    80
Mean    12    9     10    16
∑X = 235
∑X² = 3245
Steps:
1. Compute the totals per machine.
2. Compute the means.
3. Compute ∑X² = 3245.
4. Compute ∑X = 235.
5. Find the correction term, C, where n is the total number of observations:
$$C = \frac{(\sum X)^2}{n}$$
6. Find the total sum of squares:
$$SS_T = \sum X^2 - C$$
7. Find the sum of squares (SS) among means, where n is the number of observations
per machine:
$$SS_m = \frac{(\sum X_A)^2 + (\sum X_B)^2 + \dots + (\sum X_D)^2}{n} - C$$
8. Find the sum of squares within:
$$SS_w = SS_T - SS_m$$
Analysis of Variance
Total n -1 SStotal
Table values
.05 = 3.24
.01 = 5.29
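The ANOVA steps for the machine data can be sketched in Python as below. The sums of squares follow the formulas given in the steps. The degrees of freedom (k - 1 among means, n - k within) and F as the ratio of the two mean squares are the usual one-way ANOVA definitions and are stated here as assumptions, because only the Total row of the ANOVA table is shown. Variable names are illustrative.

```python
# Errors committed per machine, from the table above.
machines = {
    "A": [18, 5, 10, 17, 10],
    "B": [3, 9, 14, 10, 9],
    "C": [7, 5, 15, 14, 9],
    "D": [11, 19, 20, 19, 11],
}

all_values = [v for values in machines.values() for v in values]
n_total = len(all_values)                 # 20 observations
k = len(machines)                         # 4 machines

correction = sum(all_values) ** 2 / n_total                  # C
ss_total = sum(v * v for v in all_values) - correction       # SS total
# Each squared machine total is divided by that machine's n (5 here).
ss_among = sum(sum(v) ** 2 / len(v) for v in machines.values()) - correction
ss_within = ss_total - ss_among

df_among, df_within = k - 1, n_total - k   # assumed: 3 and 16
f_ratio = (ss_among / df_among) / (ss_within / df_within)

print(correction, ss_total, ss_among, ss_within)
print(round(f_ratio, 2))
# Table values given above: 3.24 at .05 and 5.29 at .01.
print("significant at .05" if f_ratio > 3.24 else "not significant at .05")
```

With these data F is about 2.25, below the .05 table value of 3.24.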