Introduction To Statistics: Romeo D. Caturao, D.SC., Ph.D. Dean, College of Fisheries
Statistical inference – Statistical inference comprises those methods concerned with the
analysis of a subset of data leading to predictions or inferences about the entire set of data.
Statistical Relationship?
Relationships in probability and statistics can generally be one of three things:
deterministic, random, or statistical.
                     Population   Sample
Mean                 µ            x̄
Standard deviation   σ            s
Example:
People may vary in sex, age, educational attainment, socioeconomic
status, religion, and others.
$$n = \frac{70}{1 + 70(0.0025)} = \frac{70}{1 + 0.175}$$
$$P_i = \frac{N_i}{N_t}$$
Step 3. Calculate the number of academic administrators required in every SUC using the
equation:
ni = Pi (nt)
Pi = the proportion of the distribution of the required number of academic administrators in every
SUC.
nt = the sample size or the representative number of academic administrators in Region VI.
Example of the required sample size of academic administrators in West Visayas State University (ni):
ni = 0.129 (60) = 7.74, or 8 (the required number of academic administrators at West Visayas
State University).
For Capiz State University:
Pi = 0.200; nt = 60
ni = 0.200 (60)
ni = 12.0 (the required sample size or number of academic
administrators in Capiz State University)
Table 1. Distribution of population and sample size of
academic administrators in State Universities and Colleges in
Region VI.
State Universities and Colleges                  Ni    Pi      ni
1. Aklan State University                         6    0.086    5
2. Carlos A. Hilado Memorial State College        5    0.071    4
3. Guimaras State College                         3    0.043    2
4. Iloilo State College of Fisheries              7    0.100    6
5. Northern Iloilo Polytechnic State College      9    0.129    8
6. Capiz State University (PSPC)                 14    0.200   12
7. University of Antique (PSCA)                  10    0.143    9
8. WVCST                                          7    0.100    6
9. West Visayas State University                  9    0.129    8
Total (Nt)                                       70    1.00    nt = 60
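The sample-size and allocation steps can be sketched in Python. This is a minimal illustration, not the author's procedure: the function names are hypothetical, and the margin of error of 0.05 is inferred from the 0.0025 (= 0.05 squared) that appears in the computation of n.

```python
# Sketch of the sample-size formula and the proportional allocation.
# Function names are illustrative; margin 0.05 is inferred from 0.0025 = 0.05**2.

def sample_size(population, margin):
    """n = N / (1 + N * e^2), the formula used for n above."""
    return population / (1 + population * margin ** 2)

def allocate(stratum_sizes, total_sample):
    """Unrounded share per stratum: n_i = P_i * n_t, with P_i = N_i / N_t."""
    total = sum(stratum_sizes)
    return [size / total * total_sample for size in stratum_sizes]

n_t = round(sample_size(70, 0.05))
shares = allocate([6, 5, 3, 7, 9, 14, 10, 7, 9], n_t)
print(n_t, [round(s, 2) for s in shares])
```

The shares are left unrounded on purpose: rounding each one independently does not always add up to nt, so the final whole numbers may need a small manual adjustment.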
Basic Sampling Techniques
There are two basic sampling techniques: (1) Probability Sampling
Techniques, and (2) Non-Probability Sampling Techniques
Also,
$$\sum_{i=2}^{3} x_i = x_2 + x_3 = 10 + 18 = 28$$
In general, the symbol $\sum_{i=1}^{n}$ means that we replace i wherever it
appears after the summation symbol by 1, then by 2, and so on
up to n, and then add up the terms. Therefore, we can write
$$\sum_{i=1}^{3} x_i^2 = x_1^2 + x_2^2 + x_3^2,$$
$$\sum_{j=2}^{5} x_j y_j = x_2 y_2 + x_3 y_3 + x_4 y_4 + x_5 y_5,$$
$$\sum_{x=1}^{9} x = 1 + 2 + \cdots + 9 = 45.$$
When we are summing over all the values of Xi that are available, the limits of
summation are often omitted and we simply write ∑ xi. If in the diet experiment
only 4 people were involved, then ∑ xi = X1 + X2 + X3 + X4. In fact some authors even
drop the subscript and let ∑ x represent the sum of all available data.
Solution: ∑ Xi = X1 + X2 + X3 = 3 + 5 + 7 = 15
Three theorems that provide basic rules in dealing with
summation notation are given below:
THEOREM 1.1
The summation of the sum of two or more variables is the sum of
their summations. Thus:
$$\sum_{i=1}^{n} (x_i + y_i + z_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i + \sum_{i=1}^{n} z_i.$$
THEOREM 1.2
If c is a constant, then
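$$\sum_{i=1}^{n} c x_i = c \sum_{i=1}^{n} x_i.$$
Proof. Expanding the left side and factoring out c,
$$\sum_{i=1}^{n} c x_i = c x_1 + c x_2 + \cdots + c x_n = c (x_1 + x_2 + \cdots + x_n) = c \sum_{i=1}^{n} x_i.$$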
THEOREM 1.3
If c is a constant, then
$$\sum_{i=1}^{n} c = nc.$$
Proof. If in Theorem 1.2 all the xi are equal to 1, then
$$\sum_{i=1}^{n} c = \underbrace{c + c + \cdots + c}_{n \text{ terms}} = nc.$$
The use of Theorems 1.1 through 1.3 in simplifying summation
problems is illustrated in the following examples.
Example 10: If x1 = 2, x2 = 4, y1 = 3, and y2 = -1, find the value of
$$\sum_{i=1}^{2} (3x_i - y_i + 4).$$
Solution
$$\begin{aligned}
\sum_{i=1}^{2} (3x_i - y_i + 4) &= \sum_{i=1}^{2} 3x_i - \sum_{i=1}^{2} y_i + \sum_{i=1}^{2} 4 \\
&= 3 \sum_{i=1}^{2} x_i - \sum_{i=1}^{2} y_i + (2)(4) \\
&= (3)(2 + 4) - (3 - 1) + 8 \\
&= 24
\end{aligned}$$
Example 11. Simplify
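$$\sum_{i=1}^{3} (x - i)^2.$$
Solution
$$\begin{aligned}
\sum_{i=1}^{3} (x - i)^2 &= \sum_{i=1}^{3} (x^2 - 2xi + i^2) \\
&= \sum_{i=1}^{3} x^2 - \sum_{i=1}^{3} 2xi + \sum_{i=1}^{3} i^2 \\
&= 3x^2 - 2x \sum_{i=1}^{3} i + \sum_{i=1}^{3} i^2 \\
&= 3x^2 - 2x(1 + 2 + 3) + (1 + 4 + 9) \\
&= 3x^2 - 12x + 14.
\end{aligned}$$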
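The two worked examples can be checked numerically. The following is only a sketch: it evaluates Example 10 both directly and through Theorems 1.1 to 1.3, and spot-checks the simplified form of Example 11 at a few arbitrarily chosen values of x.

```python
# Example 10: direct evaluation versus the form obtained from Theorems 1.1-1.3.
x = [2, 4]
y = [3, -1]
direct = sum(3 * xi - yi + 4 for xi, yi in zip(x, y))
by_theorems = 3 * sum(x) - sum(y) + len(x) * 4

# Example 11: compare the original sum with 3x^2 - 12x + 14.
def original(value):
    return sum((value - i) ** 2 for i in range(1, 4))

def simplified(value):
    return 3 * value ** 2 - 12 * value + 14

print(direct, by_theorems)
print([original(v) == simplified(v) for v in (-2, 0, 5)])
```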
Often our data may be classified according to two criteria. For
example Xij might represent the amount of gas released when a
chemical experiment is run at the ith temperature level and the
jth pressure level. To sum such observations, it is convenient to
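adopt a double-summation notation. The symbol
$$\sum_{i=1}^{m} \sum_{j=1}^{n} x_{ij}$$
denotes the sum of the observations xij over all m values of i and all n values of j.
Similarly, if f(xi, yj) represents the textbook sales for publisher xi at university yj, then
$$\begin{aligned}
\sum_{i=1}^{3} \sum_{j=1}^{2} f(x_i, y_j) &= \sum_{i=1}^{3} \left[ f(x_i, y_1) + f(x_i, y_2) \right] \\
&= f(x_1, y_1) + f(x_1, y_2) + f(x_2, y_1) + f(x_2, y_2) + f(x_3, y_1) + f(x_3, y_2)
\end{aligned}$$
gives the total sales of three publishers at two specific universities.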
Parameters and Statistics
- The terminology and notation adopted by statisticians in their
treatment of statistical data depend entirely on whether the data set
constitutes a population or a sample selected from a population.
- Consider, for example, the following set of data representing the
number of typing errors made by the secretary on 10 different pages
of a document: 1, 2, 1, 2, , 1, 1, 4, 0, 2, and 2.
- First let us assume that the document contains exactly 10 pages so
that the data constitute a small finite population. A quick study of
this population could lead to a number of conclusions. For instance,
we could make the statement that the largest number of typing
errors on any single page was 4, or we might state that the
arithmetic mean (average) of the 10 numbers is 1.5. The numbers 4
and 1.5 are descriptive properties of our population. We refer to
such values as parameters of the population.
Parameter
Any numerical value describing a characteristic of a population is
called a parameter.
- It is customary to represent parameters by Greek letters. By
tradition the arithmetic mean of a population is denoted by µ.
Hence, for our population of typing errors, µ = 1.5. Note that a
parameter is a constant value describing the population.
- Now let us suppose that the data representing the number of
typing errors constitute a sample obtained by counting the
number of errors on 10 pages randomly selected from a large
manuscript. Clearly, the population is now a much larger set of
data about which we only have partial information provided by
the sample. The numbers 4 and 1.5 are now descriptive measures
of the sample and are not to be considered parameters of the
population. A value computed from a sample is called a statistic.
Statistic
- Any numerical value describing a characteristic of a
sample is called a statistic.
- A statistic is usually represented by ordinary letters of the
English alphabet. If the statistic happens to be the sample
mean, we shall denote it by x̄. For our random sample of
typing errors we have x̄ = 1.5. Since many random samples
are possible from the same population, we would expect the
statistic to vary from sample to sample. In other words, if a
second random sample of 10 pages is selected from the
manuscript and the number of typing errors per page
tabulated, the largest value might turn out to be 5 rather
than 4, and the arithmetic mean would probably be close
to 1.5, but almost certainly different.
In our study of statistical inference we shall use
the value of a statistic as an estimate of the
corresponding population parameter. The size
of the population will, for the most part, be large
or infinite. To know how accurately the statistic
estimates the parameter, we must first
investigate the distribution of the values of the
statistic obtained in repeated sampling.
Measures of Central Location
To investigate a set of quantitative data, it is useful to define numerical
measures that describe important features of the data . One of the important
ways of describing a group of measurements, whether it be a sample or
population, is by the use of an average.
An average is a measure of the center of a set of data when the data are
arranged in an increasing or decreasing order of magnitude. For example, if an
automobile averages 14.5 kilometers to 1 liter of gasoline, this can be
considered a value indicating the center of several more values. In the country,
1 liter of gasoline may give considerably more kilometers than in the
congested traffic of a large city. The number 14.5 in some sense defines a center
value.
Any measure indicating the center of a set of data, arranged in an increasing
or decreasing order of magnitude, is called a measure of central location or a
measure of central tendency. The most commonly used measures of central
location are the mean, median, and mode. The most important of these, and
the one we shall consider first, is the mean.
Population Mean
If the set of data x1, x2, . . . , xN, not necessarily all
distinct, represents a finite population of size N, then
the population mean is
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
Example: The number of fishes at 5 different breeding
tanks are 3, 5, 6, 4, and 6. Treating the data as a
population, find the mean number of fishes for the 5
tanks.
$$\mu = \frac{3 + 5 + 6 + 4 + 6}{5} = 4.8$$
Sample Mean
If the set of data x1, x2, …, xn, not necessarily all
distinct represents a finite sample of size n, then the
sample mean is
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Example 2. A food inspector examined a random sample of 7
cans of a certain brand of tuna to determine the percent of
foreign impurities. The following data were recorded : 1.8,
2.1, 1.7, 1.6, 0.9, 2.7, and 1.8. Compute the sample mean.
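Both mean examples use the same arithmetic, so a single helper covers them. This is a minimal sketch; the function name `mean` is illustrative, and the only difference between the population and sample cases is how the result is interpreted (µ versus x̄).

```python
def mean(values):
    """Arithmetic mean: the sum of the observations divided by their count."""
    return sum(values) / len(values)

fish_per_tank = [3, 5, 6, 4, 6]                        # treated as a population
tuna_impurities = [1.8, 2.1, 1.7, 1.6, 0.9, 2.7, 1.8]  # random sample of 7 cans

population_mean = mean(fish_per_tank)
sample_mean = round(mean(tuna_impurities), 2)
print(population_mean, sample_mean)
```

For the tuna data the seven values sum to 12.6, so the sample mean works out to 1.8 percent.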
$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{\sum_{i=1}^{n} a x_i}{n} = a\bar{x}.$$
Therefore, if all observations are multiplied
or divided by a constant, the new
observations will have a mean that is the
same constant multiple of the original
mean. The mean of the numbers 4, 6, 14 is
equal to 8, and therefore, after dividing by
2, the mean of the set 2, 3, and 7 must be
8/2 = 4.
MEDIAN
The second most useful measure of central location is the
median. For a population we designate the median by
∑xi = 96 ∑xi = 96
X= 8 X= 8
Measures of Dispersion Using the Number Line system
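[Figure: number line for Group A, with values from 5 to 11 centered on the mean of 8; the deviations of -3 and +3 from the mean span a range of 6.]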
Measures of dispersion Using Variance
$$\sigma_A^2 = \frac{\sum (X_i - \bar{X})^2}{N - 1} = \frac{30}{12 - 1} = \frac{30}{11} = 2.73$$
$$s.d._A = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N - 1}} = \sqrt{\frac{30}{12 - 1}} = 1.65$$
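The variance and standard deviation for Group A can be reproduced from the two quantities given above, the sum of squared deviations (30) and the number of observations (12). This is a small sketch with illustrative variable names; it uses the N - 1 divisor exactly as the formulas above do.

```python
import math

sum_squared_deviations = 30   # sum of (Xi - mean)^2 for Group A, from the text
n_observations = 12

variance = sum_squared_deviations / (n_observations - 1)
standard_deviation = math.sqrt(variance)
print(round(variance, 2), round(standard_deviation, 2))
```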
Steps in ranking:
1. Arrange the data to be ranked in a descending or ascending order.
2. Assign consecutive numbers for each item from the highest to
lowest or vice versa.
3. Rank an item occurring once the same as its consecutive number.
4. The rank of an item occurring two or more times is obtained by adding
their consecutive numbers and dividing by the number of items.
Example: Rank the following average weight of
tilapia after one month culture.
83 82 79 86 80 82
80 76 78 77 79 81
82 84 80 81 78 79
85 76 75 75 85 84
No.   Ave. Wt.   Rank    Computation
1     86          1.0
2     85          2.5    2+3 = 5; 5÷2 = 2.5
3     85          2.5
4     84          4.5    4+5 = 9; 9÷2 = 4.5
5     84          4.5
6     83          6.0
7     82          8.0    7+8+9 = 24; 24÷3 = 8.0
8     82          8.0
9     82          8.0
10    81         10.5    10+11 = 21; 21÷2 = 10.5
11    81         10.5
12    80         13.0    12+13+14 = 39; 39÷3 = 13.0
13    80         13.0
14    80         13.0
15    79         16.0    15+16+17 = 48; 48÷3 = 16.0
16    79         16.0
17    79         16.0
18    78         18.5    18+19 = 37; 37÷2 = 18.5
19    78         18.5
20    77         20.0
21    76         21.5    21+22 = 43; 43÷2 = 21.5
22    76         21.5
23    75         23.5    23+24 = 47; 47÷2 = 23.5
24    75         23.5
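The four ranking steps can be sketched as a short function. This is one possible implementation under the assumption that ranking runs from highest to lowest, as in the tilapia example; the name `average_ranks` is hypothetical.

```python
def average_ranks(values):
    """Rank from highest to lowest; tied items share the mean of their positions."""
    ordered = sorted(values, reverse=True)            # step 1: arrange the data
    positions = {}
    for position, value in enumerate(ordered, start=1):   # step 2: consecutive numbers
        positions.setdefault(value, []).append(position)
    # Steps 3 and 4: a single item keeps its number, ties get the average.
    return {value: sum(p) / len(p) for value, p in positions.items()}

weights = [83, 82, 79, 86, 80, 82,
           80, 76, 78, 77, 79, 81,
           82, 84, 80, 81, 78, 79,
           85, 76, 75, 75, 85, 84]
ranks = average_ranks(weights)
for weight in sorted(ranks, reverse=True):
    print(weight, ranks[weight])
```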
Frequency Distribution of Interval
Using the following average weight of tilapia, follow the steps
and present the frequency distribution.
70 75 49 85 79 73 52 65 90 65 87 47 65 56
80 50 72 92 82 78 95 63 80 69 92 66 68 86
89 64 74 50 57 74 56 73 71 72 72 59 55 80
75 57 97 60 53 68 74 79 75 77 69 81 69 82
66 48 71 71 66 62 68 86 61 81 60 89 71
1. Determine the range. The range is the difference between the highest
score or data and the lowest score or data. In the given data the highest
score is 97 and the lowest score is 47. The range is computed as follows:
Range = H - L
Range = 97 - 47 = 50
2. Determine the acceptable size of the interval by
dividing the range by 10 and 15.
R/10 = 50/10 = 5
$$M = \frac{90 + 94}{2} = \frac{184}{2} = 92$$
7. Compute for the cumulative frequency less than “CF<“ and
the cumulative frequency greater than “CF>”.
Class interval   Tally (t)         (f)   M    CF<   CF>
95-99            ll                 2    97   75     2
90-94            lll                3    92   73     5
85-89            llll-l             6    87   70    11
80-84            llll-lll           8    82   64    19
75-79            llll-llll         10    77   56    29
70-74            llll-llll-llll    14    72   46    43
65-69            llll-llll-ll      12    67   32    55
60-64            llll-ll            7    62   20    62
55-59            llll-l             6    57   13    68
50-54            llll               4    52    7    72
45-49            lll                3    47    3    75
                                   N = 75
8. Compute the relative frequency and place it under RF (%).
The relative frequency is the quotient of the class frequency
over the total frequency, multiplied by 100 since RF is
expressed in percent.
$$RF(\%) = \frac{C}{TF} \times 100$$
where RF = the relative frequency
C = the class frequency
TF = the total frequency
Sample computation of RF for the second class interval
$$RF(\%) = \frac{C}{TF} \times 100 = \frac{3}{75} \times 100 = 4\%$$
9. Compute the cumulative relative frequency (CRF).
The cumulative relative frequency is computed
by adding the RF of each class interval from
above. The total cumulative relative frequency is 100%
or very close to 100%.
Class interval   Tally (t)         (f)   M    CF<   CF>   RF      CRF
95-99            ll                 2    97   75     2     2.67     2.67
90-94            lll                3    92   73     5     4.00     6.67
85-89            llll-l             6    87   70    11     8.00    14.67
80-84            llll-lll           8    82   64    19    10.67    25.34
75-79            llll-llll         10    77   56    29    13.33    38.67
70-74            llll-llll-llll    14    72   46    43    18.67    57.34
65-69            llll-llll-ll      12    67   32    55    16.00    73.34
60-64            llll-ll            7    62   20    62     9.33    82.67
55-59            llll-l             6    57   13    68     8.00    90.67
50-54            llll               4    52    7    72     5.33    96.00
45-49            lll                3    47    3    75     4.00   100.00
                                   N = 75
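The cumulative and relative frequency columns follow mechanically from the class frequencies, which the sketch below takes from the table (listed from the 95-99 class down to the 45-49 class). Variable names are illustrative. Here CRF is computed from the cumulative counts rather than by adding rounded RF values, so an entry can differ by 0.01 from a table built by hand.

```python
from itertools import accumulate

# Class frequencies from the table, top class (95-99) first.
freqs = [2, 3, 6, 8, 10, 14, 12, 7, 6, 4, 3]
total = sum(freqs)

cf_greater = list(accumulate(freqs))                 # CF>: running total from the top
cf_less = list(accumulate(reversed(freqs)))[::-1]    # CF<: running total from the bottom
rf = [round(f / total * 100, 2) for f in freqs]      # relative frequency in percent
crf = [round(c / total * 100, 2) for c in cf_greater]  # cumulative relative frequency

for row in zip(freqs, cf_less, cf_greater, rf, crf):
    print(row)
```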
Measures of Relationships
Question: Are measures of relationships, correlations, or
association descriptive statistics or inferential statistics?
Answer: If the researcher is using population data,
measures of relationships are descriptive statistics, since the
researcher is measuring the strength or degree of
relationships among variables in the population. But if the
researcher is using sample data, measures of relationships
fall under inferential statistics, and the
researcher will be testing a research hypothesis. Again,
both population and sample data have their own
corresponding statistical tools, chosen
according to the levels of measurement, for determining the
strength and significance of the relationships among variables.
Further, the difference between descriptive and
inferential statistics in the measures of relationships is that in
descriptive statistics you simply measure the degree or
strength of the relationship. In inferential statistics, the
researcher also determines whether or not
that degree or strength is
significant.
Sdx = 4.57
$$r_s = 1.0 - \frac{6 \sum D^2}{N (N^2 - 1)}$$
$$= 1.0 - \frac{6(21)}{10(10^2 - 1)}$$
$$= 1.0 - \frac{126}{990}$$
$$= 1.0 - 0.127$$
= 0.87; this indicates very high positive correlation
from the computed r-value using Pearson’s r
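The rank-correlation arithmetic above can be reproduced in a few lines. This is a sketch only: the function name is hypothetical, and the inputs (a sum of squared rank differences of 21 for N = 10 pairs) are the values that appear in the computation.

```python
def rank_correlation(sum_d_squared, n_pairs):
    """r_s = 1 - 6 * sum(D^2) / (N * (N^2 - 1))."""
    return 1 - (6 * sum_d_squared) / (n_pairs * (n_pairs ** 2 - 1))

r_s = rank_correlation(21, 10)
print(round(r_s, 2))
```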
If however, the data being analyzed were taken from a sample data,
the researcher needs to determine the significance of the
computed r-value by testing the significance of r-value using the
equation shown below:
$$t = r \sqrt{\frac{n - 2}{1 - r^2}}$$
$$= 0.87 \sqrt{\frac{10 - 2}{1 - (0.87)^2}}$$
$$= 0.87 (5.74)$$
$$= 4.99$$
In testing the significance of r-value, compare the computed t-
value with that of the tabular value at 0.05 alpha or at p-value
set at 0.05 alpha for a two-tailed test.
Calculated t-value     Tabular t-value
                       0.05     0.01
4.99*                  2.31     3.36
* p < 0.01, significant at 0.01 alpha
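The significance test for r can be sketched as follows. The function name is illustrative; the tabular values 2.31 and 3.36 are copied from the table above rather than computed, so the sketch needs no statistical library.

```python
import math

def t_for_correlation(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

t_value = t_for_correlation(0.87, 10)
tabular = {0.05: 2.31, 0.01: 3.36}   # two-tailed tabular t-values from the table
decisions = {alpha: abs(t_value) > critical for alpha, critical in tabular.items()}
print(round(t_value, 2), decisions)
```

Because the calculated value exceeds both tabular values, the correlation is judged significant at the 0.05 and the 0.01 levels.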
$$r_p = \frac{\sum d_x d_y}{(N - 1)(sd_x)(sd_y)}$$
$$sd_y = \sqrt{\frac{\sum (Y_i - M)^2}{N - 1}} = \sqrt{\frac{70}{9}} = 2.79$$
$$r_p = \frac{\sum d_x d_y}{(N - 1)(sd_x)(sd_y)} = \frac{48}{(9)(3.40)(2.79)} = 0.56$$
Determining Pearson’s Coefficient of Correlation Using the
Difference Method
The formula to use is:
whereby:
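$$sd_x = \sqrt{\frac{464 - 360}{9}} = \sqrt{\frac{104}{9}} = 3.40$$
$$sd_y = \sqrt{\frac{\sum Y^2 - \frac{(\sum Y)^2}{N}}{N - 1}} = \sqrt{\frac{560 - \frac{(70)^2}{10}}{10 - 1}} = \sqrt{\frac{560 - 490}{9}} = \sqrt{\frac{70}{9}} = 2.79$$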
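The standard deviations and the resulting Pearson coefficient can be recomputed from the sums shown above. This is a sketch under one stated assumption: the total of X is taken as 60, inferred from the 360 in the sdx computation (360 = 60 squared divided by 10), since that total does not appear explicitly in the text. Function and variable names are illustrative.

```python
import math

def sd_from_sums(sum_of_squares, total, n):
    """sd = sqrt((sum(X^2) - (sum(X))^2 / N) / (N - 1))."""
    return math.sqrt((sum_of_squares - total ** 2 / n) / (n - 1))

n = 10
sd_x = sd_from_sums(464, 60, n)   # sum of X = 60 is inferred, see lead-in
sd_y = sd_from_sums(560, 70, n)
sum_dx_dy = 48                    # sum of the products of deviations, from the text

r_p = sum_dx_dy / ((n - 1) * sd_x * sd_y)
print(round(sd_x, 2), round(sd_y, 2), round(r_p, 2))
```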
Computation of the coefficient of variation (CV)
$$CV = \frac{\sqrt{\text{error MS}}}{\text{grand mean}} \times 100$$
Analysis of Variance table
Source of Variation    df    SS           MS         Observed F    Tabular F
                                                                   5%      1%
Treatments              6    5,587,175    931,196    9.83**        2.57    3.81
Experimental Error     21    1,990,237     94,773
Total                  27    7,577,412
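The mean squares and the observed F in the table follow directly from the sums of squares and degrees of freedom. The sketch below recomputes them; names are illustrative, and the tabular F values are copied from the table rather than derived. The ratio of the two mean squares comes out at roughly 9.83.

```python
def mean_square(sum_of_squares, degrees_of_freedom):
    """MS = SS / df."""
    return sum_of_squares / degrees_of_freedom

ms_treatment = mean_square(5_587_175, 6)
ms_error = mean_square(1_990_237, 21)
f_observed = ms_treatment / ms_error

tabular_f = {"5%": 2.57, "1%": 3.81}   # from the table, for 6 and 21 df
print(round(ms_treatment), round(ms_error), round(f_observed, 2))
print({level: f_observed > value for level, value in tabular_f.items()})
```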
Comparisons Among Treatment Means
Introduction
- the significant F-test in the analysis of variance indicates the
presence of one or more real differences among the treatments
tested.
- It does not, however, locate the specific difference, or
differences, that may account for the significance.
- Thus, the analysis of variance should be considered only as the
first step in evaluating differences among treatments.
- The subsequent step is to locate the specific treatment
differences of interest to the researcher and test whether these
differences are significant.
- Thus, this topic is primarily concerned with how to locate and
test specific treatment comparisons.
Specific Comparison among treatment means can be arbitrarily classified into three types:
1. Comparison between pairs of treatment
2. Comparison between two groups of treatment
3. Trend comparison
Trend comparison
Is limited to treatments that are quantitative, for example, rates of fertilizer and distances
of planting. Although the trend comparison is slightly more complicated, it can also
generate more information than can be obtained from the first two comparisons.
COMPARISON BETWEEN PAIRS OF MEANS
- All that is done is to locate any pair of means whose difference exceeds the LSD value and declare
the means significantly different from each other.
- Strictly speaking, however, this procedure is valid when the experiment involves only two
treatments, because as more treatments are involved and when all possible pairs of treatment
means (unplanned comparisons) are to be tested the LSD becomes less and less precise.
- Note that the number of all possible pairs increases very rapidly as the number of treatments
increases, i.e., 10 possible pairs for 5 treatments, 45 for 10 treatments, and 105 for 15 treatments.
- Furthermore, it can be shown that the probability of at least one comparison, the largest vs. the
smallest, exceeding the LSD value at the 5% level of significance when, in fact, the difference is not
real, is 29% for 5 treatments, 63% for 10 treatments, and 83% for 15 treatments.
- This implies that when the experimenter thinks he is testing at the 5% level of significance, he is
actually testing at the 29% level of significance for 5 treatments, 63% for 10 treatments, and so on.
- Thus the LSD test can be, and often is, misused. Hence, special
care must be exercised in using this test. The following rules
have been found useful in the effort to use the LSD test
effectively:
1. Use the LSD test only when the F-test in the analysis of
variance is significant;
2. Do not use the LSD test for comparisons among all
possible pairs of means when the experiment involves more
than five treatments; and
3. Use the LSD test for preplanned comparisons regardless
of the number of treatments involved. For instance, in
comparing every treatment to a control, the LSD test can be used
even if there are more than five treatments.
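The pair counts quoted above (10, 45, and 105) are the number of ways to choose 2 treatments out of t, that is t(t - 1)/2. A short sketch, with an illustrative function name:

```python
from math import comb

def possible_pairs(treatments):
    """Number of distinct pairs among t treatment means: t * (t - 1) / 2."""
    return comb(treatments, 2)

pair_counts = {t: possible_pairs(t) for t in (5, 10, 15)}
print(pair_counts)
```

The quadratic growth in the number of comparisons is what makes unplanned all-pairs testing with a single LSD value increasingly unreliable.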
Equal Replication
- When every treatment is replicated r times, the formula for calculating
the LSD value at a certain level of significance, say α, is
$$LSD_\alpha = t_\alpha \sqrt{\frac{2s^2}{r}},$$
** = significant at 1% level, * = significant at 5% level, ns = not significant.
To judge which differences are statistically significant, each
difference is compared with the computed LSD values. If it
exceeds the LSD value at the 1% level, two asterisks are used
to indicate that the difference is highly significant. If it
exceeds the LSD value only at the 5% level, one asterisk is
used. Otherwise, use ns to indicate that the difference is not
significant. Of the six insecticides, only the Dimecron-Krap
treatment was not significantly different from the control.
Others are significantly superior either at 5% level or at 1%
level.
Unequal replication
$$LSD_{.05} = 2.045 \sqrt{\frac{2(176{,}532)}{4}} = 608 \text{ kg/ha}$$
$$LSD_{.01} = 2.756 \sqrt{\frac{2(176{,}532)}{4}} = 819 \text{ kg/ha}$$
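The two LSD values for treatments with four replications can be recomputed from the quantities shown: the error mean square of 176,532 and the tabular t-values 2.045 and 2.756. This is a sketch with a hypothetical function name; it covers only the case of equal replication between the two means being compared.

```python
import math

def lsd(t_value, error_mean_square, replications):
    """LSD = t * sqrt(2 * s^2 / r) for two means with r replications each."""
    return t_value * math.sqrt(2 * error_mean_square / replications)

lsd_05 = lsd(2.045, 176_532, 4)
lsd_01 = lsd(2.756, 176_532, 4)
print(round(lsd_05), round(lsd_01))
```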
To compare a treatment with three replications with the control, the following
are used:
The results are shown in the preceding tables. All treatments are significantly
different from the control at either the 5% or 1% level.
Table. Grain yield of rice under different types, rates, and times of application of post-emergence
herbicides under upland-rainfed condition, IRRI.
Treatment type        Rate           Time of        Grain yield (kg/ha)           Treatment   Treatment
                      (kg a.i./ha)   application                                  total       mean
                                     (DAS)
Propanil/Bromoxynil   2.0/0.25       21             3,187  4,610  3,562  3,217    14,576      3,644
Propanil/2,4-D-Bee    3.0/1.00       28             3,390  2,875  2,775            9,040      3,013
Propanil/Bromoxynil   2.0/0.25       14             2,797  3,001  2,505  3,490    11,793      2,948
Propanil/Ioxynil      2.0/0.50       14             2,832  3,103  3,448  2,255    11,638      2,910
Propanil/CHCH         3.0/1.50       21             2,233  2,743  2,727            7,703      2,568
Phenmedipham          1.5            14             2,952  2,272  2,470            7,694      2,565
Propanil/Bromoxynil   2.0/0.25       28             2,858  2,895  2,458  1,723     9,934      2,484
Propanil/2,4-D-IPE    3.0/1.0        28             2,308  2,335  1,975            6,618      2,206
Propanil/Ioxynil      2.0/0.50       28             2,013  1,788  2,248  2,115     8,164      2,041
Handweeded twice      -              15 and 35      3,202  3,060  2,240  2,690    11,192      2,798
Control               -              -              1,192  1,652  1,075  1,030     4,949      1,237
Treatment type   Rate           Time of       Replications   Treatment mean   Difference     LSD values
                 (kg a.i./ha)   application   (no.)          (kg/ha)          from control   5%     1%
                                (DAS)
Control          -              -             4              1,237            -              -      -
- In contrast to the LSD test in which only one value is computed for all comparisons, Duncan's multiple range
test (DMRT) prescribes a set of significant differences of increasing sizes depending upon the distance between
the rankings of the two means to be compared.
- While additional computations are required for the DMRT, the test overcomes the major defects of the LSD test
in that the DMRT can be used to test differences among all possible pairs of treatments regardless of the number
of treatments involved and still maintain the prescribed level of significance.
$$s_{\bar{x}} = \sqrt{\frac{s^2}{r}},$$
Step 3. Calculate the shortest significant ranges for various ranges of means as follows:
$$R_p = r_p \, s_{\bar{x}}$$
where rp (p = 2, 3 …….., t) are the values of significant studentized ranges
obtained from appendix 6 based on the error degrees of freedom and t is the
number of treatments.
Step 4. Group the treatment means according to the statistical significance.
For this, the following method may be used:
1. From the largest mean, subtract the “shortest significant range” of the
largest p. Declare all means less than this value significantly different from
the largest mean. For the remaining means not declared significantly
different, compare the range (i.e., difference between the largest and the
smallest) with appropriate Rp . If the range is smaller than its corresponding
Rp, all remaining means are not significantly different.
2. From the second largest mean, subtract the second largest Rp.
Declare all means less than this value significantly different from the second
largest mean. Then, compare the range of the remaining means with the
appropriate Rp.
3. Continue the process with the third largest mean, then the fourth, and
so on, until all means have been properly compared.
The steps in the computation of Duncan's Multiple Range Test will be
illustrated using the data discussed previously.
Step 1. Arrange the means in decreasing order as follows:
Treatment              Mean yield (kg/ha)   Rank
T2: Dol-mix (2 kg)     2,678                1
T3: DDT + γ-BHC        2,552                2
T4: Azodrin            2,128                3
T1: Dol-mix (1 kg)     2,127                4
T5: Dimecron-Boom      1,796                5
T6: Dimecron-Krap      1,681                6
T7: Control            1,316                7
p rp (0.05)
2 2.94
3 3.09
4 3.18
5 3.25
6 3.30
7 3.33
Then, compute the "shortest significant ranges" using the formula from Step 3 as follows:
p    Rp = rp sx̄
2    (2.94)(153.93) = 453
3    (3.09)(153.93) = 476
4    (3.18)(153.93) = 490
5    (3.25)(153.93) = 500
6    (3.30)(153.93) = 508
7    (3.33)(153.93) = 513
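The standard error of a treatment mean and the shortest significant ranges can be recomputed as a check. This is a sketch under stated assumptions: the error mean square of 94,773 comes from the analysis of variance table, and r = 4 replications is inferred from 28 observations (total df of 27) spread over 7 treatments. The studentized range values are copied from the table above, and results agree with the listed Rp values to within rounding.

```python
import math

error_mean_square = 94_773   # from the analysis of variance table
replications = 4             # inferred: 28 observations / 7 treatments

s_xbar = math.sqrt(error_mean_square / replications)

# Significant studentized ranges r_p at the 5% level, copied from the text.
r_p = {2: 2.94, 3: 3.09, 4: 3.18, 5: 3.25, 6: 3.30, 7: 3.33}
shortest_ranges = {p: round(value * s_xbar) for p, value in r_p.items()}

print(round(s_xbar, 2))
print(shortest_ranges)
```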
Step 4. The following steps should be followed in grouping the means according to
statistical significance.
1. From the largest treatment mean of 2,678 kg/ha, subtract the largest Rp (i.e., R7
= 513) and obtain the difference of 2,165 kg/ha. Declare all treatment means less
than 2,165 kg/ha as significantly different from the largest mean. From the array of
means shown in Step 1, all treatments except T3 are significantly different from T2.
Since there are two remaining means, namely, T2 and T3, their difference of 2,678 -
2,552 = 126 kg/ha is compared to R2 = 453. Since the difference is smaller than R2,
T2 and T3 are not significantly different.
Draw a vertical line connecting the means of T2 and T3 as follows:
Treatment   Mean yield (by rank)
T2          2,678   |
T3          2,552   |
T4          2,128
T1          2,127
T5          1,796
T6          1,681
T7          1,316
- The vertical line is a convenient and widely accepted way of marking
differences among treatment means. Note that all means connected by the
same line have been judged not significantly different from each other.
b. From the second largest mean (T3) of 2,552 kg/ha, subtract the second
largest Rp (i.e., R6); i.e. 2,552-508 = 2,044. Declare all means less than 2,044
kg/ha significantly different from the mean of T3. Here, the means of T5, T6,
and T7, are all less than 2,044 kg/ha and hence are significantly different from
T3.
- The range of the remaining means that were not declared significantly
different is T3-T1 = 2,552 – 2,127 = 425 which is less than R3=476. Hence, all
remaining means, namely, T3, T4, and T1, are not significantly different. Draw a
vertical line connecting the means of T3, T4, and T1 as follows:
Treatment   Mean
T2          2,678   |
T3          2,552   |  |
T4          2,128      |
T1          2,127      |
T5          1,796
T6          1,681
T7          1,316
(d) Here the process can be continued with the fourth largest mean and so on.
However, since T7 is, at this stage, the only mean outside of groupings already
made, it is simpler just to compare T7 with the rest of the means (namely, T1, T5,
and T6) using the appropriate Rp. The comparisons are:
$$F\text{-ratio} = \frac{MSBG}{MSWG}$$
Bet Groups
Within Groups
Total
The Chi-Square Test
Measures of relationships on ordered sets of data on two or more nominal or
ordinal variables can be measured by the Chi-Square Test for a one sample
case, Chi-Square Test for two sample cases, the tetrachoric correlation, the
phi correlation, the rank biserial correlation, and the point biserial correlation.
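$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
where: fo = the observed frequency and fe = the expected frequency
$$\sum \frac{(f_o - f_e)^2}{f_e} = 12$$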
_______________________________________________________________________
Calculated χ² Value Tabular Value
0.05 0.01
_______________________________________________________________________
High (Php120,000 and above)            5    40    45     90
Average (Php72,000 to Php119,999)     10    20    45     75
Low (below Php72,000)                 20     8     8     36
Total                                 35    68    98    201
Steps in the process of computing the calculated Chi-Square
value:
Step 1. Examine the cell frequencies in the table and
determine whether there are zero cell frequencies, whether the degrees of
freedom are equal to one, or whether any cell frequency is less than five.
This check identifies the appropriate tool for the
analysis of the data, based on the constraints on when to use a
Chi-Square Test.
Step 2. By inspection of the contingency table, the degrees of
freedom are greater than one, there is no zero cell frequency, and no cell
frequency is less than 5, so the standard Chi-Square formula
will be used.
Step 3. Designate the corresponding cells by the use of a
letter symbol such as for bachelor’s degree with high salary rate as
cell a, those with master’s degree and with high salary rate is
designated as cell b and so on.
Step 4. Compute the expected cell frequency by using the
equation :
$$\text{Expected Frequency } (f_e) = \frac{(\text{row total})(\text{column total})}{\text{Grand Total}}$$
For example in cell a, the row total is 90, the column total is 35, and the grand total is 201. Compute the expected
frequency for the following cells:
Cell   fo   fe   fo - fe   (fo - fe)²   (fo - fe)² / fe
A       5   16   -11       121           7.56
B      40   30    10       100           3.33
C      45   44     1         1           0.02
D      10   13    -3         9           0.69
E      20   25    -5        25           1.00
F      45   37     8        64           1.73
G      20    6    14       196          32.67
H       8   12    -4        16           1.33
I       8   18   -10       100           5.56
∑ (fo - fe)² / fe = 53.89
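The expected frequencies and the Chi-Square value can be recomputed from the contingency table. In this sketch the expected frequencies are rounded to whole numbers because the worked table does so; that rounding is a property of this worked example, not of the Chi-Square formula in general. Variable names are illustrative.

```python
# Observed frequencies: rows are salary levels (high, average, low),
# columns are the three groups in the contingency table.
observed = [
    [5, 40, 45],
    [10, 20, 45],
    [20, 8, 8],
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, f_o in enumerate(row):
        # fe = (row total)(column total) / grand total, rounded as in the table.
        f_e = round(row_totals[i] * column_totals[j] / grand_total)
        chi_square += (f_o - f_e) ** 2 / f_e

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi_square, 1), degrees_of_freedom)
```

The result is about 53.9; the small gap from 53.89 arises because the table rounds each cell's contribution to two decimals before adding.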
6. Make a decision on the null hypothesis by using the table and figure
below:
_____________________________________________________________
Calculated x2 value Tabular x2 value
0.05 0.01
_____________________________________________________________
53.89 9.49 13.28
_____________________________________________________________
- The t-test for two sample cases is of two types: (a) the t-test
for independent samples, and (b) the t-test for dependent
samples.
- The t-test for independent samples is used when the two sets of data are
taken from two distinct groups, while the t-test for dependent
samples is used when the two sets of data are taken from the same group.
$$t = \frac{15 - 10}{0.458}$$
where: $$SE_m = \frac{sd}{\sqrt{n}} = \frac{4.58}{\sqrt{100}} = 0.458$$
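The standard error and the resulting t-value can be sketched as follows. The numbers 15, 10, 4.58, and 100 are taken from the computation above; reading 15 as the sample mean and 10 as the value it is compared against is an assumption, since the surrounding problem statement is not shown. Function names are illustrative.

```python
import math

def standard_error(sd, n):
    """SEm = sd / sqrt(n)."""
    return sd / math.sqrt(n)

def t_statistic(first_mean, second_mean, se):
    """t = (difference between the two means) / standard error."""
    return (first_mean - second_mean) / se

se_mean = standard_error(4.58, 100)
t_value = t_statistic(15, 10, se_mean)   # 15 and 10: assumed roles, see lead-in
print(round(se_mean, 3), round(t_value, 2))
```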