
Introduction to Statistics

Romeo D. Caturao, D.Sc., Ph.D.


Dean, College of Fisheries
What is Statistics

• Statistics is a science which deals with the methods of collection, presentation, analysis, and interpretation of data as bases for decision making.

Observation – refers to any recording of information, whether it be numerical or categorical.
Numerical data – represent counts or measures.
Categorical data – data that can be classified according to some criterion.
Statistical methods

– are those procedures used in the collection, presentation, analysis, and interpretation of data.

– Statistical methods are categorized into the following major areas:
• Descriptive statistics
• Inferential statistics
• Relational statistics
Descriptive statistics – comprises those methods concerned with collecting and describing a set of data so as to yield meaningful information.

Statistical inference – comprises those methods concerned with the analysis of a subset of data leading to predictions or inferences about the entire set of data.
Statistical Relationship?
Relationships in probability and statistics can generally be one of three things:
deterministic, random, or statistical. 

A statistical relationship is a mixture of deterministic and random relationships.

• A deterministic relationship involves an exact relationship between two variables. For example, suppose you earn P10 per hour: for every hour you work, you earn exactly ten pesos more.
• A random relationship is a bit of a misnomer, because there is no relationship between the variables. However, random processes may make it seem like there is one. For example, you spend P20 on lottery tickets and win P25. That "win" is due to random chance, but it could cause you to think that for every P20 you spend on tickets, you'll win P25 more (which, of course, is false).
How to Tell if a Statistical Relationship Exists

• A statistical relationship exists if a change in one variable (X) results in a systematic change in another (Y). The systematic change doesn't have to be exact (i.e., up by ten units each time), but it should be approximately the same ("around ten"). Ways to measure statistical relationships include:
A line graph or scatterplot can give you an idea about relationships. The relationship doesn't have to be a straight line (a "linear relationship"), but it should show a definite pattern (a parabolic curve is an example of a "nonlinear relationship"). The pattern doesn't have to be perfect, and is sometimes called a "trend."
Correlation is a measure of relationship strength. Pearson's r is one numerical measure of correlation strength. A value of 0 means no relationship, and a value of 1 (or -1) means a perfect relationship.
Cohen's d is also a measure of relationship (effect) strength. In general, values close to zero indicate "small" effects and values closer to one indicate "large" effects.
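As a sketch of how Pearson's r is obtained, the following Python snippet computes it directly from its definition; the hours/pay data are hypothetical, chosen so the relationship is close to, but not exactly, linear:

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient r for paired observations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: pay rises systematically, though not exactly,
# with hours worked, so r is close to 1.
hours = [1, 2, 3, 4, 5]
pay = [12, 19, 31, 42, 48]
print(round(pearson_r(hours, pay), 3))  # → 0.994
```

A perfectly deterministic relationship (e.g. pay exactly 10 per hour) would give r = 1 exactly.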
Population and sample

• Statistics is primarily concerned with data, most specifically quantitative data. In any particular study, the number of possible observations may be small or large, and finite or infinite.

Population – refers to the totality of the observations with which we are concerned, whether their number be finite or infinite.
In past years, the word population referred to the observations obtained from statistical studies involving people. Today, statisticians use the term population to refer to observations relevant to anything of interest, whether it be groups of people, animals, objects, materials, measurements, or happenings of any kind.
A sample – is a subset of the population. If inferences about the population based on the sample are to be valid, a sample that is representative of the population must be obtained. Oftentimes one is tempted to choose a sample by selecting the most convenient members of the population. Such a procedure may lead to erroneous inferences concerning the population.
Parameters and Estimates

A parameter – is a property descriptive of the population.

Estimate – refers to a property of a sample drawn at random from a population. The sample value is presumed to be an estimate of a corresponding population parameter.

The distinction between the parameter and the estimate is reflected in the statistical notation that follows:
Example:

                      Population    Sample
Mean                  µ             x̄
Standard deviation    σ             s

The Greek letter represents the parameter, and the Roman letter represents the estimate.
Variable and Constant

Variable – is any measurable characteristic that varies or differs from person to person, environment, or experimental treatment.

Example: People may vary in sex, age, educational attainment, socio-economic status, religion, and others.
Constant – in contrast to a variable, is a quantity whose value does not change.

Example:
When a researcher wants to study pupils' performance with only one level of age, such as six-year-old grade one pupils, age is a constant, not a variable.
Two Classes of Variables

1. Independent Variable – is a variable that the researcher uses to describe or explain differences, or one that may cause changes in the dependent variable.
There are two types of independent variables:
a. Subject variable – a type of independent variable that is based on the measurable characteristics of the subject, which the researcher cannot directly change. It is a condition of the subject that exists before the research begins.
b. Manipulated or experimental variable – a type of independent variable that the researcher systematically controls or manipulates and to which the subjects are assigned.

2. Dependent Variable – is an outcome of interest that is observed and measured by the researcher in order to assess the effects of the independent variable.
Types of Variables

1. Qualitative Variables – refer to attributes or characteristics being studied that are non-numeric. When the data being studied are qualitative, the researchers are interested in how many, or what number, fall in each category. Qualitative data are often summarized in charts and bar graphs.
Example:
Civil status, religion, type of dwelling, type of cell phone used, etc.
How many Christians and Muslims are in the Philippines?
What percent of the population is below 18 years old?

2. Quantitative Variables – refer to variables under consideration that can be reported numerically. These variables can be in the form of whole numbers, decimals, or fractions.
Example: Quantitative variables are scores on a test, the size of the family, and the distance between Iloilo City and Roxas City.
Variables According to Values
1. Discrete (or discontinuous) variables are variables that have distinct divisions and whose values or levels cannot take decimal form.
Example: the number of students in a class, say 35 (not 34.5 or 35½); it can only take a specific value in the set of natural counting numbers 1, 2, 3, ….
2. Non-discrete (continuous) variables are variables whose levels can take a continuous value, down to the most specific, precise, and definite measurements.

Example: The weight of a student may take values in decimal form, such as 42.625 kilograms, and a height may be 1.45 meters.
Variables According to Level of Measurement
1. Nominal variable – the level of measurement that classifies observations by attempting to put into the same category observations that are most similar to each other.
Example: Students may be classified according to their sex (male or female), type of school graduated from (public or private), or residence (urban or rural areas).
2. Ordinal variable – the level of measurement where observations are categorized according to rank or order, in terms of the relative degree of a characteristic they possess.
Example: Police officers can be classified according to their ranks, such as SPO1, SPO2, SPO3, etc.
3. Interval variable – a variable which permits statements of sameness or difference, such as greater than or less than, with equal intervals between values. An interval variable does not have a true zero point.
Example: The researcher uses a questionnaire to determine consumers' attitudes toward a certain product. The questionnaire is scored on the number of items answered Yes or No. The researcher finds that the scores range from 28 to 78. He may now decide to treat the scores as interval data, in which case he assumes that equal distances between scores (28-32, 33-37, 38-42, etc.) represent equal distances in attitude.
4. Ratio variable – a variable which permits statements of equality of ratios. An absolute zero is always implied.
Example: A scale used to measure length, width, and height would be a ratio scale. The absence of length (that is, no length) is the zero point of the scale.

Example: In business, money can be considered a ratio scale because the zero point on the scale represents the absence of money or income.
Let's go back to sampling:
Example: The researcher would like to conduct a study on administrators' performance in State Universities and Colleges (SUCs) in Region VI, for which the distribution of the population of academic administrators (Ni) was as follows:
Aklan State University – 6
Carlos A. Hilado Memorial State College – 5
Guimaras State College – 3
Iloilo State College of Fisheries – 7
Northern Iloilo Polytechnic College – 9
Panay State Polytechnic College – 14
University of Antique – 10
Western Visayas State College of Science and Technology – 7
West Visayas State University - 9
Determining the Required Sample Size in State Universities and Colleges:
Step 1. Calculate the sample size using Slovin's equation, nt = Nt / (1 + Nt·e²). Set the allowable margin of error e at 5%, which gives a 95% level of confidence.

nt = 70 / [1 + 70(0.05)²]
   = 70 / [1 + 70(0.0025)]
   = 70 / (1 + 0.175)
   = 59.57 (the required sample size of academic administrators of SUCs in Region VI)
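Step 1 can be checked with a short Python sketch of Slovin's formula (the text rounds the result up to nt = 60 for the later steps):

```python
import math

def slovin(N, e=0.05):
    """Slovin's formula: required sample size n = N / (1 + N * e^2)."""
    return N / (1 + N * e ** 2)

n = slovin(70, 0.05)   # 70 / (1 + 70 * 0.0025) = 70 / 1.175
print(round(n, 2))     # → 59.57
print(math.ceil(n))    # → 60, the nt used in Steps 2 and 3
```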
Step 2. Calculate the proportion (Pi) of the distribution of academic administrators in every SUC using the equation

Pi = Ni / Nt

where
Pi = proportion of academic administrators in every SUC
Ni = population of academic administrators in every SUC
Nt = total number of academic administrators in State Universities and Colleges in Region VI
For Aklan State University, the total number of academic administrators was 6, and the total number of academic administrators in Region VI was 70. Using the aforementioned equation, the obtained Pi for ASU is 6/70 = 0.086. Do the same with the other SUCs. Take note that the summation of Pi is equal to 1.00.

Step 3. Calculate the number of academic administrators required in every SUC using the equation:

ni = Pi (nt)

where:
ni = the required sample size for every SUC
Pi = the proportion of the distribution of academic administrators in every SUC
nt = the sample size, or the representative number of academic administrators, in Region VI

Example of the required sample size of academic administrators in West Visayas State University (ni):

ni = 0.129 (60) = 7.74, or 8 (the required number of academic administrators at West Visayas State University)
For Capiz State University:

Pi = 0.200; nt = 60
ni = 0.200 (60)
ni = 12.0 (the required sample size, or number of academic administrators, in CSU)
Table 1. Distribution of the population and sample size of academic administrators in State Universities and Colleges in Region VI.

State Universities and Colleges                 Ni      Pi      ni
1. Aklan State University                        6    0.086      5
2. Carlos A. Hilado Memorial State College       5    0.071      4
3. Guimaras State College                        3    0.043      2
4. Iloilo State College of Fisheries             7    0.100      6
5. Northern Iloilo Polytechnic State College     9    0.129      8
6. Capiz State University (PSPC)                14    0.200     12
7. University of Antique (PSCA)                 10    0.143      9
8. WVCST                                         7    0.100      6
9. West Visayas State University                 9    0.129      9
Total (Nt)                                      70    1.00    nt = 60
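Steps 2 and 3 for all nine SUCs can be sketched in Python. The school names and Ni values come from the text; ni is printed before rounding, which accounts for small mismatches with the rounded ni column of Table 1:

```python
# Pi = Ni / Nt for each SUC, then ni = Pi * nt.
populations = {
    "Aklan State University": 6,
    "Carlos A. Hilado Memorial State College": 5,
    "Guimaras State College": 3,
    "Iloilo State College of Fisheries": 7,
    "Northern Iloilo Polytechnic State College": 9,
    "Capiz State University": 14,
    "University of Antique": 10,
    "WVCST": 7,
    "West Visayas State University": 9,
}
Nt = sum(populations.values())   # 70 academic administrators in Region VI
nt = 60                          # sample size from Step 1, rounded up

for school, Ni in populations.items():
    Pi = Ni / Nt
    print(f"{school}: Pi = {Pi:.3f}, ni = {Pi * nt:.2f}")
```

Note that the proportions Pi sum to 1.00, as Step 2 requires.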
Basic Sampling Techniques
There are two basic sampling techniques: (1) probability sampling techniques, and (2) non-probability sampling techniques.

1. Probability Sampling Techniques – a process of selecting a sample in such a way that all individuals in the defined population have an equal and independent chance of being selected for the sample; the process is called randomization.

In other words, every individual has the same probability of being selected, and selection of one individual in no way affects selection of another individual. Several sampling procedures can be used under this technique.
a. Simple Random Sampling – this sampling technique uses the lottery method of getting a sample. If the total population is, say, 1,000 and the required sample size is 286, you may prepare small pieces of paper numbered from 1 to 1,000. The numbered pieces of paper are put in a box. Mix them by shaking the box and draw the required sample size. Take note that when you draw the first number, say 25, then 25 becomes one of your 286 samples out of the population of 1,000. Before you draw the next member of your sample, do not forget to return the drawn number to the box, so that the probabilities remain the same (summing to 1.0) throughout the drawing of samples.
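The lottery above can be sketched in Python. Note one assumption made explicit in the code: the common practice is to draw without replacement (no person is selected twice), while the lottery as described returns each slip to the box, which is drawing with replacement; both are shown:

```python
import random

random.seed(42)  # reproducible for illustration

population = range(1, 1001)  # slips numbered 1 to 1,000
n = 286                      # required sample size

# The usual practice: draw without replacement, so no number repeats.
sample = random.sample(population, n)

# The lottery as described in the text returns each drawn slip to the
# box before the next draw, i.e. draws with replacement:
sample_wr = random.choices(population, k=n)

print(len(sample), len(set(sample)))  # → 286 286
```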
b. Stratified Sampling – the process of selecting a sample in such a way that identified sub-groups in the population are represented in the sample in the same proportion in which they exist in the population. It can also be used to select equal-sized samples from each of a number of sub-groups if sub-group comparisons are desired. Proportional stratified sampling would be appropriate, for example, if you are going to take a survey prior to a national election in order to predict the probable winner. You would want your sample to represent the voting population; therefore, you would want the same proportions of, for example, the Nacionalista Party and the Liberal Party in your sample as exist in the population.
Other likely variables for proportional stratification might include
educational background, gender, socio-economic status, and the
like. On the other hand, equal-sized samples would be desired if
you want to compare the performance of different sub-groups.
The steps in stratified sampling are very similar to those in random
sampling except that selection is from subgroups in the population
rather than the population as a whole. Stratified sampling involves the
following steps:
1. Identify and define the population.
2. Determine desired sample size as in Table 1.
3. Identify the variable and subgroups (strata) for which you
want to guarantee appropriate representation (either
proportional or equal).
4. Classify all members of the population as members of one of
the identified subgroups.
5. Randomly select (using a table of random numbers or by
using lottery method) an “appropriate” number of individuals
from each of the subgroups, “appropriate” meaning either a
proportional number of individuals or an equal number of
individuals.
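The five steps above can be sketched in Python. The voter records, party names, and population size here are hypothetical, and the proportional allocation uses simple rounding:

```python
import random

random.seed(1)  # reproducible for illustration

# Hypothetical population of 1,000 voters, each tagged with a stratum.
population = [{"id": i, "party": random.choice(["Nacionalista", "Liberal"])}
              for i in range(1000)]
n = 100  # desired total sample size

# Step 4: classify all members of the population by subgroup (stratum).
strata = {}
for person in population:
    strata.setdefault(person["party"], []).append(person)

# Step 5: randomly select a proportional number from each stratum.
sample = []
for party, members in strata.items():
    share = round(n * len(members) / len(population))
    sample.extend(random.sample(members, share))

print(len(sample))  # close to 100; the exact total depends on rounding
```

For equal-sized (rather than proportional) allocation, `share` would simply be set to the same constant for every stratum.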
c. Cluster Sampling – a sampling process in which groups, not individuals, are randomly selected. All the members of selected groups have similar characteristics. It results from a two-stage process in which the population is divided into clusters and a subset of the clusters is randomly selected. Clusters are commonly based on geographic areas or districts.
For example, the sample for a household survey taken in a city may be selected by using city blocks as clusters; a random sample of city blocks is selected, and all households within the selected city blocks are surveyed.
Steps in Cluster Sampling
The steps in cluster sampling are not very different from those involved in random sampling. The major difference is that random selection of groups (clusters) is involved, not individuals. Cluster sampling involves the following steps:
1. Identify and define the population.
2. Determine the desired sample size.
3. Identify and define a logical cluster.
4. List all clusters (or obtain a list) that comprise the population.
5. Estimate the average number of population members per cluster.
6. Determine the number of clusters needed by dividing the sample size by the estimated size of a cluster.
7. Randomly select the needed number of clusters (using a table of random numbers).
8. Include in your study all population members in each selected cluster.
Cluster sampling can be done in stages, involving selection of clusters within clusters. This process is called multi-stage sampling. For example, schools can be randomly selected, and then classrooms within each school can be randomly selected.
An Example of Cluster Sampling:
Let us see how our superintendent would get a sample of teachers if cluster sampling is used. The steps are as follows:
1. The population is all 5,000 teachers in the superintendent's school system.
2. The desired sample size is 500.
3. A logical cluster is a school.
4. The superintendent has a list of all the schools in the district; there are 100 schools.
5. Although the schools vary in the number of teachers per school, there is an average of 50 teachers per school.
6. The number of clusters (schools) needed equals the desired sample size, 500, divided by the average size of a cluster, 50. Thus, the number of schools needed is 500 ÷ 50 = 10.
7. Therefore, 10 of the 100 schools are randomly selected.
8. All the teachers in each of the 10 schools are in the sample (10 schools, 50 teachers per school, equals the desired sample size).
d. Systematic Sampling – a sampling technique in which individuals are selected from a list by taking every "kth" name. What is a "kth" name? That depends on what k is. If k = 4, selection involves taking every 4th name; if k = 10, every 10th name; and so forth. What k actually equals depends on the size of the list and the desired sample size. The major difference between systematic sampling and the other types of sampling is the fact that all members of the population do not have an independent chance of being selected for the sample. Once the first name is selected, all the rest of the individuals to be included in the sample are automatically determined. Even though choices are not independent, a systematic sample can be considered a random sample if the list of the population is randomly ordered. One or the other has to be random: either the selection process or the list. If, however, the list of the population is not randomly ordered, systematic sampling is only as good as non-probability sampling.
Steps in Systematic Sampling:
1. Identify and define the population.
2. Determine the desired sample size.
3. Obtain a list of the population.
4. Determine what k is equal to by dividing the size of the population by the desired sample size.
5. Start at some random place at the top of the population list.
6. Starting at that point, take every kth name on the list until the desired sample size is reached.
7. If the end of the list is reached before the desired sample size is reached, go back to the top of the list.
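The steps above can be sketched in Python; the population list of 500 names and the sample size of 50 are hypothetical:

```python
import random

random.seed(7)  # reproducible for illustration

names = [f"person_{i}" for i in range(500)]  # step 3: the population list
n = 50                                       # step 2: desired sample size
k = len(names) // n                          # step 4: k = 500 / 50 = 10

start = random.randrange(k)                  # step 5: random starting point
sample = names[start::k]                     # step 6: every kth name

print(k, len(sample))  # → 10 50
```

Because 500 is an exact multiple of k, the slice reaches the desired sample size without wrapping back to the top of the list (step 7).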
2. Non-probability Sampling Techniques
In non-probability sampling, there is no random selection of cases from the population; randomization is not part of this technique of getting a sample. The samples or subjects that are needed are simply taken or selected for a certain purpose of the study.
Example: You would like to determine the mathematics performance of students according to their socioeconomic status. In probability sampling, the distribution of socioeconomic status requires a sample of subjects drawn from those with low, average, and high socioeconomic status.
But in non-probability sampling, the researcher has the option to select subjects with only high, low, or average socioeconomic status.
Several sampling techniques
1. Accidental or Incidental Sampling
It is the process of getting subjects of study who happen to be available during the period.
Example: If the researcher would like to conduct a survey on which brands of toothpaste are top sellers in Region VI, the researcher has to identify the peak shopping hours in a certain mall, stand by at the exit gate, and interview the shoppers who come out about the brand of toothpaste they bought, until the researcher has met the desired sample size.
2. Quota Sampling
It is the process of getting a sample of subjects of study through a quota system.
Example: In making an opinion survey about the legalization of divorce in the Philippines, the researcher can assign a quota system of subjects of investigation, such as all fourth-year students taking Political Science courses in all Higher Education Institutions in every region.
3. Purposive Sampling
In this sampling technique, the researcher simply picks out the subjects that are representative of the population, depending on the purpose of the study.
Example: In a study on the NSAT performance of science students, the researchers can study only those students belonging to a high socioeconomic status, or those with an average socioeconomic status, but not a representative sample of all students.
When the population and the required sample size have been determined, the researcher is now ready to prepare the number of copies of the data-gathering instrument to be used in his/her investigation.

If, however, the researcher does not have an adapted questionnaire to be used, preparation of a data-gathering instrument is a must in order to measure what is to be measured in that kind of investigation.
SUMMATION NOTATION
In statistics, it is frequently necessary to work with sums of numerical values.
Example: We may wish to compute the average cost of a certain brand of toothpaste sold at 10 different stores. Perhaps we would like to know the total number of heads that occur when 3 coins are tossed several times.
Consider a controlled experiment in which the decreases in weight over a 6-month period were 15, 10, 18, and 6 kilograms, respectively. If we designate the first recorded value x1, the second x2, and so on, then we can write:
x1 = 15, x2 = 10, x3 = 18, x4 = 6.

Using the Greek letter ∑ (capital sigma) to indicate "summation of," we can write the sum of the 4 weights as

∑_{i=1}^{4} xi,

where we read "summation of xi, i going from 1 to 4." The numbers 1 and 4 are called the lower and upper limits of summation. Hence

∑_{i=1}^{4} xi = x1 + x2 + x3 + x4 = 15 + 10 + 18 + 6 = 49.

Also,

∑_{i=2}^{3} xi = x2 + x3 = 10 + 18 = 28.

In general, the symbol ∑_{i=1}^{n} means that we replace i, wherever it appears after the summation symbol, by 1, then by 2, and so on up to n, and then add up the terms. Therefore, we can write

∑_{i=1}^{3} xi² = x1² + x2² + x3²,

∑_{j=2}^{5} xjyj = x2y2 + x3y3 + x4y4 + x5y5.

The subscript may be any letter, although i, j, and k seem to be preferred by statisticians. Obviously,

∑_{i=1}^{n} xi = ∑_{j=1}^{n} xj = ∑_{k=1}^{n} xk.
The lower limit of summation is not necessarily a subscript. For instance, the sum of the natural numbers from 1 to 9 may be written

∑_{x=1}^{9} x = 1 + 2 + …… + 9 = 45.

When we are summing over all the values of xi that are available, the limits of summation are often omitted and we simply write ∑ xi. If in the diet experiment only 4 people were involved, then ∑ xi = x1 + x2 + x3 + x4. In fact, some authors even drop the subscript and let ∑ x represent the sum of all available data.

Example: If x1 = 3, x2 = 5, and x3 = 7, find ∑_{i=1}^{3} xi.

Solution: ∑ xi = x1 + x2 + x3 = 3 + 5 + 7 = 15.
Three theorems that provide basic rules in dealing with summation notation are given below:

THEOREM 1.1
The summation of the sum of two or more variables is the sum of their summations. Thus:

∑_{i=1}^{n} (xi + yi + zi) = ∑_{i=1}^{n} xi + ∑_{i=1}^{n} yi + ∑_{i=1}^{n} zi.

Proof. Expanding the left side and regrouping, we have

∑_{i=1}^{n} (xi + yi + zi)
= (x1 + y1 + z1) + (x2 + y2 + z2) + . . . + (xn + yn + zn)
= (x1 + x2 + . . . + xn) + (y1 + y2 + . . . + yn) + (z1 + z2 + . . . + zn)
= ∑_{i=1}^{n} xi + ∑_{i=1}^{n} yi + ∑_{i=1}^{n} zi.
THEOREM 1.2
If c is a constant, then

∑_{i=1}^{n} cxi = c ∑_{i=1}^{n} xi.

Proof. Expanding the left side and factoring, we get

∑_{i=1}^{n} cxi = cx1 + cx2 + . . . + cxn
= c(x1 + x2 + . . . + xn)
= c ∑_{i=1}^{n} xi.
THEOREM 1.3
If c is a constant, then

∑_{i=1}^{n} c = nc.

Proof. If in Theorem 1.2 all the xi are equal to 1, then

∑_{i=1}^{n} c = c + c + . . . + c = nc   (n terms).
The use of Theorems 1.1 through 1.3 in simplifying summation
problems is illustrated in the following examples.
Example 10: If x1 = 2, x2 = 4, y1 = 3, y2 = -1, find the value of

∑_{i=1}^{2} (3xi - yi + 4).

Solution

∑_{i=1}^{2} (3xi - yi + 4) = ∑_{i=1}^{2} 3xi - ∑_{i=1}^{2} yi + ∑_{i=1}^{2} 4

= 3 ∑_{i=1}^{2} xi - ∑_{i=1}^{2} yi + (2)(4)

= (3)(2 + 4) - [3 + (-1)] + 8

= 18 - 2 + 8 = 24
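Example 10 can also be checked numerically in Python: the left side sums term by term, while the right side applies Theorems 1.1 through 1.3 (split the sum, factor out the 3, and replace the summed constant 4 by n·4 with n = 2):

```python
# Values from Example 10.
x = [2, 4]
y = [3, -1]

# Left side: sum the expression term by term.
lhs = sum(3 * xi - yi + 4 for xi, yi in zip(x, y))

# Right side: Theorems 1.1-1.3 applied.
rhs = 3 * sum(x) - sum(y) + len(x) * 4

print(lhs, rhs)  # → 24 24
```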
Example 11. Simplify

∑_{i=1}^{3} (x - i)².

Solution

∑_{i=1}^{3} (x - i)² = ∑_{i=1}^{3} (x² - 2xi + i²)

= ∑_{i=1}^{3} x² - 2x ∑_{i=1}^{3} i + ∑_{i=1}^{3} i²

= 3x² - 2x(1 + 2 + 3) + (1 + 4 + 9)

= 3x² - 12x + 14.
Often our data may be classified according to two criteria. For example, xij might represent the amount of gas released when a chemical experiment is run at the ith temperature level and the jth pressure level. To sum such observations, it is convenient to adopt a double-summation notation. The symbol

∑_{i=1}^{m} ∑_{j=1}^{n}

means that we first sum over the subscript j, using the theory for single summation, and then perform the second summation by allowing i to assume values from 1 to m. Hence, for the data in the following table,

                        Pressure
Temperature       1      2      3      4
T1               x11    x12    x13    x14
T2               x21    x22    x23    x24

we have

∑_{i=1}^{2} ∑_{j=2}^{4} xij = ∑_{i=1}^{2} (xi2 + xi3 + xi4)

= (x12 + x13 + x14) + (x22 + x23 + x24).

Similarly, if f(xi, yj) represents the textbook sales for publisher xi at university yj, then

∑_{i=1}^{3} ∑_{j=1}^{2} f(xi, yj) = ∑_{i=1}^{3} [f(xi, y1) + f(xi, y2)]

= f(x1, y1) + f(x1, y2) + f(x2, y1) + f(x2, y2) + f(x3, y1) + f(x3, y2)

gives the total sales of a certain three publishers at two specific universities.
Parameters and Statistics
The terminology and notation adopted by statisticians in their treatment of statistical data depend entirely on whether the data set constitutes a population or a sample selected from a population.
Consider, for example, the following set of data representing the number of typing errors made by a secretary on 10 different pages of a document: 1, 0, 1, 2, 3, 1, 1, 4, 0, and 2.
First let us assume that the document contains exactly 10 pages, so that the data constitute a small finite population. A quick study of this population could lead to a number of conclusions. For instance, we could make the statement that the largest number of typing errors on any single page was 4, or we might state that the arithmetic mean (average) of the 10 numbers is 1.5. The numbers 4 and 1.5 are descriptive properties of our population. We refer to such values as parameters of the population.
Parameter
Any numerical value describing a characteristic of a population is called a parameter.
It is customary to represent parameters by Greek letters. By tradition, the arithmetic mean of a population is denoted by µ. Hence, for our population of typing errors, µ = 1.5. Note that a parameter is a constant value describing the population.
Now let us suppose that the data representing the number of typing errors constitute a sample obtained by counting the number of errors on 10 pages randomly selected from a large manuscript. Clearly, the population is now a much larger set of data about which we only have the partial information provided by the sample. The numbers 4 and 1.5 are now descriptive measures of the sample and are not to be considered parameters of the population. A value computed from a sample is called a statistic.
Statistic
Any numerical value describing a characteristic of a sample is called a statistic.
A statistic is usually represented by ordinary letters of the English alphabet. If the statistic happens to be the sample mean, we shall denote it by x̄. For our random sample of typing errors we have x̄ = 1.5. Since many random samples are possible from the same population, we would expect the statistic to vary from sample to sample. In other words, if a second random sample of 10 pages is selected from the manuscript and the number of typing errors per page tabulated, the largest value might turn out to be 5 rather than 4, and the arithmetic mean would probably be close to 1.5, but almost certainly different.
In our study of statistical inference we shall use the value of a statistic as an estimate of the corresponding population parameter. The size of the population will, for the most part, be large or infinite. To know how accurately the statistic estimates the parameter, we must first investigate the distribution of the values of the statistic obtained in repeated sampling.
Measures of Central Location
To investigate a set of quantitative data, it is useful to define numerical measures that describe important features of the data. One of the important ways of describing a group of measurements, whether it be a sample or a population, is by the use of an average.
An average is a measure of the center of a set of data when the data are arranged in an increasing or decreasing order of magnitude. For example, if an automobile averages 14.5 kilometers to 1 liter of gasoline, this can be considered a value indicating the center of several more values: in the country, 1 liter of gasoline may give considerably more kilometers per liter than in the congested traffic of a large city. The number 14.5 in some sense defines a center value.
Any measure indicating the center of a set of data, arranged in an increasing or decreasing order of magnitude, is called a measure of central location or a measure of central tendency. The most commonly used measures of central location are the mean, median, and mode. The most important of these, and the one we shall consider first, is the mean.
Population Mean
If the set of data x1, x2, . . . , xN, not necessarily all distinct, represents a finite population of size N, then the population mean is

µ = (∑_{i=1}^{N} xi) / N.

Example: The numbers of fish in 5 different breeding tanks are 3, 5, 6, 4, and 6. Treating the data as a population, find the mean number of fish for the 5 tanks.

Solution: Since the data are considered to be a finite population,

µ = (3 + 5 + 6 + 4 + 6) / 5 = 4.8.
Sample Mean
If the set of data x1, x2, …, xn, not necessarily all distinct, represents a finite sample of size n, then the sample mean is

x̄ = (∑_{i=1}^{n} xi) / n.
Example 2. A food inspector examined a random sample of 7 cans of a certain brand of tuna to determine the percent of foreign impurities. The following data were recorded: 1.8, 2.1, 1.7, 1.6, 0.9, 2.7, and 1.8. Compute the sample mean.

Solution: This being a sample, we have

x̄ = (1.8 + 2.1 + 1.7 + 1.6 + 0.9 + 2.7 + 1.8) / 7 = 1.8%.
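The sample mean of Example 2 can be computed directly from the definition x̄ = (∑ xi) / n:

```python
# Tuna impurity data from Example 2 (percent foreign impurities).
impurities = [1.8, 2.1, 1.7, 1.6, 0.9, 2.7, 1.8]

# Sample mean: sum of the observations divided by the sample size n.
x_bar = sum(impurities) / len(impurities)
print(round(x_bar, 1))  # → 1.8
```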
Often it is possible to simplify the work in computing a mean by using coding techniques. For example, it is sometimes convenient to add (or subtract) a constant to all our observations and then compute the mean. How is this new mean related to the mean of the original set of observations? If we let yi = xi + a, then

ȳ = (∑_{i=1}^{n} yi) / n = [∑_{i=1}^{n} (xi + a)] / n = x̄ + a.

Therefore, the addition (or subtraction) of a constant to all observations changes the mean by the same amount. To find the mean of the numbers -5, -3, 1, 4, and 6, we might first add 5 to give the set of nonnegative values 0, 2, 6, 9, and 11, which has a mean of 5.6. Therefore, the original numbers have a mean of 5.6 - 5 = 0.6.
Now, suppose that we let yi = axi. It follows that

n n
∑ yi ∑ axi
i=1 i=1
y = ------- ----------- = ax.
n n
Therefore, if all observations are multiplied
or divided by a constant, the new
observations will have a mean that is the
same constant multiple of the original
mean. The mean of the numbers 4, 6, 14 is
equal to 8, and therefore, after dividing by
2, the mean of the set 2, 3, and 7 must be
8/2 = 4.
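Both coding properties can be verified quickly; a minimal sketch using the two examples above:

```python
# Verifying the coding properties: adding a constant a shifts the mean by a,
# and multiplying by a constant a multiplies the mean by a.
def mean(xs):
    return sum(xs) / len(xs)

xs = [-5, -3, 1, 4, 6]            # example from the text
shifted = [x + 5 for x in xs]     # 0, 2, 6, 9, 11
scaled = [x / 2 for x in [4, 6, 14]]  # 2, 3, 7

print(mean(shifted))  # 5.6, so mean(xs) = 5.6 - 5 = 0.6
print(mean(scaled))   # 4.0, half of the original mean of 8
```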
MEDIAN
The second most useful measure of central location is the
median. For a population we designate the median by µ̃ and for a
sample by x̃.
The median. The median of a set of observations arranged in an
increasing or decreasing order of magnitude is the middle
value when the number of observations is odd, or the
arithmetic mean of the two middle values when the number
of observations is even.
Example 3. On 5 term tests in fisheries a student has
made grades of 85, 93, 86, 92, and 97. Find the median
for this population of grades.
Solution. Arranging the grades in an increasing order of
magnitude, we get
85
86
92
93
97
and hence µ̃ = 92
Example 4. The nicotine contents for a random sample
of 6 cigarettes of a certain brand are found to be 2.3,
2.7, 2.5, 2.9, 3.1, and 1.9 milligrams. Find the median.

Solution. If we arrange these nicotine contents in an


increasing order of magnitude, we get
1.9
2.3
2.5
2.7
2.9
3.1
and the median is then the mean of 2.5 and 2.7.
Therefore,

x̃ = (2.5 + 2.7) / 2 = 2.6 milligrams.
Mode
The mode of a set of observations is that value which
occurs most often or with the greatest frequency.
For some sets of data there may be several values
occurring with greatest frequency in which case we
have more than one mode.
Example 5. If the donations from the Fisheries
Association of the Philippines are recorded as 9, 10,
5, 9, 9, 7, 8, 6, 10, and 11 dollars, then 9 dollars, the
value that occurs with the greatest frequency, is the
mode.

Example 6. The number of movies attended last month


by a random sample of 12 high school students were
recorded as follows: 1, 0, 3, 1, 2, 4, 2, 5, 4, 0, 1, and 4.
In this case, there are two modes, 1 and 4, since
both 1 and 4 occur with the greatest frequency (three
times each). The distribution is said to be bimodal.
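The median and mode examples can be reproduced with the standard library; note that in Example 6 the values 1 and 4 each occur three times:

```python
# Median and mode of the examples above, using the statistics module.
import statistics

grades = [85, 93, 86, 92, 97]                  # Example 3
nicotine = [2.3, 2.7, 2.5, 2.9, 3.1, 1.9]      # Example 4
movies = [1, 0, 3, 1, 2, 4, 2, 5, 4, 0, 1, 4]  # Example 6

print(statistics.median(grades))             # 92, the middle value
print(statistics.median(nicotine))           # 2.6, the mean of 2.5 and 2.7
print(sorted(statistics.multimode(movies)))  # both modes of the bimodal data
```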
MEASURES OF DISPERSION
Dispersion or variability refers to the spread of the
values around the central tendency. Specifically,
measures of dispersion serve as an index of the spread
of X-values away from the central value.

There are four common measures of dispersion or variation:
the range, the variance, the standard deviation, and the
coefficient of variation. Each serves as an index of the
reliability or consistency of the data being processed, in the
sense that it indicates the heterogeneity or homogeneity of
the data generated from the variates.
Range – the difference between the highest and the
lowest value in the data set. On a number line, it is
the total distance the values span below and above
the reference point (the mean).
Student Xi dx = (Xi-X) (dx)2=(Xi-X)2 Student Xi dx=(Xi-X) (dx)2=(Xi-X)2
A 5 -3 9 A 2 -6 36
B 6 -2 4 B 4 -4 16
C 7 -1 1 C 6 -2 4
D 7 -1 1 D 7 -1 1
E 8 0 0 E 7 -1 1
F 8 0 0 F 8 0 0
G 8 0 0 G 8 0 0
H 8 0 0 H 8 0 0
I 9 1 1 I 9 1 1
J 9 1 1 J 11 3 9
K 10 2 4 K 12 4 16
L 11 3 9 L 14 6 36

Example: N = 12  ∑dx = 0  ∑(dx)² = 30      N = 12  ∑dx = 0  ∑(dx)² = 120

∑xi = 96                                    ∑xi = 96
X = 8                                       X = 8
Measures of Dispersion Using the Number Line System

[Number line for Group A: the values 5 to 11 plotted around the mean of 8, extending 3 units on either side, for a total range of 6.]
Measures of Dispersion Using Variance

σ²A = ∑(Xi – X)² / (N – 1) = 30 / (12 – 1) = 30/11 = 2.73

σ²B = ∑(Xi – X)² / (N – 1) = 120 / (12 – 1) = 120/11 = 10.91
Measures of Dispersion Using Standard Deviation

s.d.A = √[∑(Xi – X)² / (N – 1)] = √(30/11) = 1.65

s.d.B = √[∑(Xi – X)² / (N – 1)] = √(120/11) = 3.30
Measures of dispersion Using coefficient of Variation

C.V.A = s.d. x 100 = 165 = 20.63%


XA 8

C.V.A = s.d. X 100 = 41.25 = 41.25%


XB 8

Comparing the four measures of dispersion for the two groups
of subjects gives the following table.
Measures of Dispersion      Group A   Group B
Range                       6         12
Variance                    2.73      10.91
Standard Deviation          1.65      3.30
Coefficient of Variation    20.63%    41.25%
You will notice, then, that in the measures of central tendency
both groups have the same central values, but in terms of their
measures of dispersion they may differ. In this case, since
Group B has a wider spread of x-values from the mean,
Group B varies more widely than Group A.

In a descriptive data analysis, aside from the frequency, ranks,
and percentages, the most important tools are the mean and
the standard deviation.
The mean gives the true picture of the group, or best describes
the group under investigation.
The standard deviation, on the other hand, gives a picture of
the homogeneity or heterogeneity of a set of data being
analyzed. It is a more accurate and detailed estimate of dispersion
than the range, because an outlier can greatly exaggerate the
range, as in this example where the extreme values stretch the
range to 6 for Group A and 12 for Group B.
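The four measures for Groups A and B can be recomputed directly; a sketch using the data from the table above (the computed C.V. differs from the table in the second decimal only because the table rounds the standard deviation first):

```python
# Recomputing range, variance, standard deviation, and coefficient of
# variation for Groups A and B, with n - 1 in the denominator as above.
import math

group_a = [5, 6, 7, 7, 8, 8, 8, 8, 9, 9, 10, 11]
group_b = [2, 4, 6, 7, 7, 8, 8, 8, 9, 11, 12, 14]

def dispersion(xs):
    n = len(xs)
    m = sum(xs) / n
    variance = sum((x - m) ** 2 for x in xs) / (n - 1)
    sd = math.sqrt(variance)
    cv = sd / m * 100              # coefficient of variation, in percent
    return max(xs) - min(xs), variance, sd, cv

for label, xs in [("A", group_a), ("B", group_b)]:
    rng, var, sd, cv = dispersion(xs)
    print(f"Group {label}: range={rng}, var={var:.2f}, sd={sd:.2f}, cv={cv:.2f}%")
```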
Ranking
Ranking is the process of arranging the data from the highest to the lowest
or vice versa based on certain criteria. Such criteria can be ordered in
terms of quantity, quality, appraised value and chronology.

Steps in ranking:
1. Arrange the data to be ranked in a descending or ascending order.
2. Assign consecutive numbers for each item from the highest to
lowest or vice versa.
3. Rank an item occurring once the same as its consecutive number.
4. The rank of an item occurring two or more times is found by adding
their consecutive numbers and dividing by the number of tied items.
Example: Rank the following average weight of
tilapia after one month culture.

83 82 79 86 80 82
80 76 78 77 79 81
82 84 80 81 78 79
85 76 75 75 85 84
No. Ave. Wt. Rank
1. 86 1.0
2 85 2.5 2+3 = 5; 5÷2 = 2.5
3 85 2.5
4 84 4.5 4+5 = 9; 9÷2 = 4.5
5 84 4.5
6 83 6.0
7 82 8.0 7+8+9 = 24; 24÷3 = 8.0
8 82 8.0
9 82 8.0
10 81 10.5 10+11 = 21; 21÷2 = 10.5
11 81 10.5
12 80 13.0 12+13+14 = 39; 39÷3 = 13.0]
13 80 13.0
14 80 13.0
15 79 16.0 15+16+17 = 48; 48÷3 = 16.0
16 79 16.0
17 79 16.0
18 78 18.5 18+19 = 37; 37÷2 = 18.5
19 78 18.5
20 77 20.0
21 76 21.5 21+22 = 43; 43÷2 = 21.5
22 76 21.5
23 75 23.5 23+24 = 47; 47÷2 = 23.5
24 75 23.5
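The tied-rank rule in steps 1 to 4 can be sketched in code; tied values share the mean of their consecutive position numbers:

```python
# Average-rank assignment for tied values: sort descending, then give each
# run of equal values the mean of the positions that run occupies.
def average_ranks(values):
    ordered = sorted(values, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # positions i+1 .. j hold the same value; they share the mean rank
        ranks[ordered[i]] = (i + 1 + j) / 2
        i = j
    return [ranks[v] for v in ordered]

weights = [83, 82, 79, 86, 80, 82, 80, 76, 78, 77, 79, 81,
           82, 84, 80, 81, 78, 79, 85, 76, 75, 75, 85, 84]
print(average_ranks(weights)[:6])  # ranks of the six heaviest fish
```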
Frequency Distribution of Interval
Using the following average weight of tilapia, follow the steps
and present the frequency distribution.

70 75 49 85 79 73 52 65 90 65 87 47 65 56
80 50 72 92 82 78 95 63 80 69 92 66 68 86
89 64 74 50 57 74 56 73 71 72 72 59 55 80
75 57 97 60 53 68 74 79 75 77 69 81 69 82
66 48 71 71 66 62 68 86 61 81 60 89 71

1. Determine the range. The range is the difference between the highest
score or data and the lowest score or data, In the given data the highest
score is 97 and the lowest score is 47. The range is computed as follows:
Range = H - L
Range = 97 - 47 = 50
2. Determine the acceptable size of the interval by
dividing the range by 10 and 15.

R/10 = 50/10 = 5

R/15 = 50/15 = 3.33

The size of the interval is therefore between 3.33 and 5. The
number 4 lies within this range, but it is not an acceptable
interval size, since most frequency distributions use an odd
number as the size of the interval. The main advantage of using
an odd number as the class interval is that the midpoints, or
class marks, will be whole numbers.
3. Establish the class interval. To establish the class interval see to
it that the lowest class interval begins with a number that is a
multiple of the interval size. Since the lowest score is 47 and the
size of interval is 5, the lowest class interval would start with 45
– 49 and the next class 50-54. These are called the interval
limits.

To determine the class boundaries, the values are obtained from


the frequency table by increasing the upper class limits and
decreasing the lower class limits by the same amount so that
there are no gaps between consecutive classes. The exact or real
limits are 44.5 and 49.5, respectively.

Determine the rest of the intervals by increasing each interval


limits by 5 until the highest class interval 95-99, which contains
the highest score of 97 in the distribution is reached.
4. Tally each score or data
Class interval Tally (t) (f)
95 -99 ll 2
90-94 lll 3
85-89 llll-l 6
80-84 llll-lll 8
75-79 llll-llll 10
70-74 llll-llll-llll 14
65-69 llll-llll-ll 12
60-64 llll-ll 7
55-59 llll-l 6
50-54 llll 4
45-49 lll 3
5. Summarize the tally under column (f) or frequency
6. Compute the midpoint (M) of every class interval and
place it under column (M).
To determine the midpoint use the following
formula:
M = (Ll + Hl) / 2
where: M = midpoint
Ll = the lower limit of the class interval
Hl = the higher limit of the class interval
Example: For the second class interval

M = (90 + 94) / 2 = 184 / 2 = 92
7. Compute for the cumulative frequency less than “CF<“ and
the cumulative frequency greater than “CF>”.
Class interval  Tally (t)        (f)  M   CF<  CF>
95-99           ll                2   97   75    2
90-94           lll               3   92   73    5
85-89           llll-l            6   87   70   11
80-84           llll-lll          8   82   64   19
75-79           llll-llll        10   77   56   29
70-74           llll-llll-llll   14   72   46   43
65-69           llll-llll-ll     12   67   32   55
60-64           llll-ll           7   62   20   62
55-59           llll-l            6   57   13   68
50-54           llll              4   52    7   72
45-49           lll               3   47    3   75
N = 75
8. Compute the relative frequency and place it under column RF(%).
The relative frequency is computed as the quotient of the class
frequency over the total frequency, multiplied by 100, since RF
is expressed in percent.

RF(%) = (f / TF) x 100

where RF = the relative frequency
f = the class frequency
TF = the total frequency

Sample computation of RF for the second class interval:
RF(%) = (3 / 75) x 100 = 4%
Compute the cumulative relative frequency (CRF).
The cumulative relative frequency is computed
by adding the RF of each class interval from
the top. The total cumulative relative frequency
is 100%, or very close to 100%.
Class interval  Tally (t)        (f)  M   CF<  CF>  RF     CRF
95-99           ll                2   97   75    2   2.67    2.67
90-94           lll               3   92   73    5   4.00    6.67
85-89           llll-l            6   87   70   11   8.00   14.67
80-84           llll-lll          8   82   64   19  10.67   25.34
75-79           llll-llll        10   77   56   29  13.33   38.67
70-74           llll-llll-llll   14   72   46   43  18.67   57.34
65-69           llll-llll-ll     12   67   32   55  16.00   73.34
60-64           llll-ll           7   62   20   62   9.33   82.67
55-59           llll-l            6   57   13   68   8.00   90.67
50-54           llll              4   52    7   72   5.33   96.00
45-49           lll               3   47    3   75   4.00  100.00
N = 75
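The table can also be built programmatically. A sketch follows; note that the data block reproduced earlier lists only 69 values, apparently a few short of the N = 75 used in the table, so the code illustrates the method rather than reproducing the exact counts:

```python
# Building a frequency distribution: class intervals of width 5 starting at
# 45, with frequency, midpoint, cumulative frequency, and relative frequency.
weights = [70, 75, 49, 85, 79, 73, 52, 65, 90, 65, 87, 47, 65, 56,
           80, 50, 72, 92, 82, 78, 95, 63, 80, 69, 92, 66, 68, 86,
           89, 64, 74, 50, 57, 74, 56, 73, 71, 72, 72, 59, 55, 80,
           75, 57, 97, 60, 53, 68, 74, 79, 75, 77, 69, 81, 69, 82,
           66, 48, 71, 71, 66, 62, 68, 86, 61, 81, 60, 89, 71]

n = len(weights)
rows = []
cf_less = 0
for lo in range(45, 100, 5):        # class intervals 45-49 up to 95-99
    hi = lo + 4
    f = sum(lo <= w <= hi for w in weights)
    cf_less += f                    # CF<: cumulative count from the bottom
    midpoint = (lo + hi) / 2
    rf = f / n * 100                # relative frequency, in percent
    rows.append((f"{lo}-{hi}", f, midpoint, cf_less, round(rf, 2)))

for row in rows:
    print(row)
```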
Measures of Relationships
Question? Are measures of relationships, correlations, or
association descriptive statistics or inferential statistics?
Answer? If the researcher is using population data,
measures of relationships are descriptive statistics, since the
researcher is measuring the strength or degree of
relationships among variables in the population. But if the
researcher is using sample data, measures of relationships
fall under inferential statistics, and from these the
researcher will then be testing a research hypothesis. Again,
both population and sample data have their own
corresponding and specified statistical tools, depending
on the levels of measurement, for determining the
strength and significance of the relationships among variables.
Further, the difference between the use of descriptive statistics
and inferential statistics in the measures of relationships is that in
descriptive statistics you simply measure the degree or
strength of relationship, while in inferential statistics the
researcher determines whether or not that degree or strength
of relationship is significant.

To measure the degree or strength of correlation, use the guide
below in interpreting the coefficient of correlation.
Computed r-value     Interpretation
0.00 to +0.10        No correlation
+0.11 to +0.25       Negligible correlation
+0.26 to +0.50       Moderate correlation
+0.51 to +0.75       High correlation
+0.76 to +1.00       Very high to perfect positive correlation
Coefficient of correlation is a measure of strength of
the association between two variables. There are
three types of correlation:
Perfect positive correlation
No correlation
Perfect negative correlation

If all data points lie on a straight line, the coefficient is
either +1.0 or -1.0, depending on the direction of the
slope of the line. Coefficients of +1.0 or -1.0 describe
perfect correlation; otherwise the correlation is a weaker
positive or negative one.
Pearson's r
Pearson's r is also known as the Product-Moment
Correlation Coefficient, which is used in determining
the strength of correlation between two interval-scaled
variables.
For example, if the data were taken from a population data
and the levels of measurement are intervally scaled,
such as the relationship between the student’s English
performance with their Mathematics performance as
shown:
The Pearson's r Computed Using the Deviation Method

English (X)  Math (Y)   dx   dx²   dy   dy²   dxdy
19           11          7    49    2     4     14
17           15          5    25    6    36     30
15            9          3     9    0     0      0
14           13          2     4    4    16      8
13           11          1     1    2     4      2
11            9         -1     1    0     0      0
11            8         -1     1   -1     1      1
 9            7         -3     9   -2     4      6
 7            6         -5    25   -3     9     15
 4            1         -8    64   -8    64     64

∑xi = 120   ∑yi = 90   ∑(dx)² = 188   ∑(dy)² = 138   ∑dxdy = 140
Mx = 12     My = 9
Sdx = √[∑(Xi – Mx)² / (N – 1)] = √(188/9) = 4.57

Sdy = √[∑(Yi – My)² / (N – 1)] = √(138/9) = 3.92
rp = ∑dxdy / [(N – 1)(sdx)(sdy)]
whereby:
rp = Pearson's r coefficient of correlation
∑dxdy = the sum of the products of the deviations of the
x and y variables
sdx = standard deviation of x
sdy = standard deviation of y
N = number of subjects
rp = ∑dxdy / [(N – 1)(sdx)(sdy)] = 140 / [(9)(4.57)(3.92)]

rp = 0.87, which indicates a very high positive correlation.

Since population data are being used in this study, to
determine the strength of correlation simply compare the
computed r-value against the guide of scales by Garrett.
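The deviation-method computation above can be reproduced directly from the raw scores:

```python
# Pearson's r by the deviation method: r = sum(dx*dy) / ((N-1)*sdx*sdy).
import math

english = [19, 17, 15, 14, 13, 11, 11, 9, 7, 4]
maths   = [11, 15,  9, 13, 11,  9,  8, 7, 6, 1]

n = len(english)
mx, my = sum(english) / n, sum(maths) / n
dx = [x - mx for x in english]
dy = [y - my for y in maths]
sdx = math.sqrt(sum(d * d for d in dx) / (n - 1))
sdy = math.sqrt(sum(d * d for d in dy) / (n - 1))
rp = sum(a * b for a, b in zip(dx, dy)) / ((n - 1) * sdx * sdy)
print(round(rp, 2))  # 0.87, a very high positive correlation
```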
If we are going to compute the degree of relationships between English and
Mathematics performance of the students using the same data but we are
going to transform only the data from interval to ordinal data and use the
Spearman’s r, the results are shown below:
English (X)  Math (Y)   Rx    Ry     D     D²
19           11          1.0   3.5  -2.5   6.25
17           15          2.0   1.0   1.0   1.00
15            9          3.0   5.5  -2.5   6.25
14           13          4.0   2.0   2.0   4.00
13           11          5.0   3.5   1.5   2.25
11            9          6.5   5.5   1.0   1.00
11            8          6.5   7.0  -0.5   0.25
 9            7          8.0   8.0   0.0   0.00
 7            6          9.0   9.0   0.0   0.00
 4            1         10.0  10.0   0.0   0.00
∑xi = 120   ∑yi = 90                      ∑D² = 21
Using Spearman's r equation:

rs = 1 – 6∑D² / [N(N² – 1)]

   = 1 – 6(21) / [10(10² – 1)]

   = 1 – 126/990

   = 1 – 0.127

   = 0.87, which indicates a very high positive correlation,
the same as the value computed using Pearson's r
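The Spearman computation, including the average-rank handling of ties, can be sketched as:

```python
# Spearman's r for the same scores: rs = 1 - 6*sum(D^2) / (N(N^2 - 1)),
# with tied values receiving the average of the ranks they occupy.
english = [19, 17, 15, 14, 13, 11, 11, 9, 7, 4]
maths   = [11, 15,  9, 13, 11,  9,  8, 7, 6, 1]

def ranks(values):
    # average rank; the largest value is ranked 1
    out = []
    for v in values:
        greater = sum(w > v for w in values)
        ties = sum(w == v for w in values)
        out.append(greater + (ties + 1) / 2)
    return out

n = len(english)
d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(english), ranks(maths)))
rs = 1 - 6 * d2 / (n * (n * n - 1))
print(d2, round(rs, 2))  # sum of squared rank differences, then rs
```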
If however, the data being analyzed were taken from a sample data,
the researcher needs to determine the significance of the
computed r-value by testing the significance of r-value using the
equation shown below:

t – value = r n -2
1 – r2

= 0.87 10 – 2
1 – (.87)2

= 0.87 (5.74)

= 4.99
In testing the significance of r-value, compare the computed t-
value with that of the tabular value at 0.05 alpha or at p-value
set at 0.05 alpha for a two-tailed test.
Calculated t-value tabular t- value
0.05 0.01
4.99* 2.31 3.36
p < 0.01, significant at the 0.01 alpha level

This means that the null hypothesis, that there is no significant
relationship between the English and Mathematics performance
of the students, was rejected. The researcher is confident that
out of 100 trials, he or she is 99% certain that there is a
significant relationship between the variables, with only a 1%
chance, due to error, that there is no significant relationship.
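The significance test above is a one-line computation; the tabular t values (2.31 at 5% and 3.36 at 1%, for 8 degrees of freedom) are taken from the text:

```python
# Testing the significance of r: t = r * sqrt((n - 2) / (1 - r^2)),
# compared with the tabular t at n - 2 degrees of freedom.
import math

r, n = 0.87, 10
t = r * math.sqrt((n - 2) / (1 - r * r))
print(round(t, 2))  # 4.99, larger than both 2.31 (5%) and 3.36 (1%)
```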
The Pearson’s r
Pearson’s r is used when the variables are of the interval or ratio
type of measurement. In this case the deviation method and the
difference method will be used.
Determining Pearson’s Coefficient of Correlation Using the
Deviation Method

The formula for Pearson’s r deviation method is:

rp = ∑dxdy
(N- 1) (sdx) (sdy)

whereby: rp = Pearson's coefficient of correlation between
variables X and Y
∑dxdy = the sum of the product of the deviation between
variable X and Y
sdx = the standard deviation of variable X
sdy = the standard deviation of variable y
N = the total number of subjects

Example = Find the degree of correlation between the scores of


students in English subject (X) with their scores in
Mathematics subject (Y).
X    Y    dx=(Xi-Mx)   dy=(Yi-My)   (dx)²=(Xi-Mx)²   (dy)²=(Yi-My)²   dxdy
12 8 6 1 36 1 6
10 12 4 5 16 25 20
8 6 2 -1 4 1 -2
7 4 1 -3 1 9 -3
6 9 0 2 0 4 0
5 10 -1 3 1 9 -3
5 7 -1 0 1 0 0
4 6 -2 -1 4 1 2
2 5 -4 -2 16 4 8
1 3 -5 -4 25 16 20
∑X = 60   ∑Y = 70   ∑(dx)² = 104   ∑(dy)² = 70   ∑dxdy = 48
Mx = 6    My = 7
Sdx = √[∑(Xi – Mx)² / (N – 1)] = √(104/9) = 3.40

Sdy = √[∑(Yi – My)² / (N – 1)] = √(70/9) = 2.79

rp = ∑dxdy / [(N – 1)(sdx)(sdy)] = 48 / [(9)(3.40)(2.79)]

rp = 0.56
Determining Pearson’s Coefficient of Correlation Using the
Difference Method
The formula to use is:

rp = (Sx² + Sy² – Sd²) / [2(sdx)(sdy)]

whereby:

rp = Pearson’s coefficient of correlation


Sx2 = the variance of variable X
Sy2 = the variance of variable Y
Sd2 = the variance of the differences between variable X and
variable Y
Sdx = the standard deviation of variable X
Sdy = the standard deviation of variable y
The same data were used in determining Pearson’s Correlation using
the difference Method.
X     Y    X²    Y²    D    D²
12     8  144    64    4    16
10    12  100   144   -2     4
 8     6   64    36    2     4
 7     4   49    16    3     9
 6     9   36    81   -3     9
 5    10   25   100   -5    25
 5     7   25    49   -2     4
 4     6   16    36   -2     4
 2     5    4    25   -3     9
 1     3    1     9   -2     4
∑X = 60   ∑Y = 70   ∑X² = 464   ∑Y² = 560   ∑D = -10   ∑D² = 88
Sdx = √{[∑X² – (∑X)²/N] / (N – 1)}

    = √{[464 – (60)²/10] / (10 – 1)}

    = √[(464 – 360) / 9]

    = √(104/9)

Sdx = 3.40
Sdy = √{[∑Y² – (∑Y)²/N] / (N – 1)}

    = √{[560 – (70)²/10] / (10 – 1)}

    = √[(560 – 490) / 9]

    = √(70/9)

Sdy = 2.79
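The example stops after Sdy; a sketch completing the difference method, with Sd² obtained from the same computing formula (here ∑D² = 88 and ∑D = -10), recovers the same r as the deviation method:

```python
# Pearson's r by the difference method:
# r = (Sx^2 + Sy^2 - Sd^2) / (2 * sdx * sdy), where D = X - Y.
import math

x = [12, 10, 8, 7, 6, 5, 5, 4, 2, 1]
y = [ 8, 12, 6, 4, 9, 10, 7, 6, 5, 3]
n = len(x)

def variance(values):
    # (sum(v^2) - (sum v)^2 / n) / (n - 1), the computing formula above
    return (sum(v * v for v in values) - sum(values) ** 2 / n) / (n - 1)

d = [a - b for a, b in zip(x, y)]
sx2, sy2, sd2 = variance(x), variance(y), variance(d)
rp = (sx2 + sy2 - sd2) / (2 * math.sqrt(sx2) * math.sqrt(sy2))
print(round(rp, 2))  # 0.56, agreeing with the deviation-method value
```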
Computation of the coefficient of variation (CV)

CV = (√error MS / grand mean) x 100

The CV value, which indicates the degree of
precision attained in a particular experiment, is
generally placed underneath the analysis of
variance table, as shown below:
Analysis of Variance table
Source of Variation    Df   SS          MS        Observed F   Tabular F
                                                               5%     1%
Treatments              6   5,587,175   931,196   9.82**       2.57   3.81
Experimental Error     21   1,990,237    94,773
Total                  27   7,577,412

CV = 15.1%, ** = significant at 1% level.

CV = (√94,773 / 2,040) x 100 = 15.1%
The coefficient of variation is a good index of the reliability of an experiment. It expresses the experimental
error as a percentage of the mean; thus, the higher its value, the lower the reliability of the
experiment. Coefficients of variation vary greatly with the type of experiment, the crop being tested,
and the character being measured. An experienced researcher, however, can make a good judgment on the
acceptability of a particular CV for a given type of experiment. Our experience with measuring production
rate, for example, indicates that we should aim for a CV of 6 to 8% for species tests, 10 to 12% for fertilizer
trials, and about 15% for insecticide or herbicide trials. Moreover, the CV for other characters can differ
greatly from that of production. For example, in experiments where the CV for production is about 10%,
that for tiller number can be expected to be about 20% and that for fish growth about 3%.
Comparisons Among Treatment Means
Introduction
- the significant F-test in the analysis of variance indicates the
presence of one or more real differences among the treatments
tested.
- It does not, however, locate the specific difference, or
differences, that may account for the significance.
- Thus, the analysis of variance should be considered only as the
first step in evaluating differences among treatments.
- The subsequent step is to locate the specific treatment
differences of interest to the researcher and test whether these
differences are significant.
- Thus, this topic is primarily concerned with how to locate and
test specific treatment comparisons.
Specific Comparison among treatment means can be arbitrarily classified into three types:
1. Comparison between pairs of treatment
2. Comparison between two groups of treatment
3. Trend comparison

Comparison between pairs of treatments

is the simplest, since each comparison involves only two treatments. Because of its
simplicity, this comparison is the most commonly used.

Comparison between two groups of treatment


involves classifying the treatments into meaningful groups. Each group may consist of one
or more treatments and comparison is made between the aggregate means. One example
of this is a comparison between the mean of all fertilized fishpond plots and the mean of
unfertilized fishpond plots; another is the comparison between the mean of several native
species of catfish and mean of several improved or newly introduced species.

Trend comparison
is limited to treatments that are quantitative, for example, rates of fertilizer and distances
of planting. Although the trend comparison is slightly more complicated, it can also
generate more information than can be obtained from the first two comparisons.
COMPARISON BETWEEN PAIRS OF MEANS

There are at least two ways of comparing pairs of treatment means.
1. Preplanned comparisons - based on pairs of treatments that have been
specifically planned and pinpointed before the start of the experiment.
Example: A common example is comparing each of the several
treatments to a control, as in comparing the unfertilized treatment to each
of the fertilized treatments

2. Unplanned comparisons - one where no specific comparison is
chosen in advance. Instead, all possible comparisons are evaluated to see
which differences may appear real. It is best to know which of these two
general comparisons is desired before deciding on the appropriate
statistical procedure.
- Several procedures for testing the differences between
treatment means are discussed by Steel and Torrie (1960) and
Snedecor and Cochran (1971).

- Two of the more commonly used procedures are the Least


Significant difference Test (LSD) and Duncan’s Multiple Range
Test.
Least Significant Difference Test (LSD)
- Because of its simplicity, the LSD test is probably the most common procedure for evaluating the
significance of difference between pairs of treatment means.

- All that is done is to locate any pair of means whose difference exceeds the LSD value and declare
the means significantly different from each other.

- Strictly speaking, however, this procedure is fully valid only when the experiment involves
just two treatments; as more treatments are involved and all possible pairs of treatment
means (unplanned comparisons) are tested, the LSD becomes less and less precise.

- Note that the number of all possible pairs increase very rapidly as the number of treatments
increases, i.e., 10 possible pairs for 5 treatments, 45 for 10 treatments, and 105 for 15 treatments.

-Furthermore, it can be shown that the probability of at least one comparison, the largest vs. the
smallest, exceeding the LSD value at 5% level of significance, when in fact, the difference is not
real, is 29% for 5 treatments, 63% for 10 treatments, and 83% for 15 treatments.

-This implies that when the experimenter thinks he is testing at the 5% level of significance, he is
actually testing at the 29% level of significance for 5 treatments, 63% for 10 treatments, and so on.
- Thus the LSD test can be, and often is, misused. Hence, special
care must be exercised in using this test. The following rules
have been found useful in the effort to use the LSD test
effectively:
1. Use the LSD test only when the F-test in the analysis of
variance is significant;
2. Do not use the LSD test for comparisons among all
possible pairs of means when the experiment involves more
than five treatments; and
3. Use the LSD test for preplanned comparisons regardless
of the number of treatments involved. For instance, in
comparing every treatment to a control, the LSD test can be used
even if there are more than five treatments.
Equal Replication
- when every treatment is replicated r times, the formula for calculating
the LSD value at a certain level of significance, say α, is

LSDα = tα √(2s²/r),

where tα is the tabular value of t at the α level of significance and
with the error degrees of freedom (Appendix 3), s² is the error mean
square from the analysis of variance, and r is the number of
replications.

To judge whether a difference between two treatment means is


statistically significant, compare the observed difference with the
computed LSD value. If the observed difference is larger than the LSD
value, the two treatment means are significantly different at α level of
significance. If the difference between any pair of means is smaller than
the LSD value, the two treatments are not significantly different.
Example: As an illustration, data from the insecticide test presented
in the preceding table will be used. In this instance, the LSD
test will be applied for the comparisons between the control and
each of the six insecticide treatments. This is a planned
comparison, hence, the use of the LSD test is appropriate
regardless of the number of treatments involved.

Information needed for the computation of LSD value is taken


from the analysis of variance . Using the formula, LSD values at
5% and 1% levels of significance are computed as

LSD0.05 = t0.05 √(2s²/r) = 2.080 √[2(94,773)/4]
        = 453 kg/ha, and

LSD0.01 = t0.01 √(2s²/r) = 2.831 √[2(94,773)/4]
        = 616 kg/ha.
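The two LSD values can be checked with a short computation, using the error mean square and tabular t values from the text:

```python
# LSD for equal replication: LSD_alpha = t_alpha * sqrt(2 * s^2 / r),
# with s^2 = 94,773 (error MS) and r = 4 replications.
import math

s2, r = 94_773, 4
se_diff = math.sqrt(2 * s2 / r)      # standard error of a difference
lsd_05 = 2.080 * se_diff             # tabular t at 5%, 21 error df
lsd_01 = 2.831 * se_diff             # tabular t at 1%, 21 error df
print(round(lsd_05), round(lsd_01))  # 453 and 616 kg/ha
```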
The observed difference between each of the treatment means from
the control mean is shown in Table below:

Table: Comparison between grain yield of each of the five insecticide


treatments and control using the LSD test.
_______________________________________________________
Treatments Treatment Mean (kg/ha) Difference from Control (kg/ha)
__________________________________________________________________________________
Dol-mix (1kg) 2,127 811**
Dol-mix (2 kg) 2,678 1,362**
DDT + y-BHC 2,552 1,236**
Azodrin 2,128 812**
Dimecron-Boom 1,796 480*
Dimecron-Krap 1,681 365ns
Control 1,316 ____
______________________________________________________________________________________________________________________________

** = significant at 1% level, * = significant at 5% level, ns = not significant.
To judge which differences are statistically significant, each
difference is compared with the computed LSD values. If it
exceeds the LSD value at the 1% level, two asterisks are used
to indicate that the difference is highly significant. If it
exceeds the LSD value only at the 5% level, one asterisk is
used. Otherwise, use ns to indicate that the difference is not
significant. Of the six insecticides, only the Dimecron-Krap
treatment was not significantly different from the control.
Others are significantly superior either at 5% level or at 1%
level.
Unequal replication

When treatments have different numbers of replications, the LSD
value for comparing any pair of treatment means, say between
the ith treatment and the jth, is computed as

LSDα = tα √[s²(1/ri + 1/rj)],

where ri and rj are the number of replications of the ith


treatment and the jth treatment, respectively.

Data from the weed-control trial shown in the previous tables
will be used as examples.
To compare a treatment with four replications with the control (having four
replications), LSD values at 5% and 1% levels of significance are computed
as follows:

LSD.05 = t.05 √[s²(1/4 + 1/4)]

       = 2.045 √[2(176,532)/4]

       = 608 kg/ha, and

LSD.01 = t.01 √[s²(1/4 + 1/4)]

       = 2.756 √[2(176,532)/4]

       = 819 kg/ha
To compare a treatment with three replications with the control, the following
are used:

LSD.05 = t.05 √[s²(1/3 + 1/4)]

       = 2.045 √[7(176,532)/12] = 656 kg/ha

LSD.01 = t.01 √[s²(1/3 + 1/4)]

       = 2.756 √[7(176,532)/12] = 884 kg/ha

The results are shown in the preceding tables. All treatments are significantly
different from the control at either the 5% or 1% level.
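The unequal-replication formula generalizes the earlier one; a sketch using s² = 176,532 and the tabular t values quoted above:

```python
# LSD for unequal replication: LSD_alpha = t_alpha * sqrt(s^2 * (1/ri + 1/rj)),
# with s^2 = 176,532 from the weed-control trial.
import math

s2 = 176_532

def lsd(t, ri, rj):
    return t * math.sqrt(s2 * (1 / ri + 1 / rj))

print(round(lsd(2.045, 4, 4)))  # 5% LSD, 4 replications vs control: 608
print(round(lsd(2.045, 3, 4)))  # 5% LSD, 3 replications vs control: 656
```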
Table. Grain yield of rice under different types, rates, and times of application of post-emergence
herbicides under upland-rainfed condition, IRRI.
Treatment
Type Rate1 (kg Time of a Grain Yields (kg/ha) Treatment Treatment
a.i./ha) Application2 Total Mean
(DAS)
Propanil/ 2.0/0.25 21 3,187 4,610 3,562 3,217 14,576 3,644
Bromoxynil
Propanil/ 3.0/1.00 28 3,390 2,875 2,775 9,040 3,013
2.4-D-Bee
Propanil/ 2.0/0.25 14 2,797 3,001 2,505 3,490 11,793 2,948
Bromoxyl
Propanil/ 2.0/0.50 14 2,832 3,103 3,448 2,255 11,638 2,910
Ioxynil
Propanil/ 3.0/1.50 21 2,233 2,743 2,727 7,703 2,568
CHCH
Phenyedi- 1.5 14 2,952 2,272 2,470 7,694 2,565
pham
Propanil/ 2.0/0.25 28 2,858 2,895 2,458 1,723 9,934 2,484
Bromoxyl
Propanil/2,4- 3.0/1.0 28 2,308 2,335 1,975 6,618 2,206
D-IPE
Propanil/ 2.0/0.50 28 2,013 1,788 2,248 2,115 8,164 2,041
Ioxynil
Handweed - 15 and 35 3,202 3,060 2,240 2,690 11,192 2,798
twice
Control - - 1,192 1,652 1,075 1,030 4,949 1,237
Type           Rate¹ (kg   Time of        Replications   Treatment      Difference From    LSD Values
               a.i./ha)    Application²   (no.)          Mean (kg/ha)   Control (kg/ha)    5%    1%
                           (DAS)
Propanil/      2.0/0.25    21             4              3,644          2,407**            608   819
Bromoxynil
Propanil/2,4-  3.0/1.00    28             3              3,013          1,776**            656   884
D-Bee
Propanil/      2.0/0.25    14             4              2,948          1,711**            608   819
Bromoxyl
Propanil/      2.0/0.50    14             4              2,910          1,673**            608   819
Ioxynil
Propanil/      3.0/1.50    21             3              2,568          1,331**            656   884
CHCH
Phenyedi-      1.5         14             3              2,565          1,328**            656   884
pham
Propanil/      2.0/0.25    28             4              2,484          1,247**            608   819
Bromoxyl
Propanil/2,4-  3.0/1.0     28             3              2,206          969**              656   884
D-IPE
Propanil/      2.0/0.50    28             4              2,041          804*               608   819
Ioxynil
Handweed       -           15 and 35      4              2,798          1,561**            608   819
twice
Control        -           -              4              1,237          -                  -     -

¹DAS = days after seeding; * = significant at 5% level; ** = significant at 1% level.


DUNCAN'S MULTIPLE RANGE TEST (DMRT)

- In contrast to the LSD test, in which only one value is computed for all comparisons, Duncan's multiple range
test (DMRT) prescribes a set of significant differences of increasing size, depending upon the distance between
the rankings of the two means to be compared.

- While additional computations are required for the DMRT, the test overcomes the major defect of the LSD test,
in that the DMRT can be used to test differences among all possible pairs of treatments regardless of the number
of treatments involved, and still maintain the prescribed level of significance.

- Steps in computing the DMRT:

Step 1. Arrange the treatment means in decreasing (or increasing) order.
Step 2. Calculate the standard error of a treatment mean, as follows:

Sx = √(s²/r)

Step 3. Calculate the shortest significant ranges for various ranges of means as follows:

Rp = (rp)(Sx)
where rp (p = 2, 3, ..., t) are the values of significant studentized ranges
obtained from Appendix 6 based on the error degrees of freedom, and t is the
number of treatments.
Step 4. Group the treatment means according to the statistical significance.
For this, the following method may be used:
1. From the largest mean, subtract the “shortest significant range” of the
largest p. Declare all means less than this value significantly different from
the largest mean. For the remaining means not declared significantly
different, compare the range (i.e., difference between the largest and the
smallest) with appropriate Rp . If the range is smaller than its corresponding
Rp, all remaining means are not significantly different.
2. From the second largest mean, subtract the second largest Rp.
Declare all means less than this value significantly different from the second
largest mean. Then, compare the range of the remaining means with the
appropriate Rp.
3. Continue the process with the third largest mean, then the fourth, and
so on, until all means have been properly compared.
The steps in the computation of Duncan’s Multiple Range Test will be
illustrated using the previous data we had discussed before.
Step 1. Arrange the means in decreasing order as follows:
Treatment              Mean yield (kg/ha)   Rank
T2: Dol-mix (2 kg)     2,678                1
T3: DDT + γ-BHC        2,552                2
T4: Azodrin            2,128                3
T1: Dol-mix (1 kg)     2,127                4
T5: Dimecron-Boom      1,796                5
T6: Dimecron-Krap      1,681                6
T7: Control            1,316                7

Step 2. Calculate the standard error of a treatment mean as:

Sx = √(s²/r) = √(94,773/4) = 153.93 kg/ha


Step 3. From appendix 6 with error d.f. of 21, obtain the values of “significant studentized
ranges” for the 5% level of significance as follows:

p rp (0.05)
2 2.94
3 3.09
4 3.18
5 3.25
6 3.30
7 3.33
Then, compute the “shortest significant ranges” using formula from step 3. as follows:
p Rp = rp sx
2 (2.94)(153.93) =453
3 (3.09)(153.93) = 476
4 (3.18)(153.93) = 490
5 (3.25)(153.93) = 500
6 (3.30)(153.93) = 508
7 (3.33)(153.93) = 513
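Steps 2 and 3 can be sketched in a few lines; note that Rp for p = 4 rounds to 489 here rather than the 490 in the text, because the text rounds Sx to 153.93 first:

```python
# DMRT shortest significant ranges: Rp = rp * Sx, with Sx = sqrt(s^2 / r)
# and the significant studentized ranges rp (21 error df, 5% level).
import math

s2, r = 94_773, 4
sx = math.sqrt(s2 / r)               # standard error of a treatment mean
rp_values = {2: 2.94, 3: 3.09, 4: 3.18, 5: 3.25, 6: 3.30, 7: 3.33}
Rp = {p: round(rp * sx) for p, rp in rp_values.items()}
print(round(sx, 2))  # 153.93 kg/ha
print(Rp)            # shortest significant ranges for p = 2 .. 7
```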
Step 4. The following steps should be followed in grouping the means according to
statistical significance.

a. From the largest treatment mean of 2,678 kg/ha, subtract the largest Rp (i.e., R7
= 513) and obtain the difference of 2,165 kg/ha. Declare all treatment means less
than 2,165 kg/ha significantly different from the largest mean. From the array of
means shown in Step 1, all treatments except T3 are significantly different from T2.
Since there are two remaining means, namely T2 and T3, their difference of 2,678 –
2,552 = 126 kg/ha is compared to R2 = 453. Since the difference is smaller than R2,
T2 and T3 are not significantly different.
Draw a vertical line connecting the means of T2 and T3 as follows:
Treatment Mean yield (by rank)
T2 2,678
T3 2552
T4 2128
T1 2127
T5 1796
T6 1681
T7 1316
- The vertical line is a convenient and widely accepted way of marking
differences among treatment means. Note that all means connected by the
same line have been judged not significantly different from each other.
2. From the second largest mean (T3) of 2,552 kg/ha, subtract the second
largest Rp (i.e., R6): 2,552 – 508 = 2,044. Declare all means less than 2,044
kg/ha significantly different from the mean of T3. Here, the means of T5, T6,
and T7 are all less than 2,044 kg/ha and hence are significantly different from
T3.
- The range of the remaining means that were not declared significantly
different is T3-T1 = 2,552 – 2,127 = 425 which is less than R3=476. Hence, all
remaining means, namely, T3, T4, and T1, are not significantly different. Draw a
vertical line connecting the means of T3, T4, and T1 as follows:
Treatment Mean
T2 2,678
T3 2,552
T4 2,128
T1 2,127
T5 1,796
T6 1,681
T7 1,316
3. Here the process can be continued with the fourth largest mean and so on.
However, since T7 is, at this stage, the only mean outside of the groupings already
made, it is simpler just to compare T7 with the rest of the means (namely, T1, T5,
and T6) using the appropriate Rp. The comparisons are:

T1 – T7 = 2,127 – 1,316 = 811 > R4 = 490


T5 – T7 = 1,796 – 1,316 = 480 > R3 = 476, and
T6 – T7 = 1,681 – 1,316 = 365 < R2 = 453
The only difference that is not significant is between T6 and T7. Hence, these two
means are connected by a common vertical line as follows:
Treatment Mean
T2 2,678
T3 2,552
T4 2,128
T1 2,127
T5 1,796
T6 1,681
T7 1,316
In the presentation of results, the use of vertical lines to indicate significant
differences is appropriate only when the treatment means are arranged according to
rank. Otherwise letters should be substituted for the lines. The letter designation
means that any two means having at least one common letter are not significantly
different. Line designation is transferred to letter designation simply by replacing
each line with a letter. In this instance, there is a total of four lines. Hence, four
letters (a,b,c, and d) are used.

Table. DMRT results for the grain yield data

Treatment			Treatment mean (kg/ha)		Statistical significance
T1 Dol-mix (1kg) 2,127 bc
T2 Dol-mix (2kg) 2,678 a
T3 DDT+γ-BHC 2,552 ab
T4 Azodrin 2,128 bc
T5 Dimecron-Boom 1,796 c
T6 Dimecron-Krap 1,681 d
T7 Control 1,316 d
ONE FACTOR ANALYSIS OF VARIANCE
The one-factor analysis of variance (One-Way ANOVA) is
used to test for differences in an intervally
scaled dependent variable across another variable with
three or more categories.

It is called one-factor because one dependent
variable is tested against one independent
variable with more than two categories.
For example, if the researcher wishes to
determine a significant difference in the NMAT
performance of BS Biology students as to
their socioeconomic status (high, average,
low), the one-way analysis of variance will be
used in the analysis of data.

It is a method for dividing the total variation
(the sources of variation) into different
parts, such as the between-groups and within-
groups variation, in order to obtain
the F-ratio.
ANOVA is a useful tool which helps the user to
identify sources of variability from one or
more potential sources, sometimes referred to
as “treatments” or “factors.”

It is a parametric statistic that allows us to test
the effect of various variables or factors on
certain outcomes.
Procedures in the Computation of the One-Way
Analysis of Variance

Step 1. Determine the correction factor by using
the equation:

C.F. = (Grand Total)² / Total Number of Observations = (GT)² / nt
Step 2. Determine the total sum of squares:

SStotal = ∑(Xi)² – C.F.

where: SStotal = the sum of the squared x-values
minus the correction factor
Xi = the x-values
C.F. = the correction factor
Step 3. Compute the Sum of Squares between
groups:

SSBG = ∑(Ti)²/r – C.F.

where: Ti = total of the individual categories
r = the number of observations per category


Step 4. Compute the Sum of Squares within
groups:

SSWG = SStotal – SSBG

Step 5. Compute the degrees of freedom:

dfBG = k – 1, where k = the number of
categories of the variable; e.g., SES has three
levels, so 3 – 1 = 2 is the degrees of freedom
between groups

dfWG = ntotal – k; if the total number of observations
is 36 and there are 3 levels of SES,
36 – 3 = 33 is the degrees of freedom
within groups

dftotal = ntotal – 1 (in this case, if the total number
of observations is 36, then 36 – 1 = 35 is the
total degrees of freedom)
Step 6. Compute the mean squares by using
the equations:

MSBG = SSBG / dfBG

where: MSBG = the mean square between groups
SSBG = the sum of squares between groups
dfBG = the degrees of freedom between groups

MSWG = SSWG / dfWG

where: MSWG = the mean square within groups
SSWG = the sum of squares within groups
dfWG = the degrees of freedom within groups
Step 7. Compute the F-ratio by using the
equation:

F-ratio = MSBG / MSWG

The computed sums of squares will be entered
in the standard table for One-Way Analysis of
Variance:
Sources of Variation	Sum of Squares	DF	Mean Squares	F-ratio	F-Prob

Between Groups
Within Groups

Total
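The seven steps above can be sketched as a short Python program. The three groups and their values below are made up purely for illustration; only the formulas follow the procedure in the text.

```python
# Hypothetical data: three made-up groups of three observations each.
groups = {
    "low":     [1, 2, 3],
    "average": [2, 3, 4],
    "high":    [6, 7, 8],
}

observations = [x for g in groups.values() for x in g]
nt = len(observations)                      # total number of observations

# Step 1: correction factor, C.F. = (GT)^2 / nt
grand_total = sum(observations)
cf = grand_total ** 2 / nt

# Step 2: total sum of squares, SStotal = sum(Xi^2) - C.F.
ss_total = sum(x ** 2 for x in observations) - cf

# Step 3: between-groups sum of squares, SSBG = sum(Ti^2)/r - C.F.
# (every group has the same size r here)
r = len(next(iter(groups.values())))
ss_bg = sum(sum(g) ** 2 for g in groups.values()) / r - cf

# Step 4: within-groups sum of squares
ss_wg = ss_total - ss_bg

# Step 5: degrees of freedom
k = len(groups)
df_bg = k - 1
df_wg = nt - k

# Step 6: mean squares
ms_bg = ss_bg / df_bg
ms_wg = ss_wg / df_wg

# Step 7: F-ratio = MSBG / MSWG
f_ratio = ms_bg / ms_wg
print(f"SSBG = {ss_bg}, SSWG = {ss_wg}, F = {f_ratio}")
```

The computed F-ratio would then be compared with the tabular F-value at (dfBG, dfWG) degrees of freedom.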
The Chi-Square Test
Measures of relationships on ordered sets of data on two or more nominal or
ordinal variables can be obtained with the Chi-Square Test for a one-sample
case, the Chi-Square Test for two-sample cases, the tetrachoric correlation, the
phi correlation, the rank biserial correlation, and the point biserial correlation.

The Chi-Square Test is used to determine the strength of association between
two nominal variables. The Chi-Square Test is of two types: the Chi-Square
Test in a One Sample Case, and the Chi-Square Test in Two Sample
Cases.

This type of test is only applicable for non-parametric testing of
relationships of variables with nominal data. The Chi-Square Test has the
general equation shown below:

X² = ∑ (fo – fe)² / fe

where:

X2 = the Chi-Square value


fo = the actually observed frequencies
fe = the expected frequencies
Chi-Square Test in A One Sample Case

The Chi-Square Test in a One Sample Case is used to determine whether a
significant difference exists between the observed
frequency distribution and the expected frequency distribution.

It is a statistical test commonly used to compare observed data with data we
would expect to obtain according to a specific hypothesis.

For example, if, according to Mendel’s laws, you expected 10 out of 20
offspring from a cross to be male and the actual observed number was 8 males,
then you might want to know about the “goodness of fit” between the observed
and the expected. Were the deviations (differences between observed and
expected) the result of chance, or were they due to other factors? How much
deviation can occur before you, the investigator, must conclude that something
other than chance is at work, causing the observed to differ from the expected?
The Chi-Square Test is always testing what scientists call
the null hypothesis, which states that there is no
significant difference between the expected and the
observed result.

For example, consider a survey conducted by the
University Student Council on “Would you like to have
a torch parade during the eve of the University Day
Celebration?”. In this case, the investigator would like
to determine whether or not there is a significant
difference between the observed and the expected
frequencies. The data of the responses are tabulated as
follows:
		Favor	Undecided	Not in Favor	Total
fo		28	8		15		51
fe		17	17		17		51
fo – fe		11	-9		-2
(fo – fe)²	121	81		4
(fo – fe)²/fe	7.12	4.76		0.24

∑ (fo – fe)²/fe = 12.12

_______________________________________________________________________
Calculated X² Value		Tabular X² Value
				0.05		0.01
_______________________________________________________________________
12.12				5.99		9.21
_______________________________________________________________________
p<.01 Significant at .01 alpha		d.f. = k – 1; 3 – 1 = 2
Since the calculated Chi-Square value of 12.12 is
greater than the tabular value of 9.21 at .01 level of
probability, the null hypothesis that students do not
differ in their response was rejected, and the
alternative hypothesis that students differ in their
responses was accepted.

This means that students were agreeable to the
holding of a torch parade during the eve of the
University Day Celebration.
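The goodness-of-fit computation above can be sketched in Python using the survey counts. Keeping each cell term at full precision gives the same chi-square value of 12.12.

```python
# Chi-square goodness of fit for the torch-parade survey data
observed = {"Favor": 28, "Undecided": 8, "Not in Favor": 15}
n = sum(observed.values())                  # 51 respondents

# Under the null hypothesis every response is equally likely,
# so each expected frequency is n / 3 = 17.
expected = {k: n / len(observed) for k in observed}

chi_square = sum(
    (observed[k] - expected[k]) ** 2 / expected[k] for k in observed
)
df = len(observed) - 1                      # k - 1 = 2

print(f"X^2 = {chi_square:.2f} with {df} d.f.")
```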
The Chi-Square Test in Two Sample Cases
The Chi-Square Test for two sample cases will be used only in
determining the degree of association or determining
significant differences between two variables with two or
more categories.

The data to be used in this kind of statistical tool should be


nominal for both of the variables.

The Chi-Square Test in two sample cases can be used to test either
significant association or significant differences between two variables
with two or more categories.
Constraints in Using the Chi-Square Test
The statistician or the researcher must be cautious in
using the Chi-Square Test. In testing the independence or
relatedness of variables using Chi-Square in two sample
cases, the following precautions must be
considered.

1. When the degree of freedom is equal to one (df = 1), and
the contingency table has a cell frequency of less than 5,
apply Yates’ correction formula (or, alternatively, use
Fisher’s Exact Test). Yates’ correction formula is shown
below:

X² = ∑ (|fo – fe| – 0.50)² / fe
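A minimal sketch of Yates’ correction is shown below, using a hypothetical 2 x 2 table whose counts are invented purely for illustration.

```python
# Yates-corrected chi-square for a hypothetical 2x2 table
# (made-up counts, chosen only to illustrate the formula).
observed = [
    [10, 20],
    [20, 10],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / grand_total
        # Yates' correction subtracts 0.5 from |fo - fe| before squaring
        chi_square += (abs(fo - fe) - 0.5) ** 2 / fe

print(f"Yates-corrected X^2 = {chi_square:.2f}")   # df = 1 for a 2x2 table
```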
The design of the Chi-Square Test in two sample cases depends on the number of
categories of each variable. For example, in the example shown below,
in which teachers’ annual salary has three categories and their
educational attainment also has three categories, it is called a 3 x 3
Chi-Square Test in two sample cases. The data for computing the Chi-
Square Test in Two Sample Cases:
Annual Salary			Teachers’ Educational Attainment
			Bachelor’s Degree	Master’s Degree		Doctoral Degree		Total
High (Php120,000
and above)		5			40			45			90
Average (Php72,000
to Php119,999)		10			20			45			75
Low (Below
Php72,000)		20			8			8			36
Total			35			68			98			201
Steps in the process of computing the calculated Chi-Square
value:
Step 1. Examine the cell frequencies in the table and
determine if there are zero cell frequencies, if the degrees of
freedom are equal to one, or if there is a cell frequency of less than five.
This process determines the appropriate tool for the
analysis of data based on the constraints on when to use the Chi-
Square Test.
Step 2. Since, by inspection of the contingency table, the degrees
of freedom are greater than one and there is no zero cell frequency and no
cell frequency of less than 5, the standard Chi-Square formula
will be used.
Step 3. Designate the corresponding cells by the use of
letter symbols: the cell for bachelor’s degree with high salary rate is
cell a, the cell for master’s degree with high salary rate is
cell b, and so on.
Step 4. Compute the expected cell frequencies by using the
equation:

Expected Frequency (fe) = (row total)(column total) / Grand Total

For example, in cell a, the row total is 90, the column total is 35, and the grand total is 201. Compute the expected
frequencies for the cells as follows:

Cell a = (90)(35)/201 = 15.67
Cell b = (90)(68)/201 = 30.45
Cell c = (90)(98)/201 = 43.88
Cell d = (75)(35)/201 = 13.06
Cell e = (75)(68)/201 = 25.37
Cell f = (75)(98)/201 = 36.57
Cell g = (36)(35)/201 = 6.27
Cell h = (36)(68)/201 = 12.18
Cell i = (36)(98)/201 = 17.55
Step 5. Construct a table showing the column of cells, the column of the observed
frequencies (fo), the column of the expected frequencies (fe), the column of the
differences between the observed and the expected
frequencies (fo – fe), and the column of the squared differences between the observed
and expected frequencies divided by the expected frequency, (fo – fe)²/fe.

Cells	fo	fe	(fo – fe)	(fo – fe)²	(fo – fe)²/fe
a	5	16	-11		121		7.56
b	40	30	10		100		3.33
c	45	44	1		1		0.02
d	10	13	-3		9		0.69
e	20	25	-5		25		1.00
f	45	37	8		64		1.73
g	20	6	14		196		32.67
h	8	12	-4		16		1.33
i	8	18	-10		100		5.56

∑ (fo – fe)²/fe = 53.89
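Steps 4 and 5 can be sketched in Python. Note that rounding the expected frequencies to whole numbers, as done above, yields 53.89; keeping fe at full precision gives a chi-square of about 50.80. Either way the value far exceeds the tabular values used later, so the conclusion is unchanged.

```python
# Steps 4-5 for the 3x3 salary-by-education table, keeping the
# expected frequencies at full precision instead of rounding.
observed = [
    [5, 40, 45],   # High salary
    [10, 20, 45],  # Average salary
    [20, 8, 8],    # Low salary
]

row_totals = [sum(row) for row in observed]          # [90, 75, 36]
col_totals = [sum(col) for col in zip(*observed)]    # [35, 68, 98]
grand_total = sum(row_totals)                        # 201

chi_square = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        # Step 4: fe = (row total)(column total) / grand total
        fe = row_totals[i] * col_totals[j] / grand_total
        # Step 5: accumulate (fo - fe)^2 / fe
        chi_square += (fo - fe) ** 2 / fe

df = (len(observed) - 1) * (len(observed[0]) - 1)    # (3-1)(3-1) = 4
print(f"X^2 = {chi_square:.2f} with {df} d.f.")
```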

Step 6. Interpret the results by following the steps in testing a research hypothesis.
Using the variables in the given data, follow the analysis by practicing the
seven (7) steps in testing the hypothesis.

1. Statement of the problem: is there any significant association between
the teachers’ annual salary rate and their educational attainment? Or, are
teachers’ salary rates associated with their educational attainment?
2. Statement of the hypothesis:
Null hypothesis (Ho): there is no significant association between the
teachers’ annual salary rate and their educational attainment. Or, salary
rates are not associated with their educational attainment.
Alternative hypothesis (H1): there is a significant association between the
teachers’ annual salary rate and their educational attainment. Or, salary
rate is associated with their educational attainment.
3. Determine the level of measurement of variables:

Annual salary rate:	1 – High (Php120,000.00 and above)
			2 – Average (Php72,000.00 to Php119,999.00)
			3 – Low (Below Php72,000.00)

Educational Attainment:
			1 – Bachelor’s degree
			2 – Master’s degree
			3 – Doctoral degree
Variable 1 (annual salary rate): the level of measurement is nominal.
Variable 2 (educational attainment): the level of measurement is also
nominal. The statistical tool to be used is the Chi-Square Test in Two
Sample Cases.
4. Set the level of significance: the level of significance will be set at 0.05
alpha under a two-tailed test.

5. Test the hypothesis. The research hypothesis is tested by computing
the calculated Chi-Square value, which is

∑ (fo – fe)²/fe = 53.89
6. Make a decision on the null hypothesis by using the table below:
_____________________________________________________________
Calculated X² value		Tabular X² value
				0.05		0.01
_____________________________________________________________
53.89				9.49		13.28
_____________________________________________________________
p<.01 Significant at .01 alpha		d.f. = (r – 1)(c – 1) = 4


Test of Differences Between Means
In parametric testing, significant differences among variables can be
determined by either t-test or F-test and the dependency or relationships
among intervally scaled variables can be determined by the use of a Pearson’s
r.

But in non-parametric testing, the significant differences among variables can
be determined by the use of the Mann-Whitney U Test, the Wilcoxon Matched-
Pairs Test, and the Kruskal-Wallis Test, and the relationships among variables
can be determined by the use of Spearman’s rho, the Chi-Square Test, Fisher’s
Exact Test, and the Kolmogorov-Smirnov Test.
The t-test is the most commonly used method to evaluate the
differences in means between two groups. There are two types
of t-test:
1) the t-test for one sample case, and
2) the t-test for two sample cases.

- The t-test for one sample case is used in determining the
significant difference between the sample mean and a
standard mean.

- The t-test for two sample cases, is of two types: (a) the t-test
for independent samples, and (b) the t-test for a dependent
sample.
- The t-test for an independent sample is used when the two data sets are
taken from two distinct groups, while the t-test for a dependent
sample is used when the two data sets are taken from the same group.

Sample computation in the use of the t-test for a one sample case:

Data: sample mean = 10 mg/i.u.; standard mean = 15 mg/i.u.;
standard deviation of the sample = 4.58; n = 100.

Compute the standard error of the mean:

SEm = sd/√n = 4.58/√100 = 4.58/10 = 0.458

Then compute the t-ratio:

t-ratio = (µ – x̄)/SEm = (15 – 10)/0.458 = 5/0.458 = 10.92
_______________________________________________
Calculated t-value		Tabular t-value
				.05		.01
_______________________________________________
10.92				1.98		2.62
_______________________________________________
df = n – 1 = 99; p<.01 significant at .01 alpha
Since the calculated t-value of 10.92 is greater than the tabular t-value of 2.62 at .01 alpha, the
null hypothesis that there is no significant difference between the standard mean
and the sample mean was rejected. This means that the sample
mean is significantly different from the standard mean. The patients who are cigarette
smokers will be advised to withdraw slowly from smoking, or else complications of high
blood pressure and cancer will occur and their health will generally be affected.
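The computation above can be sketched in Python, using the sample mean, standard mean, standard deviation, and sample size from the example.

```python
import math

# One-sample t-ratio for the worked example
sample_mean = 10.0      # mg/i.u.
standard_mean = 15.0    # mg/i.u.
sd = 4.58               # standard deviation of the sample
n = 100                 # sample size

# Standard error of the mean: SEm = sd / sqrt(n)
sem = sd / math.sqrt(n)             # 0.458

# t-ratio = (standard mean - sample mean) / SEm
t_ratio = (standard_mean - sample_mean) / sem

print(f"SEm = {sem:.3f}, t = {t_ratio:.2f}")
```

The computed t-ratio would then be compared with the tabular t-value at the chosen alpha level.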
