3 Biometry For Abg-730


INTRODUCTORY BIOSTATISTICS
BIOSTATISTICS:
Application of statistical methods to the solution of biological problems is called
biostatistics. It is also called biological statistics or simply biometry. In this definition, the
term "statistics" can be defined as "the scientific study of numerical data based on variation
in nature". The word statistics is also used in another, related sense; as the plural of the noun
statistic, which refers to any one of many computed or estimated statistical quantities, such
as the mean, the standard deviation, or the correlation coefficient. Each one of these is a
statistic. All parts of this definition are important and deserve emphasis.
Science is said to find meaningful simplicity in the midst of disorderly complexity. Thus
while pursuing a scientific study, we are concerned with the commonly accepted criteria of
validity of scientific evidence. Objectivity in presentation and in evaluation of data and the
general ethical code of scientific methodology must constantly be followed.

Statistics generally deals with populations or groups of individuals; hence it deals with
quantities of information, not with a single datum. Thus the measurement of a single animal
or the response from a single biochemical test will generally not be of interest; unless a
sample of animals is measured or several such tests are performed, statistics ordinarily can
play no role. Unless the data of a study can be quantified in one way or another, they are not
amenable to statistical analysis. The data can be measurements (the length and width of a
structure or the amount of a chemical in a body fluid) or counts (the number of bristles or
teeth or frequencies of different qualitative states such as mutant eye colours).

The term ‘natural variation’ is used to include all those events that happen in animate and
inanimate nature not under the direct control of the investigator, plus those that are evoked
by the scientist and are partly under his or her control, as in an experiment. Different
biologists concern themselves with different levels of variation; other kinds of scientists
with yet different levels; but all agree that variation in the chirping of sparrows, the number
of peas in a pod, and the age at maturity in chickens is natural. The heartbeat of rats in
response to adrenaline or the mutation rate in maize after irradiation may still be considered
natural, even though the researcher has manipulated the phenomenon in an experiment. The
average biologist, however, would not consider variation in the number of colour TV sets
bought by persons in different cities in a given year to be natural, although sociologists or
human ecologists might, thus deeming it worthy of study. The qualification "variation in
nature" is included in the definition of statistics largely to make certain that the phenomena
studied are not arbitrary ones that are entirely under the will and control of the researcher,
such as the number of animals employed by the experimenter.

Scientists deal with the observation and classification of facts. They must also be able to
observe an event or set of events as the result of a plan or design; such planned experiments are
thus the substance of scientific method. The quantity whose numerical value is recorded for each outcome of
an experiment is known as a variable. A variable can be represented by a symbol such as X, Y or Z
for notational convenience. The symbol X thus stands for something that varies, and
is sometimes called a variate, a variable or a chance (random) variable.
To represent more than one observation on a particular variable, an index or subscript is
often used. For example, the birth weight (a variable) of the first lamb can be represented
by $X_1$, of the second lamb by $X_2$, of the third lamb by $X_3$, and of the ith lamb by $X_i$. Another
characteristic of these lambs may be their height, which can be represented by $Y_1, Y_2, Y_3, \ldots, Y_i$.

msk

Observations or variables can be either quantitative (amount of milk produced daily by a
cow, number of WBCs in a microlitre of milk, CGPA of students in a class, etc.) or
qualitative (colours of crossbred cows, marital status, traditional seasons in a year, etc.).

CENTRES (measures of central tendency)


One of the first things we usually want to know about a set of data, such as the milk production
of buffaloes at a farm or the number of cases of milk fever reported at a veterinary hospital, is
"about how much" or "about how many". When we ask "about how much" milk, we probably
want to know the value of milk yield in the middle or centre of the data; when we ask "about how many"
cases, we likewise want a central, typical count. The centre of a data set can be
defined in many different ways, but the most common definition is the average or mean.
Mean:
To find the average or mean of a data set, add all the values and divide by the number
of values in the data set. If a set consists of N data values $X_1, X_2, \ldots, X_N$, the mean is

$$\bar{X} = \frac{\text{sum of values}}{N} = \frac{\sum_{i=1}^{N} X_i}{N}$$
Median:
Another way of looking at the idea of the centre is to find the middle value of the data set
(think of the median strip on a motorway that divides the total roadway in half). An easy
example would be to find the median of the marks obtained by the students in a Nutrition
class. Suppose there are 25 students in this class. When the scores are arranged from highest to
lowest, suppose the 13th value from the top is 62%; this is the median. If there were 24
students, the average of the 12th and 13th scores would be the median. So, with the values in order:
If N is odd, the median is the value in position (N+1)/2.
If N is even, the median is the average of the values in positions N/2 and (N/2)+1.

Mode:
The most frequent or most common value in a data set is called the mode. Suppose we go
shopping for a pair of sneakers. Glancing around the shopping mall, we notice a
lot of price tags showing Rs. 500. We would probably neither add all the
prices to find the mean nor list them to find the median; we would be more likely to say
that sneakers in the shopping mall cost around Rs. 500.

Each of these three measures of central tendency has advantages and disadvantages,
depending on the situation. The mean, however, is the most commonly used statistic.
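The three centres can each be computed in a few lines. The following is a minimal Python sketch (the notes themselves use no software, so the language and function names are our own choices), and the sneaker prices are made-up numbers for illustration only. Sorting in ascending rather than descending order does not change the median.

```python
from collections import Counter


def mean(values):
    """Arithmetic mean: sum of the values divided by their number N."""
    return sum(values) / len(values)


def median(values):
    """Median by the positional rule; positions are counted from 1 as in the text."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]                  # N odd: position (N + 1)/2
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2    # N even: positions N/2 and N/2 + 1


def modes(values):
    """Every value that occurs most often (a data set can have more than one mode)."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]


prices = [500, 450, 500, 650, 500, 800, 450]  # hypothetical sneaker prices (Rs.)
print(mean(prices), median(prices), modes(prices))  # 550.0 500 [500]
```

Returning a list from `modes` is a deliberate choice: a data set can be bimodal, in which case a single "mode" is not defined.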

SPREADS (measures of variation)


There are many measures of variation. One very simple measure is the range, the
difference between the two extremes (largest and smallest values) in the data set, for example the
heights of the tallest and shortest individuals in a class. Quartiles and percentiles are also frequently used.
One of the most frequently used measures of spread is the standard deviation (and
its square, the variance).
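Standard deviation:
The standard deviation is the square root of the average squared deviation of the observations from
their mean (σ for the population and S for the sample). For a sample, the sum of squared deviations
is divided by n − 1 rather than n:

$$S = \sqrt{\frac{\sum_{i=1}^{n}(X_i-\bar{X})^2}{n-1}} = \sqrt{\frac{\sum X_i^2 - \left(\sum X_i\right)^2/n}{n-1}}$$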

February temperatures of five cities are given below. Find the standard deviation of the
temperature.

City          Temperature (°C)   Deviation from mean   Squared deviation
              Xi                 (Xi − X̄)              (Xi − X̄)²
Islamabad      8                  8 − 11.8 = −3.8       14.44
Lahore        13                 13 − 11.8 =  1.2        1.44
Faisalabad    14                 14 − 11.8 =  2.2        4.84
Muzafarabad    8                  8 − 11.8 = −3.8       14.44
Karachi       16                 16 − 11.8 =  4.2       17.64
Total         59                                        52.80

Mean temperature $\bar{X}$ = 59/5 = 11.8 °C

Standard deviation $S = \sqrt{52.80/(5-1)} = \sqrt{13.2} = 3.63$ °C
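As a check on the arithmetic above, this Python sketch computes S for the five February temperatures using both the definitional and the shortcut (machine) forms of the sum of squares. The function names are our own, chosen for illustration.

```python
import math


def ss_definitional(values):
    """Sum of squared deviations from the mean: sum((X - X_bar)^2)."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values)


def ss_shortcut(values):
    """Shortcut form of the same sum: sum(X^2) - (sum X)^2 / n."""
    return sum(x * x for x in values) - sum(values) ** 2 / len(values)


def sample_sd(values):
    """Sample standard deviation S, with divisor n - 1."""
    return math.sqrt(ss_definitional(values) / (len(values) - 1))


temps = [8, 13, 14, 8, 16]  # February temperatures (°C) of the five cities in the table
print(round(ss_definitional(temps), 2), round(ss_shortcut(temps), 2))  # 52.8 52.8
print(round(sample_sd(temps), 2))                                       # 3.63
```

The shortcut form avoids computing every deviation, which is why it is the one usually used on a calculator; both give the same sum of squares.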

Variance:
The square of the standard deviation is called the variance (σ² for the population and S² for the
sample). It is widely used as an expression of variability because of the additive nature of its
components. By a technique called ‘analysis of variance’, the total phenotypic variance ($\sigma^2_P$)
expressed by a given trait in a population can be statistically partitioned into components such as genetic
variance ($\sigma^2_G$) and non-genetic (environmental) variance ($\sigma^2_E$).

RELATIONSHIP BETWEEN TWO VARIABLES

We have so far been dealing with statistics such as the mean, range and standard deviation,
where one variable was considered at a time. In real life, however, we are often concerned with more
than one variable at a time. For example, we may wish to know the relationship between the amount
of a certain type of ration and weight gain in lambs, or between the number of somatic cells in milk and milk
quality (say, casein percentage). The statistic often used to describe how two variables vary together is
the covariance: the average cross-product of the deviations of the two variables from their
respective means. The formula for the sample covariance is

$$\mathrm{Cov}(XY) = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{n-1} = \frac{\sum X_iY_i - \left(\sum X_i\right)\left(\sum Y_i\right)/n}{n-1}$$

where $\bar{X}$ and $\bar{Y}$ are the sample means and n − 1 is the degrees of freedom. For the population covariance
($\sigma_{XY}$) the divisor would be N, as in the formula for the population variance. Let's take a
hypothetical example to calculate the covariance between height (X) and weight (Y) scores.

Example 1.
Height (X)   Weight (Y)   (Xi − X̄)(Yi − Ȳ)
4            12           (−2)(−4) = 8
5            14           (−1)(−2) = 2
6            16           (0)(0)   = 0
7            18           (+1)(+2) = 2
8            20           (+2)(+4) = 8

ΣXi = 30     ΣYi = 80     Σ(Xi − X̄)(Yi − Ȳ) = 20
n = 5        n = 5
X̄ = 6        Ȳ = 16

Cov(XY) = 20/4 = 5.0
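Both covariance formulas can be checked with a short Python sketch; the function names are illustrative, not from any library. On Example 1 both return Cov(XY) = 5.0, and the same functions can be used to check Example 2.

```python
def covariance(x, y):
    """Sample covariance: sum of cross-products of deviations divided by (n - 1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sp / (n - 1)


def covariance_shortcut(x, y):
    """Same quantity via [sum(XY) - sum(X) * sum(Y) / n] / (n - 1)."""
    n = len(x)
    sp = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    return sp / (n - 1)


height = [4, 5, 6, 7, 8]       # X, Example 1
weight = [12, 14, 16, 18, 20]  # Y, Example 1
print(covariance(height, weight), covariance_shortcut(height, weight))  # 5.0 5.0
```

For Example 2, pass the mouse lengths and weights in place of `height` and `weight`; the two functions should agree to rounding error.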


Example 2. Pl. calculate the covariance between the following data on length (from nose to tail) and
weight of 8 laboratory mice. X represents length (cm) and Y represents weight (g). Pl. use
both formulas.

Mice 1 2 3 4 5 6 7 8
X 16 15 20 13 15 17 16 21
Y 32 26 40 27 17 38 34 43

Regression
The linear relationship between two variables is usually expressed as the covariance
between the two variables (X and Y) relative to the variance of one variable (say X);
this quantity is known as the regression coefficient or slope. For the population it is denoted by beta
(β), while for samples b is used. Variable names are used as subscripts to show the dependent and
independent variables. For the regression of Y on X, the formula is:

$$b_{YX} = \frac{\mathrm{Cov}(XY)}{\mathrm{Var}(X)} = \frac{\sum(X_i-\bar{X})(Y_i-\bar{Y})/(n-1)}{\sum(X_i-\bar{X})^2/(n-1)} = \frac{\sum(X_i-\bar{X})(Y_i-\bar{Y})}{\sum(X_i-\bar{X})^2} = \frac{\sum X_iY_i - \left(\sum X_i\right)\left(\sum Y_i\right)/n}{\sum X_i^2 - \left(\sum X_i\right)^2/n}$$

In Example 1, the value of $b_{YX}$ is 20/10 = 2.0, which means that the Y variable
increases by 2.0 units per unit increase in the X variable. If height is measured in inches
and weight in pounds, the rate of increase in weight is 2.0 pounds per inch
increase in height. We can also calculate $b_{XY}$ (the regression of X on Y) = 20/40 = 0.5. It means
that X increases by 0.5 units per unit increase in Y, i.e. height changes by 0.5 inches for
every pound increase in body weight.
If X is thought to be responsible for the variation in Y (the response variable), we can
predict the value of Ŷ (Y-hat) using the regression coefficient and the mean values of X and Y.
Such an equation is called a regression equation or prediction equation:

$$\hat{Y} = a + b_{YX}X_i$$

where a (α for the population) is the intercept on the Y axis, i.e. the value of Ŷ when X = 0, and

$$a = \bar{Y} - b_{YX}\bar{X}$$

In Example 1, a = 16 − (2)(6) = 4.0. The regression equation above is similar to the familiar expression for
converting Celsius to Fahrenheit (°F = 32 + (9/5)°C), where 32 is the value of °F at 0 °C. A graph
of the regression equation makes this clearer.
The regression coefficient may have any value between −∞ and +∞. A positive value of the
regression coefficient means that Y (the dependent variable) increases per unit increase in X
(the independent variable); a negative value gives the rate of decrease in Y
per unit increase in X.
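The slope and intercept formulas above can be sketched in Python as follows (a minimal illustration with our own function name, not a library routine). For Example 1 it reproduces $b_{YX}$ = 2.0, $b_{XY}$ = 0.5 and a = 4.0.

```python
def regression(x, y):
    """Least-squares slope b and intercept a for Y-hat = a + b * X (regression of y on x)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b = sp / ss_x            # the (n - 1) divisors of Cov and Var cancel
    a = y_bar - b * x_bar    # the fitted line passes through (X-bar, Y-bar)
    return a, b


height = [4, 5, 6, 7, 8]
weight = [12, 14, 16, 18, 20]
a, b_yx = regression(height, weight)  # Y on X
_, b_xy = regression(weight, height)  # X on Y
print(a, b_yx, b_xy)                  # 4.0 2.0 0.5
print(a + b_yx * 10)                  # predicted Y at X = 10: 24.0
```

Swapping the arguments gives the other regression; note that $b_{YX}$ and $b_{XY}$ are not reciprocals of each other unless the correlation is perfect (here it is, since 2.0 × 0.5 = 1).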


Example 3. Slaughter data on 24 lambs were recorded at the end of an experiment. The live weights
(X) and dressed carcass weights (Y), in kg, are given below.

Lamb No. X Y Lamb No. X Y Lamb No. X Y


1 74 38.8 9 110.1 56.3 17 51.5 26
2 71.5 41 10 80 38 18 68 33
3 92.0 45 11 74.5 38.5 19 76 38
4 81 38.7 12 78.5 40 20 77 40
5 78 38.5 13 83 44 21 91.5 45
6 72.5 44 14 76 37.4 22 50 23.5
7 105 55 15 60 31 23 61 28.5
8 103 50.7 16 56 27.5 24 75 35.8
$\sum X_i$ = 1845.1 ; $\sum Y_i$ = 934.7 ; $\sum X_i^2$ = 147422.51 ; $\sum Y_i^2$ = 37991.21 ; $\sum X_iY_i$ = 74714.28 ; n = 24

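$$b_{YX} = \frac{\sum X_iY_i - \left(\sum X_i\right)\left(\sum Y_i\right)/n}{\sum X_i^2 - \left(\sum X_i\right)^2/n} = \frac{74714.28 - (1845.1)(934.7)/24}{147422.51 - (1845.1)^2/24} = \frac{2855.32}{5572.76} = 0.5124 \text{ kg per kg}$$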

We can say that the carcass weight of lambs increases by 0.5124 kg per kg increase in live
weight. Note that regression coefficients carry units of measurement (inches, kg, etc.); they
are not unit-free quantities and must always be reported with their units.
Pl. construct the regression equation for Example 3 and predict the carcass weight for a live weight of
100 kg.
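The exercise can be worked from the summary sums alone. This Python sketch uses only the totals printed above (not the individual lamb records) to get $b_{YX}$, the intercept a, and the predicted carcass weight at a live weight of 100 kg; variable names are our own.

```python
# Summary sums for the 24 lambs in Example 3 (X = live weight, Y = carcass weight, kg)
n = 24
sum_x, sum_y = 1845.1, 934.7
sum_x2, sum_xy = 147422.51, 74714.28

sp_xy = sum_xy - sum_x * sum_y / n   # corrected sum of cross-products
ss_x = sum_x2 - sum_x ** 2 / n       # corrected sum of squares of X
b_yx = sp_xy / ss_x                  # slope, kg carcass per kg live weight
a = sum_y / n - b_yx * sum_x / n     # intercept: Y-bar - b * X-bar

print(round(sp_xy, 2), round(ss_x, 2), round(b_yx, 4))  # 2855.32 5572.76 0.5124
print(round(a, 2), round(a + b_yx * 100, 2))            # intercept and Y-hat at X = 100 kg
```

Working from the printed totals keeps the result consistent with the hand calculation above.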
The intercept and slope (a and b, the estimates of α and β) can also be calculated using matrices. Suppose X is
lactation length (days) and Y is milk yield (kg). A simple linear regression equation to
predict milk yield from lactation length can be developed as follows:

        X      Y       XY        X²       Y²         Ŷ = 0.9483 + 6.8647X
        30     190     5700      900      36100      206.89
        60     260     15600     3600     67600      412.83
        85     620     52700     7225     384400     584.45
        100    750     75000     10000    562500     687.42
        150    1050    157500    22500    1102500    1030.65
        180    1350    243000    32400    1822500    1236.59
        200    1420    284000    40000    2016400    1373.89
        225    1550    348750    50625    2402500    1545.51
        250    1700    425000    62500    2890000    1717.12
        305    2000    610000    93025    4000000    2094.68
Sum     1585   10890   2217250   322775   15284500
Mean    158.5  1089.0

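$$\sum Y = n b_0 + b_1 \sum X \quad\Rightarrow\quad 10890 = 10 b_0 + 1585 b_1$$
$$\sum XY = b_0 \sum X + b_1 \sum X^2 \quad\Rightarrow\quad 2217250 = 1585 b_0 + 322775 b_1$$

In matrix form:

$$\begin{bmatrix} 10 & 1585 \\ 1585 & 322775 \end{bmatrix}\begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = \begin{bmatrix} 10890 \\ 2217250 \end{bmatrix}$$

$$\begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = \begin{bmatrix} 0.451102 & -0.002215 \\ -0.002215 & 0.000014 \end{bmatrix}\begin{bmatrix} 10890 \\ 2217250 \end{bmatrix} = \begin{bmatrix} 0.9483 \\ 6.8647 \end{bmatrix}$$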

You may confirm the values of $b_0$ (= a) and $b_1$ (= $b_{YX}$) without the use of matrices, using the formulas for $b_{YX}$ and a given earlier.
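The matrix route can be checked with NumPy (we assume NumPy is installed; the notes do not prescribe any software). The sketch uses Y = 1700 kg for the 250-day cow, the value implied by that row's XY (425000) and Y² (2890000) entries and by the column total of 10890.

```python
import numpy as np

# Lactation length X (days) and milk yield Y (kg); Y = 1700 for the 250-day cow
x = np.array([30, 60, 85, 100, 150, 180, 200, 225, 250, 305], dtype=float)
y = np.array([190, 260, 620, 750, 1050, 1350, 1420, 1550, 1700, 2000], dtype=float)

n = len(x)
# Normal equations: sum(Y) = n*b0 + b1*sum(X);  sum(XY) = b0*sum(X) + b1*sum(X^2)
A = np.array([[n, x.sum()],
              [x.sum(), (x ** 2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])

A_inv = np.linalg.inv(A)  # compare with the inverse printed in the notes
b0, b1 = A_inv @ rhs      # np.linalg.solve(A, rhs) is numerically preferable
print(np.round(A_inv, 6))
print(round(b0, 4), round(b1, 4))  # about 0.9483 and 6.8647
```

Forming the inverse mirrors the hand method; in practice `np.linalg.solve(A, rhs)` gives the same b0 and b1 without computing the inverse explicitly.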

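If we had three unknowns ($b_0$, $b_1$ and $b_2$, with two independent variables $X_1$ and $X_2$), the following normal equations would be solved:

$$\sum Y = b_0 n + b_1 \sum X_1 + b_2 \sum X_2$$
$$\sum X_1Y = b_0 \sum X_1 + b_1 \sum X_1^2 + b_2 \sum X_1X_2$$
$$\sum X_2Y = b_0 \sum X_2 + b_1 \sum X_1X_2 + b_2 \sum X_2^2$$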
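To illustrate the three-unknown case, this sketch builds the 3 × 3 system from the sums above and solves it with NumPy (assumed installed). The function name is hypothetical, and the data are made up so that Y = 2 + 3X1 − X2 holds exactly, which makes the expected coefficients easy to verify.

```python
import numpy as np


def normal_equations(x1, x2, y):
    """Build and solve the 3 x 3 normal equations for Y = b0 + b1*X1 + b2*X2."""
    x1, x2, y = (np.asarray(v, dtype=float) for v in (x1, x2, y))
    n = len(y)
    A = np.array([
        [n,        x1.sum(),        x2.sum()],
        [x1.sum(), (x1 ** 2).sum(), (x1 * x2).sum()],
        [x2.sum(), (x1 * x2).sum(), (x2 ** 2).sum()],
    ])
    rhs = np.array([y.sum(), (x1 * y).sum(), (x2 * y).sum()])
    return np.linalg.solve(A, rhs)  # array of [b0, b1, b2]


# Hypothetical data generated exactly from Y = 2 + 3*X1 - X2
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [2 + 3 * u - v for u, v in zip(x1, x2)]
print(np.round(normal_equations(x1, x2, y), 6))  # approx. b0 = 2, b1 = 3, b2 = -1
```

Each row of `A` is one normal equation, written in the same order as the three equations above.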

Correlation
If the interest centres on just a measure of the relationship between two variables, and we are
not interested in predicting one variable from the other, we can calculate a quantity known as the
correlation coefficient, represented by $\rho_{XY}$ (rho) for the population and $r_{XY}$ for samples.

$$r_{XY} = \frac{\mathrm{Cov}(XY)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} = \frac{\sum(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum(X_i-\bar{X})^2\,\sum(Y_i-\bar{Y})^2}} = \frac{\sum X_iY_i - \left(\sum X_i\right)\left(\sum Y_i\right)/n}{\sqrt{\left[\sum X_i^2 - \left(\sum X_i\right)^2/n\right]\left[\sum Y_i^2 - \left(\sum Y_i\right)^2/n\right]}}$$

The correlation coefficient is a unitless quantity and varies from −1.00 (perfect negative
correlation) to +1.00 (perfect positive correlation). The value of $r_{XY}$ is zero when there is no
linear relationship between the X and Y variables.
In the example on live weight and carcass weight of lambs, $\sum Y_i^2 - (\sum Y_i)^2/n$ = 37991.21 − (934.7)²/24 = 1588.54, so
$r_{XY} = 2855.32/\sqrt{(5572.76)(1588.54)} = 2855.32/2975.3 = 0.96$, which means that the relationship between the two variables is very strong.
Note that the existence of a relationship between two variables does not prove that
one causes the other. For example, we might find a correlation between the marks obtained in the
biometry test and shoe size. This would not imply that knowledge of biometry causes big feet (or vice
versa). Some scatter plots for different correlation coefficients are presented below.

[Figure: six scatter plots (X axis 0 to 10, Y axis 0 to 20) illustrating r = 1.0, r = 0.0, r = −1.0, r = 0.90, r = −0.70 and r = 0.25.]
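The correlation formula can be sketched in Python as below (function name our own). Example 1 lies exactly on a straight line, so r = 1.0; the lamb data are handled from their printed summary sums and give r ≈ 0.96, matching the hand calculation above.

```python
import math


def correlation(x, y):
    """Pearson r = SP_XY / sqrt(SS_X * SS_Y); the (n - 1) divisors cancel."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    return sp / math.sqrt(ss_x * ss_y)


print(correlation([4, 5, 6, 7, 8], [12, 14, 16, 18, 20]))  # 1.0

# Lamb example (Example 3) from its summary sums, n = 24
n, sx, sy, sx2, sy2, sxy = 24, 1845.1, 934.7, 147422.51, 37991.21, 74714.28
r = (sxy - sx * sy / n) / math.sqrt((sx2 - sx ** 2 / n) * (sy2 - sy ** 2 / n))
print(round(r, 2))  # 0.96
```

Because r is unitless, rescaling X or Y (for example kg to lb) leaves it unchanged, unlike the regression coefficient.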

Example 4. The following data were taken on three-year-old Holstein calves. X represents
wither height (cm) and Y the body weight (lb).


X 127 134 133 132 130 126 131 130 137 134
Y 153 160 160 153 155 154 154 153 161 162

Calculate the means, variances and covariance, the regression of body weight on wither height, and
the correlation coefficient between the two variables.
Try another statistic, the ‘coefficient of variation’, as well. Its formula is:

$$CV(\%) = \frac{\text{Standard deviation}}{\text{Mean}} \times 100$$
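A one-line Python sketch of the coefficient of variation, using the standard-library `statistics` module (whose `stdev` uses the n − 1 divisor). As an illustration it is applied to the February temperatures from the standard-deviation example rather than to Example 4, which is left as the exercise.

```python
import statistics


def coefficient_of_variation(values):
    """CV (%) = sample standard deviation / mean * 100 (a unitless measure)."""
    return statistics.stdev(values) / statistics.mean(values) * 100


temps = [8, 13, 14, 8, 16]  # February temperatures (°C) from the standard-deviation example
print(round(coefficient_of_variation(temps), 2))  # 30.79
```

Because the units cancel, CV lets us compare the relative variability of traits measured in different units, such as wither height (cm) and body weight (lb).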

Example 5. The number of personal fouls (X) and total points scored (Y) by 14 basketball players of a
University team in a tournament are given below. Calculate the correlation coefficient between the
two variables and the regression coefficient of Y on X. How many points are expected from
a player with 10 personal fouls?

Player 1 2 3 4 5 6 7 8 9 10 11 12 13 14
X 0 0 0 0 6 17 7 31 9 4 24 15 16 0
Y 1 0 0 2 6 48 21 75 18 2 42 60 37 3

Example 6. The relationship between time spent on a slenderizing program and weight loss for
9 persons attending a weight-loss clinic is presented below. Calculate $r_{XY}$ and $b_{YX}$. How much
weight should an individual lose if she attends the clinic for 10 weeks?

Individual 1 2 3 4 5 6 7 8 9
X (Weeks) 12 4 8 4 12 7 5 9 14
Y (lbs) 19 18 8 16 26 12 12 39 31

Hint: $S_{XY}$ = 19.25 weeks-lbs; $S_X$ = 3.71 weeks; $S_Y$ = 8.08 lbs

TESTING THAT POPULATION MEAN IS A SPECIFIED VALUE

Let μ and σ² denote the mean and variance of a population. A random sample of size n
is drawn and the sample mean and variance are computed. To test the null hypothesis $H_0: \mu = \mu_0$
(assuming that the population is normally distributed and σ² is unknown), the test criterion is:

$$t = \frac{\bar{Y} - \mu_0}{s/\sqrt{n}}$$

Here $\bar{Y}$ is the sample mean and $\mu_0$ is the hypothesised mean. The denominator is also called the
standard error of the mean, $S_{\bar{Y}} = \sqrt{S^2/n} = S/\sqrt{n}$.
The calculated value of the t statistic is compared with the table value at n − 1 d.f. and a
specified α, and the null hypothesis is rejected if $|t| > t_{\alpha/2}$ for a two-tailed test, or if t exceeds
$t_\alpha$ in the direction of the alternative hypothesis for a one-tailed test.
Example 7. Suppose we have hypothesised that the population mean is 40 lbs. The butterfat
yield for a month from 10 cows chosen at random has mean 36.4 lbs and variance 264.04 lbs².
H0: μ = 40
Ha: μ ≠ 40
α = .05
$t = (36.4 - 40)/\sqrt{264.04/10} = -3.6/5.14 = -0.701$
With n − 1 = 9 d.f., the t-table gives $t_{.05/2}$ = 2.262, and |t| = 0.701 is less than this value. So there is no reason
to regard −0.701 as an unusual value of t if μ is 40, and we do not reject (i.e. we accept) the null hypothesis.
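The one-sample t calculation can be sketched in Python from the summary figures alone. To avoid depending on a statistics library, the critical value is typed in from the t-table (2.262 for a two-tailed test at α = .05 with 9 d.f.); the function name is our own.

```python
import math


def one_sample_t(mean, variance, n, mu0):
    """t = (Y_bar - mu0) / (S / sqrt(n)); returns t and its n - 1 degrees of freedom."""
    se = math.sqrt(variance / n)  # standard error of the mean, S / sqrt(n)
    return (mean - mu0) / se, n - 1


t, df = one_sample_t(mean=36.4, variance=264.04, n=10, mu0=40)
t_crit = 2.262  # t-table value for a two-tailed test, alpha = .05, 9 d.f.
print(round(t, 3), df, abs(t) > t_crit)  # -0.701 9 False
```

`False` in the last position means |t| does not exceed the critical value, so H0 is not rejected.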


COMPARISON OF TWO TREATMENT MEANS BY T-TEST

Suppose that we have two populations with means μ1 and μ2. A random sample is
drawn from each population to test the null hypothesis that μ1 and μ2 are separated by a
specified amount, usually chosen to be zero. For the null hypothesis of no difference, t is defined
as:

$$t = \frac{\bar{Y}_1 - \bar{Y}_2}{S_{\bar{Y}_1-\bar{Y}_2}}$$

Here $S_{\bar{Y}_1-\bar{Y}_2}$ is the standard deviation (standard error) appropriate to a difference between two random means
from normal populations. Its calculation depends on whether the two populations have a
common variance, whether these variances are known or estimated, whether the samples are of the same size,
and whether the observations are paired. The choice of the rejection region depends on the level of
significance chosen, the sample size and whether the test is one- or two-tailed.
With the assumption of equal (unknown) variances, let μ1 and μ2 be the means of the two
populations. The means ($\bar{Y}_1$, $\bar{Y}_2$), variances ($S_1^2$, $S_2^2$) and sizes ($n_1$, $n_2$) of the two samples are
needed, along with the standard error of the difference between the means:

$$S_{\bar{Y}_1-\bar{Y}_2} = \sqrt{S_P^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

$S_P^2$ in the above formula is the weighted average of the sample variances, called the
pooled variance:

$$S_P^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{(n_1-1)+(n_2-1)}$$

If $n_1 = n_2$, the formula for the pooled variance reduces to

$$S_P^2 = \frac{S_1^2 + S_2^2}{2}$$

Example 8. An experiment was conducted to compare the mean number of tapeworms in the
stomachs of sheep that had been treated for worms with the mean number in those that were
untreated. A sample of 14 worm-infected lambs was randomly divided into two groups. Seven
were injected with the drug and the remainder were left untreated. After a six-month period, the
lambs were slaughtered and the following worm counts were recorded.

Drug-treated sheep (Y1)   18  43  28  50  16  32  13
Untreated sheep (Y2)      40  54  26  63  21  37  39

Test the hypothesis that there is no difference in the mean number of worms between treated
and untreated lambs. Assume that the drug cannot increase the number of worms, and hence use
the alternative hypothesis that the mean for treated lambs is less than the mean for untreated
lambs (one-tailed test). Use α = .05.
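$\bar{Y}_1$ = 28.57     $\bar{Y}_2$ = 40.0
$S_1^2$ = 198.62     $S_2^2$ = 215.33
$n_1$ = 7            $n_2$ = 7
$S_P^2$ = (198.62 + 215.33)/2 = 206.98

H0: μ1 − μ2 = 0     Ha: μ1 − μ2 < 0

$S_{\bar{Y}_1-\bar{Y}_2} = \sqrt{206.98\,(1/7 + 1/7)} = 7.69$

t = (28.57 − 40.0)/7.69 = −1.49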

For α = .05, the critical t-value for a one-tailed test with d.f. = n1 + n2 − 2 = 12 is obtained
from the t-table as 1.782. Since the calculated t = −1.49 does not fall below −1.782 (|t| = 1.49 < 1.782),
we accept H0 and conclude that there is no significant difference in the mean number of worms between
treated and untreated lambs.
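The pooled two-sample t-test can be sketched with Python's standard-library `statistics` module (whose `variance` uses the n − 1 divisor); the function name is our own. The one-tailed critical value 1.782 (α = .05, 12 d.f.) is typed in from the t-table.

```python
import math
from statistics import mean, variance


def pooled_t(y1, y2):
    """Two-sample t assuming a common (unknown) variance; returns t, d.f. and S_P^2."""
    n1, n2 = len(y1), len(y2)
    s1, s2 = variance(y1), variance(y2)  # sample variances (n - 1 divisor)
    sp2 = ((n1 - 1) * s1 + (n2 - 1) * s2) / ((n1 - 1) + (n2 - 1))
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))  # standard error of Y1-bar minus Y2-bar
    return (mean(y1) - mean(y2)) / se, n1 + n2 - 2, sp2


treated = [18, 43, 28, 50, 16, 32, 13]
untreated = [40, 54, 26, 63, 21, 37, 39]
t, df, sp2 = pooled_t(treated, untreated)
print(round(t, 2), df, round(sp2, 2))  # -1.49 12 206.98
print(t < -1.782)                      # lower-tailed rejection at alpha = .05: False
```

The same function can be applied to Example 9 by passing the control and vitamin A gains.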

Example 9. Weight gains of two groups of Holstein heifers, control (Y1) and supplemented with
vitamin A (Y2), are given below. Test whether the means are the same.

Y1  175 132 218 151 200 219 234 149 187 123 248 206 179 206
Y2  142 311 337 262 302 195 253 199 236 216 211 176 249 214

Hint: $\sum Y_1$ = 2627   $\sum Y_2$ = 3303   $\sum Y_1^2$ = 511807   $\sum Y_2^2$ = 817583


COMPARISON OF SEVERAL TREATMENTS


(Experimental Designs)

The main objective of an experiment is to observe the events occurring as the result of a
plan, to collect and tabulate the observations, and to interpret them in such a way that the
conclusions and estimates can be assessed by inductive reasoning based on the
mathematics of probability. A scientist should thus specify all the conditions under which the trial
is to be performed, i.e. the material and treatments, together with the circumstances under which the
measurements are to be made or data are to be collected. The plan (the experimental design)
under which an experiment is conducted helps to furnish unbiased and unconfounded estimates
with adequate precision.
It should be realised that statistical methods can bring out only that information which
has been incorporated into the data by careful design and execution of the experiment.
Elaborate statistics are thus no substitute for meticulous (careful, attentive to detail)
experimentation. Inferences about the population will not hold good if based on carelessly
collected data. Similarly, extensive and conscientiously made measurements may contain little
worthwhile information if the experimental design was faulty. Only by a combination of
appropriate design, skilful conduct of the experiment and suitable statistical methods is the
investigator assured of reliable evidence upon which to base decisions.
An element or a group of elements to which a treatment is applied is called an
experimental unit. Each unit must have an equal and independent chance of receiving any
one of the treatments, i.e. treatments are randomly applied to the experimental units.
Randomization is the method that ensures no treatment or group of treatments is favoured over
the others. It can be achieved by drawing slips, by using random number tables, etc. Replication, on
the other hand, refers to the number of experimental units to which one treatment is applied. It is
needed to obtain an estimate of experimental error, and this estimate becomes more precise as the
number of replications increases. This helps to detect differences between the treatments
more accurately. Experimental error represents the variation among experimental units
treated alike. It is the residual variation in the observations due to known and unknown factors.
The most commonly used statistical designs are:
1) CRD (Completely Randomized Design)
2) RCBD (Randomized Complete Block Design)
3) LSD (Latin Square Design)

Completely Randomised Design


(C.R.D)

This type of experimental design is the simplest and easiest to analyze. Unequal
numbers of replications do not complicate the analysis. It is used when quite homogeneous
experimental material is available. Homogeneous means that all the experimental units are alike:
they behave the same way when treated alike.
Suppose three experimental rations A, B and C are to be tested for their effect on
growth rate in lambs. Eighteen lambs of the same breed, same age, same body size and almost
the same weight are available for the study. Each lamb is weighed and numbered
for identification. The lambs are then randomly allotted to the three treatments (rations) such that
each treatment receives six lambs. The lambs are fed the experimental rations
individually for the specified period and weighed at the end of the experimental period. The
weight gain of each lamb is determined by subtracting the initial body weight from the
final body weight. These gains are then tabulated.
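Randomization by drawing slips can be imitated with a pseudo-random shuffle. The sketch below (function name and seed are our own, hypothetical choices) allots 18 numbered lambs to rations A, B and C, six each. A fixed seed makes the allocation reproducible for the record; omit it to get a fresh randomization.

```python
import random


def crd_allocation(unit_ids, treatments, reps, seed=None):
    """Randomly assign `reps` units to each treatment (completely randomized design)."""
    if len(unit_ids) != len(treatments) * reps:
        raise ValueError("need exactly len(treatments) * reps experimental units")
    rng = random.Random(seed)
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)  # plays the role of drawing numbered slips
    return {t: sorted(shuffled[i * reps:(i + 1) * reps]) for i, t in enumerate(treatments)}


plan = crd_allocation(range(1, 19), ["A", "B", "C"], reps=6, seed=1)
for ration, lambs in plan.items():
    print(ration, lambs)
```

Shuffling the whole list and cutting it into equal blocks gives every lamb the same chance of receiving any ration, which is the requirement stated for a CRD.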
Example 10
A random observation on jth lamb fed ith ration can be assumed to have the statistical
model:


$$X_{ij} = \mu + T_i + e_{ij}$$
where
$X_{ij}$ is the weight gain of the jth lamb on the ith ration;
i = 1, 2, 3 (number of treatments, a);
j = 1, 2, ..., 6 (number of lambs on each treatment, n);
μ is the overall mean or population mean;
$T_i$ is the effect of the ith treatment, and it is assumed that $\sum T_i = 0$;
$e_{ij}$ is the random error associated with the jth observation on the ith treatment.
It is assumed that $e_{ij}$ is normally and independently distributed with mean zero and
variance $\sigma^2_e$, traditionally written as $e_{ij} \sim N(0, \sigma^2_e)$.

Tabulation and calculations:

Let us suppose we have the following weight gains on the three rations:

         A          B          C
         4          3          10
         6          5          13
         5          4          8
         6          3          12
         4          5          8
         5          4          9
Total    X1. = 30   X2. = 24   X3. = 60

Grand total X.. = X1. + X2. + X3. = 30 + 24 + 60 = 114

Correction term (C.T.) or correction factor
= square of the grand total divided by the total number of observations
Total sum of squares (Total S.S.)
= sum of the squares of the individual observations minus the C.T.
Treatment sum of squares (Treatment S.S.)
= sum of the squared treatment totals divided by the number of observations on each
treatment, minus the C.T.
Residual sum of squares (Error S.S.)
= Total S.S. minus Treatment S.S.

C.T. = (114)²/18 = 722
Total S.S. = (4)² + (6)² + ... + (9)² − 722 = 154
Treatment S.S. = [(30)² + (24)² + (60)²]/6 − 722 = 124
Error S.S. = 154 − 124 = 30

It can be shown that the error sum of squares obtained in the above calculations is the
pooled sum of squared deviations from the respective treatment means (i.e. 4 + 4 + 22 = 30).
These results are usually presented in an ANOVA (analysis of variance) table.

ANOVA for Weight Gain in Lambs

Source of variation   Degrees of freedom   Sum of squares   Mean squares   F-ratio   Expected mean squares
(S.O.V.)              (d.f.)               (S.S.)           (M.S.)                   (E.M.S.)
Between treatments    (a − 1) = 2          124.0            62.0           31.0**    σ²e + 6σ²T
Within treatments     a(n − 1) = 15        30.0             2.0                      σ²e
Total                 an − 1 = 17          154.0

Table F-values for 2 and 15 d.f., F(2,15), are 3.68 at the 5% and 6.36 at the 1% level of probability.
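The correction-term method translates directly into code. This is a minimal Python sketch (function and key names are our own) that also accepts groups of unequal size, since the treatment S.S. divides each squared total by its own number of observations. On the Example 10 gains it reproduces C.T. = 722, Total S.S. = 154, Treatment S.S. = 124, Error S.S. = 30 and F = 31.0.

```python
def one_way_anova(groups):
    """CRD analysis of variance by the correction-term method; groups may differ in size."""
    all_obs = [x for g in groups for x in g]
    N, a = len(all_obs), len(groups)
    ct = sum(all_obs) ** 2 / N                               # correction term
    total_ss = sum(x * x for x in all_obs) - ct
    trt_ss = sum(sum(g) ** 2 / len(g) for g in groups) - ct  # sum of (X_i.^2 / n_i) - C.T.
    err_ss = total_ss - trt_ss
    df_trt, df_err = a - 1, N - a
    ms_trt, ms_err = trt_ss / df_trt, err_ss / df_err
    return {"CT": ct, "TotalSS": total_ss, "TrtSS": trt_ss, "ErrSS": err_ss,
            "df": (df_trt, df_err), "F": ms_trt / ms_err}


rations = [[4, 6, 5, 6, 4, 5],     # A
           [3, 5, 4, 3, 5, 4],     # B
           [10, 13, 8, 12, 8, 9]]  # C
result = one_way_anova(rations)
print({k: round(v, 2) if isinstance(v, float) else v for k, v in result.items()})
```

The error d.f. is computed as N − a, which equals a(n − 1) with equal replication and also covers the unequal-replication case.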


Because the calculated F-ratio of 31.0 (obtained by dividing the treatment mean square by
the error mean square) exceeds the tabulated F-value of 6.36 at the 1% level, we conclude
that the results are highly significant (significant at the 1% level of probability), usually written
as (P < .01). We conclude that the mean weight gains of lambs on the different rations are not all
equal, with less than a one-in-one-hundred chance of wrongly rejecting the null hypothesis.
Traditionally, two stars (asterisks) are placed on the F-ratio for a source of
variation if the results are highly significant (P < .01), one star if they are significant at P < .05, and
N.S. (non-significant) if the calculated value is less than the tabulated value at P = .05.

C.R.D with Unequal Replications

In experiments, especially with animals, an accident may happen during the
experimental period resulting in the loss or death of an animal (replication). In the numerical
example discussed earlier, the experiment was planned to test the effect of three rations (A,
B, C) on the growth rate of lambs. Suppose that, of the six lambs randomly allotted to each
ration, one on ration B and two on ration C died during the experimental period. Final body
weights could thus be recorded on only 15 instead of 18 lambs; the data on weight gain are
tabulated below.

Example 11

              A      B      C
              4      4      10
              6      5      13
              5      3      12
              6      4      9
              4      3      -
              5      -      -
Total Xi.     30     19     44
ni            6      5      4
Mean X̄i.      5      3.8    11

X.. = 30 + 19 + 44 = 93
C.T. = (93)²/15 = 576.6
Total S.S. = $\sum X_{ij}^2$ − C.T. = (4)² + (6)² + ... + (9)² − 576.6 = 146.4
Treatment S.S. = $\sum (X_{i.}^2/n_i)$ − C.T. = [(30)²/6 + (19)²/5 + (44)²/4] − 576.6 = 129.6
Error S.S. = 146.4 − 129.6 = 16.8

ANOVA Table
S.O.V.               d.f.                  S.S.    M.S.   F         E.M.S.
Between treatments   t − 1 = 3 − 1 = 2     129.6   64.8   46.29**   σ²e + Kσ²T
Within treatments    N − t = 15 − 3 = 12   16.8    1.4              σ²e
Total                N − 1 = 14            146.4
Table value F(2,12) = 3.88 at α = .05 and 6.93 at α = .01

$$K = \frac{1}{t-1}\left[N - \frac{\sum n_i^2}{N}\right]$$
where
K is the effective (average) number of observations per treatment, used in the E.M.S.
N is the total number of observations
$n_i$ is the number of observations on the ith treatment
t is the number of treatments
so $\sum n_i^2$ = (6)² + (5)² + (4)² = 77
and K = 1/(3 − 1) × [15 − (77/15)] = 4.933
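The K coefficient is quick to compute in code. This small Python sketch (function name our own) reproduces K = 4.933 for Example 11 and shows that with equal replication K reduces to the common number of observations per treatment, as in the E.M.S. of Example 10.

```python
def k_coefficient(n_i):
    """Effective replicates per treatment: K = [N - sum(n_i^2) / N] / (t - 1)."""
    N, t = sum(n_i), len(n_i)
    return (N - sum(n * n for n in n_i) / N) / (t - 1)


print(round(k_coefficient([6, 5, 4]), 3))  # 4.933
print(k_coefficient([6, 6, 6]))            # 6.0: equal replication gives K = n
```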


Randomized Complete Block Design


(R.C.B. Design)

This experimental design is used when the experimental material (experimental units) is
not quite homogeneous but has one known source of variation.
In field experiments, the piece of land used for experimental plots may vary in
fertility in a known direction. The land can then be divided into blocks such that the variation among
the plots of the same block is less than the variation among plots belonging to different blocks.
In experiments involving animals, say lambs, the animals may be of the same breed,
age and body size but differ in body weight, so that they can be
divided into groups (or blocks) of similar weights.
Suppose four rations are to be tested for their effects on the growth rate of lambs.
Twelve lambs are available for this study. They are of the same breed, sex, age and
body size, but their initial body weights are quite variable. However, they can be
divided into three groups or blocks such that the variation in weight among lambs within a
block is less than the variation between individuals belonging to different blocks. Each block has
four lambs. The experimental rations are randomly allotted among the animals of the same
block, and this allotment is done separately for each block.

Layout (treatments randomly allotted within each block):
Block 1:   B   D   A   C
Block 2:   C   A   D   B
Block 3:   A   B   D   C
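The within-block randomization can be sketched in Python as an independent shuffle of the four rations for every block. The function name and seed are hypothetical; a different seed (or none) gives a different, equally valid layout.

```python
import random


def rcbd_layout(treatments, n_blocks, seed=None):
    """Randomize the order of all treatments independently within each block (RCBD)."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)  # a fresh randomization for every block
        layout.append(order)
    return layout


layout = rcbd_layout("ABCD", 3, seed=7)
for block, order in enumerate(layout, start=1):
    print("Block", block, " ".join(order))
```

Every block contains each ration exactly once, which is what makes the block "complete".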

In a randomized complete block design, each treatment appears once and only once in
each block. If the number of experimental units in each block is not enough to cover all the
treatments under trial, the experimental design becomes a Randomized Incomplete Block
Design, which leads to certain computational difficulties.
The animals are fed the experimental rations for a specified period and the final weight at the
end of the experiment is recorded. Each observation on weight gain can be assumed to follow
the statistical model:

Example 12
$$X_{ij} = \mu + T_i + B_j + e_{ij}$$
where
i = 1, 2, 3, 4 (number of rations, a)
j = 1, 2, 3 (number of blocks, b)
$X_{ij}$ is the weight gain of the lamb on the ith ration in the jth block
μ is the overall population mean
$T_i$ is the effect of the ith treatment, $\sum T_i = 0$
$B_j$ is the effect of the jth block, $\sum B_j = 0$
$e_{ij}$ is the random error associated with the observation on the ith ration in the jth block.
It is further assumed that $e_{ij}$ is normally and independently distributed with mean zero
and variance $\sigma^2_e$, i.e. $e_{ij} \sim N(0, \sigma^2_e)$.

In a randomized block design, it is also assumed that there is no interaction between
the treatments and the blocks. The calculations for this design, symbolically, are as follows:

Correction Term = $(\sum\sum X_{ij})^2/N = (X_{..})^2/N$
Total Sum of Squares = $\sum\sum X_{ij}^2$ − C.T.

Between Blocks Sum of Squares = $\sum X_{.j}^2/a$ − C.T.
Between Rations Sum of Squares = $\sum X_{i.}^2/b$ − C.T.
Error Sum of Squares = Total S.S. − Block S.S. − Ration S.S.
The data on weight gains can be tabulated ration-wise and block-wise.

                 Rations
Blocks           A    B    C    D    Block total
1                4    6    8    5    23
2                5    10   7    5    27
3                6    12   9    8    35
Ration total     15   28   24   18   85

Calculations:
Correction Term = (85)²/12 = 7225/12 = 602.08
Block Sum of Squares = [(23)² + (27)² + (35)²]/4 − 602.08 = [(529 + 729 + 1225)/4] − 602.08 = 18.67
Rations Sum of Squares = [(15)² + (28)² + (24)² + (18)²]/3 − C.T. = [(225 + 784 + 576 + 324)/3] − 602.08 = 34.25
Total Sum of Squares = (4)² + (5)² + (6)² + ... + (8)² − C.T. = 665 − 602.08 = 62.92
Error Sum of Squares = Total S.S. − Block S.S. − Ration S.S. = 62.92 − 18.67 − 34.25 = 10.00

Analysis of Variance (ANOVA) for Weight Gain of Lambs

Source              d.f.                S.S.     M.S.    F-ratio   E.M.S.
Between blocks      (b-1) = 2           18.67     9.33    5.60*    $\sigma^2_e + 4\kappa^2_b$
Between rations     (a-1) = 3           34.25    11.42    6.85*    $\sigma^2_e + 3\kappa^2_t$
Error (B x R)       (a-1)(b-1) = 6      10.00     1.67             $\sigma^2_e$
Total               11                  62.92

Table values: F(2,6) = 5.14 (P < .05), 10.92 (P < .01); F(3,6) = 4.76 (P < .05), 9.78 (P < .01)
* Significant at P < .05
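The results of the analysis of variance indicate a significant difference (P < .05) between the ration (treatment) means. The F-ratio for blocks also exceeds the 5% table value, so blocking removed a significant part of the variation.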
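The RCBD calculations above can be checked with a short script. The sketch below is a minimal pure-Python illustration (the function name `rcbd_anova` and the dictionary layout are hypothetical, not from the text). It assumes one observation per ration in each block, as in Example 12, and applies the correction-term formulas directly to the lamb weight-gain table.

```python
# Weight gain of lambs (Example 12): one list per block, rations in the order A, B, C, D
WEIGHT_GAIN = [
    [4, 6, 8, 5],    # block 1
    [5, 10, 7, 5],   # block 2
    [6, 12, 9, 8],   # block 3
]


def rcbd_anova(rows):
    """ANOVA for a randomized block design with one observation per
    treatment in each block. Rows are blocks, columns are treatments."""
    b = len(rows)              # number of blocks
    a = len(rows[0])           # number of treatments (rations)
    n = a * b
    grand_total = sum(sum(r) for r in rows)
    ct = grand_total ** 2 / n  # correction term
    total_ss = sum(x * x for r in rows for x in r) - ct
    block_ss = sum(sum(r) ** 2 for r in rows) / a - ct
    trt_totals = [sum(r[i] for r in rows) for i in range(a)]
    trt_ss = sum(t ** 2 for t in trt_totals) / b - ct
    error_ss = total_ss - block_ss - trt_ss
    df_block, df_trt = b - 1, a - 1
    df_error = df_block * df_trt
    ms_error = error_ss / df_error
    return {
        "ct": ct,
        # each source maps to (d.f., S.S., M.S., F-ratio)
        "blocks": (df_block, block_ss, block_ss / df_block, block_ss / df_block / ms_error),
        "rations": (df_trt, trt_ss, trt_ss / df_trt, trt_ss / df_trt / ms_error),
        "error": (df_error, error_ss, ms_error, None),
        "total": (n - 1, total_ss, None, None),
    }


anova = rcbd_anova(WEIGHT_GAIN)
print(f"C.T. = {anova['ct']:.2f}")
for source in ("blocks", "rations", "error", "total"):
    df, ss, ms, f = anova[source]
    line = f"{source:8s} {df:3d} {ss:7.2f}"
    if ms is not None:
        line += f" {ms:7.2f}"
    if f is not None:
        line += f" {f:6.2f}"
    print(line)
```

The printed F-ratios are then compared with the tabulated F(2,6) and F(3,6) values given in the text; a statistics library (for example SciPy's F distribution) could supply those critical values instead of a printed table.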

Latin Square Design

This design has been used to advantage in many fields of research where two major sources of variation are present in the conduct of an experiment. In field experiments the layout is usually a square, which allows the removal of variation resulting from soil differences in two directions. The square is subdivided into an equal number of rows and columns, and the same number of treatments is applied, each treatment occurring exactly once in every row and once in every column.
The chief disadvantage of the Latin square is that the numbers of rows, columns and treatments must be equal. With many treatments, the number of plots required (t² for t treatments) soon becomes impractical. The most common squares range from 5 x 5 to 8 x 8, and squares larger than 12 x 12 are rarely used.
Another limitation of the Latin square is that as the block size or plot size increases, the experimental error per unit is likely to increase. Small squares, on the other hand, provide few degrees of freedom for estimating experimental error, so a substantial decrease in experimental error is needed to compensate for the small number of degrees of freedom. However, more than one square can be used in the same experiment.
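The point about small squares can be made concrete. In a single t x t square the total d.f. is t² - 1, and rows, columns and treatments each take t - 1, leaving (t - 1)(t - 2) for error. The helper `latin_square_df` below is a hypothetical illustration of that bookkeeping, not something defined in the text.

```python
def latin_square_df(t):
    """Degrees of freedom for a single t x t Latin square."""
    total = t * t - 1
    rows = columns = treatments = t - 1
    error = total - rows - columns - treatments  # algebraically (t - 1) * (t - 2)
    return {"rows": rows, "columns": columns, "treatments": treatments,
            "error": error, "total": total}


for t in range(3, 9):
    print(f"{t} x {t} square: error d.f. = {latin_square_df(t)['error']}")
```

A 3 x 3 square leaves only 2 error d.f. and a 4 x 4 square leaves 6 (the value used in Example 13), while a 5 x 5 square already gives 12.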
In analyzing data from this design, the variation between rows and between columns is removed from the total variation, and the remaining variation is then partitioned into variation between treatments and within treatments (error).
In animal experiments to test, say, the digestibility of four rations, four heifers may be used, each animal representing a column. The trial can be repeated in four periods, each period representing a row of the Latin square. The period in which a trial is conducted may affect the intake or digestibility of the ration because the environment varies from period to period.
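A layout for the heifer trial can be generated by starting from a cyclic square, in which each row is the previous one shifted by one place, and then shuffling whole rows and whole columns, which keeps every ration once per row and once per column. This is a minimal sketch under that assumption: permuting the rows and columns of one cyclic square is a simple, common randomization, not a draw from all possible 4 x 4 squares, and the function names are hypothetical. The unshuffled cyclic square for A, B, C, D is the same arrangement used in Example 13.

```python
import random


def cyclic_latin_square(treatments):
    """Row i is the treatment list shifted left by i places."""
    t = len(treatments)
    return [[treatments[(i + j) % t] for j in range(t)] for i in range(t)]


def shuffle_rows_and_columns(square, rng):
    """Permute whole rows, then whole columns; the Latin property is kept."""
    rows = [row[:] for row in square]
    rng.shuffle(rows)
    col_order = list(range(len(square)))
    rng.shuffle(col_order)
    return [[row[c] for c in col_order] for row in rows]


def is_latin(square):
    """True if every symbol appears exactly once in each row and each column."""
    t = len(square)
    symbols = set(square[0])
    if len(symbols) != t or not all(len(row) == t and set(row) == symbols for row in square):
        return False
    return all({row[c] for row in square} == symbols for c in range(t))


rng = random.Random(2024)  # fixed seed only to make the printed layout reproducible
layout = shuffle_rows_and_columns(cyclic_latin_square(["A", "B", "C", "D"]), rng)

print(" " * 9 + "".join(f"Heifer {c + 1}  " for c in range(4)))
for period, row in enumerate(layout, start=1):
    print(f"Period {period} " + "".join(f"{ration:^10}" for ration in row))
```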

Example 13
We can assume the following statistical model for the observation $X_{ij(t)}$ in the ith row and jth column, to which treatment t was applied:

$X_{ij(t)} = \mu + R_i + C_j + T_{(t)} + e_{ij}$

where i = 1, 2, 3, 4 indexes the rows (trials), j = 1, 2, 3, 4 indexes the columns (animals) and t = A, B, C, D indexes the treatments, and
$\mu$ is the overall population mean
$R_i$ is the effect of the ith row
$C_j$ is the effect of the jth column
$T_{(t)}$ is the effect of the tth treatment
$e_{ij}$ is the random error associated with the ith row and jth column; it is further assumed that $e_{ij} \sim \mathrm{NID}(0, \sigma^2_e)$.

                        Animals
Trial            1      2      3      4     Row total
1               8A     7B     5C     4D        24
2               6B     5C     6D     9A        26
3               4C     5D     7A     4B        20
4               6D    10A     8B     7C        31
Column total    24     27     26     24       101

Treatments
A B C D
8 7 5 4
9 6 5 6
7 4 4 5
10 8 7 6
Total 34 25 21 21 101

Calculations:
Correction Term = (101)²/16 = 10201/16 = 637.5625
Total SS = (8)² + (7)² + (5)² + (4)² + (6)² + (5)² + (6)² + (9)² + (4)² + (5)² + (7)² + (4)² + (6)² + (10)² + (8)² + (7)² - C.T = 687 - 637.5625 = 49.4375 ≈ 49.44
Between animals SS = [(24)² + (27)² + (26)² + (24)²]/4 - C.T = 639.25 - 637.5625 = 1.6875 ≈ 1.69
Between trials SS = [(24)² + (26)² + (20)² + (31)²]/4 - C.T = 653.25 - 637.5625 = 15.6875 ≈ 15.69
Between treatments SS = [(34)² + (25)² + (21)² + (21)²]/4 - C.T = 665.75 - 637.5625 = 28.1875 ≈ 28.19
Error SS = Total SS - (Animal SS + Trial SS + Treatment SS) = 49.4375 - (1.6875 + 15.6875 + 28.1875) = 3.875 ≈ 3.88


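ANOVA
Source               d.f.    S.S.     M.S.      F        E.M.S.
Between animals        3     1.69     0.56     0.87      $\sigma^2_e + 4\kappa^2_c$
Between trials         3    15.69     5.23     8.10*     $\sigma^2_e + 4\kappa^2_r$
Between treatments     3    28.19     9.40    14.55**    $\sigma^2_e + 4\kappa^2_t$
Error                  6     3.88     0.65               $\sigma^2_e$
Total                 15    49.44

Table values: F(3,6) = 4.76 (P < .05), 9.78 (P < .01)
* Significant at P < .05; ** significant at P < .01

The treatment means differ highly significantly (P < .01) and the trials (periods) differ significantly (P < .05), whereas the differences between animals are not significant.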
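The Latin square computations of Example 13 can be reproduced in a similar way. The sketch below (the function name `latin_square_anova` is hypothetical) stores each cell as a (treatment, value) pair, so that row, column and treatment totals all come from one table. Because every sum of squares in this example is a multiple of 1/16, the script gives exact values (for example C.T. = 10201/16 = 637.5625); differences in the second decimal place from the table come from rounding intermediate values.

```python
# Example 13: rows are trials, columns are animals; each cell is (treatment, value)
SQUARE = [
    [("A", 8), ("B", 7), ("C", 5), ("D", 4)],
    [("B", 6), ("C", 5), ("D", 6), ("A", 9)],
    [("C", 4), ("D", 5), ("A", 7), ("B", 4)],
    [("D", 6), ("A", 10), ("B", 8), ("C", 7)],
]


def latin_square_anova(square):
    """ANOVA for a single t x t Latin square with one observation per cell."""
    t = len(square)
    values = [v for row in square for _, v in row]
    ct = sum(values) ** 2 / (t * t)                      # correction term
    total_ss = sum(v * v for v in values) - ct
    row_ss = sum(sum(v for _, v in row) ** 2 for row in square) / t - ct
    col_ss = sum(sum(square[i][j][1] for i in range(t)) ** 2 for j in range(t)) / t - ct
    trt_totals = {}
    for row in square:
        for trt, v in row:
            trt_totals[trt] = trt_totals.get(trt, 0) + v
    trt_ss = sum(tot ** 2 for tot in trt_totals.values()) / t - ct
    error_ss = total_ss - row_ss - col_ss - trt_ss
    df, df_error = t - 1, (t - 1) * (t - 2)
    ms_error = error_ss / df_error
    # each source maps to (d.f., S.S., M.S., F-ratio)
    table = {name: (df, ss, ss / df, ss / df / ms_error)
             for name, ss in (("animals", col_ss), ("trials", row_ss), ("treatments", trt_ss))}
    table["error"] = (df_error, error_ss, ms_error, None)
    table["total"] = (t * t - 1, total_ss, None, None)
    return ct, table


ct, table = latin_square_anova(SQUARE)
print(f"C.T. = {ct}")
for source, (df, ss, ms, f) in table.items():
    ms_text = "" if ms is None else f"  M.S. = {ms:.2f}"
    f_text = "" if f is None else f"  F = {f:.2f}"
    print(f"{source:10s} d.f. = {df:2d}  S.S. = {ss:.4f}{ms_text}{f_text}")
```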

The RCBD is used when the experimental material (the experimental units) is not quite homogeneous but has one known source of variation. In experiments with animals, say lambs, that source may be breed, age or body weight, and blocking is done on it. Within a block, however, the experimental units should be homogeneous. The Latin square design (LSD), on the other hand, is mostly used in nutritional experiments where two major sources of variation are present. In a trial testing the digestibility of four rations, for example, four heifers may form the columns and four repetitions of the trial the rows, because, as mentioned above, the period in which the trial is conducted may affect the intake or digestibility of the ration through a varying environment.


More Examples
Example 14. Anxiety scores (X) and examination scores (Y) are given for a class. Calculate the correlation coefficient between the two variables and develop a regression equation to predict the examination score when the anxiety score is 80. Comment on your prediction.
X (anxiety score)   Y (examination score)
15 37
20 33
23 5
30 31
34 40
34 35
35 32
36 30
38 14
40 9
42 20
44 24
46 18
49 21
50 13
53 14
59 7
60 9
62 7
63 13
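One way to set up the computations for Example 14 is sketched below, using the corrected sums of squares and products (Sxx, Syy, Sxy). The variable names are illustrative only. The script also checks whether X = 80 falls inside the range of the observed anxiety scores, which is relevant when commenting on the prediction.

```python
from math import sqrt

# Example 14 data: anxiety score (X) and examination score (Y)
x = [15, 20, 23, 30, 34, 34, 35, 36, 38, 40, 42, 44, 46, 49, 50, 53, 59, 60, 62, 63]
y = [37, 33, 5, 31, 40, 35, 32, 30, 14, 9, 20, 24, 18, 21, 13, 14, 7, 9, 7, 13]

n = len(x)
sxx = sum(v * v for v in x) - sum(x) ** 2 / n                     # corrected SS of X
syy = sum(v * v for v in y) - sum(y) ** 2 / n                     # corrected SS of Y
sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n  # corrected sum of products

r = sxy / sqrt(sxx * syy)          # Pearson correlation coefficient
b = sxy / sxx                      # regression coefficient of Y on X
a = sum(y) / n - b * sum(x) / n    # intercept: mean(Y) - b * mean(X)
x_new = 80
y_pred = a + b * x_new

print(f"r = {r:.3f}")
print(f"Y = {a:.3f} + ({b:.4f}) X")
print(f"Predicted examination score at X = {x_new}: {y_pred:.2f}")
print(f"Observed X range: {min(x)} to {max(x)}; X = {x_new} inside range: {min(x) <= x_new <= max(x)}")
```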

Example 15. Dry matter digestibility (%) of corn silage is given below for two treatments. Determine whether the treatments differ. The seventh sheep on the second treatment died. Use both the t and F tests.

Sheep Y1 Y2
1 57.8 64.2
2 56.2 58.7
3 61.9 63.1
4 54.4 62.5
5 53.6 59.8
6 56.4 59.2
7 53.2 (missing)

Example 16. Sire differences in the birth weight of Sahiwal cattle are being studied. The birth weights of the first 10 calves born this year to each of the four sires used are recorded. Do the data suggest that the mean birth weights of calves from the different sires differ?

Birth weight (kg)


Sire 1 20.5 28.1 27.8 27.0 28.0 25.2 25.3 27.1 20.5 31.3
Sire 2 26.3 24.0 26.2 20.2 23.7 34.0 17.1 26.8 23.7 24.9
Sire 3 29.5 34.0 27.5 29.4 27.9 26.2 29.9 29.5 30.0 35.6
Sire 4 36.5 44.2 34.1 30.3 31.4 33.1 34.1 32.9 36.3 25.5

Hint: Error MS = 15.64
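Example 16 has a one-way (completely randomized) layout with four sires and ten calves per sire. The minimal pure-Python sketch below partitions the total sum of squares into between-sire and within-sire parts; the helper name `one_way_anova` is hypothetical, and it assumes the usual one-way ANOVA F test is what the question intends. The within-sire (error) mean square it produces can be checked against the hint.

```python
# Example 16: birth weight (kg) of the first 10 calves from each sire
SIRES = {
    "Sire 1": [20.5, 28.1, 27.8, 27.0, 28.0, 25.2, 25.3, 27.1, 20.5, 31.3],
    "Sire 2": [26.3, 24.0, 26.2, 20.2, 23.7, 34.0, 17.1, 26.8, 23.7, 24.9],
    "Sire 3": [29.5, 34.0, 27.5, 29.4, 27.9, 26.2, 29.9, 29.5, 30.0, 35.6],
    "Sire 4": [36.5, 44.2, 34.1, 30.3, 31.4, 33.1, 34.1, 32.9, 36.3, 25.5],
}


def one_way_anova(groups):
    """Between- and within-group mean squares and F for a one-way layout."""
    observations = [v for g in groups for v in g]
    n, k = len(observations), len(groups)
    ct = sum(observations) ** 2 / n                             # correction term
    total_ss = sum(v * v for v in observations) - ct
    between_ss = sum(sum(g) ** 2 / len(g) for g in groups) - ct
    within_ss = total_ss - between_ss                           # error SS
    ms_between = between_ss / (k - 1)
    ms_within = within_ss / (n - k)
    return ms_between, ms_within, ms_between / ms_within


ms_between, ms_within, f_ratio = one_way_anova(list(SIRES.values()))
print(f"Between-sire M.S. = {ms_between:.2f}")
print(f"Error (within-sire) M.S. = {ms_within:.2f}")
print(f"F(3, 36) = {f_ratio:.2f}")
```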


Example 17. The distances (cm) travelled through the air by fish propelled by forced air in a sorting process are given below. Can the fish species be sorted properly by this process?
Species 1 Species 2 Species 3
142 164 182
154 168 191
151 156 184
138 161 170
157 172 176
Totals 742 821 903

Example 18. An advertising firm is studying the effect of four kinds of display of a dairy product in grocery stores in three different sales areas of the city. Within each sales area, four stores are selected and each receives one of the four displays. The number of units of the product sold over the duration of the experiment is recorded, and the data are shown below. Do the four displays result in different average sales?

Sales area
Display 1 2 3
A 100 56 75
B 94 40 82
C 120 65 102
D 82 60 65

Example 19. Means of total volatile fatty acids (VFA) for four cows, sampled at four monthly intervals on rations A, B, C and D, are given below. Do the treatment means differ? [Hint: for rations, Fcal = 5.56]
                                       Cows
Period         1               2               3               4            Total
1        A = 10.0854     B = 10.2028     C = 9.1842      D = 7.9256      37.3980
2        B = 9.3762      C = 9.1898      D = 8.8376      A = 9.4882      36.8918
3        C = 9.7070      D = 10.2938     A = 10.7468     B = 8.7042      39.4518
4        D = 8.4668      A = 9.9844      B = 10.0724     C = 8.3150      36.8386
Totals   37.6354         39.6708         38.8410         34.4330         150.5802
Ration totals: A = 40.3048, B = 38.3556, C = 36.3960, D = 35.5238
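Example 19 is a 4 x 4 Latin square with periods as rows and cows as columns. The sketch below applies the row, column and treatment partition to the tabulated means; the variable names are illustrative, and the final F-ratio for rations can be compared with the hint.

```python
# Example 19: rows are periods, columns are cows; each cell is (ration, mean total VFA)
vfa = [
    [("A", 10.0854), ("B", 10.2028), ("C", 9.1842), ("D", 7.9256)],
    [("B", 9.3762), ("C", 9.1898), ("D", 8.8376), ("A", 9.4882)],
    [("C", 9.7070), ("D", 10.2938), ("A", 10.7468), ("B", 8.7042)],
    [("D", 8.4668), ("A", 9.9844), ("B", 10.0724), ("C", 8.3150)],
]

t = len(vfa)
values = [v for row in vfa for _, v in row]
ct = sum(values) ** 2 / (t * t)                                 # correction term
total_ss = sum(v * v for v in values) - ct
period_ss = sum(sum(v for _, v in row) ** 2 for row in vfa) / t - ct
cow_ss = sum(sum(vfa[i][j][1] for i in range(t)) ** 2 for j in range(t)) / t - ct

ration_totals = {}
for row in vfa:
    for ration, v in row:
        ration_totals[ration] = ration_totals.get(ration, 0.0) + v
ration_ss = sum(tot ** 2 for tot in ration_totals.values()) / t - ct

error_ss = total_ss - period_ss - cow_ss - ration_ss
ms_ration = ration_ss / (t - 1)
ms_error = error_ss / ((t - 1) * (t - 2))                       # 6 error d.f. for a 4 x 4 square
f_ration = ms_ration / ms_error

print(f"Ration SS = {ration_ss:.4f}, error SS = {error_ss:.4f}, F(3, 6) = {f_ration:.2f}")
```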

Example 20. Total nutrient consumption (lb) over a 6-week period on three treatments, A (roughage), B (limited grain) and C (full grain), is given below for three cows and three periods. Compute the ANOVA and report whether the treatments differed.

                   Cow
Period       1          2           3
1         608 (A)    885 (B)     940 (C)
2         715 (B)   1087 (C)     766 (A)
3         844 (C)    711 (A)     832 (B)
