
Chapter 13:

Symbiosis in the Soil, Session 2


Karl Siegert, Linda Robinson, Warren Ewens, and Corlett Wood
Department of Biology, University of Pennsylvania

Objectives:
• To collect data on the alfalfa-rhizobia symbiosis for analysis
• To use R to analyze pooled data
• To draw conclusions regarding the effects of symbiosis between alfalfa and rhizobia
from urban or suburban settings

Overview of the Symbiosis Lab Sessions


Symbiosis in the Soil, Session 1:
Inoculation of growing alfalfa seedlings with different rhizobia strains
In this session you will learn about the mutualistic relationship between alfalfa and rhizobia and
how they form their partnership. To get the full picture, you will also examine nitrogen as a
nutrient for plant growth, including its availability in the environment and how plants obtain it.
You will then learn how the soil ecology around a plant's roots (termed the rhizosphere) functions.
With these pieces of the puzzle in place, you can perform independent background research and
generate hypotheses for each dependent variable we have identified for you. Remember, these
hypotheses will be based on the differences between the rhizospheres of known rhizobia-plant
symbioses. You will inoculate your alfalfa seedlings first and then spend the remainder of the lab
searching for primary literature articles to frame your biological rationale and formulate your
hypotheses.

Symbiosis in the Soil, Session 2:


Data collection for your bacterial strain and analysis of the multi-class data
In this session you will collect data on each of the dependent variables we have picked out for you.
You will then upload your data to Canvas. A dataset will be compiled and disseminated back to
the class. You will then have access to data from all twelve strains and their effects on the overall
health of the legume plants. By manipulating the data in Excel, you will be able to draw conclusions
from the dataset and reject or fail to reject your hypotheses. You will be given ample time to
grapple with the data before the assignment is due.

Data collection from Alfalfa Plants


Remember that during the first session, each group of students was given six alfalfa seedlings
planted on agar slants in glass tubes: three without nitrogen and three with nitrogen. Each group
inoculated all six seedlings with one of four USDA strains or sterile water as a control. These
seedlings have now been growing under bright lights at optimal temperature for four weeks.

During today’s session, you will measure the following variables that are indicators of plant
health that may be affected by rhizobia-legume symbiosis on each of your 3 seedlings:

• Number of root nodules
  • count the total number
• Number of nodules on the primary root
  • count the total number
• Number of nodules on the lateral roots
  • count the total number
• Color of nodules
  • white, red/pink, or brown
• Shape of nodules
  • round or oblong
• Primary root length
  • measure in mm
• Primary root’s majority color
  • white or red
• Number of lateral roots
  • count the total number
• Length of longest lateral root
  • measure in mm
• Shoot height
  • measure in mm, above the hypocotyl
• Length of longest petiole
  • measure in mm
• Number of leaves
  • count the total number, excluding the hypocotyl
• Color of darkest leaf
  • use the leaf color chart
• Any other interesting observations
  • anything you find interesting

After measuring these variables, each group will submit their data for each seedling using the
Symbiosis Google Form, so that we can pool the data for you to analyze. Each group must submit
their data by the end of their lab session or points will be taken off.

Descriptive Statistics
In many experiments you will collect data. Although many forms of data are possible, we focus
here on the case where these data consist of N numbers, which we denote here X1 , X2 , …., XN.
These might for example be the heights of N buildings, the respective numbers of petals on each
of N flowers, and so on.

It is convenient to think of these data as a sample from a potentially larger group. For example, a
data set consisting of the birth weights of N wombats can be thought of as a sample from a much
wider set of birth weights, such as those of all wombats born in the world in the year that the
sample was taken. So, from now on we often refer to our data set as our sample.
We call N the “sample size” and we often refer to X1 , X2 , …., XN as the respective “data values.”

We often wish to make inferences about this wider set from the data in our sample, and many of
the activities of statistics are concerned with the best way of making these inferences. Examples
of this are discussed in the “Inferential Statistics” section.

To help make these inferences it is often convenient to summarize some of the main features
exhibited by our sample into a small number of descriptive statistics, and doing so is usually the
first stage of any data analysis. The descriptive statistics discussed here are the sample average
and the sample standard deviation.

Sample Average
The sample average, or more simply the average, of the N data values is calculated by dividing
their sum by N. This average is denoted $\bar{X}$, and so in mathematical terms:

$$\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N}$$

Here $\sum_{i=1}^{N} X_i$ is sigma (summation) notation, the standard mathematical way of denoting the sum X1 +
X2 + … + XN.

Sample Standard Deviation


While the average provides some useful information about a group of measurements, it provides
no information about the variation among the individual data values. Data sets with identical
averages can have very different amounts of variation. Consider the following two data sets:

individual data values (Xi)        sample size (N)   average (X̄)   sample standard deviation (SD)

10, 30, 10, 50, 90, 70, 90         7                 50             34.6
50, 49, 50, 51, 53, 47, 50         7                 50             1.8

These datasets have identical sample sizes and averages, but the values in the first data set vary
among each other much more than the values in the second data set. A commonly used statistic to
describe variation within a data set is called the sample standard deviation (often, and
misleadingly, just called the standard deviation (SD)). The sample standard deviation is a measure
of the dispersion of the individual data values around their average.

Many scientific calculators and programs such as Excel provide the SD for any set of data values
entered into the computer. The Excel formulas are STDEV or STDEV.S. (But watch out: some
calculators incorrectly use “N” instead of “N-1” in the denominator of the formula for the SD –
see below for the correct formula.) However, the calculation is straightforward, and it is best to
calculate the SD of several data sets by hand before doing so by using a calculator, since this will
give you a good feel for what an SD is.

Two equivalent formulae for the SD are:

$$SD = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N-1}} = \sqrt{\frac{\left(\sum_{i=1}^{N} X_i^2\right) - N\bar{X}^2}{N-1}}$$
The formula on the right requires fewer calculation steps than the formula on the left. The details
of the calculations are below:
1. Calculate the square of each data value, and then sum these squares. This gives $\sum_{i=1}^{N} X_i^2$.

2. Calculate the average of the data values, as described earlier. Square this average and
multiply it by the total number of data values N. This gives $N\bar{X}^2$.

3. Subtract the result of step (2) from the result of step (1). Then divide this result by the total
number of data values minus one (N - 1).

4. Calculate the square root of the result of step (3). This is then the sample standard deviation
of your data values.
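The four steps above can be sketched in a short script. Python is used here purely for illustration (in lab you will use R or Excel); the data are the first example data set from the table in the previous section.

```python
import math

def sample_sd(data):
    """Sample SD via the right-hand formula:
    sqrt((sum of squares - N * mean^2) / (N - 1))."""
    n = len(data)
    sum_sq = sum(x * x for x in data)      # step 1: sum of squared values
    mean = sum(data) / n                   # step 2: average, then N * mean^2
    numerator = sum_sq - n * mean ** 2     # step 3: subtract, then divide by N - 1
    return math.sqrt(numerator / (n - 1))  # step 4: square root

def sample_sd_direct(data):
    """Equivalent left-hand formula: sqrt(sum((x - mean)^2) / (N - 1))."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

data = [10, 30, 10, 50, 90, 70, 90]      # first example data set
print(round(sample_sd(data), 1))         # 34.6
print(round(sample_sd_direct(data), 1))  # 34.6
```

Both formulas give the same answer; the first simply requires fewer arithmetic steps when working by hand.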

In the table below, we have added 3 more data sets to the two data sets discussed in the section on
standard deviation above:

individual data values (Xi)             sample size (N)   average (X̄)

Set 1: 10, 30, 10, 50, 90, 70, 90       7                 50
Set 2: 50, 49, 50, 51, 53, 47, 50       7                 50
Set 3: 40, 45, 50, 55, 60, 47, 49       7                 49.4
Set 4: 44, 58, 45, 42, 56, 54, 48       7                 49.6
Set 5: 58, 60, 60, 45, 60, 52, 42       7                 53.9

In the assignment for the Bacterial Mutagenesis lab, we use sample standard deviation because the
data used for the assignment is section specific (or within one sample). In the data set above, Set
1 would be the data from one lab section, Set 2 would be the data from the second lab section, etc.
In the other assignments, where we pool the data from multiple lab sections, we average data across
all lab sections, which we can consider a population. If Sets 1 through 5 above represent the data
from five different lab sections, we can consider each lab section a sample. You can then think of
all class sections (Sets 1 through 5 in this example) as part of the overall population of BIOL 1124.
In calculating the standard error of the mean (SEM), you would then take the standard deviation
of the averages from Sets 1 through 5. If you don’t have this information, another way to determine
the SEM is to take the standard deviation of the population (across all the sections) and divide by
the square root of the sample size, which is the number of individual sections. Using standard error
of the mean for these assignments allows us to take advantage of our large sample size and get a
more precise estimate of how the means calculated for each category differ from each other, instead
of just a measure of how the data is dispersed.

There are two formulas for the SEM, depending on the data that is available to you:

A. SEM = SD_avg
B. SEM = s/√N

SD_avg = standard deviation of the averages from each sample (lab section or “Set” in our example)
s = standard deviation of the population (all “Sets”)
N = sample size

To calculate SEM if you have average data from each sample (lab section or Set), use equation A.

1. Calculate the average from each sample (lab section or Set): $\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N}$

2. Calculate the standard error of the mean (SEM) by calculating the standard deviation
of all of the averages from Step 1, where each average is represented by Xi(avg) in the
formula for standard deviation. Since each average acts as a single value for this
calculation, the sample size, N, used for this SD calculation is the number of samples (lab
sections or Sets).

To calculate SEM if you do not have average data from each sample (lab section or Set), use
equation B.

1. Calculate the standard deviation of the population by calculating the standard deviation of
each individual data value, Xi, from the entire population. Since each individual data value
is a single value for this calculation, the sample size, N, used for this SD calculation is the
number of individual data values.

$$s = SD = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N-1}} = \sqrt{\frac{\left(\sum_{i=1}^{N} X_i^2\right) - N\bar{X}^2}{N-1}}$$

2. Divide s from Step 1 by the square root of the sample size, √N, where N equals the number
of individual data values.
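Both SEM equations can be sketched in code. The script below (Python for illustration; in lab you will use R or Excel) uses Sets 1 through 5 from the table above. Note that equations A and B are two different estimates of the same quantity, so they need not agree exactly.

```python
import math

def sample_sd(data):
    """Sample standard deviation with N - 1 in the denominator."""
    n, mean = len(data), sum(data) / len(data)
    return math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

sets = [
    [10, 30, 10, 50, 90, 70, 90],  # Set 1
    [50, 49, 50, 51, 53, 47, 50],  # Set 2
    [40, 45, 50, 55, 60, 47, 49],  # Set 3
    [44, 58, 45, 42, 56, 54, 48],  # Set 4
    [58, 60, 60, 45, 60, 52, 42],  # Set 5
]

# Equation A: standard deviation of the five sample averages
averages = [sum(s) / len(s) for s in sets]
sem_a = sample_sd(averages)

# Equation B: SD of all 35 values, divided by sqrt of the number of values
population = [x for s in sets for x in s]
sem_b = sample_sd(population) / math.sqrt(len(population))

print(round(sem_a, 1))  # 1.9
print(round(sem_b, 1))  # 2.6
```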

Example calculation of the standard error of the mean (SEM) from the following data:

individual data values (Xi)             sample size (N)   average (X̄)   square of average (X̄²)

Set 1: 10, 30, 10, 50, 90, 70, 90       7                 50             2500
Set 2: 50, 49, 50, 51, 53, 47, 50       7                 50             2500
Set 3: 40, 45, 50, 55, 60, 47, 49       7                 49.4           2440.4
Set 4: 44, 58, 45, 42, 56, 54, 48       7                 49.6           2460.2
Set 5: 58, 60, 60, 45, 60, 52, 42       7                 53.9           2905.2
Sum                                                       252.9          12805.7

A. Since we have the averages from each Set: SEM = SD_avg

Sum of the five averages: $\sum \bar{X}_i = 252.9$, so the average of the averages is $\bar{X} = 252.9/5 = 50.6$ and $\bar{X}^2 = 2558.3$
Sum of the five squared averages: $\sum \bar{X}_i^2 = 12805.7$
Sample size N = 5, so N - 1 = 4

$$SEM = SD_{avg} = \sqrt{\frac{\left(\sum \bar{X}_i^2\right) - N\bar{X}^2}{N-1}} = \sqrt{\frac{12805.7 - (5)(2558.3)}{4}} = 1.9$$

B. If we did not have this information: SEM = s/√N

Sum of all 35 individual data values: $\sum X_i = 1770$, so the average is $\bar{X} = 1770/35 = 50.6$ and $\bar{X}^2 = 2558.3$
Sum of all 35 squared data values: $\sum X_i^2 = 97682$
Sample size N = 35, so N - 1 = 34

$$s = SD = \sqrt{\frac{\left(\sum X_i^2\right) - N\bar{X}^2}{N-1}} = \sqrt{\frac{97682 - (35)(2558.3)}{34}} = 15.5$$

$$SEM = s/\sqrt{N} = 15.5/\sqrt{35} = 2.6$$

Example calculation of the sample standard deviation

individual measurements (Xi)    Xi²
10                              100
30                              900
10                              100
50                              2500
90                              8100
70                              4900
90                              8100

sample size N = 7, so N - 1 = 6
$\sum X_i = 350$, so the average is $\bar{X} = 350/7 = 50$ and $\bar{X}^2 = 2500$
$\sum X_i^2 = 24700$

$$SD = \sqrt{\frac{\left(\sum X_i^2\right) - N\bar{X}^2}{N-1}} = \sqrt{\frac{24700 - 7(2500)}{6}} = 34.6$$

As described above, the standard deviation is a measure of variation, or scatter, of the data.

For many parameters (such as a mean, proportion, odds ratio, survival probability, etc.) we would
like to know the exact value. However, it is difficult or impossible to sample all the individuals in
a population. The standard error provides an estimate of the precision of the desired parameter
and is used when we want to make inferences about a population from a sample of data.

The standard error of the mean (SEM) is the standard deviation of the means of several data sets.
It depicts the dispersion of sample means around the population mean.

Graphing Data
You will often prepare graphs during the analysis stage of laboratory sessions. The main purpose
of graphing data is to allow you and the readers of your lab reports to see your data visually, which
often helps in a subsequent data analysis. Clarity is important: you should present your data in a
simple yet informative fashion.

There are two general types of graphs that you will often draw: scatterplots and bar graphs.
Which you use depends on the aim in drawing the graph.

Scatterplots typically are used to show the relationship between a dependent and an independent
variable. In the experimental lab material discussed in this manual, the independent variable is the
one whose values you are often able to choose and manipulate. The dependent variable is the one
you measure to look for an effect of changing the independent variable. In other situations, you
may need to make an inference about causality in determining which variable is which. For
example, you might conduct a study where you examine the effect of lecture attendance on course

performance in Introductory Biology classes, under the assumption that attendance (independent
variable) influences performance (dependent variable).

In scatterplots, the independent variable is placed on the X-axis and the dependent variable on the
Y-axis. In drawing your scatterplot graphs, keep the following guidelines in mind:

• Always label each axis with the name of the variable and the units measured.
• Give the entire graph a title that describes, in a general sense, what the graph represents.
• When you are graphing data from several treatments or subjects on the same axes, use
different symbols or different colors for each and be sure to provide a descriptive legend.
• Only draw lines on a scatterplot for a specified reason.
o It is usually not appropriate to “connect the dots” on a scatterplot.
o One important exception to this rule, however, is a time series plot, where time is
plotted on the X-axis, and you have obtained sequential measurements of the same
subject. In the case of a time series, it is appropriate to connect the dots for a
particular subject.

The scatterplot below illustrates most of the proper conventions for this type of graph.

Bar graphs are typically used when the variable on the X-axis is categorical. All of the guidelines
for generating scatterplots also apply to bar graphs: label your axes, indicate units, provide a title
and legend, etc. One additional feature of many bar graphs you will prepare for this course is the
depiction of error bars. The bars in bar graphs usually indicate the average of a category or
variable.

Error bars above and below the average are used to indicate the magnitude of variation around
each average. Sample standard deviation, standard error of the mean (SEM), or twice the standard
error of the mean (2X SEM) are commonly plotted as error bars. If a series of samples are taken
and the mean of each is calculated, 95% of the means would be expected to fall within the range
of two standard errors above and two standard errors below the mean of these means. This is why
we often ask you to plot 2X SEM as error bars. Be sure that you always state which type of error
bar you have plotted on a graph.

You can also choose to indicate the sample size of a given category by a number placed above the
top error bar. Other than this sample size number, you should not include numerical values above
the bars: the bars themselves indicate the average and the average plus and minus one sample
standard deviation, so providing these as numerical values is redundant and distracting.

The example bar graph below illustrates these conventions.

Boxplots are useful for visualizing variation in data. Researchers often display categorical data
using a boxplot, also called a “box and whiskers plot.” It tells you where the center of the data is,
and how much variation there is in the data. The image below shows you how to interpret a box
plot. Before generating a box plot, the data are arranged in order from lowest to highest. Next,
they are divided into four quartiles, or quarters. The box itself encompasses the middle two
quartiles, which represent 50% of the data points. The upper bound (top) of the box is the third
quartile. The lower bound (bottom) of the box is the first quartile. The line in the middle of the
box is the median. The ‘whiskers’ are not error bars; rather, they extend 1.5 times the height of
the box beyond its edges. Possible outliers are data points that lie outside of this range and are
plotted above or below the whiskers. Boxplots give you a visual representation of the data and are
useful for comparing medians and distributions between different groups of samples; however,
they cannot tell you whether those groups are significantly different from each other. To determine
whether groups are significantly different from each other, you need to run a statistical analysis,
covered in the next section.
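The boxplot arithmetic described above can be sketched in a few lines. Python is used here for illustration; note that statistical packages use several slightly different conventions for computing quartiles, so values may differ a little between programs (this sketch uses the "inclusive" interpolation method).

```python
import statistics

def boxplot_stats(data):
    """Quartiles, whisker limits, and outliers for a boxplot."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1                  # height of the box (interquartile range)
    lower = q1 - 1.5 * iqr         # whiskers extend 1.5 box-heights past the box
    upper = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower or x > upper]
    return {"q1": q1, "median": median, "q3": q3,
            "whisker_low": lower, "whisker_high": upper,
            "outliers": outliers}

# Example data set from the descriptive statistics section
stats = boxplot_stats([10, 30, 10, 50, 90, 70, 90])
print(stats["median"])    # 50.0
print(stats["outliers"])  # []
```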

Inferential Statistics: Hypothesis Testing


The descriptive statistics described above are frequently used as a preliminary to the main
occupation of statistics: saying something about the population from which the sample was
randomly drawn, on the basis of the data in the sample. In this section we describe the statistical
process of hypothesis testing, the most important of these activities.

Most research starts with the formulation of a research hypothesis. This is a concise statement of
a relationship or mechanism that a scientist suspects to be true. For example, if you were interested
in the relationship between smoking and heart disease, your research hypothesis might be
“smokers have a higher incidence of heart disease than non-smokers.” As a scientist, you would
then design an experiment that will lead to data values which will produce evidence for or against
this research hypothesis.

At the same time the researcher formulates a null hypothesis, which is, typically, that the research
hypothesis is not true. In the above case, the null hypothesis is that there is no true difference
between the two groups, that is to say smokers and non-smokers have the same incidence of heart
disease. As a simpler example, a research hypothesis might be that a coin is biased towards heads;
the null hypothesis would then be that the coin is fair.

To test the “smoking” hypothesis, you could survey the incidence of heart disease in a group of
smokers and non-smokers. If you find exactly the same incidence in both groups, it seems
intuitively reasonable to conclude that smoking has no relationship to heart disease. But what if
you do find a difference between the groups? This difference might reflect a real biological
difference between the groups, or it might be due to chance variation. Statistical methods are
designed to test whether chance variation can reasonably be ruled out as a reason for any
differences observed.

How does this potential chance variation arise? In most cases of interest, chance variation arises
because the data used to test the research hypothesis is only a sample taken from a large population
in which we are interested. Thus, in principle, chance variation could be avoided by observing all
individuals in that population. In practice, this is almost always impossible. For example, if we
wished to apply our smoking research hypothesis to all adults in the United States, then our
population would be all of the adults in the United States. Obviously, we do not have the resources
to measure the smoking habits and incidence of heart disease for every adult in the United States
– we must instead take a representative sample. Chance variation arises because such a sample is
very rarely a perfect reflection of the population.

A second example of chance variation arises in the case of the coin toss. Suppose that a fair coin
is flipped 10 times. Even though the coin is fair, you will not necessarily get exactly 5 heads and
5 tails. Instead, you may see 6 heads and 4 tails, or 3 heads and 7 tails, or some other combination.
It is even possible to get 10 heads and no tails. For a fair coin, any such deviation from the 50:50
ratio is caused by chance.

Suppose that we are given a new coin and wish to test the research hypothesis that it is biased
towards heads against the null hypothesis that it is fair. Suppose (as an extreme example) that you
flip the coin 10 times and get 10 heads. If the null hypothesis is true, the probability of obtaining
10 heads in 10 flips is 1 in 1024, a number that rounds to 0.001. In other words, there is only about
one chance in a thousand that you will obtain 10 heads in 10 tosses of a perfectly balanced coin.

The probability value of 0.001 is called a P-value in statistical terminology. In the coin case
described above, the P-value is the probability of obtaining the observed number of heads (or an
even larger number) when the null hypothesis is true. When the P-value is low, we are inclined to
reject the null hypothesis, since it does not explain the observed data well. In the coin case the low
P-value of 0.001 would incline us to make this decision.

This decision, however, could be incorrect: a fair coin can, although very rarely, give 10 heads
from 10 tosses, and thus the decision to claim that a coin is unfair if it does give 10 heads from 10
tosses can sometimes be wrong. Wrong decisions of this type are unavoidable when randomness
is involved in the outcome of an experiment, and our aim is to adopt procedures that make the
chance of such a wrong decision quite small.

If the coin had given 9 heads in the 10 tosses, the P-value is about 0.01. This is still low and would
still incline us to reject the null hypothesis. For 7 heads the P-value is about 0.17, and this would
not be taken as low enough for us to reject the null hypothesis. Another way of saying this is that
the probability that a fair coin gives 7 or more heads from 10 tosses is about 0.17, so the observation
of getting 7 heads is not enough for us to reject the null hypothesis.
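These coin P-values can be checked with a short exact calculation of the binomial tail probability (a Python sketch for illustration):

```python
from math import comb

def p_value_heads(observed_heads, flips=10):
    """P-value for a coin biased towards heads: the probability of
    observing at least this many heads if the coin is fair (the null
    hypothesis). Counts the favorable outcomes out of 2**flips."""
    tail = sum(comb(flips, k) for k in range(observed_heads, flips + 1))
    return tail / 2 ** flips

print(round(p_value_heads(10), 3))  # 0.001  (1 in 1024)
print(round(p_value_heads(9), 2))   # 0.01
print(round(p_value_heads(7), 2))   # 0.17
```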

When P-values are available, either by mathematical calculation or as part of the output from a
statistical computer package, they provide the most natural vehicle for testing a statistical
hypothesis. Biologists tend to doubt that a null hypothesis is true when the P-value calculated from
the data is 0.05 or less. Medical researchers tend to be more cautious and require a P-value of 0.01
before doubting the truth of the null hypothesis.

In contrast to the coin example, for many statistical tests of hypotheses P-values cannot be
calculated easily and might not even be available from computer packages. When P-values are not
available, researchers use a "critical value" approach to hypothesis testing. The details of this
procedure are as follows.

The researcher must first decide on an acceptable "false positive rate" (called technically a Type
I error). As stated above, because of the randomness inherent in any sampling procedure, it is
possible to reject a null hypothesis even when it is true, that is to make a false positive claim. The
researcher chooses an acceptably small false positive rate, usually 0.05 or 0.01, and carries out a
procedure for which a false positive decision is made with this probability. In outline the details
are as follows.

In any statistical testing procedure, a so-called “test statistic” is calculated from the data. In the
coin-tossing case described above, this is simply the number of heads observed. In more
complicated (and realistic) cases, this might be the t-statistic, or the chi-square test statistic
described below. For each such statistic, tables are available giving so-called "critical values." If
the observed value of the test statistic exceeds the appropriate critical value given in the table, the
null hypothesis is rejected. These critical values are calculated so as to ensure that the null
hypothesis is rejected with the false positive rate chosen by the investigator. Examples of this
approach are given below.

A Primer On Linear Models


You may be familiar with linear models already, although under a different name. The most
widely used analyses—such as t-tests, ANOVA, and linear regression—are all linear models. A
linear model tests for a relationship between a response variable (also called the “dependent” or
“Y” variable) and one or more predictor variables (also called “independent” or “X” variables).
In a graph, response variables are plotted on the Y-axis and predictors are plotted on the X-axis.

t-tests, ANOVA, and linear regression are subtypes of linear models that are distinguished by the
type of predictor (X) variable in the analysis:
• t-tests and ANOVA have categorical predictors.
o When the predictor has 2 categories, the analysis is called a t-test.
o When it has more than 2 categories, the analysis is called an ANOVA.

• Linear regression refers to a linear model with continuous predictors.

Although these analyses have historically gone by different names, the underlying mathematics is
the same. This is why in R, we use the same function (lm, which stands for “linear model”),
regardless of whether the predictor is categorical or continuous.

BOX 1. Examples of linear models


Research question 1: Does the Moderna mRNA vaccine protect against severe COVID-19?
• Type of linear model: t-test
• Response (Y): Disease severity
• Predictor (X): Vaccine treatment (2 categories: vaccine or placebo)

Research question 2: Do pollinators prefer plants with white, pink, or red flowers?
• Type of linear model: ANOVA
• Response (Y): Number of pollinator visits
• Predictor (X): Flower color (3 categories: white, pink, or red)

Research question 3: Do beetles with longer horns produce more offspring?


• Type of linear model: linear regression
• Response (Y): Number of offspring
• Predictor (X): Horn length

How Linear Models Work


A statistical analysis consists of 3 steps:

Step 1: Parameter estimation


The first step is estimating parameter(s) from your data. A parameter is a number that describes
a key feature of your data, like group averages or the slope of a line. In an analysis with a
categorical predictor (t-test and ANOVA), the parameters that are estimated are the group means
(averages). In an analysis with a continuous predictor (linear regression), the parameter that is
estimated is the slope of a line.

BOX 2. Parameters estimated in the examples in Box 1 above


Research question 1: Does the Moderna mRNA vaccine protect against severe COVID-19?
• Parameters: Average disease severity in the vaccine and placebo groups

Research question 2: Do pollinators prefer plants with white, pink, or red flowers?
• Parameters: Average number of pollinator visits to white, pink, and red flowers

Research question 3: Do beetles with longer horns produce more offspring?


• Parameter: Slope of the relationship between horn length (X) and number of offspring
(Y); the increase in number of offspring per cm of horn length
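Both kinds of parameter estimate in Step 1 can be computed directly. The sketch below uses Python for illustration (in lab you will use R's lm function); all data values are invented purely for demonstration.

```python
def group_means(groups):
    """Parameters for a categorical predictor (t-test/ANOVA):
    one mean per group."""
    return {name: sum(vals) / len(vals) for name, vals in groups.items()}

def slope(x, y):
    """Parameter for a continuous predictor (linear regression):
    least-squares slope = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)**2)."""
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    num = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    den = sum((xi - xbar) ** 2 for xi in x)
    return num / den

# Hypothetical pollinator visits by flower color (ANOVA-style parameters)
print(group_means({"white": [4, 6, 5], "pink": [8, 9, 10], "red": [12, 11, 13]}))

# Hypothetical horn length (cm) vs. number of offspring (regression parameter)
print(slope([1.0, 2.0, 3.0, 4.0], [3, 5, 7, 9]))  # 2.0
```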

Step 2: Calculate the test statistic
The second step is calculating the test statistic. A test statistic is a number calculated from your
data that will be used to calculate a p-value. There are many test statistics: F, t, and chi-square are
all common ones that you might have heard of before.

Test statistics quantify the match between your data and the null hypothesis. What does this
actually mean in practice? In a linear model, the test statistic quantifies the strength of the
relationship between X and Y. The null hypothesis is that there is no relationship between X and
Y, i.e., that the groups are not different from each other (t-test, ANOVA) or that the slope of the
line is 0 (linear regression).

The F-ratio in ANOVA


To illustrate how a test statistic is calculated, we’ll use the F-ratio (F for short). F is the test
statistic often used in ANOVA. F measures the difference among group means relative to the
differences between observations within groups.

A large F-ratio indicates that the groups are different. This is because F is large whenever there
are large differences among groups and small differences between observations in the same
group.

BOX 3. How is F calculated in an ANOVA?


F is a ratio of variances. It is calculated by dividing the among-group variance by the within-
group variance:
F=Among-group variance/Within-group variance

The among-group variance measures the differences among group means, and the within-
group variance measures the differences between observations in the same group:

A variance, as its name implies, is a measure of variation. It is defined as the average squared
deviation from the mean. In other words, a variance quantifies how far, on average, observations
are from the mean. The variance is closely related to the standard deviation: the standard deviation
is the square root of the variance.
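The F-ratio in Box 3 can be computed directly from raw data. Below is a minimal Python sketch (the course uses R for these analyses; the three groups here are invented for illustration):

```python
def f_ratio(groups):
    """One-way ANOVA F: among-group variance / within-group variance.

    The among-group variance (mean square between) uses k - 1 degrees
    of freedom; the within-group variance (mean square within) uses
    N - k, where N is the total number of observations."""
    k = len(groups)
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)              # among-group variance
    ms_within = ss_within / (len(all_values) - k)  # within-group variance
    return ms_between / ms_within

# Three hypothetical groups whose means differ relative to their spread
print(f_ratio([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))  # 3.0
```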

Step 3: Calculate the p-value


Although the F-ratio and other test statistics convey information about how different the groups
are, they do not directly tell you whether the difference is statistically significant. To determine if
it is, the test statistic is used to calculate a p-value.

• Mathematically, a p-value is the probability of observing a test statistic that is at least as
extreme as the value of your test statistic if the null hypothesis is true.

• Conceptually, a p-value is the chance that you would have observed as large of a
difference between the groups if in reality there were no difference between the groups.

P-values range from 0 to 1, and the conventional threshold for statistical significance is 0.05. If
your p-value is less than 0.05, it means that there is less than a 5% chance that you would have
observed as large of a difference between the groups as you did if they aren’t in fact different.

BOX 4. How to calculate a p-value


Calculating a p-value involves two steps:
1. Determine the null distribution of your test statistic. A test statistic’s null distribution is
the range of values you could observe if the null hypothesis were true, that is, if there is
no real difference between the groups.
2. Calculate the proportion of values in the null distribution that is more extreme than the
value of your test statistic (e.g., the F-ratio you calculated from your data)
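The two steps in Box 4 can be sketched with a permutation test, which builds the null distribution by resampling: if the null hypothesis is true, the group labels are arbitrary, so shuffling them shows which differences arise by chance alone. (Python for illustration; the two groups are hypothetical, and because the procedure is random the p-value is approximate.)

```python
import random

def permutation_p_value(group_a, group_b, reps=10000, seed=0):
    """Approximate p-value for a difference in group means.

    Step 1: build the null distribution by repeatedly shuffling the
    pooled values between the two groups. Step 2: count the proportion
    of shuffled differences at least as extreme as the observed one."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = group_a + group_b
    n_a = len(group_a)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a
                   - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if diff >= observed:
            extreme += 1
    return extreme / reps

# Hypothetical treatment vs. control measurements
p = permutation_p_value([12, 15, 14, 16, 13], [9, 11, 10, 8, 12])
print(p)
```

With groups this clearly separated, the approximate p-value comes out well below 0.05, so the null hypothesis would be rejected.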

Two Examples of Comparative Statistical Tests


The coin example above is a very simple example of a statistical procedure. More realistic
examples usually concern comparative statistical methods. These are designed to compare two or
more samples. You will often use comparative statistical tests to compare data from a “control”
sample, or group, and a “treatment” sample, or group, in experimental biology labs. Many such
comparative statistical tests exist, each designed for a specific situation. The two outlined here are
the two comparative statistical tests used most frequently by biologists. In both cases, the relevant
calculations can be done using a normal pocket calculator. More elaborate statistical tests are
usually performed on a computer running statistical software.

The t-test
Imagine that you are interested in the effects of midterm-induced stress on the physiology of Penn
undergraduates. Because epinephrine is a hormone associated with the short-term stress response,
you decide to focus your study on changes in circulating epinephrine levels caused by midterm
worries. Your research hypothesis would then be “A midterm exam will increase the epinephrine
levels of Penn undergraduates.” The null hypothesis is that the exam does not change the
epinephrine levels of Penn undergraduates.

In order to test the null hypothesis against the research hypothesis, suppose that you choose 10
students at random from an introductory biology class. They all agree to participate in your study.
From each student, you take a blood sample ten days before an introductory biology midterm and
measure the level of circulating epinephrine. Ten days later, just as the midterm is beginning, you
take another blood sample from each student and again measure epinephrine levels. An imaginary
dataset is presented below for purposes of illustration:

                  epinephrine levels (ppm) for each individual                  sample    standard
                 1     2     3     4     5     6     7     8     9     10       average   deviation

pre-midterm     0.06  0.09  0.04  0.11  0.03  0.05  0.12  0.02  0.05  0.07
during midterm  0.01  0.29  0.22  0.09  0.19  0.09  0.11  0.10  0.19  0.07
difference     -0.05  0.20  0.18 -0.02  0.16  0.04 -0.01  0.08  0.14  0.00     0.072     0.092

We focus on the various differences in epinephrine levels. The sample average and sample
standard deviation of the differences, calculated by the formulae given above, are 0.072 and 0.092
ppm respectively.

Thus, epinephrine levels increased on average by 0.072 parts per million during the midterm exam.
But was this increase statistically significant, or was it likely to have been caused by chance?

In this example, the test of the research hypothesis against the null hypothesis is carried out by
using a t-test. There are several types of t-test, each appropriate in a different situation. The
t-test relevant to the above data is the so-called “two-sample paired” t-test. Two-
sample t-tests in general are used to test the research hypothesis that some variable differs between
two groups. They can be either “paired” or “unpaired.” The data in the epinephrine example are
said to be “paired,” as each subject was measured both before and during the exam, so that there
is a natural “within-subject” pairing. In general, paired t-tests are used when there is a meaningful
one-to-one correspondence between the data points in one group and those in the other.

Other kinds of data are “unpaired,” so that there is not a meaningful one-to-one correspondence
between the data points in one group and those in the other. For example, if we wished to test the
research hypothesis that there is a gender difference in mean epinephrine levels, and the data used
to carry out the test came from a randomly selected group of men and a randomly selected group
of women, the unpaired t-test would be appropriate. The unpaired t-test is widely used in biological
research because meaningful pairing is not always possible.

The formula for the test statistic used in a paired t-test is relatively simple. First, calculate the
difference for each pair, as we did above for the epinephrine example. Then calculate the sample
average X̄D and the sample standard deviation SDD of these differences. The t-statistic is then
calculated as shown below, where N is the number of pairs in the sample (10 in the epinephrine
example):

    t = (X̄D × √N) / SDD
For our epinephrine example, this gives:

                 Average    SD       t statistic

    difference    0.072    0.092    t = (0.072 × √10) / 0.092 = 0.228 / 0.092 = 2.478

An intuitive interpretation for this statistic is that it is a signal-to-noise ratio. The signal is the
observed average difference, 0.072. The noise is measured by the standard deviation, 0.092. The
sample size (10) is also a relevant factor in the calculation.
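
Though the lab only requires a pocket calculator, the same paired t-statistic can be sketched in Python (standard library only) to check the arithmetic. Note that carrying full precision in the standard deviation gives t ≈ 2.47 rather than 2.478; the hand calculation above rounds SDD to 0.092 before dividing. Either way the conclusion is unchanged.

```python
import math
import statistics

# Paired differences (during midterm minus pre-midterm) from the table above
differences = [-0.05, 0.20, 0.18, -0.02, 0.16, 0.04, -0.01, 0.08, 0.14, 0.00]

n = len(differences)                   # number of pairs (N = 10)
x_bar = statistics.mean(differences)   # sample average of the differences
sd = statistics.stdev(differences)     # sample standard deviation

t = x_bar * math.sqrt(n) / sd          # t = (X̄D × √N) / SDD

print(round(x_bar, 3), round(sd, 3), round(t, 2))
```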

If we could calculate the P-value corresponding to the observed value (2.478) of the t-statistic in
the same manner as we did for the P-value for getting 10 heads from 10 flips of a coin, we could
make a reasonable assessment about whether we should accept the null hypothesis or the research
hypothesis. However, this P-value calculation is complicated, and we therefore proceed by using
the critical value approach, described above.

The critical value appropriate for any given t-test depends on three quantities. First, it depends on
the false positive rate accepted by the researcher. The values commonly chosen in practice are listed
across the top of the t-table. Second, it depends on the "degrees of freedom" for the test. For the
paired t-test, the number of degrees of freedom is one less than the number of pairs; in the
epinephrine example, this is 9. Third, it depends on the nature of the research hypothesis. In the
epinephrine example the research hypothesis claims an increase in the quantity measured under
treatment conditions. The t-table is designed for a research hypothesis of this type. The table can
also be used, with some adjustments, for a case where the research hypothesis claims a decrease
in the quantity measured under treatment conditions, and also for a case where the research
hypothesis claims a change, either up or down, in the quantity measured under treatment
conditions.

For the epinephrine example, suppose that the false positive rate chosen is 0.05. Reading down the
column represented by the false positive rate chosen of 0.05 to 9 degrees of freedom, we find a
critical value of 1.833. Since our calculated t-statistic value of 2.478 is greater than 1.833, we
reject the null hypothesis, and our research hypothesis that “A midterm exam will increase the
epinephrine levels of Penn undergraduates” is supported by our data.

Critical values of t (for a “one-sided up” t-test)

Degrees of False Positive Rate


Freedom 0.10 0.05 0.025 0.01 0.005
1 3.078 6.314 12.706 31.821 63.657
2 1.886 2.920 4.303 6.965 9.925
3 1.638 2.353 3.182 4.541 5.841
4 1.533 2.132 2.776 3.747 4.604
5 1.476 2.015 2.571 3.365 4.032
6 1.440 1.943 2.447 3.143 3.707
7 1.415 1.895 2.365 2.998 3.499
8 1.397 1.860 2.306 2.896 3.355
9 1.383 1.833 2.262 2.821 3.250
10 1.372 1.812 2.228 2.764 3.169
11 1.363 1.796 2.201 2.718 3.106
12 1.356 1.782 2.179 2.681 3.055
13 1.350 1.771 2.160 2.650 3.012
14 1.345 1.761 2.145 2.624 2.977
15 1.341 1.753 2.131 2.602 2.947

There are two notes regarding this procedure. First, there is no difference in principle between this
approach to hypothesis testing and that using P-values. We have used the approach described
above simply because P-values are very difficult to calculate in this example. It is however possible
to say from the t-table that, with the observed value of 2.478 of the t-statistic, the P-value is
somewhere between 0.01 and 0.025. Second, use of the t-test assumes that the data have a normal
distribution (the so-called bell-shaped curve) – these distributions are very common in biology,
and you can assume that the data from your labs have normal distributions unless your lab
instructor tells you otherwise.
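
The exact P-value requires the t distribution, which is beyond a pocket calculator, but it can be approximated by simulating the null distribution (step 1 of Box 4) with standard-library Python. This is a sketch, not something you need for lab: we repeatedly draw 10 "differences" from a normal distribution with mean zero (no real effect) and see how often the resulting t-statistic is at least as large as the observed 2.478.

```python
import math
import random
import statistics

random.seed(0)  # reproducible simulation

def paired_t(diffs):
    """Paired t-statistic: mean × √N / sample standard deviation."""
    return statistics.mean(diffs) * math.sqrt(len(diffs)) / statistics.stdev(diffs)

# Null distribution: samples of 10 paired differences with true mean 0.
# (The spread chosen here does not matter; the t-statistic is scale-free.)
null_ts = [paired_t([random.gauss(0, 1) for _ in range(10)])
           for _ in range(50_000)]

observed_t = 2.478
p_value = sum(1 for t in null_ts if t >= observed_t) / len(null_ts)

print(p_value)  # roughly 0.017, between 0.01 and 0.025 as the t-table indicates
```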

The chi-square test


A chi-square test is used when you have frequency data, rather than the measurement data we
analyzed with the t-test. For example, suppose that we wish to test the research hypothesis that
“smokers have a higher incidence of heart disease than non-smokers” against the null hypothesis
that smokers and non-smokers have equal heart disease rates. To do this we might gather data such
as that shown below.

Suffered from Did not suffer Total


heart disease from heart disease
Smokers 45 103 148

Non-smokers 61 311 372

Total 106 414 520

We can see that in agreement with the research hypothesis, 30.41% (i.e., 45/148) of smokers but
only 16.40% (i.e., 61/372) of non-smokers suffered from heart disease. But is this a statistically
significant difference or is it likely to be the result of chance variation? Since our data consist of
two frequencies, rather than data with averages and standard deviations, we cannot use a t-test to
answer this question. Instead, we use a chi-square test.

To perform a chi-square test, we calculate a "chi-square" statistic (or χ²) using the following
formula:

    χ² = Σ [ (O − E)² / E ]
In this equation O is the observed number from each of the four cells shown in bold in the above
table, and E is the number we "expect" for each of these cells if our null hypothesis were true.

How do we know what to expect if the null hypothesis is true? Since overall, 106 of the 520 people
in the study (106/520) suffered from heart disease, we would expect this same proportion of both
smokers and non-smokers to suffer from heart disease if our null hypothesis were correct (i.e.,
there was no effect of smoking on heart disease). For the same reason, we would expect a
proportion (414/520) of both smokers and non-smokers to not suffer from heart disease. We
therefore calculate the expected values as follows:

Expected data Suffered from Did not suffer Total


heart disease from heart disease
Smokers 148 x 106/520 148 x 414/520 148.00
= 30.17 = 117.83

Non-smokers 372 x 106/520 372 x 414/520 372.00


= 75.83 = 296.17

Total 106.00 414.00 520.00

When making a table of expected numbers, you should always calculate to two decimal place
accuracy in order for your final X2 to be accurate. Also, always check your calculations by making
sure that your expected numbers add up to the same totals as in the data table, both row-wise and
column-wise. With our expected numbers, we are now ready to calculate our X2:

    χ² = (45 − 30.17)²/30.17 + (103 − 117.83)²/117.83 + (61 − 75.83)²/75.83 + (311 − 296.17)²/296.17 = 12.80

If our data were exactly the same as what we expected under the null hypothesis, the value of χ²
would be zero. The further a calculated χ² is from zero, the less likely that the results are due to
chance variation alone, and the stronger our evidence is against the null hypothesis. To see
if our calculated χ² is significant, we consult a "critical values of chi-square" table, shown below.

As with the t-test, to determine the significance of a certain χ² value using this table, we must first
choose a false positive rate and determine the degrees of freedom. For this chi-square test, there
is one degree of freedom. For one degree of freedom and a false positive rate of 0.05, the critical
χ² value is 3.84. Our calculated χ² of 12.80 is greater than 3.84, so we reject the null hypothesis. Our
research hypothesis that "smokers are more likely to suffer from heart disease than non-smokers" is
supported by our data.

Critical values of chi-square

Degrees of False Positive Rate


Freedom 0.100 0.050 0.010 0.005
1 2.71 3.84 6.63 7.88
2 4.61 5.99 9.21 10.60
3 6.25 7.81 11.34 12.84
4 7.78 9.49 13.28 14.86
5 9.24 11.07 15.09 16.75
6 10.64 12.59 16.81 18.55
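
As a sketch in Python, the expected counts and χ² for the smoking example can be computed directly from the row and column totals. Note that this is the uncorrected statistic matching the hand calculation (some software applies a Yates continuity correction to 2 × 2 tables by default), and it uses the tabulated count of 311 non-smokers without heart disease, giving χ² ≈ 12.80.

```python
# Observed counts from the smoking / heart disease table
observed = {
    ("smoker", "disease"): 45,
    ("smoker", "no disease"): 103,
    ("non-smoker", "disease"): 61,
    ("non-smoker", "no disease"): 311,
}

row_totals = {"smoker": 45 + 103, "non-smoker": 61 + 311}   # 148, 372
col_totals = {"disease": 45 + 61, "no disease": 103 + 311}  # 106, 414
grand_total = 520

chi_square = 0.0
for (row, col), o in observed.items():
    e = row_totals[row] * col_totals[col] / grand_total  # expected under the null
    chi_square += (o - e) ** 2 / e

print(round(chi_square, 2))  # 12.8, well above the 0.05 critical value of 3.84
```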

Another very different application of the chi-square test is to evaluate the significance of observed
deviations from null hypothesis expected values determined by some genetic hypothesis. For
example, suppose that you make a number of crosses of two pink-flowered plants, and find that
some of the offspring from this cross have red flowers, some have pink flowers, and others have
white flowers. What genetic theory can be tested with the data you obtain?

If flower color is determined in a simple way by the genes at a single locus, then the probabilities
of a cross resulting in a red, pink, or white flower are, respectively, ¼, ½, and ¼ (these are the
classical Mendelian 1:2:1 proportions obtained from a monohybrid cross between two heterozygotes).
In this example, the null hypothesis might be that these proportions do apply; that is, that flower
color is determined in this simple way. The research hypothesis might be that flower color is
determined in a more complicated genetic way. The test of this null hypothesis against this research
hypothesis is also tested by a chi-square procedure, but one that is different from that described
above.

Suppose that you make 1000 crosses of two pink-flowered plants, and that 259 of the offspring
plants have red flowers, 513 have pink flowers, and 228 have white flowers. Do these values
support the null hypothesis (the 1:2:1 proportions above) or not? To answer this
question using a chi-square test, the first step is to determine the expected values (E) for each
category, assuming that the null hypothesis is true. These expected values are a product of the
sample size (in this case 1000) and the appropriate ratio for each category. Then, the value for χ²
is calculated as Σ[(O−E)²/E] for all three categories as follows:

10-20
sample of 1000 plants
red flowers pink flowers white flowers
observed (O) 259 513 228
expected (E) 1000 x 1/4 = 250.00 1000 x 1/2 = 500.00 1000 x 1/4 = 250.00
(O−E)²/E 0.32 0.34 1.94
χ² = Σ[(O−E)²/E] 2.60

When analyzing genetic data such as these, the degrees of freedom is calculated as the number of
categories minus one. Since there are three categories (red, pink and white), there are two degrees
of freedom in this example. If your accepted false positive rate is 0.05, the “critical values of chi-
square” table above shows that the critical χ² value (for two degrees of freedom) is 5.99. Since
your calculated value for χ² is less than 5.99, the null hypothesis is supported, and the observed
deviations from the expected results can be attributed to chance.

The fact that the observed value of chi-square (2.60) is below even the critical value of 4.61 for an
accepted false positive rate of 0.100 shows that the P-value corresponding to the value 2.60 is
greater than 0.100.
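
The goodness-of-fit calculation for the flower-color cross can be sketched the same way, with expected counts formed from the ¼ : ½ : ¼ ratios:

```python
# Observed flower counts from the 1000 crosses: red, pink, white
observed = [259, 513, 228]
expected_ratios = [0.25, 0.50, 0.25]   # the ¼ : ½ : ¼ null hypothesis
n = sum(observed)                       # 1000 plants in total

chi_square = sum((o - n * r) ** 2 / (n * r)
                 for o, r in zip(observed, expected_ratios))

print(round(chi_square, 2))  # 2.6, below the 0.05 critical value of 5.99 for 2 df
```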

Measuring Metadata for Analysis
Follow the directions below to measure each variable for each of your 6 seedlings. Record your
data on the Symbiosis Google Form for each seedling. 1 point will be deducted from each
group’s score if all six Google Forms are not submitted by midnight on the day of your lab session.

Make sure you measure all seedling color parameters before placing the tubes in the water bath.
The water bath will skew the nodule, root, and leaf coloration. When you are ready, place each of
the tubes in the 65°C water bath to soften the agar and make it easier for you to remove the seedling
to make measurements.

After 5-10 minutes in the water bath, remove the tubes and transfer the seedlings to the black
trays at your desk. Have one group member make all measurements and another group member record
the measurements to avoid variance and bias in the data.

Materials: Amount:
Mini rulers 1 ruler / pair
Working test tube rack 1 rack / pair
Black tray 2 trays / pair
Forceps 1 beaker / set up
Metal spatula 1 beaker / set up
Water bath set to 65°C 3 baths / class

Cultures: Amount:
Alfalfa growing on TY agar slant 6 plants / pair

Measuring Alfalfa Color Parameters:

1. Observe the color of the nodules on your first seedling. If any of the nodules are red or
pink, score the nodules as red or pink. If the majority (over half) of the nodules are brown,
score the nodules as brown. Otherwise, score the nodules as white. Enter root nodule color
here and on your Google Form:
Color of root nodules:

2. Enter the color of the primary root here and on your Google form. Score as white if there
is no red or pink in the primary root and red or pink if there is any red or pink in the primary
root.
Primary root color:

3. Score the color of the darkest green leaf that you see on a scale of 1-10 after comparing to
the leaf color chart below. Enter the number corresponding to the color of the darkest green
leaf (closest to 1 on the scale) that you see here and on the Google Form.
Color of darkest leaf:

4. Score the color of the lightest green or yellow leaf that you see on a scale of 1-10 after
comparing to the leaf color chart above. Enter the number corresponding to the color of the
lightest green, yellow or brown leaf (closest to 10 on the scale) that you see here and on
the Google Form.
Color of lightest leaf:

Excision of the Alfalfa Seedlings:

1. Place each of your six tubes in a 65°C water bath.

2. After ~10 minutes, remove your seedling tubes from the water bath, place them in a test
tube rack, and take the rack to your bench.

3. Excise each plant from the slant by carefully removing the foam stopper from the slant culture
and, using a spatula, going to the base of the slant and inserting the spatula as if you were
inoculating the butt of the slant.

4. Maneuver the spatula in a circular motion around the tube to loosen the agar slant from the
glass and physically separate the agar. Since the agar has been softened in the water bath,
this should meet little to no resistance.

5. Once the agar has been separated from the glass test tube, use forceps to grab part of the
alfalfa stem and gently pull the plant up and out of the tube. The loosened agar should give
way so the plant can be slowly and carefully excised without any agar stuck to the root
system.

6. If agar is stuck to the root system, you may use a gloved hand to gently pull it away.

7. Place the plants on a black tray and carefully splay out the plant so that the shoots and roots
are all nicely laid out. Do your best to lay out the lateral roots without breaking them off!
These root systems are fragile.

Measuring Remaining Variables:

1. Count the number of root nodules on your first seedling and enter the numbers here and on
the Google Form (once per group):
Total number of root nodules:
Number of root nodules on primary root:
Number of root nodules on lateral roots:

2. Observe the shape of the nodules on your first seedling. Score them as round if the majority
(over half) of the nodules are round and as oblong if the majority of the nodules are oblong.
Enter here and on your Google Form:
Shape of root nodules:

3. Use a ruler to measure the primary root length in millimeters. To do this, you will need to
determine where the root ends and the stem begins. The root diameter is usually
slightly larger than the stem diameter, and the root is often (but not always) white. Use the
dissecting microscope if necessary, and look for root hairs on the root, but not the stem.
Enter here and on your Google Form:
Primary root length:

4. Count the number of lateral roots coming off of the primary root of your seedling. A
forming lateral root can be distinguished from a nodule because a forming lateral root is
pointy, whereas a forming nodule is usually rounded. Enter this number here and on your
Google form.
Number of lateral roots:

5. Use a ruler to measure the length of the longest lateral root in mm and enter this number
here and on your Google Form.
Length of longest lateral root:

6. Use a ruler to measure the stem height from the base of the stem to the shoot apical
meristem in mm. The base of the stem is the point at which you started measuring the
primary root. The stem is usually narrower in diameter than the root, and is usually green
or red, without root hairs. The shoot apical meristem is a small pointy structure at the top
node of the stem where the leaf petiole(s) (which look like branches) arise. Do not include the length
of the top leaf petiole (branch) in this measurement. Enter the stem height (in mm) here
and on your Google Form.
Stem height:

7. Use your forceps to carefully spread out the leaves. The petioles are the branch-like
structures that connect the leaves to the stem (technically, they are part of the leaf, but they
look like branches). Measure the length of the longest petiole in mm and enter the petiole
length here and on your Google Form.
Length of longest petiole:

8. Count the number of leaves on your seedling. Do not include the cotyledons, which are the
two single-lobed leaves at the base of the stem. The cotyledons have very short petioles. If
you don’t see this pair of single-lobed leaves at the base of the stem, the cotyledons may
have died or fallen off. The first leaf that forms above the cotyledons is a single-lobed leaf
with a longer petiole. Any subsequent leaf that forms should have 3 lobes at the end of a
longer petiole. Count all three lobes as a single leaf! Enter the number of leaves on your
seedling (not including the cotyledons) here and on your Google Form.
Number of leaves:

9. In the notes section, here and on the Google Form, enter any interesting observations that
you make about your seedling. For example, if the nodules are unusually large, if the roots
are very curly and form circles, or if the seedling is dying due to lack of agar, you should
mention this in the notes section.
Notes about seedling:

10. Submit your completed Google Form for Seedling #1.

11. Repeat the steps above for Seedlings #2-6 on new Google Forms and submit when
completed.

A spreadsheet with all of the pooled data from BIOL1124 will be posted to Canvas on Friday April
26th. Use this data to complete the Symbiosis in the Soil, Session 2 Group Assignment by
Wednesday, May 1st.
