R Statistics and Graphing

The courses in the Biology Department have adopted a statistical program called R. To install R,
go to http://cran.r-project.org and click on the MacOS X or Windows link in the Download
and Install R box (you won’t need any of the source code files; these are for developers). Follow
the links for the base files for Windows (Install R for the first time … Download R 3.3.1) or the
R 3.3.1 .pkg file for MacOS.

Independent t-test: Used for most Biology 102 assignments when comparing two separate
groups that do not have a direct influence on the other during the experiment (e.g., comparing
plant growth of two groups under different growing conditions).

When significance is determined for experimental results in Biology 102 assignments, it is important
to use the required wording shown in the following example. Failure to do so will cost marks.

Required Wording: It was found that the students on Team A (n=14) were significantly faster (p=0.04)
than those on Team B (n=21).

Where
n refers to the sample size … e.g., 14 students on Team A were measured, 21 on Team B were
measured.
p is the probability value that is compared with a threshold; 0.05 (or 95%) is the standard
threshold in science and the one we will use in Biology 102. A p value of less than 0.05 implies
significance. For example, the significant p value of 0.04 we obtained in the example above would
equate to 98% confidence, and if you were a gambler, you would say the odds were 49 to 1 that
Team A would win in a race.
Direction. We are using a one-sided analysis with all our assignments. By default, R gives a p
value that is based on a two-sided analysis. To convert to a one-sided analysis, we simply divide
the given p value in half. This allows us to give direction to our wording, and in our Team A
example the direction we noted was “faster”. In our assignments we require you to state the
direction (e.g., bigger, stronger, faster, greater, etc.).
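
The halving step can be sketched in R as follows. This is a minimal illustration and not part of the required script: the variable names (p_two_sided, p_one_sided, alpha) are our own, and 0.00148 is the two-sided p value reported in Figure 1. Note that halving only makes sense when the observed difference is in the direction you predicted.

```r
# Two-sided p value as reported by R (the value shown in Figure 1).
p_two_sided <- 0.00148

# One-sided p value: divide the two-sided value in half.
p_one_sided <- p_two_sided / 2

# Compare against the 0.05 threshold used in the course.
alpha <- 0.05
is_significant <- p_one_sided < alpha

print(p_one_sided)
print(is_significant)
```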

Purpose. A significant result adds strength to the quality of your results. Let’s say you were
comparing two sets of results, and the average value of Trial 1 was 7 pounds and the average value
of Trial 2 was 11 pounds. You could say that Trial 2 had a higher average weight. You COULD
NOT say that Trial 2 had a much higher average weight, as that is subjective and a mark loser on
Biology 102 assignments. However, if you do an independent t-test and get a p value below
0.05, you can now say that Trial 2 had a significantly higher average weight. Neat, eh?

R instructions

1. Open the R program.

2. Under the File pull down menu, open up New Script. This opens up a script box that allows
you to do rough work. Trust us; it will save you headaches to do your initial data entry in this
script box. It is not a good idea to copy and paste your R setup from Excel or Word, as there can
be characters that R does not recognize (e.g., the curly ‘ vs. the straight ') and you will get error
messages when you try to enter your script into the console.

3. Enter the R formula for an Independent t-test exactly as it appears below into the New Script
editor box. In the example below the first line represents the experimental values for two trials of
data (6,5,8,5,7,11,7 for Trial 1 and 3,5,3,5,2,1,2 for Trial 2), and each number in the first line
corresponds (in order) to the appropriate Trial (1 or 2) in the second line of script. It doesn’t
really matter what we call them (i.e., Trial 1, Test 1, A, etc.) in the second line, as each term is
surrounded by apostrophes, but we must be consistent. When you have finished editing your
script, copy and paste it into the R Console box. See Figure 1.

4. A proof of R is required at the end of each assignment. See what is required for Proof of R at
the end of this file.

For practice, enter the example below into the New Script editor and then copy and paste from
there to the prompt (>) of the R Console. It is easy to make mistakes, such as a misplaced comma or
different spacing, and practice is helpful. R is also used in many upper year biology courses.

Example Script for an Independent t-test


A=c(6,5,8,5,7,11,7,3,5,3,5,2,1,2)
B=c('Trial 1','Trial 1','Trial 1','Trial 1','Trial 1','Trial 1','Trial 1',
'Trial 2','Trial 2','Trial 2','Trial 2','Trial 2','Trial 2','Trial 2')
C=data.frame(A,B)
C
D=glm(A~B,data=C)
summary(D)

Figure 1. Screen capture of R with New Script formula and resulting t-test analysis. Note the given p
value of 0.00148.
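
As a cross-check on the p value in Figure 1, the same comparison can be run with R's built-in t.test() function. This is only a sketch: t.test() is not the method the course script uses, and the names trial1, trial2, p_glm, two_sided and one_sided are our own. With var.equal = TRUE the two-sided result should agree with the glm() output, and alternative = "greater" should give the halved, one-sided value directly.

```r
# Data from the example script: Trial 1 and Trial 2.
trial1 <- c(6, 5, 8, 5, 7, 11, 7)
trial2 <- c(3, 5, 3, 5, 2, 1, 2)

# Same layout as the course script; rep() writes each label seven times.
A <- c(trial1, trial2)
B <- rep(c('Trial 1', 'Trial 2'), each = 7)
C <- data.frame(A, B)
D <- glm(A ~ B, data = C)

# Pull the two-sided p value for the Trial 2 row out of the summary table.
p_glm <- coef(summary(D))['BTrial 2', 'Pr(>|t|)']

# Pooled-variance t-test, two-sided and one-sided (Trial 1 greater than Trial 2).
two_sided <- t.test(trial1, trial2, var.equal = TRUE)
one_sided <- t.test(trial1, trial2, var.equal = TRUE, alternative = "greater")

print(p_glm)
print(two_sided$p.value)
print(one_sided$p.value)
```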

The Box Plot: All graphing in the course will involve a box plot that you create in R. We have
provided an example script for a two trial box plot directly below, and then a three trial box plot
to show how easy it can be to add more boxes. The first example corresponds to the t-test data
from the t-test example above. Note that the values are the same, that the names Highsun and
Lowsun contain no spaces in the first three lines of script, and that spaces are not an issue in the
fourth line of script because of the apostrophes.

Highsun=c(6,5,8,5,7,11,7)
Lowsun=c(3,5,3,5,2,1,2)
Variables=c(Highsun,Lowsun)
Trials=c('High Sun','High Sun','High Sun','High Sun','High Sun','High Sun','High Sun',
'Low Sun','Low Sun','Low Sun','Low Sun','Low Sun','Low Sun','Low Sun')
boxplot(Variables~Trials,ylab="New Growth (cm)", xlab="Bean Growth Trials")
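
Because a single mistyped label or missing value changes the plot, it can help to check the data before calling boxplot(). The sketch below is optional and not part of the required script; it uses the standard R functions length(), table() and tapply() on the same data as above.

```r
Highsun <- c(6, 5, 8, 5, 7, 11, 7)
Lowsun <- c(3, 5, 3, 5, 2, 1, 2)
Variables <- c(Highsun, Lowsun)
Trials <- c('High Sun', 'High Sun', 'High Sun', 'High Sun', 'High Sun', 'High Sun', 'High Sun',
            'Low Sun', 'Low Sun', 'Low Sun', 'Low Sun', 'Low Sun', 'Low Sun', 'Low Sun')

# The two vectors must be the same length, one label per value.
print(length(Variables) == length(Trials))

# table() counts how many times each label appears; a typo such as
# 'Low  Sun' would show up here as an unexpected extra group.
print(table(Trials))

# Median of each group, which is the thick line inside each box.
print(tapply(Variables, Trials, median))
```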

Example script for a three trial box plot (see Figure 2 box plot). The enzyme lab requires a 5
trial box plot.
A=c(6,5,8,5,7,11,7)
B=c(3,5,3,5,2,1,2)
C=c(4,5,4,4,5,6,5)
Variables=c(A,B,C)
Trials=c('CS','CS','CS','CS','CS','CS','CS','AA','AA','AA','AA','AA','AA','AA',
'CS+AA','CS+AA','CS+AA','CS+AA','CS+AA','CS+AA','CS+AA')
boxplot(Variables~Trials,ylab="Carbon Dioxide Output (µL/g/min)", xlab="Cricket Trials")

[Box plot: Carbon Dioxide Output (µL/g/min) on the y-axis (ticks at 2, 4, 6, 8, 10) against Cricket Trials on the x-axis (AA, CS, CS+AA)]
Figure 2. The carbon dioxide output (µL/g/min) of crickets under different conditions, where AA is acetic
acid, and CS is cesium.
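
Two optional refinements to the three trial script are sketched below; neither is required by the course. First, rep() can write the repeated labels for you, which becomes more useful as the number of trials grows. Second, boxplot() arranges character labels alphabetically (AA, CS, CS+AA), so wrapping the labels in factor() with explicit levels lets you choose the left-to-right order. The name labels is our own, and the axis label uses the escape \u00b5 for the micro sign to avoid copy and paste problems.

```r
A <- c(6, 5, 8, 5, 7, 11, 7)
B <- c(3, 5, 3, 5, 2, 1, 2)
C <- c(4, 5, 4, 4, 5, 6, 5)
Variables <- c(A, B, C)

# rep() repeats each label seven times, in the same order as the data.
labels <- rep(c('CS', 'AA', 'CS+AA'), each = 7)

# Stop with an error if the labels and the data do not line up.
stopifnot(length(labels) == length(Variables))

# Explicit levels fix the order of the boxes: CS, then AA, then CS+AA.
Trials <- factor(labels, levels = c('CS', 'AA', 'CS+AA'))

boxplot(Variables ~ Trials,
        ylab = "Carbon Dioxide Output (\u00b5L/g/min)",
        xlab = "Cricket Trials")
```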

Proof of R

Your Proof of R should appear exactly as it does below, including titles, bolding,
significant digits, and order. Any departure can cost marks. For the results below, note
how we only took these two lines from the larger results output shown in the R console
screen capture (Figure 1).

Proof of R for bean growth lab assignment.

Script for an Independent t-test


A=c(6,5,8,5,7,11,7,3,5,3,5,2,1,2)
B=c('Trial 1','Trial 1','Trial 1','Trial 1','Trial 1','Trial 1','Trial 1',
'Trial 2','Trial 2','Trial 2','Trial 2','Trial 2','Trial 2','Trial 2')
C=data.frame(A,B)
C
D=glm(A~B,data=C)
summary(D)

Results
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   7.0000     0.6901  10.144 3.07e-07
BTrial 2     -4.0000     0.9759  -4.099 0.00148/2=0.00074

Script for Box Plot


Highsun=c(6,5,8,5,7,11,7)
Lowsun=c(3,5,3,5,2,1,2)
Variables=c(Highsun,Lowsun)
Trials=c('High Sun','High Sun','High Sun','High Sun','High Sun','High Sun','High Sun',
'Low Sun','Low Sun','Low Sun','Low Sun','Low Sun','Low Sun','Low Sun')
boxplot(Variables~Trials,ylab="New Growth (cm)", xlab="Bean Growth Trials")

SIGNIFICANT DIGITS: Note the use of two significant digits in the final value
associated with the Results above (i.e., 0.00074). The proper number of significant digits is
normally based on a statistical analysis of the experimental protocol (e.g., accuracy of machinery
etc.). We won’t be undertaking this analysis, so we will expect two significant digits when
presenting R information in all Biology 102 assignments. Using more will result in a penalty mark.
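
If you would rather not round by hand, R's built-in signif() function rounds a number to a chosen number of significant digits. The lines below are a small sketch using values from the Results above; using signif() is our suggestion and not a course requirement.

```r
# signif(x, 2) keeps two significant digits.
p_one_sided <- signif(0.00148 / 2, 2)   # 0.00074
std_error <- signif(0.6901, 2)          # 0.69
t_value <- signif(10.144, 2)            # 10

print(p_one_sided)
print(std_error)
print(t_value)
```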
