
Analysis of ToothGrowth data set

Javier Santibez
Sunday, December 19, 2014
Introduction
The data set ToothGrowth contains 60 observations of three variables:
len, numeric, tooth length.
supp, factor, supplement type (VC or OJ).
dose, numeric, dose in milligrams.
The response is the length of odontoblasts (the cells responsible for tooth growth) in guinea pigs, with 10 animals for each combination of three dose levels of vitamin C (0.5, 1, and 2 mg) and two delivery methods (orange juice or ascorbic acid).
In this exercise we have to address the following questions:
1. Load the ToothGrowth data and perform some basic exploratory data analyses.
2. Provide a basic summary of the data.
3. Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only use
the techniques from class, even if there are other approaches worth considering.)
4. State your conclusions and the assumptions needed for your conclusions.

Methodology
We will use the following packages:
datasets, which contains the ToothGrowth data set.
dplyr, to manipulate the data.
ggplot2, to plot the data.
We will compare len by supp at each dose level. To compare len we will use the function t.test with
a confidence level of 95%. By default t.test runs Welch's two-sample test, which does not assume equal variances.
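For reference, the Welch statistic, its approximate degrees of freedom and the 95% confidence interval that t.test reports by default (var.equal = FALSE) can be written as below. The notation is ours, not the original report's: x and y are the two samples of len being compared, with sample means, sample variances s² and sizes n.

```latex
t = \frac{\bar{x} - \bar{y}}{\sqrt{s_x^2/n_x + s_y^2/n_y}},
\qquad
\nu = \frac{\left(s_x^2/n_x + s_y^2/n_y\right)^2}
           {\dfrac{(s_x^2/n_x)^2}{n_x - 1} + \dfrac{(s_y^2/n_y)^2}{n_y - 1}},
\qquad
(\bar{x} - \bar{y}) \pm t_{0.975,\,\nu}\,\sqrt{s_x^2/n_x + s_y^2/n_y}
```

As a check, the dose=0.5 figures reported in the Results (means 13.23 and 7.98, variances 19.889 and 7.544, n = 10 per group) give t ≈ 3.17 and ν ≈ 14.97, matching the t.test output.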

Results
Data summary
First we load the data and print a basic summary.
library(datasets)
data(ToothGrowth)
summary(ToothGrowth)
##       len        supp         dose
##  Min.   : 4.20   OJ:30   Min.   :0.500
##  1st Qu.:13.07   VC:30   1st Qu.:0.500
##  Median :19.25           Median :1.000
##  Mean   :18.81           Mean   :1.167
##  3rd Qu.:25.27           3rd Qu.:2.000
##  Max.   :33.90           Max.   :2.000
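The summary shows 30 observations per supplement but not how they split across doses. A cross-tabulation, sketched here in base R as an addition that is not part of the original analysis, can confirm whether the design is balanced, which matters for the per-dose comparisons that follow.

```r
# Sketch: count observations in each supp x dose cell (base R only).
library(datasets)
data(ToothGrowth)

cell_counts <- table(supp = ToothGrowth$supp, dose = ToothGrowth$dose)
print(cell_counts)
```

Each of the six cells is expected to contain 10 observations.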

We can also plot len by supp, with one panel per dose level, to show the variability of len.
library(ggplot2)
g <- ggplot(ToothGrowth, aes(supp, len))
g + geom_point(color = "steelblue", size = 3) +
  facet_grid(. ~ dose)
[Figure: len (y axis) by supp (x axis: OJ, VC), one panel per dose level (0.5, 1, 2).]
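The plot can be complemented with the mean of len in each supp and dose cell. This base-R sketch uses aggregate(), an addition that is not part of the original analysis; the cell means it returns are the same values that the t tests below report as sample estimates.

```r
# Sketch: mean of len in each supp x dose cell (base R only).
library(datasets)
data(ToothGrowth)

# supp varies fastest in the output: OJ 0.5, VC 0.5, OJ 1, VC 1, OJ 2, VC 2
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(group_means)
```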
Before testing, it is important to explore the variance of len at each supp and dose combination, because it determines which version of the t test is appropriate.
suppressWarnings(suppressMessages(library(dplyr)))
ToothGrowth %>%
  group_by(dose, supp) %>%
  summarise(Variance = var(len))
## Source: local data frame [6 x 3]
## Groups: dose
##
##   dose supp  Variance
## 1  0.5   OJ 19.889000
## 2  0.5   VC  7.544000
## 3  1.0   OJ 15.295556
## 4  1.0   VC  6.326778
## 5  2.0   OJ  7.049333
## 6  2.0   VC 23.018222
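Within every dose the two supplements have clearly different sample variances (for example, 19.89 for OJ versus 7.54 for VC at dose=0.5), so it is not reasonable to assume equal variances. We therefore use Welch's t test, which is what t.test does by default (var.equal = FALSE).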
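One way to put a number on the difference, offered as an illustrative addition rather than part of the original report, is the ratio of the larger to the smaller variance within each dose:

```r
# Sketch: ratio of the larger to the smaller sample variance of len
# (OJ vs VC) within each dose level. Base R only.
library(datasets)
data(ToothGrowth)

# 3 x 2 matrix: rows are doses (0.5, 1, 2), columns are supplements (OJ, VC)
vars <- tapply(ToothGrowth$len, list(ToothGrowth$dose, ToothGrowth$supp), var)
ratio <- apply(vars, 1, max) / apply(vars, 1, min)
print(round(ratio, 2))
```

With the variances listed above, the ratios are about 2.64, 2.42 and 3.27 for doses 0.5, 1 and 2.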
Comparing tooth length by supp at dose=0.5
We subset the original data set and use t.test to test the equality of the means.

data1.1 <- filter(ToothGrowth, supp == "OJ", dose == 0.5) %>% pull(len)
data1.2 <- filter(ToothGrowth, supp == "VC", dose == 0.5) %>% pull(len)
# Welch two-sample t-test (t.test default: var.equal = FALSE)
t.test(data1.1, data1.2)
##
##  Welch Two Sample t-test
##
## data:  data1.1 and data1.2
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.719057 8.780943
## sample estimates:
## mean of x mean of y
##     13.23      7.98

From these results we reject, at the 5% level, the null hypothesis that the mean of len is the same for both supplements at dose=0.5 (p-value = 0.0064). Additionally, both limits of the 95% confidence interval for the difference (OJ minus VC) are above zero, which indicates that the mean of len when
supp=OJ is greater than the mean when supp=VC.
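To make the mechanics of the Welch test concrete, the following sketch recomputes the statistic, degrees of freedom, p-value and 95% confidence interval for dose=0.5 by hand and compares them with t.test. The variable names are illustrative and do not come from the original report.

```r
# Sketch: reproduce the Welch t test for dose = 0.5 by hand (base R only).
library(datasets)
data(ToothGrowth)

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 0.5]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == 0.5]

se2_oj <- var(oj) / length(oj)   # squared standard error of the OJ mean
se2_vc <- var(vc) / length(vc)   # squared standard error of the VC mean
se <- sqrt(se2_oj + se2_vc)      # standard error of the difference

t_stat <- (mean(oj) - mean(vc)) / se
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval from the t distribution
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df_welch) * se

ref <- t.test(oj, vc)            # Welch test is the default (var.equal = FALSE)
print(c(t = t_stat, df = df_welch, p = p_value))
```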
Comparing tooth length by supp at dose=1.0
We subset the original data set and use t.test to test the equality of the means.
data2.1 <- filter(ToothGrowth, supp == "OJ", dose == 1) %>% pull(len)
data2.2 <- filter(ToothGrowth, supp == "VC", dose == 1) %>% pull(len)
# Welch two-sample t-test (t.test default: var.equal = FALSE)
t.test(data2.1, data2.2)
##
##  Welch Two Sample t-test
##
## data:  data2.1 and data2.2
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.802148 9.057852
## sample estimates:
## mean of x mean of y
##     22.70     16.77

From these results we reject, at the 5% level, the null hypothesis that the mean of len is the same for both supplements at dose=1.0 (p-value = 0.0010). Additionally, both limits of the 95% confidence interval for the difference (OJ minus VC) are above zero, which indicates that the mean of len when
supp=OJ is greater than the mean when supp=VC.
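The same comparison can be written more compactly with the formula interface of t.test, which accepts data and subset arguments. This is an alternative to the filter/pull approach above, not what the original report used. Because the levels of supp are ordered OJ, VC, the difference is again OJ minus VC.

```r
# Sketch: Welch test for dose = 1.0 via the formula interface (base R only).
library(datasets)
data(ToothGrowth)

res_1 <- t.test(len ~ supp, data = ToothGrowth, subset = dose == 1)
print(res_1)
```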
Comparing tooth length by supp at dose=2.0
We subset the original data set and use t.test to test the equality of the means.

data3.1 <- filter(ToothGrowth, supp == "OJ", dose == 2) %>% pull(len)
data3.2 <- filter(ToothGrowth, supp == "VC", dose == 2) %>% pull(len)
# Welch two-sample t-test (t.test default: var.equal = FALSE)
t.test(data3.1, data3.2)
##
##  Welch Two Sample t-test
##
## data:  data3.1 and data3.2
## t = -0.0461, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.79807  3.63807
## sample estimates:
## mean of x mean of y
##     26.06     26.14

From these results we cannot reject the null hypothesis that the mean of len is the same for both supplements at dose=2.0 (p-value = 0.9639). Consistently, the 95% confidence interval for the difference (OJ minus VC) contains zero, so the data provide no evidence that the mean of len when supp=OJ differs from the mean when supp=VC at this dose.
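Since the three dose levels are analysed with identical code, they can also be run in a single pass. This base-R sketch is our own restructuring, with illustrative column names; it collects the estimated difference (OJ minus VC), the confidence limits and the p-value for each dose.

```r
# Sketch: Welch test at every dose level, collected in one data frame.
library(datasets)
data(ToothGrowth)

results <- do.call(rbind, lapply(c(0.5, 1, 2), function(d) {
  tt <- t.test(len ~ supp, data = ToothGrowth[ToothGrowth$dose == d, ])
  data.frame(
    dose     = d,
    diff     = unname(tt$estimate[1] - tt$estimate[2]),  # mean OJ - mean VC
    ci_lower = tt$conf.int[1],
    ci_upper = tt$conf.int[2],
    p_value  = tt$p.value
  )
}))
print(results, digits = 4)
```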

Conclusions
From the preceding analysis we draw the following conclusions:
The variance of len differs across the combinations of supp and dose, so we used Welch's t test, which does not assume equal variances.
The mean value of len when supp=OJ is greater than the mean when supp=VC when the dose level is
0.5 or 1.0. This means that, at these doses, we can expect greater values of len when vitamin C is provided via
orange juice.
When dose=2.0, there is no evidence that the mean values of len differ between the two values of
supp. The data are consistent with similar values of len at dose=2.0 regardless of the supplement
used.
These conclusions assume that the observations are independent (different animals in each group) and that len is approximately normally distributed within each supp and dose group.
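Three tests were run, one per dose. A simple way to account for this, assuming a Bonferroni correction is acceptable within the course techniques, is p.adjust applied to the p-values reported above. The adjusted values (about 0.0191, 0.0031 and 1) leave the conclusions unchanged.

```r
# Sketch: Bonferroni adjustment of the three reported p-values (base R only).
p_raw <- c(dose_0.5 = 0.006359, dose_1.0 = 0.001038, dose_2.0 = 0.9639)
p_adj <- p.adjust(p_raw, method = "bonferroni")  # min(1, 3 * p)
print(p_adj)
```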
