Analysis of ToothGrowth Data Set
Javier Santibez
Sunday, December 19, 2014
Introduction
The data set ToothGrowth contains 60 observations of three variables:
len, numeric, Tooth length
supp, factor, Supplement type (VC or OJ).
dose, numeric, Dose in milligrams
The response is the length of odontoblasts (teeth) in guinea pigs, with 10 animals at each combination of
three dose levels of Vitamin C (0.5, 1, and 2 mg) and two delivery methods (orange juice, coded OJ, or
ascorbic acid, coded VC).
In this exercise we have to address the following questions:
1. Load the ToothGrowth data and perform some basic exploratory data analyses
2. Provide a basic summary of the data.
3. Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only use
the techniques from class, even if there are other approaches worth considering)
4. State your conclusions and the assumptions needed for your conclusions.
Methodology
We will use the following packages:
datasets, which contains the ToothGrowth data set.
dplyr, to manage the data.
ggplot2, to print a plot to summarise the data.
We will compare len by supp at each dose level. To compare len we will use the function t.test with
a confidence level of 95%.
Results
Data summary
First we have to load the data and print a basic summary.
library(datasets)
data(ToothGrowth)
summary(ToothGrowth)
##       len        supp         dose
##  Min.   : 4.20   OJ:30   Min.   :0.500
##  1st Qu.:13.07   VC:30   1st Qu.:0.500
##  Median :19.25           Median :1.000
##  Mean   :18.81           Mean   :1.167
##  3rd Qu.:25.27           3rd Qu.:2.000
##  Max.   :33.90           Max.   :2.000
Also, we can make a plot to show the variability of len by dose and supp.
library(ggplot2)
g<-ggplot(ToothGrowth,aes(supp,len))
g+geom_point(color="steelblue",size=3)+
facet_grid(.~dose)
[Figure: len (vertical axis) plotted against supp (OJ, VC), with one panel per dose level.]
In this part it is important to explore the variance at each supp and dose combination.
suppressWarnings(suppressMessages(library(dplyr)))
group_by(ToothGrowth,dose,supp) %>%
summarise(Variability=var(len))
## [Output not preserved: six rows, one for each dose and supp combination.]
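From the previous result we can see that it is not reasonable to assume equal variances across groups. The
comparisons therefore use t.test with its default setting, the Welch test, which does not assume equal variances.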
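As a rough check on this statement, the group variances can also be computed in base R and compared directly. This is a minimal sketch and not part of the original analysis; the names group_var and variance are chosen here for illustration only.

```r
library(datasets)

# Variance of len for each dose and supp combination (six groups)
group_var <- aggregate(len ~ dose + supp, data = ToothGrowth, FUN = var)
names(group_var)[3] <- "variance"  # illustrative column name
print(group_var)

# Ratio of the largest to the smallest group variance;
# a ratio far from 1 argues against assuming equal variances
max(group_var$variance) / min(group_var$variance)
```

The ratio is only an informal guide. It does not replace a formal test, but it is enough to motivate the unequal-variance form of the t test.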
Comparing tooth length by supp at dose=0.5
We subset the original data set and do the t.test for equality of means.
From our results we can conclude that the means of len by supp at dose=0.5 are not equal. Additionally,
both limits of the confidence interval for the difference in means (OJ minus VC) are above zero, which
implies that the mean of len when supp=OJ is greater than the mean when supp=VC.
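The code for this comparison is not shown in the text. A minimal sketch, assuming it mirrors the dose=1.0 comparison, might look as follows; oj_05, vc_05 and res_05 are hypothetical names, and base R subset is used in place of dplyr so that the block stands alone.

```r
library(datasets)

# Tooth lengths at dose = 0.5 for each supplement (assumed names)
oj_05 <- subset(ToothGrowth, supp == "OJ" & dose == 0.5)$len
vc_05 <- subset(ToothGrowth, supp == "VC" & dose == 0.5)$len

# Welch two-sample t test; the interval is for mean(OJ) - mean(VC)
res_05 <- t.test(oj_05, vc_05, conf.level = 0.95)
res_05$conf.int
res_05$p.value
```

Because OJ is passed first, an interval that lies entirely above zero corresponds to a larger mean for OJ.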
Comparing tooth length by supp at dose=1.0
We subset the original data set and do the t.test for equality of means.
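# Tooth lengths at dose = 1 for each supplement
data2.1<-filter(ToothGrowth,supp=="OJ",dose==1) %>% select(len)
data2.2<-filter(ToothGrowth,supp=="VC",dose==1) %>% select(len)
# Welch two-sample t test on the len columns (OJ minus VC)
t.test(data2.1$len,data2.2$len)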
From our results we can conclude that the means of len by supp at dose=1.0 are not equal. Additionally,
both limits of the confidence interval for the difference in means (OJ minus VC) are above zero, which
implies that the mean of len when supp=OJ is greater than the mean when supp=VC.
Comparing tooth length by supp at dose=2.0
We subset the original data set and do the t.test for equality of means.
From our results we cannot conclude that the means of len by supp at dose=2.0 are different. Additionally,
we can see that the confidence interval contains zero, which means the data are consistent with no difference
between the mean of len when supp=OJ and the mean when supp=VC.
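The code for this comparison is also not shown in the text. A minimal sketch under the same assumptions as the other dose levels follows, with a direct check of whether the interval contains zero; oj_2, vc_2, res_2 and ci are hypothetical names.

```r
library(datasets)

# Tooth lengths at dose = 2 for each supplement (assumed names)
oj_2 <- subset(ToothGrowth, supp == "OJ" & dose == 2)$len
vc_2 <- subset(ToothGrowth, supp == "VC" & dose == 2)$len

# Welch two-sample t test; the interval is for mean(OJ) - mean(VC)
res_2 <- t.test(oj_2, vc_2, conf.level = 0.95)
ci <- res_2$conf.int

# TRUE when zero lies inside the 95% confidence interval
ci[1] < 0 && ci[2] > 0
```

An interval that contains zero means the test does not reject equality of means at the 5% level. It does not show that the means are equal.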
Conclusions
From the preceding analysis we can draw the following conclusions:
The variability of len differs across the combinations of supp and dose.
The mean value of len when supp=OJ is greater than the mean when supp=VC when the dose level is
0.5 or 1.0. This means that we can expect greater values of len when the vitamin C is provided via
orange juice.
When dose=2.0, there is no evidence that the mean values of len differ between the two values of
supp. This means that at dose=2.0 we have no reason to expect different values of len from one supplement
or the other.
These conclusions assume that the observations are independent and that len is approximately normally
distributed within each supp and dose group. Equal variances are not assumed.