A Quick Introduction To iNextPD Via Examples
T. C. Hsieh
2017-03-11
iNextPD also plots two types of sampling curves (sample-size-based and coverage-based rarefaction/extrapolation
(R/E) curves) and a sample completeness curve (type=2), which provides a bridge between these two types of curves.
o Required: R
o Suggested: RStudio IDE
## install iNextPD package from CRAN
install.packages("iNextPD")
## or install the latest version from GitHub
install.packages('devtools')
library(devtools)
install_github('JohnsonHsieh/iNextPD')
## import packages
library(iNextPD)
library(ggplot2)
library(ade4)
Remark: In order to install devtools package, you should update R to the latest version. Also, to
get install_github to work, you should install the httr package.
MAIN FUNCTION: iNextPD()
The arguments of this function are briefly described below and will be explained in more detail by
illustrative examples later in the text. This main function computes diversity estimates of order q = 0, 1,
2, the sample coverage estimates and related statistics for K (if knots=K) evenly spaced knots
(sample sizes) between size 1 and the endpoint, where the endpoint is described below. Each knot
represents a particular sample size for which diversity estimates will be calculated. By default,
endpoint = double the reference sample size (total sample size for abundance data; total number of sampling
units for incidence data). For example, if endpoint = 10 and knots = 4, diversity estimates will be
computed for a sequence of samples with sizes (1, 4, 7, 10).
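The evenly spaced knots in this example can be reproduced with base R's seq(). The sketch below only illustrates the spacing rule; even_knots() is a hypothetical helper, not a function exported by iNextPD:

```r
# Hypothetical helper: K evenly spaced sample sizes (knots) from 1 to endpoint.
even_knots <- function(endpoint, knots) {
  # floor() keeps each knot an integer sample size
  floor(seq(1, endpoint, length.out = knots))
}

even_knots(endpoint = 10, knots = 4)  # 1 4 7 10
```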
Argument: Description
size: an integer vector of sample sizes for which diversity estimates will be computed. If NULL,
then diversity estimates will be calculated for those sample sizes determined by the
specified/default endpoint and knots.
endpoint: an integer specifying the sample size that is the endpoint for R/E calculation. If NULL,
then endpoint = double the reference sample size.
conf: a positive number < 1 specifying the level of the confidence interval; the default is 0.95.
DATA FORMAT/INFORMATION
A bird phylogeny and survey dataset is included in the iNextPD package. This dataset describes the
phylogeny ($tre) of 41 bird species as reported by Jetz et al. (2012). It also gives species
abundance ($abun) and incidence ($inci) data for these 41 species at two sites, collected in November 2012
in Barrington Tops National Park, Australia. For these data, the following commands display basic data
information and a visualization of the phylogeny:
data(bird)
str(bird)
List of 3
$ tre : chr
"(((((Alisterus_scapularis:31.96595541,Platycercus_elegans:31.96595545):13.04819101,
(Cacatua_galerita:32.14669035,Calyptorhynchu"| __truncated__
$ inci:List of 2
bird.phy <- ade4::newick2phylog(bird$tre)
bird.lab <- rownames(bird$abun)
# plot(bird.phy)
For these data, the following command displays basic data information and runs the iNextPD() function
for q = 0.
iNextPD(x=bird$abun, labels=bird.lab, phy=bird.phy, q=0, datatype="abundance")
For incidence data, the list $DataInfo includes the reference sample size (T), observed species
richness (S.obs), total number of incidences (U), a sample coverage estimate (SC), and the first ten
incidence frequency counts (Q1‐Q10).
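The DataInfo summaries for incidence data can be illustrated on a small raw incidence matrix. The sketch below assumes species in rows and sampling units in columns, and uses the incidence-based sample coverage estimator given in Chao et al. (2014); incidence_info() and the toy matrix are hypothetical, and iNextPD's own code may treat edge cases (such as Q2 = 0) differently:

```r
# Hypothetical helper: DataInfo-style summaries for a raw incidence matrix
# (rows = species, columns = sampling units, entries 0/1).
incidence_info <- function(inc) {
  T_units <- ncol(inc)               # reference sample size T
  freq <- rowSums(inc)               # incidence frequency of each species
  U <- sum(freq)                     # total number of incidences
  Q <- tabulate(freq, nbins = 10)    # Q1..Q10: species detected in exactly k units
  Q1 <- Q[1]; Q2 <- Q[2]
  # incidence-based sample coverage estimate (assumed formula; requires Q2 > 0)
  SC <- 1 - Q1 / U * ((T_units - 1) * Q1 / ((T_units - 1) * Q1 + 2 * Q2))
  c(T = T_units, S.obs = sum(freq > 0), U = U, SC = SC,
    setNames(Q, paste0("Q", 1:10)))
}

# Toy data: 5 species x 4 sampling units
inc <- matrix(c(1, 1, 1, 0,
                1, 0, 0, 0,
                0, 1, 1, 0,
                0, 0, 1, 0,
                1, 1, 1, 1),
              nrow = 5, byrow = TRUE)
incidence_info(inc)  # T = 4, S.obs = 5, U = 11, SC = 19/22 (about 0.864)
```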
For the North.site, by default, 40 roughly equally spaced knots (sample sizes) between 1 and 404 (= 2 x
202, double the reference sample size) are selected. Diversity estimates and related statistics are
computed for these 40 knots (corresponding to sample sizes m = 1, 12, 23, …, 202, …, 404), which
places the reference sample at the mid-point of the selected knots. By default, only five
estimates are shown on screen; call iNextPD.object$iNextPDEst to show the complete output. If
the argument se=TRUE, the bootstrap method is applied to obtain confidence intervals at level conf (by
default, conf=0.95) for each diversity and sample coverage estimate.
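When the endpoint exceeds the reference sample size, the knots quoted above (1, 12, 23, …, 202, …, 404) are consistent with splitting them into interpolated sizes, the reference size itself, and extrapolated sizes. The sketch below reproduces that sequence under this assumption; centered_knots() is a hypothetical helper, and iNextPD's internal rule may differ in details (for example, for an odd number of knots):

```r
# Hypothetical helper: sample sizes (knots) between 1 and 'endpoint' that put
# the reference sample size n in the middle of the sequence. This is an assumed
# rule that reproduces the North.site sizes quoted above; it assumes an even
# number of knots and is not iNextPD's own code.
centered_knots <- function(n, endpoint = 2 * n, knots = 40) {
  if (endpoint <= n) {
    # only rarefaction is needed: evenly spaced sizes from 1 to endpoint
    return(floor(seq(1, endpoint, length.out = knots)))
  }
  half <- knots %/% 2
  c(floor(seq(1, n - 1, length.out = half - 1)),      # interpolated sizes
    n,                                                # the reference sample
    floor(seq(n + 1, endpoint, length.out = half)))   # extrapolated sizes
}

m <- centered_knots(n = 202)  # default endpoint = 2 * 202 = 404
length(m)   # 40
head(m, 3)  # 1 12 23
m[20]       # 202
tail(m, 1)  # 404
```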
For the sample size corresponding to each knot, the list $iNextPDEst (as shown below for
the North.site) includes the sample size (m, i.e., each of the 40 knots), the method
(interpolated, observed, or extrapolated, depending on whether the size m is less than, equal to,
or greater than the reference sample size), the diversity order, the diversity estimate of order q
(qPD), the 95% lower and upper confidence limits of diversity (qPD.95.LCL, qPD.95.UCL), and the
sample coverage estimate (SC) along with the 95% lower and upper confidence limits of sample
coverage (SC.95.LCL, SC.95.UCL). These sample coverage estimates and their confidence
intervals are used for plotting the sample completeness curve and coverage-based R/E curves.
$iNextPDEst: phylogenetic diversity estimates with rarefied and extrapolated samples.
$North.site
m method order qPD qPD.95.LCL qPD.95.UCL SC SC.95.LCL SC.95.UCL
NOTE1: Only show five estimates, call iNextPD.object$iNextPDEst to show complete output.
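The method column follows directly from comparing each knot m with the reference sample size. A minimal sketch of that rule, using a hypothetical helper re_method() and the North.site reference size of 202:

```r
# Hypothetical helper: label each sample size m relative to the reference
# sample size n (less than n: interpolated; equal: observed; greater: extrapolated).
re_method <- function(m, n) {
  ifelse(m < n, "interpolated",
         ifelse(m == n, "observed", "extrapolated"))
}

re_method(c(1, 101, 202, 303, 404), n = 202)
# interpolated, interpolated, observed, extrapolated, extrapolated
```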
$AsyPDEst lists the observed diversity, asymptotic estimates, estimated bootstrap s.e. and 95%
confidence intervals for Hill numbers with q = 0, 1, and 2. See Hsieh and Chao (2016) for
asymptotic estimators. The output for the bird data is shown below. All row and column variables
are self‐explanatory.
$AsyPDEst: asymptotic phylogenetic diversity estimates along with related statistics.
To show the complete branch abundances/incidences and branch lengths (Ui, Li), i = 1, 2, …, B,
call iNextPD.object$ExpandData.
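Given the expanded branch data (Ui, Li), the observed (empirical) phylogenetic diversity of order q can be computed as qPD = (Σ Li (ai/T̄)^q)^(1/(1-q)), where ai = Ui/n is the relative abundance attached to branch i, n is the sample size and T̄ = Σ Li ai, following Chao et al. (2015); for q = 0 this reduces to Faith's PD. This sketch covers the empirical values for abundance data only, not the asymptotic estimators in $AsyPDEst; observed_pd() and the toy tree are hypothetical:

```r
# Hypothetical helper: observed phylogenetic diversity of order q from branch
# abundances U, branch lengths L and sample size n (abundance data).
# Assumes the qPD formula of Chao et al. (2015); empirical value only.
observed_pd <- function(U, L, n, q) {
  keep <- U > 0               # branches absent from the sample contribute nothing
  a <- U[keep] / n            # relative abundance attached to each branch
  L <- L[keep]
  Tbar <- sum(L * a)          # T-bar; equals the tree depth for an ultrametric tree
  if (q == 1) {
    # limit of the general formula as q -> 1
    Tbar * exp(-sum(L * a / Tbar * log(a)))
  } else {
    sum(L * (a / Tbar)^q)^(1 / (1 - q))
  }
}

# Toy tree ((sp1:2, sp2:2):1, sp3:3) with abundances 2, 2, 4 (n = 8)
U <- c(4, 2, 2, 4)   # internal branch, sp1, sp2, sp3
L <- c(1, 2, 2, 3)
sapply(c(0, 1, 2), function(q) observed_pd(U, L, n = 8, q = q))
# about 8 (Faith's PD), 7.56 and 7.2
```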
In practice, the user may specify an integer sample size for the argument endpoint to designate the
maximum sample size of R/E calculation. For Faith’s PD, the extrapolation method is reliable up to
the double reference sample size; beyond that, the prediction bias may be large. However, for
measures of q = 1 and 2, the extrapolation can usually be safely extended to the asymptote if data
are not sparse; thus there is no limit for the value of endpoint for these two measures.
The user may also specify the number of knots in the range of sample size between 1 and the
endpoint. If you choose a large number of knots, then it may take a long time to obtain the output
due to the time‐consuming bootstrap method. Alternatively, the user may specify a series of sample
sizes for R/E computation, as in the following example:
# set a series of sample sizes (m) for R/E computation
Further, iNextPD can simultaneously run R/E computation for Hill numbers with q = 0, 1, and 2 by
specifying a vector for the argument q as follows:
out <- iNextPD(x=bird$abun, labels=bird.lab, phy=bird.phy,
               q=c(0, 1, 2), datatype="abundance")
The function estimatePD() can be used to compute diversity estimates with q = 0, 1, 2 for any particular
level of sample size (base="size") or any specified level of sample coverage (base="coverage") for either
abundance data (datatype="abundance") or incidence data (datatype="incidence_raw"). If level=NULL, this
function computes the diversity estimates for the minimum sample size/coverage among all sites.
For example, the following command returns the phylogenetic diversity at a specified sample
coverage of 97.5% for the bird abundance data. For some sites this coverage level falls in the
rarefaction part of the curve, whereas for others it requires extrapolation, as indicated in the
method column of the output.
estimatePD(bird$abun, bird.lab, bird.phy, "abundance",
           base="coverage", level=0.975)
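With base="coverage" and level=NULL, estimatePD() uses the minimum sample coverage among all sites. The sketch below illustrates that choice on two toy abundance vectors, assuming the abundance-based sample coverage estimator given in Chao et al. (2014); coverage_abun() and the data are hypothetical, and the coverage computed internally by iNextPD may differ (for example, when f2 = 0):

```r
# Hypothetical helper: estimated sample coverage of an abundance sample
# (assumed estimator; requires f2 > 0).
coverage_abun <- function(x) {
  x <- x[x > 0]
  n <- sum(x)                       # reference sample size
  f1 <- sum(x == 1)                 # singletons
  f2 <- sum(x == 2)                 # doubletons
  1 - f1 / n * ((n - 1) * f1 / ((n - 1) * f1 + 2 * f2))
}

# Toy abundance vectors for two hypothetical sites
sites <- list(A = c(10, 5, 3, 2, 1, 1),
              B = c(20, 8, 4, 2, 2, 1))
sc <- sapply(sites, coverage_abun)
sc        # A about 0.913, B about 0.976
min(sc)   # the common coverage level chosen when level = NULL
```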
The following commands return the sample completeness curve in which different colors are used
for the two sites:
ggiNEXT(out, type=2, facet.var="none", color.var="site")
The following commands return the coverage-based R/E sampling curves, with separate panels for
the two sites (facet.var="site") or for the three orders (facet.var="order"):
ggiNEXT(out, type=3, facet.var="site")
ggiNEXT(out, type=3, facet.var="order", color.var="site")
q=0, datatype="incidence_raw",
theme_bw(base_size = 18) +
theme(legend.position="none")
xlim(c(5,25)) + ylim(c(0.7,1)) +
theme_bw(base_size = 18) +
theme(legend.position="none")
xlim(c(0.7,1)) +
theme_bw(base_size = 18) +
theme(legend.position="bottom",
legend.title=element_blank())
Hacking ggiNEXT()
Remove legend
out2 <- iNextPD(bird$abun, bird.lab, bird.phy,
endpoint=400, se=TRUE)
ggiNEXT(out2) + theme(legend.position="none")
theme_bw(base_size = 18) +
theme(legend.position="right")
facet_wrap(~order, scales="free")
scale_shape_manual(values=c(19,19,19))
General customization
Example: bird data
library(ggplot2)
library(gridExtra)
data(bird)
datatype="abundance", se=TRUE)
scale_linetype_manual(values=c(1,2))
scale_fill_manual(values=c("red", "blue"))
# library(gridExtra)
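To change the size of the reference sample point or of the rarefaction/extrapolation curve, the user
needs to modify the ggplot object.
# enlarge the reference sample points (first layer)
gb3 <- ggplot_build(g)
gb3$data[[1]]$size <- 10
gt3 <- ggplot_gtable(gb3)
# library(grid)
# grid.draw(gt3)
# thicken the R/E curves (second layer); with ggplot2 >= 3.4.0, set gb4$data[[2]]$linewidth instead
gb4 <- ggplot_build(g)
gb4$data[[2]]$size <- 3
gt4 <- ggplot_gtable(gb4)
# grid.draw(gt4)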
Customize theme
A ggplot object can be themed by adding a theme. Run help(theme_grey) to see the
default themes in ggplot2. Some extra themes are provided by the ggthemes package. Examples are
shown below:
g5 <- g + theme_bw()
g6 <- g + theme_classic()
library(ggthemes)
g7 <- g + theme_hc(style = "darkunica") +
  theme(legend.box = "vertical") +
  scale_colour_hc("darkunica")
g8 <- g + theme_economist() +
theme(legend.box = "vertical") +
scale_colour_economist()
Black-White theme
The following are customized themes for black-and-white figures. To modify the legend, see the Cookbook for
R for more details.
g9 <- g + theme_bw(base_size = 18) +
theme(legend.position="bottom",
legend.title=element_blank(),
legend.box = "vertical")
theme(legend.position="bottom",
legend.title=element_blank(),
legend.box = "vertical")
head(df)
c("interpolated", "extrapolated"),
c("interpolation", "extrapolation"))
geom_ribbon(aes(ymin=y.lwr, ymax=y.upr,
theme(legend.position = "bottom",
legend.title=element_blank(),
text=element_text(size=18),
legend.box = "vertical")
License
The iNextPD package is licensed under the GPLv3. To help refine iNextPD, your comments or
feedback would be welcome (please send them to T. C. Hsieh or report an issue on the iNextPD
GitHub repo).
How to cite
If you publish your work based on results from iNextPD (R package), please make reference to
Hsieh and Chao (2016) and Chao et al. (2015) given in the following list.
References
o Chao, A., Gotelli, N.J., Hsieh, T.C., Sander, E.L., Ma, K.H., Colwell, R.K. & Ellison, A.M.
(2014) Rarefaction and extrapolation with Hill numbers: a framework for sampling and
estimation in species diversity studies. Ecological Monographs, 84, 45–67.
o Chao, A., Chiu, C.H., Hsieh, T.C., Davis, T., Nipperess, D.A. & Faith, D.P. (2015) Rarefaction
and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution, 6, 380–388.
o Hsieh, T.C., Ma, K.H. & Chao, A. (2016) iNEXT: an R package for rarefaction and
extrapolation of species diversity (Hill numbers). Methods in Ecology and Evolution.
o Jetz, W., Thomas, G.H., Joy, J.B., Hartmann, K. & Mooers, A.O. (2012) The global diversity
of birds in space and time. Nature, 491, 444–448.