A Dendrochronology Program Library in R (DPLR) : Andrew G. Bunn

ARTICLE IN PRESS

Dendrochronologia 26 (2008) 115–124


www.elsevier.de/dendro

TECHNICAL NOTE
A dendrochronology program library in R (dplR)
Andrew G. Bunn
Environmental Sciences, Huxley College, Western Washington University, Bellingham, WA 98225-9181, USA

Received 4 September 2007; accepted 8 January 2008

Abstract
I present and describe a new software package in the R statistical programming environment for dendrochronology.
R is considered the world’s pre-eminent open-source statistical computing environment where users can contribute
packages, which are freely available on the Internet. The dendrochronology program library in R (dplR) is able to read
standard decadal-format files and allows users to perform several standard analyses including interactive detrending,
chronology building, and the calculation of standard descriptive statistics. The package can also produce a variety of
publication quality plots. The dplR package should make it easier for dendrochronologists to take advantage of R and
use it as their primary analytic environment.
© 2008 Elsevier GmbH. All rights reserved.

Keywords: Statistical software; R; Detrending; Chronology

Introduction

Dendrochronology is a highly quantitative discipline. Since the 1960s, scientists working in the
field have made software available to the dendrochronology community. Especially notable are the
three mainstays of dendrochronology: Dendrochronology Program Library (DPL), COFECHA, and ARSTAN
(Holmes, 1983, 1992; Cook and Holmes, 1996). These programs allow users to read text files
containing tree-ring measurements and perform a variety of functions or statistical analyses.
However, rarely are they sufficient for a full project analysis, i.e., other computer programs are
usually used for statistical modeling and creation of graphics.

The tree-ring community cherishes open communication and sharing of data. This is exemplified by
the establishment and popularity of the International Tree-Ring Data Bank (ITRDB) at the National
Oceanic and Atmospheric Administration (NOAA) Paleoclimatology Program and World Data Center for
Paleoclimatology (http://www.ncdc.noaa.gov/paleo/). The ITRDB currently has more than 2700
collections (Bruce Bauer, NOAA, pers. comm.), all of which are publicly available. Thus,
researchers in many fields have access to a large amount of data, confirming that
dendrochronology is a very open discipline.

Transparency in making data available advances the discipline. However, the wide array of often
proprietary software used by different researchers makes it difficult to recreate analyses. A
recent report by the National Research Council (North et al., 2006) exhorts researchers to make
data publicly available but also make computer code available. In the US, making the tools
necessary to recreate analysis is often a prerequisite for federal funding.

Tel.: +360 650 4252; fax: +360 650 7284.
E-mail addresses: andy.bunn@wwu.edu, andrew.bunn@wwu.edu (A.G. Bunn).

1125-7865/$ - see front matter © 2008 Elsevier GmbH. All rights reserved.
doi:10.1016/j.dendro.2008.01.002

R programming language and environment

R is both a programming language and a software environment for statistical computing
(R Development Core Team, 2007). It is free and open-source, licensed under the GNU General Public
License. R is based on the S programming language and functions as an open-source successor and
alternative to the proprietary S-Plus statistics package (Chambers, 1998). Usage numbers for R are
hard to gauge, but R users number in the millions. R is so widely used for statistical computing
that it is virtually the standard software used among statisticians for the development of
statistical software. R has been used widely for paleoclimate research and is well accepted among
the paleoclimate research community (Pocernich, 2006).

R is an extremely versatile environment. It can be used for virtually any type of statistical
analysis including frequentist and Bayesian analysis, data mining, bootstrapping, spatial and time
series analyses, and can serve as an environment for simulation modeling. R is inherently a
command-line driven program. Traditionally this has caused apprehension among users who do not
already know a programming language, but the wide variety of graphical interfaces has made this
less problematic.

Many dendrochronologists use proprietary computing environments (e.g., Matlab, IDL, S-Plus), and R
is not dissimilar to these. In fact, it was the release of Matlab routines for dendrochronological
analyses by Meko (2002) that led to the creation of this library. However, there are four major
features that make R preferred over other environments for most dendrochronology analyses. First,
R and all of its add-on packages are free. The program costs nothing and users can change it in
any way they like. Other similar software costs thousands of dollars, which puts it out of reach of
many researchers, especially in developing countries. Second, the R user community is extremely
passionate and helpful. There is a quarterly R newsletter, Rnews
(http://cran.r-project.org/doc/Rnews/), and a popular R-Help mailing list
(http://www.r-project.org/mail.html), an electronic list where users post questions and answers on
using R. In July 2007, R-Help had over 2000 posts made to it. There is also an extensive Wiki (web
pages that are created and edited by registered users) for R (http://wiki.r-project.org/rwiki/).
Third, all analyses in R are easily reproducible and analysts can archive the code used to produce
their results. Finally, R operates on every modern computer platform (e.g., Microsoft Windows,
Mac OS X, and Linux) with virtually no differences between computing platforms.

dplR

The dendrochronology program library in R (dplR) reads standard, decadal-format tree-ring files
and performs standard analyses and plots (Bunn, 2007). Ring-width list (rwl) and chronology (crn)
files are easily loaded. There are functions to detrend and convert the raw series to ring-width
indices (RWIs) and build either standard or residual chronologies, and plot them in various ways.

For the remainder of this section I will use the Hurricane Ridge (Abies amabilis Dougl.)
ring-width file from the ITRDB (Schweingruber and Briffa, 1983) to demonstrate the functionality
of dplR. The most important functions in dplR are read.rwl and read.crn. These functions allow
decadal (Tucson) format text files to be read into R and the precision of the data is inferred
from the file. For example, the rwl file is read via:

> wa082 <- read.rwl("wa082.rwl")
There appears to be a header in the crn file
There are 23 series

Fig. 1. The time span of each series in the Hurricane Ridge dataset.
[Figure: one horizontal segment per series, labelled by series id, plotted against Year
(1700–1950); plot title: Hurricane Ridge.]

Users can check the overlap of the series that have been imported to R using the seg.plot
function (Fig. 1). The program can read files that adhere to the ITRDB standards for both
ring-width list (rwl) and chronology (crn) files. In this example, the file was saved
locally onto the computer, but R also allows files to be read remotely over the Internet.

The dplR package also produces some common descriptive statistics with the rwl.stats function.
For example, using the raw ring-width data from Hurricane Ridge:

> wa082.stats <- rwl.stats(wa082)

The object wa082.stats contains various statistics on individual series such as the mean, median,
and standard deviation of the ring widths for each series in addition to the mean sensitivity
(Fritts, 2001) and first-order autocorrelation (Table 1).

Table 1. Individual series statistics produced by dplR function rwl.stats for the Hurricane Ridge
ring-width data

Series   First  Last   Span  Mean   Median  Standard   Skewness  Mean         First-order
         year   year                        deviation            sensitivity  autocorrelation

712011   1811   1983   173   0.571  0.490   0.302      1.277     0.324        0.724
712012   1770   1983   214   0.567  0.520   0.257      0.659     0.271        0.728
712021   1824   1983   160   1.427  1.375   0.372      0.253     0.224        0.395
712022   1838   1983   146   1.565  1.465   0.458      1.438     0.206        0.481
712031   1764   1983   220   0.888  0.865   0.248      0.295     0.201        0.589
712032   1828   1983   156   1.140  1.120   0.290      0.496     0.194        0.535
712041   1722   1983   262   0.802  0.760   0.255      0.576     0.207        0.654
712042   1773   1983   211   0.564  0.560   0.196      0.378     0.207        0.725
712051   1761   1983   223   0.714  0.600   0.400      0.805     0.245        0.827
712052   1772   1931   160   0.755  0.600   0.476      1.641     0.227        0.832
712061   1797   1983   187   0.921  0.850   0.302      1.127     0.182        0.749
712062   1777   1983   207   0.920  0.890   0.353      0.451     0.178        0.787
712071   1883   1983   101   1.959  1.900   0.592      0.497     0.205        0.567
712072   1889   1983   95    2.185  2.210   0.552      0.167     0.201        0.525
712081   1861   1983   123   1.083  1.090   0.421      0.249     0.224        0.769
712082   1864   1983   120   1.845  1.860   1.031      0.211     0.195        0.910
712091   1810   1983   174   1.091  0.970   0.429      1.989     0.188        0.755
712092   1715   1983   269   0.863  0.760   0.467      0.954     0.234        0.841
712101   1702   1983   282   1.049  0.980   0.426      0.724     0.238        0.724
712102   1706   1983   278   0.871  0.770   0.418      1.392     0.221        0.764
712111   1748   1983   236   0.571  0.510   0.283      1.404     0.258        0.791
712121   1730   1983   254   0.788  0.655   0.538      1.380     0.214        0.895
712122   1698   1983   286   0.726  0.640   0.385      1.619     0.217        0.848

The raw ring-width data can be detrended interactively, one series at a time, or all at once with
the user preselecting a method or methods. The latter method is useful to ensure replication.
There are three standard detrending methods: a modified negative exponential curve, cubic
smoothing spline, or a horizontal line. (Here and elsewhere in the paper, notation follows Cook
et al. (1990) and Fritts (2001).) The modified negative exponential curve fits a model
$G_t = a e^{bt} + k$ where the growth trend $G_t$ is estimated as a function of time $t$ with
coefficients $a$, $b$, and $k$. If that nonlinear model cannot be fit, then a standard linear
model is fit ($G_t = b_0 + b_1 t$ where $b_0$ and $b_1$ are the intercept and slope). The
smoothing spline approach closely follows the "n-year spline" approach first described by Cook and
Peters (1981) where $G_t$ is calculated as a spline with a frequency response of 50% at a
wavelength of n years. Here, n is fixed as 2/3 the length of the series (Cook et al., 1990). The
last approach is to fit $G_t$ as the mean of the series. In all three cases, the RWI is calculated
by division: $\mathrm{RWI}_t = R_t / G_t$, where the actual growth ($R$) is divided by the
expected growth ($G$) at time $t$. The user can also detrend interactively and choose a different
detrending method for each series using the i.detrend function:

> wa082.rwi <- i.detrend(wa082)

This produces a plot showing each series and three standard detrending options for each one
(Fig. 2). The user can decide which method to use using the keyboard and the results (RWIs) are
stored in the object wa082.rwi.

Chronologies can also be built in dplR with the chron function. This function builds a mean value
chronology either by averaging each year's RWI using the arithmetic mean or using Tukey's biweight
robust mean which minimizes the effects of outliers. Prewhitened chronologies can also be built
where autocorrelation is removed from each series before averaging using
the R function ar. The prewhitening is performed by fitting an autoregressive model to the data
where the complexity of the model is selected by Akaike's information criterion (Venables and
Ripley, 2002).

Fig. 2. Series 712011 from the Hurricane Ridge dataset is shown here with three detrending options
for interactive detrending.
[Figure panels: Raw Series 712011 (mm); Spline (RWI); Neg. Exp. Curve or Straight Line (RWI);
Horizontal Line (Mean) (RWI). x-axis in every panel: Age (Yrs).]

A standard chronology of the Hurricane Ridge RWI data (wa082.rwi) can be built with

> wa082.crn <- chron(wa082.rwi, prefix = "HUR")

The object wa082.crn has two columns with the first (wa082.crn$HUR) containing the chronology
values and the second (wa082.crn$samp.depth) the number of samples for each year. The chronology
can be truncated to only have years with more than five samples using R's subset function:

> wa082.trunc <- subset(wa082.crn, samp.depth > 5)

A plot of the chronology with the sample depth can be produced using the crn.plot function
(Fig. 3):

> crn.plot(wa082.trunc)

This function takes advantage of R's plotting ability and is a relatively simple plot that is
designed to be used with the output of the chron function.

Fig. 3. The standard chronology is shown for the Hurricane Ridge RWI data. A smoothing spline
highlights low-frequency variability and the sample depth is plotted on right-hand y-axis.
[Figure: plot title HURstd; x-axis: Years; left y-axis: RWI; right y-axis: Sample Depth.]

The rwi.stats function produces a variety of statistics that indicate the within-tree and
between-tree correlation (r̄) and expressed population signal (Cook et al., 1990). Tree-ring data
sets often contain more than one sample per tree to analyze within- versus between-tree
variability. The read.ids function allows the user to specify tree and core ids for each series
by attempting to parse the series identifications. For example, using the detrended and
standardized ring-width data from Hurricane Ridge:

> wa082.ids <- read.ids(wa082.rwi, stc = c(3, 2, 3))
> wa082.rwi.stats <- rwi.stats(wa082.rwi, ids = wa082.ids)

The read.ids function reads the tree ids by parsing the series ids into site (3 characters, e.g.,
"712"), tree (2 characters, e.g., "01", "02", ...), and core (3 characters, e.g., "1", "2", ...).
The Hurricane Ridge data set has two cores per tree for all but one series (i.e., "712111"). The
object wa082.rwi.stats contains within- and between-tree correlations for maximum pairwise overlap
among the series (Table 2).

Table 2. Composite statistics produced by dplR function rwi.stats for the detrended and
standardized Hurricane Ridge data

Statistic   Value

Ntot        253
Nwt         11
Nbt         242
r̄tot        0.244
r̄wt         0.47
r̄bt         0.233
ceff        1.846
r̄eff        0.308
eps         0.911

Notation follows Cook et al. (1990).

The output from dplR in terms of chronologies and descriptive statistics is very close to the
output from traditional software in dendrochronology. For example,
the standard chronologies calculated in dplR and ARSTAN using the modified negative exponential
curves and Tukey's biweight robust mean have a correlation coefficient of 0.997.

Examples of other functions in R

R has hundreds of packages for performing virtually every type of statistical analysis. For
example, an often under-appreciated aspect of chronology building is the error associated with
averaging across samples. The R library boot (Canty and Ripley, 2007) allows easy implementation
of parametric or non-parametric bootstrap replication for calculating statistics. In Fig. 4, I
show the Hurricane Ridge chronology with upper and lower 99% confidence intervals calculated with
1000 bootstrap replicates surrounding the calculated mean (see Appendix A for code). These
confidence intervals could be used to assess significance of trends, for example, or allow the
more formal estimation of parameter error in a climate reconstruction (e.g., see archived R code
from Li et al., 2007).

Fig. 4. The standard chronology is shown for the Hurricane Ridge RWI data, but this time with
bootstrapped 99% confidence intervals (1000 bootstrapped replicates were used).
[Figure: plot title Hurricane Ridge; legend: Chronology, 99% CI, Samples; x-axis: Years; left
y-axis: RWI; right y-axis: Sample Depth.]

Another example of incorporating R functionality into tree-ring analysis is performing additive
decomposition of the Hurricane Ridge chronology via multiresolution analysis (Mallat, 1989) with
the contributed package waveslim (Whitcher, 2006). The Hurricane Ridge chronology can be
decomposed into a cascade from the smallest scales to the largest using wavelets. Fig. 5 shows
each wavelet detail (or band-pass) data series for 2 to 2^6 years (see Appendix A for code).

Prospectus

The dplR package in R is a small collection of functions that make it easier to import,
manipulate, analyze, and visualize tree-ring data. I present this package in hopes of encouraging
the dendrochronology community to use R as their primary analytic environment. R is a powerful
and flexible environment and its use allows great transparency in presenting data results. But
the package is a preliminary effort. Other functions can, and will, be added based on suggestions
from the research community. For example, I will incorporate more detrending methods, other
choices for wavelets, and COFECHA-like functionality. I encourage others to participate in the
development of dplR.

Availability

The dplR package is available as an add-on package in R. Interested users can download and
install R from the Comprehensive R Archive Network website: http://cran.r-project.org/. Within R,
dplR can be installed, loaded, and the help pages (with embedded examples) can be seen via:

> install.packages("dplR")
> library(dplR)
> ?dplR
Fig. 5. A multiresolution decomposition of the Hurricane Ridge chronology for seven wavelet details
is shown with years corresponding to the wavelet details shown on the right. Each wavelet detail
is scaled independently by dividing it by the root-mean-square.
[Figure: plot title Multiresolution decomposition of HURstd; panel labels recoverable from the
extraction: D1 2yrs, D2 4yrs, D3 8yrs, D4 16yrs, D5 32yrs, D6 64yrs; x-axis: Years (1750–1950).]
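The idea of an additive decomposition into scale bands can be sketched in base R without any add-on package. The example below swaps in a simple Haar-type scheme built from circular moving averages of width 2, 4, ..., 2^J; it is not the waveslim routine used for Fig. 5, and the helper names circular_mean and haar_mra as well as the simulated chronology are assumptions made for illustration. Detail j is the difference between successive smooths, so the details plus the final smooth add back to the original series.

```r
# Circular moving average over the current and the previous (width - 1) points.
circular_mean <- function(x, width) {
  n <- length(x)
  lag_sum <- numeric(n)
  for (k in 0:(width - 1)) {
    idx <- ((seq_len(n) - 1 - k) %% n) + 1   # wrap around the start of the series
    lag_sum <- lag_sum + x[idx]
  }
  lag_sum / width
}

# Additive decomposition: x = D1 + D2 + ... + DJ + SJ,
# where Dj = S(j-1) - Sj and Sj is the moving average of width 2^j (S0 = x).
haar_mra <- function(x, n_levels) {
  details <- matrix(NA_real_, nrow = length(x), ncol = n_levels,
                    dimnames = list(NULL, paste0("D", seq_len(n_levels))))
  smooth_prev <- x
  for (j in seq_len(n_levels)) {
    smooth_j <- circular_mean(x, 2^j)
    details[, j] <- smooth_prev - smooth_j
    smooth_prev <- smooth_j
  }
  list(details = details, smooth = smooth_prev)
}

# Simulated chronology (invented values, 256 hypothetical years).
set.seed(7)
years <- 1700:1955
crn <- 1 + 0.2 * sin(2 * pi * years / 60) + rnorm(length(years), sd = 0.1)

mra <- haar_mra(crn, n_levels = 6)

# Scale each detail by its root-mean-square, as described in the Fig. 5 caption.
rms <- sqrt(colMeans(mra$details^2))
scaled_details <- sweep(mra$details, 2, rms, "/")

# Reconstruction error should be at the level of floating-point noise.
max(abs(rowSums(mra$details) + mra$smooth - crn))
```

The one-sided moving average used here shifts features in time, whereas wavelet packages typically take care to keep the details aligned with the original series, so this sketch is suited to understanding the additive structure rather than to reproducing the published figure.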

Acknowledgments

I would like to acknowledge support from the Office of Research and Sponsored Programs at Western
Washington University and the National Science Foundation (ARC-0612346 and ATM-0629172). A. Lloyd
and G. Pederson provided helpful tests of dplR and suggestions from two anonymous reviewers
greatly improved the quality of the manuscript. I also thank all of the researchers who
contribute data to the International Tree-Ring Data Bank and thus greatly advance the field of
dendrochronology.

Appendix A. A dendrochronology program library in R (dplR)
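The appendix listing did not survive in this copy of the text. As a stand-in for the bootstrap example that the section on other R functions refers to, here is a minimal base-R sketch of percentile bootstrap confidence intervals around a mean chronology. It is not the original appendix code and does not use the boot package; the RWI matrix is simulated, and the helper name boot_ci_year is hypothetical.

```r
set.seed(42)
n_years <- 50
n_series <- 20

# Simulated RWI matrix (rows = years, columns = series): a shared signal plus
# series-level noise. Values are invented for illustration.
signal <- 1 + 0.2 * sin(seq_len(n_years) / 5)
rwi <- matrix(signal + rnorm(n_years * n_series, sd = 0.25),
              nrow = n_years, ncol = n_series)

# Percentile bootstrap interval for the mean of one year's RWI values.
# Assumes at least two non-missing values in the year (sample() treats a
# single number differently).
boot_ci_year <- function(x, n_boot = 1000, level = 0.99) {
  x <- x[!is.na(x)]
  boot_means <- replicate(n_boot, mean(sample(x, replace = TRUE)))
  alpha <- (1 - level) / 2
  quantile(boot_means, probs = c(alpha, 1 - alpha), names = FALSE)
}

chron_mean <- rowMeans(rwi, na.rm = TRUE)
ci <- t(apply(rwi, 1, boot_ci_year))
colnames(ci) <- c("lower", "upper")

head(cbind(mean = chron_mean, ci))
```

The arithmetic mean is used here for simplicity; a robust mean such as Tukey's biweight could be substituted as the bootstrapped statistic by replacing the call to mean inside replicate.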

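The composite statistics in Table 2 can be cross-checked by hand. The worked arithmetic below assumes the usual definitions from the literature the paper cites for its notation (Cook et al., 1990): the effective number of cores per tree as the reciprocal of the mean of 1/c over trees, the effective correlation as a function of the within- and between-tree means, and the expressed population signal as a function of the effective correlation and a sample count n. It also assumes 12 trees (11 with two cores, 1 with one core), as the series ids suggest. With these assumptions the tabulated EPS is reproduced when n is taken as the 23 series; this is an observation about the numbers, not a statement of how rwi.stats is implemented.

```latex
\begin{align*}
N_{tot} &= \frac{23 \times 22}{2} = 253, \qquad N_{wt} = 11, \qquad N_{bt} = 253 - 11 = 242 \\[4pt]
c_{eff} &= \left( \frac{1}{12} \left[ 11 \times \frac{1}{2} + 1 \times \frac{1}{1} \right] \right)^{-1}
         = \frac{12}{6.5} \approx 1.846 \\[4pt]
\bar{r}_{eff} &= \frac{\bar{r}_{bt}}{\bar{r}_{wt} + \dfrac{1 - \bar{r}_{wt}}{c_{eff}}}
         = \frac{0.233}{0.47 + 0.53 / 1.846} \approx 0.308 \\[4pt]
\mathrm{EPS} &= \frac{n\,\bar{r}_{eff}}{n\,\bar{r}_{eff} + (1 - \bar{r}_{eff})}
         = \frac{23 \times 0.308}{23 \times 0.308 + 0.692} \approx 0.911
\end{align*}
```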

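Two of the per-series statistics reported in Table 1, mean sensitivity and first-order autocorrelation, are easy to compute directly. The sketch below uses the classical definition of mean sensitivity (the average absolute relative change between adjacent rings) and the lag-1 coefficient from R's acf function. The helper names mean_sensitivity and ar1 are hypothetical, and the package may use a different variant of either statistic.

```r
# Mean sensitivity: average of |2 * (x[t+1] - x[t]) / (x[t+1] + x[t])|.
mean_sensitivity <- function(x) {
  x <- x[!is.na(x)]
  mean(abs(2 * diff(x) / (x[-1] + x[-length(x)])))
}

# First-order autocorrelation: the lag-1 value of the sample autocorrelation function.
ar1 <- function(x) {
  x <- x[!is.na(x)]
  acf(x, lag.max = 1, plot = FALSE)$acf[2]
}

# Toy series (invented): widths alternate between 1 mm and 3 mm.
rw <- c(1, 3, 1, 3)
mean_sensitivity(rw)   # each step is |2 * 2 / 4| = 1, so the mean is 1
ar1(rw)                # deviations alternate in sign: (-1 - 1 - 1) / 4 = -0.75
```

A series with no year-to-year change has a mean sensitivity of 0, while strongly alternating growth, as in the toy series, pushes it toward its upper range and drives the lag-1 autocorrelation negative.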
References

Bunn, A.G., 2007. dplR: Dendrochronology Program Library in R. R package version 1.0.
  URL http://www.R-project.org.
Canty, A., Ripley, B.D., 2007. Boot: Bootstrap R (S-Plus) Functions. R package version 1.2–28.
  URL http://www.R-project.org.
Chambers, J.M., 1998. Programming with Data. Springer, New York, USA, p. 469.
Cook, E.R., Holmes, R.L., 1996. Users Manual for Program ARSTAN. Laboratory of Tree-Ring
  Research, University of Arizona, Tucson, USA.
Cook, E.R., Peters, K., 1981. The smoothing spline: a new approach to standardizing forest
  interior tree-ring width series for dendroclimatic studies. Tree-Ring Bulletin 41, 45–53.
Cook, E.R., Briffa, K., Shiyatov, S., Mazepa, A., Jones, P.D., 1990. Data analysis. In: Cook,
  E.R., Kairiukstis, L.A. (Eds.), Methods of Dendrochronology: Applications in the Environmental
  Sciences. Kluwer Academic Publishers, Dordrecht, pp. 97–162.
Fritts, H.C., 2001. Tree Rings and Climate. Blackburn Press, Caldwell, NJ, USA, p. 567.
Holmes, R.L., 1983. Computer assisted quality control in tree-ring dating and measurement.
  Tree-Ring Bulletin 43, 69–78.
Holmes, R.L., 1992. Dendrochronology Program Library, Instruction and Program Manual (January
  1992 update). Laboratory of Tree-Ring Research, University of Arizona, Tucson, USA.
Li, B., Nychka, D.W., Ammann, C.M., 2007. The 'hockey stick' and the 1990s: a statistical
  perspective on reconstructing hemispheric temperatures. Tellus A 59, 591–598.
Mallat, S.G., 1989. A theory for multiresolution signal decomposition: the wavelet
  representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 11, 674–693.
Meko, D., 2002. Tree-Ring MATLAB Toolbox. URL http://www.mathworks.com/matlabcentral/.
North, G.R., Biondi, F., Bloomfield, P., Christy, J.R., Cuffey, K.M., Dickinson, R.E., Druffel,
  E.R.M., Nychka, D., Otto-Bliesner, B., Roberts, N., Turekian, K.K., Wallace, J.M., 2006.
  Surface Temperature Reconstructions for the Last 2000 Years. National Academies Press,
  Washington, p. 145.
Pocernich, M., 2006. R's role in the climate change debate. Rnews 6/4, 17–18.
R Development Core Team, 2007. R: a language and environment for statistical computing.
  R Foundation for Statistical Computing, Vienna, Austria, ISBN 3-900051-07-0,
  URL http://www.R-project.org.
Schweingruber, F., Briffa, K., 1983. Hurricane Ridge Data Set. IGBP PAGES/World Data Center for
  Paleoclimatology Data Contribution Series 1983-WA082.RWL, NOAA/NCDC Paleoclimatology Program,
  Boulder, CO, USA.
Venables, W.N., Ripley, B.D., 2002. Modern Applied Statistics with S, fourth ed. Springer,
  Berlin, p. 495.
Whitcher, B., 2006. waveslim: basic wavelet routines for one-, two- and three-dimensional signal
  processing. R package version 1.6. URL http://www.R-project.org.
