TREE-1847; No. of Pages 8

Opinion

The changing face of the molecular evolutionary clock
Simon Y.W. Ho
School of Biological Sciences, University of Sydney, Sydney, NSW, Australia

The molecular clock has played an important role in biological research, both as a description of the evolutionary process and as a tool for inferring evolutionary timescales. Genomic data have provided valuable insights into the molecular clock, allowing the patterns and causes of evolutionary rate variation to be characterized in increasing detail. I explain how genome sequences offer exciting opportunities for estimating the timescale of the Tree of Life. I describe the different approaches that have been used to deal with the computational and statistical challenges encountered in molecular clock analyses of genomic data. Finally, I offer a perspective on the future of molecular clocks, highlighting some of the key limitations and the most promising research directions.

Corresponding author: Ho, S.Y.W. (simon.ho@sydney.edu.au).
Keywords: molecular clock; genomic data; rate heterogeneity; pacemaker models; phylogenetic analysis.
0169-5347/ © 2014 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.tree.2014.07.004

The molecular clock
One of the fundamental goals of biological research is to understand the evolutionary process. By allowing the raw materials of evolution to be analyzed, genetic data have had an immense impact on this endeavor. In this context, the molecular clock has been extremely valuable owing to its dual role as a description of the pattern of molecular evolution and as a tool for estimating evolutionary rates and timescales. The importance of the molecular clock has not diminished over the years, with its role shifting to the analysis of evolutionary patterns and processes on a genomic scale.

The molecular clock hypothesis, which postulates a constancy of evolutionary rates among lineages, was introduced in the early 1960s [1] and played a part in the development of molecular evolutionary theory. The apparent homogeneity of rates among lineages was one of the inspirations for the neutral theory, which proposed that a large proportion of mutations do not alter the fitness of an organism [2]. The neutral theory emphasized the importance of genetic drift and predicted that evolutionary rates depended on rates of spontaneous mutation, independently of population size. According to the neutral theory, however, absolute rates of evolution (per unit time) are expected to have a negative relationship with generation length. This is because most inherited mutations are thought to occur during replication of germline DNA, and species with long generations therefore tend to accumulate fewer average mutations per year. The observation that evolutionary rates in coding sequences were actually independent of generation length was one of the motivations for the development of the nearly-neutral theory shortly afterwards [3]. The nearly-neutral theory gave population size a key role in governing the relative impacts of drift and selection. To this day, the neutral theory and the molecular clock remain important null models in evolutionary analysis.

The molecular clock is familiar to many researchers as a tool for estimating evolutionary rates and timescales. Early molecular clock studies focused on the timescale of hominid evolution [4] and were followed by ambitious efforts to date some of the deepest nodes in the Tree of Life [5,6]. Alongside these studies was a growing recognition of rate variation among lineages, in contradiction to the molecular clock. This motivated the development of more powerful methods of estimating evolutionary timescales, including models that incorporated rate heterogeneity across lineages [7]. As a result, the role of the molecular clock grew dramatically and molecular dating analyses now form an important component of many evolutionary studies [8,9]. Our understanding of molecular evolutionary rates has been aided by the growth in genomic sequence data. These data have brought significant computational challenges but offer a rich source of information for resolving the timescale of the Tree of Life.

Heterogeneity in molecular evolutionary rates
Molecular evolution involves dynamic interactions among the forces of mutation, selection, and drift. As a consequence, rate variation across lineages and across the genome are ubiquitous features of the evolutionary process. Large datasets increase the power of statistical methods to test hypotheses about molecular evolution, including the impacts of different biological and environmental factors that affect evolutionary rates.

The causes of rate variation can be broadly divided into gene effects, lineage effects, and residual effects [10,11]. Gene effects lead to different rates between loci, and have long been recognized as an intrinsic feature of the molecular evolutionary process [12]. They can be caused by differences in the proportion of functionally constrained sites and by regional heterogeneities in mutation rates across the genome [11]. Gene effects represent the only form of rate variation recognized in the simplest clock model, known as the strict or global molecular clock.

Lineage effects refer to factors that act across the whole genome, such as differences in particular physiological and life-history traits. These might include generation time,
Trends in Ecology & Evolution xx (2014) 1–8 1
Opinion Trends in Ecology & Evolution xxx xxxx, Vol. xxx, No. x

[Figure 1: schematic with three panels labelled Universal pacemaker, Multiple pacemaker, and Degenerate multiple pacemaker. TRENDS in Ecology & Evolution]

Figure 1. Comparison of the three pacemaker models of genome evolution [16], each illustrated using rooted four-taxon trees from five loci. In the Universal Pacemaker
model, loci evolve at different rates but share the same pattern of rate variation across branches. In the Multiple Pacemaker model, loci evolve at different rates but
groups of loci share the same patterns of rate variation across branches. In the Degenerate Multiple Pacemaker model, each locus has its own, distinct pattern of rate
variation across branches. These three models involve different interactions between gene effects and lineage effects, leading to contrasting patterns of rate variation
between loci.

metabolic rate, body size, and the performance of DNA repair mechanisms [13]. Interactions between lineage and gene effects are known as residual effects and act heterogeneously across the genome. Residual effects can be caused by selection, variation in population size, and other factors [14,15]. By causing the pattern and extent of among-lineage rate heterogeneity to vary across loci, residual effects are particularly important at the genome scale.

The interplay between gene and lineage effects is encapsulated in the pacemaker models of genome evolution (Figure 1) [16]. The Universal Pacemaker model ascribes variation in molecular rates to the presence of both gene and lineage effects. In this model, loci can evolve at distinct rates from one another, but are all governed equally by lineage effects. Notably, this model assumes the absence of residual effects. Studies of archaeal and bacterial genomes [16,17] and of genome-wide collections of trees from Drosophila and yeast [18] found that the Universal Pacemaker model explained much of the observed variation in rates. This stands in contrast with the Degenerate Multiple Pacemaker model, which assumes that each locus has a distinct pattern of rate variation across lineages, and genomic evolution is thereby dominated by residual effects. Between these two models sits the Multiple Pacemaker model, which involves a moderate level of residual effects. In this model there are clusters of genes that share the same lineage effects. This is the most plausible model of genomic evolution and appears to have some statistical support [16]. The growing availability of genomic datasets will enable further comparisons of these models of molecular evolution.

Dating using genome sequences
The molecular clock continues to be an important tool for estimating evolutionary rates and timescales in the genomic era. There has been a steady increase in the size of datasets used for such phylogenetic dating analyses, both inspiring and enabling the development of sophisticated, parameter-rich models of molecular evolution. Much of the progress in molecular clock methods over the past few decades can be described as an effort to account for lineage effects, that is, dealing with rate variation among branches in the phylogeny [7]. With the growing use of large datasets comprising multiple markers, the impacts of residual effects are becoming increasingly important. To appreciate how methodological development has benefited from the increase in genetic data it is worthwhile to

consider how different methods have been designed to account for the three major forms of rate variation. This review focuses on sequence data, which are by far the most widely used type of genome-scale data in molecular clock analyses.

Accounting for rate variation across lineages
For the first few decades of its history, the term ‘molecular clock’ referred to the strict-clock model, which does not account for lineage effects. This still provides a useful null model for testing rate variation among lineages. In addition, use of the strict clock remains commonplace in analyses of datasets with low genetic variation, such as those comprising samples from a single population or from closely related species [19].

Since the late 1990s there has been a proliferation of methods that relax the assumption of rate constancy [7]. These relaxed-clock models allow the rate to vary across lineages such that each branch of the phylogeny can have a distinct evolutionary rate. The various relaxed-clock models make different assumptions about how rates vary throughout the tree, such as the degree of correlation between rates in neighboring branches. These models have been reviewed and compared in detail [7,20], although some uncertainty remains about their relative merits [21–23].

The wide range of molecular clock models has made the estimation of evolutionary rates and timescales a substantially statistical exercise. However, using an inappropriate clock model can produce highly misleading estimates of evolutionary rates and timescales [24–27]. Accordingly, choosing an appropriate model is an important step in any phylogenetic dating analysis, and there are several methods that can be used for model selection. Although the various clock models provide different descriptions of rate variation across lineages, they do not make any statements about absolute rates or node times. In this respect, all molecular clock methods share a reliance on calibrations, which are usually informed by paleontological or geological data (Box 1).

Box 1. Calibrating the molecular clock
Molecular clock models describe the patterns of rate variation across lineages, allowing estimation of the relative ages of nodes in the phylogeny. To place an absolute timescale on the phylogenetic tree, the molecular clock needs to be calibrated. This can be done in one of two ways: by setting the rate to a known value, or by constraining the age of at least one node in the phylogeny. The scarcity of reliable rate estimates means that the latter approach is usually preferred, especially when there is substantial rate variation among lineages. Generally, increasing the number of calibrations leads to an improvement in molecular clock estimation [25,66].

Ages of nodes can be constrained on the basis of fossil or geological information. The earliest fossil representative of a lineage can be used to infer the divergence time of that lineage from its sister lineage [67]. The age constraint can be implemented in several ways, of which the simplest is to fix the age of the node to a point value. However, this ignores any uncertainty in the calibration age, such as that associated with radiometric dating or taxonomic assignment. Instead, a preferable approach is to account for uncertainty by allowing the node age to vary within chosen constraints [68]. In Bayesian phylogenetic analysis this can be done by specifying an informative prior distribution for the node age, typically in the form of a lognormal or exponential distribution [69]. Choosing the parameters of these distributions can be a difficult exercise [70], but there are several formalized methods that allow this process to be informed by the fossil data [71,72]. Alternatively, calibrating fossils can be included in combined analyses of morphological and molecular data such that their associated temporal information is incorporated implicitly [63,73].

If ancient genomes are available, the ages of the sequences can be used for calibration. This is possible in studies based on well-preserved ancient specimens or on rapidly evolving viruses sampled through time [74]. Calibrations based on ancient sequences can be very effective because there is usually no uncertainty in the attachment of dates to nodes in the phylogeny. Moreover, these dates are often known with considerable precision. Any nontrivial uncertainty in the sequence ages can be incorporated into the molecular clock analysis [75,76].

Accounting for rate variation along the sequence
Evolutionary rates can vary across the genome, among regions, among loci, and even between nucleotide sites. In phylogenetic analyses of DNA sequence data, rate heterogeneity among sites has typically been taken into account by assigning sites to a small number of discrete rate categories, based on the gamma distribution [28]. This method can also be used when analyzing genomic datasets, but an alternative approach is to allow each locus or group of loci to have a distinct evolutionary rate [29–31]. For example, a relative rate parameter can be assigned to each locus, with these rates following a gamma [30] or Dirichlet distribution [31]. These methods, however, are only appropriate when gene and lineage effects are present, but not residual effects (conforming to the Universal Pacemaker model of genome evolution; Figure 1).

Accounting for residual effects
A more difficult problem emerges when there are significant residual effects such that different sites or loci show contrasting patterns of rate variation in different branches of the tree. A familiar example of this is associated with the codon structure of protein-coding genes: patterns of rate variation at second codon positions are influenced by selection, whereas rates at third codon sites are more likely to be subject to lineage effects [32]. In this particular example, a simple solution is to assign a separate model of among-lineage rate variation to each codon position [33].

More generally, one can account for residual effects by partitioning the data into subsets based on their pattern of among-lineage rate heterogeneity. An appropriate strategy might be to assign separate molecular clock models to different genomic regions, different loci, or different codon positions in protein-coding genes [33]. Alternatively, a statistical approach can be taken to identify the optimal partitioning scheme for the data. This can be done, for example, by comparing the estimates of branch lengths from the different loci in the dataset and using a clustering method to group loci with similar patterns of rate variation among branches [29,34]. These patterns can be summarized using tree-distance metrics [34] or principal-components analysis [29]. In the molecular clock analysis, a separate model of among-lineage rate variation can then be assigned to each subset of the data. This is analogous to the common practice of partitioning the dataset and assigning a separate substitution model to each data subset [35].
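
The idea of grouping loci by their pattern of rate variation among branches can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure used in [29] or [34]: the branch lengths are invented, all loci are assumed to share one fixed topology with branches listed in the same order, each locus is rescaled so that its branch lengths sum to one (which removes the gene effect, i.e., the overall rate of the locus), and loci are grouped by single-linkage clustering on Euclidean distance. The function names and the threshold are hypothetical choices made for the example.

```python
from itertools import combinations
from math import dist


def relative_branch_lengths(lengths):
    """Rescale one locus's branch lengths to sum to 1, removing its overall rate."""
    total = sum(lengths)
    return [x / total for x in lengths]


def cluster_loci(branch_lengths, threshold):
    """Single-linkage clustering of loci on their relative branch-length profiles.

    branch_lengths maps locus name -> list of branch lengths, with branches in
    the same order for every locus (a shared, fixed topology is assumed).
    """
    profiles = {name: relative_branch_lengths(v) for name, v in branch_lengths.items()}
    parent = {name: name for name in profiles}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Join any two loci whose profiles are closer than the threshold.
    for a, b in combinations(profiles, 2):
        if dist(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in profiles:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical branch lengths (substitutions per site) for five branches.
loci = {
    "geneA": [0.10, 0.20, 0.30, 0.20, 0.20],
    "geneB": [0.05, 0.10, 0.15, 0.10, 0.10],  # half the rate of geneA, same pattern
    "geneC": [0.30, 0.10, 0.10, 0.30, 0.20],  # different pattern across branches
    "geneD": [0.60, 0.20, 0.20, 0.60, 0.40],  # twice the rate of geneC, same pattern
}
print(cluster_loci(loci, threshold=0.05))
```

In this toy input, geneA and geneB differ only in overall rate, as do geneC and geneD, so the two pairs form separate subsets. Each subset could then be given its own model of among-lineage rate variation. In a real analysis the number of clusters and the distance measure would need to be justified statistically.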

Analyzing large datasets
With the drive to base evolutionary inferences on datasets of increasing size, an ongoing feature of sampling has been the trade-off between numbers of loci and taxa. Some studies have focused on estimating evolutionary timescales for very large numbers of taxa, but these have typically been restricted to small numbers of loci or have used supertree methods to merge smaller trees inferred from different markers [36,37]. The emergence of high-throughput sequencing technology has made it possible to assemble large datasets that can be characterized as having (i) a small number of markers for a large number of taxa; (ii) a large number of markers for a small number of taxa; or (iii) a large number of markers for a large number of taxa. As I explain below, these three types of dataset present different challenges for molecular clock analyses.

Datasets comprising small numbers of markers for a very large sample of taxa can be used to answer a variety of evolutionary questions, particularly those associated with macroevolutionary processes such as diversification [36,37]. Several methods can be used to estimate evolutionary timescales for very large phylogenetic trees [38–41] (Box 2). These methods share several features in common. All treat inference of the phylogeny as a separate problem, and thus the topology and branch lengths (measuring the amount of genetic change) are assumed to be known for the dating analysis. Although this leads to a considerable reduction in the computational burden, it also places a limit on model complexity because the sequence data are not always analyzed directly. In addition, most of the methods that can analyze large numbers of taxa are unable to accommodate complex calibrating information. Thus, these methods typically do not handle uncertainty in calibrations and the phylogeny in an ideal manner.

In molecular dating analyses there has been growing use of datasets that comprise large amounts of sequence data from small numbers of taxa. As with many advances in molecular phylogenetics, the first steps in this direction were taken using organellar genomes. Molecular dating using complete sequences of organellar genomes is now relatively common (e.g., [42,43]), whereas few dating studies based on nuclear DNA have taken advantage of the data produced by early genome projects (e.g., [44]). When there are few taxa in the dataset, the number of distinct site patterns in the alignment is small. Sites sharing the same pattern of variation across taxa can be grouped for the purposes of likelihood calculation such that molecular clock analyses of these datasets are computationally tractable even with intensive Bayesian and likelihood methods (Box 2). However, datasets with many loci but few taxa are subject to several disadvantages associated with sparse taxon sampling, with impacts on tree balance, performance of phylogenetic inference, and estimation of macroevolutionary parameters [45]. The addition of taxa often comes at the cost of an increased proportion of missing data, with uncertain impacts on molecular clock analysis [46,47]. Analyses of multilocus data must also deal with incongruence between trees estimated from different loci, the extent of which will depend on the taxonomic scale being investigated (Box 3).

Genome-scale datasets from moderate to large numbers of taxa are still relatively rare. However, with various genome-sequencing initiatives underway, such as the Genome 10K Project [48] and the i5K Project [49], very large datasets will soon become much more common. Analyses of these data, which have the potential to comprise millions of characters from tens to hundreds of taxa, can be handled using rapid likelihood-based methods but remain problematic for most Bayesian phylogenetic methods. In particular, enormous computational demands are made by the calculation of the full likelihood and by the estimation of the posterior using Markov chain Monte Carlo sampling [50]. These analyses will benefit from efforts to improve the computational efficiency of molecular dating analyses. In

Box 2. Methods for estimating evolutionary timescales from large datasets


A wide range of molecular clock methods for estimating evolutionary timescales are available in various software packages. The methods differ in terms of their statistical motivations, assumptions about rate variation, and ability to handle different forms of calibration. These characteristics typically need to be weighed against the computational burden and running time of the different methods.

Perhaps the most popular approaches, at least in recent years, are those implemented in a Bayesian phylogenetic framework. Bayesian methods are popular because they can readily incorporate complex models of molecular evolution. Bayesian molecular clocks are available in programs such as BEAST [77], MrBayes [78], MCMCtree [79], PhyloBayes [80], and DPPdiv [20]. Some of these require a fixed tree topology, whereas others are able to co-estimate the phylogeny and node times. These programs are usually able to implement a range of relaxed-clock models. Clock calibrations are typically incorporated as priors on node ages. Estimates of node times can be sensitive to the choice of priors, making it important to examine the priors carefully [31,81].

Unlike most other Bayesian methods, MCMCtree has the option of using approximate likelihood calculation [52]. This makes it feasible for MCMCtree to analyze genome-scale datasets [29,82]. Despite such computational improvements, Bayesian molecular clock methods can usually only handle datasets comprising moderate numbers of taxa. Further developments will allow Bayesian molecular clock methods to be applied to much larger datasets [50].

Several non-Bayesian methods have been developed for analyzing datasets containing large numbers of taxa and are available in such software as PATHd8 [41], treePL [40], DAMBE [39], chronos [66], and RelTime [38]. These typically rely on other methods to produce estimates of the tree topology and branch lengths. Genome-scale datasets can be used for this purpose if rapid phylogenetic analysis is performed by methods such as RAxML [83]. Most non-Bayesian methods for molecular clock analysis employ a rate-smoothing approach in which large rate changes between neighboring branches in the phylogeny are disfavored [68]. Calibrations are implemented as constraints on node times or by fixing the ages of nodes to point values. Choosing a molecular clock method depends on a balance between the features of the method and its computational demands. With a sufficiently large number of calibrations, however, we would expect most molecular clock methods to converge on similar estimates of evolutionary timescales [25,38,66]. Nevertheless, the various methods report uncertainty in different ways, depending on how they are able to handle error in the calibrations and the phylogeny.
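
The rate-smoothing idea mentioned in this box can be illustrated with a toy calculation. The sketch below is an assumption-laden example and not the objective function of any of the programs named above: it uses a least-squares penalty on rate differences between each branch and its parent branch, a made-up four-branch tree with invented branch lengths, a root age fixed at 100 Myr as the only calibration, and a simple grid search over candidate ages for one internal node. All names in the code are hypothetical.

```python
def smoothing_penalty(lengths, durations, parent):
    """Sum of squared rate differences between each branch and its parent branch."""
    rates = {b: lengths[b] / durations[b] for b in lengths}
    return sum((rates[b] - rates[p]) ** 2 for b, p in parent.items())


# Toy rooted tree ((A,B)X,C): branch "X" joins the root to internal node X.
lengths = {"X": 0.05, "A": 0.06, "B": 0.04, "C": 0.10}  # substitutions per site (invented)
parent = {"A": "X", "B": "X"}  # child branch -> parent branch; X and C attach to the root
ROOT_AGE = 100.0  # assumed calibration: root age fixed to a point value (Myr)


def durations_for(age_x):
    """Branch durations implied by a candidate age for node X (tips sampled at time 0)."""
    return {"X": ROOT_AGE - age_x, "A": age_x, "B": age_x, "C": ROOT_AGE}


# Grid search over candidate ages 1.0, 1.5, ..., 99.0 Myr for node X.
candidates = [t / 2 for t in range(2, 199)]
best_age = min(
    candidates,
    key=lambda t: smoothing_penalty(lengths, durations_for(t), parent),
)
print(best_age, smoothing_penalty(lengths, durations_for(best_age), parent))
```

Candidate ages that force a large jump in rate between branch X and its descendant branches are penalized, so the search settles on an age for node X at which the three rates are similar. Real implementations optimize all node ages jointly, usually combine the penalty with a likelihood term, and treat calibrations as constraints rather than a single fixed root age.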


Box 3. Taxonomic scales of analysis


Molecular clock studies usually deal with evolutionary patterns and processes occurring across geological timeframes. However, many studies are conducted at much lower taxonomic scales, with multiple samples drawn from a single species or from each of several closely related species. These datasets pose very different challenges for molecular clock analysis, and for phylogenetic analysis more generally, because recombination removes the expectation of congruence among the trees estimated from different loci. Accordingly, the evolutionary process is better described using the coalescent framework than by birth–death models of speciation and extinction.

Genome-scale intraspecific datasets are becoming more common (e.g., [84,85]), raising the question of how such data can be analyzed using molecular clocks. Methods based on concatenation, which assume that all loci share the same tree, would clearly be inappropriate [86]. One approach is to obtain an estimate of the timescale from each locus: this can easily be achieved even using intensive Bayesian or likelihood-based dating methods. The age estimates for nodes of interest, such as those that are present in a majority of gene trees, can then be summarized across loci. However, any approaches based on consensus of gene trees can be highly misleading, even at higher taxonomic levels [87].

When the species in the dataset are represented by multiple samples, species-tree methods can be used. These allow variation across individual gene trees but assume that they are all embedded in a single, underlying species tree (e.g., [88,89]). These methods tend to be computationally intensive and cannot be readily applied to genome-scale datasets. Moreover, species-tree methods are generally not amenable to the incorporation of calibrating information.

Among studies of recent evolutionary events, a common problem is a lack of reliable and appropriate calibrations [90]. Fossil calibrations are generally more useful for analyses of deeper timescales, whereas biogeography-based calibrations usually involve strong assumptions about gene flow, and are not always available for the taxa being studied. The paucity of reliable calibrations also exacerbates the problem of time-dependent biases in rate estimation [91]. Purifying selection and other factors cause evolutionary rates to appear higher over short timeframes, an effect that is particularly pronounced over the time-depths involved in intraspecific analyses. Failing to account for this problem can result in highly misleading estimates of evolutionary timescales [92,93].
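
The per-locus approach described in this box, in which node ages are estimated separately for each locus and then summarized for nodes found in a majority of gene trees, might look like the following. This is a hypothetical sketch: the ages are invented, clades are identified simply by their sets of taxon names, the median is an arbitrary choice of summary, and the function name and `min_fraction` parameter are not taken from any published tool. As the box notes, summaries built on a consensus of gene trees can be misleading, so this shows the bookkeeping only.

```python
from statistics import median


def summarize_node_ages(per_locus_ages, min_fraction=0.5):
    """Median age per clade, for clades present in more than min_fraction of loci.

    per_locus_ages is a list with one dict per locus, mapping a clade
    (a frozenset of taxon names) to the age estimated from that locus.
    Returns clade -> (median age, fraction of loci containing the clade).
    """
    n_loci = len(per_locus_ages)
    ages_by_clade = {}
    for locus in per_locus_ages:
        for clade, age in locus.items():
            ages_by_clade.setdefault(clade, []).append(age)
    return {
        clade: (median(ages), len(ages) / n_loci)
        for clade, ages in ages_by_clade.items()
        if len(ages) / n_loci > min_fraction
    }


AB = frozenset({"A", "B"})
BC = frozenset({"B", "C"})
ABC = frozenset({"A", "B", "C"})

# Invented age estimates (arbitrary time units) from four loci.
per_locus = [
    {AB: 1.0, ABC: 3.0},
    {AB: 1.4, ABC: 2.0},
    {BC: 0.9, ABC: 2.6},  # this gene tree groups B with C instead of A
    {AB: 1.2, ABC: 2.4},
]
summary = summarize_node_ages(per_locus)
for clade, (age, support) in summary.items():
    print(sorted(clade), age, support)
```

The clade found in only one of the four gene trees is dropped, and the remaining clades are reported with the fraction of loci that support them, which keeps the extent of gene-tree incongruence visible alongside the age summary.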

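[Figure 2: chronogram. Time axis marked at 150, 100, and 50 Ma, spanning the Jurassic, Cretaceous, Paleogene, and Neogene. Tips: Monotremata, Marsupialia, Xenarthra, Afrotheria, Lagomorpha, Rodentia, Scandentia, Primates, Eulipotyphla, Cetartiodactyla, Chiroptera, Perissodactyla, Carnivora. Labelled clades: Theria, Placentalia. TRENDS in Ecology & Evolution]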

Figure 2. Evolutionary timescale of ordinal diversification in mammals. Chronogram (dates in Ma) estimated in a Bayesian relaxed-clock analysis of 14 632 nuclear genes,
calibrated using 38 fossil-based constraints on the ages of nodes [29]. Light-blue bars at nodes represent 95% credibility intervals of divergence-time estimates. Triangles
denote clades represented by more than one species in the analysis. Even with genome-scale data, the age estimates for some nodes have considerable uncertainty. The
orange vertical line indicates the timing of the Cretaceous–Paleogene boundary. Most orders of placental mammals diversified in the Paleogene, but the basal divergences
in Placentalia occurred in the Late Cretaceous. The date estimates were robust to a range of factors including data partitioning and various model priors [29,31].


this regard, an effective technique is the use of approximate likelihood calculations in Bayesian relaxed-clock methods, which can lead to a 1000-fold reduction in computing time [50–52]. A recent example of the application of these methods involved the timing of the early diversification of placental mammals (Figure 2; Box 4). Substantial reductions in the time needed for molecular clock analyses can also be achieved by distributing computational load, such as by multithreading or parallelization [53,54].

Box 4. Case study: phylogenomic estimate of the mammalian evolutionary timescale
The early history of mammals, particularly the timing of their diversification in relation to the extinction of the dinosaurs at the Cretaceous–Paleogene boundary (65.5 million years ago), has been the subject of considerable research [94]. Most molecular clock analyses have placed the diversification of mammalian orders in the Cretaceous (e.g., [36,95]). In a recent study based on genomic data, dos Reis and colleagues [29] estimated the timescale of mammalian diversification. They analyzed the DNA sequences of 14 632 genes that were partitioned into 20 subsets on the basis of their relative rate of evolution. This partitioning was carried out using the distance between human and mouse for each gene.

The authors performed further analyses using a smaller set of 857 genes that were present in all 36 genomes. As with the larger dataset, the authors partitioned the genes according to their relative evolutionary rate. For the 857 gene dataset, however, dos Reis et al. [29] also compared a partitioning scheme in which genes were grouped according to their patterns of rate variation across branches. This partitioning was performed using a clustering approach based on the branch lengths estimated from each gene.

In both cases, the evolutionary timescale was estimated using a Bayesian molecular clock analysis in the software MCMCtree [79], calibrated by 38 fossil-based age constraints. To make the analysis computationally tractable, the authors used approximate likelihood calculation. Even with this approach, the analysis of the full dataset took about 20 days.

The results of the analysis showed that the ordinal crown groups of mammals originated after the Cretaceous–Paleogene boundary (see Figure 2 in main text). The basal divergences among these orders, however, largely occurred in the Late Cretaceous. This supports a scenario in which placental mammals had a long evolutionary fuse in the Cretaceous before radiating explosively in the Paleogene [96]. The estimate of the evolutionary timescale was robust to a range of factors, including the data-partitioning strategy. This attested to the benefits of using genome-scale data in this study. Nevertheless, the age estimates for some of the divergence events in the tree had considerable uncertainty, showing that even genomic data do not necessarily yield errorless estimates of evolutionary parameters.

Looking towards the future
There has been a rapid growth in genomic data over the past few years, but are there diminishing returns when using larger datasets for molecular clock analysis? Even with infinite data, molecular dating analyses cannot produce errorless estimates of evolutionary timescales [55,56]. Assuming that the phylogenetic tree has been estimated accurately and that an appropriate substitution model has been chosen, uncertainty in estimates of node ages arises from three sources [39,57]. First, there is estimation error in the branch lengths. Second, there is uncertainty in the pattern of rate variation among branches in the phylogeny. Third, uncertainty in the fossil or geological information used to construct calibrations is rarely trivial.

[...] estimates of node times therefore depend on the clock model and on the calibrations [39]. Assuming that the chosen clock model accurately describes the rate variation across the tree, uncertainty in the estimates of node times converges to the uncertainty in the calibrations; this can occur even with relatively small amounts of sequence data [57,58]. In particular, estimation error in node ages is most strongly influenced by the most precisely defined calibrations [57]. Therefore, without reliable calibrations, the advantages of using genomic data are largely blunted. As a consequence, identification of reliable calibrations remains the most crucial component of molecular dating analyses [59].

The growing wealth of genomic data presents an exciting opportunity for improving models of the molecular clock. Comprehensive comparisons of different relaxed-clock models will help to identify the models that provide the best description of biological reality. Furthermore, improvements in our understanding of the causes of rate variation among lineages and among loci bring the prospect of building mechanistic models of molecular rates. This will be aided by studies of biological correlates of rates [60]. In this regard, models of covariation between traits and rates present a promising line of inquiry [61]. Although this review has focused on sequence data, future model development might be able to accommodate other forms of genome-scale data, such as ultra-conserved elements and panels of single-nucleotide polymorphisms. For example, a clock-based analysis of protein folds has been used to investigate the early evolutionary history of life on Earth [62].

Refinement of calibration methods, discoveries of informative fossils, and improved understanding of biogeographic calibrations will all play important roles in strengthening the reliability of molecular clock estimates. An exciting area of work is the development of methods for incorporating fossil data into molecular clock analyses. These can either involve combined analyses of morphological and molecular data [63] or other methods of using fossil-occurrence data for calibration [64,65].

Concluding remarks and future directions
The molecular clock has made important contributions to our understanding of biological processes and evolutionary timescales. Now in the genomic era, we have experienced a deluge of data that can be used to gain fascinating insights into evolutionary questions. To use the molecular clock to its full potential, methodological and computational development must keep pace.

Addressing computational challenges should not merely depend on technological progress but should involve the improvement of methods of phylogenetic and molecular clock estimation. Further development of models of rate variation and methods of clock calibration will help to improve the reliability of molecular clock inferences. With data availability no longer being a major limiting factor, efforts should also be directed towards identifying the genomic data subsets that are the most suitable for estimating evolutionary timescales. With these improvements
In analyses of genomic data, the branch lengths are in hand, the Time-Tree of Life will have its roots in firm
effectively estimated without error, and uncertainties in ground.
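The earlier point that even unlimited sequence data cannot yield errorless node ages can be illustrated with a toy simulation. The sketch below is not the method of any study cited here. It assumes a single branch whose age is its length divided by its rate, a normal approximation for the sampling error in the branch length, and a log-normal spread in the rate that stands in for calibration and rate-model uncertainty. The function name and all parameter values are hypothetical.

```python
import math
import random
import statistics


def relative_age_interval(n_sites, true_length=0.1, median_rate=0.01,
                          rate_log_sd=0.2, n_draws=20000, seed=1):
    """Relative width of the central 95% interval for age = length / rate.

    true_length is in substitutions per site and median_rate in
    substitutions per site per unit time (both illustrative values).
    """
    rng = random.Random(seed)
    # Sampling error in the branch length shrinks as more sites are analysed
    # (normal approximation to a Poisson count of substitutions; an assumption).
    length_sd = math.sqrt(true_length / n_sites)
    ages = []
    for _ in range(n_draws):
        length = max(rng.gauss(true_length, length_sd), 1e-12)
        # Rate uncertainty does not depend on the amount of sequence data.
        rate = rng.lognormvariate(math.log(median_rate), rate_log_sd)
        ages.append(length / rate)
    ages.sort()
    lower = ages[int(0.025 * n_draws)]
    upper = ages[int(0.975 * n_draws)]
    return (upper - lower) / statistics.median(ages)


if __name__ == "__main__":
    for n_sites in (100, 10_000, 1_000_000, 100_000_000):
        print(n_sites, round(relative_age_interval(n_sites), 3))
```

Under these assumptions, the interval narrows as sites are added and then levels off at a width set by the rate uncertainty alone, which is the sense in which additional data give diminishing returns.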
Acknowledgments
The author thanks Sebastián Duchêne for useful discussions and Paul Craze, Tracy Heath, and anonymous reviewers for their constructive comments on the manuscript. S.Y.W.H. was supported by the Australian Research Council.

Appendix A. Supplementary data
Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.tree.2014.07.004.

References
1 Zuckerkandl, E. and Pauling, L. (1962) Molecular disease, evolution, and genic heterogeneity. In Horizons in Biochemistry (Kasha, M. and Pullman, B., eds), pp. 189–225, Academic Press
2 Kimura, M. (1968) Evolutionary rate at the molecular level. Nature 217, 624–626
3 Ohta, T. (1972) Evolutionary rate of cistrons and DNA divergence. J. Mol. Evol. 1, 150–157
4 Sarich, V.M. and Wilson, A.C. (1967) Immunological time scale for hominid evolution. Science 158, 1200–1203
5 Runnegar, B. (1982) A molecular-clock date for the origin of the animal phyla. Lethaia 15, 199–205
6 Doolittle, R.F. et al. (1996) Determining divergence times of the major kingdoms of living organisms with a protein clock. Science 271, 470–477
7 Welch, J.J. and Bromham, L. (2005) Molecular dating when rates vary. Trends Ecol. Evol. 20, 320–327
8 Kumar, S. (2005) Molecular clocks: four decades of evolution. Nat. Rev. Genet. 6, 654–662
9 Bromham, L. and Penny, D. (2003) The modern molecular clock. Nat. Rev. Genet. 4, 216–224
10 Gillespie, J.H. (1991) The Causes of Molecular Evolution, Oxford University Press
11 Gaut, B. et al. (2011) The patterns and causes of variation in plant nucleotide substitution rates. Annu. Rev. Ecol. Evol. Syst. 42, 245–266
12 Dickerson, R.E. (1971) The structures of cytochrome c and the rates of molecular evolution. J. Mol. Evol. 1, 26–45
13 Bromham, L. (2011) The genome as a life-history character: why rate of molecular evolution varies between mammal species. Philos. Trans. R. Soc. Lond. B 366, 2503–2513
14 Takahata, N. (1987) On the overdispersed molecular clock. Genetics 116, 169–179
15 Cutler, D.J. (2000) Understanding the overdispersed molecular clock. Genetics 154, 1403–1417
16 Snir, S. et al. (2012) Universal pacemaker of genome evolution. PLOS Comput. Biol. 8, e1002785
17 Wolf, Y.I. et al. (2013) Stability along with extreme variability in core genome evolution. Genome Biol. Evol. 5, 1393–1402
18 Snir, S. et al. (2014) Universal pacemaker of genome evolution in animals and fungi and variation of evolutionary rates in diverse organisms. Genome Biol. Evol. 6, 1268–1278
19 Brown, R.P. and Yang, Z. (2011) Rate variation and estimation of divergence times using strict and relaxed clocks. BMC Evol. Biol. 11, 271
20 Heath, T.A. et al. (2012) A Dirichlet process prior for estimating lineage-specific substitution rates. Mol. Biol. Evol. 29, 939–955
21 Linder, M. et al. (2011) Evaluation of Bayesian models of substitution rate evolution–parental guidance versus mutual independence. Syst. Biol. 60, 329–342
22 Ho, S.Y.W. (2009) An examination of phylogenetic models of substitution rate variation among lineages. Biol. Lett. 5, 421–424
23 Lepage, T. et al. (2007) A general comparison of relaxed molecular clock models. Mol. Biol. Evol. 24, 2669–2680
24 Ho, S.Y.W. et al. (2005) Accuracy of rate estimation using relaxed-clock models with a critical focus on the early metazoan radiation. Mol. Biol. Evol. 22, 1355–1363
25 Duchêne, S. et al. (2014) The impact of calibration and clock-model choice on molecular estimates of divergence times. Mol. Phylogenet. Evol. 78, 277–289
26 Worobey, M. et al. (2014) A synchronized global sweep of the internal genes of modern avian influenza virus. Nature 508, 254–257
27 Battistuzzi, F.U. et al. (2010) Performance of relaxed-clock methods in estimating evolutionary divergence times and their credibility intervals. Mol. Biol. Evol. 27, 1289–1300
28 Yang, Z. (1994) Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39, 306–314
29 dos Reis, M. et al. (2012) Phylogenomic datasets provide both precision and accuracy in estimating the timescale of placental mammal phylogeny. Proc. R. Soc. Lond. B 279, 3491–3500
30 Thorne, J.L. and Kishino, H. (2002) Divergence time and evolutionary rate estimation with multilocus data. Syst. Biol. 51, 689–702
31 dos Reis, M. et al. (2014) The impact of the rate prior on Bayesian estimation of divergence times with multiple loci. Syst. Biol. 63, 555–565
32 Smith, N.G. and Eyre-Walker, A. (2003) Partitioning the variation in mammalian substitution rates. Mol. Biol. Evol. 20, 10–17
33 Ho, S.Y.W. and Lanfear, R. (2010) Improved characterisation of among-lineage rate variation in cetacean mitogenomes using codon-partitioned relaxed clocks. Mitochondrial DNA 21, 138–146
34 Duchêne, S. et al. (2014) ClockstaR: choosing the number of relaxed-clock models in molecular phylogenetic analysis. Bioinformatics 30, 1017–1019
35 Lanfear, R. et al. (2012) Partitionfinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol. Biol. Evol. 29, 1695–1701
36 Bininda-Emonds, O.R.P. et al. (2007) The delayed rise of present-day mammals. Nature 446, 507–512
37 Jetz, W. et al. (2012) The global diversity of birds in space and time. Nature 491, 444–448
38 Tamura, K. et al. (2012) Estimating divergence times in large molecular phylogenies. Proc. Natl. Acad. Sci. U.S.A. 109, 19333–19338
39 Xia, X. and Yang, Q. (2011) A distance-based least-square method for dating speciation events. Mol. Phylogenet. Evol. 59, 342–353
40 Smith, S.A. and O’Meara, B.C. (2012) treePL: divergence time estimation using penalized likelihood for large phylogenies. Bioinformatics 28, 2689–2690
41 Britton, T. et al. (2007) Estimating divergence times in large phylogenetic trees. Syst. Biol. 56, 741–752
42 Pacheco, M.A. et al. (2011) Evolution of modern birds revealed by mitogenomics: timing the radiation and origin of major orders. Mol. Biol. Evol. 28, 1927–1942
43 Martin, M.D. et al. (2014) Persistence of the mitochondrial lineage responsible for the Irish potato famine in extant New World Phytophthora infestans. Mol. Biol. Evol. 31, 1414–1420
44 Cutter, A.D. (2008) Divergence times in Caenorhabditis and Drosophila inferred from direct estimates of the neutral mutation rate. Mol. Biol. Evol. 25, 778–786
45 Heath, T.A. et al. (2008) Taxon sampling affects inferences of macroevolutionary processes from phylogenetic trees. Syst. Biol. 57, 160–166
46 Lemmon, A.R. et al. (2009) The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Syst. Biol. 58, 130–145
47 Filipski, A. et al. (2014) Prospects for building large timetrees using molecular data with incomplete gene coverage among species. Mol. Biol. Evol. (in press)
48 Genome 10K Community of Scientists (2009) Genome 10K: a proposal to obtain whole-genome sequence for 10 000 vertebrate species. J. Hered. 100, 659–674
49 i5K Consortium (2013) The i5K Initiative: advancing arthropod genomics for knowledge, human health, agriculture, and the environment. J. Hered. 104, 595–600
50 Guindon, S. (2010) Bayesian estimation of divergence times from large sequence alignments. Mol. Biol. Evol. 27, 1768–1781
51 Thorne, J.L. et al. (1998) Estimating the rate of evolution of the rate of molecular evolution. Mol. Biol. Evol. 15, 1647–1657
52 dos Reis, M. and Yang, Z. (2011) Approximate likelihood calculation on a phylogeny for Bayesian estimation of divergence times. Mol. Biol. Evol. 28, 2161–2172
53 Darriba, D. et al. (2013) Boosting the performance of Bayesian divergence time estimation with the Phylogenetic Likelihood Library. In 2013 IEEE 27th International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, pp. 539–548, IEEE Computer Society Washington
54 Ayres, D.L. et al. (2012) BEAGLE: an application programming interface and high-performance computing library for statistical phylogenetics. Syst. Biol. 61, 170–173
55 Yang, Z. and Rannala, B. (2006) Bayesian estimation of species divergence times under a molecular clock using multiple fossil calibrations with soft bounds. Mol. Biol. Evol. 23, 212–226
56 Britton, T. (2005) Estimating divergence times in phylogenetic trees without a molecular clock. Syst. Biol. 54, 500–507
57 dos Reis, M. and Yang, Z. (2013) The unbearable uncertainty of Bayesian divergence time estimation. J. Syst. Evol. 51, 30–43
58 Rannala, B. and Yang, Z. (2007) Inferring speciation times under an episodic molecular clock. Syst. Biol. 56, 453–466
59 Wheat, C.W. and Wahlberg, N. (2013) Critiquing blind dating: the dangers of over-confident date estimates in comparative genomics. Trends Ecol. Evol. 28, 636–642
60 Lanfear, R. et al. (2010) Watching the clock: studying variation in rates of molecular evolution between species. Trends Ecol. Evol. 25, 495–503
61 Lartillot, N. and Poujol, R. (2011) A phylogenetic model for investigating correlated evolution of substitution rates and continuous phenotypic characters. Mol. Biol. Evol. 28, 729–744
62 Wang, M. et al. (2011) A universal molecular clock of protein folds and its power in tracing the early history of aerobic metabolism and planet oxygenation. Mol. Biol. Evol. 28, 567–582
63 Ronquist, F. et al. (2012) A total-evidence approach to dating with fossils, applied to the early radiation of the Hymenoptera. Syst. Biol. 61, 973–999
64 Wilkinson, R.D. et al. (2011) Dating primate divergences through an integrated analysis of palaeontological and molecular data. Syst. Biol. 60, 16–31
65 Heath, T.A. et al. (2014) The fossilized birth-death process for coherent calibration of divergence-time estimates. Proc. Natl. Acad. Sci. U.S.A. http://dx.doi.org/10.1073/pnas.1319091111
66 Paradis, E. (2013) Molecular dating of phylogenies by likelihood methods: a comparison of models and a new information criterion. Mol. Phylogenet. Evol. 67, 436–444
67 Donoghue, P.C. and Benton, M.J. (2007) Rocks and clocks: calibrating the Tree of Life using fossils and molecules. Trends Ecol. Evol. 22, 424–431
68 Sanderson, M.J. (1997) A nonparametric approach to estimating divergence times in the absence of rate constancy. Mol. Biol. Evol. 14, 1218–1231
69 Ho, S.Y.W. and Phillips, M.J. (2009) Accounting for calibration uncertainty in phylogenetic estimation of evolutionary divergence times. Syst. Biol. 58, 367–380
70 Parham, J.F. et al. (2012) Best practices for justifying fossil calibrations. Syst. Biol. 61, 346–359
71 Heath, T.A. (2012) A hierarchical Bayesian model for calibrating estimates of species divergence times. Syst. Biol. 61, 793–809
72 Nowak, M.D. et al. (2013) A simple method for estimating informative node age priors for the fossil calibration of molecular divergence time analyses. PLoS ONE 8, e66245
73 Lee, M.S.Y. et al. (2009) Phylogenetic uncertainty and molecular clock calibrations: a case study of legless lizards (Pygopodidae, Gekkota). Mol. Phylogenet. Evol. 50, 661–666
74 Drummond, A.J. et al. (2003) Measurably evolving populations. Trends Ecol. Evol. 18, 481–488
75 Shapiro, B. et al. (2011) A Bayesian phylogenetic method to estimate unknown sequence ages. Mol. Biol. Evol. 28, 879–887
76 Molak, M. et al. (2013) Phylogenetic estimation of timescales using ancient DNA: the effects of temporal sampling scheme and uncertainty in sample ages. Mol. Biol. Evol. 30, 253–262
77 Drummond, A.J. et al. (2012) Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol. Biol. Evol. 29, 1969–1973
78 Ronquist, F. et al. (2012) MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 61, 539–542
79 Yang, Z. (2007) PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591
80 Lartillot, N. et al. (2009) PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25, 2286–2288
81 Heled, J. and Drummond, A.J. (2012) Calibrated tree priors for relaxed phylogenetics and divergence time estimation. Syst. Biol. 61, 138–149
82 Hedin, M. et al. (2012) Phylogenomic resolution of paleozoic divergences in harvestmen (Arachnida, Opiliones) via analysis of next-generation transcriptome data. PLoS ONE 7, e42888
83 Stamatakis, A. (2014) RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313
84 The 1000 Genomes Project Consortium (2012) An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65
85 Martin, M.D. et al. (2013) Reconstructing genome evolution in historic samples of the Irish potato famine pathogen. Nat. Commun. 4, 2172
86 Kubatko, L.S. and Degnan, J.H. (2007) Inconsistency of phylogenetic estimates from concatenated data under coalescence. Syst. Biol. 56, 17–24
87 Degnan, J.H. and Rosenberg, N.A. (2009) Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends Ecol. Evol. 24, 332–340
88 Heled, J. and Drummond, A.J. (2010) Bayesian inference of species trees from multilocus data. Mol. Biol. Evol. 27, 570–580
89 Yang, Z. and Rannala, B. (2010) Bayesian species delimitation using multilocus sequence data. Proc. Natl. Acad. Sci. U.S.A. 107, 9264–9269
90 Shapiro, B. and Ho, S.Y.W. (2014) Ancient hyaenas highlight the old problem of estimating evolutionary rates. Mol. Ecol. 23, 499–501
91 Ho, S.Y.W. et al. (2011) Time-dependent rates of molecular evolution. Mol. Ecol. 20, 3087–3101
92 Pulquério, M.J. and Nichols, R.A. (2007) Dates from the molecular clock: how wrong can we be? Trends Ecol. Evol. 22, 180–184
93 Ho, S.Y.W. and Larson, G. (2006) Molecular clocks: when times are a-changin’. Trends Genet. 22, 79–83
94 Cooper, A. and Penny, D. (1997) Mass survival of birds across the Cretaceous–Tertiary boundary: molecular evidence. Science 275, 1109–1113
95 Meredith, R.W. et al. (2011) Impacts of the Cretaceous Terrestrial Revolution and KPg extinction on mammal diversification. Science 334, 521–524
96 Archibald, J.D. and Deutschman, D.H. (2001) Quantitative analysis of the timing of the origin and diversification of extant placental orders. J. Mamm. Evol. 8, 107–124
