Goloboff Et Al 2008 - TNT, A Free Program For Phylogenetic Analysis

Cladistics
Cladistics 24 (2008) 774–786

10.1111/j.1096-0031.2008.00217.x
TNT, a free program for phylogenetic analysis

Pablo A. Goloboff (a,*), James S. Farris (b) and Kevin C. Nixon (c)
(a) CONICET, INSUE, Instituto Miguel Lillo, Miguel Lillo 205, 4000 S.M. de Tucumán, Argentina;
(b) Molekylärsystematiska laboratoriet, Naturhistoriska riksmuseet, Box 50007, S-104 Stockholm, Sweden;
(c) L.H. Bailey Hortorium, Cornell University, Ithaca, NY 14853-4301, USA
Accepted 6 January 2008
Abstract
The main features of the phylogeny program TNT are discussed. Windows versions have a menu interface, while Macintosh and
Linux versions are command-driven. The program can analyze data sets with discrete (additive, non-additive, step-matrix) as well as
continuous characters (evaluated with Farris optimization). Effective analysis of large data sets can be carried out in reasonable
times, and a number of methods to help identify wildcard taxa in the case of ambiguous data sets are implemented. A variety of
methods for diagnosing trees and exploring character evolution is available in TNT, and publication-quality tree-diagrams can be
saved as metafiles. Through the use of a number of native commands and a simple but powerful scripting language, TNT gives the
user enormous flexibility in phylogenetic analyses or simulations.
© The Willi Hennig Society 2008.
*Corresponding author. E-mail address: pablogolo@csnat.unt.edu.ar

Introduction

Since the first breakthrough in parsimony analysis with the release of Hennig86 (Farris, 1988), parsimony
programs have continued to improve, culminating in TNT (Goloboff et al., 2003b), which includes several new
methods to facilitate phylogenetic analysis (for reviews see Hovenkamp, 2004; Giribet, 2005; Meier and Ali,
2005). Under an agreement between the Willi Hennig Society and the authors, TNT is now available as a free
program. A version of TNT licensed for single-user, academic use can be downloaded from
http://www.zmuc.dk/public/phylogeny/TNT. The purpose of the present paper is to provide a general overview
of the program and some guidance for beginners.

General interface: menus or commands

TNT is a fully interactive program. All versions (Windows, Linux and Macintosh) can be run by means of
commands. In addition, the Windows version has a sophisticated menu interface or GUI (Figs 1–5), which
makes TNT the only major phylogeny program that can be menu-driven under Windows. Using emulators or
alternative implementations of the Windows APIs (such as WINE or CrossOver Mac), the Windows version of
TNT can also be run under Linux or OSX. Note that the menu-driven PAUP* (Swofford, 2002) is a Classic
Macintosh program, and as such it is no longer supported on the latest version of OSX or any Intel Macintosh.

Commands can also be read from data files or instruction files, providing an easy way to automate routines.
The on-line help (help command) includes a list of all the TNT commands and their options. Throughout,
command options (available in all versions) are indicated as italics and bold (e.g. procedure) and menu
options (available only in Windows versions) are indicated as bold (e.g. File/OpenInputFile). To save space,
only the most important and common options are indicated.

A basic analysis

The input data file can be in either TNT or basic Nexus format. The TNT format is derived from
Hennig86/NONA with significant enhancements making it relatively backwards compatible (requiring at the
most a little editing to eliminate commands extraneous to TNT). The data file should start with a
specification of the data type, which in the case of DNA data is nstates dna;. Next comes the xread command,
followed by the number of characters, the number of taxa, and the data themselves (sequence data must be
prealigned). Character states may be IUPAC codes, digits (for morphological characters), ? (for missing
data), or - (for gaps).

Fig. 1. Main menu options of TNT.

To read in the data file, select File/OpenInputFile or type in procedure datafilename. Usually you will name
your data file with a .tnt extension, such as mydata.tnt. Before analyzing the data, you should make
provision for saving the output to a log file. This can be done by selecting File/Output/OpenOutput or by
entering log logfilename. Usually you should give a log file the .out
extension and the same base name as the data file (something like mydata.out).

Fig. 2. Dialogs for handling internal tree file: A, filtering, and B, creating groups of trees.

All program output is temporarily saved to an internal text-buffer. This text-buffer can be saved to a newly
opened log file by selecting File/Output/SaveDisplayBuffer. In command-driven versions, the text-buffer can
be inspected with the view command; in the Windows version, the default window displays the text-buffer
automatically.

A basic analysis consists of using multiple addition sequences followed by branch swapping. To do this
select Analyze/TraditionalSearch with default options (just click Search when the option dialog appears) or
enter mult. This will perform 10 random addition sequences followed by branch-swapping, saving up to
10 trees per replication (roughly equivalent to a “heuristic search” with random addition sequences in
PAUP*, or hold/10;mult*10; in NONA). In the case of small data sets (15–30 taxa), exact solutions using
branch-and-bound (which guarantee finding all trees optimal under current settings) can be produced within
reasonable times with Analyze/ImplicitEnumeration or the ienum command.

Once calculated, trees may be viewed by selecting Tree/Display or by entering tplot. In the Windows version,
the tree can be saved as an extended metafile (a publication-quality image file which can be exported to
PowerPoint, CorelDraw, etc.), by pressing “M” while viewing the tree-diagram (see below). To save the trees
for later reanalysis, create a save file by selecting File/
TreeSaveFile/Open or by entering tsave * savefilename. Usually you will name your save file with a .tre
extension, so that its name will be something like mydata.tre. If there are multiple trees, their consensus
can be found by selecting Trees/Consensus or by entering nelsen.

The synapomorphies common to several trees can be plotted on their consensus by selecting
Optimize/Synapomorphies/MapCommon or by entering apo [ . Bremer supports can be calculated by using a script
(a program written in the scripting language, as explained later) contained in a file called bremer.run,
with instructions for TNT to calculate Bremer supports using either searches for suboptimal trees,
constraints for non-monophyly, or combinations of both methods. To run this script select File/OpenInputFile
or enter procedure bremer.run. Resampling (jackknifing, bootstrapping, etc.) can be done by selecting
Analyze/Resampling or by entering resample.

Quantitative and morphological characters, and data editing

In addition to DNA sequences, data can also consist of morphological characters with up to 32 states
(alphanumerical codes), continuous characters (values from 0 to 65, with three decimals), or protein
sequences (IUPAC codes), possibly combined (each one must be placed in a different block of data, preceded by
indication of the corresponding format). Despite the fact that continuous characters are so common in
morphological data sets, all other programs for phylogenetic analysis require that continuous characters be
forced into characters with discrete states; TNT instead optimizes continuous characters as such (Goloboff
et al., 2006).

In Windows versions, it is possible to edit the data (selecting Data/Edit, either on a character-by-character
basis, Fig. 3, or on a taxon-by-taxon basis); if characters/states have been named, this facilitates
inspecting and proof-checking the data, without the need to look at an alphanumeric matrix. Data editing in
command-driven versions is limited, and can only be done one cell at a time.

Fig. 3. Dialog for editing data on a character-by-character basis.

Groups of taxa, trees, and characters

In commands that require selections of trees, taxa or characters, it is possible to specify them one by one,
or by referring to tree, taxon, or character groups (Fig. 2B shows the dialog for defining tree groups),
which can be defined by means of tgroup (for trees), agroup (for taxa) and xgroup (for characters). Taxa,
characters, and trees are numbered starting from 0, so that for N elements, the last is numbered N − 1. When
a command expects a list, enclosing the name (possibly truncated) or number of a group in curly braces { }
is equivalent to listing all the members of the group. Commands that generate, modify or read trees from
files automatically place the trees in tree groups, which makes subsequent manipulations of sets of trees
easier. For example, the “nelsen *;” option calculates and saves a strict consensus tree to memory
(automatically placing it in a group called “strict_consensus”), and the tnodes command counts the number of
nodes in subsequently listed trees, so that “nelsen *; tnodes {strict };” counts the number of nodes
in the strict consensus.
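The selection rules just described (numbering from 0, and a group name in curly braces standing for all the members of the group) can be modeled with a short Python sketch. This is not TNT code and does not reproduce TNT's own parser: the function name expand_selection, the dictionary used to hold the groups, and the rule that a truncated name must match exactly one group are assumptions made for illustration. References to groups by number are not handled.

```python
def expand_selection(tokens, groups):
    """Expand a selection list into 0-based element numbers.

    tokens: list of strings, each a number ("3") or a group reference ("{strict}").
    groups: dict mapping a group name to a list of 0-based member numbers
            (a hypothetical data structure, not TNT's internal one).
    """
    selected = []
    for token in tokens:
        if token.startswith("{") and token.endswith("}"):
            key = token[1:-1].strip()
            # Assumed rule: a truncated name must identify exactly one group.
            matches = [name for name in groups if name.startswith(key)]
            if len(matches) != 1:
                raise ValueError(f"group reference {token!r} is unknown or ambiguous")
            selected.extend(groups[matches[0]])
        else:
            # Plain element number, counted from 0.
            selected.append(int(token))
    return selected


# Hypothetical groups: tree 10 holds the strict consensus, taxa 0 and 1 form a group.
example_groups = {"strict_consensus": [10], "outgroup": [0, 1]}
print(expand_selection(["{strict }", "4", "{outgroup}"], example_groups))
```

In this model, "{strict }" resolves to the single member of the group named strict_consensus, in the same way that the truncated reference in the tnodes example above stands for the consensus tree.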
It is also possible to specify groups of taxa by referring to specific nodes of a tree (i.e. @T N refers to
all of the taxa descended from node N of tree T), and trees can be edited with the edit command.

Fig. 4. Dialogs for setting optimality criteria (A), branch-swapping and random addition sequences (B), and
new technology searches (C).

Tree viewing and editing: producing publication-quality output

In Windows versions, commands or menu options that output tree diagrams display the trees in a separate
window to “pre-view” the trees. From the previewing window the user can decide whether the tree diagram is
to be discarded, written to the text buffer or log file, or saved as a metafile (i.e., a publication-quality
image file). If a metafile is opened before displaying the tree (with File/Output/OpenMetafile), the tree
diagram goes there automatically. The previewing window also allows mapping characters in color, or defining
specific legends or colors for tree branches, by a double right-click on a node, when the
tree is unlocked.
Windows versions also allow the graphical editing and selecting of taxa. Selecting Trees/View switches to
tree-viewing mode. Under tree-viewing mode, a left-click on the node of a tree will select (in red) or
deselect (in green) all the taxa descended from the node, so that the dialog for taxon selection (under any
menu option which can select some taxa) will initially display that selection. In tree-viewing mode trees may
be locked or unlocked. This is controlled by the padlock toggle switch in the toolbar or by selecting
Settings/LockTrees. If the tree is locked, right-clicking on a node shows a list of synapomorphies for the
node (if characters/states have been named, and Format/UseCharacterNames is selected, it then uses character
names). If the tree is unlocked, it can be edited using the mouse.

Fig. 5. Dialogs for calculating consensus trees (A), Bremer supports (B), and resampling (C).

Optimality criteria and character types

TNT implements different criteria for parsimony analysis (Fig. 4A). Analyses can be carried out either using
equal weights (default), weights predefined by the user, implied weights (Goloboff, 1993; either with
standard or with user-defined functions of the homoplasy), or self-weighted optimization (i.e. dynamic
weighting of state transformations, using the method of Goloboff, 1997). The scripting language can be used
to search iteratively (as in successive weighting, Farris, 1969; dynamic weighting, Williams and Fitch, 1990;
or support weighting, Farris, 2001), reassigning weights to either characters or state transformations, in
specific ways determined by the user.
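To make the idea of weighting by a function of the homoplasy more concrete, the following Python sketch scores trees with a concave fit function of the form k / (k + e), the form usually associated with implied weighting (Goloboff, 1993), where e is the number of extra steps of a character and k is a concavity constant. It is only an illustration under stated assumptions: the function name, the value k = 3 and the homoplasy counts are hypothetical, and the sketch says nothing about how TNT computes or searches under this criterion.

```python
def implied_weights_fit(extra_steps, k=3.0):
    """Total fit of a tree: sum over characters of k / (k + e),
    where e is the number of extra steps (homoplasy) of each character.

    k is the concavity constant; the default of 3.0 is an assumption
    chosen for this illustration.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(k / (k + e) for e in extra_steps)


# Hypothetical homoplasy counts for four characters on two trees.
# Both trees require 4 extra steps in total, so equal weights cannot separate them.
tree_a = [4, 0, 0, 0]  # homoplasy concentrated in a single character
tree_b = [1, 1, 1, 1]  # homoplasy spread over all characters

fit_a = implied_weights_fit(tree_a)
fit_b = implied_weights_fit(tree_b)
print(round(fit_a, 3), round(fit_b, 3))
```

With these made-up counts, tree_a obtains the higher total fit: each additional step in an already homoplastic character costs less than the first extra step in a character that was perfectly congruent, which is the sense in which the weights are implied by the homoplasy itself.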
If characters or state-transformations have been assigned differential weights, then these weights or costs
are considered during implied weighting or self-weighted optimization. TNT can optimize additive (Farris,
1970) and nonadditive (Fitch, 1971) characters (“ordered” and “unordered” of PAUP*), as well as Sankoff
(step-matrix; Sankoff and Rousseau, 1975) characters with any metric costs. Character state-trees can easily
be defined in the form of a diagram using ASCII characters (with Data/CharacterSettings, selecting
“Character-state-tree”, or with the cstree command); thus in
012
n
534
the cost of changing between two states is simply the number of lines on the shortest path between them.

Searching for most parsimonious trees: basic methods

In addition to the basic approach described earlier (Fig. 4B), branch-swapping can be applied to any starting
tree (with the bbreak command, or selecting as starting trees “trees from RAM” instead of “wagner trees”).
For most medium-sized data sets, the best approach is to run a number of random addition sequences (RAS),
each with TBR¹ branch-swapping, until the best score is hit 10 to 20 times independently. This is usually
sufficient for all global optima (“islands”, as they were once called) to be found. Note here that if a
search is repeated several times, the same random seed will be used unless changed explicitly, either with
the rseed command or from the same dialog box in Windows versions. When the program reports that “some
replications overflowed” and the goal is to find all most parsimonious trees for the data set at hand,
subsequent branch-swapping from the trees produced by RAS+TBR can be used after setting the maximum trees to
save to a large number, with Settings/Memory or with the hold command. For data sets that produce very large
numbers of equally parsimonious trees, saving all of them is (as noted by Farris et al., 1996) very
impractical; in those cases, similar results can be obtained if best scores are found independently a
significant number of times, and the results are then strict-consensed.

¹ “Tree bisection reconnection” is a synonym of “branch-breaking” (Farris, 1988; see Goloboff, 1999,
footnote 1), the swapping method of Hennig86, from which the name of the TNT command, bbreak, is derived.

The branch-swapping algorithms of TNT are very efficient. For medium-sized data sets, RAS+TBR searches
typically proceed 5 to 10 times faster than in PAUP* or Nona/Pee-Wee (Goloboff, 1994a,b). For large data
sets, the difference in speed is often much larger (e.g. Goloboff and Pol, 2007, for two data sets with
c. 11 000 and 14 000 taxa, report speed differences in TBR of 300 and 900 times, respectively).

A stricter collapsing of zero-length branches improves tree-searches

Searches optionally collapse zero-length branches under different criteria or retain all distinct
dichotomous trees. The default collapsing rule in TNT is to eliminate any branch for which the minimum
possible length (among alternative most parsimonious reconstructions) is zero. As discussed by Goloboff
(1996) and Davis et al. (2004), this criterion produces more effective searches than criteria which collapse
fewer branches, both in terms of time needed to complete searches, and ability to find shortest trees when
doing multiple RAS+TBR saving limited numbers of trees per replication. Under this criterion, the
polytomized trees may become longer (see Coddington and Scharff, 1994). Unless explicitly asked to polytomize
the trees after a search (either by ticking on the corresponding option, or with the collapse auto option),
TNT will retain the trees as dichotomous, so that re-optimizing them will produce minimum length. Note that
even when TNT does not collapse the trees, the program makes sure that all the trees saved would be
different if they were collapsed. When trees are (by default) retained as binary, consensus calculation
re-optimizes the trees, temporarily eliminating zero-length branches. If the trees are collapsed after the
search, the consensus calculation should avoid re-collapsing the trees (see below, under Tree Comparisons).

Tree-searches in large and complex data sets: the new technology options

For large and very complex data sets, TNT also implements several algorithms that are much more effective
than simple branch-swapping (Fig. 4C). Most of these algorithms were introduced in TNT, and make TNT the only
program capable of reliably analyzing data sets with more than a few hundred taxa. Using these new
algorithms TNT may require only a hundredth or a thousandth of the time needed by PAUP* to find trees of
minimum length. A recent example is Goloboff et al.'s (submitted) reanalysis of McMahon and Sanderson's
(2006) 2228-taxon data set: trees of the best length ever found with the ratchet under PAUP* in 1700 h of
CPU-time were found by TNT, on average, in 30 min.

The ratchet (Nixon, 1999) and tree-drifting (Goloboff, 1999) use a cyclic scheme of perturbation and search
phases. Tree-fusing (Goloboff, 1999) evaluates sub-tree
exchanges between trees, effecting those which improve score. Sectorial searches (Goloboff, 1999) create
reduced data sets (using down-pass state-sets for each node), and subject the reduced data set to a search
algorithm (in TNT, any specific search command, including further subdivision in sectors, can be used). For
extremely large data sets, sectorial searches are the most effective means for quickly finding near-optimal
trees (see Goloboff and Pol, 2007). These algorithms can be applied to preexisting trees (with the commands
ratchet, drift, sectsch and tfuse, or selecting Analyze/NewTechSearch and “RAM” for “Get trees from”, see
Fig. 4C), or to trees created de novo with multiple RAS (xmult command, or selecting “Driven search” or
“Random addition sequences”). The “driven search” continues until the specified tree-score (or the best
score found during the search, if no target score was specified) is hit a given number of times, or until
the consensus (re-calculated as new hits to the best score are produced) becomes stable. The latter provides
a way to produce accurate consensus trees (especially if the trees are collapsed more strictly, see below)
without saving all possible equally parsimonious trees. Note that the consensus stabilization will produce
more reliable results, or with fewer hits to minimum length, when the trees are collapsed more strictly (the
best choice is TBR-collapsing; see Goloboff, 1999).

In the case of impossibly large data sets (or easier data sets but very impatient users), a conservative
estimate of the actual consensus of most parsimonious trees that does not require actually finding them
(i.e. a quick consensus estimation; Goloboff and Farris, 2001) can be produced by selecting
Analyze/EstimateConsensus or with the qnelsen command. In addition to the built-in routines for
tree-searches, the scripting language of TNT allows users to devise their own search strategies.

Constrained searches, suboptimal trees

Tree searches can be performed under either positive or negative constraints, so that only trees either
having, or not having, certain specified groups are acceptable. This can be useful for calculating Bremer
supports. The Windows version has a uniquely simple way to define constraints, by just clicking on tree
nodes, with Data/DefineConstraints. In command-driven versions, constraints can be defined by reference to
trees or taxon groups in the force command. Once defined, constraints must be explicitly applied to
searches, either by ticking on the corresponding box, or with the constrain command. In the case of PAUP*,
searches can use either positive or negative constraints, but not both at the same time; TNT can use both
positive and negative constraints.

For calculation of Bremer supports, or for other purposes, the program can search for suboptimal trees
(based on either fit difference and/or relative fit difference, of Goloboff and Farris, 2001). The maximum
acceptable difference in score is set with Analyze/Suboptimal or the subopt command.

Character optimization: diagnosis and mapping

Character mapping and the diagnosis of the trees obtained are one of the main components of a cladistic
analysis. TNT can either produce lists of synapomorphies or character changes, or display those results on
tree diagrams (with the menu options Optimize/Synapomorphies and Optimize/Characters, or with the commands
apo and map). When multiple most parsimonious trees are used to calculate consensus trees, the consensus is
longer, so that optimizing it for producing synapomorphy lists or studying character evolution is (while
possible) inappropriate. In this case TNT allows an optimization of the multiple most parsimonious trees
individually, displaying on the consensus tree a summary of the changes that are common to all the trees
used to produce the consensus. This is done with the Common Synapomorphies or Common Changes options, which
are also available as options of the apo and map commands.

TNT also implements options for counting specific transformations (e.g., are gains more common than
losses?), and enumerating all possible most parsimonious reconstructions, in the case of ambiguity. The
scripting language can access these options as well as handling individual reconstructions (or finding the
best state assignments with a fixed state at one or more nodes, for a given tree; see the documentation for
the iterrecs command).

Finding wildcard taxa, tree comparisons, consensus trees

The options for comparing trees and summarizing results are one of the most important aspects of TNT. The
standard methods for consensus (strict, semi-strict or combinable components, majority, and frequency
differences) are accessed with Trees/Consensus (Fig. 5A), or with the commands nelsen, comcomp, majority and
freqdifs. As noted above, the trees are (by default) temporarily collapsed as the consensus is calculated
(needed if the trees are retained as binary after searches, which is the default); whether trees are
temporarily collapsed is changed with Settings/ConsenseOptions, or the collapse notemp command. If the trees
have different taxon sets, then the default action (changed with Settings/ConsenseOptions or the unshared
command) is pruning all the trees so that they have
identical taxon sets. Poorly resolved consensus trees (or some groups with low frequencies, in the case of
majority rule or frequency difference trees, often used to measure group supports; see below) may be caused
by just one or a few taxa moving among alternative distant positions in the input trees.

TNT implements several options that facilitate identifying the taxa responsible for the lack of resolution,
or responsible for the low group supports:

(a) automatic evaluation of alternative taxon prunings, counting the numbers of nodes gained in either the
strict or semistrict consensus, and reporting those prunings which improve the results
(Trees/Comparisons/PrunedTrees, or prunnelsen and pruncom commands); since the alternative prunings are
tried combinatorially, this option may (depending on the resolution of the consensus trees) become very
time-consuming beyond five or six taxa (or clades) eliminated at the same time;

(b) agreement subtrees, which identify the largest subset of taxa which are identically related in all input
trees (Trees/Comparisons/AgreementSubtrees, or prunnelsen). This is often used also as a measure of
similarity between the input trees;

(c) a heuristic command which, given the groups in a reference tree, attempts to identify the taxa to prune
to increase group frequencies (this is the method used by Goloboff et al. (submitted), implemented with the
prunmajor command);

(d) TBR-tracking (chkmoves command), which submits a tree to TBR branch-swapping, and records (for each
clade) the number of possible positions, maximum distance and depth of rerooting, for all the moves that
produce equally optimal trees (or trees within a specified score difference); and

(e) calculation of approximate frequencies of group recovery with the rfreqs command, which is a sort of
majority rule, but calculates a similarity index between partitions (based on the composition of the taxa
inside and outside the group). The score is 1 for two identical (=monophyletic) groups, and decreases to the
extent that there are more differences. In this way, a group which is “almost” monophyletic (i.e. which could
be made monophyletic by ignoring the position of just a few terminals) in most of the trees will receive
scores approaching 100%. Groups that have scores near 100% are probably amenable to have their frequencies
improved by ignoring the position of few taxa (while groups with very low scores can, probably, be improvable
only by pruning large numbers of taxa).

Most of these commands allow identifying the taxa either visually or (perhaps by interacting with the
scripting language) by placing them in taxon groups, so that automation of routines to improve consensus
trees by using a combination of these basic approaches is possible.

TNT also natively supports semistrict consensus supertrees (Goloboff and Pol, 2002), as well as easy creation
of MRP matrices (for subsequent supertree analysis under either parsimony or cliques).

For simple comparisons between tree topologies, TNT allows checking (and reporting) of duplicate trees,
checking the groups present in one tree but not in another (a sort of “anticonsensus”, either with tcomp or
Trees/Comparisons/CompareGroups), and (if constraints have been defined) checking whether each of the groups
defined in the constraints is (or is not) monophyletic (mono, or Trees/Monophyly).

Another important component of the tree-comparison routines is found in the options which calculate different
coefficients of tree similarity. The natively implemented options for topological comparison are the
retention index (Farris, 1989) of the MRP of one tree mapped onto the other (tcomp command), which is a
variation of Farris (1973) “distortion coefficient”, and SPR-distances between trees (using the heuristic
method of Goloboff (2008), with the Trees/Comparisons/SPR-Distances, or sprdiff command). In conjunction with
the scripting language, it is also possible to implement Robinson–Foulds distances (Robinson and Foulds,
1981), triplet dissimilarity, number of steps in the MRP, number of “flippings” (changes to the MRP matrix),
etc.

Bootstrapping, jackknifing and Bremer support

For measuring group supports, TNT implements both Bremer supports and measures based on resampling
(jackknifing, bootstrapping or symmetric resampling). For resampling (Analyze/Resampling, see Fig. 5C, or
resample), the user may define any search routines (or use the instructions in a file or script) to analyze
each resampled data set. After analyzing each resampled data set, TNT will automatically compute the strict
consensus tree (collapsing for which is done according to the criterion in effect; as with consensus
stabilization, the best choice is TBR-collapsing; see Goloboff and Farris, 2001; and Goloboff et al., 2003a).
A summary of results is calculated at the end, and it is optionally possible to save the strict consensus
for each replication for subsequent manipulations and consensing.

For Bremer supports, finding suboptimal trees where the different groups are non-monophyletic is up to the
user. In simple data sets, just using the optimal trees as a starting point for searches, saving successively
larger sets of more suboptimal trees (make sure you select “trees from RAM” as starting trees, see Fig. 4B,
and tick on “stop when maxtrees hit”, or use the bbreak = fillonly command, so that the suboptimal trees are
not needlessly swapped). Increasing the value of suboptimal in several steps is important, because otherwise
the values of Bremer support will probably be overestimated: the
search for suboptimal trees is very likely to fill the tree buffer with very suboptimal trees, thus missing
most of the slightly suboptimal trees (which are needed to identify the least supported groups). Once the
optimal and suboptimal trees are stored in memory, the program checks minimum score differences to lose each
group (with Trees/BremerSupports, Fig. 5B, or the bsupport command) and plots them on a tree diagram.

For better supported groups or very noisy data sets, finding a tree not displaying a given group by just
accepting suboptimal trees may require saving enormous numbers of suboptimal trees, and it may be impossible
in the case of large data sets with hundreds of thousands of optimal trees. The best method is then
searching for trees lacking the group of interest, using negative constraints (see above). Creating
constraints and searching for each of the groups in the tree may be very tedious, but a simple script
distributed with TNT, Bremer.run, automates this task (creating the constraints and searching for every
group). The script automatically writes to the corresponding branches the difference, and plots the tree.

The Bremer.run script also implements an alternative means of calculating Bremer supports, which is probably
the only way to estimate Bremer supports for very large data sets. For each group for which the support is
to be calculated this method consists of calculating first the average tree-length for each of a number of
simple searches (e.g. 5 RAS+TBR saving a single tree) with the group constrained to be monophyletic. This is
then followed by a similar calculation, but with the group constrained to not be monophyletic. The difference
between the two averages constitutes an estimation of the Bremer supports. This uses a reasoning similar to
that in Farris et al.'s (1994) congruence test: if the positively and negatively constrained searches are (on
average) off by the same numbers of steps, then their difference in length equals the Bremer supports.

Trees may be plotted with multiple support measures (Bremer support, bootstrap frequency, jackknife
frequency, etc.) attached to each branch. This may be done by selecting Trees/MultipleTags/Store or the
command ttag= before running the support calculation, then plotting the trees with
Trees/MultipleTags/ShowSave or the command ttag. The multiple tags can also be saved, in the Windows version,
in extended metafile formats.

Randomization

For specific tests or simulations, TNT allows generation of random trees (Trees/RandomTrees or randtrees),
character permutation (xperm), or generation of data (random or simulated under Jukes-Cantor, xread). The
results of these can be used, with suitable scripts, to implement specific tests. An example is Goloboff
et al.'s (2006) analysis, in which the number of SPR moves corresponding to the consensus trees for
discrete-only and continuous-only data sets were compared to those for pairs of random trees, by means of
the script in Fig. 6. This script takes as arguments two numbers (the numbers of trees to compare), and
outputs the proportion of pairs of random trees which differ in as few (or fewer) SPR-moves (for further
explanation of scripting in TNT, see next section). Using scripts, a wide variety of tests can easily be
implemented.

Extending TNT's capabilities with scripts

The scripting language allows the making of decisions and the automation of processes, so that TNT
Fig. 6. Example of a simple script, to test whether the number of SPR-moves to interconvert two trees is greater than the number of moves to
interconvert two random trees.
can be programmed to perform specific tasks. This extends the capabilities of the program enormously. An example is Pickett et al.'s (2005) analysis, in which they automated tens of thousands of tree searches using TNT scripts.

Only the basic aspects of the scripting language will be explained here; a more detailed description is provided in the documentation that comes with the program.

If a script is placed in a file called xxx.run, in the current directory, then just typing xxx at the TNT command prompt will automatically execute file xxx.run, passing to the file any arguments given to xxx (argument number n is identified, within the file, as %n; argument 0 is the name of the file itself).

The basic components of the scripting language are:

(a) commands for flow control and decision-making: decisions are made with if, and repetitive actions are executed with the loop command (with the symbol # replaced by the current value of the loop's index variable, in every cycle).

(b) internal variables (or values calculated by TNT), accessible only within the context of scripting commands; for example, the expression ntrees is equivalent to the number of trees in memory minus 1 (recall that TNT numbers everything, except data blocks, starting from 0). Thus,

    loop 0 ntrees
    tplot #1 ;
    stop

will sequentially plot the tree diagram, for each of the trees in memory.

(c) variables defined by the user: these establish the connection between internal variables and the regular commands; they can also be used to store values of calculations. Writing the name (or number) of a user variable within quotes is equivalent to writing the value of the user variable.

(d) expressions: in the context of the scripting language, when a number is expected, a parenthetical expression (with operations between numbers, or logical comparisons) can be used instead of the number.

Flow-control and decision making

The if command has the following syntax:

    if (expression)
    action(s) A ;
    else
    action(s) B ;
    end

If the expression following the if is different from 0, then action(s) A are executed; otherwise, action(s) B are executed. The else and action(s) B can be omitted. There can be any number of nested if's/else's.

The syntax for a loop is:

    loop =name i +j k (action(s)) stop

The loop will start at value i and end at value k, every time increasing by the value of j (the "+j" can be omitted if the increment is 1; if k<i, then the loop is decreasing). Within the specified action(s), the expression "#name" is equivalent to writing the current value of the loop index (if the equal sign and the name are omitted, then the loop is not named, and the value of the loop is accessed with "#N", where N is the nesting level of the loop to be accessed; nesting level is exclusive to each input file).

Within the loop, the command setloop N will reset the loop value to N, continue will move to the next cycle (skipping all actions between continue and stop), and endloop will interrupt the execution of the loop.

Other commands that execute loops are sprit and tbrit (execute subsequent actions for each of the SPR or TBR rearrangements of the specified tree), travtree (executes subsequent actions for each of the nodes in a tree, visited according to a down-pass, up-pass, or path-travelling sequence), and iterrecs (executes subsequent actions for each of the equally parsimonious reconstructions of the specified character, with the possibility of forcing a specific state at any given node). TNT's on-line help explains the usage of each of these commands in detail.

Internal variables

TNT provides easy access to more than 120 internal variables to be used in decision making (e.g. number of taxa, trees, tree-scores, degree of resolution, branch lengths, tree-distances, hits to minimum length or number of rearrangements in the last search, most parsimonious states at internal nodes, etc.). Internal variables cannot be accessed from regular TNT commands, but only from the scripting commands.

User variables

The user can define variables (arrays with up to five dimensions). Once named and defined (with the var command), their values can be set with the set command. For example, the following script will randomly select 10 characters (between 0 and nchar, the number of characters minus 1) and place them (with the xgroup command) in a character-group named "selected":

    var : i ;
    xgroup =0 (selected) ;
    loop 1 10
    set i getrandom [ 0 nchar ] ;
    xgroup >0 'i' ;
    stop
    proc/;
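For readers more familiar with general-purpose languages, the random-selection script described above can be mirrored in Python. This is a rough analogue under stated assumptions, not a translation of TNT's behaviour: the function name select_random_characters is hypothetical, nchar is taken (as in the text) to be the number of characters minus 1, and both bounds of the random draw are assumed to be inclusive.

```python
import random
from typing import Optional, Set


def select_random_characters(nchar: int, draws: int = 10,
                             seed: Optional[int] = None) -> Set[int]:
    """Draw `draws` character indices between 0 and nchar (both inclusive)
    and collect them in a set, loosely mirroring the loop in the script."""
    rng = random.Random(seed)
    selected: Set[int] = set()      # plays the role of the group "selected"
    for _ in range(draws):          # ten cycles, like the loop in the script
        i = rng.randint(0, nchar)   # value first stored in a variable i
        selected.add(i)             # then added to the group
    return selected
```

Because each draw is independent, the same index can come up more than once in this sketch, so the resulting set may hold fewer than 10 distinct characters. Whether the TNT script behaves the same way depends on how getrandom and xgroup treat repeated values, which the text does not state.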
Note that getrandom, being an internal variable, could not be accessed from the xgroup command, so that the value has to be transferred first to a user variable called i. The script is merely an example, since the xgroup command has a built-in option to select characters at random, with which the equivalent result can be achieved more simply: xgroup =0 (selected) *10;.

Other features of the scripting language

The scripting language also allows:

(a) formatted input (hifile command) and output (quote and lquote commands).

(b) In Windows versions, customizable dialogs (opendlg command).

(c) access to taxon names, character names, branch-legends, etc., for use in formatted output, or for setting internal variables. Within almost any context, a dollar sign followed by the specification of the type and number is automatically converted (e.g. $taxon n is equivalent to typing the name of taxon number n, and $ttag i is equivalent to typing the legend of branch number i). Conversely, values can be written to tree-tags for subsequent display on a tree-diagram (ttag command).

(d) input redirection to specific parts of a script (with the goto and label commands). Instructions in that section are effectively executed as functions, making it possible to write well-structured scripts.

(e) report of progress and interruption (progress command), for time-consuming scripts. This is best used with the output silenced (with the silent command, so that the program does not produce tons of unnecessary output).

Running TNT in parallel

In Linux and Mac versions, it is possible to run TNT in parallel (this requires installation of the PVM system, available at http://www.netlib.org/pvm3). With the ptnt command, the program can launch, monitor (with information retrievable from the slaves at any time), and control slave tasks (possibly interrupting them, making them return results and continue running, or substituting the instructions for the slaves for a new set of instructions). For details see the documentation of the ptnt command.

Acknowledgements

The authors wish to express their gratitude to all the users of TNT who have helped improve it, with bug reports and suggestions of new features: Lone Aagesen, James Carpenter, Santiago Catalano, Jonathan Coddington, Julian Faivovich, Gonzalo Giribet, Peter Hovenkamp, Daniel Janies, Diana Lipscomb, Rudolf Meier, Marcos Mirande, Diego Pol, Martin Ramirez, Claudia Szumik, among others. Rudolf Meier and Kurt Pickett offered useful comments and advice on the manuscript.

References

Coddington, J., Scharff, N., 1994. Problems with zero-length branches. Cladistics 10, 415–423.
Davis, J., Nixon, K., Little, D., 2004. The limits of conventional cladistic analysis. In: Albert, V. (Ed.), Parsimony, Phylogeny, and Genomics. Oxford University Press, Oxford, pp. 119–147.
Farris, J., 1969. A successive approximations approach to character weighting. Syst. Zool. 18, 374–385.
Farris, J., 1970. Methods for computing wagner trees. Syst. Zool. 19, 83–92.
Farris, J., 1973. On comparing the shapes of taxonomic trees. Syst. Zool. 22, 50–54.
Farris, J., 1988. Hennig86, ver. 1.5. Program and Documentation, Distributed by the Author. Port Jefferson Station, NY.
Farris, J., 1989. The retention index and the rescaled consistency index. Cladistics 5, 417–419.
Farris, J.S., 2001. Support weighting. Cladistics 17, 389–394.
Farris, J., Källersjö, M., Kluge, A., Bult, C., 1994. Testing significance of incongruence. Cladistics 10, 315–319.
Farris, J., Albert, V., Källersjö, M., Lipscomb, D., Kluge, A., 1996. Parsimony jackknifing outperforms neighbor-joining. Cladistics 12, 99–124.
Fitch, W., 1971. Toward defining the course of evolution: minimal change for a specific tree topology. Syst. Zool. 20, 406–416.
Giribet, G., 2005. A review of: "TNT: Tree Analysis Using New Technology". Syst. Biol. 54, 176–178.
Goloboff, P., 1993. Estimating character weights during tree search. Cladistics 9, 83–91.
Goloboff, P., 1994a. Nona: A Tree-searching Program. Program and documentation, available at http://www.zmuc.dk/public/phylogeny/Nona-PeeWee.
Goloboff, P., 1994b. Pee-Wee: Parsimony and Implied Weights. Program and documentation, available at http://www.zmuc.dk/public/phylogeny/Nona-PeeWee.
Goloboff, P., 1996. Methods for faster parsimony analysis. Cladistics 12, 199–220.
Goloboff, P., 1997. Self-weighted optimization: character state reconstructions and tree searches under implied transformation costs. Cladistics 13, 225–245.
Goloboff, P., 1999. Analyzing large data sets in reasonable times: solutions for composite optima. Cladistics 15, 415–428.
Goloboff, P., 2008. Calculating SPR-distances between trees. Cladistics (in press). doi: 10.1111/j.1096-0031.2007.00189.x
Goloboff, P., Farris, J.S., 2001. Methods for quick consensus estimation. Cladistics 17, S26–S34.
Goloboff, P., Pol, D., 2002. Semi-strict supertrees. Cladistics 18, 514–525.
Goloboff, P., Pol, D., 2007. On divide-and-conquer strategies for parsimony analysis of large data sets: rec-i-dcm3 vs TNT. Syst. Biol. 56, 485–495.
Goloboff, P., Farris, J., Källersjö, M., Oxelmann, B., Ramírez, M., Szumik, C., 2003a. Improvements to resampling measures of group support. Cladistics 19, 324–332.
Goloboff, P., Farris, J., Nixon, K., 2003b. T.N.T.: Tree Analysis Using New Technology. Program and documentation, available at http://www.zmuc.dk/public/phylogeny/tnt.
Goloboff, P., Mattoni, C., Quinteros, Y.S., 2006. Continuous characters analyzed as such. Cladistics 22, 589–601.
Hovenkamp, P., 2004. Review of TNT—Tree analysis using New Technology, ver. 1.0. Cladistics 20, 378–383.
McMahon, M., Sanderson, M., 2006. Phylogenetic supermatrix analysis of GenBank sequences from 2228 papilionoid legumes. Syst. Biol. 55, 818–836.
Meier, R., Ali, F., 2005. The newest kid on the parsimony block: TNT (Tree analysis using new technology). Syst. Entomol. 30, 179–182.
Nixon, K., 1999. The parsimony ratchet, a new method for rapid parsimony analysis. Cladistics 15, 407–414.
Pickett, K.M., Tolman, G.L., Wheeler, W.C., Wenzel, J.W., 2005. Parsimony overcomes statistical inconsistency with the addition of more data from the same gene. Cladistics 21, 438–445.
Robinson, D.F., Foulds, L.R., 1981. Comparison of phylogenetic trees. Math. Biosci. 53, 131–147.
Sankoff, D., Rousseau, P., 1975. Locating the vertices of a Steiner tree in an arbitrary space. Math. Program. 9, 240–246.
Swofford, D., 2002. PAUP*: Phylogenetic Analysis Using Parsimony (* and other methods), ver. 4. Sinauer Associates, Sunderland, MA.
Williams, P., Fitch, W., 1990. Phylogeny determination using dynamically weighted parsimony method. Meth. Enzymol. 183, 615–626.
