
Forensic Science International 133 (2003) 260–265

Short communication
Inferring recent human phylogenies using forensic
STR technology
Diane J. Rowold, Rene J. Herrera*
Department of Biological Sciences, Florida International University, University Park Campus, Miami, FL 33199, USA
Received 12 February 2003; accepted 13 February 2003

Abstract

STR loci are characterized by extremely high mutation rates and thus, high levels of length polymorphism both within and
among populations. In addition, much of the observed variation is believed to be nearly selectively neutral. Because of these
features, STRs are ideal markers for genetic mapping, intra-species phylogenetic reconstructions and forensic analysis. In the
present study, we investigate the application of five STR loci (CSF1PO, TH01, TPOX, FGA and vWA) routinely used in forensic
analysis for delineating the phylogenetic relationships of 10 human populations representing the three major racial groups
(African–Caribbean, Croatian from the island of Hvar, East Asian, Han Chinese, Italian, Japanese, Portuguese, UK Caucasian,
US Caucasian and Zimbabwe). The resulting tree topology exhibited strong geographic and racial partitioning consistent with
that obtained with mtDNA haplotypes, Y-chromosome markers, SNPs, PAIs (polymorphic Alu insertions) as well as classic
genetic polymorphisms. These findings suggest that forensic STR loci may be particularly powerful tools and provide the
necessary fine resolution for the reconstruction of recent human evolutionary history.
© 2003 Elsevier Science Ireland Ltd. All rights reserved.

Keywords: Recent human evolution; Microsatellite; STR; Phylogenetic analysis

* Corresponding author. Tel.: +1-305-348-1258; fax: +1-305-348-1259.
E-mail address: herrerar@fiu.edu (R.J. Herrera).

0379-0738/03/$ – see front matter © 2003 Elsevier Science Ireland Ltd. All rights reserved.
doi:10.1016/S0379-0738(03)00073-2

1. Introduction

Microsatellite loci are endowed with several properties that render them desirable markers for fine scale genetic mapping [1–3], intra-species phylogenetic reconstructions [4,5], maternity/paternity determination [6] and forensic analysis [1,2]. These features include a high level of relatively stable polymorphisms, a dense, uniform chromosomal distribution, and short sequence lengths, which facilitate detection and analysis by PCR and sequencing [6,7]. To be effective tools for forensic applications, STR loci must also possess high levels of heterozygosity and uniform repeat motifs, and be compatible with rapid, multiplex PCR-based assays that provide reliable and unambiguous detection of alleles (http://www.cstl.nist.gov/div831/strbase).

Numerous microsatellite markers have been used in phylogenetic analyses of extant human populations [4,5,8]. Although there exists some overlap between the sets of forensic STRs and those typically employed in these phylogenetic investigations, it is not extensive and is more a result of happenstance than of design or rationale. On the other hand, the phylogenetic analyses of both Chakraborty et al. [9] and Budowle and Chakraborty [10] are based upon the thirteen CODIS loci. However, the phylogenetic assessments in these two studies are restricted to a simple neighbor joining (NJ) or UPGMA procedure (each yielding a single output tree) and distance measures. Thus, in the absence of a more extensive phylogenetic evaluation, such as one that includes different phylogenetic approaches (i.e., distance and optimality criterion) and statistical bootstrapping, it is not known at this time whether the use of forensic STR loci is phylogenetically informative or hinders meaningful analysis. The extremely high level of intra-population variation required of forensic systems, coupled with the elevated mutation rates and the narrow size constraints of microsatellite sequences [11–13], suggests a rapid saturation of genetic variation at forensic STR loci and thus may pose a greater risk of undetected convergent evolution among some populations. On the other hand, the evolutionary signal produced by these markers may be strong enough to prevail against any random
noise generated by allelic homoplasy. Furthermore, the fine scale resolution characteristic of forensic STRs may prove useful in delineating genetic affinities among closely related ethnicities (within racial groups). This, in turn, may provide a phylogenetic window to examine recent evolutionary events.

The objective of this study is to explore the utility of forensic STR loci in ascertaining phylogenetic relationships. We approach this goal by compiling a geographically targeted and racially diverse set of population STR databases from the forensic literature and analyzing these data both phenetically and phylogenetically with respect to CSF1PO, FGA, TH01, TPOX and vWA (the largest set of forensic STRs common to all 10 populations). The phylogenetic assessment used in this study is rigorous in that it includes two different phylogenetic approaches, a distance method and one based upon the maximum likelihood optimality criterion, as well as a statistical bootstrapping procedure involving 1000 replications. The ensuing tree topologies and principal component (PC) plot are then compared with results obtained in previous phylogenetic investigations based on microsatellite data as well as other markers including mtDNA, restriction fragment length polymorphisms, Y haplotypes and polymorphic Alu insertions (PAIs).

It should be mentioned that one of the five STRs included in this study, FGA, may exhibit somewhat higher heterozygosities than the rest, primarily due to the larger array of length polymorphisms (including both full and partial numbers of repeats) observed at this locus [current study: 24 for FGA versus 11, 9, 9 and 9 for vWA, CSF1PO, TH01 and TPOX, respectively; National Institute of Standards and Technology (http://www.cstl.nist.gov/div831/strbase): 67 for FGA versus 20, 15, 20 and 10 for vWA, CSF1PO, TH01 and TPOX, respectively; Perkin-Elmer Applied Biosystems: 14 common length polymorphisms for FGA versus 11, 10, 10 and 8 for vWA, CSF1PO, TH01 and TPOX, respectively]. Loci with substantially lower allelic counts may provide better phylogenetic discrimination in cases in which there is extensive overlap in allelic types among populations. However, high heterozygosities and/or numerous observed alleles do not necessarily interfere with the phylogenetic information content of a locus if the frequency profiles of the populations are sufficiently different. Furthermore, multiplicity of alleles may reflect a higher incidence of private alleles, which are highly significant in segregating populations during phylogenetic reconstruction. In the absence of the phylogenetic analysis itself, it is difficult to predict which of these scenarios applies.

2. Materials and methods

2.1. Population data

Ten geographically targeted populations were selected to encompass the major biogeographical zones and represent the three major races (Africans, Orientals and Caucasians). The populations examined include: African–Caribbean (N = 157–192 depending upon locus [15]), Croatian (N = 206 [18]), East Asian (N = 183–200 depending upon locus [15]), Han Chinese (N = 500 [16]), Italians (N = 223 [21]), Japanese (N = 206 [17]), Portuguese (N = 153 [20]), UK Caucasian (N = 192–217 depending upon locus [15]), US Caucasian (N = 151 [19]) and Zimbabwe (N = 104 [14]).

The five STR loci analyzed in the current investigation (CSF1PO, FGA, TH01, TPOX and vWA) contain tetrameric core repeat sequences and reside on different autosomes. All 10 populations are in Hardy–Weinberg equilibrium for each locus (data not shown).

2.2. Statistical analyses

Three separate analyses were executed, each based upon the STR allelic frequency profiles of the 10 populations listed above. In the first, distance values were estimated using Nei’s formula [22], and a phylogeny was inferred by the neighbor joining (NJ) option in PHYLIP version 3.5c [23]. A second phylogenetic reconstruction was based on maximum likelihood (ML) and the STR frequency distribution (CONTML in PHYLIP version 3.5c [23]). A principal component (PC) analysis generated by NTSYSpc [24] constituted the third analysis.

3. Results

The two resulting unrooted radial phylograms (NJ and ML) are presented in Fig. 1 (a and b, respectively). The edge lengths displayed in these phylograms indicate the amount of evolutionary change occurring along each branch. The scores next to the nodes represent the number of bootstrap replicates (out of 1000) exhibiting these specific bifurcations.

The NJ phylogram (Fig. 1a) contains three basal groups: (1) Zimbabwe and African–Caribbean; (2) East Asian, Han Chinese and Japanese; and (3) the Caucasian cluster comprising the Portuguese branch and two monophyletic units, Italian/Croatian and UK Caucasian/US Caucasian. Two of the five nodes are accompanied by a bootstrap score substantially above 50%. These are: (1) the division between the African–Caribbean/Zimbabwe monophyly and the other populations (bootstrap = 997); and (2) the vertex separating the Han Chinese/Japanese monophyly from the East Asians (bootstrap = 777).

The ML phylogram (Fig. 1b) displays a distinct racial partitioning that is identical in topology to the NJ tree. As detected in a rectangular version of the unrooted phylogram (not shown), the Italians are the first to branch off from the rest of the Caucasian cluster. The Croatian is the next population to split off, followed by the Portuguese. The two remaining groups, UK Caucasian and US Caucasian, form a monophyletic unit with a bootstrap score of 486. As in the NJ tree, the
Han Chinese/Japanese and African–Caribbean/Zimbabwe monophylies are both supported by bootstrap values above 50% (with scores of 796 and 828, respectively). Although the arrangement of the Caucasian groups is somewhat different in the two trees, both contain a terminal UK Caucasian/US Caucasian subcluster.

The East Asian, Italian, Portuguese and Zimbabwe populations display relatively short (or non-detectable) branch lengths in one or both phylogenies. This reflects the lesser degree of genetic differentiation sustained by these groups.

A plot of the first and second principal components, which together constitute 63.5% of the total variability (PC 1, 47.7%; PC 2, 15.8%), is presented in Fig. 2. There is a substantial separation of the African groups from the two other racial clusters (Oriental and Caucasian) along the x-axis (PC 1). The second principal component (PC 2) exhibits a relatively clean division between the Caucasians and the remaining populations, with the exception of the East Asians, an Oriental group that segregates near the Caucasians. Overall, however, the racial assemblages are distinct and parallel the biogeographical associations depicted in both the NJ and ML phylograms.

Fig. 1. (a) Neighbor joining phylogram based on Nei’s distance with bootstrap scores. Consensus bootstrap scores (1000 replicates) were transferred to the corresponding nodes of the NJ phylogram. These topological divisions and their respective bootstrap scores (BS) are as follows: the African–Caribbean and Zimbabwe from the Oriental and Caucasian clusters (BS = 997); the Oriental groups from the African and Caucasian clades (BS = 300); the Caucasian from the African and Oriental clusters (BS = 401); the Han Chinese/Japanese monophyly from the East Asian (BS = 777); the UK/US Caucasian monophyly from the Italian and Croatian groups (BS = 464). (b) Maximum likelihood phylogram with bootstrap scores. Consensus bootstrap scores (1000 replicates) were transferred to the corresponding nodes of the ML phylogram. These topological divisions and their respective bootstrap scores (BS) are as follows: the African–Caribbean and Zimbabwe from the Oriental and Caucasian clades (BS = 828); the Oriental from the African and the Caucasian clusters (BS = 452); the Caucasian from the African and Oriental populations (BS = 440); the Han Chinese/Japanese monophyly from the East Asian (BS = 796); the Italian from the remaining Caucasian populations (BS = 240); the Croatian from the Portuguese and the UK/US Caucasian monophyly (BS = 214); the UK/US Caucasian monophyly from the Portuguese (BS = 486).

4. Discussion

The CONTML option in PHYLIP is an ML program based upon the assumption that differences in allele frequency arise solely from the random action of genetic drift (Brownian motion model) upon the diverged populations [23]. In contrast, the NJ algorithm constructs a branching order from a matrix of estimated genetic distances [22]. This distance measure, which assumes both mutation and genetic drift, is expected to increase linearly with time if several assumptions hold [22,23]. The close similarity in basal cluster patterns (i.e., among the three racial groups versus the more terminal intra-racial associations) between the NJ and ML trees may indicate that the impact of the genetic drift component of these distance estimations eclipses that of mutation. Although the European branching pattern differs slightly between the two trees, both are fully resolved (according to the rectangular versions of the two phylogenies, which are not shown). The variation in the Caucasian branching order between the two trees may also be attributable to the limited power provided by the small number of loci and/or the limited genetic uniqueness of European groups (Cavalli-Sforza et al., 1994).
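
The distance measure discussed above can be illustrated with a short sketch. It assumes that "Nei’s formula" refers to the standard genetic distance of ref. [22], D = -ln(J_XY / sqrt(J_X J_Y)), where J_X and J_Y sum the squared allele frequencies within each population and J_XY sums the cross-products, over all loci. The function name, the input layout (locus, then allele, then frequency) and the frequencies in the demo are hypothetical and are not taken from the study; this is not the PHYLIP implementation.

```python
import math


def nei_standard_distance(pop_x, pop_y):
    """Sketch of Nei's (1972) standard genetic distance between two populations.

    pop_x, pop_y: dict mapping locus name -> dict mapping allele -> frequency.
    Only loci present in both populations are used.
    """
    loci = sorted(set(pop_x) & set(pop_y))
    if not loci:
        raise ValueError("the two populations share no loci")

    j_x = j_y = j_xy = 0.0
    for locus in loci:
        freq_x, freq_y = pop_x[locus], pop_y[locus]
        j_x += sum(p * p for p in freq_x.values())
        j_y += sum(q * q for q in freq_y.values())
        # Alleles absent from one population contribute zero to the cross term.
        j_xy += sum(p * freq_y.get(allele, 0.0) for allele, p in freq_x.items())

    if j_xy == 0.0:
        return math.inf  # no shared alleles at any locus
    # The per-locus averaging factor cancels in the ratio, so sums suffice.
    identity = j_xy / math.sqrt(j_x * j_y)
    return -math.log(identity)


if __name__ == "__main__":
    # Hypothetical single-locus frequencies, for illustration only.
    pop_a = {"TH01": {"6": 0.5, "7": 0.5}}
    pop_b = {"TH01": {"6": 0.5, "9": 0.5}}
    print(round(nei_standard_distance(pop_a, pop_b), 4))  # ln 2, about 0.6931
```

Two populations with identical frequency profiles give D = 0, and D grows as the profiles diverge. A matrix of such pairwise values is the kind of input a neighbor joining procedure works from.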
Fig. 2. Plot of principal components 1 (47.7%) and 2 (15.8%).
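
The percentages attached to each component are shares of the total variance in the allele frequency table. The sketch below shows one way such shares can be obtained, assuming a plain covariance-based principal component analysis; the exact procedure used by NTSYSpc is not described in the text, and the function names and the use of power iteration are illustrative choices rather than the authors' method.

```python
import math
import random


def covariance_matrix(rows):
    """Sample covariance of the columns; one row per population (needs >= 2 rows)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[v - m for v, m in zip(row, means)] for row in rows]
    k = len(means)
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(k)]
            for i in range(k)]


def leading_eigenpair(matrix, iterations=500, seed=0):
    """Largest eigenvalue and its eigenvector of a symmetric PSD matrix (power iteration)."""
    k = len(matrix)
    rng = random.Random(seed)
    # A non-uniform start avoids beginning in the null space of frequency data.
    vec = [rng.uniform(0.5, 1.5) for _ in range(k)]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm < 1e-12:
            return 0.0, vec  # no variance left in this matrix
        vec = [v / norm for v in nxt]
    eigenvalue = sum(vec[i] * matrix[i][j] * vec[j]
                     for i in range(k) for j in range(k))
    return eigenvalue, vec


def explained_variance_shares(rows, components=2):
    """Fraction of total variance carried by each of the first principal components."""
    matrix = covariance_matrix(rows)
    k = len(matrix)
    total = sum(matrix[i][i] for i in range(k))
    if total <= 0.0:
        raise ValueError("the table has no variance")
    shares = []
    for _ in range(components):
        eigenvalue, vec = leading_eigenpair(matrix)
        shares.append(eigenvalue / total)
        # Deflate: remove the component just found before looking for the next.
        matrix = [[matrix[i][j] - eigenvalue * vec[i] * vec[j] for j in range(k)]
                  for i in range(k)]
    return shares
```

Called on a table with one row per population and one column per allele frequency, the function returns fractions that can be read in the same way as the 47.7% and 15.8% reported for PC 1 and PC 2. Power iteration converges slowly when consecutive eigenvalues are close, so a library eigen-solver would be the more practical choice for real data.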

In both trees, the length of the African–Caribbean/Zimbabwe branch is much longer than that of any other group. The patristic separation of the Africans from the remaining groups is also reflected in the PC plot (Fig. 2). In contrast with the large inter-nodal distance of the African–Caribbean/Zimbabwe monophyly, the East Asians display a branch length of zero in both phylograms and lie near the patristic origin, suggesting a genetic affinity to the non-Oriental populations. This is supported by the East Asians’ medial position in the PC plot (Fig. 2).

The Fst [25] NJ phylogram of Perez-Lezaun et al. [26], generated with 20 STRs, is largely consistent with that of the current study in that it also displays the generally accepted clustering of racial groups. On the other hand, the second best tree, reconstructed using the Dsw distance (and the same 20 STRs), exhibits a trifurcation composed of two Caucasian groups and the African lineage. In contrast, both the NJ and ML phylogenies of the current study, which are based on only five forensic STR loci versus the 20 STRs of Perez-Lezaun et al. [26], depict a clean separation among all inter- and intra-racial groups.

Overall, the trees generated by the five forensic STR tetrameric loci are in accordance with those derived from much larger sets of microsatellites (30 loci each for Bowcock et al. [4] and Jorde et al. [5], and 60 in Jorde et al. [27]). Strong racial partitioning is evident in phylogenies inferred from these studies as well. The phylogenetic reconstruction of Jorde et al. [5] assimilates frequency data from 30 tetranucleotide STR loci and 13 populations. The level of intra-racial resolution is similar to the NJ and ML phylograms of the current analysis. In a subsequent study, Jorde et al. [27] performed a principal coordinates analysis based on the mean allele sizes of 60 STR loci (including the 30 STR loci used in Jorde et al. [5]) and 15 human populations, which generated a racial cluster pattern highly concordant with our results. The NJ phylogeny reported by Bowcock et al. [4], a chord distance [28] reconstruction based on 30 dinucleotide loci and 10 globally dispersed populations, exhibits a topology that is highly congruent (with respect to the major racial groups) with the mid-rooted versions (not shown) of both forensic STR phylogenies (NJ and ML).

The NJ and ML dendrograms of the current analysis were also compared to those obtained with other genetic markers (mtDNA in [5] and PAIs in [29]). Overall, the strong racial partitioning in the forensic STR phylograms is consistent with that inferred from bi-allelic frequency data involving six PAIs from a globally distributed set of 47 populations [29]. However, the fit between the present results and that derived from an analysis involving the hyper-variable sequence-2 of mtDNA [5] is less satisfactory since the
groups in the mtDNA topology do not partition as cleanly along racial lines. This is evidenced by the placement of the two Caucasian groups (French and northern European) in a monophyletic clade with the Chinese.

There are several potential difficulties associated with using STRs for phylogenetic inference. These include: the erosion of genetic differentiation among taxa due to high mutation rates coupled with allelic size constraints [11–13], the lack of appropriately detailed models of STR evolution and corresponding analytical computer programs, as well as mutation rate heterogeneity among alleles [30]. Nonetheless, it is encouraging to note that despite these limitations, we were able to obtain topologies with a strong racial partitioning and a 100% resolution of intra-racial groups using two different methods of phylogenetic reconstruction. It is possible that the hyper-variability exhibited by these STR loci provides the fine resolution required to observe the phylogenetic events that shaped very recent human evolution.

References

[1] A. Edwards, H. Hammond, L. Jin, C. Caskey, R. Chakraborty, Genetic variation at five trimeric and tetrameric tandem repeat loci in four human population groups, Genomics 12 (1992) 241–253.
[2] A. Edwards, A. Civitello, H. Hammond, C. Caskey, DNA typing and genetic mapping with trimeric and tetrameric tandem repeats, Am. J. Hum. Genet. 49 (1991) 746–756.
[3] C. Hearne, S. Ghosh, J. Todd, Microsatellites for linkage analysis of genetic traits, Trends Genet. 8 (1992) 288–295.
[4] A. Bowcock, A. Ruiz-Linares, J. Tomfohrde, E. Minch, J.R. Kidd, L.L. Cavalli-Sforza, High resolution of human evolutionary trees with polymorphic microsatellites, Nature 368 (1994) 455–457.
[5] L. Jorde, M. Bamshad, W. Watkins, R. Zenger, A. Fraley, P. Krakowiak, K. Carpenter, H. Soodyall, T. Jenkins, A. Rogers, Origins and affinities of modern humans: a comparison of mitochondrial and nuclear genetic data, Am. J. Hum. Genet. 57 (1995) 523–538.
[6] H. Hammond, L. Jin, Y. Zhong, C. Caskey, R. Chakraborty, Evaluation of 13 short tandem repeat loci for use in personal identification applications, Am. J. Hum. Genet. 55 (1994) 175–189.
[7] C. Kimpton, P. Gill, A. Walton, A. Urquhart, E. Millican, M. Adams, Automated DNA profiling employing multiplex amplification of short tandem repeat loci, PCR Methods Appl. 3 (1993) 13–22.
[8] R. Deka, M. Shriver, L. Yu, R. Ferrell, R. Chakraborty, Intra- and inter-population diversity at short tandem repeat loci in diverse populations in the world, Electrophoresis 16 (1995) 1659–1664.
[9] R. Chakraborty, D. Stivers, B. Su, Y. Zhong, B. Budowle, The utility of short tandem repeat loci beyond human identification: implications for development of new DNA typing systems, Electrophoresis 20 (1999) 1682–1696.
[10] B. Budowle, R. Chakraborty, Population variation at the CODIS core short tandem repeat loci in Europeans, Legal Med. 3 (2001) 29–33.
[11] M. Nauta, F. Weissing, Constraints on allele size at microsatellite loci: implications for genetic differentiation, Genetics 143 (1996) 1021–1032.
[12] J. Weber, C. Wong, Mutation of human short tandem repeats, Hum. Mol. Genet. 2 (1993) 1123–1128.
[13] B. Brinkmann, M. Klintschar, F. Neuhuber, J. Huhne, B. Rolf, Mutation rate in human microsatellites: influence of the structure and length of the tandem repeat, Am. J. Hum. Genet. 62 (1998) 1408–1415.
[14] B. Budowle, L. Nhari, T. Moretti, S. Kanoyangwa, E. Masuka, D. Defenbaugh, J. Smerick, Zimbabwe black population data on the six short tandem repeat loci: CSF1PO, TPOX, TH01, D3S1358, VWA, and FGA, Forensic Sci. Int. 90 (1997) 215–221.
[15] J. Thomson, V. Pilotti, P. Stevens, K. Ayres, P. Debenham, Validation of short tandem repeat analysis for the investigation of cases of disputed paternity, Forensic Sci. Int. 100 (1999) 1–16.
[16] C. Pu, F. Wu, C. Cheng, K. Wu, C. Chao, J. Li, DNA short tandem repeat profiling of Chinese population in Taiwan determined by using an automated sequencer, Forensic Sci. Int. 97 (1998) 47–51.
[17] T. Yamamoto, R. Uchihi, H. Nozawa, X. Huang, Y. Leong, M. Tanaka, M. Mizutani, K. Tamaki, Y. Katsumata, Allele distribution at the nine STR loci (D3S1358, vWA, FGA, TH01, TPOX, CSF1PO, D5S818, D13S317, and D7S820) in the Japanese population by multiplex PCR and capillary electrophoresis, J. Forensic Sci. 44 (1999) 167–170.
[18] I. Martinovic, L. Barac, I. Furac, B. Janicijevic, M. Kubat, M. Pericic, B. Vidovic, P. Rudan, STR polymorphism in the population of the island of Hvar, Hum. Biol. 71 (1999) 341–352.
[19] T. Kupferschmid, T. Calicchio, B. Budowle, Maine Caucasian population DNA database using twelve short tandem repeat loci, J. Forensic Sci. 44 (1999) 392–395.
[20] S. Santos, B. Budowle, J. Smerick, K. Keys, T. Moretti, Portuguese population data on the six short tandem repeat loci: CSF1PO, TPOX, TH01, D3S1358, VWA and FGA, Forensic Sci. Int. 83 (1996) 229–235.
[21] L. Garofano, M. Pizzamiglio, C. Vecchio, G. Lago, T. Floris, G. D’Errico, G. Brembilla, A. Romano, B. Budowle, Italian population data on the thirteen short tandem repeat loci: HUMTH01, D21S11, D18S51, HUMVWFA31, HUMFIBRA, D8S1179, HUMTPOX, HUMCSF1PO, D16S539, D7S820, D13S317, D7S818, D3S1358, Forensic Sci. Int. 97 (1998) 53–60.
[22] M. Nei, Genetic distance between populations, Am. Nat. 106 (1972) 283–292.
[23] J. Felsenstein, Phylogeny Inference Package (PHYLIP) version 3.5c, distributed by the author, Department of Genetics, University of Washington, Seattle, 1993.
[24] F.J. Rohlf, NTSYSpc, Version 2.1 (2002), Exeter Publishing Ltd., Setauket, NY.
[25] S. Wright, The genetical structure of populations, Ann. Eugenics 15 (1951) 323–354.
[26] A. Perez-Lezaun, F. Calafell, E. Mateu, D. Comas, R. Ruiz-Pacheco, J. Bertranpetit, Microsatellite variation and the differentiation of modern humans, Hum. Genet. 99 (1997) 1–7.
[27] L. Jorde, A. Rogers, M. Bamshad, W. Watkins, P. Krakowiak, S. Sung, J. Kere, H. Harpending, Microsatellite diversity and the demographic history of modern humans, Proc. Natl. Acad. Sci. U.S.A. 94 (1997) 3100–3103.
[28] L.L. Cavalli-Sforza, A. Edwards, Phylogenetic analysis. Models and estimation procedures, Am. J. Hum. Genet. 19 (Suppl.) (1967) 233.
[29] G. Antunez-de-Mayolo, P. Antunez-de-Mayolo, A. Antunez-de-Mayolo, S. Papiha, M. Hammer, E. Yunis, J. Yunis, C. Damodaran, M. de Pancorbo, J. Caeiro, V. Puzyrev, R.J. Herrera, Phylogenetics of worldwide human populations as determined by polymorphic Alu insertions (PAIs), Electrophoresis 23 (2002) 3346–3356.
[30] A. Primmer, N. Saino, A. Moller, H. Ellegren, Directional evolution in germline microsatellite mutations, Nat. Genet. 13 (1996) 391–393.
