Relethford1988 - Heterogeneity of Long-Distance Migration in Studies of Genetic Structure
1, 55-64
Migration matrix analysis has been a powerful tool for investigations of the genetic
structure of human populations. The migration matrix approach uses an observed
matrix of exchanges among a set of populations, a vector of effective population sizes,
and a vector of linear systematic pressures (usually long-distance migration from the
'outside world') to predict patterns of genetic variation within and among populations
(Bodmer and Cavalli-Sforza 1968, Smith 1969, Imaizumi, Morton and Harris 1970).
The resultant matrix represents a prediction of the balance between gene-flow and
genetic drift. These methods have the advantage of using the observed matrix of
exchanges without reduction of migration patterns using some theoretical model.
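The balance between gene flow and drift described above can be sketched as a kinship recursion. The form below is one common textbook formulation and is an assumption here; it is not necessarily the exact model of any of the cited papers. The function name `iterate_kinship` and the example values of `M`, `N` and `s` are hypothetical.

```python
def iterate_kinship(M, N, s, generations=500):
    """Iterate a simple kinship recursion for a migration matrix model.

    M[i][j]: probability that a gene in population i came from population j
             one generation earlier (each row sums to 1).
    N[i]:    effective size of population i.
    s[i]:    rate of long-distance migration (systematic pressure) into i.
    This recursion is an illustrative assumption, not the cited authors' code.
    """
    n = len(N)
    phi = [[0.0] * n for _ in range(n)]
    for _ in range(generations):
        # Drift: two genes drawn from the same population copy the same
        # parental gene with probability 1 / (2N).
        drifted = [row[:] for row in phi]
        for k in range(n):
            drifted[k][k] = phi[k][k] + (1.0 - phi[k][k]) / (2.0 * N[k])
        # Gene flow among study populations, discounted by the chance that
        # either gene arrived from the outside world.
        new_phi = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                total = 0.0
                for k in range(n):
                    for l in range(n):
                        total += M[i][k] * M[j][l] * drifted[k][l]
                new_phi[i][j] = (1.0 - s[i]) * (1.0 - s[j]) * total
        phi = new_phi
    return phi


# Hypothetical two-population example.
M = [[0.9, 0.1], [0.1, 0.9]]
N = [100, 100]
s = [0.05, 0.05]
phi = iterate_kinship(M, N, s)
for row in phi:
    print([round(value, 5) for value in row])
```

In this sketch the vector `s` plays the role of the long-distance migration rates: a larger `s[i]` lowers the predicted kinship involving population i, because more of its genes are drawn from the outside world each generation.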
Jorde (1980) has reviewed many of the studies which utilize the migration matrix
approach. He notes several problems which are common in such studies, including the
assumption of constancy in migration rates and population sizes, as well as the large
number of generations often required for the model to reach an equilibrium state.
Wood (1986) has recently looked at the latter problem and has shown that relative
measures, such as inter-populational distances, converge more quickly.
Another problem in migration matrix analysis is the assumption that migrants
represent a random sample of the population they came from (Jorde 1980). Work by
Fix (1978) and others has shown that this assumption is often violated in human
populations, and the effects of such kin-structured migration must be taken into
consideration. Other potential problems in migration analysis include the possibilities
of non-Markovian migration (Kramer 1981) and return migration (Relethford and Lees
1983).
Less attention has been given to analysis of the effects of differential rates of long-
distance migration and genetic heterogeneity of these migrants. In migration matrix
methods, the long-distance migrants are those from outside of the set of populations
being studied, and are often referred to as migrants from the 'outside world'. Some
studies have ignored migration from the outside world altogether (Wood 1986), while
others have used a single rate of long-distance migration for all populations (Imaizumi
56 J . H . Relethford
et al. 1970). A more common approach, preferred in most applications, is the use of a
vector of long-distance migration rates (Jorde 1980).
Even when separate rates of long-distance migration are used for each population it
is still assumed that all long-distance migrants come from a genetically homogeneous
outside world. Wagener (1973) modified Bodmer and Cavalli-Sforza's (1968) migration
matrix model to take genetic heterogeneity of long-distance migrants, along with
differential rates of long-distance migration, into account. Her modification has
stringent data requirements, needing a migration matrix, a vector of effective popula-
tion sizes, a vector of long-distance migration rates, and the gene frequencies of all
Ann Hum Biol Downloaded from informahealthcare.com by Flinders University of South Australia on 01/05/15
populations in the outside world. She applied this method to migration and gene-
frequency data from Ward and Neel's (1970) study of the Makiritare, and found little
effect of genetic heterogeneity of long-distance migrants. She noted, however, that
these results need not apply to all human populations.
Harpending and Ward (1982) have considered heterogeneity in long-distance
migration using the simple island model of population structure under two conditions.
In the first, all islands exchange migrants from the same continent, but in different
amounts. They found that there is little effect on measures of genetic distance based on
a chi-square statistic, but that measures of heterozygosity will be affected. In general,
islands with greater rates of exchange with the outside world will show greater hetero-
zygosity, whereas islands with lower rates of exchange will show less heterozygosity.
The second situation examined by Harpending and Ward is where each island
For personal use only.
exchanges migrants with a different continent, but all islands have the same rate of
migration. In this situation the heterogeneity of source populations will act to slow
down the rate of loss of heterozygosity within islands. Thus, the average heterozygosity
of islands will be less than under a situation of equal exchanges with the same continent.
Actual studies of migration are apt to involve heterogeneity of long-distance
migrants in terms of both rate and population of origin. The purpose of this paper is to
use historical demographic data to assess variation in rates of long-distance migration,
variation in the origin of long-distance migrants, and expected genetic effects of hetero-
geneity of long-distance migrants.
was identified. Each person was then coded as 'local' (from within the 25 km radius) or
'distant' (from outside the 25 km radius). A total of 896 out of 7184 individuals were
long-distance migrants (12.5%). The number of long-distance migrants into each of the
four populations is reported in table 1. The definition of 'long distance' used here
differs from that in my earlier papers (Relethford 1986b,c) because the purpose is
different. The 25 km study region was chosen as being more representative of the size
of most migration matrix studies. The four towns analysed represent a sample of a
possible 24 by 24 town migration matrix.
Town of            Census        Number of       Total     Long-distance   Number of origin
residence          population    long-distance   sample    migration       towns for long-
                   size†         migrants        size      rate            distance migrants
(1) Gardner           955             56           908        0.062              40
(2) Hubbardston      1402            131          1642        0.080              82
(3) Leominster       1869            348          2132        0.163             130
(4) Barre            2259            361          2502        0.144             144
† Population size is the harmonic mean of census population sizes from 1800 to 1850 in 10-year intervals.
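As a check on the arithmetic in table 1, the long-distance migration rate for each town is the number of long-distance migrants divided by the total sample size. The short sketch below recomputes the rates and the pooled proportion from the tabulated counts; the variable names are illustrative only.

```python
# (long-distance migrants, total sample size) for each town, from table 1.
towns = {
    "Gardner": (56, 908),
    "Hubbardston": (131, 1642),
    "Leominster": (348, 2132),
    "Barre": (361, 2502),
}

# Rate of long-distance migration into each town.
rates = {name: migrants / sample for name, (migrants, sample) in towns.items()}

# Pooled proportion of long-distance migrants over all four towns.
total_migrants = sum(migrants for migrants, _ in towns.values())
total_sample = sum(sample for _, sample in towns.values())
overall_rate = total_migrants / total_sample

for name, rate in rates.items():
    print(f"{name}: {rate:.3f}")
print(f"Pooled: {total_migrants} of {total_sample} ({100 * overall_rate:.1f}%)")
```

The pooled figures agree with the 896 of 7184 individuals (12.5%) given in the text.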
work on random isonymy in these populations, male and female surnames were pooled
to maximize sample size (Relethford 1986 c).
Values of random isonymy within groups provide an index of surname diversity
within those groups. An alternative measure of surname diversity has been suggested by
Bhatia and Wilson (1981) and was also used here. Their measure of surname diversity
within population i, H'i(S), is derived from the Shannon information statistic and is
computed as
$$H'_i(S) = -\sum_{k} \frac{n_{ik}}{N_i} \log \frac{n_{ik}}{N_i} \qquad (3)$$
where nik and Ni are as defined above, summation is over all surnames k, and log(nik/Ni)
= 0 when nik = 0. The higher this measure, the greater the amount of surname diversity.
While its computation is similar to random isonymy, it has the advantage of being only
minimally affected by rare surnames. This measure was computed for both the total
sample and for the long-distance migrants alone.
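Equation (3) can be illustrated with a minimal sketch that computes the Shannon-based surname diversity from a list of surnames. The function name `surname_diversity` and the toy surname lists are hypothetical, and the base of the logarithm is not stated in the text, so it is left as a parameter (natural logarithm by default, which is an assumption).

```python
import math
from collections import Counter


def surname_diversity(surnames, base=math.e):
    """Shannon-type surname diversity, -sum((n_ik/N_i) * log(n_ik/N_i)).

    Surnames with a count of zero never appear in the Counter, so they
    contribute nothing to the sum, matching the convention in the text.
    The logarithm base is an assumption; the source does not specify it.
    """
    counts = Counter(surnames)
    total = sum(counts.values())
    return -sum(
        (n / total) * math.log(n / total, base) for n in counts.values()
    )


# Toy examples: an even mix of surnames is more diverse than a skewed one.
even_sample = ["Smith", "Jones", "Hale", "Reed"]
skewed_sample = ["Smith", "Smith", "Smith", "Jones"]
print(surname_diversity(even_sample))
print(surname_diversity(skewed_sample))
```

The same function can be applied to the total sample of a town or to its long-distance migrants alone, as was done for the two sets of values reported here.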
The effect of differential origin of long-distance migrants can be assessed by com-
puting the random isonymy matrix after substituting a random sample of long-distance
migrants for the observed long-distance migrants into each town. That is, the surnames
of the long-distance migrants into a given town are replaced with a random sample of
all possible long-distance migrants. This procedure can be incorporated into the
formula for computation of random isonymy. The number of individuals with
surname k in population i, nik, can be rewritten as
3. Results
The total number of individuals and the number of long-distance migrants are
reported for each of the four study towns in table 1. The long-distance migration rate
varies from 0.062 to 0.163. Chi-square analysis of the number of local versus distant
migrants shows highly significant variation in the extent of long-distance migration
among towns (Chi-square = 101.17, d.f. = 3, P < 0.001).
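The chi-square statistic can be recomputed from the counts in table 1 by cross-classifying each town's sample as local or distant. The sketch below uses a plain contingency-table calculation; the local counts are derived here as total sample size minus long-distance migrants, and the helper name `chi_square` is illustrative.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a 2-D table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# (distant, total sample) per town from table 1; local = total - distant.
counts = [(56, 908), (131, 1642), (348, 2132), (361, 2502)]
table = [[total - distant, distant] for distant, total in counts]

statistic, dof = chi_square(table)
print(f"Chi-square = {statistic:.2f}, d.f. = {dof}")
```

Run on the tabulated counts, this gives a statistic close to the reported value of 101.17 with 3 degrees of freedom.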
The number of origin towns for long-distance migrants is also reported in table 1.
Of a total of 268 origin towns, the number of towns represented in the long-distance
migrants varies from 40 to 144. This variation is significantly different from a null
hypothesis of equal numbers of origin towns (Kolmogorov-Smirnov one-sample test,
D = 0.192, P < 0.001). It is interesting to note that the smaller the town, the fewer the
number of origin towns. From a genetic perspective, the smaller towns might be
expected to have less genetic diversity in their long-distance migrants.
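The text does not spell out how the one-sample Kolmogorov-Smirnov statistic was formed. One reading, offered here as an assumption, is to compare the cumulative proportions of origin towns across the four study towns (in the order of table 1) with the cumulative proportions expected if each town drew on an equal number of origin towns. The function name below is hypothetical.

```python
def ks_one_sample_equal(counts):
    """Largest gap between observed and equal-share cumulative proportions."""
    total = sum(counts)
    categories = len(counts)
    cumulative_observed = 0.0
    d_max = 0.0
    for index, count in enumerate(counts, start=1):
        cumulative_observed += count / total
        cumulative_expected = index / categories
        d_max = max(d_max, abs(cumulative_observed - cumulative_expected))
    return d_max


# Number of origin towns for long-distance migrants, towns 1 to 4 (table 1).
origin_towns = [40, 82, 130, 144]
D = ks_one_sample_equal(origin_towns)
print(f"D = {D:.3f}")
```

Under this reading the largest gap falls after the second town and agrees with the reported D = 0.192, which lends some support to the interpretation without confirming it.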
Results of the Kolmogorov-Smirnov two-sample test comparing the distribution of
long-distance migrants by town are reported in table 2. These comparisons are based on
only those towns in the outside world which contributed to one or both towns. With the
exception of the comparison of long-distance migrants of Gardner and Hubbardston
(towns 1 and 2), all the distributions are significantly different (P < 0.05). Thus, the
relative contributions of long-distance migrants from the origin towns vary among the
four towns, as does the rate of long-distance migration.
Table 2. Pair-wise comparison of proportions of
long-distance migrants from origin towns using the
(0.0028) is only slightly larger than the amount of random isonymy expected if all long-
distance migrants were pooled (0.0023). The among-sample values are similar to the
within-sample values, suggesting little genetic heterogeneity among the migrant
samples. Bhatia and Wilson (1981) have derived a formula to estimate the effect of
population subdivision from isonymy by computing the ratio (CST) of the loss of
surname diversity from subdivision to surname diversity in a total sample. The ratio
based on the above values is low (CST = 0.0005), which further suggests little effect of
subdivision among the long-distance migrant samples.
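The reported ratio can be approximated from the two isonymy values quoted above under two assumptions that the text does not state explicitly: that surname diversity is taken as one minus random isonymy (by analogy with gene diversity analysis), and that 0.0028 is the mean within-sample isonymy while 0.0023 is the isonymy of the pooled migrants. The function and variable names are illustrative.

```python
def subdivision_ratio(within_isonymy, pooled_isonymy):
    """Loss of diversity from subdivision relative to total diversity.

    Assumes diversity = 1 - random isonymy; this mirrors gene diversity
    analysis and is an assumption, not a formula quoted from the source.
    """
    diversity_within = 1.0 - within_isonymy
    diversity_total = 1.0 - pooled_isonymy
    return (diversity_total - diversity_within) / diversity_total


ratio = subdivision_ratio(within_isonymy=0.0028, pooled_isonymy=0.0023)
print(f"ratio = {ratio:.4f}")
```

With these assumptions the ratio rounds to 0.0005, consistent with the value given in the text.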
Bhatia and Wilson's (1981) measure of surname diversity was computed for the
long-distance migrant sample for each of the four towns, and the results are shown in
table 4. The correspondence of the values of surname diversity, the number of long-
distance migrants and the number of origin towns (table 1) is clear. The smaller the
migrant sample and the fewer towns of origin, the lower the surname diversity. This
pattern was not apparent, however, from examination of within-group random
isonymy values (table 3) which provide another index of surname diversity. This
difference may reflect the fact that Bhatia and Wilson's measure is less sensitive to rare
surnames.
Observed random isonymy within and among the four study towns is reported in
table 5. Within-town values are larger than among-town values, as expected from geo-
graphic subdivision. The random isonymy values expected from equal sampling of
long-distance migrants are also reported. The observed mean within-town isonymy
(0.0076) is slightly higher than the value expected under proportional long-distance
migration (0.0073). The observed mean among-town isonymy (0.0029) is slightly lower
than the value expected under proportional long-distance migration (0.0030). Overall,
the two sets of estimates are almost identical and are not significantly different (paired
variates t-test: t = 0.808, d.f. = 9, P = 0.440).
Table 5. Random isonymy within and among study towns: observed values and expected values given a
random sample of long-distance migrants.
                          Observed    Expected
                          random      random
                Towns     isonymy     isonymy
Within-towns    1         0.0081      0.0079
                2         0.0096      0.0093
                3         0.0071      0.0067
                4         0.0054      0.0053
Among-towns     1,2       0.0041      0.0041
                1,3       0.0027      0.0026
                1,4       0.0023      0.0024
                2,3       0.0024      0.0025
                2,4       0.0036      0.0037
                3,4       0.0023      0.0025
Sample sizes: town 1 = 896, town 2 = 1609, town 3 = 2063, town 4 = 2455.
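The means and the paired comparison of observed and expected isonymy can be sketched from the ten values in table 5. Because the tabulated values are rounded to four decimal places, a t statistic computed from them should not be expected to reproduce the published figure exactly; the helper name `paired_t` is illustrative.

```python
import math
from statistics import mean, stdev

# Table 5 values in row order: four within-town, then six among-town pairs.
observed = [0.0081, 0.0096, 0.0071, 0.0054,
            0.0041, 0.0027, 0.0023, 0.0024, 0.0036, 0.0023]
expected = [0.0079, 0.0093, 0.0067, 0.0053,
            0.0041, 0.0026, 0.0024, 0.0025, 0.0037, 0.0025]


def paired_t(x, y):
    """Paired-variates t statistic and its degrees of freedom."""
    differences = [a - b for a, b in zip(x, y)]
    n = len(differences)
    standard_error = stdev(differences) / math.sqrt(n)
    return mean(differences) / standard_error, n - 1


t_stat, df = paired_t(observed, expected)
print(f"mean within-town: observed {mean(observed[:4]):.5f}, "
      f"expected {mean(expected[:4]):.5f}")
print(f"mean among-town:  observed {mean(observed[4:]):.5f}, "
      f"expected {mean(expected[4:]):.5f}")
print(f"t = {t_stat:.3f}, d.f. = {df}")
```

From the rounded values the statistic comes out near 0.97 with 9 degrees of freedom, somewhat above the published t = 0.808. The gap is plausibly due to rounding in the table, and the conclusion of no significant difference is unchanged.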
Bhatia and Wilson's (1981) measure of surname diversity was computed for each
town using both the observed surname distributions and those expected from equal
sampling of long-distance migrants. Both values are reported in table 6. Both observed
and expected values show a correspondence with total population size (table 1); the
larger the population, the greater the surname diversity. For all four towns, the
observed surname diversity is slightly less than that expected from equal sampling of
long-distance migrants. This difference is only of the order of 2%, indicating little
change in surname diversity due to non-random sampling of long-distance migrants.
Table 6. Surname diversity (H'i(S)) within study
towns: observed values and expected values given a
random sample of long-distance migrants.
Town    Observed value    Expected value
1       2.235             2.283
2       2.252             2.302
3       2.407             2.460
4       2.516             2.548
Sample sizes are from table 5.
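The size of the difference between observed and expected surname diversity can be checked directly from table 6. The sketch below expresses each town's shortfall as a fraction of the expected value; the variable names are illustrative.

```python
# Observed and expected surname diversity by town, from table 6.
observed = {1: 2.235, 2: 2.252, 3: 2.407, 4: 2.516}
expected = {1: 2.283, 2: 2.302, 3: 2.460, 4: 2.548}

# Shortfall of the observed value relative to the expected value.
shortfall = {
    town: (expected[town] - observed[town]) / expected[town]
    for town in observed
}

for town, fraction in shortfall.items():
    print(f"town {town}: {100 * fraction:.1f}% below expected")
```

All four shortfalls lie between roughly 1% and just over 2%, in line with the statement that the difference is of the order of 2%.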
4. Discussion
Migration matrix analysis is a powerful tool in the study of the genetic structure of
human populations. Criticisms of the method focus on violation of assumptions.
Empirical tests can be used to determine the robustness of these assumptions (e.g.,
Wagener 1973, Wood 1977, 1986).
Of the data required by migration matrix analysis, the vector of long-distance
migration is often problematical, since the genetic nature of the long-distance migrants
is most frequently unknown. After all, the definition of local region and long distance is
often made on the basis of identifying populations for which data are not available!
rant samples appears related to population size, the number of migrants, and the
number of origin towns. The larger the population, the greater the number of migrants
and the greater the number of origin towns. The increase in surname diversity in towns
with more source populations is expected. While this trend is found for Bhatia and
Wilson's measure it is not found for random within-group isonymy, which is also a
measure of surname diversity. This difference might reflect the fact that the latter mea-
sure is more sensitive to rare names. Also, neutral genes (or equivalent measures such as
surnames) are more likely to become fixed in small populations (Kimura and Ohta
1971). Under such a situation new surnames introduced by migration might reach
higher frequencies, and thus increase diversity in small samples. The complex interplay
of possibilities is not open to analysis with the present data-set. In any case, the low
level of subdivision among migrant pools suggests any such effects are minimal in this
study.
There is little effect of differential long-distance migration on the pattern of genetic
similarity within and among the study populations, as judged from isonymy (table 5).
Differential long-distance migration acts to increase within-town similarity and
decrease among-town similarity by a small factor, but not to a significant extent. The
observed pattern is virtually identical with the pattern expected under random sampling
of all long-distance migrants. An earlier analysis of these data (Relethford 1986c)
showed high similarity between migration matrix and isonymy analyses, providing
further confirmation of the correspondence between observed and expected patterns of
variation.
Analysis of surname diversity within populations using Bhatia and Wilson's (1981)
measure shows similar results. The diversity measures of observed and expected
patterns of surname diversity (table 6) are in close agreement. The observed
measures are slightly less than those expected from a random sample of long-distance
migrants, which is expected from Harpending and Ward's (1982) model of hetero-
geneity in source populations. They predict that populations will show less heterozygosity
than under a condition of equal migration from a homogeneous outside world.
However, the measurement of surname diversity from random isonymy within popu-
lations (table 5) shows the opposite pattern; the observed measures show greater hetero-
geneity than the expected measures. This difference might reflect sensitivity to rare
surnames, or the fact that Harpending and Ward's models consider two hypothetical
cases: different migration rates from the same outside source and similar migration
rates from different outside sources. In the present study there appear to be different
rates of migration from different outside sources. Again, however, the differences
between observed and expected are small and non-significant.
The results in the present study are in agreement with Wagener's (1973) finding that
differential origins of long-distance migrants have little effect on genetic estimates, even
though our data and methodologies are quite different. Wagener predicted that a
significant effect of long-distance migration might be found in two cases: (1) small levels
of internal migration, and (2) highly variable gene frequencies among sections of the
outside world. The present study is characterized by rather low rates of inter-town
migration (Relethford 1986c) and there is still little effect. This lack of effect is most
likely due to the minimal differences in surname distributions among the long-distance
migrant samples. Surname homogeneity could reflect common ancestry among origin
towns and/or a reduction of diversity due to multiple origins of the same surname.
Gene-frequency data might show a different pattern, but of course are not available
from historical data.
Acknowledgements
I thank Dr Denise Hodges, Cheryl Hopkins and Donna Smethurst for their assist-
ance in data collection and data processing. I am also extremely grateful for the valu-
able comments provided by the two anonymous referees. This research was supported
in part by State University of New York Research Foundation Grant No. 227-7177A
and a Walter B. Ford Grant from the State University of New York College at Oneonta.
References
BHATIA, K., and WILSON, S. R., 1981, The application of gene diversity analyses to surname diversity data.
Journal of Theoretical Biology, 88, 121-133.
BODMER, W. F., and CAVALLI-SFORZA, L. L., 1968, A migration matrix model for the study of random
genetic drift. Genetics, 59, 565-592.
FIX, A. G., 1978, The role of kin-structured migration in genetic microdifferentiation. Annals of Human
Genetics, 41, 329-339.
FUSTER, V., 1986, Relationship by isonymy and migration pattern in northwest Spain. Human Biology, 58,
391-406.
HARPENDING, H. C., and WARD, R. H., 1982, Chemical systematics and human populations. In
Biochemical Aspects of Evolutionary Biology, edited by M. Nitecki (Chicago: University of
Chicago Press), p. 213.
IMAIZUMI, Y., MORTON, N. E., and HARRIS, D. E., 1970, Isolation by distance in artificial populations.
Genetics, 66, 569-582.
JORDE, L. B., 1980, The genetic structure of subdivided human populations. In Current Developments in
Anthropological Genetics, Volume 1: Theory and Methods, edited by J. H. Mielke and M. H.
Crawford (New York: Plenum Press), p. 135.
KIMURA, M., and OHTA, T., 1971, Theoretical Aspects of Population Genetics (Princeton: Princeton
University Press).
KRAMER, P. L., 1981, The non-Markovian nature of migration: a case study in the Aland Islands, Finland.
WAGENER, D. K., 1973, An extension of migration matrix analysis to account for differential immigration
from the outside world. American Journal of Human Genetics, 25, 46-56.
WARD, R. H., and NEEL, J. V., 1970, Gene frequencies and microdifferentiation among the Makiritare
Indians. IV. Comparison of a genetic network with ethnohistory and migration matrices; a new
index of genetic isolation. American Journal of Human Genetics, 22, 538-561.
WOOD, J. W., 1977, A stability test for migration matrix models of genetic differentiation. Human Biology,
49, 309-320.
WOOD, J. W., 1986, Convergence of genetic distances in a migration matrix model. American Journal of
Physical Anthropology, 71, 209-219.
Summary. One of the prior assumptions made by methods that use migration matrices to study
population structure is that 'long-distance' immigrants are all drawn from a genetically homogeneous
'outside world'. This assumption has not often been discussed. This article examines migration and
surname data from four towns in historical Massachusetts in order to test this assumption and to
assess the potential genetic effects of heterogeneous long-distance migration. Analysis of the
migration data shows that the rate of long-distance immigration differs significantly among the four
towns. The distributions of the source populations of these migrants also differ significantly among
the four towns. Analysis of surnames shows that, although the assumption of homogeneity of
long-distance immigrants is violated, this has little effect on the degree and pattern of among-group
and within-group variation in these towns. This lack of effect appears to be related to the genetic
homogeneity of the long-distance immigrants.