
American Journal of Medical Genetics 63:386-391 (1996)

Estimating Parental Relationship in Linkage Analysis of Recessive Traits
Chantal Mérette and Jurg Ott
Centre de Recherche Université Laval Robert-Giffard, Beauport, Quebec, Canada (C.M.); and Department of Psychiatry, Columbia University, and New York State Psychiatric Institute, New York, New York (J.O.)

In linkage analysis of recessive traits, parental relationship is important. For the case that it is unknown, the question is investigated as to whether estimating parental relationship and using the estimated relationship in linkage analysis is beneficial. Results show that estimating parental relationship can reliably be carried out on the basis of 50-100 genetic marker loci (analysis based on theory by Thompson [1975: Ann Hum Genet 39:173-188]). Misspecification of parental relationship leads to a loss of linkage informativeness, but not to false-positive evidence for linkage. An asymptotic bias in the recombination fraction estimate occurs when parents are unrelated and falsely taken to be related, but no such bias is seen when related parents are taken to be unrelated. Results from this investigation suggest that an estimated parental relationship may be used in linkage analysis as if it were the correct relationship, when evidence for the estimated relationship is supported by a likelihood ratio of at least 10:1 against the parents being unrelated. © 1996 Wiley-Liss, Inc.

KEY WORDS: linkage analysis, recessive disease, identity by descent

Received for publication May 5, 1995; revision received October 27, 1995.
Address reprint requests to Dr. Jurg Ott, New York Psychiatric Institute, Unit 58, 722 West 168th St., New York, NY 10032.

INTRODUCTION
For recessive diseases, it is well-known that inbred matings potentially provide much more information for linkage than noninbred matings [Smith, 1953; Lander and Botstein, 1986]. For this reason, families are sometimes collected in countries where it is relatively frequent for parents to be related. In many cases, however, the relationship between the parents is uncertain or unknown. Then, linkage analysis is typically carried out under the assumption that the parents are unrelated, which might unnecessarily reduce power if in fact the parents are related. This led us to consider a novel approach, i.e., to obtain an estimate for the parental relationship and use that estimated relationship in linkage analysis as if it were known.
Below, we investigate statistical properties of the proposed approach. Our investigation falls into two separate components: 1) determining the effect of misspecifying parental relationship in linkage analysis of recessive traits, and 2) estimating the relationship between two individuals. The latter question has been studied on the basis of genetic marker data [Thompson, 1975, 1986, 1991] and DNA fingerprinting [Chakraborty and Jin, 1993]. We will be concentrating on the marker-based approach.
Because an analytical investigation of these questions does not appear feasible, we employed computer simulation (Monte Carlo methods) to study properties of the proposed methods. We generate family data on the computer, i.e., under known conditions, and apply our methods to the generated family data. This allows us, for example, to tell with what probability our methods come to a correct or wrong conclusion. Such computer simulation methods belong to the standard repertoire of statistical geneticists.

MATERIALS AND METHODS
Misspecification of Parental Relationship
To study the effects of misspecifying parental relationship on linkage analysis of a recessive trait, we focused on two cases: 1) parents unrelated, and 2) parents as first cousins. Investigations were carried out on the two pedigree structures shown in Figure 1. PED1 refers to the individuals within the dotted line who form a nuclear family with 2 unrelated parents and 4 offspring, 3 affected and 1 unaffected, with marker information available on all 6 individuals. PED2 refers to the complete four-generation pedigree, where ancestors above the parents are deceased and thus unavailable for marker typing, whereas genetic marker information is available for the other individuals. For the PED2 pedigree structure, the parents of the 4 siblings are first cousins. For linkage analysis purposes, PED2 differs from PED1 only in the relationship of the parents of the 4 siblings.

Fig. 1. Pedigree structures used for computer simulation. PED1 is limited to a nuclear family with unrelated parents, whereas PED2 incorporates grandparents and great grandparents, thus showing that the parents are cousins.

To study the effects of making correct or incorrect assumptions on the relationship of the 2 parents, we considered all four possible combinations between the two true situations, i.e., parents are in reality cousins or unrelated, and linkage analysis being carried out under the assumption that parents are cousins or unrelated. For each of the two true situations (PED1, parents unrelated; PED2, parents are cousins), for a single genetic marker locus with four equally frequent alleles (heterozygosity of 0.75), we generated marker genotypes by computer simulation using the SLINK computer program [Weeks et al., 1990]. Simulations were carried out for the following (true) recombination fractions: r = 0, 0.02, 0.04, 0.10, and 0.50. For r < 0.50, 500 replicates of the corresponding pedigree structures and marker genotypes were generated; for r = 0.50 (no linkage), 2,000 replicates were generated. Typically, each replicate contained only a single pedigree; for r = 0.02, replicates with n = 3, 5, and 10 pedigrees each were generated.
Each set of replicates was analyzed under each of two assumptions on the relationship between the 2 parents (unrelated, cousins) to learn of the effect of an incorrect assumption on parental relationship. For pedigree data generated under linkage (r < 0.50), in each replicate, lod scores were calculated at assumed (formal) recombination fractions, $\theta$, ranging from 0-0.40 in steps of 0.02. At each value of $\theta$, lod scores were averaged over all replicates, leading to an approximation of the expected lod score at that $\theta$ value. The maximum of these expected lod scores, the MELOD (a customary abbreviation for maximum of expected LOD score), was recorded as well as the $\theta$ value, $\hat{\theta}$, at which it occurred. The difference, $\hat{\theta} - r$, approximates the asymptotic bias in the estimate of the recombination fraction. Also, in each replicate, the observed maximum lod score, $Z_{max}$, over all $\theta$ values was recorded. Power and significance levels were estimated as the proportion of replicates in which $Z_{max}$ exceeded some threshold, c. For power calculations (r < 0.50), we used c = 3, whereas significance levels (r = 0.50) were determined with respect to c = 0.5, 1.0, and 1.5.

Estimating Relationship
It is intuitively clear that 2 related individuals must have similar genotypes. Thus, estimation of an unknown relationship between 2 individuals (no relatives of either individual known) may be based on their genotypes at marker loci. Below, our approach is based on the theory developed by Thompson [1975, 1986, 1991], who derived the likelihood for the single-locus genotypes of 2 individuals of a given relationship.
For multiple unlinked marker loci, the total likelihood is simply the product over all single-locus likelihoods. For numbers of marker loci typically available in current human marker typing, we want to determine how reliably the relationship between 2 individuals can be estimated.
Consider a single genetic marker with a number of alleles, where its j-th allele, $a_j$, has population frequency, $p_j$. For a pair of individuals with a given relationship, assume, for example, genotype $a_u a_v$ for individual 1 and genotype $a_v a_w$ for individual 2. The likelihood for the observed genotypes is then simply the probability of occurrence, $P(a_u a_v, a_v a_w)$. This probability may be evaluated as follows: let $q_i$ be the conditional probability of occurrence of the two genotypes, given that the individuals share i alleles identical by descent (IBD), i = 0, 1, or 2. Thompson [1991] provided a table of such conditional probabilities for all possible pairwise genotypes at a locus, where these probabilities depend only on IBD sharing and not on the relationship between 2 individuals. For example, when 2 individuals share no alleles IBD (i = 0), we have $q_0 = 4 p_u p_v^2 p_w$. Further, $q_1 = p_u p_v p_w$ and $q_2 = 0$. Now, let $k_i$ be the conditional probability that two individuals of a given relationship share i alleles IBD. Then, the joint probability of occurrence of the two genotypes for individuals of a given relationship is given by $k_0 q_0 + k_1 q_1 + k_2 q_2$, which is the conditional single-locus likelihood, given some relationship between the two individuals. For observed genotypes at multiple unlinked loci (we do not consider linked loci), the corresponding likelihoods are simply multiplied. The relationship between two individuals is now estimated by calculating the likelihood under a number of assumed relationships. The relationship with the highest likelihood is the estimated relationship.
In this approach, the relationship between two individuals is characterized by its associated IBD probabilities, $k_i$. Thus, relationships with identical IBD probabilities are indistinguishable, e.g., uncle-niece, grandparent-grandchild, and half-sibs [Thompson, 1986].
We investigated six different relationships between 2 individuals. Figure 2 shows a pedigree with 10 individuals containing examples of each of the relationships considered. Table I (based on Table I in Thompson [1991]) lists these relationships and their corresponding well-known IBD probabilities. Statistical properties of the maximum likelihood estimation procedure were determined by computer simulation as follows: using the SIMULATE computer program [Ott and Terwilliger, 1992], for each of the six (true) relationships of 2 individuals, we generated 500 replicates of genotypes at m unlinked marker loci, with each locus having 10 equally frequent alleles (heterozygosity of 0.90). For each set of 500 replicates, log likelihoods were calculated in six different ways, each time under the assumption of a different one of the six relationships considered here. These analyses were carried out for m = 50 and m = 100 markers. The latter number represents the maximum number of approximately unlinked marker loci over the human genome if we assume that, with a genome length of 40 Morgans, two markers are unlinked when they are at least 40 centimorgans apart.

Fig. 2. Pedigree with ten members showing relationships listed in Table I.

TABLE I. IBD Probabilities, $k_i$, for Two Individuals With Selected Relationships*

Relationship                  Examples   $k_0$   $k_1$   $k_2$
Parent-offspring              5, 8       0       1       0
Full sibs                     4, 5       1/4     1/2     1/4
Uncle-niece†                  4, 8       1/2     1/2     0
First cousins                 7, 8       3/4     1/4     0
First cousins once removed    7, 10      7/8     1/8     0
Unrelated                     3, 6       1       0       0

*Examples refer to numbered individuals shown in Figure 2.
†Also half-sibs, grandparent-grandchild.

RESULTS
Misspecifications of Parental Relationship
Parents are cousins. Table II shows results of the computer simulation for two true values of the recombination fraction, r = 0 and r = 0.02 (results for other values are referred to in the text). As Table II shows, the maximum expected lod score (MELOD) was always observed at the true recombination fraction, whether or not the analysis was performed under the correct familial relationship. This was the case for any of the recombination fractions, r = 0, 0.02, 0.04, 0.10 or 0.50 (results shown only for r = 0 and r = 0.02), i.e., ignoring in the analysis that parents are cousins and carrying out an analysis under the assumption of unrelated parents does not lead to an asymptotic bias. Not unexpectedly, however, there is a drop in the expected lod score when the analysis disregards an existing parental relationship. For example, at r = 0, the MELOD was 1.45 when the data were analyzed with the true parental relationship (first cousins). It decreased to 1.02 when in the analysis the parents were assumed unrelated, which represents a loss of 30% in informativeness for linkage. For the data simulated under r = 0.02, 0.04, and 0.10, the decrease in the MELOD was again close to 30% (results for r > 0.02 not shown).

TABLE II. MELOD and Recombination Fraction, $\theta$, at Which the Maximum Occurs, for True and Assumed Relationships of Parents (Alive) in Figure 1*

                                  Assumed relationship
                               Cousins                  Unrelated
True relationship    r         MELOD   $\hat{\theta}$   MELOD   $\hat{\theta}$
Cousins              0         1.45    0                1.02    0
Unrelated            0         0.55    0.08             0.95    0
Cousins              0.02      1.20    0.02             0.89    0.02
Unrelated            0.02      0.49    0.10             0.74    0.02

*r is the true recombination fraction.

Parents are unrelated. In contrast to the results discussed above, falsely assuming that the parents are cousins while in fact they are unrelated leads to an asymptotic overestimation of the recombination fraction. For example, with r = 0, we find $\hat{\theta} = 0.08$. In addition, at r = 0, the MELOD drops from 0.95 (parents correctly treated as unrelated) to 0.55 (parents falsely assumed to be cousins), which represents a loss of 42%. For increased values of r, both asymptotic bias and drop in MELOD persist but tend to become less pronounced (results shown only for r = 0 and r = 0.02).
As Table II shows, analysis under the correct relationship is always better than under an incorrect relationship. In other words, analysis under an incorrect relationship does not tend to lead to inflated evidence for linkage; if anything, it may mask an existing linkage.
No linkage. Table III presents estimates of significance levels associated with critical maximum lod scores, $Z_{max} \geq c$, c = 0.5, 1.0, and 1.5 (2,000 replicates). For all cases shown, significance levels are smaller when assumed parental relationship is different from true relationship, i.e., assuming a wrong parental relationship for analysis purposes does not lead to false-positive evidence for linkage.
Power calculations. Table IV shows the results of power analyses for n = 3, 5, and 10 families of the structures given in Figure 1. When parents are first cousins, analysis under correct parental relationship yields power values of 0.78, 0.98, and 1.00, respectively. With an analysis assuming the parents to be unrelated even though they are in fact cousins, the corresponding power values are 0.54, 0.90, and 1.00. Thus, there is a

TABLE III. Significance Levels (r = 0.50) for True and Assumed Relationships of Parents (Alive) in Figure 1

                          Assumed relationship
                     Cousins                  Unrelated
True                 lod score threshold      lod score threshold
relationship         0.5     1.0     1.5      0.5     1.0     1.5
Cousins              0.082   0.025   0.010    0.077   0.024   0
Unrelated            0.069   0.016   0.003    0.079   0.020   0
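
The summary statistics behind Tables II-IV (MELOD, the $\theta$ value at which it occurs, and the proportion of replicates whose maximum lod score reaches a threshold) are simple functions of the per-replicate lod scores. The following Python sketch illustrates that bookkeeping only. It assumes the lod scores have already been computed elsewhere (in the study, by linkage analysis of SLINK-generated replicates) and are supplied as one list per replicate over the grid of assumed recombination fractions. The function names `melod`, `exceedance_rate`, and `relative_loss` and the toy lod scores are hypothetical and are not taken from the original work.

```python
def melod(lods, thetas):
    """Average lod scores over replicates at each theta and return
    (maximum expected lod score, theta at which it occurs)."""
    n = len(lods)
    expected = [sum(rep[j] for rep in lods) / n for j in range(len(thetas))]
    j_best = max(range(len(thetas)), key=lambda j: expected[j])
    return expected[j_best], thetas[j_best]


def exceedance_rate(lods, c):
    """Proportion of replicates whose maximum lod score Zmax is >= c.
    Under linkage this approximates power; under r = 0.50 it
    approximates the significance level."""
    return sum(max(rep) >= c for rep in lods) / len(lods)


def relative_loss(melod_correct, melod_wrong):
    """Fractional drop in MELOD when the wrong relationship is assumed."""
    return (melod_correct - melod_wrong) / melod_correct


# Toy input (assumption): 3 replicates, lod scores at theta = 0, 0.02, 0.04.
thetas = [0.0, 0.02, 0.04]
lods = [[0.5, 0.3, 0.1],
        [1.5, 1.1, 0.2],
        [-0.5, 0.4, 0.3]]

value, theta_hat = melod(lods, thetas)
true_r = 0.0
print("MELOD:", round(value, 3), "at theta:", theta_hat)
print("approximate asymptotic bias:", theta_hat - true_r)
print("P(Zmax >= 0.5):", round(exceedance_rate(lods, 0.5), 3))

# Losses quoted in the text for r = 0, recomputed from Table II.
print("cousins analyzed as unrelated:", round(relative_loss(1.45, 1.02), 2))
print("unrelated analyzed as cousins:", round(relative_loss(0.95, 0.55), 2))
```

With the Table II values, `relative_loss` reproduces the losses of about 30% and 42% quoted in the text for r = 0.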

clear tendency for a decrease in power when parental relationship is not taken into account in the analysis. An analogous, but more pronounced, drop in power occurs when unrelated parents are falsely taken to be first cousins. These results confirm the tendencies seen above, i.e., that misspecification of parental relationship does not inflate evidence for linkage.

TABLE IV. Power ($P[Z_{max} \geq 3]$) for Detecting Linkage*

                          Assumed relationship
                     Cousins                  Unrelated
True                 Number of families       Number of families
relationship         3       5       10       3       5       10
Cousins              0.78    0.98    1.00     0.54    0.90    1.00
Unrelated            0.05    0.26    0.73     0.36    0.81    0.99

*r = 0.02 between disease and marker.

Estimating Relationship
As mentioned above, for a pair of individuals, marker data were generated under each of six relationships, $R_t$ ("true relationship"). For each of the resulting 500 sets (replicates) of simulated data, the likelihood was calculated six times, each time assuming a different one of the six relationships ("stated relationship"), and the maximum likelihood estimate, $\hat{R}$, of the relationship was determined as that relationship with the highest associated likelihood. For each combination of true and stated relationships, the following two statistics were then calculated: $S_1$ was the probability that the stated relationship has the highest likelihood of all relationships investigated. This probability was approximated as the proportion of replicates in which the likelihood for the stated relationship was highest among all relationships considered. The second statistic, $S_2$, approximated the probability that the likelihood ratio, $\lambda$ = L(stated R)/L(R = unrelated), is $\geq 10$. Thus, with 10 as a cutoff point for the likelihood ratio, $S_2$ may be viewed as the power in a significance test of the null hypothesis, R = unrelated, when individuals are related, or as the significance level when the true relationship is "unrelated," against the hypothesis, R = stated relationship.
As Table V shows, when 2 individuals are unrelated, $S_2$ is at most equal to 0.01, i.e., it is rare that some relationship is inferred (cutoff point, LR = 10) when in fact individuals are unrelated. Simulations were also carried out with a cutoff point of LR = 2.7 (corresponding to a difference in ln likelihood of 1; results not shown), but then falsely inferring a relationship occurred in up to 9% of the replicates.
When 2 individuals are siblings, their estimated relationship was always the true relationship. On the other hand, when a cutoff point of LR = 10 was applied for declaring a relationship accepted, no less than four relationships (sibs, uncle-niece, first cousins, and first cousins once removed) fulfilled this criterion. Thus, these relationships are difficult to distinguish for 2 siblings. This situation was the same for 50 or 100 marker loci.
For true first cousins, discriminating power among different relationships was better than for siblings. Only two relationships (first cousins, first cousins once removed) were detected with a power exceeding 50%. With 100 markers, power was estimated to be somewhat larger for first cousins once removed than for cousins, which, presumably, is due to a random fluctuation.

DISCUSSION
The main purpose of this investigation was to learn whether estimating an unknown parental relationship and using the estimated relationship in a linkage analysis of recessive traits might be beneficial. The answer depends on the true parental relationship. Assumption of a false parental relationship carries a penalty, in terms of both the maximum expected lod score (MELOD) (Table II), and power for detecting linkage (Table IV). The resulting loss of expected lod score is more serious when individuals are unrelated than when they are, for example, first cousins. Consequently, if estimated relationships are planned to be used in a linkage analysis, it is prudent to be conservative in declaring individuals to be related, for example, by applying a cutoff criterion of LR = 10, as in Table V, for the $S_2$ statistic.
Consider, for simplicity, the case that 2 parents are either first cousins, C, or unrelated, U. If they are C then, even with only 50 markers, the probability is 65% that this relationship will be detected (with the conservative criterion, LR $\geq$ 10). There is virtually no chance that they will be declared U (but they may be mistaken for having a relationship resembling C). In this case, estimating parental relationships is clearly beneficial. If the parents are U, there is little chance that they will mistakenly be called C. Thus, we conclude that it is generally useful to estimate the relationship between

TABLE V. Resulting Statistics, $S_1$ and $S_2$, for Estimating Relationship Between Two Individuals*

                                                      Stated relationship, $R_s$
True relationship,                       Parent-     Full    Uncle-   First     First cousins
$R_t$                         Statistic  offspring   sibs    niece    cousins   once removed    Unrelated

m = 50 markers
Parent-offspring              $S_1$      1.00        0.00    0.00     0.00      0.00            0.00
                              $S_2$      1.00        1.00    1.00     1.00      1.00            0.00
Full sibs                     $S_1$      0.00        1.00    0.00     0.00      0.00            0.00
                              $S_2$      0.00        1.00    1.00     1.00      1.00            0.00
Uncle-niece                   $S_1$      0.00        0.00    0.88     0.12      0.00            0.00
                              $S_2$      0.00        0.46    0.98     0.99      0.99            0.00
First cousins                 $S_1$      0.00        0.00    0.11     0.61      0.25            0.03
                              $S_2$      0.00        0.00    0.38     0.65      0.54            0.00
First cousins once removed    $S_1$      0.00        0.00    0.00     0.26      0.45            0.29
                              $S_2$      0.00        0.00    0.06     0.19      0.13            0.00
Unrelated                     $S_1$      0.00        0.00    0.00     0.02      0.22            0.76
                              $S_2$      0.00        0.00    0.00     0.01      0.00            0.00

m = 100 markers
Parent-offspring              $S_1$      1.00        0.00    0.00     0.00      0.00            0.00
                              $S_2$      1.00        1.00    1.00     1.00      1.00            0.00
Full sibs                     $S_1$      0.00        1.00    0.00     0.00      0.00            0.00
                              $S_2$      0.00        1.00    1.00     1.00      1.00            0.00
Uncle-niece                   $S_1$      0.00        0.00    0.95     0.05      0.00            0.00
                              $S_2$      0.00        0.48    1.00     1.00      1.00            0.00
First cousins                 $S_1$      0.00        0.00    0.05     0.75      0.19            0.01
                              $S_2$      0.00        0.00    0.38     0.87      0.91            0.00
First cousins once removed    $S_1$      0.00        0.00    0.00     0.19      0.59            0.22
                              $S_2$      0.00        0.00    0.03     0.26      0.35            0.00
Unrelated                     $S_1$      0.00        0.00    0.00     0.00      0.17            0.83
                              $S_2$      0.00        0.00    0.00     0.01      0.01            0.00

*$R_t$, true relationship; $R_s$, stated relationship; $\hat{R}$, estimated relationship; $S_1$, $P(\hat{R} = R_s)$; $S_2$, $P(L[R_s]/L[\text{unrelated}] \geq 10)$.
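
The likelihood calculation described under Estimating Relationship, and the two statistics reported in Table V, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the study generated genotypes with the SIMULATE program, whereas the sketch uses Python's `random` module. All function names are hypothetical, and every locus is assumed to share the same allele frequencies. The IBD probabilities are those of Table I. The conditional probabilities $q_i$ are computed from Hardy-Weinberg proportions for unordered genotypes, which reproduces the example in the text ($q_0 = 4 p_u p_v^2 p_w$, $q_1 = p_u p_v p_w$, $q_2 = 0$).

```python
import math
import random

# IBD probabilities (k0, k1, k2) from Table I.
RELATIONSHIPS = {
    "parent-offspring": (0.0, 1.0, 0.0),
    "full sibs": (0.25, 0.5, 0.25),
    "uncle-niece": (0.5, 0.5, 0.0),
    "first cousins": (0.75, 0.25, 0.0),
    "first cousins once removed": (0.875, 0.125, 0.0),
    "unrelated": (1.0, 0.0, 0.0),
}


def genotype_prob(g, freq):
    """Hardy-Weinberg probability of the unordered genotype g = (a, b)."""
    a, b = g
    return freq[a] ** 2 if a == b else 2 * freq[a] * freq[b]


def partner_prob(g, shared, freq):
    """Probability that the non-shared allele completes genotype g."""
    a, b = g
    if shared == a:
        return freq[b]
    if shared == b:
        return freq[a]
    return 0.0


def ibd_conditionals(g1, g2, freq):
    """Return (q0, q1, q2), with q_i = P(g1, g2 | i alleles shared IBD)."""
    q0 = genotype_prob(g1, freq) * genotype_prob(g2, freq)
    q1 = sum(freq[s] * partner_prob(g1, s, freq) * partner_prob(g2, s, freq)
             for s in freq)
    q2 = genotype_prob(g1, freq) if sorted(g1) == sorted(g2) else 0.0
    return q0, q1, q2


def log_likelihoods(pairs, freq):
    """Log likelihood of each relationship: sum over unlinked loci of
    log(k0*q0 + k1*q1 + k2*q2)."""
    result = {}
    for name, k in RELATIONSHIPS.items():
        total = 0.0
        for g1, g2 in pairs:
            lik = sum(ki * qi
                      for ki, qi in zip(k, ibd_conditionals(g1, g2, freq)))
            total = -math.inf if lik == 0.0 else total + math.log(lik)
        result[name] = total
    return result


def estimate_relationship(pairs, freq, cutoff=10.0):
    """Return (ML relationship, relationship accepted at LR >= cutoff)."""
    logl = log_likelihoods(pairs, freq)
    best = max(logl, key=logl.get)
    accepted = best if logl[best] - logl["unrelated"] >= math.log(cutoff) \
        else "unrelated"
    return best, accepted


def simulate_pair(k, freq, rng):
    """Draw genotypes of two individuals at one locus given (k0, k1, k2)."""
    alleles, weights = list(freq), list(freq.values())

    def draw():
        return rng.choices(alleles, weights)[0]

    n_ibd = rng.choices((0, 1, 2), weights=k)[0]
    g1 = (draw(), draw())
    if n_ibd == 2:
        return g1, g1
    if n_ibd == 1:
        return g1, (g1[rng.randrange(2)], draw())
    return g1, (draw(), draw())


def s_statistics(true_name, stated_name, m=50, replicates=100, seed=1):
    """Monte Carlo approximations of S1 and S2 for one table cell."""
    rng = random.Random(seed)
    freq = {allele: 0.1 for allele in range(10)}  # 10 equally frequent alleles
    s1 = s2 = 0
    for _ in range(replicates):
        pairs = [simulate_pair(RELATIONSHIPS[true_name], freq, rng)
                 for _ in range(m)]
        logl = log_likelihoods(pairs, freq)
        s1 += max(logl, key=logl.get) == stated_name
        s2 += logl[stated_name] - logl["unrelated"] >= math.log(10.0)
    return s1 / replicates, s2 / replicates


print(s_statistics("first cousins", "first cousins"))
```

The second value returned by `estimate_relationship` applies the conservative rule of the text: the estimated relationship is used only when its likelihood is at least 10 times that for unrelated individuals. Because the replicate count and random number generator differ from the study, the simulated proportions will only roughly resemble the entries of Table V.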

parents whenever it is in any way doubtful that they are unrelated. To localize the gene for a recessive disease on the human genome, we recommend the following steps, assuming that nuclear families are investigated:

1) With a number of unlinked markers, estimate the relationship between the parents.
2) For analysis purposes, if the estimated relationship has a likelihood at least 10 times that for unrelated parents, make up relatives of the parents such that they show the estimated relationship. If the likelihood ratio is <10, treat the parents as being unrelated.
3) Carry out linkage analyses using the modified family tree, i.e., with the estimated relationship between the parents.

Typically, model misspecifications lead to an asymptotic bias in the recombination fraction estimate [Ott, 1991]. As is seen above, when parents are cousins but in the analysis assumed to be unrelated, the recombination fraction estimate is still asymptotically unbiased. An explanation for this phenomenon has been given previously (section 10.6 in Ott [1991]): when parental phase probabilities are different from 1/2 but in the analysis assumed to be equal to 1/2, the resulting recombination fraction estimate was shown to be asymptotically unbiased.
For many populations, there are good records of the frequencies of various parental relationships. Consider, for example, the prior probability, p = P(C), that two parents are cousins. In our simplified situation, 1 - p = P(U) is then the prior probability that they are unrelated. Such priors may be used to obtain more precise estimates of parental relationship. If L(C) = P(data | C) is the likelihood for the marker data given that 2 individuals are cousins, and L(U) is the likelihood given that they are unrelated, then the posterior probability that they are cousins is given by

$$P(C \mid \text{data}) = \frac{p\,L(C)}{p\,L(C) + (1 - p)\,L(U)}.$$

This equation is easily extended to several relationships.
In the analysis of extended pedigrees, many errors in determining marker genotypes tend to be exposed as Mendelian incompatibilities. No such safeguards exist in the estimation of relationship between 2 individuals. Thus, marker errors should be kept to as low a level as possible. One might even consider building marker errors into the estimation procedure, but this is not pursued here. Similarly, errors in allele frequencies might influence the results of estimating parental relationships. However, as shown by Thompson [1975], such estimates are robust against changes in allele frequencies.

ACKNOWLEDGMENTS
We thank Dr. Linda Brzustowicz for approaching us with the question of how one might estimate relationship among 2 individuals, which led us to find the literature already available on this subject and eventually prompted us to investigate properties of assuming an estimated parental relationship in linkage analysis. This work was supported by grant HG00008 from the National Center for Human Genome Research, by the National Retinitis Pigmentosa Foundation, and by the Fonds de la Recherche en Sante de Quebec, Canada.

REFERENCES
Chakraborty R, Jin L (1993): Determination of relatedness between individuals using DNA fingerprinting. Hum Biol 65:875-895.
Lander ES, Botstein D (1986): Mapping complex genetic traits in humans: New methods using a complete RFLP linkage map. Cold Spring Harbor Symp Quant Biol 51:49-62.
Ott J (1991): "Analysis of Human Genetic Linkage." Baltimore: Johns Hopkins University Press, pp 217-227.
Ott J, Terwilliger JD (1992): Assessing the evidence for linkage in psychiatric genetics. In Mendlewicz J, Hippius H (eds): "Genetic Research in Psychiatry." New York: Springer-Verlag, pp 245-249.
Smith CAB (1953): The detection of linkage in human genetics. J R Stat Soc 15:153-184.
Thompson EA (1975): The estimation of pairwise relationships. Ann Hum Genet 39:173-188.
Thompson EA (1986): "Pedigree Analysis in Human Genetics." Baltimore: Johns Hopkins University Press, pp 47-55.
Thompson EA (1991): Estimation of relationships from genetic data. In Rao CR, Chakraborty R (eds): "Handbook of Statistics," Vol 8. New York: Elsevier, pp 255-269.
Weeks DE, Ott J, Lathrop GM (1990): SLINK: A general simulation program for linkage analysis. Am J Hum Genet 47:204.
