
Journal of General Virology (2002), 83, 2563–2573.

Printed in Great Britain


Evidence for diversifying selection in Potato virus Y and in the
coat protein of other potyviruses
Benoît Moury (1, 2), Caroline Morel (1), Elisabeth Johansen (2) and Mireille Jacquemond (1)
1 Station de Pathologie Végétale, Institut National de la Recherche Agronomique, F-84143 Montfavet cedex, France
2 Danish Institute of Agricultural Sciences, Biotechnology Group, Thorvaldsensvej 40 1, DK-1871 Frederiksberg C, Denmark
The modes of evolution of the proteins of Potato virus Y were investigated with a maximum-
likelihood method based on estimation of the ratio between non-synonymous and synonymous
substitution rates. Evidence for diversifying selection was obtained for the 6K2 protein (one amino
acid position) and coat protein (24 amino acid positions). Amino acid sites in the coat proteins of
other potyviruses (Bean yellow mosaic virus, Yam mosaic virus) were also found to be under
diversifying selection. Most of the sites belonged to the N-terminal domain, which is exposed to the
exterior of the virion particle. Several of these amino acid positions in the coat proteins were
shared between some of these three potyviruses. Identification of diversifying selection events in
these different proteins will help to unravel their biological functions and is essential to an
understanding of the evolutionary constraints exerted on the potyvirus genome. The hypothesis of
a link between evolutionary constraints due to host plants and occurrence of diversifying selection
is discussed.
Introduction
Potato virus Y (PVY) is the type member of the genus
Potyvirus in the family Potyviridae (Shukla et al., 1994). The
monopartite genome is composed of a single-stranded,
positive-sense RNA molecule of about 9.7 kb. During the
infection process, this RNA is translated into a large precursor
polyprotein that is cleaved co- and post-translationally into 10
mature proteins (Dougherty & Carrington, 1988; Riechmann
et al., 1992). PVY is the cause of major diseases in solanaceous
crops including potato, tobacco, pepper and tomato, and also
infects many solanaceous and non-solanaceous weeds. The
symptoms induced by PVY and its host range are highly
variable. These traits are used for the classification of PVY
strains into different groups. For example, strains from potato
are classified into three main subgroups (O, N and C) according
to symptomatology and serology (with monoclonal antibodies).
PVY subgroups obtained by phylogenetic analyses of
Author for correspondence: Benoît Moury (at Institut National de la
Recherche Agronomique). Fax +33 4 32 72 28 42.
e-mail moury@avignon.inra.fr
The EMBL accession numbers of the sequences reported in this paper
are AJ439544 and AJ439545.
sequence data correlate only partially with the biological traits
of the strains. No molecular determinants of symptomatology
or host range have yet been isolated, with the exception of the
NIa proteinase of PVY, required for elicitation of the Ry-
mediated resistance in potato (Mestre et al., 2000). Knowledge
of modes of evolution in the different genome regions and in
the different phylogenetic lineages would help to identify
symptomatology and host-range molecular determinants as
well as external (including plant hosts, vector insects and
physical environment) and internal (biological functions of
PVY proteins) evolutionary constraints exerted on PVY.
The ratio (ω) of non-synonymous (amino acid-altering) to
synonymous (silent) substitution rates provides an estimate of
the selective pressure at the protein level (Kimura, 1983). An
ω value > 1 means that non-synonymous mutations offer
fitness advantages to the protein and have higher fixation
probabilities than synonymous mutations (diversifying or
positive selection). On the other hand, ω values close to 0 mean
that the protein is essentially conserved at the amino acid level
(purifying or negative selection) and ω = 1 corresponds to
neutral evolution. Calculation of ω as a mean over all codons
in the gene and over the entire evolutionary time that
separates the sequences provides only limited information and
impedes the detection of diversifying selection events in many
cases. Indeed, many proteins appear to be under purifying
0001-8476 © 2002 SGM
selection most of the time (Li, 1997) and a large proportion of
their amino acids are largely invariable (with ω close to 0) due
to structural constraints. When knowledge of the functional
domains of the protein is not available, or when only a few
codons or a few lineages undergo diversifying selection, a
better approach is to devise statistical models that allow for
heterogeneous ω ratios among codons or among lineages
(Yang, 1998; Nielsen & Yang, 1998). This can be achieved by
maximum-likelihood methods (Yang, 1997; Yang et al., 2000).
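The limitation of a gene-wide mean can be illustrated with a small numerical sketch. The site classes below are hypothetical values chosen only for illustration (they are not estimates from this study): a small class of codons with ω > 1 is hidden by a mean ω that is well below 1.

```python
# Minimal sketch: a gene-wide mean omega can mask a small class of
# positively selected codons. All numbers are hypothetical.

def mean_omega(site_classes):
    """Return the proportion-weighted mean omega over site classes.

    site_classes: list of (proportion, omega) pairs whose proportions sum to 1.
    """
    total = sum(p for p, _ in site_classes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("site-class proportions must sum to 1")
    return sum(p * w for p, w in site_classes)


# Hypothetical classes: 80% strongly conserved, 15% weakly constrained,
# 5% under diversifying selection (omega > 1).
classes = [(0.80, 0.02), (0.15, 0.30), (0.05, 2.00)]

gene_wide = mean_omega(classes)
positively_selected = [(p, w) for p, w in classes if w > 1.0]

print(f"gene-wide mean omega = {gene_wide:.3f}")
print(f"classes with omega > 1: {positively_selected}")
```

In this toy case the mean suggests purifying selection, whereas a model with heterogeneous ω among codons would still expose the 5% class.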
Most of the PVY genomic sequences available are from
potato strains. To obtain more information about the diversity
and evolution of PVY, we have cloned and sequenced the
genomes of two PVY strains, one from tomato and the other
from pepper. Using a maximum-likelihood algorithm, we
looked for positive selection events in each of the PVY-
encoded proteins. Finally, we also examined the presence of
substitution saturation and the occurrence of recombination
events among the sequences, since these can interfere with the
detection of positive selection events. Amino acid sites in two
PVY proteins [the 6K2 and coat or capsid proteins (CP)] were
found to be under positive selection. We also examined the
CPs of other potyviruses to confirm our findings and to
evaluate the involvement of the CP in host adaptation.
Methods
Nucleotide sequences. PVY nucleotide sequences available in the
GenBank database were used (Table 1). Strain LYE84.2 was isolated from
Table 1. PVY sequences used for phylogenetic analyses
Sequences corresponding to accession numbers M95491, U09509, X12456 and X97895 and to strains SON41
(AJ439544) and LYE84.2 (AJ439545) are full-length and hence are common to all analyses. Accession numbers
or strain names of additional sequences are shown in the table.

Protein    Sequences      Accession numbers or names of strains*
           analysed (n)
P1         11             M37180, M38377, N-Nysa†, NN†, N-Wilga†
HC-Pro     8/12‡          AF166115, M37180, M38377, Z50041, Z50042, Z50043
P3         6
6K1        6
CI         8              S51663, S51664
6K2        8              D12539, Z29526
NIa VPg    8              D12539, Z29526
NIa Pro    7              D12539
NIb        9/11‡          AF062999, D12539, U06789, U09508, Y16879
CP         25             AF012026, AF012027, AF012028, AF012029, AJ005639,
                          D12570, U06789, U10378, X14136, X54058, X54636, X68221,
                          X68222, X68223, X68224, X68225, X68226, X79305, P25§

* Most of the sequences are from potato strains. Underlined and bold entries are respectively from tobacco
and pepper strains.
† From Marie-Jeanne Tordo et al. (1995) and personal communication.
‡ Some sequences are only partial. Several analyses were performed with different numbers of sequences and
different sequence lengths.
§ From Fakhfakh et al. (1995); sequence entered by hand from the original publication.
tomato and has been repeatedly passed through Lycopersicon spp.
genotypes (Legnani, 1995). Strain SON41 was isolated from the weed
Solanum nigrum and has been repeatedly passed through Capsicum
annuum cultivar Florida VR2 (Gebré-Selassié et al., 1985). Degenerate
oligonucleotide primers were defined in regions conserved among PVY
strains. Overlapping cDNA fragments covering the whole genome of
SON41 were amplified by PCR using Taq DNA polymerase and cloned
in pCR2.1-TOPO (Invitrogen). Nucleotide sequence reactions (ABI Prism
BigDye Terminator cycle sequencing ready reaction kit, Applied
Biosystems) were analysed on an automated ABI 377XL DNA sequencer
(Applied Biosystems). Specific PCR primers were designed and
overlapping PCR fragments were amplified with Pfu DNA polymerase
(Stratagene) and cloned in pCR-bluntII-TOPO (Invitrogen). The
nucleotide sequences of these clones were determined as described above.
Two strategies were used to obtain the cDNA of the strain LYE84.2. A
3.8 kb clone covering the 3′-terminal region was obtained using the
procedure of Gubler & Hoffman (1983). The remaining part of the virus
was amplified by PCR and cloned in pZErO-2 (Invitrogen). Sequencing
was performed by Genome Express (Grenoble, France). The nucleotide
sequences at the 5′ termini of the SON41 and LYE84.2 RNAs have not
been determined. Therefore, a consensus primer corresponding to the
first 28 nucleotides was used as the 5′ primer for cDNA amplification of
this part of the genomes.
Nucleotide sequences of the coat proteins of Bean yellow mosaic virus
(BYMV), Lettuce mosaic virus (LMV), Potato virus A (PVA), Potato virus V
(PVV), Turnip mosaic virus (TuMV) and Yam mosaic virus (YMV) were
obtained from the GenBank database.
Sequence alignments. Because the data contained variable
numbers of sequences along the PVY genome (Table 1), each of the 10
proteins resulting from the cleavage of the PVY polyprotein was
analysed separately, unless indicated. The nucleotide sequences were
aligned using the CLUSTAL method of aligning multiple sequences
(MegAlign program) available with the DNASTAR package for the Apple
Macintosh (version 1.02, DNASTAR Inc.). No gaps were observed, except
for P3 and NIb. For P3, sequence X12456 shows a one-nucleotide
deletion after nucleotide 569 followed by a one-nucleotide insertion after
nucleotide 612. In the NIb coding region of sequence X12456, a similar
local frame shift was identified by a nucleotide deletion after nucleotide
266 followed by a one-nucleotide insertion after nucleotide 311. Both of
these local frame shifts resulted in a deduced amino acid sequence for
X12456 that is highly divergent in these regions from the five other
strains. It is highly probable that these one-nucleotide deletions and
insertions are sequencing errors (C. Robaglia, personal communication)
and they were considered as such in the following analyses. For NIb,
sequence X12456 shows a two-amino acid codon insertion after amino
acid codon 45 in comparison with all other sequences. These positions
were discarded in the following analyses. For other potyviruses, only CP
sequences that did not show any gaps were kept for further analysis.
Phylogenetic relationships among PVY strains. When the
genetic distance increases between sequences, the number of observed
transitions (substitutions between T and C and between A and G) relative
to that of transversions (all other changes) gradually decreases due to
substitution saturation. This can lead to detection of artefactual
diversifying selection events. For instance, a lower transition/transversion
ratio may reflect a positive, diversifying selective pressure on the
region (and vice versa) since, overall, transitions are more likely to be
synonymous changes than are transversions. Simulation studies show
that phylogenetic information is essentially lost when the observed
saturation is equal to or greater than half of full substitution saturation
(Xia, 1999). The presence of substitution saturation was tested in our
sequence data sets with DAMBE version 4.0.43 (Xia & Xie, 2001).
Comparison of the saturation index expected when assuming half of full
saturation with the observed saturation index was performed with a t test
for each of the proteins. A plot of the transition and transversion rates
(estimated with DAMBE) versus the divergence of pairwise comparisons
between the six full-length ORFs also offered a visual display of
substitution saturation.
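The quantities plotted against divergence can be sketched as simple counts of observed transitions and transversions between two aligned sequences. This is only an illustration of the definitions given above, not the saturation index computed by DAMBE; the function name and the example sequences are hypothetical.

```python
# Minimal sketch: count observed transitions and transversions between
# two aligned nucleotide sequences (no multiple-hit correction).

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq_a, seq_b):
    """Return (transitions, transversions) for two aligned sequences.

    Positions with gaps or ambiguous bases are skipped; U is read as T.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    a_clean = seq_a.upper().replace("U", "T")
    b_clean = seq_b.upper().replace("U", "T")
    transitions = 0
    transversions = 0
    for a, b in zip(a_clean, b_clean):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        both_purines = a in PURINES and b in PURINES
        both_pyrimidines = a in PYRIMIDINES and b in PYRIMIDINES
        if both_purines or both_pyrimidines:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    return transitions, transversions


# Hypothetical aligned fragments.
ts, tv = count_ts_tv("ACGTAC", "GCGTCA")
print(ts, tv)
```

Under saturation, the ratio of these two counts falls as pairwise divergence grows, which is why a roughly linear relationship with distance is read as a lack of saturation.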
Phylogeny construction and evaluation was done using the
neighbour-joining (NJ), Fitch and Margoliash and maximum-likelihood
methods implemented in the PHYLIP software package (Felsenstein, 1993).
One thousand bootstrap replications were performed to place confidence
estimates on groups contained in the most-parsimonious unrooted trees.
Nodes with low reliability (bootstrap support below 70%; Hillis & Bull,
1993) were collapsed and the subsequent tree topology was used for
maximum-likelihood analyses of codon substitution.
Determination of recombination events. Because recombination
events cause different regions of a gene to have different
evolutionary histories, it is important to detect them before performing
analyses of evolutionary rates and searches for positive selection. The
initial search for recombination in PVY was conducted using split
decomposition, a method that depicts all the shortest pathways linking
sequences, including those that produce an interconnected network, as
expected under recombination (Bandelt & Dress, 1992). This analysis was
performed with the program SplitsTree2 (Huson, 1998). When putative
recombinant sequences were detected, the use of Lard version 2.2
(Holmes et al., 1999) and SiScan version 1.01 (Smith et al., 2000) allowed
mapping of the recombination events and assessment of the statistical
support for them. Further analyses were performed either discarding the
putative recombinant strains or with sequences that did not contain any
recombination breakpoints.
Tests for positive selection. In order to identify regions
submitted to positive selection in the PVY genome, we employed PAML
version 3.0c, which estimates the occurrence of ω values greater than 1.0
and which has proven useful in the documentation of positive selection
in other viruses (Yang et al., 2000). Using the program CODEML and the
tree structures obtained for each of the 10 PVY-encoded proteins, we
employed two categories of maximum-likelihood models. The first
category of models aims to identify codon sites submitted to positive
selection. These models assume that the non-synonymous/synonymous
substitution rate ratio, ω, is constant across all the lineages but varies
across the codons. Different evolutionary models were tested (Yang et al.,
2000): a model (M0) assuming a constant ω ratio, models (M1 and M7)
assuming that amino acid sites are either neutral or deleterious (ω ≤ 1)
and models (M3 and M8) allowing the occurrence of positively selected
sites (ω > 1). Several other models available in PAML were not exploited
because of their lack of power or because they converged particularly
slowly, according to the recommendations of Yang et al. (2000). For each
codon, the probability of observing the data was computed using the
proportion of sites belonging to these different categories. The log
likelihood is the sum of the logarithms of these probabilities over all
codons in the sequence. When ω > 1 for some codons, the likelihood ratio
of the two models to be compared (M3 versus M1 or M8 versus M7) tests
whether the positive selection model fits the data significantly better than
the null hypothesis: twice the difference in log likelihood between the two
models is compared with a χ² distribution with n degrees of freedom, n
being the difference between the numbers of parameters of the two
models. An empirical Bayesian approach implemented in CODEML was
used to infer to which category (neutral, deleterious or advantageous)
each amino acid most likely belongs. Codons with posterior probabilities
above 95% were considered significant.
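The likelihood ratio test described above can be sketched in a few lines. This is an illustration rather than the PAML implementation: the function names are hypothetical, and the closed-form χ² tail used here holds only for even degrees of freedom, which covers the 2 and 4 degrees of freedom used for the M8 versus M7 and M3 versus M1 comparisons. The example log likelihoods are the 6K2 values of Table 2, read here as −452.78 (M1), −446.87 (M3), −449.26 (M7) and −446.63 (M8).

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df.

    For df = 2m the tail is exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    series = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * series


def lrt_p_value(lnl_null, lnl_alt, df):
    """Return (statistic, P value) for a likelihood ratio test.

    The statistic is twice the log-likelihood difference between the
    alternative (positive selection) model and the null model.
    """
    statistic = 2.0 * (lnl_alt - lnl_null)
    return statistic, chi2_sf_even_df(statistic, df)


# 6K2 protein, log likelihoods as read from Table 2.
stat_m3_m1, p_m3_m1 = lrt_p_value(-452.78, -446.87, df=4)
stat_m8_m7, p_m8_m7 = lrt_p_value(-449.26, -446.63, df=2)
print(f"M3 vs M1: 2*dlnL = {stat_m3_m1:.2f}, P = {p_m3_m1:.3f}")
print(f"M8 vs M7: 2*dlnL = {stat_m8_m7:.2f}, P = {p_m8_m7:.3f}")
```

With these inputs the two P values come out at about 0.019 and 0.072, in line with the probabilities listed for 6K2 in Table 2.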
The second category of models aims to identify lineages undergoing
positive selection. They assume that ω is constant across all
the codons but varies across the N different branches in the tree.
Comparison of the scenario where each branch has a particular ω value
(N-ratio model) to the null hypothesis, where ω is constant across
all branches and all codons (model M0; Yang et al., 2000), constitutes
a likelihood ratio test of the constancy of ω among evolutionary
lineages. When branches with ω > 1 were obtained, we tested
whether the ω values associated with those particular branches were
significantly greater than 1.0 using Fisher's exact test of homogeneity
(Zhang et al., 1997).
Care was taken to run PAML with several initial ω values (at
least the four initial values 0.2, 1.0, 1.2 and 2.1) for each model to
avoid local maxima. In almost all analyses, at least three initial
values led to an identical maximum likelihood (that was larger than
the one obtained with the fourth initial value, when this was different).
When an ambiguity remained, three more initial values were tested
and allowed to produce the maximum likelihood with satisfactory
confidence.
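The multiple-start strategy can be sketched with a toy example. Everything below is hypothetical: the bimodal surface stands in for a real likelihood surface and the hill-climbing routine stands in for the PAML optimiser. Only the four starting values are taken from the text. The sketch keeps the best optimum and counts how many starts agree on it.

```python
import math


def toy_log_likelihood(omega):
    """Hypothetical bimodal surface: local peak near 0.3, global peak near 1.8."""
    local_peak = 1.0 * math.exp(-((omega - 0.3) / 0.2) ** 2)
    global_peak = 2.0 * math.exp(-((omega - 1.8) / 0.4) ** 2)
    return -3.0 + local_peak + global_peak


def hill_climb(f, start, step=0.01, max_iter=10000):
    """Greedy one-dimensional ascent; returns (best value, argument)."""
    x = start
    for _ in range(max_iter):
        best = max((x - step, x, x + step), key=f)
        if best == x:
            break
        x = best
    return f(x), x


def multi_start(f, starts, tol=1e-6):
    """Run the optimiser from each start; report the best optimum and
    how many starts converged to it (within tol)."""
    runs = [hill_climb(f, s) for s in starts]
    best_lnl = max(lnl for lnl, _ in runs)
    agreeing = sum(1 for lnl, _ in runs if best_lnl - lnl <= tol)
    return best_lnl, agreeing, runs


best_lnl, agreeing, runs = multi_start(toy_log_likelihood, [0.2, 1.0, 1.2, 2.1])
print(f"best lnL = {best_lnl:.4f}, reached from {agreeing} of {len(runs)} starts")
```

Here the start at 0.2 is trapped on the lower peak while the other three starts agree on the higher one, which mirrors the situation described above where three of four initial values gave the same, larger likelihood.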
A maximum-likelihood method, also implemented in PAML, allowed
the estimation of ω values in pairwise sequence comparisons (Yang &
Nielsen, 2000). Significant departures from neutral evolution (ω
estimates significantly greater than 1.0) were assessed with Fisher's
exact test of homogeneity (Zhang et al., 1997).
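A one-sided Fisher's exact test on a 2×2 table can be sketched as follows. The table layout (changed versus unchanged, non-synonymous versus synonymous) and the counts in the example are assumptions made for illustration; substitution estimates are not integers in practice, so rounding them is itself an assumption of this sketch and not a step stated in the text.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability, with all margins fixed, of observing a count
    in the top-left cell that is at least as large as a.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative integers")
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    upper = min(row1, col1)
    favourable = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return favourable / comb(n, col1)


# Hypothetical counts: rows are non-synonymous / synonymous,
# columns are changed / unchanged sites.
p_value = fisher_exact_greater(16, 584, 2, 198)
print(f"one-sided P = {p_value:.3f}")
```

A small P value indicates an excess of non-synonymous changes relative to synonymous ones beyond what the numbers of sites alone would explain.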
Results
Sequence data
We have cloned and sequenced the entire genomes of the
LYE84.2 and SON41 PVY strains. No differences were
observed between the clones obtained with Taq or Pfu DNA
Fig. 1. Plot of the transition rate and transversion rate
against the Tamura & Nei (1993) (TN93) distance for pairwise
comparisons of the coding sequences of the six full-length PVY strains.
Linearity is an indication of lack of saturation of the phylogenetic signal.
Fig. 2. Unrooted neighbour-joining phylogenetic tree of the CP gene of
PVY. Bootstrap analysis was applied using 1000 bootstrap samples. Some
bootstrap percentages at internal nodes are reported. The different
subgroups of PVY strains from potato plants are indicated. Underlined
sequences and sequences in boxes are respectively from tobacco and
pepper strains.
polymerases (SON41) or between the different sequence
reactions (LYE84.2). The nucleotide sequences and deduced
amino acid sequences of LYE84.2 and SON41 have been
submitted to the EMBL database under accession numbers
Fig. 3. (A) Map of recombination breakpoints identified in the PVY
genome. These recombination events were also shown in Revers et al.
(1996) and Glais et al. (2002) except for the two recombination
breakpoints in the NIb gene. (B) Z scores for total nucleotide identity
scores of the NIb gene were calculated with SiScan for two sets of
comparisons: strains X12456 and M95491 (dashed line) and strains
X12456 and D12539 (unbroken line), showing two putative
recombination events for strain X12456. Z values above 3.67 are
significant (P < 5%).
AJ439545 and AJ439544, respectively. These are the first full-length
sequences of non-potato strains and the first sequences
of non-potato strains for several PVY proteins (P3, 6K1, CI,
6K2 and NIa).
Analysis of the phylogenetic signal
Evidence for lack of substitution saturation comes from
linear regression of the transition and transversion rates versus
the divergence of the sequences (Fig. 1). In the case of
saturation, the number of observed transitions relative to that
of transversions gradually decreases with increasing diver-
gence (Xia, 1999). The plot of pairwise comparisons between
the six full-length ORFs of PVY showed that this is not the
case here (Fig. 1). Expected and observed saturation indices
were also compared for each data set, corresponding to each
protein, with DAMBE. Tests of the significance of their differences
showed that there was no evidence for substitution saturation
(data not shown).
Topology of the PVY trees
As expected, the trees obtained with the different PVY
genes can be divided into three main parts (illustrated for the
Table 2. Maximum-likelihood analysis of the evolution of PVY proteins with models allowing ω to vary across amino acid sites
Results were obtained with the CODEML program implemented in PAML (Yang, 1997).

Log likelihood for model:*
Protein                  M0          M1          M3          M7          M8
P1                    −3421.68    −3409.16    −3378.30    −3378.37    −3378.32
HC-Pro (12 sequences) −4042.28    −4070.60    −4025.08    −4025.81    −4025.09
P3                    −2924.02    −2878.09    −2876.80    −2881.56    −2876.87
6K1                    −374.00     −385.35     −374.00     −374.01     −374.00
CI                    −4962.57    −4983.15    −4944.00    −4948.28    −4944.00
6K2                    −452.95     −452.78     −446.87     −449.26     −446.63
NIa VPg               −1492.61    −1484.22    −1478.01    −1481.01    −1477.95
NIa Pro               −1929.83    −1936.87    −1918.22    −1920.66    −1918.21
NIb (9 sequences)     −2963.69    −2938.72    −2920.16    −2931.62    −2920.98
CP                    −3601.08    −3525.35    −3488.60    −3511.03    −3488.60

Parameter estimates and likelihood ratio tests:
Protein                Best fit among   Mean ω†   Maximum ω category and            M3 versus M1‡   M8 versus M7‡
                       M0, M1 and M3              corresponding proportion, p,
                                                  of sites†
P1                     M3               0.226     ω = 0.562, p = 0.363
HC-Pro (12 sequences)  M3               0.072     ω = 0.625, p = 0.034
P3                     M1               0.213     ω = 1.000, p = 0.213
6K1                    M0               0.065     ω = 0.065, p = 1.000
CI                     M3               0.032     ω = 0.253, p = 0.101
6K2                    M3               0.116     ω = 2.002, p = 0.029              0.019           0.072
NIa VPg                M3               0.069     ω = 0.402, p = 0.169
NIa Pro                M3               0.053     ω = 0.861, p = 0.026
NIb (9 sequences)      M3               0.067     ω = 0.850, p = 0.042
CP                     M3               0.162     ω = 1.086, p = 0.125              4 × 10⁻¹⁵       2 × 10⁻¹⁰

* Model names after Yang et al. (2000). M0 assumes a constant ω ratio for amino acid sites. M1 and M7 assume that no amino acid sites are under diversifying selection whereas M3 and
M8 allow sites with ω > 1.
† For the best-fit model among M0, M1 and M3.
‡ Probabilities that twice the difference between the log likelihoods is smaller than a χ² with 2 (M8 versus M7) or 4 (M3 versus M1) degrees of freedom. Values significant at the 5%
threshold are in bold.
Fig. 4. Alignment of the CP sequences of
PVY, PVA, BYMV and YMV and amino
acid sites (shaded) that are putatively
submitted to diversifying selection
(P > 95% to belong to ω categories
1.09, 1.55, 1.80 and 1.05, respectively).
Positions submitted to positive selection
and shared by several potyviruses are
indicated by arrows.
CP gene in Fig. 2) : PVY strains were mainly divided into the
N and O strains and a third group, composed mainly of non-
potato strains. For the CP, PVY strains of the C group are
joined to the non-potato strains (C1 subgroup) or consist of a
separate group (C2 subgroup) between the non-potato strains
and the N and O strains. No other sequences are available for
strains in the C groups, except one for the P1 protein that
clusters with SON41 and LYE84.2. Tobacco strains can be
joined to any group. This clustering was supported by high
bootstrap values and was consistent for all PVY genes, with
the exception of recombination events (see below). The
different methods used to investigate the PVY phylogeny
gave the same topologies with slight differences in the
bootstrap values (data not shown).
Detection of recombination in the PVY genome
Several putative recombination events were identified
among the sequences examined and were found to be
statistically significant (Fig. 3A). The recombinations, occurring
in CP and in the 3′ untranslated region, have already been
documented (Revers et al., 1996). The recombinations affecting
strain N-Wilga (in the P1 protein) and strain M95491 (in the P3
and VPg proteins) have also been identified (Glais et al., 2002)
and resulted in changes of phylogenetic group (between PVY
N and O subgroups) that were supported by high bootstrap
values (> 70%). Finally, two recombination events in the NIb
gene were found to be significant for strain X12456, with the
central part of NIb linked to the N group and the rest of the
genome linked to the O group of PVY (Fig. 3B). Reanalysis of
the data with SplitsTree2 without these recombinant strains
did not reveal any other putative recombinant strains.
Non-synonymous/synonymous substitution rate
analyses in PVY
Heterogeneous evolution across amino acid sites. Log likelihood
values and parameter estimates for the best-fit model among
M0, M1 and M3 (see Methods) are listed in Table 2 for each
PVY protein. As a whole, the evolution of the PVY proteins
appears to be conservative, with mean ω values ranging from
0.03 (CI) to 0.23 (P1). There was no evidence of selection for
codon usage (effective number of codons is 52.58). Occurrence
of sites submitted to positive selection was shown for the 6K2
and CP proteins. The ω values associated with the positively
selected sites were low (2.0 and 1.1, respectively); these sites
comprised one (amino acid position 14 of the 6K2 protein) and
24 (CP; Fig. 4) amino acid positions with a posterior probability
greater than 95% that they belong to the positive selection
category. Comparison of models M8 and M7 (Yang et al.,
2000) confirmed the significance of positive selection for CP
but was slightly above the 5% threshold for the 6K2 protein
(Table 2). For the 6K2 protein, the significance of positive
selection (comparisons of M3 and M1 and of M8 and M7)
increased when a larger region was analysed (from the end of
CI to the end of VPg) with the six sequences available in the
data set (data not shown). It is unlikely that detection of
positive selection in CP was due to the large number of
sequences used for the analysis. Positive selection was still
detected with only the six CP sequences corresponding to the
full-length PVY genomes [PAML detected a category of codon
positions with ω = 1.5 (model M8) and
showed that model M8 is significantly better than model M7
(P = 0.02)]. A proportion of amino acid sites with relatively
high ω values was also observed for NIa-Pro (2.6% of sites
with ω = 0.86), NIb (4.2% of sites with ω = 0.85) and
Table 3. Occurrence of diversifying selection in the CP of potyviruses with varying host-range properties

Virus   Sequences   Source plant   Nucleotide        Maximum ω category and
        analysed    genera         diversity*        corresponding proportion, p, of sites†
PVV     9           1              0.021 (0.010)     –
PVA     21          1              0.030 (0.015)     ω = 1.55, p = 0.19
PPV     17          1              0.013 (0.007)     ω = ?.53, p = 0.05
YMV     20          1              0.115 (0.055)     ω = 1.05, p = 0.11
LMV     13          2              0.033 (0.016)     ω = ?.46, p = 0.03
PVY     20          4              0.079 (0.037)     ω = 1.09, p = 0.12
TuMV    27          7              0.058 (0.028)     –
BYMV    18          8              0.114 (0.054)     ω = 1.80, p = 0.03

M3 versus M1‡ and M8 versus M7‡: the probability values in these two rows are not legible in the source ("?" above marks a digit that is likewise not legible).

* Expressed as nucleotide diversity per site (Nei, 1987), with standard deviations in parentheses.
† According to PAML, model M8 (Yang, 1997).
‡ Probabilities that twice the difference between the log likelihoods between models M3 and M1 or between models M8 and M7 in PAML (Yang, 1997) is smaller than a χ² with 4 or 2 degrees of freedom, respectively. Values significant at the 5% threshold are in bold.
especially P3, with 21.3% of the sites undergoing neutral
evolution (ω = 1).
Heterogeneous evolution across lineages. For all proteins in the
PVY genome except the P1, 6K1 and 6K2 proteins, the N-ratio
model (see above) fitted the data significantly better than
M0, indicating heterogeneity in the mode of evolution along
lineages (data not shown). In all 10 proteins, branches were
detected with ω estimates greater than 1 (data not shown), but
these particular branches involved only a very small number of
substitutions (a maximum of eight substitutions was observed
for a branch in the P3 tree with ω = 1.9). The ω estimates of
these branches were never significantly greater than 1,
essentially because of the small number of substitutions along
these branches in comparison with the background branches
(data not shown).
Evolution modes of the CP of other potyviruses
Few sequence data are available for the 6K2 proteins of
other potyviruses. However, the numerous available CP
sequences can be used to see whether the CPs of other
potyviruses share the same evolution pattern as that of PVY.
Hence, the same analyses as before were performed with the
CP genes of seven potyviruses, with either wide or narrow
host ranges (Table 3). For Plum pox virus (PPV), TuMV and
YMV, several sequences were discarded because of large
gaps in the alignments. Most of the potyviruses showed a
varying proportion of fast-evolving amino acid sites, except
PVV and TuMV (Table 3). Diversifying selection was
statistically significant for YMV and BYMV when models M3
and M1 or M8 and M7 in PAML were compared and for PVA
when models M3 and M1 were compared (Table 3). Some of
the amino acid positions subjected to diversifying selection
(P > 95%) are identical in an alignment of the CP of PVY, PVA,
BYMV and YMV (Fig. 4).
As for the PVY proteins, models allowing the ω ratio to
vary across lineages showed significant heterogeneity (N-ratio
model versus M0) in most cases, but did not reveal any
branch with ω significantly greater than 1.0 (data not shown).
Estimates of ω in sequence pairwise comparisons (Yang &
Nielsen, 2000) for all sequence data sets were rarely greater
than 1.0 or involved only a small number of substitutions. An
exception was found for PPV, where pairwise comparisons
between the whole CP sequences of two strains [corresponding
to accession numbers X81078 (unknown host origin) and
X81083 (sour cherry origin)] and several other strains revealed
ω estimates greater than 1.0 with relatively large numbers of
substitutions. The ω estimate between CP sequences X81078 and
X81083 is 3.09, with 16.3 non-synonymous versus 2.0
synonymous substitution estimates, very near the 5% significance
threshold (P = 0.06).
Discussion
Occurrence of positive selection in potyviruses
Sequence analysis has allowed the detection of genome
regions that are undergoing positive selection in several animal
viruses (reviewed in Yang & Bielawski, 2000), mainly in
domains that interact with the host immune system. It is
difficult, however, to derive evidence for positive selection
from sequence data when ω ratios are estimated for entire
genes. Indeed, ω ratios for plant virus genes are usually low,
except for some overlapping ORFs (reviewed in García-Arenal
et al., 2001). Positive selection is, however, suggested to occur
in natural populations of Tobacco mild green mosaic virus (Fraile
et al., 1996).
Our study showed that amino acids in two PVY proteins
were submitted to positive selection: a single amino acid
position in the 6K2 protein and several amino acid positions in
the CP (especially in the N terminus). Evidence for diversifying
selection was also obtained for the CP of BYMV and YMV.
Several of these positions were shared (or nearby) in several
potyviruses, strengthening their significance. Heterogeneous
modes of evolution were detected not only across amino acid
sites but also across lineages. For most of the PVY proteins and
for the CP of other potyviruses, the ω ratio is significantly
different across the different lineages. However, no specific
branches with ω ratios significantly greater than 1.0 were
detected, because of the relatively small numbers of substitutions.
Diversifying selection events were also suspected for the
CP of PVA and of two PPV isolates in comparison with several
other isolates. It should be noted that the test of molecular
adaptation used in PAML is highly conservative: it will fail if
positive selection affects only a few amino acid sites along a
few lineages on the phylogeny. Models that allow the selective
pressure to vary both among lineages and among codons
would be more realistic for detecting adaptive molecular
evolution and should have increased power.
Evolutionary constraints on the potyvirus genome
We have obtained evidence for positive selection events
occurring in two proteins of the PVY genome and in the CP of
other potyviruses. In the absence of a three-dimensional
structural model or with only very partial delineation of the
functional domains of these proteins, it is difficult to unravel
the evolutionary constraints that are responsible for these
events. It is plausible that the regions identified are
ligand-binding domains and that positive selection reflects gains or
losses of affinity for these ligands. In this respect, interactions
with plant or vector-insect factors as well as interactions with
other PVY proteins may be the driving forces of this evolution.
The N-terminal part of CP is non-essential for replication
and cell-to-cell movement of the potyviruses Tobacco etch virus
(TEV) (Dolja et al., 1993, 1994) and Zucchini yellow mosaic virus
(Arazi et al., 2001). Mutations in this part of the genome can,
however, affect some virus functions quantitatively. Deletion
of 25 amino acids in the CP of TEV (8 of these amino acids
correspond to amino acid positions subject to positive
selection in the PVY genome; Fig. 4) reduced the speed of
cell-to-cell movement in tobacco and completely abolished
systemic movement (Dolja et al., 1994). A substitution of the
amino acid immediately following the DAG motif in the CP of
tobacco vein mottling potyvirus (the corresponding amino
acid in PVY is potentially subject to positive selection; Fig.
4) reduced the efficiency of aphid transmission of the virus
(Atreya et al., 1995). This illustrates that amino acid
substitutions in this region can affect the fitness of various
potyviruses quantitatively and can consequently be the target of
selective processes. The N-terminal part of CP is a striking
example of the multifunctionality of viral proteins. It is exposed
on the virion surface (thus potentially involved in binding ligands)
and involved in aphid transmission (Atreya et al., 1990, 1991,
1995) and cell-to-cell and long-distance movement (Dolja et al.,
1994, 1995; Rojas et al., 1997). Therefore, it is difficult to know
which virus function is involved in these positive selection
events.
A three-dimensional structure model was recently proposed
for the CP of PVA (Baratova et al., 2001). We compared the
amino acid positions identified above with this model as a
reference, using the CP sequence alignment in Fig. 4 and
keeping in mind that only weak evidence of diversifying
selection was obtained for PVA (Table 3). In all four
potyviruses, a large proportion of sites that belong to the
positive selection category are in the N-terminal part of the
CP, comprising the regions most accessible at the particle
surface (Baratova et al., 2001). Several of the positively selected
amino acid sites are also encountered in the central and C-
terminal (internal) parts of the capsid, without coinciding with
a particular domain of the putative protein structure.
The central hydrophobic domain of the 6K2 protein is
responsible for binding to the ER (Schaad et al., 1997) and it has
consequently been proposed that the 6K2 protein is required
for genome amplification and that it anchors the replication
apparatus to ER membranes. Chu et al. (1997) showed that
substitutions in a region spanning the end of CI, 6K2 and the
beginning of VPg of TEV were sufficient (but not absolutely
necessary) to induce a wilting response in Tabasco pepper.
Since most of the TEV field isolates belong to the wilting
type, the determinants of the wilting response in Tabasco
could confer a selective advantage to TEV. However, the two
amino acid changes in the 6K2 peptide that distinguish a non-
wilting from a wilting isolate of TEV do not coincide with
amino acid position 14 of PVY, which we suspect to be under
positive selection.
Host adaptation and evolution of the potyvirus
genome
Many environmental factors can exert evolutionary con-
straints on PVY, but one of the most important is certainly the
host plant. Occurrence of diversifying selection in the CP of
potyviruses was not found to correlate with the width of host
range (Table 3). The three potyviruses that showed evidence of
positive selection (PVY, BYMV and YMV) correspond to CP
sequences with the largest diversity measured at the nucleotide
level (Table 3). Strong conservation of the nucleotide sequences
for the other potyviruses can be interpreted as an
effect of purifying selection and can also be responsible for the
lack of significance of positive selection when ω estimates
greater than 1.0 were obtained (PVA, PPV and LMV). There
was no structuring of the phylogenetic trees as a function of
the host species of the isolates (data not shown) except,
partially, for PVY, which would be an indication of the lack of
involvement of the CP as a host-range determinant. However,
it should be emphasized that very limited host-range data are
available for the majority of the strains used in these sequence
analyses.
Biological arguments have already been reported for host
adaptation of PPV in cherry trees versus other Prunus species
(Dosba et al., 1987; Nemchinov et al., 1996) and especially for
PVY. In PVY, isolates from pepper usually do not infect potato
plants systemically (McDonald & Kristjansson, 1993; d'Aquino
et al., 1995; Gebré-Selassié et al., 1985) when inoculated
manually. Reciprocally, manual inoculations did not allow
systemic infection of pepper plants with PVY(N) strains
(Gebré-Selassié et al., 1985; McDonald & Kristjansson, 1993; Valkonen
et al., 1996), whereas some of the strains in the O and C groups
could infect some pepper cultivars (McDonald & Kristjansson,
1993; Valkonen et al., 1996; Blanco-Urgoiti et al., 1998).
Tomato (Stobbs et al., 1994; Legnani, 1995) and tobacco
(Blancard, 1998) plants lacking specific resistance genes can be
infected by most, if not all, PVY isolates, including those that
originate from potato and pepper. Taken together, these data
suggest an influence of some host plants (potato and pepper)
but not others (tomato and tobacco) on the selection and
evolution of PVY isolates. There are also partial arguments for
host-range variations according to the source plants for strains
of BYMV (Wada et al., 2000) and TuMV (e.g. Stavolone et al.,
1998). PVV, PVA, YMV and LMV, which all share very
narrow host ranges, do not seem to undergo adaptation or
diversification according to host plants (except for overcoming
of specific cultivar resistance genes, or propagation of particular
virus strains that could be due to vegetative and clonal
multiplication of cultivars). However, two kinds of bias should
be emphasized: (i) once again, very limited host-range data are
available for the majority of the strains used in these sequence
analyses; and (ii) PAML analyses are based on substitution
models, so very divergent CP sequences (e.g. some cherry isolates
of PPV and some YMV and TuMV isolates) could not be
included in the analyses because of numerous gaps in the
alignments. This will reduce the possibility of detecting
divergent evolution events and may introduce a bias for the
comparison of CP evolution of the different potyviruses.
In conclusion, evidence of diversifying selection in several
potyviruses cannot yet be interpreted in terms of evolutionary
constraints. Further biological characterization should be
performed and the role of amino acid substitutions in these
different proteins investigated through molecular analyses.
Comparison of biological data with the modes of evolution
of the CP makes this protein an interesting candidate for
involvement in host adaptation of some potyviruses, including
PVY, BYMV, YMV and, possibly, PPV.
B. M. and C. M. contributed equally to this work and should be
considered as joint first authors. B. M. was the recipient of a post-doctoral
NATO fellowship and C. M. of a PhD fellowship from the French
Research Ministry. We thank F. García-Arenal for improving the
manuscript, N. Galtier for helpful comments, V. Marie-Jeanne Tordo for
unpublished sequence data and L. Guilbaud and B. Olsen for efficient
technical help.
References
Arazi, T., Shiboleth, Y. M. & Gal-On, A. (2001). A nonviral peptide can
replace the entire N terminus of zucchini yellow mosaic potyvirus coat
protein and permits viral systemic infection. Journal of Virology 75,
6329-6336.
Atreya, C. D., Raccah, B. & Pirone, T. P. (1990). A point mutation in the
coat protein abolishes aphid transmissibility of a potyvirus. Virology 178,
161-165.
Atreya, P. L., Atreya, C. D. & Pirone, T. P. (1991). Amino acid
substitutions in the coat protein result in loss of insect transmissibility of
a plant virus. Proceedings of the National Academy of Sciences, USA 88,
7887-7891.
Atreya, P. L., Lopez-Moya, J. J., Chu, M., Atreya, C. D. & Pirone, T. P.
(1995). Mutational analysis of the coat protein N-terminal amino acids
involved in potyvirus transmission by aphids. Journal of General Virology
76, 265-270.
Bandelt, H. J. & Dress, A. W. M. (1992). Split decomposition: a new and
useful approach to phylogenetic analysis of distance data. Molecular
Phylogenetics and Evolution 1, 242-252.
Baratova, L. A., Efimov, A. V., Dobrov, E. N., Fedorova, N. V., Hunt, R.,
Badun, G. A., Ksenofontov, A. L., Torrance, L. & Järvekülg, L. (2001).
In situ spatial organization of potato virus A coat protein subunits as
assessed by tritium bombardment. Journal of Virology 75, 9696-9702.
Blancard, D. (1998). Maladies du Tabac. Observer, Identifier, Lutter. Edited
by INRA. Paris, France.
Blanco-Urgoiti, B., Sánchez, F., Pérez de San Román, C., Dopazo, J. &
Ponz, F. (1998). Potato virus Y group C isolates are a homogeneous
pathotype but two different genetic strains. Journal of General Virology 79,
2037-2042.
Chu, M., Lopez-Moya, J. J., Llave-Correas, C. & Pirone, T. P. (1997).
Two separate regions in the genome of tobacco etch virus contain
determinants of the wilting response of Tabasco pepper. Molecular
Plant-Microbe Interactions 10, 472-480.
d'Aquino, L., Dalmay, T., Burgyan, J., Ragozzino, A. & Scala, F. (1995).
Host range and sequence analysis of an isolate of potato virus Y inducing
veinal necrosis in pepper. Plant Disease 79, 1046-1050.
Dolja, V. V., Herndon, K. L., Pirone, T. P. & Carrington, J. C. (1993).
Spontaneous mutagenesis of a plant potyvirus genome after insertion of
a foreign gene. Journal of Virology 67, 5968-5975.
Dolja, V. V., Haldeman, R., Robertson, N. L., Dougherty, W. G. &
Carrington, J. C. (1994). Distinct functions of capsid protein in assembly
and movement of tobacco etch potyvirus in plants. EMBO Journal 13,
1482-1491.
Dolja, V. V., Haldeman-Cahill, R., Montgomery, A. E., Vandenbosch,
K. A. & Carrington, J. C. (1995). Capsid protein determinants involved
in cell-to-cell and long distance movement of tobacco etch potyvirus.
Virology 206, 1007-1016.
Dosba, F., Maison, P., Lansac, M. & Massonie, G. (1987). Experimental
transmission of plum pox virus (PPV) to Prunus mahaleb and Prunus
avium. Journal of Phytopathology 120, 199-204.
Dougherty, W. G. & Carrington, J. C. (1988). Expression and function of
potyviral gene products. Annual Review of Phytopathology 26, 123-143.
Fakhfakh, H., Makni, M., Robaglia, C., Elgaaied, A. & Marrakchi, M.
(1995). Polymorphisme des régions capside et 3'NTR de 3 isolats
tunisiens du virus Y de la pomme de terre (PVY). Agronomie 15, 569-579.
Felsenstein, J. (1993). PHYLIP: phylogenetic inference package, version
3.5c. Distributed by the author. Department of Genetics, University of
Washington, Seattle, USA.
Fraile, A., Malpica, J. M., Aranda, M. A., Rodríguez-Cerezo, E. &
García-Arenal, F. (1996). Genetic diversity in tobacco mild green
mosaic tobamovirus infecting the wild plant Nicotiana glauca. Virology
223, 148-155.
García-Arenal, F., Fraile, A. & Malpica, J. M. (2001). Variability and
genetic structure of plant virus populations. Annual Review of
Phytopathology 39, 157-186.
Gebré-Selassié, K., Marchoux, G., Delecolle, B. & Pochard, E. (1985).
Variabilité naturelle des souches du virus Y de la pomme de terre dans les
cultures de piment du sud-est de la France. Caractérisation et classification
en pathotypes. Agronomie 5, 621-630.
Glais, L., Tribodet, M. & Kerlan, C. (2002). Genomic variability in
potato potyvirus Y (PVY): evidence that PVY(N)W and PVY(NTN) variants
are single to multiple recombinants between PVY(O) and PVY(N) isolates.
Archives of Virology 147, 363-378.
Gubler, U. & Hoffman, B. J. (1983). A simple and very efficient method
for generating cDNA libraries. Gene 25, 263-269.
Hillis, D. M. & Bull, J. J. (1993). An empirical test of bootstrapping as a
method for assessing confidence in phylogenetic analysis. Systematic
Biology 42, 182-192.
Holmes, E. C., Worobey, M. & Rambaut, A. (1999). Phylogenetic
evidence for recombination in dengue virus. Molecular Biology and
Evolution 16, 405-409.
Huson, D. H. (1998). SplitsTree: analyzing and visualizing evolutionary
data. Bioinformatics 14, 68-73.
Kimura, M. (1983). The Neutral Theory of Molecular Evolution.
Cambridge: Cambridge University Press.
Legnani, R. (1995). Analyse, comparaison et exploitation des résistances au
virus Y de la pomme de terre (PVY) et au tobacco etch virus (TEV) chez la
tomate. PhD thesis, University of Montpellier II, France.
Li, W.-H. (1997). Molecular Evolution. Sunderland, MA: Sinauer
Associates.
McDonald, J. G. & Kristjansson, G. T. (1993). Properties of strains of
potato virus Y(N) in North America. Plant Disease 77, 87-89.
Marie-Jeanne Tordo, V., Chachulska, A. M., Fakhfakh, H., Le Romancer,
M., Robaglia, C. & Astier-Manifacier, S. (1995). Sequence
polymorphism in the 5'NTR and in the P1 coding region of potato virus
Y genomic RNA. Journal of General Virology 76, 939-949.
Mestre, P., Brigneti, G. & Baulcombe, D. C. (2000). An Ry-mediated
resistance response in potato requires the intact active site of the NIa
proteinase from potato virus Y. Plant Journal 23, 653-661.
Nei, M. (1987). Molecular Evolutionary Genetics. New York: Columbia
University Press.
Nemchinov, L., Hadidi, A., Maiss, E., Cambra, M., Candresse, T. &
Damsteegt, V. (1996). Sour cherry strain of plum pox potyvirus (PPV):
molecular and serological evidence for a new subgroup of PPV strains.
Phytopathology 86, 1215-1221.
Nielsen, R. & Yang, Z. (1998). Likelihood models for detecting
positively selected amino acid sites and application to the HIV-1
envelope gene. Genetics 148, 929-936.
Revers, F., Le Gall, O., Candresse, T., Le Romancer, M. & Dunez, J.
(1996). Frequent occurrence of recombinant potyvirus isolates. Journal of
General Virology 77, 1953-1965.
Riechmann, J. L., Laín, S. & García, J. A. (1992). Highlights and
prospects of potyvirus molecular biology. Journal of General Virology 73,
1-16.
Rojas, M. R., Zerbini, F. M., Allison, R. F., Gilbertson, R. L. & Lucas,
W. J. (1997). Capsid protein and helper component-proteinase function
as potyvirus cell-to-cell movement proteins. Virology 237, 283-295.
Schaad, M. C., Jensen, P. E. & Carrington, J. C. (1997). Formation of
plant RNA virus replication complexes on membranes: role of an
endoplasmic reticulum-targeted viral protein. EMBO Journal 16,
4049-4059.
Shukla, D. D., Ward, C. W. & Brunt, A. A. (1994). Genome structure,
variation and function. In The Potyviridae, pp. 74-110. Wallingford, UK:
CAB International.
Smith, G. R., Borg, Z., Lockhart, B. E. L., Braithwaite, K. S. & Gibbs,
M. J. (2000). Sugarcane yellow leaf virus: a novel member of the
Luteoviridae that probably arose by inter-species recombination. Journal of
General Virology 81, 1865-1869.
Stavolone, L., Alioto, D., Ragozzino, A. & Laliberté, J.-F. (1998).
Variability among turnip mosaic potyvirus isolates. Phytopathology 88,
1200-1204.
Stobbs, L. W., Poysa, V. & Van Schagen, J. G. (1994). Susceptibility of
cultivars of tomato and pepper to a necrotic strain of potato virus Y.
Canadian Journal of Plant Pathology 16, 43-48.
Tamura, K. & Nei, M. (1993). Estimation of the number of nucleotide
substitutions in the control region of mitochondrial DNA in humans and
chimpanzees. Molecular Biology and Evolution 10, 512-526.
Valkonen, J. P. T., Kyle, M. M. & Slack, S. A. (1996). Comparison of
resistance to potyviruses within Solanaceae: infection of potatoes with
tobacco etch potyvirus and peppers with potato A and Y potyviruses.
Annals of Applied Biology 129, 25-38.
Wada, Y., Iwai, H., Ogawa, Y. & Arai, K. (2000). Comparison of
pathogenicity and nucleotide sequences of 3'-terminal regions of Bean
yellow mosaic virus isolates from Gladiolus. Journal of General Plant
Pathology 66, 345-352.
Xia, X. (1999). DAMBE (Software Package for Data Analysis in Molecular
Biology and Evolution). User Manual. Hong Kong: Department of
Ecology and Biodiversity, University of Hong Kong.
Xia, X. & Xie, Z. (2001). DAMBE: software package for data analysis in
molecular biology and evolution. Journal of Heredity 92, 371-373.
Yang, Z. (1997). PAML: a program package for phylogenetic analysis by
maximum likelihood. Computer Applications in the Biosciences 13, 555-556.
Yang, Z. (1998). Likelihood ratio tests for detecting positive selection
and application to primate lysozyme evolution. Molecular Biology and
Evolution 15, 568-573.
Yang, Z. & Bielawski, J. P. (2000). Statistical methods for detecting
molecular adaptation. Trends in Ecology & Evolution 15, 496-503.
Yang, Z. & Nielsen, R. (2000). Estimating synonymous and nonsynonymous
substitution rates under realistic evolutionary models. Molecular
Biology and Evolution 17, 32-43.
Yang, Z., Nielsen, R., Goldman, N. & Pedersen, A.-M. K. (2000).
Codon-substitution models for heterogeneous selection pressure at
amino acid sites. Genetics 155, 431-449.
Zhang, J., Kumar, S. & Nei, M. (1997). Small-sample tests of episodic
adaptive evolution: a case study of primate lysozymes. Molecular Biology
and Evolution 14, 1335-1338.
Received 27 March 2002; Accepted 17 June 2002