
Gene 333 (2004) 81 – 90

www.elsevier.com/locate/gene

Adaptive evolution and functional divergence of pepsin gene family


Vincenzo Carginale a, Francesca Trinchella b, Clemente Capasso a, Rosaria Scudiero b,
Marilisa Riggio a, Elio Parisi a,*
a CNR Institute of Protein Biochemistry, Via Marconi 10, 80125 Naples, Italy
b Department of Evolutionary and Comparative Biology, University Federico II, via Mezzocannone 8, Naples, Italy
Received 25 September 2003; received in revised form 22 January 2004; accepted 5 February 2004

Available online 26 April 2004

Abstract

In vertebrates, a large proportion of genes is organized in gene families. Paralogous gene groups generated by gene duplication are related
by homology, high degree of sequence identity and similar structural architecture of their products. Aspartic proteinases form a widely
distributed protein superfamily including cathepsins, pepsins, renin and napsin. In the present study, the nucleotide sequences coding for
various pepsins in 30 vertebrate species have been used to derive a gene phylogeny. Gene duplication and losses have been inferred from a
reconciled tree, reconstructed by combining information from gene tree and species tree. Our findings based on the results of the relative rate
ratio test and maximum likelihood analysis suggest that each round of gene duplication is characterized by adaptive evolution, although
instances of evolution under positive selection have been found also long after divergence of gene families. The results of functional
divergence analysis provided statistical evidence for shifted evolutionary rate after gene duplication.
© 2004 Elsevier B.V. All rights reserved.

Keywords: Pepsins; Phylogeny; Gene duplication; Adaptive evolution; Functional divergence; Gene families; Reconciled tree; Relative rate ratio; Maximum likelihood analysis

1. Introduction

A variety of studies demonstrate the complexity of the eukaryotic genome with respect to that of prokaryotes. While most genes in prokaryotes are present in single copy, a large portion of eukaryotic genes is hierarchically organized into families and superfamilies, comprising various paralogous genes produced by gene duplication (Ohno, 1970; Ohta, 1988, 1989). From an operative point of view, a gene family is defined as a group of genes sharing a pairwise amino acid similarity higher than 50%, while a superfamily includes genes having a lower similarity level, but capable of forming an alignable set of sequences. In most cases, one of the two replicates retains the original function, whereas the other diverges acquiring a novel function; alternatively, one paralog may undergo a number of deleterious mutations culminating in a complete loss of function (Walsh, 1995). Although the persistence of two paralogous loci seems more likely to occur when one of the replicates acquires a novel function, it is not infrequent to find instances of large gene families in which both paralogs resulting from a duplication event persist without apparent loss of function (Lynch and Conery, 2000; Lynch and Force, 2000; Lynch et al., 2001).

The maintenance of both replicates may depend on a variety of causes, including acquisition of modified functions, tissue-specific expression, or requirement of a higher gene dosage dictated by an increased metabolic demand. Typical is the case of the vertebrate metallothionein superfamily containing several families sharing stringent phylogenetic relationships and characterized by the presence of a number of paralogous genes (Bargelloni et al., 1999). In this case, the proteins produced by distinct paralogs are also termed isoforms and may attain different expression levels in different tissues of the same organism (Scudiero et al., 2000).

Abbreviations: NNI, nearest-neighbor interchange; TBR, tree bisection-reconnection; SPR, subtree pruning regrafting; dN, non-synonymous substitution rate; dS, synonymous substitution rate.
* Corresponding author. Tel.: +39-81-7257323.
E-mail address: e.parisi@ibp.cnr.it (E. Parisi).

0378-1119/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.gene.2004.02.011
Aspartic proteinases constitute another widely distributed protein superfamily, whose members accomplish a variety of functions involving protein degradation at low pH (Davies, 1990; Dunn, 1992). The most significant families include cathepsin D, cathepsin E, pepsin, renin, napsin and a new member termed nothepsin (Capasso et al., 1998), specifically expressed in fish liver under oestrogen control (Riggio et al., 2000, 2002).

The catalytic mechanism of all known aspartic proteinases depends on the presence of two aspartate residues forming the active site of the enzyme (Pearl and Blundell, 1984). The structure of all aspartic proteinases shows two aspartate residues positioned in the middle of a cleft covered by a flap forming a hairpin loop protruding from the N-terminal portion of the molecule: starting from the cleavage site, the sites on the substrate in the N-terminal half are indicated as P1, P2, P3, P4, P5; the sites in the COOH-terminal half as P1′, P2′, P3′. The corresponding binding sites of the cleft interacting with the substrate are indicated accordingly as S1, S2, S3, S4, S5 and S1′, S2′, S3′ (Fusek and Vetvicka, 1995).

Pepsins are a family of aspartic proteinases accomplishing important digestive functions in both invertebrates and vertebrates (Kageyama, 2002). Like other aspartic proteinases, pepsin is produced as a zymogen that is quickly converted into the mature enzyme by autocatalytic cleavage. The two major groups of immunogenetically and biochemically distinct enzymes are pepsins A and pepsins C, the latter also known as gastricsin. In the human stomach, the two enzymes are distributed in different regions: pepsin A is localized mainly in the fundus, whereas pepsin C is present in all the organ compartments. Pepsin C has an optimal pH slightly higher than pepsin A, but substrate specificity is quite the same for the two enzymes (Fusek and Vetvicka, 1995). In addition to these major families, other pepsin groups have been discovered in foetuses and young individuals. Chymosins (also known as pepsins Y) and fetal pepsins F are expressed at high levels in newborns and are gradually substituted by pepsin A during post-natal growth. As these forms have been found in mammals and birds and not in other vertebrate groups, it has been inferred that they are involved in the process of milk digestion or in the cleavage of egg yolk (Kageyama, 2002).

The ultimate goal of the present study is to find helpful indications on the selective forces responsible for the evolutionary process that occurred in the pepsin gene family. The outline of the present paper is the following: (i) construction of a gene tree from pepsin DNA and protein sequence data; (ii) derivation of a reconciled tree by combining information from gene tree and species tree with the specific aim to find evidence of gene duplication events; (iii) detection of adaptive evolution instances in relation to functional diversification of different pepsin groups originated by gene duplication events; (iv) estimation of functional divergence between gene families based on site-specific shifted evolutionary rates.

2. Materials and methods

2.1. Data set

Of the pepsin-encoding DNA sequences included in the present study, seven are from fish, four from amphibians, two from chicken and the remaining from mammals. This dataset constitutes a representative sample of genes from distinct pepsin families (Table 1). Translated amino acid sequences were aligned using the program Clustal X. The nucleotide sequences were aligned by the same program and the resulting alignment was refined manually by the program Se-Al v. 1.0 (available at http://www.evolve.zoo.ox.ac.uk/Se-Al/Se-Al.html) using the aligned protein sequences as reference. The alignments are available on request. Nucleotide frequency bias among species was examined by the chi-square test implemented in the program TreePuzzle (Strimmer and von Haeseler, 1996). Following this test, the sequence of chicken pepsin A could not be included in the dataset.

2.2. Data analysis

Differences in the rate of molecular evolution were assessed by the branch length test (Takezaki et al., 1995) and by the RRtree test (Robinson-Rechavi and Huchon,

Table 1
Name            Species                          Gene            Accession no.
Plaice A2a      Pseudopleuronectes americanus    Pepsin 2a       AF156787
Plaice A2b      Pseudopleuronectes americanus    Pepsin 2b       AF156788
Rockcod A1      Trematomus bernacchii            Pepsin A1       AJ550949
Rockcod A2      Trematomus bernacchii            Pepsin A2       AJ550750
Rockcod A3      Trematomus bernacchii            Pepsin A3       AJ550951
Rockcod C       Trematomus bernacchii            Pepsin C        AJ550952
Seatrout C      Salvelinus fontinalis            Pepsin C        AF275939
Frog A          Rana catesbeiana                 Pepsin A        AB045376
Frog C          Rana catesbeiana                 Pepsin C        M73750
Bullfrog A      Xenopus laevis                   Pepsin A        AB045380
Bullfrog C      Xenopus laevis                   Pepsin C        AB045379
Chicken C       Gallus gallus                    Pepsin C        AB025284
Chicken Y       Gallus gallus                    Pepsin Y        P16476
Mouse C         Mus musculus                     Pepsin C        AK008959
Mouse F         Mus musculus                     Pepsin F        AF240776
Rat C           Rattus norvegicus                Pepsin C        X04644
Rat F           Rattus norvegicus                Pepsin F        AJ251687
Rat Y           Rattus norvegicus                Pepsin Y        AJ251688
Rabbit III A    Oryctolagus cuniculus            Pepsin III A    AB089792
Rabbit II-4A    Oryctolagus cuniculus            Pepsin II 4A    AB089791
Rabbit II-1A    Oryctolagus cuniculus            Pepsin II A1    AB089790
Rabbit C        Oryctolagus cuniculus            Pepsin C        AB047250
Rabbit F        Oryctolagus cuniculus            Pepsin F        M59238
Dog A           Canis familiaris                 Pepsin A        AB047246
Dog B           Canis familiaris                 Pepsin B        AB082936
Human A         Homo sapiens                     Pepsin A        BC029055
Human C         Homo sapiens                     Pepsin C        J04443
Monkey A        Callithrix jacchus               Pepsin A        AB038384
Monkey C        Callithrix jacchus               Pepsin C        AB038385
Monkey Y        Callithrix jacchus               Pepsin Y        AB038386
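The nucleotide frequency bias check mentioned in Section 2.1 can be illustrated with a short sketch. It is not TreePuzzle's code: it assumes that each sequence's base counts are compared with the counts expected from the pooled base frequencies of the whole alignment (3 degrees of freedom, critical value 7.815 at the 5% level). The function names and the toy sequences are hypothetical.

```python
from collections import Counter

BASES = "ACGT"
CHI2_CRIT_5PCT_DF3 = 7.815  # chi-square critical value, 3 d.f., alpha = 0.05


def base_counts(seq):
    """Count A, C, G, T in a sequence, ignoring gaps and ambiguity codes."""
    counts = Counter(ch for ch in seq.upper() if ch in BASES)
    return [counts[b] for b in BASES]


def composition_chi2(seqs):
    """Return {name: chi-square statistic} against pooled base frequencies."""
    per_seq = {name: base_counts(s) for name, s in seqs.items()}
    pooled = [sum(c[i] for c in per_seq.values()) for i in range(4)]
    total = sum(pooled)
    freqs = [p / total for p in pooled]
    result = {}
    for name, counts in per_seq.items():
        n = sum(counts)
        result[name] = sum(
            (obs - n * f) ** 2 / (n * f) for obs, f in zip(counts, freqs) if f > 0
        )
    return result


# Toy alignment (hypothetical sequences, not the pepsin data set):
# five compositionally balanced sequences and one A-rich sequence.
toy = {f"seq{i}": "ACGT" * 25 for i in range(1, 6)}
toy["biased"] = "A" * 70 + "CGT" * 10

stats = composition_chi2(toy)
flagged = [name for name, x2 in stats.items() if x2 > CHI2_CRIT_5PCT_DF3]
print(flagged)
```

In this toy case only the A-rich sequence exceeds the threshold, which mirrors the kind of outcome that would lead to a sequence being excluded from the data set.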
2000) using the trout pepsin C sequence as outgroup. Split decomposition analysis was performed according to the method implemented in the program SplitsTree (Huson, 1998). Analysis for possible gene conversion events was carried out using the program GeneConv (available at http://www.math.wust1.edu/fsawyer). Adaptive evolution was investigated by means of the Creevey–McInerney method (Creevey and McInerney, 2002) implemented in the program CRANN (Creevey and McInerney, 2003). The ratio ω of nonsynonymous over synonymous substitutions was determined according to the method of Yang and Nielsen (Yang and Nielsen, 1998) implemented in the codeml program contained in the PAML package (available at http://abacus.gene.ucl.ac.uk/software/paml.html) under four models: (1) a single ratio for all branches in the phylogeny; (2) one ratio restricted to the branch predating diversification of the clade formed by pepsins A, Y and F, and one ratio for all other branches; (3) one ratio restricted to the branches predating fish pepsins A and chymosin diversification, one ratio restricted to the branches predating foetal pepsins and tetrapod pepsins A diversification, one ratio for the remaining branches; (4) an independent ratio for each branch in the phylogeny. The following options were specified in the codeml control file: (1) the ML tree in Fig. 1 as user tree; (2) codon frequency calculated from the average nucleotide frequencies; (3) no variation of ratios among sites; (4) constant rate among sites; (5) transition/transversion ratio estimated from the data set using different starting values. The likelihood ratio test was applied to decide which value fitted the data better. The optimal value estimated in this way turned out to be 1.4, close to that obtained by Modeltest (see below).

2.3. Phylogenetic analysis

Phylogenetic analysis was conducted using three different methods. The maximum likelihood tree was generated from first and second codon positions of nucleotide sequences using the program PAUP* v.4.0 b10 (Swofford, 1993) with settings obtained by Modeltest 3.06 (Posada and Crandall, 1998). The tree was inferred under the general-time-reversible (GTR) model with a proportion of invariant sites (p=0.24) and discrete gamma model (gamma shape parameter=1.92). Nucleotide frequencies were: A=0.31, C=0.22, G=0.23, T=0.24; the estimated transition/transversion ratio was 1.3. Heuristic search was conducted using a neighbor-joining tree as starting tree and the nearest-neighbor interchange (NNI) branch swapping algorithm. The resulting tree was then used as starting tree for a new search using the same settings but tree bisection-reconnection (TBR) branch swapping. Robustness was assessed by 100 bootstrap replicates. Maximum parsimony analysis with bootstrap (1000 replicates) was carried out using the program PAUP* v.4.0 b10; heuristic search was conducted on the first and second codon positions using random stepwise addition (20 replicates) and the TBR algorithm. Bremer (1994) support was estimated using the program TreeRot (Sorenson, 1999). A phenetic tree was inferred using the neighbor-joining method with bootstrap (10,000 replicates) under the Tamura–Nei substitution model with gamma shape parameter set to 1.9, using the MEGA 2 software version 2.1 (Kumar et al., 2001).

2.4. Determination of gene duplication and loss in the inferred gene tree

In order to resolve conflicts between gene tree and species phylogeny, a reconciled tree was constructed using the program GeneTree v. 1.3.0 (available at http://www.taxonomy.zoology.gla.ac.uk/rod/rod.html). The starting phylogeny of vertebrate taxa was deduced from a number of data available in the literature. An overall picture of major vertebrate groups was based on Bishop and Friday (1988). Relationships within terminal taxa were according to Nelson (1994) for teleosts and Liu et al. (2001) for eutherian mammals. The species phylogeny minimizing the number of duplications and losses in the reconciled tree was inferred by 100 heuristic searches using the random starting trees and steepest ascent options, alternating NNI and subtree pruning regrafting (SPR) branch swapping.

2.5. Estimation of functional divergence

Functional divergence between protein families was detected on the basis of site-specific shifted evolutionary rates after gene duplication or speciation using the program Diverge (Gu, 1999, 2001). Functional branch length bF(x) (where x denotes a generic sequence cluster) was calculated as described by Gu et al. (2002).

3. Results

3.1. Pepsin gene phylogeny

The phylogenetic analysis of the dataset was carried out according to three different methods, namely maximum likelihood (ML), maximum parsimony (MP) and neighbor joining (NJ). In this way, possible inconsistency due to the use of a single method could be avoided. The ML tree is shown in Fig. 1; the support for the different branches was determined by bootstrap analyses performed according to the three methods and by the index of the Bremer support.

All analyses identified two major clades, one of pepsins C and another comprising pepsins A and pepsins F and Y together. In the latter, however, piscine pepsins A formed a well-separated group with respect to mammalian pepsins A and foetal pepsins. This relationship was strongly supported by all the methods used, providing evidence that diversification of amniote pepsins occurred after the separation from the fish group. The picture appears more complicated within the amniote group: the analyses showed a split between the tetrapod pepsins A group and the two groups of foetal
Fig. 1. Phylogenetic tree of pepsin sequences. Tree topology was inferred from aligned nucleotide sequences using the maximum likelihood (ML) method. Bootstrap values obtained by ML, maximum parsimony, neighbor joining methods, and the Bremer support indexes are shown in bold (with arrows pointing towards nodes). The numbers and labels ( ) serve to identify the branches in the Creevey–McInerney test and in the maximum likelihood analysis reported in Table 3.

pepsins. However, while the split between tetrapod pepsins A and pepsins F was well supported in the ML and NJ analyses, it remained unresolved in MP.

Similarly, the split of the pepsins Y is supported in ML and MP, but not in NJ. In these few controversial cases, we decided to give more credit to the topology obtained by ML analysis because of the higher bootstrap value. The clade of pepsins C was in general well resolved, with the only exception of the rabbit C taxon. From our analysis, dog pepsin B segregates in the group of pepsins C; hence its classification as a distinct pepsin does not appear justified.

3.2. Rate homogeneity

The rate homogeneity test was conducted by applying the Takezaki branch length test to the translated amino acid
Table 2A
Results of branch length test
Taxa with BL <0.387         Taxa with BL >0.387
(average root-to-tip length)
Taxon        p              Taxon         p
Rockcod C    0.0004**       rockcod A1    0.0012**
Frog C       0.0004**       rockcod A2    0.001**
Xenopus C    0.0004**       rockcod A3    0.034*
Chicken C    0.0188*        plaice A2b    0.0072**
Monkey C     0.0066**       rat F         0.0004**
Human C      0.0082**       rabbit F      0.0002**
                            rat Y         0.0003**
                            monkey Y      0.0026**
* Significant at 5%.
** Significant at 1%.

sequences contained in the data set. The analysis was carried out by applying the Poisson correction, using seatrout pepsin C as outgroup. The results of the test reported in Table 2A reveal that a number of taxa violated the hypothesis of rate homogeneity: one group of sequences, including pepsins C from rockcod, frog, bullfrog, chicken, monkey and human, is characterized by a lower rate, whereas the group including most fish pepsins A and foetal forms evolves significantly faster. These results are supported by the relative rate test shown in Table 2B. The comparison between the fish pepsin A and pepsin C clades was highly significant, and so was that between the tetrapod pepsin A and pepsin C clades. Conversely, the comparison between fish and tetrapod pepsins A was non-significant.

3.3. Detection of gene conversion events

Search for possible gene conversion events was performed on the 30 nucleotide sequences of the data set using the program Geneconv. Following the procedure implemented in the program, we found no global inner and no global outer-sequence significant fragment (p-values 0.979 and 1.000, respectively). We found a total of five pairwise inner fragments with no pairwise outer-sequence fragment. With 30!/(2!(30 − 2)!) = 435 pairwise comparisons, about 22 pairs of sequences (435 × 0.05 ≈ 21.8) are expected to have pairwise significant fragments with p-value of 0.05 or lower by chance alone. Hence, the null hypothesis that no gene conversion event took place during pepsin evolution cannot be rejected. Such a conclusion is corroborated by the results of split decomposition analysis. The tree-like structure of the split graph (not shown) is indicative of absence of phylogenetic conflict.

3.4. Inference of gene duplication and losses

Members of a gene family may arise from gene duplication events, but instances of gene loss can appear at different steps of phylogeny. Apparent gene losses can arise either because the pertinent sequences are missing from the data bank, or because the genes are actually absent or inactive in the genome of certain species. Gene duplication must be postulated whenever the gene tree is incompatible with the species tree and can be inferred even when some genes were lost or missing. The reconciled tree shown in Fig. 2 suggests the occurrence of at least eight duplication events along different lineages. The first took place very early, marking the divergence of the major pepsin group from that of pepsins C; the second and third duplication led to the segregation of pepsins Y and pepsins F, respectively. All the other duplications occurred during the diversification of pepsins A, disclosing a great complexity in the evolution of these genes. In rockcod, plaice and rabbit, two duplication rounds took place along the evolutionary pattern of these species, with some duplications having occurred before lineage divergence, others afterwards. In contrast, some gene groups including fetal pepsins and pepsins C appear loath to duplicate. Noteworthy are the contrasting positions of primates in the pepsins A and pepsins C clades: in the latter, the two sequences appear to be the result of cladogenesis; in the former, monkey A emerges from a duplication that occurred before the segregation of the other mammals, with the exception of dog, whose position may reflect the uncertainty in the phylogeny of carnivores.

The optimal species tree inferred by gene tree parsimony analysis is shown in Fig. 3. In this tree, fish is the most ancestral group followed by amphibians, birds and mammals with glires and primates forming sister clades. Such a reconstruction, albeit limited to a few species, is in agreement with the currently accepted phylogeny of vertebrates.

3.5. Adaptive evolution in pepsin-coding DNA sequences

Adaptive evolution was investigated using the method described by Creevey and McInerney (2002) derived from

Table 2B
Results of relative rate test
Clades                                   dK           S.D.         dK/S.D.    Exact probability (p)
Fish pepsin A vs. Pepsin C               0.223521     0.049304     4.53353    6.72 × 10^−6 **
Tetrapods pepsin A vs. Pepsin C          0.156384     0.0441276    3.5439     0.0004**
Fish pepsin A vs. Tetrapods pepsin A     0.0709812    0.0443991    1.59871    0.11 ns
ns: not significant.
** Significant at 1%.
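The last two columns of Table 2B can be checked with a few lines of code. The sketch below assumes that dK/S.D. is treated as a standard normal deviate and that the probability is two-sided; this is an assumption about how the statistic is evaluated, not a description of the RRtree program's internals, and the function name is hypothetical.

```python
import math


def relative_rate_p(dk, sd):
    """Return (z, two-sided p), treating z = dK / S.D. as standard normal."""
    z = dk / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


# dK and S.D. values copied from Table 2B.
rows = {
    "Fish pepsin A vs. Pepsin C": (0.223521, 0.049304),
    "Tetrapods pepsin A vs. Pepsin C": (0.156384, 0.0441276),
    "Fish pepsin A vs. Tetrapods pepsin A": (0.0709812, 0.0443991),
}
results = {name: relative_rate_p(dk, sd) for name, (dk, sd) in rows.items()}
for name, (z, p) in results.items():
    print(f"{name}: z = {z:.4f}, p = {p:.2e}")
```

Under these assumptions the second and third comparisons reproduce the tabulated probabilities after rounding (about 0.0004 and 0.11), while the first gives a value of the same order of magnitude as the tabulated one.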
Fig. 2. Reconciled tree for the gene tree depicted in Fig. 1. The tree was inferred using the program GeneTree; hypothetical branches leading to lost or missing sequences are shown as dashed lines.

the relative rate ratio test (MacDonald and Kreitman, 1991). The Creevey–McInerney method implemented in the program CRANN offers the advantage that it does not require a specific model. In brief, given a phylogeny, the number and types of substitutions are counted for each internal branch of the tree rooted with an appropriate outgroup, using reconstructed ancestral sequences. The substitutions are classified as replacement-invariable (RI) (the new character state is preserved in all subsequent lineages), replacement-variable (RV) (the new character state is not preserved and changes at least once more in subsequent lineages), silent-invariable (SI) (silent changes that have not been changed again), silent-variable (SV) (silent changes that have been changed again in subsequent lineages). The method uses silent (synonymous) substitutions as an estimate of neutral substitution pattern and
Fig. 3. Optimal species tree inferred from the reconciled tree by parsimony analysis.

looks at the ways in which replacement (non-synonymous) substitutions significantly deviate from this pattern. The test was performed on the tree in Fig. 1 rooted using seatrout and rockcod pepsin C sequences as outgroup. Significance was assessed by G-test to evaluate whether the ratio of replacement-invariable over replacement-variable substitutions was greater than expected from neutrality. Table 3 shows the estimated values of RI, RV, SI, SV and the results of statistical analysis performed by applying the G-test. The branch numbers correspond to those reported in the tree in Fig. 1. The data in Table 3 show that the ratio of RI to RV deviates significantly from neutrality in branches 5, 6, 7, 10, 11, 12, 19, 20, 23, 24, 25, 26. The last four branches correspond to the duplication events causing divergence of different gene groups. In the clade of fish pepsins A, adaptive evolution seems to have occurred during evolution both by gene duplication and speciation. Possibly, the divergence of the pepsin A3 gene in the Antarctic rockcod might be dictated by stringent environmental conditions that required the production of a new cold-adapted form of enzyme.

We followed an alternative approach to detect adaptive evolution by evaluating the non-synonymous versus synonymous substitution rate ratio (dN/dS) using the maximum likelihood method described by Yang (1998). The advantage of this method is that it does not depend on the accuracy of ancestral sequences. Ratios greater than 1 are indicative of positive selection, whereas ratios less than 1 and equal to 1 suggest purifying selection and neutral evolution, respectively.

In all the pairwise comparisons performed with pepsin sequences with the maximum likelihood method, the ratio ω=dN/dS was significantly less than 1, a result that is far from being surprising, as adaptive evolution is rarely detected in protein families using the pairwise approach (Bielawski and Yang, 2003). For this reason, ML analysis of synonymous and nonsynonymous ratios along lineages was performed testing the different models described in Section 2. The two extreme models assume either a unique rate ratio for all the branches of the tree (one-ratio model) or an independent rate ratio value for each branch (free-ratio model). We tested these two models together with the two intermediate models. The log likelihood (l) was −17,828.0 for the free-ratio model,

Table 3
Detection of adaptive evolution by relative rate ratio test and maximum likelihood analysis
              CRANN (Creevey–McInerney)                     PAML (Yang)
Branch no.    RI     RV      SI     SV      G value         ω (a)
0             26     37      12     36      3.21
1             61     57      38     49      1.28
2             99     104     58     92      3.56
3             129    189     74     145     2.53            20.1
4             145    346     81     248     2.39            35.4
5             157    459     82     334     4.70*           17.0
6             33     51      48     165     8.10**
7             211    556     131    528     11.40**         43.7
8             29     37      22     20      0.72
9             93     89      56     55      0.01
10            64     82      30     91      10.63**
11            189    209     95     167     8.16**
12            63     128     30     111     5.58*           26.4
13            81     303     46     205     0.72
14            27     37      13     36      2.97
15            62     83      45     68      0.22
16            71     122     49     95      0.27            9.6
17            80     157     55     135     1.12            4.7
18            99     197     70     192     2.98
19            42     78      51     169     5.31*
20            156    304     124    386     10.83**         16.6
21            50     47      54     43      0.32
22            137    151     76     116     2.97
23            304    506     201    528     17.33**         43.5
24            394    866     248    766     12.94**         37.2
25            594    1167    344    964     19.68**         53.9
26            808    1846    475    1561    29.60**         50.0
RI=replacement-invariable; RV=replacement-variable; SI=silent-invariable; SV=silent-variable.
(a) Only ω values >1 are reported in the table. Corresponding branches are marked in Fig. 1.
* Significant at 5%.
** Significant at 1%.
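The G values in Table 3 are consistent with a G-test of independence on the 2×2 table formed by the four substitution classes. The sketch below is an illustration under that assumption (no continuity correction, 1 degree of freedom), not CRANN's implementation; the function names are hypothetical, and 3.841 and 6.635 are the standard chi-square thresholds for the 5% and 1% levels.

```python
import math


def g_test_2x2(ri, rv, si, sv):
    """G statistic for independence of (replacement/silent) x (invariable/variable)."""
    observed = [[ri, rv], [si, sv]]
    total = ri + rv + si + sv
    row_sums = [ri + rv, si + sv]
    col_sums = [ri + si, rv + sv]
    g = 0.0
    for i in range(2):
        for j in range(2):
            obs = observed[i][j]
            if obs > 0:  # a zero cell contributes nothing to the sum
                expected = row_sums[i] * col_sums[j] / total
                g += obs * math.log(obs / expected)
    return 2.0 * g


def stars(g):
    """Significance flag for 1 d.f.: '**' at 1%, '*' at 5%, '' otherwise."""
    return "**" if g > 6.635 else "*" if g > 3.841 else ""


# Counts copied from Table 3 (branch: RI, RV, SI, SV).
branches = {26: (808, 1846, 475, 1561), 5: (157, 459, 82, 334), 9: (93, 89, 56, 55)}
for branch, counts in branches.items():
    g = g_test_2x2(*counts)
    print(branch, round(g, 2), stars(g))
```

The statistic itself carries no direction, so a significant G still has to be read together with the counts: adaptive evolution is suggested only when RI/RV exceeds SI/SV, as it does for branch 26 (808/1846 against 475/1561).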
−17,979.8 for the one-ratio model, −17,993.3 for the two-ratios model and −17,993.2 for the three-ratios model. Comparing the two extreme models by the likelihood ratio test, the free-ratio model fitted the data better than the one-ratio model; indeed, twice the log likelihood difference, 2(l_free − l_one)=303.6, gave p<0.005 with 57 degrees of freedom (because the one-ratio and the free-ratio models involved the estimation of 1 and 58 values, respectively). The ω values for the free-ratio model significantly greater than 1 are listed in Table 3; all other branches (not shown in Table 3) have ω values lower than 1, thus suggesting purifying selection along them.

The two methods give comparable results, especially for the innermost branches. The overall picture emerging from these data is that a marked increase in the rate of fixation of nonsynonymous mutations occurred after duplication leading to formation of distinct gene families. However, in a number of cases, evolution under positive selection apparently occurred also after divergence of gene families. This is particularly evident in the clades of pepsins A and pepsins C, although the two methods are not always in perfect agreement. For instance, the results of the relative rate ratio test indicate positive selection for two branches in the fish pepsins A clade, whilst maximum likelihood for these branches is in favor of purifying selection. At the moment, it is not possible to establish the limits and the power of each method; however, it must be noticed that the phylogeny of fish pepsins A is complicated by several rounds of duplications that occurred before and after speciation.

3.6. Divergence between pepsin sequences

It is a fact that synonymous substitutions can occur quite quickly, with the result that synonymous sites become saturated for further changes. This represents a general problem in any study on adaptive evolution at molecular level. In order to corroborate the results presented in the previous section, we used a method that can detect functional divergence following gene duplication using amino acid sequences. Such an approach offers the advantage that amino acid evolution proceeds at a lower rate with respect to nucleotides, providing a method for studying evolution by gene duplication in too divergent genes.

We estimated functional divergence between member genes of the pepsin family by posterior analysis using the program Diverge, which evaluates shifted evolutionary rate after gene duplication or speciation. The coefficients of functional divergence θ estimated between the groups of pepsin genes reported in Table 4 indicate significant site-specific shift of evolutionary rate between them. In particular, the pepsin C group appears to be functionally divergent from all the other groups considered; similarly, the data show functional differences between the fish and tetrapod pepsin A clades. The last column of Table 4 reports the number of amino acid sites responsible for functional divergence having posterior probability Qk>0.5. The values of functional branch length (bF) estimated from the pairwise comparisons between pepsins C, fish pepsins A and tetrapod pepsins A are: bF (pepsins C)=0.141, bF (fish pepsins A)=0.093 and bF (tetrapod pepsins A)=0.207.

4. Discussion

The aim of the present paper is to understand the historical relationships between pepsin encoding genes generated by gene duplication events. Comparative biochemical studies have established that, in spite of the striking structural similarity, pepsins show broad functional and chemical differences, based on which particular enzymes can be assigned as members of the gene family. In this analysis, we have investigated a set of different pepsin sequences from a number of vertebrate species to test the hypothesis that the divergence of pepsin groups may have involved positive selection and functional diversification following gene duplication.

Our results show that the optimal phylogeny of the pepsin gene family presents a marked discrepancy with that of the relative organisms; however, by comparing the gene tree with a taxonomic tree it is possible to infer a reconciled tree that allows the prediction of gene duplication and gene loss events. As stated above, gene loss may simply mean that certain genes have never been sequenced, but in a number of cases the genes may be really missing (or functionally inactive). The distinction between these two possibilities is not always straightforward, and requires a dose of good sense. Among the sequences considered in the present study, the genes of foetal pepsins (pepsins Y and F) are probably missing in amphibians, whilst their absence in higher vertebrates is more likely due to incomplete

Table 4
Functional divergence between groups of the pepsin gene family
Group 1           Group 2                                       θ ± S.E.          LRT      p          Sites with Qk>0.5
Pepsins C         Fish Pepsins A+Tetrapod Pepsins A,Y,F         0.172 ± 0.052     10.79    0.0005     9
Pepsins C         Tetrapod Pepsins A,Y,F                        0.184 ± 0.060     9.33     <0.0025    8
Pepsins C         Tetrapod Pepsins A,F                          0.225 ± 0.068     10.81    0.0005     15
Pepsins C         Fish Pepsins A                                0.209 ± 0.087     5.74     <0.025     7
Pepsins C         Tetrapod Pepsins A                            0.294 ± 0.077     14.56    <0.0005    31
Fish Pepsins A    Tetrapod Pepsins A                            0.259 ± 0.077     11.43    <0.0005    23
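The likelihood ratio comparison of the one-ratio and free-ratio models reported in Section 3.5 can be reproduced numerically. The sketch below assumes that the log likelihoods are negative (−17,979.8 and −17,828.0, as log likelihoods of sequence data must be) and evaluates the chi-square tail probability with the closed-form series valid for integer degrees of freedom; the helper names are hypothetical and the code is not part of PAML.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    chi = math.sqrt(x)
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2.0) / k
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: 2*Q(chi) + 2*phi(chi) * sum_{r=1}^{(df-1)/2} chi^(2r-1) / (1*3*...*(2r-1))
    tail = math.erfc(chi / math.sqrt(2.0))  # equals 2 * Q(chi), Q = normal upper tail
    phi = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return tail + 2.0 * phi * total


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, p-value) for two nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


# One-ratio model (1 ratio) against free-ratio model (58 ratios): 57 d.f.
stat, p = likelihood_ratio_test(lnl_null=-17979.8, lnl_alt=-17828.0, df=58 - 1)
print(round(stat, 1), p < 0.005)
```

The same helper could be applied to the intermediate two-ratio and three-ratio models by changing the log likelihoods and the difference in the number of estimated ratios.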

sampling. Such a conclusion is corroborated by the evidence that chymosins (pepsins Y) have the highest milk-clotting activity among the pepsins, a feature that is in accord with their nutritional role in mammalian newborns (Kageyama, 2002). Similarly, the presence of pepsin A has never been reported in rodents: possibly, the species-specific expression of various pepsins may reflect an adaptation to different diets. Likewise, the presence of multiple forms of pepsin A in certain species may be correlated with the type of food or the feeding habit. Plaice and rockcod have two and three forms of pepsin A, respectively; as both fish live in cold waters, it can be hypothesized that in these cases gene amplification is a strategy to increase enzyme production to facilitate digestion at low temperatures. The rabbit has three pepsin A isoforms although it is a typical herbivore. It has been suggested that digestion of herbaceous foods requires more enzyme to accomplish effective degradation of low concentrations of proteins (Kageyama, 2002).
Functional diversification between different types of pepsins is supported by experimental evidence. These enzymes display a rather high level of conservation, but this is true only if one considers the overall catalytic properties, while a more subtle distinction is possible only after a thorough analysis. Pepsin C (gastricsin) and pepsin A usually display maximal activity at pH 2–3 against hemoglobin, but the former has a twofold higher specific activity (Kageyama, 2000) with a preference for Tyr at the P1 position (Tang, 1970). Chymosins, on the other hand, have an optimal pH around 4 and differ from both pepsins A and C at the level of subsite S1′. The cleavage specificity of chymosins is apparently due to the presence of a negatively charged residue located near the edge of the active site cleft (Kageyama, 2002). Very little is known about the properties and the specificity of pepsins F. Other distinctive features will probably emerge when more detailed studies on zymogen activation, species-specific substrate cleavage and enzyme stability become available.
The overall picture emerging from the present study is that positive selection occurred after duplication leading to the formation of distinct pepsin groups, a phenomenon frequently observed during the evolution of gene families (Bielawski and Yang, 2000; Seoighe et al., 2003). It has been speculated that positive selection after gene duplication is a sign of adaptation of the protein to a new function (Bielawski and Yang, 2000; Lynch and Conery, 2000). The presence in the pepsin family of episodes of adaptive evolution in the absence of recent duplication events is intriguing, but in keeping with the evidence that some gene groups continue to evolve under positive selection long after the duplication event (Bielawski and Yang, 2003). It has been suggested that adaptive evolution subsequent to functional divergence may reflect the existence of long-term selective pressure (Rosenberg and Domachowske, 1999).
Another way to study protein adaptation is to investigate evolution of the pepsin gene family following an approach based on a maximum likelihood test of functional divergence (Gu, 1999, 2001; Gu et al., 2002). This method uses amino acid sequences; hence it offers the great advantage that it is not sensitive to saturation of synonymous sites. As groups with fewer than four sequences cannot be analyzed, the evolution of pepsins Y and F was not studied with this method. The results of this analysis show functional divergence between pepsins C and other pepsin genes, and between fish and tetrapod pepsin A clades. Our results also show that only a fraction of the protein residues contributes to functional divergence, as most sites are highly conserved. Some of these residues fall in the group of sites involved in the enzyme–substrate interaction: for example, among the 23 residues with Qk > 0.5 responsible for the divergence between fish and tetrapod pepsins A, one belongs to the S1–S5 subsites and two to the S1′–S3′ subsites.
In conclusion, our results unravel the complex evolutionary pattern of the pepsin gene family, characterized by a number of gene duplication events and, possibly, by instances of gene loss. Usually, duplications are followed by positive selection, but in some cases the latter may also have occurred long after divergence. At the protein level, gene duplication is characterized by functional divergence involving modifications of specific amino acid sites.

Acknowledgements

This research is within the framework of the Italian National Programme for Antarctic Research (P.N.R.A.).

References

Bargelloni, L., Scudiero, R., Parisi, E., Carginale, V., Capasso, C., Patarnello, T., 1999. Metallothioneins in Antarctic fish: evidence for independent duplication and gene conversion. Mol. Biol. Evol. 16, 885–897.
Bielawski, J.P., Yang, Z., 2000. Positive and negative selection in the DAZ gene family. Mol. Biol. Evol. 18, 523–528.
Bielawski, J.P., Yang, Z., 2003. Maximum likelihood methods for detecting adaptive evolution after gene duplication. J. Struct. Funct. Genomics 3, 201–212.
Bishop, M.J., Friday, A.E., 1988. Estimating the inter-relationships of tetrapod groups on the basis of molecular sequence data. In: Benton, M.J. (Ed.), The Phylogeny and Classification of the Tetrapods. Clarendon, Oxford, pp. 33–58.
Bremer, K., 1994. Branch support and tree stability. Cladistics 10, 295–304.
Capasso, C., Riggio, M., Scudiero, R., Carginale, V., di Prisco, G., Kay, J., Kille, P., Parisi, E., 1998. Molecular cloning and sequence determination of a novel aspartic proteinase from Antarctic fish. Biochim. Biophys. Acta 1387, 1–5.
Creevey, C., McInerney, J.O., 2002. An algorithm for detecting directional and non-directional positive selection, neutrality and negative selection in protein coding DNA sequences. Gene 300, 43–51.
Creevey, C., McInerney, J.O., 2003. CRANN: detecting adaptive evolution in protein-coding DNA sequences. Bioinformatics 19, 1726.

Davies, D.R., 1990. The structure and function of the aspartic proteinases. Annu. Rev. Biophys. Chem. 19, 189–215.
Dunn, B.M., 1992. Structure and Function of the Aspartic Proteinases. Plenum, New York.
Fusek, M., Vetvicka, V., 1995. Aspartic proteinases. Physiology and pathology. CRC Press, Boca Raton.
Gu, X., 1999. Statistical methods for testing functional divergence after gene duplication. Mol. Biol. Evol. 16, 1664–1674.
Gu, X., 2001. A site-specific measure for rate difference after gene duplication and speciation. Mol. Biol. Evol. 18, 2327–2330.
Gu, J., Wang, Y., Gu, X., 2002. Evolutionary analysis for functional divergence of Jak protein kinase domains and tissue-specific genes. J. Mol. Evol. 54, 725–733.
Huson, D.H., 1998. SplitsTree: analyzing and visualizing evolutionary data. Bioinformatics 14, 68–73.
Kageyama, T., 2000. New world monkey pepsinogens A and C, and prochymosins: purification, characterization of enzymatic properties, cDNA cloning, and molecular evolution. J. Biochem. 127, 761–770.
Kageyama, T., 2002. Pepsinogen, progastricsin, and prochymosin: structure, function, evolution, and development. CMLS, Cell. Mol. Life Sci. 59, 288–306.
Kumar, S., Tamura, K., Jakobsen, I.B., Nei, M., 2001. MEGA2: molecular evolutionary genetics analysis software. Bioinformatics 17, 1244–1245.
Liu, F.G.R., Miyamoto, M., Freire, N.P., Ong, P.Q., Tennant, M.R., Young, T.S., Gugel, K.F., 2001. Molecular and morphological supertrees for eutherian (placental) mammals. Science 291, 1786–1789.
Lynch, M., Conery, J.S., 2000. The evolutionary fate and consequences of duplicate genes. Science 290, 1151–1155.
Lynch, M., Force, A., 2000. The probability of duplicate genes preservation by subfunctionalization. Genetics 154, 459–473.
Lynch, M., O’Hely, M., Walsh, B., Force, A., 2001. The probability of preservation of a newly arisen gene duplicate. Genetics 159, 1789–1804.
MacDonald, J.H., Kreitman, M., 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351, 652–654.
Nelson, J.S., 1994. Fishes of the World. Wiley, New York.
Ohno, S., 1970. Evolution by Gene Duplication. Springer-Verlag, Heidelberg, Germany.
Ohta, T., 1988. Evolution by gene duplication and compensatory advantageous mutations. Genetics 120, 841–847.
Ohta, T., 1989. Role of gene duplication in evolution. Genome 31, 304–310.
Pearl, L., Blundell, T., 1984. The active site of aspartic proteinases. FEBS Lett. 174, 96–101.
Posada, D., Crandall, K.A., 1998. Modeltest: testing the model of DNA substitution. Bioinformatics 14, 817–818.
Riggio, M., Scudiero, R., Filosa, S., Parisi, E., 2000. Sex- and tissue-specific expression of aspartic proteinases in Danio rerio (zebrafish). Gene 260, 67–75.
Riggio, M., Scudiero, R., Filosa, S., Parisi, E., 2002. Oestrogen-induced expression of a novel liver-specific aspartic proteinase in Danio rerio (zebrafish). Gene 295, 241–246.
Robinson-Rechavi, M., Huchon, D., 2000. RR-Tree: relative-rate tests between groups of sequences on a phylogenetic tree. Bioinformatics 16, 296–297.
Rosenberg, H.F., Domachowske, J.B., 1999. Eosinophils, ribonucleases and host defense: solving the puzzle. Immunol. Res. 20, 261–274.
Scudiero, R., Verde, C., Carginale, V., Kille, P., Capasso, C., di Prisco, G., Parisi, E., 2000. Tissue-specific regulation of metallothionein and metallothionein mRNA accumulation in the Antarctic notothenioid, Notothenia coriiceps. Polar Biol. 23, 17–23.
Seoighe, C., Johnston, C.R., Shields, D.C., 2003. Significantly different patterns of amino acid replacement after gene duplication as compared to after speciation. Mol. Biol. Evol. 20, 484–490.
Sorenson, M.D., 1999. TreeRot. Boston University, Boston.
Strimmer, K., von Haeseler, A., 1996. Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies. Mol. Biol. Evol. 13, 964–969.
Swofford, D.L., 1993. PAUP, Phylogenetic Analysis Using Parsimony. Illinois Natural History Survey, Champaign, IL.
Takezaki, N., Rzhetsky, A., Nei, M., 1995. Phylogenetic test of the molecular clock and linearized tree. Mol. Biol. Evol. 12, 823–833.
Tang, J., 1970. Gastricsin and pepsin. Methods Enzymol. 19, 406–421.
Walsh, J.B., 1995. How often do duplicated genes evolve new functions? Genetics 139, 421–428.
Yang, Z., 1998. Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol. Biol. Evol. 15, 568–573.
Yang, Z., Nielsen, R., 1998. Synonymous and nonsynonymous rate variation in nuclear genes of mammals. J. Mol. Evol. 46, 409–418.
