
Protein Engineering vol. 16 no. 11 pp. 799–807, 2003
DOI: 10.1093/protein/gzg101

Exploring the sequence patterns in the α-helices of proteins

Junwen Wang¹,² and Jin-An Feng¹,²,³

¹Department of Chemistry and ²Center for Biotechnology, Temple University, 1901 North 13th Street, Philadelphia, PA 19122, USA

³To whom correspondence should be addressed. E-mail: feng@astro.temple.edu

Downloaded from http://peds.oxfordjournals.org/ at UCSF Library on March 19, 2015

This paper reports an extensive sequence analysis of the α-helices of proteins. α-Helices were extracted from the Protein Data Bank (PDB) and were divided into groups according to their sizes. It was found that some amino acids had differential propensity values for adopting helical conformation in short, medium and long α-helices. Pro and Trp had a significantly higher propensity for helical conformation in short helices than in medium and long helices. Trp was the strongest helix conformer in short helices. Sequence patterns favoring helical conformation were derived from a neighbor-dependent sequence analysis of proteins, which calculated the effect of neighboring amino acid type on the propensity of residues for adopting a particular secondary structure in proteins. This method produced an enhanced statistical significance scale that allowed us to explore the positional preference of amino acids for α-helical conformations. It was shown that the amino acid pair preference for α-helix had a unique pattern and this pattern was not always predictable by assuming proportional contributions from the individual propensity values of the amino acids. Our analysis also yielded a series of amino acid dyads that showed preference for α-helix conformation. The data presented in this study, along with our previous study on loop sequences of proteins, should prove useful for developing potential 'codes' for recognizing sequence patterns that are favorable for specific secondary structural elements in proteins.

Keywords: α-helix/propensity/protein structures/secondary structure/sequence pattern

Introduction

Recognizing the sequence patterns of proteins in relation to their structures has been one of the most important aspects of our efforts towards understanding the principles of protein folding. The Anfinsen experiments in the 1950s suggested that the primary amino acid sequence contained the information that specifies the folded native protein structure (Anfinsen, 1973). Subsequent experiments over the past few decades have generally supported this conclusion (Dill, 1990; Baldwin and Rose, 1999a,b; Honig 1999). Two-dimensional NMR hydrogen exchange, coupled with stopped-flow pulse-labeling experiments, showed that the folding intermediate(s) usually possess secondary structures that are similar to those of the native protein (Hughson et al., 1991; Jennings and Wright, 1993; Chamberlain and Marquesee, 1997). These data suggested that the formation of secondary structures largely depended on the local amino acid sequence since the intermediate folding states presumably did not have the established tertiary contacts of the native structure.

Armed with this experimental evidence, computational efforts to identify relationships between the amino acid sequence and special local structural elements have been intensive. Most attempts to identify such relationships have proceeded by identifying a common structural motif, then characterizing the frequencies of occurrence of each amino acid at each position in that motif. One of the best examples of such a study was the sequence pattern of the β-turn loop. This four-residue β-turn has been observed in a large number of protein structures (Efimov, 1993). It is a tight-turn loop structure with an intra-loop hydrogen bond between the main chain C=O(i) and the N–H(i + 3). The β-turn was later re-classified into many different sub-classes according to the polypeptide backbone dihedral angles (Hutchinson and Thornton, 1994). Other distinct sequence patterns include α-helix capping residues and β-breakers. Gly is commonly described as a helix terminator (or Ccap residue), whereas Asp, Asn, Ser and Thr are favored at the Ncap position of α-helices (Presta and Rose, 1988; Richardson and Richardson, 1988; Parker and Hefford, 1997). β-Breakers are residues often found at the β-strand initiation or termination positions of protein structures (Colloc'h and Cohen, 1991). They include Gly, Glu, Asp and Pro. More comprehensive approaches have involved clustering structural segments of proteins into classes using measures of structural similarity and then tabulating the sequence preference for each of the classes (Unger et al., 1989; Rooman et al., 1990; Olivea et al., 1997). In spite of these efforts, the discovered relationship between a particular local structure and the amino acid sequence has been insufficient for developing a rational secondary structure predictor.

We developed a method for analyzing the sequence–structure relationship of proteins termed neighbor-dependent sequence analysis (Crasto and Feng, 2001). This method calculated the neighboring probability of a pair of amino acids, in any combination, in three classes of secondary structures (α-helix, β-strand and loop). Neighbors were defined as the first neighbor, where the amino acids in the pair were immediately next to each other in sequence; the second neighbor, where the pair of amino acids was separated by one amino acid residue in sequence; the third neighbor, where the pair of amino acids was separated by two amino acids; or the fourth neighbor, where the pair of amino acids was separated by three amino acids. We applied the neighbor-dependent sequence analysis to the residues of immediate neighbors in loops of proteins (Crasto and Feng, 2001). A series of dyad codes that had strong preference for loop conformation were found. For example, it was found that Cys had a high loop propensity in short loops when it was at a position preceding an Arg, although both residues had low individual loop
Protein Engineering vol.16 no.11 © Oxford University Press 2003; all rights reserved
J.Wang and J.-A.Feng

propensities. It was evident that the neighbor-dependent protein sequence analysis method could reveal 'hidden' sequence codes in proteins.

α-Helices are one of the most dominant structural elements in proteins. Extensive studies have been carried out focusing on α-helix folding and its sequence–structure relationship. Early studies by Chou and Fasman (Chou and Fasman, 1978) have established a statistical scale to evaluate the likelihood of amino acids adopting α-helix conformation. Residues with high propensities were termed strong helix conformers, and the residues with helix propensities slightly higher than random distribution were termed medium helix conformers. Amino acids having a frequency of occurrence in helices lower than that of the random distribution were regarded as weak helix conformers. Although no chemical–physical rationale could be easily derived for the preference of amino acids adopting helix conformation, such a statistical analysis has achieved limited success in assisting our understanding of the sequence–structure relationship of proteins, as well as in predicting protein secondary structures (Chou and Fasman, 1978). A recent study by Penel et al. (Penel et al., 1999) on the analysis of side chain structures influencing residues adopting α-helical conformation showed that some amino acid residues favored hydrogen bonding with neighboring residues via either main chain–side chain or side chain–side chain interactions, thus suggesting a potential neighbor-dependent preference for residues in the α-helix. In this study, we carried out detailed neighbor-dependent sequence analysis on α-helices in proteins. Our results showed that amino acid pair preference for the α-helix had a unique pattern. Based on the neighbor-dependent propensity values, we derived a series of amino acid dyads that had predominant preference for α-helical conformation. We also carried out an analysis on the propensity variation of amino acids in short, medium and long α-helices. To the best of our knowledge, such an analysis has never been reported. Our study showed that there were significant propensity variations for certain amino acids in helices of different size groups.

Materials and methods

All analyses were performed using a relational database derived from the October 2001 release of the Brookhaven Protein Data Bank (PDB) (Bernstein et al., 1977). Sequences in the helix and strand regions were often more conserved than those of loop structures in homologous proteins (Benner and Gerloff, 1990). A non-redundant set of PDB entries was derived by using PISCES (Wang and Dunbrack, 2003). Proteins with a sequence identity of >25% were removed. Since the helical regions of the protein structures were usually well defined, it was necessary to include only structures that were determined at high resolution. The resolution cut-off of our database was 2.5 Å. Based on these criteria, a total of 1430 proteins were selected from the PDB. These PDB entries were used in the subsequent parsing protocols.

The extraction of sequences and secondary structure information from every PDB entry was based on two independent secondary structure assignment strategies: assignments which were experimentally observed and assignments generated by the Kabsch and Sander DSSP algorithm (Kabsch and Sander, 1983), which assigned secondary structure based on an analysis of backbone dihedral angles and hydrogen bonds. For the sake of comparison between the two methods of assignment, we converted the more sophisticated DSSP structure assignments as follows: helices, 3₁₀ helices and the π-helices were all considered helices.

The database system used in this study was PostgreSQL packaged in Redhat Linux 7.2. The sequence and secondary structure information of every PDB entry were parsed into relational tables. Two sets of tables (one was parsed based on the author-assigned secondary structure information and the other was parsed according to the DSSP calculation) were compared. Considering that the authors' assignments were experimentally observed, we decided to use such assignments in the PDB as the standard for defining α-helices in the protein structures. However, manual errors were often encountered in the PDB. In order to avoid such issues, we applied a double-check mechanism where every structural element in the PDB was compared with the DSSP assignment. The α-helices that were agreed upon by both methods were selected from the PDB and placed in a helix library. The total number of helices extracted was 10 643, which constituted 96.2% of the total helices available in those PDB entries. The α-helices were grouped according to their sizes.

The residue helix propensity values (e_a) were determined from the ratio of the residue's frequency of occurrence in helices versus its frequency of occurrence in the PDB (Equation 1):

$$e_a = \frac{a_S / n_S}{a_P / n_P} \qquad (1)$$

where a_S was the number of residues of type a in the helix library; n_S was the total number of residues in the helix data bank; a_P was the number of residues of type a in the PDB that contained all helices in the helix library; and n_P was the total number of residues in the PDB used in this analysis. The e_a values for residues in different helix groups were calculated using the corresponding values.

For neighbor-dependent analysis of helices, the frequency of occurrence of the residue type x at neighboring positions of a helix residue (a) was calculated according to Equation 2:

$$e_{x(a \pm i)} = \frac{[\Sigma x(a \pm i)_S] / n(\mathrm{pair})_S}{[\Sigma x(a \pm i)_P] / n(\mathrm{pair})_P} \qquad (2)$$

where Σx(a ± i)_S and Σx(a ± i)_P were the occurrences of residue type x at the (±i)th positions of the residue a in secondary structure sequence library S (α-helix) and in our PDB (P), respectively; n(pair)_S and n(pair)_P were the total number of residue pairs in S and P, respectively. The numerator of Equation 2 calculated the frequency of occurrence of residue x neighbored with the residue type a in the secondary structure (S), while the denominator of the equation calculated the frequency of occurrence of residue x neighbored with the residue type a in the PDB (P). The ratio of these values would be the propensity of residue x in S when it was neighbored with residue type a in S.

In this paper, we present the neighbor-dependent propensity values as e_x(a±1). An e_x(a±1) value of 1.0 means that the occurrence of the residue pair, ax (or xa), in helices is the same as its frequency of occurrence in proteins. A value >1.0 means the pair has an occurrence in helices higher than that in
Sequence analysis of α-helix in proteins



Fig. 1. Length distribution of α-helices in the helix library.
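
As an illustration of the length distribution in Fig. 1 and of the short, medium and long groups used in the Results (four to seven, eight to 13 and 14 to 22 residues), the following is a minimal sketch only. The names `GROUPS`, `group_of`, `summarize` and the toy sequences are hypothetical and are not taken from the original database code, which was implemented in PostgreSQL.

```python
from collections import Counter

# Inclusive length bounds for the three helix groups described in the text.
GROUPS = {"short": (4, 7), "medium": (8, 13), "long": (14, 22)}


def group_of(length):
    """Return the group name for a helix length, or None if outside 4-22."""
    for name, (low, high) in GROUPS.items():
        if low <= length <= high:
            return name
    return None


def summarize(helices):
    """Return (length histogram, helices per group, mean helix length)."""
    lengths = [len(h) for h in helices]
    histogram = Counter(lengths)
    per_group = Counter(g for g in map(group_of, lengths) if g is not None)
    mean_length = sum(lengths) / len(lengths)
    return histogram, per_group, mean_length


# Hypothetical toy sequences (assumption), not entries of the helix library.
toy_helices = ["AEL", "PEDLA", "AAKLLEEA", "EELLKKAEELLKKA", "A" * 30]
histogram, per_group, mean_length = summarize(toy_helices)
```

Helices shorter than four or longer than 22 residues still contribute to the histogram and to the mean length, but are left out of the three groups, which mirrors the subset selection described in the text.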

proteins, suggesting that the pair has a preference for adopting helix conformation. e_x(a±1) values lower than unity suggest less preference for the pair in helices. For example, e_P(A−1) = 1.52 in short helices means Pro has 50% more chance to be found in short helices than in the proteins when it precedes Ala, i.e. Pro at the −1 position of Ala; e_P(A+1) = 0.47 suggests Pro is less likely to be found in short helices when it follows Ala, i.e. Pro at the +1 position of Ala.

Results

Length distribution of α-helices in proteins

The lengths of α-helices in proteins varied between three and 77 residues with a total of 10 643 helices in the PDB. The population distribution of helices in the library had a mean helical length of approximately 12.1 residues (Figure 1), which was close to the mean helical length determined by Barlow and Thornton (Barlow and Thornton, 1988) that included 291 α-helices. There was a relatively large population of three-residue helices found in proteins. Those helices were most likely half-turn helices or irregular helices such as the 3₁₀ helices that were more frequently found in helices less than four residues long (Barlow and Thornton, 1988). In contrast to the α-helix length distribution reported by Zhu and Blundell (Zhu and Blundell, 1996), where four-residue helices had twice the population size of the five-residue helices, our helix library contained more five-residue helices than four-residue helices by a ratio of 3:2 (Figure 1). There was a gradual decrease in the helix population as the helical length increased beyond 13 residues. Helices longer than 40 residues were rarely found in proteins. The longest helix in our helix library had 77 residues. In fact, there were only one or two examples of each helical length having more than 47 residues (except 50- and 51-residue helices which had three examples of each).

In order to establish a data set that was representative of the general helix population, we chose a subset of our helix library containing only helices with lengths between four and 22 residues. This subset had a total population of 8771 helices. Owing to the uneven distribution of different lengths of the helices in the library where the number of short helices far exceeded the number of medium and long helices (Figure 1), it was likely that the helix preference patterns described for the helix library would only reflect the characteristics of the short or medium α-helices. In order to address such potential bias, we divided the subset of helices into three groups: short helices (four to seven residues), medium helices (eight to 13 residues) and long helices (14 to 22 residues). Based on the selection criteria, all three groups had approximately equal population sizes. In the following discussions, propensities calculated from the entire subset of four- to 22-residue helices were termed total helix propensity; propensities calculated from the short, the medium and the long helical groups were termed short, medium and long helix propensities, respectively.

Propensities of amino acids in α-helices of different length groups

The helix propensities of amino acids in short helices appeared quite different from those of the medium and long helices. Of particular interest were Trp and Pro. Not known as a strong helix conformer, Trp had a significantly higher frequency of occurrence in short helices (e_W = 1.51) than in medium and long helices. In fact, Trp was the strongest helix conformer in short helices. In contrast with its presence in medium and long helices, Pro also had a significantly elevated frequency of occurrence in short helices (e_P = 0.99). A number of residues, including Asp, Cys, Phe, Ser and Tyr, also had higher helix propensities in short helices than their propensity values in medium and long helices, while in the same subgroup of helices, residues Ala, Arg, Gln, Ile, Leu, Lys, Met and Val had slightly lower propensity values. Particularly noticeable were residues Asp and Ala. Both residues were good helix conformers in short helices, while their helix propensities were quite different in the context of the overall helix population (Figure 2). The amino acid propensities in the


Fig. 2. A bar graph of the normalized helix propensity of amino acids in groups of different helix sizes. The propensity values of amino acids in different helix
groups are represented by different bars as indicated in the legend.
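
The propensity values plotted in Fig. 2 follow Equation 1 of Materials and methods. The snippet below is a minimal sketch of that ratio, assuming the helix library and the parent protein sequences are available as plain one-letter strings. The function name `helix_propensity` and the toy sequences are hypothetical, and the paper's own calculation was carried out on relational tables rather than in Python.

```python
from collections import Counter


def helix_propensity(helix_seqs, protein_seqs):
    """Equation 1: e_a = (a_S / n_S) / (a_P / n_P) for each residue type a."""
    helix_counts = Counter("".join(helix_seqs))      # a_S for every type a
    protein_counts = Counter("".join(protein_seqs))  # a_P for every type a
    n_s = sum(helix_counts.values())                 # residues in helix set
    n_p = sum(protein_counts.values())               # residues in proteins
    return {
        a: (helix_counts[a] / n_s) / (protein_counts[a] / n_p)
        for a in protein_counts
    }


# Hypothetical toy data (assumption): two proteins and the helices they contain.
toy_proteins = ["GAELKAGP", "PGAALEGG"]
toy_helices = ["AELKA", "AALE"]
propensity = helix_propensity(toy_helices, toy_proteins)
```

To obtain group-specific values in the sense of the text, the same function would be called with only the short, medium or long helices as `helix_seqs`, keeping the protein set unchanged.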

medium and the long helix subgroups were generally similar to those of the total helix group. It appeared that the helical composition of short helices was quite different from that of the medium and long helices.

Neighbor-dependent sequence analysis of α-helices

In an attempt to analyze how neighboring residues affect the α-helix conformations of amino acids, we calculated amino acid preferences at positions immediately preceding (−1) or following (+1) an α-helix residue (a). The neighbor-dependent helix propensities of 20 amino acids at the +1 and −1 position of α-helix residues [e_x(a±1)] in different groups are tabulated in Table Ia–d. Propensity values >1.20 are in bold in the table for ease of inspection. Based on estimated standard deviations, most of the neighbor-dependent propensities had a level of confidence comparable to that of the individual amino acid propensities (Table Ia–d) (J.Wang and J.-A.Feng, unpublished results; Kumar and Bensal, 1996). The estimated standard deviations were slightly higher for neighbor-dependent propensities in the short helix group than in other helix groups. This variation could in part be attributed to the small population size of residue pairs in the short helix group, which was less than one-third of the other groups.

The neighbor-dependent helix propensities of amino acids often reflected the individual helix propensities of neighboring residues. When two strong helix conformers were neighbored, their neighbor-dependent propensity was almost invariably high. By the same token, when two weak helix conformers were neighbored, their neighbor-dependent propensity was often low. Interesting patterns often occurred when a strong helix conformer was neighbored with a weak helix conformer, or two moderate helix conformers were neighbored.

Ala, Glu, Leu and Gln were stronger helix conformers. Neighbor-dependent propensity calculation showed that they had a strong influence on the preference of neighboring residues adopting helical conformation. Not surprisingly, amino acids with strong or medium individual helix propensity, including Ala, Arg, Gln, Glu, Ile, Leu, Lys, Met, Phe, Ser, Trp and Tyr, often exhibited a strong preference for adopting α-helix conformation when they were neighbored with Ala, Glu, Leu and Gln residues. On the other hand, the neighbor-dependent effect for stronger helix conformers positioned next to residues with low individual helix propensity was limited. Although the frequency of occurrence for residues with low individual helix propensities was generally increased when they were neighbored with strong helix conformers, most of the neighbor-dependent propensities were nevertheless <1.0 (Table Ia). Proportionally, medium helix conformers had less influence on the preference of neighboring residues adopting helix conformation than the strong helix conformers. Amino acids Arg, Lys, Met and Trp had high neighbor-dependent propensity when they were neighbored with each other (Table Ia). No neighbor-dependent effect was observed when they were positioned next to weak helix conformers.

Unique sequence patterns were observed in different helix groups for a number of amino acids, particularly in the short helix group. Asp had a high propensity for helix conformation when it was neighbored with Ala, Arg and Glu in short helices, while such a pattern was not observed in the medium and the long helix groups. In contrast, the pairings of Ala with Val and Ile, as well as the pairing of Arg with Leu, were less frequently found in short helices than in other helix groups (Table Ib–d). Similarly reduced neighbor-dependent propensity values were also found for Arg, Ile, Lys and Met when they were neighbored with Gln in short helices. Tyr, a weak helix conformer, had a strong neighbor-dependent helix propensity when it was positioned next to Met in short helices [e_Y(M+1) = 1.91, e_Y(M−1) = 1.82], while in other helix groups, Met had no
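
The neighbor-dependent propensities tabulated in Table I follow Equation 2 of Materials and methods. The following is a minimal sketch of that ratio under stated assumptions: sequences are plain one-letter strings, pairs are counted only within a single helix or protein sequence, and n(pair) is taken as the total number of residue pairs at the chosen separation. The names `pair_counts` and `neighbor_propensity` and the toy data are hypothetical, and returning `None` for a pair that never occurs in the protein set is an illustrative choice, not something the paper specifies.

```python
from collections import Counter


def pair_counts(seqs, offset):
    """Count (a, x) pairs where x sits at position j + offset and a at j."""
    counts = Counter()
    for seq in seqs:
        for j, a in enumerate(seq):
            k = j + offset
            if 0 <= k < len(seq):
                counts[(a, seq[k])] += 1
    return counts


def neighbor_propensity(helix_seqs, protein_seqs, a, x, offset):
    """Equation 2 for residue x at position `offset` relative to residue a."""
    helix = pair_counts(helix_seqs, offset)
    protein = pair_counts(protein_seqs, offset)
    n_pair_s = sum(helix.values())    # n(pair)_S
    n_pair_p = sum(protein.values())  # n(pair)_P
    if n_pair_s == 0 or protein[(a, x)] == 0:
        return None  # ratio undefined for this pair (assumed convention)
    return (helix[(a, x)] / n_pair_s) / (protein[(a, x)] / n_pair_p)


# Hypothetical toy data (assumption), not taken from the helix library.
toy_proteins = ["GPAELG", "GAPLEG"]
toy_helices = ["PAEL"]
e_p_before_a = neighbor_propensity(toy_helices, toy_proteins, "A", "P", -1)
e_p_after_a = neighbor_propensity(toy_helices, toy_proteins, "A", "P", +1)
```

With `offset=-1` the function corresponds to e_x(a−1), residue x immediately preceding a, and with `offset=+1` to e_x(a+1). Larger offsets would give the second, third and fourth neighbors defined in the Introduction.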
Table I. Normalized neighbor-dependent helix propensity of residues in various helix groups^a

Ala Arg Asn Asp Cys Gln Glu Gly His Ile Leu Lys Met Phe Pro Ser Thr Trp Tyr Val

(a)
Ala 1.74(4) 1.58(5) 1.04(5) 1.05(4) 1.33(10) 1.66(6) 1.68(4) 0.69(3) 1.13(7) 1.53(5) 1.84(4) 1.60(5) 1.72(7) 1.40(6) 0.42(3) 1.16(4) 0.97(4) 1.71(10) 1.35(6) 1.32(4)
Arg 1.57(5) 1.25(6) 0.98(6) 1.21(5) 1.02(11) 1.44(7) 1.63(5) 0.57(4) 1.16(9) 1.12(5) 1.38(5) 1.28(6) 1.43(10) 1.11(6) 0.31(4) 1.01(6) 0.90(6) 1.34(12) 1.12(7) 0.80(4)
Asn 1.17(5) 0.96(7) 0.51(5) 0.68(5) 0.55(9) 0.86(7) 1.13(6) 0.35(3) 0.86(9) 0.82(5) 0.95(5) 0.82(5) 1.05(10) 0.94(6) 0.42(4) 0.66(5) 0.69(5) 0.93(10) 0.77(6) 0.73(5)
Asp 1.35(4) 1.07(6) 0.63(5) 0.78(5) 0.83(9) 1.19(7) 1.23(5) 0.28(2) 0.99(8) 1.05(5) 1.17(4) 1.02(5) 1.32(9) 1.09(6) 0.50(4) 0.68(4) 0.85(5) 1.21(10) 1.12(6) 0.96(4)
Cys 1.07(9) 1.04(10) 0.78(11) 0.73(9) 0.73(15) 1.03(12) 1.13(10) 0.54(7) 0.94(13) 1.00(11) 1.3(9) 1.03(11) 1.02(19) 0.86(12) 0.21(5) 0.58(8) 0.68(9) 0.95(21) 0.56(10) 0.59(8)
Gln 1.69(5) 1.53(7) 0.91(7) 0.99(7) 0.95(13) 1.54(7) 1.51(6) 0.55(4) 1.16(10) 1.22(6) 1.64(5) 1.39(7) 1.52(11) 1.24(8) 0.29(4) 0.96(6) 0.92(6) 1.19(12) 1.27(8) 1.01(6)
Glu 1.81(4) 1.44(5) 1.01(5) 1.08(5) 1.08(11) 1.76(6) 1.65(4) 0.63(4) 1.16(8) 1.39(5) 1.61(4) 1.50(5) 1.62(8) 1.29(6) 0.35(4) 1.20(5) 1.19(5) 1.60(10) 1.2(6) 1.18(4)
Gly 0.68(3) 0.58(4) 0.31(3) 0.42(3) 0.56(7) 0.58(4) 0.60(4) 0.35(2) 0.53(5) 0.55(3) 0.71(3) 0.48(3) 0.71(6) 0.57(4) 0.45(4) 0.40(3) 0.42(3) 0.50(6) 0.49(4) 0.45(3)
His 1.17(8) 1.00(9) 0.74(9) 0.75(7) 0.60(11) 1.17(10) 1.33(9) 0.47(5) 0.62(7) 0.93(7) 1.17(6) 0.89(8) 1.10(14) 0.86(9) 0.32(5) 0.75(7) 0.82(8) 0.89(14) 0.92(10) 0.90(7)
Ile 1.49(5) 1.26(6) 0.85(5) 0.82(4) 0.86(10) 1.28(7) 1.33(5) 0.73(4) 0.94(8) 1.03(5) 1.28(5) 1.24(5) 1.43(10) 0.93(6) 0.40(4) 0.88(5) 0.68(4) 1.09(12) 0.87(6) 0.85(4)
Leu 1.74(4) 1.47(4) 1.08(5) 1.16(4) 1.19(9) 1.43(5) 1.64(4) 0.81(3) 1.25(7) 1.3(5) 1.55(4) 1.54(4) 1.55(8) 1.21(5) 0.40(3) 1.06(4) 1.04(4) 1.38(10) 1.05(5) 1.10(4)
Lys 1.61(5) 1.37(6) 0.93(5) 1.03(5) 0.87(10) 1.45(7) 1.60(5) 0.50(3) 0.99(8) 1.08(5) 1.40(4) 1.29(5) 1.40(9) 1.14(7) 0.35(4) 0.94(5) 0.96(5) 1.12(11) 1.18(6) 0.83(4)
Met 1.60(7) 1.27(9) 0.84(9) 1.00(8) 1.09(18) 1.49(11) 1.27(8) 0.70(7) 0.91(13) 1.22(9) 1.73(8) 1.50(8) 1.30(13) 1.21(11) 0.55(7) 0.95(7) 0.71(7) 1.57(21) 1.25(12) 1.08(8)
Phe 1.29(6) 1.14(7) 0.79(6) 0.84(5) 0.83(11) 1.12(8) 1.30(6) 0.68(5) 0.74(8) 0.97(6) 1.44(5) 1.09(6) 1.31(12) 0.99(8) 0.29(4) 0.76(5) 0.71(5) 1.28(13) 1.11(8) 0.89(5)
Pro 0.81(4) 0.66(6) 0.26(4) 0.52(4) 0.37(8) 0.93(6) 1.09(5) 0.28(3) 0.61(8) 0.59(5) 0.81(4) 0.54(5) 0.69(9) 0.64(6) 0.28(4) 0.57(4) 0.46(4) 0.59(9) 0.64(6) 0.44(4)
Ser 1.17(4) 1.00(5) 0.61(5) 0.76(4) 0.64(8) 1.07(6) 1.08(5) 0.39(3) 0.74(7) 1.04(5) 1.20(4) 0.97(5) 1.12(9) 0.96(6) 0.40(4) 0.65(4) 0.65(4) 1.00(9) 0.93(6) 0.82(4)
Thr 1.14(4) 0.99(6) 0.50(4) 0.53(4) 0.78(9) 1.07(7) 1.04(5) 0.49(3) 0.73(7) 0.76(5) 1.15(4) 0.95(5) 1.24(9) 0.80(5) 0.45(4) 0.65(4) 0.67(4) 0.83(9) 0.72(6) 0.73(4)
Trp 1.76(10) 1.37(11) 0.79(10) 0.96(9) 0.65(16) 1.42(12) 1.34(11) 0.76(8) 0.87(14) 1.40(12) 1.63(9) 1.16(11) 1.07(17) 1.21(13) 0.60(10) 0.88(9) 0.70(9) 1.21(21) 0.83(12) 1.04(10)
Tyr 1.31(6) 0.97(6) 0.75(6) 0.86(6) 0.81(11) 1.11(8) 1.17(7) 0.63(5) 0.75(9) 0.88(7) 1.46(6) 1.10(7) 1.36(12) 1.01(8) 0.46(5) 0.67(5) 0.56(5) 0.95(12) 0.96(8) 0.76(5)
Val 1.27(4) 1.12(5) 0.83(5) 0.79(4) 0.83(9) 1.20(6) 1.17(4) 0.57(3) 0.69(6) 0.72(4) 1.04(4) 1.12(5) 1.25(8) 0.85(5) 0.35(3) 0.70(4) 0.60(4) 0.71(9) 0.68(5) 0.66(3)

(b)
Ala 1.27(11) 1.38(17) 1.33(19) 1.47(16) 1.21(32) 1.25(18) 1.56(16) 0.76(10) 0.88(21) 0.82(13) 1.22(12) 1.53(17) 1.08(22) 1.31(19) 0.47(11) 1.34(16) 0.95(14) 1.48(34) 0.84(17) 0.82(11)
Arg 1.03(15) 0.96(18) 1.08(21) 1.47(20) 0.38(22) 1.0(21) 1.47(19) 0.72(14) 1.43(32) 0.62(13) 0.85(12) 1.34(21) 0.80(26) 1.26(22) 0.56(16) 1.15(19) 1.03(19) 1.38(43) 1.16(23) 0.56(12)
Asn 1.04(16) 1.05(22) 0.37(12) 0.75(17) 0.92(37) 0.71(19) 1.12(19) 0.37(9) 1.37(35) 0.79(16) 0.87(14) 0.67(16) 0.32(18) 0.91(21) 0.76(16) 0.94(18) 0.64(15) 1.43(40) 0.94(22) 0.78(15)
Asp 1.28(15) 1.4(21) 0.95(18) 1.12(17) 1.41(38) 1.41(25) 1.6(18) 0.46(9) 0.96(26) 1.39(18) 1.73(16) 1.34(18) 1.69(36) 1.44(21) 1.19(18) 1.07(17) 0.98(17) 2.15(43) 1.43(23) 1.34(16)
Cys 1.13(32) 1.12(35) 0.61(30) 1.30(38) 0.62(43) 0.76(33) 1.65(41) 1.00(27) 2.07(60) 0.72(29) 0.88(26) 1.00(35) 0.37(37) 1.21(45) 0.43(21) 1.14(32) 1.09(34) 2.54(109) 1.09(44) 0.54(24)
Gln 1.20(17) 1.07(22) 0.87(21) 1.00(21) 0.76(38) 1.17(23) 1.61(23) 0.8(16) 0.65(24) 0.89(19) 1.31(17) 1.08(20) 1.17(36) 1.21(26) 0.44(16) 0.80(18) 0.74(18) 1.13(39) 1.25(27) 0.96(18)
Glu 1.72(16) 1.07(16) 1.1(17) 1.56(19) 1.28(40) 1.66(24) 1.47(16) 0.70(12) 1.12(27) 1.58(18) 1.58(15) 1.34(16) 1.92(33) 1.19(20) 0.49(13) 1.5(20) 1.43(19) 2.54(45) 1.42(23) 1.19(15)
Gly 0.77(11) 0.96(15) 0.31(9) 0.63(11) 1.17(30) 0.66(15) 0.67(12) 0.44(8) 0.40(14) 0.50(10) 0.59(9) 0.57(10) 0.58(18) 0.66(13) 0.52(13) 0.61(11) 0.49(10) 0.70(23) 0.78(16) 0.41(8)
His 1.56(29) 0.72(24) 1.21(34) 0.66(22) 0.63(36) 1.23(33) 1.10(28) 0.59(17) 0.55(21) 0.77(22) 1.06(20) 0.86(25) 1.13(45) 0.90(28) 0.87(24) 0.61(20) 0.93(26) 0.95(47) 0.54(24) 0.98(23)
Ile 1.06(14) 1.26(19) 0.56(13) 0.76(13) 0.33(19) 0.75(18) 1.25(17) 0.71(13) 0.77(23) 0.70(14) 0.82(13) 0.80(14) 0.93(29) 0.96(21) 0.35(10) 0.82(14) 0.56(12) 1.53(45) 0.94(21) 0.24(8)
Leu 1.32(12) 0.98(13) 0.73(13) 1.32(14) 1.15(29) 1.17(17) 1.29(13) 0.78(11) 0.96(21) 0.94(14) 1.26(12) 1.14(13) 1.24(24) 1.54(21) 0.5(10) 0.97(12) 0.91(12) 1.50(38) 0.89(16) 0.79(11)
Lys 1.17(14) 1.26(20) 0.97(18) 1.22(17) 1.11(36) 1.15(22) 1.39(16) 0.53(11) 1.16(28) 0.80(14) 1.12(14) 1.10(15) 1.30(31) 1.16(22) 0.57(14) 1.07(17) 0.89(15) 1.34(40) 1.57(25) 0.68(12)
Met 0.83(20) 1.15(30) 0.37(18) 1.05(27) 1.05(60) 1.00(33) 1.24(28) 1.01(25) 0.19(19) 0.64(22) 0.95(21) 1.26(28) 0.90(36) 0.87(32) 0.71(25) 0.81(22) 0.62(21) 2.38(93) 1.91(49) 0.73(21)
Phe 1.15(19) 1.55(26) 0.93(20) 1.02(18) 1.96(55) 1.30(28) 1.68(24) 0.81(16) 0.45(20) 0.62(16) 1.42(19) 1.00(19) 1.17(38) 1.14(27) 0.39(14) 0.97(18) 0.93(18) 1.31(45) 0.98(24) 0.75(16)
Pro 1.52(18) 1.26(24) 0.63(16) 0.93(16) 0.85(34) 1.97(30) 2.18(22) 0.47(11) 1.10(31) 0.83(18) 1.28(17) 0.92(19) 0.92(32) 1.49(26) 0.78(18) 1.58(21) 0.93(17) 1.90(49) 1.16(25) 0.47(11)
Ser 1.19(15) 1.34(20) 0.88(17) 1.20(17) 1.34(35) 1.42(22) 1.53(19) 0.52(9) 0.89(23) 1.3(18) 1.56(16) 1.30(18) 1.17(30) 1.23(20) 0.57(14) 1.03(14) 0.44(11) 1.36(34) 1.08(21) 0.89(14)
Thr 0.73(12) 0.58(14) 0.55(14) 0.75(14) 1.09(32) 0.97(21) 0.73(14) 0.47(9) 0.81(23) 0.76(15) 0.98(12) 1.04(18) 1.10(30) 0.68(16) 0.5(12) 0.84(15) 0.70(14) 1.23(33) 1.20(23) 0.51(10)
Trp 2.27(43) 2.42(50) 0.72(29) 1.49(36) 1.20(68) 1.74(45) 2.3(48) 1.39(35) 1.94(66) 2.37(54) 1.88(35) 2.06(47) 0.31(31) 2.18(58) 0.69(34) 1.71(40) 0.88(31) 1.70(82) 1.26(47) 0.72(27)
Tyr 1.10(19) 0.80(19) 0.78(19) 1.17(21) 1.44(47) 0.78(21) 1.83(27) 1.07(19) 1.16(36) 0.48(16) 1.21(19) 1.07(22) 1.82(49) 1.06(25) 0.62(18) 0.97(19) 0.50(15) 1.44(47) 0.8(23) 0.51(14)
Val 0.90(12) 0.69(13) 0.76(15) 0.81(13) 0.66(25) 0.98(18) 0.95(13) 0.55(10) 0.54(18) 0.45(10) 0.79(11) 1.12(15) 0.93(25) 1.17(19) 0.34(9) 0.68(12) 0.47(10) 0.54(24) 0.64(16) 0.50(9)



Table I. Continued

Ala Arg Asn Asp Cys Gln Glu Gly His Ile Leu Lys Met Phe Pro Ser Thr Trp Tyr Val

(c)
Ala 1.75(7) 1.49(10) 0.97(9) 0.97(8) 1.27(18) 1.83(12) 1.64(9) 0.66(6) 1.16(13) 1.52(9) 2.10(8) 1.60(9) 1.52(14) 1.59(11) 0.39(6) 1.16(8) 0.81(7) 1.72(20) 1.63(13) 1.31(8)
Arg 1.64(10) 1.15(11) 1.03(11) 1.08(10) 1.10(21) 1.41(14) 1.68(11) 0.51(7) 1.14(16) 1.13(10) 1.42(9) 1.20(11) 1.50(20) 0.88(11) 0.30(7) 0.93(10) 0.82(10) 1.51(25) 1.06(12) 0.68(7)
Asn 1.25(10) 0.90(12) 0.53(8) 0.67(9) 0.42(15) 0.82(12) 1.05(11) 0.24(4) 0.69(14) 0.81(9) 0.93(8) 0.80(10) 1.08(19) 0.95(12) 0.34(6) 0.59(8) 0.72(9) 0.90(18) 0.69(11) 0.77(8)
Asp 1.36(9) 0.93(10) 0.59(8) 0.76(8) 0.78(16) 1.13(13) 1.31(9) 0.22(4) 0.81(14) 1.05(9) 1.06(7) 0.97(9) 1.16(17) 1.20(11) 0.42(6) 0.59(7) 0.80(9) 1.41(20) 1.05(11) 1.03(8)
Cys 0.74(15) 1.04(19) 0.63(18) 0.77(17) 0.63(25) 1.14(23) 1.05(19) 0.47(11) 0.77(22) 0.91(19) 1.23(17) 1.42(23) 1.66(42) 0.71(20) 0.15(7) 0.56(13) 0.78(16) 0.50(17) 0.37(12)
Gln 1.57(11) 1.58(15) 0.85(12) 0.91(12) 0.78(22) 1.29(14) 1.41(12) 0.44(7) 1.21(19) 1.35(13) 1.66(11) 1.26(12) 1.28(21) 1.49(16) 0.36(8) 0.80(10) 0.88(11) 1.35(24) 1.16(15) 0.87(10)
Glu 1.76(9) 1.57(11) 0.88(9) 1.04(9) 1.01(20) 1.7(13) 1.93(10) 0.49(6) 0.92(14) 1.41(10) 1.72(8) 1.65(10) 1.58(17) 1.35(12) 0.40(7) 1.20(10) 1.12(9) 1.59(20) 1.32(13) 1.22(8)
Gly 0.69(6) 0.50(6) 0.36(6) 0.35(5) 0.43(11) 0.46(7) 0.57(6) 0.40(5) 0.50(9) 0.50(6) 0.72(6) 0.39(5) 0.83(12) 0.52(7) 0.41(7) 0.29(4) 0.34(5) 0.45(11) 0.45(7) 0.39(5)
His 0.91(13) 0.90(15) 0.55(13) 0.75(13) 0.87(24) 1.23(19) 1.31(17) 0.37(8) 0.59(12) 0.86(13) 1.21(12) 0.69(13) 0.77(22) 0.93(16) 0.25(8) 0.75(13) 0.74(13) 0.57(21) 0.88(17) 0.83(12)
Ile 1.55(9) 1.10(10) 0.98(10) 0.89(8) 0.79(17) 1.36(13) 1.32(10) 0.70(7) 0.86(14) 1.04(10) 1.57(10) 1.32(10) 1.44(20) 0.91(12) 0.46(7) 0.88(8) 0.77(8) 0.86(19) 0.74(11) 0.87(8)
Leu 1.92(8) 1.66(9) 1.32(10) 1.11(7) 1.21(17) 1.34(10) 1.74(9) 0.8(6) 1.17(13) 1.39(9) 1.67(7) 1.70(9) 1.55(15) 1.36(11) 0.47(5) 1.06(7) 1.06(8) 1.20(19) 1.05(10) 1.14(8)
Lys 1.65(9) 1.36(11) 0.97(10) 1.06(9) 1.02(20) 1.48(14) 1.67(10) 0.47(6) 1.12(16) 1.26(10) 1.44(9) 1.50(10) 1.29(17) 1.15(12) 0.33(6) 0.95(9) 0.89(9) 0.96(19) 1.21(12) 0.88(8)
Met 1.73(16) 1.18(17) 0.85(16) 0.98(15) 1.08(34) 1.22(20) 1.11(15) 0.50(10) 0.77(21) 1.23(17) 2.05(17) 1.42(16) 1.18(23) 1.14(21) 0.64(13) 0.81(13) 0.64(12) 1.09(36) 1.35(23) 1.09(14)
Phe 1.36(11) 0.90(12) 0.68(10) 0.90(10) 0.78(20) 1.08(14) 1.33(12) 0.71(8) 0.67(14) 1.1(12) 1.57(11) 1.37(13) 1.65(25) 1.06(15) 0.30(7) 0.67(8) 0.70(9) 1.29(25) 1.05(14) 0.96(10)
Pro 0.99(8) 0.69(10) 0.20(5) 0.55(7) 0.29(12) 0.74(11) 1.17(9) 0.27(5) 0.56(13) 0.78(10) 0.91(8) 0.56(9) 0.83(17) 0.56(9) 0.24(6) 0.49(7) 0.49(7) 0.60(16) 0.68(11) 0.49(6)
Ser 1.24(8) 0.81(9) 0.60(8) 0.63(7) 0.53(13) 0.96(11) 1.03(9) 0.36(5) 0.79(13) 1.08(9) 1.16(8) 0.87(9) 1.07(16) 0.88(10) 0.37(6) 0.56(6) 0.58(7) 1.09(17) 0.79(10) 0.80(8)
Thr 1.19(9) 1.00(11) 0.55(8) 0.51(7) 0.68(15) 0.79(11) 1.06(9) 0.41(5) 0.67(12) 0.61(8) 1.15(8) 0.94(10) 1.42(19) 0.77(10) 0.48(7) 0.64(7) 0.61(7) 0.68(14) 0.62(10) 0.76(7)
Trp 1.41(19) 0.94(18) 0.86(18) 0.80(15) 0.69(30) 1.24(22) 1.31(21) 0.60(13) 0.50(20) 1.22(22) 1.79(19) 1.14(20) 1.27(34) 1.38(26) 0.89(22) 0.93(17) 0.60(15) 1.74(46) 0.49(17) 1.33(20)
Tyr 1.28(11) 1.04(12) 0.75(11) 0.81(10) 0.88(21) 1.10(14) 1.12(12) 0.51(8) 0.36(12) 0.91(12) 1.43(11) 1.17(13) 1.15(22) 0.96(14) 0.51(9) 0.71(9) 0.67(10) 0.88(21) 0.84(13) 0.76(10)
Val 1.25(8) 1.18(9) 0.67(8) 0.82(7) 0.87(16) 1.22(11) 1.31(9) 0.51(6) 0.87(13) 0.74(7) 1.13(7) 1.19(9) 1.12(15) 0.92(10) 0.46(6) 0.66(7) 0.61(6) 0.63(15) 0.64(9) 0.71(6)

(d)
Ala 1.84(6) 1.69(8) 1.02(8) 1.00(6) 1.40(15) 1.64(9) 1.74(7) 0.69(5) 1.16(11) 1.70(8) 1.81(6) 1.63(8) 2.02(13) 1.29(9) 0.43(5) 1.11(7) 1.09(7) 1.75(16) 1.27(9) 1.45(7)
Arg 1.64(8) 1.38(10) 0.93(9) 1.23(9) 1.12(17) 1.58(12) 1.63(9) 0.58(6) 1.11(13) 1.24(9) 1.48(7) 1.33(9) 1.53(16) 1.24(10) 0.25(5) 1.03(8) 0.93(8) 1.20(18) 1.15(10) 0.94(7)
Asn 1.15(8) 0.98(10) 0.53(7) 0.67(7) 0.56(14) 0.92(10) 1.19(9) 0.43(5) 0.86(13) 0.84(8) 0.99(7) 0.87(8) 1.20(16) 0.94(10) 0.39(6) 0.64(7) 0.68(7) 0.84(15) 0.78(9) 0.69(7)
Asp 1.35(7) 1.09(9) 0.58(7) 0.71(7) 0.73(13) 1.18(11) 1.08(7) 0.28(4) 1.12(13) 0.98(7) 1.11(6) 0.98(7) 1.34(15) 0.92(8) 0.39(5) 0.66(6) 0.85(7) 0.84(13) 1.09(9) 0.83(6)
Cys 1.29(16) 1.03(16) 0.93(17) 0.57(12) 0.82(23) 1.02(18) 1.06(16) 0.48(9) 0.77(18) 1.14(17) 1.45(15) 0.76(14) 0.72(24) 0.88(18) 0.21(7) 0.46(10) 0.50(11) 1.23(36) 0.48(14) 0.76(13)
Gln 1.90(9) 1.61(12) 0.97(10) 1.04(10) 1.11(21) 1.81(13) 1.56(10) 0.57(6) 1.24(15) 1.22(10) 1.70(9) 1.55(11) 1.77(19) 1.08(12) 0.20(5) 1.11(10) 0.99(10) 1.09(18) 1.35(13) 1.12(9)
Glu 1.87(7) 1.44(8) 1.08(8) 1.00(7) 1.08(17) 1.83(11) 1.50(7) 0.71(6) 1.33(13) 1.33(8) 1.55(7) 1.43(7) 1.58(13) 1.28(10) 0.27(5) 1.12(8) 1.17(8) 1.37(15) 1.08(9) 1.16(7)
Gly 0.65(5) 0.54(5) 0.27(4) 0.43(4) 0.51(9) 0.64(7) 0.61(5) 0.29(3) 0.59(8) 0.59(5) 0.73(5) 0.52(5) 0.66(9) 0.59(6) 0.46(6) 0.43(4) 0.46(5) 0.49(9) 0.45(6) 0.50(4)
His 1.26(12) 1.13(14) 0.76(13) 0.78(11) 0.41(14) 1.12(15) 1.40(14) 0.52(8) 0.66(11) 1.02(12) 1.16(10) 1.04(13) 1.32(22) 0.81(12) 0.23(6) 0.79(11) 0.84(12) 1.10(23) 1.04(15) 0.94(11)
Ile 1.55(8) 1.37(9) 0.83(7) 0.78(6) 1.04(15) 1.35(11) 1.35(8) 0.75(6) 1.03(12) 1.10(8) 1.19(7) 1.28(8) 1.53(17) 0.94(10) 0.37(5) 0.90(7) 0.65(6) 1.15(18) 0.95(10) 0.99(7)
Leu 1.72(6) 1.46(7) 1.00(7) 1.16(6) 1.19(14) 1.56(9) 1.67(7) 0.82(5) 1.37(11) 1.32(7) 1.54(6) 1.53(7) 1.62(12) 1.01(8) 0.33(4) 1.08(6) 1.05(6) 1.48(17) 1.10(8) 1.14(6)
Lys 1.69(8) 1.40(9) 0.89(8) 0.96(7) 0.72(14) 1.51(11) 1.61(8) 0.51(5) 0.86(12) 1.02(8) 1.44(7) 1.19(7) 1.50(15) 1.13(10) 0.31(5) 0.91(7) 1.02(8) 1.18(17) 1.07(10) 0.83(6)
Met 1.68(12) 1.37(15) 0.94(14) 1.01(12) 1.10(28) 1.80(19) 1.38(13) 0.76(10) 1.18(21) 1.35(15) 1.69(13) 1.61(14) 1.48(21) 1.34(18) 0.45(9) 1.07(12) 0.77(11) 1.73(36) 1.02(17) 1.17(12)
Phe 1.28(9) 1.20(11) 0.83(9) 0.75(7) 0.59(15) 1.11(12) 1.20(9) 0.63(7) 0.87(13) 0.97(10) 1.36(8) 0.92(9) 1.10(17) 0.91(11) 0.26(5) 0.77(7) 0.67(7) 1.27(20) 1.18(12) 0.86(8)
Pro 0.52(5) 0.49(7) 0.20(5) 0.40(5) 0.31(10) 0.81(9) 0.77(6) 0.24(4) 0.53(10) 0.40(6) 0.62(6) 0.43(6) 0.53(12) 0.49(7) 0.20(4) 0.38(5) 0.33(5) 0.26(9) 0.49(8) 0.39(5)
Ser 1.11(7) 1.06(8) 0.54(7) 0.74(6) 0.56(11) 1.06(9) 1.01(7) 0.39(4) 0.68(10) 0.95(7) 1.14(6) 0.95(7) 1.14(13) 0.95(8) 0.37(5) 0.63(5) 0.75(7) 0.86(13) 0.99(9) 0.81(6)
Thr 1.20(7) 1.08(9) 0.45(6) 0.50(6) 0.77(13) 1.29(11) 1.10(8) 0.55(5) 0.75(10) 0.86(7) 1.19(6) 0.93(8) 1.14(14) 0.85(8) 0.41(5) 0.62(6) 0.70(6) 0.85(13) 0.67(8) 0.75(6)
Trp 1.88(17) 1.41(17) 0.75(14) 0.95(14) 0.49(21) 1.48(19) 1.14(16) 0.72(12) 0.88(21) 1.30(18) 1.46(14) 0.97(15) 1.12(27) 0.85(17) 0.38(12) 0.63(12) 0.72(13) 0.72(26) 0.96(19) 0.92(14)
Tyr 1.38(10) 0.95(10) 0.74(9) 0.82(8) 0.62(15) 1.20(12) 1.05(10) 0.60(7) 0.92(15) 0.96(10) 1.54(9) 1.05(10) 1.39(20) 1.03(12) 0.38(7) 0.57(7) 0.51(7) 0.89(17) 1.08(12) 0.82(8)
Val 1.38(6) 1.19(8) 0.96(8) 0.76(6) 0.84(13) 1.23(9) 1.12(7) 0.62(5) 0.60(9) 0.78(6) 1.02(6) 1.08(7) 1.42(14) 0.73(7) 0.27(4) 0.72(6) 0.61(5) 0.81(14) 0.72(8) 0.67(5)

Amino acids in the columns precede residues in the rows. Propensity values >1.20 are in bold for ease of inspection. Parts (a)–(d) list normalized propensity values derived from analysis of (a) the helix library, (b) short helices, (c) medium helices and (d) long helices. The standard deviations were estimated according to the formula derived by Williams et al. (Williams et al., 1987): $\mathrm{sd} = [f_{ij,k}(1 - f_{ij,k})\,n_{ij}]^{1/2}\,N/n_{ij}$, where $f_{ij,k}$ was the frequency of occurrence of the residue pair i, j in the helix group k, $n_{ij}$ was the total number of i, j pairs and $N$ was the total number of residue pairs in our PDB library. The total number of residue pairs was 11 101 in the short helix group, 32 415 in the medium helix group, 45 917 in the long helix group and 89 433 in the total helix group. The total number of residue pairs in our PDB library was 338 875. All propensity values were represented with two decimal places in order to keep all records on a consistent basis. The residue pair Cys–Trp was not present in the medium helix group.
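
Because the bold highlighting mentioned in the footnote does not survive in a plain-text copy of Table I, a short script can recover which entries exceed the 1.20 threshold. The following is a minimal sketch, not part of the original analysis. It assumes that each cell has the form value(uncertainty) and that the parenthesized digits give the estimated standard deviation in units of 0.01 (so that 1.84(6) reads as 1.84 with sd 0.06). The names `parse_row` and `favored` are hypothetical.

```python
import re
from typing import List, Tuple

# One table cell such as "1.84(6)": a value followed by digits in parentheses.
CELL = re.compile(r"(\d+(?:\.\d+)?)\((\d+)\)")


def parse_row(line: str) -> Tuple[str, List[Tuple[float, float]]]:
    """Split one Table I row into its residue name and (value, sd) cells."""
    residue, *cells = line.split()
    parsed = []
    for cell in cells:
        match = CELL.fullmatch(cell)
        if match is None:
            raise ValueError(f"unrecognized cell: {cell!r}")
        value = float(match.group(1))
        # Assumption: the parenthesized digits are the sd in units of 0.01.
        sd = int(match.group(2)) / 100
        parsed.append((value, sd))
    return residue, parsed


def favored(cells: List[Tuple[float, float]], threshold: float = 1.20) -> List[int]:
    """Return the column indices whose propensity exceeds the bold threshold."""
    return [i for i, (value, _) in enumerate(cells) if value > threshold]


residue, cells = parse_row("Ala 1.84(6) 1.69(8) 1.02(8)")
print(residue, favored(cells))
```

The column indices returned by `favored` refer to positions within the row; mapping them back to residue names requires the column header of Table I.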


influence on the preference of Tyr adopting helical conformation. Another noteworthy pattern was that the pairing of Met and Ala, two strong helix conformers, showed no preference for adopting helical conformation in short helices (Table Ib). Such a pattern was not observed in other helix groups.

For weak helix conformers, their neighbor-dependent helix propensities varied significantly in different helix groups. In the short helix group, Asp was a good 'helix neighbor' to a number of amino acids, including Ala, Arg, Asp, Gln, Glu, Ile, Leu, Lys, Met, Phe, Trp, Tyr and Val, while in long helices Asp had little effect on other amino acids adopting helical conformation. Also in the short helix group, the pairing of Tyr with Cys and Glu yielded strong neighbor-dependent propensities. In medium and long helices, Tyr was found to be favored next to Leu residues. Cys was also more frequently found at the +1 position of Phe [$e_C(F+1) = 1.96$] and at the –1 position of Trp [$e_C(W-1) = 2.54$]. There were only relatively few amino acids showing preference for the helix conformation when neighbored to a His residue (Table Ia–c). One example was that Cys and Trp were preferably positioned at the –1 position of His in short helices [$e_W(H-1) = 2.07$, $e_W(H-1) = 1.94$]. Like that of His, residues neighboring Ser were not often found in helical conformation. The exceptions we found were the pairings of Ser with Trp and Glu in short helices (Table Ib).

Pro and Gly, being the residues that had the lowest propensity for helix conformation, had the expected pattern of weak effect on the helical propensities of their neighbors. This pattern appeared consistent in all helix groups (Table Ia–d). Interesting preference patterns of amino acids neighboring Pro residues, however, were found in short helices. Pro had high helix propensity values when it was positioned at the –1 positions of Ala, Gln, Glu, Phe, Ser and Trp residues [$e_P(A-1) = 1.52$, $e_P(Q-1) = 1.97$, $e_P(E-1) = 2.18$, $e_P(F-1) = 1.49$, $e_P(S-1) = 1.58$, $e_P(W-1) = 1.90$]. On the other hand, no preference pattern was found for residues neighboring Gly in helices.

Table II. Dyad sequence codes for different groups of helix and strand

All helices   Short helices   Medium helices   Long helices
Asp–Met       Ala–Arg         Arg–Met          Asp–Ala
Met–Trp       Ala–Lys         Arg–Trp          Asp–Met
Trp–Ile       Arg–Glu         Asp–Ala          Phe–Leu
Tyr–Leu       Asp–Gln         Asp–Trp          Trp–Gln
              Asp–Ile         Cys–Lys          Tyr–Leu
              Asp–Phe         Gln–Phe          Tyr–Met
              Asp–Val         Glu–Met
              Glu–Thr         His–Glu
              His–Ala         Leu–Asn
              Lys–Tyr         Phe–Met
              Phe–Glu         Thr–Met
              Pro–Ala         Trp–Leu
              Pro–Gln         Trp–Val
              Pro–Glu         Tyr–Leu
              Pro–Phe
              Pro–Ser
              Ser–Gln
              Ser–Leu

All entries in this table were selected from Table Ia–d according to the following criteria: (i) the dyad signatures had a propensity >1.30, whereas the propensity of their respective dyad pairs was <1.20, and (ii) the propensities of the dyad signatures and their dyad pairs differed by >0.3. Dyad pairs with an estimated standard deviation >0.3 were excluded from the table.

Discussion
The large population of secondary structural sequences derived in this study allowed us to analyze sequence patterns in different helix groups. Amino acid helix propensity values in the medium and the long helix groups were quite similar to those found in the total helix group. More significant variations were found in short helices, particularly for residues Trp and Pro, where both residues showed a significant increase in their frequency of occurrence (Figure 2). Such differential preference of certain amino acids in helices of variable sizes was found in other studies. Kumar and Bansal (Kumar and Bansal, 1996) have shown that long helices (helices with more than 25 residues) often had a higher content of residues with longer side chains than those in medium and short helices. It was suggested that amino acids with longer side chains could perhaps better facilitate complementary interactions with other elements of the protein structure (Kumar and Bansal, 1998). However, the analysis of our helix library failed to show a trend that the appearance of residues with bulkier side chains was more favored in longer helices. It should be noted that our current helix library contained only a limited number of helices longer than 25 residues (Figure 2). It would be interesting to revisit the helix propensities of amino acids in longer helices when a larger population of long helices (>25 residues) becomes available.

The differential propensity values of amino acids in different helix groups were also reflected in the neighbor-dependent sequence analysis. As a result, we found a number of sequence patterns that were unique to specific helix groups. For example, Trp had mostly comparable neighbor-dependent propensities with other amino acids for helices in the medium, long and total helix groups. On the other hand, in the short helix group, Trp was strongly favored for the helical conformation when it was positioned at the –1 positions of Glu, His, Lys and Ser. However, at the +1 positions of these residues, Trp had no preference for helical conformation (Table Ia–d). Although the estimated standard deviations for residue pairs containing Trp were higher than those of other residue pairs, the neighbor-dependent propensities for Trp–Glu, Trp–His, Trp–Lys and Trp–Ser were large enough to justify the statistical significance of the sequence patterns. This unsymmetrical neighbor preference was not limited to Trp; Pro also exhibited unusual sequence patterns in short helices. Although the individual propensity value of Pro in the short helix group ($e_P = 0.95$) was significantly higher than its propensity value in the medium and the long helix groups, it was nevertheless still below the threshold of random distribution (Figure 2). Our neighbor-dependent sequence analysis showed that Pro had high frequencies of occurrence at the –1 position of Ala, Gln, Glu, Phe, Ser and Trp, with propensity values of $e_P(A-1) = 1.52$, $e_P(Q-1) = 1.97$, $e_P(E-1) = 2.18$, $e_P(F-1) = 1.49$, $e_P(S-1) = 1.58$ and $e_P(W-1) = 1.90$, respectively. The occurrences of Pro at the +1 position of these amino acids in short helices, on the other hand, were significantly below that of the random distribution (Table Ib). Earlier studies had shown that Pro was one of the preferred residues at the Ncap position of the helices (the position before the first residue of the α-helix) (Richardson and Richardson, 1988; Harper and Rose, 1993; Kumar and Bansal, 1998), as well as at the first position of helices (the N1 position) (Penel et al., 1999). The finding of high occurrence of a Pro–X (X = Ala, Gln, Glu, Phe, Ser and Trp) pattern could potentially be a 'by-product' of Pro being favored at the N-

terminus of helices. On the other hand, while the propensities of the amino acids Asp and Leu were found to be among the highest at the second (the N2 position) and the third positions (the N3 position) of the helix (Penel et al., 1999; Cochran and Doig, 2001), the amino acid pairs Pro–Asp and Pro–Leu had relatively low neighbor-dependent propensity values in α-helices (Table Ia–d). It appeared that the neighbor-dependent effect played a role in determining sequence patterns in α-helices.

While it was difficult to provide a physical–chemical rationale for the sequence patterns discovered in this study, the neighbor dependency of amino acids favoring helical conformation appeared to be consistent with experimental findings. Recent studies have shown that an amino acid propensity value for a particular geometrical conformation is not independent of its environment. Sequence analysis of α-helices in proteins revealed that transitions from loop to helix conformation required the presence of a particular group of amino acids (Presta and Rose, 1988; Richardson and Richardson, 1988; Parker and Hefford, 1997). The amino acid composition at the ends of helices, where residues were often more hydrophilic in nature, was distinctly different from the composition in the middle of the helices, where they were often more hydrophobic in nature (Lacroix et al., 1998). These findings were also supported by experimental work that analyzed the helix propensity values of amino acids at different positions of synthetic peptides (Petukhov et al., 1998, 2002; Thomas et al., 2001).

One of the applications of the knowledge on the sequence–structure relationship of proteins is in predicting protein secondary structures. Protein secondary structure predictions based on statistical methods have been implemented in a number of computer algorithms. These include the rule-based Chou and Fasman method (Chou and Fasman, 1978), the GOR III method, which predicts the structural conformation of an amino acid in a protein sequence based on the statistical information of amino acids within a window surrounding that amino acid (Gilbrat et al., 1987), and the more recent application of the hidden Markov model (HMM), which derives a probability for assigning a structural conformation of an amino acid by inferring its neighboring information (Asai et al., 1993; Schmidler et al., 2000). These studies have shown that the accuracy of secondary structure prediction is improved as more sophisticated implementations of neighboring sequence information are applied. Implicitly, these studies suggest that neighboring residue type could play a role in affecting the propensity of an amino acid adopting a particular conformation. However, structure prediction algorithms are usually prediction-accuracy driven; little consideration has been placed on the nature of intermediate parameters derived from the training data set. As demonstrated in this study, the sequence patterns could differ significantly in secondary structures of variable lengths. It is conceivable that the prediction efficiency of these algorithms could be improved with the incorporation of knowledge learned in this study.

One of the most noticeable attributes of the neighbor-dependent analysis method was the increased statistical significance of the residue propensities. The individual propensity values, $e_a$, were in the range 0.67–1.51 (Figure 2), while the neighbor-dependent propensity values, $e_x(a \pm 1)$, were in a much greater range between 0.19 and 2.54. Because different scale factors were applied (see Materials and methods), the derived propensity values obviously could not be directly compared. Nevertheless, the statistical significance of both methods should be comparable, i.e. a propensity value of 1.0 represented random distribution of the amino acids in the PDB. The expanded statistical scale of the neighbor-dependent analysis enabled us to explore the hidden codes in the protein sequence. Specifically, we were able to identify dyad signatures (a–b) that were highly favorable for the helix conformation, whereas their dyad pairs (i.e. b–a) had little or no preference for their corresponding conformations. Table II lists some of the asymmetric dyads that had a high propensity for helix conformations. The dyads in Table II were selected from Table Ia–d according to the following criteria: (i) all entries had propensities >1.30, whereas the dyad pair of these entries was <1.20, and (ii) the propensity difference of the dyad pair was >0.3.

The existence of the dyad sequence patterns reflected the dependence of the helical preference of some amino acids on their neighbors. Such patterns were not always easily predictable. Short helices had by far the most diversified sequence patterns (Table II). The preference of amino acid dyads adopting helical conformation could not be easily rationalized. Through structural geometrical analysis, Penel et al. (Penel et al., 1999) suggested that neighbor-dependent sequence preference for adopting helical conformation could arise from side chain–main chain or side chain–side chain interactions between neighboring residues. The sequence preference for α-helix was not always predictable by assuming proportional contributions from the propensity values of the individual amino acids. For example, the occurrence of a Pro ($e_P = 0.55$) following a Gly ($e_G = 0.58$), i.e. the sequence Gly–Pro, was actually higher in helices than that of a Pro following a Gln (Gln–Pro), even though Gln had an individual helix propensity value of 1.22 (Table Ia). Similar examples were found in numerous combinations of amino acid pairs. The sequence patterns of loops were even more diversified than those of the helices discussed here (Crasto and Feng, 2001).

Residues in α-helices could have both sequence neighbors and spatial neighbors. Spatial neighbors in helices were residues separated by i ± 4 in sequence positions. An intriguing question was whether neighbor-dependent effects, such as those observed in the loop sequences, could exist between spatial neighboring residues. For example, side chains of the spatial neighbors could form favorable interactions, including hydrogen bonding, polar and hydrophobic interactions, thus providing stabilization energy for helical conformation. We examined this scenario by calculating the neighbor-dependent propensity between residues of i ± 4 in helices. Surprisingly, our results showed no significant correlations between chemical complementarity of spatial neighbors and their neighbor-dependent propensity in α-helix (J.Wang and J.-A.Feng, unpublished results). Spatial neighbors with chemical complementarities apparently did not have higher neighbor-dependent propensity than the spatial neighbors with no chemical complementarities.

The results of this study show that the sequence patterns for α-helices were more predictable than those for loops. A combination of residues with high individual propensity values for helix conformation would usually yield a high preference for adopting a helical conformation. On the other hand, when residues with moderate or low helix preferences were neighbored in sequence, unexpected sequence patterns emerged, particularly in short helices where numerous amino acid dyads were found.

Considering the differential amino acid composition in helices of different length, it was not surprising that the sequence patterns of α-helices were size dependent.
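
The selection criteria stated for Table II (signature propensity >1.30, reversed pair <1.20, difference >0.3, pairs with an estimated standard deviation >0.3 excluded) can be illustrated with a short sketch. This is an illustration under stated assumptions rather than the procedure actually used in the study. The function name `select_asymmetric_dyads` and the dictionary layout are hypothetical. The sample values are a handful of entries read from Table I(d), keyed as (row residue, column residue). The columns are assumed to follow the same alphabetical order as the rows, the parenthesized digits are assumed to be the sd in units of 0.01, and the sd cutoff is assumed to apply to either member of a pair.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]            # (row residue, column residue) in Table I
Entry = Tuple[float, float]       # (propensity, estimated sd)

# A few long-helix entries transcribed from Table I(d); column positions are
# an assumption (alphabetical order matching the row labels).
LONG_HELIX_SAMPLE: Dict[Pair, Entry] = {
    ("Asp", "Ala"): (1.35, 0.07), ("Ala", "Asp"): (1.00, 0.06),
    ("Phe", "Leu"): (1.36, 0.08), ("Leu", "Phe"): (1.01, 0.08),
    ("Trp", "Gln"): (1.48, 0.19), ("Gln", "Trp"): (1.09, 0.18),
    ("Met", "Trp"): (1.73, 0.36), ("Trp", "Met"): (1.12, 0.27),
    ("Glu", "Gln"): (1.83, 0.11), ("Gln", "Glu"): (1.56, 0.10),
}


def select_asymmetric_dyads(
    table: Dict[Pair, Entry],
    min_signature: float = 1.30,
    max_partner: float = 1.20,
    min_gap: float = 0.30,
    max_sd: float = 0.30,
) -> List[Pair]:
    """Return dyads that pass criteria (i) and (ii) and the sd filter."""
    selected = []
    for (row, col), (value, sd) in table.items():
        partner = table.get((col, row))
        if row == col or partner is None:
            continue  # a dyad needs a distinct reversed pair to compare with
        partner_value, partner_sd = partner
        # Assumption: a pair is dropped if either member has sd above the cutoff.
        if sd > max_sd or partner_sd > max_sd:
            continue
        if (value > min_signature
                and partner_value < max_partner
                and value - partner_value > min_gap):
            selected.append((row, col))
    return sorted(selected)


print(select_asymmetric_dyads(LONG_HELIX_SAMPLE))
```

On this sample the sketch keeps Asp–Ala, Phe–Leu and Trp–Gln, which appear in the long-helix column of Table II, and drops the Met/Trp pair because one of its estimated standard deviations exceeds 0.3.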

Acknowledgements
The authors would like to thank Jennifer Davis and other members of the Feng laboratory for helpful discussions. We also acknowledge the financial support from the National Institutes of Health (GM54630), the American Cancer Society (PRG9926301GMC) and an appropriation from the Commonwealth of Pennsylvania.

References
Anfinsen,C.B. (1973) Science, 181, 223–230.
Asai,K., Hayamizu,S. and Honda,K.I. (1993) Comput. Appl. Biosci., 9, 141–146.
Baldwin,R.L. and Rose,G.D. (1999a) Trends Biochem. Sci., 24, 26–33.
Baldwin,R.L. and Rose,G.D. (1999b) Trends Biochem. Sci., 24, 77–83.
Barlow,D.J. and Thornton,J.M. (1988) J. Mol. Biol., 201, 601–619.
Benner,S.A. and Gerloff,D. (1990) Adv. Enzyme Regul., 31, 121–181.
Bernstein,F.C. et al. (1977) J. Mol. Biol., 11, 535–542.
Chamberlain,A.K. and Marquesee,S. (1997) Structure, 5, 859–863.
Chou,P.Y. and Fasman,G.D. (1978) Annu. Rev. Biochem., 47, 251–276.
Cochran,D.A.E. and Doig,A.J. (2001) Protein Sci., 10, 1305–1311.


Colloc'h,N. and Cohen,F.E. (1991) J. Mol. Biol., 221, 603–613.
Crasto,C.J. and Feng,J.-A. (2001) Proteins: Struct. Funct. Genet., 42, 399–413.
Dill,K.A. (1990) Biochemistry, 29, 7133–7155.
Efimov,A.V. (1993) Prog. Biophys. Mol. Biol., 60, 201–239.
Gilbrat,J.-F., Garnier,J. and Robson,B. (1987) J. Mol. Biol., 198, 425–433.
Harper,E.T. and Rose,G.D. (1993) Biochemistry, 32, 7605–7609.
Honig,B. (1999) J. Mol. Biol., 293, 283–293.
Hughson,F.M., Barrick,D. and Baldwin,R.L. (1991) Biochemistry, 30, 4143–4148.
Hutchinson,E.G. and Thornton,J.M. (1994) Protein Sci., 3, 2207–2216.
Jennings,P.A. and Wright,P.E. (1993) Science, 262, 892–896.
Kabsch,W. and Sander,C. (1983) Biopolymers, 22, 2577–2637.
Kumar,S. and Bansal,M. (1996) Biophys. J., 71, 1574–1586.
Kumar,S. and Bansal,M. (1998) Proteins: Struct. Funct. Genet., 31, 460–476.
Lacroix,E., Viguera,A.R. and Serrano,L. (1998) J. Mol. Biol., 284, 173–191.
Olivea,O., Bates,B.A., Querol,E., Aviles,F.X. and Sternberg,M.J.T. (1997) J. Mol. Biol., 266, 814–830.
Parker,M.H. and Hefford,M.A. (1997) Protein Eng., 10, 487–496.
Penel,S., Hughes,E. and Doig,A.J. (1999) J. Mol. Biol., 287, 127–143.
Petukhov,M., Munoz,V., Yumoto,N., Yoshikawa,S. and Serrano,L. (1998) J. Mol. Biol., 278, 279–289.
Petukhov,M., Uegaki,K., Yumoto,N. and Serrano,L. (2002) Protein Sci., 11, 766–777.
Presta,L.G. and Rose,G.D. (1988) Science, 240, 1632–1641.
Richardson,J.S. and Richardson,D.C. (1988) Science, 240, 1648–1652.
Rooman,M.J., Rodriguez,J. and Wodak,S.J. (1990) J. Mol. Biol., 213, 327–336.
Schmidler,S.C., Liu,J.S. and Brutlag,D.L. (2000) J. Comput. Biol., 7, 233–248.
Thomas,S.T., Loladze,V.V. and Makhatadze,G.I. (2001) Proc. Natl Acad. Sci. USA, 98, 10670–10675.
Unger,R., Harel,D., Wherland,S. and Sussman,J.L. (1989) Proteins: Struct. Funct. Genet., 5, 355–375.
Wang,G. and Dunbrack,R.L. (2003) Bioinformatics, 12, 1589–1591.
Williams,R.W., Chang,A., Juretic,D. and Loughran,S. (1987) Biochim. Biophys. Acta, 916, 200–204.
Zhu,Z.-Y. and Blundell,T.L. (1996) J. Mol. Biol., 260, 261–276.

Received January 5, 2003; revised June 25, 2003; accepted September 4, 2003
