10 1093@protein@gzg101
799–807, 2003
DOI: 10.1093/protein/gzg101
Junwen Wang1,2 and Jin-An Feng1,2,3
1Department of Chemistry and 2Center for Biotechnology, Temple University, 1901 North 13th Street, Philadelphia, PA 19122, USA
3To whom correspondence should be addressed. E-mail: feng@astro.temple.edu

This paper reports an extensive sequence analysis of the α-helices of proteins. α-Helices were extracted from the Protein Data Bank (PDB) and were divided into groups according to their sizes. It was found that some amino acids had differential propensity values for adopting helical conformation in short, medium and long α-helices. [...] propensities. It was evident that the neighbor-dependent protein sequence analysis method could reveal 'hidden' sequence codes in proteins.

α-Helices are one of the most dominant structural elements in proteins. Extensive studies have been carried out focusing on α-helix folding and its sequence–structure relationship. Early studies by Chou and Fasman (Chou and Fasman, 1978) have established a statistical scale to evaluate the likelihood of amino acids adopting α-helix conformation. Residues with high propensities were termed strong helix conformers, and the residues with helix propensities slightly higher than random distribution were termed medium helix conformers. Amino acids having a frequency of occurrence in helices lower than that of the random distribution were regarded as weak helix conformers. Although no chemical–physical rationale could be easily derived for the preference of amino acids adopting helix conformation, such a statistical analysis has achieved limited success in assisting our understanding of the sequence–structure relationship of proteins, as well as in predicting protein secondary structures (Chou and Fasman, 1978). A recent study by Penel et al. (Penel et al., 1999) on the analysis [...]

[...] that the formation of secondary structures largely depended on the local amino acid sequence since the intermediate folding states presumably did not have the established tertiary contacts of the native structure.
Armed with this experimental evidence, computational efforts to identify relationships between the amino acid sequence and special local structural elements have been intensive. Most attempts to identify such relationships have proceeded by identifying a common structural motif, then characterizing the frequencies of occurrence of each amino acid at each position in that motif. One of the best examples of [...]

[...] structure assignments as follows: helices, 3₁₀ helices and the π-helices were all considered helices.
The database system used in this study was PostgreSQL packaged in Redhat Linux 7.2. The sequence and secondary structure information of every PDB entry were parsed into relational tables. Two sets of tables (one was parsed based on the author-assigned secondary structure information and the other was parsed according to the DSSP calculation) were compared. Considering that the authors' assignments were experimentally observed, we decided to use such assignments in the PDB as the standard for defining α-helices in the protein structures. However, manual errors were often encountered in the PDB. In order to avoid such issues, we applied a double-check mechanism where every structural element in the PDB was compared with the DSSP assignment. The α-helices that were agreed upon by both methods were selected from the PDB and placed in a helix library. The total number of helices extracted was 10 643, which constituted 96.2% of the total helices available in those PDB entries. The α-helices were grouped according to their sizes.
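The double-check and size-grouping steps described in the methods can be illustrated with a minimal Python sketch. It is not the authors' PostgreSQL implementation. It assumes that "agreed upon by both methods" means identical chain, start and end residue numbers, which the text does not specify, and it uses the short, medium and long length ranges given in the Results. The record layout `(pdb_id, chain, start, end)`, the function names and the entry `1abc` are hypothetical.

```python
from collections import defaultdict

# Hypothetical helix record: (pdb_id, chain, start, end), residue numbers inclusive.


def consensus_helices(author_helices, dssp_helices):
    """Keep author-assigned helices that DSSP also reports.

    Assumption: two assignments agree only when chain, start and end match
    exactly. The paper does not state its matching rule.
    """
    dssp = set(dssp_helices)
    return [helix for helix in author_helices if helix in dssp]


def length_group(length):
    """Map a helix length to the length groups defined in the Results."""
    if 4 <= length <= 7:
        return "short"
    if 8 <= length <= 13:
        return "medium"
    if 14 <= length <= 22:
        return "long"
    return None  # outside the 4 to 22 residue subset


def group_by_length(helices):
    """Bin helices into short, medium and long groups, dropping the rest."""
    groups = defaultdict(list)
    for pdb_id, chain, start, end in helices:
        label = length_group(end - start + 1)
        if label is not None:
            groups[label].append((pdb_id, chain, start, end))
    return dict(groups)


if __name__ == "__main__":
    # Made-up assignments for a fictitious entry "1abc".
    author = [("1abc", "A", 5, 10), ("1abc", "A", 20, 40), ("1abc", "A", 50, 52)]
    dssp = [("1abc", "A", 5, 10), ("1abc", "A", 21, 40), ("1abc", "A", 50, 52)]
    library = consensus_helices(author, dssp)
    print(group_by_length(library))
```

An overlap-based matching rule would retain more helices than exact matching; the exact-match rule is used here only because it is the simplest reading of the text.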
[...] proteins, suggesting that the pair has a preference for adopting helix conformation. $e_x(\alpha \pm 1)$ values lower than unity suggest less preference for the pair in helices. For example, $e_P(A-1) = 1.52$ in short helices means Pro has 50% more chance to be found in short helices than in the proteins when it precedes Ala, i.e. Pro at the −1 position of Ala; $e_P(A+1) = 0.47$ suggests Pro is less likely to be found in short helices when it follows Ala, i.e. Pro at the +1 position of Ala.

Results
Length distribution of α-helices in proteins
The lengths of α-helices in proteins varied between three and 77 residues with a total of 10 643 helices in the PDB. The population distribution of helices in the library had a mean helical length of approximately 12.1 residues (Figure 1), which was close to the mean helical length determined by Barlow and Thornton (Barlow and Thornton, 1988) that included 291 α-helices. There was a relatively large population of three-residue helices found in proteins. Those helices were most likely half-turn helices or irregular helices such as the 3₁₀ helices that were more frequently found in helices of less than four residues long (Barlow and Thornton, 1988). In contrast to the α-helix length distribution reported by Zhu and Blundell (Zhu and Blundell, 1996), where four-residue helices had twice the population size of the five-residue helices, our helix library contained more five-residue helices than four-residue helices by a ratio of 3:2 (Figure 1). There was a gradual decrease in the helix population as the helical length increased beyond 13 residues. Helices longer than 40 residues were rarely found in proteins. The longest helix in our helix library had 77 residues. In fact, there were only one or two examples of each helical length having more than 47 residues (except 50- and 51-residue helices, which had three examples of each).
In order to establish a data set that was representative of the general helix population, we chose a subset of our helix library containing only helices with lengths between four and 22 residues. This subset had a total population of 8771 helices. Owing to the uneven distribution of different lengths of the helices in the library, where the number of short helices far exceeded the number of medium and long helices (Figure 1), it was likely that the helix preference patterns described for the helix library would only reflect the characteristics of the short or medium α-helices. In order to address such potential bias, we divided the subset of helices into three groups: short helices (four to seven residues), medium helices (eight to 13 residues) and long helices (14 to 22 residues). Based on the selection criteria, all three groups had approximately equal population sizes. In the following discussions, propensities calculated from the entire subset of four- to 22-residue helices were termed total helix propensity; propensities calculated from the short, the medium and the long helical groups were termed short, medium and long helix propensities, respectively.

Propensities of amino acids in α-helices of different length groups
The helix propensities of amino acids in short helices appeared quite different from those of the medium and long helices. Of particular interest were Trp and Pro. Not known as a strong helix conformer, Trp had a significantly higher frequency of occurrence in short helices ($e_W = 1.51$) than in medium and long helices. In fact, Trp was the strongest helix conformer in short helices. In contrast with its presence in medium and long helices, Pro also had a significantly elevated frequency of occurrence in short helices ($e_P = 0.99$). A number of residues, including Asp, Cys, Phe, Ser and Tyr, also had higher helix propensities in short helices than their propensity values in medium and long helices, while in the same subgroup of helices, residues Ala, Arg, Gln, Ile, Leu, Lys, Met and Val had slightly lower propensity values. Particularly noticeable were residues Asp and Ala. Both residues were good helix conformers in short helices, while their helix propensities were quite different in the context of the overall helix population (Figure 2). The amino acid propensities in the
medium and the long helix subgroups were generally similar to those of the total helix group. It appeared that the helical composition of short helices was quite different from that of the medium and long helices.

Neighbor-dependent sequence analysis of α-helices
In an attempt to analyze how neighboring residues affect the α-helix conformations of amino acids, we calculated amino acid preferences at positions immediately preceding (−1) or following (+1) an α-helix residue (α). The neighbor-dependent helix propensities of 20 amino acids at the +1 and −1 positions of α-helix residues [$e_x(\alpha \pm 1)$] in different groups are tabulated in Table Ia–d. Propensity values >1.20 are in bold in the table for ease of inspection. Based on estimated standard deviations, most of the neighbor-dependent propensities had a level of confidence comparable to that of the individual amino acid propensities (Table Ia–d) (J.Wang and J.-A.Feng, unpublished results; Kumar and Bansal, 1996). The estimated standard deviations were slightly higher for neighbor-dependent propensities in the short helix group than for the other helix groups. This variation could in part be attributed to the small population size of residue pairs in the short helix group, which was less than one-third of the other groups.
The neighbor-dependent helix propensities of amino acids often reflected the individual helix propensities of neighboring residues. When two strong helix conformers were neighbored, their neighbor-dependent propensity was almost invariably high. By the same token, when two weak helix conformers were neighbored, their neighbor-dependent propensity was often low. Interesting patterns often occurred when a strong helix conformer was neighbored with a weak helix conformer, or two moderate helix conformers were neighbored.
Ala, Glu, Leu and Gln were stronger helix conformers. Neighbor-dependent propensity calculation showed that they had a strong influence on the preference of neighboring residues adopting helical conformation. Not surprisingly, amino acids with strong or medium individual helix propensity, including Ala, Arg, Gln, Glu, Ile, Leu, Lys, Met, Phe, Ser, Trp and Tyr, often exhibited a strong preference for adopting α-helix conformation when they were neighbored with Ala, Glu, Leu and Gln residues. On the other hand, the neighbor-dependent effect for stronger helix conformers positioned next to residues with low individual helix propensity was limited. Although the frequency of occurrence for residues with low individual helix propensities was generally increased when they were neighbored with strong helix conformers, most of the neighbor-dependent propensities were nevertheless <1.0 (Table Ia). Proportionally, medium helix conformers had less influence on the preference of neighboring residues adopting helix conformation than that of the strong helix conformers. Amino acids Arg, Lys, Met and Trp had high neighbor-dependent propensity when they were neighbored with each other (Table Ia). No neighbor-dependent effect was observed when they were positioned next to weak helix conformers.
Unique sequence patterns were observed in different helix groups for a number of amino acids, particularly in the short helix group. Asp had a high propensity for helix conformation when it was neighbored with Ala, Arg and Glu in short helices, while such a pattern was not observed in the medium and the long helix groups. In contrast, the pairings of Ala with Val and Ile, as well as the pairing of Arg with Leu, were less frequently found in short helices than in other helix groups (Table Ib–d). Similarly reduced neighbor-dependent propensity values were also found for Arg, Ile, Lys and Met when they were neighbored with Gln in short helices. Tyr, a weak helix conformer, had a strong neighbor-dependent helix propensity when it was positioned next to Met in short helices [$e_Y(M+1) = 1.91$, $e_Y(M-1) = 1.82$], while in other helix groups, Met had no
Table I. Normalized neighbor-dependent helix propensity of residues in various helix groups^a
Ala Arg Asn Asp Cys Gln Glu Gly His Ile Leu Lys Met Phe Pro Ser Thr Trp Tyr Val
(a)
Ala 1.74(4) 1.58(5) 1.04(5) 1.05(4) 1.33(10) 1.66(6) 1.68(4) 0.69(3) 1.13(7) 1.53(5) 1.84(4) 1.60(5) 1.72(7) 1.40(6) 0.42(3) 1.16(4) 0.97(4) 1.71(10) 1.35(6) 1.32(4)
Arg 1.57(5) 1.25(6) 0.98(6) 1.21(5) 1.02(11) 1.44(7) 1.63(5) 0.57(4) 1.16(9) 1.12(5) 1.38(5) 1.28(6) 1.43(10) 1.11(6) 0.31(4) 1.01(6) 0.90(6) 1.34(12) 1.12(7) 0.80(4)
Asn 1.17(5) 0.96(7) 0.51(5) 0.68(5) 0.55(9) 0.86(7) 1.13(6) 0.35(3) 0.86(9) 0.82(5) 0.95(5) 0.82(5) 1.05(10) 0.94(6) 0.42(4) 0.66(5) 0.69(5) 0.93(10) 0.77(6) 0.73(5)
Asp 1.35(4) 1.07(6) 0.63(5) 0.78(5) 0.83(9) 1.19(7) 1.23(5) 0.28(2) 0.99(8) 1.05(5) 1.17(4) 1.02(5) 1.32(9) 1.09(6) 0.50(4) 0.68(4) 0.85(5) 1.21(10) 1.12(6) 0.96(4)
Cys 1.07(9) 1.04(10) 0.78(11) 0.73(9) 0.73(15) 1.03(12) 1.13(10) 0.54(7) 0.94(13) 1.00(11) 1.3(9) 1.03(11) 1.02(19) 0.86(12) 0.21(5) 0.58(8) 0.68(9) 0.95(21) 0.56(10) 0.59(8)
Gln 1.69(5) 1.53(7) 0.91(7) 0.99(7) 0.95(13) 1.54(7) 1.51(6) 0.55(4) 1.16(10) 1.22(6) 1.64(5) 1.39(7) 1.52(11) 1.24(8) 0.29(4) 0.96(6) 0.92(6) 1.19(12) 1.27(8) 1.01(6)
Glu 1.81(4) 1.44(5) 1.01(5) 1.08(5) 1.08(11) 1.76(6) 1.65(4) 0.63(4) 1.16(8) 1.39(5) 1.61(4) 1.50(5) 1.62(8) 1.29(6) 0.35(4) 1.20(5) 1.19(5) 1.60(10) 1.2(6) 1.18(4)
Gly 0.68(3) 0.58(4) 0.31(3) 0.42(3) 0.56(7) 0.58(4) 0.60(4) 0.35(2) 0.53(5) 0.55(3) 0.71(3) 0.48(3) 0.71(6) 0.57(4) 0.45(4) 0.40(3) 0.42(3) 0.50(6) 0.49(4) 0.45(3)
His 1.17(8) 1.00(9) 0.74(9) 0.75(7) 0.60(11) 1.17(10) 1.33(9) 0.47(5) 0.62(7) 0.93(7) 1.17(6) 0.89(8) 1.10(14) 0.86(9) 0.32(5) 0.75(7) 0.82(8) 0.89(14) 0.92(10) 0.90(7)
Ile 1.49(5) 1.26(6) 0.85(5) 0.82(4) 0.86(10) 1.28(7) 1.33(5) 0.73(4) 0.94(8) 1.03(5) 1.28(5) 1.24(5) 1.43(10) 0.93(6) 0.40(4) 0.88(5) 0.68(4) 1.09(12) 0.87(6) 0.85(4)
Leu 1.74(4) 1.47(4) 1.08(5) 1.16(4) 1.19(9) 1.43(5) 1.64(4) 0.81(3) 1.25(7) 1.3(5) 1.55(4) 1.54(4) 1.55(8) 1.21(5) 0.40(3) 1.06(4) 1.04(4) 1.38(10) 1.05(5) 1.10(4)
Lys 1.61(5) 1.37(6) 0.93(5) 1.03(5) 0.87(10) 1.45(7) 1.60(5) 0.50(3) 0.99(8) 1.08(5) 1.40(4) 1.29(5) 1.40(9) 1.14(7) 0.35(4) 0.94(5) 0.96(5) 1.12(11) 1.18(6) 0.83(4)
Met 1.60(7) 1.27(9) 0.84(9) 1.00(8) 1.09(18) 1.49(11) 1.27(8) 0.70(7) 0.91(13) 1.22(9) 1.73(8) 1.50(8) 1.30(13) 1.21(11) 0.55(7) 0.95(7) 0.71(7) 1.57(21) 1.25(12) 1.08(8)
Phe 1.29(6) 1.14(7) 0.79(6) 0.84(5) 0.83(11) 1.12(8) 1.30(6) 0.68(5) 0.74(8) 0.97(6) 1.44(5) 1.09(6) 1.31(12) 0.99(8) 0.29(4) 0.76(5) 0.71(5) 1.28(13) 1.11(8) 0.89(5)
Pro 0.81(4) 0.66(6) 0.26(4) 0.52(4) 0.37(8) 0.93(6) 1.09(5) 0.28(3) 0.61(8) 0.59(5) 0.81(4) 0.54(5) 0.69(9) 0.64(6) 0.28(4) 0.57(4) 0.46(4) 0.59(9) 0.64(6) 0.44(4)
Ser 1.17(4) 1.00(5) 0.61(5) 0.76(4) 0.64(8) 1.07(6) 1.08(5) 0.39(3) 0.74(7) 1.04(5) 1.20(4) 0.97(5) 1.12(9) 0.96(6) 0.40(4) 0.65(4) 0.65(4) 1.00(9) 0.93(6) 0.82(4)
Thr 1.14(4) 0.99(6) 0.50(4) 0.53(4) 0.78(9) 1.07(7) 1.04(5) 0.49(3) 0.73(7) 0.76(5) 1.15(4) 0.95(5) 1.24(9) 0.80(5) 0.45(4) 0.65(4) 0.67(4) 0.83(9) 0.72(6) 0.73(4)
Trp 1.76(10) 1.37(11) 0.79(10) 0.96(9) 0.65(16) 1.42(12) 1.34(11) 0.76(8) 0.87(14) 1.40(12) 1.63(9) 1.16(11) 1.07(17) 1.21(13) 0.60(10) 0.88(9) 0.70(9) 1.21(21) 0.83(12) 1.04(10)
Tyr 1.31(6) 0.97(6) 0.75(6) 0.86(6) 0.81(11) 1.11(8) 1.17(7) 0.63(5) 0.75(9) 0.88(7) 1.46(6) 1.10(7) 1.36(12) 1.01(8) 0.46(5) 0.67(5) 0.56(5) 0.95(12) 0.96(8) 0.76(5)
Val 1.27(4) 1.12(5) 0.83(5) 0.79(4) 0.83(9) 1.20(6) 1.17(4) 0.57(3) 0.69(6) 0.72(4) 1.04(4) 1.12(5) 1.25(8) 0.85(5) 0.35(3) 0.70(4) 0.60(4) 0.71(9) 0.68(5) 0.66(3)
(b)
Ala 1.27(11) 1.38(17) 1.33(19) 1.47(16) 1.21(32) 1.25(18) 1.56(16) 0.76(10) 0.88(21) 0.82(13) 1.22(12) 1.53(17) 1.08(22) 1.31(19) 0.47(11) 1.34(16) 0.95(14) 1.48(34) 0.84(17) 0.82(11)
Arg 1.03(15) 0.96(18) 1.08(21) 1.47(20) 0.38(22) 1.0(21) 1.47(19) 0.72(14) 1.43(32) 0.62(13) 0.85(12) 1.34(21) 0.80(26) 1.26(22) 0.56(16) 1.15(19) 1.03(19) 1.38(43) 1.16(23) 0.56(12)
Asn 1.04(16) 1.05(22) 0.37(12) 0.75(17) 0.92(37) 0.71(19) 1.12(19) 0.37(9) 1.37(35) 0.79(16) 0.87(14) 0.67(16) 0.32(18) 0.91(21) 0.76(16) 0.94(18) 0.64(15) 1.43(40) 0.94(22) 0.78(15)
Asp 1.28(15) 1.4(21) 0.95(18) 1.12(17) 1.41(38) 1.41(25) 1.6(18) 0.46(9) 0.96(26) 1.39(18) 1.73(16) 1.34(18) 1.69(36) 1.44(21) 1.19(18) 1.07(17) 0.98(17) 2.15(43) 1.43(23) 1.34(16)
Cys 1.13(32) 1.12(35) 0.61(30) 1.30(38) 0.62(43) 0.76(33) 1.65(41) 1.00(27) 2.07(60) 0.72(29) 0.88(26) 1.00(35) 0.37(37) 1.21(45) 0.43(21) 1.14(32) 1.09(34) 2.54(109) 1.09(44) 0.54(24)
Gln 1.20(17) 1.07(22) 0.87(21) 1.00(21) 0.76(38) 1.17(23) 1.61(23) 0.8(16) 0.65(24) 0.89(19) 1.31(17) 1.08(20) 1.17(36) 1.21(26) 0.44(16) 0.80(18) 0.74(18) 1.13(39) 1.25(27) 0.96(18)
Glu 1.72(16) 1.07(16) 1.1(17) 1.56(19) 1.28(40) 1.66(24) 1.47(16) 0.70(12) 1.12(27) 1.58(18) 1.58(15) 1.34(16) 1.92(33) 1.19(20) 0.49(13) 1.5(20) 1.43(19) 2.54(45) 1.42(23) 1.19(15)
Gly 0.77(11) 0.96(15) 0.31(9) 0.63(11) 1.17(30) 0.66(15) 0.67(12) 0.44(8) 0.40(14) 0.50(10) 0.59(9) 0.57(10) 0.58(18) 0.66(13) 0.52(13) 0.61(11) 0.49(10) 0.70(23) 0.78(16) 0.41(8)
His 1.56(29) 0.72(24) 1.21(34) 0.66(22) 0.63(36) 1.23(33) 1.10(28) 0.59(17) 0.55(21) 0.77(22) 1.06(20) 0.86(25) 1.13(45) 0.90(28) 0.87(24) 0.61(20) 0.93(26) 0.95(47) 0.54(24) 0.98(23)
Ile 1.06(14) 1.26(19) 0.56(13) 0.76(13) 0.33(19) 0.75(18) 1.25(17) 0.71(13) 0.77(23) 0.70(14) 0.82(13) 0.80(14) 0.93(29) 0.96(21) 0.35(10) 0.82(14) 0.56(12) 1.53(45) 0.94(21) 0.24(8)
Leu 1.32(12) 0.98(13) 0.73(13) 1.32(14) 1.15(29) 1.17(17) 1.29(13) 0.78(11) 0.96(21) 0.94(14) 1.26(12) 1.14(13) 1.24(24) 1.54(21) 0.5(10) 0.97(12) 0.91(12) 1.50(38) 0.89(16) 0.79(11)
Lys 1.17(14) 1.26(20) 0.97(18) 1.22(17) 1.11(36) 1.15(22) 1.39(16) 0.53(11) 1.16(28) 0.80(14) 1.12(14) 1.10(15) 1.30(31) 1.16(22) 0.57(14) 1.07(17) 0.89(15) 1.34(40) 1.57(25) 0.68(12)
Met 0.83(20) 1.15(30) 0.37(18) 1.05(27) 1.05(60) 1.00(33) 1.24(28) 1.01(25) 0.19(19) 0.64(22) 0.95(21) 1.26(28) 0.90(36) 0.87(32) 0.71(25) 0.81(22) 0.62(21) 2.38(93) 1.91(49) 0.73(21)
Phe 1.15(19) 1.55(26) 0.93(20) 1.02(18) 1.96(55) 1.30(28) 1.68(24) 0.81(16) 0.45(20) 0.62(16) 1.42(19) 1.00(19) 1.17(38) 1.14(27) 0.39(14) 0.97(18) 0.93(18) 1.31(45) 0.98(24) 0.75(16)
Pro 1.52(18) 1.26(24) 0.63(16) 0.93(16) 0.85(34) 1.97(30) 2.18(22) 0.47(11) 1.10(31) 0.83(18) 1.28(17) 0.92(19) 0.92(32) 1.49(26) 0.78(18) 1.58(21) 0.93(17) 1.90(49) 1.16(25) 0.47(11)
Ser 1.19(15) 1.34(20) 0.88(17) 1.20(17) 1.34(35) 1.42(22) 1.53(19) 0.52(9) 0.89(23) 1.3(18) 1.56(16) 1.30(18) 1.17(30) 1.23(20) 0.57(14) 1.03(14) 0.44(11) 1.36(34) 1.08(21) 0.89(14)
Thr 0.73(12) 0.58(14) 0.55(14) 0.75(14) 1.09(32) 0.97(21) 0.73(14) 0.47(9) 0.81(23) 0.76(15) 0.98(12) 1.04(18) 1.10(30) 0.68(16) 0.5(12) 0.84(15) 0.70(14) 1.23(33) 1.20(23) 0.51(10)
Trp 2.27(43) 2.42(50) 0.72(29) 1.49(36) 1.20(68) 1.74(45) 2.3(48) 1.39(35) 1.94(66) 2.37(54) 1.88(35) 2.06(47) 0.31(31) 2.18(58) 0.69(34) 1.71(40) 0.88(31) 1.70(82) 1.26(47) 0.72(27)
Tyr 1.10(19) 0.80(19) 0.78(19) 1.17(21) 1.44(47) 0.78(21) 1.83(27) 1.07(19) 1.16(36) 0.48(16) 1.21(19) 1.07(22) 1.82(49) 1.06(25) 0.62(18) 0.97(19) 0.50(15) 1.44(47) 0.8(23) 0.51(14)
Val 0.90(12) 0.69(13) 0.76(15) 0.81(13) 0.66(25) 0.98(18) 0.95(13) 0.55(10) 0.54(18) 0.45(10) 0.79(11) 1.12(15) 0.93(25) 1.17(19) 0.34(9) 0.68(12) 0.47(10) 0.54(24) 0.64(16) 0.50(9)
Sequence analysis of α-helix in proteins
Ala Arg Asn Asp Cys Gln Glu Gly His Ile Leu Lys Met Phe Pro Ser Thr Trp Tyr Val
(c)
Ala 1.75(7) 1.49(10) 0.97(9) 0.97(8) 1.27(18) 1.83(12) 1.64(9) 0.66(6) 1.16(13) 1.52(9) 2.10(8) 1.60(9) 1.52(14) 1.59(11) 0.39(6) 1.16(8) 0.81(7) 1.72(20) 1.63(13) 1.31(8)
Arg 1.64(10) 1.15(11) 1.03(11) 1.08(10) 1.10(21) 1.41(14) 1.68(11) 0.51(7) 1.14(16) 1.13(10) 1.42(9) 1.20(11) 1.50(20) 0.88(11) 0.30(7) 0.93(10) 0.82(10) 1.51(25) 1.06(12) 0.68(7)
Asn 1.25(10) 0.90(12) 0.53(8) 0.67(9) 0.42(15) 0.82(12) 1.05(11) 0.24(4) 0.69(14) 0.81(9) 0.93(8) 0.80(10) 1.08(19) 0.95(12) 0.34(6) 0.59(8) 0.72(9) 0.90(18) 0.69(11) 0.77(8)
Asp 1.36(9) 0.93(10) 0.59(8) 0.76(8) 0.78(16) 1.13(13) 1.31(9) 0.22(4) 0.81(14) 1.05(9) 1.06(7) 0.97(9) 1.16(17) 1.20(11) 0.42(6) 0.59(7) 0.80(9) 1.41(20) 1.05(11) 1.03(8)
Cys 0.74(15) 1.04(19) 0.63(18) 0.77(17) 0.63(25) 1.14(23) 1.05(19) 0.47(11) 0.77(22) 0.91(19) 1.23(17) 1.42(23) 1.66(42) 0.71(20) 0.15(7) 0.56(13) 0.78(16) 0.50(17) 0.37(12)
Gln 1.57(11) 1.58(15) 0.85(12) 0.91(12) 0.78(22) 1.29(14) 1.41(12) 0.44(7) 1.21(19) 1.35(13) 1.66(11) 1.26(12) 1.28(21) 1.49(16) 0.36(8) 0.80(10) 0.88(11) 1.35(24) 1.16(15) 0.87(10)
Glu 1.76(9) 1.57(11) 0.88(9) 1.04(9) 1.01(20) 1.7(13) 1.93(10) 0.49(6) 0.92(14) 1.41(10) 1.72(8) 1.65(10) 1.58(17) 1.35(12) 0.40(7) 1.20(10) 1.12(9) 1.59(20) 1.32(13) 1.22(8)
Gly 0.69(6) 0.50(6) 0.36(6) 0.35(5) 0.43(11) 0.46(7) 0.57(6) 0.40(5) 0.50(9) 0.50(6) 0.72(6) 0.39(5) 0.83(12) 0.52(7) 0.41(7) 0.29(4) 0.34(5) 0.45(11) 0.45(7) 0.39(5)
His 0.91(13) 0.90(15) 0.55(13) 0.75(13) 0.87(24) 1.23(19) 1.31(17) 0.37(8) 0.59(12) 0.86(13) 1.21(12) 0.69(13) 0.77(22) 0.93(16) 0.25(8) 0.75(13) 0.74(13) 0.57(21) 0.88(17) 0.83(12)
Ile 1.55(9) 1.10(10) 0.98(10) 0.89(8) 0.79(17) 1.36(13) 1.32(10) 0.70(7) 0.86(14) 1.04(10) 1.57(10) 1.32(10) 1.44(20) 0.91(12) 0.46(7) 0.88(8) 0.77(8) 0.86(19) 0.74(11) 0.87(8)
Leu 1.92(8) 1.66(9) 1.32(10) 1.11(7) 1.21(17) 1.34(10) 1.74(9) 0.8(6) 1.17(13) 1.39(9) 1.67(7) 1.70(9) 1.55(15) 1.36(11) 0.47(5) 1.06(7) 1.06(8) 1.20(19) 1.05(10) 1.14(8)
Lys 1.65(9) 1.36(11) 0.97(10) 1.06(9) 1.02(20) 1.48(14) 1.67(10) 0.47(6) 1.12(16) 1.26(10) 1.44(9) 1.50(10) 1.29(17) 1.15(12) 0.33(6) 0.95(9) 0.89(9) 0.96(19) 1.21(12) 0.88(8)
Met 1.73(16) 1.18(17) 0.85(16) 0.98(15) 1.08(34) 1.22(20) 1.11(15) 0.50(10) 0.77(21) 1.23(17) 2.05(17) 1.42(16) 1.18(23) 1.14(21) 0.64(13) 0.81(13) 0.64(12) 1.09(36) 1.35(23) 1.09(14)
Phe 1.36(11) 0.90(12) 0.68(10) 0.90(10) 0.78(20) 1.08(14) 1.33(12) 0.71(8) 0.67(14) 1.1(12) 1.57(11) 1.37(13) 1.65(25) 1.06(15) 0.30(7) 0.67(8) 0.70(9) 1.29(25) 1.05(14) 0.96(10)
Pro 0.99(8) 0.69(10) 0.20(5) 0.55(7) 0.29(12) 0.74(11) 1.17(9) 0.27(5) 0.56(13) 0.78(10) 0.91(8) 0.56(9) 0.83(17) 0.56(9) 0.24(6) 0.49(7) 0.49(7) 0.60(16) 0.68(11) 0.49(6)
Ser 1.24(8) 0.81(9) 0.60(8) 0.63(7) 0.53(13) 0.96(11) 1.03(9) 0.36(5) 0.79(13) 1.08(9) 1.16(8) 0.87(9) 1.07(16) 0.88(10) 0.37(6) 0.56(6) 0.58(7) 1.09(17) 0.79(10) 0.80(8)
Thr 1.19(9) 1.00(11) 0.55(8) 0.51(7) 0.68(15) 0.79(11) 1.06(9) 0.41(5) 0.67(12) 0.61(8) 1.15(8) 0.94(10) 1.42(19) 0.77(10) 0.48(7) 0.64(7) 0.61(7) 0.68(14) 0.62(10) 0.76(7)
Trp 1.41(19) 0.94(18) 0.86(18) 0.80(15) 0.69(30) 1.24(22) 1.31(21) 0.60(13) 0.50(20) 1.22(22) 1.79(19) 1.14(20) 1.27(34) 1.38(26) 0.89(22) 0.93(17) 0.60(15) 1.74(46) 0.49(17) 1.33(20)
Tyr 1.28(11) 1.04(12) 0.75(11) 0.81(10) 0.88(21) 1.10(14) 1.12(12) 0.51(8) 0.36(12) 0.91(12) 1.43(11) 1.17(13) 1.15(22) 0.96(14) 0.51(9) 0.71(9) 0.67(10) 0.88(21) 0.84(13) 0.76(10)
Val 1.25(8) 1.18(9) 0.67(8) 0.82(7) 0.87(16) 1.22(11) 1.31(9) 0.51(6) 0.87(13) 0.74(7) 1.13(7) 1.19(9) 1.12(15) 0.92(10) 0.46(6) 0.66(7) 0.61(6) 0.63(15) 0.64(9) 0.71(6)
(d)
Ala 1.84(6) 1.69(8) 1.02(8) 1.00(6) 1.40(15) 1.64(9) 1.74(7) 0.69(5) 1.16(11) 1.7(8) 1.81(6) 1.63(8) 2.02(13) 1.29(9) 0.43(5) 1.11(7) 1.09(7) 1.75(16) 1.27(9) 1.45(7)
Arg 1.64(8) 1.38(10) 0.93(9) 1.23(9) 1.12(17) 1.58(12) 1.63(9) 0.58(6) 1.11(13) 1.24(9) 1.48(7) 1.33(9) 1.53(16) 1.24(10) 0.25(5) 1.03(8) 0.93(8) 1.20(18) 1.15(10) 0.94(7)
Asn 1.15(8) 0.98(10) 0.53(7) 0.67(7) 0.56(14) 0.92(10) 1.19(9) 0.43(5) 0.86(13) 0.84(8) 0.99(7) 0.87(8) 1.20(16) 0.94(10) 0.39(6) 0.64(7) 0.68(7) 0.84(15) 0.78(9) 0.69(7)
Asp 1.35(7) 1.09(9) 0.58(7) 0.71(7) 0.73(13) 1.18(11) 1.08(7) 0.28(4) 1.12(13) 0.98(7) 1.11(6) 0.98(7) 1.34(15) 0.92(8) 0.39(5) 0.66(6) 0.85(7) 0.84(13) 1.09(9) 0.83(6)
Cys 1.29(16) 1.03(16) 0.93(17) 0.57(12) 0.82(23) 1.02(18) 1.06(16) 0.48(9) 0.77(18) 1.14(17) 1.45(15) 0.76(14) 0.72(24) 0.88(18) 0.21(7) 0.46(10) 0.50(11) 1.23(36) 0.48(14) 0.76(13)
Gln 1.90(9) 1.61(12) 0.97(10) 1.04(10) 1.11(21) 1.81(13) 1.56(10) 0.57(6) 1.24(15) 1.22(10) 1.70(9) 1.55(11) 1.77(19) 1.08(12) 0.2(5) 1.11(10) 0.99(10) 1.09(18) 1.35(13) 1.12(9)
Glu 1.87(7) 1.44(8) 1.08(8) 1.00(7) 1.08(17) 1.83(11) 1.50(7) 0.71(6) 1.33(13) 1.33(8) 1.55(7) 1.43(7) 1.58(13) 1.28(10) 0.27(5) 1.12(8) 1.17(8) 1.37(15) 1.08(9) 1.16(7)
Gly 0.65(5) 0.54(5) 0.27(4) 0.43(4) 0.51(9) 0.64(7) 0.61(5) 0.29(3) 0.59(8) 0.59(5) 0.73(5) 0.52(5) 0.66(9) 0.59(6) 0.46(6) 0.43(4) 0.46(5) 0.49(9) 0.45(6) 0.50(4)
His 1.26(12) 1.13(14) 0.76(13) 0.78(11) 0.41(14) 1.12(15) 1.40(14) 0.52(8) 0.66(11) 1.02(12) 1.16(10) 1.04(13) 1.32(22) 0.81(12) 0.23(6) 0.79(11) 0.84(12) 1.10(23) 1.04(15) 0.94(11)
Ile 1.55(8) 1.37(9) 0.83(7) 0.78(6) 1.04(15) 1.35(11) 1.35(8) 0.75(6) 1.03(12) 1.10(8) 1.19(7) 1.28(8) 1.53(17) 0.94(10) 0.37(5) 0.90(7) 0.65(6) 1.15(18) 0.95(10) 0.99(7)
Leu 1.72(6) 1.46(7) 1.00(7) 1.16(6) 1.19(14) 1.56(9) 1.67(7) 0.82(5) 1.37(11) 1.32(7) 1.54(6) 1.53(7) 1.62(12) 1.01(8) 0.33(4) 1.08(6) 1.05(6) 1.48(17) 1.10(8) 1.14(6)
Lys 1.69(8) 1.40(9) 0.89(8) 0.96(7) 0.72(14) 1.51(11) 1.61(8) 0.51(5) 0.86(12) 1.02(8) 1.44(7) 1.19(7) 1.5(15) 1.13(10) 0.31(5) 0.91(7) 1.02(8) 1.18(17) 1.07(10) 0.83(6)
Met 1.68(12) 1.37(15) 0.94(14) 1.01(12) 1.10(28) 1.80(19) 1.38(13) 0.76(10) 1.18(21) 1.35(15) 1.69(13) 1.61(14) 1.48(21) 1.34(18) 0.45(9) 1.07(12) 0.77(11) 1.73(36) 1.02(17) 1.17(12)
Phe 1.28(9) 1.20(11) 0.83(9) 0.75(7) 0.59(15) 1.11(12) 1.20(9) 0.63(7) 0.87(13) 0.97(10) 1.36(8) 0.92(9) 1.10(17) 0.91(11) 0.26(5) 0.77(7) 0.67(7) 1.27(20) 1.18(12) 0.86(8)
Pro 0.52(5) 0.49(7) 0.20(5) 0.40(5) 0.31(10) 0.81(9) 0.77(6) 0.24(4) 0.53(10) 0.40(6) 0.62(6) 0.43(6) 0.53(12) 0.49(7) 0.20(4) 0.38(5) 0.33(5) 0.26(9) 0.49(8) 0.39(5)
Ser 1.11(7) 1.06(8) 0.54(7) 0.74(6) 0.56(11) 1.06(9) 1.01(7) 0.39(4) 0.68(10) 0.95(7) 1.14(6) 0.95(7) 1.14(13) 0.95(8) 0.37(5) 0.63(5) 0.75(7) 0.86(13) 0.99(9) 0.81(6)
Thr 1.20(7) 1.08(9) 0.45(6) 0.5(6) 0.77(13) 1.29(11) 1.10(8) 0.55(5) 0.75(10) 0.86(7) 1.19(6) 0.93(8) 1.14(14) 0.85(8) 0.41(5) 0.62(6) 0.70(6) 0.85(13) 0.67(8) 0.75(6)
Trp 1.88(17) 1.41(17) 0.75(14) 0.95(14) 0.49(21) 1.48(19) 1.14(16) 0.72(12) 0.88(21) 1.30(18) 1.46(14) 0.97(15) 1.12(27) 0.85(17) 0.38(12) 0.63(12) 0.72(13) 0.72(26) 0.96(19) 0.92(14)
Tyr 1.38(10) 0.95(10) 0.74(9) 0.82(8) 0.62(15) 1.20(12) 1.05(10) 0.60(7) 0.92(15) 0.96(10) 1.54(9) 1.05(10) 1.39(20) 1.03(12) 0.38(7) 0.57(7) 0.51(7) 0.89(17) 1.08(12) 0.82(8)
Val 1.38(6) 1.19(8) 0.96(8) 0.76(6) 0.84(13) 1.23(9) 1.12(7) 0.62(5) 0.60(9) 0.78(6) 1.02(6) 1.08(7) 1.42(14) 0.73(7) 0.27(4) 0.72(6) 0.61(5) 0.81(14) 0.72(8) 0.67(5)
^a Amino acids in the columns precede residues in the rows. Propensity values >1.20 are in bold for ease of inspection. Parts (a)–(d) list normalized propensity values derived from analysis of (a) helix library, (b) short helices, (c) medium helices and (d) long helices. The standard deviations were estimated according to the formula derived by Williams et al. (Williams et al., 1987): $\mathrm{sd} = [f_{ij,k}(1 - f_{ij,k})\,n_{ij}]^{1/2}\,N/n_{ij}$, where $f_{ij,k}$ was the frequency of occurrence of the residue pair $i, j$ in the helix group $k$, $n_{ij}$ was the total number of $i, j$ pairs and $N$ was the total number of residue pairs in our PDB library. The total number of residue pairs in the short helix group was 11 101, in the medium helix group was 32 415, in the long helix group was 45 917 and in the total helix group was 89 433. The total number of residue pairs in our PDB library was 338 875. All propensity values were represented with two decimal places in order to keep all records on a consistent basis. The residue pair Cys–Trp was not present in the medium helix group.
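A normalized pair propensity of the kind tabulated in Table I can be sketched in Python as follows. The sketch assumes the propensity is the frequency of a residue pair within a helix group divided by its frequency in the whole library, which is consistent with the statement that a value of 1.0 represents random distribution but is not spelled out as a formula in the text. The group totals are taken from the footnote; the function name and the pair counts in the example are hypothetical.

```python
# Residue-pair totals quoted in the Table I footnote.
PAIRS_IN_GROUP = {"short": 11101, "medium": 32415, "long": 45917}
PAIRS_IN_TOTAL_HELIX_GROUP = 89433
PAIRS_IN_LIBRARY = 338875


def pair_propensity(pair_count_in_group, pairs_in_group,
                    pair_count_in_library, pairs_in_library=PAIRS_IN_LIBRARY):
    """Ratio of a pair's frequency in a helix group to its library frequency.

    Assumed definition: a value of 1.0 means the pair is as common in the
    helix group as in the library overall; values above 1.0 mean enrichment.
    """
    frequency_in_group = pair_count_in_group / pairs_in_group
    frequency_in_library = pair_count_in_library / pairs_in_library
    return frequency_in_group / frequency_in_library


if __name__ == "__main__":
    # The three length groups add up to the total helix group.
    print(sum(PAIRS_IN_GROUP.values()) == PAIRS_IN_TOTAL_HELIX_GROUP)

    # Made-up counts: a pair seen 100 times in short helices and
    # 2000 times in the whole library.
    value = pair_propensity(100, PAIRS_IN_GROUP["short"], 2000)
    print(round(value, 2))
```

With these invented counts the pair is enriched in short helices by roughly one half, which is the kind of value the text interprets as "50% more chance".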
Table II. Dyad sequence codes for different groups of helix and strand

All helices     Short helices   Medium helices  Long helices
Asp–Met         Ala–Arg         Arg–Met         Asp–Ala
Met–Trp         Ala–Lys         Arg–Trp         Asp–Met
Trp–Ile         Arg–Glu         Asp–Ala         Phe–Leu
Tyr–Leu         Asp–Gln         Asp–Trp         Trp–Gln
                Asp–Ile         Cys–Lys         Tyr–Leu
                Asp–Phe         Gln–Phe         Tyr–Met
                Asp–Val         Glu–Met
                Glu–Thr         His–Glu
                His–Ala         Leu–Asn
                Lys–Tyr         Phe–Met
                Phe–Glu         Thr–Met
                Pro–Ala         Trp–Leu
                Pro–Gln         Trp–Val
                Pro–Glu         Tyr–Leu
                Pro–Phe
                Pro–Ser
                Ser–Gln
                Ser–Leu

All entries in this table were selected from Table Ia–d according to the following criteria: (i) the dyad signatures had a propensity >1.30, whereas [...]

Discussion
The large population of secondary structural sequences derived in this study allowed us to analyze sequence patterns in different helix groups. Amino acid helix propensity values in the medium and the long helix groups were quite similar to those found in the total helix group. More significant variations were found in short helices, particularly for residues Trp and Pro where both residues showed a significant increase in their frequency of occurrence (Figure 2). Such differential preference of certain amino acids in helices of variable sizes was found in other studies. Kumar and Bansal (Kumar and Bansal, 1996) have shown that long helices (helices with more than 25 residues) often had a higher content of residues with longer side chains than those in medium and short helices. It was suggested that amino acids with longer side chains could perhaps better facilitate complementary interactions with other elements of the protein structure (Kumar and Bansal, 1998). However, the analysis of our helix library failed to show a trend that the appearance of residues with bulkier side chains was more favored in longer helices. It should be noted that our [...]
[...] terminus of helices. On the other hand, while the propensities of amino acids, Asp and Leu, were found to be among the highest at the second (the N2 position) and the third positions (the N3 position) of the helix (Penel et al., 1999; Cochran and Doig, 2001), the amino acid pairs, Pro–Asp and Pro–Leu, had relatively low neighbor-dependent propensity values in α-helices (Table Ia–d). It appeared that the neighbor-dependent effect played a role in determining sequence patterns in α-helices.
While it was difficult to provide a physical–chemical rationale for the sequence patterns discovered in this study, the neighbor dependency of amino acids favoring helical conformation appeared to be consistent with experimental findings. Recent studies have shown that an amino acid propensity value for a particular geometrical conformation is not independent of its environment. Sequence analysis of α-helices in proteins revealed that transitions from loop to helix conformation required the presence of a particular group of amino acids (Presta and Rose, 1988; Richardson and Richardson, 1988; Parker and Hefford, 1997). The amino acid composition at the ends of helices, where they were often more [...]

[...] be directly compared. Nevertheless, the statistical significance of both methods should be comparable, i.e. a propensity value of 1.0 represented random distribution of the amino acids in the PDB. The expanded statistical scale of the neighbor-dependent analysis enabled us to explore the hidden codes in the protein sequence. Specifically, we were able to identify dyad signatures (a–b) that were highly favorable for the helix conformation, whereas their dyad pairs (i.e. b–a) had little or no preference for their corresponding conformations. Table II lists some of the asymmetric dyads that had a high propensity for helix conformations. The dyads in Table II were selected from Table Ia–d according to the following criteria: (i) all entries had propensities >1.30, whereas the dyad pair of these entries was <1.2, and (ii) the propensity difference of the dyad pair was >0.3.
The existence of the dyad sequence patterns reflected the dependence of the helical preference of some amino acids on their neighbors. Such patterns were not always easily predictable. Short helices had by far the most diversified sequence patterns (Table II). The preference of amino acid dyads adopting helical conformation could not be easily rationalized.
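The two selection criteria stated for Table II (a dyad propensity above 1.30 with its reversed pair below 1.2, and a difference above 0.3) can be expressed as a short Python sketch. It is an illustration and not the authors' code. The function name and the dictionary layout, keyed by `(first residue, second residue)`, are assumptions. The four example values are the short-helix propensities quoted in the text for Pro and Ala and for Tyr and Met.

```python
def asymmetric_dyads(propensity, high=1.30, low=1.20, min_difference=0.30):
    """Return dyads (a, b) that are favored while the reverse (b, a) is not.

    `propensity` maps (first_residue, second_residue) to a normalized
    neighbor-dependent propensity. Pairs without a reverse entry are skipped.
    """
    selected = []
    for (first, second), forward in propensity.items():
        if first == second:
            continue  # a homodyad cannot be asymmetric
        reverse = propensity.get((second, first))
        if reverse is None:
            continue
        if forward > high and reverse < low and forward - reverse > min_difference:
            selected.append((first, second))
    return sorted(selected)


# Short-helix values quoted in the text: Pro preceding Ala (1.52), Pro
# following Ala (0.47), Tyr following Met (1.91), Tyr preceding Met (1.82).
SHORT_HELIX_EXAMPLE = {
    ("Pro", "Ala"): 1.52,
    ("Ala", "Pro"): 0.47,
    ("Met", "Tyr"): 1.91,
    ("Tyr", "Met"): 1.82,
}

if __name__ == "__main__":
    print(asymmetric_dyads(SHORT_HELIX_EXAMPLE))
```

In this example only Pro–Ala passes both criteria, because both orientations of the Tyr and Met pair are high. Pro–Ala appears in the short-helix column of Table II, and neither Tyr–Met nor Met–Tyr does.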
Acknowledgements
The authors would like to thank Jennifer Davis and other members of the Feng laboratory for helpful discussions. We also acknowledge the financial support from the National Institutes of Health (GM54630), the American Cancer Society (PRG9926301GMC) and an appropriation from the Commonwealth of Pennsylvania.
References
Anfinsen,C.B. (1973) Science, 181, 223–230.
Asai,K., Hayamizu,S. and Honda,K.I. (1993) Comput. Appl. Biosci., 9, 141–146.
Baldwin,R.L. and Rose,G.D. (1999a) Trends Biochem. Sci., 24, 26–33.
Baldwin,R.L. and Rose,G.D. (1999b) Trends Biochem. Sci., 24, 77–83.
Barlow,D.J. and Thornton,J.M. (1988) J. Mol. Biol., 201, 601–619.
Benner,S.A. and Gerloff,D. (1990) Adv. Enzyme Regul., 31, 121–181.
Bernstein,F.C. et al. (1977) J. Mol. Biol., 112, 535–542.
Chamberlain,A.K. and Marqusee,S. (1997) Structure, 5, 859–863.
Chou,P.Y. and Fasman,G.D. (1978) Annu. Rev. Biochem., 47, 251–276.
Cochran,D.A.E. and Doig,A.J. (2001) Protein Sci., 10, 1305–1311.
Received January 5, 2003; revised June 25, 2003; accepted September 4, 2003