Protein Modeling by Multiple Sequence Threading and Distance Geometry
1:38–42 (1997)
ABSTRACT
The application of homology
modeling is often limited by the lack of known
structures with sufficiently high sequence similarity to the target protein. The recent development of threading methods now enables the
identification of likely folding patterns in a
number of cases where the structural relatedness between target and template(s) is not
detectable at the sequence level. We devised a
hybrid method in which fold recognition was
performed using the Multiple Sequence
Threading (MST) method. The structural
equivalences deduced from the threading output were used to guide the distance geometry
program DRAGON in the construction of low-resolution Cα/Cβ models. The initial structures
were converted to full-atom representation and
refined using the general-purpose molecular
modeling package QUANTA. The performance
of the approach is illustrated on the CASP2
target T0004 (polyribonucleotide nucleotidyltransferase S1 motif (PNS1) from Escherichia
coli, PDB code: 1SRO) for which no obvious
homologues with known structure were available. The correct fold of PNS1 was successfully
identified, and the model was found to be more
similar to the experimental PNS1 structure than
the scaffold (Cα RMSD of 6.2 Å compared with 6.4 Å).
Our results indicate that a sensitive fold recognition algorithm coupled with a distance geometry program capable of rapidly generating initial structures can successfully complement high-resolution homology modeling methods in cases
where sequential similarity is low. Proteins,
Suppl. 1:38–42, 1997. © 1998 Wiley-Liss, Inc.
Key words: distance geometry; homology modeling; fold recognition; protein
structure prediction
INTRODUCTION
Homology modeling is a technique whereby the
three-dimensional (3D) conformation of a target
protein is deduced from the known structures of
other proteins (the templates) by using sequential
similarities between the target and the templates to
establish structural equivalences. The approach is
based on the observation that structural features are
conserved during evolution to a larger extent than
vent a dislocated threading across two protein domains, a modified version of MST was used to thread
just one domain of given size. Three hits with the
best packing scores and secondary structure alignments were selected from the MST results as follows
(PDB codes in parentheses): porcine E. coli heat-labile enterotoxin chain D (1LTSD), RNase H domain from HIV-1 reverse transcriptase chain A
(1HRHA), and staphylococcal nuclease (2SNS). All
these candidate templates were small β + α structures: 1LTSD is a classic OB-fold,16 2SNS is a more
elaborate OB-fold, while 1HRHA is a ribonuclease
RT domain. Figure 1A illustrates the threading on
one of these structures (1LTSD).
Fold Generation
Model building was performed by the distance
geometry program DRAGON. Structural equivalences between the unknown target structure and
the scaffold proteins with known structures were
described by mapping distance restraints between
Cα atoms onto the model through alignments constructed from the MST threading output. The alignments contained the target sequence and one of the
three candidate template sequences only. The threading-based structural information was complemented
by additional restraints derived from secondary structure prediction. For each of the three scaffold structures, 50 models were generated by using Cα:Cα
distances shorter than 10 Å to guide the folding
process. A representative model based on 1LTS chain
D is shown in Figure 1B.
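The restraint derivation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DRAGON's actual input procedure: the function names (`aligned_pairs`, `ca_restraints`), the gapped-string alignment format, and the coordinate layout are hypothetical. Only the 10 Å Cα:Cα cutoff is taken from the text.

```python
import math


def aligned_pairs(target_aln, template_aln):
    """Map target residue index -> template residue index for every
    alignment column in which neither sequence has a gap ('-')."""
    mapping = {}
    t_idx = s_idx = 0
    for t_char, s_char in zip(target_aln, template_aln):
        if t_char != "-" and s_char != "-":
            mapping[t_idx] = s_idx
        if t_char != "-":
            t_idx += 1
        if s_char != "-":
            s_idx += 1
    return mapping


def ca_restraints(target_aln, template_aln, template_ca, cutoff=10.0):
    """Return (i, j, distance) restraints for target residue pairs whose
    template equivalents have C-alpha atoms closer than `cutoff` (in Å).

    template_ca: list of (x, y, z) tuples, one per template residue
    (an assumed layout chosen for this sketch).
    """
    mapping = aligned_pairs(target_aln, template_aln)
    residues = sorted(mapping)
    restraints = []
    for pos, i in enumerate(residues):
        for j in residues[pos + 1:]:
            d = math.dist(template_ca[mapping[i]], template_ca[mapping[j]])
            if d < cutoff:
                restraints.append((i, j, d))
    return restraints


# Toy example: four template residues placed 3.8 Å apart on a line.
template_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
restraints = ca_restraints("AC-DE", "A-GDE", template_ca)
```

Target residues that fall opposite a gap in the template receive no threading-derived restraint in this sketch, which is one reason the text supplements the restraints with secondary structure prediction.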
Model Building and Refinement
The 10 best-scoring DRAGON output structures
were averaged for each scaffold. The missing atoms
were added to the Cα average structures and the
resulting full-atom structures were minimized by
QUANTA version 4.1/CHARMM 23.1.17 Initial geometry regularization was followed by an in vacuo
simulated annealing at T = 1000 K and a second MD
run with an 8-Å-thick solvation layer at T = 300 K.
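The selection and averaging step at the start of this section might look like the following Python sketch. It rests on assumptions the text does not confirm: that a higher score means a better model, that all models are already expressed in a common coordinate frame, and that each model is a list of Cα positions. The function name `average_best_models` is hypothetical.

```python
def average_best_models(models, n_best=10):
    """Average the C-alpha coordinates of the n_best best-scoring models.

    models: list of (score, coords) pairs, where coords is a list of
    (x, y, z) tuples with one entry per residue.
    Assumptions (not from the source): higher score = better model, and
    all models share one coordinate frame, so that plain coordinate
    averaging is meaningful.
    """
    if not models:
        raise ValueError("no models supplied")
    best = sorted(models, key=lambda m: m[0], reverse=True)[:n_best]
    n_res = len(best[0][1])
    average = []
    for r in range(n_res):
        positions = [coords[r] for _, coords in best]
        average.append(
            tuple(sum(p[k] for p in positions) / len(positions) for k in range(3))
        )
    return average


# Toy example: three one-residue models, keep the best two.
toy_models = [
    (1.0, [(100.0, 100.0, 100.0)]),
    (3.0, [(0.0, 0.0, 0.0)]),
    (2.0, [(2.0, 2.0, 2.0)]),
]
toy_average = average_best_models(toy_models, n_best=2)
```

Averaging Cartesian coordinates tends to distort local geometry, which is consistent with the geometry regularization applied afterwards.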
Comparison
Once the NMR coordinates were obtained from the
CASP2 organizers, we superposed our models onto
the experimental structure to assess how well the
folds matched (Fig. 1D). The template structures
were also compared to the experimental structure,
although this could not be done by a straightforward
rigid-body superposition as the template sequences
were different from that of PNS1. A modification of
the SSAP algorithm18 was used to generate optimal
correspondences between atoms for the superposition.
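Because the template sequences differ from that of PNS1, an RMSD can only be computed over an explicit list of equivalent residues. The Python sketch below shows that final step only. It does not implement SSAP or an optimal superposition: it assumes the two coordinate sets have already been superposed and that the residue correspondences are given. The name `ca_rmsd` and the data layout are hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b, pairs):
    """Root-mean-square deviation over equivalenced C-alpha atoms.

    coords_a, coords_b: lists of (x, y, z) tuples, assumed to be already
    superposed in a common frame.
    pairs: list of (i, j) index pairs stating that residue i of structure
    A is equivalent to residue j of structure B.
    """
    if not pairs:
        raise ValueError("at least one equivalent residue pair is required")
    squared = sum(math.dist(coords_a[i], coords_b[j]) ** 2 for i, j in pairs)
    return math.sqrt(squared / len(pairs))


# Toy example: residue 0 of B has no equivalent in A and is skipped.
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
structure_b = [(9.0, 9.0, 9.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
toy_rmsd = ca_rmsd(structure_a, structure_b, [(0, 1), (1, 2)])
```

The resulting value depends on which pairs are included, so RMSD figures computed over different equivalence lists are not directly comparable.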
RESULTS AND DISCUSSION
We identified three potential folds for the CASP2
target T0004 (polyribonucleotide nucleotidyltransfer-
A. ASZÓDI ET AL.
Fig. 1. Modeling target T0004 on the 1LTSD chain. A: Threaded structure of T0004 on 1LTSD
where blue = deletions, white = inserts, and red = hydrophobic. B: Raw DRAGON model of T0004
based on the structure of 1LTSD. C: Experimental NMR structure of PNS1. D: Superposition of the
DRAGON model and the NMR structure (the N terminus is color-coded blue).
Template   MST score   MatchMaker score (kT)   CHARMM energy (kcal/mol)   Cα RMSD (Å)
1LTSD      -743        -0.12                   -4417                      6.2
1HRHA      -707        -0.06                   -4547                      10.8
2SNS       -966        -0.14                   -4403                      11.0
quality structural information available in comparative modeling, where the target and the templates
are closely related both sequentially and structurally. Most participants at the CASP2 meeting agreed
that model quality depends very much on the quality
and quantity of external structural information supplied to the prediction algorithms. Second, it seems
to be difficult to choose the appropriate level of
resolution. While in our case the low-resolution
Cα:Cβ model building by distance geometry appeared to be justified on grounds of efficiency and
lack of detailed experimental information, perhaps
the method would have performed better if another
refinement at intermediate resolution had been carried out before the full-atom modeling to improve the
main-chain geometry. Finally, although the choice of
detailed potential functions and sophisticated energy minimization/refinement methods is important for the last stage of full-atom refinement, these
cannot compensate for gross errors (such as misaligned residues in homology modeling) made earlier
in the modeling process. Possible improvements to
our approach should therefore include a careful
choice of low-resolution interaction potentials and
improved gap modeling,20 followed by refinement to
facilitate the full exploitation of available structural
information.
CONCLUSION
Combining threading with distance geometry can
be a useful way to construct a model for a protein. If a
sequence has no known structural homologues, then
it can be threaded to predict a likely scaffold on
which to base the model. This approach has several
advantages over a pure ab initio prediction, where a
fold is constructed by using just secondary structure
information, as threading will provide hints to the
possible tertiary structure of the target as well.
Although we only describe one example in detail,
several points can be taken from the CASP2 experiment with respect to our methods. The distance
geometry program DRAGON performed best on problems like T0004, where possible template structures
could be identified from fold recognition performed
with the high-sensitivity MST method and a low-resolution model chain representation was adequate. DRAGON cannot be expected to replicate
structures with high accuracy based on close sequence similarity, as it uses only Cα atoms and
therefore discards main-chain geometry details that
may be important at the higher level of detail. We
plan to develop the tandem MST/DRAGON approach into a protein modeling system that performs
well under conditions where accurate structural
information is not available, thereby complementing
high-resolution comparative modeling methods.
ACKNOWLEDGMENTS