Protein NMR Spectroscopy in Structural Genomics: Progress
Protein NMR spectroscopy provides an important complement to X-ray crystallography for structural
genomics, both for determining three-dimensional protein structures and in characterizing their biochemical
and biophysical functions.
1Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers University, Piscataway, New Jersey 08854-5638, USA.
2Department of Chemistry, State University of New York at Buffalo, Buffalo, New York 14260, USA. Correspondence should be addressed to G.T.M.
email: guy@cabm.rutgers.edu
© 2000 Nature America Inc. • http://structbio.nature.com
Structural genomics involves the determination, analysis, and dissemination of the three-dimensional structures of all protein and RNA molecules in nature, providing new opportunities at the interface of structural biology, functional genomics, and bioinformatics. This very ambitious goal requires both large-scale structure determination and amplification of these data by high-throughput modeling. It is generally recognized that X-ray crystallography using synchrotron radiation, and multiwavelength anomalous dispersion (MAD) methods1 for determining the phase information required for crystallographic analysis, will play a central role in genomic-scale structural analysis (see the articles by Stevens and colleagues, and Lamzin and Perrakis). Solution state NMR will also have a complementary role in post-genomic analysis, particularly considering that (i) many protein targets do not provide crystals suitable for crystallographic analysis; (ii) some 15–20% of new protein structures are determined by NMR methods; and (iii) sequence-specific resonance assignments provide the basis for various kinds of functional characterization.
Strengths and weaknesses of NMR in structural genomics
Several features of solution-state NMR make it particularly suitable for structure-function analysis and structural genomics. Structural analysis by NMR does not require protein crystals. Most (∼75%) of the NMR structures in the Protein Data Bank (PDB) do not have corresponding crystal structures, and many of these simply do not provide diffraction quality crystals. Moreover, NMR studies can be carried out in aqueous solution under conditions quite similar to the physiological conditions under which the protein normally functions. This feature allows comparisons to be made between subtly different solution conditions that may modulate structure-function relationships. For example, pH titration data can be used to determine pKa values of specific ionizable groups in the protein and to characterize the corresponding structure-function relationships. While most crystal structures are determined under physiologically relevant conditions, in many cases somewhat exotic solution conditions are required for crystallization.
The accuracy of protein structures determined by NMR is very dependent on the extent and quality of data that can be obtained. The highest quality NMR structures have accuracies comparable to 2.0–2.5 Å X-ray crystal structures2. Although atomic positions in high-resolution crystal structures are more precisely determined than in the corresponding NMR structures, the crystallization process may select for a subset of conformers present under solution conditions. For example, while high-quality NMR structures typically exhibit root mean square (r.m.s.) deviations of backbone and heavy atoms (excluding those of surface side chains) of 0.3–0.6 Å and 0.5–0.8 Å, respectively, analysis of a set of high-resolution X-ray crystal structures of bovine pancreatic trypsin inhibitor determined in different crystal forms3 indicates similar variations of 0.2–0.6 Å in backbone atom positions due to preferential selection of distinct low energy conformers in the crystallization process.
Fig. 1 Comparison of 15N-1H correlation spectra for disordered and well-folded proteins. a, Spectrum of Drosophila melanogaster Par 1 C-terminal domain, a domain construct that is predominantly disordered under the conditions of these measurements (K.G. and G.T.M., unpublished results). b, Spectrum of Thermus thermophilus variant of COG272 protein, a target with well-defined three-dimensional structure in aqueous solution (B. Dixon, S. Anderson, and G.T.M., unpublished results).
NMR has special value in structural genomics efforts for rapidly characterizing the ‘foldedness’ of specific protein or RNA constructs. The dispersion and lineshapes of resonances measured in 1D 1H-NMR and 2D 15N-1H or 13C-1H correlation spectra provide ‘foldedness’ criteria with which to define constructs and solution conditions that provide folded protein samples (Fig. 1). As the required isotopic enrichment with 15N is relatively inexpensive, and the 2D 15N-1H correlation spectra can be recorded in
tens of minutes with conventional NMR systems, it is quite feasible to use such data as a ‘foldedness’ screen in a high throughput sample preparation pipeline. Moreover, there may be correlations between such ‘foldedness’ criteria and crystallizability, so that data from a high throughput NMR screen might directly support efforts to generate samples for crystallographic analysis.
Protein backbone chemical shift assignments are obtained at the initial stage of a structure determination (see Box 1), and can often be generated in a fully automated fashion4. These data provide experimental determination of locations of secondary structural elements5,6, which is more reliable than that provided by secondary structure prediction algorithms. This knowledge is tremendously enabling for fold prediction algorithms. Such fold predictions form the basis for functional predictions7 and can also be used for prioritizing targets for further experimental structure analysis.
NMR also provides a powerful tool for downstream characterization of structure-function relationships, a critical component of the process of structure-based functional genomics8. Chemical shift perturbation provides an important tool for validating proposed biochemical functions, screening for small molecule ligands, mapping ligand binding epitopes, and drug development9. Moreover, it is generally appreciated that the thermodynamics and mechanisms of molecular function depend on changes in internal dynamics, which can be characterized using nuclear relaxation measurements10.
Although significant progress has been made in determining resonance assignments and low resolution structures of larger systems11,12, standard methods for high resolution structure analysis by NMR are limited to proteins with molecular weights less than 25–30 kDa. The size distribution of ORFs in some genomes is shown in Fig. 2. Even though many of these ORFs code for oligomeric proteins, proteins that are folded only in the presence of binding partners, or integral membrane proteins, we estimate that at least 25% of yeast ORFs will be suitable for NMR structure determination with current methodologies. In higher eukaryotic genomes, this fraction of small ORFs is somewhat lower. Nonetheless, there are thousands of full-length ORF targets that will be suitable for NMR structure determination.
Large multidomain proteins are generally not suitable for NMR analysis. However, these can also exhibit interdomain flexibility, which can complicate or prevent crystallization. Fortunately, many of these larger proteins are composed of structural domains13–15, with an average size of ∼175 amino acids. Indeed, much of the structural information available for such larger proteins comes from X-ray and NMR studies of isolated domains. In this regard, both experimental and theoretical methods for parsing large multidomain proteins into autonomously folding domain segments are critical to the general aims of structural genomics.
NMR is particularly valuable in structural genomics for analyzing protein structures that are outside the scope of crystallographic studies. Included in the classes of proteins that do not form crystals suitable for crystallographic analysis are those that are partially unfolded in the absence of binding partners, as well as some membrane-associated proteins that can be studied in micelle environments using solution-state NMR. Solid state NMR methods can also provide structural information for some integral membrane proteins that may not be accessible by crystallographic methods.
NMR spectroscopy is relatively insensitive, which severely limits experimental design. Typically samples at ∼1 mM protein …
… energy force fields, which in essence reflect our current knowledge about conformational preferences of proteins.
Fig. 2 Distribution of predicted open reading frame (ORF) lengths in the genomes of Escherichia coli (blue), Saccharomyces cerevisiae (red), Caenorhabditis elegans (yellow), and Drosophila melanogaster (green). Assuming monomeric structures, the length cut-off for routine NMR studies is ∼300 amino acids (dotted vertical line).
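As a rough illustration of how the length cut-off in Fig. 2 translates into a count of candidate targets, the following Python sketch computes the fraction of ORFs at or below ∼300 residues and converts that length to an approximate molecular mass. The ORF lengths listed are made-up placeholder values rather than genome data, the function names are hypothetical, and the average residue mass of 110 Da is a common rule of thumb assumed here, not a figure from this article.

```python
# Illustrative sketch only: the ORF lengths below are placeholder numbers,
# not values taken from any genome or from Fig. 2.

AVG_RESIDUE_MASS_DA = 110.0  # assumed rule-of-thumb average residue mass
NMR_LENGTH_CUTOFF = 300      # approximate length cut-off quoted in the Fig. 2 caption


def approx_mass_kda(n_residues):
    """Rough molecular mass (kDa) of a monomeric protein of n_residues."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


def fraction_within_cutoff(orf_lengths, cutoff=NMR_LENGTH_CUTOFF):
    """Fraction of ORFs whose predicted length is at or below the cut-off."""
    lengths = list(orf_lengths)
    if not lengths:
        raise ValueError("orf_lengths must not be empty")
    return sum(1 for n in lengths if n <= cutoff) / len(lengths)


if __name__ == "__main__":
    hypothetical_orf_lengths = [95, 150, 210, 300, 420, 780, 1250, 88]
    fraction = fraction_within_cutoff(hypothetical_orf_lengths)
    print(f"Fraction of example ORFs within cut-off: {fraction:.3f}")
    print(f"Approximate mass at cut-off: {approx_mass_kda(NMR_LENGTH_CUTOFF):.0f} kDa")
```

With this rule of thumb, 300 residues corresponds to roughly 33 kDa, slightly above the 25–30 kDa limit quoted in the text, so the cut-off should be read as approximate. The fraction also ignores oligomerization, dependence on binding partners, and membrane association, all of which the text notes will reduce the number of suitable targets.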
frequencies) have long been recognized as a potential source for structural refinement. In particular, 13Cα and 13Cβ shifts offer a robust means to map the secondary structure and to derive backbone dihedral angle constraints at an early stage of the structure determination5,6. They are obtained during the resonance assignment process, and are thus of outstanding value for efficient high-throughput efforts. Third, detection of through-hydrogen bond scalar couplings24 affords valuable unambiguous constraints for characterizing hydrogen-bonded networks, although the small size of these couplings may restrict this to smaller proteins.
Automated data analysis
Another important area of development involves automated analysis of NMR data. It has been recognized for some time that many of the interactive tasks carried out by an expert in the process of spectral analysis could, in principle, be carried out more efficiently and rapidly by computational systems. Recent developments provide automated analysis of NMR assignments and three-dimensional structures of proteins ranging from ∼50 to 200 amino acids4,18. When good quality data are available, automated analysis of protein NMR data can be very rapid. Many of the available resonance assignment programs execute in tens of seconds4,18, and automated structure refinements are being carried out in tens of minutes using arrays of processors for coarse-grain parallel calculations (Fig. 3). However, while progress over the last few years is encouraging, more work is required, even for small proteins, before automated structural analysis is routine. In particular, general methods for automated analysis of side chain resonance assignments are not yet well developed, and there are as yet no examples of completely automated protein structure determinations. Moreover, little work has focused on the specific problems associated with nucleic acid structure determinations.
Pilot projects using NMR for structural genomics
In view of these technological advances and the unique opportunities presented by the genomic sequence data, several research groups and consortia have initiated pilot projects using NMR in structural genomics (Table 1). The scales of these efforts range from the effort at Rutgers University in the USA funded by the New Jersey Commission on Science and Technology, which focuses primarily on technology development, to the RIKEN Genome Sciences Center in Japan, which is in the process of installing some twenty high field NMR spectrometers to be used largely for high throughput structural genomics. Also particularly noteworthy is the structural genomics pilot project organized by researchers at the University of Toronto in Canada, in which isotope-enriched samples of proteins encoded by the genome of Methanobacterium thermoautotrophicum have been distributed to several NMR groups for parallel data collection and structure analysis, resulting in some dozen three-dimensional structures over the last year.
Table 1 Web sites related to the use of NMR in structural genomics
Center or Consortium | URL
BioMagResBank | www.bmrb.wisc.edu
Harvard Structural Genomics of Cancer Initiative, USA | sbweb.med.harvard.edu/~sgc/
New Jersey Commission on Science and Technology Initiative in Structural Genomics, USA | www-nmr.cabm.rutgers.edu/structuralgenomics
Northeast Structural Genomics Consortium, USA | www.nesg.org
Protein Structure Factory, Germany | userpage.chemie.fu-berlin.de/~psf/
Riken Genome Sciences Center, Tokyo, Japan | www.gsc.riken.go.jp
Toronto Structural Proteomics Project, Canada | nmr.oci.utoronto.ca/arrowsmith/proteomics
Conclusions
Protein NMR provides structural and biophysical information that is complementary to X-ray crystallography, and these two methods will play synergistic roles in postgenomic analysis and structural genomics. Indeed, NMR is already playing key roles in several of the established pilot projects. The primary challenges to NMR for high throughput applications are the necessarily long time periods for data collection and the laborious expert reasoning needed for data analysis. Recent advances in probe design, data collection strategies, and software engineering demonstrate the potential for higher throughput data collection and automated structure analysis.
Acknowledgments
We thank S. Anderson for useful discussions. The NMR data for FGF were provided by R. Powers and F. Moy (Wyeth Ayerst Research Laboratories). G.T.M. is supported by grants from the New Jersey Commission on Science and Technology, The National Science Foundation, and the Merck Genome Research Institute. K.C.G. is supported by a Postdoctoral Fellowship Award from the NIH.
Associations with structural genomics
G.T.M. is Director of the New Jersey Commission on Science and Technology Initiative in Structural Genomics and Bioinformatics.
1. Hendrickson, W. Science 254, 51–58 (1995).
2. Billeter, M. Q. Rev. Biophys. 25, 325–377 (1992).
3. Kossiakoff, A.A., Randal, M., Guenot, M. & Eigenbrot, C. Proteins Struct. Funct. Genet. 14, 65–74 (1992).
4. Moseley, H.N.B. & Montelione, G.T. Curr. Opin. Struct. Biol. 9, 635–642 (1999).
5. Wishart, D.S. & Sykes, B.D. J. Biomol. NMR 4, 171–180 (1994).
6. Cornilescu, G., Delaglio, F. & Bax, A. J. Biomol. NMR 13, 289–302 (1999).
7. Fetrow, J.S. & Skolnick, J. J. Mol. Biol. 281, 949–968 (1998).
8. Montelione, G.T. & Anderson, S. Nature Struct. Biol. 11–12 (1999).
9. Shuker, S.B., Hajduk, P.J., Meadows, R.P. & Fesik, S.W. Science 274, 1531–1534 (1996).
10. Palmer, A.G., Williams, J. & McDermott, A. J. Phys. Chem. 100, 13293–13310 (1996).
11. Wüthrich, K. Nature Struct. Biol. 5, 492–495 (1998).
12. Gardner, K.H. & Kay, L.E. Annu. Rev. Biophys. Biomol. Struct. 27, 357–406 (1998).
13. Murzin, A.G., Brenner, S.E., Hubbard, T. & Chothia, C. J. Mol. Biol. 247, 536–540 (1995).
14. Holm, L. & Sander, C. Science 273, 595–602 (1996).
15. Orengo, C.A., et al. Structure 5, 1093–1108 (1997).
16. Pervushin, K., Riek, R., Wider, G. & Wüthrich, K. Proc. Natl. Acad. Sci. USA 94, 12366–12371 (1997).
17. Wider, G. & Wüthrich, K. Curr. Opin. Struct. Biol. 9, 594–601 (1999).
18. Montelione, G.T., Rios, C.B., Swapna, G.V.T. & Zimmerman, D.E. In Biological magnetic resonance (eds Krishna, R. & Berliner, L.) 81–130 (Kluwer Academic/Plenum Publishers, New York; 1999).
19. Szyperski, T., Wider, G., Bushweller, J.H. & Wüthrich, K. J. Am. Chem. Soc. 115, 9307–9308 (1993).
20. Szyperski, T., Banecki, B., Braun, D. & Glaser, R.W. J. Biomol. NMR 11, 387–405 (1998).
21. Tjandra, N. & Bax, A. Science 278, 1111–1114 (1997).
22. Hansen, M.R., Mueller, L. & Pardi, A. Nature Struct. Biol. 5, 1065–1074 (1998).
23. Prestegard, J.H. Nature Struct. Biol. 5, 517–522 (1998).
24. Cordier, F. & Grzesiek, S. J. Am. Chem. Soc. 121, 1601–1602 (1999).