
© 2000 Nature America Inc. • http://structbio.nature.com

progress

Protein NMR spectroscopy in structural genomics


Gaetano T. Montelione1, Deyou Zheng1, Yuanpeng J. Huang1, Kristin C. Gunsalus1 and Thomas Szyperski2

Protein NMR spectroscopy provides an important complement to X-ray crystallography for structural
genomics, both for determining three-dimensional protein structures and in characterizing their biochemical
and biophysical functions.

Structural genomics involves the determination, analysis, and dissemination of the three-dimensional structures of all protein and RNA molecules in nature, providing new opportunities at the interface of structural biology, functional genomics, and bioinformatics. This very ambitious goal requires both large-scale structure determination and amplification of these data by high-throughput modeling. It is generally recognized that X-ray crystallography using synchrotron radiation, and multiwavelength anomalous dispersion (MAD) methods1 for determining the phase information required for crystallographic analysis, will play a central role in genomic-scale structural analysis (see the articles by Stevens and colleagues, and Lamzin and Perrakis). Solution state NMR will also have a complementary role in post-genomic analysis, particularly considering that (i) many protein targets do not provide crystals suitable for crystallographic analysis; (ii) some 15–20% of new protein structures are determined by NMR methods; and (iii) sequence-specific resonance assignments provide the basis for various kinds of functional characterization.

Strengths and weaknesses of NMR in structural genomics

Several features of solution-state NMR make it particularly suitable for structure-function analysis and structural genomics. Structural analysis by NMR does not require protein crystals. Most (∼75%) of the NMR structures in the Protein Data Bank (PDB) do not have corresponding crystal structures, and many of these simply do not provide diffraction quality crystals. Moreover, NMR studies can be carried out in aqueous solution under conditions quite similar to the physiological conditions under which the protein normally functions. This feature allows comparisons to be made between subtly different solution conditions that may modulate structure-function relationships. For example, pH titration data can be used to determine pKa values of specific ionizable groups in the protein and to characterize the corresponding structure-function relationships. While most crystal structures are determined under physiologically relevant conditions, in many cases somewhat exotic solution conditions are required for crystallization.

The accuracy of protein structures determined by NMR is very dependent on the extent and quality of data that can be obtained. The highest quality NMR structures have accuracies comparable to 2.0–2.5 Å X-ray crystal structures2. Although atomic positions in high-resolution crystal structures are more precisely determined than in the corresponding NMR structures, the crystallization process may select for a subset of conformers present under solution conditions. For example, while high-quality NMR structures typically exhibit root mean square (r.m.s.) deviations of backbone and heavy atoms (excluding those of surface side chains) of 0.3–0.6 Å and 0.5–0.8 Å, respectively, analysis of a set of high-resolution X-ray crystal structures of bovine pancreatic trypsin inhibitor determined in different crystal forms3 indicates similar variations of 0.2–0.6 Å in backbone atom positions due to preferential selection of distinct low energy conformers in the crystallization process.

Fig. 1 Comparison of 15N-1H correlation spectra for disordered and well-folded proteins. a, Spectrum of Drosophila melanogaster Par 1 C-terminal domain, a domain construct that is predominantly disordered under the conditions of these measurements (K.G. and G.T.M., unpublished results). b, Spectrum of Thermus thermophilus variant of COG272 protein, a target with well-defined three-dimensional structure in aqueous solution (B. Dixon, S. Anderson, and G.T.M., unpublished results).

NMR has special value in structural genomics efforts for rapidly characterizing the ‘foldedness’ of specific protein or RNA constructs. The dispersion and lineshapes of resonances measured in 1D 1H-NMR and 2D 15N-1H or 13C-1H correlation spectra provide ‘foldedness’ criteria with which to define constructs and solution conditions that provide folded protein samples (Fig. 1). As the required isotopic enrichment with 15N is relatively inexpensive, and the 2D 15N-1H correlation spectra can be recorded in

1Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers University, Piscataway, New Jersey 08854-5638, USA.
2Department of Chemistry, State University of New York at Buffalo, Buffalo, New York 14260, USA. Correspondence should be addressed to G.T.M.
email: guy@cabm.rutgers.edu

982 nature structural biology • structural genomics supplement • november 2000



Box 1 Protein structure determination by NMR


The determination of an NMR solution structure may be dissected into six major parts. (i) At
the outset of the NMR study, a suitable sample, usually ∼500 µL of a 1 mM protein solution, is
prepared. If the molecular weight of a protein exceeds ∼10 kDa, enrichment with 13C and 15N
isotopes is required in order to resolve spectral overlap in 1H-NMR spectroscopy. Due to the
availability of high-yield over-expression systems, stable isotope labeling has become
routine. (ii) Subsequently, this sample is used to record a set of multidimensional NMR
experiments, typically at temperatures around 30 °C, which provide, after suitable data
processing, the NMR spectra. (iii) These allow determination of (nearly) complete sequential
NMR assignments (the measurement of resonance frequencies (chemical shifts) of the NMR-
active spins in the protein). (iv) The resulting conformation-dependent dispersion of the
chemical shifts is a prerequisite for deriving experimental constraints from various NMR
experiments (such as NOE, scalar coupling, and dipolar coupling data) for the NMR structure
calculation. The circular arrows between steps (iv) and (v) indicate that the analysis of
structural constraints and the calculation of NMR structures is generally pursued in an iterative fashion. (v) Iterations involving structure
calculations and identification of new constraints are carried out until the overwhelming majority of experimentally derived constraints is
in agreement with a bundle of protein conformations representing the NMR solution structure. Conformational variations in the bundle of
structures reflect the precision of the NMR structure determination. (vi) Finally, the NMR structure can be refined using conformational
energy force fields, which in essence reflect our current knowledge about conformational preferences of proteins.
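
Step (v) of Box 1 ties the precision of an NMR structure to the conformational variation within the bundle, which is commonly summarized as the average r.m.s. deviation of the models from their mean coordinates. The following is a minimal sketch of that calculation, not the procedure used by any particular structure-calculation program. It assumes that the models have already been superimposed on a common frame and list the same atoms in the same order; the function names and the toy coordinates are hypothetical.

```python
import math

def mean_structure(bundle):
    """Average each atom's (x, y, z) coordinates over all models in the bundle."""
    n_models = len(bundle)
    n_atoms = len(bundle[0])
    return [
        tuple(sum(model[i][k] for model in bundle) / n_models for k in range(3))
        for i in range(n_atoms)
    ]

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length coordinate lists."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    squared = sum(
        (a[k] - b[k]) ** 2
        for a, b in zip(coords_a, coords_b)
        for k in range(3)
    )
    return math.sqrt(squared / len(coords_a))

def bundle_precision(bundle):
    """Average r.m.s. deviation of the models from the mean coordinates."""
    mean = mean_structure(bundle)
    return sum(rmsd(model, mean) for model in bundle) / len(bundle)

# Hypothetical two-model, two-atom bundle (coordinates in Å), assumed to be
# superimposed already; real bundles would hold backbone atoms of ~10-20 models.
toy_bundle = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.6), (1.0, 0.0, 0.6)],
]
precision = bundle_precision(toy_bundle)
```

In practice the superposition step matters as much as the averaging: the models are usually fitted on the well-ordered core residues first, so that disordered termini do not inflate the reported value.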

tens of minutes with conventional NMR systems, it is quite feasible to use such data as a ‘foldedness’ screen in a high throughput sample preparation pipeline. Moreover, there may be correlations between such ‘foldedness’ criteria and crystallizability, so that data from a high throughput NMR screen might directly support efforts to generate samples for crystallographic analysis.

Protein backbone chemical shift assignments are obtained at the initial stage of a structure determination (see Box 1), and can often be generated in a fully automated fashion4. These data provide experimental determination of locations of secondary structural elements5,6, which is more reliable than that provided by secondary structure prediction algorithms. This knowledge is tremendously enabling for fold prediction algorithms. Such fold predictions form the basis for functional predictions7 and can also be used for prioritizing targets for further experimental structure analysis.

NMR also provides a powerful tool for downstream characterization of structure-function relationships, a critical component of the process of structure-based functional genomics8. Chemical shift perturbation provides an important tool for validating proposed biochemical functions, screening for small molecule ligands, mapping ligand binding epitopes, and drug development9. Moreover, it is generally appreciated that the thermodynamics and mechanisms of molecular function depend on changes in internal dynamics, which can be characterized using nuclear relaxation measurements10.

Although significant progress has been made in determining resonance assignments and low resolution structures of larger systems11,12, standard methods for high resolution structure analysis by NMR are limited to proteins with molecular weights less than 25–30 kDa. The size distribution of ORFs in some genomes is shown in Fig. 2. Even though many of these ORFs code for oligomeric proteins, proteins that are folded only in the presence of binding partners, or integral membrane proteins, we estimate that at least 25% of yeast ORFs will be suitable for NMR structure determination with current methodologies. In higher eukaryotic genomes, this fraction of small ORFs is somewhat lower. Nonetheless, there are thousands of full-length ORF targets that will be suitable for NMR structure determination.

Fig. 2 Distribution of predicted open reading frame (ORF) lengths in the genomes of Escherichia coli (blue), Saccharomyces cerevisiae (red), Caenorhabditis elegans (yellow), and Drosophila melanogaster (green). Assuming monomeric structures, the length cut-off for routine NMR studies is ∼300 amino acids (dotted vertical line).

Large multidomain proteins are generally not suitable for NMR analysis. However, these can also exhibit interdomain flexibility, which can complicate or prevent crystallization. Fortunately, many of these larger proteins are composed of structural domains13–15, with an average size of ∼175 amino acids. Indeed, much of the structural information available for such larger proteins comes from X-ray and NMR studies of isolated domains. In this regard, both experimental and theoretical methods for parsing large multidomain proteins into autonomously folding domain segments are critical to the general aims of structural genomics.

NMR is particularly valuable in structural genomics for analyzing protein structures that are outside the scope of crystallographic studies. Included in the classes of proteins that do not form crystals suitable for crystallographic analysis are those that are partially unfolded in the absence of binding partners, as well as some membrane-associated proteins that can be studied in micelle environments using solution-state NMR. Solid state NMR methods can also provide structural information for some integral membrane proteins that may not be accessible by crystallographic methods.

NMR spectroscopy is relatively insensitive, which severely limits experimental design. Typically samples at ∼1 mM protein concentration are required, preventing studies of proteins with very low solubilities. Because of constraints on pulse sequence design arising from these sensitivity limitations, several different NMR spectra recorded over a four to six week period are necessary to obtain the information needed for a high-quality structure determination. These long data collection periods, in turn, put significant constraints on sample stability. Although multiple samples can be used in the structure determination process, each one must be stable for days to weeks with respect to precipitation, aggregation, and other forms of degradation. Manual analysis of these multiple NMR data sets is laborious and requires significant expertise. Another important limitation of NMR analysis is that the density of constraints is sometimes inadequate for accurate structural analysis. In particular, general methods for cross validation analogous to a free R-factor, a statistical measurement used in crystallographic studies to evaluate how well a structural model fits the diffraction data, are not yet available.
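
The ∼1 mM concentration requirement translates into milligram quantities of protein per sample, which is part of why low solubility and limited stability are such practical obstacles. The sketch below works through that arithmetic for the ∼500 µL sample volume quoted in Box 1. The average residue mass of 0.110 kDa is a common rule-of-thumb value assumed here for illustration, not a figure taken from this article, and the function names are hypothetical.

```python
# Assumed rule-of-thumb average mass per amino acid residue (kDa); not from the article.
AVERAGE_RESIDUE_MASS_KDA = 0.110

def approx_mass_kda(n_residues):
    """Rough molecular mass of a protein from its length in residues."""
    return n_residues * AVERAGE_RESIDUE_MASS_KDA

def protein_needed_mg(mass_kda, conc_mm=1.0, volume_ul=500.0):
    """Milligrams of protein needed to fill one NMR sample.

    mM x µL gives nanomoles; nanomoles x kDa gives micrograms.
    """
    nanomoles = conc_mm * volume_ul
    micrograms = nanomoles * mass_kda
    return micrograms / 1000.0

# A domain of average size (~175 residues, as quoted in the text).
domain_mass = approx_mass_kda(175)             # about 19 kDa
domain_sample = protein_needed_mg(domain_mass)  # just under 10 mg per sample
```

Because several samples with different isotope-labeling schemes may be needed for one structure, the total protein requirement can be a multiple of this per-sample figure.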
Recent technological advances

The reduction of the data collection time required for a structure determination is a major challenge for NMR-based structural genomics. Technological advances enhancing sensitivity, such as the construction of new high-field magnets, are of keen interest. The sensitivity of the acquired NMR data depends critically on the performance of the NMR probe, a sophisticated electronic device used to detect NMR signals. In the near future, the introduction of cryogenic probes is expected to have a significant impact. Radiofrequency (RF) coils constitute the heart of these probes, and their sensitivity is limited by the thermal noise associated with the coil’s temperature. Cryogenic probes utilize RF coils cooled to ∼25 K, and the resulting sensitivity enhancement reduces instrument time requirements by factors that range from 4 to 16 (the time needed to reach a given signal-to-noise ratio falls with the square of the sensitivity gain). Another key advance involves partial deuteration12, providing samples that can be studied with improved signal-to-noise ratios that result from their sharper linewidths and longer transverse relaxation times. The combination of partial deuteration and cryogenic probes can provide a factor of 10 or more reduction in the requisite data collection times. These technologies provide the basis for high throughput NMR, and are particularly valuable for samples exhibiting limited stabilities and/or low solubilities. A novel spectroscopic concept named TROSY (transverse relaxation optimized spectroscopy), based on selection of slowly relaxing NMR transitions, also can provide significant sensitivity enhancement for large proteins11,16,17 and may become a prerequisite to extend structural genomics by NMR into the 30–50 kDa molecular weight range.

Fig. 3 Results of automatic analysis of protein structures from NMR data. Comparison of backbone structures of basic fibroblast growth factor (FGF) determined by a, manual analysis of NMR data (PDB code 1bld), b, automated analysis of the same NMR data using the program AutoStructure (Y.J.H., R. Tejero and G.T.M., unpublished) or c, X-ray crystallography (PDB code 1bas). Only residues 28–152 are shown, as the N-terminal segment is not well-ordered in either the X-ray or solution NMR structure, and a few C-terminal residues are not defined in the X-ray crystal structure. The average root mean square (r.m.s.) deviation of backbone atom positions between the AutoStructure and manually-determined NMR structures is 0.6 Å. d, Superposition of 10 NMR structures computed with AutoStructure. The average r.m.s. deviation of core backbone atoms relative to the mean coordinates is 0.3 Å.

NMR structure determinations rely on the nearly complete assignment of chemical shifts, which are obtained using multidimensional 13C,15N,1H-triple resonance NMR methods (for recent technical reviews see refs 12, 17, and 18). However, a complete set of these experiments often requires far more instrument time than the minimum dictated by signal-to-noise (S/N) requirements. A particular challenge for structural genomics is the development of NMR experiments that allow matching of instrument time investments to the minimum time required for measuring the chemical shift data. For many samples, most of the instrument time is needed not to detect signal, but to ensure appropriate resolution and/or information content of the spectra. In particular, lower bounds for the measurement time of three- and four-dimensional experiments are often determined by digital resolution requirements in the indirect dimensions rather than S/N requirements. Reduced dimensionality experiments19,20, with simultaneous frequency labeling of more than one atom type in indirect dimensions, offer an attractive solution that matches data collection times with signal-to-noise and requires minimal sets of NMR experiments for resonance assignment20.

Traditional NMR structure determination relies on measurement of nuclear Overhauser effects (NOEs; through-space dipolar interactions between protons) and scalar couplings (through-bond interactions between nuclei mediated by nuclear-electron interactions) for deriving distance and torsion angle constraints, respectively. NOE constraints will continue to be key for high-throughput structure determination, but the arsenal of techniques that have recently been developed to recruit additional experimental parameters for structure refinement will play a valuable role in structural genomics. First, measurement of residual dipolar 1H-15N and 1H-13C couplings in dilute liquid crystalline media (aqueous solutions containing suitable amounts of bicelles21 or filamentous phage22 to help constrain the orientation of the protein under study) offers qualitatively new structural information. Dipolar coupling constraints can establish the spatial relationship of remote segments of a biological macromolecule and can complement sparse NOE networks for obtaining high-quality structures23. A current limitation for use in structural genomics is the efficient identification of suitable orienting media in which the protein sample remains soluble. Second, chemical shifts (the NMR resonance
frequencies) have long been recognized as a potential source for structural refinement. In particular, 13Cα and 13Cβ shifts offer a robust means to map the secondary structure and to derive backbone dihedral angle constraints at an early stage of the structure determination5,6. They are obtained during the resonance assignment process, and are thus of outstanding value for efficient high-throughput efforts. Third, detection of through-hydrogen bond scalar couplings24 affords valuable unambiguous constraints for characterizing hydrogen-bonded networks, although the small size of these couplings may restrict this to smaller proteins.

Automated data analysis

Another important area of development involves automated analysis of NMR data. It has been recognized for some time that many of the interactive tasks carried out by an expert in the process of spectral analysis could, in principle, be carried out more efficiently and rapidly by computational systems. Recent developments provide automated analysis of NMR assignments and three-dimensional structures of proteins ranging from ∼50 to 200 amino acids4,18. When good quality data are available, automated analysis of protein NMR data can be very rapid. Many of the available resonance assignment programs execute in tens of seconds4,18, and automated structure refinements are being carried out in tens of minutes using arrays of processors for coarse-grain parallel calculations (Fig. 3). However, while progress over the last few years is encouraging, more work is required, even for small proteins, before automated structural analysis is routine. In particular, general methods for automated analysis of side chain resonance assignments are not yet well developed, and there are as yet no examples of completely automated protein structure determinations. Moreover, little work has focused on the specific problems associated with nucleic acid structure determinations.

Pilot projects using NMR for structural genomics

In view of these technological advances and the unique opportunities presented by the genomic sequence data, several research groups and consortia have initiated pilot projects using NMR in structural genomics (Table 1). The scales of these efforts range from the effort at Rutgers University in the USA funded by the New Jersey Commission on Science and Technology, which focuses primarily on technology development, to the RIKEN Genome Sciences Center in Japan, which is in the process of installing some twenty high field NMR spectrometers to be used largely for high throughput structural genomics. Also particularly noteworthy is the structural genomics pilot project organized by researchers at the University of Toronto in Canada, in which isotope-enriched samples of proteins encoded by the genome of Methanobacterium thermoautotrophicum have been distributed to several NMR groups for parallel data collection and structure analysis, resulting in some dozen three-dimensional structures over the last year.

Table 1 Web sites related to the use of NMR in structural genomics

Center or Consortium                                                                    URL
BioMagResBank                                                                           www.bmrb.wisc.edu
Harvard Structural Genomics of Cancer Initiative, USA                                   sbweb.med.harvard.edu/~sgc/
New Jersey Commission on Science and Technology Initiative in Structural Genomics, USA  www-nmr.cabm.rutgers.edu/structuralgenomics
Northeast Structural Genomics Consortium, USA                                           www.nesg.org
Protein Structure Factory, Germany                                                      userpage.chemie.fu-berlin.de/~psf/
Riken Genome Sciences Center, Tokyo, Japan                                              www.gsc.riken.go.jp
Toronto Structural Proteomics Project, Canada                                           nmr.oci.utoronto.ca/arrowsmith/proteomics

Conclusions

Protein NMR provides structural and biophysical information that is complementary to X-ray crystallography, and these two methods will play synergistic roles in postgenomic analysis and structural genomics. Indeed, NMR is already playing key roles in several of the established pilot projects. The primary challenges to NMR for high throughput applications are the necessarily long time periods for data collection and the laborious expert reasoning needed for data analysis. Recent advances in probe design, data collection strategies, and software engineering demonstrate the potential for higher throughput data collection and automated structure analysis.

Acknowledgments

We thank S. Anderson for useful discussions. The NMR data for FGF were provided by R. Powers and F. Moy (Wyeth Ayerst Research Laboratories). G.T.M. is supported by grants from the New Jersey Commission on Science and Technology, The National Science Foundation, and the Merck Genome Research Institute. K.C.G. is supported by a Postdoctoral Fellowship Award from the NIH.

Associations with structural genomics

G.T.M. is Director of the New Jersey Commission on Science and Technology Initiative in Structural Genomics and Bioinformatics.

1. Hendrickson, W. Science 254, 51–58 (1991).
2. Billeter, M. Q. Rev. Biophys. 25, 325–377 (1992).
3. Kossiakoff, A.A., Randal, M., Guenot, M. & Eigenbrot, C. Proteins Struct. Funct. Genet. 14, 65–74 (1992).
4. Moseley, H.N.B. & Montelione, G.T. Curr. Opin. Struct. Biol. 9, 635–642 (1999).
5. Wishart, D.S. & Sykes, B.D. J. Biomol. NMR 4, 171–180 (1994).
6. Cornilescu, G., Delaglio, F. & Bax, A. J. Biomol. NMR 13, 289–302 (1999).
7. Fetrow, J.S. & Skolnick, J. J. Mol. Biol. 281, 949–968 (1998).
8. Montelione, G.T. & Anderson, S. Nature Struct. Biol. 11–12 (1999).
9. Shuker, S.B., Hajduk, P.J., Meadows, R.P. & Fesik, S.W. Science 274, 1531–1534 (1996).
10. Palmer, A.G., Williams, J. & McDermott, A. J. Phys. Chem. 100, 13293–13310 (1996).
11. Wüthrich, K. Nature Struct. Biol. 5, 492–495 (1998).
12. Gardner, K.H. & Kay, L.E. Annu. Rev. Biophys. Biomol. Struct. 27, 357–406 (1998).
13. Murzin, A.G., Brenner, S.E., Hubbard, T. & Chothia, C. J. Mol. Biol. 247, 536–540 (1995).
14. Holm, L. & Sander, C. Science 273, 595–602 (1996).
15. Orengo, C.A., et al. Structure 5, 1093–1108 (1997).
16. Pervushin, K., Riek, R., Wider, G. & Wüthrich, K. Proc. Natl. Acad. Sci. USA 94, 12366–12371 (1997).
17. Wider, G. & Wüthrich, K. Curr. Opin. Struct. Biol. 9, 594–601 (1999).
18. Montelione, G.T., Rios, C.B., Swapna, G.V.T. & Zimmerman, D.E. In Biological magnetic resonance (eds Krishna, R. & Berliner, L.) 81–130 (Kluwer Academic/Plenum Publishers, New York; 1999).
19. Szyperski, T., Wider, G., Bushweller, J.H. & Wüthrich, K. J. Am. Chem. Soc. 115, 9307–9308 (1993).
20. Szyperski, T., Banecki, B., Braun, D. & Glaser, R.W. J. Biomol. NMR 11, 387–405 (1998).
21. Tjandra, N. & Bax, A. Science 278, 1111–1114 (1997).
22. Hansen, M.R., Mueller, L. & Pardi, A. Nature Struct. Biol. 5, 1065–1074 (1998).
23. Prestegard, J.H. Nature Struct. Biol. 5, 517–522 (1998).
24. Cordier, F. & Grzesiek, S. J. Am. Chem. Soc. 121, 1601–1602 (1999).
