
Chapter 23

ARIA for Solution and Solid-State NMR


Benjamin Bardiaux, Thérèse Malliavin, and Michael Nilges

Abstract
In solution or in the solid state, determining the three-dimensional structure of biomolecules by Nuclear Magnetic Resonance (NMR) normally requires the collection of distance information. The interpretation of the spectra containing this distance information is a critical step in an NMR structure determination. In this chapter, we present the Ambiguous Restraints for Iterative Assignment (ARIA) program for automated cross-peak assignment and determination of macromolecular structure from solution and solid-state NMR experiments. While the program was initially designed for the assignment of nuclear Overhauser effect (NOE) cross-peaks, it has been extended to the interpretation of magic-angle spinning (MAS) solid-state NMR data. This chapter first details the concepts and procedures carried out by the program. Then, we describe both the general strategy for structure determination with ARIA 2.3 and practical aspects of the technique. ARIA 2.3 includes all recent developments, such as an extended integration of the Collaborative Computing Project for the NMR community (CCPN), the incorporation of the log-harmonic distance restraint potential and an automated treatment of symmetric oligomers.

Key words: Ambiguous distance restraint, Structure calculation, Automated assignment, MAS,
Solid-state NMR, CCPN, NOE, ARIA, PDSD, CHHC

1. Introduction

Nuclear Magnetic Resonance (NMR) is widely used in the field of structural biology. Most structure determinations by NMR rely on the measurement of distances and angles between nuclei, with the distances playing a crucial role in determining the fold. In solution, these distances are measured by nuclear Overhauser effect spectroscopy (NOESY) (1). The intensity of the nuclear Overhauser effect (NOE), produced by magnetization transfer through the dipolar coupling between the observed spins, depends on the distance between the two interacting spins (approximately as its inverse sixth power). The qualitative distance estimates obtained from NOE intensities are then translated into interatomic restraints, and the structure is calculated from these restraints.

Alexander Shekhtman and David S. Burz (eds.), Protein NMR Techniques, Methods in Molecular Biology, vol. 831, DOI 10.1007/978-1-61779-480-3_23, Springer Science+Business Media, LLC 2012


Structure determination from NOEs thus requires the assignment of the NOE cross-peaks to pairs of magnetically interacting spins. However, this assignment generally cannot be obtained without knowledge of the structure. Moreover, unambiguously assigning NOE cross-peaks is sometimes very difficult because of inadequate spectral resolution, chemical shift degeneracy and overlapping cross-peaks.
The introduction of the concept of Ambiguous Distance Restraints (ADRs) (2) was a breakthrough in the treatment of degenerate NOE assignments, since it allows distance information to be derived from ambiguously assigned cross-peaks. The intricate relationship between structure determination and NOE assignment led to the development of an iterative automated procedure that simultaneously calculates the structure and assigns the NOEs. In this procedure, structure calculation from ADRs and cross-peak assignment alternate: the tentative ambiguous assignments are compared with an ensemble of molecular conformations calculated from the ADRs. This iterative strategy is largely automated in the software ARIA (Ambiguous Restraints for Iterative Assignment) (3–6), which is described in detail in this chapter. ARIA is open-source software, widely disseminated in the biological NMR community.
Magic-angle spinning (MAS) solid-state NMR (ssNMR) spectroscopy has been applied to the structure determination of proteins in microcrystalline or fibrillar form. Long-range structural restraints can be obtained from proton-driven spin diffusion (PDSD) experiments or from proton-mediated, rare-spin detected correlation experiments (such as CHHC). However, cross-peak assignment is complicated by the broader line widths, which induce substantial ambiguities in resonance assignments. The first de novo determination of a protein structure from MAS ssNMR marked an important step in the field (7). Shortly afterwards, it was demonstrated that automated methods for cross-peak assignment, such as ARIA or CYANA (8), could be successfully applied to carbon–carbon or proton–proton correlation NMR experiments in the solid state (9–12). ARIA now incorporates routines for ssNMR structure determination from various solid-state NMR spectra.
In addition, the in-depth integration of the Collaborative Computing Project for the NMR community (CCPN) data model in ARIA streamlines NMR structure determination by facilitating the import and export of data. Other recent improvements to ARIA include: (i) an implementation of the network anchoring approach (8, 13) adapted to the ARIA philosophy, (ii) automated treatment of symmetric oligomers (13) and (iii) the log-harmonic potential together with Bayesian estimation of the optimal restraint weight (14).

2. Materials

2.1. ARIA Software Package
The following software packages are required to use ARIA.
1. ARIA software package. ARIA (6) is written in the programming language Python (15). The current version is 2.3. Instructions on how to install ARIA can be found in the ARIA installation archive, which should be downloaded from http://aria.pasteur.fr. ARIA can be installed on computers operating under Linux, Windows or Mac OS X.
2. CNS software. To enable specific features used by ARIA, the CNS program (16) must be compiled with the libraries provided within the ARIA package.
3. Optional: CCPNmr Analysis software package (version 2 or later) (17). ARIA uses the CCPN data model to read input data and to store all results in a general format.
4. Optional: access to a computer cluster for distributed calculation.

2.2. Input Data
The minimal set of data required by ARIA consists of (see Note 1):
1. Definition of the molecular system.
2. List(s) of chemical shift assignments of 1H (for 2D NOESY) and of 13C/15N if necessary for 3D NOESY or for MAS solid-state NMR spectra (see Note 2).
3. One or more lists of cross-peaks with chemical shift positions in each dimension and peak volumes/intensities. Individual peaks can be fully assigned, partially assigned or completely unassigned. A list of cross-peaks generally corresponds to the peaks picked in a particular spectrum. It is recommended that similar experiments performed with different mixing times be entered as separate lists.
ARIA also integrates various data types for additional experimental information. All restraints must be in CNS tbl format (see Note 1).
1. Hydrogen bonds: The distance between hydrogen donor and
acceptor as well as the distance between acceptor and hydrogen.
2. Dihedral angles: Dihedral angle restraints incorporated using a
flat-bottom harmonic-wall potential.
3. J-couplings: Calculated J-couplings are directly refined against
observed J-couplings.
4. Residual dipolar couplings: Residual dipolar coupling (RDC)
data as restraints.
5. Distance restraints: Preformatted distance restraints, e.g., from
manual assignments.
6. Preliminary structure or structure ensemble: PDB-formatted file(s) from a previous calculation or models (see Note 1).
7. CCPN Project. CCPN project containing the same data as
listed above but directly imported into ARIA without format
conversion.

2.3. Software for Structural Quality Checks
Additional software required to analyze the quality of the final structure ensembles:
1. PROCHECK (18).
2. WHAT IF (or WHAT_CHECK) (19).
3. ProSa II (or ProSa 2003) (20).
4. MolProbity suite (21).

3. Methods

The general workflow of the ARIA methodology is presented in Fig. 1. After an initial chemical-shift-based cross-peak assignment and a calibration step, ambiguous distance restraints are derived from the cross-peaks (NOEs or C–C correlations). From these restraints, an ensemble of conformers is calculated. On the basis of these structures, noise peaks are detected by a violation analysis, and unlikely assignment possibilities are discarded. This process is iterated several times (nine by default), with parameters optimized for each iteration. Each step of the protocol is described in detail in the sections below.

3.1. Preparation Phase
Before cross-peak assignment and structure calculation, the following steps are automatically performed by ARIA. First, the data are checked and filtered for errors and inconsistencies. The program then creates the molecular topology of the system.

3.1.1. Data Filtering
When checking the chemical shift assignments for consistency, ARIA considers three possible situations:
1. A unique assignment consisting of a single atom and a single chemical shift.
2. A degenerate chemical shift assignment, where one group of equivalent atoms is assigned to exactly one chemical shift.
3. An assignment of the two substituents of a prochiral group, which can have one or two chemical shifts.
In the latter case, floating chirality assignment (22) is used in the resulting restraints (cf. Subheading 3.3.8). Peaks that lack frequency information or that have incorrect or missing peak sizes are removed (see Note 3).
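The peak-level filter can be sketched as follows. This is a minimal illustration, not ARIA's code: the `Peak` record is a hypothetical stand-in for ARIA's own data classes, and treating zero or negative volumes as "incorrect" peak sizes is our assumption.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Peak:
    """Hypothetical minimal cross-peak record (not ARIA's internal data class)."""
    peak_id: int
    positions: Sequence[Optional[float]]  # one chemical shift (ppm) per dimension
    volume: Optional[float]               # peak volume or intensity


def is_usable(peak: Peak) -> bool:
    """True if every dimension has a frequency and the peak size is a positive number."""
    for position in peak.positions:
        if position is None or math.isnan(position):
            return False  # missing frequency information
    if peak.volume is None or math.isnan(peak.volume) or peak.volume <= 0.0:
        return False  # missing or (by assumption) non-physical peak size
    return True


def filter_peaks(peaks: Sequence[Peak]) -> List[Peak]:
    """Keep only the peaks that pass is_usable()."""
    return [peak for peak in peaks if is_usable(peak)]
```

For example, a peak with `positions=(None, 8.2)` or `volume=None` would be dropped, while a fully specified peak with a positive volume is kept.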
[Fig. 1 image: workflow diagram. User input: chemical shift assignments, cross-peak lists, additional restraints (dihedrals, RDCs, distances). Preparation by ARIA: filtering, molecular topology creation, initial cross-peak assignment. Iterative protocol: calibration, violation analysis, noise peak removal, partial assignment, restraint merging, structure calculation with floating chirality assignment. Final stage: refinement in explicit solvent, quality analysis, generation of report. Results: cross-peak assignments, restraint violation list, PDB structure ensemble, structure quality statistics.]

Fig. 1. Description of the ARIA protocol workflow. Rounded rectangles indicate steps performed by ARIA, folded rectangles correspond to user-provided input data and trapezoids represent results.

3.1.2. Molecular Topology Creation
From the definition of the molecular system provided as input data, ARIA creates a molecular topology file (MTF) with the program CNS (16). The name, chemical type, charge and mass of each atom, as well as the covalent connectivity, are defined in the MTF. An extended conformation of the molecule is then generated by CNS and the coordinates are stored in a PDB file (cf. Subheading 3.8). The molecular topology is created automatically for standard biopolymers. If applicable, topological features can easily be defined by the user through the graphical interface (cf. Subheading 3.7).

3.2. Initial Cross-Peak Assignment
3.2.1. Chemical-Shift-Based Assignment
For every cross-peak, ARIA uses the chemical shift lists from the sequential resonance assignment to derive possible assignments. As illustrated in Fig. 2, the peak position is defined by its frequency coordinates (c1, c2) in each dimension of the spectrum. To account for the limited precision of chemical shift measurements, for the uncertainty of the cross-peak coordinates and for systematic experimental errors, chemical shift tolerances (δ1, δ2) are applied around the peak position. The tolerances should be large enough to obtain frequency windows that compensate for all sources of inconsistency between the list of resonance assignments and the cross-peak lists. Then, for each peak dimension, all protons (or 13C/15N spins for MAS ssNMR) whose chemical shifts fall within the peak frequency window are collected (see Note 4). In the case of 3D or 4D heteronuclear spectra, the heteroatom attached to the proton must also match the corresponding chemical shift window. The list of all assignment possibilities (or contributions) for a cross-peak is generated from all combinations of the resonances collected in each dimension (Fig. 2). The sizes of the frequency windows play an important role in the initial cross-peak assignment step (see Note 5). In addition, the completeness of the chemical shift assignments influences the accuracy of the initial assignment (see Note 6).
For symmetric oligomers, since symmetric nuclei have the same chemical shifts, ARIA collects possible assignments from all monomers. To simplify the treatment of the resulting highly ambiguous assignments (see Note 7), ARIA considers only one of the two dimensions corresponding to the through-space correlation as ambiguous in terms of chain assignment. The corresponding symmetric restraints are then generated automatically by ARIA prior to structure calculation. ARIA also takes into account the intramolecular or intermolecular nature of the experiment (if applicable and specified by the user) by excluding the contributions that are not visible in that experiment.

[Fig. 2 image: frequency map with axes "dimension 1" and "dimension 2"; the peak at (c1, c2) is enclosed by a window extending from c1 − δ1 to c1 + δ1 and from c2 − δ2 to c2 + δ2; resonances pa, pb, pc, pd lie along dimension 1 and px, py, pz along dimension 2.]

Fig. 2. Illustration of the assignment of a cross-peak. c1, c2 denote the peak coordinates in frequency space. The assignment frequency window is indicated by the solid black square, defined from the chemical shift tolerances δ1 and δ2. The coordinates of the (hypothetical) correct assignment are represented by the gray dashed lines (pb, py). Multiple resonances within the tolerance window (pa, pb, pc, pd in dimension 1 and px, py, pz in the other dimension) give rise to 12 assignment possibilities.
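A minimal sketch of the window matching for a 2D homonuclear spectrum, assuming a plain dictionary of assigned shifts; the atom naming scheme and all values are hypothetical, and the extra heteroatom check needed for 3D/4D spectra is omitted. As in Fig. 2, the contributions are the Cartesian product of the resonances found in each dimension.

```python
from itertools import product
from typing import Dict, List, Tuple


def window_candidates(position: float, shifts: Dict[str, float], tolerance: float) -> List[str]:
    """Atoms whose assigned chemical shift lies within +/- tolerance of the peak position."""
    return sorted(atom for atom, shift in shifts.items() if abs(shift - position) <= tolerance)


def assignment_possibilities(peak: Tuple[float, float],
                             shifts: Dict[str, float],
                             tolerances: Tuple[float, float]) -> List[Tuple[str, str]]:
    """All contributions (atom in dimension 1, atom in dimension 2) for a 2D homonuclear peak."""
    dim1 = window_candidates(peak[0], shifts, tolerances[0])
    dim2 = window_candidates(peak[1], shifts, tolerances[1])
    # Every combination of one resonance per dimension is a possible contribution;
    # pairing an atom with itself would be a diagonal peak, so it is skipped.
    return [(a, b) for a, b in product(dim1, dim2) if a != b]


# Hypothetical proton chemical shifts (ppm), keyed by "chain:residue:atom".
shifts = {"A:10:HA": 4.52, "A:25:HA": 4.50, "A:11:H": 8.21, "A:40:H": 8.19, "A:33:HB2": 2.10}
contributions = assignment_possibilities((4.51, 8.20), shifts, (0.02, 0.02))
print(len(contributions))  # 2 resonances x 2 resonances -> 4
```

Widening the tolerances adds resonances to each window and multiplies the number of contributions, which is why the window sizes matter so much for the initial assignment.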

3.2.2. Structural Rules for Symmetric Oligomers
ARIA can use information about the secondary structure organization of the system under investigation to remove unlikely assignments. ARIA uses simple rules (23) to assign some cross-peaks as intermonomer before the structure calculation, based on the predicted secondary structure elements (see Note 8). If two symmetric secondary structure elements face each other in the interface, cross-peaks observed within the same element between residues separated by more than five residues in sequence cannot arise from intramolecular contacts and are thus unambiguously classified as intermolecular.
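The sequence-separation rule can be written as a small predicate. This is a sketch of the rule as stated above, not ARIA's implementation; the `facing_elements` input (residue ranges of elements known to face their symmetric copy) is a hypothetical representation.

```python
from typing import Sequence, Tuple


def is_intermolecular_by_rule(res_i: int, res_j: int,
                              facing_elements: Sequence[Tuple[int, int]],
                              min_separation: int = 5) -> bool:
    """True if a cross-peak between residues res_i and res_j must be intermolecular.

    facing_elements: hypothetical list of (first, last) residue numbers of secondary
    structure elements that face their symmetric copy in the interface."""
    if abs(res_i - res_j) <= min_separation:
        return False  # close in sequence: an intramolecular contact is still possible
    for start, end in facing_elements:
        if start <= res_i <= end and start <= res_j <= end:
            return True  # same element, too far apart in sequence to be intramolecular
    return False


# Hypothetical strand spanning residues 20-35 that faces its symmetric partner.
print(is_intermolecular_by_rule(22, 30, [(20, 35)]))  # True
```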

3.2.3. Network Anchoring
ARIA implements a network anchoring approach (8) to reduce the number of cross-peak assignment possibilities prior to structure calculation. The approach is based on the ranking of each assignment, calculated from the assignments of neighboring nuclei in 3D space, and is efficient because true assignments form a self-consistent subset of the network of all possible assignments (see refs. 8, 13 for details). The behavior of network anchoring is controlled by a set of user-defined parameters:
1. High network-anchoring (NA) score per residue threshold ($N_{\mathrm{res}}^{\mathrm{high}}$).
2. Minimal NA score per residue threshold ($N_{\mathrm{res}}^{\mathrm{min}}$).
3. Minimal NA score per atom threshold ($N_{\mathrm{atom}}^{\mathrm{min}}$).
A peak is retained if one of the following rules is satisfied:

$$S_{\mathrm{res}} \geq N_{\mathrm{res}}^{\mathrm{high}} \qquad (1)$$

$$S_{\mathrm{res}} \geq N_{\mathrm{res}}^{\mathrm{min}} \quad \text{and} \quad S_{\mathrm{atom}} \geq N_{\mathrm{atom}}^{\mathrm{min}} \qquad (2)$$

where $S_{\mathrm{res}}$ and $S_{\mathrm{atom}}$ are the residue-wise and atom-wise network anchoring scores, respectively. Even though the network anchoring approach does not directly rely on 3D structure information, it can still be used after the first ARIA iteration.
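Rules (1) and (2) combine into a single Boolean test. The sketch below assumes the scores have already been computed; the threshold values in the example are illustrative only and are not claimed to be ARIA's defaults.

```python
def network_anchoring_keeps(s_res: float, s_atom: float,
                            n_res_high: float, n_res_min: float,
                            n_atom_min: float) -> bool:
    """Eq. 1 or Eq. 2: keep if the residue-wise score is high on its own, or if the
    residue-wise and atom-wise scores both reach their minimal thresholds."""
    rule_1 = s_res >= n_res_high
    rule_2 = s_res >= n_res_min and s_atom >= n_atom_min
    return rule_1 or rule_2


# Illustrative thresholds (hypothetical values): N_res^high = 4.0, N_res^min = 1.0, N_atom^min = 0.25.
print(network_anchoring_keeps(2.0, 0.5, 4.0, 1.0, 0.25))  # True, via rule (2)
```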

3.3. Iterative Structure Calculation
3.3.1. Ambiguous Distance Restraints
The most important idea underlying the ARIA methodology is the concept of Ambiguous Distance Restraints (ADRs) (2). In the ADR framework, each NOESY cross-peak is treated as the superposition of the signals from each of its multiple assignment possibilities: the NOE intensity depends on the sum of the inverse sixth powers of all the individual proton–proton distances that contribute to the signal. An effective distance D is thus derived as:

$$D = \left( \sum_{c=1}^{N_c} d_c^{-6} \right)^{-1/6} \qquad (3)$$

where c runs through all $N_c$ assignment possibilities and $d_c$ is the interatomic distance between the two protons corresponding to the c-th contribution. During structure calculation, the distance D in the molecular coordinates is restrained through the distance target energy function (cf. Subheading 3.3.6), in the same way as for unambiguous distance restraints.
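Eq. 3 translates directly into code. This sketch only evaluates D for given distances (in Å); it is not the CNS restraint term used by ARIA.

```python
from typing import Sequence


def effective_distance(distances: Sequence[float]) -> float:
    """Effective distance D of Eq. 3 from the distances d_c of all contributions."""
    if not distances:
        raise ValueError("an ambiguous restraint needs at least one contribution")
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)


print(effective_distance([3.0]))       # one contribution: D equals d (up to rounding)
print(effective_distance([3.0, 3.0]))  # two equal contributions: 3.0 * 2**(-1/6), about 2.67
print(effective_distance([3.0, 8.0]))  # the long contribution barely matters: about 2.9986
```

Because D is never larger than the shortest contributing distance, an ADR can be satisfied as soon as one of its contributions (ideally the correct one) is short, so incorrect extra contributions do not force wrong contacts.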

3.3.2. Distance Calibration
The simplest model to derive distances from NOE signal intensities is the Isolated Spin Pair Approximation (ISPA), which considers only the observed spin pair and neglects spin diffusion through third nuclei. For short mixing times, ISPA provides a good approximation relating an NOE volume $V_{ij}$ to the distance $d_{ij}$ between two interacting spins i and j:

$$V_{ij} = C\, d_{ij}^{-6} \qquad (4)$$

The scale factor C (also called the calibration factor) cannot be measured directly, since it depends on the system under investigation and on the experimental setup. The calibration factor is estimated over all NOEs from the ratio of the average experimental volume, $V^{\mathrm{exp}}$, to the average theoretical volume:

$$C = \frac{\sum_i V_i^{\mathrm{exp}}}{\sum_i \bar{d}_i^{-6}} \qquad (5)$$

where $\bar{d}_i$ is the average effective distance for NOE i in the conformer ensemble. In the case of multiple assignment possibilities, $\bar{d}_i$ is calculated according to the ADR equation (Eq. 3). Finally, the calibrated distance is obtained by:

$$d = \left( C^{-1} V^{\mathrm{exp}} \right)^{-1/6} \qquad (6)$$
In the case of NOEs between two groups of magnetically equivalent spins (e.g., methyl groups and aromatic rings), averaging effects are taken into account by expanding Eq. 4 (see Note 9).
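The ISPA calibration of Eqs. 5 and 6 for one spectrum can be sketched as below (each spectrum is calibrated independently). The ensemble-averaged effective distances are assumed to be given, and the test data are synthetic volumes generated from known distances with a hypothetical C = 10^6.

```python
from typing import Sequence


def calibration_factor(volumes: Sequence[float], eff_distances: Sequence[float]) -> float:
    """Eq. 5: C = sum(V_exp) / sum(d_bar^-6) over all peaks of one spectrum."""
    if len(volumes) != len(eff_distances):
        raise ValueError("one ensemble-averaged effective distance per peak is required")
    return sum(volumes) / sum(d ** -6 for d in eff_distances)


def isp_distance(volume: float, c: float) -> float:
    """Eq. 6: calibrated ISPA distance d = (V_exp / C)^(-1/6)."""
    return (volume / c) ** (-1.0 / 6.0)


# Synthetic example: volumes generated from known distances with C = 1e6.
distances = [2.5, 3.0, 4.0]
volumes = [1.0e6 * d ** -6 for d in distances]
c = calibration_factor(volumes, distances)
```

With noise-free synthetic data the calibration recovers C, and Eq. 6 returns the original distances; with real data the ratio of sums averages out errors in individual volumes.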
Magnetization can be transferred from one spin to another not only directly but also by spin diffusion, i.e., indirectly via other spins in the vicinity. For longer mixing times, spin diffusion must be taken into account when estimating distances; if ISPA is applied regardless, the resulting interproton distances are mostly underestimated. ARIA employs relaxation matrix theory to account for indirect magnetization transfer. In this formalism, cross-peak volumes at mixing time $\tau_m$ can be calculated from the volumes at $\tau_m = 0$ and the matrix of auto- and cross-relaxation rates, R (24):

$$V_{ij}(\tau_m) = C\, V_{ij}(0) \left( \exp(-R\, \tau_m) \right)_{ij} \qquad (7)$$

The resulting back-calculated NOE volumes, which take into account the bias induced by spin diffusion, are then converted into corrected target distances d:

$$d = \bar{d} \left( \frac{C^{-1} V^{\mathrm{exp}}}{V^{\mathrm{th}}} \right)^{-1/6} \qquad (8)$$

where $\bar{d}$ is the average effective distance, and $V^{\mathrm{exp}}$ and $V^{\mathrm{th}}$ are the experimental and theoretical NOE volumes, respectively. When spin-diffusion-corrected distances are used, the distance bounds calculated from the theoretical volume may also be useful for the structure calculation (25). In ARIA 2.3, the spin-diffusion correction is performed by the Python core of ARIA and not by CNS routines. It is also important to note that every spectrum is calibrated independently. Still, these models are approximate, and it is common practice to restrain the distance to an interval to account for uncertainties in the distances (see Note 10). This interval is defined by lower and upper distance bounds, L and U:

$$L = d - \Delta, \quad U = d + \Delta, \quad \text{where } \Delta = 0.125\, d^2 \qquad (9)$$
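Eqs. 8 and 9 can be sketched as two small functions (distances in Å). The back-calculated volume $V^{\mathrm{th}}$ is assumed to come from a relaxation-matrix calculation (Eq. 7), which is not reproduced here.

```python
from typing import Tuple


def corrected_distance(d_bar: float, v_exp: float, v_th: float, c: float) -> float:
    """Eq. 8: scale the ensemble distance by the calibrated experimental / back-calculated volume ratio."""
    return d_bar * (v_exp / (c * v_th)) ** (-1.0 / 6.0)


def distance_bounds(d: float) -> Tuple[float, float]:
    """Eq. 9: lower and upper bounds d -/+ Delta with Delta = 0.125 d^2."""
    delta = 0.125 * d * d
    return d - delta, d + delta


print(corrected_distance(3.0, 2.0, 1.0, 2.0))  # volumes already agree: distance unchanged (3.0)
print(distance_bounds(4.0))                    # (2.0, 6.0)
```

If the calibrated experimental volume equals the back-calculated one, the target distance is the ensemble distance; a stronger experimental peak shortens it. Because Δ grows with d², long distances automatically receive wider bounds.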

3.3.3. Violation Analysis and Noise Peak Removal
To identify incorrect assignments and noise peaks, the calibrated restraints are subjected to a violation analysis, following the structural consistency hypothesis (3, 26): incorrectly assigned peaks or noise peaks are not consistent with the 3D structure determined from all the experimental data. To assess whether a particular restraint follows the general trends imposed on the structures by the entire data set, its distance bounds are compared with the corresponding distances found in the conformer ensemble. A restraint is considered violated if the distance found in a structure lies outside the bounds by more than a user-defined violation tolerance, t. To identify systematically violated restraints, each conformer in the ensemble is analyzed. The fraction, $f_i$, of conformers violating restraint i is calculated according to:

$$f_i = \frac{1}{S} \sum_{k=1}^{S} \Theta\!\left( \max\left( L_i - t - d_i^{(k)},\; d_i^{(k)} - U_i - t \right) \right) \qquad (10)$$

where $L_i$ and $U_i$ denote the lower and upper bounds of the i-th restraint, $d_i^{(k)}$ designates the distance found in the k-th conformer, Θ is the Heaviside step function and S is the total number of conformers analyzed. A restraint is classified as violated if $f_i$ exceeds a user-defined violation threshold (50% by default). The corresponding cross-peak is then removed from the list of active peaks for the next iteration. During the course of the protocol, the violation tolerance t is reduced from iteration to iteration to ensure that most of the inconsistent peaks are removed.
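A minimal sketch of Eq. 10 for a single restraint, assuming the restraint distance has already been measured in each of the S analyzed conformers. The Heaviside function is taken as 0 at exactly zero, so a distance lying exactly at a bound plus the tolerance is not counted as a violation; that convention is our choice.

```python
from typing import Sequence


def violation_fraction(distances: Sequence[float], lower: float, upper: float, tol: float) -> float:
    """Eq. 10: fraction of conformers whose distance lies more than tol below the
    lower bound or more than tol above the upper bound."""
    if not distances:
        raise ValueError("no conformers to analyze")
    violated = sum(1 for d in distances if max(lower - tol - d, d - upper - tol) > 0.0)
    return violated / len(distances)


def is_violated(distances: Sequence[float], lower: float, upper: float,
                tol: float, threshold: float = 0.5) -> bool:
    """A restraint is violated if the fraction strictly exceeds the threshold (50% by default)."""
    return violation_fraction(distances, lower, upper, tol) > threshold


# Hypothetical distances (in Angstrom) of one restraint in four conformers, bounds 2.0-5.0, t = 0.5.
print(violation_fraction([3.0, 5.4, 6.2, 6.8], 2.0, 5.0, 0.5))  # 0.5
```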

3.3.4. Partial Assignment
The assignment of cross-peaks is made indirectly by progressively eliminating unlikely assignment possibilities. Because of the $r^{-6}$ dependence, assignments with large distances contribute only little to the NOE intensity. Thus, for a particular cross-peak, each assignment possibility is weighted by its normalized partial volume, $w_c$, calculated as follows:

$$w_c \propto \bar{d}_c^{-6} \qquad (11)$$

$$\sum_{c=1}^{N_c} w_c = 1 \qquad (12)$$

where $\bar{d}_c$ is the average distance of contribution c in the structure ensemble and $N_c$ is the number of contributions for the cross-peak. To reduce the number of assignment possibilities, the contributions are sorted by decreasing $w_c$, and only the m largest contributions needed to satisfy the following condition are kept:

$$\sum_{c=1}^{m} w_c \geq p \qquad (13)$$

where p designates a user-defined ambiguity cut-off. This cut-off is set to 1.0 in the first iteration and progressively reduced to 0.8, so that unambiguous assignments can be derived for most peaks in the last iteration. The quality of NMR structure ensembles might also be improved by excluding peaks that involve a large number of contributions. This is controlled by the parameter max_n, which defines the maximum number of assignment possibilities (4).
Symmetric peaks or duplicate peaks from different experiments lead to equivalent restraints (restraints involving the same set of atoms). To avoid overrepresentation of certain distance data, nonviolated restraints with equivalent atom content are detected. The restraint with the smallest distance is kept, while the others are discarded for the rest of the protocol. For every iteration, the file noe_restraints.merged lists the restraints discarded by the merging procedure.
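Both filtering steps, the ambiguity cut-off of Eqs. 11-13 and the merging of equivalent restraints, can be sketched as follows. These are simplified illustrations: the restraint representation (a list of atom-name pairs plus a target distance) is hypothetical, and the caller is assumed to pass only nonviolated restraints to the merging function.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple


def partial_assignment(avg_distances: Sequence[float], p: float) -> List[int]:
    """Indices of the contributions kept for one cross-peak (Eqs. 11-13)."""
    raw = [d ** -6 for d in avg_distances]           # Eq. 11, before normalization
    total = sum(raw)
    weights = [w / total for w in raw]                # normalized so that they sum to 1 (Eq. 12)
    order = sorted(range(len(weights)), key=lambda c: weights[c], reverse=True)
    kept: List[int] = []
    cumulative = 0.0
    for c in order:                                   # largest partial volumes first
        kept.append(c)
        cumulative += weights[c]
        if cumulative >= p:                           # Eq. 13 satisfied: stop
            break
    return kept


Pairs = Sequence[Tuple[str, str]]


def merge_equivalent(restraints: Sequence[Tuple[Pairs, float]]) -> Dict[FrozenSet[FrozenSet[str]], float]:
    """Keep, for each distinct atom content, only the restraint with the smallest distance."""
    best: Dict[FrozenSet[FrozenSet[str]], float] = {}
    for pairs, distance in restraints:
        # Order-independent key: (H1, H2) and (H2, H1) describe the same contribution.
        key = frozenset(frozenset(pair) for pair in pairs)
        if key not in best or distance < best[key]:
            best[key] = distance
    return best


# Hypothetical ensemble-averaged distances (Angstrom) of three contributions to one peak.
print(partial_assignment([3.0, 6.0, 4.0], 0.8))  # [0]: the 3.0 contribution carries about 84%
```

With p = 1.0 every contribution is kept; lowering p toward 0.8 removes the weakest contributions first, so most peaks end up with a single assignment.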

3.3.5. Calculation of Structure Ensemble
On the basis of the merged restraint list, a new structure ensemble is calculated with the program CNS (16) through a molecular dynamics simulated annealing (MDSA) protocol. ARIA provides two forms of molecular dynamics: in Cartesian or in torsion angle space. Torsion angle molecular dynamics (TAD) (27) reduces the calculation time and allows for higher MDSA temperatures, while generally increasing the radius of convergence. The molecular structures obtained with TAD also have better local geometry. The MDSA protocol used in ARIA is divided into two phases: an initial high-temperature search phase, and a cooling phase in which the temperature slowly decreases. The second part of the cooling stage is performed in Cartesian coordinates. The length of the cooling stages determines the slope of the bath temperature cooling function. This parameter has been shown to play an important role in the convergence properties of the ARIA calculation for highly ambiguous data (28). The MDSA protocols implemented in ARIA (3) are optimized for the application of ambiguous distance restraints and for the violation analysis method. The minimization protocols are based primarily on separate scaling of different energy terms with relatively low force constants. Any other available structural restraints (hydrogen bond restraints, dihedral angles and RDCs) are also used during the structure calculation. The number of calculated conformers is an important parameter of the structure calculation protocol. Among all calculated conformers, only the n lowest-energy ones (usually about 30%) are used in the next ARIA iteration to recalibrate and reassign NOEs. For every iteration, the number of structures is a user-defined parameter (see Table 1).

Table 1
Important protocol parameters, their location in the GUI, and default values (if applicable)

Parameter                                 | GUI item                  | Default value
Project environment                       | Project                   |
  Project name                            |                           | 1
  File root                               |                           |
  Working directory                       |                           |
  Temporary directory                     |                           |
Data specification                        | Data                      |
  Frequency window (proton)               | Spectra                   | 0.02 ppm
  Frequency window (hetero)               | Spectra                   | 0.5 ppm
  Trust assignments                       | Spectra                   | No
  Use only assigned                       | Spectra                   | No
  Symmetry                                | Symmetry                  | None
  CNS topology file                       | Molecular system          | topallhdg5.3.pro
  CNS parameter file                      | Molecular system          | parallhdg5.3.pro
Protocol parameters                       | Protocol                  |
  Number of structures (n_structures)     | Iterations                | 20
  Violation tolerance (t)                 | Iterations                | 1000.0–0.1
  Violation threshold                     | Iterations                | 0.5
  Ambiguity cut-off (p)                   | Iterations                | 1.0–0.8
  Maximum number of contributions (max_n) | Iterations                | 20
  Number of lowest-energy structures (S)  | Iterations                | 7
  Solvent for refinement                  | Water refinement          | Water
Structure calculation                     | Structure Generation      |
  Local CNS executable                    | CNS                       |
  Command to start remote calculation     | Job Manager               |
  High-temperature steps                  | CNS Dynamics              | 10,000
  Cooling 1 steps                         | CNS Dynamics              | 5,000
  Cooling 2 steps                         | CNS Dynamics              | 4,000
  Log-harmonic potential                  | CNS Annealing Parameters  | No

3.3.6. Restraint Energy Function
The aim of the MDSA protocol is to find the global minimum of an objective function that incorporates experimental data and physical energy. The latter is quantified using a molecular dynamics force field. Experimental data are integrated in the form of conformational restraints entering the objective function via an energy potential. For distance restraints, ARIA employs a flat-bottom harmonic-wall potential with zero energy between the distance bounds and linear asymptotes (3). This potential allows for large distance violations, as may occur in an automated assignment procedure. Nevertheless, it remains difficult to evaluate the bounds correctly and to choose the relative weight to apply to the data. We have recently introduced a new error-tolerant potential in which the lower and upper bounds are replaced by a bounds-free log-harmonic potential (14). This potential derives from a Bayesian analysis showing that NOEs and the derived distances ideally follow a log-normal distribution (29, 30). In ARIA, we also retain another important feature of this Bayesian approach: automatic determination of the optimal weight for the experimental data (31). The log-harmonic potential is applied during the second cooling stage of the MDSA and during water refinement. The weight for the distance restraints, $w_{\mathrm{data}}$, is iteratively evaluated as:

$$w_{\mathrm{data}} = \frac{n}{\chi^2(X)} \qquad (14)$$

where n is the number of restraints, and:

$$\chi^2(X) = \sum_i \log^2\!\left( \frac{\tilde{d}_i}{d_i(X)} \right) \qquad (15)$$

where, for each restraint i, $d_i(X)$ is the effective distance (Eq. 3) calculated from the current structure X, and $\tilde{d}_i$ is the target distance of the restraint. This approach was shown to generally improve both the accuracy and the quality of structures calculated from assigned restraints (14). Our initial experience in using ARIA with real (noisy and ambiguous) data indicates that the log-harmonic restraint potential is preferable.
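Eqs. 14 and 15 amount to a few lines of code. This sketch assumes the natural logarithm and takes the current effective distances as given; it does not reproduce the CNS implementation of the log-harmonic energy term.

```python
import math
from typing import Sequence


def chi2_log(targets: Sequence[float], current: Sequence[float]) -> float:
    """Eq. 15: sum over restraints of the squared (natural) log-ratio of target to current distance."""
    if len(targets) != len(current):
        raise ValueError("one current distance per restraint is required")
    return sum(math.log(t / d) ** 2 for t, d in zip(targets, current))


def data_weight(targets: Sequence[float], current: Sequence[float]) -> float:
    """Eq. 14: w_data = n / chi^2(X)."""
    chi2 = chi2_log(targets, current)
    if chi2 == 0.0:
        raise ValueError("chi^2 is zero: the weight is undefined for a perfect fit")
    return len(targets) / chi2


# Hypothetical case: two restraints, each off by a factor exp(+/-0.1), so chi^2 = 0.02.
w = data_weight([3.0, 4.0], [3.0 * math.exp(0.1), 4.0 * math.exp(-0.1)])
```

Because the weight is re-evaluated as the calculation proceeds, a better fit of the restraints (smaller χ²) automatically increases the weight of the data relative to the force field.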

3.3.7. Symmetric Oligomers
The symmetry of the system is maintained during the calculation by adding a symmetry target function to the objective energy function (32). This target function contains terms that ensure the symmetry relation between the monomers and keep them in the vicinity of each other (packing; see Note 11).

3.3.8. Floating Chirality Assignment
Unassigned prochiral groups are treated with a floating chirality assignment approach (22). The two substituents of a prochiral center (methylene protons or methyl protons of isopropyl groups) are often difficult to assign stereospecifically in terms of chemical shifts. In each proton dimension, a resonance matching one of the chemical shifts may involve either of the two prochiral substituents. In ARIA, the two assignment alternatives are tested during the course of the structure calculation and the energetically most favorable one is used. The result is written for each conformer to a file with a .float extension.
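The idea of testing both labelings and keeping the lower-energy one can be illustrated with a deliberately simplified, discrete sketch. It is not how CNS performs floating chirality during the simulation: the atom names, the restraint tuples and the purely harmonic flat-bottom energy (without the linear asymptotes) are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Restraint = Tuple[str, str, float, float]  # hypothetical: atom_a, atom_b, lower, upper


def flat_bottom(d: float, lower: float, upper: float, k: float = 1.0) -> float:
    """Zero inside [lower, upper], harmonic outside (linear asymptotes omitted)."""
    if d < lower:
        return k * (lower - d) ** 2
    if d > upper:
        return k * (d - upper) ** 2
    return 0.0


def choose_labeling(restraints: List[Restraint], dist: Dict[Tuple[str, str], float],
                    pro_1: str, pro_2: str) -> Tuple[bool, float]:
    """Return (swap, energy): whether exchanging the two prochiral labels lowers the restraint energy."""
    def energy(swap: bool) -> float:
        rename = {pro_1: pro_2, pro_2: pro_1} if swap else {}
        total = 0.0
        for a, b, lower, upper in restraints:
            a, b = rename.get(a, a), rename.get(b, b)
            total += flat_bottom(dist[tuple(sorted((a, b)))], lower, upper)
        return total

    e_keep, e_swap = energy(False), energy(True)
    return e_swap < e_keep, min(e_keep, e_swap)


# Hypothetical distances in one conformer and one restraint to HB2.
dist = {("HA", "HB2"): 2.5, ("HA", "HB3"): 3.8}
print(choose_labeling([("HA", "HB2", 3.5, 4.5)], dist, "HB2", "HB3"))  # (True, 0.0)
```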

3.4. Solvent Refinement
The simplified force field parameters for nonbonded contacts used in structure calculations in vacuo often produce structures that contain artifacts (unrealistic side-chain packing and unsatisfied hydrogen bond donors or acceptors). Therefore, the final structures of the last ARIA iteration are automatically refined in a shell of explicit solvent (water or DMSO molecules). This refinement consists of a short MD simulation with a complete force field, which includes Coulomb and Lennard-Jones potentials. The covalent parameters used in the refinement (33) are consistent with the force field used for structure calculation and validation, thus avoiding systematic differences that could influence validation results. Refinement in explicit solvent has been shown to significantly improve the quality of the structure (33–35).
3.5. Results Export and Generation of Output Files
3.5.1. Export to CCPN
At the end of the ARIA protocol, assigned peak lists, restraint lists along with violations, and final structure ensembles (last iteration and solvent-refined) are automatically exported into a CCPN project (see Fig. 3). Data exchange, further analysis of results, and management of ARIA runs are then facilitated through the use of the CCPN program suite (cf. Subheading 3.11).

[Fig. 3 image: data flow between ARIA and a CCPN project (CCPN Analysis). IMPORT into ARIA: cross-peak lists, chemical shift assignments, distance restraints, hydrogen bond restraints, dihedral angle restraints, RDC restraints, initial structure ensemble. EXPORT from ARIA: cross-peak assignments, distance restraints, violations, final structure ensemble.]

Fig. 3. Communication interface between ARIA and CCPN for import of input data and export of results.

3.5.2. Report Files
For every iteration, ARIA creates the following report files:
1. report summarizes analyses of the restraint lists and the structure ensemble (number of restraints applied, violations, ensemble precision).
2. noe_restraints.unambig and noe_restraints.ambig tabulate information about unambiguous and ambiguous restraints, respectively. For each restraint, the reference cross-peak, the restraint bounds and the average distance found in the ensemble are provided. The result of the violation analysis is also given here (see Note 12).
3. noe_restraints.violations lists all violated restraints.
4. noe_restraints.assignments lists the tentative assignments corresponding to every restraint. The nature of the assignment(s) is also given (fully assigned, partially assigned or unassigned cross-peaks).
5. noe_restraints.xml and noe_restraints.pickle store the complete list of cross-peak-based distance restraints in XML format and in Python binary (pickle) format, respectively. The latter is required for further assignment analysis in the ARIA GUI (cf. Subheading 3.10.2).

3.5.3. Quality Checks
To evaluate the structural quality of both the final set of structures and the solvent-refined ensemble, ARIA makes use of the programs WHAT IF (19), PROCHECK (18), ProSa (20) and MolProbity (21). Separate report files, named quality_checks.*, are generated for every program and stored in the directories of the respective ensembles (last iteration and solvent-refined). Overall quality scores are tabulated in the file quality_checks, whereas WHAT IF score profiles along the molecular sequence are generated in both textual and graphical forms (cf. Subheading 3.10.3).

3.5.4. CNS Analyses
CNS scripts calculate restraint energies, ensemble RMSDs, an optimal superposition of the final structure ensemble (with automated
determination of flexible and rigid regions), and an unminimized
average structure. Analyses of restraints from complementary
experimental data are also given. Results are stored in the directory
analysis/.
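To make the notions of optimal superposition and ensemble RMSD concrete, the following Python sketch superimposes pairs of conformers with the Kabsch algorithm and averages the pairwise RMSD. It is an illustration under stated assumptions, not the CNS script ARIA runs: coordinates are assumed to be NumPy arrays of matched atoms, precision is taken here as the mean pairwise RMSD (one common definition among several), and the automated detection of flexible and rigid regions is left out.

```python
import itertools

import numpy as np


def kabsch_rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """RMSD between two matched (N, 3) coordinate sets after optimal superposition."""
    # Remove translations by centring both sets on their centroids.
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    # SVD of the 3x3 covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(a.T @ b)
    # Flip the last axis if needed so that the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = a @ rotation.T - b
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def ensemble_rmsd(ensemble: list[np.ndarray]) -> float:
    """Mean pairwise RMSD over all pairs of conformers."""
    pairs = list(itertools.combinations(ensemble, 2))
    return sum(kabsch_rmsd(a, b) for a, b in pairs) / len(pairs)


# Usage with synthetic coordinates: a rotated and shifted copy superimposes exactly.
rng = np.random.default_rng(0)
model = rng.normal(size=(50, 3)) * 5.0
angle = np.radians(30.0)
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle), np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
copy = model @ rot_z.T + np.array([3.0, -2.0, 1.0])
noisy = model + rng.normal(scale=0.5, size=model.shape)
print(round(kabsch_rmsd(model, copy), 6))
print(ensemble_rmsd([model, copy, noisy]))
```

Superposition can only lower the RMSD relative to the raw coordinates, which is why a rigidly rotated and translated copy of a conformer contributes essentially zero to the ensemble value.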
In the following sections, we detail the typical procedure to be
followed by a user to perform an ARIA calculation. In a structure
determination project, the general procedure consists of repeated
ARIA runs using revised results from a previous calculation as input
data (Fig. 4).

3.6. Conversion of Input Data
Since most NMR software packages use proprietary formats for data storage, the interconversion step required to transfer data with other applications such as ARIA can lead to a loss of information.

[Figure: flowchart. Initial stage: preparation of input data. Series of ARIA runs: setup of a new run (parameters and project setup, adjustment of frequency windows); ARIA automated cross-peak assignment and structure calculation; examination of quality checks for the final structure ensemble, analysis of violations and proposed assignments; revision of input data (completion of cross-peak assignments, removal of potential noise peaks); final result.]

Fig. 4. A series of ARIA runs in a typical structure determination project, with several cycles of structure calculations and
cross-peak assignments punctuated by manual inspection and correction of experimental input data.

To facilitate data validation and integration, ARIA uses a data format based on the extensible markup language (XML) (36) to describe molecular systems, chemical shifts, and cross-peak lists. If input data are to be read from a previously created CCPN project, the conversion step described here is not required (see Fig. 3): input data are read directly and converted internally from the CCPN data model into ARIA at run-time. Otherwise, input data must be converted to ARIA XML format before starting the ARIA program per se. This step is simplified by the internal conversion routine provided by ARIA. To use this routine, one must prepare a simple XML conversion file.
1. Conversion template. A preformatted conversion file can be
auto-generated by typing the following command in a
terminal:
aria2 --convert -t conversion.xml
An empty conversion file template, conversion.xml, is then
created and must be completed.
2. Editing the conversion file. In addition to formats and filenames of the raw data (sequence, chemical shift lists and spectra) (see Note 13), the user has to specify the mapping between nuclei and frequency dimensions. If the molecular system is a symmetric multimer, it is mandatory to specify the molecular chains involved (segment id or segid). For the cross-peak lists, the user needs to indicate the chains involved and the level of chain-wise ambiguity: intramolecular, intermolecular, or unknown (cf. Subheading 3.7). For solid-state NMR experiments, the user must also fill in a parameter designating the type of experiment and transfer (see Note 14).
3. Conversion step. Then, invoke the command
aria2 --convert conversion.xml
to start the data conversion. Converted data will be written in
ARIA XML format; a project file, which has to be completed by
the user, will be generated as well.

3.7. Specification of ARIA Project Parameters
1. Project creation. All program parameters and locations of the input data are stored in a single project file (in XML format). To conveniently change or review the project settings, ARIA provides a Graphical User Interface (GUI) (Fig. 5). Entering the following command will start the GUI and load the project definition from project.xml (see Note 15):
aria2 --gui project.xml

Fig. 5. Graphical User Interface of ARIA 2.3 for project management, where data and protocol settings can be modified
graphically.
Important program and protocol parameters are listed in Table 1. Default settings are provided for the rest of the parameters.
2. General settings. Mandatory parameters are related to the gen-
eral infrastructure of the project, e.g., the name of the project,
the directory where an ARIA run will be stored (Working direc-
tory) or the prefix (File root) used by ARIA throughout the
project for naming PDB files.
3. Sequence definition. It is necessary to provide here the defini-
tion of the molecular system. A project file created during the
conversion step will already display the location of the XML
file of the molecular sequence. Otherwise, the Browse but-
ton assists in locating the sequence definition XML file. If the
sequence has to be read from a CCPN project, the user should
first locate the CCPN project in the CCPN data model panel
in the GUI. Then, the CCPN format has to be chosen for
the sequence, and hitting the Select button will open a pop-
up window displaying available molecular systems contained in
the CCPN project. This procedure is common to all steps
where import of data from a CCPN project is available.
4. Adding input data. Spectra and additional experimental data
can be added by clicking the Add button in the GUI menu.
When adding a spectrum, it is necessary to provide both the
location of the cross-peak list and the corresponding chemical
shift list. Additional experimental data can be supplied in the
form of CNS tbl formatted files or from a CCPN Project. In
the latter case, supplementary options are offered when the
distance restraints list is added. For instance, distance restraints
can be selected to enter the iterative protocol, where they will
be recalibrated and filtered like restraints derived from the
internal ARIA cross-peak assignment procedure. Otherwise,
they will be kept untouched by the program during the entire
protocol.
5. Adjusting data parameters. For each spectrum, the default frequency window sizes should be adjusted. To apply spin-diffusion correction, the user must enter the necessary parameters (molecular correlation time, spectrometer frequency and mixing time). For symmetric oligomers, the nature of each cross-peak in terms of possible chain assignment should also be specified here. This option is intended to make better use of the information arising from filtered/separated experiments recorded on asymmetrically labeled samples. Finally, for solid-state NMR spectra, we recommend specifying lower and upper distance bounds that will be applied to the cross-peak-derived restraints (see Note 10). If applicable, it is furthermore possible to pick an appropriate labeling scheme (see Note 14). In addition, parameters relative to dihedral angle restraints (see Note 8), RDCs (see Note 16) and J-couplings (see Note 17) should be defined.
6. Symmetry. ARIA can treat oligomers with C2, C3, C5 or D2
symmetry (see Note 11).
7. Specifying topology patches. By default, ARIA supports the following cases: disulfide bridges (unambiguous or ambiguous) (2), histidine protonation states, cis-prolines and tetrahedral coordination of zinc ions. In the case of nonstandard residues or other chemical compounds, manual intervention of the user is required (see Note 18).
8. Iteration parameters. The mode of restraint calibration has to
be specified: ratio of average (default), spin-diffusion correction
or fixed bounds (see Note 10). For every iteration, default val-
ues are provided for protocol parameters (Table 1) and the
network-anchoring thresholds (see Note 19).
9. Job Manager. Distributing structure calculations to multiple
processors speeds up the ARIA protocol. ARIA provides sup-
port for several job submission modes (see Note 20). The
appropriate command should be entered and the correct path
to the remote CNS program executable should be specified.
10. Structure calculation parameters. The remaining parameters
are related to the molecular dynamics simulated annealing, and
in particular the number of steps, restraint force constants and
potential shape (flat-bottom-harmonic-wall and log-harmonic).

3.8. Project Setup
At this point, the project must be set up with the following command:
aria2 --setup project.xml
The project is then validated and ARIA creates the directory
tree for the project (directory run1). As shown in Fig. 6, the results
of the successive iterations are stored in structures/, each iteration
having its own subdirectory, e.g., structures/it0/. Experimental
data files are copied into their respective directory in data/ (see
Note 21). Report files for the cross-peak filtering procedure are
stored in data/spectra/. All data, protocols, parameters, and topol-
ogy files used by CNS reside in the cns/ subdirectory.

3.9. Starting an ARIA Run
It is now possible to launch the ARIA calculation, using the following command:
aria2 project.xml
ARIA will then automatically perform all the steps listed in Subheadings 3.1–3.5. The main ARIA job will be executed on the local machine where it has been started. According to the job manager settings of the project, the structure calculations will be successively launched on the local processor (default behavior) or dispatched to a computer cluster (see Note 20).

[Figure: directory tree of the ARIA run directory run1/. data/ stores input data: begin, sequence (sequence definition, XML), templates (template structures, PDB), spectra (cross-peak and chemical shift lists, XML), ssbonds (disulfide bonds, TBL), hbonds (distance restraints for hydrogen bonds, TBL), jcouplings (J-coupling restraints, TBL), rdcs (residual dipolar coupling restraints, TBL), dihedrals (dihedral angle restraints, TBL), distances (user-provided distance restraints, TBL). structures/ holds one subdirectory per iteration, it0 (first iteration) to it8 (last iteration); it8 contains analysis/ (various analysis results performed by CNS), graphics files (PostScript), a MOLMOL file to visualize restraints and CNS files; refine/ holds water-refined structures and quality-check analyses. cns/ holds the files used for structure calculation: protocols (simulated-annealing protocols, CNS), data (input data for simulated annealing), toppar (topology and parameter definitions), begin (template structure for simulated annealing).]

Fig. 6. Illustration of the directory tree of an ARIA project and details about the content. Final results can be found in the
directories marked in gray.
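The command lines given in Subheadings 3.6 to 3.9 can be collected in a small Python wrapper, for instance to keep a record of the commands used in each run. This is a hypothetical convenience script, not part of ARIA: it assumes that the aria2 launcher is on the PATH and reuses the file names conversion.xml and project.xml from the text. Because conversion.xml and project.xml must be completed by hand between stages, each stage is invoked separately rather than chained.

```python
import shlex
import subprocess

# Stages of a first ARIA run, using only the aria2 commands given in this chapter.
# Manual steps happen in between: edit conversion.xml after "template", and
# complete project.xml (for instance with the "gui" stage) after "convert".
STAGES = {
    "template": ["aria2", "--convert", "-t", "conversion.xml"],
    "convert": ["aria2", "--convert", "conversion.xml"],
    "gui": ["aria2", "--gui", "project.xml"],
    "setup": ["aria2", "--setup", "project.xml"],
    "run": ["aria2", "project.xml"],
}


def run_stage(stage: str, dry_run: bool = True) -> str:
    """Print the command for one stage and, unless dry_run is set, execute it."""
    cmd = STAGES[stage]
    line = shlex.join(cmd)
    print(line)
    if not dry_run:
        # check=True raises CalledProcessError if aria2 exits with an error.
        subprocess.run(cmd, check=True)
    return line


run_stage("setup")               # only prints the command
# run_stage("run", dry_run=False)  # would start the iterative protocol
```

The dry-run default keeps the wrapper harmless when it is first tried out; only an explicit dry_run=False calls aria2.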

3.10. Checking the Results
In the next paragraphs, we list the points of interest when inspecting the calculation results, along with some guidance on how to correct input data and adapt the protocol parameters.

3.10.1. Convergence
The level of convergence indicates how well the protocol managed to find a well-defined structure and a consistent set of assignments. Convergence can be estimated with two indicators:
1. The average (and variation) of the total energy of the structure ensemble.
2. The conformational variance (or precision) of the structure ensemble, expressed as an RMSD.
A low average energy (see Note 22) and a high precision (RMSD < 1.5 Å) generally indicate that convergence has been reached. Other situations may stem from unsuitable protocol settings or from incomplete or low-quality data. The average energy can be found in structures/it8/analysis/energy.disp and the precision in the report file or in structures/it8/analysis/rmsdave.disp.
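The two convergence indicators can be combined into a quick check, sketched below in Python. The thresholds (RMSD below 1.5 Å, an energy variation of about 10%, roughly 1,000 kcal/mol per 100 residues) are the rules of thumb quoted in this subheading and in Note 22; treating "about 10%" as a hard cut-off is an assumption. Parsing energy.disp is left out because its exact layout is not described here, so the values are passed in by hand.

```python
from statistics import mean, stdev


def convergence_check(energies, rmsd, n_residues,
                      rmsd_cutoff=1.5, max_rel_spread=0.10,
                      energy_per_100_residues=1000.0):
    """Rule-of-thumb convergence check.

    energies : total energies (kcal/mol) of the final conformers
    rmsd     : ensemble precision in Å
    """
    e_mean = mean(energies)
    rel_spread = stdev(energies) / abs(e_mean)
    # Note 22: the average energy scales roughly linearly with system size.
    expected_scale = energy_per_100_residues * n_residues / 100.0
    return {
        "mean_energy": e_mean,
        "relative_spread": rel_spread,
        "expected_energy_scale": expected_scale,
        "precise": rmsd < rmsd_cutoff,
        "energy_spread_normal": rel_spread <= max_rel_spread,
        "converged": rmsd < rmsd_cutoff and rel_spread <= max_rel_spread,
    }


# Hypothetical values for a 100-residue protein (not taken from a real run).
report = convergence_check([980.0, 1000.0, 1020.0], rmsd=0.9, n_residues=100)
print(report["converged"], round(report["relative_spread"], 3))
```

The expected energy scale is only reported, not tested, since the text gives it as an order of magnitude rather than a limit.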

3.10.2. Automated Assignments
1. The report files listed in Subheading 3.5 provide analyses of all restraints and, in particular, show which restraints have been classified as violated. Restraints showing consistent violations greater than 0.1 Å should be inspected manually. Restraints with large upper-bound violations (5 Å) in the majority of the conformers (85%) usually result from incorrect assignments. Restraints detected as such should not be used in a later ARIA run, and the corresponding cross-peak should be removed from its spectrum. Other assignments can be considered reliable in a subsequent run.
2. Analyzing text files for violations and assignments can be a tedious task. ARIA therefore also provides ways to investigate them graphically (37). PostScript files describing the restraints, based on the RMS of violations, are generated automatically during a run. These values are displayed at the residue level, either as a profile along the protein sequence or as a contact map of the RMS of violations per residue pair (Fig. 7a). The contact map displays the sum of the RMS of violations per residue pair; in the profile, the sum of the RMS of violations per residue is plotted along the protein sequence. In addition, the program provides an interactive tool to browse assignments at the residue level (Peak map). A peak map can be viewed for all iterations in the ARIA GUI (Fig. 8). Clicking on a contact between residues i and j opens a pop-up window that shows a list of ARIA restraints involving atoms from both residues, where restraints are labeled. Such graphical representations can be useful to detect regions of the structure where violations are concentrated, indicating where restraints and assignments should be investigated more thoroughly.

Fig. 7. Per-residue quality plots. (a) Contact map displaying the sums of RMS deviations and a profile of the RMS deviations.
(b) WHAT IF score profiles along the sequence. The RMS deviations are plotted on a color scale (figure adapted from ref. 25).

Fig. 8. Interactive peak map. Right panel of the ARIA 2.3 GUI showing the interactive peak map at iteration 8 of an ARIA run.
Each pixel of the map located between residues i and j is clickable and opens an assignment report, which contains the
list of peaks that exist between residues i and j, along with their contributions (figure adapted from ref. 25).
3. Finally, the restraints and assignments exported to a CCPN project can later be investigated with the CCPNmr Analysis software. As illustrated in Fig. 9, CCPNmr Analysis offers utilities to inspect restraints through a customizable user interface. Moreover, a user can examine the proposed resonance assignments directly in a spectral display window, at the positions in frequency space where the peaks were picked.
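The violation analysis and the residue-level plots described above can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than ARIA's own code: the Restraint container is hypothetical, only upper-bound violations are considered, "consistent" is interpreted as "in more than half of the conformers", and each restraint's RMS violation is added to both residues it involves when building the profile. The thresholds 0.1 Å, 5 Å and 85% are those quoted in item 1.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Restraint:
    """Hypothetical container for one restraint evaluated on an ensemble."""
    rid: int
    res_i: int
    res_j: int
    upper: float            # upper bound (Å)
    distances: list[float]  # distance in each conformer (Å)


def violations(r: Restraint) -> list[float]:
    """Upper-bound violation in each conformer (0.0 when the bound is met)."""
    return [max(0.0, d - r.upper) for d in r.distances]


def rms_violation(r: Restraint) -> float:
    v = violations(r)
    return math.sqrt(sum(x * x for x in v) / len(v))


def classify(r: Restraint, inspect_cut=0.1, large_cut=5.0, large_frac=0.85) -> str:
    v = violations(r)
    n = len(v)
    if sum(x >= large_cut for x in v) / n >= large_frac:
        return "likely incorrect"   # candidate for removal from the spectrum
    if sum(x > inspect_cut for x in v) / n > 0.5:
        return "inspect manually"   # "consistent" taken here as a majority of conformers
    return "ok"


def violation_maps(restraints):
    """Sum of RMS violations per residue pair (contact map) and per residue (profile)."""
    pair_map = defaultdict(float)
    profile = defaultdict(float)
    for r in restraints:
        rms = rms_violation(r)
        i, j = sorted((r.res_i, r.res_j))
        pair_map[(i, j)] += rms
        profile[i] += rms
        if j != i:
            profile[j] += rms
    return dict(pair_map), dict(profile)


# Toy restraints with made-up distances for four conformers.
rs = [
    Restraint(1, 10, 25, 5.0, [4.2, 4.8, 4.5, 4.9]),
    Restraint(2, 12, 40, 5.0, [11.0, 10.5, 10.2, 10.8]),
    Restraint(3, 12, 13, 5.0, [5.3, 5.4, 4.9, 5.2]),
]
for r in rs:
    print(r.rid, classify(r))
pair_map, profile = violation_maps(rs)
print(pair_map)
print(profile)
```

Residues that accumulate large sums in the profile (residue 12 in this toy example) are the ones the contact map and peak map would point to for closer inspection.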

3.10.3. Quality Indices
The quality of structure ensembles as determined by independent structure validation is widely acknowledged as a good indicator of the performance of the structure calculation protocol and of the reliability of the structure. The application of NMR restraints for structure calculation may induce distortions in the geometry of the molecular structure. To detect such distortions, ARIA applies four major programs (PROCHECK (18), WHAT IF (19), ProSa (20) and MolProbity (21)) that aim at detecting outliers and abnormalities in macromolecular structure by comparing several characteristic geometric properties to a database of small molecules and/or high-resolution X-ray structures. The summary of all global quality indices is given in the quality_checks file. For thorough reviews of tools to evaluate the quality of NMR structures, we suggest consulting the following references (38, 39). We would like to stress here that, despite the apparently lower resolution of solid-state NMR data, a great deal of attention should still be given to the inspection of such quality checks. The following scores should be investigated further (see Note 23).

Fig. 9. Screenshot of CCPNmr Analysis windows showing the result of an ARIA run.
1. PROCHECK Ramachandran percentage. For typical NMR structures deposited in the PDB, 80% of the dihedral angles lie within the preferred region of the Ramachandran plot. For high-resolution NMR structures, a higher percentage is expected (90%).
2. WHAT IF Z-scores. WHAT IF results are presented in the form of overall Z-scores. In general, structures with Z-scores between −2 and +2 are considered to be within a normal range and thus good, while structures with Z-scores lower than −2 should be inspected further. Useful indicators of good quality are "Backbone conformation" and "Packing quality". The bump score reports the number of van der Waals violations per 100 residues.
3. WHAT-IF profiles. Recently, some studies have stressed that
global structural indicators are not sufficient to detect errors in
structures and suggested examining parameters on a per-residue
basis (40, 41). Such profiles for the WHAT-IF scores are pro-
duced by ARIA in the form of a PostScript file (Fig. 7b). Thus,
poor quality regions can be precisely identified (see Note 24).
4. MolProbity clashscore. This reports the number of overlaps greater than 0.4 Å per thousand atoms. For typical NMR structures deposited in the PDB, this score is generally high (>10). In our experience, applying the log-harmonic potential together with automated weight estimation significantly improves this situation.
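The reference ranges quoted in this list can be turned into a simple screening function, sketched below in Python. The function name and its inputs are hypothetical: the scores are assumed to be copied by hand from the quality_checks file (whose layout is not described here), and only Z-scores below −2 are flagged because, for these WHAT IF indicators, the text only asks for further inspection of low values.

```python
def quality_flags(rama_preferred_pct: float,
                  whatif_z: dict[str, float],
                  clashscore: float,
                  rama_typical: float = 80.0,
                  z_low: float = -2.0,
                  clash_high: float = 10.0) -> list[str]:
    """Return warnings for scores outside the ranges quoted in Subheading 3.10.3."""
    flags = []
    if rama_preferred_pct < rama_typical:
        flags.append(f"PROCHECK: only {rama_preferred_pct:.1f}% of dihedral angles in preferred regions")
    for name, z in whatif_z.items():
        if z < z_low:
            flags.append(f"WHAT IF {name}: Z-score {z:.2f} below {z_low}")
    if clashscore > clash_high:
        flags.append(f"MolProbity clashscore {clashscore:.1f} above {clash_high}")
    return flags


# Hypothetical scores for one ensemble.
for warning in quality_flags(92.0, {"Backbone conformation": -1.1, "Packing quality": -2.6}, 4.0):
    print(warning)
```

A global screen like this only complements the per-residue WHAT IF profiles, which remain the way to locate poor-quality regions.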

3.11. Preparing a New Run
To use the result of an ARIA run to further improve the structure, it may be necessary to correct the input data. At this stage, we recommend preparing a new ARIA project for better bookkeeping.
CCPNmr Analysis also offers a utility to manage the input and
output of successive ARIA runs (Fig. 9). The same CCPN project
can be used in multiple ARIA runs.

3.11.1. Correction of Input Data
1. Peaks identified as erroneous (noise peaks) should be deleted from the input data.
2. Automated assignments may be added to the initial cross-peak assignments, and incorrect assignments removed.
3. To improve convergence, reliable assignments can be used either as distance restraints or set individually as reliable in the input XML file.

3.11.2. Adjusting Parameters
In the new project file, protocol parameters may also be changed according to the result of a previous calculation. We list here the most important parameters that ought to be adapted.
1. The number of dynamics steps required for convergence is determined by the system size and the level of ambiguity or incompleteness of the input data. Default values work well for systems of up to about 100 residues studied with NOESY. However, for larger systems (e.g., symmetric oligomers) or when MAS solid-state NMR data are used, it may become necessary to increase the number of steps in the cooling stage of the simulated annealing protocol. On the one hand, the computation time per structure increases with the length of the dynamics. On the other hand, a slow-cooling strategy substantially increases the probability of success of the minimization protocol (see Note 25).
2. In case of poor convergence, one should also check frequency
window sizes. Narrow windows affect the completeness of a
cross-peak assignment. It may therefore be judicious to slightly
increase the individual window size (e.g., by 10%). Conversely,
when the final set of restraints is still largely ambiguous, it is
reasonable to reduce the window sizes.
3. Achieving convergence may also be hampered by a tight violation tolerance: if a large number of restraints are rejected, the data may become too sparse. Also, if an initial ensemble of template structures (from a previous calculation, for instance) is specified, the default tolerance must be reduced for the first iteration (e.g., 5 Å).

4. Notes

1. Data can be read from common NMR formats or via the CCPN program suite. Compliant formats are the following: Ansig (42), NMRDraw (43), NMRView (44), Pipp (45), Pronto (46), Sparky (T. D. Goddard and D. G. Kneller, University of California), XEasy (47), Diana (48), and NMRStar (49). PDB files with CNS (16), IUPAC (50), or DYANA (51) atom name nomenclatures can be read by ARIA. Restraint files should follow the CNS/XPLOR syntax and nomenclature. A mismatch in segment id (segid) between the restraints and the molecular definition is often a source of errors. ARIA internally follows the IUPAC (50) recommendations for atom name nomenclature. The most common naming problems are the following:
• The oxygens of the C-terminal carboxyl group are named O' and O''. O'' contains two apostrophes (ASCII 39), not a quotation mark (ASCII 34). The PDB uses O and OXT, or OT1 and OT2, instead.
• The N-terminal protons are H1, H2, and H3 (not HT1, HT2, and HT3).
• The protein backbone amide proton is called H (instead of HN).
• The glycine alpha protons are HA2 and HA3.
• Pseudoatoms (52) are not supported; r^-6 averaging is applied to equivalent groups.

2. ARIA supports CHHC/NHHC (53) and 2D/3D 13C–13C correlation spectra, i.e., PDSD (54, 55), DARR (56), and PAR (57).
3. We always use absolute values of peak sizes (volume or
intensity).
4. For C/NHHC experiments, cross-peak assignment is performed on the basis of 13C/15N chemical shifts, but the assignments are later transformed into proton–proton distance restraints.
5. Windows that are too narrow may lead to incomplete assignments, while large window sizes lead to highly ambiguous initial assignments, which are often the source of severe convergence issues during the ARIA protocol. Window sizes must therefore be chosen carefully: the ideal situation is reached when the window is large enough to contain the correct assignments without unduly increasing the number of assignment possibilities. Typical window sizes for NOESY spectra are 0.02 and 0.04 ppm for the direct and indirect proton dimensions, respectively, and 0.5 ppm for the heteronuclear dimensions. The maximum number of assignment possibilities (max_n) also affects the quality of the initial assignment, since some peaks that could be correctly assigned are rejected because of an excessively large number of assignment possibilities. Fossi et al. have developed a strategy, based on a pre-calculation analysis, for choosing optimal values of the window sizes and of max_n for a particular data set (58). The window size is directly linked to the line width of the spectra. Thus, for MAS solid-state NMR experiments, line broadening requires larger assignment windows. From the literature, typical values for proton-driven spin diffusion experiments or proton-mediated rare-spin correlation experiments are in the range of 0.25–0.6 ppm.
6. Atoms with missing resonance assignments will not be assigned to any cross-peak. In this case, the automatically generated assignments of cross-peaks that actually involve these atoms are almost certainly wrong. In our experience, the chemical shift list should be at least 90% complete to achieve reasonable convergence.
7. In addition to the standard ambiguity arising from chemical
shift degeneracy, symmetry degeneracy leads to a larger num-
ber of assignment possibilities.
8. Different methods can be used to estimate secondary struc-
tures. For instance, CSI (59), TALOS (60) or DANGLE (61)
predict likely values of phi/psi main-chain dihedral angles from
a list of chemical shift assignments. Such predictions can be
incorporated as dihedral angle restraints using a harmonic
square-well potential.
9. The theoretical cross-peak volume is then calculated as an r^-6 average over all pairwise contributions:

$$V_{IJ} = C\, n_I n_J\, \bar{d}_{IJ}^{-6}, \qquad \bar{d}_{IJ}^{-6} = \frac{1}{N_I N_J} \sum_{i=1}^{N_I} \sum_{j=1}^{N_J} d_{ij}^{-6} \qquad (16)$$

where I and J denote two groups of spins having n_I and n_J members, respectively. Introduction of the effective distance \bar{d}_{IJ} retains the functional form of Eq. 4. Equation 16 relies on a discrete slow-jump model in which spins I and J jump between N_I and N_J equilibrium sites, respectively (24).
10. For solid-state NMR data, this approximation is more severe. Because of additional effects that influence the relation between peak intensity and the actual distance (dipolar truncation, partial mobility, transfer efficiency), the calibration routine implemented in ARIA may not correctly model the cross-peak signals. However, the use of fixed distance bounds has been shown to be sufficient in numerous solid-state NMR studies. In fact, the calibration is less important, since the essential feature of the ambiguous distance restraint remains valid: if at least one of the assignment possibilities is shorter than the upper limit, the restraint is satisfied. Bounds can be estimated, for instance, from buildup curves. We recommend consulting the following references for details (7, 9, 55, 62, 63).
11. The packing restraint is intended to compensate for the lack of unambiguous intermonomer restraints in early ARIA iterations. If
convergence is achieved and a sufficient number of meaningful
intermonomer cross-peaks have been assigned, we advise not
to use this restraint.
12. Restraints discarded by the merging procedure are excluded
from the list.
13. To use the CCPNmr FormatConverter (17) for data conver-
sion with file formats not natively supported by ARIA, it is
necessary to use the following command
aria2 --convert_ccpn conversion.xml
14. If solid-state NMR experiments are performed on site-directed
13C-enriched samples (7, 64), it is necessary to specify the
appropriate labeling scheme, i.e., [1,3-13C]-glycerol and
[2-13C]-glycerol. ARIA automatically removes assignment
options that are not permitted by the labeling pattern, as first
described in the SOLARIA program (9). Alternatively,
CCPNmr Analysis provides routines to create ambiguous distance
restraints respecting the labeling patterns. Such restraints
can then be imported into ARIA.

15. A new project can also be created by choosing the New item in the Project menu of the GUI. Alternatively, the following command
aria2 --project_template project.xml
will create a new project file.
16. Residual dipolar coupling data can be incorporated as restraints
following two alternative approaches: direct (SANI) or indirect
(VEAN). For SANI, the user has to specify the rhombicity and
magnitude of the alignment tensor (65). Several methods exist
to predict these parameters, from the distribution of the RDC
values (66) or from the shape of the molecule (67). VEAN
uses intervector projection angle restraints which must be gen-
erated with a separate program (68).
17. The correlation between a measured three-bond J-coupling and the corresponding dihedral angle is modeled by the Karplus curve. Default values for the parameters of the Karplus curve are given for 3J(HN–Hα).
18. An MTF can be specified in the project file. Changes must be also
made to the CNS topology, linkage, and parameter files. Definitions
of the additional residues or compounds must be added to the
ARIA dictionary (files atomnames.xml and iupac.xml).
A detailed explanation is given on the ARIA Web site.
19. We recommend the use of the network-anchoring only for the
first 3 iterations. Too stringent thresholds or an application of
network-anchoring during more ARIA iterations may bias the
assignment process toward an incorrect structure (13).
20. Jobs can be submitted via ssh commands or with the follow-
ing batch queuing systems: PBS (69), SGE (70) or Condor
(71). Alternatively, CCPN users can submit their ARIA
calculation to the CCPNGrid portal server at http://www.webapps.ccpn.ac.uk/ccpngrid/.
21. Only local copies of data files are used for structure calculation.
Changes in the original files will thus become active only in the
next project setup.
22. For systems of about 100 residues, well-converged ensembles show average energies of the order of 1,000 kcal/mol. Normal energy variation is about 10%, and the total average energy scales approximately linearly with the system size.
23. Other methods are available to estimate the credibility of the structures, notably by scrutinizing the information content of the data (72). For instance, the completeness (73) of a restraint set provides insight into the local reliability of each structure. The completeness is the ratio between the number of observed restraints and the number of expected restraints. We recommend the program AQUA (73) to perform such an analysis. Moreover, several Web servers exist where a user can submit structures for quality checking and validation, e.g., PSVS (74) and Cing (75).
24. Comparing such quality profiles can be very helpful to detect reli-
able solutions when multiple conformations are obtained (13).
25. A recent study on the effect of the cooling rate of the simulated-
annealing with highly ambiguous data reported an increased
efficiency of slower cooling, e.g., 100,000 (equivalent Cartesian)
steps (28). The same order of value was successfully used to
determine the structure of the SH3 domain (9), Crh (10), and
αB-crystallin dimer from MAS solid-state NMR data (76). Note
that ARIA divides the number of steps for the torsion angle
phase by the value of the parameter TAD time-steps factor to
allow a larger time-step (default factor value is 9).
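Note 1 lists the atom names that most often cause trouble when restraint or structure files are read. The Python helper below applies exactly those renames and nothing else; it is a hypothetical pre-processing step (for instance for hand-written restraint files), not ARIA's internal converter. Glycine HA2/HA3 are not touched because the note does not say which alternative names need mapping.

```python
# Renaming rules taken from the list in Note 1 (PDB/CNS-style name -> IUPAC name).
AMIDE_AND_N_TERMINAL = {"HN": "H", "HT1": "H1", "HT2": "H2", "HT3": "H3"}
C_TERMINAL = {"O": "O'", "OXT": "O''", "OT1": "O'", "OT2": "O''"}


def to_iupac(atom_name: str, c_terminal_residue: bool = False) -> str:
    """Map a protein atom name to the IUPAC convention used internally by ARIA.

    Only the cases listed in Note 1 are handled; any other name is returned unchanged.
    """
    name = atom_name.strip()
    if c_terminal_residue and name in C_TERMINAL:
        # O'' is two ASCII apostrophes (chr(39) twice), never a double quote (chr(34)).
        return C_TERMINAL[name]
    return AMIDE_AND_N_TERMINAL.get(name, name)


for raw, cterm in [("HN", False), ("HT2", False), ("OXT", True), ("O", True), ("O", False)]:
    print(raw, "->", to_iupac(raw, cterm))
```

The carbonyl oxygen O is renamed only in the C-terminal residue; everywhere else it keeps its standard name.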
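Note 5 describes how frequency windows and max_n control the initial assignment. The Python sketch below shows the idea for a 2D proton–proton cross-peak only; it is a simplified illustration, not ARIA's implementation. It assumes the window acts as a tolerance |shift − peak position| ≤ window, uses the typical NOESY values 0.02/0.04 ppm from the note, and treats a peak with more than max_n possibilities as rejected. The chemical shifts in the usage example are made up.

```python
def candidates_2d(peak, shifts, windows=(0.02, 0.04), max_n=20):
    """Assignment possibilities for a 2D proton-proton cross-peak.

    peak    : (direct, indirect) peak position in ppm
    shifts  : {atom label: proton chemical shift in ppm}
    windows : frequency windows (ppm) for the direct and indirect dimensions,
              used here as tolerances |shift - peak position| <= window
    max_n   : maximum number of assignment possibilities kept for one peak
    Returns a list of (direct atom, indirect atom) pairs, or None when the
    peak is too ambiguous and is left out of the assignment.
    """
    direct, indirect = peak
    w_dir, w_ind = windows
    in_dir = [atom for atom, s in shifts.items() if abs(s - direct) <= w_dir]
    in_ind = [atom for atom, s in shifts.items() if abs(s - indirect) <= w_ind]
    pairs = [(a, b) for a in in_dir for b in in_ind if a != b]
    if len(pairs) > max_n:
        return None
    return pairs


# Toy chemical shift list (hypothetical values).
shifts = {"A10 HA": 4.31, "A10 HB": 1.40, "L12 HA": 4.33, "V20 HG1": 0.92}
print(candidates_2d((1.41, 4.32), shifts))
print(candidates_2d((1.41, 4.32), shifts, max_n=1))
```

Widening the windows adds possibilities (more ambiguity), narrowing them can drop the correct one, and a small max_n discards ambiguous peaks altogether: the trade-off discussed in Note 5.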
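Notes 9 and 10 rely on r^-6 sums. The Python sketch below evaluates Eq. 16 (effective distance and theoretical volume of two spin groups) and, separately, the distance of an ambiguous restraint. For the latter it assumes the usual sum form (sum of d^-6 over assignment possibilities, to the power −1/6), which this section does not write out; with that form the restraint distance is never longer than the shortest contribution, which is why one short enough possibility suffices to satisfy the restraint (Note 10). The calibration constant C and all distances below are arbitrary illustration values.

```python
def group_effective_distance(site_distances: list[float]) -> float:
    """Eq. 16: ((1 / (N_I N_J)) * sum_ij d_ij^-6)^(-1/6).

    site_distances holds the N_I * N_J distances between equilibrium sites.
    """
    mean_inv6 = sum(d ** -6 for d in site_distances) / len(site_distances)
    return mean_inv6 ** (-1.0 / 6.0)


def theoretical_volume(c: float, n_i: int, n_j: int, site_distances: list[float]) -> float:
    """Eq. 16: V_IJ = C n_I n_J d_IJ^-6."""
    return c * n_i * n_j * group_effective_distance(site_distances) ** -6


def ambiguous_distance(contributions: list[float]) -> float:
    """Assumed sum form of an ambiguous distance restraint: (sum_a d_a^-6)^(-1/6)."""
    return sum(d ** -6 for d in contributions) ** (-1.0 / 6.0)


def restraint_satisfied(contributions: list[float], upper: float) -> bool:
    return ambiguous_distance(contributions) <= upper


# A methyl group (n_I = 3, three equilibrium sites) and a single proton (n_J = 1).
print(theoretical_volume(1.0, 3, 1, [3.8, 4.5, 5.1]))
# One short possibility (3.0 Å) is enough to satisfy a 5.0 Å upper bound.
print(ambiguous_distance([3.0, 8.0, 9.5]), restraint_satisfied([3.0, 8.0, 9.5], 5.0))
```

Because of the steep r^-6 weighting, the shortest contribution dominates both quantities; the averaged form of Eq. 16 lies between the shortest and longest site distance, whereas the summed form lies below the shortest one.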
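Note 17 mentions the Karplus curve for 3J(HN–Hα). A minimal Python version is shown below, using the common form J = A cos²θ + B cos θ + C with θ = φ − 60°. The parameters A = 6.51, B = −1.76 and C = 1.60 Hz are those of Vuister and Bax (1993); they are an assumption here and not necessarily the defaults shipped with ARIA.

```python
import math

# Karplus parameters (Hz) of Vuister and Bax (1993) for 3J(HN-HA); assumed values,
# not necessarily the ARIA defaults.
A, B, C = 6.51, -1.76, 1.60


def karplus_3j_hnha(phi_deg: float, a: float = A, b: float = B, c: float = C) -> float:
    """Three-bond HN-HA coupling (Hz) predicted for a backbone phi angle (degrees)."""
    cos_t = math.cos(math.radians(phi_deg - 60.0))
    return a * cos_t ** 2 + b * cos_t + c


for phi in (-60.0, -120.0):   # helical and extended backbone conformations
    print(phi, round(karplus_3j_hnha(phi), 2))
```

Small couplings (about 4 Hz) point to helical φ values and large ones (about 10 Hz) to extended conformations, which is what makes 3J(HN–Hα) restraints informative about the backbone.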

Acknowledgments

This work was supported by the EU grants SPINE (QLG2-CT-2002-00988) and ExtendNMR (LSHG-CT-2005-018988). The Ministère de l'Enseignement Supérieur (ACI IMPBio, project ICMD-RMN) and Institut Pasteur are also acknowledged for financial support. The authors would like to thank Wolfgang Rieping, Michael Habeck, Aymeric Bernard, and the CCPN team for their active participation in the development of ARIA, as well as Anja Böckmann and Barth-Jan van Rossum for fruitful collaborations on solid-state NMR. Benjamin Bardiaux thanks Hartmut Oschkinat for support.

References
1. Wuthrich, K. (1986) NMR of Proteins and Nucleic Acids, Wiley-Interscience, New York.
2. Nilges, M. (1995) Calculation of protein structures with ambiguous distance restraints. Automated assignment of ambiguous NOE crosspeaks and disulphide connectivities. J. Mol. Biol. 245, 645–660.
3. Nilges, M. and O'Donoghue, S. I. (1998) Ambiguous NOEs and automated NOESY assignment. Prog. NMR Spec. 32, 107–139.
4. Linge, J. P., O'Donoghue, S. I., and Nilges, M. (2001) Automated assignment of ambiguous nuclear overhauser effects with ARIA. Methods Enzymol. 339, 71–90.
5. Linge, J. P., Habeck, M., Rieping, W., and Nilges, M. (2003) ARIA: automated NOE assignment and NMR structure calculation. Bioinformatics 19, 315–316.
6. Rieping, W., Habeck, M., Bardiaux, B., Bernard, A., Malliavin, T., and Nilges, M. (2007) ARIA2: automated NOE assignment and data integration in NMR structure calculation. Bioinformatics 23, 381–382.
7. Castellani, F., van Rossum, B., Diehl, A., Schubert, M., Rehbein, K., and Oschkinat, H. (2002) Structure of a protein determined by solid-state magic-angle-spinning NMR spectroscopy. Nature 420, 98–102.
8. Herrmann, T., Güntert, P., and Wüthrich, K. (2002) Protein NMR structure determination with automated NOE assignment using the new software CANDID and the torsion angle dynamics algorithm DYANA. J. Mol. Biol. 319, 209–227.
9. Fossi, M., Castellani, F., Nilges, M., Oschkinat, H., and van Rossum, B. (2005) SOLARIA: a protocol for automated cross-peak assignment and structure calculation for solid-state magic-angle spinning NMR spectroscopy. Angew. Chem. Int. Ed. Engl. 44, 6151–6154.
10. Loquet, A., Bardiaux, B., Gardiennet, C., Blanchet, C., Baldus, M., Nilges, M., Malliavin, T., and Böckmann, A. (2008) 3D Structure Determination of the Crh Protein from Highly Ambiguous Solid-State NMR Restraints. J. Am. Chem. Soc. 130, 3579–3589.
11. Manolikas, T., Herrmann, T., and Meier, B. (2008) Protein structure determination from (13)C spin-diffusion solid-state NMR spectroscopy. J. Am. Chem. Soc. 130, 3959–3966.
12. Wasmer, C., Lange, A., Melckebeke, H. V., Siemer, A., Riek, R., and Meier, B. (2008) Amyloid fibrils of the HET-s(218–289) prion form a beta solenoid with a triangular hydrophobic core. Science 319, 1523–1526.
13. Bardiaux, B., Bernard, A., Rieping, W., Habeck, M., Malliavin, T. E., and Nilges, M. (2009) Influence of different assignment conditions on the determination of symmetric homodimeric structures with ARIA. Proteins 75, 569–585.
14. Nilges, M., Bernard, A., Bardiaux, B., Malliavin, T., Habeck, M., and Rieping, W. (2008) Accurate NMR structures through minimisation of an extended hybrid energy. Structure 16, 1305–1312.
15. van Rossum, G., http://www.python.org/.
16. Brünger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T., and Warren, G. L. (1998) Crystallography and NMR system (CNS): A

Arendall, W. B., Snoeyink, J., Richardson, J. S., and Richardson, D. C. (2007) MolProbity: all-atom contacts and structure validation for proteins and nucleic acids. Nucleic Acids Res. 35, W375–383.
22. Folmer, R. H., Hilbers, C. W., Konings, R. N., and Nilges, M. (1997) Floating stereospecific assignment revisited: application to an 18 kDa protein and comparison with J-coupling data. J. Biomol. NMR 9, 245–258.
23. Duggan, B., Legge, G., Dyson, H., and Wright, P. (2001) SANE (Structure Assisted NOE Evaluation): an automated model-based approach for NOE assignment. J. Biomol. NMR 19, 321–329.
24. Görler, A. and Kalbitzer, H. R. (1997) Relax, a flexible program for the back calculation of NOESY spectra based on complete relaxation matrix formalism. J. Magn. Reson. 124, 177–188.
25. Linge, J., Habeck, M., Rieping, W., and Nilges, M. (2004) Correction of spin diffusion during iterative automated NOE assignment. J. Magn. Reson. 167, 334–342.
26. Mumenthaler, C. and Braun, W. (1995) Automated assignment of simulated and experimental NOESY spectra of proteins by feedback filtering and self-correcting distance geometry. J. Mol. Biol. 254, 465–480.
27. Stein, E. G., Rice, L. M., and Brünger, A. T. (1997) Torsion-angle molecular dynamics as a new efficient tool for NMR structure calculation. J. Magn. Reson. 124, 154–164.
28. Fossi, M., Oschkinat, F., Nilges, M., and Ball, L. (2005) Quantitative study of the effects of chemical shift tolerances and rates of SA cool-
new software suite for macromolecular struc- ing on structure calculation from automatically
ture determination. Acta Cryst. sect. D 54, assigned NOE data. J. Magn. Reson. 175,
905921. 92102.
17. Vranken, W. F., Boucher, W., Stevens, T. J., 29. Rieping, W., Habeck, M., and Nilges, M. (2005)
Fogh, R. H., Pajon, A., Llinas, M., Ulrich, E. Modeling errors in NOE data with a log-normal
L., Markley, J. L., Ionides, J., and Laue, E. D. distribution improves the quality of NMR struc-
(2005) The CCPN data model for NMR spec- tures. J. Am. Chem. Soc. 127, 1602616027.
troscopy: development of a software pipeline. 30. Rieping, W., Habeck, M., and Nilges, M.
Proteins 59, 687696. (2005) Inferential Structure Determination.
18. Laskowski, R. A., MacArthur, M. W., Moss, D. Science 309, 303306.
S., and Thornton, J. M. (1993) PROCHECK: 31. Habeck, M., Rieping, W., and Nilges, M.
a program to check the stereochemical quality (2006) Weighting of experimental evidence in
of protein structures. J. Appl. Cryst. 26, macromolecular structure determination. Proc.
283291. Natl. Acad. Sci. USA 103, 17561761.
19. Vriend, G. (1990) WHAT IF: a molecular 32. Nilges, M. (1993) A calculation strategy for the
modeling and drug design program. J. Mol. structure determination of symmetric dimers
Graph. 8, 5256. by 1 H NMR. Proteins 17, 297309.
20. Sippl, M. J. (1993) Recognition of errors in 33. Linge, J. P., Williams, M. A., Spronk, C. A.,
three-dimensional structures of proteins. Bonvin, A. M., and Nilges, M. (2003)
Proteins Struct. Funct. Genet. 17, 355362. Refinement of protein structures in explicit sol-
21. Davis, I. W., Leaver-Fay, A., Chen, V. B., Block, vent. Proteins Struct. Funct. Genet. 20,
J. N., Kapral, G. J., Wang, X., Murray, L. W., 496506.
482 B. Bardiaux et al.

34. Linge, J. P. and Nilges, M. (1999) Influence of non-bonded parameters on the quality of NMR structures: a new force-field for NMR structure calculation. J. Biomol. NMR 13, 51-59.
35. Nederveen, A., Doreleijers, J., Vranken, W., Miller, Z., Spronk, C., Nabuurs, S., Güntert, P., Livny, M., Markley, J., Nilges, M., Ulrich, E., Kaptein, R., and Bonvin, A. M. (2005) RECOORD: a REcalculated COORdinates Database of 500+ proteins from the PDB using restraints from the BioMagResBank. Proteins 59, 662-672.
36. The World Wide Web Consortium (2008), Extensible Markup Language (XML) 1.0 (Fifth Edition), http://www.w3.org/TR/xml/.
37. Bardiaux, B., Bernard, A., Rieping, W., Habeck, M., Malliavin, T., and Nilges, M. (2008) Graphical analysis of NMR structural quality and interactive contact map of NOE assignments in ARIA. BMC Struct. Biol. 8, 30-34.
38. Spronk, C. A. E. M., Nabuurs, S. B., Krieger, E., Vriend, G., and Vuister, G. W. (2004) Validation of protein structures derived by NMR spectroscopy. Progress in Nuclear Magnetic Resonance Spectroscopy 45, 315-337.
39. Saccenti, E. and Rosato, A. (2008) The war of tools: how can NMR spectroscopists detect errors in their structures? J. Biomol. NMR 40, 251-261.
40. Nabuurs, S., Krieger, E., Spronk, C., Nederveen, A., Vriend, G., and Vuister, G. (2005) Definition of a new information-based per-residue quality parameter. J. Biomol. NMR 33, 123-134.
41. Nabuurs, S., Spronk, C., Vuister, G., and Vriend, G. (2006) Traditional biomolecular structure determination by NMR spectroscopy allows for major errors. PLoS Comput. Biol. 2, e9.
42. Kraulis, P., Domaille, P. J., Campbell-Burk, S. L., van Aken, T., and Laue, E. D. (1994) Solution structure and dynamics of ras p21.GDP determined by heteronuclear three- and four-dimensional NMR spectroscopy. Biochemistry 33, 3515-3531.
43. Delaglio, F., Grzesiek, S., Vuister, G. W., Zhu, G., Pfeifer, J., and Bax, A. (1995) NMRPipe: a multidimensional spectral processing system based on UNIX pipes. J. Biomol. NMR 6, 277-293.
44. Johnson, B. A. and Blevins, R. A. (1994) NMRView: A computer program for the visualization and analysis of NMR data. J. Biomol. NMR 4, 603-614.
45. Garrett, D., Powers, R., Gronenborn, A., and Clore, G. (1991) A common sense approach to peak picking in two-, three- and four-dimensional spectra using automatic computer analysis of contour diagrams. J. Magn. Reson. 95, 214-220.
46. Kjær, M., Andersen, K. V., and Poulsen, F. M. (1994) Automated and semiautomated analysis of homo- and heteronuclear multidimensional nuclear magnetic resonance spectra of proteins: the program PRONTO. Methods Enzymol. 239, 288-308.
47. Bartels, C., Xia, T.-H., Billeter, M., Güntert, P., and Wüthrich, K. (1995) The program XEASY for computer-supported NMR spectral analysis of biological macromolecules. J. Biomol. NMR 5, 1-10.
48. Güntert, P., Braun, W., and Wüthrich, K. (1991) Efficient computation of three-dimensional protein structures in solution from nuclear magnetic resonance data using the program DIANA and the supporting programs CALIBA, HABAS and GLOMSA. J. Mol. Biol. 217, 517-530.
49. Hall, S. R. and Cook, A. P. F. (1995) STAR dictionary definition language: Initial specification. J. Chem. Inf. Comput. Sci. 35, 819-825.
50. Markley, J. L., Bax, A., Arata, Y., Hilbers, C. W., Kaptein, R., Sykes, B. D., Wright, P. E., and Wüthrich, K. (1998) Recommendations for the presentation of NMR structures of proteins and nucleic acids. J. Mol. Biol. 280, 933-952.
51. Güntert, P., Mumenthaler, C., and Wüthrich, K. (1997) Torsion Angle Dynamics for NMR Structure Calculation with the New Program DYANA. J. Mol. Biol. 273, 283-298.
52. Wüthrich, K., Billeter, M., and Braun, W. (1983) Pseudo-structures for the 20 common amino acids for use in studies of protein conformations by measurements of intramolecular proton-proton distance constraints with nuclear magnetic resonance. J. Mol. Biol. 169, 949-961.
53. Lange, A., Luca, S., and Baldus, M. (2002) Structural constraints from proton-mediated rare-spin correlation spectroscopy in rotating solids. J. Am. Chem. Soc. 124, 9704-9705.
54. Szeverenyi, N., Sullivan, M., and Maciel, G. (1982) Observation of spin exchange by two-dimensional Fourier transform 13C cross polarization-magic-angle spinning. J. Magn. Reson. 47, 462-475.
55. Castellani, F., van Rossum, B., Diehl, A., Rehbein, K., and Oschkinat, H. (2003) Determination of solid-state NMR structures of proteins by means of three-dimensional 15N-13C-13C dipolar correlation spectroscopy and chemical shift analysis. Biochemistry 42, 11476-11483.
56. Takegoshi, K., Nakamura, S., and Terao, T. (2003) 13C-1H dipolar-driven 13C-13C recoupling without 13C rf irradiation in nuclear magnetic resonance of rotating solids. J. Chem. Phys. 118, 2325-2341.
57. Lewandowski, J. R., De Paëpe, G., Eddy, M. T., and Griffin, R. G. (2009) (15)N-(15)N proton assisted recoupling in magic angle spinning NMR. J. Am. Chem. Soc. 131, 5769-5776.
58. Fossi, M., Linge, J., Labudde, D., Leitner, D., Nilges, M., and Oschkinat, H. (2005) Influence of chemical shift tolerances on NMR structure calculations using ARIA protocols for assigning NOE data. J. Biomol. NMR 31, 21-34.
59. Wishart, D. S. and Sykes, B. D. (1994) The 13C chemical-shift index: a simple method for the identification of protein secondary structure using 13C chemical-shift data. J. Biomol. NMR 4, 171-180.
60. Cornilescu, G., Delaglio, F., and Bax, A. (1999) Protein backbone angle restraints from searching a database for chemical shift and sequence homology. J. Biomol. NMR 13, 289-302.
61. Cheung, M.-S., Maguire, M. L., Stevens, T. J., and Broadhurst, R. W. (2010) DANGLE: A Bayesian inferential method for predicting protein backbone dihedral angles and secondary structure. J. Magn. Reson. 202, 223-233.
62. Loquet, A., Gardiennet, C., and Böckmann, A. (2010) Protein 3D structure determination by high-resolution solid-state NMR. Comptes Rendus Chimie 13, 423-430.
63. Gardiennet, C., Loquet, A., Etzkorn, M., Heise, H., Baldus, M., and Böckmann, A. (2008) Structural constraints for the Crh protein from solid-state NMR experiments. J. Biomol. NMR 40, 239-250.
64. LeMaster, D. M. and Kushlan, D. M. (1996) Dynamical mapping of E. coli thioredoxin via 13C NMR relaxation analysis. J. Am. Chem. Soc. 118, 9255-9264.
65. Tjandra, N., Garrett, D. S., Gronenborn, A. M., Bax, A., and Clore, G. M. (1997) Defining long range order in NMR structure determination from the dependence of heteronuclear relaxation times on rotational diffusion anisotropy. Nature Struct. Biol. 4, 443-449.
66. Clore, G., Gronenborn, A., and Bax, A. (1998) A robust method for determining the magnitude of the fully asymmetric alignment tensor of oriented macromolecules in the absence of structural information. J. Magn. Reson. 133, 216-221.
67. Zweckstetter, M. and Bax, A. (2000) Prediction of sterically induced alignment in a dilute liquid crystalline phase: Aid to protein structure determination by NMR. J. Am. Chem. Soc. 122, 3791-3792.
68. Meiler, J., Blomberg, N., Nilges, M., and Griesinger, C. (2000) A new approach for applying residual dipolar couplings as restraints in structure calculations. J. Biomol. NMR 16, 245-252.
69. Jones, J. P. (2002) PBS: portable batch system, Beowulf cluster computing with Linux, MIT Press, Cambridge, MA, USA, 369-390.
70. Gentzsch, W. (2001) Sun Grid Engine: Towards creating a compute power grid, CCGRID '01: Proceedings of the 1st International Symposium on Cluster Computing and the Grid, IEEE Computer Society, Washington, DC, USA, 35.
71. Thain, D., Tannenbaum, T., and Livny, M. (2005) Distributed computing in practice: the Condor experience. Concurr. Comput.: Pract. Exper. 17, 323-356.
72. Nabuurs, S., Spronk, C., Krieger, E., Maassen, H., Vriend, G., and Vuister, G. (2003) Quantitative evaluation of experimental NMR restraints. J. Am. Chem. Soc. 125, 12026-12034.
73. Doreleijers, J. F., Raves, M. L., Rullmann, T., and Kaptein, R. (1999) Completeness of NOEs in protein structure: a statistical analysis of NMR data. J. Biomol. NMR 14, 123-132.
74. Bhattacharya, A., Tejero, R., and Montelione, G. T. (2007) Evaluating protein structures determined by structural genomics consortia. Proteins 66, 778-795.
75. Doreleijers, J. F., Vranken, W. F., Schulte, C., Lin, J., Wedell, J. R., Penkett, C. J., Vuister, G. W., Vriend, G., Markley, J. L., and Ulrich, E. L. (2009) The NMR restraints grid at BMRB for 5,266 protein and nucleic acid PDB entries. J. Biomol. NMR 45, 389-396.
76. Jehle, S., Rajagopal, P., Bardiaux, B., Markovic, S., Kühne, R., Stout, J. R., Higman, V. A., Klevit, R. E., van Rossum, B.-J., and Oschkinat, H. (2010) Solid-state NMR and SAXS studies provide a structural basis for the activation of alphaB-crystallin oligomers. Nat. Struct. Mol. Biol. 17, 1037-1042.