Salvatore Lanzavecchia, Francesca Cantele, Michael Radermacher and Pier Luigi Bellon - Symmetry Embedding in The Reconstruction of Macromolecular Assemblies Via The Discrete Radon Transform

Journal of Structural Biology 137 (2002) 259–272 www.academicpress.com
Symmetry embedding in the reconstruction of macromolecular assemblies via the discrete Radon transform
Salvatore Lanzavecchia a, Francesca Cantele a, Michael Radermacher b, and Pier Luigi Bellon a,*
a Dipartimento di Chimica Strutturale, Università degli Studi, via Venezian 21, 20133 Milan, Italy
b Max Planck Institut für Biophysik, Abt. Strukturbiologie, Heinrich-Hoffmann-Strasse 7, 60528 Frankfurt, Germany
Received 29 August 2001; and in revised form 16 February 2002
Abstract

In this paper we discuss the embedding of symmetry information in an algorithm for three-dimensional reconstruction based on the discrete Radon transform. The original algorithm was designed for randomly oriented and, in principle, asymmetric particles. The expanded version presented here covers all symmetry point groups that can be exhibited by macromolecular protein assemblies. The orientations of all symmetry-equivalent projections, based on the orientation of an experimental projection, are obtained using global group operators. Further, an improved interpolation scheme for the recovery of the three-dimensional discrete Radon transform has been designed for greater computational efficiency. The algorithm has been tested on phantom structures as well as on real data from a virus structure possessing icosahedral symmetry. © 2002 Elsevier Science (USA). All rights reserved.
Keywords: Symmetry (of protein assemblies); Point groups (of protein assemblies); Radon transform; Three-dimensional reconstruction; Virus reconstruction
* Corresponding author. Fax: +39-02-5031-4454. E-mail address: pierluigi.bellon@unimi.it (P. Luigi Bellon).
1047-8477/02/$ - see front matter © 2002 Elsevier Science (USA). All rights reserved. PII: S1047-8477(02)00004-7

1. Introduction

We have recently described a fast and accurate algorithm for reconstructing macromolecular protein assemblies from projections with random orientations extracted from electron micrographs (Lanzavecchia et al., 1999). The algorithm is based on the calculation of a discrete approximation to the three-dimensional Radon transform (RT; Radon, 1917), starting from discrete approximations of the two-dimensional RT of projections. The three-dimensional discrete Radon transform (DRT) thus obtained is used in an inversion process to compute a three-dimensional representation of the object which, in electron microscopy, is an electron-density map.

Before we introduce the subject of this paper, it is worth quoting Deans (1993) on the possibility of identifying a function $f$ and its RT $\check{f}$ with physical quantities: "Since $\check{f}$ is identified with a measured quantity, it only represents an approximation. Even worse, since the probe must be applied a finite number of times, $\check{f}$ is not even approximated in a continuous function. Consequently, any determination of $f$ is, at best, only an approximation to the desired distribution and at worst bears no resemblance to the desired distributions. ... Keep in mind that many difficulties are associated with reconstruction problems simply because the function $\check{f}$ is not known exactly."

This statement makes us aware of two difficulties. Since $f$ represents a two-dimensional image or a three-dimensional density distribution to be represented in Cartesian arrays, difficulties arise from the need to sample a physical quantity in polar or spherical spaces. For this reason, one or more interpolation stages are needed to recover a discrete Radon transform. The problem becomes more serious in three-dimensional electron microscopy, where the experimental data are projections with random orientations (see, e.g., Crowther et al., 1970a). Approximations to $f$, however, are accurate enough to allow physicians to base their diagnoses upon CT or NMR images as well
as astrophysicists, geologists, and structural biologists to conduct their studies.

The reconstruction algorithm is based on the fact that a central section theorem also exists for RTs, similar to the central section theorem of Fourier transforms. The radial lines of the 2D RT (or sinogram) of a projection represent radial lines in the three-dimensional RT, although this is not generally true for the DRT. If the 3D Radon space is represented in a system of three orthogonal axes ($p$, $\phi$, $\theta$; see Eq. (1) below), a sinogram describes a plane parallel wave whose equation is determined by the projecting direction. In the discrete domain, it is the difference between the sampling grids in two and three dimensions that makes it necessary to use an interpolation to fill the three-dimensional DRT array.

Different versions of the algorithm have been developed. Two versions for the recovery of the DRT, described previously, were based either on a nearest-neighbor interpolation with averaging or on a more elaborate interpolation scheme in which the radial lines in the three-dimensional DRT were obtained as linear combinations of a neighborhood of sinogram lines, weighted by their angular distances from the radial line in the DRT. In the inversion step, from the DRT to real space, two variations have also been implemented, one of which uses a weighted backprojection (Gilbert, 1972; Radermacher, 1997) and the other a direct Fourier method (DFM; Lanzavecchia and Bellon, 1998). Of the four possible combinations, only three have been used so far: the finer interpolation scheme with both inversion methods, and the nearest-neighbor interpolation with the weighted backprojection inversion only. The algorithms can exploit the properties of the DRT to reject nontomographic noise (NTN; Lanzavecchia and Bellon, 1996) in a process which fills all radial lines of the transform array in cases where the experimental data are not sufficient.
If the DRT is stored in a 3D array representing a sampling in ($p$, $\phi$, $\theta$) coordinates, the values in each plane of the array must fulfill certain constraints due to the continuity of the transform (Lanzavecchia et al., 1999). If the DRT array is obtained from a set of experimental projections, the constraints are often not satisfied, mostly because of noise. The NTN filter imposes consistency on the transform by suppressing the part of the noise which is not consistent with a tomographic experiment. Additional inconsistencies may arise if data are missing. The filter can also be used to fill an incomplete array by enforcing consistency (Lanzavecchia and Bellon, 1996). Combined with a replacement of experimental data and used iteratively, this is a special case of the projection onto convex sets algorithm (POCS; Carazo, 1992; Carazo and Carracosa, 1987; Sezan, 1982).

Very often macromolecular protein assemblies are symmetric and, as a consequence, one projection can be observed along a number of different directions or, in other words, is representative of a symmetry-related series of projections. Using the symmetry increases the number of projections available for the reconstruction. We have implemented the symmetry operations as part of the reconstruction algorithm and modified the interpolation scheme for higher computational efficiency. The accuracy of the new algorithms has been tested with phantom data, and their performance has been tested on an icosahedral structure reconstructed from real data.
2. Symmetry of macromolecular protein assemblies

Protein molecules are characterized by inherent handedness. If they assemble to form a crystal or an isolated assembly, the resulting entity maintains this chiral property. For this reason we will only observe symmetry groups in which there are no mirror planes, inversion centers, or improper axes (two-step operations: rotation around an axis followed by reflection through an orthogonal plane), all of which would reverse the handedness of the protein structure. There exist 230 possible space groups. Discarding those containing inversion centers, mirror planes, or improper axes, one can easily isolate a subset of only 65 groups representing all possible ways in which protein molecules can crystallize. In isolated assemblies, whose symmetry is described by means of point groups, only axial symmetry can be observed, with axes of various orders.

For physical reasons, an axis of symmetry cannot pass through the protein electron density. This statement, perhaps obvious, can be demonstrated as follows. Suppose that a symmetry axis passes through an atom inside a protein; then this atom will be reproduced $n$ times, $n$ being the axis order. Yet its neighbor atoms will also be reproduced $n$ times; that is, all atoms of the protein chain would be reproduced symmetrically around that axis $n$ times. This would cause interpenetration of $n$ density distributions in a short range of distances around the atom hit by the axis. The same reasoning holds true for an axis intersecting or passing near a chemical bond. Thus, an axis through a three-dimensional electron density indicates either close contact between equivalent molecules or reconstruction artifacts. Point groups are abstract entities which can be dressed by an equivalent moiety (an isolated molecule or a molecular assembly), regularly repeated a number of times equal to the group order.
Fig. 1. Stereo views of phantom structures exhibiting different symmetries. Each structure is obtained by replicating a random knot with all symmetry operators of a group. The number of knots in each structure is equal to the group order. (a) A completely asymmetric structure belonging to the $C_1$ group of order 1; (b, c) two structures with $C_7$ and $D_7$ symmetry (group orders 7 and 14, respectively); (d, e, f) structures belonging to the $T$, $O$, and $I$ subgroups (group orders 12, 24, and 60, respectively).

Here is a short list, in Schönflies notation, of point groups which can be observed in isolated protein assemblies (for Schönflies symbols and a thorough description of point group theory, see Cotton, 1964, and Hahn, 1992). The symmetries of some simulated macromolecular assemblies, used in our tests, are shown in Fig. 1. The phantoms are composed of one or several copies of an asymmetric random knot (Bellon et al., 1998), whose number depends upon the group order.

a. The $C_1$ or identity group characterizes individual proteins or macromolecular assemblies with no symmetry at all, as is the case, for example, for the ribosome;
b. The $C_n$ or cyclic groups possess only one symmetry axis of order $n$. In this case, $n$ equivalent protein moieties assemble around the axis. In principle, $n$ can assume any value, and the moieties are reproduced around the axis with a constant angular separation equal to $2\pi/n$;
c. The $D_n$ or dihedral groups contain a $C_n$ axis plus $C_2$ axes orthogonal to it. One can imagine obtaining an assembly with this kind of symmetry by first obtaining $n$ molecules around the main $C_n$ axis and then rotating the result by $\pi$ (the $C_2$ operation) around an axis orthogonal to $C_n$. Thus, an assembly possessing $D_n$ symmetry will contain $2n$ equivalent moieties (note that because of the $C_n$ axis, the $C_2$ axis is replicated too).
d. The $T$ subgroup of the tetrahedral $T_d$ group possesses symmetry operators which are easily visualized in a cube inscribing the tetrahedron; in this construction, the edges of the tetrahedron lie along the diagonals of the cube faces. The $T$ group has four threefold axes $C_3$ oriented as the main or body diagonals of the cube, plus twofold axes orthogonal to the cube faces. A single molecule is reproduced three times by a $C_3$ around a vertex of the tetrahedron, and any two $C_2$ axes (orthogonal to the cube faces) operating in sequence will dress the remaining vertices. A $T$ assembly will therefore contain 3 × 2 × 2 = 12 equivalent moieties;
e. The $O$ subgroup of the cubic or octahedral $O_h$ group (h stands for holohedral, $O_h$ comprising a center and mirror planes) possesses a set of four $C_3$ axes coming out of the cube vertices plus three $C_4$ axes orthogonal to the cube faces and six $C_2$ axes cutting opposite pairs of edges. A set of three moieties obtained around a $C_3$ axis can be reproduced by a $C_4$ axis to dress four vertices of a cube face and finally reproduced on the parallel face by a $\pi$ rotation orthogonal to the $C_4$ axis
used previously. The subgroup therefore generates 3 × 4 × 2 = 24 equivalent moieties;
f. The $I$ subgroup of the icosahedral group $I_h$ possesses 6 $C_5$ axes (half the number of icosahedron vertices, 12), plus 10 $C_3$ axes (half the number of faces, 20) and, finally, 15 $C_2$ axes (half the number of edges, 30). The symmetry of the icosahedron is such that the 20 faces can be grouped in 5 subsets of 4, each of which lies on the faces of an inscribing tetrahedron. Furthermore, the 15 $C_2$ axes can be grouped in 5 groups of 3 orthogonal axes. The latter property offers a convenient way to represent an icosahedral structure. A moiety is reproduced three times by a $C_3$ operation to dress an icosahedral face, and this set is copied by a $C_2$ onto another face sharing an edge with the first one. The set of 6 moieties is then duplicated by another $C_2$ axis orthogonal to the first one and, finally, the 12 moieties are replicated five times by a $C_5$. The order of the $I$ subgroup is therefore 3 × 2 × 2 × 5 = 60.

Point groups can be regarded as collections of symmetry elements as well as of operators. Since we are restricted to collections of symmetry axes, the corresponding operators merely consist of rotation matrices and their products. As is true for matrix products in general, the operations of symmetry elements do not commute: point groups are not Abelian, with the exception of the cyclic groups $C_n$.

As stated above, the presence of symmetry aids the reconstruction of a macromolecular protein assembly from its projections. For a given projection, $n - 1$ equal projections exist, $n$ being the group order, unless the projecting direction is a special one. These projections cross the original one along $n - 1$ common lines. In icosahedral structures ($n = 60$), symmetry-generated projections cross a given one along 59 common lines, whose positions can be used to determine the Euler angles of the projecting direction (see, e.g., Fuller et al., 1996).
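The group orders quoted in this section can be cross-checked numerically by closing a few rotation matrices under multiplication. The sketch below is only an illustration: the helper names (`rotation`, `close_group`), the choice of generators, and the axis orientations are assumptions made here, and they are not the global operators of Appendix A.

```python
import math

def rotation(axis, angle):
    """Counterclockwise rotation by `angle` about `axis` (Rodrigues formula)."""
    x, y, z = axis
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm
    c, s = math.cos(angle), math.sin(angle)
    t = 1.0 - c
    return (
        (t * x * x + c, t * x * y - s * z, t * x * z + s * y),
        (t * x * y + s * z, t * y * y + c, t * y * z - s * x),
        (t * x * z - s * y, t * y * z + s * x, t * z * z + c),
    )

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )

def key(m, digits=6):
    """Rounded, hashable fingerprint used to recognise numerically equal matrices."""
    return tuple(round(v, digits) + 0.0 for row in m for v in row)

def close_group(generators):
    """All distinct products of the generators (breadth-first closure)."""
    identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    found = {key(identity): identity}
    frontier = [identity]
    while frontier:
        next_frontier = []
        for m in frontier:
            for g in generators:
                product = matmul(m, g)
                k = key(product)
                if k not in found:
                    found[k] = product
                    next_frontier.append(product)
        frontier = next_frontier
    return list(found.values())

TAU = (1.0 + math.sqrt(5.0)) / 2.0  # golden ratio
Z, X, DIAGONAL = (0, 0, 1), (1, 0, 0), (1, 1, 1)

# Assumed generator sets, with twofold or fourfold axes along the coordinate axes.
generators = {
    "C7": [rotation(Z, 2 * math.pi / 7)],
    "D7": [rotation(Z, 2 * math.pi / 7), rotation(X, math.pi)],
    "T": [rotation(DIAGONAL, 2 * math.pi / 3), rotation(Z, math.pi)],
    "O": [rotation(DIAGONAL, 2 * math.pi / 3), rotation(Z, math.pi / 2)],
    # Fivefold axis placed in the yz plane at arctan(TAU) from z.
    "I": [rotation((0, TAU, 1), 2 * math.pi / 5), rotation(Z, math.pi)],
}
orders = {name: len(close_group(gens)) for name, gens in generators.items()}
print(orders)
```

Matrices are compared after rounding to six decimals, which is adequate for these small groups but would need more care for very high axis orders.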
3. Symmetry embedding in the reconstruction algorithm

A reconstruction program is fed with a set of projections with known projecting directions. The latter are determined with respect to a defined orientation of the molecule in a reference system. A projecting direction is specified by two Euler angles $\alpha$ and $\beta$ ($\alpha$ = longitude and $\beta$ = latitude), which define the orientation of the projection axis in spherical coordinates. The third Euler angle $\gamma$ describes a rotation of the structure around that axis. From the Euler angles a compound rotation matrix, $R = Z(\alpha)\,Y(\beta)\,Z(\gamma)$, can be computed ($Z$ and $Y$ being counterclockwise rotation matrices around $z$ and $y$, respectively, whose angular arguments are in parentheses). Conversely, it is easy to go back from $R$ to the Euler angles. The matrix $R$ can be used to rotate a three-dimensional density map in such a way that the projection labeled by the three Euler angles can be obtained by projecting the density along $z$.

Symmetry is imposed on the reconstruction by using the same projection labeled with different triads ($\alpha$, $\beta$, $\gamma$). Each triad represents one of the equivalent projecting directions dictated by the point group. Given an Euler triad, all symmetry-equivalent triads are found by applying the group operators to $R$. To do this we have devised global operators $G$, different for each group. A global operator is a matrix which varies according to the values of one or more indices (see Appendix A). Fig. 2a shows a $C_3$ phantom structure for which a given projection (Fig. 2b) is identical to two others. The orientations of the latter are easily determined by the global group operator once the orientation of the first projection is known. In this way, the 2D FT of the projection fits three central sections of the three-dimensional FT (Fig. 2d). Correspondingly, the sinogram (Fig. 2c) fits three times in the RT (Fig. 2e).

The orientation of the symmetry axes with respect to the reference system is arbitrary. It is convenient, however, to choose it in such a way that the rotation matrices are evaluated in the simplest way. This can be achieved by letting one or more symmetry axes of the molecular assembly coincide with the orthogonal axes of the reference system. In other words, the structure should be oriented in a canonical way, specific for each symmetry group. Thus, the principal axes of $C_n$ structures are conveniently oriented along $z$. The same is true for $D_n$ structures, in which a twofold axis, orthogonal to the principal one, is set along $x$. A preliminary model with $C_n$ or $D_n$ symmetry, oriented at random, is easily brought to its canonical orientation. For a distribution possessing symmetry, a nondegenerate axis of inertia is oriented along the principal symmetry axis, which can be brought into coincidence with one coordinate axis.
A simple and essentially automatic algorithm to achieve this has recently been described (Lanzavecchia et al., 2001). A canonical orientation can also be unambiguously defined for completely asymmetric structures (group $C_1$) because, in this case, there are three nondegenerate axes of the inertial ellipsoid. For density distributions belonging to the $T$, $O$, and $I$ symmetry groups, the ellipsoid is a sphere, so that the inertial approach becomes meaningless. In these cases, canonical orientations are defined in a conventional way. Once this has been chosen, as described in Fig. 3, a preliminary model can be brought to the appropriate orientation by trial-and-error methods (see Frank, 1996, and references therein). In the canonical orientation of the $T$ group, the twofold axes are set along the coordinate axes and the threefold axes lie along the diagonals of alternate octants (Fig. 3a). The same is true for the $O$ group, once the fourfold axes are set along the coordinate axes (Fig. 3b). In the icosahedral subgroup $I$, different choices are possible. The one usually adopted (Klug and Finch, 1968) is that shown in Fig. 3c, in which 3 (out of 15) $C_2$ axes are oriented along $x$, $y$, and $z$. The two opposite edges cut by $z$ are set parallel to $x$ so that, along the arc $y \to z$, $C_5$ is encountered first, followed by $C_3$. The angular distance of $C_5$ from $z$ is $\arctan(\tau)$ and that of $C_3$ is $\arctan(2 - \tau)$, $\tau$ being the golden ratio $\tau = (1 + \sqrt{5})/2 = 1.61803\ldots$, as shown in Appendix B.

Fig. 2. Filling the DRT in the presence of symmetry. A projection of the $C_3$ phantom in a is shown in b. The 2D Fourier transforms of it and of two equal projections generated by symmetry are central sections of the 3D Fourier transform of the structure, as shown in d. Correspondingly, the sinogram of the projection in c fits three times in the RT, as shown in e.
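The relabeling of one projection with symmetry-equivalent Euler triads, as described in this section, can be sketched as follows for a $C_3$ structure in its canonical orientation. Several points are assumptions of this sketch rather than statements from the text: $\beta$ is measured from the $z$ axis, the projecting direction is taken as $R$ applied to the $z$ unit vector, and a group rotation $G$ is applied as the product $G R$ (the multiplication side depends on conventions the text does not spell out). The function names are hypothetical.

```python
import math

def rot_z(a):
    """Counterclockwise rotation about z."""
    c, s = math.cos(a), math.sin(a)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))

def rot_y(b):
    """Counterclockwise rotation about y."""
    c, s = math.cos(b), math.sin(b)
    return ((c, 0.0, s), (0.0, 1.0, 0.0), (-s, 0.0, c))

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )

def euler_to_matrix(alpha, beta, gamma):
    """Compound rotation R = Z(alpha) Y(beta) Z(gamma)."""
    return matmul(matmul(rot_z(alpha), rot_y(beta)), rot_z(gamma))

def matrix_to_euler(r, eps=1e-12):
    """Recover (alpha, beta, gamma) from R = Z(alpha) Y(beta) Z(gamma)."""
    beta = math.acos(max(-1.0, min(1.0, r[2][2])))
    if math.sin(beta) > eps:
        alpha = math.atan2(r[1][2], r[0][2])
        gamma = math.atan2(r[2][1], -r[2][0])
    else:
        # Direction along +z or -z: alpha and gamma are not separately
        # defined, so the whole in-plane rotation is assigned to alpha.
        alpha = math.atan2(-r[0][1], r[1][1])
        gamma = 0.0
    return alpha, beta, gamma

def equivalent_triads(triad, operators):
    """Euler triads of G R for every group rotation G (assumed convention)."""
    r = euler_to_matrix(*triad)
    return [matrix_to_euler(matmul(g, r)) for g in operators]

# C3 with its axis along z: rotations by 0, 2*pi/3 and 4*pi/3.
c3_operators = [rot_z(2.0 * math.pi * k / 3) for k in range(3)]
triads = equivalent_triads((0.3, 1.1, 2.0), c3_operators)
for alpha, beta, gamma in triads:
    print(round(alpha, 4), round(beta, 4), round(gamma, 4))
```

Under these assumptions the three triads share $\beta$ and $\gamma$ and differ only in $\alpha$, by multiples of $2\pi/3$, which matches the picture of three equivalent viewing directions arranged around the threefold axis.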
Fig. 3. Canonical orientations of the $T$, $O$, and $I$ subgroups with respect to Cartesian axes.

4. Short overview of two methods of building a discrete Radon array

For a three-dimensional function $f(\mathbf{x})$ the continuous RT $\check{f}$ is defined as

$$\check{f}(p, \boldsymbol{\zeta}) = \int f(\mathbf{x})\,\delta(p - \boldsymbol{\zeta}\cdot\mathbf{x})\,d\mathbf{x}, \qquad (1)$$

where $\boldsymbol{\zeta}$ represents a unit vector in $R^3$. If $\boldsymbol{\zeta}$ is described by two angles $\theta$ and $\phi$ (the latitude and longitude), the function $\check{f}$ is represented in the coordinates ($p$, $\theta$, $\phi$), $p$ being the radial coordinate along the direction specified by $\theta$ and $\phi$. A discrete version $\tilde{f}$ of $\check{f}$ (the DRT) is obtained by evaluating numerical values of the transform at equispaced intervals $\Delta p$, $\Delta\theta$, and $\Delta\phi$ along $p$, $\theta$, and $\phi$: $\tilde{f}(i, j, k) = \check{f}(i\Delta p, j\Delta\theta, k\Delta\phi) = \check{f}(p_i, \theta_j, \phi_k)$. This sampling is clearly not evenly spaced according to the Euclidean metric. However, the factors limiting the quality of an approximation of $\check{f}$ by $\tilde{f}$ are the same as those which limit the reliability of a discrete Fourier transform of a finite object, since Radon and Fourier transforms are intimately related (Bracewall, 1956). The Shannon theorem (Shannon, 1949) and sampling theory are applicable in Fourier space, since an invertible transformation exists between the sampling set and a set of points evenly spaced in Euclidean space (Clark et al., 1985).

The sampled version of the continuous sinogram $S(p, \delta)$ of a projection with Euler angles ($\alpha$, $\beta$, $\gamma$) can be computed at discrete points to yield a 2D array $S(p_i, \delta_n)$. Each continuous line of the sinogram, $S_\delta(p) = S(p, \delta = \mathrm{constant})$, corresponds to a line of the continuous three-dimensional RT, $\check{f}_\delta(p) = \check{f}(p, \theta'_\delta, \phi'_\delta)$, $\theta'_\delta$ and $\phi'_\delta$ being constant values depending upon $\delta$. This correspondence is illustrated in Fig. 2. In the discrete version, $\tilde{S}(i, n) = S(p_i, \delta_n)$, the radial sampling $p_i$ over $p$ is the same for the lines of $\tilde{S}$ and $\tilde{f}$, but the values ($\theta'_n$, $\phi'_n$), determined by the parameters $\alpha$, $\beta$, $\gamma$, and $\delta_n$, do not generally coincide with any value of the sampling grid of $\tilde{f}$, i.e., with the points ($\theta_j$, $\phi_k$).

The two methods described earlier (Lanzavecchia et al., 1999) use different strategies to fill the array $\tilde{f}(i, j, k)$ with the values of the lines $S_n = S(p_i, \delta_n)$. Method 1 scans all projections one at a time and determines, for each line $S_n$, the values ($\theta'_n$, $\phi'_n$). The sinogram line is added into the array $\tilde{f}(i, j, k)$ in the position corresponding to the ($\theta'_j$, $\phi'_k$) which differ from ($\theta'_n$, $\phi'_n$) by less than half the angular sampling interval of $\tilde{f}$. At the end, each line of $\tilde{f}$ is normalized by the number of sinogram lines received. Method 2 scans over the three-dimensional array $\tilde{f}(i, j, k) = \check{f}(p_i, \theta_j, \phi_k)$ line by line along $\theta_j$ and $\phi_k$. For a given line $l$, the set of all projections is examined to retrieve all sinogram lines $S_n$ closest to $l$. The angular distance of each sinogram line to $l$ is calculated and is used to compute a weight, which in turn is used to multiply the values of $S_n$ before adding them to the line $l$. In the final step, the contribution received by each 3D radial line is normalized by the sum of the weights of all contributions. The computational cost of the process grows linearly with the number of projections.

The two methods are illustrated in Fig. 4. As can be seen, the main difference is in the number of lines of $\tilde{f}$ which are filled, which is remarkably different near the pole. In this zone, the absolute angular separation among a number of lines ($\theta_j$, $\phi_k$), equally spaced in $j$ and $k$, is actually small. Thus, according to method 1, a line $S_n$ contributes to only one line of $\tilde{f}$, while with method 2 the same line contributes to several lines in $\tilde{f}$. This has a consequence on the inversion stage of the three-dimensional DRT.

Fig. 4. Two interpolation strategies used in the buildup of the DRT are illustrated by means of a unit sphere spanned by the angles $\phi$ (0, $2\pi$) and $\theta$ (0, $\pi$); the bold circle is the equator, $\theta = \pi/2$. The radial lines of the discrete transform correspond to points where meridians cross parallels. From the point of view of interpolation, the slices defined by two adjacent meridians are equivalent. Starlets represent the positions of discrete sinogram lines with angles $\phi$ and $\theta$. In method 1 (left side), sinogram lines are accumulated on the nearest DRT lines, marked by a dot. At right (method 2), sinogram lines spread their contributions, with appropriate weights, over all the DRT lines (marked by a dot) which are confined within a circle. The dots falling within a circle have an absolute angular distance from a sinogram line less than the sampling step at the equator. Note that the number of DRT lines to which a given sinogram line contributes is larger near the pole than near the equator.

The DRT is inverted in one of two ways. The first is an r-weighted backprojection algorithm applied twice, once to the slices of the DRT with constant $\theta$ and subsequently to sections spanned by ($\theta$, $p$) of the partially inverted DRT. The second inversion algorithm uses a direct Fourier method (Bellon and Lanzavecchia, 1997; Lanzavecchia and Bellon, 1998), which takes advantage of an interpolation based upon the moving window Shannon reconstruction (Lanzavecchia and Bellon, 1995) applied in the Fourier domain (Lanzavecchia and Bellon, 1997). Complete arrays can be inverted by a fast DFM (Lanzavecchia and Bellon, 1998), while arrays with gaps must be inverted by backprojection or filtered in a POCS procedure based on NTN rejection to fill the voids before DFM inversion. Both approaches are also robust if the angular distribution of projections is uneven, a situation that is often found when molecules are reconstructed from projections with random orientations.
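The difference between the two filling strategies can be made concrete with a small sketch. It is a simplified illustration under stated assumptions: the grid has `n_theta + 1` parallels and `n_phi` meridians with equal steps, the weight falls off linearly with angular distance (the text does not give the weighting function), and the loop runs over sinogram lines rather than over DRT lines, which yields the same weighted sums. All function names are hypothetical.

```python
import math

def unit_vector(theta, phi):
    """Direction on the unit sphere, theta measured from the pole."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def angular_distance(theta1, phi1, theta2, phi2):
    """Absolute angular separation between two directions."""
    a, b = unit_vector(theta1, phi1), unit_vector(theta2, phi2)
    dot = sum(u * v for u, v in zip(a, b))
    return math.acos(max(-1.0, min(1.0, dot)))

def method1_target(theta_s, phi_s, n_theta, n_phi):
    """Method 1: index (j, k) of the nearest grid line."""
    j = round(theta_s / (math.pi / n_theta))
    k = round(phi_s / (2.0 * math.pi / n_phi)) % n_phi
    return j, k

def method2_targets(theta_s, phi_s, n_theta, n_phi):
    """Method 2: grid lines closer than the equatorial step, with weights."""
    step = 2.0 * math.pi / n_phi
    targets = []
    for j in range(n_theta + 1):
        for k in range(n_phi):
            d = angular_distance(theta_s, phi_s, j * math.pi / n_theta, k * step)
            if d < step:
                # Assumed linear taper: 1 at zero distance, 0 at one step.
                targets.append((j, k, 1.0 - d / step))
    return targets

def spread(lines, n_theta, n_phi):
    """Weighted accumulation, normalised by the sum of weights per grid line."""
    sums, weights = {}, {}
    for theta_s, phi_s, values in lines:
        for j, k, w in method2_targets(theta_s, phi_s, n_theta, n_phi):
            acc = sums.setdefault((j, k), [0.0] * len(values))
            for i, v in enumerate(values):
                acc[i] += w * v
            weights[(j, k)] = weights.get((j, k), 0.0) + w
    return {jk: [v / weights[jk] for v in acc] for jk, acc in sums.items()}

N_THETA, N_PHI = 18, 36  # 10 degree steps in both angles
near_equator = method2_targets(math.pi / 2 + 0.02, 0.33, N_THETA, N_PHI)
near_pole = method2_targets(0.05, 0.33, N_THETA, N_PHI)
print(method1_target(math.pi / 2 + 0.02, 0.33, N_THETA, N_PHI))
print(len(near_equator), len(near_pole))
```

With this grid, a sinogram line near the pole reaches many more DRT lines than one near the equator, which is the behaviour the Fig. 4 caption points out.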
5. Symmetry effects and protocol improvements

In the presence of symmetry, the data of the sinograms, computed once, are inserted in the DRT several times, using different triads of Euler angles corresponding to all equivalent viewing directions, as shown in Fig. 2. A time-consuming step is the computation of the relationship between the lines of the sinograms and the DRT lines, a problem requiring a large number of trigonometric functions to be evaluated for each equivalent projection. Of the two algorithms described above, interpolation method 1 is very fast and is most suitable when a large number of projections is available, as is the case for structures possessing high symmetry orders such as icosahedral viruses. Interpolation method 2, although more precise, has the disadvantage of an extra computational cost to evaluate absolute angular distances, which increases as symmetry increases.

To achieve higher efficiency, we have developed method 2b, a different implementation of method 2. Method 2b consists of scanning all projections, in much the same way as the first method does, and of computing the values ($\theta'_n$, $\phi'_n$) corresponding to each sinogram line $S_n$. The latter still contributes, with weights based upon the absolute angular separation, to the closer lines ($\theta_j$, $\phi_k$), but a lookup table is used to alleviate the computational cost. The way a sinogram line spreads its contribution varies depending on the value of ($\theta'_n$, $\phi'_n$), but it repeats identically if $\phi'_n$ is incremented by $\Delta = 2\pi/m$, $\Delta$ being the sampling distance of $\tilde{f}$ along $\phi$. Thus, a slice of a sphere, spanning 0 to $\pi$ along $\theta$ and 0 to $2\pi/m$ along $\phi$, is enough to build a table. The slice is previously sampled at a given rate $t$, say $t = 2\pi/(10m)$, and for every point ($\theta_t$, $\phi_t$) the angular distances from all neighboring ($\theta_j$, $\phi_k$) lines are computed and stored in the table together with weighting coefficients. As can be seen in Fig. 4, the number of lines ($\theta_j$, $\phi_k$) near the line ($\theta_t$, $\phi_t$) depends on $\theta_t$. Thus, for each ($\theta_t$, $\phi_t$), the table contains a variable number of lines ($\theta_j$, $\phi_k$). In the program, for each line ($\theta'_n$, $\phi'_n$), the value of $\phi'_n$ is reduced to the range 0 to $2\pi/m$ to compute the closest ($\theta_t$, $\phi_t$). The table is then addressed to obtain the full list of Radon lines ($\theta_j$, $\phi_k$) on which the contribution of the sinogram line is to be spread with the tabulated weights.

Thus, in method 2b the transform is filled in much the same way as in method 2, apart from small approximations introduced by the finite sampling of the slice of the sphere. The loss of numerical accuracy is barely perceptible, as can be seen by comparing the discrepancy data for methods 2 and 2b in Table 1. There is, however, a consistent gain in efficiency.

6. Tests with icosahedral structures

The performance tests reported here have the purpose of comparing the times and accuracy of the methods described above. We obviously cannot make comparisons with other methods of reconstruction, since these would essentially depend upon different software implementations and different operating systems. The three implementations of our algorithm (methods 1, 2, and 2b) have been evaluated using icosahedral virus structures, which belong to the symmetry subgroup $I$ of order 60. The tests have been performed with phantom data, such as shown in Fig. 5, and with experimental images of poliovirus (Belnap et al., 2000).

6.1. Phantom tests

An analytical phantom structure has been generated in a cube of $128^3$ voxels (Fig. 5a), and 50 projections (Fig. 5b) have been obtained with random Euler angles by analytic rotation of the model (no interpolation). A second set of projections (Fig. 5c) was obtained by corrupting the first set with noise, at a signal-to-noise ratio S/N = 1. The noise pattern was obtained from structure-free areas of micrographs of ice-embedded samples. The phantom has been reconstructed from both sets in four different ways: (a) method 1 for recovering the DRT, combined with the two-step backprojection inversion (Radermacher, 1997); (b) method 1 combined with filtering to fill the gaps in the DRT and DFM inversion; (c, d) methods 2 and 2b, combined with DFM inversion. The performances have been evaluated using the discrepancy measure, the normalized root-mean-square deviation with respect to the original (Herman et al., 1973). The reconstruction from noisy data has been computed with and without the application of a low-pass filter. Execution times and discrepancies are reported in Table 1. The times, reported for a personal computer (Pentium II, 450 MHz) under the Linux operating system, refer to the entire process of reconstruction starting from the projections. Therefore, I/O operations as well as the computation of sinograms and of two-dimensional Fourier transforms are included. The phantoms reconstructed with the different methods from noisy projections look visually the same. One reconstruction is shown in Fig. 5d.

Table 1
Computation times and error estimates for the phantom structure shown in Fig. 5, reconstructed from a set of 50 projections, both noise-free and corrupted with noise (S/N = 1)

Method                            Total time (s)    Discrepancy,    Discrepancy, S/N = 1,    Discrepancy, S/N = 1,
                                                    noise-free      no low pass              low pass
1, BP inversion                   380               0.105           0.169                    0.166
1, plus POCS and DFM inversion    82                0.050           0.231                    0.205
2, DFM inversion                  250               0.049           0.181                    0.162
2b, DFM inversion                 63                0.052           0.178                    0.163

Note. Four implementations of the protocol have been considered. The low-pass filter used for the discrepancies reported in the last column has been set to 2/3 of the Nyquist frequency.

6.2. Tests on poliovirus

A set of 123 images of poliovirus (Belnap et al., 2000) has been kindly provided by David Belnap of the National Institutes of Health (NIH); see Fig. 6a. The projections were already aligned and the Euler angles known.
The original images, digitized in 109 × 109 pixels, were padded to 128 × 128 pixels, and a volume of $128^3$ voxels was reconstructed using different protocols: (a) method 1 to recover the Radon transform, followed by the two-step backprojection inversion; (b) method 1 with DFM inversion, which requires filtering to fill the gaps in the transform; (c, d) methods 2 and 2b with DFM inversion. Images of the reconstructed structures are shown in Figs. 6b and c; computation times are quoted in Table 2. The structures reconstructed by the different methods were visually indistinguishable. Estimates of their resolutions have been obtained by dividing the projections of poliovirus into two subsets, which were reconstructed with each method, and calculating the Fourier shell correlations (FSC; Saxton and Baumeister, 1982; van Heel et al., 1982). In all cases the FSC crossed the 0.5 level at a frequency of (23/128) pixels$^{-1}$ [maximum observable frequency (63/128) pixels$^{-1}$], in agreement with the resolution obtained at NIH (Belnap et al., 2000).
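The two quality measures used in these tests can be written down compactly. The sketch below makes two assumptions: the discrepancy is taken as the rms deviation normalized by the spread of the original about its mean (the text only calls it a normalized root-mean-square deviation), and the Fourier shell correlation is computed from Fourier coefficients that are already available, supplied as dictionaries keyed by integer frequency triples. The function names and the data layout are hypothetical.

```python
import math

def discrepancy(original, reconstruction):
    """Normalised rms deviation between two voxel lists of equal length.

    Assumed normalisation: the sum of squared deviations of the original
    from its own mean.
    """
    mean = sum(original) / len(original)
    num = sum((t - r) ** 2 for t, r in zip(original, reconstruction))
    den = sum((t - mean) ** 2 for t in original)
    return math.sqrt(num / den)

def fourier_shell_correlation(f1, f2):
    """FSC per integer shell radius for two sets of Fourier coefficients.

    f1 and f2 map frequency triples (h, k, l) to complex coefficients of the
    two half-set reconstructions; both must contain the same keys.
    """
    cross, power1, power2 = {}, {}, {}
    for hkl, a in f1.items():
        b = f2[hkl]
        shell = round(math.sqrt(sum(c * c for c in hkl)))
        cross[shell] = cross.get(shell, 0.0) + (a * b.conjugate()).real
        power1[shell] = power1.get(shell, 0.0) + abs(a) ** 2
        power2[shell] = power2.get(shell, 0.0) + abs(b) ** 2
    return {
        s: cross[s] / math.sqrt(power1[s] * power2[s])
        for s in sorted(cross)
        if power1[s] * power2[s] > 0.0
    }

# Toy data, purely illustrative.
phantom = [0.0, 1.0, 2.0, 3.0]
flat = [1.5, 1.5, 1.5, 1.5]          # reconstruction equal to the mean
half1 = {(1, 0, 0): 1 + 2j, (0, 1, 0): 3 - 1j, (2, 0, 0): 0.5j}
half2 = {hkl: -value for hkl, value in half1.items()}

print(discrepancy(phantom, phantom), discrepancy(phantom, flat))
print(fourier_shell_correlation(half1, half1))
print(fourier_shell_correlation(half1, half2))
```

A resolution estimate in the sense used above would then be the first shell at which the correlation drops below 0.5.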
7. Discussion If the symmetry of a particle is known, the reconstruction algorithms described above can take advantage of it to make optimum use of all the information available. We implemented the complete set of symmetry operations that apply to isolated symmetrical particles. The usage of symmetry in the reconstruction algorithm virtually increases the number of projections the algorithm must handle. This requires an ecient implementation, yet without loss of accuracy. All the algorithms tested perform with comparable accuracy, as shown in Table 1. In reconstructing the icosahedral phantom of Fig. 5a from noise-free projections, method 1 used in combination with weighted backprojection and no NTN ltration shows the largest discrepancy. For the same interpolation method, yet combined with NTN ltration, and for interpolation methods 2 and 2b the discrepancy gures, however, are almost equal when the phantom is reconstructed from noise-free projections and DFM inversion. In the presence of noise and if no low-pass lter is applied, method 1 used with backprojection inversion yields the smallest discrepancy. All discrepancies are comparable when the reconstructed structure is low-pass ltered to 2/3 of the Nyquist
Fig. 5. A simulated structure with I symmetry and its reconstruction. (a) The original phantom; (b) some analytical projections of the phantom; (c) same as in b, corrupted with noise digitized from feature-free areas of micrographs of ice-embedded samples (S/N = 1); (d) reconstruction of the phantom shown in a, as obtained from 50 noisy projections of the type shown in c. The final models obtained with the different methods listed in Table 1 are visually indistinguishable.
frequency. A comparison of the discrepancies quoted in the last two rows of Table 1 shows that the use of tabulated weights in method 2b does not cause any significant loss of accuracy in comparison with method 2. The main advantage of the algorithm with either interpolation method is in the speed of the Radon approach. The different speeds are illustrated in Table 2. Filtration and inversion of the three-dimensional DRT
depend only on the size of the volume and are independent of the number of projections, whereas the interpolation performed when calculating the DRT from the projections grows linearly with the number of images used. The new interpolation scheme based on a lookup table, as illustrated above, is faster by a factor of about 5-6 than the original interpolation scheme of method 2, without significant loss of accuracy. As
Fig. 6. Some experimental projections of poliovirus, reproduced with permission of D. Belnap and of the National Institutes of Health, are shown in a. Two views of the reconstruction, down the threefold and the fivefold axis, are shown in b and c, respectively. The virus has been reconstructed with the algorithms reported in Table 2, from a set of 123 images with assigned projection angles and translation shifts.
expected, the fastest method is based on nearest-neighbor interpolation combined with DFM inversion. Yet, to achieve the same accuracy, this method requires the application of the NTN filter when the number of projections is low. The inversion algorithms require that the DRT be sampled in a polar coordinate system with equal angular increments. Equal angular sampling creates a sampling grid with points that are not equidistant in
space. The distance between sampling points is smaller near the poles of the polar coordinate system and larger near the equator. This sampling causes an uneven distribution of the signal-to-noise ratio in the DRT that is recovered from the projections. This is most obvious for nearest-neighbor interpolation (method 1), where fewer measurements contribute to one line in the DRT when the line is located near the pole. Because the recovery of the DRT is an averaging process and fewer
Table 2
Total computation times (in seconds) required by four different implementations of the reconstruction protocol in the case of 123 projections of poliovirus, each of 128 × 128 pixels

Method               Computing DRT   Filtering   Inversion   Total
1, BP inversion      56              -           360         416
1, DFM inversion     56              39          23          108
2, DFM inversion     598             -           23          621
2b, DFM inversion    103             -           23          126
Note. The times include I/O operations as well as sinogram and two-dimensional FT computations.
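The speed comparison for the two finer interpolation schemes can be reproduced directly from the Table 2 entries. The sketch below is only illustrative: the linear scaling of the DRT step with the number of projections follows the discussion in the text, while the helper names and the idea of extrapolating to other projection counts are assumptions of this example.

```python
# Values copied from Table 2 (seconds, 123 projections of 128 x 128 pixels).
N_PROJECTIONS = 123
DRT_SECONDS = {"2": 598.0, "2b": 103.0}   # interpolation methods 2 and 2b
DFM_INVERSION_SECONDS = 23.0              # depends only on the volume size

def drt_speedup():
    """Ratio of the DRT computation times, method 2 over method 2b."""
    return DRT_SECONDS["2"] / DRT_SECONDS["2b"]

def predicted_total(method, n_projections):
    """Assumed model: the DRT time grows linearly with the number of
    projections, while the DFM inversion time stays fixed."""
    per_projection = DRT_SECONDS[method] / N_PROJECTIONS
    return per_projection * n_projections + DFM_INVERSION_SECONDS

if __name__ == "__main__":
    print(f"DRT speedup of method 2b over method 2: {drt_speedup():.1f}x")
    for method in ("2", "2b"):
        print(method, round(predicted_total(method, 123)), round(predicted_total(method, 246)))
```

The ratio 598/103 is about 5.8. For 123 projections the model returns the tabulated totals of 621 s and 126 s by construction; values for other projection counts are extrapolations, not measurements.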
Table 3
Evaluation of the anisotropy for the reconstruction arrays of the phantom of Fig. 5

                                  Discrepancy, noise-free projection    Discrepancy, S/N = 1, low pass
Method                            Without NTN    With use of NTN        Without NTN    With use of NTN
1, BP inversion                   0.055          -                      0.057          -
1, plus POCS and DFM inversion    0.021          0.014                  0.167          0.046
2, DFM inversion                  0.007          0.006                  0.063          0.013
2b, DFM inversion                 0.009          0.007                  0.064          0.018
Note. The discrepancy is reported between each reconstruction array and its copy with permuted indices. For DFM methods, each reconstruction has been obtained with and without use of the NTN filter.
measurements are averaged near the poles, the signal-to-noise ratio is lower near the poles. For the more elaborate interpolation schemes this effect is smaller, and the NTN filter reduces it further. The situation, however, becomes more complicated to analyze. Recovering the discrete Radon transform on an evenly sampled grid (an algorithm which we developed for other purposes; Radermacher, unpublished) would require a second interpolation before the inversion methods could be applied. Based on what has been demonstrated for nonuniformly sampled functions (Clark et al., 1985), one would not expect any additional anisotropy effects to arise from the use of spherical coordinates.

We analyzed the possible anisotropy, that is, a different behavior of the reconstruction algorithm along different directions, using the icosahedral phantom shown in Fig. 5. In an icosahedral structure three equivalent twofold axes are orthogonal and oriented along the Cartesian axes of the reconstruction, so that what is seen along the z axis can be compared to what is seen along x and y. Thus the structure can be rotated into three equivalent orientations and compared. The rotation requires just a permutation of the array indices. Using the phantom of Fig. 5, we evaluated the discrepancies between arrays reconstructed by the different methods and their versions with permuted axes. The phantom itself, due to round-off errors in its creation, is very slightly anisotropic. The discrepancies among its versions with permuted axes are about $2 \times 10^{-5}$. The phantom reconstructions performed by the different methods, with and without noise, show small discrepancies between the versions with permuted axes (Table 3). In the absence of noise the discrepancies are very low, especially for DFM methods. In the presence of noise the discrepancy increases for all methods with the exception of backprojection; however, the use of the NTN filter, which imposes consistency on the DRT, reduces the discrepancy to values close to that of a reconstruction from noise-free projections. Because no interpolation is needed to rotate the structure into the three equivalent orientations, the data in Table 3 describe solely the effect of anisotropy caused by the reconstruction algorithm. The values (with two exceptions) are less than half the discrepancy values in the comparison of the reconstruction algorithm to the model structure (Table 1). As expected, the discrepancies are higher when the nearest-neighbor interpolation scheme is used without NTN filtration than when finer interpolations are employed. Visually, however, this anisotropy could not be observed. In Fig. 7 we show enlarged details of the reconstruction of poliovirus along the three twofold axes aligned with x, y, and z. As can be seen, no appreciable differences can be noted, and this is true for all reconstruction methods used here.
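The permuted-axes comparison can be sketched in a few lines of NumPy. The discrepancy measure used in the paper is defined elsewhere in the text; the normalized absolute difference below, like the function names, is an assumption chosen only to make the example self-contained. Cyclic permutations of the indices are used because they correspond to proper rotations about the body diagonal.

```python
import numpy as np

def discrepancy(a, b):
    """Assumed measure: summed absolute difference, normalized by the
    summed magnitude of the first array."""
    return float(np.abs(a - b).sum() / np.abs(a).sum())

def permutation_anisotropy(volume):
    """Discrepancy between a cubic volume and its two cyclic axis permutations."""
    results = {}
    for axes in ((1, 2, 0), (2, 0, 1)):
        # Permuting the array indices rotates the volume without interpolation.
        results[axes] = discrepancy(volume, np.transpose(volume, axes))
    return results

if __name__ == "__main__":
    grid = np.arange(16) - 7.5
    x, y, z = np.meshgrid(grid, grid, grid, indexing="ij")
    isotropic = np.exp(-(x**2 + y**2 + z**2) / 20.0)
    stretched = np.exp(-(x**2 + y**2 + 2.0 * z**2) / 20.0)
    print(permutation_anisotropy(isotropic))
    print(permutation_anisotropy(stretched))
```

A volume that behaves identically along x, y, and z gives values at the level of round-off, whereas a volume compressed along one axis gives clearly nonzero values.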
Fig. 7. No anisotropy effects can be noted if poliovirus is reconstructed by the DRT approach. Panels a, b, and c show three partial renderings (from plane No. 15 to No. 32, total No. of planes 128) along x, y, and z, respectively. The details, along three equivalent twofold axes, are perfectly equivalent.
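The uneven sampling density of the equal-angle polar grid discussed in this section can be quantified with a short calculation. The sketch below assumes measurement directions distributed uniformly over the sphere, which is a simplification of how projections actually contribute lines to the DRT; the function names are hypothetical.

```python
import math

def ring_cell_solid_angles(n_theta, n_phi):
    """Solid angle of one grid cell in each polar ring of an equal-angle grid.

    theta runs from 0 (pole) to pi in n_theta equal steps and phi from 0 to
    2*pi in n_phi equal steps.
    """
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    cells = []
    for t in range(n_theta):
        lo, hi = t * d_theta, (t + 1) * d_theta
        # Exact integral of sin(theta) d(theta) d(phi) over the cell.
        cells.append(d_phi * (math.cos(lo) - math.cos(hi)))
    return cells

def expected_hits(n_measurements, n_theta, n_phi):
    """Expected number of uniformly distributed directions per cell, by ring."""
    cells = ring_cell_solid_angles(n_theta, n_phi)
    return [n_measurements * c / (4.0 * math.pi) for c in cells]

if __name__ == "__main__":
    hits = expected_hits(100000, 64, 128)
    print("cell next to the pole:", hits[0])
    print("cell at the equator:  ", hits[32])
```

Cells next to the pole subtend a much smaller solid angle than cells at the equator, so under nearest-neighbor assignment they collect fewer measurements and are averaged less.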
8. Conclusions

We have included in our previous reconstruction algorithms a complete set of symmetry operations that can now be used easily for all types of single particles, from particles with no symmetry up to the icosahedral group. The algorithms perform accurately and are computationally efficient. Reconstructing isolated macromolecular assemblies from projections via the Radon transform method, with either implementation presented here, is a robust and attractive alternative to conventional backprojection methods in terms of both speed and accuracy. Most electron microscopy studies of icosahedral virus structures are carried out by the Fourier-Bessel method (FBM), with use of cylindrical coordinates. Even in this case interpolation is a critical point (Crowther et al., 1970b). The result obtained by DRT methods for icosahedral structures is fully comparable with that obtained with FBM (D. Belnap, private communication).

Finally, we would like to note one important aspect of three-dimensional reconstruction of single particles. Even though all biochemical data might predict a structure to be symmetric, it is possible that the specimen on the grid is not. Asymmetries can be introduced by improper refolding of expressed proteins, or by damage to the structure in any step of the purification and specimen preparation. Other asymmetries, inherent to the molecule, may be due to conformational differences between subunits in a multisubunit complex, depending on their functional state. Therefore, even though the new algorithm makes it easy to enforce symmetry, symmetry must not be enforced if it is not present in the imaged sample. Only when the sample has been shown to possess symmetry (e.g., Kocsis et al., 1995) should this symmetry be enforced by the algorithm.
Acknowledgments

This work was supported by the Italian Ministry of University and Research (COFIN 2000 and FIRST 2001) and by Grant NSF DBI 95 155 18. The authors thank David Belnap and the National Institutes of Health for making available projection maps of poliovirus used in experimental tests and for continued interest in this work.

Appendix A. Global operators

Global operators transform an orientation matrix $R$, obtained from a triad of Euler angles, into all symmetry equivalent matrices $R_n$, $n$ being the group order. All matrices mentioned below represent counterclockwise rotations.

Notation used

$X$ and $Z$ are rotation matrices around the x and z axis, respectively. In parentheses are the angular arguments of rotation, which are either a constant angle or the product of it with an integer $i$, $j$, $k$, $l$. $P$ and $Q$ are product matrices used to align along z the threefold and fivefold axes, respectively; once the latter have performed their rotations, the original orientations are restored. $G$ is the global group operator, a matrix whose coefficients are determined by the values of the integer indices.

Angular symbols

$\alpha = \tan^{-1}[(1 + \sqrt{5})/2] = \tan^{-1}\tau$
$\beta = \tan^{-1}(2 - \tau)$ (see Appendix B for $\alpha$ and $\beta$)
$\gamma = \tan^{-1}\sqrt{2}$
$\delta_n = 2\pi/n$, $n$ being the axis order

Global group operators

$C_n$ group. To obtain $n$ symmetry equivalent orientations, the global operator is simply

$$G_i = Z(i\delta_n), \quad i = 0, \ldots, n-1 \quad (\text{identity is obtained for } i = 0).$$

$D_n$ group. To obtain $2n$ symmetry equivalent orientations, the n-fold axis operates first with the matrix

$$Z(i\delta_n), \quad i = 0, \ldots, n-1 \quad (\text{identity is obtained for } i = 0).$$

The twofold axis operates next to obtain $2n$ orientations:

$$X(j\delta_2), \quad j = 0, 1 \quad (\text{identity is obtained for } j = 0).$$

The global group operator is

$$G_{j,i} = X(j\delta_2)\, Z(i\delta_n) \quad (\text{identity is obtained for } j \text{ and } i = 0).$$

T subgroup. To obtain $3 \times 2 \times 2 = 12$ equivalent orientations, the threefold axis is first aligned along z to operate and finally brought to its original orientation:

$$P(i\delta_3) = Z(\pi/4)^T X(\gamma)^T Z(i\delta_3)\, X(\gamma)\, Z(\pi/4), \quad i = 0, 1, 2 \quad (\text{identity is obtained for } i = 0).$$

The twofold axes, operating next, are

$$Z(j\delta_2) \text{ and } X(k\delta_2), \quad j \text{ and } k = 0, 1 \quad (\text{identity is obtained for } j \text{ or } k = 0).$$

The global group operator is

$$G_{k,j,i} = X(k\delta_2)\, Z(j\delta_2)\, P(i\delta_3) \quad (\text{identity is obtained for } k \text{ and } j \text{ and } i = 0).$$

O subgroup. To obtain $3 \times 4 \times 2 = 24$ equivalent orientations, the threefold axis is first oriented along z to operate and, once the rotation has been performed, its original orientation is restored by the matrix

$$P(i\delta_3) = Z(\pi/4)^T X(\gamma)^T Z(i\delta_3)\, X(\gamma)\, Z(\pi/4), \quad i = 0, 1, 2 \quad (\text{identity is obtained for } i = 0).$$

The fourfold axis, coincident with z and operating next, is

$$Z(j\delta_4), \quad j = 0, \ldots, 3 \quad (\text{identity is obtained for } j = 0).$$

Finally, a twofold axis coincident with x operates:

$$X(k\delta_2), \quad k = 0, 1 \quad (\text{identity is obtained for } k = 0).$$

The global group operator is

$$G_{k,j,i} = X(k\delta_2)\, Z(j\delta_4)\, P(i\delta_3) \quad (\text{identity is obtained if } k \text{ and } j \text{ and } i = 0).$$

I subgroup. To obtain $3 \times 2 \times 2 \times 5 = 60$ equivalent orientations, the threefold axis is first oriented along z to rotate and, once this operation has been performed, its original orientation is restored by the matrix

$$P(i\delta_3) = X(\beta)^T Z(i\delta_3)\, X(\beta), \quad i = 0, 1, 2 \quad (\text{identity is obtained for } i = 0).$$

A twofold axis coincident with z operates next:

$$Z(j\delta_2), \quad j = 0, 1 \quad (\text{identity is obtained for } j = 0).$$

Another twofold axis, coincident with x, is applied further:

$$X(k\delta_2), \quad k = 0, 1 \quad (\text{identity is obtained for } k = 0).$$

Finally, the fivefold axis is first oriented along z to operate and finally brought to the original orientation by the matrix

$$Q(l\delta_5) = X(\alpha)^T Z(l\delta_5)\, X(\alpha), \quad l = 0, \ldots, 4 \quad (\text{identity is obtained if } l = 0).$$

The global group operator is therefore

$$G_{l,k,j,i} = Q(l\delta_5)\, X(k\delta_2)\, Z(j\delta_2)\, P(i\delta_3) \quad (\text{identity is obtained if } l \text{ and } k \text{ and } j \text{ and } i = 0).$$

Note that global operators, products of several matrices with the exception of the $C_n$ group, perform a single operation (a rotation about an n-fold axis) provided that all but one of the matrices with variable arguments are set to identity.
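The global operators of Appendix A can be checked numerically with a small pure-Python sketch. It assumes 3 × 3 matrices acting on column vectors, counterclockwise rotations, and five values of the index l (0 to 4) for the fivefold axis; all function names (`operators`, `P_cubic`, `P_ico`, `Q_ico`, `count_distinct`) are hypothetical and chosen only for this illustration.

```python
import math

TAU = (1.0 + math.sqrt(5.0)) / 2.0   # golden ratio
ALPHA = math.atan(TAU)               # tilts an icosahedral fivefold axis onto z
BETA = math.atan(2.0 - TAU)          # tilts an icosahedral threefold axis onto z
GAMMA = math.atan(math.sqrt(2.0))    # tilts the threefold axis of T and O onto z

def delta(n):
    """Elementary rotation angle of an n-fold axis."""
    return 2.0 * math.pi / n

def X(t):
    """Counterclockwise rotation about x (column-vector convention, assumed)."""
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def Z(t):
    """Counterclockwise rotation about z (same convention)."""
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def transpose(m):
    return [list(col) for col in zip(*m)]

def chain(*mats):
    """Left-to-right product of 3 x 3 matrices."""
    out = mats[0]
    for m in mats[1:]:
        out = [[sum(out[r][k] * m[k][c] for k in range(3)) for c in range(3)]
               for r in range(3)]
    return out

def P_cubic(i):
    """Threefold operator of the T and O subgroups."""
    align = chain(X(GAMMA), Z(math.pi / 4.0))
    return chain(transpose(align), Z(i * delta(3)), align)

def P_ico(i):
    """Threefold operator of the I subgroup."""
    return chain(transpose(X(BETA)), Z(i * delta(3)), X(BETA))

def Q_ico(l):
    """Fivefold operator of the I subgroup."""
    return chain(transpose(X(ALPHA)), Z(l * delta(5)), X(ALPHA))

def operators(group, n=1):
    """All global operators of a point group; n is used by C and D only."""
    two = range(2)
    if group == "C":
        return [Z(i * delta(n)) for i in range(n)]
    if group == "D":
        return [chain(X(j * delta(2)), Z(i * delta(n))) for j in two for i in range(n)]
    if group == "T":
        return [chain(X(k * delta(2)), Z(j * delta(2)), P_cubic(i))
                for k in two for j in two for i in range(3)]
    if group == "O":
        return [chain(X(k * delta(2)), Z(j * delta(4)), P_cubic(i))
                for k in two for j in range(4) for i in range(3)]
    if group == "I":
        return [chain(Q_ico(l), X(k * delta(2)), Z(j * delta(2)), P_ico(i))
                for l in range(5) for k in two for j in two for i in range(3)]
    raise ValueError("unknown group: " + group)

def key(m):
    """Hashable, rounded form of a matrix (adding 0.0 turns -0.0 into 0.0)."""
    return tuple(round(v, 6) + 0.0 for row in m for v in row)

def count_distinct(ops):
    return len({key(g) for g in ops})

if __name__ == "__main__":
    for name, n in (("C", 7), ("D", 3), ("T", 1), ("O", 1), ("I", 1)):
        print(name, count_distinct(operators(name, n)))
```

Counting distinct matrices is a quick way to confirm that each index combination yields a different operation. With icosahedral symmetry, for example, the 123 poliovirus projections correspond to 123 × 60 = 7380 orientations. How each operator is composed with the orientation matrix $R$ depends on the Euler-angle convention in use and is left open here.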
Appendix B. Integer powers of the golden ratio $\tau$ and two relevant icosahedral angles

Integer powers of $\tau = (\sqrt{5} + 1)/2 = 1.6180\ldots$ are obtained by the recurrence formula $\tau^n = \tau^{n-1} + \tau^{n-2}$ (Wells, 1986). Starting from $\tau^0$ and $\tau^1$, we can obtain $\tau^2 = \tau + 1$; $\tau^3 = 2\tau + 1$; $\tau^{-1} = \tau - 1$; $\tau^{-2} = 2 - \tau$.

If three golden rectangles (whose edge lengths are in the golden ratio) mutually intersect in perpendicular planes as shown in Fig. A.1, their 12 vertices describe an icosahedron (Coxeter, 1989). A threefold axis, orthogonal to x and pointing from the origin to the center of the triangle 1-2-3, can be aligned along z by a counterclockwise rotation around x. This center is identified by a vector $w_3$ equal to 1/3 the sum of the vectors pointing from the origin to the vertices 1, 2, and 3:

$$w_3 = \tfrac{1}{3}\,[(1, 0, \tau) + (-1, 0, \tau) + (0, \tau, 1)] = \tfrac{1}{3}\,(0, \tau, 2\tau + 1) = (0, \tfrac{1}{3}\tau, \tfrac{1}{3}\tau^3).$$

The counterclockwise rotation angle to orient the threefold axis along z is therefore

$$\beta = \arctan(\tau/\tau^3) = \arctan(\tau^{-2}) = \arctan(2 - \tau) = 20.905^\circ.$$

The vector $(0, \tau, 1)$, pointing from the origin to vertex 3, identifies the orientation of a fivefold axis. This axis can be oriented along z by a counterclockwise rotation around x. The rotation angle is

$$\alpha = \arctan\tau = 58.282^\circ.$$

Fig. A.1. Three mutually perpendicular and intersecting golden rectangles (e.g., edge lengths 1 and $\tau$) describe an icosahedron. Points 1, 2, and 3 define a face whose center is intersected by a threefold axis. A fivefold axis goes from the origin to point 3.
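The identities of Appendix B are easy to verify numerically. The sketch below represents each integer power of the golden ratio as a pair of integers (a, b) meaning a·τ + b, and recomputes the two icosahedral angles from the vertex coordinates given in the text; the function names are hypothetical.

```python
import math

TAU = (math.sqrt(5.0) + 1.0) / 2.0

def tau_power(n):
    """Return integers (a, b) such that tau**n == a * tau + b."""
    lower, upper = (0, 1), (1, 0)   # tau**0 and tau**1 as (a, b) pairs
    for _ in range(abs(n)):
        if n > 0:
            # Step up: tau**(k+2) = tau**(k+1) + tau**k.
            lower, upper = upper, (lower[0] + upper[0], lower[1] + upper[1])
        else:
            # Step down: tau**(k-1) = tau**(k+1) - tau**k.
            lower, upper = (upper[0] - lower[0], upper[1] - lower[1]), lower
    return lower

def icosahedral_angles_deg():
    """Angles alpha and beta (degrees) from vertices 1, 2, and 3 of Fig. A.1."""
    vertices = ((1.0, 0.0, TAU), (-1.0, 0.0, TAU), (0.0, TAU, 1.0))
    w3 = [sum(coords) / 3.0 for coords in zip(*vertices)]   # face center
    beta = math.degrees(math.atan2(w3[1], w3[2]))           # threefold axis to z
    alpha = math.degrees(math.atan2(TAU, 1.0))              # fivefold axis to z
    return alpha, beta

if __name__ == "__main__":
    for n in (2, 3, -1, -2):
        a, b = tau_power(n)
        print(n, (a, b), abs(TAU**n - (a * TAU + b)) < 1e-12)
    print(icosahedral_angles_deg())
```

The pairs returned for n = 2, 3, -1, and -2 are (1, 1), (2, 1), (1, -1), and (-1, 2), matching the four identities listed above.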
References
Bellon, P.L., Lanzavecchia, S., 1997. Fast direct Fourier methods, based on 1- and 2-pass coordinates transformation, yields accurate reconstructions of X-ray CT clinical images. Phys. Med. Biol. 42, 443-463.
Bellon, P.L., Lanzavecchia, S., Scatturin, V., 1998. A two exposures technique of electron tomography from projections with random orientations and a quasi-Boolean angular reconstitution. Ultramicroscopy 72, 177-186.
Belnap, D.M., McDermott Jr., B.M., Filman, D.J., Cheng, N., Trus, B.L., Zuccola, H.J., Racaniello, V.R., Hogle, J.M., Steven, A.C., 2000. Three-dimensional structure of poliovirus receptor bound to poliovirus. Proc. Natl. Acad. Sci. USA 97, 73-78.
Bracewell, R.N., 1956. Strip integration in radio astronomy. Aust. J. Phys. 9, 198-217.
Carazo, J.M., 1992. The fidelity of 3D reconstructions from incomplete data and the use of restoration methods. In: Frank, J. (Ed.), Electron Tomography. Plenum, New York, pp. 117-166.
Carazo, J.M., Carrascosa, J.L., 1987. Information recovery in missing angular data cases: an approach by the convex projections method in three dimensions. J. Microsc. 45, 23-43.
Clark, J.J., Palmer, M.R., Lawrence, P.D., 1985. A transformation method for reconstruction of functions from non-uniformly spaced samples. IEEE Trans. Acoust. Speech Signal Process. 33, 1151-1165.
Cotton, F.A., 1964. Chemical Applications of Group Theory. Wiley, New York.
Coxeter, H.S.M., 1989. In: Introduction to Geometry. Wiley, New York, p. 162.
Crowther, R.A., Amos, L.A., Finch, J.T., De Rosier, D.J., Klug, A., 1970a. Three-dimensional reconstruction of spherical viruses by Fourier synthesis from electron micrographs. Nature 226, 421-425.
Crowther, R.A., De Rosier, D.J., Klug, A., 1970b. The reconstruction of a three-dimensional structure from projections and its application to electron microscopy. Proc. Roy. Soc. Lond. A 317, 319-340.
Deans, S.R., 1993. The Radon Transform and Some of its Applications. Wiley, New York (original work published 1983).
Frank, J., 1996. Three-Dimensional Electron Microscopy of Macromolecular Assemblies. Academic Press, San Diego.
Fuller, S.D., Butcher, S.J., Cheng, R.H., Baker, T.S., 1996. Three-dimensional reconstruction of icosahedral particles - the uncommon line. J. Struct. Biol. 116, 48-55.
Gilbert, P.F., 1972. The reconstruction of a three-dimensional structure from projections and its application to electron microscopy. II. Direct methods. Proc. Roy. Soc. Lond. B 182, 89-102.
Hahn, T. (Ed.), 1992. Space-Group Symmetry. International Tables for Crystallography, vol. A. Kluwer Academic, Dordrecht.
Herman, G.T., Lent, A., Rowland, S.W., 1973. ART: Mathematics and applications (a report on the mathematical foundations and on the applicability to real data of the algebraic reconstruction techniques). J. Theor. Biol. 42, 1-32.
Klug, A., Finch, J.T., 1968. Structure of viruses of the papilloma-polyoma type. IV. Analysis of tilting experiments in the electron microscope. J. Mol. Biol. 31, 1-12.
Kocsis, E., Cerritelli, M.E., Trus, B.L., Cheng, N., Steven, A.C., 1995. Improved methods for determination of rotational symmetries in macromolecules. Ultramicroscopy 60, 219-228.
Lanzavecchia, S., Bellon, P.L., 1995. A bevy of novel interpolating kernel for the Shannon reconstruction of high band pass images. J. Visual Commun. Image Repres. 6, 122-131.
Lanzavecchia, S., Bellon, P.L., 1996. Electron tomography in conical tilt geometry. The accuracy of a direct Fourier method (DFM) and the suppression of non-tomographic noise. Ultramicroscopy 63, 247-261.
Lanzavecchia, S., Bellon, P.L., 1997. The moving window Shannon reconstruction in direct and Fourier domain: application in tomography. Scanning Microsc. Suppl. 11, 153-168.
Lanzavecchia, S., Bellon, P.L., 1998. Fast computation of 3D Radon transform via a direct Fourier method. Bioinformatics 14, 212-216.
Lanzavecchia, S., Bellon, P.L., Radermacher, M., 1999. Fast and accurate three-dimensional reconstruction from projections with random orientations via Radon transforms. J. Struct. Biol. 128, 152-164.
Lanzavecchia, S., Cantele, F., Bellon, P.L., 2001. Alignment of 3D structures of macromolecular assemblies. Bioinformatics 17, 58-62.
Radermacher, M., 1997. Radon transform techniques for alignment and 3D reconstruction from random projections. Scanning Microsc. Intl. Suppl. 11, 169-176.
Radon, J., 1917. Über die Bestimmung von Funktionen durch ihre Integralwerte längs gewisser Mannigfaltigkeiten. Ber. Verh. König. Sächs. Ges. Wiss. Leipzig, Math. Phys. 69, 262-267.
Saxton, W.O., Baumeister, W., 1982. The correlation averaging of a regularly arranged bacterial cell envelope protein. J. Microsc. 127, 127-138.
Sezan, M.S.H., 1982. Image restoration by the method of convex projections. II. Applications and numerical results. IEEE Trans. Med. Imag. MI-1, 95-101.
Shannon, C.E., 1949. Communication in the presence of noise. Proc. IRE 37, 10-21.
van Heel, M., Keegstra, W., Schutter, W., van Bruggen, E.J.F., 1982. Arthropod hemocyanin structures studied by image analysis. In: Leeds, W.E.J. (Ed.), Life Chemistry Reports (Suppl. 1). The Structure and Function of Invertebrate Respiratory Proteins, EMBO Workshop, pp. 69-73.
Wells, D., 1986. In: The Penguin Dictionary of Curious and Interesting Numbers. Penguin, Middlesex, England, pp. 36-49.

Fig. 3. Canonical orientations of T, O, and I subgroups with respect to Cartesian axes.
