Professional Documents
Culture Documents
Top-Down Design of Protein Architectures With Reinforcement Learning.
Top-Down Design of Protein Architectures With Reinforcement Learning.
M
generated (which increases with the number
ultisubunit protein assemblies play ful for biomedicine in immunobiology and of choices at each step). We chose to balance
critical roles in biology and are the other areas, as highlighted by the recent ap- these factors by using as building blocks para-
result of evolutionary selection for func- proval of a de novo–designed COVID vaccine metrically generated straight helices, which
tion of the entire assembly. Therefore, (20–23), the bottom-up approach does have are fully described by a single parameter (the
the subunits in structures such as icosa- limitations. The properties of the assembly length, which we allow to vary from nine to
hedral viral capsids often fit together almost are limited to what can be generated from the 22 residues), followed by short loops clus-
perfectly (1, 2). In contrast to direct evolu- available oligomeric building blocks, at least tered into 316 bins (derived from clustering
tionary selection on overall system properties, one of the subunit-subunit interfaces must be loops in a large helical protein database; see
de novo protein design has generated protein strong enough to stabilize a cyclic oligomeric the materials and methods). The search be-
architectures using a “bottom-up” hierarchical substructure in isolation, and, more generally, gins with the selection of one of the helix
approach (Fig. 1A, left) in which monomeric there is no way to directly optimize the prop- possibilities and then alternates between the
structures are first docked into symmetric erties of the overall assembly. addition of a loop or a helix choice at either
oligomers (3–6) and then assembled into We sought to overcome the limitations of terminus. Once a loop bin is chosen, we select
closed assemblies with tetrahedral, octahe- bottom-up protein complex design by devel- randomly from the closely related loop back-
dral, or icosahedral symmetry (7–14) or open oping a top-down approach (Fig. 1A, right) bones within the cluster (Fig. 1B, left). Although
assemblies such as two-dimensional (2D) lay- that starts from a specification of the desired this is a far narrower set of local structures
ers and 3D crystals (15–19). An advantage of properties (overall symmetry, porosity, etc.) of than observed in native protein structures,
this hierarchical approach is that the multi- the structure and systematically builds up we found in preliminary explorations that a
ple interfaces that stabilize the assembly can subunits that pack together to optimize these wide variety of compact protein shapes could
be validated independently (the first by char- properties. We reasoned that protein frag- be readily generated from such building blocks.
acterization of the symmetric oligomer and ment assembly (24–28), which can generate Building up a 100-residue protein backbone
the second by characterization of the nano- a wide variety of monomeric protein struc- with this approach requires about five helix
material assembly from the preformed oligo- tures, could provide a suitable mechanism and four loop additions, yielding a total
mer), considerably increasing the robustness for generating diversity. Previous design ap- number of possibilities of ~1 × 1017, with
of the overall design process. Although such proaches such as SEWING have built up pro- additional structural diversity from the var-
designed assemblies are already proving use- teins from fragments, optimizing for monomer iation in loop backbones within a bin. The
stability at each step (29), but we aimed in- size of the search tree grows exponentially
1
stead to optimize for overall system properties, with the number of structural elements, so
Department of Biochemistry, University of Washington,
Seattle, WA, USA. 2Institute for Protein Design, University of
which could involve trading off monomer the space of possibilities is more effectively
Washington, Seattle, WA, USA. 3Department of Bioengineering, stability for increased subunit-subunit inter- explored for monomers with fewer helices
University of Washington, Seattle, WA, USA. 4BioInnovation action strength and other properties. To en- than for larger monomers.
Institute, DK2200 Copenhagen N, Denmark. 5Howard Hughes
Medical Institute, University of Washington, Seattle,
able such end state–based optimization, we The search is modulated based on the spe-
WA, USA. 6Institute for Stem Cell and Regenerative Medicine, turned to reinforcement learning (RL), which cific problem specification through geometric
University of Washington, Seattle, WA, USA. 7Oral Health has achieved considerable success recently in constraints that are applied at each step in
Sciences, University of Washington, Seattle, WA, USA.
8 different fields of artificial intelligence, such the search tree and score functions that are
Key Laboratory of Structural Biology of Zhejiang Province,
School of Life Sciences, Westlake University, Hangzhou, as self-driving cars (30), the AlphaGo pro- evaluated only after full structures are com-
Zhejiang, China. 9School of Biological Sciences, Seoul gram that defeats top human players in the pleted. Potential moves consisting of helix
National University, Seoul, Republic of Korea. game of Go (31, 32), and algorithm develop- or loop fragments are selected at each level
*Corresponding author. Email: dabaker@uw.edu (D.B.);
swang523@uw.edu (S.W.) ment (33). Monte Carlo tree search (MCTS) of the search tree only if they pass geometric
†These authors contributed equally to this work. (34, 35) is an RL algorithm that finds optimal constraints that can be evaluated before the
assembly of the entire structure; these include in the search tree. Completed backbones are reweighted accordingly. As individual move
internal clashes and overall shape constraints evaluated using score functions that assess weights become increasingly biased after many
(see the materials and methods for a full list how well the overall generated structure sat- traversals through the search tree, the gen-
of geometric constraints). Upon selection of a isfies the user specification of the problem to erated complete backbones have higher and
move passing the geometric constraints, its be solved (Fig. 1C and materials and meth- higher scores (fig. S1). Because each itera-
probability is upweighted, as are the proba- ods), and the probabilities of selection of each tion takes on average only tens of millisec-
bilities of all prior moves leading to this point move at each step along the search tree are onds, high-scoring backbones can be sampled
at scale by searching over tens of thousands between the two rings, requiring dense pack- within a specified upper distance bound of the
of iterations. To address the classical RL prob- ing such that the only large void in the re- origin in a random orientation and performed
lem of balancing exploration with exploita- sulting assembly is the pore of the inner C6 10,000 iterations for each to generate a large
tion (30–32), the search is initialized from ring. Both the inner and the outer ring have set of diverse structures. The MCTS generated
many independent trees, and the maximum C6 symmetry, and the search tree was initial- closely packed icosahedral assemblies in silico,
probability of any one move is capped (see ized to start at the N termini of the outer ring which span a structural space distinct from
the materials and methods). and simultaneously build six subunits that that of native and previous de novo icosahe-
We first tested the MCTS approach in silico collectively fill the empty space. We performed dra, with shorter sequence lengths than any
at the protein monomer level, choosing as a the MCTS for each of 2000 placements of a previously described protein icosahedra and
test problem the generation of protein back- set of different inner rings with a range of porosities comparable to the densely packed
bones with arbitrarily prespecified overall inner pore sizes inside a constant outer ring capsids generated by evolution (Fig. 1D).
shapes. To our knowledge, there are no cur- (for each inner ring, we sampled rotations The MCTS method rapidly generates tens
rent approaches for addressing this problem. around and translations along the common of thousands of candidate icosahedral assem-
A specified build volume is represented on a cyclic symmetry axis). We selected backbones blies, and we experimented with approaches
grid, and the MCTS is initialized randomly that fully filled the space between the two for rapidly designing sequences that stabilize
within the volume. At each move, only addi- rings, designed sequences with ProteinMPNN these assemblies in a manner compatible with
tions that stay within the specified volume are (37), and selected for experimental character- our overall top-down approach. In previous
accepted. For a range of prescribed shapes, ization 32 designs predicted to assemble into bottom-up nanocage design studies, the se-
into an Escherichia coli expression vector, and above 65°C. nsEM analysis showed that assem- sion (Fig. 4C). Over the designed monomer,
the proteins produced in E. coli in a 96-well bly morphologies were retained after 1 hour of the root mean square deviation (RMSD) be-
format were purified by immobilized metal treatment at 95°C and subsequent cooling to tween the cryo-EM structure and the design
affinity chromatography (IMAC) pull-down. A 25°C (Fig. 3, F and H, and fig. S17). model is 0.76 Å (Fig. 4D); a single rotamer
total of 208 of the 368 designs were expressed To evaluate the accuracy of our design strat- flip (Phe63) and tilting of the C-terminal helix
and soluble as assessed by SDS–polyacrylamide egy, we determined the structures of SEC- results in a slight expansion of the overall
gel electrophoresis. To evaluate particle for- purified RC_I_1 and RC_I_2 capsid particles cage diameter, resulting in an RMSD over all
mation, we performed nsEM on the IMAC using cryo-EM (Fig. 4 and fig. S18). For RC_I_1, 60 subunits of 3.72 Å (Fig. 4E). For RC_I_2, the
elution fraction for each soluble sample. Two 3D reconstruction yielded a 2.5-Å-resolution 2.9-Å cryo-EM structure of design RC_I_2 was
designs (RC_I_1 and RC_I_2, RL capsid with I cryo-EM atomic model that closely matched even closer to the design model (Fig. 4, F and
symmetry, design 1 and 2) formed uniform the computational design (Fig. 4, A and B, G, and fig. S20), with RMSDs at the C2 and C5
particles with the expected size and shape and fig. S19). The N-terminal helices of two interfaces of 0.66 and 0.27 Å, respectively
(Fig. 3, A and B). Size-exclusion chromatog- monomers pack in an antiparallel fashion to (Fig. 4H). The RC_I_2 monomer adopts the
raphy (SEC) of both designs yielded single form the primarily hydrophobic C2 interface, designed three-helical bundle fold with a 0.59-Å
peaks with an apparent molecular weight in whereas the two helices near the C terminus RMSD to the design model (Fig. 4I), and the
the range expected for these assemblies (Fig. 3, form the C5 interface with their neighbors overall assembly is almost identical to the
C and D). The designed assemblies had the (Fig. 4, B and C). Small apertures (diameter design model with a 1.39-Å RMSD over all
expected alpha-helical circular dichroism (CD) ~13 Å) present at the C3 axes of the capsid 60 subunits (Fig. 4J). The C2 interface is
spectra and apparent melting temperatures make the N termini available for genetic fu- situated near the extended C terminus of the
larger F-domain–presenting icosahedral nano- than 100 nM Ang1. The F domain–displaying to investigating the effect of packing density
particle (I53-50) (12, 43); the elevated potency capsid is thus an exceptionally potent Tie2- on the elicitation of immune responses by
likely results from the higher surface display activating ligand. The designed capsid is also nanoparticle-based immunogens. As a first
density [to facilitate comparison, concentra- far easier to produce and much more stable step in this direction, we fused trimeric in-
tions at the bottom of Fig. 5D are in terms of than Ang1 and thus could be useful in stim- fluenza hemagglutinin (HA) to the N termi-
Fd monomer (0.16 nM capsid × 60 Fd copies ulating differentiation and regeneration. nus of I1-capsid RC_I_1 using a (GS)6 linker.
per capsid = 10 nM Fd)]. The 0.16 nM (10 nM The high surface presentation density en- The fusion protein was expressed and sec-
Fd) capsid also elicited stronger responses abled by the designed scaffolds provides a route reted from mammalian cells and clearly forms
HA-displaying particles according to SEC and HA remains antigenically intact when dis- influenza and is currently being evaluated in
nsEM (fig. S29 and Fig. 5F). Biolayer interfer- played on the surface of the capsids. We im- clinical trials (53). We found that HA-displaying
ometry showed binding of both 5J8 [anti-HA munized mice with HA-displaying RC_I_1, as RC_I_1 elicited a strong antibody response
head antibody (50)] and CR9114 [anti-HA well as a much larger icosahedral immunogen, against vaccine-matched HA that was greater
stem antibody (51)] immunoglobulin G to HA-I53_dn5 (52), which has previously been than that produced by the clinical vaccine can-
HA capsids (Fig. 5G), indicating that the shown to elicit protective responses against didate by a small but statistically significant
amount (Fig. 5H). These results indicate that 9. Y.-T. Lai et al., Nat. Chem. 6, 1065–1071 (2014). computational design and discussion; R. Kibler, D. Feldman,
the high antigen presentation density enabled 10. N. P. King et al., Nature 510, 103–108 (2014). L. Milles, and N. Ennist for help with the experiments; and
11. Y. Hsia et al., Nature 540, 150 (2016). S. Dickinson and J. Quipse for help in maintaining and operating
by top-down design can yield robust immune the electron microscopes used. Funding: This work was conducted
12. J. B. Bale et al., Science 353, 389–394 (2016).
responses. 13. Y. Hsia et al., Nat. Commun. 12, 2294 (2021). at the Advanced Light Source (ALS), a national user facility
14. R. Divine et al., Science 372, eabd9994 (2021). operated by Lawrence Berkeley National Laboratory on behalf of
Conclusion 15. C. J. Lanci et al., Proc. Natl. Acad. Sci. U.S.A. 109, 7304–7309 the Department of Energy, Office of Basic Energy Sciences,
(2012). through the Integrated Diffraction Analysis Technologies (IDAT)
Our top-down RL approach enables the solution program, supported by DOE Office of Biological and Environmental
16. J. C. Sinclair, K. M. Davies, C. Vénien-Bryan,
of design challenges inaccessible to previous Research. Additional support comes from the National Institutes of
M. E. M. Noble, Nat. Nanotechnol. 6, 558–562 (2011).
Health project ALS-ENABLE (NIH grant P30 GM124169) and a
bottom-up design methods. Cryo-EM struc- 17. S. Gonen, F. DiMaio, T. Gonen, D. Baker, Science 348,
High-End Instrumentation Grant S10OD018483. This work was also
tures confirm the design of 54- and 67-residue 1365–1368 (2015).
supported by the National Institute on Aging grant (grant
18. A. J. Ben-Sasson et al., Nature 589, 468–473
proteins that assemble into 60-subunit icosa- (2021).
1U19AG065156-01 to I.D.L. and D.B.); Amgen (S.W.), the Audacious
Project at the Institute for Protein Design (A.J.B., Z.L., L.C., H.R.-B.,
hedra with both internal monomer and over- 19. Z. Li et al., bioRxiv 2022.11.18.517014 [Preprint] (2022). Y.T.Z., D.B., and N.P.K.); the NIH/National Institute of Dental
all assembly structure nearly identical to the 20. A. C. Walls et al., Cell 183, 1367–1382.e17 and Craniofacial Research (NIDCR) (grant T90 DE021984 to Y.T.Z);
computational models, and of disk-shaped (2020). the National Institute of Allergy and Infectious Diseases (NIAID
21. P. S. Arunachalam et al., Nature 594, 253–258 grant 1P01AI167966 to N.P.K.); Novo Nordisk (foundation grant
nanopores generated by densely filling the (2021). NNF170C0030446 to C.N.); the Open Philanthropy Project
space between cyclic protein rings with differ- 22. P. S. Arunachalam et al., bioRxiv [Preprint] Universal Flu Vaccine and Improving Rosetta Design (A.D., N.P.K.,
ent diameters. Both the icosahedra and the (2022). and D.B.); a Microsoft gift (M.B.); and the Department of Defense
23. J. Y. Song et al., EClinicalMedicine 51, 101569 Peer Reviewed Medical Research Program (award W81XWH-21-1-
disk designs are distinct from any previously
(2022).
Science (ISSN ) is published by the American Association for the Advancement of Science. 1200 New York Avenue NW, Washington, DC
20005. The title Science is a registered trademark of AAAS.
Copyright © 2023 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim
to original U.S. Government Works