Professional Documents
Culture Documents
Craig Venter - Design and Synthesis of A Minimal Bacterial Genome
Craig Venter - Design and Synthesis of A Minimal Bacterial Genome
Craig Venter - Design and Synthesis of A Minimal Bacterial Genome
Design and synthesis of a minimal ficial genome (5). We have been working since
then to produce the tools needed to accomplish
this. We developed methods to chemically syn-
bacterial genome thesize the M. genitalium genome (7). However,
M. genitalium grows very slowly, so we turned
to the faster-growing M. mycoides genome as
Clyde A. Hutchison III,1*† Ray-Yuan Chuang,1†‡ Vladimir N. Noskov,1
our target for minimization. We developed the
Nacyra Assad-Garcia,1 Thomas J. Deerinck,2 Mark H. Ellisman,2 John Gill,3 method of genome transplantation, which al-
Krishna Kannan,3 Bogumil J. Karas,1 Li Ma,1 James F. Pelletier,4§ Zhi-Qing Qi,3 lowed us to introduce M. mycoides genomes, as
R. Alexander Richter,1 Elizabeth A. Strychalski,4 Lijie Sun,1|| Yo Suzuki,1 isolated DNA molecules, into cells of a different
Billyana Tsvetanova,3 Kim S. Wise,1 Hamilton O. Smith,1,3 John I. Glass,1 species, M. capricolum (8, 9). In this process, the
Chuck Merryman,1 Daniel G. Gibson,1,3 J. Craig Venter1,3* M. capricolum genome is lost, resulting in a cell
containing only the transplanted genome. In 2010,
We used whole-genome design and complete chemical synthesis to minimize the we reported the complete chemical synthesis and
1079–kilobase pair synthetic genome of Mycoplasma mycoides JCVI-syn1.0. An initial installation of the genome of M. mycoides JCVI-
design, based on collective knowledge of molecular biology combined with limited syn1.0 [1,078,809 base pairs (bp) (10); hereafter
transposon mutagenesis data, failed to produce a viable cell. Improved transposon abbreviated syn1.0). This genome was an almost
mutagenesis methods revealed a class of quasi-essential genes that are needed for robust exact copy of the wild-type M. mycoides genome,
growth, explaining the failure of our initial design. Three cycles of design, synthesis, and with the addition of a few watermark and vector
testing, with retention of quasi-essential genes, produced JCVI-syn3.0 (531 kilobase pairs, sequences.
473 genes), which has a genome smaller than that of any autonomously replicating cell Genome reduction in bacteria such as E. coli
found in nature. JCVI-syn3.0 retains almost all genes involved in the synthesis and and B. subtilis has previously been achieved by
processing of macromolecules. Unexpectedly, it also contains 149 genes with unknown a series of sequential deletion events (11, 12).
biological functions. JCVI-syn3.0 is a versatile platform for investigating the core functions After each deletion, viability, growth rate, and
of life and for exploring whole-genome design. other phenotypes can be determined. In con-
C
trast to this approach, we set out to design a
ells are the fundamental units of life. The deciphering the operating system of the cell reduced genome, then build and test it. We
genome sequence of a cell may be thought was a daunting task. initially designed a hypothetical minimal ge-
of as its operating system. It carries the We have long been interested in simplifying nome (HMG) based on a combination of exist-
code that specifies all of the genetic func- the genomic software of a bacterial cell by elim- ing transposon mutagenesis and deletion data
tions of the cell, which in turn determine inating genes that are nonessential for cell growth (13) and cumulative knowledge of molecular
the cellular chemistry, structure, replication, and under ideal conditions in the laboratory. This biology from the literature (14).
other characteristics. Each genome contains in- facilitates the goal of achieving an understanding We designed the genome to be built in eight
structions for universal functions that are com- of the molecular and biological function of every segments, each of which could be independently
mon to all forms of life, as well as instructions gene that is essential for life. To survive in nature, tested for viability in the context of a seven-
that are specific to the particular species. The most bacterial cells must be capable of adapting eighths syn1.0 genome (i.e., a syn1.0 genome that
genome is dependent on the functions of the cell to numerous environments. Typical well-studied is seven-eighths complete). Initially, only one of
cytoplasm for its expression. In turn, the proper- bacteria such as Bacillus subtilis and Escherichia the designed HMG segments produced a viable
ties of the cytoplasm are determined by the in- coli carry 4000 to 5000 genes. They are highly genome. Improvements to our global transposon
structions encoded in the genome. The genome adaptable, because many of their genes provide mutagenesis method allowed us to reliably clas-
can be viewed as a piece of software; DNA se- functions that are needed only under certain sify genes as essential or nonessential and to
quencing allows the software code to be read. In growth conditions. Some bacteria, however, grow identify quasi-essential genes that are needed for
1984, Morowitz proposed the simplest cells ca- in restricted environments and have undergone robust growth, though not absolutely required
pable of autonomous growth, the mycoplasmas, genome reduction over evolutionary time. They [figs. S1 to S4 (9); a similar result in M. pneumoniae
as models for understanding the basic principles have lost genes that are unnecessary in a stable is presented in (15)]. We also established rules for
of life (1). A key early step in his proposal was the environment. The mycoplasmas, which typically removing genes from our genome design with-
sequencing of a mycoplasma genome, which grow in the nutrient-rich environment of animal out disturbing the expression of the remaining
we accomplished for Mycoplasma genitalium hosts, have the smallest known genomes of any genes. Methods that we developed in the course
in 1995 (2). Even with the sequence in hand, autonomously replicating cells. A comparison of building syn1.0 (10) provide a way to build a
of the first two available genome sequences, new genome as a centromeric plasmid in yeast
1
J. Craig Venter Institute, La Jolla, CA 92037, USA. 2National
Haemophilus influenzae [1815 genes (3)] and M. and to test it for viability and other phenotypic
Center for Microscopy and Imaging Research, University of genitalium [the smallest known mycoplasma ge- traits after transplantation into an M. capricolum
California–San Diego, La Jolla, CA 92037, USA. 3Synthetic nome; 525 genes (2)], revealed a common core of recipient cell. These methods, along with improve-
Genomics, La Jolla, CA 92037, USA. 4National Institute of only 256 genes, much smaller than either ge- ments described here, make up a design-build-
Standards and Technology, Gaithersburg, MD 20899, USA.
*Corresponding author. E-mail: chutchis@jcvi.org (C.A.H.);
nome. This was proposed to be the minimal gene test (DBT) cycle (Fig. 1).
jcventer@jcvi.org (J.C.V.) †These authors contributed equally to set for life (4). Here we report a new cell, JCVI-syn3.0 (ab-
this work. ‡Present address: American Type Culture Collection, In 1999, to put this comparative study to an breviated syn3.0), that is controlled by a
10801 University Boulevard, Manassas, VA 20110, USA. §Present experimental test, we introduced the method 531–kilobase pair (kbp) synthetic genome that
address: Center for Bits and Atoms, Massachusetts Institute of
Technology, Room E15-401, 20 Ames Street, Cambridge, MA
of global transposon mutagenesis (5), which al- encodes 438 proteins and 35 annotated RNAs.
02139, USA. ||Present address: CBRITE, 11575 Sorrento Valley lowed us to catalog 150 nonessential genes in It is a working approximation to a minimal cell.
Road, Suite 204, San Diego, CA 92121, USA. M. genitalium (6) and predict a set of 375 essen- Its genome is substantially smaller than that of
Fig. 5. Map of proteins in syn3.0 and homologs found in other organisms. Searches using BLASTP software were performed for all syn3.0 protein-coding
genes against a panel of 14 organisms ranging from non-Mycoides mycoplasmas to Archaea. A score of 1e−5 was used as the similarity cutoff. From left to right, five
classes (equivalog, 232 genes; probable, 58 genes; putative, 34 genes; generic, 84 genes; and unknown, 65 genes) proceed from nearly complete certainty about a
gene’s activity (equivalog) to no functional information (unknown). White space indicates no homologs to syn3.0 in that organism.
out of yeast into an M. capricolum recipient cell. Mobile elements and DNA restriction 0 73
.....................................................................................................................................................................................................................
Several were viable. One of these, RGD3.0 clone Total 473 428
.....................................................................................................................................................................................................................
g-19 (table S4), was selected for detailed analysis
and named JCVI-syn3.0.
A final round of Tn5 mutagenesis was per- passaging (P4). Nonessential vector genes and in genes that were originally classified as quasi-
formed on syn3.0 to determine which genes intergenic sequences are the most frequent in- essential make up almost the whole population
continue to show Tn5 insertions after serial sertion sites. As expected, cells with insertions of P4 cells that have insertions in mycoplasma
genes. The genes in syn3.0 are predominantly 65% of the deleted genes. In addition, because of Lastly, 81 genes (17%) that are primarily involved
e- or i-genes, based on the original syn1.0 classi- the rich growth medium that supplies almost all in cytosolic metabolism are retained in the cat-
fication. Of these, only the i-genes can tolerate of the necessary small molecules, many genes in- egories of nucleotide salvage, lipid salvage and
Tn5 insertions without producing lethality. The volved in transport, catabolism, proteolysis, and biogenesis, proteolysis, metabolic processes, redox
most highly represented in-, i-, and ie-genes are other metabolic processes have become dispens- homeostasis, transport and catabolism of non-
shown in table S5. It might be possible to remove able. For example, because glucose is plentiful in glucose carbon sources, and glucose transport
a few of these in a fourth DBT cycle, but there the medium, most genes for transport and catab- and glycolysis (Fig. 6).
would probably be further erosion of growth olism of other carbon sources have been deleted We presume that most of the 79 genes that
rate. In addition, a dozen genes that were orig- (34 out of 36), whereas all 15 genes involved in glu- are not assigned to a functional category be-
inally classified as nonessential continue to re- cose transport and glycolysis have been retained. long to one or another of these same four major
tain that classification (table S5 and database S1). In contrast, almost all of the genes involved in groups (gene expression, genome preservation,
the machinery for reading and expressing the membrane structure and function, and cytosolic
In syn3.0, 149 genes cannot be assigned genetic information in the genome and in ensuring metabolism). Among these 79 genes, 65 have com-
a specific biological function the preservation of genetic information across pletely unknown functions and 24 are classified as
Syn3.0 has 438 protein- and 35 RNA-coding genes. generations have been retained. The first of these generic, such as in the case of a hydrolase for
We organized the 473 genes into five classes, based two fundamental life processes, the expression which neither the substrate nor the biological role
on our level of confidence in their precise func- of genetic information as proteins, requires the is discernible. The other 60 of the 84 genes in the
tions: equivalog, probable, putative, generic, and retention of 195 genes in the categories of tran- generic class were assigned to a functional cat-
unknown (Fig. 5 and database S1). Many of these scription, regulation, RNA metabolism, transla- egory on the basis of their generic classification.
genes have been studied exhaustively, and their tion, protein folding, RNA (rRNA, tRNA, and small For example, an ABC transporter is assigned to
primary biological functions are known. RNAs), ribosome biogenesis, rRNA modification, membrane transport, even though the substrate
We used the TIGRfam equivalog family of and tRNA modification. The second of these two is unknown. Some of the unassigned essential
hidden Markov models (20) to annotate equiv- fundamental processes, the preservation of genome genes match domains of unknown function that
alog genes (~49% of the genes). The less certain sequence information, requires the retention of have been found in a wide variety of organisms.
classes were defined in a stepwise manner (Fig. 34 genes in the categories of DNA replication,
5). The probable class included genes that scored DNA repair, DNA topology, DNA metabolism, chro- Syn3.0 has a doubling time of 3 hours
well against unambiguous TIGRfam mathemat- mosome segregation, and cell division. These two and is polymorphic in appearance
ical models but that nevertheless scored below processes together require 229 (48%) of the Comparison of syn3.0 with the starting cell,
the trusted cutoff. These genes were consistently 473 total genes in syn3.0 (Fig. 6). syn1.0 (Fig. 7A) (9), showed that they have simi-
supported by other lines of evidence. Genomic In addition to the two vital processes just lar colony morphologies, which are characteristic
context and threading alignment to crystal struc- described, another major component of living of the natural, wall-less Mycoplasma mycoides
tures both agreed with the assignment. The pu- cells is the cell membrane that separates the outer subspecies capri on which the synthetic syn1.0
tative class included genes that were similarly medium from the cytoplasm and governs molec- genome was originally based (10). The smaller
supported by multiple lines of evidence; at the ular traffic into and out of the cell. It is an iso- colony size of syn3.0 suggested a slower growth
same time, either their scores, genomic context, latable structure, and many of the syn3.0 genes rate and perhaps an altered colony architecture
or alignment to structures with known activities code for its protein constituents. Because our on the solid medium. A corresponding reduction
were not convincing. The generic class included minimal cell is largely lacking in biosynthesis of in the growth rate of syn3.0 in static liquid cul-
genes encoding clearly identifiable proteins (e.g., amino acids, lipids, nucleotides, and vitamins, it ture (Fig. 7B), from a doubling time of ~60 min
kinase) but lacking consistent clues as to their depends on the rich medium to supply almost all for syn1.0 to ~180 min, confirmed the lower in-
substrates or biological role. Unknown genes were of these required small molecules. This neces- trinsic rate of propagation for syn3.0. This rate,
those that could not be reliably categorized with sitates numerous transport systems within the however, greatly exceeds the 16-hour doubling
regard to a putative activity. membrane. In addition, the membrane is rich time of M. genitalium (21).
Thus, biological functions could not be as- in lipoproteins. Membrane-related genes account In contrast to the anticipated reduction in
signed for the ~31% of the genes that were for 84 (18%) of the 473 total syn3.0 genes. In- growth rate, we found unexpected changes in
placed in the generic and unknown classes. cluded categories from Table 1 are lipoproteins, macro- and microscopic growth properties of
Nevertheless, potential homologs for a number cofactor transport, efflux systems, protein trans- syn3.0 cells. Whereas syn1.0 grew in static cul-
of these were found in diverse organisms. Many port, and other membrane transport systems. ture as nonadherent planktonic suspensions of
of these genes probably encode universal pro-
teins whose functions are yet to be character- Fig. 6. Partition of
ized. Each of the five sectors has homologs in genes into four major
species ranging from mycoplasma to humans. functional groups.
However, some of each annotation class is blank, Syn3.0 has 473 genes.
indicating that no homologs for these genes were Of these, 79 have no
found among the organisms chosen for display assigned functional
in Fig. 5. Because mycoplasmas evolve rapidly, category (Table 1). The
some of the white space in Fig. 5 corresponds to remainder can be
sequences that have diverged so as to align poor- assigned to four major
ly with representatives from other organisms functional groups:
In Table 1, we have assigned syn1.0 genes to (i) expression of
30 functional categories and indicated how many genome information
were kept or deleted in syn3.0. Of the 428 deleted (195 genes); (ii) preser-
genes, the largest group is the functionally un- vation of genome infor-
assigned genes; 134 out of 213 have been deleted. mation (34 genes); (iii)
All of the 73 mobile element and DNA modifica- cell membrane struc-
tion and restriction genes have been removed, as ture and function (84
well as most genes encoding lipoproteins (72 out genes); and (iv) cytoso-
of 87). These three categories alone account for lic metabolism (81 genes). The percentage of genes in each group is indicated.
predominantly single cells with a diameter of cleotide level. This effort was meant to clarify of two genetic elements. Despite extensive re-
~400 nm (10), syn3.0 cells formed matted sedi- whether gene order and gene sequence are organization, the resulting cell grew about as fast
ments under the same conditions. Microscopic major contributors to cell viability. The ability as syn1.0, as judged by colony size. Thus, the
images of these undisturbed cells revealed ex- to invert large sections of DNA in many ge- details of genetic organization impinge upon
tensive networks of long, segmented filamen- nomes has demonstrated that overall, gene survival in hypercompetitive natural environ-
tous structures, along with large vesicular bodies order is not critical. We showed that fine-scale ments, but the finer details are apparently not
(Fig. 7C), which were particularly prevalent at late gene order is also not a major factor in cell critical for life.
stages of growth. Both of these structures were survival. About an eighth of the genome was
easily disrupted by physical agitation, yet such sus- reconfigured into seven contiguous DNA cas- Recoding and rRNA gene replacement
pensions contained small replicative forms that settes, six of which represented known bio- provide examples of genome plasticity
passed through 0.2-mm filters to render colony- logical systems; the seventh cassette contained Our DBT cycle for bacterial genomes allows us
forming units (CFU). This same procedure retained genes whose system-level assignment was some- to assess the plasticity of gene content in terms
99.9% of the CFU in planktonic syn1.0 cultures. what equivocal. The vertical bar on the right of sequence and functionality. This includes
side of Fig. 8 specifies the biological systems. testing modified versions of genes that are fun-
Exploring the design of Individual genes (colored horizontal lines) and damentally essential for life. We tested whether
reorganized genomes intergenic regions (black lines) can be traced an altered 16S rRNA gene (rrs), which is essen-
To further refine our genome-design rules, we from their native location to their new positions tial, could support life (Fig. 9A). The single copy
also investigated prospects for logically orga- by following a line from left to right. Intersecting of the syn3.0 rrs gene was designed and synthe-
nizing genomes and recoding them at the nu- lines represent a change in the relative position sized to include seven single-nucleotide changes
Fig. 7. Comparison of syn1.0 and syn3.0 growth features. (A) Cells derived of determination (R2) are shown. (C) Native cell morphology in liquid culture
from 0.2 mm–filtered liquid cultures were diluted and plated on agar medium was imaged in wet mount preparations by means of differential interference
to compare colony size and morphology after 96 hours (scale bars, 1.0 mm). contrast microscopy (scale bars, 10 mm). Arrowheads indicate assorted forms
(B) Growth rates in liquid static culture were determined using a fluorescent of segmented filaments (white) or large vesicles (black). (D) Scanning electron
measure (relative fluorescent units, RFU) of double-stranded DNA accu- microscopy of syn1.0 and syn3.0 (scale bars, 1 mm). The picture on the right
mulation over time (minutes) to calculate doubling times (td). Coefficients shows a variety of the structures observed in syn3.0 cultures.
corresponding to those contained in the rrs gene containing three essential genes (era, recO, and can be built by chemical synthesis. As the de-
of M. capricolum. In addition, we replaced helix glyS) to determine its role. Specifically, we mod- tailed genetic requirements for life are discov-
h39 (35 nucleotides) with that from a phyloge- ified this region to include (i) M. mycoides codon ered, it will become possible to design whole
netically distant E. coli rrs counterpart. This adaptation index (CAI), but with the unusual genomes from first principles, build them by
unique 16S gene was successfully incorporated start codons recoded and tryptophan encoded chemical synthesis, and then bring them to life
into syn3.0 without noticeably affecting growth by the TGG codon, instead of by TGA; (ii) E. coli by installation into a receptive cellular environ-
rate. Some other variants of the rrs gene were CAI, with tryptophan still encoded by TGA; or ment. We have applied this whole-genome de-
constructed but proved nonviable. This demon- (iii) E. coli CAI with standard codon usage (TGG sign and synthesis approach to the problem of
strates our ability to test the plasticity of a gene encoding tryptophan) (Fig. 9B). Unexpectedly, minimizing a cellular genome.
sequence and, at the same time, provides a water- we found that all three versions were functional A minimal cell is usually defined as a cell in
mark by which to quickly identify this strain. and resulted in M. mycoides cells without no- which all genes are essential. This definition is
We also tested the underlying codon usage ticeable growth differences. However, large-scale incomplete, because the genetic requirements
principles in the M. mycoides genome, which changes in codon usage may need to accompany for survival, and therefore the minimal genome
has extremely high adenine and thymine (AT) modifications in the tRNA dosage levels to en- size, depend on the environment in which the
content. M. mycoides uses TGA as a codon for sure efficient translation. cell is grown. The work described here has been
the amino acid tryptophan, instead of a stop conducted in medium that supplies virtually all
codon, and occasionally uses nonstandard start Discussion and conclusions the small molecules required for life. A minimal
codons; in addition, the codon usage is heavily Genomics is moving from a descriptive phase, genome determined under such permissive con-
biased toward high-AT content. We modified in which genomes are sequenced and analyzed, ditions should reveal a core set of environment-
this uncommon codon usage in a 5-kbp region to a synthetic phase, in which whole genomes independent functions that are necessary and
sufficient for life. Under less permissive condi-
tions, we expect that additional genes will be
required.
There is a large body of literature concerning
the minimal cell concept and minimal sets of
essential genes in a number of organisms [for a
review, see (22)]. Work in the area has focused
on comparative genomic analyses and on expe-
riments in which genes are individually knocked
out or disrupted by transposon insertion. Such
studies identify a core of essential genes, often
about 250 in number. But this is not a set of
genes that is sufficient to constitute a viable
cellular genome, because redundant genes for
essential functions are scored as nonessential
in these studies.
In contrast, we set out to construct a mini-
mal cellular genome in order to experimentally
determine a core set of genes for an indepen-
dently replicating cell. We designed a genome
using genes from M. mycoides JCVI-syn1.0 (10).
This mycoplasma cell has several advantages
for this purpose. First, the mycoplasmas already
have very small genomes. They have evolved
from gram-positive bacteria with larger genomes
by losing genes that are unnecessary in their
niche as mammalian parasites. They are already
far along an evolutionary pathway to a minimal
genome, and consequently they are likely to have
fewer functionally redundant genes than other
bacteria. We also have a highly developed set of
tools for building this genome and for assembling
and manipulating the genome as an extra chro-
mosome in yeast.
Our initial attempt to design a minimal genome
was based on the current collective knowledge
Fig. 8. Reorganization of gene order in seg- of molecular biology, in combination with lim-
ment 2. Genes involved in the same process ited data concerning transposon disruption of
were grouped together in the design for “mod- genes, which provided additional information
ularized” segment 2. At the far left, the gene order of about gene essentiality. This information was
syn1.0 segment 2 is indicated. Genes deleted in syn3.0 particularly valuable with respect to the genes
are indicated by faint gray lines. Retained genes are indi- of unknown function. Specific experimental pro-
cated by colored lines matching the functional categories to which posals for minimal genome construction have
they belong, which are shown on the right. Each line connects the been made solely on the basis of accumulated
position of the corresponding gene in syn1.0 with its position in the knowledge concerning the genes that are in-
modularized segment 2. Black lines represent intergenic sequences containing promoters or transcrip- volved in fundamental biological processes (14).
tional terminators. Our HMG was assembled from eight segments
and proved nonviable, although one of the seg- fore, to obtain better information concerning Any design, viable or not, can be built in yeast
ments (segment 2) was functional when tested in gene essentiality, we made major improvements and then tested to determine whether it can
the context of the other seven syn1.0 segments. in our transposon mutagenesis methods. function as the genome of a viable bacterium.
These results convinced us that initially, we did To produce a genome containing all of the After four DBT cycles (genome designs HMG,
not have sufficient knowledge to design a func- essential and quasi-essential genes, we devel- RGD1.0, RGD2.0, and RGD3.0), we obtained a
tional minimal genome from first principles. There- oped a DBT cycle for bacterial genomes (Fig. 1). viable genome with all eight segments reduced,
Fig. 9. Gene content and codon usage principles, tested using the were modified in three different ways: using M. mycoides CAI with TGG
DBT cycle. (A) Secondary structure of the modified rrs gene that was encoding tryptophan, E. coli CAI with TGG encoding tryptophan, or E. coli
successfully incorporated into the syn3.0 genome; this gene was car- CAI with TGA encoding tryptophan. GC content of the wild-type and
rying M. capricolum mutations and had its h39 (inset) swapped with that modified genes is noted. The JCat codon adaptation tool was used for this
of E. coli. Positions with nucleotide changes are indicated by red arrows, exercise (www.jcat.de) to optimize the three open reading frames, with the
and E. coli numbering is used to indicate the position of M. capricolum exception of the overlapping gene fragment. Green and purple indicate
mutations. (B) The sequences of the essential genes era, recO, and glyS wild-type and codon-optimized sequences, respectively.
Table 2. Reduced genomes resulting from the DBT cycles, ultimately leading to syn3.0. Column 1 indicates the round of genome design (dashes indicate the
starting genome, syn1.0), column 2 gives the size of the designed genome (in kilobase pairs), and column 3 gives the number of mycoplasma genes in the design.
Column 4 shows the genome composition for key viable cell strains; for nonviable designs, a viable strain with the highest number of segments from the design is
shown, as well as a more robust alternative for RGD1.0 (fourth row) and a smaller derivative for RGD2.0 (sixth row, syn2.0). Column 5 gives the size of the genome
corresponding to column 3, and column 6 shows a quantitative or qualitative estimate of the growth rate of cells with the genome described in column 4.
- 901 syn1.0: all eight syn1.0 segments 1079 kbp Doubling time, td = 60 min
............................................................................................................................................................................................................................................................................................................................................
HMG 483 kbp 460 HMG segment 2 + 7/8 syn1.0 1003 kbp Slow-growing
............................................................................................................................................................................................................................................................................................................................................
RGD1.0 544 kbp 483 RGD1.0 segments 2,6,7,8 + syn1.0 segments 1,3,4,5 758 kbp Slow-growing
............................................................................................................................................................................................................................................................................................................................................
W............................................................................................................................................................................................................................................................................................................................................
W RGD1.0 segments 1,2,4,6,8 + syn1.0 segments 3,5,7 718 kbp Slow-growing
RGD2.0 575 kbp 512 RGD2.0 segments 1,2,3,4,6,7,8 + syn1.0 segment 5 617 kbp ?
............................................................................................................................................................................................................................................................................................................................................
syn2.0: RGD2.0 segments 1,2,3,4,6,7,8 + syn1.0 segment
W W 576 kbp td = 92 min
5 with genes MMSYN1_0454 to -0474 and MMSYN1_0483 to -0492 deleted
............................................................................................................................................................................................................................................................................................................................................
RGD3.0 531 kbp 473 syn3.0: all eight segments of RGD3.0 531 kbp td = 180 min
............................................................................................................................................................................................................................................................................................................................................
syn3.0. Table 2 summarizes the process leading in metabolism (81 genes, 17%). During reductive nisms far simpler than the complex division ap-
to syn3.0. The first three designs did not yield evolution as a mycoplasma, many biosynthetic paratus in most eubacteria may be sufficient.
complete viable cellular genomes. But in each genes were lost and replaced by transporters re- First, several types of bacterial cells, both
case, one or more of the eight segments yielded a siding in the membrane, resulting in a trade-off natural (26) and experimentally manipulated
viable genome when combined with syn1.0 seg- between these two categories. A relatively small (27, 28), have been shown to divide in the absence
ments for the remainder of the genome. The number of genes function in the replication of of key cytoskeletal components, most notably the
composition of several of these intermediate the genome and the preservation of genomic in- FtsZ cytoskeletal scaffolding and force-generating
strains is listed in the table. The viable syn3.0 formation through cell division (36 genes, 7%). component. Through our empirically based de-
cell is our best approximation to a minimal Unexpectedly, there are 79 genes (17%) that we sign process, a nonessential gene cluster present in
cell. We obtained another strain with a genome have been unable to assign to a functional cat- syn1.0 (MMYSYN1_0520 through MMYSYN1_0522)
smaller than any free-living cell found in nature egory. Of these, 19 are in the essential category was removed during construction of syn3.0 cells.
(syn2.0). This cell has a genome consisting of (e-genes), 36 are needed for rapid growth (i- or This contained orthologs of ftsZ and sepF [encod-
seven RGD2.0 segments plus a syn1.0 segment 5 ie-genes), and 24 are nonessential or nearly so (n- ing a membrane-anchoring component that inter-
with 31 genes deleted. or in-genes). We presume that most of these will acts with FtsZ (29)]. An adjacent gene, ftsA, which
Syn3.0 has a 531-kbp genome that encodes 473 fall into one of the four major categories described is reported to share some redundant functions with
gene products. It is substantially smaller than above (gene expression, membrane structure and sepF in other systems (30), remained essential in
M. genitalium (580 kbp), which has the smallest function, cytosolic metabolism, and genome pres- progressively reduced constructs that lacked ftsZ.
genome of any naturally occurring cell that has ervation), but it seems likely that some of them Second, completely synthetic lipid vesicles
been grown in pure culture. The syn3.0 genome may perform previously undescribed biological have been shown to spontaneously segregate
contains the core set of genes that are required functions. In particular, 13 of the 19 functionally without the involvement of macromolecular
for cellular life, but it is only half the size of syn1.0 unassigned e-genes are of completely unknown scaffolding or catalysis (31). In propagating cell
(10). In comparing the HMG to the subsequently function. Some of these match genes of unknown wall–deficient bacteria, alteration of the lipid
derived viable syn3.0 genome, we found agree- function in other bacteria or even in eukaryotes, content and properties of the plasma membrane
ment among 329 deleted genes and 365 retained and these are prime candidates for proteins with have been shown to elicit analogous membrane
genes. However, 111 genes that were kept in novel functions. Genes of unknown function that vesicle segregation (32). In several wall-less my-
syn3.0 were deleted from HMG, and 100 genes are required in syn3.0 and present in most orga- coplasma species, filamentous and large-vesicle
deleted in syn3.0 were kept in HMG. The dis- nisms must represent nearly universal functions morphotypes similar to those in syn3.0 have long
crepancies were primarily due to the sparseness and thus can provide biological insights. Like- been observed under certain growth conditions,
and quality of the initial transposon data, which wise, unknown genes without homologs may be depending in part on the nature of lipid precur-
resulted in incorrect identification of a number novel, or they may represent unusual sequences sors available to these cells (33). Ultimately,
of essential or nonessential genes and did not but well-understood functions. In contrast to the understanding the genetic and mechanistic
identify quasi-essential genes that affect growth wholly unknown genes, it is easy to oversimplify basis for the phenotype of syn3.0 propagation
rate (discussed below). One example of the im- a gene’s putative role in cell survival if it has a may shed light on the minimal requirements for
portance of the quasi-essential gene classifica- generic functional assignment. For example, segregation of the membrane-bounded cellular
tion was the case of four genes (MMSYN_0008 to some of the numerous hydrolases and kinases compartment that is essential for a living cell.
MMSYN_0011) that make up the transport sys- will undoubtedly contribute to processes such
tem for nucleosides. The original annotation of as nucleotide or cofactor salvage. The question is, Use of the DBT cycle for applications
the system was as a ribose/galactose ABC trans- will all of the generic functions of the unknown other than genome minimization
porter, which led us to target it for deletion in the genes be so commonplace, or do some of them Our main focus here has been the application
HMG. Our initial transposon data showed that represent fundamentally new processes? There of the whole-genome DBT cycle to a specific
all four genes were hit heavily and appeared to are genes whose generic annotations are per- problem, the construction of a minimal cellular
confirm that the genes were nonessential. How- plexing even though they are needed for survival. genome. However, the approach we describe can
ever, in later transposon mapping experiments, For example, there are six different efflux systems, be applied to the construction of a cell with any
P0 transposon data confirmed that they were hit encoded by the genes MMSYN1_0034, MMSYN1_0371 desired properties. For example, a cell could be
heavily, but after serial passage to deplete slow- and MMSYN1_0372, MMSYN1_0399, MMSYN1_0531, designed with added metabolic pathways (34),
growing cells, all four genes received zero hits, MMSYN1_0639, and MMSYN1_0691. Except for the an altered genetic code (35), or dramatically al-
confirming that they were quasi-essential and heterodimeric pair, MMSYN1_0371 and MMSYN1_0372, tered gene arrangements. We have begun to de-
should have been retained. which may be a flippase, the substrates and func- sign genomes with modified 16S rRNA sequences
tions of these proteins are unclear. It is some- and to assess the effects of dramatic alterations
Gene content of syn3.0 what disconcerting to imagine that all of these in codon usage. Application of our DBT cycle is
Syn3.0 is a working approximation of a minimal exclude or remove toxic substances. Similarly, a limited only by our ability to produce designs with
cell. Our first synthetic cell, syn1.0, contained rather complex pathway (23) for producing and a reasonable chance of success. With increasing
901 mycoplasma genes plus some watermarks and exporting glycoglycerolipids was required. Although knowledge of the functions of essential genes that
vector sequences. Of these, 428 have been re- there is some evidence that galactofuranose res- are presently unknown, and with increasing ex-
moved in syn3.0, leaving 438 protein-coding genes idues are important for membrane integrity (24), perience in reorganizing the genome, we expect
and 35 RNA genes. More genes could probably a detailed explanation for the biological role that our design capabilities will strengthen. The
be removed while retaining viability, but it seems fulfilled by glycoglycerolipids remains obscure. ability to design cells in which the function of
likely that growth rate would be compromised. every gene is known should facilitate complete
The slower growth rate of syn3.0 is not due to the Phenotype of the syn3.0 cell computational modeling of the cell (36). This would
removal of one of the rRNA operons. We also con- The replication of genomic information and its co- make it possible to calculate the consequences of
structed a strain with the same gene complement, ordinated distribution into segregated membrane- adding pathways for the production of useful prod-
except that it retained both rRNA operons, and bound cellular compartments are hallmarks of ucts, such as drugs or industrial chemicals, and
this strain grew at close to the same rate as syn3.0. extant living systems that are commonly con- would lead to greater efficiency in development.
The largest group of genes retained in syn3.0 sidered to be among the attributes that define
is involved in gene expression (195 genes, 41%). cellular life (25). The minimal requirements for Methods summary
Approximately equal numbers of genes are in- this process are not known, but evidence from Our methods for the identification of nonessential
volved in the cell membrane (84 genes, 18%) and disparate fields of study suggests that mecha- genes by global Tn5 mutagenesis, manipulation
of bacterial genomes in yeast by the scarless 16. A. Ramon, H. O. Smith, Single-step linker-based combinatorial 32. R. Mercier, P. Domínguez-Cuevas, J. Errington, Crucial role for
TREC (tandem repeat coupled with endonuclease assembly of promoter and gene cassettes for pathway membrane fluidity in proliferation of primitive cells. Cell Reports
engineering. Biotechnol. Lett. 33, 549–555 (2011). 1, 417–423 (2012). doi: 10.1016/j.celrep.2012.03.008;
cleavage) deletion method, synthesis and assembly doi: 10.1007/s10529-010-0455-x; pmid: 21107654 pmid: 22832271
of reduced genomes, genome transplantation, mi- 17. C. Merryman, D. G. Gibson, Methods and applications for 33. S. Razin, B. J. Cosenza, M. E. Tourtellotte, Filamentous
croscopic analysis of cells with reduced genomes, assembling large DNA constructs. Metab. Eng. 14, 196–204 growth of mycoplasma. Ann. N. Y. Acad. Sci. 143, 66–72
and observation of their growth characteristics are (2012). doi: 10.1016/j.ymben.2012.02.005; (1967). doi: 10.1111/j.1749-6632.1967.tb27645.x;
pmid: 22629570 pmid: 4861145
described in detail in the supplementary materials. 18. V. N. Noskov, L. Ma, S. Chen, R. Y. Chuang, Recombinase- 34. M. J. Smanski et al., Functional optimization of gene clusters
General information about our methods, accom- mediated cassette exchange (RMCE) system for functional by combinatorial design and assembly. Nat. Biotechnol. 32,
panied by specific references to the supplementary genomics studies in Mycoplasma mycoides. Biol. Proced. Online 1241–1249 (2014). doi: 10.1038/nbt.3063;
materials, is included throughout the text. 17, 6 (2015). doi: 10.1186/s12575-015-0016-8; pmid: 25419741
pmid: 25774095 35. M. J. Lajoie et al., Genomically recoded organisms expand
19. T. Dobzhansky, Genetics of natural populations. XIII. biological functions. Science 342, 357–360 (2013).
RE FE RENCES AND N OT ES Recombination and variability in populations of Drosophila pmid: 24136966
1. H. J. Morowitz, The completeness of molecular biology. Isr. J. pseudoobscura. Genetics 31, 269–290 (1946). 36. J. R. Karr et al., A whole-cell computational model predicts
Med. Sci. 20, 750–753 (1984). pmid: 6511349 20. D. H. Haft, J. D. Selengut, O. White, The TIGRFAMs database of phenotype from genotype. Cell 150, 389–401 (2012).
2. C. M. Fraser et al., The minimal gene complement of protein families. Nucleic Acids Res. 31, 371–373 (2003). doi: 10.1016/j.cell.2012.05.044; pmid: 22817898
Mycoplasma genitalium. Science 270, 397–404 (1995). doi: 10.1093/nar/gkg128; pmid: 12520025
doi: 10.1126/science.270.5235.397; pmid: 7569993 21. J. S. Jensen, H. T. Hansen, K. Lind, Isolation of Mycoplasma AC KNOWLED GME NTS
3. R. D. Fleischmann et al., Whole-genome random sequencing genitalium strains from the male urethra. J. Clin. Microbiol. 34, We thank Synthetic Genomics (SGI) and the Defense Advanced
and assembly of Haemophilus influenzae Rd. Science 269, 286–291 (1996). pmid: 8789002 Research Projects Agency’s Living Foundries program (contract
496–512 (1995). doi: 10.1126/science.7542800; pmid: 7542800 22. R. Gil, F. J. Silva, J. Peretó, A. Moya, Determination of the core HR0011-12-C-0063) for funding this work. Microscopy work at the
4. A. R. Mushegian, E. V. Koonin, A minimal gene set for cellular of a minimal bacterial gene set. Microbiol. Mol. Biol. Rev. 68, University of California–San Diego was supported by NIH grant
life derived by comparison of complete bacterial genomes. 518–537 (2004). doi: 10.1128/MMBR.68.3.518-537.2004; P41GM103412 from the National Institute of General Medical
Proc. Natl. Acad. Sci. U.S.A. 93, 10268–10273 (1996). pmid: 15353568 Sciences to M.H.E. J.F.P. was supported by a Fannie and John
doi: 10.1073/pnas.93.19.10268; pmid: 8816789 23. J. Romero-García, C. Francisco, X. Biarnés, A. Planas, Hertz Graduate Fellowship, the Massachusetts Institute of
5. C. A. Hutchison III et al., Global transposon mutagenesis and a Structure-function features of a Mycoplasma glycolipid Technology (MIT) Center for Bits and Atoms, and the MIT
minimal Mycoplasma genome. Science 286, 2165–2169 (1999). synthase derived from structural data integration, molecular Department of Physics. E.A.S. was supported by the National
doi: 10.1126/science.286.5447.2165; pmid: 10591650 simulations, and mutational analysis. PLOS ONE 8, e81990 Institute of Standards and Technology (certain commercial
6. J. I. Glass et al., Essential genes of a minimal bacterium. (2013). doi: 10.1371/journal.pone.0081990; equipment, instruments, or materials are identified in this paper to
Proc. Natl. Acad. Sci. U.S.A. 103, 425–430 (2006). pmid: 24312618 foster understanding; such identification does not imply
doi: 10.1073/pnas.0510013103; pmid: 16407165 24. E. Schieck et al., Galactofuranose in Mycoplasma mycoides is recommendation or endorsement by the National Institute of
7. D. G. Gibson et al., Complete chemical synthesis, assembly, important for membrane integrity and conceals adhesins but Standards and Technology, nor does it imply that the materials or
and cloning of a Mycoplasma genitalium genome. Science 319, does not contribute to serum resistance. Mol. Microbiol. 99, equipment identified are necessarily the best available for the
1215–1220 (2008). pmid: 18218864 55–70 (2016). pmid: 26354009 stated purpose). We thank M. D. Adams, M. A. Algire, D. Brami,
8. C. Lartigue et al., Genome transplantation in bacteria: 25. J. C. Xavier, K. R. Patil, I. Rocha, Systems biology perspectives D. Brown, L. Brinkac, N. Caiazza, O. Fetzer, L. Fu, D. Haft,
Changing one species to another. Science 317, 632–638 on minimal and simpler cells. Microbiol. Mol. Biol. Rev. 78, S. Kaushal, M. Lapointe, A. Lee, M. Lewis, D. Lomelin, C. Ludka,
(2007). doi: 10.1126/science.1144622; pmid: 17600181 487–509 (2014). doi: 10.1128/MMBR.00050-13; M. Montague, C. Orosco, T. Peterson, A. Ramon, T. Richardson,
9. Materials and methods are available as supplementary pmid: 25184563 A. Schwartz, D. Smith, S. Vashee, and T. Yee for their contributions
materials on Science Online. 26. R. Bernander, T. J. Ettema, FtsZ-less cell division in archaea to this work and for helpful discussions. J.C.V. is chairman of
10. D. G. Gibson et al., Creation of a bacterial cell controlled by a and bacteria. Curr. Opin. Microbiol. 13, 747–752 (2010). the Board of Directors and co–chief scientific officer of SGI.
chemically synthesized genome. Science 329, 52–56 (2010). doi: 10.1016/j.mib.2010.10.005; pmid: 21050804 H.O.S. is on the Board of Directors and co–chief scientific
pmid: 20488990 27. R. Mercier, Y. Kawai, J. Errington, Excess membrane synthesis officer of SGI. C.A.H. is chairman of the SGI Scientific Advisory
11. M. Juhas, D. R. Reuß, B. Zhu, F. M. Commichau, Bacillus drives a primitive mode of cell proliferation. Cell 152, 997–1007 Board. D.G.G. is a vice president of SGI. J.C.V., H.O.S., C.A.H.,
subtilis and Escherichia coli essential genes and minimal cell (2013). doi: 10.1016/j.cell.2013.01.043; D.G.G., J.G., K.K., and the J. Craig Venter Institute (JCVI) hold
factories after one decade of genome engineering. Microbiology pmid: 23452849 SGI stock and/or stock options. SGI and JCVI have filed
160, 2341–2351 (2014). doi: 10.1099/mic.0.079376-0; 28. M. Lluch-Senar, E. Querol, J. Piñol, Cell division in a minimal patent applications on some of the methods and concepts
pmid: 25092907 bacterium in the absence of ftsZ. Mol. Microbiol. 78, 278–289 described in this paper.
12. G. Pósfai et al., Emergent properties of reduced-genome (2010). doi: 10.1111/j.1365-2958.2010.07306.x;
Escherichia coli. Science 312, 1044–1046 (2006). doi: 10.1126/ pmid: 20735775
science.1126439; pmid: 16645050 29. S. Gola, T. Munder, S. Casonato, R. Manganelli, M. Vicente, The
13. Y. Suzuki et al., Bacterial genome reduction using the essential role of SepF in mycobacterial division. Mol. Microbiol. SUPPLEMENTARY MATERIALS
progressive clustering of deletions via yeast sexual cycling. 97, 560–576 (2015). doi: 10.1111/mmi.13050; www.sciencemag.org/content/351/6280/aad6253/suppl/DC1
Genome Res. 25, 435–444 (2015). doi: 10.1101/gr.182477.114; pmid: 25943244 Materials and Methods
pmid: 25654978 30. R. Duman et al., Structural and genetic analyses reveal the
Figs. S1 to S21
14. A. C. Forster, G. M. Church, Towards synthesis of a minimal protein SepF as a new membrane anchor for the Z ring.
Tables S1 to S15
cell. Mol. Syst. Biol. 2, 45 (2006). doi: 10.1038/msb4100090; Proc. Natl. Acad. Sci. U.S.A. 110, E4601–E4610 (2013).
References (37–59)
pmid: 16924266 doi: 10.1073/pnas.1313978110; pmid: 24218584
Database S1
15. M. Lluch-Senar et al., Defining a minimal cell: Essentiality of 31. J. C. Blain, J. W. Szostak, Progress toward synthetic cells.
small ORFs and ncRNAs in a genome-reduced bacterium. Annu. Rev. Biochem. 83, 615–640 (2014). doi: 10.1146/ 13 October 2015; accepted 5 February 2016
Mol. Syst. Biol. 11, 780 (2015). pmid: 25609650 annurev-biochem-080411-124036; pmid: 24606140 10.1126/science.aad6253
If you wish to distribute this article to others, you can order high-quality copies for your
colleagues, clients, or customers by clicking here.
Permission to republish or repurpose articles or portions of articles can be obtained by
following the guidelines here.
Updated information and services, including high-resolution figures, can be found in the online
version of this article at:
/content/351/6280/aad6253.full.html
Supporting Online Material can be found at:
/content/suppl/2016/03/23/351.6280.aad6253.DC1.html
This article cites 57 articles, 27 of which can be accessed free:
/content/351/6280/aad6253.full.html#ref-list-1
This article appears in the following subject collections:
Molecular Biology
/cgi/collection/molec_biol
Science (print ISSN 0036-8075; online ISSN 1095-9203) is published weekly, except the last week in December, by the
American Association for the Advancement of Science, 1200 New York Avenue NW, Washington, DC 20005. Copyright
2016 by the American Association for the Advancement of Science; all rights reserved. The title Science is a
registered trademark of AAAS.