
Molecular Life Sciences

DOI 10.1007/978-1-4614-6436-5_99-2
© Springer Science+Business Media New York 2014

Genes and Genomes: Structure


Lawrence I. Grossman*
Center for Molecular Medicine and Genetics, Wayne State University School of Medicine, Detroit, MI, USA

Synopsis
The ability of living cells to continue depends on their ability to divide and produce either exact
copies of themselves or programmed variations; they thus require a repository of knowledge for
doing so. This repository is their genes, composed in cells of DNA and in aggregate referred to as
their genome. The mechanism by which this DNA is duplicated is treated in Section I, DNA
Replication. Here the concern is with its organization and function. These facets differ among the
different repositories (nuclei; organelles, namely mitochondria and chloroplasts; plasmids; and
viruses) and are dealt with individually in each appropriate section.
Major differences include the number of chromosomes (multiple chromosomes in eukaryotic nuclei,
single chromosomes typically in organelles, plasmids, and DNA viruses), the number of genome
copies (two in nuclei, hundreds to thousands in mitochondria and chloroplasts, various numbers >2
for plasmids and viruses), and their organization (DNA-protein complexes called chromatin in
nuclei, defined DNA-protein complexes for viruses, looser and less well-defined associations of
DNA and protein for organellar DNAs and plasmids). Whatever the details in each case, the genome
organization used, in addition to being able to support its own replication, must be able to support
expression and regulation of expression of its information content. These considerations must be
borne in mind in focusing here primarily on the organization and information content of genomes
and on the relative differences among related species.

Development of the Field


The discovery of DNA (structure and function) is surely one of the great milestones in the history of
knowledge. Although genetics (inheritance) could be described and studied in the absence of
specific knowledge about how it is mediated – that is, what is the genetic material? – it could not
really be understood in that context. The structure itself was worked out and announced dramatically
(Watson and Crick 1953) – and has been the subject of much historical writing and personal
recollection (e.g., Watson 1969; Sayre 1975) – but DNA did not spring full blown from the famous
note in Nature by Watson and Crick. Rather, it was preceded by nearly a century of work, some of it
on DNA as a chemical of unknown function, dating to a time when protein was suspected of being
the genetic material because it could be seen to have enough complexity that it might, in fact,
embody a code passed through generations; DNA, by contrast, had four seemingly repeating bases,
which seemed too simple for the purpose. Even after the association of DNA with the genetic
material (Griffith 1928; Avery et al. 1944; Hershey and Chase 1952), a good deal of work was
carried out in chemical characterization (Chargaff et al. 1950) that set the stage for the model
eventually proposed by Watson and Crick. Now that the structure of both genes and genomes is
understood in some depth, it is astonishing to sit back and consider the diversity of chemical,
structural, and topological forms that they can take in different organisms. Details for each kind of
genome are described in the individual essays of this section.

*Email: lgrossman@wayne.edu

The Genetic Code, Genes, and Chromosomes


This section, whose subject often forms the introductory chapter in books about genetics or other
advanced works in modern biology and biomedicine, and which will be included at a later time in the
online edition, embodies the history of molecular biology: it encompasses the central subject matter of
the decades of research, beginning around World War II, that came to be called molecular biology. Historically,
a gene is the specifier for an inherited trait, a formulation that was very much in keeping with earlier
genetic analysis that followed the inheritance of visible features, such as the examples familiar to
students of genetics of smooth versus wrinkled seeds. As will be developed in this section, that
definition is too limited; a modern definition would consist of the sequence on the genome that
specifies the gene product along with any regulatory or otherwise functional associated regions.
A phenotypic difference, it is now clear, can result not only from protein (or RNA) coding region
differences but also from a single-nucleotide variation in a regulatory region that might regulate
whether or not a gene product is expressed, or at what level it is expressed, or in response to which
stimulus it is expressed.
The genetic code specifies the relation between DNA sequence and protein sequence. Early
workers suspected that there would be a triplet code (three bases specify an amino acid) because
a singlet code would allow only four amino acids to be specified, and even a doublet code did not
allow enough possibilities (16) to uniquely specify the 20 known amino acids. A triplet code
(64) was certainly possible; in fact, it would overspecify the number of amino acids, causing
speculation that some codons could serve as punctuation of various kinds. Furthermore, there was
a period of uncertainty as to whether the genetic code was overlapping (123, 234, 345...) or
nonoverlapping (123, 456, 789...). A historic experiment resolved this. Peptide sequencing had
become possible by the late 1950s, and Tsugita and Fraenkel-Conrat, who were working with tobacco
mosaic virus (TMV), used it to address the question (discussed in Jukes 1962).
They reasoned that in an overlapping code a single-base mutation would change two or three
adjacent amino acids, whereas in a nonoverlapping code it would change only one. They therefore
treated TMV with a mutagen known to cause single-base changes in RNA and found that only one
amino acid was changed in the cognate protein.
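Both the counting argument and the mutation test above can be sketched in a few lines of Python. This is an illustrative toy, not a model of the actual TMV experiment. The sequence is made up, and `coding_capacity`, `read_codons`, and `codons_changed` are hypothetical helper names. Words of n bases drawn from four bases give 4ⁿ possibilities. A single substitution in the middle of a sequence alters three overlapping triplets but only one nonoverlapping triplet.

```python
N_AMINO_ACIDS = 20

def coding_capacity(word_length: int) -> int:
    """Number of distinct words of a given length over a 4-letter alphabet."""
    return 4 ** word_length

for n in (1, 2, 3):
    print(f"{n}-base words: {coding_capacity(n)} "
          f"(enough for 20 amino acids: {coding_capacity(n) >= N_AMINO_ACIDS})")

def read_codons(seq: str, overlapping: bool) -> list[str]:
    """Read triplets starting at every base (overlapping) or every third base (nonoverlapping)."""
    step = 1 if overlapping else 3
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, step)]

def codons_changed(seq: str, pos: int, new_base: str, overlapping: bool) -> int:
    """Count the triplets altered by a single-base substitution at `pos`."""
    mutant = seq[:pos] + new_base + seq[pos + 1:]
    before = read_codons(seq, overlapping)
    after = read_codons(mutant, overlapping)
    return sum(a != b for a, b in zip(before, after))

seq = "ATGGCTAACGAT"  # arbitrary toy sequence
print("overlapping:", codons_changed(seq, 6, "G", overlapping=True))      # 3 triplets change
print("nonoverlapping:", codons_changed(seq, 6, "G", overlapping=False))  # 1 triplet changes
```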
This experiment, which showed that the genetic code was nonoverlapping, did not prove that it was
a triplet code. That was suggested by another experiment, by Crick, Brenner, and their colleagues (Crick
et al. 1961). Working with the Escherichia coli bacteriophage T4, they used a mutagen that
introduced a different kind of mutation than the one Tsugita and Fraenkel-Conrat used; their
mutagen caused a frameshift (by adding or deleting a base in DNA instead of replacing it with
a different one). Reading the sequence downstream of a frameshift in the wrong frame would
therefore produce a different, irrelevant protein. They were able to sort the mutants into two groups,
those that added a base and those that deleted a base, because combining a mutation from one group
with one from the other usually restored function (the ability to cause an infection), whereas
combining two from the same group did not. The important observation was that combinations of
three (but not two) mutations from the same group often restored function: three single-base
additions (or deletions) shift the reading frame by one whole codon, restoring the original frame
downstream, as a triplet code predicts. The reasoning was that if the mutations were fairly close
together, then a short region of incorrect or missing amino acids might not render the whole protein
unusable.
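The frameshift logic of Crick et al. can be mimicked with a toy sequence. The sketch applies single-base insertions (+1) and deletions (−1), then compares the last few codons with the unmutated sequence. The sequence, the edit positions, and the helper names (`apply_edits`, `codons`, `downstream_in_frame`) are hypothetical. The point is only that the downstream frame returns when the net shift is a multiple of three.

```python
def apply_edits(seq: str, edits: list[tuple[int, int]], insert_base: str = "G") -> str:
    """Apply single-base insertions (+1) and deletions (-1) at the given positions.

    Edits are applied from the 3' end backwards so that earlier positions stay valid.
    The inserted base is an arbitrary choice.
    """
    bases = list(seq)
    for pos, kind in sorted(edits, reverse=True):
        if kind == +1:
            bases.insert(pos, insert_base)
        elif kind == -1:
            del bases[pos]
        else:
            raise ValueError("kind must be +1 or -1")
    return "".join(bases)

def codons(seq: str) -> list[str]:
    """Complete triplets read from position 0."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]

def downstream_in_frame(ref: str, mutant: str, n_codons: int) -> bool:
    """True if the last n codons of the mutant are read exactly as in the reference."""
    return codons(mutant)[-n_codons:] == codons(ref)[-n_codons:]

ref = "ATGGCTAACGATTGGCACATCCGAGTTTAA"  # arbitrary 10-codon toy sequence
cases = {
    "one insertion + one deletion": [(3, +1), (6, -1)],
    "two insertions": [(3, +1), (4, +1)],
    "three insertions": [(3, +1), (4, +1), (5, +1)],
}
for label, edits in cases.items():
    net = sum(kind for _, kind in edits)
    mutant = apply_edits(ref, edits)
    print(f"{label}: net shift {net:+d}, downstream frame restored: "
          f"{downstream_in_frame(ref, mutant, 5)}")
```

In this toy run, the insertion-plus-deletion pair and the triple insertion restore the last five codons. The double insertion leaves them out of frame.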

As is now well known, the genetic code is nonoverlapping and redundant, and it minimizes the
effects of mutation by rendering a moderate proportion of mutations silent. It is also nearly universal, at
least in the nucleus. As the first complete sequence of mitochondrial DNA (Anderson et al. 1981) showed,
mitochondria use a modified genetic code. It became clear later that the mitochondrial genetic code
is not universal among mitochondria but differs somewhat in different genera. The endosymbiont
hypothesis of mitochondrial origin posits that mitochondria arose from the symbiotic fusion of an
oxygen-utilizing prokaryote with an emerging eukaryotic cell, followed by gene movement and
elimination. One conclusion from the present-day differences between genetic codes, which include
differences in which codons signal protein termination, is that functional gene transfer between these
subcellular compartments is no longer possible where the codes differ.
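The text notes that the code is redundant and that mitochondrial codes differ in their termination codons. Both points can be checked with a small Python sketch. The two 64-letter strings are the standard code and the vertebrate mitochondrial code as tabulated by NCBI (translation tables 1 and 2), with codons in TCAG order. `make_table` and `synonymous_fraction` are illustrative helpers. The vertebrate table is only one of several mitochondrial variants.

```python
from itertools import product

BASES = "TCAG"  # NCBI ordering: first codon position varies slowest, third fastest
# Amino acid for each of the 64 codons in that order ('*' = stop).
STANDARD = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
            "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
VERT_MITO = ("FFLLSSSSYY**CCWW" "LLLLPPPPHHQQRRRR"
             "IIMMTTTTNNKKSS**" "VVVVAAAADDEEGGGG")

def make_table(amino_acids: str) -> dict[str, str]:
    """Map each codon (DNA alphabet) to its one-letter amino acid."""
    codons = ["".join(c) for c in product(BASES, repeat=3)]
    return dict(zip(codons, amino_acids))

def synonymous_fraction(table: dict[str, str]) -> float:
    """Fraction of single-base substitutions in sense codons that keep the same amino acid."""
    synonymous = total = 0
    for codon, aa in table.items():
        if aa == "*":
            continue  # only sense codons are mutated
        for i in range(3):
            for base in BASES:
                if base == codon[i]:
                    continue
                mutant = codon[:i] + base + codon[i + 1:]
                total += 1
                synonymous += table[mutant] == aa  # substitutions to stop count as non-silent
    return synonymous / total

standard, mito = make_table(STANDARD), make_table(VERT_MITO)
print(f"silent fraction, standard code: {synonymous_fraction(standard):.3f}")
for codon in standard:
    if standard[codon] != mito[codon]:
        print(codon, standard[codon], "->", mito[codon])
```

Under these assumptions, roughly a quarter of single-base substitutions in sense codons are silent in the standard code. Four codons differ between the two tables: TGA (stop to Trp), ATA (Ile to Met), and AGA and AGG (Arg to stop). The last two are the kind of termination-codon difference referred to above.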
Chromosome is a term applied to the genetic material organized as a nucleoprotein structure.
Eukaryotic cell chromosomes in particular are examples of highly organized but fluid structures
that, on the one hand, must be enormously condensed to fit inside a nucleus and, on the other
hand, must remain able to undergo DNA replication and regulation. Regulation requires that
condensed regions be opened and made available to the transcription machinery in response to
specific signals. It is perhaps not surprising that the structural proteins that help DNA carry out
these tasks (histones) are among the most evolutionarily conserved proteins known.

Prokaryotic Genomes
This section, which will be included at a later time in the online edition, deals with genomes of
prokaryotes. Prokaryotes consist of two major domains, Bacteria and Archaea. Bacteria are generally
single-celled, divide by fission, and lack internal membrane-bound structures like nuclei and
mitochondria. Indeed, many bacteria are similar in size to mitochondria. Archaea are similar to
bacteria in size and shape but differ in their evolutionary history and in features such as the
metabolic pathways they use.
Prokaryotic genomes have been well studied. However, a caveat may be in order: on the one hand,
this is a mature subject about which we know a great deal given that the foundations of molecular
biology were worked out over the last 75 or so years with bacteria. On the other hand, we know
a great deal about relatively few organisms; the vast majority of prokaryotes have not been explored
at all and, indeed, have likely not yet even been found. Nevertheless, many of the ones that have been
studied have, in fact, been sequenced in their entirety (http://www.ncbi.nlm.nih.gov/genomes/MICROBES/microbial_taxtree.html,
last accessed 26 Mar 2014).
Prokaryotic genome size is typically between 10⁶ and 10⁷ base pairs (bp), about three orders of
magnitude smaller than the genomes of many eukaryotes. The intestinal bacterium Escherichia coli,
the most widely studied in the laboratory, has ~5 × 10⁶ bp; mammals like mice or humans have
~3 × 10⁹. A bacterial genome typically consists of a single circular molecule of double-stranded DNA.
In addition to its circular genome, which in cells is complexed with protein and called a nucleoid,
many prokaryotes contain additional genes located on separate, small circular DNA molecules within
the cell called plasmids (▶ Plasmid Genomes, Introduction to).
The organization of bacterial genomes is based on the operon concept (Jacob and Monod 1961;
Beckwith 2011). That means that genes coding for proteins that function at sequential steps in
a particular pathway are typically clustered together and co-regulated. Regulation of operons is
carried out by the binding of proteins to a regulatory sequence at the start of the operon, termed the
operator. Depending on their function, operons can be normally active and turned off when no longer
needed by the binding of a regulatory protein (termed the repressor) to the regulatory DNA region, or
they can be normally inactive and turned on when needed by the inactivation of the repressor. One of
the best-studied examples of a bacterial operon, the E. coli lac operon, is normally inactive, so as
not to make lactose-metabolizing proteins in the absence of lactose, but is turned on in the presence
of lactose (and the absence of glucose) when a lactose metabolite binds to the lac repressor,
inactivating it.
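The two regulatory modes described above can be written as Boolean rules. This is a deliberately crude sketch. The function names are hypothetical, expression is treated as all-or-none, and leaky basal expression is ignored. Catabolite repression by glucose (mediated in E. coli by cAMP and the CAP activator) is collapsed into a single flag.

```python
def lac_operon_transcribed(lactose: bool, glucose: bool) -> bool:
    """Simplified inducible (lac-type) operon: on only with lactose present and glucose absent."""
    repressor_bound = not lactose  # allolactose, a lactose metabolite, inactivates the repressor
    activator_on = not glucose     # crude stand-in for catabolite repression via cAMP-CAP
    return (not repressor_bound) and activator_on

def repressible_operon_transcribed(end_product_abundant: bool) -> bool:
    """Simplified repressible operon: normally on, switched off when its product is no longer needed."""
    repressor_active = end_product_abundant
    return not repressor_active

for lactose in (False, True):
    for glucose in (False, True):
        print(f"lactose={lactose!s:5} glucose={glucose!s:5} -> "
              f"lac operon on: {lac_operon_transcribed(lactose, glucose)}")
```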

Eukaryotic Genomes
Eukaryotic genomes vary widely in their properties. In terms of amount of DNA, one of the smallest
is found in the fission yeast Schizosaccharomyces pombe (13.8 million bp). The largest amounts are
found in some plants; a Lilium species has 300,000 million bp. Most mammals contain about 3,000
million bp. Several features distinguish eukaryotic genomes from prokaryotic ones. The most notable
is that the genome is divided into chromosomes, DNA-protein complexes whose number varies
from a low of 2 to more than 1,000 in some plants; humans have 23 pairs. Second, the genome
(chromosomes) is isolated in a subcellular compartment, the nucleus. One way among many to
mark the rate of progress in genomic research is by observing the number of species for which
a complete genome sequence is now available. The first eukaryote (the yeast Saccharomyces
cerevisiae) was sequenced in 1996; as of this writing, 183 eukaryotic genomes have been fully
sequenced and annotated (http://www.genomesonline.org, last accessed 26 Mar 2014). Many more
(thousands of) prokaryotic genomes have been completed, since they are a thousand or more times smaller.
A striking consideration about eukaryotic genomes is what is sometimes called the packaging
problem. For the human genome, for example, the total length of DNA in a diploid nucleus is on the
order of 2 m, yet it must be packaged to fit into a microscopic nucleus, and packaged in such a way
that specific regions can be activated or inhibited under specific conditions in particular cell types.
Although this problem has been addressed in broad outline, it remains an area of active investigation.
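The 2 m figure follows from simple arithmetic. The sketch assumes the commonly used rise of about 0.34 nm per base pair for B-form DNA and the genome sizes given earlier in this essay. `contour_length_m` is a hypothetical helper.

```python
RISE_PER_BP_M = 0.34e-9  # approximate length of one base pair in B-form DNA (~0.34 nm)

def contour_length_m(base_pairs: float) -> float:
    """Length of DNA, in metres, if stretched out as a B-form double helix."""
    return base_pairs * RISE_PER_BP_M

human_haploid_bp = 3e9                    # ~3 x 10^9 bp, as given in the text
human_diploid_bp = 2 * human_haploid_bp   # a diploid nucleus carries two copies
ecoli_bp = 5e6                            # ~5 x 10^6 bp, as given in the text

print(f"human diploid nucleus: {contour_length_m(human_diploid_bp):.2f} m")
print(f"E. coli chromosome: {contour_length_m(ecoli_bp) * 1e3:.1f} mm")
```

The diploid human total comes to about 2 m, matching the figure above. The E. coli chromosome is under 2 mm long.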
Another area of great current interest falls under what is broadly called genomics. Typically, the
genome of a single individual of a species is sequenced. In comparing a particular locus across
species, the question arises as to whether an observed difference is a property of the species or a
variation in the particular individual chosen for sequencing. Resolving this requires assessing more
than one individual. In the case of humans, where variation can be related to disease or other
phenotypes, extensive population characterization of variation has been carried out
(http://hapmap.ncbi.nlm.nih.gov, last accessed 26 Mar 2014).
A fuller discussion of eukaryotic genomes will be included at a later time in the online edition.

Plant Genomes
Information about plant genomes has continued to increase, although not at the rate information
about animal genomes has accumulated. There are two major reasons for this. One is that plant
genomes are complex. Plant cells contain three separate genomes, located in the nucleus, the
chloroplasts, and the mitochondria. The major genome, in the nucleus, can vary considerably in
size, a phenomenon long recognized and referred to as the C-value paradox. The C-value
refers to the haploid DNA content of an organism, and the so-called paradox was the lack of
relationship between the C-value and the apparent complexity of the organism. As Childs and Buell
discuss, among angiosperms alone there is a 2,380-fold difference between the smallest and largest
known genomes, with a mean, interestingly, near that of mammals. The other reason for the
information disparity is that more support has been available, largely through the National Institutes
of Health but also from numerous charities often targeting specific diseases, for animal genomics,
which is thought to bear more directly on human health.
What further complicates analysis of plant genomes, in addition to size variation, is the nature of
the sequences. Plants contain a large number of transposable elements – fragments of DNA that can
move from one location to another – resulting, as Jiang describes, in insertions, deletions,
duplications, and chromosomal inversions. Their ability to amplify allows them to make up the
majority of the DNA sequence in some species. For example, maize contains well over a million
copies of transposable elements, making up 84 % of its genome. And, as Shiu discusses in analyzing
the causes of genome expansion, the proliferation of transposable elements is itself a major reason
for increases in genome size. Finally, plants are frequently polyploid, with heterozygosity between
the copies extensive enough to hinder genome assembly. Therefore, as Buell describes, special
lines must be bred to make genome sequencing tractable.
The existence of so much sequence diversity in plants, as Hansey discusses, leads to phenotypic
diversity. In a model organism like Arabidopsis thaliana, which has a relatively small and fully
sequenced genome, genome diversity (single-nucleotide polymorphisms, insertions and deletions,
copy number variation) can be evaluated and alleles responsible for phenotypic variation can start to
be determined.

Plasmid Genomes
Plasmids are autonomously replicating extrachromosomal DNA elements. They are of considerable
interest, for at least two reasons. One is that they have been harnessed by molecular biologists as
invaluable tools in modern molecular genetics; indeed, the original work on recombinant DNA
technology more than 30 years ago was facilitated by engineered plasmids that allowed insertion of
foreign DNA into bacterial cells along with the use of selectable markers in the plasmids to identify
and isolate those bacterial cells that contained them. The second reason is that plasmids allow a
population of organisms to carry a wide range of genes encoding resistance to rarely encountered
environmental perils, ensuring survival of the species if a particular peril is encountered, without
burdening every genome with coding capacity for genes that particular organisms rarely if ever use.
Antimony is one example among many of such a toxin, and a relatively uncommon one. Considering
the number of environmental toxins a bacterium might encounter, and postulating one or several genes
devoted to detoxification for each, the bacterial genome would swell substantially if it contained the
information for coping with each potential danger. Instead of devoting resources on an ongoing basis
to anticipating infrequent perils, bacteria and other microorganisms have “outsourced” these genes to
extrachromosomal DNA. Plasmids containing genes for a particular danger are not present in every
organism but are often present in the population.
Here we are not emphasizing this basic plasmid functional biology but, rather, focusing on the
plasmid genomes themselves – their types, their replication, and the way they interact with their
hosts. The introductory article by Thomas and Frost lays out and summarizes the major issues such
as types of replication, copy number control, and regulation of replication. Many of these topics are
treated in more detail in the specialized articles that follow. For example, Van Houdt and Mergeay
describe chromids, which are large plasmids that carry genes indispensable for cell viability.
However, a chromid differs from a chromosome in using a plasmid type of replication system. In
addition, chromid genes evolve more rapidly than chromosomal ones.

Two plasmids cannot always coexist stably in the same cell lineage. Their inability to do so, called
plasmid incompatibility, is described by Thomas, who points out that incompatibility typically arises
because the two plasmids are closely related and compete for the same proteins. However, other
mechanisms have also been observed.
Suzuki, Brown, and Top discuss plasmid host range, the ability of a plasmid to maintain itself in
bacteria of divergent species. Clearly, the greater the host range, the more rapidly a plasmid can
spread genes in its environment. Because it is difficult to test every organism in an environment,
while the sequences of more and more species are accumulating, Suzuki et al. discuss predicting
plasmid host range from genomic signatures, which are computed from the nucleotide sequences of
the plasmid and of a potential host.
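One widely used kind of genomic signature is oligonucleotide composition, the relative frequency of short DNA words. The sketch compares trinucleotide profiles with a plain Euclidean distance and ranks candidate hosts by similarity. The choice of k = 3 and the distance metric are assumptions for illustration, and the statistics used in the published host-range work may differ. The sequences and host names are toy placeholders.

```python
import math
from collections import Counter
from itertools import product

def kmer_profile(seq: str, k: int = 3) -> list[float]:
    """Relative frequencies of all 4**k DNA words (windows with non-ACGT characters are ignored)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(w) for w in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in words) or 1  # avoid division by zero for tiny inputs
    return [counts[w] / total for w in words]

def signature_distance(a: str, b: str, k: int = 3) -> float:
    """Euclidean distance between two k-mer frequency profiles (an assumed, simple metric)."""
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))

def rank_candidate_hosts(plasmid: str, hosts: dict[str, str], k: int = 3) -> list[str]:
    """Order hypothetical hosts from most to least similar in composition to the plasmid."""
    return sorted(hosts, key=lambda name: signature_distance(plasmid, hosts[name], k))

# Toy sequences only; a real comparison would use whole plasmid and host genome sequences.
plasmid = "ATGCGTATTAGCGCATTACG" * 10
hosts = {"host_A": "ATGCGTATTAGCGCATTACGATCG" * 10, "host_B": "GGGCCCGGCGCCGGGC" * 10}
print(rank_candidate_hosts(plasmid, hosts))
```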
Stekel discusses the modeling of plasmid regulatory systems. In nature, plasmids have evolved to
impose a negligible burden on the host’s metabolism. However, understanding this burden is
necessary both for designing plasmids not found in nature and for trying to manipulate the burden for
therapeutic or other interventions. Starting from simple models, mathematical modeling has achieved
considerable success in building dynamic and realistic models. Kreft further
discusses mathematical modeling of plasmid dynamics, reviewing models that have given useful
results in appropriate environments, and then moves on to models that incorporate spatial structure
and are oriented to individual organisms rather than only population-level properties.
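As a flavor of the simplest population-level models, the sketch follows the fraction of plasmid-bearing cells when plasmids are occasionally lost at division (segregational loss) and carriers pay a growth cost. It is a generic discrete-generation toy in the spirit of classic plasmid-loss models, not any specific model reviewed by Stekel or Kreft. The parameter values are hypothetical.

```python
def plasmid_bearing_fraction(generations: int, loss_rate: float, cost: float,
                             initial_fraction: float = 1.0) -> float:
    """Fraction of cells still carrying the plasmid after a number of generations.

    loss_rate: probability that a daughter of a plasmid-bearing cell receives no plasmid.
    cost: growth burden of carrying the plasmid (0 = none), relative to plasmid-free cells.
    """
    bearing, free = initial_fraction, 1.0 - initial_fraction
    for _ in range(generations):
        offspring = 2.0 * (1.0 - cost) * bearing  # daughters of plasmid-bearing cells
        bearing, free = offspring * (1.0 - loss_rate), 2.0 * free + offspring * loss_rate
        total = bearing + free
        bearing, free = bearing / total, free / total  # renormalize to fractions
    return bearing

for cost in (0.0, 0.02):
    f = plasmid_bearing_fraction(100, loss_rate=0.01, cost=cost)
    print(f"loss 1%, cost {cost:.0%}: {f:.3f} still carry the plasmid after 100 generations")
```

With no cost and 1 % loss per generation, the carrier fraction simply decays as 0.99ⁿ. Adding even a small burden makes it fall faster, which is one reason the host burden discussed above matters.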

Mitochondrial Genomes
The discovery of DNA in mitochondria (Nass and Nass 1963; Schatz et al. 1964), which helped to
rationalize years of prior genetics results on cytoplasmic or non-Mendelian inheritance that had been
studied largely in plants and in fungi, started what has been more than a half century of characterization
of these genomes and their role in cellular function. This initial discovery in yeast was soon
shown to be true also in mammalian cells (Corneo et al. 1966).
Work on mitochondrial DNA (mtDNA) can be largely divided into work on animal mtDNA and
work on all others, although the latter encompass vastly more types of organisms. This dichotomy has
two primary causes. One is that animal mtDNA, already known to be small, was also the beneficiary
of one of the early milestones in the field, the discovery that mammalian mtDNA is circular (Van
Bruggen et al. 1966); in fact, all vertebrate mtDNA is circular and has a similar genetic map. This
discovery came soon after the initial discovery of closed circular DNA in small tumor viruses
(Dulbecco and Vogt 1963; Weil and Vinograd 1963). Because the discovery was both novel and
associated in the minds of the discoverers with tumorigenicity, substantial work took place on the
properties of circularity and the associated property of supercoiling, including, soon enough,
a purification method based on the discovery that the intercalating drug ethidium bromide showed
restricted binding to closed circular DNA. Once mtDNA, which constitutes about 0.1 % of total
cellular DNA, could be highly purified, considerable strides could be made in characterizing it,
leading within about 20 years to its complete nucleotide sequencing – one of the first and at that time
the largest DNA to be sequenced (Anderson et al. 1981).
The second cause was the discovery that mtDNA mutations could be linked to human disease
(Holt et al. 1988; Wallace et al. 1988) and, additionally, to human migrations (Wallace 2005). These
discoveries caused human and other animal mtDNAs to become the objects of considerable interest
and, importantly, research resources.
However, as the overview essay by Gray, as well as the individual essays, makes clear, mtDNA
throughout eukaryotic life comes in a wide variety of sizes and forms. Although its gene content
centers on energy metabolism, the forms these genomes have taken in descent from a bacterial
ancestor are diverse. Furthermore, compared with animal mtDNAs, which have 37 genes (13 encoding
respiratory proteins, 22 tRNAs, and 2 rRNAs), some genomes contain some or all ribosomal protein
genes, some contain only some or no tRNA genes, and many contain additional and still unidentified
open reading frames. Lee and Hua, in their essay about Archaeplastidian algae, describe nearly
20-fold variation in mitochondrial genome size, variation in structure (circular, linear, branched,
fragmented), and variation in information content (up to a fivefold difference in number of genes).
Slamovits, describing alveolate protists, finds linear genomes containing 20–44 genes in all
sequenced cases. One alveolate lineage, Apicomplexa, has mitochondrial genomes generally
between 6 and 8 kb and in one case appears to lack mtDNA altogether. In addition, some alveolates
contain fragmented rRNA genes that must be edited. RNA editing is also present in a number of
other groups, including Amoebozoa, as described by Miller.
Chromista, as O’Brien and Lane discuss, shows a threefold variation in genome size as well as
novel features not seen outside this group of organisms, such as large intergenic regions and some of
the open reading frames. Novelty is also a feature of supergroup Excavata: as pointed out by Lukes,
some members contain the largest gene set seen in mtDNA, with an operon type of organization,
whereas other members have lost mtDNA altogether. More surprises come in fungal groups:
although most fungal mitochondrial genomes are unsurprising, containing circular chromosomes
with standard mitochondrial gene features, Lang points out that less well-characterized and faster-
evolving lineages have many novel features. They contain different genome architectures, genetic
code changes, unorthodox initiator tRNA structures and translation initiation mechanisms, and,
most recently, group I intron-mediated mRNA trans-splicing.
For vertebrate animals, Bogenhagen summarizes the features, perhaps best known for human
mtDNA, that have been conserved for about 500 Ma. Invertebrate animals, by contrast, as Lavrov
describes, reveal substantial diversity. Particularly in nonbilaterian animals, gene content, genome
architecture, and the mode and tempo of genome evolution show considerable variation. Both of the
above groups are part of Metazoa (multicellular animals). Unicellular relatives of animals, reviewed
by Lavrov and Lang, also show wide variation. A member of Choanoflagellata, protists most closely
related to the Metazoa, has a circular genome more than 76 kb in size, 86 % A + T in composition,
and 53 identified genes. More distant relatives include ones with linear genomes, repeat-containing
noncoding regions, and larger genomes containing more genes. Lastly, Bonen discusses land plants,
which contain some of the most unusual mitochondrial genomes. Plants such as liverworts and
mosses have mitochondrial genomes of approximately 100–200 kb with a conservative set of about
70 genes, some of which retain bacterial-operon organization. By contrast, vascular plant mitochondrial
genomes range from 200 to 3,000 kb. These mitochondrial genomes recombine readily and so
exist in various physical forms and stoichiometries, and they contain incorporated nuclear and
chloroplast sequences. In addition, their mRNAs undergo RNA editing.

Chloroplast Genomes
Like mitochondria, chloroplasts too have their own genomes. And, like mitochondria, they are
postulated to have arisen by endosymbiosis, with most genes being transferred to the nucleus over
time. As Childs and Buell write, this is an ongoing process that is still occurring both in plant
mitochondria and in chloroplasts. Although novel findings have been seen in particular taxa (Knight
et al. 2001), both chloroplast and mitochondrial genomes in plants use the universal genetic code,
whereas the mitochondrial genomes of all metazoans examined use variant codes that
would prevent current transfer of organelle genes to the nucleus in a straightforward way. A full
discussion of chloroplast genomes will be included at a later time in the online edition.
In general, chloroplast genomes consist of circles that are 100–200 kb and that contain two
inverted repeats, thereby dividing the molecule into single-copy and repeat regions. The inverted
repeats contain the rRNAs and some other genes, but most genes are found in the single-copy
regions. The single-copy genes are of three basic types. First are chloroplast versions of genes
for basic machinery that are also present in mitochondrial genomes. These include rRNAs, tRNAs,
and core subunits of energy-transducing complexes such as NADH dehydrogenase and ATP synthase.
Second are genes for a number of proteins similar to ones found in mitochondria but whose genes are
often (always, in the case of animal mitochondria) found in the nucleus. Examples are some subunits
of chloroplast (cl)-RNA polymerase, cl-DNA polymerase, some ribosomal proteins, and others. Third,
and importantly, are a number of genes not found elsewhere in the cell that suit the chloroplast’s role
in the major function of plants, photosynthesis. Examples are genes for subunits of photosystems I and II.
It is interesting, because it is so puzzling, to compare organelle genomes in terms of their basic
properties. mtDNAs from metazoan animals (vertebrates in particular), whose size and gene order are
largely conserved, use all of their genetic information for coding RNA or protein, except for the
approximately 1-kb regulatory region. In keeping with this economy, there is an example of
overlapping genes on opposite strands, and there is even an example of laboring to save a single
base pair: at the 3′ end of the gene for COX1, TCT encodes the final amino acid, serine, leaving AG
before the 3′ end of tRNASer; polyadenylation of the mRNA subsequently creates AGA, the
termination codon. Compared to this aggressive saving of a single base, one of the striking features
of chloroplast genomes (and of higher plant mitochondrial genomes) is their large size and great
size variability.
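The single-base economy in COX1 depends on the poly(A) tail supplying the last base of the stop codon. The sketch assumes the vertebrate mitochondrial stop set from NCBI translation table 2 and a toy open reading frame, not the real COX1 sequence. `final_codon_after_polyadenylation` is a hypothetical helper.

```python
# Stop codons of the vertebrate mitochondrial code (NCBI translation table 2), DNA alphabet.
VERT_MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}

def final_codon_after_polyadenylation(orf: str) -> str:
    """Return the last in-frame codon once a poly(A) tail is added to the mRNA.

    `orf` runs in frame from the start codon. If it ends in an incomplete codon,
    the added A residues fill in the missing position(s).
    """
    remainder = len(orf) % 3
    if remainder == 0:
        return orf[-3:]  # already ends in a complete codon
    return (orf[-remainder:] + "AAA")[:3]

# Toy ORF ending like the COX1 case described above: ...TCT (Ser) followed by AG.
toy_orf = "ATG" + "TCT" + "AG"
codon = final_codon_after_polyadenylation(toy_orf)
print(codon, "is a stop codon:", codon in VERT_MITO_STOPS)  # AGA is a stop codon: True
```

The same function turns a truncated T or TA ending into TAA.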

Viral Genomes
Much of the early knowledge of molecular biology came from using viruses both as objects of
curiosity and as research tools. The early studies in E. coli benefited from the use of both the
virulent T bacteriophages and the temperate (lysogenic) phages, prototypically bacteriophage lambda,
that could either lyse the host or instead integrate into the host chromosome as a benign passenger.
Their relatively compact genomes could be analyzed with the less sophisticated tools available
~50 years ago, and they were also among the first molecules whose DNA sequence was determined
when the technology became available. Some animal DNA viruses were able to cause oncogenic
transformation and thus served as an early probe into cell transformation. In addition, as noted
earlier, the fact that their genomes were closed circular drove a period of intense investigation of the
inherent properties of circularity and of technology that would allow their separation from nuclear
DNA. A discussion of viral genomes will be included at a later time in the online version.

Future Outlook
The future of work on genes and genomes continues to be full of interest, along with the occasional
surprise, and there is no reason to think this will end any time soon. Much is known, to be sure. Still,
lest we get too convinced that the main principles have been discovered and the most exciting work
is behind us, it may be useful to remember a period about 40 years ago, when the broad details of
replication, transcription, and translation had been worked out in microorganisms but before the
discovery of introns, all the regulatory RNAs, and so much else, when it seemed credible to claim that
the period of great discovery in molecular genetics had come to an end and that all that remained
was filling in details. Some of the areas where we can expect to see new advances are:

• Prokaryotic genomes: Synthetic biology promises to better define the genes that constitute
a minimum requirement for life and to inform the question of what additional genes both add
and cost.
• Eukaryotic genomes: Major open issues in the field include regulation; codes for modifications
about which little is known, such as protein glycosylation; and the role of noncoding RNA. On the
latter point, it has been known for some time that protein-coding sequence occupies on the order of
3 % of the eukaryotic genome, and for some years much of the remainder was considered “junk”
DNA, possibly the vestiges of failed evolutionary experiments. More recently, it has become clear
that most of the genome is transcribed and that some of this noncoding RNA is as conserved as, or
more conserved than, coding regions. The functions of this noncoding RNA are just beginning to be
unearthed. Furthermore, elucidating the functional effects of noncoding variation is a major area of
research that is still in its infancy.
• Plant (including chloroplast and mitochondrial) genomes: The expansion of sequencing projects
to plant materials promises a considerable growth in the amount of data available.
• Mitochondrial genomes: Among mammals, much is known about the physical structure of mtDNA
because it is so readily sequenced and because, in humans, disease-causing mutations are
extensively studied and catalogued (www.mitomap.org). However, there is a disconnect between the
major phenotypic effects attributed solely to mtDNA (Sharpley et al. 2012) and genome-wide
polymorphism studies in populations (www.ncbi.nlm.nih.gov/gap), which typically do not examine
mtDNA. In the future, studies that examine interactions between nuclear and mitochondrial
variation are likely to be of considerable interest.

Also likely to be of interest is regulation of the mitochondrial genome based on sensing of the
cell’s metabolic state, a field still in its early stages. The discovery that proteins thought to function
in the cytoplasm and nucleus are also found in mitochondria, at least conditionally, is driving the
discovery of signaling pathways in which such proteins participate (e.g., Leigh-Brown et al. 2010;
De et al. 2012).

Cross-References
▶ Chemical Biology of DNA Replication
▶ DNA Replication
▶ DNA Topology and Topoisomerases
▶ Gene Regulation
▶ Genomic Sequence and Structural Diversity in Plants
▶ Mitochondrial Genomes
▶ Plant Genomes: From Sequence to Function Across Evolutionary Time
▶ Plasmid Genomes, Introduction to

References
Anderson S, Bankier AT, Barrell BG, De Bruijn MHL, Coulson AR, Drouin J, Eperon IC, Nierlich
DP, Roe BA, Sanger F, Schreier PH, Smith AJH, Staden R, Young IG (1981) Sequence and
organization of the human mitochondrial genome. Nature 290:457–465
Avery OT, MacLeod CM, McCarty M (1944) Studies on the chemical nature of the substance
inducing transformation of pneumococcal types: induction of transformation by
a desoxyribonucleic acid fraction isolated from pneumococcus type III. J Exp Med 79:137–158
Beckwith J (2011) The operon as paradigm: normal science and the beginning of biological
complexity. J Mol Biol 409:7–13
Chargaff E, Zamenhof S, Green C (1950) Composition of human deoxypentose nucleic acid. Nature
165:756–757
Corneo G, Moore C, Sanadi DR, Grossman LI, Marmur J (1966) Mitochondrial DNA in yeast and
some mammalian species. Science 151:687–689
Crick FH, Barnett L, Brenner S, Watts-Tobin RJ (1961) General nature of the genetic code for
proteins. Nature 192:1227–1232
De S, Kumari J, Mudgal R, Modi P, Gupta S, Futami K, Goto H, Lindor NM, Furuichi Y,
Mohanty D, Sengupta S (2012) Recql4 is essential for the transport of p53 to mitochondria in
normal human cells in the absence of exogenous stress. J Cell Sci 125:2509–2522
Dulbecco R, Vogt M (1963) Evidence for a ring structure of polyoma virus DNA. Proc Natl Acad Sci
U S A 50:236–243
Griffith F (1928) The significance of pneumococcal types. J Hygiene 27:113–159
Hershey AD, Chase M (1952) Independent functions of viral protein and nucleic acid in growth of
bacteriophage. J Gen Physiol 36:39–56
Holt IJ, Harding AE, Morgan-Hughes JA (1988) Deletions of muscle mitochondrial DNA in patients
with mitochondrial myopathies. Nature 331:717–719
Jacob F, Monod J (1961) Genetic regulatory mechanisms in the synthesis of proteins. J Mol Biol
3:318–356
Jukes TH (1962) Relations between mutations and base sequences in the amino acid code. Proc Natl
Acad Sci U S A 48:1809–1815
Knight RD, Freeland SJ, Landweber LF (2001) Rewiring the keyboard: evolvability of the genetic
code. Nat Rev Genet 2:49–58
Leigh-Brown S, Enriquez JA, Odom DT (2010) Nuclear transcription factors in mammalian
mitochondria. Genome Biol 11:215
Nass MM, Nass S (1963) Intramitochondrial fibers with DNA characteristics. I. Fixation and electron
staining reactions. J Cell Biol 19:593–611
Sayre A (1975) Rosalind Franklin and DNA. Norton, New York
Schatz G, Haslbrunner E, Tuppy H (1964) Deoxyribonucleic acid associated with yeast
mitochondria. Biochem Biophys Res Commun 15:127–132
Sharpley MS, Marciniak C, Eckel-Mahan K, McManus M, Crimi M, Waymire K, Lin CS,
Masubuchi S, Friend N, Koike M, Chalkia D, MacGregor G, Sassone-Corsi P, Wallace DC
(2012) Heteroplasmy of mouse mtDNA is genetically unstable and results in altered behavior
and cognition. Cell 151:333–343
Van Bruggen EF, Borst P, Ruttenberg GJ, Gruber M, Kroon AM (1966) Circular mitochondrial
DNA. Biochim Biophys Acta 119:437–439
Wallace DC (2005) The mitochondrial genome in human adaptive radiation and disease: on the road
to therapeutics and performance enhancement. Gene 354:169–180
Wallace DC, Singh G, Lott MT, Hodge JA, Schurr TG, Lezza AM, Elsas LJ 2nd, Nikoskelainen EK
(1988) Mitochondrial DNA mutation associated with Leber’s hereditary optic neuropathy.
Science 242:1427–1430
Watson JD (1969) Double helix. Atheneum, New York
Watson JD, Crick FH (1953) Molecular structure of nucleic acids; a structure for deoxyribose
nucleic acid. Nature 171:737–738
Weil R, Vinograd J (1963) The cyclic helix and cyclic coil forms of polyoma viral DNA. Proc Natl
Acad Sci U S A 50:730–738