
PERIYAR INSTITUTE OF DISTANCE EDUCATION

(PRIDE)

PERIYAR UNIVERSITY
SALEM – 636 011.

B.Sc. BIOTECHNOLOGY
SECOND YEAR
PAPER – IV : MOLECULAR BIOLOGY

Prepared by :
Dr.G. SUBRAMANIAN., M.Sc., M.Phil.,
B.Ed., Ph.D.,
Lecturer, Department of Botany,
Arignar Anna Govt.Arts College (Men)
Namakkal.


UNIT    TITLE OF THE UNIT

I       DNA – GENERAL INFORMATION

II      DNA REPAIR MECHANISMS

III     TRANSCRIPTION

IV      TRANSLATION

V       GENE ORGANIZATION AND EXPRESSION

UNIT – I :
STRUCTURE
Nucleic Acids Structure and functions (DNA and RNA).
Watson and Crick model of DNA
Other forms of DNA (A and Z).
Functions of DNA and RNA including ribosomes.
DNA Replication Prokaryotic and Eukaryotic.

UNIT – II:
STRUCTURE
DNA Repair Causes and mechanism-
photo-reactivation, excision repair, mismatch repair, SOS repair.
Recombination in prokaryotes
Transformation, Conjugation and Transduction.

UNIT – III :
STRUCTURE
Transcription in Prokaryotes and Eukaryotes.
Mechanism of promoters, RNA polymerase and transcription factors.

UNIT – IV:
STRUCTURE
Translation. Mechanism of translation in Prokaryotes and Eukaryotes,
Post translational modifications of proteins.
Regulation of Gene expression in Prokaryotes (Operon concept: lac and trp)
and in Eukaryotes (galactose metabolism in yeast).

UNIT – V :
STRUCTURE
Gene organization and expression in Mitochondria and Chloroplasts.
Transposable elements in maize and Drosophila.

UNIT – I :
STRUCTURE
Nucleic Acids Structure and functions (DNA and RNA).
Watson and Crick model of DNA
Other forms of DNA (A and Z).
Functions of DNA and RNA including ribosomes.
DNA Replication Prokaryotic and Eukaryotic.

THE STRUCTURE OF NUCLEIC ACIDS
DNA (deoxyribonucleic acid) and RNA (ribonucleic acid) are
polymers of nucleotides linked in a chain through phosphodiester bonds. In
biological systems, they serve as information-carrying molecules or, in the case
of some RNA molecules, catalysts. This brief review will focus on aspects of
structure of particular importance in manipulating DNA.
Bases, Nucleosides and Nucleotides
Nucleotides are the building blocks of all nucleic acids. Nucleotides
have a distinctive structure composed of three components covalently bound
together:
• a nitrogen-containing "base" - either a pyrimidine (one ring) or
purine (two rings)
• a 5-carbon sugar - ribose or deoxyribose
• a phosphate group
The combination of a base and sugar is called a nucleoside.
Nucleotides also exist in activated forms containing two or three phosphates,
called nucleoside diphosphates or triphosphates. If the sugar in a nucleotide is
deoxyribose, the nucleotide is called a deoxynucleotide; if the sugar is ribose,
the term ribonucleotide is used.
The structure of a nucleotide is depicted below. The structure on the
left - deoxyguanosine monophosphate - depicts the base, sugar and phosphate
moieties. In comparison, the structure on the right has an extra hydroxyl group
on the 2' carbon of the sugar, making it a ribonucleotide - guanosine
monophosphate.

In the right-hand figure, note also the 5' and 3' carbons on ribose
(or deoxyribose) - understanding this concept and nomenclature is critical to
understanding polarity of nucleic acids, as discussed below. The 5' carbon has
an attached phosphate group, while the 3' carbon has a hydroxyl group.
There are five common bases, and four are generally represented in
either DNA or RNA. Those bases and their corresponding nucleosides are
described in the following table:

Abbr.   Base       Nucleoside                    Nucleic Acid

A       Adenine    deoxyadenosine                DNA
                   adenosine                     RNA

G       Guanine    deoxyguanosine                DNA
                   guanosine                     RNA

C       Cytosine   deoxycytidine                 DNA
                   cytidine                      RNA

T       Thymine    deoxythymidine (thymidine)    DNA

U       Uracil     uridine                       RNA

Another useful way to categorize nucleotide bases is as purines (A and
G) versus pyrimidines (C, T and U). Although committing this to memory is
often difficult, the importance is that in double-stranded nucleic acids, base
pairs are always formed between a purine and a pyrimidine.
Nucleic Acids
DNA and RNA are synthesized in cells by DNA polymerases and
RNA polymerases. Short fragments of nucleic acids also are commonly
produced without enzymes by oligonucleotide synthesizers. In all cases, the
process involves forming phosphodiester bonds between the 3' carbon of one
nucleotide and the 5' carbon of another nucleotide. This leads to formation of
the so-called "sugar-phosphate backbone", from which the bases project.

A key feature of all nucleic acids is that they have two distinctive
ends: the 5' (5-prime) and 3' (3-prime) ends. This terminology refers to the
5' and 3' carbons on the sugar. For both DNA (shown above) and RNA, the 5'
end bears a phosphate, and the 3' end a hydroxyl group.
Another important concept in nucleic acid structure is that DNA
and RNA polymerases add nucleotides to the 3' end of the previously
incorporated nucleotide. Another way to put this is that nucleic acids are
synthesized in a 5' to 3' direction.
Base Pairing and Double Stranded Nucleic Acids
Most DNA exists in the famous form of a double helix, in which two
linear strands of DNA are wound around one another. The major force
promoting formation of this helix is complementary base pairing: A's form
hydrogen bonds with T's (or U's in RNA), and G's form hydrogen bonds with
C's. If we mix 5'-ATGC-3' with its complementary strand 5'-GCAT-3', the
following duplex will form:

Examine the figure above and note two very important features:
• The two strands of DNA are arranged antiparallel to one
another: viewed from left to right the "top" strand is aligned 5' to 3', while the
"bottom" strand is aligned 3' to 5'. This is always the case for duplex nucleic
acids.

• G-C base pairs have 3 hydrogen bonds, whereas A-T base pairs
have 2 hydrogen bonds: one consequence of this disparity is that it takes more
energy (e.g. a higher temperature) to disrupt GC-rich DNA than AT-rich DNA.
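The two features above can be illustrated with a short Python sketch. It is only a minimal illustration, not a laboratory tool: the function names are hypothetical, and the hydrogen-bond count simply applies the 3-per-G-C and 2-per-A-T figures quoted above without modelling real melting behaviour.

```python
# Watson-Crick partners for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hydrogen bonds contributed by the base pair each base takes part in
# (G-C: 3, A-T: 2), as stated in the text.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def reverse_complement(strand):
    """Return the partner strand, written 5' to 3' like the input.

    Because the strands are antiparallel, the partner is read in the
    opposite direction, hence the reversal.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def hydrogen_bonds(strand):
    """Count hydrogen bonds in the duplex formed by strand and its partner."""
    return sum(H_BONDS[base] for base in strand)


top = "ATGC"
bottom = reverse_complement(top)
print("top strand    5'-" + top + "-3'")
print("bottom strand 5'-" + bottom + "-3'")
print("GGCC duplex:", hydrogen_bonds("GGCC"), "hydrogen bonds")
print("AATT duplex:", hydrogen_bonds("AATT"), "hydrogen bonds")
```

The GC-rich duplex has more hydrogen bonds than the AT-rich duplex of the same length, which is consistent with the higher temperature needed to separate GC-rich DNA.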
The figures above fail to impart any appreciation of the three-
dimensional structure of DNA. This deficiency can be rectified to some extent
by viewing and manipulating a 3-D model of duplex DNA.
What about double stranded RNA? RNAs are
usually single stranded, but many RNA molecules have
secondary structure in which intramolecular loops are
formed by complementary base pairing. A simple
example of this is shown in the figure to the right, and
much more extensive and complex examples are known.
Base pairing in RNA follows exactly the same principles
as with DNA: the two regions involved in duplex
formation are antiparallel to one another, and the base pairs that form are A-U
and G-C.
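The idea of intramolecular pairing can be sketched in Python as follows. This is a simplified illustration under stated assumptions: only A-U and G-C pairs are allowed (as in the text), the stem is assumed to start at the two ends of the molecule, and the function names and the minimum loop size of 3 are hypothetical choices made for the example.

```python
# Base pairs that form in RNA, as described in the text.
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def can_pair(region_5p, region_3p):
    """True if two regions of one strand (both written 5' to 3') can form a stem.

    The paired regions run antiparallel, so the second region is read
    backwards when it is matched against the first.
    """
    if len(region_5p) != len(region_3p):
        return False
    return all(RNA_PAIR[a] == b for a, b in zip(region_5p, reversed(region_3p)))


def hairpin_stem_length(rna, loop_min=3):
    """Number of base pairs formed by pairing the two ends of rna inwards,
    while leaving at least loop_min unpaired bases in the loop."""
    stem = 0
    i, j = 0, len(rna) - 1
    while j - i - 1 >= loop_min and RNA_PAIR[rna[i]] == rna[j]:
        stem += 1
        i += 1
        j -= 1
    return stem


rna = "GCGCAAAAGCGC"
print(can_pair("GCGC", "GCGC"))   # the two ends of rna are complementary
print(hairpin_stem_length(rna))   # stem length with an AAAA loop
```

Real RNA folding also involves non-standard pairs and energy calculations, which this sketch deliberately leaves out.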
OK, what about RNA-DNA hybrids? Can they form? The answer is
yes. Complementary sequences of RNA and DNA readily anneal with one
another to form duplexes. In fact, RNA-DNA hybrids are often more stable than
the corresponding DNA-DNA duplexes, although relative stability depends on
sequence and conditions.
Finally, does understanding base-pairing have relevance to biotechnology
per se? Absolutely yes! This simple chemistry is at the heart of nucleic acid
hybridization, polymerase chain reaction, antisense technology, mutagenesis,
and many other of the techniques commonly applied in biotechnology labs.
STRUCTURE AND FUNCTION OF RNA
RNA is structurally similar to DNA!
Both nucleic acids are sugar-phosphate polymers and both have nitrogen bases
attached to the sugars of the backbone- but there are several important
differences.
They differ in composition:
1 The sugar in RNA is ribose, not the deoxyribose in DNA
2 The base uracil is present in RNA instead of thymine.
They also differ in size and structure:
1 RNA molecules are generally smaller (shorter) than DNA molecules,
2 RNA is usually single-stranded, not double-stranded like DNA.
Another difference between RNA and DNA is in function. DNA has only
one function - STORING GENETIC INFORMATION in its sequence of
nucleotide bases. But there are three main kinds of ribonucleic acid, each of
which has a specific job to do.
1 Ribosomal RNAs-exist outside the nucleus in the cytoplasm of a cell in
structures called ribosomes. Ribosomes are small, granular structures where
protein synthesis takes place. Each ribosome is a complex consisting of about
60% ribosomal RNA (rRNA) and 40% protein.
2 Messenger RNAs (mRNA)-are the nucleic acids that "record" information
from DNA in the cell nucleus and carry it to the ribosomes.
3 Transfer RNAs-The function of transfer RNAs (tRNA) is to deliver
amino acids one by one to protein chains growing at ribosomes.
RNA Structure
For RNA, nucleosides are formed in the same way as for DNA. RNA exists as a
single strand. The hairpin is a common secondary/tertiary structure. It requires
complementarity between parts of the strand. The figure on the left is a
schematic representation of the hairpin structure.
The Chime image below represents yeast tRNA (extracted from the RNA
structure tour pages of Carnegie Mellon University). Colors are set from red
at the 3' end to blue at the 5' end. Double-stranded RNA can also exist
(it is present in a few viruses) and is generally similar to A-DNA.

RNA Structures - The nucleic acid, other than DNA, which exists in both
prokaryotes and eukaryotes is ribonucleic acid. In some plant and animal
viruses, RNA molecules can act as genetic material. Like DNA, RNA is also a
long polymer of nucleotides. But there are some differences between DNA and
RNA.
1. RNA is usually single-stranded and does not have a separate antiparallel,
complementary strand.
2. RNA also contains four major bases: adenine, guanine, cytosine and uracil
(in place of thymine).
3. The essential difference between DNA and RNA is the type of sugar each
contains. RNA contains the sugar D-ribose (hence ribonucleic acid, RNA).
Ribose contains an OH group at the 2' carbon, which is absent in deoxyribose.
This minor structural difference confers very different chemical and physical
properties upon DNA and RNA. RNA is much stiffer due to steric hindrance
and more susceptible to hydrolysis in alkaline conditions.
RiboNucleic Acids consist of:
1. Ribose (a pentose = sugar with 5 carbons)
2. Phosphoric Acid
3. Organic (nitrogenous) bases: Purines (Adenine and Guanine) and
Pyrimidines (Cytosine and Uracil)
An RNA molecule is a linear polymer in which the monomers (nucleotides)
are linked together by means of phosphodiester bridges, or bonds. These
bonds link the 3' carbon in the ribose of one nucleotide to the 5' carbon in the
ribose of the adjacent nucleotide.
This is illustrated in Figure 1.

Figure 1: A segment of a single nucleic acid chain.

Figure 2: model of a molecule of RNA. It is not a double helix, but in some
places, (shaded in green), the positions of the bases are stabilised by hydrogen
bonds.
Naming nucleosides and nucleotides.

Definitions                             Names

Bases                                   Adenine    Guanine    Cytosine    Uracil
                                        (A)        (G)        (C)         (U)

The combination of a ribose and         Adenosine  Guanosine  Cytidine    Uridine
a base constitutes a nucleoside.

The combination of a phosphate,         Adenylate  Guanylate  Cytidylate  Uridylate
a ribose and a base constitutes
a nucleotide.

RNA : a few facts...


Some RNA is found in the nucleus, where
it is synthesised, and in the cytoplasm, as
messenger RNA, transfer RNA or ribosomal
RNA. These forms of RNA are involved in
protein synthesis. Many other RNAs are
found in the cell, many of them involved in
vitally important catalytic steps.

The rule A+C=U+G CANNOT BE APPLIED TO RNA
because most RNA is single stranded and does not form a double helix.
Although each RNA molecule has only a single polynucleotide chain, it
is not a smooth linear structure. It has extensive regions of complementary AU,
or GC pairs. Therefore, the molecule folds on itself forming structures called
hairpin loops. In the base paired region, the RNA molecule adopts a helical
structure as in DNA.
Note that the genomes of some viruses are made of double stranded RNA.
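The A+C = U+G rule mentioned above can be checked with a small Python sketch. It is only an illustration: the example sequence is made up, and the helper names are hypothetical. The rule holds for a fully base-paired duplex because every A is matched by a U and every G by a C, whereas a single strand has no such constraint.

```python
from collections import Counter

# Base pairs that form in RNA.
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Partner strand of rna, written 5' to 3'."""
    return "".join(RNA_PAIR[base] for base in reversed(rna))


def a_plus_c_equals_u_plus_g(bases):
    """True if the counts of A and C together equal those of U and G."""
    counts = Counter(bases)
    return counts["A"] + counts["C"] == counts["U"] + counts["G"]


single_strand = "AAGCU"  # made-up example sequence
duplex_bases = single_strand + reverse_complement(single_strand)

print(a_plus_c_equals_u_plus_g(single_strand))  # single strand: rule need not hold
print(a_plus_c_equals_u_plus_g(duplex_bases))   # full duplex: rule holds
```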

An Overview of DNA Functions


The functions of DNA are vital for inheritance, coding for proteins and
the genetic blueprint of life. Given the enormity of DNA's functions in the
human body and its responsibility for the growth and maintenance of life, it is
not surprising that the discovery of DNA has led to such a great number of
developments in treating disease. DNA holds the instructions for an organism's
development and reproduction - ultimately, its survival.
Coding for Proteins
DNA holds the code for proteins, which are complex molecules that do
huge amounts of work around our body. Information in DNA is initially 'read'
and then it is transcribed into a messenger molecule. After, the information
held in this messenger molecule is translated into a 'language' that the body can
understand. This language is one of amino acids, which are also known as the
building blocks of proteins. It is this specific language that dictates how the
amino acids should produce a particular protein. If you think about the twenty
different kinds of amino acids, you can see that the ordering can produce an
enormous variety of proteins.
DNA Replication
DNA replication is vital for a virtually endless list of functions, from
reproduction to maintenance and growth of cells, tissues and body systems. To
copy itself, a DNA molecule essentially 'unzips,' thus resulting in a series of
bases without pairs along the backbone of the molecule. DNA has four bases -
all part of a nucleotide that also consists of a sugar and phosphate. The four
bases in DNA are very specific about which base they will attach to, which
means that adenine only pairs with thymine and guanine will only pair with
cytosine. As the nucleotides connect with unpaired bases on the backbone of
the DNA molecule, they build a new strand that complements - or matches - the
original sequence. The end result is a strand that is a perfect match to the
original one prior to it unzipping.
Cells in your body replicate for purposes such as making new skin or
blood cells. When mistakes occur, there are repair systems in place to remedy
the mistake or, alternatively, the cell is marked for destruction. If a cell
survives with a mutation, the change can occasionally benefit the organism. In
fact, this concept is essentially the basis for evolution.
Genetic Code
DNA is important in terms of our genetic code, in the sense that it
transfers genetic messages to all of the cells in your body. If you think about
DNA in a reproductive sense, consider that the joining of an egg and sperm to
create your first cell provided the complete genetic code that your body
would use for all of your life. Within that initial cell, half of your chromosomes
- containing your DNA - came from your father and half came from your
mother.

DNA clearly plays important roles in the human body, and the description of
its structure is one of the most significant discoveries of the twentieth
century. Our continued research
and knowledge of DNA functions will likely help us to learn even more about
this important molecule.
Biological functions
DNA usually occurs as linear chromosomes in eukaryotes, and circular
chromosomes in prokaryotes. The set of chromosomes in a cell makes up its
genome; the human genome has approximately 3 billion base pairs of DNA
arranged into 46 chromosomes. The information carried by DNA is held in the
sequence of pieces of DNA called genes. Transmission of genetic information
in genes is achieved via complementary base pairing. For example, in
transcription, when a cell uses the information in a gene, the DNA sequence is
copied into a complementary RNA sequence through the attraction between the
DNA and the correct RNA nucleotides. Usually, this RNA copy is then used to
make a matching protein sequence in a process called translation which
depends on the same interaction between RNA nucleotides. Alternatively, a
cell may simply copy its genetic information in a process called DNA
replication. Details of these processes are covered elsewhere; here we
focus on the interactions between DNA and other molecules that mediate the
function of the genome.
Genes and genomes
Genomic DNA is located in the cell nucleus of eukaryotes, as well as
small amounts in mitochondria and chloroplasts. In prokaryotes, the DNA is
held within an irregularly shaped body in the cytoplasm called the nucleoid.
The genetic information in a genome is held within genes, and the complete set
of this information in an organism is called its genotype. A gene is a unit of
heredity and is a region of DNA that influences a particular characteristic in an
organism. Genes contain an open reading frame that can be transcribed, as well
as regulatory sequences such as promoters and enhancers, which control the
transcription of the open reading frame.
In many species, only a small fraction of the total sequence of the
genome encodes protein. For example, only about 1.5% of the human genome
consists of protein-coding exons, with over 50% of human DNA consisting of
non-coding repetitive sequences. The reasons for the presence of so much non-
coding DNA in eukaryotic genomes and the extraordinary differences in
genome size, or C-value, among species represent a long-standing puzzle
known as the "C-value enigma." However, DNA sequences that do not code
protein may still encode functional non-coding RNA molecules, which are
involved in the regulation of gene expression.

Some non-coding DNA sequences play structural roles in
chromosomes. Telomeres and centromeres typically contain few genes, but are
important for the function and stability of chromosomes. An abundant form of
non-coding DNA in humans is pseudogenes, which are copies of genes that
have been disabled by mutation.[61] These sequences are usually just molecular
fossils, although they can occasionally serve as raw genetic material for the
creation of new genes through the process of gene duplication and divergence.
Transcription and translation
A gene is a sequence of DNA that contains genetic information and can
influence the phenotype of an organism. Within a gene, the sequence of bases
along a DNA strand defines a messenger RNA sequence, which then defines
one or more protein sequences. The relationship between the nucleotide
sequences of genes and the amino-acid sequences of proteins is determined by
the rules of translation, known collectively as the genetic code. The genetic
code consists of three-letter 'words' called codons formed from a sequence of
three nucleotides (e.g. ACT, CAG, TTT).
In transcription, the codons of a gene are copied into messenger RNA
by RNA polymerase. This RNA copy is then decoded by a ribosome that reads
the RNA sequence by base-pairing the messenger RNA to transfer RNA, which
carries amino acids. Since there are 4 bases in 3-letter combinations, there are
64 possible codons (4^3 combinations). These encode the twenty standard amino
acids, giving most amino acids more than one possible codon. There are also
three 'stop' or 'nonsense' codons signifying the end of the coding region; these
are the TAA, TGA and TAG codons (UAA, UGA and UAG in the messenger RNA).
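The codon arithmetic above can be reproduced in a few lines of Python. The sketch below assumes the standard genetic code written as the usual 64-letter amino-acid string in T, C, A, G order (one-letter amino-acid codes, with * marking a stop); the `translate` helper and the example sequence are hypothetical and ignore details such as locating the start codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; '*' marks a stop.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

# All 4 x 4 x 4 three-letter combinations, each mapped to its amino acid.
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
amino_acids = set(CODON_TABLE.values()) - {"*"}


def translate(coding_dna):
    """Read the sequence codon by codon and stop at the first stop codon."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        aa = CODON_TABLE[coding_dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(len(CODON_TABLE))   # number of possible codons
print(len(amino_acids))   # number of amino acids encoded
print(stop_codons)        # the three stop codons
print(translate("ATGACTCAGTTTTAA"))  # uses the example codons ACT, CAG, TTT
```

Because 61 codons are shared among twenty amino acids, most amino acids have more than one codon, as the text notes.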
DNA replication. The double helix is unwound by a helicase and
topoisomerase. Next, one DNA polymerase produces the leading strand copy.
Another DNA polymerase binds to the lagging strand. This enzyme makes
discontinuous segments (called Okazaki fragments) before DNA ligase joins
them together.
Interactions with proteins
All the functions of DNA depend on interactions with proteins. These
protein interactions can be non-specific, or the protein can bind specifically to a
single DNA sequence. Enzymes can also bind to DNA and of these, the
polymerases that copy the DNA base sequence in transcription and DNA
replication are particularly important.
DNA-binding proteins
Structural proteins that bind DNA are well-understood examples of
non-specific DNA-protein interactions. Within chromosomes, DNA is held in
complexes with structural proteins. These proteins organize the DNA into a
compact structure called chromatin. In eukaryotes this structure involves DNA
binding to a complex of small basic proteins called histones, while in
prokaryotes multiple types of proteins are involved.[64][65] The histones form a
disk-shaped complex called a nucleosome, which contains two complete turns
of double-stranded DNA wrapped around its surface. These non-specific
interactions are formed through basic residues in the histones making ionic
bonds to the acidic sugar-phosphate backbone of the DNA, and are therefore
largely independent of the base sequence.[66] Chemical modifications of these
basic amino acid residues include methylation, phosphorylation and
acetylation. These chemical changes alter the strength of the interaction
between the DNA and the histones, making the DNA more or less accessible to
transcription factors and changing the rate of transcription.[68] Other non-
specific DNA-binding proteins in chromatin include the high-mobility group
proteins, which bind to bent or distorted DNA. These proteins are important in
bending arrays of nucleosomes and arranging them into the larger structures
that make up chromosomes.
A distinct group of DNA-binding proteins comprises those
that specifically bind single-stranded DNA. In humans, replication
protein A is the best-understood member of this family and is used in processes
where the double helix is separated, including DNA replication, recombination
and DNA repair.[71] These binding proteins seem to stabilize single-stranded
DNA and protect it from forming stem-loops or being degraded by nucleases.
In contrast, other proteins have evolved to bind particular DNA
sequences. The most intensively-studied of these are the various transcription
factors, which are proteins that regulate transcription. Each transcription factor
binds to one particular set of DNA sequences and activates or inhibits the
transcription of genes that have these sequences close to their promoters. The
transcription factors do this in two ways. Firstly, they can bind the RNA
polymerase responsible for transcription, either directly or through other
mediator proteins; this locates the polymerase at the promoter and allows it to
begin transcription.[73] Alternatively, transcription factors can bind enzymes
that modify the histones at the promoter; this will change the accessibility of
the DNA template to the polymerase.
As these DNA targets can occur throughout an organism's genome,
changes in the activity of one type of transcription factor can affect thousands
of genes. Consequently, these proteins are often the targets of the signal
transduction processes that control responses to environmental changes or
cellular differentiation and development. The specificity of these transcription
factors' interactions with DNA comes from the proteins making multiple
contacts to the edges of the DNA bases, allowing them to "read" the DNA
sequence. Most of these base-interactions are made in the major groove, where
the bases are most accessible.
DNA-modifying enzymes
Nucleases and ligases
Nucleases are enzymes that cut DNA strands by catalyzing the
hydrolysis of the phosphodiester bonds. Nucleases that hydrolyse nucleotides
from the ends of DNA strands are called exonucleases, while endonucleases cut
within strands. The most frequently-used nucleases in molecular biology are
the restriction endonucleases, which cut DNA at specific sequences. For
instance, the EcoRV enzyme shown to the left recognizes the 6-base sequence
5′-GAT|ATC-3′ and makes a cut at the vertical line. In nature, these enzymes
protect bacteria against phage infection by digesting the phage DNA when it
enters the bacterial cell, acting as part of the restriction modification system.[78]
In technology, these sequence-specific nucleases are used in molecular cloning
and DNA fingerprinting.
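As a rough illustration of sequence-specific cutting, the Python sketch below digests a made-up linear DNA sequence at the EcoRV site described above (5'-GAT|ATC-3'). The `digest` function and the example sequence are hypothetical, and only the top strand is tracked; since the cut falls in the middle of the site, both strands are cut at the same position.

```python
def digest(dna, site="GATATC", cut_offset=3):
    """Cut a linear DNA sequence at every occurrence of site.

    cut_offset is the number of bases of the site that stay on the
    left-hand fragment (3 for GAT|ATC).
    """
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return fragments


dna = "AAGATATCGGGATATCTT"  # made-up sequence with two EcoRV sites
print(digest(dna))
```

A sequence with n sites on a linear molecule gives n + 1 fragments, and joining the fragments in order restores the original sequence.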
Enzymes called DNA ligases can rejoin cut or broken DNA strands.[79]
Ligases are particularly important in lagging strand DNA replication, as they
join together the short segments of DNA produced at the replication fork into a
complete copy of the DNA template. They are also used in DNA repair and
genetic recombination.
RNA FUNCTION
RNA, a nucleic acid made up of nucleotides, has a variety of functions in a
cell and is found in many organisms including plants, animals, viruses, and
bacteria. Ribonucleic acid (RNA) and deoxyribonucleic acid (DNA) differ
functionally. DNA primarily serves as the storage material for genetic
information. RNA can function as a carrier of genetic information, a catalyst of
biochemical reactions, an adapter molecule in protein synthesis, and a
structural molecule in cellular organelles.
Since the structure of DNA was described in the 1950s, scientists have
studied the function and structure of the components that make up these
molecules. The various types and functions of RNA have been investigated by
numerous researchers, including Spanish physiologist Severo Ochoa (1905–
1993), who received a Nobel prize in 1959 for his contributions to our
understanding of how RNA is synthesized.
There are five major types of RNA that are found in the cells of
eukaryotes. These include heterogeneous nuclear RNA (hnRNA), messenger
RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), and small
nuclear RNA (snRNA). Structurally, hnRNA and mRNA are both single stranded,
while rRNA and tRNA form three-dimensional molecular configurations. Each type
of RNA has a different role in various cellular processes. In addition to these
functions, RNA plays an important role in the ability of certain viruses to cause
infection.
One of the primary functions of RNA is to facilitate the translation of
DNA into protein. This process begins in the nucleus of the cell with a series of
enzymatic reactions that transcribe DNA into heterogeneous nuclear RNA by
complementary base pairing. Since hnRNA is a direct copy of DNA, it contains
exons and introns which are coding and noncoding regions of nucleotides,
respectively. hnRNA undergoes post-transcriptional processing that involves
removal of the introns and the addition of adenines to the 3' end of the single
stranded RNA molecules (a process called polyadenylation), which are now
referred to as mRNA. mRNA is transported out of the nucleus into the cytoplasm
of the cell. In this way, it functions as a carrier for information from the
cell's DNA to the protein synthesizing organelle, called the ribosome.
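The path from DNA to mRNA described above can be sketched in Python. This is a toy model under stated assumptions: the gene sequence and the intron position are invented for the example, the intron is supplied as known coordinates rather than being recognised by the splicing machinery, and the function names and tail length are hypothetical.

```python
def transcribe(coding_strand):
    """The hnRNA carries the sequence of the DNA coding strand, with U for T."""
    return coding_strand.replace("T", "U")


def splice(hnrna, introns):
    """Remove introns, given as (start, end) index pairs with end exclusive."""
    exons = []
    pos = 0
    for start, end in sorted(introns):
        exons.append(hnrna[pos:start])
        pos = end
    exons.append(hnrna[pos:])
    return "".join(exons)


def add_poly_a(rna, tail_length=5):
    """Add a run of adenines to the 3' end."""
    return rna + "A" * tail_length


# Made-up gene: exon 1, one intron, exon 2.
exon1, intron, exon2 = "ATGGCC", "GTAAGTTTTCAG", "TTTTAA"
gene = exon1 + intron + exon2
intron_position = (len(exon1), len(exon1) + len(intron))

hnrna = transcribe(gene)
mrna = add_poly_a(splice(hnrna, [intron_position]))
print(hnrna)
print(mrna)
```

The hnRNA still contains the intron, while the finished mRNA contains only the joined exons followed by the added adenines.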
The mRNA attaches to the ribosome to allow for the initiation of
protein synthesis. Part of this process involves another type of RNA, called
tRNA, that binds to the ribosome. tRNA is an adapter molecule, which
functions as a bridge between a specific three-base sequence or codon in the
mRNA strand and the amino acids that are used to construct the protein. The
tRNA carries an amino acid that matches the specific codon and this process
begins and stops based on specific sequences in the mRNA. Each amino acid is
transferred to the growing polypeptide by chemical interactions to produce a
full-length protein. Another type of RNA that is part of the ribosome and is
involved in protein synthesis is rRNA. rRNA has two primary functions. First,
it provides the structure and shape producing the catalytic regions of the
ribosome. Second, it helps speed up, or catalyze, protein synthesis by
interactions between the tRNA and the protein synthesis machinery.
While DNA and RNA are very similar in their composition, RNA has
different roles. RNA can serve as a component of the translation machinery and
catalyze chemical reactions. For example, in addition to RNA molecules such
as rRNA, ribozymes are also a type of RNA that can serve catalytic functions.
rRNA functions as a ribozyme during protein synthesis. Another form of RNA
that acts as a ribozyme is found in the small nuclear ribonucleoprotein. During
the process of RNA splicing, this ribozyme-like, RNA-containing structure
catalyzes reactions in the spliceosome, a group of biomolecules that are
involved in removal of the intron, or splicing the hnRNA. These molecules,
therefore, play a role in the processing of the hnRNA.
Certain viruses contain RNA as their primary genetic material. A virus
binds to a specific protein or receptor on the surface of the cell that it is
going to infect. RNA, the virus's genetic material, is injected into the cell.
The viral RNA associates with the ribosomes that belong to the cell it is
infecting. In a sense, viruses hijack the host's molecular machinery, using the
cell's translational abilities for their own purpose, to produce viral proteins. The viral
proteins then form new viruses. Viral RNA can also form replication
complexes where it can copy itself. This copied RNA then gets packaged into
the newly created viruses that can cause the cell to lyse, or break open, and
these released viruses can infect other cells.
Currently, there is growing interest in small, barely detectable RNA
molecules that do not translate into protein, but have been shown to be
important in regulating gene expression. Called RNA genes, these small
molecules were initially identified in the species of worm Caenorhabditis
elegans by American geneticist Victor Ambros and colleagues in the early
1990s. They were shown to turn off gene expression during worm
development. This novel function was later demonstrated in other species.
American geneticist Stephen R. Holbrook of Lawrence Berkeley National
Laboratory in California in a report in the October 1, 2001, journal, Nucleic
Acids Research, identified many other potential RNA genes previously
undetected using a complex computer program called RNAGENiE. Biotech
companies are currently using RNA genes as potential drug targets because of
recent interest in RNA genes produced during bacterial infections and their
pathogenic effects through the regulation of gene expression of host DNA.

Watson and Crick describe structure of DNA 1953

Photo: Model of DNA molecule


In the late nineteenth century, a German biochemist found that the nucleic
acids, long-chain polymers of nucleotides, were made up of sugar, phosphoric
acid, and several nitrogen-containing bases. Later it was found that the sugar in
nucleic acid can be ribose or deoxyribose, giving two forms: RNA and DNA.
In 1943, American Oswald Avery proved that DNA carries genetic
information. He even suggested DNA might actually be the gene. Most people
at the time thought the gene would be protein, not nucleic acid, but by the late
1940s, DNA was largely accepted as the genetic molecule. Scientists still
needed to figure out this molecule's structure to be sure, and to understand how
it worked.
In 1948, Linus Pauling discovered that many proteins take the shape of
an alpha helix, spiraled like a spring coil. In 1950, biochemist Erwin Chargaff
found that the arrangement of nitrogen bases in DNA varied widely, but the
amount of certain bases always occurred in a one-to-one ratio. These
discoveries were an important foundation for the later description of DNA.
In the early 1950s, the race to discover the structure of DNA was on. At Cambridge
University, graduate student Francis Crick and research fellow James Watson
(b. 1928) had become interested, impressed especially by Pauling's work.
Meanwhile at King's College in London, Maurice Wilkins (b. 1916) and
Rosalind Franklin were also studying DNA. The Cambridge team's approach
was to make physical models to narrow down the possibilities and eventually
create an accurate picture of the molecule. The King's team took an
experimental approach, looking particularly at x-ray diffraction images of
DNA.
In 1951, Watson attended a lecture by Franklin on her work to date. She
had found that DNA can exist in two forms, depending on the relative humidity
in the surrounding air. This had helped her deduce that the phosphate part of
the molecule was on the outside. Watson returned to Cambridge with a rather
muddy recollection of the facts Franklin had presented, though clearly critical
of her lecture style and personal appearance. Based on this information, Watson
and Crick made a failed model. It caused the head of their unit to tell them to
stop DNA research. But the subject just kept coming up.
Franklin, working mostly alone, found that her x-ray diffractions
showed that the "wet" form of DNA (in the higher humidity) had all the
characteristics of a helix. She suspected that all DNA was helical but did not
want to announce this finding until she had sufficient evidence on the other
form as well. Wilkins was frustrated. In January, 1953, he showed Franklin's
results to Watson, apparently without her knowledge or consent. Crick later
admitted, "I'm afraid we always used to adopt -- let's say, a patronizing attitude
towards her."
Watson and Crick took a crucial conceptual step, suggesting the
molecule was made of two chains of nucleotides, each in a helix as Franklin
had found, but one going up and the other going down. Crick had just learned
of Chargaff's findings about base pairs in the summer of 1952. He added that to
the model, so that matching base pairs interlocked in the middle of the double
helix to keep the distance between the chains constant.
Watson and Crick showed that each strand of the DNA molecule was a
template for the other. During cell division the two strands separate and on
each strand a new "other half" is built, just like the one before. This way DNA
can reproduce itself without changing its structure -- except for occasional
errors, or mutations.
The structure so perfectly fit the experimental data that it was almost
immediately accepted. The discovery of DNA's structure has been called the most important
biological work of the last 100 years, and the field it opened may be the
scientific frontier for the next 100. By 1962, when Watson, Crick, and Wilkins
won the Nobel Prize for physiology/medicine, Franklin had died. The Nobel
Prize only goes to living recipients, and can only be shared among three
winners. Were she alive, would she have been included in the prize?
Watson Crick Model of DNA - Watson and Crick proposed the DNA model
by using all the information that was available at that time. They used the data
obtained from experiments carried out on DNA by Chargaff, and Maurice
Wilkins and Rosalind Franklin. Before we go through Watson-Crick double
helix we must look at the work of these people.
Chargaff's Rule - Chargaff's rule states that the number of purines is always equal
to the number of pyrimidines in a given DNA. Specifically, the
number of adenine residues equals the number of thymine residues, and the number of
guanine residues equals the number of cytosine residues; that is, A = T and G = C. Thus we can
say A + G = C + T.
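The rule can be checked on any double-stranded sequence by counting bases over both strands. The following is a minimal sketch, not a laboratory method; the sequence and the function names are made up for illustration.

```python
from collections import Counter

# Watson-Crick partners for each base.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def partner_strand(strand: str) -> str:
    """Return the base-paired partner of `strand`, read in the opposite direction."""
    return "".join(PAIRS[base] for base in reversed(strand))


def duplex_base_counts(strand: str) -> Counter:
    """Count each base over both strands of the duplex formed by `strand`."""
    return Counter(strand) + Counter(partner_strand(strand))


# Hypothetical single-strand sequence; any string over A, T, G, C will do.
counts = duplex_base_counts("ATGCGGATTACAGGC")
print(counts["A"] == counts["T"], counts["G"] == counts["C"])
print(counts["A"] + counts["G"] == counts["C"] + counts["T"])
```

Because every A on one strand faces a T on the other, and every G faces a C, the equalities hold for the duplex as a whole even when a single strand on its own is far from balanced.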
X-ray diffraction studies - X-ray diffraction studies by M. Wilkins and R.
Franklin suggested that DNA could be a helix with two regular periodicities of
3.4 Å and 34 Å along the axis of the molecule.
Based upon the above facts, Watson and Crick proposed the famous
DNA structure model. The important features of this model are:
1. The DNA molecule is a double helix of two polynucleotide strands
(phosphate, sugar, base) running in opposite directions.
2. The double helix is right-handed. This means that if the double helix is a
spiral staircase that you were climbing up, the base would be on your right
hand side.
3. The double helix has two different grooves. The helix is not absolutely
regular. A major (~22 Å) and a minor (~12 Å) groove can be distinguished.
This feature is important in the interaction between the double helix and
the proteins involved in DNA replication and in expression.
4. The nitrogenous bases are stacked towards the inside of the helix. The
experimental evidence also indicated that the sugar-phosphate backbone of
the molecules is on the outside, with the bases inside the helix.
5. Bases of the two polynucleotides interact by hydrogen bonding. This is the
explanation of Chargaff's base ratios. An adenine residue in one of the
polynucleotides is always paired with a thymine in the other strand;
similarly guanine is always paired with cytosine. These two pairs of bases,
and no other combinations, are able to form hydrogen bonds between each
other. These hydrogen bonds, together with the stacking of the bases, hold the
two polynucleotides of the double helix together.
6. Ten base pairs occur per turn of the helix. The double helix executes a turn
every ten base pairs (abbreviated as 10 bp). The height or pitch of the helix
is 34 Å. The bases are stacked one on top of the other like a pile of plates.
The spacing between adjacent base pairs is 3.4 Å, and each base pair is rotated by 36° relative to the next.
7. The diameter of the helix is 20 Å.
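The geometric figures in this model are linked by simple arithmetic: the rise per base pair is the pitch divided by the number of base pairs per turn, and the rotation per base pair is 360° divided by the same number. A short sketch using the values quoted above for the Watson-Crick model (the variable and function names are only illustrative):

```python
PITCH_ANGSTROM = 34.0  # height of one complete turn of the helix
BP_PER_TURN = 10       # base pairs per turn

rise_per_bp = PITCH_ANGSTROM / BP_PER_TURN  # spacing between adjacent base pairs
twist_per_bp = 360.0 / BP_PER_TURN          # rotation between adjacent base pairs


def duplex_length_angstrom(n_bp: int) -> float:
    """Approximate end-to-end length of a straight duplex of n_bp base pairs."""
    return n_bp * rise_per_bp


print(rise_per_bp, twist_per_bp)
print(duplex_length_angstrom(1000))
```

On these figures a 1000 bp fragment is roughly 3400 Å (0.34 micrometres) long, assuming the helix is straight and every base pair has the same rise.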
A Form of DNA - The A-form of DNA is formed when relative humidity is 70%. It
is right-handed, wider, incorporates more base pairs per helical turn and is less
flexible than B-DNA. The major groove is deep and narrow, whereas the
minor groove is broad and shallow. The diameter of the DNA is 23 Å. Eleven
base pairs are present per turn with a distance of 2.6 Å between them. The A-form of
DNA is less soluble than the B-form. The helix height or pitch is 31 Å.
Z Form of DNA - Alexander Rich and his colleagues discovered in 1979 that
DNA does not always have the right-handed form, it can be left handed also.
They showed that double stranded DNA containing strands of alternating
purines and pyrimidines (e.g. poly GC) can exist in an extended left handed
helical form. Because of the zigzag look of this DNA backbone when viewed
from the side, it is often called Z-DNA. The major groove is flat and the
minor groove is narrow and very deep. The diameter of the DNA is 18 Å.
Twelve base pairs are present per turn with a distance of 3.7 Å between
them. Z-DNA structures tend to form in torsionally stressed DNA and are
stabilized by dehydration. Originally, it was thought that Z- DNA would not
prove of interest to biologists because it required very high salt concentration to
become stable. However, it was found that Z-DNA can be stabilized in
physiologically normal conditions, if methyl groups are added to the cytosine.
Z-DNA may be involved in regulating gene expression in eukaryotes.
Multiple Stranded DNA - Under natural conditions, single stranded RNA and
double-stranded DNA are the rule. However, under laboratory conditions, it is
possible to induce a third strand of DNA to interdigitate itself into the major
groove of the double helix of normal DNA in a sequence-specific fashion. The
binding is site specific and rules of binding are less precise than normal.
Triple stranded nucleotide chains were first created in 1957 by
Alexander Rich, David Davies and Gary Felsenfeld, while they were creating
artificial nucleic acids. At that time, triple-stranded DNA seemed like a
laboratory curiosity. Now it seems of interest, because it may have valuable
uses, both experimentally and clinically. Triplex DNA is generated by binding
a known single-stranded DNA to a duplex, because a single strand of DNA is capable of
recognizing a relatively long sequence of the double-stranded DNA in a
chromosome.
Thus it is possible to selectively locate a particular gene locus or
sequence. The second use of triplex DNA is to cut DNA at a specific place by
adding a cleaving compound to both ends of the third strand of DNA. Triplex DNA
therefore seems to have good potential for therapeutic use and for studying and
mapping the human genome.
More recently, four stranded DNA molecules have been found in which
double helices of certain sequences interdigitate to form four stranded
structures. Guanine can form base tetrads and DNA containing runs of
guanosine residues can form quadruplex structures, which may contribute to
telomere structure.
Telomeric DNA consists of short, tandemly repeated sequences. These
have been characterized from a number of eukaryotes and are generally GC
rich with guanine residues clustered on the one strand and cytosine residues on
the other. They may form unusual quadruplex structures by unorthodox
interactions between guanosine residues, which may play a role in
protecting the telomere from end-joining reactions. Four stranded DNA plays
an important role in crossover.
Ribosome
Ribosomes are complexes of RNA and protein that are found in all
cells. Ribosomes from bacteria, archaea and eukaryotes, the three domains of
life, have significantly different structure and RNA. Interestingly, the
ribosomes in the mitochondrion of eukaryotic cells resemble those in bacteria,
reflecting the evolutionary origin of this organelle.
The ribosome functions in the expression of the genetic code from
nucleic acid into protein, in a process called translation. Ribosomes do this by
catalyzing the assembly of individual amino acids into polypeptide chains; this
involves binding a messenger RNA and then using this as a template to join
together the correct sequence of amino acids. This reaction uses adapters called
transfer RNA molecules, which read the sequence of the messenger RNA and
are attached to the amino acids.
Function
Ribosomes are the workhorses of protein biosynthesis, the process of
translating mRNA into protein. The mRNA comprises a series of codons that
dictate to the ribosome the sequence of the amino acids needed to make the
protein. Using the mRNA as a template, the ribosome traverses each codon (3
nucleotides) of the mRNA, pairing it with the appropriate amino acid provided
by a tRNA. Molecules of transfer RNA (tRNA) contain a complementary
anticodon on one end and the appropriate amino acid on the other. The small
ribosomal subunit, typically bound to a tRNA containing the amino acid
methionine, binds to an AUG codon on the mRNA and recruits the large
ribosomal subunit. The ribosome then contains three RNA binding sites,
designated A, P, and E. The A site binds an aminoacyl-tRNA (a tRNA bound to
an amino acid); the P site binds a peptidyl-tRNA (a tRNA bound to the peptide
being synthesized); and the E site binds a free tRNA before it exits the
ribosome. Protein synthesis begins at a start codon AUG near the 5' end of the
mRNA. The initiator tRNA binds to the P site of the ribosome first. The ribosome is able to
identify the start codon by use of the Shine-Dalgarno sequence of the mRNA in
prokaryotes and the Kozak box in eukaryotes.

Figure 3: Translation of mRNA (1) by a ribosome (2) into a polypeptide chain (3). The mRNA begins with a start codon (AUG) and ends with a stop codon (UAG).
In Figure 3, both ribosomal subunits (small and large) assemble at the
start codon (towards the 5' end of the mRNA). The ribosome uses tRNA which
matches the current codon (triplet) on the mRNA to append an amino acid to
the polypeptide chain. This is done for each triplet on the mRNA, while the
ribosome moves towards the 3' end of the mRNA. Usually in bacterial cells,
several ribosomes work in parallel on a single mRNA, forming what is
called a polyribosome or polysome.
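The codon-by-codon reading described above can be imitated in a few lines. This is a deliberately simplified sketch: it uses the standard genetic code, starts at the first AUG it finds, and ignores the Shine-Dalgarno and Kozak signals, the ribosomal sites and the tRNAs themselves. The function name and the example mRNA are invented for illustration.

```python
from itertools import product

# Standard genetic code written in U, C, A, G order for each codon position;
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    peptide = []
    for i in range(start, len(mrna) - 2, 3):  # step one codon (3 nt) at a time
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Hypothetical mRNA: 5' leader, AUG, two codons, UAG stop, 3' trailer.
print(translate("GGAUGGCUUUCUAGAA"))  # MAF
```

The output uses one-letter amino acid codes, so MAF stands for methionine, alanine, phenylalanine.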
DNA replication
DNA replication. The double helix is unwound and each strand acts as a
template. Bases are matched to synthesize the new partner strands.
DNA replication is the process of copying a double-stranded DNA molecule to
form two double-stranded molecules.[1][2] The process of DNA replication is a
fundamental process used by all living organisms as it is the basis for biological
inheritance. As each DNA strand holds the same genetic information, both
strands can serve as templates for the reproduction of the complementary
strand. The template strand is preserved in its entirety and the new strand is
assembled from nucleotides. This process is called "semiconservative
replication". The resulting double-stranded DNA molecules are identical;
proofreading and error-checking mechanisms exist to ensure near perfect
fidelity. In a cell, DNA replication must happen before cell division can
occur. DNA synthesis begins at specific locations in the genome, called
"origins", where the two strands of DNA are separated.[3] RNA primers attach
to single stranded DNA and the enzyme DNA polymerase extends the primers
to form new strands of DNA, adding nucleotides matched to the template
strand. The unwinding of DNA and synthesis of new strands forms a
replication fork. In addition to DNA polymerase, a number of other proteins are
associated with the fork and assist in the initiation and continuation of DNA
synthesis.
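Semiconservative replication can be followed over several generations with a very small model in which each duplex is a pair of strand labels. This is only a bookkeeping sketch; the labels "original" and "new" and the function name are assumptions made for the example.

```python
def replicate(duplexes):
    """Replicate every duplex: each old strand is kept and paired with a new one."""
    next_generation = []
    for strand_a, strand_b in duplexes:
        next_generation.append((strand_a, "new"))
        next_generation.append((strand_b, "new"))
    return next_generation


# Start from a single duplex whose two strands are both labelled "original".
population = [("original", "original")]
for _ in range(3):  # three rounds of replication
    population = replicate(population)

with_original = sum("original" in duplex for duplex in population)
print(len(population), with_original)  # 8 2
```

After three rounds there are eight duplexes, and only two of them still carry one of the original strands, because template strands are preserved intact rather than broken up.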
DNA replication can also be performed artificially, using the same
enzymes used within the cell. DNA polymerases and artificial DNA primers
are used to initiate DNA synthesis at known sequences in a template molecule.
The polymerase chain reaction (PCR), a common laboratory technique,
employs artificial synthesis in a cyclic manner to rapidly and specifically
amplify a target DNA fragment from a pool of DNA.
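The power of cyclic amplification comes from repeated doubling. As a rough sketch, assuming an idealised reaction in which a fixed fraction of the target molecules is copied in every cycle (the `efficiency` parameter is an assumption for illustration, not a measured quantity):

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealised number of target copies after a given number of PCR cycles.

    efficiency = 1.0 means every molecule is copied in every cycle (perfect doubling).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


print(pcr_copies(1, 30))        # perfect doubling for 30 cycles
print(pcr_copies(1, 30, 0.9))   # somewhat less than perfect doubling
```

With perfect doubling a single template molecule gives 2^30, or about a billion, copies after 30 cycles; real reactions fall short of this as reagents run out.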
DNA structure
DNA usually exists in a double-stranded structure, with both strands
coiled together to form the characteristic double-helix. Each single strand of
DNA is a chain of four types of nucleotide: adenine, cytosine, guanine, and
thymine. A nucleotide consists of a phosphate and a deoxyribose sugar forming
the backbone of the DNA double helix plus a base that points inwards.
Nucleotides are matched between strands through hydrogen bonds to form base
pairs. Adenine pairs with thymine and cytosine pairs with guanine.
The physical pairing of bases in DNA means that the information
contained within each strand is redundant. The nucleotides on a single strand
can be used to reconstruct nucleotides on a newly synthesized partner strand.
DNA strands have a directionality, and the different ends of a single
strand are called the "3' end" and the "5' end" (these refer to the carbon atom in
deoxyribose to which the next phosphate in the chain attaches). In addition to being
complementary, the two strands of DNA are antiparallel: they are orientated in
opposite directions. This directionality has consequences in DNA synthesis,
because DNA polymerase can only synthesize DNA in one direction by adding
nucleotides to the 3' end of a DNA strand.
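Because the strands are both complementary and antiparallel, the partner of a strand written 5' to 3' is obtained by complementing each base and then reversing the order. A minimal sketch (the function name and example sequence are illustrative only):

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def partner_strand(strand_5_to_3: str) -> str:
    """Return the partner strand, also written 5' to 3' (hence the reversal)."""
    return strand_5_to_3.translate(COMPLEMENT)[::-1]


template = "ATGGCATTC"  # hypothetical sequence
partner = partner_strand(template)
print(partner)                  # GAATGCCAT
print(partner_strand(partner))  # ATGGCATTC
```

Applying the operation twice returns the original sequence, which is another way of saying that each strand carries the full information needed to rebuild the other.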
DNA polymerase
DNA polymerase adds nucleotides to the 3' end of a strand of DNA. If a
mismatch is accidentally incorporated, the polymerase is inhibited from further
extension. Proofreading removes the mismatched nucleotide and extension
continues.
DNA polymerases are a family of enzymes critical for all forms of
DNA replication. A DNA polymerase synthesizes a new strand of DNA by
extending the 3' end of an existing nucleotide chain, adding new nucleotides
matched to the template strand one at a time. Some DNA polymerases may also
have some proofreading ability, removing nucleotides from the end of a strand
in order to remove any mismatched bases. DNA polymerases are generally
extremely accurate, making less than one error for every 10^9 nucleotides added.
The energy for the process of DNA polymerization comes from the two
additional phosphates attached to each of the unincorporated nucleotides. These
free nucleotides, also known as nucleoside triphosphates, contain a total of
three phosphates. When a nucleotide is being added to a growing DNA strand,
two of the phosphates are removed and the energy produced is used to attach
the remaining phosphate to the growing chain. The energetics of this process
may also explain the directionality of synthesis - if DNA were synthesized in
the 3' to 5' direction, the energy for the process would come from the 5' end of
the growing strand rather than from free nucleotides. During proofreading, if
the 5' nucleotide needed to be removed this triphosphate end would be lost,
losing the energy source required to add a new nucleotide to the end.
DNA polymerase can only extend an existing DNA strand paired with a
template strand; it cannot begin the synthesis of a new strand. To do this, a short
fragment of DNA or RNA, called a primer, must be created and paired with the
template strand before DNA polymerase can synthesize new DNA.
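The two constraints in this section, that a primer must already be paired with the template and that nucleotides are added only at the 3' end, can be expressed as a toy function. This is a sketch for illustration, not a model of any real enzyme; for simplicity the primer is written as DNA, and all names are hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_primer(template_3_to_5: str, primer_5_to_3: str) -> str:
    """Extend a primer along a template, one nucleotide at a time, at its 3' end.

    The template is written 3'->5' so that position i of the template faces
    position i of the growing strand, which is written 5'->3'.
    """
    if not primer_5_to_3:
        raise ValueError("synthesis cannot start without a primer")
    for template_base, primer_base in zip(template_3_to_5, primer_5_to_3):
        if PAIR[template_base] != primer_base:
            raise ValueError("primer is not base-paired with the template")
    new_strand = primer_5_to_3
    for template_base in template_3_to_5[len(primer_5_to_3):]:
        new_strand += PAIR[template_base]  # added to the 3' end only
    return new_strand


print(extend_primer("TACGGCAT", "ATG"))  # ATGCCGTA
```

Calling the function with an empty primer raises an error, mirroring the fact that the polymerase cannot begin a strand from nothing.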
DNA replication within the cell
Origins of replication
For a cell to divide, it must first replicate its DNA.[5]
This process is initiated at particular points within the DNA, known as
"origins", which are targeted by proteins that separate the two strands and
initiate DNA synthesis.[3] Origins contain DNA sequences recognized by
replication initiator proteins (e.g. DnaA in E. coli and the Origin Recognition
Complex in yeast).[6] These initiator proteins recruit other proteins to separate
the two strands and initiate replication forks.
Initiator proteins recruit other proteins to separate the DNA strands at
the origin, forming a bubble. Origins tend to be "AT-rich" (rich in adenine and
thymine bases) to assist this process because A-T base pairs have two hydrogen
bonds (rather than the three formed in a C-G pair)—strands rich in these
nucleotides are generally easier to separate.[7] Once strands are separated, RNA
primers are created on the template strands and DNA polymerase extends these
to create newly synthesized DNA.
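The effect of base composition on strand separation can be made concrete by counting hydrogen bonds, two for each A-T pair and three for each C-G pair. The sketch below compares two made-up 10 bp sequences; it counts hydrogen bonds only and ignores other contributions to duplex stability.

```python
# Hydrogen bonds contributed by the base pair that each base takes part in.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds holding a duplex together, given one of its strands."""
    return sum(H_BONDS[base] for base in strand)


def at_fraction(strand: str) -> float:
    """Fraction of positions occupied by A or T."""
    return sum(base in "AT" for base in strand) / len(strand)


at_rich = "TTATATAATT"  # hypothetical AT-rich stretch
gc_rich = "GCGCCGGCGC"  # hypothetical GC-rich stretch
print(at_fraction(at_rich), hydrogen_bonds(at_rich))  # 1.0 20
print(at_fraction(gc_rich), hydrogen_bonds(gc_rich))  # 0.0 30
```

For the same length, the AT-rich stretch is held together by a third fewer hydrogen bonds, which is consistent with origins being easier to open when they are AT-rich.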
As DNA synthesis continues, the original DNA strands continue to
unwind on each side of the bubble, forming replication forks. In bacteria, which
have a single origin of replication on their circular chromosome, this process
eventually creates a "theta structure" (resembling the Greek letter theta: θ). In
contrast, eukaryotes have longer linear chromosomes and initiate replication at
multiple origins within these.
DNA replication in prokaryotes is exemplified in E. coli. It is bi-directional
and originates at a single origin of replication (OriC).
Initiation
The initiation of replication is mediated by a protein that binds to a
region of the origin known as the DnaA box. In E. coli, there are 5 DnaA
boxes, each of which contains a highly conserved 9 bp consensus sequence 5' -
TTATCCACA - 3'. Binding of DnaA to this region causes it to become
negatively supercoiled. Following this, a region of OriC upstream of the DnaA
boxes (known as DnaB boxes) becomes melted. There are three of these regions;
each is 13 bp long and AT-rich (which facilitates melting because less
energy is required to break the two hydrogen bonds that form between A and T
nucleotides). This region has the consensus sequence 5' - GATCTNTTNTTTT
- 3'. Melting of the DnaB boxes requires ATP (which is hydrolyzed by DnaA).
Following melting, DnaA recruits a hexameric helicase (six DnaB proteins) to
opposite ends of the melted DNA. This is where the replication fork will form.
Recruitment of helicase requires six DnaC proteins, each of which is attached
to one subunit of helicase. Once this complex is formed, an additional five
DnaA proteins bind to the original five DnaA proteins to form five DnaA
dimers. DnaC is then released, and the prepriming complex is complete. In
order for DNA replication to continue, SSB protein is needed to prevent the
single strands of DNA from forming any secondary structures and to prevent
them from reannealing, and DNA gyrase is needed to relieve the stress (by
creating negative supercoils) created by the action of DnaB helicase. The
unwinding of DNA by DnaB helicase allows for primase (DnaG) and RNA
polymerase to prime each DNA template so that DNA synthesis can begin.
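The two consensus sequences quoted above can be located in a sequence by simple pattern matching, with N standing for any base. The sketch below searches one strand of a made-up test sequence; it is not the real OriC sequence, and genuine sites may differ from the consensus and occur in either orientation.

```python
import re

DNAA_BOX = "TTATCCACA"  # 9 bp consensus from the text
# 13 bp consensus GATCTNTTNTTTT, with N written as a character class.
THIRTEEN_MER = re.compile("GATCT[ACGT]TT[ACGT]TTTT")


def find_sites(sequence: str):
    """Return start positions (0-based) of DnaA boxes and 13-mers on one strand."""
    boxes = [match.start() for match in re.finditer(DNAA_BOX, sequence)]
    thirteen_mers = [match.start() for match in THIRTEEN_MER.finditer(sequence)]
    return boxes, thirteen_mers


# Hypothetical sequence: one 13-mer followed by one DnaA box, with filler bases.
toy_sequence = "CC" + "GATCTATTTTTTT" + "GG" + "TTATCCACA" + "AA"
boxes, thirteen_mers = find_sites(toy_sequence)
print(boxes, thirteen_mers)  # [17] [2]
```

In this toy sequence the 13-mer lies upstream of the DnaA box, matching the arrangement described for OriC.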
Elongation
Once priming is complete, DNA polymerase III holoenzyme is loaded
into the DNA and replication begins. The catalytic mechanism of DNA
polymerase III involves the use of two metal ions in the active site, and a
region in the active site that can discriminate between deoxynucleotides and
ribonucleotides. The metal ions are general divalent cations that help the 3' OH
initiate a nucleophilic attack onto the alpha phosphate of the
deoxyribonucleotide and orient and stabilize the negatively charged
triphosphate on the deoxyribonucleotide. Nucleophilic attack by the 3' OH on
the alpha phosphate releases pyrophosphate, which is subsequently
hydrolyzed (by inorganic pyrophosphatase) into two phosphates. This hydrolysis
drives DNA synthesis to completion.
Furthermore, DNA polymerase III must be able to distinguish between
correctly paired bases and incorrectly paired bases. This is accomplished by
distinguishing Watson-Crick base pairs through the use of an active site pocket
that is complementary in shape to the structure of correctly paired nucleotides.
This pocket has a tyrosine residue that is able to form van der Waals
interactions with the correctly paired nucleotide. In addition, dsDNA in the
active site has a wider and shallower minor groove that permits the formation
of hydrogen bonds with the third nitrogen of purine bases and the second
oxygen of pyrimidine bases. Finally, the active site makes extensive hydrogen
bonds with the DNA backbone. These interactions result in the DNA
polymerase III closing around a correctly paired base. If a base is inserted and
incorrectly paired, these interactions could not occur due to disruptions in
hydrogen bonding and van der Waals interactions.
The template strand is read in the 3' → 5' direction; therefore, the new strand is
synthesized in the 5' → 3' direction.
However, one of the parent strands of DNA is 3' → 5' while the other is 5' →
3'. To solve this, replication occurs in opposite directions. Heading towards the
replication fork, the leading strand is synthesized in a continuous fashion, only
requiring one primer. On the other hand, the lagging strand, heading away from
the replication fork, is synthesized in a series of short fragments known as
Okazaki fragments, consequently requiring many primers. The RNA primers of
Okazaki fragments are subsequently degraded by RNase H and DNA
Polymerase I (exonuclease), the resulting gaps are filled with
deoxyribonucleotides, and the remaining nicks are sealed by the enzyme ligase.
Termination
Termination of DNA replication in E. coli is completed through the use
of termination sequences and the Tus protein. These sequences allow the two
replication forks to pass through in only one direction, but not the other.
However, these sequences are not required for termination of replication.
Regulation of DNA replication is achieved through several
mechanisms. Mechanisms involve the ratio of ATP to ADP, of DnaA to the
number of DnaA boxes and the hemimethylation and sequestering of OriC. The
ratio of ATP to ADP indicates that the cell has reached a specific size and is
ready to divide. This "signal" occurs because in a rich medium, the cell will
grow quickly and will have a lot of excess DNA. Furthermore, DnaA binds
equally well to ATP or ADP, but only the DnaA-ATP complex is able to
initiate replication. Thus, in a fast growing cell, there will be more DnaA-ATP
than DnaA-ADP. Because the levels of DnaA are strictly regulated, and 5
DnaA-DnaA dimers are needed to initiate replication, the ratio of DnaA to the
number of DnaA boxes in the cell is important. After DNA replication is
complete, this number is halved, thus DNA replication cannot occur until the
levels of DnaA protein increase. Finally, DNA is sequestered by a membrane-binding
protein called SeqA. This protein binds to hemi-methylated GATC
DNA sequences. This 4 bp sequence occurs 11 times in OriC, and newly
synthesized DNA only has its parent strand methylated. DAM
methyltransferase methylates the newly synthesized strand of DNA only if it is
not bound to SeqA. The importance of hemi-methylation is twofold. Firstly,
OriC becomes inaccessible to DnaA, and secondly, DnaA binds better to fully
methylated DNA than hemi-methylated DNA.
The replication fork
Many enzymes are involved in the DNA replication fork.
The replication fork is a structure which forms when DNA is being
replicated. It is created through the action of helicase, which breaks the
hydrogen bonds holding the two DNA strands together. The resulting structure
has two branching "prongs", each one made up of a single strand of DNA.
Leading strand synthesis
In DNA replication, the leading strand is defined as the new DNA
strand at the replication fork that is synthesized in the 5'→3' direction in a
continuous manner. When the enzyme helicase unwinds DNA, two single
stranded regions of DNA (the "replication fork") form. On the leading strand
DNA polymerase III is able to synthesize DNA using the free 3' OH group
donated by a single RNA primer and continuous synthesis occurs in the
direction in which the replication fork is moving.
Lagging strand synthesis
The lagging strand is the DNA strand at the opposite side of the
replication fork from the leading strand, running in the 3' to 5' direction.
Because DNA polymerase cannot synthesize in the 3'→5' direction, the lagging
strand is synthesized in short segments known as Okazaki fragments. Along the
lagging strand's template, primase builds RNA primers in short bursts. DNA
polymerases are then able to use the free 3' OH groups on the RNA primers to
synthesize DNA in the 5'→3' direction. The RNA fragments are then removed
(different mechanisms are used in eukaryotes and prokaryotes) and new
deoxyribonucleotides are added to fill the gaps where the RNA was present.
DNA ligase then joins the deoxyribonucleotides together, completing the
synthesis of the lagging strand.
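The difference in primer requirements between the two strands follows directly from continuous versus discontinuous synthesis. A rough sketch, in which the template length and the Okazaki fragment length are illustrative parameters rather than values taken from the text:

```python
import math


def primers_needed(template_length: int, fragment_length: int):
    """Return (leading, lagging) primer counts for one replication fork.

    The leading strand is made continuously from a single primer; the lagging
    strand needs one primer per Okazaki fragment.
    """
    if template_length <= 0 or fragment_length <= 0:
        raise ValueError("lengths must be positive")
    leading = 1
    lagging = math.ceil(template_length / fragment_length)
    return leading, lagging


# Hypothetical numbers: 10,500 nt of template, fragments of 1,000 nt.
print(primers_needed(10_500, 1_000))  # (1, 11)
```

Every one of the lagging-strand primers must later be removed, the gap filled and the nick sealed by DNA ligase, which is why lagging-strand synthesis involves more enzymatic steps.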
Dynamics at the replication fork
The assembled human DNA clamp, a trimer of the protein PCNA.
As helicase unwinds DNA at the replication fork, the DNA ahead is
forced to rotate. This process results in a build-up of twists in the DNA
ahead.[8] This build-up would form a resistance that would eventually halt the
progress of the replication fork. DNA topoisomerases are enzymes that solve
these physical problems in the coiling of DNA. Topoisomerase I cuts a single
backbone on the DNA, enabling the strands to swivel around each other to
remove the build-up of twists. Topoisomerase II cuts both backbones, enabling
one double-stranded DNA to pass through another, thereby removing knots and
entanglements that can form within and between DNA molecules.
Bare single-stranded DNA has a tendency to fold back upon itself and
form secondary structures; these structures can interfere with the movement of
DNA polymerase. To prevent this, single-strand binding proteins bind to the
DNA until a second strand is synthesized, preventing secondary structure
formation.
Clamp proteins form a sliding clamp around DNA, helping the DNA
polymerase maintain contact with its template and thereby assisting with
processivity. The inner face of the clamp enables DNA to be threaded through
it. Once the polymerase reaches the end of the template or detects double
stranded DNA, the sliding clamp undergoes a conformational change which
releases the DNA polymerase. Clamp-loading proteins are used to initially load
the clamp, recognizing the junction between template and RNA primers.
Eukaryotic DNA replication
Although the mechanisms of DNA synthesis in eukaryotes and
prokaryotes are similar, DNA replication in eukaryotes is much more
complicated. Though DNA synthesis in prokaryotes such as E. coli is
regulated, DNA replication is initiated before the end of the cell cycle.
Eukaryotic cells can only initiate DNA replication at a specific point in the cell
cycle, the beginning of S phase.
DNA replication in eukaryotes occurs only in the S phase of the cell
cycle. However, pre-initiation occurs in the G1 phase. Due to the sheer size of
chromosomes in eukaryotes, eukaryotic chromosomes contain multiple origins
of replication. Some origins are well characterized, such as the autonomously
replicating sequences (ARS) of yeast while other eukaryotic origins,
particularly those in metazoa, can be found in spans of thousands of basepairs.
However, the assembly and initiation of replication is similar in both the
protozoa and metazoa.
The first step in DNA replication is the formation of the pre-initiation
replication complex (the pre-RC). The formation of this complex occurs in two
stages. The first stage requires that there is no CDK activity. This can only
occur in early G1. The formation of the pre-RC is known as licensing, but a
licensed pre-RC cannot initiate replication. Initiation of replication can only
occur during the S-phase. Thus, the separation of licensing and activation
ensures that the origin can only fire once per cell cycle.
DNA replication in eukaryotes is not very well characterized. However,
researchers believe that it begins with the binding of the origin recognition
complex (ORC) to the origin. This complex is a hexamer of related proteins
and remains bound to the origin, even after DNA replication occurs.
Furthermore, ORC is the functional analogue of DnaA. Following the binding
of ORC to the origin, Cdc6/Cdc18 and Cdt1 coordinate the loading of the
MCM (minichromosome maintenance functions) complex to the origin by first
binding to ORC and then binding to the MCM complex. The MCM complex is
thought to be the major DNA helicase in eukaryotic organisms, and is a
hexamer (mcm2-7). Once binding of MCM occurs, a fully licensed pre-RC
exists.
Activation of the complex occurs in S-phase and requires Cdk2-Cyclin
E and Ddk. The activation process begins with the addition of Mcm10 to the
pre-RC, which displaces Cdt1. Following this, Ddk phosphorylates Mcm3-7,
which activates the helicase. It is believed that ORC and Cdc6/18 are
phosphorylated by Cdk2-Cyclin E. Ddk and the Cdk complex then recruit
another protein called Cdc45, which then recruits all of the DNA replication
proteins to the replication fork. At this stage the origin fires and DNA synthesis
begins.
Activation of a new round of replication is prevented through the
actions of the cyclin dependent kinases and a protein known as geminin.
Geminin binds to Cdt1 and sequesters it. It is a periodic protein that first
appears in S-phase and is degraded in late M-phase, possibly through the action
of the anaphase promoting complex (APC). In addition, phosphorylation of
Cdc6/18 prevents it from binding to the ORC (thus inhibiting loading of the
MCM complex) while the role of ORC phosphorylation remains unclear. Cells in
the G0 stage of the cell cycle are prevented from initiating a round of
replication because the Mcm proteins are not expressed.
Numerous polymerases can replicate DNA in eukaryotic cells.
Currently, six families of polymerases (A, B, C, D, X, Y) have been
discovered. At least four different types of DNA polymerases are involved in
the replication of DNA in animal cells (POLA, POLG, POLD1 and POLE).
POLA functions by extending the primer in the 5' -> 3' direction and tightly
associates with primase. However, it lacks the ability to proofread DNA.
POLD1 has a proofreading ability and is able to replicate the entire length of a
template only when associated with PCNA. POLE is able to replicate the entire
length of a template in the absence of PCNA and is able to proofread DNA
while POLG replicates mitochondrial DNA via the D-Loop mechanism of
DNA replication. All primers are removed by RNaseH1 and Flap Endonuclease
I. The general mechanisms of DNA replication on the leading and lagging
strand, however, are the same as those found in prokaryotic cells.
Regulation of replication
The cell cycle of eukaryotic cells.
Eukaryotes
Within eukaryotes, DNA replication is controlled within the context of
the cell cycle. As the cell grows and divides, it progresses through stages in the
cell cycle; DNA replication occurs during the S phase (Synthesis phase). The
progress of the eukaryotic cell through the cycle is controlled by cell cycle
checkpoints. Progression through checkpoints is controlled through complex
interactions between various proteins, including cyclins and cyclin-dependent
kinases.[10]
The G1/S checkpoint (or restriction checkpoint) regulates whether
eukaryotic cells enter the process of DNA replication and subsequent division.
Cells which do not proceed through this checkpoint are quiescent in the "G0"
stage and do not replicate their DNA.
Replication of chloroplast and mitochondrial genomes occurs
independent of the cell cycle, through the process of D-loop replication.
Bacteria
Most bacteria do not go through a well-defined cell cycle and instead
continuously copy their DNA; during rapid growth this can result in multiple
rounds of replication occurring concurrently.[11] Within E. coli, the best-characterized
bacterium, regulation of DNA replication can be achieved through
several mechanisms, including: the hemimethylation and sequestering of the
origin sequence, the ratio of ATP to ADP, and the levels of protein DnaA.
These all control the process of initiator proteins binding to the origin
sequences.
Because E. coli methylates GATC DNA sequences, DNA synthesis
results in hemimethylated sequences, in which only the parental strand carries
the methyl group. This hemimethylated DNA is recognized
by a protein (SeqA) which binds and sequesters the origin sequence; in
addition, DnaA (required for initiation of replication) binds less well to
hemimethylated DNA. As a result, newly replicated origins are prevented from
immediately initiating another round of DNA replication.[12]
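The hemimethylation rule can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: methylation is tracked per strand at each GATC site, the newly made strand starts out unmethylated, and any hemimethylated site is taken to be enough to block initiation. The function names and the sequence are made up for the example.

```python
def find_gatc_sites(seq):
    """Return the 0-based start position of every GATC motif in seq."""
    sites = []
    start = seq.find("GATC")
    while start != -1:
        sites.append(start)
        start = seq.find("GATC", start + 1)
    return sites


def replicate(origin):
    """Semiconservative copy: each daughter keeps one parental strand.

    origin maps site -> (top_methylated, bottom_methylated).
    The newly synthesized strand is unmethylated at first.
    """
    daughter_a = {site: (top, False) for site, (top, _) in origin.items()}
    daughter_b = {site: (False, bottom) for site, (_, bottom) in origin.items()}
    return daughter_a, daughter_b


def is_hemimethylated(origin):
    """True if any site is methylated on exactly one strand."""
    return any(top != bottom for top, bottom in origin.values())


def can_initiate(origin):
    # Assumption of this sketch: one hemimethylated site is enough for
    # SeqA-style sequestration to block a new round of initiation.
    return not is_hemimethylated(origin)


def remethylate(origin):
    """Model the later methylation of the new strand at every site."""
    return {site: (True, True) for site in origin}


origin_seq = "TTGATCAAGGATCCTTGATCA"  # made-up sequence with three GATC sites
parent = {site: (True, True) for site in find_gatc_sites(origin_seq)}
daughter_a, daughter_b = replicate(parent)

print(can_initiate(parent))                   # fully methylated origin
print(can_initiate(daughter_a))               # just replicated, hemimethylated
print(can_initiate(remethylate(daughter_a)))  # methylation restored
```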
ATP builds up when the cell is in a rich medium, triggering DNA
replication once the cell has reached a specific size. ATP competes with ADP
to bind to DnaA, and the DnaA-ATP complex is able to initiate replication. A

certain number of DnaA proteins are also required for DNA replication — each
time the origin is copied the number of binding sites for DnaA doubles,
requiring the synthesis of more DnaA to enable another initiation of replication.
Termination of replication
Because bacteria have circular chromosomes, termination of replication
occurs when the two replication forks meet each other on the opposite end of
the parental chromosome. E. coli regulates this process through the use of
termination sequences which, when bound by the Tus protein, allow replication
forks to pass in only one direction. As a result, the replication forks
are constrained to always meet within the termination region of the
chromosome.[13]
Eukaryotes initiate DNA replication at multiple points in the
chromosome, so replication forks meet and terminate at many points in the
chromosome; these are not known to be regulated in any particular manner.
Because eukaryotes have linear chromosomes, DNA replication often fails to
synthesize to the very end of the chromosomes (telomeres), resulting in
telomere shortening. This is a normal process in somatic cells — cells are only
able to divide a certain number of times before the DNA loss prevents further
division. (This is known as the Hayflick limit.) Within the germ cell line, which
passes DNA to the next generation, the enzyme telomerase extends the
repetitive sequences of the telomere region to prevent degradation. Telomerase
can become mistakenly active in somatic cells, sometimes leading to cancer
formation.
Rolling circle replication
Another method of copying DNA, sometimes used in vivo by bacteria
and viruses, is the process of rolling circle replication. In this form of
replication, a single replication fork progresses around a circular molecule to
form multiple linear copies of the DNA sequence. In cells, this process can be
used to rapidly synthesize multiple copies of plasmids or viral genomes.
In the cell, rolling circle replication is initiated by an initiator protein
encoded by the plasmid or virus DNA. This protein is able to nick one strand of
the double-stranded, circular DNA molecule at a site called the double-strand
origin (DSO) and remains bound to the 5' phosphate end of the nicked strand.
The free 3' hydroxyl end is released and can serve as a primer for DNA
synthesis. Using the unnicked strand as a template, replication proceeds around
the circular DNA molecule, displacing the nicked strand as single-stranded
DNA. Continued DNA synthesis produces multiple single-stranded linear
copies of the original DNA in a continuous head-to-tail series. In vivo these
linear copies are subsequently converted to double-stranded circular molecules.

Rolling circle replication can also be performed in vitro and has found
wide uses in academic research and biotechnology, often used for amplification
of DNA from very small amounts of starting material. Replication can be
initiated by nicking a double-stranded circular DNA molecule or by
hybridizing a primer to a single-stranded circle of DNA. The use of a reverse
primer (or random primers) produces hyperbranched rolling circle
amplification, resulting in exponential rather than linear growth of the DNA
molecule.
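The head-to-tail product of rolling circle replication can be illustrated with a short Python sketch. It assumes the circle is represented by the sequence of the strand being copied out, and it ignores the later conversion to double-stranded circles; the function names are hypothetical.

```python
def rolling_circle(circle, n_nucleotides, origin=0):
    """Copy n_nucleotides while walking repeatedly around a circular template.

    circle is the sequence of one full turn; origin is the index of the nick.
    The modulo operation is what makes the template circular.
    """
    size = len(circle)
    return "".join(circle[(origin + i) % size] for i in range(n_nucleotides))


def split_into_units(concatemer, unit_length):
    """Cut a head-to-tail concatemer into genome-length pieces."""
    return [concatemer[i:i + unit_length]
            for i in range(0, len(concatemer), unit_length)]


plasmid = "ATGCCGTA"  # made-up 8-nucleotide circle
product = rolling_circle(plasmid, 3 * len(plasmid))
print(product)
print(split_into_units(product, len(plasmid)))
```

Three trips around the circle give three identical linear copies joined end to end, which is the linear (non-hyperbranched) case described above.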
Briefly - DNA Replication
Before a cell can divide, it must duplicate all its DNA. In eukaryotes,
this occurs during S phase of the cell cycle.
The Biochemical Reactions
• DNA replication begins with the "unzipping" of the parent molecule as the
hydrogen bonds between the base pairs are broken.
• Once exposed, the sequence of bases on each of the separated strands serves
as a template to guide the insertion of a complementary set of bases on the
strand being synthesized.
• The new strands are assembled from deoxynucleoside triphosphates.
• Each incoming nucleotide is covalently linked to the "free" 3' carbon
atom on the pentose (figure) as the second and third phosphates are removed
together as a molecule of pyrophosphate (PPi).
• The nucleotides are assembled in the order that complements the order
of bases on the strand serving as the template.
• Thus each C on the template guides the insertion of a G on the new
strand, each G a C, and so on.
• When the process is complete, two DNA molecules have been formed
identical to each other and to the parent molecule.
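The template rule in the list above can be written out as a small Python sketch. The representation is an assumption made for illustration: a duplex is a pair of strings, the top strand written 5' to 3' and the bottom strand written 3' to 5' directly beneath it, so that pairing can be checked position by position.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand):
    """Return the base-by-base partner of a strand (same left-to-right order)."""
    return "".join(PAIR[base] for base in strand)


def replicate(duplex):
    """Separate the two strands and build a new partner on each one."""
    top, bottom = duplex
    daughter_1 = (top, complement(top))        # old top, new bottom
    daughter_2 = (complement(bottom), bottom)  # new top, old bottom
    return daughter_1, daughter_2


parent = ("ATGCCGTA", complement("ATGCCGTA"))
daughter_1, daughter_2 = replicate(parent)
print(parent)
print(daughter_1 == parent and daughter_2 == parent)
```

Both daughters come out identical to the parent, as the last point of the list states.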

The Enzymes
• A portion of the double helix is unwound by a helicase.
• A molecule of a DNA polymerase binds to one strand of the DNA and
begins moving along it in the 3' to 5' direction, using it as a template for
assembling a leading strand of nucleotides and reforming a double helix. In
eukaryotes, this role is generally attributed to DNA polymerase epsilon (ε).
• Because DNA synthesis can only occur 5' to 3', a molecule of a second
type of DNA polymerase (generally thought to be delta, δ, in eukaryotes) binds
to the other template strand as the double helix opens. This molecule must
synthesize discontinuous segments of polynucleotides (called Okazaki
fragments). Another enzyme, DNA ligase I, then stitches these together
into the lagging strand.
DNA Replication is Semiconservative
When the replication process is complete, two DNA molecules —
identical to each other and identical to the original — have been produced.
Each strand of the original molecule has remained intact as it served as the
template for the synthesis of a complementary strand.
This mode of replication is described as semi-conservative: one-half of
each new molecule of DNA is old; one-half new.
Watson and Crick had suggested that this was the way the DNA would
turn out to be replicated. Proof of the model came from the experiments of
Meselson and Stahl.
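The semi-conservative prediction can be followed over several generations with a short Python sketch. The labels are an assumption borrowed from the classic density-labelling design, which the text does not spell out: "H" stands for an old strand made in heavy (15N) medium and "L" for a strand made after transfer to light (14N) medium.

```python
from collections import Counter


def next_generation(duplexes):
    """Each duplex separates; every old strand gets a newly made light partner."""
    daughters = []
    for strand_a, strand_b in duplexes:
        daughters.append((strand_a, "L"))
        daughters.append((strand_b, "L"))
    return daughters


def density_classes(duplexes):
    """Count duplexes by the number of heavy strands they contain."""
    names = {2: "heavy", 1: "hybrid", 0: "light"}
    return Counter(names[pair.count("H")] for pair in duplexes)


population = [("H", "H")]  # one fully heavy duplex at generation 0
for generation in range(4):
    print(generation, dict(density_classes(population)))
    population = next_generation(population)
```

After one generation all DNA is hybrid, and in later generations the two hybrid molecules persist while light DNA accumulates. A fully conservative scheme would instead keep one heavy duplex and never produce hybrids.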
Speed of Replication
Bacteria
The single molecule of DNA that is the E. coli genome contains 4.7 x 10^6
nucleotide pairs. DNA replication begins at a single, fixed location in this
molecule, the replication origin, proceeds in both directions at about 1000
nucleotides per second per fork, and thus is done in no more than
40 minutes. And thanks to the precision of the process (which includes a
"proof-reading" function), the job is done with only about one incorrect
nucleotide for every 10^9 nucleotides inserted. In other words, more often than
not, the E. coli genome (4.7 x 10^6) is copied without error!
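The arithmetic behind these figures can be checked with a few lines of Python. The genome size, fork speed and error rate are the values quoted above; the use of two forks moving in opposite directions from the single origin is an assumption of this sketch, and it is what brings the time under 40 minutes.

```python
GENOME_BP = 4.7e6    # nucleotide pairs in the E. coli genome (from the text)
FORK_RATE = 1000     # nucleotides per second per fork (from the text)
FORKS = 2            # assumption: bidirectional replication from one origin
ERROR_RATE = 1e-9    # about one wrong nucleotide per 10^9 inserted


def replication_minutes(genome_bp, fork_rate, forks):
    """Time to copy the genome when the forks share the work equally."""
    return genome_bp / (fork_rate * forks) / 60


def expected_errors(genome_bp, error_rate):
    """Average number of wrong nucleotides per copied genome."""
    return genome_bp * error_rate


def probability_error_free(genome_bp, error_rate):
    """Chance that every single insertion is correct."""
    return (1 - error_rate) ** genome_bp


print(round(replication_minutes(GENOME_BP, FORK_RATE, FORKS), 1), "minutes")
print(expected_errors(GENOME_BP, ERROR_RATE), "errors per genome on average")
print(round(probability_error_free(GENOME_BP, ERROR_RATE), 4), "chance of a perfect copy")
```

With a single fork the same calculation gives about 78 minutes, so the 40-minute figure only works if both forks are counted.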
Eukaryotes
The average human chromosome contains 150 x 10^6 nucleotide pairs
which are copied at about 50 base pairs per second. The process would take a
month (rather than the hour it actually does) but for the fact that there are many
places on the eukaryotic chromosome where replication can begin. Replication
begins at some replication origins earlier in S phase than at others, but the
process is completed for all by the end of S phase. As replication nears
completion, "bubbles" of newly replicated DNA meet and fuse, finally forming
two new molecules.
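A rough calculation shows why many origins are needed. The chromosome size and copying rate are the values given above; the assumptions of this sketch are that every origin fires at the same moment, launches two forks, and that the forks share the work evenly, which is a simplification since origins actually fire at different times in S phase.

```python
import math

CHROMOSOME_BP = 150e6  # nucleotide pairs in an average human chromosome
FORK_RATE = 50         # base pairs per second per fork


def days_with_single_fork(chromosome_bp, fork_rate):
    """Copying time if one fork had to traverse the whole chromosome."""
    return chromosome_bp / fork_rate / 86400  # 86400 seconds per day


def origins_needed(chromosome_bp, fork_rate, hours, forks_per_origin=2):
    """Smallest number of simultaneous origins that finishes within `hours`."""
    forks = chromosome_bp / (fork_rate * hours * 3600)
    return math.ceil(forks / forks_per_origin)


print(round(days_with_single_fork(CHROMOSOME_BP, FORK_RATE), 1), "days with one fork")
print(origins_needed(CHROMOSOME_BP, FORK_RATE, hours=1), "origins to finish in one hour")
```

One fork would need about five weeks, consistent with the "month" quoted above, whereas several hundred origins bring the time down to about an hour.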
Control of Replication
With their multiple origins, how does the eukaryotic cell know which
origins have been already replicated and which still await replication?
An observation: When a cell in G2 of the cell cycle is fused with a cell in S
phase, the DNA of the G2 nucleus does not begin replicating again even though
replication is proceeding normally in the S-phase nucleus. Not until mitosis is
completed can freshly-synthesized DNA be replicated again.
Two control mechanisms have been identified, one positive and one negative.
This redundancy probably reflects the crucial importance of precise replication
to the integrity of the genome.
Licensing: positive control of replication
In order to be replicated, each origin of replication must be bound by:
• an Origin Recognition Complex of proteins (ORC). These remain on
the DNA throughout the process.
• Accessory proteins called licensing factors. These accumulate in the
nucleus during G1 of the cell cycle. They include:
   o Cdc6 and Cdt1, which bind to the ORC and are essential for
     coating the DNA with
   o MCM proteins. Only DNA coated with MCM proteins (there
     are 6 of them) can be replicated.
Once replication begins in S phase,
• Cdc6 and Cdt1 leave the ORCs (the latter by ubiquitination and
destruction in proteasomes).
• The MCM proteins leave in front of the advancing replication fork.
Geminin: negative control of replication
G2 nuclei also contain at least one protein — called geminin — that prevents
assembly of MCM proteins on freshly-synthesized DNA (probably by blocking
the actions of Cdt1).
As the cell completes mitosis, geminin is degraded so the DNA of the two
daughter cells will be able to respond to licensing factors and be able to
replicate their DNA at the next S phase.
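The logic of licensing and geminin can be summarised as a tiny state machine in Python. This is a deliberately crude sketch with hypothetical names: a single flag stands for "MCM proteins loaded", G1 is the only phase in which loading is allowed, and the presence of geminin in G2 is modelled simply by doing nothing.

```python
class Origin:
    """One replication origin; ORC is assumed to stay bound throughout."""

    def __init__(self):
        self.mcm_loaded = False
        self.times_replicated = 0


def step(origin, phase):
    """Advance one origin through one phase of the cell cycle."""
    if phase == "G1":
        # Licensing factors (Cdc6, Cdt1) load MCM proteins onto the origin.
        origin.mcm_loaded = True
    elif phase == "S" and origin.mcm_loaded:
        # A licensed origin fires once; the MCM proteins leave with the fork.
        origin.times_replicated += 1
        origin.mcm_loaded = False
    # "G2" and "M": geminin blocks reloading of MCM, so nothing changes.


def run(phases):
    origin = Origin()
    for phase in phases:
        step(origin, phase)
    return origin.times_replicated


print(run(["G1", "S", "G2", "S"]))       # G2 DNA exposed to S-phase conditions again
print(run(["G1", "S", "G2", "M", "G1", "S"]))  # two complete cycles
```

The first run mimics the cell-fusion observation: DNA that has already been copied is not copied again until it has passed through mitosis and a new G1.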

UNIT – II
STRUCTURE
DNA Repair Causes and mechanisms
photo-reactivation, excision repair, mismatch repair, SOS repair.
Recombination in prokaryotes
Transformation, Conjugation and Transduction.
DNA Damage and Repair
DNA damage is a major topic of research within cancer biology.
Damage not only causes cancer, it is used as a means to cure certain cancers
through radiotherapy or chemotherapy and is also responsible for the side
effects of these treatments.
Over 74,000 damage events occur in DNA per cell per day, mostly by oxidation,
hydrolysis, alkylation, radiation or toxic chemicals that can either directly
damage one of the 3 billion bases contained in DNA or create breaks in the
phosphodiester backbone that the bases sit on. The result can be mutations in
genes, which are transferred to the gene product (protein). If these mutations
are in genes that normally control cell proliferation or suppress tumour
growth, the cells may start to grow uncontrollably. Cells have therefore
developed mechanisms to repair DNA damage, but when they stop working
efficiently, the number of mutations in our genome increases and cancer can
develop.
Oxidising agents can convert guanine to 8-hydroxydeoxyguanosine
(8oxoG), which introduces mutations because the oxidised G can now pair with A
instead of C, so after two rounds of replication the GC pairing in the sequence
is replaced by TA (a point mutation). It is thought that 8oxoG lesions can lead
to telomere damage by preventing the binding of proteins which normally help to
maintain telomere structure. Telomeres are complexes of DNA and proteins which
cap the ends of chromosomes and prevent them being mistaken for double-stranded
breaks by the cell’s repair mechanisms. If they do not function properly,
telomeric DNA can be lost or the ends of telomeres can fuse together,
resulting in genomic changes or apoptosis.
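How an unrepaired 8oxoG becomes a fixed point mutation can be traced with a short Python sketch. Two assumptions are made for illustration: the lesion is written as a lower-case "o", and it is taken to pair with A every time it is copied (in reality it can also pair correctly with C). Strands are written aligned base by base, ignoring 5'/3' direction.

```python
# Normal pairing rules plus one extra entry for the lesion.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G",
        "o": "A"}  # assumption: 8oxoG ("o") always directs insertion of A


def copy_strand(template):
    """Build the partner strand that a polymerase would make on this template."""
    return "".join(PAIR[base] for base in template)


original = "ATGCA"   # made-up sequence; the G at index 2 will be oxidised
damaged = "AToCA"    # same strand after oxidation of that G

round_1 = copy_strand(damaged)   # new strand made opposite the lesion
round_2 = copy_strand(round_1)   # that new strand is copied in the next round

changes = [(i, a, b) for i, (a, b) in enumerate(zip(original, round_2)) if a != b]
print(round_1)
print(round_2)
print(changes)
```

After the second round the position that held G now holds T, paired with A, and neither strand carries any sign of damage for a repair enzyme to recognise.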
8oxoG lesions are repaired by Base Excision Repair (BER), which removes a
single damaged or mutated base (caused by free radicals, deamination of
cytosine to uracil or depurination of DNA) and replaces it with the correct
one. It involves the sequential action of four enzymes which remove the damaged
base (DNA glycosylase), cut 5’ (endonuclease) and then 3’ (DNA polymerase β) to
the abasic site, removing that region of the phosphate backbone, fill in the
gap with the correct nucleotide (DNA polymerase β) and seal the new nucleotide
into the DNA (DNA ligase).
The UV component of sunlight can cause bulky lesions in DNA as a result of
dimers forming between adjacent thymine bases and causing distortions in the
DNA. The two main types are cyclobutane pyrimidine dimers (CPD) and (6-4)
photoproducts, both of which are removed by Nucleotide excision repair (NER).
Over 20 proteins in the cell are involved in recognising and removing damaged
regions of DNA (up to 30 bases), leaving the undamaged single strand opposite,
which is then used as a template to fill in the gap.
This is a complex process and if humans have mutations in the genes
that code for one or more of the proteins that contribute to NER, it can result in
a condition known as xeroderma pigmentosum (XP). Individuals with this
condition are very sensitive to sunlight and have a greater chance of developing
skin cancer as a result of defective NER mechanisms. XPG is one of the
proteins that plays a role in NER and has multiple functions within the cell, one
of which is as a structure specific nuclease that makes a cut 3’ to the damaged
site during repair.
If the protein is completely absent in mice they are born with severe
defects and die very early, but if a point mutation is introduced into the
protein that only affects the nuclease activity, the animal develops as normal
but is extremely sensitive to sunlight. This shows that the protein’s nuclease
activity is only required for NER and therefore is very important in the repair
of UV induced lesions in DNA.
During DNA replication the wrong bases can sometimes be
incorporated into the new strand, and this is corrected by mismatch repair. A
multiprotein complex recognises the mismatched base on the newly synthesised
strand and degrades a small region of this strand. DNA polymerase and ligase
then fill in the gap using the other strand as a template. The multiprotein
complex is made up of four proteins, two of which (MSH2 and MLH1) cause a
form of cancer called Hereditary NonPolyposis Colon Cancer (HNPCC) if they
are defective.
Throughout our genome there are regions of repetitive DNA sequences
called microsatellites. These vary in length in the population but are generally
consistent in a single individual. Mutations can occur within the repeated units
of these sequences as part of normal DNA replication as DNA polymerase can
‘slip’ on the many repeats incorporating errors such as deletions or additions of
repeat units. If they are not corrected by mismatch repair, the sequence can
become ‘unstable’ as mutations are passed on in subsequent replications. This
is termed ‘microsatellite instability’ (MSI) and has been found in various
human cancers, for example HNPCC. Small changes of less than 6 bases in a
microsatellite are classed as ‘Type A’ MSI which is the type frequently found
in human tumours and has been directly linked to defective mismatch repair in
cells.
Double strand breaks (DSBs) can occur when DNA is distorted by bulky lesions
caused by ionising radiation or carcinogens that add large groups onto bases.
DSBs are difficult for the cell to repair accurately but can be fixed by
recombination with the second undamaged copy of the chromosome (involving a
group of proteins called RAD) or by non-homologous end joining of the broken
strands, which causes deletions or insertions of bases surrounding the breaks.
Homologous recombination (HR) is the most accurate mechanism for mending DSBs
and relies on matching (homologous) sequences elsewhere in the genome (e.g. on
sister chromatids or homologous chromosomes). If DSBs are not repaired
efficiently they can lead to genomic rearrangements causing the activation of
proto-oncogenes or inactivation of tumour suppressor genes and the start of
cancer.
ATM kinase acts at the first step in a signal transduction pathway that
halts cell cycle progression after DNA damage is induced. ATM is a large
protein made up of two identical subunits and has numerous substrates which it
phosphorylates. Although it is not required for essential cellular processes
(like normal cell cycle progression), it assists the cell in dealing with
stresses that affect DNA or chromatin structure. When a DSB is induced, ATM
separates into two active monomers which move to the site of the DNA damage to
act on various substrates and cause the cell to arrest. Cells with a
functioning damage response system generally arrest and repair or die in
response to DNA damage. Mutations in these pathways can allow cells with
genomic irregularities to continue to grow, and so increase the chance of
cancer.
The tumour suppressor protein p53 has recently been found to have a
new role in controlling DNA repair. When p53 is absent or inactivated in cells
there is reduced repair of UV induced lesions and increased spontaneous and
stress induced HR. When present it can stimulate BER and may cooperate with
mismatch repair in response to damage, but this relationship is not definite as it
varies in different tissues.
Damage Reversal or Photoreactivation
Exposure of a cell to ultraviolet light can result in the covalent joining of
two adjacent pyrimidines, producing a dimer. Although cytosine-cytosine and
cytosine-thymine dimers are also formed, the principal products of UV
irradiation are thymine-thymine dimers.
These thymine dimers prevent DNA polymerase from replicating the
DNA strand beyond the site of dimer formation. In E. coli, an enzyme called
DNA photolyase (deoxyribodipyrimidine photolyase or photoreactivating
enzyme) detects and binds to the damaged DNA site.

Then the enzyme absorbs energy from visible light, which activates it
so it can break the bonds holding the pyrimidine dimer together. The enzyme
then falls free of the DNA. This enzyme thus reverses the UV induced
dimerization. Photo reactivation or photo-restoration is a light dependent DNA
repair mechanism in which certain types of pyrimidine dimers are cleaved.
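The two ideas in this passage, that dimers form only between neighbouring pyrimidines on the same strand and that photolyase needs light to undo them, can be sketched in Python. The conventions are assumptions for illustration only: a dimer is marked by writing its two bases in lower case, and "repair" simply restores upper case without cutting the strand or consulting the other strand.

```python
PYRIMIDINES = {"C", "T"}


def potential_dimer_sites(strand):
    """List (index, pair) for every two adjacent pyrimidines on the strand."""
    return [(i, strand[i:i + 2])
            for i in range(len(strand) - 1)
            if strand[i] in PYRIMIDINES and strand[i + 1] in PYRIMIDINES]


def form_dimer(strand, i):
    """Mark bases i and i+1 as a UV-induced dimer (lower case)."""
    if (i, strand[i:i + 2]) not in potential_dimer_sites(strand):
        raise ValueError("a dimer needs two adjacent pyrimidines")
    return strand[:i] + strand[i:i + 2].lower() + strand[i + 2:]


def photoreactivate(strand, visible_light):
    """Photolyase splits dimers only when light energy is available."""
    return strand.upper() if visible_light else strand


strand = "GATTCAGCCTA"  # made-up sequence
print(potential_dimer_sites(strand))
damaged = form_dimer(strand, 2)
print(damaged)
print(photoreactivate(damaged, visible_light=False))
print(photoreactivate(damaged, visible_light=True))
```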
DNA repair

DNA damage resulting in multiple broken chromosomes


DNA repair refers to a collection of processes by which a cell
identifies and corrects damage to the DNA molecules that encode its genome.
In human cells, both normal metabolic activities and environmental factors
such as UV light and radiation can cause DNA damage, resulting in as many
as 1 million individual molecular lesions per cell per day.[1] Many of these
lesions cause structural damage to the DNA molecule and can alter or eliminate
the cell's ability to transcribe the gene that the affected DNA encodes. Other
lesions induce potentially harmful mutations in the cell's genome, which affect
the survival of its daughter cells after it undergoes mitosis. Consequently, the
DNA repair process is constantly active as it responds to damage in the DNA
structure.
The rate of DNA repair is dependent on many factors, including the cell
type, the age of the cell, and the extracellular environment. A cell that has
accumulated a large amount of DNA damage, or one that no longer effectively
repairs damage incurred to its DNA, can enter one of three possible states:
1. an irreversible state of dormancy, known as senescence
2. cell suicide, also known as apoptosis or programmed cell death

3. unregulated cell division, which can lead to the formation of a tumor that
is cancerous
The DNA repair ability of a cell is vital to the integrity of its genome
and thus to its normal functioning and that of the organism. Many genes that
were initially shown to influence lifespan have turned out to be involved in
DNA damage repair and protection.[2] Failure to correct molecular lesions in
cells that form gametes can introduce mutations into the genomes of the
offspring and thus influence the rate of evolution.
DNA damage
DNA damage, due to environmental factors and normal metabolic
processes inside the cell, occurs at a rate of 1,000 to 1,000,000 molecular
lesions per cell per day.[1] While even the upper figure constitutes only about
0.017% of the human genome's approximately 6 billion bases (3 billion base
pairs), unrepaired lesions in critical genes (such as tumor suppressor genes)
can impede a cell's ability to carry out its function and appreciably increase
the likelihood of tumor formation.
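The scale of these numbers can be checked directly. The short Python calculation below uses the lesion range and the genome size of 6 billion bases quoted above; the function name is hypothetical.

```python
GENOME_BASES = 6e9  # about 6 billion bases (3 billion base pairs)


def percent_of_genome(lesions, genome_bases=GENOME_BASES):
    """Share of all bases hit per day, expressed as a percentage."""
    return 100 * lesions / genome_bases


for lesions in (1_000, 1_000_000):
    print(f"{lesions:>9,} lesions/day = {percent_of_genome(lesions):.6f}% of the genome")
```

Even at the upper end of the range, fewer than two bases in every ten thousand are affected in a day, which is why the location of a lesion matters more than the raw count.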
The vast majority of DNA damage affects the primary structure of the
double helix; that is, the bases themselves are chemically modified. These
modifications can in turn disrupt the molecules' regular helical structure by
introducing non-native chemical bonds or bulky adducts that do not fit in the
standard double helix. Unlike proteins and RNA, DNA usually lacks tertiary
structure and therefore damage or disturbance does not occur at that level.
DNA is, however, supercoiled and wound around "packaging" proteins called
histones (in eukaryotes), and both superstructures are vulnerable to the effects
of DNA damage.
Sources of damage
DNA damage can be subdivided into two main types:
1. endogenous damage such as attack by reactive oxygen species produced
from normal metabolic byproducts (spontaneous mutation), especially the
process of oxidative deamination;
2. exogenous damage caused by external agents such as
   1. ultraviolet [UV 200-300 nm] radiation from the sun
   2. other radiation frequencies, including x-rays and gamma rays
   3. hydrolysis or thermal disruption
   4. certain plant toxins
   5. human-made mutagenic chemicals, especially aromatic compounds that
      act as DNA intercalating agents
   6. cancer chemotherapy and radiotherapy
The replication of damaged DNA before cell division can lead to the
incorporation of wrong bases opposite damaged ones. Daughter cells that
inherit these wrong bases carry mutations from which the original DNA
sequence is unrecoverable (except in the rare case of a back mutation, for
example, through gene conversion).
Types of damage
There are four main types of damage to DNA due to endogenous cellular
processes:
1. oxidation of bases [e.g. 8-oxo-7,8-dihydroguanine (8-oxoG)] and
generation of DNA strand interruptions from reactive oxygen species,
2. alkylation of bases (usually methylation), such as formation of
7-methylguanine, 1-methyladenine and O6-methylguanine
3. hydrolysis of bases, such as deamination, depurination and
depyrimidination.
4. mismatch of bases, due to errors in DNA replication, in which the wrong
DNA base is stitched into place in a newly forming DNA strand, or a
DNA base is skipped over or mistakenly inserted.
Damage caused by exogenous agents comes in many forms. Some examples
are:
1. UV-B light causes crosslinking between adjacent cytosine and thymine
bases creating pyrimidine dimers. This is called direct DNA damage.
2. UV-A light creates mostly free radicals - especially if sunscreen penetrated
into the skin. The damage caused by free radicals is called indirect DNA
damage.
3. Ionizing radiation such as that created by radioactive decay or in cosmic
rays causes breaks in DNA strands.
4. Thermal disruption at elevated temperature increases the rate of
depurination (loss of purine bases from the DNA backbone) and single
strand breaks. For example, hydrolytic depurination is seen in
thermophilic bacteria, which grow in hot springs at temperatures of 85 °C and
above. The rate of depurination (300 purine residues per genome per generation)
is too high in these species to be repaired by normal repair machinery, hence
the possibility of an adaptive response cannot be ruled out.
5. Industrial chemicals such as vinyl chloride and hydrogen peroxide, and
environmental chemicals such as polycyclic hydrocarbons found in smoke,
soot and tar create a huge diversity of DNA adducts: ethenobases, oxidized
bases, alkylated phosphotriesters and DNA crosslinks, to name just a few.
UV damage, alkylation/methylation, X-ray damage and oxidative
damage are examples of induced damage. Spontaneous damage can include the
loss of a base, deamination, sugar ring puckering and tautomeric shift.
Nuclear versus mitochondrial DNA damage
In human cells, and eukaryotic cells in general, DNA is found in two
cellular locations - inside the nucleus and inside the mitochondria. Nuclear
DNA (nDNA) exists as chromatin during non-replicative stages of the cell
cycle and is condensed into aggregate structures known as chromosomes
during cell division. In either state the DNA is highly compacted and wound up
around bead-like proteins called histones. Whenever a cell needs to express the
genetic information encoded in its nDNA the required chromosomal region is
unravelled, genes located therein are expressed, and then the region is
condensed back to its resting conformation. Mitochondrial DNA (mtDNA) is
located inside the mitochondria, exists in multiple copies, and is also
tightly associated with a number of proteins to form a complex known as the
nucleoid. Inside mitochondria, reactive oxygen species (ROS), or free radicals,
byproducts of the constant production of adenosine triphosphate (ATP) via
oxidative phosphorylation, create a highly oxidative environment that is known
to damage mtDNA. A critical enzyme in counteracting the toxicity of these
species is superoxide dismutase, which is present in both the mitochondria and
cytoplasm of eukaryotic cells.
Senescence and apoptosis
Senescence, an irreversible state in which the cell no longer divides
(mitosis), is a protective response to the shortening of the chromosome ends
(telomeres). The telomeres are long regions of repetitive noncoding DNA that
cap chromosomes and undergo partial degradation each time a cell undergoes
division. In contrast, quiescence is a reversible state of cellular dormancy
that is unrelated to genome damage. Senescence in cells may serve as a
functional alternative to apoptosis in cases where the organism requires the
physical presence of a cell for spatial reasons; it serves as a "last resort"
mechanism to prevent a cell with damaged DNA from replicating inappropriately in the
absence of pro-growth cellular signaling. Unregulated cell division can lead to
the formation of a tumor, which is potentially lethal to an organism. Therefore
the induction of senescence and apoptosis is considered to be part of a strategy
of protection against cancer.

DNA damage and mutation
It is important to distinguish between DNA damage and mutation, the
two major types of error in DNA. DNA damages and mutation are
fundamentally different. Damages are physical abnormalities in the DNA, such
as single and double strand breaks, 8-hydroxydeoxyguanosine residues and
polycyclic aromatic hydrocarbon adducts. DNA damages can be recognized by
enzymes, and thus they can be correctly repaired if redundant information, such
as the undamaged sequence in the complementary DNA strand or in a
homologous chromosome, is available for copying. If a cell retains DNA
damage, transcription of a gene can be prevented and thus translation into a
protein will also be blocked. Replication may also be blocked and/or the cell
may die.
In contrast to DNA damage, a mutation is a change in the base sequence
of the DNA. A mutation cannot be recognized by enzymes once the base
change is present in both DNA strands, and thus a mutation cannot be repaired.
At the cellular level, mutations can cause alterations in protein function and
regulation. Mutations are replicated when the cell replicates. In a population of
cells, mutant cells will increase or decrease in frequency according to the
effects of the mutation on the ability of the cell to survive and reproduce.
Although distinctly different from each other, DNA damages and mutations are
related because DNA damages often cause errors of DNA synthesis during
replication or repair and these errors are a major source of mutation.
Given these properties of DNA damage and mutation, it can be seen
that DNA damages are a special problem in non-dividing or slowly dividing
cells, where unrepaired damages will tend to accumulate over time. On the
other hand, in rapidly dividing cells, unrepaired DNA damages that do not kill
the cell by blocking replication will tend to cause replication errors and thus
mutation. The great majority of mutations that are not neutral in their effect are
deleterious to a cell’s survival. Thus, in a population of cells comprising a
tissue with replicating cells, mutant cells will tend to be lost. However,
infrequent mutations that provide a survival advantage will tend to clonally
expand at the expense of neighboring cells in the tissue. This advantage to the
cell is disadvantageous to the whole organism, because such mutant cells can
give rise to cancer. Thus DNA damages in frequently dividing cells, because
they give rise to mutations, are a prominent cause of cancer. In contrast, DNA
damages in infrequently dividing cells are likely a prominent cause of aging.

DNA repair mechanisms

Single strand and double strand DNA damage


Cells cannot function if DNA damage corrupts the integrity and
accessibility of essential information in the genome (but cells remain
superficially functional when so-called "non-essential" genes are missing or
damaged). Depending on the type of damage inflicted on the DNA's double
helical structure, a variety of repair strategies have evolved to restore lost
information. If possible, cells use the unmodified complementary strand of the
DNA or the sister chromatid as a template to losslessly recover the original
information. Without access to a template, cells use an error-prone recovery
mechanism known as translesion synthesis as a last resort.
Damage to DNA alters the spatial configuration of the helix and such
alterations can be detected by the cell. Once damage is localized, specific DNA
repair molecules bind at or near the site of damage, inducing other molecules to
bind and form a complex that enables the actual repair to take place. The types
of molecules involved and the mechanism of repair that is mobilized depend on
the type of damage that has occurred and the phase of the cell cycle that the cell
is in.

Direct reversal
Cells are known to eliminate three types of damage to their DNA by
chemically reversing it. These mechanisms do not require a template, since the
types of damage they counteract can only occur in one of the four bases. Such
direct reversal mechanisms are specific to the type of damage incurred and do
not involve breakage of the phosphodiester backbone. The formation of
thymine dimers (a common type of cyclobutyl dimer) upon irradiation with UV
light results in an abnormal covalent bond between adjacent thymine bases.
The photoreactivation process directly reverses this damage by the action of the
enzyme photolyase, whose activation is obligately dependent on energy
absorbed from blue/UV light (300–500 nm wavelength) to promote catalysis.
Another type of damage, methylation of guanine bases, is directly reversed by
the protein methyl guanine methyl transferase (MGMT), the bacterial
equivalent of which is called ogt. This is an expensive process because each
MGMT molecule can only be used once; that is, the reaction is stoichiometric
rather than catalytic. A generalized response to methylating agents in bacteria
is known as the adaptive response and confers a level of resistance to alkylating
agents upon sustained exposure by upregulation of alkylation repair enzymes.[8]
The third type of DNA damage reversed by cells is certain methylation of the
bases cytosine and adenine.
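As a rough illustration of where thymine dimers can arise, the short Python
sketch below lists the positions in one strand where two thymines sit side by
side. The function name and the example sequence are hypothetical; the sketch
only locates candidate sites and says nothing about whether a dimer actually
forms there.

```python
def adjacent_thymine_sites(strand: str) -> list[int]:
    """Return 0-based positions where a T is immediately followed by another T."""
    strand = strand.upper()
    return [
        i
        for i in range(len(strand) - 1)
        if strand[i] == "T" and strand[i + 1] == "T"
    ]


# Hypothetical sequence used only for illustration.
example = "ATTGCTTTA"
print(adjacent_thymine_sites(example))
```

A run of three thymines is reported as two overlapping sites, since either
neighbouring pair could in principle be linked.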
Single strand damage

Figure: Structure of the base-excision repair enzyme uracil-DNA glycosylase.
The uracil residue is shown in yellow.
When only one of the two strands of a double helix has a defect, the
other strand can be used as a template to guide the correction of the damaged
strand. In order to repair damage to one of the two paired molecules of DNA,
there exist a number of excision repair mechanisms that remove the damaged
nucleotide and replace it with an undamaged nucleotide complementary to that
found in the undamaged DNA strand.
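The use of the undamaged strand as a template can be sketched in a few lines
of Python. This is a simplified model under stated assumptions: the two
strands are written so that position i of one pairs with position i of the
other, and a damaged nucleotide is marked with the placeholder letter X. The
function name, the placeholder and the sequences are all hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def repair_from_template(damaged: str, template: str, lesion: str = "X") -> str:
    """Replace each lesion in `damaged` with the base that pairs with `template`."""
    if len(damaged) != len(template):
        raise ValueError("strands must be the same length")
    return "".join(
        COMPLEMENT[t] if d == lesion else d
        for d, t in zip(damaged, template)
    )


# The third base of the damaged strand is lost; the template carries G there,
# so the repaired strand receives C at that position.
print(repair_from_template("ATXCG", "TAGGC"))
```

The sketch shows why no information is lost when only one strand is damaged:
every missing base is fully determined by its partner on the other strand.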

1. Base excision repair (BER), which repairs damage to a single nucleotide
caused by oxidation, alkylation, hydrolysis, or deamination. The damaged base is
removed by a DNA glycosylase, and the resulting gap is filled by repair
synthesis and sealed by DNA ligase.
2. Nucleotide excision repair (NER), which repairs damage affecting longer
strands of 2–30 bases. This process recognizes bulky, helix-distorting changes
such as thymine dimers as well as single-strand breaks (repaired with enzymes
such as UvrABC endonuclease). A specialized form of NER known as
Transcription-Coupled Repair (TCR) deploys high-priority NER repair
enzymes to genes that are being actively transcribed.
3. Mismatch repair (MMR), which corrects errors of DNA replication and
recombination that result in mispaired (but normal, that is, non-damaged)
nucleotides following DNA replication.
Double-strand breaks
Double-strand breaks (DSBs), in which both strands in the double helix are
severed, are particularly hazardous to the cell because they can lead to genome
rearrangements. Two mechanisms exist to repair DSBs: non-homologous end
joining (NHEJ) and recombinational repair (also known as template-assisted
repair or homologous recombination repair).

Figure: DNA ligase, shown repairing chromosomal damage, is an
enzyme that joins broken nucleotides together by catalyzing the formation of an
internucleotide ester bond between the phosphate backbone and the
deoxyribose nucleotides.
In NHEJ, DNA Ligase IV, a specialized DNA Ligase that forms a
complex with the cofactor XRCC4, directly joins the two ends.[9] To guide
accurate repair, NHEJ relies on short homologous sequences called
microhomologies present on the single-stranded tails of the DNA ends to be
joined. If these overhangs are compatible, repair is usually accurate. NHEJ can
also introduce mutations during repair. Loss of damaged nucleotides at the
break site can lead to deletions, and joining of nonmatching termini forms
translocations. NHEJ is especially important before the cell has replicated its
DNA, since there is no template available for repair by homologous
recombination. There are "backup" NHEJ pathways in higher eukaryotes.[14]
Besides its role as a genome caretaker, NHEJ is required for joining hairpin-
capped double-strand breaks induced during V(D)J recombination, the process
that generates diversity in B-cell and T-cell receptors in the vertebrate immune
system.[15]
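The idea of compatible overhangs can be sketched in Python. As a simplifying
assumption, two single-stranded tails (both written 5' to 3') are treated as
compatible only when one is the exact reverse complement of the other, so that
they can base-pair along their full length. Real NHEJ also tolerates partial
microhomology, which this sketch does not model; the function names and the
sequences are illustrative only.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def overhangs_compatible(tail_a: str, tail_b: str) -> bool:
    """True when the two tails can pair along their whole length."""
    return tail_a.upper() == reverse_complement(tail_b)


# Hypothetical tails: the first pair can anneal, the second cannot.
print(overhangs_compatible("AATT", "AATT"))
print(overhangs_compatible("AGCT", "GGCC"))
```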
Recombinational repair requires the presence of an identical or nearly
identical sequence to be used as a template for repair of the break. The
enzymatic machinery responsible for this repair process is nearly identical to
the machinery responsible for chromosomal crossover during meiosis. This
pathway allows a damaged chromosome to be repaired using a sister chromatid
(available in G2 after DNA replication) or a homologous chromosome as a
template. DSBs caused by the replication machinery attempting to synthesize
across a single-strand break or unrepaired lesion cause collapse of the
replication fork and are typically repaired by recombination.
Topoisomerases introduce both single- and double-strand breaks in the
course of changing the DNA's state of supercoiling, which is especially
common in regions near an open replication fork. Such breaks are not
considered DNA damage because they are a natural intermediate in the
topoisomerase biochemical mechanism and are immediately repaired by the
enzymes that created them.
A team of French researchers bombarded Deinococcus radiodurans with radiation to
study the mechanism of double-strand break DNA repair in that organism. At
least two copies of the genome, with random DNA breaks, can form DNA
fragments through annealing. Partially overlapping fragments are then used for
synthesis of homologous regions through a moving D-loop that can continue
extension until they find complementary partner strands. In the final step there
is crossover by means of RecA-dependent homologous recombination.
Translesion synthesis
Translesion synthesis is a DNA damage tolerance process that allows
the DNA replication machinery to replicate past DNA lesions such as thymine
dimers or AP sites. It involves the switching out of regular DNA polymerases
for specialized translesion polymerases, often with larger active sites that can
facilitate the insertion of bases opposite damaged nucleotides. The polymerase
switching is thought to be mediated by, among other factors, the post-
translational modification of the replication processivity factor PCNA.
Translesion synthesis polymerases often have low fidelity (high propensity to
insert wrong bases) relative to regular polymerases. However, many are
extremely efficient at inserting correct bases opposite specific types of damage.
For example, Pol η mediates error-free bypass of lesions induced by UV
irradiation, whereas Pol ζ introduces mutations at these sites. From a cellular
perspective, risking the introduction of point mutations during translesion
synthesis may be preferable to resorting to more drastic mechanisms of DNA
repair, which may cause gross chromosomal aberrations or cell death.
Global response to DNA damage
Cells exposed to ionizing radiation, ultraviolet light or chemicals are
prone to acquire multiple sites of bulky DNA lesions and double strand breaks.
Moreover, DNA damaging agents can damage other biomolecules such as
proteins, carbohydrates, lipids and RNA. The accumulation of damage,
specifically double strand breaks or adducts stalling the replication forks, is
among the known stimulation signals for a global response to DNA damage.[17]
The global response to damage is an act directed toward the cells' own
preservation and triggers multiple pathways of macromolecular repair, lesion
bypass, tolerance or apoptosis. The common features of global response are
induction of multiple genes, cell cycle arrest, and inhibition of cell division.
DNA damage checkpoints
After DNA damage, cell cycle checkpoints are activated. Checkpoint
activation pauses the cell cycle and gives the cell time to repair the damage
before continuing to divide. DNA damage checkpoints occur at the G1/S and
G2/M boundaries. An intra-S checkpoint also exists. Checkpoint activation is
controlled by two master kinases, ATM and ATR. ATM responds to DNA
double-strand breaks and disruptions in chromatin structure,[18] whereas ATR
primarily responds to stalled replication forks. These kinases phosphorylate
downstream targets in a signal transduction cascade, eventually leading to cell
cycle arrest. A class of checkpoint mediator proteins including BRCA1,
MDC1, and 53BP1 has also been identified.[19] These proteins seem to be
required for transmitting the checkpoint activation signal to downstream
proteins.
p53 is an important downstream target of ATM and ATR, as it is
required for inducing apoptosis following DNA damage.[20] At the G1/S
checkpoint, p53 functions by deactivating the CDK2/cyclin E complex.
Similarly, p21 mediates the G2/M checkpoint by deactivating the CDK1/cyclin
B complex.
The prokaryotic SOS response
The SOS response is the term used to describe changes in gene
expression in Escherichia coli and other bacteria in response to extensive DNA
damage. The prokaryotic SOS system is regulated by two key proteins: LexA
and RecA. The LexA homodimer is a transcriptional repressor that binds to
operator sequences commonly referred to as SOS boxes. It is known that LexA
regulates transcription of approximately 48 genes including the lexA and recA
genes.[21] The most common cellular signals activating the SOS response are
regions of single stranded DNA (ssDNA), arising from stalled replication forks
or double strand breaks, which are processed by DNA helicase to separate the
two DNA strands.[17] In the initiation step, RecA protein binds to ssDNA in an
ATP hydrolysis driven reaction creating RecA–ssDNA filaments. RecA–
ssDNA filaments activate LexA autoprotease activity which ultimately leads to
cleavage of LexA dimer and subsequent LexA degradation. The loss of LexA
repressor induces transcription of the SOS genes and allows for further signal
induction, inhibition of cell division and an increase in levels of proteins
responsible for damage processing.
SOS boxes are 20-nucleotide long sequences near promoters with
palindromic structure and a high degree of sequence conservation. This
distinction in promoter sequences causes differential binding of LexA to
different promoters and allows for timing of the SOS response. Logically, the
lesion repair genes are induced at the beginning of the SOS response. The
error-prone translesion polymerases, for example UmuD'2C (also called DNA
polymerase V), are induced later on as a last resort.[22] Once the DNA damage
is repaired or bypassed using polymerases or through recombination, the
amount of single-stranded DNA in cells decreases. The resulting drop in the
amount of RecA filaments reduces cleavage of the LexA homodimer, which
then binds to the SOS boxes near promoters and restores normal gene
expression.
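The palindromic structure of an SOS box can be checked with a short Python
sketch. A double-stranded DNA site is palindromic when it reads the same as
its own reverse complement. The 20-nucleotide sequence below is an
illustrative palindrome chosen for this sketch, not a sequence given in the
text, and counting mismatches is only a crude stand-in for how far a site
departs from a perfect palindrome; it is not a model of LexA binding strength.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def palindrome_mismatches(seq: str) -> int:
    """Count positions at which `seq` differs from its reverse complement."""
    seq = seq.upper()
    return sum(a != b for a, b in zip(seq, reverse_complement(seq)))


def is_palindromic(seq: str) -> bool:
    """True when the site reads identically on both strands (5' to 3')."""
    return palindrome_mismatches(seq) == 0


perfect_site = "TACTGTATATATATACAGTA"   # illustrative 20-nt palindrome
altered_site = "TACTGTATATATATACAGTT"   # same site with the last base changed

print(is_palindromic(perfect_site), palindrome_mismatches(perfect_site))
print(is_palindromic(altered_site), palindrome_mismatches(altered_site))
```

A single base change shows up as two mismatches, because the altered base and
its mirror-image partner at the other end of the site no longer agree.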
Eukaryotic transcriptional responses to DNA damage
Eukaryotic cells exposed to DNA damaging agents also activate
important defensive pathways by inducing multiple proteins involved in DNA
repair, cell cycle checkpoint control, protein trafficking and degradation. Such
genome wide transcriptional response is very complex and tightly regulated,
thus allowing coordinated global response to damage. Exposure of yeast
Saccharomyces cerevisiae to DNA damaging agents results in overlapping but
distinct transcriptional profiles. Similarities to the environmental shock response
indicate that a general global stress response pathway exists at the level of
transcriptional activation. In contrast, different human cell types respond to
damage differently indicating an absence of a common global response. The
probable explanation for this difference between yeast and human cells may be
in the heterogeneity of mammalian cells. In an animal different types of cells
are distributed amongst different organs which have evolved different
sensitivities to DNA damage.
In general, the global response to DNA damage involves expression of
multiple genes responsible for postreplication repair, homologous
recombination, nucleotide excision repair, DNA damage checkpoints, global
transcriptional activation, mRNA decay and many others.
Extensive damage leaves a cell with an important decision:
undergo apoptosis and die, or survive at the cost of living with a modified
genome. An increase in tolerance to damage can lead to an increased rate of
survival, which will allow a greater accumulation of mutations. Yeast Rev1 and
human polymerase η are members of the Y-family translesion DNA polymerases
present during the global response to DNA damage and are responsible for
enhanced mutagenesis during that response in eukaryotes.
Experimental animals with genetic deficiencies in DNA repair often
show decreased lifespan and increased cancer incidence. For example, mice
deficient in the dominant NHEJ pathway and in telomere maintenance
mechanisms get lymphoma and infections more often, and consequently have
shorter lifespans than wild-type mice. Similarly, mice deficient in a key repair
and transcription protein that unwinds DNA helices have premature onset of
aging-related diseases and consequent shortening of lifespan. However, not
every DNA repair deficiency creates exactly the predicted effects; mice
deficient in the NER pathway exhibited shortened lifespan without
correspondingly higher rates of mutation.
DNA repair and aging
Pathological effects of poor DNA repair

Figure: DNA repair rate is an important determinant of cell pathology.


If the rate of DNA damage exceeds the capacity of the cell to repair it,
the accumulation of errors can overwhelm the cell and result in early
senescence, apoptosis or cancer. Inherited diseases associated with faulty DNA
repair functioning result in premature aging, increased sensitivity to
carcinogens, and correspondingly increased cancer risk (see below). On the
other hand, organisms with enhanced DNA repair systems, such as
Deinococcus radiodurans, the most radiation-resistant known organism, exhibit
remarkable resistance to the double strand break-inducing effects of
radioactivity, likely due to enhanced efficiency of DNA repair and especially
NHEJ.

Longevity and caloric restriction

Figure: Most lifespan-influencing genes affect the rate of DNA damage.


A number of individual genes have been identified as influencing
variations in lifespan within a population of organisms. The effects of these
genes are strongly dependent on the environment, particularly on the organism's
diet. Caloric restriction reproducibly results in extended lifespan in a variety of
organisms, likely via nutrient sensing pathways and decreased metabolic rate.
The molecular mechanisms by which such restriction results in lengthened
lifespan are as yet unclear; however, the behavior of many genes known to be
involved in DNA repair is altered under conditions of caloric restriction.
For example, increasing the gene dosage of the gene SIR-2, which
regulates DNA packaging in the nematode worm Caenorhabditis elegans, can
significantly extend lifespan. The mammalian homolog of SIR-2 is known to
induce downstream DNA repair factors involved in NHEJ, an activity that is
especially promoted under conditions of caloric restriction.[29] Caloric
restriction has been closely linked to the rate of base excision repair in the
nuclear DNA of rodents, although similar effects have not been observed in
mitochondrial DNA.
Interestingly, the C. elegans gene AGE-1, an upstream effector of DNA
repair pathways, confers dramatically extended lifespan under free-feeding
conditions but leads to a decrease in reproductive fitness under conditions of
caloric restriction.[32] This observation supports the pleiotropy theory of the
biological origins of aging, which suggests that genes conferring a large
survival advantage early in life will be selected for even if they carry a
corresponding disadvantage late in life.
Medicine and DNA repair modulation
Hereditary DNA repair disorders
Defects in the NER mechanism are responsible for several genetic disorders,
including:
• xeroderma pigmentosum: hypersensitivity to sunlight/UV, resulting in
increased skin cancer incidence and premature aging
• Cockayne syndrome: hypersensitivity to UV and chemical agents
• trichothiodystrophy: sensitive skin, brittle hair and nails
Mental retardation often accompanies the latter two disorders,
suggesting increased vulnerability of developmental neurons.
Other DNA repair disorders include:
• Werner's syndrome: premature aging and retarded growth
• Bloom's syndrome: sunlight hypersensitivity, high incidence of
malignancies (especially leukemias)
• ataxia telangiectasia: sensitivity to ionizing radiation and some
chemical agents
All of the above diseases are often called "segmental progerias"
("accelerated aging diseases") because their victims appear elderly and suffer
from aging-related diseases at an abnormally young age.
Other diseases associated with reduced DNA repair function include
Fanconi's anemia, hereditary breast cancer and hereditary colon cancer.
DNA repair and cancer
Inherited mutations that affect DNA repair genes are strongly associated
with high cancer risks in humans. Hereditary nonpolyposis colorectal cancer
(HNPCC) is strongly associated with specific mutations in the DNA mismatch
repair pathway. BRCA1 and BRCA2, two well-known genes in which mutations
confer a greatly increased risk of breast cancer on carriers, are both
associated with a large number of DNA repair pathways, especially NHEJ and
homologous recombination.
Cancer therapy procedures such as chemotherapy and radiotherapy
work by overwhelming the capacity of the cell to repair DNA damage,
resulting in cell death. Cells that are most rapidly dividing - most typically
cancer cells - are preferentially affected. The side effect is that other non-
cancerous but rapidly dividing cells such as stem cells in the bone marrow are
also affected. Modern cancer treatments attempt to localize the DNA damage to
cells and tissues only associated with cancer, either by physical means
(concentrating the therapeutic agent in the region of the tumor) or by
biochemical means (exploiting a feature unique to cancer cells in the body).
DNA repair and evolution
The basic processes of DNA repair are highly conserved among both
prokaryotes and eukaryotes and even among bacteriophage (viruses that infect
bacteria); however, more complex organisms with more complex genomes
have correspondingly more complex repair mechanisms.[33] The ability of a
large number of protein structural motifs to catalyze relevant chemical
reactions has played a significant role in the elaboration of repair mechanisms
during evolution. For an extremely detailed review of hypotheses relating to the
evolution of DNA repair, see [34].
The fossil record indicates that single celled life began to proliferate on
the planet at some point during the Precambrian period, although exactly when
recognizably modern life first emerged is unclear. Nucleic acids became the
sole and universal means of encoding genetic information, requiring DNA
repair mechanisms that in their basic form have been inherited by all extant life
forms from their common ancestor. The emergence of Earth's oxygen-rich
atmosphere (known as the "oxygen catastrophe") due to photosynthetic
organisms, as well as the presence of potentially damaging free radicals in the
cell due to oxidative phosphorylation, necessitated the evolution of DNA repair
mechanisms that act specifically to counter the types of damage induced by
oxidative stress.
Rate of evolutionary change
On some occasions, DNA damage is not repaired, or is repaired by an
error-prone mechanism which results in a change from the original sequence.
When this occurs, mutations may propagate into the genomes of the cell's
progeny. Should such an event occur in a germ line cell that will eventually
produce a gamete, the mutation has the potential to be passed on to the
organism's offspring. The rate of evolution in a particular species (or, more
narrowly, in a particular gene) is a function of the rate of mutation.
Consequently, the rate and accuracy of DNA repair mechanisms have an
influence over the process of evolutionary change.
RECOMBINATION IN PROKARYOTES
We have seen that genetic changes due to mutations can result in the
acquisition of new biological characteristics and thereby allow evolutionary
change. However, evolution of the fittest organism in a particular environment
can be enhanced if transfer of genes between organisms is made possible by
genetic recombination. As compared with eukaryotes, where sexual
recombination is of ordered nature, in prokaryotes the process is less well
developed. It does not involve a true fusion of male and female gametes to
produce a diploid zygote.
Instead there is transfer of only some genes from the donor cell to
produce a partial diploid. This is followed by recombination to restore the
haploid state. There are three mechanisms by which these DNA fragments can
pass from a donor to a recipient cell:
(i) transformation,
(ii) transduction and
(iii) conjugation
Transformation Mechanism in Gene Recombination
Transformation was discovered in 1928 by an English bacteriologist, Frederick
Griffith, who performed a series of experiments with laboratory mice and
two strains of the pneumonia-causing bacterium Diplococcus pneumoniae. One
strain has smooth (S), capsulated cells,
whereas the other has rough (R), noncapsulated cells. The disease is caused
by smooth type cells only, i.e. smooth type cells are pathogenic (virulent)
whereas rough type cells are harmless or nonpathogenic (avirulent).
The experiments conducted by him are illustrated in the figure. When live,
harmless (rough type) cells were injected into the body of mice,
the animals remained healthy. The injection of dead, pathogenic (smooth type)
cells into the body of mice also did not cause any disease. In a classic
experiment, Griffith mixed live, harmless (rough type) cells with the dead
remains of pathogenic (smooth type) cells, and then injected the mixture into
the laboratory mice.
The live cocci taken in the mixture were uncapsulated and formed
rough colonies (R) on agar. The dead cocci taken in the mixture originally had
a capsule and were taken from smooth (S) colonies on agar. To Griffith's
surprise, the mice developed pneumonia and died. On autopsy (examination of
tissue of the dead animal), he isolated live, capsulated cells that formed smooth
colonies on agar.
Apparently the live, harmless rough cocci had been transformed in the
mice into live, pathogenic, smooth cocci. A rough to smooth conversion (R → S)
had been accomplished. Five years later, James L. Alloway of Rockefeller
Institute confirmed Griffith's work using fragments from the dead smooth type
cells to transform the rough type cells.
In 1944, Oswald T. Avery, Colin M. MacLeod, and Maclyn
McCarty, also of Rockefeller Institute, found that deoxyribonucleic acid
(DNA) isolated from the fragments could induce the transformation.
At that time, DNA was an obscure chemical with little significance. The
work of Avery, MacLeod and McCarty helped bring it to the fore. Their
experiments were the first proof that the genetic material of living organisms is DNA.
The possible mechanism of transformation is described below. Though it takes
place in less than 1% of a population, transformation is an important method of
recombination in bacteria.
A number of donor cells break apart and an explosive release and
fragmentation of DNA follows. A segment of double stranded DNA containing
about 10-20 genes then passes through the cell wall and membrane of a
recipient cell. Only a few competent recipient cells can take up the DNA. After
entry into the cell, an enzyme degrades one strand of DNA, leaving the second
strand to be incorporated.
This strand then displaces a segment from a strand of the recipient's
DNA. The displaced DNA is degraded by another enzyme in the cell. The cell
is now transformed. It will display its own traits as well as those coded by the
new DNA.
Transformation may also take place by the incorporation of plasmids into
competent cells. In this case, no DNA is displaced. Rather, the plasmid adds
genes to those already in the cell and multiplies along with the cell.
Since the 1940s, transformation has also been demonstrated in species of
Neisseria, Bacillus, Haemophilus, Azotobacter and Streptococcus. The process
involves the transfer of DNA from the fragments of donor cells into the
cytoplasm of a live recipient cell. Sections of single stranded or double
stranded DNA may be taken up but only a single strand will align with the
bacterial chromosome and become incorporated into it.
Transformations in bacteria have been observed in the ability to form a
capsule, in drug resistance and pathogenicity, and in nutritional patterns.
Transformations are not common, however, because large fragments of the
DNA molecule cannot pass through the recipient's cell wall or membrane. In
nature, transformation may increase the pathogenicity of an organism.
Bacterial Transformation
During the forties, it was recognised that inheritance in bacteria was
governed basically by the same mechanisms as those in higher eukaryotic
organisms. It was also realized that bacteria represent a useful tool to
understand the mechanism of heredity and genetic transfer, and they were
therefore increasingly used in genetic studies.
The first observation that bacterial properties can be changed by the
use of heat-inactivated cell material was, however, made in 1928 by Griffith.
Griffith had found that in Streptococcus pneumoniae (earlier called
Pneumococcus), virulence to mice was related to the presence of a capsular
material and the loss of the ability to produce the capsule made the bacteria
avirulent.
Mutants lacking the capsular material were designated as rough (R)
because colonies formed by these on solid media appeared rough as opposed to
the colonies formed by the virulent, capsule forming strains which were smooth
and shining (S). Griffith's experiments involved the infection of mice with heat
killed and living preparations from two different strains of S. pneumoniae.
When he injected the mice with either the dead "S" cells or a small number of
living "R" cells no death occurred. However, when the mice were injected with
a mixture of dead "S" cells and a small number of live "R" cells, the mice died.
From these experiments he concluded that the dead "S" cells, which
contained the capsule, contributed to the killing effect of the "R" cells, since
neither of the preparations by itself was effective.
Although these observations were not well understood at that time, the
term "transformation" was used to describe this phenomenon, whereby one type
of cell was converted by contact with the dead cells of a second strain. The
material responsible for causing transformation was thought to be the capsular
polysaccharide.
However, the material responsible for bringing about this
change was not identified until 1944, when Avery, MacLeod and McCarty
identified the transforming principle in capsulated cells as DNA. Their
studies with purified DNA from the smooth cells of S. pneumoniae and its ability
to transform rough cells in a test tube explained the observations made by
Griffith in 1928. It was then possible to conclude that the heat-killed
encapsulated cells carried the information for the synthesis of the capsule,
which was transferred to the live noncapsulated cells. As a consequence, cells
that received the genetic material for capsule formation became encapsulated
and virulent.
This remarkable finding did not receive as much attention as it should
have at that time, since most believed that proteins rather than nucleic acids are
the genetic elements and that proteins in the DNA preparations were
responsible for bringing about transformation.
Since then, however, using highly purified DNA preparations and other
genetic markers, it has been shown beyond doubt that the transforming principle
is DNA and not protein. The process of transformation has now also been
demonstrated in several other bacteria such as Bacillus subtilis, Haemophilus
influenzae, Rhizobium, E. coli, Streptococcus, Streptomyces etc.
The process of transformation in all these organisms has certain
common features: (i) the purified donor DNA is first transported across the cell
membrane into the recipient "competent cells" (cells, that can take up DNA),
and (ii) the DNA then undergoes recombination with the recipient DNA and is
then expressed.
The uptake process apparently is not very specific, since it has been
found that even calf thymus DNA can be taken in by bacterial cells, but the
subsequent process of integration is highly specific. Although
double-stranded DNA is necessary for transformation, single-stranded DNA can also
penetrate the bacterial cells. Following uptake by the recipient cells, the
transforming DNA undergoes modifications immediately and an "eclipse"
period lasting for a few minutes is seen.
During this period the donor DNA cannot be recovered from the
recipient cells. Using isotopically labelled transforming DNA it has been
shown that during this period the DNA exists in a single-stranded form. This is
followed by the integration of DNA into the recipient DNA in an area of
homology. The process of integration apparently involves recombination and
the loss of a region of recipient DNA. Although many details are known about
this process, our understanding of transformation in bacteria is still
difficult to generalize.
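Integration in an area of homology, with loss of the corresponding recipient
region, can be pictured with a small Python sketch. It is a deliberately crude
model under stated assumptions: the donor strand is slid along the recipient
sequence, the window with the most identical bases is taken as the area of
homology, and that window is then replaced by the donor fragment. The function
name and the sequences are hypothetical, and no real recombination enzyme
works by this exhaustive search.

```python
def integrate_by_homology(recipient: str, donor: str) -> tuple[str, int]:
    """Replace the recipient window most similar to `donor` with `donor`.

    Returns the new recipient sequence and the 0-based start of the window.
    """
    if len(donor) > len(recipient):
        raise ValueError("donor fragment is longer than the recipient")
    best_pos, best_score = 0, -1
    for pos in range(len(recipient) - len(donor) + 1):
        window = recipient[pos:pos + len(donor)]
        score = sum(a == b for a, b in zip(window, donor))
        if score > best_score:
            best_pos, best_score = pos, score
    new_recipient = (
        recipient[:best_pos] + donor + recipient[best_pos + len(donor):]
    )
    return new_recipient, best_pos


# The donor differs from the recipient region ATCCTTAA at a single base.
new_seq, start = integrate_by_homology("GGGATCCTTAAACCC", "ATCGTTAA")
print(new_seq, start)
```

The recipient keeps its original length: the donor fragment is gained and the
matching stretch of recipient DNA is lost, as described in the text.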
The frequency of transformation for any single character is rather small
since the amount of DNA that is taken up by the cells and integrated is small.
Nevertheless, transformation using purified DNA preparations has been useful
in locating genetic loci (in genetic mapping) as well as to understand the effect
of a variety of physical and chemical treatments on the biological functioning
of DNA.

Transduction Mechanism in Gene Recombination
Another mechanism by which the genetic material in bacteria can be
transferred from one cell to another is through the mediation of bacterial
viruses, and this process is known as transduction. This process was first
discovered by Norman Zinder and J. Lederberg in 1952 during their
experiments to see whether the process of conjugation existed in Salmonella.
In doing the U-tube experiments, they found that recombinants
appeared only in one arm of the tube, without cell contact. Also, cell-free
filtrates from one culture could yield recombinants when mixed with the other.
The active factor in the filtrate was, however, resistant to DNase, and this
ruled out transformation involving DNA.
It was subsequently proved that the active component was a
bacteriophage which was carried by one of the strains in the prophage
condition.
Some bacteria have the ability to carry phage DNA within their own
DNA and such bacteria are known as lysogenic bacteria. In such lysogenic
bacteria, the prophage under certain conditions becomes active, multiplies and
destroys the host cell with the release of a number of phage particles.
The phage particles released from a small number of bacterial cells
attack sensitive cells, multiply and release more phage particles. The lysogenic
strains are however, resistant to the same phage that they carry.
Sometimes when the prophage is released as the vegetative phage, in
addition to its own DNA it also carries a small fragment of the host DNA.
These phages can infect other bacteria and carry the bacterial DNA to the
recipient cells. Such phages are called "transducing phages" and these act as
carriers of bacterial DNA from one cell to another.
The size of the DNA transferred by transduction is small as compared
to either transformation or conjugation and the amount of DNA is generally
less than one per cent of the bacterial genome. This technique is therefore
useful only in determining the relative positions of very closely located markers
and to map regions within a gene.
Types of Transduction
In experiments performed by Zinder and Lederberg, with phage P22
and S. typhimurium L22, it was found that the phage can carry any part of the
bacterial DNA and transfer this to another Salmonella strain. The new bacterial
DNA thus introduced into the recipient strains was subsequently integrated into
the recipient cell DNA.

This ability of the bacteriophage to carry with its genetic material any
region of the bacterial DNA is now known as "generalized transduction"
(unrestricted transduction). As opposed to this, there are bacterial phages such
as the lambda phage (λ) of E. coli which can carry only a specific region of the
bacterial DNA to a recipient, and this is called "specialised transduction"
(restricted transduction). The technique of specialised transduction has been
extensively used in recent years to determine the fine structure of bacterial
genes.
Phage preparations obtained by the induction of lysogenic cultures
possess transducing properties. For specialised transducing phages such as
lambda, however, phages that emerge from a purely lytic infection do not
transduce. The generation of such a transducing phage involves a kind of
recombination between the bacterial and prophage chromosomes and, during
later release, a part of the bacterial chromosome is exchanged for a part of the
phage genome.
Generalised Transduction / Unrestricted Transduction
Generalised transduction is the more common type of transduction. It can be
mediated by prophages that remain in the cytoplasm as plasmids and are not
attached to the chromosome. This occurs in P1 phage and many others. The viral
DNA lies in the cytoplasm and produces copies of itself for new phage particles.
In doing so it may accidentally pick up small chromosomal segments of
bacterial DNA and incorporate these into its own DNA. Some phages may
accidentally package only bacterial DNA. In most cases, normal viruses will be
liberated from the cell. Occasionally, a virus contains several bacterial genes
acquired in the chromosomal segments.
If such a virus infects a new cell, its DNA attaches to the chromosome and
transduces the cell as lysogeny is established. In generalised transduction,
the viral DNA enters the lytic cycle and forms new virus particles.
However, tiny fragments of bacterial chromosome are sometimes
incorporated into the DNA of the new viruses or may occasionally replace the
viral DNA. This is a random occurrence that may involve any of the bacterial
genes, hence the name generalised transduction. Perhaps one phage in a
thousand contains bacterial DNA. All bacterial genes are equally available to be
picked up by the phage DNA.
When the viral particles are released during lysis, the genes are carried
along and, on subsequent infection, the genes enter the cytoplasm of the new
host cell where they will now function.
The phenomenon of lysogeny is well established in modern microbiology.
Diphtheria organisms are known to contain bacteriophages that code for the
toxin produced during disease. Herpes simplex viruses remain latent for many
years in the cells of the body, in a state comparable to that of a prophage,
expressing themselves at long intervals. Certain viruses are known to integrate
into human chromosomes, transforming the cells to tumour cells.
Specialised Transduction / Restricted Transduction
The bacterium may remain lysogenic for many generations, during which time
the viral DNA replicates together with the bacterial chromosome. However, at
some point in the future, the phage stops producing the repressor protein, and
the lytic cycle will begin. The viral DNA that was attached to the chromosome
will now break free and direct the synthesis of those proteins that will yield
new viruses.
In detaching, however, the viral DNA may carry with it a few bacterial
genes from the chromosome. The genes are then replicated along with the viral
DNA and they become part of the new phage particles. When the latter are
released, copies of the genes are carried along. As the cycle repeats during the
next infection, phage DNA enters the new bacterial cell and inserts into its
chromosome. Copies of the original bacterial genes are included, and
the bacterium becomes transduced.
The bacterial cell now contains its own genes plus several from the
original cell. This type of transduction is called specialised transduction,
because specific genes are removed from the bacterial chromosome, depending
upon where the viral DNA was attached. This occurs in lambda phage. The
removal of genes, however, is thought to be an extremely rare event.
Abortive Transduction
As discussed before, fusion between Enterobacteria is common but the formation
of stable heterokaryons or diploids is not known. This is perhaps due to the
difficulty in transferring the entire genome from one cell to another.
After conjugation, partial zygotes (merozygotes) are formed, but these
are transient and therefore are not very suitable for complementation tests.
One of the systems available in bacteria to test complementation is
"abortive transduction". In this parasexual mechanism, modified temperate
bacteriophages act as vectors of small fragments of bacterial DNA, transporting
the DNA from a donor cell in which the phage is grown to a recipient strain
which the phage can infect.
However, the transduced bacterial segment fails to undergo
recombination and replication but remains functional and is transferred during
cell division to only one daughter cell.
In Salmonella, motility is one of the many characters that can be
transduced by the phage P22. When P22 is grown on a motile donor strain,
transducing phages which can infect nonmotile recipient bacteria can be
obtained.
Motile transductants can be isolated by plating the infected nonmotile
recipients on soft gelatin agar in which the growth of nonmotile organism is
confined and compact, while motile cells migrate outwards as they multiply to
give an expanding 'flare' of growth.
Stocker, Zinder and Lederberg noticed that, in addition to flares, there
were sometimes a number of linear trails of isolated colonies leading out from
the outer region of confined growth. These trails were explained on the basis
that the motility gene transferred through transduction does not participate in
recombination, but is functional and capable of conferring motility. When such
a cell divides, only the daughter cell in which the gene is present is able to
move away, while the nonmotile daughter stays behind and forms a tiny compact
colony.
This process of unilinear inheritance of the transferred gene continues,
leaving a trail of colonies of nonmotile cells, until the gene is lost for
reasons not well known. Using single cell studies, this mechanism was confirmed
as abortive transduction.
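The unilinear inheritance described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that every cell divides once per generation and that the non-replicating fragment is not lost during the period followed; the function name is hypothetical.

```python
def abortive_lineage(generations: int) -> list[tuple[int, int]]:
    """Follow one abortively transduced cell through successive divisions.

    Returns a list of (motile_cells, nonmotile_cells), one entry per generation.
    The transduced fragment does not replicate, so it passes to only one
    daughter cell at each division.
    """
    motile, nonmotile = 1, 0
    history = []
    for _ in range(generations):
        # Every nonmotile cell gives two nonmotile daughters; the single motile
        # cell gives one motile daughter (fragment kept) and one nonmotile one.
        nonmotile = 2 * nonmotile + motile
        history.append((motile, nonmotile))
    return history


for generation, (m, n) in enumerate(abortive_lineage(4), start=1):
    print(f"generation {generation}: motile={m}, nonmotile={n}")
```

However many divisions occur, only one cell in the clone remains motile, which is why a trail of small nonmotile colonies is left behind rather than an expanding flare.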
Later, in 1956, Ozeki also observed the same type of phenomenon in the
transduction of nutritional characters in Salmonella. The figure diagrammatically
summarises linear inheritance as it occurs in abortive transduction.
Conjugation Mechanism in Gene Recombination
Literature in bacterial morphology contains many descriptions of microscopic
observations of cell pairs which were identified as indicators of mating and
sexuality in bacteria.
However, no confirmatory genetic evidence was available till the
discovery of conjugation in E. coli by J. Lederberg and E.L. Tatum in 1946,
who mixed auxotrophic mutants and selected rare recombinants. In their initial
experiments, Lederberg and Tatum plated E. coli mutants having triple and
complementary nutritional requirements (abcDEF X ABCdef) on minimal agar
and obtained prototrophic bacteria (ABCDEF).
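The selection used in this cross can be sketched in a few lines. The sketch assumes the notation of the cross above, in which an upper-case letter is a functional allele and a lower-case letter a nutritional requirement; the function name is hypothetical.

```python
def grows_on_minimal(genotype: str) -> bool:
    """A strain grows on minimal agar only if every marker is functional.

    Assumed convention: upper-case letter = functional allele,
    lower-case letter = nutritional requirement (auxotrophic mutation).
    """
    return all(marker.isupper() for marker in genotype)


# The two parents fail to grow; only the recombinant prototroph forms colonies.
for strain in ("abcDEF", "ABCdef", "ABCDEF"):
    print(strain, grows_on_minimal(strain))
```

Because each parent carries three requirements, neither parent can grow on minimal agar, so any colony that appears must have acquired functional alleles from both parents.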
These recombinants were stable, could be propagated and arose at a
frequency of 10⁻⁶ to 10⁻⁷. Further evidence to show that the development of
prototrophic colonies required the cooperation of intact bacteria of both types
was obtained by the 'U' tube experiments (see transduction). Neither the culture
filtrates nor the cell free culture extracts were productive suggesting that actual
cell contact was necessary. Lederberg also examined a large number of the
prototrophic colonies to know whether the process was reciprocal. He found
that most colonies contained only one class of recombinants suggesting that
recombination in bacteria may be of an unorthodox kind. Detailed analysis also
showed that the prototrophs were initially heterozygous but later became
haploid.
These studies by Lederberg and his colleagues proved that bacteria
possessed sex which made them amenable to formal genetic analysis and also
revealed the existence of genetic material in a chromosomal organization.
Subsequent studies carried out to determine the size of the DNA
fragment involved, by detecting the number of genetic markers transferred,
suggested that more than one marker could be transferred at a time and
interestingly, linkage between certain markers was always seen.
It was concluded that in this process of conjugation (i) large fragments
of DNA were transferred from one bacterium to another in a non reciprocal
manner, and (ii) that transfer always occurred from a given point.
It was also found that the size of the DNA transferred from one cell to
another was much larger than in transformation and this technique appeared to
be a more useful technique for gene mapping in bacteria. The bacteria that
transfer DNA were called the "donor" bacteria, while those that receive the
DNA were called the "recipient" bacteria.

UNIT – III
STRUCTURE
Transcription in Prokaryotes and Eukaryotes.
Mechanism of Promoters and RNA polymerase and transcription factors
TRANSCRIPTION IN PROKARYOTES
As in most areas of molecular biology, studies of E. coli have provided
the model for subsequent investigations of transcription in eukaryotic cells. The
basic mechanisms by which transcription is regulated were likewise elucidated
by pioneering experiments in E. coli, in which regulated gene expression
allows the cell to respond to variations in the environment, such as changes in
the availability of nutrients. An understanding of transcription in E. coli has
thus provided the foundation for studies of the far more complex mechanisms
that regulate gene expression in eukaryotic cells.
RNA Polymerase and Transcription
The principal enzyme responsible for RNA synthesis is RNA
polymerase, which catalyzes the polymerization of ribonucleoside 5′-
triphosphates (NTPs) as directed by a DNA template. The synthesis of RNA is
similar to that of DNA, and like DNA polymerase, RNA polymerase catalyzes
the growth of RNA chains always in the 5′ to 3′ direction. Unlike DNA
polymerase, however, RNA polymerase does not require a preformed primer to
initiate the synthesis of RNA. Instead, transcription initiates de novo at specific
sites at the beginning of genes. The initiation process is particularly important
because this is the primary step at which transcription is regulated.
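The template-directed, 5′ to 3′ synthesis described above can be sketched as follows. This is a minimal illustration only: the template is assumed to be written 3′ to 5′, so the RNA comes out 5′ to 3′, and the function name and example sequence are hypothetical.

```python
# Base-pairing rule used during transcription: template DNA base -> RNA base.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (written 5'->3') copied from a template written 3'->5'.

    Nucleotides are added one at a time to the 3' end of the growing chain,
    so reading the template left to right builds the RNA left to right.
    """
    rna = []
    for base in template_3_to_5.upper():
        rna.append(DNA_TO_RNA[base])
    return "".join(rna)


print(transcribe("TACGGCATT"))  # prints AUGCCGUAA
```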
E. coli RNA polymerase, like DNA polymerase, is a complex enzyme
made up of multiple polypeptide chains. The intact enzyme consists of four
different types of subunits, called α, β, β′, and σ (Figure 6.1). The σ subunit is
relatively weakly bound and can be separated from the other subunits, yielding
a core polymerase consisting of two α, one β, and one β′ subunits. The core
polymerase is fully capable of catalyzing the polymerization of NTPs into
RNA, indicating that σ is not required for the basic catalytic activity of the
enzyme. However, the core polymerase does not bind specifically to the DNA
sequences that signal the normal initiation of transcription; therefore, the σ
subunit is required to identify the correct sites for transcription initiation. The
selection of these sites is a critical element of transcription because synthesis of
a functional RNA must start at the beginning of a gene.

Figure 6.1. E. coli RNA polymerase The complete enzyme consists of
five subunits: two α, one β, one β′, and one σ. The σ subunit is relatively
weakly bound and can be dissociated from the other four subunits, which
constitute the core polymerase.
The DNA sequence to which RNA polymerase binds to initiate
transcription of a gene is called the promoter. The DNA sequences involved in
promoter function were first identified by comparisons of the nucleotide
sequences of a series of different genes isolated from E. coli. These
comparisons revealed that the region upstream of the transcription initiation
site contains two sets of sequences that are similar in a variety of genes. These
common sequences encompass six nucleotides each, and are located
approximately 10 and 35 base pairs upstream of the transcription start site
(Figure 6.2). They are called the -10 and -35 elements, denoting their position
relative to the transcription initiation site, which is defined as the +1 position.
The sequences at the -10 and -35 positions in different promoters are not
identical, but they are all similar enough to establish consensus sequences—the
bases most frequently found at each position.
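The idea of scoring a sequence against the two consensus elements can be sketched as below. Several things here are assumptions not stated in the text: the hexamers TTGACA (-35) and TATAAT (-10) are the commonly cited E. coli consensus sequences, the spacer of 16 to 18 base pairs between them is a typical value, and the function names are hypothetical.

```python
MINUS_35 = "TTGACA"  # commonly cited -35 consensus (assumed, not given in the text)
MINUS_10 = "TATAAT"  # commonly cited -10 consensus (assumed, not given in the text)


def matches(site: str, consensus: str) -> int:
    """Count positions at which a candidate site agrees with the consensus."""
    return sum(a == b for a, b in zip(site, consensus))


def best_promoter(seq: str, spacers=(16, 17, 18)):
    """Return (score, start_of_-35, start_of_-10) for the best hexamer pair.

    The score is the number of matching bases out of 12. Positions are
    0-based indices into seq. Returns None if seq is too short.
    """
    seq = seq.upper()
    best = None
    for i in range(len(seq) - 5):
        score_35 = matches(seq[i:i + 6], MINUS_35)
        for gap in spacers:
            j = i + 6 + gap
            if j + 6 > len(seq):
                continue
            score = score_35 + matches(seq[j:j + 6], MINUS_10)
            if best is None or score > best[0]:
                best = (score, i, j)
    return best


example = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "GGGGGGGA"
print(best_promoter(example))
```

A promoter that matches the consensus at more positions gets a higher score, which mirrors the observation that promoters closer to the consensus are generally transcribed more efficiently.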

Figure 6.2. Sequences of E. coli promoters E. coli promoters are
characterized by two sets of sequences located 10 and 35 base pairs upstream
of the transcription start site (+1). The consensus sequences shown correspond
to the bases most frequently found in different promoters.
Several types of experimental evidence support the functional
importance of the -10 and -35 promoter elements. First, genes with promoters
that differ from the consensus sequences are transcribed less efficiently than
genes whose promoters match the consensus sequences more closely. Second,
mutations introduced in either the -35 or -10 consensus sequences have strong
effects on promoter function. Third, the sites at which RNA polymerase binds
to promoters have been directly identified by footprinting experiments, which
are widely used to determine the sites at which proteins bind to DNA (Figure
6.3).

In experiments of this type, a DNA fragment is radiolabeled at one end.
The labeled DNA is incubated with the protein of interest (e.g., RNA
polymerase) and then subjected to partial digestion with DNase. The principle
of the method is that the regions of DNA to which the protein binds are
protected from DNase digestion. These regions can therefore be identified by
comparison of the digestion products of the protein-bound DNA with those
resulting from identical DNase treatment of a parallel sample of DNA that was
not incubated with protein. Variations of this basic method, which employ
chemical reagents to modify and cleave DNA at particular nucleotides, can be
used to identify the specific DNA bases that are in contact with protein. Such
footprinting analysis has shown that RNA polymerase generally binds to
promoters over approximately a 60-base-pair region, extending from -40 to +20
(i.e., from 40 nucleotides upstream to 20 nucleotides downstream of the
transcription start site).

Figure 6.3.
DNA footprinting A sample containing fragments of DNA
radiolabeled at one end is divided into two, and one half of the sample is
incubated with a protein that binds to a specific DNA sequence within the
fragment. Both samples are then digested with DNase, under conditions such
that the DNase introduces an average of one cut per molecule. The region of
DNA bound to the protein is protected from DNase digestion. The DNA-
protein complexes are then denatured, and the sizes of the radiolabeled DNA
fragments produced by DNase digestion are analyzed by electrophoresis (as for
DNA sequencing). Fragments of DNA resulting from DNase cleavage within
the region protected by protein binding are missing from the sample of DNA
that was incubated with protein.
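The logic of a footprinting experiment can be sketched as a comparison of two sets of fragment lengths. The model is deliberately simplified and its details are assumptions: each molecule receives exactly one cut, a cut after nucleotide n gives an end-labeled fragment of length n, and the function names are hypothetical.

```python
def fragment_lengths(dna_length: int, protected=None) -> list[int]:
    """Lengths of end-labeled fragments produced by single DNase cuts.

    A cut after nucleotide n (counted from the labeled end) yields a labeled
    fragment of n nucleotides. Cuts falling inside the protected region,
    given as an inclusive (start, end) pair, do not occur.
    """
    lengths = []
    for cut in range(1, dna_length):
        if protected and protected[0] <= cut <= protected[1]:
            continue  # protein shields this position from DNase
        lengths.append(cut)
    return lengths


def footprint(dna_length: int, protected) -> list[int]:
    """Fragment lengths present in the free DNA but missing with protein bound."""
    free = set(fragment_lengths(dna_length))
    bound = set(fragment_lengths(dna_length, protected))
    return sorted(free - bound)


print(footprint(20, (8, 12)))  # prints [8, 9, 10, 11, 12]
```

The missing lengths correspond to the gap of bands seen on the gel, and so mark the region of DNA covered by the protein.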
The σ subunit binds specifically to sequences in both the -35 and -10
promoter regions, substantiating the importance of these sequences in promoter
function. In addition, some E. coli promoters have a third sequence, located
upstream of the -35 region, that serves as a specific binding site for the RNA
polymerase α subunit.
In the absence of σ, RNA polymerase binds nonspecifically to DNA
with low affinity. The role of σ is to direct the polymerase to promoters by
binding specifically to both the -35 and -10 sequences, leading to the initiation
of transcription at the beginning of a gene (Figure 6.4). The initial binding
between the polymerase and a promoter is referred to as a closed-promoter
complex because the DNA is not unwound. The polymerase then unwinds
approximately 15 bases of DNA around the initiation site to form an open-
promoter complex in which single-stranded DNA is available as a template for
transcription. Transcription is initiated by the joining of two free NTPs. After
addition of about the first 10 nucleotides, σ is released from the polymerase,
which then leaves the promoter and moves along the template DNA to continue
elongation of the growing RNA chain. As it travels, the polymerase unwinds
the template DNA ahead of it and rewinds the DNA behind it, maintaining an
unwound region of about 17 base pairs in the region of transcription.

Figure 6.4.
Transcription by E. coli RNA polymerase The polymerase initially
binds nonspecifically to DNA and migrates along the molecule until the σ
subunit binds to the -35 and -10 promoter elements, forming a closed-promoter
complex. The polymerase then unwinds DNA around the initiation site, and
transcription is initiated by the polymerization of free NTPs. The σ subunit then
dissociates from the core polymerase, which migrates along the DNA and
elongates the growing RNA chain.

RNA synthesis continues until the polymerase encounters a termination
signal, at which point transcription stops, the RNA is released from the
polymerase, and the enzyme dissociates from its DNA template. The simplest
and most common type of termination signal in E. coli consists of a
symmetrical inverted repeat of a GC-rich sequence followed by four or more A
residues (Figure 6.5). Transcription of the GC-rich inverted repeat results in the
formation of a segment of RNA that can form a stable stem-loop structure by
complementary base pairing. The formation of such a self-complementary
structure in the RNA disrupts its association with the DNA template and
terminates transcription. Because hydrogen bonding between A and U is
weaker than that between G and C, the presence of A residues downstream of
the inverted repeat sequences is thought to facilitate the dissociation of the
RNA from its template. Other types of transcription termination signals, in both
prokaryotic and eukaryotic cells, depend on the binding of proteins that
terminate transcription to specific DNA sequences, rather than on the formation
of a stem-loop structure in the RNA.
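The termination signal described above can be searched for in an RNA sequence with a short sketch. Note that A residues in the DNA template appear as a run of U residues in the RNA. The stem length, loop length, GC threshold and function names below are illustrative assumptions, not values from the text.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the sequence that would base-pair with rna, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_terminator(rna: str, stem=6, loop=4, min_u=4, min_gc=0.6):
    """Return the 0-based start of a GC-rich stem-loop followed by a U run.

    Looks for: stem, loop, perfectly complementary second stem arm, then at
    least min_u U residues. Returns None if no such signal is found.
    """
    rna = rna.upper()
    span = 2 * stem + loop
    for i in range(len(rna) - span - min_u + 1):
        left = rna[i:i + stem]
        right = rna[i + stem + loop:i + span]
        tail = rna[i + span:i + span + min_u]
        gc_fraction = (left.count("G") + left.count("C")) / stem
        if (right == reverse_complement(left)
                and gc_fraction >= min_gc
                and tail == "U" * min_u):
            return i
    return None


example = "AA" + "GCCGCC" + "UUCG" + "GGCGGC" + "UUUU" + "AA"
print(find_terminator(example))  # prints 2
```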

Figure 6.5.
Transcription termination The termination of transcription is signaled
by a GC-rich inverted repeat followed by four A residues. The inverted repeat
forms a stable stem-loop structure in the RNA, causing the RNA to dissociate
from the DNA template.

Repressors and Negative Control of Transcription
The pioneering studies of gene regulation in E. coli were carried out by
François Jacob and Jacques Monod in the 1950s. These investigators and their
colleagues analyzed the expression of enzymes involved in the metabolism of
lactose, which can be used as a source of carbon and energy via cleavage to
glucose and galactose (Figure 6.6). The enzyme that catalyzes the cleavage of
lactose (β-galactosidase) and other enzymes involved in lactose metabolism are
expressed only when lactose is available for use by the bacteria. Otherwise, the
cell is able to economize by not investing energy in the synthesis of
unnecessary RNAs and proteins. Thus, lactose induces the synthesis of
enzymes involved in its own metabolism. In addition to requiring β-
galactosidase, lactose metabolism involves the products of two other closely
linked genes: lactose permease, which transports lactose into the cell, and a
transacetylase, whose function in lactose metabolism is still unknown. On the
basis of purely genetic experiments, Jacob and Monod deduced the mechanism
by which the expression of these genes was regulated, thereby formulating a
model that remains fundamental to our understanding of transcriptional
regulation.

Figure 6.6.
Metabolism of lactose β-galactosidase catalyzes the hydrolysis of
lactose to glucose and galactose.
The starting point in this analysis was the isolation of mutants that were
defective in regulation of the genes involved in lactose utilization. These
mutants were of two types: constitutive mutants, which expressed all three
genes even when lactose was not available, and noninducible mutants, which
failed to express the genes even in the presence of lactose. Genetic mapping
localized these regulatory mutants to two distinct loci, called o and i, with o
located immediately upstream of the structural gene for β-galactosidase.
Mutations affecting o resulted in constitutive expression; mutants of i were
either constitutive or noninducible.
The function of these regulatory genes was probed by experiments in
which two strains of bacteria were mated, resulting in diploid cells containing
genes derived from both parents (Figure 6.7). Analysis of gene expression in
such diploid bacteria provided critical insights by defining which alleles of
these regulatory genes are dominant and which recessive. For example, when
bacteria containing a normal i gene (i+) were mated with bacteria carrying an i
gene mutation resulting in constitutive expression (an i- mutation), the resulting
diploid bacteria displayed normal inducibility; therefore, the normal i+ gene
was dominant over the i- mutant. In contrast, matings between normal bacteria
and bacteria with an oc mutation (constitutive expression) yielded diploids with
the constitutive expression phenotype, indicating that oc is dominant over o+.
Additional experiments in which mutations in o and i were combined with
different mutations in the structural genes showed that o affects the expression
of only the genes to which it is physically linked, whereas i affects the
expression of genes on both chromosome copies in diploid bacteria. Thus, in an
oc/o+ cell, only the structural genes that are linked to oc are constitutively
expressed. In contrast, in an i+/i- cell, structural genes on both chromosomes are
regulated normally. These results led to the conclusion that o represents a
region of DNA that controls the transcription of adjacent genes, whereas the i
gene encodes a regulatory factor (e.g., a protein) that can diffuse throughout the
cell and control genes on both chromosomes.

Figure 6.7.
Regulation of β-galactosidase in diploid E. coli The mating of two
bacterial strains results in diploid cells that contain genes from both parents. In
these examples, it is assumed that the genes encoding β-galactosidase (the z
genes) can be distinguished on the basis of structural gene mutations,
designated z1 and z2. In an i+/i- diploid (left), both structural genes are
inducible; therefore, i+ is dominant over i- and affects expression of z genes on
both chromosomes. In contrast, in an oc/o+ diploid (right), the z gene linked to
oc is constitutively expressed, whereas that linked to o+ is inducible. Therefore,
o affects expression of only the adjacent z gene on the same chromosome.
The model of gene regulation developed on the basis of these
experiments is illustrated in Figure 6.8. The genes encoding β-galactosidase,
permease, and transacetylase are expressed as a single unit, called an operon.
Transcription of the operon is controlled by o (the operator), which is adjacent
to the transcription initiation site. The i gene encodes a protein that regulates
transcription by binding to the operator. Since i- mutants (which result in
constitutive gene expression) are recessive, it was concluded that these mutants
failed to make a functional gene product. This result implies that the normal i
gene product is a repressor, which blocks transcription when bound to o. The
addition of lactose leads to induction of the operon because lactose binds to the
repressor, thereby preventing it from binding to the operator DNA. In
noninducible i mutants (which are dominant over i+), the repressor fails to bind
lactose, so expression of the operon cannot be induced.

Figure 6.8.
Negative control of the lac operon The i gene encodes a repressor
which, in the absence of lactose (top), binds to the operator (o) and blocks
transcription of the three structural genes (z, β-galactosidase; y, permease; and
a, transacetylase). Lactose induces expression of the operon by binding to the
repressor (bottom), which prevents the repressor from binding to the operator.
P = promoter; Pol = polymerase.
The model neatly fits the results of the genetic experiments from which
it was derived. In i- cells, the repressor is not made, so the lac operon is
constitutively expressed. Diploid i+/i- cells are normally inducible, since
functional repressor is encoded by the i+ allele. Finally, in oc mutants a
functional operator has been lost and repressor cannot be bound. Consequently,
oc mutants are dominant but affect the expression only of linked structural
genes.
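The way the model accounts for the genetic results can be checked with a small sketch. It treats the repressor as a diffusible product shared by all chromosome copies and the operator as acting only on its own chromosome. The class, the function name and the allele label "s" for the noninducible i mutation are hypothetical conventions adopted here; glucose effects are ignored.

```python
from dataclasses import dataclass


@dataclass
class Chromosome:
    i: str  # "+" normal, "-" no functional repressor, "s" repressor cannot bind lactose
    o: str  # "+" normal operator, "c" operator that cannot bind repressor
    z: str  # label of the linked beta-galactosidase gene, e.g. "z1"


def expressed_z(chromosomes, lactose: bool) -> list[str]:
    """Return the labels of the z genes that are transcribed."""
    # Trans effect: any active repressor in the cell can reach every operator.
    active_repressor = False
    for c in chromosomes:
        if c.i == "s":
            active_repressor = True  # stays active even when lactose is present
        elif c.i == "+" and not lactose:
            active_repressor = True  # normal repressor, not inactivated
    # Cis effect: each operator controls only the genes linked to it.
    return [c.z for c in chromosomes
            if not (active_repressor and c.o == "+")]


diploid_i = [Chromosome("+", "+", "z1"), Chromosome("-", "+", "z2")]
diploid_o = [Chromosome("+", "c", "z1"), Chromosome("+", "+", "z2")]
print(expressed_z(diploid_i, lactose=False))  # prints []
print(expressed_z(diploid_o, lactose=False))  # prints ['z1']
```

In the i+/i- diploid nothing is expressed without lactose, so i+ is dominant, whereas in the oc/o+ diploid only the z gene linked to oc is expressed constitutively.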
Confirmation of this basic model has since come from a variety of
experiments, including Walter Gilbert's isolation, in the 1960s, of the lac
repressor and analysis of its binding to operator DNA. Molecular analysis has
defined the operator as approximately 30 base pairs of DNA, starting a few
bases before the transcription initiation site. Footprinting analysis has identified
this region as the site to which the repressor binds, blocking transcription. As
predicted, lactose binds to the repressor, which then no longer binds to operator
DNA. Also as predicted, oc mutations alter sequences within the operator,
thereby preventing repressor binding and resulting in constitutive gene
expression.
The central principle of gene regulation exemplified by the lactose
operon is that control of transcription is mediated by the interaction of
regulatory proteins with specific DNA sequences. This general mode of
regulation is broadly applicable to both prokaryotic and eukaryotic cells.
Regulatory sequences like the operator are called cis-acting control elements,
because they affect the expression of only linked genes on the same DNA
molecule. On the other hand, proteins like the repressor are called trans-acting
factors because they can affect the expression of genes located on other
chromosomes within the cell. The lac operon is an example of negative control
because binding of the repressor blocks transcription. This, however, is not
always the case; many trans-acting factors are activators rather than inhibitors
of transcription.
Positive Control of Transcription
The best-studied example of positive control in E. coli is the effect of
glucose on the expression of genes that encode enzymes involved in the
breakdown (catabolism) of other sugars (including lactose) that provide
alternative sources of carbon and energy. Glucose is preferentially utilized, so
as long as glucose is available, enzymes involved in catabolism of alternative
energy sources are not expressed. For example, if E. coli are grown in medium
containing both glucose and lactose, the lac operon is not induced and only
glucose is used by the bacteria. Thus, glucose represses the lac operon even in
the presence of the normal inducer (lactose).
Glucose repression (generally called catabolite repression) is now
known to be mediated by a positive control system, which is coupled to levels
of cyclic AMP (cAMP) (Figure 6.9). In bacteria, the enzyme adenylyl cyclase,
which converts ATP to cAMP, is regulated such that levels of cAMP increase
when glucose levels drop. cAMP then binds to a transcriptional regulatory
protein called catabolite activator protein (CAP). The binding of cAMP
stimulates the binding of CAP to its target DNA sequences, which in the lac
operon are located approximately 60 bases upstream of the transcription start
site. CAP then interacts with the α subunit of RNA polymerase, facilitating the
binding of polymerase to the promoter and activating transcription.
Figure 6.9. Positive control of the lac operon by glucose Low levels of
glucose activate adenylyl cyclase, which converts ATP to cyclic AMP (cAMP).
Cyclic AMP then binds to the catabolite activator protein (CAP) and stimulates
its binding to regulatory sequences of various operons concerned with the
metabolism of alternative sugars, such as lactose. CAP interacts with the α
subunit of RNA polymerase to activate transcription.
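The combined effect of glucose and lactose on the lac operon can be summarised as a simple truth table. This is a qualitative sketch only: the three output levels ("off", "low", "high") and the function name are assumptions made for illustration, and real expression levels vary continuously.

```python
def lac_transcription(glucose: bool, lactose: bool) -> str:
    """Qualitative lac operon output from the two control systems.

    Negative control: the repressor is bound to the operator unless lactose
    is present. Positive control: cAMP rises when glucose is absent, and
    cAMP-bound CAP helps RNA polymerase bind the promoter.
    """
    repressor_bound = not lactose
    camp_high = not glucose
    cap_bound = camp_high
    if repressor_bound:
        return "off"
    return "high" if cap_bound else "low"


for glucose in (True, False):
    for lactose in (True, False):
        print(f"glucose={glucose}, lactose={lactose}: "
              f"{lac_transcription(glucose, lactose)}")
```

Only the combination of lactose present and glucose absent gives full induction, which is the behaviour described as catabolite repression.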
Transcriptional Attenuation
Both the positive and negative control mechanisms that we have
discussed act at the level of initiation of transcription. An additional
mechanism, transcriptional attenuation, regulates the expression of some genes
by controlling the ability of RNA polymerase to continue elongation past
specific sites. This mode of regulation has been described best in the E. coli trp
operon, which encodes five enzymes involved in biosynthesis of the amino acid
tryptophan. These genes are expressed only when tryptophan is not available to
the cell in its environment, since otherwise the synthesis of additional
tryptophan is unnecessary.

The trp operon is regulated in part by a repressor that, when bound to
tryptophan, blocks transcription (Figure 6.10). However, transcriptional
attenuation provides an additional level of control that results in more stringent
regulation than could be achieved by repression of initiation alone. The site of
attenuation is located 162 nucleotides downstream of the transcription start site.
If tryptophan is abundant, most transcription terminates at this site; only if
tryptophan is scarce does transcription continue to yield functional Trp mRNA.

Figure 6.10.
Regulation of the tryptophan operon The operon contains five structural
genes involved in the biosynthesis of tryptophan: trpE, D, C, B, and A.
Expression of these genes is controlled at two levels. The trpR gene encodes a
repressor that, in the presence of tryptophan, binds to the operator (o) to block
transcription. In addition, expression is mediated by an attenuator sequence that
prematurely terminates transcription when high levels of tryptophan are
present. In this case, the attenuated RNA consists of only a short leader
sequence (L). P = promoter.
The mechanism of attenuation depends on the fact that translation in
bacteria is coupled with transcription, so ribosomes begin translating the 5′ end
of an mRNA while it is still being synthesized. Thus, the rate of translation can
affect the structure of the growing RNA chain, which in turn determines
whether further transcription can continue. Transcription termination is
signaled by a stem-loop structure that forms by complementary base pairing
between two specific sequences of the growing Trp mRNA chain (Figure 6.11).
This structure forms if translation of the growing chain is proceeding at a
normal rate, as it does when tryptophan is present in adequate supply. If
tryptophan is scarce, however, protein synthesis stalls at a critical region of the
message. If this occurs, the ribosomes bound to the mRNA block formation of
the transcription-terminating stem loop, allowing Trp mRNA synthesis to
continue.
The critical region of Trp mRNA contains two adjacent tryptophan
codons, so the rate of translation is highly dependent on tryptophan levels; this
is the link between transcriptional attenuation and the availability of
tryptophan. If tryptophan levels in the cell are low, the ribosome stalls at this
point and transcription of Trp mRNA continues. If tryptophan is abundant,
translation continues and transcription is terminated.

Figure 6.11.
Mechanism of transcriptional attenuation. The trp mRNA is
translated while still being synthesized. In the presence of high levels of
tryptophan, the ribosomes proceed along the message slightly behind the site of
transcription. Under these conditions, the mRNA regions designated 3 and 4
hybridize to form a stem-loop structure that signals the termination of
transcription. In the presence of low levels of tryptophan, however, the
ribosomes stall at region 1 of the mRNA, which contains two adjacent codons
for tryptophan. In this case, since region 2 is not bound to a ribosome, it is free
to form an alternative stem-loop structure by hybridizing to region 3. This
hybridization prevents formation of the 3–4 stem loop, and transcription is able
to continue past the attenuator sequence.
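The attenuation logic in Figure 6.11 can be summarised as a small decision model. The sketch below is only an illustration: the function name and the assumption that a non-stalled ribosome covers region 2 are simplifications introduced here, not part of the figure.

```python
def attenuation_outcome(tryptophan_abundant):
    """Return (stem_loop, transcription_continues) for the trp leader RNA."""
    # When tryptophan is scarce, the ribosome stalls at the two adjacent
    # Trp codons in region 1. Otherwise it keeps moving and, in this
    # simplified model, covers region 2.
    ribosome_stalled_at_region_1 = not tryptophan_abundant
    region_2_free = ribosome_stalled_at_region_1

    if region_2_free:
        stem_loop = "2-3"  # alternative stem loop, no termination signal
    else:
        stem_loop = "3-4"  # transcription-terminating stem loop

    transcription_continues = stem_loop == "2-3"
    return stem_loop, transcription_continues


for abundant in (True, False):
    loop, continues = attenuation_outcome(abundant)
    print(f"tryptophan abundant={abundant}: stem loop {loop}, continues={continues}")
```

The model deliberately ignores the trpR repressor, which acts separately at the operator.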

TRANSCRIPTION MECHANISMS IN EUKARYOTES
In eukaryotes, there are three classes of RNA polymerases: I, II and III.
This section will focus on RNA polymerase II (Pol II), which is involved in
the transcription of all protein-coding genes. Transcription by RNA Pol I and Pol III is
discussed in Section I.

Figure 4-E-1. Structure of the human TBP core domain complexed
with DNA as determined by x-ray crystallography. The DNA includes the
TATA element. PDB ID = 1CDW.
Initiation
RNA Pol II does not contain a subunit similar to the prokaryotic σ
factor, which can recognize the promoter and unwind the DNA double helix.
In eukaryotes, these two functions are carried out by a set of proteins called
general transcription factors. RNA Pol II is associated with six general
transcription factors, designated TFIIA, TFIIB, TFIID, TFIIE, TFIIF and
TFIIH, where "TF" stands for "transcription factor" and "II" for RNA Pol
II.
TFIID consists of TBP (TATA-box binding protein) and TAFs
(TBP-associated factors). The role of TBP is to bind the core promoter
(Figure 4-E-1). TAFs may assist TBP in this process. In human cells, the TAFs
comprise 12 subunits. One of them, TAF250 (molecular weight 250 kDa), has
histone acetyltransferase activity, which can relieve the binding between DNA
and histones in the nucleosome.
The transcription factor which catalyzes DNA melting is TFIIH.
However, before TFIIH can unwind DNA, the RNA Pol II and at least five
general transcription factors (TFIIA is not absolutely necessary) have to form a
pre-initiation complex (PIC). The order of the PIC assembly is described in
Figure 4-E-2.
Elongation
After PIC is assembled at the promoter, TFIIH can use its helicase
activity to unwind DNA. This requires energy released from ATP hydrolysis.
DNA melting starts at about -10 bp. Then, RNA Pol II uses nucleoside
triphosphates (NTPs) to synthesize an RNA transcript. During RNA elongation,
TFIIF remains attached to the RNA polymerase, but all of the other
transcription factors dissociate from the PIC.
The carboxyl-terminal domain (CTD) of the largest subunit of RNA Pol
II is critical for elongation. In the initiation phase, CTD is unphosphorylated,
but during elongation it has to be phosphorylated. This domain contains many
proline, serine and threonine residues.
Termination
Eukaryotic protein-coding genes contain a poly-A signal located downstream
of the last exon. This signal is used to add a series of adenylate residues during
RNA processing. Transcription often terminates 0.5 to 2 kb downstream of
the poly-A signal, but the mechanism is unclear.

The role of regulatory transcription factors


In the early 1990s, when the mystery of transcriptional regulation in
prokaryotes had been largely unveiled, scientists still knew very little about
the regulatory mechanisms in eukaryotes. The breakthrough came in 1996, when
a number of research groups discovered that certain transcriptional
co-activators are histone acetyltransferases (HATs). It had been known for
some time that binding of transcriptional activators to the enhancer region, in
most cases, is not sufficient to stimulate transcription. Certain co-activators are
also required. Similarly, transcriptional repression often requires both
repressor binding on the silencer element and the participation of co-repressor
proteins. The precise role of these co-activators and co-repressors was not
clear until 1996.
In eukaryotes, the association between DNA and histones prevents
access of the polymerase and general transcription factors to the promoter.
Histone acetylation catalyzed by HATs can relieve the binding between DNA
and histones. Although a subunit of TFIID (TAF250 in human) has HAT
activity, participation of other HATs can make transcription more efficient.
The following rules apply to most (but not all) cases:
• Binding of activators to the enhancer element recruits HATs to relieve
association between histones and DNA, thereby enhancing transcription.
• Binding of repressors to the silencer element recruits histone deacetylases
(denoted by HDs or HDACs) to tighten association between histones and
DNA, thereby repressing transcription.
Transcription process
Eukaryotes have three nuclear RNA polymerases, each with distinct
roles and properties:

Name                                  Location                           RNA transcribed

RNA Polymerase I (Pol I, Pol A)       nucleolus                          larger ribosomal RNAs (rRNA):
                                                                         28S, 18S, 5.8S

RNA Polymerase II (Pol II, Pol B)     nucleus                            messenger RNA (mRNA) and most
                                                                         small nuclear RNAs (snRNAs)

RNA Polymerase III (Pol III, Pol C)   nucleus (and possibly the          transfer RNA (tRNA) and other
                                      nucleolus-nucleoplasm interface)   small RNAs (including the
                                                                         small 5S rRNA)

There are many eukaryotes that differ from the canonical presentation
of the roles of RNA polymerases. Certain organisms possess four distinct RNA
polymerases. Other organisms utilize RNA polymerase I to transcribe certain
protein-coding genes in addition to rRNAs.
Transcription regulation
The regulation of gene expression is achieved through the interaction of
several levels of control including the regulation of transcription initiation.
Most (not all) eukaryotes possess robust methods of regulating transcription
initiation on a gene-by-gene basis. The transcription of a gene can be regulated
by cis-acting elements within the regulatory regions of the DNA, and trans-
acting factors that include transcription factors and the basal transcription
complex.
Splicing
Two types of splicing, cis-splicing and trans-splicing, use the same
splicing machinery to cleave RNAs at specific points and rejoin them to form
new combinations once transcribed. Although most eukaryotes possess splicing
machinery, the extent of cis- and trans-splicing varies from organism to
organism.
Cis-splicing
Primary (initial) mRNA transcripts are synthesized as larger precursor
RNAs that are processed by splicing out introns (non-coding sequences) and
ligating exons (non-contiguous coding sequences) into the mature mRNA.
Primary transcripts for some genes can be large. The primary transcripts of the
neurexin genes, for instance, are as large as 1.7 megabases (1,700,000 bases),
while the mature (processed) neurexin mRNAs are under 10 kilobases (10,000
bases), with as many as 24 exons and thousands of possible alternative splice
variants that produce proteins with different activities. Alternative splicing is
now thought to occur in as many as 60% of human genes, drastically
increasing the potential variety of proteins produced.
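Cis-splicing and alternative splicing can be illustrated with a short sketch. The sequence, the exon coordinates and the function name below are hypothetical and chosen only for illustration; real splice-site recognition by the spliceosome is far more involved.

```python
def splice(pre_mrna, exons):
    """Join the exon segments in order; the introns between them are dropped.

    `exons` holds (start, end) pairs, 0-based with `end` exclusive.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical 36-nt primary transcript: exon1-intron1-exon2-intron2-exon3
pre_mrna = "AUGGCC" + "GUAAGUCAG" + "UUUAAA" + "GUGAGUCAG" + "GGGUGA"
exon_1, exon_2, exon_3 = (0, 6), (15, 21), (30, 36)

# Constitutive splicing keeps every exon.
full_length = splice(pre_mrna, [exon_1, exon_2, exon_3])
# One alternative splice variant skips exon 2.
exon_2_skipped = splice(pre_mrna, [exon_1, exon_3])

print(full_length)     # AUGGCCUUUAAAGGGUGA
print(exon_2_skipped)  # AUGGCCGGGUGA
```

Choosing different exon subsets from the same primary transcript is what lets one gene yield several mature mRNAs.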
Trans-splicing
Observed in a range of different eukaryotes (most conspicuously the worm
C. elegans and a group of parasitic protists called kinetoplastids),
trans-splicing is a process in which an exon from one RNA molecule
is spliced onto the 5' end of a completely separate molecule
post-transcriptionally. While relatively unimportant in many eukaryotes,
this process is ubiquitous in the biology of some organisms. In kinetoplastids,
for example, every single nuclear-encoded message must be trans-spliced
before translation of the message can occur.
The promoter
Promoters used by RNA polymerase II have different structures
depending upon the particular combination of transcription factors that are
required to build a functional transcriptional complex at each promoter.
Nevertheless, these different structures can be viewed as a combination of a
relatively limited number of specific sequence elements.
Some of the common elements that have been described in class II eukaryotic
promoters are the following:
• The TATA box, located approximately 25 bp upstream of the startpoint
of transcription, is found in many promoters. The consensus
sequence of this element is TATAAAA (so it resembles the TATAAT
sequence of the prokaryotic -10 region, but the two should not be confused).
The TATA box appears to be more important for selecting the startpoint
of transcription (i.e. positioning the enzyme) than for defining the
promoter.
• The Initiator is a sequence that is found in many promoters and defines
the startpoint of transcription.
• The GC box is a common element in eukaryotic class II promoters. Its
consensus sequence is GGGCGG. It may be present in one or more
copies, which can be located between 40 and 100 bp upstream of the
startpoint of transcription. The transcription factor Sp1 binds to the GC
box.
• The CAAT box (consensus sequence CCAAT) is also often found
between 40 and 100 bp upstream of the startpoint of transcription. The
transcription factor CTF or NF1 binds to the CAAT box.
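The consensus sequences listed above can be located in an upstream region with a simple scan. The following is a minimal sketch, assuming exact matches to the consensus; the function name and the test sequence (padded with the placeholder base N) are made up for illustration, and real promoter elements tolerate mismatches.

```python
import re

# Consensus sequences quoted in the list above
ELEMENTS = {"TATA box": "TATAAAA", "GC box": "GGGCGG", "CAAT box": "CCAAT"}


def find_elements(upstream):
    """Return (element, position) pairs for exact consensus matches.

    `upstream` is the sequence ending just before the startpoint, so its
    last base is position -1 and its first base is -len(upstream).
    """
    hits = []
    for name, consensus in ELEMENTS.items():
        # A lookahead finds every occurrence, including overlapping ones.
        for match in re.finditer(f"(?={consensus})", upstream):
            hits.append((name, match.start() - len(upstream)))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical 100-bp upstream region
upstream = "N" * 10 + "GGGCGG" + "N" * 9 + "CCAAT" + "N" * 39 + "TATAAAA" + "N" * 24
print(find_elements(upstream))
# [('GC box', -90), ('CAAT box', -75), ('TATA box', -31)]
```

In this made-up sequence the TATA box spans positions -31 to -25, and the GC and CAAT boxes fall in the 40 to 100 bp window described above.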
The following diagram shows some examples of eukaryotic promoters and
the combination of sequence elements that they contain:

In addition to the above elements, enhancers may be required for full
expression. These elements are not part of the promoter per se. They can be
located upstream or downstream of the promoter and may be quite far away
from it. The mechanism by which they work is not known. They may provide
an entry point for RNA polymerase or they may bind other proteins that assist
RNA polymerase to bind to the promoter region.
RNA polymerase

RNAP from T. aquaticus pictured during elongation. Portions of the
enzyme were made transparent so as to make the path of RNA and DNA more
clear. The magnesium ion (yellow) is located at the enzyme active site.
RNA polymerase (RNAP or RNApol) is an enzyme that produces
RNA. In cells, RNAP is needed for constructing RNA chains using DNA genes
as templates, a process called transcription. RNA polymerase enzymes are
essential to life and are found in all organisms and many viruses. In chemical
terms, RNAP is a nucleotidyl transferase that polymerizes ribonucleotides at
the 3' end of an RNA transcript.
Control of transcription

An electron-micrograph of DNA strands decorated by hundreds of
RNAP molecules too small to be resolved. Each RNAP is transcribing an RNA
strand which can be seen branching off of the DNA. "Begin" indicates the 3'
end of the DNA, where RNAP initiates transcription; "End" indicates the 5'
end, where the longer RNA molecules are almost completely transcribed.
Control of the process of gene transcription affects patterns of gene
expression and thereby allows a cell to adapt to a changing environment,
perform specialized roles within an organism, and maintain basic metabolic
processes necessary for survival. Therefore, it is hardly surprising that the
activity of RNAP is both complex and highly regulated. In Escherichia coli
bacteria, more than 100 transcription factors have been identified which modify
the activity of RNAP.[4]
RNAP can initiate transcription at specific DNA sequences known as
promoters. It then produces an RNA chain which is complementary to the
template DNA strand. The process of adding nucleotides to the RNA strand is
known as elongation. In eukaryotes, RNAP can build chains as long as 2.4
million nucleotides (the full length of the dystrophin gene). RNAP will
preferentially release its RNA transcript at specific DNA sequences encoded at
the end of genes known as terminators.
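The complementarity between the template strand and the RNA product can be shown in a few lines. This is a minimal sketch with a made-up template and a hypothetical function name; it ignores promoters and terminators entirely.

```python
# Watson-Crick pairing of a DNA template base with the incoming ribonucleotide
PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Return the RNA (written 5'->3') made from a template written 3'->5'."""
    return "".join(PAIRING[base] for base in template_3_to_5)


print(transcribe("TACGGCATT"))  # AUGCCGUAA
```

Because the template is read 3' to 5' while the RNA grows 5' to 3', no reversal is needed when the template is written in the 3' to 5' direction.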
Products of RNAP include:
• Messenger RNA (mRNA): template for the synthesis of proteins by
ribosomes.
• Non-coding RNA or "RNA genes": a broad class of genes that encode
RNA that is not translated into protein. The most prominent examples
of RNA genes are transfer RNA (tRNA) and ribosomal RNA (rRNA),
both of which are involved in the process of translation. However, since
the late 1990s, many new RNA genes have been found, and thus RNA
genes may play a much more significant role than previously thought.
  o Transfer RNA (tRNA): transfers specific amino acids to
    growing polypeptide chains at the ribosomal site of protein
    synthesis during translation
  o Ribosomal RNA (rRNA): a component of ribosomes
  o MicroRNA (miRNA): regulates gene activity
  o Catalytic RNA (ribozyme): enzymatically active RNA
    molecules
RNAP accomplishes de novo synthesis. It is able to do this because
specific interactions with the initiating nucleotide hold RNAP rigidly in place,
facilitating chemical attack on the incoming nucleotide. Such specific
interactions explain why RNAP prefers to start transcripts with ATP (followed
by GTP, UTP, and then CTP). In contrast to DNA polymerase, RNAP includes
helicase activity, so no separate enzyme is needed to unwind DNA.
RNA polymerase action
Binding and initiation
RNA Polymerase binding in prokaryotes involves the α subunit
recognizing the upstream element (-40 to -70 base pairs) in DNA, as well as the
σ factor recognizing the -10 to -35 region. There are numerous σ factors that
regulate gene expression. For example, σ70 is expressed under normal
conditions and allows RNAP binding to house-keeping genes, while σ32 elicits
RNAP binding to heat-shock genes.
After binding to the DNA, the RNA polymerase switches from a closed
complex to an open complex. This change involves the separation of the DNA
strands to form an unwound section of DNA of approximately 13bp.
Ribonucleotides are base-paired to the template DNA strand, according to
Watson-Crick base-pairing interactions. Supercoiling plays an important part in
polymerase activity because of the unwinding and rewinding of DNA. Because
regions of DNA in front of RNAP are unwound, there are compensatory positive
supercoils. Regions behind RNAP are rewound, and negative supercoils are
present.
Elongation
Transcription elongation involves the further addition of ribonucleotides
and the change of the open complex to the transcriptional complex. At first, RNAP
cannot form full-length transcripts because of its strong binding to
the promoter. Transcription at this stage primarily results in short RNA fragments
of around 9 nt in a process known as abortive transcription. Once RNAP
starts forming longer transcripts, it clears the promoter. At this point, the -10 to
-35 promoter region is disrupted, and the σ factor falls off RNAP. This allows
the rest of the RNAP complex to move forward, as the σ factor held the RNAP
complex in place.
The 17 bp transcriptional complex has an 8 bp DNA-RNA hybrid, that
is, 8 base-pairs involve the RNA transcript bound to the DNA template strand.
As transcription progresses, ribonucleotides are added to the 3' end of the RNA
transcript and the RNAP complex moves along the DNA. Although RNAP
does not seem to have the 3' exonuclease activity that characterizes the
proofreading activity found in DNA polymerase, there is evidence that
RNAP will halt at mismatched base-pairs and correct them.
The addition of ribonucleotides to the RNA transcript has a very similar
mechanism to DNA polymerization - it is believed that these polymerases are
evolutionarily related. Aspartyl (asp) residues in the RNAP will hold onto Mg2+
ions, which will in turn coordinate the phosphates of the ribonucleotides. The
first Mg2+ will hold onto the α-phosphate of the NTP to be added. This allows
the nucleophilic attack of the 3'OH from the RNA transcript, adding an
additional NTP to the chain. The second Mg2+ will hold onto the pyrophosphate
of the NTP. The overall reaction equation is:
(NMP)_n + NTP → (NMP)_{n+1} + PP_i
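The stoichiometry of this reaction can be traced step by step. The sketch below is only a bookkeeping illustration (the function name is hypothetical): each NTP contributes one nucleotide to the 3' end of the chain and releases one pyrophosphate.

```python
def elongate(transcript, ntps):
    """Add each NTP to the 3' end; return the new transcript and the PPi count."""
    ppi_released = 0
    for ntp in ntps:
        if ntp not in ("ATP", "GTP", "CTP", "UTP"):
            raise ValueError(f"not a ribonucleoside triphosphate: {ntp}")
        transcript += ntp[0]  # the NMP that stays in the chain
        ppi_released += 1     # one pyrophosphate leaves per addition
    return transcript, ppi_released


print(elongate("AUG", ["CTP", "GTP", "UTP"]))  # ('AUGCGU', 3)
```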
Termination
Termination of RNA transcription can be rho-independent or rho-dependent:
Rho-independent transcription termination is the termination of
transcription without the aid of the rho protein. Transcription of a palindromic
region of DNA causes the formation of a hairpin structure from the RNA
transcript looping and binding upon itself. This hairpin structure is often
rich in G-C base-pairs, making it more stable than the DNA-RNA hybrid itself.
As a result, the 8 bp DNA-RNA hybrid in the transcription complex shifts to a
4 bp hybrid. These last 4 base-pairs are weak A-U base-pairs,
so the entire RNA transcript falls off.
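The signature of a rho-independent terminator described above (a self-complementary stretch that can fold into a hairpin, followed by a run of U residues) can be searched for in a transcript. This is a rough sketch under stated assumptions: the fixed stem length, loop length and U-run length are arbitrary parameters chosen here, and the example RNA is made up.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the sequence that would base-pair with `rna` in a hairpin stem."""
    return rna.translate(COMPLEMENT)[::-1]


def find_terminator(rna, stem=6, loop=4, u_run=4):
    """Return the start index of the first stem-loop-stem-U-run, or -1."""
    motif_length = 2 * stem + loop + u_run
    for i in range(len(rna) - motif_length + 1):
        left = rna[i:i + stem]
        right = rna[i + stem + loop:i + 2 * stem + loop]
        tail = rna[i + 2 * stem + loop:i + motif_length]
        if right == reverse_complement(left) and tail == "U" * u_run:
            return i
    return -1


# Hypothetical transcript end: GC-rich stem, GAAA loop, then four U residues
rna = "AA" + "GCCGCC" + "GAAA" + "GGCGGC" + "UUUU" + "AA"
print(find_terminator(rna))  # 2
```

A real terminator search would allow variable stem and loop lengths and imperfect pairing.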
RNA polymerase in bacteria
In bacteria, the same enzyme catalyzes the synthesis of mRNA and ncRNA.
RNAP is a relatively large molecule. The core enzyme has 5 subunits (~400
kDa):
• α2: the two α subunits assemble the enzyme and recognize regulatory
factors. Each subunit has two domains: αCTD (C-terminal domain)
binds the UP element of the extended promoter, and αNTD (N-terminal
domain) binds the rest of the polymerase. The αCTD is not used on
promoters without an UP element.
• β: has the polymerase activity (catalyzes the synthesis of RNA),
which includes chain initiation and elongation.
• β': binds to DNA (nonspecifically).
• ω: restores denatured RNA polymerase to its functional form in vitro. It
has been observed to offer a protective/chaperone function to the β'
subunit in Mycobacterium smegmatis, and is now known to promote
assembly.
In order to bind promoter-specific regions, the core enzyme requires
another subunit, sigma (σ). The sigma factor greatly reduces the affinity of
RNAP for nonspecific DNA while increasing specificity for certain promoter
regions, depending on the sigma factor. That way, transcription is initiated at
the right region. The complete holoenzyme therefore has 6 subunits: α2ββ'σω
(~480 kDa). The structure of RNAP exhibits a groove with a length of 55 Å
(5.5 nm) and a diameter of 25 Å (2.5 nm). This groove comfortably fits the 20 Å (2
nm) wide double helix of DNA. The 55 Å (5.5 nm) length can accept 16
nucleotides.
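The figure of 16 nucleotides follows from the geometry of the double helix. The short calculation below assumes the standard rise of about 3.4 Å per base pair for B-form DNA, a value that is not stated in this passage.

```python
RISE_PER_BASE_PAIR = 3.4  # Å per base pair in B-form DNA (assumed value)


def base_pairs_spanned(length_angstrom, rise=RISE_PER_BASE_PAIR):
    """Return how many base pairs fit along a stretch of the given length."""
    return length_angstrom / rise


print(round(base_pairs_spanned(55.0), 1))  # 16.2
```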
When not in use RNA polymerase binds to low affinity sites to allow
rapid exchange for an active promoter site when one opens. RNA polymerase
holoenzyme, therefore, does not freely float around in the cell when not in use.
The real E. coli RNA Polymerase
In 1960, the true enzyme was identified by four separate groups: Sam
Weiss at the University of Chicago, Jerard Hurwitz, A. Stevens and J.
Bonner. This enzyme required a template, used all four rNTPs as substrates,
synthesized a product with a composition similar to that of the template,
and required Mg2+.

DNA-directed RNA polymerase catalyzes the synthesis of RNA on a
DNA template following similar nucleophilic attack chemistry as DNA
polymerase.
For a long time, E. coli was thought to contain just a single RNA
polymerase. We now know that, although it has one major or principal RNA
polymerase, it also contains (at least) six minor forms that are required or are
used in special circumstances.
As with DNA polymerase III, there are two forms of RNA polymerase:
Core Enzyme
The core enzyme has four different polypeptide subunits: alpha (α), beta (β),
beta' (β'), and omega (ω) in the stoichiometry α2ββ'ω. The omega subunit was
for many years considered a curiosity since no function could be ascribed to it
and it did not appear to be necessary for function. However, it is now known that
omega is necessary to restore denatured RNA polymerase in vitro to its fully
functional form. It may function by binding simultaneously to the N-terminus
and C-terminus of the β' subunit. The omega subunit is a part of the Thermus
aquaticus enzyme whose structure was determined in May 2002.
Core RNA polymerase can bind to DNA and catalyze the synthesis of
RNA but it has no specificity. This form of the enzyme will bind (quite well,
really) to non-specific DNA but it cannot recognize promoters.
Holo Enzyme
The RNA polymerase holoenzyme contains an additional subunit, sigma
(σ). The sigma subunit does two things:
• It reduces the affinity of the enzyme for non-specific DNA.
• It greatly increases the affinity of the enzyme for promoters.
The following table summarizes the subunits of E. coli RNA polymerase
and their properties:

Subunit      Size (aa)   Size (Da)   Gene   Function

alpha (α)    329         36511       rpoA   required for assembly of the enzyme; interacts with
                                            some regulatory proteins; also involved in catalysis

beta (β)     1342        150616      rpoB   involved in catalysis: chain initiation and elongation

beta' (β')   1407        155159      rpoC   binds to the DNA template

sigma (σ)    613         70263       rpoD   directs enzyme to the promoter

omega (ω)    91          10237       rpoZ   required to restore denatured RNA polymerase in vitro
                                            to its fully functional form
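As a quick consistency check, the subunit masses in the table can be added up according to the α2ββ'ω and α2ββ'σω stoichiometries. The sketch below uses the table values as given; the dictionary and function names are chosen here for illustration.

```python
# (mass in Da, copies per enzyme) taken from the table above
SUBUNITS = {
    "alpha": (36511, 2),
    "beta": (150616, 1),
    "beta'": (155159, 1),
    "omega": (10237, 1),
    "sigma": (70263, 1),
}


def total_mass(names):
    """Sum mass x copy number for the listed subunits."""
    return sum(SUBUNITS[name][0] * SUBUNITS[name][1] for name in names)


core = total_mass(["alpha", "beta", "beta'", "omega"])
holoenzyme = core + total_mass(["sigma"])
print(core, holoenzyme)  # 389034 459297
```

The sums (about 389 kDa and 459 kDa) are of the same order as the rounded figures of roughly 400 kDa and 480 kDa quoted for the core enzyme and holoenzyme.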


The subunits of RNA polymerase assemble into the same "hand"-like
structure as DNA polymerases.
The structure of the Thermus aquaticus RNA polymerase holoenzyme
has recently been solved to 4 Å resolution. It is clear that the sigma subunit is
intimately associated with the enzyme and literally "buries" into the interior of
the complex.
Transcriptional cofactors
There are a number of proteins which can bind to RNAP and modify its
behavior. For instance, GreA and GreB, found in E. coli and most other
prokaryotes, can enhance the ability of RNAP to cleave the RNA transcript near
the growing end of the chain. This cleavage can rescue a stalled polymerase
molecule, and is likely involved in proofreading the occasional mistakes made
by RNAP. A separate cofactor, Mfd, is involved in transcription-coupled
repair, the process in which RNAP recognizes damaged bases in the DNA
template and recruits enzymes to restore the DNA. Other cofactors are known
to play regulatory roles, i.e. they help RNAP choose whether or not to express
certain genes.
RNA polymerase in eukaryotes
Structure of eukaryotic RNA polymerase II (light blue) in complex with
α-amanitin (red), a strong poison found in death cap mushrooms that targets
this vital enzyme
Eukaryotes have several types of RNAP, characterized by the type of
RNA they synthesize:
• RNA polymerase I synthesizes a 45S pre-rRNA, which matures into
28S, 18S and 5.8S rRNAs, which will form the major RNA sections of
the ribosome.
• RNA polymerase II synthesizes precursors of mRNAs and most snRNAs
and microRNAs. This is the most studied type, and due to the high level
of control required over transcription, a range of transcription factors is
required for its binding to promoters.
• RNA polymerase III synthesizes tRNAs, 5S rRNA and other small
RNAs found in the nucleus and cytosol.
• RNA polymerase IV synthesizes siRNA in plants.
There are other RNA polymerase types in mitochondria and
chloroplasts. There are also RNA-dependent RNA polymerases involved in
RNA interference.
TRANSCRIPTION FACTORS
The basal transcription factors are typically defined as the minimal
complement of proteins necessary to reconstitute accurate transcription from a
minimal promoter (such as a TATA element or initiator sequence). They are
distinct from the regulatory transcription factors, which bind to sequences
farther away from the initiation site and serve to modulate levels of
transcription. This regulation presumably occurs through interactions between
the regulatory and basal transcription factors, although there is a great deal of
controversy about the identity of the regulatory factors' "target(s)".
The basal transcription complex assembles through an extensive series
of protein-protein interactions. Although the basal factors can assemble on the
promoter in a step-wise manner in vitro, there is some evidence that many of
the factor interactions can occur in the absence of DNA and that some of the
factors may pre-assemble into a "holoenzyme".
TFIIA
TFIIB
TFIID/TATA Binding Protein
TFIIE
TFIIF
TFIIH
TFIIK (CTD kinase)
As of its 2001 release, TRANSFAC, a transcription factor database,
contained 2785 entries. Many of these are homologous proteins from
different species; nevertheless, this number is indicative of the vast number of
transcription factors now known that regulate the expression of eukaryotic
genes. Any detailed treatment of these factors is well beyond the scope of this
course.



Transcription factors are the ultimate targets of cell-signalling
pathways. Whenever cells need to respond to an extracellular signal such as a
hormone, the response is mediated by a change in gene expression that comes
about, most often, as the result of a change in the phosphorylation state of a
transcription factor.
For example, growth hormone binding to its receptor catalyses the
autophosphorylation of a tyrosine residue in the cytoplasmic domain of the
receptor. This, in turn, is recognised by the SH2 (Src homology 2) domain of a
cytoplasmic response protein which through its further interactions activates
the Ras protein. Ras is a G protein that is active when GTP is bound but
inactive when GDP is bound. Ras then activates a series of kinases until,
finally, one of these migrates to the nucleus where it phosphorylates a
transcription factor such as Fos, Jun or Myc.
The importance of transcription factors in the regulation of gene
expression can be illustrated by looking at the regulation of lipid metabolism.
Over 30 different transcription factors are involved in the regulation of lipid
metabolism genes. The PPAR and SREBP families of transcription factors are
particularly important. Fatty acid oxidation is regulated by PPAR factors;
cholesterol homeostasis is regulated by SREBP factors.
The following diagrams illustrate networks of genes regulated by PPAR
factors (which are Zn finger proteins) [LEFT] and a network of cholesterol
metabolism genes controlled by SREBP [RIGHT]:
Here and in Fig. 2, circles indicate proteins; rectangles show genes
coding for these proteins. ACO, acyl-CoA oxidase; ACS, acyl-coenzyme A
synthetase; apoAI - apolipoprotein AI; apoCIII - apolipoprotein CIII; GR -
glucocorticoid receptor; HD, Hydratase-dehydrogenase; HNF4, hepatocyte
nuclear factor 4; PPAR - peroxisome proliferator activated receptor.

Fig. 2.

ACC, acetyl coenzyme A carboxylase; f.a. - fatty acids; FAS, fatty
acid synthase; FDPS - farnesyl diphosphate synthase; HMG-CoA-R, 3-
hydroxy-3-methylglutaryl CoA reductase; HMG-CoA-S, 3-hydroxy-3-
methylglutaryl CoA synthase; LDL, low density lipoprotein; LDLR, low
density lipoprotein receptor; preSREBP sterol regulatory element-1 binding
protein precursor; SREBP - sterol regulatory element-1 binding protein; SRP -
sterol-regulated protease; SS, squalene synthase; VLDL, very low density
lipoprotein.
Eukaryotic RNA Polymerases and General Transcription Factors
Although transcription proceeds by the same fundamental mechanisms
in all cells, it is considerably more complex in eukaryotic cells than in bacteria.
This is reflected in two distinct differences between the prokaryotic and
eukaryotic systems. First, whereas all genes are transcribed by a single RNA
polymerase in bacteria, eukaryotic cells contain multiple different RNA
polymerases that transcribe distinct classes of genes. Second, rather than
binding directly to promoter sequences, eukaryotic RNA polymerases need to
interact with a variety of additional proteins to specifically initiate
transcription. This increased complexity of eukaryotic transcription presumably
facilitates the sophisticated regulation of gene expression needed to direct the
activities of the many different cell types of multicellular organisms.
Eukaryotic RNA Polymerases
Eukaryotic cells contain three distinct nuclear RNA polymerases that
transcribe different classes of genes (Table 6.1). Protein-coding genes are
transcribed by RNA polymerase II to yield mRNAs; ribosomal RNAs (rRNAs)
and transfer RNAs (tRNAs) are transcribed by RNA polymerases I and III.
RNA polymerase I is specifically devoted to transcription of the three largest
species of rRNAs, which are designated 28S, 18S, and 5.8S according to their
rates of sedimentation during velocity centrifugation. RNA polymerase III
transcribes the genes for tRNAs and for the smallest species of ribosomal RNA
(5S rRNA). Some of the small RNAs involved in splicing and protein transport
(snRNAs and scRNAs) are also transcribed by RNA polymerase III, while
others are polymerase II transcripts. In addition, separate RNA polymerases
(which are similar to bacterial RNA polymerases) are found in chloroplasts and
mitochondria, where they specifically transcribe the DNAs of those organelles.

Table 6.1. Classes of Genes Transcribed by Eukaryotic RNA Polymerases

Type of RNA synthesized        RNA polymerase

Nuclear genes
  mRNA                         II
  tRNA                         III
  rRNA
    5.8S, 18S, 28S             I
    5S                         III
  snRNA and scRNA              II and III
Mitochondrial genes            Mitochondrial
Chloroplast genes              Chloroplast
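The nuclear part of Table 6.1 can also be written as a lookup structure, which makes it easy to ask the reverse question (which RNAs does a given polymerase make?). The names below are chosen here for illustration only.

```python
# Nuclear RNA classes and the polymerase(s) that transcribe them (Table 6.1)
POLYMERASES_FOR = {
    "mRNA": {"II"},
    "tRNA": {"III"},
    "28S rRNA": {"I"},
    "18S rRNA": {"I"},
    "5.8S rRNA": {"I"},
    "5S rRNA": {"III"},
    "snRNA": {"II", "III"},
    "scRNA": {"II", "III"},
}


def transcripts_of(polymerase):
    """Return the RNA classes transcribed by the given nuclear polymerase."""
    return sorted(rna for rna, pols in POLYMERASES_FOR.items() if polymerase in pols)


print(transcripts_of("I"))    # ['18S rRNA', '28S rRNA', '5.8S rRNA']
print(transcripts_of("III"))  # ['5S rRNA', 'scRNA', 'snRNA', 'tRNA']
```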

All three of the nuclear RNA polymerases are complex enzymes,
consisting of 8 to 14 different subunits each. Although they recognize different
promoters and transcribe distinct classes of genes, they share several common
features. The two largest subunits of all three eukaryotic RNA polymerases are
related to the β and β′ subunits of the single E. coli RNA polymerase. In
addition, five subunits of the eukaryotic RNA polymerases are common to all
three different enzymes. Consistent with these structural similarities, the
different eukaryotic polymerases share several functional properties, including
the need to interact with other proteins to appropriately initiate
transcription.
General Transcription Factors and Initiation of Transcription by RNA
Polymerase II
Because RNA polymerase II is responsible for the synthesis of mRNA
from protein-coding genes, it has been the focus of most studies of transcription
in eukaryotes. Early attempts at studying this enzyme indicated that its activity
is different from that of prokaryotic RNA polymerase. The accurate
transcription of bacterial genes that can be accomplished in vitro simply by the
addition of purified RNA polymerase to DNA containing a promoter is not
possible in eukaryotic systems. The basis of this difference was elucidated in
1979, when Robert Roeder and his colleagues discovered that RNA polymerase
II is able to initiate transcription only if additional proteins are added to the
reaction. Thus, transcription in the eukaryotic system appeared to require
distinct initiation factors that (in contrast to bacterial σ factors) were not
associated with the polymerase.
Biochemical fractionation of nuclear extracts has now led to the
identification of specific proteins (called transcription factors) that are required
for RNA polymerase II to initiate transcription. Indeed, the identification and
characterization of these factors represents a major part of ongoing efforts to
understand transcription in eukaryotic cells. Two general types of transcription
factors have been defined. General transcription factors are involved in
transcription from all polymerase II promoters and therefore constitute part of
the basic transcription machinery. Additional transcription factors (discussed
later in the chapter) bind to DNA sequences that control the expression of
individual genes and are thus responsible for regulating gene expression.
polymerase II promoters have a TATA box (consensus sequence TATAA) 25
to 30 nucleotides upstream of the transcription start site. This sequence is
recognized by transcription factor TFIID, which consists of the TATA-binding
protein (TBP) and TBP-associated factors (TAFs). TFIIB(B) then binds to
TBP, followed by binding of the polymerase in association with TFIIF(F).
Finally, TFIIE(E) and TFIIH(H) associate with the complex.
Five general transcription factors are required for initiation of
transcription by RNA polymerase II in reconstituted in vitro systems (Figure
6.12). The promoters of many genes transcribed by polymerase II contain a
sequence similar to TATAA 25 to 30 nucleotides upstream of the transcription
start site. This sequence (called the TATA box) resembles the -10 sequence
element of bacterial promoters, and the results of introducing mutations into
TATAA sequences have demonstrated their role in the initiation of
transcription. The first step in formation of a transcription complex is the
binding of a general transcription factor called TFIID to the TATA box (TF
indicates transcription factor; II indicates polymerase II). TFIID is itself
composed of multiple subunits, including the TATA-binding protein (TBP),
which binds specifically to the TATAA consensus sequence, and 10-12 other
polypeptides, called TBP-associated factors (TAFs). TBP then binds a second
general transcription factor (TFIIB) forming a TBP-TFIIB complex at the
promoter (Figure 6.13). TFIIB in turn serves as a bridge to RNA polymerase,
which binds to the TBP-TFIIB complex in association with a third factor,
TFIIF.
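The positional rule described above (a TATAA-like sequence 25 to 30 nucleotides upstream of the start site) can be illustrated with a short Python sketch. The function name, the toy sequence and the exact-match test are assumptions made for illustration only; real TATA boxes vary around the consensus, and TBP recognition is not a simple string match.

```python
def find_tata_box(dna, tss_index, motif="TATAA", window=(25, 30)):
    """Return how far upstream of the start site the motif begins, or None.

    dna       -- coding-strand sequence written 5' to 3'
    tss_index -- 0-based index of the first transcribed nucleotide
    window    -- (nearest, farthest) allowed distance upstream, in nucleotides
    """
    nearest, farthest = window
    for distance in range(nearest, farthest + 1):
        start = tss_index - distance
        if start < 0:
            break  # ran off the 5' end of the sequence
        if dna[start:start + len(motif)] == motif:
            return distance
    return None


# Hypothetical promoter: 10 filler bases, TATAA, 23 filler bases, then the start site.
promoter = "G" * 10 + "TATAA" + "C" * 23 + "ATGGCC"
tss = 10 + 5 + 23  # index of the first transcribed base

print(find_tata_box(promoter, tss))  # 28
```

Here the motif begins 28 nucleotides upstream of the start site, inside the 25 to 30 nucleotide window quoted in the text.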

Figure 6.12. Formation of a polymerase II transcription complex
Following recruitment of RNA polymerase II to the promoter, the
binding of two additional factors (TFIIE and TFIIH) is required for initiation of
transcription. TFIIH is a multisubunit factor that appears to play at least two
important roles. First, two subunits of TFIIH are helicases, which may unwind
DNA around the initiation site. (These subunits of TFIIH are also required for
nucleotide excision repair, as discussed in Chapter 5.) Another subunit of
TFIIH is a protein kinase that phosphorylates repeated sequences present in the
C-terminal domain of the largest subunit of RNA polymerase II.
Phosphorylation of these sequences is thought to release the polymerase from
its association with the initiation complex, allowing it to proceed along the
template as it elongates the growing RNA chain.

In addition to a TATA box, the promoters of many genes transcribed by
RNA polymerase II contain a second important sequence element (an initiator,
or Inr, sequence) that spans the transcription start site. Moreover, some RNA
polymerase II promoters contain only an Inr element, with no TATA box.
Initiation at these promoters still requires TFIID (and TBP), even though TBP
obviously does not recognize these promoters by binding directly to the TATA
sequence. Instead, other subunits of TFIID (TAFs) appear to bind to the Inr
sequences. This binding recruits TBP to the promoter, and TFIIB, polymerase
II, and additional transcription factors then assemble as already described. TBP
thus plays a central role in initiating polymerase II transcription, even on
promoters that lack a TATA box.
Despite the development of in vitro systems and the characterization of
several general transcription factors, much remains to be learned concerning
the mechanism of polymerase II transcription in eukaryotic cells. The
sequential recruitment of transcription factors described here represents the
minimal system required for transcription in vitro; additional factors may be
needed within the cell. Furthermore, RNA polymerase II appears to be able to
associate with some transcription factors in vivo prior to the assembly of a
transcription complex on DNA. In particular, preformed complexes of RNA
polymerase II with TFIIB, TFIIE, TFIIF, TFIIH, and other transcriptional
regulatory proteins have been detected in both yeast and mammalian cells.
These large complexes (called polymerase II holoenzymes) can be recruited to
a promoter via direct interaction with TFIID (Figure 6.14). The relative
contributions of stepwise assembly of individual factors versus recruitment of
the RNA polymerase II holoenzyme to promoters within the cell thus remain to
be determined.

Figure 6.14. RNA polymerase II holoenzyme. The holoenzyme consists of a
preformed complex of RNA polymerase II, the general transcription factors
TFIIB, TFIIE, TFIIF, and TFIIH, and several other proteins that activate
transcription. This complex can be recruited directly to a promoter via
interaction with TFIID (TBP + TAFs).
Transcription by RNA Polymerases I and III
As previously discussed, distinct RNA polymerases are responsible for
the transcription of genes encoding ribosomal and transfer RNAs in eukaryotic
cells. All three RNA polymerases, however, require additional transcription
factors to associate with appropriate promoter sequences. Furthermore,
although the three different polymerases in eukaryotic cells recognize distinct
types of promoters, a common transcription factor—the TATA-binding protein
(TBP)—appears to be required for initiation of transcription by all three
enzymes.
RNA polymerase I is devoted solely to the transcription of ribosomal
RNA genes, which are present in tandem repeats. Transcription of these genes
yields a large 45S pre-rRNA, which is then processed to yield the 28S, 18S,
and 5.8S rRNAs (Figure 6.15). The promoter of ribosomal RNA genes spans
about 150 base pairs just upstream of the transcription initiation site. These
promoter sequences are recognized by two transcription factors, UBF
(upstream binding factor) and SL1 (selectivity factor 1), which bind
cooperatively to the promoter and then recruit polymerase I to form an
initiation complex (Figure 6.16). The SL1 transcription factor is composed of
four protein subunits, one of which, surprisingly, is TBP. The role of TBP has
been demonstrated directly by the finding that yeasts carrying mutations in
TBP are defective not only for transcription by polymerase II, but also for
transcription by polymerases I and III. Thus, TBP is a common transcription
factor required by all three classes of eukaryotic RNA polymerases. Since the
promoter for ribosomal RNA genes does not contain a TATA box, TBP does
not bind to specific promoter sequences. Instead, the association of TBP with
ribosomal RNA genes is mediated by the binding of other proteins in the SL1
complex to the promoter, a situation similar to the association of TBP with the
Inr sequences of polymerase II genes that lack TATA boxes.
The genes for tRNAs, 5S rRNA, and some of the small RNAs involved
in splicing and protein transport are transcribed by polymerase III. These genes
are characterized by promoters that lie within, rather than upstream of, the
transcribed sequence (Figure 6.17). The most thoroughly studied of the genes
transcribed by polymerase III are the 5S rRNA genes of Xenopus. TFIIIA
(which is the first transcription factor to have been purified) initiates assembly
of a transcription complex by binding to specific DNA sequences in the 5S
rRNA promoter.

Figure 6.15. The ribosomal RNA gene The ribosomal DNA (rDNA) is
transcribed to yield a large RNA molecule (45S pre-rRNA), which is then
cleaved into 28S, 18S, and 5.8S rRNAs.

Figure 6.16. Initiation of rDNA transcription. Two transcription factors,
UBF and SL1, bind cooperatively to the rDNA promoter and recruit RNA
polymerase I to form an initiation complex. One subunit of SL1 is the
TATA-binding protein (TBP).
This binding is followed by the sequential binding of TFIIIC, TFIIIB,
and the polymerase. The promoters for the tRNA genes differ from the 5S
rRNA promoter in that they do not contain the DNA sequence recognized by
TFIIIA. Instead, TFIIIC binds directly to the promoter of tRNA genes, serving
to recruit TFIIIB and polymerase to form a transcription complex. TFIIIB is
composed of multiple subunits, one of which (once again) is the TATA-binding
protein, TBP. Thus, although the three RNA polymerases of eukaryotic cells
recognize different promoters, TBP appears to be a common element that links
promoter recognition with polymerase recruitment to the transcription
complex.
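The factor requirements described in this section can be collected into a small lookup table. The Python sketch below is only a summary device: the dictionary layout and the helper name are assumptions, while the factor names and the TBP-containing factor for each polymerase are taken from the text above.

```python
# Initiation factors named in the text for each polymerase class, in the order
# of binding described, plus the factor that contains the TATA-binding protein.
INITIATION_FACTORS = {
    "polymerase I (rRNA genes)": {
        "factors": ["UBF", "SL1"],
        "tbp_in": "SL1",
    },
    "polymerase II": {
        "factors": ["TFIID", "TFIIB", "TFIIF", "TFIIE", "TFIIH"],
        "tbp_in": "TFIID",
    },
    "polymerase III (5S rRNA genes)": {
        "factors": ["TFIIIA", "TFIIIC", "TFIIIB"],
        "tbp_in": "TFIIIB",
    },
    "polymerase III (tRNA genes)": {
        "factors": ["TFIIIC", "TFIIIB"],
        "tbp_in": "TFIIIB",
    },
}


def tbp_carriers(table):
    """Map each polymerase class to the factor that brings TBP to the promoter."""
    return {name: entry["tbp_in"] for name, entry in table.items()}


for polymerase, factor in tbp_carriers(INITIATION_FACTORS).items():
    print(f"{polymerase}: TBP is a subunit of {factor}")
```

Laid out this way, the common theme of the section is easy to see: the promoters and factors differ, but every class uses a TBP-containing factor.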

Figure 6.17. Transcription of polymerase III genes. The promoters of
5S rRNA and tRNA genes are downstream of the transcription initiation site.
Transcription of the 5S rRNA gene is initiated by the binding of TFIIIA,
followed by the binding of TFIIIC, TFIIIB, and the polymerase. The tRNA
promoters do not contain a binding site for TFIIIA, and TFIIIA is not required
for their transcription. Instead, TFIIIC initiates the transcription of tRNA genes
by binding to promoter sequences, followed by the association of TFIIIB and
polymerase. The TATA-binding protein (TBP) is a subunit of TFIIIB.

UNIT – IV
STRUCTURE
Translation. Mechanism of translation in Prokaryotes and Eukaryotes,
Post translational modifications of proteins.
Regulation of Gene expression in Prokaryotes (Operon concept (Lac and Tryp))
and in Eukaryotes (galactose metabolism in yeast).
PROKARYOTIC TRANSLATION
Prokaryotic Translation - The pathway of protein synthesis is called
translation because the language of the nucleotide sequence on the mRNA is
translated into the language of an amino acid sequence.
The mRNA is translated from its 5' end to its 3' end, producing a protein.
Prokaryotic mRNAs often have several coding regions; that is, they are
polycistronic.
Each coding region has its own initiation codon and produces a separate
species of polypeptide. Prokaryotic translation resembles eukaryotic
translation in most details, with the differences noted below.
Initiation: In prokaryotes, the 70S ribosome (rather than the 80S ribosome of
eukaryotes) is used in the translation machinery. The first amino acid, the
one that initiates protein synthesis, is always the same: N-formylmethionine
(fMet). N-formylmethionyl-tRNA forms in two steps. First, ordinary methionine
binds to a special initiator tRNA; then the methionine on the tRNA receives
the formyl group.
This unique amino acid participates only in initiation. It never goes into
the interior of a polypeptide. In E. coli, a purine-rich sequence of
nucleotide bases (consensus 5'-AGGAGG-3') known as the Shine-Dalgarno sequence
is located 6 to 10 bases upstream of the AUG codon on the mRNA molecule, that
is, near its 5' end.
The 16S ribosomal RNA component of the 30S ribosomal subunit has a
nucleotide sequence near its 3' end that is complementary to all or part of
the Shine-Dalgarno sequence.
Therefore, the region near the 5' end of the mRNA and the 3' end of the 16S
ribosomal RNA can form complementary base pairs, thus facilitating the binding
and positioning of the mRNA on the 30S ribosomal subunit.
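The base pairing between the mRNA and the 16S rRNA can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the anti-SD core CCUCCU (part of the 3' end of E. coli 16S rRNA), the 15-base search window, the minimum pairing length of four and the toy mRNA are all choices made for this example, not values given in the text.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the sequence that pairs with `rna`, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_sd_site(mrna, anti_sd="CCUCCU", min_len=4, max_upstream=15):
    """Find a stretch upstream of the first AUG that can pair with the anti-SD.

    Returns (paired_sequence, spacing), where spacing is the number of bases
    between the paired stretch and the AUG, or None if nothing pairs.
    """
    start = mrna.find("AUG")
    if start == -1:
        return None
    target = reverse_complement(anti_sd)  # what a perfect SD would look like
    upstream = mrna[max(0, start - max_upstream):start]
    # Prefer the longest pairing; fall back to shorter stretches.
    for length in range(len(target), min_len - 1, -1):
        for offset in range(len(target) - length + 1):
            piece = target[offset:offset + length]
            pos = upstream.rfind(piece)
            if pos != -1:
                return piece, len(upstream) - (pos + length)
    return None


mrna = "GGCUAGGAGGUAACAUAUGAAAGCU"  # hypothetical 5' region of an mRNA
print(find_sd_site(mrna))  # ('AGGAGG', 6)
```

In this toy example the full AGGAGG stretch pairs with the anti-SD and lies 6 bases upstream of the AUG, which falls within the spacing quoted above.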
Elongation: During elongation, prokaryotes use EF-Tu instead of eEF-1α to
deliver aminoacyl-tRNA to the ribosome, and EF-G instead of eEF-2 for
translocation.
Many ribosomes can translate the same mRNA molecule simultaneously. Because
of their relatively large size, ribosomes cannot attach to an mRNA any closer
together than 80 nucleotides. Multiple ribosomes on the same mRNA molecule
form a polyribosome, or polysome.
Prokaryotic translation is the process by which messenger RNA is translated
into proteins in prokaryotes.
Initiation
Initiation of translation in prokaryotes involves the assembly of the
components of the translation system which are: the two ribosomal subunits
(small and large), the mRNA to be translated, the first (formyl) aminoacyl
tRNA (the tRNA charged with the first amino acid), GTP (as a source of
energy), and three initiation factors (IF1, IF2 and IF3) which help the assembly
of the initiation complex.
The ribosome contains three tRNA-binding sites: the A site, the P site, and
the E site. The A site is the point of entry for the aminoacyl-tRNA (except
for the first aminoacyl-tRNA, fMet-tRNAfMet, which enters at the P site). The
P site is where the peptidyl-tRNA is held in the ribosome. The E site is the
exit site of the now uncharged tRNA after it gives its amino acid to the
growing peptide chain.
Initiation of translation begins with the 50S and 30S ribosomal subunits
dissociated. IF1 (initiation factor 1) blocks the A site to ensure that the
fMet-tRNA can bind only to the P site and that no other aminoacyl-tRNA can
bind in the A site during initiation, while IF3 blocks the E site and prevents
the two subunits from associating. IF2 is a GTPase which binds fMet-tRNAfMet
and helps its binding with the small ribosomal subunit. The 3' end of the 16S
rRNA of the small 30S ribosomal subunit recognizes, through its anti-SD
sequence, the ribosomal binding site on the mRNA (the Shine-Dalgarno sequence,
or SD), which lies 5-10 bases upstream of the start codon. The Shine-Dalgarno
sequence is found only in prokaryotes.

The process of initiation of translation in prokaryotes.
This helps to correctly position the ribosome onto the mRNA so that the
P site is directly on the AUG initiation codon. IF3 helps to position fMet-
tRNAfMet into the P site, such that fMet-tRNAfMet interacts via base pairing with
the mRNA initiation codon (AUG). Initiation ends as the large ribosomal
subunit joins the complex causing the dissociation of initiation factors. Note
that prokaryotes can differentiate between a normal AUG (coding for
methionine) and an AUG initiation codon (coding for formyl-methionine and
indicating the start of a new translation process).
Elongation
Elongation of the polypeptide chain involves addition of amino acids to
the carboxyl end of the growing chain. The growing protein exits the ribosome
through the polypeptide exit tunnel in the large subunit.

Elongation starts when the fMet-tRNA enters the P site, causing a
conformational change which opens the A site for the new aminoacyl-tRNA to
bind. This binding is facilitated by elongation factor Tu (EF-Tu), a small
GTPase. Now the P site contains the beginning of the peptide chain of the
protein to be synthesized and the A site has the next amino acid to be added
to the peptide chain. The growing polypeptide is detached from the tRNA in the
P site, and a peptide bond is formed between the last amino acid of the
polypeptide and the amino acid still attached to the tRNA in the A site. This
process, known as peptide bond formation, is catalyzed by a ribozyme: the
peptidyl transferase activity is intrinsic to the 23S ribosomal RNA in the 50S
ribosomal subunit. Now the A site holds the newly formed peptidyl-tRNA, while
the P site holds an uncharged tRNA (a tRNA with no amino acid attached). In
the final stage of elongation, translocation, the ribosome moves 3 nucleotides
towards the 3' end of the mRNA. Since tRNAs are linked to mRNA by
codon-anticodon base pairing, the tRNAs move relative to the ribosome, taking
the nascent polypeptide from the A site to the P site and moving the uncharged
tRNA to the E (exit) site. This process is catalyzed by elongation factor G
(EF-G). The ribosome continues to translate the remaining codons on the mRNA
as more aminoacyl-tRNAs bind to the A site, until it reaches a stop codon on
the mRNA (UAA, UGA, or UAG).
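The codon-by-codon logic of elongation can be mimicked with a short Python sketch. It is only a bookkeeping model, not a description of the ribosome: the function name and example sequence are hypothetical, the standard genetic code is built from the conventional UCAG ordering, and the initiating formyl-methionine is written simply as M.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G order
# ('*' marks the stop codons UAA, UAG and UGA).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Read codons 5' to 3' from the first AUG until a stop codon is reached."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):  # move 3 nucleotides per cycle
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon in the A site: no tRNA matches
            break
        peptide.append(amino_acid)
    return "".join(peptide)


print(translate("GGAUGAAAUUUUAAGG"))  # MKF
```

Each pass through the loop corresponds to one elongation cycle: a codon is read, one residue is added, and the reading position shifts by three nucleotides.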
Termination
Termination occurs when one of the three termination codons moves
into the A site. These codons are not recognized by any tRNAs. Instead, they
are recognized by proteins called release factors, namely RF1 (recognizing the
UAA and UAG stop codons) or RF2 (recognizing the UAA and UGA stop
codons). These factors trigger the hydrolysis of the ester bond in
peptidyl-tRNA and the release of the newly synthesized protein from the
ribosome. A third release factor, RF3, catalyzes the release of RF1 and RF2 at
the end of the termination process.
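The stop-codon specificities given above can be written as a small lookup. This Python sketch only restates the text; the dictionary and function names are made up for illustration.

```python
# Stop codons recognized by each bacterial release factor, as stated in the text.
RELEASE_FACTORS = {
    "RF1": {"UAA", "UAG"},
    "RF2": {"UAA", "UGA"},
}


def release_factors_for(codon):
    """Return the release factors that recognize `codon` (empty for sense codons)."""
    return sorted(name for name, codons in RELEASE_FACTORS.items() if codon in codons)


for stop in ("UAA", "UAG", "UGA"):
    print(stop, release_factors_for(stop))
```

The lookup makes the overlap explicit: UAA is read by both factors, whereas UAG is read only by RF1 and UGA only by RF2.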
Recycling
The post-termination complex formed by the end of the termination step
consists of mRNA with the termination codon in the A site, an uncharged
tRNA in the P site, and the intact 70S ribosome. The ribosome recycling step
is responsible for the disassembly of this post-termination ribosomal
complex.[2] Once the nascent protein is released in termination, ribosome
recycling factor (RRF) and elongation factor G (EF-G) function to release mRNA
and tRNAs from ribosomes and dissociate the 70S ribosome into the 30S and 50S
subunits. IF3 then replaces the deacylated tRNA, releasing the mRNA. All
translational components are now free for additional rounds of translation.

Polysomes
Translation is carried out by more than one ribosome simultaneously.
Because of the relatively large size of ribosomes, they can only attach to sites
on mRNA 35 nucleotides apart. The complex of one mRNA and a number of
ribosomes is called a polysome or polyribosome.
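The spacing limit translates into a simple upper bound on how many ribosomes fit on one mRNA. The Python sketch below is a rough estimate under an assumed simplification (ribosomes packed end to end at the minimum spacing over the whole message); the function name and the 1,050-nucleotide example are hypothetical. Both spacing values quoted in this unit (35 and 80 nucleotides) are shown.

```python
def max_ribosomes(mrna_length_nt, min_spacing_nt):
    """Upper bound on ribosomes per mRNA if each needs `min_spacing_nt` of message."""
    if min_spacing_nt <= 0:
        raise ValueError("spacing must be positive")
    return mrna_length_nt // min_spacing_nt


mrna_length = 1050  # hypothetical mRNA length in nucleotides
for spacing in (35, 80):
    print(spacing, max_ribosomes(mrna_length, spacing))
# 35 -> 30 ribosomes, 80 -> 13 ribosomes
```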
Effect of antibiotics
Several antibiotics exert their action by targeting the translation process
in bacteria. They exploit the differences between prokaryotic and eukaryotic
translation mechanisms to selectively inhibit protein synthesis in bacteria
without affecting the host. Examples include:
• Puromycin has a structure similar to the 3' end of tyrosyl-tRNA.
Thus, it binds to the ribosomal A site and participates in peptide bond
formation, producing peptidyl-puromycin. However, it does not engage
in translocation and quickly dissociates from the ribosome, causing
premature termination of polypeptide synthesis.
• Streptomycin causes misreading of the genetic code in bacteria at
relatively low concentrations and inhibits initiation at higher
concentrations, by binding to the 30S ribosomal subunit.
• Other aminoglycosides, such as tobramycin and kanamycin, prevent
ribosomal association at the end of the initiation step and cause
misreading of the genetic code.
• Tetracyclines block the A site on the ribosome, preventing the binding
of aminoacyl-tRNAs.
• Chloramphenicol blocks the peptidyl transfer step of elongation on the
50S ribosomal subunit in both bacteria and mitochondria.
• Macrolides and lincosamides bind to the 50S ribosomal subunit,
inhibiting the peptidyl transferase reaction or translocation or both.
Eukaryotic Translation
Eukaryotic Translation - Translation, like transcription, is divided into
three phases or steps: initiation, elongation and termination. The mechanism
of eukaryotic translation and the molecules involved are described below.
Initiation: Initiation of protein synthesis requires an mRNA molecule to be
selected for translation by a ribosome.
Once the mRNA binds to the ribosome, translation begins. This process
involves tRNA, rRNA, mRNA and at least 10 eukaryotic initiation factors
(eIFs). Initiation consists of four steps.

Ribosomal dissociation: Two initiation factors, eIF3 and eIF1A, bind to the
newly dissociated 40S ribosomal subunit. This delays its reassociation with
the 60S subunit and allows other translational initiation factors to associate
with the 40S subunit.
Formation of the 43S pre-initiation complex: The first step in this process
involves the binding of GTP by eIF2. This binary complex then binds to
Met-tRNA, a tRNA specifically involved in binding to the initiation codon AUG.
This ternary complex binds to the 40S ribosomal subunit to form the 43S
pre-initiation complex, which is stabilized by association with eIF3 and eIF1A.
Formation of the 48S initiation complex: The 5' termini of most mRNA
molecules in eukaryotic cells are capped. This methyl-guanosyl triphosphate
cap facilitates the binding of mRNA to the 43S pre-initiation complex. A
cap-binding protein complex (eIF4F) binds to the cap through the 4E protein.
The association of mRNA with the 43S pre-initiation complex to form the 48S
initiation complex requires ATP hydrolysis. Following the association of the
43S pre-initiation complex with the mRNA cap and reduction of the secondary
structure near the 5' end of the mRNA, the complex scans the mRNA for a
suitable initiation codon. Generally this is the 5'-most AUG, but the precise
initiation codon is determined by the so-called Kozak consensus sequences
that surround the AUG in eukaryotes.
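The scanning rule can be sketched in Python. The details below go beyond this text and are labelled assumptions: the sketch uses the commonly cited feature of the Kozak context (a purine at position -3 and a G at position +4, counting the A of AUG as +1), grades a site as strong, adequate or weak by how many of the two are present, and lets the scan skip weak sites when a better one lies downstream. Function names and the example sequence are hypothetical.

```python
def classify_context(mrna, aug_index):
    """Grade the sequence context around an AUG using positions -3 and +4."""
    minus3 = mrna[aug_index - 3] if aug_index >= 3 else ""
    plus4 = mrna[aug_index + 3] if aug_index + 3 < len(mrna) else ""
    hits = (minus3 in ("A", "G")) + (plus4 == "G")
    return ("weak", "adequate", "strong")[hits]


def scan_for_start(mrna):
    """Scan 5' to 3' and return (index, context) of the chosen start codon.

    The first AUG in a non-weak context is chosen; if every AUG is weak,
    the 5'-most AUG is returned. Returns None if there is no AUG.
    """
    first = None
    for i in range(len(mrna) - 2):
        if mrna[i:i + 3] == "AUG":
            context = classify_context(mrna, i)
            if first is None:
                first = (i, context)
            if context != "weak":
                return (i, context)
    return first


mrna = "GCCUUAUGCCGCCACCAUGGCG"  # hypothetical 5' region
print(scan_for_start(mrna))  # (16, 'strong')
```

In the example the 5'-most AUG (index 5) sits in a weak context and is passed over in favour of the second AUG, which has A at -3 and G at +4.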
Formation of the 80S initiation complex: The binding of the 60S ribosomal
subunit to the 48S initiation complex involves the hydrolysis of the GTP bound
to eIF2 by eIF5. This reaction results in the release of the initiation factors
bound to the 48S initiation complex and the rapid association of the 40S and
60S subunits to form the 80S ribosome.
Elongation: Elongation is a cyclic process involving several steps catalysed
by proteins called elongation factors (eEFs). These steps are:
1. Binding of aminoacyl-tRNA to the A site
2. Peptide bond formation
3. Translocation
Binding of aminoacyl-tRNA to the A site: In the complete 80S ribosome formed
during the process of initiation, the A site (aminoacyl or acceptor site)
is free. Binding of the proper aminoacyl-tRNA in the A site requires proper
codon recognition. Elongation factor eEF-1α forms a complex with GTP and the
entering aminoacyl-tRNA; this complex allows the aminoacyl-tRNA to enter the
A site, with the release of eEF-1α-GDP and phosphate.
Peptide bond formation: The α-amino group of the new aminoacyl-tRNA in
the A site carries out a nucleophilic attack on the esterified carboxyl group
of the peptidyl-tRNA occupying the P site. This reaction is catalysed by a
peptidyl transferase, a component of the 28S rRNA of the 60S ribosomal subunit.

Translocation: Once the peptidyl moiety has been transferred from the tRNA in
the P site, elongation factor 2 (eEF2) and GTP are responsible for the
translocation of the newly formed peptidyl-tRNA from the A site into the P
site, displacing the deacylated tRNA.
The GTP required for eEF2 is hydrolysed to GDP and phosphate during the
translocation process.
The translocation of the newly formed peptidyl-tRNA and its
corresponding codon into the P site frees the A site for another cycle of
aminoacyl-tRNA codon recognition and elongation.
The energy required for the formation of one peptide bond is equivalent to the
hydrolysis of 2 ATP and 2 GTP molecules. This process occurs rapidly.
A eukaryotic ribosome can incorporate as many as six amino acids per
second, whereas prokaryotic ribosomes incorporate as many as 18 amino acids
per second. Thus, the process of peptide synthesis occurs with great speed and
accuracy until a termination codon is reached.
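The energy figure quoted above can be turned into a rough estimate for a whole polypeptide. The Python sketch below assumes the usual accounting behind the numbers (charging a tRNA converts ATP to AMP, counted as two ATP equivalents; one GTP is used when the aminoacyl-tRNA enters the A site and one during translocation) and ignores the additional costs of initiation and termination. The function name and the 300-residue example are hypothetical.

```python
ATP_EQUIVALENTS_PER_BOND = 2  # tRNA charging: ATP -> AMP counts as two ATP -> ADP
GTP_PER_BOND = 2              # one for A-site entry, one for translocation


def elongation_cost(n_residues):
    """Estimate nucleotide triphosphates hydrolysed to link `n_residues` amino acids."""
    bonds = max(n_residues - 1, 0)  # n residues are joined by n - 1 peptide bonds
    atp = bonds * ATP_EQUIVALENTS_PER_BOND
    gtp = bonds * GTP_PER_BOND
    return {"peptide_bonds": bonds, "ATP": atp, "GTP": gtp, "total": atp + gtp}


print(elongation_cost(300))
# {'peptide_bonds': 299, 'ATP': 598, 'GTP': 598, 'total': 1196}
```

On this accounting a 300-residue protein costs roughly 1,200 high-energy phosphate bonds, about four per peptide bond.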
Termination: In comparison to initiation and elongation, termination is a
relatively simple process. After multiple cycles of elongation, culminating in
polymerization of the specific amino acids into a protein molecule, a stop
(termination) codon of the mRNA appears in the A site. There is no tRNA with
an anticodon capable of recognizing such a termination signal.
Releasing factors (eRFs) are capable of recognizing termination signals
in the A site. The releasing factor, in conjunction with GTP and the
peptidyl transferase, promotes the hydrolysis of the bond between the peptide
and the tRNA occupying the P site. The ribosome then dissociates into its 40S
and 60S subunits.

Eukaryotic translation
Eukaryotic translation is the process by which messenger RNA is translated
into proteins in eukaryotes. It consists of initiation, elongation and
termination.[1]
Initiation
The process of initiation of translation in eukaryotes.
Cap-dependent initiation
Initiation of translation usually involves the interaction of certain key
proteins with a special tag bound to the 5'-end of an mRNA molecule, the 5'
cap. The protein factors bind the small ribosomal subunit (also referred to as
the 40S subunit), and these initiation factors hold the mRNA in place. The
eukaryotic Initiation Factor 3 (eIF3) is associated with the small ribosomal
subunit, and plays a role in keeping the large ribosomal subunit from
prematurely binding. eIF3 also interacts with the eIF4F complex which consists
of three other initiation factors: eIF4A, eIF4E and eIF4G. eIF4G is a
scaffolding protein which directly associates with both eIF3 and the other two
components. eIF4E is the cap-binding protein. Cap binding by eIF4E is often the
rate-limiting step of cap-dependent initiation, and some viral proteases cleave
eIF4G so that eIF4E is separated from the rest of the complex, limiting the
cell's ability to translate its own transcripts. This is a
method of hijacking the host machinery in favor of the viral (cap-independent)
messages. eIF4A is an ATP-dependent RNA helicase, which aids the ribosome
in resolving certain secondary structures formed by the mRNA transcript.
There is another protein associated with the eIF4F complex called the Poly-A
Binding Protein (PABP), which binds the poly-A tail of most eukaryotic
mRNA molecules. This protein has been implicated in playing a role in
circularization of the mRNA during translation.
This pre-initiation complex (the 43S complex, or the 40S subunit with bound
mRNA), accompanied by the protein factors, moves along the mRNA chain towards
its 3'-end, scanning for the 'start' codon (typically AUG) on the mRNA, which
indicates where the mRNA will begin coding for the protein. In eukaryotes and
archaea, the amino acid encoded by the start codon is methionine. The initiator
tRNA charged with Met forms part of the ribosomal complex and thus all
proteins start with this amino acid (unless it is cleaved away by a protease in
subsequent modifications). The Met-charged initiator tRNA is brought to the P-
site of the small ribosomal subunit by eukaryotic Initiation Factor 2 (eIF2). It
hydrolyzes GTP, and signals for the dissociation of several factors from the
small ribosomal subunit which results in the association of the large subunit (or
the 60S subunit). The complete ribosome (80S) then commences translation
elongation, during which the sequence between the 'start' and 'stop' codons is
translated from mRNA into an amino acid sequence -- thus a protein is
synthesized.
The cap-independent initiation
The best studied example of the cap-independent mode of translation
initiation in eukaryotes is the Internal Ribosome Entry Site (IRES) approach.
What differentiates cap-independent translation from cap-dependent translation
is that cap-independent translation does not require the ribosome to start
scanning from the 5' end of the mRNA cap until the start codon. The ribosome
can be trafficked to the start site by ITAFs (IRES trans-acting factors)
bypassing the need to scan from the 5' end of the untranslated region of the
mRNA. This method of translation was discovered relatively recently and has
been found to be important in conditions that require the translation of
specific mRNAs despite cellular stress or the inability to translate most
mRNAs. Examples include factors responding to apoptosis and stress-induced
responses.
Elongation
Elongation is dependent on eukaryotic elongation factors.
At the end of the initiation step, the mRNA is positioned so that the next
codon can be translated during the elongation stage of protein synthesis. The
initiator tRNA occupies the P site in the ribosome, and the A site is ready to
receive an aminoacyl-tRNA. During chain elongation, each additional amino
acid is added to the nascent polypeptide chain in a three-step microcycle. The
steps in this microcycle are (1) positioning the correct aminoacyl-tRNA in the
A site of the ribosome, (2) forming the peptide bond and (3) shifting the mRNA
by one codon relative to the ribosome. The translation machinery works
relatively slowly compared to the enzyme systems that catalyze DNA
replication. Proteins are synthesised at a rate of only 18 amino acid residues per
second, whereas bacterial replisomes synthesize DNA at a rate of 1000
nucleotides per second. This difference in rate reflects, in part, the difference
between polymerizing four types of nucleotides to make nucleic acids and
polymerizing 20 types of amino acids to make proteins. Testing and rejecting
incorrect aminoacyl-tRNA molecules takes time and slows protein synthesis.
The rate of transcription in prokaryotes is approximately 55 nucleotides per
second, which corresponds to about 18 codons per second, or the same rate at
which the mRNA is translated. In bacteria, translation initiation occurs as soon
as the 5' end of an mRNA is synthesized, and translation and transcription are
coupled. This tight coupling is not possible in eukaryotes because transcription
and translation are carried out in separate compartments of the cell (the nucleus
and cytoplasm). Eukaryotic mRNA precursors must be processed in the
nucleus (e.g., capping, polyadenylation, splicing) before they are exported to the
cytoplasm for translation.
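The rate comparison above rests on simple arithmetic, which can be checked with a short Python sketch. The rates are the ones quoted in the text; the function names and the 300-codon example are hypothetical.

```python
NUCLEOTIDES_PER_CODON = 3


def codons_per_second(transcription_rate_nt_per_s):
    """Convert a transcription rate in nucleotides/s into codons/s."""
    return transcription_rate_nt_per_s / NUCLEOTIDES_PER_CODON


def seconds_to_translate(n_codons, residues_per_second):
    """Time for one ribosome to read `n_codons` at a constant rate."""
    return n_codons / residues_per_second


print(round(codons_per_second(55), 1))          # 18.3 codons per second
print(round(seconds_to_translate(300, 18), 1))  # 16.7 s for a 300-codon message
print(round(1000 / 55, 1))                      # replication is about 18.2 times faster per nucleotide
```

The first figure shows why transcription at about 55 nucleotides per second keeps pace with translation at about 18 residues per second, which is what makes coupling possible in bacteria.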
Termination
Termination of elongation is dependent on eukaryotic release factors.
The process is similar to that of prokaryotic termination.
POST-TRANSCRIPTIONAL MODIFICATION
Post-transcriptional modification is a process in cell biology by
which, in eukaryotic cells, primary transcript RNA is converted into mature
RNA. A notable example is the conversion of precursor messenger RNA into
mature messenger RNA (mRNA), which includes splicing and occurs prior to
protein synthesis. This process is vital for the correct translation of the
genomes of eukaryotes, because the primary RNA transcript produced by
transcription (in humans, for example) contains both exons, which are the
coding sections of the primary RNA transcript, and introns, which are the
non-coding sections.[1]
mRNA processing
The pre-mRNA molecule undergoes three main modifications. These
modifications are 5' capping, 3' polyadenylation, and RNA splicing, which
occur in the cell nucleus before the RNA is translated.[2]
5' Processing
Capping
Capping of the pre-mRNA involves the addition of 7-methylguanosine
(m7G) to the 5' end. To achieve this, the terminal 5' phosphate must first be
removed, which is done by a phosphatase enzyme, leaving a diphosphate 5' end.
The enzyme guanylyltransferase then catalyses the reaction in which the
diphosphate 5' end attacks the α phosphorus atom of a GTP molecule, adding the
guanosine residue in a 5'-5' triphosphate linkage. A methyltransferase, using
S-adenosylmethionine as the methyl donor, then methylates the guanine ring at
the N-7 position. This type of cap, with just the m7G in position, is called a
cap 0 structure. The ribose of the adjacent nucleotide may also be methylated
to give a cap 1. Methylation of nucleotides further downstream in the RNA
molecule produces cap 2, cap 3 structures and so on. In these cases the methyl
groups are added to the 2' OH groups of the ribose sugar. The cap protects the
5' end of the primary RNA transcript from attack by ribonucleases that have
specificity for 3'-5' phosphodiester bonds. [3]
3' Processing
Cleavage and Polyadenylation
The pre-mRNA processing at the 3' end of the RNA molecule involves
cleavage of its 3' end and then the addition of about 200 adenosine residues to
form a poly(A) tail. The cleavage and adenylation reactions occur if a
polyadenylation signal sequence (5'-AAUAAA-3') is located near the 3' end of
the pre-mRNA molecule, followed by another sequence, which is usually
5'-CA-3'. The second signal is the site of cleavage. A GU-rich sequence is also
usually present further downstream on the pre-mRNA molecule. After the
synthesis of the sequence elements, two multisubunit proteins called cleavage
and polyadenylation specificity factor (CPSF) and cleavage stimulation factor
(CstF) are transferred from RNA polymerase II to the RNA molecule. The two
factors bind to the sequence elements. A protein complex forms which contains
additional cleavage factors and the enzyme polyadenylate polymerase (PAP).
This complex cleaves the RNA between the polyadenylation sequence and the
GU-rich sequence, at the cleavage site marked by the 5'-CA-3' sequence. Poly(A)
polymerase then adds about 200 adenosine units to the new 3' end of the RNA
molecule, using ATP as a precursor. As the poly(A) tail is synthesised, it
binds multiple copies of poly(A)-binding protein, which protects the 3' end
from ribonuclease digestion.[3]
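The order of signals described above (AAUAAA, then a CA cleavage site, then a GU-rich region) can be illustrated with a small Python sketch. It is a string-matching toy, not a model of CPSF or CstF binding: the function names, the example sequence and the 30-nucleotide search window downstream of the signal are assumptions made for illustration.

```python
def find_cleavage_site(pre_mrna, signal="AAUAAA", cleavage_motif="CA", max_gap=30):
    """Return the index just after the first CA found downstream of the signal.

    The search for CA starts after the AAUAAA signal and extends `max_gap`
    nucleotides (an assumed window). Returns None if either element is missing.
    """
    signal_pos = pre_mrna.find(signal)
    if signal_pos == -1:
        return None
    search_from = signal_pos + len(signal)
    search_to = search_from + max_gap + len(cleavage_motif)
    motif_pos = pre_mrna.find(cleavage_motif, search_from, search_to)
    if motif_pos == -1:
        return None
    return motif_pos + len(cleavage_motif)  # cleave immediately after the CA


def cleave_and_polyadenylate(pre_mrna, tail_length=200):
    """Cut at the cleavage site and append a poly(A) tail of `tail_length` residues."""
    site = find_cleavage_site(pre_mrna)
    if site is None:
        return pre_mrna  # no usable signal: leave the transcript unchanged
    return pre_mrna[:site] + "A" * tail_length


# Hypothetical 3' region: signal, spacer, CA cleavage site, GU-rich region.
pre_mrna = "GGGCCC" + "AAUAAA" + "GGUUGGUUGGUU" + "CA" + "GGUGUGUGUUGG"
mature = cleave_and_polyadenylate(pre_mrna)
print(find_cleavage_site(pre_mrna), len(mature))  # 26 226
```

The downstream GU-rich stretch is discarded by the cut, and the new 3' end (ending in CA) receives the tail of about 200 residues.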
Splicing
RNA splicing is the process by which introns, regions of RNA that do
not code for protein, are removed from the pre-mRNA and the remaining exons
connected to re-form a single continuous molecule. Although most RNA
splicing occurs after the complete synthesis and end-capping of the pre-mRNA,
transcripts with many exons can be spliced co-transcriptionally.[4] The splicing
reaction is catalyzed by a large ribonucleoprotein complex called the
spliceosome, assembled from proteins and small nuclear RNA molecules that
recognize splice sites in the pre-mRNA sequence. Many pre-mRNAs, including those
encoding antibodies, can be spliced in multiple ways to produce different
mature mRNAs that encode different protein sequences. This process is known
as alternative splicing, and allows production of a large variety of proteins from
a limited amount of DNA.
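To see how alternative splicing multiplies the number of products, the sketch
below enumerates the mature mRNAs obtained when some exons may be either kept
or skipped. It is a simplified illustration only: the exon names, the toy
sequences and the function `splice_isoforms` are hypothetical, and only exon
skipping is modelled.

```python
from itertools import product


def splice_isoforms(exons, optional):
    """List every mRNA obtainable by keeping or skipping the optional exons.

    exons    -- ordered list of (name, sequence) pairs
    optional -- set of exon names that may be skipped
    """
    # Each exon is either always kept, or kept/skipped if it is optional.
    choices = [(True, False) if name in optional else (True,)
               for name, _ in exons]
    isoforms = []
    for keep in product(*choices):
        mrna = "".join(seq for (_, seq), k in zip(exons, keep) if k)
        isoforms.append(mrna)
    return isoforms


# Hypothetical three-exon gene in which the middle exon is optional.
gene = [("E1", "AUG"), ("E2", "GGG"), ("E3", "UAA")]
print(splice_isoforms(gene, optional={"E2"}))
```

With n optional exons the function returns 2^n isoforms, which is why a
limited amount of DNA can encode a large variety of proteins.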
LAC OPERON
The lactose (lac) operon is a nucleotide sequence in Escherichia coli that
controls the synthesis of the enzyme β-galactosidase. It comprises binding
sequence motifs for the CAP protein, which activates transcription, and for the
repressor protein, which inhibits transcription, together with a region with
which RNA polymerase interacts. It is the first, best studied and best
understood model for gene regulation.
The lac operon is an operon required for the transport and metabolism of
lactose in Escherichia coli and some other enteric bacteria. It consists of three
adjacent structural genes, a promoter, a terminator, and an operator. The lac
operon is regulated by several factors including the availability of glucose and
of lactose. Gene regulation of the lac operon was the first genetic regulatory
mechanism to be elucidated and is often used as the canonical example of
prokaryotic gene regulation.

Structure of the operon
[Figure: Structure of lactose and the products of its cleavage.]
The lac operon consists of three structural genes, a promoter, a terminator,
and an operator. The three structural genes are lacZ, lacY, and lacA.
• lacZ encodes β-galactosidase (LacZ), an intracellular enzyme that
cleaves the disaccharide lactose into glucose and galactose.
• lacY encodes β-galactoside permease (LacY), a membrane bound
transport protein that pumps lactose into the cell.
• lacA encodes β-galactoside transacetylase (LacA), an enzyme that
transfers an acetyl group from acetyl-CoA to β-galactosides.
Only lacZ and lacY appear to be necessary for lactose catabolism.
Specific control of the lac genes depends on the availability of the
substrate lactose to the bacterium. The proteins are not produced by the
bacterium when lactose is unavailable as a carbon source. The lac genes are
organized into an operon; that is, they are oriented in the same direction
immediately adjacent on the chromosome and are co-transcribed into a single
polycistronic mRNA molecule. Transcription of all genes starts with the
binding of the enzyme RNA polymerase (RNAP), a DNA-binding protein, to a
specific DNA binding site immediately upstream of the genes, the promoter.
From this position RNAP proceeds to transcribe all three genes (lacZYA) into
mRNA. The DNA sequences of the E. coli lac operon, the lacZYA mRNA, and
the lacI gene are available from GenBank.
The regulatory response to lactose requires an intracellular regulatory
protein called the lactose repressor. The lacI gene encoding repressor lies
near the lac operon and is always expressed (constitutive). If lactose is
missing from the growth medium, the repressor binds very tightly to a short
DNA sequence just downstream of the promoter near the beginning of lacZ
called the lac operator. Repressor bound to the operator interferes with binding
of RNAP to the promoter, and therefore mRNA encoding LacZ and LacY is
only made at very low levels. When cells are grown in the presence of lactose,
a lactose metabolite called allolactose binds to the repressor, causing a change
in its shape. Thus altered, the repressor is unable to bind to the operator,
allowing RNAP to transcribe the lac genes and thereby leading to high levels of
the encoded proteins.
Genetic nomenclature
Three-letter mnemonics are used to describe phenotypes in bacteria including
E. coli.
Examples include:
• Lac (the ability to use lactose),
• His (the ability to synthesize the amino acid histidine),
• Mot (swimming motility),
• Str (response to the antibiotic streptomycin).
In the case of Lac, wild type cells are Lac+ and are able to use lactose as
a carbon and energy source, while Lac- mutant derivatives cannot use lactose.
The same three letters are typically used (lower-case, italicized) to label the
genes involved in a particular phenotype, where each different gene is
additionally distinguished by an extra letter. The lac genes encoding enzymes
are lacZ, lacY, and lacA. The fourth lac gene is lacI, encoding the lactose
repressor; the I stands for inducibility.
The diagram below summarizes these statements.

[Figure: lac operon in detail]
One may distinguish between structural genes encoding enzymes, and
regulatory genes encoding proteins that affect gene expression. Current usage
expands the phenotypic nomenclature to apply to proteins: thus, LacZ is the
protein product of the lacZ gene, β-galactosidase. Various short sequences that
are not genes also affect gene expression, including the lac promoter, lac p, and
the lac operator, lac o. Although it is not strictly standard usage, mutations
affecting lac o are referred to as lac oc (operator-constitutive), for
historical reasons.
Lactose analogues
[Figure: Structures of IPTG, ONPG, X-gal and allolactose.]
A number of lactose derivatives or analogs have been described that are
useful for work with the lac operon. These compounds are mainly substituted
galactosides, where the glucose moiety of lactose is replaced by another
chemical group.
• Isopropyl-β-D-thio-galactoside (IPTG) is frequently used as an inducer
of the lac operon for physiological work.[1] IPTG binds to repressor and
inactivates it, but is not a substrate for β-galactosidase. One advantage
of IPTG for in vivo studies is that, since it cannot be metabolized by E.
coli, its concentration remains constant and the rate of expression of lac
p/o-controlled genes is not a variable in the experiment. In addition,
IPTG is transported efficiently independent of whether the lacY gene is
functional.
• Phenyl-β-D-galactoside (phenyl-Gal) is a substrate for β-galactosidase,
but does not inactivate repressor and so is not an inducer. Since wild
type cells produce very little β-galactosidase, they cannot grow on
phenyl-Gal as a carbon and energy source. Mutants lacking repressor
are able to grow on phenyl-Gal. Thus, minimal medium containing only
phenyl-Gal as a source of carbon and energy is selective for repressor
mutants and operator mutants. If 10^8 cells of a wild type strain are
plated on agar plates containing phenyl-Gal, the rare colonies which
grow are mainly spontaneous mutants affecting the repressor. The
relative distribution of repressor and operator mutants is affected by the
target size. Since the lacI gene encoding repressor is about 50 times
larger than the operator, repressor mutants predominate in the selection.
• Other compounds serve as colorful indicators of β-galactosidase
activity.
  o ONPG is cleaved to produce the intensely yellow compound,
    orthonitrophenol, and is commonly used as a substrate for assay
    of β-galactosidase in vitro. [1]
  o Colonies that produce β-galactosidase are turned blue by X-gal
    (5-bromo-4-chloro-3-indolyl-β-D-galactoside). [2]
• Allolactose is an isomer of lactose and is the inducer of the lac operon.
Lactose is galactose-(β1->4)-glucose, whereas allolactose is galactose-
(β1->6)-glucose. Lactose is converted to allolactose by β-galactosidase
in an alternative reaction to the hydrolytic one. A physiological
experiment which demonstrates the role of LacZ in production of the
"true" inducer in E. coli cells is the observation that a null mutant of
lacZ can still produce LacY permease when grown with IPTG but not
when grown with lactose. The explanation is that processing of lactose
to allolactose (catalyzed by β-galactosidase) is needed to produce the
inducer inside the cell.
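The target-size argument used for the phenyl-Gal selection can be made
concrete with a small calculation. The sketch below assumes, purely for
illustration, that mutations fall uniformly along the DNA and that any hit in
either target gives a selectable mutant; only the 50:1 size ratio comes from
the text, and the function name is hypothetical.

```python
def expected_fractions(target_sizes):
    """Share of mutants expected per target, proportional to target size."""
    total = sum(target_sizes.values())
    return {name: size / total for name, size in target_sizes.items()}


# Relative sizes from the text: lacI is about 50 times larger than the operator.
fractions = expected_fractions({"repressor (lacI)": 50, "operator": 1})
for name, share in fractions.items():
    print(f"{name}: {share:.1%}")
```

Under these assumptions about 50 of every 51 mutants recovered affect the
repressor, which is why operator mutants are rare in this selection.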
Classification of regulatory mutants
A conceptual breakthrough of Jacob and Monod[3] was to recognize the
distinction between regulatory substances and sites where they act to change
gene expression. A former soldier, Jacob used the analogy of a bomber that
would release its lethal cargo upon receipt of a special radio transmission or
signal. A working system requires both a ground transmitter and a receiver in
the airplane. Now, suppose that the usual transmitter is broken. This system can
be made to work by introduction of a second, functional transmitter. In
contrast, he said, consider a bomber with a defective receiver. The behavior of
this bomber cannot be changed by introduction of a second, functional
airplane.
To analyze regulatory mutants of the lac operon, Jacob developed a
system by which a second copy of the lac genes (lacI with its promoter, and
lacZYA with promoter and operator) could be introduced into a single cell. A
culture of such bacteria, which are diploid for the lac genes but otherwise
normal, is then tested for the regulatory phenotype. In particular, it is
determined whether LacZ and LacY are made even in the absence of IPTG.
This experiment, in which genes or gene clusters are tested pairwise, is called a
complementation test.

This test is illustrated in the figure (lacA is omitted for simplicity). First,
certain haploid states are shown (i.e. the cell carries only a single copy of the
lac genes). Panel (a) shows repression, (b) shows induction by IPTG, and (c)
and (d) show the effect of a mutation to the lacI gene or to the operator,
respectively. In panel (e) the complementation test for repressor is shown. If
one copy of the lac genes carries a mutation in lacI, but the second copy is wild
type for lacI, the resulting phenotype is normal: no LacZ is expressed without
IPTG. Mutations affecting repressor are said to be recessive to wild type (and
wild type is said to be dominant), and this is explained by the fact that
repressor is a small protein which can diffuse in the cell. The copy of the lac
operon adjacent to the defective lacI gene is effectively shut off by protein
produced from the second copy of lacI.
If the same experiment is carried out using an operator mutation, a
different result is obtained (panel (f)). The phenotype of a cell carrying one
mutant and one wild type operator site is that LacZ and LacY are produced
even in the absence of the inducer IPTG. The operator mutation is dominant.
When the operator site where repressor must bind is damaged by mutation, the
presence of a second functional site in the same cell makes no difference to
expression of genes controlled by the mutant site.
A more sophisticated version of this experiment uses marked operons to
distinguish between the two copies of the lac genes and show that the
unregulated gene(s) are the ones next to the mutant operator (panel (g)). For
example, suppose that one copy is marked by a mutation inactivating lacZ so
that it can only produce the LacY protein, while the second copy carries a
mutation affecting lacY and can only produce LacZ. In this version, only the
copy of the lac operon that is adjacent to the mutant operator is expressed
without IPTG. We say that the operator mutation is cis-dominant: it is
dominant to wild type but affects only the copy of the operon which is
immediately adjacent to it.
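The logic of these complementation tests can be captured in a small model:
repressor is diffusible and so acts on every copy in the cell, whereas an
operator only controls the genes on its own copy. The Python sketch below is an
illustrative simplification (basal expression is ignored, and the class and
function names are hypothetical), not a description of how the original
experiments were analysed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LacCopy:
    """One copy of the lac region; True means the element is functional."""
    lacI: bool = True
    operator: bool = True
    lacZ: bool = True
    lacY: bool = True


def proteins_made(copies, inducer=False):
    """Return the set of Lac proteins expressed by a cell carrying `copies`."""
    # Repressor acts in trans: one functional lacI anywhere is enough,
    # unless inducer (for example IPTG) inactivates the repressor.
    repressor_active = (not inducer) and any(c.lacI for c in copies)
    made = set()
    for c in copies:
        # The operator acts in cis: it only shuts off its own copy.
        if repressor_active and c.operator:
            continue
        if c.lacZ:
            made.add("LacZ")
        if c.lacY:
            made.add("LacY")
    return made


wild_type = LacCopy()
print(proteins_made([LacCopy(lacI=False), wild_type]))      # repressor mutant is recessive
print(proteins_made([LacCopy(operator=False), wild_type]))  # operator mutant is dominant
# Marked operons: mutant operator next to lacZ-, wild type operator next to lacY-.
print(proteins_made([LacCopy(operator=False, lacZ=False), LacCopy(lacY=False)]))
```

In the last case only LacY is produced, because only the copy adjacent to the
mutant operator escapes repression, which is the cis-dominant behaviour
described above.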
This explanation is misleading in an important sense, because it
proceeds from a description of the experiment and then explains the results in
terms of a model. But in fact, it is often true that the model comes first, and an
experiment is fashioned specifically to test the model. Jacob and Monod first
imagined that there must be a site in DNA with the properties of the operator,
and then designed their complementation tests to show this.
The dominance of operator mutants also suggests a procedure to select
them specifically. If regulatory mutants are selected from a culture of wild type
using phenyl-Gal, as described above, operator mutations are rare compared to
repressor mutants because the target-size is so small. But if instead we start
with a strain which carries two copies of the lac genes (that is diploid for lac),
the repressor mutations (which still occur) are not recovered because
complementation by the wild type genes confers a wild type phenotype. While
mutation of a single repressor gene produces no change in phenotype, mutation
of a single operator can be determined by a reduction in color intensity.
Regulation by cyclic AMP
The experimental microorganism used by François Jacob and Jacques
Monod was the common laboratory bacterium, E. coli, but many of the basic
regulatory concepts that were discovered by Jacob and Monod are fundamental
to cellular regulation in all organisms. The key idea is that proteins are not
synthesized when they are not needed: E. coli conserves cellular resources
and energy by not making the three Lac proteins when there is no need to
metabolize lactose, such as when other sugars like glucose are available. The
following section discusses how E. coli controls certain genes in response to
metabolic needs.

During World War II, Monod was testing the effects of combinations of sugars
as nutrient sources for E. coli. He found that bacteria grown with two different
sugars often displayed two phases of growth. For example, if glucose and
lactose were both provided, glucose would be metabolized first (growth phase
I, see Figure 2) and then lactose (growth phase II). This phenomenon is called
diauxie.

Figure 2: Monod's "bi-phasic" growth curve


Metabolism of lactose does not occur during the first part of the diauxic
growth curve because β-galactosidase is not made when both glucose and
lactose are present in the medium.
Explanation of this depended on the characterization of additional
mutations affecting the lac genes other than those explained by the classical
model. Two other genes, cya and crp, were known that map far away from lac
and that, when mutated, result in a decreased level of expression in the
presence of IPTG, even in a strain which is mutant for repressor or operator. The
discovery of cyclic AMP in 1957 (in eukaryotic cells) and a decade later in E.
coli led to the demonstration that mutants defective in one of these genes could
be restored to full activity by the addition of cyclic AMP to the medium.
The cya gene encodes adenylate cyclase, which produces cyclic AMP.
In a cya mutant, the absence of cyclic AMP makes the expression of the
lacZYA genes more than ten times lower than normal. Addition of cyclic AMP
corrects the low Lac expression characteristic of cya mutants. The second gene,
crp, encodes a protein called catabolite activator protein (CAP) or cAMP
receptor protein (CRP). [It is remarkable that almost forty years
later, different geneticists use different terms for the same gene depending on
how they feel about the two competing groups involved in the original
discovery.]
This dual regulation causes the lactose metabolism enzymes to be made
in small quantities in the presence of both glucose and lactose (sometimes
called leaky expression), because lactose, through its metabolite allolactose,
prevents LacI from binding to the operator. At high cAMP concentrations and in
the presence of lactose, however, there are high levels of expression (Phase II
in Figure 2). Leaky expression is necessary in order to allow for metabolism of
some lactose after the glucose source is expended, but before lac expression is
fully activated.
In summary:
• When lactose is absent then there is very little Lac enzyme production
(the operator has LacI bound to it).
• When lactose is present but a preferred carbon source (like glucose) is
also present then a small amount of enzyme is produced (LacI is not
bound to the operator).
• When lactose is the favoured carbon source (for example in the absence
of glucose) the cAMP-CAP complex binds at the promoter and Lac enzyme
production is maximised.
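The three cases in this summary amount to a small decision table with two
inputs, lactose and glucose. A minimal Python sketch follows; the function name
and the three output labels are illustrative choices, and real expression
levels are continuous rather than three discrete steps.

```python
def lac_expression(lactose, glucose):
    """Qualitative lac enzyme level for given sugar availability."""
    repressor_on_operator = not lactose   # no lactose: LacI stays bound
    cap_active = not glucose              # low glucose: high cAMP, cAMP-CAP forms
    if repressor_on_operator:
        return "very low"
    if cap_active:
        return "high"
    return "low"                          # leaky expression, glucose present


for lactose in (False, True):
    for glucose in (False, True):
        print(f"lactose={lactose!s:5} glucose={glucose!s:5} ->",
              lac_expression(lactose, glucose))
```

Note that the repressor check comes first: without lactose the level stays very
low whether or not glucose is present.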
The delay between growth phases reflects the time needed to produce
sufficient quantities of lactose-metabolizing enzymes. First, the CAP regulatory
protein has to assemble on the lac promoter, resulting in an increase in the
production of lac mRNA. More available copies of the lac mRNA result in the
production (see translation) of significantly more copies of LacZ
(β-galactosidase, for lactose metabolism) and LacY (lactose permease to transport
lactose into the cell). After a delay needed to increase the level of the lactose
metabolizing enzymes, the bacteria enter into a new rapid phase of cell growth.
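The two growth phases and the delay between them can be reproduced with a
deliberately crude simulation. Everything numerical in the sketch below (sugar
amounts, growth rate, length of the lag, a yield of one unit of biomass per
unit of sugar) is a hypothetical value chosen only to make the shape of the
curve visible, not a measured parameter.

```python
def diauxic_growth(glucose=10.0, lactose=10.0, biomass=0.1,
                   rate=0.2, lag_steps=5, steps=60):
    """Simulate growth on glucose, then a lag, then growth on lactose.

    Returns a list of (phase, biomass) tuples, one per time step.
    """
    history = []
    lag_left = lag_steps
    for _ in range(steps):
        if glucose > 0:                       # phase I: glucose is used first
            eaten = min(glucose, rate * biomass)
            glucose -= eaten
            phase = "I"
        elif lag_left > 0:                    # lag: lac enzymes being made
            eaten = 0.0
            lag_left -= 1
            phase = "lag"
        elif lactose > 0:                     # phase II: growth on lactose
            eaten = min(lactose, rate * biomass)
            lactose -= eaten
            phase = "II"
        else:                                 # both sugars exhausted
            eaten = 0.0
            phase = "stationary"
        biomass += eaten                      # assumed yield of 1
        history.append((phase, biomass))
    return history


for step, (phase, biomass) in enumerate(diauxic_growth()):
    if step % 5 == 0:
        print(f"step {step:2d}  phase {phase:10s}  biomass {biomass:6.2f}")
```

Plotting biomass against the step number gives two rising segments separated
by a plateau, the same qualitative shape as Monod's bi-phasic curve.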
Two puzzles of catabolite repression relate to how cAMP level is
actually coupled to the presence of glucose, and secondly, why the cells should
even bother. After lactose is cleaved it actually forms glucose and galactose
(easily converted to glucose). In metabolic terms, lactose is just as good a
carbon and energy source as glucose. The cAMP level is related not to
intracellular glucose concentration but to the rate of glucose transport, which
influences the activity of adenylate cyclase. (In addition, glucose transport also
leads to direct inhibition of the lactose permease.) As to why E. coli works this
way, one can only speculate. All enteric bacteria ferment glucose, which
suggests they encounter it frequently. It is possible that a small difference in
efficiency of transport or metabolism of glucose versus lactose makes it
advantageous for cells to regulate the lac operon in this way.
Multimeric nature of repressor and the complex operator
The lac repressor is a tetramer of identical subunits. Each subunit
contains a helix-turn-helix (HTH) motif capable of binding to DNA. The
operator site where repressor binds is a DNA sequence with inverted repeat
symmetry. The two DNA half-sites of the operator together bind to two of the
subunits of the tetrameric repressor. Although the other two subunits of
repressor are not doing anything in this model, this property was not
understood for many years.
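Inverted repeat symmetry means that the sequence reads the same on both strands
in the 5' to 3' direction, so each half-site presents the same sequence to one
repressor subunit. The check below illustrates this on a perfectly symmetric
20 bp sequence chosen for the example; it is not claimed to be the natural
operator, which is only approximately symmetric. The function names are
hypothetical.

```python
def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def is_inverted_repeat(dna):
    """True if the sequence equals its own reverse complement."""
    return dna == reverse_complement(dna)


def half_sites(dna):
    """Return both half-sites, each read 5' to 3' on its own strand."""
    mid = len(dna) // 2
    return dna[:mid], reverse_complement(dna[mid:])


site = "AATTGTGAGCGCTCACAATT"     # illustrative, perfectly symmetric sequence
print(is_inverted_repeat(site))   # symmetric, so both strands read alike
print(half_sites(site))           # the two half-sites are identical
```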
Eventually it was discovered that two additional (minor) operators are
involved in lac regulation. One (O3) lies in the end of the lacI gene and the
other (O2) is about 400 bp downstream in the early part of lacZ. These two sites
were not found in the early work because they have redundant functions and
individual mutations do not affect repression very much. Single mutations to
either O2 or O3 have only 2 to 3-fold effects. However, their importance is
demonstrated by the fact that a double mutant defective in both O2 and O3 is
dramatically de-repressed (by about 70-fold).
In the current model, repressor is bound simultaneously to both the
main operator O1 and to either O2 or O3. The intervening DNA loops out from
the complex. The redundant nature of the two minor operators suggests that it
is not a specific looped complex that is important. One idea is that the system
works through tethering. If bound repressor releases from O1 momentarily,
binding to a minor operator keeps it in the vicinity, so that it may rebind
quickly. This would increase the affinity of repressor for O1.
Mechanism of induction
The repressor is an allosteric protein, i.e. it can assume either one of
two slightly different shapes, which are in equilibrium with each other. In one
form the repressor is capable of binding to the operator DNA, and in the other
form it cannot bind to the operator. According to the classical model of
induction, binding of the inducer, either allolactose or IPTG, to the repressor
affects the distribution of repressor between the two shapes. Thus, repressor
with inducer bound is stabilized in the non-DNA-binding conformation.
However, this simple model cannot be the whole story, because repressor is
bound quite stably to DNA, yet it is released rapidly by addition of inducer.
Therefore it seems clear that repressor can also bind inducer while still bound
to DNA. It is still not entirely known what the exact mechanism of binding is.
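The classical picture, in which inducer shifts the balance between a
DNA-binding and a non-binding shape, can be written as a two-state equilibrium
in the style of the Monod-Wyman-Changeux model. The sketch below is a generic
illustration with hypothetical parameter values (the equilibrium constant L,
the two dissociation constants and the number of inducer sites n are not
measured lac repressor values), and it does not address the observation that
inducer can also act on repressor already bound to DNA.

```python
def fraction_dna_binding(inducer, L=0.01, K_binding=1000.0, K_nonbinding=1.0, n=2):
    """Fraction of repressor in the DNA-binding shape at a given inducer level.

    L            -- ratio [non-binding shape]/[binding shape] without inducer
    K_binding    -- inducer dissociation constant of the DNA-binding shape
    K_nonbinding -- inducer dissociation constant of the non-binding shape
    n            -- number of inducer sites counted in the model
    All values are hypothetical and share the same arbitrary concentration unit.
    """
    binding = (1 + inducer / K_binding) ** n
    nonbinding = L * (1 + inducer / K_nonbinding) ** n
    return binding / (binding + nonbinding)


for c in (0, 1, 10, 100):
    print(f"inducer = {c:3d}   DNA-binding fraction = {fraction_dna_binding(c):.3f}")
```

Because the inducer binds the non-binding shape more tightly, raising its
concentration pulls the equilibrium toward that shape and the DNA-binding
fraction falls.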
Use in molecular biology
The lac gene and its derivatives are amenable to use as a reporter gene
in a number of bacterial-based selection techniques such as two hybrid
analysis, in which the successful binding of a transcriptional activator to a
specific promoter sequence must be determined.[2] In LB plates containing X-
gal, the colour change from white colonies to a shade of blue, corresponds to
about 20-100 β-galactosidase units, while tetrazolium lactose and MacConkey
lactose media have a range of 100-1000 units, being most sensitive in the high
and low parts of this range respectively. [2] Since MacConkey lactose and
tetrazolium lactose media both rely on the products of lactose breakdown, they
require the presence of both lacZ and lacY genes. The many lac fusion
techniques which include only the lacZ gene are thus suited to the X-gal plates
or ONPG liquid broths.[2]
Trp operon
The trp operon is an operon in bacteria which promotes the production of
tryptophan when tryptophan isn't present in the environment. Discovered in
1953 by Jacques Monod and colleagues, the trp operon in E. coli was the first
repressible operon to be discovered. While the lac operon can be activated by a
chemical (allolactose), the tryptophan (Trp) operon is inhibited by a chemical
(tryptophan). This operon contains five structural genes: trpE, trpD, trpC,
trpB, and trpA, which encode the enzymes of tryptophan biosynthesis (trpB and
trpA encode the two subunits of tryptophan synthase). It also contains a
promoter, which binds RNA polymerase, and an operator, which blocks
transcription when bound by the protein encoded by the repressor gene (trpR).
In the lac operon, allolactose binds to the repressor protein and
prevents it from repressing gene transcription, while in the trp operon,
tryptophan binds to the repressor protein and enables it to repress gene
transcription. Also unlike the lac operon, the trp operon contains a leader
peptide and an attenuator sequence which allows for graded regulation.[1]
It is an example of negative regulation of gene expression. Within the
operon's regulatory sequence, the operator is blocked by the repressor protein
in the presence of tryptophan (thereby preventing transcription) and is liberated
in tryptophan's absence (thereby allowing transcription). The process of
attenuation (explained below) complements this regulatory action.
Repression
The repressor for the trp operon is encoded upstream by the trpR gene,
which is continually expressed. It produces monomers, which associate into
dimers. When tryptophan is present, it binds to the tryptophan repressor
dimer and causes a change in conformation that allows the repressor to
bind the operator. The bound repressor prevents RNA polymerase from binding or
transcribing the operon, so tryptophan is not produced. When tryptophan is not
present, the repressor cannot bind the operator, so transcription can occur. This
is therefore a negative feedback mechanism.
Attenuation
Attenuation is a second mechanism of negative feedback in the trp
operon. While the TrpR repressor decreases transcription by a factor of 70,
attenuation can further decrease it by a factor of 10, thus allowing a combined
repression of about 700-fold. Attenuation is made possible by the fact that in
prokaryotes (which have no nucleus), the ribosomes begin translating the
mRNA while RNA polymerase is still transcribing the DNA sequence. This
allows the process of translation to directly affect transcription of the operon.
At the beginning of the transcribed genes of the trp operon is a sequence of 140
nucleotides termed the leader transcript (trpL). This transcript includes four
short sequences designated 1-4. Sequence 1 is partially complementary to
sequence 2, which is partially complementary to sequence 3, which is partially
complementary to sequence 4. Thus, three distinct secondary structures
(hairpins) can form: 1-2, 2-3 or 3-4. The hybridization of strands 1 and 2 to
form the 1-2 structure prevents the formation of the 2-3 structure, while the
formation of 2-3 prevents the formation of 3-4. The 3-4 structure is a
transcription termination sequence: once it forms, RNA polymerase will
dissociate from the DNA and transcription of the structural genes of the
operon will not occur.
Part of the leader transcript codes for a short polypeptide of 14 amino
acids, termed the leader peptide. This peptide contains two adjacent tryptophan
residues, which is unusual, since tryptophan is a fairly uncommon amino acid
(about one in a hundred residues in a typical E. coli protein is tryptophan). If
the ribosome attempts to translate this peptide while tryptophan levels in the
cell are low, it will stall at either of the two trp codons. While it is stalled, the
ribosome physically shields sequence 1 of the transcript, thus preventing it
from forming the 1-2 secondary structure. Sequence 2 is then free to hybridize
with sequence 3 to form the 2-3 structure, which then prevents the formation of
the 3-4 termination hairpin. RNA polymerase is free to continue transcribing
the entire operon. If tryptophan levels in the cell are high, the ribosome will
translate the entire leader peptide without interruption and will only stall during
translation termination at the stop codon. At this point the ribosome physically
shields both sequences 1 and 2. Sequences 3 and 4 are thus free to form the 3-4
structure which terminates transcription. The end result is that the operon will
be transcribed only when tryptophan is unavailable for the ribosome, while the
trpL transcript is constitutively expressed.
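The attenuation decision described above depends only on which leader sequences
the ribosome is shielding. The Python sketch below encodes that rule and also
multiplies the two fold-effects quoted earlier (70 for the repressor, 10 for
attenuation). It is a schematic of the logic, not a model of RNA folding, and
the function names are hypothetical.

```python
def leader_hairpin(tryptophan_high):
    """Return (hairpin, terminates) for the trp leader transcript.

    Low tryptophan:  ribosome stalls at the Trp codons and shields sequence 1.
    High tryptophan: ribosome reaches the stop codon and shields 1 and 2.
    """
    shielded = {1, 2} if tryptophan_high else {1}
    free = [s for s in (1, 2, 3, 4) if s not in shielded]
    # Sequences emerge 5' to 3', so the first two free neighbours pair up.
    hairpin = (free[0], free[1])
    terminates = hairpin == (3, 4)        # only 3-4 is the terminator
    return hairpin, terminates


def combined_fold_repression(repressor_fold=70, attenuation_fold=10):
    """Independent fold-reductions multiply."""
    return repressor_fold * attenuation_fold


print(leader_hairpin(tryptophan_high=False))  # 2-3 forms, transcription continues
print(leader_hairpin(tryptophan_high=True))   # 3-4 forms, transcription stops
print(combined_fold_repression())             # 70 x 10
```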
To ensure that the ribosome binds and begins translation of the leader
transcript immediately following its synthesis, a pause site exists in the trpL
sequence. Upon reaching this site, RNA polymerase pauses transcription and
apparently waits for translation to begin. This mechanism allows for
synchronization of transcription and translation, a key element in attenuation.
A similar attenuation mechanism regulates the synthesis of histidine,
phenylalanine and threonine.

GALACTOSE METABOLISM IN YEAST
Metabolism refers to the biochemical assimilation (in anabolic
pathways) and dissimilation (in catabolic pathways) of nutrients by a cell. As
in other organisms, in yeast these processes are mediated by enzymic reactions,
and regulation of the underlying pathways has been studied in great detail in
yeast. Anabolic pathways include reductive processes leading to the
production of new cellular material, while catabolic pathways are oxidative
processes which remove electrons from substrates or intermediates that are
used to generate energy. Preferably, these processes use NADP or NAD,
respectively, as co-factors.
Although all yeasts are microorganisms that derive their chemical
energy, in the form of ATP, from the breakdown of organic compounds, there
is metabolic diversity in how these organisms generate and consume energy
from these substrates. Knowledge of the underlying regulatory mechanisms is
not only valuable in the understanding of general principles of regulation but
also of great importance in biotechnology, if new metabolic capabilities of
particular yeasts have to be exploited.
It is now well established that most yeasts employ sugars as their main
carbon and hence energy source, but there are particular yeasts which can
utilize non-conventional carbon sources. With regard to nitrogen metabolism,
most yeasts are capable of assimilating simple nitrogenous sources to
biosynthesize amino acids and proteins (Table 3-1). Aspects of phosphorus and
sulphur metabolism as well as aspects of metabolism of other inorganic
compounds have been studied in some detail, predominantly in the yeast,
Saccharomyces cerevisiae.
Table 3-1: Nutrients for growth of yeast (S. cerevisiae) cells.
Substrate Intermediates Enzymes Products

3.1 Sugar Catabolism in Yeast
3.1.1 Principal Pathways
The major source for energy production in the yeast, Saccharomyces
cerevisiae, is glucose, and glycolysis is the general pathway for conversion of
glucose to pyruvate, whereby production of energy in the form of ATP is coupled
to the generation of intermediates and reducing power in the form of NADH for
biosynthetic pathways.
Two principal modes of the use of pyruvate in further energy
production can be distinguished: respiration and fermentation (Figure 3-1). In
the presence of oxygen and absence of repression, pyruvate enters the
mitochondrial matrix where it is oxidatively decarboxylated to acetyl CoA by
the pyruvate dehydrogenase multi enzyme complex. This reaction links
glycolysis to the citric acid cycle, in which the acetyl CoA is completely
oxidized to give two molecules of CO2 and reductive equivalents in the form of
NADH and FADH2. However, the citric acid cycle is an amphibolic pathway,
since it combines both catabolic and anabolic functions. The latter results, for
example, from the production of intermediates for the synthesis of amino acids
and nucleotides. Compounds necessary to drive the citric acid cycle, such as
oxaloacetate and α-ketoglutarate, are replenished by (i) the fixation of CO2 to
pyruvate by the actions of the enzymes pyruvate carboxylase (ATP-dependent)
and phosphoenolpyruvate carboxykinase and (ii) the glyoxylate cycle (a
shortcut across the citric acid cycle), which is important when yeasts are grown
on two-carbon sources, such as acetate or ethanol.

Figure 3-1: Metabolism in yeast under aerobic and anaerobic conditions.

During alcoholic fermentation of sugars, yeasts re-oxidize NADH to NAD in a
two-step reaction from pyruvate, which is first decarboxylated by pyruvate
decarboxylase followed by the reduction of acetaldehyde, catalyzed by alcohol
dehydrogenase (ADH). Concomitantly, glycerol is generated from
dihydroxyacetone phosphate to ensure production of this important compound.
An alternative mode of glucose oxidation is the hexose phosphate
pathway also known as the pentose phosphate cycle, which provides the cell
with pentose sugars and cytosolic NADPH, necessary for biosynthetic
reactions, such as the production of fatty acids, amino acids and sugar alcohols.
The first step in this pathway is the dehydrogenation of glucose-6-phosphate to
6-phosphogluconolactone and generation of one mole of NADPH (by
glucose-6-phosphate dehydrogenase). Subsequently, 6-phosphogluconate is
decarboxylated by the action of phosphogluconate dehydrogenase to give
ribulose-5-phosphate and a second mole of NADPH. Thus, besides generating
NADPH, the other major function of this pathway is the production of ribose
sugars which serve in the biosynthesis of nucleic acid precursors and nucleotide
coenzymes.
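The NADPH yield of the oxidative steps just described can be checked by adding
up the individual reactions. In the sketch below the hydrolysis of the lactone
to 6-phosphogluconate is included as an intermediate step that the text does
not name, protons are left out of the balance, and the data layout and function
name are illustrative only.

```python
from collections import Counter

# (enzyme, substrates consumed, products formed), per mole of glucose-6-phosphate
OXIDATIVE_STEPS = [
    ("glucose-6-phosphate dehydrogenase",
     {"glucose-6-phosphate": 1, "NADP+": 1},
     {"6-phosphogluconolactone": 1, "NADPH": 1}),
    ("lactone hydrolysis (step not named in the text)",
     {"6-phosphogluconolactone": 1, "H2O": 1},
     {"6-phosphogluconate": 1}),
    ("phosphogluconate dehydrogenase",
     {"6-phosphogluconate": 1, "NADP+": 1},
     {"ribulose-5-phosphate": 1, "NADPH": 1, "CO2": 1}),
]


def net_balance(steps):
    """Sum all steps; negative counts are consumed, positive are produced."""
    net = Counter()
    for _, consumed, produced in steps:
        net.subtract(consumed)
        net.update(produced)
    # Intermediates cancel out and are dropped from the result.
    return {species: count for species, count in net.items() if count != 0}


for species, count in net_balance(OXIDATIVE_STEPS).items():
    print(f"{species:22s} {count:+d}")
```

The intermediates cancel, leaving two NADPH, one CO2 and one
ribulose-5-phosphate per glucose-6-phosphate oxidized.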
The redox carriers, NAD and FAD, which become reduced during the
breakdown of sugars to NADH and FADH2, respectively, are reoxidized in the
respiratory (electron transport) chain located in the inner mitochondrial
membrane. The energy released during the transfer of electrons is coupled to
the process of oxidative phosphorylation, which is effected by ATP synthase,
an enzyme complex which is also located in the inner mitochondrial membrane
and designed to synthesize ATP from ADP and inorganic phosphate. These
pathways will be considered separately in chapter 9: Transport.
3.1.2 Regulation of Biochemical Pathways
Biochemical pathways in yeasts are regulated at various levels:
(i) Enzyme synthesis - induction, repression and derepression of gene
expression;
(ii) Enzyme activity - allosteric activation, inhibition, or interconversion of
isoenzymes;
(iii) Cellular compartmentalization - localization of particular pathways to the
cytosol, mitochondria, peroxisomes, or the vacuole;
(iv) Transport mechanisms - internalization, secretion, trafficking of
compounds between the various cellular compartments.
As in the study of many biochemical aspects, yeast as a versatile system has
contributed significantly to deciphering a number of important regulatory
circuits, which in many instances have been conserved among all
eukaryotes investigated thus far. Examples will be presented in chapters
‘Transport’ and ‘Regulation’.
3.1.3 Respiration versus Fermentation
Yeasts can be categorized in several groups according to their modes of
energy production, utilizing respiration or fermentation (Table 3-2). It is
important to note that these processes are mainly regulated by environmental
factors, the best documented being the availability of glucose and oxygen. Thus
yeasts can adapt to varying growth environments, and even within a single
species, the prevailing pathways will depend on the actual growth conditions.
For example, glucose can be utilized in several different ways by S. cerevisiae,
depending on the presence of oxygen and other carbon sources.
Table 3-2: Principal modes of respiration in yeasts.

Types Examples Respiration Fermentation Anaerobic growth


Catabolite repression occurs when glucose or an initial product of glucose
metabolism represses the synthesis of various respiratory and gluconeogenic
enzymes. Catabolite inactivation results in the rapid disappearance of such
enzymes on addition of glucose. In catabolite repression, enzyme activity is lost
by dilution with cell growth. Although enzymes are still present, they are no
longer synthesized due to gene repression by signals derived from glucose or
other sugars. However, the nature of the signal(s) is not clear at present.
Glucose repression in yeast describes a long-term regulatory adaptation to
degrade glucose exclusively to ethanol and CO2. Therefore, when S. cerevisiae
is grown aerobically on high concentrations of glucose, fermentation will
account for the bulk of glucose consumption. In batch cultures, when the
levels of glucose decline, cells become gradually derepressed, resulting in the
induction of respiratory enzyme synthesis. This in turn results in oxidative
consumption of ethanol, when cells enter a second phase of growth known as
the diauxic shift. Catabolite inactivation is more rapid than repression and is
thought to be due to deactivation by glucose of a limited number of key
enzymes, such as fructose 1,6-bisphosphatase. Inactivation occurs primarily by
enzyme phosphorylation, followed by slower vacuolar degradation of the
enzyme. It has been established that cAMP as a second messenger plays a
central role in regulating catabolite repression and inactivation in S. cerevisiae.
Other Sugars – Galactose
Galactose is a 'non-conventional' nutrient for yeast, which however can be
used as a sole carbon source when glucose is absent from the medium. In yeast
cells supplied with glucose, the GAL genes are repressed. They are activated a
thousand fold in cells that are starved for glucose, and this is one of the few
pathways in yeast which is regulated in a nearly 'all-or-nothing' mode. The
three enzymes involved are depicted in Figure 3-2.
Figure 3-2: Metabolism of galactose.
Metabolism of Non-Hexose Carbon Sources
In addition to hexose sugars, yeasts can utilize a number of 'non-
conventional' carbon sources, such as biopolymers, pentoses, alcohols,
polyols, hydrocarbons, fatty acids and organic acids. This is of particular
interest for biotechnological processes, the most prominent being the use of S.
cerevisiae in fermentation. One should also remember that free glucose is scarce
in natural environments or in natural products used to feed yeast cells.
For example, disaccharides, such as maltose, sucrose, melibiose,
lactose or cellobiose can easily be accepted as nutrients by the action of
corresponding hydrolases which break these disaccharides down into their
constituent monosaccharides (Table 3-3). Notably, hydrolysis is coupled to
transport of either the disaccharide or the resulting monosaccharides.

Table 3-3: Disaccharides as substrates in yeasts.

Other saccharide biopolymers, like starch, inulin, cellulose,
hemicellulose, or pectin, can be metabolized directly by some specialized
yeasts, while to serve as carbon sources for other species they have to be
hydrolyzed by non-yeast enzymes before utilization.
Pentose sugars can be fermented to ethanol by only very few yeast species,
although many yeasts can grow aerobically on pentoses. The inability of S.
cerevisiae to ferment xylose (e.g. derived from hemicellulose) could be
circumvented by introducing genes for xylose reductase and xylitol
dehydrogenase from xylose-fermenting species (Pichia) by recombinant DNA
technology. However, the efficiency of xylose fermentation remains low.
Many yeasts have the capability of metabolizing ethanol (Table 3-4) or
methanol, an approach used in biomass production of yeasts of
biotechnological interest. Methanol-utilizing (methylotrophic) yeasts include,
for example, Hansenula polymorpha, Pichia pastoris, several Candida
species, and Torulopsis sonorensis. In these organisms, methanol is first
metabolized by an O2-dependent oxidase to formaldehyde, which is then
converted into dihydroxyacetone (DHA) by a DHA synthase. DHA and
glyceraldehyde-3-phosphate (GAP) can be utilized to synthesize
fructose-6-phosphate.
Glycerol functions as a compatible solute in osmoregulation in
osmotolerant yeasts that are capable of growing in high sugar or salt
environments. Many yeasts can grow on glycerol as a sole carbon source under
aerobic conditions, but glycerol is a non-fermentable carbon source for many
yeast species, including S. cerevisiae. To serve as a carbon source, glycerol
after internalization has to be converted by glycerol kinase to glycerol-3-
phosphate, which is then transformed by glycerol-3-phosphate dehydrogenase
into DHA phosphate, a substrate in gluconeogenesis.

Table 3.4. Use of unusual nutrients in yeasts.

UNIT – V
STRUCTURE
Gene organization and expression in Mitochondria and Chloroplasts.
Transposable elements in maize and Drosophila.
5.1. GENE ORGANIZATION AND EXPRESSION IN CHLOROPLAST
AND MITOCHONDRIA
Introduction
Eukaryotic genomes are much more complex than prokaryotic
genomes. And further, plant genomes are more complex than other eukaryotic
genomes. Prior to the development of recombinant DNA technology, genomes
were analyzed by reassociation kinetics techniques. Reassociation kinetic
experiments are performed by melting DNA and allowing it to reanneal upon
itself or with another population of either DNA or RNA molecules. The
kinetics of the reassociation provides data that can be used to analyze the
overall structure, evolution and expression of genomes.
The most common method of denaturing duplex DNA is by heating to
100 °C. But how can we monitor this denaturation and subsequent renaturation?
The most common method is by measuring the absorbance change in the
ultraviolet region at 260 nm. The important property is that melted, single-
stranded DNA absorbs about 40% more at 260 nm than duplex DNA. If we
slowly heat DNA the absorbance will increase dramatically over a short range
of temperatures. The mid-point of this transition is called the melting
temperature or Tm. Under physiological conditions, the Tm usually lies in the
range of 85-95 °C. Thus, without altering the cellular conditions, the duplex
DNA is stable in the cell. The exact temperature at which a particular DNA melts
depends on several parameters. The GC content is important because GC base
pairs have three hydrogen bonds compared to the two for AT base pairs. Thus,
the higher the GC content the higher the Tm.
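
The relationships described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a laboratory method: the function names and the sample melting curve are hypothetical. The base-pairing rule (three hydrogen bonds per GC pair, two per AT pair) and the definition of Tm as the mid-point of the absorbance transition are taken from the text.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in one strand of a duplex."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def hydrogen_bonds(seq):
    """Hydrogen bonds in the duplex: 3 per GC pair, 2 per AT pair."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 3 * gc + 2 * at


def melting_midpoint(temps, a260):
    """Estimate Tm by linear interpolation at the half-way absorbance.

    temps: temperatures in ascending order (degrees C)
    a260:  absorbance at 260 nm measured at each temperature
    """
    half = (min(a260) + max(a260)) / 2.0
    for i in range(1, len(temps)):
        lo, hi = a260[i - 1], a260[i]
        if lo <= half <= hi and hi != lo:
            frac = (half - lo) / (hi - lo)
            return temps[i - 1] + frac * (temps[i] - temps[i - 1])
    raise ValueError("absorbance never crosses the half-way value")


# Made-up melting curve: absorbance rises about 40% over a narrow range.
temps = [70, 75, 80, 85, 90, 95, 100]
a260 = [1.00, 1.00, 1.02, 1.10, 1.30, 1.39, 1.40]
tm = melting_midpoint(temps, a260)
print(f"estimated Tm: {tm:.1f} C")
print("H-bonds in ATGC duplex:", hydrogen_bonds("ATGC"))
```

For two duplexes of equal length, the one with the higher GC fraction has more hydrogen bonds, which is the reason given above for its higher Tm.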
Effect of GC content on Tm
DNA denaturation is reversible, and the reverse process is called
renaturation. This process requires that the
temperature be lowered gradually. As this lowering of temperature proceeds the
following occurs:
1. Single-stranded molecules randomly encounter each other.
2. Short complementary stretches of duplex DNA form.
3. The DNA then zips back together to form the original structure.
Obviously if a single molecule is denatured it should be able to reform
completely. But if two separate molecules are denatured together, for example
wheat and barley DNA, then complementary regions between the two DNAs

will be able to form duplex DNA. The ability of two molecules to renature is
called hybridization.
Hybridization - the pairing of complementary nucleic acids
Performing hybridization experiments in a solution is called liquid
hybridization. We have already discussed the related procedure called filter
hybridization.
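
The idea of complementary pairing can be illustrated with a short Python sketch. It assumes perfect Watson-Crick pairing between antiparallel strands and ignores temperature, salt and mismatches, all of which matter in a real hybridization experiment. The function names and sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the strand that pairs with seq, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def can_hybridize(strand_a, strand_b):
    """True if the two strands are perfectly complementary over their full length."""
    return strand_b.upper() == reverse_complement(strand_a)


def longest_complementary_stretch(strand_a, strand_b):
    """Length of the longest run of strand_a that could pair with strand_b."""
    strand_a = strand_a.upper()
    target = reverse_complement(strand_b)
    best = 0
    for i in range(len(strand_a)):
        for j in range(len(target)):
            k = 0
            while (i + k < len(strand_a) and j + k < len(target)
                   and strand_a[i + k] == target[j + k]):
                k += 1
            best = max(best, k)
    return best


print(can_hybridize("ATGC", "GCAT"))
print(longest_complementary_stretch("AAATGCAAA", "TTGCATCC"))
```

The second function mirrors the situation of two different DNAs denatured together: only the regions that happen to be complementary can form duplex.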
Chloroplast Genome Organization
All angiosperms and land plants have cpDNAs which range in size from
120-160 kb; three exceptions are:

Species Size (kb)

N. accuminati 171

Duckweed 180

Geranium 217

All cpDNA molecules are circular and spinach is used as the basis for
all comparisons. Very few repeat elements are found other than short sequences
of less than 100 bp. The notable exception is a large (10-76 kb) inverted repeat
section, which when present, always contains the rRNA genes. (Legumes such
as pea do not contain this repeat.) For the majority of species, this repeat region
is 22-26 kb in size. Finally, the genetic order of the ribosomal unit is conserved
in all species:
16S - tRNAile - tRNAala - 23S - 5S
Recent research has also described two other features of chloroplast
DNA. First, it was shown that it can exist in two orientations. This implies
that the molecule can undergo an isomerization event. Second, it has been
shown that the cpDNAs of spinach, corn, tomato and pea can all exist as
multimers.

Multimer Relative Abundance Percent

Monomer 1 67.5

Dimer 1/3 22.5

Trimer 1/9 7.5

Tetramer 1/27 2.5
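
The percent column follows directly from the relative abundances. The short calculation below is a sketch that assumes the four species listed are the only ones present; it normalizes the ratios 1 : 1/3 : 1/9 : 1/27 so that they sum to 100.

```python
from fractions import Fraction

# Relative abundances from the table: each multimer is one third as
# abundant as the next smaller one.
relative = {
    "Monomer": Fraction(1, 1),
    "Dimer": Fraction(1, 3),
    "Trimer": Fraction(1, 9),
    "Tetramer": Fraction(1, 27),
}

total = sum(relative.values())  # 40/27
percent = {name: float(100 * value / total) for name, value in relative.items()}

for name, pct in percent.items():
    print(f"{name:<9}{pct:5.1f}")
```

Because the ratios sum to 40/27, the monomer accounts for 27/40 of all molecules, which reproduces the 67.5% given in the table.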

Because photosynthesis is the primary function of the chloroplast it is
not surprising that the chloroplast genome contains genes which encode
proteins that are involved in that process.

Reaction          Function

Dark Reactions    rbcS (nuclear encoded)
                  rbcL (chloroplast encoded)

Light Reactions   apoproteins for PSI and PSII
                  cytochrome b6
                  cytochrome f
                  6 of 9 ATPase subunits
                  cab, LHC proteins (nuclear encoded)
                  plastocyanin (nuclear encoded)
                  ferredoxin (nuclear encoded)

Other             19/60 ribosome binding proteins
                  translation factors
                  RNA polymerase subunits
                  tRNA and rRNA genes

Atrazine resistance is apparently mediated through the psbA gene
sequences of the 32 kd protein which is encoded by cpDNA. DNA sequence
analysis revealed the following amino acid changes that are thought to be
important.

Species AA# Susceptible Resistant

Blue green algae 264 Ser (TCG) Ala (GCG)

Chlamydomonas 264 Ser (TCT) Ala (GCT)

Solanum nigrum 264 Ser (AGT) Gly (GGT)

Amaranthus 228 Ser (AGT) Gly (GGT)
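
Each change in the table is a single-base substitution within one codon. The sketch below checks this for the codons listed; the small lookup table contains only those codons (with their standard genetic code meanings), and the function and variable names are hypothetical.

```python
# Only the codons that appear in the table, with standard genetic code meanings.
CODON_TO_AA = {
    "TCG": "Ser", "TCT": "Ser", "AGT": "Ser",
    "GCG": "Ala", "GCT": "Ala",
    "GGT": "Gly",
}

# (species, residue number, susceptible codon, resistant codon) from the table
PSBA_CHANGES = [
    ("Blue green algae", 264, "TCG", "GCG"),
    ("Chlamydomonas", 264, "TCT", "GCT"),
    ("Solanum nigrum", 264, "AGT", "GGT"),
    ("Amaranthus", 228, "AGT", "GGT"),
]


def describe_change(susceptible, resistant):
    """Return (codon position, old base, new base, old aa, new aa)."""
    diffs = [(i + 1, a, b)
             for i, (a, b) in enumerate(zip(susceptible, resistant)) if a != b]
    if len(diffs) != 1:
        raise ValueError("expected exactly one base difference")
    pos, old, new = diffs[0]
    return pos, old, new, CODON_TO_AA[susceptible], CODON_TO_AA[resistant]


for species, residue, sus, res in PSBA_CHANGES:
    pos, old, new, aa_old, aa_new = describe_change(sus, res)
    print(f"{species}: {aa_old}{residue}{aa_new}, codon base {pos} {old}->{new}")
```

In all four cases the substitution falls in the first base of the serine codon.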

Evolutionary Changes of cpDNA


1. The majority of changes are small insertions and deletions of 1-106 bp;
significantly, a few length mutations of 50-1200 bp are clustered in "hot
spots".
2. The largest deletion occurred in pea where an entire rRNA cluster is lost.

3. The most common evolutionary change is in gene order. Small changes
in the gene order occur, especially in the algae, but inversions have
generated large scale order changes:
o legumes - about 50 kb inversion brought rbcL closer to psbA
o wheat - about 25 kb inversion brought atpA closer to rbcL
MITOCHONDRIAL GENOME ORGANIZATION
In comparison to the chloroplast genome, the size of the mitochondrial genome
is quite variable.

Species Size (kb)

Oenothera 195

Turnip 218

Corn 570

Muskmelon 2400

Further, in comparison to the mitochondrial genomes of other species
the size is quite large and variable. For example, animal mitochondrial
genomes range in size from 15-18 kb, and fungal mitochondrial genomes range
from 18-78 kb.
Plant mitochondrial genomes may code for more proteins than those of other
species. For example,
genes for ribosomes, subunits I and II of cytochrome oxidase and ATPase
subunits are located on the mitochondrial genomes of plants.
When DNA from corn mitochondria was investigated with EM, several
circular molecules of different sizes were detected. Once the genome was
mapped it became apparent that a mechanism existed to generate these circles
of different sizes. It is now understood how these molecules arise. First, let's
look at the simple situation of turnip. Two direct repeats undergo
intramolecular recombination to give the two smaller molecules:
218 kb -------> 135 kb + 83 kb
The mitochondrial genome of corn undergoes the same type of
recombination, but the events are more complex. First, the master circles can be
subdivided into two major subgroups:
570 kb -------> 488 kb + 82 kb
570 kb -------> 503 kb + 67 kb
The second group of molecules is still labile and can produce several
other subpopulations. Further, two sub-genomic circles can unite to form a
larger circle. This variability is possible because corn has 10 repeats with which
intra-molecular recombination can occur.

Species          Master Circle Size (kb)   Sub-genomic Circle Size (kb)   Repeat Size (kb)

Turnip           218                       135 + 83                       2
Cauliflower      217                       172 + 45                       ?
Black Mustard    231                       135 + 96                       7
White Mustard    208                       none                           none
Radish           242                       139 + 103                      10
Spinach          327                       234 + 93                       6
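
Intramolecular recombination between direct repeats splits the master circle without losing DNA, so the sub-genomic circle sizes should add up to the master circle size. The sketch below checks this for the values quoted in the text and table. The labels "first split" and "second split" for corn are added here only to tell the two events apart.

```python
# (species, master circle kb, sub-genomic circle sizes kb) from the text and table
GENOMES = [
    ("Turnip", 218, (135, 83)),
    ("Cauliflower", 217, (172, 45)),
    ("Black Mustard", 231, (135, 96)),
    ("Radish", 242, (139, 103)),
    ("Spinach", 327, (234, 93)),
    ("Corn (first split)", 570, (488, 82)),
    ("Corn (second split)", 570, (503, 67)),
]


def conserves_length(master, parts):
    """True if the sub-genomic circles together account for the whole master circle."""
    return sum(parts) == master


results = {name: conserves_length(master, parts) for name, master, parts in GENOMES}

for name, ok in results.items():
    print(f"{name:<20} {'consistent' if ok else 'MISMATCH'}")
```

Every entry is consistent, for example 135 + 83 = 218 kb for turnip.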

Introns have been located in the cytochrome oxidase subunit II gene
(the cytochrome complex consists of 3 mitochondrial and 4 nuclear encoded
genes). This gene contains one intron in rye, corn, wheat, rice, and carrot, but
in other species such as Oenothera, broad bean and cucumber the gene has no
intron.
Promiscuous DNA
Stern and Lonsdale (1982) hybridized mtRNA to an SstII digest of maize
mtDNA and found that it hybridized to fragments known not to contain mt
rRNA genes. The question of interest was: what was it hybridizing to? They
next looked at a cosmid clone of corn mtDNA that hybridized to the mtRNA
and found that it hybridized to an RNA molecule of the size of the cp 16S RNA
gene. How could this have happened?
They next mapped the clone and compared it to the map of the corn
cpDNA and found that the clone map was almost congruent with that of the
cpDNA 16S RNA region. This mapping showed that the two maps were nearly
identical over a 12 kb region of DNA. These results suggest that cpDNA had
been transferred to the mitochondrial genome.
The observation that organelle DNA is found in other DNA
compartments of the cell was extended by other researchers. Stern and Palmer
looked at corn, mung bean, spinach and pea and found extensive evidence of
cpDNA/mtDNA homology. These observations were extended to other DNA
locations in the plant cell. Kemble et al. (1983) demonstrated that mitochondrial
DNA sequences are located in the nucleus of corn. Scott and Timmis (1984)
showed that cpDNA sequences are found in the nuclear DNA.
CHLOROPLAST GENOME - The chloroplasts of green plants are
cytoplasmic organelles that house the various pigments and enzymes of the
light harvesting photosynthetic apparatus. Even before the turn of the century it
was clear that green pigmentation was one of the easiest traits to observe in
plant breeding experiments. Although some pigmentation traits obeyed
Mendel's laws, other colour traits were only transmitted through the female
parent that provided the cytoplasm of the zygote.
These observations of cytoplasmic or maternal inheritance eventually
led to the hypothesis that chloroplasts must carry genes. We know that
chloroplasts contain a unique circular DNA genome that is completely different
from the nuclear genome. The presence of a genetic system within chloroplasts
had already been inferred from studies on non Mendelian inheritance in 1909,
but the presence of organellar DNA and ribosomes was demonstrated only in
1962.
Since then it has been shown that chloroplasts and other plastids contain
all the machinery necessary for gene expression. The chloroplast genetic
components form a large proportion of those in the leaf, comprising up to 15%
of the total DNA and up to 60% of the total ribosomes. The chloroplast genome
has been extensively characterized from a variety of species and cooperation
between the chloroplast and nuclear genome in chloroplast biogenesis is
currently under investigation.
Electron micrographs indicate that the chloroplast DNA is some 10 to
20 times smaller than the E. coli chromosome. For example, the chloroplast
genome of maize (corn) contains 140,000 base pairs of DNA. Such genomes
are much too small to encode the approximately 1,000 different proteins found
in chloroplasts. Instead, biosynthesis of the chloroplast involves an intimate
collaboration between the nuclear and chloroplast genomes.
In fact, every known multimeric protein component of chloroplasts is a
mixture of the products of both nuclear and chloroplast genes. Most chloroplast
proteins are encoded by nuclear DNA, translated in the cytoplasm, and
imported into the chloroplast by a specific transport mechanism that enables
polypeptides to cross the outer membrane of the organelle.
However, some 100 chloroplast-specific proteins are synthesized within
the chloroplast itself. These proteins are encoded by chloroplast DNA,
transcribed by the chloroplast-specific RNA polymerase, and translated by the
chloroplast-specific protein-synthesizing machinery. Since RNA cannot cross
the outer membrane of the chloroplast, chloroplast ribosomal RNAs and tRNAs
must be encoded in chloroplast DNA.
Chloroplasts are not static organelles but can adapt to different
physiological conditions, such as high or low levels of light. For example,
when grown entirely in the dark, chloroplasts lack chlorophyll but retain
carotenoid pigments. Thus many chloroplast genes are light regulated, in certain
cases by light-sensitive promoters.
CHLOROPLAST GENOME ORGANIZATION
The chloroplast is the chlorophyll-containing organelle that carries out
photosynthesis and starch grain formation in plants. Like mitochondria,
chloroplasts contain DNA and ribosomes, both with prokaryotic affinities. The
DNA of chloroplast (cpDNA) is a circle that ranges in size from 120-190 kb.
The chloroplast DNA, like mitochondrial DNA, controls the production
of tRNAs, ribosomal RNAs and some of the proteins found within the
organelle.
In more than 25 chloroplast DNAs from different plant species that
have been sequenced, there seem to be about 87-183 genes in the chloroplast
genome. The chloroplast genome codes for all the rRNA and tRNA species
needed for protein synthesis.
The ribosome includes two small rRNAs in addition to the major
species. The tRNA set may include all of the necessary genes. The organelle
genes are transcribed and translated by the apparatus of the organelle.
About half of the chloroplast genes code for proteins involved in
protein synthesis. The endosymbiotic origin of the chloroplast is emphasized by
the relationship between these genes and their counterparts in bacteria.
The organization of the rRNA genes in particular is closely related to
that of a cyanobacterium, which pins down more precisely the last common
ancestor between chloroplast and bacteria.
Introns in chloroplast fall into two general classes. Those in tRNA
genes are usually located in the anticodon loops, like the introns found in yeast
nuclear tRNA genes. Those in protein coding genes resemble the introns of
mitochondrial genes.
This places the endosymbiotic event at a time in evolution before the
separation of prokaryotes with uninterrupted genes. The role of the chloroplast
is to undertake photosynthesis. Many of the chloroplast genes code for proteins
that are located or function in thylakoid membranes.
But quite strangely, even some protein complexes found on the thylakoid
membrane are coded by nuclear genes. Thus on the thylakoid membranes you
find proteins coded by both the chloroplast genome and the nuclear genome.
Other chloroplast complexes are coded entirely by one genome.
STRUCTURE AND ORGANISATION OF CHLOROPLAST GENOME
The chloroplast genomes of vascular plants and most algae are quite
similar in general structure and organization, especially by comparison to the
wholesale variation seen in the nuclear and mitochondrial genomes. With one
possible exception, all known chloroplast genomes are circular DNA
molecules. Size variation is greatest among green algae in which most
chloroplast genomes range between about 85 and 300 kb.
The genome of Acetabularia chloroplasts is exceptional in being very
large (approximately 2,000 kb) and perhaps composed of linear rather than
circular DNA molecules. However, in angiosperms chloroplast genomes in all
but two of over 200 species examined are circular and range in size between
120 and 160 kb. At the low end of this range is a single group of legumes which
lack one copy of the large (15-25 kb) repeated sequence characteristic of most
other chloroplast genomes.
Thus the great majority of angiosperm chloroplast genomes actually fall
into the relatively narrow range of 135 to 160 kb. Chloroplast DNA (ctDNA)
consists of a circular molecule with a molecular weight of 83-128 x 10^6 and a
size of 1.21-1.93 x 10^5 bp, which contains about 85% single copy sequences.
DNA is present in about 30-200 copies per chloroplast.
A number of genes have been located on the circle and one of the
important features is the presence of two copies of the ribosomal DNA
sequences. These sequences are often, but not always, present on a large
inverted repeat. Other genes mapped include those for the large subunit of
RuBPCase, tRNAs, subunits of ATP synthase, and cytochrome.
Most of this size variation can be accounted for by the presence or
absence of a portion of the plastid genome which has been duplicated and is
present in an inverted orientation in the plastid DNA molecule. The location of
this inverted repeat is relatively fixed with respect to other genes and it
separates a small single-copy region from a large single copy DNA region.
In most higher plants the inverted repeat is 22 to 26 kbp, within which
the rRNA transcription unit is located. In geranium the repeated DNA is larger
and genes such as psbB, petB, petD, petA and rbcL are included in the inverted
repeat. Finally, some plastid genomes, such as those in pea and mung bean,
lack the inverted repeat.
Plastid gene content in higher plants is very constant and many
polycistronic transcription units are conserved. Several gene pairs, such as
psaA-psaB, psbD-psbC and atpB-atpE, are cotranscribed in all the higher plant
plastid genomes examined to date. The cotranscription of genes may ensure
that the synthesis of subunits is stoichiometric and/or could promote
protein-protein interactions required for assembly of functional complexes.
For example, psaA and psaB encode polypeptides which are tightly
associated in the reaction centre of PSI. Other genes, such as rbcL and some
genes encoding tRNAs, are not part of polycistronic transcription units. While
the plastid gene content of higher plants is very constant, variation in gene
order is evident, which results primarily from DNA inversion.
DNA inversions have reshuffled plastid genomes such that distances between
genes, and relative orientation of transcription units, vary considerably in
genomes of higher plants. For example, rps16 is proximal to trnK in barley,
whereas in pea rbcL occupies this position. The greatest variation in gene order
is found in peas (at least 12 rearrangements), perhaps due to the lack in this
plastid genome of an inverted repeat, which might stabilize the genome.
Comparison of Size of Some Chloroplast DNA

DNA                                 Size in Base Pairs   Size of Circle (µm)

Plasmid                             1-200 x 10^3         ---
E. coli (chromosome)                3.8 x 10^6           ---
Euglena ctDNA                       1.4 x 10^5           40-44
Tobacco ctDNA                       1.6 x 10^5           ---
Maize ctDNA                         1.36 x 10^5          43
Broad bean ctDNA                    1.21 x 10^5          39
Arabidopsis total nuclear genome    2 x 10^8             ---
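
The two size columns of the table can be related by a simple conversion. The sketch below assumes B-form DNA with a rise of about 0.34 nm per base pair (a standard textbook value, not stated in the table) and compares the predicted contour length with the circle sizes listed. The function name is hypothetical.

```python
RISE_PER_BP_NM = 0.34  # assumed rise per base pair for B-form DNA


def contour_length_um(base_pairs):
    """Contour length in micrometres of a duplex of the given size."""
    return base_pairs * RISE_PER_BP_NM / 1000.0


# (name, size in bp, measured circle size in um) from the table
MEASURED = [
    ("Maize ctDNA", 1.36e5, 43),
    ("Broad bean ctDNA", 1.21e5, 39),
]

for name, bp, measured in MEASURED:
    predicted = contour_length_um(bp)
    print(f"{name}: predicted {predicted:.1f} um, table {measured} um")
```

The predicted lengths come out a few micrometres longer than the tabulated ones, so the conversion is best read as an order-of-magnitude check on the table rather than an exact relation.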

DNA COPY NUMBER AND LOCALIZATION - Multiple copies of the
plastid genome are found in each cell and in each plastid as well. The amount
of DNA per plastid varies with the stage of leaf and chloroplast development.
Proplastids with as few as 22 copies of DNA have been reported, whereas
chloroplasts in general contain 200 to 300 DNA molecules. The polyploid
nature of the plastid genome is even more striking when one considers leaf
cells.

In pea, barley and spinach each natural leaf cell contains 9,000 to
13,000 copies of plastid DNA dispersed in 40 to 120 plastids. The very high
effective ploidy of the chloroplast genome means that a significant fraction of
the total DNA in a cell (as much as 30% if the nuclear genome is small) may be
of chloroplast origin. This means that rbcL, a single copy plastid gene which
encodes the large subunit of Rubisco, is present in about 10,000 copies in a
mesophyll cell.
OTHER RNA PROCESSING ACTIVITIES
Although plastid RNAs are neither capped nor polyadenylated, RNA
maturation pathways can be very complex. In addition to intron removal,
primary transcripts are often cleaved to remove a portion of the untranslated
RNA proximal to open reading frames.
The function of the 5' end RNA processing is not known but it may alter
transcript stability or translatability. Polycistronic transcripts are also processed
at internal sites. Here again the role RNA processing plays in gene expression is
unclear but one result is differential accumulation of RNA from some parts of
long transcription units.
PROTEIN GENES
Whether or not most chloroplast protein genes are transcribed as parts
of operons remains to be determined. It is known that rbcL and psbA produce
very abundant monocistronic transcripts but these genes may be exceptions to
the general rule. Transcript mapping studies indicate that the situation in other
regions of the genome may be much more complex.
When Northern blots of electrophoretically separated RNA are probed
by hybridization with small cloned fragments of chloroplast DNA, numerous
RNA bands with homology to the probe are often seen. Some of the RNA
molecules visualized in this way are quite large (for example, 4-8 kb), much
larger than any one gene, and often many times the length of the probe.
In some cases, it has been shown that most of the RNAs in a series of
bands come from the same strand of DNA. Since in angiosperms most
chloroplast protein coding genes do not contain introns, this multiplicity of
RNAs must reflect the use of multiple initiation or termination sites and/or the
processing of a long primary transcript.
In both cases the initial transcripts are polycistronic and the production
of mature, translatable mRNA must involve processing steps. Many of the
intermediate size RNA bands may be processing intermediates of various types.
These observations of polycistronic transcripts are surprising since chloroplast
protein genes (in contrast to rRNA genes) generally are not organized into the

prokaryotic operon pattern in which functionally related genes are closely
linked. Some remnants of a prokaryotic operon structure can be discerned in
chloroplast genomes, however. One case involves genes for the thylakoid
membrane ATPase complex.
In E. coli the atp genes are part of a well defined operon. Some remnants
of such a structure may be imagined in the plastid genome where atpB and atpE
are very close together. In fact, they actually overlap in many chloroplast
genomes, with the sequence ATGA containing both the first codon of the atpE
coding sequence (ATG) and the translation stop codon of atpB (TGA). The
genes atpA and atpH are also found fairly close together (within 2 kb) but are
located far away from atpB and atpE.
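
The overlap described above, in which the four bases ATGA hold a start codon (ATG) and, one base later, a stop codon (TGA), can be located with a simple scan. This is a minimal sketch on made-up sequences; it does not check that the two codons lie in the reading frames of real genes, and the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def overlapping_start_stop(seq):
    """Return 0-based positions where an ATG start overlaps a stop codon.

    At each hit, seq[i:i+3] is ATG (start of the downstream gene) and
    seq[i+1:i+4] is a stop codon (end of the upstream gene), so the two
    genes share two bases. In practice this is the pattern ATGA.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 3):
        if seq[i:i + 3] == START_CODON and seq[i + 1:i + 4] in STOP_CODONS:
            hits.append(i)
    return hits


print(overlapping_start_stop("CCAAATGAGCC"))  # synthetic example with ATGA
print(overlapping_start_stop("CCATGGCC"))     # ATG followed by G: no overlap
```

Note that the stop codon sits one base out of frame relative to the start codon, so the two coding sequences are read in different frames.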
Genes for the other components of the ATPase complex are not found
in the chloroplast DNA; they are located in the nucleus. Examples of
functionally related genes scattered in different locations in the plastid genome
include those for chloroplast ribosomal proteins, proteins of the cytochrome
b6/f complex, and proteins of the photosynthetic reaction centre.
Additional components of some of these same complexes, notably a
large number of ribosomal proteins, are encoded within the nucleus. Relatively
few protein genes in angiosperm chloroplasts have introns. Only six introns
were detected in an electron microscopic analysis by B. Koller and H. Delius of
Vicia faba chloroplast DNA-RNA hybrids.
They were detected under conditions in which at least fifty introns
(accounting for over 20%) in the Euglena chloroplast genome could be shown.
Such a large difference between Euglena and higher plants is not necessarily
surprising since their chloroplast genomes are thought to have arisen from
separate endosymbiotic events. What is surprising is that in both Euglena and
higher plants in which intron containing chloroplast genes have been
sequenced, the genes appear to have similar intron boundary sequences.
This observation indicates that the mechanism for intron excision and
splicing for chloroplast protein genes may be similar to that for nuclear
mRNAs in that conserved intron boundary sequences seem to direct the
splicing process. This mechanism differs from intron processing in tRNA
genes; ergo, two different splicing mechanisms probably exist in the
chloroplast.
CHLOROPLAST PROMOTER SEQUENCES
Searches for conserved promoter like sequences upstream from
chloroplast genes have revealed elements with considerable homology to
bacterial promoter sequences. The E. coli "consensus" promoter sequence
contains two conserved regions, one normally found about 35 nucleotides and

the other about 10 nucleotides upstream from the start of transcription and
referred to as the "-35" and "-10" elements respectively:
5'... TTGACA/T... (16-18 nucleotides)... TATG/AAT... 3'. The
chloroplast consensus sequence described by L. Bogorad and his colleagues at
Harvard University is as follows: 5'... A/g TTG/cA/cNa/t... (15-20
nucleotides)... T A/tA/tG/aA T... 3'. (In the preceding line, lower case
letters represent less frequent alternative bases.)
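
A search for promoter-like sequences of this kind can be sketched with a regular expression. The pattern below is one reading of the E. coli consensus as written above: "A/T" is taken to mean that the last base of the -35 element is A or T, and "G/A" that the fourth base of the -10 element is G or A. That reading is an assumption. The sketch reports exact matches only, whereas real promoters usually deviate from the consensus at some positions. The example sequence is synthetic.

```python
import re

# -35 element TTGAC[A/T], a 16-18 nt spacer, then the -10 element TAT[G/A]AT.
ECOLI_PROMOTER = re.compile(r"(TTGAC[AT])([ACGT]{16,18})(TAT[GA]AT)")


def find_promoters(seq):
    """Return (start, -35 element, spacer length, -10 element) for each match."""
    return [
        (m.start(), m.group(1), len(m.group(2)), m.group(3))
        for m in ECOLI_PROMOTER.finditer(seq.upper())
    ]


# Synthetic test sequence (not a real promoter) with a 17-nt spacer.
example = "GG" + "TTGACA" + "C" * 17 + "TATAAT" + "GGG"
print(find_promoters(example))
```

The same approach could be adapted to the chloroplast consensus by widening the character classes and allowing a 15-20 nucleotide spacer.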
In several cases wherein the start of the RNA transcript has been
identified by S1 nuclease protection experiments, it occurs within eight
nucleotides of the proximal promoter elements. The conservation of these
sequence elements and their homology with bacterial promoters support, but
do not prove, the notion that they function as chloroplast gene promoters.
Proof requires the direct demonstration that removing or changing these
sequences actually affects promoter activity. One way of obtaining such proof
is to test altered genes in an in vitro transcription system.
The studies undertaken made progressive deletions of sequences 5' to
the gene, moving gradually closer to the start of transcription. Each deletion
mutant was characterized by DNA sequencing and then tested for its ability to
support accurate transcription. Deletions of sequences further upstream than
position -85 had little effect on the production of transcripts, but there was a
rapid drop in transcription caused by deletions between -80 and -75.
This region contains the sequence TTGCTTA, the first three
nucleotides of which are homologous to the E. coli -35 consensus sequence.
There is also a TATAAT sequence between -54 and -59 which is fully
homologous to the E. coli -10 consensus sequence. Since transcription was
already inhibited by the removal of the sequences further upstream, little effect
was seen when the TATAAT sequence was deleted.
NUCLEAR GENES ENCODING PLASTID PROTEINS
The majority of plastid localized proteins are encoded by nuclear genes.
These genes are transcribed by RNA polymerase II and the resultant transcripts
are spliced, capped and polyadenylated in the nucleus. The mRNAs then are
translated by 80S ribosomes in the cytoplasm to produce proteins which can be
transported into plastids posttranslationally.
Following uptake into the chloroplast, the proteins are assembled with
cofactors and other proteins to form functional complexes. In genes of the
chloroplast genome itself, by contrast, the control sequences at the 5'-end
which are involved in initiating transcription are very similar to the
"Pribnow" box and the "-35" region characteristic of bacterial genes.
The mRNA produced from chloroplast genes is not usually
polyadenylated, although short sequences of up to 20 residues have been
reported and, of course, no transport is required as the mRNA is produced in
the same compartment as the ribosomes on which it will be translated. There is
no evidence that chloroplast mRNA is transported out of the chloroplasts and
translated on cytoplasmic ribosomes.
The chloroplast tRNA population differs distinctly from that in the
cytoplasm, as do the aminoacylating enzymes. Ribulose bisphosphate
carboxylase is the major protein component of chloroplasts and its synthesis is
a good example of cooperation between the genomes. The enzyme consists of
eight identical large catalytic subunits encoded by the chloroplast genome and
eight identical small regulatory subunits encoded in the nucleus.
After synthesis, the large subunit (which has limited solubility) is
probably bound by a stabilizing protein to maintain solubility prior to assembly
into active enzyme. The small subunit is synthesized in the cytoplasm, on free
ribosomes, with an N-terminal leader peptide of about 20 amino acids.
The complete peptide is then taken up into the chloroplast in an
ATP-dependent manner, accompanied by removal of the leader peptide by a stromal
peptidase. This mechanism is not analogous to that involving signal peptide
cleavage. The synthesis of this important enzyme clearly depends on the
coordinate expression of genes in different genomes.
GENE CONTENT AND ARRANGEMENT - Chloroplast DNA encodes a
complete set of ribosomal RNAs and tRNAs as well as many proteins. The
ribosomal RNA genes are contained in the inverted repeat and thus are present
in two copies per genome. They are organized into an operon which has many
similarities to bacterial rRNA operons. About 40 tRNAs, including acceptors
for all twenty amino acids, have been identified and mapped in the chloroplast
genomes of several species of plants.
Although tRNA genes are arranged in clusters in the Euglena
chloroplast genome, they are dispersed throughout the genome of higher plants.
An exception is the presence of two tRNA genes in the transcribed spacer on
the 5' end of the rRNA operon. Complete DNA sequences for entire chloroplast
genomes are now available and so it is possible to determine the number of
genes more precisely.
Computers are used to search the data for sequences with the properties
of genes, for example "open reading frames" which contain a series of amino
acid codons beginning with an initiation codon and ending with a termination
codon and sequences that are homologous to known tRNA genes. It is possible
from such work to predict the presence of more than 120 genes. About thirty of
these are tRNA genes and another four encode ribosomal RNAs. There are
about 85 protein coding genes.
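The kind of computer search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used in the genome studies: it scans only the forward strand, treats ATG as the only initiation codon, uses the standard stop codons TAA, TAG and TGA, and the function name, the `min_codons` parameter and the test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for open reading frames on the forward strand.

    An ORF here starts at ATG and runs to the first in-frame stop codon.
    start and end are 0-based and end-exclusive; the stop codon is included.
    min_codons counts the codons from the start codon up to, but not
    including, the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


# Hypothetical test sequence: ATG AAA CCC GGG TAA in the third reading frame.
print(find_orfs("CCATGAAACCCGGGTAACC"))  # prints [(2, 17, 2)]
```

A practical gene search would also scan the complementary strand, apply a much larger minimum length, and compare candidate sequences with known genes, as the text describes for tRNA genes.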
Most of these encode proteins that have been identified in chloroplasts
or are similar to proteins identified in other organisms, such as ribosomal
proteins and RNA polymerase subunits. Some of the protein coding genes
which have been mapped on the chloroplast genome in one or more species of
higher plants are listed below under the heading on mapped chloroplast genes.
All green plants for which we have information contain essentially the
same set of genes in their chloroplast genomes. The reason for this similarity is
not immediately obvious. One possible explanation is based on the widely
accepted notion that chloroplasts arose from cyanobacterial endosymbionts.
Since cyanobacterial genomes are some 20-30 times larger than those of
chloroplasts, a dramatic reduction in the plastid genome must have occurred
after the endosymbiotic association was established. If this reduction occurred
quite soon after the original endosymbiosis and prior to the divergence of
lineages leading to different groups of green plants, the similarity of green plant
chloroplast genomes might reflect common ancestry.
On the other hand, one might postulate that certain genes are retained in
the chloroplast genome for reasons of function. For example, genes for
ribosomal and transfer RNAs may have been retained in the chloroplast because it
is difficult or impossible to transport RNAs across the chloroplast membrane.
The same may be true for certain chloroplast proteins.
Some researchers have suggested that natural selection might act to
keep certain genes in the plastid genome if their products are required to "lock
in" or promote the accumulation of certain other proteins originally synthesized
in the cytoplasm. This might apply in particular to multicomponent protein
complexes composed of chloroplast and nuclear coded proteins, such as the
enzyme ribulose bisphosphate carboxylase (Rubisco), the chloroplast
ribosomes, the ATP synthase and cytochrome b6/f complexes, and both the
photosynthetic reaction centres.
A related hypothesis suggests that chloroplast coded proteins might be
important in regulating the level of such multicomponent complexes. This
would ensure that control of plastid functions remains in the plastid genome,
which would help to explain the remarkable fact that so many multisubunit
complexes include components encoded by both chloroplast and nuclear genes.
Although we do not yet have much data on chloroplast DNA in
chromophytic and rhodophytic algae, it is clear that these genomes do encode a
number of the same genes found in the higher plant chloroplasts. In some cases
they have also been shown to contain genes not present in the plastid genomes
of higher plants.
MAPPED CHLOROPLAST GENES WITH CHLOROPLAST DNA
Ribosomal RNAs: Ribosomal RNA operons are designated rrnA, rrnB, rrnC and so
forth, depending on the number of gene sets in the particular genome. In most
chloroplasts there is one operon in each segment of the inverted repeat. These
are normally identical in sequence. By convention, rrnA is on the right side of
the chloroplast DNA restriction map with the large single copy region at the
top. The rbcL gene is on the left side. Each operon normally includes genes for
16S rRNA, 23S rRNA, 5S rRNA, and 4.5S rRNA.
Transfer RNAs: tRNA genes are designated "trn" to indicate transfer
RNA, followed by the single letter amino acid code indicating the amino acid
accepted by the tRNA encoded by the gene. Where there is more than one gene
for a particular amino acid, the isoaccepting species can be indicated either
with sequential numbers or by giving the anticodon. About 40 tRNA genes are
known to exist in the chloroplast genome. Examples:
trnF - gene for tRNA(Phe)
trnC - gene for tRNA(Cys)
trnL1 (or trnL-UAA) - gene for tRNA(Leu)1
trnL2 (or trnL-CAA) - gene for tRNA(Leu)2
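The naming convention just described is regular enough to be decoded mechanically. The short Python sketch below splits a tRNA gene name into the amino acid, the anticodon (if given) and the isoacceptor number (if given). The function name `parse_trn` and the dictionary layout of the result are assumptions made for illustration only.

```python
import re

# Single-letter amino acid codes mapped to three-letter abbreviations.
AMINO_ACIDS = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

# "trn", one amino acid letter, then optionally "-ANTICODON" or a number.
TRN_PATTERN = re.compile(r"^trn([A-Z])(?:-([ACGU]{3})|(\d+))?$")


def parse_trn(name):
    """Split a name such as 'trnL-UAA' or 'trnL1' into its parts."""
    match = TRN_PATTERN.match(name.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a trn gene name: {name!r}")
    letter, anticodon, number = match.groups()
    if letter not in AMINO_ACIDS:
        raise ValueError(f"unknown amino acid code: {letter!r}")
    return {
        "amino_acid": AMINO_ACIDS[letter],
        "anticodon": anticodon,
        "isoacceptor": int(number) if number else None,
    }


print(parse_trn("trnF"))
print(parse_trn("trnL-UAA"))
```

For example, `parse_trn("trnL-UAA")` reports leucine with the anticodon UAA, while `parse_trn("trnL1")` reports leucine as isoacceptor 1 with no anticodon recorded.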
Ribosomal Proteins
rps4 - ribosomal protein homologous to E. coli ribosomal protein S4
rps19 - ribosomal protein homologous to E. coli ribosomal protein S19
rpl2 - ribosomal protein homologous to E. coli ribosomal protein L2
Photosystem I Proteins
psaA1 - P700 chlorophyll a apoprotein
psaA2 - P700 chlorophyll a apoprotein
Photosystem II Proteins
psbA - "32 kilodalton" quinone-binding polypeptide. Also known as "photogene
32" and "Qb protein"; it contains the binding site for atrazine type herbicides.
psbB - 51 kilodalton chlorophyll a-binding polypeptide or P680 apoprotein
psbC - 44 kilodalton chlorophyll a-binding polypeptide
psbD - "D2" protein
psbE - cytochrome b559
Photosynthetic Electron Transport Proteins
petA - cytochrome f
petB - cytochrome b6
petD - subunit 4 of the cytochrome b6/f complex
Proteins of the ATP Synthase Complex
atpA - CF1 alpha subunit
atpB - CF1 beta subunit
atpE - CF1 epsilon subunit
atpH - CF0 subunit III, DCCD-binding proteolipid for the ATPase complex
(proton translocating subunit)
Carbon Fixation Enzymes
rbcL - ribulose bisphosphate carboxylase, large subunit
Other Stromal Polypeptides
tufA - translational elongation factor Tu
TRANSCRIPTION OF PLASTID GENES
Transcription of plastid genes resembles transcription in prokaryotes in
many respects. In fact, when E. coli is transformed with the plastid genes rbcL
and psbA, transcription is initiated in it at a site similar or identical to the one
used by the chloroplast RNA polymerase.
This result is consistent with studies showing that the DNA sequence
elements which direct transcription initiation of plastid genes (TTGaca and
TAtaaT, located 35 and 10 nucleotides upstream of the site of transcription
initiation, respectively) are similar to prokaryotic promoter elements.
Furthermore, as in bacteria, these promoter elements precede rRNA,
tRNA and protein coding genes. The only exceptions reported thus far are two
tRNA genes (trnS, trnR) which lack these sequences and may use internal
promoter elements similar to eukaryotic tRNA genes. Transcription of plastid
genes by E. coli RNA polymerase, however, is not identical to that found with
the chloroplast RNA polymerase.
The activity and initiation accuracy of the chloroplast enzyme are
greatly enhanced by supercoiled templates. E. coli RNA polymerase, on the
other hand, will transcribe linear templates readily. E. coli RNA polymerase
also initiates transcription at sites not used by chloroplast RNA polymerase.
For example, E. coli RNA polymerase initiates transcription within the rbcL
leader region at a site not recognized by the chloroplast transcription apparatus.
Furthermore, E. coli RNA polymerase transcribes atpB more efficiently
than rbcL, whereas the reverse is true for the chloroplast enzyme. Finally,
transcription in E. coli is sensitive to rifampicin, whereas chloroplast
transcription is not. These results show that transcription initiation in E. coli
and in chloroplasts has some overall similarities but differs significantly in
specific ways.
It is possible that differences in E. coli and chloroplast transcription result from
differences in RNA polymerase composition. The E. coli transcription
apparatus consists of an RNA polymerase containing four subunits (β, β’, α, σ)
and accessory factors (nusA, nusB, rho, tau). Chloroplast RNA polymerase
preparations contain from 5 to 14 subunits.
At least four of these proteins are homologous to E. coli RNA
polymerase subunits (β, β’, α, σ). Sequence analysis has indicated that plastid
DNA encodes proteins homologous to the β, β’ and α subunits of E. coli RNA
polymerase (in the rpoA, rpoB and rpoC genes). Analysis of chloroplast rpoC
revealed an open reading frame which could encode an additional chloroplast
RNA polymerase subunit. This extra subunit (τ; 66 kD) has been reported in the
Anabaena RNA polymerase.
In summary, it is likely that the chloroplast RNA polymerase contains
at least five subunits (β, β’, τ, α, σ); proteins homologous to nusA, nusB, rho
and tau have not yet been reported as present. Two lines of evidence indicate
that the chloroplast RNA polymerase population may be heterogeneous.
First, it has been reported that all subunits of an isolated RNA
polymerase preparation are synthesized on cytoplasmic ribosomes. The fact that
genes encoding α, β and β’ subunits are localized in plastids raises the
possibility that some RNA polymerase subunits may be encoded by both
nuclear and plastid genes.
Second, transcriptionally active complexes of DNA and RNA
polymerase have been isolated which show preferential transcription of rRNA.
Based on the properties of these preparations it has been proposed that rRNA
and protein genes may be transcribed by different RNA polymerases.
RIBOSOMAL RNA OPERONS
Chloroplast ribosomal RNA genes are arranged in an operon very
similar to the ribosomal RNA operon of bacteria. The bacterial gene order,
16S-23S-5S, is preserved in chloroplast operons. The "addition" of a gene for
4.5S rRNA may be a rearrangement of DNA rather than an insertion since this
RNA is homologous to the 3'-end of the bacterial 23S RNA and can be
regarded as the structural equivalent of this portion of the molecule.
An additional point of similarity is the location of tRNA genes for
isoleucine and alanine in the spacer between the 16S and 23S rRNA genes. The
same two tRNA genes can be found at corresponding positions in several of the
rRNA operons of both E. coli and B. subtilis. However, in most chloroplasts,
these two tRNA genes contain large introns which are not present in bacteria.
The homology between plastid and bacterial rRNA operons is also apparent
from DNA sequence. Though intergenic sequences and intron sequences
diverge rapidly and large differences are seen between different species of
higher plants, there is a high level of homology between corresponding
structural genes (exons) of different organisms.
The sequences of bacterial and higher plant plastid rRNA genes show
60% to 80% similarity. A strong homology between plant and bacterial
sequences at the 3' end of the 16S gene, which in bacteria is known to be
involved in binding the ribosome to the mRNA at the start of translation,
provides evidence that this process may be similar in plastids. Supporting this
notion is the fact that chloroplast RNAs have often been found to contain
sequences similar to the ribosome binding sites of bacterial mRNAs.
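A similarity figure such as the 60% to 80% quoted above is obtained by comparing two sequences that have already been aligned, column by column. The Python sketch below shows one simple way of doing the arithmetic. It assumes the alignment has been made beforehand, marks gaps with "-", and leaves out any column containing a gap; the function name and the short test sequences are hypothetical, and published figures may be computed under different conventions.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of ungapped alignment columns holding identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # columns with a gap in either sequence are not counted
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Two hypothetical aligned fragments differing at 2 of 10 positions.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAA"))  # prints 80.0
```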
Transcription of the plastid rRNA operon probably produces a single
primary transcript containing the 16S, 23S, 4.5S and 5S rRNAs, along with
spacer tRNAs.
This is then processed by a series of endonucleolytic cleavages to produce
the mature RNA products. However, details of this scheme are difficult to
define precisely, since the processing is so rapid that the original transcript
probably never accumulates to detectable levels.
TRANSCRIPTION TERMINATION
Transcription termination of plastid genes has not been adequately
studied. Analysis of RNA 3'-ends has revealed sequences with dyad symmetry
proximal to the RNA termini. In E. coli, factor-independent transcription
terminators typically contain a GC-rich region of dyad symmetry followed by an
AT-rich stretch that often contains a run of thymidines.
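The E. coli terminator pattern described above (a GC-rich region of dyad symmetry followed by a run of thymidines) can be searched for with a short program. The Python sketch below is only an illustration: the stem length of 6, the loop lengths of 3 to 6, the 60% GC threshold, the run of four T residues and the test sequence are all assumed values, and a perfect inverted repeat is required, which real terminators do not always show.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_terminators(seq, stem=6, loops=(3, 4, 5, 6), min_gc=0.6, t_run=4):
    """Return (start, stem_len, loop_len) for hairpin-plus-T-run candidates.

    A candidate is a GC-rich stretch of length stem whose exact reverse
    complement follows after an allowed loop length, with a run of t_run
    T residues beginning within two nucleotides of the end of the hairpin.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - stem + 1):
        left = seq[i:i + stem]
        gc = (left.count("G") + left.count("C")) / stem
        if gc < min_gc:
            continue
        for loop in loops:
            j = i + stem + loop
            right = seq[j:j + stem]
            tail = seq[j + stem:j + stem + t_run + 2]
            if right == reverse_complement(left) and "T" * t_run in tail:
                hits.append((i, stem, loop))
    return hits


# Hypothetical test sequence: stem GCCGCC, loop TTAA, stem GGCGGC, then six Ts.
example = "CC" + "GCCGCC" + "TTAA" + "GGCGGC" + "TTTTTT" + "GA"
print(find_terminators(example))  # prints [(2, 6, 4)]
```

Finding such a structure in a sequence does not show that it terminates transcription, which is exactly the point made by the in vitro experiments discussed next.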
In vitro experiments indicated that DNA sequences with dyad symmetry
located at the 3' -end of rbcL do not cause transcription termination.
Transcription was terminated with low efficiency by 3' regions of spinach
psbA, rrnB and petD. In contrast, trnS and trnH caused termination with high
efficiency (80%) although these genes are not followed by regions of dyad
symmetry.
These results suggest that the stem-loop structures found at RNA 3'-ends
may not play a significant role in factor-independent transcription
termination. Presently available data, on the other hand, do not rule out a role
for these sequences in transcript stability.
Transcription from plastid gene promoters is stimulated in vitro when
supercoiled DNA templates are used instead of relaxed circular or linear
templates. Furthermore, DNA conformation affected the relative ratio of
transcription from two adjacent promoters (atpB, rbcL) in in vitro studies.
Chloroplasts of higher plants contain gyrase and topoisomerase I activity
which could alter the superhelicity of plastid DNA in vivo. To date, however,
no direct test of this possibility in higher plants has been reported. In
Chlamydomonas, inhibitors of gyrase, such as novobiocin, change the relative
transcription rates of several plastid genes.
tRNA
Transcripts containing unprocessed tRNA sequences are produced from
the individual tRNA transcription units, from the ribosomal DNA transcription
unit [tRNA-Ala(UGC), tRNA-Ile(GAU)], and from transcription units which
encode proteins. Production of mature tRNAs involves 5' and 3'
endonucleolytic cleavage of primary transcripts, addition of CCA to the 3' end
of the processed RNA, and base modification.
Interestingly, plastid RNase P, the enzyme responsible for 5'-end
cleavage of tRNA precursor RNAs, does not contain an associated RNA. This
is in contrast to the situation in eubacterial RNase P enzymes, which
consist of a 377 to 400 nucleotide RNA plus a 14 kD protein. The requirement
for RNA in eukaryotic RNase P activity has not been established.
Base modification of chloroplast tRNAs is similar to that found in
bacteria. For example, tRNA-Glu contains several modified bases, including
four pseudouridines, a 5-methylcytosine and a
5-methylaminomethyl-2-thiouridine. This latter modification has been reported
only in plastids and prokaryotes, providing additional support for the
endosymbiont theory of the prokaryotic origin of plastids.
In Euglena tRNA genes are organized into operons which are
transcribed to form precursor RNAs containing several tRNA sequences. In
higher plants, tRNA genes are typically dispersed throughout the genome and
transcribed individually. Interestingly, although Euglena tRNA genes lack
introns, about one-fourth of the sequenced higher plant tRNA genes have them.
Thus in both cases tRNA primary transcripts must be processed to
produce mature tRNAs. In one case processing removes intergenic sequences,
in the other introns. Intron removal from most tRNA gene transcripts probably
proceeds by a mechanism different from that used to remove introns from the
transcripts of nuclear genes or protein coding genes in the chloroplast.
Instead of relying on conserved sequences at exon/intron boundaries,
splicing of tRNA transcripts seems to depend on the secondary structure of the
intron in a manner reminiscent of the splicing of certain ribosomal RNAs.
Introns - Introns are DNA sequences within genes which disrupt protein
coding regions termed exons. Introns and exons are cotranscribed and the
resultant precursor RNA is spliced to remove introns. Four classes of introns have
been recognized. One intron type is found in nuclear genes which encode
proteins.
Introns of this type are characterized by invariant GU and AG
dinucleotides at the intron boundaries. A second intron type is found in nuclear
tRNA genes. Two additional intron classes, termed Group I and Group II
introns, have been distinguished on the basis of conserved sequence elements
and potential folding patterns.
Group I introns have been found in chloroplasts, mitochondria and
ribosomal genes of Tetrahymena, and Group II introns are present in
mitochondria of yeast and in chloroplasts. Euglena plastid protein genes contain
numerous introns. In contrast, few genes which encode proteins contain introns
in higher plants. One plastid gene intron has been identified as a Group I intron
(trnL(UAA)).
In contrast, the gene encoding tRNA-Lys(UUU) contains a 2.5 kbp
Group II intron. An open reading frame within this gene's intron encodes a
protein homologous to mitochondrial RNA maturases. This raises the
interesting possibility that the putative maturase located in trnK could facilitate
intron processing in plastids.
An even more remarkable observation was made recently with regard to
rps12. This gene is encoded by three exons. The 5'-exon is located 28 kbp from
the two 3'-exons and trans-splicing is needed to produce functional RNA.
Trans-splicing has also been reported for psaA in Chlamydomonas.
CHLOROPLAST AND MITOCHONDRIAL GENOME
ORGANIZATION
Till now, we have dealt with genes present in the nuclei of eukaryotes.
Certainly, nuclear DNA is the most important and very nearly the universal
genetic material. But there is evidence for the presence of genes outside the
nucleus.
It is now known that both chloroplasts and mitochondria have DNA of
their own.
These DNAs are inherited independently of nuclear genes. In effect, the
organelle genome comprises a length of DNA that has been localized in a
defined part of the cell and is subject to its own form of expression and
regulation. An organelle genome can code for some or all of the RNAs, but
codes for only some of the proteins needed to perpetuate the organelle. The other
proteins are coded in the nucleus and expressed via the cytoplasmic protein
synthetic apparatus. Genes not residing within the nucleus are generally
described as extranuclear genes.
Gene organization in mitochondria and chloroplasts
Chloroplast Genome Organization - The chloroplast is the
chlorophyll-containing organelle that carries out photosynthesis and starch
grain formation in plants. Like mitochondria, chloroplasts contain DNA and
ribosomes, both with prokaryotic affinities. The DNA of the chloroplast (cpDNA)
is a circle that ranges in size from 120 to 190 kb.
The chloroplast DNA, like mitochondrial DNA, controls the production
of tRNAs, ribosomal RNAs and some of the proteins found within the
organelle.
In the more than 25 chloroplast DNAs from different plant species that
have been sequenced, there seem to be about 87-183 genes in the chloroplast
genome. The chloroplast genome codes for all the rRNA and tRNA species
needed for protein synthesis. The ribosome includes two small rRNAs in
addition to the major species. The tRNA set may include all of the necessary
genes. The organelle genes are transcribed and translated by the apparatus of
the organelle.
About half of the chloroplast genes code for proteins involved in
protein synthesis. The endosymbiotic origin of the chloroplast is emphasized by
the relationship between these genes and their counterparts in bacteria.
The organization of the rRNA genes in particular is closely related to that of a
cyanobacterium, which pins down more precisely the last common ancestor
between chloroplast and bacteria.
Introns in chloroplast fall into two general classes. Those in tRNA
genes are usually located in the anticodon loops, like the introns found in yeast
nuclear tRNA genes. Those in protein coding genes resemble the introns of
mitochondrial genes. This places the endosymbiotic event at a time in evolution
before the separation of prokaryotes with uninterrupted genes. The role of the
chloroplast is to undertake photosynthesis. Many of the chloroplast genes code
for proteins that are located or function in thylakoid membranes.
But quite strangely, even some protein complexes found on the thylakoid
membrane are coded by nuclear genes. Thus on the thylakoid membranes you
find proteins coded by both the chloroplast genome and the nuclear genome. Other
chloroplast complexes are coded entirely by one genome.
GENE EXPRESSION IN CHLOROPLASTS
Chloroplast Biogenesis
Chloroplasts are unique to plants and are the principal site of energy
transduction and biosynthetic reactions. Chloroplasts also contain a small
genome encoding about 80 proteins. The aim of our research is to understand
how the expression of genes in the chloroplast and nuclear genomes is
coordinated to ensure the biogenesis of functional chloroplasts. This includes
studying retrograde signalling from chloroplasts to the nucleus to control
nuclear gene expression, and the effects of chromatin structure on nuclear gene
expression. We use chloroplast transformation to study aspects of the
regulation of chloroplast gene expression, and to express bacterial and viral
antigens for the development of edible vaccines. We are also studying the
structure and functions of stromules, highly dynamic stroma-filled tubules that
extend from the chloroplast surface and permit the exchange of material
between interconnected chloroplasts.

Stromules extending from plastid bodies in a tobacco leaf trichome expressing
a chimeric gene encoding plastid-targeted GFP.
REGULATION OF PLASTID GENE EXPRESSION IN THE
CHLOROPLAST-TO-CHROMOPLAST TRANSITION
Plastid biogenesis and differentiation require coordinated
expression of genes encoded in the nuclear and plastid genomes. Knowledge
about the regulation of expression in the plastid genome (or plastome) is central
to improving our understanding of these processes. The conversion of
chloroplasts into carotenoid-accumulating chromoplasts is a hallmark of tomato
fruit ripening and an excellent system for the study of plastid differentiation. In
chloroplasts, much of the regulation appears to occur at the posttranscriptional
level (reviewed in Choquet and Wollman, 2002). Regulation of gene
expression during the development of non-green plastids is less well
understood. Kahlau and Bock (pages 856–874) created a microarray
representing the tomato plastome to explore the developmental regulation of
plastid gene expression. They found that transcript levels for most genes were
much lower in green tomato fruits than in leaves but were not further changed
during fruit ripening. For most genes, the amount of transcript associated with
polysomes (representing active translation) was also drastically lower in fruits
than in leaves. In contrast with mRNA levels, which remained steady, mRNA
translation continued to decrease throughout fruit ripening. Together, these
results suggest that there is regulation of chloroplast transcript accumulation
upon the initiation of the fruit developmental program but that translational
regulation is the key component of the subsequent chloroplast- to-chromoplast
transition.
The exceptions to these global patterns of downregulation were a small
number of genes involved in gene expression and one other protein-coding
gene, accD, which is the only plastid gene involved in fatty acid synthesis (see
figure). Therefore, it appears that chromoplasts retain minimal expression
activity to allow the synthesis of the fatty acids needed for carotenoid
accumulation. Kahlau and Bock also investigated developmental regulation of
the chloroplast and chromoplast transcription and RNA processing systems and
showed that certain genes exhibit specific developmental patterns with respect to
plastid RNA polymerase activities, intron splicing, and RNA editing.
Therefore, the reduced translation seen throughout fruit ripening is likely to be
achieved via different mechanisms for different genes.
This work provides valuable information about the global patterns of
plastome gene expression in tomato fruit. It also illustrates the importance of
studying these processes in a variety of tissues and plastid types.

Chromoplast gene expression largely serves the production of a single
protein. Most plastid genes were strongly downregulated during tomato fruit
ripening, similar to the data shown for the photosynthetic gene psbD. By
contrast, amounts of accD transcripts, encoding a subunit of acetyl-CoA
carboxylase, increase during ripening. Panels show blots of RNA from leaves
and ripening fruit probed with psbD (left) and accD (right), with loading
controls shown below.
Posttranscriptional Control of Chloroplast Gene Expression. From RNA to
Photosynthetic Complex
Twenty-five years ago it was well established that chloroplasts contain
their own DNA and protein synthesizing system, but little was known of how
this organellar genome is expressed. As a result of their endosymbiotic origin,
plastids contain a protein synthesizing system that displays several prokaryotic
features. Its 70S ribosomes resemble those of bacteria and are sensitive to the
same set of antibiotics. However, it has become apparent that the chloroplast
gene expression system is unique, differing in many respects from bacterial
systems and using a variety of unusual posttranscriptional steps.
Major Technical Advances
Scientific progress is often driven by new technology. A particularly
striking example was the establishment of a chloroplast transformation system
in 1988 by Boynton et al., in Chlamydomonas, which was later extended to
tobacco. These were major technical breakthroughs that coincided with the first
determinations of the entire sequence of chloroplast genomes and opened the
door for the in vivo study of chloroplast gene expression. Because of the
efficient chloroplast homologous recombination system, it became possible to
perform precise DNA manipulations on any chloroplast gene of interest, in
particular, specific gene disruptions and site-directed mutagenesis. This
powerful technology also allowed one to dissect chloroplast promoter regions,
to introduce chimeric genes in the chloroplast genome, and to identify novel
functions by screening or selecting for specific phenotypes. The development of
an efficient nuclear transformation system in Chlamydomonas and of gene
tagging methods in Chlamydomonas, maize, and Arabidopsis also provided
important tools that led to major advances in our understanding of chloroplast
biogenesis.
Molecular cross talk between nucleus and chloroplast
One of the most original features of the chloroplast protein synthesizing
system is that it cooperates with the nucleocytosolic system in the biosynthesis
of the photosynthetic apparatus. The subunits of the photosynthetic complexes
are encoded by chloroplast and nuclear genes that need to be coordinately
expressed. The study of this molecular crosstalk between chloroplast and
nucleus was greatly helped by genetic approaches. The genetic analysis of
photosynthetic mutants of Chlamydomonas reinhardtii and maize revealed a
large number of nuclear and chloroplast loci involved in several
posttranscriptional steps of chloroplast gene expression such as RNA stability,
RNA processing, splicing, and translation. A characteristic feature of the
nucleus-encoded factors, at least in the case of Chlamydomonas, is that they are
specifically required for a single posttranscriptional step in the synthesis of an
individual plastid gene product. Thus genetic analysis of mutants deficient in
the accumulation of single chloroplast mRNAs identified a distinct nuclear
locus in each case.
In a similar manner, mutations affecting chloroplast translation define
one to three nuclear loci specifically required for the translation of a single
chloroplast mRNA. A particularly striking example is provided by the genetic
analysis of the maturation of the psaA mRNA of Chlamydomonas, a process
that requires at least 14 nuclear loci. If these findings are extrapolated to the
entire chloroplast genome with a total number of approximately 120 genes, one
can estimate that several hundred nucleus-encoded factors are required for the
expression of the entire set of plastid genes. The first specific factors of this
type were recently cloned in Chlamydomonas, Arabidopsis, and maize using
gene tagging or genomic complementation by transformation. Several are
involved in chloroplast trans-splicing, RNA stability and processing, and
translation. As chloroplast RNA processing, stability, and translation are
closely coupled, a defect in any of these processes could affect the others.
Establishing in Vitro Systems for Specific Steps of Chloroplast Gene
Expression
Major advances in our understanding of chloroplast gene expression,
particularly the enzymatic machinery involved in chloroplast 3' end processing,
were achieved through the development of in vitro systems with chloroplast
extracts. As it turns out, most chloroplast 3' ends are produced by RNA
processing rather than by transcription termination. In most cases, an
endonucleolytic cleavage downstream of the mature 3' end is followed by 3'-
exonucleolytic resection to a stem-loop at the 3' end. These processing steps are
catalyzed by a chloroplast degradosome similar to that of Escherichia coli,
consisting of several nucleus-encoded RNA-binding proteins, including exo-
and endonucleases. As in bacteria, polyadenylation appears to play an
important role in chloroplast RNA turnover. Evidence based on the existence of
low-abundance polyadenylated chloroplast mRNA fragments is compatible with
a model in which the RNAs are endonucleolytically cut and extended with a
short polyA tail. These tails may provide a foothold for the degradosome.
Another breakthrough was the establishment of an in vitro translation
system from tobacco chloroplasts. This system promotes accurate initiation of
translation from several chloroplast RNAs and revealed important cis-acting
elements within the chloroplast 5'-untranslated regions (UTRs). The in vitro
translation system also provided new insights on how polypeptides of the
photosynthetic apparatus are targeted and inserted into the thylakoid membrane.
It is generally assumed that this process occurs cotranslationally based on run-
on translations with thylakoid-bound ribosomes and detection of translation
intermediates in the membrane. Using the in vitro translation system, it was
possible to produce stable ribosome nascent chain complexes and to show that
one polypeptide of the chloroplast signal recognition particle (SRP),
SRP54, interacts with the nascent polypeptide chain and thus represents a
soluble component of the targeting machinery. Because of the prokaryotic
origin of chloroplasts, it is not surprising that several homologs of E. coli
proteins besides SRP54
are involved in membrane protein targeting and insertion, including the
chloroplast SecA, SecB, and SecY proteins. Disruption of the SecY gene in
maize leads not only to a severe reduction of thylakoid membranes, but also to a
deficiency in chloroplast translation, thus revealing a link between thylakoid
membrane biogenesis and chloroplast translation.
A characteristic feature of chloroplast protein synthesis is its strong
stimulation by light. A major advance occurred when a correlation was shown
between the light-stimulated binding of a multiprotein complex to the 5'-UTR
of the chloroplast psbA mRNA and its translation. Binding of this complex to
the 5'-UTR was proposed to be controlled by the redox potential and ATP
levels that are modulated by photosynthetic activity. The characterization of this
complex revealed that it contains a 47-kD protein that is homologous to polyA-
binding proteins and a 60-kD protein that appears to be a protein disulfide
isomerase, usually found in the endoplasmic reticulum. It thus appears that
several cytoplasmic proteins have been recruited by the chloroplast for novel
regulatory functions in plastid gene expression.
Coordinate Expression of chloroplast proteins
All photosynthetic complexes consist of several subunits, the
accumulation of which needs to be coordinated in a stoichiometric fashion. This
appears to be achieved in two ways. The first involves proteolytic degradation
of most of the unassembled subunits. The proteases required for this process are
still largely unknown, although recent evidence indicates that the ATP-
dependent ClpP protease is partly involved. The second mechanism was
elegantly demonstrated with the cytochrome b6f complex of Chlamydomonas.
In this process, referred to as control by epistasy of synthesis, translation of the
cytochrome f subunit of the cytochrome b6f complex is strongly attenuated
when other subunits from the same complex are absent. This control is
mediated by a direct or indirect interaction between the 5'-UTR of the
cytochrome f mRNA and the C-terminal domain of the unassembled
cytochrome f subunit. Whether this process is also valid for other chloroplast
genes in Chlamydomonas and higher plants remains to be determined.
Unusual features of Chloroplast Post-transcription Processes
The chloroplast psaA gene of Chlamydomonas consists of three exons
flanked by group II intron sequences that are widely separated on the plastid
genome and transcribed independently. The assembly of the mature psaA
mRNA depends on two trans-splicing reactions that require several trans-acting
factors. Factors involved in this process were recently characterized. One of
them resembles pseudouridine synthases, although this enzyme activity is not
required for trans-splicing. Another trans-acting factor required for psaA trans-
splicing is an RNA. This RNA, called tscA RNA, was identified as part of the
group II psaA intron 1 structure, a first example of a tri-partite intron. This
finding has important evolutionary implications given the fact that group II
introns with their cis-acting catalytic domains are considered to be the
precursors of nuclear introns with their trans-acting snRNPs. The tripartite psaA
group II intron may thus represent an intermediate stage in this evolutionary
process. The tscA RNA has recently been found to be part of a protein complex
that may represent a chloroplast counterpart of snRNPs (C. Rivier, unpublished
data). It will be particularly interesting to determine whether any evolutionary
relationship is apparent between the proteins of this complex and those of
eukaryotic snRNPs. These studies should provide new insights into the
evolution of gene expression systems.
RNA editing, the mechanism of posttranscriptional nucleotide
modification, is one of the most striking chloroplast oddities and also occurs in
plant mitochondria. Chloroplast RNA editing involves mostly cytidine-to-
uridine conversions, with the reverse change occurring in only a few cases. It thus
adds a novel posttranscriptional step in chloroplast gene expression besides
RNA 5'- and 3'-end processing, cleavage of polycistronic into monocistronic
mRNAs, and group I and group II splicing. Only 25 sites in the entire tobacco
plastid genome are edited. How the editing machinery selectively modifies
these sites remains an intriguing question, although recent studies indicate that
both mRNA sequences flanking the editing site and specific trans-acting factors
play an important role. The existence of chloroplast editing requires caution in
the evaluation of chloroplast DNA sequence data. In particular, plastid open
reading frames may be missed because of the editing of cryptic ACG initiation
codons to AUG.
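The caution about cryptic initiation codons can be illustrated with a minimal Python sketch. The transcript, the edited position, and the helper names (edit_c_to_u, find_orf) are invented for illustration and are not taken from any real plastid gene. The sketch only shows how a single C-to-U edit turns ACG into AUG, so that a reading frame becomes visible after editing but not in the sequence predicted from the DNA.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def edit_c_to_u(rna, sites):
    """Return a copy of rna with C changed to U at the given 0-based sites."""
    bases = list(rna)
    for pos in sites:
        if bases[pos] != "C":
            raise ValueError(f"position {pos} is {bases[pos]}, not C")
        bases[pos] = "U"
    return "".join(bases)

def find_orf(rna):
    """Return the first AUG...stop reading frame found, or None."""
    start = rna.find("AUG")
    while start != -1:
        # Step through in-frame codons after the candidate start codon.
        for i in range(start + 3, len(rna) - 2, 3):
            if rna[i:i + 3] in STOP_CODONS:
                return rna[start:i + 3]
        start = rna.find("AUG", start + 1)
    return None

# Hypothetical transcript as predicted from the DNA sequence (contains ACG, no AUG).
unedited = "GGACGGCUCCAGGUUAAGG"
# Hypothetical editing site: the C of the ACG triplet (0-based position 3).
edited = edit_c_to_u(unedited, [3])

print(find_orf(unedited))
print(find_orf(edited))
```

A search of the unedited sequence finds no reading frame, whereas the edited transcript contains one that starts at the newly created AUG.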
Perspectives
Although the nucleus influences the expression of the chloroplast
genome through a large set of factors, the chloroplast can also influence nuclear
gene activity. Plants devoid of carotenoids photobleach when exposed to light.
This condition leads to the selective inhibition of transcription of a selected set
of nuclear genes including the genes of the light-harvesting chlorophyll a/b
proteins. The chloroplast signal involved in this response has remained
enigmatic for many years. However, recent studies with C. reinhardtii indicate
that intermediates in the porphyrin pathway such as magnesium
protochlorophyllide methyl ester play a crucial role in this response. An
important task for the future is to identify the targets of these porphyrin
intermediates.
Mitochondria interact in many ways with the chloroplast through
metabolic pathways. Recent work strongly suggests that genetic interactions
also exist between these two organelles. Genetic data indicate that
informational suppressors of chloroplast nonsense mutations suppress
mitochondrial mutations, suggesting exchange of tRNAs between these two
organelles. A molecular analysis of this intriguing process should prove
particularly rewarding.
MITOCHONDRIAL DNA ORGANIZATION
The mitochondrion is an organelle of the eukaryotic cell in which the
electron transport chain operates. The actual number of mitochondria per
cell can be determined by electron microscopy. The most interesting aspect of
the mitochondrion is that it has its own DNA.
Mitochondria provide higher animals and plants with life-sustaining
cellular energy through the oxidative processes of the citric acid cycle and
fatty acid oxidation.
Animal mitochondrial DNA is extremely compact, with very few noncoding
regions and no introns. Each strand of the duplex is transcribed into a single
RNA product that is then cut into smaller pieces, primarily by freeing the
twenty-two transfer RNAs interspersed throughout the genome.
Also formed are a 16S and a 12S ribosomal RNA. Although proteins
and small molecules such as ATP and tRNAs can move in and out of the
mitochondrion, large RNAs cannot. Most interestingly, all mtDNAs exhibit the
same basic organization of genetic information. Each contains 2 rRNA genes,
22 tRNA genes, and 13 putative protein structural genes. Five genes encode
known proteins but the products and functions of the other putative genes have
not yet been identified.
The entire mammalian mitochondrial genome is transcribed as one unit
from a single promoter site and the giant primary transcript is then cleaved
endonucleolytically to produce the individual tRNA, rRNA and mRNA
molecules. Thus, the entire mtDNA is, in effect, equivalent to one operon in
bacteria.
The human mitochondrial genome has 13 protein-coding genes. Other
mitochondrial proteins are synthesized in the cytoplasm under the control of
nuclear genes and then transported into the mitochondrion. Proteins targeted
to the mitochondrion have a signal peptide of 85 amino acids in length.
These transit peptides are usually cleaved off the precursor polypeptides
during their transport across the mitochondrial membrane. In mammalian
mitochondrial genes, there are no introns.
Some genes actually overlap and almost every single base pair can be
assigned to a gene, with the exception of the D- Loop, a region concerned with
the initiation of DNA replication. In yeast cells, 10-20 percent of the cellular
DNA is localized in a single mitochondrion.
Yeast mitochondrial genes have introns, and the yeast mitochondrial genome is
not as economical as that of mammalian mitochondria. Some suggest that the
genes with introns originated in the nucleus and were later captured or
integrated by the mitochondria. The genome is also five times larger than
mammalian mtDNA.
MITOCHONDRIAL GENOME
The plant mitochondrial genome has long been an enigma to molecular
biologists. Even the smallest is more than 200 kilobases in size, more than 10
times the size of animal mitochondrial genomes (15-18 kb) and several times
the size of mitochondrial genomes in fungi (18-78 kb) or protists (15-47 kb). In
contrast to the DNA of chloroplasts which is relatively conserved, the DNA of
mitochondria exhibits a wide variation in size and form.
Higher plant mtDNA can be circular or linear and varies from 200 kbp
(in brassicas) up to greater than 2500 kbp (in muskmelon).
The genome of plant mitochondria is thus very large and may also be
divided between one or more DNA molecules.
The largest plant mitochondrial genome studied so far is half the size of
the entire E. coli genome (4,500 kb). The large size of these genomes presents
several problems.
On the one hand, simply determining the physical structure of such
large genomes is a major challenge, especially since recombination events are
known to produce several different molecular configurations.
On the other hand, the large size per se and the dramatic variation of
such genomes demand an explanation.
We know of few genes in plant mitochondrial genomes which are not
also present in the mitochondria of yeast or animal cells.
And the few additional genes we do know about do not begin to account
for the additional DNA in even the smallest of plant mitochondrial genomes.
Both animal and plant mitochondria encode their own ribosomal and transfer
RNAs.
The number of proteins encoded in plant mitochondrial DNA is
probably not much higher than the number encoded in mammalian
mitochondria.
Experiments have shown that most of the protein synthesis in isolated
maize mitochondria can be accounted for by some 18-20 polypeptides.
Although there is always a possibility that more proteins might be
synthesized in mitochondria in vivo than in vitro, it seems likely that most of
the mitochondrial genome is noncoding DNA.
SIZE AND STRUCTURE OF MITOCHONDRIAL GENOME
Changes in the size of the mitochondrial genome seem to occur quite
rapidly in plant evolution since closely related species sometimes have quite
different mitochondrial genome sizes.
The best example of this phenomenon comes from studies in the
laboratory of Arnold Bendich at the University of Washington.
Combining measurements of mitochondrial DNA renaturation kinetics
with an analysis of restriction profiles for the same DNAs, Bendich and his
colleagues B. Ward and R. Anderson estimated the minimal size (complexity)
of mitochondrial DNA for several species in the family Cucurbitaceae.
Their estimates, along with similar estimates for several other plant
mitochondrial genomes, are given in the table of genome sizes later in this
section. Within the Cucurbitaceae the size of the mitochondrial
genome can vary by at least 10-fold.
This variation cannot be explained by postulating rapid changes in the
amount of repetitive DNA since little repeated DNA (less than 10%) was found
in any of the genomes.
Hence, the mitochondrial genome is like the chloroplast genome in
being composed principally of single copy DNA. But it is like the nuclear
genome in its highly variable size and its content of "excess" DNA which has
no known function.
In several cases, including Brassica, maize, and wheat, restriction maps
have been constructed for mitochondrial genomes. Although the smaller
genomes (for example, Brassica) can be mapped by procedures such as those
used for chloroplast DNA, the larger genomes require different techniques.
The most successful approach has been to clone large fragments of
mitochondrial DNA and then use a combination of hybridization and restriction
mapping to identify overlapping clones and establish linkage groups. This
procedure is called chromosome walking.
With such a procedure it has been possible to show that most, possibly
all, mitochondrial DNA can be described as one large circular linkage group.
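The logic of chromosome walking can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each clone is reduced to the set of markers it hybridizes to, the clone and marker names are hypothetical, and real mapping projects must also cope with repeats and ambiguous overlaps. The walk steps from clone to overlapping clone and reports whether the last clone overlaps the first, which is what a circular linkage group would look like.

```python
def walk(clones, start):
    """Order clones by stepping to an unvisited clone that shares at least one
    marker with the current clone. Returns the walk order and a flag telling
    whether the last clone overlaps the first (a circular linkage group)."""
    order = [start]
    visited = {start}
    current = start
    while True:
        candidates = [name for name in clones
                      if name not in visited and clones[name] & clones[current]]
        if not candidates:
            break
        # Deterministic choice: the candidate sharing the most markers,
        # with ties broken by name.
        nxt = max(sorted(candidates),
                  key=lambda n: len(clones[n] & clones[current]))
        order.append(nxt)
        visited.add(nxt)
        current = nxt
    is_circular = len(order) > 2 and bool(clones[order[-1]] & clones[order[0]])
    return order, is_circular

# Hypothetical cosmid clones, each described by the markers it carries.
clones = {
    "c1": {"m1", "m2"},
    "c2": {"m2", "m3"},
    "c3": {"m3", "m4"},
    "c4": {"m4", "m5"},
    "c5": {"m5", "m1"},
}
order, is_circular = walk(clones, "c1")
print(order, is_circular)
```

In this toy data set the walk visits all five clones and the last clone shares a marker with the first, so the clones form a single circular linkage group.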
In contrast to chloroplast DNA, in which circular molecules can be seen
in an electron microscope, the physical form of the mitochondrial genome is
still not well understood. Many investigators have reported the presence of
circular DNAs in plant mitochondria, but these are small in relation to the total
genome size. Some are mitochondrial plasmids which are not considered to be
part of the main genome; others clearly contain genomic sequences.
In some cases most of the DNA sequences in the mitochondrion can be
found in the collection of circles, although individual circles are often much
smaller than the size of the genome. Cultured tobacco cells provide a good
case in point. These cells are an excellent source of the relatively small
mitochondrial DNA circles which can be purified on density gradients and
analyzed with restriction enzymes.
Although none of the circles comes close to the size of the genome as a
whole, the restriction profile of the DNA in the circular fraction is identical to
that of total mitochondrial DNA from either cultured cells or intact plants.
At least in cultured tobacco cells, therefore, it seems that the entire
mitochondrial genome can exist as a population of subgenomic circles.
Mitochondrial DNA consisting predominantly of small circles may be
peculiar to cultured plant cells, since it is often difficult to isolate any circular
DNA at all (with the exception of plasmids) from the mitochondria of mature
plants.
In these cases the physical form of the DNA remains unknown.
Although it is logical to suppose that larger circles are present in vivo, technical
difficulties make this hypothesis hard to test.
The complex and variable pattern of mitochondrial DNA organization
was difficult to rationalize until complete restriction maps became available.
The relatively small Brassica mitochondrial genome was mapped by J.
Palmer and C. Shields at the Carnegie Institution, Department of Plant
Biology, using relatively simple mapping techniques developed originally for
chloroplast DNA.
Meanwhile, D. Lonsdale and his colleagues at the Plant Breeding
Institute in Cambridge, England, had been using chromosome walking
techniques to map the much larger maize mitochondrial genome. The two
groups reported their findings almost simultaneously.
It was discovered in both cases that the genome contained a set of
repeat sequences which could be found associated with different permutations
of flanking sequences, defining a set of substoichiometric restriction fragments.
This data is highly consistent with a model in which site specific
recombination occurs at the repeated sequence, generating a series of
subgenomic circular molecules that are in conformational equilibrium with
each other and with a master circle.
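The recombination model can be made concrete with a small Python sketch. It assumes the simplest case, a master circle carrying exactly two copies of a repeat in direct orientation, and represents the genome as a list of segment labels read around the circle. The labels and the function name resolve_direct_repeats are hypothetical. Recombination between the two copies splits the master circle into two subgenomic circles, each retaining one copy of the repeat.

```python
def resolve_direct_repeats(master, repeat):
    """Split a circular map at two direct copies of `repeat`.

    `master` is a list of segment labels read around the circle.
    Recombination between the two copies yields two smaller circles,
    each keeping one copy of the repeat.
    """
    positions = [i for i, seg in enumerate(master) if seg == repeat]
    if len(positions) != 2:
        raise ValueError("this sketch expects exactly two copies of the repeat")
    i, j = positions
    circle_a = master[i:j]               # first copy plus segments up to the second
    circle_b = master[j:] + master[:i]   # second copy plus the rest of the circle
    return circle_a, circle_b

# Hypothetical master circle: unique segments A-E and a repeat R present twice.
master = ["R", "A", "B", "R", "C", "D", "E"]
sub1, sub2 = resolve_direct_repeats(master, "R")
print(sub1, sub2)
```

Together the two subcircles contain every segment of the master circle, which is why the restriction profile of a population of subgenomic circles can match that of the whole genome.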
Recombination in mitochondrial DNA has also been shown to occur in
somatic hybrid cells produced by protoplast fusion techniques. Simply
maintaining cells in tissue culture can lead to variations in the restriction
pattern of their mitochondrial DNA.
It is difficult from our present level of knowledge to determine whether
these changes reflect recombination events or simply differential replication of
different preexisting variant molecules. However, certain somatic hybrid cells
have been shown to contain mitochondrial DNA restriction fragments which
are not present in either parental genome.
And clones have been obtained which contain marker DNA segments
from different parents. As yet it is not known whether intergenomic
recombination events occur by the same mechanism as the intragenomic
recombinations that produce the diverse array of molecules in a single genome,
but it seems reasonable to suppose that they do.
Size of Some Higher Plant Mitochondrial Genomes

Species             Genome size (kb)

Cucurbits
Watermelon          320
Zucchini            900
Cucumber            1,600
Muskmelon           2,600

Other Dicots
Oenothera           20
Brassica            215
Pea                 430
Mung Bean           400
Pokeweed            330
Spinach             300
Atriplex rosea      290
Atriplex halimus    260

Monocots
Maize               570
Wheat               430
MITOCHONDRIAL PLASMIDS
In addition to a variety of circular mitochondrial chromosomes,
mitochondria from a number of plants contain episomal or plasmid-like
molecules. These are generally small circular or small linear double-stranded
DNAs. They are usually detected as strong discrete bands in electrophoretic
separations of untreated (intact) mitochondrial DNA.
In well studied cases, it has been shown that the episomal DNAs do not
fit into the restriction map of the main genome. Plasmid-like DNAs have been
characterized in a number of plants, including sugar beet, sorghum, and some
species of Brassica. Normal maize mitochondria carry a linear episome 2.3 kb
in length and a circular DNA of about 1.9 kb.
This is in addition to the various subgenomic circles generated by
recombination in the main genome. The significance (if any) of the plasmid-
like DNAs to the plant is not known, and the plasmid-like DNAs can be lost
without obvious effects on appearance or viability.
The best studied examples of mitochondrial episomes are found in
certain male sterile lines of maize. They have been studied, at least in part, in
the hope that they might be involved in producing the male sterility trait.
Cytoplasmic male sterility (CMS) appears to be associated with
alterations in the mitochondrial DNA.
Several types of CMS maize cytoplasm can be distinguished by their
responses to nuclear restorer genes, which restore fertility to some types but not
to others, and by their mitochondrial polypeptides and the restriction profiles of
their mitochondrial DNA. A major CMS cytoplasm is the S type, which is
characterized by two prominent episomal DNAs, S-1 and S-2, with terminal
inverted repeats 208 bp long.
These repeats are covalently linked to protein. By analogy to adenovirus
and certain bacteriophage systems, it is thought that the terminal protein
complexes may be involved in initiating DNA replication. The episomal DNAs
S-1 and S-2 are not detectable as free episomes in the mitochondria from
CMS-S plants that have reverted to fertility, but sequences homologous to S-1
and S-2 can be found in high molecular weight mitochondrial DNA in normal,
CMS-S, and fertile revertant cytoplasms.
Their arrangement with respect to adjacent genomic sequences differs
between CMS-S and revertants, however, and it is thought that this
rearrangement may somehow be involved in the process of reversion to
fertility. C. Schardl, in collaboration with D. Lonsdale at the Plant Breeding
Institute in Cambridge, England, and with D. Pring and K. Rose at the
University of Florida, analyzed cosmid clones
containing sequences homologous to S-1 and S-2 from CMS and revertant
plants.
Data from restriction analysis of many such cosmids was consistent
with a model in which recombination between the terminal inverted repeats of
the S-1 and S-2 episomes and homologous sequences in the mitochondrial
chromosomes would generate linear mitochondrial DNA molecules containing
S-1 or S-2 at their ends.
The relationship of this linearization of the otherwise circular
mitochondrial genome to the CMS phenotype is still not clear, although the fact
that the same arrangements are seen in the DNA of both CMS plants and plants in
which the CMS phenotype has been corrected by nuclear genes indicates that
linearization per se is not sufficient to cause male sterility.
Chloroplast Sequences in Mitochondrial DNA
The observation that chloroplast DNA sequences are contained in the
mitochondrial genome provided another surprise for plant molecular biologists.
The initial observations came from Cambridge, England, where D. Stern and D.
Lonsdale of the Plant Breeding Institute reported that mitochondrial DNA from
maize contained a 12 kb sequence from the maize chloroplast genome.
When labelled chloroplast DNA was reacted with restriction digests of
mitochondrial DNA, this sequence hybridized preferentially.
Investigations further showed that the preferentially hybridizing
sequence could be cloned on a cosmid (a plasmid packaged in a lambda virus
coat) that contained mitochondrial DNA on either side of the chloroplast
segment. It was found that the 12 kb sequence contained a portion of the
chloroplast inverted repeat with genes for several tRNAs and the 16S ribosomal
RNA.
By restriction mapping, the mitochondrial version of the 12 kb sequence
appeared virtually identical to its presumed progenitor sequence in the
chloroplast, the only differences occurring at the junctions between the ends
of the inserted segment and the mitochondrial DNA sequences. Two other
segments of chloroplast DNA have been characterized in the maize
mitochondrial genome.
The first of these segments includes the 3' end of the chloroplast 23S
ribosomal RNA gene, the genes for 4.5S and 5S ribosomal RNAs, and two
tRNA genes. The other segment contains the rbcL gene and its flanking
sequences on both the 3' and 5' ends. The gene is functional in the E. coli
transcription/translation system and its protein product can be precipitated with
antibodies to RuBP carboxylase.
However, the mitochondrial gene produces a truncated polypeptide of
21,000 daltons instead of the 54,000 dalton protein synthesized by the
chloroplast gene. Whether this gene or any other chloroplast gene actually
functions in the mitochondrion is not known. However, the genetic code is
slightly different in mitochondria and there are also likely to be important
differences in transcriptional and translational control signals between the two
organelles.
In view of these differences it seems unlikely that chloroplast genes
could be functional in the mitochondrial environment. The presence in
mitochondrial DNA of sequences that hybridize to chloroplast DNA has since
been shown to be a widespread phenomenon, which is not restricted to maize.
In collaboration with J. Palmer, who had been studying evolutionary
relationships among the chloroplast DNAs of a wide variety of plants,
D. Stern showed homologies between cloned segments of chloroplast
DNA and mitochondrial DNA fragments from several species, including pea,
mung bean, spinach, and four different species of cucurbits. Adding up the
segments of chloroplast DNA that seemed to be represented in the
mitochondrial DNA of one or more of these plants gave the impression that
almost the entire chloroplast genome might be subject to random transfer.
In addition, different degrees of homology were seen, suggesting that
transfer events have occurred at different times during evolution. Although
some of the homologies observed in these survey experiments might be the
result of cross hybridization between chloroplast and mitochondrial genes of
similar function, this is not always the case, and it appears that there are too
many cross reacting fragments to be easily accounted for by this hypothesis.
The overall picture is most consistent with a series of events in which
random sequences from the chloroplast genome appear at random positions in
the mitochondrial genome.
These events would be frequent in an evolutionary sense. There is no
direct evidence concerning the mechanism by which DNA transfer occurs
between organelles.
However, it is easier to reconcile the present indirect evidence with an
essentially random process than with directed transfer by some kind of vector.
The fusion of organelles or the uptake by one organelle of DNA
released by lysis of another organelle might occur often enough to explain the
foregoing observations.
This view is also consistent with the demonstration that chloroplast
DNA fragments can be found in the nuclear DNA of higher plants.
Sequences transferred between organelles in this way have been called
promiscuous DNA, a term designed to highlight the random nature of the
process.
Transfers from a mitochondrion to chloroplast have not yet been
demonstrated and may not occur with significant frequency. The chloroplast
genome simply may not tolerate random insertions of foreign DNA. As noted
previously, although rearrangements have occurred in a few chloroplast
genomes, chloroplast DNA generally shows a high degree of conservation in
both size and sequence arrangement. Those insertion and deletion events which
do occur seem mostly to involve rather small segments of DNA, which would
not be easy to detect by hybridization techniques. In contrast, the large and
highly variable mitochondrial and nuclear genomes probably contain many
regions in which relatively large pieces of foreign DNA can be inserted with
minimal effect.
Gene Content, Structure and Expression of Mitochondrial Genome
Mitochondria of higher plants contain substantially the same set of
genes as those of other organisms. These include at least one ribosomal protein
gene and the ribosomal and transfer RNAs required for the mitochondrial
translation system.
Mitochondria of higher plants also contain genes for a number of proteins
involved in the electron transport and ATPase complexes of the inner
mitochondrial membrane.
Genes identified in plant, yeast and animal mitochondria are listed in the table at the end of this section. The
list is reasonably complete for yeast and animal mitochondria, for which
extensive information is available (including the complete DNA sequence of
several animal mitochondrial genomes).
In plants, with their very large mitochondrial genomes, it remains
possible that additional genes will be discovered.
However, it seems unlikely that the number of additional genes will be
large. When the products of mitochondrial protein synthesis in vitro are
separated by one or two dimensional PAGE, about 30 polypeptide spots can be
seen. Some of the spots may be artifacts of the in vitro reaction, so 30 spots is
probably a maximum estimate.
Thus, even though the number of genes listed is small, it is likely to
represent a reasonably large fraction of all plant mitochondrial genes. The
positions of a number of known mitochondrial genes have been mapped on the
570 kb master circle of the maize mitochondrial genome. Several differences from the
chloroplast genome are immediately apparent, both in number and in location
of known genes.
For example, the genes for mitochondrial ribosomal RNA are arranged
in a pattern similar to that in yeast, with the 26S rRNA gene separated by a
large distance from the genes for the 18S and 5S rRNAs. These genes are quite
close together (separated only by a tRNA gene) in animal mitochondria, and
they are part of a single operon in chloroplasts and E. coli. Plant mitochondrial
genes also differ from corresponding genes in fungi. An example is found in
the presence or absence of introns. The genes for cytochrome c oxidase subunit I
(COI) and apocytochrome b (COB) in fungi contain multiple introns.
These have been extensively studied by laboratories interested in the
mechanism of RNA splicing since sequences within these introns appear to
encode their own splicing enzymes (or maturases).
In maize neither COI nor COB contains an intron. On the other hand,
the maize cytochrome c oxidase subunit II gene (COII) contains an intron which
is missing from the COII genes of fungi and animals. Wheat mitochondrial COII
contains a similar intron, whereas the COII gene from Oenothera does not. In
maize this intron seems to be spliced out to make a stable, possibly circular
RNA.
Little is known about the splicing mechanism except that on the basis of
the DNA sequence one may predict that splicing is dependent on RNA
secondary structure rather than on specific splicing signals, as in the case of
nuclear or chloroplast mRNAs. The amount of mtDNA in a plant is less than
1% of the total cellular DNA; however, it plays a vital role in the development
and reproduction of the plant. The genetic system within the mitochondria of
all organisms is unusual in that it is neither wholly prokaryotic nor eukaryotic
in nature.
Some similarities to bacterial protein synthesis have been observed,
such as sensitivity to antibiotics, sequence homology of rRNAs and the use of
N-formyl-methionine to initiate the polypeptide chain.
However, the diversity of mitochondrial tRNAs and their structure
differ from those found in prokaryotes, the eukaryotic cytoplasm or
chloroplasts.
Mitochondrial ribosomes range from 55S (in animals) up to 77-78S (in
plants) in contrast to 70S chloroplast ribosomes and 80S cytoplasmic
ribosomes.
The major difference between the mitochondrial genetic system and all
other systems is that mitochondria use a slightly altered genetic code.
The genetic code used by mitochondria often differs from the so-called
universal code used by nuclear and chloroplast genes. For example, yeast and
animal mitochondria use the triplet TGA (or UGA in RNA) instead of the normal TGG
to code for tryptophan. Mitochondria in higher plants also appear to use CGG
to code for tryptophan. TGA, which encodes tryptophan in the mitochondria of
other species, appears to be used as a stop codon in plant mitochondrial genes.
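The effect of a single reassigned codon can be shown with a minimal Python sketch. The codon tables below are deliberately truncated to the few codons used in the example (a full table has 64 entries), the message is hypothetical, and codons are written as RNA (UGA) rather than DNA (TGA). The variant table encodes only the change described above for yeast and animal mitochondria, where UGA is read as tryptophan.

```python
# Truncated codon tables: only the codons needed for this example.
UNIVERSAL = {
    "AUG": "M", "UGG": "W", "UGA": "*", "GCU": "A", "UAA": "*", "AAA": "K",
}
# Variant described in the text for yeast and animal mitochondria:
# UGA is read as tryptophan (W) instead of stop.
MITO_VARIANT = dict(UNIVERSAL, UGA="W")

def translate(rna, table):
    """Translate rna codon by codon, stopping at the first stop codon (*)."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = table[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

# Hypothetical message with an in-frame UGA codon.
message = "AUGGCUUGAAAAUAA"
print(translate(message, UNIVERSAL))
print(translate(message, MITO_VARIANT))
```

Read with the universal code the message terminates at UGA, whereas with the variant table translation continues through it. This is the kind of mismatch that would complicate expressing a gene in a compartment that uses a different code.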
In plant mitochondrial genes, there seems to be a strong bias toward the
use of codons ending in T, in yeast for those ending in A or T, and in animals
for those ending in A or C. These differences, especially those involving novel
termination codons, would make it difficult to express nuclear or chloroplast
genes in the mitochondrial environment.
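A codon-ending bias of this kind is measured by tallying the third base of every codon in a coding sequence. The following Python sketch shows the counting step only. The sequence is hypothetical and far too short to say anything about a real genome, and the function name third_base_usage is an illustrative choice.

```python
from collections import Counter

def third_base_usage(cds):
    """Count the base at the third position of each complete codon."""
    usable = len(cds) - len(cds) % 3   # ignore any trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return Counter(codon[2] for codon in codons)

# Hypothetical coding sequence, written as DNA.
cds = "ATGGCTTTTGGTCCTTAA"
usage = third_base_usage(cds)
print(usage)
```

Applied to a whole set of genes, the same tally would show whether codons ending in a particular base are over-represented.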
As in the case of chloroplast genes, mitochondrial genes often produce a
complex set of transcripts. Processing occurs at the ends of the tRNAs, which
are inserted like punctuation marks at the ends of structural genes.
Polyadenylation occurs in neither yeast nor plant mitochondria and transcripts
(although often much larger than an individual gene) do not include the entire
genome.
As yet relatively little is known about transcription or its control in
higher plant mitochondria. Internal fragments of all three of the genes
mentioned above (COI, COII and COB) hybridize to a complex pattern of large
fragments on Northern blots, suggesting multiple initiation, termination, and/or
processing sites.
Intron splicing adds to the complexity, especially when, as in the case of
the COII gene, the excised intron sequence is sufficiently stable to be detected
in hybridization experiments. It is premature to attempt to analyze these
complex patterns in much detail, but we can expect to see major advances in
this area as more genes are identified, mapped, sequenced, and used to analyze
the synthesis and processing of mRNA.
What function does this genome carry out in the cell? Like the
chloroplast, the mitochondrial genome codes for a small but important number
of mitochondrial polypeptides. There is no evidence that it codes for
extramitochondrial components. The mtDNA appears to have a similar
function in cells of all species.
Genes Identified in Plant, Yeast or Animal Mitochondria

Gene                              Plant    Yeast    Animal

Ribosomal genes
  Large subunit                   +        +        +
  Small subunit                   +        +        +
  5S rRNA                         +        -        -
Transfer RNAs                     30       25       22
Ribosome-associated protein       +(1)     +        -
Cytochrome c oxidase complex
  Subunit I (COI)                 +        +        +
  Subunit II (COII)               +        +        +
  Subunit III (COIII)             ?        +        +
Cytochrome c reductase
  Apocytochrome b (COB)           +        +        +
ATPase complex
  Subunit F1 alpha                +        -        -
  Subunit 6                       ?        +        +
  Subunit 8                       ?        +        +
  Subunit 9                       +        +        -
NADH dehydrogenase complex
  ND-1                            +(2)     +(?)     +
Unassigned reading frames         ?        ?        2

Cytoplasmic Male Sterility - CMS-T


The most extensively studied phenomenon involving the mitochondrial
genome is cytoplasmic male sterility (CMS).
CMS has been exploited by breeders in the production of hybrid lines of
crop plants, particularly maize, and there is now a large body of evidence
implicating the mitochondrial genome in this phenomenon.
The evidence comes firstly from restriction endonuclease analysis
which shows differences between the mtDNA from normal and CMS
cytoplasms.
Secondly, changes have been demonstrated in the products of protein
synthesis in isolated mitochondria, and these have been correlated with the
cytoplasm type, and with the presence or absence of nuclear genes which
restore fertility (restorer genes).
For example, in Texas type maize (CMS-T) the mitochondria
synthesize an extra 13,000 MW polypeptide but a 21,000 MW polypeptide
characteristic of normal cytoplasm is missing. When T-type cytoplasm is
restored by its nuclear restorer gene, the 13,000 MW polypeptide is no longer
synthesized, but the 21,000 MW polypeptide does not reappear.
A third line of evidence comes from ultrastructural studies of
developing anthers; in maize S-type cytoplasm the first sign of abnormality is
mitochondrial degeneration in the tapetal cells. Cytoplasmic male sterility is
also correlated with the plasmid-like DNA molecules found in mitochondria.
S-type maize cytoplasm carries two linear molecules of 6.2 kbp (S1) and
5.2 kbp (S2).
They have inverted terminal repeats of 200 bp, show some sequence
homology and can be integrated into the mitochondrial genome. Other maize
CMS cytoplasms have different combinations of plasmid-like elements. In
Brassica species the presence of an 11.3 kbp plasmid is strongly correlated with
CMS, but this plasmid is not homologous to any found in maize. Thus there is a
general consensus of opinion that cytoplasmic male sterility resides in the
mitochondrial genome.
CMS in higher plants causes pollen abortion but does not affect female
fertility. The trait is inherited in a uniparental fashion through the female parent
(egg) and is observed in over 140 different plant species. These features have
made CMS useful in hybrid seed production and have eliminated the need for
mechanical or hand emasculation. Plant mitochondrial genomes are transmitted
through the egg and not the pollen. Some CMS types appear to be caused by novel
mitochondrial genes while others seem to be due to mutations of common
mitochondrial genes. In both instances rearrangements have played a prominent
role in the origin of these exceptional genes. The Texas male-sterile cytoplasm
(CMS-T) of maize is characterized by a failure of anther exertion and by pollen
abortion.
Also associated with CMS-T is susceptibility to the fungal pathogen
Bipolaris maydis, race T (the cause of Southern maize leaf blight); other maize cytoplasms
are resistant to race T. CMS-T is distinguished from other male sterile
cytoplasms by two nuclear genes, Rf1 and Rf2, that uniquely suppress the CMS
trait and restore normal pollen production. Two open reading frames, now
designated T-urf13 and ORF25, are found in a 3,547 nucleotide sequence of
mtDNA from CMS-T maize.
T-urf13 encodes a polypeptide of 12,961 daltons; ORF25 could
encode a polypeptide of 24,657 daltons but a translation product has not been
confirmed.
The 3,547 nucleotide sequence is noteworthy in that it is composed of
sequences with significant nucleotide homology to the 5'-flanking region of the
atp6 gene, the 3'-flanking region of the 26S rRNA gene (rrn26), a part of the
coding region of rrn26, and a chloroplast tRNA-Arg gene.
The chimeric sequence contains at least seven readily identifiable
recombinational sites, suggesting that it may have originated by rearrangements
involving both intramolecular and intermolecular recombinations. The coding
region of T-urf13 consists of 88 codons with homology to an untranscribed
3'-flanking region of rrn26, nine codons of unknown origin, and 18 codons with
homology to the coding region of rrn26.
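The codon counts given above can be checked against the reported size of the polypeptide with a little arithmetic. The Python sketch below adds up the three blocks of codons and multiplies by an average residue mass of about 110 daltons; that figure is a common rule of thumb and is an assumption here, not a value taken from the text, so the result is only a rough estimate.

```python
# Codon blocks of the T-urf13 coding region, as listed in the text.
codon_blocks = {
    "homology to 3'-flanking region of rrn26": 88,
    "unknown origin": 9,
    "homology to coding region of rrn26": 18,
}
total_codons = sum(codon_blocks.values())

# Assumption: about 110 Da average mass per amino acid residue (rule of thumb).
AVERAGE_RESIDUE_MASS_DA = 110
estimated_mass = total_codons * AVERAGE_RESIDUE_MASS_DA

print(total_codons)    # 115
print(estimated_mass)  # 12650
```

The estimate of roughly 12,650 daltons lies within a few percent of the 12,961 daltons quoted for T-urf13, so the codon counts and the polypeptide size are consistent with each other.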
It is thought that T-urf13 and atp6 have similar promoters because the
5'-flanking region of T-urf13 is almost identical to the 5'-flanking region of
atp6. Autonomous copies of the atp6 and rrn26 genes are located elsewhere in
the genome. It is striking that a functional gene could originate from these
rearrangements because the coding region of T-urf13 is composed chiefly of
mtDNA from the coding and flanking regions of rrn26, a gene coding for a
structural RNA rather than a polypeptide.
Furthermore, the open reading frame of T-urf13 is fortuitously
situated immediately downstream of a 5'-flanking sequence with promoter
activity. Given its unusual origin, it is not surprising that T-urf13 is only found
in CMS-T maize. T-urf13 encodes a 13 kd polypeptide that is constitutively
expressed in the mitochondria of all organs of CMS-T maize.
The polypeptide was demonstrated by Western blotting and
immunoprecipitation studies with antisera prepared against chemically
synthesized oligopeptides based on the predicted amino acid sequence of
T-urf13. As expected, the 13 kd polypeptide was not detected in other maize
cytoplasms or in other plant species. T-urf13 is localized in the mitochondrial
membranes, although its exact location is not known.
It has been suggested that it may be associated with the ATPase
complex because it coprecipitates with subunit 9 of the ATPase complex;
however, it has also been observed in isolates of complex IV (cytochrome
oxidase). The dominant nuclear alleles, Rf1 and Rf2, restore full pollen fertility
to CMS-T maize; Rf1 specifically affects expression of the mitochondrial gene
T-urf13, reducing the abundance of the 13 kd polypeptide by about 80% and
altering the transcriptional products of T-urf13.
The recessive allele (rf1) does not affect expression of T-urf13. Rf1 may act
through a transcript processing event; however, additional studies are needed to clarify
the mechanism. It is not known whether Rf1 originated before T-urf13 or came
later. Rf1 could have other activities besides its effects on T-urf13, although
none is known. In any event, it is interesting that Rf1 uniquely suppresses
expression of the rare gene T-urf13. In contrast, there is no evidence that Rf2
affects expression of T-urf13 and it is not clear how it contributes to pollen
restoration.
The genetic basis of CMS is of great interest, as the use of these lines
can lead to a high degree of cytoplasmic uniformity in a crop. Such uniformity
is undesirable as it can render a crop vulnerable to damage by pathogens or
extreme environmental conditions. This problem is well illustrated by the
massive losses suffered in the American maize crop in 1970 due to a race of
southern corn leaf blight which preferentially infected plants with CMS-T type
cytoplasm.
Over 85% of the hybrid maize carried this cytoplasm.
The identification of "promiscuous" DNA came from a study of plant mtDNA in
conjunction with a study of other genomes. Ellis (1982) used this term to describe DNA
sequences that are found in more than one genome of the plant. Maize mtDNA
carries a 12 kbp sequence which is homologous to that part of the inverted
repeat of the chloroplast genome which encodes the 16S rRNA and two tRNAs.
This sequence is not transcribed in mitochondria but appears to be
important as it is altered in certain male sterile lines. Homology between
sequences of the nuclear genome and some plasmid-like elements of the
mitochondrial genome has also been detected in maize. It was suggested that
these plasmid-like elements might provide a mechanism for the transfer of
genetic information between genomes.
The mitochondrial genome is perhaps the least well understood plant
genome in terms of function, structure and relationship to development of the
plant. Interest has recently grown, however, as the importance of the genome
to normal development is now realized and techniques for its analysis have
become available.
Cell-Specific Regulation of Gene Expression in Mitochondria during
Anther Development in Sunflower
In higher plants, mitochondrial function must be developmentally
regulated to meet the changing respiratory demands of different tissues. In
some tissues, high respiratory demand may be met by active mitochondrial
biogenesis to produce sufficient numbers of mitochondria and therefore energy
in the form of ATP. Active biogenesis of mitochondria has been shown to
occur in the root apical meristem of Arabidopsis and other species where
mitochondrial numbers increase from 65 to 200 per cell and respiration rates
are high. Mitochondrial biogenesis has also been reported to occur in meiotic
and tapetal cells during early anther development in maize where numbers of
mitochondria increase 20- to 40-fold. This developmental control of
mitochondrial biogenesis may be partly regulated at the level of mitochondrial
gene expression, as has been suggested by experiments where mitochondria
isolated from different tissues were found to synthesize different profiles of
proteins. Finnegan and Brown (1990) proposed that this developmental
regulation of mitochondrial gene expression occurs at the transcriptional and
post-transcriptional levels.
In many biological systems, mutations that perturb the normal
developmental pathway can be used to define the important processes
occurring in that pathway. Unfortunately, our understanding of how
mitochondrial function and gene expression are developmentally regulated in
higher plants has been
limited because only two mitochondrial mutations have been associated with
nonlethal phenotypes: nonchromosomal stripe (NCS) in maize and cytoplasmic
male sterility (CMS) in a range of higher plants. Characterization of leaf
development in NCS mutants has indicated that mitochondrial function is
necessary for normal development of chloroplasts and acquisition of
photosynthetic competence. Similarly, investigation of CMS has shown that
mitochondrial function is essential during pollen formation and therefore sexual
reproduction in higher plants.
CMS is a maternally inherited phenotype characterized by an inability
to produce functional pollen. Interestingly, it only disrupts pollen production;
vegetative development and female fertility are apparently unaffected. In
nature, CMS can arise spontaneously, where it may be maintained by an
increase in female fertility, but for commercial purposes it is often induced by
interspecific and intraspecific crosses that introduce a nucleus into a foreign
cytoplasm. In many species, the CMS phenotype is associated with mutations
in the mitochondrial genome and can be suppressed by the action of nuclear-
encoded fertility restorer genes. Thus, the study of fertile, CMS, and restored
fertile hybrid plants provides a good experimental system for the investigation
of the developmental regulation of mitochondrial gene expression by nuclear
genes.
A great deal of our present understanding of the molecular and
biochemical basis of CMS comes from studies of male sterility in maize
(CMS-T) and petunia. In these systems, polypeptide products of
novel open reading frames (ORFs) have been found to be associated with CMS
(T-urf13 and the pcf gene, respectively) and are present at reduced levels in
the restored hybrid plants. However, our understanding of how these novel
polypeptides might cause pollen forming cells to abort is still limited. It is also
obvious from cytological studies of different types of CMS within the same
species that the mechanisms of pollen abortion are diverse. Therefore, analysis
of different CMS systems will extend our knowledge of the important
cytoplasmic processes occurring during pollen production.
Recently, molecular characterization of CMS in sunflower has shown
that as in the maize CMS-T and petunia systems, a polypeptide that is the
product of a novel ORF is associated with pollen cell abortion. The abundance
of this protein is reduced specifically in male florets upon restoration of
fertility. CMS in sunflower provides an attractive system for further analysis
because it is associated with the expression of a single novel ORF, and the
timing of meiotic cell abortion has been characterized. We have undertaken a
cytological investigation of anther development in sunflower florets, coupled
with further molecular analysis of CMS in sunflower, to characterize the nature
of mitochondrial gene expression and function during anther development and
the cause of meiotic cell (meiocyte) abortion in male sterile sunflowers. CMS
in sunflower was first identified by Leclercq (1969) in the progeny of an
interspecific cross between Helianthus annuus and H. petiolaris (PET1 type).
Leclercq (1984) later identified two dominant restorer genes of the CMS
phenotype in sunflower (Rf1 and Rf2) and postulated from restoration genetics
that Rf2 is dominant in most sterile cytoplasms. In sunflower, the CMS
phenotype is associated with a mutation in the mitochondrial genome of the
sterile line that is thought to have been created by inversion/insertion events
involving recombination across a small repeat. The insertion of a novel
fragment of DNA has led to the creation of a novel open reading frame
(orf522) downstream of the atpA gene. In sterile and restored hybrid lines,
orf522 is cotranscribed with atpA on a 3-kb transcript that is translated to
produce a novel 15-kD polypeptide (ORF522), in addition to the α subunit of
the mitochondrial F1-ATP synthase, encoded by atpA. The restorer gene(s)
acts specifically in male florets to reduce the abundance of the atpA-orf522
cotranscript and therefore the 15-kD ORF522 polypeptide, suggesting that the
expression of orf522 is probably causally related to the CMS phenotype.
In this study, we have addressed the problem of why the expression of
orf522 only affects microsporogenesis by observing the expression of
mitochondrial genes during anther development in sunflower. We showed that
the atpA transcript and the α subunit protein are particularly abundant in young
meiocytes of all lines, implying a cell-specific regulation of expression of the
mitochondrial genome. Our data indicated that this pattern of expression is not
a general phenomenon but is only observed for mitochondrially encoded genes.
Finally, we present evidence that suggests that the restorer gene(s) may act cell
specifically in meiocytes of the restored hybrid line to reduce the abundance of
the atpA-orf522 transcript. These results provide compelling evidence for cell-
specific regulation of the expression of the mitochondrial genome during anther
development and demonstrate that nuclear genes can modify this expression.
Meiosis in Sunflower Anthers
To investigate the importance of mitochondrial gene expression during
the early stages of anther development and its significance for the CMS
phenotype, RNA in situ hybridization analysis was used to investigate the
spatial distribution and abundance of mitochondrial gene transcripts. This first
required a characterization of male meiosis in sunflower anthers.
Sunflower (H. annuus) is a member of the Compositae family, which is
characterized by a single terminal inflorescence consisting of 700 to 3000
individual disc flowers or florets (Knowles, 1978). Sunflower florets are
hermaphroditic, as they are composed of both male and female organs. The
anthers are present in the upper part of the floret (“male floret,” represented by
shaded ellipses in Figure 1C); the ovaries are present in the lower part of the
floret (“female floret,” represented by open circles in Figure 1C). Sunflower
florets exhibit protandry, that is, the male part of the flower (anthers) matures
before the female. Therefore, under normal conditions, male meiosis occurs
before the sunflower inflorescence opens when the inflorescence bud is
between 2.5 and 4.5 cm in diameter (Figure 1A). Within the inflorescence, the
florets develop sequentially in whorls from the periphery to the center at a rate
of about one to four whorls a day. This sequential maturation of disc flowers is
demonstrated in the open inflorescence shown in Figure 1. For the purposes of
this study, six stages of meiosis were identified and investigated, beginning
with the formation of the pollen mother cells prior to meiocyte abortion in
sterile anthers and ending with microspore release from the tetrad in the fertile
and restored hybrid lines (Figures 1C and 2). These phases of meiosis were
characterized by using histological and 4',6-diamidino-2-phenylindole (DAPI)
stains on fixed and embedded florets. It was found that the meiotic stage is
correlated with the length of the male part of the floret bud (male floret), as
shown in Figure 1C. These stages of sunflower microsporogenesis correspond
to those described by Horner (1977). The development of anthers in the fertile
and sterile lines was compared at the six stages of meiosis described above.
Development of anthers in florets of the sterile line appears completely normal
during the premeiosis and leptotene stages (Figure 2), but by pachytene (stage
3), abnormalities are detected in the meiocyte cells of the sterile line when
observed by electron microscopy (Laveau et al., 1989; C.J. Smart, data not
shown). By stage 4 (divisions), the tapetum shows abnormal development and
has completely degenerated by the tetrad stage in the sterile line (Figure 2).
Therefore, for the purposes of this study, comparisons of mitochondrial gene
expression in fertile, sterile, and restored hybrid lines were made between male
florets or anthers at the leptotene stage (stage 2) prior to meiocyte abortion in
the sterile line.

TRANSPOSABLE ELEMENTS IN MAIZE AND DROSOPHILA
Transposons are sequences of DNA that can move around to different
positions within the genome of a single cell, a process called transposition. In
the process, they can cause mutations and change the amount of DNA in the
genome. Transposons were also once called "jumping genes", and are examples
of mobile genetic elements. They were discovered by Barbara McClintock
early in her career, for which she was awarded a Nobel Prize in 1983. There
are a variety of mobile genetic elements, and they can be grouped based on
their mechanism of transposition. Class I mobile genetic elements, or
retrotransposons, move in the genome by being transcribed to RNA and then
back to DNA by reverse transcriptase, while class II mobile genetic elements
move directly from one position to another within the genome using a
transposase to "cut and paste" them within the genome. Transposons are very
useful to researchers as a means to alter DNA inside of a living organism.
Transposons make up a large fraction of many eukaryotic genomes, which is
reflected in the wide range of C-values among eukaryotic species.
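The difference between the two classes of element can be made concrete with a toy model. In the Python sketch below the genome is simply a list of labels, and the function names cut_and_paste and copy_and_paste, the element label "TE" and the gene labels are all hypothetical. It is a simplification under these assumptions: the RNA intermediate and reverse transcription of class I elements are not modelled, only the outcome that the original copy stays in place.

```python
def cut_and_paste(genome, element, target):
    """Class II style move: excise the element, then reinsert it at `target`.

    `target` is an index into the genome after excision.
    """
    source = genome.index(element)
    remaining = genome[:source] + genome[source + 1:]
    return remaining[:target] + [element] + remaining[target:]


def copy_and_paste(genome, element, target):
    """Class I style move: the original stays put and a new copy is inserted."""
    if element not in genome:
        raise ValueError("element not present in genome")
    return genome[:target] + [element] + genome[target:]


# Hypothetical genome: genes A to D with one transposable element "TE".
genome = ["A", "TE", "B", "C", "D"]

moved = cut_and_paste(genome, "TE", 3)
copied = copy_and_paste(genome, "TE", 4)

print(moved)   # ['A', 'B', 'C', 'TE', 'D']
print(copied)  # ['A', 'TE', 'B', 'C', 'TE', 'D']
```

The cut-and-paste move leaves the genome the same length, whereas each copy-and-paste event adds one more copy of the element, which is one way in which transposition can change the amount of DNA in a genome.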
Applications
The first transposon was discovered in the plant maize (Zea mays, corn
species), and is named Dissociator (Ds). Likewise, the first transposon to be
molecularly isolated was from a plant (snapdragon). Appropriately,
transposons have been an especially useful tool in plant molecular biology.
Researchers use transposons as a means of mutagenesis. In this context, a
transposon jumps into a gene and produces a mutation. The presence of the
transposon provides a straightforward means of identifying the mutant allele,
relative to chemical mutagenesis methods.
Sometimes the insertion of a transposon into a gene can disrupt that
gene's function in a reversible manner; transposase-mediated excision of the
transposon restores gene function. This produces plants in which neighboring
cells have different genotypes. This feature allows researchers to distinguish
between genes that must be present inside of a cell in order to function (cell-
autonomous) and genes that produce observable effects in cells other than those
where the gene is expressed.
Transposons are also a widely used tool for mutagenesis of most
experimentally tractable organisms.
The first transposons were discovered in maize (Zea mays) by
Barbara McClintock in 1948, for which she was awarded a Nobel
Prize in 1983. She noticed insertions, deletions, and translocations caused by
these transposons. These changes in the genome could, for example, lead to a
change in the color of corn kernels. About 50% of the total genome of maize
consists of transposons. The Ac/Ds elements McClintock described are class II
transposons.
McClintock's research became well understood in the 1960s and 1970s,
as researchers demonstrated the mechanisms of genetic change and genetic
regulation that she had demonstrated in her maize research in the 1940s and
1950s. Awards and recognition for her contributions to the field followed,
including the Nobel Prize, awarded to her in 1983 for the discovery of genetic
transposition; she is the only woman to receive an unshared Nobel Prize in that
category.
Cold Spring Harbor
After her year-long temporary appointment, McClintock accepted a
full-time research position at Cold Spring Harbor. Here, she was highly
productive and continued her work with the breakage-fusion-bridge cycle,
using it to substitute for X-rays as a tool for mapping new genes. In 1944, in
recognition of her prominence in the field of genetics during this period,
McClintock was elected to the National Academy of Sciences—only the third
woman to be so elected. In 1945, she became the first woman president of the
Genetics Society of America. In 1944 she undertook a cytogenetic analysis of
Neurospora crassa at the suggestion of George Beadle, who had used the
fungus to demonstrate the one gene–one enzyme relationship. He invited her to
Stanford to undertake the study. She successfully described the number of
chromosomes, or karyotype, of N. crassa and described the entire life cycle of
the species. N. crassa has since become a model species for classical genetic
analysis.

Discovery of controlling elements

The relationship of Ac/Ds in the control of the elements and mosaic
color of maize. The seed in panel 10 is colorless: there is no Ac element present and
Ds inhibits the synthesis of colored pigments called anthocyanins. In panels 11 to 13,
one copy of Ac is present; Ds can move and some anthocyanin is produced,
creating a mosaic pattern. In the kernel in panel 14 there are two Ac elements
and in panel 15 there are three.
In the summer of 1944 at Cold Spring Harbor, McClintock began
systematic studies on the mechanisms of the mosaic color patterns of maize
seed and the unstable inheritance of this mosaicism. She identified two new
dominant and interacting genetic loci that she named Dissociator (Ds) and
Activator (Ac). She found that the Dissociator did not just dissociate or cause
the chromosome to break, it also had a variety of effects on neighboring genes
when the Activator was also present. In early 1948, she made the surprising
discovery that both Dissociator and Activator could transpose, or change
position, on the chromosome.
She observed the effects of the transposition of Ac and Ds by the
changing patterns of coloration in maize kernels over generations of controlled
crosses, and described the relationship between the two loci through intricate
microscopic analysis. She concluded that Ac controls the transposition of the
Ds from chromosome 9, and that the movement of Ds is accompanied by the
breakage of the chromosome. When Ds moves, the aleurone-color gene is
released from the suppressing effect of the Ds and transformed into the active
form, which initiates the pigment synthesis in cells. The transposition of Ds in
different cells is random; it may move in some but not others, which causes
color mosaicism. The size of the colored spot on the seed is determined by
the stage of seed development at which dissociation occurs. McClintock also found that
the transposition of Ds is determined by the number of Ac copies in the
cell.
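The link between the timing of transposition and the size of a colored spot can be sketched numerically. The Python example below assumes, purely for illustration, that every cell divides in two at each round of division and that all descendants of the cell in which Ds moved are pigmented; the function name colored_sector_size and the figure of ten division rounds are hypothetical, not values from McClintock's work.

```python
def colored_sector_size(total_divisions, excision_division):
    """Number of pigmented cells descended from one cell in which Ds moved.

    Assumes every cell divides in two at each round (a simplification).
    """
    if not 0 <= excision_division <= total_divisions:
        raise ValueError("excision must occur within the division series")
    return 2 ** (total_divisions - excision_division)


TOTAL_DIVISIONS = 10  # hypothetical number of division rounds

for division in (2, 6, 10):
    print(division, colored_sector_size(TOTAL_DIVISIONS, division))
# 2 256
# 6 16
# 10 1
```

In this simple model an early event gives a large colored sector and a late event gives a small speck, which matches the dependence of spot size on developmental stage described above.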
Between 1948 and 1950, she developed a theory by which these mobile
elements regulated the genes by inhibiting or modulating their action. She
referred to Dissociator and Activator as "controlling units"—later, as
"controlling elements"—to distinguish them from genes. She hypothesized that
gene regulation could explain how complex multicellular organisms made of
cells with identical genomes have cells of different function. McClintock's
discovery challenged the concept of the genome as a static set of instructions
passed between generations. In 1950, she reported her work on Ac/Ds and her
ideas about gene regulation in a paper entitled "The origin and behavior of
mutable loci in maize" published in the journal Proceedings of the National
Academy of Sciences. In summer 1951, when she reported on her work on gene
mutability in maize at the annual symposium at Cold Spring Harbor, the paper
she presented was called "Chromosome organization and genic expression".
Her work on controlling elements and gene regulation was conceptually
difficult and was not immediately understood or accepted by her
contemporaries; she described the reception of her research as "puzzlement,
even hostility". Nevertheless, McClintock continued to develop her ideas on
controlling elements. She published a paper in Genetics in 1953 where she
presented all her statistical data, and undertook lecture tours to universities
throughout the 1950s to speak about her work. She continued to investigate the
problem and identified a new element that she called Suppressor-mutator
(Spm), which, although similar to Ac/Ds, displays more complex behavior.
Based on the reactions of other scientists to her work, McClintock felt she
risked alienating the scientific mainstream, and from 1953 stopped publishing
accounts of her research on controlling elements.
The origins of maize

McClintock's microscope and ears of corn on exhibition at the National
Museum of Natural History
In 1957, McClintock received funding from the National Science
Foundation, and the Rockefeller Foundation sponsored her to start research on
maize in South America, an area that is rich in varieties of this species. She was
interested in studying the evolution of maize, and being in South America
would allow her to work on a larger scale. McClintock explored the
chromosomal, morphological, and evolutionary characteristics of various races
of maize. From 1962, she supervised four scientists working on South
American maize at the North Carolina State University in Raleigh. Two of
these Rockefeller fellows, Almiro Blumenschein and T. Angel Kato, continued
their research on South American races of maize well into the 1970s. In 1981,
Blumenschein, Kato, and McClintock published Chromosome constitution of
races of maize, which is considered a landmark study of maize that has
contributed significantly to the fields of evolutionary botany, ethnobotany, and
paleobotany.
Rediscovery of McClintock's controlling elements
McClintock officially retired from her position at the Carnegie
Institution in 1967, and was made a Distinguished Service Member of the
Carnegie Institution of Washington. This honor allowed her to continue
working with graduate students and colleagues in the Cold Spring Harbor Laboratory
as scientist emerita. In reference to her decision 20 years earlier no longer to
publish detailed accounts of her work on controlling elements, she wrote in
1973:
Over the years I have found that it is difficult if not impossible to bring
to consciousness of another person the nature of his tacit assumptions when, by
some special experiences, I have been made aware of them. This became
painfully evident to me in my attempts during the 1950s to convince geneticists
that the action of genes had to be and was controlled. It is now equally painful
to recognize the fixity of assumptions that many persons hold on the nature of
controlling elements in maize and the manners of their operation. One must
await the right time for conceptual change.
The importance of McClintock's contributions only came to light in the
1960s, when the work of French geneticists Francois Jacob and Jacques Monod
described the genetic regulation of the lac operon, a concept she had
demonstrated with Ac/Ds in 1951. Following Jacob and Monod's 1961 Journal
of Molecular Biology paper "Genetic regulatory mechanisms in the synthesis of
proteins", McClintock wrote an article for American Naturalist comparing the
lac operon and her work on controlling elements in maize. McClintock's
contribution to biology is still not widely acknowledged as amounting to the
discovery of genetic regulation.
McClintock was widely credited for discovering transposition following
the discovery of the process in bacteria and yeast in the late 1960s and early
1970s. During this period, molecular biology had developed significant new
technology, and scientists were able to show the molecular basis for
transposition. In the 1970s, Ac and Ds were cloned by other scientists and were
shown to be Class II transposons. Ac is a complete transposon that can produce
a functional transposase, which is required for the element to move within the
genome. Ds has a mutation in its transposase gene, which means that it cannot
move without another source of transposase. Thus, as McClintock observed, Ds
cannot move in the absence of Ac. Spm has also been characterized as a
transposon. Subsequent research has shown that transposons typically do not
move unless the cell is placed under stress, such as by irradiation or the
breakage, fusion, and bridge cycle, and thus their activation during stress can
serve as a source of genetic variation for evolution. McClintock understood the
role of transposons in evolution and genome change well before other
researchers grasped the concept. Nowadays, Ac/Ds are used as a tool in plant
biology to generate mutant plants used for the characterization of gene
function.
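The dependence of Ds on Ac can be expressed as a simple rule. The Python sketch below is a minimal model, assuming only what is stated above, namely that Ac supplies a functional transposase and Ds does not; the dictionary ELEMENTS and the function can_transpose are hypothetical names introduced for illustration, and effects such as stress or Ac copy number are ignored.

```python
# Hypothetical description of the two elements: only Ac makes transposase.
ELEMENTS = {
    "Ac": {"encodes_transposase": True},
    "Ds": {"encodes_transposase": False},
}


def can_transpose(element, elements_in_genome):
    """An element can move if any element in the same genome supplies transposase."""
    if element not in elements_in_genome:
        return False
    return any(ELEMENTS[name]["encodes_transposase"] for name in elements_in_genome)


print(can_transpose("Ds", ["Ds"]))        # False
print(can_transpose("Ds", ["Ds", "Ac"]))  # True
print(can_transpose("Ac", ["Ac"]))        # True
```

Ac is therefore described as autonomous and Ds as nonautonomous: Ds is mobile only in a genome that also carries a source of transposase.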
Mu Transposon Excision in Transgenic Maize
Mutator lines of maize contain a high-copy-number DNA transposon
family. Nine Mu element subfamilies exist. All share homologous flanking
~215-bp terminal inverted repeat (TIR) sequences. Subfamilies Mu1 to Mu8 are
nonautonomous and require a source of transposase to catalyze transposition.
The Mu transposase is encoded by the 4.9-kb MuDR element, which is present
in multiple copies in highly mutagenic Mutator lines.
Mu elements are an efficient transposon-tagging tool, because
multicopy MuDR lines have a forward mutation frequency 20- to 50-fold higher
than either Ac or Spm. Moreover, Mu elements transpose equally to linked and
unlinked sites. They exhibit an extremely high insertion bias (>90%) for low-
copy-number transcribed regions of the genome. Finally, Mu germinal insertion
events occur late, resulting in independent insertions in sibling progeny.
A fascinating component of Mutator biology is that MuDR catalyzes
distinct transposition behaviors of Mu elements in somatic and germinal cells.
The full somatic program involves activation, activity, and epigenetic silencing.
In a line with methylated Mu elements, introduction of a transcriptionally active
MuDR results in Mu element TIR demethylation in leaves. Demethylated Mu
elements can then excise at high frequencies, but only during the terminal cell
divisions of somatic tissues, as observed in anthers, aleurone, and leaves. In the
cells that give rise to gametes, Mu follows a different program, because
germinal revertants are exceedingly rare. Instead, Mu elements duplicate and
insert in late pregerminal, meiotic, and gametic cells but rarely in the vegetative
precursor cells that give rise to the inflorescences. After amplification, multiple
unlinked MuDR elements in some progeny or leaf sectors within progeny plants
undergo coordinate epigenetic transcriptional silencing, which results in the
remethylation of Mu element TIRs and loss of Mutator activity.
As shown in Figure 1A, MuDR consists of two convergently oriented
genes, mudrA and mudrB, flanked by promoter-containing TIRs. By homology
and analysis, the function of mudrB remains unknown. In contrast, mudrA is the
candidate transposase gene, because it is related to bacterial transposons.
Furthermore, analysis of lines carrying deletions in MuDR demonstrated that
mudrA, but not mudrB, is required to catalyze somatic excisions.

Figure 1. Structure of MuDR, Endogenous mudrA and mudrB
Transcripts, the CaMV 35S–mudrA Construct in cA+ Transgenic Maize Lines,
and the Probes Used for RNA Gel Blots.
(A) Structure of an endogenous MuDR element. The element has two open
reading frames, termed mudrA and mudrB, encoded in antiparallel orientation.
The intergenic region between the two genes is composed of diverse short
repetitive elements. The promoters are located within the ~215-bp TIRs. The
mudrA region with high similarity to bacterial transposases is shown in white.
The DNA probes for RNA analysis in this study are located above the element.
Numbering is according to Hershberger et al. 1991.
(B) The diversity of endogenous mudrA and mudrB transcripts in active
Mutator seedlings. Intron sequences shown in solid black are in-frame with
exons. Alternative mudrA transcription initiation sites (+169 and +252) produce
transcripts with a short or long 5' leader sequence. aa, amino acids.

(C) The structure of the CaMV 35S–mudrA cDNA in M13 transformed into
maize to make cA lines. In construct phMR53, the native promoter, alternative
start sites, 5' UTR, and introns were removed. The CaMV 35S promoter and
130-bp leader sequences were substituted. The mudrA 3' UTR (polymorphic
region) was truncated and fused to the nopaline synthase (nos) terminator.
mudrA encodes diverse transcripts resulting from alternative
transcription initiation, intron splice failure, and alternative polyadenylation
sites (Figure 1B; Hershberger et al. 1995). Thus, mudrA produces transcripts
with polymorphic 5' and 3' untranslated regions (UTRs) and a coding region
predicted to produce at least two large polypeptides of 736 and 823 amino
acids. Although MuDR was identified in 1991 and fully sequenced, there has
been no progress in using a transgenic approach to determine which transcripts
are sufficient to catalyze or regulate specific Mu activities. The major limitation
is that all mudrA plasmids grown in Escherichia coli develop frameshift or
deletion mutations. For this reason, it has also not been possible to transfer
Mutator activity to heterologous hosts for transposon-tagging experiments.
In this article, we demonstrate that bacteriophage M13 is a suitable
vector to manipulate mudrA and to make transgenic plants. We then use
transgenic maize to test the function of the fully spliced transcript, capable of
encoding the 823– but not the 736–amino acid protein. When expressed in
yeast, this cDNA encodes a 120-kD polypeptide that has been shown to
specifically bind a Mu TIR sequence in vitro. To determine whether the
polymorphic noncoding sequences of mudrA are required for developmental
regulation, we excluded the alternative 5' and 3' UTRs from the transgene and
replaced the native promoter with a heterologous cauliflower mosaic virus
(CaMV) 35S promoter (Figure 1C). In this report, we analyze transgenic plants
expressing full-length and truncated versions of this cDNA to determine
whether these transgenes are sufficient to program the four molecular and
developmental activities catalyzed by MuDR: demethylation, somatic excision,
germinal insertion, and epigenetic reprogramming.
TRANSPOSABLE ELEMENTS IN DROSOPHILA
One family of transposons in the fruit fly Drosophila melanogaster is
called P elements. They seem to have first appeared in the species only in the
middle of the twentieth century. Within 50 years, they spread through
every population of the species. Gerald Rubin and Allan Spradling pioneered
technology that uses artificial P elements to insert genes into Drosophila by
injecting the embryo.
P element
A P element is a transposon that is present in the fruit fly Drosophila
melanogaster and is used widely for mutagenesis and the creation of
genetically modified flies used for genetic research. The P element gives rise to
a phenotype known as hybrid dysgenesis.
P elements seem to have first appeared in the species only in the middle
of the twentieth century. Over the last fifty years, they have spread through
every wild population, so that only older laboratory stocks lack them.
Characteristics
The P element is a class II transposon, which means that its movement
within the genome is made possible by a transposase. The complete element is
2907 bp and is autonomous because it encodes a functional transposase; non-
autonomous P elements which lack a functional transposase gene due to
mutation also exist. Non-autonomous P elements can still move within the
genome if there are autonomous elements to produce transposase. The P
element can be identified by its terminal 31-bp inverted repeats and by the 8-bp
direct repeats that its insertion generates in the target DNA sequence.
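The two sequence signatures described above can be checked with a few lines of code. The following is a minimal sketch in Python, assuming a toy element with 5-bp inverted repeats and a 4-bp direct repeat instead of the real 31-bp and 8-bp sequences. The function names and the example sequences are hypothetical and are not taken from any real P element.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq))


def has_terminal_inverted_repeats(element, repeat_len):
    """True if the last repeat_len bases are the reverse complement of the first."""
    return element[-repeat_len:] == reverse_complement(element[:repeat_len])


def has_target_site_duplication(left_flank, right_flank, dup_len):
    """True if the dup_len bases on either side of the insertion are identical."""
    return left_flank[-dup_len:] == right_flank[:dup_len]


# Hypothetical toy data: 5-bp inverted repeats around an arbitrary core,
# inserted between flanks that share the 4-bp direct repeat CTAC.
toy_element = "CATGA" + "TTTTGGGG" + "TCATG"
toy_left_flank = "AAGGCTAC"
toy_right_flank = "CTACTTGA"

print(has_terminal_inverted_repeats(toy_element, 5))
print(has_target_site_duplication(toy_left_flank, toy_right_flank, 4))
```

For a real element the same checks would be run with repeat_len=31 and dup_len=8, and some tolerance for mismatches would probably be needed.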
Hybrid dysgenesis
Hybrid dysgenesis refers to the high rate of mutation in germ line cells
of Drosophila strains resulting from a cross of males with autonomous P
elements (P Strain/P cytotype) and females that lack P elements (M Strain/M
cytotype). The hybrid dysgenesis syndrome is marked by temperature-
dependent sterility, elevated mutation rates, and increased chromosome
rearrangement and recombination.
The hybrid dysgenesis phenotype is caused by the transposition of
P elements within the germ-line cells of offspring of P strain males with
M strain females. Transposition only occurs in germ-line cells, because a
splicing event needed to make transposase mRNA does not occur in somatic
cells.
Hybrid dysgenesis manifests when crossing P strain males with M
strain females and not when crossing P strain females (females with
autonomous P elements) with M strain males. The eggs of P strain females
contain high amounts of a repressor protein that prevents transcription of the
transposase gene. The eggs of M strain mothers, which do not contain the
repressor protein, allow for transposition of P elements from the sperm of
fathers.
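The direction of the cross is the whole rule, so it can be written as a one-line test. This is only an illustrative sketch in Python: the function name is hypothetical, and strains are reduced to the labels "P" and "M" used above.

```python
def is_dysgenic_cross(father_strain, mother_strain):
    """Return True when the cross is expected to show hybrid dysgenesis.

    Strains are "P" (carries autonomous P elements) or "M" (lacks them).
    Only P fathers crossed with M mothers are dysgenic, because the eggs
    of M mothers lack the repressor that the eggs of P mothers supply.
    """
    return father_strain == "P" and mother_strain == "M"


for father in ("P", "M"):
    for mother in ("P", "M"):
        print(father, "male x", mother, "female:", is_dysgenic_cross(father, mother))
```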

P element in molecular biology
The P element has found wide use in Drosophila research as a mutagen.
The mutagenesis system typically uses an autonomous but immobile element,
and a mobile nonautonomous element. Flies from subsequent generations can
then be screened by phenotype or PCR.
Naturally-occurring P elements contain:
• Coding sequence for the enzyme transposase
• Recognition sequences for transposase action
Transposase is an enzyme that regulates and catalyzes the excision of a
P element from the host DNA by cutting at the two recognition sites, and then
reinserts the element at a random location. This random insertion, which may
disrupt an existing gene or carry in an additional gene, is what makes the
element useful for genetic research.
To make this a useful and controllable genetic tool, the two parts of the
P element must be separated to prevent uncontrolled transposition. The normal
genetic tools are therefore:
• DNA coding for transposase with no transposase recognition sequences
so it cannot insert.
• A "P Plasmid"
P Plasmids always contain:
• A Drosophila reporter gene, often a red-eye marker (the product of the
white gene).
• Transposase recognition sequences.
And may contain:
• A gene of interest
• An E. coli selectable marker gene, often some kind of antibiotic
resistance.
• Origin of replication and other associated plasmid 'housekeeping'
sequences.
Methods of usage
There are two main ways to utilise these tools:
Fly transformation
1. Microinject the posterior end of an early-stage (pre-cellularization)
embryo with DNA coding for transposase and a plasmid with the
reporter gene, gene of interest and transposase recognition sequences.
2. Random transposition occurs, inserting the gene of interest and reporter
gene.
3. Grow flies and cross to remove genetic variation between the cells of
the organism. (Only some of the cells of the organism will have been
transformed. By breeding only the genotype of the gametes is passed
on, removing this variation).
4. Look for flies expressing the reporter gene. These carry the inserted
gene of interest, so can be investigated to determine the phenotype due
to the gene of interest.
It is important to note that the inserted gene may have damaged the
function of one of the host's genes. Several lines of flies are required so
comparison can take place and ensure that no additional genes have been
knocked out.
Insertional mutagenesis
1. Microinject the embryo with DNA coding for transposase and a plasmid
with the reporter gene and transposase recognition sequences (and often
the E. coli reporter gene and origin of replication, etc.).
2. Random transposition occurs, inserting the reporter gene randomly. The
insertion tends to occur near actively transcribed genes, as this is where
the chromatin structure is loosest, so the DNA is most accessible.
3. Grow flies and cross to remove genetic variation between the cells of
the organism (see above).
4. Look for flies expressing the reporter gene. These have experienced a
successful transposition, so can be investigated to determine the
phenotype due to mutation of existing genes.
Possible mutations:
1. Insertion in a translated region => hybrid protein/truncated protein.
Usually causes loss of protein function, although more complex effects
are seen.
2. Insertion in an intron => altered splicing pattern/splicing failure.
Usually results in protein truncation or the production of inactive mis-
spliced products, although more complex effects are common.
3. Insertion in the 5' untranslated region (the sequence that will become
the mRNA 5' UTR) => truncation of transcript. Usually results in failure
of the mRNA to contain a 5' cap, leading to less efficient translation.
4. Insertion in promoter => reduction/complete loss of expression. Always
results in greatly reduced protein production levels. The most useful
type of insertion for analysis due to the simplicity of the situation.
5. Insertion between promoter and upstream enhancers => loss of
enhancer function/hijack of enhancer function for reporter gene.
Generally reduces the level of protein specificity to cell type, although
complex effects are often seen.
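The list above is in effect a lookup from the position of an insertion to the region of the gene it hits. A minimal sketch of that lookup in Python follows. The gene model, its coordinates and the function name are all hypothetical, chosen only to illustrate the idea; real annotation would come from a genome database.

```python
def classify_insertion(position, gene):
    """Map an insertion coordinate to the gene feature it falls in.

    `gene` maps feature name -> (start, end), half-open coordinates.
    Returns "intergenic" if the position lies outside every feature.
    """
    for feature, (start, end) in gene.items():
        if start <= position < end:
            return feature
    return "intergenic"


# Hypothetical toy gene model (coordinates in bp, not a real gene).
toy_gene = {
    "enhancer-promoter gap": (0, 500),
    "promoter": (500, 700),
    "5' UTR": (700, 800),
    "exon 1": (800, 1000),
    "intron 1": (1000, 1400),
    "exon 2": (1400, 1700),
}

for site in (650, 1200, 1500, 5000):
    print(site, "->", classify_insertion(site, toy_gene))
```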
Enhancer trapping
The hijack of an enhancer from another gene allows the analysis of the
function of that enhancer. This, especially if the reporter gene is for a
fluorescent protein, can be used to help map expression of the mutated gene
through the organism, and is a very powerful tool.
Other usage of P elements
These methods are referred to as reverse genetics.
Secondary mobilisation
If there is an old P element near the gene of interest (with a broken
transposase), you can remobilise it by microinjecting the embryo with DNA
coding for transposase or with transposase itself. The P element will often
transpose within a few kilobases of the original location, hopefully affecting
your gene of interest as for 'Insertional mutagenesis'.
Analysis of mutagenesis products
Once the function of the mutated protein has been determined it is
possible to sequence/purify/clone the regions flanking the insertion by the
following methods:
Inverse PCR
1. Isolate the fly genome.
2. Undergo a light digest (using an enzyme [enzyme 1] known NOT to cut
in the reporter gene), giving fragments of a few kilobases, a few with
the insertion and its flanking DNA.
3. Self ligate the digest (low DNA concentration to ensure self ligation)
giving a selection of circular DNA fragments, a few with the insertion
and its flanking DNA.
4. Cut the plasmids at some point in the reporter gene (with an enzyme
[enzyme 2] known to cut very rarely in genomic DNA, but known to cut in
the reporter gene).
5. Using primers for the reporter gene sections, the DNA can be amplified
for sequencing.
The process of cutting, self-ligation and re-cutting allows the
amplification of the flanking regions of DNA without knowing their sequence.
The point at which the ligation occurred can be seen by identifying the cut site
of [enzyme 1].

Process of analysis of DNA flanking a known insert by PCR.
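The cut, self-ligate and re-cut logic of inverse PCR can be followed on a toy sequence. The sketch below is written in Python under simplifying assumptions: the fragment is treated as a plain string, the known insert is written in upper case and the unknown flanks in lower case, and the function name is hypothetical. It only shows how the unknown flanks end up between two stretches of known sequence.

```python
def inverse_pcr_template(left_flank, insert, right_flank, cut_in_insert):
    """Simulate self-ligation of a fragment and re-cutting inside the insert.

    The enzyme 1 fragment left_flank + insert + right_flank is circularized
    (its two ends are joined), then linearized by enzyme 2 at index
    cut_in_insert within the insert.
    """
    circle = left_flank + insert + right_flank  # the ends are joined implicitly
    cut = len(left_flank) + cut_in_insert
    # Reading the circle from the cut point gives the new linear molecule.
    return circle[cut:] + circle[:cut]


# Hypothetical toy sequences: lower case = unknown flanking DNA.
template = inverse_pcr_template("ggcc", "ATGCATGC", "ttaa", 4)
print(template)
```

In the printed molecule the known insert halves sit at both ends and the two flanks lie in the middle, joined at the ligation point (the enzyme 1 site), so primers designed from the reporter sequence can amplify across DNA whose sequence was not known in advance.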

Plasmid rescue

Process of analysis of DNA flanking a known insert by plasmid rescue.


1. Isolate the fly genome.
2. Undergo a light digest (using an enzyme [enzyme 1] known to cut in the
boundary between the reporter gene and the E. coli reporter gene and
plasmid sequences), giving fragments of a few kilobases, a few with the
E. coli reporter, the plasmid sequences and its flanking DNA.
3. Self ligate the digest (low DNA concentration to ensure self ligation)
giving a selection of circular DNA fragments, a few with the E. coli
reporter, the plasmid sequences and its flanking DNA.
4. Insert the plasmids into E. coli cells (e.g. by electroporation).
5. Select plasmids for the E. coli selectable marker gene. Only successful
inserts of plasmids with the plasmid 'housekeeping' sequences will
express this gene.
6. The gene can be cloned for further analysis.
SUGGESTED READINGS:
1. Genes VI, Lewin, B., New York, Oxford University Press.
2. Genetic Engineering, Rigby, P.W.J. (1987) Academic Press Inc. Florida,
USA.
3. Genetics – T.A. Brown
4. Molecular biology – David Freifelder.
5. Cell and Molecular Biology – S.C. Rastogi (2003), New Age International
Publishers, New Delhi.

A Molecular Biology Glossary
This document is intended to provide a quick reference for molecular
biology terms. It does not go into depth on the terms, but can be useful if you
are trying to understand a typical seminar or paper.
Figure 1: Nucleotide structure.

Figure 2: DNA strands are antiparallel (the 5' end of one strand pairs with
the 3' end of the opposite strand).

Figure 3: RNA transcription.


RNA is synthesized in the 5' to 3' direction from a DNA strand which
runs in the antiparallel direction (3' to 5'). In this diagram, the top DNA strand
is the sense strand, and in sequence would read the same as the RNA (except
with T's instead of U's). The bottom strand is the anti-sense strand, and acts as
the template for transcription.
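The relationship between the two DNA strands and the RNA can be checked on a short example. This is a minimal sketch in Python. The function names and the nine-base example sequence are hypothetical. The template is supplied as it is read by the polymerase (3' to 5'), so the RNA comes out 5' to 3'.

```python
def transcribe_from_template(template_3_to_5):
    """Build the RNA (5'->3') by pairing each template base, read 3'->5'."""
    pairing = {"A": "U", "T": "A", "C": "G", "G": "C"}
    return "".join(pairing[base] for base in template_3_to_5)


def rna_from_sense(sense_5_to_3):
    """The RNA reads like the sense strand, with U in place of T."""
    return sense_5_to_3.replace("T", "U")


# Hypothetical toy gene fragment.
sense_strand = "ATGGCCTAA"      # written 5' -> 3'
template_strand = "TACCGGATT"   # written 3' -> 5' (the anti-sense strand)

print(transcribe_from_template(template_strand))
print(rna_from_sense(sense_strand))
```

Both calls print the same RNA, which is the point of the figure: the anti-sense strand is the template, and the sense strand shows the sequence of the product.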

Figure 4: A typical gene.
3' end/5' end: A nucleic acid strand is inherently directional, and the "5
prime end" has a free hydroxyl (or phosphate) on a 5' carbon and the "3 prime
end" has a free hydroxyl (or phosphate) on a 3' carbon (carbon atoms in the
sugar ring are numbered from 1' to 5'; see Figure 1). That's simple enough for
an RNA strand or for single-stranded (ss) DNA. However, for double-stranded
(ds) DNA it's not so obvious - each strand has a 5' end and a 3' end, and the 5'
end of one strand is paired with the 3' end of the other strand (it is
"antiparallel"; Figure 2). One would talk about the 5' end of ds DNA only if
there was some reason to emphasize one strand over the other - for example if
one strand is the sense strand of a gene. In that case, the orientation of the sense
strand establishes the direction (see Figures 3 and 4).
3' flanking region: A region of DNA which is NOT copied into the
mature mRNA, but which is present adjacent to 3' end of the gene (see Figure
4). It was originally thought that the 3' flanking DNA was not transcribed at all,
but it was later found to be transcribed into RNA and then quickly removed during
processing of the primary transcript to form the mature mRNA. The 3' flanking
region often contains sequences which affect the formation of the 3' end of the
message. It may also contain enhancers or other sites to which proteins may
bind.

3' untranslated region: A region of the DNA which IS transcribed into
mRNA and becomes the 3' end of the message, but which does not contain
protein coding sequence. Everything between the stop codon and the polyA tail
is considered to be 3' untranslated (see Figure 4). The 3' untranslated region
may affect the translation efficiency of the mRNA or the stability of the
mRNA. It also has sequences which are required for the addition of the poly(A)
tail to the message (including one known as the "hexanucleotide", AAUAAA).
5' flanking region: A region of DNA which is NOT transcribed into
RNA, but rather is adjacent to 5' end of the gene (see Figure 4). The 5'-flanking
region contains the promoter, and may also contain enhancers or other protein
binding sites.
5' untranslated region: A region of a gene which IS transcribed into
mRNA, becoming the 5' end of the message, but which does not contain
protein coding sequence. The 5'-untranslated region is the portion of the DNA
starting from the cap site and extending to the base just before the ATG
translation initiation codon (see Figure 4). While not itself translated, this
region may have sequences which alter the translation efficiency of the mRNA,
or which affect the stability of the mRNA.

Ablation experiment: An experiment designed to produce an animal
deficient in one or a few cell types, in order to study cell lineage or cell
function. The idea is to make a transgenic mouse with a toxin gene (often
diphtheria toxin) under control of a specialized promoter which activates only
in the target cell type. When embryo development progresses to the point
where it starts to form the target tissue, the toxin gene is activated, and that
specific tissue dies. Other tissues are unaffected.
Acrylamide gels: A polymer gel used for electrophoresis of DNA or
protein to measure their sizes (in daltons for proteins, or in base pairs for
DNA). See "Gel Electrophoresis". Acrylamide gels are especially useful for
high resolution separations of DNA in the range of tens to hundreds of
nucleotides in length.
Agarose gels: A polysaccharide gel used to measure the size of nucleic
acids (in bases or base pairs). See "Gel Electrophoresis". This is the gel of
choice for DNA or RNA in the range of thousands of bases in length, or even
up to 1 megabase if you are using pulsed field gel electrophoresis.
Amp resistance: See "Antibiotic resistance".
Anneal: Generally synonymous with "hybridize".
Antibiotic resistance: Plasmids generally contain genes which confer on
the host bacterium the ability to survive a given antibiotic. If the plasmid
pBR322 is present in a host, that host will not be killed by (moderate levels of)
ampicillin or tetracycline. By using plasmids containing antibiotic resistance
genes, the researcher can kill off all the bacteria which have not taken up his
plasmid, thus ensuring that the plasmid will be propagated as the surviving
cells divide.
Anti-sense strand: See discussion under "Sense strand".
AP-1 site: The binding site on DNA at which the transcription "factor"
AP-1 binds, thereby altering the rate of transcription for the adjacent gene.
AP-1 is actually a complex between c-fos protein and c-jun protein, or
sometimes is just c-jun dimers. The AP-1 site consensus sequence is
(C/G)TGACT(C/A)A. Also known as the TPA-response element (TRE).
[TPA is a phorbol ester, tetradecanoyl phorbol acetate, which is a chemical
tumor promoter]
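A consensus written with alternatives, such as the one given above, translates directly into a search pattern. The sketch below uses Python's standard re module and assumes that the consensus is read as a single 8-base motif, (C/G)TGACT(C/A)A. The function name and the test sequence are hypothetical, and only the strand supplied is searched.

```python
import re

# Assumed reading of the consensus (C/G)TGACT(C/A)A as one 8-base motif.
# The lookahead lets overlapping matches be reported as well.
AP1_PATTERN = re.compile(r"(?=([CG]TGACT[CA]A))")


def find_ap1_sites(dna):
    """Return (position, matched sequence) for each hit on the given strand."""
    return [(m.start(), m.group(1)) for m in AP1_PATTERN.finditer(dna.upper())]


# Hypothetical toy sequence containing one matching site.
print(find_ap1_sites("GGCTGACTCAGC"))
```

A fuller search would also scan the reverse complement, since a binding site can lie on either strand.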
ATG or AUG: The codon for methionine; the translation initiation
codon. Usually, protein translation can only start at a methionine codon
(although this codon may be found elsewhere within the protein sequence as
well). In eukaryotic DNA, the sequence is ATG; in RNA it is AUG. Usually,
the first AUG in the mRNA is the point at which translation starts, and an open
reading frame follows - i.e. the nucleotides taken three at a time will code for
the amino acids of the protein, and a stop codon will be found only when the
protein coding region is complete.
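The rule described here (start at the first AUG, read three nucleotides at a time, stop at the first stop codon) is easy to express in code. The following is a minimal sketch in Python. The function name and the example mRNA are hypothetical, and the "first AUG" rule is applied literally, which, as noted above, is only usually true.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_open_reading_frame(mrna):
    """Return the codons from the first AUG through the first in-frame stop.

    Returns an empty list if there is no AUG or no in-frame stop codon.
    """
    start = mrna.find("AUG")
    if start == -1:
        return []
    codons = []
    # Step through the message three nucleotides at a time.
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        codons.append(codon)
        if codon in STOP_CODONS:
            return codons
    return []


# Hypothetical toy message: 5' leader, AUG, two codons, stop, 3' trailer.
print(first_open_reading_frame("GGCAUGGCUUGGUAACC"))
```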
BAC: Bacterial Artificial Chromosome — a cloning vector capable of
carrying between 100 and 300 kilobases of target sequence. They are
propagated as a mini-chromosome in a bacterial host. The size of the typical
BAC is ideal for use as an intermediate in large-scale genome sequencing
projects. Entire genomes can be cloned into BAC libraries, and entire BAC
clones can be shotgun-sequenced fairly rapidly.
Band shift assay: see Gel shift assay.
Bacteriophage lambda: A virus which infects E. coli, and which is
often used in molecular genetics experiments as a vector, or cloning vehicle.
Recombinant phages can be made in which certain non-essential λ DNA is
removed and replaced with the DNA of interest. The phage can accommodate a
DNA "insert" of about 15-20 kb. Replication of that virus will thus replicate the
investigator's DNA. One would use phage λ rather than a plasmid if the desired
piece of DNA is rather large.
Binding site: A place on cellular DNA to which a protein (such as a
transcription factor) can bind. Typically, binding sites might be found in the
vicinity of genes, and would be involved in activating transcription of that gene
(promoter elements), in enhancing the transcription of that gene (enhancer
elements), or in reducing the transcription of that gene (silencers). NOTE that
whether the protein in fact performs these functions may depend on some
condition, such as the presence of a hormone, or the tissue in which the gene is
being examined. Binding sites could also be involved in the regulation of
chromosome structure or of DNA replication.
Blotting: A technique for detecting one RNA within a mixture of RNAs
(a Northern blot) or one type of DNA within a mixture of DNAs (a Southern
blot). A blot can prove whether that one species of RNA or DNA is present,
how much is there, and its approximate size. Basically, blotting involves gel
electrophoresis, transfer to a blotting membrane (typically nitrocellulose or
activated nylon), and incubating with a radioactive probe. Exposing the
membrane to X-ray film produces darkening at a spot correlating with the
position of the DNA or RNA of interest. The darker the spot, the more nucleic
acid was present there. (see figure, below)

The DNA is first transferred from the gel to a membrane by capillary
action. Fluid wicks from the gel through the blotting membrane to several
layers of absorbent paper, but the nucleic acids stick to the membrane. Baking
the filter fixes the DNA or RNA to the filter.
Specific bands are detected by hybridization. The filter membrane is
incubated with radioactive probe, which hybridizes to some bands. After the
filter is washed (to remove unused probe), an X-ray film exposed to the filter
will show which bands have hybridized.
BP : Abbreviation for base pair(s). Double stranded DNA is usually measured
in bp rather than nucleotides (nt).
Cap: All eukaryotes have at the 5' end of their messages a structure called a
"cap", consisting of a 7-methylguanosine in 5'-5' triphosphate linkage with the
first nucleotide of the mRNA. It is added post-transcriptionally, and is not
encoded in the DNA.
Cap site: Two usages: In eukaryotes, the cap site is the position in the gene at
which transcription starts, and really should be called the "transcription
initiation site". The first nucleotide is transcribed from this site to start the
nascent RNA chain. That nucleotide becomes the 5' end of the chain, and thus
the nucleotide to which the cap structure is attached (see "Cap"). In bacteria,
the CAP site (note the capital letters) is a site on the DNA to which a protein
factor (the Catabolite Activator Protein) binds.
CAT assay: An enzyme assay. CAT stands for chloramphenicol acetyl
transferase, a bacterial enzyme which inactivates chloramphenicol by
acetylating it. CAT assays are often performed to test the function of a
promoter. The gene coding for CAT is linked onto a promoter (transcription
control region) from another gene, and the construct is "transfected" into
cultured cells. The amount of CAT enzyme produced is taken to indicate the
transcriptional activity of the promoter (relative to other promoters which must
be tested in parallel). It is easier to perform a CAT assay than it is to do a
Northern blot, so CAT assays were a common method for testing the effects of
sequence changes on promoter function. They have since been largely supplanted
by assays using the reporter gene luciferase.
CCAAT box: (CAT box, CAAT box, other variants) A sequence found in the
5' flanking region of certain genes which is necessary for efficient expression.
A transcription factor (CCAAT-binding protein, CBP) binds to this site.
cDNA clone: "complementary DNA"; a piece of DNA copied from an mRNA.
The term "clone" indicates that this cDNA has been spliced into a plasmid or
other vector in order to propagate it. A cDNA clone may contain DNA copies
of such typical mRNA regions as coding sequence, 5'-untranslated region, 3'
untranslated region or poly(A) tail. No introns will be present, nor any
promoter sequences (or other 5' or 3' flanking regions). A "full-length" cDNA
clone is one which contains all of the mRNA sequence from nucleotide #1
through to the poly (A) tail.
ChIP: See Chromatin Immunoprecipitation (below).
Chromatin Immunoprecipitation: This is a method for isolating and
characterizing the specific pieces of DNA out of an entire genome, to which is
bound a protein of interest. The protein of interest could for example be a
transcription factor, or a specific modified histone, or any other DNA binding
protein. This procedure requires an antibody to that protein of interest.
One isolates chromosomal material with all the proteins still bound to
the genomic DNA. After fragmenting the DNA, you use the antibody to
immunoprecipitate all chunks that contain your protein of interest. Isolate the
DNA from those chunks, and you can characterize the specific DNA sites to
which your protein was bound.

There are two common ways to characterize the DNA so isolated: ChIP-chip
(or "ChIP-on-chip") or ChIP-seq.
• ChIP-chip: In this variant, the DNA isolated from a ChIP experiment is
characterized by labeling it with a fluorescent dye, then hybridizing it to
a DNA array (an oligonucleotide array, for example). Array spots that
"light up" are taken as evidence that their specific sequence is present in
your ChIP product. Unfortunately, designing these arrays requires that you
have some idea what to expect in your ChIP isolates.
• ChIP-seq: In this newer variant for characterizing ChIP results, one simply
sequences everything that immunoprecipitated with the antibody. It requires
no foreknowledge of the expected products, as ChIP-chip would.
Chromosome walking: A technique for cloning everything in the
genome around a known piece of DNA (the starting probe). You screen a
genomic library for all clones hybridizing with the probe, and then figure out
which one extends furthest into the surrounding DNA. The most distal piece of
this most distal clone is then used as a probe, so that ever more distal regions
can be cloned. This has been used to move as much as 200 kb away from a
given starting point (an immense undertaking). Typically used to "walk" from a
starting point towards some nearby gene in order to clone that gene. Also used
to obtain the remainder of a gene when you have isolated a part of it.
Clone (verb): To "clone" something is to produce copies of it. To clone
a piece of DNA, one would insert it into some type of vector (say, a plasmid)
and put the resultant construct into a host (usually a bacterium) so that the
plasmid and insert replicate with the host. An individual bacterium is isolated
and grown and the plasmid containing the "cloned" DNA is re-isolated from
the bacteria, at which point there will be many millions of copies of the DNA -
essentially an unlimited supply. Actually, an investigator wishing to clone
some gene or cDNA rarely has that DNA in a purified form, so practically
speaking, to "clone" something involves screening a cDNA or genomic library
for the desired clone. See also "Probe" for a description of how one might start
a cloning project, and "Screening" for how the probe is used.
One can also clone more complex organisms, with considerable
difficulty. The much-publicized Scottish research that resulted in the sheep
‘Dolly’ exemplifies this approach.
Clone (noun): The term "clone" can refer either to a bacterium carrying
a cloned DNA, or to the cloned DNA itself. If you receive a clone from a
collaborator, you should first figure out if they sent you DNA or bacteria. If it
is DNA, your first job is to introduce it ("transform" it) into bacteria [see
"Transformation (with respect to bacteria)"]. Occasionally, someone might
send just the "insert", rather than the whole plasmid. "Your assignment, Jim, if
you decide to accept it", is to splice that DNA into a convenient vector, and
only then can you transform it into bacteria.
Coding sequence: The portion of a gene or an mRNA which actually
codes for a protein. Introns are not coding sequences; nor are the 5' or 3'
untranslated regions (or the flanking regions, for that matter - they are not even
transcribed into mRNA). The coding sequence in a cDNA or mature mRNA
includes everything from the AUG (or ATG) initiation codon through to the
stop codon, inclusive.
Coding strand: an ambiguous term intended to refer to one specific
strand in a double-stranded gene. See "Sense strand".
Codon : In an mRNA, a codon is a sequence of three nucleotides which codes
for the incorporation of a specific amino acid into the growing protein. The
sequence of codons in the mRNA unambiguously defines the primary structure
of the final protein. Of course, the codons in the mRNA were also present in
the genomic DNA, but the sequence may be interrupted by introns.
Consensus sequence: A ‘nominal’ sequence inferred from multiple,
imperfect examples. Multiple lanes of shotgun sequence can be merged to
show a consensus sequence. The term also refers to the optimal sequence of
nucleotides recognized by some factor. A DNA binding site for a protein may
vary substantially, but
one can infer the consensus sequence for the binding site by comparing
numerous examples. For example, the (fictitious) transcription factor ZQ1
usually binds to the sequences AAAGTT, AAGGTT or AAGATT. The
consensus sequence for that factor is said to be AARRTT (where R is any
purine, i.e. A or G). ZQ1 may also be able to weakly bind to ACAGTT (which
differs by one base from the consensus).
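The ZQ1 example can be reproduced by comparing the sites column by column and writing each column with the standard IUPAC ambiguity code for the bases seen there. The sketch below is in Python. The function names are hypothetical, and the sites are the fictitious ZQ1 sites from the text.

```python
# Standard IUPAC nucleotide codes, keyed by the set of bases each one stands for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def consensus(sites):
    """Write each aligned column as the IUPAC code for the bases seen in it."""
    return "".join(IUPAC[frozenset(column)] for column in zip(*sites))


def mismatches(site, consensus_seq):
    """Count positions where the site's base is not allowed by the consensus."""
    allowed = {code: bases for bases, code in IUPAC.items()}
    return sum(base not in allowed[code] for base, code in zip(site, consensus_seq))


zq1_sites = ["AAAGTT", "AAGGTT", "AAGATT"]
zq1_consensus = consensus(zq1_sites)
print(zq1_consensus)
print(mismatches("ACAGTT", zq1_consensus))
```

This simple version assumes the sites are already aligned and of equal length, and it treats a base seen once the same as a base seen many times. Published consensus sequences are often derived from base frequencies instead.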
Contig : Several uses, all nouns. The term comes from a shortening of the
word ‘contiguous’. A ‘contig’ may refer to a map showing placement of a set
of clones that completely, contiguously cover some segment of DNA in which
you are interested. Also called the ‘minimal tiling path’. More often, the term
‘contig’ is used to refer to the final product of a shotgun sequencing project.
When individual lanes of sequence information are merged to infer the
sequence of the larger DNA piece, the product consensus sequence is called a
‘contig’.
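The merging of overlapping sequence reads into a contig can be illustrated with exact string overlaps. This is a deliberately simplified sketch in Python. The function name, the minimum overlap of three bases and the reads themselves are hypothetical, and real assembly programs must also cope with sequencing errors and with reads from the opposite strand.

```python
def merge_reads(left, right, min_overlap=3):
    """Merge two reads if a suffix of `left` matches a prefix of `right`.

    The longest exact overlap of at least min_overlap bases is used.
    Returns None if no such overlap exists.
    """
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    return None


# Hypothetical toy reads that tile a 16-base region.
contig = merge_reads("ATGCGTAC", "GTACCTGA")
contig = merge_reads(contig, "CTGATTGG")
print(contig)
```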
Cosmid : A type of vector used for cloning 35-45 kb of DNA. These are
plasmids carrying a phage λ cos site (which allows packaging into λ capsids), an
origin of replication and an antibiotic resistance gene. A plasmid of 40 kb is
very difficult to put into bacteria, but can replicate once there. Cosmids,
however, have a cos site, and thus can be packaged into λ phage heads (a
reaction which can be performed in vitro) to allow efficient introduction into
bacteria (you'll have to look up the cos site elsewhere).
DNase : Deoxyribonuclease, a class of enzymes which digest DNA. The most
common is DNase I, an endonuclease which digests both single and double-
stranded DNA.
Dot blot : A technique for measuring the amount of one specific DNA or RNA
in a complex mixture. The samples are spotted onto a hybridization membrane
(such as nitrocellulose or activated nylon, etc.), fixed and hybridized with a
radioactive probe. The extent of labeling (as determined by autoradiography
and densitometry) is proportional to the concentration of the target molecule in
the sample. Standards provide a means of calibrating the results.
Downstream : See "Upstream/Downstream".
E. coli : A common Gram-negative bacterium useful for cloning experiments.
Present in human intestinal tract. Hundreds of strains of E. coli exist. One
strain, K-12, has been completely sequenced.
Electrophoresis: See "Gel electrophoresis".
Endonuclease: An enzyme which digests nucleic acids starting in the middle
of the strand (as opposed to an exonuclease, which must start at an end).
Examples include the restriction enzymes, DNase I and RNase A.
Enhancer: An enhancer is a nucleotide sequence to which transcription
factor(s) bind, and which increases the transcription of a gene. It is NOT part of
a promoter; the basic difference being that an enhancer can be moved around
anywhere in the general vicinity of the gene (within several thousand
nucleotides on either side or even within an intron), and it will still function. It
can even be clipped out and spliced back in backwards, and will still operate. A
promoter, on the other hand, is position- and orientation-dependent. Some
enhancers are "conditional" - in other words, they enhance transcription only
under certain conditions, for example in the presence of a hormone.
ERE: Estrogen Response Element. A binding site in a promoter to which the
activated estrogen receptor can bind. The estrogen receptor is essentially a
transcription factor which is activated only in the presence of estrogens. The
activated receptor will bind to an ERE, and transcription of the adjacent gene
will be altered. See also "Response element".
Evolutionary Footprinting: One can infer which portions of a gene are
important by comparing the sequence of that gene with its cognates from other
species. A plot showing the regions of high conservation will presumably
reflect the regions that are functional in all the test species. In theory, the more
species involved in the comparison, the more stringent the result can be (i.e. the
more the conserved regions will reflect truly important sequences). Care must
be taken, however, to use species in which the function of the gene has not
diverged excessively, or the outcome will be uninformative.
Exon: Those portions of a genomic DNA sequence which WILL be
represented in the final, mature mRNA. The term "exon" can also be used for
the equivalent segments in the final RNA. Exons may include coding
sequences, the 5' untranslated region or the 3' untranslated region.
Exonuclease: An enzyme which digests nucleic acids starting at one end. An
example is Exonuclease III, which digests only double-stranded DNA starting
from the 3' end.
Expression: To "express" a gene is to cause it to function. A gene which
encodes a protein will, when expressed, be transcribed and translated to
produce that protein. A gene which encodes an RNA rather than a protein (for
example, a rRNA gene) will produce that RNA when expressed.
Expression clone: This is a clone (a plasmid in a bacterium, or perhaps a λ phage
in bacteria) which is designed to produce a protein from the DNA insert.
Mammalian genes do not function in bacteria, so to get bacterial expression
from your mammalian cDNA, you would place its coding region (i.e. no
introns) immediately adjacent to bacterial transcription/translation control
sequences. That artificial construct (the "expression clone") will produce a
pseudo-mammalian protein if put back into bacteria. Often, that protein can be
recognized by antibodies raised against the authentic mammalian protein, and
vice versa.
Footprinting: A technique by which one identifies a protein binding site on
cellular DNA. The presence of a bound protein prevents DNase from "nicking"
that region, which can be detected by an appropriately designed gel.
Gel electrophoresis: A method to analyze the size of DNA (or RNA)
fragments. In the presence of an electric field, larger fragments of DNA move
through a gel slower than smaller ones. If a sample contains fragments at four
different discrete sizes, those four size classes will, when subjected to
electrophoresis, all migrate in groups, producing four migrating "bands".
Usually, these are visualized by soaking the gel in a dye (ethidium bromide)
which makes the DNA fluoresce under UV light.
Gel shift assay: (aka gel mobility shift assay (GMSA), band shift assay (BSA),
electrophoretic mobility shift assay (EMSA)) A method by which one can
determine whether a particular protein preparation contains factors which bind
to a particular DNA fragment. When a radiolabeled DNA fragment is run on a
gel, it shows a characteristic mobility. If it is first incubated with a cellular
extract of proteins (or with purified protein), any protein-DNA complexes will
migrate slower than the naked DNA - a shifted band.
Gene: A unit of DNA which performs one function. Usually, this is equated
with the production of one RNA or one protein. A gene contains coding
regions, introns, untranslated regions and control regions.
Genome: The total DNA contained in each cell of an organism. Mammalian
genomic DNA (including that of humans) contains about 6 × 10⁹ base pairs of
DNA per diploid cell. Early estimates put the number of genes on the order of a
hundred thousand; current estimates for the human genome are closer to 20,000
protein-coding genes. Each gene comprises coding regions, 5' and 3'
untranslated regions, introns, and 5' and 3' flanking DNA. Also present in the
genome are structural segments such as telomeric and centromeric DNAs and
replication origins, and intergenic DNA.
Genomic blot: A type of Southern blot specifically used to analyze a mixture
of DNA fragments derived from total genomic DNA. Because genomic DNA is
very complicated, when it has been digested with restriction enzymes, it
produces a complex set of fragments ranging from tens of bp to tens of
thousands of bp. However, any specific gene will be reproducibly found on
only one or a few specific fragments. A million identical cells will produce a
million identical restriction fragments for any given gene, so probing a
genomic Southern with a gene-specific probe will produce a pattern of perhaps
one or just a few bands.
Genomic clone: A piece of DNA taken from the genome of a cell or animal,
and spliced into a bacteriophage or other cloning vector. A genomic clone may
contain coding regions, exons, introns, 5' flanking regions, 5' untranslated
regions, 3' flanking regions, 3' untranslated regions, or it may contain none of
these...it may only contain intergenic DNA (usually not a desired outcome of a
cloning experiment!).
Genotype: Two uses: one is a verb, the other a noun. To 'genotype' (verb) is to
examine polymorphisms (e.g. RFLPs, microsatellites, SNPs) present in a
sample of DNA. You might be looking for linkage between a microsatellite
marker and an unknown disease gene. With such information, you can infer the
chromosomal location of the unknown gene, and can sometimes identify the
gene. As a noun, a 'genotype' is the result of a genotyping experiment, be it a
SNP or microsat or whatever.
GRE: Glucocorticoid Response Element: A binding site in a promoter to which
the activated glucocorticoid receptor can bind. The glucocorticoid receptor is
essentially a transcription factor which is activated only in the presence of
glucocorticoids. The activated receptor will bind to a GRE, and transcription of
the adjacent gene will be altered. See also "Response element".
Helix-loop-helix: A protein structural motif characteristic of certain DNA-
binding proteins.
hnRNA: Heterogeneous nuclear RNA; refers collectively to the variety of
RNAs found in the nucleus, including primary transcripts, partially processed
RNAs and snRNA. The term hnRNA is often used just for the unprocessed
primary transcripts, however.
Host strain (bacterial): The bacterium used to harbor a plasmid. Typical host
strains include HB101 (general purpose E. coli strain), DH5α (ditto), JM101
and JM109 (suitable for growing M13 phages), XL1-Blue (general-purpose,
good for blue/white lacZ screening). Note that the host strain is available in a
form with no plasmids (hence you can put one of your own into it), or it may
have plasmids present (especially if you put them there). Hundreds, perhaps
thousands, of host strains are available.
Hybridization: The reaction by which the pairing of complementary strands of
nucleic acid occurs. DNA is usually double-stranded, and when the strands are
separated they will re-hybridize under the appropriate conditions. Hybrids can
form between DNA-DNA, DNA-RNA or RNA-RNA. They can form between
a short strand and a long strand containing a region complementary to the short
one. Imperfect hybrids can also form, but the more imperfect they are, the less
stable they will be (and the less likely to form). To "anneal" two strands is the
same as to "hybridize" them.
Insert: In a complete plasmid clone, there are two types of DNA - the "vector"
sequences and the "insert". The vector sequences are those regions necessary
for propagation, antibiotic resistance, and all those mundane functions
necessary for useful cloning. In contrast, however, the insert is the piece of
DNA in which you are really interested.
Intergenic: Between two genes; e.g. intergenic DNA is the DNA found
between two genes. The term is often used to mean non-functional DNA (or at
least DNA with no known importance to the two genes flanking it).
Alternatively, one might speak of the "intergenic distance" between two genes
as the number of base pairs from the polyA site of the first gene to the cap site
of the second. This usage might therefore include the promoter region of the
second gene.
Intron: Introns are portions of genomic DNA which ARE transcribed (and thus
present in the primary transcript) but which are later spliced out. They thus are
not present in the mature mRNA. Note that although the 3' flanking region is
often transcribed, it is removed by endonucleolytic cleavage and not by
splicing. It is not an intron.
KB: abbreviation for kilobase, one thousand bases.
Kinase: A kinase is in general an enzyme that catalyzes the transfer of a
phosphate group from ATP to something else. In molecular biology, it has
acquired the more specific verbal usage for the transfer onto DNA of a
radiolabeled phosphate group. This would be done in order to use the resultant
"hot" DNA as a probe.
Knock-out experiment: A technique for deleting, mutating or otherwise
inactivating a gene in a mouse. This laborious method involves transfecting a
crippled gene into cultured embryonic stem cells, searching through the
thousands of resulting clones for one in which the crippled gene exactly
replaced the normal one (by homologous recombination), and inserting that cell
back into a mouse blastocyst. The resulting mouse will be chimaeric but, if you
are lucky (and if you've gotten this far, you obviously are), its germ cells will
carry the deleted gene. A few rounds of careful breeding can then produce
progeny in which both copies of the gene are inactivated.
Lambda: see Bacteriophage Lambda.
Leucine zipper: A motif found in certain proteins in which Leu residues are
evenly spaced through an α-helical region, such that they would end up on the
same face of the helix. Dimers can form between two such proteins. The Leu
zipper is important in the function of transcription factors such as Fos and Jun
and related proteins.
Library: A library might be either a genomic library, or a cDNA library. In
either case, the library is just a tube carrying a mixture of thousands of different
clones - bacteria or λ phages. Each clone carries an "insert" - the cloned DNA.
A cDNA library is usually just a mixture of bacteria, where each
bacterium carries a different plasmid. Inserted into the plasmids (one per
plasmid) are thousands of different pieces of cDNA (each typ. 500-5000 bp)
copied from some source of mRNA, for example, total liver mRNA. The basic
idea is that if you have a large enough number of different liver-derived cDNAs
carried in those bacteria, there is a 99% probability that a cDNA copy of any
given liver mRNA exists somewhere in the tube. The real trick is to find the
one you want out of that mess - a process called screening (see "Screening").
A genomic library is similar in concept to a cDNA library, but differs in
three major ways - 1) the library carries pieces of genomic DNA (and so
contains introns and flanking regions, as well as coding and untranslated
regions); 2) you need bacteriophage λ or cosmids, rather than plasmids,
because... 3) the inserts are usually 5-15 kb long (in a λ library) or 20-40 kb
(in a cosmid library). Therefore, a genomic library is most commonly a tube
containing a mixture of λ phages. Enough different phages must be present in
the library so that any given piece of DNA from the source genome has a 99%
probability of being present.
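The 99% figure can be turned into a rough library size. The sketch below uses a commonly quoted estimate, N = ln(1 - P) / ln(1 - f), where f is the fraction of the genome carried by one insert. It is not stated in this glossary and is introduced here as an assumption, as is the idea that clones are independent, random samples of the genome. The haploid genome size is taken as half of the diploid figure given under "Genome"; the function name is illustrative.

```python
import math

def clones_needed(genome_bp, insert_bp, probability=0.99):
    """Estimate how many clones give `probability` of containing a given sequence.

    Uses N = ln(1 - P) / ln(1 - f) with f = insert_bp / genome_bp.
    Assumes every clone is an independent random sample of the genome.
    """
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    if not 0 < insert_bp < genome_bp:
        raise ValueError("insert must be shorter than the genome")
    f = insert_bp / genome_bp
    # log1p(-f) is ln(1 - f), computed accurately for very small f.
    return math.ceil(math.log(1.0 - probability) / math.log1p(-f))

# Half of the 6 x 10^9 bp diploid figure quoted under "Genome".
HAPLOID_GENOME_BP = 3_000_000_000

lambda_clones = clones_needed(HAPLOID_GENOME_BP, 15_000)   # 15 kb inserts
cosmid_clones = clones_needed(HAPLOID_GENOME_BP, 40_000)   # 40 kb inserts
print(lambda_clones, cosmid_clones)
```

Under these assumptions a mammalian library needs somewhat under a million 15 kb clones, and roughly a third of a million 40 kb clones, which is why larger inserts make screening less laborious.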
Ligase: An enzyme, T4 DNA ligase, which can link pieces of DNA together.
The pieces must have compatible ends (both of them blunt, or else mutually
compatible sticky ends), and the ligation reaction requires ATP.
Ligation: The process of splicing two pieces of DNA together. In practice, a
pool of DNA fragments is treated with ligase (see "Ligase") in the presence of
ATP, and all possible splicing products are produced, including circularized
forms and end-to-end ligation of 2, 3 or more pieces. Usually, only some of
these products are useful, and the investigator must have some way of selecting
the desirable ones.
Linker: A small piece of synthetic double-stranded DNA which contains
something useful, such as a restriction site. A linker might be ligated onto the
end of another piece of DNA to provide a desired restriction site.
Marker: Two typical usages:
Molecular weight size marker: a piece of DNA of known size, or a mixture of
pieces with known size, used on electrophoresis gels to determine the size of
unknown DNAs by comparison.
Genetic marker: A known site on the chromosome. It might for example be the
site of a locus with some recognizable phenotype, or it may be the site of a
polymorphism that can be experimentally discerned. See 'Microsatellite', 'SNP',
'Genotyping'.
Message: see mRNA.
Microsatellite: A microsatellite is a simple sequence repeat (SSR). It might be a
homopolymer ('...TTTTTTT...'), a dinucleotide repeat
('....CACACACACACACA.....'), trinucleotide repeat
('....AGTAGTAGTAGTAGT...') etc. Due to polymerase slip (a.k.a. polymerase
chatter), during DNA replication there is a slight chance these repeat sequences
may become altered; copies of the repeat unit can be created or removed.
Consequently, the exact number of repeat units may differ between unrelated
individuals. Considering all the known microsatellite markers, no two
individuals are identical. This is the basis for forensic DNA identification and
for testing of familial relationships (e.g. paternity testing).
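Genotyping a microsatellite amounts to counting repeat units. The sketch below counts the longest uninterrupted run of a repeat unit in a sequence; the two alleles are made-up examples, and the function name is illustrative rather than part of any established tool.

```python
import re

def longest_repeat_run(sequence, unit):
    """Return the largest number of consecutive copies of `unit` in `sequence`."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    # Each match is one uninterrupted run of the unit.
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)

# Hypothetical alleles: identical flanking sequence, different numbers of CA units.
allele_a = "GGT" + "CA" * 6 + "TTG"
allele_b = "GGT" + "CA" * 8 + "TTG"

print(longest_repeat_run(allele_a, "CA"))   # 6
print(longest_repeat_run(allele_b, "CA"))   # 8
```

Two individuals carrying these alleles would be distinguished by the two extra CA units, which change the length of the fragment spanning the repeat by 4 bp.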
mRNA: "messenger RNA" or sometimes just "message"; an RNA which
contains sequences coding for a protein. The term mRNA is used only for a
mature transcript with polyA tail and with all introns removed, rather than the
primary transcript in the nucleus. As such, an mRNA will have a 5'
untranslated region, a coding region, a 3' untranslated region and (almost
always) a poly(A) tail. Typically about 2% of the total cellular RNA is mRNA.
M13: A bacteriophage which infects certain strains of E. coli . The salient
feature of this phage is that it packages only a single strand of DNA into its
capsid. If the investigator has inserted some heterologous DNA into the M13
genome, copious quantities of single-stranded DNA can subsequently be
isolated from the phage capsids. M13 is often used to generate templates for
DNA sequencing.
Nick translation: A method for incorporating radioactive isotopes (typically
32P) into a piece of DNA. The DNA is randomly nicked by DNase I, and then
starting from those nicks DNA polymerase I digests and then replaces a stretch
of DNA. Radiolabeled precursor nucleotide triphosphates can thus be
incorporated.
Non-coding strand: Anti-sense strand. See "Sense strand" for a discussion of
sense strand vs. anti-sense strand.
Northern blot: A technique for analyzing mixtures of RNA, whereby the
presence and rough size of one particular type of RNA (usually an mRNA) can
be ascertained. See "Blotting" for more information. After Dr. E. M. Southern
invented the Southern blot, it was adapted to RNA and named the "Northern"
blot.
NT: Abbreviation for nucleotide; i.e. the monomeric unit from which DNA or
RNA are built. One can express the size of a nucleic acid strand in terms of the
number of nucleotides in its chain; hence ‘nt’ can be a measure of chain length.
Nuclear run-on: A method used to estimate the relative rate of transcription of a
given gene, as opposed to the steady-state level of the mRNA transcript (which
is influenced not just by transcription rates, but by the stability of the RNA).
This technique is based on the assumption that a highly-transcribed gene should
have more molecules of RNA polymerase bound to it than will the same gene
in a less-active state. If properly prepared, isolated nuclei will continue to
transcribe genes and incorporate 32P into RNA, but only in those transcripts
that were in progress at the time the nuclei were isolated. Once the polymerase
molecules complete the transcript they have in progress, they should not be
able to re-initiate transcription. If that is true, then the amount of radiolabel
incorporated into a specific type of mRNA is theoretically proportional to the
number of RNA polymerase complexes present on that gene at the time of
isolation. A very difficult technique, rarely applied appropriately from what I
understand.
Nuclease: An enzyme which degrades nucleic acids. A nuclease can be DNA-
specific (a DNase), RNA-specific (RNase) or non-specific. It may act only on
single stranded nucleic acids, or only on double-stranded nucleic acids, or it
may be non-specific with respect to strandedness. A nuclease may degrade only
from an end (an exonuclease), or may be able to start in the middle of a strand
(an endonuclease). To further complicate matters, many enzymes have multiple
functions; for example, Bal31 has a 3'-exonuclease activity on double-stranded
DNA, and an endonuclease activity specific for single-stranded DNA or RNA.
Nuclease protection assay: See "RNase protection assay".
Oncogene: A gene in a tumor virus or in cancerous cells which, when
transferred into other cells, can cause transformation (note that only certain
cells are susceptible to transformation by any one oncogene). Functional
oncogenes are not present in normal cells. A normal cell has many "proto-
oncogenes" which serve normal functions, and which under the right
circumstances can be activated to become oncogenes. The prefix "v-" indicates
that a gene is derived from a virus, and is generally an oncogene (like v-src , v-
ras, v-myb , etc). See also "Transformation (with respect to cultured cells)".
Open reading frame: Any region of DNA or RNA where a protein could be
encoded. In other words, there must be a string of nucleotides (possibly starting
with a Met codon) in which one of the three reading frames has no stop codons.
See "Reading frame" for a simple example.
Origin of replication: Nucleotide sequences present in a plasmid which are
necessary for that plasmid to replicate in the bacterial host. (Abbr. "ori")
pBR322: A common plasmid. Along with the obligatory origin of replication,
this plasmid has genes which make the E. coli host resistant to ampicillin and
tetracycline. It also has several restriction sites (BamHI, PstI, EcoRI, HindIII
etc.) into which DNA fragments could be spliced in order to clone them.
PCR: see Polymerase Chain Reaction.
Phagemid: A type of plasmid which carries within its sequence a bacteriophage
replication origin. When the host bacterium is infected with "helper" phage, the
phagemid is replicated along with the phage DNA and packaged into phage
capsids.
Plasmid: A circular piece of DNA present in bacteria or isolated from bacteria.
Escherichia coli, the usual bacterium in molecular genetics experiments, has a
large circular genome, but it will also replicate smaller circular DNAs as long
as they have an "origin of replication". Plasmids may also have other DNA
inserted by the investigator. A bacterium carrying a plasmid and replicating a
million-fold will produce a million identical copies of that plasmid. Common
plasmids are pBR322, pGEM, pUC18.
PolyA tail: After an mRNA is transcribed from a gene, the cell adds a stretch of
A residues (typically 50-200) to its 3' end. It is thought that the presence of this
"polyA tail" increases the stability of the mRNA (possibly by protecting it from
nucleases). Note that not all mRNAs have a polyA tail; the histone mRNAs in
particular do not.
Polymerase: An enzyme which links individual nucleotides together into a long
strand, using another strand as a template. There are two general types of
polymerase — DNA polymerases (which synthesize DNA) and RNA
polymerase (which makes RNA). Within these two classes, there are numerous
sub-types of polymerase, depending on what type of nucleic acid can function
as template and what type of nucleic acid is formed. A DNA-dependent DNA
polymerase will copy one DNA strand starting from a primer, and the product
will be the complementary DNA strand. A DNA-dependent RNA polymerase
will use DNA as a template to synthesize an RNA strand.
Polymerase chain reaction: A technique for replicating a specific piece of DNA
in vitro, even in the presence of excess non-specific DNA. Primers are added
(which initiate the copying of each strand) along with nucleotides and Taq
polymerase. By cycling the temperature, the target DNA is repetitively
denatured and copied. A single copy of the target DNA, even if mixed in with
other undesirable DNA, can be amplified to obtain billions of replicates. PCR
can be used to amplify RNA sequences if they are first converted to DNA via
reverse transcriptase. This two-phase procedure is known as ‘RT-PCR’.
Polymerase Chain Reaction (PCR) is the basis for a number of
extremely important methods in molecular biology. It can be used to detect and
measure vanishingly small amounts of DNA and to create customized pieces of
DNA. It has been applied to clinical diagnosis and therapy, to forensics and to
vast numbers of research applications. It would be difficult to overstate the
importance of PCR to science.
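The claim that a single copy can yield billions of replicates follows from repeated doubling. The sketch below uses an idealized model, copies = initial × (1 + efficiency)^cycles, in which an efficiency of 1.0 means perfect doubling in every cycle. The model and the function name are assumptions made for illustration; real reactions plateau as primers and nucleotides run out.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized number of target copies after a given number of PCR cycles.

    efficiency = 1.0 means every template is copied once per cycle (doubling).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must not be negative")
    return initial_copies * (1.0 + efficiency) ** cycles

print(pcr_copies(1, 30))        # 1073741824.0, i.e. 2**30
print(pcr_copies(1, 30, 0.9))   # fewer copies when each cycle is only 90% efficient
```

Thirty ideal cycles starting from one molecule give 2^30, a little over one billion copies.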
Post-transcriptional regulation: Any process occurring after transcription which
affects the amount of protein a gene produces. Includes RNA processing
efficiency, RNA stability, translation efficiency, protein stability. For example,
the rapid degradation of an mRNA will reduce the amount of protein arising
from it. Increasing the rate at which an mRNA is translated will increase the
amount of protein product.
Post-translational processing: The reactions which alter a protein's covalent
structure, such as phosphorylation, glycosylation or proteolytic cleavage.
Post-translational regulation: Any process which affects the amount of protein
produced from a gene, and which occurs AFTER translation in the grand
scheme of genetic expression. Actually, this is often just a buzz-word for
regulation of the stability of the protein. The more stable a protein is, the more
it will accumulate.
PRE: Progesterone Response Element: A binding site in a promoter to which
the activated progesterone receptor can bind. The progesterone receptor is
essentially a transcription factor which is activated only in the presence of
progesterone . The activated receptor will bind to a PRE, and transcription of
the adjacent gene will be altered. See also "Response element".
Primary transcript: When a gene is transcribed in the nucleus, the initial
product is the primary transcript, an RNA containing copies of all exons and
introns. This primary transcript is then processed by the cell to remove the
introns, to cap the 5' end, to cleave off unwanted 3' sequence, and to
polyadenylate the new 3' end.
The mature message thus formed is then exported to the cytoplasm for
translation.
Primer: A small oligonucleotide (anywhere from 6 to 50 nt long) used to prime
DNA synthesis. The DNA polymerases are only able to extend a pre-existing
strand along a template; they are not able to take a naked single strand and
produce a complementary copy of it de novo. A primer which sticks to the
template is therefore used to initiate the replication. Primers are necessary for
DNA sequencing and PCR.
Primer extension: This is a method used to figure out how far upstream from a
fixed site the start of an mRNA is. For example, perhaps you have isolated a
cDNA clone, but you don't think that the clone has all of the 5' untranslated
region. To find out how much is missing, you would first sequence the part you
have, and figure out which strand is coding strand (usually the coding strand
will have a large open reading frame). Next, you ask the DNA Synthesis
Facility to make an oligonucleotide complementary to the 5'-most region of the
coding strand (and thus complementary to the mRNA). This "primer" is
hybridized to mRNA (say, a mixture of mRNA containing the one in which
you are interested), and reverse transcriptase is added to copy the mRNA from
the primer out to the 5' end. The size of the resulting DNA fragment shows how
far away from the 5' end your primer is.

Probe: A fragment of DNA or RNA which is labeled in some way (often
incorporating 32P or 35S), and which is used to hybridize with the nucleic acid
in which you are interested. For example, if you want to quantitate the levels of
alpha subunit mRNA in a preparation of pituitary RNA, you might make a
radiolabeled RNA in vitro which is complementary to the mRNA, and then use
it to probe a Northern blot of the pituitary RNA. A probe can be radiolabeled, or
tagged with another functional group such as biotin. A probe can be cloned
DNA, or might be a synthetic DNA strand. As an example of the latter, perhaps
you have isolated a protein for which you wish to obtain a cDNA or genomic
clone. You might (pay to) microsequence a portion of the protein, deduce the
nucleic acid sequence, (pay to) synthesize an oligonucleotide carrying that
sequence, radiolabel it and use it as a probe to screen a cDNA library or
genomic library. A better way is to call up someone who already has the clone.
Processing: The reactions occurring in the nucleus which convert the primary
RNA transcript to a mature mRNA. Processing reactions include capping,
splicing and polyadenylation. The term can also refer to the processing of the
protein product, including proteolytic cleavages, glycosylation, etc.
Promoter: The first few hundred nucleotides of DNA "upstream" (on the 5'
side) of a gene, which control the transcription of that gene. The promoter is
part of the 5' flanking DNA, i.e. it is not transcribed into RNA, but without the
promoter, the gene is not functional. Note that the definition is a bit hazy as far
as the size of the region encompassed, but the "promoter" of a gene starts with
the nucleotide immediately upstream from the cap site, and includes binding
sites for one or more transcription factors which can not work if moved farther
away from the gene.
Proto-oncogene: A gene present in a normal cell which carries out a normal
cellular function, but which can become an oncogene under certain
circumstances. The prefix "c-" indicates a cellular gene, and is generally used
for proto-oncogenes (examples: c-myb , c-myc , c-fos , c-jun , etc).
Pulsed field gel electrophoresis: (PFGE) A gel technique which allows size-
separation of very large fragments of DNA, in the range of hundreds of kb to
thousands of kb. As in other gel electrophoresis techniques, populations of
molecules migrate through the gel at a speed related to their size, producing
discrete bands. In normal electrophoresis, DNA fragments greater than a
certain size limit all migrate at the same rate through the gel. In PFGE, the
electrophoretic voltage is applied alternately along two perpendicular axes,
which forces even the larger DNA fragments to separate by size.
Random primed synthesis: If you have a DNA clone and you want to produce
radioactive copies of it, one way is to denature it (separate the strands), then
hybridize to that template a mixture of all possible 6-mer oligonucleotides.
Those oligos will act as primers for the synthesis of labeled strands by DNA
polymerase (in the presence of radiolabeled precursors).
Reading frame: When mRNA is translated by the cell, the nucleotides are read
three at a time. By starting at different positions, the groupings of three that are
produced can be entirely different. The following example shows a DNA
sequence and the three reading frames in which it could be read. Not only is an
entirely different amino acid sequence specified by the different reading
frames, but two of the three frames have stop codons, and thus are not open
reading frames (asterisks indicate a stop codon).
A DNA open reading frame:
...ATG ACA TGT AAA GAT AGA CTA ACC TTT TGG...
...Met Thr Cys Lys Asp Arg Leu Thr Phe Trp...
Same bases, different 'frame':
...A TGA CAT GTA AAG ATA GAC TAA CCT TTT GG...
... *** His Val Lys Ile Asp *** Pro Phe Gly..
Same sequence, the last of the 3 possible frames:
...AT GAC ATG TAA AGA TAG ACT AAC CTT TTG G..
... Asp Met *** Arg *** Thr Asn Leu Leu ...
If we shift the grouping again, we will just get the first reading frame
again. The reading frame that is actually used is determined by the first
methionine codon (the initiation codon). Once that first AUG is recognized, the
pattern of triplet groupings follows unambiguously.
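The three-frame example can be checked with a short sketch that translates the same DNA sequence in each frame using the standard genetic code. It prints one-letter amino acid codes rather than the three-letter codes used above, with * marking a stop codon, and it drops any incomplete codon at the end of a frame. The function name is illustrative.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate_frame(dna, frame):
    """Translate `dna` starting at offset `frame` (0, 1 or 2); '*' marks a stop."""
    codons = [dna[i:i + 3] for i in range(frame, len(dna) - 2, 3)]
    return "".join(CODON_TABLE[codon] for codon in codons)

dna = "ATGACATGTAAAGATAGACTAACCTTTTGG"   # the sequence from the example above

for frame in range(3):
    protein = translate_frame(dna, frame)
    status = "open" if "*" not in protein else "contains stop codon(s)"
    print(frame + 1, protein, status)
# 1 MTCKDRLTFW open
# 2 *HVKID*PF contains stop codon(s)
# 3 DM*R*TNLL contains stop codon(s)
```

Only the first frame is free of stop codons, in agreement with the worked example.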
Repetitive DNA: A surprising portion of any genome consists not of genes or
structural elements, but of frequently repeated simple sequences. These may be
short repeats just a few nt long, like CACACA etc. They can also range up to a
few hundred nt long. Examples of the latter include Alu repeats, LINEs, SINEs.
The function of these elements is often unknown. In shorter repeats like di- and
tri-nucleotide repeats, the number of repeating units can occasionally change
during evolution and descent. They are thus useful markers for familial
relationships and have been used in paternity testing, forensic science and in
the identification of human remains.
Response element: By definition, a "response element" is a portion of a gene
which must be present in order for that gene to respond to some hormone or
other stimulus. Response elements are binding sites for transcription factors.
Certain transcription factors are activated by stimuli such as hormones or heat
shock. A gene may respond to the presence of that hormone because the gene
has in its promoter region a binding site for hormone-activated transcription
factor. Example: the glucocorticoid response element (GRE).
Restriction: To "restrict" DNA means to cut it with a restriction enzyme. See
"Restriction Enzyme".
Restriction enzyme: A class of enzymes ("restriction endonucleases") generally
isolated from bacteria, which are able to recognize and cut specific sequences
("restriction sites") in DNA. For example, the restriction enzyme BamHI
locates and cuts any occurrence of:
5'-GGATCC-3'
   ||||||
3'-CCTAGG-5'
Note that both strands contain the sequence GGATCC, but in
antiparallel orientation. The recognition site is thus said to be palindromic,
which is typical of restriction sites. Every copy of a plasmid is identical in
sequence, so if BamHI cuts a particular circular plasmid at three sites
producing three "restriction fragments", then a million copies of that plasmid
will produce those same restriction fragments a million times over. There are
more than six hundred known restriction enzymes.
Bacteria produce restriction enzymes for protection against invasion by
foreign DNA such as phages. The bacteria's own DNA is modified in such a
way as to prevent it from being clipped.
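The palindromic nature of the BamHI site can be checked with a few lines of code. This is only a sketch: the helper names and the example sequence are invented for illustration, and only the top strand is searched (which is sufficient for a palindromic site).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def is_palindromic(site: str) -> bool:
    """A site is palindromic if both strands read the same 5' to 3'."""
    return site == reverse_complement(site)

def find_sites(dna: str, site: str) -> list:
    """0-based start positions of every occurrence of `site` on the top strand."""
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions

BAMHI_SITE = "GGATCC"
example_dna = "TTGGATCCAAAGGATCCTT"  # hypothetical sequence

print(is_palindromic(BAMHI_SITE))           # True
print(find_sites(example_dna, BAMHI_SITE))  # [2, 11]
```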
Restriction fragment: The piece of DNA released after restriction digestion of
plasmids or genomic DNA. See "Restriction enzyme". One can digest a
plasmid and isolate one particular restriction fragment (actually a set of
identical fragments). The term also describes the fragments detected on a
genomic blot which carry the gene of interest.
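The relationship between cut sites and fragment sizes on a circular plasmid can be sketched as follows. The plasmid length and cut positions are hypothetical; the point is only that N cuts in a circle give N fragments whose lengths sum to the plasmid length.

```python
def circular_fragment_lengths(plasmid_length: int, cut_positions: list) -> list:
    """Fragment lengths produced by cutting a circular molecule at the given positions."""
    cuts = sorted(set(cut_positions))
    if not cuts:
        return [plasmid_length]  # an uncut circle stays as one molecule
    lengths = [b - a for a, b in zip(cuts, cuts[1:])]
    # The last fragment runs from the final cut, through the origin, to the first cut.
    lengths.append(plasmid_length - cuts[-1] + cuts[0])
    return lengths

# Hypothetical 3000 bp plasmid cut at three sites.
fragments = circular_fragment_lengths(3000, [200, 1200, 2500])
print(fragments)       # [1000, 1300, 700]
print(sum(fragments))  # 3000
```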

Restriction fragment length polymorphism: See "RFLP".
Restriction map: A "cartoon" depiction of the locations within a stretch of
known DNA where restriction enzymes will cut.

The map usually indicates the approximate length of the entire piece
(scale on the bottom), as well as the position within the piece at which
designated enzymes will cut. This map happens to be of a plasmid, and the two
ends are joined together with about 25 nt between the EcoRI and HindIII sites.
Restriction site: See Restriction enzyme.
Reverse transcriptase: An enzyme which will make a DNA copy of an RNA
template - an RNA-dependent DNA polymerase. RT is used to make cDNA; one
begins by isolating polyadenylated mRNA, providing oligo-dT as a primer, and
adding deoxynucleotide triphosphates (dNTPs) and RT to copy the RNA into cDNA.
RFLP: Restriction fragment length polymorphism; the acronym is pronounced
"riflip". Although two individuals of the same species have almost identical
genomes, they will always differ at a few nucleotides. Some of these
differences will produce new restriction sites (or remove them), and thus the
banding pattern seen on a genomic Southern will be affected. For any
given probe (or gene), it is often possible to test different restriction enzymes
until you find one which gives a pattern difference between two individuals - a
RFLP. The less related the individuals, the more divergent their DNA
sequences are and the more likely you are to find a RFLP.
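The way a single nucleotide difference changes the fragment pattern can be sketched in code. The two sequences below are invented, and the enzyme is assumed to be BamHI cutting one base into its GGATCC site; a real RFLP is detected on a Southern blot, not by reading the sequence.

```python
def linear_fragment_lengths(dna: str, site: str, cut_offset: int) -> list:
    """Fragment lengths after cutting a linear molecule at every occurrence of `site`."""
    cuts = []
    start = dna.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = dna.find(site, start + 1)
    boundaries = [0] + cuts + [len(dna)]
    return [b - a for a, b in zip(boundaries, boundaries[1:])]

# Hypothetical sequences from two individuals; the second site differs by one base.
individual_1 = "A" * 10 + "GGATCC" + "T" * 20 + "GGATCC" + "A" * 5
individual_2 = "A" * 10 + "GGATCC" + "T" * 20 + "GGATTC" + "A" * 5

print(linear_fragment_lengths(individual_1, "GGATCC", 1))  # [11, 26, 10]
print(linear_fragment_lengths(individual_2, "GGATCC", 1))  # [11, 36]
```

The lost site in the second individual fuses two fragments into one longer fragment, which is what shows up as a different banding pattern.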
Ribonuclease: see "RNase".
Riboprobe: A strand of RNA synthesized in-vitro (usually radiolabeled) and
used as a probe for hybridization reactions. An RNA probe can be synthesized
at very high specific activity, is single stranded (and therefore will not self
anneal), and can be used for very sensitive detection of DNA or RNA.
Ribosome: A cellular particle which is involved in the translation of mRNAs to
make proteins. Ribosomes are a complex consisting of ribosomal RNAs
(rRNA) and several proteins.
RNAi: 'RNA interference' (a.k.a. 'RNA silencing') is the mechanism by which
small double-stranded RNAs can interfere with expression of any mRNA
having a similar sequence. Those small RNAs are known as 'siRNA', for short
interfering RNAs. The mode of action for siRNA appears to be via dissociation
of its strands, hybridization to the target RNA, extension of those fragments by
an RNA-dependent RNA polymerase, then fragmentation of the target.
Importantly, the remnants of the target molecule appear to then act as
siRNAs themselves; thus the effect of a small amount of starting siRNA is effectively
amplified and can have long-lasting effects on the recipient cell.
The RNAi effect has been exploited in numerous research programs to
deplete the cell of specific messages, thus examining the role of those messages
by their absence.
RNase: Ribonuclease; an enzyme which degrades RNA. It is ubiquitous in
living organisms and is exceptionally stable. The prevention of RNase activity
is the primary problem in handling RNA.
RNase protection assay: This is a sensitive method to determine (1) the amount
of a specific mRNA present in a complex mixture of mRNA and/or (2) the
sizes of exons which comprise the mRNA of interest. A radioactive DNA or
RNA probe (in excess) is allowed to hybridize with a sample of mRNA (for
example, total mRNA isolated from tissue), after which the mixture is digested
with single-strand specific nuclease. Only the probe which is hybridized to the
specific mRNA will escape the nuclease treatment, and can be detected on a
gel. The amount of radioactivity which was protected from nuclease is
proportional to the amount of mRNA to which it hybridized. If the probe
included both introns and exons, only the exons will be protected from nuclease
and their sizes can be ascertained on the gel.
rRNA: "ribosomal RNA"; any of several RNAs which become part of the
ribosome, and thus are involved in translating mRNA and synthesizing
proteins. They are the most abundant RNA in the cell (on a mass basis).
RT-PCR: See ‘Polymerase Chain Reaction’.
Run-off: see Nuclear run-on.
Run-on: see Nuclear run-on.
S1 end mapping: A technique to determine where the end of an RNA transcript
lies with respect to its template DNA (the gene). Can't be described in a short
paragraph. See "RNase protection assay" for a closely related technique.
S1 nuclease: An enzyme which digests only single-stranded nucleic acids.
Screening: To screen a library (see "Library") is to select and isolate individual
clones out of the mixture of clones. For example, if you needed a cDNA clone
of the pituitary glycoprotein hormone alpha subunit, you would need to make
(or buy) a pituitary cDNA library, then screen that library in order to detect and
isolate those few bacteria carrying alpha subunit cDNA.
There are two methods of screening which are particularly worth
describing: screening by hybridization, and screening by antibody.

Screening by hybridization involves spreading the mixture of bacteria
out on a dozen or so agar plates to grow several tens of thousands of isolated colonies.
Membranes are laid onto each plate, and some of the bacteria from each colony
stick, producing replicas of each colony in their original growth position. The
membranes are lifted and the adherent bacteria are lysed, then hybridized to a
radioactive piece of alpha DNA (the source of which is a story in itself - see
"Probe"). When X-ray film is laid on the filter, only colonies carrying alpha
sequences will "light up". Their positions on the membranes show where they
grew on the original plates, so you now can go back to the original plate (where
the remnants of the colonies are still alive), pick the colony off the plate and
grow it up. You now have an unlimited source of alpha cDNA.
Screening by antibody is an option if the bacteria and plasmid are
designed to express proteins from the cDNA inserts (see "Expression clones").
The principle is similar to hybridization, in that you lift replica filters from
bacterial plates, but then you use the antibody (perhaps generated after olde
tyme protein purification rituals) to show which colony expresses the desired
protein.
Sense strand: A gene has two strands: the sense strand and the anti-sense
strand. The sense strand is, by definition, the same 'sense' as the mRNA;
that is, it can be translated exactly as the mRNA sequence can. Given a sense
strand with the following sequence:
5' - ATG GGG CCA CGG CTG TGA - 3'
Met Gly Pro Arg Leu stop
The anti-sense strand will read as follows (note that the strand has been
reversed and complemented):
5' - TCA CAG CCG TGG CCC CAT - 3'
The duplex DNA will pair as follows:
5' - ATGGGGCCACGGCTGTGA - 3'
||||||||||||||||||
3' - TACCCCGGTGCCGACACT - 5'
Note however that when the RNA is transcribed from this sequence, the
ANTI-SENSE strand is used as the template for RNA polymerization. After all,
the RNA must base-pair with its template strand (see Figure 3), so the process
of transcription produces the complement of the anti-sense strand. This
introduces some confusion about terminology:
Some people use the term ‘coding strand’ and ‘non-coding strand’ to
refer to the sense and antisense strands, respectively. Unfortunately, many
people interpret these terms in exactly the opposite way. I consider the terms
‘coding strand’ and ‘non-coding strand’ to be too ambiguous. Some people use
the exact opposite definition for ‘sense’ and ‘anti-sense’ that I have given here.
Be aware of the possibility of a discrepancy. Textbooks I have consulted
generally agree with the nomenclature given herein, albeit some avoid defining
these terms at all.
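The relationship between the two strands and the transcript can be checked with a short sketch. It uses the example sequence from this entry; the helper names are assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
RNA_PAIRING = str.maketrans("ACGT", "UGCA")  # template DNA base -> paired RNA base

sense = "ATGGGGCCACGGCTGTGA"

# Anti-sense strand, written 5' to 3' (reversed and complemented).
antisense = sense.translate(COMPLEMENT)[::-1]

def transcribe(template_5to3: str) -> str:
    """RNA (written 5' to 3') made by base-pairing against a template given 5' to 3'."""
    return template_5to3.translate(RNA_PAIRING)[::-1]

mrna = transcribe(antisense)

print(antisense)  # TCACAGCCGTGGCCCCAT
print(mrna)       # AUGGGGCCACGGCUGUGA
```

The transcript made from the anti-sense template has the same sequence as the sense strand, with U in place of T.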
Sequence: As a noun, the sequence of a DNA is a buzz word for the structure
of a DNA molecule, in terms of the sequence of bases it contains. As a verb,
"to sequence" is to determine the structure of a piece of DNA; i.e. the sequence
of nucleotides it contains.
Shotgun cloning: The practice of randomly clipping a larger DNA fragment
into various smaller pieces, cloning everything, and then studying the resulting
individual clones to figure out what happened. For example, if one was
studying a 50 kb gene, it "may" be a bit difficult to figure out the restriction
map. By randomly breaking it into smaller fragments and mapping those, a
master restriction map could be deduced. See also Shotgun sequencing.
Shotgun sequencing: A way of determining the sequence of a large DNA
fragment which requires little brainpower but lots of late nights. The large
fragment is shotgun cloned (see above), and then each of the resulting smaller
clones ("subclones") is sequenced. By finding out where the subclones overlap,
the sequence of the larger piece becomes apparent. Note that some of the
regions will get sequenced several times just by chance.
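The overlap-finding step can be sketched as a toy greedy merge. This is an illustration under strong assumptions (error-free reads, all from the same strand, no repeats, no read wholly contained inside another); real assembly software is far more involved. The reads and function names are hypothetical.

```python
def overlap(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0

def greedy_assemble(reads: list, min_len: int = 3) -> list:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)

# Hypothetical subclone sequences, deliberately listed out of order.
subclones = ["TTAGCCTAGGA", "ATGGCGTACG", "GTACGTTAGC"]
print(greedy_assemble(subclones))  # ['ATGGCGTACGTTAGCCTAGGA']
```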
siRNA: Small (or short) interfering RNA; the small RNAs that mediate RNAi. See 'RNAi'.
Slot blot: Similar to a dot blot, but the analyte is put onto the membrane using a
slot-shaped template. The template produces a consistently shaped spot, thus
decreasing errors and improving the accuracy of the analysis. See Dot blot.
snRNA: Small nuclear RNA; forms complexes with proteins to form snRNPs;
involved in RNA splicing, polyadenylation reactions, other unknown functions
(probably).
snRNP: "snerps", Small Nuclear RiboNucleoProtein particles, which are
complexes between small nuclear RNAs and proteins, and which are involved
in RNA splicing and polyadenylation reactions.
SNP: Single Nucleotide Polymorphism (SNP) - a position in a genomic DNA
sequence that varies from one individual to another. It is thought that the
primary source of genetic difference between any two humans is due to the
presence of single nucleotide polymorphisms in their DNA. Furthermore, these
SNPs can be extremely useful in genetic mapping (see 'Genetic Mapping') to
follow inheritance of specific segments of DNA in a lineage. SNP-typing is the
process of determining the exact nucleotide at positions known to be
polymorphic.
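In its simplest form, finding SNPs between two already-aligned sequences amounts to listing the positions where the bases differ. The sketch below assumes equal-length, pre-aligned sequences with no insertions or deletions; the sequences are invented.

```python
def snp_positions(seq_a: str, seq_b: str) -> list:
    """List (1-based position, base in A, base in B) wherever two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]

# Hypothetical sequences from two individuals.
person_1 = "ATGGCGTACGTT"
person_2 = "ATGGCATACGTT"

print(snp_positions(person_1, person_2))  # [(6, 'G', 'A')]
```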
Solution hybridization: A method closely related to RNase protection (see
"RNase protection assay"). Solution hybridization is designed to measure the
levels of a specific mRNA species in a complex population of RNA. An excess
of radioactive probe is allowed to hybridize to the RNA, then single-strand
specific nuclease is used to destroy the remaining unhybridized probe and
RNA. The "protected" probe is separated from the degraded fragments, and the
amount of radioactivity in it is proportional to the amount of mRNA in the
sample which was capable of hybridization. This can be a very sensitive
detection method.
Southern blot: A technique for analyzing mixtures of DNA, whereby the
presence and rough size of one particular fragment of DNA can be ascertained.
See "Blotting". Named for its inventor, Dr E. M. Southern.
SSR: Simple Sequence Repeat. See 'Microsatellite'.
Stable transfection: A form of transfection experiment designed to produce
permanent lines of cultured cells with a new gene inserted into their genome.
Usually this is done by linking the desired gene with a "selectable" gene, i.e. a
gene which confers resistance to a toxin (like G418, aka Geneticin). Upon
putting the toxin into the culture medium, only those cells which incorporate
the resistance gene will survive, and essentially all of those will also have
incorporated the experimenter's gene.
Sticky ends: After digestion of a DNA with certain restriction enzymes, the
ends left have one strand overhanging the other to form a short (typically 4 nt)
single-stranded segment. These overhangs will easily re-attach to other ends like
them, and are thus known as "sticky ends". For example, the enzyme BamHI
recognizes the sequence GGATCC, and clips after the first G in each strand:
5'-G      GATCC-3'
3'-CCTAG      G-5'
The overhangs thus produced can still hybridize ("anneal") with each
other, even if they came from different parent DNA molecules, and the enzyme
ligase will then covalently link the strands. Sticky ends therefore facilitate the
ligation of diverse segments of DNA, and allow the formation of novel DNA
constructs.
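A small sketch can show why the ends from two different BamHI-cut molecules are compatible. It works on the top strand only, cuts only the first site found, and uses invented sequences and helper names.

```python
from typing import Optional, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def bamhi_cut(top_strand: str) -> Optional[Tuple[str, str, str]]:
    """Cut the top strand after the first G of the first GGATCC site.

    Returns (left piece, right piece, 4 nt overhang), or None if there is no site.
    """
    start = top_strand.find("GGATCC")
    if start == -1:
        return None
    cut = start + 1
    left, right = top_strand[:cut], top_strand[cut:]
    return left, right, right[:4]

def overhangs_anneal(overhang_a: str, overhang_b: str) -> bool:
    """Two single-stranded overhangs can pair if one is the reverse complement of the other."""
    return overhang_a == reverse_complement(overhang_b)

# Two hypothetical, unrelated DNA molecules that each carry a BamHI site.
_, _, overhang_1 = bamhi_cut("AAGGATCCTT")
_, _, overhang_2 = bamhi_cut("CCCGGATCCAAA")

print(overhang_1, overhang_2)                    # GATC GATC
print(overhangs_anneal(overhang_1, overhang_2))  # True
```

Because GATC is its own reverse complement, any BamHI end can anneal with any other BamHI end, whatever molecule it came from.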

Stringency: A term used to describe the conditions of hybridization. By varying
the conditions (especially salt concentration and temperature) a given probe
sequence may be allowed to hybridize only with its exact complement (high
stringency), or with any somewhat related sequences (relaxed or low
stringency). Increasing the temperature or decreasing the salt concentration will
tend to increase the selectivity of a hybridization reaction, and thus will raise
the stringency.
Sub-cloning: If you have a cloned piece of DNA (say, inserted into a plasmid)
and you need unlimited copies of only a part of it, you might "sub-clone" it.
This involves starting with several million copies of the original plasmid,
cutting with restriction enzymes, and purifying the desired fragment out of the
mixture. That fragment can then be inserted into a new plasmid for replication.
It has now been subcloned.
Taq polymerase: A DNA polymerase isolated from the bacterium Thermus
aquaticus and which is very stable to high temperatures. It is used in PCR
procedures and high temperature sequencing.
TATA box: A sequence found in the promoter (part of the 5' flanking region)
of many genes. Deletion of this site (the binding site of transcription factor
TFIID) causes a marked reduction in transcription, and gives rise to
heterogeneous transcription initiation sites.
Tet resistance: See "Antibiotic resistance".
Tissue-specific expression: Gene function which is restricted to a particular
tissue or cell type. For example, the glycoprotein hormone alpha subunit is
produced only in certain cell types of the anterior pituitary and placenta, not in
lungs or skin; thus expression of the glycoprotein hormone alpha-chain gene is
said to be tissue-specific. Tissue specific expression is usually the result of an
enhancer which is activated only in the proper cell type.
Tm: The melting point for a double-stranded nucleic acid. Technically, this is
defined as the temperature at which 50% of the strands are in double-stranded
form and 50% are single-stranded, i.e. midway in the melting curve. A primer
has a specific Tm because it is assumed that it will find an opposite strand of
appropriate character.
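For short primers, a common rule of thumb (often called the Wallace rule, which is not part of this glossary) estimates Tm as 2 degrees C for each A or T plus 4 degrees C for each G or C. The sketch below applies that approximation; it ignores salt concentration and is only a rough guide for oligonucleotides of roughly 14 to 20 nt.

```python
def wallace_tm(primer: str) -> int:
    """Rough Tm estimate in degrees C: 2 per A/T plus 4 per G/C (rule of thumb only)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc

# Hypothetical 18 nt primer (6 A/T and 12 G/C).
print(wallace_tm("ATGGGGCCACGGCTGTGA"))  # 60
```

The higher weight for G and C reflects the fact that GC-rich duplexes melt at higher temperatures than AT-rich ones of the same length.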
Transcription factor: A protein which is involved in the transcription of genes.
These usually bind to DNA as part of their function (but not necessarily). A
transcription factor may be general (i.e. acting on many or all genes in all
tissues), or tissue-specific (i.e. present only in a particular cell type, and
activating the genes restricted to that cell type). Its activity may be constitutive,
or may depend on the presence of some stimulus; for example, the
glucocorticoid receptor is a transcription factor which is active only when
glucocorticoids are present.
Transcription: The process of copying DNA to produce an RNA transcript.
This is the first step in the expression of any gene. In eukaryotes, the resulting RNA, if it
codes for a protein, will be spliced, polyadenylated, transported to the
cytoplasm, and by the process of translation will produce the desired protein
molecule.
Transfection: A method by which experimental DNA may be put into a
cultured mammalian cell. Such experiments are usually performed using cloned
DNA containing coding sequences and control regions (promoters, etc) in order
to test whether the DNA will be expressed. Since the cloned DNA may have
been extensively modified (for example, protein binding sites on the promoter
may have been altered or removed), this procedure is often used to test whether
a particular modification affects the function of a gene.
Transformation (with respect to bacteria): The process by which a bacterium
acquires a plasmid and becomes antibiotic resistant. This term most commonly
refers to a bench procedure performed by the investigator which introduces
experimental plasmids into bacteria.
Transformation (with respect to cultured cells): A change in cell morphology
and behavior which is generally related to carcinogenesis. Transformed cells
tend to exhibit characteristics known collectively as the "transformed
phenotype" (rounded cell bodies, reduced attachment dependence, increased
growth rate, loss of contact inhibition, etc). There are different "degrees" of
transformation, and cells may exhibit only a subset of these characteristics. Not
well understood, the process of transformation is the subject of intense
research.
Transgenic mouse: A mouse which carries experimentally introduced DNA.
The procedure by which one makes a transgenic mouse involves the injection
of DNA into a fertilized embryo at the pro-nuclear stage. The DNA is generally
cloned, and may be experimentally altered. It will become incorporated into the
genome of the embryo. That embryo is implanted into a foster mother, who
gives birth to an animal carrying the new gene. Various experiments are then
carried out to test the functionality of the inserted DNA.
Transient transfection: When DNA is transfected into cultured cells, it is able to
stay in those cells for about 2-3 days, but then will be lost (unless steps are
taken to ensure that it is retained - see Stable transfection). During those 2-3
days, the DNA is functional, and any functional genes it contains will be
expressed. Investigators take advantage of this transient expression period to
test gene function.

Translation: The process of decoding a strand of mRNA, thereby producing a
protein based on the code. This process requires ribosomes (which are
composed of rRNA along with various proteins) to perform the synthesis, and
tRNA to bring in the amino acids. Sometimes, however, people speak of
"translating" the DNA or RNA when they are merely reading the nucleotide
sequence and predicting from it the sequence of the encoded protein. This
might be more accurately termed "conceptual translation".
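A "conceptual translation" is easy to sketch in code. The example below builds the standard genetic code table and reads a sense-strand sequence codon by codon from its first base until a stop codon; the function name is an assumption, and no search for the start codon or alternative reading frames is attempted.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def conceptual_translation(sequence: str) -> str:
    """Translate a sense-strand DNA (or mRNA) sequence from its first base to the first stop."""
    dna = sequence.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

# Sense strand from the "Sense strand" entry: Met Gly Pro Arg Leu stop.
print(conceptual_translation("ATGGGGCCACGGCTGTGA"))  # MGPRL
```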
Tumor suppressor: A gene that inhibits progression towards neoplastic
transformation. The best-known examples of tumor suppressors are the genes
encoding the proteins p53 and Rb.
tRNA: "transfer RNA"; one of a class of rather small RNAs used by the cell to
carry amino acids to the enzyme complex (the ribosome) which builds proteins,
using an mRNA as a guide. Fairly abundant.
Upstream activator sequence: A binding site for transcription factors, generally
part of a promoter region. A UAS may be found upstream of the TATA
sequence (if there is one), and its function is (like an enhancer) to increase
transcription. Unlike an enhancer, it can not be positioned just anywhere or in
any orientation.
Upstream/Downstream: In an RNA, anything towards the 5' end of a reference
point is "upstream" of that point. This orientation reflects the direction of both
the synthesis of mRNA, and its translation - from the 5' end to the 3' end. In
DNA, the situation is a bit more complicated. In the vicinity of a gene (or in a
cDNA), the DNA has two strands, but one strand is virtually a duplicate of the
RNA, so its 5' and 3' ends determine upstream and downstream, respectively.
NOTE that in genomic DNA, two adjacent genes may be on different strands
and thus oriented in opposite directions. Upstream or downstream is only used
in conjunction with a given gene.
Vector: The DNA "vehicle" used to carry experimental DNA and to clone it.
The vector provides all sequences essential for replicating the test DNA.
Typical vectors include plasmids, cosmids, phages and YACs.
Western blot: A technique for analyzing mixtures of proteins to show the
presence, size and abundance of one particular type of protein. Similar to
Southern or Northern blotting (see "Blotting"), except that (1) a protein mixture
is electrophoresed in an acrylamide gel, and (2) the "probe" is an antibody
which recognizes the protein of interest, followed by a radioactive secondary
probe (such as 125I-protein A).

YAC: Yeast artificial chromosome. This is a method for cloning very large
fragments of DNA. Genomic DNA fragments of 200-500 kb are linked to
sequences which allow them to propagate in yeast as a mini-chromosome
(including telomeres, a centromere and an ARS - an autonomously replicating
sequence). This technique is used to clone large genes and intergenic regions,
and for chromosome walking.
Zinc finger: A protein structural motif common in DNA binding proteins. In the
form described here, four Cys residues are found in each "finger", and each finger
binds one zinc ion (in other zinc fingers, two of the Cys residues are replaced by His).
A typical configuration is: CysXxxXxxCys--(intervening 12 or so aa's)--
CysXxxXxxCys.
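The configuration above can be written as a simple pattern and searched for in a protein sequence given in one-letter code. This is a sketch only: the allowed spacer of 10 to 14 residues is an assumed reading of "12 or so", the test sequence is invented, and a pattern match does not prove that a real zinc finger is present.

```python
import re

# Cys-X-X-Cys, a spacer of about 12 residues, then Cys-X-X-Cys.
# The 10-14 residue tolerance is an assumption made for this example.
ZINC_FINGER_PATTERN = re.compile(r"C.{2}C.{10,14}C.{2}C")

def find_zinc_finger_like(protein: str) -> list:
    """Return (0-based start, matched segment) for each non-overlapping motif match."""
    return [(m.start(), m.group(0)) for m in ZINC_FINGER_PATTERN.finditer(protein)]

# Hypothetical protein fragment with a 12-residue spacer between the Cys pairs.
fragment = "MK" + "CAAC" + "GKSFSRSDHLTR" + "CPEC" + "GK"

print(find_zinc_finger_like(fragment))  # [(2, 'CAACGKSFSRSDHLTRCPEC')]
```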
