Biomolecular chemistry

2. RNA and transcription

Primary Source Material


•Biochemistry Berg, Jeremy M.; Tymoczko, John L.; and Stryer, Lubert (courtesy of the NCBI bookshelf)
•Molecular Cell Biology Lodish, Harvey; Berk, Arnold; Zipursky, S. Lawrence; Matsudaira, Paul; Baltimore,
David; Darnell, James E. (courtesy of the NCBI bookshelf)
•Many figures and the descriptions for the figures are from the educational resources provided at the Protein
Data Bank (http://www.pdb.org/)
•Most of these figures and accompanying legends have been written by David S. Goodsell of the Scripps
Research Institute and are being used with permission. I highly recommend browsing the Molecule of the
Month series at the PDB (http://www.pdb.org/pdb/101/motm_archive.do)

Some objectives for this section:


• you will appreciate the many roles of RNA
• you will understand the mechanism of RNA polymerase
• you will know some differences between prokaryote and eukaryote RNA processing
• you will know what exons and introns are
• you will appreciate the structures that RNA can adopt
• you will know what reverse transcriptase does
The Central Dogma

U.S. Department of Energy Human Genome Program (http://www.ornl.gov/hgmis)

• There are many ways of stating the central dogma of molecular biology. Apparently Francis Crick originally defined it
like this:
The central dogma of molecular biology deals with the detailed residue-by-residue transfer of sequential
information. It states that information cannot be transferred back from protein to either protein or nucleic acid.
(http://en.wikipedia.org/wiki/Central_dogma_of_molecular_biology)

• The way that I think of the Central Dogma is: genetic information tends to flow from DNA to RNA to proteins.
• The information stored as DNA becomes useful through gene expression.
• Gene expression means the production of a protein or a functional RNA from its gene. 
• Gene expression involves several steps:
• Transcription:  A DNA strand is used as the template to synthesize an RNA strand, which is called the primary
transcript. 
• RNA processing:  This step involves modifications of the primary transcript to generate a mature mRNA (for protein
genes) or a functional tRNA or rRNA. For RNA genes (tRNA and rRNA), the expression is complete after a
functional tRNA or rRNA is generated.  However, protein genes require additional steps.
• Nuclear transport: mRNA has to be transported from the nucleus to the cytoplasm for protein synthesis
• Protein synthesis:  In the cytoplasm, mRNA binds to ribosomes, which can synthesize a polypeptide based on the
sequence of mRNA.

• Epigenetic information can be thought of as flowing the other way: changes in the cell (typically in proteins or caused
by proteins) that result in changes in gene expression but not in changes in the genetic sequence itself.
The roles of RNA: more than just messengers

rRNA

And one more… catalytic RNA

• Messenger RNA is the template for protein synthesis (translation). An mRNA molecule may be
produced for each gene or group of genes that is to be expressed in E. coli, whereas a distinct
mRNA is produced for each gene in eukaryotes. In E. coli, the average length of an mRNA molecule
is about 1.2 kilobases (kb).
• Transfer RNA carries amino acids in an activated form to the ribosome for peptide-bond formation, in
a sequence dictated by the mRNA template. There is at least one kind of tRNA for each of the 20
amino acids. Transfer RNA consists of about 75 nucleotides (having a mass of about 25 kDa), which
makes it one of the smallest of the RNA molecules discussed here.
• Ribosomal RNA (rRNA), the major component of ribosomes, plays both a catalytic and a structural
role in protein synthesis. In E. coli, there are three kinds of rRNA, called 23S, 16S, and 5S RNA
because of their sedimentation behaviour. One molecule of each of these species of rRNA is present
in each ribosome.
• The first catalytic RNA was discovered by Cech and coworkers in the early 1980s. Most naturally
occurring ribozymes have a role in mRNA splicing. However, in vitro evolution has resulted in
ribozymes with a variety of different functions.
DNA to RNA (transcription)

The 2006 Nobel Prize in Chemistry was awarded to Roger Kornberg for his work in
determining the mechanism of RNA polymerase, including solving this crystal structure.
David S. Goodsell: The Molecule of the Month appearing at the PDB

• RNA synthesis (transcription) is the process of transcribing DNA nucleotide sequence information into RNA sequence information.
RNA synthesis is catalyzed by a large enzyme called RNA polymerase. The basic biochemistry of RNA synthesis is common to
prokaryotes and eukaryotes, although its regulation is more complex in eukaryotes. Despite substantial differences in size and number
of polypeptide subunits, the overall structures of these enzymes are quite similar between prokaryotes and eukaryotes, revealing a
common evolutionary origin.
• RNA synthesis takes place in three stages: initiation, elongation, and termination. RNA polymerase performs multiple functions in this
process:
• It searches DNA for initiation sites, also called promoter sites or simply promoters.
• It unwinds a short stretch of double-helical DNA to produce a single-stranded DNA template from which it will ‘read’ the sequence.
• It selects the correct ribonucleoside triphosphate and catalyzes the formation of a phosphodiester bond. RNA polymerase is
completely processive - a transcript is synthesized from start to end by a single RNA polymerase molecule.
• It detects termination signals that specify where a transcript ends.
• It interacts with activator and repressor proteins that modulate the rate of transcription initiation over a wide dynamic range. These
proteins, which play a more prominent role in eukaryotes than in prokaryotes, are called transcription factors. RNA polymerase is a
huge factory with many moving parts. The one shown here, from PDB entry 1i6h, is from yeast (Saccharomyces cerevisiae). It is
composed of a dozen different proteins. Together, they form a machine that surrounds DNA strands, unwinds them, and builds an
RNA strand based on the sequence of the DNA. Once the enzyme gets started, RNA polymerase continues along the DNA
copying RNA strands thousands of nucleotides long.
• In contrast with DNA synthesis, RNA synthesis can start de novo, without the requirement for a primer.
• Most newly synthesized RNA chains carry a highly distinctive tag on the 5’ end: the first base at that end is either pppG or pppA.
• RNA chains, like DNA chains, grow in the 5’-3’ direction.
• The Nobel Prize in Chemistry for 2006 went to Roger D. Kornberg of Stanford University, CA, USA "for his studies of the molecular
basis of eukaryotic transcription" (http://nobelprize.org/nobel_prizes/chemistry/laureates/2006/index.html)
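
The template-directed, 5’-3’ synthesis described above can be sketched in a few lines of Python. This is only a toy illustration: the function name `transcribe` and the example template are made up for this sketch, and it ignores promoters, terminators, and everything else RNA polymerase actually has to deal with.

```python
# Toy model of transcription: the template DNA strand is read 3'->5'
# and the RNA is built 5'->3' by base pairing (A in DNA pairs with U in RNA).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_5_to_3: str) -> str:
    """Return the RNA (written 5'->3') made from a template strand written 5'->3'."""
    # Reading the template from its 3' end is the same as reversing the 5'->3' string.
    return "".join(DNA_TO_RNA[base] for base in reversed(template_5_to_3.upper()))

template = "TTAGCCAT"        # hypothetical template strand, written 5'->3'
rna = transcribe(template)   # "AUGGCUAA"
print(rna)
```

Note that the RNA has the same sequence as the non-template (coding) DNA strand, with U in place of T.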
The transcription bubble

David S. Goodsell: The Molecule of the Month appearing at the PDB

• The region containing RNA polymerase, DNA, and newly synthesized RNA is called a transcription bubble because it contains a locally melted
“bubble” of DNA. The newly synthesized RNA forms a hybrid helix with the template DNA strand. This RNA-DNA helix is about 8 bp long, which
corresponds to nearly one turn of a double helix (10.4 bp/turn in B-form).
• The 3’ hydroxyl group of the RNA in this hybrid helix is positioned so that it can attack the alpha-phosphorus atom of an incoming ribonucleoside
triphosphate. The core enzyme also contains a binding site for the other DNA strand. About 17 bp of DNA are unwound throughout the
elongation phase, as in the initiation phase. The transcription bubble moves a distance of 170 Å (17 nm) in a second, which corresponds to a
rate of elongation of about 50 nucleotides per second. Although rapid, it is much slower than the rate of DNA synthesis, which is about 800
nucleotides per second.
• As you might expect, RNA polymerase needs to be accurate in its copying of genetic information. To improve its accuracy, it performs a simple
proofreading step as it builds an RNA strand. The active site is designed to be able to remove nucleotides as well as add them to the growing
strand. The enzyme tends to hover around mismatched nucleotides longer than properly added ones, giving the enzyme time to remove them.
This process is somewhat wasteful, since proper nucleotides are also occasionally removed, but this is a small price to pay for creating better
RNA transcripts. Overall, RNA polymerase makes an error about once in 10,000 nucleotides added, or about once per RNA strand created. This
rate is about 10^4-fold higher than that of DNA synthesis. The much lower fidelity of RNA synthesis can be tolerated because mistakes are not
transmitted to progeny. For most genes, many RNA transcripts are synthesized; a few defective transcripts are unlikely to be harmful.
• PDB entry 1msw reveals the structure of a very small RNA polymerase that is made by the bacteriophage T7, shown here with blue tubes. A
small transcription bubble, composed of two DNA strands and an RNA strand, is bound in the active site. Notice how the two DNA strands form a
double helix at the top of the picture. The enzyme separates them in the middle and builds an RNA strand using the DNA on the right. Finally, at
the bottom, the two DNA strands come back together.
• This structure was not determined by Roger Kornberg, but rather by Tom Steitz, a famous X-ray crystallographer. Professor Steitz was awarded
the Nobel Prize in Chemistry 2009 "for studies of the structure and function of the ribosome"
(http://nobelprize.org/nobel_prizes/chemistry/laureates/2009/index.html)
• Question: What does "many RNA transcripts are synthesized" mean?
• Answer: This statement refers to the fact that there is not just one mRNA molecule for each gene. When a gene is being expressed, many
RNA polymerases are copying it and making many mRNA molecules. If a couple of them have a mistake, it is probably not a big deal.
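
The numbers quoted on this slide can be checked with some simple arithmetic. The sketch below uses the elongation rate and error rate from the notes; the 3.4 Å rise per base pair of B-form DNA is an assumed textbook value, and the function names are just for illustration.

```python
# Back-of-the-envelope numbers for transcription elongation.
RISE_PER_NT_ANGSTROM = 3.4        # assumed rise per base pair of B-form DNA
ELONGATION_NT_PER_S = 50          # elongation rate quoted in the notes
ERROR_RATE_PER_NT = 1 / 10_000    # RNA polymerase error rate quoted in the notes

def bubble_speed(rate_nt_per_s: float = ELONGATION_NT_PER_S) -> float:
    """Distance the transcription bubble travels per second, in angstroms."""
    return rate_nt_per_s * RISE_PER_NT_ANGSTROM

def transcription_time(length_nt: int, rate_nt_per_s: float = ELONGATION_NT_PER_S) -> float:
    """Seconds needed to synthesize a transcript of the given length."""
    return length_nt / rate_nt_per_s

def expected_errors(length_nt: int, error_rate: float = ERROR_RATE_PER_NT) -> float:
    """Average number of misincorporated nucleotides per transcript."""
    return length_nt * error_rate

print(bubble_speed())              # about 170 angstroms per second
print(transcription_time(1200))    # 24.0 s for a 1.2 kb E. coli mRNA
print(expected_errors(1200))       # about 0.12 errors per transcript
```

For an average 1.2 kb E. coli mRNA, most transcripts should therefore contain no errors at all, which fits the point that a few defective copies are tolerable.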
Transcription is a highly regulated process
Example: the lac operon

lactose absent lactose present

For transcription to occur, lactose must bind to the lac repressor.
This binding changes the conformation of the protein such that it
can no longer bind to the operator site and interfere with the
function of RNA polymerase.

• With only a few exceptions, every cell of the body contains a full set of chromosomes and
identical genes. Only a fraction of these genes are turned on at any one time, however, and it is
the subset that is "expressed" that confers unique properties to each cell type.
• "Gene expression" is the term used to describe the transcription of the information contained
within the DNA, the repository of genetic information, into messenger RNA (mRNA) molecules
that are then translated into the proteins that perform most of the critical functions of cells.
• Biologists study the kinds and amounts of mRNA produced by a cell to learn which genes are
expressed, which in turn provides insights into how the cell responds to its changing needs.
• Gene expression is a highly complex and tightly regulated process that allows a cell to respond
dynamically both to environmental stimuli and to its own changing needs. This mechanism acts
as both an "on/off" switch to control which genes are expressed in a cell as well as a "volume
control" that increases or decreases the level of expression of particular genes as necessary.
• The lac operon shown in this movie is one of the simplest gene regulation mechanisms, but it is
actually a bit more complicated than the extremely simplified version shown here.
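
The extremely simplified version of the lac operon can be written as a two-line logic function. This sketch captures only the repressor/operator switch described on this slide; the function name is made up, and the real operon has additional layers of regulation that are not modelled here.

```python
def lac_operon_transcribed(lactose_present: bool) -> bool:
    """Simplified lac operon: the repressor blocks transcription unless lactose is bound to it."""
    # With lactose bound, the repressor changes shape and releases the operator.
    repressor_on_operator = not lactose_present
    # RNA polymerase can transcribe only when the operator is free.
    return not repressor_on_operator

for lactose in (False, True):
    print(f"lactose present: {lactose} -> transcription: {lac_operon_transcribed(lactose)}")
```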
The mechanisms of DNA and RNA elongation
are similar

Active site of DNA polymerase Active site of RNA polymerase

• The catalytic site of RNA polymerase resembles that of DNA polymerase in that it includes two metal
ions in its active form. One metal ion remains bound to the enzyme, whereas the other appears to
come in with the nucleoside triphosphate and leave with the pyrophosphate. Three conserved
aspartate residues of the enzyme participate in binding these metal ions. Note that the overall
structures of DNA polymerase and RNA polymerase are quite different; their similar active sites are
the products of convergent evolution.
Transcription is much more complex in
eukaryotes than in prokaryotes

• In prokaryotes (bacterial and archaeal cells defined by the fact that they lack a nucleus), translation
of mRNA begins while the transcript is still being synthesized.
• In eukaryotes (animal, plant, and fungal cells, defined by the fact that they have a nucleus),
transcription and translation take place in different cellular compartments: transcription takes place
in the membrane-bounded nucleus, whereas translation takes place outside the nucleus in the
cytoplasm.
• A second major difference between prokaryotes and eukaryotes is the extent of RNA processing.
Eukaryotes extensively process nascent pre-mRNA destined to become mature mRNA. Primary
transcripts (pre-mRNA molecules), the products of RNA polymerase action, acquire a cap at their 5’
ends and a poly(A) tail at their 3’ ends. Most importantly, nearly all mRNA precursors in higher
eukaryotes are spliced.
• primary transcript: Initial RNA product, containing introns and exons, produced by transcription of
DNA. Many primary transcripts must undergo RNA processing to form the physiologically active
RNA species.

• Question: Many pictures show only the mRNA. What about tRNA and rRNA: do they also go
through transcription and processing?
• Answer: tRNA and rRNA are encoded in the genome and are synthesized by RNA polymerases just
like mRNA is. They also undergo processing, but it is different from the processing that occurs
for mRNA.
Many ribosomes can
translate a single mRNA
simultaneously

• The sequence of amino acids in a protein is translated from the nucleotide sequence in mRNA. In
which direction is the message read? The direction of translation is 5’ to 3’ in terms of the reading of
the mRNA template. This corresponds to synthesis from the N-to-C terminus in terms of the protein
product.
• The direction of translation has important consequences. Recall that transcription also occurs in the
5’-3’ direction. If the direction of translation were opposite that of transcription, only fully synthesized
mRNA could be translated.
• In contrast, because the directions are the same, mRNA can be translated while it is being
synthesized. In prokaryotes, almost no time is lost between transcription and translation. The 5’ end
of mRNA interacts with ribosomes very soon after it is made, well before the 3’ end of the mRNA
molecule is finished.
• An important feature of prokaryotic gene expression is that translation and transcription are closely
coupled in space and time. Many ribosomes can be translating an mRNA molecule simultaneously.
This parallel synthesis markedly increases the efficiency of mRNA translation.
Mature eukaryotic vs. prokaryotic mRNA

Eukaryotic mRNA

• The 5' cap (also called an RNA cap, an RNA 7-methylguanosine cap or an RNA m7G cap) is a
modified guanine nucleotide that has been added to the 5' end of the messenger RNA shortly after
the start of transcription. The 5' cap consists of a terminal 7-methylguanosine residue which is linked
through a 5'-5'-triphosphate bond to the first transcribed nucleotide. Its presence is critical for
recognition by the ribosome and protection from RNases.
• Coding regions are composed of codons, which are decoded and translated into protein by the
ribosome. Coding regions begin with the start codon and end with one of three possible stop
codons. In addition to their protein-coding role, portions of coding regions may also serve as
regulatory sequences.
• Untranslated regions (UTRs) are sections of the RNA before the start codon and after the stop
codon that are not translated, termed the five-prime untranslated region (5' UTR) and three-prime
untranslated region (3' UTR), respectively. These regions are transcribed as part of the same
transcript as the coding region. Several roles in gene expression have been attributed to the
untranslated regions, including mRNA stability, mRNA localization, and translational efficiency. The
ability of a UTR to perform these functions depends on the sequence of the UTR and can differ
between mRNAs.
• The 3' poly(A) tail is a long sequence (often several hundred) of adenine nucleotides added to the
"tail" (3' end) of the pre-mRNA.
• From http://en.wikipedia.org/wiki/MRNA
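
The layout of 5' UTR, coding region, and 3' UTR described above can be illustrated with a short sketch. It assumes, as a simplification, that the first AUG in the sequence is the start codon; the function name and the example sequence are hypothetical, and the 5' cap is not modelled.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def split_mrna(mrna: str) -> tuple[str, str, str]:
    """Split an mRNA (written 5'->3') into (5' UTR, coding region, 3' UTR).

    Simplifying assumption: the first AUG is the start codon.
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        raise ValueError("no start codon found")
    # Walk codon by codon, in frame with the start codon, until a stop codon.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3  # the coding region includes the stop codon
            return mrna[:start], mrna[start:end], mrna[end:]
    raise ValueError("no in-frame stop codon found")

# Hypothetical mini-mRNA: 5' UTR + coding region + 3' UTR with a short poly(A) tail
example = "GGCACC" + "AUGGCUUUCUAA" + "CCUUAAAAAA"
print(split_mrna(example))  # ('GGCACC', 'AUGGCUUUCUAA', 'CCUUAAAAAA')
```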
Splicing of mammalian mRNA:
Introns and Exons

The primary transcript is ‘spliced’ to form the correct reading sequence of the gene

• Intron: Part of a primary transcript (or the DNA encoding it) that is removed by splicing during RNA
processing and is not included in the mature, functional mRNA, rRNA, or tRNA; also called
intervening sequence.
• Exon: Segment of a eukaryotic gene (or of its primary transcript) that reaches the cytoplasm as part
of a mature mRNA, rRNA, or tRNA molecule.
• Introns are precisely excised from primary transcripts, and exons are joined to form mature mRNAs
with continuous messages. Alternative splicing enlarges the repertoire of proteins in eukaryotes and
is a clear illustration of why the proteome is more complex than the genome.
• Right hand figure and following legend from Nature Reviews Genetics 5, 389-396 (May 2004)
“There are several conserved motifs in the nucleotide sequences near the intron–exon boundaries
that act as essential splicing signals: GU and AG dinucleotides at the exon–intron and intron–exon
junctions, respectively (5'- and 3'-splice sites), a polypyrimidine tract (Py)n and an A nucleotide at the
branch site. Splicing takes place in two transesterification steps. In the first step, the 2'-hydroxyl
group of the A residue at the branch site attacks the phosphate at the GU 5'-splice site. This leads to
cleavage of the 5' exon from the intron and the formation of a lariat intermediate. In the following
step, a second transesterification reaction, which involves the phosphate (p) at the 3' end of the
intron and the 3'-hydroxyl group of the detached exon, ligates the two exons. This reaction releases
the intron, still in the form of a lariat.”
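
The net result of splicing (introns out, exons joined) can be sketched as string manipulation. This is a minimal illustration that assumes the intron positions are already known; it checks only the GU and AG dinucleotides at the intron ends and does not model the branch site, the lariat, or the spliceosome. The function name and the example pre-mRNA are made up.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns, given as (start, end) half-open index pairs, and join the exons.

    Each intron must begin with GU (5'-splice site) and end with AG (3'-splice site).
    """
    pre_mrna = pre_mrna.upper()
    exons = []
    position = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} lacks GU...AG boundaries")
        exons.append(pre_mrna[position:start])  # exon upstream of this intron
        position = end
    exons.append(pre_mrna[position:])           # last exon
    return "".join(exons)

# Hypothetical pre-mRNA: exon 1 + intron (GU ... AG) + exon 2
pre = "AUGGCU" + "GUAAGUCUAACUUUUCCAG" + "UUCUAA"
print(splice(pre, [(6, 25)]))  # AUGGCUUUCUAA
```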
HIV: reverse transcriptase is essential

see ‘HIV life cycle’ animation on webpage

• Retroviruses: these viruses can reverse the flow of genetic information (RNA to DNA rather than
from DNA to RNA)! The most famous retrovirus is human immunodeficiency virus 1 (HIV-1), the
cause of AIDS. Retroviruses have two identical copies of a single-stranded RNA genome and an
outer envelope containing protruding viral glycoproteins.
1. The retroviral envelope fuses directly with the plasma membrane (step 1).
2. Following fusion, the nucleocapsid enters the cytoplasm of the cell; then deoxynucleoside
triphosphates from the cytosol enter the nucleocapsid, where viral reverse transcriptase
and other proteins copy the ssRNA genome of the virus into a dsDNA copy (step 2).
3. The viral DNA copy is transported into the nucleus (only one host-cell chromosome is
depicted) and integrated into one of many possible sites in the host-cell chromosomal DNA
(step 3).
4. The integrated viral DNA, referred to as a provirus, is transcribed by the host-cell RNA
polymerase, generating mRNAs (dark red) and genomic RNA molecules (light red). The
host-cell machinery translates the viral mRNAs into glycoproteins and nucleocapsid
proteins (step 4).
5. The latter assemble with genomic RNA to form progeny nucleocapsids, which interact with
the membrane-bound viral glycoproteins. Eventually the host-cell membrane buds out and
progeny virions are pinched off (step 5).

HIV: reverse transcriptase

+ primer + primer

David S. Goodsell: The Molecule of the Month appearing at the PDB

• Reverse transcriptase performs several different functions. As indicated by the name, it can build
DNA strands based on an RNA template. This reaction is performed in the polymerase active site,
which is formed by two sets of arms that surround the RNA and DNA. The polymerase site is at the
top in this illustration, taken from PDB entry 2hmi. After building the DNA strand, the enzyme then
removes the original RNA strand by cleaving it into pieces. This is performed by a nuclease active
site, which is located at the opposite end of the enzyme. Finally, it builds a second DNA strand
matched to the one that was just created to form the final DNA double helix. This reaction is also
performed by the polymerase site.
• Reverse transcriptase performs a remarkable feat, reversing the normal flow of genetic information,
but it is rather sloppy in its job. The polymerases used to make DNA and RNA in cells are very
accurate and make very few mistakes. This is essential because they are the caretakers of our
genetic information, and mistakes may be passed on to our offspring. Reverse transcriptase, on the
other hand, makes lots of mistakes, up to about one in every 2,000 bases that it copies. This high
error rate turns out to be an advantage for the virus in this era of drug treatment. The errors allow
HIV to mutate rapidly, finding drug resistant strains in a matter of weeks after treatment begins.
Fortunately, recently developed treatments that combine several drugs are often effective in
combating this problem. Since the virus is simultaneously attacked by several different drugs, it
cannot mutate to evade all of them at the same time.
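
To get a feel for what "one in every 2,000 bases" means, here is a rough calculation. The error rate is the upper estimate from the notes; the genome length of about 9,700 nucleotides is an assumed approximate value that is not given in these notes, and errors are treated as independent events.

```python
RT_ERROR_RATE = 1 / 2000   # upper estimate quoted in the notes
HIV_GENOME_NT = 9700       # assumed approximate genome length (not from the notes)

def expected_mutations(genome_nt: int = HIV_GENOME_NT,
                       error_rate: float = RT_ERROR_RATE) -> float:
    """Average number of errors introduced each time the genome is copied."""
    return genome_nt * error_rate

def prob_at_least_one_mutation(genome_nt: int = HIV_GENOME_NT,
                               error_rate: float = RT_ERROR_RATE) -> float:
    """Probability that a copy contains one or more errors (independent errors assumed)."""
    return 1 - (1 - error_rate) ** genome_nt

print(expected_mutations())           # a handful of errors per copy
print(prob_at_least_one_mutation())   # close to 1
```

Under these assumptions nearly every copied genome carries at least one change, which is why resistant variants can appear so quickly.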
RNA can adopt well-defined tertiary structures

http://prion.bchs.uh.edu/bp_type/bp_structure.html

• Unlike DNA, which exists primarily in a single, very long three-dimensional structure, the double
helix, the various types of RNA exhibit different conformations. Differences in the sizes and
conformations of the various types of RNA permit them to carry out specific functions in a cell.
• The simplest secondary structures in single-stranded RNAs are formed by pairing of
complementary bases. “Hairpins” are formed by pairing of bases within ~ 5 to 10 nucleotides of
each other, and “stem-loops” by pairing of bases that are separated by ~50 to several hundred
nucleotides. These simple folds can cooperate to form more complicated tertiary structures, one
of which is termed a “pseudoknot”. As discussed on the next page, tRNA molecules adopt a well-
defined three-dimensional architecture in solution that is crucial in protein synthesis.
• Stem-loops, hairpins, and other secondary structures can form by base pairing between distant
complementary segments of an RNA molecule. In stem-loops, the single-stranded loop (dark red)
between the base-paired helical stem (light red) may be hundreds or even thousands of
nucleotides long, whereas in hairpins, the short turn may contain as few as 6 – 8 nucleotides.
• Interactions between the flexible loops may result in further folding to form tertiary structures such
as the pseudoknot. This tertiary structure resembles a figure-eight knot, but the free ends do not
pass through the loops, so no knot is actually formed.
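
The idea that a hairpin forms when a stretch of RNA can pair with a nearby complementary stretch can be illustrated with a naive search. This is only a sketch: it looks for perfect Watson-Crick stems of a fixed length, ignores G-U pairs and energetics, and the function names and default parameters are arbitrary choices for this example.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna: str) -> str:
    """Sequence that can base pair with `rna` in an antiparallel fashion."""
    return "".join(PAIR[base] for base in reversed(rna))

def find_hairpins(rna: str, stem_len: int = 4, min_loop: int = 3, max_loop: int = 10):
    """Return (stem_start, partner_start, loop_length) for each perfect hairpin found."""
    hits = []
    n = len(rna)
    for i in range(n - 2 * stem_len - min_loop + 1):
        target = reverse_complement(rna[i:i + stem_len])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem_len + loop          # where the 3' side of the stem would start
            if j + stem_len > n:
                break
            if rna[j:j + stem_len] == target:
                hits.append((i, j, loop))
    return hits

print(find_hairpins("GGGCAAAAGCCC"))  # [(0, 8, 4)]: GGGC pairs with GCCC around an AAAA loop
```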
RNA can adopt well-defined tertiary structures
>Yeast phenylalanine tRNA
GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUG
AAGAUCUGGAGGUCCUGUGUUCGAUCCACAGAAU
UCGCACCA

primary (1°) secondary (2°) tertiary (3°)

http://ndbserver.rutgers.edu/atlas/xray/structures/T/tr0001/tr0001.html

• Transfer RNA (abbreviated tRNA) is a small RNA chain (73-93 nucleotides) that transfers a specific
amino acid to a growing polypeptide chain at the ribosomal site of protein synthesis during translation. It
has a 3' terminal site for amino acid attachment. This covalent linkage is catalyzed by an aminoacyl tRNA
synthetase. It also contains a three base region called the anticodon that can base pair to the
corresponding three base codon region on mRNA. Each type of tRNA molecule can be attached to only
one type of amino acid, but because the genetic code contains multiple codons that specify the same
amino acid, tRNA molecules bearing different anticodons may also carry the same amino acid.
• http://en.wikipedia.org/wiki/Transfer_RNA
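
Codon-anticodon pairing can be illustrated with the yeast phenylalanine tRNA sequence from this slide. The sketch assumes the anticodon sits at positions 34-36 of the 76-nucleotide sequence (the usual tRNA numbering), uses only Watson-Crick pairing, and ignores modified bases and wobble; the function name is made up.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Yeast phenylalanine tRNA, unmodified sequence as given on the slide (5'->3')
YEAST_PHE_TRNA = (
    "GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUG"
    "AAGAUCUGGAGGUCCUGUGUUCGAUCCACAGAAU"
    "UCGCACCA"
)

def codon_for_anticodon(anticodon: str) -> str:
    """Codon (5'->3') that pairs antiparallel with the given anticodon (5'->3')."""
    return "".join(PAIR[base] for base in reversed(anticodon))

anticodon = YEAST_PHE_TRNA[33:36]       # positions 34-36 (assumed standard numbering)
print(anticodon)                        # GAA
print(codon_for_anticodon(anticodon))   # UUC, a phenylalanine codon
```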

• mFold is a tool that enables the prediction of DNA or RNA secondary structure.
• It has been in operation since 1995, making it one of the oldest bioinformatics tools on the web.
• http://mfold.rna.albany.edu/?q=mfold/RNA-Folding-Form
• mFold was developed primarily by Dr. Michael Zuker, now at the Rensselaer Polytechnic Institute,
while he was affiliated with the NRCC and later with Washington University, in St. Louis.
mFold does a fairly good job of predicting
tRNA 2° structure

Rotate and flip

• These are the results obtained when I submitted the yeast phenylalanine tRNA sequence to the mfold
server
• Yeast phenylalanine tRNA
GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAGAUCUGGAGGUCCUGUG
UUCGAUCCACAGAAUUCGCACCA
• The predicted structures are practically identical to the known structure that has been
experimentally determined and verified using multiple techniques.
But the true structure is not always the one
predicted to have the lowest energy
Human Phenylalanine tRNA
GCCGAAAUAGCUCAGUUGGGAGAGCGUUAGACUGAAGAUC
UAAAGGUCCCUGGUUCGAUCCCGGGUUUCGGCA

dG = -30.1 dG = -29.2 dG = -29.0 dG = -28.3

• Try it yourself using the human Phe tRNA sequence. This sequence is available on the website
as .txt.
• Q. What factor is mFold not taking into account that could explain the difference between
theoretical and experimental 2° structures?
• A. The tertiary structure. There could be contacts in 3 dimensions between regions that are distant
in primary and secondary structure. The contacts could provide additional stabilization to one
particular arrangement of secondary structural elements.
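
One way to see why the lowest-energy prediction is not automatically "the" structure is to turn the four dG values above into relative populations with a Boltzmann distribution. This sketch assumes the dG values are in kcal/mol at 37 °C and, unrealistically, that these four folds are the only possibilities; the function name is made up.

```python
import math

R_KCAL_PER_MOL_K = 0.0019872   # gas constant in kcal/(mol K)
TEMPERATURE_K = 310.15         # 37 degrees C (assumed folding temperature)

def boltzmann_fractions(dg_values, temperature=TEMPERATURE_K):
    """Relative populations of structures with the given free energies (kcal/mol)."""
    rt = R_KCAL_PER_MOL_K * temperature
    weights = [math.exp(-dg / rt) for dg in dg_values]
    total = sum(weights)
    return [w / total for w in weights]

predicted_dg = [-30.1, -29.2, -29.0, -28.3]   # the four predicted folds from the slide
for dg, fraction in zip(predicted_dg, boltzmann_fractions(predicted_dg)):
    print(f"dG = {dg:6.1f} kcal/mol -> fraction {fraction:.2f}")
```

Because the energies differ by less than 2 kcal/mol, the alternative folds are far from negligible, so a modest amount of extra stabilization from tertiary contacts could change which one wins.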
Summary of RNA and Transcription
• RNA polymerase synthesizes RNA from a DNA template
• RNA polymerase locally unwinds the double stranded DNA to
make a ‘transcription bubble’
• The catalytic mechanisms of RNA polymerase and DNA
polymerase are very similar
• RNA has 3 main roles in protein synthesis but this will be
discussed in more detail next class
• RNA production is a highly regulated process
• In eukaryotes, mRNA is initially produced as a series of exons
and introns. Splicing out of the introns, plus further modifications,
provides the mature mRNA.
• RNA can adopt well-defined tertiary structures. Software is getting
pretty good at predicting RNA secondary structure.
• Reverse transcriptase goes against the standard ‘flow of
information’: it makes DNA from RNA
