
Enhancer: An enhancer is a nucleotide sequence to which transcription factor(s) bind, and which
increases the transcription of a gene. It is NOT part of a promoter; the basic difference being that an
enhancer can be moved around anywhere in the general vicinity of the gene (within several thousand
nucleotides on either side or even within an intron), and it will still function. It can even be clipped out
and spliced back in backwards, and will still operate. A promoter, on the other hand, is position- and
orientation-dependent. Some enhancers are "conditional" - in other words, they enhance
transcription only under certain conditions, for example in the presence of a hormone.
Expression: To "express" a gene is to cause it to function. A gene which encodes a protein will, when
expressed, be transcribed and translated to produce that protein. A gene which encodes an RNA
rather than a protein (for example, a rRNA gene) will produce that RNA when expressed.
Nick translation: A method for incorporating radioactive isotopes (typically 32P) into a piece of DNA.
The DNA is randomly nicked by DNase I, and then starting from those nicks DNA polymerase I digests
and then replaces a stretch of DNA. Radiolabeled precursor nucleotide triphosphates can thus be
incorporated.
Non-coding strand: Anti-sense strand.
Conditional gene expression - controlled inducible expression of transgene either in vitro or in vivo.
Constitutive gene or constitutive expression - a gene that is transcribed continually compared to a
facultative gene which is only transcribed as needed.
Cis-dominant - mutations (e.g., of an operator) that alter the functioning of genes on that same piece
of DNA. It arises because the operator represents a site on the DNA rather than a gene that encodes a
product.
Down regulation - decreasing the rate of gene expression.
Housekeeping gene - typically a constitutive gene that is transcribed at a relatively constant level
across many or all known conditions.
Inducible gene - a gene whose expression is either responsive to environmental change or dependent
on the position of the cell cycle.
Promoter: region of DNA crucial to the accuracy and rate of transcription initiation. Usually, but not
always, immediately upstream of the gene itself. Region to which RNA polymerase binds in order to
initiate transcription.
Proofreading: The ability of a DNA polymerase to correct misincorporated nucleotides as a result of
its 3' to 5' exonuclease activity.
Operon: section of DNA in which two or more related genes lie adjacent to one another and are
transcribed from a single promoter into polycistronic mRNA. Common in bacteria, rare or unknown in
eukaryotes.

Repression - decreasing the rate of gene expression.


Repressor - a DNA-binding protein that regulates the expression of one or more genes by binding to
the operator and blocking the attachment of RNA polymerase to the promoter, thus preventing
transcription of the genes.
RNA splicing - modification of an RNA strand where exons (the coding regions of a transcribed gene)
are retained and the introns are removed. Sometimes the exons are recombined either in vivo or
experimentally to form alternative splicings, which have various functional effects.
Overlapping genes: Two genes whose coding regions overlap, either completely or partially.
Palindrome: DNA sequence which reads the same in both directions, taking account of the two
strands. A simple example is
5'-AAAAAATTTTTT-3'
3'-TTTTTTAAAAAA-5'
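
The definition above can be checked mechanically: a sequence is palindromic in this sense when it equals its own reverse complement. The following is a minimal sketch, assuming plain uppercase or lowercase A/C/G/T strings written 5' to 3'; the function names are illustrative and not taken from any library.

```python
# Watson-Crick complement of each DNA base.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, also written 5' -> 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindrome(seq: str) -> bool:
    """True if the sequence reads the same 5' -> 3' on both strands."""
    return seq.upper() == reverse_complement(seq)


if __name__ == "__main__":
    print(is_palindrome("AAAAAATTTTTT"))  # the example from the text
    print(is_palindrome("GATTACA"))       # an arbitrary non-palindromic sequence
```

Note that a DNA palindrome is not a string palindrome: the second strand is read in the opposite direction, so the comparison is against the reverse complement rather than the simple reversal.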

Negative control: regulation of transcription in which factors are normally present to prevent RNA
synthesis; activity only occurs after removal of such factors (repressors).

Physical mapping: a linear map of the locations of genes on a chromosome as determined by the
physical detection of overlaps between cloned DNA fragments rather than by linkage analysis.

Primer: A small oligonucleotide (anywhere from 6 to 50 nt long) used to prime DNA synthesis. The
DNA polymerases are only able to extend a pre-existing strand along a template; they are not able to
take a naked single strand and produce a complementary copy of it de-novo. A primer which sticks to
the template is therefore used to initiate the replication. Primers are necessary for DNA sequencing
and PCR.
Upstream/Downstream: In an RNA, anything towards the 5' end of a reference point is "upstream" of
that point, and anything towards the 3' end is "downstream". This orientation reflects the direction of both the synthesis of mRNA and its translation,
from the 5' end to the 3' end. In DNA, the situation is a bit more complicated. In the vicinity of a gene
(or in a cDNA), the DNA has two strands, but one strand is virtually a duplicate of the RNA, so its 5'
and 3' ends determine upstream and downstream, respectively. NOTE that in genomic DNA, two
adjacent genes may be on different strands and thus oriented in opposite directions. Upstream or
downstream is only used in conjunction with a given gene.

CHAPTER: 1 DNA

Composition of nucleic acids


Nucleic acids are biopolymers of high molecular weight with mononucleotide as their repeating
units. Each mononucleotide consists of the following:
(A) Nitrogenous bases
(B) Phosphoric acid
(C) Pentose sugars

(A) Nitrogenous bases


Two types of major nitrogenous bases, which account for the base composition of DNA or RNA,
are found in all nucleic acids. These are:
a) Purine bases
b) Pyrimidine bases

[Figure: structures of the pyrimidine and purine bases]

Both the purine and pyrimidine bases are planar molecules, owing to their π-electron clouds. Purine and
pyrimidine bases are hydrophobic and relatively insoluble in water at the near-neutral pH of the cell. When attached to a sugar, purines
can exist in syn or anti forms; pyrimidines are largely restricted to the anti form because of steric interference between
the sugar and the carbonyl oxygen at C-2 of the pyrimidine.
Besides the major nitrogenous bases, some minor bases, also called modified nitrogenous bases (purines
and pyrimidines), occur in polynucleotide structures.
Some naturally occurring modified purines are hypoxanthine, xanthine, uric acid,
6-methyladenine (6-Me), 6-dimethyladenine (6-DiMe), 6-N-isopentenyladenine (6-IPA), 1-methylguanine
(1-MeG) and 2-dimethylguanine (2-DiMeG). Among the modified purines, some are found in tRNA.
Methylation is the most common form of purine modification in microorganisms.
Some naturally occurring modified pyrimidines (e.g. 5,6-dihydrouracil, pseudouracil, 4-thiouracil)
are common in tRNA (described later). Other examples include 5-methylcytosine (5-MeC) and
5-hydroxymethylcytosine. 5-Methylcytosine is a common component of higher plant and animal DNA.
In fact, up to 25% of the cytosine residues of plant genomes are methylated. The DNA of plants is richer in
5-MeC than the DNA of animals. The DNA of the T-even bacteriophages (T2, T4) of E. coli has no cytosine
but instead has 5-hydroxymethylcytosine and its glucoside derivatives.

(B) Phosphoric acid
Phosphorus, present in the backbone of nucleic acids, is a constituent of the phosphodiester bond that links
two sugar moieties. The molecular formula of phosphoric acid is H3PO4. It contains three monovalent
hydroxyl groups and a divalent oxygen atom, all linked to the pentavalent phosphorus atom.
(C) Sugar
Both DNA and RNA contain a five-carbon aldose sugar, i.e. a pentose sugar. The essential difference
between DNA and RNA is the type of sugar they contain. RNA contains the sugar D-ribose (hence called
ribonucleic acid, RNA) whereas DNA contains its derivative 2’-deoxy-D-ribose, where the 2’-hydroxyl
group of ribose is replaced by hydrogen (hence called deoxyribonucleic acid, DNA). Sugars are always in the
closed-ring β-furanose form in nucleic acids and hence are called furanose sugars because of their
similarity to the heterocyclic compound furan.

(A) Nucleosides
The nucleosides are compounds in which nitrogenous bases (purines and pyrimidines) are conjugated to
the pentose sugars (ribose or deoxyribose) by a β-N-glycosidic linkage. They consist of a base joined to a
pentose sugar at position C1′: the sugar C1′ carbon atom is joined to the N1 atom of a pyrimidine or the
N9 atom of a purine. Thus, the purine nucleosides are N-9 glycosides
and the pyrimidine nucleosides are N-1 glycosides. These are stable in alkali. The purine nucleosides are
readily hydrolyzed by acid, whereas pyrimidine nucleosides are hydrolyzed only after prolonged
treatment with concentrated acid.
Comparison of some morphological features and selected bond torsion
angles and helical parameters of the three major types of DNA helix:

In pseudouridine, the base is attached to the sugar through C5 of the base, whereas in
uridine the base is attached to the sugar through N1.

Two nucleoside analogues, 3′-azido-3′-deoxythymidine (AZT) and 2′,3′-dideoxycytidine (DDC), have found
therapeutic use in the treatment of AIDS patients.

(B) Nucleotides or Nucleoside 5’-triphosphates
These are phosphate esters of nucleosides, i.e. nucleosides form nucleotides by joining with phosphoric acid. Esterification can occur at any free
hydroxyl group, but is most common at the 5′ and 3′ positions of the sugar.

Energy carriers: Nucleotides are energy-rich compounds that drive metabolic processes, especially
biosynthetic ones, in all cells. Hydrolysis of nucleoside triphosphates provides the chemical energy to drive a
wide variety of cellular reactions. ATP is the most widely used for this purpose; UTP, GTP and CTP are also
used. Nucleoside triphosphates also serve as the activated precursors of DNA and RNA synthesis. The
hydrolysis of the ester linkage (between ribose and the α-phosphate) yields about 14 kJ / mol under standard
conditions, whereas hydrolysis of each anhydride bond (between the α-β and β-γ phosphates) yields about
30 kJ / mol. ATP hydrolysis often plays an important thermodynamic role in biosynthesis.

Enzyme cofactors: Many enzyme cofactors include adenosine in their structure, e.g., NAD, NADP, FAD.
Chemical messengers: Some nucleotides act as regulatory molecules and serve as chemical signals or
secondary messengers, key links in cellular systems that respond to hormones and other extracellular
stimuli and lead to adaptive changes in the cell's interior. Two hydroxyl groups can be esterified by the same
phosphate moiety to generate cyclic AMP (cAMP, adenosine 3’,5’-cyclic phosphate) or cyclic GMP
(cGMP, guanosine 3’,5’-cyclic phosphate).

Base       Nucleoside   Nucleotide        Abbreviation (RNA / DNA)
Adenine    adenosine    adenylic acid     AMP / dAMP
Guanine    guanosine    guanylic acid     GMP / dGMP
Cytosine   cytidine     cytidylic acid    CMP / dCMP
Thymine    thymidine    thymidylic acid   ------ / dTMP
Uracil     uridine      uridylic acid     UMP / ------

Structural levels of nucleic acids
Nucleic acids possess following structures:
(a) Primary structure
The nature, properties and function of the two nucleic acids (DNA and RNA) depend on the exact order of the
purine and pyrimidine bases in the molecule. This sequence of specific bases is termed as the primary
structure. Thus, primary structure of nucleic acid is its covalent structure and nucleotide sequence.
(b) Secondary structure
The term secondary structure relates to regions of regular conformation of the chain, stabilized by regular,
repeating interactions (e.g. double helix of DNA). Thus, any regular, stable structure taken up by some or
all of the nucleotides in a nucleic acid can be referred to as secondary structure.
Nucleic acid secondary structures are generated by two kinds of noncovalent interactions between bases. The
secondary structure of DNA is characterized by intermolecular base pairing to generate double stranded
or duplex molecules. Watson and Crick base pairs form the basis of secondary structure interactions in
nucleic acids as well as explaining Chargaff’s rule. Secondary structures in RNA, which exist primarily in
single stranded form, generally reflect intramolecular base interactions. Thus, the secondary structures
arise due to following interactions:
* Complementary base pairing: It involves stable and specific configurations of H-bonds between bases in
DNA. It is the predominant force causing nucleic acid strands to associate. The molecular basis of
Chargaff’s rule is complementary base pairing between A-T and between G-C in double stranded DNA.
Chargaff’s rule was later explained by double helical structure described by Watson and Crick. G:C with
three H-bonds are more stable than A:T (or A:U).

*Base stacking: The structures are stabilized by hydrophobic interactions between adjacent bases brought
about by electrons in π rings. It is these π-π interactions, which are described as base stacking forces.
* Alternative forms of base pairing: Watson-Crick base pairs (A:T and G:C) are predominant in the structure
and function of nucleic acids. However, there are 28 possible arrangements of at least two H-bonds
between bases, which provide the basis for a diverse set of interactions. The most significant of these
alternative configurations are the Hoogsteen base pairs, which contribute to tRNA structure and allow the
formation of triple helices. A modification of Watson-Crick base pairing is the wobble pair, which allows
bases in the 5’-anticodon position of tRNA to pair ambiguously with the mRNA. Wobble base pairs are
formed because bases are offset from their normal Watson-Crick positions and one of the H-bonds is lost.
*Intramolecular base pairing: In RNA and single stranded regions of DNA (non-duplex DNA), secondary
structure is determined by intramolecular base pairing. Since cellular DNA is usually present as a duplex,
the bases are available for intramolecular interactions only rarely. Conversely intramolecular secondary
structures are abundant in cellular RNA and underlie their functional specialization. The major classes of
intramolecular nucleic acid secondary structures are bulges, bulge loops, bubbles, hairpins, stem loops,
panhandle, cruciform. Lariats are often classified as secondary structures, but because they are formed by
the covalent bonds joining nucleotides, they are strictly primary structures.
One common type of secondary structure found in single strands of nucleotides is a hairpin, which forms
when sequences of nucleotides on the same strand are inverted complements.
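
The idea that a hairpin needs two inverted-complementary stretches on the same strand can be illustrated with a brute-force search. This is only a sketch under simplifying assumptions: a DNA alphabet, perfect Watson-Crick pairing in the stem, and hypothetical parameters `stem_len` and `min_loop` chosen for illustration. Real structure prediction also weighs wobble pairs, mismatches and thermodynamics.

```python
# Translation table giving the Watson-Crick complement of each DNA base.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_hairpin_stems(seq: str, stem_len: int = 4, min_loop: int = 3):
    """Return (left_start, right_start) index pairs of potential stems.

    A pair is reported when the stretch starting at right_start is the
    reverse complement of the stretch starting at left_start, and at
    least min_loop unpaired bases separate the two arms.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem_len - min_loop + 1):
        target = reverse_complement(seq[i:i + stem_len])
        for j in range(i + stem_len + min_loop, len(seq) - stem_len + 1):
            if seq[j:j + stem_len] == target:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    # GGGC (arm) + AAAA (loop) + GCCC (arm): the two arms can pair.
    print(find_hairpin_stems("GGGCAAAAGCCC"))
```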
(c) Tertiary structure
The complex folding of large chromosomes within eukaryotic chromatin and bacterial nucleoids is generally
considered tertiary structure. Thus, tertiary structures of nucleic acid reflect interactions, which contribute
to overall 3D shape.
(d) Quaternary structure
In many structures, nucleic acid interacts in trans (e.g. the ribosome and spliceosome) and this may be
considered a quaternary level of nucleic acid structure. Nucleic acids also interact with an enormous number
of proteins (e.g. genome structural proteins, transcription factors, enzymes, splicing factors). Many of these
proteins have a significant effect on DNA or RNA conformation. Interactions with proteins may be general or
sequence specific and may involve subtle or overt changes in structure. The restriction enzymes EcoRI and
EcoRV, for example, both introduce a pronounced kink in the DNA at their recognition sequence, which may
facilitate their endonucleolytic activity. Proteins of the high mobility group (HMG) class appear specifically to
bend DNA in order to facilitate interactions between components bound at distant sites.

Evidence that DNA is the genetic information carrier

The fact that DNA is genetic material came from the experiments using bacteria and viruses.

The first series of experiments was performed by the British bacteriologist F. Griffith in 1928, using the
bacterium Diplococcus pneumoniae (now Streptococcus pneumoniae), which causes pneumonia in mammals.

Griffith noticed that this bacterium had two types of strains.


* S-type, which was capsulated and produced a smooth colony on a synthetic medium.
* R-type, which was non-capsulated and produced a rough colony on a synthetic medium.
When S-type bacteria were injected into healthy mice, the mice developed pneumonia and died, so the S-type
was described as virulent or pathogenic. R-type bacteria, however, were non-pathogenic.
If heat-killed S-type bacteria were injected into healthy mice, they did not cause disease and the mice
remained healthy.
When heat-killed S-type bacteria were mixed with living R-type bacteria and the mixture was injected into
healthy mice, the mice developed pneumonia and died.

When bacteria were isolated from the dead mice, they were living bacteria of both S-type and R-type.

The last experiment resulted in the death of most of the mice. More surprising yet was that the blood of the
dead mice contained live ‘S’ pneumococci. The dead ‘S’ pneumococci initially injected into the mice had
somehow transformed the otherwise innocuous ‘R’ pneumococci to the virulent ‘S’ form. Furthermore, the
progeny of the transformed pneumococci were also ‘S’; the transformation was permanent. Eventually, it was
shown that transformation could also be achieved in vitro by mixing ‘R’ cells with a cell-free extract of ‘S’ cells.
This experiment could not, however, show that DNA is the transforming principle.

In 1944, Oswald Avery, Colin MacLeod and Maclyn McCarty, after a 10-year investigation, extended Griffith’s
experiment and reported that the transforming material is DNA. The conclusion was based on the observation
that the laboriously purified transforming material had all the physical and chemical properties of DNA,
contained no detectable protein, was unaffected by enzymes that catalyze the hydrolysis of proteins and RNA,
and was totally inactivated by treatment with an enzyme that catalyzes the hydrolysis of DNA. DNA must
therefore be the carrier of genetic information.

In 1952, Alfred Hershey and Martha Chase performed the ‘Blender experiment’ to demonstrate that DNA is
the genetic material in bacteriophage. Bacteriophage T2 was grown on E. coli in a medium containing the
radioactive isotopes 32P and 35S. This labeled the phage capsid, which contains no P, with 35S, and its DNA,
which contains no S, with 32P. These phages were added to an unlabeled culture of E. coli. After sufficient time
had been allowed for the phages to infect the bacterial cells, the culture was agitated in a blender so as to shear the
phage heads from the bacterial cells. This rough treatment did not injure the bacteria. When the phage ghosts (the emptied phage coats) were
separated from the bacteria (by centrifugation), the ghosts were found to contain most of the 35S, whereas
the bacteria contained most of the 32P. Furthermore, 30% of the 32P appeared in the progeny phages but
only 1% of the 35S did so. Hershey and Chase therefore concluded that only the phage DNA was essential for
the production of progeny and the protein coat served only as a protective shell. DNA, therefore, must be the
hereditary material.

DNA structure
(a) Chargaff’s Equivalence Rule
In 1950, E. Chargaff formulated important generalizations about DNA structure, based on
quantitative chromatographic methods for the separation and quantitative analysis of the four bases in
hydrolysates of DNA specimens isolated from different organisms. These generalizations are called
Chargaff’s equivalence rule. They include:
*Base composition of DNA varies from one species to another.
*DNA specimens isolated from different tissues of the same species have the same base composition.
*The base composition of DNA in a given species does not change with age, nutritional state, or changes in
environment.
*Purines (A, G) and pyrimidines (T, C) are always present in equal amounts, such that the amount of A is equal to that of T and the amount of
G is equal to that of C, i.e. A=T, G=C (molar equivalence of bases).
*The base ratio (A+T) / (G+C) may vary from one species to another, but is constant for a given species. This ratio can be
used to identify the source of DNA and can sometimes help in classification.
*The deoxyribose sugar and phosphate components occur in equal proportions.
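
The molar equivalences A=T and G=C follow directly from complementary pairing, which a short calculation can make concrete. The sketch below counts the bases of a duplex built from one strand and its complement; the input strand is an arbitrary, made-up sequence and the function names are illustrative only.

```python
from collections import Counter

# Watson-Crick partner of each DNA base.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def duplex_base_counts(strand: str) -> Counter:
    """Count the bases of both strands of the duplex formed by `strand`."""
    strand = strand.upper()
    partner = "".join(COMPLEMENT[base] for base in strand)
    return Counter(strand) + Counter(partner)


def at_gc_ratio(counts: Counter) -> float:
    """Base ratio (A+T)/(G+C), which is characteristic of a species."""
    return (counts["A"] + counts["T"]) / (counts["G"] + counts["C"])


if __name__ == "__main__":
    counts = duplex_base_counts("ATGCGGA")  # hypothetical strand
    print(counts["A"] == counts["T"], counts["G"] == counts["C"])
    print(at_gc_ratio(counts))
```

A single strand on its own need not obey A=T or G=C; the equivalence appears only once both strands of the duplex are counted.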

Double helical structure of DNA (Watson-Crick model) (B-DNA)


In 1953, J. D. Watson and F. H. C. Crick postulated a precise 3-D model of DNA structure, based on the X-ray data
of Franklin and Wilkins and the base equivalence observed by Chargaff. This model accounted for
many of the observations on the chemical and physical properties of DNA and also suggested a
mechanism for accurate replication of genetic information. DNA contains two polynucleotide chains
that are coiled in helical fashion around the same axis in a right-handed (clockwise) direction,
thus forming a double helix. The two chains or strands are antiparallel, i.e. their 3’, 5’-internucleotide
phosphodiester bridges run in opposite directions (as determined by nearest neighbour analysis).
These chains are complementary to each other. The antiparallel orientation is a stereochemical
consequence of the way that A and T and G and C pair with each other. All the phosphodiester
linkages have the same orientation along the chain, giving each linear nucleic acid strand a specific
polarity and distinct 5’ and 3’ ends. By definition, the 5’ end lacks a nucleotide attached at the 5’ position and the 3’ end lacks
a nucleotide attached at the 3’ position.
*The backbone of the helix consists of sugar and phosphate groups, while the bases are perpendicular to the
backbone, projecting inwards to the center. Purine and pyrimidine bases are stacked inside the helix
with their planes parallel to each other and perpendicular to the helix axis. The backbone is found on the
periphery of the helix and is hydrophilic. Hydroxyl groups of the sugar form H-bonds with water.
Phosphate groups, with pKa near zero, are negatively charged at neutral pH, and the negative charges are
generally neutralized by ionic interaction with positive charges of proteins, metals and polyamines.
Bases are hydrophobic and shielded from water. This means that a single-stranded structure, in which the
bases are exposed to the aqueous environment, is unstable. Hence DNA is a double helix. The DNA double helix
is held together by two forces: H-bonding of complementary base pairs and hydrophobic interactions.
*A base pair consists of a purine and a pyrimidine. Moreover, a specific purine pairs with a specific pyrimidine
owing to a perfect match between hydrogen donor and acceptor sites on the two bases. The bases of one
strand are paired in the same planes with the bases of the other strand. Base pairing is due to steric and
H-bonding factors. A is bonded to T by two H-bonds and G is bonded to C by three
H-bonds. Only A and T, and G and C, have the proper spatial arrangements to form correct H-bonding. This is
the concept of specific base pairing. The allowed pairs are A-T and G-C which are precisely the base pairs
showing Chargaff’s equivalence in DNA. Thus, Watson-Crick double helix involves not only the maximum
possible number of H-bonded base pairs but also those pairs giving maximum fit and stability. The individual
H-bond is weak in nature, but, as in the case of proteins, a large number of them involved in the DNA
molecule confer stability to it. However, the stability of DNA is primarily a consequence of van der Waals
forces and hydrophobic (base stacking) interactions between the planes of stacked bases. Thus, H-bonding is
specific and is responsible for complementarity of two strands, while hydrophobic interactions [(π-π) stacking
interactions between adjacent bases] are non-specific and are responsible for stability of the macromolecule.
The nucleic acid strands tend to stick together even in the absence of specific base pairing, although the
specific interactions make the association stronger.

The two strands are wound in such a way as to produce two interchain spacings or grooves, a major or wide
groove (width 12 Å, depth 8.5 Å) and a minor or narrow groove (width 6.0 Å, depth 7.5 Å). Thus, the major
groove is slightly deeper than the minor one. The two grooves arise because the glycosidic bonds of a
base pair are not diametrically opposite each other. The minor groove contains the pyrimidine O-2 and
the purine N-3 of the base pair, and the major groove is on the opposite side of the pair. Potential
H-bond donor and acceptor atoms line each groove. The major groove displays more distinctive features
than the minor groove. In these grooves, specific proteins interact with sequences of DNA. Such
double helices cannot be pulled apart and can be separated only by the unwinding process. They are
called plectonemic coils, i.e., coils that are interlocked about the same axis. The helical structure
helps in shielding of the bases from the environment, thereby protecting the genetic information from
physical and chemical attack.
Rotation about the glycosidic bond at C-1' allows two orientations. The anti- orientation extends the base and pentose rings in opposite
directions; for pyrimidines, this means that O-2 faces away from the pentose. The syn- orientation places the base and the
pentose in the same direction.

Free purine nucleosides, in particular guanosine, favour the syn- orientation, but adopt the common anti-
orientation within most DNA and RNA helices. Pyrimidines adopt the anti- orientation almost exclusively, because
of steric interference between O-2 and C-5' in the syn- orientation.

Features of the Watson Crick pairing


1. The permitted hydrogen bonds are: adenine with thymine (2 bonds); and, cytosine with guanine (3
bonds).
2. The dimensions of the 2 permitted base-pairs are similar, i.e. the C1'-C1' distance is nearly identical in
both cases.
3. The beta-glycosidic bond is attached on the same edge of the base pair.
4. Although some of the atoms in the purine and pyrimidine bases are involved in hydrogen bonds,
there is still potential for further hydrogen bonding. This potential is particularly important for sequence
specific protein binding.

5. The Watson-Crick base-pair is a planar structure.
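
Feature 1 can be turned into a small calculation: since each A:T pair contributes two hydrogen bonds and each G:C pair three, the number of Watson-Crick hydrogen bonds in a perfectly paired duplex follows from the base composition of one strand. The sketch below assumes a fully complementary DNA duplex; the function names are illustrative.

```python
# Hydrogen bonds formed by each base when Watson-Crick paired:
# A:T pairs form 2 bonds, G:C pairs form 3 bonds.
H_BONDS_PER_BASE = {"A": 2, "T": 2, "G": 3, "C": 3}


def count_h_bonds(strand: str) -> int:
    """Total Watson-Crick H-bonds in the duplex formed by `strand`."""
    return sum(H_BONDS_PER_BASE[base] for base in strand.upper())


def gc_fraction(strand: str) -> float:
    """Fraction of positions occupied by G or C."""
    strand = strand.upper()
    return sum(base in "GC" for base in strand) / len(strand)


if __name__ == "__main__":
    print(count_h_bonds("GAATTC"))  # 3 + 2 + 2 + 2 + 2 + 3 = 14
    print(gc_fraction("GAATTC"))
```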

DNA BENDING:- DNA bending is an intrinsic property depending on stacking interactions, which according to
local sequence, may be isotropic (unbiased) or anisotropic (bending in a specific direction). Intrinsic
DNA bends occur in A-T rich runs and in repeats of the sequence GGCC in step with helical periodicity.
DNA bending can also be induced by proteins (nucleic acid binding proteins) and by circularization
(DNA topology). Induced bending is necessary for DNA packaging in chromosomes and for replication,
recombination and transcription. Proteins may also recognize DNA that is bent in a certain way (e.g.
topoisomerase).

Conformation                    A                B          Z

Morphological characteristics
Helical sense                   R                R          L
Pitch (base pairs per turn)     11               10         12
Major groove                    Deep, narrow     Wide       Flat
Minor groove                    Broad, shallow   Narrow     Narrow and very deep
Helix diameter                  2.3 nm           1.9 nm     1.8 nm

Torsional parameters
Sugar pucker                    C3’ endo         C2’ endo   Alternating
Glycosidic bond angle           anti             anti       Alternating anti/syn

Helical parameters
Displacement (Å)                -4.4             0.6        3.2
Twist (degrees)                 33               36         -49/-10
Rise (Å per base pair)          2.6              3.4        3.7
Inclination (degrees)           22               -2         -7
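
The twist and pitch rows of the table are related by simple arithmetic: the number of base pairs per turn is 360° divided by the average twist per base pair, and the length of one turn is that number times the rise. The sketch below reproduces the table's pitch values from its twist values. It assumes that twist is given in degrees and rise in Å per base pair, and it treats the two Z-DNA twist values as the alternating steps of the dinucleotide repeat.

```python
def bp_per_turn(step_twists_deg):
    """Base pairs per helical turn from the twist of each repeating step.

    `step_twists_deg` lists the twist (in degrees) of every step in the
    repeating unit: one value for A- and B-DNA, two for Z-DNA.
    The sign only encodes handedness, so magnitudes are used.
    """
    degrees_per_unit = sum(abs(t) for t in step_twists_deg)
    return 360.0 * len(step_twists_deg) / degrees_per_unit


def turn_length(step_twists_deg, rise):
    """Axial length of one helical turn, in the units of `rise`."""
    return bp_per_turn(step_twists_deg) * rise


if __name__ == "__main__":
    # Twist and rise values taken from the table above.
    for name, twists, rise in [("A", [33], 2.6),
                               ("B", [36], 3.4),
                               ("Z", [-49, -10], 3.7)]:
        print(name, round(bp_per_turn(twists), 1),
              round(turn_length(twists, rise), 1))
```

Rounded to whole numbers, the results agree with the 11, 10 and 12 base pairs per turn listed in the table.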

Measuring helical winding.


►The linking number (L) is the number of times one DNA strand wraps round the other in a duplex, and for
right-handed helices, L is positive. The duplex winding number (Lo) is the linking number for relaxed DNA
and represents the most energetically favorable configuration.
►Twist is a measure of the helical winding of the DNA strands around each other,
whereas writhe is a measure of the coiling of the axis of the double helix, which is called supercoiling.
►A right-handed coil is assigned a negative number (negative supercoiling) and a left-handed coil is assigned a
positive number (positive supercoiling).
►For B-DNA, the average Lo = n / 10.3, where n is the number of base pairs. If the DNA is relaxed, L = Lo, but
any deviation from this state by over-winding or under-winding creates torsional strain.
►In open DNA (DNA with free ends), the strain is countered by rotation of the strands relative to each other,
whereas in covalently closed DNA (circular DNA or DNA with fixed ends) oppositional rotation is prevented
and torsional strain must be countered by super coiling.
►Measuring supercoiling. The degree of supercoiling in a given DNA molecule is expressed as the superhelical
density (λ), which is calculated as follows:
λ = (L - Lo) / Lo = τ / Lo
►The superhelix winding number (τ) is the difference between L and Lo. If DNA is overwound, positive
supercoils are introduced and τ is positive, whereas underwound DNA generates negative supercoils and τ is
negative. τ quantifies the degree of torsional strain a given molecule is under and thus its propensity to
undergo supercoiling, but it does not measure the actual number of superhelical turns, because the pitch of
the helix may also be changed by torsional strain.
►The number of superhelical turns is expressed as the writhing number (W). This is related to the linking
number by the equation L = T + W, where T, the twisting number, is the total number of helical turns in a DNA
molecule. The linking number is a topological property (L is invariant under deformation), so any change in W, the number
of turns of superhelix, must be countered by an equal and opposite change in T.
►In a relaxed molecule, L = T, hence W = 0 and all turns are helical turns. One unit of writhe is equivalent to
one half superhelical turn, i.e. a turn of 180° in the helical axis of the DNA. Each unit of writhe can be thought
of as a point at which two duplexes cross each other when a supercoiled molecule is forced to lie on a flat
surface, such a point being described as a node.

Unwound DNA and supercoiled DNA that have the same value of Lk (the linking number, L above) but differ in Tw (twist, T) and Wr (writhe, W)
are topologically identical but geometrically different.
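
The quantities defined in this section can be tied together in a short worked calculation. The sketch below uses the text's value of 10.3 base pairs per turn for relaxed B-DNA and assumes the usual definition of superhelical density as the winding difference divided by the relaxed linking number, (L - Lo) / Lo. The plasmid size and linking number in the example are made-up numbers chosen only to give round results.

```python
BP_PER_TURN = 10.3  # average for relaxed B-DNA, as given in the text


def relaxed_linking_number(n_bp, bp_per_turn=BP_PER_TURN):
    """Duplex winding number Lo = n / 10.3 for a molecule of n base pairs."""
    return n_bp / bp_per_turn


def superhelix_winding_number(linking_number, n_bp):
    """tau = L - Lo; negative for underwound DNA, positive for overwound."""
    return linking_number - relaxed_linking_number(n_bp)


def superhelical_density(linking_number, n_bp):
    """Superhelical density, assumed here to be (L - Lo) / Lo."""
    lo = relaxed_linking_number(n_bp)
    return (linking_number - lo) / lo


def writhe(linking_number, twist):
    """W from L = T + W, for a molecule with a known number of helical turns T."""
    return linking_number - twist


if __name__ == "__main__":
    n_bp, L = 5150, 470  # hypothetical closed circular molecule
    print(relaxed_linking_number(n_bp))
    print(superhelix_winding_number(L, n_bp))
    print(superhelical_density(L, n_bp))
```

In this example the molecule is underwound by about 30 turns, giving a negative superhelical density of about -0.06.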

DNA can assume different secondary structures, depending on the conditions in which it is placed and on its
base sequence. B-DNA is thought to be the most common configuration in the cell. Local variation in DNA
arises as a result of environmental factors and base sequence.
* A-DNA structure
o favored for DNA-DNA duplex under dehydrating conditions
o right-handed, double-stranded (complementary strands)
o favored under physiological conditions for RNA-RNA (A-RNA) or RNA-DNA duplexes,
because the 2'-hydroxyl of ribose sterically inhibits formation of the B conformation
o Strands antiparallel and complementary in sequence
o Major groove narrow and deep; very little minor groove (not much of a "groove" -- wide and
shallow)
o wider diameter than B-DNA, with hole down the helix axis
o the first fibre diffraction pictures taken by Rosalind Franklin were of a dehydrated form
of DNA, which we now know as A-DNA.
* Z-DNA structure
o left-handed form of double-stranded DNA (complementary strands)
o backbone phosphates "zig-zag"
o favored by alternating purine-pyrimidine sequences, and high salt concentrations (which
minimize the electrostatic repulsion between backbone phosphates)
o Strands antiparallel and complementary in sequence
o almost no major groove (flat); minor groove narrow and deep
o atoms very tightly packed
o physiological role uncertain -- does occur in short tracts in vivo in both prokaryotes &
eukaryotes, and may have something to do with regulation of expression of some genes, or in
genetic recombination
o the bases alternate between the syn- conformation and the usual anti- conformation, with
the result that the repeating unit of the structure is a dinucleotide (two base pairs). Because the
phosphodiester backbone has a zig-zag appearance, this form was called Z-DNA.
Normally the Z-form occurs only at high salt concentration, but when some of the bases in the potential Z-form
sequences are methylated, it is stable at lower salt concentrations. Thus Z-DNA having methylated bases may
be stable in vivo. Moreover, its stability is enhanced by cations, including polyamines such as spermine, by
negative supercoiling, and by DNA-binding proteins specific for Z-DNA.
►In fact, there is evidence that Z-DNA exists in the interband regions of the giant salivary gland chromosomes
of Drosophila melanogaster and in the transcriptionally active macronucleus of the ciliated protozoan
Stylonychia mytilus. A. Rich and colleagues have prepared antibodies specific for Z-DNA; these antibodies do
not react with B-form DNA. Rich and coworkers have shown that the Z-DNA-specific antibodies bind to the
interband regions of the polytene chromosomes of D. melanogaster. It will be of great interest to determine
the sequences and methylation patterns of the DNA in the interband regions of these polytene chromosomes.
►Another hint of the possible involvement of Z-DNA in regulating gene expression is that the structures of certain regulatory proteins suggest that they may bind in the major groove of left-handed double helices, but not right-handed helices. One proposal is that an activator such as CAP stabilizes its CAP-binding sequences in a left-handed conformation, and that this right-handed to left-handed transition in the double helix unwinds the adjacent promoter (the RNA polymerase-binding site) and thus activates transcription of the adjacent structural genes. Repressor proteins might act in the opposite direction, stabilizing regulatory sequences in the right-handed B-form and preventing transcription. Although their functions are still unknown, Z-DNA-specific binding proteins have been isolated from Drosophila.
 In principle, Z-DNA formation could have a functional role that need not involve its recognition by proteins.
For example, E. coli RNA polymerase does not transcribe through Z-DNA, raising the possibility that the formation of Z-DNA behind (5' to) a moving polymerase may block a trailing RNA polymerase from transcribing through that region of a gene until the torsional strain stabilizing the Z-DNA is relieved by topoisomerases. This mechanism might ensure spatial separation between successive polymerases. As a consequence, processing of an RNA would then be physically and temporally removed from that of subsequent transcripts.

DNA as Polymer
►A linear polymer that has free rotation about all bonds in the chain and has no interaction between side groups is called a random coil. It does not have a unique 3-D structure or size because it is continually being distorted by Brownian motion. In such a structure, each monomer can be at any angle with respect to the adjacent monomer.
►A perfectly random structure does not exist because no bond is perfectly flexible.
►A protein in which all H-bonds are broken but a few disulfide bonds remain is sometimes called a near-random coil.
►A helical structure arises when the monomeric units of the structure are related by a constant rotation about some axis plus a constant translation along that axis (translation means the displacement along the axis per monomer). Such a structure fits in a cylinder.
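The definition of a helix as "constant rotation plus constant translation" can be written out directly. The sketch below is illustrative only: the function name is hypothetical, and the parameters (36 degrees of twist, 3.4 angstrom rise, 10 angstrom radius) are assumed round numbers of roughly B-DNA size, not values taken from this section.

```python
import math

def helix_coordinates(n_monomers, radius, twist_deg, rise):
    """Place monomer i by rotating i * twist_deg about the z axis and
    translating i * rise along that axis."""
    coords = []
    for i in range(n_monomers):
        angle = math.radians(i * twist_deg)
        coords.append((radius * math.cos(angle),
                       radius * math.sin(angle),
                       i * rise))
    return coords

# Assumed, roughly B-DNA-like parameters (lengths in angstroms).
points = helix_coordinates(n_monomers=11, radius=10.0, twist_deg=36.0, rise=3.4)
for x, y, z in points:
    # Every monomer sits at the same distance from the axis,
    # which is why the structure fits inside a cylinder.
    print(f"{x:7.2f} {y:7.2f} {z:6.2f}  r = {math.hypot(x, y):.2f}")
```

With these assumed values the eleventh monomer lies directly above the first, one full turn higher.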

How do intermolecular and intramolecular forces and subunits determine a structure?
The 3-D structure of a macromolecule is determined by three factors –
1. the allowable bond angles (Ф and Ψ, as in the case of proteins),
2. the interactions between the components of the macromolecule (such as different monomers), and
3. the interactions between the solvent and the components.
The solvent interactions are of two types:
1. Solvation:- an attraction between the solvent and the component molecules.
2. Hydrophobic interaction:- a solute-solute interaction that arises as a consequence either of an inability to
interact with the solvent or of an avoidance of the solvent.
As a basic rule, if a collection of molecules cannot be solvated, the molecules will instead stick close to one another in order to minimize contact with the solvent. (This is one of the factors that makes even a single-stranded polynucleotide somewhat rigid.) Hydrophobic interactions are not directional.
Van der Waals attraction:- a weak force that exists between all molecules. It is effective only at very small distances. If two regions of a macromolecule have complementary shapes, the regions can approach one another closely and the combined van der Waals forces can be quite strong. Thus the van der Waals force is responsible in part for interactions between two regions of a molecule that can fold in such a way that complementary surfaces are produced.
►What is the native structure? It can mean any of the following:
1. The structure of a macromolecule as it exists in nature.
2. The structure of a macromolecule as isolated, if it retains enzymatic activity.
3. The form of a macromolecule that has no biological activity but possesses secondary structure.
►Meaning of denatured: the form of a macromolecule that has less secondary structure than the form called native. For a protein it means a near-random coil; for dsDNA it means ssDNA.

Helix coil transition


►A transition from an ordered to a disordered structure is called a helix-coil transition (even if the native state is not helical). The usual agents used to induce a helix-coil transition are temperature, pH, salt concentration and chemical denaturants, such as urea and guanidinium chloride for proteins and formamide, formaldehyde and ethylene glycol for nucleic acids (N.A.).
►A helix-coil transition can be cooperative or non-cooperative. Taking DNA as the model: for free, independent nucleotides in solution, a hydrogen-bonded pair forms only rarely, even though thousands of potential partners are present, and the probability that any one pair exists is independent of the existence of any other pair. This is a non-cooperative transition.
►In a cooperative transition, as between two complementary ssDNA strands, stable pairing between nucleotides is much more likely, because the probability that a pair exists depends on the existence of other pairs.
►In general, not all pairs have the same strength, because they are either chemically different or in different locations in the molecule. Thus in a non-cooperative transition, as the temperature is raised, pairs will be disrupted at different temperatures, so that the helix-coil transition will not be sharp. In a cooperative transition there is much greater difficulty in disrupting the first pair, because it is stabilized by the existence of all the other pairs, than in breaking the last pair, which is stabilized only by its own intrinsic binding energy.
Therefore, when there is cooperativity, a transition is much sharper than when there is non-cooperativity.
►DNA duplex (double-stranded structure) favored by:
o MAIN STABILIZING FACTOR = base stacking - a combination of
 the hydrophobic effect (an entropic effect, getting bases out of contact with H2O), and
 van der Waals and dipole-dipole interactions (enthalpic effects)
Hydrogen bond formation between base pairs is not nearly as important as stacking interactions in stabilizing the double-stranded structure. The hydrogen bond is a weak bond, and its strength depends on the angle made by the two component groups. Thus hydrogen bonding in a polynucleotide is weak unless the bases are stacked, because the stacking provides the orientation that is necessary for a large number of H-bonds to form simultaneously.
Separation into single strands ("melting") is favored by:
o having less electrostatic repulsion of the backbone phosphates than in duplex
o conformational entropy (one molecule --> 2 molecules, and also more freedom of single
strands to adopt different conformations in solution)
o hydrogen bond formation with water
o Although base pairing by hydrogen bonding doesn't play a big role in stabilizing the duplex
structure, it provides all the specificity required for the complementarity of the 2 strands, essential
for the processes of DNA replication and transcription
►DNA denaturation is studied in solution, so at any particular temperature, bombardment of the molecule by solvent molecules tends to break the hydrogen bonds and to alter the relative orientation of the bases. This is very difficult, though, because in order to break one hydrogen bond the adjacent bonds have to strain in order to tip the plane of one base with respect to an adjacent base; the first base would also have to tip with respect to its other neighbor. Thus, there is an enormous stabilization resulting from the stacking, so that the huge DNA molecule typically undergoes a helix-coil transition at temperatures (about 82°C) that are 30-40°C above the values usually encountered for proteins.
►Base pairing causes the bases to project inward from the sugar-phosphate backbone. This encourages the bases to stack, so that H-bonding and base stacking are synergistic.
►The hydrogen bonds that are most susceptible to disruption are those at the ends of a DNA molecule, because the terminal base pair is stabilized by only one pair of stacked bases. Thus, as the temperature increases, the H-bonds at the ends of the double helix are the first to break. This breakage destabilizes the next pair, and denaturation proceeds progressively inward.
►In addition to the termini, sequences rich in adenine·thymine (A·T) base pairs also denature earlier in the transition than guanine·cytosine (G·C) pairs, which have three hydrogen bonds. An internal region in which base pairs are disrupted is called a bubble. Further breakage of hydrogen bonds occurs preferentially at the ends of bubbles with high A·T content, because these regions are nearly equivalent to the ends of the molecule. Thus, denaturation of DNA proceeds both by enlargement of bubbles and by progressive opening of the helix from the ends.
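As a rough illustration of where bubbles might open first, one can scan a sequence for windows of high A·T content. This is a minimal sketch with a hypothetical function name and a made-up sequence; A·T fraction is only a crude proxy, since it ignores stacking energies and the end effects described above.

```python
def at_fraction_windows(seq, window):
    """Return a list of (start, A+T fraction) for each sliding window."""
    seq = seq.upper()
    results = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        at_count = chunk.count("A") + chunk.count("T")
        results.append((start, at_count / window))
    return results

# Hypothetical duplex strand: G.C-rich flanks around an A.T-rich core.
example = "GCGCGCATATATATGCGCGC"
windows = at_fraction_windows(example, window=6)
richest = max(windows, key=lambda w: w[1])
print(richest)  # first window with the highest A+T fraction
```

Here the window starting at position 6 is entirely A·T, so on this simple criterion the core would be expected to melt before the flanks.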

►Helix-coil transitions are frequently described by the temperature at which the transition is 50% complete.
►This temperature is the melting temperature, Tm. The value of Tm depends on the means of detecting the transition: Tm for the change of absorbance of a DNA solution need not be the same as Tm for the change of viscosity of the solution. The melting temperature is sometimes called the midpoint of the transition; however, it is important to realize that it is not the temperature midway between the temperatures at which the transition starts and stops. In fact, melting curves are not usually symmetric on the temperature axis about the value of Tm.
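The difference between Tm (the 50% point) and the temperature midway between the start and end of the transition can be seen on an asymmetric curve. The data below are invented purely for illustration, and the function name is hypothetical; the sketch simply interpolates linearly between measured points.

```python
def melting_temperature(temps, fraction_denatured):
    """Linearly interpolate the temperature at which the fraction
    denatured first reaches 0.5. Assumes temps are in increasing order."""
    points = list(zip(temps, fraction_denatured))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 <= 0.5 <= f1 and f1 > f0:
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("curve does not cross 0.5")

# Hypothetical, deliberately asymmetric melting curve (temperatures in deg C).
temps = [70, 75, 80, 85, 90, 95]
frac = [0.0, 0.02, 0.10, 0.60, 0.95, 1.0]

tm = melting_temperature(temps, frac)
midway = (temps[0] + temps[-1]) / 2  # midway between start and end of the transition
print(f"Tm = {tm:.1f}, midway temperature = {midway:.1f}")
```

For this invented curve the two temperatures differ by about 1.5 degrees, which is the point the text makes about asymmetric melting curves.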
►In a helix-coil transition, cooperativity strengthens the ordered state and provides a means for achieving it.
Fluctuation of the DNA molecule
► Formaldehyde causes a slow and irreversible denaturation of DNA because it can react with the –NH2 groups of the bases and thus eliminate their ability to form hydrogen bonds. These amino groups are available for reaction with HCHO because bases are continually being paired and unpaired. This observation indicates that DNA is a dynamic structure in which double-stranded regions frequently open to become single-stranded bubbles. This important phenomenon, called breathing, is thought to enable specialized proteins (helix-destabilizing proteins) to interact with the DNA molecule and to read its encoded information (for replication and transcription). Breathing occurs more frequently in regions rich in A·T pairs than in regions rich in G·C pairs.
►A helix-destabilizing protein binds tightly to several adjacent bases (in a bubble) on single-stranded DNA and thereby prevents reformation of the H-bonds. This distortion weakens the stacking interaction, so that the bubble fluctuates in size.
►The bound helix-destabilizing protein undergoes a small change in shape that enables it to bind also to a second identical protein molecule. Thus another helix-destabilizing molecule, which is by itself capable of binding to the bases in the fluctuating bubble, now has a greater probability of binding because it can bind both to the bases and to the bound protein molecule. This is cooperative binding.
Hoogsteen base pairs
 About 10 years after Watson and Crick proposed their model base pairs, Karst Hoogsteen tried to confirm the
model by heating a solution of adenine and thymine and then letting it cool slowly to form crystals. He found
that, in his crystals, adenine and thymine did not form hydrogen bonds as proposed by Watson and Crick.
Instead, they formed two hydrogen bonds with one another in a different way, which involved the N7 atom of
the purine ring rather than the N1 atom. This alternative geometry is known as a Hoogsteen base pair.

The Hoogsteen geometry is the most favourable one for A-T base pairs in solution, but not in double helices. G-C base pairs do not form Hoogsteen pairs in neutral solution; Hoogsteen G-C pairs are stable only in mildly acidic (pH 4 to 5) solutions, where the N3 atom of cytosine is protonated and can participate in a hydrogen bond with the N7 of guanine. Hoogsteen G-C base pairs have only two hydrogen bonds, one of which depends on this protonation, so protonation is essential for pairing. The third hydrogen bond of the Watson-Crick G-C pair ensures that the Watson-Crick pairing scheme is the most favourable one for G-C in solution.

H-DNA:- In certain circumstances, DNA can form a triple helix. When base triples occur, the third
polynucleotide chain pairs with one of the other two using Hoogsteen base pairs. The primary two chains are
paired following the usual Watson-Crick geometry.

The formation of triple helices in DNA is a subject of some study for its pharmaceutical and research potential.

Chemical Stability of Nucleic Acids


Hydrolysis by acids and alkali
DNA is generally quite stable. It will resist attack in acid and alkali solutions. However, in mild acid solutions -
at pH 4 - the beta-glycosidic bonds to the purine bases are hydrolyzed. Protonation of purine bases (N7 of
guanine, N3 of adenine) occurs at this pH. Protonated purines are good leaving groups hence the hydrolysis.
Once this happens, the depurinated sugar can easily isomerize into the open-chain form and in this form the
depurinated (or apurinic) DNA is susceptible to cleavage by hydroxyl ions.
In contrast to DNA, RNA is very unstable in alkaline solutions due to hydrolysis of the phosphodiester backbone.
The 2'-OH group in ribonucleotides renders RNA molecules susceptible to strand cleavage in alkaline solutions.
Effect of CF3COONa and NaCl on DNA:
For both salts, Tm at 0.1 M is greater than at 0.01 M because of the neutralization of the negatively charged phosphate groups (so there is less inter-strand repulsion).
►Above 0.1 M, up to 4 M, Tm for NaCl continues to increase, indicating that the repulsive forces were incompletely eliminated at 0.1 M and that the duplex is more stabilized at 4 M.
The stabilization by NaCl is due to two effects:
1. formation of positively charged "clouds" around the negatively charged phosphates, giving effective shielding of the phosphates by Na+;
2. neutralization of the negative charge by binding of Na+ ions.
►In the case of CF3COONa, above 0.1 M and up to 4 M the stabilizing forces are overtaken by destabilizing forces. CF3COONa increases the solubility of the bases by disrupting the water shell around them [as we know, the bases are hydrophobic and in water they tend to aggregate to exclude water].
►This reduces the hydrophobic interaction, which decreases the stacking tendency of the bases, so that the thermal stability of the DNA is reduced. [Because of the hydrophobic nature of the bases, the sugar-phosphate chain is on the outside, in contact with water, and the bases are on the inside.]

C-Value Paradox
►The total amount of DNA in the genome is a characteristic of each living species known as its C-value. There is enormous variation in the range of C-values, from <10^6 bp for a mycoplasma to >10^11 bp for some plants and amphibians.
►There is an increase in the minimum genome size found in each group as the complexity increases. But as absolute amounts of DNA increase in the higher eukaryotes, we see some wide variations in the genome sizes within some phyla, even though the minimum genome size still rises steadily with complexity.
 An increase in genome size is necessary in order to make insects, birds or amphibians, and mammals. However, after this point there is no good relationship between genome size and the morphological complexity of the organism.
 We know that genes are much larger than the sequences needed to code for proteins, because exons
(coding regions) may comprise only a small part of the total length of a gene. This explains why there is
much more DNA than is needed to provide reading frames for all the proteins of the organism. Large parts
of an interrupted gene may not be concerned with coding for protein. And there may also be significant
lengths of DNA between genes. So it is not possible to deduce from the overall size of the genome anything
about the number of genes.
 Plotting the minimum amount of DNA required for a member of each group suggests that an increase in
genome size is required to make more complex prokaryotes and lower eukaryotes.
Looking at the C-values of different phyla, one might predict that complexity increases with C-value; this is what is seen up to the first fully multicellular eukaryotes.

Organism                        Genome size (bp)
Pyrenomas salina (alga)         6.6 x 10^5   (smallest known genome)
E. coli                         4.2 x 10^6
Yeast (S. cerevisiae)           1.3 x 10^7
Dictyostelium discoideum        5.4 x 10^7
C. elegans                      8.0 x 10^7
Xenopus laevis (amphibian)      3.1 x 10^9
Mammal (human)                  3.3 x 10^9

Reptiles, birds and mammals show very little variation within the group (up to about 2-fold), but in insects, amphibians and plants there is a wide range of values, often more than tenfold.
Musca domestica: 8.6 x 10^8 bp
Drosophila: 1.4 x 10^8 bp
Thus the whole picture is puzzling: we cannot predict the complexity or function of a genome just from its C-value. This paradoxical situation has two features.

1. There is an excess of DNA compared with the amount that could be expected to code for proteins. This
means that a gene can be much larger than is needed to code for its protein.

2. There is large variation in C-values between certain species which do not show that much variation in
complexity.

►The C-value paradox refers to the lack of correlation between genome size and genetic complexity. There
are some extremely curious variations in relative genome size. The toad Xenopus and man have genomes
of essentially the same size.

 In some phyla there are extremely large variations in DNA content between organisms that do not vary much
in complexity.

►This is especially marked in insects, amphibians and plants, but does not occur in birds, reptiles and mammals, which all show little variation within the group, with an ~2X range of genome sizes. A cricket has a genome 11X the size of that of a fruit fly. In amphibians, the smallest genomes are <10^9 bp, while the largest are ~10^11 bp.
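The fold differences behind the paradox can be checked with simple arithmetic on the genome sizes quoted above. This is only an illustrative sketch; the dictionary and function names are hypothetical, and the amphibian bounds are the approximate limits given in the text rather than measurements of particular species.

```python
# Genome sizes in base pairs, as quoted in the text.
genome_bp = {
    "Drosophila": 1.4e8,
    "Musca domestica": 8.6e8,
    "Xenopus laevis": 3.1e9,
    "Human": 3.3e9,
}

def fold_difference(a, b):
    """Ratio of the larger to the smaller of two genome sizes."""
    larger = max(genome_bp[a], genome_bp[b])
    smaller = min(genome_bp[a], genome_bp[b])
    return larger / smaller

# Two insects of similar complexity differ about 6-fold ...
print(round(fold_difference("Musca domestica", "Drosophila"), 2))
# ... while a toad and a human differ by only a few percent.
print(round(fold_difference("Human", "Xenopus laevis"), 2))
# Approximate amphibian range from the text: ~1e9 to ~1e11 bp.
print(1e11 / 1e9)
```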

Renaturation Kinetics:- The general nature of a eukaryotic genome can be assessed by the kinetics with which denatured DNA sequences reassociate. The reassociation reaction reflects the variety of sequences that are present, so the reaction can be used to quantitate genes and their RNA products. The reaction follows second-order kinetics:

$$\frac{dC}{dt} = -K C^{2}$$

where C is the concentration of ssDNA at any time t and K is the reassociation rate constant. Integrating from C_0, the concentration of ssDNA at t = 0, gives

$$\frac{C}{C_0} = \frac{1}{1 + K\,C_0 t}$$

When the reaction is half complete (t = t_{1/2}):

$$\frac{C}{C_0} = \frac{1}{2} = \frac{1}{1 + K\,C_0 t_{1/2}} \quad\text{so}\quad C_0 t_{1/2} = \frac{1}{K}$$

C_0 t_{1/2} is the product of the concentration and the time required for the reaction to proceed half-way; a greater C_0 t_{1/2} implies a slower reaction.

Units: C_0 t_{1/2} = 1/K is in moles x sec / liter, so K is in liter / (moles x sec).
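The second-order rate law above can be evaluated numerically. This is a minimal sketch assuming ideal second-order reassociation of a single sequence class; the function names and the rate constant are hypothetical values chosen for illustration.

```python
def fraction_single_stranded(c0t, k):
    """C/C0 = 1 / (1 + K * C0t) for ideal second-order reassociation.
    c0t is in mol*s/L and k (K in the text) in L/(mol*s)."""
    return 1.0 / (1.0 + k * c0t)

def cot_half(k):
    """C0t value at which half of the DNA has reassociated: C0t1/2 = 1/K."""
    return 1.0 / k

k = 0.25  # hypothetical rate constant, L/(mol*s)
half = cot_half(k)
print(half, fraction_single_stranded(half, k))

# A smaller K gives a larger C0t1/2, i.e. a slower reaction.
for c0t in (0.1, 1.0, 10.0, 100.0):
    print(c0t, round(fraction_single_stranded(c0t, k), 3))
```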

1. The C0t1/2 is directly related to the amount of DNA in the genome.

2. The rate of reassociation is inversely proportional to the total length of different sequences in the reassociating DNA. This length is the complexity.

$$C_0 t_{1/2} \propto \text{complexity}$$

The complexity of any DNA can be determined by comparing its C0t1/2 with that of a standard DNA of known complexity. Usually E. coli DNA (4.2 x 10^6 bp) is used as the standard:

$$\frac{C_0 t_{1/2}\ (\text{DNA of any genome})}{C_0 t_{1/2}\ (\text{E. coli DNA})} = \frac{\text{complexity of any genome}}{4.2 \times 10^{6}\ \text{bp}}$$
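The comparison with the E. coli standard amounts to a simple proportion. The sketch below uses hypothetical C0t1/2 values and a hypothetical function name; it assumes both measurements were made under the same conditions and that the sample behaves as a single kinetic component.

```python
E_COLI_COMPLEXITY_BP = 4.2e6  # complexity of the E. coli standard, from the text

def complexity_from_cot(cot_half_sample, cot_half_ecoli):
    """Estimate complexity (bp) from the ratio of C0t1/2 values."""
    return E_COLI_COMPLEXITY_BP * cot_half_sample / cot_half_ecoli

# Hypothetical measurements: the sample reassociates 100 times more slowly
# than E. coli DNA, so its complexity is estimated to be 100 times greater.
estimate = complexity_from_cot(cot_half_sample=400.0, cot_half_ecoli=4.0)
print(f"{estimate:.2e} bp")
```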

Factors that affect the denaturation and renaturation of nucleic acid duplexes.

Parameter           Effect on Tm                                       Effect on rate of renaturation
base composition    Tm increases with increasing %G-C                  no effect
hybrid length       <150 bp: Tm increases with increasing length;      rate increases with increasing length
                    >500 bp: no effect
ionic strength      Tm increases with increasing [Na+]                 optimal at 1.5 M Na+
% bp mismatch       Tm decreases with increasing % mismatch            rate decreases with increasing % mismatch
DNA concentration   no effect                                          rate increases with increasing [DNA]
denaturing agents   Tm decreases with increasing [formamide], [urea]   optimal at 50% formamide
temperature         not applicable                                     optimal at 20°C below Tm
MICRO RNA

Micro RNAs (miRNA) are single-stranded RNA molecules of 21-23 nucleotides in length, which regulate gene
expression. miRNAs are encoded by genes from whose DNA they are transcribed but miRNAs are not

translated into protein (non-coding RNA); instead each primary transcript (a pri-miRNA) is processed into a
short stem-loop structure called a pre-miRNA and finally into a functional miRNA. Mature miRNA molecules
are partially complementary to one or more messenger RNA (mRNA) molecules, and their main function is to
down-regulate gene expression.

The function of miRNAs appears to be in gene regulation.

For that purpose, a miRNA is complementary to a part of one or more messenger RNAs (mRNAs). Animal
miRNAs are usually complementary to a site in the 3' UTR whereas plant miRNAs are usually complementary
to coding regions of mRNAs.

 The annealing of the miRNA to the mRNA then inhibits protein translation, but sometimes facilitates
cleavage of the mRNA. This is thought to be the primary mode of action of plant miRNAs. In such cases, the
formation of the double-stranded RNA through the binding of the miRNA triggers the degradation of the
mRNA transcript through a process similar to RNA interference (RNAi), though in other cases it is believed
that the miRNA complex blocks the protein translation machinery or otherwise prevents protein translation
without causing the mRNA to be degraded. miRNAs may also target methylation of genomic sites which
correspond to targeted mRNAs. miRNAs function in association with a complement of proteins collectively
termed the miRNP.

dsRNA can also activate gene expression, a mechanism that has been termed "small RNA-induced gene
activation" or RNAa.

dsRNAs targeting gene promoters can induce potent transcriptional activation of associated genes. This was
demonstrated in human cells using synthetic dsRNAs termed small activating RNAs (saRNAs), but has also
been demonstrated for endogenous microRNA.

miRNA and disease

Just as miRNA is involved in the normal functioning of eukaryotic cells, so dysregulation of miRNA has been
associated with disease.

miRNA and cancer: Several miRNAs have been found to have links with some types of cancer.

A study of mice altered to produce excess c-myc — a protein implicated in several cancers — shows that
miRNA has an effect on the development of cancer. Mice that were engineered to produce a surplus of types
of miRNA found in lymphoma cells developed the disease within 50 days and died two weeks later. In
contrast, mice without the surplus miRNA lived over 100 days.

Another study found that two types of miRNA inhibit the E2F1 protein, which regulates cell proliferation.
miRNA appears to bind to messenger RNA before it can be translated to proteins that switch genes on and off.

RNA Interference
One of the first indications of this phenomenon occurred in Richard Jorgensen's attempt to genetically engineer more vividly purple petunias by introducing extra copies of the gene that directs the synthesis of the purple pigment. Surprisingly, the resulting transgenic plants had variegated and often entirely white flowers. Apparently, the purple-making genes somehow switched each other off. Injecting sense RNA (RNA with the same sequence as an mRNA) into the nematode C. elegans also blocks protein production. Since the added RNA somehow interferes with gene expression, this phenomenon is known as RNA interference [RNAi; posttranscriptional gene silencing (PTGS) in plants].
Andrew Fire and Craig Mello showed that double-stranded RNA (dsRNA) was substantially more effective in
causing RNAi in C. elegans than were either of its component strands alone. RNAi is induced by only a few
molecules of dsRNA per affected cell, suggesting that RNAi is a catalytic rather than a stoichiometric effect.
1. The trigger dsRNA, as Phillip Zamore discovered, is chopped up into ~21- to 23-nt-long double-stranded fragments known as small interfering RNAs (siRNAs), each of whose strands has a 2-nt overhang at its 3' end and a 5' phosphate. This reaction is mediated by an ATP-dependent RNase named Dicer, a homodimer of 2249-residue subunits that is a member of the RNase III family of double-strand-specific RNA endonucleases.
2. An siRNA is transferred to a 250- to 500-kD multi-subunit complex known as the RNA-induced silencing complex (RISC), which contains an endoribonuclease that is distinct from Dicer. The antisense strand of the siRNA guides the RISC to an mRNA with the complementary sequence.
3. RISC cleaves the mRNA, probably opposite the bound siRNA. The cleaved mRNA is then further degraded by cellular nucleases, thereby preventing its translation.
RNAi requires that the trigger dsRNA be copied so as to permit the siRNAs to reach sufficient concentrations to cleave the target mRNAs. This amplification process is mediated by an RNA-dependent RNA polymerase (RdRP). Moreover, an siRNA strand can act as a primer for the RdRP-catalyzed synthesis of secondary trigger dsRNA, which is subsequently "diced" to yield secondary siRNAs. Since the secondary trigger dsRNA may extend beyond the sequence complementary to the original trigger dsRNA, some of the resulting secondary siRNAs may be complementary to segments of the target mRNA that lie outside the original trigger sequence. This would cause the silencing of genes with segments resembling portions of the original target mRNA but with no similarity to any portion of the original trigger dsRNA, a phenomenon known as transitive RNAi.
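The way the antisense (guide) strand selects its target in step 2 can be sketched as a search for the guide's reverse complement in the mRNA. This is a simplified illustration only: the sequences and function names are hypothetical, the guide is shortened to 12 nt for readability (real siRNAs are about 21 to 23 nt), and perfect complementarity is assumed.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))

def find_target_sites(mrna, guide):
    """Return 0-based start positions in mrna (5'->3') that are
    perfectly complementary to the guide strand."""
    site = reverse_complement(guide)
    mrna = mrna.upper()
    positions = []
    start = mrna.find(site)
    while start != -1:
        positions.append(start)
        start = mrna.find(site, start + 1)
    return positions

# Hypothetical mRNA and guide strand.
mrna = "AUGGCUACGUUAGCCGAUAGCUAA"
guide = "AUCGGCUAACGU"
print(find_target_sites(mrna, guide))  # [6]
```

An mRNA without a complementary site returns an empty list, which corresponds to a transcript that this siRNA would not silence.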

RIBOZYME
A ribozyme (from ribonucleic acid enzyme, also called RNA enzyme or catalytic RNA) is an RNA molecule that catalyzes a chemical reaction. Many natural ribozymes catalyze either the hydrolysis of one of their own phosphodiester bonds or the hydrolysis of bonds in other RNAs, but they have also been found to catalyze the peptidyl transferase activity of the ribosome.
Discovery: The first ribozymes were discovered in the 1980s by Thomas R. Cech, who was studying RNA splicing in the ciliated protozoan Tetrahymena thermophila, and Sidney Altman, who was working on the bacterial RNase P complex. These ribozymes were found in the intron of an RNA transcript, which removed itself from the transcript, as well as in the RNA component of the RNase P complex, which is involved in the maturation of pre-tRNAs.

Since Cech's and Altman's discovery, other investigators have discovered other examples of self-cleaving
RNA or catalytic RNA molecules. Many ribozymes have either a hairpin – or hammerhead – shaped active
center and a unique secondary structure that allows them to cleave other RNA molecules at specific
sequences. It is now possible to make ribozymes that will specifically cleave any RNA molecule. These RNA
catalysts may have pharmaceutical applications. For example, a ribozyme has been designed to cleave the
RNA of HIV. If such a ribozyme was made by a cell, all incoming virus particles would have their RNA
genome cleaved by the ribozyme, which would prevent infection.

Although most ribozymes are quite rare in the cell, their roles are sometimes essential to life. For example, the
functional part of the ribosome, the molecular machine that translates RNA into proteins, is fundamentally a
ribozyme. Ribozymes often have divalent metal ions such as Mg2+ as cofactors.

Known ribozymes
Naturally occurring ribozymes include:
 Peptidyl transferase 23S rRNA
 RNase P
 Group I and Group II introns
 Hairpin ribozyme
 Hammerhead ribozyme

Ribonuclease P
All living things synthesize an enzyme — called Ribonuclease P (RNase P) — that cleaves the head (5') end of
the precursors of transfer RNA (tRNA) molecules.

In bacteria, ribonuclease P is a heterodimer containing a molecule of RNA and one of protein.
Separated from each other, the RNA retains its ability to catalyze the cleavage step (although less efficiently
than the intact dimer), but the protein alone cannot do the job.

Group I Introns
Some ribosomal RNA (rRNA) genes contain introns that must be spliced out to make the final product. Examples are seen:
 in the mitochondrial genome of certain fungi (e.g., yeast)
 in some chloroplast genomes
 in the nuclear genome of some "lower" eukaryotes, for example
o the ciliated protozoan Tetrahymena thermophila
o the plasmodial slime mold Physarum polycephalum
The splicing reaction is self-contained; that is, the intron - with the help of associated proteins - splices itself
out of the precursor RNA.
Once excision of the intron and splicing of the adjacent exons are completed, the story is over. In other words,
although the action is catalyzed by the RNA, only a single molecule of substrate is involved (unlike protein
enzymes that repeatedly catalyze a reaction).
However, synthetic versions of Group I introns made in the laboratory can - in vitro - act repeatedly; that is,
like true enzymes.
The DNA of some Group I introns includes an open reading frame (ORF) that encodes a transposase-like
protein that can make a copy of the intron and insert it elsewhere in the genome.
All the Group I introns share a characteristic secondary structure and mode of action that distinguishes them
from the next group.

Group II Introns
Some messenger RNA (mRNA) genes
 in the mitochondrial genome of yeast and other fungi (encoding the proteins cytochrome b and
subunits of cytochrome c oxidase)
 in some chloroplast genomes
also contain self-splicing introns.
Because their secondary structure and the details of the splicing reaction differ from the rRNA introns discussed
above, these are called Group II introns. The DNA of some Group II introns also includes an open reading
frame (ORF) that encodes a transposase-like protein that can make a copy of the intron and insert it elsewhere
in the genome.
Spliceosomes
Spliceosomes remove introns and splice the exons of most nuclear genes. They are composed of 5 kinds of
small nuclear RNA (snRNA) molecules and a large number of protein molecules. It is the snRNA — not the
protein — that catalyzes the splicing reactions.
The molecular details of the reactions are similar to those of Group II introns, and this has led to speculation
that this splicing machinery evolved from them.

Viroids
Viroids are:
 RNA molecules that infect plant cells as conventional viruses do, but
o are far smaller (one has only 246 nucleotides)
o are naked; that is, they are not encased in a capsid.
 Some viroidlike molecules get into the cell as passengers inside a conventional plant virus. These
are called virusoids or viroidlike satellite RNAs.
 In both cases, the molecules consist of
o single-stranded RNA whose
o ends are covalently bonded to form a circle.
o There are several regions where base-pairing occurs across adjacent portions of the molecule.
 New viroids and virusoids are synthesized by the host cell as long precursors in which the viroid
structure is tandemly repeated.
 These repeats must be cut out and ligated to form the final product.
 Most virusoids and at least one viroid are self-splicing; that is they can cut themselves out of the
precursor and ligate their ends without the aid of any host enzymes.
 Thus they represent another class of ribozyme.
Both viroids and virusoids are responsible for a number of serious diseases of economically important plants;
e.g. the coconut palm and chrysanthemums.

CHAPTER: 2 ENZYMES INVOLVED IN REPLICATION

Topoisomerase

Topoisomerases are enzymes that introduce positive or negative supercoils in closed, circular duplex
DNA.
They are enzymes that act on the topology of DNA. The double-helical configuration that DNA strands
naturally reside in makes them difficult to separate, and yet they must be separated by helicase proteins if
other enzymes are to transcribe the sequences that encode proteins, or if chromosomes are to be replicated.
In so-called circular DNA, in which double helical DNA is bent around and joined in a circle, the two strands are
topologically linked, or knotted. Otherwise identical loops of DNA having different numbers of twists are
topoisomers, and cannot be interconverted by any process that does not involve the breaking of DNA strands.
Topoisomerases catalyze and guide the unknotting of DNA.
The insertion of viral DNA into chromosomes and other forms of recombination can also require the action
of topoisomerases.
►Many drugs operate through interference with the topoisomerases. The broad-spectrum fluoroquinolone
antibiotics act by disrupting the function of bacterial type II topoisomerases.
►Some chemotherapy drugs work by interfering with topoisomerases in cancer cells: type 1 is inhibited by
irinotecan and topotecan, while type 2 is inhibited by etoposide and teniposide.

Type I topoisomerases
►Both type I and type II topoisomerases change the supercoiling of DNA. Type I topoisomerases function by
nicking one of the strands of the DNA double helix, twisting it around the other strand, and re-ligating the
nicked strand. This is not an active process in the sense that energy in the form of ATP is not spent by the
topoisomerase during uncoiling of the DNA; rather, the torque present in the DNA drives the uncoiling.
►Type I enzymes can be further subdivided into type IA and type IB, based on their chemistry of action. Type
IA topoisomerases change the linking number of a circular DNA strand by units of strictly 1, whereas Type IB
topoisomerases change the linking number by multiples of 1.
►All topoisomerases form a phosphotyrosine intermediate between the catalytic tyrosine of the enzyme and
the scissile phosphoryl of the DNA backbone. Type IA topoisomerases form a covalent linkage between the
catalytic tyrosine and the 5’-phosphoryl while type IB enzymes form a covalent 3’-phosphotyrosine
intermediate. Apart from these similarities, they have very different mechanisms of action, have different
crystal structures and appear not to have similar evolutionary ancestors. Type IB topoisomerases are
specifically inhibited by the quinoline-based alkaloid camptothecin (a product of the Chinese tree Camptotheca
acuminata), the only known naturally occurring topoisomerase IB inhibitor.

Type II topoisomerases
Type II topoisomerases cut both strands of the DNA helix simultaneously. Once cut, the ends of the DNA are
separated, and a second DNA duplex is passed through the break. Following passage, the cut DNA is resealed.
This reaction allows type II topoisomerases to increase or decrease the linking number of a DNA loop by 2
units, and promotes chromosome disentanglement. For example, DNA gyrase, a type II topoisomerase
observed in E. coli and most other prokaryotes, introduces negative supercoils and decreases the linking
number by 2. Gyrase also is able to remove knots from the bacterial chromosome.
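
The linking-number bookkeeping described for the two enzyme classes can be sketched in a few lines of Python. This is only an illustration: the function name `apply_topoisomerase` and the starting value Lk = 500 are assumptions made for the example, not part of these notes. The step sizes (1 for type I, 2 for type II) are the ones given in the text.

```python
# Illustrative sketch: how each catalytic step changes the linking number (Lk).
# Step sizes follow the text: type I changes Lk by 1, type II by 2.
STEP_SIZE = {"I": 1, "II": 2}

def apply_topoisomerase(lk, enzyme_type, direction, steps=1):
    """Return the linking number after `steps` catalytic events.

    direction is +1 (Lk increases) or -1 (Lk decreases).
    """
    if enzyme_type not in STEP_SIZE:
        raise ValueError("enzyme_type must be 'I' or 'II'")
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    return lk + direction * STEP_SIZE[enzyme_type] * steps

# Hypothetical closed circular DNA with Lk = 500 (assumed starting value).
lk_start = 500
# Gyrase (type II) introducing negative supercoils: three steps lower Lk by 6.
print(apply_topoisomerase(lk_start, "II", -1, steps=3))  # 494
```

Because a type II enzyme always moves Lk in steps of 2, an odd change in linking number cannot be produced by a type II enzyme alone.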
►There are two subclasses of type II topoisomerases, type IIA and IIB.
Type IIA topoisomerases include the enzymes DNA gyrase, eukaryotic topoisomerase II, and bacterial
topoisomerase IV. Type IIB topoisomerases are structurally and biochemically distinct, and comprise a single
family member, topoisomerase VI. Type IIB topoisomerases are found in archaea and some higher plants.
►In cancers, topoisomerase II alpha is highly expressed in rapidly proliferating cells. In certain cancers,
such as peripheral nerve sheath tumors, high expression of the protein is also associated with poor
patient survival.

Topoisomerases of E. coli and eukaryotes

                Class I topoisomerase                   Class II topoisomerase
Cleavage        One strand                              Both strands
ΔL              Steps of 1                              Steps of 2
Mechanism       Enzyme binds noncovalently to DNA       Enzyme binds noncovalently to DNA
                ↓                                       ↓
                One strand cleaved                      Both strands cleaved
                ↓                                       ↓
                5’ PO4 group covalently attached        5’ PO4 group covalently bound
                to active site                          to active site
                ↓                                       ↓
                Intact strand passed                    Intact duplex passed
                ↓                                       ↓
                Broken strand religated                 Broken duplex religated and DNA released
ATP required    No                                      Yes
E. coli         Topo I (ω protein) and Topo III:        Gyrase (topo II): introduces negative
                relax negative supercoils               supercoils
Eukaryotes      Topoisomerase I: relaxes positive       Topoisomerase II: relaxes positive and
                and negative supercoils.                negative supercoils.
                Topoisomerase III: relaxes negative     Topoisomerase IV: relaxes negative and
                supercoils, no decatenation.            probably also positive supercoils

Gyrase

The current mechanochemical model of Gyrase activity:


►DNA gyrase, often referred to simply as gyrase, is a type II topoisomerase that introduces negative
supercoils (or relaxes positive supercoils) into DNA by looping the template so as to form a crossing, then
cutting one of the double helices and passing the other through it before resealing the break, changing the
linking number by two in each enzymatic step.
►The unique ability of gyrase to introduce negative supercoils into DNA is what allows bacterial DNA to have
free negative supercoils. The ability of gyrase to relax positive supercoils comes into play during DNA
replication. The right-handed nature of the DNA double helix causes positive supercoils to accumulate ahead
of a translocating enzyme, in the case of DNA replication, a DNA polymerase.
The ability of gyrase (and topoisomerase IV) to relax positive supercoils allows superhelical tension ahead of
the polymerase to be released so that replication can continue.

Inhibition by Antibiotics
Gyrase is found in bacteria and plants, but not in humans. This makes gyrase a good target for antibiotics.
Two classes of antibiotics that inhibit gyrase are the coumarins (including novobiocin), and the quinolones
(including nalidixic acid and ciprofloxacin).
DNA gyrase is a member of a class of enzymes called type II topoisomerases. Quinolones bind these enzymes
and prevent them from decatenating replicating DNA. Quinolone-resistant bacteria frequently harbor
mutated topoisomerases which resist quinolone binding.

Helicase
Helicases are a class of enzymes vital to all living organisms. They are motor proteins that translocate
unidirectionally along single-stranded nucleic acids using energy derived from nucleotide hydrolysis, often
separating the two strands of a nucleic acid double helix (DNA, RNA, or an RNA-DNA hybrid) in the process.

Function
Many cellular processes (DNA replication, RNA transcription, DNA recombination, DNA repair) involve
separation of the nucleic acid strands. There are many helicases (14 identified so far in E. coli, 24 in human
cells) resulting from the great variety of processes in which strand separation must be catalyzed.
All helicases separate the strands of a double helix using the energy derived from ATP hydrolysis. They move
along one strand with directionality specific to each enzyme (3’-5’ or 5’-3’) while separating the strands.

Structural Features
The common features shared by helicases account for the fact that they display amino acid sequence
homology to a certain degree: they all possess common sequence motifs located in the central part of their
sequence. These are thought to be specifically involved in ATP binding, ATP hydrolysis and translocation on
the nucleic acid template. The variable part of the amino acid sequence is related to the specific features of
each helicase. Based on the presence of the so-called helicase motifs, it is possible to attribute a putative
helicase activity to a given protein. However, the presence of these motifs does not necessarily imply that the
protein indeed possesses helicase activity. Based on the presence and the form of the helicase motifs,
helicases have been separated into 4 superfamilies and 2 smaller families. Some members of these families are
indicated, with the organisms from which they were isolated and their functions.
REPLICATIVE HELICASES AND THEIR FUNCTION:
Helicase                Polarity and functions
DnaB protein            5’ → 3’ polarity. Major helicase in E. coli chromosomal replication.
PriA protein (n’, y)    3’ → 5’ polarity. Component of the primosome.
Rep                     3’ → 5’ polarity. Absolutely required for rolling circle replication.
DNA helicase A          Enzymes isolated from yeast and calf thymus. Copurifies with DNA polymerase α-primase.
DNA helicase δ          5’ → 3’ polarity. Copurifies with DNA polymerase δ.
Helicases adopt different structures and oligomerization states. Whereas DnaB-like helicases unwind DNA as
donut shaped hexamers, other enzymes have been shown to be active as monomers or dimers. Their precise
mechanisms of action are still unclear.

Single-stranded Binding Proteins (SSB)
              E. coli                                     Eukaryotes
Protein       SSB                                         RP-A (replication protein A) or HSSB (human SSB)
Structure     Homotetramer of 19 kDa subunits             Heterotrimer of 70 kDa, ~30 kDa and ~15 kDa subunits
Function      Stabilizes SS regions during replication,   Stabilizes SS regions during replication,
              recombination and repair.                   recombination and repair.
              Directs priming to origin of M13-related    Interacts with pol α:primase to prevent
              genomes.                                    nonspecific priming events.
              Associates with PriB in primosome           Interacts with transcription factors, repair
              complex.                                    proteins (XP-A) and several helicases.
Properties    ssDNA-specific but no sequence              ssDNA-specific, partial sequence specificity.
              specificity. Cooperative binding.           Activity modulated by phosphorylation.
Genes         ssb                                         RP-A1, RP-A2, RP-A3

DNA polymerase

Comparison of the properties of the DNA polymerases of E. coli
                      Pol I                      Pol II                                          Pol III
5’ → 3’ Polymerase    Yes                        Yes                                             Yes
3’ → 5’ Exonuclease   Yes                        Yes                                             Yes
5’ → 3’ Exonuclease   Yes                        No                                              No
Structure             Polypeptide                Polypeptide                                     Multimeric complex
Function              Repair, primer excision    Error-prone repair polymerase (SOS inducible)   Principal replication polymerase

Subunits of E. coli Pol III holoenzyme
Core subunits
α              5’→3’ polymerase activity, required for DNA synthesis.
ε              3’→5’ exonuclease activity, required for proofreading.
θ              Function unknown.
Accessory subunits
τ              DNA-dependent ATPase, required for initiation.
γ              DNA-dependent ATPase forming the γ complex (with 4 peptides); facilitates β subunit binding.
δ, δ’, χ, ψ    Form the γ complex required for loading and unloading the β subunit.
β              ‘Sliding clamp’; forms the preinitiation complex with DNA, a process which requires the
               ATP-dependent activity of the γ complex.

EUKARYOTIC DNA POLYMERASES
Mammalian name        α                        β                  γ                     δ                                 ε
Yeast name            Pol 1                    Pol 4              Pol M                 Pol 3                             Pol 2
Yeast gene            POL1                     POL4               MIP1                  POL3                              POL2
Location              Nuclear                  Nuclear            Mitochondrial         Nuclear                           Nuclear
No. of subunits       4                        1                  2                     2                                 >1
5’→3’ Polymerase      Yes                      Yes                Yes                   Yes                               Yes
3’→5’ Exonuclease     No                       No                 Yes                   Yes                               Yes
Primase               Yes                      No                 No                    No                                No
Associated factors    None                     None               None                  PCNA                              None
Processivity          Moderate                 Low                High                  High with PCNA                    High
Function              Lagging strand priming   Repair polymerase  Organelle polymerase  Principal replicative polymerase  Unknown

►DNA polymerase is an enzyme that assists in DNA replication. Such enzymes catalyze the polymerization of
deoxyribonucleotides alongside a DNA strand, which they "read" and use as a template. The newly-
polymerized molecule is complementary to the template strand and identical to the template’s partner
strand.
►All DNA polymerases synthesize DNA in the 5’ to 3’ direction. No known DNA polymerase is able to begin
a new chain (de novo). They can only add a nucleotide onto a preexisting 3’-OH group. For this reason, DNA
polymerase needs a primer at which it can add the first nucleotide. Primers consist of RNA and DNA bases
with the first two bases always being RNA, and are synthesized by another enzyme called primase. An enzyme
known as a helicase is required to unwind DNA from a double-strand structure to a single-strand structure to
facilitate replication of each strand consistent with the semiconservative model of DNA replication.
►DNA polymerases have a highly conserved structure, which means that their overall catalytic subunits vary,
on the whole, very little from species to species. Conserved structures usually indicate evolutionary advantages.
DNA polymerase is considered to be a holoenzyme since it requires a magnesium ion as a cofactor to function
properly. In the absence of the magnesium ion, it is referred to as an apoenzyme.
►Error correction is a property of some, but not all, DNA polymerases. This process corrects mistakes in
newly-synthesized DNA. When an incorrect base pair is recognized, DNA polymerase reverses its direction by
one base pair of DNA. The 3’->5’ exonuclease activity of the enzyme allows the incorrect base pair to be
excised (this activity is known as proofreading). Following base excision, the polymerase can re-insert the
correct base and replication can continue.
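
The proofreading cycle described above (add a base, detect a mismatch, back up one position, excise, retry) can be sketched as a small deterministic simulation. The function name `synthesize_with_proofreading` and the idea of feeding in a list of "attempted" bases are assumptions made purely for illustration; real polymerases select nucleotides stochastically.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def synthesize_with_proofreading(template_3to5, attempts):
    """Build a new strand 5'->3' from a list of attempted bases.

    Each attempted base is added at the 3' end. If it does not pair with the
    template position it faces, it is excised (the 3'->5' exonuclease step)
    and the next attempt is tried at the same position.
    Returns (new_strand, number_of_excisions).
    """
    strand = []
    excisions = 0
    for base in attempts:
        if len(strand) == len(template_3to5):
            break  # template fully copied
        strand.append(base)  # polymerase adds a nucleotide
        expected = COMPLEMENT[template_3to5[len(strand) - 1]]
        if base != expected:
            strand.pop()  # back up one base and excise the mismatch
            excisions += 1
    return "".join(strand), excisions

# Template written 3'->5'; the fourth attempt (G opposite G) is a mismatch.
print(synthesize_with_proofreading("TACG", ["A", "T", "G", "G", "C"]))  # ('ATGC', 1)
```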
►Some viruses also encode special DNA polymerases which may selectively replicate viral DNA through a
variety of mechanisms. Retroviruses encode an unusual DNA polymerase called reverse transcriptase, which is
an RNA-dependent DNA polymerase (RdDp). It polymerizes DNA from a template of RNA.

DNA polymerase families


Based on sequence homology, DNA polymerases can be further subdivided into seven different families A, B,
C, D, X, Y, and RT.
►Family A
Family A polymerases contain both replicative and repair polymerases. Replicative members from this family
include the extensively studied T7 DNA polymerase as well as the eukaryotic mitochondrial DNA Polymerase γ.
Among the repair polymerases are E. coli DNA pol I, Thermus aquaticus pol I, and Bacillus stearothermophilus
pol I. These repair polymerases are involved in excision repair and processing of Okazaki fragments generated
during lagging strand synthesis.
►Family B
Family B polymerases mostly contain replicative polymerases and include the major eukaryotic DNA
polymerases α, δ, ε, and also DNA polymerase ζ. Family B also includes DNA polymerases encoded by some
bacteria and bacteriophages, of which the best characterized are from T4, Phi29 and RB69 bacteriophages.
These enzymes are involved in both leading and lagging strand synthesis.
A hallmark of the B family of polymerases is remarkable accuracy during replication, and many have strong
3’-5’ exonuclease activity (except DNA polymerases α and ζ, which have no proofreading activity).
►Family C
Family C polymerases are the primary bacterial chromosomal replicative enzymes and thus have polymerase
and 3’-5’ exonuclease activity.

►Family D
Family D polymerases are still not very well characterized. All known examples are found in the Euryarchaeota
subdomain of Archaea and are thought to be replicative polymerases.
►Family X
Family X contains the well known eukaryotic polymerase pol β as well as other eukaryotic polymerases such as
pol σ, pol λ, pol μ, and terminal deoxynucleotidyl transferase (TdT).
Pol β is required for short-patch base excision repair, a DNA repair pathway that is essential for repairing
abasic sites. Pol λ and Pol μ are involved in non-homologous end joining, a mechanism for rejoining DNA
double-strand breaks. TdT is expressed only in lymphoid tissue and adds "N nucleotides" to double-strand
breaks formed during V(D)J recombination to promote immunological diversity.
►Family Y
The Y-family polymerases differ from others in having low fidelity on undamaged templates and in their
ability to replicate through damaged DNA. Members of this family are hence called translesion synthesis (TLS)
polymerases. Depending on the lesion, TLS polymerases can bypass the damage in an error-free or error-prone
fashion, the latter resulting in elevated mutagenesis.
►Xeroderma pigmentosum variant (XPV) patients, for instance, have mutations in the gene encoding Pol η
(eta), which is error-free for UV lesions. In XPV patients, alternative error-prone polymerases, e.g. Pol ζ (zeta;
a B-family polymerase), are thought to be involved in mistakes which result in the cancer
predisposition of these patients. Other members in humans are Pol ι (iota), Pol κ (kappa) and Rev1 (terminal
deoxycytidyl transferase). In E. coli, two TLS polymerases, Pol IV (DinB) and Pol V (UmuC), are known.
►Family RT
Finally, the reverse transcriptase family contains examples from both retroviruses and eukaryotes.
The eukaryotic polymerases are usually restricted to telomerases. These polymerases use an RNA template to
synthesize the DNA strand.

►Prokaryotic DNA polymerases


Bacteria have 5 known DNA polymerases:
 Pol I: is implicated in DNA repair and has both 5’->3’ (nick translation) and 3’->5’ (proofreading)
exonuclease activity.
 Pol II: is involved in replication of damaged DNA and has both 5’->3’ chain extension ability and 3’->5’
exonuclease activity.
 Pol III: is the main polymerase in bacteria (it elongates in DNA replication); as such it has 3’->5’ exonuclease
proofreading ability.
 Pol IV: is a Y-family DNA polymerase

 Pol V: is a Y-family DNA polymerase and participates in bypassing DNA damage

Eukaryotic DNA polymerases


Eukaryotes have at least 15 DNA Polymerases
 Pol α: acts as a primase (synthesizing an RNA primer), and then as a DNA Pol elongating that primer with
DNA nucleotides. After a short stretch of DNA has been added, elongation is taken over by Pol δ and ε.
 Pol β: is implicated in repairing DNA.
 Pol γ: replicates mitochondrial DNA.
 Pol δ: is the main polymerase in eukaryotes, it is highly processive and has 3’->5’ exonuclease activity.
 Pol ε: may substitute for Pol δ in lagging strand synthesis, however the exact role is uncertain.
 η, ι, κ, and Rev1 are Y-family DNA polymerases and Pol ζ is a B-family DNA polymerase. These
polymerases are involved in the bypass of DNA damage.
 There are also other eukaryotic polymerases known, which are not as well characterized: θ, λ, φ, σ, and μ.
There are also others, but the nomenclature has become quite jumbled.
►None of the eukaryotic polymerases can remove primers (5’->3’ exonuclease activity); that function is
carried out by other enzymes. Only the polymerases that deal with elongation (γ, δ and ε) have
proofreading ability (3’->5’ exonuclease).

RNA polymerase
Control of transcription
►RNA polymerase (RNAP or RNA pol) is an enzyme responsible for making RNA from a DNA or RNA template.
RNAP accomplishes this task by constructing RNA chains through a process termed transcription. In scientific
terms, RNAP is a nucleotidyl transferase that polymerizes ribonucleotides at the 3’ end of an RNA transcript.
RNA polymerase enzymes are essential and are found in all organisms, cells, and many viruses.

►RNAP was discovered independently by Sam Weiss and Jerard Hurwitz in 1960. By this time the 1959
Nobel Prize had been awarded to Severo Ochoa for the discovery of what was believed to be RNAP, but
instead turned out to be a ribonuclease.

Essential Subunit Of Human RNA Polymerases I, II and III


►Control of the process of transcription affects patterns of gene expression and thereby allows a cell to adapt
to a changing environment, perform specialized roles within an organism, and maintain basic metabolic
processes necessary for survival. Therefore, it is hardly surprising that the activity of RNAP is both complex
and highly regulated. In E. coli bacteria, more than 100 factors have been identified which modify the activity
of RNAP.

►RNAP can initiate transcription at specific DNA sequences known as promoters. It then produces an RNA
chain which is complementary to the DNA strand used as a template. The process of adding nucleotides to the
RNA strand is known as elongation, and in eukaryotes RNAP can build chains as long as 2.4 million nucleotides
(the full length of the dystrophin gene). RNAP will preferentially release its RNA transcript at specific DNA
sequences encoded at the end of genes known as terminators.
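
The statement that the RNA chain is complementary to the template strand can be illustrated with a short sketch. The function name `transcribe` and the example sequence are illustrative assumptions. Promoter recognition and termination are deliberately left out.

```python
# Base-pairing rules for a DNA template base -> RNA transcript base.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_5to3):
    """Return the RNA (written 5'->3') complementary to a DNA template
    strand that is given 5'->3'."""
    # RNAP reads the template 3'->5', so walk the template from its 3' end;
    # each new ribonucleotide is added to the 3' end of the growing RNA.
    return "".join(DNA_TO_RNA[base] for base in reversed(template_5to3.upper()))

print(transcribe("TACGGT"))  # ACCGUA
```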

►Some RNA molecules produced by RNAP will serve as templates for the synthesis of proteins by the
ribosome. Others can fold into enzymatically active ribozymes or tRNA molecules. A third option is that an
RNA molecule will serve a purely regulatory role to control future gene expression (for example, siRNA).
►RNAP accomplishes de novo synthesis. It is able to do this because specific interactions with the initiating
nucleotide hold RNAP rigidly in place, facilitating chemical attack on the incoming nucleotide. Such specific
interactions explain why RNAP prefers to start transcripts with ATP (followed by GTP, UTP, and then CTP). In
contrast to DNA polymerase, RNAP includes a helicase activity, therefore no separate enzyme is needed to
unwind DNA.

RNA polymerase in bacteria


In bacteria, the same enzyme catalyzes the synthesis of three types of RNA: mRNA, rRNA and tRNA.
RNAP is a relatively large molecule. The core enzyme has 5 subunits (~400 kDa):
• α2: the two α subunits assemble the enzyme and recognize regulatory factors.
• β: this has the polymerase activity (catalyzes the synthesis of RNA).
• β’: binds to DNA (nonspecifically).
• ω: function not clearly known; however, it has been observed to offer a protective/chaperone function to
the β’ subunit in M. smegmatis.
In order to bind promoter-specific regions, the core enzyme requires another subunit, sigma (σ). The sigma
factor greatly reduces the affinity of RNAP for nonspecific DNA while increasing specificity for certain
promoter regions, depending on the sigma factor.
►The complete holoenzyme therefore has 6 subunits: α2ββ’σω (~480 kDa). The structure of RNAP exhibits a
groove with a length of 55 Å (5.5 nm) and a diameter of 25 Å (2.5 nm). This groove fits well the 20 Å (2 nm)
double strand of DNA. The 55 Å (5.5 nm) length can accept 16 nucleotides.
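
The figure of 16 nucleotides for a 55 Å groove can be checked with simple arithmetic, assuming the standard B-form DNA rise of about 3.4 Å per base pair. That value is an assumption brought in for this check and is not stated in the text above.

```python
RISE_PER_BP_ANGSTROM = 3.4      # assumed rise per base pair in B-form DNA
GROOVE_LENGTH_ANGSTROM = 55.0   # groove length quoted in the text

def base_pairs_in_length(length_angstrom, rise=RISE_PER_BP_ANGSTROM):
    """Number of whole base pairs that fit along a given length."""
    return int(length_angstrom // rise)

print(base_pairs_in_length(GROOVE_LENGTH_ANGSTROM))  # 16
```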
►When not in use, RNA polymerase binds to low-affinity sites to allow rapid exchange for an active promoter
site when one opens. RNA polymerase holoenzyme, therefore, does not float freely around in the cell when
not in use.

RNA polymerase in eukaryotes


Eukaryotes have several types of RNAP:
• RNA polymerase I synthesizes a 45S pre-rRNA, which matures into the 28S, 18S and 5.8S rRNAs that form
the major RNA sections of the ribosome.

• RNA polymerase II synthesizes precursors of mRNAs and most snRNAs. This is the most studied type, and due
to the high level of control required over transcription, a range of transcription factors is required for its
binding to promoters.
• RNA polymerase III synthesizes tRNAs, 5S rRNA and other small RNAs found in the nucleus and cytosol.
Other RNA polymerase types are found in mitochondria and chloroplasts.

RNA polymerase in archaea


Archaea have a single form of RNAP that is closely related to the three main eukaryotic polymerases. It has
been speculated that the archaeal polymerase resembles the ancestor of the specialized eukaryotic
polymerases.

RNA polymerase in viruses


Many viruses also encode RNAP. Perhaps the most widely studied viral RNAP is found in bacteriophage T7.
This single-subunit RNAP is related to that found in mitochondria and chloroplasts, and shares considerable
homology to DNA polymerase. It is believed by many that most viral polymerases therefore evolved from DNA
polymerase and are not directly related to the multi-subunit polymerases described above.
►The viral polymerases are diverse, and include some forms which can use RNA as a template instead of DNA.
This occurs in negative strand RNA viruses and dsRNA viruses, both of which exist for a portion of their life cycle
as double-stranded RNA. However, some positive strand RNA viruses, such as polio, also contain these RNA
dependent RNA polymerases.

Transcriptional cofactors
There are a number of proteins which can bind to RNAP and modify its behavior. For instance, GreA and GreB
from E. coli can enhance the ability of RNAP to cleave the nascent RNA near the growing end of the chain.
This cleavage can rescue a stalled polymerase molecule, and is likely involved in proofreading the occasional
mistakes made by RNAP.
►A separate cofactor, Mfd, is involved in transcription-coupled repair, the process in which RNAP
recognizes damaged bases in the DNA template and recruits enzymes to restore the DNA. Other cofactors
are known to play regulatory roles, i.e. they help RNAP choose whether or not to express certain genes.

LIGASE
In molecular biology, DNA ligase is a particular type of ligase that can link together DNA strands that have
double-strand breaks (a break in both complementary strands of DNA). The alternative, a single-strand break,
is easily fixed by DNA polymerase using the complementary strand as a template but still requires DNA ligase
to create the final phosphodiester bond to fully repair the DNA.
DNA ligase has applications in both DNA repair and DNA replication. In addition, DNA ligase has extensive use
in molecular biology laboratories for recombination experiments.

Ligase mechanism

The mechanism by which DNA ligase connects broken DNA strands is to form covalent phosphodiester bonds
between the 3’ hydroxyl end of one nucleotide and the 5’ phosphate end of another. ATP is required for the
ligase reaction.
[Figure: a pictorial example of how a ligase works (with sticky ends)]
Ligase will also work with blunt ends, although higher enzyme concentrations and different reaction
conditions are required.
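
Whether two sticky ends can anneal before ligation comes down to complementarity of their single-stranded overhangs. The following is a minimal sketch, assuming both overhangs are of the same type (for example both 5’ overhangs) and both are written 5’ to 3’. The function names are illustrative, not taken from any library.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))

def overhangs_compatible(overhang_a, overhang_b):
    """True if two single-stranded overhangs (both written 5'->3') can base pair."""
    return overhang_a.upper() == reverse_complement(overhang_b)

print(overhangs_compatible("AATT", "AATT"))  # True
print(overhangs_compatible("AATT", "GATC"))  # False
```

Palindromic overhangs such as AATT pair with themselves, which is why fragments cut with the same enzyme can be joined to each other.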

Mammalian ligases
In mammals, there are four specific types of ligase.
• DNA ligase I: ligates Okazaki fragments during lagging strand DNA replication and some recombinant
fragments.
• DNA ligase II: alternatively spliced form of DNA ligase III found in non-dividing cells.
• DNA ligase III: complexes with DNA repair protein XRCC1 to aid in sealing base excision mutations and
recombinant fragments.
• DNA ligase IV: complexes with XRCC4. It catalyzes the final step in the non-homologous end joining DNA
double-strand break repair pathway. It is also required for V(D)J recombination, the process which
generates diversity in immunoglobulin and T-cell receptor loci during immune system development.

Applications in molecular biology research


DNA ligases have become an indispensable tool in modern molecular biology research for generating
recombinant DNA sequences. For example, DNA ligases are used with restriction enzymes to insert DNA
fragments, often genes, into plasmids.
One vital, and often tricky, aspect of performing successful recombination experiments involving ligase is
controlling the temperature. Most experiments use T4 DNA ligase (isolated from bacteriophage T4),
which is most active at 25°C. However, in order to perform successful ligations, the optimal enzyme
temperature needs to be balanced with the melting temperature Tm (also the annealing temperature) of the
DNA fragments being ligated. If the ambient temperature exceeds Tm, pairing of the sticky ends
will not occur because the high temperature disrupts hydrogen bonding. The shorter the DNA fragments, the
lower the Tm. Thus for extremely short fragments on the order of tens of base pairs, ligation experiments are
performed at very low temperatures (~4°C) for a long period of time (often overnight).
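
The dependence of Tm on fragment length can be made concrete with a rough estimate. The sketch below uses the Wallace rule (2 °C per A/T pair, 4 °C per G/C pair), a common rule of thumb for very short duplexes. It is an assumption introduced here for illustration and is not the method used in these notes; real Tm values also depend on salt and DNA concentration.

```python
def wallace_tm(sequence):
    """Rough melting temperature (deg C) of a short DNA duplex, Wallace rule."""
    seq = sequence.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence must contain only A, C, G, T")
    return 2 * at + 4 * gc

# A 4-nucleotide sticky end (AATT) versus a longer GC-rich stretch.
print(wallace_tm("AATT"))      # 8
print(wallace_tm("GGCCGGCC"))  # 32
```

On this estimate, a 4-nucleotide AT-rich overhang melts well below room temperature, which is consistent with running such ligations cold and for a long time.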
The common commercially available DNA ligases were originally discovered in bacteriophage T4, E. coli or
other bacteria.

CHAPTER: 3 DNA REPLICATION


Three models for DNA replication
Matthew Meselson and Franklin Stahl experiment in 1958
– Grow E. coli in the presence of 15N (a heavy isotope of nitrogen) for many generations
• Cells get heavy-labeled DNA
– Switch to medium containing only 14N (a light isotope of nitrogen)
– Collect samples of cells after various times
– Analyze the density of the DNA by centrifugation using a CsCl gradient

[Figure: DNA bands labeled 14N and 15N]
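
What the CsCl gradient is expected to show under the semiconservative model can be worked out generation by generation. The sketch below assumes fully 15N-labeled starting DNA and perfectly synchronous doubling. The function name `density_classes` is illustrative.

```python
from fractions import Fraction

def density_classes(generations):
    """Fractions of heavy (15N/15N), hybrid (15N/14N) and light (14N/14N)
    duplexes after growth in 14N medium, under semiconservative replication."""
    if generations < 0:
        raise ValueError("generations must be non-negative")
    if generations == 0:
        return {"heavy": Fraction(1), "hybrid": Fraction(0), "light": Fraction(0)}
    # The two original heavy strands persist, each in one hybrid duplex,
    # among 2**generations duplexes in total.
    hybrid = Fraction(2, 2 ** generations)
    return {"heavy": Fraction(0), "hybrid": hybrid, "light": 1 - hybrid}

for n in range(4):
    print(n, density_classes(n))
```

After one generation all DNA is of hybrid density, and after two generations half is hybrid and half is light. This is the pattern that distinguishes semiconservative replication from the alternative models.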

1955: Arthur Kornberg Worked with E. coli. Discovered the mechanisms of DNA synthesis.
Four components are required:
1. DNA polymerase (Kornberg enzyme)
from E. coli catalyzes the stepwise addition of deoxyribonucleotides to the 3’-OH end of a DNA chain:

(DNA)n residues + dNTP → (DNA)n+1 residues + PPi
The enzyme has the following requirements:
2. all four dNTPs (dATP, dGTP, dTTP and dCTP) must be present to be used as precursors; Mg 2+ is also
required;
3. a DNA template is essential, to be copied by the DNA polymerase;
4. a primer with a free 3’-OH that the enzyme can extend.

DNA polymerase is a template-directed enzyme; that is, it recognizes the next nucleotide on the DNA
template and then adds a complementary nucleotide to the 3’-OH of the primer, creating a 3’→5’
phosphodiester bond and releasing pyrophosphate. The reaction involves nucleophilic attack of the 3’-OH of
the primer on the α-phosphate group of the incoming nucleotide. The primer is extended in a 5’→3’ direction.
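
Template-directed primer extension can be sketched as follows. For simplicity the primer here is written as DNA, the template is given 3’→5’ so that it lines up with the primer, and one PPi is counted per nucleotide added. The function name `extend_primer` and the example sequences are assumptions for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def extend_primer(template_3to5, primer_5to3):
    """Extend a primer along a template; return (product_5to3, ppi_released)."""
    # The primer must already be base paired with the start of the template,
    # because the enzyme needs a paired 3'-OH end to add to.
    for t, p in zip(template_3to5, primer_5to3):
        if COMPLEMENT[t] != p:
            raise ValueError("primer is not complementary to the template")
    product = primer_5to3
    ppi_released = 0
    # Each cycle adds one complementary nucleotide to the free 3'-OH end
    # and releases one pyrophosphate (PPi).
    for t in template_3to5[len(primer_5to3):]:
        product += COMPLEMENT[t]
        ppi_released += 1
    return product, ppi_released

print(extend_primer("TACGGATC", "ATG"))  # ('ATGCCTAG', 5)
```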
• Initiation of replication, major elements:
 Segments of single-stranded DNA are called template strands.
 Gyrase (a type of topoisomerase) relaxes the supercoiled DNA.
 Initiator proteins and DNA helicase bind to the DNA at the replication fork and untwist the DNA using
energy derived from ATP (adenosine triphosphate). (Hydrolysis of ATP causes a shape change in DNA
helicase.)
 DNA primase next binds to helicase producing a complex called a primosome (primase is required for
synthesis),
 Primase synthesizes a short RNA primer of 10-12 nucleotides, to which DNA polymerase III adds
nucleotides.
 Polymerase III adds nucleotides 5’ to 3’ on both strands beginning at the RNA primer.
 The RNA primer is removed and replaced with DNA by polymerase I, and the gap is sealed with DNA
ligase.
Single-stranded DNA-binding (SSB) proteins (>200) stabilize the single-stranded template DNA during the
process.

The Replication process (A comparative study of eukaryotes and procaryotes)


Initiation of replication is NOT a random process; it always begins at the same specific position or positions,
called origins of replication. A circular bacterial genome has a single origin (ori), whereas eukaryotic
chromosomes have multiple origins.
Initiation at E.coli origin of Replication
OriC spans about 250 bp and contains two short repeat motifs, one of 9 nucleotides and the other of 13 nucleotides.
 Five copies of the 9-nucleotide repeat bind DnaA, resulting in melting (opening) within the tandem array of
three AT-rich 13-nucleotide repeats located at one end of the oriC sequence.
After melting, the first step is attachment of a complex of two proteins, DnaBC, forming the prepriming
complex. DnaC is released soon afterwards; DnaB acts as a helicase, breaking base pairs.

[Figure: DnaA binding causes the region to wrap around the DnaA proteins and separates the AT-rich region; SSB proteins are shown on the separated strands]

Initiation in yeast: replication starts from an ARS (autonomously replicating sequence) less than 200 bp in
length, containing four subdomains: A, B1, B2 and B3.
A and B1 make up the origin recognition sequence (40 bp), which acts as the binding site for the origin
recognition complex (ORC), a set of six proteins that attach to the ARS. The ORC has a key role in the regulation
of DNA replication. B2 corresponds to the 13-nucleotide repeat of E. coli and is the region that melts; this
melting is induced by torsional stress introduced by attachment of a DNA-binding protein, ARS binding factor 1
(ABF1), which attaches to subdomain B3.

Elongation: In E. coli, Pol III is the main polymerizing enzyme; its core has 3 main subunits: α, ε and θ.

Eukaryotes have five main DNA polymerases called α, β, γ, δ and ε. The main replicating enzyme, DNA pol δ, has
two subunits and works in conjunction with an accessory protein called proliferating cell nuclear antigen (PCNA),
which is the equivalent of the β subunit of E. coli DNA pol III, holding the enzyme tightly to the template. Pol α
also has an important function in DNA synthesis, being the enzyme that primes eukaryotic replication.

Discontinuous strand Synthesis and Priming Problem


During DNA replication both strands of the double helix must be copied, but polymerases can synthesize DNA
only in the 5’ → 3’ direction. One parent strand can therefore be copied in a continuous manner (leading
strand), while copying of the other (lagging strand) has to be carried out in a discontinuous fashion. The initial
products of lagging strand replication are short segments of polynucleotide called Okazaki fragments.
[1000-2000 nucleotides in prokaryotes and 100-200 in eukaryotes]
 A polymerase can extend a polynucleotide efficiently only if its 3’ nucleotide is base paired.
 In bacteria, primers are synthesized by primase, a special RNA polymerase unrelated to the transcribing
enzyme, with each primer about 5 nucleotides in length.
 In eukaryotes, primase is an integral part of DNA polymerase α, which synthesizes RNA primers of about 10
nucleotides and then extends each primer by adding about 30 nucleotides of DNA before the main replicative
enzyme takes over.

Event at the bacterial replication fork


After the helicase (DnaB) has bound to the origin, forming the prepriming complex, primase is recruited, resulting
in the primosome, which initiates replication of the leading strand.
[Helicases bind to single-stranded rather than double-stranded DNA, and migrate along the polynucleotide in either
the 5’ → 3’ or the 3’ → 5’ direction, depending on the specificity of the helicase.]

The two separated single strands would, if allowed, immediately re-form base pairs. Single-strand binding proteins
(SSBs), which lack any enzymatic activity, attach to the polynucleotides and prevent them from reassociating. After
1000-2000 nucleotides of the leading strand have been replicated, the first round of discontinuous strand synthesis
on the lagging strand can begin.
► The same DNA Pol III complex that synthesizes the leading strand also extends the lagging strand.
► In practice, a β subunit slides along each of the two strands, and the two α subunits are bound to the γ complex,
which in turn is bound to β.
Here the main function of the γ complex is to interact with the β subunit and hence control the attachment and
removal of the enzyme from the template, a function that is required primarily during lagging-strand
replication, when the enzyme has to attach and detach repeatedly at the start and end of each Okazaki fragment.
The combination of DNA Pol III and the primosome is called the replisome. It migrates along the parent DNA,
carrying out most of the replicative functions. After its passage, the replication process must be completed by
joining up the individual Okazaki fragments. Each Okazaki fragment, however, still has its RNA primer attached at
the point where ligation should take place, and this primer cannot be removed by DNA Pol III, which lacks 5’ → 3’
exonuclease activity. DNA Pol III therefore releases the lagging strand and its place is taken by DNA Pol I, which
removes the primer and extends the adjacent fragment into the region of the template that is exposed. The two newly
synthesized Okazaki fragments now abut. All that remains is for the missing phosphodiester bond to be put in place
by a DNA ligase, linking the two fragments and completing replication of this region of the lagging strand.
• Concurrent Synthesis of Leading and Lagging Strands
• Kornberg model
– Process should be processive, not distributive
– Makes no sense for DNAP molecules to move away from the fork and then have to return
– Has two DNAP III core enzymes connected to each other
• One synthesizes each strand
– One continuously and one using a looping discontinuous method that
produces short Okazaki fragments

Connecting Leading and Lagging Strand Synthesis

The polymerase dimer synthesizes both strands simultaneously, using both a continuous and a looping
discontinuous approach.
The Eukaryotic replication fork

► Which helicase (of several) is responsible for unwinding the DNA has not yet been established.
► The separated polynucleotides are prevented from reattaching by RPA (replication protein A).
► DNA polymerase α can both synthesize an RNA primer and extend this primer with about 30 nucleotides of
DNA, but because it lacks the stabilizing effect of a sliding clamp (equivalent to β of Pol III or PCNA of Pol δ) it
must then be replaced by the main replicating enzyme, DNA polymerase δ.
► The function of the γ complex of the E. coli polymerase is carried out in eukaryotes by a multisubunit accessory
protein called replication factor C (RFC).

Replication forks:
When the bacterial circular chromosome is replicated, replication starts at a single origin. The double helix
opens up and both strands serve as template for the synthesis of new DNA. DNA synthesis then proceeds
outward in both directions from the single origin. The products of the reaction are two daughter double-stranded
DNA molecules, each of which has one original template strand and one strand of newly synthesized
DNA. Thus, replication is semi-conservative. The region of replicating DNA associated with the single origin is
called a replication bubble or replication eye and consists of two replication forks moving in opposite
directions around the DNA circle.

Okazaki fragments: Double-stranded DNA is antiparallel; one strand runs 5’→3’ and the complementary
strand runs 3’→5’. As the original double-stranded DNA opens up at a replication fork, new DNA is made
against each template strand. Superficially, therefore, one might expect new DNA to be made 5’→3’ for one
daughter strand and 3’→5’ for the other daughter strand. However, all DNA polymerases make DNA only in
the 5’→3’ direction and never in the 3’→5’ direction. What actually happens is that on the template strand
with 3’→5’ orientation, new DNA is made in a continuous piece in the correct 5’→3’ direction. This new DNA
is called the leading strand. On the other template strand (that has a 5’→3’ orientation), DNA polymerase
synthesizes short pieces of new DNA (about 1000 nucleotides long) in the 5’→3’ direction, and these pieces are
then joined together. The small fragments are called Okazaki fragments after their discoverer. The new DNA strand
which is made by this discontinuous method is called the lagging strand.

RNA primer: DNA polymerase cannot start DNA synthesis without a primer. Even on the lagging strand, each
Okazaki fragment requires an RNA primer before DNA synthesis can start. The primer used in each case is a
short (approximately five nucleotides long) piece of RNA and is synthesized by an RNA polymerase called
primase. Primase can make RNA directly on the single-stranded DNA template because, like all RNA
polymerases, it does not require a primer to begin synthesis. The RNA primer made by primase is then
extended by DNA polymerase III. DNA polymerase III synthesizes DNA for both the leading and lagging strand.
After DNA synthesis by DNA polymerase III, DNA polymerase I uses its 5’→3’ exonuclease activity to remove
the RNA primer and then fills the gap with new DNA. DNA polymerase III cannot carry out this task because it
lacks the 5’→3’ exonuclease activity of DNA polymerase I. Finally, DNA ligase joins the ends of the DNA fragments
together.
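
The order of events in this paragraph (primase, DNA polymerase III, DNA polymerase I, ligase) can be traced on a toy
template. The Python sketch below is purely illustrative: the template sequence, its split into two segments and all
function names are assumptions made for the example. RNA is shown in lower case, and the template is written 3’→5’
so that each product reads 5’→3’ from left to right.

```python
# Toy model of lagging-strand processing; names and sequences are illustrative only.
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}
PRIMER_LENGTH = 5  # "approximately five nucleotides", as quoted in the text


def synthesize_fragment(template_3to5):
    """Primase lays down an RNA primer (lower case); Pol III extends it with DNA."""
    primer = "".join(RNA_PAIR[b] for b in template_3to5[:PRIMER_LENGTH]).lower()
    dna = "".join(DNA_PAIR[b] for b in template_3to5[PRIMER_LENGTH:])
    return primer + dna


def replace_primer(fragment, template_3to5):
    """Pol I removes the RNA primer and fills the gap with DNA."""
    filled = "".join(DNA_PAIR[b] for b in template_3to5[:PRIMER_LENGTH])
    return filled + fragment[PRIMER_LENGTH:]


def ligate(fragments):
    """DNA ligase joins adjacent fragments into one continuous strand."""
    return "".join(fragments)


template_segments = ["TACGGATCCG", "ATTAGCCATG"]  # hypothetical template, 3'->5'
okazaki = [synthesize_fragment(seg) for seg in template_segments]
repaired = [replace_primer(f, seg) for f, seg in zip(okazaki, template_segments)]
lagging_strand = ligate(repaired)

print(okazaki)         # fragments still carrying their RNA primers
print(lagging_strand)  # continuous DNA after Pol I and ligase
```

The sketch ignores the real geometry of the fork (fragments are simply processed left to right); it is meant only to
show which enzyme contributes which part of the finished strand.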

Fig:- Details of DNA replication.
(a) Primase binds to the DNA template strand (thin line) and
(b) synthesizes a short RNA primer (dotted line);
(c) DNA polymerase III now extends the RNA primer by synthesizing new DNA (thick line);
(d) during synthesis of the lagging strand, adjacent Okazaki fragments are separated by the RNA primers;
(e) the RNA primers are now removed and the gaps filled with DNA by DNA polymerase I,
(f) generating adjacent DNA fragments that are then
(g) joined by DNA ligase.

Accessory proteins: DNA polymerases I and III, primase and DNA ligase are not the only proteins needed
for replication of the bacterial chromosome. The DNA template is a double helix with each strand wound
tightly around the other and hence the two strands must be unwound during replication.

How is this unwinding problem solved?


A DNA helicase is used to unwind the double helix (using ATP as energy source) and SSB (single-stranded
DNA-binding) proteins prevent the single-stranded regions from base-pairing again so that each of the two
DNA strands is accessible for replication. In principle, for a replication fork to move along a piece of DNA, the
DNA helix would need to unwind ahead of it, causing the DNA to rotate rapidly. However, the bacterial
chromosome is circular and so there are no ends to rotate. The solution to the problem is that an enzyme
called topoisomerase I breaks a phosphodiester bond in one DNA strand (a single-strand break) a small
distance ahead of the fork, allowing the DNA to rotate freely (swivel) around the other (intact) strand. The
phosphodiester bond is then re-formed by the topoisomerase.
After the bacterial circular DNA has been replicated, the result is two double-stranded circular DNA
molecules that are interlocked; topoisomerase II separates them as follows. This enzyme works in a similar
manner to topoisomerase I but causes a transient break in each strand (a double-strand break) of a
double-stranded DNA molecule. Thus topoisomerase II binds to one double-stranded DNA circle and causes a
transient double-strand break that acts as a ‘gate’ through which the other DNA circle can pass.
Topoisomerase II then re-seals the strand breaks.

DNA REPLICATION IN EUKARYOTES (In General)


The life of a eukaryotic cell can be defined as a cell cycle. Mitosis and cell division occur in the M phase, which
lasts for only about 1 h. This is followed by the G1 phase (G for gap), then the S phase (S for synthesis), during
which time the chromosomal DNA is replicated, and finally the G2 phase, in which the cells prepare for mitosis.
Eukaryotic cells in culture typically have cell cycle times of 16-24 h, but the cell cycle time can be much longer
(> 100 days) for some cells in a multicellular organism. Most of the variation in cell cycle times occurs by
differences in the length of the G1 phase. Some cells in vivo, such as neurons, stop dividing completely and are
said to be quiescent, locked in a G0 phase.
Multiple Replicons: In eukaryotes, replication of chromosomal DNA occurs only in the S phase of the cell
cycle. As for bacterial DNA, eukaryotic DNA is replicated semi-conservatively. Replication of each linear DNA
molecule in a chromosome starts at many origins, one every 3-300 kb of DNA depending on the species and
tissue, and proceeds bi-directionally from each origin. The use of multiple origins is essential in order to
ensure that the chromosomal DNA is replicated within the necessary time period. At each origin, a replication
bubble forms consisting of two replication forks moving in opposite directions. The DNA replicated under the
control of a single origin is called a replicon. DNA synthesis proceeds until replication bubbles merge
together.
Not all regions of a chromosome are replicated simultaneously. Rather, many replication eyes will be
found on one part of the chromosome and none on another section. Thus replication origins are activated in
clusters, called replication units, consisting of 20-80 origins. During S phase, the different replication units are
activated in a set order until eventually the whole chromosome has been replicated. Transcriptionally active
genes appear to be replicated early in S phase, while chromatin that is condensed and not transcriptionally
active is replicated later.
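
Why multiple origins are essential can be seen with a rough calculation. The Python sketch below is a simplified
estimate, not a measured result: the chromosome length (150 million bp) and fork speed (50 bp per second) are assumed
values chosen for illustration, origins are assumed to be evenly spaced, and all of them are assumed to fire at the
same time (which, as noted above, is not what happens in a real S phase).

```python
def replication_time_hours(chromosome_bp, n_origins, fork_speed_bp_per_s):
    """Time for evenly spaced origins that all fire at once to replicate a chromosome.

    Each origin sends out two forks, so each fork copies half of one replicon.
    """
    replicon_bp = chromosome_bp / n_origins
    distance_per_fork = replicon_bp / 2
    return distance_per_fork / fork_speed_bp_per_s / 3600


CHROMOSOME_BP = 150_000_000  # assumed chromosome length
FORK_SPEED = 50              # assumed fork speed, bp per second

for origins in (1, 100, 1000):
    hours = replication_time_hours(CHROMOSOME_BP, origins, FORK_SPEED)
    print(f"{origins:>5} origin(s): {hours:8.2f} h")
```

Under these assumed numbers a single origin would need more than 400 h, whereas 1000 origins bring the estimate to
under half an hour.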

Five DNA polymerases:


Eukaryotic cells contain five different DNA polymerases: α, β, γ, δ and ε. The DNA polymerases involved in
replication of chromosomal DNA are α and δ. DNA polymerases β and ε are involved in DNA repair. All of
these polymerases except DNA polymerase γ are located in the nucleus; DNA polymerase γ is found in
mitochondria and replicates mitochondrial DNA.

Leading and Lagging Strands:


►The basic scheme of replication of double-stranded chromosomal DNA in eukaryotes follows that for
bacterial DNA replication; a leading strand and a lagging strand are synthesized, the latter involving
discontinuous synthesis via Okazaki fragments. However, in eukaryotes, replication forks move much more slowly
than in prokaryotes (about one-tenth of the rate) and the two new DNA strands are made by different
polymerases; DNA polymerase α catalyzes synthesis of the lagging strand, via Okazaki fragments, and DNA
polymerase δ synthesizes the leading strand.
►The RNA primers required are made by DNA polymerase α, which carries a primase subunit. Whereas the δ
enzyme has 3’→5’ exonuclease activity and so can proof-read the DNA made, DNA polymerase α has no such
activity and so the new lagging strand DNA made by DNA polymerase α is probably proof-read by a separate
accessory protein.

►Replication Of Chromatin:
Once DNA is bound to histones to form nucleosomes, histones rarely leave the DNA. Thus when a
chromosome is replicated, the histones stay in place but somehow must allow the replication machinery to
pass through and make new DNA. One suggestion is that the nucleosome histone octamer transiently unfolds
into two half-nucleosomes to allow the replication machinery access to the DNA. The new DNA must also be
packaged into nucleosomes and so histones are also synthesized during the S phase of the cell cycle.
►Experiments indicate that the old nucleosomes stay with the daughter DNA molecule containing the leading
strand whilst new nucleosomes assemble on the daughter molecule containing the lagging strand.

Removal of RNA Primers from Okazaki fragments:-

No eukaryotic DNA polymerase has 5’→3’ exonuclease activity. Instead, the flap endonuclease FEN1 plays the central
role; it associates with the DNA Pol δ complex in order to degrade the primer from the 5’ end of the adjacent
fragment.

FEN1 cannot initiate primer degradation on its own, because it is unable to remove the ribonucleotide at
the extreme 5’ end of the primer: this ribonucleotide carries a 5’-triphosphate group, which blocks FEN1
activity.

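Mode 1: a helicase breaks the base pairs holding the primer to the template → the primer is pushed aside by Pol δ
as it extends the adjacent fragment → FEN1 cleaves the phosphodiester bond at the RNA-DNA joint.

Mode 2: RNase H digests most of the RNA primer → FEN1 removes the remaining ribonucleotide at the RNA:DNA junction.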

* In eukaryotes there is no replisome; instead, the enzymes and proteins involved in replication form sizeable
structures within the nucleus, each containing hundreds or thousands of individual replication complexes. These
structures are immobile because they are attached to the nuclear matrix, so DNA molecules are threaded through the
complexes as they are replicated. The structures are referred to as replication factories.

Termination of Replication

Bacterial genomes are replicated bidirectionally from a single origin. Even though the two forks may move at
different speeds, they meet at, or close to, the position diametrically opposite the origin because of termination
sequences, “Ter” (7 in number). Each Ter sequence acts as the recognition site for a sequence-specific DNA-binding
protein called “Tus”. When bound to Ter, a Tus protein allows a replication fork to pass if the fork is moving in one
direction but blocks its progress if the fork is moving the opposite way around. The directionality is set by the
orientation of the Tus protein on the double helix.
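
The one-way behaviour of a Tus-bound Ter site can be pictured as a directional gate. The Python sketch below is a
minimal illustration of that logic only; the site names, the `TerSite` class and the two direction labels are
hypothetical and do not correspond to the real map of the E. coli chromosome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TerSite:
    name: str
    blocks: str  # direction of fork travel that bound Tus stops


def fork_passes(site, fork_direction):
    """A Tus-bound Ter site lets a fork through unless it arrives from the blocked side."""
    return fork_direction != site.blocks


def first_block(sites_in_path, fork_direction):
    """Return the name of the first site that halts the fork, or None if it is never halted."""
    for site in sites_in_path:
        if not fork_passes(site, fork_direction):
            return site.name
    return None


# Hypothetical sites met, in order, by a fork travelling clockwise.
path = [TerSite("ter_1", "counterclockwise"), TerSite("ter_2", "clockwise")]
print(first_block(path, "clockwise"))         # the fork passes ter_1 and halts at ter_2
print(first_block(path, "counterclockwise"))  # a fork from the other side halts at ter_1
```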

Little is known about termination in eukaryotes; quite possibly, replication forks simply meet at random positions.

Telomeres

Telomerase consists of both protein and RNA. A region near the 5’ end of this RNA is complementary to the human
telomeric repeat (5’-TTAGGG-3’). The RNA is used as the template for each extension step, the DNA synthesis being
carried out by the protein component of the enzyme, which is a reverse transcriptase.

► Telomere length is regulated by telomere-binding proteins (TBPs), such as TRF1 in humans.

► Mutations that prevent TBPs from binding to the DNA result in the telomeres becoming longer than
normal; overproduction of TBPs results in telomere shortening.

Telomere Replication:

The replication of a linear DNA molecule in a eukaryotic chromosome creates a problem that does not exist
for the replication of bacterial circular DNA molecules. The normal mechanism of DNA synthesis (see above)
means that the 3’ end of the lagging strand is not replicated. This creates a gap at the end of the chromosome
and therefore a shortening of the double-stranded replicated portion. The effect is that the chromosomal DNA
would become shorter and shorter after each replication. Various mechanisms have evolved to solve this
problem. In many organisms the solution is to use an enzyme called telomerase to replicate the chromosome
ends (telomeres).

Each telomere contains many copies of a repeated hexanucleotide sequence that is G-rich; in Tetrahymena it
is GGGTTG. Telomerase carries, as an integral part of its structure, a short RNA molecule that is
complementary to part of this G-rich sequence. The exact mechanism of action of telomerase is not clear. The
RNA molecule of telomerase is envisaged to hydrogen-bond to the telomere end. Then, using the RNA as a
template, telomerase copies the RNA template (hence this enzyme is a reverse transcriptase) and adds six
deoxynucleotides to the telomere DNA end. Telomerase then dissociates from the DNA, re-binds at the new
telomere end and repeats the extension process. It can do this hundreds of times before finally dissociating.
The newly extended DNA strand can then act as a template for normal DNA replication to form double-
stranded chromosomal DNA. The two processes, of the DNA ends shortening through normal replication and
of lengthening using telomerase, are very roughly in balance so that each chromosome stays approximately
the same length.
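
The repeated extension step can be mimicked in a few lines of Python. This is only a sketch under stated assumptions:
the RNA template `CAACCC` is a hypothetical minimal template, chosen because its reverse complement is the
Tetrahymena repeat GGGTTG quoted above (real telomerase RNA templates are longer), and the function names are
invented for the example.

```python
# Pairing rules for copying an RNA template into DNA (reverse transcription).
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna_template_5to3):
    """Return the DNA copy of an RNA template, written 5'->3'."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna_template_5to3))


def extend_telomere(dna_end, rna_template_5to3, rounds):
    """Add one templated repeat to the 3' end of the telomere per round."""
    repeat = reverse_transcribe(rna_template_5to3)
    for _ in range(rounds):
        dna_end += repeat  # bind, copy the template, dissociate, re-bind
    return dna_end


TEMPLATE = "CAACCC"  # hypothetical template complementary to GGGTTG
print(reverse_transcribe(TEMPLATE))
print(extend_telomere("GGGTTG", TEMPLATE, rounds=3))
```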

Fig:- Replication of telomeric DNA. Telomerase has a bound RNA molecule that is used as template to direct
DNA synthesis and hence extension of the ends of chromosomal DNA.

Theta replication: A common type of replication that takes place in circular DNA, such as that found in E.
coli and other bacteria. In theta replication, double-stranded DNA begins to unwind at the replication origin,
producing single-stranded nucleotide strands that then serve as templates on which new DNA can be synthesized. The
unwinding of the double helix generates a loop, termed a replication bubble. Unwinding may be at one or both ends of
the bubble, making it progressively larger. DNA replication on both of the template strands is simultaneous with
unwinding. The point of unwinding, where the two single nucleotide strands separate from the double-stranded DNA
helix, is called a replication fork.

If there are two replication forks, one at each end of the replication bubble, the forks proceed outward in both
directions in a process called bidirectional replication, simultaneously unwinding and replicating the DNA
until they eventually meet. If a single replication fork is present, it proceeds around the entire circle to produce
two complete circular DNA molecules, each consisting of one old and one new nucleotide strand.

John Cairns provided the first visible evidence of theta replication in 1963 by growing bacteria in the presence
of radioactive nucleotides.

Rolling-circle replication: Another form of replication, called rolling-circle replication, takes place in some
viruses and in the F factor of E. coli. This form of replication is initiated by a break in one of the nucleotide
strands that creates a 3’-OH group and a 5’-phosphate group. New nucleotides are added to the 3’ end of the
broken strand, with the inner (unbroken) strand used as a template. As new nucleotides are added to the 3’ end,
the 5’ end of the broken strand is displaced from the template, rolling out like thread being pulled off a spool.
The 3’ end grows around the circle, giving rise to the name rolling-circle model.
The replication fork may continue around the circle a number of times, producing several linked copies of the
same sequence. With each revolution around the circle, the growing 3’ end displaces the nucleotide strand
synthesized in the preceding revolution. Eventually, the linear DNA molecule is cleaved from the circle,
resulting in a double-stranded circular DNA molecule and a single-stranded linear DNA molecule. The linear
molecule circularizes either before or after serving as a template for the synthesis of a complementary strand.

D-Loop Mode Replication

Chloroplasts and mitochondria (in eukaryotic cells) have their own
circular DNA molecules that appear to replicate by a slightly
different mechanism. The origin of replication is at a different
point on each of the two parental template strands. Replication
begins on one strand, displacing the other while forming a
displacement loop or D-loop structure. Replication continues until
the process passes the origin of replication on the other strand.
Replication then initiates on the second strand, in the opposite
direction. Normal Y-junction replication also occurs in
mitochondrial DNA under some growth conditions.

CHAPTER: 4 DNA REPAIR

DNA in the living cell is subject to many chemical alterations (a fact often forgotten in the excitement of being
able to do DNA sequencing on dried and/or frozen specimens).
If the genetic information encoded in the DNA is to remain uncorrupted, any chemical changes must be
corrected.
A failure to repair DNA produces a mutation.

Agents that Damage DNA


►Certain wavelengths of radiation
o ionizing radiation such as gamma rays and x-rays
o ultraviolet rays, especially the UV-C rays (~260 nm) that are absorbed strongly by DNA but
also the longer-wavelength UV-B that penetrates the ozone shield
►Highly-reactive oxygen radicals produced during normal cellular respiration as well as by other
biochemical pathways
►Chemicals in the environment
o many hydrocarbons, including some found in cigarette smoke
o some plant and microbial products, e.g. the aflatoxins produced in moldy peanuts
►Chemicals used in chemotherapy, especially chemotherapy of cancers

Types of DNA Damage


1. All four of the bases in DNA (A, T, C, G) can be covalently modified at various positions.
o One of the most frequent is the loss of an amino group ("deamination") — resulting, for example,
in a C being converted to a U.
2. Mismatches of the normal bases because of a failure of proofreading during DNA replication.
o Common example: incorporation of the pyrimidine U (normally found only in RNA) instead of T.

3. Breaks in the backbone.
o Can be limited to one of the two strands (a single-stranded break, SSB) or
o on both strands (a double-stranded break, DSB).
o Ionizing radiation is a frequent cause, but some chemicals produce breaks as well.
4. Crosslinks. Covalent linkages can be formed between bases
o on the same DNA strand ("intrastrand") or
o on the opposite strand ("interstrand").
Several chemotherapeutic drugs used against cancers crosslink DNA.

Repairing Damaged Bases
Damaged or inappropriate bases can be repaired by several mechanisms:

A. Direct chemical reversal of the damage


B. Excision Repair, in which the damaged base or bases are removed and then replaced with the correct
ones in a localized burst of DNA synthesis. There are three modes of excision repair, each of which employs
specialized sets of enzymes.
1. Base Excision Repair (BER)
2. Nucleotide Excision Repair (NER)
3. Mismatch Repair (MMR)

A. Direct Reversal of Base Damage


Perhaps the most frequent cause of point mutations in humans is the spontaneous addition of a methyl group
(CH3-) (an example of alkylation) to C followed by deamination to a T. Fortunately, most of these changes are
repaired by enzymes, called glycosylases, that remove the mismatched T restoring the correct C. This is done
without the need to break the DNA backbone (in contrast to the mechanisms of excision repair described
below).
Some of the drugs used in cancer chemotherapy ("chemo") also damage DNA by alkylation. Some of the
methyl groups can be removed by a protein encoded by our MGMT gene. However, the protein can only do it
once, so the removal of each methyl group requires another molecule of protein.
This illustrates a problem with direct reversal mechanisms of DNA repair: they are quite wasteful. Each of the
myriad types of chemical alterations to bases requires its own mechanism to correct. What the cell needs are
more general mechanisms capable of correcting all sorts of chemical damage with a limited toolbox. This
requirement is met by the mechanisms of excision repair.

B1. Base Excision Repair (BER)


The steps and some key players:
1. removal of the damaged base (estimated to occur some 20,000 times a day in each cell in our body)
by a DNA glycosylase. We have at least 8 genes encoding different DNA glycosylases, each enzyme being
responsible for identifying and removing a specific kind of base damage.
2. removal of its deoxyribose phosphate in the backbone, producing a gap. We have two genes encoding
enzymes with this function.
3. replacement with the correct nucleotide. This relies on DNA polymerase beta, one of at least 11 DNA
polymerases encoded by our genes.
4. ligation of the break in the strand. Two enzymes are known that can do this; both require ATP to
provide the needed energy.
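
The four steps above can be followed on a short toy sequence. The Python sketch below is a deliberately simplified
illustration: the sequences and function names are hypothetical, any character other than A, T, G or C stands for a
damaged base (here a U produced by deamination of C), and the two strands are written aligned so that position i of
one strand pairs with position i of the other.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
NORMAL_BASES = set("ATGC")


def excise_damaged_bases(strand):
    """Steps 1-2: remove each damaged base and its sugar-phosphate, leaving a gap (None)."""
    return [base if base in NORMAL_BASES else None for base in strand]


def fill_and_ligate(gapped, template):
    """Steps 3-4: fill each gap using the intact strand as template, then seal the strand."""
    return "".join(PAIR[t] if base is None else base for base, t in zip(gapped, template))


template = "GGCAT"  # intact strand (hypothetical)
damaged = "CUGTA"   # partner strand in which one C has been deaminated to U

gapped = excise_damaged_bases(damaged)
print(gapped)
print(fill_and_ligate(gapped, template))
```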

B2. Nucleotide Excision Repair (NER)
NER differs from BER in several ways:
► It uses different enzymes.
► Even though there may be only a single "bad" base to correct, its nucleotide is removed along with
many other adjacent nucleotides; that is, NER removes a large "patch" around the damage.
The steps and some key players:
1. The damage is recognized by one or more protein factors that assemble at the location.
2. The DNA is unwound producing a "bubble". The enzyme system that does this is Transcription Factor
IIH, TFIIH, (which also functions in normal transcription).
3. Cuts are made on both the 3' side and the 5' side of the damaged area so the tract containing the
damage can be removed.
4. A fresh burst of DNA synthesis — using the intact (opposite) strand as a template — fills in the correct
nucleotides. The DNA polymerases responsible are designated polymerase delta and epsilon.
5. A DNA ligase covalently joins the fresh piece into the backbone.

Xeroderma Pigmentosum (XP)


XP is a rare inherited disease of humans which, among other things, predisposes the patient to
► pigmented lesions on areas of the skin exposed to the sun and
► an elevated incidence of skin cancer.
It turns out that XP can be caused by mutations in any one of several genes, all of which have roles to
play in NER. Some of them:

► XPA, which encodes a protein that binds the damaged site and helps assemble the other proteins
needed for NER.
► XPB and XPD, which are part of TFIIH. Some mutations in XPB and XPD also produce signs of
premature aging.
► XPF, which cuts the backbone on the 5' side of the damage.
► XPG, which cuts the backbone on the 3' side.

Transcription-Coupled NER
Nucleotide-excision repair proceeds most rapidly
► in cells whose genes are being actively transcribed
► on the DNA strand that is serving as the template for transcription.
This enhancement of NER involves XPB, XPD, and several other gene products. The genes for two of them are
designated CSA and CSB (mutations in them cause an inherited disorder called Cockayne's syndrome).
The CSB product associates in the nucleus with RNA polymerase II, the enzyme responsible for synthesizing
messenger RNA (mRNA), providing a molecular link between transcription and repair.
One plausible scenario: if RNA polymerase II, tracking along the template (antisense) strand, encounters a
damaged base, it can recruit other proteins, e.g., the CSA and CSB proteins, to make a quick fix before it
moves on to complete transcription of the gene.

B3. Mismatch Repair (MMR)


Mismatch repair deals with correcting mismatches of the normal bases; that is, failures to maintain normal
Watson-Crick base pairing (A•T, C•G).
Many incorrectly inserted nucleotides that escape detection by proofreading are corrected by mismatch
repair.
► Incorrectly paired bases distort the three-dimensional structure of DNA, and mismatch-repair enzymes
detect these distortions. In addition to detecting incorrectly paired bases, the mismatch-repair system
corrects small unpaired loops in the DNA, such as those caused by strand slippage in replication.
After the incorporation error has been recognized, mismatch-repair enzymes cut out the distorted section of
the newly synthesized strand and fill the gap with new nucleotides, by using the original DNA strand as a
template.
The proteins that carry out mismatch repair in E. coli differentiate between old and new strands by the
presence of methyl groups on special sequences of the old strand. After replication, adenine nucleotides in
the sequence GATC are methylated by an enzyme called Dam methylase. The process of methylation is
delayed and so, immediately after replication, the old strand is methylated and the new strand is not. In E.
coli, the proteins MutS, MutL, and MutH are required for mismatch repair.
► MutS binds to the mismatched bases and forms a complex with MutL and MutH; this complex is
thought to bring an unmethylated GATC sequence in close proximity to the mismatched bases. MutH
nicks the unmethylated strand at the GATC site and exonucleases degrade the unmethylated strand
from the nick to the mismatched bases.
► DNA polymerase and DNA ligase fill in the gap on the unmethylated strand with correctly paired
nucleotides.
Mismatch repair can enlist the aid of enzymes involved in both base-excision repair (BER) and nucleotide-excision
repair (NER) as well as using enzymes specialized for this function.
In humans, mutations in either of two mismatch-repair genes, MSH2 and MLH1, predispose the person to an inherited
form of colon cancer. So these genes qualify as tumor suppressor genes.

Question: How does the MMR system know which is the incorrect nucleotide?
In E. coli, certain adenines become methylated shortly after the new strand of DNA has been synthesized. The
MMR system acts before this methylation takes place, and if it detects a mismatch, it assumes that the nucleotide
on the already-methylated (parental) strand is the correct one and removes the nucleotide on the freshly-synthesized
daughter strand. How such recognition occurs in mammals is not yet known.
Synthesis of the repair patch is done by the same enzymes used in NER: DNA polymerase delta and epsilon.
Cells also use the MMR system to enhance the fidelity of recombination; i.e., assure that only homologous
regions of two DNA molecules pair up to crossover and recombine segments (e.g., in meiosis).
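
The strand-discrimination rule described for E. coli (trust the methylated parental strand, correct the daughter
strand) can be written as a short Python sketch. It is a simplification for illustration only: the sequences and
function names are hypothetical, the strands are written aligned position by position, and only the mismatched base
is replaced, whereas the real system removes the whole stretch between the nick and the mismatch.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def find_mismatches(parental, daughter):
    """Positions at which the daughter base cannot pair with the parental base."""
    return [i for i, (p, d) in enumerate(zip(parental, daughter)) if PAIR[p] != d]


def repair_daughter(parental, daughter):
    """Correct the daughter strand, treating the (methylated) parental strand as correct."""
    fixed = list(daughter)
    for i in find_mismatches(parental, daughter):
        fixed[i] = PAIR[parental[i]]
    return "".join(fixed)


parental = "GATCAGT"  # methylated template strand (hypothetical sequence)
daughter = "CTAGGCA"  # new strand with one misincorporated base at index 4

print(find_mismatches(parental, daughter))
print(repair_daughter(parental, daughter))
```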
Repairing Strand Breaks
Ionizing radiation and certain chemicals can produce both single-strand breaks (SSBs) and double-strand
breaks (DSBs) in the DNA backbone.

Single-Strand Breaks (SSBs)
Breaks in a single strand of the DNA molecule are repaired using the same enzyme systems that are used in
Base-Excision Repair (BER).
Double-Strand Breaks (DSBs)
There are two mechanisms by which the cell attempts to repair a complete break in a DNA molecule:
► Direct joining of the broken ends. This requires proteins that recognize and bind to the exposed ends and
bring them together for ligating. They would prefer to see some complementary nucleotides but can proceed
without them, so this type of joining is also called Nonhomologous End-Joining (NHEJ).
Errors in direct joining may be a cause of the various translocations that are associated with cancers.
Examples:
o Burkitt's lymphoma
o the Philadelphia chromosome in chronic myelogenous leukemia (CML)
o B-cell leukemia
► Homologous Recombination. Here the broken ends are repaired using the information on the intact
o sister chromatid (available in G2 after chromosome duplication), or on the
o homologous chromosome (in G1; that is, before each chromosome has been duplicated). This requires
searching around in the nucleus for the homolog, a task sufficiently uncertain that G1 cells usually prefer to
mend their DSBs by NHEJ; or on the
o same chromosome, if there are duplicate copies of the gene on the chromosome oriented in opposite
directions (head-to-head or back-to-back).
Two of the proteins used in homologous recombination are encoded by the genes BRCA1 and BRCA2.
Inherited mutations in these genes predispose women to breast and ovarian cancers.

Meiosis also involves DSBs


Recombination between homologous chromosomes in meiosis I also involves the formation of DSBs and their
repair. So it is not surprising that this process uses the same enzymes.
Meiosis I, with its alignment of homologous sequences, provides a mechanism for repairing damaged DNA;
that is, mutations. In fact, many biologists feel that the main function of sex is to provide this mechanism for
maintaining the integrity of the genome.
However, most of the genes on the human Y chromosome have no counterpart on the X chromosome, and
thus cannot benefit from this repair mechanism. They seem to solve this problem by having multiple copies of
the same gene, oriented in opposite directions. Looping the intervening DNA brings the duplicates together,
allowing repair by homologous recombination.

Gene Conversion

If the sequence used as a template for repairing a gene by homologous recombination differs slightly from the
gene needing repair (that is, if it is a different allele), the repaired gene will acquire the donor sequence. This
nonreciprocal transfer of genetic information is called gene conversion.
The donor of the new gene sequence may be:
 the homologous chromosome (during meiosis)
 the sister chromatid (also during meiosis)
 a duplicate of the gene on the same chromosome (during mitosis)

Gene conversion during meiosis alters the normal mendelian ratios. Normally, meiosis in a heterozygous
(Aa) parent will produce gametes or spores in a 1:1 ratio; e.g., 50% A; 50% a. However, if gene conversion has
occurred, other ratios will appear. If, for example, an A allele donates its sequence as it repairs a damaged a
allele, the repaired gene will become A, and the ratio will be 75% A; 25% a.
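
The arithmetic behind these ratios can be checked with a short Python sketch. It assumes one meiosis yielding four products and a single conversion event; the function name `tetrad_ratio` and the list representation are illustrative choices, not part of the notes.

```python
from collections import Counter


def tetrad_ratio(products):
    """Return the percentage of each allele among the products of one meiosis."""
    counts = Counter(products)
    total = len(products)
    return {allele: 100 * n / total for allele, n in counts.items()}


# Normal meiosis in a heterozygous (Aa) parent: two A and two a products.
normal = ["A", "A", "a", "a"]

# Gene conversion: one damaged a allele is repaired using an A allele as template.
converted = ["A", "A", "A", "a"]

print(tetrad_ratio(normal))     # {'A': 50.0, 'a': 50.0}
print(tetrad_ratio(converted))  # {'A': 75.0, 'a': 25.0}
```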

Genotoxicity
►This is a recently developed branch of toxicology which identifies mutagens in the environment. Most
agents that cause cancer, carcinogens, are also mutagens. Identification of mutagens is recognised as
important in many areas including pharmaceuticals, food additives, agriculture, pollution analysis and in many
industrial processes.
►For this reason it has been necessary to develop tests to identify dangerous compounds. Early testing
systems relied on small mammals such as rats or mice, but these tests were time consuming, very expensive
and attracted considerable ethical criticism.
AMES TEST:-
A number of in vitro tests have been developed which use bacteria, or animal cells grown in tissue culture. In
many countries there are legal requirements for new chemicals to be tested by a series of different tests
before being licensed by government agencies. The best known test is the Ames Test. This is a very rapid
test for mutagens.
►It utilizes bacteria of the species Salmonella typhimurium that have a mutation in the histidine operon, and
hence cannot synthesize histidine. They are referred to as histidine auxotrophs (his-).

►Compounds are tested to determine whether they can induce reversion of this mutation. The test can be
carried out, very simply, by plating out his- mutants on an agar plate that contains only a trace of histidine. A
crystal, or a filter disc containing a solution of the compound to be tested, is placed on the surface of the agar.
The bacteria grow for a short time until the histidine is depleted. After that point, the only bacteria capable of
continuing growth to form colonies are those which have undergone reversion and are capable of synthesizing
their own histidine, histidine prototrophs (his+). If the test compound is not mutagenic a few revertant
colonies will be found randomly scattered across the agar plate. If it is mutagenic then the number of colonies
will be increased and they will be clustered around the point where the compound was placed on the plate.
►The test can also be set up in a more quantitative manner to give a dose-response curve for any
compound under test. In order to identify the maximum number of mutagens, two types of his- mutants are
used: one carries a single base substitution and the other a frameshift mutation. This allows detection of
mutagens which have different effects on DNA. It is also possible to genetically alter the bacteria both to
increase their permeability to test compounds and to decrease their ability to repair damaged DNA. This again
increases the likelihood of detecting mutagenic activity.
►Some compounds which are known to cause cancer are only capable of doing so after they have been
converted to mutagens by the action of enzymes within the body. These are known as procarcinogens.
Enzyme action converts them to ultimate carcinogens. If procarcinogens are used in the Ames Test on their own
they will give a negative result; however, the test can be adapted to take account of this. The liver is a rich source
of activating enzymes. Liver extracts, containing these enzymes, can be added along with the test compound.
Activation of a procarcinogen will then occur, resulting in increased numbers of revertants.
►The Ames Test is rapid; it can be carried out in 48 hours. It is cheap and easily quantifiable. It has identified
many compounds as mutagens, including certain hair dyes, flame-retardants and food colorings.
►One deficiency of the Ames Test is that the target organism is a bacterium rather than a mammal. For this
reason a number of tests have been developed using cultured animal cell lines. These are very similar to the
Ames Test but use different selection systems and test genes.

►Agents that damage chromosomes are known as clastogens. These are also identified by test systems.
These tests can be carried out on cell lines, in laboratory animals or even in plants. The tests consist of scoring
chromosome aberrations such as breaks, exchanges, ring chromosomes, dicentrics and translocations. The
frequency of aberration is often low.
►An alternative to enumeration of such gross chromosomal aberrations is to count sister chromatid
exchanges (SCE). SCE involves exchange of material between two chromatids of the same chromosome, and is
a process that takes place spontaneously at low frequencies in all cell types. It can occur both in mitosis and
meiosis.
►Its usefulness is that the frequency of SCE increases much more rapidly than that of gross chromosomal
aberrations in response to clastogen treatment. SCE can be detected by a number of procedures, all of
which are dependent on the semi-conservative nature of DNA replication. The essence of the technique is to
make sister chromatids stain differently so that exchanges can easily be observed. This is achieved by a
complex staining method after culturing cells for two rounds of division in the presence of the thymidine
analog bromodeoxyuridine (BrdU). This is incorporated into the newly synthesized DNA in place of thymidine
and alters the staining properties of the chromatid so that one stains dark and the other light.
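
Why two rounds of division are needed follows from semi-conservative replication, and can be traced with a small Python sketch. The model is a simplification made for illustration: each chromatid is a pair of strands labelled "T" (thymidine) or "B" (BrdU), and the function names are hypothetical.

```python
def replicate(duplex):
    """Semi-conservative replication in BrdU medium.

    Each parental strand is kept and paired with a newly made strand,
    labelled "B" because BrdU is incorporated in place of thymidine.
    """
    strand1, strand2 = duplex
    return [(strand1, "B"), (strand2, "B")]


def brdu_strands(duplex):
    """Count how many of the two strands of a chromatid contain BrdU."""
    return duplex.count("B")


start = ("T", "T")                        # chromosome before BrdU is added
first_round = replicate(start)            # two sister chromatids
second_round = replicate(first_round[0])  # follow one chromatid further

print([brdu_strands(c) for c in first_round])   # [1, 1]
print([brdu_strands(c) for c in second_round])  # [1, 2]
```

After one round both sister chromatids carry one substituted strand and would stain alike. Only after the second round do the sisters differ (one substituted strand versus two), which is the difference the staining method picks up.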

CHAPTER: 5 DNA RECOMBINATION
 HOMOLOGOUS RECOMBINATION
 Homologous recombination occurs in bacteria as well as in eukaryotes.
 In bacteria, homologous recombination is used to repair double-stranded breaks in the DNA, to restart
collapsed replication forks and to allow recombination of the chromosome with a bacteriophage genome or
with DNA acquired via conjugation.
 In eukaryotes, homologous recombination is also used for DNA repair, to restart collapsed replication
forks and to generate variation in the genes passed on to the next generation.
 Homologous recombination is the exchange between 2 homologous DNA molecules. At least 100 bp
need to be identical. Within the homologous regions, the two molecules may carry different alleles of the
same gene.
 Models that describe putative ways in which homologous recombination takes place
Holliday Model
1. Homologous chromosomes align.
2. A single DNA strand breaks in each duplex.
3. Strand invasion takes place.
4. Branch migration takes place.
5. Resolution gives rise to either crossover or patch products.
 Double-Strand break (DSB) repair Model
1. Homologous chromosomes align.
2. A double-strand break occurs in one DNA duplex.
3. The broken ends are degraded to create single-stranded 3’ extensions (DNA tails). The 3’ ends serve
as primers for DNA synthesis.
4. Strand invasion takes place on the intact chromosome and base pairing with the complementary strand occurs.
5. Two Holliday junctions are generated.
6. Branch migration takes place.
7. Resolution gives rise to either crossover or patch products.
Because double-strand breaks occur relatively frequently, the DSB model is the most attractive model.


The RecBCD pathway is the best understood homologous recombination pathway. It occurs in E. coli and it
follows the events proposed in the DSB repair model.
1. Chi sites (GCTGGTGG) are abundant in E. coli. They are present near sites of recombination in E.
coli.
2. RecD is a DNA helicase that moves on the 5’-ending strand. RecB is a DNA helicase that moves on the
3’-ending strand.
3. The RecBCD complex is also a nuclease that cleaves single-stranded DNA and degrades it.
4. Upon reaching a chi site, RecBCD changes its enzymatic activity, apparently because RecD is
inactivated or lost.
5. RecBCD interacts with RecA to promote assembly of RecA on single-stranded DNA. Thus, RecA coats
the single-stranded DNA tail. RecA is a strand exchange protein, that is, it induces strand invasion.
Strand exchange proteins of the RecA family are present in all forms of life.
6. After strand invasion is complete and the Holliday junction forms, RuvA (a Holliday junction specific
DNA binding protein) recruits RuvB. RuvB is an ATPase. RuvB promotes branch migration.
7. RuvC is the Holliday junction resolving endonuclease.
8. DNA ligase then joins the 5’ phosphoryl group with the 3’ OH.
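
As a minimal illustration of step 1, the following Python sketch locates chi sites in a DNA string. The example sequence is made up for the purpose, and only the strand given is searched (the reverse complement is not scanned).

```python
CHI = "GCTGGTGG"  # chi site sequence quoted in step 1


def find_chi_sites(seq, motif=CHI):
    """Return the 0-based start position of every occurrence of the motif."""
    seq = seq.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


# Hypothetical sequence containing two chi sites.
example = "AAGCTGGTGGTTTACGGCTGGTGGCA"
print(find_chi_sites(example))  # [2, 16]
```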

Homologous recombination occurs during meiosis. Failure in homologous recombination is often reflected in
poor fertility.
►The following proteins are known to be involved in meiotic recombination. Many other proteins
involved in this event have yet to be discovered:
1. SPO11 is a protein that introduces double-stranded breaks in the chromosomal DNA to initiate meiotic
recombination.
2. The MRX complex (made up of Mre11, Rad50 and Xrs2) processes the DNA to generate single-stranded
3’ ends. The MRX complex also seems to remove the DNA-linked SPO11.
3. Dmc1 and Rad51 are homologous to RecA. Dmc1 is expressed only in cells that enter meiosis.
4. It seems that Mus81 may be the Holliday junction resolvase.
 SITE SPECIFIC RECOMBINATION
 In site specific recombination, the recombination sites are DNA sequences about 20 bp long that
bind recombinases and at which DNA cleavage and rejoining occur.
 Recombination can result in insertion of foreign DNA, deletion of DNA and inversion of a specific DNA
sequence.
 Recombinases can be subdivided into serine recombinases and tyrosine recombinases. In each case, the
serine or tyrosine side chain, respectively, attacks the phosphodiester backbone of DNA and forms a
covalent bond with the phosphate group. The energy needed to reseal the DNA is conserved within this
covalent protein-DNA bond.
 Since all 4 DNA strands must be broken, 4 subunits of recombinase are needed. Serine recombinases
cleave all 4 strands prior to strand exchange. One molecule of serine recombinase promotes each of
these cleavage reactions. Thus, a minimum of 4 serine recombinase subunits is required. To recombine, R2
recombines at the R3 site and R1 at the R4 site. Once the swap occurs, the 3’OH ends attack the
recombinase-DNA bond.
 Tyrosine recombinases cleave and rejoin 2 DNA strands first and then do the same for the other 2 strands.
Because they break and rejoin 2 strands at a time, they generate a Holliday junction intermediate.

 Example of Site Specific Recombination


Bacteriophage λ infection of E. coli (an example of insertion and excision)
 To integrate, λ phage uses λ integrase (λ Int). This enzyme catalyzes recombination between 2 specific
attachment (att) sites, attP (P for phage) and attB (B for bacteria). This integrase is a tyrosine
recombinase and requires accessory proteins.
 Integration requires attB, attP, λ Int and an architectural protein called integration host factor (IHF).
IHF introduces large bends (>160°) in DNA.
 After recombination, attL and attR result.
 Phage excision requires Xis (for excise), an architectural protein that binds at the X1 and X2 sites. Xis, λ
Int and IHF stimulate excision; they assemble at attR and interact with proteins assembled at attL.
Synthesis of Xis occurs only when the phage is triggered to enter lytic growth.
 Salmonella Hin recombinase (an example of inversion)
 Salmonella can elude the immune system by switching between the expression of H1 and H2 flagellin.
This switching is carried out by the Hin recombinase, which is a serine recombinase.
 In Salmonella, there are two genes controlled by an invertible DNA sequence which is 1 kb long. This
invertible DNA sequence carries the recombination sites hixL and hixR, the hin recombinase
gene, and a promoter that, when in the “ON” position, promotes transcription of the fljB gene and the
fljA gene. The fljB gene codes for the H2 flagellin; the fljA gene codes for the repressor of the H1 flagellin
gene.
 If the invertible sequence is inverted by Hin recombinase, the promoter no longer promotes
transcription of the fljB and fljA genes, and the H1 flagellin gene, which lies elsewhere on the bacterial
genome, is expressed because its repressor is no longer made. The promoter in this orientation is said to be in
the “OFF” position. This site specific recombination is stimulated by the Fis (Factor for inversion
stimulation) protein. Fis bends the DNA and is an architectural protein.
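
What inversion does to a stretch of DNA can be sketched in a few lines of Python. The sequence and coordinates below are invented for illustration and do not represent the real hix sites; the point is only that the segment between the two sites is replaced by its reverse complement, and that a second inversion restores the original orientation.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_segment(genome, start, end):
    """Invert genome[start:end], leaving the flanking DNA unchanged."""
    return genome[:start] + reverse_complement(genome[start:end]) + genome[end:]


# Hypothetical layout: flank + invertible segment + flank.
genome = "AAAA" + "GATTACA" + "CCCC"
once = invert_segment(genome, 4, 11)    # "ON" -> "OFF"
twice = invert_segment(once, 4, 11)     # "OFF" -> "ON" again

print(once)             # AAAATGTAATCCCCC
print(twice == genome)  # True
```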

 TRANSPOSITION

Nearly half of the human genome has sequences derived from transposable elements. Transposable elements
are mobile genetic elements that move from one site of the genome to another.
There are 3 principal classes of transposable elements:
1. DNA transposons,
2. Viral-like retrotransposons, which carry LTR sequences,
3. Poly-A retrotransposons.

 DNA transposons
o Carry both DNA sequences that function as recombination sites and genes encoding proteins
that participate in recombination. The recombination sites are at the two ends of the element
and are organized as inverted repeat sequences. These repeat sequences carry the
recombinase recognition sites.
o Transposition is carried out by transposases that are also encoded by the transposable element.
 Viral-like retrotransposons
 Carry genes for 2 proteins needed for their mobility: integrase (the transposase) and reverse transcriptase (RT).
 This type of retrotransposon is flanked by LTRs (long terminal repeats). RT is needed for transposition
because an RNA intermediate is required for the transposition reaction.
 These retrotransposons can move only to new DNA sites within a cell and never leave that cell.
 Poly-A retrotransposons look like genes
 These elements are flanked by 2 different sequences called the 5’UTR and 3’UTR (untranslated regions).
The 3’UTR is followed by a stretch of A-T base pairs called the poly-A sequence. These elements are
also flanked by short target site duplications.
 This type of retrotransposon carries 2 genes known as ORF1 and ORF2. ORF1 encodes an RNA-binding
protein. ORF2 encodes a protein with both reverse transcriptase activity and endonuclease
activity. This protein, although distinct from the transposases and integrases encoded by the other
classes of elements, plays essential roles during recombination.
 There are many truncated elements that do not have a complete 5’UTR sequence and hence have lost
their ability to transpose.

CHAPTER: 6 TRANSCRIPTION IN PROKARYOTES
Three phases of transcription:
Gene transcription by E. coli RNA polymerase takes place in three phases: initiation, elongation and
termination.
►During initiation, RNA polymerase recognizes a specific site on the DNA, upstream from the gene that will
be transcribed, called a promoter site and then unwinds the DNA locally. During elongation the RNA
polymerase uses the antisense (-) strand of DNA as template and synthesizes a complementary RNA
molecule using ribonucleoside 5’ triphosphates as precursors. The RNA produced has the same sequence as
the non-template strand, called the sense (+) strand (or coding strand) except that the RNA contains U
instead of T.
►At different locations on the bacterial chromosome, sometimes one strand is used as template, sometimes
the other, depending on which strand is the coding strand for the gene in question. The correct strand to be
used as template is identified for the RNA polymerase by the presence of the promoter site. Finally, the RNA
polymerase encounters a termination signal and ceases transcription, releasing the RNA transcript and
dissociating from the DNA.
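
The relationship between the template strand, the coding strand and the RNA can be illustrated with a short Python sketch. The nine-base sequence is a made-up example, and the template is written 3'→5' so that the RNA is produced directly in the 5'→3' direction.

```python
# Base-pairing rules for copying a DNA template into RNA.
PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5):
    """Build the RNA (5'->3') complementary to a template strand written 3'->5'."""
    return "".join(PAIR[base] for base in template_3to5)


coding = "ATGGCATTC"     # sense (+) strand, 5'->3' (hypothetical)
template = "TACCGTAAG"   # antisense (-) strand, 3'->5'

rna = transcribe(template)
print(rna)                              # AUGGCAUUC
print(rna == coding.replace("T", "U"))  # True
```

The second line of output confirms the statement above: the RNA has the same sequence as the coding strand, with U in place of T.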

Promoters And Initiation: In E. coli, all genes are transcribed by a single large RNA polymerase with the
subunit structure α2ββ'σ. This complete enzyme, called the holoenzyme, is needed to initiate transcription
since the σ factor is essential for recognition of the promoter; it decreases the affinity of the core enzyme for
nonspecific DNA binding sites and increases its affinity for the promoter. It is common for prokaryotes to have
several σ factors that recognize different types of promoter (in E. coli, the most common σ factor is σ70).

The holoenzyme binds to a promoter region about 40-60 bp in size and then initiates transcription a short
distance downstream (i.e. 3' to the promoter). Within the promoter lie two 6-bp sequences that are particularly
important for promoter function and which are therefore highly conserved between species. Using the
convention of calling the first nucleotide of a transcribed sequence +1, these two promoter elements lie at
positions -10 and -35, that is about 10 and 35 bp, respectively, upstream of where transcription will begin.

1. The -10 sequence has the consensus TATAAT. Because this element was discovered by Pribnow, it is also
known as the Pribnow box. It is an important recognition site that interacts with the σ factor of RNA
polymerase, and its AT-rich sequence is where DNA unwinding begins during transcriptional initiation.
2. The -35 sequence has the consensus TTGACA and is also recognized by the σ factor during the initial
binding of RNA polymerase to the promoter.


The actual sequence between the -10 sequence and the -35 sequence is not conserved (i.e. it varies from
promoter to promoter) but the distance between these two sites is extremely important for correct
functioning of the promoter.
Promoters differ by up to 1000-fold in their efficiency of initiation of transcription so that genes with strong
promoters are transcribed very frequently whereas genes with weak promoters are transcribed far less often.
The -10 and -35 sequences of strong promoters correspond well with the consensus sequences,
whereas weaker promoters may have sequences that differ from these at one or more nucleotides.
The nature of the sequences around the transcriptional start site can also influence the efficiency of initiation.
RNA polymerase does not need a primer to begin transcription; having bound to the promoter site, the RNA
polymerase begins transcription directly.
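
How closely a promoter matches the two consensus hexamers can be counted with a simple Python sketch. This is only an illustration: the scoring function is a hypothetical helper, the second pair of hexamers is an invented example, and the count ignores the spacing between the two elements, which the text notes is also important.

```python
MINUS_35 = "TTGACA"  # -35 consensus
MINUS_10 = "TATAAT"  # -10 consensus (Pribnow box)


def matches(site, consensus):
    """Count positions at which a hexamer agrees with the consensus."""
    return sum(1 for a, b in zip(site, consensus) if a == b)


def promoter_score(minus35, minus10):
    """Total matches to both consensus hexamers (maximum 12)."""
    return matches(minus35, MINUS_35) + matches(minus10, MINUS_10)


print(promoter_score("TTGACA", "TATAAT"))  # 12, perfect consensus
print(promoter_score("TTTACA", "TATGTT"))  # 9, a weaker match
```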

Elongation: After transcription initiation, the σ factor is released from the transcriptional complex to leave
the core enzyme (α2ββ'), which continues elongation of the RNA transcript. Thus the core enzyme contains
the catalytic site for polymerization, probably within the β subunit. The first nucleotide in the RNA transcript
is usually pppG or pppA. The RNA polymerase then synthesizes RNA in the 5’ → 3’ direction, using the four
ribonucleoside 5’ triphosphates (ATP, CTP, GTP, UTP) as precursors. The 3’-OH at the end of the growing RNA
chain attacks the phosphate group of the incoming ribonucleoside 5’ triphosphate to form a 3’-5’
phosphodiester bond.
►The complex of RNA polymerase, DNA template and new RNA transcript is called a ternary complex (i.e.
three components) and the region of unwound DNA that is undergoing transcription is called the transcription
bubble. The RNA transcript forms a transient RNA-DNA hybrid helix with its template strand but then peels
away from the DNA as transcription proceeds. The DNA is unwound ahead of the transcription bubble and
after the transcription complex has passed, the DNA rewinds.

Termination:
Transcription continues until a termination sequence is reached. The most common termination signal is a
GC-rich region that is a palindrome, followed by an AT-rich sequence. The RNA made from the DNA palindrome is
self-complementary and so base pairs internally to form a hairpin structure rich in GC base pairs, followed by
four or more U residues. However, not all termination sites have this hairpin structure. Those that lack such a
structure require an additional protein, called rho (ρ), to help recognize the termination site and stop
transcription.

►RNA processing:
In prokaryotes, RNA transcribed from protein-coding genes (messenger RNA, mRNA), requires little or no
modification prior to translation. In fact, many mRNA molecules begin to be translated even before RNA
synthesis has finished. However, ribosomal RNA (rRNA) and transfer RNA (tRNA) are synthesized as precursor
molecules that do require post-transcriptional processing.

Induction of The Lac Operon:
Many protein-coding genes in bacteria are clustered together in operons, which serve as transcriptional
units that are coordinately regulated.
One of the most studied of these is the lac operon in E. coli. This codes for key enzymes involved in lactose
metabolism: galactoside permease (also known as lactose permease; it transports lactose into the cell across
the cell membrane) and β-galactosidase (which hydrolyzes lactose to glucose and galactose). It also codes for
a third enzyme, thiogalactoside transacetylase, but the role of this enzyme is not clear.
Normally E. coli cells make very little of any of these three proteins, but when lactose is available it causes a
large and coordinated increase in the amount of each enzyme. Thus each enzyme is an inducible enzyme and
the process is called induction. The mechanism is that the few molecules of β-galactosidase in the cell before
induction convert the lactose to allolactose, which then turns on transcription of these three genes in the lac
operon. Thus allolactose is an inducer.
Another inducer of the lac operon is isopropylthiogalactoside (IPTG). Unlike allolactose, this inducer is not
metabolized by E. coli and so is useful for experimental studies of induction.
Allolactose is a disaccharide similar to lactose. It consists of the monosaccharides β-D-galactose and
β-D-glucose linked through a β1-6 glycosidic linkage. Allolactose binds to an allosteric site on the repressor
protein, causing a conformational change. As a result of this change, the repressor can no longer bind to the
operator region and falls off. RNA polymerase can then bind to the promoter and transcribe the lac genes.

Jacob and Monod proposed the operon model for the regulation of transcription.
The operon model proposes three elements:
1. a set of structural genes (i.e. genes encoding the proteins to be regulated);
2. an operator site, which is a DNA sequence that regulates transcription of the structural genes;
3. a regulator gene, which encodes a protein that recognizes the operator sequence.
In the lac operon, the structural genes are the lacZ, lacY and lacA genes encoding β-galactosidase, the
permease and the transacetylase, respectively.
They are transcribed to yield a single polycistronic mRNA that is then translated to produce all three
enzymes. The existence of a polycistronic mRNA ensures that the amounts of all three gene products are
regulated coordinately. Transcription occurs from a single promoter (Plac) that lies upstream of these structural
genes and binds RNA polymerase. However, also present are an operator site (Olac) between the promoter
and the structural genes, and a lacI gene that codes for the lac repressor protein.

The Lac Repressor:
The lacI gene has its own promoter (PlacI) that binds RNA polymerase and leads to transcription of lac repressor
mRNA and hence production of lac repressor protein monomers. Four identical repressor monomers come
together to form the active tetramer, which can bind tightly to the lac operator site, Olac. The Olac sequence is
palindromic, that is, it has the same DNA sequence when one strand is read 5’ to 3’ and the complementary
strand is read 5’ to 3’.
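
This definition of a palindrome can be tested directly: a sequence is palindromic if it equals its own reverse complement. The Python sketch below uses two short made-up test sequences rather than the actual operator sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(seq):
    """True if both strands read the same in the 5' to 3' direction."""
    return seq == reverse_complement(seq)


print(is_palindromic("GAATTC"))   # True
print(is_palindromic("GATTACA"))  # False
```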

This symmetry of the operator site is matched by the symmetry of the repressor tetramer.
In the absence of an inducer such as allolactose or IPTG, the lacI gene is transcribed and the resulting
repressor protein binds to the operator site of the lac operon, Olac, and prevents transcription of the lacZ,
lacY and lacA genes. During induction, the inducer binds to the repressor. This causes a change in
conformation of the repressor that greatly reduces its affinity for the lac operator site. The lac repressor now
dissociates from the operator site and allows the RNA polymerase (already in place on the adjacent promoter
site) to begin transcribing the lacZ, lacY and lacA genes. This yields many copies of the polycistronic mRNA
and, after translation, large amounts of all three enzymes. If inducer is removed, the lac repressor rapidly
binds to the lac operator site and transcription is inhibited almost immediately. The lacZYA RNA transcript is
very unstable and so degrades quickly such that further synthesis of the β-galactosidase, permease and
transacetylase ceases.

CRP/CAP:
High-level transcription of the lac operon requires the presence of a specific activator protein called catabolite
activator protein (CAP), also called cAMP receptor protein (CRP). This protein, which is a dimer, cannot bind
to DNA unless it is complexed with 3’,5’-cyclic AMP (cAMP). The CRP-cAMP complex binds to the lac promoter
just upstream from the binding site for RNA polymerase. It increases the binding of RNA polymerase and so
stimulates transcription of the lac operon.

Whether or not the CRP protein is able to bind to the lac promoter depends on the carbon source available to
the bacterium. When glucose is present, E. coli does not need to use lactose as a carbon source and so the lac
operon does not need to be active. Thus the system has evolved to be responsive to glucose. Glucose inhibits
adenylate cyclase, the enzyme that synthesizes cAMP from ATP. Thus, in the presence of glucose the
intracellular level of cAMP falls, so CRP cannot bind to the lac promoter, and the lac operon is only weakly
active (even in the presence of lactose). When glucose is absent, adenylate cyclase is not inhibited, the level
of intracellular cAMP rises and binds to CRP.

Therefore, when glucose is absent but lactose is present, the CRP-cAMP complex stimulates transcription of
the lac operon and allows the lactose to be used as an alternative carbon source. In the absence of lactose,
the lac repressor of course ensures that the lac operon remains inactive. These combined controls ensure that
the lacZ, lacY and lacA genes are transcribed strongly only if glucose is absent and lactose is present.
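
The combined effect of the two controls can be summarized as a small truth table. The Python sketch below is a deliberately simplified yes/no model written for illustration; the function name and the labels "off", "low" and "high" are assumptions, not measured expression levels.

```python
def lac_operon_state(glucose, lactose):
    """Qualitative transcription level of the lac operon."""
    repressor_bound = not lactose  # no inducer: repressor stays on the operator
    crp_camp_bound = not glucose   # no glucose: cAMP rises and CRP-cAMP binds
    if repressor_bound:
        return "off"
    return "high" if crp_camp_bound else "low"


for glucose in (True, False):
    for lactose in (True, False):
        state = lac_operon_state(glucose, lactose)
        print(f"glucose={glucose!s:5} lactose={lactose!s:5} -> {state}")
```

Only the combination of no glucose and lactose present returns "high", matching the conclusion above.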

Positive and negative regulation:


The lac operon is a good example of negative control (negative regulation) of gene expression in that bound
repressor prevents transcription of the structural genes. Positive control (positive regulation) of gene
expression is when the regulatory protein binds to DNA and increases the rate of transcription. In this case the
regulatory protein is called an activator. The CAP/CRP involved in regulating the lac operon is a good example
of an activator. Thus the lac operon is subject to both negative and positive control.
►The lac operon is an INDUCIBLE operon under negative control: it is off by default and is turned on by an
inducer molecule (allolactose).
►CATABOLITE REPRESSION: as long as glucose is present, the lac operon is effectively OFF (only weakly active),
even if allolactose is present. Glucose acts through another regulatory protein, the Catabolite Activator
Protein (CAP), and its DNA binding site (the CAP site).
►CAP is an allosteric protein, regulated by cAMP:
when glucose is low, cAMP levels are high; the cAMP-CAP conformation
can bind to the CAP DNA region, which favors rapid transcription
when glucose is high, there is little cAMP
►the CAP-alone conformation doesn't bind to the CAP DNA region, which favors slow transcription

THE TRP OPERON
Organization Of The TRP Operon:
The tryptophan (trp) operon contains five structural genes encoding enzymes for tryptophan biosynthesis with
an upstream trp promoter (Ptrp) and trp operator sequence (Otrp). The trp operator region partly overlaps the
trp promoter. The operon is regulated such that transcription occurs when tryptophan in the cell is in short
supply.
Repression:
In the absence of tryptophan, a trp repressor protein encoded by a separate operon, trpR, is synthesized and
forms a dimer. However, this is inactive and so is unable to bind to the trp operator, and the structural genes
of the trp operon are transcribed. When tryptophan is present, the enzymes for tryptophan biosynthesis are
not needed and so expression of these genes is turned off. This is achieved by tryptophan binding to the
repressor to activate it so that it now binds to the operator and stops transcription of the structural genes. In
this role, tryptophan is said to be a co-repressor. This is negative control, because the bound repressor
prevents transcription, but note that the lac operon and trp operon show two ways in which negative control
can be achieved: either (as in the lac operon) by having an active bound repressor that is inactivated by a
bound ligand (the inducer) or (as in the trp operon) by having a repressor that is normally inactive but is
activated by binding the ligand. As in the case of the lac operator, the core binding site for the trp repressor in
the trp operator is palindromic.

Attenuation: A second mechanism, called attenuation, is also used to control expression of the trp operon.
The 5’ end of the polycistronic mRNA transcribed from the trp operon has a leader sequence upstream of the
coding region of the trpE structural gene. This leader sequence encodes a 14 amino acid leader peptide
containing two tryptophan residues.

The function of the leader sequence is to fine tune expression of the trp operon based on the availability of
tryptophan inside the cell. It does this as follows. The leader sequence contains four regions that can form a
variety of base-paired stem-loop (‘hairpin’) secondary structures. Now consider the two extreme situations:
the presence or absence of tryptophan. Attenuation depends on the fact that, in bacteria, ribosomes attach to
mRNA as it is being synthesized and so translation starts even before transcription of the whole mRNA is
complete.

When tryptophan is abundant, ribosomes bind to the trp polycistronic mRNA that is being transcribed and
begin to translate the leader sequence. Now, the two trp codons for the leader peptide lie within sequence 1,
and the translational stop codon lies between sequences 1 and 2. During translation, the ribosomes follow very
closely behind the RNA polymerase and synthesize the leader peptide, with translation stopping eventually
between sequences 1 and 2. At this point, the position of the ribosome prevents sequence 2 from interacting
with sequence 3. Instead sequence 3 base pairs with sequence 4 to form a 3:4 stem loop, which acts as a
transcription terminator. Therefore, when tryptophan is present, further transcription of the trp operon is
prevented. If, however, tryptophan is in short supply, the ribosome will pause at the two trp codons
contained within sequence 1. This leaves sequence 2 free to base pair with sequence 3 to form a 2:3 structure
(also called the anti-terminator), so the 3:4 structure cannot form and transcription continues to the end of
the trp operon. Hence the availability of tryptophan controls whether transcription of this operon will stop
early (attenuation) or continue to synthesize a complete polycistronic mRNA.
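
The decision described above reduces to which stem-loop forms in the leader RNA. The following Python sketch encodes just that logic as a simplified two-state model; the function name and return values are illustrative assumptions.

```python
def trp_leader_outcome(tryptophan_abundant):
    """Return (stem_loop, transcription_continues) for the trp leader."""
    if tryptophan_abundant:
        # The ribosome reaches the stop codon between sequences 1 and 2,
        # blocks sequence 2, and sequences 3 and 4 pair as a terminator.
        stem_loop = "3:4"
    else:
        # The ribosome stalls at the trp codons in sequence 1, so
        # sequence 2 is free to pair with sequence 3 (anti-terminator).
        stem_loop = "2:3"
    transcription_continues = stem_loop == "2:3"
    return stem_loop, transcription_continues


print(trp_leader_outcome(True))   # ('3:4', False)
print(trp_leader_outcome(False))  # ('2:3', True)
```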

Historically, attenuation was discovered when it was noticed that deletion of a short sequence of DNA
between the operator and the first structural gene, trpE, increased the level of transcription. This region was
named the attenuator and is the DNA that encodes that part of the leader sequence that forms the
transcription terminator stem-loop.

The Arabinose Operon

The ara operon codes for three enzymes that are required to catalyze the metabolism of arabinose.

 Arabinose isomerase - encoded by araA - converts arabinose to ribulose
 Ribulokinase - encoded by araB - phosphorylates ribulose
 Ribulose-5-phosphate epimerase - encoded by araD - converts ribulose-5-phosphate to
xylulose-5-phosphate, which can then be metabolized via the pentose phosphate pathway.

The three structural genes are arranged in an operon that is regulated by the araC gene product. There are four
important regulatory sites:

• araO1 is an operator site. AraC binds to this site and represses its own transcription from the PC promoter.
In the presence of arabinose, however, AraC bound at this site helps to activate expression of
the PBAD promoter.
• araO2 is also an operator site. AraC bound at this site can simultaneously bind to the araI site to repress
transcription from the PBAD promoter.
• araI is the inducer site. AraC bound at this site can simultaneously bind to the araO2 site to repress
transcription from the PBAD promoter. In the presence of arabinose, however, AraC bound at this site helps to
activate expression of the PBAD promoter.
• CRP binds to the CRP binding site. It does not directly assist RNA polymerase to bind to the promoter in
this case. Instead, when arabinose is present, it promotes the rearrangement of AraC from a state in which it
represses transcription from the PBAD promoter to one in which it activates transcription from it.

Regulation of the arabinose operon is clearly much more complex than that of the lactose operon.
When arabinose is absent, there is no need to express the structural genes, and AraC keeps them switched off
by binding simultaneously to araI and araO2. As a result the intervening DNA is looped. These
two events block access to the PBAD promoter which is, in any case, a very weak promoter (unlike
the lac promoter).

AraC also represses its own expression; that is, it is autoregulated. This makes sense;
there is no need to over-express AraC. If the concentration falls too low then transcription of araC resumes
until the amount of AraC is sufficient to prevent more transcription again.

When arabinose is present, it binds to AraC and allosterically induces it to bind to araI instead of araO2.


If glucose is also absent, then the presence of CRP bound to its site between araO1 and araI helps to break the
DNA loop and also helps AraC to bind to araI.

The ara operon demonstrates both negative and positive control. It shows a different function for CRP. It also
shows how a protein can act as a switch with its activity being radically altered upon the binding of a small
molecule.
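
The switch behaviour of AraC can be summarised as a small decision function. This is a rough sketch of the logic given above, not a quantitative model; the function name and the state labels are hypothetical. The text does not say what happens when arabinose and glucose are both present, so the "weakly activated" outcome for that case is an assumption made for this example.

```python
def pbad_state(arabinose: bool, glucose: bool) -> str:
    """Qualitative state of the PBAD promoter for a given sugar supply."""
    if not arabinose:
        # AraC bridges araI and araO2, looping the DNA and blocking PBAD.
        return "repressed"
    if not glucose:
        # Arabinose-bound AraC sits on araI, and CRP helps open the loop.
        return "activated"
    # Assumption: without CRP assistance, activation is only partial.
    return "weakly activated"


for arabinose in (False, True):
    for glucose in (False, True):
        print(f"arabinose={arabinose}, glucose={glucose}: "
              f"{pbad_state(arabinose, glucose)}")
```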

CHAPTER: 7 TRANSCRIPTION IN EUKARYOTES
Three RNA Polymerases:
Unlike prokaryotes where all RNA is synthesized by a single RNA polymerase, the nucleus of a eukaryotic cell
has three RNA polymerases responsible for transcribing different types of RNA.
1. RNA polymerase I (RNA Pol I) is located in the nucleolus and transcribes the 28S, 18S and 5.8S rRNA genes.
2. RNA polymerase II (RNA Pol II) is located in the nucleoplasm and transcribes protein-coding genes, to yield
pre-mRNA, and also the genes encoding small nuclear RNAs (snRNAs) involved in mRNA processing, except
for U6 snRNA.
3. RNA polymerase III (RNA Pol III) is also located in the nucleoplasm. It transcribes the genes for tRNA, 5S
rRNA, U6 snRNA, and the 7S RNA associated with the signal recognition particle (SRP) involved in
the translocation of proteins across the endoplasmic reticulum membrane.
The three polymerases can be distinguished by their sensitivity to the toxin α-amanitin: RNA polymerase I is
insensitive, RNA Pol II is highly sensitive, and RNA Pol III is slightly sensitive.

RNA Synthesis: The basic mechanism of RNA synthesis by these eukaryotic RNA polymerases is the same as
for the prokaryotic enzyme, that is:
1. The initiation of RNA synthesis by RNA polymerase is directed by the presence of a promoter site on the 5’
side of the transcriptional start site;
2. The RNA polymerase transcribes one strand, the antisense (−) strand, of the DNA template;
3. RNA synthesis does not require a primer;
4. RNA synthesis occurs in the 5’ → 3’ direction with the RNA polymerase catalyzing a nucleophilic attack by
the 3’-OH of the growing RNA chain on the α-phosphorus atom of an incoming ribonucleoside 5’-triphosphate.
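
The relationship between the antisense template and the RNA product can be illustrated with a few lines of Python. This is a minimal sketch: the function name and the example sequence are invented for illustration, and the template is assumed to be written 3’ to 5’ so that the returned RNA reads 5’ to 3’.

```python
# Base-pairing rules for copying a DNA template into RNA.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Copy an antisense DNA strand (written 3'->5') into RNA (5'->3').

    Each new ribonucleotide is added to the 3' end of the growing chain,
    which is why the RNA is built left to right here.
    """
    rna = []
    for base in template_3_to_5.upper():
        rna.append(DNA_TO_RNA[base])  # KeyError on anything but A/C/G/T
    return "".join(rna)


print(transcribe("TACGGT"))  # AUGCCA
```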
RNA Polymerase Subunits: Each of the three eukaryotic RNA polymerases contains 12 or more subunits and
so these are large complex enzymes. The genes encoding some of the subunits of each eukaryotic enzyme
show DNA sequence similarities to genes encoding subunits of the core enzyme (α2ββ′) of E. coli RNA
polymerase. However, four to seven other subunits of each eukaryotic RNA polymerase are unique in that
they show no similarity either with bacterial RNA polymerase subunits or with the subunits of other
eukaryotic RNA polymerases.

TRANSCRIPTION OF PROTEIN CODING GENES IN EUKARYOTES


Initiation of transcription:
Most promoter sites for RNA polymerase II include a highly conserved sequence located about 25-35 bp
upstream (i.e. to the 5’ side) of the start site which has the consensus TATA(A/T)A(A/T) and is called the TATA
box. Since the start site is denoted as position +1, the TATA box is said to be located at about
position -25. The TATA box sequence resembles the -10 sequence in prokaryotes (TATAAT) except that it is
located further upstream. Both elements have essentially the same function, namely recognition by the RNA
polymerase in order to position the enzyme at the correct location to initiate transcription. The sequence
around the TATA box is also important in that it influences the efficiency of initiation. Transcription is also
regulated by upstream control elements that lie 5’ to the TATA box.
Some eukaryotic protein-coding genes lack a TATA box and have an initiator element instead, centered
around the transcriptional initiation site. This does not have a strong consensus between genes but often
includes a C at position -1 and an A at position +1.

Yet other promoters have neither a TATA box nor an initiator element; these genes tend to be transcribed at
low rates and initiate transcription somewhere within a broad region of DNA (about 200 bp or so) rather than
at a defined transcriptional start site.
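
The TATA box consensus TATA(A/T)A(A/T) given above translates directly into a regular expression. The sketch below scans the sense strand upstream of a known start site; the function name, the default search window of -40 to -20 and the test sequence are assumptions chosen for this example, not values from the text.

```python
import re

# Lookahead so that overlapping candidate sites are all reported.
TATA_BOX = re.compile(r"(?=(TATA[AT]A[AT]))")


def find_tata_boxes(dna: str, tss_index: int, window=(-40, -20)):
    """Return (position, sequence) for TATA-box matches near a start site.

    tss_index is the 0-based index of the +1 nucleotide. Positions are
    reported relative to +1 (there is no position 0), so the base just
    upstream of the start site is -1.
    """
    first = max(0, tss_index + window[0])
    last = tss_index + window[1]
    hits = []
    for match in TATA_BOX.finditer(dna.upper()):
        i = match.start()
        if first <= i <= last:
            hits.append((i - tss_index, match.group(1)))
    return hits


# Synthetic promoter: the TATA box starts 30 bp upstream of the +1 "A".
promoter = "G" * 20 + "TATAAAA" + "G" * 23 + "ACGT"
print(find_tata_boxes(promoter, tss_index=50))  # [(-30, 'TATAAAA')]
```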

In order to initiate transcription, RNA polymerase II requires the assistance of several other proteins or protein
complexes, called general (or basal) transcription factors, which must assemble into a complex on the
promoter in order for RNA polymerase to bind and start transcription. These all have the generic name of TFII
(for Transcription Factor for RNA polymerase II). The first event in initiation is the binding of the transcription
factor IID (TFIID) protein complex to the TATA box. The key subunit of TFIID is TBP (TATA box binding
protein). Other subunits in the TFIID complex are called TBP-associated factors (TAFs). The order of events is
that TBP binds to the TATA box and then at least eight TAFs bind to form TFIID. As soon as the TFIID complex
has bound, TFIIA binds and stabilizes the TFIID-TATA box interaction. Next, TFIIB binds to TFIID. However,
TFIIB can also bind to RNA polymerase II and so acts as a bridging protein. Thus, RNA polymerase II, which has
already complexed with TFIIF, now binds. This is followed by the binding of TFIIE, H and J. This final protein
complex contains at least 40 polypeptides and is called the transcription initiation complex.
It can now begin to transcribe the gene, although at only a relatively low rate, and is the basal transcription
apparatus. For a high rate of transcription, other transcription factors are required which bind to additional
sequence elements and interact with this initiation complex. Those protein-coding genes that have an initiator
element instead of a TATA box appear to need another protein(s) that binds to the initiator element and
facilitates the binding of TBP. The other transcription factors then bind to form the transcription initiation
complex in a similar manner to that described above for genes possessing a TATA box promoter.
Elongation and termination:
Elongation of the RNA chain continues until termination occurs. Unlike RNA polymerase in prokaryotes, RNA
polymerase II does not terminate transcription at specific sites but rather transcription stops at varying
distances downstream of the gene. The RNA molecule made from a protein-coding gene by RNA polymerase II
is called a primary transcript.
Unlike the situation in prokaryotes, the primary transcript from a eukaryotic protein-coding gene is a
precursor molecule, pre-mRNA, that needs extensive RNA processing in order to yield mature mRNA ready for
translation. Several RNA processing reactions are involved: capping, 3’ cleavage and polyadenylation and RNA
splicing.

REGULATION OF TRANSCRIPTION BY RNA POL II


Mechanism of regulation:
A number of protein-coding genes are active in all cells and are required for so-called ‘house-keeping’
functions, such as the enzymes of glycolysis, the citric acid cycle and the proteins of the electron transport
chain. However, some genes are active only in specific cell types and are responsible for defining the specific
characteristics and function of those cells; for example immunoglobulin genes in lymphocytes, myosin in
muscle cells. In addition, the proteins expressed by any given cell may change over time (for example during
early development) or in response to external stimuli, such as hormones. Eukaryotic cells can regulate the
expression of protein-coding genes at a number of levels but a prime site of regulation is transcription.

Transcriptional regulation in a eukaryotic cell (i.e. which genes are transcribed and at what rate) is
mediated by transcription factors, other than the general transcription factors, which recognize and bind to
short regulatory DNA sequences (control elements) associated with the gene. These sequences are also called
cis-acting elements (or simply cis-elements) since they are on the same DNA molecule as the gene being
controlled (cis is Latin for ‘on this side’). The protein transcription factors that bind to these elements are also
known as trans-acting factors (or simply trans-factors) in that the genes encoding them can be on different
DNA molecules (i.e. on different chromosomes). The transcription factors, which regulate specific gene
transcription, do so by interacting with the proteins of the transcription initiation complex and may either
increase (activate) or decrease (repress) the rate of transcription of the target gene. Typically each protein
coding gene in a eukaryotic cell has several control elements in its promoter and hence is under the control of
several transcription factors which interact with each other and with the transcription initiation complex by
protein-protein interaction to determine the rate of transcription of that gene.

Fig:- Control regions that regulate transcription of a typical eukaryotic protein-coding gene. Although shown as distinct entities here
for clarity, in vivo the different regulatory proteins bound to the control elements and distant enhancers interact with each other and
with the general transcription factors of the transcription initiation complex to modulate the rate of transcriptional initiation.

Upstream regulatory elements:


Many transcription factors bind to control elements within a few hundred base-pairs of the protein-coding
gene being regulated. Positive control elements that lie upstream of the gene, usually within 200 bp of the
transcriptional start site, are often called upstream regulatory elements (UREs) and function to increase the
transcriptional activity of the gene well above that of the basal promoter. Some of these elements, for
example the SP1 box and the CAAT box, are found in the promoters of many eukaryotic protein-coding genes;
indeed genes often have several copies of one or both elements.
The SP1 box has the core sequence GGGCGG, and binds transcription factor SP1 which then interacts with
one of the TAF proteins that bind to TBP to form TFIID. In contrast, some upstream regulatory elements are
associated only with a few specific genes and are responsible for limiting the transcription of those genes to
certain tissues or in response to certain stimuli such as steroid hormones. For example, steroid hormones
control metabolism by entering the target cell and binding to specific steroid hormone receptors in the
cytoplasm. The binding of the hormone releases the receptor from an inhibitor protein that normally keeps
the receptor in the cytoplasm. The hormone-receptor complex, now free of inhibitor, dimerizes and travels to
the nucleus where it binds to a transcriptional control element, called a hormone response element, in the
promoters of target genes. Then, like other transcription factors, the bound hormone-receptor complex
interacts with the transcription initiation complex to increase the rate of transcription of the gene. The result
is a hormone-specific transcription of a subset of genes in target cells that contain the appropriate steroid
hormone receptor. Here, the hormone receptor is itself a transcription factor that is activated by binding the
hormone ligand.
Unlike steroid hormones, polypeptide hormones, such as insulin and cytokines, do not enter the target cell
but instead bind to protein receptors located at the cell surface. The binding reaction triggers a cascade of
protein activations, often involving protein phosphorylation, which relay the signal inside the cell (signal
transduction). Again the response may be that specific transcription factors are activated and stimulate the
transcription of selected genes, but here the activation is mediated via the signal transduction pathway and
does not involve direct binding of the hormone or cytokine to the transcription factor. Many additional
examples of transcriptional activation of specific genes by transcription factors exist in eukaryotes.

Enhancers:
Although many positive control elements lie close to the gene they regulate, others can be located long
distances away (sometimes 10-50 kb) either upstream or downstream of the gene. A long-distance positive
control sequence of this kind is called an enhancer if the transcription factor(s) that binds to it increases the
rate of transcription. An enhancer is typically 100-200 bp long and contains several sequence elements that
act together to give the overall enhancer activity. When they were first discovered, enhancers were viewed as
a distinct class of control element in that they:
1. can activate transcription over long distances;
2. can be located upstream or downstream of the gene being controlled;
3. are active in either orientation with respect to the gene.
However, it is now clear that some upstream promoter elements and enhancers show strong similarities
physically and functionally so that the distinction is not as clear as was once thought. For enhancers located a
long distance away from the gene being controlled, interaction between transcription factors bound to the
enhancer and to promoter elements near the gene occurs by looping out of the DNA between the two sets of
elements.

Fig:- Looping out of DNA allowing the interaction of enhancer-bound factor(s) with the transcription initiation complex.

Transcription factors have multiple domains:


In most cases, the transcription factors in eukaryotes that bind to enhancer or promoter sequences are
activator proteins that induce transcription. These proteins usually have at least two distinct domains of
protein structure, a DNA-binding domain that recognizes the specific DNA sequence to bind to, and an
activation domain responsible for bringing about the transcriptional activation by interaction with other
transcription factors and/or the RNA polymerase molecule. Many transcription factors operate as dimers,
either homodimers (identical subunits) or heterodimers (dissimilar subunits) with the subunits held together
via dimerization domains. DNA binding domains and dimerization domains contain characteristic protein
structures (motifs) that are described below. Finally, some transcription factors (e.g. steroid hormone receptors) are
responsive to specific small molecules (ligands), which regulate the activity of the transcription factor. In these
cases, the ligand binds at a ligand-binding domain.

DNA binding domains


Helix-turn-helix
This motif consists of two α-helices separated by a short (four-amino acid) peptide sequence that forms a
β-turn. When the transcription factor binds to DNA, one of the helices, called the recognition helix, lies in the
major groove of the DNA double helix. The helix-turn-helix motif was originally discovered in certain
transcription factors that play major roles in Drosophila early development. These proteins each contain a
60-amino acid DNA-binding region called a homeodomain (encoded by a DNA sequence called a homeobox). The
homeodomain has four α-helices in which helices II and III are the classic helix-turn-helix motif. Since the
original discovery, the helix-turn-helix motif has been found in a wide range of transcription factors, including
many that have no role in development.

Zinc finger
1. The C2H2 zinc finger is a loop of 12 amino acids with two cysteines and two histidines at the base of the loop
that tetrahedrally coordinate a zinc ion. This forms a compact structure of two β-strands and one α-helix.
The α-helix contains a number of conserved basic amino acids and interacts directly with the DNA, binding in
the major groove of the double helix.
C2H2 zinc fingers are extremely common in mammalian transcription factors. These domains adopt a simple ββα fold and
have the amino acid sequence motif X2-Cys-X2,4-Cys-X12-His-X3,4,5-His.
Transcription factors that contain zinc fingers often contain several such motifs; usually at least three zinc
fingers are needed for tight DNA binding of the protein. Indeed, RNA polymerase III transcription factor A
(TFIIIA) contains nine zinc fingers.
The SP1 transcription factor, which binds to the SP1 box, has three zinc fingers.
2. Gag-knuckle: This fold group is defined by two short β-strands connected by a turn (zinc knuckle)
followed by a short helix or loop and resembles the classical Cys2His2 motif with a large portion of the helix
and β-hairpin truncated. The retroviral nucleocapsid (NC) proteins from HIV and other related retroviruses are
examples of proteins possessing these motifs. The gag-knuckle zinc finger in the HIV NC protein is the target
of a class of drugs known as zinc finger inhibitors.
3. Treble-clef
The treble-clef motif consists of a β-hairpin at the N-terminus and an α-helix at the C-terminus that each
contribute two ligands for zinc binding, although a loop and a second β-hairpin of varying length and
conformation can be present between the N-terminal β-hairpin and the C-terminal α-helix. These fingers are
present in a diverse group of proteins that frequently do not share sequence or functional similarity with each
other. The best-characterized proteins containing treble-clef zinc fingers are the nuclear hormone receptors.
4. The C4 zinc finger is also found in a number of transcription factors, including steroid hormone receptor
proteins. This motif forms a similar structure to that of the C2H2 zinc finger but has four cysteines coordinated to
the zinc ion instead of two cysteines and two histidines.
These zinc fingers can be found in several transcription factors including the yeast Gal4 protein.
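
The C2H2 sequence motif quoted in item 1 (Cys-X2,4-Cys-X12-His-X3,4,5-His) can be turned into a regular expression for scanning a protein sequence. The sketch below is illustrative only: the function name is hypothetical, the test protein is synthetic (alanine filler around the conserved residues), and a pattern match by itself does not show that a real zinc finger folds or binds DNA.

```python
import re

# C, 2-4 residues, C, 12 residues, H, 3-5 residues, H (one-letter codes).
C2H2_MOTIF = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_c2h2_fingers(protein: str):
    """Return (start_index, matched_sequence) for each candidate finger."""
    return [(m.start(), m.group(0))
            for m in C2H2_MOTIF.finditer(protein.upper())]


# Synthetic example: two idealised fingers separated by a short linker.
finger = "C" + "AA" + "C" + "A" * 12 + "H" + "AAA" + "H"
protein = "MK" + finger + "GG" + finger
for start, seq in find_c2h2_fingers(protein):
    print(start, seq)
```

Counting the hits gives a quick estimate of how many fingers a factor carries, which the text links to tight DNA binding when there are at least three.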

Basic domains
DNA binding domains called basic domains (rich in basic amino acids), occur in transcription factors in
combination with leucine zipper or helix-loop-helix (HLH) dimerization domains. The combination of basic
domain and dimerization domain gives these proteins their names of basic leucine zipper proteins (bZIP) or
basic HLH proteins, respectively. In each case the dimerization means that two basic domains (one from each
monomer) interact with the target DNA.

Dimerization domains:
Leucine zippers
The leucine zipper motif contains a leucine every seventh amino acid in the primary sequence and forms an
α-helix with the leucines presented on the same side of the helix every second turn, giving a hydrophobic
surface. The transcription factor dimer is formed by the two monomers interacting via the hydrophobic faces
of their leucine zipper motifs. In the case of bZIP proteins, each monomer also has a basic DNA binding
domain located N-terminal to the leucine zipper. Thus the bZIP protein dimer has two basic domains. These
actually face in opposite directions which allows them to bind to DNA sequences that have inverted
symmetry. They bind in the major groove of the target DNA. The leucine zipper domain also acts as the
dimerization domain in transcription factors that use DNA binding domains other than the basic domain. For
example, some homeodomain proteins, containing the helix-turn-helix motif for DNA binding, have leucine
zipper dimerization domains. In all cases, the dimers that form may be homodimers or heterodimers.
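
The "leucine every seventh amino acid" rule lends itself to a simple sequence scan. The following is a minimal sketch, assuming that a run of at least four leucines at seven-residue spacing counts as a candidate zipper; that threshold, the function name and the synthetic test sequence are choices made for this example rather than values from the text.

```python
import re


def find_leucine_zippers(protein: str, min_leucines: int = 4):
    """Return (start, end) spans with a leucine at every seventh position.

    A span covers min_leucines or more leucines, each followed by six
    residues of any kind before the next leucine.
    """
    pattern = re.compile(r"(?:L.{6}){%d,}L" % (min_leucines - 1))
    return [(m.start(), m.end()) for m in pattern.finditer(protein.upper())]


# Synthetic sequence: five leucines spaced seven residues apart.
test_protein = "A" * 5 + "LAAAAAA" * 4 + "L" + "AAA"
print(find_leucine_zippers(test_protein))  # [(5, 34)]
```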
Helix-loop-helix motif
The helix-loop-helix (HLH) dimerization domain is quite distinct from the helix-turn-helix motif described
above (which is involved in DNA binding not dimerization) and must not be confused with it. The HLH domain
consists of two α-helices separated by a nonhelical loop. The C-terminal α-helix has hydrophobic amino acids
on one face. Thus two transcription factor monomers, each with an HLH motif can dimerize by interaction
between the hydrophobic faces of the two C-terminal α-helices. Like the leucine zipper, the HLH motif is often
found in transcription factors that contain basic DNA binding domains. Again, like the leucine zipper, the HLH
motif can dimerize transcription factor monomers to form either homodimers or heterodimers. This ability to
form heterodimers markedly increases the variety of active transcription factors that are possible and so
increases the potential for gene regulation.

Fig:- The leucine zipper and helix-loop-helix motifs.
Activation domains:
Unlike DNA binding domains and dimerization domains, no common structural motifs have yet been identified
in the activation domains of diverse transcription factors. However, most activation domains so far reported
appear to fall into one of three classes:
1. acidic activation domains are rich in acidic amino acids (aspartic and glutamic acids). For example,
mammalian glucocorticoid receptor proteins contain this type of activation domain;
2. glutamine-rich domains (e.g. as in SP1 transcription factor);
3. proline-rich domains (e.g. c-jun transcription factor).
Repressors:
Gene repressor proteins that inhibit the transcription of specific genes in eukaryotes also exist. They may act
by binding either to control elements within the promoter region near the gene or at sites located a long
distance away from the gene, called silencers. The repressor protein may inhibit transcription directly. One
example is the mammalian thyroid hormone receptor which, in the absence of thyroid hormone, represses
transcription of the target genes. However, other repressors inhibit transcription by blocking activation. This
can be achieved in one of several ways: by blocking the DNA binding site for an activator protein, by binding to
and masking the activation domain of the activator factor, or by forming a non-DNA binding complex with the
activator protein. Several examples of each mode of action are known.

PROCESSING OF EUKARYOTIC PRE-mRNA


In eukaryotes, the product of transcription of a protein-coding gene is pre-mRNA which requires processing to
generate functional mRNA. Several processing reactions occur. The 5’ end of the primary RNA transcript, pre-
mRNA, is modified by the addition of a 5’ cap (a process known as capping) and the 3’ end of most (but not
all) pre-mRNAs is also modified by cleavage and then the addition of 200-250 A residues to form a poly(A) tail
(a process called polyadenylation). The pre-mRNA sequence includes both coding (exon) and noncoding
(intron) regions. The latter need to be removed and the exon sequences joined together by RNA splicing to
generate a continuous coding sequence for translation. All of these mRNA processing reactions occur in the
nucleus so that, at any one time, there is a population of pre-mRNAs of different sizes reflecting both the sizes
of the protein-coding genes from which they were transcribed and the extent of processing that has occurred.
This population of RNA molecules is called heterogeneous nuclear RNA (hnRNA). hnRNA is not naked but has
specific proteins bound to it forming heterogeneous nuclear ribonucleoprotein (hnRNP) complexes. The
proteins are probably involved both in the various processing reactions and subsequent transport of mRNA
from the nucleus.

5’ processing: capping
Capping of pre-mRNA occurs immediately after synthesis and involves the addition of 7-methylguanosine
(m7G) to the 5’ end. To achieve this, the terminal 5’ phosphate is first removed by a phosphatase. Guanosyl
transferase then catalyzes a reaction whereby the resulting diphosphate 5’ end attacks the phosphorus atom
of a GTP molecule to add a G residue in an unusual 5’-5’ triphosphate link. The G residue is then methylated
by adding a methyl group to the N-7 position of the guanine ring, using S-adenosyl methionine as methyl
donor. This structure, with just the m7G in position, is called a cap 0 structure. The ribose of the adjacent
nucleotide (nucleotide 2 in the RNA chain) or the riboses of both nucleotides 2 and 3 may also be
methylated to give cap 1 or cap 2 structures respectively. In these cases, the methyl groups are added to the
2’ OH groups of the ribose sugars.

The cap protects the 5’ end of the primary transcript against attack by ribonucleases that have specificity for
3’-5’ phosphodiester bonds and so cannot hydrolyze the 5’-5’ bond in the cap structure. In addition, the cap
plays a role in the initiation step of protein synthesis in eukaryotes. Only RNA transcripts from eukaryotic
protein-coding genes become capped; prokaryotic mRNA and eukaryotic rRNA and tRNAs are uncapped.
3’ processing: cleavage and polyadenylation

1. Cleavage and Polyadenylation Specificity Factor (CPSF) and Cleavage Stimulation Factor (CstF), both of
which are multi-protein complexes, start bound to the rear of the advancing RNA polymerase II.
2. As the RNA polymerase II advances over the polyadenylation signal sequences, CPSF and CstF transfer to the
new pre-mRNA, CPSF binding to the AAUAAA sequence, and CstF to the GU- or U-rich sequence following
it.
3. CPSF and CstF promote cleavage approximately 35 nucleotides after the end of the AAUAAA sequence.
4. Immediately, polyadenylate polymerase (PAP) starts writing the polyadenosine tail. Cleavage will not occur
unless PAP is bound to the complex, eliminating the possibility of premature cleavage. Nuclear polyadenylate
binding protein (PABPN1) immediately binds to the new polyadenosine sequence.
5. CPSF dissociates, and polyadenylation by PAP continues to write an adenosine tail of approximately 100 to
250 nucleotides, depending on the organism. PABPN1 acts as a kind of molecular ruler, specifying
when polyadenylation should stop.
6. PAP dissociates, and PABPN1 remains bound. It is thought that this, along with the 5' cap, helps target the
mRNA for nuclear export.
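
The net effect of steps 1 to 6 on the RNA sequence can be mimicked with simple string operations. This is a toy sketch: the function name is invented, the default cleavage offset of 35 nt is the approximate figure quoted in step 3, the default tail length of 200 is an arbitrary value within the stated range, and returning None when no usable signal is found is a design choice of this example.

```python
def cleave_and_polyadenylate(pre_mrna: str,
                             cleavage_offset: int = 35,
                             tail_length: int = 200):
    """Cut downstream of the first AAUAAA signal and add a poly(A) tail."""
    rna = pre_mrna.upper()
    signal = rna.find("AAUAAA")          # site recognised by CPSF
    if signal == -1:
        return None                      # no polyadenylation signal
    cut = signal + len("AAUAAA") + cleavage_offset
    if cut > len(rna):
        return None                      # transcript ends before the cut site
    return rna[:cut] + "A" * tail_length  # tail written by PAP


pre = "GGC" + "AAUAAA" + "C" * 35 + "GUGUGU" + "CCCC"
mature_3_end = cleave_and_polyadenylate(pre)
print(len(pre), len(mature_3_end))  # 54 244
```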

RNA splicing:
The next step in RNA processing is the precise removal of intron sequences and joining the ends of
neighboring exons to produce a functional mRNA molecule, a process called RNA splicing. The exon-intron
boundaries are marked by specific sequences. In most cases, at the 5’ boundary between the exon and the
intron (the 5’ splice site), the intron starts with the sequence GU and at the 3’ exon-intron boundary (the 3’
splice site) the intron ends with the sequence AG. Each of these two sequences lies within a longer consensus
sequence. A polypyrimidine tract (a conserved stretch of about 11 pyrimidines) lies upstream of the AG at the
3’ splice site. A key signal sequence is the branchpoint sequence, which is located about 20-50 nt upstream of
the 3’ splice site. In vertebrates this sequence is 5’-CURAY-3’ where R = purine and Y = pyrimidine (in yeast this
sequence is 5’-UACUAAC-3’).
RNA splicing occurs in two steps:
In the first step, the 2’-OH of the A residue at the branch site attacks the 3’-5’ phosphodiester bond at the
5’ splice site causing that bond to break and the 5’ end of the intron to loop round and form an unusual 2’-5’
bond with the A residue in the branch site sequence. Because this A residue already has 3’-5’ bonds with its
neighbors in the RNA chain, the intron becomes branched at this point to form what is known as a lariat
intermediate (named as such since it resembles a cowboy’s lasso).
In the second step, the new 3’-OH end of exon 1 attacks the phosphodiester bond at the 3’ splice site causing the two exons
to join and release the intron, still as a lariat. In each of the two splicing reactions, one phosphate-ester bond
is exchanged for another (i.e. these are two transesterification reactions). Since the number of phosphate-
ester bonds is unchanged, no energy (ATP) is consumed.
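
The two transesterification steps can be followed on a sequence level with a short Python sketch. It is a bookkeeping illustration only, since a plain string cannot represent the 2’-5’ branch; the function name, the argument names and the example pre-mRNA are hypothetical. Indices are 0-based: five_ss is the G of the intron's GU, branch is the branch-point A, and three_ss is the first nucleotide of the downstream exon.

```python
def splice(pre_mrna: str, five_ss: int, branch: int, three_ss: int):
    """Return (mRNA, excised intron, step-1 intermediates) for one intron."""
    rna = pre_mrna.upper()
    if rna[five_ss:five_ss + 2] != "GU":
        raise ValueError("intron must start with GU")
    if rna[three_ss - 2:three_ss] != "AG":
        raise ValueError("intron must end with AG")
    if not (five_ss + 2 <= branch < three_ss - 2) or rna[branch] != "A":
        raise ValueError("branch point must be an A inside the intron")

    exon1 = rna[:five_ss]
    intron = rna[five_ss:three_ss]
    exon2 = rna[three_ss:]

    # Step 1: the branch A attacks the 5' splice site. Exon 1 is released
    # with a free 3'-OH; the intron (now a lariat) stays joined to exon 2.
    step1 = {
        "free_exon1": exon1,
        "lariat_intermediate": intron + exon2,
        "branch_offset_in_intron": branch - five_ss,
    }

    # Step 2: the 3'-OH of exon 1 attacks the 3' splice site, joining the
    # exons and releasing the intron, still as a lariat.
    return exon1 + exon2, intron, step1


exon1 = "AUGGCC"
intron = "GUAAGUCUAAC" + "U" * 11 + "AG"   # contains a CUAAC branch site
exon2 = "GGUUAA"
mrna, lariat, _ = splice(exon1 + intron + exon2,
                         five_ss=6, branch=15, three_ss=30)
print(mrna)  # AUGGCCGGUUAA
```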
RNA splicing requires the involvement of several small nuclear RNAs (snRNAs) each of which is associated
with several proteins to form a small nuclear ribonucleoprotein particle or snRNP. Because snRNAs are rich in
U residues, they are named U1, U2, etc. The RNA components of the snRNPs have regions that are
complementary to the 5’ and 3’ splice site sequences and to other conserved sequences in the intron and so
can base-pair with them. The U1 snRNP binds to the 5’ splice site and U2 snRNP binds to the branchpoint
sequence. A tri-snRNP complex of U4, U5 and U6 snRNPs then binds, as do other accessory proteins, so that a
multicomponent complex (called a spliceosome) is formed at the intron to be removed and causes the intron
to be looped out.

Thus, through interactions between the snRNAs and the pre-mRNA, the spliceosome brings the upstream and
downstream exons together ready for splicing. The spliceosome next catalyzes the two-step splicing reaction
to remove the intron and ligate together the two exons. The spliceosome then dissociates and the released
snRNPs can take part in further splicing reactions at other sites on the pre-mRNA.

Although the vast majority of pre-mRNA introns start with GU at the 5’ splice site and end with AG at the 3’
splice site, some introns (possibly as many as 1%) have different splice site consensus sequences. In these
cases, the intron starts with AU and ends with AC instead of GU and AG, respectively. Since RNA splicing
involves recognition of the splice site consensus sequences by key snRNPs (see above), and since these
sequences are different in the minor intron class, U1, U2, U4, U6 snRNPs do not take part in splicing these so-
called ‘AT-AC introns’ (the AT-AC refers of course to the corresponding DNA sequence). Instead, U11, U12,
U4atac and U6atac snRNPs are involved, replacing the roles of U1, U2, U4 and U6 respectively, and assemble to
form the ‘AT-AC spliceosome’. U5 snRNP is required for splicing both classes of intron. In some cases, RNA
precursor molecules are known to undergo splicing in the absence of protein; the intron excises itself.

Alternative processing:
Alternative polyadenylation sites
Certain pre-mRNAs contain more than one set of signal sequences for 3’ end cleavage and polyadenylation. In
some cases, the location of the alternative polyadenylation sites is such that, depending on the site chosen,
particular exons may be lost or retained in the subsequent splicing reactions. Here the effect is to change the
coding capacity of the final mRNA so that different proteins are produced depending on the polyadenylation
site used. In other cases, the alternative sites both lie within the 3’ noncoding region of the pre-mRNA so that
the same coding sequences are included in the final mRNA irrespective of which site is used but the 3’
noncoding region can vary. Since the 3’ noncoding sequence may contain signals to control mRNA stability,
the choice of polyadenylation site in this situation can affect the lifetime of the resulting mRNA.

Alternative splicing

Many cases are now known where different tissues splice the primary RNA transcript of a single gene by
alternative pathways, where the exons that are lost and those that are retained in the final mRNA depend
upon the pathway chosen. Presumably some tissues contain regulatory proteins that promote or suppress the
use of certain splice sites to direct the splicing pathway selected. These alternative splicing pathways are very
important since they allow cells to synthesize a range of functionally distinct proteins from the primary
transcript of a single gene.

RNA editing:
RNA editing is the name given to several reactions whereby the nucleotide sequence of an mRNA molecule
may be changed by mechanisms other than RNA splicing. Individual nucleotides within the mRNA may be
changed to other nucleotides, deleted entirely or additional nucleotides inserted. The effect of RNA editing is
to change the coding capacity of the mRNA so that it encodes a different polypeptide than that originally
encoded by the gene. An example of RNA editing in humans is apolipoprotein B mRNA. In liver, the mRNA
does not undergo editing and the protein produced after translation is called apolipoprotein B-100. In cells of
the small intestine, RNA editing causes the conversion of a single C residue in the mRNA to U and, in so doing,
changes a codon for glutamine (CAA) to a termination codon (UAA). Subsequent translation of the edited
mRNA yields the much shorter apolipoprotein B-48 (48% of the size of apolipoprotein B-100). This is not a
trivial change; apolipoprotein B-48 lacks a protein domain needed for receptor binding which apolipoprotein
B-100 possesses and hence the functional activities of the two proteins are different. Many other cases of RNA
editing are also known. Trypanosome mitochondrial mRNAs, for example, undergo extensive RNA editing
which results in over half of the uridines in the final mRNA being acquired through the editing process.
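
The effect of the C-to-U edit on the reading frame can be illustrated with a toy sequence. This is a sketch under stated assumptions: the short mRNA below is invented for illustration and is not the apolipoprotein B sequence, and the helper names are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def edit_c_to_u(mrna: str, position: int) -> str:
    """Return a copy of the mRNA with the C at `position` (0-based) changed to U."""
    if mrna[position] != "C":
        raise ValueError("editing site is not a C")
    return mrna[:position] + "U" + mrna[position + 1:]


def codons_before_stop(mrna: str) -> int:
    """Count the codons read from position 0 until the first stop codon."""
    count = 0
    for i in range(0, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            break
        count += 1
    return count


# Invented toy mRNA: AUG GCA CAA GGU UAA (CAA encodes glutamine)
unedited = "AUGGCACAAGGUUAA"
edited = edit_c_to_u(unedited, 6)  # CAA becomes UAA

print(codons_before_stop(unedited))
print(codons_before_stop(edited))
```

In the toy example the edited message is read for fewer codons than the unedited one, which mirrors how the edited apolipoprotein B mRNA yields the shorter B-48 protein.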

RIBOSOMAL RNA
Ribosomes:
Each ribosome consists of two subunits, a small subunit and a large subunit, each of which is a
multicomponent complex of ribosomal RNAs (rRNAs) and ribosomal proteins. One way of distinguishing
between particles such as ribosomes and ribosomal subunits is to place the sample in a tube within a
centrifuge rotor and spin this at very high speed. This causes the particles to sediment to the tube bottom.
Particles that differ in mass, shape and/or density sediment at different velocities (sedimentation velocities).
Thus a particle with twice the mass of another will always sediment faster provided both particles have the
same shape and density. The sedimentation velocity of any given particle is also directly proportional to the
gravitational forces (the centrifugal field) experienced during the centrifugation, which can be increased
simply by spinning the rotor at a higher speed. However, it is possible to define a sedimentation coefficient
that depends solely on the size, shape and density of the particle and is independent of the centrifugal field.
Sedimentation coefficients are usually measured in Svedberg units (S). A prokaryotic ribosome has a
sedimentation coefficient of 70S whereas the large and small subunits have sedimentation coefficients of 50S
and 30S, respectively (note that S values are not additive). The 50S subunit contains two rRNAs (23S and 5S)
complexed with 34 polypeptides whereas the 30S subunit contains 16S rRNA and 21 polypeptides. In
eukaryotes the ribosomes are larger and more complex; the ribosome monomer is 80S and consists of 60S
and 40S subunits. The 60S subunit contains three rRNAs (28S, 5.8S and 5S) and about 49 polypeptides and the
40S subunit has 18S rRNA and about 33 polypeptides.
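
The relationship between sedimentation velocity, centrifugal field and sedimentation coefficient can be sketched numerically. The example assumes the standard definition s = v / (ω²r), where ω is the angular velocity of the rotor and r the distance from the axis of rotation, with 1 S equal to 10⁻¹³ seconds; the function names, rotor speeds and radius are illustrative values, not taken from the text.

```python
import math

SVEDBERG_SECONDS = 1e-13  # 1 S = 10^-13 s


def centrifugal_field(rpm: float, radius_m: float) -> float:
    """Centrifugal field (omega^2 * r) in m/s^2 for a given rotor speed and radius."""
    omega = 2 * math.pi * rpm / 60  # angular velocity in rad/s
    return omega ** 2 * radius_m


def sedimentation_velocity(s_value: float, rpm: float, radius_m: float) -> float:
    """Velocity (m/s) of a particle with the given S value in the given field."""
    return s_value * SVEDBERG_SECONDS * centrifugal_field(rpm, radius_m)


def sedimentation_coefficient(velocity: float, rpm: float, radius_m: float) -> float:
    """Sedimentation coefficient in Svedberg units: velocity divided by field."""
    return velocity / centrifugal_field(rpm, radius_m) / SVEDBERG_SECONDS


# Doubling the rotor speed quadruples the field and hence the velocity,
# but the sedimentation coefficient recovered from it is unchanged.
slow = sedimentation_velocity(70, 50_000, 0.07)
fast = sedimentation_velocity(70, 100_000, 0.07)
print(fast / slow)
print(sedimentation_coefficient(fast, 100_000, 0.07))
```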
A wide range of studies have built up a detailed picture of the fine structure of ribosomes, mapping the
location of the various RNA and protein components and their interactions. The overall shape of a 70S
ribosome has been established through electron microscopy studies.

Transcription and processing of prokaryotic rRNA:


In E. coli there are seven rRNA transcription units scattered throughout the genome, each of which contains
one copy of each of the 23S, 16S and 5S rRNA genes and one to four copies of various tRNA genes. This gene
assembly is transcribed by the single prokaryotic RNA polymerase to yield a single 30S pre-rRNA transcript
(about 6000 nt in size). This arrangement ensures that stoichiometric amounts of the various rRNAs are
synthesized for ribosome assembly. Following transcription, the 30S pre-rRNA molecule forms internal base-
paired regions to give a series of stem-loop structures and ribosomal proteins bind to form a
ribonucleoprotein (RNP) complex. A number of the nucleotides in the folded pre-rRNA molecule are now
methylated, on the ribose moieties, using S-adenosylmethionine as the methyl donor. Next the pre-rRNA
molecule is cleaved at specific sites by RNase III to release precursors of the 23S, 16S and 5S rRNAs. The
precursors are then trimmed at their 5’ and 3’ ends by ribonucleases M5, M16 and M23 (which act on the 5S,
16S and 23S precursor RNAs respectively) to generate the mature rRNAs.

Synthesis of eukaryotic 28S, 18S and 5.8S:


In eukaryotes, the genes for 28S, 18S and 5.8S rRNA are typically clustered together and tandemly repeated,
in that one copy each of the 18S, 5.8S and 28S genes occurs, followed by untranscribed spacer DNA, then
another set of 18S, 5.8S and 28S rRNA genes, and so on. In humans, there are about 200 copies of this
rRNA transcription unit arranged as five clusters of about 40 copies on separate chromosomes. These rRNA
transcription units are transcribed by RNA polymerase I (RNA Pol I) in a region of the nucleus known as the
nucleolus. The nucleolus contains loops of DNA extending from each of the rRNA gene clusters on the various
chromosomes and hence each cluster is called a nucleolar organizer.
►The rRNA promoter consists of a core element which straddles the transcriptional start site (designated as
position +1) from residues -31 to +6 plus an upstream control element (UCE) about 50-80 bp in size and
located about 100 bp upstream from the start site (i.e. at position -100). A transcription factor called
upstream binding factor (UBF) binds both to the UCE and to a region next to and overlapping with the
core element. Interestingly, TATA box binding protein (TBP) also binds to the rRNA promoter (in fact, TBP is
required for initiation by all three eukaryotic RNA polymerases). The UBF and TBP transcription factors
interact with each other and with RNA Pol I to form a transcription initiation complex. The RNA Pol I then
transcribes the whole transcription unit of 28S, 18S and 5.8S genes to synthesize a single large pre-rRNA
molecule.
►In humans, the product of transcription is a 45S pre-rRNA, which has non-rRNA external transcribed
spacers (ETSs) at the 5’ and 3’ ends and non-rRNA internal transcribed spacers (ITSs) internally separating the
rRNA sequences. This 45S molecule is processed in a similar pattern to that observed in prokaryotes for pre-
rRNA, i.e. the pre-rRNA folds up to form a defined secondary structure with stem-loops, ribosomal proteins
bind to selected sequences, and methylation of ribose moieties occurs (at over 100 nucleotides). The 45S pre-
rRNA molecule is then cleaved by ribonucleases, first in the ETSs and then in the ITSs, to release precursor
rRNAs which are cleaved further and trimmed by other ribonucleases to release the mature 28S, 18S and 5.8S
rRNAs.

In eukaryotes, selection of the sites in pre-rRNA that will be methylated depends upon small RNAs found in
the nucleolus called small nucleolar RNAs (snoRNAs) that exist in ribonucleoprotein complexes called
snoRNPs. The snoRNAs contain long regions (10-21 nt) that are complementary to specific regions of the
pre-rRNA molecule, form base pairs with the pre-rRNA at these sites and then guide where methylation of
specific ribose residues (2’-O-methylation) will occur. A number of pseudouridine (Ψ) residues are also
produced during processing of eukaryotic pre-rRNA and again snoRNAs are involved in guiding this event.
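
The way a snoRNA guide region picks out its target by base pairing can be sketched as a search for the antiparallel complement of the guide within the pre-rRNA. This is a simplified illustration: the sequences and function names are invented, only Watson-Crick pairs are considered (no G-U pairs or mismatches), and the rule that fixes which nucleotide within the duplex is modified is not modelled.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the antiparallel Watson-Crick complement of an RNA sequence."""
    return rna.translate(COMPLEMENT)[::-1]


def find_guide_targets(guide_region: str, pre_rrna: str) -> list[int]:
    """Return 0-based start positions in pre_rrna that can pair with the guide."""
    target = reverse_complement(guide_region)
    hits = []
    start = pre_rrna.find(target)
    while start != -1:
        hits.append(start)
        start = pre_rrna.find(target, start + 1)
    return hits


# Invented 10-nt guide region and toy pre-rRNA fragment
guide = "GGAUCCAUGC"
pre_rrna = "AAAA" + "GCAUGGAUCC" + "UUUU"
print(find_guide_targets(guide, pre_rrna))
```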

Synthesis of eukaryotic 5S rRNA:

In eukaryotes, the 5S rRNA gene is also present in multiple copies (2000 in human cells, all clustered together
at one chromosomal site). Unlike other eukaryotic rRNA genes, the 5S rRNA genes are transcribed by RNA
polymerase III (RNA Pol III). The promoters of tRNA genes, which are also transcribed by RNA Pol III, contain
control elements called the A box and B box located downstream of the transcriptional start site.
A similar situation exists for 5S rRNA genes in that the promoter has two control elements located
downstream of the transcriptional start site, an A box and a C box. The C box binds transcription factor IIIA
(TFIIIA) which then in turn interacts with TFIIIC to cause it to bind, a process which probably also involves
recognition of the A box. Once TFIIIC has bound, TFIIIB binds and interacts with RNA Pol III, causing that to
bind also to form the transcription initiation complex. One of the three subunits of TFIIIB is TATA box binding
protein (TBP), the transcription factor required for transcription by all three eukaryotic RNA polymerases.

Following transcription, the 5S rRNA transcript requires no processing. It migrates to the nucleolus and is
recruited into ribosome assembly.

tRNA structure:
Transfer RNA (tRNA) molecules play an important role in protein synthesis. Each tRNA becomes covalently
bonded to a specific amino acid to form aminoacyl-tRNA, which recognizes the corresponding codon in mRNA
and ensures that the correct amino acid is added to the growing polypeptide chain. The tRNAs are small
molecules, only 74-95 nt long, which form distinctive cloverleaf secondary structures by internal base-pairing.

The stem-loops of the cloverleaf are known as arms:


1. the anticodon arm contains in its loop the three nucleotides of the anticodon, which will form base pairs
with the complementary codon in mRNA during translation;
2. the D or DHU arm (with its D loop) contains dihydrouracil, an unusual pyrimidine;
3. the T or TψC arm (with its T loop) contains another unusual base, pseudouracil (denoted ψ in the
sequence TψC);
4. some tRNAs also have a variable arm (optional arm) which is 3-21 nt in size.
The other notable feature is the amino acid acceptor stem. This is where the amino acid becomes attached, at
the 3’ OH group of the 3’-CCA sequence. The three dimensional structure of tRNA is even more complex
because of additional interactions between the various units of secondary structure.
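
The antiparallel pairing between the anticodon and its codon can be illustrated with a short sketch. It assumes strict Watson-Crick pairing at all three positions, so wobble pairing and modified anticodon bases such as inosine are deliberately ignored; the function name is illustrative.

```python
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon: str) -> str:
    """Return the anticodon (written 5' to 3') that pairs with a codon (5' to 3').

    Because the two strands pair in antiparallel orientation, the anticodon
    is the complement of the codon read in reverse.
    """
    codon = codon.upper()
    if len(codon) != 3 or any(base not in WATSON_CRICK for base in codon):
        raise ValueError("codon must be three RNA nucleotides")
    return "".join(WATSON_CRICK[base] for base in reversed(codon))


print(anticodon_for("AUG"))  # methionine codon
print(anticodon_for("UUC"))  # phenylalanine codon
```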

Transcription and processing of tRNA in prokaryotes:


The rRNA transcription units in E. coli contain some tRNA genes that are transcribed and processed at the time
of rRNA transcription. Other tRNA genes occur in clusters of up to seven tRNA sequences separated by spacer
regions. Following transcription by the single prokaryotic RNA polymerase, the primary RNA transcript folds
up into the characteristic stem-loop structures and is then processed in an ordered series of cleavages by
ribonucleases (RNases), which release and trim the tRNAs to their final lengths. The cleavage and trimming
reactions at the 5’ and 3’ ends of the precursor tRNAs involve RNases D, E, F and P working in the sequence
shown in fig. RNases E, F and P are endonucleases, cutting the RNA internally, while RNase D is an
exonuclease, trimming the ends of the tRNA molecules.

Transcription and processing of tRNA in eukaryotes:
In eukaryotes, the tRNA genes exist as multiple copies and are transcribed by RNA polymerase III (RNA Pol
III). As in prokaryotes, several tRNAs may be transcribed together to yield a single pre-tRNA molecule that is
then processed to release the mature tRNAs. The promoters of eukaryotic tRNA genes are unusual in that the
transcriptional control elements are located downstream (i.e. on the 3’ side) of the transcriptional start site
(at position +1). In fact they lie within the gene itself. Two such elements have been identified, called the A
box and B box. Transcription of the tRNA genes by RNA Pol III requires transcription factor IIIC (TFIIIC) as well
as TFIIIB. TFIIIC binds to the A and B boxes whilst TFIIIB binds upstream of the A box.
TFIIIB contains three subunits, one of which is TBP (TATA binding protein), the polypeptide required by all
three eukaryotic RNA polymerases.

After synthesis, the pre-tRNA molecule folds up into the characteristic stem-loop structures and non-tRNA
sequence is cleaved from the 5’ and 3’ ends by ribonucleases. In prokaryotes, the CCA sequence at the 3’ end
of the tRNA (which is the site of bonding to the amino acid) is encoded by the tRNA gene but this is not the
case in eukaryotes. Instead, the CCA is added to the 3’ end after the trimming reactions by tRNA nucleotidyl
transferase. Another difference between prokaryotes and eukaryotes is that eukaryotic pre-tRNA molecules
often contain a short intron in the loop of the anticodon arm. This intron must be removed in order to create a
functional tRNA molecule. Its removal occurs by cleavage by an endonuclease at each end of the intron and
then ligation together of the tRNA ends. This RNA splicing pathway for intron removal is totally different from
that used to remove introns from pre-mRNA molecules in eukaryotes and must have evolved independently.
Modification of tRNA:
Transfer RNA molecules are notable for containing unusual nucleotides such as 1-methylguanosine (m1G),
pseudouridine (ψ), dihydrouridine (D), inosine (I) and 4-thiouridine (s4U). These are created by modification
of standard nucleosides after tRNA synthesis. For example, inosine is generated by deamination of
adenosine.
