Genome Organization in Eukaryotes
The picture that emerges is that the genetic organization of the yeast genome is much more
economical than that of the human version. The genes themselves are more compact, having
fewer introns, and the spaces between the genes are relatively short, with much less space
taken up by genome-wide repeats and other non-coding sequences.
The hypothesis that more complex organisms have less compact genomes holds when other
species are examined. Consider a fragment of the fruit-fly genome. If we agree that a fruit fly is more
complex than a yeast cell but less complex than a human then we would expect the
organization of the fruit-fly genome to be intermediate between that of yeast and humans.
The gene density in the fruit-fly genome is intermediate between that of yeast and humans,
and the average fruit-fly gene has many more introns than the average yeast gene but still
only about a third as many as the average human gene.
It is increasingly apparent that the genome-wide repeats play an intriguing role in
dictating the compactness or otherwise of a genome. This is strikingly illustrated by the
maize genome, which at 5000 Mb is larger than the human genome but still relatively small
for a flowering plant. Only a few limited regions of the maize genome have been sequenced,
but some remarkable results have been obtained, revealing a genome dominated by repetitive
elements. In one sequenced 50-kb segment, the only gene is one member of a family of genes coding for the
alcohol dehydrogenase enzymes. Instead of genes, the dominant feature of this genome
segment is the genome-wide repeats. The majority of these are of the LTR element type,
which comprise virtually all of the non-coding part of the segment, and on their own are
estimated to make up approximately 50% of the maize genome. It is becoming clear that one
or more families of genome-wide repeats have undergone a massive proliferation in the
genomes of certain species. This may provide an explanation for the most puzzling aspect of
the C-value paradox, which is not the general increase in genome size that is seen in
increasingly complex organisms, but the fact that similar organisms can differ greatly in
genome size. A good example is provided by Amoeba dubia which, being a protozoan, might
be expected to have a genome of 100-500 Mb, similar to other protozoa such as Tetrahymena
pyriformis. In fact the Amoeba genome is over 200,000 Mb. Similarly, we might guess that
cricket genomes are similar in size to those of other insects, but they are
approximately 2000 Mb, 11 times that of the fruit fly.
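The size comparisons in this paragraph are simple ratios. The sketch below uses only the figures quoted in the text. The fruit-fly value is back-calculated from the "11 times" statement, so it is an inference rather than a quoted measurement, and the Amoeba figure is a lower bound.

```python
# Genome sizes in Mb, as quoted in the text.
GENOME_MB = {
    "maize": 5000,
    "cricket": 2000,
    "Amoeba dubia": 200_000,  # text says "over 200,000 Mb", so a lower bound
}


def fold_difference(larger_mb, smaller_mb):
    """How many times larger one genome is than another."""
    return larger_mb / smaller_mb


# Fruit-fly genome size implied by "11 times that of the fruit fly".
implied_fruit_fly_mb = GENOME_MB["cricket"] / 11

amoeba_vs_cricket = fold_difference(GENOME_MB["Amoeba dubia"], GENOME_MB["cricket"])

print(f"Implied fruit-fly genome: about {implied_fruit_fly_mb:.0f} Mb")
print(f"Amoeba dubia vs cricket: at least {amoeba_vs_cricket:.0f}-fold larger")
```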
Nuclear genome:
DNA in the nucleus exists mainly in combination with histone proteins; the
DNA–histone complex is called “chromatin”. Chromatin can undergo changes in its structure in
response to various cellular metabolic demands. Chromatin can be envisioned as a repeat of
structural units called “nucleosomes”. The nucleosome core particle is composed of histone
octamer plus the DNA that wraps around it. The histone octamer contains two molecules
each of histones H2A, H2B, H3, and H4. DNA wraps around the octamer in a left-handed
supercoil of about 1.75 turns, enclosing about 150 bp. Histone H1 is a linker histone
that, along with linker DNA (the DNA in between two nucleosome core particles), physically
connects the adjacent nucleosome core particles. The length of linker DNA varies with
species and cell types. Usually, a nucleosome core particle together with the linker DNA on both sides of the
core encompasses between 180 and 200 bp of DNA. Between the nucleosome unit structure
and the metaphase chromosome structure containing two chromatids, there are several levels
of organization and compaction of the chromatin. Each nucleosome has a diameter of 10 nm;
the nucleosomes are compacted into a 30-nm solenoid fiber (the 30-nm fiber);
the 30-nm solenoid fibers are compacted into a 300-nm filament; and finally, the 300-nm
filaments are further compacted into a 700-nm chromosome. During cell division, when the
chromosomes duplicate, a 1,400-nm metaphase chromosome is produced containing two
chromatids, each chromatid being 700 nm.
Figure: From nucleosome to chromosome.
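The packing figures above lend themselves to a quick calculation. The sketch below estimates how many nucleosomes a DNA molecule would carry, assuming one nucleosome per 180 to 200 bp as stated in the text, and lists the fold increase in diameter between successive packing levels. The 100 Mb chromosome length is a hypothetical value chosen only for illustration.

```python
# Diameters (nm) of successive chromatin packing levels, as listed in the text.
PACKING_LEVELS_NM = [
    ("nucleosome", 10),
    ("solenoid fiber", 30),
    ("filament", 300),
    ("chromatid", 700),
    ("metaphase chromosome", 1400),
]


def nucleosome_count(dna_bp, repeat_bp=200):
    """Approximate number of nucleosomes; repeat_bp is core plus linker DNA."""
    return dna_bp // repeat_bp


def fold_increases(levels):
    """Fold increase in diameter between each consecutive pair of levels."""
    return [
        (f"{a_name} -> {b_name}", b_nm / a_nm)
        for (a_name, a_nm), (b_name, b_nm) in zip(levels, levels[1:])
    ]


# Hypothetical 100 Mb chromosome, used only as an example.
example_bp = 100_000_000
print(nucleosome_count(example_bp, repeat_bp=200))  # longest repeat in the text
print(nucleosome_count(example_bp, repeat_bp=180))  # shortest repeat in the text

for step, fold in fold_increases(PACKING_LEVELS_NM):
    print(f"{step}: {fold:.1f}x wider")
```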
The 30 nm fiber is probably the major type of chromatin in the nucleus during
interphase, the period between nuclear divisions. When the nucleus divides, the DNA adopts
a more compact form of packaging, resulting in the highly condensed metaphase
chromosomes that can be seen with the light microscope and which have the appearance
generally associated with the word 'chromosome'. The metaphase chromosomes form at a
stage in the cell cycle after DNA replication has taken place and so each one contains two
copies of its chromosomal DNA molecule. The two copies are held together at the
centromere, which has a specific position within each chromosome. Individual chromosomes
can therefore be recognized because of their size and the location of the centromere relative
to the two ends. Further distinguishing features are revealed when chromosomes are stained.
There are a number of different staining techniques, each resulting in a banding pattern that is
characteristic for a particular chromosome. This means that the set of chromosomes
possessed by an organism can be represented as a karyogram, in which the banded
appearance of each one is depicted.
An important part of the chromosome is the terminal region or telomere. Telomeres
are important because they mark the ends of chromosomes and therefore enable the cell to
distinguish a real end from an unnatural end caused by chromosome breakage – an essential
requirement because the cell must repair the latter but not the former. Telomeric DNA is
made up of hundreds of copies of a repeated motif, 5′-TTAGGG-3′ in humans, with a short
extension of the 3′ terminus of the double-stranded DNA molecule.
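Because telomeric DNA is a simple tandem repeat, it can be recognized by counting back-to-back copies of the motif. The following is a minimal sketch using a regular expression; the function name and the toy sequence are invented for illustration, and real telomere analysis would also need to handle sequencing errors and variant repeats.

```python
import re

TELOMERE_MOTIF = "TTAGGG"  # human telomeric repeat, read 5' to 3'


def longest_tandem_run(sequence, motif=TELOMERE_MOTIF):
    """Return the largest number of back-to-back copies of motif in sequence."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [
        len(match.group(0)) // len(motif)
        for match in pattern.finditer(sequence.upper())
    ]
    return max(runs, default=0)


# Toy sequence: arbitrary flanking bases, five telomeric repeats, a partial repeat.
toy = "ACGTACGT" + "TTAGGG" * 5 + "TTAG"
print(longest_tandem_run(toy))
```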
Functional DNA content of genome: This comprises the coding and non-coding content of genes and
makes up 25% of the nuclear genome. Our earlier comparison of genome
fragments from different organisms makes one thing clear: genes are not arranged in
a regular pattern but are distributed unevenly throughout the genome. Two
lines of evidence support this, one of which relates to the banding patterns that are produced when
chromosomes are stained. The dyes used in these procedures bind to DNA molecules, but in
most cases with preferences for certain base pairs. Giemsa, for example, has a greater affinity
for DNA regions that are rich in A and T nucleotides. The dark G-bands in the human
karyogram are therefore thought to be AT-rich regions of the genome. The base composition
of the genome as a whole is 59.7% A + T so the dark G-bands must have AT contents
substantially greater than 60%. Cytogeneticists therefore predicted that there would be fewer
genes in dark G-bands because genes generally have AT contents of 45-50%. This prediction
was confirmed when the draft genome sequence was compared with the human karyogram.
The second line of evidence pointing to uneven gene distribution derived from the isochore
model of genome organization. According to this model, the genomes of vertebrates and
plants (and possibly of other eukaryotes) are mosaics of segments of DNA, each at least 300
kb in length, with each segment having a uniform base composition that differs from that of
the adjacent segments. Support for the isochore model comes from experiments in which
genomic DNA is broken into fragments of approximately 100 kb, treated with dyes that bind
specifically to AT- or GC-rich regions, and the pieces separated by density gradient
centrifugation. When this experiment is carried out with human DNA, five fractions are seen,
each representing a different isochore type with a distinctive base composition: two AT-rich
isochores, called L1 and L2, and three GC-rich classes: H1, H2 and H3. The last of these, H3,
is the least abundant in the human genome, making up only 3% of the total, but contains over
25% of the genes. This is a clear indication that genes are not distributed evenly through the
human genome.
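Both lines of evidence rest on simple calculations: the A + T fraction of a stretch of DNA, and the share of genes in a region compared with its share of the genome. The sketch below illustrates both using the percentages quoted in the text. The function names and the toy sequence are invented, and since H3 contains "over 25%" of the genes, the enrichment it reports is a lower bound.

```python
GENOME_AT_FRACTION = 0.597  # genome-wide A + T content quoted in the text


def at_content(sequence):
    """Fraction of A and T among the A, C, G and T bases of a DNA sequence."""
    seq = sequence.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    total = at + gc
    return at / total if total else 0.0


def is_at_rich(sequence, threshold=GENOME_AT_FRACTION):
    """True if the sequence has a higher A + T fraction than the genome average."""
    return at_content(sequence) > threshold


def gene_enrichment(gene_fraction, genome_fraction):
    """Gene share of a region divided by its share of the genome."""
    return gene_fraction / genome_fraction


# H3 isochores: 3% of the genome but over 25% of the genes.
h3_enrichment = gene_enrichment(0.25, 0.03)
print(f"H3 holds at least {h3_enrichment:.1f} times its proportional share of genes")

# Invented toy sequence, used only to exercise the functions.
print(is_at_rich("ATTATAATGCATTTAAT"))
```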
The genes present in an organism can be classified in two ways: the first is
based on the function of each gene, and the second on the particular domains of the
protein that the gene codes for. The second approach is more informative because it
shows that a particular genome can specify a number of protein domains that are absent from the
genomes of other organisms. In vertebrates, these domains include several involved in activities such as
cell adhesion, electric couplings, and growth of nerve cells. These functions are interesting
because they are ones that we look on as conferring the distinctive features of vertebrates
compared with other types of eukaryote.
Since the earliest days of DNA sequencing it has been known that multigene families -
groups of genes of identical or similar sequence - are common features of many genomes.
The rRNA genes are examples of 'simple' or 'classical' multigene families, in which all the
members have identical or nearly identical sequences. These families are believed to have
arisen by gene duplication, with the sequences of the individual members kept identical by an
evolutionary process. Other multigene families, more common in higher eukaryotes than in
lower eukaryotes, are called 'complex' because the individual members, although similar in
sequence, are sufficiently different for the gene products to have distinctive properties. One
of the best examples of this type of multigene family is the mammalian globin genes. The
globins are the blood proteins that combine to make hemoglobin, each molecule of
hemoglobin being made up of two α-type and two β-type globins. Why are the members of
the globin gene families so different from one another? The answer was revealed when the
expression patterns of the individual genes were studied. It was discovered that the genes are
expressed at different stages in human development: for example, in the β-type cluster ε is
expressed in the early embryo, Gγ and Aγ (whose protein products differ by just one amino
acid) in the fetus, and δ and β in the adult. The different biochemical properties of the
resulting globin proteins are thought to reflect slight changes in the physiological role that
hemoglobin plays during the course of human development.
In some multigene families, the individual members are clustered, as with the globin
genes, but in others the genes are dispersed around the genome. An example of a dispersed
family is the five human genes for aldolase, an enzyme involved in energy generation, which
are located on chromosomes 3, 9, 10, 16 and 17. The important point is that, even though
dispersed, the members of the multigene family have sequence similarities that point to a
common evolutionary origin.
The Repetitive DNA Content of Genomes
Repetitive DNA is found in all organisms, and in some, including humans, it
makes up a substantial fraction of the entire genome. There are various types of repetitive
DNA, and several classification systems have been devised. The scheme that we will use
begins by dividing the repeats into those that are clustered into tandem arrays and those that
are dispersed around the genome.
a) Tandemly repeated DNA: Tandemly repeated DNA is a common feature of eukaryotic
genomes but is found much less frequently in prokaryotes. This type of repeat is also called
satellite DNA because DNA fragments containing tandemly repeated sequences form
'satellite' bands when genomic DNA is fractionated by density gradient centrifugation. The
satellite bands contain fragments of repetitive DNA, and hence have GC contents and
buoyant densities that are atypical of the genome as a whole. The satellite bands in density
gradients of eukaryotic DNA are made up of fragments composed of long series of tandem
repeats, possibly hundreds of kb in length. A single genome can contain several different
types of satellite DNA, each with a different repeat unit, these units being anything from < 5
to > 200 bp. The three satellite bands in human DNA include at least four different repeat
types.
One type of human satellite DNA is the alphoid DNA repeats found in the centromere
regions of chromosomes. Although some satellite DNA is scattered around the genome, most
is located in the centromeres, where it may play a structural role, possibly as binding sites for
one or more of the special centromeric proteins. Alternatively, the repetitive DNA content of
the centromere might be a reflection of the fact that this is the last region of the chromosome
to be replicated. In order to delay its replication until the very end of the cell cycle, the
centromere DNA must lack sequences that can act as origins of replication. The repetitive
nature of centromeric DNA may be a means of ensuring that such origins are absent.
Although not appearing in satellite bands on density gradients, two other types of tandemly
repeated DNA are also classed as 'satellite' DNA. These are minisatellites and
microsatellites. Minisatellites form clusters up to 20 kb in length, with repeat units up to 25
bp; microsatellite clusters are shorter, usually < 150 bp, and the repeat unit is usually 13 bp or
less. We have already met one type of minisatellite DNA: telomeric DNA. In addition to
telomeric minisatellites, some eukaryotic genomes contain various other clusters of
minisatellite DNA, many, although not all, near the ends of chromosomes. The functions of
these other minisatellite sequences have not been identified. The function of microsatellites is
equally mysterious. The typical microsatellite consists of a 1-, 2-, 3- or 4-bp unit repeated 10 to
20 times, as illustrated by the microsatellites in the human β T-cell receptor locus. Although
each microsatellite is relatively short, there are many of them in the genome. In humans, for
example, microsatellites with a CA repeat make up 0.25% of the genome, 8 Mb in all.
Single base-pair repeats such as (A)15 make up another 0.15%.
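The description of a microsatellite as a 1- to 4-bp unit repeated many times in tandem translates directly into a pattern search. The sketch below is one possible way to scan a sequence, assuming a threshold of at least 10 copies; the function name, the threshold default and the toy sequence are all illustrative choices, not part of any standard tool.

```python
import re


def find_microsatellites(sequence, min_repeats=10, max_unit=4):
    """Find tandem repeats of a 1..max_unit bp unit with at least min_repeats copies.

    Returns a list of (start_index, repeat_unit, copy_number) tuples.
    """
    # The lazy quantifier makes the engine try the shortest repeat unit first.
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# Toy sequence containing a (CA)12 array and an (A)15 array.
toy = "GGTC" + "CA" * 12 + "TTG" + "A" * 15 + "GC"
for start, unit, copies in find_microsatellites(toy):
    print(f"position {start}: ({unit}){copies}")
```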
Although their function, if any, is unknown, microsatellites have proved very useful to
geneticists. Many microsatellites are variable, meaning that the number of repeat units in the
array is different in different members of a species. This is because 'slippage' sometimes
occurs when a microsatellite is copied during DNA replication, leading to insertion or, less
frequently, deletion of one or more of the repeat units. No two individuals have exactly the
same combination of microsatellite length variants: if enough microsatellites are examined
then a unique genetic profile can be established for every individual. The only exceptions are
genetically identical twins. Genetic profiling is well known as a tool in forensic science, but
identification of criminals is a fairly trivial application of microsatellite variability. More
sophisticated methodology makes use of the fact that a person's genetic profile is inherited
partly from the mother and partly from the father. This means that microsatellites can be used
to establish kinship relationships and population affinities, not only for humans but also for
other animals, and for plants.
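The logic of profiling and kinship testing can be sketched in a few lines. Here a profile records, for each microsatellite locus, the two repeat counts an individual carries, one inherited from each parent. The locus names and all repeat counts are invented, and the check ignores new slippage mutations, which a real analysis would have to allow for.

```python
def profiles_match(a, b):
    """True if two profiles carry the same alleles at every locus."""
    return a.keys() == b.keys() and all(
        sorted(a[locus]) == sorted(b[locus]) for locus in a
    )


def consistent_with_parents(child, mother, father):
    """True if, at every locus, one child allele could come from each parent."""
    for locus, (x, y) in child.items():
        m, f = mother[locus], father[locus]
        if not ((x in m and y in f) or (y in m and x in f)):
            return False
    return True


# Hypothetical profiles: locus name -> (repeat count, repeat count).
mother = {"locus_1": (12, 15), "locus_2": (9, 9), "locus_3": (20, 22)}
father = {"locus_1": (13, 15), "locus_2": (8, 11), "locus_3": (18, 22)}
child = {"locus_1": (12, 13), "locus_2": (9, 11), "locus_3": (22, 22)}
unrelated = {"locus_1": (10, 17), "locus_2": (9, 11), "locus_3": (20, 22)}

print(consistent_with_parents(child, mother, father))
print(consistent_with_parents(unrelated, mother, father))
print(profiles_match(child, unrelated))
```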
b) Interspersed genome-wide repeats: Tandemly repeated DNA sequences are thought to
have arisen either by replication slippage, as described for microsatellites, or by DNA
recombination processes. Both of these events are likely to result in a series of linked repeats,
rather than individual repeat units scattered around the genome. Interspersed repeats must
therefore have arisen by a different mechanism, one that can result in a copy of a repeat unit
appearing in the genome at a position distant from the location of the original sequence. The
most frequent way in which this occurs is by transposition, and most interspersed repeats
have inherent transpositional activity.
There are two alternative modes of transposition, one that involves an RNA
intermediate and one that does not. The version that involves an RNA intermediate is called
retrotransposition. The basic mechanism involves three steps:
1. An RNA copy of the transposon is synthesized by the
normal process of transcription.
2. The RNA transcript is copied into DNA. This
conversion of RNA to DNA, the reverse of the normal
transcription process, requires a special enzyme called
reverse transcriptase. Often the reverse transcriptase is
coded by a gene within the transposon and is translated
from the RNA copy synthesized in step 1.
3. The DNA copy of the transposon integrates into the
genome, possibly back into the same chromosome
occupied by the original unit, or possibly into a different
chromosome.
The end result is that there are now two copies of the
transposon, at different points in the genome.
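The three steps above amount to a copy-and-paste operation, which can be mimicked on a string. This is a deliberately simplified sketch: it treats the genome as a single strand, ignores base-pairing and target-site details, and uses an invented function name and toy sequence.

```python
def retrotranspose(genome, start, end, insert_at):
    """Copy genome[start:end] through an RNA intermediate and insert the copy."""
    if start < insert_at < end:
        raise ValueError("insertion site lies inside the original element")
    element = genome[start:end]
    # Step 1: transcription, written here as a same-strand copy with T -> U.
    rna = element.translate(str.maketrans("T", "U"))
    # Step 2: reverse transcription back into DNA (U -> T).
    dna_copy = rna.translate(str.maketrans("U", "T"))
    # Step 3: integration of the DNA copy at a new position.
    return genome[:insert_at] + dna_copy + genome[insert_at:]


# Toy genome with one 7-bp "transposon" (GATTACA) at positions 4 to 10.
genome = "AAAA" + "GATTACA" + "CCCC" + "GGGG"
new_genome = retrotranspose(genome, start=4, end=11, insert_at=15)
print(new_genome)
print(new_genome.count("GATTACA"))
```

The original element stays in place and the genome grows by the length of the element, which is why repeated rounds of retrotransposition can inflate genome size.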
RNA transposons or retroelements are features of eukaryotic genomes but have not so far
been discovered in prokaryotes.
Endogenous retroviruses (ERVs) are retroviral genomes integrated into vertebrate
chromosomes. Some are still active and might, at some stage in a cell's lifetime, direct
synthesis of exogenous viruses, but most are decayed relics that no longer have the capacity
to form viruses. These inactive sequences are genome-wide repeats but they are not capable of
additional proliferation.
Retrotransposons have sequences similar to ERVs but are features of nonvertebrate
eukaryotic genomes (i.e. plants, fungi, invertebrates and microbial eukaryotes) rather than
vertebrates. Retrotransposons have very high copy numbers in some genomes, with many
different types present. There are two types of retrotransposon: the Ty3/gypsy family (Ty3 and
gypsy are examples of this class in yeast and fruit fly, respectively), whose members possess
the same set of genes as an ERV, and the Ty1/copia family, members of which lack the env
gene. Both types are able to transpose but the absence of the env gene means that the
Ty1/copia group cannot form infectious virus particles. In fact, despite the presence of env in
the Ty3/gypsy genome, it has only recently been recognized that some of these elements can
form viruses and hence should be looked upon as non-vertebrate retroviruses. Although
technically they are interspersed elements, retrotransposons are sometimes found in clusters
in a genome sequence as a result of the presence of preferred integration sites for transposing
elements.
The three types of retroelement described so far are LTR elements, as they have long terminal
repeats at either end which play a role in the transposition process. Other retroelements do not
have LTRs. These are called retroposons and in mammals include the following:
• LINEs (long interspersed nuclear elements) contain a reverse-transcriptase-like gene
probably involved in the retrotransposition process. An example is the human element LINE-
1, which is 6.1 kb and has a copy number of 516,000 in the human genome. A LINE contains
a pol II promoter and two open reading frames (ORFs), one encoding the endonuclease and
the other encoding the reverse transcriptase. LINE activity proceeds as follows: RNA pol II
transcribes the LINE DNA into LINE RNA; the LINE RNA is translated into proteins; the
proteins and RNA join together and reenter the nucleus; the endonuclease cuts a strand of the
target genomic DNA, often in the intron of a gene; the reverse transcriptase copies the LINE
RNA into LINE DNA which is inserted into the target DNA forming a new LINE element
there. Three distantly related LINE families are found in the human genome: LINE1, LINE2,
and LINE3. Only LINE1 (L1) is still active.
• SINEs (short interspersed nuclear elements) do not have a reverse transcriptase gene but
can still transpose, probably by 'borrowing' reverse transcriptase enzymes that have been
synthesized by other retroelements. SINEs are short sequences (about 100–400 bp) and they
contain an internal pol III promoter but do not encode any proteins. All currently known
SINEs are derived from tRNA and 7SL RNA genes. Most nonautonomous SINEs share the 3′
end with a resident LINE. The only active SINE in the human genome is the Alu element,
which is the major SINE constituting about 11% of the genome (~1 million Alu elements).
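The figures quoted for Alu can be cross-checked with a little arithmetic. The sketch below assumes a human genome size that is not stated directly in these notes but is implied by the earlier statement that CA microsatellites make up 0.25% of the genome, 8 Mb in all. The result is an average only, since individual elements vary in length.

```python
# Figures quoted in the text.
CA_REPEAT_MB = 8             # total length of CA microsatellites
CA_REPEAT_FRACTION = 0.0025  # 0.25% of the genome
ALU_FRACTION = 0.11          # about 11% of the genome
ALU_COPIES = 1_000_000       # about 1 million Alu elements

# Assumption: genome size inferred from the CA-repeat figures, not quoted directly.
genome_mb = CA_REPEAT_MB / CA_REPEAT_FRACTION

# Average Alu length in bp implied by the genome fraction and copy number.
mean_alu_bp = ALU_FRACTION * genome_mb * 1_000_000 / ALU_COPIES

print(f"Implied genome size: about {genome_mb:.0f} Mb")
print(f"Implied mean Alu length: about {mean_alu_bp:.0f} bp")
```

The implied mean falls inside the 100 to 400 bp range given for SINEs, so the quoted percentage and copy number are mutually consistent.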
Not all transposons require an RNA intermediate. Many are able to transpose in a
more direct DNA to DNA manner. In eukaryotes, DNA transposons are less common than
retrotransposons, but they have a special place in genetics because a family of plant DNA
transposons - the Ac/Ds elements of maize - were the first transposable elements to be
discovered, by Barbara McClintock in the 1950s. DNA transposons are a much more
important component of prokaryotic genome anatomies than the RNA transposons. The
insertion sequences, IS1 and IS186, are examples of DNA transposons, and a single E. coli
genome may contain as many as 20 of these of various types. Other kinds of DNA transposon
known in E. coli, and fairly typical of prokaryotes in general, include composite
transposons and Tn3-type transposons.
By
Dr Subhash Jakhesara