Genome Organization in Eukaryotes
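Species                                     Genome size (Mb)
Fungi
  Saccharomyces cerevisiae                  12.1
  Aspergillus nidulans                      25.4
Protozoa
  Tetrahymena pyriformis                    190
Invertebrates
  Caenorhabditis elegans                    97
  Drosophila melanogaster                   180
  (species not named)                       490
  (species not named)                       845
  (species not named)                       5000
Vertebrates
  Takifugu rubripes (pufferfish)            400
  Homo sapiens                              3200
  (species not named)                       3300
Plants
  Arabidopsis thaliana (thale cress)        125
  (species not named)                       430
  (species not named)                       2500
  (species not named)                       4800
  (species not named)                       16 000
  (species not named)                       120 000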
Genome size coincides to a certain extent with the complexity of the organism: the simplest
eukaryotes, such as fungi, have the smallest genomes, and higher eukaryotes, such as
vertebrates and flowering plants, have the largest. This might appear to make sense, as one
would expect the complexity of an organism to be related to the number of genes in its
genome, so that higher eukaryotes need larger genomes to accommodate the extra genes.
However, the correlation is far from precise. If it were, the nuclear genome of the yeast S.
cerevisiae, which at 12 Mb is 0.004 times the size of the human nuclear genome, would be
expected to contain 0.004 × 35 000 genes, which is just 140. In fact the S. cerevisiae genome
contains about 5800 genes.
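The proportionality argument can be checked with a few lines of Python. This is a minimal sketch that uses only the figures quoted in the text; the function and variable names are illustrative. The unrounded size ratio (12/3200 = 0.00375) predicts about 131 genes, whereas the text rounds the ratio to 0.004 and so arrives at 140.

```python
# Figures quoted in the text; all names here are illustrative.
YEAST_GENOME_MB = 12.0
HUMAN_GENOME_MB = 3200.0
HUMAN_GENES = 35_000      # human gene estimate used in the text
YEAST_GENES = 5_800       # approximate observed yeast gene count


def expected_genes(genome_mb, ref_genome_mb, ref_genes):
    """Gene count predicted if gene number scaled linearly with genome size."""
    return ref_genes * genome_mb / ref_genome_mb


def genes_per_mb(genes, genome_mb):
    """Average gene density in genes per megabase."""
    return genes / genome_mb


predicted = expected_genes(YEAST_GENOME_MB, HUMAN_GENOME_MB, HUMAN_GENES)
print(f"Predicted yeast genes: {predicted:.0f}")
print(f"Observed / predicted:  {YEAST_GENES / predicted:.0f}-fold")
print(f"Yeast density: {genes_per_mb(YEAST_GENES, YEAST_GENOME_MB):.0f} genes/Mb")
print(f"Human density: {genes_per_mb(HUMAN_GENES, HUMAN_GENOME_MB):.0f} genes/Mb")
```

The two density figures make the same point as the prose: on these numbers the yeast genome carries far more genes per megabase than the human genome.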
For many years the lack of precise correlation between the complexity of an organism and the
size of its genome was regarded as a puzzle, the so-called C-value paradox. In fact the answer
is quite simple: space is saved in the genomes of less complex organisms because the genes
are more closely packed together. This can be seen by comparing 50-kb segments of the
genomes of humans, yeast, fruit flies, maize and Escherichia coli. The yeast genome segment,
which comes from chromosome III (the first eukaryotic chromosome to be sequenced), has
the following distinctive features:
It contains more genes than the human segment.
Relatively few of the yeast genes are discontinuous.
There are fewer genome-wide repeats.
The picture that emerges is that the genetic organization of the yeast genome is much more
economical than that of the human version. The genes themselves are more compact, having
fewer introns, and the spaces between the genes are relatively short, with much less space
taken up by genome-wide repeats and other non-coding sequences.
The hypothesis that more complex organisms have less compact genomes holds when other
species are examined. Consider the fruit-fly segment. If we agree that a fruit fly is more
complex than a yeast cell but less complex than a human, then we would expect the
organization of the fruit-fly genome to be intermediate between that of yeast and humans.
This is what we find: the gene density in the fruit-fly genome is intermediate between that of
yeast and humans, and the average fruit-fly gene has many more introns than the average
yeast gene but only about one-third as many as the average human gene.
It is beginning to become clear that the genome-wide repeats play an intriguing role in
dictating the compactness or otherwise of a genome. This is strikingly illustrated by the
maize genome, which at 5000 Mb is larger than the human genome but still relatively small
for a flowering plant. Only a few limited regions of the maize genome have been sequenced,
but some remarkable results have been obtained, revealing a genome dominated by repetitive
elements. The only gene in the 50-kb maize segment is one member of a family of genes coding for the
alcohol dehydrogenase enzymes. Instead of genes, the dominant feature of this genome
segment is the genome-wide repeats. The majority of these are of the LTR element type,
which comprise virtually all of the non-coding part of the segment, and on their own are
estimated to make up approximately 50% of the maize genome. It is becoming clear that one
or more families of genome-wide repeats have undergone a massive proliferation in the
genomes of certain species. This may provide an explanation for the most puzzling aspect of
the C-value paradox, which is not the general increase in genome size that is seen in
increasingly complex organisms, but the fact that similar organisms can differ greatly in
genome size. A good example is provided by Amoeba dubia which, being a protozoan, might
be expected to have a genome of 100-500 Mb, similar to other protozoa such as Tetrahymena
pyriformis (190 Mb). In fact the Amoeba genome is over 200,000 Mb. Similarly, we might
guess that the genomes of crickets are similar in size to those of other insects, but crickets
have genomes of approximately 2000 Mb, 11 times that of the fruit fly.
Nuclear genome:
Packaging of DNA into chromosomes: Chromosomes are much shorter than the DNA
molecules that they contain. A highly organized packaging system is therefore needed to fit a
DNA molecule into its chromosome.
In 1973-74 several groups carried out nuclease protection experiments on chromatin
(DNA-histone complexes) that had been gently extracted from nuclei by methods designed to retain
as much of the chromatin structure as possible. In a nuclease protection experiment the
complex is treated with an enzyme that cuts the DNA at positions that are not 'protected' by
attachment to a protein. The sizes of the resulting DNA fragments indicate the positioning of
the protein complexes on the original DNA molecule. After limited nuclease treatment of
purified chromatin, the bulk of the DNA fragments have lengths of approximately 200 bp and
multiples thereof, suggesting a regular spacing of histone proteins along the DNA.
Figure: Nuclease protection analysis of chromatin from human nuclei. Chromatin is gently
purified from nuclei and treated with a nuclease enzyme. On the left, the nuclease treatment is
carried out under limiting conditions so that the DNA is cut, on average, just once in each of
the linker regions between the bound proteins. After removal of the protein, the DNA
fragments are analyzed by agarose gel electrophoresis and found to be 200 bp in length, or
multiples thereof. On the right, the nuclease treatment proceeds to completion, so all the DNA
in the linker regions is digested. The remaining DNA fragments are all 146 bp in length. The
results show that in this form of chromatin, protein complexes are spaced along the DNA at
regular intervals, one for each 200 bp, with 146 bp of DNA closely attached to each protein
complex.
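The logic of the two digestions can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not a model of real nuclease kinetics: the DNA is treated as a row of identical 200-bp repeat units, a cut in a linker is placed exactly on the boundary between two units, and the function names and the cut probability are made up for illustration.

```python
import random

REPEAT_BP = 200   # spacing of protein complexes along the DNA (from the text)
CORE_BP = 146     # DNA closely attached to each complex (from the text)


def limited_digest(n_units, cut_probability, rng):
    """Cut each internal linker at most once, with the given probability.

    Simplification: a cut is treated as falling on the boundary between
    two 200-bp repeat units, so every fragment is an exact multiple of 200 bp.
    """
    fragments = []
    units_in_fragment = 1
    for _ in range(n_units - 1):          # one linker between each adjacent pair
        if rng.random() < cut_probability:
            fragments.append(units_in_fragment * REPEAT_BP)
            units_in_fragment = 1
        else:
            units_in_fragment += 1
    fragments.append(units_in_fragment * REPEAT_BP)
    return fragments


def complete_digest(n_units):
    """All linker DNA is removed, leaving one protected core per unit."""
    return [CORE_BP] * n_units


rng = random.Random(0)                    # fixed seed so runs are repeatable
ladder = limited_digest(n_units=50, cut_probability=0.5, rng=rng)
print(sorted(set(ladder)))                # distinct band sizes in the ladder
print(set(complete_digest(50)))           # single band after complete digestion
```

Whatever cut probability is chosen, the limited digest can only yield multiples of the repeat length, which is why the gel shows a ladder rather than a smear.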
DNA in the nucleus exists mainly in combination with histone proteins; the DNA-histone
complex is called chromatin. Chromatin can change its structure in response to various
cellular metabolic demands. It can be envisioned as a repeat of structural units called
nucleosomes. The nucleosome core particle is composed of a histone octamer plus the DNA
that wraps around it. The histone octamer contains two molecules each of histones H2A,
H2B, H3 and H4. DNA wraps around the octamer in a left-handed supercoil of about 1.75
turns, which encloses about 150 bp. Histone H1 is a linker histone that, along with linker
DNA (the DNA between two nucleosome core particles), physically connects adjacent
nucleosome core particles. The length of linker DNA varies with species and cell type.
Usually, a nucleosome core particle together with its linker DNA encompasses between 180
and 200 bp of DNA. Between the nucleosome and the metaphase chromosome, which
contains two chromatids, there are several levels of organization and compaction of the
chromatin. Each nucleosome has a diameter of 10 nm; the nucleosomes are compacted into a
solenoid structure called the 30-nm fiber; the 30-nm fibers are compacted into a 300-nm
filament; and finally the 300-nm filaments are further compacted into a 700-nm chromosome.
During cell division, when the chromosomes have duplicated, a 1,400-nm metaphase
chromosome is produced containing two chromatids, each chromatid being 700 nm.
The 30 nm fiber is probably the major type of chromatin in the nucleus during
interphase, the period between nuclear divisions. When the nucleus divides, the DNA adopts
a more compact form of packaging, resulting in the highly condensed metaphase
chromosomes that can be seen with the light microscope and which have the appearance
generally associated with the word 'chromosome'. The metaphase chromosomes form at a
stage in the cell cycle after DNA replication has taken place and so each one contains two
copies of its chromosomal DNA molecule. The two copies are held together at the
centromere, which has a specific position within each chromosome. Individual chromosomes
can therefore be recognized because of their size and the location of the centromere relative
to the two ends. Further distinguishing features are revealed when chromosomes are stained.
There are a number of different staining techniques, each resulting in a banding pattern that is
characteristic for a particular chromosome. This means that the set of chromosomes
possessed by an organism can be represented as a karyogram, in which the banded
appearance of each one is depicted.
An important part of the chromosome is the terminal region or telomere. Telomeres
are important because they mark the ends of chromosomes and therefore enable the cell to
distinguish a real end from an unnatural end caused by chromosome breakage. This is an
essential requirement because the cell must repair the latter but not the former. Telomeric
DNA is made up of hundreds of copies of a repeated motif, 5'-TTAGGG-3' in humans, with a
short extension of the 3' terminus of the double-stranded DNA molecule.
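A tandem repeat of this kind is easy to recognize computationally. The sketch below counts the copies of the human motif in the longest uninterrupted run within a sequence; the function name and the toy sequence are invented for illustration and are not real telomere data.

```python
import re

HUMAN_TELOMERE_MOTIF = "TTAGGG"   # repeat unit given in the text (5'->3')


def longest_tandem_run(sequence, motif=HUMAN_TELOMERE_MOTIF):
    """Number of motif copies in the longest uninterrupted tandem run."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


toy_end = "GATC" + "TTAGGG" * 5 + "TTAG"   # made-up sequence, not real data
print(longest_tandem_run(toy_end))
```

Only complete copies are counted, so the partial motif at the end of the toy sequence does not contribute to the run.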
Functional DNA content of genome: This comprises the coding and non-coding content of
genes and accounts for about 25% of the nuclear genome. As the earlier comparison of
genome segments from different organisms showed, genes are not arranged in a regular
pattern but are distributed unevenly throughout the genome. Two lines of evidence pointed to
this uneven distribution, one of which relates to the banding patterns that are produced when
chromosomes are stained. The dyes used in these procedures bind to DNA molecules, but in
most cases with preferences for certain base pairs. Giemsa, for example, has a greater affinity
for DNA regions that are rich in A and T nucleotides. The dark G-bands in the human
karyogram are therefore thought to be AT-rich regions of the genome. The base composition
of the genome as a whole is 59.7% A + T so the dark G-bands must have AT contents
substantially greater than 60%. Cytogeneticists therefore predicted that there would be fewer
genes in dark G-bands because genes generally have AT contents of 45-50%. This prediction
was confirmed when the draft genome sequence was compared with the human karyogram.
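The base-composition reasoning above rests on a simple measurement, the fraction of A and T in a stretch of DNA. The following sketch computes it over successive windows and compares each window with the genome-wide figure of 59.7%. The function names, the window size and the toy sequence are assumptions made for illustration only.

```python
GENOME_AT_FRACTION = 0.597   # genome-wide A + T content quoted in the text


def at_fraction(sequence):
    """Fraction of A and T among the unambiguous bases of a DNA sequence."""
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T bases")
    return sum(b in "AT" for b in bases) / len(bases)


def window_at_fractions(sequence, window, step):
    """AT fraction of successive windows along a sequence."""
    return [
        at_fraction(sequence[start:start + window])
        for start in range(0, len(sequence) - window + 1, step)
    ]


toy = "ATATTAAT" * 4 + "GCGCCGGA" * 4     # made-up AT-rich block, then GC-rich block
for start, frac in zip(range(0, len(toy), 16), window_at_fractions(toy, 16, 16)):
    label = "AT-rich" if frac > GENOME_AT_FRACTION else "not AT-rich"
    print(start, f"{frac:.2f}", label)
```

On real data the windows would be far larger than 16 bp; the small size here only keeps the toy example readable.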
The second line of evidence pointing to uneven gene distribution derived from the isochore
model of genome organization. According to this model, the genomes of vertebrates and
plants (and possibly of other eukaryotes) are mosaics of segments of DNA, each at least 300
kb in length, with each segment having a uniform base composition that differs from that of
the adjacent segments. Support for the isochore model comes from experiments in which
genomic DNA is broken into fragments of approximately 100 kb, treated with dyes that bind
specifically to AT- or GC-rich regions, and the pieces separated by density gradient
centrifugation. When this experiment is carried out with human DNA, five fractions are seen,
each representing a different isochore type with a distinctive base composition: two AT-rich
isochores, called L1 and L2, and three GC-rich classes: H1, H2 and H3. The last of these, H3,
is the least abundant in the human genome, making up only 3% of the total, but contains over
25% of the genes. This is a clear indication that genes are not distributed evenly through the
human genome.
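The H3 figures imply a substantial enrichment, which a short calculation makes explicit. This is a sketch using only the two percentages quoted above; because the text says "over 25%", treating the gene share as exactly 25% gives lower bounds. The names are illustrative.

```python
# Figures from the text; "over 25%" is treated as exactly 25%, so the
# results are lower bounds on the enrichment.
H3_GENOME_FRACTION = 0.03
H3_GENE_FRACTION = 0.25


def relative_gene_density(gene_fraction, genome_fraction):
    """Gene density of a compartment relative to the genome-wide average."""
    return gene_fraction / genome_fraction


h3 = relative_gene_density(H3_GENE_FRACTION, H3_GENOME_FRACTION)
rest = relative_gene_density(1 - H3_GENE_FRACTION, 1 - H3_GENOME_FRACTION)
print(f"H3 vs genome average:   {h3:.1f}x")
print(f"Rest vs genome average: {rest:.2f}x")
print(f"H3 vs rest of genome:   {h3 / rest:.1f}x")
```

A perfectly even distribution would give a value of 1 for every compartment, so the departure from 1 is a direct measure of how uneven the distribution is.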
The genes present in an organism can be classified using two approaches: the first is
based on the function of the genes, and the second on the particular domains of the proteins
that the genes encode. The second approach is the more informative because it shows that a
particular genome specifies a number of protein domains that are absent from the genomes of
other organisms. In vertebrates these domains include several involved in activities such as
cell adhesion, electrical coupling and growth of nerve cells. These functions are interesting
because they are ones that we look on as conferring the distinctive features of vertebrates
compared with other types of eukaryote.
Since the earliest days of DNA sequencing it has been known that multigene families
(groups of genes of identical or similar sequence) are common features of many genomes.
The rRNA genes are examples of 'simple' or 'classical' multigene families, in which all the
members have identical or nearly identical sequences. These families are believed to have
arisen by gene duplication, with the sequences of the individual members kept identical by an
evolutionary process. Other multigene families, more common in higher eukaryotes than in
lower eukaryotes, are called 'complex' because the individual members, although similar in
sequence, are sufficiently different for the gene products to have distinctive properties. One
of the best examples of this type of multigene family is provided by the mammalian globin
genes. The globins are the blood proteins that combine to make hemoglobin, each molecule
of hemoglobin being made up of two α-type and two β-type globins. Why are the members of
the globin gene families so different from one another? The answer was revealed when the
expression patterns of the individual genes were studied. It was discovered that the genes are
expressed at different stages in human development: for example, in the β-type cluster ε is
expressed in the early embryo, Gγ and Aγ (whose protein products differ by just one amino
acid) in the fetus, and δ and β in the adult. The different biochemical properties of the
resulting globin proteins are thought to reflect slight changes in the physiological role that
hemoglobin plays during the course of human development.
In some multigene families, the individual members are clustered, as with the globin
genes, but in others the genes are dispersed around the genome. An example of a dispersed
family is the five human genes for aldolase, an enzyme involved in energy generation, which
are located on chromosomes 3, 9, 10, 16 and 17. The important point is that, even though
dispersed, the members of the multigene family have sequence similarities that point to a
common evolutionary origin.
RNA transposons or retroelements are features of eukaryotic genomes but have not so far
been discovered in prokaryotes.
Endogenous retroviruses (ERVs) are retroviral genomes integrated into vertebrate
chromosomes. Some are still active and might, at some stage in a cell's lifetime, direct
synthesis of exogenous viruses, but most are decayed relics that no longer have the capacity
to form viruses. These inactive sequences are genome-wide repeats but they are not capable of
additional proliferation.
Retrotransposons have sequences similar to ERVs but are features of nonvertebrate
eukaryotic genomes (i.e. plants, fungi, invertebrates and microbial eukaryotes) rather than
vertebrates. Retrotransposons have very high copy numbers in some genomes, with many
different types present. There are two types of retrotransposon: the Ty3/gypsy family (Ty3 and
gypsy are examples of this class in yeast and fruit fly, respectively), whose members possess
the same set of genes as an ERV, and the Ty1/copia family, members of which lack the env
gene. Both types are able to transpose but the absence of the env gene means that the
Ty1/copia group cannot form infectious virus particles. In fact, despite the presence of env in
the Ty3/gypsy genome, it has only recently been recognized that some of these elements can
form viruses and hence should be looked upon as non-vertebrate retroviruses. Although
technically they are interspersed elements, retrotransposons are sometimes found in clusters
in a genome sequence as a result of the presence of preferred integration sites for transposing
elements.
The three types of retroelement described so far are LTR elements, as they have long terminal
repeats at either end which play a role in the transposition process. Other retroelements do not
have LTRs. These are called retroposons and in mammals include the following:
LINEs (long interspersed nuclear elements) contain a reverse-transcriptase-like gene
probably involved in the retrotransposition process. An example is the human element
LINE1, which is 6.1 kb long and has a copy number of 516,000 in the human genome. A LINE
contains a pol II promoter and two open reading frames (ORFs): ORF1 encodes an
RNA-binding protein, and ORF2 encodes a protein with both endonuclease and reverse
transcriptase activities. LINE activity proceeds as follows: RNA pol II transcribes the LINE
DNA into LINE RNA; the LINE RNA is translated into proteins; the proteins and RNA join
together and re-enter the nucleus; the endonuclease cuts a strand of the target genomic DNA,
often in the intron of a gene; and the reverse transcriptase copies the LINE RNA into LINE
DNA, which is inserted into the target DNA, forming a new LINE element there. Three
distantly related LINE families are found in the human genome: LINE1, LINE2 and LINE3.
Only LINE1 (L1) is still active.
SINEs (short interspersed nuclear elements) do not have a reverse transcriptase gene but
can still transpose, probably by 'borrowing' reverse transcriptase enzymes that have been
synthesized by other retroelements. SINEs are short sequences (about 100-400 bp) that
contain an internal pol III promoter but do not encode any proteins. All currently known
SINEs are derived from tRNA and 7SL RNA genes. Most of these nonautonomous elements
share their 3' end with a resident LINE. The only active SINE in the human genome is the Alu
element, which is the major SINE, constituting about 11% of the genome (~1 million Alu
elements).
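The Alu figures can be cross-checked against the SINE length range quoted above. This is a rough sketch that assumes a human genome of 3200 Mb (the value in the genome size list) and takes the 11% share and the copy number of 1 million at face value; the names are illustrative.

```python
HUMAN_GENOME_BP = 3_200_000_000   # 3200 Mb, the human genome size listed earlier
ALU_GENOME_FRACTION = 0.11        # "about 11% of the genome"
ALU_COPIES = 1_000_000            # "~1 million Alu elements"


def mean_element_length(genome_bp, genome_fraction, copies):
    """Average length in bp implied by a repeat's genome share and copy number."""
    return genome_bp * genome_fraction / copies


alu_mean_bp = mean_element_length(HUMAN_GENOME_BP, ALU_GENOME_FRACTION, ALU_COPIES)
print(f"Implied mean Alu length: {alu_mean_bp:.0f} bp")
```

The implied average falls inside the length range given for SINEs, so the three figures are mutually consistent.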
Not all transposons require an RNA intermediate. Many are able to transpose in a
more direct DNA to DNA manner. In eukaryotes, DNA transposons are less common than
retrotransposons, but they have a special place in genetics because a family of plant DNA
transposons - the Ac/Ds elements of maize - were the first transposable elements to be
discovered, by Barbara McClintock in the 1950s. DNA transposons are a much more
important component of prokaryotic genome anatomies than the RNA transposons. The
insertion sequences, IS1 and IS186, are examples of DNA transposons, and a single E. coli
genome may contain as many as 20 of these of various types. Other kinds of DNA transposon
known in E. coli, and fairly typical of prokaryotes in general, include composite
transposons and Tn3-type transposons.
By
Dr Subhash Jakhesara