
Genome Organization

Mardalisa, B.Sc., M.Si

Viral genomes
Viral genomes: ssRNA, dsRNA, ssDNA, dsDNA; linear or circular

Viruses with RNA genomes:

• Almost all plant viruses and some bacterial and animal viruses
• Genomes are rather small (a few thousand nucleotides)
Viruses with DNA genomes (e.g. lambda = 48,502 bp):
• Often a circular genome
Replicative form of viral genomes:
• All ssRNA viruses produce dsRNA molecules
• Many linear DNA molecules become circular
Molecular weight and contour length:
• Duplex length per base pair = 3.4 Å
• Molecular weight per base pair = ~660 Da
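
The two constants above are enough to estimate the physical size of any duplex genome. A minimal sketch, using the lambda genome size quoted on this slide; the function names are illustrative and not taken from any library:

```python
ANGSTROM_PER_BP = 3.4   # rise per base pair of duplex DNA (from the slide)
DALTON_PER_BP = 660.0   # approximate mass of one base pair (from the slide)

def contour_length_um(n_bp):
    """Contour length of a duplex of n_bp base pairs, in micrometres."""
    return n_bp * ANGSTROM_PER_BP * 1e-4   # 1 Å = 1e-4 µm

def molecular_weight_da(n_bp):
    """Approximate molecular weight of a duplex of n_bp base pairs, in daltons."""
    return n_bp * DALTON_PER_BP

LAMBDA_BP = 48_502  # phage lambda genome size quoted above

print(f"lambda contour length: {contour_length_um(LAMBDA_BP):.1f} µm")
print(f"lambda molecular weight: {molecular_weight_da(LAMBDA_BP):.2e} Da")
```

For lambda this gives a contour length of roughly 16.5 µm and a molecular weight of about 3.2 x 10^7 Da.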
Procaryotic genomes

• Generally 1 circular chromosome (dsDNA)
• Usually without introns
• Relatively high gene density (~2500 genes per mm of E. coli DNA)
• Contour length of E. coli genome: 1.7 mm
• Often indigenous plasmids are present
Extrachromosomal circular DNAs (plasmids)
• Found in bacteria, yeast and other fungi
• Size varies from ~3,000 bp to 100,000 bp
• Replicate autonomously (origin of replication)
• May contain resistance genes
• May be transferred from one bacterium to another
• May be transferred across kingdoms
• Multicopy plasmids (up to ~400 plasmids per cell)
• Low copy plasmids (1-2 copies per cell)
• Plasmids may be incompatible with each other
• Are used as vectors that can carry a foreign gene of interest (e.g. insulin)
Eukaryotic genome

• Moderately repetitive
  • Functional (protein coding, tRNA coding)
  • Unknown function
    • SINEs (short interspersed elements)
      • 200-300 bp
      • 100,000 copies
    • LINEs (long interspersed elements)
      • 1-5 kb
      • 10-10,000 copies
Eukaryotic genome

• Highly repetitive
  • Minisatellites
    • Repeats of 14-500 bp
    • 1-5 kb long
    • Scattered throughout genome
  • Microsatellites
    • Repeats up to 13 bp
    • 100s of kb long, 10^6 copies
    • Around centromere
  • Telomeres
    • Short repeats (6 bp)
    • 250-1,000 at ends of chromosomes
Eucaryotic genomes
• Located on several chromosomes
• Relatively low gene density (50 genes per mm of DNA in humans)
• Contour length of DNA from a single human cell = 2 m
• Approximately 10^11 cells: total length about 2 x 10^11 m (2 x 10^8 km)
• For comparison, distance between sun and earth: 1.5 x 10^8 km
• Human chromosomes vary in length over a 25-fold range
• Carry organelle genomes as well
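
The length and gene-density figures above can be checked with a few lines of arithmetic. This is a rough sketch that assumes a diploid nucleus of 2 x 3.2 x 10^9 bp, 0.34 nm per base pair, and the cell count of 10^11 quoted on the slide; all names are illustrative.

```python
NM_PER_BP = 0.34            # 3.4 Å rise per base pair
HAPLOID_GENOME_BP = 3.2e9   # H. sapiens genome size used in this deck
CELLS = 1e11                # cell count quoted on the slide (carried over as-is)
SUN_EARTH_KM = 1.5e8        # sun-earth distance quoted on the slide

def dna_length_m(n_bp):
    """Contour length in metres of n_bp base pairs of duplex DNA."""
    return n_bp * NM_PER_BP * 1e-9

def bp_per_gene(genes_per_mm):
    """Average spacing between genes, in bp, for a density given per mm of DNA."""
    bp_per_mm = 1e-3 / (NM_PER_BP * 1e-9)
    return bp_per_mm / genes_per_mm

per_cell_m = dna_length_m(2 * HAPLOID_GENOME_BP)   # diploid nucleus (assumption)
total_km = per_cell_m * CELLS / 1000.0

print(f"DNA per cell: {per_cell_m:.2f} m")
print(f"DNA in all cells: {total_km:.2e} km "
      f"({total_km / SUN_EARTH_KM:.2f} x sun-earth distance)")
print(f"E. coli (2500 genes/mm): ~{bp_per_gene(2500):.0f} bp per gene")
print(f"Human (50 genes/mm): ~{bp_per_gene(50):.0f} bp per gene")
```

Under these assumptions one cell holds about 2.2 m of DNA, and the gene densities correspond to roughly 1.2 kb per gene in E. coli versus about 59 kb per gene in humans.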
Mitochondrial genome (mtDNA)

• Multiple identical circular chromosomes
• Size ~15 kb in animals
• Size ~200 kb to 2,500 kb in plants
• Over 95% of mitochondrial proteins are encoded in the nuclear genome
• Often A+T rich genomes
• mtDNA is replicated before or during mitosis
Chloroplast genome (cpDNA)

• Multiple circular molecules
• Size ranges from 120 kb to 160 kb
• Similar to mtDNA
• Many chloroplast proteins are encoded in the nucleus (separate signal sequence)
“Cellular” Genomes
[Figure: genomes by group. Viruses: viral genome. Procaryotes: bacterial chromosome. Eucaryotes: chromosomes (nuclear genome) and mitochondrial genome.]


Genome: all of an organism’s genes plus intergenic DNA
Intergenic DNA = DNA between genes
Estimated genome sizes



[Figure: genome size ranges on a logarithmic axis from 1e1 to 1e12 nucleotides, with bars for viruses (1024), mitochondria (~100) and bacteria (>100). Number in ( ) = completely sequenced genomes.]
Size of genomes

Organism              Genome size (bp)
Epstein-Barr virus    0.172 x 10^6
E. coli               4.6 x 10^6
S. cerevisiae         12.1 x 10^6
C. elegans            95.5 x 10^6
A. thaliana           117 x 10^6
D. melanogaster       180 x 10^6
H. sapiens            3200 x 10^6
Chromosome organization
Eucaryotic chromosome

[Figure: chromosome with a telomere at each end and a centromere between the p-arm and the q-arm.]

Centromere
• DNA sequence that serves as an attachment site for proteins during mitosis.
• In yeast these sequences (~130 nt) are very A+T rich.
• In higher eucaryotes centromeres are much longer and contain "satellite DNA".

Telomere
• At the ends of chromosomes; help stabilize the chromosome.
• In yeast telomeres are ~100 bp long (imperfect repeats).
• Repeats are added by a specific telomerase.

Telomere repeat structure:
5' - (TxGy)n
3' - (AxCy)n
where x and y = 1-4; n = 20 to 100 (1500 in mammals)
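
To make the (TxGy)n notation concrete, the sketch below builds the G-rich strand for chosen x, y and n and derives the complementary strand. The values x = 2, y = 4, n = 3 are only an illustrative choice, and the function names are made up for this example.

```python
def telomere_repeat(x, y, n):
    """G-rich strand written 5'->3': n copies of x T's followed by y G's."""
    return ("T" * x + "G" * y) * n

def complement(seq):
    """Base-by-base complement, so the result reads 3'->5' under the input."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))

top = telomere_repeat(2, 4, 3)   # illustrative choice: x=2, y=4, n=3
bottom = complement(top)

print("5'-" + top + "-3'")
print("3'-" + bottom + "-5'")
```

The second strand comes out as n copies of x A's followed by y C's when read 3' to 5', which is the (AxCy)n form.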
Gene classification
[Figure: gene classification tree]
• Coding genes
  • Messenger RNA
    • Structural proteins
    • Enzymes
• Non-coding genes
  • Structural RNA
    • Transfer
    • Ribosomal
    • Other


What is a gene ?
 Definitions
1. Classical definition: Portion of DNA that determines a single character (phenotype).
2. One gene – one enzyme (Beadle & Tatum 1941): "Every gene encodes the information for one enzyme."
3. One gene – one protein: "One gene contains information for one protein" (structural proteins included); also stated as one gene – one polypeptide.
4. Current definition: A piece of DNA (or in some cases RNA) that contains the primary sequence to produce a functional biological gene product (RNA, protein).
Coding region
Nucleotides (open reading frame) encoding the amino acid sequence of a protein.

The molecular definition of a gene includes more than just the coding region.

Noncoding regions
• Regulatory regions
  • RNA polymerase binding site
  • Transcription factor binding sites
• Introns
• Polyadenylation [poly(A)] sites

Molecular definition:
Entire nucleic acid sequence necessary for the synthesis of a functional polypeptide (protein chain) or functional RNA.
Anatomy of a gene

• ORF: from start codon (ATG) to stop codon (TGA, TAA, TAG).
• Upstream region with binding sites (e.g. TATA box).
• Poly(A) tail.
• Splice sites: introns begin with GT and end with AG.
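
The ORF rule above (ATG through to the first in-frame stop codon) translates directly into a small scanner. This is a minimal sketch for one strand only; the function name, the `min_codons` parameter and the test sequence are assumptions for illustration, and real gene finders also handle the reverse strand, introns and statistical scoring.

```python
STOP_CODONS = {"TGA", "TAA", "TAG"}

def find_orfs(seq, min_codons=2):
    """Return (start, end) pairs for ORFs on the given strand.

    start is the index of the A in ATG; end is the index just past the stop
    codon, so seq[start:end] is the ORF including its stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk codon by codon until the first in-frame stop codon
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3))
                        i = j   # resume scanning after this stop codon
                        break
            i += 3
    return orfs

dna = "CCATGAAATTTTAGGG"          # toy sequence: ATG AAA TTT TAG in frame 2
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])
```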
Bacterial genes

• Most do not have introns.
• Many are organized in operons: contiguous genes, transcribed as a single polycistronic mRNA, that encode proteins with related functions.

A polycistronic mRNA encodes several proteins.

Bacterial operon

What would be the effect of a mutation in the control region (a) compared to a mutation in a structural gene (b)?
Eucaryotic genes

Hemoglobin beta subunit gene

Exon 1    Intron A    Exon 2    Intron B    Exon 3
90 bp     131 bp      222 bp    851 bp      126 bp

Introns: intervening sequences within a gene that are not translated into a protein sequence. Collagen has 50 introns.
Exons: sequences within a gene that encode protein sequences.
Splicing: removal of introns from the primary transcript (pre-mRNA) to give the mature mRNA.
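
Splicing can be mimicked on the segment lengths listed for the hemoglobin beta gene. The sketch below uses a placeholder sequence ("E" for exon positions, "i" for intron positions) rather than the real gene sequence, and the function names are illustrative.

```python
# Segment order and lengths as listed for the hemoglobin beta subunit gene
SEGMENTS = [
    ("Exon 1", 90),
    ("Intron A", 131),
    ("Exon 2", 222),
    ("Intron B", 851),
    ("Exon 3", 126),
]

def segment_coordinates(segments):
    """Turn ordered (name, length) pairs into (name, start, end) coordinates."""
    coords, pos = [], 0
    for name, length in segments:
        coords.append((name, pos, pos + length))
        pos += length
    return coords

def splice(pre_mrna, segments):
    """Join the exon slices of the primary transcript, dropping the introns."""
    return "".join(
        pre_mrna[start:end]
        for name, start, end in segment_coordinates(segments)
        if name.startswith("Exon")
    )

# Placeholder transcript: 'E' marks exon positions, 'i' marks intron positions
pre_mrna = "".join(("E" if n.startswith("Exon") else "i") * l for n, l in SEGMENTS)
mature = splice(pre_mrna, SEGMENTS)

print(len(pre_mrna), "nt before splicing;", len(mature), "nt after")
```

With these lengths, 438 of the 1,420 listed nucleotides are exonic, so about two thirds of this stretch is removed as intron.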
Regulatory mechanisms

• "Organize expression of genes" (function calls)
• Promoter region (binding site), usually near coding region
• Binding can block (inhibit) expression
• Computational challenges
  • Identify binding sites
  • Correlate sequence to expression
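
The simplest approach to the "identify binding sites" challenge is a sliding-window scan for a consensus motif, optionally tolerating mismatches. This is a sketch only: the motif "TATAAA" is used here as an assumed TATA-box consensus, the promoter string is invented, and practical tools use position weight matrices rather than exact strings.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def find_sites(seq, motif, max_mismatches=0):
    """Return (position, matched_text, mismatches) for each window close to motif."""
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        d = hamming(window, motif)
        if d <= max_mismatches:
            hits.append((i, window, d))
    return hits

promoter = "GCGCTATAAAGGCGCATGAAA"   # invented example sequence
print(find_sites(promoter, "TATAAA"))
print(find_sites(promoter, "TATAAA", max_mismatches=2))
```

Allowing mismatches raises sensitivity but also the number of chance hits, which is one reason correlating sequence with expression data is listed as a separate challenge.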
