CHAPTER TWO

Repetitive DNA

BY: Mekoyet Addise(MSc.)


1
2. Introduction to repetitive DNA

2.1. Repetitive DNA: An important source of variation in eukaryotic genomes
 Types and distribution of repetitive DNA
 Functions of repetitive DNA sequences

2.2. Genetic diversity and basis of polymorphism

2
2.1. Repetitive DNA: An important source of variation in eukaryotic genomes

 What is a genome?

 Where does the variation come from?

 What does repetitive sequence (repetitive DNA) mean?

3
Genome

 A genome is an organism’s complete set of genetic instructions. Each genome contains all of the information needed to build that organism and allow it to grow and develop.

 The instructions in our genome are made up of DNA.

 All living things have a unique genome.

 The human genome is made of 3.2 billion bases of DNA, but other organisms have different genome sizes.


4
Where does the variation come from?

Do more complex organisms have more genes?

Is variation in genome size between organisms determined by coding or non-coding sequences?

Variation in genome size between species is not explained to any significant extent by the number of genes or by their size, i.e. the C-value paradox!

5
The concept of the C-value paradox

 The E. coli genome has 4.6 million base pairs and codes for about 3,000 different proteins (assuming proteins of ~40,000 Da and 500 bp of promoter per gene)
 Using the same kind of assumptions, the human genome should code for 1 million proteins (3 billion base pairs (3*10^9), proteins of ~50,000 Da and promoters of 1,500 bp)
 Humans have only ~30,000 coding “genes” (more recent annotations put the number closer to 20,000)
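The arithmetic behind these estimates can be sketched as follows. This is only a rough illustration: the coding lengths per protein (about 1,000 bp for a ~40,000 Da protein and about 1,500 bp for a ~50,000 Da protein) are assumptions introduced here, not values stated on the slide.

```python
# Back-of-the-envelope gene-count estimate for the C-value paradox.
# ASSUMPTION: coding lengths of 1,000 bp and 1,500 bp per protein are
# illustrative values chosen here; the slide gives only protein sizes
# and promoter lengths.

def expected_gene_count(genome_bp, coding_bp, promoter_bp):
    """Number of genes that would fit if the whole genome consisted of genes."""
    return genome_bp // (coding_bp + promoter_bp)

ecoli_estimate = expected_gene_count(4_600_000, 1_000, 500)
human_estimate = expected_gene_count(3_000_000_000, 1_500, 1_500)

print(ecoli_estimate)   # 3066, close to the ~3,000 proteins quoted
print(human_estimate)   # 1000000, far above the observed human gene count
```

The gap between the predicted one million genes and the observed tens of thousands is the point of the paradox: most of the extra DNA is not coding sequence.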

6
Which one has more genes?

7
Percent of non-coding DNA

8
Cont’d..

The genome…

 The majority of a given eukaryotic nuclear genome is repetitive DNA, which includes:

 Transposons,

 Simple sequence repeats,

 Segmental duplications and pseudogenes

9
What does the DNA in the human genome look like?

10
What does repetitive DNA mean?

 Repeated sequences (also known as repetitive elements, repeats, or repetitive DNA) are short or long patterns of nucleic acids (DNA or RNA) that are present in multiple copies throughout the genome.

 They are a major component of eukaryotic genomes and may account for up to 90% of the genome size.

11
Repetitive DNA

 Repetitive DNA is present in all three genomes

 Nuclear DNA
 Mitochondrial DNA (mtDNA)
 Chloroplast DNA (cpDNA)

12
CpDNA

 Only present in plants

 Ranges from 135 to 160 kb in size

 Packed with genes

 Resembles the streamlined configuration of its cyanobacterial ancestral genome

 Consists of an inverted repeat separating one large single-copy (LSC) and one small single-copy (SSC) region

13
Nuclear genome

 A huge ocean of largely nongenic DNA with:

 Some tens of thousands of genes and gene clusters scattered around like small islands and archipelagos

 A high proportion of this apparently nonfunctional DNA consists of repeated motifs
 May be considered as junk DNA or selfish DNA

14
MtDNA
 Shares a number of features with both the nuclear and the chloroplast genome

• Thus, plant mtDNA genes have prokaryotic properties just like cpDNA genes, but introns are more common

• With about 370 to 490 kb, the three higher plant mtDNAs sequenced so far are about 20 times larger than their animal counterparts, but only about 10% of these sequences represent genes

• Another 10 to 26% were found to be made up of repetitive DNA, including retrotransposons

• Thus, the majority of plant mtDNA sequences lack any obvious features of information
15
Cont’d..

 Both cpDNA and mtDNA are present in hundreds of copies per cell

 Each acts as a single heritable unit

 Inheritance is uniparental. In most cases, transmission is through the female parent

 The best-known exception to this rule is the paternal transmission of cpDNA in most but not all gymnosperms

• The accumulating sequence data also revealed an extensive and ongoing horizontal exchange of DNA between the three different genomes
16
Repeated DNA elements
 Comprise the largest fraction of the nuclear genome in most eukaryotic organisms

 Various types of repetitive DNA are also found in the organelles

 Therefore, a considerable fraction of the currently employed DNA profiling methods relies on mutations of repetitive DNA elements

17
Types and distribution of repetitive DNA sequences

Depending on their genomic organization, repetitive DNA elements may be classified as

1. Tandem repeats
 are restricted to fewer loci
 consist of arrays of two to several thousand sequence units arranged in a head-to-tail fashion
 This kind of organization is also exhibited by some genes, such as the transcription units for histone mRNA and rRNA

2. Interspersed repeats: identical or similar DNA sequences found in different locations throughout the genome
 exemplified by transposable elements
 are present at multiple sites throughout the genome
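The difference between the two kinds of organization can be illustrated with a toy sketch. The sequences, the repeat unit GATA and the function names are hypothetical, and real repeat annotation tolerates mismatches, which this simplified version does not.

```python
# Toy classification of a repeat unit as tandem or interspersed.
# ASSUMPTION: exact, non-overlapping matches only; names and sequences
# are illustrative and not taken from the slides.

def copy_positions(genome, unit):
    """Start positions of non-overlapping exact copies of `unit`."""
    positions = []
    start = genome.find(unit)
    while start != -1:
        positions.append(start)
        start = genome.find(unit, start + len(unit))
    return positions

def organization(genome, unit):
    """'tandem' if every copy directly follows the previous one (head to tail)."""
    positions = copy_positions(genome, unit)
    if len(positions) < 2:
        return "single copy or absent"
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return "tandem" if all(gap == len(unit) for gap in gaps) else "interspersed"

print(organization("ttt" + "GATA" * 4 + "ccc", "GATA"))   # tandem
print(organization("GATAcccGATAtttttGATA", "GATA"))       # interspersed
```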
18
1. Tandem-repetitive DNA

 Classified according to the length and copy number of the basic repeat units as well as their genomic localization
I. Satellite DNA
II. Minisatellites
III. Microsatellites

19
I. Satellite DNA (satDNA)

 Generally heterochromatic in nature

 Often located in subtelomeric or centromeric regions

 Typical satellites consist of very high numbers of repetitions of a basic sequence motif, usually between 1,000 and more than 100,000 copies
 Monomer sizes may range from two to several thousand bp, but 100 to 300 bp are most common

 Satellite DNA as a fraction of the total genome
 Mammals 5-30%
 Plants 5-40%
20
II. Minisatellites

 Term coined by Jeffreys et al.

 Occur in nuclear DNA

 Highly polymorphic loci

 Often, they form families of related sequences that occur at many hundred loci in the nuclear genome

 Consist of intermediate-sized DNA motifs (about 10 to 60 nucleotides)

 Show a lower degree of repetition at a given locus compared with satellites

 Units carry a common GC-rich core sequence of 10 to 15 bp

 Repeats with longer unit sizes and higher AT content were also identified
21
Minisatellites

 Distributed unevenly across the nuclear genome

 Localization of human minisatellites
 in subtelomeric regions
 significant increase toward the telomeres

 In other mammals, a subtelomeric location of minisatellites is less obvious
 In plants, they cluster around the centromeres

 Are frequently associated with other types of repeats, including
 Microsatellites
 Transposons
22
Functions of minisatellites

 Nuclear proteins interact specifically with certain minisatellites

 May serve regulatory purposes, for example in
 Recombination
 Transcriptional activation and/or
 Splicing etc.

 May constitute fragile chromosome sites
 Could thus be involved in chromosomal translocations

 They are sometimes present in genes
 as, for example, in human genes encoding an epithelial mucin and involucrin
23
Minisatellites as molecular markers

 Exploited as molecular markers in various ways

 But two techniques clearly prevail

 Minisatellite-complementary probes are hybridized to restriction-digested genomic DNA to produce highly variable RFLP fingerprints

 Minisatellites are used as single primers in a PCR

 Minisatellites in plant mtDNA and cpDNA represent a largely untapped source of molecular markers at the intraspecific level

24
III. Microsatellites
 First recognized in the early 1970s, when (TAGG)n repeats were found in the satellite DNA of a hermit crab

 Consist of tandemly reiterated, short DNA sequence motifs (1 to 6 bp)

 They are ubiquitous components of all eukaryotic genomes, and are also found in prokaryotes

 Microsatellite frequencies in plants are higher than in animals

 Usually characterized by a low degree of repetition at a particular locus

 Microsatellites consisting of identical motifs may be found at many thousand genomic loci
25
Categories of microsatellites

Classification is based on

 Motif

 Degree of perfectness of the arrays

26
Categories of microsatellites; based on motif

 Monomeric, one-nucleotide repeat, e.g. (A)n

 Dimeric, two-nucleotide repeat, e.g. (CA)n

 Trimeric, three-nucleotide repeat, e.g. (GAA)n

 Tetrameric, pentameric, hexameric, …

The most abundant motifs found in mammalian genomes
 (A)n and (CA)n as well as their complements

The most frequent motifs in plants
 (A)n, (AT)n, (GA)n, and (GAA)n repeats
 Mononucleotide repeats consisting of A/T tracts are also present in chloroplast genomes
27
Categories of microsatellites; based on motif

 Tri-, tetra-, and pentanucleotide motifs are generally less common than mono- and dinucleotide repeats

 Estimates are extremely variable depending on:
• The motif
• The genomic localization (introns vs. exons vs. 5’- and 3’-untranslated regions vs. intergenic regions), and
• The species under consideration

 As a general rule, trinucleotide repeats are the predominant type of microsatellites found in exons because
 Slippage of one or more trinucleotide units does not affect the triplet periodicity imposed by the open reading frame
 Repeats with unit lengths of one, two, four, or five bp are rare in genes
 Frameshift mutations resulting from the insertion/deletion of the other types of repeat units will completely change the amino acid sequence downstream of the mutated site
28
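The reading-frame argument can be checked with a short sketch. The function name and labels are illustrative; the only rule used is that an insertion or deletion keeps the reading frame when the number of bases gained or lost is a multiple of three.

```python
# Effect of gaining or losing repeat units inside an open reading frame.
# ASSUMPTION: function name and labels are hypothetical, for illustration only.

def indel_effect(unit_length, units_changed=1):
    """Return 'in-frame' if the indel preserves the triplet reading frame."""
    shift = (unit_length * units_changed) % 3
    return "in-frame" if shift == 0 else "frameshift"

for unit_length in range(1, 7):
    print(unit_length, indel_effect(unit_length))
# 1 frameshift
# 2 frameshift
# 3 in-frame
# 4 frameshift
# 5 frameshift
# 6 in-frame
```

Slippage of a single unit is tolerated only for motif lengths of 3 and 6 bp, which is consistent with trinucleotide repeats predominating in exons.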
Categories of microsatellites; based on degree of perfectness of the arrays

 Weber (1990) recognized three classes, comprising
1. Perfect repeats, which consist of a single, uninterrupted array of a particular motif, e.g. GCTAGCCACACACACACACATGCATC
2. Imperfect repeats, in which the array is interrupted by one or several out-of-frame bases, e.g. GCTAGCCACACGTCACACACTGCATC
3. Compound repeats, with intermingled perfect or imperfect arrays of several motifs, e.g. GCTAGCCACACATATATGTGTGCATC

 Weber also showed that the level of polymorphism exhibited by PCR-amplified (CA)n microsatellites in humans is positively correlated with the number of uninterrupted, perfect repeats at a given locus
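The three example sequences can be scanned for uninterrupted arrays with a small regular-expression sketch. This is not Weber's procedure; the function name and the threshold of three repeats are assumptions made here for illustration.

```python
import re

# Scan a sequence for perfect (uninterrupted) microsatellite arrays.
# ASSUMPTION: motifs of 1 to 6 bp repeated at least `min_repeats` times;
# the threshold and the function name are illustrative choices.

def find_perfect_repeats(seq, min_repeats=3):
    """Return (motif, repeat_count, start) for each non-overlapping array."""
    # Lazy {1,6}? tries the shortest motif first, so AAAA is reported as (A)4.
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_repeats - 1))
    return [
        (m.group(1), len(m.group(0)) // len(m.group(1)), m.start())
        for m in pattern.finditer(seq)
    ]

print(find_perfect_repeats("GCTAGCCACACACACACACATGCATC"))  # [('CA', 7, 6)]
print(find_perfect_repeats("GCTAGCCACACGTCACACACTGCATC"))  # [('CA', 3, 13)]
print(find_perfect_repeats("GCTAGCCACACATATATGTGTGCATC"))  # [('CA', 3, 6), ('TG', 3, 16)]
```

The perfect example gives one long array, the interruption in the imperfect example shortens the longest uninterrupted run, and the compound example yields arrays of more than one motif.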
29
Microsatellites in organelle genomes

• Poly (A/T) repeats are the only type of microsatellites that are regularly present in the chloroplast genome, mainly in introns and intergenic regions

• Some chloroplast microsatellites appear to be associated with mutational hotspots in the cpDNA molecule

• They appear to be rare in plant mtDNA, with one single explicit report of a (G)n repeat from several conifer species

30
Kangaroo Rat (Dipodomys ordii)
50% of the genome consists of:
AAG repeated 2.4 × 10^9 times
TTAGGG repeated 2.2 × 10^9 times
ACACAGCGGG repeated 1.2 × 10^9 times

31
Potential functions of microsatellites

1. Microsatellite-like repeats are structural elements of both telomeres and centromeres

2. Some microsatellites bind nuclear proteins and may, for example, serve as a landing pad for transcription factors that enhance or reduce the expression of neighboring genes (e.g., the GAGA factor)

3. Some microsatellites (especially trinucleotide repeats) are transcribed and then often encode tracts of identical amino acids

32
Microsatellites as molecular markers

• The most important variant is the locus-specific PCR amplification of nuclear and organellar microsatellites with flanking primers

• Other methods use microsatellite motifs (instead of flanking regions) as single PCR primers,

• as PCR primers in combination with other primer types, or

• as hybridization probes
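Locus-specific amplification with flanking primers can be mimicked in silico: alleles that differ in repeat number give products of different length. In this minimal sketch both flanking sequences are given as they appear on one strand, which is a simplification of real primer design, and the flanks and function names are hypothetical.

```python
# In-silico sizing of a microsatellite locus from its flanking sequences.
# ASSUMPTION: both flanks are written as they occur on the same strand;
# sequences and function names are illustrative, not a real primer pair.

def amplicon_length(template, left_flank, right_flank):
    """Length of the product spanning both flanks, or None if a flank is missing."""
    start = template.find(left_flank)
    if start == -1:
        return None
    end = template.find(right_flank, start + len(left_flank))
    if end == -1:
        return None
    return end + len(right_flank) - start

def repeat_count(product_length, left_flank, right_flank, unit_length):
    """Number of repeat units implied by the product length."""
    return (product_length - len(left_flank) - len(right_flank)) // unit_length

LEFT, RIGHT = "GCTAGC", "TGCATC"
allele_a = LEFT + "CA" * 7 + RIGHT
allele_b = LEFT + "CA" * 10 + RIGHT

for allele in (allele_a, allele_b):
    size = amplicon_length(allele, LEFT, RIGHT)
    print(size, repeat_count(size, LEFT, RIGHT, 2))
# 26 7
# 32 10
```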

33
Microsatellites vs minisatellites

 Microsatellites are more useful than minisatellites for marker analysis because they are:
- Shorter
- Easier to amplify
- More abundant, and
- More evenly distributed throughout the genome

 The large number of alleles and high levels of variability among closely related organisms made PCR-amplified microsatellites the marker system of choice for a wide variety of applications
34
2. Transposable elements


 First discovered by Barbara McClintock in maize more than 50 years ago

 Mobile genetic elements
 able to change their position within the genome
 acquired their current genomic location by transposition
The mechanisms of transposition can be divided into two classes
Class I transposons
o disperse via an RNA intermediate
o the RNA is reverse-transcribed into a cDNA, which is then inserted into the genome
o more commonly called retrotransposons

Class II transposons
o Propagate (jump) via a DNA intermediate
36
[Figure: Class I transposition, “copy-and-paste” (eukaryotes only)]
[Figure: Class II transposition, “cut-and-paste” (prokaryotes and eukaryotes)]
40
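The contrast between “copy-and-paste” and “cut-and-paste” can be sketched with a toy string model. The genome string, the element label TE and the function names are hypothetical, and the model ignores the RNA and cDNA steps as well as target-site duplications.

```python
# Toy model of the two transposition mechanisms on a string "genome".
# ASSUMPTION: lowercase letters are flanking DNA and "TE" is a made-up
# element label; this shows only the change in copy number and position.

def copy_and_paste(genome, element, target):
    """Class I style: the original copy stays and a new copy is inserted."""
    return genome[:target] + element + genome[target:]

def cut_and_paste(genome, element, target):
    """Class II style: the element is excised, then reinserted at `target`
    (a position in the genome after excision)."""
    source = genome.index(element)
    excised = genome[:source] + genome[source + len(element):]
    return excised[:target] + element + excised[target:]

genome = "aaaTEbbbccc"
print(copy_and_paste(genome, "TE", 8))   # aaaTEbbbTEccc (two copies)
print(cut_and_paste(genome, "TE", 6))    # aaabbbTEccc (still one copy)
```

Copy number grows with every copy-and-paste event but stays constant under cut-and-paste, which fits the later observation that DNA transposons usually have lower copy numbers.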
Retrotransposons
 According to their genomic organization and gene content, retrotransposons may
be further divided into:
1. Retroviruses
2. Long terminal repeat (LTR) retrotransposons
3. Long interspersed elements (LINEs)
4. Short interspersed elements (SINEs)

 LINEs and SINEs are also referred to as non-LTR retrotransposons

 For each type of retrotransposon, active as well as defective copies have been found

 In general, inactive elements outnumber active copies by a factor of several thousand

 Mobile elements in eukaryotes are dominated by retrotransposons rather than by DNA transposons
41
Retroviruses
 Are distinguished from other types of retroelements by the presence
of an env gene in their genome
 The protein encoded by this gene allows retroviruses to enter and
leave their host cell
 the only infectious type of retroelement

 typical host organisms are the vertebrates

 It was hypothesized that retroviruses evolved from LTR retrotransposons
 Characterized by the presence of about 300- to 500-bp-long direct repeats at both ends of the element

42
Retroviruses cont..
 A retrovirus encodes
1. A capsid protein, which packages the viral RNA into a virus-like particle
2. An RNase (RNase H)
3. A reverse transcriptase, which generates a cDNA from the full-sized message
4. A protease, which is needed for processing the polyprotein, and
5. An endonuclease, which serves as an integrase

43
LTR Retrotransposons
 An RNA intermediate is transcribed from the mobile element by RNA polymerase

 Reverse transcriptase then converts the RNA into double-stranded DNA
 Like retroviruses
44
Non-LTR retrotransposons
A. Long Interspersed Elements (LINEs) :
 An interesting and heterogeneous class of sequences comprised in part of
transposons and retrotransposons.

 Elements that are 3,000 - 5,000 bp in length and are dispersed (interspersed) throughout genomes

 Clearly mobile (able to “move” from location to location within a genome) and inducible.

 Definite involvement of transposable elements in mutation and chromosomal rearrangement. Example: LINEs of ≈6 kb in humans account for 21% of the genome
45
General Principles of LINE transposition
[Figure: Lodish et al., Molecular Cell Biology, 7th ed., Fig 10-16]
46
B. Short Interspersed Elements (SINEs)

 150-300 bp repeated elements, typically flanked by 8-20 bp direct repeats called ‘target-site duplications’ (characteristic of “insertion” sequences)

 Exhibit a highly variable pattern among organisms, e.g. ≈300 bp in humans, accounting for 13% of the genome

 SINE sequences are transcribed but are not translated; in humans, Alu sequences are found in 20% of hnRNA (pre-mRNA) but are removed during mRNA processing

 Now thought to be possibly ‘mobilized’ by LINEs

 The function of SINE sequences is unknown; ad hoc suggestions include transcription regulation and regulation of mRNA processing
47
Class II transposons

 Disperse via a DNA intermediate

 Characterized by short terminal inverted repeats (TIRs)

 The internal regions encode one or two genes responsible for transposition

 Transposition usually follows a nonreplicative cut-and-paste mechanism

 Copy numbers are small to intermediate (usually fewer than a few hundred)

 Comprise only a small part of the genome

 They often integrate in gene-rich regions

 This makes them useful tools for gene isolation by transposon tagging

 Most mobile elements in bacteria are DNA transposons
48
49
Classification of Class II transposons

 In plants, they can be grouped into at least four superfamilies,

 Three of which (Ac, CACTA, Mu) were first characterized in maize

 Transposons of the Ac family (e.g., Ac in maize, P-elements in Drosophila) carry a single gene (encoding a transposase)

 Transposons of the CACTA family carry two genes, encoding a transposase and a DNA-binding protein

50
Unclassified transposons

 Miniature inverted-repeat transposable elements (MITEs) are a superfamily of transposons that are characterized by
• Small size (<500 bp)
• Short TIRs
• AT-richness
• High levels of internal sequence divergence
• The potential to form secondary structures
• Relatively large copy numbers (typically > 1000 per haploid genome), and
• A preference for sequence-defined integration sites such as TA or TAA

 More than 10 MITE families have been characterized
 There is no sequence homology between the various families

51
Transposons as molecular markers

• A wide variety of molecular marker techniques use PCR primers directed toward transposable elements, either alone or in combination with other types of primers

• Thus, LTR retrotransposon-specific primers have been combined with microsatellite-specific primers

• or with AFLP primers in sequence-specific amplification polymorphism (S-SAP)

• AFLP primers were also used together with primers specific for DNA transposons

52
Eukaryotic Repetitive DNA

Tandem repeats
  Satellites
  Minisatellites
  Microsatellites

Interspersed repeats
  DNA transposons
  RNA transposons
    LTRs
    Non-LTRs
      LINEs
      SINEs
53
2.2. Genetic diversity and basis of polymorphism

 What is genetic diversity ?

 How is genetic diversity generated?

 Why is genetic diversity important?

 What happens when genetic diversity is low?

 How do we stop genetic diversity loss?

 What is polymorphism in genetic diversity?

55
Genetic diversity

 What is genetic diversity?

 Genetic: related to traits passed from parent to offspring

 Diversity: having a range of different things

 Genetic diversity: the range of different inherited traits within a species.

 The combined differences in the DNA of all individuals in a species make up the genetic diversity

56
Cont’d..

 The overall diversity in the DNA between the individuals of a species.
 It causes individuals to have different characteristics

 In a species with high genetic diversity, there would be many individuals with a wide variety of different traits.

 E.g. although all apples belong to the same species, the apples we eat come in hundreds of varieties that range from
 red to green,
 tart to sweet, and
 some apples even have pink flesh inside
57
How is Genetic Diversity Generated?

 Compared to coding sequences, repetitive DNAs are considered fast-evolving genome components.
 Their variable abundance, high sequence variation and distinct chromosomal distributions contribute to genome divergence among species.

 Mutation of genes, genetic drift and gene flow are also responsible for genetic
diversity.

 Mutations can arise when mistakes are made while cells are copying DNA.
These mutations make up a species’ genetic diversity.

 Most mutations are either harmful or have no impact at all, but sometimes
these mutations can cause changes that are helpful for a species
58
Why is Genetic Diversity Important?

 The individuals that have these helpful mutations might have greater chances of survival, and have more offspring as a result. This is adaptation
 Adaptation: the process of a species changing in order to better survive in its environment

 In addition, genetic diversity strengthens the ability of species and populations to resist diseases, pests and other stresses.

59
What Happens When Genetic Diversity is Low?

 When few mutations are found in the DNA of a species, genetic diversity is said to be low.

 Low genetic diversity:
 means that there is a limited variety of alleles for genes within that species, and so there are not many differences between individuals.
 means that there are fewer opportunities to adapt to environmental changes.
 often occurs due to habitat loss.

60
Cont’d…

 If genetic diversity gets too low, species can go extinct and be lost forever due to the combined effects of inbreeding depression and failure to adapt to change.

In such cases, the introduction of new alleles can save a population. This is called genetic rescue.

Genetic rescue: a conservation strategy in which new individuals are moved into a population to increase genetic diversity and improve population health.
61
How Do We Stop Genetic Diversity Loss?

 The following strategies can help to stop genetic diversity loss:
 preserve and protect genetic diversity
 use nature reserves and wildlife bridges to reconnect wild populations that have become separated by our cities and highways.
 restore habitats, because this will allow wild populations to get bigger.
 sometimes remove harmful stressors and pests

62
Cont’d…

 Reintroduce species that have been lost from habitats they used to live in.

 It is important to protect genetic diversity because it is the foundation for healthy species.

 Healthy species are necessary for human health and for the health of the whole planet

63
What is polymorphism?

Polymorphism

 Presence of two or more variant forms of a specific DNA sequence that can occur among different individuals or populations.

 Mutations are the basis of polymorphism

 So, genetic polymorphism determines the diversity of individuals.
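The definition can be made concrete with a small sketch that counts the variant forms (alleles) observed at one locus. The sample data, repeat counts at a hypothetical (CA)n microsatellite, and the function names are invented for illustration.

```python
from collections import Counter

# Allele counting at a single locus.
# ASSUMPTION: the sample values are made-up repeat numbers at a hypothetical
# (CA)n locus; a locus is called polymorphic here if two or more variants occur.

def allele_frequencies(alleles):
    """Relative frequency of each variant form in the sample."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def is_polymorphic(alleles):
    """True if at least two variant forms are present."""
    return len(set(alleles)) >= 2

sample = [7, 7, 9, 10, 7, 9, 7, 10]
print(allele_frequencies(sample))   # {7: 0.5, 9: 0.25, 10: 0.25}
print(is_polymorphic(sample))       # True
print(is_polymorphic([7, 7, 7]))    # False
```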
64