DNA Sequencing Strategies
1995: Craig Venter and Hamilton Smith sequenced the genome of Haemophilus influenzae (1,830,137 bp), the first free-living organism to be sequenced.
Mycoplasma genitalium: the smallest free-living organism (580,000 bp; 470 genes).
Genome sequencing depends on several core technologies:
▪ DNA sequencing: In particular, almost all DNA sequencing in the genome projects described here was performed using the chain-termination approach pioneered by Sanger.
▪ Library construction: Also essential to
high-throughput sequencing is the ability to
generate libraries of genomic clones and
then cut portions of these clones and
introduce them into other vectors.
▪ PCR amplification: The use of the polymerase chain reaction (PCR) to amplify DNA, developed in the 1980s, is another technique at the core of genomics.
▪ Hybridization technique: Finally, genomics relies on the hybridization of one nucleic acid to another to detect and quantify DNA and RNA, as in Southern blotting. This method remains the basis for genomics techniques such as microarrays.
STEPS OF GENOME SEQUENCING
▪ Break genome into smaller fragments
▪ Sequence those smaller pieces
▪ Piece the sequences of the short fragments together
In this approach, every part of the genome is actually sequenced roughly 4-5 times, so that it is unlikely that any part of the genome is left out.
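The 4-5× figure can be made concrete with a small calculation. The sketch below assumes the standard Lander–Waterman (Poisson) model, in which reads land at random positions in the genome; under that assumption the expected fraction of bases covered by no read at average coverage c is e^(−c). The function names are illustrative only.

```python
import math


def coverage(n_reads: int, read_len: int, genome_len: int) -> float:
    """Average number of times each base is sequenced: c = N * L / G."""
    return n_reads * read_len / genome_len


def fraction_unsequenced(c: float) -> float:
    """Expected fraction of bases hit by no read, e^(-c), assuming random (Poisson) read placement."""
    return math.exp(-c)


if __name__ == "__main__":
    for c in (1, 4, 5, 10):
        print(f"{c:>2}x coverage: {fraction_unsequenced(c):.3%} of bases expected to be missed")
```

Under this model, roughly 0.7-1.8% of bases are still expected to be missed at 4-5× coverage. Random sampling therefore reduces gaps but never guarantees that every base is covered.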
Markers used in genome mapping include RFLPs, VNTRs, sequence-tagged sites (STSs), and microsatellites.
A. Restriction Fragment Length Polymorphisms (RFLP)
Polymorphism means that a genetic locus has different forms, or alleles.
Cutting the DNA from two individuals with the same restriction enzyme may yield fragments of different lengths. These are called restriction fragment length polymorphisms (RFLPs), a term usually pronounced “rifflip”.
❑ The pattern of RFLPs generated depends mainly on
– 1) The sequence differences between the DNAs of the selected strains or species
– 2) The restriction enzymes used
– 3) The DNA probe employed for Southern hybridization
Steps:
a. Consider the restriction enzyme HindIII, which recognizes the sequence
AAGCTT.
b. Of two individuals, one has three HindIII sites in a region of a chromosome, so cutting its DNA with HindIII yields two fragments from that region, 2 and 4 kb long.
Figure: Detecting an RFLP
c. The other individual may lack the middle site but have the other two, so cutting its DNA with HindIII yields a single 6-kb fragment from that region. These fragments of different lengths are RFLPs.
d. These restriction fragments, whose lengths differ between genotypes, can be detected on Southern blots using a suitable probe. An RFLP is detected as a difference in the migration of a band between gel lanes from different species or strains. Each such band is regarded as a single RFLP locus, so differences among the DNAs of individuals are easy to see.
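The HindIII example in steps a-c can be simulated in a few lines. This is a minimal sketch: it assumes HindIII cuts the top strand after the first base of AAGCTT (A^AGCTT), scans only that strand, and uses hypothetical poly-C spacers sized so that the fragments come out at 2, 4 and 6 kb. The helper names are illustrative, not from any library.

```python
import re

HINDIII_SITE = "AAGCTT"  # HindIII recognition sequence
CUT_OFFSET = 1           # HindIII cuts A^AGCTT, i.e. after the first base of the site


def cut_positions(seq: str, site: str = HINDIII_SITE, offset: int = CUT_OFFSET) -> list[int]:
    """Coordinates at which the enzyme cuts the top strand of `seq`."""
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq)]


def fragments_between_sites(seq: str) -> list[int]:
    """Lengths of the fragments lying between consecutive HindIII sites."""
    cuts = cut_positions(seq)
    return [right - left for left, right in zip(cuts, cuts[1:])]


def spacer(n: int) -> str:
    """Hypothetical filler DNA; a poly-C run can never contain AAGCTT."""
    return "C" * n


# Individual 1: three HindIII sites (6-bp site + spacer = site-to-site distance).
allele_1 = spacer(500) + HINDIII_SITE + spacer(1994) + HINDIII_SITE + spacer(3994) + HINDIII_SITE + spacer(500)
# Individual 2: a point mutation (AAGCTT -> AAGCTA) destroys the middle site.
allele_2 = spacer(500) + HINDIII_SITE + spacer(1994) + "AAGCTA" + spacer(3994) + HINDIII_SITE + spacer(500)

print(fragments_between_sites(allele_1))  # [2000, 4000]
print(fragments_between_sites(allele_2))  # [6000]
```

The sketch reports only the fragments between sites. As in step d, a probe from this region detects those bands. The end pieces depend only on where the simulated sequence happens to stop.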
Limitations
➢ Requires a relatively large amount of highly pure DNA.
➢ Identifying suitable markers and restriction enzymes is laborious and expensive.
➢ Time-consuming.
➢ Requires expertise in autoradiography because radioactively labeled probes are used.
B. Variable Number of Tandem Repeats (VNTRs)
The more polymorphic a marker is, the more useful it is for mapping. Because RFLPs are often not very polymorphic, mapping with them can be very tedious. In this case variable number tandem repeats (VNTRs), which are much more polymorphic, are more useful.
Tandem repeats occur in DNA when a pattern of one or more nucleotides is
repeated and the repetitions are directly adjacent to each other.
An example would be ATTCGATTCGATTCG, in which the five-nucleotide unit ATTCG is repeated three times in a row.
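VNTR alleles differ only in how many times the repeat unit occurs, so genotyping one amounts to counting back-to-back copies of a unit. The sketch below does this by brute force, starting from every position in the sequence. The function name and the two alleles are hypothetical and chosen only for illustration.

```python
def longest_tandem_run(seq: str, unit: str) -> int:
    """Largest number of directly adjacent copies of `unit` found anywhere in `seq`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    best = 0
    for start in range(len(seq)):
        copies, pos = 0, start
        # Extend the run one unit at a time while the next copy follows immediately.
        while seq.startswith(unit, pos):
            copies += 1
            pos += len(unit)
        best = max(best, copies)
    return best


# Two hypothetical alleles of the same locus, differing only in copy number.
allele_a = "GG" + "ATTCG" * 3 + "CC"
allele_b = "GG" + "ATTCG" * 7 + "CC"
print(longest_tandem_run(allele_a, "ATTCG"), longest_tandem_run(allele_b, "ATTCG"))  # 3 7
```

Scanning from every start position is quadratic in the worst case, but it is simple and cannot miss a run that begins out of phase with an earlier match.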
C. Sequence-Tagged Sites (STSs)
Another kind of genetic marker, which is very useful to genome mappers, is the sequence-tagged site (STS).
•STSs are short sequences, about 60–1000 bp long, that can be easily
detected by PCR using specific primers.
•The sequences at both ends of an STS are known (the DNA between them may be known or unknown), so one can design primers that will hybridize to these ends and allow PCR to produce double-stranded fragments of predictable length. If a product of the proper size appears, then the DNA has the STS of interest.
•One great advantage of STSs as a mapping tool is that no DNA needs to be cloned and examined.
•Instead, the sequences of the primers used to generate an STS are published and
then anyone in the world can order those same primers and find the same STS in an
experiment that takes just a few hours.
In this example, two PCR primers (red) spaced 250 bp apart have been used. Several cycles of PCR generate many double-stranded PCR products that are precisely 250 bp long.
Electrophoresis of this product allows one to measure its size exactly and confirm that it is the correct one.
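An STS assay can be mimicked in silico. Find the forward primer on the template, then find the reverse complement of the reverse primer downstream of it. The distance spanned is the size of the expected PCR product. This is a minimal sketch that assumes exact primer matches on one strand of the template. The primer sequences and templates are made up, and real PCR tolerates some mismatches.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A<->T, C<->G, then reversed)."""
    return seq.translate(COMPLEMENT)[::-1]


def pcr_product_length(template: str, fwd: str, rev: str) -> Optional[int]:
    """Size of the product the primer pair would amplify, or None if the STS is absent."""
    start = template.find(fwd)
    if start == -1:
        return None
    # The reverse primer anneals to the other strand, so on this strand its site reads as its reverse complement.
    rev_site = reverse_complement(rev)
    site = template.find(rev_site, start + len(fwd))
    if site == -1:
        return None
    return site + len(rev_site) - start


# Hypothetical 20-nt primers placed so that they span 250 bp, as in the example above.
FWD = "AGCTGACCTGAAGTCCGTTA"
REV = "TTGCAGGATCCGACTAGCAA"
insert = "N" * (250 - len(FWD) - len(REV))
template_with_sts = "N" * 100 + FWD + insert + reverse_complement(REV) + "N" * 100
template_without_sts = "N" * 500

print(pcr_product_length(template_with_sts, FWD, REV))     # 250
print(pcr_product_length(template_without_sts, FWD, REV))  # None
```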
Fig: Mapping with STSs.
At top left, several representative BACs are shown, with different symbols representing different STSs placed at
specific intervals. In step (a) of the mapping procedure, screen for two or more widely spaced STSs. In this case
screen for STS1 and STS4. All those BACs with either STS1 or 4 are shown at top right. The identified STSs are shown
in color. In step (b), each of these positive BACs is further screened for the presence of STS2, STS3, and STS5. The
colored symbols on the BACs at bottom right denote the STSs detected in each BAC. In step (c), align the STSs in
each BAC to form the contig. Measuring the lengths of the BACs by pulsed-field gel electrophoresis helps to pin
down the spacing between pairs of BACs.
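Step (c) rests on a simple rule: two BACs that contain the same STS must overlap. The sketch below applies only that rule. It groups hypothetical BACs into contigs, meaning connected groups of overlapping clones. It does not attempt the STS ordering or the pulsed-field size measurements described in the caption, and the BAC and STS names are made up.

```python
from collections import defaultdict


def contigs_from_sts(bacs: dict[str, set[str]]) -> list[list[str]]:
    """Group BACs into contigs, treating any two BACs that share an STS as overlapping."""
    sts_to_bacs: dict[str, set[str]] = defaultdict(set)
    for bac, markers in bacs.items():
        for sts in markers:
            sts_to_bacs[sts].add(bac)

    seen: set[str] = set()
    contigs: list[list[str]] = []
    for first in sorted(bacs):
        if first in seen:
            continue
        # Depth-first search over the "shares an STS" relation.
        stack, group = [first], []
        seen.add(first)
        while stack:
            bac = stack.pop()
            group.append(bac)
            for sts in bacs[bac]:
                for neighbor in sts_to_bacs[sts]:
                    if neighbor not in seen:
                        seen.add(neighbor)
                        stack.append(neighbor)
        contigs.append(sorted(group))
    return contigs


# Hypothetical screening results: which STSs were detected in each BAC.
bacs = {
    "BAC_A": {"STS1", "STS2"},
    "BAC_B": {"STS2", "STS3"},
    "BAC_C": {"STS3", "STS4", "STS5"},
    "BAC_D": {"STS7"},
}
print(contigs_from_sts(bacs))  # [['BAC_A', 'BAC_B', 'BAC_C'], ['BAC_D']]
```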
D. Microsatellites
STSs are very useful in physical mapping, or locating specific sequences in the genome. But an STS can be used for genetic mapping only if it is polymorphic, and many STSs are not.
▪ Fortunately, geneticists have discovered a class of STSs called microsatellites.
Figure: A DNA sequence containing a microsatellite, a short core sequence repeated many times in a row.
▪ Microsatellites are similar to minisatellites in that they consist of a core sequence repeated over and over many times in a row.
▪ The core sequence in typical microsatellites is smaller, however: usually only 2–4 bp long.
▪ Microsatellites are highly polymorphic. They are also widespread and relatively uniformly distributed in the human genome.
▪ The number of repeats varies quite a bit from one individual to another.
▪ Thus, they are ideal as markers for both linkage and physical mapping.
▪ In 1992, Jean Weissenbach and colleagues produced a linkage map of the entire
human genome based on 814 microsatellites containing a C–A
dinucleotide repeat.
▪ The most common way to detect microsatellites is to design PCR primers that are unique to one locus in the genome and that base-pair with the sequences on either side of the repeated portion.
▪ Therefore, a single pair of PCR primers will work for every individual in the species and produce different-sized products for each of the different-length microsatellites.
▪ The PCR products are then separated by electrophoresis, so the investigator can determine the size of each PCR product and thus how many times the dinucleotide ("CA") was repeated in each allele.
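Converting a measured product size into a repeat count is simple arithmetic once the length of the non-repeat part of the product is known. The sketch below assumes a hypothetical locus whose primers and unique flanking DNA contribute 100 bp to every product, so that only the CA-repeat portion varies between alleles.

```python
def repeat_count(product_bp: int, flank_bp: int, unit_bp: int = 2) -> int:
    """Number of repeat units in a microsatellite PCR product of size `product_bp`."""
    repeat_bp = product_bp - flank_bp  # the part of the product made of repeats
    if repeat_bp < 0 or repeat_bp % unit_bp != 0:
        raise ValueError("product size is inconsistent with the flank length and repeat unit")
    return repeat_bp // unit_bp


FLANK_BP = 100  # hypothetical: primers plus unique flanking DNA on both sides of the (CA)n run

# Hypothetical genotype: a heterozygous individual shows two bands, one per allele.
for allele_bp in (130, 136):
    print(f"{allele_bp} bp product -> {repeat_count(allele_bp, FLANK_BP)} CA repeats")
```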
WHOLE-GENOME SHOTGUN SEQUENCING
The shotgun-sequencing strategy, first proposed by Craig Venter, Hamilton Smith, and Leroy Hood in 1996, bypasses the mapping stage and goes right to the sequencing stage.
This method was employed by Celera Genomics, a private company that was trying to monopolise the human genome sequence by patenting it. To do this, it had to beat the publicly funded project, so it adopted whole-genome shotgun sequencing.
1. BAC library: A BAC library of random fragments of the human genome is generated by restriction digestion followed by cloning.
The sequencing starts with a set of BAC clones containing very large DNA inserts, averaging about 150
kb. The insert in each BAC is sequenced on both ends using an automated sequencer that can usually read about
500 bases at a time, so 500 bases at each end of the clone will be determined.
Assuming that 300,000 clones of human DNA are sequenced this way, that would generate 300
million bases of sequence, or about 10% of the total human genome. These 500-base sequences serve as an
identity tag, called a sequence-tagged connector (STC), for each BAC clone. This is the origin of the term
connector—each clone should be “connected” via its STCs to about 30 other clones
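The figures in this paragraph can be checked directly. The sketch below assumes a haploid human genome of about 3 billion bp, which is implied by the 10% figure, and assumes both STCs of every clone are 500 bases long. It reproduces the 10% figure and the "about 30" connections: on average one STC falls every 5 kb, so a 150-kb insert contains about 30 of them.

```python
GENOME_BP = 3_000_000_000  # assumed haploid human genome size (~3 billion bp)
N_CLONES = 300_000         # BAC clones end-sequenced
READ_BP = 500              # bases read from each end of a clone
INSERT_BP = 150_000        # average BAC insert size

stc_bp = N_CLONES * 2 * READ_BP                 # two STCs per clone
fraction_of_genome = stc_bp / GENOME_BP
stc_spacing_bp = GENOME_BP / (N_CLONES * 2)     # average distance between neighboring STCs
stcs_per_insert = INSERT_BP / stc_spacing_bp    # STCs of other clones falling inside one insert

print(f"{stc_bp:,} bp of STC sequence = {fraction_of_genome:.0%} of the genome")
print(f"one STC every {stc_spacing_bp:,.0f} bp -> about {stcs_per_insert:.0f} STCs per 150-kb insert")
```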
Steps:
1. BAC library
2. Fingerprinting
3. Plasmid library
4. BAC walking
5. Powerful computer program
3. Plasmid library: A seed BAC is selected for sequencing. The seed BAC is subcloned into a plasmid vector by dividing its insert into smaller fragments of only about 2 kb, and a plasmid library is prepared by transforming E. coli with these plasmids. Once the whole seed BAC has been sequenced, its sequence allows the identification of the 30 or so other BACs that overlap with it: they are the ones whose STCs occur somewhere in the seed BAC.
4. BAC walking: Three thousand of the plasmid clones are sequenced, and the sequences are ordered by their overlaps, producing the sequence of the whole 150-kb BAC. Next, find the BACs (about 30) with overlapping STCs, compare them by fingerprinting to find those with minimal overlap with the seed, and sequence them. This strategy, called BAC walking, would in principle allow one laboratory to sequence the whole human genome.
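The walking rule is to extend from the current BAC into the overlapping BAC with the smallest overlap. It can be sketched if one pretends that the BAC coordinates are already known. In practice they are not, and fingerprinting only estimates overlaps. The BAC names and kb coordinates below are hypothetical, and the sketch walks in one direction only.

```python
from typing import NamedTuple


class Bac(NamedTuple):
    name: str
    start: int  # kb, hypothetical coordinates
    end: int


def bac_walk(seed: Bac, bacs: list[Bac], max_steps: int = 100) -> list[str]:
    """Walk rightward from `seed`, each time choosing the overlapping BAC with minimal overlap."""
    path, current = [seed.name], seed
    for _ in range(max_steps):
        end = current.end
        # Candidates overlap the current BAC and extend beyond its right end.
        candidates = [b for b in bacs if b.start < end < b.end]
        if not candidates:
            break
        # Minimal overlap = smallest (end - b.start), i.e. the largest new extension per overlap.
        current = min(candidates, key=lambda b: end - b.start)
        path.append(current.name)
    return path


seed = Bac("seed", 0, 150)
library = [Bac("B1", 20, 170), Bac("B2", 130, 280), Bac("B3", 100, 260),
           Bac("B4", 270, 420), Bac("B5", 200, 350)]
print(bac_walk(seed, library))  # ['seed', 'B2', 'B4']
```

Choosing the minimal overlap means each step adds as much new sequence as possible for each BAC sequenced. The walk always terminates because every step strictly increases the right end.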
5. Powerful computer program: BAC walking would take far too long, however, so Venter and colleagues modified the procedure by sequencing BACs at random until they had about 35 billion bp of sequence. In principle that should cover the human genome about ten times over, giving a high degree of coverage and accuracy. Then they fed all the sequence into a computer with a powerful program that found areas of overlap between clones and fit their sequences together, building the sequence of the whole genome.
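The overlap-and-merge idea in step 5 can be illustrated with a toy greedy assembler, which repeatedly finds the two reads with the longest suffix-prefix overlap and merges them. This is a deliberately simplified sketch, not Celera's assembler. It assumes error-free reads from one strand and a minimum overlap of 3 bases, and greedy merging is easily misled by repeated sequences. The four reads are made up for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is a prefix of `b` (at least min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads greedily by longest overlap; returns the resulting contig(s)."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    # Drop reads wholly contained in another read; they add no new sequence.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining reads are separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)] + [merged]
    return reads


reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```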