
DNA SEQUENCING STRATEGIES

WHAT IS GENOME ???


❑ Genome: One complete set of genetic
information (total amount of DNA) from a haploid set of
chromosomes of a single cell in eukaryotes, in a single
chromosome in bacteria, or in the DNA or RNA of viruses.

❑ The basic set of chromosomes in an organism.

“The whole hereditary information of an organism that is encoded in the DNA”
• In cytogenetics, genome means a single set of chromosomes.
• It is denoted by x. The number of genomes an organism carries depends on its ploidy level.
• In Drosophila melanogaster (2n = 2x = 8); genome x = 4.
• In hexaploid Triticum aestivum (2n = 6x = 42); genome x = 7.
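
The arithmetic behind the two examples above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `basic_chromosome_number` is made up here, and it assumes the somatic count (2n) is an exact multiple of the ploidy level.

```python
def basic_chromosome_number(somatic_count: int, ploidy: int) -> int:
    """Return x, the number of chromosomes in one genome, from 2n = ploidy * x."""
    if somatic_count % ploidy != 0:
        raise ValueError("somatic count is not a multiple of the ploidy level")
    return somatic_count // ploidy


# Values taken from the slide: Drosophila (2n = 2x = 8), bread wheat (2n = 6x = 42).
drosophila_x = basic_chromosome_number(8, 2)
wheat_x = basic_chromosome_number(42, 6)
print(drosophila_x, wheat_x)
```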
The genome is found inside every cell, and in those cells that have a
nucleus, the genome is situated inside the nucleus. Specifically,
it is all the DNA in an organism.

❑ The term genome was introduced by H. Winkler in 1920 to
denote the complete set of chromosomal and extrachromosomal
genes present in an organism, including a virus.
TYPES OF GENOME ???
1. Prokaryotic Genomes
2. Eukaryotic Genomes
• Nuclear Genomes
• Mitochondrial Genomes
• Chloroplast Genomes
If not specified, “genome” usually refers to the nuclear genome.

WHAT IS GENOMICS ???


• Genomics is the study of the structure and function of
whole genomes.
• Genomics is the comprehensive study of whole sets of genes
and their interactions rather than single genes or proteins.

• According to T.H. Roderick, genomics is the mapping and
sequencing to analyze the structure and organization of the genome.
ORIGIN OF GENOMICS ???
• The term genome was used by the German botanist Hans Winkler
in 1920.
• It originally meant the collection of genes in a haploid set of chromosomes.
• Now it encompasses all the DNA in a cell.

❑ The field includes studies of intragenomic phenomena such as
heterosis, epistasis, pleiotropy and other interactions between loci and
alleles within the genome.
❑ The sequence information of the genome will show:
▪ The position of every gene along the chromosome,
▪ The regulatory regions that flank each gene, and
▪ The coding sequence that determines the protein produced by each gene.

❑ How is Genomics different from Genetics?


Genetics is the study of inheritance, and genomics is the
study of genomes.
– Genetics looks at single genes, one at a time, like a
picture or snapshot.
TYPES OF GENOMICS
1. Structural: It deals with the determination of the complete sequence of genomes
and the gene map. This has progressed in steps as follows:
(i) construction of high-resolution genetic and physical maps,
(ii) sequencing of the genome, and
(iii) determination of the complete set of proteins in an organism.
2. Functional: It refers to the study of the functioning of genes,
their regulation and their products (metabolic pathways),
i.e., the gene expression patterns in an organism.
3. Comparative: It compares genes from different genomes to
elucidate functional and evolutionary relationships.
GENOME SEQUENCING
Genome sequencing is the technique that allows researchers
to read the genetic information found in the DNA of anything from
bacteria to plants to animals. Sequencing involves determining the
order of the bases, the nucleotide subunits adenine (A), guanine (G),
cytosine (C) and thymine (T), found in DNA.

In short, genome sequencing is figuring out the order of DNA nucleotides.

CHALLENGES OF GENOME SEQUENCING

▪ Data are produced in the form of short reads, which have to be assembled correctly into
large contigs and chromosomes.
▪ Short reads can contain low-quality bases and vector/adaptor
contamination.
▪ Several genome assemblers are available, but their performance has to be
compared to find the best one for a given data set.
MILESTONE OF GENOME SEQUENCING
1977; Fred Sanger; φX174 bacteriophage (first sequenced genome);
5,375 bp
Amino acid sequence of phage proteins
Overlapping genes only in viruses

Fig: The genetic map of phage φX174 (overlapping reading frames)
1995; Craig Venter & Hamilton Smith;
Haemophilus influenzae (1,830,137 bp) (1st free living).
Mycoplasma genitalium (smallest free-living, 580,000 bp; 470 genes)

1996; Saccharomyces cerevisiae; (1st eukaryote) 12,068,000 bp

1997; Escherichia coli; 4,639,221 bp; Genetically more important.

1999; Human chromosome 22; 53,000,000 bp

2000; Drosophila melanogaster; 180,000,000 bp

2001; Human; Working draft; 3,200,000,000 bp

2002; Plasmodium falciparum; 23,000,000 bp

Anopheles gambiae; 278,000,000 bp

Mus musculus; 2,500,000,000 bp

2003; Human; finished sequence, 3,200,000,000 bp

2005; Oryza sativa (first cereal grain); 489,000,000 bp

2006; Populus trichocarpa (first tree); 485,000,000 bp


Technical foundations of genomics
▪ Molecular biology: Almost all of the underlying techniques of genomics
originated with recombinant-DNA technology.
▪ DNA sequencing: In particular, almost all DNA sequencing is still performed
using the approach pioneered by Sanger.
▪ Library construction: Also essential to high-throughput sequencing is the ability to
generate libraries of genomic clones and then cut portions of these clones and
introduce them into other vectors.
▪ PCR amplification: The use of the polymerase chain reaction (PCR) to
amplify DNA, developed in the 1980s, is another technique at the core of genomics.
▪ Hybridization technique: Finally, the hybridization of one nucleic acid to
another is used to detect and quantitate DNA and RNA (Southern blotting). This method
remains the basis for genomics techniques such as microarrays.

[Figure: Log MW versus Distance]
STEPS OF GENOME SEQUENCING
▪ Break genome into smaller fragments
▪ Sequence those smaller pieces
▪ Piece the sequences of the short fragments together
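
The three steps above can be illustrated with a toy example. The sketch below is not how production assemblers work; it is a minimal greedy overlap merge, assuming error-free reads from one strand and a made-up 22-base "genome" with no internal repeats. The names `overlap` and `greedy_assemble` are illustrative only.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


# Step 1: break a toy genome into overlapping fragments (fixed windows here,
# whereas real shotgun fragments are random).
genome = "ATGGCGTACCTTAGAACGTTCA"
reads = [genome[0:10], genome[6:16], genome[12:22]]

# Steps 2 and 3: the "sequenced" reads are pieced back together by overlap.
contigs = greedy_assemble(reads)
print(contigs)
```

With repeats longer than the minimum overlap, a greedy merge like this can join the wrong reads, which is one reason the mapping-based strategy described next was attractive for repeat-rich genomes.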

DNA SEQUENCING APPROACHES

Two different methods are used:


1. Hierarchical shotgun sequencing
-Useful for sequencing genomes of higher vertebrates that
contain repetitive sequences
2. Whole genome Shotgun Sequencing
-Useful for smaller genomes
HIERARCHICAL SHOTGUN SEQUENCING APPROACHES
• The method preferred by the Human Genome Project is
the hierarchical shotgun sequencing method.
• Also known as
– The Clone-by-Clone Strategy
– the map-based method
– map first, sequence later
– top-down sequencing

The Human Genome Project adopted a map-based strategy:

– Start with a well-defined physical map
– Produce the shortest tiling path of large-insert clones
– Assemble the sequence for each clone
– Then assemble the entire sequence, based on the physical map
CLONE-BY-CLONE STRATEGY APPROACHES
1) Markers for regions of the genome are identified.
2) The genome is split into large fragments (50-200 kb), each containing a known marker, using
restriction enzymes.
3) These fragments are cloned in bacteria (E. coli) using BACs (Bacterial Artificial
Chromosomes), where they are replicated and stored.
4) The BAC inserts are isolated and the whole genome is mapped by finding markers regularly
spaced along each chromosome to determine the order of the cloned fragments.
5) The fragments contained in these clones have different ends, so with enough coverage the clones overlap and
can be arranged into a scaffold of BAC contigs. The minimal set of overlapping BACs that covers the entire
genomic area of interest is called the tiling path.
6) Each BAC in the tiling path (the "Golden Path") is fragmented randomly into smaller pieces, and these fragments are
individually sequenced on both strands using automated Sanger sequencing.
7) These sequences are aligned so that identical sequences overlap. Assembly of the genome is done on
the basis of prior knowledge of the markers used to localize sequenced fragments to their genomic location. A
computer stitches the sequences together using the markers as a reference guide.
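
Choosing a tiling path in step 5 can be thought of as an interval-cover problem. The sketch below is a simplified illustration, assuming every clone already has known start and end coordinates on the physical map; the clone names and coordinates (in kb) are invented, and `minimal_tiling_path` is a hypothetical helper, not a tool used by the Human Genome Project.

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Greedy interval cover.

    clones: list of (name, start, end) tuples on the physical map.
    At each step, pick the clone that starts at or before the covered
    position and reaches furthest to the right.
    """
    path = []
    covered = region_start
    by_start = sorted(clones, key=lambda c: c[1])
    i = 0
    while covered < region_end:
        best = None
        # Scan every clone that begins inside the region covered so far.
        while i < len(by_start) and by_start[i][1] <= covered:
            if best is None or by_start[i][2] > best[2]:
                best = by_start[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best[0])
        covered = best[2]
    return path


# Hypothetical BACs with map positions in kb.
bacs = [
    ("A", 0, 150), ("B", 50, 200), ("C", 120, 300),
    ("D", 180, 330), ("E", 290, 450), ("F", 310, 400),
]
tiling_path = minimal_tiling_path(bacs, 0, 450)
print(tiling_path)
```

Here three of the six clones are enough to span the region, so only those three would need to be shotgun sequenced.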

In this approach, every part of the genome is actually
sequenced roughly 4-5 times to ensure that no part of the
genome is left out.

Fig: Hierarchical shotgun sequencing


Each 150,000 bp fragment is inserted into a BAC (bacterial artificial chromosome).
A BAC can replicate inside a bacterial cell. A set of BACs containing an entire
human genome is called a BAC library.

The Clone-by-Clone Strategy was used in:
✓S. cerevisiae (yeast),
✓C. elegans (nematode),
✓Arabidopsis thaliana (mustard weed),
✓Oryza sativa,
✓Drosophila melanogaster and
✓Homo sapiens (Human), etc.
MARKERS USED IN CLONE-BY-CLONE STRATEGY

Different types of markers are used in mapping large genomes, such as:
A. Restriction Fragment Length Polymorphisms (RFLP)

B. Variable Number of Tandem Repeats (VNTRs)

C. Sequence Tagged Sites (STS)

D. Microsatellites, etc.
A. Restriction Fragment Length Polymorphisms (RFLP)
Polymorphism means that a genetic locus has different forms, or alleles.
Cutting the DNA from any two individuals with a restriction enzyme may yield fragments of
different lengths, called restriction fragment length polymorphisms (RFLPs). RFLP is usually
pronounced “rifflip”.
❑ The pattern of RFLPs generated will depend mainly on
– 1) The differences in the DNA of the selected strains or species
– 2) The restriction enzymes used
– 3) The DNA probe employed for Southern hybridization

Steps:
a. Consider the restriction enzyme HindIII, which recognizes the sequence
AAGCTT.
b. Of two individuals, one has three HindIII sites in a region of a chromosome, so cutting the DNA with HindIII
yields two fragments, 2 and 4 kb long.
Figure: Detecting a RFLP
c. The other individual may lack the middle site but have the other two, so cutting the
DNA with HindIII yields one fragment 6 kb long. Such fragment-length differences are called
RFLPs.
d. These restriction fragments of different lengths between the genotypes can be
detected on Southern blots with a suitable probe. An
RFLP is detected as a differential movement of a band in the gel lanes
from different species and strains. Each such band is regarded as a single RFLP
locus. So any differences among the DNA of individuals are easy to see.

e. This RFLP is used as a marker in chromosomal mapping.
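
The HindIII example in steps a to c can be mimicked with a toy digest. This is a scaled-down sketch, not real sequence data: the two "individuals" are synthetic strings in which the 2 kb and 4 kb fragments are shrunk to 20 and 40 bases, and the `digest` helper is an illustrative name. It assumes linear DNA and that HindIII cuts after the first A of AAGCTT.

```python
HINDIII_SITE = "AAGCTT"  # recognition sequence given in the text
CUT_OFFSET = 1           # assumption: HindIII cuts after the first A (A^AGCTT)


def digest(seq: str, site: str = HINDIII_SITE, cut_offset: int = CUT_OFFSET) -> list[int]:
    """Return the fragment lengths produced by cutting a linear DNA at every site."""
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Individual 1 has three sites; individual 2 has lost the middle site
# through a single base change (AAGCTT -> AAGCCT).
individual_1 = "C" * 10 + "AAGCTT" + "C" * 14 + "AAGCTT" + "C" * 34 + "AAGCTT" + "C" * 10
individual_2 = "C" * 10 + "AAGCTT" + "C" * 14 + "AAGCCT" + "C" * 34 + "AAGCTT" + "C" * 10

# Keep only the internal fragments, i.e. those bounded by a site on both sides.
fragments_1 = digest(individual_1)[1:-1]
fragments_2 = digest(individual_2)[1:-1]
print(fragments_1, fragments_2)
```

The first individual gives two internal fragments and the second gives one fragment equal to their sum, which is the length difference a probe would reveal on a Southern blot.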

Limitations
➢ Requires a relatively large amount of highly pure DNA.
➢ Laborious and expensive to identify a suitable combination of marker and restriction
enzyme.
➢ Time consuming.
➢ Requires expertise in autoradiography because radioactively
labeled probes are used.
B. Variable Number of Tandem Repeats (VNTRs)
The greater the degree of polymorphism of an RFLP, the more useful it is for mapping. When an
RFLP has only a few alleles, mapping becomes very tedious; in this case variable number
tandem repeats (VNTRs) are more useful.
Tandem repeats occur in DNA when a pattern of one or more nucleotides is
repeated and the repetitions are directly adjacent to each other.
An example would be:

ATTCGCCAATC ATTCGCCAATC ATTCGCCAATC

in which the sequence ATTCGCCAATC is repeated three times.

• A variable number tandem repeat (or VNTR) is a location in a genome
where a short nucleotide sequence is organized as a tandem repeat.
The repeated sequence is relatively long, about 10-100 base pairs.
• The full genetic profiles of individuals reveal many differences.
• Most human genes are the same from person to person, but
VNTRs tend to differ among different people.
• While the repeated sequences themselves are usually the same from
person to person, the number of times they are repeated tends to vary.
• VNTRs are highly polymorphic. They can be isolated from an
individual’s DNA and are therefore relatively easy to map.
• However, VNTRs have a disadvantage as genetic markers: They tend to
bunch together at the ends of chromosomes, leaving the interiors of the
chromosomes relatively devoid of markers.
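
Because the repeat unit stays the same and only the copy number varies, a VNTR allele can be described by a single integer. The sketch below counts back-to-back copies of the repeat unit used in the example above; the two alleles and their flanking bases are invented for illustration, and `count_tandem_repeats` is a hypothetical helper.

```python
def count_tandem_repeats(seq: str, unit: str) -> int:
    """Return the length (in copies) of the longest run of back-to-back copies of unit."""
    best = 0
    for start in range(len(seq) - len(unit) + 1):
        run = 0
        pos = start
        while seq.startswith(unit, pos):
            run += 1
            pos += len(unit)
        best = max(best, run)
    return best


UNIT = "ATTCGCCAATC"  # repeat unit from the example in the text

# Two made-up alleles at the same locus, differing only in copy number.
allele_a = "GG" + UNIT * 3 + "TT"
allele_b = "GG" + UNIT * 5 + "TT"

copies_a = count_tandem_repeats(allele_a, UNIT)
copies_b = count_tandem_repeats(allele_b, UNIT)
print(copies_a, copies_b)
```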
C. Sequence Tagged Sites (STS)

Another kind of genetic marker, which is very useful to genome mappers, is the
sequence-tagged site (STS).
•STSs are short sequences, about 60–1000 bp long, that can be easily
detected by PCR using specific primers.
•The sequences of small areas of this DNA are known, so one can
design primers that will hybridize to these regions and allow PCR to
produce double-stranded fragments of predictable lengths. If a fragment of the proper size
appears, then the DNA has the STS of interest.
•One great advantage of STSs as a mapping tool is that no DNA must be
cloned and examined.
•Instead, the sequences of the primers used to generate an STS are published and
then anyone in the world can order those same primers and find the same STS in an
experiment that takes just a few hours.

In this example, two PCR primers (red) spaced 250 bp apart
have been used. Several cycles of PCR generate many double-stranded
PCR products that are precisely 250 bp long.
Electrophoresis of this product allows one to measure its size exactly
and confirm that it is the correct one.

Figure : Sequence-tagged sites
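
The STS test can be imitated in software as an "in-silico PCR": find where the two primers would anneal and report the expected product size. This is a minimal sketch that assumes exact primer matches on a single template strand; the primer sequences and the template are invented so that the product is 250 bp, as in the figure, and `pcr_product_length` is an illustrative name.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def pcr_product_length(template: str, forward: str, reverse: str):
    """Length of the expected amplicon, or None if either primer finds no exact match.

    Both primers are written 5' to 3'. The reverse primer anneals to the
    opposite strand, so its reverse complement is searched for on the template.
    """
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return end + len(reverse_site) - start


# Hypothetical primers and a synthetic template built around them.
forward_primer = "GATTACAGGC"
reverse_primer = "TTGACCGTAA"
template = (
    "C" * 30 + forward_primer + "T" * 230
    + reverse_complement(reverse_primer) + "G" * 30
)

product = pcr_product_length(template, forward_primer, reverse_primer)
has_sts = product == 250  # the STS is present if the product has the expected size
print(product, has_sts)
```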


Making physical map using Sequence Tagged Sites (STS)

1. Geneticists interested in physically mapping or sequencing a given region of a
genome aim to assemble a set of clones called a contig, which contains
contiguous (actually overlapping) DNAs spanning long distances.
2. It is essential to have vectors like BACs and YACs that hold big chunks of
DNA. Assuming we have a BAC library of the human genome, we need some
way to identify the clones that contain the region we want to map.
3. A reliable method is to look for STSs in the BACs. It is best to screen the
BAC library for at least two STSs, spaced hundreds of kilobases apart, so
BACs spanning a long distance are selected.
4. After we have found a number of positive BACs, we begin mapping by
screening them for several additional STSs, so we can line them up in an
overlapping fashion as shown in following figure. This set of overlapping
BACs is our new contig. We can now begin finer mapping, and even
sequencing, of the contig.

Fig: Mapping with STSs.
At top left, several representative BACs are shown, with different symbols representing different STSs placed at
specific intervals. In step (a) of the mapping procedure, screen for two or more widely spaced STSs. In this case
screen for STS1 and STS4. All those BACs with either STS1 or 4 are shown at top right. The identified STSs are shown
in color. In step (b), each of these positive BACs is further screened for the presence of STS2, STS3, and STS5. The
colored symbols on the BACs at bottom right denote the STSs detected in each BAC. In step (c), align the STSs in
each BAC to form the contig. Measuring the lengths of the BACs by pulsed-field gel electrophoresis helps to pin
down the spacing between pairs of BACs.
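
Step (c) of the figure, lining up BACs by the STSs they share, can be sketched as follows. This is a simplification that assumes the order of the STSs along the chromosome is already known (STS1 to STS5 numbered left to right) and that every BAC contains at least one STS; the BAC names and STS contents are invented, and `build_contig` is a hypothetical helper.

```python
def build_contig(bac_sts: dict[str, set[int]]) -> list[str]:
    """Order BACs left to right by their STS content and check that neighbours overlap."""
    ordered = sorted(bac_sts, key=lambda bac: (min(bac_sts[bac]), max(bac_sts[bac])))
    for left, right in zip(ordered, ordered[1:]):
        if not bac_sts[left] & bac_sts[right]:
            raise ValueError(f"{left} and {right} share no STS; the contig is broken")
    return ordered


# Screening results: which STSs (by number) were detected in each positive BAC.
screen = {
    "BAC_c": {3, 4},
    "BAC_a": {1, 2},
    "BAC_d": {4, 5},
    "BAC_b": {2, 3},
}
contig = build_contig(screen)
print(contig)
```

Each pair of neighbouring BACs in the result shares one STS, which is the evidence that they physically overlap.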
D. Microsatellites
STSs are very useful in physical mapping or locating specific sequences in the
genome. But sometimes it is not possible to use them for genetic mapping.
▪ Fortunately, geneticists have discovered a class of STSs called
microsatellites.

▪ Microsatellites are similar to minisatellites in that they consist of a
core sequence repeated over and over many times in a row.
▪ The core sequence in typical microsatellites is smaller, usually only 2–4 bp
long.
▪ Microsatellites are highly polymorphic; they are also widespread and
relatively uniformly distributed in the human genome.
▪ The number of repeats varies quite a bit from one individual to another.
▪ Thus, they are ideal as markers for both linkage and physical mapping.
▪ In 1992, Jean Weissenbach et al. produced a linkage map of the entire
human genome based on 814 microsatellites containing a C–A
dinucleotide repeat.

▪ The most common way to detect microsatellites is to design PCR primers that are
unique to one locus in the genome and that base pair on either side of the
repeated portion.
▪ Therefore, a single pair of PCR primers will work for every individual in the
species and produce different-sized products for each of the different-length
microsatellites.
▪ The PCR products are then separated by either gel electrophoresis or capillary electrophoresis. Either way,
the investigator can determine the size of the PCR product and thus how many
times the dinucleotide ("CA") was repeated for each allele.
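
Converting a measured product size into a repeat count is simple arithmetic: subtract the non-repeat part of the amplicon and divide by the unit length. The numbers below are hypothetical (a locus where primers plus flanking sequence add up to 100 bp), and `repeat_count` is an illustrative helper name.

```python
def repeat_count(product_bp: int, flank_bp: int, unit_bp: int = 2) -> int:
    """Number of repeat units in a PCR product.

    product_bp: measured size of the PCR product
    flank_bp:   total length of primers plus non-repeat flanking sequence
    unit_bp:    length of the repeat unit (2 for a CA dinucleotide)
    """
    repeat_bp = product_bp - flank_bp
    if repeat_bp < 0 or repeat_bp % unit_bp != 0:
        raise ValueError("product size is inconsistent with the flank and unit lengths")
    return repeat_bp // unit_bp


# A heterozygous individual giving two bands, 130 bp and 144 bp.
FLANK_BP = 100  # assumed value for this made-up locus
alleles = [repeat_count(size, FLANK_BP) for size in (130, 144)]
print(alleles)
```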
WHOLE GENOME SHOTGUN SEQUENCING
The shotgun-sequencing strategy, first proposed by Craig Venter, Hamilton Smith, and Leroy Hood in
1996, bypasses the mapping stage and goes right to the sequencing stage.
This method was employed by Celera Genomics, a private company that was trying to monopolise
the human genome sequence by patenting it. To do this, it had to try to beat the publicly funded project,
and it therefore adopted whole genome shotgun sequencing.

1. BAC library: A BAC library of random fragments of the human genome is generated by restriction
digestion followed by cloning.
The sequencing starts with a set of BAC clones containing very large DNA inserts, averaging about 150
kb. The insert in each BAC is sequenced on both ends using an automated sequencer that can usually read about
500 bases at a time, so 500 bases at each end of the clone will be determined.
Assuming that 300,000 clones of human DNA are sequenced this way, that would generate 300
million bases of sequence, or about 10% of the total human genome. These 500-base sequences serve as an
identity tag, called a sequence-tagged connector (STC), for each BAC clone. This is the origin of the term
connector: each clone should be “connected” via its STCs to about 30 other clones.
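
The figures quoted for the STC step follow from a short calculation, reproduced here as a sketch. The genome size of 3.2 billion bp is taken from the milestone list earlier in the deck; the estimate of connected clones assumes STCs are spread evenly along the genome, which is an idealization.

```python
GENOME_BP = 3_200_000_000  # human genome size used in the milestone list
CLONES = 300_000           # BAC clones end-sequenced
READ_BP = 500              # bases read from each end
BAC_BP = 150_000           # average BAC insert size

stc_total = CLONES * 2                   # two end reads (STCs) per clone
stc_bases = stc_total * READ_BP          # total sequence generated
genome_fraction = stc_bases / GENOME_BP  # share of the genome covered by STCs

# With evenly spread STCs, one STC falls every GENOME_BP / stc_total bases,
# so a 150-kb BAC is expected to contain this many STCs from other clones:
stc_spacing = GENOME_BP / stc_total
stcs_per_bac = BAC_BP / stc_spacing

print(f"{stc_bases:,} bases, {genome_fraction:.1%} of the genome")
print(f"one STC every {stc_spacing:,.0f} bp, about {stcs_per_bac:.0f} STCs per BAC")
```

The result, a little under 30 STCs per BAC, matches the "about 30 other clones" quoted above.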
Steps:
1. BAC library
2. Fingerprinting
3. Plasmid library
4. BAC walking
5. Powerful computer program

Fig: Whole Genome Shotgun Sequencing Method
2. Fingerprinting: This step is to fingerprint each clone by digesting it with a restriction enzyme. This serves
two important purposes. First, it tells the insert size (the sum of the sizes of all the fragments generated by the
restriction enzyme). Second, it allows one to eliminate aberrant clones whose fragmentation patterns do not fit
the consensus of the overlapping clones. Note that this clone fingerprinting is not the same as mapping; it is just
a simple check before sequencing begins.
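
Both uses of a fingerprint can be shown with toy numbers. In this sketch the fragment sizes are invented, bands are matched by exact size (real gels need a size tolerance), and the helper names `insert_size` and `shared_fraction` are illustrative only.

```python
def insert_size(fragments_bp: list[int]) -> int:
    """Insert size is the sum of the restriction fragment sizes."""
    return sum(fragments_bp)


def shared_fraction(fp_a: list[int], fp_b: list[int]) -> float:
    """Fraction of the smaller fingerprint's bands that also appear in the other."""
    a, b = set(fp_a), set(fp_b)
    return len(a & b) / min(len(a), len(b))


# Hypothetical fingerprints (fragment sizes in bp).
clone_1 = [40_000, 35_000, 30_000, 25_000, 20_000]
clone_2 = [30_000, 25_000, 20_000, 45_000, 28_000]  # overlaps clone_1
aberrant = [12_000, 9_000, 7_000]                   # fits neither pattern

size_1 = insert_size(clone_1)
overlap_12 = shared_fraction(clone_1, clone_2)
overlap_bad = shared_fraction(clone_1, aberrant)
print(size_1, overlap_12, overlap_bad)
```

A clone that is supposed to overlap its neighbours but shares no bands with them, like the third one here, would be set aside before sequencing.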

3. Plasmid library: A seed BAC is selected for sequencing. The seed BAC is subcloned into a plasmid
vector by subdividing the BAC into smaller clones of only about 2 kb. A plasmid library is prepared by
transforming E. coli with the plasmids. The whole BAC sequence, once determined, allows the identification of the 30 or so
other BACs that overlap with the seed: they are the ones with STCs that occur somewhere in the seed BAC.

4. BAC walking: Three thousand of the plasmid clones are sequenced, and the sequences are ordered by their overlaps,
producing the sequence of the whole 150-kb BAC. The BACs (about 30) with overlapping STCs are then found and compared
by fingerprinting to find those with minimal overlaps, and these are sequenced next. This strategy, called BAC walking,
would in principle allow one laboratory to sequence the whole human genome.

5. Powerful computer program: Because BAC walking would take too long, Venter and colleagues modified the
procedure by sequencing BACs at random until they had about 35 billion bp of sequence. In principle that
should cover the human genome ten times over, giving a high degree of coverage and accuracy. Then they fed
all the sequence into a computer with a powerful program that found areas of overlap between clones and
fit their sequences together, building the sequence of the whole genome.
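
The "ten times over" figure is the ratio of total sequence to genome size, usually called coverage. The sketch below computes it and, as an added assumption not stated in the slides, uses the idealized Poisson (Lander-Waterman) model, in which reads land uniformly at random, to estimate how much of the genome would be missed at a given coverage.

```python
import math

GENOME_BP = 3_200_000_000      # human genome size used in the milestone list
SEQUENCED_BP = 35_000_000_000  # total random sequence quoted in the text

coverage = SEQUENCED_BP / GENOME_BP


def fraction_uncovered(c: float) -> float:
    """Poisson approximation: probability that a given base is never sampled."""
    return math.exp(-c)


missed_wgs = fraction_uncovered(coverage)  # whole genome shotgun, about 11-fold
missed_4x = fraction_uncovered(4.5)        # the 4-5 fold quoted for clone-by-clone

print(f"coverage: {coverage:.1f}x")
print(f"expected unsampled fraction at {coverage:.1f}x: {missed_wgs:.2e}")
print(f"expected unsampled fraction at 4.5x: {missed_4x:.2e}")
```

Real genomes fall short of this ideal because repeats and cloning biases leave gaps that extra random sequencing alone does not close.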
