
DNA sequencing: Importance

• The DNA sequences making up any organism comprise the basic blueprint for that organism
The Human Genome Project (and others)
• Potential benefits
• Molecular medicine
▪ Improved diagnosis of disease
– Disease gene identification will lead to more accurate diagnosis
▪ Earlier detection of genetic predispositions to disease
– Will be able to assess risk for certain diseases, e.g. cancer, Type II diabetes, heart disease
▪ Rational drug design
– Drugs designed to target specific gene products that cause disease
▪ Gene therapy and control systems for drugs
– Replacement of defective genes for certain diseases
▪ Pharmacogenomics: “custom drugs”
– Drug therapy based on one’s genotype…
The Human Genome Project (and others)
• Potential benefits
– Bioarchaeology, anthropology, evolution, and
human migration
• Study evolution through germline mutations in
lineages.
• Study migration of different population groups
based on female genetic inheritance.
• Study mutations on the Y chromosome to trace
lineage and migration of males.
• Compare breakpoints in the evolution of mutations
with ages of populations and historical events.
The Human Genome Project (and others)
• Potential benefits
DNA forensics (identification)
• Identify potential suspects whose DNA may match evidence
left at crime scenes.
• Exonerate persons wrongly accused of crimes.
• Identify crime and catastrophe victims.
• Establish paternity and other family relationships.
• Identify endangered and protected species as an aid to wildlife
officials (could be used for prosecuting poachers).
• Detect bacteria and other organisms that may pollute air, water,
soil, and food.
• Determine pedigree for seed or livestock breeds.
The Human Genome Project (and others)
• Potential benefits
Agriculture, livestock breeding, and
bioprocessing
• Disease-, insect-, and drought-resistant crops.
• Healthier, more productive, disease-resistant farm animals.
• More nutritious produce.
• Biopesticides.
• Edible vaccines incorporated into food products.
• New environmental cleanup uses for plants like tobacco.
What to Sequence and Why?
Structure
• De novo whole genome sequencing
– requires de novo whole genome assembly
• Polymorphism discovery (distinct from genotyping!)
– Targeted approaches
– Whole genome
– SNPs, copy number variations, insertions, deletions, etc.
• Expressed sequence discovery
– ESTs
– cDNAs
– miRNAs, etc.
Function
• Functional genomics
– ChIP
– Expression profiling
– Nucleosome positioning
DNA sequencing methodologies: ca. 1977
• Maxam-Gilbert
– base modification by general and specific chemicals.
– depurination or depyrimidination.
– single-strand excision.
– not amenable to automation
• Sanger
– DNA replication.
– substitution of substrate with chain-terminator chemical.
– more efficient
– automation??
Maxam-Gilbert ‘chemical’ method
versus “synthesis-based” methods

• Fred Sanger: Nobel Prize 1980
• Instead of taking a complete sequence and breaking it down, build DNA sequences up and analyze steps along the way
• The key to this process: dideoxynucleotides (ddNTPs)
What to label for visualization?
• Primers?
• Disadvantages of primer-labels:
– four reactions
– tedious
– limited to certain regions (custom oligos) or to cloned inserts behind ‘universal’ priming sites.
• Advantages: it works
• Solution:
– labeled “terminators” - ddNTPs
DNA Analysis: DNA Sequencing
• ddNTPs are analogous to faulty LEGOs
– Normal LEGOs have little pegs that allow them to stack
– Faulty LEGOs lack the little pegs, so nothing can stack on them; thus, they ‘terminate’ the stack
5’ and 3’
Base plus sugar: “nucleoside”
Adenine     Adenosine
Guanine     Guanosine
Cytosine    Cytidine
Thymine     Thymidine
(in DNA: “deoxyadenosine”)
Base plus deoxyribose plus triphosphate: “deoxynucleotide”
“2’-deoxyadenosine 5’-triphosphate” = dATP
[Figure: two paired strands of bases running antiparallel; each strand has a 5’ phosphate (PO3) end and a 3’ OH end]
If I throw in DNA polymerase and free nucleotide, which end gets extended?
5’ 3’
Watson 5’ T A G C G T C A G C T G 3’
Crick 3’ A T C G C A G T C G A C 5’

[Figure: antiparallel duplex; each strand has a 5’ phosphate (PO3) end and a 3’ OH end]
Sanger Sequencing Templates
[Figure: two kinds of template. PCR product, with a seq primer site; plasmid “clone”, with an insert and a seq primer site in the plasmid backbone]

Watson 5’ .. T A G C G T C A G C T .. 3’
Crick 3’ .. A T C G C A G T C G A .. 5’

5’ 3’
Primer T A G C G
3’ .. A T C G C A G T C G A C .. 5’

In Sanger sequencing, Crick is the template and Watson’s synthesis starts at the primer’s 3’OH
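
The extension step can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function name `extend_primer` is made up for this example, the template is written 3’→5’ as on the slide, and the primer is assumed to anneal at the very start of the template with no mismatches tolerated.

```python
# Watson-Crick pairing rules for DNA
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_primer(template_3to5: str, primer_5to3: str) -> str:
    """Extend a primer along a template and return the new strand 5'->3'.

    template_3to5 is written 3'->5' (like Crick on the slide);
    primer_5to3 is written 5'->3' and must pair with the start of the template.
    """
    # Check that the primer actually anneals, base by base
    for p, t in zip(primer_5to3, template_3to5):
        if COMPLEMENT[t] != p:
            raise ValueError("primer does not anneal to the template")
    # Polymerase adds complementary bases only at the primer's 3' OH,
    # so the new strand grows in the 5'->3' direction
    extension = "".join(COMPLEMENT[t] for t in template_3to5[len(primer_5to3):])
    return primer_5to3 + extension


crick = "ATCGCAGTCGAC"  # 3'->5', template
primer = "TAGCG"        # 5'->3'
watson = extend_primer(crick, primer)
print(watson)  # TAGCGTCAGCTG
```

The printed strand matches the Watson sequence shown earlier, which is the point: only the 3’ end of the primer is extended.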
The Chain Terminator

• Dideoxy nucleotides cannot be further extended, and so terminate the sequence chain

[Figure: a growing strand paired to its template; the last nucleotide added is a dideoxy nucleotide, which carries H instead of OH at its 3’ position]
Original Sanger Sequencing with
Radioactive Signal
Template (Crick)
• Very low concentration of ddNTPs compared to dNTPs
• Watsons: a nested series of DNA fragments ending in the base specified by the terminator ddNTP
• Expose gel to x-ray film (to make an “autoradiogram”)
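
A toy model of the four-lane experiment may help show why the nested fragments spell out the sequence. It is a deliberately idealized sketch: the function names are invented for this example, and it assumes every possible termination product is present in each reaction (which is what the low ddNTP:dNTP ratio is meant to achieve across a large population of molecules).

```python
def dideoxy_fragments(watson: str, primer_len: int) -> dict:
    """Return, for each ddNTP reaction, the lengths of the terminated fragments.

    A fragment of a given length ends in whichever base sits at that
    position of the newly synthesized (Watson) strand.
    """
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length in range(primer_len + 1, len(watson) + 1):
        last_base = watson[length - 1]
        lanes[last_base].append(length)
    return lanes


def read_gel(lanes: dict) -> str:
    """Read the gel from the shortest fragment to the longest."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


watson = "TAGCGTCAGCTG"
lanes = dideoxy_fragments(watson, primer_len=5)
print(lanes)            # {'A': [8], 'C': [7, 10], 'G': [9, 12], 'T': [6, 11]}
print(read_gel(lanes))  # TCAGCTG
```

Reading the bands in order of size recovers the bases added after the 5-base primer.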
This is great but…

Wouldn’t it be great to run everything in one lane?
Save space and time, more efficient
Fluorescently label the ddNTPs so that they each appear a different color
• Fluorescent Sanger Sequencing: “Dye-terminators”
Each of the 4 ddNTPs is labeled with a different fluorescent dye (instead of radioactivity)
Fluorescent Sanger Sequencing
One-tube sequencing reaction containing dGTP, dATP, dTTP and dCTP
(note: cycle sequencing with modified Taq polymerase)
Load on gel (modern machines use capillaries, not slab gels)
[Figure: gel lane with an arrow showing the direction of electrophoresis, toward the + electrode]
Fluorescent Sanger Sequencing
Lane signal → Trace
(Real fluorescent signals from a lane/capillary are much uglier than this.)
A bunch of magic to boost signal/noise, correct for dye effects, mobility differences, etc., generates the ‘final’ trace (for each capillary of the run)
Sanger Base Calling

[Trace figure: ugly at the start of the read, ugly at the end]

Base Caller (Phred)

Position: ...  44  45  46  47  48  49  50  51  52  53  54  55 ... 718 719 720 ...
Base:     ...   N   A   G   C   G   T   T   C   C   G   C   G ...   A   N   N ...
Quality:  ...   0   3  20  25  40  88  95  99  99  99  99  99 ...  10   0   0 ...

Quality score = -10 * log10(probability of error)

For Q20, probability of error = 1/100
For Q99, probability of error ~10^-10
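
The quality formula is easy to check numerically. The sketch below uses helper names chosen for this example (they are not part of phred itself) and applies a Q20 cutoff, a common but arbitrary threshold, to the bases and scores from the excerpt above.

```python
import math


def error_probability(q: float) -> float:
    """Convert a quality score Q into a probability of error."""
    return 10 ** (-q / 10)


def quality_score(p_error: float) -> float:
    """Convert a probability of error into a quality score Q."""
    return -10 * math.log10(p_error)


# Bases and quality values for positions 44-55 of the example read
bases = "NAGCGTTCCGCG"
quals = [0, 3, 20, 25, 40, 88, 95, 99, 99, 99, 99, 99]

print(error_probability(20))  # about 0.01, i.e. 1 error in 100
print(error_probability(99))  # about 1.26e-10

# Keep only bases at or above Q20 (assumed cutoff for this illustration)
high_quality = "".join(b for b, q in zip(bases, quals) if q >= 20)
print(high_quality)  # GCGTTCCGCG
```

The low-scoring calls at the start of the read drop out, which is the usual fate of the "ugly" ends of a trace.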
Phred: The base-calling program
• Algorithm based on ideas about what might go wrong in a sequencing reaction and in electrophoresis
• Tested the algorithm on a huge dataset of “gold standard” sequences (finished human and C. elegans sequences generated by highly-redundant sequencing)
• Compared the results of phred with the ABI Basecaller
• Phred was considerably more accurate (40-50% fewer errors), particularly for indels and particularly for the higher quality sequences
(Ewing et al., 1998, Genome Research 8: 175-185; Ewing and Green, 1998, Genome Research 8: 186-194)
Progress of Sanger Sequencing
Technology
Radioactive polyacrylamide slab
gel electrophoresis

Low throughput, labor-intensive

1 grad student could do 10 runs a day, 100 bp/run = 1000 bp/day
Progress of Sanger Sequencing
Technology
AB slab gel sequencers

Fluorescent sequencing

Per machine:

6 runs/day
96 reads/run
500 bp/read

288,000 bp/day
Progress of Sanger Sequencing
Technology
AB capillary sequencers

Per machine:

24 runs/day
96 reads/run
550 – 1,000 bp/read

1-2 million bp/day
~1,000-fold increase in throughput since 1985 accomplished by
incremental improvements of the same underlying technology
Novel Disruptive Sequencing Technologies have 100-1000x more
throughput:
454 Pyrosequencing, Solexa, SOLiD
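
The throughput figures on these slides are simple products, and the arithmetic can be reproduced as follows. The function name is illustrative, and treating a manual radioactive run as a single 100 bp read is an assumption made here to fit the "100 bp/run" figure.

```python
def bp_per_day(runs_per_day: int, reads_per_run: int, bp_per_read: int) -> int:
    """Daily throughput = runs/day x reads/run x bp/read."""
    return runs_per_day * reads_per_run * bp_per_read


manual = bp_per_day(10, 1, 100)              # radioactive slab gel, one read per run (assumed)
slab = bp_per_day(6, 96, 500)                # fluorescent slab gel sequencer
capillary_low = bp_per_day(24, 96, 550)      # capillary sequencer, short reads
capillary_high = bp_per_day(24, 96, 1000)    # capillary sequencer, long reads

print(manual)          # 1000
print(slab)            # 288000
print(capillary_low)   # 1267200
print(capillary_high)  # 2304000

# Fold increase of capillary machines over the manual method
print(capillary_low / manual, capillary_high / manual)  # 1267.2 2304.0
```

Both ends of the capillary range land above a 1,000-fold gain over the manual method, in line with the "~1,000-fold" figure quoted above.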
“virtual autorad” - real-time DNA sequence output from ABI 377

1. Trace files (dye signals) are analyzed and bases called to create chromatograms.
2. Chromatograms from opposite strands are reconciled with software to create double-stranded sequence data.
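
Step 2 can be illustrated with a toy reconciliation routine. This is a sketch under strong assumptions, not how production software works: both reads are assumed to cover exactly the same region with no insertions or deletions, quality scores are ignored, and the function names are invented for this example.

```python
# Translation table for complementing bases; N stays N
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def reconcile(forward: str, reverse: str) -> str:
    """Merge a forward read with a reverse-strand read of the same region.

    Agreeing calls are kept, an N on one strand is filled in from the
    other strand, and conflicting calls are reported as N.
    """
    reverse_as_forward = reverse_complement(reverse)
    if len(forward) != len(reverse_as_forward):
        raise ValueError("reads must cover the same region in this toy model")
    consensus = []
    for f, r in zip(forward, reverse_as_forward):
        if f == r:
            consensus.append(f)
        elif f == "N":
            consensus.append(r)
        elif r == "N":
            consensus.append(f)
        else:
            consensus.append("N")
    return "".join(consensus)


forward = "TAGNGTCAGCTG"  # forward read with one uncalled base
reverse = "CAGCTGACGCTA"  # read from the opposite strand, 5'->3'
print(reconcile(forward, reverse))  # TAGCGTCAGCTG
```

Here the uncalled base in the forward read is resolved by the opposite strand, which is the practical benefit of sequencing both strands.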
