Lecture 09 Chapter 05-DNA-sequencing

DNA sequencing: Importance
• The DNA sequences making up any organism comprise the basic blueprint for that organism
The Human Genome Project (and others)
• Potential benefits
• Molecular medicine
▪ Improved diagnosis of disease
– Disease gene identification will lead to more accurate diagnosis
▪ Earlier detection of genetic predispositions to disease
– Will be able to assess risk for certain diseases, e.g. cancer, Type II diabetes, heart disease
▪ Rational drug design
– Drugs designed to target specific gene products that cause disease
▪ Gene therapy and control systems for drugs
– Replacement of defective genes for certain diseases
▪ Pharmacogenomics: “custom drugs”
– Drug therapy based on one’s genotype…
Bioarchaeology, anthropology, evolution, and human migration
• Study evolution through germline mutations in lineages.
• Study migration of different population groups based on female genetic inheritance.
• Study mutations on the Y chromosome to trace lineage and migration of males.
• Compare breakpoints in the evolution of mutations with ages of populations and historical events.
DNA forensics (identification)
• Identify potential suspects whose DNA may match evidence left at crime scenes.
• Exonerate persons wrongly accused of crimes.
• Identify crime and catastrophe victims.
• Establish paternity and other family relationships.
• Identify endangered and protected species as an aid to wildlife officials (could be used for prosecuting poachers).
• Detect bacteria and other organisms that may pollute air, water, soil, and food.
• Determine pedigree for seed or livestock breeds.
Agriculture, livestock breeding, and bioprocessing
• Disease-, insect-, and drought-resistant crops.
• Healthier, more productive, disease-resistant farm animals.
• More nutritious produce.
• Biopesticides.
• Edible vaccines incorporated into food products.
• New environmental cleanup uses for plants like tobacco.
What to Sequence and Why?
Structure
• De novo whole genome sequencing
– requires de novo whole genome assembly
• Polymorphism discovery (distinct from genotyping!)
– Targeted approaches
– Whole genome
– SNPs, copy number variations, insertions, deletions, etc.
• Expressed sequence discovery
– ESTs
– cDNAs
– miRNAs, etc.
Function
• Functional genomics
– ChIP
– Expression profiling
– Nucleosome positioning
DNA sequencing methodologies: ca. 1977
• Maxam-Gilbert
– base modification by general and specific chemicals.
– depurination or depyrimidination.
– single-strand excision.
– not amenable to automation
• Sanger
– DNA replication.
– substitution of substrate with chain-terminator chemical.
– more efficient
– automation??
Maxam-Gilbert ‘chemical’ method versus “synthesis-based” methods
• Fred Sanger: Nobel Prize 1980
• Instead of taking a complete sequence and breaking it down, build DNA sequences up and analyze steps along the way
• The key to this process: dideoxynucleotides (ddNTPs)
What to label for visualization?
• Primers?
• Disadvantages of primer-labels:
– four reactions
– tedious
– limited to certain regions: custom oligos, or cloned inserts behind ‘universal’ priming sites.
• Advantages: it works
• Solution:
– labeled “terminators” - ddNTPs
DNA Analysis: DNA Sequencing
• ddNTPs are analogous to faulty LEGOs
– Normal LEGOs have little pegs that allow them to stack
– Faulty LEGOs lack the little pegs and nothing can stack on them – thus, they ‘terminate’ the stack
5’ and 3’
Base plus sugar: “nucleoside”
– Adenine: Adenosine
– Guanine: Guanosine
– Cytosine: Cytidine
– Thymine: Thymidine
– in DNA: “deoxyadenosine”
Base plus deoxyribose plus triphosphate: “deoxynucleotide”
– “2’-deoxyadenosine 5’-triphosphate” = dATP
[Figure: two antiparallel DNA strands; each strand has a 5’ PO3 end and a 3’ OH end]
If I throw in DNA polymerase and free nucleotides, which end gets extended?
Watson 5’ T A G C G T C A G C T G 3’
Crick 3’ A T C G C A G T C G A C 5’
[Figure: the two antiparallel strands again, each with a 5’ PO3 end and a 3’ OH end]
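The antiparallel pairing of the Watson and Crick strands shown above can be checked with a short sketch. The function names here are illustrative and not part of the lecture; the only inputs taken from the slide are the two 12-base sequences.

```python
# Watson-Crick base-pairing rules for DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Pair each base with its partner, keeping the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in seq)


def reverse_complement(seq):
    """Return the partner strand written in its own 5'->3' direction."""
    return complement(seq)[::-1]


watson = "TAGCGTCAGCTG"                    # written 5'->3', as on the slide
crick_3_to_5 = complement(watson)          # written 3'->5', directly under Watson
crick_5_to_3 = reverse_complement(watson)  # same strand, written 5'->3'

print("Watson 5'", watson, "3'")
print("Crick  3'", crick_3_to_5, "5'")
print("Crick  5'", crick_5_to_3, "3'")
```

The second printed line reproduces the Crick strand as drawn on the slide. The third line is the same strand in the 5’ to 3’ direction, which is the direction in which a polymerase would synthesize it.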
Sanger Sequencing Templates
• PCR product
• Plasmid “Clone”
[Figure: plasmid backbone carrying an insert flanked by seq primer sites]
Watson 5’ .. T A G C G T C A G C T .. 3’
Crick 3’ .. A T C G C A G T C G A .. 5’
Primer 5’ T A G C G 3’
Crick 3’ .. A T C G C A G T C G A C .. 5’
In Sanger sequencing, Crick is the template and Watson’s synthesis starts at the primer’s 3’ OH.
The Chain Terminator
• Dideoxy nucleotides cannot be further extended, and so terminate the sequence chain
[Figure: a growing strand whose last residue is a dideoxynucleotide, with H in place of the 3’ OH, paired to the template strand]
Original Sanger Sequencing with Radioactive Signal
• Template (Crick)
• Very low concentration of ddNTPs compared to dNTPs
• A nested series of Watson’s DNA fragments ending in the base specified by the terminator-ddNTP
• Expose gel to x-ray film (to make an “auto-radiogram”)
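The four-lane logic of the original method can be sketched in a few lines. This is an idealized illustration, not a model of the chemistry. It assumes every possible termination product is present and cleanly resolved on the gel. The names `sanger_fragments` and `read_gel` are made up for this example, and the template and primer are the ones from the earlier Watson/Crick slide.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sanger_fragments(template_3_to_5, primer):
    """Return {ddNTP base: [fragment lengths]} for the four termination reactions.

    template_3_to_5 is the Crick strand written 3'->5'; the newly made strand,
    read 5'->3', is then its base-by-base complement.
    """
    full_product = "".join(COMPLEMENT[base] for base in template_3_to_5)
    if not full_product.startswith(primer):
        raise ValueError("primer does not anneal at the 3' end of the template")
    lanes = {base: [] for base in "ACGT"}
    # Every extension length past the primer can end in a ddNTP; the fragment
    # appears in the lane of whichever base sits at its 3' end.
    for length in range(len(primer) + 1, len(full_product) + 1):
        lanes[full_product[length - 1]].append(length)
    return lanes


def read_gel(lanes):
    """Read the autoradiogram from the shortest fragment to the longest."""
    base_at_length = {n: base for base, lengths in lanes.items() for n in lengths}
    return "".join(base_at_length[n] for n in sorted(base_at_length))


crick = "ATCGCAGTCGAC"   # template, written 3'->5'
primer = "TAGCG"         # anneals to the 3' end of the template

lanes = sanger_fragments(crick, primer)
for base in "ACGT":
    print(f"dd{base}TP lane: fragment lengths {lanes[base]}")
print("Sequence read from the gel:", read_gel(lanes))
```

Reading the bands from shortest to longest gives the Watson sequence downstream of the primer, which is why a nested set of fragments is enough to recover the order of bases.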
This is great but…
Wouldn’t it be great to run everything in one lane?
Save space and time, more efficient
Fluorescently label the ddNTPs so that they each appear a different color
• Fluorescent Sanger Sequencing: “Dye-terminators”
Each of the 4 ddNTPs is labeled with a different fluorescent dye (instead of radioactivity)
Fluorescent Sanger Sequencing
• One-tube sequencing reaction: dGTP, dATP, dTTP, dCTP + the four dye-labeled ddNTPs
(note: cycle sequencing with modified Taq Polymerase)
• Load on gel (modern machines use capillaries, not slab gels)
[Figure: gel lane with the direction of electrophoresis marked]
Fluorescent Sanger Sequencing
Lane signal → Trace
(Real fluorescent signals from a lane/capillary are much uglier than this.)
A bunch of magic to boost signal/noise, correct for dye effects, mobility differences, etc., generates the ‘final’ trace (for each capillary of the run)
Sanger Base Calling
Base Caller (Phred)
(The trace is ugly at both ends of the read.)
Position  ...  44  45  46  47  48  49  50  51  52  53  54  55  ...  718  719  720  ...
Base      ...   N   A   G   C   G   T   T   C   C   G   C   G  ...    A    N    N  ...
Quality   ...   0   3  20  25  40  88  95  99  99  99  99  99  ...   10    0    0  ...
Quality score = -10 * log10(probability of error)
For Q20, probability of error = 1/100
For Q99, probability of error ~10^-10
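The quality-score formula can be turned into a small worked example. This sketch assumes the logarithm is base 10, which is what makes Q20 correspond to an error probability of 1/100. The function names and the Q20 cutoff are illustrative, and the bases and scores are the twelve calls at positions 44 to 55 on the slide.

```python
import math


def phred_to_error_prob(q):
    """Convert a quality score Q into the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability into a quality score."""
    return -10 * math.log10(p)


for q in (3, 20, 40, 99):
    print(f"Q{q}: probability of error = {phred_to_error_prob(q):.3g}")

# Positions 44-55 from the slide: base calls and their quality scores.
bases = "NAGCGTTCCGCG"
quals = [0, 3, 20, 25, 40, 88, 95, 99, 99, 99, 99, 99]

# Keep only calls at or above an (assumed) cutoff of Q20.
cutoff = 20
trusted = "".join(b for b, q in zip(bases, quals) if q >= cutoff)
print(f"Calls with Q >= {cutoff}: {trusted}")
```

With this cutoff, the two low-quality calls at the start of the read are dropped and the remaining ten are kept.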
Phred: The base-calling program
• Algorithm based on ideas about what might go wrong in a sequencing reaction and in electrophoresis
• Tested the algorithm on a huge dataset of “gold standard” sequences (finished human and C. elegans sequences generated by highly redundant sequencing)
• Compared the results of phred with the ABI Basecaller
• Phred was considerably more accurate (40-50% fewer errors), particularly for indels and particularly for the higher quality sequences
(Ewing et al., 1998, Genome Research 8: 175-185; Ewing and Green, 1998, Genome Research 8: 186-194)
Progress of Sanger Sequencing
Technology: Radioactive polyacrylamide slab gel electrophoresis
– Low throughput, labor-intensive
– 1 grad student could do 10 runs a day, 100 bp/run = 1,000 bp/day
Technology: AB slab gel sequencers (fluorescent sequencing)
– Per machine: 6 runs/day, 96 reads/run, 500 bp/read = 288,000 bp/day
Technology: AB capillary sequencers
– Per machine: 24 runs/day, 96 reads/run, 550 – 1,000 bp/read = 1-2 million bp/day
~1,000-fold increase in throughput since 1985 accomplished by
incremental improvements of the same underlying technology
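The throughput figures quoted for the three generations of Sanger technology follow from simple multiplication, sketched below. The helper name `bp_per_day` is illustrative, and the manual method is assumed to yield one 100 bp read per run, matching the slide's "100 bp/run".

```python
def bp_per_day(runs_per_day, reads_per_run, bp_per_read):
    """Daily output in base pairs for one operator or one machine."""
    return runs_per_day * reads_per_run * bp_per_read


manual = bp_per_day(10, 1, 100)             # radioactive slab gel, one grad student
slab = bp_per_day(6, 96, 500)               # AB slab gel sequencer
capillary_low = bp_per_day(24, 96, 550)     # AB capillary sequencer, short reads
capillary_high = bp_per_day(24, 96, 1000)   # AB capillary sequencer, long reads

print(f"Manual slab gel:      {manual:>9,} bp/day")
print(f"AB slab gel:          {slab:>9,} bp/day")
print(f"AB capillary (low):   {capillary_low:>9,} bp/day")
print(f"AB capillary (high):  {capillary_high:>9,} bp/day")
print(f"Fold increase over manual: {capillary_low / manual:,.0f}x to {capillary_high / manual:,.0f}x")
```

The capillary figures come out at roughly 1.3 to 2.3 million bp/day, consistent with the rounded "1-2 million" and the "~1,000-fold" claim.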
Novel Disruptive Sequencing Technologies have 100-1000x more
throughput:
454 Pyrosequencing, Solexa, SOLiD
“virtual autorad” - real-time DNA sequence output from ABI 377
1. Trace files (dye signals) are analyzed and bases called to create chromatograms.
2. Chromatograms from opposite strands are reconciled with software to create double-stranded sequence data.
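Step 2 can be illustrated with a deliberately simplified sketch. It is not the algorithm used by any particular sequencing software: it assumes the forward and reverse reads cover exactly the same region and have equal length, so no alignment is needed, and it simply keeps whichever call has the higher quality score at each position. The function names and the example reads are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq):
    """Rewrite a read from the opposite strand in the forward orientation."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def reconcile(fwd_bases, fwd_quals, rev_bases, rev_quals):
    """Merge two opposite-strand reads of the same region into one sequence.

    Simplifying assumption: both reads span the same positions, so the reverse
    read only needs to be reverse-complemented (and its qualities reversed).
    """
    if not (len(fwd_bases) == len(fwd_quals) == len(rev_bases) == len(rev_quals)):
        raise ValueError("reads and quality lists must all have the same length")
    rc_bases = reverse_complement(rev_bases)
    rc_quals = rev_quals[::-1]
    consensus = []
    for fb, fq, rb, rq in zip(fwd_bases, fwd_quals, rc_bases, rc_quals):
        # Keep the call from whichever strand is more confident at this position.
        consensus.append(fb if fq >= rq else rb)
    return "".join(consensus)


# Hypothetical reads: the forward read has an uncalled base (N) and a
# low-quality miscall near its end, where the reverse read is strong.
forward = "TAGNGTGA"
forward_q = [30, 35, 40, 0, 38, 40, 12, 8]
reverse = "TGACGCTA"
reverse_q = [40, 40, 35, 30, 32, 10, 5, 3]

print(reconcile(forward, forward_q, reverse, reverse_q))
```

Because read quality tends to be poor at the ends of a read, the weak end of one strand's read often overlaps a stronger part of the other strand's read, which is the reason for sequencing both strands.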
