Professional Documents
Culture Documents
ADN Sequencing TechGuide PDF
ADN Sequencing TechGuide PDF
high-performance BLAST,
without the furnace.
CodeQuest includes:
CodeQuest enables you to run large analyses faster. In
• A powerful HP workstation with 2-CPUs, 1 terabyte of data 1 hour, you can map 26,000 30-mers to the rat genome,
storage, a 19” flat-panel monitor, 1 or 2 DeCypher Engine™ compare 2,000 protein sequences to Pfam for family
Accelerator cards for ultra-fast processing and PipeWorks™, assignment, and run 30,000 sequences past GenBank NR.
a drag-and-drop interface for building analysis pipelines This performance rivals that of hundreds of CPUs, without
• Tera-BLAST™, HMM and Smith-Waterman applications to enable the hassles and expenses of running a server farm.
comprehensive comparisons with GenBank, UniProt and Pfam
In addition to freeing you from high power, cooling,
• Tera-Probe™ for mapping oligos to the genome with high speci- space, administration and maintenance costs, CodeQuest
ficity to select optimal microarray probes and locate SNPs is priced less than a 10-CPU cluster. Contact us today to
• GeneDetective™ to generate intron/exon models and bring CodeQuest’s performance and usability to your lab.
putative RNA transcripts for alternative splicing studies
List of Resources . . . . . . . . . . . . . . . . . . . . . . . . . 14
Flowgram of a GS 20 read
with unprecedented speed and sensitivity
Harness the horsepower of the newest revolution in
sequencing today — visit www.genome-sequencing.com
call 800 262 4911 (U.S.), or contact your local sales representative.
Index of experts
Kevin Knudtson Lee Rowen
Director, DNA Facility Senior Research Scientist
University of Iowa Institute for Systems Biology
Bruce Roe
Alice Young
George Lynn Cross Research Professor
Deputy Director, Sequencing Group
of Chemistry and Biochemistry
NIH Intramural Sequencing Center
University of Oklahoma
Successful DNA sequencing is greatly dependent on We don't prepare plasmid DNA, only PCR products
the purity and integrity of the DNA. Most commercial from genomic DNA. When we prepare PCR template
kits used to prepare plasmids for sequencing are for sequencing, we set up the PCR reactions in a
based on the alkaline lysis procedure of Birnboim and separate clean room, using either a Beckman robot
Doly (Nucleic Acids Res, 7(6):1513-23) and will yield with filter tips or a Deerac Equator that is a non-
sequencing-quality templates when applied correctly. contact pipettor. In our lab we consider it critical to
There are a number of keys to success when using keep pre-PCR and post-PCR manipulations in a
this protocol to extract plasmid DNA. Do not overgrow different room to avoid contamination.
the bacterial culture, as the culture should be collected — Margaret Robertson
in the log to late-log phase of growth to avoid
degradation of the DNA. Completely remove all the We use an automated procedure for cell lysis and
growth media following centrifugation. Some bacteria plasmid extraction (specifically, an Eppendorf/ Brinkman
produce nucleases that will degrade the DNA during the robot that uses filter plates for DNA separation).
preparation process. Thoroughly suspend the bacterial However, non-automated procedures using standard
pellet in the lysis solution; vortex if necessary. Mix gently, cell lysis and plasmid extraction protocols also work.
but thoroughly, when adding the denaturing and — Lee Rowen
neutralizing solutions to avoid fragmentation of the
genomic DNA. The supernatant in the resulting mixture Template purity is the most important factor in
should be crystal clear. If the supernatant is turbid, then obtaining good-quality sequence data, and as a core
the bacterial pellet probably was not completely laboratory, this is something that we struggle with
suspended in the lysis solution and the DNA yield will be on a regular basis. We work with many types of
low. After centrifugation to pellet the cellular debris, templates from a variety of sources, often prepared
avoid collecting any of the debris. Repeat the using different techniques. Salts, organics, RNA,
centrifugation step again if any debris is collected. proteins, polysaccharides or chromosomal DNA from
The various commercial kits have their own unique plasmid preparation, or primers and unincorporated
methods to concentrate the DNA while reducing or nucleotides from PCR reactions all interfere with the
eliminating the salt concentration. We like to perform sequencing reaction. We encourage customers to
at least two wash steps to reduce the salt concentration keep this in mind before submitting samples to us.
and thoroughly remove proteins and other bound For plasmid preparations using commercial
material. Regardless of template type, the DNA should miniprep kits, we suggest many of the standard
be suspended in a low-salt buffer or water. In addition, precautions:
the sample should be free of any traces of ethanol, 1. Avoid overloading the columns, which may
isopropyl alcohol, and phenol as these compounds will lead to decreased yield or plasmid quality.
inhibit the DNA sequencing chemistry. For low-copy plasmids we recommend
— Kevin Knudtson adding no more than 6 ml of overnight
There are a number of critical steps that can affect Take extreme care to not shear any genomic DNA
the ability to construct a quality library. Construction during BAC isolation by minimizing agitation at the
of a high-quality cDNA library starts with obtaining various mixing steps (rolling the centrifuge bottles
good-quality RNA. The use of degraded or partially rather than shaking).
degraded RNA will give a library that does not fully Employ a second acetate precipitation step
represent the genome. Following cDNA synthesis, during the BAC isolation — see our website at
careful fractioning of the resulting cDNA should be http://www.genome.ou.edu/BAC_isoln_200ml_cultu
performed to eliminate very small products. Failure re.html.
to do so will lead to a library that only contains small — Bruce Roe
inserts that will necessitate the collection of
considerably more independent recombinants to a) Pay attention to bacterial growth conditions,
have complete coverage of the genome. optimizing media and duration.
Also, the inclusion of a b) We use a Kurabo AutoGen
library-specific sequence tag in 740 for BAC or fosmid DNA
the poly(T) reverse primer during “Presence of the wrong sequence isolation, again optimizing
cDNA synthesis will help detect tags indicates contamination from conditions (e.g. lysis,
cross-library contamination if centrifugation) to reduce
more than one library is to be a previous library preparation.” E. coli contamination.
constructed. Vector DNA must be — Kevin Knudtson — Lee Rowen
completely digested and
phosphatased because uncut Library construction is not
vector will transform the bacterial host very efficiently, something that we do in house, but rather outsource
resulting in a library that contains few or no inserts. to commercial vendors. I can, however, stress the
Library quality can be assessed by DNA importance of carrying out the recommended quality
sequencing to look for the presence of an insert, the checks throughout the entire procedure to ensure
poly(A) tail, a hexamer known to be a signal for that the final product is of the highest quality. For
cleavage and polyadenylation, and the unique cDNA libraries, we verify the quality of the starting
library-specific sequence tag. The absence of inserts RNA and determine the final cloning efficiency and
suggests the vector was not cut completely or there average insert size (typically greater than 95 percent
was a poor insert-to-vector ratio for the ligation step. and 1.5 kb, respectively). Within the scope of our
The absence of poly(A) tails and polyadenylation EST projects, we also assess library quality and
signals suggests RNA degradation. The presence of diversity based on the number of redundant clones
the wrong sequence tags indicates contamination in the library, by performing BlastN searches against
from a previous library preparation. all sequenced ESTs.
— Kevin Knudtson — Glenis Wiebe (continued on p.14)
The users of our core have had good success using low concentrations without compromising efficiency.
Primer3 and the commercially available primer We have a routine optimization procedure where we
design programs to design their primers. We perform touchdown PCR (the first 10 cycles vary by
encourage our users to design their primers to have one degree of the annealing temperature from 60-
a Tm of at least 45°C, but primers with a Tm of 55°C 50°), then 25 cycles at 50°. Any PCR dropouts go on
to 60°C tend to give successful sequencing results on to a "Betaine" optimization that usually recovers
a more consistent basis. about 60 percent to 70 percent of the dropouts.
Primers should not contain stretches of any one After that we have to work very hard to define
particular base. Repetitive G's or C's should conditions for the remaining assays. This can take a
especially be avoided. Primers should be "stickier" lot of hands-on time. It is probably more efficient to
on their 5' ends than on their 3' ends. A "sticky" 3' redesign primers, which we usually do if an assay
end as indicated by a high GC content could continues to fail.
potentially anneal at multiple — Margaret Robertson
sites on the template DNA.
However, we encourage the
“Primer design programs are a must Use PrimOU, a primer picking
use of a G/C clamp, especially program based on MIT's Primer
when sequencing through AT- to ensure that the primer will anneal and UT Southwestern's Primo.
rich regions. Freely available from our website
at a unique site in your clone.”
Primers should not possess at http://www.genome.ou.edu/
complementary sequences — Alice Young informatics/primou.html.
(palindromes), as this will result — Bruce Roe
in a primer that will fold back
on itself, leading to an unproductive priming event For sequencing PCR products, we use the same
that decreases the overall signal. Primers should not primer for both PCR and sequencing after purifying
contain sequences of nucleotides that would allow the PCR products. For plasmids, we use either
one primer to anneal to another (primer dimer universal or custom primers, following ABI protocols
formation). If possible, run a computer search for concentration. Generally, our primers are
against the vector and insert DNA sequences to between 18 and 20 bases. It's been our experience
verify that the primer and especially the 8-10 bases that choosing a primer without going through the
of its 3' end are unique. rigmarole of an oligo design program usually works
— Kevin Knudtson just fine. We choose a primer site about 100 bases
upstream of the region we want sequence for, so as
We use Primer3 to select PCR primers. Primer to avoid errors that occur at the beginning of a
concentration is kept low at 2uM, as are dNTPs and sequence read.
Taq. We try to keep Taq, primers, dNTPs, and DNA at — Lee Rowen (continued on p.14)
We took an empirical approach to determine the the reagent dilution. After the sequencing run, we
best reagent dilution to use in our sequencing analyze the data for read length (alignment to a
reactions for our core users. Essentially we processed consensus sequence) and Phred quality scores.
two to three users' samples using different dilutions — Lee Rowen
of BigDye and we are currently using the highest
dilution that continued to give successful sequencing In a core facility, the templates we receive are
results. extremely varied in type, quality, and concentration.
— Kevin Knudtson We continuously monitor failure/repeat reaction
rates to help determine what types of problems are
Since we are re-sequencing for rare mutations it is occurring, and how they are best dealt with. In our
imperative that we don't dilute the BigDye v3.1 experience, using 1/8th dilution (1 µl of BigDye
reagent too much. Phred scores have to be higher Terminator v3.1 in 10 µl final volume) in the
than 30 to pass our quality metrics in our sequencing reaction provides a good balance between
pipeline, so good quality data is a must. minimizing sample failure and keeping costs down.
I test out dilutions of reagents using several With larger-scale projects where we have control
exons of different lengths (400 to 600 bases) that over template preparation, we have been able to
contain known polymorphisms of different sequence use as little as 1/80th (0.1 µl) of the sequencing
combinations (e.g. C/T, or G/C) and empirically reaction kit.
determine the performance of the various dilutions — Glenis Wiebe
by inspecting the traces for quality, signal strength,
and accuracy. For ABI 3730 DNA sequencers this was accomplished
— Margaret Robertson by setting up reactions at various dilutions and
volumes. A robust reaction can be achieved using a
Since we're the folks that originally published 6 ul reaction containing 1/3 ul of the BDT version 3.1
reagent dilution on our website, we do the dilutions sequencing reaction mix and supplemental 5x buffer.
and then run reactions with both pUC and a We use very small amounts of DNA (20 ng to 50 ng
set of pUC-based shotgun clones at each BigDye for plasmids 5 kb to 7 kb). Lower dilutions can be
or ET mix dilution, with sequencing on either used, but are dependent on the quality and
the ABI 3700 or 3730. See our website at concentration of your DNA. For ABI 3130 DNA
ht t p : / / w w w. g e no m e . o u . e du / P r e d i s p e n s e d sequencers we use 10 ul reactions containing 4 ul
DilutedSeqMixBD3orET.html. BDT version 3.1. These conditions are probably
— Bruce Roe overkill, but we are generally sequencing very
difficult templates on these instruments.
We run a 96 plate with, say, four batches of 24 — Alice Young
templates of the sort we typically sequence, varying
Homopolymeric regions, Alu repeats, direct and Follow the protocols that we have on our website:
inverted repeats, di- and trinucleotide repeats, and • http://www.genome.ou.edu/protocol_
strong hairpins are among the template features that enhancing_PCR.html
can be difficult to successfully sequence through. • http://www.genome.ou.edu/seq_very_
Sometimes difficult templates need to be addressed difficult_regions.html
on a case-by-case basis depending on the base • http://www.genome.ou.edu/phi29_
content of the region. protocol.html
One of the first steps we take is to design a new These are especially useful for GC-rich regions.
sequencing primer that binds approximately 100 If they are AT-rich regions, we replace the BigDyes
bases upstream of the difficult region. Also, we will with the d-Rhodamine mixes. Also we PCR the
include a heat denaturation step of 98°C for five homopolymer stretch to determine the exact
minutes prior to the cycle sequencing reaction. In number of bases present in the repeats in
many cases, the heat denaturation step and moving conjunction with the above methods.
the primer closer will permit the DNA polymerase to — Bruce Roe
extend through the difficult region without changing
the cycle sequencing chemistry. We have also had We try a different chemistry, such as BigDye Primers
limited success in sequencing through GC-rich (now hard to get) or dGTP reagent kits.
difficult regions by using a cocktail of three parts — Lee Rowen
BigDye terminator v3.1 to one part dGTP BigDye
terminator v3.0. A more comprehensive discussion We use a 3:1 mixture of BigDye v3.1 and dGTP BigDye
on sequencing through difficult regions can be v3.0 in most reactions. For a problematic long
found in chapter three of DNA Sequencing: homopolymeric region, the first step is to sequence the
Optimizing the Process and Analysis, edited by Jan opposite strand. If this doesn't provide the necessary
Kieleczawa. information, we try a combination of approaches such
— Kevin Knudtson as lowering the extension temperature from 60ºC to
55ºC, using an oligo dT primer with degenerate bases
Since we are sequencing PCR products, it is often not at the 3' end, or designing a new primer closer to the
possible to sequence through a long homopolymer homopolymeric region.
because of enzyme slippage. In the primer design — Glenis Wiebe
stage we try to avoid homopolymers greater than
eight. In regular core sequencing of plasmids or We avoid these like the plague. Our medical
BACs, we can use a polyT primer with an anchored resequencing amplimers are carefully designed to
base or a mixed 3' base to try to get through avoid these regions in both the primers and the
the region. internal sequence of the PCR products. Good luck!
— Margaret Robertson — Alice Young
Visit www.454.com
© 2006 454 Life Sciences. All trademarks or registered trademarks are the property of 454 and/or its subsidiaries. Printed in USA
How do you detect and
resolve artifacts in your
sequence data?
Initially we examine the plate report generated by 454; BigDye terminator vs. BigDye primer chemistry).
our LIMS software, Geospiza Finch Suite, for If only one sequencing platform/chemistry is used,
potential issues in the sequencing results. We also then systematic error may not be detected. Random
import the plate record into Applied Biosystems' errors can be detected using redundancy (e.g.
Sequence Scanner v1.0 program to perform shotgun sequencing) or by comparison to a
individual assessments and examination of the consensus sequence. In many cases (e.g.
sequencing quality of each sample. Once an artifact compressions) sequences off of the forward and
has been identified, the daunting task can be trying reverse strand may differ, and a visual examination of
to figure out its source. Is the artifact due to the the data in a trace editor can help an experienced
instrument, chemistry, polymer, capillary, primer, person make the correct call.
template, or something else? The DNA Sequencing — Lee Rowen
Research Group of the Association of Biomolecular
Resource Facilities has developed a tool [known as Because we are a lower-throughput operation, we
the] DSRG Sequencing Troubleshooting Web are able to visually inspect every chromatogram.
Resource that investigators can use to help This allows us to quickly note any sequencing
troubleshoot their sequencing problems. artifacts that arise, and convey them to the
— Kevin Knudtson customer. In some cases we are able to edit the
base-calling, and no further intervention is required.
We rely a lot on automated data analysis, but have Otherwise, we may re-run the sample and repeat
LIMS tools that will flag data that is poor quality or the sequencing reaction as necessary.
that needs to be inspected by a human. — Glenis Wiebe
— Margaret Robertson
Statistics are generated for each sequencing run.
It's best not to generate any artifacts by following Each day a summary report is e-mailed to key
established protocols. For example, the ABI BigDye personnel. Array views are routinely examined on the
Version 3.1 includes some additives that virtually sequencers, especially when there appears to be
eliminate many of these artifacts. We also sequence anything abnormal showing up in our statistics. We
each base at least three times following Sanger's scrutinize these displays for signal intensity,
"Rule of Three" with at least one of the times being resolution of peaks, and background. Our finishing
from the opposite strand. group is responsible for bringing the assembled
— Bruce Roe sequence quality up to standards. In doing so, they
examine many traces. They once detected a cross-
Systematic errors are best detected by comparing the contamination we were getting from a robot that
sequences of thousands of templates using different was not properly washing tips between samples.
platforms or chemistries (e.g. ABI vs. MegaBace vs. — Alice Young
We make two types of libraries - small-insert (3-5 kb) We tend to follow the standard rules: the length
shotgun libraries from BACs or fosmid libraries from should be 18 to 24 nucleotides, with a G/C content of
BACs. In both cases our BACs come to use as glycerol approximately 40 percent to 60 percent, and a melting
stocks. These stocks were fingerprinted in our temperature between 48º and 72ºC. Primers with long
mapping lab where the fingerprint is one of several runs of a single base should be avoided, especially runs
parameters used in selecting which BACs to sequence. of three or more G's or C's. Primers should not form
When we reprep the BACs for library construction we secondary structures or primer dimers; the presence of
refingerprint the DNA and check for a match. During 3' hairpins or 3' complementarity is particularly
library construction, BACs are handled in groups of 16 detrimental. It has been suggested that the 3' end of a
so swaps are possible. When the sequence data comes primer should have a G or C as final base to act as a
through, the virtual fingerprint is compared to the real "clamp" when the primer anneals to the template
fingerprint. To minimize E. coli contamination we use a DNA; however, we have not found this to be
BAC prep method that came out of Wash U. During very important. In cases where the template is known,
the denaturation and neutralization steps of alkaline the primer should be checked for second-site
lysis there is no mixing that would cause shearing of hybridization. Finally, degenerate primers generally do
the genomic E. coli DNA. not work well for sequencing.
— Alice Young Regarding primer purity, as long as the coupling
efficiency of the synthesis was high, post-synthesis
desalting is usually sufficient. However, since primers
List of resources of poor quality do not work well for sequencing, we
recommend that our customers order HPLC or PAGE
Our panel of experts referred to a number of purified primers.
Web resources, which can be found below. — Glenis Wiebe