
454 Sequencing

454 instruments are pyrosequencers that carry out many reactions in parallel in the wells of a PicoTiter
Plate. DNA fragments are first clonally amplified on beads in an oil emulsion mixture with DNA polymerase
and primers [1], so that each bead is coated with thousands of copies of a single fragment. The beads are
then added to individual wells on the plate. dNTPs are added to the wells sequentially, one type at a time,
and washed away. The continuous washing and the sequential addition of dNTPs, DNA polymerase,
luciferase, and ATP sulfurylase explain the high reagent costs of sequencing. ATP sulfurylase converts the
PPi released each time a dNTP is added to the strand complementary to the original ssDNA into ATP. The
ATP fuels luciferase in each well [2]. The light produced is detected with a fluorescence microscope [3].
The current (2009) 454 FLX system can sequence 100 Mb of DNA in 8 hours with an average read length of
250 bp and a raw accuracy of 99.5% [4].
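
The flow-by-flow logic described above can be sketched in a few lines of Python. This is an idealized illustration, not the instrument's base-calling software: it assumes a noise-free light signal exactly proportional to the number of bases incorporated in a flow, and the flow order TACG and both function names are assumptions made for the example.

```python
def simulate_flowgram(read, flow_order="TACG"):
    """Return (nucleotide, signal) pairs for an idealized pyrosequencing run.

    `read` is the strand being synthesized. Each flow offers one dNTP; the
    signal is the number of bases incorporated, so a homopolymer run such as
    "TT" gives a signal of 2 in a single T flow.
    """
    if set(read) - set(flow_order):
        raise ValueError("read contains bases that are never flowed")
    signals = []
    pos = 0
    while pos < len(read):
        for nt in flow_order:          # one cycle = one flow per dNTP
            run = 0
            while pos < len(read) and read[pos] == nt:
                run += 1               # each incorporation releases one PPi
                pos += 1
            signals.append((nt, run))
    return signals


def call_bases(signals):
    """Rebuild the read by rounding each signal to a whole number of bases."""
    return "".join(nt * round(signal) for nt, signal in signals)


flows = simulate_flowgram("TTAGGC")
print(flows)
print(call_bases(flows))
```

Real signals are noisy, so rounding becomes less reliable as homopolymer runs get longer; the sketch ignores that.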

(image from [5]) (image from [6])


Illumina Sequencing
Illumina instruments amplify DNA fragments in situ on a flow cell. Fragments are first dispersed across the
lanes of the flow cell at a low concentration, so that the colonies (clusters) that grow from them do not
overlap. Clusters are then generated by isothermal bridge amplification [7].

The amplification of DNA using universal primers covalently bonded to the surface of the flow cell
produces 500-1000 clonal copies of each DNA fragment [8]. Fluorescently labeled nucleotides are
cyclically washed over the flow cell. These nucleotides carry reversible terminators, so all four bases
can be present simultaneously while each strand is extended by only one base per cycle. Laser-induced
excitation of the flow cell allows imaging of the excited fluorophores [9]. Before the next cycle,
tris(2-carboxyethyl)phosphine (TCEP) is added to remove the fluorescent dye and side chain (the reversible
terminator) and restore the 3' hydroxyl group, allowing the next nucleotide to be added [10]. The use of
a flow cell and reversible terminators allows the Illumina Genome Analyzer to produce 600 Mb of sequence
per day with reads of only 36 bp. Compared with pyrosequencing, the flow cell method trades read length
for throughput. The raw accuracy of the Illumina Genome Analyzer is over 98.5%.
Increased coverage is necessary when using sequencers with high raw error rates [11].
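
A rough way to see why coverage compensates for raw error is to ask how often a simple majority vote over the reads covering one position would be wrong. The sketch below is a back-of-the-envelope model, not how any real consensus caller works: it assumes errors are independent between reads and, pessimistically, that every erroneous read reports the same wrong base. The function name is hypothetical.

```python
from math import comb


def consensus_error(raw_error, depth):
    """Upper bound on the chance that a majority vote at one position is wrong.

    Assumes independent errors and that all wrong reads agree with each
    other. Ties are counted as failures.
    """
    k_min = (depth + 1) // 2           # fewest wrong reads that tie or win the vote
    return sum(
        comb(depth, k) * raw_error**k * (1 - raw_error) ** (depth - k)
        for k in range(k_min, depth + 1)
    )


for depth in (1, 5, 15):
    # 0.015 and 0.005 correspond to the 98.5% and 99.5% raw accuracies quoted above
    print(depth, consensus_error(0.015, depth), consensus_error(0.005, depth))
```

Under these assumptions the consensus error falls quickly with depth, and a platform with a higher raw error rate needs more reads per position to reach the same consensus accuracy.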

(image from [12]) (image from [13])


Read Length
The issue with short reads (20-40 nt sequenced at a time) is assembly. We must use algorithms to find
overlapping sections of fragments, then piece these fragments together. The genome contains many
repetitive regions. With only 20-40 nt reads, we may have a hard time finding overlapping regions and
determining the correct chromosomal location of repetitive segments. While the error rate is only
2 percent for the first 30 nucleotides at the head of reads using Illumina technology, it quickly
increases to 20 percent at the tails of reads at 50 nucleotides [14]. The high error rate results from
the incorporation of wrong bases by DNA polymerase (all four nucleotides are present at once) without the
error-correcting machinery found in a normal cell. Long reads are difficult because the replicating copies
of a DNA molecule within a cluster can get out of sync [15]. Paired end reads of circularized,
adapter-ligated kilobase fragments can be used to link repetitive segments to a general location. The 454
Titanium FLX instrument can perform extra long reads of 400-600 bp to eliminate the need for paired
ends [16]. Sequencing with short reads is generally cheaper and offers higher throughput, but problems
arise in de novo fragment assembly. Assembly from short read data is usually accomplished with the help
of a reference genome [17], [18]. The top two de novo assemblers are Beijing Genomics Institute's
SOAPdenovo and the Broad Institute's ALLPATHS-LG [19].
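
The overlap search mentioned above can be illustrated with a minimal suffix-prefix comparison. This is a naive sketch for intuition only, not the method used by SOAPdenovo or ALLPATHS-LG; the function name, the `min_len` parameter, and the toy reads are all made up for the example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the best overlap is shorter than `min_len`.
    """
    longest_possible = min(len(a), len(b))
    for k in range(longest_possible, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


read = "ACGTTAG"
candidates = ["TAGCCA", "TAGGGT", "CCCCCC"]
for other in candidates:
    print(read, other, overlap(read, other))
```

Here two different candidates overlap the read equally well through the same short "TAG" suffix, so the overlap alone cannot say which one follows it. That is the ambiguity repeats create, and it gets worse as reads get shorter.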
The good people from NCSU and the David H. Murdock Institute probably used the 454 FLX system to
sequence the Vaccinium corymbosum genome de novo. Longer reads produced by this system facilitate de
novo assembly. They probably used an Illumina system to resequence and examine the quality of the
assembly. This resequencing also increases coverage. The authors of "The genome of woodland strawberry
(Fragaria vesca)" in Nature Genetics used a combination of Roche/454, Illumina/Solexa, and Life
Technologies/SOLiD to sequence and resequence [20]. Curiously, Illumina advocates the use of the
Genome Analyzer for de novo sequencing. Illumina points to reads of over 100 bp achieved by some
researchers. I question the accuracy of these reads. The company promotes the use of Velvet, a de Bruijn
graph-based assembly program [21].
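
For readers unfamiliar with the idea, a de Bruijn graph breaks every read into k-mers and links each k-mer's first k-1 bases to its last k-1 bases. The sketch below shows only that construction step under simplifying assumptions (error-free reads, one strand, no coverage tracking). It is not Velvet's implementation, and the function name is hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])   # edge: prefix node -> suffix node
    return dict(graph)


graph = de_bruijn(["ACGT", "CGTA"], k=3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Overlapping reads share k-mers, so they merge into the same path without any pairwise read comparison. Repeats longer than k-1 bases collapse onto the same nodes, which is one way to see why repetitive regions remain difficult.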

(image from [22])


(image from [23])


Single Molecule Real Time Sequencing
The future? No PCR? Direct detection of methylated segments? [24] No clustering or washing is necessary,
so long, cheap reads are possible. DNA libraries are incorporated into SMRTbell constructs of ligated
circularized DNA bound to a single DNA polymerase molecule [25]. A laser is shone through the glass
covering the zero-mode waveguide (ZMW), exciting only the bottom 30 nm of the ZMW, where the
polymerase is actively incorporating fluorescently labeled nucleotides. Interference from non-incorporated
nucleotides in the micropore is minimized. Accuracy of sequencing is still a major issue.
A team at Harvard used PacBio's sequencing methods to sequence five strains of Vibrio cholerae at varying
coverages (between 20X and 60X per strain) over two days. That translates to a throughput of
approximately 368 Mb/day (for two strains). That throughput is on par with or worse than 454's throughput.
The authors of the paper do not indicate the cost of sequencing, the assembly time, or the raw error rate.
By matching PacBio raw reads to the Vibrio cholerae reference genome, Dr. Elemento determined a raw error
rate of 20 percent [26]. The average read length was more impressive: 954 bp. No paired ends and no
reference genome are necessary for assembly [27]. With a throughput of only 5.3 Mb per 30-minute run, the
single molecule method is still toddling around. We will see how this technology progresses.
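
The throughput figures quoted in this document use different time bases (per 8 hours, per day, per 30-minute run). One rough way to put them side by side is to normalize each to Mb/day. The sketch below assumes, unrealistically, that runs can be started back to back with no setup time or downtime, so the results are ceilings rather than measured rates. The function name is hypothetical.

```python
def mb_per_day(megabases, hours):
    """Scale a per-run yield to a 24-hour day, assuming back-to-back runs."""
    return megabases * 24.0 / hours


# Per-run figures as quoted in the text above.
quoted = {
    "454 FLX (100 Mb in 8 h)": mb_per_day(100, 8),
    "Illumina Genome Analyzer (600 Mb in 24 h)": mb_per_day(600, 24),
    "PacBio (5.3 Mb in 0.5 h)": mb_per_day(5.3, 0.5),
}
for platform, rate in quoted.items():
    print(f"{platform}: {rate:.1f} Mb/day")
```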

(image from [28])
