
SCIENCE CHINA

Life Sciences
January 2010 Vol.53 No.1: 44–57
doi: 10.1007/s11427-010-0023-6
Celebrating the 60th Anniversary of Scientia Sinica (SCIENCE CHINA)

· REVIEW ·

The next-generation sequencing technology: A technology review and future perspective

ZHOU XiaoGuang1†*, REN LuFeng1†, LI YunTao2, ZHANG Meng1, YU YuDe2 & YU Jun1*
1 Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China;
2 Institute of Semiconductor, Chinese Academy of Sciences, Beijing 100083, China

Received December 8, 2009; accepted December 16, 2009

As one of the most powerful tools in biomedical research, DNA sequencing has not only improved its productivity at an exponential rate but has also evolved into new technological territory, drawing on engineering and the physical disciplines, over the past three decades. In this technical review, we look into the technical characteristics of the next-gen sequencers and provide prospective insights into their future development and applications. We envisage that some of the emerging platforms will be capable of supporting the $1000 genome and $100 genome goals if given a few years for technical maturation. We also suggest that scientists from China should play an active role in this campaign, which will have a profound impact on both scientific research and societal healthcare systems.

genomics, DNA sequencing, next generation sequencing technologies, sequencer

Citation: Zhou X G, Ren L F, Li Y T, et al. The next-generation sequencing technology: A technology review and future perspective. Sci China Life Sci, 2010,
53: 44–57, doi: 10.1007/s11427-010-0023-6

† Contributed equally to this work
* Corresponding author (email: junyu@big.ac.cn; joezhou@big.ac.cn)
© Science China Press and Springer-Verlag Berlin Heidelberg 2010 life.scichina.com www.springerlink.com

1 Introduction

DNA sequencing technology has played an essential role in the advancement of molecular biology ever since its invention [1]. From the early manual sequencing operation developed by Frederick Sanger, through the first-generation automated sequencers driven by Sanger chemistry, to the present next-gen sequencing platforms, we have witnessed tremendous changes in this field [2]. Some even liken this change in genomic sequencing to the evolution of semiconductor technology [3]. This is not totally unfounded: the speed of sequencing has improved exponentially every few years over the last few decades, similar to what the semiconductor industry has experienced under Moore's law [4]. This rapid transformation is captured in Figure 1. It has fundamentally changed the way we can examine the blueprint of all life and has helped to propel the creation and development of other branches of genomic studies, such as comparative genomics and bioinformatics, as well as closely related fields such as systems biology and synthetic biology. In a way, technological advancement in DNA sequencing has transformed the study of the fundamental elements of life, from individual, localized genes or gene fragments to whole genomes, which in turn demands more competent sequencing technology. The synergetic relationship between sequencing and its applications ensures that the trend will continue in the foreseeable future and even accelerate, owing to the promise of and drive for personalized medicine in disease diagnosis and treatment. Here, we provide a review of the evolution of sequencing technology, a summary of generational advancements with their merits and drawbacks, and a prediction of the possible direction of the field. For ease of discussion, we categorize the progress of sequencing technology into three generations with sub-generations, as illustrated in Table 1. The designation of each state of advancement is somewhat arbitrary, but it nevertheless captures the key delineation of the technological advances of each period.

Figure 1 Sequencing technology timeline

Table 1 Roadmap of sequencing technology a)

Generation    1st-G        2nd-G            3rd-G
Version       1.1  1.2     2.1  2.2  2.3    3.1  3.2

Platform
Sanger        ABI; ABI/GenoMEMS
SBS           Illumina
SBL           ABI/Polonator G.007; Complete Genomics
SBP           Roche
SM-SBS        FD: Helicos; FE: Pacific Biosciences/VisiGen; ?
SM-SBL
SM-SBP
Pore          PoC
Nano Knife    PoC
Graphene      PoC

a) SBS, sequence-by-synthesis; SBL, sequence-by-ligation; SBP, sequence-by-pyrosequencing; SM, single molecule; FD, fixed DNA; FE, fixed enzyme; PoC, proof-of-concept; ?, expected technology

2 A review of the technology and its recent developments

2.1 1st generation technology – fluorescently labeled Sanger method

Before the appearance of the first automated DNA sequencing platform, the widely accepted DNA sequencing method of choice had been Sanger's chain-termination method, developed in the mid 1970s, for which Sanger was awarded the Nobel Prize in Chemistry in 1980 [5]. His invention opened a realm of possibility for researchers, for the first time, to study in depth the genetic code of life.

The original method was primarily a manual endeavor and hard to automate. For one, it used radioactive isotope labeling of the primer for DNA ladder imaging, making the sequencing process user-unfriendly. The requirement for four separate chain-termination reactions with dideoxynucleotides (ddNTPs), and the subsequent slab-gel separation of the chain-terminated products in four individual electrophoretic lanes, are both time- and reagent-consuming. All these factors severely limited the overall throughput of sequencing; hence the desire to develop a non-radioactive 1st generation sequencing technology.

2.1.1 G1.1

The initial version of the 1st generation sequencer first appeared in the mid 80s and was developed in Leroy Hood's laboratory at Caltech [6]. It was made possible through modifications to Sanger's method. The key improvement is the use of colored fluorescent dyes to replace radioactive labeling: the four dideoxynucleotide terminators are tagged with differently colored fluorophores. Furthermore, the tag is attached to the terminator molecule (ddNTP) instead of the primer, as was the case in the original Sanger method. The color-coded scheme made it possible to perform all four chain-termination reactions in one tube. Polyacrylamide gel analysis of the ladder fragments can be performed through a computerized fluorescence detection system. This greatly enhanced the overall sequencing speed and reduced the manual intervention required of the operator during a sequencing run.

In the following year, ABI introduced its first semi-automated DNA sequencing platform, the ABI 370 Sequencer, based on the technology from Leroy Hood's lab [7]. The subsequent two decades saw rapid change and improvement in its performance. But the underpinning working principle has not changed until very recently.

2.1.2 G1.2

Toward the end of the last century, the second version of the 1st-generation technology appeared. With it, we saw further enhancement in the speed and quality of DNA sequencing. This was mainly achieved through improvements in two areas. First, slab-gel separation was replaced by capillary electrophoresis; second, the number of samples that can be analyzed concurrently was increased through higher parallelism. The use of capillaries instead of slab gels eliminated sample loading, reduced reagent consumption, and sped up analysis. Further, the compact form of the capillary device makes it easier to parallelize multiple sequencing runs, resulting in higher instrument throughput: 96 samples on the ABI 3730 platform and 384 samples on the Amersham MegaBACE could be processed in one run. This generation of sequencers played a pivotal role in DNA sequence production at the later stage of the Human Genome Project and helped to accelerate the project's completion. They are still in use to this day because of their key advantages in raw data accuracy and sequence read length.

Through decades of gradual improvement, the 1st-generation sequencer can achieve read lengths up to 1000 bp, with raw accuracy as high as 99.999%, at a cost as little as $0.50/kilobase and a throughput close to 600000 bp/day. However impressive those numbers may seem, the 1st-gen technology has reached its pinnacle in terms of both speed and cost. Reliance on electrophoretic separation makes it difficult for this approach to further increase analysis speed, to achieve a higher degree of parallelism, and to reduce sequencing cost through miniaturization; hence the need to develop a completely new generation of approaches that overcome those limitations.

That said, the 1st generation technology is not going to disappear any time soon. It will co-exist with the next generations of sequencing platforms. The reliability, raw accuracy, and scalability of the tried-and-true method will continue to play an important role, especially in sequencing PCR products and the clone ends of plasmids and bacterial artificial chromosomes, as well as in genotyping for STR markers.

2.2 2nd generation technologies – cyclic array sequencing by synthesis

The so-called next generation sequencing methods encompass a myriad of approaches based on different technologies. Although they use quite diverse techniques and biochemistry in each step, from template library preparation and fragment amplification to sequencing, they all adopt a massive matrix configuration popularized by microarray analysis: DNA samples on the array are analyzed simultaneously in parallel. Furthermore, sequencing is carried out by observing and recording optical events through a microscopic apparatus during iterative sequencing cycles, i.e. a serial extension of the primed template by either DNA polymerase [8] or ligase [9].

Several key characteristics can be easily observed from this general description. First, massive parallelism can be achieved through an ordered or disordered array configuration that offers a high degree of information density. Theoretically, this is limited only by the diffraction limit of light (i.e., half of the wavelength used for detection of independent optical events). This dramatically increases the overall throughput of the sequencing operation. Second, no electrophoresis is used, resulting in ease of miniaturization and lower sample/reagent consumption than with the 1st-generation technology.

Figure 2 Workflow of next-gen sequencing

2.2.1 Next-gen Sequencer

All next-gen sequencing platforms follow a similar workflow, as outlined in Figure 2, and require clonal amplification to enhance optical detection for sequencing. Three widely used next-gen commercial platforms are Illumina Genome
Analyzer, Roche 454 Genome Sequencer, and Life Technologies SOLiD System. They were all invented and developed towards the end of the 1990s and commercialized in the middle of the first decade of this century. A relative newcomer in this arena is the Polonator G.007, initially developed in George Church's lab at Harvard and now manufactured by Dover Systems. Complete Genomics has recently introduced a sequencing service platform based on its proprietary sequencing technology, although it has not indicated any intention to market this instrument. All of them use sequencing-by-synthesis, with variations in DNA array formation, cluster amplification, and enzyme-based sequencing biochemistry.

First, a DNA template library is constructed. DNA library fragments are prepared either from randomly sheared genomic DNA (tens to hundreds of bp in size) or, alternatively, as paired-end fragments with a controlled distance distribution. The double-stranded fragments are ligated with adaptor sequences at both ends and denatured. The resulting single-stranded template library is immobilized on a solid surface, either a planar surface or supporting beads, and clonally amplified by one of several means, e.g. bridge PCR [10], emulsion PCR [11], or in situ polonies [12]. The DNA clusters or amplified beads form an array on a slide, which then undergoes cyclic manipulation by an enzyme such as polymerase or ligase. Optical events generated from the cyclic chain extension process are monitored by a microscopic detection system, and images are recorded through a CCD camera. Sequential analysis of the array images yields DNA fragment sequences, which are assembled into larger sequence contigs by computer algorithm.

(1) Illumina Genome Analyzer. The amplification of single-stranded library fragments is carried out through a process coined "bridge amplification" [13]. On an oligo-derivatized flow-cell surface consisting of eight independent lanes, single-stranded DNAs flanked by asymmetrical adaptors form oligo bridges from both ends. After multiple PCR thermal cycles, thousands of copies of DNA (amplicons) derived from a single DNA fragment are created and clustered at a single physical location on the surface. Millions of such clusters can be produced in each of the eight independent flow-cell channels (as such, eight independent libraries can be analyzed in parallel during one instrument run). A sequencing primer is then hybridized to a universal sequence in the amplicons to start the sequencing run.

Illumina's GA uses sequencing-by-synthesis with fluorescently labeled nucleotides and reversible terminators. In each cycle of sequence interrogation, four distinctly labeled nucleotides are added simultaneously to the flow-cell channel together with DNA polymerase, giving rise to DNA chain extension according to the base-pairing rule. Each nucleotide is 3′-OH blocked to prevent further addition. A fluorescence image is acquired. The base-unique fluorescence reveals the identity of the newly incorporated nucleotide for each cluster and, therefore, the sequence of the corresponding template DNA. The 3′ end is then unblocked to allow the next cycle of extension to occur. This process repeats multiple times, up to 50 cycles, to yield a DNA read length of 50 bp.

The throughput of the platform can be a thousand times higher than that of the conventional, i.e. 1st-gen, sequencer platform. The main drawback is its relatively short read length, caused by optical signal decay and dephasing. Since the optical signal is acquired from each DNA cluster, it is critical that all strands of DNA in an ensemble grow in unison. However, each step of the sequencing chemistry could fail, e.g. failure to cleave the fluorescent tag and/or to remove the blocking group. This leads to some DNA strands extending out of sync with the other strands in the ensemble or failing to extend altogether, causing signal decay or dephasing of the fluorescent signal. Furthermore, the error rate is cumulative, i.e. it increases as the DNA strand gets longer, limiting the length of the sequencing read.

(2) Roche 454 Genome Sequencer. The 454 Sequencer uses emulsion PCR to yield the amplicons used for the sequencing procedure [14]. Tiny paramagnetic beads coated with DNA primers are mixed with the single-stranded template DNA library together with the components necessary for the PCR reaction. Beads and library fragments are mixed in proportions that ensure most beads carry no more than one ssDNA molecule. The aqueous solution is mixed with oil to form an emulsion in which each water compartment forms an independent micro-reactor for the subsequent PCR chemistry. After multiple rounds of thermo-cycling, each bead is coated with thousands of copies of DNA of the same sequence. The beads are further enriched, transferred, and deposited on a picotiter plate fabricated as an organized array of tiny wells, with each well occupied by only one bead. The picotiter plate is engineered as part of the flow cell for the sequencing chemistry on one side and is bonded to optic fibers as part of a CCD-based optical detection system on the other.

The base interrogation operation is sequencing-by-synthesis that taps into pyrophosphate chemistry to produce an optical signal for detection [15]. Pyrosequencing, as it is often called, relies on the enzymes ATP sulfurylase and luciferase. The release of pyrophosphate during the incorporation of a nucleotide triphosphate into the DNA chain triggers a cascade of biochemical reactions via ATP sulfurylase and luciferase, resulting in the emission of a burst of biochemiluminescent light. Sequencing is achieved by sequentially introducing each of the four dNTPs into the flow cell. The presence or absence of a light burst in each picotiter well indicates whether the corresponding nucleotide was incorporated and, therefore, reveals the identity of the complementary base on the template DNA in that well.

The major advantages of pyrosequencing are its speed and read length (up to 500 bp). Unlike the other next-generation technologies discussed here, pyrosequencing does not need to carry out extra chemistries on the extending DNA chain beyond the normal biochemical process of DNA polymerase, e.g. there is no need to remove a label moiety and/or de-block a terminator. This reduces the chance of mishap in the chemical reaction and, therefore, the chance of premature chain termination or out-of-sync extension, which is the major cause of dephasing. However, this asynchronous processing poses a limitation for pyrosequencing when going through a homopolymer region, e.g. GGGGGG in a row, where no terminating moiety can stop the extension run. The length of the homopolymer has to be inferred from the optical signal intensity, which is prone to error. As a consequence, the dominant error type on this platform is insertion-deletion rather than substitution. Another drawback of 454 is its relatively high reagent cost compared with other next-gen sequencers, owing to its reliance on a set of enzymes for the pyrophosphate detection chemistry.

(3) Life Technologies SOLiD System. Like 454, the SOLiD system employs emulsion PCR with paramagnetic beads as its DNA template amplification scheme. After the emulsion is broken, the amplified beads are collected, enriched, and fixed on a flat glass substrate to create a disordered array.

Its sequencing-by-synthesis is driven by ligation rather than polymerization as in the previous platforms [16]. Furthermore, it employs a dual-base encoding scheme to assist error detection. A universal primer complementary to the adaptor region of the template library DNA on the bead is hybridized. A series of ligation cycles follows: each ligation occurs between the extending chain and a pool of degenerate octamer probes, fluorescently labeled at the 8th position. The octamer pool is structured such that the bases at the probing positions, e.g. positions 1 and 2, correlate with a specific fluorescent color. After ligation, a fluorescent image is acquired. The octamer is then chemically cleaved between positions 5 and 6 to remove the last three bases together with the fluorescent label. Progressive rounds of ligation enable interrogation of the 1st and 2nd positions of every five bases along the extended chain (i.e., 1-2, 6-7, 11-12, 16-17, 21-22, 26-27, and 31-32). After 7 rounds of ligation, the extended chain is lifted away and the system is reset. A second primer, set back by one base, is annealed to the adaptor region. This is followed by another 7 rounds of ligation/interrogation cycles as described above, which enables interrogation of a new set of positions at 0-1, 5-6, etc. This process continues with successive resets using primers each shortened by one further base, each followed by ligation cycles, until the entire sequencing region is covered. This approach sounds complicated on paper. In reality, it is all driven by a computerized system and fully automated. Since each base is measured twice, i.e. in two separate ligation cycles, this approach has the added advantage of identifying miscalls during sequencing [16]. The major limitation is its relatively short read length, also caused by dephasing in an ensemble.

(4) Polonator G.007. The Polonator is another next-generation sequencing platform that uses sequencing-by-ligation. Its implementation uses a single-base probe instead of the dual-base encoding approach described above. The sequencing assay is carried out through a series of ligation reactions between a universal primer and a nonamer probe on an emulsion-PCR-amplified DNA cluster [9]. Each pool of nonamer probes, consisting of degenerate oligonucleotides whose fluorescent label correlates with one query position (i.e. the fluorescence color corresponds to the base at the interrogation position), is successively introduced together with DNA ligase to carry out primer-probe ligation. After each ligation cycle, a fluorescent image is taken. The extended primer-probe chain is then denatured away to reset the system. Ligation then occurs between the primer and the second pool of degenerate oligo-probes for the next query position. This reset-ligation-imaging process repeats until all positions are interrogated.

In this system, since no consecutive ligation is required after each reset, sequencing error is not cumulative. This is one of the advantages of the system. However, it does limit the reach of the query position from each primer location and, therefore, results in a shorter read length. This shortcoming can be somewhat mitigated by using multiple anchor locations in the library sequence to extend the reach. The Polonator is substantially lower in instrument price than other commercially available next-generation systems. It is also an open-source platform, which means that the sequencing operation and/or chemistry of the instrument can potentially be altered and enhanced by end users.

(5) Complete Genomics. Complete Genomics uses a sequencing-by-ligation approach in much the same way as the Polonator does. However, it incorporates a unique design to increase the density of DNA clusters on the slide surface and to reduce reagent consumption [17]. To increase the read length, multiple (four) adaptors flanking genomic fragments are inserted through sequence manipulation to form a circular DNA template [18]. The template sequence is then multiplied through circular PCR to make concatemers containing two hundred copies of the original template sequence. The concatemer is folded into a ball structure called a DNA nano-ball (DNB). Each ball then self-assembles onto a planar substrate surface patterned with sticky (or activated) spots to form a dense array of DNBs. DNBs do not stick to the areas between activated spots on the slide, leaving an orderly arranged matrix of DNBs. This provides a much denser array on the surface than those created from clusters or deposited beads, because DNA nano-balls use three-dimensional space more effectively. It also eliminates the use of a flow cell, which all the other next-gen sequencing machines rely on [19].

The sequencing assay is carried out in a similar fashion to that of the Polonator, i.e. sequencing by ligation with a single query position, coined cPAL (combinatorial probe-anchor ligation). cPAL sequencing uses pools of fluorescently labeled degenerate oligo-probes with four distinct colors correlating to the four types of base at a given position. There is a separate pool for each reading position. A given
pool of probes is ligated with the anchor according to the base-pairing rule at the query position, with the fluorescent color of the probe correlating to the query base. After each read, the probe-anchor complex is washed away. Another anchor is hybridized and the next pool of probes, for a different position, is cycled in. The process repeats until all positions are read. A recent publication from Complete Genomics has shown its sequence accuracy and cost-effectiveness based on the sequences of three human genomes [20].

There is no chaining of consecutive probes as in the ABI SOLiD System. This offers a couple of advantages. First, there is no memory effect: any error made in a prior ligation cycle does not carry forward, resulting in better fault tolerance. Furthermore, the ligation yield of each cycle does not have to be high, reducing the amount of reagents, e.g. probe and anchor, needed for the assay. However, the read length from each anchor position is still short owing to the limited length of the oligo-probes (9-mers). The overall read length is extended by using four separate anchor locations in each library fragment.

The next-generation sequencing platforms based on sequencing by synthesis described above dramatically increase the speed and reduce the sequencing cost per base relative to the 1st-generation platforms. It is common to see these platforms churn out gigabases of sequence data per day per instrument at a cost of only a fraction of a cent per kilobase. However, short sequence reads are the Achilles' heel of all next-gen platforms except the Roche 454 Sequencer. This is mainly attributed to the dephasing problem of the optical signal in a DNA sequencing cluster. One solution would be to eliminate the ensemble effect altogether by sequencing on a single molecule.

2.2.2 Single Molecule Sequencing (SMS)

To overcome one of the major drawbacks of next-gen sequencing technology, the relatively short read, efforts have been made to develop single molecule sequencing platforms, in which sequencing by synthesis is performed on an array of single DNA molecules. Working with single molecules also helps to increase the number of DNA fragments that can be independently analyzed in a given surface area and, therefore, achieves a much higher level of throughput. Of course, it also means that no costly cluster amplification step is required, further reducing sequencing cost. This, however, introduces a new set of challenges, mostly in the area of optical detection of single-molecule events. The major issue is to reduce non-assay-specific fluorescent background interference, e.g. from free-floating fluorescent molecules that are not participants in the actual chemical reaction. Several approaches have been implemented in an attempt to address this challenge. The underpinning principle of all of them is to limit the detection volume to the vicinity of the actual site of the sequencing reaction, for example by means of an evanescent electromagnetic wave. In the next few sections, we will take a look at a few platforms that have been developed or are currently under development.

(1) Helicos HeliScope. The HeliScope Genetic Analysis System by Helicos Biosciences, based on the work of Quake's group [21], is the first single molecule sequencing system to appear on the market, having been released very recently. It uses sequencing by synthesis on single molecules. The constructed single-stranded DNA library is arrayed in a disordered fashion on a flat substrate without any amplification. In each sequencing cycle, DNA polymerase and one of four fluorescently labeled nucleotides are flowed in, resulting in template-dependent extension of DNA strands. Strands in the array that have undergone base extension light up through their fluorescent labels, which are recorded with a CCD camera. After washing, the fluorescent labels on the extended strands are chemically cleaved and removed. Another cycle of single-base extension, label cleavage, and imaging can then begin. As in pyrosequencing, each iterative cycle is asynchronous: some strands in the array may pull ahead, fall behind, or fail to extend altogether. Since each strand operates independently, dephasing is not a concern. This does mean, however, that homopolymer runs could be an issue, as is the case for pyrosequencing. But, unlike on the Roche 454 platform, the single-molecule format affords mitigation by manipulating enzyme kinetics to slow down the rate of chain extension so as to reduce the chance of two consecutive base incorporations before the dNTP is washed away [22].

As mentioned above, a key challenge with SMS is detection. HeliScope uses a fluorescence microscopy technique called Total Internal Reflection Microscopy (TIRM), in which only fluorophores within a very thin layer of reaction volume at the surface of a flow cell can be excited by the evanescent wave to produce fluorescence [23]. This helps to reduce the fluorescent background. But even with a state-of-the-art optical system, it is still often a challenge to capture a single-molecule event. Therefore, the raw sequencing accuracy of the platform suffers in comparison with its ensemble-detection-based predecessors, with the dominant error type being deletion. However, a two-pass strategy can substantially improve the accuracy. Working with a single molecule means that we can reset the tethered template DNA to its original state by lifting off the extended chain after one run of sequencing. This allows us to carry out another sequencing pass in the opposite direction from the distal adaptor, yielding a second sequence of the same template. The duplicated sequences can then be used to average out detection errors and, thus, give much higher accuracy than a single pass.

(2) VisiGen. VisiGen Biotechnologies, now part of Life Technologies, has also been working on an implementation of single molecule sequencing by synthesis [24]. In a nutshell, they engineered a protein nano-device to observe and record the DNA synthesis process carried out by DNA polymerase in real time. This is achieved through FRET (Fluorescence Resonance Energy Transfer) between a fluorescence donor and an acceptor (receptor). In FRET, the acceptor molecule does not emit fluorescent light when excited except when there is an energy
donor nearby. In their setup, each dNTP carries a fluorescent acceptor (receptor) moiety of a distinct color at its gamma-phosphate, and the DNA polymerase is engineered to carry a fluorescent donor moiety close to its active site. During DNA base extension, a matching dNTP is grabbed by the DNA polymerase, which brings the fluorescent acceptor close to the donor group. Fluorescence energy transfer occurs, giving off fluorescence of the corresponding color. Once incorporation is done, the fluorescent moiety is released by the DNA polymerase as part of the pyrophosphate. This, in essence, creates a short burst of fluorescent light in concert with nucleotide incorporation. By tracking and analyzing the sequential light bursts, the DNA sequence can be constructed. Note that there is no pause in the process to remove a fluorescent label or to cleave a blocking group: it is a true real-time process. This means that it can be done at tremendous speed, provided that the optical recording apparatus can keep up. To further reduce background interference, they also apply TIRM in the fluorescence detection setup. Furthermore, unlike the Helicos platform, this system fixes the DNA polymerase, instead of the DNA strand, on a substrate surface; the extending DNA strand grows untethered. The benefit of immobilizing the enzyme instead of the DNA comes from keeping the nucleotide incorporation event within the detection volume as the DNA extends, i.e. the fluorescence does not move out of sight as the DNA chain grows. Although Life Technologies has not said much about the instrument's performance, we can surmise that it is going to be fast and to give long read lengths. Theoretically, the read length is limited only by the processivity of the DNA polymerase. In reality, it might be restricted by other factors such as photo-bleaching of the fluorescence donor molecule attached to the DNA polymerase. It has been reported that Life Technologies is working on a kind of quantum-dot fluorescent label to overcome this problem.

(3) Pacific Biosciences. Pacific Biosciences is another company that has been working to develop a new generation of sequencing technology, the SMRT (Single Molecule Real Time) technology [25]. Its single-molecule sequencing-by-synthesis relies on a nano-structure called Zero

fluorescence burst reveals the identity of the complementary base on template DNA. By continuously following the bursts of fluorescence from each waveguide in real time, the sequence of the template DNA in each hole can be rapidly determined.

The PacificBio technology has great potential for high-speed sequencing with long read length and low sequencing cost. But errors stemming from the challenges of real-time single-molecule detection might put a damper on its raw accuracy, as with other SMS platforms. This can be mitigated through multiple runs on the same samples after reset, as described in the previous section. Also, current CCD chip technology limits the maximum area of simultaneously observable ZMWs. The low yield (~30%) of polymerase-occupied ZMWs also puts a limit on the number of usable waveguides [27]. All these factors limit the throughput potential of the SMRT technology. Even with these limitations, the first version of the instrument, when introduced in 2010, is promised to have a read length of no less than 1500 bp, at a speed of 15 min per run, and with a reagent cost of no more than $60 per run. It is anticipated that future versions of this platform, after those technical issues are resolved, could churn out 100 gigabases of data per day with read lengths up to 100000 bp.

(4) Mobious Nexus I. Other than announcing its intention to develop a single molecule sequencing platform, Mobious Biosystems has not said much about the inner workings of its technology, Polykinetic Sequencing [28]. This technology exploits the natural chemical behavior of polymerase during DNA synthesis. For a polymerase to incorporate a nucleotide into a growing DNA chain based on its template, it needs to test a nucleotide from solution to determine its complementarity. If the nucleotide is not a match, it is released immediately. If it is a match, the polymerase will hold on to it and continue through the time-consuming steps that add the nucleotide to the growing chain. It is this time difference for each nucleotide that is exploited in Mobious's sequencing-by-synthesis approach. The four nucleotides are introduced into the reaction volume sequentially, one at a time. By
Mode Waveguide (ZMW) for real-time observation of DNA measuring the time DNA polymerase (fixed on substrate
polymerization [26]. It consists of thousands upon thou- surface) hold on to the nucleotide to complete polymeriza-
sands of sub-wavelength holes, tens of nanometers in di- tion, matching or non-matching nucleotide can be inferred.
ameter, fabricated by perforating a thin metal film supported Detection methods which can capture polymerase’s con-
by transparent substrate. When illuminated from the side of formational change, for instance, can be used in this regard.
glass, light cannot penetrate through the hole because the Fluorescence resonance energy transfer (FRET) with pair of
dimension of each hole is less than the wavelength of illu- donor and receptor strategically mounted on DNA poly-
minating light. This leaves an exponentially-decayed eva- merase can certainly be used for this purpose as VisiGen
nescent wave at very bottom of each hole, creating a very does. The key problem with fluorescence, however, is
small volume of detection. During sequencing assay, dou- photo-bleaching of fluorophore. To get around this problem,
ble-stranded DNA is synthesized from single-stranded tem- Mobious exploits electromagnetic property variation as en-
plate by polymerase planted at the bottom of each zyme conformation changes, through plasmon resonance
waveguide, one per hole. Each time a base being added, spectroscopy, nuclear magnetic resonance, etc. [29]. One
polymerase locks on a fluorescently labeled dNTP (also added advantage of this approach, besides those with sin-
attached to gamma-phosphate position) and brings it to the gle-molecule sequencing, is no fluorescently labeled nu-
detection volume, creating a burst of light. This color-coded cleotides are used - a potential reagent cost reduction.
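
The dwell-time idea described above for Mobious can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the company's actual method, which has not been disclosed: the function name `call_bases`, the threshold value, and the dwell times are all hypothetical, and the sketch assumes one incorporation per cycle of four nucleotide introductions, ignoring homopolymers and measurement noise.

```python
from typing import Dict, List


def call_bases(cycles: List[Dict[str, float]], threshold_ms: float) -> str:
    """Call one incorporated base per cycle from polymerase dwell times.

    Each cycle maps a nucleotide (introduced one at a time) to the time in
    milliseconds the polymerase held on to it. All numbers are hypothetical.
    """
    called = []
    for dwell in cycles:
        # A matching nucleotide is held much longer than a mismatch,
        # so pick the nucleotide with the longest dwell time.
        base, longest = max(dwell.items(), key=lambda kv: kv[1])
        # If no nucleotide was held long enough, report an ambiguous call.
        called.append(base if longest >= threshold_ms else "N")
    return "".join(called)


# Hypothetical dwell times (ms) for three cycles of four introductions each.
cycles = [
    {"A": 0.4, "C": 0.3, "G": 12.0, "T": 0.5},
    {"A": 9.5, "C": 0.2, "G": 0.6, "T": 0.4},
    {"A": 0.3, "C": 0.5, "G": 0.4, "T": 0.6},  # no long dwell: ambiguous
]
incorporated = call_bases(cycles, threshold_ms=5.0)
```

The called bases are those incorporated into the growing strand, so the template sequence would be their complement. A real detector would have to set the threshold from the measured dwell-time distributions of matched and mismatched nucleotides rather than use a fixed value.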

2.3 Generation Technologies – Direct Sequencing

In all the abovementioned 2nd-generation technologies, sequence is determined by indirect interrogation of nucleobase incorporation by either DNA polymerase or DNA ligase, through optical events generated during synthesis, often assisted by fluorescence or chemiluminescence. Besides requiring an expensive optical detection system, large numbers of optical images have to be recorded, stored, and analyzed, adding to the complexity and cost of sequencing. Reliance on biochemical reactions for base interrogation further adds to the cost of consumables, which account for a substantial fraction of current sequencing expense. Direct sequence interrogation, where no chemistry is required, is highly desirable to further reduce sequencing cost. What is going to happen next is hard to predict with great certainty in a field that has been changing at breakneck speed. However, several areas of recent research with a great amount of activity offer us some hints on potential technologies from which future generations of sequencing platforms may emerge.

2.3.1 Non-optical Microscopic Imaging

As the saying goes, “a picture speaks a thousand words”. One of the most direct ways to determine a DNA sequence is to visualize the linear arrangement of the nucleotides (mostly the bases) in space. If a picture of a DNA strand can be taken with enough resolution to distinguish the four bases along a DNA chain, the sequence can be readily read off. This is precisely what researchers in the microscopy community have been attempting to do. The idea is to tap into powerful non-optical microscopes with resolving power down to the atomic level, such as scanning tunneling microscopy (STM), atomic force microscopy (AFM), etc. [30,31]. Success so far has admittedly been limited, but progress has been made. Recently, researchers from a group at Osaka University in Japan showed that scanning tunneling microscope imaging can be used to distinguish guanine from the other three bases by its distinct electronic fingerprint along a real stretch of DNA molecule [32]. Another group has been actively working on using an atomic force microscope to read off the distinct force required to pull the tightly fitted ring structure of each base [33]. In addition, ZS Genetics is working on an electron-microscopy-based approach to sequencing DNA. To address the insufficient contrast of natural DNA molecules under the electron microscope, they attempt to use nucleotides labeled with heavier elements to synthesize a new DNA strand by polymerase from the template DNA molecule [34]. The resulting heavier DNA strand can then be visualized under the electron microscope to construct the nucleobase sequence.

2.3.2 Nanopore

Another area where we have seen a flurry of activity is the use of nanopore structures for DNA sequencing. A nanopore, as its name implies, is a tiny hole with a diameter in the range of nanometers (1-2 nanometers), usually made of a membrane of solid-state material or biological molecules with a perforated hole. The idea is that when a DNA strand is threaded through the pore, driven electrophoretically, one can read off the bases as they pass through the pore opening by some electro-physical means. Various groups and organizations around the globe are exploring this idea: Agilent, DNA Electronics, IBM, NABsys, Oxford Nanopore Technologies, and Sequenom, to name just a few.

Two key challenges [35] facing all nanopore-based approaches are (i) distinguishing the four nucleotides in a time commensurate with the travel rate of the passing DNA strand, and (ii) controlling the speed of DNA translocation through the pore. Initial attempts to measure the fluctuation of ionic current (the variation in electric potential across the nanopore upon blockage) as the single-stranded DNA passes through the hole have so far yielded little success. Calculation and experiment have shown that measurement of ionic conductivity across the nanopore alone is unlikely to provide the resolution required to discern each nucleotide in a DNA molecule [36]. The channel length of a nanopore is usually more than 5 nm, spanning over a dozen nucleobases, which is too long to offer the single-base current resolution needed for sequencing.

Even though the ionic current measurement cannot distinguish individual bases, it can readily discern single-stranded from double-stranded DNA [37]. NABsys, in collaboration with a research group at Brown University, exploited this capability and is developing a sequencing-by-hybridization technology [38], Hybridization Assisted Nanopore Sequencing (HANS). Genomic DNA is randomly cut into fragments of about 100 kb in length, made single-stranded, and hybridized with a 6-mer oligonucleotide probe. Genomic library fragments bound with probes are then driven through an addressable nanopore array device. The ionic current across each pore is independently measured to create a current trace that shows the precise positions of the hybridized probes on each genomic fragment. Overlapping probe regions of genomic fragments are used to align the fragment library and create a probe map of the genome. This is repeated for all hybridization probes in turn, creating a complete set of probe maps of the genome. Using a computer algorithm, the entire genomic DNA sequence can then be constructed. The precision and consistency of this nanopore-based measurement of hybridized locations, however, still need to be demonstrated.

To enhance the sensitivity of the nanopore to features of the nucleobases, research groups are working on other approaches, including electrical probes embedded inside the nanopore structure [39]. They hope that a pair of tunneling electrodes abutting opposite sides of the nanopore can register a characteristic tunneling current as each base is driven through the pore. Both computer simulation and experience with scanning tunneling microscopy, which has been successfully used to reveal atomic-scale features, give us optimism that this will work. But fabricating a nano-device of
this scale is not a walk in the park. Another creative solution to the problem is to develop a chemically functionalized nanopore. Instead of embedding solid-state electrodes, Lindsay and colleagues [40] have proposed using two chemical probes to form hydrogen bonds with the phosphate group (the grabber) and the base moiety (the base reader), respectively, as each nucleobase passes through. Four different reader probes would be required for the four nucleotides in this approach.

Other groups are working on an alternative to the solid-state nanopore: the biopore. One group uses an engineered MspA protein to construct a bio-nanopore for the analysis of ssDNA [41]. They demonstrated that ssDNA can be threaded through this biopore, showing its potential for single-molecule sequencing. Oxford Nanopore Technologies, collaborating with the University of Oxford, is working on another protein-engineered nanopore. Through genetic engineering, they have been able to construct a biochemical nanopore by covalently attaching an aminocyclodextrin adaptor within the α-hemolysin pore in a lipid-bilayer membrane [42]. Recently, they demonstrated that when the four nucleotide monophosphates (dNMPs) are driven through the pore, the ionic current is reduced to one of four levels, each of which correlates with one nucleotide [43]. Coupling this discovery with the successive release of nucleotides from the DNA chain, which can be achieved with an exonuclease, offers another potential nanopore-based single-molecule sequencing technology. To make it happen, it is important to demonstrate that the exonuclease moiety can be mounted in a way that ensures delivery of the released nucleotide monophosphates into and through the pore in strict single file.

Aside from base detection, controlling the motion and speed of DNA translocation through a nanopore is also important and challenging. The speedy translocation of DNA through a nanopore holds the promise of ultra-fast sequencing. But if the DNA strand passes through the pore too fast, there may be too little time for each base to be accurately determined. The situation is made worse by the stochastic motion of the DNA molecule and non-specific interaction of bases with the nanopore surface [44]. All these add to the uncertainty in the rate at which the DNA molecule translocates through the nanopore. Although the travel speed of DNA can be reduced by lowering the temperature, increasing the solvent viscosity, and decreasing the potential bias across the nanopore, variation in velocity is still problematic. There have been various attempts and proposals to overcome this difficulty. One such idea [45] is to use a processive enzyme of some sort to bind the traversing DNA strand; this should substantially reduce the rate of translocation. More recently, a group at IBM announced an idea based on a nanopore device they termed the DNA transistor [46]: a nanopore embedded with metal layers, forming a metal-dielectric structure that can be modulated to trap a DNA molecule inside the nanopore. Their computer simulation shows that, by cyclically turning the gate potential on and off, it would be possible to move a DNA molecule through the nanopore one base at a time. This would give ample time to interrogate each passing nucleobase.

2.3.3 Graphene and Carbon Nanotube

Graphene is a single layer of carbon atoms arranged in a sheet. It is very strong and has great electrical conductivity, making it a very good candidate for the electrodes used in nucleobase interrogation. One idea [47] that has been proposed is to create a gap, about 1 nm wide, in the graphene sheet. A DNA molecule is guided vertically through the gap. As the DNA passes through, the two edges of the graphene sheet can act as electrodes to interrogate the nucleobases for sequencing. Besides the challenge of making such a small gap in a graphene sheet, controlling the orientation, motion, and speed of DNA translocation through the gap is also an issue.

The carbon nanotube (CNT) has been touted as having great potential to play an important role in rapid DNA sequencing because of its unique electro-physical properties and nanometric dimensions, although no working device has yet been fabricated. It has been shown that the surface of a CNT interacts strongly with DNA molecules, even in a sequence-specific way [48]. Long genomic ssDNA can wrap around a single-walled CNT to form a tight, stable DNA-CNT complex [49]. Computer simulation has demonstrated that the four types of nucleotides introduce distinct characteristic features in the local density of states [50], making the CNT a good candidate for the development of an electronic sequencing strategy on its own or in combination with other technologies. Most of these ideas are still in the proof-of-concept phase.

3 Future Perspectives

Having given an overview of sequencing technology development over three decades and of the potential breakthroughs that may follow, we now turn to the question of what is going to happen in the years to come. What is the outlook for the technical parameters of sequencing? Where is the technology for the thousand-dollar or the hundred-dollar genome going to come from ($1000 genome or TDG and $100 genome or HDG; the two goals refer to the total cost of sequencing a human genome and assume that the analysis is reference-based), and when will it happen? In this section, we try to provide our answers to some of these questions.

3.1 Technology Convergence

We do not know for sure which game-changing idea or disruptive technology will ultimately bring us to the promised land of the $1000 genome or the eventual $100 genome. One thing is strikingly clear when we look at the technological progression of three generations of sequencing platforms: the marriage of solid-state technology and
biochemistry. Furthermore, technological convergence is shifting from biochemical or chemical approaches toward physical interrogation, as illustrated in Table 2. We believe this trend will continue and that future generations of sequencing technology could eliminate biochemistry altogether; nanotechnology will play a bigger role.

Table 2 Technology convergence
Key technology   1st Gen   2.1-2.2th Gen   2.3th Gen   3rd Gen
DNA hybridization √ √
DNA polymerase √ √ √
PCR √ √
Electrophoresis √
Optoelectronics √ √ √
Microfluidics √ √
Micro/Nano-fab √ √ √
Single molecular detection √ √

3.2 Sequencing Throughput and Read Length

Throughput and read length have been a trade-off in sequencing platform selection. From the first-generation to the next-gen sequencing technology, we have seen a dramatic improvement in sequencing throughput, but at the expense of read length. As stated in previous sections, this is primarily the result of out-of-sync sequencing chemistry when extending DNA strands in an ensemble, the dephasing effect. Users are left with a choice between platforms with longer read length but low throughput, i.e. the first-gen, and those with short reads and high throughput, i.e. the next-gen. The situation will change with single-molecule sequencing, e.g. PacBio’s ZMW technology, which promises super-fast throughput with multi-kilobase read length. Assuming uninhibited optical detection capability, throughput and read length are limited only by the synthesis rate and processivity, respectively, of the DNA polymerase. Future generations of sequencing technology based on nanopores also promise long reads, as nucleotides are determined one by one while the DNA threads through the nanopore structure. In principle, the length of DNA that can be read in a single pass is limited only by the practicalities of threading a very long DNA strand through the pore without shearing. It has been demonstrated that ssDNA up to 5.4 kb can be threaded through a solid-state nanopore [37]. We can, therefore, readily anticipate that newer generations of sequencers will be capable of super-high throughput with read lengths surpassing those of Sanger instruments.

3.3 Sequencing Cost and Productivity – the Experience Curve

Over the last three decades, we have seen the cost of sequencing drop precipitously while sequencing throughput (or productivity) increased exponentially. Some have likened the dramatic change to that of the IT industry, as mentioned above. In fact, the speed of change in DNA sequencing has beaten the Moore’s law of the semiconductor industry in certain aspects. Figure 3 plots, on a log scale, the change over the years in sequencing cost (in US dollars per human genome) and productivity (in nucleotides per day per instrument). A couple of observations can be made from these plots. First, as expected, the cost of sequencing and the throughput have followed a roughly exponential drop and rise, respectively, over three decades. Closer examination reveals another interesting point: the slopes of both curves increased at an inflection point right around 2005, meaning that the rate of cost reduction and throughput enhancement has accelerated since then. This is the time when next-gen sequencing platforms started to make their way into laboratories. The near-linear slopes on both sides of the inflection point reflect the two distinct drivers of sequencing production improvement: one being the productivity increase and cost reduction within a technological generation, and the other propelled by technological breakthrough.

Figure 3 Sequencing cost and productivity

Productivity and cost relationship can be modeled by the experience curve (also known as the learning curve) effect. This effect states that the more often a task is performed, the lower the cost of doing it will be. It was first successfully used by Theodore Wright in the mid-1930s to quantify and project decreases in cost as a function of increased airplane production [51]. Since then, the model has been used to study the relation between productivity and cost in many industries. If we plot the sequencing cost since 2005 (on a log scale) against sequencing throughput (also on a log scale) over the same period, we obtain the experience curve for the 2nd-gen sequencing technology shown in Figure 4. The linearity of the plot is a typical experience curve effect: as productivity increases by a certain fold, the cost reduces by a certain percentage. Occasionally the linearity of an experience curve can stop abruptly. The discontinuity reflects an obsolete technology or process being replaced by a newer one. This is what we observed between the 1st-gen and 2nd-gen technologies right around 2005, as previously discussed. In the current generation, i.e. 2nd-gen, we have not seen the discontinuity point yet. So far, each order-of-magnitude increase in sequencing throughput since 2005 has translated into a reduction of 1.8 orders of magnitude in sequencing cost.

Figure 4 Experience curve of 2nd-generation

3.4 Thousand Dollar Genome (TDG) and Hundred Dollar Genome (HDG)

Extending the trend of our experience curve in Figure 4, we can conclude that the TDG and HDG goals might be reached when we can produce genome sequences at a rate of 20 Gb per day and 70 Gb per day per instrument, respectively, based on the 2nd-generation technologies (including SMS currently under development). At the observed slope of 1.8, this 3.5-fold increase in throughput corresponds to a roughly 10-fold reduction in cost (3.5^1.8 ≈ 9.5). For the TDG, it could happen pretty soon, within a year or two, if the current trend holds. For the HDG, it might happen in two to three years based on the same model.

One big caveat, of course, is that all these predictions are based on our experience curve model. As we already know, there are other factors that the experience curve cannot predict, such as a disruptive technological breakthrough or the lack of one. If we hit a wall with current technology before reaching the goals, all bets are off. The model we currently use for our projection will become obsolete, and a new one based on the replacing technology will have to be established. If that happens, how long it is going to take is anybody’s guess. But one thing is more certain: we should know the answer in a year or two. Based on the current trajectory of the experience curve, we project that the TDG could be attainable with 2nd-gen technology based on cyclic array sequencing by synthesis. This could come in a year or two. The HDG is harder to project, because the current generation of technology, even with its newer SMS, might not be technologically sufficient to lower the cost to $100. If that is the case, we may have to wait until a newer generation of disruptive technology comes along, e.g. the 3rd generation.

3.5 Cost of Sequencing to Customer

Are we really getting there this quickly? Are we really going to see a $1000 customer cost for human genome sequencing? Not so fast. All the recent sequencing cost data on which we base our analysis reflect, for the most part, consumable or reagent cost. Even with the same instrument platform, a surprisingly wide range of cost estimates can be generated [52]. One frequently under-appreciated cost is the downstream informatics analysis. All too often, people forget to include in the cost estimate or proclamation the time-consuming and labor-intensive analysis needed to turn the sequencing data generated by the instrument into well-annotated genome data. If one factors in all the associated costs of generating quality human genome data, e.g. consumables, machine amortization, maintenance, labor, and computation, the real cost could run many times higher than those numbers.

Besides the real cost of sequencing, we also have to consider market forces and include business operation costs. Even if we can get the sequencing cost down to the thousand-dollar range, initial market demand for such a service will probably keep the price higher than that: simple market supply and demand. Adding all these up implies that we may not see the $1000 genome in the real sense of the word in the very near future.

The real cost structure currently is that the reagent cost is artificially high, since both library construction and sequencing reaction reagents are fully controlled by the companies that provide the instruments. Until serious competitors who challenge the reagent monopoly enter the marketplace, the situation is not going to change in favor of the customers. Therefore, the customers’ job is to prioritize their sequencing resources, starting from the top of the to-do list, rather than to jump on the bandwagon right away.

3.6 Sequencing Operation Model

Another factor we need to consider on the issue of sequencing cost is economies of scale. In a way, this is factored in by the experience curve effect mentioned above: the more one does something, the less costly it becomes. This raises another interesting question: in which direction is the sequencing business going to go, production-oriented centers or distributed sequencing activity? We predict that market needs will diversify the operation models.

There are two operational models for DNA sequencing: as a routine laboratory technique or as a high-throughput operation. The former has clearly been observed for capillary-electrophoresis-based machines, such as the ABI-3730 XL, which are necessary for sequencing limited numbers of samples and/or for special tasks. It is predictable that the current next-gen sequencers will move to such a niche-based operation. For instance, we will soon see scaled-down versions of the current machines, such as those of Roche/454 and Illumina GA, on the market, which will be used for small-scale operations to satisfy individual labs’ needs, ranging from as small as a few bacterial genomes in a single machine run to as large as a human genome in a few runs. The large-scale machines will be pushed toward the capability of acquiring 20x-coverage sequence data for a haploid human genome, equivalent to 100-150 Gb per run.

We believe that most future large-scale DNA sequencing activities will shift to large commercial service centers or providers rather than being done in individual labs or even small institutions. As sequencing cost drops below $1000 per genome, if the current promise holds true, it becomes harder to justify the purchase of a half-million-dollar instrument (assuming the instrument price stays the same as now, a reasonable assumption based on what we have experienced so far). To quickly recoup the instrument cost and reach the break-even point, a large enough volume of sequencing jobs would be required. An efficiently operated, service-oriented sequencing provider with a large customer base would certainly have the financial advantage. Simply put, it is the economies of scale. This is, in a way, similar to the changes that we have seen in the IT industry: large Web hosting service providers have taken over the responsibility of hosting web sites for many organizations, large or small. Genomic sequencing service providers of this kind have already sprouted up, with companies such as Agencourt Bioscience, Cofactor Genomics, Complete Genomics, Knome, and SeqWright, to name just a few. This trend will continue. But eventually, many of them will consolidate into a few larger providers as the market matures.

3.7 Technology Coexistence

Generations of sequencing technology are not mutually exclusive: the emergence of a newer-generation technology does not mean that older platforms become completely obsolete. They often coexist because of the complementary functionality of different generations, as illustrated in Table 3. A case in point: the next-gen platform, which provides much higher throughput, has not completely replaced the 1st-gen technology based on Sanger’s method. The advantage in read length and raw sequencing accuracy attainable through Sanger’s method still finds it a niche in small-scale sequencing projects and/or as a complement to the next-gen systems in large-scale projects.

Table 3 Functionality of sequencing technology a)
Technology  Definition           PCR Seq  De novo WGS  De novo CBC  Re-Seq  GT   1000/100
1st-G       1.1 Slab Gel         +        +            +            +       +    NA
1st-G       1.2 Cap-4color       +++      ++           ++           ++      ++   NA
2nd-G       2.1 emPCR (LR)       NA       +++          NA           ++      NA   NA
2nd-G       2.1 emPCR (SR)       NA       ++           NA           +++     NA   NA
2nd-G       2.2 HT/No Flow Cell  NA       +++          NA           +++     NA   NA
2nd-G       2.3 Single Molecule  NA       +++          NA           +++     +    1000
3rd-G       3.1 Chem/Nanotech    NA       +++          NA           +++     +    1000
3rd-G       3.2 Nanotech         NA       +++          NA           +++     +    100
3rd-G       3.3 Nanotech         NA       +++          NA           +++     +    100
a) SR, short reads; LR, long reads; GT, genotyping; 1000/100, per-genome cost; WGS, whole genome shotgun; CBC, clone by clone; HT, high throughput; SM, single molecule.

3.8 What Should China Do?

Fierce competition in developing newer generations of sequencing technologies among organizations in highly developed countries and regions, such as the US and EU, is one of many indicators of its importance. For China, it is still a great challenge to develop technology of this sophistication. But it is also a great opportunity.

Traditionally, China has lacked the financial support and technical know-how to develop sophisticated analytical instruments such as a newer generation of DNA sequencer. Most analytical equipment that has so far been developed in China is limited to basic laboratory products such as centrifuges, shakers, and power supplies. For high-end analytical needs such as high-throughput DNA sequencing, all the
equipments are imported. This is understandable due to 2 Sanger F, Coulson A R. A rapid method for determining sequences in
DNA by primed synthesis with DNA polymerase. J Mol Biol, 1975,
China’s state of economy and technology in the past.
94: 441–448
To develop an advanced analytical platform of this kind, 3 Shendure J, Mitra R, Varma C, et al. Advanced sequencing
one needs technical and engineering infrastructure, collabo- technologies: methods and goals. Nat Rev Genet, 2004, 5: 335–344
ration, and integration of multiple disciplines, e.g. biologi- 4 Moore G E. Cramming more components onto integrated circuits.
cal chemistry, semiconductor, electronic engineering, mechanical engineering, computing, etc. It also requires a huge public or private investment, both organizational and financial. But the potential benefit is huge. First, by developing an analytical system of this kind, Chinese scientists and engineers can not only contribute to the human effort to decipher the secret code of life but also acquire the technical and organizational skills that such an effort requires. Second, beyond advancing technological, engineering, and scientific know-how, it has huge economic potential as well. Advanced genomic sequencing technology opens the door to personalized medicine, and a corollary is the potential economic benefit for healthcare delivery. If only ten percent of the population of China opted to have their genomes determined at a price of $1000 apiece, it would create a 130-billion-US-dollar economy. Therefore, we argue that even though the up-front investment for China to develop this technology seems large, the potential loss from not doing so is even greater.
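The market-size figure above follows from simple multiplication, which the short sketch below reproduces. The population value of about 1.3 billion is an assumption: the text does not state it, but it is the value implied by the quoted ten percent uptake, $1000 price, and 130-billion-dollar total. The function and constant names are illustrative only.

```python
# Back-of-the-envelope check of the market-size estimate.
# ASSUMPTION: a population of about 1.3 billion is not stated in the text;
# it is the value implied by the quoted figures.
POPULATION_CHINA = 1_300_000_000   # assumed number of people
UPTAKE_FRACTION = 0.10             # "ten percent of the population"
PRICE_PER_GENOME_USD = 1_000       # the $1000 genome


def market_size_usd(population: int, uptake: float, price_usd: float) -> float:
    """Return total spending if a fraction of the population buys one genome each."""
    return population * uptake * price_usd


total = market_size_usd(POPULATION_CHINA, UPTAKE_FRACTION, PRICE_PER_GENOME_USD)
print(f"Estimated market: ${total / 1e9:.0f} billion")
```

Under these assumptions the estimate scales linearly in each input, so a $100 genome at the same uptake would correspond to a market one tenth the size.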

4 Conclusions

Three decades of innovation and development have ushered in a new era for genomic sequencing. The field has evolved from a manual, one-sample-at-a-time operation into a highly automated, massive array-based sequencing activity. We have experienced an exponential increase in sequencing throughput alongside a precipitous reduction in cost. With current and newer generations of sequencing technologies on the horizon, the goal of a one-thousand-dollar genome becomes more attainable. The ability to read out sequence information rapidly and inexpensively opens up a realm of possibilities in comparative genomic analysis, disease diagnosis, and ultimately personalized (or individualized) medicine. Given the huge potential benefit in both scientific and financial terms, China should play a greater role in this key technological invention of the century and in its in-depth application to biological research and medicine. When the genomes of all extant life forms have been thoroughly acquired and their meaningful variations discovered, the contributors to such an endeavor should all be very proud of themselves.

This work was supported by the Chinese Academy of Sciences Scientific Research Equipments (Grant No. YZ200823).