
Received: 18 March 2019    Revised: 26 March 2019    Accepted: 26 March 2019

DOI: 10.1111/dgd.12608

REVIEW ARTICLE

Nanopore sequencing: Review of potential applications in functional genomics

Nobuaki Kono  | Kazuharu Arakawa

Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata, Japan
Abstract
Molecular biology has been driven by various measurement technologies, and increased throughput has enabled omics analysis. The development of massively parallel sequencing technology has enabled access to fundamental molecular data and revealed genomic and transcriptomic signatures. Nanopore sequencers have taken this evolution to the next stage. Oxford Nanopore Technologies Inc. provides a new type of single molecule sequencer using protein nanopores, which enables direct sequencing without DNA synthesis or amplification. This nanopore sequencer can produce ultra-long reads limited only by the length of the input molecule, and can detect DNA/RNA modifications. Recently, many fields such as medicine, epidemiology, ecology, and education have benefited from this technology. In this review, we explain the features and functions of the nanopore sequencer, introduce various situations where it has been used as a critical technology, and discuss expected future applications.

Correspondence
Kazuharu Arakawa, Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata, Japan.
Email: gaou@sfc.keio.ac.jp

Funding information
ImPACT Program of Council for Science, Technology and Innovation (Cabinet Office, Government of Japan); Yamagata Prefectural Government and Tsuruoka City, Japan

KEYWORDS
long reads, nanopore sequencing, next generation sequencing

1 | INTRODUCTION

Novel technologies that visualize the unseen or detect the undetectable have always contributed to breakthroughs in scientific discovery, and the rapid advent of high-throughput, affordable DNA sequencing technologies has undoubtedly been the key driving force in the progress of the life sciences over the last decade (Goodwin, Mcpherson, & Mccombie, 2016). Genomic information has also become one of the cores of molecular biology, providing tools to probe the genome, its structure, epigenetics, gene expression, and a multitude of other applications. The latest in the line of DNA sequencers are the nanopore sequencers (Heather & Chain, 2016; Jain, Olsen, Paten, & Akeson, 2016; Mardis, 2013), successfully commercialized by Oxford Nanopore Technologies (ONT) (Brown & Clarke, 2016). While a variety of sequencing approaches exist, such as pyrosequencing, sequencing by synthesis, sequencing by ligation, and single molecule real time (SMRT) sequencing, DNA sequencing methods since Sanger sequencing have predominantly relied on the process of DNA synthesis (Clarke et al., 2009; Eid et al., 2009; Schadt, Turner, & Kasarskis, 2010). Nanopore sequencing distinguishes itself from these previous approaches in that it detects the nucleotides directly, without active DNA synthesis, as a long stretch of single-stranded DNA passes through a protein nanopore stabilized in an electrically resistant polymer membrane (Branton et al., 2008; Feng, Zhang, Ying, Wang, & Du, 2015). With a voltage applied across this membrane, sensors detect in real time the changes in ionic current caused by the nucleotides occupying the pore as the DNA molecule passes through. The history of the development of nanopore DNA sequencing, as well as detailed sequencing methods, has already been reviewed in depth elsewhere (Deamer, Akeson, & Branton, 2016) and is beyond the scope of this review. Here, we focus on the current advantages as well as the limitations of the ONT nanopore sequencer, reviewing the research publications available thus far. It should be noted, however, that for a rapidly evolving technology like nanopore sequencing, reviews can quickly become outdated.

Nanopore's unique sequencing method provides multiple advantages over existing technologies. Firstly, nanopore sequencing

316  |  wileyonlinelibrary.com/journal/dgd


© 2019 Japanese Society of Developmental Biologists    Develop Growth Differ. 2019;61:316–326.
KONO and ARAKAWA |
      317

does not require imaging equipment to detect the nucleotides, allowing the system to be scaled down to a portable size. The current MinION Mk1B device weighs only 90 g and measures 3.3 × 10.5 × 2.3 cm, fitting in the palm of one's hand. The device is also much cheaper than other massively parallel sequencers, with an initial cost of only around $1,000, including the device and an initial set of reagents. MinION devices can be powered through the USB (Universal Serial Bus) port of a laptop computer, so sequencing can be conducted anywhere, even in the field. The lack of an image analysis step also allows real-time base calling during sequencing, enabling rapid detection of target DNA, for example when screening clinical samples for pathogens. Secondly, since nanopore sequencing directly detects the input molecule without DNA amplification or synthesis, there is no apparent limit to the length of DNA that can be sequenced. The challenge in achieving long reads with nanopore sequencers therefore lies not in the sequencing technology itself but in the library preparation step, which must extract intact, extremely high molecular weight (HMW) DNA and load it into the flow cell of the sequencer. Reads exceeding a megabase have been reported, demonstrating the extraordinary capability of the nanopore device to sequence extremely long stretches of DNA (Jain et al., 2018). Such extremely long reads enable de novo genome assembly without the complex mate-pair libraries typically required with short read sequencing. They are also useful for determining the sequences of genomic regions containing long repetitive sequences, which is difficult with short read sequencing, and for studying structural variations within the genome (Cretu Stancu et al., 2017). Moreover, the ionic current changes caused by nucleotides passing through the nanopore are not limited to the canonical four nucleobases of adenine, guanine, cytosine, and thymine. Taking advantage of unamplified direct sequencing, nanopore sequencing can directly observe base modifications such as methylation (Rand et al., 2017; Simpson et al., 2017), and can even sequence RNA molecules containing uracil bases directly (Garalde et al., 2018).

The main current drawback of nanopore sequencing, as with other long read sequencers, is its relatively high error rate compared to short read sequencing. Current error rates range from 5% to 20%, depending on the type of molecule and the library preparation method, and the errors include both insertions and deletions (Rang, Kloosterman, & De Ridder, 2018). Unlike PacBio SMRT sequencing, nanopore sequencing appears to produce systematic errors (Jain et al., 2018), so error correction typically requires additional short read data. Data yield also varies considerably depending on the input and is difficult to predict. On the other hand, active developments in bioinformatics for base calling and error correction, as well as the optimization of library preparation steps and the development of new pores by ONT, are rapidly addressing these issues. System throughput has been greatly enhanced, for example by the recent introduction of the PromethION system, in which a single flow cell can yield 50–100 Gbp (compared with a typical MinION yield of 5–10 Gbp) and 24 flow cells can be run in parallel. The transition of base-caller software from Hidden Markov Model (HMM)-based methods to Recurrent Neural Network (RNN)-based algorithms improved base-level accuracy by 2%–5% (Rang et al., 2018; Teng et al., 2018). In the following sections, we review each of the above points in detail.

2 | ULTRA-LONG READS - “WHALE WATCHING”

The official adapter-ligation library preparation protocol provided by ONT recommends an optional step that fragments the input DNA to an average of about 8 kbp. This provides the optimal molar concentration of adapter-ligated DNA ends (around 0.2 pM) to meet the nanopores during sequencing and thereby maximizes throughput. However, since there is no apparent technical limit to the size of DNA that can be sequenced, the nanopore community, led primarily by Loman et al., called for “whale spotting” with nanopores (http://lab.loman.net/2017/03/09/ultrareads-for-nanopore/). A “whale” is an informal measurement system that equates an ultra-long read length in bp with a whale's approximate body mass in grams: for example, one of the smallest whales, the narwhal, weighs around 940 kg, and read lengths of 940 kbp to 1 Mbp were the initial challenge. Loading such HMW DNA into the sequencer required every step of DNA handling to be reconsidered. Typical DNA extraction methods using commercial kits, such as silica spin columns, usually yield relatively short DNA molecules, typically below 50 kbp. Drying DNA to eliminate residual ethanol during purification, clean-up and concentration with paramagnetic Solid Phase Reversible Immobilization (SPRI) beads, and even pipetting of the DNA solution fragment ultra-long DNA molecules. Therefore, Quick developed a protocol based on the classic Sambrook method (Sambrook & Russell, 2001), using phenol-chloroform-isoamyl alcohol with wide-bore pipette tips to extract HMW DNA from cultured cells, and then prepared a nanopore library with a Rapid Sequencing Kit (ONT), which uses transposons to add the sequencing adapters with minimal pipetting steps (Quick J. 2018. Ultra-long read sequencing protocol for RAD004. https://www.protocols.io/view/ultra-long-read-sequencing-protocol-for-rad004-mrxc57n). A larger amount of input DNA was also required because ultra-long molecules provide fewer DNA ends per unit mass. One of the very first “whales” was successfully spotted with this protocol in cultured human cells, where an ultra-long read of 882 kbp was mapped to the reference spanning a 950 kbp region (Jain et al., 2018). With a sequencing speed of 450 bp/s for 1D kits using R9.4 flow cells, the “whale watching” of this single read took more than 30 min as it passed through the pore. Bioinformatics methods were also required to observe such ultra-long reads in their entirety, because the software controlling the sequencing (MinKNOW) erroneously split some of the continuous data streams of ultra-long reads into multiple shorter reads. A tool for observing such discontinuities, designated BulkVis, was developed and revealed the largest “whale” yet, at 2,272,580 bp (Payne, Holmes, Rakyan, &
Loose, 2018). This single ultra-long read is comparable in size to smaller bacterial chromosomes, such as that of Bifidobacterium longum NCC2705 (2,256,640 bp; Schell et al., 2002), or to nearly one half of the Escherichia coli genome, demonstrating the potential of nanopore technology to produce chromosome-scale reads.

Interestingly, this new technology has also prompted a reevaluation of rather classic DNA handling methods, such as the phenol-chloroform-isoamyl alcohol method, vacuum concentration, pulsed-field gel electrophoresis (PFGE) with gel plugs, tapping or gentle rotation as opposed to spin columns, the use of wide-bore pipette tips, pipette tip cutting, vortexing, SPRI clean-up, and regular submarine gel electrophoresis. To maximize the read length in nanopore sequencing, extra care must be taken in handling HMW DNA throughout the library preparation steps.

The extraction protocol should be adjusted according to the target species: for tissue samples, cell lysis and homogenization procedures have a profound influence on the quality of the extracted nucleic acids, unlike for Gram-negative bacteria or cultured cells, which lack a thick cell wall or exoskeleton. Homogenization and DNA extraction methods should therefore be optimized for each species, especially for plants and arthropods, which have thick cell walls and cuticular exoskeletons, respectively. For plant genomes such as those of the fastest-growing angiosperm (greater duckweed) and tropical timber trees (teak), HMW genomic DNA extraction has been successfully performed by a CTAB method (Hoang et al., 2018; Yasodha et al., 2018) following homogenization to break the cell wall; grinding in liquid nitrogen is generally used to crush the cell wall while keeping cellular enzymes inactive. After sufficient homogenization, samples are resuspended in detergent-based extraction buffers containing cetyltrimethylammonium bromide (CTAB). Because plants are rich in polysaccharides, CTAB is established as the best detergent for purifying DNA from plant material (Murray & Thompson, 1980). Likewise, in insects such as the harlequin ladybird or the fruit fly, dissected tissue or a whole specimen frozen in liquid nitrogen is ground using a pestle, and HMW DNA is subsequently extracted with the Genomic-tip 500/G kit (QIAGEN) or the Blood & Cell Culture kit (QIAGEN) (Gautier et al., 2018; Miller, Staber, Zeitlinger, & Hawley, 2018). For vertebrates such as clownfish or eel, suitable tissue samples such as muscle or liver are selected (Jansen et al., 2017; Tan et al., 2018).

Coupled with the portable, real-time, single-molecule nature of nanopore sequencing, long reads are expected to contribute to translational applications in genomic medicine and clinical testing (Ameur, Kloosterman, & Hestand, 2019). Notably, long reads are critical for structural variation analysis and for completely sequencing repetitive DNA of clinical relevance. Norris and colleagues demonstrated that nanopore sequencers could detect large-scale structural variations, including large deletions, inversions, and translocations related to the inactivation of tumor suppressor genes in pancreatic cancer, with very few reads (Norris, Workman, Fan, Eshleman, & Timp, 2016). Another group developed a new bioinformatics tool, NanoSV, for structural variation detection with nanopore data using split read mapping (Cretu Stancu et al., 2017). Mitsuhashi and colleagues identified a full-length macrosatellite repeat spanning 49,877 bp in the D4Z4 array responsible for facioscapulohumeral muscular dystrophy (Mitsuhashi, Nakagawa, et al., 2017). For HLA genotyping, MinION accuracy has reached a level that offers cost-effective and scalable genotyping, with only minor shortcomings (Lang et al., 2018). Furthermore, hybrid assembly combining short and long reads resolved the structure and chromosomal insertion site of the antibiotic resistance island in Salmonella Typhi Haplotype 58 (Ashton et al., 2015), opening up numerous possibilities for real-time genomic epidemiology.

3 | LONG READ ASSEMBLY STRATEGIES

In short read assembly, the de Bruijn graph (DBG) algorithm has been most commonly used owing to its accuracy and speed (Lu, Giordano, & Ning, 2016). The DBG algorithm splits sequenced reads into their k-mer components and constructs a graph connecting pairs of k-mers that share k-1 nucleotides. This approach, however, depends heavily on the construction of the k-mer graph, which is unsuitable for long reads with innately high error rates, and it cannot take full advantage of long read lengths to account for complex genomic features, including structural variation and non-random elements (Sohn & Nam, 2018). Long read assembly has therefore revived interest in the overlap-layout-consensus (OLC) algorithm with all-against-all pairwise alignment, which is flexible with respect to read length and robust to errors, reducing chimeric contigs and assembly bubbles. Canu, which uses tf-idf weighted MinHash for fast and accurate adaptive overlapping, has been the most popular assembler in numerous nanopore assembly projects. Nanopore-only error correction using the raw electric current signal can significantly improve base-level accuracy to around 99%, as demonstrated in the whole genome assembly of E. coli K-12 MG1655 (Loman, Quick, & Simpson, 2015). This approach, however, still leaves a higher base-level error rate than short read assembly and is computationally intensive. A hybrid strategy using short Illumina reads is therefore commonly adopted to polish the assembly (Figure 1a; Giordano et al., 2017; Istace et al., 2017). Multiple rounds of polishing are often necessary, with the error correction process validated using Benchmarking Universal Single-Copy Orthologs (BUSCO) completeness scores (Jain et al., 2018; Simao, Waterhouse, Ioannidis, Kriventseva, & Zdobnov, 2015; Tan et al., 2018; Tyson et al., 2018). Several assemblers perform hybrid error correction prior to assembly, in place of the initial overlap-based error correction (Bankevich et al., 2012; Zimin et al., 2013). While hybrid error correction is computationally efficient and generally yields highly accurate sequences, it should be noted that the disadvantage of short reads remains in this

FIGURE 1 (a) Long read assembly. Nanopore read assembly requires error correction, assembly, and polishing. Error correction can be performed by hybrid or long-read-only approaches; hybrid methods use high-accuracy short reads. Assembled contigs are polished to improve the consensus sequence using high-accuracy reads or the raw current data obtained by the nanopore sequencer. (b) Splice variant analysis. Long reads obtained by direct sequencing capture structural variation as it is, so splice variant analysis does not require assembly.

approach, where error correction of non-random or repetitive regions is difficult.

4 | TRANSCRIPTOME ANALYSIS AND DIRECT RNA SEQUENCING

The advantages of long reads are not limited to the study of genomic DNA; long reads are also useful in the study of RNA sequences. They are especially pertinent for uncovering the diversity of alternative splicing isoforms and their expression levels, which are often difficult to delineate with short reads (Figure 1b). Accordingly, combining long and short reads has been reported to improve transcriptome analysis performance (Weirather et al., 2017). Long reads have a clear advantage over short reads for the analysis of splice isoforms: there is no need for read mapping or assembly to work out the isoform structures, because one can simply look at the entire sequence to pinpoint the isoform. The Dscam1 (Down syndrome
cell adhesion molecule 1; Schmucker et al., 2000) gene in Drosophila, which can produce a total of 38,016 different isoforms through alternative splicing of four variable exon clusters, is regarded as “the most complicated alternatively spliced gene known in nature” (Bolisetty, Rajadinakaran, & Graveley, 2015). Nanopore full-length cDNA sequencing nevertheless identified 7,899 full-length isoforms of this gene. Accurate identification of the exon structures incorporated in each transcript using long reads is also powerful for evidence-based gene structure annotation. Cook and colleagues developed an automated gene structure annotation pipeline named LoReAn (long-read annotation) and demonstrated that full-length cDNA sequences could assist the correct annotation of gene structures in Arabidopsis thaliana and Oryza sativa (Cook et al., 2019).

Long read cDNA sequencing is also possible with other platforms, but the nanopore sequencer is the only device that has made direct sequencing of long RNA molecules possible (Garalde et al., 2018). Although nanopore detection of tRNA had previously been explored and reported (Henley et al., 2016; Smith, Abu-Shumays, Akeson, & Bernick, 2015), ONT commercialized direct RNA sequencing kits in 2017. Conventional “RNA-seq” with short reads, a rather inappropriate name since the RNA molecules are not directly sequenced (Hrdlickova, Toloue, & Tian, 2017), requires reverse transcription (RT) of the template RNA molecule into complementary DNA (cDNA), which is typically further amplified by PCR. Direct RNA sequencing (again, it is unfortunate that true RNA sequencing has to add the word “direct” to differentiate itself from conventional “RNA-seq”), on the other hand, directly detects the ribonucleobases passing through the pore without RT or PCR amplification, and is therefore free of the biases or misamplifications that may be introduced during those steps. Direct RNA sequencing simplifies the gene annotation process and has therefore enabled the genome-wide identification of more complex or novel transcript isoforms (Byrne et al., 2017; Krizanovic, Echchiki, Roux, & Sikic, 2018), the differentiation of transcript haplotypes, and the measurement of 3′ poly(A) tail lengths (Workman et al., 2018). This application is also practical for the study of viral genomes, which have multiple reading frames, antisense gene locations, inefficient termination signals, and complex splice forms that make gene annotation challenging with the conventional RNA-seq method (Depledge et al., 2018).

Another strength of unamplified, non-reverse-transcribed direct RNA sequencing is the detection of nucleotide modifications. Because a modified base produces an ionic current signal different from that of the unmodified base as it passes through the pore, nanopore sequencing can detect modifications without any additional sample preparation. Rand and colleagues demonstrated that three cytosine variants within DNA molecules could be detected with an accuracy of 80% using an HMM-HDP (HMM with a Hierarchical Dirichlet Process) (Rand et al., 2017), and Simpson and colleagues showed that the nanopore sequencer could detect 5-methylcytosine (5-mC) with up to 95% accuracy using an HMM (Simpson et al., 2017). For RNA, N6-methyladenosine (m6A) and 5-mC are among the most common internal modifications in mRNA, implicated in various aspects of RNA metabolism and regulation, and nanopore sequencing can identify these modifications (Garalde et al., 2018). A comprehensive study of the human poly(A) transcriptome using direct RNA sequencing has recently been reported, including the detection of these modifications as well as A-to-I RNA editing (Workman et al., 2018). Although the error rate of direct RNA sequencing exceeds the already high error rate of nanopore DNA sequencing, the ability to study RNA molecules directly could open up a wide range of research that is otherwise not possible.

5 | REAL TIME AND PORTABLE

The excellent portability of the MinION device, which can be bus-powered from a mobile PC, can make a significant contribution to in-the-field sequencing (Ameur et al., 2019; Quick et al., 2017). This has already been demonstrated in diverse settings, from an academic conference venue (Ii et al., 2019) to extreme environments such as coal fields (Edwards et al., 2017), the Arctic (Edwards, Debbonaire, Sattler, Mur, & Hodson, 2016), Antarctica (Johnson, Zaikova, Goerlitz, Bai, & Tighe, 2017), and even the International Space Station (Castro-Wallace et al., 2017). For these purposes, mobile lab approaches for DNA extraction and library preparation have been explored using devices such as VolTRAX V2 or Bento Lab (an all-in-one DNA laboratory including a thermal cycler: https://www.bento.bio), and experimental protocols are also being developed, such as those using LAMP (loop-mediated isothermal amplification) instead of conventional PCR to eliminate the need for a thermal cycler. LAMP requires only a water bath for its isothermal amplification, and Imai and colleagues demonstrated an identification system for human Plasmodium species (Imai et al., 2017). Using this approach, Yamagishi and colleagues also reported a serotyping system for dengue virus that works within a single day (Yamagishi et al., 2017).

Moreover, the real-time nature of the device, in which DNA molecules passing through the pore are base-called on a per-read basis as they are sequenced, makes it a rapid and highly robust alternative to conventional clinical tests such as chemiluminescence, ELISA, immunochromatographic tests (ICT), and real-time PCR. Greninger and colleagues implemented a web-based pipeline for real-time analysis (MetaPORE) and made it possible to detect Chikungunya virus, Ebola virus, and hepatitis C virus simultaneously from a human blood sample with nanopore sequencing (Greninger et al., 2015). Likewise, Mitsuhashi and colleagues developed real-time identification of bacterial composition within 2 hr of obtaining a sample (Mitsuhashi, Kryukov, et al., 2017), and Quick et al. (2015) showed that genotyping of a hospital outbreak of Salmonella could be carried out in less than half a day. Furthermore, because it combines portability with real-time analysis, the MinION device has been adopted for the surveillance and monitoring of viral outbreaks in resource-limited locations. The Zika virus outbreak was declared an international public health emergency by the World
Health Organization (WHO) in 2016, and the Zika in Brazil Real-time Analysis (ZiBRA) project was established to sequence a thousand genomes from Brazil for epidemiological monitoring (Faria et al., 2016). Quick et al. (2017) accordingly developed a protocol to rapidly obtain the complete genome of the Zika virus from clinical samples using the MinION sequencer, covering every step from sample preparation to bioinformatics analysis. The same research group also developed a genomic surveillance system using the nanopore sequencer for the Ebola outbreak in West Africa (Quick et al., 2016). This system requires as little as 15–60 min of sequencing after receipt of a blood sample, enabling on-the-spot virus testing without the need to transport hazardous specimens. Furthermore, Runtuwene and colleagues demonstrated a new MinION application for genotyping the malaria parasite Plasmodium falciparum in Indonesia (Runtuwene et al., 2018), highlighting the potential of the nanopore genotyping approach as an effective and practical alternative for malaria diagnosis.

6 | USE CASE: HIGHLY REPETITIVE SPIDER SILK GENES

The improved accessibility of sequencing technology makes genomic data easier to obtain and contributes significantly to molecular biology in non-classical model organisms (Russell et al., 2017). In this regard, long read assemblies are especially useful for large, complex eukaryotic genomes, which typically contain numerous non-random features such as highly repetitive sequences and polyploidy (Kono, Nakamura, Ohtoshi, Pedrazzoli Moran, et al., 2019; Kono, Nakamura, Ohtoshi, Tomita, Numata, et al., 2019). As one example, we introduce here the de novo sequencing of a spider silk protein (spidroin) gene. Spider silk exhibits extraordinary physicochemical properties, such as high toughness, high tensile strength per density, and thermostability, that make it a promising multifunctional protein material for industrial applications (Blamires, Blackledge, & Tso, 2017; Omenetto & Kaplan, 2010). Multiple spidroin families exist, corresponding to different types of silk with various mechanical properties, but they all share conserved non-repetitive N/C-terminal domains (Hayashi, Blackledge, & Lewis, 2004) and a highly complex sequence organization: the genes are extremely long, typically exceeding 10 kbp, and are almost entirely composed of repetitive sequences. The multi-gigabase size of spider genomes (Babb et al., 2017) and this characteristic spidroin gene structure (Xu & Lewis, 1990) make determining the full-length sequence challenging, as there is a risk of producing chimeric artifacts even at the PCR amplification step. Unamplified, single-molecule long read nanopore sequencing is one of a very limited number of possible approaches.

For an untargeted, comprehensive search for spidroin fragments, the non-repetitive N/C-terminal domains as well as repeat unit sequences were computationally searched from a short read transcriptome assembly. These fragments were then extended by an OLC of short reads until the contigs met unresolvable bifurcations in the DBG. The collected extended fragments were then ordered based on nanopore long reads obtained directly from mRNA or genomic DNA, correcting for base-level accuracy, so that the gene length and the arrangement of repeat units were guaranteed and compressed alignment was avoided (Ricker, Qian, & Fulthorpe, 2012). Here, we illustrate a specific example of a spidroin gene sequence obtained by nanopore sequencing (Figure 2). Araneus ventricosus is an orb-weaving spider with seven spidroin gene families that combine repetitive and non-repetitive unit structures as described above. One of these genes, the minor ampullate spidroin (MiSp) gene, was determined by low-coverage (5.5M reads) direct DNA sequencing. Long reads covering the full-length MiSp gene sequence were collected from numerous nanopore reads based on their similarity to MiSp gene fragments constructed in advance by short read assembly. A previous study reported that the MiSp gene sequence was over 5,000 bp in length (Chen et al., 2012), whereas nanopore sequencing showed it to be over 8.3 kbp in length. The same case of another gene (aciniform spidroin:

FIGURE 2 Dot plots of nanopore reads covering the full-length genes. Areas where dots concentrate indicate repetitive regions. Red and blue boxes indicate the N- and C-terminal domains, respectively, and gray boxes indicate intron regions. (a) MiSp and (b) AcSp gene reads obtained from Araneus ventricosus genomic DNA. These dot plots were generated using LAST (Katoh & Standley, 2013).
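Dot plots such as those in Figure 2 make repeats visible because every repeated stretch of sequence matches itself at more than one offset. As a rough illustration of how that pattern arises, the following Python sketch marks every position pair at which two sequences share an identical k-mer; in a self-comparison, tandem repeats appear as lines parallel to the main diagonal. This is a minimal exact-match sketch, not the alignment method used by LAST, and it compares the forward strand only. The function names, the k-mer size, and the toy sequence are illustrative assumptions, not spidroin data.

```python
from collections import defaultdict


def kmer_dotplot(seq_a, seq_b, k=4):
    """Return (i, j) pairs where the k-mer starting at seq_a[i] is identical
    to the k-mer starting at seq_b[j]. Exact matches, forward strand only."""
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    # Index every k-mer start position in seq_b for constant-time lookups.
    index = defaultdict(list)
    for j in range(len(seq_b) - k + 1):
        index[seq_b[j:j + k]].append(j)
    dots = []
    for i in range(len(seq_a) - k + 1):
        # .get() avoids inserting empty lists into the defaultdict.
        for j in index.get(seq_a[i:i + k], ()):
            dots.append((i, j))
    return dots


def render(dots, len_a, len_b, k):
    """Draw a text dot plot: rows follow seq_a positions, columns seq_b."""
    grid = [["."] * (len_b - k + 1) for _ in range(len_a - k + 1)]
    for i, j in dots:
        grid[i][j] = "#"
    return "\n".join("".join(row) for row in grid)


if __name__ == "__main__":
    # Hypothetical toy read: a unique 4-bp flank, three tandem copies of a
    # 6-bp repeat unit, and another unique flank (not a real spidroin).
    seq = "ACGT" + "GGATCC" * 3 + "TTAG"
    print(render(kmer_dotplot(seq, seq, k=4), len(seq), len(seq), k=4))
```

With the toy input, the printed plot shows the main diagonal plus off-diagonal segments offset by multiples of 6 positions (the repeat-unit length) inside the repeat region, the same kind of dot concentration seen in the spidroin reads. With real nanopore reads, exact k-mer matching is sensitive to the choice of k (too small a k yields random hits, too large a k misses matches broken by sequencing errors), which is one reason dedicated aligners such as LAST are used in practice.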

TABLE 1 List of bioinformatics software commonly used in the analysis of nanopore data

| Category | Software | Description | Reference |
|---|---|---|---|
| Basecaller | Albacore | Basecaller developed by Oxford Nanopore Technologies Inc. | https://nanoporetech.com |
| | Chiron | End-to-end basecaller using a deep learning CNN+RNN+CTC architecture | (Teng et al., 2018) |
| | Metrichor | Cloud-based basecaller that cannot be run locally | https://nanoporetech.com |
| | Tombo | Detection tool for modified nucleotides such as methylation | https://nanoporetech.com |
| Polishing | Nanopolish | HMM-based polishing tool using the raw signal data of reads | (Loman, Quick, & Simpson, 2015) |
| | Racon | Polishing tool for contigs assembled by Canu (Koren et al., 2017) using raw long reads | (Vaser, Sovic, Nagarajan, & Sikic, 2017) |
| | Pilon | Polishing tool for contigs assembled by Canu (Koren et al., 2017) using Illumina short reads | (Walker et al., 2014) |
| Assembler | DBG2OLC | Hybrid assembler for large genomes that uses short-read contigs as anchor points for overlap graph construction with long reads | (Ye, Hill, Wu, Ruan, & Ma, 2016) |
| | Wtdbg2 | Fast long-read assembler based on a fuzzy-Bruijn graph (FBG), which is analogous to a de Bruijn graph but permits mismatches and gaps | (Ruan & Li, 2019) |
| | RaGOO | Scaffolding tool using reference-guided contig ordering and orienting | (Alonge, Soyk, Ramakrishnan, & Wang, 2019) |
| | MaSuRCA | Hybrid genome assembler using long and short reads | (Zimin et al., 2013) |
| | Canu | Leading long-read assembler derived from the Celera Assembler (Miller et al., 2008) and designed for SMRT and ONT reads; successor of PBcR | (Koren et al., 2017) |
| | SMARTdenovo | Fast and reasonably accurate OLC assembler for long reads with no prior error-correction step | (Giordano et al., 2017) |
| | TULIP | Efficient assembler for large and complex genomes using seed extension | (Jansen et al., 2017) |
| | Falcon | Long-read assembler for phased diploid genomes | (Chin et al., 2016) |
| | Miniasm | Efficient assembler for SMRT and ONT reads with no error-correction step | (Li, 2016) |
| | Unicycler | Assembler designed for hybrid assembly of small (bacterial or viral) genomes, based on SPAdes | (Wick, Judd, Gorrie, & Holt, 2017) |
| | ABruijn | De novo assembler for long, noisy reads using a de Bruijn graph | (Lin et al., 2016) |
| | HINGE | Long-read OLC assembler with no pre-assembly or read-correction step | (Kamath, Shomorony, Xia, Courtade, & Tse, 2017) |
| SV/SNV caller | NanoSV | Breakpoint-junction detection of structural variants from BAM files produced by LAST | (Cretu Stancu et al., 2017) |
| | Sniffles | Structural variation caller that automatically filters false events based on coverage data | (Sedlazeck et al., 2018) |
| | npInv | Tool specially designed for inversions caused by non-allelic homologous recombination (NAHR) | (Shao et al., 2018) |
| | marginCaller | SNV detection program in the marginAlign package | https://github.com/benedictpaten/marginAlign |
| Alignment | minimap2 | Fast mapping and alignment program for long noisy reads | (Li, 2018) |
| | minialign | Fast and moderately accurate alignment tool for long reads | https://github.com/ocxtal/minialign |
| | LAST | Genome-scale sequence comparison tool for finding similar regions | (Kielbasa, Wan, Sato, Horton, & Frith, 2011) |
| | NGMLR | Long-read aligner with a convex gap-cost scoring model, used with the SV caller Sniffles | (Sedlazeck et al., 2018) |
| | marginAlign | Package that aligns reads to a reference genome for SNV calling (marginCaller) | https://github.com/benedictpaten/marginAlign |
| | BWA-MEM | Leading short-read mapping tool, recently tuned to work with ONT reads | https://github.com/lh3/bwa |
| Others | NanoPipe | Interactive web-based tool for fast and easy processing and analysis of ONT data | (Shabardina et al., 2019) |
| | BulkVis | Visualization tool for bulk FAST5 files obtained from ONT sequencing | (Payne et al., 2018) |

AcSp) is also shown in Figure 2. Because the genomic DNA was sequenced as unamplified single molecules with long reads, not only the gene length but also the repetitive and exon/intron structures were clearly resolved.

7 | LIMITATIONS

The largest limitation of the nanopore sequencer is its comparatively low read accuracy relative to short-read sequencers. Because these errors include insertions and deletions, nanopore reads per se are not optimal for single nucleotide variation (SNV) detection. Although SNV genotyping with nanopore-sequenced reads has been demonstrated, it required high read coverage (Ebler, Haukness, Pesout, Marschall, & Paten, 2018; Koren et al., 2017). HLA genotyping faces a similar problem: specific alleles cannot be distinguished because of insufficient read accuracy (Jain et al., 2018).

Improving the accuracy of nanopore sequencers depends on both the pore chemistry and the basecalling algorithm (Patel et al., 2018). The latest R9.X nanopore is derived from the E. coli Curlin sigma S-dependent growth (CsgG) protein (Goyal et al., 2014) and achieves a significantly reduced error rate (Patel et al., 2018; Rang et al., 2018). According to the Nanopore Community Meeting 2018 (https://nanoporetech.com/resource-centre/videos/community-meeting), a new protein pore, R10, is due to be released in 2019 and should further improve accuracy for homopolymers. The R9.X nanopore has a single sharp reader that senses only about five bases at a time, which makes accurate determination of homopolymer length difficult. The new chemistry, by contrast, has longer and multiple “readers” that capture the signal of more bases, so homopolymer sequence accuracy at 75× coverage is projected to improve from Q34 with R9.4 to Q40.

Basecalling algorithms based on deep learning, which convert the raw ionic current signal into nucleotide sequences by combining a convolutional neural network (CNN), a recurrent neural network (RNN), and connectionist temporal classification (CTC), have dramatically improved read accuracy (Rang et al., 2018) over the previous HMM-based approach, which worked on raw data segmented into k-mer events (Boza, Brejova, & Vinar, 2017; Stoiber & Brown, 2017). As described above, many limitations can be overcome by improved software fine-tuned for nanopore data. Many bioinformatics tools specialized or optimized for nanopore sequencing have been released, and many more are in active development. Table 1 lists representative software tools useful in the analysis of nanopore data.

ACKNOWLEDGMENTS

The authors would like to thank Dr. James Fleming for his diligent proofreading of this manuscript. This work was funded by the ImPACT Program of Council for Science, Technology and Innovation (Cabinet Office, Government of Japan) and in part by research funds from the Yamagata Prefectural Government and Tsuruoka City, Japan.

ORCID

Nobuaki Kono https://orcid.org/0000-0001-5960-7956
Kazuharu Arakawa https://orcid.org/0000-0002-2893-4919

REFERENCES

Alonge, M., Soyk, S., Ramakrishnan, S., & Wang, X. (2019). Fast and accurate reference-guided scaffolding of draft genomes. bioRxiv.
Ameur, A., Kloosterman, W. P., & Hestand, M. S. (2019). Single-molecule sequencing: Towards clinical applications. Trends in Biotechnology, 37, 72–85. https://doi.org/10.1016/j.tibtech.2018.07.013
Ashton, P. M., Nair, S., Dallman, T., Rubino, S., Rabsch, W., Mwaigwisya, S., … O'Grady, J. (2015). MinION nanopore sequencing identifies the position and structure of a bacterial antibiotic resistance island. Nature Biotechnology, 33, 296–300. https://doi.org/10.1038/nbt.3103
Babb, P. L., Lahens, N. F., Correa-Garhwal, S. M., Nicholson, D. N., Kim, E. J., Hogenesch, J. B., … Voight, B. F. (2017). The Nephila clavipes genome highlights the diversity of spider silk genes and their complex expression. Nature Genetics, 49, 895–903. https://doi.org/10.1038/ng.3852
Bankevich, A., Nurk, S., Antipov, D., Gurevich, A. A., Dvorkin, M., Kulikov, A. S., … Pevzner, P. A. (2012). SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing. Journal of Computational Biology, 19, 455–477. https://doi.org/10.1089/cmb.2012.0021
Blamires, S. J., Blackledge, T. A., & Tso, I. M. (2017). Physicochemical property variation in spider silk: Ecology, evolution, and synthetic production. Annual Review of Entomology, 62, 443–460. https://doi.org/10.1146/annurev-ento-031616-035615
Bolisetty, M. T., Rajadinakaran, G., & Graveley, B. R. (2015). Determining exon connectivity in complex mRNAs by nanopore sequencing. Genome Biology, 16, 204. https://doi.org/10.1186/s13059-015-0777-z
Boza, V., Brejova, B., & Vinar, T. (2017). DeepNano: Deep recurrent neural networks for base calling in MinION nanopore reads. PLoS ONE, 12, e0178751. https://doi.org/10.1371/journal.pone.0178751
Branton, D., Deamer, D. W., Marziali, A., Bayley, H., Benner, S. A., Butler, T., … Schloss, J. A. (2008). The potential and challenges of nanopore sequencing. Nature Biotechnology, 26, 1146–1153. https://doi.org/10.1038/nbt.1495
Brown, C. G., & Clarke, J. (2016). Nanopore development at Oxford Nanopore. Nature Biotechnology, 34, 810–811. https://doi.org/10.1038/nbt.3622
Byrne, A., Beaudin, A. E., Olsen, H. E., Jain, M., Cole, C., Palmer, T., … Vollmers, C. (2017). Nanopore long-read RNAseq reveals widespread transcriptional variation among the surface receptors of individual B cells. Nature Communications, 8, 16027. https://doi.org/10.1038/ncomms16027
Castro-Wallace, S. L., Chiu, C. Y., John, K. K., Stahl, S. E., Rubins, K. H., McIntyre, A. B. R., … Turner, D. J. (2017). Nanopore DNA sequencing and genome assembly on the international space station. Scientific Reports, 7, 18022. https://doi.org/10.1038/s41598-017-18364-0
Chen, G., Liu, X., Zhang, Y., Lin, S., Yang, Z., Johansson, J., … Meng, Q. (2012). Full-length minor ampullate spidroin gene sequence. PLoS ONE, 7, e52293. https://doi.org/10.1371/journal.pone.0052293
Chin, C. S., Peluso, P., Sedlazeck, F. J., Nattestad, M., Concepcion, G. T., Clum, A., … Ecker, J. R. (2016). Phased diploid genome assembly with single-molecule real-time sequencing. Nature Methods, 13, 1050–1054. https://doi.org/10.1038/nmeth.4035
Clarke, J., Wu, H. C., Jayasinghe, L., Patel, A., Reid, S., & Bayley, H. (2009). Continuous base identification for single-molecule nanopore DNA sequencing. Nature Nanotechnology, 4, 265–270. https://doi.org/10.1038/nnano.2009.12
Cook, D. E., Valle-Inclan, J. E., Pajoro, A., Rovenich, H., Thomma, B., & Faino, L. (2019). Long-read annotation: Automated eukaryotic genome annotation based on long-read cDNA sequencing. Plant Physiology, 179, 38–54. https://doi.org/10.1104/pp.18.00848
Cretu Stancu, M., Van Roosmalen, M. J., Renkens, I., Nieboer, M. M., Middelkamp, S., de Ligt, J., … Espejo Valle-Inclan, J. (2017). Mapping and phasing of structural variation in patient genomes using nanopore sequencing. Nature Communications, 8, 1326. https://doi.org/10.1038/s41467-017-01343-4
Deamer, D., Akeson, M., & Branton, D. (2016). Three decades of nanopore sequencing. Nature Biotechnology, 34, 518–524. https://doi.org/10.1038/nbt.3423
Depledge, D. P., Kalanghad Puthankalam, S., Sadaoka, T., Bready, D., Mori, Y., Placantonakis, D. G., … Wilson, A. C. (2018). Native RNA sequencing on nanopore arrays redefines the transcriptional complexity of a viral pathogen. bioRxiv.
Ebler, J., Haukness, M., Pesout, T., Marschall, T., & Paten, B. (2018). Haplotype-aware genotyping from noisy long reads. bioRxiv.
Edwards, A., Debbonaire, A. R., Sattler, B., Mur, L. A. J., & Hodson, A. J. (2016). Extreme metagenomics using nanopore DNA sequencing: A field report from Svalbard, 78 °N. bioRxiv.
Edwards, A., Soares, A., Rassner, S., Green, P., Felix, J., & Mitchell, A. (2017). Deep sequencing: Intra-terrestrial metagenomics illustrates the potential of off-grid nanopore DNA sequencing. bioRxiv.
Eid, J., Fehr, A., Gray, J., Luong, K., Lyle, J., Otto, G., … Dalal, R. (2009). Real-time DNA sequencing from single polymerase molecules. Science, 323, 133–138. https://doi.org/10.1126/science.1162986
Faria, N. R., Sabino, E. C., Nunes, M. R., Alcantara, L. C., Loman, N. J., & Pybus, O. G. (2016). Mobile real-time surveillance of Zika virus in Brazil. Genome Medicine, 8, 97. https://doi.org/10.1186/s13073-016-0356-2
Feng, Y., Zhang, Y., Ying, C., Wang, D., & Du, C. (2015). Nanopore-based fourth-generation DNA sequencing technology. Genomics Proteomics Bioinformatics, 13, 4–16. https://doi.org/10.1016/j.gpb.2015.01.009
Garalde, D. R., Snell, E. A., Jachimowicz, D., Sipos, B., Lloyd, J. H., Bruce, M., … Serra, S. (2018). Highly parallel direct RNA sequencing on an array of nanopores. Nature Methods, 15, 201–206. https://doi.org/10.1038/nmeth.4577
Gautier, M., Yamaguchi, J., Foucaud, J., Loiseau, A., Ausset, A., Facon, B., … Lopez-Roques, C. (2018). The genomic basis of color pattern polymorphism in the Harlequin Ladybird. Current Biology, 28, 3296–3302.
Giordano, F., Aigrain, L., Quail, M. A., Coupland, P., Bonfield, J. K., Davies, R. M., … Ning, Z. (2017). De novo yeast genome assemblies from MinION, PacBio and MiSeq platforms. Scientific Reports, 7, 3935. https://doi.org/10.1038/s41598-017-03996-z
Goodwin, S., McPherson, J. D., & McCombie, W. R. (2016). Coming of age: Ten years of next-generation sequencing technologies. Nature Reviews Genetics, 17, 333–351. https://doi.org/10.1038/nrg.2016.49
Goyal, P., Krasteva, P. V., Van Gerven, N., Gubellini, F., Van den Broeck, I., Troupiotis-Tsaïlaki, A., … Chapman, M. R. (2014). Structural and mechanistic insights into the bacterial amyloid secretion channel CsgG. Nature, 516, 250–253. https://doi.org/10.1038/nature13768
Greninger, A. L., Naccache, S. N., Federman, S., Yu, G., Mbala, P., Bres, V., … Mulembakani, P. (2015). Rapid metagenomic identification of viral pathogens in clinical samples by real-time nanopore sequencing analysis. Genome Medicine, 7, 99. https://doi.org/10.1186/s13073-015-0220-9
Hayashi, C. Y., Blackledge, T. A., & Lewis, R. V. (2004). Molecular and mechanical characterization of aciniform silk: Uniformity of iterated sequence modules in a novel member of the spider silk fibroin gene family. Molecular Biology and Evolution, 21, 1950–1959. https://doi.org/10.1093/molbev/msh204
Heather, J. M., & Chain, B. (2016). The sequence of sequencers: The history of sequencing DNA. Genomics, 107, 1–8. https://doi.org/10.1016/j.ygeno.2015.11.003
Henley, R. Y., Ashcroft, B. A., Farrell, I., Cooperman, B. S., Lindsay, S. M., & Wanunu, M. (2016). Electrophoretic deformation of individual transfer RNA molecules reveals their identity. Nano Letters, 16, 138–144. https://doi.org/10.1021/acs.nanolett.5b03331
Hoang, P. N. T., Michael, T. P., Gilbert, S., Chu, P., Motley, S. T., Appenroth, K. J., … Lam, E. (2018). Generating a high-confidence reference genome map of the Greater Duckweed by integration of cytogenomic, optical mapping, and Oxford Nanopore technologies. Plant Journal, 96, 670–684. https://doi.org/10.1111/tpj.14049
Hrdlickova, R., Toloue, M., & Tian, B. (2017). RNA-Seq methods for transcriptome analysis. Wiley Interdisciplinary Reviews: RNA, 8(1). https://doi.org/10.1002/wrna.1364
Ii, K. M., Kono, N., Paulino-Lima, I. G., Tomita, M., Rothschild, L. J., & Arakawa, K. (2019). Complete genome sequence of Arthrobacter sp. Strain MN05-02, a UV-Resistant Bacterium from a manganese deposit in the Sonoran Desert. Journal of Genomics, 7, 18–25. https://doi.org/10.7150/jgen.32194
Imai, K., Tarumoto, N., Misawa, K., Runtuwene, L. R., Sakai, J., Hayashida, K., … Suzuki, Y. (2017). A novel diagnostic method for malaria using loop-mediated isothermal amplification (LAMP) and MinION nanopore sequencer. BMC Infectious Diseases, 17, 621. https://doi.org/10.1186/s12879-017-2718-9
Istace, B., Friedrich, A., D'Agata, L., Faye, S., Payen, E., Beluche, O., … Engelen, S. (2017). De novo assembly and population genomic survey of natural yeast isolates with the Oxford Nanopore MinION sequencer. Gigascience, 6, 1–13.
Jain, M., Koren, S., Miga, K. H., Quick, J., Rand, A. C., Sasani, T. A., … Dilthey, A. T. (2018). Nanopore sequencing and assembly of a human genome with ultra-long reads. Nature Biotechnology, 36, 338–345. https://doi.org/10.1038/nbt.4060
Jain, M., Olsen, H. E., Paten, B., & Akeson, M. (2016). The Oxford Nanopore MinION: Delivery of nanopore sequencing to the genomics community. Genome Biology, 17, 239. https://doi.org/10.1186/s13059-016-1103-0
Jansen, H. J., Liem, M., Jong-Raadsen, S. A., Dufour, S., Weltzien, F. A., Swinkels, W., … Palstra, A. P. (2017). Rapid de novo assembly of the European eel genome from nanopore sequencing reads. Scientific Reports, 7, 7213. https://doi.org/10.1038/s41598-017-07650-6
Johnson, S. S., Zaikova, E., Goerlitz, D. S., Bai, Y., & Tighe, S. W. (2017). Real-Time DNA sequencing in the Antarctic dry valleys using the Oxford Nanopore Sequencer. Journal of Biomolecular Techniques, 28, 2–7.
Kamath, G. M., Shomorony, I., Xia, F., Courtade, T. A., & Tse, D. N. (2017). HINGE: Long-read assembly achieves optimal repeat resolution. Genome Research, 27, 747–756. https://doi.org/10.1101/gr.216465.116
Katoh, K., & Standley, D. M. (2013). MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Molecular Biology and Evolution, 30, 772–780. https://doi.org/10.1093/molbev/mst010
Kielbasa, S. M., Wan, R., Sato, K., Horton, P., & Frith, M. C. (2011). Adaptive seeds tame genomic sequence comparison. Genome Research, 21, 487–493. https://doi.org/10.1101/gr.113985.110
Kono, N., Nakamura, H., Ohtoshi, R., Tomita, M., Numata, K., & Arakawa, K. (2019). The bagworm genome reveals a unique fibroin gene that provides high tensile strength. Communications Biology. In press.
Kono, N., Nakamura, H., Ohtoshi, R., Pedrazzoli Moran, D. A., Shinohara, A., Yoshida, Y., … Arakawa, K. (2019). Orb-weaving spider Araneus ventricosus genome elucidates the spidroin gene catalogue. Scientific Reports. In press.
Koren, S., Walenz, B. P., Berlin, K., Miller, J. R., Bergman, N. H., & Phillippy, A. M. (2017). Canu: Scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Research, 27, 722–736. https://doi.org/10.1101/gr.215087.116
Krizanovic, K., Echchiki, A., Roux, J., & Sikic, M. (2018). Evaluation of tools for long read RNA-seq splice-aware alignment. Bioinformatics, 34, 748–754. https://doi.org/10.1093/bioinformatics/btx668
Lang, K., Surendranath, V., Quenzel, P., Schofl, G., Schmidt, A. H., & Lange, V. (2018). Full-Length HLA Class I genotyping with the MinION nanopore sequencer. Methods in Molecular Biology, 1802, 155–162. https://doi.org/10.1007/978-1-4939-8546-3
Li, H. (2016). Minimap and miniasm: Fast mapping and de novo assembly for noisy long sequences. Bioinformatics, 32, 2103–2110. https://doi.org/10.1093/bioinformatics/btw152
Li, H. (2018). Minimap2: Pairwise alignment for nucleotide sequences. Bioinformatics, 34, 3094–3100. https://doi.org/10.1093/bioinformatics/bty191
Lin, Y., Yuan, J., Kolmogorov, M., Shen, M. W., Chaisson, M., & Pevzner, P. A. (2016). Assembly of long error-prone reads using de Bruijn graphs. Proceedings of the National Academy of Sciences of the United States of America, 113, E8396–E8405. https://doi.org/10.1073/pnas.1604560113
Loman, N. J., Quick, J., & Simpson, J. T. (2015). A complete bacterial genome assembled de novo using only nanopore sequencing data. Nature Methods, 12, 733–735. https://doi.org/10.1038/nmeth.3444
Louis, D. N., Perry, A., Reifenberger, G., von Deimling, A., Figarella-Branger, D., Cavenee, W. K., … Kleihues, P. (2016). The 2016 World Health Organization Classification of Tumors of the Central Nervous System: A summary. Acta Neuropathologica, 131, 803–820. https://doi.org/10.1007/s00401-016-1545-1
Lu, H., Giordano, F., & Ning, Z. (2016). Oxford nanopore MinION sequencing and genome assembly. Genomics Proteomics Bioinformatics, 14, 265–279. https://doi.org/10.1016/j.gpb.2016.05.004
Mardis, E. R. (2013). Next-generation sequencing platforms. Annual Review of Analytical Chemistry, 6, 287–303. https://doi.org/10.1146/annurev-anchem-062012-092628
Miller, J. R., Delcher, A. L., Koren, S., Venter, E., Walenz, B. P., Brownley, A., … Sutton, G. (2008). Aggressive assembly of pyrosequencing reads with mates. Bioinformatics, 24, 2818–2824. https://doi.org/10.1093/bioinformatics/btn548
Miller, D. E., Staber, C., Zeitlinger, J., & Hawley, R. S. (2018). Highly contiguous genome assemblies of 15 Drosophila species generated using nanopore sequencing. G3 (Bethesda), 8, 3131–3141. https://doi.org/10.1534/g3.118.200160
Mitsuhashi, S., Kryukov, K., Nakagawa, S., Takeuchi, J. S., Shiraishi, Y., Asano, K., & Imanishi, T. (2017). A portable system for rapid bacterial composition analysis using a nanopore-based sequencer and laptop computer. Scientific Reports, 7, 5657. https://doi.org/10.1038/s41598-017-05772-5
Mitsuhashi, S., Nakagawa, S., Takahashi Ueda, M., Imanishi, T., Frith, M. C., & Mitsuhashi, H. (2017). Nanopore-based single molecule sequencing of the D4Z4 array responsible for facioscapulohumeral muscular dystrophy. Scientific Reports, 7, 14789. https://doi.org/10.1038/s41598-017-13712-6
Murray, M. G., & Thompson, W. F. (1980). Rapid isolation of high molecular weight plant DNA. Nucleic Acids Research, 8, 4321–4325. https://doi.org/10.1093/nar/8.19.4321
Norris, A. L., Workman, R. E., Fan, Y., Eshleman, J. R., & Timp, W. (2016). Nanopore sequencing detects structural variants in cancer. Cancer Biology & Therapy, 17, 246–253. https://doi.org/10.1080/15384047.2016.1139236
Omenetto, F. G., & Kaplan, D. L. (2010). New opportunities for an ancient material. Science, 329, 528–531. https://doi.org/10.1126/science.1188936
Patel, A., Belykh, E., Miller, E. J., George, L. L., Martirosyan, N. L., Byvaltsev, V. A., & Preul, M. C. (2018). MinION rapid sequencing: Review of potential applications in neurosurgery. Surgical Neurology International, 9, 157.
Payne, A., Holmes, N., Rakyan, V., & Loose, M. (2018). BulkVis: A graphical viewer for Oxford nanopore bulk FAST5 files. Bioinformatics. https://doi.org/10.1093/bioinformatics/bty841
Quick, J., Ashton, P., Calus, S., Chatt, C., Gossain, S., Hawker, J., … Nye, K. (2015). Rapid draft sequencing and real-time nanopore sequencing in a hospital outbreak of Salmonella. Genome Biology, 16, 114. https://doi.org/10.1186/s13059-015-0677-2
Quick, J., Grubaugh, N. D., Pullan, S. T., Claro, I. M., Smith, A. D., Gangavarapu, K., … Beutler, N. A. (2017). Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples. Nature Protocols, 12, 1261–1276. https://doi.org/10.1038/nprot.2017.066
Quick, J., Loman, N. J., Duraffour, S., Simpson, J. T., Severi, E., Cowley, L., … Boettcher, J. P. (2016). Real-time, portable genome sequencing for Ebola surveillance. Nature, 530, 228–232. https://doi.org/10.1038/nature16996
Rand, A. C., Jain, M., Eizenga, J. M., Musselman-Brown, A., Olsen, H. E., Akeson, M., & Paten, B. (2017). Mapping DNA methylation with high-throughput nanopore sequencing. Nature Methods, 14, 411–413. https://doi.org/10.1038/nmeth.4189
Rang, F. J., Kloosterman, W. P., & De Ridder, J. (2018). From squiggle to basepair: Computational approaches for improving nanopore sequencing read accuracy. Genome Biology, 19, 90. https://doi.org/10.1186/s13059-018-1462-9
Ricker, N., Qian, H., & Fulthorpe, R. R. (2012). The limitations of draft assemblies for understanding prokaryotic adaptation and evolution. Genomics, 100, 167–175. https://doi.org/10.1016/j.ygeno.2012.06.009
Ruan, J., & Li, H. (2019). Fast and accurate long-read assembly with wtdbg2. bioRxiv.
Runtuwene, L. R., Tuda, J. S. B., Mongan, A. E., Makalowski, W., Frith, M. C., Imwong, M., … Maeda, R. (2018). Nanopore sequencing of drug-resistance-associated genes in malaria parasites, Plasmodium falciparum. Scientific Reports, 8, 8286. https://doi.org/10.1038/s41598-018-26334-3
Russell, J. J., Theriot, J. A., Sood, P., Marshall, W. F., Landweber, L. F., Fritz-Laylin, L., … Umen, J. (2017). Non-model model organisms. BMC Biology, 15, 55. https://doi.org/10.1186/s12915-017-0391-5
Sambrook, J., & Russell, D. W. (2001). Molecular cloning: A laboratory manual. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press.
Schadt, E. E., Turner, S., & Kasarskis, A. (2010). A window into third-generation sequencing. Human Molecular Genetics, 19, R227–R240. https://doi.org/10.1093/hmg/ddq416
Schell, M. A., Karmirantzou, M., Snel, B., Vilanova, D., Berger, B., Pessi, G., … Arigoni, F. (2002). The genome sequence of Bifidobacterium longum reflects its adaptation to the human gastrointestinal tract. Proceedings of the National Academy of Sciences of the United States of America, 99, 14422–14427. https://doi.org/10.1073/pnas.212527599
Schmucker, D., Clemens, J. C., Shu, H., Worby, C. A., Xiao, J., Muda, M., … Zipursky, S. L. (2000). Drosophila Dscam is an axon guidance receptor exhibiting extraordinary molecular diversity. Cell, 101, 671–684. https://doi.org/10.1016/s0092-8674(00)80878-8
Sedlazeck, F. J., Rescheneder, P., Smolka, M., Fang, H., Nattestad, M., von Haeseler, A., & Schatz, M. C. (2018). Accurate detection of complex structural variations using single-molecule sequencing. Nature Methods, 15, 461–468. https://doi.org/10.1038/s41592-018-0001-7
Shabardina, V., Kischka, T., Manske, F., Grundmann, N., Frith, M. C., Suzuki, Y., & Makałowski, W. (2019). NanoPipe-a web server for nanopore MinION sequencing data analysis. Gigascience, 8. https://doi.org/10.1093/gigascience/giy169
Shao, H., Ganesamoorthy, D., Duarte, T., Cao, M. D., Hoggart, C. J., & Coin, L. J. M. (2018). npInv: Accurate detection and genotyping of inversions using long read sub-alignment. BMC Bioinformatics, 19, 261. https://doi.org/10.1186/s12859-018-2252-9
Simao, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V., & Zdobnov, E. M. (2015). BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics, 31, 3210–3212. https://doi.org/10.1093/bioinformatics/btv351
Simpson, J. T., Workman, R. E., Zuzarte, P. C., David, M., Dursi, L. J., & Timp, W. (2017). Detecting DNA cytosine methylation using nanopore sequencing. Nature Methods, 14, 407–410. https://doi.org/10.1038/nmeth.4184
Smith, A. M., Abu-Shumays, R., Akeson, M., & Bernick, D. L. (2015). Capture, unfolding, and detection of individual tRNA molecules using a nanopore device. Frontiers in Bioengineering and Biotechnology, 3, 91.
Sohn, J. I., & Nam, J. W. (2018). The present and future of de novo whole-genome assembly. Briefings in Bioinformatics, 19, 23–40.
Stoiber, M., & Brown, J. (2017). BasecRAWller: Streaming nanopore basecalling directly from raw signal. bioRxiv.
Tan, M. H., Austin, C. M., Hammer, M. P., Lee, Y. P., Croft, L. J., & Gan, H. M. (2018). Finding Nemo: Hybrid assembly with Oxford Nanopore and Illumina reads greatly improves the clownfish (Amphiprion ocellaris) genome assembly. Gigascience, 7, 1–6.
Teng, H., Cao, M. D., Hall, M. B., Duarte, T., Wang, S., & Coin, L. J. M. (2018). Chiron: Translating nanopore raw signal directly into nucleotide sequence using deep learning. Gigascience, 7. https://doi.org/10.1093/gigascience/giy037
Tyson, J. R., O'Neil, N. J., Jain, M., Olsen, H. E., Hieter, P., & Snutch, T. P. (2018). MinION-based long-read sequencing and assembly extends the Caenorhabditis elegans reference genome. Genome Research, 28, 266–274. https://doi.org/10.1101/gr.221184.117
Vaser, R., Sovic, I., Nagarajan, N., & Sikic, M. (2017). Fast and accurate de novo genome assembly from long uncorrected reads. Genome Research, 27, 737–746. https://doi.org/10.1101/gr.214270.116
Walker, B. J., Abeel, T., Shea, T., Priest, M., Abouelliel, A., Sakthikumar, S., … Earl, A. M. (2014). Pilon: An integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE, 9, e112963. https://doi.org/10.1371/journal.pone.0112963
Weirather, J. L., De Cesare, M., Wang, Y., Piazza, P., Sebastiano, V., Wang, X. J., … Au, K. F. (2017). Comprehensive comparison of Pacific Biosciences and Oxford Nanopore Technologies and their applications to transcriptome analysis. F1000Res, 6, 100. https://doi.org/10.12688/f1000research
Wick, R. R., Judd, L. M., Gorrie, C. L., & Holt, K. E. (2017). Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads. PLoS Computational Biology, 13, e1005595. https://doi.org/10.1371/journal.pcbi.1005595
Workman, R. E., Tang, A., Tang, P. S., Jain, M., Tyson, J. R., Zuzarte, P. C., Gilpatrick, T., … Timp, W. (2018). Nanopore native RNA sequencing of a human poly(A) transcriptome. bioRxiv.
Xu, M., & Lewis, R. V. (1990). Structure of a protein superfiber: Spider dragline silk. Proceedings of the National Academy of Sciences of the United States of America, 87, 7120–7124. https://doi.org/10.1073/pnas.87.18.7120
Yamagishi, J., Runtuwene, L. R., Hayashida, K., Mongan, A. E., Thi, L. A. N., Thuy, L. N., … Frith, M. (2017). Serotyping dengue virus with isothermal amplification and a portable sequencer. Scientific Reports, 7, 3510. https://doi.org/10.1038/s41598-017-03734-5
Yasodha, R., Vasudeva, R., Balakrishnan, S., Sakthi, A. R., Abel, N., Binai, N., … Dev, S. A. (2018). Draft genome of a high value tropical timber tree, Teak (Tectona grandis L. f): Insights into SSR diversity, phylogeny and conservation. DNA Research, 25, 409–419. https://doi.org/10.1093/dnares/dsy013
Ye, C., Hill, C. M., Wu, S., Ruan, J., & Ma, Z. S. (2016). DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies. Scientific Reports, 6, 31900. https://doi.org/10.1038/srep31900
Zimin, A. V., Marcais, G., Puiu, D., Roberts, M., Salzberg, S. L., & Yorke, J. A. (2013). The MaSuRCA genome assembler. Bioinformatics, 29, 2669–2677. https://doi.org/10.1093/bioinformatics/btt476

How to cite this article: Kono N, Arakawa K. Nanopore sequencing: Review of potential applications in functional genomics. Develop Growth Differ. 2019;61:316–326. https://doi.org/10.1111/dgd.12608
