
Nanopore Sequencing Technology and Tools:

Computational Analysis of the Current State, Bottlenecks and Future Directions


Damla Senol¹, Jeremie Kim¹,³, Saugata Ghose¹, Can Alkan² and Onur Mutlu³,¹

¹Carnegie Mellon University, Pittsburgh, PA, USA   ²Bilkent University, Ankara, Turkey   ³ETH Zürich, Zürich, Switzerland

Genome Sequencing

Genome sequencing is the process of determining the order of the DNA sequence in an organism's genome.
[Figure: a large DNA molecule is broken into small DNA fragments, which are sequenced into reads (e.g., ACGTACCCCGT, ACGAGCGGGT, CTAGGGACCTT).]
De novo assembly is the method of
o Merging the reads in order to construct the original sequence
o Without the aid of a reference genome
(A toy overlap-and-merge sketch follows this section.)

Long Read Analysis

Long reads
o Sequences with thousands of bases
o Sequences with higher error rates
o Suitable for de novo assembly
Assembly quality can be improved by using longer reads, since they can cover long repetitive regions.

Nanopore Sequencing

Nanopore sequencing is an emerging DNA sequencing technology.
o Long read length
o Portable and low cost
o Produces data in real-time
Nanopore sequencers rely solely on the electrochemical structure of the different nucleotides for identification, and measure the change in the ionic current as long strands of DNA (ssDNA) pass through the nano-scale protein pores.
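To illustrate what "merging the reads" means in practice, here is a purely illustrative Python sketch of greedy overlap-and-merge assembly on toy reads. The reads, the minimum overlap length, and the greedy strategy are assumptions chosen for brevity; real assemblers such as Canu and Miniasm work on millions of error-prone reads with overlap graphs and far more efficient data structures.

```python
# Toy de novo assembly by greedy overlap merging (illustrative only).

def overlap_len(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def greedy_assemble(reads: list[str]) -> str:
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, 0, 1)  # (overlap length, index of left read, index of right read)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap_len(a, b)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        # Merge the best pair; if nothing overlaps anymore, just concatenate.
        merged = reads[i] + reads[j][olen:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads[0]

# Toy reads sampled from the sequence ACGTACGGTAGCTT.
print(greedy_assemble(["ACGTACG", "TACGGTAG", "GTAGCTT"]))
```

Running the sketch reconstructs ACGTACGGTAGCTT, the toy sequence the three reads were sampled from.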

Pipeline and Current Tools

o Step 1 (Basecalling): raw signal data → DNA reads
  Tools: Metrichor, Nanocall, Nanonet, DeepNano
o Step 2 (Read-to-Read Overlap Finding): DNA reads → overlaps
  Tools: Minimap, GraphMap
o Step 3 (Assembly): overlaps → draft assembly
  Tools: Canu, Miniasm
o Step 4 (Read Mapping): reads mapped against the draft assembly → reads-vs-draft-assembly mappings
  Tools: BWA-MEM
o Step 5 (Polishing): mappings → improved assembly
  Tools: Nanopolish
(A command-level sketch of Steps 2 to 4 follows this section.)

Problem & Our Goal

Problem
The tools used for nanopore sequence analysis are of critical importance: they must increase the accuracy of the whole pipeline to take better advantage of long reads, and increase the speed of the whole pipeline to enable real-time data analysis.

Our Goal
o Comprehensively analyze the current publicly available tools in the whole pipeline for nanopore sequence analysis, with a focus on understanding their advantages, disadvantages, and performance bottlenecks.
o Provide guidelines for determining the appropriate tools for each step of the pipeline.
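To make the flow of the pipeline concrete, the sketch below chains Steps 2 to 4 from Python. It is an outline under stated assumptions rather than a canonical recipe: the file names (reads.fa, overlaps.paf.gz, assembly.gfa, draft.fa, reads_to_draft.sam) and the thread count are placeholders, the minimap/miniasm/bwa flags follow those tools' documented usage but may differ across versions, and the version-dependent Nanopolish step is only noted in a comment.

```python
# Minimal sketch of Steps 2-4 of the pipeline above, driven from Python.
# Assumes: basecalled reads in reads.fa; minimap, miniasm, and bwa on PATH.
import subprocess

def run(cmd: str) -> None:
    """Echo and execute one shell command, stopping the pipeline on failure."""
    print(f"[pipeline] {cmd}")
    subprocess.run(cmd, shell=True, check=True)

reads = "reads.fa"  # Step 1 output: basecalled nanopore reads (placeholder name)

# Step 2: all-vs-all read-to-read overlap finding with Minimap.
run(f"minimap -Sw5 -L100 -m0 -t8 {reads} {reads} | gzip -1 > overlaps.paf.gz")

# Step 3: assemble the overlaps into a draft assembly graph with Miniasm.
run(f"miniasm -f {reads} overlaps.paf.gz > assembly.gfa")

# Extract the draft contig sequences from the GFA 'S' lines into FASTA.
with open("assembly.gfa") as gfa, open("draft.fa", "w") as fasta:
    for line in gfa:
        if line.startswith("S"):
            _, name, seq = line.rstrip("\n").split("\t")[:3]
            fasta.write(f">{name}\n{seq}\n")

# Step 4: map the original reads back onto the draft assembly with BWA-MEM.
run("bwa index draft.fa")
run(f"bwa mem -x ont2d -t8 draft.fa {reads} > reads_to_draft.sam")

# Step 5 (polishing with Nanopolish) consumes these mappings plus the raw signal
# data; its exact command line is version-dependent and is omitted here.
```

Each stage runs as a separate external process, mirroring how the measurements in the next section treat basecalling, overlap finding, and assembly as distinct steps.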

Results and Analysis


Step 1 = basecalling, Step 2 = read-to-read overlap finding, Step 3 = assembly (Canu performs its own overlap finding, so Canu pipelines report no separate Step 2). Wall clock times are given as h:m:s; memory is peak usage in GB. Step 1 results depend only on the basecaller and are shared by all pipelines using it; "–" means not applicable or not available.

| Pipeline | Step 1 Wall Clock Time | Step 1 Memory Usage (GB) | Step 2 Wall Clock Time | Step 2 Memory Usage (GB) | Step 3 Wall Clock Time | Step 3 Memory Usage (GB) | Number of Contigs | Identity (%) | Coverage (%) |
|---|---|---|---|---|---|---|---|---|---|
| Metrichor + Canu | – | – | – | – | 44:12:31 | 5.76 | 1 | 98.04 | 99.31 |
| Metrichor + Minimap + Miniasm | – | – | 2:15 | 12.30 | 1:19 | 1.96 | 1 | 85.00 | 94.85 |
| Metrichor + GraphMap + Miniasm | – | – | 6:14 | 56.58 | 1:05 | 1.82 | 2 | 85.24 | 96.95 |
| Nanonet + Canu | 17:52:42 | 1.89 | – | – | 11:32:40 | 5.27 | 1 | 97.92 | 98.71 |
| Nanonet + Minimap + Miniasm | 17:52:42 | 1.89 | 1:13 | 9.45 | 33 | 0.69 | 1 | 85.50 | 93.72 |
| Nanonet + GraphMap + Miniasm | 17:52:42 | 1.89 | 3:18 | 29.16 | 32 | 0.65 | 1 | 85.36 | 92.05 |
| Nanocall + Canu | 47:04:53 | 37.73 | – | – | – | – | – | – | – |
| Nanocall + Minimap + Miniasm | 47:04:53 | 37.73 | 1:15 | 12.19 | 20 | 0.47 | 5 | 80.53 | 96.80 |
| Nanocall + GraphMap + Miniasm | 47:04:53 | 37.73 | 5:14 | 56.78 | 16 | 0.30 | 3 | 80.52 | 95.43 |
| DeepNano + Canu | 23:54:34 | 8.38 | – | – | 1:15:48 | 3.61 | 106 | 92.63 | 154.07 |
| DeepNano + Minimap + Miniasm | 23:54:34 | 8.38 | 1:50 | 11.71 | 1:03 | 1.31 | 1 | 82.37 | 91.62 |
| DeepNano + GraphMap + Miniasm | 23:54:34 | 8.38 | 5:18 | 54.64 | 58 | 1.10 | 1 | 82.39 | 91.60 |
OBSERVATION 1: Basecalling with Recurrent Neural Networks (Nanonet) performs better than basecalling with Hidden Markov Models (Nanocall) in terms of accuracy, speed, and memory usage. However, it has scalability limitations due to data sharing between threads.

OBSERVATION 2: Sharing the computation of a read between parallel threads provides constant and low memory usage, but data sharing between multiple sockets degrades the parallel speedup when the number of threads reaches higher values.

[Figure: Nanocall (HMM) vs. Nanonet (RNN); three panels plotting wall clock time (sec), peak memory usage (GB), and parallel speedup against the number of threads (up to 40), with annotated ratios of 3.1x, 3.0x, and 1.2x.]

OBSERVATION 3: Storing minimizers instead of all k-mers does not affect the accuracy of the whole pipeline. However, Minimap has lower memory usage and higher speed than GraphMap, since computation is decreased by shrinking the size of the dataset that needs to be considered. (A minimizer sketch follows Observation 5 below.)

OBSERVATION 4: Canu, an assembler with error correction, produces high-quality assemblies but is slow compared to Miniasm, an assembler without error correction. Miniasm is suitable for fast initial analysis, and the quality of its assembly can be increased with an additional polishing step.

OBSERVATION 5: Nanopolish is compatible only with reads basecalled by Metrichor. Polishing the draft assembly generated with Canu takes 5h52m and increases the accuracy from 98.04% to 99.46%. Polishing the draft assembly generated with Miniasm takes 5d2h54m and increases the accuracy from 85.00% to 92.31%.
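Observation 3 refers to minimizers; the sketch below shows the underlying idea on a toy sequence. It is an illustrative simplification: the sequence, k, and w are arbitrary, and it uses plain lexicographic ordering while ignoring reverse complements, whereas Minimap selects minimizers with a hash function, so the exact k-mers kept here differ from Minimap's.

```python
# Illustrative (w,k)-minimizer selection: in every window of w consecutive
# k-mers, keep only the smallest k-mer, so the index stores far fewer entries
# than one per k-mer while overlapping reads still share selected k-mers.

def minimizers(seq: str, k: int = 5, w: int = 4) -> set[tuple[int, str]]:
    """Return the set of (position, k-mer) minimizers of seq."""
    kmers = [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # Keep the smallest k-mer in the window (lexicographic order here;
        # Minimap orders k-mers by a hash instead).
        selected.add(min(window, key=lambda item: item[1]))
    return selected

seq = "ACGTTAGCATCGGATCGATT"
kept = minimizers(seq)
total = len(seq) - 5 + 1
print(f"{total} k-mers in total, {len(kept)} kept as minimizers")
```

On random sequences, (w, k)-minimizer sampling keeps about 2/(w+1) of all k-mers on average, which is where Minimap's memory and speed advantage in Observation 3 comes from, while any two reads sharing a sufficiently long exact match still share at least one minimizer, preserving overlap detection and hence pipeline accuracy.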
