Professional Documents
Culture Documents
The DNA Data Storage Model
The DNA Data Storage Model
D
digital data (Figure 1). To learn more
about this process, see Preserving Our
NA data storage, or using synthetic DNA as a Digital Legacy: An Introduction to DNA Data Storage.1
data storage medium, is being seriously con- Even though synthetic DNA as a data storage medium
sidered as an archival storage solution due to is similar to traditional storage media in many ways, it is
its volumetric data density potential, data re- worth highlighting some key differences.
tention characteristics, sustainability, and potential for First, in traditional storage, the media is premanufactured
dramatically lower total cost of ownership versus existing (e.g., SSDs use NAND cells, HDD, and tape use magnetic do-
storage technologies. mains on a platter or strip, respectively) and written by mod-
ifying the state of the media. For such devices, capacity and
INTRODUCTION throughput scaling require modifications to both media and
The biotechnology industry has made DNA data storage pos- write/read heads. In contrast, the most common DNA data
sible today due to decades of investment in molecular-level storage method does not employ a premanufactured media
technologies for medical and life sciences applications that substrate. (Note: Methods to attach DNA to a planar substrate,
now enable us to construct and read synthetic DNA, base by to serve as a “memory/storage cell”, are being considered; this
article does not cover these methods.) Instead, the storage
Digital Object Identifier 10.1109/MC.2023.3272188
media—DNA molecules—are manufactured during write op-
Date of current version: 26 June 2023 erations. In this case, DNA’s universal, fixed, and reader/writer
J U LY 2 0 2 3 79
MEMORY AND STORAGE
Network Stack and Switches Transmitter Receiver Network Stack and Switches
• Depacketization
• Routing Bitstream
Symbols to Signals Signals to Symbols
• Flow Control 111100..
(D→A, Shaping, ...) (A→D, Shaping, ...)
Analog
(a) Electrical
(b) DNA
FIGURE 5. Comparing (a) the electrical channel model with (b) the DNA channel model.
J U LY 2 0 2 3
81
MEMORY AND STORAGE
(homopolymers), a high proportion of similar in nature to those used in exist- technology-based implementations that
Gs and Cs (high GC content), and some ing storage systems. In the next section, miniaturize the synthesis device and
other specific patterns can cause er- we discuss the DNA physical layer. can achieve higher write throughput by
rors. For example, some sequencing synthesizing more sequences in paral-
techniques exhibit errors with long THE DNA DATA STORAGE lel (thousands in well plates versus bil-
homopolymers, and high GC content PHYSICAL LAYER lions on chips). For example, chips akin
may affect sequencing preparation The DNA physical layer [Figure 2(b)] in- to static RAM arrays have been used to
protocols. The transformations in the volves the basic chemistry of DNA for synthesize sequences of DNA, one se-
DNA channel layer avoid transforming writing (synthesis), physically preserv- quence per array position8 (Figure 6).
digital bitstreams into such problem- ing (storage), and reading (sequencing) Like other storage technologies,
atic patterns of bases, thereby reducing DNA molecules. It also involves the scaling relies on miniaturization of
downstream errors and associated er- basic chemistry of DNA for retrieval, these positions so that more fit onto
a chip, but instead of increasing
capacity, more positions (synthesis
sites) on a DNA synthesis chip increase
write throughput, as more sequences
In summary, the DNA channel layer encodes an can be synthesized simultaneously.
input bitstream so that the DNA sequences that are
sent to the physical layer can be effectively and Sequencing
successfully decoded after “transmission.” The two main methods of DNA sequenc-
ing considered for DNA data storage to-
day are sequencing-by-synthesis (SBS)
ror correction overhead, and improving which is both a general preparation (Figure 7) and nanopore sequencing
overall data reliability. This is concep- step for sequencing and also an integral (Figure 8). SBS20,21 indirectly identifies
tually similar to the actions in a net- step in how systems implement DNA the bases in a source ssDNA strand by
work/electrical channel to mitigate an- storage operation requests from the up- reconstructing a dsDNA strand from
alog effects. For example, a serial link per layers. We will first briefly discuss that source. As each complementary
maintains close to an equal number of synthesis, storage, and sequencing and base is attached in building the dsDNA
ones and zeroes on the wire. then cover retrieval as it is central to strand, the attachment event enables
Optional DNA space protocol: In ad- the functioning of logical storage oper- identification of the original source base,
dition to the actions above, the codec ations for DNA data storage. usually by optical means. Nanopore se-
may insert additional base sequences quencing22,23 detects bases directly
into the already transformed digital Synthesis (no dsDNA construction) by measuring
data (process not shown in Figure 3). DNA synthesis has historically been disturbances in ionic current as a DNA
The purposes are use-case specific implemented with large machines, strand is guided through a tiny opening
but generally involve random access using hoses, valves, and plastic contain- (nanopore). Nanopores can be biological
to or search within the DNA archive. ers known in the life sciences as well or, in development, solid state. A main
An example is the addition of an ob- plates. Hoses and valves will not disap- tradeoff between the two methods is
ject ID, assigned by the session layer pear; however, advances in the semi- that SBS offers higher accuracy but typ-
to sequences belonging to the same conductor industry have enabled silicon ically requires a batched process that
object. The section “Retrieval” dis-
cusses this further.
After these steps, the resulting
DNA sequences are handed over to the
DNA physical layer.
In summary, the DNA channel layer
encodes an input bitstream so that the Detection
DNA sequences that are sent to the Event
physical layer can be efficiently and suc-
cessfully decoded after “transmission” 2 µm
Source
(synthesis, storage, retrieval, and se- Strand
quencing). In doing this, the DNA chan-
FIGURE 6. An electrochemical DNA syn- FIGURE 7. SBS sequencing. (Source:
nel uses transformations and process-
thesis array at 2 μm pitch.8 Illumina, Inc.; used with permission.)
ing steps that are well structured and
J U LY 2 0 2 3 83
MEMORY AND STORAGE
The result is a pool in which most of the methods in the OSI layered storage storage architecture,” Science, vol.
DNA molecules represent the object(s) model will help the nascent DNA data 355, no. 6328, pp. 950–954, Mar. 2017,
to be read. storage ecosystem evolve. doi: 10.1126/science.aaj2038.
DNA pull-out with magnetic nano 8. B. H. Nguyen et al., “Scaling DNA
particles uses the same principle ACKNOWLEDGMENT data storage with nanoscale elec-
of DNA attachment but a different We thank Kyle Tomek, John Hoff- trode wells,” Sci. Adv., vol. 7, no. 48,
method to make the molecules repre- man, Damien Le Moal, and Luis Ceze Nov. 2021, Art. no. eabi6714, doi:
senting the object of interest more for their valuable contributions and 10.1126/sciadv.abi6714.
numerous in a solution. With this feedback on this manuscript. We also 9. D. Coudy, M. Colotte, A. Luis, S. Tuffet,
process, the channel layer only needs thank our respective research teams and J. Bonnet, “Long term conservation
to append the object IDs as target sites and collaborators for their substantial of DNA at ambient temperature. Im-
at one end of every DNA sequence be- contributions to this field. Dave Lands- plications for DNA data storage,” PLoS
longing to an object. At retrieval time, man is the corresponding author. One, vol. 16, no. 11, Nov. 2021, Art.
the probes are attached to magnetic no. e0259868, doi: 10.1371/journal.
nanoparticles. As the probes bind REFERENCES pone.0259868.
to target DNA molecules, the target 1. “Preserving our digital legacy: 10. L. Organick et al., “An empirical com-
DNA molecules can then be sepa- An introduction to DNA data parison of preservation methods for syn-
rated from the rest of the DNA mole- storage,” DNA Data Storage Alliance. thetic DNA data storage,” Small Methods,
cules in the pool with a magnet. [Online]. Available: https:// vol. 5, no. 5, May 2021, Art. no. 2001094,
For either random access method, dnastoragealliance.org/dev/wp doi: 10.1002/smtd.202001094.
additional preparation steps such as -content/uploads/2021/06/DNA 11. R. N. Grass, R. Heckel, M. Puddu,
further PCR steps and DNA cleanup -Data-Storage-Alliance-An D. Paunescu, and W. J. Stark, “Robust
may be required, depending on the se- -Introduction-to-DNA-Data-Storage.pdf chemical preservation of digital infor-
quencer used for reading the informa- 2. N. Goldman et al., “Towards practical, mation on DNA in silica with error-cor-
tion out. high-capacity, low-maintenance infor- recting codes,” Angewandte Chemie,
Tradeoffs between different chemi- mation storage in synthesized DNA,” vol. 54, no. 8, pp. 2552–2555, Feb. 2015,
cal implementations of object random Nature, vol. 494, no. 7435, pp. 77–80, doi: 10.1002/anie.201411378.
access along dimensions such as reliabil- Feb. 2013, doi: 10.1038/nature11875. 12. S. Newman et al., “High density DNA
ity, preparation, and reading overheads 3. L. Organick et al., “Random access in data storage library via dehydration
are an active area of research for DNA large-scale DNA data storage,” Nature with digital microfluidic retrieval,”
data storage,16 as well further work on Biotechnol. vol. 36, no. 3, pp. 242–248, Nature Commun., vol. 10, no. 1,
implementing even more advanced op- Mar. 2018, doi: 10.1038/nbt.4079. Apr. 2019, Art. no. 1706, doi: 10.1038/
erations such as search,17 content sim- 4. S. R. Srinivasavaradhan, S. Gopi, H. s41467-019-09517-y.
ilarity search,18 file preview,19 etc. D. Pfister and S. Yekhanin, “Trellis 13. S. M. Tabatabaei Yazdi, Y. Yuan,
This discussion has shown how, for BMA: Coded trace reconstruction on J. Ma, H. Zhao, and O. Milenkovic,
the same read-object request from IDS channels for DNA storage,” in “A rewritable, random-access DNA-
the session layer, d i f ferent mecha- Proc. IEEE Int. Symp. Inf. Theory (ISIT), based storage system,” Scientific Rep.,
nisms can be used at the lower chan- 2021, pp. 2453–2458, doi: 10.1109/ vol. 5, Sep. 2015, Art. no. 14138,
nel and physical layers, similar to ISIT45174.2021.9517821. doi: 10.1038/srep14138.
how two NAND f lash storage devices 5. G. M. Church, Y. Gao, and S. Kosuri, 14. C. Winston, L. Organick, D. Ward,
are implemented differently despite “Next-generation digital information L. Ceze, K. Strauss, and Y. J. Chen,
using the same command interface. storage in DNA,” Science, vol. 337, no. “Combinatorial PCR method for
6102, p. 1628, Sep. 2012, doi: 10.1126/ efficient, selective oligo retrieval from
W
science.1226355. complex oligo pools,” ACS Synthetic
hile DNA as a storage me- 6. W. H. Press, J. A. Hawkins, S. K. Jones Biol., vol. 11, no. 5, pp. 1727–1734, Feb.
dium has fundamental Jr., J. M. Schaub, and I. J. Finkelstein, 2022, doi: 10.1021/acssynbio.1c00482.
differences from tradi- “HEDGES error-correcting code for 15. K. N. Lin, K. Volkel, J. M. Tuck, and
tional storage, many of the data trans- DNA storage corrects indels and A. J. Keung, “Dynamic and scalable
formations and error processing con- allows sequence constraints,” Proc. DNA-based information storage,”
siderations for DNA data storage have Nat. Acad. Sci. USA, vol. 117, no. 31, Nature Commun., vol. 11, no. 1, Jun.
analogies to transmitting data through pp. 18,489–18,496, Aug. 2020, doi: 2020, Art. no. 2981, doi: 10.1038/
“traditional” network/storage electri- 10.1073/pnas.2004821117. s41467-020-16797-2.
cal channels. It is our hope that fram- 7. Y. Erlich and D. Zielinski, “DNA Foun- 16. K. J. Tomek et al., “Driving the scal-
ing DNA data storage implementation tain enables a robust and efficient ability of DNA-based information
Lorrie Cranor
Bob Blakley
Subscribe Today
www.computer.org/over-the-rainbow-podcast Digital Object Identifier 10.1109/MC.2023.3277764